Vector Calculus

Introduction to Vectors in Euclidean Space

1.1 Introduction

🧭 Overview

🧠 One-sentence thesis

Vectors extend single-variable calculus into multivariable calculus by representing quantities that have both magnitude and direction in 2-dimensional and 3-dimensional Euclidean space.

📌 Key points

From 1D to 3D: Single-variable calculus uses functions on the real line (R), while multivariable calculus uses functions in the Euclidean plane (R²) and Euclidean space (R³).
What a vector is: A directed line segment with both magnitude (length) and direction, motivated by physical quantities like velocity that require more than a single number.
Coordinate systems: R³ uses a right-handed coordinate system with three mutually perpendicular axes (x, y, z) and three coordinate planes.
Common confusion: A vector vs. a point—vectors have direction and can be translated; points are fixed locations; however, vectors starting at the origin correspond one-to-one with their terminal points.
Magnitude formulas: The length of a vector v = (a, b) in R² is √(a² + b²); in R³, v = (a, b, c) has length √(a² + b² + c²).

📐 Dimensional progression

📏 Single-variable calculus (R)

Functions of one variable: y = f(x) where x varies over the real number line R.
Graphs consist of points (x, y) = (x, f(x)) in the Euclidean plane.
Velocity is just a signed number: positive or negative direction plus a magnitude.

🗺️ Two-dimensional space (R²)

Euclidean plane (R²): All ordered pairs of real numbers (a, b) in a Cartesian coordinate system with two perpendicular axes (x and y).

Functions of two variables: z = f(x, y).
The "2" in R² represents the number of dimensions.
Graphs of such functions lie in 3-dimensional space R³.

🌐 Three-dimensional space (R³)

Euclidean space (R³): All ordered triples of real numbers (a, b, c) in a Cartesian coordinate system with three mutually perpendicular axes (x, y, z).

Functions of two variables have graphs as points (x, y, z) = (x, y, f(x, y)) in R³.
Three mutually perpendicular coordinate planes: xy-plane, yz-plane, xz-plane.
R³ can be drawn on a flat surface (paper, blackboard) only by creating the illusion of three dimensions.

🚫 Four-dimensional space (R⁴)

Functions of three variables would have graphs in R⁴.
R⁴ cannot be visualized in our 3-dimensional space or simulated in 2-dimensional drawings.
It must be thought of abstractly.

🧭 Coordinate system handedness

🖐️ Right-handed coordinate system

Index finger test: Point index finger along positive x-axis, middle finger along positive y-axis, thumb along positive z-axis—this is possible with the right hand.
Rotation test: Point thumb upward along positive z-axis while using remaining four fingers to rotate x-axis toward y-axis.
This book uses right-handed systems throughout.

🔄 Left-handed vs. right-handed

Switching x- and y-axes in a right-handed system produces a left-handed system.
Rotating either type of system does not change its handedness.
The choice affects how certain operations (like cross products) are defined.

🎯 Motivation for vectors

🚗 Beyond position: motion and force

Position of an object can be described by coordinates.
Velocity, acceleration, and gravitational force involve both motion and direction.
A single number is insufficient to describe these phenomena in 2D or 3D space.

➕ Velocity in 1D revisited

For motion along a straight line, the velocity f′(t) = ±a carries two pieces of information:
- Magnitude: the nonnegative number a (called speed).
- Direction: the sign ± (positive or negative direction).
For motion along a curve in 2D or 3D, velocity needs a multidimensional representation.

➡️ Arrows as geometric objects

An arrow (directed line segment) naturally has both magnitude (length) and direction.
This geometric object motivates the formal definition of a vector.

📦 Vector definitions

📍 What a vector is

Vector: A directed line segment drawn from an initial point P to a terminal point Q (with P and Q distinct). Denoted by PQ with an arrow. The magnitude is the length ‖PQ‖, and the direction is that of the directed line segment.

Zero vector (0): Just a point; has magnitude ‖0‖ = 0; direction is not defined (neither arbitrary, indeterminate, nor "none"—simply not required by the definition).

The definition applies in any number of dimensions.
The terms magnitude and length are used interchangeably.
A vector is often denoted by a single boldface letter (e.g., v).

⚖️ When two vectors are equal

Vector equality: Two nonzero vectors are equal if they have the same magnitude and the same direction. Any vector with zero magnitude equals the zero vector.

Vectors with the same magnitude and direction but different initial points are equal.
Example: Vectors on parallel lines with the same length and pointing the same way are equal.
Don't confuse: Parallel vectors pointing in opposite directions are not equal.

🎯 Standard representation: vectors from the origin

Infinitely many equal vectors exist (differing only by initial/terminal points).
Convention: "The vector" with given magnitude and direction means the one starting at the origin.
Advantages:
- Every coordinate system has an origin.
- Easy correspondence between vectors and points.
- Standard way to compare vectors.

🔗 Point-vector correspondence

A vector v in R³ with initial point at origin and terminal point (3, 4, 5) is written v = (3, 4, 5).
This notation means: initial point is (0, 0, 0), terminal point is (3, 4, 5).
The zero vector: 0 = (0, 0) in R² and 0 = (0, 0, 0) in R³.
Don't confuse: The point (3, 4, 5) and the vector (3, 4, 5) are different objects, but the notation creates a useful correspondence.

🧮 Checking vector equality

🔄 Translation method

To check if two vectors are equal without computing magnitude and direction:

Translate each vector to start at the origin.
Subtract coordinates: New terminal point = original terminal point − original initial point.
Compare terminal points: If the coordinates match, the original vectors are equal.

📝 Example walkthrough

For PQ with P = (2, 1, 5), Q = (3, 5, 7) and RS with R = (1, −3, −2), S = (2, 1, 0):

Vector	Translation calculation	Result
PQ	Q − P = (3, 5, 7) − (2, 1, 5) = (1, 4, 2)	v = (1, 4, 2)
RS	S − R = (2, 1, 0) − (1, −3, −2) = (1, 4, 2)	w = (1, 4, 2)

Since v = w, we conclude PQ = RS.
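
The translation method can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the excerpt: the helper names displacement and vectors_equal are hypothetical, and the exact comparison is safe here only because the coordinates are integers (floating-point coordinates would need a tolerance).

```python
def displacement(initial, terminal):
    """Translate a vector to the origin: terminal point minus initial point."""
    return tuple(t - i for i, t in zip(initial, terminal))


def vectors_equal(p, q, r, s):
    """True if the vector from p to q equals the vector from r to s."""
    return displacement(p, q) == displacement(r, s)


# Points from the walkthrough above.
P, Q = (2, 1, 5), (3, 5, 7)
R, S = (1, -3, -2), (2, 1, 0)

print(displacement(P, Q))         # (1, 4, 2)
print(displacement(R, S))         # (1, 4, 2)
print(vectors_equal(P, Q, R, S))  # True
```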

📏 Magnitude formulas

📐 In R² (two dimensions)

Distance formula: For points P = (x₁, y₁) and Q = (x₂, y₂), distance d = √[(x₂ − x₁)² + (y₂ − y₁)²]

Vector magnitude: For vector PQ, ‖PQ‖ = √[(x₂ − x₁)² + (y₂ − y₁)²]

Standard form: For v = (a, b), ‖v‖ = √(a² + b²)

The standard form is a special case with P = (0, 0) and Q = (a, b).

📦 In R³ (three dimensions)

Distance formula (Theorem 1.1): For points P = (x₁, y₁, z₁) and Q = (x₂, y₂, z₂), distance d = √[(x₂ − x₁)² + (y₂ − y₁)² + (z₂ − z₁)²]

Vector magnitude (Theorem 1.2): For v = (a, b, c), ‖v‖ = √(a² + b² + c²)

🔍 Proof strategy for Theorem 1.2

The proof considers four exhaustive cases:

Case	Condition	Method
1	a = b = c = 0	Direct: ‖v‖ = 0 = √(0² + 0² + 0²)
2	Exactly two are zero	Vector lies along one axis; use absolute value
3	Exactly one is zero	Vector lies in a coordinate plane; use 2D Pythagorean Theorem
4	None are zero	Apply Pythagorean Theorem twice to right triangles in 3D

Example for Case 4: For v = (a, b, c) with all positive, construct right triangles to show ‖v‖² = a² + b² + c².

🧪 Sample calculations

‖(2, −1)‖ in R² = √(4 + 1) = √5
‖(8, 3)‖ in R² = √(64 + 9) = √73
Distance from (2, −1, 4) to (4, 2, −3) in R³ = √(4 + 9 + 49) = √62
‖(5, 8, −2)‖ in R³ = √(25 + 64 + 4) = √93
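
The sample calculations can be checked numerically. The sketch below assumes vectors are stored as plain tuples; the function names magnitude and distance are illustrative choices, not notation from the excerpt.

```python
import math


def magnitude(v):
    """Length of a vector in R^2 or R^3: square root of the sum of squares."""
    return math.sqrt(sum(c * c for c in v))


def distance(p, q):
    """Distance between points p and q, i.e. the magnitude of q - p."""
    return magnitude(tuple(b - a for a, b in zip(p, q)))


print(magnitude((2, -1)))                # equals sqrt(5)
print(magnitude((8, 3)))                 # equals sqrt(73)
print(distance((2, -1, 4), (4, 2, -3)))  # equals sqrt(62)
print(magnitude((5, 8, -2)))             # equals sqrt(93)
```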

Vector Algebra

1.2 Vector Algebra

🧭 Overview

🧠 One-sentence thesis

Vector algebra provides systematic rules for adding, subtracting, and scaling vectors through both geometric and coordinate-based methods, enabling any vector to be expressed uniquely as a combination of basis vectors.

📌 Key points

Scalar multiplication: stretches or shrinks a vector by a factor |k|; negative scalars also flip its direction.
Vector addition: translating the second vector to start at the terminal point of the first; subtraction is adding the negative.
Coordinate formulas: in R² and R³, operations work component-wise: k(v₁, v₂) = (kv₁, kv₂) and v + w = (v₁ + w₁, v₂ + w₂).
Basis vectors: i, j, k are mutually perpendicular unit vectors; every vector can be written uniquely as a combination of them (component form).
Common confusion: geometric vs analytic proofs—geometric proofs use diagrams and translation; analytic proofs use coordinate arithmetic and properties of real numbers.

🔢 Scalar multiplication and parallel vectors

🔢 What scalar multiplication does

Scalar multiplication of a vector: stretching or shrinking the vector; flipping it in the opposite direction if the scalar is negative.

Multiplying vector v by scalar k produces kv.
If k > 1, the vector stretches; if 0 < k < 1, it shrinks; if k = 1, it is unchanged.
If k < 0, the vector points in the opposite direction and its length is scaled by |k|.
Special case: k = 0 gives the zero vector 0, and for the zero vector, k·0 = 0 for any scalar k.

↔️ Parallel vectors

Two vectors v and w are parallel (v ∥ w) if one is a scalar multiple of the other.

Parallel means they lie on the same line (or parallel lines if translated).
Example: v and 2v are parallel; v and −v are parallel but point in opposite directions.

➕ Vector addition and subtraction

➕ How to add vectors geometrically

The sum v + w is obtained by translating w so its initial point is at the terminal point of v; the sum starts at v's initial point and ends at w's new terminal point.

Intuitively: "tack w onto the end of v."
The zero vector acts as an additive identity: v + 0 = v = 0 + v.
Adding a vector to its negative gives zero: v + (−v) = 0.

➖ Vector subtraction

Vector subtraction: v − w = v + (−w).

First form −w (flip w's direction), then add it to v.
Geometric interpretation: v − w is the vector that, when added to w, gives v (see Figure 1.2.4c in the excerpt).
Don't confuse: v − w is not the same as w − v; they point in opposite directions.

🔄 Commutativity and associativity

Commutative law: v + w = w + v (order doesn't matter).
Associative law: u + (v + w) = (u + v) + w (grouping doesn't matter).
These laws can be proved geometrically (using diagrams) or analytically (using coordinates).

📐 Coordinate formulas in R² and R³

📐 Component-wise operations

The excerpt provides two key theorems for vectors starting at the origin:

Operation	R² formula	R³ formula
Scalar multiplication	k(v₁, v₂) = (kv₁, kv₂)	k(v₁, v₂, v₃) = (kv₁, kv₂, kv₃)
Vector addition	(v₁, v₂) + (w₁, w₂) = (v₁ + w₁, v₂ + w₂)	(v₁, v₂, v₃) + (w₁, w₂, w₃) = (v₁ + w₁, v₂ + w₂, v₃ + w₃)

Each coordinate is handled independently.
Example: 3(2, 1, −1) = (6, 3, −3); (2, 1, −1) + (3, −4, 2) = (5, −3, 1).
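
A short Python sketch of the component-wise formulas, assuming vectors are tuples of equal length. The names scale, add, and subtract are hypothetical helpers introduced only for this illustration.

```python
def scale(k, v):
    """Scalar multiplication: multiply every component by k."""
    return tuple(k * c for c in v)


def add(v, w):
    """Vector addition: add corresponding components."""
    return tuple(a + b for a, b in zip(v, w))


def subtract(v, w):
    """Subtraction defined as v + (-w), matching the definition above."""
    return add(v, scale(-1, w))


print(scale(3, (2, 1, -1)))             # (6, 3, -3)
print(add((2, 1, -1), (3, -4, 2)))      # (5, -3, 1)
print(subtract((2, 1, -1), (3, -4, 2))) # (-1, 5, -3)
```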

🧮 Proof methods: geometric vs analytic

Geometric proof: uses diagrams, translation, and properties from elementary geometry (e.g., showing v + w = w + v by drawing parallelograms).
Analytic proof: uses coordinate formulas and properties of real numbers (e.g., proving associativity by expanding coordinates and regrouping).
Both methods are valid; the excerpt illustrates both for the associative law.

🧭 Basis vectors and component form

🧭 What basis vectors are

Basis vectors: i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1) in R³; i = (1, 0), j = (0, 1) in R².

They are mutually perpendicular (lie on distinct coordinate axes).
They are all unit vectors: their magnitude is 1.
Every vector can be written uniquely as a scalar combination of them.

📝 Component form

Component form: writing v = (a, b, c) as v = a·i + b·j + c·k, where a, b, c are the i, j, k components.

Example: (2, 1, −1) = 2i + j − k.
Operations in component form:
- Scalar multiplication: c(v₁i + v₂j + v₃k) = cv₁i + cv₂j + cv₃k (the scalar is written c here to avoid confusion with the basis vector k).
- Addition: (v₁i + v₂j + v₃k) + (w₁i + w₂j + w₃k) = (v₁ + w₁)i + (v₂ + w₂)j + (v₃ + w₃)k.
- Magnitude: ‖v₁i + v₂j + v₃k‖ = √(v₁² + v₂² + v₃²).

🎯 Unit vectors and normalization

A unit vector is a vector with magnitude 1.

For any nonzero vector v, the vector v/‖v‖ is a unit vector pointing in the same direction as v.
Normalizing v means dividing it by its magnitude.
Example: if v = (2, 1, −1), then ‖v‖ = √(4 + 1 + 1) = √6, so v/‖v‖ = (2/√6, 1/√6, −1/√6).
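
Normalization is easy to get wrong for the zero vector, so a sketch may help. The function name normalize is an assumption of this example; raising an error for the zero vector is one reasonable design choice, since the zero vector has no direction.

```python
import math


def normalize(v):
    """Return the unit vector v / ||v|| for a nonzero vector v."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(c / length for c in v)


u = normalize((2, 1, -1))
print(u)                                  # (2/sqrt(6), 1/sqrt(6), -1/sqrt(6))
print(math.sqrt(sum(c * c for c in u)))   # magnitude is 1 up to rounding
```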

📏 Connection to distance

📏 Distance between two points

The distance d between points P = (x₁, y₁, z₁) and Q = (x₂, y₂, z₂) equals the length of the vector w − v, where v = (x₁, y₁, z₁) and w = (x₂, y₂, z₂).
Formula: d = ‖w − v‖ = √((x₂ − x₁)² + (y₂ − y₁)² + (z₂ − z₁)²).
This proves the distance formula from the previous section using vector algebra.

📋 Summary of vector algebra laws

The excerpt lists the basic laws (Theorem 1.5):

Law	Statement	Name
(a)	v + w = w + v	Commutative Law
(b)	u + (v + w) = (u + v) + w	Associative Law
(c)	v + 0 = v = 0 + v	Additive Identity
(d)	v + (−v) = 0	Additive Inverse
(e)	k(lv) = (kl)v	Associative Law (scalars)
(f)	k(v + w) = kv + kw	Distributive Law
(g)	(k + l)v = kv + lv	Distributive Law

These laws hold for all vectors u, v, w and all scalars k, l.
They can be proved either geometrically or analytically (using coordinates).

Dot Product

1.3 Dot Product

🧭 Overview

🧠 One-sentence thesis

The dot product connects the algebraic multiplication of vector components to the geometric angle between vectors, enabling us to determine perpendicularity and measure alignment.

📌 Key points

What the dot product is: a way to multiply two vectors that produces a scalar (not a vector) by summing products of corresponding components.
Geometric meaning: the dot product equals the product of the vectors' magnitudes times the cosine of the angle between them.
Perpendicularity test: two nonzero vectors are perpendicular if and only if their dot product is zero.
Common confusion: the dot product is not associative like ordinary multiplication—(u · v) · w is undefined because u · v is a scalar, not a vector.
Sign tells angle type: positive dot product means acute angle, negative means obtuse, zero means right angle.

🔢 Definition and basic computation

🔢 How to compute the dot product

Dot product: For vectors v = (v₁, v₂, v₃) and w = (w₁, w₂, w₃) in R³, the dot product v · w = v₁w₁ + v₂w₂ + v₃w₃. Similarly, in R², v · w = v₁w₁ + v₂w₂.

Multiply corresponding components, then add all the products together.
The result is always a scalar (a single number), not a vector.
Example: For v = (2, 1, −1) and w = (3, −4, 1), the dot product is (2)(3) + (1)(−4) + (−1)(1) = 6 − 4 − 1 = 1.
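
As a quick check of the definition, here is a minimal Python version. The function name dot is an illustrative choice; it works for R² and R³ alike because it only pairs up corresponding components.

```python
def dot(v, w):
    """Dot product: sum of products of corresponding components."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(v, w))


print(dot((2, 1, -1), (3, -4, 1)))  # 1
print(dot((1, 2), (3, 4)))          # 11
```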

⚠️ Why associativity fails

For ordinary numbers: (a × b) × c is defined because a × b is a number.
For dot products: (u · v) · w is not defined because u · v is a scalar, and you cannot take the dot product of a scalar with a vector.
Don't confuse: the dot product obeys commutativity (v · w = w · v) and distributivity, but not associativity.

📐 Geometric interpretation

📐 Angle between vectors

Angle between vectors: For nonzero vectors with the same initial point, the angle θ is the smallest nonnegative angle between them, where 0° ≤ θ ≤ 180°.

We always choose the smaller of the two possible angles.
The zero vector has no defined angle with any other vector.

📐 The fundamental relationship

The dot product connects algebra to geometry:

cos θ = (v · w) / (‖v‖ ‖w‖)

This formula lets you find the angle between vectors using only their components.
Proof sketch (from the excerpt): Apply the Law of Cosines to the triangle formed by v, w, and v − w. Expand ‖v − w‖² using components, simplify, and isolate cos θ.
Example: For v = (2, 1, −1) and w = (3, −4, 1), we have v · w = 1, ‖v‖ = √6, ‖w‖ = √26, so cos θ = 1/(√6 √26) ≈ 0.08, giving θ ≈ 85.41°.
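
The angle formula translates directly into code. The sketch below uses a hypothetical function angle_degrees; clamping the cosine to [-1, 1] is an added safeguard against floating-point rounding, not something the excerpt discusses.

```python
import math


def angle_degrees(v, w):
    """Angle between nonzero vectors v and w, in degrees (0 to 180)."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    if norm_v == 0 or norm_w == 0:
        raise ValueError("the angle with the zero vector is not defined")
    # Clamp to guard against values like 1.0000000002 caused by rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_v * norm_w)))
    return math.degrees(math.acos(cos_theta))


theta = angle_degrees((2, 1, -1), (3, -4, 1))
print(round(theta, 2))  # 85.41
```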

⊥ Perpendicularity and angle classification

⊥ Testing for perpendicular vectors

Corollary: Two nonzero vectors v and w are perpendicular if and only if v · w = 0.

Why: cos 90° = 0, so the geometric formula gives v · w = 0.
Notation: v ⊥ w means v and w are perpendicular.
Example: v = (−1, 5, −2) and w = (3, 1, 1) are perpendicular because v · w = (−1)(3) + (5)(1) + (−2)(1) = −3 + 5 − 2 = 0.

📊 Sign of the dot product

The sign tells you the type of angle:

Angle range	Dot product sign	Reason
0° ≤ θ < 90° (acute)	v · w > 0	cos θ > 0 in this range
θ = 90° (right)	v · w = 0	cos 90° = 0
90° < θ ≤ 180° (obtuse)	v · w < 0	cos θ < 0 in this range

The dot product acts as a quick classifier: you don't need to compute the angle explicitly to know if it's acute, right, or obtuse.
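
The sign test can be sketched as a small classifier. The function name classify_angle and the returned labels are assumptions of this example; with floating-point components, the exact comparison with zero would need a tolerance.

```python
def classify_angle(v, w):
    """Classify the angle between nonzero vectors by the sign of v . w."""
    d = sum(a * b for a, b in zip(v, w))
    if d > 0:
        return "acute"
    if d < 0:
        return "obtuse"
    return "right"


print(classify_angle((2, 1, -1), (3, -4, 1)))  # acute  (dot product is 1)
print(classify_angle((-1, 5, -2), (3, 1, 1)))  # right  (dot product is 0)
print(classify_angle((1, 0), (-1, 1)))         # obtuse (dot product is -1)
```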

🌐 Perpendicularity to a span

If u ⊥ v and u ⊥ w, then u ⊥ (kv + lw) for all scalars k, l.

Span: the collection of all scalar combinations kv + lw.
If v and w are nonzero and parallel, their span is a line; if not parallel, their span is a plane.
Interpretation: a vector perpendicular to two vectors is also perpendicular to every vector in their span.
Why: u · (kv + lw) = k(u · v) + l(u · w) = k(0) + l(0) = 0 by the distributive law.

🧮 Properties and inequalities

🧮 Algebraic properties (Theorem 1.9)

Property	Statement	Name
(a)	v · w = w · v	Commutative Law
(b)	(kv) · w = v · (kw) = k(v · w)	Associative Law (with scalars)
(c)	v · 0 = 0 = 0 · v	Zero property
(d)	u · (v + w) = u · v + u · w	Distributive Law
(e)	(u + v) · w = u · w + v · w	Distributive Law
(f)	|v · w| ≤ ‖v‖ ‖w‖	Cauchy-Schwarz Inequality

Properties (a)–(e) follow directly from the component definition.
Don't confuse: property (b) is about scalars multiplying vectors, not about associativity of dot products with other vectors.

📏 Cauchy-Schwarz Inequality

|v · w| ≤ ‖v‖ ‖w‖

This says the absolute value of the dot product never exceeds the product of the magnitudes.
Proof sketch (from the excerpt): For nonzero vectors, v · w = cos θ ‖v‖ ‖w‖, so |v · w| = |cos θ| ‖v‖ ‖w‖ ≤ ‖v‖ ‖w‖ because |cos θ| ≤ 1.
Equality holds when θ = 0° or 180°, i.e., when the vectors are parallel.

📏 Magnitude and Triangle Inequality (Theorem 1.10)

Key results:

(a) ‖v‖² = v · v (magnitude squared equals dot product with itself)
(b) ‖v + w‖ ≤ ‖v‖ + ‖w‖ (Triangle Inequality)
(c) ‖v − w‖ ≥ |‖v‖ − ‖w‖|

Triangle Inequality interpretation:

In any triangle, no side is longer than the sum of the other two sides.
Equivalently: "the shortest distance between two points is a straight line."
Proof sketch (from the excerpt): Expand ‖v + w‖² = (v + w) · (v + w) = ‖v‖² + 2(v · w) + ‖w‖². Use v · w ≤ |v · w| and the Cauchy-Schwarz Inequality to get ‖v + w‖² ≤ (‖v‖ + ‖w‖)², then take square roots.

🔍 Summary of key distinctions

🔍 Analytic vs geometric view

Analytic definition: v · w = v₁w₁ + v₂w₂ + v₃w₃ (uses coordinates).
Geometric formula: v · w = ‖v‖ ‖w‖ cos θ (uses magnitudes and angle).
Both are equivalent; the excerpt derives the geometric formula from the analytic definition using the Law of Cosines.

🔍 Dot product vs scalar multiplication

Scalar multiplication (kv): multiplies a vector by a number, produces a vector.
Dot product (v · w): multiplies two vectors, produces a scalar.
Don't confuse: (kv) · w is defined (scalar times vector, then dot product), but (u · v) · w is not (scalar cannot be dotted with a vector).

Cross Product

1.4 Cross Product

🧭 Overview

🧠 One-sentence thesis

The cross product provides a way to multiply two vectors in R³ to produce a new vector that is perpendicular to both original vectors, enabling geometric calculations of areas, volumes, and perpendicularity.

📌 Key points

What the cross product produces: Unlike the dot product (which yields a scalar), the cross product of two vectors in R³ yields another vector.
Perpendicularity property: The cross product v × w is perpendicular to both v and w (when nonzero), and its direction follows the right-hand rule.
Geometric applications: The magnitude ‖v × w‖ equals the area of the parallelogram formed by v and w; the scalar triple product u · (v × w) gives the volume of a parallelepiped.
Common confusion: The cross product is anticommutative (v × w = −w × v), not commutative like the dot product; also, v × w = 0 if and only if v and w are parallel.
Determinant representation: The cross product can be computed as a 3×3 determinant with i, j, k in the first row, making calculations more systematic.

🧮 Definition and basic computation

🧮 The cross product formula

Cross product: For vectors v = (v₁, v₂, v₃) and w = (w₁, w₂, w₃) in R³, the cross product v × w is the vector (v₂w₃ − v₃w₂, v₃w₁ − v₁w₃, v₁w₂ − v₂w₁).

This formula applies only in R³; the cross product is not defined in R² or other dimensions.
The result is always a vector, not a scalar.
Example: i × j = ((0)(0) − (0)(1), (0)(0) − (1)(0), (1)(1) − (0)(0)) = (0, 0, 1) = k.

🔄 Standard basis cross products

The excerpt shows that the standard basis vectors have simple cross products:

i × j = k, j × k = i, k × i = j (cyclic pattern).
j × i = −k, k × j = −i, i × k = −j (reverse order gives negatives).
i × i = j × j = k × k = 0 (any vector crossed with itself is zero).
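
The component formula and the basis-vector table can be verified with a short sketch. The function name cross and the tuple representation are illustrative assumptions.

```python
def cross(v, w):
    """Cross product of two vectors in R^3, straight from the definition."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (v2 * w3 - v3 * w2, v3 * w1 - v1 * w3, v1 * w2 - v2 * w1)


i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

print(cross(i, j))  # (0, 0, 1), which is k
print(cross(j, k))  # (1, 0, 0), which is i
print(cross(k, i))  # (0, 1, 0), which is j
print(cross(j, i))  # (0, 0, -1), which is -k
print(cross(i, i))  # (0, 0, 0)
```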

⊥ Perpendicularity and direction

⊥ Perpendicular to both input vectors

Theorem 1.11: If v × w is nonzero, then it is perpendicular to both v and w.

The proof shows (v × w) · v = 0 by expanding the dot product and observing all terms cancel.
Similarly, (v × w) · w = 0.
Corollary 1.12: v × w is perpendicular to the span of v and w (i.e., perpendicular to the plane containing v and w).

🖐️ Right-hand rule for direction

Since v × w is perpendicular to the plane of v and w, there are two possible directions (opposite to each other).
The excerpt states that v × w follows the right-hand rule: point your thumb in the direction of v × w while rotating v toward w with your remaining fingers.
The vectors v, w, v × w form a right-handed system.
Don't confuse: w × v points in the opposite direction (w × v = −(v × w)).

📏 Magnitude and geometric meaning

📏 Magnitude formula

For nonzero vectors v and w with angle θ between them:

‖v × w‖ = ‖v‖ ‖w‖ sin θ

The excerpt derives this by expanding ‖v × w‖² algebraically and relating it to ‖v‖² ‖w‖² − (v · w)².
Since v · w = ‖v‖ ‖w‖ cos θ, this simplifies to ‖v‖² ‖w‖² sin² θ.
Because 0° ≤ θ ≤ 180°, sin θ ≥ 0, so taking the square root gives the formula.

📐 Area of parallelograms and triangles

Theorem 1.13:

(a) The area of a triangle with adjacent sides v, w is A = ½ ‖v × w‖.
(b) The area of a parallelogram with adjacent sides v, w is A = ‖v × w‖.

Why this works:

A parallelogram with sides v and w has base ‖v‖ and height ‖w‖ sin θ.
Area = ‖v‖ · ‖w‖ sin θ = ‖v × w‖.
A triangle is half the parallelogram, so its area is ½ ‖v × w‖.

Example: To find the area of triangle PQR with vertices P = (2, 4, −7), Q = (3, 7, 18), R = (−5, 12, 8):

Let v = PQ = (1, 3, 25) and w = PR = (−7, 8, 15).
v × w = (−155, −190, 29).
Area = ½ √(60966) ≈ 123.46.

📦 Handling 2D problems

For a parallelogram in R², treat the vectors as lying in the xy-plane by adding a zero z-coordinate.
Example: v = (−3, −1) becomes (−3, −1, 0); w = (1, 2) becomes (1, 2, 0).
Then v × w = (0, 0, −5), so area = ‖(0, 0, −5)‖ = 5.
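
Both area computations above can be reproduced with a sketch like the following. The helper names triangle_area and parallelogram_area_2d are hypothetical; the 2D helper pads each vector with a zero z-coordinate, as described in this section.

```python
import math


def cross(v, w):
    """Cross product in R^3."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])


def triangle_area(p, q, r):
    """Area of triangle PQR: half the magnitude of PQ x PR."""
    v = tuple(b - a for a, b in zip(p, q))
    w = tuple(b - a for a, b in zip(p, r))
    n = cross(v, w)
    return 0.5 * math.sqrt(sum(c * c for c in n))


def parallelogram_area_2d(v, w):
    """Area of the parallelogram spanned by 2D vectors v and w."""
    n = cross((v[0], v[1], 0), (w[0], w[1], 0))
    return abs(n[2])  # only the z-component can be nonzero


area = triangle_area((2, 4, -7), (3, 7, 18), (-5, 12, 8))
print(round(area, 2))                           # 123.46
print(parallelogram_area_2d((-3, -1), (1, 2)))  # 5
```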

🔧 Algebraic properties

🔧 Key properties (Theorem 1.14)

Property	Formula	Name
(a)	v × w = −w × v	Anticommutative
(b)	u × (v + w) = u × v + u × w	Distributive
(c)	(u + v) × w = u × w + v × w	Distributive
(d)	(kv) × w = v × (kw) = k(v × w)	Associative with scalars
(e)	v × 0 = 0 = 0 × v	Zero vector
(f)	v × v = 0	Self-cross is zero
(g)	v × w = 0 ⟺ v ∥ w	Parallel condition

🔀 Anticommutativity (property a)

The proof shows v × w = −(w × v) by expanding both sides using the definition.
Geometrically: v × w and w × v have the same magnitude but opposite directions.
Don't confuse with the dot product, which is commutative (v · w = w · v).

∥ Parallel vectors (property g)

If v or w is zero, then v × w = 0 and they are trivially parallel.
If both are nonzero: v × w = 0 ⟺ ‖v‖ ‖w‖ sin θ = 0 ⟺ sin θ = 0 ⟺ θ = 0° or 180° ⟺ v ∥ w.

📊 Determinant representation

📊 2×2 determinants

A 2×2 determinant is defined as: |a b; c d| = ad − bc.

Think of it as: product of downward diagonal minus product of upward diagonal.
Example: |1 2; 3 4| = (1)(4) − (2)(3) = 4 − 6 = −2.

📊 3×3 determinants

A 3×3 determinant is computed by expanding along the first row: |a₁ a₂ a₃; b₁ b₂ b₃; c₁ c₂ c₃| = a₁|b₂ b₃; c₂ c₃| − a₂|b₁ b₃; c₁ c₃| + a₃|b₁ b₂; c₁ c₂|

Multiply each scalar in the first row by the 2×2 determinant obtained by removing that scalar's row and column.
Alternate signs: +, −, +.
Example: |1 0 2; 4 −1 3; 1 0 2| = 1(−2 − 0) − 0(8 − 3) + 2(0 + 1) = −2 + 2 = 0.
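
First-row expansion is mechanical enough to write down as code. In this sketch, det2 and det3 are hypothetical helper names, and a 3×3 determinant is passed as a tuple of three rows.

```python
def det2(a, b, c, d):
    """2x2 determinant |a b; c d| = ad - bc."""
    return a * d - b * c


def det3(rows):
    """3x3 determinant by expansion along the first row (signs +, -, +)."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    return (a1 * det2(b2, b3, c2, c3)
            - a2 * det2(b1, b3, c1, c3)
            + a3 * det2(b1, b2, c1, c2))


print(det2(1, 2, 3, 4))                           # -2
print(det3(((1, 0, 2), (4, -1, 3), (1, 0, 2))))   # 0
```

The second determinant is zero because its first and third rows are identical.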

🧮 Cross product as a determinant

For v = v₁i + v₂j + v₃k and w = w₁i + w₂j + w₃k:

v × w = |i j k; v₁ v₂ v₃; w₁ w₂ w₃|

The first row contains vectors (i, j, k), not scalars, but the expansion formula still applies.
Expanding: v × w = |v₂ v₃; w₂ w₃|i − |v₁ v₃; w₁ w₃|j + |v₁ v₂; w₁ w₂|k.
This matches the original definition: (v₂w₃ − v₃w₂)i + (v₃w₁ − v₁w₃)j + (v₁w₂ − v₂w₁)k.

Example: For v = 4i − j + 3k and w = i + 2k:

v × w = |i j k; 4 −1 3; 1 0 2| = |−1 3; 0 2|i − |4 3; 1 2|j + |4 −1; 1 0|k = −2i − 5j + k.

🧊 Triple products and volume

🧊 Scalar triple product

Scalar triple product: u · (v × w) gives the volume of the parallelepiped with adjacent sides u, v, w (when they form a right-handed system).

Example 1.12 (parallelepiped volume):

The base parallelogram (formed by v and w) has area ‖v × w‖.
The height is ‖u‖ cos θ, where θ is the angle between u and v × w.
Volume = (area)(height) = ‖v × w‖ · ‖u‖ cos θ = u · (v × w).

🔄 Cyclic symmetry (formula 1.12)

For any vectors u, v, w in R³:

u · (v × w) = w · (u × v) = v · (w × u)

The volume is the same regardless of which face is chosen as the base.
Theorem 1.15: For any three adjacent sides u, v, w of a parallelepiped, the volume is |u · (v × w)|.
The absolute value accounts for the possibility that the vectors form a left-handed system (which gives a negative scalar triple product).

🧮 Scalar triple product as a determinant

Theorem 1.17:

u · (v × w) = |u₁ u₂ u₃; v₁ v₂ v₃; w₁ w₂ w₃|

This gives a geometric interpretation of the 3×3 determinant: up to sign, it is the volume of the parallelepiped whose adjacent sides are the row vectors, and the sign is positive when the rows form a right-handed system.

Example: For u = (2, 1, 3), v = (−1, 3, 2), w = (1, 1, −2):

u · (v × w) = |2 1 3; −1 3 2; 1 1 −2| = 2|3 2; 1 −2| − 1|−1 2; 1 −2| + 3|−1 3; 1 1| = 2(−8) − 1(0) + 3(−4) = −28.
Volume = |−28| = 28.
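
The example and the cyclic symmetry can both be checked with a short sketch. The function names cross, dot, and scalar_triple are illustrative assumptions.

```python
def cross(v, w):
    """Cross product in R^3."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])


def dot(v, w):
    """Dot product."""
    return sum(a * b for a, b in zip(v, w))


def scalar_triple(u, v, w):
    """Scalar triple product u . (v x w)."""
    return dot(u, cross(v, w))


u, v, w = (2, 1, 3), (-1, 3, 2), (1, 1, -2)
triple = scalar_triple(u, v, w)
volume = abs(triple)

print(triple)                                          # -28
print(volume)                                          # 28
print(scalar_triple(w, u, v), scalar_triple(v, w, u))  # -28 -28
```

The negative sign of the triple product indicates that u, v, w, taken in this order, form a left-handed system.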

🔀 Vector triple product

Vector triple product: u × (v × w) = (u · w)v − (u · v)w (Theorem 1.16).

Geometric interpretation:

The right side shows u × (v × w) is a scalar combination of v and w, so it lies in the plane containing v and w.
This makes sense because u × (v × w) is perpendicular to both u and v × w.
Being perpendicular to v × w means lying in the plane of v and w (since that plane is perpendicular to v × w).

Example: For u = (1, 2, 4), v = (2, 2, 0), w = (1, 3, 0):

u · v = 6, u · w = 7.
u × (v × w) = 7(2, 2, 0) − 6(1, 3, 0) = (14, 14, 0) − (6, 18, 0) = (8, −4, 0).
Note: v and w lie in the xy-plane, and u × (v × w) also lies in that plane; it is perpendicular to both u and v × w = (0, 0, 4).
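
The identity can be spot-checked by computing both sides independently. The sketch below uses hypothetical helpers cross and dot; agreement on one example is a sanity check, not a proof.

```python
def cross(v, w):
    """Cross product in R^3."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])


def dot(v, w):
    """Dot product."""
    return sum(a * b for a, b in zip(v, w))


u, v, w = (1, 2, 4), (2, 2, 0), (1, 3, 0)

# Left side: nested cross products.
lhs = cross(u, cross(v, w))
# Right side: (u . w) v - (u . v) w, computed component by component.
rhs = tuple(dot(u, w) * a - dot(u, v) * b for a, b in zip(v, w))

print(lhs)          # (8, -4, 0)
print(rhs)          # (8, -4, 0)
print(dot(lhs, u))  # 0, so the result is perpendicular to u
```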

🔗 Identity for double cross products

Example 1.18 proves:

(u × v) · (w × z) = |u · w u · z; v · w v · z|

The proof uses formula (1.12) to rewrite (u × v) · (w × z) as w · (z × (u × v)).
Then applies Theorem 1.16 to expand z × (u × v) = (z · v)u − (z · u)v.
The result simplifies to (u · w)(v · z) − (u · z)(v · w), which is the 2×2 determinant.

Lines and Planes

1.5 Lines and Planes

🧭 Overview

🧠 One-sentence thesis

Using vector notation simplifies the description of lines and planes in 3-dimensional space, enabling straightforward calculations of distances, intersections, and geometric relationships.

📌 Key points

Lines in R³: can be represented in three equivalent forms—vector, parametric, and symmetric—each built from a point and a direction vector.
Planes in R³: determined by a point and a normal vector (perpendicular to the plane), or by three noncollinear points.
Distance formulas: the distance from a point to a line uses the cross product; the distance from a point to a plane uses the dot product and the plane's normal form.
Common confusion: skew lines vs parallel lines—skew lines do not intersect and are not parallel, but they lie on separate parallel planes.
Intersection of planes: two non-parallel planes intersect in a line whose direction is the cross product of the two normal vectors.

📐 Representing lines in R³

📐 Line through a point, parallel to a vector

Vector representation of a line: For a point P = (x₀, y₀, z₀) and nonzero vector v in R³, the line L through P parallel to v is given by r + tv, for −∞ < t < ∞, where r = (x₀, y₀, z₀).

Why this works: Multiplying v by a scalar t stretches or shrinks v (and reverses direction if t < 0), so r + tv sweeps out all points on the line as t varies.
When t = 0, you get the point P itself.
The vector v is called the direction vector of the line.

📝 Parametric representation

Parametric form: For a point P = (x₀, y₀, z₀) and nonzero vector v = (a, b, c), the line L consists of all points (x, y, z) given by
x = x₀ + at, y = y₀ + bt, z = z₀ + ct, for −∞ < t < ∞.

This form gives the coordinates of points on L directly, not vectors.
Example: Line through P = (2, 3, 5) parallel to v = (4, −1, 6) is
x = 2 + 4t, y = 3 − t, z = 5 + 6t.

🔀 Symmetric representation

Symmetric form: For a point P = (x₀, y₀, z₀) and vector v = (a, b, c) with a, b, c all nonzero, the line L consists of all points (x, y, z) satisfying
(x − x₀)/a = (y − y₀)/b = (z − z₀)/c.

How to derive it: Solve each parametric equation for t, then set the three expressions equal.
Special case: If, say, a = 0, then x = x₀ and the symmetric form becomes
x = x₀, (y − y₀)/b = (z − z₀)/c.
This means the line lies in the plane x = x₀ (parallel to the yz-plane).

🔗 Line through two points

Line through P₁ and P₂: The vector r₂ − r₁ points from P₁ to P₂, so the line is r₁ + t(r₂ − r₁), for −∞ < t < ∞.

Parametric form:
x = x₁ + (x₂ − x₁)t, y = y₁ + (y₂ − y₁)t, z = z₁ + (z₂ − z₁)t.
Example: Line through P₁ = (−3, 1, −4) and P₂ = (4, 4, −6) is
x = −3 + 7t, y = 1 + 3t, z = −4 − 2t.
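
The two-point construction can be sketched as a function that returns the direction vector together with a point-at-parameter function. The names line_through and point_at are assumptions made for this illustration.

```python
def line_through(p1, p2):
    """Return (direction, point_at) for the line through distinct points p1, p2."""
    direction = tuple(b - a for a, b in zip(p1, p2))
    if all(c == 0 for c in direction):
        raise ValueError("the two points must be distinct")

    def point_at(t):
        # Parametric form: each coordinate is x1 + (x2 - x1) * t.
        return tuple(a + d * t for a, d in zip(p1, direction))

    return direction, point_at


direction, point_at = line_through((-3, 1, -4), (4, 4, -6))
print(direction)    # (7, 3, -2)
print(point_at(0))  # (-3, 1, -4), the point P1
print(point_at(1))  # (4, 4, -6), the point P2
```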

📏 Distance from a point to a line

📏 Distance formula

Distance from point P to line L: If L is given by r + tv and Q is a point on L, then the distance d from P to L is
d = ‖v × w‖ / ‖v‖,
where w is the vector from Q to P.

Why this works: If θ is the angle between w and v, then d = ‖w‖ sin θ. Since ‖v × w‖ = ‖v‖ ‖w‖ sin θ, dividing by ‖v‖ gives d.
Example: Distance from P = (1, 1, 1) to the line through Q = (−3, 1, −4) with direction v = (7, 3, −2):
w = (1, 1, 1) − (−3, 1, −4) = (4, 0, 5),
v × w = (15, −43, −12),
d = √(15² + 43² + 12²) / √(7² + 3² + 2²) = √2218 / √62 ≈ 5.98.
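
A minimal sketch of the point-to-line distance formula follows. The function name point_line_distance and its argument order (point p, point q on the line, direction v) are choices made for this example.

```python
import math


def cross(v, w):
    """Cross product in R^3."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])


def norm(v):
    """Magnitude of a vector."""
    return math.sqrt(sum(c * c for c in v))


def point_line_distance(p, q, v):
    """Distance from p to the line through q with nonzero direction v."""
    if norm(v) == 0:
        raise ValueError("the direction vector must be nonzero")
    w = tuple(a - b for a, b in zip(p, q))  # vector from q to p
    return norm(cross(v, w)) / norm(v)


d = point_line_distance((1, 1, 1), (-3, 1, -4), (7, 3, -2))
print(round(d, 2))  # 5.98
```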

🔄 Parallel and perpendicular lines

Parallel lines: L₁ ∥ L₂ if their direction vectors v₁ and v₂ are parallel.
Perpendicular lines: L₁ ⊥ L₂ if v₁ ⊥ v₂ (i.e., v₁ · v₂ = 0).

🌀 Skew lines

Don't confuse: In R³, two lines can be skew—they do not intersect and are not parallel.
Skew lines lie on separate, parallel planes.
How to check for intersection: Write the two lines in parametric form with different parameters (s and t), set the (x, y, z) triples equal, and solve the resulting system of 3 equations in 2 unknowns. If no solution exists, the lines do not intersect: they are parallel if their direction vectors are parallel, and skew otherwise.
Example: The excerpt shows two lines intersecting at (−1, 2, 1) by solving for s = 0 and t = 2, then verifying all three coordinate equations are satisfied.
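
The intersection check can be sketched as follows. The notes do not state the two lines used in the excerpt's example, so the lines below are assumed for illustration: they were chosen so that the solution is s = 0, t = 2 with intersection point (−1, 2, 1). The function name intersect_lines is hypothetical, and exact fractions are used so that the consistency check on the third equation is not affected by rounding.

```python
from fractions import Fraction


def intersect_lines(p1, v1, p2, v2):
    """Intersection point of p1 + s*v1 and p2 + t*v2, or None if there is
    no single intersection point (skew, parallel, or identical lines)."""
    d = [Fraction(b) - Fraction(a) for a, b in zip(p1, p2)]
    # Solve two of the three coordinate equations, then verify the third.
    for a, b, c in ((0, 1, 2), (0, 2, 1), (1, 2, 0)):
        det = v2[a] * v1[b] - v1[a] * v2[b]
        if det == 0:
            continue  # this pair of equations cannot be solved uniquely
        s = (v2[a] * d[b] - d[a] * v2[b]) / det
        t = (v1[a] * d[b] - d[a] * v1[b]) / det
        if p1[c] + s * v1[c] != p2[c] + t * v2[c]:
            return None  # third equation fails: the lines are skew
        return tuple(p + s * v for p, v in zip(p1, v1))
    return None  # direction vectors are parallel


# Assumed example lines (not taken from the notes).
point = intersect_lines((-1, 2, 1), (3, 2, -1), (-3, 8, -3), (1, -3, 2))
print(tuple(int(c) for c in point))  # (-1, 2, 1)

# A skew pair: the x-axis and a line parallel to the y-axis at height z = 1.
print(intersect_lines((0, 0, 0), (1, 0, 0), (0, 0, 1), (0, 1, 0)))  # None
```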

🛫 Representing planes in R³

🛫 Plane through a point, perpendicular to a vector

Point-normal form of a plane: Let P be a plane containing point (x₀, y₀, z₀) and let n = (a, b, c) be a nonzero vector perpendicular to P (called a normal vector). Then P consists of all points (x, y, z) satisfying
n · r = 0,
where r = (x − x₀, y − y₀, z − z₀), or equivalently
a(x − x₀) + b(y − y₀) + c(z − z₀) = 0.

Why this works: Any point (x, y, z) in the plane forms a vector r from (x₀, y₀, z₀) that lies in the plane, so r ⊥ n, hence n · r = 0.
Example: Plane through (−3, 1, 3) perpendicular to n = (2, 4, 8) is
2(x + 3) + 4(y − 1) + 8(z − 3) = 0.

📋 Normal form

Normal form of a plane: ax + by + cz + d = 0.

Obtained by expanding the point-normal form and combining constants.
Example: The plane 2(x + 3) + 4(y − 1) + 8(z − 3) = 0 becomes
2x + 4y + 8z − 22 = 0.

🔺 Plane through three noncollinear points

Why three points: Two points determine a line, not a plane. Three collinear points also do not determine a unique plane (infinitely many planes contain that line). Three noncollinear points Q, R, S determine exactly one plane.
How to find the plane: Compute vectors QR and QS. Their cross product n = QR × QS is perpendicular to both, so it is a normal vector for the plane. Use the point-normal form with point Q (or R or S).
Example: Plane through (2, 1, 3), (1, −1, 2), (3, 2, 1):
QR = (−1, −2, −1), QS = (1, 1, −2),
n = QR × QS = (5, −3, 1),
Plane: 5(x − 2) − 3(y − 1) + (z − 3) = 0, or 5x − 3y + z − 10 = 0.
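
The three-point method translates directly into a few lines of Python. This is a minimal sketch; the function name `plane_through` and the choice to return the coefficients (a, b, c, d) of ax + by + cz + d = 0 are assumptions made for illustration.

```python
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_through(q, r, s):
    """Coefficients (a, b, c, d) of the plane through points q, r, s."""
    n = cross(sub(r, q), sub(s, q))   # normal vector QR x QS
    if n == (0, 0, 0):
        raise ValueError("points are collinear; no unique plane")
    d = -dot(n, q)                    # expand n . (x - q) = 0
    return (*n, d)

print(plane_through((2, 1, 3), (1, -1, 2), (3, 2, 1)))  # (5, -3, 1, -10)
```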

🔗 Plane containing two lines

When two lines determine a plane: If two lines intersect or are parallel (but not identical), they determine a unique plane.
Don't confuse: Skew lines do not determine a plane—they lie on separate, parallel planes.
How to find the plane: Pick three noncollinear points from the two lines (one from one line, two from the other), then use the method for three points.

📏 Distance from a point to a plane

📏 Distance formula

Distance from point Q to plane P: Let Q = (x₀, y₀, z₀) and let P have normal form ax + by + cz + d = 0. Then the distance D from Q to P is
D = |ax₀ + by₀ + cz₀ + d| / √(a² + b² + c²).

Why this works: Let R = (x, y, z) be any point in P and r = vector from R to Q. The normal vector n = (a, b, c) is perpendicular to P. The distance is the projection of r onto n, which is |n · r| / ‖n‖. Substituting and using the fact that ax + by + cz + d = 0 for points in P gives the formula.
Example: Distance from (2, 4, −5) to plane 5x − 3y + z − 10 = 0 is
D = |5(2) − 3(4) + 1(−5) − 10| / √(5² + 3² + 1²) = |−17| / √35 = 17/√35 ≈ 2.87.
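
As a quick numerical check of the distance formula, here is a short Python sketch. The function name `point_plane_distance` and the tuple layout (a, b, c, d) for the plane are illustrative assumptions.

```python
import math

def point_plane_distance(q, plane):
    """Distance from q = (x0, y0, z0) to the plane ax + by + cz + d = 0."""
    a, b, c, d = plane
    x0, y0, z0 = q
    return abs(a * x0 + b * y0 + c * z0 + d) / math.sqrt(a * a + b * b + c * c)

D = point_plane_distance((2, 4, -5), (5, -3, 1, -10))
print(round(D, 2))  # 2.87
```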

🔀 Intersection of two planes

🔀 Parallel and perpendicular planes

Parallel planes: Two planes are parallel if their normal vectors are parallel.
Perpendicular planes: Two planes are perpendicular if their normal vectors are perpendicular (dot product = 0).

🔗 Line of intersection

Line of intersection of two planes: If planes P₁ and P₂ with normal vectors n₁ and n₂ intersect in a line L, then L is parallel to n₁ × n₂, so
L: r + t(n₁ × n₂), for −∞ < t < ∞,
where r is any vector pointing to a point in both planes.

Why this works: n₁ × n₂ is perpendicular to both n₁ and n₂, so it is parallel to both planes, hence parallel to their intersection line L.
How to find a point in both planes: Solve the two normal form equations simultaneously. Often easier by setting one coordinate to zero, reducing to two equations in two unknowns.
Example: Intersection of 5x − 3y + z − 10 = 0 and 2x + 4y − z + 3 = 0:
n₁ = (5, −3, 1), n₂ = (2, 4, −1),
Set x = 0: −3y + z − 10 = 0 and 4y − z + 3 = 0 → y = 7, z = 31,
Point: (0, 7, 31),
n₁ × n₂ = (−1, 7, 26),
Line: (0, 7, 31) + t(−1, 7, 26), or x = −t, y = 7 + 7t, z = 31 + 26t.
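
The steps of this example can be sketched in Python as follows. The function name `plane_intersection_line` is an assumption, and the sketch only implements the "set x = 0" shortcut used above; when that shortcut fails (for instance, when the line of intersection never crosses the plane x = 0), it raises an error instead of trying another coordinate.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_intersection_line(p1, p2):
    """Return (point, direction) for the line where two planes meet.

    Each plane is a tuple (a, b, c, d) meaning ax + by + cz + d = 0.
    """
    a1, b1, c1, d1 = p1
    a2, b2, c2, d2 = p2
    direction = cross((a1, b1, c1), (a2, b2, c2))   # n1 x n2
    # With x = 0 the system is b1*y + c1*z = -d1, b2*y + c2*z = -d2.
    det = b1 * c2 - c1 * b2
    if det == 0:
        raise ValueError("no unique point with x = 0; try y = 0 or z = 0")
    y = (-d1 * c2 + c1 * d2) / det    # Cramer's rule
    z = (-b1 * d2 + b2 * d1) / det
    return (0.0, y, z), direction

point, direction = plane_intersection_line((5, -3, 1, -10), (2, 4, -1, 3))
print(point, direction)  # (0.0, 7.0, 31.0) (-1, 7, 26)
```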

Relationship	Condition	Result
Parallel planes	n₁ ∥ n₂	No intersection (or identical)
Perpendicular planes	n₁ · n₂ = 0	Intersect at right angle
General intersection	n₁ not parallel to n₂	Intersect in a line

Surfaces

1.6 Surfaces

🧭 Overview

🧠 One-sentence thesis

Surfaces in three-dimensional space are solution sets of equations F(x, y, z) = 0, and understanding their geometric properties—especially spheres, cylinders, and quadric surfaces—enables us to visualize and solve intersection problems in R³.

📌 Key points (3–5)

What a surface is: the solution set of an equation F(x, y, z) = 0 in R³; surfaces are 2-dimensional.
Spheres and cylinders: the two most important non-planar surfaces; spheres are defined by fixed distance from a center, cylinders by moving a line along a circle.
Quadric surfaces: six main types (ellipsoid, hyperboloid of one/two sheets, elliptic/hyperbolic paraboloid, elliptic cone) arising from second-degree equations.
Common confusion: traces vs. the surface itself—a trace is the intersection of a surface with a plane, revealing the surface's shape slice-by-slice.
Intersection behavior: spheres intersect planes in circles or points; cylinders intersect planes in circles, ellipses, or lines depending on orientation; two spheres intersect in a circle or a point.

🌐 Spheres

🔵 Definition and equation

Sphere S: the set of all points (x, y, z) in R³ at a fixed distance r (the radius) from a fixed point P₀ = (x₀, y₀, z₀) (the center).

Coordinate form: (x − x₀)² + (y − y₀)² + (z − z₀)² = r²
Vector form: S = { x : ‖x − x₀‖ = r }, where x = (x, y, z) and x₀ = (x₀, y₀, z₀)
The vector notation emphasizes that every point on the sphere is exactly distance r from the center.

🔍 Recognizing sphere equations

Multiplying out the standard form gives: x² + y² + z² + ax + by + cz + d = 0
Conversely, an equation of this form may describe a sphere; determine by completing the square for x, y, and z.
Example: 2x² + 2y² + 2z² − 8x + 4y − 16z + 10 = 0
- Divide by 2: x² + y² + z² − 4x + 2y − 8z + 5 = 0
- Complete the square: (x − 2)² + (y + 1)² + (z − 4)² = 16
- Result: sphere with radius 4 centered at (2, −1, 4).
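
Completing the square can be automated. The sketch below is a hedged illustration: it assumes the equation has the form A(x² + y² + z²) + ax + by + cz + d = 0 with A ≠ 0, and the function name `sphere_from_general` is hypothetical.

```python
import math

def sphere_from_general(A, a, b, c, d):
    """Center and radius for A(x^2 + y^2 + z^2) + ax + by + cz + d = 0."""
    a, b, c, d = a / A, b / A, c / A, d / A      # divide through by A
    center = (-a / 2, -b / 2, -c / 2)            # from completing each square
    r_squared = (a * a + b * b + c * c) / 4 - d
    if r_squared < 0:
        raise ValueError("no real points satisfy the equation")
    return center, math.sqrt(r_squared)

print(sphere_from_general(2, -8, 4, -16, 10))  # ((2.0, -1.0, 4.0), 4.0)
```

A result with radius 0 corresponds to a single point rather than a sphere.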

✂️ Sphere intersections

With planes:

A plane that meets a sphere intersects it in either a single point (when the plane is tangent) or a circle.
Example: sphere x² + y² + z² = 169 (radius 13, center at origin) intersects plane z = 12.
- Substitute z = 12: x² + y² + 144 = 169 → x² + y² = 25
- Result: circle of radius 5 centered at (0, 0, 12), parallel to the xy-plane.

With lines:

Substitute the line's parametric equations into the sphere equation and solve for the parameter t.
Example: sphere (x − 2)² + (y + 1)² + (z − 4)² = 16 and line x = 3 + t, y = 1 + 2t, z = 3 − t.
- Substitute: (3 + t − 2)² + (1 + 2t + 1)² + (3 − t − 4)² = 16
- Simplify: 6t² + 12t − 10 = 0
- Solve: t = −1 ± 4/√6 = −1 ± (2√6)/3, so t ≈ 0.63 or t ≈ −2.63; two intersection points result.
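
The substitution step can be sketched in Python. This is an illustrative sketch, not the source's method verbatim: it writes the line as p + tv and the sphere by its center and radius, which gives the quadratic (v · v)t² + 2(u · v)t + (u · u − r²) = 0 with u = p − center. The function name `line_sphere_intersections` is an assumption.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def line_sphere_intersections(p, v, center, radius):
    """List of (t, point) pairs where the line p + t*v meets the sphere."""
    u = sub(p, center)
    a = dot(v, v)
    b = 2 * dot(u, v)
    c = dot(u, u) - radius ** 2
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                                  # the line misses the sphere
    root = math.sqrt(disc)
    ts = sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})
    return [(t, tuple(pi + t * vi for pi, vi in zip(p, v))) for t in ts]

hits = line_sphere_intersections((3, 1, 3), (1, 2, -1), (2, -1, 4), 4)
for t, point in hits:
    print(round(t, 4), tuple(round(x, 4) for x in point))
```

For the example above the coefficients are a = 6, b = 12, c = −10, matching 6t² + 12t − 10 = 0.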

With other spheres:

Two distinct spheres that meet intersect in either a single point (when tangent) or a circle.
Example: x² + y² + z² = 25 and x² + y² + (z − 2)² = 16.
- From first: x² + y² = 25 − z²
- From second: x² + y² = 16 − (z − 2)²
- Equate: 16 − (z − 2)² = 25 − z² → 4z − 4 = 9 → z = 13/4
- Then x² + y² = 231/16
- Result: circle of radius √231/4 ≈ 3.8 centered at (0, 0, 13/4).

🛢️ Cylinders

🔷 Right circular cylinders

Obtained by moving a line L along a circle C in R³ so that L is always perpendicular to the plane containing C.
We consider only cases where C lies in a plane parallel to a coordinate plane.

📐 Cylinder equations

Base circle in the xy-plane:

Equation: (x − a)² + (y − b)² = r², where z is unrestricted.
The circle is centered at (a, b, 0) with radius r; the cylinder extends infinitely in the z-direction.

Other orientations:

Base in xz-plane: (x − a)² + (z − c)² = r², y unrestricted.
Base in yz-plane: (y − b)² + (z − c)² = r², x unrestricted.

✂️ Cylinder intersections with planes

Plane orientation	Trace (intersection)
Parallel to base circle	Circle
Oblique (at an angle between 0° and 90°)	Ellipse
Perpendicular to base circle	One or two lines

Trace: the intersection of a surface with a plane.

🎲 Quadric surfaces

🧮 Second-degree equations

Second-degree equation in R³: Ax² + By² + Cz² + Dxy + Exz + Fyz + Gx + Hy + Iz + J = 0

If this equation does not describe a sphere, cylinder, plane, line, or point, the surface is called a quadric surface.
Every quadric surface can be translated and/or rotated to match one of six standard types.

🥚 Ellipsoid

Equation: x²/a² + y²/b² + z²/c² = 1
When a = b = c, this is a sphere; otherwise, it is egg-shaped (like an ellipse rotated around its major axis).
Traces in coordinate planes: all ellipses.

⏳ Hyperboloid of one sheet

Equation: x²/a² + y²/b² − z²/c² = 1
Traces:
- Planes parallel to xy-plane: ellipses.
- Planes parallel to xz- or yz-planes: hyperbolas (except special cases x = ±a and y = ±b, which give pairs of intersecting lines).
Ruled surface: every point lies on a line entirely on the surface; in fact, it is doubly ruled (two lines through each point).

⏳⏳ Hyperboloid of two sheets

Equation: x²/a² − y²/b² − z²/c² = 1
Traces:
- Planes parallel to xy- or xz-plane: hyperbolas.
- No trace in the yz-plane itself.
- Planes parallel to yz-plane where |x| > |a|: ellipses.

🍽️ Elliptic paraboloid

Equation: x²/a² + y²/b² = z/c
Traces:
- Planes parallel to xy-plane: ellipses (single point in the xy-plane itself).
- Planes parallel to xz- or yz-planes: parabolas.
When c > 0, the surface opens upward; when c < 0, it opens downward.
Paraboloid of revolution: when a = b; used as reflecting surfaces (e.g., vehicle headlights).

🐴 Hyperbolic paraboloid

Equation: x²/a² − y²/b² = z/c
Traces:
- Planes parallel to xz-plane: parabolas pointing upward (when c > 0).
- Planes parallel to yz-plane: parabolas pointing downward (when c > 0).
- Planes parallel to xy-plane: hyperbolas (except in the xy-plane itself, where the trace is a pair of intersecting lines through the origin).
When c < 0, the surface is similar but rotated 90° around the z-axis, and the nature of traces reverses.
Doubly ruled surface: two lines through each point on the surface.
Example: z = y² − x² (special case a = b = 1, c = −1).

🔺 Elliptic cone

Equation: x²/a² + y²/b² − z²/c² = 0
Traces:
- Planes parallel to xy-plane: ellipses (single point in the xy-plane itself).
- Planes parallel to xz- or yz-planes: hyperbolas (except in the xz- and yz-planes themselves, where traces are pairs of intersecting lines).
Ruled surface: every point lies on a line through the origin that lies entirely on the surface.

🔄 Transformations and mixed terms

🔀 Equations with mixed variables

An equation like z = 2xy has a "mixed" term (xy), so it does not immediately match the six standard types.
By rotating the x- and y-axes by 45° using the transformation x = (x′ − y′)/√2, y = (x′ + y′)/√2, z = z′, the equation becomes z′ = (x′)² − (y′)², which is a hyperbolic paraboloid.
General principle: every quadric surface can be translated and/or rotated to match one of the six standard forms.
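
The 45° rotation can be spot-checked numerically. The following Python sketch is only a sanity check at a few arbitrarily chosen sample points, not a proof; the function name `rotate_45` is an assumption.

```python
import math

def rotate_45(xp, yp):
    """Map rotated coordinates (x', y') back to the original (x, y)."""
    s = math.sqrt(2)
    return (xp - yp) / s, (xp + yp) / s

# Arbitrary sample points (x', y') used only for the check.
for xp, yp in [(1.0, 0.0), (0.5, -2.0), (3.0, 1.5)]:
    x, y = rotate_45(xp, yp)
    # z = 2xy in the old coordinates equals (x')^2 - (y')^2 in the new ones.
    assert math.isclose(2 * x * y, xp ** 2 - yp ** 2)
print("z = 2xy matches (x')^2 - (y')^2 at all sample points")
```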

🔁 Regulus and doubly ruled surfaces

Regulus: a pair of lines through each point on a doubly ruled surface.

The hyperboloid of one sheet and the hyperbolic paraboloid are both doubly ruled.
Don't confuse: a singly ruled surface (like the cylinder or elliptic cone) has one line through each point; a doubly ruled surface has two lines through each point.

Curvilinear Coordinates

1.7 Curvilinear Coordinates

🧭 Overview

🧠 One-sentence thesis

Cylindrical and spherical coordinate systems provide alternative ways to locate points in three-dimensional space by using curved paths instead of straight rectangular axes, often simplifying equations when there is symmetry around an axis or the origin.

📌 Key points (3–5)

Why curvilinear coordinates exist: Cartesian coordinates use straight paths along axes; curvilinear systems use curved paths, referencing points on cylinders or spheres instead of rectangular boxes.
Two main types: cylindrical coordinates (useful for symmetry around the z-axis) and spherical coordinates (useful for symmetry about the origin).
How they relate to Cartesian: both systems have conversion formulas linking (x, y, z) to (r, θ, z) or (ρ, θ, φ).
Common confusion: the same surface can be simpler in one coordinate system but more complex in another—spherical coordinates do not always simplify sphere equations if the sphere is not centered at the origin.
Coordinate surfaces: holding one coordinate constant produces simple geometric surfaces (cylinders, planes, spheres, cones, half-planes).

🔄 From Cartesian to curvilinear paths

🔄 How Cartesian coordinates work

Cartesian coordinates (x, y, z) are determined by following straight paths starting from the origin:
- First along the x-axis
- Then parallel to the y-axis
- Then parallel to the z-axis
This builds a rectangular parallelepiped (box) framework.

🌀 What makes coordinates "curvilinear"

In curvilinear coordinate systems, the paths to locate a point can be curved.

Instead of referencing a point by the sides of a rectangular box, we think of the point as lying on a cylinder or sphere.
The two types covered are cylindrical and spherical coordinates.

🛢️ Cylindrical coordinates

🛢️ Definition and conversion formulas

Cylindrical coordinates (r, θ, z): the point P(x, y, z) is described by the radius r in the xy-plane, the angle θ, and the height z.

From Cartesian to cylindrical:

r = √(x² + y²)
θ = tan⁻¹(y/x), with 0 ≤ θ ≤ π if y ≥ 0 and π < θ < 2π if y < 0
z = z

From cylindrical to Cartesian:

x = r cos θ
y = r sin θ
z = z

Restrictions:

r ≥ 0
0 ≤ θ < 2π (measured in radians)
θ is undefined when (x, y) = (0, 0)

🎯 When to use cylindrical coordinates

Cylindrical coordinates are often used when there is symmetry around the z-axis.
Example: The cylinder equation x² + y² = 4 becomes simply r = 2 in cylindrical coordinates, which is much simpler.

🧱 Cylindrical coordinate surfaces

Holding one coordinate constant produces:

Constant coordinate	Surface description
r = r₀	Cylinder of radius r₀ centered along the z-axis
θ = θ₀	Half-plane emanating from the z-axis
z = z₀	Plane parallel to the xy-plane

🌐 Spherical coordinates

🌐 Definition and conversion formulas

Spherical coordinates (ρ, θ, φ): the point P(x, y, z) is described by the distance ρ from the origin, the angle θ in the xy-plane, and the zenith angle φ between the line from the origin to P and the positive z-axis.

From Cartesian to spherical:

ρ = √(x² + y² + z²)
θ = tan⁻¹(y/x), with the same sign convention as cylindrical
φ = cos⁻¹(z / √(x² + y² + z²))

From spherical to Cartesian:

x = ρ sin φ cos θ
y = ρ sin φ sin θ
z = ρ cos φ

Restrictions:

ρ ≥ 0
0 ≤ θ < 2π
0 ≤ φ ≤ π
θ is undefined when (x, y) = (0, 0)
φ is undefined when (x, y, z) = (0, 0, 0)

🔬 When to use spherical coordinates

Spherical coordinates are useful when there is symmetry about the origin.
Don't confuse: spherical coordinates do not always simplify sphere equations. If the sphere is not centered at the origin, the equation can become more complicated.
Example: The sphere (x − 2)² + (y − 1)² + z² = 9 becomes ρ² − 2 sin φ (2 cos θ + sin θ) ρ − 4 = 0 in spherical coordinates—harder to recognize as a sphere.

🧱 Spherical coordinate surfaces

Holding one coordinate constant produces:

Constant coordinate	Surface description
ρ = ρ₀	Sphere of radius ρ₀ centered at the origin
θ = θ₀	Half-plane emanating from the z-axis
φ = φ₀	Circular cone with vertex at the origin

📐 The zenith angle φ

The zenith angle φ is the angle between the line segment from the origin to P and the positive z-axis.

This is measured from the z-axis downward, not from the xy-plane upward.
It ranges from 0 (on the positive z-axis) to π (on the negative z-axis).

🔢 Worked examples

🔢 Converting a point to cylindrical and spherical

Example: Convert the point (−2, −2, 1) from Cartesian coordinates.

(a) Cylindrical:

r = √((−2)² + (−2)²) = 2√2
θ = tan⁻¹(−2 / −2) = tan⁻¹(1) = 5π/4, since y = −2 < 0
z = 1
Result: (2√2, 5π/4, 1)

(b) Spherical:

ρ = √((−2)² + (−2)² + 1²) = 3
θ = 5π/4 (same as cylindrical)
φ = cos⁻¹(1/3) ≈ 1.23 radians
Result: (3, 5π/4, 1.23)
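
The conversions can be sketched in Python. One design choice here is an assumption rather than something from the source: the sketch uses `math.atan2(y, x)` reduced modulo 2π instead of the inverse tangent with a separate sign rule, since it yields the same θ in [0, 2π) and also handles x = 0. The function names are illustrative.

```python
import math

def to_cylindrical(x, y, z):
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)   # angle in [0, 2*pi)
    return r, theta, z

def to_spherical(x, y, z):
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0:
        raise ValueError("phi is undefined at the origin")
    theta = math.atan2(y, x) % (2 * math.pi)
    phi = math.acos(z / rho)                   # zenith angle in [0, pi]
    return rho, theta, phi

def from_spherical(rho, theta, phi):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

print(to_cylindrical(-2, -2, 1))  # roughly (2.828, 3.927, 1)
print(to_spherical(-2, -2, 1))    # roughly (3.0, 3.927, 1.231)
```

Note that `atan2(0, 0)` returns 0 in Python, so on the z-axis the sketch reports θ = 0 even though θ is formally undefined there.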

🌀 The helicoid surface

Example: Describe the surface given by θ = z in cylindrical coordinates.

This surface is called a helicoid.
As the vertical z coordinate increases, the angle θ increases proportionally.
The radius r is unrestricted (can be any non-negative value).
This sweeps out a surface shaped like a spiral staircase with infinite radius.
The helicoid is a ruled surface (can be generated by moving a straight line).

⚠️ Important notes and conventions

⚠️ Handedness convention

The "standard" definition of spherical coordinates used by mathematicians results in a left-handed system.
Physicists usually switch the definitions of θ and φ to make (ρ, θ, φ) a right-handed system.
Don't confuse: the mathematical and physics conventions differ.

⚠️ When coordinate systems simplify equations

A surface equation can be simpler in one coordinate system and more complex in another.
Cylindrical coordinates work well for cylinders and surfaces symmetric around the z-axis.
Spherical coordinates work well for spheres centered at the origin and cones.
Example: x² + y² = 4 becomes r = 2 (cylindrical)—very simple.
Counter-example: A sphere not centered at the origin becomes more complicated in spherical coordinates.

Vector-Valued Functions

1.8 Vector-Valued Functions

🧭 Overview

🧠 One-sentence thesis

Vector-valued functions extend single-variable calculus to three dimensions by mapping real numbers to vectors, enabling the mathematical description of curves in space and physical motion.

📌 Key points (3–5)

What they are: A vector-valued function assigns a vector in R³ to each real number in its domain, written as f(t) = (f₁(t), f₂(t), f₃(t)).
How calculus extends: Limits, continuity, and derivatives work component-wise—apply single-variable calculus to each component function separately.
Geometric meaning of the derivative: The derivative f′(t) is a tangent vector to the curve traced by f(t), lying on the tangent line to that curve.
Physical interpretation: Position, velocity, acceleration, momentum, and force can all be represented as vector-valued functions of time.
Common confusion: The derivative is a vector (tangent direction), not a scalar slope; higher dimensions require thinking about direction, not just steepness.

📐 Definition and representation

📐 What is a vector-valued function

Vector-valued function of a real variable: A rule that associates a vector f(t) with a real number t, where t is in some subset D of R¹ (the domain). Written as f : D → R³.

The input is a single real number (often time or a parameter).
The output is a three-dimensional vector.
Example: f(t) = ti + t²j + t³k maps each real t to a vector in R³.

🧩 Component form

Two equivalent notations:

Vector form: f(t) = f₁(t)i + f₂(t)j + f₃(t)k
(emphasizes that the output is a vector)
Coordinate form: f(t) = (f₁(t), f₂(t), f₃(t))
(useful for focusing on terminal points)

The functions f₁(t), f₂(t), f₃(t) are called component functions—each is a real-valued function of t.

🌀 Curves in space

By identifying vectors with their terminal points, a vector-valued function traces out a curve in three-dimensional space as t varies.

Example: Helix
f(t) = (cos t, sin t, t) describes a helix spiraling upward. As t increases, the terminal points trace a curve on the cylinder x² + y² = 1 (since cos²t + sin²t = 1).

🔬 Limits, continuity, and derivatives

🔬 Limit definition

Limit: lim(t→a) f(t) = c if lim(t→a) ‖f(t) − c‖ = 0.

Equivalently, compute limits component-wise:
lim(t→a) f(t) = (lim(t→a) f₁(t), lim(t→a) f₂(t), lim(t→a) f₃(t))
provided all three component limits exist.

📏 Continuity

Continuous at a: f(t) is continuous at a if lim(t→a) f(t) = f(a).

Equivalently, f is continuous at a if and only if each component function f₁, f₂, f₃ is continuous at a.

🎯 Derivative

Derivative: f′(a) = lim(h→0) [f(a + h) − f(a)] / h, if that limit exists.

Equivalently, differentiate component-wise:
f′(a) = (f₁′(a), f₂′(a), f₃′(a))

Key difference from single-variable calculus: The derivative of a real-valued function is a number (slope); the derivative of a vector-valued function is a tangent vector to the curve, lying on the tangent line.

Example: Helix tangent
For f(t) = (cos t, sin t, t), the derivative is f′(t) = (−sin t, cos t, 1).
At t = 2π, the tangent line through f(2π) = (1, 0, 2π) is:
L = (1, 0, 2π) + s(0, 1, 1), or parametrically: x = 1, y = s, z = 2π + s.
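
The tangent line can be evaluated in a short Python sketch. The function names `f`, `f_prime`, and `tangent_line` are illustrative assumptions; the derivative is entered by hand from the component-wise rule.

```python
import math

def f(t):
    """The helix f(t) = (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)

def f_prime(t):
    """Component-wise derivative f'(t) = (-sin t, cos t, 1)."""
    return (-math.sin(t), math.cos(t), 1.0)

def tangent_line(t0, s):
    """Point f(t0) + s * f'(t0) on the tangent line at parameter t0."""
    p, d = f(t0), f_prime(t0)
    return tuple(pi + s * di for pi, di in zip(p, d))

# At t0 = 2*pi this is approximately (1, s, 2*pi + s), up to rounding.
print(tangent_line(2 * math.pi, 1.0))
```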

🧮 Differentiation rules

🧮 Basic rules (Theorem 1.20)

All familiar differentiation rules extend to vector-valued functions:

Rule	Formula
Constant vector	d/dt(c) = 0
Scalar multiple	d/dt(kf) = k df/dt
Sum	d/dt(f + g) = df/dt + dg/dt
Difference	d/dt(f − g) = df/dt − dg/dt
Scalar product	d/dt(uf) = (du/dt)f + u(df/dt)
Dot product	d/dt(f · g) = (df/dt) · g + f · (dg/dt)
Cross product	d/dt(f × g) = (df/dt) × g + f × (dg/dt)

Why they work: The first five rules follow from differentiating component functions using single-variable rules. The dot and cross product rules require expanding in components and applying the product rule to each term.

🔍 Important consequence: perpendicularity

Derivative of magnitude:
If f(t) is differentiable and ‖f(t)‖ ≠ 0, then:
d/dt ‖f(t)‖ = [f′(t) · f(t)] / ‖f(t)‖

Key fact: ‖f(t)‖ is constant if and only if f(t) ⊥ f′(t) for all t.

Geometric interpretation: If a curve lies completely on a sphere (or circle) centered at the origin, the tangent vector f′(t) is always perpendicular to the position vector f(t).

Example: Spherical spiral
f(t) = (cos t / √(1 + a²t²), sin t / √(1 + a²t²), −at / √(1 + a²t²))
lies on the unit sphere x² + y² + z² = 1, and f′(t) · f(t) = 0 for all t.
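
Both claims about the spherical spiral can be checked numerically. The sketch below is a hedged illustration: the value a = 0.2 and the sample values of t are arbitrary choices, and the derivative is approximated by a central difference rather than computed exactly.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def spiral(t, a=0.2):
    """Spherical spiral; a = 0.2 is an arbitrary illustrative value."""
    w = math.sqrt(1 + a * a * t * t)
    return (math.cos(t) / w, math.sin(t) / w, -a * t / w)

def derivative(f, t, h=1e-6):
    """Central-difference approximation of f'(t), component by component."""
    fp, fm = f(t + h), f(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(fp, fm))

for t in [-3.0, 0.0, 0.7, 5.0]:
    p = spiral(t)
    v = derivative(spiral, t)
    assert abs(dot(p, p) - 1) < 1e-12    # the point lies on the unit sphere
    assert abs(dot(p, v)) < 1e-6         # tangent is perpendicular to position
print("norm is 1 and f . f' is (numerically) 0 at all sample points")
```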

🚀 Physical applications

🚀 Motion in space

For an object moving in space with position vector r(t) = (x(t), y(t), z(t)):

Quantity	Definition	Notation
Position	r(t) = (x(t), y(t), z(t))	—
Velocity	v(t) = dr/dt = r′(t)	Also written ṙ(t)
Acceleration	a(t) = dv/dt = v′(t) = d²r/dt²	Also written r̈(t)
Momentum	p(t) = mv(t)	m = constant mass
Force	F(t) = dp/dt = ma(t)	Newton's 2nd Law

The magnitude ‖v(t)‖ is called the speed of the object.

🎯 Circular motion example

For r(t) = (5 cos t, 3 sin t, 4 sin t):

Velocity: v(t) = (−5 sin t, 3 cos t, 4 cos t)
Acceleration: a(t) = (−5 cos t, −3 sin t, −4 sin t) = −r(t)

Note: ‖r(t)‖ = 5 (constant), so r(t) · v(t) = 0 (perpendicular).
When an object moves in a circle with constant speed, the acceleration vector points toward the center (opposite the position vector).

🎨 Special curves and applications

🎨 Lines and parabolas

Line: f(t) = (a₁t + b₁, a₂t + b₂, a₃t + b₃)
(each component is linear in t)
Parabola: f(t) = (a₁t² + b₁t + c₁, a₂t² + b₂t + c₂, a₃t² + b₃t + c₃)
(each component is quadratic in t)

Line segment between two points:
l(t) = (1 − t)r₁ + tr₂ for t ∈ [0, 1]
gives the segment from r₁ (at t = 0) to r₂ (at t = 1).

🖥️ Bézier curves (CAD application)

Bézier curves approximate the shape of a polygonal path (control polygon) using repeated linear interpolation.

For three points b₀, b₁, b₂:

Define line segments: b₀¹(t) = (1 − t)b₀ + tb₁ and b₁¹(t) = (1 − t)b₁ + tb₂
The Bézier curve is: b₀²(t) = (1 − t)b₀¹(t) + tb₁¹(t)
= (1 − t)²b₀ + 2t(1 − t)b₁ + t²b₂

This is a parabola passing through b₀ (at t = 0) and b₂ (at t = 1), controlled by b₁.

Example: For b₀ = (0, 0, 0), b₁ = (1, 2, 3), b₂ = (4, 5, 2):
b₀²(t) = (2t + 2t², 4t + t², 6t − 4t²)

For four points: The algorithm extends recursively (de Casteljau's algorithm), producing a cubic polynomial curve.
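
Repeated linear interpolation is straightforward to sketch in Python. The function names `lerp` and `de_casteljau` are illustrative; the same routine handles three control points (quadratic) or four (cubic) because it simply interpolates until one point remains.

```python
def lerp(p, q, t):
    """Linear interpolation (1 - t) * p + t * q between two points."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def de_casteljau(points, t):
    """Evaluate the Bezier curve with the given control points at t."""
    pts = list(points)
    while len(pts) > 1:
        # Replace each adjacent pair by its interpolated point.
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]

control = [(0, 0, 0), (1, 2, 3), (4, 5, 2)]
print(de_casteljau(control, 0.5))  # (1.5, 2.25, 2.0)
```

The printed value agrees with the closed form (2t + 2t², 4t + t², 6t − 4t²) at t = 0.5.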

🔄 Higher-order derivatives

Just as in single-variable calculus, repeatedly differentiate:

f″(t) = d/dt[f′(t)]
f‴(t) = d/dt[f″(t)]
dⁿf/dtⁿ = d/dt[dⁿ⁻¹f/dtⁿ⁻¹]

These are used to describe acceleration, jerk, and other higher-order motion characteristics.

⚠️ Important differences from single-variable calculus

⚠️ Mean Value Theorem fails

Don't confuse: The Mean Value Theorem does not hold for vector-valued functions.

Counterexample: For f(t) = (cos t, sin t, t) on [0, 2π], there is no t where:
f′(t) = [f(2π) − f(0)] / (2π − 0)

This is because the theorem requires a scalar equation, but vector equality requires matching all three components simultaneously—generally impossible.

⚠️ When single-variable results do extend

Many results do carry over by applying them component-wise:

If f′(t) = 0 for all t in an interval, then f(t) is a constant vector.
Chain rule, product rules, and basic derivative formulas all extend naturally.

The key is recognizing when a result depends on ordering (like inequalities or the Mean Value Theorem), which doesn't exist for vectors.

Arc Length

1.9 Arc Length

🧭 Overview

🧠 One-sentence thesis

Arc length measures the distance traveled along a curve by integrating the speed (magnitude of the velocity vector) over time, and while arc length parametrization provides theoretical elegance by giving unit speed, it is often impractical for computation.

📌 Key points (3–5)

Core formula: Arc length L from t = a to t = b is the integral of the norm of the derivative (speed) over that interval.
Arc length parametrization: Replaces time parameter t with distance parameter s, yielding unit speed (derivative has magnitude 1), but requires solving an integral in closed form.
Practical vs theoretical trade-off: Arc length parametrization is useful for theoretical work (differential geometry, curvature) but often impossible to compute; standard parametrizations are simpler for practical use.
Common confusion: Different parametrizations can trace the same curve at different speeds—they represent the same geometric object but with different motion rates.
Extension to other coordinates: Arc length formulas exist for cylindrical (and spherical) coordinate systems, derived by converting to Cartesian coordinates.

📏 Defining arc length

📏 Motivation from motion

If r(t) = (x(t), y(t), z(t)) is the position vector of an object moving in R³, then the norm of v(t) (the velocity) is the speed at time t.
It is natural to define the distance s traveled from t = a to t = b as the integral of speed over time.
This mirrors the single-variable calculus formula for parametric curves in R².

📐 Formal definition

Arc length L of a curve f(t) = (x(t), y(t), z(t)) from t = a to t = b is
L = ∫_a^b ‖f′(t)‖ dt
= ∫_a^b √(x′(t)² + y′(t)² + z′(t)²) dt

Requirements:

The domain must include the interval [a, b].
Each component function x(t), y(t), z(t) must have a continuous first derivative on (a, b) (continuously differentiable or C¹).
No section of the curve is repeated.

Don't confuse: The excerpt notes that this formula is not rigorously proved in the text; a full proof requires Duhamel's principle and is beyond scope.

🧮 Example: helix

Example: Find the length of the helix f(t) = (cos t, sin t, t) from t = 0 to t = 2π.

f′(t) = (−sin t, cos t, 1)
‖f′(t)‖ = √(sin² t + cos² t + 1) = √2
L = ∫_0^(2π) √2 dt = √2 (2π − 0) = 2π√2
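
The result can be cross-checked without using the derivative at all. The sketch below approximates the curve by a polygonal path and sums the chord lengths; this is an assumed numerical technique for illustration, not the definition used in the text. The function name `polyline_length` and the segment count are arbitrary choices.

```python
import math

def polyline_length(f, a, b, n=10_000):
    """Approximate arc length by summing n chord lengths along the curve."""
    total = 0.0
    prev = f(a)
    for k in range(1, n + 1):
        cur = f(a + (b - a) * k / n)
        total += math.dist(prev, cur)   # straight-line distance between samples
        prev = cur
    return total

def helix(t):
    return (math.cos(t), math.sin(t), t)

approx = polyline_length(helix, 0.0, 2 * math.pi)
exact = 2 * math.pi * math.sqrt(2)
print(round(approx, 4), round(exact, 4))  # 8.8858 8.8858
```

Chords never exceed the arc they span, so the approximation approaches the true length from below as n grows.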

🔧 Handling discontinuities

If derivatives are not continuous at some points in [a, b], partition [a, b] into subintervals where all component functions are continuously differentiable (except at endpoints, which can be ignored).
Sum the arc lengths over the subintervals to get the total arc length.

🔄 Parametrization and equivalent curves

🔄 Different parametrizations of the same curve

The same geometric curve can be traced by different functions at different speeds.
Example: f(t) = (cos t, sin t, t) on [0, 2π] and g(t) = (cos 2t, sin 2t, 2t) on [0, π] trace the same curve.
- g(t) traces the curve twice as fast as f(t).
- Speed of f(t): ‖f′(t)‖ = √2
- Speed of g(t): ‖g′(t)‖ = 2√2

🔗 Formal definition of parametrization

Parametrization: Let C be a smooth curve in R³ represented by f(t) on [a, b], and let α : [c, d] → [a, b] be a smooth one-to-one mapping. Then g(s) = f(α(s)) is a parametrization of C with parameter s.

Equivalent parametrization: If α is strictly increasing on [c, d], then g(s) is equivalent to f(t).

Chain Rule for vector-valued functions:

If f(t) is differentiable and t = α(s) is differentiable, then f(s) = f(α(s)) is differentiable.
df/ds = (df/dt)(dt/ds)

📋 Example: equivalent parametrizations

The following are all equivalent parametrizations of the same curve:

f(t) = (cos t, sin t, t) for t in [0, 2π]
g(s) = (cos 2s, sin 2s, 2s) for s in [0, π]
h(s) = (cos 2πs, sin 2πs, 2πs) for s in [0, 1]

To verify: define α(s) = 2s for the first pair, α(s) = 2πs for the second pair; both are smooth, one-to-one, onto, and strictly increasing.

🎯 Arc length parametrization

🎯 The idea: distance as parameter

Replace the parameter t with the parameter s given by
s = s(t) = ∫_a^t ‖f′(u)‖ du
In terms of motion, s is the distance traveled along the curve after time t has elapsed.
Key insight: The new parameter is distance instead of time.

🔍 Properties of the arc length function s(t)

By the Fundamental Theorem of Calculus, s′(t) = ds/dt = ‖f′(t)‖.
Since f(t) is smooth, ‖f′(t)‖ > 0 for all t in [a, b], so s′(t) > 0.
Therefore s(t) is strictly increasing on [a, b], making it a one-to-one mapping.
s(a) = 0 and s(b) = L (the total arc length from t = a to t = b).
So s : [a, b] → [0, L] is a one-to-one, differentiable mapping onto [0, L].

⚙️ Constructing the arc length parametrization

There exists an inverse function α : [0, L] → [a, b] that is differentiable.
For each t in [a, b] there is a unique s in [0, L] such that s = s(t) and t = α(s).
The derivative of α is α′(s) = 1 / ‖f′(α(s))‖.
Define the arc length parametrization f̄(s) = f(α(s)) for all s in [0, L].

⚡ Unit speed property

f̄(s) has unit speed: ‖f̄′(s)‖ = 1 for all s in [0, L].
Why: By the Chain Rule, f̄′(s) = α′(s) f′(α(s)) = f′(α(s)) / ‖f′(α(s))‖, so ‖f̄′(s)‖ = 1.
This means the arc length parametrization traverses the curve at a "normal" rate.

🧮 Example: helix with arc length parametrization

Example: Parametrize the helix f(t) = (cos t, sin t, t) for t in [0, 2π] by arc length.

s = ∫_0^t √2 du = √2 t
Solve for t in terms of s: t = α(s) = s/√2
Therefore f̄(s) = (cos(s/√2), sin(s/√2), s/√2) for s in [0, 2π√2]
Verify: f̄′(s) = (1/√2)(−sin(s/√2), cos(s/√2), 1), so ‖f̄′(s)‖ = (1/√2) · √2 = 1
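
The unit-speed property can be checked numerically. This Python sketch estimates the speed by a central difference at a few arbitrary sample values of s in [0, 2π√2]; the function names are illustrative assumptions.

```python
import math

SQRT2 = math.sqrt(2)

def helix_by_arc_length(s):
    """Arc length parametrization of the helix: substitute t = s / sqrt(2)."""
    t = s / SQRT2
    return (math.cos(t), math.sin(t), t)

def speed(f, s, h=1e-6):
    """Central-difference estimate of the norm of f'(s)."""
    return math.dist(f(s + h), f(s - h)) / (2 * h)

for s in [0.5, 3.0, 8.0]:
    assert abs(speed(helix_by_arc_length, s) - 1) < 1e-6
print("speed is (numerically) 1 at all sample points")
```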

⚠️ Practical limitations

In practice: Arc length parametrization requires evaluating the integral s = ∫ ‖f′(u)‖ du in closed form, then solving for t in terms of s.
The reality: The simple integral in the helix example is the exception, not the norm.
Common situation: The integral is either difficult or impossible to evaluate in simple closed form.
Implication: Arc length parametrizations are more useful for theoretical purposes (differential geometry, curvature, moving frame fields) than for practical computations.
Example of difficulty: Bézier curves use polynomial parametrizations (simple for CAD), but their arc length parametrizations are usually impossible to calculate.

🌐 Arc length in other coordinate systems

🌐 Cylindrical coordinates formula

Theorem: If r = r(t), θ = θ(t), and z = z(t) are the cylindrical coordinates of a curve f(t) for t in [a, b], then the arc length L is
L = ∫_a^b √(r′(t)² + r(t)² θ′(t)² + z′(t)²) dt

Derivation:

Cartesian coordinates: x(t) = r(t) cos θ(t), y(t) = r(t) sin θ(t), z(t) = z(t)
Differentiate: x′(t) = r′(t) cos θ(t) − r(t) θ′(t) sin θ(t), y′(t) = r′(t) sin θ(t) + r(t) θ′(t) cos θ(t)
Compute x′(t)² + y′(t)²: after expanding and using cos² θ + sin² θ = 1, the cross terms cancel, yielding r′(t)² + r(t)² θ′(t)²
Substitute into the Cartesian arc length formula.

🧮 Example: cylindrical coordinates

Example: Find the arc length of the curve with cylindrical coordinates r = e^t, θ = t, z = e^t for t in [0, 1].

r′(t) = e^t, θ′(t) = 1, z′(t) = e^t
L = ∫_0^1 √(e^(2t) + e^(2t) · 1² + e^(2t)) dt
= ∫_0^1 √3 e^t dt
= √3 (e − 1)
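
The closed-form answer can be compared with a numerical integral of the cylindrical integrand. The sketch below uses composite Simpson's rule, which is an assumed choice of quadrature for illustration; the names `simpson` and `cylindrical_speed` are hypothetical, and the derivatives of r, θ, z are entered by hand.

```python
import math

def simpson(g, a, b, n=100):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def cylindrical_speed(t):
    """Integrand sqrt(r'^2 + r^2 theta'^2 + z'^2) for r = e^t, theta = t, z = e^t."""
    r, dr = math.exp(t), math.exp(t)
    dtheta = 1.0
    dz = math.exp(t)
    return math.sqrt(dr ** 2 + r ** 2 * dtheta ** 2 + dz ** 2)

L = simpson(cylindrical_speed, 0.0, 1.0)
exact = math.sqrt(3) * (math.e - 1)
print(round(L, 4), round(exact, 4))  # 2.9762 2.9762
```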

🌍 Spherical coordinates (mentioned in exercises)

The excerpt mentions (in Exercise 12) that a similar formula exists for spherical coordinates ρ = ρ(t), θ = θ(t), φ = φ(t):
L = ∫_a^b √(ρ′(t)² + ρ(t)² sin² φ(t) θ′(t)² + ρ(t)² φ′(t)²) dt
(The proof follows the same pattern as for cylindrical coordinates.)

Functions of Two or Three Variables

2.1 Functions of Two or Three Variables

🧭 Overview

🧠 One-sentence thesis

Real-valued functions of two or three variables extend single-variable calculus by mapping points in R² or R³ to real numbers, and their limits require checking convergence along infinitely many paths rather than just two directions.

📌 Key points (3–5)

What these functions are: rules that assign a real number to each point (x, y) in R² or (x, y, z) in R³, with a domain and range.
Graphs and level curves: the graph of f(x, y) is a surface in R³, and level curves (where f(x, y) = c) show "elevation" contours.
Limits are harder: in multivariable cases, (x, y) can approach (a, b) along infinitely many paths, not just from left/right as in single-variable calculus.
Common confusion: a limit exists only if the function approaches the same value along every path; different values along different paths means no limit exists.
Continuity: f is continuous at (a, b) if the limit equals f(a, b), just as in single-variable calculus.

📐 What are functions of several variables

📐 Definition and notation

A real-valued function f defined on a subset D of R² is a rule that assigns to each point (x, y) in D a real number f(x, y).

The domain D is the largest set where f is defined.
The range is the set of all output values f(x, y) as (x, y) varies over D.
Similar definitions hold for functions f(x, y, z) in R³.

🔢 Domain examples

Function	Domain	Range
f(x, y) = xy	All of R²	All of R
f(x, y) = 1/(x − y)	All (x, y) where x ≠ y	All real numbers except 0
f(x, y) = √(1 − x² − y²)	Points on/inside unit circle: x² + y² ≤ 1	[0, 1]
f(x, y, z) = e^(x+y−z)	All of R³	All positive real numbers

Don't confuse: the domain is determined by where the formula makes sense (e.g., no division by zero, no negative square roots).

🗺️ Visualizing functions via graphs and level curves

🗺️ The graph as a surface

Writing z = f(x, y), the graph is the set {(x, y, z) : z = f(x, y)} in R³.
This graph is a surface because it satisfies an equation F(x, y, z) = 0 (namely f(x, y) − z = 0).

📏 Level curves

Level curves are the solution sets of equations f(x, y) = c for various constants c.

They are traces of the surface in horizontal planes z = c.
Often projected onto the xy-plane to show "elevation" levels, like topographic maps.
Example: for f(x, y) = (sin √(x² + y²)) / √(x² + y²), the level curves are groups of concentric circles.

🎯 Limits in two or more variables

🎯 The formal definition

The limit of f(x, y) equals L as (x, y) approaches (a, b), written lim_{(x,y)→(a,b)} f(x, y) = L, if given any ε > 0, there exists a δ > 0 such that |f(x, y) − L| < ε whenever 0 < √((x − a)² + (y − b)²) < δ.

In plain language: f(x, y) gets arbitrarily close to L when (x, y) is sufficiently close to (a, b).
The distance √((x − a)² + (y − b)²) measures closeness in the plane.

🛤️ The path problem

Key difference from single-variable limits: in one dimension, x → a from only two directions (left or right).
In two dimensions, (x, y) can approach (a, b) along infinitely many paths.
For the limit to exist, f must approach the same value L along every possible path.

❌ When limits do not exist

Example: lim_{(x,y)→(0,0)} xy/(x² + y²) does not exist.

Substituting (0, 0) gives the indeterminate form 0/0.
Along the positive x-axis (y = 0): f(x, 0) = 0.
Along the line y = x (for x > 0): f(x, x) = x²/(2x²) = 1/2.
Different paths give different values → the limit does not exist.
Don't confuse: showing two paths give different values proves no limit; showing many paths agree does not prove the limit exists.
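
The path argument can be explored numerically. This is a small sketch (the helper `along_line` is a hypothetical name) that evaluates f along several lines y = mx as the point approaches the origin:

```python
def f(x, y):
    return x * y / (x**2 + y**2)

def along_line(m, t):
    """Value of f at the point (t, m*t) on the line y = m*x."""
    return f(t, m * t)

for m in (0.0, 1.0, 2.0):
    values = [along_line(m, t) for t in (1e-1, 1e-3, 1e-6)]
    print(m, values)
```

Along each line the value stays constant at m/(1 + m²), so different slopes give different limiting values. A numerical experiment like this can expose a missing limit, but agreement along sampled paths would not prove that a limit exists.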

✅ When you can substitute directly

If f(x, y) is given by a single formula and is properly defined at (a, b) (not an indeterminate form), just substitute (x, y) = (a, b).
Example: lim_{(x,y)→(1,2)} xy/(x² + y²) = (1)(2)/(1 + 4) = 2/5.

🧮 Using the squeeze theorem

Example: Show lim_{(x,y)→(0,0)} y⁴/(x² + y²) = 0.

Notice 0 ≤ y⁴ ≤ (√(x² + y²))⁴ = (x² + y²)².
So |y⁴/(x² + y²)| ≤ (x² + y²)² / (x² + y²) = x² + y² → 0 as (x, y) → (0, 0).
By the squeeze principle (Theorem 2.1(e)), the limit is 0.
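
The bound used in the squeeze argument can be spot-checked on shrinking circles around the origin. The sketch below assumes a simple sampling scheme (360 points per circle); it illustrates the inequality and is not a proof.

```python
import math

def g(x, y):
    return y**4 / (x**2 + y**2)

def max_on_circle(radius, samples=360):
    """Largest value of g over sample points on the circle of the given radius."""
    return max(
        g(radius * math.cos(2 * math.pi * k / samples),
          radius * math.sin(2 * math.pi * k / samples))
        for k in range(samples)
    )

for radius in (1.0, 0.1, 0.01):
    # The squeeze bound says g <= x^2 + y^2 = radius^2 on this circle
    print(radius, max_on_circle(radius), radius**2)
```

The sampled maximum never exceeds radius² (up to rounding) and shrinks with the radius, matching the bound |y⁴/(x² + y²)| ≤ x² + y².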

🔗 Limit laws and continuity

🔗 Algebraic rules for limits

Theorem 2.1 states that if both lim_{(x,y)→(a,b)} f(x, y) and lim_{(x,y)→(a,b)} g(x, y) exist, then:

Limits of sums/differences equal sums/differences of limits.
Limits of scalar multiples equal scalar multiples of limits.
Limits of products equal products of limits.
Limits of quotients equal quotients of limits (if denominator limit ≠ 0).
Squeeze principle: if |f(x, y) − L| ≤ g(x, y) and lim g = 0, then lim f = L.

🔗 Continuity

A function f(x, y) is continuous at (a, b) if lim_{(x,y)→(a,b)} f(x, y) = f(a, b).

f is a continuous function if it is continuous at every point in its domain.
Example: the function f(x, y) = 0 if (x, y) = (0, 0), and f(x, y) = y⁴/(x² + y²) otherwise, is continuous on all of R² because the limit at (0, 0) equals the function value 0.

Partial Derivatives

2.2 Partial Derivatives

🧭 Overview

🧠 One-sentence thesis

Partial derivatives extend the concept of rate of change to functions of several variables by measuring how the function changes in one direction while holding all other variables constant.

📌 Key points (3–5)

What a partial derivative measures: the rate of change of a multivariable function in one specific direction (positive x or positive y).
How to compute: treat all other variables as constants and differentiate with respect to the target variable using single-variable calculus rules.
Higher-order partial derivatives: you can take partial derivatives of partial derivatives, creating second-order, third-order, and mixed partial derivatives.
Common confusion: for mixed partial derivatives, the order of differentiation does not matter when both mixed partials are continuous, so ∂²f/∂y∂x equals ∂²f/∂x∂y.
Geometric meaning: partial derivatives represent slopes of tangent lines to traces of the surface in planes parallel to the coordinate axes.

📐 Definition and basic concept

📐 What partial derivatives are

Partial derivative of f at (a, b) with respect to x: the limit as h approaches 0 of [f(a + h, b) − f(a, b)] / h

Partial derivative of f at (a, b) with respect to y: the limit as h approaches 0 of [f(a, b + h) − f(a, b)] / h

The symbol ∂ is pronounced "del" and is not a Greek letter; it was introduced around 1740 to distinguish partial derivatives from ordinary derivatives.
Just as the ordinary derivative measures rate of change for single-variable functions, partial derivatives measure rate of change for multivariable functions.
The key difference: partial derivatives measure change in one direction at a time.

🔧 How to calculate partial derivatives

Core technique: treat one variable as the "active" variable and all others as constants.

To find ∂f/∂x: treat y as a constant, then differentiate f(x, y) as if it were only a function of x.
To find ∂f/∂y: treat x as a constant, then differentiate f(x, y) as if it were only a function of y.
Use all the standard differentiation rules from single-variable calculus (product rule, quotient rule, chain rule, etc.).

Example: For f(x, y) = x²y + y³:

∂f/∂x = 2xy (treating y as constant, the derivative of x² is 2x, and y³ disappears because it's constant with respect to x)
∂f/∂y = x² + 3y² (treating x as constant, x² becomes a constant coefficient, and the derivative of y³ is 3y²)

Example: For f(x, y) = sin(xy²) / (x² + 1):

∂f/∂x requires the quotient rule: ∂f/∂x = [(x² + 1)(y² cos(xy²)) − (2x) sin(xy²)] / (x² + 1)²
∂f/∂y is simpler: 2xy cos(xy²) / (x² + 1)
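
Hand-computed partial derivatives can be checked against a finite-difference estimate of the limit definition. The sketch below assumes a central-difference helper named `partial` (a hypothetical name) and an arbitrary test point.

```python
import math

def f(x, y):
    return math.sin(x * y**2) / (x**2 + 1)

def fx_formula(x, y):
    # Quotient-rule result for the partial derivative with respect to x
    return ((x**2 + 1) * y**2 * math.cos(x * y**2)
            - 2 * x * math.sin(x * y**2)) / (x**2 + 1) ** 2

def fy_formula(x, y):
    # Partial derivative with respect to y (x treated as a constant)
    return 2 * x * y * math.cos(x * y**2) / (x**2 + 1)

def partial(func, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative in coordinate `index`."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)

point = (0.7, 1.3)
print(partial(f, point, 0), fx_formula(*point))
print(partial(f, point, 1), fy_formula(*point))
```

Each printed pair should agree to several decimal places; a large mismatch usually signals an algebra slip in the hand computation.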

🔄 Higher-order partial derivatives

🔄 Second-order partial derivatives

Since ∂f/∂x and ∂f/∂y are themselves functions of x and y, you can take their partial derivatives:

Notation	Meaning	Description
∂²f/∂x²	∂/∂x(∂f/∂x)	Differentiate with respect to x twice
∂²f/∂y²	∂/∂y(∂f/∂y)	Differentiate with respect to y twice
∂²f/∂y∂x	∂/∂y(∂f/∂x)	First x, then y
∂²f/∂x∂y	∂/∂x(∂f/∂y)	First y, then x

Example: For f(x, y) = e^(x²y) + xy³:

∂f/∂x = 2xye^(x²y) + y³
∂²f/∂x² = 2ye^(x²y) + 4x²y²e^(x²y)
∂²f/∂y∂x = 2xe^(x²y) + 2x³ye^(x²y) + 3y²

🔀 Mixed partial derivatives

Mixed partial derivatives: higher-order partial derivatives taken with respect to different variables, such as ∂²f/∂y∂x and ∂²f/∂x∂y.

Key property: When both mixed partials are continuous at a point, they are equal at that point.

In practice: for all functions in typical calculus courses, ∂²f/∂y∂x = ∂²f/∂x∂y everywhere in the domain.
Don't confuse: ∂²f/∂y∂x means differentiate with respect to x first and then y (the notation reads right-to-left), but when both mixed partials are continuous the result is the same in either order.
This equality extends to third-order and higher mixed partials as well.

Example verification: In the example above, both ∂²f/∂y∂x and ∂²f/∂x∂y equal 2xe^(x²y) + 2x³ye^(x²y) + 3y².
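
The equality of the mixed partials for f(x, y) = e^(x²y) + xy³ can also be checked numerically. In this sketch the first partials are entered by hand and then differentiated with a central difference; the helper names and the test point (0.5, 0.8) are illustrative assumptions.

```python
import math

def fx(x, y):
    # Partial of f(x, y) = e^(x^2 y) + x y^3 with respect to x
    return 2 * x * y * math.exp(x**2 * y) + y**3

def fy(x, y):
    # Partial of the same f with respect to y
    return x**2 * math.exp(x**2 * y) + 3 * x * y**2

def mixed_formula(x, y):
    e = math.exp(x**2 * y)
    return 2 * x * e + 2 * x**3 * y * e + 3 * y**2

def d_dx(g, x, y, h=1e-5):
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, x, y, h=1e-5):
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 0.5, 0.8
first_x_then_y = d_dy(fx, x0, y0)
first_y_then_x = d_dx(fy, x0, y0)
print(first_x_then_y, first_y_then_x, mixed_formula(x0, y0))
```

All three printed numbers should agree up to the finite-difference error.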

📝 Alternative notation systems

The excerpt lists multiple equivalent notations for partial derivatives:

Del notation	Subscript	Numeric	D-operator
∂f/∂x	f_x(x, y)	f₁(x, y)	D_x(x, y) or D₁(x, y)
∂f/∂y	f_y(x, y)	f₂(x, y)	D_y(x, y) or D₂(x, y)
∂²f/∂x²	f_xx(x, y)	f₁₁(x, y)	D_xx(x, y) or D₁₁(x, y)
∂²f/∂y∂x	f_xy(x, y)	f₁₂(x, y)	D_xy(x, y) or D₁₂(x, y)

The numeric notation (f₁, f₂) is useful when variables aren't named x and y.
Subscript notation (f_x, f_xx) is compact and commonly used in proofs.

📊 Geometric interpretation

📊 Partial derivatives as slopes

The excerpt connects partial derivatives to tangent lines in three-dimensional space:

For a surface z = f(x, y) and a point (a, b) in the domain:
- The trace of the surface in the plane y = b is a curve through (a, b, f(a, b)).
- The slope of the tangent line L_x to that curve is ∂f/∂x(a, b).
Similarly:
- The trace in the plane x = a is a curve through (a, b, f(a, b)).
- The slope of the tangent line L_y to that curve is ∂f/∂y(a, b).

Why this matters: Just as the ordinary derivative gives the slope of a tangent line in 2D, partial derivatives give slopes of tangent lines to cross-sections of surfaces in 3D.

Example scenario: If f(x, y) represents elevation on a map, ∂f/∂x at a point tells you how steeply the terrain rises as you walk east (positive x direction), while ∂f/∂y tells you the steepness walking north (positive y direction).

Tangent Plane to a Surface

2.3 Tangent Plane to a Surface

🧭 Overview

🧠 One-sentence thesis

Partial derivatives provide the slopes of tangent lines in the x and y directions, and these slopes determine a tangent plane that "just touches" a surface at a point.

📌 Key points (3–5)

Geometric meaning of partial derivatives: ∂f/∂x and ∂f/∂y represent slopes of tangent lines to traces of the surface in planes parallel to the coordinate planes.
Tangent plane definition: a plane that touches the surface such that the angle between any vector from the point to nearby surface points and the plane approaches zero.
Existence conditions: the tangent plane exists at a point if both partial derivatives exist in a region around the point and are continuous at that point.
Common confusion: the existence of two tangent lines (in x and y directions) does not automatically guarantee a tangent plane exists—continuity of partial derivatives is needed.
Two formulas: explicit surfaces z = f(x,y) and implicit surfaces F(x,y,z) = 0 have different but related tangent plane equations.

📐 Geometric interpretation of partial derivatives

📏 Slopes of tangent lines to surface traces

The excerpt extends the single-variable idea (derivative = slope of tangent line) to surfaces:

For a surface z = f(x,y) at point (a, b, f(a,b)):
- The trace in the plane y = b is a curve in three-dimensional space.
- The tangent line L_x to this curve has slope equal to ∂f/∂x(a,b).
- Similarly, the trace in the plane x = a has tangent line L_y with slope ∂f/∂y(a,b).

Why this matters: these two tangent lines lie in the tangent plane (if it exists) and provide the building blocks for finding the plane's equation.

🔍 What "slope" means in three dimensions

In the plane y = b (parallel to the xz-plane), ∂f/∂x(a,b) measures vertical change per unit horizontal change in the x direction.
In the plane x = a (parallel to the yz-plane), ∂f/∂y(a,b) measures vertical change per unit horizontal change in the y direction.
These are rates of change along specific directions, not arbitrary directions.

🛫 Definition and existence of the tangent plane

📖 Formal definition

Tangent plane T to surface S at point P = (a,b,c): a plane containing P such that the acute angle between the vector PQ (from P to any generic point Q on S) and the plane T approaches zero as Q approaches P along the surface.

This mimics the intuitive notion of a tangent line "just touching" a curve.
The plane must contain the point and become parallel to the surface in the limit.

⚠️ Existence is not automatic

The excerpt emphasizes a subtle point:

Two tangent lines (L_x and L_y) determine a plane geometrically.
However, the existence of these two lines does not guarantee that this plane is the tangent plane to the surface.
Example scenario: if you take a trace in a diagonal plane (e.g., x - y = 0, making a 45° angle with the x-axis), the tangent line to that curve might not lie in the plane determined by L_x and L_y, or might not exist at all.

Sufficient condition for existence: if ∂f/∂x and ∂f/∂y exist in a region around (a,b) and are continuous at (a,b), then the tangent plane exists at (a, b, f(a,b)).

The excerpt states "in this text, those conditions will always hold," so readers can assume tangent planes exist for all problems.

🚫 Don't confuse

Tangent lines existing ≠ tangent plane existing: you need continuity of partial derivatives, not just their existence at a single point.
The excerpt warns that traces in other directions (not aligned with x or y axes) might behave differently.

🧮 Equation of the tangent plane for explicit surfaces

🔧 Derivation using cross product

For surface z = f(x,y) at point (a, b, f(a,b)):

Find direction vectors parallel to the tangent lines:
- v_x = (1, 0, ∂f/∂x(a,b)) is parallel to L_x (lies in xz-plane with slope ∂f/∂x(a,b)).
- v_y = (0, 1, ∂f/∂y(a,b)) is parallel to L_y (lies in yz-plane with slope ∂f/∂y(a,b)).
Compute normal vector: n = v_x × v_y (cross product of the two direction vectors).
- The cross product yields n = (-∂f/∂x(a,b), -∂f/∂y(a,b), 1).
Use point-normal form: the plane equation is A(x - a) + B(y - b) + C(z - f(a,b)) = 0, where (A,B,C) = n.

📝 Final formula for explicit surfaces

Tangent plane to z = f(x,y) at (a, b, f(a,b)):

∂f/∂x(a,b) · (x - a) + ∂f/∂y(a,b) · (y - b) - z + f(a,b) = 0

This is obtained by multiplying the initial equation by -1 for a cleaner form.
The coefficients are the partial derivatives evaluated at the point.

🧪 Example walkthrough

The excerpt provides Example 2.13: find the tangent plane to z = x² + y² at (1, 2, 5).

Compute partial derivatives: ∂f/∂x = 2x, ∂f/∂y = 2y.
Evaluate at (1,2): ∂f/∂x(1,2) = 2, ∂f/∂y(1,2) = 4.
Substitute into formula: 2(x - 1) + 4(y - 2) - z + 5 = 0.
Simplify: 2x + 4y - z - 5 = 0.
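
The substitution and simplification steps can be sketched in code. The function `tangent_plane` below is a hypothetical helper that returns the coefficients of the plane in the form Ax + By + Cz + D = 0, using the explicit-surface formula from this section.

```python
def tangent_plane(f, fx, fy, a, b):
    """Coefficients (A, B, C, D) of A*x + B*y + C*z + D = 0 for the tangent plane to z = f(x, y)."""
    A, B, C = fx(a, b), fy(a, b), -1
    # Expanding fx*(x - a) + fy*(y - b) - z + f(a, b) = 0 collects the constants into D
    D = f(a, b) - A * a - B * b
    return A, B, C, D

# Example 2.13: z = x^2 + y^2 at (1, 2, 5)
coeffs = tangent_plane(
    lambda x, y: x**2 + y**2,
    lambda x, y: 2 * x,
    lambda x, y: 2 * y,
    1, 2,
)
print(coeffs)  # (2, 4, -1, -5), i.e. 2x + 4y - z - 5 = 0
```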

🌐 Equation of the tangent plane for implicit surfaces

🔄 Implicit surface formula

For a surface defined implicitly by F(x,y,z) = 0 at point (a,b,c):

Tangent plane equation:

∂F/∂x(a,b,c) · (x - a) + ∂F/∂y(a,b,c) · (y - b) + ∂F/∂z(a,b,c) · (z - c) = 0

This involves three partial derivatives (including ∂F/∂z), not just two.
The explicit formula (for z = f(x,y)) is a special case: set F(x,y,z) = f(x,y) - z, then ∂F/∂z = -1, and the formula reduces to the previous one.

🧪 Example with implicit surface

The excerpt provides Example 2.14: find the tangent plane to x² + y² + z² = 9 at (2, 2, -1).

Define F(x,y,z) = x² + y² + z² - 9.
Compute partial derivatives: ∂F/∂x = 2x, ∂F/∂y = 2y, ∂F/∂z = 2z.
Evaluate at (2, 2, -1): ∂F/∂x = 4, ∂F/∂y = 4, ∂F/∂z = -2.
Substitute: 4(x - 2) + 4(y - 2) - 2(z + 1) = 0.
Simplify: 2x + 2y - z - 9 = 0.
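
The implicit-surface formula follows the same pattern, with the gradient of F supplying the normal vector. A minimal sketch, assuming the hypothetical helper name `implicit_tangent_plane`:

```python
def implicit_tangent_plane(grad_F, point):
    """Coefficients (A, B, C, D) of A*x + B*y + C*z + D = 0 at a point on F(x, y, z) = 0."""
    A, B, C = grad_F(*point)
    a, b, c = point
    D = -(A * a + B * b + C * c)
    return A, B, C, D

def grad_F(x, y, z):
    # Partial derivatives of F(x, y, z) = x^2 + y^2 + z^2 - 9
    return (2 * x, 2 * y, 2 * z)

coeffs = implicit_tangent_plane(grad_F, (2, 2, -1))
print(coeffs)  # (4, 4, -2, -18), i.e. 2x + 2y - z - 9 = 0 after dividing by 2
```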

🔗 Relationship between the two formulas

Surface type	Formula	Number of partial derivatives
Explicit z = f(x,y)	∂f/∂x(a,b)·(x - a) + ∂f/∂y(a,b)·(y - b) - z + f(a,b) = 0	Two (∂f/∂x, ∂f/∂y)
Implicit F(x,y,z) = 0	∂F/∂x(a,b,c)·(x - a) + ∂F/∂y(a,b,c)·(y - b) + ∂F/∂z(a,b,c)·(z - c) = 0	Three (∂F/∂x, ∂F/∂y, ∂F/∂z)

The explicit formula is the special case where F(x,y,z) = f(x,y) - z.
Both formulas use the same principle: the normal vector to the plane is built from partial derivatives.

Directional Derivatives and the Gradient

2.4 Directional Derivatives and the Gradient

🧭 Overview

🧠 One-sentence thesis

The directional derivative generalizes partial derivatives to measure the rate of change of a function in any direction, and the gradient vector points in the direction of fastest increase while being perpendicular to level curves.

📌 Key points (3–5)

What directional derivatives measure: the rate of change of a function in any specified direction, not just along the x or y axes.
The gradient as a tool: the gradient vector combines all partial derivatives and can be used to compute directional derivatives via a dot product.
Geometric meaning of the gradient: the gradient is perpendicular (normal) to level curves and points in the direction where the function increases fastest.
Common confusion: partial derivatives are special cases of directional derivatives—they measure change only along coordinate axes (using unit vectors i and j), whereas directional derivatives work for any direction.
How to find fastest increase/decrease: the function increases fastest in the direction of the gradient and decreases fastest in the opposite direction.

📐 What directional derivatives are

📐 Definition and motivation

Directional derivative: For a function f(x, y) at a point (a, b) in the direction of a unit vector v, the directional derivative D_v f(a, b) is the limit as h approaches 0 of [f((a, b) + h·v) - f(a, b)] / h.

Partial derivatives tell us the rate of change in the positive x and y directions only.
The directional derivative answers: "What about other directions?"
It measures the instantaneous rate of change of f in any direction specified by a unit vector v.

🔗 Connection to partial derivatives

If v = i = (1, 0), then D_v f = the partial derivative with respect to x.
If v = j = (0, 1), then D_v f = the partial derivative with respect to y.
In other words: partial derivatives are just directional derivatives along the coordinate axes.
Don't confuse: A directional derivative is not limited to axis directions; it works for any unit vector v.

🧮 The computational formula

Theorem: If f(x, y) has continuous partial derivatives, then D_v f(a, b) = v₁ · (∂f/∂x)(a, b) + v₂ · (∂f/∂y)(a, b), where v = (v₁, v₂) is a unit vector.

This formula says: multiply each component of v by the corresponding partial derivative, then add.
The proof uses the Mean Value Theorem from single-variable calculus applied separately in the x and y directions, then takes the limit as h approaches 0.
The continuity of the partial derivatives ensures the limit exists and equals the formula.

🧭 The gradient vector

🧭 Definition

Gradient: For a function f(x, y), the gradient ∇f is the vector (∂f/∂x, ∂f/∂y) in two dimensions. For f(x, y, z), the gradient is (∂f/∂x, ∂f/∂y, ∂f/∂z) in three dimensions.

The symbol ∇ is pronounced "del."
The gradient collects all the first-order partial derivatives into a single vector.
Sometimes written as grad(f) instead of ∇f.

🔗 Relationship to directional derivatives

Corollary: D_v f = v · ∇f (the dot product of v and the gradient).

This is the key connection: to find the directional derivative in direction v, just take the dot product of v with the gradient.
Example: For f(x, y) = x·y² + x³·y at point (1, 2) in direction v = (1/√2, 1/√2):
- First compute ∇f = (y² + 3x²y, 2xy + x³).
- At (1, 2): ∇f(1, 2) = (4 + 6, 4 + 1) = (10, 5).
- Then D_v f(1, 2) = (1/√2, 1/√2) · (10, 5) = 15/√2.
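
The dot-product formula can be compared with the limit definition directly. The sketch below uses illustrative helper names and a small fixed step h in place of the limit.

```python
import math

def f(x, y):
    return x * y**2 + x**3 * y

def grad_f(x, y):
    return (y**2 + 3 * x**2 * y, 2 * x * y + x**3)

def directional_derivative(grad, point, v):
    """D_v f = v . grad f, for a unit vector v."""
    gx, gy = grad(*point)
    return v[0] * gx + v[1] * gy

def difference_quotient(func, point, v, h=1e-6):
    """[f((a, b) + h*v) - f(a, b)] / h, the quotient inside the limit definition."""
    a, b = point
    return (func(a + h * v[0], b + h * v[1]) - func(a, b)) / h

v = (1 / math.sqrt(2), 1 / math.sqrt(2))
exact = directional_derivative(grad_f, (1, 2), v)
approx = difference_quotient(f, (1, 2), v)
print(exact, approx, 15 / math.sqrt(2))
```

The first and third values coincide, and the difference quotient approaches them as h shrinks.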

🧲 Geometric properties of the gradient

⊥ Normal to level curves

A level curve is where f(x, y) = c for some constant c.
Along a level curve, the function value does not change, so the rate of change in any tangent direction is zero.
If v is a unit vector tangent to the level curve, then D_v f = 0.
Since D_v f = v · ∇f = ||v|| · ||∇f|| · cos(θ), and ||v|| = 1, we have ||∇f|| · cos(θ) = 0.
Because ∇f ≠ 0 (assumed), this means cos(θ) = 0, so θ = 90°.
Conclusion: The gradient ∇f is perpendicular (normal) to the level curve.

📈 Direction of fastest increase

For any unit vector v, D_v f = ||∇f|| · cos(θ), where θ is the angle between v and ∇f.
At a fixed point, ||∇f|| is fixed, so D_v f varies only with θ.
The largest value of D_v f occurs when cos(θ) = 1, i.e., θ = 0°, meaning v points in the same direction as ∇f.
Conclusion: The function f increases fastest in the direction of ∇f.

📉 Direction of fastest decrease

The smallest value of D_v f occurs when cos(θ) = -1, i.e., θ = 180°, meaning v points opposite to ∇f.
Conclusion: The function f decreases fastest in the direction of -∇f.

📋 Summary theorem

Theorem: Let f(x, y) be continuously differentiable with ∇f ≠ 0. Then: (a) The gradient ∇f is normal to any level curve f(x, y) = c. (b) The value of f increases fastest in the direction of ∇f. (c) The value of f decreases fastest in the direction of -∇f.

This theorem also applies to functions of three or more variables.

🌡️ Applied examples

🌡️ Finding directions of fastest change

Example: For f(x, y) = x·y² + x³·y, in which direction does f increase fastest from (1, 2)?

Compute ∇f = (y² + 3x²y, 2xy + x³).
At (1, 2): ∇f(1, 2) = (10, 5) ≠ 0.
A unit vector in that direction is v = ∇f / ||∇f|| = (10, 5) / √(100 + 25) = (2/√5, 1/√5).
Answer: f increases fastest in direction (2/√5, 1/√5) and decreases fastest in direction (-2/√5, -1/√5).

🌡️ Temperature in a solid

Example: Temperature T(x, y, z) = e^(-x) + e^(-2y) + e^(4z). In which direction from (1, 1, 1) will temperature decrease fastest?

Compute ∇T = (-e^(-x), -2e^(-2y), 4e^(4z)).
At (1, 1, 1): ∇T(1, 1, 1) = (-e^(-1), -2e^(-2), 4e^4).
Temperature decreases fastest in the direction of -∇T(1, 1, 1) = (e^(-1), 2e^(-2), -4e^4).
Don't confuse: The gradient points toward increase; its negative points toward decrease.
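
If a unit vector is wanted for the direction of fastest decrease, the negative gradient can be normalized. A minimal sketch for the temperature example, with `steepest_descent_direction` as a hypothetical helper name:

```python
import math

def grad_T(x, y, z):
    """Gradient of T(x, y, z) = e^(-x) + e^(-2y) + e^(4z)."""
    return (-math.exp(-x), -2 * math.exp(-2 * y), 4 * math.exp(4 * z))

def steepest_descent_direction(grad, point):
    """Unit vector along -grad at the given point (requires a nonzero gradient)."""
    g = grad(*point)
    norm = math.sqrt(sum(c * c for c in g))
    if norm == 0:
        raise ValueError("gradient is zero; no direction of fastest decrease")
    return tuple(-c / norm for c in g)

direction = steepest_descent_direction(grad_T, (1, 1, 1))
print(direction)
```

The z-component dominates because 4e⁴ is far larger than the other two entries of the gradient, so the temperature falls fastest when moving almost straight down in z.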

🔍 Key distinctions

Concept	What it measures	Direction
Partial derivative ∂f/∂x	Rate of change along x-axis only	Fixed: direction of i = (1, 0)
Partial derivative ∂f/∂y	Rate of change along y-axis only	Fixed: direction of j = (0, 1)
Directional derivative D_v f	Rate of change in any direction v	Arbitrary: any unit vector v
Gradient ∇f	Vector of all partial derivatives	Points toward fastest increase

Common confusion: The gradient is not a rate of change itself; it is a vector. The directional derivative is the rate of change, computed by taking the dot product of a direction vector with the gradient.

Maxima and Minima

2.5 Maxima and Minima

🧭 Overview

🧠 One-sentence thesis

The gradient of a smooth function must be zero at any local maximum or minimum, and a second-derivative test using the discriminant D determines whether a critical point is a local maximum, local minimum, or saddle point.

📌 Key points (3–5)

Necessary condition for extrema: If a function has a local max or min at a point, the gradient must be zero there (all first-order partial derivatives are zero).
Critical points vs extrema: A critical point (where gradient = 0) is not always a local max or min; it may be a saddle point.
Second-derivative test: The discriminant D (built from second-order partial derivatives) and the sign of the second partial with respect to x determine the nature of a critical point.
Common confusion: When D = 0, the test fails—you cannot conclude anything from the theorem alone; direct inspection or other methods may be needed.
Why it matters: Finding extrema is essential for optimization; the method extends to functions of three or more variables.

📐 Definitions and setup

📐 Local maximum and minimum

Local maximum at (a, b): f(x, y) ≤ f(a, b) for all (x, y) inside some disk of positive radius centered at (a, b).

Local minimum at (a, b): f(x, y) ≥ f(a, b) for all (x, y) inside some disk of positive radius centered at (a, b).

"Local" means the inequality holds only in a small neighborhood (disk) around the point, not necessarily everywhere.
Global maximum/minimum: The inequality holds for all (x, y) in the entire domain of f.

🧭 Critical points

Critical point (a, b): a point where the gradient of f is zero, i.e., ∇f(a, b) = 0.

This means both partial derivatives are zero: ∂f/∂x (a, b) = 0 and ∂f/∂y (a, b) = 0.
To find critical points, solve these two equations simultaneously.

🔍 Necessary condition for extrema

🔍 Why the gradient must vanish

Theorem 2.5: If f(x, y) has a local maximum or minimum at (a, b), and both first-order partial derivatives exist at (a, b), then ∇f(a, b) = 0.

Reasoning:

Suppose (a, b) is a local maximum.
Then f(a, b) is the largest value in all directions from (a, b) within some small disk.
In particular, in the x-direction: the single-variable function g(x) = f(x, b) has a local maximum at x = a, so g′(a) = 0.
But g′(x) = ∂f/∂x (x, b), so ∂f/∂x (a, b) = 0.
Similarly, in the y-direction, ∂f/∂y (a, b) = 0.
The same reasoning applies to local minima.

Important: This is a necessary condition (must hold if there is an extremum) but not sufficient (a critical point need not be an extremum).

⚠️ Saddle points

Example 2.18: f(x, y) = x y has a critical point at (0, 0).

Solving ∂f/∂x = y = 0 and ∂f/∂y = x = 0 gives (0, 0).
But (0, 0) is not a local max or min:
- When x and y have the same sign, f(x, y) = x y > 0 = f(0, 0).
- When x and y have opposite signs, f(x, y) = x y < 0 = f(0, 0).
Along the path y = x, f(x, y) = x² has a local minimum at (0, 0).
Along the path y = −x, f(x, y) = −x² has a local maximum at (0, 0).
This is a saddle point: a local maximum in one direction and a local minimum in another.

Don't confuse: A critical point is a candidate for an extremum, but it may be a saddle point instead.

🧪 Second-derivative test

🧪 The discriminant D

Theorem 2.6: Let f(x, y) be smooth (all partial derivatives exist and are continuous), with a critical point at (a, b). Define:

D = (∂²f/∂x²)(a, b) · (∂²f/∂y²)(a, b) − [(∂²f/∂y∂x)(a, b)]²

Then:

(a) If D > 0 and ∂²f/∂x² (a, b) > 0, then f has a local minimum at (a, b).
(b) If D > 0 and ∂²f/∂x² (a, b) < 0, then f has a local maximum at (a, b).
(c) If D < 0, then f has neither a local max nor a local min at (a, b); (a, b) is a saddle point.
(d) If D = 0, the test fails (no conclusion).

Why the formula works:

D is the determinant of the 2×2 matrix (Hessian) of second partial derivatives.
For smooth functions, ∂²f/∂y∂x = ∂²f/∂x∂y, so the matrix is symmetric.
If D > 0, then ∂²f/∂x² and ∂²f/∂y² have the same sign (both positive → minimum; both negative → maximum).
You can replace ∂²f/∂x² by ∂²f/∂y² in parts (a) and (b) if desired.

🔢 Step-by-step procedure

Find all critical points by solving ∂f/∂x = 0 and ∂f/∂y = 0 simultaneously.
Compute the second-order partial derivatives: ∂²f/∂x², ∂²f/∂y², ∂²f/∂y∂x.
For each critical point (a, b), compute D.
Apply Theorem 2.6 to classify the critical point.
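
The classification step of the procedure can be sketched as a small function. It assumes the critical point has already been found and the second partials evaluated there; the name `classify` is illustrative.

```python
def classify(fxx, fyy, fxy):
    """Second-derivative test from the second partials evaluated at a critical point."""
    D = fxx * fyy - fxy**2
    if D > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "test fails"

print(classify(2, 2, 1))    # D = 3 > 0 with fxx > 0: local minimum
print(classify(0, -2, 1))   # D = -1 < 0: saddle point
print(classify(2, 8, -4))   # D = 0: test fails
```

When D > 0 the value fxx cannot be zero (otherwise D would be at most 0), so the two-way branch on its sign covers every case.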

📝 Worked examples

📝 Example 2.19: f(x, y) = x² + x y + y² − 3x

Step 1: Find critical points

∂f/∂x = 2x + y − 3 = 0
∂f/∂y = x + 2y = 0
Solving: 2x + y = 3 and x + 2y = 0 → unique solution (x, y) = (2, −1).

Step 2: Second derivatives

∂²f/∂x² = 2, ∂²f/∂y² = 2, ∂²f/∂y∂x = 1

Step 3: Compute D at (2, −1)

D = (2)(2) − (1)² = 4 − 1 = 3 > 0
∂²f/∂x² (2, −1) = 2 > 0

Conclusion: (2, −1) is a local minimum.

📝 Example 2.20: f(x, y) = x y − x³ − y²

Step 1: Find critical points

∂f/∂x = y − 3x² = 0 → y = 3x²
∂f/∂y = x − 2y = 0 → x = 2y
Substituting y = 3x² into x = 2y: x = 6x² → x(1 − 6x) = 0 → x = 0 or x = 1/6.
x = 0 → y = 0; x = 1/6 → y = 3(1/6)² = 1/12.
Critical points: (0, 0) and (1/6, 1/12).

Step 2: Second derivatives

∂²f/∂x² = −6x, ∂²f/∂y² = −2, ∂²f/∂y∂x = 1

Step 3: Classify (0, 0)

D = (−6·0)(−2) − (1)² = 0 − 1 = −1 < 0
Conclusion: (0, 0) is a saddle point.

Step 4: Classify (1/6, 1/12)

D = (−6·1/6)(−2) − (1)² = (−1)(−2) − 1 = 2 − 1 = 1 > 0
∂²f/∂x² (1/6, 1/12) = −1 < 0
Conclusion: (1/6, 1/12) is a local maximum.

📝 Example 2.21: When D = 0 (test fails)

Function: f(x, y) = (x − 2)⁴ + (x − 2y)²

Step 1: Find critical points

∂f/∂x = 4(x − 2)³ + 2(x − 2y) = 0
∂f/∂y = −4(x − 2y) = 0
From the second equation: x = 2y.
Substituting into the first: 4(2y − 2)³ = 0 → y = 1, so x = 2.
Critical point: (2, 1).

Step 2: Second derivatives

∂²f/∂x² = 12(x − 2)² + 2, ∂²f/∂y² = 8, ∂²f/∂y∂x = −4

Step 3: Compute D at (2, 1)

D = (2)(8) − (−4)² = 16 − 16 = 0
The test fails.

Step 4: Direct inspection

f(x, y) is the sum of a fourth power and a square, so f(x, y) ≥ 0 for all (x, y).
f(2, 1) = 0.
Therefore f(x, y) ≥ 0 = f(2, 1) for all (x, y).
Conclusion: (2, 1) is a global minimum.

Lesson: When D = 0, examine the function directly or use other methods.

📝 Example 2.22: Multiple critical points

Function: f(x, y) = (x² + y²) e^(−(x² + y²))

Step 1: Find critical points

∂f/∂x = 2x(1 − (x² + y²)) e^(−(x² + y²))
∂f/∂y = 2y(1 − (x² + y²)) e^(−(x² + y²))
Both are zero when:
- x = 0 and y = 0, or
- x² + y² = 1 (the unit circle).

Step 2: Classify (0, 0)

Second derivatives at (0, 0):
- ∂²f/∂x² (0, 0) = 2, ∂²f/∂y² (0, 0) = 2, ∂²f/∂y∂x (0, 0) = 0
D = (2)(2) − 0² = 4 > 0, ∂²f/∂x² (0, 0) = 2 > 0
Conclusion: (0, 0) is a local minimum.

Step 3: Points on the unit circle x² + y² = 1

D = 0 at these points (the test fails).
Alternative approach: Switch to polar coordinates r² = x² + y².
Write f as g(r) = r² e^(−r²).
g′(r) = 2r(1 − r²) e^(−r²) has a critical point at r = 1.
g″(1) = −4 e^(−1) < 0, so r = 1 is a local maximum by the single-variable second-derivative test.
Conclusion: Points on the unit circle x² + y² = 1 are local maximum points for f.

Lesson: When the test fails, alternative methods (polar coordinates, direct inspection, etc.) may reveal the nature of the critical point.

🔄 Summary table

Condition	∂²f/∂x² sign	Classification
D > 0	> 0	Local minimum
D > 0	< 0	Local maximum
D < 0	(any)	Saddle point
D = 0	(any)	Test fails; use other methods

Remember:

First, find critical points by solving ∇f = 0.
Then compute D and apply the test.
If D = 0, inspect the function directly or use alternative techniques (polar coordinates, examining paths, etc.).

Unconstrained Optimization: Numerical Methods

2.6 Unconstrained Optimization: Numerical Methods

🧭 Overview

🧠 One-sentence thesis

When finding critical points of multivariable functions requires solving equations that cannot be solved by elementary means, Newton's algorithm provides a numerical method that generates a sequence of points converging to the actual critical point.

📌 Key points (3–5)

Why numerical methods are needed: solving the gradient equation ∇f = 0 often involves polynomial or transcendental equations that cannot be solved exactly by elementary means.
What Newton's algorithm does: starting from an initial point, it iteratively computes new points using a formula involving first and second partial derivatives until convergence to a critical point.
How to use the algorithm: pick an initial point where D ≠ 0, apply the iterative formulas, and the sequence converges to a critical point (different initial points may lead to different critical points).
Common confusion: convergence depends on a "reasonable" initial point—some choices may cause the algorithm to fail or converge slowly.
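Connection to broader methods: the algorithm here is the two-variable case of a method that works for functions of n ≥ 2 variables; nonlinear programming offers many other techniques for finding global maxima and minima, several of them built on steepest descent (following −∇f).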

🔍 The problem: when elementary methods fail

🔍 What makes equations unsolvable

Unconstrained optimization problems: finding local (and perhaps global) maximum and minimum points of real-valued functions f(x, y), where the points (x, y) can be any points in the domain of f.

The standard method requires solving ∇f = 0, which is a system of two equations in two unknowns.
When this becomes impossible:
- Polynomials of degree three or higher in x and y
- Complicated expressions involving trigonometric, exponential, or logarithmic functions
Example: the cubic equation x³ + 9x − 2 = 0 has the exact solution x = ∛(√28 + 1) − ∛(√28 − 1), which is not practical to find by trial and error.
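
As a quick sanity check (not part of the source text), the following Python sketch evaluates the closed form ∛(√28 + 1) − ∛(√28 − 1) in floating point and substitutes it back into the cubic. The names `p`, `root`, and `residual` are illustrative.

```python
import math

# Closed-form root of x^3 + 9x - 2 = 0 obtained from Cardano's formula.
# Both cube-root arguments are positive, so ** (1 / 3) is safe here.
root = (math.sqrt(28) + 1) ** (1 / 3) - (math.sqrt(28) - 1) ** (1 / 3)


def p(x):
    """Left-hand side of the cubic equation x^3 + 9x - 2 = 0."""
    return x**3 + 9 * x - 2


# Substituting the root back in should leave only rounding error.
residual = p(root)
print(root, residual)
```

The residual is at the level of floating-point rounding, which confirms the formula but also shows why nobody would guess such a root by inspection.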

🔢 The numerical solution approach

Instead of exact solutions, use a numerical method that generates a sequence of numbers converging to the actual solution.
The excerpt mentions Newton's method for single-variable equations f(x) = 0 from single-variable calculus.
This section extends that idea to finding critical points of functions of two variables.

🧮 Newton's algorithm for two variables

🧮 The setup

Define:

D(x, y) = (∂²f/∂x²)(x, y) · (∂²f/∂y²)(x, y) − [(∂²f/∂y∂x)(x, y)]²

This is the same discriminant D used in the Second Derivative Test.

🧮 The iterative formulas

Newton's algorithm:

Pick an initial point (x₀, y₀).
For n = 0, 1, 2, 3, ..., define:
- x_{n+1} = x_n − [(∂²f/∂y²)(∂f/∂x) − (∂²f/∂x∂y)(∂f/∂y)] / D, with all derivatives and D evaluated at (x_n, y_n)
- y_{n+1} = y_n − [(∂²f/∂x²)(∂f/∂y) − (∂²f/∂x∂y)(∂f/∂x)] / D, with all derivatives and D evaluated at (x_n, y_n)
Each bracketed numerator is a 2×2 determinant written out in full.
For a reasonable initial point, the sequence of points (x_n, y_n) converges to a critical point.

Important constraint: D must not be zero at the initial point, or at any later point of the sequence, since each step divides by D(x_n, y_n).

🎯 Finding multiple critical points

If there are several critical points, you must try different initial points to find them.
Different starting points can lead to different critical points.

📝 Worked example: a quartic polynomial function

📝 The function and its derivatives

The example uses f(x, y) = x³ − xy − x + xy³ − y⁴.

First partial derivatives:

∂f/∂x = 3x² − y − 1 + y³
∂f/∂y = −x + 3xy² − 4y³

Second partial derivatives:

∂²f/∂x² = 6x
∂²f/∂y² = 6xy − 12y²
∂²f/∂y∂x = −1 + 3y²

Why numerical methods are needed here: solving ∇f = 0 involves two third-degree polynomial equations in x and y, which cannot be done easily.

📝 Choosing the initial point

Looking at the graph of z = f(x, y) over a large region may help, though it may be hard to tell where the critical points are.
The initial point must satisfy D ≠ 0.
For this example: D(0, 0) = (0)(0) − (−1)² = −1 ≠ 0, so (0, 0) is a valid initial point.

💻 Computer implementation

Because the computations are tedious and may require many iterations, the excerpt uses a computer program (written in Java).
The program performs 100 iterations of Newton's algorithm and prints each new point to observe convergence.
The code includes functions for each partial derivative (fx, fy, fxx, fyy, fxy) specific to the function f.
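
The source's Java listing is not reproduced here. The following is a minimal Python sketch of the same idea, assuming the update is the standard Newton step with D in the denominator. The function names fx, fy, fxx, fyy, fxy follow the description above; `D` and `newton` are illustrative names, not taken from the source.

```python
def fx(x, y):
    return 3 * x**2 - y - 1 + y**3


def fy(x, y):
    return -x + 3 * x * y**2 - 4 * y**3


def fxx(x, y):
    return 6 * x


def fyy(x, y):
    return 6 * x * y - 12 * y**2


def fxy(x, y):
    return -1 + 3 * y**2


def D(x, y):
    """Discriminant from the Second Derivative Test."""
    return fxx(x, y) * fyy(x, y) - fxy(x, y) ** 2


def newton(x, y, iterations=100):
    """Run a fixed number of Newton steps starting from (x, y)."""
    for _ in range(iterations):
        d = D(x, y)
        if d == 0:
            raise ZeroDivisionError("D vanished; choose another initial point")
        # Compute both updates from the old point before overwriting it.
        x_new = x - (fyy(x, y) * fx(x, y) - fxy(x, y) * fy(x, y)) / d
        y_new = y - (fxx(x, y) * fy(x, y) - fxy(x, y) * fx(x, y)) / d
        x, y = x_new, y_new
    return x, y


x, y = newton(0.0, 0.0)
print(x, y, fx(x, y), fy(x, y))
```

Calling `newton` with other starting points, such as `newton(-1.0, -1.0)`, is how different critical points would be explored.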

🎯 Results and interpretation

🎯 First critical point: starting from (0, 0)

Convergence behavior:

After 8 iterations, the sequence converged to (0.4711356343449874, −0.39636433796318005).
The partial derivatives at this point are extremely close to zero (on the order of 10⁻¹⁷).
D at this point is approximately −8.776 < 0, so by the Second Derivative Test, this is a saddle point.

🎯 Second critical point: starting from (−1, −1)

Convergence behavior:

After 18 iterations, the sequence converged to (−0.6703832459238667, 0.42501465652420045).
Both partial derivatives vanish at this point.
D ≈ 15.385 > 0 and ∂²f/∂x² ≈ −4.022 < 0, so this is a local maximum.

🎯 Third critical point: starting from (−5, −5)

Convergence behavior:

The algorithm yielded the critical point (−7.540962756992551, −5.595509445899435).
D < 0 at this point, making it a saddle point.

🎯 Summary table

Initial point	Critical point found	Type
(0, 0)	(0.471..., −0.396...)	Saddle point
(−1, −1)	(−0.670..., 0.425...)	Local maximum
(−5, −5)	(−7.541..., −5.596...)	Saddle point

Don't confuse: The same function can have multiple critical points of different types; the initial point determines which one the algorithm finds.
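
The classifications in the table can be re-checked directly from the Second Derivative Test. The sketch below is an illustration, not the source's program: it plugs the three reported points into the partial derivatives of f and applies the test. The helper names `gradient` and `classify` are hypothetical.

```python
def gradient(x, y):
    """First partial derivatives of f(x, y) = x^3 - xy - x + xy^3 - y^4."""
    return (3 * x**2 - y - 1 + y**3, -x + 3 * x * y**2 - 4 * y**3)


def classify(x, y):
    """Apply the Second Derivative Test at a critical point (x, y)."""
    fxx = 6 * x
    fyy = 6 * x * y - 12 * y**2
    fxy = -1 + 3 * y**2
    d = fxx * fyy - fxy**2
    if d < 0:
        return "saddle point"
    if d > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    return "inconclusive"


# Critical points as reported in the results above.
points = [
    (0.4711356343449874, -0.39636433796318005),
    (-0.6703832459238667, 0.42501465652420045),
    (-7.540962756992551, -5.595509445899435),
]

results = [classify(x, y) for x, y in points]
for (x, y), kind in zip(points, results):
    print((x, y), gradient(x, y), kind)
```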

🌐 Broader context and related methods

🌐 Theoretical foundations

The derivation of Newton's algorithm and the proof that it converges (given a "reasonable" choice for the initial point) requires techniques beyond the scope of this text.
The two-variable case described here is a special case of a more general algorithm applicable to functions of n ≥ 2 variables.

🌐 Nonlinear programming

Nonlinear programming: the field of study focused on finding global minima of functions of any number of variables.

In practical applications, global maxima and minima tend to be more interesting than local versions.
Any maximization problem can be turned into a minimization problem (by negating the function).
Many methods have been developed for this purpose.

🌐 Steepest descent technique

Steepest descent: a technique based on the idea that the negative gradient −∇f gives the direction of the fastest rate of decrease of a function f.

How it works:

Start from some initial point.
Move a certain amount in the direction of −∇f at that point.
The new location becomes your next point.
Repeat until you reach the point where f has its smallest value.

Variations:

There is a "pure" steepest descent method.
Many variations improve the rate of convergence, ease of calculation, etc.
Newton's algorithm can be interpreted as a modified steepest descent method.

Example: Starting from an initial point, you follow the direction where the function decreases most rapidly, step by step, until you reach a minimum.
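
A minimal sketch of the "pure" idea in Python, under stated assumptions: the test function f(x, y) = (x − 1)² + 2(y + 2)², the fixed step size 0.1, and the iteration count are all hypothetical choices made for illustration, not taken from the source. Practical methods usually choose the step size adaptively.

```python
def f(x, y):
    """Hypothetical bowl-shaped function with its minimum at (1, -2)."""
    return (x - 1) ** 2 + 2 * (y + 2) ** 2


def grad_f(x, y):
    """Gradient of f, computed by hand."""
    return 2 * (x - 1), 4 * (y + 2)


def steepest_descent(x, y, step=0.1, iterations=200):
    """Repeatedly move a fixed fraction of the way along -grad f."""
    for _ in range(iterations):
        gx, gy = grad_f(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


x_min, y_min = steepest_descent(0.0, 0.0)
print(x_min, y_min, f(x_min, y_min))
```

With a fixed step the method converges here because the function is a simple quadratic bowl; a step that is too large would overshoot and can diverge.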

Constrained Optimization: Lagrange Multipliers

2.7 Constrained Optimization: Lagrange Multipliers

🧭 Overview

🧠 One-sentence thesis

The Lagrange multiplier method provides a general technique for finding maxima and minima of functions when the variables must satisfy a constraint equation, by solving the equation "gradient of f equals lambda times gradient of g" instead of requiring direct substitution.

📌 Key points (3–5)

What constrained optimization means: finding maximum or minimum values of a function f(x, y) subject to a constraint equation g(x, y) = c.
The Lagrange multiplier method: solve the equation "gradient of f equals lambda times gradient of g" to find constrained critical points; the constant lambda is called the Lagrange multiplier.
When the method guarantees a solution: if the constraint equation describes a bounded set (including any hidden constraints from the problem context), then the constrained maximum or minimum must occur either at a point satisfying the gradient equation or at a boundary point.
Common confusion: the method gives only a necessary condition—whether a constrained critical point is actually a maximum or minimum depends on the nature of the problem and whether the constraint set is bounded.
What lambda tells you: the value of the Lagrange multiplier approximates how much the function value changes when the constant c in the constraint equation changes by 1.

🎯 The constrained optimization problem

🎯 What makes a problem "constrained"

Constrained optimization problem: Maximize (or minimize) f(x, y) (or f(x, y, z)) given that g(x, y) = c (or g(x, y, z) = c) for some constant c.

The equation g(x, y) = c is called the constraint equation.
Variables x and y are said to be constrained by g(x, y) = c.
Constrained maximum or minimum points: points (x, y) that are maxima or minima of f(x, y) with the condition that they satisfy the constraint equation.

🔄 The traditional substitution approach

Example: maximize the area f(x, y) = xy of a rectangle with perimeter 2x + 2y = 20.
If you can solve the constraint equation for one variable in terms of the other (e.g., y = 10 - x), substitute into f to get a single-variable function (f(x) = 10x - x²).
Then use single-variable calculus (derivatives, critical points) to find the maximum.
Limitation: this approach works only when you can explicitly solve for one variable in terms of the other, which is often not possible.

🧮 The Lagrange multiplier method

🧮 The core theorem

Theorem 2.7: Let f(x, y) and g(x, y) be smooth functions, and suppose that c is a scalar constant such that the gradient of g(x, y) is nonzero at every point (x, y) that satisfies g(x, y) = c. Then to solve the constrained optimization problem "Maximize (or minimize) f(x, y) given g(x, y) = c," find the points (x, y) that solve the equation "gradient of f(x, y) = lambda times gradient of g(x, y)" for some constant lambda (called the Lagrange multiplier). If there is a constrained maximum or minimum, then it must be such a point.

The method gives a necessary condition only—it tells you where constrained maxima or minima can occur, not that they do occur.
The theorem extends to functions of three variables: solve "gradient of f(x, y, z) = lambda times gradient of g(x, y, z)."

🔍 How to apply the method

Write down the constraint equation g(x, y) = c.
Set up the system of equations: partial derivative of f with respect to x = lambda times partial derivative of g with respect to x, and partial derivative of f with respect to y = lambda times partial derivative of g with respect to y.
Solve for lambda in both equations, then set those expressions equal to each other to find relationships between x and y.
Substitute back into the constraint equation to solve for the actual values of x and y.
These points are called constrained critical points.

Example: For the rectangle problem (maximize xy given 2x + 2y = 20):

Solving "gradient of f = lambda times gradient of g" gives y = 2 lambda and x = 2 lambda.
Setting y/2 = lambda = x/2 implies x = y.
Substituting into 2x + 2y = 20 gives x = 5, y = 5.

⚠️ When is a critical point actually a maximum or minimum?

The nature of the problem itself sometimes makes it clear (e.g., there must be a maximum area, and the minimum area is 0).
Key criterion: if the constraint equation g(x, y) = c (plus any hidden constraints) describes a bounded set B in R², then the constrained maximum or minimum will occur either at a point satisfying the gradient equation or at a boundary point of B.
Example: the constraint 2x + 2y = 20 describes a line (unbounded), but hidden constraints 0 ≤ x, y ≤ 10 (from the rectangle problem) restrict it to a line segment (bounded).
Don't confuse: a constrained critical point is not automatically a maximum or minimum—you must check boundedness or use the problem context.

📐 Worked examples

📐 Finding closest and farthest points on a circle

Problem: Find points on the circle x² + y² = 80 closest to and farthest from (1, 2).

Minimize (and maximize) f(x, y) = (x - 1)² + (y - 2)² given g(x, y) = x² + y² = 80.
Solving the gradient equation: 2(x - 1) = 2 lambda x and 2(y - 2) = 2 lambda y.
Note x ≠ 0 and y ≠ 0: x = 0 would force −2 = 0 in the first equation, and y = 0 would force −4 = 0 in the second.
Solving for lambda: (x - 1)/x = lambda = (y - 2)/y, which simplifies to y = 2x.
Substituting into x² + y² = 80 gives 5x² = 80, so x = ±4.
Constrained critical points: (4, 8) and (-4, -8).
Since f(4, 8) = 45 and f(-4, -8) = 125, and the circle is bounded, (4, 8) is closest and (-4, -8) is farthest.
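
A brute-force cross-check of this result, written as a Python sketch rather than anything from the source: sample the circle densely by angle and look for the smallest and largest values of f. The sample count `n` and the variable names are illustrative.

```python
import math

R = math.sqrt(80)  # radius of the circle x^2 + y^2 = 80


def f(x, y):
    """Squared distance from (x, y) to the point (1, 2)."""
    return (x - 1) ** 2 + (y - 2) ** 2


n = 100_000  # number of sample points around the circle
samples = [
    (R * math.cos(2 * math.pi * k / n), R * math.sin(2 * math.pi * k / n))
    for k in range(n)
]

closest = min(samples, key=lambda p: f(*p))
farthest = max(samples, key=lambda p: f(*p))
print(closest, f(*closest))
print(farthest, f(*farthest))
```

The sampled extremes land next to (4, 8) and (−4, −8), in agreement with the Lagrange multiplier calculation.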

📐 Three-variable example

Problem: Maximize (and minimize) f(x, y, z) = x + z given g(x, y, z) = x² + y² + z² = 1.

Solving the gradient equation: 1 = 2 lambda x, 0 = 2 lambda y, 1 = 2 lambda z.
The first equation implies lambda ≠ 0, so y = 0 from the second equation.
From the first and third equations: x = 1/(2 lambda) = z.
Substituting into x² + y² + z² = 1 gives constrained critical points (1/√2, 0, 1/√2) and (-1/√2, 0, -1/√2).
Since the constraint describes a sphere (bounded), the first point is the constrained maximum and the second is the constrained minimum.
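
The two candidate points can be verified numerically. The Python sketch below (the helper name `check` is hypothetical) confirms that each point lies on the sphere and satisfies the gradient equation with lambda = 1/(2x), then reports the value of f.

```python
import math


def check(x, y, z):
    """Return (gradient equation holds, point on sphere, f value)."""
    lam = 1 / (2 * x)  # from the first equation 1 = 2 * lambda * x
    grad_f = (1.0, 0.0, 1.0)  # gradient of f(x, y, z) = x + z
    grad_g = (2 * x, 2 * y, 2 * z)  # gradient of g = x^2 + y^2 + z^2
    parallel = all(
        math.isclose(a, lam * b, abs_tol=1e-12) for a, b in zip(grad_f, grad_g)
    )
    on_sphere = math.isclose(x * x + y * y + z * z, 1.0)
    return parallel, on_sphere, x + z


s = 1 / math.sqrt(2)
max_check = check(s, 0.0, s)
min_check = check(-s, 0.0, -s)
print(max_check, min_check)
```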

🔢 The meaning of lambda

🔢 Lambda as a rate of change

The value of lambda approximates the change in the function value f when the constant c in the constraint equation g(x, y) = c is changed by 1.
Notation: lambda ≈ Δf = f(new max. point) - f(old max. point).

🔢 Example from the rectangle problem

Original problem: maximize xy given 2x + 2y = 20. Solution: (5, 5), f(5, 5) = 25, lambda = 2.5.
Modified problem: maximize xy given 2x + 2y = 21. Solution: (5.25, 5.25), f(5.25, 5.25) = 27.5625.
Change in f: 27.5625 - 25 = 2.5625.
Notice lambda = 2.5 is close to 2.5625, confirming the approximation.
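
The comparison can be reproduced in a few lines of Python. This sketch assumes the relations y = 2 lambda and x = 2 lambda derived for the rectangle problem, so that the constraint 2x + 2y = c gives 8 lambda = c; the function name `solve_rectangle` is illustrative.

```python
def solve_rectangle(c):
    """Maximize xy subject to 2x + 2y = c via the Lagrange conditions.

    From y = 2*lam and x = 2*lam, the constraint becomes 8*lam = c.
    Returns (x, y, lam, f) at the constrained maximum.
    """
    lam = c / 8
    x = y = 2 * lam
    return x, y, lam, x * y


x0, y0, lam0, f0 = solve_rectangle(20)
x1, y1, lam1, f1 = solve_rectangle(21)
delta_f = f1 - f0

print(lam0, delta_f)  # 2.5 and 2.5625
```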

🛠️ Practical considerations

🛠️ Limitations of the method

Solving "gradient of f = lambda times gradient of g" means solving a system of two (possibly nonlinear) equations in three unknowns (x, y, lambda); the constraint equation supplies the third equation, but the combined system may still be impossible to solve analytically.
The three-variable case (x, y, z, lambda) can be even more complicated.
This restricts the usefulness of Lagrange's method to relatively simple functions.
Many numerical methods exist for solving constrained optimization problems, though they are not covered in this excerpt.

Double Integrals

3.1 Double Integrals

🧭 Overview

🧠 One-sentence thesis

Double integrals extend single-variable integration to functions of two variables by using iterated integrals to calculate volumes under surfaces, and they can be evaluated over rectangular or general regions using appropriate limits of integration.

📌 Key points (3–5)

What double integrals represent: For nonnegative functions f(x, y), the double integral represents the volume under the surface z = f(x, y) above a region in the xy-plane.
How to compute them: Use iterated integrals—integrate with respect to one variable while treating the other as constant, then integrate the result with respect to the second variable.
Order of integration: The order of iterated integrals generally does not matter (by Fubini's Theorem), so you can integrate with respect to x first then y, or vice versa.
Common confusion: The limits of integration for general regions depend on which variable you integrate first—inner limits may be functions of the outer variable.
Extension beyond volume: Double integrals work for any continuous function, not just nonnegative ones, representing signed volume (positive above the xy-plane, negative below).

📐 Building the concept from single-variable calculus

🔄 From antiderivatives to inverse operations

In single-variable calculus, integration and differentiation are inverse operations—to integrate f(x), find the antiderivative F(x) whose derivative is f(x).

The same inverse relationship holds for functions of two or more variables.
For functions of two variables, integrating with respect to y is the inverse of taking the partial derivative with respect to y.
This connection allows us to extend the integration concept to multiple dimensions.

📏 From area to volume

In single-variable calculus: the definite integral of f(x) ≥ 0 represents the area under the curve y = f(x).
In multivariable calculus: the double integral of f(x, y) ≥ 0 represents the volume under the surface z = f(x, y).
This geometric interpretation motivates the definition and provides intuition for the calculation method.

🔢 Computing double integrals over rectangles

🧮 The iterated integral method

For a continuous function f(x, y) ≥ 0 on a rectangle R = [a, b] × [c, d]:

Volume formula using iterated integrals:

V = integral from a to b of [integral from c to d of f(x, y) dy] dx
This equals: integral from c to d of [integral from a to b of f(x, y) dx] dy

How it works:

Inner integral: Integrate f(x, y) with respect to one variable (say y), treating the other variable (x) as a constant. This gives a function of x alone.
Outer integral: Integrate the resulting function with respect to the remaining variable (x).
The final result is a number representing the volume.

🔄 Order independence

You can integrate in either order: first with respect to x then y, or first with respect to y then x.
The result should be the same (this is guaranteed by Fubini's Theorem for continuous functions).
Choose the order that makes the calculation easier.

Example: For f(x, y) = 8x + 6y over [0, 1] × [0, 2], both orders give V = 20.
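
To see order independence numerically, here is a small Python sketch that nests a one-dimensional midpoint rule in both orders. The helper `integrate` and the choice of the midpoint rule are assumptions made for illustration; the source evaluates these integrals by hand.

```python
def integrate(g, lo, hi, n=200):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def f(x, y):
    return 8 * x + 6 * y


# Inner integral over y in [0, 2], outer integral over x in [0, 1].
dy_then_dx = integrate(lambda x: integrate(lambda y: f(x, y), 0, 2), 0, 1)

# Inner integral over x in [0, 1], outer integral over y in [0, 2].
dx_then_dy = integrate(lambda y: integrate(lambda x: f(x, y), 0, 1), 0, 2)

print(dy_then_dx, dx_then_dy)
```

Because f is linear in each variable, the midpoint rule is exact here up to rounding, so both orders reproduce V = 20.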

⚠️ Notation conventions

The notation "integral integral f(x, y) dx dy" means: integrate first with respect to x (the variable whose differential appears first), then with respect to y.
The limits of integration closest to the function correspond to the first integration.
Brackets are often omitted once you understand the convention.

🌐 Double integrals over general regions

🗺️ Beyond rectangles

For regions that are not rectangular, the limits of integration become functions rather than constants.

Vertical slices (integrating y first, then x):

Region R bounded by x = a, x = b, y = g₁(x), y = g₂(x)
Double integral of f(x, y) over R = integral from a to b of [integral from g₁(x) to g₂(x) of f(x, y) dy] dx
The inner limits g₁(x) and g₂(x) are functions of x.

Horizontal slices (integrating x first, then y):

Region R bounded by y = c, y = d, x = h₁(y), x = h₂(y)
Double integral of f(x, y) over R = integral from c to d of [integral from h₁(y) to h₂(y) of f(x, y) dx] dy
The inner limits h₁(y) and h₂(y) are functions of y.

📊 Choosing the slice direction

Both approaches should give the same answer if done correctly.
Choose the direction that makes the limits of integration simpler to express.
Sketch the region to determine appropriate bounds.

Example: For the region 0 ≤ x ≤ 1, 0 ≤ y ≤ 2x², you can use vertical slices with y from 0 to 2x² and x from 0 to 1, or horizontal slices with x from √(y/2) to 1 and y from 0 to 2.

🎯 The area element dA

The symbol dA represents an "infinitesimal" area element.
It reminds us we're integrating over a two-dimensional region.
In practice, dA becomes dx dy or dy dx depending on the order of integration.

➕ Signed volumes and general functions

⚖️ When f(x, y) changes sign

For any continuous function f(x, y), the double integral represents the difference between volume below the surface but above the xy-plane (where f ≥ 0) and volume above the surface but below the xy-plane (where f ≤ 0).

This parallels the single-variable case where integrals give signed area.
The method of iterated integrals works regardless of whether f(x, y) is always nonnegative.
The result may be positive, negative, or zero.

Example: The integral of sin(x + y) over [0, π] × [0, 2π] equals 0 because positive and negative contributions cancel.

🔧 Practical computation

Set up the iterated integral with appropriate limits.
Evaluate the inner integral, treating the outer variable as constant.
Evaluate the outer integral of the resulting expression.
The techniques are the same whether f is always positive or not.

📏 Formal definition for general regions

🔲 Approximation by rectangles

For a bounded region R in the xy-plane:

Enclose R in a rectangle [a, b] × [c, d].
Divide this rectangle into a grid of subrectangles.
Consider only subrectangles completely inside R.
In each such subrectangle, pick a point and evaluate f there.
Multiply by the area of the subrectangle to get approximate volume.
Sum over all subrectangles inside R.

🎯 Taking the limit

As the subrectangles become smaller (largest diagonal approaches 0), the approximation improves.
The double integral is defined as the limit of these sums.
This definition extends to functions that are not always nonnegative.
For regions of the type with functional boundaries, this reduces to iterated integrals.

♾️ Improper double integrals

The region R does not have to be bounded.
Functions may be undefined at some points in R.
Evaluate as a sequence of iterated improper single-variable integrals.

Example: The integral from 1 to ∞ of [integral from 0 to 1/x² of 2y dy] dx equals 1/3.

Double Integrals Over a General Region

3.2 Double Integrals Over a General Region

🧭 Overview

🧠 One-sentence thesis

Double integrals can be extended from rectangles to general regions by using vertical or horizontal slices bounded by curves, allowing calculation of volumes under surfaces over non-rectangular domains.

📌 Key points (3–5)

Extension from rectangles: Double integrals over general regions use the same slice method as rectangles, but with curve boundaries instead of constant limits.
Two slice orientations: Vertical slices integrate y first (with x-dependent limits), then x; horizontal slices integrate x first (with y-dependent limits), then y.
Order matters for limits: The inner integral's limits are functions of the outer variable; the outer integral's limits are constants.
Common confusion: Vertical vs horizontal slicing—vertical means slicing parallel to the y-axis (integrating dy first), horizontal means slicing parallel to the x-axis (integrating dx first).
General definition: For arbitrary regions, the double integral is defined as a limit of sums over subrectangles inside the region, reducing to iterated integrals for standard region types.

📐 Region types and slice methods

📐 Vertically-bounded regions

A region R bounded on the left by x = a, on the right by x = b (where a < b), below by a curve y = g₁(x), and above by a curve y = g₂(x).

The curves g₁(x) and g₂(x) do not intersect on the open interval (a, b).
They may intersect at the endpoints x = a and x = b.
Vertical slices: Take slices parallel to the y-axis.
The double integral becomes: integral from a to b of [integral from g₁(x) to g₂(x) of f(x,y) dy] dx.
Why this order: Integrate with respect to y first (with x-dependent limits), producing a function of x alone, then integrate with respect to x.

📏 Horizontally-bounded regions

A region R bounded on the left by a curve x = h₁(y), on the right by a curve x = h₂(y), below by y = c, and above by y = d (where c < d).

The curves h₁(y) and h₂(y) do not intersect on the open interval (c, d).
Horizontal slices: Take slices parallel to the x-axis.
The double integral becomes: integral from c to d of [integral from h₁(y) to h₂(y) of f(x,y) dx] dy.
Why this order: Integrate with respect to x first (with y-dependent limits), producing a function of y alone, then integrate with respect to y.

🔄 Choosing between methods

Both methods give the same answer for the same region and function.
Example from the excerpt: Volume under z = 8x + 6y over R = {(x,y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 2x²} computed both ways yields 32/5 = 6.4.
Vertical approach: Integrate from y = 0 to y = 2x², then x from 0 to 1.
Horizontal approach: Integrate from x = square root of (y/2) to x = 1, then y from 0 to 2.
Don't confuse: The region description determines which limits are functions and which are constants.

🧮 Notation and interpretation

🧮 Area element dA

The symbol dA is called an area element or infinitesimal, with the A signifying area.

It represents an infinitesimal piece of area in the region R.
In practice, dA is replaced by dy dx (for vertical slices) or dx dy (for horizontal slices).
The order of dy dx or dx dy indicates the order of integration.

📦 Volume interpretation

When f(x,y) ≥ 0 for all (x,y) in region R, the double integral over R gives the volume under the surface z = f(x,y) over the region R.
Example: Finding volume under the plane z = 8x + 6y over a parabolic region.
Example: Finding volume of a solid bounded by coordinate planes and a plane like 2x + y + 4z = 4.
The excerpt shows that the volume is computed by integrating the height function (z expressed in terms of x and y) over the base region R.

🔬 General definition for arbitrary regions

🔬 Approximation by subrectangles

For a bounded region R in R², enclose it in a rectangle [a,b] × [c,d].
Divide the rectangle into a grid of subrectangles.
Consider only subrectangles completely enclosed within R.
In each subrectangle [xᵢ, xᵢ₊₁] × [yⱼ, yⱼ₊₁], pick a point (xᵢ*, yⱼ*).
The volume under z = f(x,y) over that subrectangle is approximately f(xᵢ*, yⱼ*) Δxᵢ Δyⱼ, where Δxᵢ = xᵢ₊₁ - xᵢ and Δyⱼ = yⱼ₊₁ - yⱼ.
Interpretation: f(xᵢ*, yⱼ*) is the height, Δxᵢ Δyⱼ is the base area of a parallelepiped.

🎯 Limit definition

Total volume is approximately the double sum: sum over j of sum over i of f(xᵢ*, yⱼ*) Δxᵢ Δyⱼ.
The summation occurs over indices of subrectangles inside R.
Definition: The double integral over R of f(x,y) dA is the limit of this double summation as the largest diagonal of the subrectangles goes to 0.
As subrectangles become smaller, they fill more of region R, and the sum approaches the actual volume.
For non-negative functions, this gives volume; for general functions, replace volume by negative volume when f(x,y) < 0.
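
The limit definition can be imitated directly. The Python sketch below applies it to the region R = {(x, y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 2x²} and f(x, y) = 8x + 6y, keeping only grid cells whose four corners lie in R and sampling f at each cell's center. The names `in_region` and `riemann_sum` are hypothetical, and the corner test is an assumption that happens to be sufficient for this particular region, not for arbitrary ones.

```python
def f(x, y):
    return 8 * x + 6 * y


def in_region(x, y):
    """Membership test for R = {0 <= x <= 1, 0 <= y <= 2x^2}."""
    return 0 <= x <= 1 and 0 <= y <= 2 * x * x


def riemann_sum(n):
    """Sum f * dx * dy over the n-by-n grid cells lying entirely inside R."""
    a, b, c, d = 0.0, 1.0, 0.0, 2.0  # enclosing rectangle [a, b] x [c, d]
    dx, dy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x0, x1 = a + (b - a) * i / n, a + (b - a) * (i + 1) / n
        for j in range(n):
            y0, y1 = c + (d - c) * j / n, c + (d - c) * (j + 1) / n
            corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
            if all(in_region(x, y) for x, y in corners):
                total += f((x0 + x1) / 2, (y0 + y1) / 2) * dx * dy
    return total


coarse = riemann_sum(100)
fine = riemann_sum(400)
print(coarse, fine)  # both below 32/5 = 6.4, the finer grid being closer
```

The sums underestimate the volume because cells straddling the boundary curve are discarded; refining the grid shrinks that discarded strip.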

🌐 Connection to iterated integrals

For regions of the types shown (vertically or horizontally bounded), this general definition reduces to a sequence of two iterated integrals.
The region R does not have to be bounded.
Improper double integrals: Over unbounded regions or regions containing points where f(x,y) is undefined, evaluate as a sequence of iterated improper single-variable integrals.
Example: Integral from 1 to infinity of [integral from 0 to 1/x² of 2y dy] dx = 1/3.

🧩 Worked examples and techniques

🧩 Volume under a plane over a parabolic region

Problem: Find volume V under z = 8x + 6y over R = {(x,y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 2x²}.
Vertical slices: Integrate from y = 0 to y = 2x², then x from 0 to 1.
- First integral: (8xy + 3y²) evaluated from y = 0 to y = 2x² gives 16x³ + 12x⁴.
- Second integral: (4x⁴ + (12/5)x⁵) evaluated from 0 to 1 gives 4 + 12/5 = 32/5.
Horizontal slices: Integrate from x = square root of (y/2) to x = 1, then y from 0 to 2.
- First integral: (4x² + 6xy) evaluated from x = square root of (y/2) to x = 1 gives 4 + 6y - (2y + (6 / square root of 2) times y times square root of y).
- Simplifies to 4 + 4y - 3 times square root of 2 times y to the power of 3/2.
- Second integral: (4y + 2y² - (6 times square root of 2 / 5) times y to the power of 5/2) evaluated from 0 to 2 gives 32/5.
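
Both slice directions can be compared numerically. In the Python sketch below, the inner limits are passed as functions of the outer variable, mirroring the hand calculation; the midpoint-rule helper `integrate` is an illustrative assumption, not the source's method.

```python
import math


def integrate(g, lo, hi, n=400):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def f(x, y):
    return 8 * x + 6 * y


# Vertical slices: y runs from 0 to 2x^2, then x runs from 0 to 1.
vertical = integrate(
    lambda x: integrate(lambda y: f(x, y), 0, 2 * x * x), 0, 1
)

# Horizontal slices: x runs from sqrt(y/2) to 1, then y runs from 0 to 2.
horizontal = integrate(
    lambda y: integrate(lambda x: f(x, y), math.sqrt(y / 2), 1), 0, 2
)

print(vertical, horizontal)  # both close to 32/5 = 6.4
```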

🧩 Volume bounded by coordinate planes and a plane

Problem: Find volume V of solid bounded by three coordinate planes and 2x + y + 4z = 4.
Solve for z: z = (1/4)(4 - 2x - y).
The base region R in the xy-plane is {(x,y): 0 ≤ x ≤ 2, 0 ≤ y ≤ -2x + 4}.
Vertical slices: Integrate from y = 0 to y = -2x + 4, then x from 0 to 2.
- First integral: -(1/8)(4 - 2x - y)² evaluated from y = 0 to y = -2x + 4 gives (1/8)(4 - 2x)².
- Second integral: -(1/48)(4 - 2x)³ evaluated from 0 to 2 gives 64/48 = 4/3.

🧩 Improper double integral

Problem: Evaluate integral from 1 to infinity of [integral from 0 to 1/x² of 2y dy] dx.
First integral: y² evaluated from y = 0 to y = 1/x² gives 1/x⁴ = x to the power of -4.
Second integral: -(1/3) times x to the power of -3 evaluated from 1 to infinity gives 0 - (-1/3) = 1/3.
Don't confuse: This is an improper integral because the outer limit goes to infinity, but it converges to a finite value.

Triple Integrals

3.3 Triple Integrals

🧭 Overview

🧠 One-sentence thesis

Triple integrals extend the double integral concept to three dimensions, allowing us to calculate "hypervolume" under a three-dimensional hypersurface in four-dimensional space, and their evaluation reduces to iterated integrals over solids in three-dimensional space.

📌 Key points (3–5)

What a triple integral is: an extension of double integrals to real-valued functions f(x, y, z) over a solid S in three-dimensional space.
What it represents physically: the hypervolume under a three-dimensional hypersurface w = f(x, y, z) whose graph lies in four-dimensional space.
How to evaluate: through iterated integrals, with the simplest case being rectangular parallelepipeds and more complex cases involving surfaces and curves as boundaries.
Special case for volume: when the function is the constant 1, the triple integral gives the volume of the solid in three-dimensional space.
Common confusion: the order of integration matters for the limits (which can be functions of other variables), even though the final result doesn't depend on order for rectangular regions.

📐 Definition and construction

📐 Formal definition

Triple integral of f(x, y, z) over S, denoted by the integral of f(x, y, z) dV over S, is defined as the limit of the triple summation of f(x*, y*, z*) times delta-x times delta-y times delta-z.

The solid S is enclosed in a rectangular parallelepiped (a three-dimensional box).
The parallelepiped is divided into smaller subparallelepipeds with side lengths delta-x, delta-y, and delta-z.
In each subparallelepiped inside S, pick a point (x*, y*, z*).
The limit is taken over all divisions where the largest diagonal goes to zero.
The triple summation is over all subparallelepipeds inside S.
The limit does not depend on the choice of the enclosing rectangular parallelepiped.

🔤 Volume element notation

The symbol dV is called the volume element.

It represents the infinitesimal volume delta-x times delta-y times delta-z in the limit.
This notation parallels dA (area element) from double integrals.

🌐 Physical interpretation

🌐 Hypervolume in four dimensions

A double integral can be thought of as the volume under a two-dimensional surface.
A triple integral generalizes this: it represents the hypervolume under a three-dimensional hypersurface w = f(x, y, z) whose graph lies in four-dimensional space.
Example: just as a double integral over a region R gives the volume between the surface z = f(x, y) and the xy-plane, a triple integral gives the "four-dimensional volume" between the hypersurface w = f(x, y, z) and three-dimensional space.

📏 General notion of volume

The word "volume" is often used as a general term for the same concept in any n-dimensional object:
- Length in one-dimensional space
- Area in two-dimensional space
- Volume in three-dimensional space
- Hypervolume in four-dimensional space
Don't confuse: even though visualizing four-dimensional volume is difficult, the triple integral provides a concrete way to calculate it.

🔢 Evaluation methods

🔢 Simplest case: rectangular parallelepiped

When S is a rectangular parallelepiped [x₁, x₂] × [y₁, y₂] × [z₁, z₂], meaning S consists of all points (x, y, z) where x₁ ≤ x ≤ x₂, y₁ ≤ y ≤ y₂, z₁ ≤ z ≤ z₂:

The triple integral equals the iterated integral from z₁ to z₂, from y₁ to y₂, from x₁ to x₂ of f(x, y, z) dx dy dz.
The order of integration does not matter in this case.
This is the simplest case because all limits are constants.

Example: To evaluate the integral from 0 to 3, from 0 to 2, from 0 to 1 of (xy + z) dx dy dz:

First integrate with respect to x: one-half x² y + xz evaluated from x = 0 to x = 1 gives one-half y + z.
Then integrate with respect to y: one-quarter y² + yz evaluated from y = 0 to y = 2 gives 1 + 2z.
Finally integrate with respect to z: z + z² evaluated from z = 0 to z = 3 gives 12.
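The definition as a limit of triple sums can be tried directly on this example. The sketch below is a midpoint Riemann sum over the box; the function name `box_integral` and the subdivision count `n` are assumptions made for illustration.

```python
def box_integral(f, x_range, y_range, z_range, n=20):
    """Midpoint Riemann sum of f(x, y, z) over a rectangular box."""
    (x1, x2), (y1, y2), (z1, z2) = x_range, y_range, z_range
    dx, dy, dz = (x2 - x1) / n, (y2 - y1) / n, (z2 - z1) / n
    total = 0.0
    for i in range(n):
        x = x1 + (i + 0.5) * dx
        for j in range(n):
            y = y1 + (j + 0.5) * dy
            for k in range(n):
                z = z1 + (k + 0.5) * dz
                total += f(x, y, z)      # sample point (x*, y*, z*)
    return total * dx * dy * dz          # times delta-x delta-y delta-z

value = box_integral(lambda x, y, z: x * y + z, (0, 1), (0, 2), (0, 3))
print(value)  # close to 12
```

For this particular integrand the midpoint sum reproduces the exact value up to rounding, because xy + z is linear in each variable separately.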

🔢 More complicated case: bounded by surfaces

When S is bounded below by surface z = g₁(x, y), bounded above by surface z = g₂(x, y), y is bounded between two curves h₁(x) and h₂(x), and x varies between a and b:

The triple integral equals the iterated integral from a to b, from h₁(x) to h₂(x), from g₁(x, y) to g₂(x, y) of f(x, y, z) dz dy dx.
The first iterated integral (with respect to z) results in a function of x and y, since its limits are functions of x and y.
This leaves a double integral of the type learned in the previous section.
Don't confuse: the limits of integration can be functions of the remaining variables, unlike the rectangular case where all limits are constants.

Example: To evaluate the integral from 0 to 1, from 0 to 1 - x, from 0 to 2 - x - y of (x + y + z) dz dy dx:

First integrate with respect to z: (x + y)z + one-half z² evaluated from z = 0 to z = 2 - x - y.
Substitute the limits and simplify to get (x + y)(2 - x - y) + one-half (2 - x - y)².
Continue integrating with respect to y, then x, to get the final answer of seven-eighths.
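A numerical sketch of the same iterated integral, with limits passed as functions of the outer variables, may help confirm the value seven-eighths. The helper `triple_iterated` and its parameter names mirror the a, b, h₁, h₂, g₁, g₂ notation above but are otherwise hypothetical.

```python
def triple_iterated(f, a, b, h1, h2, g1, g2, n=60):
    """Midpoint estimate of the iterated integral of f dz dy dx with
    a <= x <= b, h1(x) <= y <= h2(x), g1(x, y) <= z <= g2(x, y)."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        y_lo, y_hi = h1(x), h2(x)
        dy = (y_hi - y_lo) / n
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            z_lo, z_hi = g1(x, y), g2(x, y)
            dz = (z_hi - z_lo) / n
            inner = sum(f(x, y, z_lo + (k + 0.5) * dz) for k in range(n))
            total += inner * dz * dy * dx
    return total

value = triple_iterated(
    lambda x, y, z: x + y + z,
    0.0, 1.0,                                   # x from 0 to 1
    lambda x: 0.0, lambda x: 1 - x,             # y from 0 to 1 - x
    lambda x, y: 0.0, lambda x, y: 2 - x - y,   # z from 0 to 2 - x - y
)
print(value)  # close to 7/8 = 0.875
```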

🔄 Variations in order

There are many variations on the complicated case, such as changing the roles of the variables x, y, z.
Triple integrals can be quite tricky due to these variations.
At this stage, the most important skill is learning how to evaluate a triple integral, regardless of what it represents.

📦 Volume calculation

📦 Volume formula

The volume V of a solid in three-dimensional space is given by the triple integral over S of 1 dV.

Since the function being integrated is the constant 1, the triple integral reduces to simpler forms.
If the solid is bounded above by surface z = f(x, y) and bounded below by the xy-plane z = 0, the triple integral reduces to a double integral.

📦 General volume case

When the solid is bounded below and above by surfaces z = g₁(x, y) and z = g₂(x, y), with y bounded between curves h₁(x) and h₂(x), and x varies between a and b:

Volume V equals the integral from a to b, from h₁(x) to h₂(x), from g₁(x, y) to g₂(x, y) of 1 dz dy dx.
This simplifies to the integral from a to b, from h₁(x) to h₂(x) of (g₂(x, y) - g₁(x, y)) dy dx.
The innermost integral with respect to z simply evaluates to the difference between the upper and lower bounding surfaces.

Example: To find the volume of a solid bounded by three coordinate planes, bounded above by the plane x + y + z = 2, and bounded below by the plane z = x + y:

Set up the triple integral of 1 dV with appropriate limits.
The z-limits are from x + y (lower surface) to 2 - x - y (rearranged upper plane equation).
After integrating with respect to z, you get (2 - x - y) - (x + y) = 2 - 2x - 2y.
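The two planes meet where x + y = 2 - x - y, that is, along x + y = 1, so the base region is 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 - x.
Integrating 2 - 2x - 2y over that region gives (1 - x)² after the y-integration and then V = 1/3.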
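The reduction of the volume integral to a double integral of g₂ - g₁ can be sketched numerically for this example. The base region 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 - x used below is derived from where the two planes intersect (x + y = 1); the helper name `volume_between` is an assumption.

```python
def volume_between(g1, g2, a, b, h1, h2, n=200):
    """Midpoint estimate of the double integral of g2(x, y) - g1(x, y)
    for a <= x <= b and h1(x) <= y <= h2(x)."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        y_lo, y_hi = h1(x), h2(x)
        dy = (y_hi - y_lo) / n
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            total += (g2(x, y) - g1(x, y)) * dy * dx
    return total

volume = volume_between(
    lambda x, y: x + y,               # lower surface z = x + y
    lambda x, y: 2 - x - y,           # upper surface z = 2 - x - y
    0.0, 1.0,
    lambda x: 0.0, lambda x: 1 - x,   # base region derived from x + y = 1
)
print(volume)  # close to 1/3
```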

Numerical Approximation of Multiple Integrals

3.4 Numerical Approximation of Multiple Integrals

🧭 Overview

🧠 One-sentence thesis

The Monte Carlo method approximates multiple integrals by averaging function values at many random points in the region, converging to the true value as the number of points increases.

📌 Key points (3–5)

Why numerical methods are needed: complicated functions may not have simple closed-form iterated integrals.
Core idea: use the average value of a function over a region, multiplied by the region's area/volume, to approximate the integral.
How Monte Carlo works: generate many random points, compute the function at those points, average the results, then scale by the region's measure.
Common confusion: the error term (±) does not give hard bounds—it represents one standard deviation (a likely bound), not a guarantee.
Convergence: as the number of random points N increases, the approximation converges to the actual integral value.

📐 Average value and integral relationship

📐 Average value in one variable

For a continuous function f(x), the average value f-bar of f over an interval [a, b] is defined as f-bar = (1 / (b - a)) times the integral from a to b of f(x) dx.

The quantity (b - a) is the length of the interval, thought of as the "volume" of the interval.
Rearranged, the definition says that the integral equals the average value times the measure (here, the length) of the domain.

📏 Average value in two and three variables

The average value of f(x, y) over a region R is f-bar = (1 / A(R)) times the double integral over R of f(x, y) dA, where A(R) is the area of R.

The average value of f(x, y, z) over a solid S is f-bar = (1 / V(S)) times the triple integral over S of f(x, y, z) dV, where V(S) is the volume of S.

Rearranging: the double integral over R equals A(R) times f-bar.
This relationship is the foundation of the Monte Carlo method.

🔄 From infinite to finite points

The average value conceptually represents the sum of all function values divided by the number of points in R.
Problem: any region contains uncountably many points (they cannot be listed in a discrete sequence).
Solution: take a very large number N of random points, compute the average of f at those points, and use that as an approximation of f-bar.

🎲 The Monte Carlo method

🎲 Basic formula for double integrals

The approximation is:

Double integral over R of f(x, y) dA ≈ A(R) times f-bar ± A(R) times the square root of ((f-squared-bar minus (f-bar)²) / N)

where:

f-bar = (sum from i=1 to N of f(x_i, y_i)) / N
f-squared-bar = (sum from i=1 to N of (f(x_i, y_i))²) / N
The sums are taken over N random points (x₁, y₁), ..., (x_N, y_N) in R.

📊 Understanding the error term

The ± term does not provide hard bounds on the approximation.
It represents a single standard deviation from the expected value of the integral.
This provides a likely bound on the error, not a guarantee.
Don't confuse: this is a probabilistic method (uses random points), not a deterministic method (like Newton's method, which uses a specific formula).

🔢 Example: volume under a plane

The excerpt demonstrates approximating the volume under z = 8x + 6y over the rectangle R = [0,1] × [0,2].

The actual volume (computed earlier in the text) is 20.
A Java program generates N random points in [0,1] × [0,2], computes f = 8x + 6y at each point, and applies the Monte Carlo formula.
Results show convergence:
- N = 10: 19.37 ± 2.73
- N = 100: 21.33 ± 0.75
- N = 1000: 19.81 ± 0.27
- N = 10000: 20.08 ± 0.08
- N = 100000: 20.01 ± 0.03
- N = 1000000: 20.00 ± 0.008
As N increases, the approximation approaches 20 and the error term shrinks.
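The text's program is written in Java; the Python sketch below is a stand-in that applies the same formula, not a reproduction of that program. The function name `monte_carlo_double` and the fixed seed are assumptions, and the printed numbers will differ from the table above because the random points differ.

```python
import math
import random

def monte_carlo_double(f, x_range, y_range, n, seed=0):
    """Return (estimate, error) for the double integral of f over a rectangle.

    The error is the one-standard-deviation term from the formula,
    not a hard bound.
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    (x1, x2), (y1, y2) = x_range, y_range
    area = (x2 - x1) * (y2 - y1)
    total = total_sq = 0.0
    for _ in range(n):
        value = f(rng.uniform(x1, x2), rng.uniform(y1, y2))
        total += value
        total_sq += value * value
    f_bar = total / n
    f_sq_bar = total_sq / n
    variance = max(f_sq_bar - f_bar * f_bar, 0.0)   # guard against rounding
    return area * f_bar, area * math.sqrt(variance / n)

results = {}
for n in (10, 100, 1000, 10000, 100000):
    results[n] = monte_carlo_double(lambda x, y: 8 * x + 6 * y, (0, 1), (0, 2), n)
    print(f"N = {n}: {results[n][0]:.2f} +/- {results[n][1]:.2f}")
```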

📈 Convergence behavior

As N approaches infinity, the Monte Carlo approximation converges to the actual volume.
The error shrinks on the order of O(1/√N): each tenfold increase in N reduces the error term by a factor of about √10 ≈ 3.2.

🔷 Handling nonrectangular regions

🔷 Enclosing rectangle technique

For a nonrectangular (bounded) region R:

Pick a rectangle R-tilde that encloses R.
Generate random points in R-tilde as before.
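Add a point's function value to the sums for f-bar and f-squared-bar only if the point is inside R; points outside R contribute 0, but N still counts every generated point.
There is no need to calculate the area of R: use the area of the enclosing rectangle R-tilde in the formula instead.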

🔷 Example: nonrectangular region

The excerpt demonstrates the region R = {(x, y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 2x²}.

The actual volume under z = 8x + 6y over this region is 6.4.
The enclosing rectangle is R-tilde = [0,1] × [0,2].
The program generates random points (x, y) in [0,1] × [0,2] and checks if y < 2x².
Only points satisfying the condition are used in the sum.
Results show convergence:
- N = 10: 6.96 ± 2.92
- N = 100: 6.31 ± 0.95
- N = 1000: 6.48 ± 0.32
- N = 10000: 6.35 ± 0.10
- N = 100000: 6.44 ± 0.03
- N = 1000000: 6.42 ± 0.01
The approximation converges to 6.4 as N increases.

🔷 Why the enclosing rectangle works

Excluding points outside R effectively scales the calculation to the correct region.
The ratio of points inside R to total points approximates the ratio of areas.
Using the enclosing rectangle's area compensates for this ratio automatically.
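A minimal sketch of the enclosing-rectangle technique follows, assuming a membership test `inside(x, y)` supplied by the caller. The names `monte_carlo_region` and `inside` are hypothetical. Note that the sums are divided by all N generated points, not only by the points that fall inside R.

```python
import math
import random

def monte_carlo_region(f, inside, x_range, y_range, n, seed=0):
    """Estimate the double integral of f over the part of the enclosing
    rectangle where inside(x, y) is true. Returns (estimate, error)."""
    rng = random.Random(seed)
    (x1, x2), (y1, y2) = x_range, y_range
    rect_area = (x2 - x1) * (y2 - y1)      # area of the enclosing rectangle
    total = total_sq = 0.0
    for _ in range(n):
        x = rng.uniform(x1, x2)
        y = rng.uniform(y1, y2)
        if inside(x, y):                   # points outside R contribute 0
            value = f(x, y)
            total += value
            total_sq += value * value
    f_bar = total / n                      # divide by ALL n points
    f_sq_bar = total_sq / n
    variance = max(f_sq_bar - f_bar * f_bar, 0.0)
    return rect_area * f_bar, rect_area * math.sqrt(variance / n)

estimate, error = monte_carlo_region(
    lambda x, y: 8 * x + 6 * y,
    lambda x, y: y < 2 * x * x,            # R: 0 <= y <= 2x^2
    (0, 1), (0, 2), 100000,
)
print(f"{estimate:.2f} +/- {error:.2f}")   # estimate should be near 6.4
```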

🧊 Extension to triple integrals

🧊 Adapting the method

To evaluate triple integrals using Monte Carlo:

Generate random triples (x, y, z) in a parallelepiped (3D box) instead of random pairs (x, y) in a rectangle.
Use the volume of the parallelepiped instead of the area of a rectangle in the approximation formula.
The same convergence behavior applies.

🧊 Example applications

The exercises suggest:

Approximating the triple integral of e^(xyz) over the unit cube [0,1] × [0,1] × [0,1].
Approximating volumes of solids like spheres and ellipsoids.
Handling non-box regions by enclosing them in a parallelepiped and checking point membership.
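The first exercise can be sketched as follows, assuming the same formula with the box volume in place of the rectangle area. The function name `monte_carlo_box` and the list-of-bounds interface are illustrative choices.

```python
import math
import random

def monte_carlo_box(f, bounds, n, seed=0):
    """Monte Carlo estimate of the integral of f over a box.

    bounds is a list of (lo, hi) pairs, one per coordinate.
    Returns (estimate, one-standard-deviation error).
    """
    rng = random.Random(seed)
    volume = 1.0
    for lo, hi in bounds:
        volume *= hi - lo
    total = total_sq = 0.0
    for _ in range(n):
        point = [rng.uniform(lo, hi) for lo, hi in bounds]
        value = f(*point)
        total += value
        total_sq += value * value
    f_bar = total / n
    variance = max(total_sq / n - f_bar * f_bar, 0.0)
    return volume * f_bar, volume * math.sqrt(variance / n)

estimate, error = monte_carlo_box(
    lambda x, y, z: math.exp(x * y * z),
    [(0, 1), (0, 1), (0, 1)],      # unit cube
    100000,
)
print(f"{estimate:.4f} +/- {error:.4f}")
```

For a non-box solid such as a sphere, the same idea as in the two-dimensional case applies: add a membership check and let points outside the solid contribute 0.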

Change of Variables in Multiple Integrals

3.5 Change of Variables in Multiple Integrals

🧭 Overview

🧠 One-sentence thesis

The change of variables formula extends single-variable substitution to multiple integrals by transforming regions and including a Jacobian determinant factor, making otherwise impossible integrals tractable.

📌 Key points (3–5)

Core mechanism: Transform coordinates from (x, y) or (x, y, z) to new variables (u, v) or (u, v, w) using a one-to-one mapping, then multiply the integrand by the absolute value of the Jacobian determinant.
The Jacobian: A determinant of partial derivatives that measures how area or volume elements scale under the coordinate transformation; it replaces the derivative factor g′(u) from single-variable substitution.
Common confusion: The Jacobian must be computed in the direction from new variables to old (e.g., ∂(x, y)/∂(u, v)), not the reverse; also, you must take the absolute value of the Jacobian in the formula.
Polar, cylindrical, and spherical coordinates: These are special cases of the change of variables formula with known Jacobians (r, r, and ρ² sin φ respectively).
Why it matters: Many integrals that are impossible in Cartesian coordinates become straightforward after a suitable change of variables.

🔄 From single-variable substitution to multiple integrals

🔄 Single-variable substitution revisited

The excerpt reviews the familiar substitution method from single-variable calculus to motivate the multiple-variable case.

Given an integral like ∫₁² x³√(x² − 1) dx, you substitute u = x² − 1 and du = 2x dx.
Key insight from the excerpt: Think of this as defining x as a function of u, namely x = g(u) = √(u + 1), which is one-to-one and maps [0, 3] onto [1, 2].
The general formula is:

∫ₐᵇ f(x) dx = ∫_{g⁻¹(a)}^{g⁻¹(b)} f(g(u)) g′(u) du
The factor g′(u) du replaces dx; this is the prototype for the Jacobian in multiple integrals.

Why this perspective matters: The excerpt emphasizes that substitution is really about a one-to-one mapping with a nonzero derivative, which generalizes naturally to higher dimensions.
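The substitution formula can be checked numerically on the example above. The sketch below compares a midpoint estimate of the original integral with one of the transformed integral, using g(u) = √(u + 1) and g′(u) = 1/(2√(u + 1)). The helper `midpoint` is an assumption, and the closed form (14/5)√3 comes from integrating (1/2)(u + 1)√u from 0 to 3 by hand.

```python
import math

def midpoint(f, a, b, n=100000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

f = lambda x: x**3 * math.sqrt(x * x - 1)         # original integrand
g = lambda u: math.sqrt(u + 1)                    # x = g(u)
g_prime = lambda u: 1 / (2 * math.sqrt(u + 1))    # g'(u)

lhs = midpoint(f, 1.0, 2.0)                                # integral in x
rhs = midpoint(lambda u: f(g(u)) * g_prime(u), 0.0, 3.0)   # integral in u
exact = 14 * math.sqrt(3) / 5

print(lhs, rhs, exact)  # all three should agree to several digits
```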

🧩 The change of variables formula for multiple integrals

Change of Variables Formula (Double Integral): If x = x(u, v) and y = y(u, v) define a one-to-one mapping from region R′ in the uv-plane onto region R in the xy-plane, and the Jacobian J(u, v) is never zero in R′, then
∬_R f(x, y) dA(x, y) = ∬_{R′} f(x(u, v), y(u, v)) |J(u, v)| dA(u, v).

The Jacobian J(u, v) is the determinant of the matrix of partial derivatives:
- J(u, v) = | ∂x/∂u ∂x/∂v |
  | ∂y/∂u ∂y/∂v |
Notation: The Jacobian is also written as ∂(x, y)/∂(u, v).
The formula says dA(x, y) = |J(u, v)| dA(u, v), analogous to dx = g′(u) du in one variable.

Don't confuse: The Jacobian is the determinant of partial derivatives of the old coordinates with respect to the new ones, not the other way around.

🧊 Triple integrals

Change of Variables Formula (Triple Integral): If x = x(u, v, w), y = y(u, v, w), z = z(u, v, w) define a one-to-one mapping from solid S′ onto solid S, and the Jacobian J(u, v, w) is never zero in S′, then
∭_S f(x, y, z) dV(x, y, z) = ∭_{S′} f(x(u, v, w), y(u, v, w), z(u, v, w)) |J(u, v, w)| dV(u, v, w).

The Jacobian is now a 3×3 determinant:
- J(u, v, w) = | ∂x/∂u ∂x/∂v ∂x/∂w |
  | ∂y/∂u ∂y/∂v ∂y/∂w |
  | ∂z/∂u ∂z/∂v ∂z/∂w |
Also written as ∂(x, y, z)/∂(u, v, w).

📐 Worked example: exponential integral with linear substitution

📐 The problem and substitution choice

Example: Evaluate ∬_R e^((x−y)/(x+y)) dA, where R = {(x, y) : x ≥ 0, y ≥ 0, x + y ≤ 1}.

The excerpt notes that this integral is "probably impossible" without substitution.
Strategy: The exponent suggests u = x − y and v = x + y.
Solving for x and y gives x = (u + v)/2 and y = (v − u)/2.

🗺️ Mapping the region

The original region R is a triangle in the xy-plane with vertices (0, 0), (1, 0), and (0, 1).
The mapping transforms R into R′ in the uv-plane.
The excerpt shows that R′ is bounded by u = v, u = −v, and v = 1.
One-to-one: The mapping is one-to-one from R′ onto R.

🧮 Computing the Jacobian

- J(u, v) = | ∂x/∂u ∂x/∂v | = | 1/2 1/2 |
  | ∂y/∂u ∂y/∂v |   | −1/2 1/2 |
- J(u, v) = (1/2)(1/2) − (1/2)(−1/2) = 1/2.
Therefore |J(u, v)| = 1/2.

✅ Evaluating the integral

The integral becomes:
- ∬_{R′} e^(u/v) · (1/2) dA
- = ∫₀¹ ∫_{−v}^v e^(u/v) · (1/2) du dv
- = ∫₀¹ (v/2) e^(u/v) |_{u=−v}^{u=v} dv
- = ∫₀¹ (v/2)(e − e⁻¹) dv
- = (v²/4)(e − e⁻¹) |₀¹
- = (e² − 1)/(4e).

Key takeaway: The substitution turned an impossible integral into a straightforward calculation.
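As a sanity check on the worked example, the sketch below estimates the original integral over the triangle R and the transformed integral over R′ with a midpoint rule, then compares both with (e² − 1)/(4e). The helper `iterated_midpoint` is hypothetical. The original integrand is bounded but not continuous at the origin, so its estimate is expected to be the less accurate of the two.

```python
import math

def iterated_midpoint(f, outer_lo, outer_hi, inner_lo, inner_hi, n=400):
    """Midpoint estimate of an iterated integral; f takes (outer, inner)."""
    h_outer = (outer_hi - outer_lo) / n
    total = 0.0
    for i in range(n):
        s = outer_lo + (i + 0.5) * h_outer
        lo, hi = inner_lo(s), inner_hi(s)
        h_inner = (hi - lo) / n
        inner_sum = sum(f(s, lo + (j + 0.5) * h_inner) for j in range(n))
        total += inner_sum * h_inner * h_outer
    return total

# Original integral: x from 0 to 1, y from 0 to 1 - x.
original = iterated_midpoint(
    lambda x, y: math.exp((x - y) / (x + y)),
    0.0, 1.0, lambda x: 0.0, lambda x: 1.0 - x)

# Transformed integral: v from 0 to 1, u from -v to v, with |J| = 1/2.
transformed = iterated_midpoint(
    lambda v, u: 0.5 * math.exp(u / v),
    0.0, 1.0, lambda v: -v, lambda v: v)

exact = (math.e ** 2 - 1) / (4 * math.e)
print(original, transformed, exact)
```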

🌐 Special cases: polar, cylindrical, and spherical coordinates

🌐 Polar coordinates

Double Integral in Polar Coordinates:
∬_R f(x, y) dx dy = ∬_{R′} f(r cos θ, r sin θ) r dr dθ,
where x = r cos θ, y = r sin θ maps R′ in the rθ-plane onto R in the xy-plane.

The Jacobian is:
- J(r, θ) = | cos θ −r sin θ |
  | sin θ r cos θ |
- J(r, θ) = r cos² θ + r sin² θ = r.
Therefore |J(r, θ)| = r (since r ≥ 0 in polar coordinates).
Interpretation: The area element dA(x, y) becomes r dr dθ.

Example from the excerpt: Find the volume inside the paraboloid z = x² + y² for 0 ≤ z ≤ 1.

Using vertical slices: V = ∬_R (1 − z) dA = ∬_R (1 − (x² + y²)) dA.
R is the unit disk x² + y² ≤ 1, which is R′ = {(r, θ) : 0 ≤ r ≤ 1, 0 ≤ θ ≤ 2π} in polar coordinates.
V = ∫₀^(2π) ∫₀¹ (1 − r²) r dr dθ = ∫₀^(2π) ∫₀¹ (r − r³) dr dθ = ∫₀^(2π) (1/4) dθ = π/2.

Another example: Volume inside the cone z = √(x² + y²) for 0 ≤ z ≤ 1.

V = ∬_R (1 − √(x² + y²)) dA = ∫₀^(2π) ∫₀¹ (1 − r) r dr dθ = π/3.
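Both volumes can be checked with a small polar-grid sketch that includes the Jacobian factor r explicitly. The function name `polar_integral` and the grid size are assumptions.

```python
import math

def polar_integral(f, r_max, n=400):
    """Midpoint estimate of the double integral of f(x, y) over the disk
    x^2 + y^2 <= r_max^2, computed in polar coordinates."""
    dr = r_max / n
    dtheta = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            total += f(x, y) * r          # Jacobian factor r
    return total * dr * dtheta

paraboloid = polar_integral(lambda x, y: 1 - (x * x + y * y), 1.0)
cone = polar_integral(lambda x, y: 1 - math.sqrt(x * x + y * y), 1.0)

print(paraboloid, math.pi / 2)  # paraboloid volume vs. pi/2
print(cone, math.pi / 3)        # cone volume vs. pi/3
```

Dropping the factor `r` from the sum would give a visibly different number, which is a quick way to see why the Jacobian cannot be omitted.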

🧊 Cylindrical coordinates

Triple Integral in Cylindrical Coordinates:
∭_S f(x, y, z) dx dy dz = ∭_{S′} f(r cos θ, r sin θ, z) r dr dθ dz,
where x = r cos θ, y = r sin θ, z = z maps S′ in rθz-space onto S in xyz-space.

The Jacobian is r (the same as in polar coordinates, since z is unchanged).
When to use: Problems with cylindrical symmetry (e.g., cylinders, cones with vertical axes).

🌍 Spherical coordinates

Triple Integral in Spherical Coordinates:
∭_S f(x, y, z) dx dy dz = ∭_{S′} f(ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ) ρ² sin φ dρ dφ dθ,
where x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ maps S′ in ρφθ-space onto S in xyz-space.

The Jacobian is ρ² sin φ.
When to use: Problems with spherical symmetry (e.g., spheres, cones with vertex at the origin).

Example from the excerpt: Find the volume inside the sphere x² + y² + z² = a² for a > 0.

In spherical coordinates, the sphere is ρ = a.
V = ∭_S 1 dV = ∫₀^(2π) ∫₀^π ∫₀^a ρ² sin φ dρ dφ dθ.
= ∫₀^(2π) ∫₀^π (a³/3) sin φ dφ dθ = ∫₀^(2π) (2a³/3) dθ = 4πa³/3.

Don't confuse: The Jacobian for spherical coordinates is ρ² sin φ, not just ρ²; the sin φ factor comes from the geometry of the coordinate system.

🔑 Key requirements and common pitfalls

🔑 One-to-one mapping and nonzero Jacobian

The mapping must be one-to-one from R′ (or S′) onto R (or S).
The Jacobian must never be zero in the interior of R′ (or S′).
Why: These conditions ensure that the transformation is invertible and that area/volume elements are well-defined.

⚠️ Absolute value of the Jacobian

The formula uses |J(u, v)| or |J(u, v, w)|, not just J.
Why: The Jacobian can be negative (indicating orientation reversal), but area and volume are always positive.
Example: In the worked example, J(u, v) = 1/2, so |J(u, v)| = 1/2.

🧭 Direction of the Jacobian

The Jacobian is computed as ∂(x, y)/∂(u, v), meaning old coordinates with respect to new.
Common mistake: Computing ∂(u, v)/∂(x, y) instead; this gives the reciprocal of the correct Jacobian. The same reciprocal relationship holds in 3D, since the two Jacobian matrices are inverses of each other.
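The effect of computing the Jacobian in the wrong direction can be seen numerically. The sketch below estimates 2×2 Jacobian determinants by central differences for the linear substitution from the worked example and for polar coordinates. The helper `jacobian_det`, the step size `h`, and the sample points are all assumptions chosen for illustration.

```python
import math

def jacobian_det(mapping, p, q, h=1e-6):
    """Central-difference estimate of the determinant d(a, b)/d(p, q)
    for a mapping (p, q) -> (a, b)."""
    a_p = (mapping(p + h, q)[0] - mapping(p - h, q)[0]) / (2 * h)
    a_q = (mapping(p, q + h)[0] - mapping(p, q - h)[0]) / (2 * h)
    b_p = (mapping(p + h, q)[1] - mapping(p - h, q)[1]) / (2 * h)
    b_q = (mapping(p, q + h)[1] - mapping(p, q - h)[1]) / (2 * h)
    return a_p * b_q - a_q * b_p

# Linear substitution: x = (u + v)/2, y = (v - u)/2, inverse u = x - y, v = x + y.
to_xy = lambda u, v: ((u + v) / 2, (v - u) / 2)
to_uv = lambda x, y: (x - y, x + y)
forward = jacobian_det(to_xy, 0.3, 0.8)                 # d(x, y)/d(u, v)
backward = jacobian_det(to_uv, *to_xy(0.3, 0.8))        # d(u, v)/d(x, y)

# Polar coordinates at r = 2, theta = 0.7.
polar_to_xy = lambda r, t: (r * math.cos(t), r * math.sin(t))
xy_to_polar = lambda x, y: (math.hypot(x, y), math.atan2(y, x))
forward_polar = jacobian_det(polar_to_xy, 2.0, 0.7)     # should be near r = 2
backward_polar = jacobian_det(xy_to_polar, *polar_to_xy(2.0, 0.7))

print(forward, backward)              # near 0.5 and 2
print(forward_polar, backward_polar)  # near 2 and 0.5
```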

📊 Summary table of Jacobians

Coordinate system	Transformation	Jacobian
Polar (2D)	x = r cos θ, y = r sin θ	r
Cylindrical (3D)	x = r cos θ, y = r sin θ, z = z	r
Spherical (3D)	x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ	ρ² sin φ

Application: Center of Mass

3.6 Application: Center of Mass

🧭 Overview

🧠 One-sentence thesis

The center of mass of a region or solid is computed using double or triple integrals that weight position coordinates by the density function, generalizing single-variable calculus formulas to arbitrary shapes with variable density.

📌 Key points (3–5)

What center of mass represents: the balance point of a region or solid, weighted by how mass is distributed (density function).
How to compute it: use moments (integrals of position times density) divided by total mass.
Uniform vs variable density: when density is constant, the center of mass is called the centroid; variable density shifts the center toward denser areas.
Common confusion: moments M_x and M_y are named by the axis they are "about," not the variable integrated—M_x involves y, M_y involves x.
2D vs 3D generalization: the same principle extends from planar regions (double integrals) to solids (triple integrals) with three moment components.

📐 Center of mass in two dimensions

📍 Definition and formulas

Center of mass of a region R: the point (x̄, ȳ) given by x̄ = M_y / M and ȳ = M_x / M.

Where:

M_x = double integral over R of y δ(x, y) dA (moment about the x-axis)
M_y = double integral over R of x δ(x, y) dA (moment about the y-axis)
M = double integral over R of δ(x, y) dA (total mass)
δ(x, y) is the density function at point (x, y)

Why these formulas work:

Think of dividing R into tiny rectangles of size Δx by Δy
Each rectangle has approximate mass δ(x*, y*) Δx Δy
The total mass is the limit of sums of these masses → the double integral
Moments weight each piece by its distance from an axis

🔄 Special case: uniform density

When density δ(x, y) = 1 throughout R:

The formulas reduce to the single-variable calculus versions
For R = {(x, y): a ≤ x ≤ b, 0 ≤ y ≤ f(x)}:
- M_x = integral from a to b of (f(x))²/2 dx
- M_y = integral from a to b of x f(x) dx
- M = integral from a to b of f(x) dx (just the area)

Centroid: the center of mass when density is constant.

📊 How density affects location

Example from the excerpt: Region R = {(x, y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 2x²} with δ(x, y) = x + y.

Computed center of mass: (22/27, 50/63)
With uniform density: (3/4, 3/5)
Why the shift? The density x + y is largest near the upper-right corner (1, 2) of the region, so more mass concentrates there, pulling the center of mass in that direction.
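The moment formulas can be evaluated numerically for this region to reproduce both centers of mass. The sketch below assumes a region of the form a ≤ x ≤ b, 0 ≤ y ≤ top(x); the names `center_of_mass` and `top` are hypothetical.

```python
def center_of_mass(density, a, b, top, n=400):
    """Midpoint estimate of (x_bar, y_bar) for the region
    a <= x <= b, 0 <= y <= top(x) with the given density(x, y)."""
    dx = (b - a) / n
    mass = m_x = m_y = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        dy = top(x) / n
        for j in range(n):
            y = (j + 0.5) * dy
            dm = density(x, y) * dx * dy   # mass of one small rectangle
            mass += dm
            m_x += y * dm                  # moment about the x-axis uses y
            m_y += x * dm                  # moment about the y-axis uses x
    return m_y / mass, m_x / mass

top = lambda x: 2 * x * x
variable = center_of_mass(lambda x, y: x + y, 0.0, 1.0, top)
uniform = center_of_mass(lambda x, y: 1.0, 0.0, 1.0, top)

print(variable)  # near (22/27, 50/63)
print(uniform)   # near (3/4, 3/5)
```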

Don't confuse: The center of mass is not necessarily inside the region if the region is non-convex, though it is for the examples shown.

🧊 Center of mass in three dimensions

🎯 Formulas for solids

For a solid S with density function δ(x, y, z), the center of mass is (x̄, ȳ, z̄) where:

x̄ = M_yz / M, ȳ = M_xz / M, z̄ = M_xy / M

Where:

M_yz = triple integral over S of x δ(x, y, z) dV (moment about the yz-plane)
M_xz = triple integral over S of y δ(x, y, z) dV (moment about the xz-plane)
M_xy = triple integral over S of z δ(x, y, z) dV (moment about the xy-plane)
M = triple integral over S of δ(x, y, z) dV (total mass)

Naming convention reminder:

M_yz is the moment about the yz-plane, so it involves the x-coordinate
M_xz is the moment about the xz-plane, so it involves the y-coordinate
M_xy is the moment about the xy-plane, so it involves the z-coordinate

🌐 Example: upper hemisphere

Solid S = {(x, y, z): z ≥ 0, x² + y² + z² ≤ a²} with uniform density δ = 1.

Using symmetry:

The solid is symmetric about the z-axis
Therefore x̄ = 0 and ȳ = 0 without calculation
Only need to find z̄

Calculation approach:

M = volume of S = (2πa³)/3 (half the sphere volume)
M_xy computed using spherical coordinates
Converting z = ρ cos φ and dV = ρ² sin φ dρ dφ dθ
Result: z̄ = 3a/8

Interpretation: The center of mass is 3a/8 above the base, which makes physical sense: it lies below the mid-height a/2 because more of the hemisphere's volume is near the base.
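The value z̄ = 3a/8 can be checked with a midpoint sum in spherical coordinates. In the sketch below the θ-integration is replaced by a factor 2π, which is valid here because neither the density nor z depends on θ; the function name `hemisphere_z_bar` is an assumption.

```python
import math

def hemisphere_z_bar(a, n=200):
    """Estimate z_bar for the uniform upper hemisphere of radius a."""
    d_rho = a / n
    d_phi = (math.pi / 2) / n              # phi from 0 to pi/2 (upper half)
    mass = m_xy = 0.0
    for i in range(n):
        rho = (i + 0.5) * d_rho
        for j in range(n):
            phi = (j + 0.5) * d_phi
            # dV = rho^2 sin(phi) d_rho d_phi d_theta, with theta integrated out
            dv = rho * rho * math.sin(phi) * d_rho * d_phi * 2 * math.pi
            mass += dv
            m_xy += rho * math.cos(phi) * dv   # z = rho cos(phi)
    return m_xy / mass

z_bar = hemisphere_z_bar(2.0)
print(z_bar)  # close to 3a/8 = 0.75 for a = 2
```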

🔑 Key computational techniques

🧮 Choosing coordinate systems

Cartesian (x, y, z): natural for rectangular regions
Cylindrical or spherical: exploit symmetry in the solid
Example: the hemisphere calculation uses spherical coordinates because the solid is part of a sphere

⚖️ Exploiting symmetry

When a solid and its density function are both symmetric about an axis:

The center of mass lies on that axis
If that axis is a coordinate axis, the other two coordinates of the center of mass are zero
Saves computation time

Example: The hemisphere is symmetric about the z-axis and has uniform density, so x̄ = ȳ = 0 immediately.

📏 Relationship between moments and coordinates

Moment	Formula	What it measures	Used to find
M_x (2D)	∬ y δ dA	How mass is distributed vertically	ȳ
M_y (2D)	∬ x δ dA	How mass is distributed horizontally	x̄
M_yz (3D)	∭ x δ dV	How mass is distributed away from yz-plane	x̄
M_xz (3D)	∭ y δ dV	How mass is distributed away from xz-plane	ȳ
M_xy (3D)	∭ z δ dV	How mass is distributed away from xy-plane	z̄

Application: Probability and Expected Value

3.7 Application: Probability and Expected Value

🧭 Overview

🧠 One-sentence thesis

Multiple integrals provide the mathematical framework for calculating probabilities and expected values of continuous random variables, extending discrete probability concepts to continuous sample spaces.

📌 Key points (3–5)

Discrete vs continuous random variables: discrete variables use sums of individual probabilities; continuous variables use integrals of probability density functions.
Probability density function (p.d.f.): a nonnegative function f whose integral over the entire space equals 1, used to calculate probabilities via integration.
Joint distributions: multiple random variables can be studied together using joint p.d.f.s and multiple integrals to find probabilities over regions.
Expected value: the "average" value of a random variable, calculated as a sum for discrete variables or an integral (weighted by the p.d.f.) for continuous variables.
Common confusion: for continuous variables, P(X = x) is always 0; only intervals like P(a < X ≤ b) have nonzero probability.

🎲 From discrete to continuous probability

🎲 Discrete random variables

Discrete random variable: a variable X on a sample space Ω consisting of countable outcomes (e.g., die rolls).

For a six-sided die, the sample space is Ω = {1, 2, 3, 4, 5, 6}.
Each outcome has a probability: P(X = 3) = 1/6.
Probabilities of events are sums: P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 3/6 = 1/2.
An event is a subset of the sample space.

🌊 Continuous random variables

Continuous random variable: a variable X representing a value from a non-countable set (e.g., all real numbers in an interval).

Example: X represents a random real number in (0, 1).
Key difference: P(X = x) = 0 for any specific value x (because there are infinitely many points).
Instead, we consider probabilities over intervals: P(X ≤ x).
The term uniformly distributed means every subinterval of the same length has the same probability.

⚠️ Why P(X = x) must be zero

If X is uniformly distributed on (0, 1) and P(X = x) were positive for any x, then summing over all infinitely many points would give infinite total probability.
This violates the requirement that total probability equals 1.
Therefore, only intervals have nonzero probability.

📊 Probability density functions

📊 Distribution function F(x)

Distribution function: F(x) = P(X ≤ x), the cumulative probability up to x.

For a sample space Ω = (a, b):

F(x) = 1 for x ≥ b (certainty)
F(x) = P(X ≤ x) for a < x < b
F(x) = 0 for x ≤ a (impossible)

📈 Probability density function f(x)

Probability density function (p.d.f.): a nonnegative function f such that F(x) = integral from −∞ to x of f(y) dy.

Requirements:

f(x) ≥ 0 for all x
Integral from −∞ to ∞ of f(x) dx = 1 (total probability)
F'(x) = f(x) by the Fundamental Theorem of Calculus

How to use it:

P(X ≤ x) = integral from a to x of f(y) dy
P(a₁ < X ≤ b₁) = integral from a₁ to b₁ of f(x) dx

🎯 Uniform distribution example

For X uniformly distributed on (a, b):

f(x) = 1/(b − a) for a < x < b
f(x) = 0 elsewhere
Example: on (0, 1), f(x) = 1 for 0 < x < 1

Interpretation: the "density" is constant across the interval, reflecting equal likelihood.

🔔 Standard normal distribution

The famous "bell curve":

f(x) = (1/√(2π)) e^(−x²/2) for all real x
Widely used in statistics
The excerpt verifies that the integral equals 1 using a clever double integral in polar coordinates

🔗 Joint distributions and multiple integrals

🔗 Joint probability density functions

Joint p.d.f.: a function f(x, y, z) for three random variables X, Y, Z such that F(x, y, z) = P(X ≤ x, Y ≤ y, Z ≤ z) equals the triple integral of f from −∞ to x, y, z.

Requirements:

f(x, y, z) ≥ 0
Triple integral over all space of f(x, y, z) dx dy dz = 1

Calculating probabilities:

P(a₁ < X ≤ b₁, a₂ < Y ≤ b₂, a₃ < Z ≤ b₃) = triple integral over the box [a₁, b₁] × [a₂, b₂] × [a₃, b₃] of f(x, y, z) dx dy dz
The symbols ≤ and < are interchangeable (since individual points have zero probability)

🧮 Quadratic equation example

Problem: If a, b, c are random numbers from (0, 1), what is the probability that ax² + bx + c = 0 has at least one real solution?

Solution approach:

Real solution exists when b² − 4ac ≥ 0
Use three jointly distributed random variables X, Y, Z (representing a, b, c)
Joint p.d.f. is f(a, b, c) = 1 for a, b, c in (0, 1) (uniform distribution)
Set up triple integral: b varies from 2√(ac) to 1, and (a, c) varies over a region R in the ac-plane
Split R into two regions R₁ and R₂ for easier integration
Result: P(b² − 4ac ≥ 0) = (5 + 3 ln 4)/36 ≈ 0.2544

Interpretation: there is about a 25% chance that the equation has at least one real solution.
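A simulation offers an independent check on the closed-form probability (5 + 3 ln 4)/36. The sketch below draws a, b, c uniformly from (0, 1) and counts how often the discriminant is nonnegative. The function name `prob_real_root`, the sample size, and the seed are assumptions.

```python
import math
import random

def prob_real_root(trials, seed=0):
    """Fraction of random quadratics ax^2 + bx + c with a real solution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b, c = rng.random(), rng.random(), rng.random()
        if b * b - 4 * a * c >= 0:     # discriminant test
            hits += 1
    return hits / trials

simulated = prob_real_root(200000)
closed_form = (5 + 3 * math.log(4)) / 36
print(simulated, closed_form)  # the two values should be close
```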

🎯 Expected value

🎯 Definition and intuition

Expected value E(X): the "average" value of a random variable X as it varies over its sample space.

For discrete variables:

E(X) = sum over all x of x · P(X = x)
Example: six-sided die, E(X) = 1·(1/6) + 2·(1/6) + ... + 6·(1/6) = 3.5

For continuous variables:

E(X) = integral from −∞ to ∞ of x · f(x) dx
The value x is weighted by its probability density f(x)

📐 Uniform distribution expected value

For X uniformly distributed on (0, 1):

f(x) = 1 for 0 < x < 1
E(X) = integral from 0 to 1 of x dx = 1/2
This is the midpoint of the interval, as expected for a symmetric distribution

🔗 Expected values for joint distributions

For jointly distributed X and Y with joint p.d.f. f(x, y):

E(X) = double integral of x · f(x, y) dx dy
E(Y) = double integral of y · f(x, y) dx dy

🎲 Minimum and maximum example

Problem: Pick n > 2 random numbers from (0, 1). What are the expected values of the smallest and largest?

Setup:

Let U₁, ..., Uₙ be n uniform random variables on (0, 1)
X = min(U₁, ..., Uₙ), Y = max(U₁, ..., Uₙ)
Joint p.d.f.: f(x, y) = n(n − 1)(y − x)^(n−2) for 0 ≤ x ≤ y ≤ 1

Results:

E(X) = 1/(n + 1) (expected minimum)
E(Y) = n/(n + 1) (expected maximum)
Example: for n = 3, the average minimum approaches 1/4 and the average maximum approaches 3/4 over many samples

Interpretation: as n increases, the minimum gets closer to 0 and the maximum gets closer to 1, as intuition suggests.
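The expected minimum and maximum can be estimated by repeated sampling. The sketch below averages the smallest and largest of n uniform numbers over many trials; `mean_min_max`, the trial count, and the seed are illustrative assumptions.

```python
import random

def mean_min_max(n, trials, seed=0):
    """Average of min and max over many samples of n uniform numbers."""
    rng = random.Random(seed)
    sum_min = sum_max = 0.0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        sum_min += min(sample)
        sum_max += max(sample)
    return sum_min / trials, sum_max / trials

avg_min, avg_max = mean_min_max(3, 100000)
print(avg_min, avg_max)  # near 1/(n + 1) = 0.25 and n/(n + 1) = 0.75 for n = 3
```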

Line Integrals

4.1 Line Integrals

🧭 Overview

🧠 One-sentence thesis

Line integrals extend single-variable integration to curves in two dimensions, allowing us to compute work done by a force along a path and to integrate both scalar and vector fields over curves.

📌 Key points (3–5)

What line integrals generalize: they extend Riemann integrals from intervals in R¹ to curves in R² (and higher dimensions).
Two types of line integrals: scalar field integrals (with respect to arc length ds) and vector field integrals (with respect to position vector dr).
Physical motivation: work equals force times distance; line integrals compute total work when force varies along a curve.
Direction matters for vector fields: reversing the curve direction leaves scalar field integrals unchanged but negates vector field integrals.
Common confusion: line integrals of scalar fields vs. vector fields—scalar integrals are direction-independent; vector integrals are direction-dependent.

📐 Scalar field line integrals

📏 Definition and setup

Line integral of a scalar field: For a real-valued function f(x, y) and a curve C parametrized by x = x(t), y = y(t), a ≤ t ≤ b, the line integral of f(x, y) along C with respect to arc length s is the integral from a to b of f(x(t), y(t)) times the square root of (x'(t) squared plus y'(t) squared) dt.

The symbol ds is the differential of arc length.
ds equals the square root of (x'(t) squared plus y'(t) squared) dt.
This comes from the Pythagorean theorem applied to infinitesimal curve segments.

🖼️ Geometric interpretation

Think of f(x, y) as the height of a picket fence standing along curve C.
Then f(x, y) ds represents the area of an infinitesimally thin section of that fence.
The line integral sums all these areas to give the total area of the fence.

Example: To find the lateral surface area of a right circular cylinder of radius r and height h, parametrize the base circle as x = r cos t, y = r sin t, 0 ≤ t ≤ 2π, set f(x, y) = h (constant height), and integrate. The result is 2πrh.
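
The cylinder example can be reproduced numerically. Below is a small sketch that approximates a scalar line integral with the midpoint rule; the function name, the step count, and the sample values r = 2, h = 5 are assumptions chosen for illustration.

```python
import math

def scalar_line_integral(f, x, y, dx, dy, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f ds along (x(t), y(t)), a <= t <= b."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        speed = math.hypot(dx(t), dy(t))  # ds/dt = sqrt(x'(t)^2 + y'(t)^2)
        total += f(x(t), y(t)) * speed * h
    return total

r, height = 2.0, 5.0  # sample radius and height (illustrative values)
area = scalar_line_integral(
    lambda x, y: height,                                        # constant fence height
    lambda t: r * math.cos(t), lambda t: r * math.sin(t),       # base circle
    lambda t: -r * math.sin(t), lambda t: r * math.cos(t),      # its derivatives
    0.0, 2.0 * math.pi,
)
print(area, 2.0 * math.pi * r * height)
```

The two printed numbers should agree closely, matching the formula 2πrh.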

🔄 Direction independence

Reversing the direction of traversal does not change the value of a scalar field line integral.
If C is traversed from t = a to t = b, then −C (the same curve in the opposite direction) is parametrized by x = x(a + b − t), y = y(a + b − t), a ≤ t ≤ b.
The integral over C equals the integral over −C for scalar fields.

Don't confuse: This direction-independence applies only to scalar field integrals, not vector field integrals.

⚡ Vector field line integrals

🧲 Vector fields and motivation

Vector field: A function f(x, y) = P(x, y) i + Q(x, y) j that assigns a vector to each point in R².

In physics, force is a vector, not just a magnitude.
To compute work done by a varying vector force along a curve, we need a vector form of the line integral.
The position vector r(t) = x(t) i + y(t) j traces out the curve C.
The derivative r'(t) = x'(t) i + y'(t) j is a tangent vector to C.

📝 Definition of vector field line integral

Line integral of a vector field: For f(x, y) = P(x, y) i + Q(x, y) j and curve C with parametrization x = x(t), y = y(t), a ≤ t ≤ b, the line integral of f along C is the integral from a to b of f(x(t), y(t)) dot r'(t) dt.

This can also be written as the integral over C of P(x, y) dx plus the integral over C of Q(x, y) dy.
The notation dr = r'(t) dt = dx i + dy j is the differential of the position vector.
The quantity P(x, y) dx + Q(x, y) dy is called a differential form.

🔧 Physical interpretation: work

If f(x, y) represents a force field, then the work W done moving an object along C is the line integral of f · dr.
The unit tangent vector is T(t) = r'(t) divided by the norm of r'(t).
Work can also be written as the integral over C of f · T ds, emphasizing the tangential component of force in the direction of motion.

Example: Evaluating the integral over C of (x² + y²) dx + 2xy dy for two different curves from (0,0) to (1,2)—one linear (x = t, y = 2t) and one parabolic (x = t, y = 2t²)—both give 13/3, suggesting (but not proving) that work might be path-independent in some cases.
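
The two path computations in this example can be checked numerically. The sketch below uses a midpoint rule; the helper name and step count are illustrative assumptions, and the derivatives of each parametrization are supplied by hand.

```python
def vector_line_integral(P, Q, x, y, dx, dy, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of P dx + Q dy along (x(t), y(t))."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        total += (P(x(t), y(t)) * dx(t) + Q(x(t), y(t)) * dy(t)) * h
    return total

P = lambda x, y: x**2 + y**2
Q = lambda x, y: 2 * x * y

# Straight line x = t, y = 2t from (0,0) to (1,2)
line = vector_line_integral(P, Q, lambda t: t, lambda t: 2 * t,
                            lambda t: 1.0, lambda t: 2.0, 0.0, 1.0)
# Parabola x = t, y = 2t^2 between the same endpoints
parabola = vector_line_integral(P, Q, lambda t: t, lambda t: 2 * t * t,
                                lambda t: 1.0, lambda t: 4 * t, 0.0, 1.0)
print(line, parabola, 13 / 3)
```

Both approximations should land near 13/3. Agreement on two paths is only evidence of path independence, not a proof.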

🔄 Direction dependence

For vector fields, reversing the curve direction negates the line integral.
The integral over −C of P(x, y) dx equals the negative of the integral over C of P(x, y) dx.
This follows from the chain rule and substitution u = a + b − t.

Don't confuse: Scalar field integrals are unchanged by direction reversal; vector field integrals change sign.

🧩 Key concepts and terminology

🧩 Differential forms and exact forms

Differential form: An expression P(x, y) dx + Q(x, y) dy.

Exact differential form: A differential form that equals dF for some function F(x, y), where dF = (∂F/∂x) dx + (∂F/∂y) dy.

The differential dF is the total differential of F.
Exact forms have special properties for line integrals (explored in later sections).

🛤️ Piecewise smooth curves

A curve C may be the union of several smooth pieces: C = C₁ ∪ C₂ ∪ ... ∪ Cₙ.
The line integral over C is the sum of the line integrals over each piece.

Example: To integrate over a polygonal path from (0,0) to (0,2) to (1,2), split into two segments C₁ (vertical) and C₂ (horizontal), integrate over each, and add the results.

📊 Comparison table

Feature	Scalar field integral	Vector field integral
Notation	∫_C f(x, y) ds	∫_C f · dr or ∫_C P dx + Q dy
Direction reversal	Unchanged: ∫_C = ∫_{−C}	Sign flips: ∫_{−C} = −∫_C
Physical meaning	Area of "fence" along curve	Work done by force field
Integrand	Real-valued function	Dot product of vector field and tangent

🔍 Properties and special cases

🔍 Relationship between formulations

The two forms ∫_C f · dr and ∫_C f · T ds are equivalent.
This follows because dr = T ds (since r'(t) dt = T · norm(r'(t)) dt and ds = norm(r'(t)) dt).
The T ds form emphasizes integrating the tangential component of f.

🔍 Special results

If f is perpendicular to r'(t) at every point on C, then ∫_C f · dr = 0 (no work done).
If f points in the same direction as r'(t) at every point, then ∫_C f · dr equals ∫_C norm(f) ds (maximum work).
The Riemann integral from a to b of f(x) dx is a special case of a line integral (over an interval in R¹).

🔍 Bounds on line integrals

If the norm of f(x, y) is at most M for all points on C, and C has arc length L, then the absolute value of ∫_C f · dr is at most ML.
This provides an upper bound on work or other quantities computed via line integrals.

Don't confuse: The bound applies to the absolute value of the integral, not the integral of the absolute value.

Properties of Line Integrals

4.2 Properties of Line Integrals

🧭 Overview

🧠 One-sentence thesis

Line integrals of vector fields change sign when the curve direction is reversed, remain unchanged under direction-preserving reparametrizations, and are path-independent if and only if the field has a potential function.

📌 Key points (3–5)

Direction matters for vector fields: reversing the curve direction negates the line integral of a vector field, but not for scalar fields.
Reparametrization invariance: the line integral value is unchanged if the new parametrization preserves direction (strictly increasing parameter transformation).
Path independence criterion: a line integral is path-independent in a region if and only if it equals zero around every closed curve in that region.
Conservative fields and potentials: if a vector field has a potential F (i.e., the gradient of F equals the field), then the line integral depends only on the endpoint values of F, not the path.
Common confusion: path independence does not mean "all line integrals are the same"—it means integrals between the same two endpoints are the same, which is equivalent to closed-curve integrals being zero.

🔄 Direction reversal and scalar vs vector fields

🔄 Scalar fields are direction-independent

For line integrals of real-valued functions (scalar fields), reversing the direction does not change the value: integral over C of f(x,y) ds equals integral over negative-C of f(x,y) ds.

The excerpt emphasizes this holds for scalar fields only.
The notation "negative-C" means the same curve traversed in the opposite direction.

🔄 Vector fields reverse sign with direction

For line integrals of vector fields, the value does change: integral over negative-C of f · dr equals negative of integral over C of f · dr.

The excerpt proves this by showing that when you reverse the parametrization (using u = a + b − t), each component integral (P dx and Q dy) picks up a negative sign.
Why this happens: the Chain Rule applied to x(a + b − t) gives −x'(a + b − t). The substitution u = a + b − t contributes du = −dt, which cancels that sign, and it also swaps the limits of integration (from a..b to b..a). Restoring the limits to a..b introduces the single overall negative sign.
Example: moving an object along a curve C and then back along the same path (in reverse) results in zero total work, because force is a vector and direction matters.
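
Written out for the P dx component, the sign bookkeeping looks as follows. This is a sketch of the standard substitution argument using the parametrization of negative-C given earlier, with u = a + b − t; the Q dy component works the same way.

```latex
\begin{align*}
\int_{-C} P\,dx
  &= \int_a^b P\big(x(a+b-t),\,y(a+b-t)\big)\,\frac{d}{dt}\big[x(a+b-t)\big]\,dt \\
  &= \int_a^b P\big(x(a+b-t),\,y(a+b-t)\big)\,\big(-x'(a+b-t)\big)\,dt
     && \text{(Chain Rule)} \\
  &= \int_b^a P\big(x(u),\,y(u)\big)\,\big(-x'(u)\big)\,(-du)
     && (u = a+b-t,\ du = -dt) \\
  &= \int_b^a P\big(x(u),\,y(u)\big)\,x'(u)\,du \\
  &= -\int_a^b P\big(x(u),\,y(u)\big)\,x'(u)\,du
   \;=\; -\int_C P\,dx .
\end{align*}
```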

🧭 Directed curves

Because direction is crucial for vector field integrals, curves in these integrals are called directed curves or oriented curves.
Always keep track of which direction you are integrating.

🔁 Reparametrization and well-definedness

🔁 The reparametrization problem

Any curve has infinitely many parametrizations.
If different parametrizations gave different integral values, the definition would not be well-defined.

🔁 Theorem 4.2: direction-preserving reparametrizations

If t = alpha(u) is a strictly increasing function (alpha-prime(u) > 0) mapping [c, d] onto [a, b], then the line integral of a vector field has the same value for both the original parametrization and the reparametrized curve.

Key condition: alpha-prime(u) > 0 means the two parametrizations move along C in the same direction.
The proof uses the Chain Rule and substitution: the alpha-prime factors cancel out, leaving the same integral.
Don't confuse: the reverse parametrization (u = a + b − t) has alpha-prime = −1 < 0, so it does not satisfy the theorem's condition—that's why reversing direction changes the sign.

🔁 Example verification

The excerpt shows that parametrizing a curve by t or by u = arcsin(t) (where t = sin u) yields the same integral value (13/3 in both cases).
This confirms that as long as the direction is preserved, the choice of parameter does not matter.
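
A worked version of this check is sketched below. It assumes the curve is the parabolic path x = t, y = 2t², 0 ≤ t ≤ 1, and the integrand is (x² + y²) dx + 2xy dy, as in the earlier 13/3 example; the notes do not restate either here, so both are assumptions. With t = sin u, 0 ≤ u ≤ π/2:

```latex
\begin{align*}
x &= \sin u, \qquad y = 2\sin^2 u, \qquad
dx = \cos u\,du, \qquad dy = 4\sin u\cos u\,du, \\[4pt]
\int_C (x^2+y^2)\,dx + 2xy\,dy
  &= \int_0^{\pi/2} \Big[(\sin^2 u + 4\sin^4 u)\cos u
     + (4\sin^3 u)(4\sin u\cos u)\Big]\,du \\
  &= \int_0^{\pi/2} \big(\sin^2 u + 20\sin^4 u\big)\cos u\,du \\
  &= \Big[\tfrac{1}{3}\sin^3 u + 4\sin^5 u\Big]_0^{\pi/2}
   = \tfrac{1}{3} + 4 = \tfrac{13}{3}.
\end{align*}
```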

🔒 Path independence and closed curves

🔒 Closed curves

A closed curve is one whose initial point and terminal point are the same: (x(a), y(a)) = (x(b), y(b)).

A simple closed curve does not intersect itself (e.g., a circle or ellipse).
Any closed curve can be thought of as a union of simple closed curves (like the loops in a figure eight).
Notation: a circle-integral symbol (closed-loop integral) denotes integration around a closed curve.

🔒 Path independence defined

A line integral is path-independent in a region R if it has the same value for any two curves in R that share the same initial and terminal points.
The excerpt notes that earlier examples (e.g., Example 4.2) had the same value for different paths, but this is not always the case.

🔒 Theorem 4.3: equivalence of path independence and zero closed integrals

In a region R, the line integral of f · dr is independent of the path between any two points if and only if the closed-curve integral of f · dr equals zero for every closed curve C contained in R.

Proof sketch (zero closed integrals ⇒ path independence): assume all closed integrals are zero. Take two curves C₁ and C₂ from P₁ to P₂. The union of C₁ and negative-C₂ is a closed curve, so its integral is zero. Since the integral over negative-C₂ is the negative of the integral over C₂, the integral over C₁ equals the integral over C₂.
Proof sketch (path independence ⇒ zero closed integrals): assume path independence. Write any closed curve C as the union of C₁ and negative-C₂, where C₁ and C₂ both run from a point P₁ on C to another point P₂ on C. By path independence, the integrals over C₁ and C₂ are equal, so the closed integral (C₁ plus negative-C₂) is zero.
Why it matters: this theorem shows that path independence and vanishing closed integrals are two sides of the same coin.
Limitation: the theorem does not give a practical test, since you cannot check all possible closed curves.

🧲 Conservative fields and potentials

🧲 The Chain Rule for multivariable functions

Theorem 4.4 (Chain Rule): If z = f(x, y) is continuously differentiable and x = x(t), y = y(t) are differentiable, then dz/dt = (partial z / partial x)(dx/dt) + (partial z / partial y)(dy/dt).

This is the multivariable version of the Chain Rule.
It is used to prove the next theorem about potentials.

🧲 Theorem 4.5: the Fundamental Theorem for line integrals

If there exists a real-valued function F(x, y) such that the gradient of F equals the vector field f on a region R, then the line integral of f · dr over any smooth curve C in R equals F(B) − F(A), where A is the initial point and B is the terminal point of C.

Key consequence: the integral depends only on the endpoint values of F, not on the path taken.
Proof idea: rewrite the line integral using the fact that P = partial F / partial x and Q = partial F / partial y. By the Chain Rule (Theorem 4.4), the integrand is then the derivative d/dt of F(x(t), y(t)). The Fundamental Theorem of Calculus evaluates the integral from a to b, yielding F(B) − F(A).
This is the line integral analogue of the Fundamental Theorem of Calculus.

🧲 Definitions: potential and conservative field

A real-valued function F(x, y) such that gradient-F(x, y) = f(x, y) is called a potential for f.

A conservative vector field is one which has a potential.

If a field is conservative, its line integrals are path-independent.

🧲 Example: finding a potential

The excerpt shows how to find F given f = (x² + y²) i + 2xy j.
Method: integrate partial F / partial x = x² + y² with respect to x to get F = (1/3)x³ + xy² + g(y). Then differentiate with respect to y and set equal to 2xy to find g-prime(y) = 0, so g(y) is a constant (choose 0 for simplicity). Result: F(x, y) = (1/3)x³ + xy².
Verification: using Theorem 4.5, the integral from (0,0) to (1,2) is F(1,2) − F(0,0) = (1/3 + 4) − 0 = 13/3, matching earlier calculations.
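
The potential found above can be spot-checked numerically. The sketch below compares a central-difference gradient of F with the field at a few sample points and then evaluates F(1,2) − F(0,0). The sample points and step size h are arbitrary illustrative choices.

```python
def F(x, y):
    """Candidate potential F(x, y) = x^3/3 + x*y^2."""
    return x**3 / 3 + x * y**2

def grad_F(x, y, h=1e-6):
    """Central-difference approximation of (dF/dx, dF/dy)."""
    dFdx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    dFdy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    return dFdx, dFdy

def field(x, y):
    """The vector field f = (x^2 + y^2) i + 2xy j as a pair (P, Q)."""
    return x**2 + y**2, 2 * x * y

samples = [(0.3, -1.2), (1.0, 2.0), (-0.7, 0.4)]  # arbitrary test points
max_error = max(
    max(abs(g - f) for g, f in zip(grad_F(x, y), field(x, y)))
    for x, y in samples
)
work = F(1, 2) - F(0, 0)  # Theorem 4.5: value depends only on the endpoints
print(max_error, work)
```

A small `max_error` is consistent with gradient-F = f, and `work` should equal 13/3 up to rounding.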

🧲 Corollary 4.6: conservative fields have zero closed integrals

If a vector field f has a potential in a region R, then the closed-curve integral of f · dr equals zero for any closed curve C in R.

Why: for a closed curve, the endpoints A and B are the same point, so F(B) − F(A) = 0.
Example: the field f = x i + y j has potential F = (1/2)x² + (1/2)y². Any closed curve (e.g., an ellipse) has integral zero.

🧪 Practical tests and examples

🧪 How to check for a potential

Step	What to do	Why
1. Integrate P with respect to x	Get F = integral of P dx + g(y)	Ensures partial F / partial x = P
2. Differentiate result with respect to y	Get partial F / partial y in terms of g-prime(y)	Must match Q
3. Solve for g(y)	Set partial F / partial y = Q and integrate	Completes the potential
4. Check consistency	Verify that ∂P/∂y = ∂Q/∂x (equality of the mixed partials of F)	If they don't match, no potential exists

The excerpt does not explicitly state the mixed-partial test, but the method implicitly relies on it.

🧪 Example: closed ellipse integral

For C: x = 2 cos t, y = 3 sin t (0 ≤ t ≤ 2π), the integral of x dx + y dy equals zero.
Reason: the field x i + y j has potential F = (1/2)x² + (1/2)y², and C is closed, so by Corollary 4.6 the integral is zero.

🧪 When path independence fails

The excerpt mentions that not all line integrals are path-independent (e.g., Example 4.2 was, but others are not).
If a field does not have a potential, integrals between the same two points can differ depending on the path.
Common confusion: do not assume that because one example had path independence, all line integrals do—always check for a potential or use Theorem 4.3.

🔗 Connections and implications

🔗 Relationship between theorems

Theorem 4.3 (path independence ↔ zero closed integrals) is a conceptual bridge.
Theorem 4.5 (potentials → path independence) gives a practical sufficient condition.
Corollary 4.6 (potentials → zero closed integrals) combines both ideas.

🔗 Work interpretation

The excerpt interprets the direction-reversal property in terms of work: moving an object along a curve and back does zero total work if force is treated as a vector.
This physical intuition reinforces why direction matters for vector fields.

🔗 Notation for closed curves

The special closed-loop integral symbol indicates integration around a closed curve.
Older texts sometimes use arrows (counterclockwise or clockwise) to indicate direction.

🔗 Limitations of Theorem 4.3

The theorem does not provide a practical algorithm—you cannot check all closed curves.
The excerpt notes that it mostly gives insight into how line integrals behave and how different integrals are related.
For practical tests, use Theorem 4.5 (find a potential) or later results (like Green's Theorem, mentioned at the end).

Green's Theorem

4.3 Green’s Theorem

🧭 Overview

🧠 One-sentence thesis

Green's Theorem converts a line integral around a closed curve into a double integral over the region inside the curve, providing a powerful computational tool and revealing when vector fields have potentials.

📌 Key points (3–5)

What Green's Theorem does: relates a line integral around a closed curve C to a double integral over the region R inside C using the formula: line integral of f·dr equals the double integral of (∂Q/∂x − ∂P/∂y) dA.
When it applies: the region R must have a simple closed piecewise smooth boundary C, the vector field f = P i + Q j must be smooth on both R and C, and C must be traversed so R is always on the left.
Common confusion—holes matter: Green's Theorem fails when the region has a "hole" (a point or region cut out from the interior); the theorem can be extended to multiply connected regions (regions with holes) by treating inner and outer boundaries carefully.
Connection to potentials: in simply connected regions (no holes), if ∂P/∂y = ∂Q/∂x everywhere, then f has a potential and all closed-loop integrals are zero.
Why it matters: Green's Theorem simplifies difficult line integrals, explains when vector fields are conservative, and extends to more general regions using a slit technique.

🔄 The core formula and setup

🔄 Statement of Green's Theorem

Green's Theorem: Let R be a region in R² whose boundary is a simple closed curve C which is piecewise smooth. Let f(x, y) = P(x, y) i + Q(x, y) j be a smooth vector field defined on both R and C. Then the line integral of f·dr around C equals the double integral over R of (∂Q/∂x − ∂P/∂y) dA, where C is traversed so that R is always on the left side of C.

Smooth vector field: a vector field f(x, y) = P(x, y) i + Q(x, y) j is smooth if its component functions P(x, y) and Q(x, y) are smooth (have continuous derivatives).
The left side is a line integral around the closed curve; the right side is a double integral over the area inside.
Direction matters: you must traverse C counterclockwise (so the region R stays on your left) for the formula to hold as stated.

📐 Simple regions

The proof in the excerpt works for a simple region R, where the boundary C can be written in two distinct ways:

Vertically: C = C₁ ∪ C₂, where C₁ is y = y₁(x) from leftmost point X₁ to rightmost point X₂, and C₂ is y = y₂(x) from X₂ back to X₁.
Horizontally: C = C₁ ∪ C₂, where C₁ is x = x₁(y) from highest point Y₂ down to lowest point Y₁, and C₂ is x = x₂(y) from Y₁ back up to Y₂.

Example: a region bounded above and below by two smooth curves, and left and right by two smooth curves.

🧮 How the proof works

🧮 Breaking the line integral into pieces

The proof splits the line integral into two parts:

The integral of P(x, y) dx around C.
The integral of Q(x, y) dy around C.

Each part is then related to a double integral using the Fundamental Theorem of Calculus.

🔍 Integrating P(x, y) dx

Use the vertical representation: C₁ is y = y₁(x) (x from a to b) and C₂ is y = y₂(x) (x from b to a).
The line integral of P dx around C becomes:
- Integral from a to b of P(x, y₁(x)) dx minus integral from a to b of P(x, y₂(x)) dx.
This simplifies to minus the integral from a to b of [P(x, y₂(x)) − P(x, y₁(x))] dx.
By the Fundamental Theorem of Calculus, this equals minus the double integral over R of ∂P/∂y dA.

🔍 Integrating Q(x, y) dy

Use the horizontal representation: C₁ is x = x₁(y) (y from d to c) and C₂ is x = x₂(y) (y from c to d).
The line integral of Q dy around C becomes:
- Integral from d to c of Q(x₁(y), y) dy plus integral from c to d of Q(x₂(y), y) dy.
This simplifies to the integral from c to d of [Q(x₂(y), y) − Q(x₁(y), y)] dy.
By the Fundamental Theorem of Calculus, this equals the double integral over R of ∂Q/∂x dA.

✅ Combining the results

Adding the two parts:

Line integral of f·dr = line integral of P dx + line integral of Q dy
= −(double integral of ∂P/∂y dA) + (double integral of ∂Q/∂x dA)
= double integral of (∂Q/∂x − ∂P/∂y) dA.
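
In symbols, the two pieces of the proof can be written as the following chains. This is a sketch that assumes the simple region is described by a ≤ x ≤ b with y₁(x) ≤ y ≤ y₂(x), and equally by c ≤ y ≤ d with x₁(y) ≤ x ≤ x₂(y), with C traversed counterclockwise.

```latex
\begin{align*}
\oint_C P\,dx
  &= \int_a^b P\big(x, y_1(x)\big)\,dx + \int_b^a P\big(x, y_2(x)\big)\,dx
   = -\int_a^b \Big[P\big(x, y_2(x)\big) - P\big(x, y_1(x)\big)\Big]\,dx \\
  &= -\int_a^b \int_{y_1(x)}^{y_2(x)} \frac{\partial P}{\partial y}\,dy\,dx
   = -\iint_R \frac{\partial P}{\partial y}\,dA, \\[6pt]
\oint_C Q\,dy
  &= \int_d^c Q\big(x_1(y), y\big)\,dy + \int_c^d Q\big(x_2(y), y\big)\,dy
   = \int_c^d \Big[Q\big(x_2(y), y\big) - Q\big(x_1(y), y\big)\Big]\,dy \\
  &= \int_c^d \int_{x_1(y)}^{x_2(y)} \frac{\partial Q}{\partial x}\,dx\,dy
   = \iint_R \frac{\partial Q}{\partial x}\,dA .
\end{align*}
```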

Don't confuse: the theorem is proved for simple regions, but it extends to more general regions (unions of simple regions).

🧪 Examples and special cases

🧪 Example: zero result from Green's Theorem

Problem: Evaluate the line integral of (x² + y²) dx + 2xy dy around the boundary C of the region R = {(x, y): 0 ≤ x ≤ 1, 2x² ≤ y ≤ 2x}, traversed counterclockwise.

Solution:

P(x, y) = x² + y², Q(x, y) = 2xy.
Compute ∂Q/∂x = 2y and ∂P/∂y = 2y.
So ∂Q/∂x − ∂P/∂y = 2y − 2y = 0.
By Green's Theorem, the line integral equals the double integral of 0 dA = 0.

Why it's zero: this vector field has a potential function F(x, y) = (1/3)x³ + xy², so the line integral around any closed curve is zero (by Corollary 4.6).

🕳️ Example: when Green's Theorem fails (hole at the origin)

Setup: Let f(x, y) = (−y/(x² + y²)) i + (x/(x² + y²)) j, and R = {(x, y): 0 < x² + y² ≤ 1}.

Apparent contradiction:

For the boundary circle C: x² + y² = 1 (traversed counterclockwise), the line integral of f·dr = 2π (from Exercise 9(b) in Section 4.2).
But ∂Q/∂x = (y² − x²)/(x² + y²)² = ∂P/∂y, so ∂Q/∂x − ∂P/∂y = 0.
This would give double integral of 0 dA = 0, contradicting the line integral result.

Resolution: R is not the entire region enclosed by C because the point (0, 0) is excluded (0 < x² + y² means the origin is a "hole"). Green's Theorem does not apply when the region has a hole.

🔗 Example: annulus (region with a hole)

Modified region: R = {(x, y): 1/4 ≤ x² + y² ≤ 1} (an annulus, ring-shaped region).

Boundary: C = C₁ ∪ C₂, where:

C₁ is the outer circle x² + y² = 1 traversed counterclockwise.
C₂ is the inner circle x² + y² = 1/4 traversed clockwise (so R stays on the left).

Result: The line integral of f·dr around C = 0, and the double integral of (∂Q/∂x − ∂P/∂y) dA = 0, so Green's Theorem holds for this annular region.
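
Both the punctured-disk puzzle and the annulus result can be seen numerically. The sketch below integrates the field around counterclockwise circles of radius 1 and 1/2 and estimates ∂Q/∂x − ∂P/∂y at one point away from the origin. The function names, step counts, and sample point are illustrative assumptions.

```python
import math

def P(x, y):
    return -y / (x * x + y * y)

def Q(x, y):
    return x / (x * x + y * y)

def circulation(radius, steps=10_000):
    """Midpoint-rule integral of P dx + Q dy around a counterclockwise circle."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        x, y = radius * math.cos(t), radius * math.sin(t)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)
        total += (P(x, y) * dx + Q(x, y) * dy) * h
    return total

def curl_integrand(x, y, h=1e-5):
    """Central-difference estimate of dQ/dx - dP/dy at (x, y)."""
    dQdx = (Q(x + h, y) - Q(x - h, y)) / (2 * h)
    dPdy = (P(x, y + h) - P(x, y - h)) / (2 * h)
    return dQdx - dPdy

outer = circulation(1.0)   # outer circle, counterclockwise
inner = circulation(0.5)   # inner circle, also counterclockwise here
annulus_boundary = outer - inner  # inner circle reversed to clockwise
print(outer, inner, annulus_boundary, curl_integrand(0.3, 0.4))
```

Each circle alone gives about 2π, while the annulus boundary (inner circle reversed) gives about 0, matching the double integral of 0 over the annulus.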

🕸️ Multiply connected regions

🕸️ What are multiply connected regions?

Multiply connected region: a region with one or more regions cut out from the interior (not just discrete points).

Example: an annulus (one hole), a region with two circular holes, etc.
Contrast with simply connected region: a region with no holes.

✂️ The slit technique

Green's Theorem extends to multiply connected regions using "slits":

Cut slits between the outer boundary and inner boundaries to divide R into subregions R₁, R₂, ... that have no holes.
Each slit is part of the boundary of two adjacent subregions, traversed in opposite directions.
The line integrals along the slits cancel out when you add up the integrals over all subregions.
Apply Green's Theorem to each subregion (which has no holes), then sum the results.

Result: The line integral around the outer and inner boundaries (with R on the left) equals the double integral of (∂Q/∂x − ∂P/∂y) dA over the entire multiply connected region R.

Example: In Figure 4.3.4(a), region R has one hole. Slits divide R into R₁ and R₂. The line integral around C₁ ∪ C₂ equals the sum of the double integrals over R₁ and R₂, which equals the double integral over R.

🔑 Connection to potentials and path independence

🔑 When does a vector field have a potential?

Recall from Corollary 4.6: if a smooth vector field f(x, y) = P(x, y) i + Q(x, y) j has a smooth potential F(x, y) in a region R, then:

∂F/∂x = P and ∂F/∂y = Q.
The mixed partial derivatives are equal: ∂²F/(∂y∂x) = ∂²F/(∂x∂y), so ∂P/∂y = ∂Q/∂x in R.
The line integral of f·dr around any closed curve C in R is zero.

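Converse: If ∂P/∂y = ∂Q/∂x in R and R is simply connected (so the region enclosed by any simple closed curve C in R lies entirely in R), then by Green's Theorem: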

Line integral of f·dr around C = double integral of (∂Q/∂x − ∂P/∂y) dA = double integral of 0 dA = 0.

🔄 Equivalent conditions for simply connected regions

For a simply connected region R (no holes), the following are equivalent:

Condition	Meaning
(a) f has a smooth potential F(x, y) in R	There exists F such that ∂F/∂x = P and ∂F/∂y = Q
(b) Line integral of f·dr is path-independent	The integral depends only on endpoints, not the path
(c) Line integral of f·dr = 0 for every simple closed curve C in R	All closed-loop integrals are zero
(d) ∂P/∂y = ∂Q/∂x in R	The "curl" condition; the differential form P dx + Q dy is exact

Why simply connected matters: In regions with holes, condition (d) does not guarantee conditions (a)–(c). Example: the vector field in Example 4.8 satisfies ∂P/∂y = ∂Q/∂x everywhere except at the origin, but it does not have a potential in the punctured disk (the region with the origin removed).

Don't confuse: ∂P/∂y = ∂Q/∂x is necessary for a potential to exist, but it is sufficient only in simply connected regions.

Surface Integrals and the Divergence Theorem

4.4 Surface Integrals and the Divergence Theorem

🧭 Overview

🧠 One-sentence thesis

The Divergence Theorem transforms a difficult surface integral over a closed surface into a simpler triple integral over the enclosed solid by relating the flux through the surface to the divergence of the vector field inside.

📌 Key points (3–5)

Parametrizing surfaces: A surface in 3D space is parametrized using two variables (u, v) that map a 2D region onto the surface, analogous to how one variable parametrizes a curve.
Surface area element: The infinitesimal surface area element d-sigma equals the magnitude of the cross product of the two partial derivative vectors of the position vector.
Surface integral of vector fields: The flux integral measures the net flow of a vector field through a surface, computed as the dot product of the field with the outward unit normal vector.
Divergence Theorem shortcut: For closed surfaces (those enclosing a bounded solid), the surface integral equals the triple integral of the divergence over the enclosed volume, avoiding tedious surface parametrization.
Common confusion: Divergence measures how much a field "spreads out" from a point (ratio of flux to volume in the limit), not the field's magnitude; solenoidal fields have zero divergence everywhere.

📐 Parametrizing surfaces

📐 From curves to surfaces

Surface parametrization: A transformation from a region R in the uv-plane (in R²) into a surface Sigma in R³, given by x = x(u,v), y = y(u,v), z = z(u,v).

Analogy with curves: Just as one parameter t traces out a curve in space, two parameters (u, v) "patch" a 2D region onto a surface.
The position vector is r(u,v) = x(u,v) i + y(u,v) j + z(u,v) k for (u,v) in region R.
This is called a "patch" because it maps gridlines in R onto curves on the surface Sigma.

🧮 Partial derivatives and tangent vectors

Define partial derivatives: ∂r/∂u and ∂r/∂v are vectors obtained by differentiating each component of r with respect to u or v.
Geometric meaning:
- Vertical gridlines (u constant) map to curves on Sigma with tangent vector ∂r/∂v.
- Horizontal gridlines (v constant) map to curves on Sigma with tangent vector ∂r/∂u.
These two tangent vectors span the tangent plane to the surface at each point.

🔲 Surface area element derivation

Take a small rectangle in R with corners (u,v), (u+Δu,v), (u+Δu,v+Δv), (u,v+Δv); its area is Δu Δv.
This rectangle maps to a patch on Sigma whose area d-sigma is approximately the area of the parallelogram with sides r(u+Δu,v) − r(u,v) and r(u,v+Δv) − r(u,v).
Using the derivative definition: ∂r/∂u ≈ (r(u+Δu,v) − r(u,v))/Δu, so r(u+Δu,v) − r(u,v) ≈ Δu ∂r/∂u.
Similarly: r(u,v+Δv) − r(u,v) ≈ Δv ∂r/∂v.
The parallelogram area is the magnitude of the cross product: ||(Δu ∂r/∂u) × (Δv ∂r/∂v)|| = ||∂r/∂u × ∂r/∂v|| Δu Δv.
Result: d-sigma = ||∂r/∂u × ∂r/∂v|| du dv.

🧮 Surface integrals of scalar fields

🧮 Definition and formula

Surface integral of f(x,y,z) over Sigma: The double integral over R of f(x(u,v), y(u,v), z(u,v)) times ||∂r/∂u × ∂r/∂v|| du dv.

Notation: double integral over Sigma of f(x,y,z) d-sigma.
Special case: When f = 1, the surface integral gives the surface area S of Sigma.
Formula: S = double integral over Sigma of 1 d-sigma = double integral over R of ||∂r/∂u × ∂r/∂v|| du dv.

🍩 Example: Surface area of a torus

Example: A torus T is formed by revolving a circle of radius a (centered at distance b from the z-axis) around the z-axis, where 0 < a < b.

Parametrization strategy:
- Angle u: from the circle's center to a point on the circle, measured from the positive y-axis.
- Angle v: from the origin to the circle's center, measured from the positive x-axis.
Parametrization: x = (b + a cos u) cos v, y = (b + a cos u) sin v, z = a sin u, for 0 ≤ u ≤ 2π, 0 ≤ v ≤ 2π.
Compute: ∂r/∂u = (−a sin u cos v, −a sin u sin v, a cos u).
Compute: ∂r/∂v = (−(b + a cos u) sin v, (b + a cos u) cos v, 0).
Cross product: ∂r/∂u × ∂r/∂v = (−a(b + a cos u) cos v cos u, −a(b + a cos u) sin v cos u, −a(b + a cos u) sin u).
Magnitude: ||∂r/∂u × ∂r/∂v|| = a(b + a cos u).
Surface area: S = integral from 0 to 2π integral from 0 to 2π of a(b + a cos u) du dv = 4π² ab.
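
The torus computation can be checked by forming the cross product numerically and summing its magnitude over a grid. This is a rough sketch with a midpoint rule in both parameters; the helper names, the grid size, and the sample radii a = 1, b = 3 are assumptions for illustration.

```python
import math

def cross(p, q):
    """Cross product of two 3-vectors given as tuples."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def torus_area(a, b, steps=200):
    """Approximate the double integral of ||r_u x r_v|| over 0 <= u, v <= 2*pi."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        w = b + a * math.cos(u)  # distance from the z-axis
        for j in range(steps):
            v = (j + 0.5) * h
            r_u = (-a * math.sin(u) * math.cos(v), -a * math.sin(u) * math.sin(v), a * math.cos(u))
            r_v = (-w * math.sin(v), w * math.cos(v), 0.0)
            n = cross(r_u, r_v)
            total += math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2) * h * h
    return total

a, b = 1.0, 3.0  # sample radii with 0 < a < b
area = torus_area(a, b)
print(area, 4.0 * math.pi ** 2 * a * b)
```

The two printed values should agree closely, consistent with S = 4π²ab.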

🌊 Surface integrals of vector fields (flux)

🌊 Normal vectors and orientation

The cross product ∂r/∂u × ∂r/∂v is perpendicular to the tangent plane, hence normal to the surface.
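Notation: the vector ∂r/∂u × ∂r/∂v is a normal vector to Sigma; it is generally not a unit vector, so it must be normalized (and its sign chosen) to obtain the unit normal n used in flux integrals.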
Outward unit normal vector: The unit vector normal to Sigma pointing away from the "top" or "outer" part of the surface.
For a closed surface (like a sphere), the outward normal points away from the enclosed solid.

🌊 Flux integral definition

Surface integral of vector field f over Sigma: The double integral over Sigma of f · n d-sigma, where n is the outward unit normal vector.

Notation: double integral over Sigma of f · d-sigma = double integral over Sigma of f · n d-sigma.
The dot product f · n is a scalar function, so this reduces to a scalar surface integral.
Physical interpretation (flux): If f represents a fluid velocity field, the flux measures the net quantity of fluid flowing through Sigma per unit time.
- Positive flux: net flow outward (in direction of n).
- Negative flux: net flow inward (in direction of −n).

✈️ Example: Flux through a triangular plane

Example: Evaluate double integral over Sigma of f · d-sigma, where f(x,y,z) = yz i + xz j + xy k and Sigma is the part of the plane x + y + z = 1 with x ≥ 0, y ≥ 0, z ≥ 0, with outward normal in the positive z direction.

Normal vector: The vector v = (1,1,1) is normal to the plane x + y + z = 1.
Outward unit normal: n = (1/√3, 1/√3, 1/√3).
Parametrization: Project Sigma onto the xy-plane to get region R = {(x,y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 1−x}.
Use u = x, v = y, so x = u, y = v, z = 1 − (u+v) for 0 ≤ u ≤ 1, 0 ≤ v ≤ 1−u.
On Sigma: f · n = (1/√3)(yz + xz + xy) = (1/√3)((u+v)(1−(u+v)) + uv) = (1/√3)((u+v) − (u+v)² + uv).
Compute: ∂r/∂u × ∂r/∂v = (1,0,−1) × (0,1,−1) = (1,1,1), so ||∂r/∂u × ∂r/∂v|| = √3.
Flux integral: Double integral over R of (1/√3)((u+v) − (u+v)² + uv) √3 dv du = integral from 0 to 1 integral from 0 to 1−u of ((u+v) − (u+v)² + uv) dv du = 1/8.
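
The value 1/8 can be confirmed numerically. The sketch below uses the fact that (f · n) times ||∂r/∂u × ∂r/∂v|| equals f · (∂r/∂u × ∂r/∂v) when the cross product points in the outward direction, which holds here since it equals (1,1,1). The function name and grid size are illustrative assumptions.

```python
def flux_through_triangle(steps=400):
    """Midpoint-rule flux of f = (yz, xz, xy) through x + y + z = 1 in the first octant."""
    du = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        dv = (1.0 - u) / steps  # inner variable runs over 0 <= v <= 1 - u
        for j in range(steps):
            v = (j + 0.5) * dv
            x, y, z = u, v, 1.0 - (u + v)     # point on the plane
            f = (y * z, x * z, x * y)         # field evaluated on the surface
            normal = (1.0, 1.0, 1.0)          # r_u x r_v for this parametrization
            integrand = f[0] * normal[0] + f[1] * normal[1] + f[2] * normal[2]
            total += integrand * dv * du
    return total

flux = flux_through_triangle()
print(flux)
```

The result should be close to 0.125, matching the exact answer.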

⚡ The Divergence Theorem

⚡ Statement and formula

Divergence Theorem: For a closed surface Sigma bounding a solid S, the surface integral of f · d-sigma equals the triple integral over S of div f dV.

Divergence of f: div f = ∂f₁/∂x + ∂f₂/∂y + ∂f₃/∂z, where f = f₁ i + f₂ j + f₃ k.
Formula: double integral over Sigma of f · d-sigma = triple integral over S of div f dV.
When to use: Sigma must be a closed surface (encloses a bounded solid), e.g., spheres, cubes, ellipsoids.
Don't confuse: Planes and paraboloids are not closed surfaces; the theorem does not apply to them.

⚡ Why the theorem simplifies calculations

Computing surface integrals directly requires parametrizing the surface, finding normal vectors, and integrating over the parameter domain—often tedious, especially when the normal vector changes across different parts of the surface.
The Divergence Theorem converts this to a triple integral over the solid, which is often easier to set up and evaluate.
Trade-off: You must compute the divergence, but this is usually straightforward differentiation.

🎯 Example: Flux through a sphere

Example: Evaluate double integral over Sigma of f · d-sigma, where f(x,y,z) = x i + y j + z k and Sigma is the unit sphere x² + y² + z² = 1.

Direct approach would be tedious: Parametrize the sphere using spherical coordinates, compute the outward normal, etc.
Using Divergence Theorem: div f = ∂x/∂x + ∂y/∂y + ∂z/∂z = 1 + 1 + 1 = 3.
The solid S is the unit ball with volume 4π(1)³/3 = 4π/3.
Result: double integral over Sigma of f · d-sigma = triple integral over S of 3 dV = 3 · (4π/3) = 4π.
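
The "tedious" direct approach can at least be sketched numerically, which gives an independent check on 4π. The code below assumes the standard spherical parametrization of the unit sphere, for which the outward unit normal at (x, y, z) is (x, y, z) itself and the area element is sin φ dφ dθ; the function names and grid size are illustrative.

```python
import math

def field(x, y, z):
    """The vector field f = x i + y j + z k from the example."""
    return (x, y, z)

def flux_unit_sphere(n=200):
    """Midpoint-rule surface integral of f . n over the unit sphere."""
    total = 0.0
    dphi = math.pi / n
    dtheta = 2.0 * math.pi / n
    for i in range(n):
        phi = (i + 0.5) * dphi
        for j in range(n):
            theta = (j + 0.5) * dtheta
            x = math.sin(phi) * math.cos(theta)
            y = math.sin(phi) * math.sin(theta)
            z = math.cos(phi)
            fx, fy, fz = field(x, y, z)
            f_dot_n = fx * x + fy * y + fz * z  # n = (x, y, z) on the unit sphere
            total += f_dot_n * math.sin(phi) * dphi * dtheta
    return total

sphere_flux = flux_unit_sphere()
divergence_side = 3.0 * (4.0 * math.pi / 3.0)  # triple integral of div f = 3
print(sphere_flux, divergence_side)  # both should be close to 4*pi
```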

🌀 Understanding divergence

🌀 Divergence as a measure of "spreading"

The term "divergence" comes from measuring how much a vector field "diverges" (spreads out) from a point.
Alternative definition: div f(x,y,z) = limit as V→0 of (1/V) times (double integral over Sigma of f · d-sigma), where Sigma is a closed surface around (x,y,z) enclosing volume V.
This limit is the ratio of flux through a surface to the volume enclosed, in the limit as the surface shrinks to the point.
Interpretation: A positive divergence means the field is "flowing out" from the point; negative means "flowing in"; zero means no net flow.
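
To make the limit definition concrete, the sketch below computes flux per unit volume through small cubes shrinking to a point and compares it with the divergence computed by differentiation. The sample field, the point, and the choice of cubes as the closed surfaces are all assumptions made for illustration.

```python
import math

def field(x, y, z):
    # Hypothetical sample field; by hand, div f = cos(x) + 2xy + exp(z).
    return (math.sin(x), x * y * y, math.exp(z))

def flux_per_volume(p, h, m=4):
    """Outward flux of `field` through the cube of side h centred at p, over h**3."""
    half = h / 2.0
    cell = h / m
    area = cell * cell
    flux = 0.0
    for axis in range(3):            # faces perpendicular to x, y, z in turn
        for sign in (1.0, -1.0):     # outward normal is sign * e_axis
            for i in range(m):
                for j in range(m):
                    offset = [-half + (i + 0.5) * cell, -half + (j + 0.5) * cell]
                    offset.insert(axis, sign * half)  # fix the face coordinate
                    q = [p[k] + offset[k] for k in range(3)]
                    flux += sign * field(*q)[axis] * area
    return flux / h ** 3

point = (1.0, 2.0, 0.5)
exact_div = math.cos(1.0) + 2.0 * 1.0 * 2.0 + math.exp(0.5)
estimates = {h: flux_per_volume(point, h) for h in (0.5, 0.1, 0.01)}
for h, value in estimates.items():
    print(h, value, exact_div)
```

For this field the estimates should move toward the exact divergence as the side length h decreases.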

🌀 Solenoidal fields

Solenoidal field: A vector field with zero divergence everywhere.

If div f = 0 at every point, the field is called solenoidal.
Theorem: If the flux of f is zero through every closed surface containing a given point, then div f = 0 at that point.
Proof: By the alternative definition, div f(x,y,z) = limit as V→0 of (1/V) · 0 = 0.

🌀 Notation variants

Sometimes the notation ∯_Σ (a double integral sign with a circle through it) is used for surface integrals over closed surfaces.
In physics texts, the single-circle symbol ∮_Σ may be used instead of ∯_Σ.

📋 Summary table: Key formulas

Concept	Formula	Notes
Surface area element	d-sigma = ‖∂r/∂u × ∂r/∂v‖ du dv	Magnitude of cross product of tangent vectors
Surface area	S = ∬_Σ 1 d-sigma	Special case: f = 1
Scalar surface integral	∬_Σ f(x,y,z) d-sigma	Substitute parametrization and integrate over R
Vector surface integral (flux)	∬_Σ f · d-sigma = ∬_Σ f · n d-sigma	n is outward unit normal
Divergence	div f = ∂f₁/∂x + ∂f₂/∂y + ∂f₃/∂z	Sum of partial derivatives of components
Divergence Theorem	∬_Σ f · d-sigma = ∭_S div f dV	Sigma closed, bounding solid S
Divergence (alternative)	div f(x,y,z) = lim_{V→0} (1/V) ∬_Σ f · d-sigma	Flux per unit volume in the limit

Stokes' Theorem

4.5 Stokes' Theorem

🧭 Overview

🧠 One-sentence thesis

Stokes' Theorem generalizes Green's Theorem to three dimensions by equating the circulation of a vector field around a closed curve to the surface integral of its curl over any orientable surface bounded by that curve.

📌 Key points (3–5)

Line integrals extend to R³: The definitions from R² carry over to three-variable functions, allowing integration along curves in three-dimensional space.
Stokes' Theorem statement: For an orientable surface Σ with boundary curve C, the line integral of a vector field f around C equals the surface integral of curl f over Σ.
Orientability matters: A surface must have a continuously varying normal vector field (be "two-sided") for Stokes' Theorem to apply; the Möbius strip is a classic example of a nonorientable surface.
Common confusion: The positive unit normal vector n and the curve traversal direction are linked—if you walk along C with your head pointing in the direction of n, the surface must be on your left (the "n-positive" convention).
Curl measures circulation density: The curl of a vector field quantifies rotation or circulation per unit area; fields with zero curl (irrotational fields) have zero circulation around any closed curve.

📐 Line integrals in three dimensions

📐 Extending to R³

The excerpt begins by noting that line integral definitions from R² (covered in earlier sections) extend naturally to functions of three variables.

Line integral with respect to arc length s: For a real-valued function f(x, y, z) and a curve C parametrized by x = x(t), y = y(t), z = z(t) for a ≤ t ≤ b, the line integral is the integral from a to b of f(x(t), y(t), z(t)) times the square root of (x'(t)² + y'(t)² + z'(t)²) dt.

Similar formulas exist for line integrals with respect to x, y, and z individually.
The geometric interpretation carries over: if f(x, y, z) ≥ 0, the line integral with respect to arc length represents the area of a "picket fence" of height f along the curve C in R³.
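
The arc-length formula translates directly into a one-dimensional numerical integral. The sketch below uses a hypothetical example, f(x, y, z) = z along the helix x = cos t, y = sin t, z = t for 0 ≤ t ≤ 2π, chosen because the exact value 2√2 π² is easy to obtain by hand (the speed is the constant √2).

```python
import math

def line_integral_ds(f, r, dr, a, b, n=2000):
    """Midpoint rule for the integral of f(r(t)) * ||r'(t)|| dt over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        x, y, z = r(t)
        dx, dy, dz = dr(t)
        speed = math.sqrt(dx * dx + dy * dy + dz * dz)
        total += f(x, y, z) * speed * h
    return total

helix = lambda t: (math.cos(t), math.sin(t), t)
helix_velocity = lambda t: (-math.sin(t), math.cos(t), 1.0)

value = line_integral_ds(lambda x, y, z: z, helix, helix_velocity, 0.0, 2.0 * math.pi)
expected = 2.0 * math.sqrt(2.0) * math.pi ** 2
print(value, expected)
```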

🧲 Vector fields in R³

Line integral of a vector field: For a vector field f(x, y, z) = P(x, y, z) i + Q(x, y, z) j + R(x, y, z) k and a curve C with smooth parametrization, the line integral of f along C equals the integral of P dx + Q dy + R dz, which can be written as the integral of f · r'(t) dt.

The physical interpretation: if f represents force, then the line integral represents work done by that force in moving an object along C.
Example: The excerpt shows a conical helix calculation where the line integral is evaluated using the parametrization directly.

🔑 Key theorems for R³

The excerpt states three theorems without proof (analogous to the two-variable case):

Theorem	Content	Significance
Theorem 4.10	Line integral of f · dr equals line integral of f · T ds	Relates the vector field integral to the unit tangent vector T
Theorem 4.11 (Chain Rule)	dw/dt = (∂w/∂x)(dx/dt) + (∂w/∂y)(dy/dt) + (∂w/∂z)(dz/dt)	Extends the chain rule to three variables; also covers partial derivatives with respect to multiple parameters
Theorem 4.12	If f has a potential F (i.e., ∇F = f), then the line integral depends only on endpoints	Path independence for conservative fields

Corollary 4.13: If a vector field has a potential in a solid S, then the line integral around any closed curve in S equals zero.
Example: The excerpt demonstrates using a potential function F(x, y, z) = x²/2 + y²/2 + z² to evaluate a line integral by simply computing F(B) - F(A) at the endpoints.
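
Path independence can be checked numerically for this potential. The field f = x i + y j + 2z k below is simply ∇F for the stated F; the endpoints A = (0, 0, 0) and B = (1, 1, 1) and the two paths are hypothetical choices for illustration, since the notes do not give them.

```python
def F(x, y, z):
    """The potential from the example."""
    return x * x / 2 + y * y / 2 + z * z

def f(x, y, z):
    """Gradient of F: (dF/dx, dF/dy, dF/dz)."""
    return (x, y, 2 * z)

def work(r, dr, n=2000):
    """Midpoint rule for the integral of f(r(t)) . r'(t) dt over 0 <= t <= 1."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        fx, fy, fz = f(*r(t))
        dx, dy, dz = dr(t)
        total += (fx * dx + fy * dy + fz * dz) * h
    return total

# Two different paths from A = (0, 0, 0) to B = (1, 1, 1).
straight = work(lambda t: (t, t, t), lambda t: (1.0, 1.0, 1.0))
curved = work(lambda t: (t, t ** 2, t ** 3), lambda t: (1.0, 2 * t, 3 * t ** 2))
endpoint_difference = F(1, 1, 1) - F(0, 0, 0)
print(straight, curved, endpoint_difference)  # all three should agree
```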

🔄 Orientability and surface properties

🔄 What makes a surface orientable

Orientable surface: A surface Σ in R³ is orientable if there exists a continuous vector field N such that N is nonzero and normal to Σ (perpendicular to the tangent plane) at each point of Σ.

Such an N is called a normal vector field.
Orientable surfaces are "two-sided"—they have an "outer" and "inner" side.
Examples of orientable surfaces: spheres, cylinders, paraboloids, ellipsoids, planes.

🎀 The Möbius strip: a nonorientable example

The Möbius strip is constructed by taking a thin rectangle and connecting its ends at opposite corners, creating a twisted strip.

Why it's nonorientable: If you walk along the center line of a Möbius strip, you return to your starting point upside down—your orientation has changed continuously.
Thinking of your vertical direction as a normal vector, there is a discontinuity at every point because the vertical direction takes two different values at the same location.
The Möbius strip has only one side, making it nonorientable.
Don't confuse: A sphere has two sides (inside and outside); a Möbius strip has only one continuous side.

🧭 Positive unit normal and curve traversal

Positive unit normal vector n: For an orientable surface Σ with boundary curve C, pick a unit normal vector n such that if you walked along C with your head pointing in the direction of n, the surface would be on your left.

In this situation, C is traversed n-positively.
More precisely: if r(t) is the position vector for C and T(t) is the unit tangent vector to C, then the vectors T, n, T × n form a right-handed system.
Example: For the paraboloid z = x² + y² with z ≤ 1, the positive unit normal is n = (-∂z/∂x i - ∂z/∂y j + k) / √(1 + (∂z/∂x)² + (∂z/∂y)²).

🌀 Stokes' Theorem and the curl

🌀 Statement of Stokes' Theorem

Stokes' Theorem: Let Σ be an orientable surface in R³ whose boundary is a simple closed curve C, and let f(x, y, z) = P i + Q j + R k be a smooth vector field. Then the line integral of f · dr around C equals the surface integral of (curl f) · n dσ over Σ.

Where:

curl f = (∂R/∂y - ∂Q/∂z) i + (∂P/∂z - ∂R/∂x) j + (∂Q/∂x - ∂P/∂y) k
n is a positive unit normal vector over Σ
C is traversed n-positively

📝 Proof outline (special case)

The excerpt proves Stokes' Theorem for the special case where Σ is the graph of z = z(x, y) over a region D in R².

Key steps:

Project the surface Σ onto the xy-plane to get region D with boundary curve C_D.
Parametrize C in R³ using the parametrization of C_D and the surface equation z = z(x, y).
Apply the Chain Rule to express dz in terms of dx and dy.
Transform the line integral around C into a line integral around C_D in the xy-plane.
Apply Green's Theorem to convert the line integral around C_D into a double integral over D.
Show that the integrand matches (curl f) · n after careful calculation of partial derivatives.
The key algebraic step uses the fact that ∂²z/∂x∂y = ∂²z/∂y∂x (smoothness of z).

🧪 Verification examples

Example (paraboloid): For f(x, y, z) = z i + x j + y k on the paraboloid z = x² + y² with z ≤ 1:

curl f = i + j + k
Direct calculation of both sides yields π, confirming Stokes' Theorem.
The line integral around the boundary circle (x² + y² = 1 at z = 1) equals the surface integral of the curl over the paraboloid.
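
Both sides of this verification can be approximated numerically. The sketch assumes the boundary circle is parametrized counterclockwise as (cos t, sin t, 1) and uses n dσ = (−∂z/∂x, −∂z/∂y, 1) dA for the graph z = x² + y², which follows from the positive unit normal given earlier; the function names and grid sizes are illustrative.

```python
import math

def circulation(n=2000):
    """Line integral of f = z i + x j + y k around the circle x^2 + y^2 = 1, z = 1."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y, z = math.cos(t), math.sin(t), 1.0
        dx, dy, dz = -math.sin(t), math.cos(t), 0.0
        total += (z * dx + x * dy + y * dz) * h
    return total

def curl_flux(n=400):
    """Surface integral of (curl f) . n over the paraboloid, in polar coordinates."""
    dr = 1.0 / n
    dtheta = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            # curl f = (1, 1, 1) dotted with (-z_x, -z_y, 1) = (-2x, -2y, 1)
            integrand = -2.0 * x - 2.0 * y + 1.0
            total += integrand * r * dr * dtheta
    return total

line_side = circulation()
surface_side = curl_flux()
print(line_side, surface_side, math.pi)  # all should be close to pi
```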

Example (elliptic paraboloid): For a more complex vector field on z = x²/4 + y²/9 with z ≤ 1:

Calculating curl f gives -4y i + 9x j + 0 k.
(curl f) · n = 0 everywhere on the surface.
By Stokes' Theorem, the circulation around the boundary ellipse is zero—much easier than computing the line integral directly!
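
The claim (curl f) · n = 0 can be checked in one line. Assuming the same upward-pointing normal form used for the first paraboloid, with ∂z/∂x = x/2 and ∂z/∂y = 2y/9 for z = x²/4 + y²/9:

```latex
\[
(\operatorname{curl}\mathbf{f})\cdot\mathbf{n}
= \frac{(-4y)\left(-\tfrac{x}{2}\right) + (9x)\left(-\tfrac{2y}{9}\right) + (0)(1)}
       {\sqrt{1 + \tfrac{x^{2}}{4} + \tfrac{4y^{2}}{81}}}
= \frac{2xy - 2xy}{\sqrt{1 + \tfrac{x^{2}}{4} + \tfrac{4y^{2}}{81}}}
= 0 .
\]
```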

🌊 Physical interpretation and applications

🌊 Circulation and irrotational fields

Circulation: For a simple closed curve C, the line integral of f · dr around C is called the circulation of f around C.

Example: The electrostatic field E due to a point charge has curl E = 0, so by Stokes' Theorem the circulation of E around any closed curve is zero.
Irrotational fields: Vector fields with zero curl are called irrotational (meaning no rotation).
The term "curl" was created by James Clerk Maxwell in his study of electromagnetism.

🔄 Curl as circulation density

The excerpt gives an alternative definition of curl:

Curl as a limit: n · (curl f)(x, y, z) = limit as S → 0 of (1/S) times the circulation of f around C, where S is the surface area of a surface Σ containing the point (x, y, z) with boundary curve C and positive unit normal n at (x, y, z).

Think of the curve C shrinking to the point (x, y, z), causing the surface area to approach zero.
The ratio of circulation to surface area in the limit makes curl a measure of circulation per unit area (circulation density).

🎡 Rotation visualization

The excerpt describes a water-wheel analogy:

Consider a vector field f(x, y, z) parallel to the xy-plane, with magnitude growing as you move away from the y-axis (e.g., f(x, y, z) = (1 + x²) j).
Imagine dropping paddle wheels into the water flow.
A wheel to the right of the y-axis rotates counterclockwise; to the left, clockwise.
The curl is nonzero (curl f = 2x k in the example) and obeys the right-hand rule: curl f points in the direction of your thumb as you cup your right hand in the direction of rotation.
If all vectors had the same direction and magnitude, the wheels wouldn't rotate and there would be no curl (irrotational).
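
The paddle-wheel picture can be tied to numbers with a finite-difference curl. This is a minimal sketch: the helper `curl` and its step size h are illustrative, and central differences only approximate the partial derivatives in general.

```python
def paddle_field(x, y, z):
    """The example field f = (1 + x^2) j."""
    return (0.0, 1.0 + x * x, 0.0)

def curl(f, p, h=1e-5):
    """Approximate curl f at point p using central differences."""
    def partial(component, axis):
        hi, lo = list(p), list(p)
        hi[axis] += h
        lo[axis] -= h
        return (f(*hi)[component] - f(*lo)[component]) / (2.0 * h)
    return (
        partial(2, 1) - partial(1, 2),  # dR/dy - dQ/dz
        partial(0, 2) - partial(2, 0),  # dP/dz - dR/dx
        partial(1, 0) - partial(0, 1),  # dQ/dx - dP/dy
    )

right = curl(paddle_field, (1.0, 0.0, 0.0))   # expect about (0, 0, 2): counterclockwise
left = curl(paddle_field, (-1.5, 0.3, 2.0))   # expect about (0, 0, -3): clockwise
print(right, left)
```

The sign of the k-component matches the direction of rotation described above: positive to the right of the y-axis, negative to the left.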

🔗 Path independence and conservative fields

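By Stokes' Theorem, if curl f = 0 in a solid region S, then the circulation of f around any simple closed curve C in S is zero, provided C is the boundary of an orientable surface (a capping surface) lying inside S.

For a simply connected solid region S in R³ (one in which every closed curve can be shrunk to a point without leaving S), the following are equivalent: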

(a) f has a smooth potential F in S (i.e., ∇F = f)
(b) The line integral of f · dr is independent of the path for any curve C in S
(c) The circulation of f around every simple closed curve C in S is zero
(d) ∂R/∂y = ∂Q/∂z, ∂P/∂z = ∂R/∂x, and ∂Q/∂x = ∂P/∂y in S (i.e., curl f = 0 in S)
Part (d) is also a way of saying the differential form P dx + Q dy + R dz is exact.
Example: For f(x, y, z) = xyz i + xz j + xy k, checking the curl conditions shows ∂P/∂z = xy but ∂R/∂x = y, so ∂P/∂z ≠ ∂R/∂x for some points. Therefore, f does not have a potential in R³.
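
Condition (c) gives another way to see this: a single closed curve with nonzero circulation rules out a potential. The sketch below uses a hypothetical curve, the unit circle x = cos t, y = 1, z = sin t, for which the circulation works out by hand to π.

```python
import math

def f(x, y, z):
    """The example field f = xyz i + xz j + xy k."""
    return (x * y * z, x * z, x * y)

def circulation(n=2000):
    """Midpoint rule for the line integral of f around (cos t, 1, sin t)."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y, z = math.cos(t), 1.0, math.sin(t)
        dx, dy, dz = -math.sin(t), 0.0, math.cos(t)
        fx, fy, fz = f(x, y, z)
        total += (fx * dx + fy * dy + fz * dz) * h
    return total

circ = circulation()
print(circ, math.pi)  # nonzero circulation, so f has no potential
```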

Gradient, Divergence, Curl and Laplacian

4.6 Gradient, Divergence, Curl and Laplacian

🧭 Overview

🧠 One-sentence thesis

The del operator ∇ unifies gradient, divergence, curl, and Laplacian into a single symbolic framework that reveals fundamental relationships between these quantities and simplifies their expression in different coordinate systems.

📌 Key points (3–5)

The del operator as a unifying symbol: ∇ can be treated as a "vector" of partial derivative operators, allowing gradient, divergence, curl, and Laplacian to be written using familiar vector operations (dot product, cross product).
Fundamental relationships: The curl of any gradient is always zero (∇ × ∇f = 0), and the divergence of any curl is always zero (∇ · (∇ × f) = 0).
The Laplacian as a composite: The Laplacian ∆f is the divergence of the gradient (∇ · ∇f), producing a sum of second partial derivatives.
Common confusion: ∇ is not truly a vector because its components are operators, not numbers—but thinking of it as a vector helps organize formulas and remember relationships.
Coordinate system flexibility: Gradient, divergence, curl, and Laplacian have different formulas in cylindrical and spherical coordinates, but the underlying operations remain conceptually the same.

🧮 The del operator and its applications

🧮 What ∇ represents

The del operator: ∇ = (∂/∂x) i + (∂/∂y) j + (∂/∂z) k

Each component (∂/∂x, ∂/∂y, ∂/∂z) is a "partial derivative operator" that gets applied to functions.
Strictly speaking, ∇ is not a true vector because its components are operators, not numbers.
However, treating ∇ as if it were a vector allows us to use familiar vector operations (dot product, cross product) to express gradient, divergence, and curl.
Example: Applying ∂/∂x to a function f(x, y, z) produces the partial derivative ∂f/∂x.

📐 Gradient as ∇ applied to a scalar

Gradient of f: ∇f = (∂f/∂x) i + (∂f/∂y) j + (∂f/∂z) k

The gradient takes a real-valued (scalar) function f(x, y, z) and produces a vector field.
Think of ∇ as being "applied" to f to produce the vector ∇f.
Each component of ∇f is a partial derivative of f evaluated at the point (x, y, z).

🔄 Divergence as ∇ · f

Divergence of f: ∇ · f = ∂f₁/∂x + ∂f₂/∂y + ∂f₃/∂z

For a vector field f = f₁ i + f₂ j + f₃ k, the divergence is the dot product of ∇ with f.
The "multiplication" here means applying each partial derivative operator to the corresponding component of f.
The result is a scalar (real-valued function).
Example: (∂/∂x)(f₁) + (∂/∂y)(f₂) + (∂/∂z)(f₃) = ∂f₁/∂x + ∂f₂/∂y + ∂f₃/∂z.

🌀 Curl as ∇ × f

Curl of f: ∇ × f computed as a determinant with i, j, k in the first row, ∂/∂x, ∂/∂y, ∂/∂z in the second row, and P, Q, R (components of f) in the third row.

For a vector field f = P i + Q j + R k, the curl is the cross product of ∇ with f.
The result is a vector field.
The formula expands to: (∂R/∂y - ∂Q/∂z) i + (∂P/∂z - ∂R/∂x) j + (∂Q/∂x - ∂P/∂y) k.
Don't confuse: This is not a true cross product of vectors, but the notation and computation follow the same pattern.

🔗 The Laplacian and fundamental identities

🔗 Defining the Laplacian

Laplacian of f: ∆f = ∇ · ∇f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z²

The Laplacian is the divergence of the gradient of a scalar function f.
It takes a scalar function and produces another scalar function.
The result is the sum of all the unmixed second partial derivatives.
Alternative notation: ∇²f is often used instead of ∆f, using the convention ∇² = ∇ · ∇.
Example: For f(x, y, z) = x² + y² + z², the Laplacian is 2 + 2 + 2 = 6.

⚡ Curl of a gradient is always zero

Theorem: For any smooth real-valued function f, ∇ × (∇f) = 0.

Why: When you compute the curl of ∇f, each component involves mixed second partial derivatives like ∂²f/∂y∂z - ∂²f/∂z∂y, which are equal for smooth functions, so they cancel.
Implication: Gradients are irrotational (have no curl).
Corollary: If a vector field f has a potential (i.e., f = ∇φ for some scalar φ), then curl f = 0.
Don't confuse: Not every vector field with zero curl is a gradient—this depends on the domain (topology matters).

🌊 Divergence of a curl is always zero

Theorem: For any smooth vector field f, ∇ · (∇ × f) = 0.

Why: The proof is similar to the curl-of-gradient case—mixed partials cancel.
Implication: Curls are solenoidal (have zero divergence).
Corollary: The flux of the curl of any smooth vector field through any closed surface is zero (by the Divergence Theorem, the flux equals the triple integral of ∇ · (∇ × f), which is zero).
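
The cancellation behind ∇ · (∇ × f) = 0 can be written out explicitly. With f = P i + Q j + R k and the curl formula above, and using equality of mixed partials for smooth P, Q, R:

```latex
\begin{align*}
\nabla\cdot(\nabla\times\mathbf{f})
&= \frac{\partial}{\partial x}\!\left(\frac{\partial R}{\partial y}-\frac{\partial Q}{\partial z}\right)
 + \frac{\partial}{\partial y}\!\left(\frac{\partial P}{\partial z}-\frac{\partial R}{\partial x}\right)
 + \frac{\partial}{\partial z}\!\left(\frac{\partial Q}{\partial x}-\frac{\partial P}{\partial y}\right) \\
&= \left(\frac{\partial^{2} R}{\partial x\,\partial y}-\frac{\partial^{2} R}{\partial y\,\partial x}\right)
 + \left(\frac{\partial^{2} P}{\partial y\,\partial z}-\frac{\partial^{2} P}{\partial z\,\partial y}\right)
 + \left(\frac{\partial^{2} Q}{\partial z\,\partial x}-\frac{\partial^{2} Q}{\partial x\,\partial z}\right)
 = 0 .
\end{align*}
```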

🔍 Alternative proof technique

The excerpt mentions a physics-style proof method: if the surface integral of a vector field over all surfaces in a region equals zero, then the vector field itself must be zero throughout that region.
Example application: To prove ∇ × (∇f) = 0, use Stokes' Theorem to show that the line integral of ∇f around any closed curve is zero (by a corollary), which implies the surface integral of ∇ × (∇f) over any capping surface is zero, forcing ∇ × (∇f) = 0.
Don't confuse: This method is powerful but requires careful justification (the excerpt notes physicists often skip the rigorous proof).

📏 Coordinate systems: cylindrical and spherical

📏 Why other coordinate systems matter

Often (especially in physics) it is more convenient to use cylindrical or spherical coordinates instead of Cartesian coordinates.
The gradient, divergence, curl, and Laplacian have different formulas in these systems, but the underlying concepts remain the same.
The excerpt provides complete formulas in tables for Cartesian, cylindrical, and spherical coordinates.

🔵 Cylindrical coordinates (r, θ, z)

Conversion from Cartesian: x = r cos θ, y = r sin θ, z = z.
Basis vectors: eᵣ, eθ, e_z are unit vectors in the direction of increasing r, θ, z respectively; they form an orthonormal set.
Key feature: By the right-hand rule, e_z × eᵣ = eθ.

Quantity	Formula in cylindrical coordinates
Gradient of F	(∂F/∂r) eᵣ + (1/r)(∂F/∂θ) eθ + (∂F/∂z) e_z
Divergence of f	(1/r)(∂/∂r)(r fᵣ) + (1/r)(∂fθ/∂θ) + ∂f_z/∂z
Laplacian of F	(1/r)(∂/∂r)(r ∂F/∂r) + (1/r²)(∂²F/∂θ²) + ∂²F/∂z²

Note: The curl formula is also provided in the excerpt but is more complex.
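
One way to gain confidence in the cylindrical Laplacian formula is to compare it with the Cartesian one on a test function. The sketch below uses a hypothetical function F = x z² + (x² + y²) z, whose Cartesian Laplacian is 2x + 4z, written in cylindrical form as r cos θ z² + r² z; derivatives are approximated by central differences with an illustrative step h.

```python
import math

def F_cyl(r, theta, z):
    """Hypothetical test function x z^2 + (x^2 + y^2) z in cylindrical coordinates."""
    return r * math.cos(theta) * z ** 2 + r ** 2 * z

def laplacian_cyl(F, r, theta, z, h=1e-3):
    """Finite-difference version of the cylindrical Laplacian formula."""
    def dF_dr(rr):
        return (F(rr + h, theta, z) - F(rr - h, theta, z)) / (2.0 * h)
    # (1/r) d/dr ( r dF/dr )
    radial = ((r + h) * dF_dr(r + h) - (r - h) * dF_dr(r - h)) / (2.0 * h) / r
    # (1/r^2) d^2F/dtheta^2
    angular = (F(r, theta + h, z) - 2.0 * F(r, theta, z) + F(r, theta - h, z)) / h ** 2 / r ** 2
    # d^2F/dz^2
    axial = (F(r, theta, z + h) - 2.0 * F(r, theta, z) + F(r, theta, z - h)) / h ** 2
    return radial + angular + axial

r0, theta0, z0 = 2.0, 0.7, 1.5
numeric = laplacian_cyl(F_cyl, r0, theta0, z0)
cartesian = 2.0 * (r0 * math.cos(theta0)) + 4.0 * z0  # 2x + 4z with x = r cos(theta)
print(numeric, cartesian)
```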

🔴 Spherical coordinates (ρ, θ, φ)

Conversion from Cartesian: x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ.
Basis vectors: eρ, eθ, eφ are unit vectors in the direction of increasing ρ, θ, φ respectively; they form an orthonormal set.
Key feature: By the right-hand rule, eθ × eρ = eφ.

Quantity	Formula in spherical coordinates
Gradient of F	(∂F/∂ρ) eρ + (1/(ρ sin φ))(∂F/∂θ) eθ + (1/ρ)(∂F/∂φ) eφ
Divergence of f	(1/ρ²)(∂/∂ρ)(ρ² fρ) + (1/(ρ sin φ))(∂fθ/∂θ) + (1/(ρ sin φ))(∂/∂φ)(sin φ fφ)
Laplacian of F	(1/ρ²)(∂/∂ρ)(ρ² ∂F/∂ρ) + (1/(ρ² sin² φ))(∂²F/∂θ²) + (1/(ρ² sin φ))(∂/∂φ)(sin φ ∂F/∂φ)

Note: The curl formula is also provided but is lengthy.

🛠️ Deriving formulas in new coordinates

The basic idea: Start with the Cartesian formula and substitute using the coordinate transformation.
Step 1: Express the new basis vectors (e.g., eρ, eθ, eφ) in terms of i, j, k.
Step 2: Solve for i, j, k in terms of the new basis vectors.
Step 3: Use the Chain Rule to express partial derivatives in the new coordinates in terms of Cartesian partials.
Step 4: Solve for Cartesian partials in terms of new-coordinate partials.
Step 5: Substitute everything into the Cartesian formula and simplify.
The excerpt walks through this process for the gradient in spherical coordinates—it is "straightforward but extremely tedious," involving 22 terms that must be simplified.
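
Step 1 can be sketched for spherical coordinates. The component formulas below are not quoted from the notes; they are the standard expressions obtained by differentiating (x, y, z) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ) with respect to each coordinate and normalizing. The code checks that the three vectors are orthonormal and that eθ × eρ = eφ, as stated earlier.

```python
import math

def spherical_basis(theta, phi):
    """Return e_rho, e_theta, e_phi as (i, j, k) component tuples."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    e_rho = (sp * ct, sp * st, cp)
    e_theta = (-st, ct, 0.0)
    e_phi = (cp * ct, cp * st, -sp)
    return e_rho, e_theta, e_phi

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

e_rho, e_theta, e_phi = spherical_basis(theta=0.8, phi=1.1)
lengths = [dot(e, e) for e in (e_rho, e_theta, e_phi)]                  # each about 1
mixed = [dot(e_rho, e_theta), dot(e_rho, e_phi), dot(e_theta, e_phi)]  # each about 0
cross_error = max(abs(c - e) for c, e in zip(cross(e_theta, e_rho), e_phi))
print(lengths, mixed, cross_error)
```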

🧪 Applications and examples

🧪 Position vector field calculations

Example from the excerpt: Let r(x, y, z) = x i + y j + z k be the position vector field, and let the norm-squared be ‖r‖² = x² + y² + z².
Gradient of ‖r‖²: ∇‖r‖² = 2x i + 2y j + 2z k = 2r.
Divergence of r: ∇ · r = ∂x/∂x + ∂y/∂y + ∂z/∂z = 1 + 1 + 1 = 3.
Curl of r: ∇ × r = 0 (all components cancel).
Laplacian of ‖r‖²: ∆‖r‖² = 2 + 2 + 2 = 6.
Alternative calculation: ∆‖r‖² = ∇ · ∇‖r‖² = ∇ · 2r = 2(∇ · r) = 2(3) = 6.

🧪 Verification in spherical coordinates

The excerpt verifies the same results using spherical coordinates.
In spherical coordinates, ‖r‖² = ρ², so let F(ρ, θ, φ) = ρ².
Gradient: ∇F = (∂F/∂ρ) eρ + ... = 2ρ eρ = 2ρ (r/‖r‖) = 2ρ (r/ρ) = 2r, matching the Cartesian result.
Laplacian: ∆F = (1/ρ²)(∂/∂ρ)(ρ² · 2ρ) + 0 + 0 = (1/ρ²)(∂/∂ρ)(2ρ³) = (1/ρ²)(6ρ²) = 6, matching the Cartesian result.

⚡ Maxwell's Equations example

The excerpt shows how Gauss' Law for electrostatics can be converted into one of Maxwell's Equations using the Divergence Theorem.
Gauss' Law: The flux of the electric field E through any closed surface Σ equals 4π times the total charge enclosed (in Gaussian units).
By the Divergence Theorem, the flux equals the triple integral of ∇ · E over the enclosed solid S.
Equating the two expressions and using the fact that the surface (and solid) are arbitrary, we conclude ∇ · E = 4πρ, where ρ is the charge density.
This is one of Maxwell's Equations in differential form.
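
The chain of equalities behind this argument can be written compactly. The sketch assumes the enclosed charge is given by a continuous charge density ρ, so that the total charge inside Σ is the triple integral of ρ over S:

```latex
\[
\iint_{\Sigma} \mathbf{E}\cdot d\boldsymbol{\sigma}
= 4\pi \iiint_{S} \rho \, dV
\quad\text{(Gauss' Law)},
\qquad
\iint_{\Sigma} \mathbf{E}\cdot d\boldsymbol{\sigma}
= \iiint_{S} \nabla\cdot\mathbf{E} \, dV
\quad\text{(Divergence Theorem)},
\]
\[
\text{so}\quad
\iiint_{S} \left(\nabla\cdot\mathbf{E} - 4\pi\rho\right) dV = 0
\ \text{ for every solid } S
\quad\Longrightarrow\quad
\nabla\cdot\mathbf{E} = 4\pi\rho .
\]
```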