Calculus

1

Distance and Speed = Height and Slope

0.1 Distance and Speed = Height and Slope

🧭 Overview

🧠 One-sentence thesis

Calculus is fundamentally about pairs of functions—an original function (height or distance) and its growth rate (slope or speed)—and the first step is understanding how to compute that growth rate.

📌 Key points (3–5)

  • Calculus studies pairs of functions: Function (1) is the original (e.g., height, distance), and Function (2) is its growth rate (e.g., slope, speed).
  • Three fundamental examples: linear (y = 2x), quadratic (y = x²), and exponential (y = 2ˣ) functions grow at different rates—exponential eventually wins.
  • Two methods to find growth rates: Method 1 uses limits (change in y / change in x as Δx → 0); Method 2 uses rules to build new rates from known ones.
  • Common confusion: average slope vs. instantaneous slope—average slope is Δy/Δx over an interval; calculus aims to find the slope at a single point by taking the limit.
  • Why it matters: the growth rate (dy/dx) tells you how fast a function is changing, which applies to real-world problems like speed (rate of distance change).

📊 Three core functions and their growth rates

📈 Linear function: y = 2x

  • Formula: y(x) = 2x
  • Growth rate: dy/dx = 2 (constant)
  • The graph is a straight line with constant slope.
  • Ratio of "up" to "across" is always 2, no matter which two points you pick.
  • Example: between x₁ = 1 and x₂ = 2, Δy = 4 − 2 = 2 and Δx = 1, so Δy/Δx = 2.

📈 Squaring function: y = x²

  • Formula: y(x) = x²
  • Growth rate: dy/dx = 2x (linear, not constant)
  • The graph is a parabola with increasing slope.
  • Example: between x₁ = 1 and x₂ = 2, Δy = 4 − 1 = 3 and Δx = 1, so average slope = 3; between x₁ = 0 and x₂ = 2, average slope = (4 − 0)/(2 − 0) = 2.
  • Don't confuse: the average slope over an interval is not the same as the instantaneous slope at a point.

📈 Exponential function: y = 2ˣ

  • Formula: y(x) = 2ˣ
  • Growth rate: dy/dx = 2ˣ · (ln 2)
  • The graph is an exponential curve with exponentially increasing growth rate.
  • At first (near x = 0), the linear function grows fastest, but the exponential eventually overtakes both others.
  • Example: at x = 10, y = 2¹⁰ = 1024, far exceeding y = 10² = 100 for the quadratic.
  • The exponential "wins" in the long run.
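The comparison of the three sample functions can be tabulated directly. This is a minimal sketch; the function names `linear`, `square`, and `exponential` are illustrative and do not come from the source.

```python
# Evaluate y = 2x, y = x**2 and y = 2**x side by side at a few inputs.
def linear(x):
    return 2 * x

def square(x):
    return x ** 2

def exponential(x):
    return 2 ** x

for x in [0, 1, 2, 4, 6, 10]:
    print(x, linear(x), square(x), exponential(x))
```

The columns show 2ˣ and x² tied at x = 4 and the exponential pulling ahead after that.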

🔍 What is a function?

🔍 Definition and notation

A function has inputs x and outputs y(x). To each x it assigns one y.

  • Domain: the set of allowed inputs x.
  • Range: the set of resulting outputs y.
  • Example: for y = 2x with x ≥ 0, the range is y ≥ 0; for y = 2ˣ with x ≥ 0, the range is y ≥ 1.

🔍 Three ways to describe a function

  1. Formula: y(x) = 2x
  2. Graph: shows x (horizontal) and y (vertical)
  3. Input-output pairs: the set of all (x, y) pairs
  • The high-level definition: a function is the set of all input-output pairs, or the rule that assigns an output to every input.
  • Practically, we learn by examples first, then refine the definition.

🧮 Two methods to compute growth rates

🧮 Method 1: Limits

  • Write the ratio: (change in y) / (change in x) = Δy / Δx.
  • Take the limit as Δx → 0.
  • This gives the instantaneous rate of change (the slope at a single point).
  • Example: for y = x², the average slope between x₁ and x₂ is Δy/Δx; the limit as Δx → 0 gives dy/dx = 2x.
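Method 1 can be tried numerically by shrinking the step. The sketch below assumes a helper named `average_slope` (hypothetical, not from the source) and evaluates it for y = x² at x = 3, where the limit should be 2x = 6.

```python
def average_slope(y, x, dx):
    """Average slope of y between x and x + dx (dx must be nonzero)."""
    return (y(x + dx) - y(x)) / dx

def square(x):
    return x * x

# The ratio equals 6 + dx up to rounding, so it approaches 6 as dx shrinks.
for dx in [1.0, 0.1, 0.01, 0.001]:
    print(dx, average_slope(square, 3.0, dx))
```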

🧮 Method 2: Rules

  • Use known growth rates and combine them with rules.
  • Constant factor rule: if y = C · f(x), then dy/dx = C · (df/dx).
    • Example: y = 5x² has growth rate 5 · (2x) = 10x.
  • Sum rule: if y = y₁ + y₂, then dy/dx = dy₁/dx + dy₂/dx.
    • Example: y = 5x² + 2x has growth rate 10x + 2.
  • Linear combination: the growth rate of C₁y₁ + C₂y₂ is C₁(dy₁/dx) + C₂(dy₂/dx).
  • Don't confuse: these rules only work when you already know the growth rates of the building-block functions.
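The rule-based answer for y = 5x² + 2x can be compared with a numerical slope estimate. This is a sketch under assumptions: the symmetric step `h` and the helper names are illustrative choices, not part of the source.

```python
def central_slope(y, x, h=1e-5):
    """Numerical slope estimate using a small symmetric step h (assumed value)."""
    return (y(x + h) - y(x - h)) / (2 * h)

def y(x):
    return 5 * x**2 + 2 * x

def rule_based_slope(x):
    # Constant factor rule and sum rule applied to the known slopes 2x and 2.
    return 5 * (2 * x) + 2

for x in [0.0, 1.0, 2.5]:
    print(x, rule_based_slope(x), central_slope(y, x))
```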

📐 Slope of a graph

📐 Average slope

Average slope = (change in y) / (change in x) = (y₂ − y₁) / (x₂ − x₁) = Δy / Δx

  • Δ (delta) is the symbol for "change."
  • The average slope is the ratio of "distance up" to "distance across" between two points.
  • Example: for y = x² between x₁ = 1 and x₂ = 2, average slope = (4 − 1) / (2 − 1) = 3.

📐 Constant vs. changing slope

| Function | Slope behavior | Example |
| --- | --- | --- |
| y = 2x | Constant slope = 2 | Δy/Δx = 2 for any interval |
| y = x² | Changing slope | Average slope = 3 between x = 1 and x = 2; different elsewhere |
  • For y = 2x, the ratio Δy/Δx = (2x₂ − 2x₁) / (x₂ − x₁) = 2 always.
  • For y = x², the slope increases as x increases (the graph gets steeper).

📐 Function (1) and Function (2)

  • Function (1): height of the graph (or distance traveled).
  • Function (2): slope of the graph (or speed of the car).
  • Example: if Function (1) is distance traveled y = C·t, then Function (2) is speed dy/dt = C (constant).
  • The core task of differential calculus: given Function (1), find Function (2).

🚗 Distance and speed analogy

🚗 Real-world interpretation

  • Function (1): distance traveled = C·t (linear function of time).
  • Function (2): speed of the car = C (constant).
  • The growth rate of distance with respect to time is speed.
  • Example: if you drive at constant speed C, your distance increases linearly, and the slope of the distance-time graph is C.

🚗 Why calculus matters

  • When speed is not constant (e.g., accelerating car), the distance function is not linear (e.g., y = t²).
  • Calculus finds the instantaneous speed (slope) at any moment by taking the limit of Δy/Δt as Δt → 0.
  • This generalizes to any rate of change: growth rate, slope, speed, etc.
2

The Changing Slope of y = x² and y = xⁿ

0.2 The Changing Slope of y = x² and y = xⁿ

🧭 Overview

🧠 One-sentence thesis

The slope at a single point is defined as the limit of the average slope ratio Δy/Δx as the step Δx becomes infinitesimally small, which for y = x² yields the instantaneous slope 2x.

📌 Key points (3–5)

  • Average slope vs instantaneous slope: average slope is Δy/Δx over an interval; instantaneous slope at a point is the limit of this ratio as Δx approaches zero.
  • The central problem of differential calculus: defining "rate of change" at a single moment when nothing actually changes in that moment.
  • Algebraic technique: expand y(x + Δx) − y(x), cancel leading terms, divide by Δx, then let Δx shrink to find the limit.
  • Common confusion: do not try to define dy, dx, or 0/0 separately; instead, recognize that the ratio Δy/Δx has a well-defined limit.
  • First-order vs second-order terms: after cancellation, the first-order term (proportional to Δx) dominates; the second-order term (proportional to (Δx)²) still carries a factor of Δx after division, so it vanishes as Δx shrinks.

📐 Understanding slope: from average to instantaneous

📏 Average slope between two points

Average slope = (change in y) / (change in x) = (y₂ − y₁) / (x₂ − x₁) = Δy / Δx

  • The Greek letter Δ (delta) means "change."
  • For y = x², the average slope depends on which two points you choose.
  • Example: Between x₁ = 1 and x₂ = 2, y₁ = 1 and y₂ = 4, so average slope = (4 − 1)/(2 − 1) = 3.
  • Between x₁ = 0 and x₂ = 2, the average slope is (4 − 0)/(2 − 0) = 2, not 3, showing that the slope changes along the curve.

🎯 The problem: slope at a single point

  • At one point, the distance across is Δx = 0 and the distance up is Δy = 0.
  • Formally, the ratio looks like 0/0, which has no meaning.
  • The inspiration of calculus: give this ratio a useful meaning by taking a limit.
  • Don't confuse: we do not define dy and dx separately; we define the limit of the ratio Δy/Δx.

🔬 The algebraic method for y = x²

🧮 Step-by-step calculation

The excerpt shows how to compute Δy/Δx for y = x²:

  1. Set up the ratio: Δy/Δx = [y(x + Δx) − y(x)] / Δx
  2. Expand: (x + Δx)² − x² = x² + 2x·Δx + (Δx)² − x²
  3. Cancel leading terms: x² and −x² cancel
  4. Divide by Δx: [2x·Δx + (Δx)²] / Δx = 2x + Δx

🔍 First-order vs second-order terms

| Term | Form | Behavior after dividing by Δx | Importance |
| --- | --- | --- | --- |
| First-order | 2x·Δx | Becomes 2x | Dominates; responsible for most of Δy |
| Second-order | (Δx)² | Becomes Δx (still small) | Vanishes as Δx shrinks |
  • The first-order term 2x·Δx is proportional to Δx; after division, it leaves 2x.
  • The second-order term (Δx)² is proportional to (Δx)²; after division, it becomes Δx, which disappears as Δx approaches zero.
  • Example: if Δx = 0.01, then (Δx)²/Δx = 0.01, which is negligible compared to 2x.

🎓 The limit definition of slope

📉 Slope at x as a limit

The slope at x is the limit of Δy/Δx = [y(x + Δx) − y(x)] / Δx as Δx becomes very small.

  • The distance across (from x to x + Δx) is Δx.
  • The distance up (from y(x) to y(x + Δx)) is Δy.
  • As Δx shrinks, the ratio Δy/Δx approaches a definite number: the instantaneous slope.
  • For y = x², the ratio 2x + Δx approaches 2x as Δx approaches zero.
  • Therefore, the slope at any point x is 2x.

⚠️ What not to do

  • Do not try to define dy and dx separately as "infinitesimally small numbers."
  • Do not try to give meaning to 0/0 directly.
  • The successful plan: recognize that the ratio Δy/Δx is clearly defined for any nonzero Δx, and this ratio can approach a limit.

🌟 Generalizing to y = xⁿ

🔢 The pattern

  • The excerpt mentions "y = xⁿ" in the title, indicating that the same method applies to higher powers.
  • The algebraic expansion will again produce leading terms that cancel, a first-order term that dominates, and higher-order terms that vanish.
  • The slope (derivative) will be a function of x, computed by the same limit process.
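The same limit process can be explored numerically for higher powers. The sketch below compares the average slope of y = xⁿ with n·xⁿ⁻¹, the standard power rule; that formula is not stated in the section above and is brought in here as a known result. The function names are illustrative.

```python
def average_slope_power(n, x, dx):
    """Average slope of y = x**n between x and x + dx."""
    return ((x + dx) ** n - x ** n) / dx

def power_rule_slope(n, x):
    """Slope predicted by the standard power rule n * x**(n - 1)."""
    return n * x ** (n - 1)

x = 2.0
for n in [2, 3, 5]:
    estimates = [average_slope_power(n, x, dx) for dx in (0.1, 0.001, 0.00001)]
    print(n, estimates, power_rule_slope(n, x))
```

For each n the estimates settle toward the power-rule value as the step shrinks.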
3

The Exponential y = e^x

0.3 The Exponential y = e^x

🧭 Overview

🧠 One-sentence thesis

The slope at a point is defined as the limit of the ratio Δy/Δx as Δx becomes very small, and for y = x², this limit equals 2x.

📌 Key points (3–5)

  • What slope at a point means: the limit of the average slope Δy/Δx as the horizontal step Δx shrinks toward zero.
  • How algebra leads to calculus: compute the ratio (y(x + Δx) − y(x))/Δx algebraically, then examine what happens as Δx becomes very small.
  • Key algebraic pattern for y = x²: the leading x² terms cancel, leaving 2x·Δx as the "first-order term" and (Δx)² as the "second-order term."
  • Common confusion: we are not dividing 0 by 0; instead, the ratio Δy/Δx is well-defined for small Δx, and we look at its limit.
  • Why this matters: this two-step process (algebra then limit) extends to all power functions y = xⁿ and forms the foundation of differential calculus.

🔍 The central question of differential calculus

🔍 What "slope at a point" means

  • At a single moment or single point, nothing actually changes—so what does "rate of change" mean?
  • The excerpt poses this as the crucial question: how to define and compute slope when we are at one point on a curve like y = x².

🎯 The two-step answer

  1. Algebra step: compute the average slope Δy/Δx over a small interval from x to x + Δx.
  2. Calculus step: let Δx become very small and find the limit of that ratio.

Slope at x: the limit of (Δy/Δx) = (y(x + Δx) − y(x))/Δx as Δx shrinks.

  • The excerpt emphasizes that we do not try to define dy, dx, or 0/0 individually.
  • Instead, the ratio Δy/Δx is clearly defined for any nonzero Δx, and we examine what value it approaches.

🧮 Algebraic calculation for y = x²

🧮 Setting up the ratio

  • Horizontal distance: from x to x + Δx, so the step is Δx.
  • Vertical distance: from y(x) to y(x + Δx), so Δy = (x + Δx)² − x².

The ratio becomes:

  • Δy/Δx = ((x + Δx)² − x²)/Δx

🔢 Expanding and simplifying

Expand (x + Δx)²:

  • (x + Δx)² = x² + 2x·Δx + (Δx)²

Substitute into the ratio:

  • Δy/Δx = (x² + 2x·Δx + (Δx)² − x²)/Δx

Cancel the leading x² terms:

  • Δy/Δx = (2x·Δx + (Δx)²)/Δx = 2x + Δx

🎯 First-order vs second-order terms

| Term | Name | Role |
| --- | --- | --- |
| 2x·Δx | First-order term | Responsible for most of Δy; survives division by Δx as 2x |
| (Δx)² | Second-order term | After dividing by Δx, becomes Δx, which disappears as Δx → 0 |
  • The excerpt highlights that the "leading terms" x² and −x² cancel.
  • The important term is 2x·Δx, which after division gives 2x.
  • The second-order term (Δx)²/Δx = Δx is "still small" and will vanish in the limit.

🚀 Taking the limit

🚀 What happens as Δx becomes very small

  • After simplification, Δy/Δx = 2x + Δx.
  • As Δx shrinks toward zero, the ratio approaches 2x.
  • Therefore, the slope at x is 2x.

Example: At x = 3, the slope is 2·3 = 6; at x = 5, the slope is 2·5 = 10.

⚠️ Don't confuse: ratio vs individual terms

  • We are not defining dy or dx separately.
  • We are not computing 0/0.
  • We are computing a well-defined ratio for small but nonzero Δx, then examining its limit.
  • The excerpt warns: "Trying to define dy and dx and 0/0 is not wise, and I won't do it."

🔗 Connection to general power functions

🔗 Extension to y = xⁿ

  • The excerpt's title mentions "y = x² and y = xⁿ," indicating this algebraic method generalizes.
  • The same two-step process (expand, cancel leading terms, divide, take limit) applies to any power function.

🔗 Broader context from practice questions

The excerpt includes practice questions that reinforce:

  • Drawing slope graphs from function graphs (when f is increasing, slope s is positive; when f is at max/min, slope is zero).
  • Recovering the function f from its slope df/dt requires knowing a starting height f(0).
  • The area under the slope graph relates back to the change in the function.

These questions show that the slope concept applies to any function f(t), not just polynomials.

4

Video Summaries and Practice Problems

0.4 Video Summaries and Practice Problems

🧭 Overview

🧠 One-sentence thesis

This section provides summaries and practice problems for video lectures on differential and integral calculus, covering derivatives, integrals, optimization, and key functions like exponentials and trigonometric functions.

📌 Key points (3–5)

  • Purpose: Companion material for "Highlights of Calculus" video lectures available on MIT OpenCourseWare.
  • Structure: Contains summaries and practice questions for 15 video lectures covering differential and integral calculus topics.
  • Key topics: Maximum/minimum problems using derivatives, the big picture of integrals, derivatives of sine/cosine, product/quotient/chain rules, and growth rates.
  • Common confusion: Understanding when to use first derivative (slope = 0 for max/min) versus second derivative (bending direction to confirm max vs min).
  • Format: Each lecture summary includes key concepts, worked examples, and practice questions for self-study.

📐 Maximum and Minimum Problems

📐 Finding extrema using derivatives

To find maximum and minimum values of a function y(x): solve dy/dx = 0 to find points x* where slope = zero, then test each x* for a possible minimum or maximum.

  • The derivative equals zero at peaks and valleys of a curve.
  • Not every zero-slope point is a max or min; further testing is needed.
  • Example: For y(x) = x³ - 12x, the slope is dy/dx = 3x² - 12; solving 3x² = 12 gives x* = 2 and x* = -2.

🔍 Second derivative test

The second derivative d²y/dx² tells us about bending:

| Second derivative | Meaning | Type of point |
| --- | --- | --- |
| d²y/dx² > 0 | Curve bends upward | Minimum |
| d²y/dx² < 0 | Curve bends downward | Maximum |
| d²y/dx² = 0 | Bending changes direction | Point of inflection |
  • At x* = 2: second derivative = 6(2) = 12 > 0 → minimum (bending up).
  • At x* = -2: second derivative = 6(-2) = -12 < 0 → maximum (bending down).
  • Example: y = sin x + cos x has maximum at x = π/4 radians (45 degrees) where y = √2.
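The second derivative test for y = x³ − 12x can be written out as a short check. This is a sketch; the function names are illustrative, and the derivatives 3x² − 12 and 6x are entered by hand rather than computed symbolically.

```python
def first_derivative(x):
    return 3 * x**2 - 12      # dy/dx for y = x**3 - 12x

def second_derivative(x):
    return 6 * x              # d2y/dx2 for y = x**3 - 12x

def classify(x_star):
    """Classify a zero-slope point by the sign of the second derivative."""
    bending = second_derivative(x_star)
    if bending > 0:
        return "minimum"
    if bending < 0:
        return "maximum"
    return "inconclusive"

for x_star in (2, -2):
    assert first_derivative(x_star) == 0   # both are zero-slope points
    print(x_star, classify(x_star))
```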

🎯 Word problem strategy

  • Choose a suitable variable x to represent the quantity you can control.
  • Express the quantity to optimize as a function of x.
  • Example: Minimum of y = (x-1)² + (x-2)² + (x-6)² occurs at x = 3, the average of 1, 2, and 6.

🔗 The Big Picture of Integrals

🔗 Integration as reverse differentiation

Key problem: Recover the integral y(x) from its derivative dy/dx.

  • Integral calculus reverses differential calculus: if you know the slope history, you can find the original function.
  • Notation: y(x) = ∫(dy/dx)dx adds up the whole history of slopes.
  • Example: If dy/dx = x³, then y(x) = (1/4)x⁴ + C (any constant C works).

📊 Three interpretations of integrals

Method 1 - Recognition: Recognize dy/dx as the derivative of a known function.

  • If dy/dx = e^(2x), then y = (1/2)e^(2x) + C.

Method 2 - Area under curve: The integral equals the area under the graph of the derivative.

  • For s(t) = 6t (speed), the distance y(t) = 3t² equals the triangular area under the speed graph.

Method 3 - Sum of small steps: Add up rectangles of height s(t*) and width Δt.

  • As Δt → 0, the sum of rectangles → exact integral.
  • Each rectangle: s(t*)Δt = height × base = area.
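The sum of rectangles can be computed for the speed s(t) = 6t and compared with the exact distance 3t². This sketch assumes t* is taken at the left end of each step, which is one of several valid choices; the names are illustrative.

```python
def speed(t):
    return 6 * t

def riemann_distance(t_end, steps):
    """Sum of rectangle areas s(t*) * dt, with t* at the left end of each step."""
    dt = t_end / steps
    return sum(speed(i * dt) * dt for i in range(steps))

# The exact distance at t = 2 is 3 * 2**2 = 12; the sums approach it from below.
for steps in [10, 100, 1000, 100000]:
    print(steps, riemann_distance(2.0, steps))
```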

🎓 Fundamental Theorem of Calculus

Two parts connect derivatives and integrals:

  1. Derivative of integral: If f(x) = ∫ₐˣ s(t)dt, then df/dx = s(x).
  2. Integral of derivative: If df/dx = s(x), then ∫ₐᵇ s(x)dx = f(b) - f(a).
  • The integral from start to end equals the total change.
  • Example: Area under s(t) = eᵗ from 0 to t is y(t) = eᵗ - 1 (not just eᵗ, because y(0) must equal 0).

🌊 Derivatives of Sine and Cosine

🌊 Key derivative formulas

d/dx(sin x) = cos x and d/dx(cos x) = -sin x (when x is measured in radians).

  • Radians are essential: 2π radians = 360 degrees = full circle.
  • On a unit circle (radius 1), x radians corresponds to arc length x.
  • At x = 0: sin x has slope 1 = cos 0; cos x has slope 0 = -sin 0.

📏 Proving the sine derivative

The key limit: (sin x)/x → 1 as x → 0.

Geometric argument:

  • Draw a right triangle with angle x on a unit circle.
  • Straight piece (sin x) < curved arc (x) < longer straight piece (tan x).
  • This gives cos x < (sin x)/x < 1, which "squeezes" the limit to 1.
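The squeeze can be observed numerically. A minimal sketch, with x in radians and an illustrative helper name:

```python
import math

def squeeze_row(x):
    """Return (cos x, sin x / x) for a positive angle x in radians."""
    return math.cos(x), math.sin(x) / x

# Both columns are pushed toward 1 as x shrinks, with cos x < sin x / x < 1.
for x in [1.0, 0.5, 0.1, 0.01]:
    lower, ratio = squeeze_row(x)
    print(x, lower, ratio)
```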

Using trigonometry:

  • sin(x + Δx) = sin x cos Δx + cos x sin Δx.
  • Δy = sin(x + Δx) - sin x = (sin x)(cos Δx - 1) + (cos x)(sin Δx).
  • Divide by Δx and let Δx → 0: dy/dx = (sin x)(0) + (cos x)(1) = cos x.

🔄 Important properties

  • The derivative of (sin x)² + (cos x)² equals zero (confirming this sum always equals 1).
  • Second derivative of sin x is -sin x; second derivative of cos x is -cos x.
  • Maximum of y = sin x + cos x occurs where cos x = sin x, at x = π/4.

⚙️ Product, Quotient, and Chain Rules

⚙️ Product Rule

The derivative of f(x)g(x) is f(x)(dg/dx) + g(x)(df/dx).

  • Geometric picture: Change in area = top strip + side strip.
  • Δy = f(x + Δx)Δg + g(x)Δf, where Δf and Δg are the changes in f and g over the step Δx.
  • Example: For y = x² sin x, dy/dx = x²(cos x) + (sin x)(2x).
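The two-strip picture can be checked for y = x² sin x. In this sketch the helper names are illustrative; the strip sum uses f(x + Δx) for the strip involving Δg, and it is compared both with the actual change in the product and with the product-rule slope.

```python
import math

def f(x):
    return x ** 2

def g(x):
    return math.sin(x)

def product_change(x, dx):
    """Return (actual change in f*g, strip sum f(x+dx)*dg + g(x)*df)."""
    df = f(x + dx) - f(x)
    dg = g(x + dx) - g(x)
    actual = f(x + dx) * g(x + dx) - f(x) * g(x)
    strips = f(x + dx) * dg + g(x) * df
    return actual, strips

def product_rule_slope(x):
    return x ** 2 * math.cos(x) + math.sin(x) * 2 * x

actual, strips = product_change(1.0, 0.1)
print(actual, strips)
print(product_change(1.0, 1e-6)[0] / 1e-6, product_rule_slope(1.0))
```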

➗ Quotient Rule

If y = f(x)/g(x), then dy/dx = [g(df/dx) - f(dg/dx)]/g².

  • Numerator: "bottom times derivative of top minus top times derivative of bottom."
  • Denominator: bottom squared.
  • Example: d/dx(sin x / cos x) = d/dx(tan x) = 1/cos²x = sec²x.
  • Example: d/dx(1/x⁴) = -4/x⁵, matching the power rule for negative exponents.

🔗 Chain Rule

For z = f(g(x)), the derivative is dz/dx = (dz/dy)(dy/dx) where y = g(x).

  • Think of it as "canceling dy" in the notation.
  • Must substitute back: Replace y with g(x) in the final answer.
  • Example: z = (x⁵)⁴ = x²⁰ has dz/dx = 4y³ · 5x⁴ = 20(x⁵)³x⁴ = 20x¹⁹.
  • Example: z = cos(4x) has dz/dx = -sin(4x) · 4 = -4sin(4x).

Don't confuse: The chain rule applies when one function is "inside" another, not when they're multiplied (that's the product rule).

📈 Growth Rates and Special Functions

📈 Comparing growth rates

Functions ordered from slow to fast growth as x gets large:

| Type | Examples | Growth speed |
| --- | --- | --- |
| Logarithmic and root | log x, √x | Very slow |
| Polynomial | x, x², x³ | Moderate |
| Exponential | 2ˣ, eˣ, 10ˣ | Fast |
| Factorial and faster | x!, xˣ | Very fast |
  • At x = 1000: log 1000 = 3, but 1000¹⁰⁰⁰ = 10³⁰⁰⁰.
  • Logarithms are exponents: log(10⁹) = 9.
  • For negative powers: 1/x² decays much more slowly than e⁻ˣ.

📊 Logarithmic scales

Log scale: Plot x = 1, 10, 100, 1000 equally spaced (no zero point).

  • Equal spacing represents equal ratios, not equal differences.
  • Example: x = 1, 2, 4, 8 are equally spaced on a log scale.

Log-log graph: Both axes use log scale.

  • If y = Axⁿ, then log y = log A + n log x (a straight line).
  • The slope of the line equals the exponent n.
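The claim that the log-log slope equals the exponent can be checked with two points. In this sketch the values A = 3 and n = 2.5 are arbitrary illustrative choices, and `loglog_slope` is a hypothetical helper.

```python
import math

def loglog_slope(x1, y1, x2, y2):
    """Slope of the line through two points on a log-log plot (base-10 logs)."""
    return (math.log10(y2) - math.log10(y1)) / (math.log10(x2) - math.log10(x1))

A, n = 3.0, 2.5   # illustrative values, not from the source

def power_law(x):
    return A * x ** n

slope = loglog_slope(10.0, power_law(10.0), 1000.0, power_law(1000.0))
print(slope)
```

The constant A shifts the line up or down but does not affect its slope.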

Semilog graph: Only y-axis uses log scale.

  • If y = Abˣ, then log y = log A + x log b (a straight line).
  • Useful for exponential growth or decay.

🔄 Inverse functions and logarithms

If y = eˣ, then the inverse function is x = ln y (natural logarithm).

  • ln y is the exponent in y = eˣ.
  • Key property: ln(yY) = ln y + ln Y (add logarithms because you add exponents).
  • ln(yⁿ) = n ln y (multiply exponent).
  • Change of base: ln y = (ln 10)(log y), where log means base-10 logarithm.

🎯 Practical Applications

🎯 Linear approximation

Near x = a, the function f(x) ≈ f(a) + (x - a)f'(a) follows its tangent line.

  • The tangent line has known height f(a) and slope f'(a).
  • Example: eˣ ≈ 1 + x near x = 0 (the linear part of the series).
  • Example: x¹⁰ ≈ 1 + 10(x - 1) near x = 1, so (1.1)¹⁰ ≈ 2.
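The tangent-line estimate for (1.1)¹⁰ can be compared with the exact power. A small sketch with an illustrative helper name:

```python
def tangent_line(f_a, slope_a, a, x):
    """Height of the tangent line at x, given f(a) and f'(a)."""
    return f_a + (x - a) * slope_a

# For f(x) = x**10 at a = 1: f(1) = 1 and f'(1) = 10.
estimate = tangent_line(1.0, 10.0, 1.0, 1.1)
exact = 1.1 ** 10
print(estimate, exact)
```

The gap between the two numbers comes from the higher-order terms that the tangent line ignores; a step of 0.1 is not especially small here.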

🔍 Newton's Method

Iterative method to solve f(x) = 0:

  • Start with guess a where f(a) and f'(a) are known.
  • Follow the tangent line to where it crosses zero: x ≈ a - f(a)/f'(a).
  • Use this new x as the next starting point and repeat.

Example: Solve x² = 1.2, that is, f(x) = x² - 1.2 = 0 with f'(x) = 2x.

  • Start at a = 1: f(1) = -0.2, f'(1) = 2, so x ≈ 1 - (-0.2)/2 = 1.1.
  • Start at a = 1.1: f(1.1) = 0.01, f'(1.1) = 2.2, giving even better x.
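The iteration for x² = 1.2 can be run for a few steps. This is a minimal sketch; `newton` is an illustrative helper that keeps every iterate so the improvement is visible.

```python
def newton(f, fprime, a, steps):
    """Repeat x = a - f(a)/f'(a) for a fixed number of steps; return all iterates."""
    iterates = [a]
    for _ in range(steps):
        a = a - f(a) / fprime(a)
        iterates.append(a)
    return iterates

def f(x):
    return x * x - 1.2

def fprime(x):
    return 2 * x

iterates = newton(f, fprime, 1.0, 3)
print(iterates)
```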

🌱 Differential equations of growth

Exponential growth: dy/dt = cy has solution y(t) = y(0)eᶜᵗ.

With constant source: dy/dt = cy + s has solution y(t) = -s/c + Aeᶜᵗ.

  • Constant solution y = -s/c when spending s balances income cy.
  • Choose A so y(0) matches the starting value.

Logistic equation: dP/dt = cP - sP² models population with competition.

  • Transform to linear equation using y = 1/P.
  • Population approaches limit c/s as t → ∞ (S-curve growth).
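The approach to the limit c/s can be seen by stepping the logistic equation forward in time. The sketch below uses simple Euler steps, which is a swapped-in numerical technique and not the y = 1/P substitution described above; the rates c = 1 and s = 0.1 and the step size are illustrative assumptions.

```python
def logistic_euler(p0, c, s, dt, steps):
    """Step dP/dt = c*P - s*P**2 forward with the simple Euler method."""
    p = p0
    for _ in range(steps):
        p += dt * (c * p - s * p * p)
    return p

c, s = 1.0, 0.1   # illustrative rates, so the predicted limit is c/s = 10
final_population = logistic_euler(p0=1.0, c=c, s=s, dt=0.01, steps=3000)
print(final_population, c / s)
```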

This section provides foundational material for self-study alongside video lectures, with emphasis on understanding concepts through examples and practice.

5

Graphs and Graphing Calculators

0.5 Graphs and Graphing Calculators

🧭 Overview

🧠 One-sentence thesis

Graphs reveal the essential behavior of functions—growth, bending, maxima, minima—through visual inspection of signs (positive/negative/zero) in the function itself, its slope, and its second derivative, whether or not a formula is available.

📌 Key points (3–5)

  • What graphs reveal without formulas: from a rough graph of y(x), you can extract information about slope (dy/dx), bending (d²y/dx²), and area, especially by identifying where these are positive, negative, or zero.
  • Special situations at zero: when dy/dx = 0 (local max/min), when d²y/dx² = 0 (inflection point), or when y(x) = 0 (crossing the axis), the function exhibits critical behavior.
  • Common confusion (which graph are you looking at?): whether the given graph shows y(x), dy/dx, or d²y/dx² completely changes how maxima, minima, and inflection points are read.
  • Chain rule discovery via calculator: graphing sin(kx) and its numerical derivative reveals that the derivative is proportional to k·cos(kx), leading to the general chain rule pattern.
  • Why computing matters: computer-based graphics let you see functions, adjust viewing windows, zoom in/out, and discover mathematical ideas experimentally rather than passively.

📊 Reading graphs without formulas

📊 What a graph of y(x) tells you directly

  • The excerpt emphasizes extracting "basic information about the growth rate (the slope) and the minimum/maximum and the bending (and area too)."
  • Growth rate (slope): where y is increasing, dy/dx > 0; where y is decreasing, dy/dx < 0.
  • Maxima and minima: occur where dy/dx = 0 (the slope changes sign).
  • Bending (concavity): controlled by d²y/dx² (the slope of the slope); where d²y/dx² changes sign, there is an inflection point.
  • Example: a smooth hill on the graph of y(x) has a local maximum at the peak (dy/dx = 0) and an inflection point where the curve changes from bending upward to bending downward.

🔄 When the graph is dy/dx instead of y(x)

"Suppose this is the graph of dy/dx, the derivative of y(x). Answer the following questions about y(x), the original function."

  • Local minimum of y(x): occurs where dy/dx crosses from negative to positive (dy/dx = 0 at that point).
  • Local maximum of y(x): occurs where dy/dx crosses from positive to negative.
  • Inflection point of y(x): occurs where dy/dx has a local max or min (i.e., where d²y/dx² = 0).
  • Don't confuse: if you see a peak in the graph and it's labeled dy/dx, that peak is not a maximum of y; it's where the slope of y is largest (an inflection point of y).

🔄 When the graph is d²y/dx² (second derivative)

"Suppose this is the graph of the second derivative d²y/dx² (slope of the slope)."

  • Inflection point of y(x): occurs where d²y/dx² = 0 (the graph crosses the x-axis).
  • Local min/max of y(x): cannot be determined from d²y/dx² alone; you need information about dy/dx.
  • The excerpt notes: "If any of these questions can't be answered, explain why."
  • Example: if d²y/dx² is always positive, y is concave up everywhere, but you cannot tell where y has a minimum without knowing where dy/dx = 0.

🔗 Discovering the chain rule with a calculator

🔗 Experimenting with sin(kx)

The excerpt walks through graphing Y₁ = sin(X) and its numerical derivative Y₂ = nDeriv(Y₁, X, X):

  1. Y₁ = sin(X): Y₂ appears to be cos(X).
  2. Y₁ = sin(2X): Y₂ appears to be 2·cos(2X).
  3. Y₁ = sin(3X): Y₂ appears to be 3·cos(3X).
  • Conjecture: "If k is some constant, then the derivative of sin(kx) is [k·cos(kx)]."
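The calculator experiment can be imitated without a calculator. In this sketch `n_deriv` is a symmetric difference quotient that is only similar in spirit to a calculator's numerical derivative; the step `h` and the sample point are assumed values.

```python
import math

def n_deriv(f, x, h=1e-3):
    """Symmetric difference quotient with an assumed step h."""
    return (f(x + h) - f(x - h)) / (2 * h)

def check_conjecture(k, x):
    """Return (numerical slope of sin(kx), conjectured slope k*cos(kx))."""
    numeric = n_deriv(lambda t: math.sin(k * t), x)
    conjectured = k * math.cos(k * x)
    return numeric, conjectured

for k in [1, 2, 3]:
    print(k, check_conjecture(k, 0.7))
```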

🧩 Chains (compositions) and the inner/outer pattern

"Those functions are chains (also called compositions). They can be written in the form Y = f(g(x))."

  • For sin(kx), the outer function is f(x) = sin(x) and the inner function is g(x) = kx.
  • The excerpt then generalizes: "If y = sin(g(x)), then dy/dx = [cos(g(x))·g'(x)]."
  • This is the chain rule: differentiate the outer function, leave the inner function alone, then multiply by the derivative of the inner function.

🧪 Testing non-linear inner functions

  • The excerpt suggests Y = sin(√x), so g(x) = √x.
  • Conjecture: dy/dx = cos(√x)·(derivative of √x).
  • "Check your conjecture by graphing Y and comparing to the graph of the numerical derivative."

📝 General chain rule statement

"Whenever we have a composition of an outer and an inner function, the chain rule applies."

Examples to predict and check:

  1. y = (2x + 4)³ → dy/dx = 3(2x + 4)²·2
  2. y = cos²(x) = (cos x)² → dy/dx = 2·cos(x)·(−sin x)
  3. y = cos(x²) → dy/dx = −sin(x²)·2x
  4. y = [sin(x² + 1)]³ → dy/dx = 3[sin(x² + 1)]²·cos(x² + 1)·2x
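The four predictions can be checked against numerical slopes, in the spirit of "predict and check". This is a sketch; the helper `slope`, its step size, and the sample points are assumptions.

```python
import math

def slope(f, x, h=1e-6):
    """Symmetric difference quotient with an assumed step h."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Each pair is (function, slope predicted by the chain rule).
examples = [
    (lambda x: (2 * x + 4) ** 3, lambda x: 3 * (2 * x + 4) ** 2 * 2),
    (lambda x: math.cos(x) ** 2, lambda x: 2 * math.cos(x) * (-math.sin(x))),
    (lambda x: math.cos(x ** 2), lambda x: -math.sin(x ** 2) * 2 * x),
    (lambda x: math.sin(x ** 2 + 1) ** 3,
     lambda x: 3 * math.sin(x ** 2 + 1) ** 2 * math.cos(x ** 2 + 1) * 2 * x),
]

def max_mismatch(x):
    """Largest gap between numerical and predicted slope over all four examples."""
    return max(abs(slope(y, x) - dydx(x)) for y, dydx in examples)

print(max_mismatch(0.8), max_mismatch(1.3))
```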

Don't confuse: cos²(x) means (cos x)² (outer function is squaring), while cos(x²) means cos applied to x² (outer function is cosine).

💻 Computer-based graphics and learning

💻 Why graphics matter

"For calculus, the greatest advantage of the computer is to offer graphics. You see the function, not just the formula."

  • The excerpt emphasizes: "The power to see this subject is enormous, because it is adjustable."
  • You can change the viewing window, zoom in/out, and experiment with the domain and range.
  • "The computer offers the experience of actually working with a function."

🔍 Example 1: Do x³ and 3ˣ meet again?

  • At x = 3, x³ = 3ˣ = 27.
  • At x = 2, 2³ = 8 < 3² = 9; at x = 4, 4³ = 64 < 3⁴ = 81.
  • "If x³ is always less than 3ˣ we ought to know—these are among the basic functions of mathematics."
  • The computer can solve x³ = 3ˣ numerically or plot both functions.
  • Key insight: "If the graphs cross once, they must cross again—because 3ˣ is higher at 2 and 4." A crossing point near 2.5 is found by zooming in.
  • The excerpt notes: "I am less interested in the exact number than its position—it comes before x = 3 rather than after."
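Zooming in on the crossing can be imitated with bisection. This is a sketch under assumptions: the bracket [2, 2.9] is chosen because x³ − 3ˣ changes sign there, and the helper names are illustrative.

```python
def gap(x):
    return x ** 3 - 3 ** x

def bisect(f, lo, hi, tol=1e-10):
    """Halve [lo, hi] until it is shorter than tol; f must change sign on it."""
    assert f(lo) * f(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

crossing = bisect(gap, 2.0, 2.9)
print(crossing)
```

The value printed lies before x = 3, which is the point the excerpt cares about.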

🧠 Learning by doing

The excerpt draws three conclusions:

  1. "A supercomputer is not necessary."
  2. "High-level programming is not necessary."
  3. "We can do mathematics without completely understanding it."

Rewritten: "We can learn mathematics while doing it."

  • "The hardest part of teaching calculus is to turn it from a spectator sport into a workout. The computer makes that possible."

🔢 Example 3: Finding the special b

"Find the number b for which xᵇ = bˣ has only one solution (at x = b)."

  • When b = 2, the second solution is 4 (above 2); when b = 3, the second solution is below 3.
  • "If we move b from 2 to 3, there must be a special 'double point'—where the graphs barely touch but don't cross."
  • At this special b, the curves are tangent (same slope).
  • "This special point b can be found with computer-based graphics. In many ways it is the 'center point of calculus.'"
  • The excerpt notes: "This number can be discovered first by experiment."

📐 Example 4: Graphing e^x − x^e

"Graph y(x) = eˣ − xᵉ. Locate its minimum. Zoom in near x = e."

  • From the derivatives of eˣ and xᵉ, show that dy/dx = 0 at x = e.
  • "Can you see why d²y/dx² > 0 at x = e?"
  • This confirms a minimum (concave up at the critical point).

🔎 Zooming and viewing windows

🔎 The power of zoom

"The use of the zoom is the best part of graphing. Not only do we choose the domain and range, we change them."

  • The viewing window is controlled by four numbers: limits A ≤ x ≤ B and C ≤ y ≤ D, or opposite corners (A, C) and (B, D), or center (a, b) and scale factors c and d.
  • "Clicking on opposite corners of the zoom box is the fastest way."

📉 Example 5: Solving x⁴ − 11x³ + 5x − 2 = 0

  • "The first tool is algebra—try to factor the polynomial. That succeeds for quadratics, and then gets extremely hard."
  • Two good choices:
    1. (Mathematics) Use the derivative. Solve by Newton's method.
    2. (Graphics) Plot the function and zoom in.
  • "This particular function is zero only once, in the standard window from −10 to 10."
  • "The graph seems to be leaving zero, but mathematics again predicts a second crossing point. So we zoom out before we zoom in."
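Both choices point to a second root outside the standard window, and Newton's method can locate it. In this sketch the starting guesses −1 and 12 and the fixed step count are assumptions picked for illustration.

```python
def p(x):
    return x**4 - 11 * x**3 + 5 * x - 2

def dp(x):
    return 4 * x**3 - 33 * x**2 + 5

def newton_root(x, steps=20):
    """Run a fixed number of Newton steps x = x - p(x)/p'(x)."""
    for _ in range(steps):
        x = x - p(x) / dp(x)
    return x

inside = newton_root(-1.0)    # start inside the standard window
outside = newton_root(12.0)   # start beyond x = 10, after zooming out
print(inside, outside)
```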

🌀 Example 6: Oscillating functions

"Zoom out and in on the graphs of y = cos(40x) and y = x·sin(1/x). Describe what you see."

  • These functions oscillate rapidly or have unusual behavior near certain points.
  • Zooming reveals structure that is invisible at default scales.

⚠️ Example 7: Limits and machine precision

"What does y = (tan x − sin x)/x³ approach at x = 0?"

  • "For small x the machine eventually can't separate tan x from sin x. It may give y = 0."
  • "Can you get close enough to see the limit of y as x → 0?"
  • Don't confuse numerical error with mathematical truth: the computer has finite precision.
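The breakdown can be reproduced in ordinary double-precision arithmetic. The sketch below assumes the known limit 1/2, which follows from the series tan x − sin x = x³/2 + ... and is not stated in the excerpt; the sample values of x are illustrative.

```python
import math

def ratio(x):
    return (math.tan(x) - math.sin(x)) / x ** 3

# Moderately small x gives values near 0.5; for very small x the subtraction
# loses all significant digits and the result is no longer near 0.5.
for x in [1e-1, 1e-2, 1e-3, 1e-5, 1e-7, 1e-9]:
    print(x, ratio(x))
```

The useful range is in the middle: small enough to be near the limit, large enough that tan x and sin x are still distinguishable by the machine.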

🧮 Symbolic computation and ideas

🧮 What symbolic computation does

"In symbolic computation, answers can be formulas as well as numbers and graphs."

  • The derivative of y = x² is seen as "2x"; the derivative of sin(t) is "cos(t)."
  • "The computer does more than substitute numbers into formulas—it operates directly on the formulas."

💡 Mathematics is ideas, not just formulas

"Mathematics is not formulas or computations or even proofs, but ideas."

  • "The symbols and pictures are the language."
  • "The book and the professor and the computer can join in teaching it."
  • "Your part is to learn by doing."

🧩 Example 8: Factorials and counting

A computer algebra system quickly finds 100! (158 digits, last 24 are zeros).

  • Question 1: How many digits (approximately) are in N!?
    • The excerpt notes: "It will never show more than N² digits, because none of the N terms can have more than N digits."
    • "A much tighter bound would be 2N, but is it true? Does N! always have fewer than 2N digits?"
  • Question 2: How many zeros (exactly) are at the end of N!?
    • For 10! = 3,628,800, there are two zeros: one from 10, the other from 5×2.
    • "Can you explain the 24 zeros in 100!? An idea from the card game blackjack applies here too: Count the fives."
    • Hard question: How many zeros at the end of 200!?
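"Count the fives" translates into a short routine. This is a sketch with illustrative function names; it counts multiples of 5, 25, 125, ... up to N and uses Python's exact integers to cross-check against the digits of N!.

```python
import math

def trailing_zeros(n):
    """Count factors of 5 in n! (each pairs with a factor of 2 to give a trailing zero)."""
    count = 0
    power = 5
    while power <= n:
        count += n // power   # multiples of 5, then 25, then 125, ...
        power *= 5
    return count

def digit_count(n):
    """Number of decimal digits in n!."""
    return len(str(math.factorial(n)))

print(trailing_zeros(10), trailing_zeros(100), digit_count(100))
print(trailing_zeros(200))
```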

✍️ Writing in calculus

"May I emphasize the importance of writing? We totally miss it, when the answer is just a number."

  • "A one-page report is harder on instructors as well as students—but much more valuable."
  • "You can't write sentences without being forced to organize ideas—and part of yourself goes into it."
  • Proposed exercise: "Follow through on Examples 1–4 above and report. Without a computer, pick a paragraph from this book that should be clearer and make it clearer."
  • "Ideas are like surfaces—they can be seen many ways. Mathematics can be learned by talking."
6

Velocity and Distance

1.1 Velocity and Distance

🧭 Overview

🧠 One-sentence thesis

Calculus connects velocity and distance through two fundamental operations—differentiation (finding velocity from distance by computing slope) and integration (finding distance from velocity by computing area under the graph)—which are inverse processes that reveal how these two quantities depend on each other.

📌 Key points (3–5)

  • The central relationship: velocity v and distance f are paired functions where each can be recovered from the other through calculus operations.
  • Differentiation (slope): the velocity at any time is the slope of the distance graph; this process goes from f to v.
  • Integration (area): the distance traveled is the area under the velocity graph; this process goes from v to f.
  • Common confusion: negative velocity means backward motion (negative slope on the f-graph), and area below the axis in the v-graph counts as negative distance.
  • Functions as the language: v(t) and f(t) are functions that map each input time t to output values (velocity or distance), with domain (all valid times) and range (all possible outputs).

🚗 The fundamental velocity-distance relationship

🚗 The speedometer and odometer analogy

  • A car's dashboard shows two instruments measuring related but different quantities:
    • Speedometer: measures velocity v (miles per hour or km/hr)—a rate involving time.
    • Odometer: measures total distance f (miles or km)—no time unit.
  • The core question: if you have a complete record of one, can you recover the other?
  • Example: if the speedometer record is complete but the odometer is broken, the distance information is still recoverable through calculus (without re-driving the car).

🔄 Two directions of calculus

The entire subject of calculus is built on converting between v and f:

| Direction | Name | What it does | Branch |
|---|---|---|---|
| f → v | Differentiation | Finds velocity from distance record | Differential calculus |
| v → f | Integration | Finds distance from velocity history | Integral calculus |

📏 Constant velocity: the simplest case

📏 Linear distance from constant speed

When velocity is fixed at v = 60 (mph), distance increases at a constant rate:

  • After 2 hours: f = 120 miles
  • After 4 hours: f = 240 miles
  • General formula: f = vt (distance equals velocity times time)
  • The graph of f is a straight line—this is called "linear" growth.

Don't confuse: this example assumes the car starts at full velocity instantly (a "step function") and begins from f(0) = 0.

📐 Slope equals velocity

For constant velocity, the relationship is purely algebraic (no calculus needed yet):

Slope = (change in distance) / (change in time) = vt / t = v

  • Geometrically: the velocity is the slope of the distance graph.
  • Example: from f₁ = 120 at t₁ = 2 to f₂ = 240 at t₂ = 4, the slope is (240 − 120)/(4 − 2) = 120/2 = 60 mph, the same as the velocity at both times.

📐 Adjusting the starting point

  • If distance starts at 20 instead of 0: formula becomes f = 20 + 60t.
  • The constant 20 cancels when computing change in distance, so slope is still 60.
  • Negative velocity (v = -30) produces a downward-sloping graph (f = -30t), representing backward motion.

🔲 Area under the velocity graph

🔲 Integration as area computation

The opposite of slope is area—this is the key geometric insight:

The distance f is the area under the v-graph.

  • When v is constant, the region under the graph is a rectangle.
  • Rectangle area = height (v) × width (t) = vt = distance traveled.
  • This is integration: going from v to f by computing area.

🔲 The two central facts

1A: The slope of the f-graph gives the velocity v. The area under the v-graph gives the distance f.

These are inverse operations—differentiation and integration undo each other.

🔄 Forward and backward motion

🔄 Handling negative velocity

Example: a car goes forward at velocity +V, then backward at velocity -V (the speed is V in both directions):

  • Forward part (0 ≤ t ≤ 3): velocity = +V, distance climbs to f(3) = 3V.
  • Backward part (3 < t ≤ 6): velocity = -V, distance decreases back to f(6) = 0.
  • The car returns to its starting point.

🔄 Negative area interpretation

  • Negative velocity makes the distance graph go downward (negative slope).
  • Area below the axis in the v-graph is counted as negative.
  • Total area "under" the v-graph = 0 when forward and backward motions cancel.

Don't confuse: the velocity function is discontinuous at t = 3 (the needle jumps), and v(3) is undefined. But the distance function f is continuous—no jump in distance, just a corner where the slope changes.
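
The forward-back motion can be written out as a small sketch. The function names and the sample value V = 10 are assumptions made for illustration; `distance` adds the rectangle above the axis and subtracts the rectangle below it.

```python
def velocity(t, V=1.0):
    """Forward at +V for 0 <= t < 3, backward at -V for 3 < t <= 6."""
    if 0 <= t < 3:
        return V
    if 3 < t <= 6:
        return -V
    raise ValueError("v(t) is undefined at t = 3 and outside 0 <= t <= 6")

def distance(t, V=1.0):
    """Signed area under the v-graph from 0 to t."""
    if not 0 <= t <= 6:
        raise ValueError("t must lie in 0 <= t <= 6")
    if t <= 3:
        return V * t                 # rectangle above the axis
    return 3 * V - V * (t - 3)       # subtract the rectangle below the axis

print(distance(3, V=10))   # 30: the peak f(3) = 3V
print(distance(6, V=10))   # 0: forward and backward areas cancel
```

Note that `distance` is defined at t = 3 even though `velocity` is not, which mirrors the continuous f-graph with a corner.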

📊 Functions: inputs, outputs, and notation

📊 What a function is

A function assigns one output to each input; v(t) is the value of function v at time t.

  • Input: the time t (read "v of t" for v(t)).
  • Output: the velocity or distance at that time.
  • The function is like a memory bank with a record at each t.

Example formula for forward-back motion:

  • v(t) = +V if 0 ≤ t < 3
  • v(t) = -V if 3 < t ≤ 6
  • v(3) is undefined (discontinuous jump).

📊 Domain and range

Domain: the set of all inputs (valid times).
Range: the set of all outputs (possible distances or velocities).

  • Forward-back example: domain of f is 0 ≤ t ≤ 6; range is 0 ≤ f(t) ≤ 3V.
  • Range of v contains only two values: +V and -V.

A function is only allowed one value at each input time t.

📊 Linear functions

Every linear function has the form f(t) = vt + C:

  • Graph is a straight line.
  • v is the slope.
  • C is the constant that adjusts the starting point (vertical shift).

🔧 Transforming functions

🔧 Adding/subtracting constants

  • f(t) - 2: subtract 2 from all distances → moves graph down (affects range).
  • f(t - 2): subtract 2 from time → moves graph right (affects domain).

Example: starting from f(t) = 2t + 1:

  • f(t) - 2 = 2t - 1 (shifted down)
  • f(t - 2) = 2(t - 2) + 1 = 2t - 3 (shifted right)

🔧 Multiplying (scaling and zooming)

  • 2f(t): doubles all distances → stretches graph vertically (doubles the slope).
  • f(2t): speeds up time by factor of 2 → everything happens twice as fast, graph compressed horizontally (also doubles the slope).

Don't confuse: these four transformations—f(t) ± C, f(t ± C), kf(t), and f(kt)—each affect the graph differently. The first two are shifts; the last two are zooms/stretches.
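
A short sketch of the four transformations applied to the example f(t) = 2t + 1. The wrapper names (`shift_down`, `shift_right`, `stretch`, `speed_up`) and the helper `slope` are hypothetical labels chosen here, not notation from the text.

```python
def f(t):
    """The example function f(t) = 2t + 1."""
    return 2 * t + 1

def shift_down(t):      # f(t) - 2 = 2t - 1
    return f(t) - 2

def shift_right(t):     # f(t - 2) = 2t - 3
    return f(t - 2)

def stretch(t):         # 2 f(t) = 4t + 2, slope doubled
    return 2 * f(t)

def speed_up(t):        # f(2t) = 4t + 1, slope doubled
    return f(2 * t)

def slope(g, t=0.0, h=1.0):
    """Slope of g between t and t + h (exact for straight lines)."""
    return (g(t + h) - g(t)) / h

for g in (f, shift_down, shift_right, stretch, speed_up):
    print(g.__name__, "value at t = 0:", g(0), "slope:", slope(g))
```

The two shifts leave the slope at 2; the two scalings both double it, but only 2f(t) also doubles the starting value.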

7

Calculus Without Limits

1.2 Calculus Without Limits

🧭 Overview

🧠 One-sentence thesis

The sum of differences between consecutive numbers always equals the last number minus the first, and this algebraic fact—when applied to slopes and areas—reveals the Fundamental Theorem of Calculus without requiring limits or curved graphs.

📌 Key points (3–5)

  • Core algebraic insight: For any sequence of numbers f, the differences v add up to f_last − f_first, because middle terms cancel out.
  • Connection to slopes: The differences v represent the slopes of a piecewise linear graph of f.
  • Connection to areas: The sum of differences equals the total area under the piecewise constant v-graph.
  • Common confusion: Don't confuse the slope (velocity v, tax rate) with the accumulated quantity (distance f, total tax)—slopes tell you the rate of change at each point, while f accumulates those changes.
  • Why it matters: This algebraic version of the Fundamental Theorem works for step-by-step data before calculus extends it to smooth curves.

🔢 The fundamental difference pattern

🔢 How differences add up

Start with any list of numbers f = 0, 2, 6, 7, 4, 9. Their differences are v = 2, 4, 1, −3, 5.

When you add the differences:

  • 2 + 4 + 1 − 3 + 5 = 9
  • This equals the last f (which is 9) minus the first f (which is 0)

🧮 Why middle terms cancel

Write out the sum in full for a second list, f = 1, 5, 12, 7, 10:

  • (5 − 1) + (12 − 5) + (7 − 12) + (10 − 7)
  • The 5's cancel, the 12's cancel, the 7's cancel
  • Only 10 − 1 remains

Key principle 1B: The differences of the f's add up to (f_last − f_first).

Example: For f = 1, 5, 12, 7, 10, the differences are 4, 7, −5, 3, which sum to 9 = 10 − 1.
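
The cancellation can be checked mechanically for both lists used above. This is a minimal sketch; the function name `differences` is an assumption.

```python
def differences(f):
    """Return v_j = f_j - f_(j-1) for j = 1, ..., len(f) - 1 (illustrative helper)."""
    return [b - a for a, b in zip(f, f[1:])]

for f in ([0, 2, 6, 7, 4, 9], [1, 5, 12, 7, 10]):
    v = differences(f)
    # Telescoping: the sum of the differences is f_last - f_first.
    print(v, sum(v), f[-1] - f[0])
```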

📊 Connecting to slopes and graphs

📈 Piecewise linear f-graph

When you plot the f numbers and connect them with straight lines, you get a piecewise linear graph.

  • The slope of each piece equals the corresponding difference v
  • Slope = (change in f) / (change in t) = v / 1 = v
  • At breakpoints the slope changes suddenly

📉 Piecewise constant v-graph

The v numbers are plotted as horizontal segments (constant over each interval).

  • This resembles velocity graphs where speed is constant between time steps
  • The v-graph looks like a staircase

Don't confuse: The f-graph connects points with slanted lines; the v-graph uses horizontal steps.

🎯 The area-slope connection

🟦 Area under the v-graph

The total area under the piecewise constant v-graph equals f_last − f_first.

  • Each rectangle has base 1 and height v
  • Total area = sum of all v's
  • This works even if you stop partway through (e.g., at t = 3.5, you count half the last rectangle)

🎓 Fundamental Theorem preview

Key principle 1C: The v's are slopes of f(t). The area under the v-graph from t_start to t_end equals f(t_end) − f(t_start).

This is the Fundamental Theorem of Calculus, proven here using only algebra for piecewise linear functions. Chapter 5 will extend it to smooth curves using limits.
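
Going the other way, running sums of the v's rebuild the f's, and a partial rectangle handles a stopping time between breakpoints. The sketch below assumes unit-width steps starting at t = 0 and 0 ≤ t ≤ len(v); the names `rebuild_f` and `area_up_to` are hypothetical.

```python
from itertools import accumulate

def rebuild_f(v, f_start=0):
    """Running sums of v, starting from f_start, give back the f values."""
    return list(accumulate(v, initial=f_start))

def area_up_to(v, t):
    """Area under the piecewise constant v-graph from 0 to t (unit-width steps)."""
    whole = int(t)                       # number of complete rectangles
    area = sum(v[:whole])
    if whole < len(v):
        area += (t - whole) * v[whole]   # fraction of the next rectangle
    return area

v = [2, 4, 1, -3, 5]
print(rebuild_f(v))          # [0, 2, 6, 7, 4, 9]
print(area_up_to(v, 3.5))    # 5.5, halfway between f(3) = 7 and f(4) = 4
```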

🔬 Pattern examples

🔢 Linear growth (constant velocity)

f = 2, 3, 4, 5, 6, 7
v = 1, 1, 1, 1, 1

  • Constant differences → straight line
  • Sum of v's = 5, which equals 7 − 2
  • Formula: f = t + 2

🟪 Quadratic growth (odd number differences)

f = 0, 1, 4, 9, 16 (perfect squares)
v = 1, 3, 5, 7 (odd numbers)

  • The j-th square is f_j = j²
  • The j-th difference is v_j = 2j − 1
  • Beautiful fact: first j odd numbers sum to j²
  • Example: 1 + 3 + 5 + 7 = 16 = 4²

📈 Exponential growth (powers of 2)

f = 1, 2, 4, 8, 16
v = 1, 2, 4, 8

  • Both sequences are powers of 2
  • f_j = 2^j and v_j = 2^(j−1)
  • The smooth version f(t) = 2^t has slope proportional to itself: v(t) = c·2^t

Don't confuse: 2j (linear), j² (quadratic), and 2^j (exponential) grow at very different rates.

🌊 Oscillating motion (discrete sine/cosine)

f = 0, 1, 1, 0, −1, −1, 0
v = 1, 0, −1, −1, 0, 1

  • Both sequences repeat with period 6
  • The f-graph resembles a sine wave (piecewise linear)
  • The v-graph resembles a cosine wave (piecewise constant)
  • Sum of v's = 0, confirming f_last − f_first = 0 − 0

💰 Real-world application: income tax

💵 Tax brackets and rates

1991 single filer tax structure:

  • Bracket 1 (0 to $20,350): 15% rate
  • Bracket 2 ($20,350 to $49,300): 28% rate
  • Bracket 3 (above $49,300): 31% rate

The tax function f(x) is piecewise linear with slopes equal to the tax rates.

📐 Point-slope equation

For income x in the top bracket:

  • f(x) = $11,158.50 + 0.31(x − $49,300)
  • Starting tax (at $49,300) + rate × extra income
  • The number $11,158.50 is the tax at the end of the middle bracket

⚠️ Marginal vs average rates

Marginal rate (v): the slope at point x; tax on each additional dollar
Average rate: total tax f(x) divided by total income x

Don't confuse: A 31% marginal rate does not mean that all income is taxed at 31%; only the income in that bracket is taxed at that rate. Because the rates rise from bracket to bracket, the average rate is always lower than the top marginal rate.

Example: Tax is like accumulated distance (f), tax rate is like velocity (v) at each point.
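
The bracket structure translates into a short piecewise linear function. This is a sketch built only from the three brackets listed above; the names `BRACKETS`, `tax`, `marginal_rate`, and `average_rate`, and the sample income of $60,000, are assumptions for illustration.

```python
# 1991 single-filer brackets as listed above: (lower limit, marginal rate).
BRACKETS = [(0.0, 0.15), (20350.0, 0.28), (49300.0, 0.31)]

def tax(x):
    """Total tax f(x): add up rate * (income falling inside each bracket)."""
    total = 0.0
    for i, (lower, rate) in enumerate(BRACKETS):
        upper = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if x > lower:
            total += rate * (min(x, upper) - lower)
    return total

def marginal_rate(x):
    """Slope of f at income x: the rate of the bracket containing x."""
    rate = BRACKETS[0][1]
    for lower, r in BRACKETS:
        if x > lower:
            rate = r
    return rate

def average_rate(x):
    """Total tax divided by total income."""
    return tax(x) / x

print(round(tax(49300), 2))            # 11158.5, the constant in the point-slope form
print(marginal_rate(60000))            # 0.31
print(round(average_rate(60000), 4))   # below 0.31
```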

🔧 Notation and subscripts

🔤 Subscript system

  • Numbers in sequence: f₀, f₁, f₂, ..., f_j
  • First difference: v₁ = f₁ − f₀
  • General difference: v_j = f_j − f_(j−1)
  • Differences start at j = 1 (there is no v₀)

📝 Function notation

Instead of discrete numbers, use f(t) for any time t:

  • Domain: all possible input values (times)
  • Range: all possible output values (distances)
  • Example: v(t) = V for t ≤ T, then v(t) = 0 for t > T

⚡ Special cases

🚀 Burst of speed

A car travels at speed V until distance 1, then stops:

  • v(t) = V up to t = T, then v(t) = 0
  • f(t) = Vt up to t = T, then f(t) = 1
  • Area under v-graph = V × T = 1, so T = 1/V

📶 Step function (extreme case)

When V → infinity, the car jumps instantly to distance 1:

  • The f-graph becomes a vertical step (unit step function U(t))
  • The slope is infinite at the jump point
  • The v-graph becomes a "delta function" spike with area 1

Note: This is beyond ordinary calculus—it involves impulses rather than continuous velocities.

8

The Velocity at an Instant

1.3 The Velocity at an Instant

🧭 Overview

🧠 One-sentence thesis

Calculus solves the problem of finding instantaneous velocity from a distance curve by taking the limit of average velocities over shrinking time intervals, and conversely finds distance from velocity by computing area under the velocity curve.

📌 Key points (3–5)

  • The two central problems: (1) finding distance when velocity changes, and (2) finding velocity (slope) when the distance graph is not a straight line.
  • Average vs instantaneous velocity: average velocity is the slope of a straight line between two points on the curve; instantaneous velocity is the limit as those points come together.
  • The limiting process: as the time interval h approaches zero, the average velocity (2t + h) approaches the instantaneous velocity (2t).
  • Common confusion: instantaneous velocity requires only local information near a point, but finding distance requires the entire velocity history up to that time.
  • The Fundamental Theorem connection: if the slope of f(t) produces v(t), then the area under v(t) produces f(t) back again.

🚗 The core problem setup

🚗 Two opposite questions

Calculus addresses two inverse problems:

  • Question 1: If velocity v(t) is changing, how do you compute the distance traveled?
  • Question 2: If the distance graph f(t) is not a straight line, what is its slope (velocity)?

The excerpt emphasizes these go in opposite directions and require different approaches.

📈 The example pair

The section uses a specific pair throughout:

  • Distance: f(t) = t²
  • Velocity: v(t) = 2t

When velocity is v(t) = 2t, a physicist says acceleration is constant (equals 2). The speedometer reading goes steadily up.

Example: After 10 seconds the speed is 20 feet per second; after 44 seconds it's 88 feet/second (which equals 60 miles/hour).

📊 Average velocity: the stepping stone

📊 What average velocity means geometrically

Average velocity = (change in f) / (change in t)

  • It is the slope of a straight line connecting two points on the distance curve.
  • When computing an average, we pretend velocity is constant—returning to the easiest case.
  • Example: From t = 10 to t = 11, distance goes from 100 to 121, so average velocity = (121 - 100) / 1 = 21 feet/second.

⚖️ The "Mean Value Theorem" intuition

The excerpt gives a legal analogy:

  • If you enter a highway at 1:00 and exit 150 miles away at 3:00, your average speed is 75 mph.
  • Even if police never clocked you at exactly 75, "they would have a definite feeling that you must have been doing 75 sometime."

Don't confuse: This is about average over an interval, not the reading at any single instant.

🔬 Finding instantaneous velocity

🔬 The shrinking interval method

To find velocity at exactly t = 10:

  1. Compute average over 1 second: 21 ft/s
  2. Compute average over 0.5 seconds: 20.5 ft/s
  3. Keep reducing the time interval h

The general formula for average between t = 10 and t = 10 + h:

  • v_average = [(10 + h)² - 10²] / h = (100 + 20h + h² - 100) / h = 20 + h

As h shrinks toward zero, the average approaches 20.

Conclusion: The velocity at t = 10 is v = 20.

🧮 The general algebraic computation

For any time t (not just t = 10):

  • v_ave = [f(t + h) - f(t)] / h
  • For f(t) = t²: v_ave = [(t + h)² - t²] / h = (t² + 2th + h² - t²) / h = 2t + h

Key result: As h approaches zero, the average velocity 2t + h approaches v(t) = 2t.

The excerpt notes: "This is the key computation of calculus." It requires algebra (to handle the variable t) and then a limit (as h → 0).
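
The shrinking-interval computation is easy to reproduce numerically. The sketch below is illustrative (the name `average_velocity` and the list of h values are assumptions); it evaluates the average velocity of f(t) = t² at t = 10 for smaller and smaller h.

```python
def f(t):
    """Distance f(t) = t**2."""
    return t**2

def average_velocity(t, h):
    """Slope of the line from (t, f(t)) to (t + h, f(t + h))."""
    return (f(t + h) - f(t)) / h

# At t = 10 the average over an interval of length h is 20 + h.
for h in [1, 0.5, 0.1, 0.01, 0.001]:
    print(f"h = {h:<6} average = {average_velocity(10, h):.6f}")
```

The printed averages follow 20 + h and close in on 20, matching the algebra.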

🎯 Why the limit is easy here

The excerpt reassures: "The general theory of limits is not particularly simple... but here we don't need it."

  • In this example, the limiting value is easy to identify.
  • The average 2t + h clearly approaches 2t as h → 0.

🔄 The opposite direction: velocity to distance

🔄 Integration as the inverse problem

If v(t) = 2t increases linearly with time, what is the distance?

  • This goes in the opposite direction (it is integration).
  • The Fundamental Theorem of Calculus says: if the slope of f(t) leads to v(t), then the area under the v-graph leads back to f(t).

📐 Computing area under the velocity curve

For v(t) = 2t, the graph forms a triangle:

  • Base = t
  • Height = 2t
  • Area = ½ × base × height = ½ × t × 2t = t²

This matches f(t) = t², confirming the relationship.
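
The triangle formula can also be approached by brute force: slice the region into thin rectangles and add them up. This is a rough sketch under stated assumptions (left-endpoint rectangles, the hypothetical name `area_under_v`, and the sample time t = 3), not the method used in the text.

```python
def v(t):
    """Velocity v(t) = 2t."""
    return 2 * t

def area_under_v(t, n):
    """Approximate the area from 0 to t with n thin rectangles (left endpoints)."""
    width = t / n
    return sum(v(k * width) * width for k in range(n))

# The rectangles miss a thin sliver of the triangle; the sums rise toward t**2.
for n in [10, 100, 1000, 10000]:
    print(n, area_under_v(3, n))   # approaches 3**2 = 9
```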

🧩 Extended examples and variations

🧩 Time delays (Example 1)

If the car doesn't start until t = 1:

  • v = 0 and f = 0 up to that time
  • After starting: v = 2(t - 1) and f = (t - 1)²
  • The time delay of 1 enters the formulas and shifts the graphs.

🧩 Changing acceleration (Example 2)

If acceleration changes from 2 to another constant a:

  • Velocity changes from v = 2t to v = at
  • Distance becomes f = ½at²

| Quantity | Formula | Note |
|---|---|---|
| Acceleration | a | Constant |
| Velocity | v = at | The velocity curve has slope a |
| Distance | f = ½at² | Notice the factor ½ |

Special case: If a equals gravitational constant g, then v = gt is the velocity of a falling body (tested by Galileo at the Leaning Tower of Pisa).

🧩 Adding initial velocity (Example 3)

For f(t) = 3t + t²:

  • The average velocity calculation includes an extra 3h in the numerator
  • This adds 3 to the velocity: v = 3 + 2t
  • If Galileo had thrown a weight instead of dropping it, the starting velocity v₀ would add v₀t to the distance.

🤔 Local vs global: a crucial distinction

🤔 Finding slope (local problem)

To find the slope of the f-graph at a particular time t, you don't have to know the whole history.

  • Point B moves toward point A on the curve.
  • The problem is local—speed is completely decided by f(t) near point A.

🤔 Finding area (global problem)

To find the area under the v-graph up to time t, you do have to know the whole history.

  • A short record of speed is not enough to recover total distance.
  • Without earlier information we can know only the increase in mileage over the recorded period, not the total.

Don't confuse: These are fundamentally different operations—differentiation is local, integration is global.

🤔 Conceptual questions from the excerpt

The excerpt includes diagnostic questions (from Steve Monk, University of Washington):

Secant line question: As point B approaches A on a curve, what happens to the slope of the line connecting them?

  • The excerpt notes 57% of students answered correctly.
  • This tests understanding of how average slopes approach instantaneous slope.

Two-car comparison: Given velocity graphs for cars C and D:

  • Which is going faster at a specific time? (Look at height of velocity graph)
  • Which has greater acceleration? (Look at slope of velocity graph)
  • If they start together, which is ahead at the end? (Requires imagining the area/distance)

The excerpt warns: "More than half the class got it wrong" on the distance question—you must look at the speed graph and imagine the distance graph.

9

Circular Motion

1.4 Circular Motion

🧭 Overview

🧠 One-sentence thesis

Circular motion provides a geometric way to understand sine and cosine functions, revealing that the slope of the sine curve is the cosine curve and the slope of the cosine curve is the negative sine curve.

📌 Key points (3–5)

  • Circular motion setup: A ball travels counterclockwise on a unit circle with constant speed 1, where angle equals time t (measured in radians).
  • Position and velocity relationship: The ball's coordinates are x = cos t and y = sin t; its velocity is tangent to the circle with upward component cos t and horizontal component −sin t.
  • Harmonic oscillation: The "shadow" of the ball moving up and down creates simple harmonic motion with distance f = sin t and velocity v = cos t.
  • Key derivative pairs: The slope of sin t is cos t; the slope of cos t is −sin t.
  • Common confusion: Speed vs. velocity—the ball has constant speed 1, but the shadow (oscillating mass) has changing velocity that slows to zero at the top and bottom.

🔄 Setting up circular motion

🎯 The basic setup

A ball travels on a circle of radius 1, centered at the origin, where the angle equals the time t.

  • The ball moves counterclockwise starting from the point (1, 0) on the x-axis.
  • At any time t, the ball is at position x = cos t, y = sin t.
  • The angle is measured in radians, not degrees.
  • A full circle is completed at t = 2π (not t = 360).

📏 Radians vs. degrees

| Measure | Full circle | Conversion |
|---|---|---|
| Radians | 2π | 1 radian ≈ 57.3 degrees |
| Degrees | 360° | 1 degree ≈ 0.01745 radians |

  • Radians are preferred because they make the speed exactly 1.
  • The circumference of a unit circle is 2πr = 2π(1) = 2π.
  • Traveling distance 2π in time 2π gives speed = 1.

⚡ Speed is constant

  • The ball travels at constant speed 1 around the circle.
  • Distance = 2π, time = 2π, so speed = 1.
  • Don't confuse: speed is constant, but velocity (which includes direction) is always changing.

🎯 Finding velocity from geometry

🔺 The velocity triangle

  • When you "let go" of the ball (like releasing a ball on a string), it flies off tangent to the circle.
  • The velocity direction is perpendicular to the radius from the center.
  • The velocity triangle has the same shape as the position triangle, but rotated 90°.

⬆️ Upward velocity component

The upward component of velocity is cos t, when the upward component of position is sin t.

  • This comes from the geometry of the velocity triangle.
  • The angle t appears in the velocity triangle as the angle with the vertical.
  • Example: At t = 0, the ball is at (1, 0) moving straight up with velocity cos 0 = 1.
  • Example: At t = π/2, the ball is at (0, 1) at the top with upward velocity cos(π/2) = 0 (momentarily stopped vertically).

↔️ Horizontal velocity component

  • The horizontal velocity is −sin t (negative because initially moving left).
  • The velocity satisfies (cos t)² + (−sin t)² = 1, matching the speed of 1.

🔃 Simple harmonic motion

🎭 The shadow concept

  • Instead of watching the ball go around, watch its shadow moving up and down on the y-axis.
  • The shadow's height matches the ball's height: f(t) = sin t.
  • This creates simple harmonic motion—smooth oscillation between +1 and −1.

🌊 Oscillation characteristics

  • The mass oscillates between y = 1 and y = −1.
  • Unlike a "bang-bang" bouncing motion with constant velocity ±1, this motion slows down to zero at the extremes.
  • The mass speeds up in the middle and slows at the top and bottom (like a spring).
  • Time for one complete cycle: 2π.

📊 Distance and velocity graphs

  • Distance: f(t) = sin t (the height of the shadow).
  • Velocity: v(t) = cos t (the upward velocity of the ball, which equals the shadow's velocity).
  • The velocity and distance curves are "out of phase"—shifted by π/2.

📐 Slopes of sine and cosine

📈 Slope of the sine curve

When the distance is f(t) = sin t, the velocity is v(t) = cos t.

  • The slope of the sine curve at any time t equals cos t.
  • At t = 0: sin 0 = 0 and slope = cos 0 = 1 (steepest upward).
  • At t = π/2: sin(π/2) = 1 (maximum) and slope = cos(π/2) = 0 (flat).
  • At t = π: sin π = 0 and slope = cos π = −1 (steepest downward).
  • Key insight: At maximum or minimum, the slope is zero (curve levels off).

📉 Slope of the cosine curve

When the distance is f(t) = cos t, the velocity is v(t) = −sin t.

  • This is the "twin pair" of the first relationship.
  • Think of starting the time clock at the top of the circle (t = 0 at angle π/2 in the original setup).
  • Or watch the ball's horizontal motion: distance across is cos t, velocity across is −sin t.

⚡ Speeded-up motion

  • If the ball travels twice as fast, reaching angle 2t at time t:
    • Position: x = cos 2t, y = sin 2t.
    • Speed is now 2 (completes circle in time π).
    • Upward velocity: 2 cos 2t (not just cos 2t).
    • The velocity triangle is twice as big.
  • General pattern: if f = sin(kt), then v = k cos(kt).
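
The factor k in the speeded-up motion can be checked numerically. The sketch below estimates the slope of sin(kt) with a central difference and compares it with k cos(kt); the helper names, the step size h, and the sample time t = 0.7 are assumptions chosen for illustration.

```python
import math

def slope(f, t, h=1e-6):
    """Central-difference estimate of the slope of f at t."""
    return (f(t + h) - f(t - h)) / (2 * h)

def check_pair(k, t):
    """Compare the estimated slope of sin(kt) with k*cos(kt)."""
    estimate = slope(lambda s: math.sin(k * s), t)
    return estimate, k * math.cos(k * t)

for k in (1, 2, 3):
    print(k, check_pair(k, 0.7))
```

Each printed pair should agree to several decimal places, with the k = 2 slope twice as large as cos 2t alone would suggest.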

🔗 Connecting derivatives and integrals

🔄 The fundamental relationship

  • Differential calculus: Compute v from f (find slopes).
  • Integral calculus: Compute f from v (find areas).
  • These are inverse operations.

📊 Area under the cosine curve

  • Since the slope of sin t is cos t, the area under cos t equals the change in sin t.
  • Example: Area from t = 0 to t = π/2 equals sin(π/2) − sin 0 = 1 − 0 = 1.
  • This demonstrates the power of the Fundamental Theorem of Calculus.
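
The area claim can be tested without the Fundamental Theorem by adding up thin rectangles under the cosine curve. This is a sketch using the midpoint rule; the function name `area_under` and the number of rectangles are assumptions.

```python
import math

def area_under(v, a, b, n=100_000):
    """Midpoint-rule approximation of the area under v between a and b."""
    width = (b - a) / n
    return sum(v(a + (k + 0.5) * width) for k in range(n)) * width

area = area_under(math.cos, 0.0, math.pi / 2)
print(area, math.sin(math.pi / 2) - math.sin(0.0))   # both close to 1
```

Extending the interval to t = π brings the total back toward 0, because the part of the cosine curve below the axis counts as negative area.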

📝 Summary of pairs

Distance fVelocity vNotes
vtvConstant velocity
½at²atConstant acceleration
sin tcos tHarmonic motion
cos t−sin tHarmonic motion (shifted)

🎓 Key formulas and relationships

🔢 The Pythagorean identity

  • (cos t)² + (sin t)² = 1
  • This comes from x² + y² = 1 for points on the unit circle.
  • Applies to both position and velocity triangles.

🎯 Special values to remember

  • At t = 0: cos 0 = 1, sin 0 = 0
  • At t = π/2: cos(π/2) = 0, sin(π/2) = 1
  • At t = π: cos π = −1, sin π = 0
  • At t = 2π: cos 2π = 1, sin 2π = 0 (back to start)
10

A Review of Trigonometry

1.5 A Review of Trigonometry

🧭 Overview

🧠 One-sentence thesis

Trigonometry extends from right-triangle ratios to circular motion by measuring angles in radians and using fundamental identities derived from the Pythagorean theorem, providing essential formulas needed throughout calculus.

📌 Key points (3–5)

  • Six basic ratios: cosine, sine, tangent, secant, cosecant, and cotangent come from the three sides of a right triangle (x, y, r) and remain constant when the triangle is scaled.
  • From triangles to circles: circles allow angles beyond 90°, with radians replacing degrees as the natural unit (2π radians = 360°, distance around = rθ).
  • Pythagorean identities: cos²θ + sin²θ = 1 and related identities (1 + tan²θ = sec²θ, cot²θ + 1 = csc²θ) flow from x² + y² = r².
  • Addition formulas: cos(s - t) = cos s cos t + sin s sin t, derived from equal distances in rotated circles, leads to double-angle and other formulas.
  • Common confusion: distinguish cos 2t (cosine of double angle) from cos²t (square of cosine); also sin(-θ) = -sin θ (odd) but cos(-θ) = cos θ (even).

📐 The six trigonometric functions

📐 Basic ratios from a right triangle

The six functions are ratios of the three sides x, y, r of a right triangle:

  • cos θ = x/r (near side / hypotenuse), sec θ = r/x = 1/cos θ
  • sin θ = y/r (opposite side / hypotenuse), csc θ = r/y = 1/sin θ
  • tan θ = y/x (opposite side / near side), cot θ = x/y = 1/tan θ
  • The ratios depend only on the angle θ, not the triangle's size—scaling doesn't change them.
  • Three functions on the right are reciprocals of the three on the left.
  • Tangent equals sine divided by cosine: tan θ = (y/r)/(x/r) = y/x.

📏 Size constraints

| Functions | Range |
|---|---|
| cos θ, sin θ | absolute value ≤ 1 |
| sec θ, csc θ | absolute value ≥ 1 |
| tan θ, cot θ | any value |

  • When cos θ → 0, tan θ → infinity (the triangle becomes infinitely steep at 90°).
  • Example: at θ = 2π/3, cos θ = -1/2, sin θ = √3/2, tan θ = -√3, sec θ = -2, csc θ = 2/√3, cot θ = -1/√3.

🔄 From triangles to circles

🔄 Why circles extend trigonometry

  • Triangles work well up to 90° and are acceptable up to 180°, but fail beyond that (cannot put a 240° angle into a triangle).
  • On a circle, angles are measured counterclockwise from the positive x-axis: 90° is straight up, 180° is left, 360° = 0°.
  • Each angle yields a point on the circle; coordinates x and y can be negative (but never r).

🌀 Radians replace degrees

The angle θ is measured by the arc length rθ around the circle; when r = 1, the distance is simply θ.

  • A full circle has circumference 2πr, so the angle is 2π radians = 360°.
  • Key conversions: π radians = 180°, π/2 radians = 90°, 1° = 2π/360 radians ≈ 0.01745 radians, 1 radian = 360°/2π ≈ 57.3°.
  • Example: a 45° angle is 1/8 of a circle = 2π/8 = π/4 radians, and on the unit circle the arc length is also π/4.
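
The conversions above amount to multiplying by 2π/360 or its reciprocal. A minimal sketch, with hypothetical function names (Python's standard `math.radians` and `math.degrees` do the same job):

```python
import math

def to_radians(degrees):
    """Degrees to radians: multiply by 2*pi/360."""
    return degrees * 2 * math.pi / 360

def to_degrees(radians):
    """Radians to degrees: multiply by 360/(2*pi)."""
    return radians * 360 / (2 * math.pi)

def arc_length(r, theta):
    """Distance along a circle of radius r through angle theta (in radians)."""
    return r * theta

print(to_radians(45), math.pi / 4)   # a 45 degree angle is pi/4 radians
print(to_degrees(1))                 # about 57.3
print(to_radians(1))                 # about 0.01745
```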

⚖️ Even and odd functions

  • Negative angles go clockwise: -θ reverses y but not x or r.
  • cos(-θ) = cos θ (even function, no sign change).
  • sin(-θ) = -sin θ, tan(-θ) = -tan θ (odd functions, sign changes).
  • Periodicity: adding 2π (one full rotation) doesn't change x, y, r, so all six functions have period 2π.

🔢 Fundamental identities

🔢 The Pythagorean identities

From x² + y² = r², dividing by r², x², or y² yields three key identities:

  • cos²θ + sin²θ = 1
  • 1 + tan²θ = sec²θ
  • cot²θ + 1 = csc²θ
  • The first identity (cos²θ + sin²θ = 1) is the most important and must be unforgettable.
  • Dividing x² + y² = r² by r² gives (x/r)² + (y/r)² = 1, which is cos²θ + sin²θ = 1.
  • Dividing by x² gives 1 + (y/x)² = (r/x)², which is 1 + tan²θ = sec²θ.
  • These identities allow constant switching between sines and cosines throughout calculus.

🔗 Complementary angle connection

sin θ = cos(π/2 - θ) and cos θ = sin(π/2 - θ)

  • The complementary angle is π/2 - θ because the two angles add to π/2 (a right angle).
  • This connection allows formulas for cosine to be converted into formulas for sine.

➕ Addition and double-angle formulas

➕ Deriving cos(s - t) from equal distances

  • By comparing distances in two identical circles (one rotated), the distance squared from (cos s, sin s) to (cos t, sin t) equals the distance squared from (cos(s-t), sin(s-t)) to (1, 0).
  • Expanding both expressions and using cos²θ + sin²θ = 1 yields:

cos(s - t) = cos s cos t + sin s sin t

  • Replacing t with -t (and using cos(-t) = cos t, sin(-t) = -sin t) gives:

cos(s + t) = cos s cos t - sin s sin t

🔁 Double-angle formulas

Setting s = t in the addition formula:

cos 2t = cos²t - sin²t = 2cos²t - 1 = 1 - 2sin²t

  • The three forms come from substituting cos²t = 1 - sin²t or sin²t = 1 - cos²t.
  • Rearranging: cos²t = (1 + cos 2t)/2 and sin²t = (1 - cos 2t)/2 (needed in calculus).

➕ Sine addition formulas

Using the complementary angle connection from cosine formulas:

sin(s - t) = sin s cos t - cos s sin t

sin(s + t) = sin s cos t + cos s sin t

sin 2t = 2 sin t cos t

  • Don't confuse: sin 2x (sine of double angle) vs. 2 sin x (twice the sine).

📏 Distance formula application

📏 Distance between two points

Distance d = √[(x₂ - x₁)² + (y₂ - y₁)²]

  • This Pythagorean formula measures the straight-line distance between (x₁, y₁) and (x₂, y₂).
  • The x-distance is |x₂ - x₁|, the y-distance is |y₂ - y₁|, and d is the hypotenuse.
  • Applying this formula to points on two identical circles (one rotated) leads directly to the addition formula for cos(s - t).
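
The equal-distance argument can be checked for particular angles. The sketch below is illustrative only: the sample angles s = 2.0 and t = 0.5 and the helper names are assumptions. It compares the two squared distances and then the two sides of the formula for cos(s - t).

```python
import math

def dist_squared(p, q):
    """Squared distance between points p = (x1, y1) and q = (x2, y2)."""
    return (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2

def point(angle):
    """Point on the unit circle at the given angle (radians)."""
    return (math.cos(angle), math.sin(angle))

s, t = 2.0, 0.5   # arbitrary sample angles

# Same chord, before and after rotating the circle by -t.
left = dist_squared(point(s), point(t))
right = dist_squared(point(s - t), (1.0, 0.0))
print(left, right)

# Both sides expand to 2 - 2*(cosine term), which gives the addition formula.
print(math.cos(s - t), math.cos(s) * math.cos(t) + math.sin(s) * math.sin(t))
```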
11

A Thousand Points of Light

1.6 A Thousand Points of Light

🧭 Overview

🧠 One-sentence thesis

When you plot discrete points sin n (for integer n) instead of the continuous curve sin x, the visual appearance changes dramatically with scale, revealing unexpected patterns—including what looks like 44 separate sine curves—because certain integers n bring sin n close to zero when n is near multiples of π.

📌 Key points (3–5)

  • Discrete vs continuous: sin n (points at integers) looks completely different from sin x (continuous curve); the same 1000 points can appear as sine curves or hexagons depending on scale.
  • Why patterns emerge: points cluster into apparent sine curves because certain integers n make sin n ≈ 0 (when n is close to multiples of π, like 22 ≈ 7π).
  • Counting the curves: there are 44 apparent sine curves starting near heights sin 0, sin 1, …, sin 43; 22 start upward and 22 downward.
  • Common confusion: scale matters—tilting or compressing the x-axis changes what you see; the "sine curves" are an illusion created by which points happen to be "close" in value.
  • The middle curve: the curve starting at (0,0) returns to zero around n = 7810 because 44N ≈ (14N + 1)π when N ≈ 177.5; 7810 is half the period, so the full period is about 15,620.

📊 What changes with scale

📊 Continuous vs discrete graphs

  • sin x (continuous): by x = 10,000 the curve oscillates 10,000/(2π) ≈ 1591 times; too crowded to see individual cycles.
  • sin n (discrete points): picks 10,000 individual points from the curve; these points appear to lie on more than 40 separate sine curves.

🔍 The same points look different

The excerpt shows two graphs of the first 1000 points:

  • First graph: points seem to lie on sine curves.
  • Second graph: most people see hexagons.
  • Key insight: "Tilt the second graph and look from the side at a narrow angle"—the narrow angle compresses the x-axis back to the first scale, and the first graph reappears.

"The effect of scale is something we don't think of."

  • What is "close" depends on the scale you choose.
  • Example: zooming in or out on a computer changes which features you notice.

Don't confuse: the graphs are identical data; only the viewing scale differs.

🔢 Which points are near (0,0)?

🔢 When is sin n close to zero?

A point near (0,0) means sin n ≈ 0.

  • sin 1 = 0.84 (not close to zero; starts the seventh sine curve).
  • sin 2 = 0.91, sin 3 = 0.14 (because 3 and π ≈ 3.14 are close: sin 3 = sin(π − 3) ≈ sin 0.14 ≈ 0.14).
  • sin 4, sin 5, …, sin 21 are not especially close to zero.

🎯 The first close point: sin 22

  • Why: 22/7 ≈ π, so 22 ≈ 7π.
  • Then sin 22 = sin(7π − 22) ≈ sin(−0.01) ≈ −0.01.
  • This is the first point to the right of (0,0) and slightly below; it begins a curve downward.

🎯 The next close points

  • sin 44: 44 ≈ 14π + 0.02, so sin 44 ≈ sin 0.02 ≈ 0.02.
    • This point (44, sin 44) starts the middle sine curve.
  • sin 88: similarly close to zero.

Pattern: multiples of 44 stay near zero because 44 ≈ 14π + 0.0177.
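
To see which integers land near zero, a short search can list them. This is a sketch under stated assumptions: the function name `near_zero_integers` and the cutoff 0.05 are arbitrary choices for illustration, not values from the excerpt.

```python
import math

def near_zero_integers(limit, threshold):
    """Integers n in 1..limit whose sine lies within `threshold` of zero."""
    return [n for n in range(1, limit + 1) if abs(math.sin(n)) < threshold]

# 0.05 is an arbitrary cutoff chosen for illustration.
print(near_zero_integers(100, 0.05))

# Offset of 44 from the nearest multiple of pi (the 0.0177 quoted above).
print(44 - 14 * math.pi)
```

With this cutoff the survivors up to 100 are the multiples of 22, which is the pattern described above.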

🌀 How many sine curves are there?

🌀 Counting the curves

There are 44 curves.

  • They begin near the heights sin 0, sin 1, sin 2, …, sin 43.
  • Of these 44 curves, 22 start upward and 22 start downward.

🕵️ Why only 42 are visible

The excerpt notes:

  • sin 11 ≈ −0.99999 (very close to −1).
  • sin 33 ≈ 0.9999 (very close to 1).
  • These are so close to the bottom and top that their curves are invisible.

Why sin 11 ≈ −1: because 22 ≈ 7π, we have 11 = 22/2 ≈ 7π/2, an odd multiple of π/2, where sine reaches ±1; at 7π/2 it equals −1. Likewise 33 ≈ 21π/2, where sine equals 1.

🔄 Following a single curve

  • "It is almost impossible to follow a single curve past the top—coming back down it is not the curve you think it is."
  • The discrete points jump between different apparent curves.

🔁 The middle curve and its period

🔁 Points on the middle curve

The middle curve passes through points at n = 0, 44, 88, and every multiple 44N.

🔁 When does it return to zero?

Question: when does 44N come very close to a multiple of π?

  • We know 44 ≈ 14π + 0.0177.
  • So 44N = (14π + 0.0177)N = 14πN + 0.0177N.
  • For 44N ≈ (some integer)π + π, we need 0.0177N ≈ π.
  • Solving: N ≈ π / 0.0177 ≈ 177.5.

📐 The period calculation

  • At N = 177.5, we have 44N = 7810.
  • This is half the period of the sine curve.
  • The sine of 7810 is very near zero.
  • The actual points on the middle curve near this zero crossing are n = 44 × 177 and n = 44 × 178, with sines just above and below zero.
  • Halfway between is n = 7810.
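
The half-period estimate can be reproduced without rounding the offset. This is a rough sketch; the variable names are illustrative. The unrounded offset gives values close to, though not identical with, the rounded figures 177.5 and 7810.

```python
import math

offset = 44 - 14 * math.pi      # about 0.0177, the drift per step of 44
N_half = math.pi / offset       # steps needed for the drift to add up to pi
half_period = 44 * N_half       # corresponding value of n

print(N_half, half_period)

# The two integer multiples of 44 that straddle the zero crossing.
for N in (177, 178):
    print(44 * N, math.sin(44 * N))
```

The two printed sines have opposite signs, which confirms that the middle curve crosses zero between n = 44 × 177 and n = 44 × 178.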

The equation for the middle sine curve is y = sin(πx / 7810), which first returns to zero at x = 7810.

  • Its full period is 15,620—beyond the 10,000-point graph.

Example: the fourth point on the middle curve is at n = 4 × 44 = 176, so the point is (176, sin 176).

🧩 Summary table

Concept                       Value / Description
Total apparent curves         44
Curves starting upward        22
Curves starting downward      22
First point near (0,0)        sin 22 ≈ −0.01
Middle curve zero crossings   n = 44N; returns to zero around n = 7810
Middle curve period           ≈ 15,620
Why patterns emerge           22 ≈ 7π and 44 ≈ 14π + 0.0177
12

The Derivative of a Function

2.1 The Derivative of a Function

🧭 Overview

🧠 One-sentence thesis

The derivative measures the instantaneous rate of change of a function by taking the limit of average change ratios as the time or space step shrinks to zero.

📌 Key points (3–5)

  • What the derivative measures: the instantaneous rate of change (velocity, slope, marginal demand) at a single point, found by taking limits of average rates.
  • The limit process is essential: you must first compute the difference f(t + Δt) − f(t), divide by Δt, and only then let Δt approach zero—setting Δt = 0 too early gives the meaningless 0/0.
  • Notation flexibility: the derivative can be written as f′(t), df/dt, or v(t); the independent variable can be t (time) or x (position); df/dt is a single symbol, not a fraction you can cancel.
  • Common confusion—when the derivative does not exist: at corners or jumps in the graph, the forward and backward average slopes differ, so there is no single limiting slope.
  • Sign of the derivative: increasing functions have positive slope; decreasing functions have negative slope (e.g., 1/t has derivative −1/t²).

📐 The formal definition

📐 The derivative as a limit

Derivative: f′(t) = lim(Δt → 0) [f(t + Δt) − f(t)] / Δt

  • The numerator f(t + Δt) − f(t) is the change in distance (or height), often written Δf.
  • The denominator Δt is the change in time (or position).
  • The ratio Δf / Δt is the average velocity (or average slope) over the interval.
  • The limit of this ratio as Δt → 0 is the instantaneous velocity (or instantaneous slope).

🔤 Notation and terminology

Symbol         Read aloud                         Meaning
f(t)           "f of t"                           Value of function f at time t
Δt             "delta t"                          A nonzero (usually short) time step
Δf             "delta f"                          The change f(t + Δt) − f(t)
Δf / Δt        "delta f over delta t"             Average velocity
f′(t)          "f prime of t"                     The derivative at time t
df/dt          "d f d t"                          Same as f′(t); Leibniz notation
lim(Δt → 0)    "limit as delta t goes to zero"    The limiting process

  • Important: Δf is not "delta times f"; it is the change in f. Similarly Δt is not "delta times t."
  • Important: df/dt is a single notation; do not cancel the d's or treat it as a fraction during the definition.
  • The step Δt (or h) can be positive or negative; the limit must be the same from both directions.

🔄 Alternative letters

  • Independent variable: t (time) or x (position).
  • Dependent variable: f, y, u, etc.
  • Derivative: df/dt, dy/dx, f′(x), y′(x), etc.
  • Example: y = f(x) has slope dy/dx = y′(x).

🧮 Computing derivatives: the three-step method

🧮 The standard procedure

  1. Write out the difference: f(t + Δt) − f(t).
  2. Divide by Δt: form the ratio [f(t + Δt) − f(t)] / Δt and simplify algebraically.
  3. Take the limit: let Δt → 0 to find df/dt.

Critical rule: Do not set Δt = 0 in step 1 or you will get 0/0, which is meaningless. Algebra must happen first.

📊 Example: f(t) = t²

  • Step 1: f(t + Δt) − f(t) = (t + Δt)² − t² = t² + 2t·Δt + (Δt)² − t² = 2t·Δt + (Δt)².
  • Step 2: Δf / Δt = [2t·Δt + (Δt)²] / Δt = 2t + Δt.
  • Step 3: As Δt → 0, the average 2t + Δt approaches the derivative 2t.
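
The three steps can be watched numerically. Below is a minimal sketch (the names `average_slope` and `square` are illustrative) that prints the average slope of t² at t = 3 for shrinking steps, next to the algebraic prediction 2t + Δt.

```python
def average_slope(f, t, dt):
    """Average rate of change of f over the step from t to t + dt; dt must be nonzero."""
    return (f(t + dt) - f(t)) / dt

def square(t):
    return t * t

t = 3.0
for dt in (1.0, 0.1, 0.01, 0.001, -0.001):
    # For f(t) = t^2 the algebra predicts exactly 2t + dt.
    print(dt, average_slope(square, t, dt), 2 * t + dt)
```

Positive and negative steps both close in on 2t = 6, as the limit requires.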

📊 Example: f(t) = 1/t

  • Step 1: f(t + Δt) − f(t) = 1/(t + Δt) − 1/t.
  • Use common denominator: = [t − (t + Δt)] / [t(t + Δt)] = −Δt / [t(t + Δt)].
  • Step 2: Δf / Δt = −1 / [t(t + Δt)].
  • Step 3: As Δt → 0, this approaches −1/t².

Interpretation: The function 1/t is decreasing, so its derivative is negative.

⚠️ When the derivative does not exist

⚠️ Corners and jumps

  • Example: f(t) = 2t for t < 3, then f(t) = 6 (constant) for t ≥ 3.
  • For t < 3: df/dt = 2.
  • For t > 3: df/dt = 0 (derivative of a constant is zero).
  • At t = 3: the graph has a corner; the forward average (0) and backward average (2) differ, so f′(3) does not exist.
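
The disagreement at the corner shows up directly in one-sided average slopes. This is a small sketch with illustrative names; the step sizes are arbitrary.

```python
def f(t):
    """Piecewise example from the text: 2t before t = 3, constant 6 afterwards."""
    return 2 * t if t < 3 else 6.0

def one_sided_slope(func, t, dt):
    """Average slope over a step dt (positive = forward, negative = backward)."""
    return (func(t + dt) - func(t)) / dt

for dt in (0.1, 0.001):
    forward = one_sided_slope(f, 3, dt)
    backward = one_sided_slope(f, 3, -dt)
    print(dt, forward, backward)
```

The forward values stay at 0 and the backward values stay near 2 no matter how small the step, so no single limit exists at t = 3.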

Don't confuse: A function can be defined everywhere but fail to have a derivative at isolated points (corners, cusps, jumps).

⚠️ The derivative of a constant function is zero

  • If f(t) = c (constant), then f(t + Δt) = c, so Δf = 0.
  • Δf / Δt = 0 / Δt = 0 for any Δt ≠ 0.
  • Limit: df/dt = 0.

🔢 Basic derivative facts

🔢 Constant velocity

  • If f(t) = Vt (constant velocity V), then Δf = V·Δt.
  • Δf / Δt = V always.
  • Derivative: df/dt = V.
  • Example: the derivative of 2t is 2.

🔢 Reciprocal function

  • The derivative of 1/t is −1/t².
  • The derivative of 1/x is −1/x².
  • More generally: the derivative of 4/x is −4/x² (by the same method).

🔢 Increasing vs decreasing

  • Increasing function (going upward): positive slope, df/dt > 0.
  • Decreasing function (going downward): negative slope, df/dt < 0.
  • Example: 1/t decreases as t increases, so its derivative −1/t² is negative.

🧩 The square rule (advanced)

🧩 Derivative of u(x)²

  • If u(x) is a function with derivative du/dx, what is the derivative of f(x) = [u(x)]²?
  • Not (du/dx)²—you do not square the derivative.
  • Compute: Δf = [u(x + Δx)]² − [u(x)]² = [u(x + Δx) + u(x)] · [u(x + Δx) − u(x)].
  • Divide by Δx: Δf / Δx = [u(x + Δx) + u(x)] · [Δu / Δx].
  • As Δx → 0: u(x + Δx) → u(x) and Δu/Δx → du/dx.
  • Square rule: d(u²)/dx = 2u · (du/dx).

🧩 Applications of the square rule

  • u = x²: derivative of x⁴ is 2·x²·(2x) = 4x³.
  • u = 1/x: derivative of 1/x² is 2·(1/x)·(−1/x²) = −2/x³.
  • u = sin x (given du/dx = cos x): derivative of sin²x is 2·sin x·cos x.

Don't confuse: The derivative of u² is not the square of the derivative; it is twice u times the derivative of u.
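
As a sanity check on the square rule, the sketch below compares 2u·(du/dx) with a central-difference estimate for u = sin x. The helper names and the step size h are assumptions made for illustration.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x); h is an assumed small step."""
    return (f(x + h) - f(x - h)) / (2 * h)

def square_rule(u, du, x):
    """Square rule: d(u^2)/dx = 2 * u(x) * u'(x)."""
    return 2 * u(x) * du(x)

x = 1.3
# u = sin x, with du/dx = cos x taken as given.
estimate = numerical_derivative(lambda s: math.sin(s) ** 2, x)
print(estimate, square_rule(math.sin, math.cos, x))

# The tempting wrong answer (du/dx)^2 does not match.
print(math.cos(x) ** 2)
```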

13

Powers and Polynomials

2.2 Powers and Polynomials

🧭 Overview

🧠 One-sentence thesis

The derivative of x raised to any power n follows the pattern that the derivative of xⁿ is n times xⁿ⁻¹, and this rule extends to all real exponents and combinations of powers (polynomials), enabling us to solve differential equations and apply calculus to economics and biology.

📌 Key points (3–5)

  • The power rule: The derivative of xⁿ is n·xⁿ⁻¹ for any real number n (positive, negative, or fractional).
  • How the binomial formula reveals the pattern: Expanding (x + h)ⁿ produces xⁿ + n·xⁿ⁻¹·h + (terms with h², h³, ...), and after subtracting xⁿ and dividing by h, only n·xⁿ⁻¹ survives as h approaches zero.
  • Linearity rules for combinations: The derivative of c times f(x) is c times the derivative of f(x); the derivative of f(x) + g(x) is the sum of their derivatives.
  • Reversing the process (differential equations): Given dy/dx, we can find y(x), but only up to an arbitrary constant C (the starting value).
  • Common confusion—going backward is not unique: If dy/dx = 3x², then y = x³ + C for any constant C, not just y = x³; constants disappear when differentiating but reappear when integrating.

🔢 The power rule and its derivation

🔢 The pattern for integer powers

The excerpt demonstrates the derivative of x³ step by step:

  • Start with f(x) = x³, so Δf = (x + h)³ − x³.
  • Expand: (x + h)³ = x³ + 3x²h + 3xh² + h³.
  • Subtract x³: Δf = 3x²h + 3xh² + h³.
  • Divide by h: Δf/h = 3x² + 3xh + h².
  • As h → 0, the limit is 3x².

Why this works: The key term is the one with exactly one factor of h, which is n·xⁿ⁻¹·h; all other terms have h², h³, etc., and vanish after dividing by h and taking the limit.

🔢 The binomial formula for (x + h)ⁿ

The binomial formula: (x + h)ⁿ = xⁿ + n·xⁿ⁻¹·h + (other terms with h², h³, ...) + hⁿ.

  • There are n parentheses in (x + h)(x + h)...(x + h).
  • To get a term with exactly one h, choose h from one parenthesis and x from the rest: n ways to do this.
  • Result: the coefficient of xⁿ⁻¹·h is n.
  • After subtracting xⁿ, dividing by h, and letting h → 0, the derivative is n·xⁿ⁻¹.

Example: For n = 4, (x + h)⁴ = x⁴ + 4x³h + 6x²h² + 4xh³ + h⁴. The key term is 4x³h, so the derivative of x⁴ is 4x³.
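
For positive integer n, the counting argument can be checked with binomial coefficients. This sketch uses Python's `math.comb`; the function names are illustrative.

```python
from math import comb

def binomial_terms(n):
    """Coefficients c_k in (x + h)^n = sum of c_k * x^(n-k) * h^k, for k = 0..n."""
    return [comb(n, k) for k in range(n + 1)]

def difference_quotient_coeffs(n):
    """Coefficients of [(x + h)^n - x^n] / h, listed by the power of h (0..n-1)."""
    return binomial_terms(n)[1:]   # drop the x^n term, then divide every term by h

print(binomial_terms(4))
# The first entry multiplies x^(n-1) and is the only term that survives h -> 0.
print(difference_quotient_coeffs(4))
```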

🔢 Extension to all real exponents

The rule holds for:

  • Negative powers: The derivative of x⁻¹ = 1/x is −1·x⁻² = −1/x² (from Section 2.1).
  • Fractional powers: The derivative of x^(1/2) = √x is (1/2)·x^(−1/2) = 1/(2√x).
  • Any real n: The derivative of x^(2.2) is 2.2·x^(1.2).

Don't confuse: x⁻² means 1/x², and x^(−1/2) means 1/√x; for x > 0, these negative powers are decreasing functions with negative slopes.

🧮 Derivatives of polynomials

🧮 Linearity: multiplying by constants

Rule 2C: The derivative of c times f(x) is c times f'(x).

  • If f(x) is multiplied by a constant c, so is f(x + h), so is Δf, and so is Δf/h.
  • The limit (the derivative) is also multiplied by c.
  • Example: The derivative of 6x³ is 6 times 3x², which is 18x².

🧮 Linearity: adding functions

Rule 2D: The derivative of f(x) + g(x) is f'(x) + g'(x).

  • Adding f + g means adding Δf + Δg.
  • Divide by h: (Δf + Δg)/h = Δf/h + Δg/h.
  • The limit of a sum is the sum of limits: f' + g'.
  • Example: The derivative of 6x³ + (1/2)x² is 18x² + x.

🧮 Constants vanish

  • The derivative of any constant is zero (a constant just shifts the graph up or down without changing slope).
  • Example: The derivative of 9 + 2x − x⁵ is 0 + 2 − 5x⁴ = 2 − 5x⁴.

Why this matters: In differential calculus, constants disappear; in integral calculus (going backward), constants reappear and must be determined from initial conditions.
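
The power rule plus linearity is enough to differentiate any polynomial term by term. Here is one possible sketch, assuming a polynomial is stored as a list of coefficients in increasing powers; this representation and the name `poly_derivative` are illustrative choices.

```python
def poly_derivative(coeffs):
    """Differentiate a0 + a1*x + a2*x^2 + ... given as [a0, a1, a2, ...].

    Applies the power rule to each term and adds the results (Rules 2C and 2D).
    """
    return [k * a for k, a in enumerate(coeffs)][1:]

# 9 + 2x - x^5  ->  2 - 5x^4
print(poly_derivative([9, 2, 0, 0, 0, -1]))
# 6x^3 + (1/2)x^2  ->  x + 18x^2
print(poly_derivative([0, 0, 0.5, 6]))
```

The constant term is dropped by the slice, which mirrors the fact that constants vanish under differentiation.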

🔄 Reversing the process: differential equations

🔄 Finding y from dy/dx

  • Forward: Given y = x³, we find dy/dx = 3x².
  • Backward: Given dy/dx = 3x², we deduce y = x³ + C for any constant C.
  • The constant C is the starting value of y when x = 0.

Example: If dy/dx = 2x + 1, then y = x² + x + C (any function of the form x² + x shifted vertically).

🔄 Two challenges in differential equations

  1. No systematic method for all slopes: Given dy/dx = (sin x)/x, we have no formula for y(x) yet.
  2. Equations mixing y and dy/dx: Many real-world equations involve both the function and its derivative:
    • Economics: dy/dx = 3y/x.
    • Geometry: dy/dx = 3y^(2/3).
    • These require new techniques (Chapter 6).

Roadmap: Chapters 2–4 compute derivatives; Chapter 5 (integrals) finds y from dy/dx; Chapter 6 solves differential equations where y and dy/dx are mixed.

🔄 Application: modeling an epidemic

The excerpt mentions AIDS data through 1988 fitting a cubic polynomial:

  • Number of cases: y = 174.6(t − 1981.2)³ + 340.
  • With time measured from 1981.2 and the constant 340 set aside, the cubic has the form y = c·t³, which satisfies dy/dt = 3y/t (not the exponential dy/dt = y typical of other epidemics).
  • The cubic growth is dramatically different and suggests a different underlying mechanism.

Don't confuse: Exponential growth (dy/dt = y) eventually outruns every power of t; cubic growth (dy/dt = 3y/t) is slower and indicates something is preventing exponential spread.

💰 Applications to economics

💰 Marginal vs. average

Marginal cost/income: the derivative dy/dx, the cost or income of the next unit.

  • Average: total cost y divided by quantity x, i.e., y/x.
  • Marginal: the rate of change dy/dx as x changes.
  • "The average describes the past, the marginal predicts the future."

Example: If the cost is y = x², then:

  • Average cost per unit = x²/x = x.
  • Marginal cost = dy/dx = 2x (the cost of the next unit is higher).

Why it matters: A firm decides whether to produce one more unit based on marginal cost (dy/dx), not average cost (y/x).

💰 Elasticity of demand

Elasticity E(x): the ratio of relative change in demand to relative change in price, E(x) = (dy/dx) / (y/x).

  • Elasticity measures sensitivity: how much does quantity demanded change (in percentage terms) when price changes (in percentage terms)?
  • Dimensionless: E is the same whether measured in liters or gallons, dollars or pesos.
  • The formula is "marginal divided by average."
Elasticity   Meaning               Example
|E| < 1      Inelastic             Demand changes little with price (necessities like bread)
|E| = 1      Unit elastic          Demand y = c/x; consumers spend the same total amount xy = c at all prices
|E| > 1      Elastic               Demand changes a lot with price (luxuries like caviar)
E = 0        Perfectly inelastic   Fixed supply (e.g., wheat after harvest); y = constant
E = ∞        Perfectly elastic     Unlimited supply at fixed price; x = constant

Example: For demand y = c/x, dy/dx = −c/x² and y/x = c/x², so E = (−c/x²)/(c/x²) = −1 (unit elastic).

💰 Power functions and constant elasticity

  • The demand function y = c·xⁿ has constant elasticity E = n.
  • Proof: dy/dx = c·n·xⁿ⁻¹, and (dy/dx)/(y/x) = (c·n·xⁿ⁻¹)/(c·xⁿ/x) = n.
  • Example: y = 20/√x = 20·x^(−1/2) has E = −1/2 (inelastic); y = x⁻³ has E = −3 (elastic).
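
Constant elasticity can be confirmed numerically by evaluating "marginal divided by average" at several prices. This is a sketch, with the derivative approximated by a central difference; the names `elasticity` and `demand` and the step h are assumptions.

```python
def elasticity(y, x, h=1e-6):
    """Estimate E(x) = (dy/dx) / (y/x) using a central difference with assumed step h."""
    dydx = (y(x + h) - y(x - h)) / (2 * h)
    return dydx / (y(x) / x)

def demand(x):
    # y = 20 / sqrt(x), the example demand curve
    return 20 * x ** -0.5

for price in (1.0, 4.0, 9.0):
    print(price, elasticity(demand, price))
```

Each printed value is close to −1/2 regardless of the price, as the proof predicts for a power function.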

Don't confuse: Economists often drop the minus sign in conversation (demand curves slope down, so E is negative), but the definition includes it.

💰 Income elasticity

  • Income elasticity: E(I) = (dy/dI)/(y/I), where y is demand and I is income.
  • Luxury: E > 1 (demand more than doubles when income doubles).
  • Necessity: E < 1 (demand less than doubles).

Example: If you save $500 out of $10,000 income and E = 2, then y = c·I² with c = 5×10⁻⁶. Marginal savings dy/dI = 2c·I = 0.10 (you save 10 cents of the next dollar), even though average savings is 5%.

💰 Supply = demand vs. monopoly

  • Perfect competition: Many suppliers; price is set where supply curve meets demand curve.
  • Monopoly: One supplier (e.g., electricity, airport); the supplier can raise price and some demand remains.
  • Calculus problem shifts: In monopoly, the goal is to maximize profit (find where marginal profit = 0), not to solve supply = demand.
14

The Slope and the Tangent Line

2.3 The Slope and the Tangent Line

🧭 Overview

🧠 One-sentence thesis

The tangent line at a point on a curve is the straight line that best approximates the curve locally, determined by matching both the point and the slope (derivative) at that point, and it emerges as the limit of secant lines as two points converge.

📌 Key points (3–5)

  • What the tangent line is: the unique straight line through a point on a curve that has the same slope as the curve at that point; over a very short range, any curve looks straight and follows its tangent line.
  • How to write its equation: use the point-slope form y - f(a) = m(x - a), where m = f'(a) is the derivative (slope) at x = a and f(a) is the height.
  • Secant vs tangent: a secant line connects two points on the curve with slope (f(c) - f(a)) / (c - a); as the second point c approaches a, the secant slope approaches the tangent slope df/dx.
  • Common confusion: the tangent line equation y - f(a) = f'(a)(x - a) is not the same as y = f'(a)x + b unless you solve for the correct intercept b = f(a) - f'(a)a.
  • Why it matters: tangent lines let you approximate curved motion or change with straight-line calculations; they appear in optimization (e.g., when to step off a roller-coaster, when to sell gold) and in understanding instantaneous rates.

📐 The equation of a straight line

📐 General form and slope

A straight line has the form y = mx + b, where m is the slope and b is the y-intercept.

  • Why it's straight: if two points satisfy y = mx + b, then every halfway point (and every subdivision) also satisfies it, so the graph cannot curve.
  • What the slope means: moving one unit across (in x) means moving m units up (in y); slope = (distance up) / (distance across) = m / 1.
  • Example: for y = 2x + 1, going from x = 0 to x = 1 raises y from 1 to 3, a rise of 2.

📌 Point-slope form

The point-slope form of a line through (a, f(a)) with slope m is:
y - f(a) = m(x - a).

  • This form immediately shows that when x = a, the factor (x - a) is zero, so y = f(a) as required.
  • For the tangent line, set m = f'(a) (the derivative at x = a).
  • Example: the curve y = x⁴ - x² + 3 at x = 1 has height f(1) = 3 and slope dy/dx = 4x³ - 2x = 2 at x = 1, so the tangent line is y - 3 = 2(x - 1).

🔧 Finding the intercept b

  • If you prefer the form y = mx + b, solve for b using the known point: f(a) = ma + b, so b = f(a) - ma.
  • Example: y - 3 = 2(x - 1) expands to y = 2x + 1, where b = 1.

🎯 The tangent line to a curve

🎯 Definition and construction

The tangent line at x = a on the curve y = f(x) is the line through (a, f(a)) with slope f'(a).

  • Why it works: over a very short range (zoom in with a microscope or computer), the curve looks straight and follows this line before curving away.
  • The tangent line has the same slope as the curve at that one point.
  • Example: for y = x³ - 2 at x = 2, height is f(2) = 6 and slope is dy/dx = 3x² = 12, so the tangent line is y - 6 = 12(x - 2) or y = 12x - 18.

🔄 The normal line

The normal line is perpendicular to the tangent line at the same point; if the tangent has slope m, the normal has slope -1/m.

  • Rule: slopes of perpendicular lines multiply to give -1.
  • Example: if the tangent slope is 12, the normal slope is -1/12, so the normal line is y - 6 = (-1/12)(x - 2).
  • Don't confuse: the normal line is not y = (-1/12)x + b with the same intercept as the tangent; you must recalculate b from the point.
  • Physical interpretation: light rays and brush fires move in the normal direction (perpendicular to the curve or fire line).
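
The tangent and normal lines for y = x³ - 2 at x = 2 can be computed in slope-intercept form. This is a minimal sketch with illustrative function names; it assumes f'(a) is nonzero when the normal is requested.

```python
def tangent_line(f, df, a):
    """Return (slope, intercept) of the tangent to y = f(x) at x = a."""
    m = df(a)
    return m, f(a) - m * a          # b = f(a) - f'(a) * a

def normal_line(f, df, a):
    """Return (slope, intercept) of the normal at x = a; requires f'(a) != 0."""
    m = -1 / df(a)
    return m, f(a) - m * a

f = lambda x: x ** 3 - 2
df = lambda x: 3 * x ** 2

print(tangent_line(f, df, 2))
print(normal_line(f, df, 2))
```

Both intercepts are recomputed from the shared point (2, 6), which is why they differ.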

🚀 Application: stepping off a roller-coaster

Example: you are on a track y = x² + 4 and want to reach a friend at (0, 0) by stepping off and traveling in a straight line (the tangent).

  • At step-off point x = a, the tangent line is y - (a² + 4) = 2a(x - a).
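  • For this line to pass through (0, 0), substitute: 0 - (a² + 4) = 2a(0 - a), so -(a² + 4) = -2a², which gives a² = 4 and a = ±2.
  • So step off at x = -2 if you are moving left to right (the tangent there slopes down toward the friend); by symmetry, x = 2 works for the opposite direction.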
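
A quick check of the step-off point: the sketch below evaluates, for several candidate values of a, the height at which the tangent to y = x² + 4 crosses x = 0. The function name is illustrative.

```python
def tangent_height_at_zero(a):
    """Height at x = 0 of the tangent to y = x^2 + 4 drawn at x = a."""
    height, slope = a * a + 4, 2 * a
    return height + slope * (0 - a)     # simplifies to 4 - a^2

for a in (-3, -2, -1, 1, 2, 3):
    print(a, tangent_height_at_zero(a))
```

The tangent reaches the friend at (0, 0) only when 4 - a² = 0; which of the two roots applies depends on the direction of travel along the track.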

🚗 Application: catching up to a car

Example: a red light is 72 meters away and will turn green in 4 seconds; a waiting car will then accelerate at 3 meters/sec². You want to slow down just enough to catch up without passing.

  • At catchup time T, both cars have the same speed and distance (two conditions → tangent condition).
  • The other car's speed at time T is 3(T - 4) (delayed by 4 seconds).
  • Set your speed V = 3(T - 4), so T = (1/3)V + 4.
  • Match distances: your distance VT equals the other car's distance 72 + (1/2)(3)(T - 4)².
  • Solve: V = 12 meters/second (about 43 km/hr or 27 mph).
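
The two tangent conditions reduce to one quadratic in V. The sketch below solves it; the function name and keyword parameters are illustrative, with defaults taken from the example (72 meters, 4 seconds, 3 meters/sec²).

```python
import math

def catchup_speed(gap=72.0, delay=4.0, accel=3.0):
    """Solve V*T = gap + 0.5*accel*(T - delay)^2 together with V = accel*(T - delay).

    Substituting T = V/accel + delay gives V^2/(2*accel) + delay*V - gap = 0.
    """
    a, b, c = 1 / (2 * accel), delay, -gap
    V = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # positive root
    T = V / accel + delay
    return V, T

V, T = catchup_speed()
print(V, T)
```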

🔗 The secant line and the limit

🔗 Secant line definition

The secant line connects two points (a, f(a)) and (c, f(c)) on the curve; its slope is the average rate of change:
m = (f(c) - f(a)) / (c - a).

  • Two-point form: y - f(a) = [(f(c) - f(a)) / (c - a)](x - a).
  • This line automatically passes through both points.
  • Example: for y = x³ - 2 from x = 2 (height 6) to x = 3 (height 25), the secant slope is (25 - 6)/(3 - 2) = 19, so the secant line is y - 6 = 19(x - 2).

🔄 From secant to tangent

  • As the second point c approaches the first point a, the secant line approaches the tangent line.

  • The secant slope (f(c) - f(a)) / (c - a) approaches the tangent slope f'(a).

  • Fundamental idea of differential calculus:

    f'(a) = limit as c → a of (f(c) - f(a)) / (c - a).

  • Equivalently, write c = a + Δx (so c - a = Δx and f(c) - f(a) = Δf):

    df/dx = limit of Δf / Δx as Δx → 0.

  • Don't confuse: the limit is not 0/0 (meaningless), but a definite number (the derivative).

📊 Example: sine function

For y = sin x at x = 0:

  • Starting point: (0, sin 0) = (0, 0).
  • Secant to (c, sin c) has slope (sin c - 0) / (c - 0) = (sin c) / c.
  • Secant equation: y = [(sin c) / c] x.
  • As c → 0, the limit of (sin c) / c is 1, so the tangent line is y = 1x (slope = 1).

💰 Example: selling gold

You own gold that will be worth √t million dollars in t years; when should you sell and buy a bond paying 10% (straight interest)?

  • Rate of increase (derivative): d(√t)/dt = 1/(2√t).
  • Sell when this rate drops to 10% of current value: 1/(2√t) = (√t)/10, so 2t = 10 and t = 5.
  • At t = 5, value is √5 ≈ 2.236 million; tangent line is y - √5 = (√5/10)(t - 5).
  • At t = 25 on the tangent (bond): y - √5 = (√5/10)(25 - 5) = 2√5, so y = 3√5 ≈ 6.7 million.
  • If you kept the gold: √25 = 5 million (less than the bond).
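
The gold-versus-bond comparison can be replayed in a few lines. This is a sketch with illustrative names; it takes t = 5 as the sale time and treats the bond as straight-line growth along the tangent.

```python
import math

t_sell = 5.0                               # from 1/(2*sqrt(t)) = sqrt(t)/10
rate = 1 / (2 * math.sqrt(t_sell))         # growth rate of sqrt(t) at t = 5
value = math.sqrt(t_sell)                  # gold value at the sale, in millions

def bond_value(t):
    """Straight-line (tangent) growth starting from the sale at t = 5."""
    return value + rate * (t - t_sell)

print(rate / value)                        # relative growth rate at the sale
print(bond_value(25), math.sqrt(25))       # bond versus gold at t = 25
```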

🧮 Key formulas summary

Concept               Formula                                        Notes
Tangent line          y - f(a) = f'(a)(x - a)                        Point-slope form; slope = derivative
Normal line           y - f(a) = (-1/f'(a))(x - a)                   Perpendicular to tangent
Secant line           y - f(a) = [(f(c) - f(a))/(c - a)](x - a)      Connects two points; average slope
Derivative as limit   f'(a) = lim (c→a) [(f(c) - f(a))/(c - a)]      Secant slope → tangent slope
Slope of y = mx + b   m                                              Constant slope; rise/run = m/1

🔍 Common pitfalls

🔍 Intercept confusion

  • The tangent line y - f(a) = f'(a)(x - a) expands to y = f'(a)x + [f(a) - f'(a)a].
  • The intercept is b = f(a) - f'(a)a, not just f(a).
  • Example: tangent at (1, 3) with slope 2 is y = 2x + 1, not y = 2x + 3.

🔍 Normal line slope

  • If tangent slope is m, normal slope is -1/m (not -m).
  • Example: tangent slope 12 → normal slope -1/12.

🔍 Limit vs 0/0

  • As c → a, the secant slope (f(c) - f(a))/(c - a) looks like 0/0, but the limit is the derivative f'(a), a definite number.
  • The excerpt emphasizes: "Algebra stays away from 0/0, but calculus gets as close as it can."
15

The Derivative of the Sine and Cosine

2.4 The Derivative of the Sine and Cosine

🧭 Overview

🧠 One-sentence thesis

The derivatives of sin x and cos x reveal that these functions oscillate in a special way governed by the differential equation y″ = −y, making them fundamental for modeling all oscillatory motion.

📌 Key points (3–5)

  • Core result: The derivative of sin x is cos x, and the derivative of cos x is −sin x, proven by the limit definition and trigonometric addition formulas.
  • Two critical limits: (sin h)/h approaches 1 and (cos h − 1)/h approaches 0 as h → 0; these are the foundation of all sine/cosine derivatives.
  • Second derivatives and oscillation: Both sin x and cos x satisfy y″ = −y, meaning acceleration equals negative distance—the signature of simple harmonic motion.
  • Common confusion: A negative second derivative (y″ < 0) does not mean the function is decreasing; it means the slope (y′) is decreasing, so the curve bends downward even while rising.
  • Why it matters: This differential equation y″ = −y models springs, swings, heartbeats, alternating current, and economic cycles—all real-world oscillations.

🔍 Proving the derivative of sin x

🔍 The limit definition approach

The derivative of y = sin x is dy/dx = cos x.

  • Start from the standard limit:
    • dy/dx = limit as h → 0 of [sin(x + h) − sin x] / h
  • Apply the addition formula: sin(x + h) = sin x · cos h + cos x · sin h
  • Rearrange into two separate ratios:
    • Δy/h = sin x · [(cos h − 1)/h] + cos x · [(sin h)/h]
  • As h → 0, the first ratio → 0 and the second → 1, so dy/dx = 0 + cos x = cos x.
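
The split into two ratios can be observed numerically. In this sketch (names and step sizes are illustrative), the first piece shrinks toward 0 while the second approaches cos x, and their sum matches the direct difference quotient.

```python
import math

def sine_quotient_parts(x, h):
    """Split [sin(x+h) - sin x]/h into the two pieces used in the proof."""
    first = math.sin(x) * (math.cos(h) - 1) / h    # tends to 0
    second = math.cos(x) * math.sin(h) / h         # tends to cos x
    return first, second

x = 1.0
for h in (0.1, 0.01, 0.001):
    first, second = sine_quotient_parts(x, h)
    direct = (math.sin(x + h) - math.sin(x)) / h
    print(h, first, second, first + second, direct)
print(math.cos(x))
```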

🧮 The two critical limits

These are the heart of the proof:

Limit                      Value   Why it matters
(sin h)/h as h → 0         1       Shows the sine curve starts with slope 1 at x = 0
(cos h − 1)/h as h → 0     0       Shows 1 − cos h shrinks faster than h itself

  • Don't confuse: Both numerators approach 0, but the ratio behavior depends on which goes to zero more quickly.
  • Example: h²/h → 0 (top faster), h/h → 1 (same speed), √h/h → ∞ (bottom faster).

📐 Why (sin h)/h approaches 1

📐 The squeeze theorem

The proof uses geometric inequalities to trap (sin h)/h between two functions that both approach 1.

Step 1: Establish inequalities

  • From geometry (Figure 2.11): sin h < h < tan h for small positive h
  • Divide through: (sin h)/h < 1 and (sin h)/h > cos h
  • So cos h < (sin h)/h < 1

Step 2: Apply the squeeze

  • As h → 0, cos h → 1 from below
  • The ratio (sin h)/h is caught between cos h and 1
  • Both boundaries approach 1, so the ratio must also approach 1
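
The squeeze can be watched as h shrinks. This is a small sketch assuming h is a positive angle in radians; the function name is illustrative.

```python
import math

def squeeze_bounds(h):
    """Return (cos h, sin(h)/h, 1) for a small positive h in radians."""
    return math.cos(h), math.sin(h) / h, 1.0

for h in (0.5, 0.1, 0.01):
    lower, ratio, upper = squeeze_bounds(h)
    print(h, lower, ratio, upper, lower < ratio < upper)
```

The lower bound climbs toward 1, leaving the ratio less and less room.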

📏 Geometric justification

  • Why sin h < h: The straight line PQ (length 2 sin h) is shorter than the circular arc (length 2h), because the shortest distance between two points is a straight line.
  • Why h < tan h: Compare areas—the triangular area ½ tan h is larger than the circular sector area ½ h.
  • Important: All angles must be in radians; in degrees, the derivative of sin x is not cos x but is reduced by the factor 2π/360.

Example: At h = 0.01 radians, sin(0.01) ≈ 0.00999983, so (sin h)/h ≈ 0.999983, very close to 1.

🔄 Why (cos h − 1)/h approaches 0

🔄 Connecting to the sine limit

Start from the Pythagorean identity: (sin h)² + (cos h)² = 1

  • Rearrange: 1 − (cos h)² = (sin h)²
  • Factor: (1 − cos h)(1 + cos h) = (sin h)² < h²
  • Divide by h and by (1 + cos h):
    • 0 < (1 − cos h)/h < h/(1 + cos h)
  • As h → 0, the right side → 0/(1 + 1) = 0
  • So (1 − cos h)/h is squeezed to 0

🔄 Approximation insight

The inequality also shows that 1 − cos h ≈ ½h² for small h.

  • The factor ½ comes from 1 + cos h ≈ 2
  • This means the "tangent parabola" 1 − ½h² is close to the top of the cosine curve

🌊 Derivatives and oscillation

🌊 The derivative of cos x

The derivative of y = cos x is dy/dx = −sin x.

  • Use the addition formula: cos(x + h) = cos x · cos h − sin x · sin h
  • Form the difference quotient:
    • Δy/h = cos x · [(cos h − 1)/h] − sin x · [(sin h)/h]
  • Take limits: dy/dx = cos x · 0 − sin x · 1 = −sin x

Graphical check: Shifting the sine curve left by π/2 gives the cosine curve; the derivative also shifts by π/2, confirming the result.

🔁 Second derivatives

The second derivative is the derivative of the derivative; notation: y″ or d²y/dx².

  • For y = sin t: y′ = cos t, then y″ = −sin t = −y
  • For y = cos t: y′ = −sin t, then y″ = −cos t = −y
  • Both satisfy the differential equation y″ = −y

What this means physically:

  • y″ is acceleration (rate of change of velocity)
  • y″ = −y means acceleration = −distance
  • The greater the distance, the greater the restoring force pulling back

Example: Stretch a spring → restoring force pulls it back; push a swing up → gravity brings it down.

📊 Concavity and bending

Condition   Meaning            Graph behavior
y″ > 0      y′ is increasing   Curve bends upward (concave up)
y″ < 0      y′ is decreasing   Curve bends downward (concave down)

  • Don't confuse: y″ < 0 does not mean y is decreasing!
    • It means the slope y′ is decreasing
    • Example: At the start of sin t, y is still increasing but y″ < 0 (the arch bends down while going up)

🎯 Simple harmonic motion

🎯 The differential equation y″ = −y

Simple harmonic motion: oscillation governed by d²y/dt² = −y.

  • All solutions are combinations: y = A sin t + B cos t
  • This models any system where restoring force is proportional to displacement
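
A second-difference estimate gives a numerical check that y = A sin t + B cos t satisfies y″ = −y. This is a sketch; the constants A and B, the step h, and the function names are arbitrary illustrative choices.

```python
import math

def second_difference(y, t, h=1e-4):
    """Central estimate of y''(t): [y(t+h) - 2y(t) + y(t-h)] / h^2 (h is an assumed step)."""
    return (y(t + h) - 2 * y(t) + y(t - h)) / (h * h)

A, B = 2.0, -0.5          # arbitrary illustrative constants

def y(t):
    return A * math.sin(t) + B * math.cos(t)

for t in (0.0, 1.0, 2.5):
    # The two printed columns should agree to several decimal places.
    print(t, second_difference(y, t), -y(t))
```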

Real-world examples from the excerpt:

  • Springs: stretch → restoring force pulls back
  • Swings: push up → gravity brings down
  • Hearts: fill and empty
  • Balls: bounce
  • Electric current: alternates
  • Economy: high prices → high production → low prices → …

🎯 Why oscillation is universal

The excerpt emphasizes: "We can't live without oscillations (or differential equations)."

  • Calculus models events by equations
  • Equation (12), y″ = −y, models oscillation
  • Sine and cosine are the only basic solutions to this equation

Example: A function with distance = 5t has velocity df/dt = 5 (constant) and acceleration d²f/dt² = 0 (no oscillation). But y = sin t has acceleration = −y, creating perpetual back-and-forth motion.

16

The Product and Quotient and Power Rules

2.5 The Product and Quotient and Power Rules

🧭 Overview

🧠 One-sentence thesis

Five fundamental differentiation rules—linearity, product, reciprocal, quotient, and power—enable us to find derivatives of complex functions built from simpler pieces whose derivatives we already know.

📌 Key points (3–5)

  • What the rules do: combine known derivatives of simple functions (like x and sin x) to find derivatives of sums, products, quotients, and powers.
  • Sum rule (linearity): when you add functions, you add their derivatives; when you multiply by constants, multiply the derivatives by those constants.
  • Product rule: the derivative of u times v has two terms (u times dv/dx plus v times du/dx), not the product of the two derivatives.
  • Common confusion: the derivative of uv is NOT (du/dx)(dv/dx); the product rule requires both u·(dv/dx) and v·(du/dx).
  • Power rule generalization: the rule that the derivative of x^n is n·x^(n−1) holds for all real exponents (negative, fractional, any number), and extends to [u(x)]^n with an extra factor du/dx.

➕ Sum and linearity rules

➕ The sum rule

When we add functions, we add their derivatives.

  • The derivative of u(x) + v(x) is du/dx + dv/dx.
  • This is the simplest rule and matches intuition.
  • Example: the derivative of x + sin x is 1 + cos x.
  • Physical interpretation: if you add distances, you add velocities.

🔢 The linearity rule (extended sum rule)

A linear combination au(x) + bv(x) has derivative a·(du/dx) + b·(dv/dx).

  • You can add or subtract functions and multiply by constants.
  • Example: the derivative of 3x − 4 sin x is 3 − 4 cos x.
  • The rule works because limits can be taken separately and added.
  • This is called "linearity" because it preserves addition and scalar multiplication.

✖️ Product and reciprocal rules

✖️ The product rule

The derivative of u times v is u·(dv/dx) + v·(du/dx).

  • Key point: the derivative of a product has two terms, not one.
  • Don't confuse: the derivative of uv is NOT (du/dx)·(dv/dx).
  • Example: for x³ times x², the product rule gives x³·(2x) + x²·(3x²) = 2x⁴ + 3x⁴ = 5x⁴, which matches the derivative of x⁵.
  • Example: the derivative of x sin x is x cos x + sin x (the "1" from dx/dx is implicit).
  • Geometric interpretation: when a rectangle with sides u and v grows, the area change is approximately u·Δv (top strip) plus v·Δu (side strip); the corner piece Δu·Δv becomes negligible.
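
A quick numerical comparison makes the "two terms, not one" point concrete for x sin x. This is a sketch under stated assumptions: the sample point and step size are arbitrary, and the helper name `derivative` is ours.

```python
import math

def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 1.3  # arbitrary sample point
numeric = derivative(lambda t: t * math.sin(t), x)
product_rule = x * math.cos(x) + math.sin(x)  # u dv/dx + v du/dx
wrong_rule = 1.0 * math.cos(x)                # (du/dx)(dv/dx), the common mistake

print(numeric, product_rule, wrong_rule)
```

The numerical slope agrees with the product rule and is far from the product of the two derivatives.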

🔄 The reciprocal rule

The derivative of 1/v(x) is −(dv/dx)/v².

  • This comes from applying the product rule to v·(1/v) = 1, whose derivative is 0.
  • Example: the derivative of 1/x is −1/x².
  • Example: the derivative of 1/cos x (which is sec x) is sin x/cos² x = sec x tan x.
  • Dimensional check: if v is in dollars and x in hours, then (dv/dx)/v² has units (dollars/hour)/dollars² = 1/(dollars·hour), matching d(1/v)/dx.
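
The derivation mentioned in the first bullet can be written out as a short worked equation. It is a sketch that assumes v(x) ≠ 0 and that 1/v is differentiable, so the product rule applies to v·(1/v).

```latex
\[
0 = \frac{d}{dx}\left(v \cdot \frac{1}{v}\right)
  = v\,\frac{d}{dx}\left(\frac{1}{v}\right) + \frac{1}{v}\,\frac{dv}{dx}
\quad\Longrightarrow\quad
\frac{d}{dx}\left(\frac{1}{v}\right) = -\frac{1}{v^{2}}\,\frac{dv}{dx}.
\]
```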

🎯 The square rule (special case)

  • When u = v in the product rule, the derivative of u² is 2u·(du/dx).
  • Example: the derivative of sin² x is 2 sin x cos x.
  • Example: the derivative of cos² x is −2 cos x sin x (minus sign from the derivative of cosine).
  • Quick check: since sin² x + cos² x = 1, their derivatives must sum to zero.

➗ Quotient rule

➗ Dividing functions

The derivative of u(x)/v(x) is [v·(du/dx) − u·(dv/dx)]/v².

  • This combines the product rule and reciprocal rule: u/v = u·(1/v).
  • The v² in the denominator is familiar from the reciprocal rule.
  • The numerator is v·(du/dx) − u·(dv/dx) (note the minus sign and order).
  • Example: for x⁵/x³ (which is x²), the quotient rule gives [x³·(5x⁴) − x⁵·(3x²)]/x⁶ = (5x⁷ − 3x⁷)/x⁶ = 2x.

📐 Important quotient: tangent

  • tan x = sin x / cos x.
  • Using the quotient rule: derivative is [cos x·cos x − sin x·(−sin x)]/cos² x = (cos² x + sin² x)/cos² x = 1/cos² x = sec² x.
  • Memorize: the derivative of tan x is sec² x.
  • At x = 0, this slope is 1 (same as sin x and x at the origin).
  • At x = π/2, the tangent curve is vertical (sec² x = ∞).

🔍 Sensitivity of quotients

  • The slope generally "blows up" faster than the function because of the square in the denominator.
  • Example: the slope of 1/x is −1/x², which grows faster than 1/x itself as x → 0.
  • Example: the derivative of sin x / x is (x cos x − sin x)/x²; at x = 0 this is formally 0/0 but the true derivative is zero (the function is symmetric, so its derivative must be zero at the center).

🔋 Power rule

🔋 Powers of x

The derivative of x^n is n·x^(n−1).

  • This works for all exponents: positive, negative, fractional, any real number.
  • For negative powers: use the reciprocal rule on x^n to get d(x^(−n))/dx = −n·x^(−n−1).
  • Example: the derivative of x^(−1) is −1·x^(−2) = −1/x².
  • Example: the derivative of x^(−2) is −2·x^(−3) = −2/x³.
  • Pattern: multiply by the exponent and reduce the exponent by one.

🌱 Fractional powers

  • The same formula works for n = p/q (fractions).
  • Proof approach: write u = x^(p/q), so u^q = x^p; differentiate both sides using the power rule; solve for du/dx.
  • Example: the derivative of x^(1/3) is (1/3)·x^(−2/3); the slope is infinite at x = 0 and zero at x = ∞.
  • Example: the derivative of x^(4/3) is (4/3)·x^(1/3); the slope is zero at x = 0 and infinite at x = ∞.

🎨 Power rule for composite functions

The derivative of [u(x)]^n is n·[u(x)]^(n−1)·(du/dx).

  • This extends the power rule to powers of any function u(x), not just x.
  • The extra factor du/dx is crucial—don't forget it.
  • Example: the derivative of (sin x)^6 is 6·(sin x)^5·cos x.
  • Example: the derivative of (tan x)^7 is 7·(tan x)^6·sec² x.
  • Example: the derivative of (x² + 1)^8 is 8·(x² + 1)^7·(2x).
  • For n = 2 this reduces to the square rule: 2u·(du/dx).
  • Proof by induction: use the product rule to go from u^n to u^(n+1) = u^n·u.

📊 Summary of trigonometric derivatives

| Function | Derivative |
|---|---|
| sin x | cos x |
| cos x | −sin x |
| tan x | sec² x |
| sec x | sec x tan x |
| csc x | −csc x cot x |
| cot x | −csc² x |

🧰 Practical notes

🧰 Dimensional analysis

  • Always check that dimensions (units) agree on both sides of an equation.
  • Example: if v is in dollars and x in hours, dv/dx is in dollars per hour; then −(dv/dx)/v² has units (dollars/hour)/dollars² = 1/(dollars·hour), matching the units of d(1/v)/dx.
  • This test ignores constants and signs but prevents major errors.
  • Example: Einstein's e = mc² is dimensionally consistent (both sides have dimension mass·distance²/time²).

🧰 Three-factor products

  • The derivative of uvw is u·v·(dw/dx) + u·(dv/dx)·w + (du/dx)·v·w (one derivative at a time).
  • Geometric interpretation: when a box with sides u, v, w grows, three slabs are added.
  • Example: the derivative of x·x·x is x·x + x·x + x·x = 3x².

🧰 What to memorize

  • The five rules themselves (linearity, product, reciprocal, quotient, power).
  • The derivatives of sin x, cos x, and tan x.
  • The pattern: derivative of x^n is n·x^(n−1).
  • Don't try to memorize every formula for sec, csc, cot—derive them as needed using the rules.


17

Limits

2.6 Limits

🧭 Overview

🧠 One-sentence thesis

A rigorous definition of limits uses epsilon (ε) and delta (δ) to formalize the intuitive idea that a sequence or function can be made arbitrarily close to a target value, which is essential for defining derivatives and proving calculus rules.

📌 Key points (3–5)

  • What convergence means: A sequence aₙ → L if, for any tolerance ε > 0, all terms eventually stay within ε of L (not just visit once, but stay there).
  • The epsilon-delta definition: For functions, f(x) → L as x → a means: given any ε > 0, there exists δ > 0 such that |f(x) − L| < ε whenever 0 < |x − a| < δ.
  • Common confusion: "Getting closer" is not enough—the sequence 1, ½, 1, ⅓, 1, ¼, ... gets arbitrarily close to zero but does not converge because it doesn't stay close.
  • Necessary vs. sufficient conditions: If aₙ converges, then aₙ₊₁ − aₙ → 0 (necessary), but the converse is false—the harmonic series partial sums show that steps can shrink to zero while the sequence diverges (not sufficient).
  • Why it matters: Limits underpin the definition of derivatives and enable proofs of sum, product, and quotient rules for differentiation.

🎯 What convergence really means

🎯 Four attempted definitions (only one correct)

The excerpt tests four possible definitions of aₙ → 0:

| Attempt | Statement | Why it fails (or succeeds) |
|---|---|---|
| 1 | All aₙ below 10⁻¹⁰ | Too weak: doesn't force approach to zero |
| 2 | Each aₙ₊₁ smaller than aₙ | 1.1, 1.01, 1.001, ... is decreasing but converges to 1, not 0 |
| 3 | At least one aₙ below any small number | 1, ½, 1, ⅓, 1, ¼, ... satisfies this but doesn't converge |
| 4 | Eventually stay below any small number | ✓ Correct: the tail end decides everything |

📏 The formal definition

Convergence to zero: For any ε > 0, there exists N such that |aₙ| < ε for all n > N.

  • ε (epsilon): An arbitrarily small positive tolerance chosen by a challenger (Socrates in the excerpt's metaphor).
  • N: A threshold index—after this point, all terms stay within tolerance.
  • The first million (or billion) terms make no difference; only the tail matters.

Example: The sequence 10⁻³, 10⁻², 10⁻⁶, 10⁻⁵, 10⁻⁹, 10⁻⁸, ... converges to zero even though it oscillates, because it eventually stays below any ε.

Counter-example: 10⁻⁴, 10⁻⁶, 10⁻⁴, 10⁻⁸, 10⁻⁴, 10⁻¹⁰, ... does not converge because it repeatedly returns to 10⁻⁴.

🎯 Convergence to any limit L

General convergence: aₙ → L means |aₙ − L| → 0.

For any ε > 0, there exists N such that |aₙ − L| < ε for all n > N.

  • Visually: Only finitely many terms lie outside any strip of width 2ε centered on L.
  • Negative terms are allowed; distance is measured by absolute value.

Example: The sequence 3/2, 5/4, 7/6, ... converges to L = 1 because the differences ½, ¼, ⅙, ... converge to zero.
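
The ε–N game can be played by machine for this sequence. The sketch below uses exact fractions to avoid rounding issues; the function names and the finite scan limit `n_max` are assumptions made for illustration, and scanning a finite range is evidence rather than proof.

```python
from fractions import Fraction

def a(n):
    """n-th term of 3/2, 5/4, 7/6, ... for n = 1, 2, 3, ..."""
    return Fraction(2 * n + 1, 2 * n)

def threshold(eps, limit=Fraction(1), n_max=10_000):
    """Largest n <= n_max whose term is still outside the strip |a(n) - limit| < eps.

    Every later term (up to n_max) lies inside the strip, so this value
    serves as N. Here the gap 1/(2n) keeps shrinking, so terms beyond
    n_max stay inside as well.
    """
    N = 0
    for n in range(1, n_max + 1):
        if abs(a(n) - limit) >= eps:
            N = n
    return N

for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    print(eps, threshold(eps))
```

A smaller ε forces a larger N, but some N always works, which is exactly what convergence requires.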

🚫 Common pitfalls and necessary vs. sufficient conditions

🚫 The harmonic series divergence

The sequence 1, 1 + ½, 1 + ½ + ⅓, 1 + ½ + ⅓ + ¼, ... (partial sums of the harmonic series) does not converge.

  • The fourth term exceeds 2 (since ⅓ + ¼ > ½).
  • The eighth term exceeds 2½ (since 1/5 + 1/6 + 1/7 + 1/8 > 4 · 1/8 = ½).
  • The sequence grows past any proposed limit L.

Key insight: The steps aₙ₊₁ − aₙ = 1/(n+1) → 0, yet the sequence diverges to infinity.
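
A short experiment shows both halves of this insight at once: the partial sums pass any target, while the step that carries them past it keeps shrinking. The function name and the chosen targets are illustrative assumptions.

```python
def first_index_exceeding(target):
    """Smallest n with 1 + 1/2 + ... + 1/n > target, and the last step 1/n."""
    total, n = 0.0, 0
    while total <= target:
        n += 1
        total += 1.0 / n
    return n, 1.0 / n

for target in (2.0, 2.5, 5.0):
    n, step = first_index_exceeding(target)
    print(f"sum first exceeds {target} at n = {n}, last step = {step:.4f}")
```

Larger targets need many more terms (growth is roughly logarithmic), but the loop always terminates.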

⚖️ Necessary vs. sufficient

The excerpt proves:

Statement: If [aₙ converges to L] then [aₙ₊₁ − aₙ converges to zero].

  • Proof sketch: If |aₙ − L| < ε and |aₙ₊₁ − L| < ε, then |aₙ₊₁ − aₙ| ≤ |aₙ₊₁ − L| + |L − aₙ| < 2ε.
  • Logical direction: Convergence is a sufficient condition for shrinking steps; shrinking steps are necessary for convergence.
  • The converse is false: Shrinking steps do not guarantee convergence (harmonic series counter-example).

🔄 Five ways to express "A implies B"

| Expression | Meaning |
|---|---|
| A ⇒ B | A implies B |
| If A then B | Conditional statement |
| A is sufficient for B | A being true is enough to guarantee B |
| B is necessary for A | B must be true whenever A is true |
| B is true if A is true | Another conditional phrasing |

Don't confuse: "A ⇒ B" with "B ⇒ A" (the converse). Both require separate proofs. When both hold, write "A ⇔ B" (if and only if).

🔧 Rules for combining limits

🔧 Limit arithmetic for sequences

When aₙ → L and bₙ → M:

  • Addition: aₙ + bₙ → L + M
  • Subtraction: aₙ − bₙ → L − M
  • Multiplication: aₙbₙ → LM
  • Division: aₙ/bₙ → L/M (provided M ≠ 0)
  • Scalar multiplication: caₙ → cL (constants pass through limits)

🔍 Proof sketch for multiplication rule

The key identity is:

aₙbₙ − LM = (aₙ − L)(bₙ − M) + M(aₙ − L) + L(bₙ − M)

If |aₙ − L| < ε and |bₙ − M| < ε beyond some point, the right side is less than ε² + |M|ε + |L|ε in absolute value, which can be made arbitrarily small.

🔧 Corresponding rules for functions

When f(x) → L and g(x) → M as x → a, the same arithmetic rules apply:

  • f(x) + g(x) → L + M
  • f(x)g(x) → LM
  • etc.

📐 Limits of functions: the epsilon-delta definition

📐 From sequences to functions

Instead of n → ∞ for sequences, we consider x → a for functions.

Function limit: lim_{x→a} f(x) = L means: For any ε > 0, there exists δ > 0 such that |f(x) − L| < ε whenever 0 < |x − a| < δ.

  • ε (epsilon): Output tolerance—how close f(x) must be to L.
  • δ (delta): Input tolerance—how close x must be to a.
  • The game: Challenger picks ε; responder must find a δ that works.
  • Crucial detail: We require 0 < |x − a|, meaning x ≠ a; the value f(a) is irrelevant to the limit.

📦 The box visualization

  • Socrates chooses the height of a box (2ε) centered on L.
  • Plato must choose the width (2δ) narrow enough that the graph exits through the sides, not the top or bottom.
  • If this is possible for every ε, then the limit exists.

Example: Prove lim_{x→2} 5x = 10.

  • Need |5x − 10| < ε, i.e., 5|x − 2| < ε.
  • Choose δ = ε/5; then |x − 2| < δ implies |5x − 10| < ε.
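
The choice δ = ε/5 can be tested by sampling points inside the δ-window. This is a sketch only: random sampling supports the claim but does not replace the two-line proof above, and the function name, trial count, and seed are arbitrary assumptions.

```python
import random

def check_delta(eps, trials=10_000, seed=0):
    """Sample x with 0 < |x - 2| < delta = eps/5 and confirm |5x - 10| < eps."""
    rng = random.Random(seed)
    delta = eps / 5.0
    for _ in range(trials):
        # The factor 0.999 keeps x strictly inside the window.
        x = 2.0 + 0.999 * rng.uniform(-delta, delta)
        if x != 2.0 and not abs(5.0 * x - 10.0) < eps:
            return False
    return True

print(all(check_delta(eps) for eps in (1.0, 0.1, 1e-3, 1e-6)))
```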

🔀 One-sided limits

One-sided limit: lim_{x→a⁺} f(x) = L means x approaches a only from above (x > a).

Example: lim_{x→1⁺} √(x−1) = 0.

  • For ε = 1/10, choose δ = 1/100.
  • If 0 < x − 1 < 1/100, then √(x−1) < 1/10.
  • The box must be extremely narrow because the slope is infinite at x = 1.

Don't confuse: Ordinary limits require x to approach from both sides; one-sided limits restrict the direction.

🗜️ The Squeeze Theorem

🗜️ Statement and proof

Squeeze Theorem: If f(x) ≤ g(x) ≤ h(x) for x near a, and both f(x) → L and h(x) → L as x → a, then g(x) → L as well.

Proof idea:

  • After subtracting L, we have f(x) − L ≤ g(x) − L ≤ h(x) − L.
  • If |f(x) − L| < ε and |h(x) − L| < ε, then |g(x) − L| < ε.
  • For any ε, both outer inequalities hold in some region 0 < |x − a| < δ, so the middle one does too.

Why it matters: The theorem allows us to find limits of complicated functions by bounding them between simpler ones.


Note on practical use: The excerpt emphasizes that mathematicians rarely use the ε-δ definition directly in practice. Instead, they establish once that familiar functions (polynomials, sin x, etc.) are continuous, meaning lim_{x→a} f(x) = f(a), and then apply limit rules. The rigorous definition exists to make proofs possible, not to be invoked constantly.

18

Continuous Functions

2.7 Continuous Functions

🧭 Overview

🧠 One-sentence thesis

A function is continuous at a point when its limit equals its actual value there, and this property—stronger than merely having a limit but weaker than having a derivative—ensures that the function has no jumps, poles, or wild oscillations at that point.

📌 Key points (3–5)

  • Three requirements for continuity: f(a) must exist, the limit of f(x) as x approaches a must exist, and that limit must equal f(a).
  • Types of discontinuity: removable (wrong value assigned), jump (one-sided limits differ), pole (function blows up to infinity), and oscillation (no limit exists due to rapid fluctuation).
  • Common confusion—continuity vs differentiability: having a derivative is a stronger requirement than continuity; a function can be continuous at a point but have no derivative there (e.g., absolute value at zero).
  • Continuable functions: some discontinuous functions can be "fixed" by defining the right value at a problem point (like sin(x)/x at x=0), while others cannot (like 1/x at x=0).
  • Two key properties on closed intervals: continuous functions on [a,b] always reach a maximum and minimum, and take on every intermediate value between them.

🔍 What continuity means

🔍 The three conditions

A function f is "continuous at x = a" if f(a) is defined and f(x) approaches f(a) as x approaches a.

The excerpt breaks this into three parts:

  1. The number f(a) exists (f is defined at a)
  2. The limit of f(x) exists (call it L)
  3. The limit L equals f(a) (f(a) is the right value)
  • Often written compactly as: f(x) → f(a) as x → a
  • Intuitive test: "you can draw its graph without lifting up your pen"
  • Example: All polynomials, sin(x), cos(x), and |x| are continuous functions

🎯 Continuous function vs continuous at a point

The excerpt notes a logical subtlety:

  • A function like 1/x is technically called a "continuous function" because it's continuous at every point where it's defined
  • But it has a discontinuity at x=0 (where it's not defined)
  • This creates confusion: we speak of "a discontinuity of 1/x" while calling it a "continuous function"

🚫 Types of discontinuity

🚫 Removable discontinuity

  • The limit exists but f(a) has the wrong value (or is undefined)
  • Example: A function with limit 0 as x→0 but f(0)=1
  • Fix: Simply redefine f(0) to match the limit
  • After correction, the discontinuity disappears

📊 Jump discontinuity

  • One-sided limits exist but differ
  • Example: Step function jumping from 0 to 1 at x=0
    • Limit from left (x→0⁻) is 0
    • Limit from right (x→0⁺) is 1
  • Another example: x/|x| jumps from -1 to +1
  • Cannot be fixed: no single value makes the function continuous

♾️ Pole (infinite limit)

A function has a "pole" at x=a when the denominator goes to zero and the function goes to +∞ or -∞.

  • Example: 1/x² has a pole at x=0 (approaches +∞ from both sides)
  • Example: 1/x has one-sided infinite limits (−∞ from left, +∞ from right)
  • Simple pole: multiplying by x makes the function continuous at that point
    • Example: x·(1/x) = 1 is continuous at x=0
  • Double pole: needs multiplication by x² (not just x) to become continuous
    • Example: 1/x² needs x²·(1/x²) = 1
  • Rational functions P(x)/Q(x) have poles where Q=0 (after removing common factors)

🌀 Infinite oscillation

  • The function oscillates faster and faster, never settling down
  • Example: sin(1/x) as x→0
    • At x=1/3, 1/4, 1/1000 it equals sin(3), sin(4), sin(1000)
    • Oscillates infinitely fast near zero
    • The sine never exceeds 1, but won't stay in any small box of height ε
  • Cannot be fixed: no limit exists

🔧 Continuable functions

🔧 What "continuable" means

The excerpt proposes this distinction:

  • A function is continuable if its definition can be extended to all x in a way that makes it continuous
  • This clarifies the confusion about "continuous functions"
| Function | Continuable? | Why |
|---|---|---|
| sin(x)/x | Yes | Define f(0)=1 (the limit) |
| √x | Yes | Already continuous where defined |
| 1/x | No | No value at x=0 makes it continuous |
| tan(x) | No | Poles cannot be removed |

🔧 Important examples

Example: sin(x)/x at x=0

  • Limit as x→0 is 1
  • Define f(0)=1 to make it continuous
  • This was crucial for finding the derivative of sin(x)

Example: (1−cos(x))/x at x=0

  • Limit as x→0 is 0
  • Define f(0)=0 to make it continuous

Example: (1−cos(x))/x² at x=0

  • Limit as x→0 is 1/2
  • The excerpt shows the calculation: multiply top and bottom by 1+cos(x) and use the identity 1−cos²(x) = sin²(x)
  • Result: (sin(x)/x)² · 1/(1+cos(x)), which approaches 1 · 1/2 = 1/2

Don't confuse: sin(x)/x² blows up (simple pole), but (1−cos(x))/x² has a finite limit

🔗 Continuity vs differentiability

🔗 The key distinction

Two different requirements:

| Property | Requirement | Strength |
|---|---|---|
| Continuous at x | f(x+Δx)−f(x) → 0 as Δx→0 | Weaker |
| Derivative at x | [f(x+Δx)−f(x)]/Δx → f'(x) as Δx→0 | Stronger |

Key insight from the excerpt:

Asking for a derivative is more than asking for continuity.

  • For continuity: Δf goes to zero (maybe slowly)
  • For derivative: Δf goes to zero as fast as Δx (because Δf/Δx has a limit)

🔗 The fundamental relationship

Rule 2I: At a point where f(x) has a derivative, the function must be continuous. But f(x) can be continuous with no derivative.

Proof sketch: The limit of Δf = (Δx)·(Δf/Δx) is (0)·(df/dx) = 0, so f(x+Δx)−f(x) → 0.

🔗 Examples of continuous but not differentiable

  • |x| at x=0: continuous but slope jumps (no derivative)
  • x^(1/3) at x=0: continuous but derivative (1/3)x^(−2/3) blows up
  • The excerpt mentions a remarkable function (1/2)cos(3x) + (1/4)cos(9x) + ... that is continuous everywhere but has a derivative at no points

🔗 The derivative controls speed

The excerpt emphasizes:

The derivative controls the speed at which f(x) approaches the limit.

Because f(x)−f(0) is nearly f'(0) times x:

  • The derivative of sin(x) at x = 0 is 1 → sin(x) decreases like x as x → 0
  • The derivative of sin²(x) at x = 0 is 0 → sin²(x) decreases faster than x
  • The derivative of x^(1/3) at x = 0 is ∞ → x^(1/3) decreases more slowly than x

📐 Properties on closed intervals

📐 What's required

The excerpt specifies: continuous function on a finite closed interval [a,b]

  • Closed means endpoints a and b are included
  • At endpoints we require f(x) to approach f(a) and f(b)
  • Contrast: open interval (a,b) leaves out the endpoints

📐 Extreme Value Property

A continuous function on the finite interval [a,b] has a maximum value M and a minimum value m.

  • There are points x_max and x_min in [a,b] where it reaches those values
  • f(x_max) = M ≥ f(x) ≥ f(x_min) = m for all x in [a,b]

Why the requirements matter:

  • For 0 < x ≤ 1, the function f(x)=x never reaches its minimum (zero)
  • If we close the interval but define f(0)=3 (discontinuous), the minimum is still not reached

📐 Intermediate Value Property

If the number F is between f(a) and f(b), there is a point c between a and b where f(c) = F.

  • More generally: if F is between the minimum m and maximum M, there's a point c between x_min and x_max where f(c) = F
  • Example application: cos(x) and 2x are continuous, and cos(x) − 2x changes sign from 1 at x = 0 to cos(1) − 2 < 0 at x = 1, so cos(x) = 2x at some point between 0 and 1
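
The Intermediate Value Property only says that a crossing point exists; bisection is one standard way to locate it. The sketch below applies it to g(x) = cos(x) − 2x on [0, 1]. Bisection is our choice of method, not something taken from the excerpt, and the tolerance is arbitrary.

```python
import math

def bisect(g, lo, hi, tol=1e-10):
    """Locate a zero of g in [lo, hi], assuming g is continuous and changes sign."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g must have opposite signs at the endpoints")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g_lo * g(mid) <= 0:
            hi = mid                  # sign change is in the left half
        else:
            lo, g_lo = mid, g(mid)    # sign change is in the right half
    return 0.5 * (lo + hi)

c = bisect(lambda x: math.cos(x) - 2.0 * x, 0.0, 1.0)
print(c, math.cos(c), 2.0 * c)
```

Each step halves the interval while keeping a sign change inside it, which is the Intermediate Value Property used repeatedly.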

Why continuity matters: A jump discontinuity can skip over intermediate values—the excerpt notes that if f(0)=3 (with a jump), the intermediate value F=2 is not reached.

19

Linear Approximation

3.1 Linear Approximation

🧭 Overview

🧠 One-sentence thesis

Linear approximation uses the tangent line at a known point to estimate function values nearby, turning complicated functions into simple straight-line predictions that are accurate for small changes.

📌 Key points (3–5)

  • Core idea: Near any point a, a smooth function behaves almost like its tangent line: f(x) ≈ f(a) + f′(a)(x − a).
  • Why it works: Over very short intervals, the function is "nearly constant" in slope, so the tangent line stays close to the curve.
  • Error behavior: The approximation error is of order (Δx)² — linear approximation, quadratic error.
  • Common confusion: Δy (true change along the curve) vs. dy (change along the tangent line); dy is the approximation to Δy.
  • Practical use: Science and engineering linearize nonlinear laws (Ohm's, Hooke's, Newton's) using the derivative as a local multiplier.

📐 The tangent line formula

📐 Equation of the tangent line

At point x = a, the tangent line to y = f(x) is:

Y = f(a) + f′(a)(x − a)

  • Capital Y denotes the line; lowercase y denotes the curve.
  • Start at the known value f(a), then add (slope) × (horizontal distance from a).
  • The slope of the curve and the slope of the line are both f′(a) at x = a.

📐 The approximation statement

The key claim is:

y ≈ Y, or equivalently f(x) ≈ f(a) + f′(a)(x − a)

  • This is the "all-purpose linear approximation."
  • It is accurate provided we don't move too far from a.
  • Example: For y = √x near x = 100, the slope is 1/(2√x) = 1/20, so √102 ≈ 10 + (1/20)(2) = 10.1 (true value ≈ 10.0995).

📐 Alternative notation: starting from x

Instead of going from a to x, we can start at x and move by Δx:

f(x + Δx) ≈ f(x) + (slope at x) · Δx

  • The letters are different but the mathematics is identical.
  • This form emphasizes the change Δx from a basepoint x.

🔢 Important examples

🔢 Power function approximation

For any exponent n:

(1 + x)ⁿ ≈ 1 + n x for x near zero

  • This comes from f(x) = (1 + x)ⁿ with basepoint 0; the slope at 0 is n.
  • Example: (1.01)³ ≈ 1 + 3(0.01) = 1.03.
  • Changing n to −n gives 1/(1 + x)ⁿ ≈ 1 − n x.

🔢 When approximation fails

The binomial expansion shows hidden terms:

(1 + x)¹⁰⁰ = 1 + 100x + (100)(99)/2 · (x)² + …

  • For x = 0.01, the (x)² term ≈ 1/2 is too large to ignore.
  • Linear approximation gives 2, but the true value (1.01)¹⁰⁰ ≈ 2.7 (close to e).
  • Don't confuse: linear approximation is only valid when the quadratic error (1/2)(x)² f″(c) is negligible.
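
The failure for n = 100 is easy to reproduce. This sketch compares the linear estimate, the estimate with the x² term restored, and the exact power; the variable names are ours.

```python
x, n = 0.01, 100

linear = 1 + n * x                             # 1 + nx
quadratic = linear + n * (n - 1) / 2 * x ** 2  # restores the hidden x^2 term
exact = (1 + x) ** n

print(f"linear={linear:.4f}  quadratic={quadratic:.4f}  exact={exact:.4f}")
```

The x² term alone closes most of the gap, which is why it cannot be ignored here.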

🔢 Square root example in detail

For y = √x near x = 100:

| x | Y (line) | y (curve) | Error |
|---|---|---|---|
| 100 | 10 | 10 | 0 |
| 102 | 10.1 | 10.0995 | small |
| 110 | 10.5 | 10.49 | 0.01 |
| 200 | 15 | 14.1 | 0.9 |
  • Accuracy worsens as x departs from 100.
  • The tangent line is above the curve because the slope is decreasing ("concave downward").
  • At x = 102, squaring the approximation: (√100 + (1/20)·2)² = 100 + 2 + (1/400)·4; the last term (Δx)²/400 is the error.
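
The table values can be regenerated in a few lines. This is a minimal sketch for f(x) = √x with basepoint a = 100; the function name `tangent_line` is an assumption made for illustration.

```python
import math

def tangent_line(x, a=100.0):
    """Y = f(a) + f'(a)(x - a) for f(x) = sqrt(x)."""
    return math.sqrt(a) + (x - a) / (2.0 * math.sqrt(a))

rows = []
for x in (100.0, 102.0, 110.0, 200.0):
    Y, y = tangent_line(x), math.sqrt(x)
    rows.append((x, Y, y, Y - y))
    print(f"x={x:5.0f}  line={Y:7.4f}  curve={y:7.4f}  error={Y - y:.4f}")
```

The error Y − y is never negative (the line sits above the concave curve) and grows as x moves away from 100.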

🧮 Differentials notation

🧮 What differentials are

The symbols dx and dy are called differentials:

dx and dy measure changes along the tangent line.

  • Until this point, dy/dx was a single symbol (the derivative), not a fraction.
  • Now dx and dy become separate variables with independent meaning.
  • dx measures horizontal change (same as Δx); dy measures the corresponding vertical change on the tangent line.

🧮 Differential vs. true change

| Symbol | Meaning | Where measured |
|---|---|---|
| Δy | True change in y | Along the curve |
| dy | Change in Y (approximation) | Along the tangent line |
| Δx | Horizontal change | Same for both |
| dx | Horizontal change (differential) | Same as Δx |
  • The differential dy equals ΔY, the change along the tangent line.
  • dy is the linear approximation to Δy.

🧮 Formula for differentials

dy = f′(x) dx

  • This is consistent with the derivative dy/dx = f′(x), but it is a definition, not cancellation.
  • Example: y = x² has dy/dx = 2x, so dy = 2x dx.
  • At basepoint x = 2 with dx = 0.1: dy = 4(0.1) = 0.4; true Δy = (2.1)² − 4 = 0.41; error = (Δx)² = 0.01.

🧮 Rules for differentials

  • d(xⁿ) = n xⁿ⁻¹ dx
  • d(sin x) = cos x dx
  • d(tan x) = sec² x dx
  • d(f + g) = df + dg
  • d(cf) = c df
  • d(fg) = f dg + g df

🌍 Applications and measurements

🌍 Why linearization matters

Science, engineering, and virtually all applications depend on linear approximation. The true function is "linearized" using its slope v.

  • Increasing time by Δt increases distance by ≈ v Δt.
  • Increasing force by Δf increases deflection by ≈ v Δf.
  • Increasing production by Δp increases value by ≈ v Δp.
  • The multiplier v is the derivative; it gives a local prediction of change.
  • Ohm's law, Hooke's law, Newton's law are linear approximations to nonlinear reality.

🌍 Three ways to measure change

| Type | For f | For x |
|---|---|---|
| Absolute change | Δf | Δx |
| Relative change | Δf / f(x) | Δx / x |
| Percentage change | (Δf / f(x)) × 100 | (Δx / x) × 100 |
  • Relative change is often more realistic than absolute change.
  • Example: Knowing the moon's distance within 3 miles (0.001%) is more impressive than knowing your height within 1 inch (1.4%).

🌍 Volume of Earth example

The radius of Earth is within 80 miles of r = 4000 miles. Volume V = (4/3)πr³.

  • Derivative: dV/dr = 4πr².
  • Variation in volume: dV = 4π(4000)²(80) cubic miles.
  • Relative variations:
    • dr/r = 80/4000 = 2%
    • dV/V = [4π(4000)²(80)] / [(4/3)π(4000)³] = 3(80)/4000 = 6%
  • A 2% relative change in radius produces a 6% relative change in volume.
  • Exact calculation gives ≈6.1%; the error is small.
  • Interpretation: dV = (surface area) × (change in radius) = volume of a thin shell added when radius grows by dr.
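
The 6% versus 6.1% comparison can be checked directly. The sketch uses the numbers from the example above; the variable names are ours.

```python
import math

r, dr = 4000.0, 80.0  # miles, from the example above

V = (4.0 / 3.0) * math.pi * r ** 3
dV = 4.0 * math.pi * r ** 2 * dr               # differential: surface area times dr

relative_linear = dV / V                        # equals 3 * dr / r
relative_exact = ((r + dr) ** 3 - r ** 3) / r ** 3

print(f"linear: {relative_linear:.4%}   exact: {relative_exact:.4%}")
```

The gap between the two percentages is the quadratic (and cubic) error in dr/r.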

🌍 Error is quadratic

The exact error in linear approximation is:

Error ≈ (1/2)(Δx)² f″(c), where c is between x and x + Δx

  • Linear approximation, quadratic error.
  • Example: For y = x² at x = 2, the error is exactly (Δx)².
  • When (Δx)² is not negligible, linear approximation breaks down.

🔄 Reversing the logic

🔄 From derivative back to function

The excerpt emphasizes a conceptual shift:

In Chapter 2 we struggled with y to squeeze out dy/dx. Now we use dy/dx to study y.

  • Earlier: didn't know dy/dx, so used (slope) ≈ Δy/Δx.
  • Now: experts at computing dy/dx, so use Δy ≈ (slope) · Δx.
  • "You work with what you have."
  • After computing the derivative once, the tangent line stays near the function for every nearby number.

🔄 The local-global connection

The information in dy/dx is entirely local. It tells what is happening close to the point and nowhere else.

  • The problem is to connect the finite (Δy) to the infinitesimal (dy).
  • The Mean Value Theorem (coming later) assures points where dy/dx equals Δy/Δx, but we cannot predict where.
  • Therefore we find other ways to recover a function from its derivatives, or to estimate distance from velocity and acceleration.

🔄 The three stages of f = vt

| Statement | Validity |
|---|---|
| f = vt is completely false | When v changes |
| f = vt is nearly true | Over very short intervals |
| df = v dt is exactly true | For the tangent line (differential) |
  • For a brief moment the function is linear and stays near its tangent line.
  • This is the foundation of calculus: local linearity.
20

Maximum and Minimum Problems

3.2 Maximum and Minimum Problems

🧭 Overview

🧠 One-sentence thesis

Calculus identifies maximum and minimum values by finding where the derivative equals zero (stationary points), then comparing those values with endpoints and rough points to determine the absolute extrema.

📌 Key points (3–5)

  • What the derivative tells us: positive slope means the function is increasing; negative slope means decreasing; zero slope indicates a potential maximum or minimum.
  • How to find extrema: solve f'(x) = 0 for stationary points, then evaluate f(x) at all critical points (stationary points, rough points, endpoints) and compare.
  • Three types of critical points: stationary points where f'(x) = 0, rough points where the derivative doesn't exist, and endpoints of the domain.
  • Common confusion: a stationary point (f'(x) = 0) is not always a maximum or minimum—the function can pause and continue in the same direction if there's a double zero.
  • Applied problems workflow: first choose the variable x and construct the function f(x) to optimize, then use calculus to solve f'(x) = 0, finally check all critical points.

📈 Reading the derivative

📈 Positive and negative slopes

  • Positive derivative (df/dx > 0): the function f(x) is increasing between points a and b.
    • All tangent lines slope upward.
    • For any two points x < X in the interval: f(x) < f(X).
  • Negative derivative (df/dx < 0): the function f(x) is decreasing.
    • For any two points x < X: f(x) > f(X).
  • Don't confuse: positive slope does not mean positive function—the function itself can be negative while still increasing.

📊 Example patterns

The excerpt gives f'(x) = (x - 1)(x - 2)(x - 3)(x - 4):

  • This slope is positive beyond x = 4 and up to x = 1.
  • Also positive between x = 2 and x = 3.
  • At x = 1, 2, 3, 4 the slope is zero and f(x) changes direction.
  • The graph goes up-down-up-down-up.

🎯 Finding maxima and minima

🎯 The zero-derivative condition

Local maximum or minimum: If the maximum or minimum occurs at a point x inside an interval where f(x) and df/dx are defined, then f'(x) = 0.

Why the derivative must be zero:

  • At a maximum, f(x + Δx) - f(x) ≤ 0 for any step Δx.
  • If Δx < 0: the difference divided by Δx gives a ratio ≥ 0, so df/dx ≥ 0 in the limit.
  • If Δx > 0: the ratio ≤ 0, so df/dx ≤ 0 in the limit.
  • Both conditions together force df/dx = 0.
  • The tangent line is level (horizontal).

⚠️ Exceptional cases

A function can have f'(x) = 0 without a maximum or minimum:

  • Example: f(x) = 4x³ - 3x⁴ has f'(x) = 12x² - 12x³ = 0 at x = 0 and x = 1.
  • At x = 0: f(0) = 0 is neither a local maximum nor minimum—the curve pauses but continues in the same direction.
  • Reason: the "double zero" from the factor x² in 12x²(1 - x).
  • At x = 1: f(1) = 1 is an absolute maximum (the function goes to negative infinity in both directions).
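
One way to tell a true extremum from a pause is to look at the sign of f′ on either side of the stationary point. The sketch below does this for f(x) = 4x³ − 3x⁴; the helper `classify` and its step h are illustrative assumptions, and the test is only reliable when h is small enough that no other stationary point lies within h.

```python
def f(x):
    return 4 * x ** 3 - 3 * x ** 4

def fprime(x):
    return 12 * x ** 2 - 12 * x ** 3

def classify(x0, h=1e-3):
    """Label a stationary point by the sign of f' just left and right of it."""
    left, right = fprime(x0 - h), fprime(x0 + h)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    return "neither (f' keeps its sign)"

for x0 in (0.0, 1.0):
    print(x0, f(x0), classify(x0))
```

At x = 0 the slope is positive on both sides (the double zero), while at x = 1 it switches from positive to negative.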

🔍 Three types of critical points

All maxima and minima occur at critical points:

| Type | Description | Example |
|---|---|---|
| Stationary point | f'(x) = 0 | Top or bottom of a smooth curve |
| Rough point | Derivative doesn't exist | Corner in the graph, like y = \|x\| at x = 0 |
| Endpoint | Boundary of the domain | Maximum of \|x\| on [-3, 2] occurs at x = -3 |

Don't confuse: critical points (specified by x-values) vs. critical values (specified by f(x) values).

🔧 The optimization procedure

🔧 Three-step method

  1. Solve f'(x) = 0 to find stationary points.
  2. Compute f(x) at every critical point—stationary points, rough points, endpoints.
  3. Compare those values to identify the maximum and minimum.

📐 When infinity is involved

  • A function can approach but never reach a value: f(x) = 1/(1 + x²) approaches but never reaches zero as x → ∞.
  • We still say "the minimum is zero" even though it's not attained.
  • Exception: if f is continuous on a closed interval [a, b], then f(x) reaches its maximum and minimum (Extreme Value Theorem).

🌍 Applied problems

🌍 The real challenge: formulation

In applications, the hardest step is often choosing the variable and constructing the function:

  • The problem starts with a question, not a function.
  • You must decide what x represents and express the quantity to optimize as f(x).
  • Only then does calculus enter: compute f'(x) and solve f'(x) = 0.

🚗 Expressway example

Problem: Where to enter an expressway to minimize driving time? Expressway speed 60 mph, ordinary speed 30 mph.

Setup:

  • Distance to expressway: √(a² + x²) at 30 mph takes √(a² + x²)/30 hours.
  • Distance on expressway: (b - x) at 60 mph takes (b - x)/60 hours.
  • Total time: f(x) = (1/30)√(a² + x²) + (1/60)(b - x).

Solution:

  • Derivative: f'(x) = (1/30) · (1/2)(a² + x²)^(-1/2) · 2x - 1/60.
  • Set f'(x) = 0 and solve: 2x = √(a² + x²), so 4x² = a² + x², giving x = a/√3.
  • Surprising result: the optimal entrance point x = a/√3 doesn't depend on b.
  • Must also check endpoints x = 0 and x = b.
  • The optimal x is the smaller of a/√3 and b.
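A quick numerical check of this result, as a sketch only: the values a = 3 and b = 10 are illustrative assumptions, and the function names are hypothetical. The grid search is a crude stand-in for solving f'(x) = 0.

```python
import math


def driving_time(x, a, b):
    """Hours of driving: sqrt(a^2 + x^2) miles at 30 mph, then (b - x) miles at 60 mph."""
    return math.sqrt(a**2 + x**2) / 30 + (b - x) / 60


def best_entrance(a, b):
    """Stationary point a/sqrt(3), capped at the endpoint b."""
    return min(a / math.sqrt(3), b)


a, b = 3.0, 10.0  # illustrative values, not from the text
x_star = best_entrance(a, b)

# Brute-force comparison over a fine grid of entrance points in [0, b].
grid = [b * i / 10000 for i in range(10001)]
x_grid = min(grid, key=lambda x: driving_time(x, a, b))
print(x_star, x_grid)
```

Changing b (while keeping it larger than a/√3) leaves `x_star` unchanged, which is the independence from b noted above.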

Alternative formulation using angle:

  • Let x be an angle instead of distance.
  • Solving df/dx = 0 gives sin x = 1/2, so the optimal angle is 30°.

⚖️ Physics: minimum energy

Nature often chooses minimum energy:

  • Spring pulled by mass: energy f(x) = (1/2)kx² - mx.
  • Spring energy (1/2)kx² is positive in stretching or compression.
  • Potential energy -mx decreases as mass goes down.
  • Equilibrium at df/dx = kx - m = 0, giving x = m/k.
  • Key insight: when f(x) is quadratic, the equilibrium equation f'(x) = 0 is linear.

💰 Economics: marginal cost and profit

Marginal cost: the derivative dC/dx, representing the cost of each additional unit.

  • If cost C = 1000 + 3x dollars for x books, then dC/dx = 3 (marginal cost is $3 per book).
  • Profit P(x) = income I(x) - cost C(x).
  • Profit is maximized when marginal income equals marginal cost: dI/dx = dC/dx.

Advertisement example:

  • Cost: C(x) = 900 + 400x - x² (setup cost, print cost, volume savings).
  • Income: I(x) = 600x - 6x² (sales per ad, minus diminishing returns).
  • Optimal decision: dC/dx = dI/dx gives 400 - 2x = 600 - 12x, so x = 20.
  • Profit = 9600 - 8500 = 1100.
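The arithmetic of the advertisement example can be confirmed with a short script. This is only a check under the stated cost and income formulas; searching whole numbers of ads from 0 to 100 is an assumption made for illustration.

```python
def cost(x):
    return 900 + 400 * x - x**2


def income(x):
    return 600 * x - 6 * x**2


def profit(x):
    return income(x) - cost(x)


# Try every whole number of ads from 0 to 100 and keep the most profitable.
best_x = max(range(0, 101), key=profit)
print(best_x, income(best_x), cost(best_x), profit(best_x))  # 20 9600 8500 1100
```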
21

Second Derivatives: Bending and Acceleration

3.3 Second Derivatives: Bending and Acceleration

🧭 Overview

🧠 One-sentence thesis

The second derivative reveals how a curve bends—whether it curves upward or downward—and this bending information determines whether stationary points are maxima or minima, identifies inflection points where bending direction changes, and enables quadratic approximations that track curves far more accurately than tangent lines.

📌 Key points (3–5)

  • What the second derivative measures: rate of change of slope (bending direction), not the slope itself or the function value.
  • Concave up vs. concave down: positive second derivative means the curve bends upward (slope increasing); negative means bending downward (slope decreasing).
  • How to classify stationary points: at a point where the first derivative is zero, a positive second derivative indicates a local minimum, while a negative second derivative indicates a local maximum.
  • Common confusion: inflection points vs. stationary points—inflection points occur where the second derivative equals zero and changes sign (bending direction reverses), not where the first derivative is zero.
  • Quadratic approximation advantage: including the second derivative term allows approximations to bend with the curve, vastly outperforming straight tangent lines.

📐 Understanding the second derivative

📊 What bending means

When the second derivative f''(x) is positive, the function is concave up (also called convex); when f''(x) is negative, the function is concave down.

  • A straight line has zero second derivative because its slope never changes.
  • The parabola y = x² has f' = 2x and f'' = 2 (constant positive bending upward).
  • The sine function has f'' = -sin(x), so it alternates between bending down and bending up.

Key insight: The second derivative tells you nothing directly about whether the function is positive or negative, or even whether it's increasing or decreasing—it only reveals how the slope is changing.

👁️ Visual test for second derivative sign

  • Concave up (f'' > 0): tangent lines stay below the curve; linear approximation underestimates.
  • Concave down (f'' < 0): tangent lines stay above the curve; linear approximation overestimates.

Example: For f(x) = x², the tangent at x = 0 is the x-axis, which lies entirely below the upward-bending parabola.

⚙️ Physical interpretation

  • In motion problems, the second derivative f''(t) represents acceleration (units: distance/time²).
  • Acceleration is the rate of change of velocity.
  • Example: sin(2t) has velocity v = 2cos(2t) (max speed 2) and acceleration a = -4sin(2t) (max acceleration 4).

🎯 Using the second derivative to classify extrema

🔍 The second derivative test

When f'(x) = 0 and f''(x) > 0, there is a local minimum at x.
When f'(x) = 0 and f''(x) < 0, there is a local maximum at x.

Why this works:

  • At a minimum, the slope changes from negative (falling) to positive (rising), so the slope is increasing → f'' > 0.
  • At a maximum, the slope changes from positive to negative (dropping), so the slope is decreasing → f'' < 0.

📝 Worked example

For f(x) = x³ - x²:

  • First derivative: f'(x) = 3x² - 2x
  • Stationary points: 3x² - 2x = 0 gives x = 0 and x = 2/3
  • Second derivative: f''(x) = 6x - 2
  • At x = 0: f''(0) = -2 < 0 → local maximum
  • At x = 2/3: f''(2/3) = 2 > 0 → local minimum
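The worked example can be replayed in a few lines. This is a sketch of the second derivative test for f(x) = x³ - x² only; the function names are hypothetical, and the stationary points are entered by hand rather than solved for.

```python
def f_second(x):
    """Second derivative of f(x) = x^3 - x^2."""
    return 6 * x - 2


def classify(second_derivative_value):
    """Apply the second derivative test at a point where f'(x) = 0."""
    if second_derivative_value > 0:
        return "local minimum"
    if second_derivative_value < 0:
        return "local maximum"
    return "inconclusive"


stationary_points = [0.0, 2.0 / 3.0]  # roots of f'(x) = 3x^2 - 2x
for x in stationary_points:
    print(x, classify(f_second(x)))

# When the second derivative is also zero, the test gives no verdict.
print(classify(0.0))
```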

⚠️ Edge case

When both f'(x) = 0 and f''(x) = 0, the test is inconclusive: anything can happen.

  • Example: x³ pauses at x = 0 and continues upward (not a maximum or minimum).
  • Example: x⁴ pauses at x = 0 with a very flat graph (actually a minimum).

🔄 Inflection points

🌊 Where bending direction changes

An inflection point occurs where f''(x) = 0 and f'' changes sign.

  • The curve switches from concave down to concave up (or vice versa).
  • At an inflection point, the tangent line crosses the curve (it's below on one side, above on the other).
  • For an instant at the inflection point, the graph is locally straight (f'' = 0).

📐 Polynomial patterns

| Polynomial degree | Inflection points |
| --- | --- |
| Parabola (degree 2) | None (f'' is constant) |
| Cubic (degree 3) | Exactly one (f'' is linear) |
| Quartic (degree 4) | Might have 0 or 2 (f'' is quadratic; a double root of f'' gives no sign change) |

🌍 Real-world significance: population growth

The excerpt highlights a UN Population Fund report:

  • Population is still growing (f' > 0).
  • The question is whether the rate of growth is slowing (whether we've passed the inflection point where f'' changes from positive to negative).
  • At the inflection point, the rate of growth stops growing—this is critical for predicting whether population stabilizes at 10 billion or overshoots to 14 billion.

Don't confuse: increasing population (f > 0 and f' > 0) vs. increasing growth rate (f'' > 0).

🔢 Numerical approximations with second derivatives

📏 Centered differences (better than one-sided)

The centered difference for the first derivative:

[f(x + Δx) - f(x - Δx)] / (2Δx) ≈ f'(x)

  • More accurate than one-sided differences because it averages information from both sides.
  • For f(x) = x², the centered difference gives exactly f' = 2x (no error), while one-sided gives 2x + Δx.
  • Centered differences have error proportional to (Δx)², called "second-order accurate."

🔢 Second differences

The second difference approximates the second derivative:

[f(x + Δx) - 2f(x) + f(x - Δx)] / (Δx)² ≈ f''(x)

  • This is a "difference of differences"—it measures how the slope changes.
  • The notation d²f/dx² mirrors this: dx is squared, but df is not (just as in distance/time²).

📊 Accuracy comparison table

The excerpt tests f(x) = sin(x) + cos(x) at x = 0 with decreasing step sizes:

| Step Δx | One-sided error | Centered error | Second difference error |
| --- | --- | --- | --- |
| 1/4 | 0.1347 | 0.0104 | 0.0052 |
| 1/8 | 0.0650 | 0.0026 | 0.0013 |
| 1/16 | 0.0319 | 0.0007 | 0.0003 |

  • One-sided errors are O(Δx): halving the step halves the error.
  • Centered and second-difference errors are O((Δx)²): halving the step divides error by 4.
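The error table can be regenerated with a few lines of Python. This is a sketch using the three difference formulas above; the function names are hypothetical, and the exact values f'(0) = 1 and f''(0) = -1 for f(x) = sin x + cos x are used as the reference.

```python
import math


def f(x):
    return math.sin(x) + math.cos(x)


def one_sided(f, x, h):
    return (f(x + h) - f(x)) / h


def centered(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)


def second_difference(f, x, h):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


# Exact values at x = 0: f'(0) = 1 and f''(0) = -1.
for h in (1 / 4, 1 / 8, 1 / 16):
    e1 = abs(one_sided(f, 0.0, h) - 1.0)
    e2 = abs(centered(f, 0.0, h) - 1.0)
    e3 = abs(second_difference(f, 0.0, h) + 1.0)
    print(f"{h:<8} {e1:.4f} {e2:.4f} {e3:.4f}")
```

For much smaller steps than these, floating-point roundoff eventually dominates, especially in the second difference, so the clean factor-of-4 pattern does not continue indefinitely.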

📈 Quadratic approximation

🎯 The formula

Quadratic approximation: f(x) ≈ f(a) + f'(a)(x - a) + ½f''(a)(x - a)²

This extends linear approximation (the tangent line) by adding a bending term.

✅ Why the ½ factor

  • The model function f(x) = x² has f'' = 2.
  • To match the curve exactly, we need ½ · 2 · (x)² = (x)².
  • The factor ½ cancels the 2 that comes from differentiating x² twice.

🎯 What matches at the basepoint x = a

  1. Function values: f(a) = f(a) ✓
  2. First derivatives: both equal f'(a) ✓
  3. Second derivatives: both equal f''(a) ✓

The quadratic approximation bends with the function, unlike the tangent line which cannot bend.

📝 Examples

Example: (1 + x)ⁿ ≈ 1 + nx + ½n(n-1)x²

  • First derivative at x = 0 is n.
  • Second derivative at x = 0 is n(n-1).
  • This is the start of the binomial expansion.

Example: 1/(1-x) ≈ 1 + x + x²

  • Derivatives at x = 0: f' = 1, f'' = 2.
  • The ½ factor cancels the 2, giving coefficients 1, 1, 1.
  • This is the beginning of the geometric series 1 + x + x² + x³ + ...

🔬 Numerical accuracy

Testing 1/√(1+x) ≈ 1 - ½x + ⅜x²:

| Approximation | Error behavior |
| --- | --- |
| Linear only (1 - ½x) | Error ∝ x² (divided by 4 when x halved) |
| Quadratic (1 - ½x + ⅜x²) | Error ∝ x³ (divided by 8 when x halved) |

The quadratic approximation is dramatically more accurate—errors shrink much faster as you get closer to the basepoint.
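The error behavior can be observed directly. The sketch below uses 1/√(1+x), whose expansion at x = 0 begins 1 - ½x + ⅜x²; the sample points 0.1, 0.05, 0.025 are assumptions chosen for illustration.

```python
import math


def f(x):
    return 1 / math.sqrt(1 + x)


def linear(x):
    return 1 - x / 2


def quadratic(x):
    return 1 - x / 2 + 3 * x**2 / 8


# Halve x twice and watch how each error shrinks.
for x in (0.1, 0.05, 0.025):
    print(x, abs(f(x) - linear(x)), abs(f(x) - quadratic(x)))
```

The ratios of successive errors are only close to 4 and 8, not exactly equal, because higher-order terms still contribute at these step sizes.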

22

Graphs

3.4 Graphs

🧭 Overview

🧠 One-sentence thesis

Understanding graphs requires combining mathematical analysis (slope, concavity, asymptotes) with computational tools, because while machines draw curves faster, calculus reveals the underlying structure that makes graphs meaningful.

📌 Key points (3–5)

  • What graphs reveal: position (sign of f), slope (sign of f'), bending (sign of f''), and behavior at infinity (asymptotes).
  • Why derivatives matter for graphing: finding where f' = 0 (stationary points) is far more accurate than visually locating flat spots on y = f(x); each derivative acts like an "infinite zoom."
  • Centering and zoom transforms: shifting coordinates (X = x - a, Y = y - b) and rescaling (x = cX, y = dY) simplify equations and focus on key features.
  • Common confusion: a flat graph near a minimum is normal, not unusual—the tangent is horizontal (dy/dx = 0), so nearby points differ by only Cx², which is tiny when x is small.
  • Asymptotes come in three types: horizontal (y = b as x → ±∞), vertical (x = a where f blows up), and sloping (y = mx + b for large x).

📊 Reading real-world graphs: the electrocardiogram

💓 What an ECG shows

  • An ECG records electrical impulses during heartbeats through twelve graphs (six chest leads, six limb leads).
  • The fundamental pattern: P wave (atrial contraction), QRS complex (ventricular contraction), T wave (ventricular relaxation).
  • Dark vertical lines are 1/5 second apart; light lines are 1/25 second apart.

🩺 Heart rate calculation

  • Count dark lines between R-wave spikes (the peaks).
  • Doctors memorize: 1, 2, 3, 4, 5, 6 dark lines → rates of 300, 150, 100, 75, 60, 50 beats per minute.
  • Example: If beats are 3 dark lines apart (3/5 second), the rate is 5 beats per 3 seconds = 100 per minute.
  • Tachycardia: rate above 100 (too fast); bradycardia: rate below 60 (slow, but normal for resting athletes).

⚠️ Rhythm and emergency signals

  • Normal rhythm: regular spacing between peaks, set by the SA node pacemaker.
  • Sinus arrhythmia: changing time between peaks (fairly normal variation).
  • Fibrillation: irregular contractions with no normal PQRST sequence—muscles quiver independently, pumping action nearly gone; requires immediate CPR or defibrillator.
  • Myocardial infarction (heart attack) shows three I's:
    • Ischemia: upside-down T wave (reduced blood supply).
    • Injury: elevated ST segment (recent attack).
    • Infarction: wide Q wave (≥ 1/25 second, occupying ~1/3 of QRS complex; indicates dead tissue).

Don't confuse: Rhythm problems (pacemaker firing irregularly) vs. blood-supply problems (blocked artery causing infarction).

🔍 Mathematical analysis of graphs

📋 Six main tests for any function f(x)

  1. Sign of f(x): above or below axis; f = 0 at crossing points.
  2. Sign of f'(x): increasing or decreasing; f' = 0 at stationary points.
  3. Sign of f''(x): concave up or down; f'' = 0 at inflection points.
  4. Behavior as x → ±∞: does f approach a limit or grow without bound?
  5. Points where f → ±∞: vertical asymptotes.
  6. Special properties: even/odd symmetry, periodicity, jumps, endpoints.

📐 Asymptotes defined

Horizontal asymptote y = b: if f(x) → b as x → ±∞, the line y = b is a horizontal asymptote.

Vertical asymptote x = a: if f(x) → ±∞ as x → a, the line x = a is a vertical asymptote.

Sloping asymptote y = mx + b: if f(x) - (mx + b) → 0 as x → ±∞, the line y = mx + b is a sloping asymptote.

  • Vertical asymptotes usually come from zero denominators.
  • For rational functions P(x)/Q(x), divide polynomials to find sloping asymptotes.

Example: f(x) = x²/(x - 1) has vertical asymptote x = 1 (denominator zero) and sloping asymptote y = x + 1 (from polynomial division: x²/(x - 1) = x + 1 + 1/(x - 1)).
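A numerical look at the sloping asymptote, as a sketch: the gap between f(x) = x²/(x - 1) and the line y = x + 1 should equal 1/(x - 1) and shrink as x grows. The name `gap` is hypothetical.

```python
def f(x):
    return x**2 / (x - 1)


def gap(x):
    """Vertical distance between the curve and the line y = x + 1."""
    return f(x) - (x + 1)


for x in (10.0, 100.0, 1000.0):
    print(x, gap(x), 1 / (x - 1))
```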

🔄 Symmetry properties

  • Even function: f(-x) = f(x); graph symmetric across y-axis (e.g., x² or cos x).
  • Odd function: f(-x) = -f(x); graph symmetric through origin (e.g., x³ or sin x).
  • Periodic function: f(x + T) = f(x) for some period T (e.g., sin x has period 2π).

💻 Computational graphing and optimization

🎯 Why graphs near minima are flat

  • At a minimum, dy/dx = 0 (horizontal tangent).
  • Near x*, the graph looks like y = Cx² (a parabola sitting on the tangent).
  • At distance Δx = 0.01, the height is only C(0.01)² = 0.0001C—barely visible unless C is large.
  • This flatness is the whole point of dy/dx = 0, not an unusual problem.

🔬 Three levels of accuracy

| Method | What you use | Accuracy gain |
| --- | --- | --- |
| Trace on y(x) | Function values | Read to nearest pixel |
| Solve f'(x) = 0 | First derivative | Order of magnitude better (graph crosses axis, not just flattens) |
| Newton's method | Second derivative f'' | Another order of magnitude (error: 0.01 → 0.0001 → 0.00000001) |

  • Finding where f'(x) = 0 is easier than finding where y(x) is minimum, because the derivative's graph crosses the x-axis rather than merely flattening.
  • Each derivative formula represents an infinite zoom already performed by calculus.

Don't confuse: Visual zooming (stretches the picture) vs. taking derivatives (analytically captures the limit of zooming).

📦 Zoom boxes

  • Draw a box around the region of interest on the graph.
  • The box becomes the new viewing window (axes stretched to fit).
  • Repeated zooms narrow the search: first 1 < x* < 3, then 1.5 < x* < 2, etc.
  • Use a long thin box to see bending; square boxes make the graph flatter (approaching the tangent line).

🔧 Coordinate transforms

🎯 Centering transform

Centering transform: shifts the point (a, b) to the origin by setting X = x - a and Y = y - b.

  • Changes y = f(x) into Y + b = f(X + a), or Y = f(X + a) - b.
  • Purpose: move the interesting point (e.g., a minimum) to (0, 0) for simpler algebra.
  • Example: y = 2x² - 4x has minimum at (1, -2); centering gives Y = 2X².

🔎 Zoom transform

Zoom transform: scales axes by factors c and d, setting x = cX and y = dY.

  • Changes Y = F(X) into y = dF(x/c).
  • Square zoom (c = d): lines keep their slope; useful for seeing overall shape.
  • Rectangular zoom (c ≠ d): stretches one axis more; needed to see bending in flat graphs.

📏 Effect on derivatives

  • First derivative (slope) is multiplied by d/c.
  • Second derivative (bending) is multiplied by d/c².
  • Square zoom (d/c = 1) preserves slopes but changes curvature.
  • Zoom out (small c, d) makes the big picture more curved.

🧮 Combined transform

  • First center: X = x - a, Y = y - b.
  • Then zoom: x = cX, y = dY.
  • Result: y = d[f(x/c + a) - b].
  • The viewing window after both transforms: original [-1, 1] becomes [a - c, a + c] for x and [b - d, b + d] for y.

Example: Every upward-opening parabola can be transformed to y = x² by choosing the right a, b, c, d (center the vertex, then scale to remove the leading coefficient).

23

Parabolas, Ellipses, and Hyperbolas

3.5 Parabolas, Ellipses, and Hyperbolas

🧭 Overview

🧠 One-sentence thesis

Parabolas, ellipses, and hyperbolas are second-degree curves that connect algebraic equations to geometric shapes, forming a cornerstone of analytic geometry.

📌 Key points (3–5)

  • What these curves are: second-degree curves produced by equations that include x², xy, and y² terms, in addition to 1, x, and y.
  • Where they come from: all three curves (plus circles) are "conic sections"—they result from slicing a cone with a plane at different angles.
  • How they differ by cutting angle: level cut → circle; moderate angle → ellipse; steep cut → hyperbola; borderline angle (matching the cone) → parabola.
  • Why they matter: they bridge analysis (equations) and geometry (curves), forming the foundation of analytic geometry.
  • Common confusion: straight lines use 1, x, y (first degree); these curves add x², xy, y² (second degree); going to x³ and y³ makes mathematics much more complicated.

📐 The hierarchy of important curves

📊 Ranking the top curves in mathematics

The excerpt lists four most important types of curves:

| Rank | Curve type | What it uses | Purpose |
| --- | --- | --- | --- |
| 1 | Straight lines | 1, x, y | Linear relationships |
| 2 | Sines and cosines | Trigonometric functions | Oscillation |
| 3 | Exponentials | Exponential functions | Growth and decay |
| 4 | Parabolas, ellipses, hyperbolas | 1, x, y, x², xy, y² | Second-degree relationships |

🏛️ Historical perspective

  • The Greeks would have ranked parabolas, ellipses, and hyperbolas first.
  • It is natural to progress from linear equations (first degree) to quadratic equations (second degree).
  • Moving beyond second degree (to x³ and y³) makes the mathematics complicated.

🔗 Connecting equations and geometry

🧮 What analytic geometry does

Analytic geometry: the connection between analysis of equations and geometry of curves.

  • Analysis focuses on the equation itself (algebraic manipulation, solving).
  • Geometry focuses on the visual curve (shape, properties).
  • Together they form analytic geometry, which has become central to mathematics.

📍 Numbers and points

  • Numbers correspond to points in space.
  • Example: "the point (5, 2)" refers to a specific location.
  • This was not obvious to ancient mathematicians like Euclid.

✏️ Descartes vs. Euclid

  • Where Euclid drew a 45° line through the origin (geometric description), Descartes wrote y = x (algebraic equation).
  • This shift from pure geometry to equation-based representation revolutionized mathematics.

🍦 Conic sections: slicing a cone

🔪 How conic sections are created

Conic sections: curves produced by slicing a cone with a plane at different angles.

All four curves (circle, ellipse, parabola, hyperbola) come from the same geometric construction—cutting a cone with a plane.

📐 The four cuts and their curves

| Cut type | Angle description | Resulting curve |
| --- | --- | --- |
| Level cut | Horizontal plane | Circle |
| Moderate angle | Tilted, but not too steep | Ellipse |
| Steep cut | Sharp angle | Hyperbola (two pieces) |
| Borderline | Matches the cone angle | Parabola |

🎯 The borderline case

  • The parabola occurs at the exact angle where the slicing plane is parallel to the edge of the cone.
  • This is the transition point between ellipse and hyperbola.
  • Don't confuse: the parabola is not "in between" in shape, but rather the critical angle where the curve changes from closed (ellipse) to open (hyperbola).

🌟 Why this matters

  • The Greeks discovered these remarkable properties geometrically, by studying cone slices.
  • All these seemingly different curves share a common geometric origin.
  • This unifying insight connects diverse mathematical phenomena through a single construction.
24

Iterations x_{n+1} = F(x_n)

3.6 Iterations x_{n+1} = F(x_n)

🧭 Overview

🧠 One-sentence thesis

The Fundamental Theorem of Calculus—that the area under the velocity graph equals the change in distance—can be understood through piecewise linear functions and algebra before introducing limits and curves.

📌 Key points (3–5)

  • Core relationship: The velocities v are slopes of the distance function f(t), and the area under the v-graph equals f(t_end) − f(t_start).
  • Piecewise approach: For piecewise linear f(t) and piecewise constant v(t), the Fundamental Theorem holds using only algebra (no limits needed yet).
  • Growth patterns: Different sequences (linear 2j, quadratic j², exponential 2^j) show how velocity and distance relate in various contexts.
  • Common confusion: Don't confuse marginal rates (slope at a point, like tax rate) with average rates (total divided by total, like average tax burden).
  • Calculus extension: The algebraic approach with steps prepares for smooth curves, where limits become necessary to handle continuously changing velocity.

📐 The Fundamental Theorem without limits

📏 Area equals change in distance

The area under the v-graph is f(t_end) − f(t_start).

  • The excerpt demonstrates this with rectangles under a staircase velocity graph.
  • Each rectangle has base 1 and height equal to the velocity v at that step.
  • Total area = sum of all v's = f_last − f_first.
  • Example: Starting at t = 0 and ending at t = 3.5, counting only half the last rectangle (under v = 7) gives area 1 + 3 + 5 + (1/2)(7) = 12.5, which matches f(3.5) − f(0) = 12.5 − 0.

🔗 Connection between slope and area

  • The v's are slopes of f(t): velocity is the rate of change of distance.
  • The area under velocity gives back the total distance change.
  • This is the Fundamental Theorem of Calculus, but restricted here to piecewise linear f and piecewise constant v.
  • Chapter 5 will remove this restriction and handle curved graphs using limits.

🧮 Algebraic proof example

  • The excerpt shows that 1 + 3 + 5 + 7 = 4² by comparing areas.
  • The triangle under a dotted line has the same area as four rectangles under the staircase.
  • Triangle area = (1/2) × base × height = (1/2) × 4 × 8 = 16 = 4².
  • For j rectangles: area = (1/2) × j × 2j = j².
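The identity 1 + 3 + 5 + ... = j² is easy to confirm by direct summation. This is a minimal sketch; `staircase_area` is a hypothetical name for the total area of j unit-width rectangles with odd heights.

```python
def staircase_area(j):
    """Sum of the first j odd velocities 1, 3, 5, ..., each held for one unit of time."""
    return sum(2 * k - 1 for k in range(1, j + 1))


for j in (1, 2, 3, 4, 10):
    print(j, staircase_area(j), j**2)
```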

📈 Growth patterns: linear, quadratic, exponential

📊 Comparing three types of growth

| Type | Formula | Example at j=10 | Velocity pattern |
| --- | --- | --- | --- |
| Linear | 2j | 20 | Constant differences |
| Quadratic | j² | 100 | Linearly increasing differences |
| Exponential | 2^j | 1024 | Exponentially increasing differences |

  • Exponential 2^j grows much faster than the others.
  • Don't confuse: 2j (linear), j² (quadratic), and 2^j (exponential) are fundamentally different growth rates.

🚀 Exponential velocity and distance

  • Start with f = 1, 2, 4, 8, 16 (powers of 2).
  • These are 2^0, 2^1, 2^2, 2^3, 2^4.
  • The differences (velocities) are v = 1, 2, 4, 8—also powers of 2.
  • Formula: f_j = 2^j and v_j = 2^(j−1).
  • Key property: v(t) is proportional to f(t)—the velocity is proportional to the distance.
  • This pattern appears in population growth, bank interest, and national debt.
  • The smooth curve version is f(t) = 2^t with slope v(t) = c·2^t, where c ≈ 0.693 (the natural logarithm of 2).

🌊 Oscillating velocity and distance

  • Example: f = 0, 1, 1, 0, −1, −1, 0 (returns to zero after 6 steps).
  • Velocities: v = 1, 0, −1, −1, 0, 1 (six values, after which the pattern repeats).
  • The sum of v's is zero, agreeing with f_last − f_first = 0 − 0 = 0.
  • The f-graph roughly resembles a sine curve; the v-graph roughly resembles a cosine curve.
  • This is a periodic motion: the pattern repeats every 6 steps (the period is 6).
  • Smooth versions (sine and cosine) repeat every 2π radians (or 360 degrees).
  • Don't confuse: These digitized step functions are like analog waveforms but go in jumps, not smoothly.

🚗 Burst of speed and step functions

⚡ Short burst example

  • A car travels at speed V until distance reaches f = 1, then stops suddenly.
  • Velocity: v(t) = V up to time T, then v(t) = 0 after T.
  • Distance: f(t) = V·t up to time T, then f(t) = 1 after T.
  • Area under v-graph = V × T = 1, so T = 1/V.
  • Example: Three cars (Jeep, Corvette, Maserati) with different speeds V but all reach distance 1, so all have area 1 under their velocity graphs; faster cars have taller, narrower rectangles.

📍 Step function and infinite slope

  • Extreme case: speed becomes infinitely fast, so the car jumps instantly from 0 to 1.
  • This is the unit step function U(t): U = 0 for t < 0, then U = 1 for t > 0.
  • Slope is zero everywhere except at the jump (t = 0), where the slope is infinite.
  • The velocity is an impulse (a spike), often denoted by δ (delta function).
  • Area under the infinite spike is still 1.
  • Don't confuse: Calculus is about curves, not jumps; delta functions are an advanced topic beyond ordinary calculus.

💰 Income tax: marginal vs average rates

📋 Tax brackets and rates (1991 example)

  • Taxable income x determines tax f(x) through three brackets:
    • Bracket 1: 0 ≤ x ≤ $20,350, tax = 0.15 × x (rate 15%).
    • Bracket 2: $20,350 ≤ x ≤ $49,300, tax = $3,052.50 + 0.28 × (x − $20,350) (rate 28%).
    • Bracket 3: x ≥ $49,300, tax = $11,158.50 + 0.31 × (x − $49,300) (rate 31%).
  • The tax function f(x) is piecewise linear with slopes 0.15, 0.28, 0.31.

🔍 Marginal rate vs average rate

Marginal rate: the tax on each additional dollar of income; it is the slope at point x.

Average rate: total tax divided by total income.

  • The rates 0.15, 0.28, 0.31 are marginal rates, not average rates.
  • Only the income in a bracket is taxed at that bracket's rate.
  • Example: Someone earning $50,000 pays 31% only on the amount over $49,300, not on the entire $50,000.
  • Tax is like area or distance—it adds up over all brackets.
  • Tax rate is like slope or velocity—it depends on where you are (your income level).
  • Don't confuse: News media often conflate marginal and average rates, leading to misunderstanding.
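The difference between marginal and average rates can be made concrete with the 1991 brackets quoted above. This is a sketch; the function names and the sample income of $50,000 are assumptions for illustration, and at an exact bracket boundary the code reports the higher bracket's rate.

```python
# (bracket start, marginal rate) for the 1991 schedule quoted above
BRACKETS = [(0.0, 0.15), (20350.0, 0.28), (49300.0, 0.31)]


def tax(income):
    """Add up rate * (portion of income falling inside each bracket)."""
    total = 0.0
    for i, (start, rate) in enumerate(BRACKETS):
        end = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if income > start:
            total += rate * (min(income, end) - start)
    return total


def marginal_rate(income):
    """Slope of the tax function: the rate of the bracket this income falls in."""
    rate = BRACKETS[0][1]
    for start, bracket_rate in BRACKETS:
        if income >= start:
            rate = bracket_rate
    return rate


def average_rate(income):
    """Total tax divided by total income."""
    return tax(income) / income


income = 50000.0
print(tax(income), marginal_rate(income), average_rate(income))
```

At $50,000 the marginal rate is 31% while the average rate stays below 23%, because most of the income was taxed in the lower brackets.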

📐 Equation for the top bracket

  • The top bracket begins at x = $49,300 with tax f(x) = $11,158.50.
  • The slope (marginal rate) is 0.31.
  • Equation: f(x) = $11,158.50 + 0.31 × (x − $49,300) for x ≥ $49,300.
25

Newton's Method (and Chaos)

3.7 Newton’s Method (and Chaos)

🧭 Overview

🧠 One-sentence thesis

Newton's method provides a powerful iterative technique for solving equations by following tangent lines, but even simple equations can lead to unpredictable chaotic behavior that reveals fundamental limits on our ability to compute long-term outcomes.

📌 Key points (3–5)

  • Core mechanism: Newton's method replaces a function with its tangent line at each step, using both the function value f(x₀) and slope f'(x₀) to find where the tangent crosses the axis.
  • Typical convergence: When starting near a solution, the error is squared at each step (superconvergence), doubling the number of correct decimals per iteration.
  • Chaos from simple rules: Even innocent-looking iterations like solving x² + 1 = 0 produce chaotic sequences that are extremely sensitive to starting values and unpredictable over long times.
  • Common confusion: Don't confuse the function f(x) whose root we seek with the iteration function F(x); convergence depends on the slope F'(x*) at the fixed point, not just on f(x*) = 0.
  • Practical limits: Simple deterministic rules can generate chaos, meaning no measurement can ever be accurate enough to predict long-term behavior—this applies to weather, planetary orbits, and mathematical iterations alike.

🎯 The Newton iteration formula

🎯 Building the method from tangent lines

Newton's method: x_{n+1} = x_n - f(x_n)/f'(x_n)

  • What we know at x₀: the graph height f(x₀) and the slope f'(x₀).
  • What we don't know: whether the curve bends (we lack f'').
  • The strategy: follow the tangent line, which uses all available information.
  • The tangent line equation is f(x) ≈ f(x₀) + f'(x₀)(x - x₀).
  • Set the right side to zero to find where the tangent crosses the axis: f(x₀) + f'(x₀)(x₁ - x₀) = 0.
  • Solving for x₁ gives the Newton formula.

🔄 The iteration process

At each new point x_n:

  1. Compute the height f(x_n) and slope f'(x_n).
  2. Draw the tangent line from that point.
  3. Find where it crosses the axis to get x_{n+1}.
  4. Repeat until convergence (or chaos).

Example: The tangent line from x_n crosses the axis at x_{n+1}, while the actual curve crosses at the true solution x*.

🔗 Connection to linear approximation

The excerpt emphasizes that calculus uses three related calculations:

  1. Estimate slope f'(x) from Δf/Δx (differentiation).
  2. Estimate change Δf from f'(x)Δx (linear approximation).
  3. Estimate change Δx from Δf/f'(x) (Newton's method).

Newton's method is exactly Δx = -f(x_n)/f'(x_n), where the desired Δf is -f(x_n).
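The tangent-line step translates into a short loop. The following is a minimal sketch, not a production root finder: the name `newton`, the tolerance, and the step limit are assumptions, and the caller supplies the derivative by hand.

```python
def newton(f, fprime, x0, tol=1e-12, max_steps=50):
    """Follow tangent lines from x0 until the step size falls below tol."""
    x = x0
    for _ in range(max_steps):
        slope = fprime(x)
        if slope == 0:
            # Horizontal tangent: the line never crosses the axis.
            raise ZeroDivisionError("horizontal tangent: the Newton step is undefined")
        step = -f(x) / slope  # delta x = -f(x_n) / f'(x_n)
        x += step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence within max_steps")


# Solve x^2 - 4 = 0 from two different starting points.
print(newton(lambda x: x * x - 4, lambda x: 2 * x, 1.0))
print(newton(lambda x: x * x - 4, lambda x: 2 * x, -1.0))
```

The two starting points land on different roots, which is a first glimpse of basins of attraction.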

🚀 Convergence and superconvergence

🚀 Error squaring behavior

Superconvergence: the error is squared at each Newton step when F'(x*) = 0.

  • For the iteration x_{n+1} = F(x_n), the slope of F at the solution determines convergence speed.
  • Newton achieves F'(x*) = 0 when f(x*) = 0, which is optimal.
  • Why F' = 0: The slope of F(x) = x - f(x)/f'(x) equals f(x)f''(x)/(f'(x))², which is zero when f(x) = 0.
  • This means the multiplier m = 0, far better than the usual convergence test |F'(x*)| < 1.

📈 Doubling correct decimals

Example (square roots): To find √4, iterate x_{n+1} = ½(x_n + 4/x_n) starting from x₀ = 1:

  • x₁ = 2.5
  • x₂ = 2.05
  • x₃ = 2.0006
  • x₄ = 2.00000009

The wrong decimal is twice as far out at each step—the number of correct decimals doubles.
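The iterates for √4 can be reproduced with the averaging formula x_{n+1} = ½(x_n + 4/x_n). This sketch prints each iterate next to its error x_n - 2 so the squaring is visible; the name `sqrt_step` is hypothetical.

```python
def sqrt_step(x, b):
    """One Newton step for x^2 - b = 0: average the guess x with b/x."""
    return 0.5 * (x + b / x)


x = 1.0
iterates = []
for _ in range(4):
    x = sqrt_step(x, 4.0)
    iterates.append(x)
    print(x, x - 2.0)
```

The errors run roughly 0.5, 0.05, 0.0006, 0.00000009: each is about the square of the previous one divided by 2x_n.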

⚠️ When convergence fails

  • Starting at a horizontal tangent: If f'(x₀) = 0, the tangent line is horizontal and never crosses the axis (division by zero).
  • Wrong basin of attraction: Starting at x₀ = -1 for √4 converges to -√4 instead of +√4.
  • Divergence: In the iteration x_{n+1} = 2x_n - 2x_n² (Example 2 below), the sequence diverges when |1 - 2x₀| > 1.

Don't confuse: Fast convergence near x* does not guarantee convergence from far away; basins of attraction can be complicated.

🧮 Classic examples

🧮 Square roots (Example 1)

To solve f(x) = x² - b = 0 (finding x* = √b):

  • The slope is f'(x_n) = 2x_n.
  • Newton's formula becomes x_{n+1} = x_n - (x_n² - b)/(2x_n).
  • This simplifies to x_{n+1} = ½(x_n + b/x_n).
  • Interpretation: Guess the square root, divide into b, and average the two numbers.
  • The Babylonians used this same method without knowing calculus.
  • For b = 4, the error equation is x_{n+1} - 2 = (1/(2x_n))(x_n - 2)², showing error squaring.

🧮 Division without dividing (Example 2)

To solve 1/x - a = 0 (finding x* = 1/a without division):

  • Here f(x) = 1/x - a and f'(x) = -1/x².
  • Newton gives x_{n+1} = x_n - (1/x_n - a)/(-1/x_n²) = x_n + x_n - ax_n² = 2x_n - ax_n².
  • For a = 2 (aiming for x* = 1/2), the iteration is x_{n+1} = 2x_n - 2x_n² and the error equation is x_{n+1} - 1/2 = -2(x_n - 1/2)².
  • Convergence condition: Fast convergence if 0 < x₀ < 1; divergence if x₀ < 0 or x₀ > 1.
  • The algebra confirms: (1 - 2x_{n+1}) = (1 - 2x_n)², so convergence requires |1 - 2x₀| < 1.
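The division-free iteration x_{n+1} = 2x_n - ax_n² uses only multiplication and subtraction. The sketch below runs it for a = 2 from one starting point inside (0, 1) and one outside; the function name and the starting values 0.3 and 1.2 are assumptions for illustration.

```python
def reciprocal(a, x0, steps=6):
    """Approximate 1/a by Newton's method on f(x) = 1/x - a, without dividing."""
    x = x0
    for _ in range(steps):
        x = 2 * x - a * x * x  # only multiply and subtract
    return x


print(reciprocal(2.0, 0.3))  # |1 - 2*x0| = 0.4 < 1: converges toward 0.5
print(reciprocal(2.0, 1.2))  # |1 - 2*x0| = 1.4 > 1: runs off to large negative values
```

Because (1 - 2x_n) is squared at every step, six steps raise the initial value 0.4 to the 64th power, which is far below double-precision resolution.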

🧮 Imaginary roots and chaos (Example 3)

To solve f(x) = x² + 1 = 0 (no real solutions, only x* = i and x* = -i):

  • Newton's method becomes x_{n+1} = ½(x_n - 1/x_n).
  • The x's cannot approach i or -i (nothing is imaginary).
  • What happens: The sequence bounces around unpredictably.
  • Starting from x₀ = 1 gives x₁ = 0, then x₂ divides by zero and blows up.
  • Large x_n produces x_{n+1} about half as large, but eventually a small number appears, then division by that small number sends the sequence far out again.
  • This is chaos: no convergence, strong sensitivity to initial conditions.

🌀 The cotangent formula and chaos

🌀 Trigonometric identity reveals the pattern

The key identity: ½(cot θ - 1/cot θ) = cot 2θ.

For x² + 1 = 0, if x₀ = cot θ, then:

  • x₁ = cot 2θ
  • x₂ = cot 4θ
  • x_n = cot(2ⁿθ)

Our points are on the cotangent curve, and every iteration doubles the angle.

🌀 Three behaviors

| Starting angle | Sequence behavior | Explanation |
| --- | --- | --- |
| θ = π/4 | x₀ = 1, x₁ = 0, x₂ = ∞ | Blows up due to division by zero |
| θ = π/3 | x₀ = 1/√3, x₁ = -1/√3, x₂ = 1/√3, ... | Cycles forever with period 2 |
| Small θ (large x₀) | Chaotic bouncing | Angles 4θ, 8θ, ... eventually hit large cotangents |

🌀 Extreme sensitivity

  • After 10 steps, θ is multiplied by 2¹⁰ = 1024.
  • Starting angles 60° and 61° look close, but after 10 steps they differ by 1024°.
  • The x₁₀ values are 0.6 and 14—completely different.
  • This is chaos: small errors snowball, making long-term prediction impossible.
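
The 60° versus 61° comparison can be reproduced directly from x_n = cot(2ⁿθ). A minimal sketch; the helper name `cot_after_doublings` is hypothetical.

```python
import math

def cot_after_doublings(theta_degrees, n):
    """Return cot(2**n * theta) for a starting angle given in degrees."""
    angle = math.radians(theta_degrees * 2 ** n)
    return 1.0 / math.tan(angle)

x10_from_60 = cot_after_doublings(60, 10)
x10_from_61 = cot_after_doublings(61, 10)
print(x10_from_60, x10_from_61)
```

Since 1024 · 60° reduces to 60° and 1024 · 61° reduces to 4° modulo 180°, the two results are cot 60° and cot 4°, which round to the 0.6 and 14 quoted in the list.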

Don't confuse: Deterministic rules (like cot 2ⁿθ) with predictable outcomes; chaos means the formula is "absolutely hopeless after 100 steps."

🌪️ Quadratic iteration and period doubling

🌪️ The parabolic family

Changing variables from x to z = 1/(1 + x²) transforms Newton's iteration into:

  • z_{n+1} = 4z_n - 4z_n²

This is the most famous quadratic iteration, with z_n = (sin 2ⁿθ)².

The general family is z_{n+1} = az_n - az_n² for 0 < a ≤ 4.

🌪️ Behavior as parameter a increases

| Range of a | Behavior | Limit type |
| --- | --- | --- |
| 0 < a < 1 | Converge to z* = 0 | Single limit |
| 1 < a < 3 | Converge to z* = (a-1)/a | Single limit |
| a ≈ 3.4 | 2-cycle: alternates between two values | Period 2 |
| a ≈ 3.5 | 4-cycle: repeats after four steps | Period 4 |
| a ≈ 3.55 | 8-cycle | Period 8 |
| 3.57 < a < 4 | Chaos, period-3 windows, fractals | Complex behavior |

Convergence test: |F'(z*)| < 1 at the fixed point z* = F(z*).

🌪️ Period doubling cascade

  • Stable cycles of length 2, 4, 8, 16, 32, 64, ... appear as a increases.
  • Each cycle becomes unstable and doubles in period.
  • The stability windows shrink by the Feigenbaum factor 4.6692...
  • Every cycle in this doubling cascade has become unstable by a ≈ 3.57.
  • Between 3.57 and 4: chaos, with occasional windows of stable periods (like period 3).

Example: At a = 3.5, starting from any random z₀, after 100 steps the sequence settles into the 4-cycle: 0.875 → 0.383 → 0.827 → 0.501 → 0.875.
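
The 4-cycle at a = 3.5 can be observed by iterating past a transient. This is a sketch, assuming a starting value z₀ = 0.2 and a transient of 1000 steps (both illustrative); the function names are hypothetical.

```python
def quadratic_step(z, a):
    """One step of z -> a*z - a*z*z."""
    return a * z - a * z * z

def long_run_values(a, z0=0.2, transient=1000, keep=4):
    """Discard `transient` steps, then record the next `keep` iterates."""
    z = z0
    for _ in range(transient):
        z = quadratic_step(z, a)
    values = []
    for _ in range(keep):
        values.append(z)
        z = quadratic_step(z, a)
    return values

cycle = long_run_values(3.5)
single_limit = long_run_values(2.5, keep=1)[0]   # expected near (a - 1)/a = 0.6
print(sorted(cycle), single_limit)
```

Changing `a` in the call is a quick way to explore the other rows of the behavior table.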

🌪️ What happens at a = 4

Three possibilities:

  1. The z_n's cycle with some long period.
  2. The z_n's come close to every point between 0 and 1.
  3. The z_n's approach a very thin limit set (like a Cantor set).

The behavior is chaotic and statistical tests find no pattern—the numbers are effectively random.

🔬 Fractals and the Cantor set

🔬 Constructing the Cantor set

Cantor set: divide [0,1] into three pieces and remove the open middle third (1/3, 2/3); repeat on each remaining piece.

  • At each step, take out the middle thirds.
  • What remains: all endpoints 1/3, 2/3, 1/9, 2/9, 7/9, 8/9, ... plus other points like 3/4.
  • The removed intervals have lengths summing to 1, so the Cantor set has "measure zero."
  • Self-similarity: Between 0 and 1/3, you see the same Cantor set scaled down by 3; from 0 to 1/9, scaled down by 9.
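
The construction can be sketched with exact fractions. The check below only follows a point through a finite number of removal stages (40 by default, an assumption), so it is evidence of membership rather than a proof; `in_cantor_set` and `removed_length` are hypothetical names.

```python
from fractions import Fraction

def in_cantor_set(x, depth=40):
    """Follow x in [0, 1] through `depth` removal stages using exact fractions."""
    third, two_thirds = Fraction(1, 3), Fraction(2, 3)
    for _ in range(depth):
        if third < x < two_thirds:
            return False              # x lies in a removed open middle third
        # Self-similarity: zoom in on the surviving piece that contains x.
        x = 3 * x if x <= third else 3 * x - 2
    return True

def removed_length(stages):
    """Total length removed after `stages` steps: 1/3 + 2/9 + 4/27 + ..."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, stages + 1))

print(in_cantor_set(Fraction(3, 4)), in_cantor_set(Fraction(1, 2)))
print(removed_length(10))
```

The point 3/4 alternates between 3/4 and 1/4 under the zoom step and never lands in a middle third, while the removed length equals 1 - (2/3)ⁿ after n stages and so approaches 1.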

🔬 Fractals and dimension

Fractal: a set with self-similarity at infinitely many scales.

  • Fractional dimension: The Cantor set has dimension between 0 and 1; a fractal snowflake boundary has dimension between 1 and 2.
  • Covering a line segment with circles of radius r takes c/r circles; for fractals it takes c/r^D circles, where D is the dimension.
  • Mathematical snowflake: Start with a triangle, add a bump in the middle of each side; repeat. The boundary lengthens by 4/3 at each step, becoming infinitely long.

🔬 Chaos in nature

  • Weather forecasting: Small errors snowball, destroying forecasts after 6 days; a plane's flight can change everything.
  • Pluto's orbit: Chaotic despite obeying gravity; motion is unpredictable over long times.
  • Revolutionary idea: Simple rules can lead to answers too sensitive to compute; we are not accustomed to innocent formulas that are "absolutely hopeless after 100 steps."

Don't confuse: Complicated formulas with unpredictability; even simple deterministic rules (like x_{n+1} = 4x_n - 4x_n²) can produce chaos.

🛠️ Alternative methods

🛠️ Secant method

Secant method: x_{n+1} = x_n - f(x_n)/(Δf/Δx)_n, where (Δf/Δx)_n = (f(x_n) - f(x_{n-1}))/(x_n - x_{n-1}).

  • Key difference from Newton: Uses two previous points to approximate the slope, avoiding the need to compute f'(x).
  • The secant line connects the two latest points on the graph; set y = 0 to find where it crosses the axis.
  • Prediction: Three secant steps ≈ two Newton steps; both give four times as many correct decimals.
  • Probably also chaotic for x² + 1 = 0.
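
A minimal sketch of the secant iteration, assuming the test function f(x) = x² - 4 and starting points 1 and 2.5 (illustrative choices); the function name `secant` is hypothetical.

```python
def secant(f, x_prev, x_curr, steps):
    """Replace f'(x_n) in Newton's formula by the slope through the two latest points."""
    for _ in range(steps):
        slope = (f(x_curr) - f(x_prev)) / (x_curr - x_prev)
        if slope == 0:
            break                     # flat secant line: no crossing to aim for
        x_prev, x_curr = x_curr, x_curr - f(x_curr) / slope
    return x_curr

root = secant(lambda x: x * x - 4.0, 1.0, 2.5, 6)
print(root)
```

No derivative of f is ever evaluated; only function values at the two most recent points are needed.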

🛠️ Bisection method

If f(x) changes sign between x₀ and x₁:

  1. Find the sign at the midpoint x₂ = ½(x₀ + x₁).
  2. Decide whether f(x) changes sign between x₀ and x₂, or x₂ and x₁.
  3. Repeat on that half-length interval.
  4. Switch to a faster method when the interval is small enough.

Example: f(x) = x² - 4 is negative at x = 1, positive at x = 2.5, negative at midpoint x = 1.75, so x* lies in [1.75, 2.5].

Three bisection steps reduce the interval by a factor of 8 (= 2³).
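
The numbered steps can be sketched as follows, using the same example f(x) = x² - 4 on [1, 2.5]. The function name `bisect_interval` is hypothetical.

```python
def bisect_interval(f, lo, hi, steps):
    """Halve [lo, hi] `steps` times, keeping the half where f changes sign."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must change sign between lo and hi")
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid                  # sign change is in the left half
        else:
            lo = mid                  # sign change is in the right half
    return lo, hi

f = lambda x: x * x - 4.0
print(bisect_interval(f, 1.0, 2.5, 1))
print(bisect_interval(f, 1.0, 2.5, 3))
```

One step should give the interval [1.75, 2.5] from the example, and three steps should leave an interval of width 1.5/8.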

🛠️ Comparison table

| Method | Slope information | Convergence speed | Advantages |
| --- | --- | --- | --- |
| Newton | Uses f'(x_n) | Error squared (superfast) | Fewest steps when close to x* |
| Secant | Approximates f' from two points | Between linear and quadratic | No derivative needed |
| Bisection | None (only sign changes) | Linear (halves interval) | Always converges if f changes sign; robust |

Don't confuse: The function f(x) we're solving with the iteration function F(x); Newton's F has F'(x*) = 0, which is why it converges so fast.

26

The Mean Value Theorem and l'Hôpital's Rule

3.8 The Mean Value Theorem and l’Hôpital’s Rule

🧭 Overview

🧠 One-sentence thesis

The Mean Value Theorem bridges local behavior (instantaneous slope at a point) to global behavior (average slope across an interval), and this connection enables l'Hôpital's Rule to resolve indeterminate forms like 0/0 by comparing derivatives.

📌 Key points (3–5)

  • What the MVT connects: relates the derivative df/dx at some interior point c to the average change Δf/Δx over an interval [a, b].
  • Rolle's theorem as special case: when f(a) = f(b) = 0, the derivative must equal zero at some interior point c.
  • l'Hôpital's Rule for 0/0: when both f(x) and g(x) approach zero, their ratio f/g has the same limit as the ratio of their derivatives f'/g'.
  • Common confusion: l'Hôpital is NOT the quotient rule—derivatives are taken separately for numerator and denominator, and it applies only when you have 0/0 or ∞/∞.
  • Why it matters: the MVT guarantees that the instantaneous slope matches the average slope somewhere inside the interval, and l'Hôpital provides a practical tool for evaluating limits that initially appear meaningless.

🔗 The Mean Value Theorem core idea

🎯 What the theorem states

Mean Value Theorem: If f(x) is continuous on the closed interval [a, b] and has a derivative everywhere in the open interval (a, b), then [f(b) - f(a)] / (b - a) = f'(c) at some point a < c < b.

  • The left side is the average slope over the entire interval.
  • The right side is the instantaneous slope at one unknown interior point c.
  • The theorem guarantees c exists but does not tell you its exact value.

Example: If average velocity over 2 hours is 75 mph, then instantaneous velocity must equal 75 at some moment during those 2 hours (assuming velocity exists at all interior points).
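
The theorem only asserts that c exists, but for a specific function it can be located numerically. This sketch assumes f(x) = x³ on [0, 2] (an illustrative choice, not from the source) and assumes f'(c) minus the average slope changes sign across the interval so that bisection applies; `mean_value_point` is a hypothetical name.

```python
def mean_value_point(f, f_prime, a, b, iterations=60):
    """Find c in (a, b) with f'(c) equal to the average slope, by bisection."""
    average_slope = (f(b) - f(a)) / (b - a)
    gap = lambda c: f_prime(c) - average_slope
    lo, hi = a, b
    if gap(lo) * gap(hi) > 0:
        raise ValueError("this sketch needs a sign change of f' - average slope")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

c = mean_value_point(lambda x: x ** 3, lambda x: 3 * x * x, 0.0, 2.0)
print(c)
```

Here the average slope is 8/2 = 4, so the point found should satisfy 3c² = 4, that is c = 2/√3.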

📐 Closed vs open intervals

| Notation | Meaning | MVT requirement |
| --- | --- | --- |
| [a, b] | Closed interval, includes endpoints | f must be continuous here |
| (a, b) | Open interval, excludes endpoints | f' must exist here |

  • The derivative f' is required only in the open interval (a, b).
  • Continuity at the endpoints a and b is required, but f'(a) and f'(b) need not exist.

🔄 Rolle's theorem as the foundation

Rolle's theorem: If f(a) = f(b) = 0 (zero at both ends), then f'(c) = 0 at some point a < c < b.

  • This is the special case when the function starts and returns to zero.
  • Average slope is zero, so the derivative must be zero somewhere inside.
  • Proof sketch: A continuous function on [a, b] reaches its maximum and minimum (Extreme Value Theorem). If the max or min occurs at an interior point c, then df/dx = 0 there. If both extremes are at the endpoints, then f(x) ≡ 0 in between, so f' = 0 everywhere.

Historical note: Rolle himself did not believe in the logic behind calculus and fought against it, yet his special case leads directly to the general MVT.

🛠️ Proving and applying the MVT

🧮 Proof strategy

The proof tilts the graph back to Rolle's case by subtracting the secant line:

  • Define F(x) = f(x) - [f(a) + (Δf/Δx)(x - a)].
  • This measures the vertical distance between the curve and the secant line.
  • At both endpoints, F(a) = F(b) = 0.
  • Apply Rolle's theorem to F(x): there exists c where F'(c) = 0.
  • Taking the derivative: 0 = f'(c) - (Δf/Δx), which rearranges to the MVT.

📏 Exact prediction formula

f(x) = f(a) + f'(c)(x - a), where a < c < x.

  • This replaces the approximate linear prediction f(x) ≈ f(a) + f'(a)(x - a).
  • The approximation uses the slope at a; the exact formula uses the slope at an unknown c between a and x.

Example: sin x = (cos c) · x for some c between 0 and x. Since cos c < 1 for c > 0, this proves sin x < x for positive x.

⚙️ Key consequence: constant functions

If f'(c) = 0 at all points in an interval, then f(x) is constant.

  • Proof: The MVT gives Δf = f'(c) · Δx = 0 · Δx = 0 for every pair of points.
  • Therefore f(b) = f(a) for all a and b; the graph is a horizontal line.
  • This simple case is essential for the Fundamental Theorem of Calculus.

🎲 l'Hôpital's Rule for indeterminate forms

🔍 The 0/0 problem

When f(x) and g(x) both approach zero, the ratio f(x)/g(x) is indeterminate:

  • Examples: x²/x, (sin x)/x, (x - sin x)/(1 - cos x) all become 0/0 at x = 0.
  • You cannot work with f and g separately; it is a "race toward zero."
  • The limit of the ratio might be any number, or ±∞, or might not exist.

📜 l'Hôpital's Rule statement

l'Hôpital's Rule: Suppose f(x) and g(x) both approach zero as x → a. Then f(x)/g(x) approaches the same limit as f'(x)/g'(x), if that second limit exists. Normally this limit is f'(a)/g'(a).

Critical warning: This is NOT the quotient rule! Derivatives of f and g are taken separately, not as a quotient.

✅ When to use l'Hôpital

| Situation | Use l'Hôpital? | Why |
| --- | --- | --- |
| f(x)/g(x) → 0/0 | Yes | Both numerator and denominator → 0 |
| f(x)/g(x) → ∞/∞ | Yes | Extended version applies |
| f(x)/g(x) → a/b where b ≠ 0 | No | Ordinary limit: divide limits separately |

Don't confuse: If f(x) → a (nonzero) and g(x) → b (nonzero), just compute a/b. l'Hôpital enters only for 0/0 or ∞/∞.

🧪 Worked examples

Example (old friend): lim[x→0] (1 - cos x)/x

  • Both numerator and denominator → 0.
  • Apply l'Hôpital: derivatives are sin x (numerator) and 1 (denominator).
  • Limit = sin 0 / 1 = 0.

Example (repeated application): lim[x→0] (x - sin x)/(1 - cos x)

  • First application: f'/g' = (1 - cos x)/(sin x), still 0/0 at x = 0.
  • Apply again: f''/g'' = (sin x)/(cos x) → 0/1 = 0.

Example (geometric insight): lim[x→0] (tan x)/(sin x)

  • Derivatives: (sec² x)/(cos x).
  • At x = 0: 1/1 = 1.
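
The three worked limits can be sanity-checked by evaluating each ratio, and the corresponding ratio of derivatives, at a small x. A minimal sketch, assuming x = 0.001 is close enough to 0 to show the trend; the list name `examples` is hypothetical.

```python
import math

# (label, f/g, f'/g', limit claimed in the worked examples)
examples = [
    ("(1 - cos x)/x",
     lambda x: (1 - math.cos(x)) / x,
     lambda x: math.sin(x) / 1.0,
     0.0),
    ("(x - sin x)/(1 - cos x)",
     lambda x: (x - math.sin(x)) / (1 - math.cos(x)),
     lambda x: (1 - math.cos(x)) / math.sin(x),
     0.0),
    ("tan x / sin x",
     lambda x: math.tan(x) / math.sin(x),
     lambda x: (1 / math.cos(x) ** 2) / math.cos(x),
     1.0),
]

x = 1e-3
results = [(label, ratio(x), derivative_ratio(x), limit)
           for label, ratio, derivative_ratio, limit in examples]
for row in results:
    print(row)
```

A numerical check like this only suggests the limit; the rule itself is what justifies it.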

🧩 Why l'Hôpital works

The algebra behind the rule:

  • f(x)/g(x) = [f(x) - f(a)]/(x - a) ÷ [g(x) - g(a)]/(x - a).
  • Both f(a) and g(a) are zero by assumption, so this is exact.
  • As x → a, the numerator approaches f'(a) and the denominator approaches g'(a).
  • Therefore the ratio approaches f'(a)/g'(a).

The more general proof uses the Generalized Mean Value Theorem (Cauchy's extension).

🔬 Advanced topics and error analysis

🎯 Generalized MVT (Cauchy)

If f(x) and g(x) are continuous on [a, b] and differentiable on (a, b), there exists a < c < b where [f(b) - f(a)]g'(c) = [g(b) - g(a)]f'(c).

  • When g(x) = x, this reduces to the ordinary MVT.
  • This form proves l'Hôpital's Rule rigorously.
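  • Proof: construct F(x) = [f(b) - f(a)]g(x) - [g(b) - g(a)]f(x), which satisfies F(a) = F(b) (both equal f(b)g(a) - f(a)g(b), which is zero when f(a) = g(a) = 0 as in l'Hôpital's Rule). Apply Rolle to F(x) - F(a) to get F'(c) = 0.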

📊 Error in linear approximation

The error in the tangent-line approximation grows quadratically:

Error = (1/2) f''(c)(x - a)² for some a < c < x.

  • Linear approximation: f(x) ≈ f(a) + f'(a)(x - a).
  • Exact formula: f(x) = f(a) + f'(a)(x - a) + (1/2)f''(c)(x - a)².
  • The error term involves the second derivative and the square of the distance.

Example: For f(x) = √x near a = 100, approximating √102:

  • √102 ≈ 10 + (1/20)·2 + (1/2)·(-1/4000)·2² = 10.1 - 0.0005.
  • Predicted error ≈ -0.0005; actual error √102 - 10.1 ≈ -0.000495.
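
The √102 example can be reproduced in a few lines. A minimal sketch; the variable names are hypothetical, and the predicted error uses f'' at a = 100 in place of the unknown point c.

```python
import math

a, x = 100.0, 102.0
f_a = math.sqrt(a)                   # f(a) = 10
f_prime_a = 0.5 / math.sqrt(a)       # f'(a) = 1/20
f_second_a = -0.25 * a ** -1.5       # f''(a) = -1/4000

linear = f_a + f_prime_a * (x - a)                  # tangent-line estimate
predicted_error = 0.5 * f_second_a * (x - a) ** 2   # quadratic correction term
actual_error = math.sqrt(x) - linear

print(linear, predicted_error, actual_error)
```

The predicted and actual errors differ slightly because the exact formula evaluates f'' at some c between 100 and 102 rather than at 100.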

⚠️ Other indeterminate forms

Beyond 0/0, l'Hôpital extends to:

  • ∞/∞: Apply the rule directly to f'(x)/g'(x).
  • 0·∞: Rewrite as f(x)/(1/g(x)) to get 0/0.
  • ∞ - ∞: Combine the terms into a single fraction to get 0/0 or ∞/∞.
  • 0⁰, 1^∞, ∞⁰: Take logarithms to reduce to 0·∞, then rewrite as 0/0 or ∞/∞.

Don't confuse: When f(x) → 0 and g(x) → ∞, the product f·g is indeterminate, but the ratio f/g → 0 (no l'Hôpital needed).

27

The Chain Rule

4.1 The Chain Rule

🧭 Overview

🧠 One-sentence thesis

The chain rule enables us to find the derivative of composite functions by multiplying the derivative of the outside function (evaluated at the inside function) by the derivative of the inside function.

📌 Key points (3–5)

  • What composition means: A composite function f(g(x)) applies g first to get y, then applies f to y to get the final output z.
  • The chain rule formula: dz/dx = (dz/dy) × (dy/dx), meaning we multiply the derivatives of the outer and inner functions.
  • Common confusion: The derivative of sin(x²) is NOT (cos x)(2x); it is (cos x²)(2x)—the cosine must be evaluated at x², not at x.
  • Order matters: f(g(x)) is usually different from g(f(x)); applying functions in different orders produces different results.
  • Recognition skill: A major part of using the chain rule is identifying which function is "inside" and which is "outside."

🔗 Understanding composite functions

🔗 What composition means

Composite function z = f(g(x)): Start with x, compute y = g(x) (the inside function), then compute z = f(y) (the outside function).

  • The inside function g(x) produces an intermediate value y.
  • The outside function f(y) takes that y as input and produces the final output z.
  • Notation: Written as f ∘ g or more commonly f(g(x)).
  • Example: For sin(x²), the inside function is g(x) = x² and the outside function is f(y) = sin y.

🔄 Order matters in composition

  • f(g(x)) is usually different from g(f(x)).
  • Example with f(x) = sin x and g(x) = x²:
    • f(g(x)) = sin(x²): square first, then take sine
    • g(f(x)) = (sin x)² (often written sin²x): take sine first, then square
  • These produce completely different functions and graphs.
  • Don't confuse: sin x² means sin(x²), never (sin x)².

🧮 Calculator analogy

  • On a calculator: input x, push the "g" button, then push the "f" button.
  • This performs the operations in sequence: x → g(x) → f(g(x)).
  • The squaring and sine functions are used in that order to create sin(x²).

🎯 The chain rule mechanics

🎯 The fundamental formula

Chain Rule: If z = f(g(x)), then dz/dx = (dz/dy) × (dy/dx) = f'(g(x)) × g'(x).

  • The derivative of the composite is the product of two derivatives.
  • First factor: derivative of the outside function f, evaluated at y = g(x).
  • Second factor: derivative of the inside function g, evaluated at x.
  • Key insight: Δz/Δx = (Δz/Δy) × (Δy/Δx) becomes exact in the limit.

⚠️ Critical evaluation point

  • The most common mistake: evaluating the outer derivative at the wrong place.
  • The derivative of sin(x²) is NOT (cos x)(2x).
  • Correct: (cos x²)(2x)—the cosine is evaluated at x², not at x.
  • General pattern: derivative of f(g(x)) requires f'(y) at y = g(x), not f'(x).

🔢 The power rule as a special case

  • For z = [g(x)]ⁿ, the chain rule gives: dz/dx = n[g(x)]^(n-1) × g'(x).
  • Example: derivative of (x² - 1)^(1/2) is (1/2)(x² - 1)^(-1/2) × (2x) = x / √(x² - 1).
  • This extends the simple power rule to any function raised to a power.

🔍 Recognizing and applying chains

🔍 Identifying inside and outside functions

  • Look at the function for a moment to see the structure.
  • Example: (x³ + 1)⁵ is u⁵ where u = x³ + 1 (inside function).
  • Example: cos(2x + 1) is cos u where u = 2x + 1.
  • Example: sin √(1 - x) involves three functions: z = sin y, y = √u, u = 1 - x.

📝 Two calculation approaches

Careful way: Write down all functions explicitly:

  • z = cos u, u = 2x + 1
  • dz/dx = (-sin u)(2) = -2 sin(2x + 1)

Quick way: Keep "the derivative of what's inside" in mind:

  • Derivative of cos(2x + 1) is -sin(2x + 1), times 2 from the chain rule.

🎲 Common examples with solutions

| Function z | Inside y or u | Outside f | Derivative dz/dx |
| --- | --- | --- | --- |
| sin(x²) | x² | sin y | (cos x²)(2x) |
| (sin x)³ | sin x | y³ | 3(sin x)² cos x |
| sin(3x) | 3x | sin y | 3 cos(3x) |
| (1 - x)² | 1 - x | y² | 2(1 - x)(-1) |
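
The derivatives in this table can be compared against a central-difference estimate. A minimal sketch, assuming the test point x = 0.7 and step h = 1e-6 (both arbitrary); `rows`, `central_difference`, and `gaps` are hypothetical names.

```python
import math

# (label, function z(x), chain-rule derivative from the table)
rows = [
    ("sin(x^2)", lambda x: math.sin(x ** 2), lambda x: math.cos(x ** 2) * 2 * x),
    ("(sin x)^3", lambda x: math.sin(x) ** 3, lambda x: 3 * math.sin(x) ** 2 * math.cos(x)),
    ("sin(3x)", lambda x: math.sin(3 * x), lambda x: 3 * math.cos(3 * x)),
    ("(1 - x)^2", lambda x: (1 - x) ** 2, lambda x: 2 * (1 - x) * (-1)),
]

def central_difference(f, x, h=1e-6):
    """Estimate f'(x) from values on both sides of x."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.7
gaps = {label: abs(central_difference(f, x0) - df(x0)) for label, f, df in rows}
wrong_gap = abs(central_difference(rows[0][1], x0) - math.cos(x0) * 2 * x0)
print(gaps, wrong_gap)
```

The last line measures the common mistake (cos x)(2x) for sin(x²): its gap from the numerical slope is visibly large, while the correct formulas agree closely.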

⚡ Speed interpretation

  • Example: z = sin(3t) oscillates three times as fast as sin t.
  • The function sin(3t) completes a full wave at time 2π/3 instead of 2π.
  • The velocity naturally contains the extra factor 3 from the chain rule.

🧩 Special cases and extensions

🧩 Triple chains

  • Some functions involve three compositions: z = f(y), y = g(u), u = h(x).
  • Example: sin √(1 - x) chains z = sin y, y = √u, u = 1 - x.
  • Derivative: (cos √(1 - x)) × (1/(2√(1 - x))) × (-1) = -cos √(1 - x) / (2√(1 - x)), with y and u written back in terms of x.
  • Multiply all three derivatives in sequence.

🔁 Iteration and self-composition

  • Newton's method composes F(x) with itself: x_(n+1) = F(x_n).
  • Example: F(x) = (1/2)x + 4 gives F(F(x)) = (1/4)x + 6.
  • Derivative of F(x) is 1/2; derivative of F(F(x)) is 1/4 = (1/2) × (1/2).

📐 Second derivatives

  • The chain rule gives dz/dx as a product, so the second derivative needs the product rule.
  • Formula: d²z/dx² = (dz/dy)(d²y/dx²) + (d/dx)[dz/dy] × (dy/dx).
  • The last term requires the chain rule again: becomes (d²z/dy²) × (dy/dx)².
  • Example: derivative of sin(x²) is 2x cos(x²); second derivative is 2 cos(x²) - 4x² sin(x²).

🎯 Practical word problem

  • A Buick uses 1/20 gallon per mile; you drive at 60 miles per hour.
  • Gallons per hour = (gallons/mile) × (miles/hour) = (1/20) × 60 = 3.
  • This is the chain rule: (dy/dt) = (dy/dx) × (dx/dt).

🚫 Common pitfalls

🚫 Wrong evaluation points

  • Don't confuse: The derivative of sin(x²) is NOT (cos x)(2x).
  • The cosine must be evaluated at the inside function x², not at x.
  • Always evaluate the outer function's derivative at y = g(x).

🚫 Forgetting the inside derivative

  • The derivative of (sin x)³ is 3(sin x)², times cos x.
  • That extra factor cos x (the derivative of sin x) is easy to forget.
  • The derivative of (1 - x)² is 2(1 - x), times (-1).

🚫 Confusing composition with products

  • f(g(x)) is composition; f(x)g(x) is multiplication—completely different.
  • Example: f(x) = x⁴, g(x) = x³ gives f(g(x)) = x¹² but f(x)g(x) = x⁷.
  • Use the chain rule for composition, the product rule for multiplication.

🚫 Identity and inverse special cases

  • Identity function: f(x) = x and g(x) = x gives f(g(x)) = x (derivative is 1 × 1 = 1).
  • Inverse functions: If g adds 5 and f subtracts 5, then f(g(x)) = x (output equals input).

28

Implicit Differentiation and Related Rates

4.2 Implicit Differentiation and Related Rates

🧭 Overview

🧠 One-sentence thesis

Implicit differentiation allows us to find derivatives of functions that cannot be solved explicitly for y, and related rates problems use the chain rule to connect the rates of change of different quantities through their relationships.

📌 Key points (3–5)

  • Implicit differentiation (ID): differentiate equations directly without solving for y, using the chain rule to include dy/dx wherever y appears.
  • When to use ID: equations like y⁵ + xy = 3 or sin y + sin x = 1 cannot be solved explicitly for y, but ID still finds dy/dx at any point.
  • Related rates setup: given dg/dt, find df/dt by identifying the relationship between f and g, then apply the chain rule: df/dt = (df/dg)(dg/dt).
  • Common confusion: in related rates, substitute known values after differentiating, not before—differentiating a constant gives zero and loses information.
  • Why it matters: ID reveals slopes and tangent lines for curves defined implicitly; related rates solve real-world problems where one changing quantity affects another.

🔍 Core concept: Implicit differentiation

🔍 What implicit differentiation means

Implicit differentiation: differentiating an equation F(x, y) = 0 directly by the chain rule, without solving for y in terms of x.

  • Many equations cannot be solved for y explicitly (e.g., y⁵ + xy = 3—Galois proved no solution formula exists for fifth-degree equations).
  • The function y(x) is defined implicitly by the equation, not given as an explicit formula.
  • ID works by differentiating every term with respect to x, including dy/dx from the chain rule wherever y appears.

🧮 How to perform ID

Step-by-step process:

  1. Differentiate every term in the equation with respect to x.
  2. When differentiating terms containing y, include dy/dx (from the chain rule).
  3. Collect all dy/dx terms on one side.
  4. Solve algebraically for dy/dx.
  5. Substitute the specific point (x, y) if needed.

Example: For y⁵ + xy = 3:

  • Differentiate: 5y⁴(dy/dx) + x(dy/dx) + y = 0
  • At point (2, 1): 5(dy/dx) + 2(dy/dx) + 1 = 0
  • Solve: dy/dx = -1/7
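
Although y⁵ + xy = 3 has no explicit solution formula, y can be computed numerically for each x, which gives an independent check of the slope. This sketch assumes x > 0 and searches 0 ≤ y ≤ 2, where the left side increases with y; `solve_y` and `implicit_slope` are hypothetical names.

```python
def solve_y(x, iterations=100):
    """Bisection for y**5 + x*y = 3 on 0 <= y <= 2 (assumes x > 0)."""
    lo, hi = 0.0, 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid ** 5 + x * mid - 3.0 < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def implicit_slope(x, y):
    """dy/dx from 5y^4 y' + x y' + y = 0."""
    return -y / (5 * y ** 4 + x)

h = 1e-5
numeric_slope = (solve_y(2 + h) - solve_y(2 - h)) / (2 * h)
formula_slope = implicit_slope(2.0, solve_y(2.0))
print(numeric_slope, formula_slope)
```

Both values should be close to -1/7, the result obtained by implicit differentiation at (2, 1).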

🔄 Verification approaches

The excerpt shows two ways to check ID results:

  • Method 1: If possible, solve for y explicitly and differentiate normally—should match ID result.
  • Method 2: Solve for x instead (e.g., x = 3/y - y⁴), differentiate to get 1 = ... dy/dx, then solve for dy/dx.

Example: For xy = 2, ID gives x(dy/dx) + y = 0, so dy/dx = -y/x. Solving explicitly gives y = 2/x, so dy/dx = -2/x² = -y/x ✓

🔁 Second derivatives and geometric applications

🔁 Finding second derivatives implicitly

After finding dy/dx by ID, differentiate again to get d²y/dx²:

  • Use the quotient rule or product rule on the dy/dx expression.
  • Remember dy/dx itself contains y, so apply the chain rule again.

Example: Circle x² + y² = 25

  • First derivative: 2x + 2y(dy/dx) = 0 → dy/dx = -x/y
  • Second derivative: d²y/dx² = -(y - x(dy/dx))/y² = -(y² + x²)/y³

📐 Geometric interpretation: perpendicular lines

For the circle x² + y² = 25:

  • Radius slope: y/x (goes across x, up y)
  • Tangent slope: -x/y (goes across -y, up x)
  • Product: (-x/y)(y/x) = -1, confirming perpendicularity

The second derivative -(y² + x²)/y³ is negative at the top of the circle, confirming it is concave down.

⏱️ Related rates problems

⏱️ What related rates means

Related rates: problems where you are given the rate of change of one quantity (dg/dt) and must find the rate of change of another quantity (df/dt) using a relationship between them.

  • The chain rule connects the rates: df/dt = (df/dg)(dg/dt)
  • The variable is typically t (time) because these are real-world applications.
  • You must identify or derive the relationship between f and g from the problem setup.

🎯 Three-step solution method

Standard approach emphasized in the excerpt:

  1. Write down a relation from the figure or problem description.
  2. Take its derivative with respect to t (using implicit differentiation).
  3. Substitute known information after differentiating.

Critical warning: Don't confuse: substitute values after differentiating, not before. If you substitute y = 50 into tan θ = y/100 to get tan θ = 1/2, then differentiate, you get zero (useless).

🌐 Example: Circle circumference

Simple case showing the pattern:

  • Given: radius growing at dr/dt = 7
  • Relation: C = 2πr
  • Differentiate: dC/dt = 2π(dr/dt)
  • Substitute: dC/dt = 2π(7) = 14π

Surprising implication: To put a rope around Earth that any 7-footer can walk under (adding 7 feet to radius), you only need 14π ≈ 44 feet more rope, regardless of Earth's size.

📊 Complex related rates examples

📊 Rectangle with changing sides

Problem: A rectangle's sides x and y change so that its diagonal z grows at dz/dt = 1 and dx/dt = 3(dy/dt). At x = 4, y = 3, find dx/dt.

Solution:

  • Relation: x² + y² = z² (diagonal)
  • Differentiate: 2x(dx/dt) + 2y(dy/dt) = 2z(dz/dt)
  • Substitute x = 4, y = 3, z = 5, dz/dt = 1: 8(dx/dt) + 6(dy/dt) = 10
  • Use dx/dt = 3(dy/dt): 6(dy/dt) = 2(dx/dt), so 10(dx/dt) = 10
  • Answer: dx/dt = 1

🚶 Shadow lengthening problem

Problem: Person 2 meters tall walks from streetlight 8 meters high. Shadow lengthens at ds/dt = 4/9 meters/second. How fast is the person walking?

Solution:

  • Draw a figure to identify variables (this was emphasized as hard—took three tries).
  • Relation from similar triangles: x/6 = s/2
  • Differentiate: dx/dt = (6/2)(ds/dt) = 3(4/9) = 4/3 meters/second
  • Note: Never needed to know actual values of x, s, or the angle.

🎈 Balloon rising problem

Problem: Balloon rises at dy/dt = 3 m/s from point C, observer at A is 100 meters from C.

Three parts:

  • (a) Rate z changes when y = 50: z² = y² + 100² → 2z(dz/dt) = 2y(dy/dt) → dz/dt = (2·50·3)/(2·50√5) = 3√5/5
  • (b) Rate area changes: A = (1/2)(100)(y) = 50y → dA/dt = 50(dy/dt) = 150
  • (c) Rate angle θ changes: tan θ = y/100 → sec²θ(dθ/dt) = (1/100)(dy/dt) → dθ/dt = 3/125

Key insight: Substitute y = 50 after differentiating tan θ = y/100. Substituting first gives tan θ = 1/2 (constant), whose derivative is zero.
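
The three rates can be checked by treating y, z, A, and θ as functions of time and differencing them. A minimal sketch, assuming the balloon starts at height 0 at t = 0 so that y = 3t; the function names are hypothetical.

```python
import math

RISE_RATE = 3.0     # dy/dt in m/s
BASE = 100.0        # distance from observer A to launch point C, in m

def height(t):
    return RISE_RATE * t

def distance(t):
    return math.hypot(height(t), BASE)      # z = sqrt(y^2 + 100^2)

def area(t):
    return 0.5 * BASE * height(t)           # A = 50y

def angle(t):
    return math.atan2(height(t), BASE)      # tan(theta) = y/100

def rate(quantity, t, h=1e-4):
    """Central-difference estimate of d(quantity)/dt."""
    return (quantity(t + h) - quantity(t - h)) / (2 * h)

t50 = 50.0 / RISE_RATE                      # the moment when y = 50
dz_dt, dA_dt, dtheta_dt = rate(distance, t50), rate(area, t50), rate(angle, t50)
print(dz_dt, dA_dt, dtheta_dt)
```

The relations are differentiated as functions of t first and only then evaluated at the moment y = 50, which mirrors the substitute-after-differentiating rule.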

⚡ Speed-of-light paradox

⚡ The paradox setup

Lighthouse problem: Light at A turns once per second (dθ/dt = 2π radians/second). How fast does point B move up shoreline?

  • Relation: y = 100 tan θ
  • Speed: dy/dt = 100 sec²θ (dθ/dt) = 200π sec²θ

Paradox: As θ approaches 90°, sec θ → ∞, so dy/dt → ∞. Point B moves faster than light c, contradicting relativity!

⚡ Resolution: light travel time

The error: we forgot light takes time to reach B.

  • Arrival time: t = θ/(2π) + z/c (rotation time + travel time)
  • Differentiate: dθ/dt = 2π(1 - (dz/dt)/c)
  • Combine with z' = y' sin θ and y = 100 tan θ
  • Result: y' = 200πc/(c cos²θ + 200π sin θ)

As θ → 90°, cos²θ → 0 and sin θ → 1, so this approaches 200πc/(200π) = c, not infinity. Once light travel time is included, the speed of the arrival point stays finite, and no physical object or information moves faster than light.
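
The two speed formulas can be compared numerically. This sketch assumes the distance 100 is in meters and uses c = 299 792 458 m/s; `naive_speed` and `corrected_speed` are hypothetical names.

```python
import math

C = 299_792_458.0           # speed of light in m/s (units are an assumption)
K = 100.0 * 2 * math.pi     # 200*pi, from y = 100 tan(theta) and one turn per second

def naive_speed(theta):
    """dy/dt = 200*pi*sec^2(theta), ignoring light travel time."""
    return K / math.cos(theta) ** 2

def corrected_speed(theta):
    """y' = 200*pi*c / (c cos^2(theta) + 200*pi sin(theta))."""
    return K * C / (C * math.cos(theta) ** 2 + K * math.sin(theta))

near_90 = math.radians(89.999)
print(naive_speed(near_90) / C, corrected_speed(near_90) / C)
print(corrected_speed(math.pi / 2) / C)
```

Near 90° the naive formula exceeds c by a large factor, while the corrected formula stays just below c and its value at θ = π/2 is c up to rounding.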

⚠️ Small paradox about y and z

Apparent contradiction: From y = z sin θ, it looks like dy/dt = (dz/dt) sin θ. But the exact opposite is true: dz/dt = (dy/dt) sin θ.

Explanation: The angle θ is also changing with time. Correct differentiation: z² = y² + 100² → z(dz/dt) = y(dy/dt) → dz/dt = (y/z)(dy/dt) = (dy/dt) sin θ.

Don't confuse: you cannot treat θ as constant when it is actually changing—this is why we must differentiate the full relationship, not just substitute.

29

Inverse Functions and Their Derivatives

4.3 Inverse Functions and Their Derivatives

🧭 Overview

🧠 One-sentence thesis

Inverse functions undo each other's operations, and their derivatives obey the fundamental rule that (dx/dy)(dy/dx) = 1, which follows directly from the chain rule applied to the identity f(g(x)) = x.

📌 Key points (3–5)

  • What inverse functions do: If g(x) = y, then the inverse g⁻¹(y) = x; one function undoes what the other does.
  • The derivative rule: The slope of an inverse function is the reciprocal of the original function's slope: dx/dy = 1/(dy/dx).
  • Domain and range swap: The domain of g matches the range of g⁻¹, and vice versa; inputs to one are outputs from the other.
  • Common confusion: Not all functions have inverses—only functions that are steadily increasing or decreasing (one x for each y) can be inverted.
  • How to find inverses: Solve the equation y = g(x) for x to get x = g⁻¹(y); inverting a chain h(g(x)) requires reversing the order: g⁻¹(h⁻¹(z)).

🔄 What inverse functions are

🔄 The core relationship

Inverse functions: If y = g(x) then x = g⁻¹(y); if x = g⁻¹(y) then y = g(x).

  • The defining equation is f(g(x)) = x.
  • Start with any input x, compute y = g(x), then compute f(y), and you must get back x.
  • What one function does, the inverse undoes.
  • Example: If g(x) = x - 2 and f(y) = y + 2, then starting from x = 5 gives y = 3, and f(3) = 5.
  • The notation g⁻¹ is pronounced "g inverse" and is not the same as 1/g(x).

🔁 Symmetry in both directions

  • The relationship is completely symmetric: if f is the inverse of g, then g is the inverse of f.
  • Both compositions return to the start: g⁻¹(g(x)) = x and g(g⁻¹(y)) = y.
  • Example: For y = √x and x = y², the square root of y² is y, and the square of √x is x.

🎯 Domain and range requirements

🎯 The matching rule

  • The domain of a function matches the range of its inverse.
  • The inputs to g⁻¹ are the outputs from g; the inputs to g are the outputs from g⁻¹.
  • Example: For y = √x, the domain is x ≥ 0 (nonnegative inputs), which matches the range of x = y² when y ≥ 0.

⚠️ When inverses don't exist

  • Not all functions have inverses.
  • For each y, the equation g(x) = y must produce only one x.
  • If two points x₁ and x₂ give g(x₁) = g(x₂), then g has no inverse—because g⁻¹(y) cannot equal both x₁ and x₂.
  • Example: sin(x) = 1/2 has two solutions on the interval 0 ≤ x ≤ π (x = π/6 and x = 5π/6), so sine is not invertible over that full interval.
  • To be invertible over an interval, g must be steadily increasing or steadily decreasing.

🧪 Testing for invertibility

  • Horizontal line test: If no horizontal line touches the graph twice, then f(x) is invertible (one x for each y).
  • Don't confuse with the vertical line test (which checks if something is a function at all).

📐 The derivative of inverse functions

📐 The fundamental rule

From f(g(x)) = x, the chain rule gives:

(dx/dy)(dy/dx) = 1 or equivalently dx/dy = 1/(dy/dx)

  • The slope of x = g⁻¹(y) times the slope of y = g(x) equals one.
  • This comes from differentiating both sides of f(g(x)) = x, which gives 1.
  • Example: For y = x³ and x = y^(1/3), we have dy/dx = 3x² and dx/dy = (1/3)y^(-2/3) = 1/(3x²).

🔍 Why the rule works

  • The chain rule applied to f(g(x)) = x gives f'(g(x)) · g'(x) = 1.
  • Writing y = g(x) and x = f(y), this becomes (dx/dy)(dy/dx) = 1.
  • This is not ordinary algebra with fractions, but it is true—the derivatives are limits of fractions Δx/Δy and Δy/Δx.
  • Example: For y = 3x, the slopes dy/dx = 3 and dx/dy = 1/3 multiply to give 1.

📊 Computing derivatives two ways

You can find dx/dy either:

  • Directly: differentiate x = g⁻¹(y) with respect to y.
  • Indirectly: compute 1/(dy/dx) using the original function.

Example: For y = x³ and x = y^(1/3):

  • Direct: dx/dy = (1/3)y^(-2/3)
  • Indirect: dx/dy = 1/(dy/dx) = 1/(3x²) = 1/(3y^(2/3))

Both give the same answer.
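
The two routes to dx/dy can be compared at a sample point, together with a numerical slope of the inverse. A minimal sketch, assuming x = 2 (so y = 8) and y > 0 so the fractional power is real; the names are hypothetical.

```python
def g(x):
    return x ** 3

def g_inverse(y):
    return y ** (1.0 / 3.0)      # cube root, valid here for y > 0

x = 2.0
y = g(x)

direct = (1.0 / 3.0) * y ** (-2.0 / 3.0)    # differentiate x = y^(1/3)
indirect = 1.0 / (3.0 * x ** 2)             # reciprocal of dy/dx = 3x^2

h = 1e-6
numeric = (g_inverse(y + h) - g_inverse(y - h)) / (2 * h)
print(direct, indirect, numeric)
```

All three numbers should agree, since 1/(3 · 2²) and (1/3) · 8^(-2/3) are both 1/12.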

📈 Graphs of inverse functions

📈 The mirror-image property

  • The graph of x = g⁻¹(y) is the mirror image of the graph of y = g(x).
  • The reflection is across the 45° line (the line y = x).
  • If point (2, 6) is on the graph of y = g(x), then point (6, 2) is on the graph of x = g⁻¹(y).
  • The same pairs (x, y) appear on both graphs, but with roles swapped.

🔄 Why the reflection works

  • The graph of y = g(x) shows x across (horizontal) and y up (vertical).
  • The graph of x = g⁻¹(y) would naturally show y across and x up—which is the same picture turned across the 45° line.
  • Don't confuse: some books graph y = g⁻¹(x), but the excerpt emphasizes x = g⁻¹(y) as the correct inverse notation.

🔗 Inverting chains of functions

🔗 The reversal rule

The inverse of z = h(g(x)) is a chain of inverses in the opposite order: x = g⁻¹(h⁻¹(z)).

  • h⁻¹ is applied first because h was applied last.
  • The key equation: g⁻¹(h⁻¹(h(g(x)))) = x.
  • In the middle, h⁻¹ and h cancel, leaving g⁻¹(g(x)) = x.

🧩 Why order matters

  • Example: For z = 3(x - 2), we have g(x) = x - 2 and h(y) = 3y.
  • To invert: first divide by 3 (apply h⁻¹), then add 2 (apply g⁻¹).
  • The inverse is x = (z/3) + 2, which is g⁻¹(h⁻¹(z)).
  • The inverse of h ∘ g is g⁻¹ ∘ h⁻¹ (opposite order).
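
The reversal rule for z = 3(x - 2) can be written out directly. A minimal sketch; the function names are hypothetical.

```python
def g(x):
    return x - 2            # inside function: subtract 2

def h(y):
    return 3 * y            # outside function: multiply by 3

def g_inverse(y):
    return y + 2

def h_inverse(z):
    return z / 3

def chain(x):
    return h(g(x))                      # z = 3(x - 2)

def chain_inverse(z):
    return g_inverse(h_inverse(z))      # undo h first, then undo g

def wrong_order(z):
    return h_inverse(g_inverse(z))      # inverses applied in the original order

print(chain(5), chain_inverse(chain(5)), wrong_order(chain(5)))
```

Only `chain_inverse` returns the original input; `wrong_order` adds 2 before dividing by 3 and lands somewhere else.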

🔢 Practical example

For z = √(x - 2):

  • Write z² = x - 2, then x = z² + 2.
  • The inverse adds 2 and takes the square—but not in that order.
  • Correct: z² + 2 (square first, then add).
  • Wrong: (z + 2)² (this would add first, then square).

🌟 Important examples

🌟 Temperature conversion

  • Fahrenheit to Celsius: y = (5/9)(x - 32)
  • Celsius to Fahrenheit: x = (9/5)y + 32
  • These are inverse functions: y = (5/9)(x - 32) subtracts 32 first and then multiplies by 5/9; x = (9/5)y + 32 multiplies by 9/5 first and adds 32 last.
  • The slopes dy/dx = 5/9 and dx/dy = 9/5 multiply to give 1.

🌟 Exponentials and logarithms (preview)

  • If y = 2^x then x = log₂(y).
  • What the exponential does, the logarithm undoes.
  • The logarithm of 2^x is the exponent x.
  • The excerpt mentions these are "overwhelmingly important" and will get a full chapter later.
  • Natural logarithms use base e: y = e^x is the inverse of x = ln(y).
30

Inverses of Trigonometric Functions

4.4 Inverses of Trigonometric Functions

🧭 Overview

🧠 One-sentence thesis

Inverting trigonometric functions requires restricting their domains to intervals where they are one-to-one, and the chain rule reveals that the derivative of each inverse function is the reciprocal of the original function's slope expressed in terms of the output variable.

📌 Key points (3–5)

  • Why domain restriction matters: Sine, cosine, and tangent repeat infinitely, so we must choose one piece of the curve where each output comes from exactly one input to make the function invertible.
  • How to find derivatives of inverse trig functions: Use the chain rule: the slope of the inverse function equals 1 divided by the slope of the original function, then rewrite in terms of the new variable.
  • The three core derivative formulas: The derivatives of inverse sine, inverse tangent, and inverse secant are the building blocks; the other three (inverse cosine, inverse cotangent, inverse cosecant) differ only by a minus sign.
  • Common confusion—cofunctions: Inverse sine and inverse cosine are "cofunctions" that always add to π/2 (a right angle), so their derivatives add to zero; the same pattern holds for the other pairs.
  • Why these matter in calculus: These derivatives provide new velocity–distance pairs (e.g., velocity 1/√(1 − t²) corresponds to distance sin⁻¹ t) and answer the question "What function has derivative 1/(1 + x²)?" (answer: tan⁻¹ x).

🔄 Why we restrict domains

🔄 The problem with unrestricted trig functions

  • Trigonometric functions like sine go up and down infinitely often.
  • If we allow the whole sine curve, infinitely many angles would satisfy sin x = 0.
  • The sine function could not have an inverse because one output y would correspond to many inputs x.

✂️ Choosing the right interval

Domain restriction: To make a trigonometric function invertible, we select an interval where the function is steadily increasing (or steadily decreasing), so each output comes from exactly one input.

Function  Restricted domain for x         Range for y  Behavior on interval
sin x     −π/2 ≤ x ≤ π/2                  −1 ≤ y ≤ 1   Increasing
cos x     0 ≤ x ≤ π                       −1 ≤ y ≤ 1   Decreasing
tan x     −π/2 < x < π/2 (open interval)  All y        Increasing
sec x     0 ≤ x ≤ π, x ≠ π/2              |y| ≥ 1      Increasing (with gap at x = π/2)
  • Don't confuse: The restricted domain is for the original function y = sin x; the inverse function x = sin⁻¹ y swaps the roles, so y becomes the input (between −1 and 1) and x becomes the output (between −π/2 and π/2).

📐 Inverse sine and inverse cosine

📐 Definition and notation

x = sin⁻¹ y means that y = sin x and |x| ≤ π/2.

  • The inverse sine starts with a number y between −1 and 1 and produces an angle x (the angle whose sine is y).
  • Historically called "arc sine" (arcsin in computing); the notation sin⁻¹ has nothing to do with 1/sin x.
  • Example: The 30° angle x = π/6 has sine y = 1/2, so sin⁻¹(1/2) = π/6.

🔁 Round-trip identities

  • sin⁻¹(sin x) = x for −π/2 ≤ x ≤ π/2
  • sin(sin⁻¹ y) = y for −1 ≤ y ≤ 1
  • Similarly for cosine: cos⁻¹(cos x) = x for 0 ≤ x ≤ π, and cos(cos⁻¹ y) = y for −1 ≤ y ≤ 1.
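
A quick numerical sketch shows why the domain restriction in the first identity matters. It uses Python's standard math.asin, which returns the angle in [−π/2, π/2]; the sample angles 0.5 and 2.5 are arbitrary choices for illustration.

```python
import math

inside = 0.5      # lies in [-pi/2, pi/2]
outside = 2.5     # lies outside the restricted domain

# Inside the restricted interval the round trip returns the same angle.
round_trip_inside = math.asin(math.sin(inside))

# Outside it, asin returns a different angle with the same sine (pi - 2.5).
round_trip_outside = math.asin(math.sin(outside))

# The other direction works for every y between -1 and 1.
back_to_y = math.sin(math.asin(0.3))

print(round_trip_inside, round_trip_outside, back_to_y)
```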

🧮 Key formula: cosine in terms of sine

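If sin x = y and −π/2 ≤ x ≤ π/2, then cos x = √(1 − y²).

  • This comes from cos² x = 1 − sin² x; the positive square root applies because cos x ≥ 0 on the restricted interval.
  • Example: If y = 1/2, then x = sin⁻¹(1/2) = π/6 and cos x = √(1 − 1/4) = √3/2.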
  • Why it matters: This formula is crucial for computing derivatives.

📉 Derivative of inverse sine

Using the chain rule:

  • y = sin x gives dy/dx = cos x
  • So dx/dy = 1/cos x = 1/√(1 − y²)

Derivative of sin⁻¹ y: dx/dy = 1/√(1 − y²)

  • This gives a new velocity–distance pair: velocity v(t) = 1/√(1 − t²) corresponds to distance f(t) = sin⁻¹ t.
  • Extreme case: At y = 1 (where x = π/2), the slope is infinite (1/0) because the graph of y = sin x is horizontal there (slope zero), so its mirror image is vertical.
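
The formula can be checked numerically. The sketch below compares a central-difference slope of math.asin with 1/√(1 − y²); the helper names, the step size 1e-6, and the sample points are assumptions made for illustration.

```python
import math

def central_difference(f, y, step=1e-6):
    """Approximate the slope of f at y from two nearby points."""
    return (f(y + step) - f(y - step)) / (2 * step)

def asin_slope(y):
    """Formula from the text: dx/dy = 1/sqrt(1 - y^2)."""
    return 1 / math.sqrt(1 - y * y)

samples = [-0.9, -0.5, 0.0, 0.5, 0.9]
pairs = [(central_difference(math.asin, y), asin_slope(y)) for y in samples]

for y, (numeric, formula) in zip(samples, pairs):
    print(f"y = {y:+.1f}  numeric = {numeric:.6f}  formula = {formula:.6f}")

# The slope grows without bound as y approaches 1.
steep = asin_slope(0.999999)
print(steep)
```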

🔻 Derivative of inverse cosine

The inverse cosine and inverse sine are cofunctions:

Cofunction identity: cos⁻¹ y + sin⁻¹ y = π/2

  • The sum is constant (π/2), so its derivative is zero.
  • Therefore the derivatives of cos⁻¹ y and sin⁻¹ y must add to zero (opposite signs).

Derivative of cos⁻¹ y: dx/dy = −1/√(1 − y²)

  • Don't confuse: Two functions can have the same derivative if they differ by a constant; here sin⁻¹ y = −cos⁻¹ y + C with C = π/2.

📊 Inverse tangent and its derivative

📊 Definition and domain

x = tan⁻¹ y means that y = tan x and −π/2 < x < π/2 (open interval).

  • The tangent can be any number (all real y), but the inverse tangent produces an angle between −π/2 and π/2.
  • The interval is "open" because the endpoints are not included (the tangents of ±π/2 are not defined).
  • Not a ratio: The inverse tangent is not sin⁻¹ y / cos⁻¹ y; it is the angle whose tangent is y.

🧮 Derivative of inverse tangent

The slope of y = tan x is dy/dx = sec² x.

  • By the chain rule: dx/dy = 1/sec² x = 1/(1 + tan² x) = 1/(1 + y²)

Derivative of tan⁻¹ y: dx/dy = 1/(1 + y²)

  • Important application: What function has derivative 1/(1 + x²)? Answer: f(x) = tan⁻¹ x (just change letters).
  • Example: The tangent of x = π/4 is y = 1. On the inverse tangent curve, dx/dy = 1/(1 + 1) = 1/2. On the tangent curve, dy/dx = sec²(π/4) = 2. The slopes multiply to give 1, as required by the chain rule.
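
The π/4 example can be reproduced directly. This is a small sketch using only the standard math module; the variable names are illustrative.

```python
import math

x = math.pi / 4
y = math.tan(x)                   # tangent of pi/4 is 1

dy_dx = 1 / math.cos(x) ** 2      # sec^2(x), the slope of the tangent curve
dx_dy = 1 / (1 + y * y)           # slope of the inverse tangent curve at y

product = dy_dx * dx_dy           # the two slopes should multiply to 1
print(y, dy_dx, dx_dy, product)
```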

🔢 The remaining three inverse functions

🔢 Inverse cotangent, secant, and cosecant

The idea is to use 1/(dy/dx) for y = cot x, y = sec x, and y = csc x.

Inverse function  Derivative         Notes
cot⁻¹ y           −1/(1 + y²)        Same as tan⁻¹ y but with minus sign
sec⁻¹ y           1/(|y|√(y² − 1))   Requires |y| ≥ 1; absolute value keeps the slope positive
csc⁻¹ y           −1/(|y|√(y² − 1))  Same as sec⁻¹ y but with minus sign

🔗 Cofunction pairs and minus signs

Each inverse function and its "cofunction" add to π/2, so their derivatives add to zero:

  • sin⁻¹ y + cos⁻¹ y = π/2 → derivatives differ by minus sign
  • tan⁻¹ y + cot⁻¹ y = π/2 → derivatives differ by minus sign
  • sec⁻¹ y + csc⁻¹ y = π/2 → derivatives differ by minus sign

Only three derivatives to learn: The other three just have minus signs.

📝 Note on inverse secant

  • When y is negative, there is a choice for x = sec⁻¹ y.
  • The excerpt selects the angle in the second quadrant (between π/2 and π) so that sec⁻¹ y = cos⁻¹(1/y), matching sec x = 1/cos x.
  • This choice makes sec⁻¹ y an increasing function (cos⁻¹ y is decreasing), requiring the absolute value |y| in the derivative.
  • Don't confuse: Some tables make a different choice (third quadrant), which omits the absolute value.
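
Under the convention sec⁻¹ y = cos⁻¹(1/y) described above, the role of |y| can be checked numerically. The helper names arcsec, arcsec_slope, and central_difference are hypothetical, and the sample points are arbitrary values with |y| > 1.

```python
import math

def arcsec(y):
    """Inverse secant under the convention sec^-1 y = cos^-1(1/y)."""
    return math.acos(1 / y)

def arcsec_slope(y):
    """Formula from the table: 1/(|y| sqrt(y^2 - 1))."""
    return 1 / (abs(y) * math.sqrt(y * y - 1))

def central_difference(f, y, step=1e-6):
    return (f(y + step) - f(y - step)) / (2 * step)

points = (-3.0, -1.5, 1.5, 3.0)
checks = [(central_difference(arcsec, y), arcsec_slope(y)) for y in points]

for y, (numeric, formula) in zip(points, checks):
    print(f"y = {y:+.1f}  numeric = {numeric:.6f}  formula = {formula:.6f}")

# A negative input lands in the second quadrant, between pi/2 and pi.
second_quadrant_angle = arcsec(-2.0)
```

The numerical slope is positive for negative y as well, which is what the absolute value in the formula encodes.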

🚫 Domain gaps

  • For sec⁻¹ y and csc⁻¹ y, the input must satisfy |y| ≥ 1 (the graph misses all points −1 < y < 1).
  • The graph of sec⁻¹ y misses x = π/2 (where cosine is zero, so secant would be 1/0).
  • The graph of csc⁻¹ y misses x = 0 (because csc 0 would be 1/sin 0 = 1/0).

📋 Summary table

📋 Quick reference for all six inverse functions

Function f(y)     Inputs y  Outputs x               Slope dx/dy
sin⁻¹ y, cos⁻¹ y  |y| ≤ 1   [−π/2, π/2], [0, π]     ±1/√(1 − y²)
tan⁻¹ y, cot⁻¹ y  All y     (−π/2, π/2), (0, π)     ±1/(1 + y²)
sec⁻¹ y, csc⁻¹ y  |y| ≥ 1   [0, π]*, [−π/2, π/2]*   ±1/(|y|√(y² − 1))

*Asterisks indicate that certain points (x = π/2 for sec⁻¹, x = 0 for csc⁻¹) are removed.

  • The column of derivatives is what we need and use in calculus.
  • Pattern: Each pair (sin⁻¹ and cos⁻¹, tan⁻¹ and cot⁻¹, sec⁻¹ and csc⁻¹) has the same formula except for sign.
31

The Idea of the Integral

5.1 The Idea of the Integral

🧭 Overview

🧠 One-sentence thesis

Integration solves the problem of adding infinitely many infinitesimal quantities by working backward from derivatives—finding a function f(x) whose derivative is the given v(x)—rather than performing the addition directly.

📌 Key points (3–5)

  • Core idea: Integration is the reverse of differentiation; if v(x) is the derivative of f(x), then f(x) is the integral (antiderivative) of v(x).
  • The fundamental insight: Taking sums reverses taking differences in algebra; taking integrals reverses taking derivatives in calculus.
  • Geometric meaning: The integral represents the area under a curve, computed as the limit of sums of rectangle areas as rectangles become infinitely thin.
  • Common confusion: Don't try to add infinitely many infinitesimals directly—the "whole point of calculus is to offer a better way" by finding the antiderivative.
  • From discrete to continuous: The transition from algebra (finite sums) to calculus (integrals) happens by taking limits as the number of terms approaches infinity and each term becomes infinitesimal.

🔄 The Fundamental Connection Between Sums and Differences

➕ How sums reverse differences (algebra version)

Fundamental Theorem (before limits): If each v_j = f_j − f_(j−1), then v_1 + v_2 + ⋯ + v_n = f_n − f_0.

  • Start with two sets of n numbers: v's (suggesting velocity) and f's (suggesting distance).
  • Going from f to v: Take differences between consecutive f's to get each v.
  • Going from v to f: Add up all the v's to recover the final f (minus the starting f_0).
  • Example from the excerpt: Given f = 1, 3, 6, 10, the differences are v = 1, 2, 3, 4; conversely, summing v = 1, 2, 3, 4 gives back f_4 = 10 (when f_0 = 0).

🔭 The telescoping sum

The sum "telescopes" because intermediate terms cancel:

  • v_1 + v_2 + v_3 + ⋯ = (f_1 − f_0) + (f_2 − f_1) + (f_3 − f_2) + ⋯
  • Every f_1 is canceled by −f_1, every f_2 by −f_2, etc.
  • Only the last term f_n and the starting term −f_0 survive.
  • Why this matters: We can compute a sum by finding the right f's, rather than adding term by term.

🎯 Example: adding odd numbers

  • Question: How to add 1 + 3 + 5 + ⋯ + 99?
  • The odd numbers are differences between squares: 0, 1, 4, 9, 16, …
  • There are 50 odd numbers from 1 to 99.
  • By the Fundamental Theorem, the sum equals (50)² = 2500.
  • The tricky part: Discovering that the f's are squares; in calculus, the tricky part is finding the right f(x).
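
The odd-number example can be verified with a short script. The list names below are illustrative; the squares play the role of the f's and the odd numbers the role of the v's.

```python
odds = list(range(1, 100, 2))            # v's: 1, 3, 5, ..., 99
squares = [j * j for j in range(51)]     # f's: f_0 = 0, f_1 = 1, ..., f_50 = 2500

# Differences of consecutive squares reproduce the odd numbers.
differences = [squares[j] - squares[j - 1] for j in range(1, 51)]

direct_sum = sum(odds)                   # add term by term
telescoped = squares[50] - squares[0]    # f_n - f_0, no term-by-term addition

print(len(odds), direct_sum, telescoped)
```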

📐 From Rectangles to Areas Under Curves

💰 The income example (discrete case)

The excerpt uses a company earning √x million dollars per year after x years.

  • Bar graph approach: In the first 4 years, income rates are √1, √2, √3, √4 million/year.
  • Each rectangle has base = 1 year and height = income rate.
  • Total income = sum of rectangle areas = √1 + √2 + √3 + √4 ≈ 6.15 million.
  • Two perspectives: (1) arithmetic sum of rates × time; (2) geometric sum of rectangle areas.

🔬 Refining the estimate

  • Problem: The bar graph overstates income because the rate changes continuously, not in yearly jumps.
  • Divide time into quarters (16 rectangles): total area ≈ 5.56 million, closer to truth.
  • Divide into weeks (208 rectangles), then days (1460 rectangles), then hours…
  • Each refinement gives thinner rectangles with heights √(Δx), √(2Δx), …, √4.
  • The limit: As Δx → 0, the number of rectangles → ∞, and the sum approaches the true area under the √x curve.
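
The refinement can be reproduced with a right-endpoint sum. This is a minimal sketch: the function name right_endpoint_sum is an assumption, and the rectangle counts are the ones quoted above (years, quarters, weeks, days over 4 years).

```python
import math

def right_endpoint_sum(n, b=4.0):
    """Sum of n rectangles on [0, b] with heights sqrt(dx), sqrt(2 dx), ..., sqrt(b)."""
    dx = b / n
    return sum(math.sqrt(j * dx) for j in range(1, n + 1)) * dx

estimates = {n: right_endpoint_sum(n) for n in (4, 16, 208, 1460)}

for n, total in estimates.items():
    print(f"{n:5d} rectangles: {total:.4f} million")
```

The totals decrease as the rectangles get thinner, because each right-endpoint rectangle overshoots the rising curve by less.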

🎨 The integral as a limit

The area under the curve is the limit of the sum as Δx → 0.

  • This limiting area is called the integral.
  • The discrete sum v_1 + v_2 + ⋯ + v_n becomes a continuous integral.
  • Don't confuse: The integral is not an infinite sum you compute by adding; it's a limit that you find by discovering the antiderivative f(x).

🔑 The Key Idea: Reversing the Derivative

🧮 Algebra vs. calculus comparison

Algebra (finite)                        Calculus (limit)
Compute v_1 + ⋯ + v_n                   Compute limit of Δx[v(Δx) + v(2Δx) + ⋯]
Find f's such that v_j = f_j − f_(j−1)  Find f(x) such that v(x) = df/dx
Sum = f_n − f_0                         Integral = area under v(x) curve

🔍 What integration means

  • Integration is finding an antiderivative: If v(x) is the derivative of some function f(x), then f(x) is the integral of v(x).
  • Example from the excerpt: The integral of v = cos x is f = sin x; the integral of v = x is f = (1/2)x².
  • Why this works: Instead of adding infinitely many infinitesimals, we find f and use the Fundamental Theorem (in its calculus form).

⚠️ The starting point issue

  • In the discrete case, we need f_0 as a starting value.
  • In calculus, this becomes the "constant of integration" C.
  • Don't confuse: Any constant can be added to f(x) because the derivative of a constant is zero; the differences between f's remain unchanged.

🎓 The Transition to Calculus

📊 Optimist vs. pessimist estimates

  • Optimist: Uses the income rate at the end of each period (heights √(Δx), √(2Δx), …, √4).
  • Pessimist: Uses the rate at the beginning of each period (heights 0, √(Δx), …, √(4−Δx)).
  • The optimist always overestimates; the pessimist always underestimates.
  • As time periods shrink, both estimates converge to the same limit: the true area under the √x curve.
  • Key insight: The difference between them is the area of the last rectangle, which shrinks to zero.
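
The gap between the two estimates can be computed directly. In this sketch the function names optimist and pessimist mirror the labels above; the rectangle counts are arbitrary. Since the last rectangle has base Δx and height √4 = 2, the gap should equal 2Δx = 8/n.

```python
import math

def optimist(n, b=4.0):
    """Right-endpoint heights: sqrt(dx), sqrt(2 dx), ..., sqrt(b)."""
    dx = b / n
    return sum(math.sqrt(j * dx) for j in range(1, n + 1)) * dx

def pessimist(n, b=4.0):
    """Left-endpoint heights: 0, sqrt(dx), ..., sqrt(b - dx)."""
    dx = b / n
    return sum(math.sqrt(j * dx) for j in range(0, n)) * dx

gaps = {n: optimist(n) - pessimist(n) for n in (4, 16, 208)}

for n, gap in gaps.items():
    print(f"n = {n:3d}  gap = {gap:.6f}  last rectangle = {8 / n:.6f}")
```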

🌊 The limiting process

  • Years → quarters → weeks → days → hours → seconds → continuous time.
  • The number of rectangles grows: 4 → 16 → 208 → 1460 → …
  • Each rectangle's base Δx shrinks toward zero.
  • The sum Δx · [v(Δx) + v(2Δx) + ⋯] approaches a definite number: the integral.
  • Why limits are essential: This is "the essential difference between algebra and calculus"—we must take the limit "to add up infinitely many infinitesimals."

🧩 The derivative connection

From Problem 15 in the excerpt:

  • Let f(x) = area under the √x curve from 0 to x.
  • The area from 0 to x + Δx is f(x + Δx).
  • The extra area Δf is almost a rectangle with base Δx and height √x.
  • So Δf/Δx is close to √x.
  • As Δx → 0, we suspect df/dx = √x.
  • This confirms: If f(x) is the integral of v(x), then v(x) is the derivative of f(x).
32

Antiderivatives

5.2 Antiderivatives

🧭 Overview

🧠 One-sentence thesis

The antiderivative reverses differentiation to find the area under a curve, enabling exact calculation of integrals through the Fundamental Theorem of Calculus.

📌 Key points (3–5)

  • What an antiderivative is: a function f(x) whose derivative equals the given function v(x); finding it means working backward from derivatives.
  • The integral symbol and its meaning: ∫ (stretched S from Latin "sum") represents the limit of rectangular areas approaching the curved area under a graph.
  • Definite vs indefinite integrals: indefinite integrals are functions f(x) + C (with arbitrary constant C); definite integrals are numbers f(b) − f(a) between limits a and b.
  • Common confusion: the constant C appears in indefinite integrals but cancels when computing definite integrals (the difference f(b) − f(a)).
  • The core technique: to find the antiderivative of x^n, raise the exponent to n+1 and divide by n+1, then adjust to cancel unwanted factors from differentiation.

🔄 Working backward from derivatives

🔄 The antiderivative concept

Antiderivative: a function f(x) such that df/dx = v(x); it reverses the derivative operation.

  • Instead of finding the derivative of a given function, we find a function that produces a given derivative.
  • The excerpt calls this "the opposite of Chapters 2–4" and requires working backwards.
  • Example: since the derivative of x^5 is 5x^4, the antiderivative of x^4 is x^5/5 (divide by 5 to cancel the factor from differentiation).

🧮 The power rule in reverse

General formula: the antiderivative of x^n is x^(n+1)/(n+1), provided n+1 ≠ 0.

Why it works:

  • The derivative lowers the exponent (x^n becomes nx^(n−1)).
  • The antiderivative raises it back (x^n becomes x^(n+1)).
  • Dividing by (n+1) cancels the unwanted factor that differentiation would produce.

Example from the excerpt: for v(x) = √x = x^(1/2):

  • Raise exponent: x^(3/2)
  • The derivative would be (3/2)x^(1/2), so divide by 3/2
  • Result: f(x) = (2/3)x^(3/2)
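
One way to sanity-check the result is to differentiate it numerically and compare with √x. The helper names, the step size, and the sample points below are assumptions for illustration.

```python
import math

def f(x):
    """Candidate antiderivative (2/3) x^(3/2)."""
    return (2 / 3) * x ** 1.5

def v(x):
    """Original function sqrt(x)."""
    return math.sqrt(x)

def central_difference(g, x, step=1e-6):
    return (g(x + step) - g(x - step)) / (2 * step)

checks = [(central_difference(f, x), v(x)) for x in (0.5, 1.0, 2.0, 4.0)]

for numeric, exact in checks:
    print(f"slope of f = {numeric:.6f}   sqrt(x) = {exact:.6f}")
```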

📐 From rectangles to exact areas

📐 The limiting process

The excerpt describes approximating curved areas with rectangles:

  • Rectangles have base Δx and heights v(x) at sample points.
  • As Δx → 0, the sum v₁Δx + v₂Δx + ⋯ approaches the integral ∫v(x)dx.
  • The "dx" in the integral notation indicates that Δx approaches zero.

Two requirements for valid approximation:

  1. The largest width Δx_max must approach zero.
  2. The top of each rectangle must touch or cross the curve.

🎯 Computing exact areas

Once the antiderivative f(x) is known, the area from x = 0 to x = 4 is simply f(4) − f(0).

Example: for v(x) = √x with f(x) = (2/3)x^(3/2):

  • At x = 4: f(4) = (2/3)(8) = 16/3
  • At x = 0: f(0) = 0
  • Total area = 16/3 = 5⅓ million dollars (in the income context)

The excerpt notes this is the exact answer that thousands of rectangles were slowly approaching.

🔀 Definite vs indefinite integrals

🔀 Indefinite integrals (functions)

Indefinite integral: the most general antiderivative, written ∫v(x)dx = f(x) + C, where C is an arbitrary constant.

  • It is a function, not a number.
  • Contains the variable x.
  • Includes constant C because the derivative of any constant is zero.
  • Example: ∫(4 − x)dx = 4x − (1/2)x² + C

All these are antiderivatives of the same function:

  • f(x) = 4x − (1/2)x²
  • f(x) = 4x − (1/2)x² + 1
  • f(x) = 4x − (1/2)x² − 9
  • f(x) = 4x − (1/2)x² + C (general form)

🔢 Definite integrals (numbers)

Definite integral: the area under v(x) between limits a and b, written ∫ₐᵇv(x)dx = f(b) − f(a).

  • It is a number, not a function.
  • Contains no variable x and no arbitrary constant C.
  • Determined by the function v(x) and the endpoints (limits of integration).

Why C cancels, using f(x) = 4x − (1/2)x² + C from the example above:

  • At x = 3: f(3) = 7½ + C
  • At x = 1: f(1) = 3½ + C
  • Difference: f(3) − f(1) = (7½ + C) − (3½ + C) = 4

The constant C appears in both terms and cancels in the subtraction.
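
The cancellation can be seen by evaluating the same difference with several constants. This sketch uses f(x) = 4x − (1/2)x² + C; the particular values of C are the ones listed earlier (0, 1, −9).

```python
def antiderivative(x, C=0.0):
    """f(x) = 4x - (1/2)x^2 + C, an antiderivative of v(x) = 4 - x."""
    return 4 * x - 0.5 * x ** 2 + C

# The definite integral from 1 to 3 for three different constants.
results = [antiderivative(3, C) - antiderivative(1, C) for C in (0.0, 1.0, -9.0)]
print(results)
```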

📊 Comparison table

Feature      Indefinite integral       Definite integral
Result type  Function f(x)             Number
Contains x?  Yes                       No
Contains C?  Yes (arbitrary constant)  No (cancels)
Notation     ∫v(x)dx                   ∫ₐᵇv(x)dx
Example      4x − (1/2)x² + C          8 (from 0 to 4)

🎓 The Fundamental Theorem preview

🎓 Connecting sums and integrals

The excerpt draws a parallel between discrete and continuous cases:

In algebra (discrete):

  • Difference: fⱼ − fⱼ₋₁ = vⱼ
  • Sum: v₁ + ⋯ + vₙ = fₙ − f₀

In calculus (continuous):

  • Derivative: df/dx = v(x)
  • Integral: ∫ₐᵇv(x)dx = f(b) − f(a)

🎓 The Fundamental Theorem statement

The excerpt states (equation 7):

∫ₐᵇv(x)dx = ∫ₐᵇ(df/dx)dx = f(b) − f(a)

What this means:

  • The integral of v(x) equals the difference in its antiderivative f(x) at the endpoints.
  • Integration and differentiation are inverse operations.
  • The excerpt notes the proof comes later but gives the answer by following the analogy.

Don't confuse: this statement is only a preview. The excerpt reaches it by analogy with sums; the proof of the Fundamental Theorem of Calculus comes later.

33

Summation versus Integration

5.3 Summation versus Integration

🧭 Overview

🧠 One-sentence thesis

Summation formulas for discrete sums (like 1 + 2 + ... + n) approach integral formulas for continuous areas as the number of rectangles increases and their width shrinks to zero, with correction terms vanishing in the limit.

📌 Key points (3–5)

  • Sigma notation provides a compact way to write sums: Σ from j=1 to n of v_j means v₁ + v₂ + ... + vₙ
  • Gauss's formula for the sum 1 + 2 + ... + n equals (1/2)n(n+1), and the sum of squares 1² + 2² + ... + n² equals (1/3)n³ plus correction terms
  • Correction terms disappear in limits: discrete sums like (1/3)n³ + (1/2)n² + (1/6)n approach the clean integral result (1/3)n³ as rectangles become infinitely thin
  • Common confusion: summation (discrete) vs integration (continuous)—sums have correction terms that depend on n, but integrals give exact answers without corrections
  • The Fundamental Theorem connection: if f_n - f_(n-1) = v_n for sums, then df/dx = v(x) for integrals; both relate accumulation to rates of change

📝 Sigma notation and dummy variables

📝 What sigma notation means

Summation notation uses the capital Greek letter Σ (sigma) to express sums compactly.

  • Structure: Σ from j=1 to n of v_j means add v₁ + v₂ + ... + vₙ
  • The number below Σ is the lower limit (where to start)
  • The number above Σ is the upper limit (where to stop)
  • Example: Σ from j=1 to 4 of j² = 1² + 2² + 3² + 4² = 30

🔄 Dummy variables have no meaning

  • The letter used for the index (j, k, i, etc.) is a dummy variable
  • It appears only on the Σ side, not in the final numerical answer
  • Σ from j=1 to 4 of j² = Σ from k=1 to 4 of k² = 30 (same result, different letter)
  • Don't confuse: the upper limit n appears on both sides and affects the sum; the dummy variable does not

🔀 Changing limits and variables

When you shift the index, you must change three things consistently:

  • The lower limit
  • The upper limit
  • The expression being summed

Example: Σ from j=101 to 200 of j equals Σ from k=1 to 100 of (k+100)

  • Here j = k + 100, so when j starts at 101, k starts at 1
  • When j ends at 200, k ends at 100
  • The sum 101 + 102 + ... + 200 equals (1+100) + (2+100) + ... + (100+100)
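
The index shift is easy to confirm in code. Note that Python's range excludes its upper bound, so range(101, 201) covers j = 101, ..., 200.

```python
# Original indexing: j runs from 101 to 200.
original = sum(j for j in range(101, 201))

# Shifted indexing: k = j - 100 runs from 1 to 100, and the summand becomes k + 100.
shifted = sum(k + 100 for k in range(1, 101))

print(original, shifted)
```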

🧮 Special summation formulas

🧮 Gauss's sum of integers

The sum 1 + 2 + 3 + ... + n equals (1/2)n(n+1).

Gauss's insight, shown for 1 + 2 + ... + 100: pair first and last terms

  • 1 + 100 = 101
  • 2 + 99 = 101
  • 3 + 98 = 101
  • There are 50 such pairs, so the sum is 50 × 101 = 5050

General formula: (1/2)n(n+1) = (1/2)n² + (1/2)n

  • The leading term is (1/2)n²
  • The correction term is (1/2)n

Example: The sum from 101 to 200 equals (sum to 200) - (sum to 100) = 15,050

🔲 Sum of squares

The sum 1² + 2² + 3² + ... + n² equals (1/3)n³ + (1/2)n² + (1/6)n.

How to find it: guess and verify using mathematical induction

  • Try f_n = (1/3)n³ first (copying from integrals)
  • Check if f_n - f_(n-1) = n²
  • Result: (1/3)n³ - (1/3)(n-1)³ = n² - n + 1/3 (not quite right)
  • Add correction terms (1/2)n² + (1/6)n to fix the difference

Important: the leading term (1/3)n³ dominates for large n

  • For n = 100: exact sum is 338,350
  • Leading term alone: (1/3)(100)³ ≈ 333,333
  • Correction terms add about 5,017
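
The guess-and-verify steps above can be sketched with exact fractions. The function names are hypothetical, and checking n up to 100 only illustrates the two induction checks; it is not itself a proof.

```python
from fractions import Fraction

def leading_guess(n):
    """First guess f_n = (1/3)n^3, copied from the integral of x^2."""
    return Fraction(n) ** 3 / 3

def corrected_formula(n):
    """Guess plus correction terms: (1/3)n^3 + (1/2)n^2 + (1/6)n."""
    n = Fraction(n)
    return n ** 3 / 3 + n ** 2 / 2 + n / 6

def square(n):
    return n * n

def passes_checks(formula, v, upto):
    """Check f_1 = v_1, then f_n - f_(n-1) = v_n for n = 2, ..., upto."""
    if formula(1) != v(1):
        return False
    return all(formula(n) - formula(n - 1) == v(n) for n in range(2, upto + 1))

guess_gap = leading_guess(5) - leading_guess(4)   # equals n^2 - n + 1/3 at n = 5
guess_ok = passes_checks(leading_guess, square, 100)
formula_ok = passes_checks(corrected_formula, square, 100)
total_100 = corrected_formula(100)

print(guess_gap, guess_ok, formula_ok, total_100)
```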

🔢 Mathematical induction principle

To prove a formula f_n is correct for all n:

  1. Check the base case: verify f₁ is correct
  2. Check the change: verify f_n - f_(n-1) equals v_n

If both checks pass, the formula is proven for all n.

🏗️ From rectangles to integrals

🏗️ Rectangular approximation of area

To find the area under v = x² from 0 to 100:

  • Divide into n rectangles, each of width Δx = 100/n
  • The j-th rectangle has height (j·Δx)²
  • Total area = Σ from j=1 to n of (j·Δx)² · Δx

Factor out (Δx)³: area = (Δx)³ · [1² + 2² + ... + n²]

  • Substitute the sum of squares formula
  • Result: (Δx)³ · [(1/3)n³ + (1/2)n² + (1/6)n]
  • Since n = 100/Δx, this becomes: (1/3)(100)³ + (1/2)(100)²·Δx + (1/6)(100)·(Δx)²

✨ Correction terms vanish

As rectangles get thinner (Δx → 0):

  • The leading term (1/3)(100)³ stays constant
  • The term (1/2)(100)²·Δx → 0
  • The term (1/6)(100)·(Δx)² → 0 even faster

Why they vanish: correction terms account for the small corners of rectangles above the curve; as rectangles thin out, these corners disappear.

Example with 100, 1000, 10000 rectangles:

  • 100 rectangles: area ≈ 338,350
  • 1000 rectangles: area ≈ 333,833.5
  • 10000 rectangles: area ≈ 333,383.335
  • Limit: area = exactly 333,333.333... = (1/3)(100)³
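
These rectangle totals can be reproduced with a short right-endpoint sum for v = x² on [0, 100]. The function name rectangle_area is an illustrative choice.

```python
def rectangle_area(n, b=100.0):
    """Right-endpoint rectangles under v = x^2 on [0, b]."""
    dx = b / n
    return sum((j * dx) ** 2 for j in range(1, n + 1)) * dx

areas = {n: rectangle_area(n) for n in (100, 1000, 10000)}
limit = 100.0 ** 3 / 3

for n, area in areas.items():
    print(f"{n:6d} rectangles: {area:,.3f}   excess over limit: {area - limit:,.3f}")
```

The excess over (1/3)(100)³ shrinks roughly tenfold each time the rectangle count grows tenfold, matching the dominant correction term (1/2)(100)²·Δx.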

🎯 The clean integral answer

The integral of v = x² from 0 to n is exactly (1/3)n³ with no correction terms.

Contrast with summation:

Method       Formula                     Correction terms?
Summation    (1/3)n³ + (1/2)n² + (1/6)n  Yes, depend on n
Integration  (1/3)n³                     No, exact answer

The antiderivative of v(x) = x² is f(x) = (1/3)x³, so the area from 0 to 100 is f(100) - f(0) = (1/3)(100)³.

⚡ Convergence speed matters

⚡ Slow convergence is unacceptable

For v = x from 0 to 4 with n rectangles:

  • Exact area = 8
  • Rectangular area = 8 + 2Δx (where Δx = 4/n)
  • Error = 2Δx = 8/n, proportional to Δx

The problem: bringing the error down to 10⁻⁶ requires 8 million rectangles (8/n = 10⁻⁶)

  • Each time you double the rectangles, error only halves
  • This is "much too slow" for practical computation

🎯 Better methods exist

Endpoint rule (rectangles touch curve at right edge):

  • Error proportional to Δx
  • Convergence: error ∝ 1/n

Midpoint rule (rectangles cross curve at center):

  • Much better: error proportional to (Δx)²
  • Convergence: error ∝ 1/n²
  • Only 1000 rectangles needed for 10⁻⁶ accuracy

Don't confuse: unbalanced rectangles (one side too high) vs balanced rectangles (crossing the curve in the middle)
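
The two convergence rates can be compared numerically. The sketch below uses v = x² on [0, 4], which is an assumption made for illustration (for the straight line v = x the midpoint rule is already exact, so it would not show the (Δx)² behavior). The function names are hypothetical.

```python
def endpoint_rule(v, a, b, n):
    """Rectangles that touch the curve at their right edge."""
    dx = (b - a) / n
    return sum(v(a + j * dx) for j in range(1, n + 1)) * dx

def midpoint_rule(v, a, b, n):
    """Rectangles that cross the curve at their center."""
    dx = (b - a) / n
    return sum(v(a + (j + 0.5) * dx) for j in range(n)) * dx

def square(x):
    return x * x

exact = 64 / 3    # integral of x^2 from 0 to 4

errors = {
    n: (abs(endpoint_rule(square, 0.0, 4.0, n) - exact),
        abs(midpoint_rule(square, 0.0, 4.0, n) - exact))
    for n in (10, 20, 40)
}

for n, (endpoint_error, midpoint_error) in errors.items():
    print(f"n = {n:2d}  endpoint error = {endpoint_error:.5f}  midpoint error = {midpoint_error:.5f}")
```

Doubling n roughly halves the endpoint error but cuts the midpoint error by a factor of about four.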

📊 General pattern for p-th powers

The sum 1^p + 2^p + ... + n^p equals [1/(p+1)]n^(p+1) plus correction terms.

The integral of v = x^p from 0 to n is exactly [1/(p+1)]n^(p+1).

Key insight: calculus doesn't care if n or p are integers; the formula works for any p where p+1 > 0.

🔗 The Fundamental Theorem connection

🔗 Parallel structure

For discrete sums:

  • If fₙ = v₁ + v₂ + ... + vₙ is the sum, then fₙ - f_(n-1) = vₙ is the last term added
  • Formula: if Σ v_j = f_n, then v_n = f_n - f_(n-1)

For continuous integrals:

  • If ∫v(x)dx = f(x), then df/dx = v(x)
  • The reverse of slope (derivative) is area (integral)

🌉 The limit transition

As Δx → 0 and n → ∞:

  • Σ becomes ∫ (sum becomes integral)
  • v_j becomes v(x) (discrete values become continuous function)
  • Δx becomes dx (finite width becomes infinitesimal)
  • Correction terms → 0 (discrete errors vanish)

Not yet fully proved: the text acknowledges this requires careful handling of multiple limits simultaneously, deferred to a later section.

34

Indefinite Integrals and Substitutions

5.4 Indefinite Integrals and Substitutions

🧭 Overview

🧠 One-sentence thesis

Substitution is the most powerful technique for finding antiderivatives by reversing the chain rule, allowing us to transform complicated integrals into simpler forms that we already know how to integrate.

📌 Key points (3–5)

  • What indefinite integrals are: finding a function f(x) whose derivative is the given v(x), rather than computing areas as definite integrals.
  • Linearity makes combinations easy: constants factor out, and integrals of sums equal sums of integrals.
  • Substitution reverses the chain rule: choose an inside function u whose derivative du/dx appears in the integral, transform to u, integrate, then substitute back to x.
  • Common confusion: you cannot substitute u for any expression—the derivative du/dx must be present (possibly after adjusting constants).
  • Why some integrals fail: not every function has an elementary antiderivative; substitution requires the right structure.

🔍 Two approaches to integration

🔍 Indefinite vs definite integrals

  • Definite integral: computes a number by summing rectangular areas as Δx approaches zero.
  • Indefinite integral: finds a function f(x) whose derivative is v(x).
  • The excerpt emphasizes these give the same answer but work differently.
  • Computers can always compute definite integrals numerically; symbolic codes like MACSYMA or Mathematica are needed to find antiderivative formulas.

📋 Known antiderivative pairs

The excerpt provides a reference table of basic pairs where df/dx = v(x):

Type          v(x)         f(x)
Powers        x^n          x^(n+1)/(n+1) + C
Trig          cos x        sin x + C
Trig          sin x        −cos x + C
Inverse trig  1/√(1 − x²)  sin⁻¹ x + C
Inverse trig  1/(1 + x²)   tan⁻¹ x + C
  • Every integration formula comes directly from reversing a differentiation formula.
  • The constant C appears because derivatives of constants are zero.

➕ Linearity rules

➕ Sum and constant rules

Sum rule: the antiderivative of v(x) plus w(x) is the sum of their separate antiderivatives.

Constant rule: the antiderivative of c times v(x) is c times the antiderivative of v(x).

  • These combine into full linearity: the antiderivative of a times v plus b times w equals a times the antiderivative of v plus b times the antiderivative of w.
  • Proof: derivatives are linear, so (af plus bg) prime equals af prime plus bg prime.
  • Practical tip: factor out constants immediately to simplify the integral.

🔢 The plus C convention

  • All antiderivatives of the same function differ only by a constant.
  • For combinations like av(x) plus bw(x), the constants from each part combine into a single C.
  • Write "plus C" once at the end to give all possible antiderivatives.

Example: The antiderivative of x squared plus x to the negative 2 is (x cubed divided by 3) plus (x to the negative 1 divided by negative 1) plus C.

🔄 Substitution method

🔄 The core idea

Substitution reverses the chain rule for derivatives:

  • The chain rule says: the derivative of f(g(x)) is f prime of g(x) times dg/dx.
  • Reversing: if you see v(u(x)) times du/dx, the antiderivative is f(u(x)) plus C.

Example: sin of (x squared) has derivative (cos of x squared) times 2x. Therefore, the integral of x times cos of (x squared) is (1/2) times sin of (x squared) plus C.

🎯 Choosing the inside function u

Two critical points:

  1. Constants are fixable: you can multiply and divide by constants like 2 or 15 to get the exact du/dx you need.
  2. The derivative du/dx must be present: you cannot substitute u for any expression unless its derivative appears (possibly after constant adjustment).

Success cases:

  • Integral of 2x times cos of (x squared): works because u = x squared gives du/dx = 2x, which is there.
  • Integral of x squared times (x cubed plus 1) to the fourth: works because u = x cubed plus 1 gives du/dx = 3x squared, and we can fix the factor of 3.

Failure cases:

  • Integral of cos of (x squared): fails because du/dx = 2x is missing.
  • Integral of x squared times cos of (x squared): fails because du/dx = 2x is wrong (we have x squared instead).

📝 Four-step substitution process

  1. Choose u(x) and compute du/dx.
  2. Locate v(u) times du/dx times dx, or equivalently v(u) times du.
  3. Integrate the integral of v(u) du to find f(u) plus C.
  4. Substitute back: replace u with u(x) in the antiderivative.

Example: Integral of (cos of square root of x) times dx divided by (2 times square root of x):

  • Step 1: u = square root of x, so du/dx = 1 divided by (2 times square root of x).
  • Step 2: This is integral of cos u times du.
  • Step 3: Integral is sin u plus C.
  • Step 4: Answer is sin of (square root of x) plus C.
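
The four-step answer can be checked by differentiating it numerically and comparing with the original integrand. The helper names, step size, and sample points are assumptions for illustration.

```python
import math

def integrand(x):
    """cos(sqrt(x)) / (2 sqrt(x)), the function being integrated."""
    return math.cos(math.sqrt(x)) / (2 * math.sqrt(x))

def antiderivative(x):
    """Result of the substitution u = sqrt(x): sin(sqrt(x))."""
    return math.sin(math.sqrt(x))

def central_difference(f, x, step=1e-6):
    return (f(x + step) - f(x - step)) / (2 * step)

points = (0.5, 1.0, 4.0, 9.0)
checks = [(central_difference(antiderivative, x), integrand(x)) for x in points]

for x, (numeric, exact) in zip(points, checks):
    print(f"x = {x:.1f}  slope of answer = {numeric:.6f}  integrand = {exact:.6f}")
```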

⚙️ Special cases: shifts and rescaling

Shifts: u = x plus c has du/dx = 1 (automatic).

  • Integral of (x plus 2) cubed dx equals (1/4) times (x plus 2) to the fourth plus C.

Rescaling: u = cx has du/dx = c.

  • Integral of cos of (2x) dx equals (1/2) times sin of (2x) plus C.
  • The factor c from the chain rule cancels the 1/c you introduce.

General rules:

  • Integral of v(x plus c) dx = f(x plus c) plus C
  • Integral of v(cx) dx = (1/c) times f(cx) plus C

Combined: Integral of cos of (3x plus 7) dx equals (1/3) times sin of (3x plus 7) plus C.

⚠️ Common mistakes

Don't confuse dx with du: The factor du/dx from the chain rule is absolutely needed.

Nonexample: The integral of (x squared plus 1) squared dx does NOT equal (1/3) times (x squared plus 1) cubed. Why? Because if u = x squared plus 1, then du/dx = 2x is missing from the integral.
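
A numerical check makes the nonexample concrete. The correct antiderivative used below comes from expanding (x² + 1)² = x⁴ + 2x² + 1 and integrating term by term; that expansion is an added step, not part of the excerpt, and the function names and test point x = 2 are illustrative.

```python
def integrand(x):
    return (x * x + 1) ** 2

def wrong_candidate(x):
    """(1/3)(x^2 + 1)^3, the tempting but incorrect answer."""
    return (x * x + 1) ** 3 / 3

def expanded_candidate(x):
    """x^5/5 + 2x^3/3 + x, from integrating x^4 + 2x^2 + 1 term by term."""
    return x ** 5 / 5 + 2 * x ** 3 / 3 + x

def central_difference(f, x, step=1e-6):
    return (f(x + step) - f(x - step)) / (2 * step)

x = 2.0
target = integrand(x)                                   # 25
wrong_slope = central_difference(wrong_candidate, x)    # carries the extra factor 2x
right_slope = central_difference(expanded_candidate, x)

print(target, wrong_slope, right_slope)
```

The wrong candidate's slope is 2x times too large, which is exactly the du/dx factor that the integral was missing.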

When substitution fails: Some functions like cos of (x squared) or 1 divided by square root of (a to the fourth minus sine squared of x) have no elementary antiderivative. They can be computed numerically but not expressed in closed form.

35

The Definite Integral

5.5 The Definite Integral

🧭 Overview

🧠 One-sentence thesis

The definite integral assigns a specific numerical value to the area under a curve by choosing a starting constant and can be rigorously defined as the limit of rectangular approximations, even when no simple antiderivative formula exists.

📌 Key points (3–5)

  • From indefinite to definite: The indefinite integral f(x) + C becomes definite by setting C = -f(a) so that area starts at zero, yielding the formula ∫ from a to b of v(x) dx = f(b) - f(a).
  • Two viewpoints: When f is known, the formula computes area; when f is unknown, the area itself defines the integral f(x).
  • Riemann sums as definition: The integral is the common limit of lower sums s (using minimum heights) and upper sums S (using maximum heights) as the mesh width approaches zero.
  • Common confusion: Not every function is Riemann integrable—the lower and upper sums must converge to the same limit; continuous functions always satisfy this, but some discontinuous functions (like the rational/irrational indicator) do not.
  • Substitution changes limits: When substituting u = u(x), the limits change from [a, b] on x to [u(a), u(b)] on u.

🎯 Choosing the constant of integration

🎯 Setting the starting area to zero

The constant C in f(x) + C is chosen so that f(a) + C = 0, giving C = -f(a).

  • The indefinite integral contains an arbitrary constant because any vertical shift has the same derivative.
  • For a definite integral (a specific number), we want area = 0 at the starting point x = a.
  • This yields the area from a to x as: ∫ from a to x of v(t) dt = f(x) - f(a).
  • The area from a to b is: ∫ from a to b of v(x) dx = f(b) - f(a).

📐 The fundamental calculation pattern

The excerpt emphasizes a two-step process:

  1. Find f(x) such that df/dx = v(x).
  2. Substitute limits: Compute f(b) - f(a), written as f(x) evaluated from a to b (using brackets or a vertical bar).

Example: For v(x) = 5(x + 1)⁴, the antiderivative is f(x) = (x + 1)⁵, so ∫ from a to b equals (b + 1)⁵ - (a + 1)⁵.

Don't confuse: Adding any constant to f(x) (like writing (x + 1)⁵ - 1) gives an equally valid antiderivative, but f(b) - f(a) remains unchanged because the constant cancels.

🔄 Substitution with definite integrals

🔄 Changing the variable changes the limits

When substituting u = u(x):

  • The new limits on u are u(a) and u(b).
  • The formula becomes: ∫ from a to b of v(u(x)) · (du/dx) dx = ∫ from u(a) to u(b) of v(u) du.

Example: For ∫ from 0 to 1 of (x² + 5)³ · x dx, set u = x² + 5, so du/dx = 2x. The integral becomes ∫ from 5 to 6 of u³ · (du/2) = (1/8)u⁴ evaluated from 5 to 6.
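
The change of limits in this example can be checked numerically. The sketch below is illustrative only: `midpoint_sum` is a hypothetical helper (not from the text) that approximates each side with a midpoint sum and compares both with (1/8)u⁴ evaluated from 5 to 6.

```python
def midpoint_sum(g, lo, hi, n=100_000):
    """Midpoint-sum approximation of the integral of g over [lo, hi]."""
    dx = (hi - lo) / n
    return dx * sum(g(lo + (k + 0.5) * dx) for k in range(n))

# x side: integrate (x^2 + 5)^3 * x over the original limits [0, 1].
x_side = midpoint_sum(lambda x: (x**2 + 5) ** 3 * x, 0.0, 1.0)

# u side: u = x^2 + 5, so x dx = du/2 and the limits become u(0) = 5, u(1) = 6.
u_side = midpoint_sum(lambda u: u**3 / 2, 5.0, 6.0)

# Closed form: (1/8) u^4 evaluated from 5 to 6.
exact = (6**4 - 5**4) / 8

print(x_side, u_side, exact)
```

All three numbers should agree to many digits, which is what the limit change from [0, 1] to [5, 6] promises.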

⚠️ When no elementary formula exists

The excerpt notes that ∫ sin(x²) dx has no simple algebraic antiderivative.

  • Trying f(x) = cos(x²) fails because the chain rule produces an extra 2x that cannot be adjusted away.
  • Key insight: Every continuous v(x) still has an antiderivative f(x), even if we cannot write it in closed form.
  • The area under the graph provides the definition of f(x) through Riemann sums.

📏 Riemann sums and the definition of the integral

📏 The setup: dividing the interval

Problem: Integrate continuous v(x) over [a, b].

Step 1: Split [a, b] into n subintervals using meshpoints x₁, x₂, ..., with x₀ = a and xₙ = b.

  • Each subinterval k has length Δxₖ = xₖ - xₖ₋₁.
  • In subinterval k, let mₖ = minimum of v(x) and Mₖ = maximum of v(x).

📊 Lower and upper sums

Lower sum s: The total area of rectangles with height mₖ (minimum) in each subinterval.
Upper sum S: The total area of rectangles with height Mₖ (maximum) in each subinterval.

  • s = m₁Δx₁ + m₂Δx₂ + ... + mₙΔxₙ
  • S = M₁Δx₁ + M₂Δx₂ + ... + MₙΔxₙ
  • The true area under v(x) satisfies: s ≤ area ≤ S.

Key behavior: As new dividing points are added, the lower sum s increases (finer rectangles capture more area) and the upper sum S decreases (finer rectangles exclude more excess).

🎯 The definition via limits

Definition: The area A is the common limit of s and S as the maximum mesh width approaches zero: s → A and S → A as Δx_max → 0.

  • If this common limit exists, A is the Riemann integral of v(x) from a to b.
  • The limit exists for all continuous functions and some discontinuous ones.

🔢 Riemann sums (intermediate rectangles)

Between s and S, we can use any height v(x*ₖ) where x*ₖ is any point in subinterval k.

  • Riemann sum S*: v(x*₁)Δx₁ + v(x*₂)Δx₂ + ... + v(x*ₙ)Δxₙ.
  • The midpoint rule chooses x*ₖ at the center of each subinterval and is often more accurate than s or S.
  • All such sums approach the same limit A if v is continuous.
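
A minimal sketch of the lower and upper sums, under a simplifying assumption that is not in the text: v is increasing, so the minimum mₖ sits at the left end of each subinterval and the maximum Mₖ at the right end. The function name `lower_upper_sums` and the test function v(x) = x² are illustrative choices.

```python
def lower_upper_sums(v, a, b, n):
    """Lower sum s and upper sum S on n equal subintervals of [a, b].

    Assumption: v is increasing on [a, b], so m_k = v(left endpoint)
    and M_k = v(right endpoint) in every subinterval.
    """
    dx = (b - a) / n
    xs = [a + k * dx for k in range(n + 1)]
    s = sum(v(xs[k - 1]) * dx for k in range(1, n + 1))
    S = sum(v(xs[k]) * dx for k in range(1, n + 1))
    return s, S

for n in (10, 100, 1000):
    s, S = lower_upper_sums(lambda x: x * x, 0.0, 1.0, n)
    # The true area 1/3 is trapped between s and S; the gap S - s shrinks like 1/n.
    print(n, s, S, S - s)
```

For this v the gap S − s equals Δx · (v(b) − v(a)) = 1/n, so both sums are squeezed toward the same limit A = 1/3.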

🧪 Which functions are integrable?

✅ Continuous functions are always integrable

The excerpt states (with proof sketch):

  • Every continuous function on a closed interval [a, b] is Riemann integrable.
  • The proof relies on uniform continuity: for any ε, there exists a δ (independent of position) such that v(x) varies by less than 2ε within any interval of width 2δ.
  • As ε → 0, the gap S - s shrinks to zero, so s and S converge to the same number A.

❌ Some discontinuous functions are not integrable

Example (Remark 1): Define V(x) = 1 at every fraction (rational number) and V(x) = 0 at every irrational number.

  • Every subinterval contains both rationals and irrationals.
  • Therefore mₖ = 0 and Mₖ = 1 in every subinterval.
  • The lower sum is always s = 0; the upper sum is always S = b - a.
  • The gap never closes, so V(x) is not Riemann integrable.

Don't confuse: A step function (with finitely many jumps) is integrable because only the intervals containing jumps have mₖ ≠ Mₖ, and their total width shrinks to zero.

🔬 Advanced note: Lebesgue integration

The excerpt mentions (Remark 5) that modern mathematics uses Lebesgue's approach to integrate functions like V(x).

  • Lebesgue allows infinitely many subintervals (of shrinking width) to cover all rationals with total width ε.
  • Since the covered width ε can be made as small as we like and V(x) = 0 at every other point (the "uncountable" irrationals), the Lebesgue integral of V(x) is zero.
  • This extends integration beyond Riemann's definition but is not needed for continuous functions.

🔑 Key properties and notation

🔑 Standard notation

  • f(x) evaluated from a to b is written with brackets [f(x)] from a to b or a vertical bar f(x)|ᵇₐ.
  • This always means f(b) - f(a): the upper limit gives +f(b), the lower limit gives -f(a).

🔑 Why the definition matters

The excerpt emphasizes two perspectives:

  1. Computational: When we know f(x), the formula ∫ from a to b of v(x) dx = f(b) - f(a) computes area.
  2. Theoretical: When we don't know f(x), the area (defined via Riemann sums) constructs the antiderivative.

This completes the circle: the integral (area) leads back to the derivative through the Fundamental Theorem of Calculus.

36

Properties of the Integral and Average Value

5.6 Properties of the Integral and Average Value

🧭 Overview

🧠 One-sentence thesis

The integral obeys seven fundamental properties—including addition over neighboring intervals, sign reversal when going backward, and the Mean Value Theorem—that enable its most important application: computing the average value of a continuous function over an interval, which extends the discrete average to a continuum and forms the foundation of continuous probability.

📌 Key points (3–5)

  • Seven basic properties: integrals add over neighboring intervals, equal zero from a point to itself, reverse sign when limits are swapped, cancel for odd functions, add for even functions, preserve inequalities, and guarantee an average value at some point c.
  • Average value formula: the average of v(x) from a to b is (1/(b − a)) times the integral from a to b, extending the discrete average (sum divided by n) to the continuous case.
  • Mean Value Theorem for integrals: if v(x) is continuous, there exists a point c where v(c) equals the average value—visualized as a rectangle with the same area as the region under the curve.
  • Common confusion: "expected value" does not mean the outcome you literally expect; it is the probability-weighted average, which is predictable even when individual outcomes are random.
  • From discrete to continuous probability: as intervals shrink, sums of (outcome × probability) become integrals of x p(x) dx, where p(x) is the probability density.

📐 The Seven Properties

📐 Property 1: Addition over neighboring intervals

If v(x) is integrated from a to b and then from b to c, the sum equals the integral from a to c.

  • Formula: integral from a to b plus integral from b to c equals integral from a to c.
  • Why it works: rectangular areas obey this rule, and their limits (the integrals) inherit it.
  • Example: integrating velocity from time 0 to 5 and then 5 to 10 gives the same total distance as integrating from 0 to 10.

📐 Property 2: Integral from a point to itself is zero

  • Formula: integral from b to b equals 0.
  • Reason: the area over a single point has no width, so it contributes zero area.
  • This follows from Property 1 when c = b: the two identical integrals on the left must cancel.

📐 Property 3: Reversing limits reverses the sign

Going backward from b to a produces the negative of the integral from a to b.

  • Formula: integral from a to b equals − (integral from b to a).
  • Why: when the "lower limit" is larger than the "upper limit," the steps Δx are negative, so rectangular areas are negative.
  • Example: integral from 0 to x of t² dt = x³/3; integral from x to 0 equals −x³/3.
  • Don't confuse: this is not about the function being negative; it is about the direction of integration.

📐 Property 4: Odd and even functions

Odd functions:

If v(−x) = −v(x), then the integral from −a to a equals 0.

  • Reason: areas on opposite sides of the origin cancel.
  • Example: integral from −a to a of 6x⁵ dx = x⁶ evaluated from −a to a = a⁶ − (−a)⁶ = 0.
  • Curious fact: if v(x) is odd, its antiderivative f(x) is even.

Even functions:

If v(−x) = +v(x), then the integral from −a to a equals 2 times the integral from 0 to a.

  • Reason: areas on both sides add.
  • Example: integral from −a to a of cos x dx = sin a − sin(−a) = 2 sin a.

📐 Property 5: Positive functions have positive integrals

  • If v(x) > 0 for a < x < b, then the integral from a to b is positive.
  • Proof: lower sums s are positive and increase toward the integral, so the integral must be positive.
  • Interpretation: positive velocity means positive distance; a positive function lies above a positive area.

📐 Property 6: Inequalities are preserved

If l(x) ≤ v(x) ≤ u(x) for all x in [a, b], then the integral of l ≤ integral of v ≤ integral of u.

  • Why: rectangles for v stay between rectangles for l and u, and this holds in the limit.
  • Example 1: cos t ≤ 1, so integrating from 0 to x gives sin x ≤ x.
  • Example 2: 1 ≤ sec² t, so integrating from 0 to x gives x ≤ tan x (for x > 0).
  • Example 3: 1/(1 + x²) ≤ 1, so tan⁻¹ x ≤ x.

📐 Property 7: Mean Value Theorem for integrals

If v(x) is continuous, there exists a point c in [a, b] where v(c) equals the average value of v(x).

  • Formula: v(c) = (1/(b − a)) times the integral from a to b of v(x) dx.
  • This is the same as the ordinary Mean Value Theorem for derivatives: f′(c) = (f(b) − f(a))/(b − a).
  • Direct proof: imagine a rectangle across [a, b]; raise its height from v_min to v_max until its area equals the area under the curve—at that height v(c), the function equals its average.
  • Relies on the intermediate value theorem: a continuous function takes on every height between its minimum and maximum.

📊 Average Value of a Function

📊 Definition and formula

The average value of v(x) from a to b is (1/(b − a)) times the integral from a to b of v(x) dx.

  • Parallel to discrete average: (1/n) times (v₁ + v₂ + ⋯ + v_n).
  • Integration is the continuous analog of summation.
  • Example: the average of 1/n, 2/n, 3/n, …, n/n is (n + 1)/(2n); as n → ∞, this approaches 1/2, which equals the integral from 0 to 1 of x dx.

📊 Examples of average values

Example: Average of x from 0 to 1

  • Integral from 0 to 1 of x dx = 1/2.
  • The average value of all numbers between 0 and 1 is 1/2.

Example: Average of x² from −1 to 1

  • (1/2) times integral from −1 to 1 of x² dx = x³/6 evaluated from −1 to 1 = 1/6 − (−1/6) = 1/3.
  • Where does x² equal 1/3? At c = ±1/√3, the Gauss points (important for numerical integration).

Example: Average of sin² x from 0 to π

  • (1/π) times integral from 0 to π of sin² x dx = (1/π) times (x − sin x cos x)/2 evaluated from 0 to π = 1/2.
  • The function sin² x = 1/2 − (1/2) cos 2x oscillates around its average value 1/2.
  • The point c is π/4 or 3π/4, where sin² c = 1/2.

Example: Average of an odd function

  • The average of x from −1 to 1 is 0 (the center point c = 0).
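
The average-value formula can be tried on the examples above. This is a rough numerical sketch: `average_value` is a hypothetical helper that approximates the integral with a midpoint sum and divides by b − a.

```python
import math

def average_value(v, a, b, n=100_000):
    """Approximate (1/(b - a)) * integral of v from a to b with a midpoint sum."""
    dx = (b - a) / n
    integral = dx * sum(v(a + (k + 0.5) * dx) for k in range(n))
    return integral / (b - a)

avg_x = average_value(lambda x: x, 0.0, 1.0)                        # close to 1/2
avg_x2 = average_value(lambda x: x * x, -1.0, 1.0)                  # close to 1/3
avg_sin2 = average_value(lambda x: math.sin(x) ** 2, 0.0, math.pi)  # close to 1/2
avg_odd = average_value(lambda x: x, -1.0, 1.0)                     # close to 0

print(avg_x, avg_x2, avg_sin2, avg_odd)
```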

📊 Geometric interpretation

  • The average value v_ave is the height of a rectangle over [a, b] with the same area as the region under v(x).
  • Equal areas lie above and below the line y = v_ave.

🎲 Expected Value and Continuous Probability

🎲 Discrete expected value

Expected value: multiply each outcome by its probability and add.

Example: Rolling two dice

  • Outcomes range from 2 to 12.
  • Probability of 2 is (1/6)(1/6) = 1/36 (two ones).
  • Probabilities: 2 → 1/36, 3 → 2/36, 4 → 3/36, …, 7 → 6/36, …, 12 → 1/36.
  • All probabilities sum to 1 (all possibilities covered).
  • Expected value: 2·(1/36) + 3·(2/36) + 4·(3/36) + ⋯ + 12·(1/36) = 7.
  • Key insight: one roll is unpredictable, but the average of a million rolls is almost completely predictable.
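
The two-dice table can be rebuilt by brute force. The sketch below enumerates all 36 equally likely rolls with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Probability of each total, built from the 36 equally likely (die1, die2) pairs.
prob = {}
for die1, die2 in product(range(1, 7), repeat=2):
    total = die1 + die2
    prob[total] = prob.get(total, Fraction(0)) + Fraction(1, 36)

# Expected value: multiply each outcome by its probability and add.
expected = sum(total * p for total, p in prob.items())

print(prob[2], prob[7], prob[12], expected)
```

Using `Fraction` keeps the arithmetic exact, so the probabilities add to exactly 1 and the expected value comes out as exactly 7.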

🎲 Continuous probability

Transition from discrete to continuous:

  • In the continuous case, the probability of hitting any particular number x is zero.
  • Instead, an interval has nonzero probability: the probability of an outcome between x and x + Δx is p(x) Δx, where p(x) is the probability density.
  • The integral of p(x) over all outcomes equals 1 (some outcome must happen).

Example: Uniform distribution from 2 to 12

  • All numbers between 2 and 12 are equally probable, so p(x) = 1/10.
  • Discrete approximation with Δx = 1: average = 2·(1/10) + 3·(1/10) + ⋯ + 11·(1/10) = 6.5.
  • Continuous limit: expected value E(x) = integral from 2 to 12 of x p(x) dx = integral from 2 to 12 of x (dx/10) = x²/20 evaluated from 2 to 12 = 7.

Example: Uniform distribution from 0 to 1

  • p(x) = 1 (since the interval has length 1).
  • E(x^n) = integral from 0 to 1 of x^n dx = 1/(n + 1).
  • E(1/√x) = integral from 0 to 1 of dx/√x = 2.
  • E(1/x) = integral from 0 to 1 of dx/x = ∞ (does not exist).

🎲 Common confusion: "expected" does not mean "what you expect"

  • Don't confuse: the expected value is not the outcome you literally expect to see.
  • It is the probability-weighted average, which is predictable even when individual outcomes are random.
  • Example: if you pick a random number between 0 and 1, its expected value is 1/2, but you don't expect to pick exactly 1/2.

🎲 The class-size paradox

Setup:

  • 95 classes have 20 students; 5 classes have 200 students.
  • Total: 100 classes, 2900 students.

Two perspectives:

| Viewpoint | Calculation | Result |
| --- | --- | --- |
| Random professor | (95·20 + 5·200)/100 | 29 students |
| Random student | 20·(1900/2900) + 200·(1000/2900) | 82 students |

  • Why the difference: a random student is more likely to be in a large class because large classes contain more students.
  • Probability for the student: 1900/2900 of being in a small class, 1000/2900 of being in a large class.
  • Don't confuse: (a) the percentage of cars with one person vs. (b) the percentage of people alone in a car—these are different.
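
The two viewpoints can be reproduced in a few lines. This is only a sketch of the arithmetic in the setup above: the professor averages over classes, while the student averages over students, so each class size is weighted by the number of students in it.

```python
# 95 small classes of 20 students and 5 large classes of 200 students.
classes = [20] * 95 + [200] * 5
n_students = sum(classes)  # 2900

# Random professor: every class is equally likely.
professor_view = n_students / len(classes)

# Random student: a class of a given size is picked with probability size / n_students.
student_view = sum(size * size for size in classes) / n_students

print(professor_view, student_view)
```

The student's average is a little over 82, which the table rounds to 82.
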
37

The Fundamental Theorem and Its Applications

5.7 The Fundamental Theorem and Its Applications

🧭 Overview

🧠 One-sentence thesis

The Fundamental Theorem of Calculus establishes that differentiation and integration are inverse operations, enabling us to compute integrals by finding antiderivatives and to understand that integration applies far beyond simple rectangular areas.

📌 Key points (3–5)

  • Part 1 says: The derivative of an integral (area function from a to x) equals the original function v(x).
  • Part 2 says: The integral of a derivative df/dx recovers the original function f(x) plus a constant; definite integrals equal f(b) − f(a).
  • Variable limits: When both endpoints of an integral depend on x, use the chain rule: derivative = v(b(x))·(db/dx) − v(a(x))·(da/dx).
  • Common confusion: Integration is not limited to vertical rectangles under curves—thin rings, shells, horizontal strips, and triangles can also be integrated.
  • Why it matters: The theorem connects local properties (derivatives at points) to global properties (integrals over intervals) and extends to volumes, probabilities, and other applications.

🔄 Part 1: Derivative of an integral

🔄 The main statement

Fundamental Theorem, Part 1: If v(x) is continuous and f(x) = integral from a to x of v(t) dt, then df/dx = v(x).

  • The integral from a fixed point a to a variable point x defines a function f(x) (the accumulated area).
  • Even without a formula for f(x), we can find its derivative.
  • The derivative of this area function equals the height of the original curve.

📐 How the proof works

  • Consider a small change Δx: the new area minus the old area is the thin strip from x to x + Δx.
  • That strip has area ≈ v(x) · Δx (a thin rectangle).
  • Dividing by Δx gives Δf/Δx ≈ v(x).
  • As Δx → 0, the approximation becomes exact: df/dx = v(x).
  • Key tool: The Mean Value Theorem for integrals says the average value on a short interval equals v(c) for some point c in that interval; as the interval shrinks, c is squeezed toward x.

🖼️ Graphical meaning

  • The f-graph gives cumulative area under the v-graph.
  • A thin vertical strip has area Δf ≈ height v(x) × base Δx.
  • Dividing by the base gives the height: Δf/Δx ≈ v(x).

Example: If f(x) = integral from 0 to x of t² dt, then df/dx = x².

🔀 Variable endpoints

🔀 Lower limit moving

If g(x) = integral from x to b of v(t) dt, then dg/dx = −v(x).

  • When the lower limit moves forward, area is removed from the left.
  • Hence the minus sign.
  • Quick proof: reverse the limits (integral from b to x = − integral from x to b), then apply Part 1.

🔀 Both limits moving

If A = integral from a(x) to b(x) of v(t) dt, then dA/dx = v(b(x))·(db/dx) − v(a(x))·(da/dx).

  • Two thin strips: one added at the upper limit, one subtracted at the lower limit.
  • Each strip's area ≈ height × thickness.
  • This is the chain rule in action.

Example: A = integral from 2x to x³ of cos t dt has dA/dx = (cos x³)·(3x²) − (cos 2x)·(2).
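
The formula in this example can be compared with a finite-difference slope. In the sketch below, `A` uses the known antiderivative sin t, `dA_formula` is the chain-rule expression from the example, and `dA_numeric` is a centered difference; all three names are illustrative.

```python
import math

def A(x):
    """Integral of cos t from 2x to x^3, using the antiderivative sin t."""
    return math.sin(x**3) - math.sin(2 * x)

def dA_formula(x):
    """v(b(x)) * db/dx - v(a(x)) * da/dx with v = cos, b = x^3, a = 2x."""
    return math.cos(x**3) * 3 * x**2 - math.cos(2 * x) * 2

def dA_numeric(x, h=1e-6):
    """Centered-difference estimate of dA/dx."""
    return (A(x + h) - A(x - h)) / (2 * h)

for x in (0.3, 1.0, 1.7):
    print(x, dA_formula(x), dA_numeric(x))
```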

⚠️ Don't confuse

  • When a = 0 and b = x, we get da/dx = 0 and db/dx = 1, so the derivative is just v(x) (Part 1).
  • When a = x and b = constant, the derivative is −v(x) (lower limit case).
  • The general formula covers both.

🔁 Part 2: Integral of a derivative

🔁 The main statement

Fundamental Theorem, Part 2: If v(x) = df/dx, then integral from a to b of v(x) dx = f(b) − f(a).

  • To integrate v(x), find any antiderivative f(x) (a function whose derivative is v).
  • Evaluate f at the endpoints and subtract.
  • This is the tool we use most, because integrals are harder to compute than derivatives.

🔑 The key lemma

  • Special case: If df/dx = 0 everywhere in an interval, then f(x) = constant.
  • Proof uses the Mean Value Theorem: if f(a) ≠ f(b), then f′(c) = [f(b) − f(a)]/(b − a) ≠ 0 for some c, contradicting df/dx = 0.
  • General case: If dA/dx = df/dx, then A(x) = f(x) + C.
  • The derivative of A(x) − f(x) is zero, so A(x) − f(x) must be constant.

🔗 Connecting Part 1 and Part 2

  • Part 1 says A(x) = integral from a to x of v(t) dt has dA/dx = v(x).
  • If f(x) is any other antiderivative of v(x), then A(x) = f(x) + C.
  • At the lower limit, A(a) = 0, so f(a) + C = 0, hence C = −f(a).
  • At the upper limit, A(b) = f(b) + C = f(b) − f(a).
  • Therefore the definite integral equals f(b) − f(a).

🧩 Why this matters

  • We can compute integrals by finding antiderivatives (often easier than summing rectangles).
  • The theorem connects local information (derivative at each point) to global information (total change over an interval).

🌀 Beyond rectangles: New applications

🌀 Thin rings for circles

  • Question: Why does the area A of a circle satisfy dA/dr = C (circumference)?
  • The area is πr², and its derivative 2πr is the circumference.
  • Geometric reason: Divide the circle into thin concentric rings.
  • A ring of thickness Δr has area ≈ circumference × Δr = 2πr · Δr.
  • Integrating from 0 to r: A = integral of 2πr dr = πr².
  • The ring "unwinds" into a thin strip of width Δr and length ≈ C.

Example: This is the first step away from rectangles—we add up rings instead.

🌀 Thin shells for spheres

  • Question: Why does the volume V of a sphere satisfy dV/dr = A (surface area)?
  • A thin spherical shell from radius r to r + Δr has volume ≈ surface area × thickness = 4πr² · Δr.
  • Integrating from 0 to r: V = integral of 4πr² dr = (4/3)πr³.
  • The derivative of volume is surface area.

Main point: Integration is not restricted to rectangles; shells, rings, and other shapes work too.

🔲 Puzzle: The square

  • A square with side s has area A = s².
  • The derivative dA/ds = 2s is only half the perimeter (4s).
  • Why? The excerpt leaves this as an exercise (the figure is misleading).

📏 Horizontal rectangles

  • Problem: Find the area under v(x) = arccos x from x = 0 to x = 1.
  • We don't have an easy antiderivative for arccos x.
  • Solution: Use horizontal strips instead of vertical ones.
  • A horizontal strip at height v has length x = cos v and thickness Δv.
  • Integrate upward: area = integral from 0 to π/2 of cos v dv = sin v evaluated from 0 to π/2 = 1.

Don't confuse: The limits are now on v, not on x; we integrate in the direction perpendicular to the usual one.
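
As a rough check that horizontal and vertical strips measure the same region, the sketch below sums both ways with a midpoint rule. `midpoint_sum` is a hypothetical helper; the vertical sum converges more slowly because arccos x has an infinite slope at x = 1.

```python
import math

def midpoint_sum(g, lo, hi, n=100_000):
    """Midpoint-sum approximation of the integral of g over [lo, hi]."""
    dx = (hi - lo) / n
    return dx * sum(g(lo + (k + 0.5) * dx) for k in range(n))

# Vertical strips: height arccos x, limits on x from 0 to 1.
vertical = midpoint_sum(math.acos, 0.0, 1.0)

# Horizontal strips: length cos v, limits on v from 0 to pi/2.
horizontal = midpoint_sum(math.cos, 0.0, math.pi / 2)

print(vertical, horizontal)  # both should be close to 1
```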

🧮 Summary of techniques

| Situation | Formula | Key idea |
| --- | --- | --- |
| Fixed lower limit a, variable upper x | df/dx = v(x) | Part 1: derivative of area = height |
| Variable lower x, fixed upper b | dg/dx = −v(x) | Area removed from left |
| Both limits a(x) and b(x) variable | dA/dx = v(b)·(db/dx) − v(a)·(da/dx) | Chain rule on both ends |
| Antiderivative known | integral v dx = f(b) − f(a) | Part 2: evaluate at endpoints |
| Non-rectangular regions | Use rings, shells, horizontal strips, etc. | Match the geometry to the problem |
38

Numerical Integration

5.8 Numerical Integration

🧭 Overview

🧠 One-sentence thesis

Numerical integration methods improve accuracy by fitting higher-degree polynomials to the function being integrated, with the quality of a formula determined by how many powers of x it integrates exactly.

📌 Key points (3–5)

  • Goal: compute the definite integral I = ∫ from a to b of y(x) dx accurately and quickly when an antiderivative is unavailable or rectangle methods are too crude.
  • Order of accuracy: a method has order p if it integrates 1, x, x², ..., x^(p−1) exactly but fails on x^p; higher p means the error involves (Δx)^p, so smaller errors.
  • Common confusion: the trapezoidal rule (second-order) and midpoint rule (second-order) both beat rectangles (first-order), but Simpson's rule (fourth-order) is much better because it unexpectedly integrates x³ exactly, not just x².
  • Why it matters: each evaluation of y(x) can be expensive (a subroutine call), so minimizing the number of sample points while staying below error tolerance is critical.
  • Practical insight: second-order rules need ~1000 values for 10^(−6) tolerance; fourth-order methods (Simpson, Gauss) are far more efficient.

📏 Rectangle rules and first-order accuracy

📏 Right and left rectangle rules

Right rectangle rule R_n: Δx · (y₁ + y₂ + ⋯ + y_n)
Left rectangle rule L_n: Δx · (y₀ + y₁ + ⋯ + y_(n−1))

  • Divide [a, b] into n equal intervals of length Δx = (b − a)/n.
  • R_n uses the function value at the right endpoint of each interval; L_n uses the left endpoint.
  • Simple to compute and visualize, but very inaccurate.

🔺 Error in rectangle rules

  • When y(x) is a straight line, the errors form triangles with base Δx and height (y_(j+1) − y_j).
  • The total error telescopes:
    • R_n − I = (1/2) Δx · (y_n − y₀) = (1/2) Δx · [y(b) − y(a)]
    • L_n − I = −(1/2) Δx · [y(b) − y(a)]
  • First-order accuracy: error is proportional to Δx (the first power).
  • Example: integrating √x from 0 to 1 with n = 1000 gives error ≈ 0.0005 ≈ (1/2) Δx.

⚠️ Why rectangles are slow

  • The greater the slope of y(x), the greater the error—rectangles have zero slope and cannot follow a slant.
  • To achieve error 1/1,000,000 for y = x from 0 to 1 requires 500,000 rectangles.
  • The error prediction (1/2) Δx [y(b) − y(a)] is "asymptotically correct": as Δx → 0, the ratio of predicted to actual error approaches 1.
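
The error prediction can be tested on the √x example. The sketch below uses illustrative function names and compares the actual errors R_n − I and L_n − I with ±(1/2) Δx [y(b) − y(a)].

```python
import math

def right_rule(y, a, b, n):
    """R_n: rectangles using the right endpoint of each interval."""
    dx = (b - a) / n
    return dx * sum(y(a + k * dx) for k in range(1, n + 1))

def left_rule(y, a, b, n):
    """L_n: rectangles using the left endpoint of each interval."""
    dx = (b - a) / n
    return dx * sum(y(a + k * dx) for k in range(n))

def predicted_error(y, a, b, n):
    """Leading error term (1/2) * dx * [y(b) - y(a)] for R_n (negate for L_n)."""
    return 0.5 * (b - a) / n * (y(b) - y(a))

n = 1000
exact = 2 / 3  # integral of sqrt(x) from 0 to 1
right_error = right_rule(math.sqrt, 0.0, 1.0, n) - exact
left_error = left_rule(math.sqrt, 0.0, 1.0, n) - exact
prediction = predicted_error(math.sqrt, 0.0, 1.0, n)

print(right_error, left_error, prediction)
```

The ratio of actual to predicted error should be close to 1 here, and closer still as n grows.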

🔄 Special cases

  • When y(b) = y(a): the leading error term vanishes; the next term involves (Δx)² and the error drops much faster.
  • Periodic functions over a complete period: y(0) = y(1) and all derivatives match at endpoints, so errors go to zero exponentially fast (e.g., 1/(10 + cos 2πx) integrated from 0 to 1 shows practically zero error even with few rectangles).

🔷 Second-order methods: trapezoidal and midpoint

🔷 Trapezoidal rule T_n

Trapezoidal rule: T_n = (1/2) R_n + (1/2) L_n = Δx · [(1/2) y₀ + y₁ + ⋯ + y_(n−1) + (1/2) y_n]

  • Average of the right and left rectangle rules.
  • Geometrically: the area of a trapezoid with base Δx and heights y_(j−1) and y_j is (1/2) Δx · (y_(j−1) + y_j).
  • Exact for y = x: trapezoids fit under a sloping line, so the error for straight lines is zero.
  • Second-order accuracy: error is proportional to (Δx)².

📐 Midpoint rule M_n

Midpoint rule: M_n = Δx · (y_(1/2) + y_(3/2) + ⋯ + y_(n−1/2))

  • Evaluate y(x) at the midpoint of each interval: x = (1/2) Δx, (3/2) Δx, (5/2) Δx, …
  • A rectangle with height at the midpoint has the same area as a trapezoid when the graph is a straight line.
  • Also exact for y = x, so also second-order.

📊 Error comparison for y = x²

| n | Trapezoidal error T_n − I | Midpoint error M_n − I |
| --- | --- | --- |
| 1 | 1/6 | −1/12 |
| 10 | 1/600 | −1/1200 |
| 100 | 1/60000 | −1/120000 |

  • Errors fall by 100 when n is multiplied by 10 (confirming (Δx)² behavior).
  • Midpoint is twice as accurate as trapezoidal: the leading error for M is −(1/24) (Δx)² [y′(b) − y′(a)], versus (1/12) (Δx)² [y′(b) − y′(a)] for T.
  • Don't confuse: both are second-order, but the constant in front of (Δx)² differs.
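
The table entries for y = x² on [0, 1] can be reproduced exactly. The sketch below uses `Fraction` so the errors come out as exact fractions; the function names and the restriction to the interval [0, 1] are illustrative assumptions.

```python
from fractions import Fraction

def trapezoid(y, n):
    """T_n on [0, 1]: endpoints weighted 1/2, interior points weighted 1."""
    dx = Fraction(1, n)
    inner = sum(y(k * dx) for k in range(1, n))
    return dx * (y(Fraction(0)) / 2 + inner + y(Fraction(1)) / 2)

def midpoint(y, n):
    """M_n on [0, 1]: one sample at the center of each interval."""
    dx = Fraction(1, n)
    return dx * sum(y((k + Fraction(1, 2)) * dx) for k in range(n))

def square(x):
    return x * x

exact = Fraction(1, 3)
for n in (1, 10, 100):
    print(n, trapezoid(square, n) - exact, midpoint(square, n) - exact)
```

In every row the midpoint error is exactly half the trapezoidal error with the opposite sign, matching the constants −1/24 and 1/12.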

🧮 Error formulas

  • Trapezoidal: T_n − I ≈ (1/12) (Δx)² · [y′(b) − y′(a)]
  • Midpoint: M_n − I ≈ −(1/24) (Δx)² · [y′(b) − y′(a)]
  • The exact formulas replace [y′(b) − y′(a)] with (b − a) y″(c) for some unknown point c (like the Mean Value Theorem).
  • In practice, the unknown c limits usefulness; the key takeaway is the (Δx)² factor.

🎯 Fourth-order method: Simpson's rule

🎯 Simpson's rule S_n

Simpson's rule: S_n = (1/3) T_n + (2/3) M_n = (Δx/6) · [y₀ + 4y_(1/2) + 2y₁ + 4y_(3/2) + 2y₂ + ⋯ + 4y_(n−1/2) + y_n]

  • Combines trapezoidal and midpoint with weights 1/3 and 2/3 to cancel more error.
  • The famous 1–4–2–4–2–⋯–4–1 pattern: endpoints get weight 1, midpoints get weight 4, interior full-step points get weight 2, all multiplied by Δx/6.
  • Parabolic approximation: fits a parabola through three consecutive points (y₀, y_(1/2), y₁) and integrates the parabola exactly.

🚀 Why Simpson is fourth-order

  • By construction, Simpson integrates 1, x, and x² exactly (it was designed for parabolas).
  • Surprise: it also integrates x³ exactly, even though it was not designed to do so.
  • Reason: over the symmetric interval [−1, 1], x³ is an odd function with zero integral, and Simpson's symmetric weights give zero by symmetry.
  • Fourth-order accuracy: error is proportional to (Δx)⁴, so errors drop by 10,000 when n is multiplied by 10.

📉 Error behavior for powers of x

| y(x) | n = 1 error | n = 10 error | n = 100 error |
| --- | --- | --- | --- |
| x² | 0 | 0 | 0 |
| x³ | 0 | 0 | 0 |
| x⁴ | 8.33×10⁻³ | 8.33×10⁻⁷ | 8.33×10⁻¹¹ |

  • Exact for x² and x³; first error appears at x⁴.
  • The error for x⁴ confirms (Δx)⁴ scaling: multiplying n by 10 divides error by 10⁴.
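
A minimal sketch of Simpson's rule with the 1–4–2–4–…–4–1 weights, applied to powers of x on [0, 1]. The function name `simpson` and the choice of interval are illustrative.

```python
def simpson(y, a, b, n):
    """S_n: n intervals of width dx, sampling endpoints, midpoints, and meshpoints."""
    dx = (b - a) / n
    total = y(a) + y(b)                                          # weight 1
    total += 4 * sum(y(a + (k + 0.5) * dx) for k in range(n))    # midpoints, weight 4
    total += 2 * sum(y(a + k * dx) for k in range(1, n))         # interior points, weight 2
    return dx / 6 * total

for power in (2, 3, 4):
    exact = 1 / (power + 1)
    errors = [simpson(lambda x: x**power, 0.0, 1.0, n) - exact for n in (1, 10, 100)]
    print(power, errors)
```

For x² and x³ the errors should be zero up to rounding. For x⁴ the error is (Δx)⁴/120, which gives 8.33×10⁻³ at n = 1.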

🔑 Why Simpson is popular

  • Efficiency: achieves high accuracy with far fewer function evaluations than second-order methods.
  • Example: to reach 10⁻⁶ tolerance, second-order methods need ~1000 samples; Simpson needs far fewer.
  • Each evaluation of y(x) can be expensive (a subroutine call), so minimizing the number of samples is crucial.

🌟 Gauss rule (optional advanced method)

🌟 Gauss two-point rule

Gauss rule over [−1, 1]: ∫ from −1 to 1 of y(x) dx ≈ y(−1/√3) + y(1/√3)

  • Uses only two points per interval, yet achieves fourth-order accuracy.
  • The "Gauss points" x = ±1/√3 are chosen so that 1, x, x², and x³ are all integrated exactly.
  • By symmetry, odd powers (x, x³) integrate to zero; the key is x²: the integral of x² from −1 to 1 is 2/3, and (−1/√3)² + (1/√3)² = 1/3 + 1/3 = 2/3.

⚖️ Gauss vs Simpson

  • Gauss: two points per interval, fourth-order; best for thousands of integrations over one interval.
  • Simpson: also uses two new y-values per interval (midpoint and next full step), fourth-order; better when intervals go back-to-back because sample points align.
  • For y = x⁴ from 0 to 1:
    • Simpson error (n = 1): 8.33×10⁻³
    • Gauss error (n = 1): −5.56×10⁻³ (slightly better constant, same (Δx)⁴ scaling)
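
The two-point Gauss rule can be sketched the same way. The assumption here (not spelled out in the text) is that the points ±1/√3 on [−1, 1] are rescaled to each subinterval [a, b] by shifting to the midpoint and shrinking by the half-width; `gauss2` is an illustrative name.

```python
import math

def gauss2(y, a, b, n=1):
    """Two-point Gauss rule on n equal intervals of [a, b]."""
    dx = (b - a) / n
    offset = dx / (2 * math.sqrt(3))   # half-width times 1/sqrt(3)
    total = 0.0
    for k in range(n):
        mid = a + (k + 0.5) * dx
        total += y(mid - offset) + y(mid + offset)
    return dx / 2 * total              # each point gets weight (half-width) * 1

print(gauss2(lambda x: x**2, -1.0, 1.0))       # 2/3, exact for x^2
print(gauss2(lambda x: x**3, -1.0, 1.0))       # 0, exact for x^3
print(gauss2(lambda x: x**4, 0.0, 1.0) - 0.2)  # about -5.56e-3
```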

🔧 When to use Gauss

  • Gauss points are not as convenient as equally spaced points (hand calculators prefer Simpson).
  • Gauss shines when the same interval is integrated many times with different functions.
  • Don't confuse: both are fourth-order, but Gauss uses unequal spacing within each interval.

💻 Practical numerical integration on calculators

💻 How calculators actually work

  • Points are not equally spaced: machines may internally change variables (e.g., replace x by 3u² − 2u³) to avoid singularities at endpoints.
  • Example: the integrand of ∫ from 0 to 1 of dx/√x is infinite at x = 0; the substitution x = 3u² − 2u³ gives dx = 6(u − u²) du, which vanishes at u = 0, removing the singularity.
  • The differential 6(u − u²) du was chosen to be zero at both u = 0 and u = 1, so the machine does not need y(x) at the endpoints (where infinity is most common).

⚠️ Difficult cases

  • Singularities inside [a, b]: break the interval into two pieces.
  • Integrals to infinity: chop off the tail where the integrand is negligible (e.g., stop e^(−x²) at x = 10).
  • Rapid oscillations: the answer depends on cancellation of highs and lows; the calculator requires many integration points.
  • Aliasing danger: if sin(8πx) is sampled with Δx = 1/8, it is always zero, so the high frequency 8 is confused with frequency 0. Unequal spacing prevents this.
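
The aliasing effect is easy to see numerically. The sketch below assumes the integrand is sin(8πx), so that samples at multiples of 1/8 land on its zeros; the unevenly spaced points (k/8)² are a hypothetical choice, not the spacing any particular calculator uses.

```python
import math

def y(x):
    return math.sin(8 * math.pi * x)

# Equal spacing dx = 1/8 on [0, 1]: every sample is sin(k * pi), zero up to rounding.
equal_points = [k / 8 for k in range(9)]
equal_samples = [y(x) for x in equal_points]

# Unequal spacing (hypothetical points): the oscillation becomes visible.
unequal_points = [(k / 8) ** 2 for k in range(9)]
unequal_samples = [y(x) for x in unequal_points]

print(max(abs(s) for s in equal_samples))
print(max(abs(s) for s in unequal_samples))
```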

🎛️ User interface

  • Input: the function y(x), endpoints a and b, and the desired accuracy (e.g., 0.00001).
  • Output: the integral I and an estimated error bound.
  • The machine samples y(x) at points x₁, …, x_k and estimates accuracy based on how answers converge as more points are added.
  • Formulas using 1, 3, 7, 15, … sample points are common; each new formula reuses samples from the previous one, stopping when answers are close.

🛡️ How any method can be deceived

  • Ask for the integral of y = 0 and note the sample points x₁, …, x_k.
  • Then integrate Y(x) = (x − x₁)² ⋯ (x − x_k)²; this also returns zero (now wrong) because the calculator follows the same steps and Y is zero at all sample points.
  • Lesson: no method is foolproof; the function must be reasonably smooth and not adversarially constructed.
39

An Overview

6.1 An Overview

🧭 Overview

🧠 One-sentence thesis

Logarithms and exponentials are inverse operations—logarithms convert multiplication into addition by expressing numbers as exponents of a base—and understanding them is essential for modeling natural phenomena through differential equations.

📌 Key points (3–5)

  • What logarithms do: they turn multiplication into addition by expressing numbers as exponents (powers) of a base.
  • Logarithm as exponent: the logarithm of a number is simply the exponent needed to produce that number from the base.
  • Why exponentials matter: the function e^x and the differential equation dy/dx = y are central to modeling change in nature, science, economics, and engineering.
  • Common confusion: logarithms are "mirror images" (inverses) of exponentials—they undo each other.
  • Three key applications: understanding logarithms as exponents, drawing graphs on different scales (ordinary, semilog, log-log), and finding derivatives of exponential functions.

🔢 What logarithms are

🔢 Logarithm as exponent

The logarithm of a number is the exponent needed to produce that number from a given base.

  • Logarithms express numbers as powers of a base.
  • Example: In base 10, the numbers 10, 100, 1000 are written as 10¹, 10², 10³; their logarithms are 1, 2, 3 respectively.
  • The logarithm of 1 is always 0, regardless of base, because any base raised to the zeroth power equals 1 (b⁰ = 1).

➕ Turning multiplication into addition

  • The fundamental property: b^(m+n) = (b^m)(b^n).
  • This rule means that adding exponents corresponds to multiplying the original numbers.
  • Why it matters: addition is simpler than multiplication, making logarithms useful for computation and analysis.
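
A small numerical check of the rule, using base 10 and the illustrative values 100 and 1000 (chosen here, not taken from the text):

```python
import math

a, b = 100.0, 1000.0

# Adding the logarithms (2 + 3) ...
log_sum = math.log10(a) + math.log10(b)

# ... and raising the base to that sum recovers the product a * b.
product_via_logs = 10 ** log_sum
```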

🪞 Mirror images of exponentials

  • Logarithms and exponentials are inverse operations.
  • If you take the exponential of a logarithm (or vice versa), you get back the original number.
  • Don't confuse: exponentials grow from exponents; logarithms extract exponents from numbers.

🌍 Why exponentials and logarithms matter

🧬 Differential equations and natural phenomena

  • The chapter emphasizes that dy/dx = y is a central differential equation in mathematics.
  • The exponential function e^x is the solution to this equation.
  • Applications span life sciences, physical sciences, economics, and engineering—anywhere change depends on the current state.

📐 Three key uses in Section 6.1

| Application | What it does |
| --- | --- |
| Understanding logarithms | Viewing them as exponents of a base |
| Graphing | Drawing on ordinary, semilog, and log-log paper |
| Finding derivatives | Using the property b^(x+Δx) = (b^x)(b^Δx) to find slopes |

🎯 The special role of e^x

  • Among all exponential functions (2^x, 10^x, etc.), e^x is the most important.
  • The chapter is devoted to understanding, differentiating, integrating, solving equations with, and inverting exponentials.
  • The placement of this chapter early (before other integration techniques) reflects the priority: differential equations and natural laws come before computational techniques.

📊 Bases and notation

🔟 Base 10 logarithms

  • Powers of 10 are fundamental to the decimal system: 10⁰ = 1, 10¹ = 10, 10² = 100, 10³ = 1000.
  • The logarithms "to base 10" of these numbers are their exponents: 0, 1, 2, 3.

🔤 General base b

  • The same principle applies to any base b.
  • The logarithm of 1 in any base is always 0, because b⁰ = 1 for any base.
  • The fundamental multiplication rule b^(m+n) = (b^m)(b^n) holds for any base and drives all three applications in the section.
40

The Exponential e^x

6.2 The Exponential e^x

🧭 Overview

🧠 One-sentence thesis

The excerpt provided contains no substantive content about the exponential function e^x; it consists entirely of trigonometry exercises, a discussion of sine wave patterns in discrete graphs, and an introduction to the derivative definition.

📌 Key points (3–5)

  • The excerpt does not cover the exponential function e^x despite the title "6.2 The Exponential e^x."
  • The material includes trigonometry problems (sine, cosine, tangent functions and their properties).
  • A lengthy section discusses visual patterns ("A Thousand Points of Light") when plotting discrete sine values.
  • The excerpt ends with the formal definition of the derivative from Chapter 2.
  • Common confusion: The title suggests exponential content, but the actual text is unrelated.

📭 Content mismatch

📭 What the excerpt contains

The source text does not discuss the exponential function e^x at all. Instead, it includes:

  • Trigonometry exercises: problems about sine, cosine, tangent, secant, cosecant, cotangent, their periods, addition formulas, and identities.
  • Discrete sine graphs: a detailed exploration of visual patterns when plotting y = sin n (discrete integer values) rather than y = sin x (continuous curve).
  • Moiré patterns: an explanation of hexagonal visual artifacts arising from interference between periodic patterns.
  • Derivative definition: the formal limit definition of the derivative from Chapter 2.

🚫 Missing exponential content

No information is provided about:

  • The definition or properties of e^x
  • The base e (Euler's number)
  • Exponential growth or decay
  • The derivative of e^x
  • Applications of exponential functions

🔢 Trigonometry exercises (partial content)

🔢 Types of problems

The excerpt lists 32 numbered exercises covering:

  • Evaluating trigonometric functions at specific angles (e.g., theta = 3π/2)
  • Verifying trigonometric identities (e.g., cos²θ + sin²θ)
  • Graphing functions and determining periods
  • Addition formulas for sine and cosine
  • Solving trigonometric equations
  • Geometric applications (law of cosines, triangle problems)

🔢 Key formulas mentioned

The exercises reference but do not fully explain:

  • Addition formulas for cos(s + t) and sin(s + t)
  • Complementary angle relationships: sin θ = cos(π/2 - θ)
  • Period properties of tan θ and cot θ
  • Identities like sec²θ - tan²θ and csc²θ - cot²θ

📊 Discrete sine patterns

📊 The main observation

The excerpt describes plotting y = sin n for integer values n = 1, 2, 3, ..., 10,000.

  • Continuous vs discrete: The graph of sin x is one continuous curve; the graph of sin n picks discrete points from that curve.
  • Visual illusion: When 10,000 points are plotted, they appear to lie on more than 40 separate sine curves.
  • Scale effect: The same 1,000 points look like sine curves in one scale and hexagons in another scale.

📊 Why points cluster on curves

The excerpt explains that certain integer values n produce sin n values close to zero:

  • sin 22 ≈ 0 because 22/7 is close to π, so 22 is close to 7π (whose sine is zero).
  • sin 44 ≈ 0 because 44 is close to 14π.
  • Pattern: Points at n = 0, 44, 88, 132, ... (multiples of 44) lie on the "middle sine curve."

The excerpt states there are 44 curves in total, starting near the heights sin 0, sin 1, ..., sin 43.

📊 Moiré patterns and hexagons

When the same points are plotted at a different scale, hexagonal patterns appear.

Moiré pattern: visual interference between periodic patterns.

  • The hexagons arise from the way human eyes perceive overlapping periodic structures.
  • The excerpt notes this is "a problem with your eyes," not the mathematics itself.
  • Applications mentioned: engineering, optics; problems: printing misalignment, TV vertical lines, dizziness in cloth manufacturing.

🧮 Derivative definition

🧮 Formal definition

The excerpt ends with the beginning of Chapter 2, introducing the derivative:

At time t, the derivative f'(t) or df/dt or v(t) is f'(t) = lim (Δt → 0) [f(t + Δt) - f(t)] / Δt

  • Interpretation: The ratio [f(t + Δt) - f(t)] / Δt is the average velocity (or average rate of change) over a short time interval Δt.
  • Limit process: The derivative is the limit of this ratio as Δt approaches zero.
  • Notation: The derivative can be written as f'(t), df/dt, or v(t) (when representing velocity).

🧮 Examples mentioned

The excerpt briefly recalls two examples from Chapter 1:

  • When distance is t², the velocity (derivative) is 2t.
  • When f(t) = sin t, the derivative is v(t) = cos t.

No further explanation or derivation is provided in the excerpt.


Note: This excerpt does not contain the expected content on the exponential function e^x. To study that topic, a different source or section is required.

41

Growth and Decay in Science and Economics

6.3 Growth and Decay in Science and Economics

🧭 Overview

🧠 One-sentence thesis

The differential equation dy/dt = cy (and its extension dy/dt = cy + s with a source term) governs exponential growth and decay across science and economics, determining population dynamics, radioactive dating, financial investments, and steady-state behaviors.

📌 Key points (3–5)

  • Core equation: dy/dt = cy has the solution y = y₀e^(ct), where c is the growth rate (c > 0) or decay rate (c < 0).
  • Three-quantity relationship: Given any two of {y₀, c, t} plus one additional fact, the third can be determined—leading to problems about finding doubling time, decay constants, or initial values.
  • Source term extension: dy/dt = cy + s (with continuous deposits/withdrawals s) has solution y = (y₀ + s/c)e^(ct) - s/c, combining exponential growth from y₀ with contributions from the source.
  • Common confusion: Don't confuse annual rates with continuous rates—continuous compounding at 5% yields an effective annual rate of about 5.13% (e^0.05 ≈ 1.0513).
  • Steady state vs transient: When c < 0 (decay), the transient term y₀e^(ct) → 0 and the system approaches steady state y∞ = -s/c, independent of initial conditions.

📐 The fundamental exponential equation

📐 What dy/dt = cy means

The differential equation dy/dt = cy states that the rate of change is proportional to the current amount.

  • This is fundamentally different from dy/dx = x, which only asks for an antiderivative; here the unknown y itself appears on the right-hand side.
  • With c = 1 and x as the variable, dy/dx = y can be rewritten as dy/y = dx, leading to ln y = x + C, hence y = e^x · e^C.
  • Geometric interpretation: A field of tangent arrows where slopes grow steeper as y grows (not as x grows).
  • The solution curve must stay tangent to these arrows at every point.

🔑 Standard solution form

The solution to dy/dt = cy starting from y = y₀ at t = 0 is:

y = y₀e^(ct)

  • The constant c is the "growth rate" (c > 0) or "decay rate" (c < 0).
  • Time variable t replaces x in most applications.
  • The exponential e^(ct) is the growth/decay factor; y₀ is the initial condition.

🔢 Three types of problems

🔢 Type 1: Finding doubling time T (given c)

When does y₀e^(cT) = 2y₀?

  • This gives e^(cT) = 2, so cT = ln 2.
  • Doubling time: T = (ln 2)/c ≈ 0.7/c.
  • Example: At c = 0.1 (10% continuous growth), doubling takes about 7 time units.
  • More generally: time to multiply by factor k is (ln k)/c.

🔢 Type 2: Finding decay constant c (given half-life T)

For carbon-14 with half-life T = 5568 years, when does y = (1/2)y₀?

  • This gives e^(cT) = 1/2, so cT = ln(1/2) = -ln 2.
  • Decay constant: c = -ln(2)/T (negative for decay).
  • The ratio y(T)/y(0) = e^(cT) determines c via logarithms.

🔢 Type 3: Finding initial value y₀ (given y at time t)

If y(1) = 5 and c = 2, what was y₀?

  • From y(t) = y₀e^(ct), we get y₀ = y(t)e^(-ct).
  • This "runs the process backward"—growth forward becomes decay backward.
  • Example: y₀ = 5e^(-2).
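
The three problem types reduce to one-line formulas. The sketch below uses hypothetical helper names (`doubling_time`, `decay_constant`, `initial_value`) and only restates the algebra above:

```python
import math

def doubling_time(c):
    """Type 1: solve e^(cT) = 2 for T."""
    return math.log(2) / c

def decay_constant(half_life):
    """Type 2: solve e^(cT) = 1/2 for c (the result is negative)."""
    return -math.log(2) / half_life

def initial_value(y_t, c, t):
    """Type 3: run y(t) = y0 e^(ct) backward to recover y0."""
    return y_t * math.exp(-c * t)

T = doubling_time(0.1)           # about 6.93, close to the 0.7/c rule of thumb
c14 = decay_constant(5568)       # carbon-14 decay constant, per year
y0 = initial_value(5, 2, 1)      # 5 e^(-2)
```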

⏱️ Key time property

For any step Δt, the multiplier is always e^(cΔt):

  • y(T + t) = y₀e^(c(T+t)) = (y₀e^(cT))e^(ct).
  • Each time interval multiplies by the same factor, whether at the start or later.

🌍 Applications in science

🌍 Population growth

Model: dy/dt = b·y − d·y = c·y, where b = birth rate, d = death rate, c = b − d (net rate).

  • Solution: y = y₀e^(ct).
  • Example: Earth's population with c = 0.02/year doubles in T ≈ 0.7/0.02 = 35 years.
  • Limitation: This model cannot hold for very large t (populations can't grow exponentially forever).
  • Dimensions: b, c, d have units of "1/time" (per person per unit time).

☢️ Radioactive dating

Living organisms maintain radiocarbon balance; dead material decays.

  • Carbon-14 dating: Ratio of disintegrations = e^(ct), where c is known from half-life.
  • Example: Charcoal giving 0.97 disintegrations vs 6.68 in living wood:
    • e^(ct) = 0.97/6.68
    • t = (1/c) ln(0.97/6.68) ≈ 15,500 years, using c = −(ln 2)/5568 per year (Lascaux cave paintings).
  • Uranium dating: Comparing U-238 and U-235 with different half-lives determines when uranium was created (about 6 billion years ago).

🌡️ Newton's Law of Cooling

Model: dy/dt = c(y - y∞), where y∞ is ambient temperature.

  • The rate is proportional to the temperature difference.
  • Solution: (y - y∞) = (y₀ - y∞)e^(ct).
  • Example: Body found at 90° in 70° room, later 80°—work backward to find time of death.
  • Don't confuse: y∞ is the steady-state temperature (not zero).

💰 Financial applications

💰 Continuous vs annual rates

At continuous rate c, the growth over time dt is dy = cy·dt.

  • After one year: e^c ≈ 1 + (annual rate).
  • Example: 5% continuous = e^0.05 ≈ 1.0513 = 5.13% annual (effective rate).
  • Banks quote both: the "effective rate" accounts for compounding.

💰 Present and future value

| Concept | Formula | Meaning |
| --- | --- | --- |
| Future value | y₀e^(ct) | What y₀ now becomes in t years |
| Present value | ye^(-ct) | What y in t years is worth now |
| Doubling time | (ln 2)/c | Time for money to double |

Example: $24 in 1626 at 8% continuous interest → $24 · e^(0.08 · 365) = $24 · e^(29.2) ≈ $115 trillion after 365 years.
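
These formulas are easy to check numerically. A minimal sketch, with function names chosen here for illustration:

```python
import math

def effective_annual_rate(c):
    """Annual rate equivalent to continuous compounding at rate c."""
    return math.exp(c) - 1

def future_value(y0, c, t):
    """What y0 grows to after t years at continuous rate c."""
    return y0 * math.exp(c * t)

def present_value(y, c, t):
    """What must be deposited now to have y after t years."""
    return y * math.exp(-c * t)

rate = effective_annual_rate(0.05)         # about 0.0513
manhattan = future_value(24, 0.08, 365)    # about 1.15e14 dollars
```

Present value and future value undo each other: `present_value(future_value(y0, c, t), c, t)` returns `y0` up to rounding.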

🔄 Adding a source term

🔄 The extended equation

dy/dt = cy + s (constant source s, starting from y₀)

  • s represents continuous deposits (s > 0) or withdrawals (s < 0).
  • Units: s has dimensions "amount/time" to match dy/dt.
  • Example: Initial investment y₀ = $8000 with deposits s = $1000/year.

🔄 Solution methods

Method 1 (fast): Assume y = Ae^(ct) + B

  • Substitute into equation: cAe^(ct) = c(Ae^(ct) + B) + s.
  • This gives cB + s = 0, so B = -s/c.
  • From y(0) = y₀: A = y₀ + s/c.
  • Key formula: y = (y₀ + s/c)e^(ct) - s/c = y₀e^(ct) + (s/c)(e^(ct) - 1).

Method 2 (Duhamel's principle): Add up all outputs

  • Initial deposit y₀ grows to y₀e^(ct).
  • Small deposit s·dT at time T grows by factor e^(c(t-T)) to reach time t.
  • Total: y(t) = y₀e^(ct) + ∫₀ᵗ e^(c(t-T)) s dT.
  • Integration yields the same formula.

Method 3 (for steady state): Look at difference y - y∞

  • If y∞ = -s/c is the steady state, then d/dt(y - y∞) = c(y - y∞).
  • This is a pure exponential: (y - y∞) = (y₀ - y∞)e^(ct).
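
As a sanity check, the key formula can be compared with a direct step-by-step integration of dy/dt = cy + s. The sketch uses the $8000 and $1000/year figures from the example; the 5% rate and the 20-year horizon are assumptions made here, since the example does not fix them.

```python
import math

def with_source(y0, c, s, t):
    """Closed form y = (y0 + s/c) e^(ct) - s/c, valid for c != 0."""
    return (y0 + s / c) * math.exp(c * t) - s / c

def euler(y0, c, s, t, steps=100_000):
    """Step dy = (c*y + s) dt forward in small increments."""
    y, dt = y0, t / steps
    for _ in range(steps):
        y += (c * y + s) * dt
    return y

exact = with_source(8000, 0.05, 1000, 20)   # assumed c = 0.05, t = 20
approx = euler(8000, 0.05, 1000, 20)
```

The two values agree to several digits, and the agreement improves as `steps` increases.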

💵 Six financial questions

💵 Questions 1–2: No source (s = 0)

  1. Future value: y₀ → y₀e^(ct) after time t.
  2. Present value: To get y later, deposit ye^(-ct) now.

💵 Questions 3–4: No initial deposit (y₀ = 0)

  3. Deposits to target: Depositing s = $1000/year at 5% for 20 years (so ct = 1) yields y = (1000/0.05)(e - 1) ≈ $34,400.
  4. Required deposit rate: To reach $20,000 in 20 years requires s = 1000/(e - 1) ≈ $582/year.

💵 Questions 5–6: End with nothing (y = 0)

  5. Annuity: To withdraw $1000/year for 20 years, deposit y₀ = (s/c)(1 - e^(-ct)) ≈ $12,640.
  6. Loan repayment: To clear a $20,000 loan in 20 years requires s = -cy₀e^(ct)/(e^(ct) - 1) ≈ -$1582/year, that is, payments of about $1582/year.

Puzzle solution: The difference $1582 - $582 = $1000 is exactly the interest on $20,000 at 5%—you can pay $1000/year to stay even with interest, and the extra $582 builds to $20,000 to repay principal.
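
The four numerical answers, and the puzzle, can be reproduced directly from the formulas. A short sketch with variable names chosen here for illustration:

```python
import math

c, years = 0.05, 20              # 5% continuous rate for 20 years, so c * years = 1
growth = math.exp(c * years)     # the factor e^(ct), here equal to e

deposits_total = (1000 / c) * (growth - 1)           # Question 3
deposit_rate = 20_000 * c / (growth - 1)             # Question 4
annuity_cost = (1000 / c) * (1 - 1 / growth)         # Question 5
loan_payment = c * 20_000 * growth / (growth - 1)    # Question 6 (size of payment)

# The puzzle: the gap between the loan payment and the deposit rate
# is the yearly interest c * 20,000 = 1000.
gap = loan_payment - deposit_rate
```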

🎯 Steady state behavior

🎯 When decay dominates (c < 0)

The solution y = (y₀ + s/c)e^(ct) - s/c has two parts:

  • Transient term: (y₀ + s/c)e^(ct) → 0 as t → ∞ (dies out).
  • Steady state: y∞ = -s/c (remains constant).

At steady state: dy/dt = 0, meaning cy + s = 0.

  • The source s exactly balances the decay cy.
  • y∞ depends on s and c, but not on y₀.

🎯 Population example

Bermuda with birth rate b = 0.02, death rate d = 0.03 (net c = -0.01), immigration s = 1200/year:

  • Steady state: y∞ = -1200/(-0.01) = 120,000.
  • Population approaches 120,000 regardless of whether it starts at 5,000 or 5,000,000.
  • If y₀ < 120,000, population grows; if y₀ > 120,000, population decays.
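
A quick check that both starting populations end at the same place. The sketch evaluates the solution (y - y∞) = (y₀ - y∞)e^(ct) after 2000 years, a horizon picked here only to make the transient negligible:

```python
import math

def population(y0, c, s, t):
    """Solution of dy/dt = c*y + s written around the steady state -s/c."""
    y_inf = -s / c
    return y_inf + (y0 - y_inf) * math.exp(c * t)

small_start = population(5_000, -0.01, 1200, 2000)
large_start = population(5_000_000, -0.01, 1200, 2000)
# Both are within one person of 120,000.
```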

🎯 Don't confuse

  • Transient vs steady: the transient (y₀ - y∞)e^(ct) depends on the initial condition; y∞ = -s/c does not.
  • Growth vs decay: c > 0 means no steady state (exponential growth); c < 0 means convergence to y∞.
42

Logarithms

6.4 Logarithms

🧭 Overview

🧠 One-sentence thesis

The natural logarithm ln x, defined as the integral of 1/x from 1 to x, provides the theoretical foundation for exponential functions and yields powerful techniques for differentiation and integration through its properties and the chain rule.

📌 Key points (3–5)

  • Definition as integral: ln x is defined as the area under the curve y = 1/x from 1 to x, making it the "missing integral" of the −1 power.
  • Two fundamental properties: ln(ab) = ln a + ln b (product becomes sum) and ln(b^n) = n ln b (power becomes multiple), both provable from the integral definition.
  • Logarithmic differentiation (LD): for complicated products or powers, differentiate ln p instead of p directly, then multiply back by p to recover the derivative.
  • Common confusion: ln x is only defined for x > 0; for integrals that may cross negative values, use ln |u| to handle both positive and negative u (but never u = 0).
  • Integration yields logarithms: the integral of du/u equals ln |u|, which extends via substitution to many rational forms like 1/(cx + 7) or tan x.

📐 Definition and foundational properties

📐 The natural logarithm as area

ln x = integral from 1 to x of (1/t) dt

  • This is the direct definition: logarithm is the area under the hyperbola y = 1/x, measured from the starting point x = 1.
  • At x = 1, the area is zero, so ln 1 = 0.
  • The integral does not extend to x ≤ 0; ln x is defined only for x > 0.
  • As x → 0⁺, the area becomes infinite in the negative direction: ln 0 = −∞.
  • As x → ∞, the area grows without bound: ln ∞ = +∞.

Why this definition matters: Earlier chapters integrated all powers x^n except the −1 power. The logarithm fills that gap and avoids circular reasoning (we cannot define e^x as its own integral).

🔗 Property 1: Logarithm of a product

ln(ab) = ln a + ln b

Proof from the integral definition:

  • The area from 1 to a is ln a.
  • The area from a to ab must equal ln b.
  • Use substitution u = x/a in the integral from a to ab of (1/x) dx.
  • Then x = au, dx = a du, so dx/x = du/u.
  • The limits become u = 1 to u = b, yielding ln b.
  • Neighboring areas combine: integral from 1 to a plus integral from a to ab equals integral from 1 to ab.

Example: ln 6 = ln(2·3) = ln 2 + ln 3.

🔢 Property 2: Logarithm of a power

ln(b^n) = n ln b

Proof from the integral definition:

  • Use substitution x = u^n in the integral from 1 to b^n of (1/x) dx.
  • Then dx = n u^(n−1) du, so dx/x = n du/u.
  • The limits become u = 1 to u = b, yielding n times the integral from 1 to b of (1/u) du = n ln b.

Example: ln 8 = ln(2³) = 3 ln 2.

Don't confuse: ln(b^n) pulls the exponent down as a multiplier; ln(a + b) has no simplification.
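
The area definition and both properties can be tested numerically. The sketch below approximates the area under 1/t with a midpoint rule; `ln_by_area` is a name invented here, and the number of strips is an arbitrary choice.

```python
import math

def ln_by_area(x, n=100_000):
    """Signed midpoint-rule area under 1/t between t = 1 and t = x (x > 0)."""
    h = (x - 1) / n                      # negative when x < 1
    return h * sum(1 / (1 + (i + 0.5) * h) for i in range(n))

# Property 1: ln 6 = ln 2 + ln 3, so this gap should be nearly zero.
product_gap = ln_by_area(6) - (ln_by_area(2) + ln_by_area(3))

# Property 2: ln 8 = 3 ln 2, so this gap should be nearly zero too.
power_gap = ln_by_area(8) - 3 * ln_by_area(2)
```

For x below 1 the strip width is negative, so the function returns a negative area, matching ln x < 0 there.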

🔄 Inverse relationship with e^x

🔄 Defining e and e^x via logarithms

  • e is defined as the unique number whose logarithm equals 1: ln e = 1, so the area from 1 to e under 1/x is exactly 1.
  • e^π is defined as the unique number whose logarithm equals π: ln(e^π) = π.
  • This approach constructs the exponential function as the inverse of the logarithm.

Why the area reaches every value:

  • The area from 1 to 2 is more than 1/2 (because 1/x > 1/2 on that interval of length 1).
  • The area from 2 to 4 is also more than 1/2 (because 1/x > 1/4 on that interval of length 2), so the combined area from 1 to 4 is more than 1; we reach area = 1 before x = 4 (actually at e ≈ 2.718).
  • Since 1/x is always positive, the area is strictly increasing and never returns to any value.

📈 Growth rate comparison

  • ln x grows very slowly: ln x → ∞ as x → ∞, but (ln x)/x → 0.
  • ln x is passed by every root: (ln x)/x^(1/n) → 0 for any n, because e^x eventually surpasses every power x^n (and they are inverses).
  • Example: at x = 10, ln 10 ≈ 2.3 versus √10 ≈ 3.2; but at x = e^10, ln(e^10) = 10 versus √(e^10) = e^5 ≈ 148, and ln x loses badly.

Key insight: To double the area under 1/x, you must square the distance (because ln(x²) = 2 ln x).

🧮 Approximation near x = 1

🧮 Linear and quadratic approximations

For x near zero:

  • Linear: ln(1 + x) ≈ x
  • Quadratic: ln(1 + x) ≈ x − (1/2)x²
  • Also: e^x ≈ 1 + x (linear) and e^x ≈ 1 + x + (1/2)x² (quadratic)

Why the linear approximation works:

  • Between 1 and 1 + x, the area under 1/x is nearly a rectangle with base x and height 1.
  • The curved area ln(1 + x) is close to the rectangular area x.
  • A small triangle is chopped off at the top, with area approximately (1/2)x².

Example: ln 1.01 = 0.0099503 (actual) versus 0.01 (linear approximation); the difference 0.0000497 is predicted almost exactly by −(1/2)(0.01)² = −0.00005.

Don't confuse: Two wrongs make a right here: ln(e^x) ≈ ln(1 + x) ≈ x, and ln(e^x) = x exactly.

🔍 Differentiation with logarithms

🔍 Basic derivative and chain rule

The derivative of ln x is 1/x. The derivative of ln u(x) is (1/u)(du/dx).

  • By the Fundamental Theorem of Calculus, since ln x is defined as the integral of 1/x, its derivative must be 1/x.
  • For ln u(x), apply the chain rule: inside function is u, outside is ln.

Examples:

  • d/dx ln(3x) = (1/(3x))·3 = 1/x (the 3 cancels!)
  • d/dx ln(x³) = 3x²/x³ = 3/x
  • d/dx ln(x² + 1) = 2x/(x² + 1)
  • d/dx ln(cos x) = −sin x/cos x = −tan x
  • d/dx ln(ln x) = (1/ln x)·(1/x)

Common confusion: The slope of ln(3x) is not 3/x; it equals 1/x because ln(3x) = ln 3 + ln x, and the constant ln 3 disappears when differentiating.

🪵 Logarithmic differentiation (LD)

For products and powers, differentiate ln p instead of p directly:

Method:

  1. Take ln of both sides: ln p = (sum or simpler expression)
  2. Differentiate: (1/p)(dp/dx) = (derivative of right side)
  3. Multiply by p: dp/dx = p·(derivative of ln p)

Example (product): If p(x) = x·√(x−1), then

  • ln p = ln x + (1/2)ln(x−1)
  • (1/p)(dp/dx) = 1/x + 1/(2(x−1))
  • dp/dx = p·[1/x + 1/(2(x−1))]

Example (power): If p = x^(1/x), then

  • ln p = (1/x)ln x
  • (1/p)(dp/dx) = (1/x²)(−ln x + 1)
  • dp/dx = x^(1/x)·[(1 − ln x)/x²]

The catch: You must multiply back by p at the end, which can complicate the answer—but for complex products or variable exponents, LD is often simpler than the product or power rule.
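
The LD result for p = x^(1/x) can be checked against a finite-difference slope. A minimal sketch; the step size h is an arbitrary choice made here:

```python
import math

def p(x):
    return x ** (1 / x)

def dp_dx(x):
    """Derivative from logarithmic differentiation: p * (1 - ln x) / x^2."""
    return p(x) * (1 - math.log(x)) / x ** 2

def central_difference(f, x, h=1e-6):
    """Numerical slope (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

formula_slope = dp_dx(2.0)
numeric_slope = central_difference(p, 2.0)
```

The formula also shows that the slope vanishes where ln x = 1, so x^(1/x) has its stationary point at x = e.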

∫ Integration producing logarithms

∫ The fundamental integral pattern

Integral of (du/dx)/u(x) dx = ln |u(x)| + C

Or equivalently: integral of du/u = ln |u| + C

Strategy: Try to identify u(x) so that the integrand contains (du/dx) divided by u.

Examples:

  • Integral of dx/(x + 7) = ln |x + 7| + C
  • Integral of dx/(cx + 7) = (1/c)ln |cx + 7| + C
  • Integral of x dx/(x² + 7) = (1/2)ln(x² + 7) + C (here u = x² + 7, du/dx = 2x)
  • Integral of dx/(x ln x) = ln |ln x| + C (here u = ln x, du/dx = 1/x)

∫ Trigonometric integrals

  • Tangent: integral of tan x dx = integral of (sin x)/(cos x) dx = −ln |cos x| + C (u = cos x, du = −sin x dx)
  • Cotangent: integral of cot x dx = integral of (cos x)/(sin x) dx = ln |sin x| + C (u = sin x, du = cos x dx)
  • Secant (requires a trick): integral of sec x dx = ln |sec x + tan x| + C
    • Multiply numerator and denominator by (sec x + tan x)
    • Then u = sec x + tan x, and du/dx = sec x tan x + sec² x appears in the numerator
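  • Cosecant: integral of csc x dx = −ln |csc x + cot x| + C = ln |csc x − cot x| + C (similar trick with u = csc x + cot x, where du/dx = −csc x cot x − csc² x supplies the minus sign)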

∫ Absolute value and domain

Why write ln |u|:

  • When u > 0, the integral of 1/u is ln u.
  • When u < 0, ln u is undefined, but we can switch to −u: integral of (du/dx)/u dx = integral of (−du/dx)/(−u) dx = ln(−u).
  • Combining both cases: ln |u| works for u ≠ 0.

Forbidden case: u = 0 gives infinite area on both sides; the integral cannot cross zero.

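Example: Integral from −1 to −e of dx/x = [ln |x|] from −1 to −e = ln e − ln 1 = 1 − 0 = 1 (the curve lies below the x-axis, but the limits run from right to left, so the two minus signs cancel; in the usual left-to-right order, from −e to −1, the integral is −1).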

Don't confuse: We do not have logarithms of negative numbers (in real calculus); we use absolute value to handle negative u. Never integrate across u = 0.

∫ Special integrals

  • Integral of ln x: integral of ln x dx = x ln x − x + C (derived by recognizing that d/dx(x ln x) = ln x + 1, so subtract x to remove the extra 1)
  • Definite integral examples:
    • Integral from x to 3x of dt/t = ln(3x) − ln x = ln 3
    • Integral from 0.1 to 1 of dx/x = ln 1 − ln 0.1 = 0 − ln(1/10) = ln 10

Application: The area integral from a to b of dx/(ln x) (no elementary formula) is approximately the number of primes between a and b—this is the prime number theorem. Near e^1000, about 1/1000 of integers are prime.

43

Separable Equations Including the Logistic Equation

6.5 Separable Equations Including the Logistic Equation

🧭 Overview

🧠 One-sentence thesis

Separable differential equations allow us to solve important nonlinear models like the logistic equation by splitting variables into separate integrals, revealing how populations grow rapidly at first and then level off into an S-curve as competition limits further expansion.

📌 Key points (3–5)

  • What separation means: rearranging dy/dt = u(y)v(t) so all y terms are on one side and all t terms on the other, then integrating both sides independently.
  • The logistic equation models realistic growth: dy/dt = cy - by² captures both natural growth (cy) and competition that slows growth (-by²), producing an S-shaped curve.
  • Steady state at c/b: when competition term by² balances growth term cy, the population stops changing and reaches equilibrium y = c/b.
  • Common confusion—linear vs nonlinear: the equation dy/dt = cy is separable and linear (exponential growth forever), but dy/dt = y + t is not separable; dy/dt = cy - by² is separable but nonlinear (growth stops).
  • Why it matters: these methods solve real models in biology (population dynamics), chemistry (reaction rates), and ecology (resource competition).

🔧 The separation method

🔧 Core idea of separation

To solve dy/dt = u(y)v(t), separate dy/u(y) from v(t)dt and integrate both sides.

  • The goal: isolate all y-related terms on the left, all t-related terms on the right.
  • Then integrate: ∫ dy/u(y) = ∫ v(t)dt + C.
  • After integration, substitute the initial condition (t = 0, y = y₀) to find the constant C.
  • Finally, solve for y(t) explicitly if possible.

📝 Example: dy/dt = y²

  • Separate: dy/y² = dt.
  • Integrate: -1/y = t + C.
  • At t = 0, y = y₀ gives C = -1/y₀.
  • Solution: y = y₀/(1 - ty₀).
  • Warning: this solution blows up (goes to infinity) when t reaches 1/y₀.

📝 Example: dy/dt = ty

  • Separate: dy/y = t dt.
  • Integrate: ln y = (1/2)t² + C.
  • At t = 0, y = y₀ gives C = ln y₀.
  • Solution: y = y₀ exp(t²/2).
  • The interest rate c = t (time-dependent) produces exponent t²/2.
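
Both solutions can be verified by substituting them back into their equations. The sketch below compares a numerical slope with the right-hand side; the starting value y₀ = 2 and the test times are arbitrary choices made here.

```python
import math

def blow_up_solution(t, y0):
    """Solution of dy/dt = y^2; valid only while t * y0 < 1."""
    return y0 / (1 - t * y0)

def variable_rate_solution(t, y0):
    """Solution of dy/dt = t * y."""
    return y0 * math.exp(t * t / 2)

def slope(f, t, h=1e-6):
    """Central-difference estimate of df/dt."""
    return (f(t + h) - f(t - h)) / (2 * h)

y0 = 2.0

# dy/dt = y^2 at t = 0.25 (blow-up would occur at t = 1/y0 = 0.5)
lhs_1 = slope(lambda u: blow_up_solution(u, y0), 0.25)
rhs_1 = blow_up_solution(0.25, y0) ** 2

# dy/dt = t * y at t = 1.5
lhs_2 = slope(lambda u: variable_rate_solution(u, y0), 1.5)
rhs_2 = 1.5 * variable_rate_solution(1.5, y0)
```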

❌ When separation fails

  • The equation dy/dt = y + t is not separable—you cannot isolate y from t.
  • Other methods (like assuming y = Ae^t + B + Dt) may still work.

🌱 The logistic equation

🌱 Why the simple model fails

Problem: dy/dt = cy predicts exponential growth forever, but real populations face limits.

  • Constant growth rate c = (birth rate - death rate) is unrealistic long-term.
  • Competition for food and space must slow growth as population y gets large.
  • The true rate c depends on population size: c(y), not a constant.

🌱 The logistic model

dy/dt = cy - by²

  • Growth term cy: natural increase when population is small.
  • Competition term -by²: interactions between individuals slow growth; the number of interactions is proportional to y times y.
  • Typically b is very small compared to c, so competition only matters when y is large.
  • This is the basic model of "growth versus competition."

🎯 Steady state and equilibrium

  • When dy/dt = 0, growth stops: cy - by² = 0.
  • Solving: y = c/b (the steady state or carrying capacity).
  • Example (world population): c ≈ 0.029/year, b ≈ 3×10⁻¹²/year gives y∞ = c/b ≈ 10 billion people.
  • At equilibrium, loss from competition exactly balances gain from growth.

📈 The inflection point

  • Halfway to steady state, at y = c/(2b), the S-curve bends from accelerating to decelerating.
  • The second derivative d²y/dt² = 0 here; the slope dy/dt is maximum.
  • Found by differentiating: y'' = (c - 2by)y' = 0 when y = c/(2b).

📐 Solving the logistic equation

📐 Separation and integration

  • Start: dy/dt = cy - by².
  • Separate: dy/(cy - by²) = dt.
  • Integrate both sides: ∫ dy/(cy - by²) = ∫ dt.
  • The y-integral requires "partial fractions" (covered in Section 7.4).
  • Result: ln[y/(c - by)] = ct + C.

📐 Finding the S-curve formula

  • At t = 0, y = y₀ gives C = ln[y₀/(c - by₀)].
  • Take exponentials: y/(c - by) = e^(ct) · y₀/(c - by₀).
  • Key insight: it is y/(c - by) that grows exponentially, not y itself.
  • Final formula: y = c/[b + de^(-ct)], where d = (c - by₀)/y₀ is fixed by the initial condition.
  • As t → ∞, e^(-ct) → 0, so y → c/b (the steady state).
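
A short sketch of the S-curve formula. Setting t = 0 in y/(c - by) = e^(ct) · y₀/(c - by₀) gives d = (c - by₀)/y₀; the values c = 1, b = 0.001, y₀ = 10 are hypothetical and chosen only for illustration.

```python
import math

def logistic(t, y0, c, b):
    """S-curve y = c / (b + d e^(-ct)) with d = (c - b*y0) / y0."""
    d = (c - b * y0) / y0
    return c / (b + d * math.exp(-c * t))

c, b, y0 = 1.0, 0.001, 10.0       # hypothetical rates and starting value

start = logistic(0, y0, c, b)      # recovers y0
late = logistic(50, y0, c, b)      # essentially the steady state c/b

# The equation itself: a numerical slope should match c*y - b*y^2.
t, h = 3.0, 1e-6
numeric_slope = (logistic(t + h, y0, c, b) - logistic(t - h, y0, c, b)) / (2 * h)
y = logistic(t, y0, c, b)
model_slope = c * y - b * y * y
```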

🔄 Alternative: linear equation for z = 1/y

  • Surprising fact: if you set z = 1/y, then z satisfies a linear equation.
  • By calculus: z' = -y'/y² = -(cy - by²)/y² = -cz + b.
  • This is linear: z' = -cz + b.
  • Solution: z = Ae^(-ct) + b/c.
  • Turn upside down: y = 1/z gives the same S-curve.

🧪 Applications in biology and chemistry

🧪 The Law of Mass Action

The reaction rate is proportional to the product of concentrations.

  • For reaction mA + nB → pC, the rate is proportional to [A] times [B].
  • Example: d[A]/dt = -r[A][B] and d[C]/dt = +k[A][B].
  • This produces quadratic terms (like y²) in the differential equations.
  • Why it matters: interactions between two populations of size y and z give rate proportional to yz; one population competing with itself gives y².

🧪 The Michaelis-Menten (MM) equation

dy/dt = -cy/(y + K)

  • Models enzyme-catalyzed reactions in biochemistry.
  • Enzyme: a catalyst that speeds up reactions and emerges unchanged.
  • The Michaelis constant K depends on reaction rates.
  • Behavior:
    • When y is large: dy/dt ≈ -c (maximum rate).
    • When y is small: dy/dt ≈ -cy/K (proportional to y).
  • Separation gives: ∫(y + K)/y dy = -∫c dt, leading to y + K ln y = -ct + C.
  • No simple formula for y(t), but the solution can be graphed by computer.

🧪 Examples of catalysts

  • Platinum in catalytic converters (reacts with pollutants).
  • Spray propellants (destroy ozone layer).
  • Enzymes in blood clotting (Factor VIII missing in hemophilia).
  • Yeast (makes bread rise).
  • Meat tenderizer (predigests protein).

🎨 The y-line method

🎨 Understanding dy/dt = f(y)

  • Draw a horizontal "y line" (the y-axis).
  • Add arrows showing the sign of f(y):
    • When f(y) > 0, y increases (arrow points right).
    • When f(y) < 0, y decreases (arrow points left).
    • When f(y) = 0, y is stationary (steady state).

🎨 Stable vs unstable steady states

  • Stable: arrows point toward the steady state from both sides.
  • Unstable: arrows point away from the steady state.
  • The y-line shows which direction y moves and where it stops.
  • Example: for logistic equation, y = 0 is unstable, y = c/b is stable.
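
The y-line test is mechanical enough to code: look at the sign of f(y) just below and just above a steady state. A sketch with a hypothetical helper `classify_steady_state` and illustrative logistic rates c = 1, b = 0.001:

```python
def classify_steady_state(f, y_star, eps=1e-6):
    """Classify a zero of f by the arrow directions on either side."""
    below, above = f(y_star - eps), f(y_star + eps)
    if below > 0 and above < 0:
        return "stable"       # arrows point toward y_star from both sides
    if below < 0 and above > 0:
        return "unstable"     # arrows point away from y_star
    return "undetermined"     # same sign on both sides, or a zero nearby

def logistic_rate(y, c=1.0, b=0.001):
    return c * y - b * y * y

at_zero = classify_steady_state(logistic_rate, 0.0)
at_capacity = classify_steady_state(logistic_rate, 1000.0)   # y = c/b
```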
44

Powers Instead of Exponentials

6.6 Powers Instead of Exponentials

🧭 Overview

🧠 One-sentence thesis

The exponential function e^x can be understood through discrete compound growth (1 + x/n)^n, which converges to e^x as n approaches infinity, connecting difference equations used in finance and computing to the continuous differential equations of calculus.

📌 Key points (3–5)

  • The infinite series for e^x: e^x = 1 + x + x²/2 + x³/6 + x⁴/24 + ... is the only power series starting from 1 that equals its own derivative.
  • Discrete vs continuous growth: (1 + x/n)^n approaches e^x as n → ∞, bridging compound interest (discrete steps) and continuous exponential growth.
  • Difference equations mirror differential equations: y(t+1) = ay(t) (discrete) corresponds to y' = cy (continuous), with solution a^t y₀ vs e^(ct) y₀.
  • Common confusion: More frequent compounding increases returns but approaches a limit—continuous compounding at 100% yields e ≈ 2.718, not infinity.
  • Practical applications: Finance (loans, annuities, IRA accounts) uses discrete formulas; scientific computing approximates continuous differential equations with discrete time steps.

📐 The infinite series representation

📐 Why e^x needs infinitely many terms

e^x = 1 + x + (1/2)x² + (1/6)x³ + (1/24)x⁴ + ...

  • No polynomial can be its own derivative because the highest power x^n drops to nx^(n-1).
  • Each term x^n is divided by n factorial (n! = 1·2·3·...·n).
  • Example: x⁴/24 because 4! = 1·2·3·4 = 24; x⁵/120 because 5! = 120.

🔄 The derivative equals itself

The derivative of each term produces the previous term:

  • Derivative of x^n/n! is x^(n-1)/(n-1)! because the n from differentiation cancels the n in the factorial.
  • Therefore d(e^x)/dx = 0 + 1 + x + x²/2 + x³/6 + ... = e^x.
  • The integral of each term is the next term, so ∫e^x dx = e^x + C.

🎯 Convergence to e

When x = 1: e = 1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 + 1/720 + ...

  • Nine terms give e ≈ 2.71828 with high accuracy.
  • The factorials cause extremely fast convergence as n increases.
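The convergence claim can be checked numerically. The following is a minimal sketch; the function name exp_series is chosen here for illustration and does not come from the text.

```python
import math

def exp_series(x, terms):
    """Sum the first `terms` terms of 1 + x + x^2/2! + x^3/3! + ..."""
    total = 0.0
    term = 1.0  # current term x^n / n!, starting at n = 0
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # turn x^n/n! into x^(n+1)/(n+1)!
    return total

# Compare partial sums at x = 1 with the library value of e.
for terms in (3, 6, 9, 12):
    approx = exp_series(1.0, terms)
    print(terms, approx, math.e - approx)
```

The gap to math.e shrinks quickly as terms are added, which is the factorial effect described above.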

💰 Compound interest and discrete growth

💰 How compounding works

Starting with $1000 at 100% annual rate:

  • Annual compounding: $1000 → $2000 (multiply once by 2).
  • Semi-annual: $1000 → $1500 → $2250 (multiply twice by 1.5).
  • Quarterly: multiply four times by 1.25 → $2441.41.
  • Monthly: (1 + 1/12)^12 · 1000 = $2613.04.
  • Daily: (1 + 1/365)^365 · 1000 = $2714.57.
  • Continuous: e · 1000 = $2718.28.
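The balances in this list can be reproduced with a few lines of code. This is a sketch under the stated assumptions ($1000 at a 100% annual rate for one year); the helper name compound is illustrative.

```python
import math

def compound(principal, rate, n):
    """Balance after one year at annual `rate`, compounded n times."""
    return principal * (1 + rate / n) ** n

schedule = [("annual", 1), ("semi-annual", 2), ("quarterly", 4),
            ("monthly", 12), ("daily", 365)]
for label, n in schedule:
    print(f"{label:12s} {compound(1000, 1.0, n):.2f}")

# The limit as n grows is continuous compounding: principal * e^rate.
print(f"{'continuous':12s} {1000 * math.exp(1.0):.2f}")
```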

🔢 Two methods to prove (1 + 1/n)^n → e

Quick method (logarithms):

  • ln[(1 + 1/n)^n] = n ln(1 + 1/n) ≈ n · (1/n) = 1 using the approximation ln(1 + x) ≈ x.
  • As n → ∞, this approximation improves and the logarithm approaches exactly 1.
  • Therefore (1 + 1/n)^n approaches the number whose logarithm is 1, which is e.

Slow method (binomial theorem):

  • Multiply out: (1 + 1/n)^n = 1 + n·(1/n) + [n(n-1)/(1·2)]·(1/n)² + [n(n-1)(n-2)/(1·2·3)]·(1/n)³ + ...
  • Each term approaches a limit: n(n-1)/(1·2) · (1/n)² → 1/(1·2), etc.
  • The sum of all limits is 1 + 1 + 1/2 + 1/6 + 1/24 + ... = e.

📈 General formula for e^x

The limit of (1 + x/n)^n is e^x.

  • Replace 1/n with x/n in the binomial expansion.
  • As n → ∞, the result is 1 + x + x²/2 + x³/6 + ... = e^x.
  • This works for positive or negative x.

🔢 Difference equations vs differential equations

🔢 Basic discrete growth

Difference equation: y(t+1) = ay(t)

  • Each step multiplies by the same number a.
  • Solution: y(t) = a^t y₀ (at discrete times t = 0, 1, 2, ...).
  • Compare to continuous: y' = cy gives y(t) = e^(ct) y₀.

| Feature | Discrete | Continuous |
|---|---|---|
| Equation | y(t+1) = ay(t) | y' = cy |
| Solution | a^t y₀ | e^(ct) y₀ |
| Growth condition | a > 1 or a < -1 | c > 0 |
| Decay condition | -1 < a < 1 | c < 0 |

💵 With a source term

Difference equation: y(t+1) = ay(t) + s

  • Each step multiplies by a and adds s.
  • Solution: y(t) = a^t y₀ + s(a^(t-1) + a^(t-2) + ... + a + 1) = a^t y₀ + s(a^t - 1)/(a - 1).

Example (IRA deposits): 8% interest with annual $2000 deposits (y₀ = 0, a = 1.08):

  • y(t) = 2000(1.08^t - 1)/(1.08 - 1) = 2000(1.08^t - 1)/0.08.
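To see that the closed form agrees with stepping the difference equation directly, here is a small sketch using the IRA numbers above. The function names are illustrative, and the closed form assumes a ≠ 1.

```python
def iterate_balance(a, s, y0, steps):
    """Apply y(t+1) = a*y(t) + s repeatedly, starting from y0."""
    y = y0
    for _ in range(steps):
        y = a * y + s
    return y

def closed_form(a, s, y0, t):
    """Evaluate a^t*y0 + s*(a^t - 1)/(a - 1); requires a != 1."""
    return a**t * y0 + s * (a**t - 1) / (a - 1)

# IRA example: 8% interest, $2000 deposited each year, starting from zero.
for years in (1, 10, 30):
    print(years,
          iterate_balance(1.08, 2000.0, 0.0, years),
          closed_form(1.08, 2000.0, 0.0, years))
```

The two columns should match up to rounding, since the closed form is just the geometric sum of the deposits.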

⚖️ Steady state

When |a| < 1, the system approaches equilibrium:

  • As t → ∞, a^t → 0, so y(t) → s/(1 - a).
  • Example: Half the balance spent yearly with $2000 deposits (a = 1/2): y_∞ = 2000/(1 - 1/2) = $4000.
  • At steady state: y_∞ = ay_∞ + s, solving gives y_∞ = s/(1 - a).

Compare to continuous: y' = cy + s gives steady state y_∞ = -s/c.

📊 Supply and demand example

Three assumptions:

  1. Supply next time depends on price this time: S(t+1) = cP(t).
  2. Demand next time depends on price next time: D(t+1) = -dP(t+1) + b.
  3. Demand equals supply: D(t+1) = S(t+1).

Combining: -dP(t+1) + b = cP(t), with steady state P_∞ = b/(c + d).

Stability condition: The economy is stable if c < d (supply less sensitive than demand).

  • Example (unstable): c = 2, b = d = 1 gives -P(t+1) + 1 = 2P(t); prices oscillate and grow.
  • Example (stable): c = 1/2, b = d = 1 gives -P(t+1) + 1 = (1/2)P(t); prices converge to 2/3.
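The two examples can be simulated by solving the combined equation for P(t+1). This is a sketch only; the starting price P(0) = 0 and the step counts are assumptions made for illustration.

```python
def price_path(c, d, b, p0, steps):
    """Iterate -d*P(t+1) + b = c*P(t), i.e. P(t+1) = (b - c*P(t)) / d."""
    prices = [p0]
    for _ in range(steps):
        prices.append((b - c * prices[-1]) / d)
    return prices

stable = price_path(c=0.5, d=1.0, b=1.0, p0=0.0, steps=40)
unstable = price_path(c=2.0, d=1.0, b=1.0, p0=0.0, steps=10)

print(stable[-1])     # settles near b/(c + d) = 2/3
print(unstable[-4:])  # alternates in sign and grows in size
```

Each step multiplies the distance from the steady state by -c/d, which is why c < d gives convergence and c > d gives growing oscillation.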

💳 The six fundamental finance problems

💳 Setup and effective rates

Annual rate x = 0.05 (5%), compounded n times per year over 20 years:

  • Quarterly: (1 + 0.05/4)^4 = 1.0509 → effective rate 5.09%.
  • Continuous: e^0.05 = 1.0513 → effective rate 5.13%.

💳 Future and present value (y and y₀)

Problem 1 (Future value): y growing from y₀

  • Discrete: y = (1 + 0.05/n)^(20n) y₀
  • Continuous: y = e^(0.05·20) y₀

Problem 2 (Present value): Deposit y₀ to reach y

  • Discrete: y₀ = (1 + 0.05/n)^(-20n) y
  • Continuous: y₀ = e^(-0.05·20) y

💳 Deposits and final balance (s and y)

Problem 3 (Savings): y growing from deposits s

  • Discrete: y = s[(1 + 0.05/n)^(20n) - 1]/(0.05/n)
  • Continuous: y = s[e^(0.05·20) - 1]/0.05

Problem 4 (Required deposit): Deposits s to reach y

  • Discrete: s = y(0.05/n)/[(1 + 0.05/n)^(20n) - 1]
  • Continuous: s = y(0.05)/[e^(0.05·20) - 1]

💳 Annuities and loans (y₀ and s)

Problem 5 (Annuity): Deposit y₀ to receive payments s

  • Discrete: y₀ = s[1 - (1 + 0.05/n)^(-20n)]/(0.05/n)
  • Continuous: y₀ = s[1 - e^(-0.05·20)]/0.05
  • You deposit less than 20n·s because the remaining balance keeps earning interest while the bank pays you.

Problem 6 (Loan/Mortgage): Repay y₀ with payments s

  • Discrete: s = y₀(0.05/n)/[1 - (1 + 0.05/n)^(-20n)]
  • Continuous: s = y₀(0.05)/[1 - e^(-0.05·20)]
  • You pay more than y₀ in total because the lender charges interest on the unpaid balance while you repay.

Pattern: In each pair, one of the three numbers y, y₀, s is zero: s = 0 in (1,2), y₀ = 0 in (3,4), and the final balance y = 0 in (5,6).
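As an illustration of Problem 6, the sketch below computes the discrete loan payment and then simulates the loan to confirm the balance reaches zero. The loan amount of 100,000 and the monthly schedule (n = 12) are assumptions for the example, as is the convention that interest is added each period before the payment is subtracted.

```python
def loan_payment(y0, rate, years, n):
    """Payment per period that repays y0 with n payments per year."""
    r = rate / n  # interest rate per period
    return y0 * r / (1 - (1 + r) ** (-years * n))

def remaining_balance(y0, rate, years, n, payment):
    """Simulate the loan: each period adds interest, then subtracts the payment."""
    r = rate / n
    balance = y0
    for _ in range(years * n):
        balance = balance * (1 + r) - payment
    return balance

payment = loan_payment(100_000, 0.05, 20, 12)
print(payment, 20 * 12 * payment)  # total paid exceeds the amount borrowed
print(remaining_balance(100_000, 0.05, 20, 12, payment))  # close to zero
```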

🖥️ Scientific computing with Euler's method

🖥️ Discretizing differential equations

To solve y' = cy on a computer, replace derivatives with differences:

Euler's method: [y(t + Δt) - y(t)]/Δt = cy(t)

  • Rearranging: y(t + Δt) = (1 + cΔt)y(t).
  • Each step multiplies by a = 1 + cΔt.
  • After n steps: y = (1 + cΔt)^n y₀ at time nΔt.

⚠️ Stability issues

Test case: y' = -y (so c = -1), true solution y = e^(-t) decays.

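| Δt | a = 1 + cΔt | a² | a^10 | a^20 |
|---|---|---|---|---|
| 3 | -2 | 4 | 1024 | 1048576 |
| 1 | 0 | 0 | 0 | 0 |
| 1/10 | 0.90 | 0.81 | 0.35 | 0.12 |
| 1/20 | 0.95 | 0.90 | 0.60 | 0.36 |

  • Top row (Δt = 3): Total instability; the numbers blow up when they should decay.
  • Second row (Δt = 1): All zeros after the first step, equally useless.
  • Bottom rows (Δt = 1/10 and 1/20): Reasonable, since |cΔt| is only 0.10 or 0.05.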

Don't confuse: Large time steps cause instability; the method requires small Δt for accuracy.
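The stability table can be regenerated from the growth factor a = 1 + cΔt. The sketch below assumes the test case c = -1 from above; the function name euler_factor is illustrative.

```python
def euler_factor(c, dt):
    """Growth factor per Euler step for y' = c*y."""
    return 1 + c * dt

# One row per step size: the factor a and its powers a^2, a^10, a^20.
for dt in (3, 1, 1 / 10, 1 / 20):
    a = euler_factor(-1.0, dt)
    print(f"dt={dt:<6.2f} a={a:6.2f} a^2={a**2:8.2f} "
          f"a^10={a**10:10.2f} a^20={a**20:12.2f}")
```

The powers stay bounded only when |a| ≤ 1, which for c = -1 means Δt ≤ 2; accuracy needs a much smaller Δt than that.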

🎯 Convergence analysis

At clock time nΔt = 1 with Δt = 1/n:

  • (1 - 1/n)^n → e^(-1) ≈ 0.37 as n → ∞.
  • The values 0.35 and 0.36 are converging to the correct e^(-1).

Error estimate: Since nΔt = 1, ln(1 - Δt)^n = n ln(1 - Δt) ≈ n[-Δt - (1/2)(Δt)²] = -1 - (1/2)Δt:

  • Therefore (1 - Δt)^n ≈ e^(-1)·e^(-Δt/2) ≈ e^(-1)(1 - Δt/2).
  • Error is approximately (1/2)Δt·e^(-1) ≈ (1/5)Δt (since e^(-1)/2 ≈ 0.18 ≈ 1/5).
  • Cutting Δt in half cuts the error in half—first-order accuracy.
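First-order accuracy can be observed directly by running Euler's method on y' = -y up to t = 1 with shrinking steps. This is a minimal sketch; the function name euler_decay and the chosen step counts are illustrative.

```python
import math

def euler_decay(n):
    """Euler's method for y' = -y, y(0) = 1, using n steps of size 1/n up to t = 1."""
    y, dt = 1.0, 1.0 / n
    for _ in range(n):
        y = y + dt * (-y)  # y(t + dt) = (1 - dt) * y(t)
    return y

for n in (10, 20, 40, 80):
    err = math.exp(-1) - euler_decay(n)
    # err / dt should settle near e^(-1)/2, roughly 0.18
    print(n, euler_decay(n), err, err * n)
```

Doubling n roughly halves the error column, matching the (1/5)Δt estimate.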

🚫 Why Euler's method is inadequate

  • Error proportional to Δt is like using rectangles for integrals.
  • Completely unacceptable for scientific computing (e.g., weather prediction).
  • Better methods (trapezoidal rule) have errors proportional to (Δt)², much smaller.
  • All good software uses higher-order methods beyond Euler's first-order approach.
45

Hyperbolic Functions

6.7 Hyperbolic Functions

🧭 Overview

🧠 One-sentence thesis

Hyperbolic functions (cosh x and sinh x) combine exponentials e^x and e^(-x) in ways that parallel circular trigonometric functions but with sign changes, connecting to hyperbolas instead of circles and appearing in physical applications like hanging cables.

📌 Key points (3–5)

  • What hyperbolic functions are: cosh x and sinh x are specific combinations of e^x and e^(-x), analogous to cosine and sine but for hyperbolas instead of circles.
  • The core identity: (cosh x)² − (sinh x)² = 1 (compare to (cos x)² + (sin x)² = 1), which places the point (cosh x, sinh x) on a unit hyperbola.
  • Parallel properties with sign changes: nearly every formula for sine and cosine has a hyperbolic counterpart, usually differing only by a minus sign.
  • Common confusion: hyperbolic functions are not periodic like sine and cosine; cosh x grows exponentially for large |x|, while sinh x grows toward +∞ for large positive x and toward −∞ for large negative x.
  • Physical meaning: cosh x describes the shape of a hanging cable (a catenary), and inverse hyperbolic functions can be expressed as logarithms.

📐 Definitions and basic behavior

📐 The two main hyperbolic functions

Hyperbolic cosine: cosh x = (e^x + e^(-x)) / 2
Hyperbolic sine: sinh x = (e^x − e^(-x)) / 2

  • "Cosh" rhymes with "gosh"; "sinh" is pronounced "cinch."
  • For large positive x, both functions approach (1/2)e^x because e^(-x) becomes negligible.
  • For large negative x, e^(-x) dominates:
    • cosh x still goes to +∞ (because the e^(-x) term is positive).
    • sinh x goes to −∞ (because of the minus sign in front of e^(-x)).

🔄 Even and odd symmetry

  • cosh(−x) = cosh x and cosh 0 = 1: cosh is an even function, like cosine.
  • sinh(−x) = −sinh x and sinh 0 = 0: sinh is an odd function, like sine.

🏗️ Physical example: the catenary

  • A cable hanging under its own weight traces the shape of cosh x.
  • The excerpt gives the formula: y = a cosh(x/a), where a = (cable tension) / (cable density).
  • Turned upside down, this shape is the Gateway Arch in St. Louis—"the largest upside-down cosh function ever built."
  • Example: Busch Stadium in St. Louis has 96 catenary curves to match the Arch.

🔗 The hyperbola connection

🔗 The fundamental identity

The key property is:

(cosh x)² − (sinh x)² = 1

  • Compare to the circular identity: (cos x)² + (sin x)² = 1.
  • The circular identity places (cos x, sin x) on a unit circle.
  • The hyperbolic identity places (cosh x, sinh x) on a unit hyperbola.
  • As x varies, the point (cosh x, sinh x) travels along the hyperbola.
  • This is why they are called "hyperbolic" functions (the "h" in cosh and sinh).

✅ Verification of the identity

Check by substitution:

  • Left side = [(e^x + e^(-x))/2]² − [(e^x − e^(-x))/2]²
  • Expand: [(e^(2x) + 2 + e^(-2x))/4] − [(e^(2x) − 2 + e^(-2x))/4]
  • Simplify: (e^(2x) + 2 + e^(-2x) − e^(2x) + 2 − e^(-2x)) / 4 = 4/4 = 1.
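The same identity can be spot-checked numerically from the exponential definitions. This sketch defines its own cosh and sinh from e^x and e^(-x) rather than calling the library versions, so the check exercises the definitions themselves.

```python
import math

def cosh(x):
    """Hyperbolic cosine from its definition (e^x + e^(-x)) / 2."""
    return (math.exp(x) + math.exp(-x)) / 2

def sinh(x):
    """Hyperbolic sine from its definition (e^x - e^(-x)) / 2."""
    return (math.exp(x) - math.exp(-x)) / 2

# (cosh x)^2 - (sinh x)^2 should equal 1 for every x, up to rounding.
for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, cosh(x) ** 2 - sinh(x) ** 2)
```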

🧮 Derivatives and integrals

🧮 Derivatives mirror sine and cosine (with sign changes)

| Function | Derivative | Circular parallel |
|---|---|---|
| cosh x | sinh x | d/dx(cos x) = −sin x |
| sinh x | cosh x | d/dx(sin x) = cos x |

  • Notice: the derivative of cosh x is sinh x (positive), whereas the derivative of cos x is −sin x (negative).
  • The derivative of sinh x is cosh x, exactly like sin x → cos x.

📦 Integrals follow directly

  • ∫ sinh x dx = cosh x + C
  • ∫ cosh x dx = sinh x + C

🔢 Four other hyperbolic functions

Defined in parallel to tangent, cotangent, secant, cosecant:

  • tanh x = sinh x / cosh x = (e^x − e^(-x)) / (e^x + e^(-x))
  • coth x = cosh x / sinh x = (e^x + e^(-x)) / (e^x − e^(-x))
  • sech x = 1 / cosh x = 2 / (e^x + e^(-x))
  • csch x = 1 / sinh x = 2 / (e^x − e^(-x))

🔢 Identities from the fundamental relation

Divide (cosh x)² − (sinh x)² = 1 by (cosh x)² or (sinh x)²:

  • 1 − (tanh x)² = (sech x)²
  • (coth x)² − 1 = (csch x)²

🔢 Example derivatives and integrals

  • d/dx(tanh x) = (sech x)²
  • d/dx(sech x) = −sech x tanh x
  • ∫ tanh x dx = ∫ (sinh x / cosh x) dx = ln(cosh x) + C

🔁 Inverse hyperbolic functions

🔁 Three main inverse functions

The excerpt highlights three:

  1. y = sinh⁻¹ x (meaning x = sinh y) has derivative dy/dx = 1 / √(x² + 1)
  2. y = tanh⁻¹ x (meaning x = tanh y) has derivative dy/dx = 1 / (1 − x²)
  3. y = sech⁻¹ x (meaning x = sech y) has derivative dy/dx = −1 / [x√(1 − x²)]
  • These derivatives are computed by the chain rule: dy/dx = 1 / (dx/dy).
  • They provide new antiderivative formulas (integrals).

🪵 Logarithmic forms

Because ln x is the inverse of e^x, inverse hyperbolic functions can be expressed as logarithms:

  • sinh⁻¹ x = ln[x + √(x² + 1)]
  • cosh⁻¹ x = ln[x + √(x² − 1)]
  • tanh⁻¹ x = (1/2) ln[(1 + x)/(1 − x)]
  • sech⁻¹ x = ln[(1 + √(1 − x²))/x]
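These logarithmic forms can be compared against Python's built-in inverse hyperbolic functions. The sketch below is illustrative; the function names are made up here, and since the math module has no inverse sech, that formula is checked by applying sech = 1/cosh to the result.

```python
import math

def asinh_log(x):
    return math.log(x + math.sqrt(x * x + 1))

def acosh_log(x):
    return math.log(x + math.sqrt(x * x - 1))  # needs x >= 1

def atanh_log(x):
    return 0.5 * math.log((1 + x) / (1 - x))  # needs -1 < x < 1

def asech_log(x):
    return math.log((1 + math.sqrt(1 - x * x)) / x)  # needs 0 < x <= 1

print(asinh_log(0.75), math.asinh(0.75))
print(acosh_log(2.5), math.acosh(2.5))
print(atanh_log(0.3), math.atanh(0.3))
print(1 / math.cosh(asech_log(0.4)))  # should recover 0.4
```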

🪵 Example derivation for tanh⁻¹ x

The excerpt shows:

  • Start with y = tanh⁻¹ x, so x = tanh y.
  • Then (1 + x)/(1 − x) = (1 + tanh y)/(1 − tanh y).
  • Multiply numerator and denominator by cosh y: = (cosh y + sinh y)/(cosh y − sinh y) = e^y / e^(−y) = e^(2y).
  • Take logarithms: 2y = ln[(1 + x)/(1 − x)], so y = (1/2) ln[(1 + x)/(1 − x)].
  • Differentiate: dy/dx = (1/2)[1/(1 + x) − (−1)/(1 − x)] = 1/(1 − x²).

🪵 Why logarithms appear for hyperbolic but not circular inverses

  • The excerpt notes that sin⁻¹ x and tan⁻¹ x were not expressed as logarithms.
  • Answer: parallel formulas do exist for circular functions, but they involve imaginary numbers.
  • The hyperbolic functions use real exponentials, so their inverses naturally become real logarithms.

🌀 Connection to circular functions via imaginary exponents

🌀 Euler's formulas

The excerpt reveals "one of the great equations of mathematics":

cos x = (e^(ix) + e^(−ix)) / 2
sin x = (e^(ix) − e^(−ix)) / (2i)

  • These involve imaginary exponents (i = √(−1)).
  • Multiply sin x by i and add to cos x:

cos x + i sin x = e^(ix) (Euler's equation)

  • The excerpt calls this "unbelievably beautiful" and "infinitely more important than anything hyperbolic."
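Euler's equation and the two exponential formulas can be verified with complex arithmetic. This is a small sketch using Python's cmath module; the function names are illustrative.

```python
import cmath
import math

def cos_from_exp(x):
    """cos x = (e^(ix) + e^(-ix)) / 2; the imaginary part cancels."""
    return ((cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2).real

def sin_from_exp(x):
    """sin x = (e^(ix) - e^(-ix)) / (2i); the result is real."""
    return ((cmath.exp(1j * x) - cmath.exp(-1j * x)) / 2j).real

x = 0.7
print(cos_from_exp(x), math.cos(x))
print(sin_from_exp(x), math.sin(x))
print(cmath.exp(1j * x), complex(math.cos(x), math.sin(x)))  # Euler's equation
```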

🌀 Parallel to hyperbolic functions

  • The hyperbolic counterpart is: cosh x + sinh x = e^x (not beautiful, according to the excerpt).
  • The formulas for cos x and sin x are exactly parallel to cosh x and sinh x, except for the imaginary unit i.
  • This explains why every sine/cosine identity has a hyperbolic counterpart with sign changes: the circular formulas carry a factor i, and since i² = −1, dropping the i flips the sign wherever two sines are multiplied.

🌀 Don't confuse

  • Hyperbolic functions use real exponentials and grow without bound.
  • Circular functions use imaginary exponentials and are periodic.
  • The analogy is structural (formulas and identities), not behavioral (graphs and limits).
46

Integration by Parts

7.1 Integration by Parts

🧭 Overview

🧠 One-sentence thesis

Integration by parts transforms difficult integrals into easier ones by reversing the product rule, exchanging the problem of integrating u dv for the simpler problem of integrating v du.

📌 Key points (3–5)

  • What it does: converts the integral of u dv into uv minus the integral of v du, based on reversing the product rule for derivatives.
  • How to choose u and v: give u a nice derivative and dv a nice integral; typically ln x or inverse trig functions go into u, while e^x or trig functions go into dv.
  • When it works best: prime candidates include x^n times e^x, x^n times sin x or cos x, x^n times ln x, and inverse functions like arctan x.
  • Common confusion: the new integral ∫v du must be simpler than the original ∫u dv; poor choices (like u = cos x, dv = x dx for ∫x cos x dx) make the problem harder instead of easier.
  • Why it matters: integration by parts is not just a trick—it expresses physical laws like force balance and work equilibrium in engineering and physics.

🔄 The core formula and how it works

🔄 Deriving the formula from the product rule

The product rule states: u(x) · dv/dx + v(x) · du/dx = d/dx[u(x)v(x)]

  • Integrate both sides of the product rule.
  • The right side becomes u(x)v(x) after integration.
  • Move one integral to the other side with a minus sign.
  • Result: ∫u dv = uv − ∫v du (formula 3).

📐 For definite integrals

  • The integrated term uv is evaluated at the endpoints a and b.
  • Formula: ∫[a to b] u dv/dx dx = u(b)v(b) − u(a)v(a) − ∫[a to b] v du/dx dx.
  • Example: the area under y = ln x from 2 to 3 is 3 ln 3 − 3 − 2 ln 2 + 2.

🎯 The goal of choosing u and v

  • The new integral ∫v du should be easier than the original ∫u dv.
  • Try to give u a nice derivative and dv a nice integral.
  • If the choice makes the problem harder (e.g., x² appears instead of x), the direction is wrong.

🧪 Standard examples and patterns

🧪 Integrating ln x (Example 1)

  • Choose u = ln x and dv = dx (so v = x).
  • Apply the formula: ∫ln x dx = x ln x − ∫x · (1/x) dx.
  • The right side simplifies to x ln x − ∫1 dx = x ln x − x + C.
  • Key insight: we exchanged the integral of ln x for the integral of 1.

🧪 Integrating x cos x (Example 2)

  • Choose u = x and dv = cos x dx (so v = sin x).
  • Apply: ∫x cos x dx = x sin x − ∫sin x dx.
  • Complete: x sin x + cos x + C.
  • Don't confuse: if you chose u = cos x and dv = x dx, you'd get ½x² cos x + ∫½x² sin x dx, which is harder (x² is worse than x).

🧪 Integrating (cos x)² (Example 3)

  • Choose u = cos x and dv = cos x dx (so v = sin x).
  • Apply: ∫(cos x)² dx = cos x sin x + ∫(sin x)² dx.
  • Substitute (sin x)² = 1 − (cos x)² on the right.
  • Result: ∫(cos x)² dx = cos x sin x + x − ∫(cos x)² dx.
  • The original integral appears on both sides; move it to the left: 2∫(cos x)² dx = cos x sin x + x.
  • Divide by 2: ∫(cos x)² dx = ½(cos x sin x + x) + C.
  • Note: the definite integral from 0 to 2π equals π (by symmetry with (sin x)²).

🧪 Integrating arctan x (Example 4)

  • Choose u = arctan x and v = x (so dv = dx).
  • Apply: ∫arctan x dx = x arctan x − ∫x dx/(1 + x²).
  • The last integral has w = 1 + x² below and almost dw = 2x dx above.
  • Substitute: ∫x dx/(1 + x²) = ½∫dw/w = ½ ln w = ½ ln(1 + x²).
  • Final answer: x arctan x − ½ ln(1 + x²) + C.
  • Pattern: all familiar inverse functions can be integrated by parts with v = x.

🧪 Integrating x² e^x (Example 5)

  • Choose u = x² and dv = e^x dx (so v = e^x).
  • First integration: ∫x² e^x dx = x² e^x − ∫e^x · 2x dx.
  • The last integral involves x e^x, which still needs work.
  • Second integration (now u = x): ∫x e^x dx = x e^x − ∫e^x dx = x e^x − e^x.
  • Substitute back: ∫x² e^x dx = x² e^x − 2[x e^x − e^x] + C.
  • Key insight: two integrations by parts are needed when the first one only simplifies the problem halfway.
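The antiderivatives found in Examples 1, 2, 4, and 5 can be checked numerically by comparing F(b) − F(a) with a direct approximation of the integral. The sketch below uses a simple midpoint rule; the intervals and the helper names are assumptions chosen for illustration.

```python
import math

def midpoint_integral(f, a, b, n=2000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# name -> (integrand, antiderivative from the worked example, interval)
examples = {
    "ln x": (math.log,
             lambda x: x * math.log(x) - x, (2.0, 3.0)),
    "x cos x": (lambda x: x * math.cos(x),
                lambda x: x * math.sin(x) + math.cos(x), (0.0, 1.0)),
    "arctan x": (math.atan,
                 lambda x: x * math.atan(x) - 0.5 * math.log(1 + x * x), (0.0, 1.0)),
    "x^2 e^x": (lambda x: x * x * math.exp(x),
                lambda x: x * x * math.exp(x) - 2 * (x * math.exp(x) - math.exp(x)),
                (0.0, 1.0)),
}

def check(name):
    """Return (numerical integral, F(b) - F(a)) for one example."""
    f, F, (a, b) = examples[name]
    return midpoint_integral(f, a, b), F(b) - F(a)

for name in examples:
    print(name, *check(name))
```

In each row the two numbers should agree to several decimal places; a mismatch would point to a sign or factor error in the by-parts step.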

📋 Prime candidates and choosing strategy

📋 List of prime candidates

| Type | Example | Typical choice |
|---|---|---|
| x^n times exponential | x^n e^x | u = x^n, dv = e^x dx |
| x^n times trig | x^n sin x, x^n cos x | u = x^n, dv = sin x or cos x dx |
| x^n times logarithm | x^n ln x | u = ln x, dv = x^n dx |
| Exponential times trig | e^x sin x, e^x cos x | Either can be u or v |
| Inverse trig | arcsin x, arctan x | u = inverse function, v = x |

📋 How to decide

  • Logarithms and inverse functions: almost always choose as u (they have nice derivatives).
  • Exponentials and trig: usually choose as dv (they have nice integrals).
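  • Powers of x: choose as u when the other factor is an exponential or trig function (the derivative lowers the power); when the other factor is ln x or an inverse function, the power goes into dv instead.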
  • Don't confuse: ∫x sin(x²) dx uses substitution (not parts), because the derivative of x² is present.

🔧 Physical meaning (optional)

🔧 Integration by parts in engineering

In engineering, −dv/dx = f(x) represents a balance of forces, where v(x) = k du/dx is the internal force created by stretching (Hooke's law).

  • Multiply the force balance −dv/dx = f(x) by displacement u(x) and integrate.
  • Result: ∫f(x)u(x) dx = −u(x)v(x) + ∫v(x) du/dx dx (formula 18).
  • Left side: force times displacement = external work.
  • Right side last term: internal force times stretching = internal work.
  • Integrated term: includes −u(1)W, the work by a hanging weight (u(0) = 0 at the fixed support does no work).
  • Key insight: the balance of forces becomes a balance of work; this is the principle of virtual work, fundamental to mechanics.

🔧 Example: hanging bar (Example 6)

  • A bar is pulled down by constant force per unit length f(x) = F.
  • Solve: v(x) = −Fx + C and ku(x) = −½Fx² + Cx + D.
  • Boundary conditions: u = 0 at x = 0 (fixed top) gives D = 0; v = W at x = 1 (hanging weight) gives C = W + F.
  • Integration by parts shows external work equals internal work plus work by the weight.

🌀 The delta function (optional)

🌀 What the delta function is

The delta function δ(x) is the derivative of the unit step function U(x), which jumps from 0 to 1 at x = 0.

  • δ(x) = dU/dx, though there is no genuine derivative at the jump.
  • δ(x) = 0 everywhere except at x = 0, where it has an "infinite spike."
  • The integral across the jump: ∫[−A to A] δ(x) dx = U(A) − U(−A) = 1 (formula 13).
  • Key point: the delta function is only known by its integrals; it is not a true function.

🌀 Integrating v(x) times δ(x)

  • The integral of v(x)δ(x) equals v(0), the value of v at the spike.
  • Proof by integration by parts: ∫[−A to A] v(x)δ(x) dx = v(x)U(x) evaluated at endpoints − ∫[−A to A] U(x) dv/dx dx.
  • Simplify: v(A) · 1 − ∫[0 to A] 1 · dv/dx dx = v(A) − [v(A) − v(0)] = v(0) (formula 15).
  • Examples:
    • ∫[−2 to 2] cos x δ(x) dx = 1 (value of cos x at x = 0).
    • ∫[−5 to 6] (U(x) + δ(x)) dx = 7 (the step function contributes area 6 on [0, 6], the delta contributes 1).

🌀 Why it matters

  • In physics, the delta function represents concentrated forces or impulses.
  • Integration by parts makes physical sense: it connects the delta function (derivative of a jump) to smooth functions through their values at the spike.
47

Trigonometric Integrals

7.2 Trigonometric Integrals

🧭 Overview

🧠 One-sentence thesis

Any product of sines, cosines, secants, and tangents can be integrated using either substitution (when an exponent is odd) or double-angle formulas (when both exponents are even), with the choice of method depending on the parity of the exponents.

📌 Key points (3–5)

  • When to use substitution: If either the sine or cosine exponent is odd, separate out one factor (sin x dx or cos x dx) as du and convert the rest using sin²x + cos²x = 1.
  • When to use double angles: If both exponents are even, apply the identities cos²x = ½(1 + cos 2x) and sin²x = ½(1 - cos 2x) to halve the exponents and double the angle.
  • Special products with different angles: The integral of sin px cos qx (and similar products) over a full period equals zero when p ≠ q, but equals π when p = q for squared terms—this is fundamental to Fourier series.
  • Common confusion: Don't try substitution when both exponents are even; u = sin x won't help integrate sin⁴x dx without du present.
  • Tangents and secants: Use the identity 1 + tan²x = sec²x to convert between them, allowing negative powers (which introduce logarithms).

🔄 The substitution method for odd exponents

🎯 When one exponent is odd

General method for ∫ sinᵐx cosⁿx dx when m or n is odd: If n is odd, separate out a single cos x dx as du, convert remaining cosines to sines using cos²x = 1 - sin²x, then integrate with u = sin x. If m is odd, separate sin x dx as du and convert the rest to cosines.

  • The key is recognizing that an odd exponent means you can "peel off" one factor to serve as du.
  • Example: For ∫ sin²x cos³x dx, keep cos x dx as du, replace cos²x with (1 - sin²x), giving ∫ sin²x(1 - sin²x) cos x dx = ∫ u²(1 - u²) du.
  • This produces polynomial integrals in u that are straightforward to evaluate.
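Carrying the example through, ∫ u²(1 − u²) du = u³/3 − u⁵/5 with u = sin x. The sketch below checks that result on the interval from 0 to π/2, which is an assumption chosen for illustration; there the exact value is 1/3 − 1/5 = 2/15.

```python
import math

def midpoint_integral(f, a, b, n=2000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def antiderivative(x):
    """u^3/3 - u^5/5 with u = sin x, from the substitution above."""
    u = math.sin(x)
    return u**3 / 3 - u**5 / 5

numeric = midpoint_integral(lambda x: math.sin(x) ** 2 * math.cos(x) ** 3,
                            0.0, math.pi / 2)
exact = antiderivative(math.pi / 2) - antiderivative(0.0)
print(numeric, exact)  # both close to 2/15
```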

🔢 Both exponents odd

  • When both m and n are odd, either method works—choose whichever seems simpler.
  • Example: For ∫ sin⁵x dx, keep sin x dx and convert everything else: ∫ (1 - cos²x)² sin x dx with u = cos x and du = -sin x dx.

📐 The double-angle method for even exponents

🎲 Core identities

The double-angle formulas are:

  • cos²x = ½(1 + cos 2x)
  • sin²x = ½(1 - cos 2x)

These should be memorized along with their integrals:

  • ∫ cos²x dx = ½(x + sin x cos x) or ½x + ¼ sin 2x (plus C)
  • ∫ sin²x dx = ½(x - sin x cos x) or ½x - ¼ sin 2x (plus C)

🔁 Repeated application

  • For higher even powers like cos⁴x, apply the formula once to get cos²2x, then apply it again if needed.
  • Example: ∫ cos⁴x dx = ∫ (½ + ½ cos 2x)² dx = ¼ ∫ (1 + 2 cos 2x + cos² 2x) dx.
  • The cos² 2x term requires another application: cos² 2x = ½(1 + cos 4x).
  • Powers are cut in half as the angle doubles: cos⁴x → cos² 2x → cos 4x.

⚙️ Integration by parts alternative

  • Instead of double angles, you can use integration by parts with a reduction formula.
  • Reduction formula: n ∫ cosⁿx dx = cosⁿ⁻¹x sin x + (n - 1) ∫ cosⁿ⁻²x dx.
  • This reduces the exponent by 2 each time until reaching cos²x or cos x.
  • Don't confuse: Both methods (double-angle and parts) give correct answers that look different but are equivalent.
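The equivalence of the two methods can be checked for cos⁴x. In the sketch below, the reduction formula is applied recursively on the interval from 0 to b (the boundary term vanishes at 0), and the result is compared with 3b/8 + (sin 2b)/4 + (sin 4b)/32, which is what the double-angle expansion above integrates to. The function names are illustrative.

```python
import math

def cos_power_integral(n, b):
    """Integral of cos^n x from 0 to b via
    n*I_n = cos^(n-1)(b)*sin(b) + (n-1)*I_(n-2)."""
    if n == 0:
        return b            # integral of 1
    if n == 1:
        return math.sin(b)  # integral of cos x
    boundary = math.cos(b) ** (n - 1) * math.sin(b)
    return (boundary + (n - 1) * cos_power_integral(n - 2, b)) / n

def cos4_double_angle(b):
    """Integral of cos^4 x from 0 to b from the double-angle expansion."""
    return 3 * b / 8 + math.sin(2 * b) / 4 + math.sin(4 * b) / 32

for b in (0.5, 1.0, 2.0):
    print(b, cos_power_integral(4, b), cos4_double_angle(b))
```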

🌊 Products with different angles

🎵 Sine times cosine with different frequencies

Key identity: sin px cos qx = ½ sin(p + q)x + ½ sin(p - q)x

  • This separates the product into two simple sines.
  • Example: sin 8x cos 6x = ½ sin 14x + ½ sin 2x, which integrates easily.
  • Over a full period (0 to 2π), the integral equals zero because cosines at the endpoints cancel.

🔊 Same trigonometric functions

For products of two sines or two cosines:

  • sin px sin qx = -½ cos(p + q)x + ½ cos(p - q)x
  • cos px cos qx = ½ cos(p + q)x + ½ cos(p - q)x

Critical distinction:

  • When p ≠ q: the integral from 0 to 2π equals zero (all cosine terms vanish).
  • When p = q: you have sin²px or cos²px, and the integral equals π (the constant term ½ contributes).

📡 Importance to Fourier series

  • These zero integrals are "the most important in all of mathematics" according to the excerpt.
  • They enable signal processing and Fourier analysis because different frequencies are orthogonal.
  • Example: ∫₀²π sin 8x sin 7x dx = 0, but ∫₀²π sin² 8x dx = π.
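These orthogonality integrals are easy to confirm numerically. The sketch below uses a midpoint rule over one full period; the helper names are illustrative.

```python
import math

def midpoint_integral(f, a, b, n=2000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def full_period(f):
    """Integrate f from 0 to 2*pi."""
    return midpoint_integral(f, 0.0, 2 * math.pi)

different = full_period(lambda x: math.sin(8 * x) * math.sin(7 * x))
same = full_period(lambda x: math.sin(8 * x) ** 2)
mixed = full_period(lambda x: math.sin(8 * x) * math.cos(6 * x))
print(different, same, mixed)  # roughly 0, pi, 0
```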

🔺 Tangents and secants

🔑 The fundamental identity

Key identity: 1 + tan²x = sec²x (derived from cos²x + sin²x = 1 by dividing by cos²x)

  • This allows converting between tangents and secants, just as we converted between sines and cosines.
  • Since (tan x)' = sec²x and (sec x)' = sec x tan x, these are the natural substitution pairs.

📊 Basic integrals with logarithms

Negative powers introduce logarithms:

  • ∫ tan x dx = -ln|cos x| (using u = cos x)
  • ∫ sec x dx = ln|sec x + tan x| (using u = sec x + tan x)

🔽 Reduction strategy

  • For ∫ tanᵐx dx: separate off tan²x as (sec²x - 1), integrate tanᵐ⁻²x sec²x dx as a u-substitution, leaving ∫ tanᵐ⁻²x dx with exponent lowered by 2.
  • For ∫ sec³x dx: use integration by parts with u = sec x and v = tan x, then apply the identity to reduce back to ∫ sec x dx.
  • Every power tanᵐx or secⁿx eventually reduces to the basic logarithmic integrals.

🌍 Real-world application: Mercator maps

  • The integral ∫ sec x dx = ln(sec x + tan x) measures distance from the equator to latitude x on a Mercator projection.
  • Mapmakers stretch each latitude strip by 1/cos x (i.e., sec x) to preserve angles.
  • The distance north adds up strip heights dx/cos x, giving ∫ sec x dx.
  • Don't confuse: The distance to the North Pole is infinite on this projection because sec x → ∞ as x → π/2.
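The map distance can be computed both ways: by adding up strip heights dx/cos x numerically and by the logarithm formula. This is a sketch with illustrative function names; latitudes are converted from degrees to radians before integrating.

```python
import math

def midpoint_integral(f, a, b, n=2000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def numeric_height(lat):
    """Add up strip heights dx / cos x from the equator to latitude lat."""
    return midpoint_integral(lambda x: 1 / math.cos(x), 0.0, lat)

def mercator_height(lat):
    """Closed form ln(sec x + tan x)."""
    return math.log(1 / math.cos(lat) + math.tan(lat))

for degrees in (30, 60, 80, 89):
    lat = math.radians(degrees)
    print(degrees, numeric_height(lat), mercator_height(lat))
```

The heights grow without bound as the latitude approaches 90 degrees, which is the infinite distance to the pole noted above.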
48

Trigonometric Substitutions

7.3 Trigonometric Substitutions

🧭 Overview

🧠 One-sentence thesis

Trigonometric substitution transforms awkward square-root integrals into simpler trigonometric forms by replacing x with sin θ, tan θ, or sec θ depending on the structure of the expression under the radical.

📌 Key points (3–5)

  • Core method: Replace x with a trigonometric function to convert difficult square roots into familiar trigonometric identities.
  • Three main patterns: Use x = a sin θ for √(a² − x²), x = a tan θ for √(a² + x²), and x = a sec θ for √(x² − a²).
  • Critical detail: Always replace dx with the corresponding differential (e.g., dx = a cos θ dθ when x = a sin θ); forgetting this step gives completely wrong answers.
  • Common confusion: Don't confuse which substitution to use—the sign and position of terms under the square root determine the correct choice.
  • Completing the square: When linear terms (like 2bx) appear in the quadratic, first rewrite as a perfect square plus a constant before applying trigonometric substitution.

🔄 The three standard substitutions

🔄 For √(a² − x²): use x = a sin θ

When the square root has the form √(a² − x²), substitute x = a sin θ.

  • Why it works: The identity 1 − sin² θ = cos² θ converts the awkward square root into a simple cos θ.
  • Don't forget: dx becomes a cos θ dθ (both the a and the cos θ dθ are essential).
  • Example: √(1 − x²) becomes √(1 − sin² θ) = cos θ (assuming cos θ is positive).
  • Returning to x: Since x = a sin θ, the angle θ = sin⁻¹(x/a) at the end.

🔄 For √(x² − a²): use x = a sec θ

When the square root has the form √(x² − a²), substitute x = a sec θ.

  • Why it works: The identity sec² θ − 1 = tan² θ converts x² − a² into a² tan² θ, so the square root becomes a tan θ.
  • Don't forget: dx becomes a sec θ tan θ dθ.
  • This substitution applies when |x| ≥ a, outside the interval −a < x < a, because only there is the square root real.
  • Example: √(x² − 16) with x = 4 sec θ becomes √(16 sec² θ − 16) = 4 tan θ.

🔄 For √(a² + x²) or (a² + x²): use x = a tan θ

When the expression has the form a² + x², substitute x = a tan θ.

  • Why it works: The identity 1 + tan² θ = sec² θ converts a² + x² into a² sec² θ.
  • Don't forget: dx becomes a sec² θ dθ.
  • Example: The integral of dx/(16 + x²) with x = 4 tan θ becomes ∫(4 sec² θ dθ)/(16 sec² θ) = ∫dθ/4.

📊 Comparison table

| Square root form | Substitution | Identity used | dx replacement | Square root becomes |
|---|---|---|---|---|
| √(a² − x²) | x = a sin θ | 1 − sin² θ = cos² θ | a cos θ dθ | a cos θ |
| √(x² − a²) | x = a sec θ | sec² θ − 1 = tan² θ | a sec θ tan θ dθ | a tan θ |
| √(a² + x²) or (a² + x²) | x = a tan θ | 1 + tan² θ = sec² θ | a sec² θ dθ | a sec θ |

⚠️ Critical details and common mistakes

⚠️ The dx replacement is mandatory

  • The mistake: Computing ∫cos θ dθ when you meant ∫√(1 − x²) dx gives a totally wrong answer.
  • Why: Substitution changes both the integrand and the differential; the area remains the same only if both are changed correctly.
  • The excerpt emphasizes: "If we go from √(1 − x²) to cos θ, and forget the difference between dx and dθ, and just compute ∫cos θ dθ, the answer is totally wrong."
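A numerical example makes the mistake concrete. The area under √(1 − x²) from 0 to 1 is a quarter of the unit circle, π/4. With x = sin θ the limits become 0 and π/2, and the sketch below compares the correct transformed integral ∫cos²θ dθ with the version that forgets dx = cos θ dθ. The interval and helper name are assumptions for illustration.

```python
import math

def midpoint_integral(f, a, b, n=2000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# With x = sin(theta): sqrt(1 - x^2) = cos(theta) and dx = cos(theta) d(theta).
correct = midpoint_integral(lambda t: math.cos(t) * math.cos(t), 0.0, math.pi / 2)
forgot_dx = midpoint_integral(lambda t: math.cos(t), 0.0, math.pi / 2)

print(correct, math.pi / 4)  # matches the quarter-circle area
print(forgot_dx)             # about 1, which is not the area
```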

⚠️ Changing limits for definite integrals

  • When x changes to θ, the limits of integration must also change.
  • Example: If x = 4 at one limit and the substitution is x = 4 sin θ (so a = 4), then sin θ = 1 and θ = π/2.
  • Warning: Some limits may cause division by zero (infinite area); check carefully.

⚠️ Infinite vs finite area

  • Both 1/√x and 1/x^(3/2) blow up at x = 0, but:
    • ∫₀¹ (1/√x) dx = 2 (finite area, slow growth)
    • ∫₀¹ (1/x^(3/2)) dx = ∞ (infinite area, fast growth)

🔧 Completing the square

🔧 When linear terms appear

If the quadratic contains a linear term 2bx, rewrite x² + 2bx + c as (x + b)² + C to remove the linear term.

  • The technique: Match x² + 2bx with the square (x + b)² = x² + 2bx + b², then adjust the constant.
  • Fixing the constant: C = c − b² (for positive x²) or C = c + b² (for negative x²).
  • Example: x² + 10x + 16 = (x + 5)² − 9, so set u = x + 5 to get u² − 9.

🔧 Three-step process

  1. Complete the square: Get x² and x terms into one square (x + b)².
  2. Fix the constant: Add or subtract to recover the original function.
  3. Substitute u: Set u = x + b to eliminate the linear term, then apply standard trigonometric substitution.

🔧 When x² has a coefficient

  • If the quadratic starts with 5x² or −5x², factor out the 5 first.
  • Example: 5x² − 10x + 25 = 5(x² − 2x + 5) = 5[(x − 1)² + 4].
  • Then u = x − 1 gives 5[u² + 4], ready for u = 2 tan θ.
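The steps above can be sketched as a small helper that factors out the leading coefficient and completes the square. The function name and return convention are assumptions made for this illustration.

```python
def complete_square(a, b, c):
    """Return (shift, constant) such that
    a*x^2 + b*x + c == a*((x + shift)^2 + constant)."""
    shift = b / (2 * a)
    constant = c / a - shift * shift
    return shift, constant

print(complete_square(1, 10, 16))   # (5.0, -9.0): (x + 5)^2 - 9
print(complete_square(5, -10, 25))  # (-1.0, 4.0): 5[(x - 1)^2 + 4]
```

The sign of the returned constant indicates which substitution applies after setting u = x + shift: a positive constant with positive a points to the tangent form, a negative one to the secant form.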

🔀 Alternative: hyperbolic substitutions

🔀 Hyperbolic functions as substitutes

The excerpt mentions hyperbolic substitutions as alternatives:

| Trigonometric | Hyperbolic alternative |
|---|---|
| x = a sin θ for √(a² − x²) | x = a tanh θ (uses sech² θ) |
| x = a tan θ for √(a² + x²) | x = a sinh θ (uses cosh² θ) |
| x = a sec θ for √(x² − a²) | x = a cosh θ (uses sinh² θ) |

  • Hyperbolic forms may look simpler during integration.
  • The final answer often appears as cosh⁻¹ x or sinh⁻¹ x, which can be rewritten as logarithms.
  • Example: ∫dx/√(x² − 1) = cosh⁻¹ x = ln(x + √(x² − 1)).
49

Partial Fractions

7.4 Partial Fractions

🧭 Overview

🧠 One-sentence thesis

The method of partial fractions splits a rational function P(x)/Q(x) into a sum of simpler fractions—each easy to integrate—by factoring the denominator Q and finding constants that match the numerators.

📌 Key points (3–5)

  • What the method does: breaks P/Q into a sum of simpler pieces (partial fractions), each with a constant or linear numerator over a factor of Q.
  • When to use it: the degree of P must be less than the degree of Q; if not, divide first to lower P's degree.
  • How to find the constants: the "cover-up method" (Method 2) is quickest—cover up a factor in Q, substitute the value of x that makes that factor zero, and solve for the constant.
  • Common confusion—repeated vs single factors: a single factor (x - 5) contributes A/(x - 5); a repeated factor (x - 5)² contributes both A/(x - 5) and B/(x - 5)².
  • Why it matters: every rational function can be split this way, and each piece integrates to logarithms, arctangents, or simple rational functions.

🧩 Core idea and setup

🧩 What partial fractions are

Partial fractions: expressing a rational function P(x)/Q(x) as a sum of simpler fractions, each easy to integrate.

  • A rational function is a ratio of polynomials.
  • The integral of P/Q often involves logarithms (e.g., integral of 1/(x - 2) is ln|x - 2| + C).
  • Instead of integrating P/Q directly, split it into pieces like A/(x - 2) + B/(x + 2) + C/x, then integrate each piece separately.
  • Example: the sum 1/(x - 2) + 3/(x + 2) - 4/x came from a single rational function with common denominator (x - 2)(x + 2)(x).

📐 Degree requirement

  • The degree of P must be less than the degree of Q before splitting.
  • If degree(P) ≥ degree(Q), divide the leading term of P by the leading term of Q first.
  • Example (Example 4): (3x² + 2x + 7)/(x² + 1) becomes 3 + (2x + 4)/(x² + 1) after dividing 3x² by x².
  • Only the remainder (lower degree) is split into partial fractions.

🔍 Finding the factors and form

🔍 Factoring the denominator Q

  • The first step is to factor Q into linear factors (like x - 2) and quadratic factors (like x² + 1).
  • Linear factors correspond to real roots; quadratic factors (that don't factor further) correspond to imaginary roots.
  • Example: x⁴ - 1 factors into (x² - 1)(x² + 1), then (x + 1)(x - 1)(x² + 1).
  • The excerpt notes that factoring cubics or quartics is often hard in practice; most examples give the factors.

📝 Expected form of partial fractions

Each factor contributes a fraction with a specific numerator:

| Factor type | Contributes | Example |
| --- | --- | --- |
| Single linear (x - a) | A/(x - a) | A/(x - 2) |
| Repeated linear (x - a)² | A/(x - a) + B/(x - a)² | A/(x - 1) + B/(x - 1)² |
| Single quadratic (x² + bx + c) | (Cx + D)/(x² + bx + c) | (Cx + D)/(x² + 4) |
| Repeated quadratic (x² + bx + c)² | (Cx + D)/(x² + bx + c) + (Ex + F)/(x² + bx + c)² | (not shown in detail) |

  • A linear factor gets a constant numerator (A, B, C, ...).
  • A quadratic factor gets a linear numerator (Cx + D, Ex + F, ...).
  • Don't confuse: a repeated factor needs additional terms, not just one.

⚙️ Methods to find the constants

⚙️ Method 1 (slow): matching numerators

  • Put all fractions over the common denominator Q.
  • The numerators on both sides must be identical polynomials.
  • Expand and match coefficients of x⁴, x³, x², x, and the constant term.
  • This gives a system of equations for A, B, C, ...
  • Example (Example 1): (3x² + 8x - 4)/Q = [A(x + 2)(x) + B(x - 2)(x) + C(x - 2)(x + 2)]/Q.
  • The excerpt calls this an "invitation to human error."

🎯 Method 2 (quicker): cover-up method

Cover-up method: cover up a factor in Q, substitute the value of x that makes that factor zero, and the rest of the equation gives the constant.

  • To find A (numerator of A/(x - 2)):
    • Cover up (x - 2) in the denominator of P/Q.
    • Set x = 2 (the value that makes x - 2 zero).
    • Evaluate the rest of P/Q; the result is A.
  • Example (Example 1): to find A in (3x² + 8x - 4)/[(x - 2)(x + 2)(x)], cover up (x - 2) and set x = 2:
    • (3·2² + 8·2 - 4)/[(2 + 2)(2)] = 24/8 = 3 = A.
  • Repeat for each linear factor.
  • Example (Example 2): for (x + 2)/[(x - 1)(x + 3)], cover up (x - 1) and set x = 1 gives A = 3/4; cover up (x + 3) and set x = -3 gives B = (-1)/(-4) = 1/4.
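
The cover-up steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `cover_up` is hypothetical (not from the excerpt), and it handles only distinct linear factors with degree(P) below the number of factors. Exact fractions are used so the constants come out as in the worked examples.

```python
from fractions import Fraction

def cover_up(numerator, roots):
    """Constants A_i in P(x) / prod(x - r_i) = sum of A_i / (x - r_i).

    `numerator` is a function P(x); `roots` are the distinct numbers r_i.
    Hypothetical helper: valid only for distinct linear factors.
    """
    constants = []
    for i, r in enumerate(roots):
        r = Fraction(r)
        # "Cover up" the factor (x - r): multiply the other factors at x = r.
        rest = Fraction(1)
        for j, s in enumerate(roots):
            if j != i:
                rest *= r - Fraction(s)
        constants.append(Fraction(numerator(r)) / rest)
    return constants

# Example 1: (3x^2 + 8x - 4) / [(x - 2)(x + 2)(x)]
example_1 = cover_up(lambda x: 3 * x**2 + 8 * x - 4, [2, -2, 0])
print([str(c) for c in example_1])  # A, B, C in that order
```

Each constant costs one evaluation of P and one product, which is why cover-up is so much less error-prone than expanding and matching every coefficient.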

🧮 Handling repeated and quadratic factors

  • Repeated linear factors: use cover-up for the highest power, then match numerators for the rest.
    • Example (Example 5): (2x + 3)/(x - 1)² = A/(x - 1) + B/(x - 1)².
    • Multiply by (x - 1)² and set x = 1 to find B = 5.
    • Then match numerators or substitute another value to find A = 2.
  • Quadratic factors: cover-up doesn't work directly; match numerators after finding constants from linear factors.
    • Example (Example 6): after finding B and E by cover-up, match the full numerators to solve for A, C, D.
    • The excerpt says "compare the numerators" and "match coefficients" as a last resort.

🧪 Worked examples

🧪 Example 1: three linear factors

  • Problem: (3x² + 8x - 4)/[(x - 2)(x + 2)(x)] = A/(x - 2) + B/(x + 2) + C/x.
  • Cover up (x - 2), set x = 2: A = 3.
  • Cover up (x + 2), set x = -2: B = -1.
  • Cover up x, set x = 0: C = 1.
  • Result: 3/(x - 2) - 1/(x + 2) + 1/x.

🧪 Example 3: logistic equation

  • Problem: 1/[y(c - by)] = A/y + B/(c - by).
  • Multiply by y, set y = 0: A = 1/c.
  • Multiply by (c - by), set y = c/b: B = b/c.
  • Integral: (1/c)ln|y| - (1/c)ln|c - by| + C. Integrating B/(c - by) brings in a factor -1/b, which turns B = b/c into -1/c.
  • This was needed for the logistic differential equation.

🧪 Example 4: quadratic in denominator

  • Problem: (3x² + 2x + 7)/(x² + 1).
  • Degree of P equals degree of Q, so divide first: 3 + (2x + 4)/(x² + 1).
  • The quadratic x² + 1 cannot be factored into real linear factors.
  • Partial fractions accept a linear numerator (2x + 4) over a quadratic.
  • Integral: 3x + ln(x² + 1) + 4 tan⁻¹(x) + C.

🧪 Example 5: repeated linear factor

  • Problem: (2x + 3)/(x - 1)² = A/(x - 1) + B/(x - 1)².
  • Multiply by (x - 1)², set x = 1: B = 5.
  • Then A = 2 (by matching or substituting another x).
  • Integral: 2 ln|x - 1| - 5/(x - 1) + C.
  • Note: the fraction 5/(x - 1)² integrates without logarithms.

🧪 Example 6: everything combined

  • Problem: (2x³ + 9x² + 4)/[x²(x² + 4)(x - 1)] = A/x + B/x² + (Cx + D)/(x² + 4) + E/(x - 1).
  • Degree of P (3) < degree of Q (5), so no division needed.
  • Cover up (x - 1), set x = 1: E = 3.
  • Cover up x², set x = 0: B = -1.
  • Match numerators for the rest: A = -1, C = -2, D = 0.
  • The excerpt calls this "more of a game than a calculus problem."
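
A quick way to guard against the "human error" mentioned earlier is to compare both sides of Example 6 at a few points with exact arithmetic. The sketch below is an illustrative check, not part of the excerpt; the function names are hypothetical. Over the common denominator both numerators have degree at most 4, so agreement at five or more points means the two sides are identical.

```python
from fractions import Fraction

def original(x):
    """Left side of Example 6."""
    return (2 * x**3 + 9 * x**2 + 4) / (x**2 * (x**2 + 4) * (x - 1))

def decomposition(x, A=-1, B=-1, C=-2, D=0, E=3):
    """Right side of Example 6 with the constants found above."""
    return A / x + B / x**2 + (C * x + D) / (x**2 + 4) + E / (x - 1)

# Six rational test points, none at the poles x = 0 and x = 1.
points = [Fraction(n, 3) for n in (-7, -2, 1, 4, 5, 11)]
all_match = all(original(x) == decomposition(x) for x in points)
print(all_match)
```

Changing any one constant (say A = 1) makes the comparison fail, so the check also catches sign slips.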

📊 General rules and integration

📊 Five-step procedure

  1. Check degree: if degree(P) ≥ degree(Q), divide first.
  2. Factor Q: split into linear and quadratic factors (possibly repeated).
  3. Set up partial fractions: write the expected form (constants over linear factors, linear terms over quadratics, extra terms for repeated factors).
  4. Find constants: use cover-up for linear factors, then match numerators for the rest.
  5. Integrate: each piece integrates separately (logarithms for linear factors, arctangents or logarithms for quadratics).

📊 Integration outcomes

  • A/(x - a) integrates to A ln|x - a| + C.
  • B/(x - a)² integrates to -B/(x - a) + C (no logarithm).
  • (Cx + D)/(x² + 1) splits into Cx/(x² + 1), which integrates to (C/2) ln(x² + 1), and D/(x² + 1), which integrates to D tan⁻¹(x).
  • For x² + x + 1, complete the square before integrating (as in Sections 7.2 and 7.3).
  • The excerpt emphasizes: "we never have to go higher than quadratics."

📊 Why the method always works

  • Every polynomial Q can be factored into linear and quadratic factors (over the reals).
  • The form of partial fractions matches this factorization.
  • Matching numerators is always possible (it's a system of linear equations).
  • The excerpt says "we could prove that this method always works" but instead shows it works in Example 6.

⚠️ Common pitfalls and tips

⚠️ Don't confuse single and repeated factors

  • A single (x - 5) contributes only A/(x - 5).
  • A repeated (x - 5)² contributes both A/(x - 5) and B/(x - 5)².
  • A triple (x - 5)³ would add C/(x - 5)³.
  • Example (Example 5): (2x + 3)/(x - 1)² needs both terms.

⚠️ Quadratic factors need linear numerators

  • Don't write A/(x² + 4); write (Cx + D)/(x² + 4).
  • The numerator must have degree one less than the denominator.
  • Example (Example 4): (2x + 4)/(x² + 1) is correct; 2/(x² + 1) would be incomplete.

⚠️ Cover-up is fastest for linear factors

  • The excerpt calls Method 1 (matching all numerators) "slow" and "an invitation to human error."
  • Cover-up (Method 2) is "quicker" and "the way to start, and usually the way to finish."
  • For repeated or quadratic factors, use cover-up first, then match numerators at the end.

⚠️ Practical limits

  • Factoring Q is often the hardest step; the excerpt notes "there is no magic way to find those factors."
  • Most examples give the factors; in practice, only simple polynomials can be factored by hand.
  • The excerpt author says "you should never have to do such a problem" (referring to Example 6) and "I never intend to do another one."
50

Improper Integrals

7.5 Improper Integrals

🧭 Overview

🧠 One-sentence thesis

Improper integrals—where limits or the function become infinite—can still converge to finite areas, and we decide convergence by comparing to known benchmark integrals like 1/x^p.

📌 Key points (3–5)

  • What "improper" means: the integral has infinite limits (b = ∞ or a = −∞) or the function y becomes infinite somewhere in the interval.
  • Finite area despite infinite region: just because the region extends infinitely doesn't guarantee infinite area; many improper integrals converge to finite values.
  • The p = 1 borderline: for 1/x^p from 1 to ∞, the area is finite when p > 1 but infinite when p ≤ 1; near x = 0, the area is finite when p < 1 but infinite when p ≥ 1.
  • Common confusion—two infinities: an integral like ∫(0 to ∞) dx/(x(ln x)²) can have trouble at both x = 0 and x = ∞, or at an interior point like x = 1 where ln x = 0.
  • Comparison test: if 0 ≤ u(x) ≤ v(x), then convergence of ∫v(x)dx implies convergence of ∫u(x)dx, and divergence of ∫u(x)dx implies divergence of ∫v(x)dx.

🔢 Three types of improper integrals

🔢 Upper limit b = ∞

Improper integral with infinite upper limit: ∫(a to ∞) y(x)dx = limit as b → ∞ of ∫(a to b) y(x)dx.

  • In practice, substitute the dangerous limit directly and watch what happens.
  • Example: ∫(1 to ∞) dx/x² = −1/x evaluated from 1 to ∞ = 1, because "1/∞ = 0."
  • Example: ∫(1 to ∞) dx/x = ln x from 1 to ∞ = ∞ (diverges).
  • The strict rule uses a limit: compute ∫(1 to b) dx/x², then let b approach infinity.

🔢 Lower limit a = −∞

Improper integral with infinite lower limit: ∫(−∞ to b) y(x)dx = limit as a → −∞ of ∫(a to b) y(x)dx.

  • Example: ∫(−∞ to 0) e^x dx = e^x from −∞ to 0 = 1, because "e^(−∞) = 0."
  • The exponential function e^x decays to zero as x → −∞, so the area is finite.

🔢 Function y becomes infinite

  • The function may blow up at one endpoint or inside the interval.
  • Example: ∫(0 to 1) dx/√x has y = 1/√x → ∞ as x → 0, but the area is 2√x from 0 to 1 = 2 (finite).
  • By contrast, ∫(0 to 1) dx/x = ln x from 0 to 1 = ∞ because, loosely speaking, "−ln 0 = ∞"; strictly, integrate from a near zero to 1, then let a → 0.

📏 The p = 1 borderline

📏 Going out to ∞: p > 1 converges

  • For ∫(1 to ∞) dx/x^p:
    • If p > 1, the integral equals 1/(p − 1) (finite).
    • If p ≤ 1, the integral diverges to ∞.
  • The borderline case p = 1 gives ∫(1 to ∞) dx/x = ln x = ∞ (diverges).
  • Example: p = 1.01 gives area = 1/(1.01 − 1) = 100; the region is infinite but the area is finite.

📏 Climbing the y-axis at x = 0: p < 1 converges

  • For ∫(0 to 1) dx/x^p:
    • If p < 1, the integral equals 1/(1 − p) (finite).
    • If p ≥ 1, the integral diverges to ∞.
  • Example: p = 1/2 (y = 1/√x) gives area = 2; p = 99/100 gives area = 100.
  • Don't confuse: the same function 1/x^p has opposite convergence behavior at 0 vs ∞.

| Region | Convergence condition | Divergence condition |
| --- | --- | --- |
| ∫(1 to ∞) dx/x^p | p > 1 | p ≤ 1 |
| ∫(0 to 1) dx/x^p | p < 1 | p ≥ 1 |
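
To see the p = 1 borderline numerically, the sketch below evaluates the exact antiderivative of 1/x^p between 1 and a growing upper limit b. The function name `area_1_to_b` is an illustrative choice, not from the excerpt.

```python
import math

def area_1_to_b(p, b):
    """Exact integral of 1/x**p from 1 to b (b > 1)."""
    if p == 1:
        return math.log(b)
    return (b ** (1 - p) - 1) / (1 - p)

# p < 1 and p = 1 keep growing with b; p > 1 levels off near 1/(p - 1).
for p in (0.5, 1.0, 2.0):
    print(p, [round(area_1_to_b(p, b), 4) for b in (10, 1e3, 1e6)])
```

The same experiment with lower limit a → 0 and upper limit 1 would show the opposite pattern, matching the second row of the table.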

📏 Narrower borderlines

  • Under 1/x, the area is infinite; dividing by ln x or (ln x)² creates a narrower borderline.
  • ∫(e to ∞) dx/(x ln x) = ln(ln x) from e to ∞ = ∞ (diverges, but very slowly).
  • ∫(e to ∞) dx/(x(ln x)²) = −1/ln x from e to ∞ = 1 (converges).
  • Warning: ∫(0 to ∞) dx/(x(ln x)²) has another infinity at x = 1, where ln x = 0; the area is infinite there.

🧪 Comparison test

🧪 How comparison works

Comparison test: If 0 ≤ u(x) ≤ v(x), then the area under u(x) is smaller than the area under v(x).

  • If ∫v(x)dx is finite, then ∫u(x)dx is finite (and smaller).
  • If ∫u(x)dx is infinite, then ∫v(x)dx is infinite (and larger).
  • The trick: construct a simple benchmark function (like 1/x^p) that stays on one side of the given function.

🧪 Examples of comparison

Example (converges by larger function): ∫(1 to ∞) dx/(x² + 4x) converges by comparison with ∫(1 to ∞) dx/x² = 1.

  • Removing 4x from the denominator increases the area, so the original integral is between 0 and 1.

Example (diverges by smaller function): ∫(1 to ∞) dx/√(x + 1) diverges by comparison with ∫(1 to ∞) dx/(2√x) = ∞.

  • Increasing the denominator decreases the area, but the smaller integral still diverges, so the original diverges.

Example (e^(−x²)): ∫(0 to ∞) e^(−x²)dx is below ∫(0 to 1) 1 dx + ∫(1 to ∞) e^(−x)dx = 1 + 1/e ≈ 1.37.

  • For x ≥ 1, e^(−x²) ≤ e^(−x), so the area is finite (actually equals √π/2).
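
The comparison only proves convergence; a numerical estimate shows where the value actually lands. The sketch below uses a midpoint rule (chosen here for simplicity, not taken from the excerpt) on the truncated range 0 to 10, with a hypothetical helper `midpoint_rule`.

```python
import math

def midpoint_rule(f, a, b, n):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Truncate at x = 10: since e^(-x^2) <= e^(-x) for x >= 1, the neglected
# tail is below the integral of e^(-x) from 10 to infinity, i.e. e^(-10).
estimate = midpoint_rule(lambda x: math.exp(-x * x), 0.0, 10.0, 2000)
print(estimate, math.sqrt(math.pi) / 2)
```

The two printed numbers should agree to many digits, comfortably below the comparison bound.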

Example (1/ln x): ∫(1 to e) dx/ln x is above ∫(1 to e) dx/(x ln x) = ln(ln x) from 1 to e = ∞ (the trouble is at x = 1, where ln x = 0).

  • The smaller integral diverges, so the original diverges.

🧪 When comparison fails

  • We don't get the exact area, only a decision about convergence.
  • Example: ∫(0 to ∞) e^(−x)/x dx diverges because e^(−x) is no help at x = 0; since e^(−x) ≥ 1/e on (0, 1], we compare with a multiple of ∫(0 to 1) dx/x = ∞.
  • Example: ∫(0 to ∞) x^50 e^(−x)dx = 50! (finite), but ∫(0 to ∞) x^(−1) e^(−x)dx = ∞; the factor e^(−x) overrides any power x^p as x → ∞, but not at x = 0.

🔄 Integrals from −∞ to +∞

🔄 Split into two parts

∫(−∞ to ∞) y(x)dx = ∫(−∞ to 0) y(x)dx + ∫(0 to ∞) y(x)dx, and each part must converge separately.

  • The limits at −∞ and +∞ are kept separate; we cannot accept ∞ − ∞ = 0.
  • Example: the bell-shaped curve y = e^(−x²) covers finite area (exactly √π); the separate areas left and right of zero are each (√π)/2.

🔄 Balancing regions don't cancel

  • Example: ∫(−∞ to ∞) x dx is not defined, even though ∫(−b to b) x dx = 0 for every b.
  • The area under y = x is +∞ on one side of zero and −∞ on the other; the two areas are not separately finite.
  • Example: ∫(−1 to 1) dx/x does not exist; the regions left and right of x = 0 are mirror images (1/x is odd), but each has infinite area.
  • Don't confuse: Cauchy's "principal value integral" would be zero, but the standard rules say no—∞ − ∞ is not zero.
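
A small computation shows why the two limits must be kept separate. Using the exact antiderivative x²/2, the sketch below (with a hypothetical helper name) compares symmetric limits −b to b against lopsided limits −b to 2b. Both pairs of limits go to −∞ and +∞, yet the answers disagree.

```python
def integral_of_x(a, b):
    """Exact integral of y = x from a to b: (b**2 - a**2) / 2."""
    return (b * b - a * a) / 2

for b in (10, 100, 1000):
    # Symmetric limits always give 0; lopsided limits give 3*b**2/2.
    print(b, integral_of_x(-b, b), integral_of_x(-b, 2 * b))
```

Because the result depends on how the endpoints run off to infinity, no single value can be assigned to the integral.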
51

Areas and Volumes by Slices

8.1 Areas and Volumes by Slices

🧭 Overview

🧠 One-sentence thesis

The integral method extends from finding areas under curves to computing volumes of three-dimensional solids by slicing them into thin pieces whose areas or volumes can be summed.

📌 Key points (3–5)

  • Core idea: Just as area under a curve is built from thin vertical strips of width dx and height v(x), volumes are built from thin slices of thickness dx (or dy) and cross-sectional area A(x).
  • Area between curves: Integrate "top minus bottom" over the intersection interval: integral of [v(x) − w(x)] dx.
  • Slices vs. shells for volumes: Slices perpendicular to the axis give integrals of cross-sectional area A(x) dx; shells parallel to the axis (for solids of revolution) give integrals of 2πxh dx.
  • Common confusion: For washers (disks with holes), the area is π(f² − g²) (π times outer radius squared minus inner radius squared), not π(f − g)².
  • Horizontal vs. vertical: You can slice a region vertically (x integrals) or horizontally (y integrals)—choose whichever makes the integral simpler.

📐 Areas between curves

📏 The basic formula

Area between two curves = integral from a to b of [v(x) − w(x)] dx, where v(x) is the upper curve and w(x) is the lower curve.

  • The strip height is the vertical distance from the lower curve up to the upper curve.
  • Width of each strip is dx (informally; rigorously it's the limit as Δx → 0).
  • Find intersection points by solving v(x) = w(x) to determine the limits a and b.

Example: Upper curve y = 6x, lower curve y = 3x². Intersections at x = 0 and x = 2. Area = integral from 0 to 2 of (6x − 3x²) dx = 3x² − x³ evaluated from 0 to 2 = 4.
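
The strip picture can be checked directly by adding up thin rectangles. The following is a minimal sketch, assuming a midpoint sample in each strip; the name `area_between` is illustrative, not from the excerpt.

```python
def area_between(top, bottom, a, b, n=10_000):
    """Sum n vertical strips of width dx and height top(x) - bottom(x)."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx          # middle of the i-th strip
        total += (top(x) - bottom(x)) * dx
    return total

# Upper curve y = 6x, lower curve y = 3x^2, intersections at x = 0 and x = 2.
area = area_between(lambda x: 6 * x, lambda x: 3 * x**2, 0.0, 2.0)
print(area)
```

The printed sum should sit very close to the exact area 4 found from the antiderivative.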

🔄 Choosing vertical or horizontal slices

  • Vertical slices (parallel to the y-axis) lead to integrals in x: integrate (top − bottom) dx.
  • Horizontal slices (parallel to the x-axis) lead to integrals in y: integrate (right − left) dy.
  • When to prefer horizontal: If vertical slicing requires multiple separate integrals (e.g., a parallelogram), horizontal slicing may give a single simpler integral.

Example: A unit parallelogram needs three separate x integrals if sliced vertically, but only one y integral (integral from 0 to 1 of 1 dy = 1) if sliced horizontally.

⚠️ Watch the geometry

  • The figure is essential: a region may loop back on itself, requiring you to split the integral or change the slicing direction.
  • Don't blindly integrate from one intersection to another—the "top" and "bottom" curves may switch, or the region may have parts that need separate treatment.

Example: The whole area between a circle and a 45° line has two intersection points at x = ±1/√2, but integrating from −1/√2 to +1/√2 misses the part of the circle that bulges out over itself on the left; that part needs strips of height 2v instead of v − w.

🧊 Volumes by slicing

🍞 The slicing principle

Volume = integral of (cross-sectional area A(x)) times thickness dx = integral of A(x) dx.

  • Each slice is a thin slab of thickness dx (or dy) with cross-sectional area A that depends on position.
  • For a cylinder (constant cross-section A), volume = A times height h.
  • For a pyramid or cone (cross-section tapers to zero), volume = (1/3) × base area × height—the factor 1/3 comes from integrating the quadratically shrinking area.

Example: A triangular pyramid with base area 6 and height h has side lengths that drop linearly to zero, so area A(x) = 6(1 − x/h)². Integrating from 0 to h gives volume = 2h (one-third of the cylinder volume 6h).

🌐 Solids of revolution: disks and washers

When you rotate a curve y = f(x) around the x-axis, every cross-section is a circle.

  • Disk method: Area of a full disk is π y² = π [f(x)]². Volume = integral of π [f(x)]² dx.
  • Washer method (disk with a hole): Outer radius f, inner radius g. Area = π f² − π g² (not π (f − g)²!). Volume = integral of π (f² − g²) dx.

Example: Rotating y = √x from x = 0 to x = 2 around the x-axis gives a "headlight" with volume = integral from 0 to 2 of π x dx = π x²/2 evaluated = 2π.

Example (washer): Same √x curve but with a hole of radius 1 down the center. From x = 1 to x = 2, each slice is a washer with outer radius √x and inner radius 1. Area = π(√x)² − π(1)² = π(x − 1). Volume = integral from 1 to 2 of π(x − 1) dx = π/2.
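
Both examples can be reproduced by summing thin washers numerically. This is a sketch under stated assumptions: the function name and its `inner` parameter are hypothetical, and a disk is treated as a washer whose inner radius is zero.

```python
import math

def volume_of_revolution(outer, a, b, inner=lambda x: 0.0, n=10_000):
    """Sum washers of area pi*(outer^2 - inner^2) and thickness dx (x-axis)."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        total += math.pi * (outer(x) ** 2 - inner(x) ** 2) * dx
    return total

headlight = volume_of_revolution(math.sqrt, 0.0, 2.0)                # disks
washer = volume_of_revolution(math.sqrt, 1.0, 2.0, lambda x: 1.0)   # hole of radius 1
print(headlight / math.pi, washer / math.pi)
```

Dividing by π makes the comparison with 2π and π/2 easy to read. Note that the code subtracts squared radii, never the square of the difference.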

🔁 Horizontal slices for y-axis problems

  • If you rotate around the y-axis or if the solid is easier to describe in y, slice horizontally.
  • Cross-sectional area is now A(y), and you integrate A(y) dy.

Example: A half-sphere of radius R sliced horizontally has circular cross-sections of radius r where y² + r² = R². Area A(y) = π r² = π(R² − y²). Volume = integral from 0 to R of π(R² − y²) dy = (2/3)π R³.

🥫 Volumes by cylindrical shells

🛢️ The shell method

Instead of slicing perpendicular to the axis, cut the solid into thin cylindrical shells parallel to the axis.

Shell volume = 2π x (circumference) × h (height) × dx (thickness) = 2π x h dx.

  • A shell at radius x with thickness dx is like an outer cylinder minus an inner cylinder; the dominant term is 2π x h dx.
  • Total volume = integral of 2π x h dx.

When to use shells: Often simpler when rotating around the y-axis, because the shell height is directly y = f(x) and you integrate in x without needing to solve for x = f⁻¹(y).

🆚 Shells vs. slices: which to choose?

| Method | Best for | What you integrate | What you need |
| --- | --- | --- | --- |
| Slices (disks/washers) | Rotation around x-axis | π y² dx or π(f² − g²) dx | Radius as a function of x |
| Shells | Rotation around y-axis | 2π x h dx | Height h = f(x) directly |

  • Normal choice: Slices through the x-axis, shells around the y-axis.
  • If you use slices for y-axis rotation, you need x = f⁻¹(y), which may be hard to find or integrate.

Example: Rotating y = cos x around the y-axis:

  • Good (shells): integral of 2π x cos x dx.
  • Bad (slices): integral of π [cos⁻¹(y)]² dy—requires solving for x in terms of y.

🧪 Shell examples

Example (cone): A cone of base radius r and height b, sliced into shells. Shell height h = b − (b/r)x. Volume = integral from 0 to r of 2π x [b − (b/r)x] dx = (1/3)π r² b.

Example (sphere with hole): Bore a hole of radius a through a sphere of radius b. Shells start at x = a. Shell height h = 2√(b² − x²). Volume = integral from a to b of 2π x · 2√(b² − x²) dx. Substitute u = b² − x² (so du = −2x dx) to get −2π integral of √u du = −2π (2/3)u^(3/2) = (4/3)π (b² − a²)^(3/2).

Example (paraboloid): Rotate y = x² around the y-axis from x = 0 to x = √2 (up to y = 2). Shell height h = 2 − x². Volume = integral from 0 to √2 of 2π x (2 − x²) dx = 2π [x² − x⁴/4] evaluated = 2π.
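
The shell integrals above can be approximated by adding up shell volumes 2π x h dx. The sketch below is illustrative only: `shell_volume` is a hypothetical helper, and the cone dimensions r = 3 and b = 4 are sample values chosen here, not from the excerpt.

```python
import math

def shell_volume(height, a, b, n=10_000):
    """Sum cylindrical shells 2*pi*x*h(x)*dx for rotation about the y-axis."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx          # middle radius of the i-th shell
        total += 2 * math.pi * x * height(x) * dx
    return total

# Paraboloid: shell height 2 - x^2 for x from 0 to sqrt(2).
paraboloid = shell_volume(lambda x: 2 - x**2, 0.0, math.sqrt(2))

# Cone with sample base radius r = 3 and height b = 4 (assumed values).
r, cone_height = 3.0, 4.0
cone = shell_volume(lambda x: cone_height - (cone_height / r) * x, 0.0, r)
print(paraboloid / math.pi, cone / math.pi)
```

The two ratios should come out near 2 and near r² b / 3 = 12, in line with the formulas 2π and (1/3)π r² b.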

📋 Summary table of methods

| Problem | Slice/Shell | Area/Volume element | Integral |
| --- | --- | --- | --- |
| Area between curves (vertical) | Vertical strips | (v − w) dx | ∫(v − w) dx |
| Area between curves (horizontal) | Horizontal strips | (right − left) dy | ∫(right − left) dy |
| General solid (x slices) | Slices ⊥ x-axis | A(x) dx | ∫A(x) dx |
| General solid (y slices) | Slices ⊥ y-axis | A(y) dy | ∫A(y) dy |
| Revolution around x-axis (disk) | Disks | π y² dx | ∫π [f(x)]² dx |
| Revolution around x-axis (washer) | Washers | π(f² − g²) dx | ∫π(f² − g²) dx |
| Revolution around y-axis (shell) | Shells | 2π x h dx | ∫2π x f(x) dx |

🎯 Key takeaway

  • What to integrate (setting up the correct area or volume element) is more important than how to integrate (the mechanics).
  • Always draw the figure to identify the correct limits, the top/bottom or outer/inner boundaries, and whether vertical or horizontal (or shells) is simpler.
52

Length of a Plane Curve

8.2 Length of a Plane Curve

🧭 Overview

🧠 One-sentence thesis

Arc length of a curve can be computed by integrating infinitesimal straight-line distances along the curve, either from a function y = f(x) or from parametric equations x(t) and y(t).

📌 Key points (3–5)

  • Core idea: Break a smooth curve into tiny straight pieces, find each piece's length using the Pythagorean theorem, then integrate.
  • Non-parametric formula: For y = f(x), arc length equals the integral of the square root of (1 + (dy/dx) squared) with respect to x.
  • Parametric formula: For x(t) and y(t), arc length equals the integral of the square root of ((dx/dt) squared + (dy/dt) squared) with respect to t.
  • Common confusion: Parametric equations are more general—they allow closed curves and self-crossing paths, while y = f(x) gives only one y per x.
  • Practical challenge: Most arc length integrals cannot be solved in closed form and require numerical integration.

📐 The fundamental setup

📐 Breaking curves into straight pieces

  • A smooth curve is nearly straight over very small distances.
  • For a short segment:
    • Horizontal change: delta x
    • Vertical change: delta y
    • Straight-line distance: (delta s) squared = (delta x) squared + (delta y) squared
  • This is the Pythagorean theorem applied to each tiny piece.

🔍 From finite to infinitesimal

  • The slope dy/dx relates vertical to horizontal change: delta y ≈ (dy/dx) times delta x
  • Substituting gives: delta s ≈ square root of (1 + (dy/dx) squared) times delta x
  • In the limit, this becomes ds = square root of (1 + (dy/dx) squared) dx
  • Don't confuse: ds is not the same as dx; it's the actual distance along the curve, not just horizontal distance.

📏 Non-parametric arc length

📏 The standard formula

Arc length formula: For y = f(x) from x = a to x = b, the length is s = ∫(a to b) √(1 + (f'(x))²) dx.

  • This assumes the curve can be written as y = f(x) with continuous derivative.
  • The square root makes most integrals difficult or impossible to solve exactly.

🧮 Example: y = x to the three-halves power

  • From x = 0 to x = 4
  • dy/dx = (3/2) times x to the one-half power
  • Length integral: square root of (1 + (9/4)x) dx from 0 to 4
  • Result: (8/27)(10^(3/2) − 1) ≈ 9.07, compared to straight-line distance of square root of 80 ≈ 8.94
  • The curve is surprisingly close to straight.
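
The "tiny straight pieces" idea translates directly into code: sum the lengths of many short chords. The sketch below is an illustration with a hypothetical helper `polyline_length`; the closed-form value it is compared with comes from the antiderivative (8/27)(1 + 9x/4)^(3/2).

```python
import math

def polyline_length(f, a, b, n=100_000):
    """Add up chord lengths sqrt(dx^2 + dy^2) along y = f(x) from a to b."""
    dx = (b - a) / n
    total = 0.0
    x0, y0 = a, f(a)
    for i in range(1, n + 1):
        x1 = a + i * dx
        y1 = f(x1)
        total += math.hypot(x1 - x0, y1 - y0)   # Pythagoras on one piece
        x0, y0 = x1, y1
    return total

approx = polyline_length(lambda x: x ** 1.5, 0.0, 4.0)
exact = (8 / 27) * (10 ** 1.5 - 1)
print(approx, exact, math.sqrt(80))
```

The chord sum and the exact integral agree closely, and both sit only slightly above the straight-line distance √80.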

⭕ Example: Quarter circle

  • Curve: y = square root of (1 - x squared) from x = 0 to x = 1
  • dy/dx = -x / square root of (1 - x squared)
  • After simplification: integral of 1 / square root of (1 - x squared) dx
  • Antiderivative is arcsin(x), giving π/2 at x = 1
  • This is exactly one-quarter of the full circumference 2π.

🚫 Example: Ellipse—no closed form

  • For the ellipse y squared + 2x squared = 2
  • The length integral cannot be evaluated in closed form.
  • Must use numerical integration (trapezoidal rule, Simpson's rule, or midpoint rule).
  • This illustrates a common limitation: even simple-looking curves may have intractable length integrals.

🔄 Parametric arc length

🔄 Why parametric equations

  • Parametric form: x = x(t), y = y(t) where t is a parameter (often time).
  • Advantages over y = f(x):
    • Can represent closed curves (circles, ellipses)
    • Can handle self-crossing paths
    • Encodes both position and timing/speed
  • Example: The unit circle is x = cos(t), y = sin(t), which automatically satisfies x squared + y squared = 1.

🔄 The parametric formula

Parametric arc length: The length is s = ∫ √((dx/dt)² + (dy/dt)²) dt.

  • Each infinitesimal piece has (ds) squared = (dx) squared + (dy) squared
  • Approximate: delta x ≈ (dx/dt) times delta t and delta y ≈ (dy/dt) times delta t
  • Result: ds = square root of ((dx/dt) squared + (dy/dt) squared) dt

⭕ Example: Circle revisited

  • x = cos(t), y = sin(t) from t = 0 to t = π/2
  • dx/dt = -sin(t), dy/dt = cos(t)
  • Square root of (sin squared + cos squared) = 1
  • Length = integral of 1 dt = π/2
  • Simpler than the non-parametric version (no 1/square root of (1 - x squared)).

🔄 Connecting the two forms

  • The non-parametric form y = f(x) is a special case of parametric equations.
  • Set x = t and y = f(t); then dx/dt = 1.
  • The parametric formula reduces to the non-parametric formula.
  • Example: x = t, y = t to the three-halves power is the same curve as y = x to the three-halves power.

🏃 Speed and motion

🏃 Defining speed

Speed: ds/dt = √((dx/dt)² + (dy/dt)²).

  • Speed is the rate of distance traveled along the curve, not just horizontal or vertical rate.
  • Example: A ball thrown upward has dx/dt = 0 but speed = absolute value of dy/dt (positive both up and down).
  • Don't confuse: Speed is always non-negative; velocity components dx/dt and dy/dt can be negative.

🔄 Same curve, different speeds

  • Example: x = t squared, y = t cubed versus x = t, y = t to the three-halves power
  • Both trace the same geometric path (y = x to the three-halves power).
  • Different parametrizations mean different speeds along the same curve.
  • The total arc length is the same, but the relationship between t and position differs.
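
That the length does not depend on the parametrization can be checked numerically. The sketch below (hypothetical helper `parametric_length`) measures the same path from (0, 0) to (4, 8) twice: once with x = t, y = t^(3/2) for 0 ≤ t ≤ 4, and once with x = t², y = t³ for 0 ≤ t ≤ 2. The t-ranges are chosen here so both end at the same point.

```python
import math

def parametric_length(x, y, t0, t1, n=100_000):
    """Add up chord lengths between successive points (x(t), y(t))."""
    dt = (t1 - t0) / n
    total = 0.0
    px, py = x(t0), y(t0)
    for i in range(1, n + 1):
        t = t0 + i * dt
        qx, qy = x(t), y(t)
        total += math.hypot(qx - px, qy - py)
        px, py = qx, qy
    return total

slow = parametric_length(lambda t: t, lambda t: t ** 1.5, 0.0, 4.0)
fast = parametric_length(lambda t: t * t, lambda t: t ** 3, 0.0, 2.0)
print(slow, fast)
```

The second version covers the curve in half the parameter range, so its speed ds/dt differs, but the two lengths should match.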

⚠️ Practical considerations

⚠️ Integration challenges

| Curve type | Solvability | Method |
| --- | --- | --- |
| Straight lines | Exact | Direct integration |
| Circles (parametric) | Exact | Simplifies to constant |
| Most polynomials | Difficult | Numerical integration |
| Ellipses | Impossible in closed form | Numerical methods required |

⚠️ Numerical methods

  • The excerpt mentions trapezoidal rule, Simpson's rule, and midpoint rule.
  • Calculators often automatically substitute to improve convergence.
  • Example: Quarter-ellipse requires thousands of intervals for accurate midpoint rule result (≈1.91).
  • Blind application can fail when denominators approach zero at endpoints.

⚠️ Common pitfall: Staircase paradox

  • A staircase of horizontal and vertical segments can approximate a diagonal line arbitrarily closely.
  • Yet the staircase length stays constant (sum of horizontal + vertical distances).
  • The diagonal has length square root of 2, but 100 tiny stair-steps still have length 2.
  • Lesson: Closeness in shape does not guarantee closeness in length.
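
The paradox is easy to reproduce. The sketch below assumes the diagonal runs from (0, 0) to (1, 1) and that the staircase uses equal steps; the function name is illustrative.

```python
import math

def staircase_length(steps):
    """Length of a staircase from (0, 0) to (1, 1) made of equal steps."""
    run = rise = 1.0 / steps
    return steps * (run + rise)     # every step adds one run and one rise

for steps in (1, 10, 100, 10_000):
    print(steps, staircase_length(steps))
print("diagonal:", math.sqrt(2))
```

However fine the steps, the total stays at 2, because the slopes of the stair pieces (0 and vertical) never approach the slope of the diagonal. Arc length needs the chords of the curve itself, not just nearby points.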
53

Area of a Surface of Revolution

8.3 Area of a Surface of Revolution

🧭 Overview

🧠 One-sentence thesis

When a curve is revolved around an axis, the surface area can be computed by summing the areas of thin bands formed by revolving short straight line segments, leading to integrals that depend on the curve's slope and the radius of revolution.

📌 Key points (3–5)

  • Core idea: Revolve short straight pieces (length ds) instead of the curve itself; each piece produces a thin band whose area is 2π r ds.
  • Two formulas: Revolution around the x-axis uses radius r = y; revolution around the y-axis uses radius r = x.
  • Band area formula: Surface area of one band = (side length s) × (middle circumference 2π r).
  • Common confusion: The radius r changes depending on the axis of revolution—don't mix up r = y (x-axis) with r = x (y-axis).
  • Parametric form: When the curve is given as x(t), y(t), express ds in terms of dt using the square root of (dx/dt)² + (dy/dt)².

🔄 How surfaces of revolution are built

🔄 Revolving a curve around an axis

A surface of revolution is produced by revolving a curve y = f(x) around an axis, creating a symmetric surface.

  • Examples of surfaces: revolving a sloping line → cone; revolving a line parallel to the axis → cylinder (pipe); revolving a curve → lamp shade or light bulb.
  • The excerpt notes that Section 8.1 computed the volume inside such surfaces; this section computes the surface area.

🧩 Key approximation strategy

  • Instead of trying to measure the curved surface directly, revolve short straight line segments with slope Δy/Δx.
  • Each straight piece (length Δs) produces a thin band when revolved.
  • The curved surface is approximated by the sum of these bands.
  • Don't confuse: We are not cutting the surface into tiny flat patches (which would require a double integral dx dy); surfaces of revolution are special and can be cut into bands that go all the way around, keeping the integral one-dimensional.

📏 Band area and the fundamental formula

📏 Area of a single band

The surface area of a band is 2π r s, where r is the radius of the circle traced by the center of the piece and s is the length of the straight piece.

  • The band is a slice of a cone.
  • When flattened out, its area = (side length s) × (middle circumference 2π r).
  • For a small piece: s = √(1 + (Δy/Δx)²) Δx (the arc length element).

📐 The two main formulas

Revolution around the x-axis (radius r = y):

S = integral from a to b of 2π y √(1 + (dy/dx)²) dx

Revolution around the y-axis (radius r = x):

S = integral from a to b of 2π x √(1 + (dy/dx)²) dx

| Axis of revolution | Radius r | Formula |
| --- | --- | --- |
| x-axis | y = f(x) | 2π y √(1 + (dy/dx)²) dx |
| y-axis | x | 2π x √(1 + (dy/dx)²) dx |

  • The key difference: which coordinate (x or y) serves as the radius.
  • Example: Revolving y = 2x from x = 0 to x = 1 around the x-axis gives area 2π√5 (radius r = y = 2x at midpoint is 1, length s = √5).
  • Example: The same line segment revolved around the y-axis gives area π√5 (radius r = x, which is half of y).

🎯 Worked examples

🎯 Sphere from a semicircle

  • Revolve the semicircle y = √(R² - x²) around the x-axis (limits x = -R to x = R).
  • The slope is dy/dx = -x/√(R² - x²).
  • Plugging into the formula: 1 + (dy/dx)² simplifies to R²/(R² - x²), so √(1 + (dy/dx)²) = R/√(R² - x²).
  • The integral becomes: 2π ∫ √(R² - x²) · (R/√(R² - x²)) dx = 2π ∫ R dx = 2π R · 2R = 4π R².
  • This matches the known formula for the surface area of a sphere.

🎯 Cone from a sloping line

  • Revolve y = 2x from x = 0 to x = 1 around the x-axis.
  • (dy/dx)² = 4, so √(1 + 4) = √5.
  • S = ∫ 2π (2x) √5 dx from 0 to 1 = 2π√5.
  • Check: The line from (0,0) to (1,2) has length √5; its midpoint is (1/2, 1); middle radius r = 1; area = 2π · 1 · √5 = 2π√5. ✓

🎯 Doughnut from a circle

  • The parametric curve x = cos t, y = 5 + sin t traces a circle with center at (0,5).
  • Revolving around the x-axis produces a doughnut (torus).
  • (dx/dt)² + (dy/dt)² = sin² t + cos² t = 1, so ds = dt.
  • S = ∫ from 0 to 2π of 2π (5 + sin t) · 1 dt = 2π [5t - cos t] from 0 to 2π = 2π · 10π = 20π².
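
The band formula 2π r s can be applied literally: break the curve into short straight pieces, take each piece's length s and middle radius r, and add. The sketch below is a hedged illustration with a hypothetical function name; it handles revolution about the x-axis only.

```python
import math

def surface_area_about_x_axis(x, y, t0, t1, n=100_000):
    """Sum bands 2*pi*r*s for the curve (x(t), y(t)) revolved about the x-axis."""
    dt = (t1 - t0) / n
    total = 0.0
    px, py = x(t0), y(t0)
    for i in range(1, n + 1):
        t = t0 + i * dt
        qx, qy = x(t), y(t)
        s = math.hypot(qx - px, qy - py)   # side length of the band
        r = (py + qy) / 2                  # radius at the middle of the piece
        total += 2 * math.pi * r * s
        px, py = qx, qy
    return total

torus = surface_area_about_x_axis(math.cos, lambda t: 5 + math.sin(t),
                                  0.0, 2 * math.pi)
cone = surface_area_about_x_axis(lambda t: t, lambda t: 2 * t, 0.0, 1.0)
print(torus, 20 * math.pi**2)
print(cone, 2 * math.pi * math.sqrt(5))
```

Each printed pair should agree closely, matching the doughnut and cone results worked out above.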

🔢 Parametric form

🔢 When the curve is given as x(t), y(t)

The surface area formula becomes: ∫ 2π y(t) √((dx/dt)² + (dy/dt)²) dt.

  • The length element ds is expressed in terms of t: (ds)² = (dx)² + (dy)² becomes ds = √((dx/dt)² + (dy/dt)²) dt.
  • For revolution around the x-axis, use radius r = y(t).
  • For revolution around the y-axis, use radius r = x(t).
  • Example: x = 2t, y = t² gives dx/dt = 2, dy/dt = 2t, so ds = √(4 + 4t²) dt.

🔢 Why parametric form is useful

  • Some curves cannot be written as y = f(x) (e.g., circles, ellipses).
  • Parametric equations allow us to handle a wider variety of curves.
  • The same length formula (ds)² = (dx)² + (dy)² applies, just rewritten in terms of the parameter t.
54

Probability and Calculus

8.4 Probability and Calculus

🧭 Overview

🧠 One-sentence thesis

Calculus extends probability from discrete counting to continuous distributions, where outcomes fall in ranges and are described by probability densities that integrate to give probabilities and expected values.

📌 Key points (3–5)

  • Discrete vs. continuous probability: Discrete uses counting and lists of outcomes; continuous uses calculus and probability densities p(x) over intervals.
  • Probability density p(x): The chance that X falls between a and b is the integral of p(x) from a to b; p(x) itself is not a probability but a density.
  • Mean and variance: The mean μ (expected value) is the integral of x·p(x); the variance σ² measures spread around the mean; standard deviation σ is the square root of variance.
  • Common confusion: Don't confuse p(x) with probability—p(x) dx is the probability of falling in a small interval dx; the actual probability of hitting exactly x is zero in continuous models.
  • Central Limit Theorem: Averaging N independent samples produces a distribution that approaches normal (bell-shaped) with mean μ and variance σ²/N, regardless of the original distribution.

🎲 Discrete vs. continuous probability

🎲 Discrete random variables

A discrete random variable X has a list of possible values, each with a known probability pₙ.

  • Examples: number of coin tosses until heads (X = 1, 2, 3, …); number of errors on a quiz; sum of two dice (X = 2, 3, …, 12).
  • Each outcome n has probability pₙ, and the sum of all probabilities is necessarily 1: p₁ + p₂ + p₃ + … = 1.
  • No calculus needed: you count outcomes and add probabilities.

🌊 Continuous random variables

A continuous random variable X can fall anywhere in an interval; outcomes are described by a probability density p(x).

  • Examples: lifetime of a VCR (X ≥ 0); SAT score (200 ≤ X ≤ 800); fraction of voters in a poll (0 ≤ X ≤ 1).
  • The probability of hitting exactly one value is zero; instead, we ask for the probability that X falls in a range.
  • Calculus enters: probabilities are found by integrating the density p(x).

🔍 How to distinguish

  • Discrete: outcomes are countable (whole numbers, specific events); use sums Σ pₙ.
  • Continuous: outcomes fill an interval; use integrals ∫ p(x) dx.
  • Example: The number of quiz errors is discrete (0, 1, 2, …); the time until breakdown is continuous (any positive real number).

📐 Probability density and integration

📐 The density function p(x)

The probability density p(x) satisfies: Prob{a ≤ X ≤ b} = integral from a to b of p(x) dx.

  • p(x) itself is not a probability; it is a density (probability per unit length).
  • Roughly, p(x) dx is the chance of falling between x and x + dx.
  • p(x) ≥ 0 everywhere, and the total integral from −∞ to +∞ equals 1: ∫₋∞^∞ p(x) dx = 1.

🔔 Important continuous distributions

Distribution | Density p(x) | Mean μ | Variance σ² | Application
--- | --- | --- | --- | ---
Exponential | a·e^(−ax) for x ≥ 0 | 1/a | 1/a² | Waiting time, breakdown time
Normal (Gaussian) | (1/(σ√(2π)))·e^(−(x−μ)²/(2σ²)) | μ | σ² | Distribution around mean, bell curve
Uniform | 1 for 0 ≤ x ≤ 1 | 1/2 | 1/12 | Random choice in interval

🧮 Example: Exponential distribution

  • VCR lifetime has average 4 years: p(x) = (1/4)·e^(−x/4) for x ≥ 0.
  • Probability of breakdown within 12 years: integral from 0 to 12 of (1/4)·e^(−x/4) dx = 1 − e^(−3) ≈ 0.95.
  • The integral from 0 to ∞ is 1, confirming eventual breakdown is certain.
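
The 12-year breakdown probability can be confirmed two ways: from the closed form 1 − e^(−ax) and by integrating the density directly. This is a minimal sketch; `exponential_cdf` and `integrate_midpoint` are names invented here.

```python
import math

def exponential_cdf(x, a):
    """Prob{0 <= X <= x} for the density p(x) = a*exp(-a*x), x >= 0."""
    return 1.0 - math.exp(-a * x) if x > 0 else 0.0

def integrate_midpoint(f, lo, hi, n=100_000):
    """Midpoint-rule estimate of the integral of f from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a = 0.25                                   # mean lifetime 1/a = 4 years
closed_form = exponential_cdf(12, a)       # 1 - e^(-3)
numeric = integrate_midpoint(lambda x: a * math.exp(-a * x), 0.0, 12.0)
print(closed_form, numeric)
```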

🔔 Example: Normal (bell-shaped) distribution

  • SAT scores with mean 500 and standard deviation 200: p(x) = (1/(200√(2π)))·e^(−(x−500)²/(2·200²)).
  • The graph is symmetric around x = 500; 68% of outcomes fall within one standard deviation (300 to 700); 95% fall within two standard deviations (100 to 900).
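  • The density extends from −∞ to +∞, so the model puts some probability outside [200, 800]; with σ = 200, a score of 800 is 1.5 standard deviations above the mean, and the chance of exceeding it is about 0.067.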
  • Don't confuse: The normal distribution is defined by two parameters: μ (center) and σ (spread).

📊 Cumulative distribution function F(x)

F(x) = integral from −∞ to x of p(x) dx = Prob{X ≤ x}.

  • F(x) accumulates probabilities up to x; dF/dx = p(x).
  • F(−∞) = 0 and F(+∞) = 1.
  • Example: For the normal distribution, F(μ) = 1/2 (50% chance of being below the mean).
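
For the normal distribution F(x) has no elementary formula, but it can be written with the error function. The sketch below uses Python's `math.erf` to reproduce the 68% and 95% figures for the SAT example (μ = 500, σ = 200); the helper names are illustrative.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = Prob{X <= x} for a normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mu, sigma):
    """Prob{a <= X <= b} = F(b) - F(a)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

mu, sigma = 500.0, 200.0
within_one = prob_between(300, 700, mu, sigma)   # one standard deviation
within_two = prob_between(100, 900, mu, sigma)   # two standard deviations
print(within_one, within_two, normal_cdf(mu, mu, sigma))
```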

🎯 Mean, variance, and standard deviation

🎯 Mean (expected value) μ

The mean μ is the average outcome, weighted by probability.

  • Discrete: μ = Σ n·pₙ (sum of outcome times probability).
  • Continuous: μ = ∫₋∞^∞ x·p(x) dx.
  • Example (coin tosses until heads): μ = 1·(1/2) + 2·(1/4) + 3·(1/8) + … = 2.
  • Example (exponential with a = 1/4): μ = ∫₀^∞ x·(1/4)·e^(−x/4) dx = 4 (integration by parts).

📏 Variance σ² and standard deviation σ

Variance σ² measures the spread around the mean; it is the expected value of (X − μ)².

  • Discrete: σ² = Σ (n − μ)²·pₙ.
  • Continuous: σ² = ∫₋∞^∞ (x − μ)²·p(x) dx.
  • Standard deviation σ is the square root of variance; it has the same units as X.
  • Alternative formula: σ² = ∫ x²·p(x) dx − μ² (easier to compute).

🧮 Example: Yes-no poll (one person)

  • Fraction p = 1/3 thinks yes (X = 1), fraction 1 − p = 2/3 thinks no (X = 0).
  • Mean: μ = 0·(2/3) + 1·(1/3) = 1/3.
  • Variance: σ² = (0 − 1/3)²·(2/3) + (1 − 1/3)²·(1/3) = 2/9.
  • Standard deviation: σ = √(2/9).
  • Interpretation: When p is near 0 or 1, the spread is smaller (more predictable); maximum variance is at p = 1/2.
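
The discrete formulas μ = Σ n·pₙ and σ² = Σ (n − μ)²·pₙ translate directly into code. This sketch uses exact fractions so the poll values 1/3 and 2/9 come out exactly; the function name `mean_and_variance` is an assumption for illustration.

```python
from fractions import Fraction

def mean_and_variance(outcomes):
    """outcomes: list of (value, probability) pairs whose probabilities sum to 1."""
    mu = sum(x * p for x, p in outcomes)
    var = sum((x - mu) ** 2 * p for x, p in outcomes)
    return mu, var

# One-person poll: X = 1 (yes) with probability 1/3, X = 0 (no) with probability 2/3.
poll = [(0, Fraction(2, 3)), (1, Fraction(1, 3))]
mu, var = mean_and_variance(poll)
print(mu, var)

# The variance p(1 - p) is largest for an evenly split population.
_, var_half = mean_and_variance([(0, Fraction(1, 2)), (1, Fraction(1, 2))])
print(var_half)
```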

📋 Summary table

Model | Mean μ | Variance σ²
--- | --- | ---
Yes-no (one person) | p | p(1 − p)
Poisson (pₙ = λⁿ·e^(−λ)/n!) | λ | λ
Exponential (p(x) = a·e^(−ax)) | 1/a | 1/a²
Normal | μ (built-in parameter) | σ² (built-in parameter)

🎲 Important discrete models

🎲 Poisson distribution

The Poisson model describes the probability of n random, independent occurrences when the average is λ: pₙ = (λⁿ/n!)·e^(−λ).

  • Example: A student makes an average of 2 errors per exam (λ = 2).
    • Probability of 0 errors: p₀ = (2⁰/0!)·e^(−2) = e^(−2) ≈ 0.135.
    • Probability of 1 error: p₁ = (2¹/1!)·e^(−2) = 2e^(−2) ≈ 0.27.
    • Probability of 2 errors: p₂ = (2²/2!)·e^(−2) = 2e^(−2) ≈ 0.27.
  • The sum of all probabilities is 1, from the series for e^λ: Σ (λⁿ/n!) = e^λ, so Σ pₙ = e^λ·e^(−λ) = 1.
  • Mean and variance are both λ.
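
A short sketch of the Poisson probabilities for λ = 2. The series is truncated at 60 terms, which is an arbitrary cutoff chosen here because the remaining terms are negligibly small.

```python
import math

def poisson_pmf(n, lam):
    """p_n = lam^n / n! * e^(-lam): probability of exactly n occurrences."""
    return lam ** n / math.factorial(n) * math.exp(-lam)

lam = 2.0
probs = [poisson_pmf(n, lam) for n in range(60)]   # truncated series
total = sum(probs)
mean = sum(n * p for n, p in enumerate(probs))
variance = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
print(probs[:3], total, mean, variance)
```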

✈️ Example: Airline overbooking

  • On average, 3 out of 100 passengers don't show up (λ = 3).
  • Plane holds 98 passengers; airline books 100.
  • Someone is bumped if there are 0 or 1 no-shows: Prob = p₀ + p₁ = e^(−3) + 3e^(−3) = 4e^(−3) ≈ 4/20 = 0.2.

🪙 Example: Coin tosses until heads

  • Probabilities: p₁ = 1/2, p₂ = 1/4, p₃ = 1/8, … (pₙ = (1/2)ⁿ).
  • Mean number of tosses: μ = 1·(1/2) + 2·(1/4) + 3·(1/8) + … = 2.
  • The sum of probabilities is 1/2 + 1/4 + 1/8 + … = 1.

📊 Law of Averages and Central Limit Theorem

📊 Averaging N samples

  • Repeat an experiment N times; each produces outcome X.
  • The average outcome is X̄ = (X₁ + X₂ + … + X_N)/N (called "X bar").

📈 Law of Averages

As N → ∞, the average X̄ is almost sure to approach the mean μ.

  • "Almost sure" means the chance of X̄ not approaching μ is zero (it can happen, but it won't).
  • Example: Toss a coin many times; the fraction of heads will approach 1/2.
  • Common confusion: The Law of Averages does not say that outcomes "even up" or that tails become more likely after many heads. Each toss is independent; past results do not affect future probabilities.

🔔 Central Limit Theorem

No matter what the original distribution p(x), the distribution of the average X̄ from N samples approaches a normal (bell-shaped) distribution with mean μ and variance σ²/N.

  • The standard deviation of X̄ is σ/√N (shrinks as N increases).
  • With 95% confidence, X̄ falls within two standard deviations of μ: μ − 2σ/√N to μ + 2σ/√N.
  • This holds even if the original distribution is not normal.
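
A small simulation can make the σ/√N shrinkage concrete. The sketch below averages N = 25 uniform samples (σ² = 1/12) many times and compares the observed spread of X̄ with σ/√N. The seed, sample sizes, and function name are arbitrary choices for this illustration, and a simulation only suggests the theorem rather than proving it.

```python
import math
import random
import statistics

def sample_means(n_per_average, trials, seed=0):
    """Return `trials` averages, each over n_per_average uniform(0, 1) samples."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    return [
        sum(rng.random() for _ in range(n_per_average)) / n_per_average
        for _ in range(trials)
    ]

N = 25
means = sample_means(N, trials=10_000)
observed_mean = statistics.fmean(means)
observed_sd = statistics.pstdev(means)
predicted_sd = math.sqrt(1 / 12) / math.sqrt(N)   # sigma / sqrt(N)
print(observed_mean, observed_sd, predicted_sd)
```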

🗳️ Example: Yes-no poll of 2500 voters

  • One voter has mean μ = p (unknown true fraction) and variance σ² ≤ 1/4 (maximum when p = 1/2).
  • For N = 2500 voters, standard deviation of X̄ is at most (1/2)/√2500 = 1%.
  • Poll result: X̄ = 53%.
  • With 95% confidence, X̄ is within 2% of the true mean μ, so 51% ≤ μ ≤ 55%.
  • Conclusion: The poll is conclusive; the true fraction is very likely above 50%.
  • Error margin: For any poll of N voters, the margin is approximately 1/√N.

🏈 Example: Elevator safety

  • 16 football players, average weight μ = 210 pounds, standard deviation σ = 30 pounds.
  • Elevator capacity: 3600 pounds (average of 225 pounds per person).
  • Average weight X̄ is approximately normal with mean 210 and standard deviation 30/√16 = 7.5.
  • Probability that X̄ > 225 is about 2% (225 is exactly two standard deviations above the mean).
  • Interpretation: A statistician would say 98% confidence, but 2% risk is too high for an elevator.
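
The elevator numbers can be reproduced with the normal tail probability, written here with `math.erfc`. This is a sketch under the section's assumption that X̄ is approximately normal; the variable names are illustrative.

```python
import math

def normal_tail(x, mu, sigma):
    """Prob{X > x} for a normal variable with mean mu and standard deviation sigma."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

n_players = 16
mu, sigma = 210.0, 30.0
sd_of_average = sigma / math.sqrt(n_players)    # 30 / 4
limit_per_person = 3600 / n_players             # capacity split over 16 riders
risk = normal_tail(limit_per_person, mu, sd_of_average)
print(sd_of_average, limit_per_person, risk)
```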

🎲 Example: Weldon's dice

  • Threw 12 dice 26,306 times, counting 5's and 6's in 315,672 separate rolls.
  • Observed fraction: X̄ = 0.3377 instead of expected p = 1/3 ≈ 0.3333.
  • Variance for one roll: σ² = p(1 − p) = 2/9; standard deviation of X̄: σ/√N ≈ 0.00084.
  • Difference 0.3377 − 0.3333 is more than 5 standard deviations away from the mean.
  • Under the normal approximation, the probability of falling 5σ or more from the mean is less than 1 in a million.
  • Conclusion: The dice were unfair (faces with 5 or 6 indentations were lighter and came up more often).
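
The arithmetic behind the Weldon example is short enough to check directly. The sketch below recomputes the number of rolls, the standard deviation of the observed fraction, and how many standard deviations the result lies from 1/3.

```python
import math

throws, dice_per_throw = 26_306, 12
n_rolls = throws * dice_per_throw               # separate rolls counted
p = 1 / 3                                       # fair-die chance of a 5 or 6
sd_of_fraction = math.sqrt(p * (1 - p) / n_rolls)   # sigma / sqrt(N)
z = (0.3377 - p) / sd_of_fraction               # distance in standard deviations
print(n_rolls, sd_of_fraction, z)
```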

⚠️ Don't confuse: Law of Averages misconceptions

  • Wrong: "After many heads, tails become more likely to even things out."
  • Right: Each toss is independent; the fraction of heads approaches 1/2 as N grows, but past results do not influence future tosses.
  • Example: A fair coin comes up heads 10 times in a row. The next toss still has 50-50 odds; the coin has no memory.
55

Masses and Moments

8.5 Masses and Moments

🧭 Overview

🧠 One-sentence thesis

The moment of a mass—mass multiplied by distance from an axis—determines the balance point (center of mass) and rotational behavior (moment of inertia), with discrete sums generalizing to integrals when mass is continuously distributed.

📌 Key points (3–5)

  • What a moment is: mass times distance from an axis; total moment divided by total mass gives the center of mass (balance point).
  • Discrete vs continuous: discrete masses sum as Σ mₙxₙ; continuous distributions integrate as ∫ x ρ(x) dx, where ρ is density.
  • Two dimensions: moments around the y-axis use x-distances (Mᵧ = Σ mₙxₙ), moments around the x-axis use y-distances (Mₓ = Σ mₙyₙ); the centroid is (x̄, ȳ).
  • Common confusion: moment around the y-axis uses x coordinates (distance from the y-axis), not y coordinates.
  • Moment of inertia: mass times the square of distance (I = Σ x²ₙmₙ or ∫ x² ρ dx); measures resistance to rotation and determines rotational kinetic energy (½ I ω²).

📏 Moments and center of mass in one dimension

📏 What a moment measures

Moment of mass around the y-axis = mx = (mass) × (distance to axis).

  • A moment is not just mass; it is mass weighted by how far it sits from a reference axis.
  • Example: a mass of 2 at distance x = 7 has moment 2 × 7 = 14.
  • The total moment is the sum of all individual moments: Mᵧ = Σ mₙxₙ.

⚖️ Center of mass (balance point)

Center of mass x̄ = (total moment) / (total mass) = (Σ mₙxₙ) / (Σ mₙ).

  • This is the point where the system balances.
  • If you move all masses to x̄, the total moment stays the same.
  • If you place the axis at x̄, the moments on either side cancel (net moment = 0).
  • Example: masses 1, 3, 2 at positions x = 1, 3, 7 have total mass M = 6, total moment Mᵧ = 1 + 9 + 14 = 24, so x̄ = 24/6 = 4.
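
A minimal sketch of the balance-point computation for point masses; `center_of_mass` is a name chosen here for illustration. It also confirms that the moments cancel when the axis is moved to x̄.

```python
def center_of_mass(masses, positions):
    """x_bar = (sum of m_n * x_n) / (sum of m_n)."""
    total_mass = sum(masses)
    total_moment = sum(m * x for m, x in zip(masses, positions))
    return total_moment / total_mass

masses, positions = [1, 3, 2], [1, 3, 7]
x_bar = center_of_mass(masses, positions)

# With the axis at x_bar, the moments on either side cancel.
net_moment = sum(m * (x - x_bar) for m, x in zip(masses, positions))
print(x_bar, net_moment)
```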

🌊 Continuous distributions

When mass is spread out, density ρ(x) = (mass of piece) / (length of piece) = dm/dx.

  • Total mass: M = ∫ ρ(x) dx (integrate density over the region).
  • Total moment: Mᵧ = ∫ x ρ(x) dx (each piece dm = ρ dx contributes x · ρ dx).
  • Center of mass: x̄ = Mᵧ / M = [∫ x ρ(x) dx] / [∫ ρ(x) dx].

Example: Constant density ρ from 0 to L gives M = ρL, Mᵧ = ρ · (½ L²), so x̄ = L/2 (halfway along).

Example: Density ρ = e⁻ˣ from 0 to ∞ gives M = 1, Mᵧ = 1, so x̄ = 1.

🗺️ Two-dimensional masses and centroids

🗺️ Moments in the plane

When masses mₙ are at points (xₙ, yₙ):

Moment | Formula | What it measures
--- | --- | ---
Around y-axis | Mᵧ = Σ mₙxₙ | Uses x-coordinates (distance from y-axis)
Around x-axis | Mₓ = Σ mₙyₙ | Uses y-coordinates (distance from x-axis)

Don't confuse: The moment around the y-axis uses the x coordinate, because x measures how far the mass is from the y-axis.

🎯 Center of mass in two dimensions

Center of mass: (x̄, ȳ) where x̄ = Mᵧ / M = (Σ mₙxₙ) / (Σ mₙ) and ȳ = Mₓ / M = (Σ mₙyₙ) / (Σ mₙ).

  • This is the balance point in the plane; the plate will not tip if it rests on (x̄, ȳ).
  • Also called the centroid when density ρ = 1 (uniform thin plate).

📐 Plates with constant density

For a thin plate with ρ = 1:

  • Mass M = area of the plate.
  • Moment around y-axis: Mᵧ = ∫ x · (length of vertical strip) dx.
    • All points on a vertical strip share the same x (distance from y-axis).
  • Moment around x-axis: Mₓ = ∫ y · (length of horizontal strip) dy.
    • All points on a horizontal strip share the same y (distance from x-axis).

Example: Triangle with sides x = 0, y = 0, and y = 4 − 2x.

  • Mass M = area = ∫₀² (4 − 2x) dx = 4.
  • Mᵧ = ∫₀² x(4 − 2x) dx = 8/3.
  • Mₓ = ∫₀⁴ y · ½(4 − y) dy = 16/3.
  • Centroid: (x̄, ȳ) = (2/3, 4/3).
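
The strip integrals for the triangle can be checked numerically. In this sketch `strip_height` and `strip_width` are hypothetical helper names for the vertical strip length 4 − 2x and the horizontal strip length ½(4 − y).

```python
def integrate_midpoint(f, lo, hi, n=20_000):
    """Midpoint-rule estimate of the integral of f from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def strip_height(x):
    return 4 - 2 * x          # length of the vertical strip at x

def strip_width(y):
    return (4 - y) / 2        # length of the horizontal strip at height y

mass = integrate_midpoint(strip_height, 0, 2)                        # area
moment_y = integrate_midpoint(lambda x: x * strip_height(x), 0, 2)   # M_y
moment_x = integrate_midpoint(lambda y: y * strip_width(y), 0, 4)    # M_x
centroid = (moment_y / mass, moment_x / mass)
print(mass, centroid)
```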

Example: Half-disk between the x-axis and the upper semicircle of x² + y² = r².

  • Mᵧ = 0 by symmetry (region balances on the y-axis).
  • Mₓ = ∫₀ʳ y · 2√(r² − y²) dy = (2/3)r³.
  • Mass M = (½)πr² (area of semicircle).
  • Centroid height: ȳ = Mₓ / M = 4r/(3π).
  • This is less than r/2 because the bottom of the semicircle is wider than the top.

🔄 Moment of inertia and rotation

🔄 What moment of inertia measures

Moment of inertia: I = Σ (distance)² · mass, e.g., Iᵧ = Σ x²ₙmₙ around the y-axis, I₀ = Σ r²ₙmₙ around the origin.

  • Moment of inertia uses the square of the distance, not just distance.
  • It measures resistance to rotational acceleration.
  • In the plane: Iₓ + Iᵧ = I₀ because x²ₙ + y²ₙ = r²ₙ.

🌀 Continuous moment of inertia

For a plate with constant density ρ = 1:

  • Around y-axis: Iᵧ = ∫ x² · (vertical strip length) dx.
  • Around x-axis: Iₓ = ∫ y² · (horizontal strip length) dy.

Example: Rod from 0 to L.

  • Around the end (x = 0): I_end = ∫₀ᴸ x² dx = (1/3)L³.
  • Around the center (x from −L/2 to L/2): I_center = ∫ from −L/2 to L/2 of x² dx = (1/12)L³.
  • Turning is easier around the center because I is smaller.

⚙️ Rotation and energy

A mass m moving in a circle of radius r with angular velocity ω (radians per second) has speed v = rω.

  • Rotational kinetic energy: (½)mv² = (½)m(rω)² = (½)I₀ω², where I₀ = mr².
  • The moment of inertia I₀ plays the role of mass in rotational motion.
  • Example: An ice skater pulls her arms in, reducing I₀; her angular momentum I₀ω is conserved, so ω increases and she spins faster (the rotational energy (½)I₀ω² rises, paid for by the work of pulling her arms in).

🔧 Torque

Torque T = F · x = force × distance from the turning axis.

  • Torque is the rotational analogue of force.
  • Newton's law F = ma becomes T = I · (dω/dt) for rotation.
  • A push further from the axis (larger x) produces more torque.
  • Example: Pushing a revolving door near the hinge is less effective than pushing near the outer edge.

🎳 Rolling objects experiment

When objects roll down a slope, potential energy mgh converts to translational kinetic energy (½)mv² plus rotational kinetic energy (½)Iω².

  • For rolling without slipping: v = ωr.
  • Energy balance: mgh = (½)mv²(1 + I/(mr²)).
  • Define J = I/(mr²) (dimensionless ratio).
  • Velocity at the bottom: v² = 2gh / (1 + J).
  • Smaller J means larger velocity (faster finish).
  • A hollow cylinder has J = 1 (all mass at radius r), so it finishes last.
  • Density cancels in J = I/(mr²), so it does not affect the race order.
  • Size also does not matter if shape and density are the same.
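
To see the race order, the sketch below evaluates v² = 2gh/(1 + J) for four shapes. Only the hollow cylinder (J = 1) comes from the text above; the other J values are the standard textbook ratios for those shapes and are supplied here as assumptions, as are g = 9.8 m/s² and the 1 m drop.

```python
import math

# J = I / (m r^2). Values other than the hollow cylinder are assumed standard results.
J_VALUES = {
    "solid sphere": 2 / 5,
    "solid cylinder": 1 / 2,
    "hollow sphere": 2 / 3,
    "hollow cylinder": 1.0,
}

def speed_at_bottom(height, J, g=9.8):
    """Rolling without slipping: v^2 = 2 g h / (1 + J)."""
    return math.sqrt(2 * g * height / (1 + J))

# Fastest first: smaller J means larger speed.
ranking = sorted(J_VALUES, key=lambda shape: speed_at_bottom(1.0, J_VALUES[shape]), reverse=True)
for shape in ranking:
    print(shape, round(speed_at_bottom(1.0, J_VALUES[shape]), 3))
```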

🔗 Connection to probability

🔗 Parallel formulas

The formulas for mass and moment match those for probability:

Concept | Mass/Moment | Probability
--- | --- | ---
Total | M = ∫ ρ(x) dx | ∫ p(x) dx = 1
Center/Mean | x̄ = [∫ x ρ(x) dx] / M | μ = ∫ x p(x) dx
Spread | ∫ (x − x̄)² ρ(x) dx | σ² = ∫ (x − μ)² p(x) dx

  • The only difference: total probability is always 1, so the mean does not require division by a total.
  • The moment of inertia ∫ (x − x̄)² ρ(x) dx is analogous to variance.
  • Mathematics reuses the same structure in different contexts.
56

Force, Work, and Energy

8.6 Force, Work, and Energy

🧭 Overview

🧠 One-sentence thesis

Work is the integral of force over distance, and it equals the change in potential energy, which can be converted to kinetic energy when objects move under internal forces.

📌 Key points (3–5)

  • Constant vs. variable force: constant force uses simple multiplication W = Fx, but variable force requires integration W = ∫F(x)dx.
  • Potential energy relationship: work done equals the change in potential energy V, where V is the indefinite integral of force F.
  • Force as derivative: force F equals dV/dx (the derivative of potential energy), connecting calculus fundamentals to physics.
  • Common confusion: don't confuse work with simple force times distance when force varies—springs require W = ½kx², not kx·x.
  • Energy conservation: when no external force acts, total energy (kinetic plus potential) remains constant as one form converts to the other.

🔧 Work and force fundamentals

🔧 Constant force

Work = force times distance moved in the direction of force.

  • Formula: W = Fx
  • Example: lifting a 30-pound suitcase up 20 feet of stairs requires W = 600 foot-pounds.
  • If force opposes motion, work is negative (pushing backward on a forward-rolling car gives negative work).
  • If the object doesn't move, work is zero regardless of force magnitude.

🔧 Variable force and integration

When force changes with position, calculus is required:

  • W = ∫F(x)dx replaces simple multiplication.
  • This parallels how ∫v(t)dt handles changing velocity.
  • The integral adds up small pieces of work F·dx over short distances.

🌀 Springs and Hooke's law

🌀 How springs behave

Hooke's law: F(x) = kx, where force is proportional to stretching distance x.

  • k is the elastic constant (units: pounds per foot or Newtons per meter).
  • More stretching requires more force—this is the variable force that needs integration.

🌀 Work to stretch a spring

Starting from x = 0, the work is:

  • W = ∫₀ˣ kx dx = ½kx²
  • Don't confuse: W does NOT equal kx times x; that would ignore the varying force.
  • Example: if F = 20 pounds stretches a spring 1 foot, then k = 20 pounds/foot, and work = ½(20)(1)² = 10 foot-pounds.
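
The difference between ½kx² and the naive kx·x shows up immediately in a numerical integral of the force. This is a sketch; the helper name `work` and the step count are choices made here.

```python
def work(force, a, b, n=10_000):
    """Midpoint-rule estimate of W = integral of F(x) dx from a to b."""
    h = (b - a) / n
    return h * sum(force(a + (i + 0.5) * h) for i in range(n))

k = 20.0                                   # pounds per foot

def spring_force(x):
    return k * x                           # Hooke's law F = kx

stretch_work = work(spring_force, 0.0, 1.0)     # expected (1/2) k x^2
naive_product = spring_force(1.0) * 1.0         # final force times distance
compress_work = work(spring_force, 0.0, -1.0)   # negative x and negative F
print(stretch_work, naive_product, compress_work)
```

The naive product is twice the true work because it applies the final force over the whole stretch; the compression run returns the same positive work as the stretch.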

🌀 Compression vs. stretching

  • Compression uses negative x and negative F.
  • But work W = ½kx² is still positive (x² is always positive).
  • Same distance of compression requires the same work as stretching.

⚡ Potential energy

⚡ What potential energy means

Potential energy V(x): the energy stored in a system due to position or configuration.

  • A spring at rest has no strain and no energy.
  • Tension or compression gives it potential energy.
  • The change in energy equals the work done.

⚡ Work as change in potential

Moving from x = a to x = b:

  • W = ∫ₐᵇ F(x)dx = V(b) - V(a)
  • Work is the definite integral; potential is the indefinite integral.
  • Example: carrying a suitcase up stairs and back down gives total work = zero (same starting and ending potential).

⚡ The arbitrary constant

  • Potential includes an arbitrary constant C: V = ½kx² + C.
  • To compute a change in potential, C cancels out.
  • Common choice: V = 0 at x = 0, or V = 0 at x = ∞ (for gravity).

⚡ Force as derivative of potential

The fundamental theorem connects F and V:

  • Force exerted on spring: F = dV/dx
  • Force exerted by spring (restoring force): F = -dV/dx
  • The sign depends on point of view (pulling vs. being pulled back).

🌍 Gravity and inverse square law

🌍 Newton's gravitational force

Force = GMm/x², where G is the gravitational constant, M is Earth's mass, m is object mass, x is distance from Earth's center.

  • Force to overcome gravity: F = GMm/x²
  • Force exerted by gravity: F = -GMm/x²

🌍 Gravitational potential

V(x) = ∫F(x)dx = -GMm/x + C

  • Usually C = 0, making potential zero at x = ∞.
  • Example: lifting a suitcase 20 feet uses nearly constant weight, but exact calculation uses the integral of F(x).
  • With constant force, lifting to x = ∞ requires infinite work; with correct decreasing force, work equals GMm/x₀ (finite).

🏃 Kinetic energy and conservation

🏃 Converting potential to kinetic

When you release a spring or drop a suitcase (external force F = 0):

  • Internal force still acts.
  • Potential energy converts to kinetic energy K = ½mv².
  • Newton's law: F = ma = m(dv/dt)

🏃 Work-energy relationship

Integrating Newton's law from x = a to x = b:

  • ∫ₐᵇ F dx = ½mv²(b) - ½mv²(a)
  • This work also equals -V(b) + V(a) (since F = -dV/dx).
  • Therefore: ½mv²(b) + V(b) = ½mv²(a) + V(a)

🏃 Conservation of energy

Total energy (kinetic plus potential) is constant when no external force acts.

  • Example: mass on a stretched spring—spring energy V = ½kx² converts to kinetic energy ½mv² at x = 0, then back to potential at the opposite extreme.
  • This is simple harmonic motion: m(d²x/dt²) + kx = 0.
  • The rate of change of total energy is zero.
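
Energy conservation can be illustrated with the known solution of m(d²x/dt²) + kx = 0 for a mass released from rest at x₀, namely x = x₀ cos(ωt) with ω = √(k/m). The numeric values of k, m, and x₀ below are arbitrary assumptions for the sketch.

```python
import math

def spring_state(t, x0, k, m):
    """Position and velocity of a mass released from rest at x0 on a spring."""
    omega = math.sqrt(k / m)
    x = x0 * math.cos(omega * t)
    v = -x0 * omega * math.sin(omega * t)
    return x, v

def total_energy(x, v, k, m):
    """Kinetic (1/2) m v^2 plus potential (1/2) k x^2."""
    return 0.5 * m * v ** 2 + 0.5 * k * x ** 2

k, m, x0 = 20.0, 2.0, 1.0
energies = [total_energy(*spring_state(0.1 * i, x0, k, m), k, m) for i in range(50)]
print(min(energies), max(energies))   # both should equal (1/2) k x0^2
```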

💧 Pressure and hydrostatic force

💧 Pressure basics

Pressure = force per unit area (p = F/A).

  • Water has weight-density w ≈ 9800 N/m³ ≈ 62 lb/ft³.
  • At constant depth h, pressure p = wh.
  • Force on base of area A: F = whA.

💧 Force on vertical sides

Pressure varies with depth down a side wall:

  • Divide the side into horizontal strips of thickness Δh.
  • Strip at depth h has length l(h) and area l(h)Δh.
  • Pressure wh is nearly constant on the strip.
  • Force on strip: F = whl(h)Δh.

Total force: F = ∫whl(h)dh

💧 Example: trapezoidal dam

For a dam with depth from 0 to 20, side length l = 60 at top:

  • If sides are straight and l = 50 at h = 20, then l = 60 - ½h.
  • F = ∫₀²⁰ wh(60 - ½h)dh = [30wh² - ⅙wh³]₀²⁰ = 12000w - ⅙(8000w) = (32000/3)w ≈ 10,667w.
  • Units: feet and lb/ft³ give pounds; meters and N/m³ give Newtons.
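
The strip-by-strip reasoning maps directly onto a numerical sum. The sketch below evaluates F = ∫ w h l(h) dh for the trapezoidal dam with w left as 1, so the result is the multiple of w; the function name is an illustrative choice.

```python
def hydrostatic_force(length_at_depth, depth, w, n=20_000):
    """Midpoint-rule estimate of F = integral of w * h * l(h) dh from 0 to depth."""
    dh = depth / n
    total = 0.0
    for i in range(n):
        h = (i + 0.5) * dh                    # depth of the strip's midline
        total += w * h * length_at_depth(h)   # pressure w*h times strip length
    return total * dh

# Side length 60 at the surface, shrinking to 50 at depth 20.
force_per_unit_w = hydrostatic_force(lambda h: 60 - 0.5 * h, depth=20.0, w=1.0)
print(force_per_unit_w)   # multiply by w (62 lb/ft^3 or 9800 N/m^3) for the force
```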

💧 Pumping water out

Work to empty a tank where area at depth h is A(h):

  • Imagine lifting one layer at a time.
  • Layer weighs wA(h)Δh.
  • Work to lift it distance h: whA(h)Δh.

Total work: W = ∫whA(h)dh

Example: for bottom half of sphere of radius R, cross-sectional area A = π(R² - h²), so W = πwR⁴/4.
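
The hemisphere result W = πwR⁴/4 can be checked the same way, layer by layer. The radius R = 2 and w = 1 below are arbitrary values assumed for the check.

```python
import math

def pumping_work(area_at_depth, depth, w, n=20_000):
    """Midpoint-rule estimate of W = integral of w * h * A(h) dh from 0 to depth."""
    dh = depth / n
    total = 0.0
    for i in range(n):
        h = (i + 0.5) * dh                  # each layer is lifted a distance h
        total += w * h * area_at_depth(h)
    return total * dh

R = 2.0
W = pumping_work(lambda h: math.pi * (R * R - h * h), depth=R, w=1.0)
print(W, math.pi * R ** 4 / 4)
```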

57

Polar Coordinates

9.1 Polar Coordinates

🧭 Overview

🧠 One-sentence thesis

Polar coordinates locate points by direction (angle θ) and distance (r) from the origin, offering an alternative to rectangular x-y coordinates that is especially useful for circular and radial patterns.

📌 Key points (3–5)

  • Two coordinate systems: Polar uses distance r and angle θ; rectangular uses x and y positions.
  • Converting between systems: x = r cos θ and y = r sin θ (polar to rectangular); r = square root of (x² + y²) and tan θ = y/x (rectangular to polar).
  • Polar graphs: Equations like r = cos θ produce curves (often circles) that are easier to describe in polar form than rectangular form.
  • Common confusion: The same point can have multiple polar representations (adding 2π to θ, or using negative r values), unlike the unique x-y representation.
  • Why it matters: Flight controllers, pilots, and many real-world applications naturally think in terms of direction and distance rather than horizontal and vertical positions.

📐 The two coordinate systems

📍 What polar coordinates measure

Polar coordinates: a point's location given by r (distance from origin) and θ (angle from horizontal axis).

  • Distance r: a positive number showing how far the point is from the origin.
  • Angle θ: measured from the horizontal (positive x-axis); flight controllers prefer degrees, mathematicians prefer radians.
  • Example: A plane at r = 2 and θ = 30° (or π/6 radians) is 2 units away at a 30° angle.

🔄 How rectangular coordinates differ

  • Rectangular coordinates: use x (horizontal position) and y (vertical position).
  • The excerpt notes that polar is "perfect for distance from the origin" but for most other distances, switching to x and y is better.
  • Different situations call for different systems—polar excels at radial problems, rectangular at grid-based problems.

🔀 Converting between coordinate systems

➡️ From polar to rectangular

The conversion uses a right triangle with hypotenuse r and angle θ:

x = r cos θ and y = r sin θ

  • Example: Point at r = 2, θ = π/6 converts to x = 2 cos(π/6) = √3 and y = 2 sin(π/6) = 1.
  • The excerpt calls these "the most used formulas in this chapter" and says "we will do it constantly."

⬅️ From rectangular to polar

r = square root of (x² + y²) and tan θ = y/x

  • The distance formula comes from the Pythagorean theorem: x² + y² = r².
  • Example: Point (√3, 1) has r = √(3 + 1) = 2 and tan θ = 1/√3, so θ = π/6.
  • The angle formula is less clean: θ is only "almost" arctan(y/x), because arctan returns angles between −π/2 and π/2, so the quadrant must be chosen from the signs of x and y (and x = 0 needs separate handling).
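
Both conversions are easy to sketch in code. The version below uses Python's `math.atan2`, which picks the quadrant from the signs of x and y and returns an angle in (−π, π]; the function names are illustrative.

```python
import math

def polar_to_rect(r, theta):
    """x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

def rect_to_polar(x, y):
    """r = sqrt(x^2 + y^2); theta from atan2, which resolves the quadrant."""
    return math.hypot(x, y), math.atan2(y, x)

print(polar_to_rect(2, math.pi / 6))          # close to (sqrt(3), 1)
print(rect_to_polar(1, -1))                   # r = sqrt(2), theta = -pi/4
print(rect_to_polar(-1, 1))                   # same tangent, theta = 3*pi/4
print(polar_to_rect(-math.sqrt(2), 3 * math.pi / 4))   # negative r lands near (1, -1)
```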

⚠️ Angle ambiguities

Don't confuse: Multiple angles can represent the same direction.

  • Adding or subtracting 2π (360°) keeps the same direction: −π/4 is the same as 7π/4 or 15π/4.
  • Adding or subtracting π (180°) doesn't change the tangent: points (1, −1) and (−1, 1) both have tan θ = −1 but lie on opposite rays of the line y = −x.
  • Negative angles: θ = −π/4 means rotating clockwise instead of counterclockwise.

🔢 Multiple representations of one point

  • The same physical point can have many polar coordinate pairs.
  • Example: Point B at (1, −1) can be described as r = √2, θ = −π/4 or r = √2, θ = 7π/4.
  • Even negative r is allowed: r = −√2, θ = 3π/4 means "go backward along the 135° line."
  • The excerpt recommends always keeping r ≥ 0 when giving a position, but negative r is allowed when drawing polar graphs.

📊 Polar equations and graphs

🎯 What a polar equation means

Polar equation: a relation between r and θ, written as r = F(θ).

  • This parallels y = f(x) from rectangular coordinates.
  • To graph: take various angles θ (0°, 30°, 60°, ...) and go out distance r = F(θ) on each ray, then connect the points.
  • The resulting curve is the "polar graph."

⭕ The circle r = cos θ

Angle θ | Distance r = cos θ | Notes
--- | --- | ---
0° to 90° | Positive, decreasing | Moving around the circle
90° to 180° | Negative | Going backward on each ray
180° to 360° | Traces circle again | The equation gives the circle twice

Why it's a circle (Method 1): Multiply both sides by r:

  • r = cos θ becomes r² = r cos θ
  • Substitute x² + y² = r² and x = r cos θ to get x² + y² = x
  • Rewrite as (x − ½)² + y² = (½)², which is a circle with center (½, 0) and radius ½.

Why it's a circle (Method 2): Use parametric equations:

  • x = r cos θ = cos² θ and y = r sin θ = sin θ cos θ
  • These describe the same shifted circle.

Don't confuse: The angles 0 to π give the whole circle once; continuing to 2π traces it again.
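
A quick numerical check that points of r = cos θ satisfy (x − ½)² + y² = (½)², and that θ and θ + π give the same point, which is why the circle is traced twice. The sample angles and helper names are arbitrary choices for this sketch.

```python
import math

def point_on_curve(theta):
    """Rectangular coordinates of the point on r = cos(theta)."""
    r = math.cos(theta)
    return r * math.cos(theta), r * math.sin(theta)

def on_shifted_circle(theta, tol=1e-12):
    x, y = point_on_curve(theta)
    return abs((x - 0.5) ** 2 + y ** 2 - 0.25) < tol

angles = [k * math.pi / 12 for k in range(24)]          # 0 to just under 2*pi
all_on_circle = all(on_shifted_circle(t) for t in angles)

# theta and theta + pi land on the same point (r changes sign, the ray reverses).
retraced = all(
    math.dist(point_on_curve(t), point_on_curve(t + math.pi)) < 1e-12
    for t in angles
)
print(all_on_circle, retraced)
```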

🔵 Other polar circles

  • r = sin θ: Another shifted circle (shown in the figure).
  • r = cos θ + sin θ: A third circle (exercise asks for its xy equation and radius).
  • All calculations rely on the fundamental conversions x = r cos θ and y = r sin θ.

🔧 Parametric form

🎨 Parametric equations

The excerpt briefly introduces parametric equations as an alternative to polar:

  • Instead of r = F(θ), write x and y separately as functions of a parameter (often θ or time t).
  • Example for the circle: x = cos² θ and y = sin θ cos θ.
  • The parameter could be time—"the curve would be the same."
  • The excerpt notes "Chapter 12 studies parametric equations in detail."

📐 Geometric insight

For the circle r = cos θ:

  • Think of it as center vector (½, 0) plus radius vector (½ cos t, ½ sin t).
  • This gives x = ½ + ½ cos t and y = ½ sin t.
  • Setting t = 2θ recovers x = cos² θ and y = sin θ cos θ.
  • This reveals a geometry theorem: "The angle t at the center is twice the angle θ at the circumference."
58

Polar Equations and Graphs

9.2 Polar Equations and Graphs

🧭 Overview

🧠 One-sentence thesis

Polar coordinates simplify the equations of circles and many other curves (including conic sections with a focus at the origin), making them far more practical than rectangular coordinates for these shapes.

📌 Key points (3–5)

  • Why polar coordinates matter: Circles and rays have extremely simple equations (r = constant and theta = constant), and these coordinates form an orthogonal grid just like the x-y system.
  • Symmetry tests: Polar curves can be symmetric across the x-axis, y-axis, or through the origin, but the algebraic tests differ depending on the curve—changing theta to negative theta, theta to pi minus theta, or r to negative r.
  • Conic sections unified: The single equation r = A/(1 + e cos theta) describes all conic sections (circle, ellipse, parabola, hyperbola) depending on the eccentricity e, with one focus at the origin.
  • Common confusion: The same point can have multiple polar representations (r, theta) and (−r, theta + pi), so curves may intersect at points that satisfy different theta values in each equation.
  • Real applications: Planetary orbits (like Mars seen from Earth) and other physical phenomena naturally produce polar curves like limaçons and cardioids.

🔵 Circles and rays: the simplest polar curves

🔵 The unit circle and constant-radius circles

The equation r = 1 describes the unit circle around the origin.

  • In rectangular coordinates, the same circle needs the more complicated equation y = ±sqrt(1 − x²) (or x² + y² = 1).
  • Any circle centered at the origin is simply r = constant.
  • Example: r = 3 is a circle of radius 3; no trigonometry needed.

📐 Straight lines through the origin

The equation theta = constant describes a ray (or full line if negative r is allowed) at a fixed angle.

  • theta = pi/4 is the 45° line.
  • If we allow r < 0, the one-directional ray becomes a full line through the origin.
  • These rays are perpendicular to the circles, forming an orthogonal coordinate system (just as interesting as the x-y grid of perpendicular lines).

🔄 Converting between systems

  • Multiplying r = 4 cos theta by r gives r² = 4r cos theta, which becomes x² + y² = 4x in rectangular coordinates—a circle with center shifted from the origin.
  • The equation r = 4/cos theta simplifies to r cos theta = 4, or x = 4, a vertical line.
  • Don't confuse: r = 1/cos theta is a straight line (x = 1), not a curve.

🌸 Flowers, spirals, and limaçons

🌸 The four-petal flower

The equation r = cos 2theta produces a four-petal flower.

  • At theta = 30°, 150°, −30°, −150°, the radius r = cos 60° = 1/2, marking points on the petals.
  • The curve has all three symmetries: across the x-axis, across the y-axis, and through the origin.
  • Example: r = cos 3theta produces a three-petal flower and r = cos 5theta a five-petal one, but r = cos 100theta has 200 petals: r = cos n theta has n petals when n is odd and 2n petals when n is even.

🌀 The spiral of Archimedes

The equation r = theta describes a spiral that moves outward as theta increases.

  • Unlike periodic curves (which repeat every 2pi), the spiral adds new points indefinitely.
  • Each full rotation (theta increases by 2pi) moves the spiral outward by 2pi units.
  • If negative theta and negative r are allowed, a second spiral appears.

❤️ The cardioid

The equation r = 1 + cos theta produces a heart-shaped curve called a cardioid.

  • The name comes from its heart shape; it has a cusp (sharp point) at the origin.
  • The radius r is never negative because cos theta never goes below −1, so 1 + cos theta ≥ 0.
  • Real application: The electrical vector in a human heart approximately traces a cardioid (see electrocardiograms).

🐌 Limaçons (generalized cardioids)

The equation r = 1 + b cos theta is a limaçon (French for "snail").

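  • When b = 1, it is the cardioid.
  • As b grows from 0 the curve starts as a slightly distorted circle; a dimple appears once b exceeds 1/2, sharpens into the cusp at b = 1, and becomes an inner loop for b > 1.
  • For large b, the curve resembles two circles.

Value of b | Shape
--- | ---
0 < b ≤ 1/2 | Convex limaçon (no dimple)
1/2 < b < 1 | Dimpled limaçon
b = 1 | Cardioid (cusp at origin)
b > 1 | Limaçon with inner loop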

Example: Mars seen from Earth
Earth orbits at r = 2, Mars at r = 3 (roughly 1.5 times farther). Using time t as a parameter:

  • Earth: x = 2 cos 2pi t, y = 2 sin 2pi t (completes orbit in 1 year)
  • Mars: x = 3 cos pi t, y = 3 sin pi t (completes orbit in 2 years)

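Subtracting Earth's position from Mars's position and simplifying (using cos 2pi t = 2 cos² pi t − 1 and sin 2pi t = 2 sin pi t cos pi t) gives the limaçon r = 3 − 4 cos theta with theta = pi t, shifted 2 units along the x-axis. The limaçon has an inner loop, which explains why Mars appears to loop backward in the sky when viewed from Earth.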
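
The simplification can be checked numerically. This minimal Python sketch (function names are illustrative) compares the Mars-minus-Earth position with the limaçon r = 3 − 4 cos theta, theta = pi t, after shifting the limaçon 2 units along the x-axis. The shift comes from the double-angle identity cos 2pi t = 2 cos² pi t − 1 and is worked out here rather than quoted from the text.

```python
import math

def mars_minus_earth(t):
    """Position of Mars relative to Earth at time t (years), using the orbits above."""
    x = 3 * math.cos(math.pi * t) - 2 * math.cos(2 * math.pi * t)
    y = 3 * math.sin(math.pi * t) - 2 * math.sin(2 * math.pi * t)
    return x, y

def shifted_limacon(t):
    """Limacon r = 3 - 4 cos(theta) with theta = pi*t, moved 2 units along x."""
    theta = math.pi * t
    r = 3 - 4 * math.cos(theta)
    return 2 + r * math.cos(theta), r * math.sin(theta)

def max_mismatch(samples=200):
    """Largest distance between the two descriptions over one 2-year cycle."""
    worst = 0.0
    for k in range(samples):
        t = 2 * k / samples
        ax, ay = mars_minus_earth(t)
        bx, by = shifted_limacon(t)
        worst = max(worst, math.hypot(ax - bx, ay - by))
    return worst
```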

🔍 Symmetry tests for polar curves

🔍 Three types of symmetry

Polar curves can be symmetric in three ways:

  1. Across the x-axis (y → −y)
  2. Across the y-axis (x → −x)
  3. Through the origin (point reflection)

Each symmetry has two algebraic tests because the same point has multiple polar representations.

↔️ Symmetry across the x-axis

Test 1: Change theta to −theta. If the equation is unchanged, the curve is symmetric.
Test 2: Change theta to pi − theta and r to −r. If the equation is unchanged, the curve is symmetric.

  • r = cos 2theta passes Test 1 (cos(−2theta) = cos 2theta).
  • r = sin 2theta passes Test 2 (changing both gives the same equation).
  • Both flowers have x-axis symmetry, but they pass different tests.

↕️ Symmetry across the y-axis

Test 1: Change theta to pi − theta. If unchanged, symmetric.
Test 2: Change theta to −theta and r to −r. If unchanged, symmetric.

  • r = cos 2theta passes Test 1.
  • r = sin 2theta passes Test 2 (sine is odd).

🔄 Symmetry through the origin

Test 1: Change r to −r. If unchanged, symmetric.
Test 2: Change theta to theta + pi. If unchanged, symmetric.

  • The flowers r = cos 2theta and r = sin 2theta pass Test 2 only: cos 2(theta + pi) = cos 2theta and sin 2(theta + pi) = sin 2theta.
  • Every equation r² = F(theta) passes Test 1 because (−r)² = r².

⚠️ Important note

Changing r to −r and theta to theta + pi does nothing—these always represent the same point, so this is not a useful test.
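
The tests can be run mechanically. In this minimal Python sketch (all names are illustrative), a test is a function that maps (r, theta) to a new pair; a curve r = F(theta) passes if every transformed point still satisfies the equation. Sampling a finite number of angles is only a numerical check, not a proof.

```python
import math

def passes(F, transform, samples=720, tol=1e-9):
    """True if each sampled point (r, theta) on r = F(theta) still satisfies
    the equation after transform(r, theta) -> (r2, theta2)."""
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        r2, theta2 = transform(F(theta), theta)
        if abs(r2 - F(theta2)) > tol:
            return False
    return True

cos2 = lambda th: math.cos(2 * th)   # r = cos 2theta
sin2 = lambda th: math.sin(2 * th)   # r = sin 2theta

x_axis_test1 = lambda r, th: (r, -th)
x_axis_test2 = lambda r, th: (-r, math.pi - th)
origin_test1 = lambda r, th: (-r, th)
origin_test2 = lambda r, th: (r, th + math.pi)
```

For example, `passes(cos2, x_axis_test1)` and `passes(sin2, x_axis_test2)` both hold, while `passes(sin2, x_axis_test1)` does not, matching the statement that the two flowers pass different tests.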

🪐 Conic sections in polar coordinates

🪐 The unified conic equation

The graph of r = A/(1 + e cos theta) is a conic section with eccentricity e.

| Eccentricity e | Conic section |
|---|---|
| e = 0 | Circle |
| 0 < e < 1 | Ellipse |
| e = 1 | Parabola |
| e > 1 | Hyperbola |

  • One focus is always at the origin (0, 0).
  • The amplifying factor A scales the curve without changing its shape.
  • This single equation smoothly transitions from ellipses through parabolas to hyperbolas as e increases.

📐 Example: Parabola (e = 1)

Starting from r = 1/(1 + cos theta):

  • Multiply: r(1 + cos theta) = 1, so r + r cos theta = 1, or r = 1 − x.
  • Square both sides: x² + y² = 1 − 2x + x².
  • Cancel x²: y² = 1 − 2x, a parabola.

📐 Example: Hyperbola (e = 2)

Starting from r = 1/(1 + 2 cos theta):

  • r(1 + 2 cos theta) = 1, so r = 1 − 2x.
  • Square: x² + y² = 1 − 4x + 4x².
  • Rearrange: y² − 3x² = 1 − 4x, a hyperbola with focus at (0, 0).

📐 Example: Ellipse (e = 1/2)

Starting from r = 1/(1 + (1/2) cos theta):

  • r = 1 − (1/2)x.
  • Square: x² + y² = 1 − x + (1/4)x², an ellipse.

📏 The directrix property

All points P on the conic satisfy: distance to focus = e × distance to directrix.

  • From r = A/(1 + e cos theta), multiply through: r + e r cos theta = A, so r + ex = A, giving r = e(A/e − x).
  • This says |PF| = e|Pd|, where the directrix is the vertical line x = A/e.
  • The eccentricity e measures how "stretched" the conic is.
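
The focus-directrix relation can be checked numerically for the three examples above. This is a minimal Python sketch (the function name is illustrative); the absolute values allow for the negative r that occurs on one branch of the hyperbola.

```python
import math

def focus_directrix_gap(A, e, theta):
    """Return |PF| - e*|Pd| for the point at angle theta on r = A/(1 + e cos theta)."""
    r = A / (1 + e * math.cos(theta))
    x = r * math.cos(theta)
    dist_focus = abs(r)              # the focus is at the origin
    dist_directrix = abs(A / e - x)  # the directrix is the vertical line x = A/e
    return dist_focus - e * dist_directrix
```

The gap should be at rounding level for any angle where 1 + e cos theta is not zero.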

🌍 Planetary orbits

  • The Sun is at a focus (not the center) of each planet's elliptical orbit.
  • The polar equation r = A/(1 + e cos theta) naturally places the Sun at the origin.
  • Conics are determined by five numbers: two for shape (A and e), two for the focus position, and one for rotation angle.

⚠️ Common pitfalls and intersections

⚠️ Multiple representations of the same point

  • The point (r, theta) and (−r, theta + pi) are identical.
  • When finding intersections of two polar curves, setting the equations equal may miss points where the curves reach the same location at different theta values.
  • Example: r = 2 cos theta and r = 1 + cos theta intersect where 2 cos theta = 1 + cos theta (giving cos theta = 1), but they also meet at the origin, which the circle reaches at theta = pi/2 and the cardioid at theta = pi. Always draw graphs to find all meeting points.

⚠️ Squaring can introduce artifacts

  • The Mars-Earth example: computing the squared distance x² + y² between the planets directly gives 13 − 12 cos pi t, but the graph of r² = 13 − 12 cos theta looks nothing like the limaçon, because the squared equation loses the sign of r.
  • Don't confuse: r² = F(theta) and r = F(theta) are different curves.
59

Slope, Length, and Area for Polar Curves

9.3 Slope, Length, and Area for Polar Curves

🧭 Overview

🧠 One-sentence thesis

Calculus problems for polar curves—finding area, slope, and arc length—require adapting rectangular-coordinate formulas by treating narrow wedges as the fundamental shape and expressing x and y in terms of the angle θ.

📌 Key points (3–5)

  • Area in polar coordinates: built from narrow wedges (sectors) with area (1/2) r² Δθ, leading to the integral (1/2) ∫ r² dθ.
  • Common confusion: integrating from 0 to 2π can trace the same curve twice (e.g., a circle r = cos θ completes at θ = π, not 2π).
  • Slope of polar curves: found by dy/dx = (dy/dθ) / (dx/dθ), avoiding the need to eliminate θ and express y as a function of x.
  • Arc length: uses ds = √[(dr/dθ)² + r²] dθ, derived from squaring and adding dx and dy in polar form.
  • Why it matters: these techniques let us compute geometric properties (area, slope, length, surface area) directly from polar equations r = F(θ).

📐 Area in polar coordinates

🥧 The wedge as the fundamental piece

  • A narrow wedge between angles θ and θ + Δθ is approximately a circular sector.
  • The sector is a "piece of pie" cut at a small angle Δθ.
  • Its area is a fraction of the full circle's area π r²:
    • Fraction = Δθ / (2π)
    • Area of wedge = (Δθ / 2π) · π r² = (1/2) r² Δθ

Area formula for polar curves: The area inside r = F(θ) is the integral ∫ (1/2) r² dθ = ∫ (1/2) [F(θ)]² dθ.

⚠️ Don't trace the curve twice

  • Example: The circle r = cos θ has radius 1/2.
  • Integrating from 0 to 2π gives area π/2, but the correct area is π/4.
  • Why: The curve completes one full circle as θ goes from 0 to π; continuing to 2π traces it twice.
  • Fix: Integrate from 0 to π, or recognize when the curve repeats.
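
A minimal Python sketch of the double-tracing pitfall (the function `polar_area` and its midpoint rule are illustrative choices, not from the text): the same integrand is integrated over 0 to π and over 0 to 2π.

```python
import math

def polar_area(F, a, b, n=20000):
    """Midpoint-rule approximation of the integral of (1/2) F(theta)^2 from a to b."""
    h = (b - a) / n
    return sum(0.5 * F(a + (i + 0.5) * h) ** 2 for i in range(n)) * h

once = polar_area(math.cos, 0.0, math.pi)        # one pass around the circle
twice = polar_area(math.cos, 0.0, 2 * math.pi)   # the circle traced twice
```

Here `once` should be close to π/4 and `twice` close to π/2, double the true area.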

🔀 Area between two curves

  • A "chopped wedge" between r₁ and r₂ has area (1/2)(r₁² - r₂²) Δθ.
  • Not (1/2)(r₁ - r₂)² Δθ—don't square the difference; subtract the squares.
  • Example: Area between r = cos θ and r = 1/2 from θ = -π/3 to π/3:
    • Integral: ∫ (1/2)[(cos θ)² - (1/2)²] dθ
    • Limits found where the curves intersect: cos θ = 1/2 at θ = ±π/3.
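
The integral in this example can be evaluated numerically. In this minimal Python sketch (names are illustrative), the closed form π/12 + √3/8 is worked out here from ∫ cos² θ dθ = θ/2 + (sin 2θ)/4; it is not quoted from the text.

```python
import math

def area_between(outer, inner, a, b, n=20000):
    """Midpoint-rule approximation of the integral of (1/2)(outer^2 - inner^2) d(theta)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        theta = a + (i + 0.5) * h
        total += 0.5 * (outer(theta) ** 2 - inner(theta) ** 2)
    return total * h

area = area_between(math.cos, lambda theta: 0.5, -math.pi / 3, math.pi / 3)
closed_form = math.pi / 12 + math.sqrt(3) / 8   # assumption: derived here, see lead-in
```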

📈 Slope of polar curves

🔄 Converting to rectangular coordinates

  • A polar curve r = F(θ) can be written as:
    • x = r cos θ = F(θ) cos θ
    • y = r sin θ = F(θ) sin θ
  • Both x and y are functions of θ, not of each other directly.

🧮 Using the chain rule

  • The slope dy/dx is found via the chain rule:
    • dy/dx = (dy/dθ) / (dx/dθ)
  • Why this works: Avoids the "awkward or impossible" step of eliminating θ to express y as a function of x.
  • Example: For the cardioid r = 1 + cos θ:
    • dy/dθ = (1 + cos θ)(cos θ) + (- sin θ)(sin θ)
    • dx/dθ = (1 + cos θ)(- sin θ) + (- sin θ)(cos θ)
    • Slope = (dy/dθ) / (dx/dθ)

🔝 Finding extreme points

  • To find the highest point, maximize y = r sin θ by setting dy/dθ = 0.
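  • Example: For r = 1 + cos θ, setting dy/dθ = 0 gives cos θ + cos² θ − sin² θ = 0, or 2 cos² θ + cos θ − 1 = 0. This factors as (2 cos θ − 1)(cos θ + 1) = 0, so cos θ = 1/2 and the highest point is at θ = 60°.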
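
A minimal Python sketch of the cardioid calculation (function names are illustrative): the derivatives come from the product rule applied to x = r cos θ and y = r sin θ, and the highest point is located by a brute-force search rather than by solving dy/dθ = 0.

```python
import math

def cardioid_derivs(theta):
    """Return (dx/dtheta, dy/dtheta) for the cardioid r = 1 + cos(theta)."""
    r = 1 + math.cos(theta)
    dr = -math.sin(theta)
    dx = dr * math.cos(theta) - r * math.sin(theta)
    dy = dr * math.sin(theta) + r * math.cos(theta)
    return dx, dy

def slope(theta):
    """dy/dx = (dy/dtheta) / (dx/dtheta); undefined where dx/dtheta = 0."""
    dx, dy = cardioid_derivs(theta)
    return dy / dx

def height(theta):
    """y = r sin(theta) on the cardioid."""
    return (1 + math.cos(theta)) * math.sin(theta)

def highest_point_angle(samples=100000):
    """Grid search over [0, pi] for the angle that maximizes the height."""
    best = max(range(samples + 1), key=lambda k: height(math.pi * k / samples))
    return math.pi * best / samples
```

The search should land near θ = π/3 (60°), where dy/dθ vanishes.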

📏 Arc length of polar curves

📐 The length element ds

  • Start with ds = √[(dx)² + (dy)²].
  • For polar curves x = F(θ) cos θ and y = F(θ) sin θ, differentiate using the product rule:
    • dx = [F'(θ) cos θ - F(θ) sin θ] dθ
    • dy = [F'(θ) sin θ + F(θ) cos θ] dθ
  • Square and add (using cos² θ + sin² θ = 1):

Arc length formula: ds = √{[F'(θ)]² + [F(θ)]²} dθ, or equivalently ds = √[(dr/dθ)² + r²] dθ.

  • Geometrically: (ds)² = (dr)² + (r dθ)², treating a short piece of the curve as the hypotenuse of a right triangle with legs dr and r dθ.

🌀 Examples of arc length

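| Curve | Formula for ds | Total length | Notes |
|---|---|---|---|
| Circle r = cos θ | √1 dθ = dθ | π | Integrate 0 to π, not 2π |
| Cardioid r = 1 + cos θ | √(2 + 2 cos θ) dθ | 8 | Use symmetry (double the length for 0 ≤ θ ≤ π), where the integrand simplifies to 2 cos(θ/2) |
| Logarithmic spiral r = e^θ | √2 e^θ dθ | √2 | Finite length from θ = −∞ to θ = 0, where the spiral winds into the origin |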
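
The circle and cardioid lengths can be reproduced numerically. This is a minimal Python sketch (the function name and the midpoint rule are illustrative choices); each curve is passed in together with its derivative dr/dθ.

```python
import math

def polar_arc_length(F, dF, a, b, n=20000):
    """Midpoint-rule approximation of the integral of sqrt(F'^2 + F^2) d(theta)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        theta = a + (i + 0.5) * h
        total += math.sqrt(dF(theta) ** 2 + F(theta) ** 2)
    return total * h

circle = polar_arc_length(math.cos, lambda t: -math.sin(t), 0.0, math.pi)
cardioid = polar_arc_length(lambda t: 1 + math.cos(t),
                            lambda t: -math.sin(t), 0.0, 2 * math.pi)
```

The two results should be close to π and 8 respectively.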

🐚 Surface area of revolution

  • Revolving around the x-axis: surface area = ∫ 2πy ds
  • Revolving around the y-axis: surface area = ∫ 2πx ds
  • Express x, y, and ds in terms of θ and dθ, then integrate.
  • Example: The circle r = cos θ revolved around the y-axis (with x = r cos θ = cos² θ and ds = dθ) gives a "doughnut with no hole". Its surface area is ∫ 2π cos² θ dθ from 0 to π, which equals π².
60

Complex Numbers

9.4 Complex Numbers

🧭 Overview

🧠 One-sentence thesis

Complex numbers extend the real number system to provide solutions for all polynomial equations of degree n, and their polar representation (using Euler's formula) makes multiplication, division, and solving differential equations straightforward.

📌 Key points (3–5)

  • Why complex numbers exist: Real numbers cannot solve equations like x² + 4 = 0; complex numbers (x + iy) fill this gap and guarantee n roots for every degree-n polynomial.
  • Two representations: Rectangular form (x + iy) is best for addition/subtraction; polar form (r·e^(iθ)) is best for multiplication/division/powers.
  • Euler's formula: e^(iθ) = cos θ + i·sin θ connects exponentials, trigonometry, and complex numbers; it makes angle arithmetic automatic.
  • Common confusion: Don't confuse the complex conjugate (x − iy) with the negative (−x − iy); conjugates are mirror images across the real axis and make denominators real when dividing.
  • Application to differential equations: Substituting y = e^(ct) converts differential equations into algebraic equations for c; complex solutions yield real oscillatory or spiral solutions via their real and imaginary parts.

🔢 Foundations: the imaginary unit and arithmetic

🔢 The imaginary unit i

The imaginary number i is defined so that i² = −1.

  • No real number squared gives −1, so mathematicians agreed on a new symbol i (engineers use j).
  • Whenever i² appears in calculations, replace it with −1.
  • Both i and −i are square roots of −1; they solve x² + 1 = 0.

➕ Addition and multiplication

  • Addition: Keep real and imaginary parts separate.
    • Example: (1 + 3i) + (1 + 3i) = (1+1) + i(3+3) = 2 + 6i.
  • Multiplication: Use the distributive property and replace i² with −1.
    • Example: (1 + 3i)(1 + 3i) = 1 + 3i + 3i + 9i² = 1 + 6i − 9 = −8 + 6i.
    • Example: (1 + 3i)(5 − i) = 5 + 15i − i − 3i² = 5 + 14i + 3 = 8 + 14i.
  • After accepting i, no further "new" numbers are needed; all operations stay within the complex system.

🔄 Complex conjugate

The complex conjugate of x + iy is x − iy (flip the sign of the imaginary part).

  • Adding a number and its conjugate gives a real result: (1 + 3i) + (1 − 3i) = 2.
  • Multiplying a number and its conjugate also gives a real result: (1 + 3i)(1 − 3i) = 1 − 9i² = 1 + 9 = 10.
  • This property is the key to division: multiply numerator and denominator by the conjugate of the denominator.
    • Example: 1/(1 + 3i) = (1 − 3i)/[(1 + 3i)(1 − 3i)] = (1 − 3i)/10.
    • General formula: 1/(x + iy) = (x − iy)/(x² + y²).

📏 Absolute value (modulus)

The absolute value of x + iy is r = |x + iy| = √(x² + y²).

  • This is the distance from the origin in the complex plane.
  • The product (x + iy)(x − iy) = x² + y² = r².
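
Python's built-in complex type can reproduce the arithmetic above. This is a minimal sketch; note that Python writes the imaginary unit as `j`, following the engineering convention.

```python
z = 1 + 3j
w = 5 - 1j

total = z + z                      # 2 + 6i
square = z * z                     # -8 + 6i
product = z * w                    # 8 + 14i
conj_product = z * z.conjugate()   # 10, which is |z|^2
reciprocal = 1 / z                 # approximately 0.1 - 0.3i, i.e. (1 - 3i)/10
modulus = abs(z)                   # sqrt(10)
```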

🗺️ The complex plane and polar coordinates

🗺️ Rectangular vs. polar representation

  • Rectangular (Cartesian) form: x + iy corresponds to the point (x, y) in the plane.
    • x is the real part (horizontal axis), y is the imaginary part (vertical axis).
    • Best for addition and subtraction (like vector addition).
  • Polar form: x + iy = r(cos θ + i·sin θ) = r·e^(iθ).
    • r = √(x² + y²) is the absolute value; θ is the angle from the positive real axis.
    • x = r·cos θ, y = r·sin θ.
    • Best for multiplication, division, and powers.

🔁 Multiplication and division in polar form

  • Multiplication adds angles and multiplies absolute values:
    • (r₁·e^(iθ₁)) · (r₂·e^(iθ₂)) = (r₁·r₂)·e^(i(θ₁+θ₂)).
    • Example: 2e^(iπ/2) times 3e^(iπ/2) equals 6e^(iπ) = −6 (since 2i times 3i = −6).
  • Division subtracts angles and divides absolute values:
    • (r₁·e^(iθ₁)) / (r₂·e^(iθ₂)) = (r₁/r₂)·e^(i(θ₁−θ₂)).
  • Squaring doubles the angle and squares the absolute value:
    • (r·cos θ + i·r·sin θ)² = r²·cos 2θ + i·r²·sin 2θ.
    • Example: (1 + i)² = 2i; in polar form, √2·e^(iπ/4) squared is 2·e^(iπ/2) = 2i.
  • Powers and roots:
    • The nth power: (r·e^(iθ))ⁿ = rⁿ·e^(inθ).
    • The square root: √(r·e^(iθ)) = √r·e^(iθ/2).

🌀 de Moivre's formula

(cos θ + i·sin θ)ⁿ = cos(nθ) + i·sin(nθ).

  • This follows from repeated multiplication in polar form.
  • For n = −1, we get the reciprocal: 1/(cos θ + i·sin θ) = cos θ − i·sin θ (the complex conjugate on the unit circle).
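
The polar rules can be tried with Python's standard `cmath` module, where `cmath.rect(r, theta)` builds r·e^(iθ) and `cmath.polar(z)` returns (r, θ). This is a minimal sketch; the helper `de_moivre_gap` is an illustrative name.

```python
import cmath
import math

a = cmath.rect(2, math.pi / 2)   # 2e^(i*pi/2), approximately 2i
b = cmath.rect(3, math.pi / 2)   # 3e^(i*pi/2), approximately 3i
r, theta = cmath.polar(a * b)    # modulus near 6, angle near pi in magnitude

def de_moivre_gap(theta, n):
    """|(cos t + i sin t)^n - (cos nt + i sin nt)|, expected to be near zero."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return abs(lhs - rhs)
```

Because of rounding, the angle of a number on the negative real axis may come back as slightly less than π, or as a value near −π; both represent the same direction.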

🌟 Euler's formula and the exponential form

🌟 Euler's formula

e^(iθ) = cos θ + i·sin θ.

  • This is the "key to all numbers on the unit circle."
  • Squaring both sides: (e^(iθ))² = e^(2iθ), which matches the angle-doubling rule.
  • Multiplying e^(iθ) by e^(iφ) gives e^(i(θ+φ)), which adds angles automatically.
  • Special case: e^(2πi) = cos(2π) + i·sin(2π) = 1 (a full circle returns to 1).
  • Derivation: Replace x with iθ in the series e^x = 1 + x + x²/2 + x³/6 + …
    • e^(iθ) = 1 + iθ − θ²/2 − iθ³/6 + …
    • Real terms (1 − θ²/2 + …) sum to cos θ; imaginary terms (θ − θ³/6 + …) sum to sin θ.
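
The series derivation can be checked numerically. This minimal Python sketch (the function name `exp_series` and the choice of 30 terms are illustrative) sums the exponential series at z = iθ and compares the result with cos θ + i·sin θ.

```python
import math

def exp_series(z, terms=30):
    """Partial sum 1 + z + z^2/2! + ... using the given number of terms."""
    total = 0j
    term = 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)   # turns z^n/n! into z^(n+1)/(n+1)!
    return total

theta = 1.2
approx = exp_series(1j * theta)
exact = complex(math.cos(theta), math.sin(theta))
```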

📐 Three equivalent forms

| Form | Notation | Best for |
|---|---|---|
| Rectangular | x + iy | Addition, subtraction |
| Polar (trig) | r·cos θ + i·r·sin θ | Visualizing angle and radius |
| Exponential | r·e^(iθ) | Multiplication, division, powers |

  • All three describe the same complex number; choose the form that simplifies the operation.

🔍 Examples of powers and roots

  • Example (powers of w = e^(iπ/4)):
    • w² = e^(iπ/2) = i.
    • w⁴ = e^(iπ) = −1.
    • w⁸ = e^(2πi) = 1 (full circle).
    • w²⁵ = w⁸·w⁸·w⁸·w = w (since w⁸ = 1).
    • The eight powers of w are the eighth roots of 1, evenly spaced around the unit circle.
  • Example (square roots of −4): −4 = 4·e^(iπ), so √(−4) = 2·e^(iπ/2) = 2i or 2·e^(i3π/2) = −2i.
  • Example (cube roots of 1): Solve r³·e^(3iθ) = 1.
    • r = 1; θ can be 2π/3, 4π/3, or 6π/3 = 2π (which is the same as 0).
    • The three cube roots are e^(2πi/3), e^(4πi/3), and e^(6πi/3) = 1, evenly spaced at 120° intervals.
    • General rule: The nth roots of 1 are e^(2πi/n), e^(4πi/n), …, e^(2πi) = 1 (n roots total).
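
The general rule translates directly into code. This is a minimal Python sketch (the function name is illustrative) that lists the nth roots of 1 in the same order as the rule above, ending with 1 itself.

```python
import cmath
import math

def roots_of_unity(n):
    """The n complex solutions of z^n = 1: e^(2*pi*i*k/n) for k = 1, ..., n."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(1, n + 1)]

cube_roots = roots_of_unity(3)
eighth_roots = roots_of_unity(8)
```

The first eighth root is w = e^(iπ/4), so `eighth_roots[0] ** 25` should agree with w up to rounding.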

🧮 Application: solving differential equations

🧮 The substitution method

  • Key idea: Substitute y = e^(ct) into a linear differential equation with constant coefficients.
  • Each derivative brings a factor of c:
    • y′ = c·e^(ct).
    • y″ = c²·e^(ct).
    • y‴ = c³·e^(ct), etc.
  • Cancel e^(ct) from both sides to get an algebraic equation for c.

🔁 Example: y″ = −4y

  • Substitute y = e^(ct): c²·e^(ct) = −4·e^(ct).
  • Cancel e^(ct): c² = −4.
  • Solutions for c: c = 2i or c = −2i (the two square roots of −4).
  • Pure exponential solutions: y = e^(2it) and y = e^(−2it).
  • General solution: y = A·e^(2it) + B·e^(−2it) (A and B chosen to match initial conditions).
  • Real solutions: Take real and imaginary parts using Euler's formula.
    • e^(2it) = cos(2t) + i·sin(2t).
    • Real part: y = cos(2t).
    • Imaginary part: y = sin(2t).
    • These are "pure oscillatory solutions" (no growth or decay, just oscillation).

🌀 Example: y‴ = y

  • Substitute y = e^(ct): c³·e^(ct) = e^(ct), so c³ = 1.
  • Solutions for c: the three cube roots of 1.
    • c = 1 gives y = e^t (real, growing exponentially).
    • c = e^(2πi/3) = −1/2 + i·√3/2 gives y = e^(ct) = e^(−t/2)·e^(i√3t/2).
    • c = e^(4πi/3) = −1/2 − i·√3/2 gives y = e^(−t/2)·e^(−i√3t/2) (complex conjugate of the second).
  • Interpretation: The absolute value |y| = e^(−t/2) decreases; the factor e^(±i√3t/2) rotates around the circle, so y spirals in to zero.
  • Real solutions:
    • y = e^t (exponential growth).
    • y = e^(−t/2)·cos(√3t/2) (real part of the spiral).
    • y = e^(−t/2)·sin(√3t/2) (imaginary part of the spiral).
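
As a rough numerical check, the real spiral solution can be substituted back into y‴ = y. This minimal Python sketch uses a central finite-difference formula for the third derivative, which is a swapped-in numerical technique and not the method of the text; all names and the step size h are illustrative.

```python
import math

def spiral_real(t):
    """Candidate solution y = e^(-t/2) cos(sqrt(3) t / 2)."""
    return math.exp(-t / 2) * math.cos(math.sqrt(3) * t / 2)

def third_derivative(f, t, h=1e-2):
    """Central finite-difference estimate of f'''(t), accurate to O(h^2)."""
    return (f(t + 2 * h) - 2 * f(t + h) + 2 * f(t - h) - f(t - 2 * h)) / (2 * h ** 3)

def residual(t):
    """y'''(t) - y(t); close to zero if y solves the equation."""
    return third_derivative(spiral_real, t) - spiral_real(t)

c = complex(-0.5, math.sqrt(3) / 2)   # the root e^(2*pi*i/3); c**3 should be near 1
```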

🔢 Example: y⁽⁴⁾ = y

  • Substitute y = e^(ct): c⁴ = 1.
  • Solutions for c: the four fourth roots of 1 are i, −1, −i, 1.
  • Exponential solutions: y = e^(it), e^(−t), e^(−it), e^t.
  • Real solutions: Combine e^(it) and e^(−it) into cos(t) and sin(t); also e^t and e^(−t).
  • All satisfy y⁽⁴⁾ = y (the fourth derivative equals the function). Only e^t and e^(−t) also satisfy y″ = y; cos(t) and sin(t) satisfy y″ = −y.

⚠️ Don't confuse

  • The differential equation may be real, but the exponential solutions e^(ct) are often complex.
  • To get real solutions, take the real part (cos) and imaginary part (sin) of the complex exponentials.
  • The angle θ in e^(iθ) corresponds to oscillation; the real exponent in e^(−t/2) corresponds to growth or decay.
61

The Geometric Series

10.1 The Geometric Series

🧭 Overview

🧠 One-sentence thesis

The geometric series 1 + x + x² + x³ + ⋯ converges to 1/(1 − x) when |x| < 1, and by applying calculus operations (differentiation, integration, substitution) to this series, we can discover series representations for logarithms, arctangent, and even compute π and ln 2.

📌 Key points (3–5)

  • Convergence condition: The geometric series converges only when x is between −1 and 1 (that is, |x| < 1); outside that range it diverges.
  • Two-way relationship: The function 1/(1 − x) produces the series through its derivatives at x = 0, and the series sums to the function when it converges.
  • Calculus operations unlock new series: Differentiating, integrating, multiplying, or substituting variables in the geometric series yields series for functions like 1/(1 − x)², ln(1 + x), and tan⁻¹ x.
  • Common confusion: Convergence vs divergence. Setting x = 1 in x + x²/2 + x³/3 + ⋯ gives the harmonic series 1 + 1/2 + 1/3 + ⋯, which diverges to infinity, but the alternating version 1 − 1/2 + 1/3 − ⋯ converges to ln 2.
  • Practical power: These series let us compute values like ln 2 ≈ 0.693 to any desired accuracy and give exact formulas like π/4 = 1 − 1/3 + 1/5 − 1/7 + ⋯, though convergence speed varies.

🔁 The core geometric series

🔁 What it is and how it sums

Geometric series: 1 + x + x² + x³ + ⋯ = 1/(1 − x)

  • The left side is an infinite sum of powers of x.
  • The right side is a simple rational function.
  • Why it works: Multiply (1 + x + x² + ⋯ + x^(n−1)) by (1 − x); everything cancels except 1 at the start and −x^n at the end, giving (1 − x^n)/(1 − x).
  • As n → ∞, if |x| < 1 then x^n → 0, so the sum approaches 1/(1 − x).
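
The cancellation argument can be checked with a few lines of Python. This is a minimal sketch (function names are illustrative) comparing the term-by-term sum with the closed form (1 − x^n)/(1 − x).

```python
def geometric_partial_sum(x, n):
    """1 + x + x^2 + ... + x^(n-1), added term by term."""
    total, power = 0.0, 1.0
    for _ in range(n):
        total += power
        power *= x
    return total

def closed_form(x, n):
    """(1 - x^n) / (1 - x), valid for x != 1."""
    return (1 - x ** n) / (1 - x)
```

With |x| < 1 and large n the partial sum settles near 1/(1 − x); with x = 2 it keeps growing.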

📏 Convergence range

  • Converges: when −1 < x < 1 (i.e., |x| < 1).
  • Diverges: when |x| ≥ 1.
  • Example: At x = 1/10, the series 1 + 0.1 + 0.01 + ⋯ = 1/(1 − 0.1) = 10/9 ≈ 1.111⋯
  • Don't confuse: The series is "safe" only inside this interval; outside it, the terms no longer shrink toward zero.

🔄 Matching derivatives

  • The function 1/(1 − x) has derivatives at x = 0: f(0) = 1, f′(0) = 1, f″(0) = 2, f‴(0) = 6, …, f⁽ⁿ⁾(0) = n!
  • Each power x^n in the series has its nth derivative equal to n! at x = 0, and all other derivatives zero.
  • Key idea: By adding all powers, the series matches every derivative of 1/(1 − x) at x = 0, so the series equals the function.

🔢 Applications to repeating decimals

🔢 Fractions as geometric series

  • Every repeating decimal is a geometric series, which sums to a fraction.
  • Example: 0.111⋯ = 1/10 + 1/100 + 1/1000 + ⋯ = (1/10)/(1 − 1/10) = 1/9.
  • Multiply by 10: 1.111⋯ = 10/9.
  • Example: 0.121212⋯ = 12/100 + 12/10000 + ⋯ = (12/100)/(1 − 1/100) = 12/99.
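
Exact rational arithmetic makes the same point. This minimal Python sketch uses the standard `fractions` module; the function name `repeating_decimal` is illustrative.

```python
from fractions import Fraction

def repeating_decimal(block, digits):
    """Fraction equal to 0.(block)(block)..., where block has the given digit count.

    Sums the geometric series with first term block/10^digits and ratio 1/10^digits.
    """
    ratio = Fraction(1, 10 ** digits)
    first = block * ratio
    return first / (1 - ratio)
```

For instance, `repeating_decimal(12, 2)` returns 12/99 in lowest terms, which is 4/33.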

🔢 Multiplying by a constant

  • Multiply the geometric series by a or ax:
    • a + ax + ax² + ⋯ = a/(1 − x)
    • ax + ax² + ax³ + ⋯ = ax/(1 − x)
  • Example: 3.333⋯ = 3 × (10/9) = 10/3.

🧮 Calculus operations on the series

📈 Differentiation

  • Differentiate 1 + x + x² + x³ + ⋯ term by term:
    • Result: 1 + 2x + 3x² + 4x³ + ⋯
    • This equals d/dx[1/(1 − x)] = 1/(1 − x)².
  • Example: At x = 1/10, the left side starts 1.23456789⋯, and the right side is 100/81.

📉 Subtracting series

  • Subtract the original series from its derivative:
    • (1 + 2x + 3x² + ⋯) − (1 + x + x² + ⋯) = x + 2x² + 3x³ + ⋯
    • This equals x/(1 − x)².
  • Application: Expected number of coin tosses until heads = 1·(1/2) + 2·(1/4) + 3·(1/8) + ⋯ = (1/2)/(1 − 1/2)² = 2.
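
A minimal Python sketch of the coin-toss sum (function names and the cutoff of 200 terms are illustrative): it adds x + 2x² + 3x³ + ⋯ directly and compares with x/(1 − x)².

```python
def weighted_sum(x, terms=200):
    """Partial sum of x + 2x^2 + 3x^3 + ... using the given number of terms."""
    return sum(n * x ** n for n in range(1, terms + 1))

def closed_form(x):
    """x / (1 - x)^2, the sum of the series for |x| < 1."""
    return x / (1 - x) ** 2

expected_tosses = weighted_sum(0.5)   # should be close to 2
```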

✖️ Multiplying series

  • Multiply (1 + x + x² + ⋯) by itself:
    • Result: 1 + 2x + 3x² + 4x³ + ⋯
    • This is the same as the derivative series, so it also equals 1/(1 − x)².
  • Observation: The geometric series satisfies dy/dx = y², and so does the function 1/(1 − x).

🔗 Integration

  • Integrate 1 + x + x² + x³ + ⋯ term by term:
    • Result: x + x²/2 + x³/3 + x⁴/4 + ⋯
    • This equals ∫[dx/(1 − x)] = −ln(1 − x).
  • Similarly, integrate 1 − x + x² − x³ + ⋯ (the series for 1/(1 + x)):
    • Result: x − x²/2 + x³/3 − x⁴/4 + ⋯
    • This equals ∫[dx/(1 + x)] = ln(1 + x).

📊 Computing logarithms and π

📊 Series for ln 2

  • Substitute x = 1 into x − x²/2 + x³/3 − ⋯:
    • Result: 1 − 1/2 + 1/3 − 1/4 + ⋯ = ln 2 ≈ 0.693.
  • Substitute x = 1/2 into x + x²/2 + x³/3 + ⋯:
    • Result: 1/2 + 1/8 + 1/24 + 1/64 + ⋯ = −ln(1/2) = ln 2.
  • Common confusion: The harmonic series 1 + 1/2 + 1/3 + 1/4 + ⋯ (all positive) diverges to infinity, but alternating signs makes it converge.

📊 Faster convergence

  • Add ln(1 + x) and −ln(1 − x):
    • Result: 2(x + x³/3 + x⁵/5 + ⋯) = ln[(1 + x)/(1 − x)].
  • At x = 1/3, the right side is ln(4/3) − ln(2/3) = ln 2.
  • Powers of 1/3 shrink much faster than powers of 1 or 1/2, so this series computes ln 2 more quickly.
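
The difference in speed shows up after only ten terms. This minimal Python sketch (function names are illustrative) evaluates the three series for ln 2 and records each error against `math.log(2)`.

```python
import math

def ln2_alternating(terms):
    """1 - 1/2 + 1/3 - ... (x = 1 in the series for ln(1 + x))."""
    return sum((-1) ** (n + 1) / n for n in range(1, terms + 1))

def ln2_half(terms):
    """x + x^2/2 + x^3/3 + ... at x = 1/2, which sums to -ln(1/2)."""
    return sum(0.5 ** n / n for n in range(1, terms + 1))

def ln2_third(terms):
    """2(x + x^3/3 + x^5/5 + ...) at x = 1/3."""
    return 2 * sum((1 / 3) ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

errors = [abs(f(10) - math.log(2)) for f in (ln2_alternating, ln2_half, ln2_third)]
```

The errors should decrease sharply from the first series to the third.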

🥧 Series for π

  • Change x to −x² in the geometric series:
    • Result: 1 − x² + x⁴ − x⁶ + ⋯ = 1/(1 + x²).
  • Integrate term by term:
    • Result: x − x³/3 + x⁵/5 − x⁷/7 + ⋯ = tan⁻¹ x.
  • Substitute x = 1 (since tan⁻¹ 1 = π/4):
    • Result: 1 − 1/3 + 1/5 − 1/7 + ⋯ = π/4.
  • Practical issue: This series converges very slowly; the 5000th term is still about 0.0001.
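
The slow convergence is easy to observe. This is a minimal Python sketch (the function name is illustrative) that sums the series for π/4 and multiplies by 4.

```python
import math

def leibniz_pi(terms):
    """4 * (1 - 1/3 + 1/5 - ...) using the given number of terms."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

error_5000 = abs(leibniz_pi(5000) - math.pi)
```

Even with 5000 terms, the estimate of π is still wrong in the fourth decimal place.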

🥧 Historical computation of π

  • Archimedes located π between 3.14 and 3 + 1/7 using polygons.
  • Halley used faster-converging arctangent series with smaller x values (e.g., x = 1/√3) to find 71 digits of π/6.
  • Machin's formula π/4 = 4 tan⁻¹(1/5) − tan⁻¹(1/239) converges much faster, because the arctangent series is evaluated at small x.
  • By 1973, computers reached 1 million digits; by 1989, over 1 billion digits were computed using the arithmetic-geometric mean iteration of Gauss.
  • Don't confuse: Hand calculations (Shanks: 527 correct out of 607 claimed) vs. computer methods (now over 1 billion digits).

🔧 Variable substitutions

🔧 Changing the variable

  • Replace x with x²:
    • 1 + x² + x⁴ + x⁶ + ⋯ = 1/(1 − x²) (even powers only).
  • Replace x with −x²:
    • 1 − x² + x⁴ − x⁶ + ⋯ = 1/(1 + x²).
  • Replace x with x/2:
    • 1 + x/2 + (x/2)² + ⋯ = 1/(1 − x/2) = 2/(2 − x).
    • Converges when |x| < 2.

🔧 Negative powers

  • Replace x with 1/x:
    • 1 + 1/x + 1/x² + 1/x³ + ⋯ = 1/(1 − 1/x) = x/(x − 1).
  • This is a series of negative powers x⁻ⁿ.
  • Convergence shift: This series converges when |x| > 1 (large x), not small x.

🧪 Solving differential equations with series

🧪 Term-by-term construction

  • Start with dy/dx = y² and initial value y(0) = 1.
  • At x = 0, y′ = 1² = 1.
  • Differentiate the equation: y″ = 2y·y′, so at x = 0, y″ = 2·1·1 = 2.
  • Continue: y‴ = 2y·y″ + 2(y′)² = 2·1·2 + 2·1² = 6 at x = 0.
  • All derivatives are factorials: 1, 1, 2, 6, 24, …
  • Result: The series 1 + x + x² + x³ + ⋯ matches these derivatives, so y = 1/(1 − x).

🧪 Different starting value

  • Start with y(0) = −1.
  • Then y′ = (−1)² = 1, y″ = 2·(−1)·1 = −2, y‴ = 6, etc.
  • Alternating signs: −1, 1, −2, 6, …
  • Result: The series −1 + x − x² + x³ − ⋯ = −1/(1 + x).
  • This is the geometric series with x replaced by −x, then multiplied by −1.
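
The coefficients can also be generated automatically. This minimal Python sketch uses a different but equivalent route from the repeated differentiation above: substituting y = a₀ + a₁x + a₂x² + ⋯ into dy/dx = y² and matching powers of x gives (n + 1)·a₍ₙ₊₁₎ = a₀aₙ + a₁a₍ₙ₋₁₎ + ⋯ + aₙa₀. The function name is illustrative.

```python
def series_coefficients(y0, count):
    """Taylor coefficients a_0 .. a_(count-1) of the solution of dy/dx = y^2, y(0) = y0.

    Uses the recurrence (n + 1) a_(n+1) = sum over k of a_k * a_(n-k).
    """
    a = [float(y0)]
    for n in range(count - 1):
        cauchy = sum(a[k] * a[n - k] for k in range(n + 1))
        a.append(cauchy / (n + 1))
    return a
```

Starting from y(0) = 1 every coefficient is 1 (the geometric series); starting from y(0) = −1 the signs alternate.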
62

Convergence Tests: Positive Series

10.2 Convergence Tests: Positive Series

🧭 Overview

🧠 One-sentence thesis

A series converges when its partial sums approach a finite limit, and several tests—comparison, integral, ratio, and root—allow us to determine convergence without computing the exact sum.

📌 Key points (3–5)

  • Partial sums decide convergence: A series converges when the sequence of partial sums s_n approaches a limit s, not when individual terms a_n approach zero.
  • Necessary but not sufficient: If a series converges, then a_n must approach zero, but a_n approaching zero does not guarantee convergence (harmonic series is the key counterexample).
  • Comparison is the main strategy: Most tests compare a new series with known convergent or divergent series (geometric series and p-series are the benchmarks).
  • Common confusion—terms vs. sums: Do not confuse a_n approaching 0 with s_n approaching s; the harmonic series has a_n = 1/n approaching 0 but s_n diverges to infinity.
  • Different tests for different series: Ratio and root tests work well for factorials and exponentials; integral and comparison tests work better for polynomial-like terms.

📐 Fundamental definitions

📐 Partial sums

Partial sum s_n: the sum of the first n terms, s_n = a_1 + a_2 + ... + a_n.

  • The series is really a sequence of these partial sums.
  • Convergence means the s_n approach a limit as n increases.
  • Example: For 1/2 + 1/4 + 1/8 + ..., the partial sums are s_1 = 1/2, s_2 = 3/4, s_3 = 7/8, s_n = 1 - 1/(2^n).

📐 Convergence vs. divergence

A series converges to s when its partial sums s_n approach the limit s.

  • If no limit exists, the series diverges.
  • Positive series can only diverge to infinity (they cannot oscillate because each term moves forward).
  • Example: The geometric series 1/10 + 1/100 + 1/1000 + ... converges to s = 1/9.
  • Example: The series 1 + 1 + 1 + ... diverges because s_n = 1, 2, 3, ... goes to infinity.

📐 Necessary condition (but not sufficient)

Theorem 10A: If a series converges (s_n approaches s), then its terms must approach zero (a_n approaches 0).

  • Proof: If s_n approaches s, then s_(n-1) also approaches s, so their difference a_n = s_n - s_(n-1) approaches zero.
  • Warning: The converse is false—a_n approaching 0 does not guarantee convergence.
  • The harmonic series 1 + 1/2 + 1/3 + ... has a_n = 1/n approaching 0, but s_n diverges to infinity.

🔍 Comparison tests

🔍 Direct comparison test

Theorem 10B (Comparison test):

  • If 0 ≤ a_n ≤ b_n and the sum of b_n converges, then the sum of a_n converges.
  • If a_n ≥ c_n and the sum of c_n diverges to infinity, then the sum of a_n diverges to infinity.

How it works:

  • Smaller terms add to a smaller sum; larger terms add to a larger sum.
  • You need a known series for comparison (usually geometric or p-series).

Example—harmonic series divergence:

  • Compare 1 + 1/2 + 1/3 + 1/4 + ... with 1 + 1/2 + 1/4 + 1/4 + 1/8 + 1/8 + 1/8 + 1/8 + ...
  • The comparison series groups as 1 + 1/2 + 1/2 + 1/2 + ... (since 2·(1/4) = 1/2, 4·(1/8) = 1/2).
  • The comparison series diverges, so the harmonic series (above it) also diverges.

🔍 Limit comparison test

Theorem 10F (Limit comparison test): If the ratio a_n / b_n approaches a positive limit L, then the sum of a_n and the sum of b_n either both diverge or both converge.

Why it works:

  • When n is large, a_n is approximately L·b_n, so the two series behave the same way.
  • Example: The sum of 1/(n² + 1) behaves like the sum of 1/n² because the ratio n²/(n² + 1) approaches 1.

🔍 Key benchmark series

| Series | Form | Converges when | Diverges when |
|---|---|---|---|
| Geometric | x + x² + x³ + ... | −1 < x < 1 | x ≤ −1 or x ≥ 1 |
| p-series | 1 + 1/(2^p) + 1/(3^p) + ... | p > 1 | p ≤ 1 |
| Harmonic | 1 + 1/2 + 1/3 + ... | Never | Always (p = 1 case) |

🧮 Integral test

🧮 How the integral test works

Theorem 10C (Integral test): If y(x) is decreasing and y(n) agrees with a_n, then the sum of a_n and the integral of y(x) from 1 to infinity both converge or both diverge.

Visual reasoning:

  • Each term a_n = 1/n is the area of a rectangle of width 1 and height 1/n.
  • Compare rectangular areas with the curved area under y = 1/x.
  • A rectangle of height 1/n drawn to the right of x = n (over n ≤ x ≤ n + 1) lies above the curve; the same rectangle drawn to the left (over n − 1 ≤ x ≤ n) lies below it.
  • If the integral (curved area) is finite, the sum (rectangular area) is finite.

🧮 Harmonic series via integral test

  • Compare 1 + 1/2 + 1/3 + ... with the integral of 1/x from 1 to n+1.
  • The integral equals ln(n+1), which goes to infinity.
  • Therefore the harmonic series diverges.
  • More precisely: ln(n+1) < s_n < 1 + ln(n), so s_n grows like ln(n).

Don't confuse: The series diverges very slowly—after a million years of adding a million terms per second, the sum is still less than 46.
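
The logarithmic bounds can be checked directly. This is a minimal Python sketch (function names are illustrative) that computes partial sums of the harmonic series and tests them against ln(n + 1) and 1 + ln(n).

```python
import math

def harmonic(n):
    """Partial sum s_n = 1 + 1/2 + ... + 1/n."""
    return sum(1 / k for k in range(1, n + 1))

def within_bounds(n):
    """Check ln(n + 1) < s_n < 1 + ln(n); the upper bound is strict for n >= 2."""
    s = harmonic(n)
    return math.log(n + 1) < s < 1 + math.log(n)
```

After 100000 terms the sum has only just passed 12, which illustrates how slowly the series diverges.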

🧮 The p-series

Key result: The p-series 1 + 1/(2^p) + 1/(3^p) + 1/(4^p) + ... converges if and only if p > 1.

  • Apply the integral test to y = 1/(x^p).
  • The integral from 1 to infinity of 1/(x^p) equals 1/(p-1) when p > 1 (finite area).
  • When p ≤ 1, the integral diverges.
  • Special cases: p = 2 gives the sum π²/6 (Euler); for p = 3 the series converges, but no closed form for its sum is known.
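
Partial sums illustrate the dividing line at p = 1. This is a minimal Python sketch (names are illustrative); the bound 1 + 1/(p − 1) used in the usage note follows from comparing the terms after the first with the integral of 1/(x^p) from 1 to infinity.

```python
import math

def p_series_partial(p, n):
    """1 + 1/2^p + ... + 1/n^p."""
    return sum(1 / k ** p for k in range(1, n + 1))

gap_p2 = math.pi ** 2 / 6 - p_series_partial(2, 10000)   # remaining tail for p = 2
```

For p = 2 the gap after 10000 terms is about 1/10000; for p = 3 the partial sums stay below 1 + 1/2; for p = 1/2 they grow without bound.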

⚡ Ratio and root tests

⚡ Ratio test

Theorem 10D (Ratio test): If a_(n+1) / a_n approaches a limit L < 1, the series converges. If L > 1, it diverges. If L = 1, no decision.

When to use it:

  • Best for series involving factorials or exponentials.
  • The test compares a_n with L^n (geometric series).

Example—exponential series:

  • For e^x = 1 + x + (x²/2!) + (x³/3!) + ..., the terms are a_n = x^n / n!.
  • The ratio a_(n+1) / a_n = x/(n+1) approaches L = 0 as n approaches infinity.
  • Since L = 0 < 1, the series converges for all x.
  • Factorials grow faster than any power x^n.

⚡ Root test

Theorem 10E (Root test): If the nth root (a_n)^(1/n) approaches L < 1, the series converges. If L > 1, it diverges. If L = 1, no decision.

Example: For the sum of 1/(n^n), the nth root is 1/n, which approaches L = 0, so the series converges.

⚡ When ratio and root tests fail

  • Both tests give L = 1 for the p-series 1/n^p, regardless of p.
  • They cannot distinguish between p = 2 (convergent) and p = 1 (divergent).
  • The integral test is sharper for polynomial-like terms.

Don't confuse: L = 1 means "no decision," not "divergence"—you need a different test.
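
The contrast between a decisive and an indecisive ratio test can be seen numerically. This minimal Python sketch (function and variable names are illustrative, and x = 4 is an arbitrary choice) computes successive ratios for the exponential series and for the p-series with p = 2.

```python
import math

def term_ratios(term, start, stop):
    """Ratios a_(n+1)/a_n for n = start, ..., stop - 1."""
    return [term(n + 1) / term(n) for n in range(start, stop)]

x = 4.0
exp_ratios = term_ratios(lambda n: x ** n / math.factorial(n), 1, 60)  # x/(n+1), heading to 0
p_ratios = term_ratios(lambda n: 1 / n ** 2, 1, 60)                    # (n/(n+1))^2, heading to 1
```

The first list falls toward 0, so the test decides convergence; the second creeps up toward 1, so the test says nothing even though the series converges.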

🎯 Strategy and examples

🎯 The convergence spectrum

From most divergent to most convergent:

  • 1 + 1 + ... (p < 1 series)
  • 1/n (harmonic, borderline)
  • 1/(n^p) for p > 1
  • n/(2^n)
  • 1/(2^n)
  • 4^n / n!
  • 1/n!
  • 1/(n^n)

Key insight: The crossover to convergence is after 1/n. Powers x^n beat any n^p; factorials n! beat any x^n.

🎯 Choosing the right test

| Series type | Best test | Why |
|---|---|---|
| Polynomial-like (1/n^p) | Integral or comparison | Ratio/root give L = 1 |
| Factorials (n!) | Ratio test | Ratio simplifies nicely |
| Exponentials (x^n) | Ratio or root | Both give L = x |
| Mixed (x^n / n^p) | Ratio test | Powers dominate |

🎯 Worked examples

Example: Does 1/(n² + 1) converge?

  • Compare with 1/n²: the ratio n²/(n² + 1) approaches 1.
  • By limit comparison test, both behave the same way.
  • Since 1/n² converges (p = 2 > 1), so does 1/(n² + 1).

Example: Does 1/(2n - 1) converge?

  • Compare with 1/(2n): the ratio 2n/(2n - 1) approaches 1.
  • Since 1/(2n) = (1/2)·(1/n) diverges, so does 1/(2n - 1).

Don't confuse: Adding or subtracting constants in the denominator doesn't change convergence behavior for large n.

63

Convergence Tests: All Series

10.3 Convergence Tests: All Series

🧭 Overview

🧠 One-sentence thesis

When series contain negative terms, absolute convergence (convergence of the series of absolute values) guarantees convergence of the original series, though a series may converge without being absolutely convergent.

📌 Key points (3–5)

  • Absolute convergence definition: a series converges absolutely if the series of absolute values of its terms converges.
  • Main guarantee: if the series of absolute values converges, the original series (with negative terms) must also converge.
  • Common confusion: convergence vs absolute convergence—a series can converge even when its absolute-value series diverges (the π series example).
  • Sign flexibility: changing signs in a convergent positive series preserves convergence; choosing signs carefully can make the sum reach any value in a range.

🔢 Absolute convergence concept

🔢 What absolute convergence means

Absolutely convergent: the series ∑ aₙ is absolutely convergent if ∑ |aₙ| is convergent.

  • Start with any series that may have negative terms.
  • Take the absolute value of every term: change aₙ to |aₙ|.
  • If this new all-positive series converges, the original series is called absolutely convergent.

🔍 Why absolute values matter

  • Changing a negative number to its absolute value increases the term (makes it larger or keeps it the same).
  • So ∑ |aₙ| is a "larger" series than ∑ aₙ.
  • The excerpt's main point: the smaller series ∑ aₙ is guaranteed to converge if the larger series ∑ |aₙ| converges.

🧷 The main convergence guarantee

🧷 Absolute convergence implies convergence

Rule 10G: If ∑ |aₙ| converges, then ∑ aₙ converges (absolutely).

  • This is a one-way implication: absolute convergence is a stronger condition.
  • It provides a test: to check if a series with negative terms converges, first check if the absolute-value series converges.
  • Example: the geometric series 1 − 1/2 + 1/4 − 1/8 + ... converges to 2/3 (first term 1, ratio −1/2, so the sum is 1/(1 + 1/2)); the absolute-value series 1 + 1/2 + 1/4 + 1/8 + ... converges to 2, so the original series is absolutely convergent.

⚠️ Convergence without absolute convergence

  • The converse is not true: ∑ aₙ might converge even if ∑ |aₙ| diverges.
  • The excerpt gives the π series as an example: π/4 = 1 − 1/3 + 1/5 − 1/7 + ... converges, but changing all signs to + makes it diverge to infinity.
  • Don't confuse: "converges" ≠ "converges absolutely."

| Type | ∑ aₙ | ∑ \|aₙ\| | Relationship |
|------|------|--------|--------------|
| Absolutely convergent | Converges | Converges | Absolute convergence guarantees convergence |
| Conditionally convergent | Converges | Diverges | Convergence without absolute convergence |
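
To make the conditional-convergence case concrete, here is a small Python sketch for the π series. The function name `leibniz_sums` is an assumption made for this illustration. It tracks the signed partial sum next to the partial sum of absolute values.

```python
import math

def leibniz_sums(n_terms):
    """Return (signed, absolute) partial sums of 1 - 1/3 + 1/5 - 1/7 + ..."""
    signed = 0.0
    absolute = 0.0
    for k in range(n_terms):
        term = 1.0 / (2 * k + 1)
        signed += term if k % 2 == 0 else -term
        absolute += term
    return signed, absolute

for n in (10, 1_000, 100_000):
    signed, absolute = leibniz_sums(n)
    print(n, signed, math.pi / 4, absolute)
```

The signed sums settle toward π/4, while the all-plus sums keep growing slowly (roughly like half the logarithm of the number of terms).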

🎯 Sign manipulation and flexibility

🎯 Changing signs in positive series

Example 1 from the excerpt:

  • Start with a positive convergent series: 1/2 + 1/4 + 1/8 + ...
  • Change any signs to minus (e.g., 1/2 − 1/4 + 1/8 − ...).
  • The new series still converges (absolutely), because the absolute-value series is the original positive series.

🎲 Controlling the sum by choosing signs

  • The excerpt states: "The right choice of signs will make it converge to any number between −1 and 1."
  • This refers to starting with 1/2 + 1/4 + 1/8 + ... (which sums to 1).
  • By choosing which terms to make negative, you can steer the sum anywhere in the interval [−1, 1].
  • Example: all positive gives +1; all negative gives −1; a mix gives values in between.
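
One way to see why every value in [−1, 1] is reachable is a greedy choice of signs: add the next term when the running sum is below the target, subtract it otherwise. This is a sketch of one possible strategy, not necessarily the argument used in the source; the name `choose_signs` and the 50-term cut-off are assumptions.

```python
def choose_signs(target, n_terms=50):
    """Greedily pick a sign for each term 1/2, 1/4, 1/8, ... to approach target."""
    if not -1.0 <= target <= 1.0:
        raise ValueError("target must lie in [-1, 1]")
    total = 0.0
    signs = []
    for k in range(1, n_terms + 1):
        term = 0.5 ** k
        sign = 1 if total < target else -1  # step toward the target
        total += sign * term
        signs.append(sign)
    return signs, total

signs, total = choose_signs(0.3)
print(signs[:8], total)
```

After k terms the remaining terms add up to 1/2^k, and the greedy rule keeps the distance to the target within that bound, so the signed series converges to the target.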

🔄 Alternating series example

Example 2 mentioned: the alternating series 1 − 1/2 + 1/4 − 1/8 + ...

  • This is a geometric series with ratio −1/2.
  • It converges to 1/(1 + 1/2) = 2/3, by the geometric series formula.
  • The absolute-value series 1 + 1/2 + 1/4 + 1/8 + ... converges to 2, so this is absolutely convergent.
64

The Product and Quotient and Power Rules

10.4 The Taylor Series for e^x, sin x, and cos x

🧭 Overview

🧠 One-sentence thesis

The power rule applies to all real exponents—negative, fractional, or any number—and together with the product, quotient, and reciprocal rules, these differentiation formulas achieve virtually all derivatives ever computed.

📌 Key points (3–5)

  • Power rule generalization: the derivative of x^n is n·x^(n−1) for any real number n, including negative integers and fractions.
  • Fractional exponents: derivatives of x^(p/q) follow the same power rule by treating u = x^(p/q) as u^q = x^p and differentiating both sides.
  • Complete toolkit: product rule, quotient rule, reciprocal rule, power rule, and linearity rule together handle nearly all derivatives.
  • Common confusion: infinite slope vs zero slope—when 0 < n < 1, the slope is infinite at x = 0; when n > 1, the slope is zero at x = 0.
  • Trigonometric derivatives: all six trig functions have established derivatives using these rules.

🔢 Power rule for all exponents

🔢 Negative and fractional powers

The power rule: the derivative of u^n is n·u^(n−1)·u' for any real number n.

  • The rule works for negative exponents: the derivative of x^(−1) is (−1)·x^(−2).
  • It also works for fractional exponents like x^(1/2) or x^(p/q).
  • The excerpt emphasizes that the power rule applies "when n is negative, or a fraction, or any real number."

🧮 Fractional exponents derivation

To find the derivative of x^(p/q):

  • Write u = x^(p/q) as u^q = x^p.
  • Take derivatives of both sides: q·u^(q−1)·(du/dx) = p·x^(p−1).
  • Solve for du/dx: (du/dx) = (p·x^(p−1))/(q·u^(q−1)).
  • Cancel x^p with u^q and replace p/q by n and u by x^n to get du/dx = n·x^(n−1).

Example: The slope of x^(1/3) is (1/3)·x^(−2/3). The slope is infinite at x = 0 and approaches zero as x → ∞, but the curve keeps climbing without an asymptote.
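
The power rule for non-integer exponents can be spot-checked numerically. The sketch below compares a central difference quotient with n·x^(n−1); the helper names, the step size h = 1e-6, and the sample point x = 2 are assumptions chosen for illustration.

```python
def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def power_rule(n, x):
    """Slope of x**n predicted by the power rule: n * x**(n-1)."""
    return n * x ** (n - 1)

x = 2.0
for n in (1 / 3, 4 / 3, -1, 2.5):
    numeric = central_difference(lambda t: t ** n, x)
    print(n, numeric, power_rule(n, x))
```

The two columns should agree to several decimal places for each exponent, which is evidence for the rule but not a proof of it.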

⚠️ Slope behavior at x = 0

| Exponent range | Slope at x = 0 | Slope as x → ∞ | Behavior |
|----------------|----------------|----------------|----------|
| 0 < n < 1 | Infinite | Zero | Climbs but flattens out |
| n > 1 | Zero | Infinite | Starts flat, then climbs faster |

  • Don't confuse: x^(1/3) has infinite slope at x = 0 (keeps climbing steeply near zero), while x^(4/3) has zero slope at x = 0 (starts flat).
  • Example: x^(4/3) has slope (4/3)·x^(1/3), which is zero at x = 0 and grows without bound as x → ∞; it climbs faster than a line but slower than a parabola.

📐 Complete differentiation toolkit

📐 Five core rules

The excerpt lists all rules "in one place for convenience":

| Rule | Formula | What it does |
|------|---------|--------------|
| Linearity | (a·u + b·v)' = a·u' + b·v' | Derivative of linear combinations |
| Product | (u·v)' = u·v' + v·u' | Derivative of products |
| Reciprocal | (1/v)' = −v'/v² | Derivative of reciprocals |
| Quotient | (u/v)' = (v·u' − u·v')/v² | Derivative of quotients |
| Power | (u^n)' = n·u^(n−1)·u' | Derivative of powers |

  • These rules "achieve virtually all the derivatives ever computed by mankind."
  • The excerpt notes that together with the chain rule (Chapter 4), these formulas form a complete toolkit.

🔺 Trigonometric derivatives

All six trigonometric functions now have established derivatives:

  • (sin x)' = cos x
  • (cos x)' = −sin x
  • (tan x)' = sec²x
  • (cot x)' = −csc²x
  • (sec x)' = sec x · tan x
  • (csc x)' = −csc x · cot x

These follow from the product, quotient, and reciprocal rules applied to sin x and cos x.

Example: The derivative of tan x = (sin x)/(cos x) uses the quotient rule: (cos x · cos x − sin x · (−sin x))/(cos²x) = (cos²x + sin²x)/(cos²x) = 1/(cos²x) = sec²x.
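
As a sanity check on the six formulas listed above, the following Python sketch compares each one with a central difference quotient at a single sample point. The helper names and the point x = 0.7 are assumptions for illustration only.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def sec(x):
    return 1 / math.cos(x)

def csc(x):
    return 1 / math.sin(x)

def cot(x):
    return math.cos(x) / math.sin(x)

# name: (function, derivative formula from the list above)
rules = {
    "sin": (math.sin, math.cos),
    "cos": (math.cos, lambda x: -math.sin(x)),
    "tan": (math.tan, lambda x: sec(x) ** 2),
    "cot": (cot, lambda x: -csc(x) ** 2),
    "sec": (sec, lambda x: sec(x) * math.tan(x)),
    "csc": (csc, lambda x: -csc(x) * cot(x)),
}

x = 0.7
errors = {name: abs(central_difference(f, x) - df(x))
          for name, (f, df) in rules.items()}
print(errors)
```

Each reported error should be tiny compared with the derivative values themselves.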

🧩 Applying the rules

🧩 Combining multiple rules

  • The derivative of tan³x uses the power rule: 3·tan²x · (tan x)' = 3·tan²x · sec²x.
  • The derivative of (cos x)^(−1) with n = −1 gives (−1)·(cos x)^(−2)·(−sin x) = sin x/(cos²x) = sec x · tan x, which agrees with the rule for sec x.
  • The linearity rule applies to a·u(x) + b·v(x): the derivative is a·u' + b·v'.

Example: The slope of 3·sin x + 4·cos x is 3·cos x + 4·(−sin x) = 3·cos x − 4·sin x.

🔍 Understanding vs memorizing

The excerpt contrasts human understanding with mechanical computation:

  • "A computer can memorize them all, but it doesn't know what they mean and you do."
  • The goal is not just to apply formulas but to understand why they work.

Don't confuse: knowing the rules mechanically vs understanding their meaning—the excerpt emphasizes that human learners should grasp the concepts behind the formulas, not just memorize them.

65

Power Series

10.5 Power Series

🧭 Overview

🧠 One-sentence thesis

Power series converge within a symmetric interval around their basepoint, determined by a radius of convergence that extends to the nearest point where the underlying function fails.

📌 Key points (3–5)

  • Convergence behavior: A power series either converges everywhere, only at the basepoint, or within a symmetric interval of radius r around the basepoint.
  • What determines the radius: The convergence stops at distance r from the basepoint, where the function (or its derivatives) first encounters a singularity—even if that singularity involves complex numbers.
  • Remainder and accuracy: The error after n terms is controlled by the (n+1)st derivative, allowing us to estimate how many terms are needed for a desired accuracy.
  • Common confusion: The series may converge in an interval but the function is fine beyond it—or the function may work everywhere but the series only converges locally (e.g., 1/(1−x) is defined for x>1 but its series diverges there).
  • Operations preserve radius: Differentiating or integrating a power series term-by-term does not change the convergence radius.

📏 Convergence radius and intervals

📏 Three convergence scenarios

Radius of convergence r: A power series Σaₙxⁿ either converges for all x, or only at x=0, or it has a radius r such that the series converges absolutely if |x| < r and diverges if |x| > r.

  • Converges everywhere (r = ∞): Example: Σxⁿ/n! = eˣ
  • Converges only at basepoint (r = 0): Example: Σn!xⁿ
  • Finite radius: Example: Σxⁿ has r = 1 (geometric series)

🎯 Symmetric intervals

  • When the basepoint is a (not zero), the interval shifts: convergence occurs for |x − a| < r
  • This means x lies between a − r and a + r, symmetric around the basepoint
  • Don't confuse: The endpoints a ± r require separate testing; the series may converge (absolutely or conditionally) or diverge there

🔍 Why convergence stops: the comparison test proof

  • If Σaₙ Xⁿ converges at some point X, then it converges for all x closer to the basepoint: |x| < |X|
  • Proof idea: Since Σaₙ Xⁿ converges, eventually |aₙ Xⁿ| ≤ 1, so |aₙ xⁿ| ≤ |x/X|ⁿ
  • The series Σaₙ xⁿ is dominated by the convergent geometric series for |x/X|

🧪 Finding the radius: ratio and root tests

🧪 Ratio test (best for power series)

  • Compute L = limit of |aₙ₊₁ xⁿ⁺¹ / aₙ xⁿ| as n → ∞
  • Convergence if L < 1; this determines the radius r

Example: For Σnxⁿ/4ⁿ, the ratio of consecutive terms approaches x/4, so convergence requires |x/4| < 1, giving r = 4.
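
The ratio in this example can be evaluated directly for a large index. In the sketch below, the function names and the index n = 500 are assumptions; the exact ratio is ((n+1)/n)·|x|/4, which tends to |x|/4.

```python
def term(n, x):
    """n-th term of the series: n * x**n / 4**n."""
    return n * (x / 4) ** n

def term_ratio(n, x):
    """Size of the ratio of term n+1 to term n."""
    return abs(term(n + 1, x) / term(n, x))

for x in (2.0, 3.9, 4.1):
    print(x, term_ratio(500, x), abs(x) / 4)
```

For x = 3.9 the ratio stays below 1 (inside the radius r = 4), and for x = 4.1 it is above 1 (outside).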

🌀 Root test

  • Compute L = limit of (|aₙ xⁿ|)^(1/n) as n → ∞
  • Same convergence criterion: L < 1

Example: For the sine series x − x³/3! + x⁵/5! − ..., the ratio of consecutive terms has size x²/((n+2)(n+1)), which approaches 0 for every x, so r = ∞ (converges everywhere). This example uses the ratio test.

📍 Shifted basepoint

Example: Σ(x−5)ⁿ/n² has basepoint a = 5 and radius r = 1 (ratios approach |x−5|), so convergence occurs for 4 ≤ x ≤ 6. The factor 1/n² ensures convergence even at the endpoints 4 and 6.

🎯 Remainder and convergence to the function

🎯 The remainder term Rₙ(x)

Remainder formula: Rₙ(x) = f(x) − sₙ(x) = f⁽ⁿ⁺¹⁾(c)(x−a)ⁿ⁺¹/(n+1)! for some unknown point c between a and x.

  • sₙ is the partial sum: a₀ + a₁(x−a) + ... + aₙ(x−a)ⁿ
  • The error is like the next term aₙ₊₁(x−a)ⁿ⁺¹, but with the derivative evaluated at c instead of a
  • Why it matters: To prove the series converges to f(x), we must show Rₙ → 0 as n → ∞

Example (eˣ): Rₙ = eˣ − (1 + x + ... + xⁿ/n!) = eᶜ xⁿ⁺¹/(n+1)! for some c between 0 and x. At x=1, n=2, the error is e − 2.5 ≈ 0.218, and the formula gives eᶜ/6 with c ≈ 0.27.
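
The numbers in this example can be reproduced in a few lines of Python. This is a sketch; the variable names are assumptions. It computes the error after the x² term at x = 1 and then solves the remainder formula for the intermediate point c.

```python
import math

x, n = 1.0, 2
partial = sum(x ** k / math.factorial(k) for k in range(n + 1))  # 1 + 1 + 1/2
error = math.exp(x) - partial

# Solve e^c * x^(n+1) / (n+1)! = error for the intermediate point c.
c = math.log(error * math.factorial(n + 1) / x ** (n + 1))
print(partial, error, c)
```

The computed c lies between 0 and x = 1, as the remainder formula requires.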

🔬 Practical use

  • In practice, we compute only a few terms and estimate the error
  • For higher accuracy, move the basepoint closer to the target x, or switch to another series

🌐 The circle of convergence

🌐 Why "radius" and complex singularities

  • The convergence region is actually a circle in the complex plane, not just an interval on the real line
  • The radius r extends from the basepoint to the nearest point where f(x) fails (a "singularity")
  • This singularity can be a complex number, invisible on the real axis

Example: f(x) = 1/(1+x²) has r = 1 around x = 0, even though the function never fails for real x. The reason: it blows up at the imaginary points x = i and x = −i, which lie at distance 1 from the origin.

📐 Distance to singularity

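| Function | Singularity | Basepoint a | Radius r |
|----------|-------------|-------------|----------|
| 1/(1−x) | x = 1 | a = 0 | r = 1 |
| 1/(1+x²) | x = ±i | a = 0 | r = 1 |
| 1/(1+x²) | x = ±i | a = 3 | r = √10 |
| ln(1+x) | x = −1 | a = 0 | r = 1 |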

Don't confuse: A function may look smooth on the real line but have complex singularities that limit convergence.
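
The radius entries in the table are just distances in the complex plane, which Python's built-in complex numbers can compute. The function name `radius_of_convergence` is an assumption, and the sketch takes the list of singularities as given rather than finding them.

```python
def radius_of_convergence(basepoint, singularities):
    """Distance from the basepoint to the nearest listed singularity."""
    return min(abs(complex(s) - complex(basepoint)) for s in singularities)

print(radius_of_convergence(0, [1]))        # 1/(1 - x) about a = 0
print(radius_of_convergence(0, [1j, -1j]))  # 1/(1 + x^2) about a = 0
print(radius_of_convergence(3, [1j, -1j]))  # 1/(1 + x^2) about a = 3
print(radius_of_convergence(0, [-1]))       # ln(1 + x) about a = 0
```

For the basepoint a = 3, the distance to ±i is √(3² + 1²) = √10, matching the table.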

🔧 Operations on power series

🔧 Derivative and integral preserve radius

If f(x) = Σaₙ xⁿ has radius r, then:

  • Derivative: df/dx = Σn aₙ xⁿ⁻¹ also has radius r
  • Integral: ∫f(x)dx = Σaₙ xⁿ⁺¹/(n+1) also has radius r

Example: The series for 1/(1−x), its derivative 1/(1−x)², and its integral −ln(1−x) all have r = 1 (all fail at x = 1).

🧮 Integration of previously impossible functions

Example: We can now integrate e^(−x²): ∫e^(−x²) dx = ∫(1 − x² + x⁴/2! − ...) dx = x − x³/3 + x⁵/(5·2!) − x⁷/(7·3!) + ...

This series always converges (r = ∞).
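
As a check on the integrated series, the sketch below sums it numerically and compares the result with the standard error function from Python's `math` module, using the known identity that the integral of e^(−t²) from 0 to x equals (√π/2)·erf(x). The function name and the 30-term cut-off are assumptions; the series is taken with constant of integration zero, so it represents the integral starting at 0.

```python
import math

def integral_series(x, n_terms=30):
    """Partial sum of x - x^3/3 + x^5/(5*2!) - x^7/(7*3!) + ..."""
    return sum(
        (-1) ** k * x ** (2 * k + 1) / ((2 * k + 1) * math.factorial(k))
        for k in range(n_terms)
    )

for x in (0.5, 1.0, 2.0):
    reference = math.sqrt(math.pi) / 2 * math.erf(x)
    print(x, integral_series(x), reference)
```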

🎲 The binomial series

🎲 Binomial coefficients for any power p

Binomial series: (1+x)ᵖ = 1 + px + p(p−1)x²/2! + ... = Σ[p(p−1)···(p−n+1)/n!]xⁿ

  • When p is a positive integer, the series stops after xᵖ (familiar from Chapter 2)
  • When p is fractional or negative, the series never stops and converges for |x| < 1

Example (p = 1/2): √(1+x) = 1 + (1/2)x − (1/8)x² + (1/16)x³ − ...

  • Coefficients: aₙ = [1/n!]·(1/2)·(−1/2)·(−3/2)···(1/2 − n + 1)
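
The coefficient formula can be evaluated exactly with rational arithmetic. In this sketch the function name `binomial_coefficients` is an assumption; it uses the recurrence aₙ = aₙ₋₁·(p − n + 1)/n, which follows directly from the product formula above.

```python
from fractions import Fraction

def binomial_coefficients(p, count):
    """First `count` coefficients p(p-1)...(p-n+1)/n! of the series for (1+x)^p."""
    coeffs = [Fraction(1)]
    for n in range(1, count):
        # a_n = a_(n-1) * (p - n + 1) / n
        coeffs.append(coeffs[-1] * (p - (n - 1)) / n)
    return coeffs

print(binomial_coefficients(Fraction(1, 2), 4))  # coefficients for sqrt(1 + x)
print(binomial_coefficients(Fraction(3), 6))     # positive integer p: series stops
```

For p = 1/2 this reproduces 1, 1/2, −1/8, 1/16; for p = 3 the coefficients after x³ are zero, which is the terminating case.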

🎯 Convergence radius for binomials

  • Positive integer p: r = ∞ (series terminates, no failure)
  • Other p: r = 1 (failure at x = −1)
    • If p < 0: (1+x)ᵖ blows up at x = −1
    • If p > 0 is not an integer (for example 0 < p < 1): derivatives of high enough order blow up at x = −1

Example: For 1/√(1+x) (p = −1/2), the function itself fails at x = −1.

66

Vectors and Dot Products

11.1 Vectors and Dot Products

🧭 Overview

🧠 One-sentence thesis

Vectors encode both magnitude and direction and can be combined through addition and the dot product, which reveals geometric relationships like perpendicularity and angles between vectors.

📌 Key points (3–5)

  • What a vector is: a directed quantity with magnitude and direction, represented by components (e.g., x and y) or as an arrow from one point to another.
  • Two ways to work with vectors: coordinate-free (geometry and physics, focusing on direction and magnitude without axes) vs. coordinate-based (using components for calculation).
  • The dot product has two equivalent forms: geometric (length times length times cosine of angle) and algebraic (sum of component-wise products).
  • Common confusion—perpendicular vs. parallel: perpendicular vectors have dot product zero (angle 90°); parallel vectors have the same or opposite direction (angle 0° or 180°).
  • Why it matters: vectors model physical quantities (velocity, force), medical data (heart vectors in ECG), and geometric relationships (midpoints, medians).

📐 What vectors are and how to represent them

📐 Definition and notation

A vector is a straight line with direction, starting at one point and ending at another; it has both magnitude (length) and direction.

  • In the plane, a vector v from the origin to (x, y) has two components: v = [x, y] (column form) or v = xi + yj (unit-vector form).
  • In three dimensions, V = [x, y, z] or V = xi + yj + zk.
  • Boldface (e.g., v) indicates a vector; lightface (e.g., x, y) indicates scalar components.
  • The zero vector is 0 = [0, 0] (or [0, 0, 0] in 3D).

📏 Length (magnitude)

  • The length of v = (x, y) is |v| = √(x² + y²), from the Pythagorean theorem.
  • In 3D, |V| = √(x² + y² + z²).
  • Example: v = (3, 1) has length √(9 + 1) = √10.

🧭 Direction and unit vectors

  • A unit vector has length 1.
  • Any nonzero vector v can be written as length times direction: v = |v| · u, where u = v / |v| is the unit vector in the same direction.
  • In 2D, a unit vector at angle θ from the x-axis is u = (cos θ, sin θ).
  • The standard unit vectors along the axes are i = (1, 0), j = (0, 1) in 2D; i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1) in 3D.

➕ Vector addition and scalar multiplication

➕ Adding vectors component-wise

  • v + w = (v₁ + w₁, v₂ + w₂).
  • Example: (3, 1) + (−1, 2) = (2, 3).
  • Geometrically: place the tail of w at the head of v (head-to-tail addition); the sum is the diagonal of the parallelogram formed by v and w.

✖️ Scalar multiplication

  • Multiplying v by a scalar c stretches (or shrinks) the vector: cv = (c·v₁, c·v₂).
  • Example: 2v doubles the length; −v reverses direction.
  • v − v = 0 (the zero vector).

🌐 Coordinate-free vs. coordinate-based

  • Coordinate-free: vectors are defined by direction and magnitude, independent of any origin or axes; useful in physics (velocity, force) and geometry.
  • Coordinate-based: vectors are given by components relative to axes; necessary for numerical computation.
  • A vector can be moved parallel to itself (same length and direction) without changing its identity in coordinate-free contexts.

🔢 The dot product: two definitions

🔢 Geometric definition

Dot product (Definition 1): v · w = |v| |w| cos θ, where θ is the angle between v and w.

  • Multiplies the lengths of the two vectors and the cosine of the angle between them.
  • Example: v = (3, 0), w = (2, 2); |v| = 3, |w| = √8, θ = 45°, so v · w = 3 · √8 · (1/√2) = 6.

🔢 Algebraic definition

Dot product (Definition 2): v · w = v₁w₁ + v₂w₂ (in 2D) or V₁W₁ + V₂W₂ + V₃W₃ (in 3D).

  • Multiply corresponding components and add.
  • Example: (3, 0) · (2, 2) = 3·2 + 0·2 = 6.
  • The two definitions are equivalent (proven via the law of cosines).

🔄 Why the two forms are equal

  • Compute |V − W|² in two ways:
    • Coordinates: (V₁ − W₁)² + (V₂ − W₂)² + (V₃ − W₃)² = |V|² − 2(V₁W₁ + V₂W₂ + V₃W₃) + |W|².
    • Law of cosines: |V − W|² = |V|² + |W|² − 2|V||W| cos θ.
  • Matching terms shows V₁W₁ + V₂W₂ + V₃W₃ = |V||W| cos θ.
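
The agreement of the two definitions can be checked on the example used earlier, v = (3, 0) and w = (2, 2) with θ = 45°. The helper names `dot` and `length` are assumptions introduced for this sketch.

```python
import math

def dot(v, w):
    """Algebraic dot product: sum of component-wise products."""
    return sum(a * b for a, b in zip(v, w))

def length(v):
    """Length of v, from v . v = |v|^2."""
    return math.sqrt(dot(v, v))

v, w = (3.0, 0.0), (2.0, 2.0)
theta = math.radians(45)

geometric = length(v) * length(w) * math.cos(theta)  # Definition 1
algebraic = dot(v, w)                                # Definition 2
print(geometric, algebraic)
```

Both values come out as 6, up to floating-point rounding in the geometric form.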

🔍 Geometric meaning of the dot product

🔍 Perpendicular vectors

  • If V · W = 0, then cos θ = 0, so θ = 90° (or −90°).
  • Perpendicular vectors have dot product zero.
  • Example: (2, 2, −1) · (−1, 2, 2) = −2 + 4 − 2 = 0, so they are perpendicular.
  • The unit vectors i, j, k are mutually perpendicular: i · j = 0, i · k = 0, j · k = 0.

🔍 Parallel vectors

  • If V and W point in the same direction, θ = 0° and cos θ = 1, so V · W = |V||W|.
  • If they point in opposite directions, θ = 180° and cos θ = −1, so V · W = −|V||W|.

🔍 Finding the angle

  • Rearrange the dot product formula: cos θ = (V · W) / (|V| |W|).
  • Example: i and i + j have i · (i + j) = 1, |i| = 1, |i + j| = √2, so cos θ = 1/√2 and θ = 45°.

🔍 Dot product with itself

  • V · V = |V|² (the length squared).
  • Example: (1, 2, −2) · (1, 2, −2) = 1 + 4 + 4 = 9 = |V|².

🧮 Properties and inequalities

🧮 Algebraic properties

The dot product satisfies:

  1. Commutative: V · W = W · V.
  2. Scalar multiplication: (cV) · W = c(V · W).
  3. Distributive: (U + V) · W = U · W + V · W.
  • These properties allow splitting and recombining vectors component-wise.
  • Example: V · W = (V₁i + V₂j + V₃k) · (W₁i + W₂j + W₃k) expands to nine terms; using i · i = 1, i · j = 0, etc., we recover V₁W₁ + V₂W₂ + V₃W₃.

🧮 Cauchy-Schwarz inequality

|V · W| ≤ |V| |W|.

  • Comes from |cos θ| ≤ 1.
  • Equality holds when |cos θ| = 1, i.e., when V and W are parallel (θ = 0° or 180°).
  • Example: For V = i + 2j + 2k and W = 2i + 2j + k, V · W = 2 + 4 + 2 = 8 and |V| |W| = 3 · 3 = 9, so |V · W| ≤ |V| |W| holds.

🧮 Triangle inequality

  • |V + W| ≤ |V| + |W|.
  • Says that the length of one side of a triangle is less than the sum of the other two sides.
  • Proof uses Cauchy-Schwarz: |V + W|² = V · V + 2V · W + W · W ≤ |V|² + 2|V||W| + |W|² = (|V| + |W|)².
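
Both inequalities can be verified numerically for the vectors V = i + 2j + 2k and W = 2i + 2j + k from the example. The helper names are assumptions made for this sketch.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def length(v):
    return math.sqrt(dot(v, v))

V = (1, 2, 2)  # i + 2j + 2k
W = (2, 2, 1)  # 2i + 2j + k
total = tuple(a + b for a, b in zip(V, W))

cauchy_schwarz_holds = abs(dot(V, W)) <= length(V) * length(W)
triangle_holds = length(total) <= length(V) + length(W)

print(dot(V, W), length(V) * length(W), cauchy_schwarz_holds)
print(length(total), length(V) + length(W), triangle_holds)
```

Neither inequality is an equality here because V and W are not parallel.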

🌍 Applications: geometry and medicine

🌍 Geometry: midpoints and medians

  • Example (midpoints of a quadrilateral): In any four-sided figure in space, connect the midpoints of the four sides; these four midpoints form a parallelogram.
    • Proof: Let the sides be A, B, C, D. The vector from one midpoint to the next is V = ½A + ½B; the opposite side is W = ½C + ½D. Head-to-tail addition shows A + B = C + D (both reach the same point R), so V = W.
  • Example (medians of a triangle): The three medians (lines from corners to midpoints of opposite sides) meet at a single point.
    • The three median vectors add to zero.

🌍 Medicine: the heart vector in ECG

  • An electrocardiogram (ECG) measures the sum of many small voltage vectors in the heart wall, producing a heart vector V.
  • The ECG projects V in twelve directions (leads on arms, leg, chest) to produce twelve graphs.
  • Case 1 (heart attack/infarction): Dead muscle cells contribute no voltage, so V turns away from the damaged region.
  • Case 2 (hypertrophy): Overworked muscle cells contribute more voltage, so V turns toward the thickened region.
  • The direction of V (found from projections) helps locate the problem.
  • Don't confuse: infarction (loss of signal, V turns away) vs. hypertrophy (excess signal, V turns toward).

🧪 Worked examples

🧪 Example: computing dot product and angle

  • V = (1, 2, −2), W = (2, −3, 7).
  • V · W = 1·2 + 2·(−3) + (−2)·7 = 2 − 6 − 14 = −18.
  • |V| = √(1 + 4 + 4) = 3, |W| = √(4 + 9 + 49) = √62.
  • cos θ = −18 / (3√62) = −6/√62, which is negative, so θ is between 90° and 180° (obtuse angle).

🧪 Example: perpendicular vectors

  • Find a vector perpendicular to v = (v₁, v₂).
  • The vector w = (−v₂, v₁), or any multiple of it, satisfies v · w = v₁(−v₂) + v₂·v₁ = 0.
  • In 3D, infinitely many vectors are perpendicular to a given vector (they form a plane).

🧪 Example: unit vector at an angle

  • A unit vector at angle θ from the x-axis is u = (cos θ, sin θ).
  • Check: |u|² = cos²θ + sin²θ = 1.

🧪 Example: length and direction

  • v = (3, 1) has length √10.
  • Unit vector in the same direction: u = (3/√10, 1/√10).
  • Then v = √10 · u (length times direction).
67

Planes and Projections

11.2 Planes and Projections

🧭 Overview

🧠 One-sentence thesis

Planes in three-dimensional space are defined by a point and a perpendicular normal vector, and projecting one vector onto another decomposes it into parallel and perpendicular components that are fundamental to understanding forces, velocities, and distances in space.

📌 Key points (3–5)

  • Plane equation: A plane is determined by a point P₀ and a normal vector N perpendicular to the plane; the equation is a(x - x₀) + b(y - y₀) + c(z - z₀) = 0.
  • Parallel planes: Planes with the same normal vector N but different constant d are parallel; d = 0 means the plane passes through the origin.
  • Vector projection: The projection of vector B onto vector A splits B into a component along A (length |B| cos θ) and a component perpendicular to A.
  • Common confusion: The normal vector N is perpendicular to the plane, not in the plane; many directions lie in the plane, but only one perpendicular direction defines it.
  • Distance to plane: The shortest distance from a point to a plane is always along the normal vector direction.

📐 Defining planes in space

📍 What determines a plane

A plane in space is determined by a point P₀ = (x₀, y₀, z₀) and a normal vector N perpendicular to the plane.

  • Unlike a line (which needs one point and a slope), a plane needs a point and a direction perpendicular to it.
  • The normal vector N = (a, b, c) = ai + bj + ck points "up" or "down" from the plane.
  • The length of N doesn't matter for defining the plane; doubling N gives the same plane.
  • Any vector lying in the plane must be perpendicular to N (dot product = 0).

📝 Plane equation forms

The point-normal form uses the dot product to enforce perpendicularity:

Form 1 (point-normal): (a, b, c) · (x - x₀, y - y₀, z - z₀) = 0

Form 2 (standard): ax + by + cz = d, where d = ax₀ + by₀ + cz₀

  • To verify a point P = (x, y, z) lies on the plane, substitute its coordinates into the equation.
  • The normal vector N = (a, b, c) can be read directly from the coefficients in ax + by + cz = d.
  • Example: The plane x + y + z = 6 has normal N = (1, 1, 1) and passes through (1, 2, 3) because 1 + 2 + 3 = 6.

🔄 Parallel and perpendicular planes

Parallel planes: Same normal vector N, different constant d.

  • Example: x + y + z = 6 and x + y + z = 7 are parallel (both have N = (1, 1, 1)).
  • The plane x + y + z = 0 passes through the origin; x + y + z = d is parallel but shifted.

Perpendicular planes: Their normal vectors are perpendicular (dot product = 0).

  • Example: x - y + 3z = 0 and 3y + z = 0 are perpendicular because (1, -1, 3) · (0, 3, 1) = 0.

Angle between planes: The angle between two planes equals the angle between their normal vectors.

  • cos θ = |N₁ · N₂| / (|N₁| |N₂|)

🎯 Projecting vectors

📏 Length of projection

When projecting vector B onto vector A, the component along A has length:

Length of P = |B| cos θ = (A · B) / |A|

  • This uses the dot product A · B = |A| |B| cos θ to avoid computing angles directly.
  • The length is positive when θ < 90°, zero when perpendicular, negative when θ > 90°.
  • Doubling B doubles the projection; doubling A does not change the projection length.

➡️ Projection as a vector

The projection P is not just a length but a vector along A:

P = (A · B / |A|²) A

  • This equals (length of P) × (unit vector in direction of A).
  • If A is already a unit vector (|A| = 1), then P = (A · B) A.
  • Example: Projecting B = i - j onto A = 3i + j gives P = (2/10) A = (6/10)i + (2/10)j.

⊥ Perpendicular component

Any vector B splits into two perpendicular parts:

  • Parallel component: P = (A · B / |A|²) A
  • Perpendicular component: B - P

These satisfy: P · (B - P) = 0 (they are perpendicular) and |P|² + |B - P|² = |B|² (Pythagorean theorem).

Don't confuse: The projection P is along A; the component B - P is perpendicular to A, not to B.
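
The decomposition can be carried out for the example B = i − j and A = 3i + j. The following Python sketch uses helper names (`dot`, `project`) that are assumptions for illustration.

```python
def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def project(b, a):
    """Projection of b onto a: (a . b / |a|^2) a."""
    scale = dot(a, b) / dot(a, a)
    return tuple(scale * x for x in a)

a = (3.0, 1.0)   # A = 3i + j
b = (1.0, -1.0)  # B = i - j

p = project(b, a)                               # component along A
perp = tuple(bi - pi for bi, pi in zip(b, p))   # component perpendicular to A

print(p, perp)
print(dot(p, perp), dot(p, p) + dot(perp, perp), dot(b, b))
```

The first dot product is zero up to rounding, and the squared lengths of the two parts add up to |B|² = 2.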

🌬️ Physical applications

Wind velocity example: A 100 mph east wind (V = (100, 0)) and northeast flight direction (A = (1, 1)).

  • Projection P = (100/2)(1, 1) = (50, 50) is the tailwind component.
  • The perpendicular component is the crosswind.

Force on incline: Gravity F = (0, 0, -mg) on a surface with normal N = (2, 2, 1).

  • Projection along N is the component perpendicular to the surface (does not move the ball).
  • The component F - P lies in the plane and makes the ball roll.

📏 Distance from point to plane

🎯 Distance from origin

The shortest distance from (0, 0, 0) to the plane ax + by + cz = d is along the normal vector N.

Distance = |d| / √(a² + b² + c²) = |d| / |N|

Nearest point = (da, db, dc) / (a² + b² + c²)

  • The nearest point P is a multiple tN of the normal vector.
  • To find t: the point tN = (ta, tb, tc) must satisfy the plane equation, giving t = d / (a² + b² + c²).
  • Example: For x + 2y + 2z = 5, the nearest point is (5/9, 10/9, 10/9) and distance is 5/3.

📍 Distance from arbitrary point

The distance from Q = (x₁, y₁, z₁) to the plane ax + by + cz = d is:

Distance = |d - ax₁ - by₁ - cz₁| / √(a² + b² + c²)

  • The vector from Q to the nearest point P is tN, where t = (d - Q · N) / |N|².
  • When Q is on the plane, ax₁ + by₁ + cz₁ = d, so the distance is zero.
  • Don't confuse: This distance is |tN|, not |P| (the distance from origin to P).
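
The distance and nearest-point formulas translate directly into code. In this sketch the function names are assumptions, and the query point defaults to the origin so that the earlier example x + 2y + 2z = 5 can be reproduced.

```python
import math

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def nearest_point_on_plane(normal, d, q=(0.0, 0.0, 0.0)):
    """Nearest point to q on the plane normal . (x, y, z) = d."""
    t = (d - dot(normal, q)) / dot(normal, normal)
    return tuple(qi + t * ni for qi, ni in zip(q, normal))

def distance_to_plane(normal, d, q=(0.0, 0.0, 0.0)):
    """Distance |d - N . Q| / |N| from q to the plane."""
    return abs(d - dot(normal, q)) / math.sqrt(dot(normal, normal))

print(nearest_point_on_plane((1, 2, 2), 5), distance_to_plane((1, 2, 2), 5))
print(distance_to_plane((1, 1, 1), 6, q=(1, 2, 3)))  # point already on the plane
```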

📐 Distance between parallel planes

For parallel planes ax + by + cz = d₁ and ax + by + cz = d₂:

  • They have the same normal vector N.
  • Pick any point Q on the first plane; find how far along N you must travel to reach the second plane.
  • The distance is |d₂ - d₁| / |N|.

🔬 Medical application: Electrocardiogram

❤️ Heart vector projections

The heart produces a net electrical vector V that changes over time during each heartbeat.

  • The heart vector is the sum of many small action potentials from cardiac cells.
  • An ECG measures voltage differences between electrodes, which are projections of V onto the lead directions.
  • The Einthoven triangle uses three leads (right arm, left arm, left leg) forming roughly 60° angles.

📊 Lead relationships

The three lead vectors form a triangle: L_I - L_II + L_III = 0

Therefore the voltage projections satisfy: V_I - V_II + V_III = 0

  • Only two of the three measurements are independent (the third can be calculated).
  • Additional "augmented" leads (aVR, aVL, aVF) are computed algebraically, not from physical wires.
  • Example: If V = 2i - j, then V · L_I = 4, V · L_II = 2 + √3, and V · L_III = -2 + √3.
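
The projections in this example can be reproduced with explicit lead vectors. The text does not list them here, so the vectors below are an assumption: L_I = 2i, L_II = i − √3 j, L_III = −i − √3 j, chosen because they satisfy L_I − L_II + L_III = 0 and give the three quoted values.

```python
import math

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

# Assumed lead vectors (not stated in the text), consistent with the example.
L_I = (2.0, 0.0)
L_II = (1.0, -math.sqrt(3))
L_III = (-1.0, -math.sqrt(3))

V = (2.0, -1.0)  # heart vector 2i - j
V_I, V_II, V_III = dot(V, L_I), dot(V, L_II), dot(V, L_III)
print(V_I, V_II, V_III, V_I - V_II + V_III)
```

The last printed value is zero up to rounding, which illustrates why only two of the three measurements are independent.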

📈 QRS complex

The large spike on an ECG trace represents ventricular depolarization:

  • The heart vector sweeps through a loop in space.
  • Each lead shows only the projection of this loop onto that lead direction.
  • The mean heart vector H (the "axis") is found by looking for a lead where the area under the QRS is zero—H is perpendicular to that lead.

Don't confuse: The ECG graphs show projections of the moving vector, not the vector itself; you need at least two leads to reconstruct the actual heart vector in the plane.

68

Cross Products and Determinants

11.3 Cross Products and Determinants

🧭 Overview

🧠 One-sentence thesis

The cross product A × B produces a vector perpendicular to both A and B whose length equals the area of the parallelogram they span, and its algebraic formula connects directly to determinants that compute volumes and solve geometric problems.

📌 Key points (3–5)

  • What the cross product is: A vector (not a number) with length |A||B|sin θ, perpendicular to both A and B.
  • How it differs from dot product: Dot product A · B = |A||B|cos θ is a number that rewards parallel vectors; cross product A × B is a vector that is largest when A ⊥ B.
  • Geometric meaning: |A × B| = area of parallelogram with sides A and B; triple product A · (B × C) = volume of box with edges A, B, C.
  • Common confusion: The cross product is anticommutative: B × A = −(A × B), unlike ordinary multiplication or dot products.
  • Determinant connection: Cross products and volumes are computed as determinants—compact formulas that organize the algebra.

🔄 The cross product structure

🔄 Definition and properties

Cross product A × B: A vector with length |A||B|sin θ and direction perpendicular to both A and B.

  • The cross product exists only in three dimensions.
  • A and B lie in a plane through the origin; A × B points along the normal vector N perpendicular to that plane.
  • Direction determined by the right-hand rule: curl fingers from A toward B (≤180°), thumb points along A × B.

↔️ Dot product vs cross product

| Feature | Dot product A · B | Cross product A × B |
|---------|-------------------|---------------------|
| Result type | Number (scalar) | Vector |
| Formula | \|A\|\|B\| cos θ | length \|A\|\|B\| sin θ |
| Maximum when | A parallel to B (θ = 0) | A perpendicular to B (θ = 90°) |
| Zero when | A ⊥ B | A parallel to B |
| Commutative? | Yes: A · B = B · A | No: B × A = −(A × B) |

  • The excerpt emphasizes: |A × B|² + |A · B|² = |A|²|B|² (equation 1).
  • This bridges geometry (angles) to algebra (components).

🔁 Anticommutativity and the right-hand rule

  • Key property: B × A = −(A × B).
  • This was an "intellectual revolution" in 19th-century algebra—ordinary multiplication always satisfies AB = BA, but cross products do not.
  • Example: i × j = k, but j × i = −k.
  • The right-hand rule determines sign: fingers curl from first vector toward second, thumb shows direction.
  • Cyclic order (ijk, jki, kij) gives plus signs; anticyclic order (ikj, jik, kji) gives minus signs.

🧩 Basic cross products of unit vectors

  • i × j = k; j × k = i; k × i = j (cyclic: plus).
  • j × i = −k; k × j = −i; i × k = −j (anticyclic: minus).
  • i × i = j × j = k × k = 0 (parallel vectors).
  • Example: A record turning counterclockwise (force at i in direction j) has angular velocity up (k); clockwise rotation (force at j toward i) points down (−k).

📐 Geometric applications

📐 Area of parallelogram and triangle

  • Parallelogram area = base × perpendicular height = |A||B|sin θ = |A × B|.
  • Triangle area = ½|A × B| (triangle is half the parallelogram).
  • Example: A = i + 2j, B = 4i + 5j → A × B = (1·5 − 2·4)k = −3k; parallelogram area = 3 (absolute value), triangle area = 3/2.
  • Don't confuse: The sign in A × B indicates orientation (right/left-handed), but area is always the absolute value.

🌀 Torque and moment

  • Torque vector T = R × F produces rotation.
  • R = position vector from origin to where force acts; F = force vector.
  • When F parallel to R: no turning (torque = 0).
  • When F ⊥ R: maximum rotation.
  • Magnitude: |R||F|sin θ = moment (turning force × distance).
  • Direction: perpendicular to both R and F, along axis of rotation (right-hand rule).

✈️ Finding plane equations

  • To find the plane through three points P, Q, R:
    1. Compute vectors A = Q − P and B = R − P in the plane.
    2. Normal vector N = A × B is perpendicular to the plane.
    3. Plane equation: N₁x + N₂y + N₃z = d (find d by substituting one point).
  • Example: P = (1,0,0), Q = (0,1,0), R = (0,0,1) → A = j − i, B = k − i → N = A × B = i + j + k → plane: x + y + z = 1.
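
The three steps for a plane through three points can be sketched in a few lines of Python. This is a minimal illustration, not part of the source text; the helper names `sub`, `cross`, `dot`, and `plane_through` are hypothetical.

```python
def sub(u, v):
    """Componentwise difference u - v of two 3-vectors."""
    return tuple(ui - vi for ui, vi in zip(u, v))

def cross(a, b):
    """Cross product a x b from the component formula."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def plane_through(p, q, r):
    """Return (N, d) so that N . (x, y, z) = d is the plane through p, q, r."""
    n = cross(sub(q, p), sub(r, p))   # normal vector N = A x B
    return n, dot(n, p)               # d comes from substituting the point p

normal, d = plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(normal, d)  # (1, 1, 1) 1
```

Any of the three points could be substituted to find d; using P keeps the sketch short.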

🧮 Algebraic formula for cross product

🧮 Two-dimensional case

  • Vectors A = a₁i + a₂j and B = b₁i + b₂j in xy-plane.
  • A × B points in z-direction: A × B = (a₁b₂ − a₂b₁)k.
  • This is the 2×2 determinant of the components.
  • Parallelogram area = |a₁b₂ − a₂b₁|.
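
As a small sketch of the two-dimensional case, the following Python computes the signed number a₁b₂ − a₂b₁ and the two areas for the example A = i + 2j, B = 4i + 5j used earlier. The function name `det2` is an illustrative choice.

```python
def det2(a, b):
    """2x2 determinant a1*b2 - a2*b1 for plane vectors a and b."""
    return a[0] * b[1] - a[1] * b[0]

a, b = (1, 2), (4, 5)              # A = i + 2j, B = 4i + 5j
signed = det2(a, b)                # k-component of A x B
parallelogram_area = abs(signed)   # area ignores orientation
triangle_area = parallelogram_area / 2
print(signed, parallelogram_area, triangle_area)  # -3 3 1.5
```

Swapping the arguments flips the sign, which is the anticommutativity B × A = −(A × B) in two dimensions.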

🧮 Three-dimensional formula

A × B = (a₂b₃ − a₃b₂)i + (a₃b₁ − a₁b₃)j + (a₁b₂ − a₂b₁)k (equation 6).

  • Derived by expanding (A = a₁i + a₂j + a₃k) × (B = b₁i + b₂j + b₃k) using i × i = 0, i × j = k, etc.
  • Nine terms reduce to six (three with i × i = 0 disappear).
  • Pattern: i component uses indices 2,3; j component uses 3,1; k component uses 1,2 (cyclic).
  • Example: (i + 2j + 3k) × (4i + 5j + 6k) = (2·6−3·5)i + (3·4−1·6)j + (1·5−2·4)k = −3i + 6j − 3k.

🔍 Verification of perpendicularity

  • A · (A × B) = a₁(a₂b₃ − a₃b₂) + a₂(a₃b₁ − a₁b₃) + a₃(a₁b₂ − a₂b₁) = 0 (equation 7).
  • Similarly B · (A × B) = 0.
  • This confirms A × B is perpendicular to both A and B.
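
The component formula and the perpendicularity check can be tried numerically. The sketch below is one possible Python rendering, with hypothetical helper names `cross` and `dot`; it reuses the example (i + 2j + 3k) × (4i + 5j + 6k).

```python
def cross(a, b):
    """Cross product: i uses indices 2,3; j uses 3,1; k uses 1,2."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

a, b = (1, 2, 3), (4, 5, 6)
c = cross(a, b)
print(c)                      # (-3, 6, -3)
print(cross(b, a))            # (3, -6, 3)
print(dot(a, c), dot(b, c))   # 0 0
```

The second line of output shows the sign reversal B × A = −(A × B); the third shows that A × B is perpendicular to both inputs.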

📦 Determinants and volume

📦 2×2 determinants

  • Matrix: [a₁ a₂; b₁ b₂] (array of four numbers in brackets).
  • Determinant: |a₁ a₂; b₁ b₂| = a₁b₂ − a₂b₁ (one number in vertical bars).
  • Equals area of parallelogram with sides (a₁, a₂) and (b₁, b₂).
  • Example: |2 1; 4 3| = 6 − 4 = 2.

📦 3×3 determinants and box volume

Triple scalar product A · (B × C): a number equal to the volume of the box with edges A, B, C.

  • Volume = base area |B × C| times perpendicular height |A|cos θ = A · (B × C).
  • The 3×3 determinant organizes nine components into this volume:

|a₁ a₂ a₃|
|b₁ b₂ b₃| = a₁b₂c₃ + a₂b₃c₁ + a₃b₁c₂ − a₁b₃c₂ − a₂b₁c₃ − a₃b₂c₁
|c₁ c₂ c₃|

  • Six terms: three with plus (indices 123, 231, 312 in cyclic order), three with minus (132, 213, 321 anticyclic).
  • Mnemonic: products "down to the right" are plus.
  • Example: |2 1 1; 1 2 1; 1 1 2| = 2·2·2 + 1·1·1 + 1·1·1 − 2·1·1 − 1·1·2 − 1·2·1 = 8+1+1−2−2−2 = 4.
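
The six-term formula translates directly into code. This is a minimal sketch with an assumed function name `det3`, taking the three rows as tuples.

```python
def det3(a, b, c):
    """Six-term determinant with rows a, b, c (three plus, three minus)."""
    return (a[0] * b[1] * c[2] + a[1] * b[2] * c[0] + a[2] * b[0] * c[1]
            - a[0] * b[2] * c[1] - a[1] * b[0] * c[2] - a[2] * b[1] * c[0])

print(det3((2, 1, 1), (1, 2, 1), (1, 1, 2)))   # 4
print(det3((1, 0, 0), (0, 1, 0), (0, 0, 1)))   # 1
print(det3((1, 2, 1), (2, 1, 1), (1, 1, 2)))   # -4
```

The third call exchanges the first two rows of the first matrix, so the sign flips while the volume (absolute value) stays 4.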

📦 Properties of triple products

  • A · (B × C) = (A × B) · C = C · (A × B) = B · (C × A) (same box, different base).
  • Cyclic permutations (ABC, BCA, CAB) give the same volume.
  • Anticyclic (A · (C × B)) has opposite sign.
  • Volume = 0 when A, B, C lie in the same plane (box is flattened).

🔢 Determinant as cross product

A × B = |i j k; a₁ a₂ a₃; b₁ b₂ b₃| (equation 14).

  • "Determinant" with vectors i, j, k in top row (not strictly legal, but works).
  • Expand on top row: i|a₂ a₃; b₂ b₃| − j|a₁ a₃; b₁ b₃| + k|a₁ a₂; b₁ b₂|.
  • This reproduces equation (6).
  • Example: (j − i) × (k − i) = |i j k; −1 1 0; −1 0 1| = i(1·1−0·0) − j(−1·1−0·(−1)) + k(−1·0−1·(−1)) = i + j + k.
  • Don't forget: minus sign in front of j component.

🔢 Plane equation from determinant

  • Point (x, y, z) lies on the plane through origin containing B and C when volume = 0:

|x y z|
|b₁ b₂ b₃| = 0
|c₁ c₂ c₃|

  • Expanding this determinant gives the plane equation.
  • Example: B = j − i, C = k − i → |x y z; −1 1 0; −1 0 1| = x(1·1−0·0) − y(−1·1−0·(−1)) + z(−1·0−1·(−1)) = x + y + z = 0.

🔑 Key computational facts

🔑 Splitting determinants

  • A 3×3 determinant splits into three 2×2 determinants multiplied by top-row entries:

|a₁ a₂ a₃|
|b₁ b₂ b₃| = a₁|b₂ b₃; c₂ c₃| − a₂|b₁ b₃; c₁ c₃| + a₃|b₁ b₂; c₁ c₂|
|c₁ c₂ c₃|

  • The minus sign on a₂ is easy to forget.
  • Can expand on any row (with appropriate sign pattern).

🔑 Zero determinant = coplanarity

  • Determinant = 0 means the three vectors lie in the same plane (no volume).
  • Example: rows (0,1,−1), (1,1,0), (1,0,1) → all six products cancel → determinant = 0.
  • Test for three points lying in a plane through the origin: their position vectors give a zero determinant.

🔑 Sign and handedness

  • Negative determinant means A, B, C form a "left-handed triple."
  • Volume is always the absolute value of the determinant.
  • Right-handed: xyz axes in standard orientation (determinant of identity matrix = 1).
  • Left-handed: mirror image (e.g., exchanging two vectors flips sign).
69

Matrices and Linear Equations

11.4 Matrices and Linear Equations

🧭 Overview

🧠 One-sentence thesis

Matrices provide a unified notation and computational framework for solving systems of linear equations, with the inverse matrix A⁻¹ enabling solutions when the determinant is nonzero.

📌 Key points (3–5)

  • Three equivalent formulations: Any system of linear equations can be written by rows (separate equations), by columns (vector combinations), or by matrices (Au = d).
  • When solutions exist: A unique solution exists when the determinant is nonzero (nonsingular case); parallel lines or dependent columns signal no solution or infinitely many solutions (singular case).
  • Solution methods: Cramer's Rule uses determinant ratios; the inverse matrix gives u = A⁻¹d; elimination is practical for larger systems.
  • Common confusion: The singular case (det A = 0) means either no solution or infinitely many. Don't assume a zero determinant always means "no solution"; if the right side lies on the same line/plane as the columns, infinitely many solutions exist.
  • Real application: Least squares fitting (projecting onto a plane) solves overdetermined systems by minimizing error, connecting linear algebra with calculus optimization.

🔤 Three ways to write linear systems

🔤 By rows: separate equations

Each equation represents one constraint:

  • For two unknowns x and y: a₁x + b₁y = d₁ and a₂x + b₂y = d₂
  • Row picture: Each equation is a line (or hyperplane in higher dimensions); the solution is where they intersect.
  • Example: x + y = 5000 and 0.05x + 0.10y = 400 (investment problem with principal and interest constraints).

🔤 By columns: vector combinations

Rewrite as x times one column vector plus y times another equals the right-side vector:

  • x[a₁, a₂] + y[b₁, b₂] = [d₁, d₂]
  • Column picture: Find the combination of column vectors a and b that produces d.
  • The coefficients x and y are the unknowns.

🔤 By matrices: Au = d

Compact notation combining all coefficients into matrix A and unknowns into vector u:

  • A is the coefficient matrix (rows = equations, columns = unknowns)
  • u is the unknown vector
  • d is the known right-side vector
  • Matrix-vector multiplication can be computed by rows (dot products) or by columns (linear combinations).

🚫 Singular vs nonsingular systems

🚫 When the system is singular (det A = 0)

A system is singular when the determinant of A equals zero.

Row picture interpretation:

  • Lines are parallel (no intersection) → no solution
  • Lines coincide (same line) → infinitely many solutions

Column picture interpretation:

  • Columns lie along the same line → one column is a multiple of the other
  • Only combinations along that line are reachable; d must lie on this line for any solution to exist.

Example: 2x + y = 0 and 2x + y = 1 are parallel (no solution); 2x + y = 0 and 4x + 2y = 0 are the same line (infinitely many solutions).

✅ When the system is nonsingular (det A ≠ 0)

  • Lines intersect at exactly one point.
  • Columns span the plane; any d can be reached by a unique combination.
  • A unique solution exists.

🔢 Solution by determinants: Cramer's Rule

🔢 The 2×2 determinant

The determinant of a 2×2 matrix [a₁ b₁; a₂ b₂] is the number a₁b₂ - a₂b₁.

This number appears when eliminating variables:

  • Multiply equations strategically to cancel x or y
  • The coefficient of the remaining variable is the determinant of A.

🔢 Cramer's Rule formula

Cramer's Rule: x = |d₁ b₁; d₂ b₂| / |a₁ b₁; a₂ b₂| and y = |a₁ d₁; a₂ d₂| / |a₁ b₁; a₂ b₂|

  • Numerator for x: replace the a-column with d
  • Numerator for y: replace the b-column with d
  • Denominator: always det A
  • Breaks down when det A = 0 (can't divide by zero in the singular case).

Example: For the investment problem with det A = 0.05, the solution is x = 100/0.05 = 2000 and y = 150/0.05 = 3000.
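
Cramer's Rule for two equations fits in a short function. The following is a sketch under the notation above (a₁x + b₁y = d₁, a₂x + b₂y = d₂); the name `cramer_2x2` and the argument order are assumptions made for illustration.

```python
def cramer_2x2(a1, b1, d1, a2, b2, d2):
    """Solve a1*x + b1*y = d1 and a2*x + b2*y = d2 by Cramer's Rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("singular system: det A = 0")
    x = (d1 * b2 - d2 * b1) / det   # a-column replaced by d
    y = (a1 * d2 - a2 * d1) / det   # b-column replaced by d
    return x, y

# Investment problem: x + y = 5000 and 0.05x + 0.10y = 400
x, y = cramer_2x2(1, 1, 5000, 0.05, 0.10, 400)
print(round(x, 6), round(y, 6))     # approximately 2000 and 3000
```

Floating-point arithmetic makes the results approximate, which is why the values are rounded before printing.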

🔄 The inverse matrix A⁻¹

🔄 What the inverse means

A⁻¹ is the matrix that "undoes" A: if Au = d, then u = A⁻¹d.

  • Think of A as transforming u into d; A⁻¹ transforms d back into u.
  • Analogous to inverse functions: g(x) = y and x = g⁻¹(y).
  • The inverse exists only when det A ≠ 0 (nonsingular case).

🔄 Formula for 2×2 inverse

For A = [a b; c d] with determinant D = ad - bc:

A⁻¹ = (1/D)[d -b; -c a]

  • Notice the pattern: a and d swap positions, while b and c change sign
  • Divide every entry by the determinant
  • Check: A⁻¹A = I and AA⁻¹ = I, where I is the identity matrix.
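
The 2×2 inverse formula and the check A⁻¹A = I can be sketched as follows. The function names and the test matrix [2 1; 4 3] (determinant 2) are illustrative choices, not taken from this section.

```python
def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] as (1/D) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul_2x2(m, n):
    """Entry (i, j) is (row i of m) dotted with (column j of n)."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 1], [4, 3]]
A_inv = inverse_2x2(A)
print(A_inv)                   # [[1.5, -0.5], [-2.0, 1.0]]
print(matmul_2x2(A_inv, A))    # [[1.0, 0.0], [0.0, 1.0]]
```

Multiplying in the other order, `matmul_2x2(A, A_inv)`, gives the identity as well.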

🔄 The identity matrix I

Identity matrix I: Has 1's on the diagonal and 0's elsewhere; acts like the number 1.

Properties:

  • Iu = u for every vector u
  • IA = AI = A for every matrix A
  • [1 0; 0 1] in the 2×2 case

✖️ Matrix multiplication

✖️ Matrix times vector (Mv)

Two equivalent interpretations:

By rows (dot products):

  • Each component of the result is a row of M dotted with v
  • [row₁; row₂][v] = [(row₁)·v; (row₂)·v]

By columns (linear combination):

  • The result is a combination of M's columns, with coefficients from v
  • If v = [x; y], then Mv = x(column₁) + y(column₂)
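
The two interpretations give the same vector, which a short Python sketch can confirm. The function names and the sample matrix are hypothetical.

```python
def matvec_by_rows(m, v):
    """Each output component is a row of m dotted with v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def matvec_by_columns(m, v):
    """Output is v[0]*(column 0) + v[1]*(column 1) + ..."""
    result = [0] * len(m)
    for j, vj in enumerate(v):
        for i in range(len(m)):
            result[i] += vj * m[i][j]   # add vj times column j
    return result

M = [[1, 1], [5, 10]]   # illustrative matrix
v = [2, 3]
print(matvec_by_rows(M, v))      # [5, 40]
print(matvec_by_columns(M, v))   # [5, 40]
```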

✖️ Matrix times matrix (MV)

Matrix-matrix multiplication: Treat each column of V as a separate vector; multiply M by each column.

MV = [row₁; row₂][v₁ v₂] = [(row₁)·v₁ (row₁)·v₂; (row₂)·v₁ (row₂)·v₂]

  • Entry in row i, column j = (row i of M) · (column j of V)
  • Example: The product A⁻¹A yields the identity matrix I.

Important property: (AB)⁻¹ = B⁻¹A⁻¹ (reverse order).

📐 Application: Projection and least squares

📐 The overdetermined problem

When there are more equations than unknowns (e.g., three equations, two unknowns):

  • No exact solution exists in general
  • Goal: find the "best" approximate solution
  • Geometric interpretation: project d onto the plane spanned by columns a and b.

📐 The projection principle

Key law: The error d - p is perpendicular to the plane.

This means:

  • a · (xa + yb - d) = 0
  • b · (xa + yb - d) = 0

Rewriting these perpendicularity conditions gives the normal equations:

  • (a·a)x + (a·b)y = a·d
  • (b·a)x + (b·b)y = b·d

Solve this 2×2 system for x and y to find the projection p = xa + yb.

📐 Least squares line fitting

Equivalent problem: Fit data points by a straight line that minimizes squared errors.

  • Data points: (1, d₁), (2, d₂), (3, d₃)
  • Desired line: f = x + yt
  • Overdetermined system: x + y(1) = d₁, x + y(2) = d₂, x + y(3) = d₃
  • The normal equations give the coefficients x and y that minimize the sum of squared vertical errors.

Calculus connection: Minimizing E(x,y) = sum of (error)² requires both partial derivatives to be zero, which produces the same normal equations.

Example: For points (1,0), (2,5), (3,4), the closest line is f = -1 + 2t with projection p = (1, 3, 5) and error (-1, 2, -1).

📐 General n-dimensional formula

For n data points with coordinates b₁, b₂, ..., bₙ and values d₁, d₂, ..., dₙ:

Normal equations:

  • (n)x + (Σbᵢ)y = Σdᵢ
  • (Σbᵢ)x + (Σbᵢ²)y = Σbᵢdᵢ

The solution gives the best-fit line in the least squares sense.
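
As a sketch of how the normal equations are used, the Python below builds the sums, solves the 2×2 system by Cramer's Rule, and reproduces the example with points (1,0), (2,5), (3,4). The name `fit_line` and the variable names `ts`, `ds` (for the coordinates bᵢ and values dᵢ) are assumptions for illustration.

```python
def fit_line(ts, ds):
    """Least squares line f = x + y*t from the normal equations."""
    n = len(ts)
    s_t = sum(ts)                              # sum of coordinates
    s_tt = sum(t * t for t in ts)              # sum of squared coordinates
    s_d = sum(ds)                              # sum of values
    s_td = sum(t * d for t, d in zip(ts, ds))  # sum of products
    det = n * s_tt - s_t * s_t                 # determinant of the 2x2 system
    x = (s_d * s_tt - s_t * s_td) / det        # Cramer's Rule for x
    y = (n * s_td - s_t * s_d) / det           # Cramer's Rule for y
    return x, y

ts, ds = [1, 2, 3], [0, 5, 4]
x, y = fit_line(ts, ds)
p = [x + y * t for t in ts]              # projection
e = [d - pi for d, pi in zip(ds, p)]     # error d - p
print(x, y, p, e)   # -1.0 2.0 [1.0, 3.0, 5.0] [-1.0, 2.0, -1.0]
```

The error vector is perpendicular to both columns (1, 1, 1) and (1, 2, 3): its components sum to zero, and so do the products with 1, 2, 3.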

70

Linear Algebra in Three Dimensions

11.5 Linear Algebra in Three Dimensions

🧭 Overview

🧠 One-sentence thesis

Three-dimensional linear systems can be solved either by formulas (determinants and Cramer's Rule) or by algorithms (Gaussian elimination), with elimination being faster and more practical for numerical computation.

📌 Key points (3–5)

  • Two geometric pictures: the row picture shows three planes intersecting at a solution point; the column picture shows three column vectors combining to produce the right-side vector.
  • Determinant decides solvability: when det A ≠ 0, the system has a unique solution and the matrix has an inverse; when det A = 0, the system is singular (no solution or infinitely many).
  • Formulas vs algorithms: determinants and A⁻¹ give explicit formulas (Cramer's Rule), but Gaussian elimination is faster and safer for actual computation.
  • Common confusion: the inverse formula A⁻¹ uses cross products of columns, but each row of A⁻¹ does not use the corresponding column of A (e.g., the first row of A⁻¹ uses only columns b and c, not a).
  • Singular breakdown: when D = 0, elimination reaches an impossible equation like 0 = –2, and geometrically the three planes form an "open tunnel" with no common intersection point.

🖼️ The two geometric pictures

🖼️ Row picture: three planes intersecting

Row picture: each equation represents a plane in three-dimensional space; the solution is the point where all three planes meet.

  • The equation x + y = 1 (with z absent) is a plane cutting vertically through the line x + y = 1 in the xy-plane.
  • The equation x + 2z = 0 (with y absent) is a plane containing the entire y-axis, because all points (0, y, 0) satisfy it.
  • Two planes intersect in a line: the first two equations together describe a line in 3D space (e.g., through P = (0, 1, 0) and Q = (–1, 2, ½)).
  • The third plane picks the solution: the intersection line of planes 1 and 2 crosses the third plane at the unique solution point.
  • Example: the solution x = –2, y = 3, z = 1 lies on all three planes and is the single intersection point.

🎯 Column picture: combining column vectors

Column picture: rewrite the system as a vector equation xa + yb + zc = d, where a, b, c are the columns of A.

  • Matrix-vector multiplication Au is a combination of the columns of A weighted by the components of u.
  • The goal is to find scalars x, y, z so that the linear combination of the three column vectors equals the right-side vector d.
  • Example: for the system in equation (1), the column equation is
    x(1, 1, 0) + y(1, 0, –2) + z(0, 2, 2) = (1, 0, –4).
  • The solution x = –2, y = 3, z = 1 is the same in both pictures.

Don't confuse: the row picture uses planes (one per equation); the column picture uses vectors (one per unknown). Both yield the same solution point.

🔢 Determinants and the inverse matrix

🔢 The determinant as volume

Determinant of a 3×3 matrix: det A = a · (b × c), the triple product, which equals the volume of the box with edges a, b, c.

  • The determinant can also be written as six terms:
    a₁(b₂c₃ – b₃c₂) + a₂(b₃c₁ – b₁c₃) + a₃(b₁c₂ – b₂c₁).
  • For the example matrix A with columns (1,1,0), (1,0,–2), (0,2,2), the determinant is 2.
  • Transposing does not change the determinant: swapping rows and columns gives the same six terms in a different order.
  • When D = det A ≠ 0, the columns form a genuine box (nonzero volume), so the system can be solved.

🔄 The inverse matrix formula

Inverse matrix: when D ≠ 0, A has an inverse A⁻¹ such that AA⁻¹ = A⁻¹A = I (the identity matrix).

  • The formula for a 3×3 inverse uses cross products of columns:
    A⁻¹ = (1/D) [b × c, c × a, a × b] (as rows).
  • Each entry of A⁻¹ is a 2×2 determinant (a "cofactor") divided by D.
  • Key point: to find row i of A⁻¹, ignore column i of A. For example, the first row of A⁻¹ uses only columns b and c, not a.
  • A "sign matrix" of alternating + and – determines whether to reverse the sign of each 2×2 determinant.
  • Example: for A with rows (1,1,0), (1,0,2), (0,–2,2), so that the columns are a = (1,1,0), b = (1,0,–2), c = (0,2,2), the inverse is
    A⁻¹ = (1/2) [(4,–2,2), (–2,2,–2), (–2,2,–1)], listed row by row.

Don't confuse: the inverse formula is elegant but error-prone; always check by multiplying A⁻¹A = I.
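
The cross-product formula for the inverse, together with the recommended check, can be sketched in Python. Exact fractions avoid roundoff; the function names are hypothetical, and the matrix is passed by its columns a, b, c.

```python
from fractions import Fraction

def cross(a, b):
    """Cross product a x b from the component formula."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def inverse_3x3(a, b, c):
    """Rows of the inverse of the matrix whose COLUMNS are a, b, c."""
    det = dot(a, cross(b, c))              # D = a . (b x c)
    if det == 0:
        raise ValueError("singular matrix: det A = 0")
    rows = (cross(b, c), cross(c, a), cross(a, b))
    return [[Fraction(entry, det) for entry in row] for row in rows]

a, b, c = (1, 1, 0), (1, 0, -2), (0, 2, 2)
A_inv = inverse_3x3(a, b, c)
for row in A_inv:
    print([str(entry) for entry in row])

# Check A^-1 A = I: (row i of A^-1) . (column j of A) is 1 if i == j, else 0.
identity_ok = all(dot(row, col) == (1 if i == j else 0)
                  for i, row in enumerate(A_inv)
                  for j, col in enumerate((a, b, c)))
print(identity_ok)   # True
```

Row i of the result never touches column i of A, which matches the key point above.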

🎲 Cramer's Rule

Cramer's Rule: the solution components are ratios of determinants: x = |d b c| / |a b c|, y = |a d c| / |a b c|, z = |a b d| / |a b c|.

  • The numerator for each unknown replaces the corresponding column of A with the right-side vector d.
  • All denominators are D = det A.
  • This is the same as u = A⁻¹d written out component by component.
  • Example: for d = (1, 0, –4), the numerators are the triple products d · (b × c), d · (c × a), d · (a × b), each divided by D = 2.

Why it matters: Cramer's Rule is a closed-form solution, useful for symbolic algebra, but impractical for large numerical systems.
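
For a 3×3 system the three ratios can be computed with triple products. This is a small sketch with hypothetical names, applied to the columns and right side of the running example.

```python
def cross(a, b):
    """Cross product a x b from the component formula."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def cramer_3x3(a, b, c, d):
    """Solve x*a + y*b + z*c = d for columns a, b, c by Cramer's Rule."""
    det = dot(a, cross(b, c))
    if det == 0:
        raise ValueError("singular system: det A = 0")
    x = dot(d, cross(b, c)) / det   # |d b c| / |a b c|
    y = dot(d, cross(c, a)) / det   # |a d c| / |a b c|
    z = dot(d, cross(a, b)) / det   # |a b d| / |a b c|
    return x, y, z

solution = cramer_3x3((1, 1, 0), (1, 0, -2), (0, 2, 2), (1, 0, -4))
print(solution)   # (-2.0, 3.0, 1.0)
```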

⚠️ The singular case (D = 0)

⚠️ When the determinant is zero

Singular matrix: when det A = 0, the matrix has no inverse, and the system Au = d has either no solution or infinitely many solutions.

  • Geometrically, the box formed by the columns is flattened (zero volume).
  • The three column vectors lie in the same plane.
  • Example: changing the lower-right entry of A from 2 to 4 makes D = 0.

🚧 Row picture breakdown

  • The three planes do not meet at a single point.
  • Typically, the intersection line of the first two planes is parallel to the third plane, forming an "open tunnel."
  • Example: planes 1 and 2 intersect in a line that stays above plane 3 (Figure 11.22).
  • In the extreme case, two or all three planes may be parallel (e.g., if one row is a multiple of another).

🚧 Column picture breakdown

  • The three columns lie in the same plane.
  • The vector d can be produced by a combination of a, b, c only if d also lies in that plane.
  • Most vectors d will be outside the plane, so most singular systems have no solution.
  • If d happens to lie in the plane, there are infinitely many solutions.

Don't confuse: singular does not mean "no solution" automatically; it means "no unique solution" (either zero or infinitely many).

🔧 Gaussian elimination (the algorithm)

🔧 How elimination works

Gaussian elimination: systematically subtract multiples of equations to eliminate unknowns, producing a triangular system that is easy to solve by back substitution.

  • Step 1: Use the first equation to eliminate x from all equations below it.
    • Subtract (coefficient ratio) × equation 1 from each lower equation.
    • The coefficient of x in the first equation is the first pivot.
  • Step 2: Use the new second equation to eliminate y from all equations below it.
    • The coefficient of y in the second equation (after step 1) is the second pivot.
  • Step 3: Continue until the system is triangular (all entries below the diagonal are zero).
  • Back substitution: solve the last equation for z, then substitute into the second-to-last for y, then into the first for x.

Example: for the system
x + y + z = 1
2x + 5y + 3z = 7
4x + 7y + 6z = 11,
elimination produces
x + y + z = 1
3y + z = 5
z = 2.
Then z = 2, y = 1, x = –2.
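
The elimination steps and back substitution can be sketched as a short routine. This is an illustrative implementation in exact fractions with simple row exchanges, not the refined method used in professional codes; the name `solve_by_elimination` is hypothetical.

```python
from fractions import Fraction

def solve_by_elimination(A, d):
    """Gaussian elimination with row exchanges, then back substitution."""
    n = len(A)
    # Augmented matrix [A | d] in exact rational arithmetic.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(A, d)]
    pivots = []
    for k in range(n):
        # Find a row at or below k with a nonzero entry in column k.
        swap = next((r for r in range(k, n) if M[r][k] != 0), None)
        if swap is None:
            raise ValueError("singular matrix: no pivot in column %d" % k)
        M[k], M[swap] = M[swap], M[k]
        pivots.append(M[k][k])
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            M[r] = [mr - factor * mk for mr, mk in zip(M[r], M[k])]
    # Back substitution on the triangular system, last unknown first.
    u = [Fraction(0)] * n
    for k in reversed(range(n)):
        known = sum(M[k][j] * u[j] for j in range(k + 1, n))
        u[k] = (M[k][n] - known) / M[k][k]
    return u, pivots

A = [[1, 1, 1], [2, 5, 3], [4, 7, 6]]
d = [1, 7, 11]
u, pivots = solve_by_elimination(A, d)
print([int(v) for v in u], [int(p) for p in pivots])  # [-2, 1, 2] [1, 3, 1]
```

The pivots 1, 3, 1 multiply to 3, which is the determinant of this matrix. For a singular matrix the routine raises an error when no pivot can be found.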

🔧 Why elimination is better

  • Faster: elimination avoids computing determinants and the inverse.
  • Safer: easier to check by substituting the solution back into the original equations.
  • What computers use: professional codes solve linear systems by elimination, not by formulas.
  • The product of the pivots equals the determinant (a byproduct of elimination).

🔧 Elimination detects singularity

  • If a pivot position becomes zero and cannot be fixed by row exchange, the matrix is singular.
  • Elimination then ends in a row with no pivot: either an impossible equation like 0 = –2 (no solution) or 0 = 0 (infinitely many solutions).
  • Example: for the singular matrix S (with lower-right entry 4 instead of 2), elimination produces
    x + y = 1
    –y + 2z = –1
    0 = –2 (impossible).

Don't confuse: a zero in the pivot position is not always fatal; if you can exchange rows to bring a nonzero entry into that position, elimination continues. Failure occurs only when no such exchange is possible.

📊 Summary table: formulas vs algorithms

Aspect | Formulas (determinants, A⁻¹) | Algorithms (elimination)
Best for | Symbolic algebra, small systems | Numerical computation, large systems
Speed | Slow (many multiplications) | Fast (systematic subtraction)
Error checking | Multiply A⁻¹A to verify | Substitute solution into original equations
Singularity detection | Compute det A; if 0, singular | Reach impossible equation (e.g., 0 = –2)
What computers use | Rarely (except for theory) | Always (with refinements for roundoff error)

Key insight: this section is at the "crossover point" between formulas (like the quadratic formula for 2×2 systems) and algorithms (like Newton's method for higher-degree equations). For 3×3 systems, both approaches work; for larger systems, algorithms dominate.

71

The Position Vector

12.1 The Position Vector

🧭 Overview

🧠 One-sentence thesis

The position vector R(t) describes motion along a curve as a function of time, and its derivatives yield the velocity and acceleration vectors that characterize how the object moves.

📌 Key points (3–5)

  • Position vector R(t): locates a moving body at each time t by giving coordinates x(t)i + y(t)j + z(t)k.
  • Velocity v = dR/dt: the derivative of position; it is tangent to the curve and encodes both speed and direction.
  • Speed vs. velocity: speed |v| = ds/dt is a scalar (magnitude), while velocity v is a vector; acceleration a = dv/dt measures how velocity changes (not just speed changes).
  • Common confusion: changing speed (gas/brake) vs. changing direction (steering)—both produce acceleration; uniform circular motion has constant speed but nonzero acceleration because direction changes.
  • Lines vs. curves: straight-line motion has constant velocity v and zero acceleration; curved motion (e.g., circles, helices) has changing velocity and nonzero acceleration.

📐 Position, velocity, and the geometry of motion

📍 What the position vector is

Position vector: R(t) = x(t)i + y(t)j + z(t)k, which locates the moving body at time t.

  • As t varies, the tip of R(t) traces out a curve in space.
  • The parameter t tells when the body passes each point, not just where the curve is.
  • Example: R(t) = t i + t² j + t³ k swings upward as t increases; at t = 0 the body is at the origin, at t = 2 it is at (2, 4, 8).

🎯 Velocity as the derivative

Velocity: v(t) = dR/dt = (dx/dt)i + (dy/dt)j + (dz/dt)k.

  • Geometric meaning: the velocity vector v is tangent to the curve.
  • Why: the change ΔR goes from one point on the curve to a nearby point; dividing by Δt changes length but not direction, and as Δt → 0 the direction lines up with the tangent.
  • Example: for R = t i + t² j + t³ k, the velocity is v = i + 2t j + 3t² k; at t = 0, v = i (tangent along the x axis).
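
The limit definition can be checked numerically against the exact derivative. The sketch below assumes the example curve R = t i + t² j + t³ k; the function names and the step size `h` are illustrative choices.

```python
def position(t):
    """R(t) = t i + t^2 j + t^3 k as a tuple (x, y, z)."""
    return (t, t**2, t**3)

def velocity(t):
    """Exact derivative v(t) = i + 2t j + 3t^2 k."""
    return (1.0, 2 * t, 3 * t**2)

def velocity_numeric(t, h=1e-5):
    """Central difference (R(t+h) - R(t-h)) / (2h), approximating dR/dt."""
    ahead, behind = position(t + h), position(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(ahead, behind))

print(velocity(0))           # (1.0, 0, 0)
print(velocity(2))           # (1.0, 4, 12)
print(velocity_numeric(2))   # close to (1, 4, 12)
```

The difference quotient ΔR/Δt points along a chord of the curve; shrinking `h` lines it up with the tangent.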

📏 Speed, distance, and the unit tangent

  • Speed: |v| = √[(dx/dt)² + (dy/dt)² + (dz/dt)²], a scalar measuring how fast the body moves.
  • Distance (arc length): s = ∫ |v(t)| dt; speed is ds/dt.
  • Unit tangent vector: T = v/|v| = (dR/dt)/(ds/dt) = dR/ds, a vector of length 1 pointing along the curve.
  • Don't confuse: |v| (speed, scalar) vs. v (velocity, vector); the chord length |ΔR| vs. the arc length Δs (as Δt → 0, |ΔR/Δs| → |T| = 1).

🚗 Straight-line motion: constant velocity

➡️ Uniform motion along a line

Line equation (vector form): R(t) = R₀ + t v, where R₀ is the starting point and v is the constant velocity.

  • Why it's a line: dR/dt = v (constant), so there is no acceleration (a = 0).
  • Component form: x = x₀ + tv₁, y = y₀ + tv₂, z = z₀ + tv₃.
  • Speed: |v| = √(v₁² + v₂² + v₃²); direction: v/|v|.
  • Example: R = (2j + k) + t(i + j + 2k) starts at (0, 2, 1) and moves with velocity (1, 1, 2).

🔀 Parametric vs. parameter-free forms

  • With parameter t: x = x₀ + tv₁, y = y₀ + tv₂, z = z₀ + tv₃ (tells position at each time).
  • Without parameter: (x − x₀)/v₁ = (y − y₀)/v₂ = (z − z₀)/v₃ (two equations; eliminates t).
  • Trade-off: parametric form gives velocity and timing; parameter-free form gives only the geometric path.
  • Example: x/1 = (y − 2)/1 = (z − 1)/2 is the same line as x = t, y = 2 + t, z = 1 + 2t, but we cannot tell speed or direction from the first form.

🧮 Practical questions about lines

  • Line through two points P and Q: choose R₀ = P (or Q) and v = Q − P (or any multiple).
  • Segment from P to Q: let t go from 0 to 1 in R = P + t(Q − P).
  • Closest point to origin: minimize x² + y² + z²; set derivative to zero.
  • Meeting a plane: substitute x(t), y(t), z(t) into the plane equation and solve for t.
  • Perpendicular to a plane: use the plane's normal vector N = (a, b, c) as the velocity v.
  • Don't confuse: a line has one parameter (or two equations); a plane has one equation (or two parameters).
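
The "meeting a plane" recipe reduces to one linear equation in t. The following is a minimal sketch; the function name is hypothetical, and pairing the line R = (2j + k) + t(i + j + 2k) with the plane x + y + z = 1 is an assumption made only for illustration.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def line_meets_plane(r0, v, n, d):
    """Return (t, point) where R = r0 + t*v meets the plane n . R = d."""
    denom = dot(n, v)
    if denom == 0:
        raise ValueError("line is parallel to the plane (v . N = 0)")
    t = (d - dot(n, r0)) / denom       # from n . (r0 + t*v) = d
    point = tuple(r + t * vi for r, vi in zip(r0, v))
    return t, point

t, point = line_meets_plane((0, 2, 1), (1, 1, 2), (1, 1, 1), 1)
print(t, point)   # -0.5 (-0.5, 1.5, 0.0)
```

A negative t means the crossing happened before the starting time t = 0.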

🔄 Circular and helical motion: changing direction

⭕ Steady motion around a circle

  • Position: x = r cos ωt, y = r sin ωt, z = 0 (radius r, angular velocity ω).
  • Velocity: v = −ωr sin ωt i + ωr cos ωt j (tangent to the circle).
  • Speed: |v| = ωr (constant).
  • Unit tangent: T = −sin ωt i + cos ωt j (length 1, direction changes).
  • Distance: s = ωrt; at t = 2π/ω, one full circle is s = 2πr.

🎢 Acceleration in circular motion

  • Acceleration: a = −ω²r cos ωt i − ω²r sin ωt j = −ω²R (points toward the center).
  • Magnitude: |a| = ω²r.
  • Key insight: even though speed is constant, there is acceleration because the direction of v is changing.
  • Don't confuse: accelerating by changing speed (gas/brake) vs. accelerating by changing direction (steering); circular motion is all steering.

🌀 Helix: circle plus vertical motion

  • Position: R = cos t i + sin t j + t k (circle in xy plane, rising in z).
  • Velocity: v = −sin t i + cos t j + k.
  • Speed: |v| = √(sin²t + cos²t + 1) = √2 (constant).
  • Distance: s = √2 t; at t = π, half turn is complete, distance is √2 π (longer than the shadow's semicircle π because of the 45° slope).
  • Acceleration: a = −cos t i − sin t j (points toward the z axis).
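
The constant speed and the arc length of the helix can be confirmed numerically. This sketch uses a midpoint rule with an arbitrarily chosen number of steps; the function names are illustrative.

```python
import math

def helix_velocity(t):
    """v(t) = -sin t i + cos t j + k for the helix R = (cos t, sin t, t)."""
    return (-math.sin(t), math.cos(t), 1.0)

def speed(t):
    """|v(t)|, the length of the velocity vector."""
    return math.sqrt(sum(c * c for c in helix_velocity(t)))

def arc_length(t_end, steps=1000):
    """Midpoint-rule approximation of the integral of |v(t)| from 0 to t_end."""
    h = t_end / steps
    return sum(speed((i + 0.5) * h) for i in range(steps)) * h

print(speed(0.3), math.sqrt(2))                       # both near 1.41421
print(arc_length(math.pi), math.sqrt(2) * math.pi)    # both near 4.44288
```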

🚀 Acceleration and integration

📈 Acceleration as the second derivative

Acceleration: a(t) = dv/dt = d²R/dt² = (d²x/dt²)i + (d²y/dt²)j + (d²z/dt²)k.

  • What it measures: rate of change of velocity (vector), not rate of change of speed (scalar).
  • Straight line: if v is constant, then a = 0.
  • Curve: if direction changes, a ≠ 0 even if speed is constant.
  • Example: circular motion has a = −ω²R (toward center); helix has a = −cos t i − sin t j (toward axis).

🔁 Integrating acceleration to find motion

  • Given: acceleration a(t), initial velocity v₀, initial position R₀.
  • Find velocity: v(t) = v₀ + ∫ a(t) dt (integrate each component).
  • Find position: R(t) = R₀ + ∫ v(t) dt.
  • Example (constant a): v = v₀ + at, R = R₀ + v₀t + ½at².

⚾ Curve ball example

  • Initial position: R₀ = 5k (5 feet off ground).
  • Initial velocity: v₀ = 120i − 2j + 2k (120 ft/s ≈ 82 mph).
  • Acceleration: a = 16j − 32k (spin + gravity).
  • Velocity: v = 120i + (−2 + 16t)j + (2 − 32t)k.
  • Position: R = 120t i + (−2t + 8t²)j + (5 + 2t − 16t²)k.
  • At t = ½ (home plate, 60 ft): y = −1 + 2 = 1 ft (outside), z = 5 + 1 − 4 = 2 ft (low)—the ball curves.
  • At t = ¼ (halfway): y = −½ + ½ = 0 (looks like it's coming over the plate)—the t² term grows later.
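
The curve ball numbers follow from the constant-acceleration formulas applied to each component. A minimal sketch, with a hypothetical function name `motion` and vectors stored as tuples (i, j, k components):

```python
def motion(t, r0, v0, a):
    """Velocity and position at time t under constant acceleration a."""
    v = tuple(v0i + ai * t for v0i, ai in zip(v0, a))
    r = tuple(r0i + v0i * t + 0.5 * ai * t * t
              for r0i, v0i, ai in zip(r0, v0, a))
    return v, r

r0 = (0, 0, 5)        # 5 feet off the ground
v0 = (120, -2, 2)     # initial velocity in ft/s
a = (0, 16, -32)      # spin plus gravity in ft/s^2

print(motion(0.5, r0, v0, a)[1])    # (60.0, 1.0, 2.0)
print(motion(0.25, r0, v0, a)[1])   # (30.0, 0.0, 4.5)
```

At the halfway time the sideways coordinate is still 0; the full foot of sideways movement appears only in the second half of the flight.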

🔍 Comparing lines and planes

📊 Structural differences

Object | Equations | Parameters | What you know directly
Line | 2 equations (without t) | 1 parameter t | Starting point R₀ and direction v
Plane | 1 equation: ax + by + cz = d | 2 parameters t, s | Normal vector N = (a, b, c)

  • Line: R = R₀ + t v (one direction); or (x − x₀)/v₁ = (y − y₀)/v₂ = (z − z₀)/v₃.
  • Plane: R = R₀ + t v + s w (two directions in the plane); or ax + by + cz = d (perpendicular direction N).
  • Don't confuse: lines look simpler with parameters, planes look simpler without.

🔗 Line-plane relationships

  • Parallel: line R₀ + t v is parallel to plane (normal N) when v · N = 0.
  • Perpendicular: line is perpendicular to plane when v is parallel to N (i.e., v = λN).
  • Finding the plane containing a point and a line: use the line's direction v and a vector w from the line to the point; the normal is N = v × w.
  • Example: plane through (1, 2, 1) and line (1, 0, 0) + t(2, 0, −1) has v = 2i − k, w = 2j + k, N = v × w = 2i − 2j + 4k; equation 2x − 2y + 4z = 2.
72

Plane Motion: Projectiles and Cycloids

12.2 Plane Motion: Projectiles and Cycloids

🧭 Overview

🧠 One-sentence thesis

Projectile motion follows parabolic paths determined by initial velocity, angle, and gravity, while cycloids—traced by points on rolling circles—require parametric equations and solve optimization problems like the fastest slide between two points.

📌 Key points (3–5)

  • Projectile motion basics: horizontal velocity stays constant; vertical velocity decreases by gt; position is found by integrating velocities to get x(t) and y(t).
  • Three-parameter system: projectile motion is determined by initial speed v₀, launch angle α, and time t; some targets cannot be reached, others can be reached in two ways.
  • Cycloids need parameters: the path of a point on a rolling circle cannot be written as y = f(x) without a parameter (angle θ); parametric form is x = a(θ − sin θ), y = a(1 − cos θ).
  • Common confusion: for projectiles, solving y(t) = 0 gives the time of landing, not the place; you must then substitute that time into x(t) to find the range.
  • Cycloid optimization: an upside-down cycloid gives the fastest slide (brachistochrone) between two points—faster than a straight line because starting vertically builds speed early.

🎯 Projectile motion fundamentals

🚀 How projectiles move

The excerpt describes motion without air resistance, with only gravity acting downward.

  • Horizontal component: velocity dx/dt = v₀ cos α stays constant (no horizontal force).
  • Vertical component: velocity dy/dt = v₀ sin α − gt decreases linearly due to gravity (acceleration d²y/dt² = −g).
  • Starting from (0, 0), integrate the velocities:
    • x(t) = (v₀ cos α)t
    • y(t) = (v₀ sin α)t − ½gt²

Projectile path: x(t) = (v₀ cos α)t, y(t) = (v₀ sin α)t − ½gt²

The path is a parabola, but written parametrically (with time t) rather than as y = ax² + bx + c.

⏱️ Flight time vs landing position

Don't confuse: solving for when the projectile lands involves two steps, not one.

  • To find when it hits the ground: solve y(t) = 0 for time T.
  • To find where it lands (the range R): substitute T into x(T).

Example from the excerpt: Water leaves a hose at v₀ = 10 m/s at angle α.

  • Flight ends when y = (10 sin α)T − ½gT² = 0.
  • Solving: T = (20 sin α)/g.
  • Range: R = x(T) = (10 cos α)T = (200 cos α sin α)/g = (100 sin 2α)/g.
  • With g = 9.8 m/s², maximum range is 100/9.8 ≈ 10.2 meters (when sin 2α = 1, i.e., α = 45°).
  • Cannot reach a car 12 meters away from ground level.
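
The two-step calculation (time of landing first, then range) can be sketched in Python. The function names and the constant `G` are illustrative; angles are in radians and air resistance is ignored, as in the text.

```python
import math

G = 9.8  # gravitational acceleration in m/s^2

def flight_time(v0, alpha):
    """Time T when y(T) = 0 again: T = 2 v0 sin(alpha) / g."""
    return 2 * v0 * math.sin(alpha) / G

def projectile_range(v0, alpha):
    """Range R = x(T): horizontal velocity times flight time."""
    return v0 * math.cos(alpha) * flight_time(v0, alpha)

best = projectile_range(10, math.radians(45))
print(round(best, 2))   # 10.2
low = projectile_range(10, math.radians(30))
high = projectile_range(10, math.radians(60))
print(abs(low - high) < 1e-9)   # True: 30 and 60 degrees give the same range
```

The 60° shot stays in the air longer than the 30° shot even though both land at the same place.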

📐 Angle and range relationships

Quantity | Formula | Notes
Flight time T | (2v₀ sin α)/g | Proportional to vertical component
Range R | (v₀² sin 2α)/g | Same for two angles if sin 2α is the same
Maximum height | (v₀ sin α)²/(2g) | Occurs at t = T/2 (halfway)

Two angles, same range: The excerpt notes that sin 2α is the same for α = 30° and α = 60° (since sin 60° = sin 120°), so both give the same range but different flight times.

  • With air resistance, the optimal angle drops from 45° to about 35° (mentioned for baseball).

🎡 Cycloid geometry

🔄 What a cycloid is

Cycloid: the path traced by a point on a circle of radius a as the circle rolls along the x-axis.

  • After one complete turn, the point returns to the x-axis at x = 2πa.
  • The curve has a cusp (infinite slope) at each landing point.

Why parametric form is essential: The excerpt states that after a century of trying to find an xy equation, scientists (Galileo, Wren, Huygens, Bernoulli, Newton, l'Hôpital) concluded the right way is to use a parameter θ (the angle through which the circle turns).

📏 Parametric equations for the cycloid

The parameter θ is the turning angle of the circle (not the polar angle from the origin).

  • The circle rolls a distance aθ along the x-axis, so its center is at (aθ, a).
  • To account for the point P on the circle, subtract the horizontal and vertical offsets:
    • x = a(θ − sin θ)
    • y = a(1 − cos θ)

At θ = 0: position is (0, 0).
At θ = 2π: position is (2πa, 0).

Slope: dy/dx = (dy/dθ)/(dx/dθ) = (a sin θ)/(a(1 − cos θ)) = sin θ/(1 − cos θ).

  • At θ = 0, this is infinite (cusp).

📊 Cycloid properties

The excerpt poses three "questions" and answers them:

Property | Formula | Calculation method
Area under one arch | 3πa² | Integrate y dx from θ = 0 to 2π
Arc length | 8a | Integrate ds = √((dx/dθ)² + (dy/dθ)²) dθ
Sliding time (upside-down) | π√(a/g) | Use energy conservation: ½mv² − mgy = 0, so v = √(2gy)

Energy method for sliding time:

  • Kinetic + potential energy = ½mv² − mgy = 0 (starts from rest at y = 0).
  • Speed v = √(2gy) = ds/dt.
  • Sliding time = ∫ dt = ∫ ds/√(2gy).
  • Substituting the cycloid's ds and y in terms of θ gives π√(a/g).

Dimension check: a is distance, g is distance/time², so √(a/g) has units of time. ✓
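The three cycloid results can be approximated with a simple midpoint rule. This is a sketch under stated assumptions: the slide runs from the top of the upside-down arch (θ = 0) to its bottom (θ = π), y is the depth below the start, and `cycloid_checks` is a hypothetical helper name.

```python
import math

def cycloid_checks(a=1.0, g=9.8, n=20000):
    """Midpoint-rule estimates of area, arc length and sliding time."""
    # Area and length over one full arch: theta from 0 to 2*pi.
    h = 2 * math.pi / n
    area = length = 0.0
    for k in range(n):
        th = (k + 0.5) * h
        dx = a * (1 - math.cos(th))   # dx/dtheta
        dy = a * math.sin(th)         # dy/dtheta
        y = a * (1 - math.cos(th))
        area += y * dx * h
        length += math.hypot(dx, dy) * h
    # Sliding time from theta = 0 to theta = pi: integrate ds / sqrt(2 g y).
    h = math.pi / n
    time = 0.0
    for k in range(n):
        th = (k + 0.5) * h
        dx = a * (1 - math.cos(th))
        dy = a * math.sin(th)
        y = a * (1 - math.cos(th))    # depth below the starting point
        time += math.hypot(dx, dy) / math.sqrt(2 * g * y) * h
    return area, length, time

area, length, time = cycloid_checks()
print(area / math.pi, length, time / math.sqrt(1.0 / 9.8))
```

With a = 1 the three printed numbers should land close to 3, 8 and π, matching the table entries 3πa², 8a and π√(a/g).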

🏆 The brachistochrone problem

⚡ Fastest slide: cycloid beats straight line

Brachistochrone problem: find the curve from point O to point Q that minimizes sliding time under gravity (starting from rest).

The upside-down cycloid solves this problem.

Why not a straight line? The excerpt compares:

  • Cycloid sliding time: π√(a/g)
  • Straight-line sliding time: √(π² + 4)√(a/g) ≈ 3.72√(a/g)

The cycloid is faster even though the path is longer, because "it is better to start out vertically and pick up speed early."
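A quick numerical comparison of the two slides. The sketch assumes the straight ramp joins the same endpoints as the half arch (a drop of 2a over a horizontal run of πa) and that both slides are frictionless and start from rest; the function names are illustrative.

```python
import math

def cycloid_time(a, g=9.8):
    """Sliding time along the upside-down cycloid: pi * sqrt(a / g)."""
    return math.pi * math.sqrt(a / g)

def straight_line_time(a, g=9.8):
    """Sliding time down a straight ramp dropping 2a over a run of pi*a."""
    length = a * math.hypot(math.pi, 2.0)
    accel = g * (2 * a) / length          # g times sin(incline angle)
    return math.sqrt(2 * length / accel)  # from length = accel * t^2 / 2

a = 1.0
unit = math.sqrt(a / 9.8)
print(cycloid_time(a) / unit, straight_line_time(a) / unit)
```

The second number is √(π² + 4), which is larger than π, so the longer cycloid path still takes less time.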

🦁 Historical challenge

John Bernoulli posed this as an international challenge. Most mathematicians couldn't solve it. Isaac Newton solved it anonymously, but Bernoulli recognized him: "I recognize the lion by his claws."

Additional cycloid property: Starting from rest at any point P along the upside-down cycloid, the time to reach the bottom Q is the same. Bernoulli: "You will be petrified with astonishment when I say..."

🔗 Related curves

🌀 Variations on the cycloid

| Curve | Description | Notes |
|---|---|---|
| Epicycloid | Circle rolls around the outside of another circle | |
| Hypocycloid | Circle rolls inside a fixed circle | |
| Astroid | Special hypocycloid with radii ratio 1:4 | x = a cos³ θ, y = a sin³ θ (curved star) |
| Trochoid | Point P is distance d from center (not on circumference) | x = aθ − d sin θ, y = a − d cos θ |

🚂 Train wheel puzzle

The excerpt mentions an old puzzle: "What point moves backward when a train starts forward?"

Answer: The bottom of the wheel flange (which extends below the track) has dx/dt < 0 at the lowest point—it moves backward relative to the ground even as the train moves forward.

73

Curvature and Normal Vector

12.3 Curvature and Normal Vector

🧭 Overview

🧠 One-sentence thesis

Curvature measures how sharply a curve bends by quantifying the rate at which the direction of motion changes, while the normal vector indicates the direction in which the curve is turning.

📌 Key points (3–5)

  • What curvature measures: the rate at which the tangent direction T changes as you move along the curve, independent of speed.
  • Two key geometric quantities: curvature κ (kappa) tells how fast the direction turns; normal vector N tells which way it turns.
  • Speed vs shape: curvature and normal vector depend only on the curve's shape, not on how fast you travel along it (changing the parameter from t to 2t leaves them unchanged).
  • Common confusion: gas and brake change speed (tangential component), while steering changes direction (normal component, proportional to curvature); all three controls produce acceleration, but it splits into only these two components.
  • Circle as reference: for a circle, curvature equals 1/radius (smaller circle = sharper turn = larger curvature), and N points toward the center.

📐 Understanding curvature

📏 Definition and meaning

Curvature κ = |dT/ds|: the magnitude of the change in the unit tangent vector T per unit arc length s.

  • Curvature measures "change in direction divided by change in position."
  • It depends only on the curve's geometry, not on the speed of travel.
  • Larger curvature means sharper bending; zero curvature means a straight line.

🔄 Circle as the standard example

  • For a circle of radius a, the curvature is κ = 1/a.
  • Smaller radius → tighter turn → larger curvature.
  • Example: A circle of radius 2 has curvature 1/2; a circle of radius 1 has curvature 1.

🧮 Three formulas for curvature

  • General (vectors): κ = |v × a| / |v|³. Use for any parametric curve with velocity v and acceleration a.
  • Plane curve (parametric): κ = |x'y'' − y'x''| / ((x')² + (y')²)^(3/2). Use when x(t) and y(t) are given.
  • Graph y = f(x): κ = |d²y/dx²| / (1 + (dy/dx)²)^(3/2). Use when the curve is given as y(x).

Notes on these formulas:

  • The "brutal but valuable" plane curve formula comes from computing the cross product components.
  • Common approximation: use |d²y/dx²| alone (omitting the denominator) for small slopes.

🎯 The normal vector N

🧭 Definition and direction

Normal vector N = (dT/ds) / |dT/ds|: a unit vector along the derivative of T, perpendicular to T.

  • N is perpendicular to the tangent T (proven by differentiating T · T = 1).
  • N points in the direction the curve is turning.
  • For plane curves, N points "left or right"; for space curves, follow dT.

🔗 Why T and dT are perpendicular

  • Since T is a unit vector, T · T = 1 always.
  • Differentiating both sides: 2T · (dT/dt) = 0.
  • Therefore dT/dt is perpendicular to T.
  • Geometric intuition: T moves around the unit sphere; movement dT must be perpendicular to the radius T.

🌀 Example: helix

For the unit helix R = cos(t)i + sin(t)j + tk:

  • T slopes upward at 45°, going around a circle at that latitude.
  • N = −cos(t)i − sin(t)j is horizontal, pointing toward the helix axis.
  • The curvature is κ = 1/2 (less than a circle's κ = 1 because of the climbing).

🚗 Acceleration components

⚡ Two sources of acceleration

The acceleration splits into two perpendicular components:

a = (d²s/dt²)T + κ(ds/dt)²N

| Component | Direction | Physical meaning |
|---|---|---|
| (d²s/dt²)T | Along T | Speeding up or slowing down (gas/brake) |
| κ(ds/dt)²N | Along N | Turning (steering wheel) |

  • The tangential component is the rate of change of speed.
  • The normal component depends on both curvature κ and speed squared.
  • Don't confuse: all three controls (gas, brake, steering) change velocity, but only steering changes direction.

🏎️ Why turning requires force

  • Newton's Law: F = ma.
  • The force needed to steer around a corner is proportional to curvature and speed squared.
  • Example: At fixed speed (d²s/dt² = 0), the only acceleration is κ(ds/dt)²N, which comes purely from turning.
  • Example: Circular speed-up with R = cos(t²)i + sin(t²)j gives a = 2T + 4t²N (tangential component 2, normal component 4t²).
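The split into tangential and normal parts can be computed directly from v and a. A minimal sketch for plane curves, using the circular speed-up example; `accel_components` and `speedup_circle` are hypothetical helper names, and the derivatives inside `speedup_circle` were worked out by hand.

```python
import math

def accel_components(v, a):
    """Return (tangential, normal) components of a plane acceleration."""
    speed = math.hypot(v[0], v[1])
    tangential = (v[0] * a[0] + v[1] * a[1]) / speed   # a . T = d^2 s/dt^2
    normal = abs(v[0] * a[1] - v[1] * a[0]) / speed    # |v x a| / |v| = kappa (ds/dt)^2
    return tangential, normal

def speedup_circle(t):
    """Velocity and acceleration of R = cos(t^2) i + sin(t^2) j."""
    c, s = math.cos(t * t), math.sin(t * t)
    v = (-2 * t * s, 2 * t * c)
    a = (-2 * s - 4 * t * t * c, 2 * c - 4 * t * t * s)
    return v, a

t = 1.5
print(accel_components(*speedup_circle(t)))  # close to (2, 4 t^2) = (2, 9)
```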

🎢 Special case: constant speed

  • When moving at unit speed, |v| = 1 and ds/dt = 1.
  • Then d²s/dt² = 0 (no change in speed).
  • Acceleration simplifies to a = κN (purely in the turning direction).

📊 Practical formulas and examples

📈 Parabola curvature

For y = (1/2)x²:

  • Curvature κ = 1/(1 + x²)^(3/2).
  • At x = 0 (the vertex), κ = 1.
  • As x increases, κ approaches zero (the parabola straightens out far from the vertex).
  • Common approximation y'' = 1 agrees with κ at x = 0 but diverges elsewhere.
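A short sketch of the graph formula applied to this parabola, showing how the curvature falls away from the vertex while the approximation y'' stays at 1. The function name is illustrative.

```python
def graph_curvature(dy, d2y):
    """kappa = |y''| / (1 + y'^2)^(3/2) for a curve given as y = f(x)."""
    return abs(d2y) / (1 + dy**2) ** 1.5

# For y = x^2 / 2: y' = x and y'' = 1.
for x in (0.0, 0.5, 1.0, 3.0):
    print(x, graph_curvature(x, 1.0))
```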

🌐 Helix curvature

For the unit helix R = cos(t)i + sin(t)j + tk:

  • Compute v × a to get |v × a| = √2.
  • Speed |v| = √2.
  • Curvature κ = √2/(√2)³ = 1/2.
  • Comparison: Without the climbing term tk, this would be a unit circle with κ = 1.

🎪 Key insight about parameters

  • Replacing t with 2t doubles velocity v and multiplies acceleration a by 4.
  • Both |v × a| and |v|³ get multiplied by 8.
  • Their ratio κ remains unchanged—curvature depends only on the curve's shape, not the speed of traversal.
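
The helix calculation and the parameter-independence claim can be checked with the vector formula κ = |v × a| / |v|³. In this sketch the `rate` argument is an assumption added for illustration: `rate=2` corresponds to replacing t by 2t.

```python
import math

def cross(u, w):
    return (u[1] * w[2] - u[2] * w[1],
            u[2] * w[0] - u[0] * w[2],
            u[0] * w[1] - u[1] * w[0])

def norm(u):
    return math.sqrt(sum(c * c for c in u))

def curvature(v, a):
    """kappa = |v x a| / |v|^3."""
    return norm(cross(v, a)) / norm(v) ** 3

def helix(t, rate=1.0):
    """v and a for R = (cos(rate t), sin(rate t), rate t)."""
    u = rate * t
    v = (-rate * math.sin(u), rate * math.cos(u), rate)
    a = (-rate**2 * math.cos(u), -rate**2 * math.sin(u), 0.0)
    return v, a

print(curvature(*helix(0.7)))            # about 0.5
print(curvature(*helix(0.7, rate=2.0)))  # still about 0.5
```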
74

Polar Coordinates and Planetary Motion

12.4 Polar Coordinates and Planetary Motion

🧭 Overview

🧠 One-sentence thesis

Polar coordinates provide the natural framework for proving Kepler's three laws of planetary motion by connecting central gravitational forces to elliptical orbits through calculus.

📌 Key points (3–5)

  • Central forces keep motion planar: any force pointing toward the origin (like gravity from the sun) constrains the planet to move in a single plane because the cross product R × v remains constant.
  • Polar unit vectors: u_r points radially outward and u_θ is perpendicular to it; both rotate as θ changes, which creates the complexity in velocity and acceleration formulas.
  • Kepler's second law (area swept): follows directly from conservation of angular momentum; dA/dt = constant means planets move faster when closer to the sun.
  • Kepler's first law (elliptical orbits): proven by substituting q = 1/r and showing that the equation of motion reduces to a differential equation whose solution is the polar form of an ellipse.
  • Common confusion: the sun is at a focus of the ellipse, not at the center—a mistake even the Royal Mint made on a pound note featuring Newton.

🌐 Polar coordinate system for motion

🧭 The two unit vectors

u_r: the unit vector pointing radially outward from the origin, equal to (cos θ)i + (sin θ)j.

u_θ: the unit vector perpendicular to u_r in the direction of increasing θ, equal to (−sin θ)i + (cos θ)j.

  • These vectors are always perpendicular: u_r · u_θ = 0.
  • Unlike i and j, these vectors change direction as the point moves.
  • Key derivatives: d(u_r)/dθ = u_θ and d(u_θ)/dθ = −u_r.
  • Don't confuse: the subscripts r and θ indicate direction, not derivatives.

🚀 Velocity in polar coordinates

The position vector is R = r u_r (distance times direction).

Using the chain rule, velocity becomes:

  • v = (dr/dt) u_r + r(dθ/dt) u_θ
  • The first term (dr/dt) u_r is the outward speed.
  • The second term r(dθ/dt) u_θ is the circular speed.
  • The magnitude squared is |v|² = (dr/dt)² + (r dθ/dt)².

Example: For steady circular motion with r = 3 and θ = 2t, velocity is v = 6 u_θ (purely circular, no radial component).

⚡ Acceleration in polar coordinates

The acceleration formula (from differentiating velocity) is:

  • a = [d²r/dt² − r(dθ/dt)²] u_r + [r d²θ/dt² + 2(dr/dt)(dθ/dt)] u_θ

| Component | Physical meaning |
|---|---|
| d²r/dt² − r(dθ/dt)² | Radial acceleration (outward/inward) |
| r d²θ/dt² + 2(dr/dt)(dθ/dt) | Tangential acceleration (around) |

  • The term −r(dθ/dt)² is the centripetal acceleration pulling inward during rotation.
  • For circular motion: u_θ corresponds to the tangent vector T, and −u_r corresponds to the normal vector N.
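The polar velocity and acceleration formulas translate directly into code. A minimal sketch, with hypothetical function names, evaluated for the steady circular motion r = 3, θ = 2t used earlier:

```python
def polar_velocity(r, dr, dtheta):
    """(u_r, u_theta) components of v = (dr/dt) u_r + r (dtheta/dt) u_theta."""
    return dr, r * dtheta

def polar_acceleration(r, dr, d2r, dtheta, d2theta):
    """(u_r, u_theta) components of the acceleration."""
    radial = d2r - r * dtheta**2
    around = r * d2theta + 2 * dr * dtheta
    return radial, around

# r = 3 is constant, theta = 2t so dtheta/dt = 2 and d2theta/dt2 = 0
print(polar_velocity(3.0, 0.0, 2.0))                # (0.0, 6.0)
print(polar_acceleration(3.0, 0.0, 0.0, 2.0, 0.0))  # (-12.0, 0.0)
```

The only surviving acceleration term is the centripetal one, −r(dθ/dt)² = −12, pointing inward along −u_r.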

🪐 Central forces and planar motion

🎯 What makes a force "central"

Central force: a force that is always a multiple of the position vector R from the origin (the sun).

  • Gravity from the sun is central: F points from planet toward sun.
  • Because F = ma (Newton's law) and F is parallel to R, the acceleration a is also parallel to R.
  • Therefore R × a = 0 (cross product of parallel vectors is zero).

🔒 Why motion stays in a plane

Starting from R × a = 0, apply the product rule to R × v:

  • d/dt(R × v) = v × v + R × a = 0 + 0 = 0
  • So R × v = H (a constant vector).
  • Since R is always perpendicular to the constant vector H, the position R must stay in the plane perpendicular to H.

This is a powerful general result: any central force field confines motion to a plane.

📐 Kepler's second law: equal areas in equal times

⏱️ The area-sweeping rate

Kepler's second law states:

The vector from sun to planet sweeps out area at a steady rate: dA/dt = constant.

From the constant angular momentum H = R × v:

  • Substituting R = r u_r and v = (dr/dt) u_r + r(dθ/dt) u_θ gives H = r²(dθ/dt)(u_r × u_θ).
  • The magnitude is h = r²(dθ/dt).

In polar coordinates, a small wedge of area is dA = (1/2) r² dθ.

Therefore:

  • dA/dt = (1/2) r²(dθ/dt) = (1/2) h = constant

🌍 Physical interpretation

  • When the planet is near the sun, r is small, so dθ/dt must be large → the planet moves faster around its orbit.
  • When the planet is far from the sun, r is large, so dθ/dt is smaller → the planet moves slower.
  • The product r²(dθ/dt) stays constant, balancing distance and angular speed.
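Conservation of h = r²(dθ/dt) can be observed in a small simulation. This sketch makes several assumptions not in the text: units with GM = 1, a velocity Verlet integrator, and the starting state (1, 0) with velocity (0, 0.8). In Cartesian form h equals x·v_y − y·v_x.

```python
def angular_momentum_history(x, y, vx, vy, gm=1.0, dt=1e-3, steps=20000):
    """Integrate a = -GM R / r^3 and record h = x*vy - y*vx after each step."""
    def accel(px, py):
        r3 = (px * px + py * py) ** 1.5
        return -gm * px / r3, -gm * py / r3

    ax, ay = accel(x, y)
    history = []
    for _ in range(steps):
        vx += 0.5 * dt * ax   # half-step velocity update
        vy += 0.5 * dt * ay
        x += dt * vx          # full-step position update
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax   # second half-step velocity update
        vy += 0.5 * dt * ay
        history.append(x * vy - y * vx)
    return history

hs = angular_momentum_history(1.0, 0.0, 0.0, 0.8)
print(min(hs), max(hs))  # both stay near h = 0.8, so dA/dt = h/2 stays near 0.4
```

Because each velocity update is parallel to R and each position update is parallel to v, neither changes R × v, which mirrors the argument that a central force keeps R × v constant.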

🥚 Kepler's first law: elliptical orbits

🔄 The substitution q = 1/r

To prove orbits are ellipses, introduce q = 1/r and find its equation.

Using the chain rule and the fact that dθ/dt = hq²:

  • dr/dt = −(1/q²)(dq/dθ)(dθ/dt) = −h(dq/dθ)
  • d²r/dt² = −h²q²(d²q/dθ²)

⚖️ Newton's law in polar form

The radial component of F = ma for gravity is:

  • −GM/r² = d²r/dt² − r(dθ/dt)²

Substituting r = 1/q and dθ/dt = hq²:

  • −GMq² = −h²q²(d²q/dθ²) − (1/q)(hq²)²

Dividing by −h²q² gives:

  • d²q/dθ² + q = GM/h² = C (a constant)

🎯 The solution is an ellipse

The solution (with the axis chosen so that no sin θ term appears) is q = C − D cos θ, which means:

  • 1/r = C − D cos θ

This is the polar equation of a conic section.

For the orbit to be an ellipse (not a parabola or hyperbola):

  • The condition is C > D, ensuring q never reaches zero (r never becomes infinite).
  • The sun is at one focus of the ellipse, not at the center.

Example (taking D > 0): At θ = 0, r = 1/(C − D) is the maximum distance (aphelion); at θ = π, r = 1/(C + D) is the minimum distance (perihelion).
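The orbit equation can be checked numerically. This sketch uses sample constants C = 1 and D = 0.5 (assumed values with C > D > 0) to evaluate r at θ = 0 and θ = π, and a central difference to confirm that q = C − D cos θ satisfies d²q/dθ² + q = C.

```python
import math

def orbit_radius(theta, C, D):
    """r = 1 / (C - D cos(theta))."""
    return 1.0 / (C - D * math.cos(theta))

def ode_residual(theta, C, D, eps=1e-4):
    """Finite-difference estimate of q'' + q - C for q = C - D cos(theta)."""
    def q(th):
        return C - D * math.cos(th)
    q2 = (q(theta + eps) - 2 * q(theta) + q(theta - eps)) / eps**2
    return q2 + q(theta) - C

C, D = 1.0, 0.5  # sample values, not from the text
print(orbit_radius(0.0, C, D), orbit_radius(math.pi, C, D))  # 1/(C - D) and 1/(C + D)
print(ode_residual(1.0, C, D))  # close to zero
```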

⏳ Kepler's third law: the period formula

📏 Relating period to orbit size

Kepler's third law:

The period T (the planet's "year") is proportional to a^(3/2), where a is the semi-major axis (maximum distance from the ellipse's center).

The formula is T = 2π√(a³/GM), or equivalently T = k a^(3/2) where k = 2π/√(GM) is the same for all planets.

🧮 Deriving the period

Two key facts about ellipses:

  1. The total area is πab (where b is the semi-minor axis).
  2. The height above the sun (at θ = π/2) is b²/a.

From Kepler's second law, the area swept in time T is:

  • A = (1/2)hT = πab, so T = 2πab/h

From the orbit equation at θ = π/2:

  • r = 1/C = b²/a, and C = GM/h²

Combining these:

  • b = √(a/C) = √(ah²/GM)
  • T = 2πab/h = 2π√(a³/GM)

🌌 Verification for circular orbits

For a circular orbit with radius r and angular velocity ω:

  • Gravity provides centripetal force: GM/r² = rω²
  • Solving: ω = √(GM/r³)
  • Period: T = 2π/ω = 2π√(r³/GM) ✓

This matches Kepler's formula with a = r.
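As a sanity check on T = 2π√(a³/GM), the sketch below plugs in standard published values for the sun's GM and the astronomical unit. Both constants are supplied here as assumptions and do not come from the text.

```python
import math

GM_SUN = 1.32712440018e20  # m^3/s^2, standard value (assumption, not from the text)
AU = 1.495978707e11        # m, roughly Earth's semi-major axis (assumption)

def period(a, gm=GM_SUN):
    """Kepler's third law: T = 2 pi sqrt(a^3 / GM)."""
    return 2 * math.pi * math.sqrt(a**3 / gm)

days = period(AU) / 86400
print(round(days, 1))               # close to one year
print(period(4 * AU) / period(AU))  # 4^(3/2) = 8, the a^(3/2) scaling
```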

🎓 Historical significance

🌟 Kepler's achievement

  • Kepler worked 60 years before Newton invented calculus.
  • He discovered these laws from pages of astronomical observations (especially Tycho Brahe's data) through "terrific guesses."
  • The excerpt calls this "the greatest scientific discovery of all time"—on May 15, 1618, Kepler wrote that "the right ratio outfought the darkness of my mind."

🏛️ The Royal Mint mistake

The British pound note once showed Newton with his Principia Mathematica, but the artist incorrectly drew the sun at the center of the ellipse instead of at a focus—contradicting what Newton had just proved. The note was withdrawn from circulation.

Don't confuse: the semi-major axis a measures distance from the center, but the sun sits at a focus, offset by distance c where c² = a² − b².

75

Surfaces and Level Curves

13.1 Surfaces and Level Curves

🧭 Overview

🧠 One-sentence thesis

Surfaces in three-dimensional space are visualized through level curves in the base plane, which connect all points sharing the same function value and reveal the shape, steepness, and critical points of the surface.

📌 Key points (3–5)

  • What a surface is: the graph of z = f(x, y) is a surface in xyz space, where x and y are independent and z depends on them.
  • What level curves show: a level curve f(x, y) = c in the base plane lies below all surface points at height z = c; the family of labeled curves forms a contour map.
  • How to read steepness: closely bunched level curves indicate steep regions; widely spaced curves indicate flat regions; the steepest direction is perpendicular to the level curves.
  • How to find extrema: level curves form tightening loops around maximum or minimum points as c increases or decreases.
  • Common confusion: level curves are drawn in the base plane (the xy plane), not on the surface itself; they represent the "shadow" of horizontal slices through the surface.

📐 From curves to surfaces

📐 One variable vs two variables

  • One variable: y = f(x) produces a curve in the xy plane (two-dimensional).
  • Two variables: z = f(x, y) produces a surface in xyz space (three-dimensional).
  • Above each point (x, y) in the base plane sits the point (x, y, z) on the surface.
  • The printed page is two-dimensional, so surfaces are shown using shading, color, or projection; human eyes are skilled at interpreting these as three-dimensional shapes.

🧮 New calculus tools

The chapter extends one-variable calculus to multivariable calculus:

| One variable f(x) | Two variables f(x, y) |
|---|---|
| df/dx | Two partial derivatives: ∂f/∂x and ∂f/∂y |
| d²f/dx² | Four second derivatives: ∂²f/∂x², ∂²f/∂x∂y, ∂²f/∂y∂x, ∂²f/∂y² |
| Tangent line | Tangent plane: z − z₀ = (∂f/∂x)(x − x₀) + (∂f/∂y)(y − y₀) |
| df/dx = 0 (max/min) | Two equations: ∂f/∂x = 0 and ∂f/∂y = 0 |

🗺️ Level curves and contour maps

🗺️ What a level curve is

Level curve or contour line of z = f(x, y): contains all points (x, y) that share the same value f(x, y) = c.

  • Above these points, the surface is at height z = c.
  • Different values of c produce different level curves.
  • To see the curve for c = 2: cut through the surface with the horizontal plane z = 2; the intersection projects down to the level curve f(x, y) = 2 in the base plane.
  • The family of labeled curves (one for each c) is called a contour map.

🗺️ How to construct a contour map

  1. Choose a constant c.
  2. Solve f(x, y) = c to find the curve in the xy plane.
  3. Label that curve with c.
  4. Repeat for many values of c.
  • Example: For z = √(x² + y²) (a cone), the level curves are √(x² + y²) = c, which are circles of radius c.
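Step 2 (solving f(x, y) = c) can be done numerically when no formula is handy. The sketch below bisects along a ray from the origin; it assumes f increases with distance along that ray, which holds for the cone and the paraboloid but not for every surface. `level_radius` is a hypothetical helper.

```python
import math

def level_radius(f, c, angle, r_max=100.0, tol=1e-10):
    """Find r along the ray at `angle` where f(r cos, r sin) = c by bisection.

    Assumes f increases with r on this ray and that c lies between
    f at the origin and f at r_max.
    """
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid * math.cos(angle), mid * math.sin(angle)) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cone(x, y):
    return math.hypot(x, y)

def paraboloid(x, y):
    return x * x + y * y

print(level_radius(cone, 5.0, 0.3))        # about 5: circle of radius c
print(level_radius(paraboloid, 9.0, 1.0))  # about 3: circle of radius sqrt(c)
```

Repeating the call over many angles and many values of c gives the labeled family of curves that makes up the contour map.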

🗺️ Why level curves never cross

  • If two level curves crossed at a point (x, y), that point would have two different values f(x, y) = c₁ and f(x, y) = c₂ simultaneously, which is impossible for a function.

🏔️ Reading surfaces from level curves

🏔️ Identifying steepness

  • Closely bunched curves: the surface is steep (height changes rapidly over a short horizontal distance).
  • Widely spaced curves: the surface is flat (height changes slowly).
  • Example: On a mountain trail, tightly packed contour lines mean a steep climb; far-apart lines mean gentle terrain.

🏔️ Finding maxima and minima

  • Maximum: level curves form loops that tighten around the highest point as c increases.
  • Minimum: level curves form loops that tighten around the lowest point as c decreases.
  • The curves squeeze to a single point at the extremum.

🏔️ Steepest direction

  • The steepest direction on the surface (uphill or downhill) is perpendicular to the level curves in the base plane.
  • Water runs perpendicular to contour lines (though the excerpt notes this looks "doubtful for rivers" on some maps).

🧪 Examples of surfaces and their level curves

🧪 The cone: z = √(x² + y²)

  • Surface: a 45° cone (distance out equals distance up).
  • Level curves: circles √(x² + y²) = c, with radius c.
  • At height 5, the cone contains a circle of points; the circle in the base plane is the level curve for c = 5.

🧪 The paraboloid: z = x² + y²

  • Surface: bends upward like a parabola (not a cone).
  • Level curves: still circles x² + y² = c.
  • The circle of radius 3 is the level curve for c = 9 (height 9 on the surface).

🧪 The plane: z = 2x + y

  • Surface: a plane.
  • Level curves: parallel straight lines 2x + y = c.
  • Lines are labeled by their height c (positive above the base plane, negative below).
  • Don't confuse: not all surfaces with straight-line level curves have parallel lines (e.g., z = y/x has lines y = cx that swing around the origin like a spiral slide).

🧪 Temperature maps

  • A weather map shows contour lines of the temperature function.
  • Each level curve connects points at constant temperature (isotherms).
  • Example: one line might run from Seattle to Omaha to Cincinnati to Washington.

🧪 Mountain contour maps

  • Level curves are typically separated by 100 feet in height.
  • Steep trail: curves bunched together (height climbs quickly).
  • Flat region: curves far apart.
  • Water runs perpendicular to the level curves.

🧪 Utility functions in economics

  • Utility function: x²y gives the value of x hours awake and y hours asleep.
  • Indifference curve: x²y = c; any two points on this curve have the same utility, so we are indifferent between them.
  • The curve is "convex": we prefer the average of any two points (the line between two points lies on higher level curves).
  • Example: 2 hours awake and 1/4 hour asleep has the same value as 1 hour of each: (2²)(1/4) = (1²)(1) = 1.
  • Extreme case 1: f = 4x + y (four quarters substitute for one dollar); level curves are straight lines (perfect substitution).
  • Extreme case 2: f = min(x, y) (counts pairs of shoes); extra left or right shoes are useless; level curves form right angles (complements).
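
The indifference-curve example can be verified in a few lines. This sketch evaluates the utility x²y at the two points from the text and at their average; the variable names are illustrative.

```python
def utility(x, y):
    """Value of x hours awake and y hours asleep."""
    return x * x * y

p, q = (2.0, 0.25), (1.0, 1.0)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
print(utility(*p), utility(*q), utility(*mid))  # 1.0 1.0 1.40625
```

Both endpoints lie on the level curve x²y = 1, while their average (1.5, 0.625) has higher utility, which is the convexity property described above.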
76

Partial Derivatives

13.2 Partial Derivatives

🧭 Overview

🧠 One-sentence thesis

Partial derivatives measure how a multivariable function changes in one direction at a time by holding other variables constant, and they generalize ordinary derivatives to surfaces and higher dimensions.

📌 Key points (3–5)

  • What partial derivatives are: ordinary derivatives of "partial functions" where all but one variable are held constant.
  • Geometric meaning: partial derivatives describe cross sections of surfaces cut by vertical planes, while level curves are cut by horizontal planes.
  • Second derivatives: two first partial derivatives lead to four second partial derivatives, but the mixed derivatives (f_xy and f_yx) are equal when continuous.
  • Common confusion: saddle points have both partial derivatives equal to zero but are neither maxima nor minima—they are flat in all directions yet simultaneously tops and bottoms.
  • Why it matters: partial derivatives appear in physics (heat and wave equations), economics (marginal profits), and any field with multiple variables.

📐 Partial functions and geometric interpretation

📐 What partial functions are

Partial function: a function where one variable is fixed at a constant value while the other varies.

  • For a function f(x, y), fixing y at y₀ gives the partial function in x; fixing x at x₀ gives the partial function in y.
  • Example: for the distance function f = √(x² + y²), fixing y = 0 gives the partial function √(x²) = |x| (varies with x only).

✂️ Cross sections vs level curves

  • Cross sections: graphs of partial functions, cut out by vertical planes (y = y₀ or x = x₀).
  • Level curves: cut out by horizontal planes (z = c).
  • Don't confuse: vertical planes create cross sections showing how the surface rises/falls in one direction; horizontal planes show contours of constant height.

| Feature | Cross section | Level curve |
|---|---|---|
| Plane type | Vertical | Horizontal |
| What it shows | Partial function graph | Points of equal function value |
| Equation form | x = x₀ or y = y₀ | z = c |

🔍 Computing partial derivatives

Partial derivative: the ordinary derivative of a partial function (constant y or constant x).

  • Notation: ∂f/∂x or f_x for the x-derivative; ∂f/∂y or f_y for the y-derivative.
  • The partial derivative involves one direction, but limits and continuity involve all directions.
  • Example: for f(x, y) = y² - x², the partial derivatives are ∂f/∂x = -2x and ∂f/∂y = 2y.

🏔️ Saddle points and special behavior

🏔️ What makes a saddle point

  • A saddle point occurs where both partial derivatives are zero, but the point is neither a maximum nor a minimum.
  • Example: for f(x, y) = y² - x², the origin (0, 0) is a saddle point because both ∂f/∂x = 0 and ∂f/∂y = 0 there.
  • The surface is "momentarily flat in all directions" yet is simultaneously the bottom of one parabola and the top of another.

🎢 Understanding the saddle shape

  • Moving in the y direction from (0, 0): the partial function y² - 0 opens upward (bottom of a valley).
  • Moving in the x direction from (0, 0): the partial function 0 - x² opens downward (top of a hill).
  • The surface is called a hyperbolic paraboloid because level curves y² - x² = c are hyperbolas.
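A small numerical look at the saddle f = y² − x². The sketch estimates both partial derivatives at the origin with central differences (the step size h is an arbitrary choice) and then samples f a short distance away along each axis.

```python
def partial_x(f, x, y, h=1e-6):
    """Central-difference estimate of the x-derivative with y held fixed."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    """Central-difference estimate of the y-derivative with x held fixed."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def saddle(x, y):
    return y**2 - x**2

print(partial_x(saddle, 0.0, 0.0), partial_y(saddle, 0.0, 0.0))  # both zero
print(saddle(0.0, 0.1), saddle(0.1, 0.0))  # positive along y, negative along x
```

Both slopes vanish at (0, 0), yet f rises in the y direction and falls in the x direction, so the point is neither a maximum nor a minimum.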

🔄 Alternative saddle example

  • The function f = 2xy also has a saddle point at the origin.
  • Along the 45° line (x = y): the function becomes 2x² and is climbing.
  • Along the -45° line (x = -y): the function becomes -2x² and is falling.
  • The graph of 2xy is the same saddle rotated by 45°.

🔢 Second derivatives and mixed partials

🔢 Four second derivatives from two first

  • Two first derivatives (f_x and f_y) lead to four second derivatives:
    • f_xx: x-derivative of f_x (pure x direction)
    • f_yy: y-derivative of f_y (pure y direction)
    • f_xy: y-derivative of f_x (mixed)
    • f_yx: x-derivative of f_y (mixed)
  • Notation alternatives: ∂²f/∂x², ∂²f/∂x∂y, ∂²f/∂y∂x, ∂²f/∂y².

⚖️ Equality of mixed derivatives

Key theorem: If f(x, y) has continuous second derivatives, then f_xy = f_yx.

  • Example: for f = x/y, both mixed derivatives equal -1/y².
  • Example: for f = 4x² + 3xy + y², both cross derivatives equal 3.
  • The order of differentiation doesn't matter when second derivatives are continuous.

🎯 Computing second derivatives

  • Only one variable moves at a time, so one-variable calculus is sufficient.
  • Example: for f = 4x² + 3xy + y²:
    • First: f_x = 8x + 3y and f_y = 3x + 2y
    • Second: f_xx = 8, f_yy = 2, f_xy = f_yx = 3
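The equality f_xy = f_yx can be checked numerically for f = x/y. In this sketch the first derivatives f_x = 1/y and f_y = −x/y² are worked out by hand, and each is then differentiated in the other variable by a central difference; the sample point (1.5, 2) is an arbitrary choice.

```python
def ddx(g, x, y, h=1e-6):
    """Central-difference x-derivative of g."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def ddy(g, x, y, h=1e-6):
    """Central-difference y-derivative of g."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

def f_x(x, y):
    return 1 / y          # x-derivative of x / y

def f_y(x, y):
    return -x / y**2      # y-derivative of x / y

x0, y0 = 1.5, 2.0
f_xy = ddy(f_x, x0, y0)   # y-derivative of f_x
f_yx = ddx(f_y, x0, y0)   # x-derivative of f_y
print(f_xy, f_yx, -1 / y0**2)  # all three agree to several digits
```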

🌍 Applications and extensions

🌍 Multiple variables beyond x and y

  • Functions can have more than two variables: f(x, y, z) = x² + y² + z².
  • Variables may have domain-specific names: pressure P(T, V) = nRT/V uses temperature T and volume V.
  • In economics: 26 products mean 26 variables, sometimes 52 to include prices and amounts.

📊 Marginal analysis

  • Partial derivatives are marginal profits when one of many variables changes.
  • A spreadsheet shows current values; an "infinitesimal spreadsheet" shows derivatives.
  • Example: changing one product quantity while holding others constant reveals marginal profit for that product.

🌊 Differential equations

  • Heat equation: f_t = f_xx (one dimension) or f_t = f_xx + f_yy (two dimensions).
  • Wave equation: f_tt = c²f_xx describes wave motion.
  • Example: f(x, t) = sin(x + t) and f(x, t) = sin(x - t) both satisfy the wave equation with c = 1; the first wave moves left, the second moves right.
  • The wave velocity is distance/time = Δx/Δt.
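
A numerical check that a function satisfies the wave equation f_tt = c²f_xx. This sketch estimates both second derivatives by central differences; the step size and the sample point are arbitrary choices, and `wave_residual` is a hypothetical helper.

```python
import math

def second_diff(g, u, h=1e-4):
    """Central-difference estimate of g''(u)."""
    return (g(u + h) - 2 * g(u) + g(u - h)) / (h * h)

def wave_residual(f, x, t, c=1.0):
    """Estimate f_tt - c^2 f_xx at the point (x, t)."""
    f_tt = second_diff(lambda s: f(x, s), t)
    f_xx = second_diff(lambda s: f(s, t), x)
    return f_tt - c * c * f_xx

right_mover = lambda x, t: math.sin(x - t)
wrong_speed = lambda x, t: math.sin(x - 2 * t)

print(wave_residual(right_mover, 0.3, 1.1))  # close to zero for c = 1
print(wave_residual(wrong_speed, 0.3, 1.1))  # clearly nonzero for c = 1
```

The second function travels at speed 2, so it satisfies the equation only when c = 2.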
77

Tangent Planes and Linear Approximations

13.3 Tangent Planes and Linear Approximations

🧭 Overview

🧠 One-sentence thesis

The tangent plane provides a linear approximation to a curved surface near a point, enabling us to estimate function values, understand sensitivity to changes, and solve nonlinear equations through Newton's method.

📌 Key points (3–5)

  • Tangent plane equation: For z = f(x, y), the tangent plane at (x₀, y₀, z₀) is z - z₀ = (∂f/∂x)₀(x - x₀) + (∂f/∂y)₀(y - y₀), matching the surface's slopes in both directions.
  • Total differential: The differential dz = (∂z/∂x)dx + (∂z/∂y)dy represents exact movement on the tangent plane and approximates movement on the curved surface.
  • Linear approximation accuracy: The approximation f(x, y) ≈ f(x₀, y₀) + (∂f/∂x)₀(x - x₀) + (∂f/∂y)₀(y - y₀) has error of order (x - x₀)² + (y - y₀)², coming from ignored quadratic terms.
  • Common confusion: Don't confuse the differential dz (exact on the plane) with Δz (actual change on the surface); the difference is second-order small.
  • Newton's method extension: For two equations g(x, y) = 0 and h(x, y) = 0, linear approximations at each step lead to a system of linear equations that iteratively approach the solution.

📐 Building the tangent plane

📏 What the tangent plane represents

The tangent plane at (x₀, y₀, z₀) has the same slopes as the surface z = f(x, y).

  • Over a short range, a smooth surface looks flat, just as a curve looks straight.
  • The plane "balances on" the surface, touching it at the point of tangency.
  • All tangent lines to curves through the point lie in this tangent plane.
  • The plane equation is linear: z - z₀ = (∂f/∂x)₀(x - x₀) + (∂f/∂y)₀(y - y₀).

🧭 The normal vector

  • The normal vector N perpendicular to the tangent plane has components: ((∂f/∂x)₀, (∂f/∂y)₀, -1).
  • For a sphere, the normal vector points outward along the radius.
  • Any tangent vector T to a curve on the surface satisfies T · N = 0 (perpendicular).
  • Example: For z = 14 - x² - y² at (1, 2, 9), the derivatives are -2x and -2y, giving -2 and -4 at the point, so N = (-2, -4, -1).

🔄 Implicit form for surfaces

When the surface is given as F(x, y, z) = c rather than z = f(x, y):

  • Tangent plane equation: (∂F/∂x)₀(x - x₀) + (∂F/∂y)₀(y - y₀) + (∂F/∂z)₀(z - z₀) = 0.
  • Normal vector: N = ((∂F/∂x)₀, (∂F/∂y)₀, (∂F/∂z)₀).
  • No need to solve for z first; use implicit differentiation directly.
  • Example: For a sphere x² + y² + z² = 14, differentiating gives 2x + 2z(∂z/∂x) = 0, so ∂z/∂x = -x/z.

📊 Differentials and sensitivity

📏 Total differential definition

The total differential measures movement on the tangent plane:

  • dz = (∂z/∂x)dx + (∂z/∂y)dy, where dx and dy are small changes in x and y.
  • This holds exactly on the plane; approximately on the surface.
  • The differential shows how sensitive z is to changes in each variable.
  • Example: For cylinder volume V = πr²h, the differential dV = 2πrh dr + πr² dh represents shell volume plus layer volume.

🔍 Comparing sensitivities

Partial derivatives reveal which variable has more impact:

  • Larger partial derivative means greater sensitivity to that variable.
  • Example: For V = πr²h at r = h = 1, we have ∂V/∂r = 2πrh = 2π and ∂V/∂h = πr² = π, so V is twice as sensitive to changes in r.
  • For a tall cylinder, radius changes matter more; for a flat cylinder (like a penny), height changes matter more.

📐 Area and volume applications

Example: Triangle area A = (1/2)ab sin θ has three variables:

  • ∂A/∂a = (1/2)b sin θ
  • ∂A/∂b = (1/2)a sin θ
  • ∂A/∂θ = (1/2)ab cos θ
  • Total differential: dA = (1/2)b sin θ da + (1/2)a sin θ db + (1/2)ab cos θ dθ

🎯 Linear approximation

📍 The approximation formula

Near (x₀, y₀), the function f(x, y) is approximately:

  • f(x, y) ≈ f(x₀, y₀) + (∂f/∂x)₀(x - x₀) + (∂f/∂y)₀(y - y₀)
  • This is the same as f ≈ f₀ + fₓΔx + fᵧΔy.
  • The right side is a linear function fₗ(x, y).
  • At the basepoint, f and fₗ have the same value and same slopes.

⚠️ Error in approximation

The error comes from curvature (second derivatives):

  • |f(x, y) - fₗ(x, y)| ≤ M[(x - x₀)² + (y - y₀)²]
  • M bounds the second derivatives fₓₓ, fₓᵧ, and fᵧᵧ along the path.
  • The error is "second order"—quadratic in the distance moved.
  • Example: For cylinder volume from r = h = 1.0 to 1.1, the linear approximation gives dV = 0.300π ≈ 0.942 while the actual change is ΔV = 0.331π ≈ 1.040, with error 0.031π ≈ 0.097.
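The cylinder example can be reproduced directly. This sketch compares the differential dV = 2πrh dr + πr² dh with the actual change in V = πr²h; both results are printed as multiples of π, and the function names are illustrative.

```python
import math

def volume(r, h):
    """Cylinder volume V = pi r^2 h."""
    return math.pi * r * r * h

def linear_change(r, h, dr, dh):
    """Total differential dV = 2 pi r h dr + pi r^2 dh."""
    return 2 * math.pi * r * h * dr + math.pi * r * r * dh

dV = linear_change(1.0, 1.0, 0.1, 0.1)
delta_V = volume(1.1, 1.1) - volume(1.0, 1.0)
print(dV / math.pi, delta_V / math.pi)  # about 0.300 and 0.331
```

The gap of about 0.031π is the second-order error: it comes from the quadratic and cubic terms in (1.1)³ − 1 that the tangent plane ignores.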

🔺 Distance function example

For r = √(x² + y²):

  • Partial derivatives: ∂r/∂x = x/r and ∂r/∂y = y/r
  • Linear approximation: r ≈ r₀ + (x₀/r₀)Δx + (y₀/r₀)Δy
  • Don't confuse: change of distance is linear in Δx and Δy (first approximation), even though distance itself involves a square root.
  • Breakdown: At (0, 0) the derivatives x/r and y/r are discontinuous; the cone has a sharp point with no tangent plane.

🔧 Newton's method for two equations

🎯 The iteration scheme

To solve g(x, y) = 0 and h(x, y) = 0 simultaneously:

  • Start from current guess (xₙ, yₙ).
  • Replace g and h by their linear approximations.
  • Set approximations to zero: (∂g/∂x)Δx + (∂g/∂y)Δy = -g(xₙ, yₙ) and (∂h/∂x)Δx + (∂h/∂y)Δy = -h(xₙ, yₙ).
  • Solve these two linear equations for steps Δx and Δy.
  • Next guess: xₙ₊₁ = xₙ + Δx, yₙ₊₁ = yₙ + Δy.

🌊 Basins of attraction

Different starting points lead to different outcomes:

  • Each solution has a basin of attraction—all starting points that converge to it.
  • There is also a basin leading to infinity (divergence).
  • The basins can be completely mixed together as fractals.
  • Chaos appears on borderlines between basins.
  • Example: For x³ - y = 0 and y³ - x = 0, starting at (2, 1) converges to (1, 1); starting at (-1/2, 0) converges to (0, 0); starting at (1, 0) diverges.
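The iteration scheme can be sketched for the example system x³ − y = 0, y³ − x = 0. This is a minimal version that solves the 2×2 linear system by Cramer's rule; the stopping tolerance and step limit are assumptions, and no safeguard is included for diverging starts, where the iterates grow without bound.

```python
def newton_2d(x, y, steps=50, tol=1e-12):
    """Newton's method for g = x^3 - y = 0 and h = y^3 - x = 0."""
    for _ in range(steps):
        g, h = x**3 - y, y**3 - x
        if abs(g) + abs(h) < tol:
            break
        # Jacobian entries: dg/dx, dg/dy, dh/dx, dh/dy
        gx, gy, hx, hy = 3 * x * x, -1.0, -1.0, 3 * y * y
        det = gx * hy - gy * hx
        if det == 0:
            raise ZeroDivisionError("singular Jacobian")
        # Solve gx*dx + gy*dy = -g and hx*dx + hy*dy = -h
        dx = (-g * hy + h * gy) / det
        dy = (-h * gx + g * hx) / det
        x, y = x + dx, y + dy
    return x, y

print(newton_2d(2.0, 1.0))   # approaches (1, 1)
print(newton_2d(-0.5, 0.0))  # approaches (0, 0)
```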

⚡ Convergence properties

  • Near a solution, convergence is fast: error is squared at each step.
  • Each step doubles the number of correct digits.
  • Far from solutions, the method can diverge or exhibit chaotic behavior.
  • The Jacobian matrix J contains all four partial derivatives and determines the linear system at each step.

💼 Economic application example

📈 Supply and demand equilibrium

The equilibrium price P occurs where supply equals demand:

  • Demand line: p = -0.2q + 40 (negative slope—price up, demand down)
  • Supply line: p = sq + t (positive slope s—price up, supply up)
  • Equilibrium: P and Q satisfy both equations simultaneously.

🔄 Sensitivity to parameters

P is a function of supply parameters s and t, even without an explicit formula:

  • Take implicit derivatives of the supply = demand equation.
  • From P = -0.2Q + 40 = sQ + t, differentiate with respect to s: Pₛ = -0.2Qₛ = sQₛ + Q.
  • Differentiate with respect to t: Pₜ = -0.2Qₜ = sQₜ + 1.
  • At s = 0.4, t = 10, P = 30, Q = 50, we find Pₛ = 50/3 and Pₜ = 1/3.
  • Interpretation: The derivative of the solution comes from the derivative of the equation, even when we cannot solve explicitly.

This example shows that linear approximation applies even when following the intersection of changing curves, not just points on a fixed surface.
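As a numerical cross-check, the sketch below computes Pₛ and Pₜ from the implicitly differentiated equations and compares them with finite differences. The function names and the step `eps` are assumptions made for illustration; the explicit formula for Q is available here only because both lines are straight.

```python
def equilibrium(s, t):
    """Solve -0.2*Q + 40 = s*Q + t for Q, then return (P, Q)."""
    Q = (40 - t) / (s + 0.2)
    P = -0.2 * Q + 40
    return P, Q


def sensitivities(s, t):
    """P_s and P_t from the implicitly differentiated equations."""
    _, Q = equilibrium(s, t)
    Q_s = -Q / (s + 0.2)      # from -0.2*Q_s = s*Q_s + Q
    Q_t = -1 / (s + 0.2)      # from -0.2*Q_t = s*Q_t + 1
    return -0.2 * Q_s, -0.2 * Q_t


s, t, eps = 0.4, 10.0, 1e-6
P_s, P_t = sensitivities(s, t)
# Centered finite differences of the equilibrium price
fd_s = (equilibrium(s + eps, t)[0] - equilibrium(s - eps, t)[0]) / (2 * eps)
fd_t = (equilibrium(s, t + eps)[0] - equilibrium(s, t - eps)[0]) / (2 * eps)
print(P_s, fd_s)   # both close to 50/3
print(P_t, fd_t)   # both close to 1/3
```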

78

Directional Derivatives and Gradients

13.4 Directional Derivatives and Gradients

🧭 Overview

🧠 One-sentence thesis

The directional derivative extends the concept of slope to any direction on a surface, and the gradient vector encodes all directional information while pointing in the direction of steepest ascent.

📌 Key points (3–5)

  • What directional derivatives measure: the rate of change of a function f(x,y) in any direction u, not just along the x or y axes.
  • The gradient vector: grad f = (∂f/∂x, ∂f/∂y) contains all the information needed to compute directional derivatives in every direction via the dot product.
  • Geometric meaning: the gradient points in the direction of steepest climb, its length equals the maximum slope, and it is perpendicular to level curves.
  • Common confusion: the gradient lives in the base plane (xy-plane), not in 3D space with the surface—don't confuse grad f with the normal vector N to the surface.
  • Extension to curves: on curved paths, df/dt measures rate of change with respect to time (depends on speed), while df/ds measures slope with respect to arc length (independent of speed).

📐 What directional derivatives are

📐 The basic idea

  • Partial derivatives ∂f/∂x and ∂f/∂y give slopes in the x and y directions only.
  • But a surface z = f(x,y) has slopes in all directions.
  • A directional derivative measures the slope when moving in an arbitrary direction u (a unit vector).

🔢 The definition

The derivative of f in the direction u at point P is: D_u f(P) = lim (s→0) [f(P + us) - f(P)] / s

  • Start at point P = (x₀, y₀).
  • Move a distance s in direction u = (u₁, u₂).
  • Compute the change Δf and divide by s.
  • Take the limit as s approaches zero.

Example: For f = xy at (1,1) in the 45° direction u = (1/√2, 1/√2), the step to (1 + s/√2, 1 + s/√2) gives z = (1 + s/√2)² = 1 + √2·s + (1/2)s². The change is Δz = √2·s + (1/2)s², so Δz/s = √2 + (1/2)s approaches √2 as s→0.

⚡ The fast formula

D_u f = (∂f/∂x)u₁ + (∂f/∂y)u₂

  • This comes from the linear approximation: Δf ≈ (∂f/∂x)Δx + (∂f/∂y)Δy.
  • Since Δx = u₁s and Δy = u₂s, divide by s to get the formula.
  • Key insight: slopes in all directions are determined by slopes in just two directions (x and y).
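The two routes to D_u f can be compared numerically. The sketch below uses f = xy at (1, 1) in the 45° direction; the helper names and the step size `s = 1e-6` are illustrative assumptions.

```python
import math


def directional_limit(f, p, u, s=1e-6):
    """Difference quotient [f(P + s*u) - f(P)] / s for a small step s."""
    return (f(p[0] + s * u[0], p[1] + s * u[1]) - f(p[0], p[1])) / s


def directional_formula(fx, fy, u):
    """Fast formula D_u f = f_x*u1 + f_y*u2."""
    return fx * u[0] + fy * u[1]


f = lambda x, y: x * y
u = (1 / math.sqrt(2), 1 / math.sqrt(2))
slow = directional_limit(f, (1.0, 1.0), u)
fast = directional_formula(1.0, 1.0, u)   # f_x = y = 1 and f_y = x = 1 at (1, 1)
print(slow, fast, math.sqrt(2))
```

The difference quotient is off by about s/2, which is the leftover (1/2)s² term divided by s.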

🎯 The gradient vector

🎯 Definition and notation

The gradient of f(x,y) is the vector: grad f = ∇f = (∂f/∂x)i + (∂f/∂y)j

  • In three dimensions, add the component (∂f/∂z)k.
  • The symbol ∇ is read as "grad" or "del."
  • The gradient is a vector in the base plane (xy-plane), not in 3D space.

🔗 Connection to directional derivatives

The directional derivative is the dot product:

D_u f = (grad f) · u

  • This shows that grad f encodes all directional information.
  • Different directions u extract different components via the dot product.

Example: For f = 3x + y + 1, grad f = (3, 1). In direction u = (1/√2, 1/√2), D_u f = 3/√2 + 1/√2 = 4/√2.

📍 Where the gradient lives

| Vector | Dimension | Purpose |
|---|---|---|
| grad f = (f_x, f_y) | 2D (base plane) | Points in steepest climb direction |
| N = (f_x, f_y, -1) | 3D (space) | Perpendicular to the surface z = f(x,y) |

Don't confuse: The gradient shares the derivatives f_x and f_y with the normal vector N, but grad f is in the xy-plane while N points away from the surface in 3D space.

⛰️ Geometric interpretation

⛰️ Steepest direction

The gradient points in the direction of maximum increase of f:

  • The slope D_u f is largest when u is parallel to grad f.
  • That maximum slope equals |grad f| = √(f_x² + f_y²).
  • To climb a mountain most steeply, follow the gradient direction.

Example: For f = 3x + y + 1, grad f = (3, 1) with length √10. The steepest direction is u = (3/√10, 1/√10), and the maximum slope is √10.

🧭 Level direction

The gradient is perpendicular to level curves (contour lines):

  • On a level curve, f is constant, so D_u f = 0.
  • This means (grad f) · u = 0, so grad f ⊥ u.
  • Contour lines on a map run perpendicular to the gradient.

Example: For f = 3x + y + 1, the level direction at any point is proportional to (1, -3), which is perpendicular to grad f = (3, 1). Check: 3·1 + 1·(-3) = 0.

📏 Length as steepness

The magnitude |grad f| measures how steep the surface is:

  • Larger |grad f| means steeper slope.
  • On a flat region, grad f is the zero vector.
  • On a cone z = √(x² + y²), |grad z| = 1 everywhere except at the tip (0, 0), where the gradient does not exist (constant slope).

🛤️ Derivatives along curved paths

🛤️ Two types of derivatives

When moving along a curved path R(t) = x(t)i + y(t)j:

| Derivative | Formula | Meaning |
|---|---|---|
| df/dt | (grad f) · v | Rate of change with respect to time |
| df/ds | (grad f) · T | Slope with respect to arc length |

  • v = (dx/dt, dy/dt) is the velocity vector (any speed).
  • T = v/|v| is the unit tangent vector (speed 1).
  • If you move faster along the same path, df/dt grows in proportion to the speed, but df/ds stays the same.

🎢 The chain rule form

df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)

This is the multivariable chain rule:

  • It combines the gradient (∂f/∂x, ∂f/∂y) with the velocity (dx/dt, dy/dt).
  • The dot product gives the total rate of change.

Example: For f = xy on the path x = t², y = t, the velocity is v = (2t, 1). At t = 1, v = (2, 1) and grad f = (y, x) = (1, 1). So df/dt = 1·2 + 1·1 = 3.

⚖️ Speed vs. slope

  • df/dt depends on speed: moving faster increases the rate of change.
  • df/ds is independent of speed: it measures the intrinsic slope of f in the path direction.
  • On a straight path at unit speed, df/dt = df/ds = D_u f.

Example: On a circle x = cos t, y = sin t (speed 1), for f = xy, both df/dt and df/ds equal cos²t - sin²t because ds/dt = 1.
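The contrast between df/dt and df/ds can be checked with a short sketch. It reuses grad f = (1, 1) and v = (2, 1) from the path x = t², y = t at t = 1; the doubled velocity (4, 2) is a hypothetical second traveler added here for comparison.

```python
import math


def rates(grad, velocity):
    """Return (df/dt, df/ds) = (grad . v, grad . T) with T = v/|v|."""
    df_dt = grad[0] * velocity[0] + grad[1] * velocity[1]
    speed = math.hypot(velocity[0], velocity[1])
    return df_dt, df_dt / speed


grad = (1.0, 1.0)                  # grad f = (y, x) at the point (1, 1)
slow = rates(grad, (2.0, 1.0))     # velocity of x = t^2, y = t at t = 1
fast = rates(grad, (4.0, 2.0))     # same direction, twice the speed
print(slow)
print(fast)
```

Doubling the speed doubles df/dt, while df/ds is unchanged because T is the same unit vector in both cases.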

🔬 Advanced topics

🔬 Implicit functions

When z is given implicitly by F(x, y, z) = 0:

grad z = (-F_x/F_z, -F_y/F_z)

  • Differentiate F(x, y, z) = 0 to get F_x dx + F_y dy + F_z dz = 0.
  • Solve for dz to get the gradient components.
  • This avoids solving explicitly for z = f(x,y).

Example: For F = x² + y² - z² = 0 (a cone), F_x = 2x, F_y = 2y, F_z = -2z. So grad z = (x/z, y/z), which points radially outward.

🌐 Coordinate-free view

The gradient can be defined without choosing axes:

  • Direction: grad f points where df/ds is largest.
  • Length: |grad f| equals that maximum slope.
  • The key relation is: (change in f) ≈ (grad f) · (change in position).

This shows the gradient is a "tensor"—its geometric meaning (direction and length) is independent of the coordinate system, even though its formula changes with different coordinates.

79

The Chain Rule for Multivariable Functions

13.5 The Chain Rule

🧭 Overview

🧠 One-sentence thesis

The chain rule extends to multivariable functions by tracking how changes propagate through intermediate variables, enabling us to compute derivatives when functions depend on other functions of multiple variables.

📌 Key points (3–5)

  • Why chain rules matter: Functions of functions (like f(g(x,y)) or f(x(t), y(t))) appear constantly in applications, and we need systematic ways to find their derivatives.
  • Three main patterns: (1) f(z) where z = g(x,y); (2) f(x,y) where x = x(t), y = y(t); (3) f(x,y) where x and y each depend on multiple variables like t and u.
  • Multiple pathways: When t affects f through both x and y, the chain rule sums both contributions: (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt).
  • Common confusion: The notation ∂f/∂x can mean different things depending on which variables are held constant; subscripts like (∂f/∂x)_y clarify what stays fixed.
  • Practical impact: The chain rule connects coordinate systems (like Cartesian to polar), handles constrained variables (like the gas law PV = nRT), and enables solving partial differential equations.

🔗 Three chain rule patterns

🔗 Pattern 1: f(g(x,y)) – one inside function

When f depends on g, and g depends on x and y, the x-derivative is: ∂f/∂x = (df/dg)(∂g/∂x), and similarly for y.

  • How it works: Change x by dx (y constant) → g changes by (∂g/∂x)dx → f changes by (df/dg)(∂g/∂x)dx.
  • Example: Every function f(x + cy) satisfies the one-way wave equation ∂f/∂y = c(∂f/∂x), because ∂g/∂x = 1 and ∂g/∂y = c for g = x + cy.
  • This pattern handles the "outside function" depending on a single composite "inside function."

🔗 Pattern 2: f(x(t), y(t)) – path through space

When both x and y depend on t, the total derivative is: df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt).

  • Why two terms: t influences f through two pathways—via x and via y—so both contributions add.
  • Example: Driving to Florida, if temperature f increases 0.05 degrees per mile south (x direction) and you drive 70 mph south, the rate is (0.05)(70) = 3.5 degrees per hour.
  • Don't confuse: This is df/dt (total derivative), not ∂f/∂t (partial derivative holding other variables fixed).

🔗 Pattern 3: f(x(t,u), y(t,u)) – multiple independent variables

When x and y each depend on t and u, there are two chain rules: ∂f/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t), and similarly for u.

  • We write ∂f/∂t (not df/dt) because u is also present.
  • Example: Converting to polar coordinates, x = r cos θ and y = r sin θ, so ∂f/∂r = (∂f/∂x)(cos θ) + (∂f/∂y)(sin θ).
  • This pattern connects different coordinate systems and enables expressing equations in new variables.

🌡️ Direct time dependence

🌡️ When f depends explicitly on t

If f = f(x, y, t) where x and y also depend on t, the full chain rule adds a direct term:

df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) + ∂f/∂t

  • The extra term: ∂f/∂t captures changes in f due to t itself, independent of x and y.
  • Example: Temperature depends on location (x, y) and time of day; driving adds the day/night cycle term to the rate of temperature change.
  • Total vs partial: df/dt is the total derivative from all causes; ∂f/∂t is only the direct effect of t.

🔄 Coordinate transformations

🔄 Polar coordinates example

Starting from f(x, y) with x = r cos θ and y = r sin θ:

∂f/∂θ = (∂f/∂x)(−r sin θ) + (∂f/∂y)(r cos θ)

  • Second derivatives: Applying the chain rule twice gives f_θθ in terms of f_xx, f_xy, f_yy (the formula is lengthy).
  • Laplace's equation: The combination f_xx + f_yy = 0 transforms to f_rr + (1/r)f_r + (1/r²)f_θθ = 0 in polar coordinates.
  • Messy formulas often signal the wrong question; special combinations like f_xx + f_yy are more natural than individual derivatives.

🔄 The paradox of ∂r/∂x

Two calculations seem to contradict:

  • From r = √(x² + y²), we get ∂r/∂x = x/r = cos θ (holding y constant).
  • From r = x/cos θ, we get ∂r/∂x = 1/cos θ (holding θ constant).

Resolution: The key question is "which variable is held constant?"

  • (∂r/∂x)_y = cos θ means moving horizontally (constant y).
  • (∂r/∂x)_θ = 1/cos θ means moving radially (constant θ).
  • Don't confuse: (∂r/∂x)_y = cos θ is not 1/(∂x/∂r)_θ = 1/cos θ; the simple reciprocal rule holds only when the same variable is held constant in both derivatives.
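Both answers to the paradox can be reproduced with finite differences. The sketch below is illustrative only: the function names, the step `h`, and the sample angle θ = π/3 are assumptions.

```python
import math


def dr_dx_const_y(x, y, h=1e-6):
    """Move horizontally: y fixed, r = sqrt(x^2 + y^2)."""
    r = lambda x_: math.hypot(x_, y)
    return (r(x + h) - r(x - h)) / (2 * h)


def dr_dx_const_theta(x, theta, h=1e-6):
    """Move radially: theta fixed, r = x / cos(theta)."""
    r = lambda x_: x_ / math.cos(theta)
    return (r(x + h) - r(x - h)) / (2 * h)


theta = math.pi / 3                         # sample angle (arbitrary choice)
x, y = math.cos(theta), math.sin(theta)     # point on the unit circle, r = 1
print(dr_dx_const_y(x, y), math.cos(theta))               # both near 0.5
print(dr_dx_const_theta(x, theta), 1 / math.cos(theta))   # both near 2
```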

⚠️ Non-independent variables

⚠️ When variables are constrained

If x, y, z satisfy a relation (like the gas law PV = nRT), they are not independent—changing one forces others to change.

Subscript notation: Use (∂f/∂x)_y to specify that y is held constant (not z).

Example: For f = 3x + 2y + z on the plane z = 4x + y:

  • (∂f/∂x)_y = 7 (holding y constant, z must change by 4dx)
  • (∂f/∂x)_z = −5 (holding z constant, y must change)
  • The plain ∂f/∂x = 3 assumes both y and z are fixed, which the constraint forbids.

⚠️ Chain rule with constraints

When y is held constant but z depends on x via a constraint:

(∂f/∂x)_y = ∂f/∂x + (∂f/∂z)(∂z/∂x)

  • The first term is the direct effect of x on f.
  • The second term accounts for z changing as x changes (to maintain the constraint).
  • Example: For f = 3x + 2y + z with z = 4x + y, we get 3 + (1)(4) = 7.
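The two constrained derivatives of f = 3x + 2y + z on the plane z = 4x + y can be confirmed numerically. In this sketch the helper names, the step `h`, and the sample point are illustrative assumptions.

```python
def f(x, y, z):
    return 3 * x + 2 * y + z


def df_dx_hold_y(x, y, h=1e-6):
    """y fixed; the plane z = 4x + y forces z to follow x."""
    F = lambda x_: f(x_, y, 4 * x_ + y)
    return (F(x + h) - F(x - h)) / (2 * h)


def df_dx_hold_z(x, z, h=1e-6):
    """z fixed; the plane forces y = z - 4x to follow x."""
    F = lambda x_: f(x_, z - 4 * x_, z)
    return (F(x + h) - F(x - h)) / (2 * h)


print(df_dx_hold_y(1.0, 2.0))   # close to 7
print(df_dx_hold_z(1.0, 6.0))   # close to -5
```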

⚠️ Implicit differentiation revisited

The chain rule explains the implicit differentiation formula from single-variable calculus:

If F(x, y) = 0, then dy/dx = −(∂F/∂x)/(∂F/∂y)

  • Moving along the curve F = 0, the total change is F_x dx + F_y dy = 0.
  • Solving for dy/dx gives the ratio of partial derivatives.

🎯 Second derivatives and applications

🎯 Second derivatives along a line

Moving at angle θ with speed 1 (so x_t = cos θ, y_t = sin θ):

f_tt = f_xx cos²θ + 2f_xy cos θ sin θ + f_yy sin²θ

  • If f_xx, f_xy, f_yy are all bounded by M, then |f_tt| ≤ 2M along any line.
  • This bound is needed for error estimates in linear approximation.
  • Curved paths add extra terms from the path's curvature (not just the function's curvature).

🎯 Wave and Laplace equations

The chain rule reveals that certain combinations of derivatives are coordinate-independent:

  • Wave equation: f_tt = c²f_xx is satisfied by any f(x + ct) or f(x − ct).
  • Laplace's equation: f_xx + f_yy = 0 transforms systematically to other coordinates.
  • These are partial differential equations—the unknowns have multiple independent variables, unlike ordinary differential equations.

80

Maxima, Minima, and Saddle Points

13.6 Maxima, Minima, and Saddle Points

🧭 Overview

🧠 One-sentence thesis

Finding extreme values of multivariable functions requires setting all partial derivatives to zero at interior points, checking boundaries separately, and using second derivatives to distinguish minima from maxima and saddle points.

📌 Key points (3–5)

  • Stationary points: Interior extrema occur where all partial derivatives equal zero (∂f/∂x = 0 and ∂f/∂y = 0), creating a horizontal tangent plane.
  • Three candidates for extrema: stationary points (zero derivatives), rough points (no derivative exists), and boundary points (edges of the allowed region).
  • Second derivative test: At a stationary point, examine f_xx, f_xy, and f_yy; the signs of a = f_xx and of the combination ac - b² (where b = f_xy, c = f_yy) determine whether the point is a minimum, maximum, or saddle point.
  • Common confusion: Positive second derivatives f_xx > 0 and f_yy > 0 do NOT guarantee a minimum—you must also check that f_xx·f_yy > (f_xy)²; otherwise the cross-term dominates and creates a saddle point.
  • Boundary optimization: When the domain is restricted, the minimum or maximum often lies on the boundary, requiring separate one-dimensional optimization along the boundary curve.

🎯 Finding stationary points

🎯 The zero-derivative condition

Stationary point: An interior point where ∂f/∂x = 0 and ∂f/∂y = 0 (equivalently, grad f = 0).

  • At a minimum or maximum, a nonzero derivative would tilt the tangent plane, so f would decrease in one direction and increase in the opposite one; neither a minimum nor a maximum is possible at such a point.
  • The reasoning reduces to one variable: along any line through the point (say y = y₀), the function must have df/dx = 0 at the minimum.
  • For three variables, add the third equation ∂f/∂z = 0.

📐 Solving for the stationary point

Example: For f(x, y) = x² + xy + y² - x - y + 1:

  • Set f_x = 2x + y - 1 = 0
  • Set f_y = x + 2y - 1 = 0
  • Solve the system: x₀ = 1/3, y₀ = 1/3

Key insight: Quadratic functions produce linear equations for the stationary point, making them straightforward to solve.

🏗️ Steiner's problem

Setup: Find the point nearest to three given corners of a triangle.

Different distance measures yield different solutions:

  • Sum of squared distances (d₁² + d₂² + d₃²): The minimum is at the centroid (average of the three corners), x = (x₁ + x₂ + x₃)/3.
  • Sum of distances (d₁ + d₂ + d₃): The minimum is at the Steiner point where roads to the three corners meet at 120° angles.

The gradient of distance d₁ is a unit vector pointing away from corner 1. At the Steiner point, three unit vectors sum to zero, which only happens when they form 120° angles.

Exception: If any corner has an angle exceeding 120°, that corner itself is the minimum point.

🚧 Boundary optimization

🚧 When boundaries matter

Most applications restrict the domain (e.g., x ≥ 0, or x² + y² ≤ 1). Three possibilities for extrema:

  1. Stationary point (f_x = 0, f_y = 0)
  2. Rough point (derivative doesn't exist)
  3. Boundary point

The excerpt notes that boundaries contain "about 40% of the minima and 80% of the work."

🔄 Optimization along a boundary

Example: Minimize f(x, y) = x² + xy + y² - x - y + 1 on the circle x² + y² = 1.

Two approaches:

  • Parametric: Set x = cos t, y = sin t, then minimize f(t) as a one-variable problem.
  • Substitution: Express y in terms of x using the constraint (leads to square roots).

The parametric approach is cleaner: df/dt = 0 locates candidates on the boundary.

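Don't confuse: The unconstrained minimum at (1/3, 1/3) lies inside the disk x² + y² ≤ 1 but not on the circle x² + y² = 1, so it is not a candidate when the problem is restricted to the circle. In general, a stationary point that falls outside the allowed region is discarded and the boundary is searched instead.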
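The parametric approach can be sketched as a scan over t. This is a rough illustration under stated assumptions: a uniform grid of 3600 angles stands in for solving df/dt = 0 exactly, and the function name `min_on_circle` is made up for the example.

```python
import math


def f(x, y):
    return x**2 + x * y + y**2 - x - y + 1


def min_on_circle(n=3600):
    """Scan f(cos t, sin t) on a uniform grid of n angles; return (smallest f, its t)."""
    best = None
    for i in range(n):
        t = 2 * math.pi * i / n
        value = f(math.cos(t), math.sin(t))
        if best is None or value < best[0]:
            best = (value, t)
    return best


value, t = min_on_circle()
print(value, (math.cos(t), math.sin(t)))
print(f(1 / 3, 1 / 3))   # value at the interior stationary point, for comparison
```

On the circle, f reduces to 2 + (1/2) sin 2t - cos t - sin t, whose smallest value is 1, attained at both (1, 0) and (0, 1). The interior stationary point gives the smaller value 2/3.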

🔍 The second derivative test

🔍 Distinguishing minima, maxima, and saddles

For a quadratic f(x, y) = ax² + 2bxy + cy² at the origin:

| Condition | Type | Reason |
|---|---|---|
| a > 0 and ac > b² | Minimum | Both squared terms positive after completing the square |
| a < 0 and ac > b² | Maximum | Both squared terms negative |
| ac < b² | Saddle point | Squared terms have opposite signs |

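For a general function, the test uses the second derivatives at the stationary point (for the quadratic above they equal 2a, 2b, and 2c, which leaves both sign conditions unchanged):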

  • a = f_xx (second derivative in x direction)
  • b = f_xy (mixed derivative)
  • c = f_yy (second derivative in y direction)

⚠️ Why f_xx > 0 and f_yy > 0 aren't enough

Example: f(x, y) = x² + 10xy + y² has f_xx = 2, f_yy = 2, f_xy = 10.

  • Both f_xx and f_yy are positive (curves up in x and y directions).
  • But ac - b² = (2)(2) - (10)² = -96 < 0, so this is a saddle point.
  • At (1, -1), f = 1 - 10 + 1 = -8 < 0, confirming the graph dips below the xy-plane.

The size of f_xy matters, not just its sign. When the cross-term dominates, it creates a saddle.

📊 General functions beyond quadratics

For any function f(x, y), evaluate the second derivatives at the stationary point:

  • a = f_xx, b = f_xy, c = f_yy (all at the stationary point)
  • Apply the same test: compare ac with b²

Why this works: Near a stationary point, the Taylor series begins with quadratic terms (linear terms vanish because first derivatives are zero). The quadratic part determines the local shape.

Example: f(x, y) = eˣ - x - cos y at (0, 0)

  • f_xx = eˣ = 1, f_yy = cos y = 1, f_xy = 0 at the origin
  • ac - b² = (1)(1) - 0 = 1 > 0 and a > 0 → minimum
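The test is mechanical enough to write as a small function. The sketch below is one possible encoding; the name `classify` and the returned strings are assumptions, and the equality case ac = b² is reported as inconclusive.

```python
def classify(a, b, c):
    """Second derivative test with a = f_xx, b = f_xy, c = f_yy at a stationary point."""
    disc = a * c - b * b
    if disc < 0:
        return "saddle point"
    if disc == 0:
        return "inconclusive"
    # disc > 0 forces a != 0, so the sign of a decides
    return "minimum" if a > 0 else "maximum"


print(classify(2, 10, 2))   # x^2 + 10xy + y^2 at (0, 0)
print(classify(1, 0, 1))    # e^x - x - cos y at (0, 0)
```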

📈 Taylor series and higher-order terms

📈 Structure of the Taylor series

Around the basepoint (0, 0):

f(x, y) = f(0, 0) + x·(∂f/∂x)₀ + y·(∂f/∂y)₀ + (x²/2)·(∂²f/∂x²)₀ + xy·(∂²f/∂x∂y)₀ + (y²/2)·(∂²f/∂y²)₀ + ...

General term: (xⁿyᵐ)/(n!m!) times the mixed derivative evaluated at (0, 0).

🎭 What each part means

  • Constant term: The function value at the basepoint (not important for finding extrema).
  • Linear terms: Determine the tangent plane; controlled by first derivatives.
  • Quadratic terms: Take over at stationary points where linear terms vanish; determine concavity.
  • Higher terms: Too small to matter near the stationary point.

At a stationary point, the first derivatives are zero, so the series starts with the constant and jumps to quadratics—this is why second derivatives decide the type of extremum.

🔬 Special case: ac = b²

When f_xx·f_yy = (f_xy)², the test is inconclusive (analogous to a one-dimensional inflection point). Higher derivatives must decide, but this is rare in practice.

🖥️ Numerical methods

🖥️ Newton's method for optimization

To solve f_x = 0 and f_y = 0 numerically:

  • At current point (xₙ, yₙ), compute all second derivatives
  • Solve linear equations for steps Δx and Δy:
    • (f_xx)Δx + (f_xy)Δy = -f_x(xₙ, yₙ)
    • (f_xy)Δx + (f_yy)Δy = -f_y(xₙ, yₙ)
  • Update: xₙ₊₁ = xₙ + Δx, yₙ₊₁ = yₙ + Δy

This requires computing second derivatives.
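A single Newton step can be sketched as follows, using the earlier example f = x² + xy + y² - x - y + 1. The function name `newton_step`, the starting guess (5, -3), and the use of Cramer's rule are illustrative choices.

```python
def newton_step(grad, hess, x, y):
    """One Newton step toward f_x = 0 and f_y = 0.

    grad(x, y) returns (f_x, f_y); hess(x, y) returns (f_xx, f_xy, f_yy).
    """
    fx, fy = grad(x, y)
    fxx, fxy, fyy = hess(x, y)
    det = fxx * fyy - fxy * fxy
    # Cramer's rule for  fxx*dx + fxy*dy = -fx  and  fxy*dx + fyy*dy = -fy
    dx = (-fx * fyy + fxy * fy) / det
    dy = (-fxx * fy + fxy * fx) / det
    return x + dx, y + dy


grad = lambda x, y: (2 * x + y - 1, x + 2 * y - 1)
hess = lambda x, y: (2.0, 1.0, 2.0)

print(newton_step(grad, hess, 5.0, -3.0))   # arbitrary starting guess
```

Because f is quadratic, its linear approximation of the gradient is exact and one step lands on the stationary point (1/3, 1/3) from any start. A non-quadratic f would need the step repeated.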

⛰️ Steepest descent alternative

When second derivatives are unavailable:

  • Move in the direction of negative gradient: Δx = -s·(∂f/∂x), Δy = -s·(∂f/∂y)
  • Choose step size s to minimize f along this direction
  • Recompute gradient at the new point and repeat

The path resembles a boulder rolling downhill, following the steepest direction at each step.
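A minimal sketch of steepest descent, again on f = x² + xy + y² - x - y + 1, is shown below. For this quadratic the best step size along the gradient has a closed form, which the code uses; for a general f that line search would itself need a numerical method. The function name, starting point, and step count are assumptions.

```python
def steepest_descent(x, y, steps=50):
    """Steepest descent on f = x^2 + xy + y^2 - x - y + 1 with exact line search."""
    for _ in range(steps):
        gx, gy = 2 * x + y - 1, x + 2 * y - 1        # gradient of f
        # Second derivative of f along the gradient direction:
        # g^T H g with H = [[2, 1], [1, 2]]
        curvature = 2 * gx * gx + 2 * gx * gy + 2 * gy * gy
        if curvature == 0:                           # zero gradient: already stationary
            break
        s = (gx * gx + gy * gy) / curvature          # step that minimizes f on this line
        x, y = x - s * gx, y - s * gy
    return x, y


print(steepest_descent(5.0, -3.0))
```

Unlike the Newton step, this takes many iterations to approach (1/3, 1/3), but it needs only first derivatives of f at each point.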

81

Constraints and Lagrange Multipliers

13.7 Constraints and Lagrange Multipliers

🧭 Overview

🧠 One-sentence thesis

When minimizing or maximizing a function subject to a constraint, the solution occurs where the gradient of the objective function is parallel to the gradient of the constraint, with the Lagrange multiplier measuring the sensitivity of the optimal value to changes in the constraint.

📌 Key points (3–5)

  • The constrained optimization problem: minimize or maximize f(x, y) while another function g(x, y) equals a fixed value k.
  • Geometric insight: at the optimum, the level curve of f is tangent to the constraint curve g = k, meaning their gradients are parallel.
  • The key equation: grad f = λ grad g, where λ (lambda) is the Lagrange multiplier—an unknown number that makes the gradients parallel.
  • Common confusion: don't set f_x = 0 and f_y = 0 as in unconstrained optimization; the constraint changes the problem entirely.
  • What λ means: the multiplier λ equals the derivative of f_min with respect to k, measuring how much the optimal value changes when the constraint changes.

🎯 The constrained optimization problem

🎯 Why constraints matter

  • In practice, we often want to minimize or maximize one function f(x, y) while another function g(x, y) is held fixed at some value k.
  • The constraint g(x, y) = k might represent limited material, funds, or energy.
  • At the absolute minimum of f(x, y) without constraints, the requirement g(x, y) = k is probably violated—that point is not allowed.
  • We cannot simply use f_x = 0 and f_y = 0 because those equations ignore the constraint g.

🔍 Trial and error approach

Example: minimize f = x² + y² subject to g = 2x + y = k.

  • Trying x = 0, y = k gives f = k².
  • Trying x = (1/2)k, y = 0 gives f = (1/4)k², which is smaller.
  • Trying x = y = (1/3)k gives f = (2/9)k², even better.
  • But how do we find the true minimum systematically?

🔄 The tangency condition

🔄 Geometric interpretation

  • The level curves of f(x, y) are curves where f equals a constant c.
  • For f = x² + y², the level curves are circles x² + y² = c.
  • When c is small, these circles don't touch the constraint line 2x + y = k.
  • As c increases, eventually a circle will just touch the constraint line.
  • The touching point is (x_min, y_min), and that value of c is f_min.

📐 What tangency means

Key fact: When the circle touches the line, they are tangent—they have the same slope, and their perpendiculars point in the same direction.

  • The direction perpendicular to f = c is given by grad f = (f_x, f_y).
  • The direction perpendicular to g = k is given by grad g = (g_x, g_y).
  • At the optimum, these two gradient vectors are parallel.
  • One gradient is a multiple of the other: grad f = λ grad g.

Don't confuse: Parallel gradients do not mean the curves are parallel; they mean the curves are tangent (touching at the point with the same slope).

🧮 The Lagrange multiplier method

🧮 The three equations

At the minimum of f(x, y) subject to g(x, y) = k:

grad f = λ grad g, which gives:

  • ∂f/∂x = λ ∂g/∂x
  • ∂f/∂y = λ ∂g/∂y
  • g(x, y) = k

Now there are three unknowns (x, y, λ) and three equations.

📝 Solving Example 1

For f = x² + y² with constraint 2x + y = k:

  • ∂f/∂x = λ ∂g/∂x becomes 2x = 2λ
  • ∂f/∂y = λ ∂g/∂y becomes 2y = λ
  • The constraint: 2x + y = k

From the first two equations: x = λ and y = (1/2)λ. Substitute into the constraint: 2x + y = 2λ + (1/2)λ = (5/2)λ = k. Therefore λ = (2/5)k, giving x = (2/5)k, y = (1/5)k, and f_min = (4/25)k² + (1/25)k² = (1/5)k².

🔢 The Lagrange function

An alternative formulation uses a single function:

L(x, y, λ) = f(x, y) − λ(g(x, y) − k)

The three derivatives of L are all zero at the solution:

  • ∂L/∂x = ∂f/∂x − λ ∂g/∂x = 0
  • ∂L/∂y = ∂f/∂y − λ ∂g/∂y = 0
  • ∂L/∂λ = −g + k = 0

Note that ∂L/∂λ = 0 automatically produces g = k—the constraint is "built in" to L.

🔬 The meaning of λ

🔬 Sensitivity interpretation

The Lagrange multiplier λ equals the derivative of f_min with respect to k.

  • If the constraint changes from k to k + Δk, then f_min changes by approximately λ Δk.
  • The multiplier λ measures the sensitivity of the optimal value to changes in the constraint.
  • In Example 1, λ = (2/5)k and f_min = (1/5)k², so df_min/dk = (2/5)k = λ. ✓
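The sensitivity claim for Example 1 can be checked numerically. In the sketch below, k = 3 and the step `dk` are arbitrary sample values, and `constrained_min` is a name invented for the illustration.

```python
def constrained_min(k):
    """Minimum of x^2 + y^2 on the line 2x + y = k, using x = lam and y = lam/2."""
    lam = 2 * k / 5            # from (5/2) * lam = k
    x, y = lam, lam / 2
    return x * x + y * y, lam


k, dk = 3.0, 1e-6
f_min, lam = constrained_min(k)
# Centered difference approximating d(f_min)/dk
slope = (constrained_min(k + dk)[0] - constrained_min(k - dk)[0]) / (2 * dk)
print(f_min, lam, slope)
```

The finite-difference slope of f_min agrees with λ = (2/5)k, here 1.2 up to rounding.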

Example: If a constraint represents available budget k, then λ tells you how much improvement you get per additional dollar.

Don't confuse: λ is not part of the final answer (x, y)—it's a tool that appears in the equations and happens to have this useful interpretation.

🎲 Multiple constraints

🎲 Two constraints in three dimensions

To minimize f(x, y, z) subject to g(x, y, z) = k₁ and h(x, y, z) = k₂:

Now there are five equations for five unknowns (x, y, z, λ₁, λ₂):

  • ∂f/∂x = λ₁ ∂g/∂x + λ₂ ∂h/∂x
  • ∂f/∂y = λ₁ ∂g/∂y + λ₂ ∂h/∂y
  • ∂f/∂z = λ₁ ∂g/∂z + λ₂ ∂h/∂z
  • g(x, y, z) = k₁
  • h(x, y, z) = k₂

🌐 Geometric picture

  • For f = x² + y² + z², the level surfaces are spheres.
  • Two constraint equations g = k₁ and h = k₂ define two surfaces.
  • The constraints keep (x, y, z) on both surfaces—therefore on the line where they meet.
  • We are finding the point on this line closest to the origin.
  • At the solution, grad f, grad g, and grad h are all perpendicular to the line.
  • Three vectors perpendicular to the same line lie in the same plane.
  • Therefore grad f is a combination: grad f = λ₁ grad g + λ₂ grad h.

📊 Example with linear constraints

Minimize x² + y² + z² when x + y + z = 9 and x + 2y + 3z = 20.

The equations become:

  • 2x = λ₁ + λ₂
  • 2y = λ₁ + 2λ₂
  • 2z = λ₁ + 3λ₂
  • x + y + z = 9
  • x + 2y + 3z = 20

Solving: λ₁ = 2, λ₂ = 2, giving (x, y, z) = (2, 3, 4) and f_min = 29.
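Since all five equations are linear, they can be solved as one 5-by-5 system. The sketch below uses a small Gauss-Jordan routine with exact fractions; the `solve` helper and the ordering of the unknowns are choices made for this illustration.

```python
from fractions import Fraction


def solve(A, b):
    """Gauss-Jordan elimination with exact fractions; assumes a unique solution."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]


# Unknowns ordered (x, y, z, lam1, lam2)
A = [
    [2, 0, 0, -1, -1],   # 2x = lam1 + lam2
    [0, 2, 0, -1, -2],   # 2y = lam1 + 2*lam2
    [0, 0, 2, -1, -3],   # 2z = lam1 + 3*lam2
    [1, 1, 1, 0, 0],     # x + y + z = 9
    [1, 2, 3, 0, 0],     # x + 2y + 3z = 20
]
b = [0, 0, 0, 9, 20]
x, y, z, lam1, lam2 = solve(A, b)
print(x, y, z, lam1, lam2, x * x + y * y + z * z)
```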

⚖️ Inequality constraints

⚖️ When constraints are inequalities

In practice, constraints often involve inequalities:

  • g ≤ k means "you cannot use more than k, but you don't have to use all of it."
  • h ≥ 0 means "this quantity cannot be negative."

Key rule: At the minimum, the multipliers must satisfy the same type of inequalities: λ₁ ≤ 0 for g ≤ k, and λ₂ ≥ 0 for h ≥ 0.

🎯 Two cases for each inequality

For a constraint g ≤ k:

  • Case 1: The minimum is inside the constraint curve (g < k). The constraint is not really constraining. This brings back f_x = 0 and f_y = 0, and λ = 0.
  • Case 2: The minimum is on the constraint curve (g = k). The constraint is active, preventing the minimum from going lower. Then λ ≠ 0.

We don't know in advance which case applies—that's what makes optimization problems interesting.

📐 Linear programming example

Minimize f = 5x + 6y subject to g = x + y = 4, h = x ≥ 0, and H = y ≥ 0, with multipliers λ₁, λ₂, λ₃ for g, h, H.

The equations are:

  • 5 = λ₁ + λ₂
  • 6 = λ₁ + λ₃

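Subtracting the equations gives λ₃ = λ₂ + 1. Because λ₂ ≥ 0, this forces λ₃ > 0, so the constraint H = y ≥ 0 must be active: y = 0 holds as an equation (not a strict inequality). Then x + y = 4 gives x = 4, and the solution is (4, 0) with f_min = 20. The other endpoint (0, 4) gives the larger value f = 24.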

Geometric insight: The constraint curve x + y = 4 is a line segment from one axis to the other. The level curves f = c are parallel lines that move outward as c increases. The first touching point is always at an endpoint or corner of the feasible region.

82

Double Integrals

14.1 Double Integrals

🧭 Overview

🧠 One-sentence thesis

Double integrals extend single-variable integration to functions of two variables by summing small pieces over a region and can be computed by splitting them into two successive single integrals.

📌 Key points (3–5)

  • What double integrals represent: the volume under a surface z = f(x,y) above a region R in the xy-plane, defined as a limit of sums of thin sticks.
  • How to compute them: use Fubini's Theorem to split a double integral into two single integrals (inner and outer), integrating first with respect to one variable, then the other.
  • Order matters for limits: the inner integral has limits that may depend on the outer variable; the outer integral has constant limits.
  • Common confusion: when reversing integration order, you must redraw the region to find new entry/exit values—vertical strips become horizontal strips with different limit expressions.
  • Applications beyond volume: double integrals compute area (when f = 1), mass (when f = density), and moments for finding centroids.

📐 Definition and setup

📐 The limit of sums definition

Double integral: the limit as rectangle dimensions approach zero of the sum of f(xᵢ, yᵢ)ΔA over all small rectangles covering region R.

  • Start by dividing the base region R into small rectangles with area ΔA = (Δx)(Δy).
  • Above each rectangle, imagine a "thin stick" with volume ≈ height × base = f(xᵢ, yᵢ)ΔA.
  • Sum all stick volumes: Σ f(xᵢ, yᵢ)ΔA.
  • Take the limit as Δx → 0 and Δy → 0 to get the exact volume (the double integral).
  • Notation: ∬ᴿ f(x,y) dA.
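The limit of sums can be watched directly on a simple case. The sketch below sums stick volumes over a rectangle; the integrand x²y on the unit square (exact volume 1/6) and the midpoint sampling are illustrative choices, not taken from the text.

```python
def riemann_sum(f, a, b, c, d, n):
    """Sum f(x_i, y_j) * dA over an n-by-n grid of rectangles on [a, b] x [c, d],
    sampling f at the midpoint of each rectangle."""
    dx, dy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        for j in range(n):
            y = c + (j + 0.5) * dy
            total += f(x, y) * dx * dy      # volume of one thin stick
    return total


f = lambda x, y: x * x * y
for n in (1, 10, 100):
    print(n, riemann_sum(f, 0.0, 1.0, 0.0, 1.0, n))
```

As n grows the sums settle toward 1/6, which is the value the double integral assigns.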

🔧 Three key properties

| Property | Formula | Meaning |
|---|---|---|
| Linearity | ∬(f + g) dA = ∬f dA + ∬g dA | Volumes add |
| Constant factor | ∬cf dA = c∬f dA | Stretch by constant |
| Region splitting | ∬ᴿ = ∬ₛ + ∬ₜ | Split R into non-overlapping pieces S and T |

🔄 Fubini's Theorem: splitting into single integrals

🔄 The fundamental technique

Fubini's Theorem: For continuous f on rectangle R, the double integral equals either iterated integral (y first then x, or x first then y).

  • Instead of computing the limit of sums directly, compute two single integrals in succession.
  • Inner integral: integrate f(x,y) with respect to one variable (say y) while treating the other (x) as constant; result is a function of x.
  • Outer integral: integrate that result with respect to x.
  • The order can be reversed: integrate with respect to x first, then y.

📝 Reading the notation

  • ∫ₐᵇ [∫_c^d f(x,y) dy] dx means:
    • Inner: integrate f(x,y) from y = c to y = d (answer depends on x).
    • Outer: integrate that answer from x = a to x = b (final answer is a number).
  • The brackets are usually omitted; dy is written inside dx.
  • Recommendation from the excerpt: compute inner and outer integrals on separate lines.

⚠️ Limits must match the geometry

  • For a rectangle base: both inner and outer limits are constants.
  • For a non-rectangular region: inner limits often depend on the outer variable.
    • Example: if R is a triangle with y going from 0 to 1 − x, then the inner integral has limits 0 and 1 − x (which depend on x), while the outer integral has constant limits 0 and 1 for x.
  • Don't confuse: inner limits cannot depend on the inner variable itself (e.g., limits on y cannot be functions of y).

🔀 Reversing the order of integration

🔀 Why and how to reverse

  • Sometimes one order is easier or even necessary (e.g., when an antiderivative is unknown in one order but simple in the other).
  • Key step: draw the region R to see the strips in both directions.
  • Vertical strips (dy first): y runs from bottom curve to top curve at fixed x; then x runs over its full range.
  • Horizontal strips (dx first): x runs from left boundary to right boundary at fixed y; then y runs over its full range.

🖼️ Example walkthrough

  • Original: ∫₀² ∫_{x²}^{2x} x³ dy dx.
    • Inner: y from x² (parabola) to 2x (line) → vertical strips.
    • Outer: x from 0 to 2.
  • Reversed: ∫₀⁴ ∫_{y/2}^{√y} x³ dx dy.
    • Inner: x from y/2 (solve y = 2x) to √y (solve y = x²) → horizontal strips.
    • Outer: y from 0 to 4 (where the line and parabola meet).
  • The integrand x³ stays the same; only limits change.
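Both orders in the walkthrough can be evaluated numerically to confirm that they agree. The sketch below uses a nested midpoint rule; the `iterated` helper and the grid size n = 400 are assumptions made for the illustration. The exact value of either integral is 32/15.

```python
import math


def iterated(f, outer_lo, outer_hi, inner_lo, inner_hi, n=400):
    """Midpoint-rule estimate of an iterated integral.

    f(outer, inner) is the integrand; inner_lo and inner_hi are
    functions of the outer variable.
    """
    h_out = (outer_hi - outer_lo) / n
    total = 0.0
    for i in range(n):
        u = outer_lo + (i + 0.5) * h_out
        lo, hi = inner_lo(u), inner_hi(u)
        h_in = (hi - lo) / n
        strip = sum(f(u, lo + (j + 0.5) * h_in) for j in range(n)) * h_in
        total += strip * h_out
    return total


# dy dx order: x is the outer variable, y runs from x^2 to 2x
dy_dx = iterated(lambda x, y: x**3, 0.0, 2.0, lambda x: x**2, lambda x: 2 * x)
# dx dy order: y is the outer variable, x runs from y/2 to sqrt(y)
dx_dy = iterated(lambda y, x: x**3, 0.0, 4.0, lambda y: y / 2, math.sqrt)
print(dy_dx, dx_dy, 32 / 15)
```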

🎯 Practical tip

Always sketch the region R. Without a figure, finding correct limits is very difficult, especially for non-rectangular regions.

🧮 Applications beyond volume

📏 Computing area

  • Set f(x,y) = 1 in the double integral: ∬ᴿ 1 dA = area of R.
  • Example: area of triangle with vertices (0,0), (1,0), (0,1) is ∫₀¹ ∫₀^{1−x} 1 dy dx = ½.

⚖️ Mass and density

Total mass M = ∬ᴿ ρ(x,y) dA, where ρ is the density function.

  • Each small rectangle has mass ≈ ρ(xᵢ, yᵢ)ΔA.
  • Sum and take the limit to get total mass.
  • Example: semicircle with density ρ = y has mass M = ∫₋₁¹ ∫₀^{√(1−x²)} y dy dx = 2/3.

📍 Moments and centroids

  • Moment about x-axis: Mₓ = ∬ᴿ y dA (y is the distance to the x-axis).
  • Moment about y-axis: Mᵧ = ∬ᴿ x dA.
  • Centroid coordinates: x̄ = Mᵧ/M, ȳ = Mₓ/M, where M is total mass (or area if density = 1).
  • Example: the semicircle centroid is at height ȳ = (2/3)/(π/2) = 4/(3π), the "average height" of points in the region.

🎓 When reversing saves the day

  • Some integrals are impossible in one order but easy in the other.
  • Example: ∫₀¹ ∫ᵧ¹ cos(x²) dx dy cannot be done directly (no elementary antiderivative for cos(x²)).
  • Reverse to ∫₀¹ ∫₀ˣ cos(x²) dy dx = ∫₀¹ x cos(x²) dx, which equals ½ sin(1) by substitution.
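
The claim can be checked numerically even in the "impossible" order, because a computer does not need an antiderivative. This is a sketch under the assumption that a simple midpoint rule (hypothetical helper `midpoint`) is accurate enough here.

```python
import math

def midpoint(g, a, b, n=400):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Original order: inner x-integral from y to 1, done numerically.
original = midpoint(lambda y: midpoint(lambda x: math.cos(x * x), y, 1.0), 0.0, 1.0)

# Reversed order: the inner y-integral from 0 to x just multiplies by x.
reversed_order = midpoint(lambda x: x * math.cos(x * x), 0.0, 1.0)

exact = 0.5 * math.sin(1.0)
print(original, reversed_order, exact)
```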

🔍 Common pitfalls and checks

❌ Typical mistakes

  • Wrong limits: forgetting that inner limits can vary with the outer variable.
  • Limits depending on the wrong variable: inner y-limits cannot depend on y itself.
  • Not sketching R: leads to incorrect entry/exit values when reversing order.

✅ How to verify your answer

  • Compute the integral in both orders; if f is continuous on R, Fubini's theorem guarantees the same result.
  • Check units: volume has units of (height) × (area); area is dimensionless if x and y are dimensionless.
  • For symmetric regions and functions, use symmetry to simplify or check (e.g., x̄ = 0 for a semicircle centered on the y-axis).
83

Change to Better Coordinates

14.2 Change to Better Coordinates

🧭 Overview

🧠 One-sentence thesis

Changing from xy-coordinates to better-suited variables (like polar or rotated coordinates) simplifies the limits of integration for double integrals, provided we correctly transform both the integrand and the area element dA using the Jacobian stretching factor J.

📌 Key points (3–5)

  • Why change coordinates: Regions like tilted squares or rings have miserable limits in xy, but simple rectangular limits in rotated or polar coordinates.
  • What changes when you substitute: Three things must change—(1) the limits of integration, (2) the area element dA, and (3) the integrand itself (e.g., x becomes a function of u,v).
  • The Jacobian determinant J: The stretching factor that converts dx dy into |J| du dv; for polar coordinates J = r, for rotation J = 1, and in general J is the 2×2 determinant of partial derivatives.
  • Common confusion—pure rotation vs general change: Rotation preserves area (J = 1 and dA = du dv), but most coordinate changes stretch or shrink area, so |J| ≠ 1.
  • Practical payoff: Even though the integrand may look worse after substitution, the limits become constants (e.g., 0 to 1, or 0 to 2π), making integration feasible.

🔄 Rotation of axes

🔄 When and why to rotate

  • A tilted square (rotated by angle α) has horrible xy-limits: for each x you need entry and exit points P and Q on slanted sides.
  • Rotating the coordinate system by α aligns the new u,v axes with the square's edges, so limits become simply 0 ≤ u ≤ 1, 0 ≤ v ≤ 1.
  • The geometry is obvious in the rotated frame, but you must convert it into algebra.

🧮 Rotation formulas

The forward and reverse transformations are:

From xy to uv | From uv to xy
u = x cos α + y sin α | x = u cos α − v sin α
v = −x sin α + y cos α | y = u sin α + v cos α
  • Memory aid: Follow the corners (1,0) and (0,1) as they rotate.
  • For a pure rotation, the area element is unchanged: dx dy = du dv (the Jacobian J = 1).

📐 Example: tilted unit square

  • Area: ∬ dA over the rotated square = ∫₀¹ ∫₀¹ du dv = 1 (as expected).
  • Moment around y-axis: ∬ x dA = ∫₀¹ ∫₀¹ (u cos α − v sin α) du dv = ½ cos α − ½ sin α.
    • The factors ½ come from integrating u and v from 0 to 1.
  • Center of gravity: x̄ = (moment) / (area) = ½ cos α − ½ sin α, which is the center of the rotated square.
  • Moment of inertia: Iᵧ = ∬ x² dA involves (u cos α − v sin α)²; after integration you get (cos² α)/3 − (cos α sin α)/2 + (sin² α)/3.
  • Key observation: Iₓ + Iᵧ = 2/3 is constant, independent of α, because rotating around a corner does not change the polar moment I₀.
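
The key observation can be tested numerically. The sketch below integrates over the uv unit square with a midpoint grid (the helper names `double_midpoint` and `moments` are illustrative, not from the text) and uses J = 1, so dA = du dv.

```python
import math

def double_midpoint(f, n=100):
    """Midpoint-rule integral of f(u, v) over the unit square 0 <= u, v <= 1."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(f(u, v) for u in pts for v in pts)

def moments(alpha):
    """Return (Ix, Iy) for the unit square rotated by alpha about the origin."""
    c, s = math.cos(alpha), math.sin(alpha)
    x = lambda u, v: u * c - v * s  # rotation formulas, uv to xy
    y = lambda u, v: u * s + v * c
    Iy = double_midpoint(lambda u, v: x(u, v) ** 2)
    Ix = double_midpoint(lambda u, v: y(u, v) ** 2)
    return Ix, Iy

for alpha in (0.0, 0.3, 1.0):
    Ix, Iy = moments(alpha)
    print(alpha, Iy, Ix + Iy)  # Iy changes with alpha, the sum stays near 2/3
```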

🌀 Polar coordinates

🌀 The polar transformation

  • Formulas: x = r cos θ, y = r sin θ.
  • Area element: dA = r dr dθ (definitely not dr dθ).
    • The factor r accounts for the fact that dθ is an angle, not a length; the arc length is r dθ.

📏 Why dA = r dr dθ

Three derivations are given:

  1. Approximate: A polar rectangle has straight sides ≈ Δr and circular arcs ≈ r Δθ, so area ≈ (Δr)(r Δθ) = r Δr Δθ.
  2. Exact wedge difference: Area of wedge with angle Δθ is ½ r² Δθ; the difference between outer radius r + Δr/2 and inner radius r − Δr/2 is ½[(r + Δr/2)² − (r − Δr/2)²] Δθ = r Δr Δθ exactly (with r at the center of the polar rectangle).
  3. Jacobian formula (coming later): The determinant J = r emerges automatically from the partial derivatives.

🔵 Example: ring between r = 4 and r = 5

  • Area: ∫₀^{2π} ∫₄⁵ r dr dθ = 2π · [½ r²]₄⁵ = 2π · (25/2 − 16/2) = 9π.
    • The ring is essentially a giant polar rectangle with Δr = 1, average radius 4.5, and full angle 2π.
  • Moment ∬ x dA: ∫₀^{2π} ∫₄⁵ (r cos θ) r dr dθ = [⅓ r³]₄⁵ · [sin θ]₀^{2π} = 0 (by symmetry, since sin 2π − sin 0 = 0).
  • Moment of inertia Iᵧ = ∬ x² dA: ∫₀^{2π} ∫₄⁵ r² cos² θ · r dr dθ = [¼ r⁴]₄⁵ · π = ¼(5⁴ − 4⁴) · π = 369π/4 (the average of cos² θ over 0 to 2π is ½, so its θ-integral is π).

Don't confuse: The θ integral of cos² θ from 0 to 2π is π, not 2π, because the average value of cos² θ is ½.
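
The ring results can be reproduced with a small polar midpoint sum. This is only a sketch: `polar_integral` and its `use_r` switch are hypothetical names, and the switch exists purely to show what goes wrong when the factor r is dropped.

```python
import math

def polar_integral(g, r0, r1, n_r=200, n_t=200, use_r=True):
    """Midpoint sum of g(r, theta) dA over r0 <= r <= r1, 0 <= theta <= 2*pi."""
    dr = (r1 - r0) / n_r
    dt = 2 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = r0 + (i + 0.5) * dr
        weight = r if use_r else 1.0  # dA = r dr dtheta when use_r is True
        for j in range(n_t):
            t = (j + 0.5) * dt
            total += g(r, t) * weight * dr * dt
    return total

area = polar_integral(lambda r, t: 1.0, 4.0, 5.0)
moment_x = polar_integral(lambda r, t: r * math.cos(t), 4.0, 5.0)
inertia_y = polar_integral(lambda r, t: (r * math.cos(t)) ** 2, 4.0, 5.0)
wrong_area = polar_integral(lambda r, t: 1.0, 4.0, 5.0, use_r=False)

print(area, 9 * math.pi)           # ring area
print(moment_x)                    # near zero by symmetry
print(inertia_y, 369 * math.pi / 4)
print(wrong_area, 2 * math.pi)     # what "dr dtheta" alone would give
```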

🥧 Example: semicircular plates with varying density

Two densities: ρ = 1 (constant) and ρ = 1 − r (decreasing outward).

  • Mass with ρ = 1: M = ∫₀^π ∫₀¹ r dr dθ = π · ½ = π/2 (equals the area).
  • Mass with ρ = 1 − r: M = ∫₀^π ∫₀¹ (1 − r) r dr dθ = π · (½ − ⅓) = π/6 (smaller because density is lower away from origin).
  • Center of gravity ȳ: The moment is Mₓ = ∬ y ρ dA = ∬ (r sin θ) ρ r dr dθ.
    • For ρ = 1: Mₓ = ∫₀^π sin θ dθ · ∫₀¹ r² dr = 2 · ⅓ = ⅔, so ȳ = (⅔)/(π/2) = 4/(3π).
    • For ρ = 1 − r: Mₓ = ∫₀^π sin θ dθ · ∫₀¹ r²(1 − r) dr = 2 · (⅓ − ¼) = 1/6, so ȳ = (1/6)/(π/6) = 1/π.
  • Interpretation: ȳ is the average value of y over the region (weighted by density if ρ ≠ 1).
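
Because both integrands separate into an r part and a θ part, the masses and centroids above can be checked with one-variable sums. A minimal sketch, with `midpoint` and `plate` as hypothetical helper names:

```python
import math

def midpoint(g, a, b, n=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def plate(density):
    """Mass and centroid height of the half-disk 0 <= r <= 1, 0 <= theta <= pi.

    density is a function of r only, matching the two cases in the text.
    """
    radial_mass = midpoint(lambda r: density(r) * r, 0.0, 1.0)        # rho * r dr
    radial_moment = midpoint(lambda r: density(r) * r * r, 0.0, 1.0)  # rho * r^2 dr
    mass = math.pi * radial_mass                                       # theta runs 0..pi
    moment_x = midpoint(math.sin, 0.0, math.pi) * radial_moment
    return mass, moment_x / mass

uniform = plate(lambda r: 1.0)
tapered = plate(lambda r: 1.0 - r)
print(uniform, (math.pi / 2, 4 / (3 * math.pi)))
print(tapered, (math.pi / 6, 1 / math.pi))
```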

🔔 Example: the bell curve integral

Goal: Compute A = ∫₋∞^∞ e^(−x²) dx (the area under the Gaussian).

  • Trick: Square it: A² = (∫₋∞^∞ e^(−x²) dx)(∫₋∞^∞ e^(−y²) dy) = ∬ e^(−x²−y²) dx dy over the entire plane.
  • Switch to polar: x² + y² = r², so A² = ∫₀^{2π} ∫₀^∞ e^(−r²) r dr dθ.
    • The r integral: substitute u = r², du = 2r dr, so ½ ∫₀^∞ e^(−u) du = ½.
    • The θ integral: 2π.
    • Therefore A² = 2π · ½ = π, so A = √π.
  • Application to normal distribution: p(x) = e^(−x²/2) / √(2π) has ∫ p(x) dx = 1 after substituting x = √2 y and using the result above.
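
Both results are easy to confirm numerically. The sketch below truncates the infinite range to a finite interval, which is an assumption justified by how fast e^(−x²) decays; `midpoint` is a hypothetical helper.

```python
import math

def midpoint(g, a, b, n=4000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Area under the bell curve, with the tails beyond |x| = 10 ignored.
A = midpoint(lambda x: math.exp(-x * x), -10.0, 10.0)

# Total probability of the standard normal density.
total_probability = midpoint(
    lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi), -12.0, 12.0)

print(A, math.sqrt(math.pi))
print(total_probability)
```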

🔧 The Jacobian determinant

🔧 General formula for the stretching factor

Jacobian J(u,v): The 2×2 determinant of first partial derivatives,

J = | ∂x/∂u  ∂x/∂v |
    | ∂y/∂u  ∂y/∂v |  = (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u).

  • Often written ∂(x,y)/∂(u,v) as a reminder that it generalizes dx/du.
  • The area element transforms as dx dy = |J| du dv.
  • Why absolute value? We keep double integrals running forward (left-to-right, down-to-up); |J| ensures the area is positive.

🧮 Polar coordinates via Jacobian

x = r cos θ, y = r sin θ.

Partial | Value
∂x/∂r | cos θ
∂x/∂θ | −r sin θ
∂y/∂r | sin θ
∂y/∂θ | r cos θ

J = (cos θ)(r cos θ) − (−r sin θ)(sin θ) = r cos² θ + r sin² θ = r.

So dA = r dr dθ, confirming the geometric derivation.

🔲 Linear change: x = au + bv, y = cu + dv

All coefficients a, b, c, d are constant.

J = | a  b |
    | c  d |  = ad − bc (an ordinary determinant).

  • Example (rotation): a = cos α, b = −sin α, c = sin α, d = cos α gives J = cos² α + sin² α = 1 (no area change).
  • Example (parallelogram): x = ⅔u + ⅓v, y = ⅓u + ⅔v gives J = (⅔)(⅔) − (⅓)(⅓) = 4/9 − 1/9 = ⅓.
    • A unit square in uv (area 1) maps to a parallelogram in xy with area ⅓.
    • Conversely, a 3×3 square in uv (area 9) maps to area 9 · ⅓ = 3 in xy.
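
The parallelogram example can be verified exactly with rational arithmetic. This sketch maps the corners of each square and measures the image with the shoelace formula (a standard polygon-area formula, not taken from the text); the function names are illustrative.

```python
from fractions import Fraction as F

def transform(u, v):
    """The linear change x = (2/3)u + (1/3)v, y = (1/3)u + (2/3)v."""
    return F(2, 3) * u + F(1, 3) * v, F(1, 3) * u + F(2, 3) * v

def shoelace_area(points):
    """Area of a polygon from its vertices listed in order around the boundary."""
    n = len(points)
    twice = sum(points[i][0] * points[(i + 1) % n][1]
                - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(twice) / 2

J = F(2, 3) * F(2, 3) - F(1, 3) * F(1, 3)  # ad - bc

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
big_square = [(0, 0), (3, 0), (3, 3), (0, 3)]
area_unit = shoelace_area([transform(u, v) for u, v in unit_square])
area_big = shoelace_area([transform(u, v) for u, v in big_square])

print(J, area_unit, area_big)
```

The image areas equal J times the original areas, which is what "stretching factor" means for a linear map.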

🔄 Inverse transformation

If you reverse the change (from xy to uv instead of uv to xy), the new Jacobian is the reciprocal:

∂(u,v)/∂(x,y) = 1 / [∂(x,y)/∂(u,v)].

  • The 2×2 matrices of partial derivatives are inverses of each other, so their determinants multiply to 1.
  • This mirrors the single-variable rule: (du/dx) = 1/(dx/du).

🎯 Practical integration examples

🎯 Tilted square with exponential

Region R is a parallelogram in xy; change x = ⅔u + ⅓v, y = ⅓u + ⅔v maps it to the square S: 0 ≤ u ≤ 3, 0 ≤ v ≤ 3.

  • Area: ∬ᴿ dx dy = ∬ₛ |J| du dv = ∫₀³ ∫₀³ ⅓ du dv = ⅓ · 9 = 3.
  • Integral of eˣ: ∬ᴿ eˣ dx dy = ∫₀³ ∫₀³ e^(2u/3 + v/3) · ⅓ du dv.
    • Integrate u: [(3/2) e^(2u/3)]₀³ = (3/2)(e² − 1).
    • Integrate v: [3 e^(v/3)]₀³ = 3(e − 1).
    • Multiply the two results by the Jacobian ⅓: (3/2)(e² − 1) · 3(e − 1) · ⅓ = (3/2)(e² − 1)(e − 1).
  • Main point: The limits 0 and 3 are trivial; the Jacobian ⅓ accounts for the area scaling.
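
As a sanity check on this example, the uv-integral can be evaluated numerically and compared with the product of the two one-variable results, (3/2)(e² − 1) and 3(e − 1), times the Jacobian ⅓. A minimal sketch with a hypothetical `midpoint` helper:

```python
import math

def midpoint(g, a, b, n=300):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

J = 1.0 / 3.0  # Jacobian of x = (2/3)u + (1/3)v, y = (1/3)u + (2/3)v

# Integrand e^x rewritten in u and v, times |J|, over the square 0..3 by 0..3.
numeric = midpoint(
    lambda v: midpoint(lambda u: math.exp(2 * u / 3 + v / 3) * J, 0.0, 3.0),
    0.0, 3.0)

u_part = 1.5 * (math.e ** 2 - 1)   # integral of e^(2u/3) from 0 to 3
v_part = 3.0 * (math.e - 1)        # integral of e^(v/3) from 0 to 3
closed_form = u_part * v_part * J

print(numeric, closed_form)
```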

🌐 Why J is a determinant (geometric insight)

  • A small change du gives one side of the curving "rectangle": (∂x/∂u i + ∂y/∂u j) du.
  • A small change dv gives the other side: (∂x/∂v i + ∂y/∂v j) dv.
  • The area of the parallelogram spanned by these vectors is the magnitude of their cross product, which equals |J| du dv.
  • Curvature (from second derivatives) only contributes (du)² and (dv)² terms, which we ignore in the infinitesimal limit.

📊 Average value of a function

Average value of f(x,y) over region R: ∬ᴿ f(x,y) dA / ∬ᴿ dA.

  • The numerator is the integral of f; the denominator is the area.
  • Example: For the semicircle with ρ = 1, ȳ = (∬ y dA) / (area) = (⅔) / (π/2) = 4/(3π) is the average y-coordinate.
  • This generalizes the one-dimensional average ∫ₐᵇ v(x) dx / (b − a).

⚠️ Common pitfalls

⚠️ Forgetting the Jacobian factor

  • Wrong: Changing to polar and writing dA = dr dθ.
  • Right: dA = r dr dθ; the factor r is essential and comes from geometry or the Jacobian.
  • Example: The area of a ring is ∫∫ r dr dθ, not ∫∫ dr dθ.

⚠️ Confusing rotation (J = 1) with general linear change (J ≠ 1)

  • Rotation preserves area because it is orthogonal (perpendicular axes, no stretching).
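  • A scaling changes area; e.g., x = 2u, y = 3v has J = 6, so a unit square becomes area 6. A pure shear such as x = u + v, y = v has J = 1 and keeps area unchanged, so compute ad − bc rather than guessing.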

⚠️ Ignoring absolute value of J

  • The Jacobian can be negative (if the transformation reverses orientation).
  • Always use |J| in the area element to keep area positive.
  • Single integrals allow negative dx (running backward), but double integrals conventionally run forward.

⚠️ Not changing the integrand

  • If you integrate ∬ x dA and switch to polar, you must write x = r cos θ.
  • The integrand becomes (r cos θ) · r dr dθ, with two factors of r (one from x, one from dA).

📚 Reference formulas

📚 Moments and moments of inertia

For a region with density ρ(x,y):

Quantity | Formula | Meaning
Mass M | ∬ ρ dA | Total mass
Moment Mᵧ | ∬ x ρ dA | Moment around y-axis
Moment Mₓ | ∬ y ρ dA | Moment around x-axis
Center of mass x̄ | Mᵧ / M | Average x-coordinate
Center of mass ȳ | Mₓ / M | Average y-coordinate
Moment of inertia Iᵧ | ∬ x² ρ dA | Rotational inertia around y-axis
Moment of inertia Iₓ | ∬ y² ρ dA | Rotational inertia around x-axis
Polar moment I₀ | ∬ r² ρ dA | Rotational inertia around origin
  • Key identity: I₀ = Iₓ + Iᵧ (because r² = x² + y²).
  • If ρ = 1 (uniform density), mass = area and the center of mass is the centroid.
84

Triple Integrals

14.3 Triple Integrals

🧭 Overview

🧠 One-sentence thesis

Triple integrals compute volumes and other properties of three-dimensional regions by assembling small boxes through three successive integrations, where the key challenge is finding correct limits that describe the region's boundaries.

📌 Key points (3–5)

  • What triple integrals add up: small boxes with volume dV = dx dy dz, assembled by three successive single integrals.
  • How limits work: inner limits describe line lengths through the solid, middle limits describe slice areas, outer limits add up all slices.
  • Order matters: six possible orders (e.g., dx dy dz vs dz dy dx) require different limit expressions, though the final volume is the same.
  • Common confusion: limits can depend on other variables—when integrating dx first, the limits on x can depend on y and z (the variables held constant during that integration).
  • Stretching factor for change of variables: when changing from xyz to uvw coordinates, the volume element transforms by a factor J (e.g., dx dy dz = abc du dv dw for an ellipsoid).

📦 Building blocks of triple integrals

📦 What a triple integral represents

Triple integral: the limit of the sum of f_i ΔV over small boxes of volume ΔV, where f_i is any value of f(x, y, z) in the i-th box.

  • The basic volume element is dV = dx dy dz (length times width times height).
  • The goal is to assemble these small boxes by integration to find total volume or other properties.
  • The excerpt emphasizes: "The main problem will be to discover the correct limits on x, y, z."

🧱 Six important shapes

The excerpt identifies six fundamental three-dimensional shapes:

  • Box (easiest)
  • Prism
  • Cylinder
  • Cone
  • Tetrahedron
  • Sphere (hardest in xyz coordinates, easier in spherical coordinates)

Why these matter: In practice, these are the most important shapes; more complicated regions are harder to visualize clearly.

🔢 How integration order works

🔢 Three stages of integration

The computation breaks down as:

∫∫∫ f(x, y, z) dV is computed from three single integrals ∫ (∫ (∫ f dx) dy) dz.

Each stage has a geometric meaning:

Stage | What it computes | What is held constant
Inner integral (∫ dx) | Length of a line through the solid | y and z
Middle integral (∫ dy) | Area of a slice | z
Outer integral (∫ dz) | Volume of all slices | Nothing

Example from the box (sides of length 2, 3, and 1 in the x, y, and z directions):

  • Inner: ∫ dx = 2 (lines in x-direction have length 2)
  • Middle: ∫ 2 dy = 6 (area of a plane section)
  • Outer: ∫ 6 dz = 6 (total volume)

🔄 Order of integration

  • Six possible orders: dx dy dz, dx dz dy, dy dx dz, dy dz dx, dz dx dy, dz dy dx.
  • Each order requires different limits but gives the same final answer.
  • Don't confuse: The limits on the inner integral can depend on the two outer variables; the limits on the middle integral can depend on the outermost variable; the outer limits are constants.

Example from the prism:

  • Order dx dy dz: limits are x from 0 to 2, y from 0 to 3 - 3z, z from 0 to 1.
  • Order dz dy dx: limits are z from 0 to (3 - y)/3, y from 0 to 3, x from 0 to 2.

🎯 Finding limits from boundaries

🎯 The boundary equation method

To find limits on the inner integral:

  1. Follow a line in that direction through the solid.
  2. Find where it enters and exits the region.
  3. Use the boundary equation to solve for the variable.

Example from the tetrahedron:

  • Boundary plane: x + y + z = 1.
  • A line in the x-direction enters at x = 0 and exits at x = 1 - y - z.
  • So the inner integral has limits from 0 to 1 - y - z.

🎯 Assembling lines into slices

After the inner integral, you have lines assembled into slices.

  • The middle integral limits describe the extent of each slice.
  • Key insight: "We are assembling lines, not points."

Example from the tetrahedron:

  • After integrating dx, we get (1 - y - z) as the length.
  • The y integral goes from 0 to 1 - z (not 1 - y - z).
  • Why? Because at each y up to 1 - z, there is a line; the figure shows lines filling a triangular slice.

🎯 Slices to volume

The result of the middle integral is the area of a slice.

  • This area typically depends on the outer variable.
  • The outer integral multiplies each slice area by its thickness and adds them up.

Example from the sphere:

  • Each horizontal slice at height z is a circle of radius √(1 - z²).
  • Area of slice = π(1 - z²).
  • Volume = ∫ from -1 to 1 of π(1 - z²) dz = 4π/3.

📐 Key examples and patterns

📐 The tetrahedron (four-sided pyramid)

A tetrahedron has four flat triangular faces.

  • Standard tetrahedron: bounded by x = 0, y = 0, z = 0, and x + y + z = 1.
  • Volume formula: (1/3)(base times height) = 1/6 for the standard tetrahedron.
  • Centroid: the average position (x̄, ȳ, z̄) = (1/4, 1/4, 1/4) by symmetry.

How to compute the centroid:

  • Find ∫∫∫ z dV (the moment) and divide by ∫∫∫ dV (the volume).
  • For the standard tetrahedron: ∫∫∫ z dV = 1/24, volume = 1/6, so z̄ = (1/24)/(1/6) = 1/4.
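
The three nested limits for the tetrahedron translate directly into three nested sums. This is a sketch, not an efficient method; `midpoint` and `triple` are hypothetical names, and the order used is dx dy dz.

```python
def midpoint(g, a, b, n=60):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def triple(f):
    """Integrate f(x, y, z) over the standard tetrahedron, order dx dy dz."""
    return midpoint(
        lambda z: midpoint(
            lambda y: midpoint(lambda x: f(x, y, z), 0.0, 1.0 - y - z),  # line
            0.0, 1.0 - z),                                                # slice
        0.0, 1.0)                                                         # solid

volume = triple(lambda x, y, z: 1.0)
moment_z = triple(lambda x, y, z: z)
z_bar = moment_z / volume

print(volume, moment_z, z_bar)  # close to 1/6, 1/24, 1/4
```

Note how the inner limit uses both y and z, the middle limit uses only z, and the outer limits are constants.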

📐 The sphere

Volume inside x² + y² + z² = 1:

  • Limits on x: from -√(1 - y² - z²) to +√(1 - y² - z²).
  • Limits on y: from -√(1 - z²) to +√(1 - z²) (the slice at height z is a circle).
  • Limits on z: from -1 to 1.
  • Shortcut: recognize that ∫ 2√(1 - y² - z²) dy = area of circular slice = π(1 - z²).
  • Final integral: ∫ from -1 to 1 of π(1 - z²) dz = 4π/3.

Don't confuse with:

  • Cone: slices have radius (1 - z), area π(1 - z)², volume π/3.
  • Cylinder: all slices have radius 1, area π, volume π (for height 1).

📐 Dimensional pattern

The excerpt reveals a pattern across dimensions:

Dimension | Standard region | Volume/Area
1D | Interval to x = 1 | 1
2D | Triangle to x + y = 1 | 1/2
3D | Tetrahedron to x + y + z = 1 | 1/6
4D | Hypertetrahedron to x + y + z + w = 1 | 1/24 (predicted)

Pattern: each new dimension divides the previous value by the next integer (1, 1/2, 1/6, 1/24), so the n-dimensional region has volume 1/n!.
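
One way to see where the factorials come from, sketched under the assumption that each slice of the n-dimensional region at height t is a copy of the (n−1)-dimensional region scaled by (1 − t): integrating (1 − t)^(n−1) from 0 to 1 contributes a factor 1/n. The function name `simplex_volume` is illustrative.

```python
from fractions import Fraction
from math import factorial

def simplex_volume(n):
    """Volume of the region x1 + ... + xn <= 1 with every xi >= 0.

    Uses V_k = V_(k-1) * integral of (1 - t)^(k-1) dt over 0..1 = V_(k-1) / k.
    """
    volume = Fraction(1)  # starting value before any dimension is added
    for k in range(1, n + 1):
        volume /= k       # each new dimension divides by the next integer
    return volume

for n in range(1, 5):
    print(n, simplex_volume(n))
```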

🔄 Change of variables and stretching

🔄 The ellipsoid example

Problem: find volume inside x²/a² + y²/b² + z²/c² = 1.

  • Direct integration has "terrible" algebra.
  • Solution: change variables to u = x/a, v = y/b, w = z/c.
  • The boundary becomes u² + v² + w² = 1 (a sphere in uvw space).

🔄 The stretching factor J

When changing variables, the volume element transforms:

  • From u = x/a, we get dx = a du.
  • Similarly dy = b dv and dz = c dw.
  • So dx dy dz = (abc) du dv dw.
  • The constant J = abc is the stretching factor.

Result:

  • Volume of ellipsoid = ∫∫∫ dx dy dz = ∫∫∫ (abc) du dv dw = (abc) × (4π/3) = (4π/3)abc.
  • This generalizes the sphere (a = b = c = 1) to stretched spheres.

Why this works:

  • The uvw boxes are stretched into xyz boxes with no bending or twisting.
  • Each small box's volume is multiplied by the constant factor abc.

Don't confuse: The excerpt notes this is special—other changes of variables produce more complicated stretching factors J that are not constant.
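
The ellipsoid formula can be cross-checked by slicing instead of changing variables. The sketch below assumes the standard fact (not stated in the text) that the horizontal slice at height z is an ellipse of area πab(1 − z²/c²); the semi-axes chosen are arbitrary sample values.

```python
import math

def midpoint(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def ellipsoid_volume(a, b, c):
    """Add up elliptical slices of area pi*a*b*(1 - z^2/c^2) for -c <= z <= c."""
    return midpoint(lambda z: math.pi * a * b * (1 - (z / c) ** 2), -c, c)

a, b, c = 2.0, 3.0, 5.0          # sample semi-axes, chosen for illustration
volume = ellipsoid_volume(a, b, c)
stretched_sphere = (4 * math.pi / 3) * a * b * c  # J = abc times the unit ball

print(volume, stretched_sphere)
```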

📊 Computing averages and moments

📊 Average value

Average value of f(x, y, z) in a solid V: ∫∫∫ f(x, y, z) dV divided by ∫∫∫ dV.

  • The denominator is the total volume.
  • The numerator is the "weighted sum" of f over the region.

📊 Centroid (average position)

The centroid (x̄, ȳ, z̄) is the average position:

  • x̄ = ∫∫∫ x dV / ∫∫∫ dV
  • ȳ = ∫∫∫ y dV / ∫∫∫ dV
  • z̄ = ∫∫∫ z dV / ∫∫∫ dV

Example from the tetrahedron:

  • ∫∫∫ z dV = ∫ from 0 to 1 of z(1 - z)²/2 dz = 1/24.
  • Volume = 1/6.
  • So z̄ = (1/24)/(1/6) = 1/4.

Why z is constant in inner integrals: When computing ∫∫∫ z dV in the order dx dy dz, z is held constant during the x and y integrations, so it factors out: ∫∫∫ z dx dy dz = ∫ z (∫∫ dx dy) dz = ∫ z × (slice area) dz.

📊 Mass and density

If the density is ρ(x, y, z):

  • Total mass = ∫∫∫ ρ dV.
  • Average density = ∫∫∫ ρ dV / ∫∫∫ dV.

Example: For density ρ = e^z in a region V, total mass = ∫∫∫ e^z dx dy dz.

📊 Moment of inertia

Moment of inertia = ∫∫∫ l² dV, where l is the distance to an axis or point.

  • This measures resistance to rotation.
  • Different axes give different moments of inertia.
85

Cylindrical and Spherical Coordinates

14.4 Cylindrical and Spherical Coordinates

🧭 Overview

🧠 One-sentence thesis

Cylindrical and spherical coordinate systems simplify triple integrals for solids with rotational symmetry, and spherical coordinates reveal that a uniform solid sphere exerts gravitational force as if all its mass were concentrated at its center.

📌 Key points (3–5)

  • Cylindrical coordinates (r, θ, z): natural for cylinders and solids of revolution; volume element is r dr dθ dz (stretching factor J = r).
  • Spherical coordinates (ρ, φ, θ): natural for spheres; volume element is ρ² sin φ dρ dφ dθ (stretching factor J = ρ² sin φ).
  • Common confusion: the polar angle φ is measured down from the z-axis (0 at North Pole, π at South Pole), not up from the xy-plane; also, ρ measures distance from the origin, while r measures distance from the z-axis.
  • Newton's gravitational result: the gravitational attraction of a uniform solid sphere on an outside point equals that of a point mass at the center; inside a hollow sphere, the gravitational force is zero.
  • Order of integration: different orders correspond to different ways of slicing the solid (slices, shells, or wedges), just as in single-variable integration.

📐 Cylindrical coordinates

📐 Definition and conversion

Cylindrical coordinates (r, θ, z): r is the distance from the z-axis, θ is the angle in the xy-plane, and z is the height.

  • Conversion formulas: x = r cos θ, y = r sin θ, z = z.
  • The relationship r² = x² + y² holds.
  • Example: the point (x, y, z) = (1, 1, 1) has r = √2, θ = π/4, z = 1.

🧊 Volume element and stretching factor

  • The small curved box has dimensions dr, dθ, and dz.
  • The base is a "polar rectangle" with area r dr dθ (the r is the stretching factor from polar coordinates).
  • The height is dz.
  • Volume element: dV = r dr dθ dz.
  • Stretching factor: J = r.
  • Don't confuse: the factor r appears because the arc length in the θ-direction is r dθ, not just dθ.

🔄 Six orders of integration

  • There are six possible orders: r-θ-z, r-z-θ, θ-r-z, θ-z-r, z-r-θ, z-θ-r.
  • Each order corresponds to a different way of cutting up the solid:
    • Slices: integrate z last → circular slices at height z.
    • Shells: integrate r last → cylindrical shells at radius r.
    • Wedges: integrate θ last → wedge-shaped pieces at angle θ.
  • Example: for a cone r = 1 − z, integrating z and θ first gives the volume of a shell at radius r as r(1 − r)·2π dr; integrating r and z first gives 1/6 dθ (a wedge).

🌀 Solids of revolution

  • When the solid is symmetric around the z-axis, the θ integral yields 2π (a full rotation).
  • The r integral often goes out to a radius f(z), giving the area of a circular slice as π(f(z))².
  • This leaves the z integral: ∫ π(f(z))² dz, which matches the old volume formula from single-variable calculus.

🌐 Spherical coordinates

🌐 Definition and conversion

Spherical coordinates (ρ, φ, θ): ρ is the distance from the origin, φ is the polar angle measured down from the z-axis, and θ is the azimuthal angle (longitude) in the xy-plane.

  • Two-step conversion: first (ρ, φ) to (r, z), then (r, θ) to (x, y):
    • r = ρ sin φ, z = ρ cos φ.
    • x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ.
  • Check: x² + y² + z² = ρ²(sin² φ + cos² φ) = ρ².
  • Angle ranges:
    • θ: 0 to 2π (full rotation around z-axis).
    • φ: 0 to π (from North Pole to South Pole).
    • ρ: 0 to R (from center to surface).
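
The conversion formulas are short enough to write out as functions. This is a sketch with illustrative function names; the inverse uses `math.acos` and `math.atan2`, and it assumes the convention above that φ is measured down from the z-axis.

```python
import math

def spherical_to_cartesian(rho, phi, theta):
    r = rho * math.sin(phi)  # distance from the z-axis
    return r * math.cos(theta), r * math.sin(theta), rho * math.cos(phi)

def cartesian_to_spherical(x, y, z):
    rho = math.sqrt(x * x + y * y + z * z)
    phi = math.acos(z / rho) if rho > 0 else 0.0  # 0 at North Pole, pi at South Pole
    theta = math.atan2(y, x) % (2 * math.pi)      # longitude in [0, 2*pi)
    return rho, phi, theta

rho, phi, theta = cartesian_to_spherical(1.0, 1.0, 1.0)
print(rho, phi, theta)             # rho = sqrt(3), theta = pi/4
print(rho * math.sin(phi))         # cylindrical r = sqrt(2)
print(spherical_to_cartesian(rho, phi, theta))
```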

🧊 Volume element and stretching factor

  • The small "spherical box" has three curved edges: dρ, dφ, and the horizontal edge.
  • The horizontal edge is ρ sin φ dθ (not just ρ dθ), because the radius of the circle at angle φ is ρ sin φ.
  • Multiplying the three edge lengths: dρ · ρ dφ · ρ sin φ dθ.
  • Volume element: dV = ρ² sin φ dρ dφ dθ.
  • Stretching factor: J = ρ² sin φ.
  • Don't confuse: the factor sin φ appears because the horizontal radius shrinks as you move toward the poles.

🔍 Typical shapes in spherical coordinates

Shape | Condition | Description
Solid sphere (ball) | 0 ≤ ρ ≤ R | All points within radius R
Surface of sphere | ρ = R | Shell at radius R
Upper half-sphere | 0 ≤ φ ≤ π/2 | Above the xy-plane
Eastern half-sphere | 0 ≤ θ ≤ π | Half with y ≥ 0 (since sin θ ≥ 0)
Cone from origin | φ = constant | Cuts the surface in a circle (not a great circle)

📏 Volume and surface area examples

  • Volume of a solid ball: ∫₀^{2π} ∫₀^π ∫₀^R ρ² sin φ dρ dφ dθ = (R³/3)·(2)·(2π) = 4πR³/3.
  • Surface area of a sphere: omit the ρ integral; ∫₀^{2π} ∫₀^π R² sin φ dφ dθ = R²·(2)·(2π) = 4πR².
  • Volume above a cone: if the φ integral stops at π/3, then ∫₀^{π/3} sin φ dφ = ½ and the volume is (R³/3)·(½)·(2π) = πR³/3.
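
Since the integrand ρ² sin φ separates, each of these volumes is a product of three one-variable integrals. A minimal numerical sketch (hypothetical helper names, sample radius R = 2):

```python
import math

def midpoint(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def spherical_volume(R, phi_max):
    """Volume of 0 <= rho <= R, 0 <= phi <= phi_max, 0 <= theta <= 2*pi."""
    rho_part = midpoint(lambda rho: rho ** 2, 0.0, R)  # gives R^3 / 3
    phi_part = midpoint(math.sin, 0.0, phi_max)        # gives 1 - cos(phi_max)
    theta_part = 2 * math.pi
    return rho_part * phi_part * theta_part

R = 2.0
ball = spherical_volume(R, math.pi)
above_cone = spherical_volume(R, math.pi / 3)

print(ball, 4 * math.pi * R ** 3 / 3)
print(above_cone, math.pi * R ** 3 / 3)
```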

🍎 Newton's gravitational problem

🍎 The challenge

  • Newton needed to show that a uniform solid sphere exerts gravitational force on an outside point as if all its mass were concentrated at the center.
  • The difficulty: different parts of the sphere are at different distances from the outside point.
  • The actual distance q from the outside point P (at distance D on the z-axis) to a typical inside point varies.

🔑 Key insight: average of 1/q

  • The average distance q̄ to all points in the sphere is not equal to D.
  • But the average of 1/q is equal to 1/D (this is the crucial result).
  • Don't confuse: the average of 1/2 and 1/4 is not 1/3; averaging 1/q is different from averaging q.

⚡ Gravitational potential

Potential at point P: ∫∫∫ (1/q) dV = (Volume of sphere) / D.

  • A small volume dV at distance q contributes dV/q to the potential.
  • The triple integral adds contributions from the whole sphere.
  • Result: the potential at distance D equals the whole volume (4πR³/3) divided by D.
  • This means the potential is the same as if the sphere were squeezed to a point mass at the center.

💪 Gravitational force

Force at point P: ∫∫∫ (cos α / q²) dV = (Volume of sphere) / D².

  • Force is a vector, so we need the z-component.
  • The angle α is between the force vector (pointing toward dV) and the z-axis.
  • Multiply by cos α to get the z-component.
  • By symmetry, the x and y components are zero.
  • The force is proportional to 1/D² (inverse square law), just as for a point mass.

🧮 Computing the potential integral

  • Use the law of cosines: q² = D² − 2Dρ cos φ + ρ² = u.
  • For the surface integral (fixed ρ), substitute du = 2Dρ sin φ dφ.
  • The integral ∫∫ dA/q over the spherical shell becomes 4πρ²/D.
  • Then integrate over ρ from 0 to R: ∫₀ᴿ (4πρ²/D) dρ = 4πR³/(3D).
  • This proves the potential formula.

🎯 Inside a hollow sphere

  • When the point P moves inside a hollow shell (D < ρ), the potential of that shell becomes the constant 4πρ (independent of D).
  • Since force is the derivative of potential, and the derivative of a constant is zero, the force inside a hollow sphere is zero.
  • Intuitive explanation: infinitesimal areas on opposite sides of the shell are proportional to q² and Q², but the forces involve 1/q² and 1/Q², so they cancel.
  • Real-world application: the inside of a car is safe from lightning because the charge distributes on the surface, keeping the interior at constant potential (zero force).
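
The shell integral ∫∫ dA/q can be checked numerically for a point outside and for points inside. In this sketch the θ integration has already been carried out (it contributes 2π), leaving a single integral over φ; `shell_potential` is a hypothetical name and the sample distances are arbitrary.

```python
import math

def midpoint(g, a, b, n=4000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def shell_potential(rho, D):
    """Integral of dA/q over the shell of radius rho, for P at distance D on the z-axis."""
    def integrand(phi):
        q = math.sqrt(D * D - 2 * D * rho * math.cos(phi) + rho * rho)  # law of cosines
        return 2 * math.pi * rho * rho * math.sin(phi) / q
    return midpoint(integrand, 0.0, math.pi)

rho = 1.0
outside = shell_potential(rho, 3.0)    # expect 4*pi*rho^2 / D
inside_a = shell_potential(rho, 0.2)   # expect 4*pi*rho, whatever D is
inside_b = shell_potential(rho, 0.7)

print(outside, 4 * math.pi * rho ** 2 / 3.0)
print(inside_a, inside_b, 4 * math.pi * rho)
```

The two inside values agree, so the potential is flat inside the shell and its derivative (the force) is zero there.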

🌍 Implications for shells and layers

  • Each spherical shell of area 4πρ² has the property that the average of 1/q is 1/D.
  • The Earth can have shells of different densities, and Newton's result still holds.
  • Each separate shell acts as if its mass were concentrated at the center.

🔄 General coordinate changes

🔄 The Jacobian determinant

Stretching factor J: for a change from (x, y, z) to (u, v, w), the volume element dV = |J| du dv dw, where J is the 3×3 determinant of first partial derivatives.

  • The Jacobian matrix contains nine derivatives:
    • Row 1: ∂x/∂u, ∂x/∂v, ∂x/∂w
    • Row 2: ∂y/∂u, ∂y/∂v, ∂y/∂w
    • Row 3: ∂z/∂u, ∂z/∂v, ∂z/∂w
  • Notation: J = ∂(x, y, z) / ∂(u, v, w).
  • A 3×3 determinant has six terms (one along the main diagonal from pure stretching, five others allowing for rotation).

🧮 Example: spherical coordinates

  • For x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ, the Jacobian is:
    • Row 1: sin φ cos θ, ρ cos φ cos θ, −ρ sin φ sin θ
    • Row 2: sin φ sin θ, ρ cos φ sin θ, ρ sin φ cos θ
    • Row 3: cos φ, −ρ sin φ, 0
  • The determinant has six terms, but two are zero (because of the zero in the corner).
  • The other four terms combine (using sin² θ + cos² θ = 1) to give J = ρ² sin φ.
  • This matches the geometric result from the curved box.
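
The determinant can be evaluated at a sample point to confirm J = ρ² sin φ. This sketch expands the 3×3 determinant along its first row; the function names and the sample point are illustrative.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def spherical_jacobian(rho, phi, theta):
    """Matrix of partials of (x, y, z) with respect to (rho, phi, theta)."""
    sp, cp = math.sin(phi), math.cos(phi)
    st, ct = math.sin(theta), math.cos(theta)
    return [
        [sp * ct, rho * cp * ct, -rho * sp * st],
        [sp * st, rho * cp * st, rho * sp * ct],
        [cp, -rho * sp, 0.0],
    ]

rho, phi, theta = 2.0, 0.7, 1.9
J = det3(spherical_jacobian(rho, phi, theta))

print(J, rho ** 2 * math.sin(phi))
```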

🎯 When to use the determinant

  • For cylindrical and spherical coordinates, geometry gives the stretching factor directly.
  • For other coordinate systems (u, v, w), use the Jacobian determinant formula.
  • The absolute value |J| ensures the volume is positive.
86

Vector Fields

15.1 Vector Fields

🧭 Overview

🧠 One-sentence thesis

Vector fields assign a vector to every point in space, and recognizing whether a field is a gradient field (conservative) unlocks the Fundamental Theorem of Calculus in higher dimensions through concepts like potential functions and field lines.

📌 Key points (3–5)

  • What a vector field is: assigns a vector F(x, y) with components M and N to every point, unlike a scalar function that outputs a single number.
  • Gradient fields are special: a vector field F is a gradient field if F = grad f for some potential function f; these are also called conservative fields.
  • Field lines vs equipotentials: field lines (streamlines) follow the direction of vectors; equipotentials are level curves of the potential f, and the two are always perpendicular.
  • Common confusion: not every vector field is a gradient—radial fields R, R/r, R/r² are gradients, but spin fields S and S/r are not (only S/r² is).
  • Why it matters: gradient fields connect to the Fundamental Theorem; recognizing them simplifies calculations and reveals physical meaning (force, velocity, potential energy).

📐 Defining vector fields

📐 What is a vector field?

Vector field F: assigns to every point (x, y) in region R a vector F(x, y) = M(x, y) i + N(x, y) j.

  • Input: a point (x, y) or (x, y, z).
  • Output: a vector with components that vary from point to point.
  • Contrast with ordinary functions: scalar function f(x) takes a number and returns a number; vector field takes a point and returns a vector.
  • In three dimensions: F = M i + N j + P k, where M, N, P are functions of (x, y, z).

📐 Two functions of two variables

  • A plane vector field involves two functions of two variables: the components M(x, y) and N(x, y).
  • A vector has fixed components; a vector field has varying components.
  • Example: position field R = x i + y j has M = x and N = y, so components grow as you move away from the origin.

🌀 Key examples of vector fields

🌀 Position field R

  • R = x i + y j points radially outward from the origin.
  • Magnitude: |R| = √(x² + y²) = r.
  • Direction: outward along rays from the origin.
  • R is a gradient field: it is the gradient of f = (1/2)(x² + y²).

🌀 Unit radial field R/r

  • Divides R by its length to get unit vectors pointing outward.
  • Components: M = x/r, N = y/r.
  • Magnitude: |R/r| = 1.
  • Also a gradient field: gradient of f = r.

🌀 Inverse-square radial field R/r²

  • Components: M = x/(x² + y²), N = y/(x² + y²).
  • Magnitude: |R/r²| = 1/r.
  • Gradient field: all radial fields R/rⁿ are gradients.
  • Physical meaning: gravity and electrostatic force follow inverse-square laws (proportional to R/r³ in three dimensions).

🌪️ Spin field S

  • S = −y i + x j rotates around the origin instead of pointing outward.
  • Magnitude: |S| = r (same as R).
  • Direction: perpendicular to R; their dot product is zero: S · R = (−y)(x) + (x)(y) = 0.
  • Not a gradient field: no function f has ∂f/∂x = −y and ∂f/∂y = x simultaneously.
  • Spin fields S/r and S/r² also exist; only S/r² is a gradient (of the polar angle θ = arctan(y/x)).

🌪️ Don't confuse: radial vs spin

Field type | Example | Direction | Gradient?
Radial | R, R/r, R/r² | Outward from origin | Yes (all)
Spin | S, S/r | Around origin | No
Spin | S/r² | Around origin | Yes (only this one)

🎯 Gradient fields and potentials

🎯 What is a gradient field?

Gradient field: F = grad f = (∂f/∂x) i + (∂f/∂y) j, where f(x, y) is the potential function.

  • Components M and N are partial derivatives: M = ∂f/∂x, N = ∂f/∂y.
  • The field is everywhere perpendicular to level curves f(x, y) = c.
  • Length |grad f| tells how fast f changes in the direction it changes fastest.
  • Example: if f = x²y, then F = 2xy i + x² j.

🎯 Conservative fields and potentials

  • Gradient fields are called conservative fields.
  • The function f is the potential function.
  • Major goal: recognize gradient fields by a simple test (developed later in the chapter).
  • Physical meaning: in physics, force fields are often minus grad f; electrons flow from high to low potential.

🎯 Why some fields are not gradients

  • If ∂f/∂x = 0, then f doesn't depend on x, so ∂f/∂y cannot equal x.
  • This is why S = −y i + x j is not a gradient: no f satisfies both ∂f/∂x = −y and ∂f/∂y = x.
  • The acceptance of S/r² but rejection of S and S/r is "interesting" and will be explored further.

🌊 Physical examples

🌊 Velocity and flow fields

  • Velocity field V: gives direction and speed of flow at every point in a fluid.
  • Steady flow: no change with time, but velocity can differ at different points.
  • Components: V = v₁ i + v₂ j + v₃ k (velocities in x, y, z directions).
  • Speed: |V| = square root of (v₁² + v₂² + v₃²).
  • Flow field ρV: density ρ times velocity; gives rate of mass transport.

🌊 Examples of velocity fields

  • Compact disc or wheel: V is a spin field (V = ωS, where ω is angular velocity).
  • Tornado: closer to V = S/r² (except dead spot at center).
  • Explosion: V = R/r².

⚡ Force fields

  • Gravity in a room: F = −mg k (downward); gradient of −mgz.
  • Gravity in space: F = −mMG R/r³ (radial inward); Newton's inverse square law.
    • Magnitude proportional to 1/r².
    • Potential: f = mMG/r.
    • Components: ∂f/∂x = −mMGx/r³, ∂f/∂y = −mMGy/r³, ∂f/∂z = −mMGz/r³.
  • Magnetic field: current in a wire produces B, a spin field S/r² around the wire times current strength.

🛤️ Field lines and equipotentials

🛤️ What are field lines?

Field line (integral curve): a curve C such that vectors F(x, y) are tangent to C.

  • The slope dy/dx of curve C equals the slope N/M of vector F = M i + N j.
  • Example: for spin field S = −y i + x j, field lines satisfy dy/dx = N/M = −x/y, so x dx + y dy = 0 and the field lines are circles x² + y² = c.
  • Field lines show the direction of the field but lose information about vector length.
  • In fluid flow, field lines are called streamlines; a leaf dropped in a river follows a streamline.
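A field line can also be traced numerically by stepping along the vectors. The sketch below is only an illustration: the helper names `spin` and `trace_field_line`, the step size, and the choice of a classical Runge-Kutta step are assumptions, not part of the source. It follows the spin field from (1, 0) and checks that the distance from the origin stays at 1, as expected for a circular field line.

```python
import math

def spin(x, y):
    """Spin field S = -y i + x j, returned as (M, N)."""
    return (-y, x)

def trace_field_line(field, x, y, dt=0.001, steps=1571):
    """Follow dx/dt = M, dy/dt = N from (x, y) using classical RK4 steps."""
    points = [(x, y)]
    for _ in range(steps):
        k1 = field(x, y)
        k2 = field(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = field(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = field(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        points.append((x, y))
    return points

# Total "time" is about pi/2, so the traced arc is roughly a quarter circle.
line = trace_field_line(spin, 1.0, 0.0)
radii = [math.hypot(px, py) for px, py in line]
print(min(radii), max(radii))  # both stay very close to 1
print(line[-1])                # close to (0, 1)
```

The traced points carry only the direction information of the field; the speed at which the curve is traversed depends on |S| and is discarded when the curve is drawn.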

🛤️ Equipotentials

  • When F is a gradient field with potential f, level curves f(x, y) = c are equipotentials.
  • Equipotentials connect points of equal potential.
  • Key relationship: streamlines are perpendicular to equipotentials.
  • Example: for f = xy, equipotentials are hyperbolas xy = c; streamlines (for F = y i + x j) are also hyperbolas x² − y² = constant.

🛤️ Orthogonal trajectories

  • Gradient field F is tangent to field lines and perpendicular to equipotentials.
  • In the gradient direction, f changes fastest; in the level direction, f doesn't change at all.
  • Proof: chain rule along f(x, y) = c gives (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) = 0, or (grad f) · (tangent to level curve) = 0.
  • The plane is crisscrossed by "orthogonal trajectories"—curves meeting everywhere at right angles.

🛤️ Don't confuse: field lines vs equipotentials

  • Field lines: follow the vector direction; show where particles move.
  • Equipotentials: perpendicular to field lines; show where potential is constant.
  • Example: gravity field R/r³ has field lines as rays (inward) and equipotentials as circles (where f = 1/r is constant).
  • Example: spin field S/r² has field lines as circles and equipotentials as rays (θ = c).
87

Line Integrals

15.2 Line Integrals

🧭 Overview

🧠 One-sentence thesis

Line integrals measure work or flow along a curve, and for conservative (gradient) fields the work depends only on endpoints, not the path taken.

📌 Key points (3–5)

  • What line integrals measure: work done by a force field along a curve (∫ F · T ds) or flow across a curve (∫ F · n ds).
  • Conservative vs non-conservative fields: conservative fields are gradients of a potential function f, so work equals f(Q) − f(P) and is path-independent; non-conservative fields have path-dependent work.
  • How to compute: convert everything to a parameter t, so ∫ F · dR = ∫ M dx + N dy becomes an integral over t from start to finish.
  • Common confusion: the spin field S = −y i + x j looks simple but is not conservative—different paths give different work values (Example 4 gave 1, π/2, and 0).
  • Quick test for conservative fields: check if ∂M/∂y = ∂N/∂x (test D); if true in a region with no holes, the field is a gradient and work is path-independent.

📐 What line integrals are

📐 Two main types

Line integrals have two main physical interpretations:

| Type | Formula | Meaning | What F represents |
| --- | --- | --- | --- |
| Work | ∫_C F · T ds | Force component along movement | Force field |
| Flow (flux) | ∫_C F · n ds | Flow component perpendicular to curve | Flow field |

  • T = unit tangent vector (direction of movement along the curve)
  • n = unit normal vector (perpendicular to the curve)
  • ds = infinitesimal step along the curve (arc length element)

📐 The differential dR

dR = T ds = dx i + dy j

  • This is a small movement vector along the curve.
  • Work becomes F · dR = M dx + N dy (in two dimensions).
  • In three dimensions: F · dR = M dx + N dy + P dz.
  • Example: when a piano is pushed horizontally, gravity does no work because the force (vertical) is perpendicular to the movement (horizontal); carrying it upstairs does involve work P dz, where P is the weight and dz is the height change.

🛤️ How to compute line integrals

🛤️ Convert to a parameter t

The formal definition (limit of sums) is not used in practice. Instead:

∫ g(x, y) ds = ∫_{t=a}^{t=b} g(x(t), y(t)) √[(dx/dt)² + (dy/dt)²] dt

  • The curve is given by points (x(t), y(t)) from t = a to t = b.
  • The square root √[(dx/dt)² + (dy/dt)²] is the speed ds/dt.
  • In three dimensions, add (dz/dt)² under the square root.

🛤️ Work integral in practice

For work ∫ F · dR = ∫ M dx + N dy:

  1. Write the path as x(t), y(t).
  2. Substitute into M(x, y) and N(x, y) to get functions of t.
  3. Compute dx/dt and dy/dt.
  4. Integrate M(dx/dt) + N(dy/dt) from start time to finish time.

Example (spin field): F = −y i + x j on the straight line x = 1 − t, y = t from (1,0) to (0,1):

  • M = −y = −t, N = x = 1 − t
  • dx/dt = −1, dy/dt = 1
  • ∫₀¹ [−t(−1) + (1−t)(1)] dt = ∫₀¹ 1 dt = 1

Don't confuse: The answer depends on the path, not the choice of parameter. Traveling the same path at a different speed (different parameterization) gives the same work.
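The four steps above can be sketched numerically. This is a minimal illustration, not a prescribed method: the function `work`, the midpoint sampling, and the three path helpers are hypothetical names chosen here. It evaluates the spin field on three paths from (1, 0) to (0, 1): the straight line, a quarter circle, and a route along the two axes.

```python
import math

def work(M, N, path, a, b, steps=20000):
    """Approximate the integral of M dx + N dy along path(t), a <= t <= b."""
    h = (b - a) / steps
    total = 0.0
    x0, y0 = path(a)
    for k in range(steps):
        xm, ym = path(a + (k + 0.5) * h)  # midpoint, where M and N are sampled
        x1, y1 = path(a + (k + 1) * h)    # end of this small step
        total += M(xm, ym) * (x1 - x0) + N(xm, ym) * (y1 - y0)
        x0, y0 = x1, y1
    return total

def spin_M(x, y):
    return -y

def spin_N(x, y):
    return x

def straight(t):          # x = 1 - t, y = t, for 0 <= t <= 1
    return (1 - t, t)

def quarter_circle(t):    # unit circle, for 0 <= t <= pi/2
    return (math.cos(t), math.sin(t))

def along_axes(t):        # (1,0) -> (0,0) -> (0,1), for 0 <= t <= 2
    return (1 - t, 0.0) if t <= 1 else (0.0, t - 1)

w_line = work(spin_M, spin_N, straight, 0.0, 1.0)
w_arc = work(spin_M, spin_N, quarter_circle, 0.0, math.pi / 2)
w_axes = work(spin_M, spin_N, along_axes, 0.0, 2.0)
print(w_line, w_arc, w_axes)  # close to 1, pi/2, and 0
```

Three different answers between the same endpoints are what rule out a potential function for the spin field.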

🛤️ Example: coil spring mass

Points on a spring: (x, y, z) = (cos t, sin t, t), density ρ = 4, two complete turns (t = 0 to 4π).

  • (dx/dt)² + (dy/dt)² + (dz/dt)² = sin²t + cos²t + 1 = 2
  • ds/dt = √2
  • mass = ∫₀^(4π) 4√2 dt = 16π√2
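The same mass can be approximated directly from the definition of ∫ g ds, by adding density times the length of many short chords. The sketch assumes Python 3.8 or later for `math.dist`; the names `scalar_line_integral`, `spring`, and `density` are illustrative only.

```python
import math

def scalar_line_integral(g, curve, a, b, steps=10000):
    """Approximate the integral of g ds along curve(t), a <= t <= b."""
    h = (b - a) / steps
    total = 0.0
    prev = curve(a)
    for k in range(steps):
        mid = curve(a + (k + 0.5) * h)
        cur = curve(a + (k + 1) * h)
        total += g(*mid) * math.dist(prev, cur)  # density times short chord length
        prev = cur
    return total

def spring(t):
    return (math.cos(t), math.sin(t), t)

def density(x, y, z):
    return 4.0

mass = scalar_line_integral(density, spring, 0.0, 4 * math.pi)
print(mass, 16 * math.pi * math.sqrt(2))  # the two values should nearly agree
```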

🔄 Conservative fields and path independence

🔄 When work is path-independent

Conservative field: A field F where the work ∫ F · dR depends only on the starting point P and ending point Q, not on the path between them.

  • For a conservative field, work around any closed path is zero: ∮ F · dR = 0.
  • Going from P to Q on one path, then back to P on another path, gives total work = 0.
  • Therefore: if two paths from P to Q give different work, the field is not conservative.

🔄 Gradient fields

Gradient field: F = (∂f/∂x) i + (∂f/∂y) j = grad f for some potential function f(x, y).

Fundamental Theorem for line integrals:

If F = grad f, then ∫_c F · T ds = f(Q) − f(P).

  • The work is just the change in the potential function.
  • F · dR = (∂f/∂x) dx + (∂f/∂y) dy = df (the total differential).
  • Integrating df from P to Q gives f(Q) − f(P).

Example (constant wind): F = M i (wind blowing east with constant force M).

  • This is the gradient of f = Mx.
  • From Atlanta (x = 1000) to Los Angeles (x = −1000): work = M(−1000) − M(1000) = −2000M.
  • The work is negative because the wind opposes the westward movement.
  • This is true on any path (straight, semicircle, bent line) because F is a gradient field.

🔄 Non-conservative example

F = My i (wind proportional to height y, blowing east).

  • On the straight path (y = 0): no force, so work = 0.
  • On the semicircle (y = 1000 sin t): work = −(π/2) × 10⁶ M (enormous).
  • Different paths give different work → not conservative.
  • Why not a gradient: No function f can have ∂f/∂x = My and ∂f/∂y = 0 simultaneously (the first requires f to depend on y; the second forbids it).

🧪 Four equivalent properties of conservative fields

🧪 The four statements

For F = M(x, y) i + N(x, y) j in a region with no holes:

| Property | Statement |
| --- | --- |
| A | Work around every closed path is zero: ∮ F · dR = 0 |
| B | Work from P to Q depends only on P and Q, not the path |
| C | F is a gradient field: M = ∂f/∂x and N = ∂f/∂y for some f |
| D | The components satisfy ∂M/∂y = ∂N/∂x |

A field with one of these properties has them all.

🧪 Test D: the quick check

Test D is the practical way to check if a field is conservative:

  • Compute ∂M/∂y and ∂N/∂x.
  • If they are equal, the field is conservative (in a region with no holes).
  • If they are not equal, the field is definitely not conservative.

Example (gradient field): F = 2xy i + x² j

  • M = 2xy, N = x²
  • ∂M/∂y = 2x, ∂N/∂x = 2x ✓
  • Test D is passed → F is conservative.

Example (spin field): F = −y i + x j

  • M = −y, N = x
  • ∂M/∂y = −1, ∂N/∂x = +1 ✗
  • Test D fails → F is not conservative.
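Test D can be spot-checked with finite differences. The helper below is a rough sketch under stated assumptions: `passes_test_D` is a made-up name, the sample points and tolerances are arbitrary, and agreement at a few points is evidence rather than proof that ∂M/∂y = ∂N/∂x everywhere.

```python
def passes_test_D(M, N, points, h=1e-5, tol=1e-6):
    """Compare dM/dy with dN/dx at each sample point using central differences."""
    for x, y in points:
        dM_dy = (M(x, y + h) - M(x, y - h)) / (2 * h)
        dN_dx = (N(x + h, y) - N(x - h, y)) / (2 * h)
        if abs(dM_dy - dN_dx) > tol:
            return False
    return True

sample = [(0.5, -1.0), (1.0, 2.0), (-2.0, 0.3)]

# Gradient field F = 2xy i + x^2 j
gradient_ok = passes_test_D(lambda x, y: 2 * x * y, lambda x, y: x ** 2, sample)
# Spin field F = -y i + x j
spin_ok = passes_test_D(lambda x, y: -y, lambda x, y: x, sample)

print(gradient_ok, spin_ok)  # True for the gradient field, False for the spin field
```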

🧪 Why the properties are equivalent

  • C implies B: If F = grad f, then work = f(Q) − f(P) by the Fundamental Theorem, which depends only on endpoints.
  • B implies A: A closed path goes from P to Q and back to P; if work depends only on endpoints, the two legs cancel and total work = 0.
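  • B implies C: If work is path-independent, define f(Q) as the work to reach Q from a fixed point P. Then grad f = F.
  • C implies D: If M = ∂f/∂x and N = ∂f/∂y, then ∂M/∂y and ∂N/∂x are both the mixed second derivative of f, so they are equal. The converse (D implies C) is where the no-holes condition is needed.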

🔨 Constructing the potential function

🔨 Three solution methods

When F = M i + N j is conservative, find f such that ∂f/∂x = M and ∂f/∂y = N.

Method 1 (integrate along a path):

  • Choose P = (0, 0).
  • Integrate M dx + N dy along the x-axis to (x, 0), then up to (x, y).
  • On the x-axis, y = 0 and dy = 0.
  • On the vertical segment, x is fixed and dx = 0.

Method 2 (straight-line path):

  • Integrate M dx + N dy on the line (xt, yt) from t = 0 to t = 1.
  • Substitute x = xt, y = yt, dx = x dt, dy = y dt.

Method 3 (direct integration):

  • Integrate ∂f/∂x = M with respect to x, treating y as constant.
  • This gives f = (integral of M) + C(y), where C(y) is an arbitrary function of y.
  • Differentiate with respect to y and set equal to N to find C(y).

🔨 Example: F = 2xy i + x² j

Method 1:

  • ∫_{(0,0)}^{(x,0)} 2xy dx = 0 (because y = 0)
  • ∫_{(x,0)}^{(x,y)} x² dy = x²y
  • Result: f = x²y

Method 2:

  • ∫₀¹ [2(xt)(yt)(x dt) + (xt)²(y dt)] = ∫₀¹ 3x²y t² dt = x²y t³ |₀¹ = x²y

Method 3:

  • ∂f/∂x = 2xy → f = x²y + C(y)
  • ∂f/∂y = x² + C'(y) must equal N = x² → C'(y) = 0 → C(y) = constant
  • Result: f = x²y (plus an arbitrary constant)

Verification: ∂f/∂x = 2xy ✓ and ∂f/∂y = x² ✓
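Method 2 lends itself to a numerical version: integrate M·x + N·y along the straight line from the origin. The sketch below uses Simpson's rule, which is a choice made here and not something the source specifies; `potential` is a hypothetical helper, and the result is only meaningful when the field is conservative.

```python
def potential(M, N, x, y, steps=1000):
    """Work along the line (x t, y t), 0 <= t <= 1, by Simpson's rule (steps must be even)."""
    def integrand(t):
        # F . dR/dt on the line, since dx/dt = x and dy/dt = y
        return M(x * t, y * t) * x + N(x * t, y * t) * y

    h = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3

def M(x, y):
    return 2 * x * y

def N(x, y):
    return x ** 2

f_value = potential(M, N, 3.0, 2.0)
print(f_value, 3.0 ** 2 * 2.0)  # numerical value next to x^2 y at (3, 2)
```

Applied to a non-conservative field, the same function still returns a number, but that number is the work along one particular path and not a potential.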

🔨 When construction fails

For the spin field F = −y i + x j:

Attempted Method 3:

  • ∂f/∂x = −y → f = −xy + C(y)
  • ∂f/∂y = −x + C'(y) must equal N = x
  • This requires −x + C'(y) = x, so C'(y) = 2x.
  • Impossible: C'(y) cannot depend on x.
  • Conclusion: no potential function exists; the field is not conservative.

⚡ Energy conservation

⚡ Kinetic plus potential energy

When a force field does work on a mass m:

  • Newton's Law: F = m a = m dv/dt (vector form)
  • Work = ∫ F · dR = ∫ (m dv/dt) · (v dt) = (1/2) m |v|² evaluated from P to Q
  • Work equals the change in kinetic energy (1/2) m |v|².

For a gradient field F = grad f:

  • Work = ∫ F · dR = ∫ df = f(Q) − f(P)
  • But physics puts a minus sign in the definition, F = −grad f, so that work = f(P) − f(Q).

Combining:

  • (1/2) m |v(Q)|² − (1/2) m |v(P)|² = f(P) − f(Q)
  • Rearranging: (1/2) m |v(P)|² + f(P) = (1/2) m |v(Q)|² + f(Q)

Conservation of energy: The total energy (kinetic + potential) is the same at P and Q.

Don't confuse: This only holds for conservative (gradient) fields. For non-conservative fields like friction, energy is not conserved—it is dissipated.

88

Green's Theorem

15.3 Green’s Theorem

🧭 Overview

🧠 One-sentence thesis

Green's Theorem extends the Fundamental Theorem of Calculus to two dimensions by equating a double integral over a region to a line integral around its boundary, enabling calculation of work, flux, and area through boundary information alone.

📌 Key points (3–5)

  • Core idea: integrate a derivative using only boundary information—a one-dimensional integral equals evaluation at endpoints; Green's Theorem does this in two dimensions.
  • Two forms: the tangential form relates work (circulation) around a closed curve to curl inside; the normal form relates flux through the boundary to divergence inside.
  • Conservative fields: pass test D (∂M/∂y = ∂N/∂x), have zero curl, possess a potential function f, and do zero work around closed paths.
  • Source-free fields: have zero divergence (∂M/∂x + ∂N/∂y = 0), have zero flux through closed curves, and possess a stream function g.
  • Common confusion: a field can pass test D or have zero divergence everywhere except at isolated points (like the origin)—holes in the region matter; the region must be simply connected (no holes) for potentials and stream functions to exist globally.

🔄 The two forms of Green's Theorem

🔄 Tangential form (work and circulation)

The closed line integral around curve C equals a double integral over the region R inside: ∮_C M dx + N dy = ∬_R (∂N/∂x − ∂M/∂y) dx dy.

  • What it measures: work done by force field F = M i + N j around a closed path, or circulation.
  • The boundary integral: ∮_C F · T ds = ∮_C M dx + N dy.
  • The region integral: ∬_R (∂N/∂x − ∂M/∂y) dx dy.
  • Physical meaning: circulation around the boundary equals the total curl (spin) inside.

🔄 Normal form (flux and divergence)

The closed line integral around curve C equals a double integral over the region R inside: ∮_C M dy − N dx = ∬_R (∂M/∂x + ∂N/∂y) dx dy.

  • What it measures: flow of fluid through the boundary (flux).
  • The boundary integral: ∮_C F · n ds = ∮_C M dy − N dx.
  • The region integral: ∬_R (∂M/∂x + ∂N/∂y) dx dy.
  • Physical meaning: net flow out through the boundary equals the total source strength (divergence) inside.
  • How to derive: rotate the tangential form by 90 degrees, replacing N with M and M with −N.

🧭 Direction conventions

  • Counterclockwise around the outside boundary keeps the region R on your left.
  • Clockwise around holes (inner boundaries) also keeps R on your left.
  • The normal vector n points outward; it is T rotated 90 degrees clockwise.
  • The relation: n ds = dy i - dx j.

📐 Computing area using Green's Theorem

📐 Three equivalent area formulas

The area of region R can be computed by line integrals around boundary C:

| Formula | M | N | Why it works |
| --- | --- | --- | --- |
| ∮ x dy | 0 | x | ∂N/∂x − ∂M/∂y = 1 − 0 = 1 |
| −∮ y dx | −y | 0 | ∂N/∂x − ∂M/∂y = 0 − (−1) = 1 |
| (1/2) ∮ (x dy − y dx) | −y/2 | x/2 | ∂N/∂x − ∂M/∂y = 1/2 + 1/2 = 1 |

Example: For a triangle with vertices at origin, (2,0), and (0,2):

  • Two sides (on axes) contribute zero because x=0 or y=0.
  • The sloping side x = 2 - y has dx = -dy.
  • Computing (1/2) integral of (x dy - y dx) gives area = 2.

Example: For an ellipse x = a cos t, y = b sin t:

  • The differential x dy − y dx simplifies to ab dt (since ab cos²t + ab sin²t = ab).
  • Area = (1/2) ∫₀^(2π) ab dt = πab.
  • When a = b = r (circle), this gives πr².
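For a polygon, the formula (1/2) ∮ (x dy − y dx) reduces to a sum over edges, because on a straight edge from (x₀, y₀) to (x₁, y₁) the integral of x dy − y dx equals x₀y₁ − x₁y₀. The sketch below (with the illustrative name `polygon_area`, and vertices assumed to be listed counterclockwise) reproduces the triangle area and approximates the ellipse by a many-sided polygon.

```python
import math

def polygon_area(vertices):
    """Area from (1/2) * closed integral of (x dy - y dx), vertices counterclockwise."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0  # integral of x dy - y dx along this edge
    return total / 2

triangle_area = polygon_area([(0, 0), (2, 0), (0, 2)])

# Ellipse x = a cos t, y = b sin t sampled at n points (a, b, n chosen for illustration)
a, b, n = 3.0, 2.0, 10000
ellipse = [(a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n))
           for k in range(n)]
ellipse_area = polygon_area(ellipse)

print(triangle_area)                    # 2.0
print(ellipse_area, math.pi * a * b)    # nearly equal
```

Listing the vertices clockwise flips the sign of the result, which matches the counterclockwise convention of Green's Theorem.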

📐 Understanding the strip interpretation

  • Upward dy times x (at right edge) plus downward dy times x (at left edge) = area of horizontal strip.
  • Leftward dx times y (at top) plus rightward dx times y (at bottom) = negative area of vertical strip, which is why the formula −∮ y dx carries a minus sign.
  • The line integral automatically adds these contributions correctly.

🌀 Conservative fields and test D

🌀 What makes a field conservative

A field F = M i + N j is conservative if it is the gradient of some potential function f, meaning M = ∂f/∂x and N = ∂f/∂y.

Test D (the quick test):

  • Check whether ∂M/∂y equals ∂N/∂x.
  • Why it works: if f exists, then ∂M/∂y = ∂²f/∂y∂x and ∂N/∂x = ∂²f/∂x∂y, and mixed partials are equal (f_xy = f_yx).

Properties of conservative fields:

  • Work around any closed path is zero.
  • Work from P to Q is the same along all paths (path-independent).
  • The work equals f(Q) - f(P).
  • The curl (N_x - M_y) is zero.

🌀 Finding the potential function

When test D is passed:

  1. Integrate M with respect to x: f(x,y) = integral of M dx + C(y).
  2. The "constant" C(y) may depend on y.
  3. Differentiate with respect to y and set equal to N to find C(y).

Example: For F = y i + x j:

  • Check: ∂M/∂y = 1, ∂N/∂x = 1. Test D passed.
  • Integrate M = y: f = xy + C(y).
  • Differentiate: ∂f/∂y = x + C'(y) = N = x, so C'(y) = 0.
  • Potential: f = xy.

⚠️ The origin problem (holes matter)

Example: The spin field S/r² = (-y i + x j)/(x² + y²):

  • Test D appears to pass: both partials equal (y² - x²)/(x² + y²)².
  • But the field blows up at the origin (undefined when r = 0).
  • Around a circle enclosing the origin, the work integral is 2π (not zero!).
  • The potential f = θ (angle) increases by 2π around the origin.
  • Don't confuse: the field passes test D everywhere except at one point, but that's enough to make it non-conservative on regions containing the origin.
  • The double integral of zero does not equal 2π—there's a "delta function" at the origin.
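The effect of the hole can be seen numerically by computing the circulation of S/r² around two circles, one enclosing the origin and one not. This is a sketch with assumed names (`circulation`, `M`, `N`) and an arbitrary choice of circles.

```python
import math

def circulation(M, N, cx, cy, radius, steps=20000):
    """Closed integral of M dx + N dy counterclockwise around a circle (midpoint rule)."""
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        x = cx + radius * math.cos(t)
        y = cy + radius * math.sin(t)
        dx = -radius * math.sin(t) * h
        dy = radius * math.cos(t) * h
        total += M(x, y) * dx + N(x, y) * dy
    return total

def M(x, y):
    return -y / (x * x + y * y)   # first component of S/r^2

def N(x, y):
    return x / (x * x + y * y)    # second component of S/r^2

around_origin = circulation(M, N, 0.0, 0.0, 1.0)
away_from_origin = circulation(M, N, 3.0, 0.0, 1.0)
print(around_origin, 2 * math.pi)  # nearly equal
print(away_from_origin)            # close to 0
```

The second circle lies in a region without holes where test D holds, so Green's Theorem applies there and the circulation vanishes.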

💧 Source-free fields and stream functions

💧 What makes a field source-free

A field F = M i + N j is source-free if it has zero divergence: ∂M/∂x + ∂N/∂y = 0.

Test H (the divergence test):

  • Check whether ∂M/∂x + ∂N/∂y = 0.
  • This is the quick test for source-free fields.

Properties of source-free fields (E–F–G–H):

  • E: Total flux through every closed curve is zero.
  • F: Flux from P to Q is the same across all curves.
  • G: There exists a stream function g where M = ∂g/∂y and N = −∂g/∂x.
  • H: The divergence is zero.

💧 Physical interpretation

  • Steady flow: whatever flows out through the boundary must be replaced inside.
  • Zero divergence: no sources (adding fluid) or sinks (removing fluid).
  • Balance equation: flow through C (out minus in) = replacement in R (source minus sink).

Example: The spin field S = -y i + x j:

  • Divergence: ∂(−y)/∂x + ∂(x)/∂y = 0 + 0 = 0.
  • Stream function: g = −(1/2)(x² + y²).
  • Check: ∂g/∂y = −y = M, and ∂g/∂x = −x, so −∂g/∂x = x = N.
  • Streamlines g = constant are circles around the origin; flow enters and leaves any closed curve equally, so there is no net flux.

💧 Point sources

Example: The radial field R/r² = (x i + y j)/(x² + y²):

  • Divergence appears to be zero: the two partials cancel.
  • But flux through any circle around the origin is 2π (not zero!).
  • The field blows up at the origin.
  • There is a point source at the origin with strength 2π.
  • Don't confuse: like the conservative case, one bad point creates a hole that changes everything.

⭐ The best fields: both conservative and source-free

⭐ Cauchy-Riemann equations

When F is both conservative (has potential f) and source-free (has stream function g):

The connections:

  • M = ∂f/∂x = ∂g/∂y
  • N = ∂f/∂y = −∂g/∂x

These are the Cauchy-Riemann equations, linking the potential and stream function.

⭐ Laplace's equation

Both f and g satisfy Laplace's equation:

  • ∂²f/∂x² + ∂²f/∂y² = 0
  • ∂²g/∂x² + ∂²g/∂y² = 0

Why:

  • From conservative: ∂²f/∂x² + ∂²f/∂y² = ∂M/∂x + ∂N/∂y = divergence = 0.
  • From source-free: ∂²g/∂x² + ∂²g/∂y² = −∂N/∂x + ∂M/∂y = −curl = 0.

Example: F = y i + x j:

  • Conservative: test D passes (M_y = 1 = N_x). Potential: f = xy.
  • Source-free: divergence = 0 + 0 = 0. Stream function: g = (1/2)(y² - x²).
  • Check Laplace: f_xx + f_yy = 0 + 0 = 0; g_xx + g_yy = -1 + 1 = 0.
  • Equipotentials (f = constant) are hyperbolas xy = constant.
  • Streamlines (g = constant) are perpendicular hyperbolas y² - x² = constant.

⭐ Comparison table

| Property | Conservative field | Source-free field |
| --- | --- | --- |
| Quick test | M_y = N_x (test D) | M_x + N_y = 0 (test H) |
| Function exists | Potential f | Stream function g |
| Boundary integral | Work = 0 | Flux = 0 |
| Quantity that is zero inside | Curl (N_x − M_y) | Divergence (M_x + N_y) |
| Relations | M = f_x, N = f_y | M = g_y, N = −g_x |

Don't confuse: having one property doesn't guarantee the other—a field can be conservative without being source-free (example: 2x i + 2y j has potential x² + y² but divergence = 4), or source-free without being conservative (example: 2y i - 2x j has stream function x² + y² but curl ≠ 0).

89

Surface Integrals

15.4 Surface Integrals

🧭 Overview

🧠 One-sentence thesis

Surface integrals extend the idea of line integrals to two-dimensional curved surfaces, allowing us to compute both surface area and flux through a surface by using either direct xyz-coordinates or parametric representations.

📌 Key points (3–5)

  • Two methods for computing surface integrals: Method 1 treats the surface as z = f(x,y) using x,y as parameters; Method 2 uses general parameters u,v for surfaces that may twist or close.
  • Surface area element dS: involves a square root of derivatives (like arc length ds), computed from the cross product of tangent vectors.
  • Flux integrals F·n dS: the square root cancels because the unit normal n has it in the denominator, simplifying calculations significantly.
  • Common confusion: the shadow vs the surface—integration limits come from the projection (shadow) in the base plane, not from the curved surface itself.
  • Orientability requirement: flux integrals require choosing a consistent normal direction n; some surfaces like the Möbius strip have only one side and cannot be oriented.

📐 Two fundamental methods

📐 Method 1: Surface as z = f(x,y)

A surface expressed as a function z = f(x,y) where each (x,y) pair gives exactly one z value.

  • The surface is the graph of a function over the xy-plane.
  • Tangent vectors: A = i + (∂z/∂x)k and B = j + (∂z/∂y)k
  • Normal vector: N = A × B = −(∂z/∂x)i − (∂z/∂y)j + k
  • Area element: dS = |N| dx dy = √(1 + (∂z/∂x)² + (∂z/∂y)²) dx dy

Example: For the plane z = x + 2y, the derivatives are ∂z/∂x = 1 and ∂z/∂y = 2, giving |N| = √6. A flat area A in the base becomes √6·A on the sloping plane—like a roof having more area than the room below.

Limitation: This method cannot handle surfaces that fold back on themselves (like a complete sphere, which has both upper and lower z values for the same (x,y)).

🔄 Method 2: Parametric surfaces x(u,v), y(u,v), z(u,v)

A surface described by three coordinate functions of two parameters u and v.

  • All three coordinates x, y, z are expressed in terms of u and v.
  • Tangent vectors: A = (∂x/∂u)i + (∂y/∂u)j + (∂z/∂u)k and B = (∂x/∂v)i + (∂y/∂v)j + (∂z/∂v)k
  • Normal vector: N = A × B (computed as a 3×3 determinant)
  • Area element: dS = |N| du dv

Example—Cone: x = u cos v, y = u sin v, z = u gives A = cos v i + sin v j + k and B = -u sin v i + u cos v j. The cross product yields N = -u cos v i - u sin v j + u k with length √2·u, so dS = √2 u du dv.

Example—Cylinder: x = cos v, y = sin v, z = u gives dS = du dv (the small pieces are rectangles with sides of length 1).

Advantage: Parameters allow surfaces to close up (like a sphere) or twist, which graphs of functions cannot do.

🧮 Computing surface area

🧮 The integration process

  • Step 1: Compute dS using one of the two methods.
  • Step 2: Identify the shadow (projection) in the parameter space.
  • Step 3: Integrate dS over the shadow region.

Don't confuse: The surface itself vs its shadow—you integrate over the shadow (the projection), not over the curved surface directly.

🎯 Cone example revisited

For the cone z = √(x² + y²) up to height z = a:

  • Method 1 gives dS = √2 dx dy (constant slope factor).
  • The shadow is the disk x² + y² ≤ a² in the base.
  • Surface area = ∫∫(shadow) √2 dx dy = √2 · πa².

With parameters (u = radius, v = angle), the same cone has dS = √2 u du dv, and integrating from u = 0 to a and v = 0 to 2π gives the same answer—parameters automatically switch to polar coordinates.

🌐 Sphere example

For a sphere of radius a:

  • Method 1 problem: z = √(a² - x² - y²) gives only the upper hemisphere; the lower half requires the negative square root.
  • Method 2 solution: Using spherical coordinates (u = θ angle from pole, v = φ around equator), the area element becomes dS = a² sin θ dθ dφ, which integrates to 4πa² for the complete sphere.
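Both the cone and sphere areas can be checked by summing |A × B| du dv over a grid in the parameter rectangle. The sketch below is an assumption-laden illustration: `surface_area` and `cross` are hypothetical helpers, the tangent vectors are estimated by central differences instead of exact derivatives, and the grid size is arbitrary.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_area(r, u0, u1, v0, v1, n=200, h=1e-6):
    """Midpoint-rule sum of |A x B| du dv for a parametric surface r(u, v)."""
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            # Tangent vectors A = dr/du and B = dr/dv by central differences
            A = [(p - q) / (2 * h) for p, q in zip(r(u + h, v), r(u - h, v))]
            B = [(p - q) / (2 * h) for p, q in zip(r(u, v + h), r(u, v - h))]
            nx, ny, nz = cross(A, B)
            total += math.sqrt(nx * nx + ny * ny + nz * nz) * du * dv
    return total

def cone(u, v):             # u = radius, v = angle
    return (u * math.cos(v), u * math.sin(v), u)

def unit_sphere(theta, phi):  # theta from the pole, phi around the equator
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

a = 2.0
cone_area = surface_area(cone, 0.0, a, 0.0, 2 * math.pi)
sphere_area = surface_area(unit_sphere, 0.0, math.pi, 0.0, 2 * math.pi)
print(cone_area, math.sqrt(2) * math.pi * a ** 2)  # nearly equal
print(sphere_area, 4 * math.pi)                    # nearly equal
```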

💨 Flux integrals F·n dS

💨 Why flux is simpler than area

Flux = ∫∫ F·n dS measures the flow of a vector field F through a surface in the direction of the unit normal n.

  • The unit normal is n = N/|N| (normal vector divided by its length).
  • Key simplification: n dS = (N/|N|) · |N| dx dy = N dx dy—the square root cancels!
  • For z = f(x,y): F·n dS = (−M·∂f/∂x − N·∂f/∂y + P) dx dy, where F = Mi + Nj + Pk (here N is the j component of F, not the normal vector).

Example—Plane z = x + 2y: The vector n dS = (-i - 2j + k) dx dy (no square root). For the flow field F = k, the flux F·n dS = 1 dx dy equals the shadow area—slope makes no difference because the same "rain" passes through both the tilted plane and the base.

🎪 Flux with parameters

For parametric surfaces: F·n dS = F·N du dv = F·(A × B) du dv.

Example—Cylinder: For x² + y² = 1, 0 ≤ z ≤ b with F = xi + yj + zk:

  • Parameters: x = cos u, y = sin u, z = v.
  • Normal: N = A × B = cos u i + sin u j (points outward).
  • F·N = cos²u + sin²u = 1.
  • Flux through the side = ∫₀^(2π) ∫₀^b 1 dv du = 2πb.

Don't confuse: This flux is only through the cylindrical side, not including the top and bottom caps.
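The flux formula F·(A × B) du dv can be evaluated on a parameter grid in the same spirit. In this sketch the names `flux`, `cylinder`, and `plane` are invented for illustration, the tangent vectors come from central differences, and b = 3 is an arbitrary height. It checks the cylinder side (expected 2πb) and the plane z = x + 2y over the unit square with F = k (expected to equal the shadow area, 1).

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def flux(F, r, u0, u1, v0, v1, n=200, h=1e-6):
    """Midpoint-rule sum of F . (A x B) du dv over the parameter rectangle."""
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            A = [(p - q) / (2 * h) for p, q in zip(r(u + h, v), r(u - h, v))]
            B = [(p - q) / (2 * h) for p, q in zip(r(u, v + h), r(u, v - h))]
            normal = cross(A, B)
            field = F(*r(u, v))
            total += sum(f * c for f, c in zip(field, normal)) * du * dv
    return total

def cylinder(u, v):   # x = cos u, y = sin u, z = v
    return (math.cos(u), math.sin(u), v)

def plane(u, v):      # z = x + 2y with x = u, y = v
    return (u, v, u + 2 * v)

b = 3.0
side_flux = flux(lambda x, y, z: (x, y, z), cylinder, 0.0, 2 * math.pi, 0.0, b)
plane_flux = flux(lambda x, y, z: (0.0, 0.0, 1.0), plane, 0.0, 1.0, 0.0, 1.0)
print(side_flux, 2 * math.pi * b)  # nearly equal
print(plane_flux)                  # close to 1, the shadow area
```

Swapping the order of the parameters reverses A × B and therefore the sign of the flux, which is the orientation choice in computational form.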

⚠️ The orientability issue

A surface is orientable when you can consistently choose a normal direction n everywhere on the surface.

  • Closed surfaces: Convention is n points outward (outward flux is positive).
  • Open surfaces: Either direction is acceptable, but you must choose one.
  • Möbius strip: Has only one side—moving a normal vector continuously around the strip brings it back pointing the opposite direction, making flux integrals impossible.

📋 Summary table

| Aspect | Method 1: z = f(x,y) | Method 2: Parameters u,v |
| --- | --- | --- |
| Surface type | Graph of a function | Any surface (can twist/close) |
| Tangent vectors | A = i + (∂z/∂x)k, B = j + (∂z/∂y)k | A = (∂x/∂u)i + (∂y/∂u)j + (∂z/∂u)k, B = (∂x/∂v)i + (∂y/∂v)j + (∂z/∂v)k |
| Normal N | −(∂z/∂x)i − (∂z/∂y)j + k | A × B (3×3 determinant) |
| Area element | √(1 + (∂z/∂x)² + (∂z/∂y)²) dx dy | \|A × B\| du dv |
| Flux element | (−M·∂f/∂x − N·∂f/∂y + P) dx dy | F·(A × B) du dv |
| Integration region | Shadow in xy-plane | Parameter rectangle |

90

The Divergence Theorem

15.5 The Divergence Theorem

🧭 Overview

🧠 One-sentence thesis

The Divergence Theorem generalizes the two-dimensional flux balance (Green's Theorem) to three dimensions by equating the flux of a vector field through a closed surface to the triple integral of its divergence over the enclosed volume.

📌 Key points (3–5)

  • What the theorem states: The flux of F through a closed surface S equals the triple integral of div F over the volume V inside.
  • Physical meaning: "flow out minus flow in equals source"—divergence measures the source (or sink) density at each point.
  • Divergence-free fields: When div F = 0, flow in equals flow out (conservation); spin fields and certain radial fields have this property.
  • Common confusion: The field R/ρ³ has div F = 0 everywhere except at the origin, where a point source creates a delta-function singularity; the flux through any surface enclosing the origin is 4π, not zero.
  • Why it matters: The theorem underpins equilibrium equations in physics (electrostatics, heat flow, elasticity) and enables easier calculation by converting difficult surface integrals into volume integrals (or vice versa).

🔁 The theorem and its meaning

🔁 Statement of the Divergence Theorem

Divergence Theorem: The flux of F = M i + N j + P k through the boundary surface S equals the integral of the divergence of F inside V:

∬_S F · n dS = ∭_V div F dV = ∭_V (∂M/∂x + ∂N/∂y + ∂P/∂z) dx dy dz

  • The left side is a surface integral (flux through the closed surface S).
  • The right side is a volume integral (total divergence inside V).
  • This is the three-dimensional version of Green's Theorem in normal form.
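On a rectangular box both sides of the theorem are easy to approximate, which gives a quick sanity check. The sketch below is illustrative only: `box_flux` and `box_divergence` are hypothetical helpers, the box [0,a]×[0,b]×[0,c] and the test field are arbitrary choices, and divergence is estimated by central differences.

```python
def box_flux(F, a, b, c, n=20):
    """Outward flux of F through the six faces of the box [0,a] x [0,b] x [0,c]."""
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        for j in range(n):
            t = (j + 0.5) / n
            # Faces x = a (normal +i) and x = 0 (normal -i)
            total += (F(a, s * b, t * c)[0] - F(0.0, s * b, t * c)[0]) * b * c / n ** 2
            # Faces y = b (normal +j) and y = 0 (normal -j)
            total += (F(s * a, b, t * c)[1] - F(s * a, 0.0, t * c)[1]) * a * c / n ** 2
            # Faces z = c (normal +k) and z = 0 (normal -k)
            total += (F(s * a, t * b, c)[2] - F(s * a, t * b, 0.0)[2]) * a * b / n ** 2
    return total

def box_divergence(F, a, b, c, n=20, h=1e-5):
    """Triple integral of div F over the box, midpoint rule with central differences."""
    dV = a * b * c / n ** 3
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * a / n
        for j in range(n):
            y = (j + 0.5) * b / n
            for k in range(n):
                z = (k + 0.5) * c / n
                div = (F(x + h, y, z)[0] - F(x - h, y, z)[0]
                       + F(x, y + h, z)[1] - F(x, y - h, z)[1]
                       + F(x, y, z + h)[2] - F(x, y, z - h)[2]) / (2 * h)
                total += div * dV
    return total

def sample_field(x, y, z):
    return (x * y, y * z, z * x + z)   # div F = x + y + z + 1

def position_field(x, y, z):
    return (x, y, z)                   # div R = 3

flux_side = box_flux(sample_field, 1.0, 2.0, 3.0)
volume_side = box_divergence(sample_field, 1.0, 2.0, 3.0)
print(flux_side, volume_side)                     # both close to 24
print(box_flux(position_field, 1.0, 2.0, 3.0))    # close to 3V = 18
```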

🌊 Physical interpretation

  • Divergence measures source density: div F at a point tells how much "stuff" is being created (positive divergence) or destroyed (negative divergence) per unit volume.
  • Balance law: The net outward flux through S must equal the total source inside V.
  • Example: If div F = 0 everywhere inside, then ∬ F · n dS = 0 (what flows out through one part of S flows back in through another).

🧮 Comparison with Green's Theorem

| Dimension | Region | Boundary | Divergence | Theorem |
| --- | --- | --- | --- | --- |
| 2D | Plane region R | Closed curve C | ∂M/∂x + ∂N/∂y | Green's Theorem (normal form) |
| 3D | Solid volume V | Closed surface S | ∂M/∂x + ∂N/∂y + ∂P/∂z | Divergence Theorem |

  • The new term ∂P/∂z accounts for upward flow in three dimensions.
  • A constant upward component P adds nothing to divergence (derivative is zero) and nothing to flux (flow up through top = flow up through bottom).

🌀 Divergence-free fields

🌀 What "divergence-free" means

Divergence-free field: A vector field F with div F = 0 at every point.

  • For such fields, the Divergence Theorem gives ∬ F · n dS = ∭ 0 dV = 0.
  • Physical meaning: Flow in equals flow out; there is conservation of fluid (no sources or sinks).

🔄 Example: Spin fields

  • The spin field −y i + x j + 0 k (spinning around the z-axis) has div F = 0.
  • Another spin field 0 i − z j + y k (spinning around the x-axis) also has div F = 0.
  • All three partial derivatives ∂M/∂x, ∂N/∂y, ∂P/∂z are zero separately.
  • Flow goes around in circles; whatever goes out through S comes back in.

📏 Example: Position field R

  • R = x i + y j + z k has div R = 1 + 1 + 1 = 3.
  • This is radial flow, straight out from the origin.
  • Mass must be added at every point to keep the flow going.
  • The flux through any closed surface is three times the volume: ∬ R · n dS = 3V.
  • Example: For the cylinder x² + y² ≤ 1, 0 ≤ z ≤ b, the volume is πb and the flux is 3πb: 2πb through the side (computed in Section 15.4) plus πb through the top, where R · n = z = b.

⚡ Example: Electrostatic and gravity fields

  • An electrostatic field R/ρ³ or gravity field −R/ρ³ almost has div F = 0.
  • Here R = x i + y j + z k and ρ = √(x² + y² + z²).
  • Calculation in three steps:
    1. ∂ρ/∂x = x/ρ, ∂ρ/∂y = y/ρ, ∂ρ/∂z = z/ρ. These are needed because the components of F = R/ρ³ contain ρ.
    2. ∂M/∂x = ∂/∂x(x/ρ³) = 1/ρ³ − 3x²/ρ⁵; similarly for ∂N/∂y and ∂P/∂z with y² and z².
    3. div F = 3/ρ³ − 3(x² + y² + z²)/ρ⁵ = 3/ρ³ − 3/ρ³ = 0.
  • Don't confuse: This is true except at the origin where ρ = 0 and M, N, P are infinite.

🎯 The paradox at the origin

  • For a sphere of constant radius ρ, the outward unit normal is n = R/ρ.
  • The outward flow rate per unit area is F · n = (R/ρ³) · (R/ρ) = ρ²/ρ⁴ = 1/ρ² (always positive).
  • Integrating over the sphere: ∬ F · n dS = ∬ dS/ρ² = 4πρ²/ρ² = 4π.
  • But if div F = 0 everywhere, the Divergence Theorem would give ∭ 0 dV = 0, not 4π.
  • Resolution: There is a point source at the origin (a delta function times 4π); the Divergence Theorem does not apply unless delta functions are allowed.
  • Key insight: Every surface enclosing the origin has flux = 4π, even if the surface is twisted and complicated.
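The claim that a non-spherical surface around the origin also gives 4π can be tested on a cube. By symmetry the total flux is six times the flux through the top face z = w, where F · n = w/ρ³. The function name `cube_flux` and the grid size are assumptions made for this sketch.

```python
import math

def cube_flux(half_width, n=400):
    """Flux of F = R/rho^3 out of the cube [-w, w]^3, as 6 times the top-face flux."""
    w = half_width
    h = 2 * w / n
    top = 0.0
    for i in range(n):
        x = -w + (i + 0.5) * h
        for j in range(n):
            y = -w + (j + 0.5) * h
            rho = math.sqrt(x * x + y * y + w * w)
            top += (w / rho ** 3) * h * h   # F . n = z / rho^3 on the face z = w
    return 6 * top

flux_small = cube_flux(1.0)
flux_large = cube_flux(5.0)
print(flux_small, flux_large, 4 * math.pi)  # all three nearly equal
```

The result does not depend on the size of the cube, which is consistent with div F = 0 in the region between any two such surfaces.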

🔗 Using the theorem between surfaces

  • If div F = 0 in the volume V between two surfaces (both surfaces may enclose the origin, but V itself excludes it), then ∬ F · n dS = 0 for the two surfaces together.
  • Example: Flux = −4π into a sphere (inward normal) must be balanced by flux = 4π out of a twisted surface surrounding it.

🔬 Applications: Gauss's Law and equilibrium

🔬 Gauss's Law

Gauss's Law (differential form):

  • Gravity: div F = −4πGM (where M is mass density)
  • Electrostatics: div E = q/ε₀ (where q is charge density)

Gauss's Law (integral form):

  • Gravity: ∬ F · n dS = −∭ 4πGM dV (flux proportional to total mass)
  • Electrostatics: ∬ E · n dS = ∭ (q/ε₀) dV (flux proportional to total charge)
  • A mass M at the origin produces gravity field F = −GM R/ρ³.
  • A charge q at the origin produces electric field E = (q/4πε₀) R/ρ³.
  • The physical constants are G (gravitational) and ε₀ (permittivity); the mathematical constant is the relation between divergence and flux.

🌡️ Example: Heat flow in the sun

  • Temperature T = ln(1/ρ) inside a ball of radius ρ₀.
  • Heat flow F = −grad T = +grad ln ρ = (x i + y j + z k)/ρ² = R/ρ² (radially outward, magnitude 1/ρ).
  • On the sun's surface (radius ρ₀), n is radially outward (magnitude 1), so F · n = 1/ρ₀.
  • Flux: ∬ F · n dS = ∬ dS/ρ₀ = (surface area)/ρ₀ = 4πρ₀²/ρ₀ = 4πρ₀.
  • Check by Divergence Theorem: div F = 1/ρ² (from Example 5), so ∭ div F dV = ∫₀^(2π) ∫₀^π ∫₀^(ρ₀) (1/ρ²) ρ² sin θ dρ dθ dφ = ρ₀ · 2 · 2π = 4πρ₀ ✓

⚖️ General equilibrium framework

  • Pattern: potential → force field → source equation.
  • Three common cases:
| Field | Potential | Force/Flow | Equation |
| --- | --- | --- | --- |
| Electromagnetism | f | F = −c grad f | div(c grad f) = electric charge |
| Heat flow | T | F = −c grad T | div(c grad T) = heat source |
| Elasticity | u | stress = +c grad u | div(c grad u) = outside force |
  • The constant c depends on the material.
  • The equation to solve is always div(c grad f) = known source.
  • "Taking the divergence of the gradient" is the core operation.

🧮 The reasoning behind the theorem

🧮 Local balance in a small box

  • Consider a small box centered at (x, y, z) with edges Δx, Δy, Δz.
  • Volume ΔV = Δx Δy Δz.
  • Top and bottom faces:
    • Top (normal k): flux ≈ P(x, y, z + ½Δz) Δx Δy
    • Bottom (normal −k): flux ≈ −P(x, y, z − ½Δz) Δx Δy
    • Net upward flux ≈ ΔP Δx Δy = (∂P/∂z) Δx Δy Δz = (∂P/∂z) ΔV
  • Similarly, side faces contribute (∂N/∂y) ΔV and front/back contribute (∂M/∂x) ΔV.
  • Key point: flux out of the box ≈ (∂M/∂x + ∂N/∂y + ∂P/∂z) ΔV = (div F) ΔV.
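
The local balance can be tried on a concrete field. In this sketch the field `F`, the box center, and the edge length `d` are hypothetical choices made for illustration; the flux through each face is approximated by the field value at the face center, as in the bullets above.

```python
def F(x, y, z):
    # Hypothetical sample field: M = x^2 y, N = y z, P = z^3.
    return (x * x * y, y * z, z ** 3)

def div_F(x, y, z):
    # Exact divergence of the sample field: 2xy + z + 3z^2.
    return 2 * x * y + z + 3 * z * z

def box_flux(center, dx, dy, dz):
    """Flux out of a small box, using the field value at each face center."""
    x, y, z = center
    front_back = (F(x + dx / 2, y, z)[0] - F(x - dx / 2, y, z)[0]) * dy * dz
    sides = (F(x, y + dy / 2, z)[1] - F(x, y - dy / 2, z)[1]) * dx * dz
    top_bottom = (F(x, y, z + dz / 2)[2] - F(x, y, z - dz / 2)[2]) * dx * dy
    return front_back + sides + top_bottom

center = (1.0, 2.0, 0.5)
d = 0.01
flux = box_flux(center, d, d, d)
predicted = div_F(*center) * d ** 3   # (div F) times the box volume
ratio = flux / predicted
print(flux, predicted, ratio)
```

The ratio should be close to 1 and should move closer as `d` shrinks, which is the content of flux ≈ (div F) ΔV.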

🔗 From local to global

  • For a constant field, both sides are zero (flow goes straight through).
  • For F = x i + y j + z k, divergence = 3, so 3ΔV is created inside; flux is also 3ΔV.
  • Sum (div F) ΔV over many boxes (about 1/ΔV boxes) → ∭ div F dV as Δx, Δy, Δz → 0.
  • On the other side, fluxes between adjacent boxes cancel (flow from box to box).
  • Only fluxes at the outer surface S survive → ∬ F · n dS.
  • This gives the Divergence Theorem.

📜 Historical note

  • Probably discovered by Gauss (only an outline in his notebooks).
  • Green and Ostrogradsky both published proofs in 1828 (England and St. Petersburg).
  • Requirements: smoothness of F and S; avoidance of one-sided surfaces (Möbius strips).

🧰 The del operator and product rules

🧰 The del operator ∇

Del (∇): A vector whose components are operations, not numbers: ∇ = i ∂/∂x + j ∂/∂y + k ∂/∂z

  • Gradient: ∇f = i ∂f/∂x + j ∂f/∂y + k ∂f/∂z (vector from scalar).
  • Divergence: ∇ · F = ∂M/∂x + ∂N/∂y + ∂P/∂z (scalar from vector).
  • Curl: ∇ × F = ... (to be defined in the next section; vector from vector).
  • Laplacian: ∇ · ∇f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z², often written ∇²f or Δf.
  • Laplace's equation: div grad f = 0 becomes ∇²f = 0 (potential when source is zero).

🔧 Product rule for vectors

Product rule: div(u V) = u div V + V · (grad u)

  • Here u(x, y, z) is a scalar function and V(x, y, z) is a vector field.
  • Derivation: Apply ordinary product rules to each component:
    • (uM)ₓ = u ∂M/∂x + M ∂u/∂x
    • (uN)ᵧ = u ∂N/∂y + N ∂u/∂y
    • (uP)_z = u ∂P/∂z + P ∂u/∂z
  • Add these three to get the vector rule.

🔄 Integration by parts (Gauss's Formula)

Gauss's Formula (3D): ∭ u div V dx dy dz = −∭ V · (grad u) dx dy dz + ∬ u V · n dS

  • This is the reverse of the product rule.
  • Integrating div(u V) over the volume gives ∬ u V · n dS by the Divergence Theorem; integrating the right side of the product rule gives the two volume integrals.
  • Move one term to the other side → the minus sign.
  • Green's Formula (2D): ∬ u (∂M/∂x + ∂N/∂y) dx dy = −∬ (M ∂u/∂x + N ∂u/∂y) dx dy + ∮ u(M i + N j) · n ds
  • 1D analogue: ∫ u (dv/dx) dx = −∫ (du/dx) v dx + [uv]ᵇₐ (integration by parts leaves a boundary term).

🧪 Example: Divergence of R/ρ²

  • Use the product rule with V = R and u = 1/ρ².
  • div F = (div R)/ρ² + R · (grad 1/ρ²).
  • div R = 3 (from x i + y j + z k).
  • For grad 1/ρ², apply the chain rule: R · (grad 1/ρ²) = −2 R · (grad ρ)/ρ³ = −2 R · (R/ρ)/ρ³ = −2/ρ².
  • Combine: div F = 3/ρ² − 2/ρ² = 1/ρ² (as claimed in Example 4).
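
The product rule can also be checked numerically at one point. This is a sketch under stated assumptions: the finite-difference helpers (`partial`, `divergence`, `gradient`) and the sample point with ρ = 3 are introduced here only for illustration.

```python
def u(x, y, z):
    # Scalar factor u = 1/rho^2.
    return 1.0 / (x * x + y * y + z * z)

def V(x, y, z):
    # Vector factor V = R, the position field.
    return (x, y, z)

def uV(x, y, z):
    # The product field F = u V = R/rho^2.
    s = u(x, y, z)
    return (s * x, s * y, s * z)

def partial(func, p, axis, h=1e-5):
    """Central difference of a scalar function along one coordinate axis."""
    plus = list(p)
    minus = list(p)
    plus[axis] += h
    minus[axis] -= h
    return (func(*plus) - func(*minus)) / (2 * h)

def divergence(field, p):
    return sum(partial(lambda *q, i=i: field(*q)[i], p, i) for i in range(3))

def gradient(func, p):
    return tuple(partial(func, p, i) for i in range(3))

p = (1.0, 2.0, 2.0)                    # rho = 3, so 1/rho^2 = 1/9
left = divergence(uV, p)               # div(u V)
u_div_V = u(*p) * divergence(V, p)     # u div V, expected 3/rho^2
V_dot_grad_u = sum(a * b for a, b in zip(V(*p), gradient(u, p)))  # expected -2/rho^2
right = u_div_V + V_dot_grad_u
print(left, right, 1.0 / 9.0)
```

Both sides should come out near 1/9, with the two terms on the right near 3/9 and −2/9.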

💧 Continuity equation for fluids

💧 Flow with velocity and density

  • Velocity V: rate of movement of fluid (vector).
  • Density ρ: mass per unit volume (scalar).
  • Mass flux F = ρ V: rate of movement of mass (vector).
  • Example: Air has greater velocity than water, but much lower density, so ρ V is usually larger for the ocean.
  • Water is virtually incompressible (ρ = constant); air is compressible (ρ varies).

💧 Conservation of mass

Continuity Equation: div(ρ V) + ∂ρ/∂t = 0

  • This is a balance equation for flow without sources or sinks.
  • Explanation:
    • Mass in a region: ∭ ρ dV.
    • Rate of decrease: −∭ ∂ρ/∂t dV.
    • Mass flow out through surface: ∬ F · n dS = ∬ ρ V · n dS.
    • By the Divergence Theorem: ∬ ρ V · n dS = ∭ div(ρ V) dV.
    • To balance in every region, div(ρ V) must equal −∂ρ/∂t at every point.
  • Physical meaning: The rate at which density decreases at a point equals the divergence of the mass flux (net outflow per unit volume).

📊 One-dimensional illustration

  • For flow in the x direction only, the continuity equation becomes d(ρV)/dx + dρ/dt = 0.
  • In a small box of width dx and cross-section dS:
    • Mass in: ρV dS dt (entering from the left).
    • Extra mass out: d(ρV) dS dt (leaving on the right).
    • Mass loss: −dρ dS dx (dρ is the change in density over time dt, negative when mass is lost).
  • Balance: d(ρV) dS dt = −dρ dS dx → d(ρV)/dx = −dρ/dt (same as the continuity equation).
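
The one-dimensional balance can be turned into a small simulation. The sketch below uses a first-order upwind update, which is a technique chosen here for illustration and is not taken from the text; the periodic grid, the constant velocity `V`, and the initial density are likewise assumptions. Each cell loses exactly what its neighbor gains, so total mass should stay fixed.

```python
import math

def step(rho, V, dx, dt):
    """One upwind update of d(rho)/dt = -d(rho V)/dx on a periodic grid (V > 0)."""
    flux = [r * V for r in rho]  # mass flux rho*V leaving each cell on its right side
    # flux[i - 1] with i = 0 wraps to the last cell, which gives the periodic boundary.
    return [rho[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(len(rho))]

n = 100
dx = 1.0 / n
V = 1.0
dt = 0.5 * dx  # keeps V*dt/dx = 0.5, inside the stability limit of this scheme
rho = [1.0 + 0.5 * math.sin(2 * math.pi * (i + 0.5) * dx) for i in range(n)]

mass_before = sum(rho) * dx
for _ in range(200):
    rho = step(rho, V, dx, dt)
mass_after = sum(rho) * dx
print(mass_before, mass_after)
```

The density profile moves and smooths out, but the two printed masses should match to rounding error, reflecting flow without sources or sinks.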

Stokes' Theorem and the Curl of F

15.6 Stokes’ Theorem and the Curl of F

🧭 Overview

🧠 One-sentence thesis

Stokes' Theorem extends Green's Theorem to three-dimensional curved surfaces by relating the circulation of a vector field around a space curve to the surface integral of the field's curl, revealing that gradient fields do no work around closed paths because their curl is always zero.

📌 Key points (3–5)

  • What curl measures: the "spin" or rotation of a three-dimensional vector field; its direction is the axis of rotation and its magnitude is twice the angular velocity.
  • Stokes' Theorem statement: the line integral of F around a closed space curve C equals the surface integral of curl F over any surface S bounded by C.
  • Gradient fields have zero curl: curl grad f is always zero (because mixed partial derivatives are equal), which means gradient fields are conservative and do no work around closed loops.
  • Common confusion: curl F is a vector (with three components), not a scalar; in two dimensions only the k-component survives, which is why Green's Theorem looks simpler.
  • Test D for conservative fields: a field F is conservative (does no work around closed paths) if and only if curl F = 0 everywhere.

🌀 Understanding the curl

🌀 Definition and components

The curl of a vector field F(x,y,z) = M i + N j + P k is the vector field curl F = (∂P/∂y - ∂N/∂z) i + (∂M/∂z - ∂P/∂x) j + (∂N/∂x - ∂M/∂y) k.

  • The curl is a vector, not a number.
  • It can be written as a determinant (though technically "illegal"): the first row is i, j, k; the second row is ∂/∂x, ∂/∂y, ∂/∂z; the third row is M, N, P.
  • In two dimensions (plane fields with P = 0 and no z-dependence), only the k-component survives: curl F = (∂N/∂x - ∂M/∂y) k, which recovers the scalar quantity from Green's Theorem.

🎡 Physical meaning: spin and rotation

  • Curl measures rotation: if you place a paddlewheel in the flow at any point, curl F tells you how fast and in what direction it will spin.
  • Direction: the direction of curl F is the axis of rotation.
  • Magnitude: |curl F| equals twice the angular velocity; the turning speed is ½|curl F|.
  • The angular velocity of a wheel with axis direction n is ½(curl F)·n (the "directional spin").
  • Maximum rotation occurs when the wheel axis aligns with curl F.

🔄 Spin field example

  • A spin field S = a × R (cross product of a fixed vector a with position R) has curl S = 2a.
  • The axis is along a, and the angular velocity is |a|.
  • Example: the field -y i + x j (spinning around the z-axis) has a = k and curl = 2k.
  • Even parallel vectors can have spin if their lengths vary (shear fields).
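
The claim curl(a × R) = 2a can be spot-checked with finite differences. The helper `curl`, the vector `a`, and the sample point below are illustrative assumptions.

```python
def curl(F, p, h=1e-5):
    """Central-difference estimate of curl F at point p."""
    def d(component, axis):
        plus = list(p)
        minus = list(p)
        plus[axis] += h
        minus[axis] -= h
        return (F(*plus)[component] - F(*minus)[component]) / (2 * h)
    # (P_y - N_z, M_z - P_x, N_x - M_y)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

a = (1.0, -2.0, 0.5)  # arbitrary fixed axis vector

def spin(x, y, z):
    # S = a x R, written out component by component.
    return (a[1] * z - a[2] * y, a[2] * x - a[0] * z, a[0] * y - a[1] * x)

curl_S = curl(spin, (0.3, 1.1, -0.8))
print(curl_S)
```

The printed vector should be close to 2a = (2, −4, 1), whatever sample point is used.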

🔗 Key identities with curl

🔗 Curl of a gradient is always zero

curl grad f = 0 for every function f(x,y,z).

  • This follows because mixed partial derivatives are equal: f_yz = f_zy, f_xz = f_zx, f_yx = f_xy.
  • All six terms in curl grad f cancel in pairs.
  • Implication: gradient fields are irrotational (no spin anywhere).
  • This is the three-dimensional version of ∂M/∂y = ∂N/∂x for conservative plane fields.

🔗 Divergence of a curl is always zero

div curl F = 0 for every vector field F.

  • Again, mixed derivatives cancel: P_xy = P_yx, N_xz = N_zx, M_zy = M_yz.
  • Twin identity: just as curl grad = 0, we also have div curl = 0.
  • Spin fields have no divergence; position fields have no curl.

🔗 Summary of automatic zeros

| Expression | Result | Why |
| --- | --- | --- |
| curl grad f | 0 | Mixed partials equal |
| div curl F | 0 | Mixed partials equal |
| curl (position field R) | 0 | R is a gradient |
| div (spin field S) | 0 | S is a curl |

📐 Stokes' Theorem

📐 The theorem statement

Stokes' Theorem: ∮_C F·dR = ∬_S (curl F)·n dS

  • Left side: line integral of F around a closed space curve C (the work or circulation).
  • Right side: surface integral of curl F over any surface S bounded by C.
  • The curve C is the boundary of the surface S.
  • The normal direction n and the direction around C are related by the right-hand rule: walking along C with your head pointing along n, the surface is on your left.
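
A small numerical check of the theorem, under stated assumptions: the sample field F = (z − y) i + (x − z) j + (y − x) k (whose curl is (2, 2, 2)), the unit circle in the xy-plane as C, and the flat unit disk with n = k as S are all choices made for this sketch.

```python
import math

def F(x, y, z):
    # Sample field with curl F = (2, 2, 2).
    return (z - y, x - z, y - x)

def circulation(n=1000):
    """Midpoint-rule line integral of F . dR around the unit circle in the xy-plane."""
    total = 0.0
    dt = 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        x, y, z = math.cos(t), math.sin(t), 0.0
        # Counterclockwise seen from +z, which matches n = k by the right-hand rule.
        dx, dy, dz = -math.sin(t) * dt, math.cos(t) * dt, 0.0
        M, N, P = F(x, y, z)
        total += M * dx + N * dy + P * dz
    return total

line_integral = circulation()
# On the flat unit disk (curl F) . n = 2, so the surface integral is 2 * area.
surface_integral = 2.0 * math.pi * 1.0 ** 2
print(line_integral, surface_integral)
```

Both numbers should equal 2π up to rounding.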

📐 Connection to Green's Theorem

  • Green's Theorem is the special case when S is flat (in the xy-plane) and n = k.
  • Then only the k-component of curl F matters: ∂N/∂x - ∂M/∂y.
  • Stokes' Theorem generalizes to curved surfaces in three dimensions.
  • Both proofs of Stokes' Theorem use Green's Theorem as the foundation.

📐 Two proof sketches

First proof (why it's true):

  • Break S into small triangles.
  • Apply Green's Theorem to each triangle in its own plane.
  • When you add up the triangles, the interior edges cancel (one triangle's CA cancels the adjacent triangle's AC).
  • Only the outer boundary C remains.

Second proof (how to compute):

  • For a surface z = f(x,y), express dz in terms of dx and dy.
  • The line integral becomes an integral over the shadow of C in the xy-plane.
  • The surface integral also projects down to the xy-plane.
  • Both sides equal by Green's Theorem on the shadow.

📐 Important consequence

  • If curl F = 0 everywhere, then ∮ F·dR = 0 around every closed curve.
  • This is the test for conservative fields in three dimensions.
  • Example application: Faraday's Law in electromagnetism, curl E = −∂B/∂t, so the circulation of E around C equals minus the time rate of change of the magnetic flux through S.

🎯 Conservative fields and potentials

🎯 Four equivalent properties

A field F = M i + N j + P k is conservative if it has any (hence all) of these properties:

| Property | Statement |
| --- | --- |
| A | Work ∮ F·dR = 0 around every closed path |
| B | Work from P to Q is path-independent |
| C | F is a gradient field: F = grad f for some potential f |
| D | Curl F = 0 (test D: M_y = N_x, M_z = P_x, N_z = P_y) |
  • Test D is the quick check: compute the three equations and see if they hold.
  • If curl F ≠ 0 anywhere, the field cannot be conservative.
  • If curl F = 0 everywhere, you can find the potential f.

🎯 Finding the potential function

When test D passes (curl F = 0), solve the three equations:

  1. From ∂f/∂x = M, integrate to get f = ∫M dx + C(y,z) (arbitrary function of y,z).
  2. From ∂f/∂y = N, match the y-derivative and determine C up to an arbitrary c(z).
  3. From ∂f/∂z = P, match the z-derivative and determine c.

Example: F = 2xy i + (x² + z) j + y k

  • Step 1: f = x²y + C(y,z)
  • Step 2: ∂f/∂y = x² + ∂C/∂y must equal x² + z, so C = yz + c(z)
  • Step 3: ∂f/∂z = y + dc/dz must equal y, so c is constant
  • Result: f = x²y + yz
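
One way to double-check the result is to differentiate f numerically and compare with F. The helper `grad`, the step size, and the sample point are assumptions made for this sketch.

```python
def F(x, y, z):
    # The field from the example: 2xy i + (x^2 + z) j + y k.
    return (2 * x * y, x * x + z, y)

def f(x, y, z):
    # The potential found in steps 1 to 3.
    return x * x * y + y * z

def grad(func, p, h=1e-5):
    """Central-difference gradient of a scalar function at point p."""
    g = []
    for i in range(3):
        plus = list(p)
        minus = list(p)
        plus[i] += h
        minus[i] -= h
        g.append((func(*plus) - func(*minus)) / (2 * h))
    return tuple(g)

point = (1.5, -0.4, 2.0)  # arbitrary sample point
numeric_grad = grad(f, point)
field_value = F(*point)
print(numeric_grad, field_value)
```

The two printed vectors should agree, confirming grad f = F at that point.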

🎯 When no potential exists

  • If test D fails (curl F ≠ 0), no potential exists.
  • Example: spin field F = (z - y) i + (x - z) j + (y - x) k has curl = (2,2,2) ≠ 0.
  • Trying to find f will lead to contradictions.
  • Don't confuse: a field can have zero divergence but nonzero curl (spin fields), or zero curl but nonzero divergence (position field).

🎯 Alternative method: line integral

  • Define f(x,y,z) as the work to reach (x,y,z) from (0,0,0).
  • The path doesn't matter (because F is conservative).
  • Integrate F·dR along any convenient path, such as the straight line (xt, yt, zt) from t=0 to t=1.
  • This gives the same potential function.
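
The line-integral method can be sketched for the earlier example F = 2xy i + (x² + z) j + y k. The function name `potential`, the midpoint rule, and the sample point are illustrative assumptions.

```python
def F(x, y, z):
    return (2 * x * y, x * x + z, y)

def potential(x, y, z, n=1000):
    """Work done by F along the straight line (xt, yt, zt), 0 <= t <= 1 (midpoint rule)."""
    dt = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        M, N, P = F(x * t, y * t, z * t)
        # Along the straight path, dR = (x, y, z) dt.
        total += (M * x + N * y + P * z) * dt
    return total

point = (1.5, -0.4, 2.0)
from_line_integral = potential(*point)
from_formula = point[0] ** 2 * point[1] + point[1] * point[2]  # x^2 y + y z
print(from_line_integral, from_formula)
```

The two values should agree closely, since the work from the origin reproduces f = x²y + yz.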

🔄 Review of the big theorems

🔄 The four fundamental theorems

| Theorem | Equation | Dimension |
| --- | --- | --- |
| Green (form 1) | ∮ F·dR = ∬(∂N/∂x - ∂M/∂y) dx dy | 2D, relates circulation to curl |
| Green (form 2) | ∮ F·n ds = ∬(∂M/∂x + ∂N/∂y) dx dy | 2D, relates flux to divergence |
| Divergence | ∬ F·n dS = ∭ div F dV | 3D, relates flux to divergence |
| Stokes | ∮ F·dR = ∬ curl F·n dS | 3D, relates circulation to curl |
  • Green's first form leads to Stokes' Theorem (both about curl and circulation).
  • Green's second form leads to the Divergence Theorem (both about divergence and flux).
  • The 3D theorems contain the 2D theorems as special cases (take P = 0 and a flat surface).

🔄 Why start with 2D?

  • The excerpt notes: "It is easier to generalize than to specialize."
  • Starting with simpler cases (derivatives before partial derivatives, Green before Stokes) builds understanding.
  • The 3D theorems could be stated first, but the 2D versions are more intuitive.

🔄 Common pattern

  • All four theorems convert a "boundary integral" to an "interior integral."
  • Line integrals become surface integrals; surface integrals become volume integrals.
  • The key operators (curl, divergence, gradient) connect the dimensions.

Mathematics after Calculus

CHAPTER 16 Mathematics after Calculus

🧭 Overview

🧠 One-sentence thesis

After calculus, students should choose from linear algebra, differential equations, discrete mathematics, advanced calculus with Fourier series, numerical methods, and statistics—each offering distinct tools for modeling and solving real-world problems, with linear algebra and differential equations being the two most fundamental courses.

📌 Key points (3–5)

  • Two main post-calculus courses: linear algebra (systems of equations, discrete, easier) and differential equations (continuous, rate-driven).
  • Linear algebra vs differential equations: linear algebra handles n interconnected variables in discrete systems; differential equations handle continuous change where the rate dy/dt depends on the present state y.
  • Discrete mathematics: focuses on networks, algorithms, and counting (no derivatives); essential for computer science.
  • Common confusion: linear algebra is discrete (from algebra), differential equations are continuous (from calculus)—don't conflate the two paradigms.
  • Why it matters: these courses provide the mathematical foundation for engineering, science, management, and computing; they enable modeling, optimization, and understanding of dynamic systems.

📐 Linear algebra: systems and interconnections

📐 What linear algebra studies

Linear algebra is about systems of equations with n variables to solve for, where a change in one affects the others.

  • The variables can represent prices, velocities, currents, concentrations—outputs from any model with interconnected parts.
  • The single key assumption: the model must be linear.
  • A change in one variable produces proportional changes in all variables.
  • Practically every subject begins with linear models; when systems become nonlinear, we solve them by a sequence of linear equations.
  • Example: an organization's pricing model with three products—changing the price of one product affects demand for the others proportionally.

🧮 Why linear algebra is fundamental

  • The excerpt states: "Linear algebra has become as basic and as applicable as calculus, and fortunately it is easier."
  • The text recommends taking this course.
  • Linear programming is mentioned as nonlinear because it requires x ≥ 0 (a constraint), but the core equations are linear.
  • Don't confuse: linear algebra deals with discrete systems (from algebra), not continuous rates of change.

🌊 Differential equations: continuous change

🌊 What differential equations study

A differential equation is continuous (from calculus), where the rate dy/dt is determined by the present state y—which changes by following that rule.

  • The system evolves over time according to a rule that depends on the current state.
  • The excerpt highlights two key forms:
    • y′ = cy + s(t) for economics and life sciences
    • y″ + by′ + cy = f(t) for physics and engineering
  • Example: a population y grows at a rate proportional to its current size (y′ = cy), or a drug concentration decays over time.
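
To see the rate following the present state, the sketch below steps y′ = cy + s(t) forward with Euler's method. Euler's method is brought in here only for illustration and is not described in the text; the function name `euler`, the step count, and the choice c = 1, s = 0 are assumptions.

```python
import math

def euler(c, s, y0, t_end, n):
    """Euler's method for y' = c*y + s(t), starting from y(0) = y0."""
    dt = t_end / n
    t, y = 0.0, y0
    for _ in range(n):
        y += dt * (c * y + s(t))  # the rate is set by the present state y
        t += dt
    return y

# With no source term the exact solution is y0 * e^(c t).
approx = euler(c=1.0, s=lambda t: 0.0, y0=1.0, t_end=1.0, n=1000)
exact = math.e
print(approx, exact)
```

The approximation should fall slightly below e and improve as `n` grows.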

🔄 Continuous vs discrete

  • Differential equations are continuous (from calculus).
  • Matrix equations (linear algebra) are discrete (from algebra).
  • Don't confuse: the same problem can be modeled either way—continuous models use derivatives; discrete models use differences.

🔢 Discrete mathematics: algorithms and networks

🔢 What discrete mathematics covers

  • Matrices are a part, but networks and algorithms are a bigger part.
  • Derivatives are not a part—this is closer to algebra.
  • It is needed in computer science.

🧩 Typical problems

  • The excerpt gives a matching question example:

    Can 25 states be matched with 25 neighbors, so one state in each pair has an even number of letters?

  • Example: New York pairs with New Jersey, Texas with Oklahoma, California with Arizona; rules are needed for Hawaii and Alaska.
  • This matching question "doesn't sound mathematical, but it is."
  • The excerpt mentions "counting the ways a computer can send ten messages in parallel—and finding the fastest way."

🗺️ Four topics from discrete mathematics

  • Section 16.3 selects four topics so students can decide if they want more.
  • The excerpt emphasizes that discrete math is essential for understanding algorithms and network problems.

🌀 Advanced calculus: Fourier series and transforms

🌀 Building blocks for solutions

  • The excerpt describes two key building blocks:
    • Any function u(x + iy) solves the Laplace equation u_xx + u_yy = 0.
    • Any function e^(ik(x + ct)) solves the wave equation u_tt − c²u_xx = 0.
  • From these building blocks, we assemble solutions.

📡 Fourier transform and series

A Fourier series breaks the signal into Σa_k cos kx or Σb_k sin kx or Σc_k e^(ikx).

  • For the wave equation, a signal starts at t = 0 and is a combination of pure oscillations e^(ikx).
  • The coefficients in that combination make up the Fourier transform—to tell how much of each frequency is in the signal.
  • The excerpt notes: "A lot of engineers and scientists would rather know those Fourier coefficients than f(x)."
  • These sums can be infinite (like power series).
  • Instead of values of f(x) or derivatives at the basepoint, the function is described by a_k, b_k, c_k.
  • Everything is computed by the "Fast Fourier Transform"—described as "the greatest algorithm since Newton's method."

📻 Examples of frequency content

  • A radio signal is near one frequency.
  • A step function has many frequencies.
  • A delta function has every frequency in the same amount: δ(x) = Σ cos kx.
  • The excerpt notes: "Channel 4 can't broadcast a perfect step function. You wouldn't want to hear a delta function."
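
The statement that a step function contains many frequencies can be illustrated by computing sine coefficients directly. This sketch uses plain numerical integration, not the Fast Fourier Transform; the square wave on (−π, π), the normalization b_k = (1/π)∫ f(x) sin kx dx, and the function names are assumptions made for the example.

```python
import math

def square_wave(x):
    # Step function: -1 on (-pi, 0), +1 on (0, pi).
    return 1.0 if x > 0 else -1.0

def sine_coefficient(f, k, n=2000):
    """Midpoint-rule estimate of b_k = (1/pi) * integral of f(x) sin(kx) over (-pi, pi)."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        x = -math.pi + (i + 0.5) * h
        total += f(x) * math.sin(k * x) * h
    return total / math.pi

# b[0] holds b_1, b[1] holds b_2, and so on.
b = [sine_coefficient(square_wave, k) for k in range(1, 8)]
print(b)
```

The odd coefficients should come out near 4/(πk) and the even ones near 0, so the content decays only slowly with frequency.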

🖥️ Numerical methods and scientific computing

🖥️ Algorithms replace formulas

  • For nonlinear equations, this means Newton's method.
  • For Ax = b, it means elimination.
  • Exact solutions are gone—speed, accuracy, and stability become essential.
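
A minimal sketch of Newton's method for one nonlinear equation, here x² − 2 = 0. The function name `newton`, the tolerance, and the starting guess are illustrative assumptions.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeat x <- x - f(x)/f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```

The answer is an approximation to √2 reached in a handful of iterations, which is the sense in which speed and accuracy replace an exact formula.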

🔬 Integration with theory

  • The excerpt suggests: "It seems right to make scientific computing a part of applied mathematics, and teach the algorithms with the theory."
  • The text Introduction to Applied Mathematics is mentioned as "one step in this direction, trying to present advanced calculus as it is actually used."

📊 Statistics: drawing conclusions from data

📊 Why statistics matters

  • The excerpt states: "Our society produces oceans of data—somebody has to draw conclusions."
  • To decide if a new drug works, if oil spills are common or rare, and how often to have a checkup, we can't just guess.
  • Example: "I am astounded that the connection between smoking and health was hidden for centuries. It was in the data! Eventually the statisticians uncovered it."

🔍 Finding patterns

  • Professionals can find patterns, and the rest of us can understand (with a little mathematics) what has been found.
  • One purpose in studying mathematics is to know more about your own life.

💡 The purpose of studying mathematics

💡 Understanding functions

Calculus lights up a key idea: Functions. Shapes and populations and heart signals and profits and growth rates, all are given by functions.

  • Functions change in time.
  • They have integrals and derivatives.
  • To understand and use them is a challenge—mathematics takes effort.

🌟 Contributing to the field

  • A lot of people have contributed, in whatever way they could—as the reader and author are doing.
  • The excerpt concludes: "We may not be Newton or Leibniz or Gauss or Einstein, but we can share some part of what they created."
  • Don't confuse: you don't need to be a genius to benefit from and contribute to mathematics—understanding and applying the tools is valuable in itself.