Elementary Calculus

The Derivative: Introduction to Calculus

1.1 Introduction

🧭 Overview

🧠 One-sentence thesis

Calculus provides a method—using limits—to analyze curved shapes and dynamic quantities by calculating instantaneous rates of change (derivatives) and areas under curves (integrals), which are connected by the Fundamental Theorem of Calculus.

📌 Key points

What calculus analyzes: curved shapes and dynamic (changing) quantities, in contrast to classical mathematics which focused on static quantities.
The derivative concept: instantaneous velocity (or rate of change) is found by taking the limit of average velocity over shrinking time intervals.
The integral concept: area inside curved regions is found by summing areas of rectangles whose widths shrink indefinitely.
Common confusion: do not replace Δt by 0 in the ratio Δs/Δt before canceling, since that gives 0/0; simplify first, then let Δt approach 0.
Why it matters: derivatives and integrals are connected by the Fundamental Theorem of Calculus, discovered independently by Newton and Leibniz in the 17th century.

🎯 What calculus is and why it was invented

🎯 The core idea

Calculus: the analysis of curved shapes.

Classical mathematics (algebra, geometry, trigonometry) studied static quantities.
Calculus introduced a way to analyze dynamic (changing) quantities.
Its development grew out of attempts to solve physical problems involving motion and change.

🏛️ Historical context

Calculus is several centuries old and marked the beginning of modern mathematics.
The 17th–19th centuries saw revolutionary advances in physics, chemistry, biology, and other sciences; calculus was part of that qualitative leap.
The first European calculus textbook was written by Guillaume de l'Hôpital in 1696, titled Analysis of the Infinitely Small for Understanding Curved Lines.

🪂 The motivating example: a falling object

🪂 The physical setup

An object at rest 100 ft above the ground is dropped (ignoring air resistance and wind).
The object falls straight down until it hits the ground.
Its position s above the ground after t seconds is given by s(t) = −16t² + 100 ft.
The object hits the ground after 2.5 seconds (when s = 0).

📉 Path vs graph

The object's path is a straight line (vertical drop).
The graph of position s as a function of time t is curved—part of a parabola.
This illustrates why calculus is needed: even though the physical path is straight, the relationship between position and time is curved.

❓ The question

How fast is the object moving at the instant it hits the ground?

🧮 From average to instantaneous velocity

🧮 Average speed and average velocity

Average speed over 2.5 seconds:
- distance traveled / time elapsed = 100 ft / 2.5 seconds = 40 ft/s.
Average velocity over 2.5 seconds:
- (final position − initial position) / (end time − start time) = (0 − 100) / (2.5 − 0) = −40 ft/s.
Key difference: velocity takes direction into account; downward motion means negative velocity, upward motion means positive velocity.

⏱️ Defining instantaneous velocity

The natural way to define instantaneous velocity at a particular instant t:

Find the average velocity over an interval of time [t, t + Δt], where Δt is a small positive number.
Let the interval become smaller and smaller indefinitely, shrinking to the point t.
If the average velocity approaches some value, call that value the instantaneous velocity at time t.

📐 The calculation

Over the interval [t, t + Δt], the change in time is Δt and the change in position is Δs.
Average velocity = Δs / Δt = [s(t + Δt) − s(t)] / Δt.
Substituting s(t) = −16t² + 100:
- Δs / Δt = [−16(t + Δt)² + 100 − (−16t² + 100)] / Δt
- = [−16t² − 32tΔt − 16(Δt)² + 100 + 16t² − 100] / Δt
- = [−32tΔt − 16(Δt)²] / Δt
- = Δt(−32t − 16Δt) / Δt
- = −32t − 16Δt.
Critical step: Δt is canceled before letting it approach 0.
As Δt gets closer and closer to 0, the average velocity −32t − 16Δt gets closer and closer to −32t − 0 = −32t.

🎯 The result

The object has instantaneous velocity −32t at time t.
At the instant the object hits the ground (t = 2.5 sec), the instantaneous velocity is −32(2.5) = −80 ft/s.
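
The shrinking-interval idea can be checked numerically. The following is a minimal sketch, not part of the excerpt; the function names `s` and `average_velocity` are illustrative choices. It evaluates the average velocity over [t, t + Δt] at t = 2.5 for smaller and smaller Δt.

```python
def s(t):
    """Position (ft) above the ground after t seconds: s(t) = -16t^2 + 100."""
    return -16 * t**2 + 100


def average_velocity(t, dt):
    """Average velocity over [t, t + dt], i.e. the ratio Δs/Δt (dt must be nonzero)."""
    return (s(t + dt) - s(t)) / dt


t = 2.5
for dt in (0.1, 0.01, 0.001, 0.0001):
    # Algebraically this ratio equals -32t - 16*dt, so it should approach -80.
    print(f"dt = {dt:<7} average velocity = {average_velocity(t, dt):.4f} ft/s")
```

Here the formula for s(t) is used purely as arithmetic for times slightly past 2.5 seconds; the printed values should drift toward −80 ft/s as dt shrinks.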

⚠️ Don't confuse

Do not replace Δt by 0 in the ratio Δs / Δt until after doing as much cancellation as possible.
If you substitute Δt = 0 too early, you get 0/0, which is undefined.

🔢 The limit notation

🔢 Writing the limit

The calculation above is interpreted as taking the limit of Δs / Δt as Δt approaches 0:

instantaneous velocity at t = limit of average velocity over [t, t + Δt] as Δt approaches 0
= lim (Δt → 0) [Δs / Δt]
= lim (Δt → 0) [−32t − 16Δt]
= −32t − 16(0)
= −32t.

🔄 Why velocity varies with t

The instantaneous velocity v(t) = −32t varies with t, as it should.
The object accelerates as it falls, so its velocity increases in magnitude over time.

📊 The derivative concept

📊 What the derivative is

The instantaneous velocity v(t) = −32t is called the derivative of the position function s(t) = −16t² + 100.

Calculating derivatives, analyzing their properties, and using them to solve problems are part of differential calculus.

📈 Connection to curved shapes

Instantaneous velocity is a special case of an instantaneous rate of change of a function.
Similar to how the rate of change of a line is its slope, the instantaneous rate of change of a general curve represents the slope of the curve.
Example: the parabola s(t) = −16t² + 100 has slope −32t for all t.
Key difference from lines: the slope of this curve varies (as a function of t), unlike the slope of a straight line, which is constant.

🟦 Integrals: area under curves

🟦 The area problem

Finding the area inside curved regions is another type of problem that calculus can solve.

🧱 The basic idea

Use simpler regions—rectangles—whose areas are known.
Use those rectangles to approximate the area inside the curved region.
Draw more and more rectangles of diminishing widths inside the curved region.
The sums of their areas approach the area of the curved region.

📐 Example

The excerpt shows an example with four rectangles to approximate the area under a curve y = f(x) over an interval [a, b] on which f(x) ≥ 0.

🔢 The integral

The limit of these sums of rectangular areas is called an integral.
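
The rectangle idea can be sketched in a few lines. This is an illustration under stated assumptions, not the excerpt's example: the curve f(x) = x² on [0, 1] and the names `rectangle_sum` and `f` are hypothetical choices. Because this f is increasing on [0, 1], rectangles whose heights are taken at left endpoints lie inside the region.

```python
def rectangle_sum(f, a, b, n):
    """Sum of the areas of n rectangles of equal width (b - a)/n on [a, b],
    each with height f evaluated at the left endpoint of its subinterval."""
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(n))


def f(x):
    # Hypothetical example curve, chosen only for illustration; f(x) >= 0 on [0, 1].
    return x**2


for n in (4, 40, 400, 4000):
    print(f"{n:>5} rectangles: area ≈ {rectangle_sum(f, 0.0, 1.0, n):.6f}")
```

For this particular curve the sums should creep up toward 1/3, the exact area under y = x² over [0, 1].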

The study and application of integrals are part of integral calculus.

🔗 The Fundamental Theorem of Calculus

🔗 The remarkable connection

Perhaps the most remarkable result in calculus is that there is a connection between derivatives and integrals—the Fundamental Theorem of Calculus.

🏆 Discovery

Discovered in the 17th century, independently, by two men who invented calculus as we know it:
- Isaac Newton (1642–1727): English physicist, astronomer, and mathematician.
- Gottfried Wilhelm von Leibniz (1646–1716): German mathematician and philosopher.

∞ Infinite series and power series

∞ What infinite series are

An infinite series is just a sum of an infinite number of terms.

🥧 Example: approximating π

The excerpt gives the formula:

π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − ···,
where the sum on the right involves an infinite number of terms.
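
A computer can only add finitely many terms, but partial sums show the idea. The sketch below is illustrative; the function name `leibniz_partial_sum` is a hypothetical choice. It adds the first few terms of the series and multiplies by 4 to compare with π.

```python
import math


def leibniz_partial_sum(n_terms):
    """Sum of the first n_terms terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))


for n_terms in (10, 1000, 100000):
    approx_pi = 4 * leibniz_partial_sum(n_terms)
    print(f"{n_terms:>6} terms: 4 * sum = {approx_pi:.6f}  (math.pi = {math.pi:.6f})")
```

The partial sums approach π/4 slowly, so many terms are needed for even a few correct digits.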

📈 Power series

A power series is a particular type of infinite series applied to functions; it can be thought of as a polynomial of infinite degree.

🌊 Example: sine function

The trigonometric function sin x does not appear to be a polynomial, but it has a power series representation:

sin x = x − x³/3! + x⁵/5! − x⁷/7! + x⁹/9! − ···,
where the sum continues infinitely, and the formula holds for all x (in radians).
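
The "polynomial of infinite degree" view can be tried out by truncating the series. This is a minimal sketch; the name `sin_partial_sum` is a hypothetical choice, and the comparison value comes from Python's `math.sin`.

```python
import math


def sin_partial_sum(x, n_terms):
    """First n_terms terms of x - x^3/3! + x^5/5! - ... (x in radians)."""
    return sum(
        (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
        for k in range(n_terms)
    )


x = 1.0
for n_terms in (1, 2, 3, 5):
    approx = sin_partial_sum(x, n_terms)
    print(f"{n_terms} terms: {approx:.8f}  (math.sin = {math.sin(x):.8f})")
```

Each extra term is an ordinary polynomial term, yet a handful of them should already track sin x closely for moderate x.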

🔧 Why power series matter

The idea of replacing a function by its power series played an important role throughout the development of calculus.
It is a powerful technique in many applications.

🔢 Number systems and notation

🔢 Standard sets of numbers

The excerpt defines:

Symbol	Meaning	Example members
N	Natural numbers (nonnegative integers)	0, 1, 2, 3, 4, ...
Z	All integers	0, ±1, ±2, ±3, ±4, ...
Q	Rational numbers m/n (m, n integers, n ≠ 0)	1/2, −3/4, 5, 0, ...
R	All real numbers	All rational and irrational numbers

Subset relationship: N ⊂ Z ⊂ Q ⊂ R.

🔍 Irrational numbers

Irrational numbers: real numbers that are not rational.

Example: √2 is irrational (2 is not the square of a rational number).
Proof sketch: If q is a rational number and q² is an integer, then q itself must be an integer. Since 2 is not the square of an integer, it cannot be the square of a rational number.
This argument also shows that √3, √5, √6, √7, √8, √10, etc., are irrational.

♾️ Size of infinite sets

There are far more irrational numbers—and hence real numbers—than rational numbers.
The rational numbers can be listed in a sequence (first, second, third, etc.).
The set of real numbers cannot be listed in a sequence.
Thus, some infinite sets are larger than others—R is larger than Q.

🌊 The continuum

A continuum: no gaps exist.

Intervals such as [0, 1] or R itself are examples of a continuum.
In the closed interval [0, 1], there is no "next" real number after 0.
Continuum Hypothesis: the claim that no infinite set is larger than Q but smaller than R. It can be neither proved nor disproved from the standard axioms of set theory.

∞ Infinity in calculus

∞ Two notions of infinity

Infinitely large: unbounded growth.
Infinitesimally small: quantities approaching zero.

🧮 Mathematical meaning

Calculus attempts to give the idea of infinity some mathematical meaning, typically by way of limits.

🤔 Philosophical debate

The mathematical use of infinity has been a subject of philosophical debate.

🔄 Alternative approaches

Not everyone agrees that calculus handles infinity satisfactorily. For example, infinitesimal analysis is an alternative development of the same material without using limits in the traditional sense.

1.2 The Derivative: Limit Approach

🧭 Overview

🧠 One-sentence thesis

The derivative measures the instantaneous rate of change of a function at a point, and it can be computed using a limit definition that generalizes the concept of velocity to any function.

📌 Key points

What the derivative is: the instantaneous rate of change of a function f at x, defined as a limit of the ratio of change in f to change in x as the change approaches zero.
How to compute it: use the limit definition with delta-x (or h) approaching zero; simplify the ratio before substituting zero to avoid division by zero.
Special cases: constant functions have derivative zero; linear functions have constant derivatives equal to their slope.
Common confusion: the derivative is not the value of the function itself, but the rate at which the function changes; negative derivatives mean the function is decreasing, positive means increasing.
When derivatives don't exist: functions can fail to be differentiable at points where the graph has a "corner" (like absolute value at x = 0).

📐 The limit definition

📐 Formal definition

The derivative of a real-valued function f(x), denoted by f′(x), is the limit as delta-x approaches 0 of (f(x + delta-x) − f(x)) / delta-x, for x in the domain of f, provided that the limit exists.

This generalizes the instantaneous velocity example from the previous section.
The derivative f′ is itself a function of x—you can evaluate it at specific values or write its general formula.
The limit captures the idea of "instantaneous" change: shrink the interval to zero and see what ratio the changes approach.

🔄 Alternative formulations

The excerpt provides several equivalent ways to write the derivative:

Formulation	Expression	When to use
Delta notation	lim (delta-x → 0) of (f(x + delta-x) − f(x)) / delta-x	Physics contexts, emphasizes change
h notation	lim (h → 0) of (f(x + h) − f(x)) / h	Common in math texts
w notation	lim (w → x) of (f(w) − f(x)) / (w − x)	Emphasizes two points approaching each other
Backward difference	lim (h → 0) of (f(x) − f(x − h)) / h	Useful for certain proofs
Symmetric difference	lim (h → 0) of (f(x + h) − f(x − h)) / (2h)	Averages forward and backward approaches

All formulations are mathematically equivalent when the derivative exists.
The symmetric difference formula is proved using limit rules: it equals the average of the forward and backward formulas.
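
The forward, backward, and symmetric quotients can be compared side by side for a small nonzero h. The sketch below is illustrative only: the test function f(x) = x³ at x = 2 (derivative 12) and all function names are hypothetical choices, not from the excerpt.

```python
def forward(f, x, h):
    """Forward difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def backward(f, x, h):
    """Backward difference quotient (f(x) - f(x - h)) / h."""
    return (f(x) - f(x - h)) / h


def symmetric(f, x, h):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def cube(x):
    # Hypothetical test function; its derivative at x = 2 is 12.
    return x**3


x = 2.0
for h in (0.1, 0.01, 0.001):
    fwd, bwd, sym = forward(cube, x, h), backward(cube, x, h), symmetric(cube, x, h)
    print(f"h = {h:<5} forward = {fwd:.6f}  backward = {bwd:.6f}  symmetric = {sym:.6f}")
```

In each row the symmetric value should be the average of the forward and backward values, and all three should close in on 12 as h shrinks.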

🎯 Simple examples and patterns

🎯 Constant functions

Rule: The derivative of any constant function is 0.
Why: A constant function never changes value, so its rate of change is always zero.
Example: f(x) = 1 has f′(x) = 0 for all x.
The excerpt notes that formal calculation is unnecessary—the answer should be obvious from the meaning of "rate of change."

📏 Linear functions

Rule: The derivative of any linear function f(x) = mx + b is the slope m itself: f′(x) = m for all x.
Why: Lines change at a constant rate, which is precisely their slope.
Example: f(x) = 2x − 1 has slope 2, so f′(x) = 2 everywhere.
Example: f(x) = −x + 2 has slope −1, so f′(x) = −1 everywhere.
Converse: A function with a constant derivative must be a linear function (proved later in the text).

🌀 Curved functions

Functions representing curves (not straight lines) do not change at a constant rate—that is precisely what makes them curved.
Such functions do not have constant derivatives.
Example: The parabola s(t) = −16t² + 100 has derivative s′(t) = −32t, which is not constant.
Calculating derivatives of curved functions using the limit definition requires more algebraic work.

🔢 Worked example: reciprocal function

🔢 Finding the derivative of f(x) = 1/x

The excerpt walks through the calculation for all x ≠ 0:

Write the limit definition: lim (delta-x → 0) of (1/(x + delta-x) − 1/x) / delta-x
Notice this gives 0/0 form, so simplify the ratio before plugging in delta-x = 0
Get a common denominator: [(x − (x + delta-x)) / ((x + delta-x)x)] / delta-x
Simplify: −delta-x / (delta-x(x + delta-x)x)
Cancel delta-x: −1 / ((x + delta-x)x)
Now take the limit as delta-x → 0: −1 / (x · x) = −1/x²

Result: f′(x) = −1/x² for all x ≠ 0.
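
The cancellation step can be checked with exact rational arithmetic. The following sketch assumes the sample point x = 2 and uses Python's `fractions.Fraction`; the name `difference_quotient` is a hypothetical choice.

```python
from fractions import Fraction


def f(x):
    return 1 / x


def difference_quotient(x, dx):
    """(f(x + dx) - f(x)) / dx for f(x) = 1/x, with dx nonzero."""
    return (f(x + dx) - f(x)) / dx


x = Fraction(2)
for dx in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    q = difference_quotient(x, dx)
    # After cancelling dx, the quotient should equal -1 / ((x + dx) * x) exactly.
    assert q == -1 / ((x + dx) * x)
    print(f"dx = {dx}: quotient = {q} ≈ {float(q):.6f}")
```

As dx shrinks, the decimal values should approach −1/4, matching f′(2) = −1/2².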

📉 Interpreting the sign

The instantaneous rate of change at x = 2 is f′(2) = −1/4, a negative number.
Why negative matters: The function f(x) = 1/x is decreasing at x = 2 (and everywhere x ≠ 0).
General principle:
- Negative derivative → function is decreasing
- Positive derivative → function is increasing
The rate of change is always taken in the direction of increasing x (the positive x direction).

🧮 Limit rules

🧮 Intuitive idea of limits

For a real number a and a real-valued function f(x), the limit of f(x) as x approaches a equals the number L if f(x) approaches L as x approaches a.

Equivalently: f(x) can be made as close as you want to L by choosing x close enough to a.
Note: x can approach a from any direction.
For now, the excerpt uses only this intuitive understanding; formal definitions come later.

📋 Basic limit rules

The excerpt lists five rules (proofs come later):

Sum: lim (f + g) = lim f + lim g
Difference: lim (f − g) = lim f − lim g
Constant multiple: lim (k · f) = k · lim f
Product: lim (f · g) = (lim f) · (lim g)
Quotient: lim (f/g) = (lim f) / (lim g), if lim g ≠ 0

These say: the limit of sums, differences, constant multiples, products, and quotients is the sum, difference, constant multiple, product, and quotient (respectively) of the limits.
This seems intuitively obvious.
These rules are used to derive alternative formulations of the derivative.

🔍 Special properties

🔍 Even and odd functions

Definitions:

A function f is even if f(−x) = f(x) for all x in its domain (examples: x², x⁴, cos x).
A function f is odd if f(−x) = −f(x) for all x in its domain (examples: x, x³, sin x).

Derivative properties:

The derivative of an even function is an odd function.
The derivative of an odd function is an even function.

Proof sketch for even functions (from the excerpt):

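Start with f′(−x) = lim (h → 0) of (f(−x + h) − f(−x)) / h, the limit definition with x replaced by −x.
Write −x + h as −(x − h); since f is even, f(−(x − h)) = f(x − h) and f(−x) = f(x).
The quotient becomes (f(x − h) − f(x)) / h = −(f(x) − f(x − h)) / h, whose limit is −f′(x) by the backward difference formula, so f′ is odd.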

⚠️ When derivatives don't exist

Example: f(x) = |x| at x = 0

The absolute value function f(x) = |x| is defined as x if x ≥ 0, and −x if x < 0.

The graph consists of two lines meeting at the origin.
For x ≥ 0: the graph is y = x with slope 1.
For x ≤ 0: the graph is y = −x with slope −1.
At x = 0: the lines agree in value (both equal 0) but their slopes do not agree.
Therefore the derivative does not exist at x = 0, since the derivative of a curve is just its slope.
Don't confuse: The function itself is defined and continuous at x = 0; only the derivative fails to exist there.
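
The disagreement of the two slopes shows up directly in the difference quotients at x = 0. This is a minimal sketch; the helper name `one_sided_quotient` is a hypothetical choice, and Python's built-in `abs` plays the role of f(x) = |x|.

```python
def one_sided_quotient(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h; the sign of h picks the side."""
    return (f(x + h) - f(x)) / h


for h in (0.1, 0.001, 0.00001):
    from_right = one_sided_quotient(abs, 0.0, h)
    from_left = one_sided_quotient(abs, 0.0, -h)
    print(f"h = {h}: from the right {from_right:+.1f}, from the left {from_left:+.1f}")
```

The quotient stays at +1 from the right and −1 from the left no matter how small h is, so no single limiting value exists.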

📚 Terminology and notation

📚 Key terms

Differentiable at x: the derivative f′(x) exists.
Differentiable function: differentiable at every point in its domain.
Differentiation: the act of calculating a derivative.
Example: f(x) = x is a differentiable function; f(x) = |x| is not differentiable at x = 0.

🖊️ Multiple notations

The excerpt lists many equivalent ways to denote the derivative of y = f(x):

dy/dx (Leibniz notation, looks like a fraction)
f′(x) (prime notation)
d/dx (f(x)) (operator notation)
y′ (prime on dependent variable)
ẏ or ḟ(x) (dot notation, common in physics)
df/dx (differential notation)
Df(x) (operator notation)

All are equivalent; different fields and contexts prefer different notations.

🌍 Physical applications

🌍 Examples across fields

The excerpt provides a table of physical quantities where one is the derivative of another:

Field	Function	Derivative
Physics	position	velocity
Physics	velocity	acceleration
Physics	momentum	force
Physics	work	power
Physics	angular momentum	torque
Engineering	electric charge	electric current
Engineering	magnetic flux	induced voltage
Economics	profit	marginal profit

Velocity as the derivative of position is just one application among many.
The derivative concept applies whenever you need an instantaneous rate of change.

1.3 The Derivative: Infinitesimal Approach

🧭 Overview

🧠 One-sentence thesis

The derivative can be understood as a ratio of infinitesimals—infinitely small but nonzero quantities—which provides an intuitive alternative to the limit approach and reveals that instantaneous rate of change is simply the average rate of change over an infinitesimal interval.

📌 Key points

Infinitesimals as an alternative framework: The derivative dy/dx can be treated as an actual fraction of two infinitesimal quantities, not just notation, offering a more intuitive approach used in physics and engineering.
What infinitesimals are: Numbers closer to zero than any nonzero real number but not zero themselves, with the special property that their square equals zero.
Microstraightness property: At the infinitesimal scale, any differentiable curve becomes a straight line segment, making the derivative literally the slope (rise over run) of that segment.
Common confusion: Infinitesimals vs zero—an infinitesimal δ is not zero (2δ ≠ δ), but its square is zero (δ² = 0); this is fundamentally different from how zero behaves.
Practical insight: For small real values near zero, sin(x) ≈ x, which follows from the infinitesimal result sin(dx) = dx and is widely used in engineering applications.

🔢 Understanding infinitesimals

🔢 Definition and properties

A number δ is an infinitesimal if: (a) δ ≠ 0, (b) if δ > 0 then δ is smaller than any positive real number, (c) if δ < 0 then δ is larger than any negative real number, (d) δ² = 0 (and all higher powers are also 0).

Infinitesimals are not real numbers—they exist as mathematical abstractions, just as infinity does in calculus.
Think of an infinitesimal as "infinitely small, arbitrarily close to 0 but not 0."
Any infinitesimal multiplied by a nonzero real number remains an infinitesimal; zero times an infinitesimal equals zero.

⚖️ How infinitesimals differ from zero

Property	Zero (0)	Infinitesimal (δ)
Equals itself when doubled?	Yes: 2·0 = 0	No: 2δ ≠ δ
Square equals itself?	Yes: 0² = 0	No: δ² = 0 ≠ δ
Distinct constant multiples differ?	No: c·0 = 0 for every c	Yes: aδ ≠ bδ whenever a ≠ b

This distinction is crucial: constant multiples of an infinitesimal are all different from each other, unlike zero.
Example: The calculator analogy—squaring very small numbers like 10⁻⁸ produces results so tiny (10⁻¹⁶) that calculators treat them as effectively zero.
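
The calculator analogy can be reproduced with ordinary floating-point numbers. The sketch below assumes standard double-precision floats (Python's `float` on mainstream platforms); the variable name `tiny` is an illustrative choice, and a float is only an analogy for an infinitesimal, not one itself.

```python
tiny = 1e-8

# tiny is small but still registers when added to 1.
print(1.0 + tiny == 1.0)

# tiny squared is about 1e-16, below what double precision can resolve next to 1,
# so adding it to 1 changes nothing: it behaves "like zero" at this scale.
print(tiny * tiny)
print(1.0 + tiny * tiny == 1.0)
```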

🔗 Connection to limits

The infinitesimal approach is not fundamentally different from letting Δx approach 0 in limits—you consider values arbitrarily close to but not equal to zero.
The excerpt notes this approach was developed rigorously and is equivalent to the limit approach; only the terminology differs.

📐 The derivative as a ratio

📐 Defining dy/dx with infinitesimals

Let dx be an infinitesimal such that f(x + dx) is defined. Then dy = f(x + dx) − f(x) is also an infinitesimal, and the derivative is: dy/dx = [f(x + dx) − f(x)] / dx

dx: an infinitesimally small change in the variable x.
dy: the resulting infinitesimally small change in y = f(x).
The derivative is literally the ratio of these two infinitesimals.

🧮 Example calculation

Example: Finding the derivative of y = x²

Start with dy/dx = [f(x + dx) − f(x)] / dx
Substitute: [(x + dx)² − x²] / dx
Expand: [x² + 2x·dx + (dx)² − x²] / dx
Since dx is infinitesimal, (dx)² = 0
Simplify: [2x·dx + 0] / dx = 2x·dx / dx = 2x
Result: dy/dx = 2x (a real number, no infinitesimals in the final answer)

Key observation: No limits are needed in this calculation, and the final derivative is always a real number.
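
One way to experiment with the rule (dx)² = 0 on a computer is a small number type that stores a real part and a coefficient of dx and drops every (dx)² term when multiplying. The sketch below is an illustration under that assumption, not the excerpt's method; the class name `Nilsquare` and the helpers `_lift` and `derivative` are hypothetical. Pairs of this kind are commonly called dual numbers. Only +, − and × are supported.

```python
class Nilsquare:
    """A number a + b*dx where dx is treated as an infinitesimal with dx*dx = 0."""

    def __init__(self, real, inf=0.0):
        self.real = real  # ordinary real part
        self.inf = inf    # coefficient of dx

    def __add__(self, other):
        other = _lift(other)
        return Nilsquare(self.real + other.real, self.inf + other.inf)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Nilsquare(self.real - other.real, self.inf - other.inf)

    def __mul__(self, other):
        other = _lift(other)
        # (a + b dx)(c + d dx) = ac + (ad + bc) dx + bd (dx)^2, and (dx)^2 = 0.
        return Nilsquare(self.real * other.real,
                         self.real * other.inf + self.inf * other.real)

    __rmul__ = __mul__


def _lift(value):
    """Treat a plain number as having no infinitesimal part."""
    return value if isinstance(value, Nilsquare) else Nilsquare(value)


def derivative(f, x):
    """Coefficient of dx in f(x + dx) - f(x), which plays the role of dy/dx."""
    dy = f(Nilsquare(x, 1.0)) - f(Nilsquare(x))
    return dy.inf


def square(x):
    return x * x


print(derivative(square, 3.0))  # the text's result dy/dx = 2x, evaluated at x = 3
```

No limit or small step size appears anywhere: the (dx)² term is discarded exactly as in the hand calculation.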

📝 Notation equivalence

All these notations mean the same thing for the derivative of y = f(x):

dy/dx, f′(x), d/dx(f(x)), y′, ẏ, ḟ(x), df/dx, Df(x)

dy/dx is called Leibniz notation (created by Leibniz).
The dot notation ẏ is Newton's notation, still used in physics when the variable represents time.
The prime notation f′ is due to Lagrange.

🔍 Microstraightness and instantaneous rate

🔍 The microstraightness property

For the graph of a differentiable function, any part of the curve with infinitesimal length is a straight line segment—at the infinitesimal level, differentiable curves are straight.

Why this matters: As points A and B on a curve get closer, the curve between them becomes almost linear; at the infinitesimal scale, it actually is linear.
The distance s along the curve equals the length of the straight line segment AB when s is infinitesimal.
This property extends to all smooth curves (curves without sharp edges or cusps).

⚡ Revealing instantaneous rate of change

What instantaneous rate really means: the average rate of change over an infinitesimal interval.

Moving an infinitesimal amount dx from x produces an infinitesimal change dy in y = f(x).
The average rate over the infinitesimal interval [x, x + dx] is dy/dx.
This is literally the slope—rise over run—of the straight line segment that the curve becomes over that infinitesimal interval.
Don't confuse: This is not a limit of average rates; it is an average rate, just over an infinitesimal interval.

🌊 Application to trigonometric functions

🌊 Deriving sin(dx) = dx

The excerpt uses geometric reasoning with a unit circle:

Consider a circle of radius 1 with an infinitesimal angle dx (in radians).
By geometry (Thales' Theorem and arc length formula), the chord length BC = 2·sin(dx).
The arc length along the circle is 2·dx.
By microstraightness, the arc equals the chord at the infinitesimal scale: 2·sin(dx) = 2·dx.
Therefore: sin(dx) = dx for infinitesimal dx.

📊 Consequences and approximations

From sin(dx) = dx, we also get:

Since sin(dx) = dx, sin²(dx) = (dx)² = 0; then sin²(dx) + cos²(dx) = 1 gives cos²(dx) = 1 − 0 = 1.
Therefore: cos(dx) = 1 (choosing the positive solution).

Practical approximation: For real values x close to 0, sin(x) ≈ x.

Figure 1.3.3 shows the graphs of y = sin(x) and y = x are virtually identical over the interval [−0.3, 0.3].
This approximation is widely used in engineering and physics when angles are assumed small.
The key insight: at the infinitesimal level, y = sin(x) is identical to the line y = x (not y = 0) near x = 0.
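
The quality of the approximation sin(x) ≈ x can be tabulated for a few small angles. This is a minimal sketch; the function name `small_angle_error` and the sample values of x are illustrative choices.

```python
import math


def small_angle_error(x):
    """Absolute difference between sin(x) and x, with x in radians."""
    return abs(math.sin(x) - x)


for x in (0.3, 0.1, 0.01, 0.001):
    print(f"x = {x:<6} sin(x) = {math.sin(x):.8f}  |sin(x) - x| = {small_angle_error(x):.2e}")
```

The gap should shrink much faster than x itself, which is why the approximation works so well near 0.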

🧮 Derivative formulas

Using the infinitesimal properties and addition formulas:

For y = sin(x):

dy/dx = [sin(x + dx) − sin(x)] / dx
Apply sine addition formula: [sin(x)·cos(dx) + sin(dx)·cos(x) − sin(x)] / dx
Substitute sin(dx) = dx and cos(dx) = 1: [sin(x)·1 + dx·cos(x) − sin(x)] / dx
Simplify: dx·cos(x) / dx = cos(x)
Result: d/dx(sin x) = cos x

Similarly (left as exercise): d/dx(cos x) = −sin x

🔬 Fundamental difference at infinitesimal scale

There is a crucial difference between a line of slope 1 (y = x) and slope 0 (y = 0) at the infinitesimal level.
In any real interval (−a, a) around x = 0, the difference can be made arbitrarily small.
But in an infinitesimal interval (−δ, δ), there is an "unbridgeable gulf" between them.
This is why sin(dx) = dx rather than 0—the function follows the slope-1 line, not the slope-0 line.

🧰 Working with differentials

🧰 The differential relation

df = f′(x)·dx

Starting from df/dx = f′(x), multiply both sides by dx.
Both sides are infinitesimals for each x in the domain of f′, since f′(x) is a real number.
This formula is important for later applications.

🎯 Function values at infinitesimals

A function evaluated at an infinitesimal may itself be an infinitesimal: sin(dx) = dx.
Or it may be a real number: cos(dx) = 1.
The behavior depends on the specific function.

📚 Historical context

📚 Origins and development

The infinitesimal approach was used by the founders of calculus—Newton and especially Leibniz.
It remains common in physics, engineering, and chemistry due to its intuitive nature.
The approach presented is based on the "nilsquare infinitesimal" method developed by J.L. Bell.
An alternative rigorous treatment of infinitesimals uses the hyperreal number system.

📚 Controversy and acceptance

Infinitesimals were radical and controversial; philosopher George Berkeley called them "ghosts of departed quantities" in 1734.
By the 19th century, mathematicians like Cauchy and Weierstrass developed the limit-based approach for greater "rigor."
However, the infinitesimal notion gave calculus its modern character by demonstrating the power of abstractions that don't obey classical mathematical rules.
The limit approach ultimately turns out to be equivalent to the infinitesimal approach—only the terminology differs.

1.4 Derivatives of Sums, Products and Quotients

🧭 Overview

🧠 One-sentence thesis

The derivative of a sum, product, or quotient of functions follows specific rules that allow term-by-term differentiation for sums but require special formulas for products and quotients, with the Power Rule enabling differentiation of any integer power of x.

📌 Key points

Five core rules: Sum, Difference, Constant Multiple, Product, and Quotient Rules govern how to differentiate combinations of functions.
Product Rule is not intuitive: The derivative of a product is not the product of the derivatives; instead it is f·(dg/dx) + g·(df/dx).
Power Rule for all integers: The derivative of x to the n is n times x to the (n−1), proven by induction for nonnegative integers and by the Quotient Rule for negative integers.
Common confusion: Don't assume derivatives distribute over products or quotients the way they do over sums—products and quotients require their own formulas.
Linear operator property: Differentiation is linear, meaning the derivative of a linear combination (constants times functions, added together) can be taken term by term.

📐 The five differentiation rules

➕ Sum and Difference Rules

Sum Rule: The derivative of (f + g) equals (df/dx) + (dg/dx).

Difference Rule: The derivative of (f − g) equals (df/dx) − (dg/dx).

These rules say you can differentiate sums and differences term by term.
The proof of the Sum Rule uses the definition of the derivative and rearranges the numerator into separate fractions.
Example: If f(x) = x² and g(x) = x, then d/dx(x² + x) = 2x + 1.

✖️ Constant Multiple Rule

Constant Multiple Rule: The derivative of c·f equals c·(df/dx) for any constant c.

Constants can be "pulled out" of the derivative.
This rule combines with the Sum and Difference Rules to handle linear combinations of functions.

🔗 Product Rule

Product Rule: The derivative of (f · g) equals f·(dg/dx) + g·(df/dx).

Key warning: The derivative of a product is not the product of the derivatives.
Example showing this: If f(x) = x and g(x) = 1, then (f·g)(x) = x, so d(f·g)/dx = 1, but (df/dx)·(dg/dx) = 1·0 = 0.
Don't confuse: d(f·g)/dx ≠ (df/dx)·(dg/dx).
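
The warning can be tested numerically. The sketch below uses hypothetical example functions f(x) = x² and g(x) = sin x (with derivatives 2x and cos x) and a symmetric difference quotient as a stand-in for the true derivative; all names are illustrative.

```python
import math


def numerical_derivative(func, x, h=1e-6):
    """Symmetric difference quotient, used here as an approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):
    return x**2


def df(x):
    return 2 * x


g, dg = math.sin, math.cos


def product(x):
    return f(x) * g(x)


x = 1.0
approx = numerical_derivative(product, x)
product_rule = f(x) * dg(x) + g(x) * df(x)
naive_guess = df(x) * dg(x)  # the incorrect "product of the derivatives"

print(f"numerical derivative of f*g : {approx:.6f}")
print(f"Product Rule f*g' + g*f'    : {product_rule:.6f}")
print(f"product of derivatives      : {naive_guess:.6f}")
```

The first two numbers should agree closely, while the third is visibly different.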

📦 Geometric interpretation of the Product Rule

Imagine a rectangle with sides f(x) and g(x).
Change x by an infinitesimal dx, which changes f by df and g by dg.
The new rectangle has sides f(x + dx) and g(x + dx).
The change in area d(f·g) equals the sum of three shaded regions: f(x)·dg + g(x)·df + df·dg.
Since df and dg are infinitesimals, their product df·dg is (f′(x)·dx)·(g′(x)·dx) = f′(x)g′(x)·(dx)², which equals zero because (dx)² = 0.
Thus d(f·g) = f·dg + g·df, and dividing by dx gives the Product Rule.

➗ Quotient Rule

Quotient Rule: The derivative of (f/g) equals [g·(df/dx) − f·(dg/dx)] / g².

The proof starts by letting y = f/g, so f = g·y, then applies the Product Rule to get df/dx = g·(dy/dx) + (f/g)·(dg/dx), and solves for dy/dx.
Mnemonic device: Write f/g as HI/HO (HI = numerator, HO = denominator), then the rule is "ho-dee-hi minus hi-dee-ho over ho-ho."
Example: The derivative of tan x = (sin x)/(cos x) is [(cos x)·(cos x) − (sin x)·(−sin x)] / cos²x = (cos²x + sin²x) / cos²x = 1/cos²x = sec²x.
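
The Quotient Rule formula translates directly into a small function. This is a sketch with a hypothetical name, `quotient_rule`, that takes f, g and their derivatives as inputs; it is applied to tan x = (sin x)/(cos x) at the arbitrary sample point x = 0.5.

```python
import math


def quotient_rule(f, df, g, dg, x):
    """Value of [g*f' - f*g'] / g^2 at x; requires g(x) != 0."""
    return (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2


def neg_sin(x):
    return -math.sin(x)


x = 0.5
# f = sin, f' = cos, g = cos, g' = -sin
from_rule = quotient_rule(math.sin, math.cos, math.cos, neg_sin, x)
sec_squared = 1 / math.cos(x) ** 2

print(f"Quotient Rule value: {from_rule:.10f}")
print(f"sec^2(x)           : {sec_squared:.10f}")
```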

🔢 The Power Rule and polynomials

🎯 Power Rule statement

Power Rule: The derivative of x to the n equals n times x to the (n−1) for any integer n.

Memory aid: Bring the exponent down in front, then reduce the exponent by 1.
This works for negative exponents too.
Example: d/dx(x⁴) = 4x³; d/dx(x⁻¹⁰⁰) = −100x⁻¹⁰¹.

🧮 Proof by mathematical induction (nonnegative integers)

Principle of Mathematical Induction: A statement P(n) about integers n ≥ k is true for all n ≥ k if (1) P(k) is true, and (2) if P(n) is true for some n ≥ k then P(n+1) is true.

Step 1 (base case): Show P(0) is true.

Need to show d/dx(x⁰) = 0·x⁻¹ = 0.
Since x⁰ = 1 is a constant, its derivative is 0. ✓

Step 2 (inductive step): Assume P(n) is true (i.e., d/dx(xⁿ) = n·xⁿ⁻¹), and show P(n+1) is true.

Need to show d/dx(xⁿ⁺¹) = (n+1)·xⁿ.
Write xⁿ⁺¹ = x·xⁿ and apply the Product Rule: d/dx(x·xⁿ) = x·(d/dx(xⁿ)) + xⁿ·(d/dx(x)) = x·(n·xⁿ⁻¹) + xⁿ·1 = n·xⁿ + xⁿ = (n+1)·xⁿ. ✓

Thus by induction, the Power Rule holds for all integers n ≥ 0.

➖ Proof for negative integers

For negative integer n, write n = −m where m is positive.
Then d/dx(xⁿ) = d/dx(x⁻ᵐ) = d/dx(1/xᵐ).
Apply the Quotient Rule: [xᵐ·(d/dx(1)) − 1·(d/dx(xᵐ))] / (xᵐ)² = [xᵐ·0 − 1·(m·xᵐ⁻¹)] / x²ᵐ = −m·xᵐ⁻¹⁻²ᵐ = −m·x⁻ᵐ⁻¹ = n·xⁿ⁻¹. ✓

📊 Differentiating polynomials

A polynomial is a linear combination of powers of x: aₙxⁿ + aₙ₋₁xⁿ⁻¹ + ··· + a₂x² + a₁x + a₀.
Its derivative is: n·aₙxⁿ⁻¹ + (n−1)·aₙ₋₁xⁿ⁻² + ··· + 2·a₂x + a₁.
Example: If f(x) = x⁴ − 4x³ + 6x² − 4x + 1, then df/dx = 4x³ − 12x² + 12x − 4.
The constant term disappears because its derivative is zero.
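
Term-by-term differentiation of a polynomial is mechanical enough to code in one line. In this sketch a polynomial is stored as a list of coefficients from the constant term upward; that representation and the name `differentiate_polynomial` are assumptions made for illustration.

```python
def differentiate_polynomial(coeffs):
    """Differentiate a polynomial given as [a_0, a_1, ..., a_n] (lowest power first).

    Each term a_k x^k becomes k * a_k x^(k-1); the constant term drops out.
    """
    return [k * a for k, a in enumerate(coeffs)][1:]


# f(x) = x^4 - 4x^3 + 6x^2 - 4x + 1, listed from the constant term upward.
f_coeffs = [1, -4, 6, -4, 1]
print(differentiate_polynomial(f_coeffs))  # coefficients of -4 + 12x - 12x^2 + 4x^3
```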

🧬 Linearity of differentiation

🔗 Linear combination formula

Linear combination: A sum of the form c₁f₁ + c₂f₂ + ··· + cₙfₙ, where c₁, …, cₙ are constants and f₁, …, fₙ are functions.

The derivative of a linear combination is: d/dx(c₁f₁ + ··· + cₙfₙ) = c₁(df₁/dx) + ··· + cₙ(dfₙ/dx).
This follows by repeatedly applying the Sum Rule and Constant Multiple Rule.
Example: For three functions f₁, f₂, f₃, first apply the Sum Rule to (f₁ + f₂ + f₃) = f₁ + (f₂ + f₃), giving df₁/dx + d/dx(f₂ + f₃), then apply the Sum Rule again to get df₁/dx + df₂/dx + df₃/dx.

🎛️ Linear operator

Linear operator: An operation that satisfies the linearity property—it can be applied term by term to linear combinations.

The symbol d/dx is called an operator because it "operates" on differentiable functions by taking their derivatives; it is linear because it satisfies the linear combination formula above.
Linearity means differentiation respects addition and scalar multiplication.
This property makes differentiation much easier: you can break complicated expressions into simpler pieces, differentiate each piece, then recombine.

🌊 Trigonometric derivatives via the rules

📐 Derivatives of all six trig functions

The excerpt derives two and lists all six:

Function	Derivative
sin x	cos x
cos x	−sin x
tan x	sec²x
cot x	−csc²x
sec x	sec x tan x
csc x	−csc x cot x

🔍 Example: derivative of tan x

Write tan x = (sin x)/(cos x).
Apply the Quotient Rule: [(cos x)·(d/dx(sin x)) − (sin x)·(d/dx(cos x))] / cos²x.
Substitute derivatives: [(cos x)·(cos x) − (sin x)·(−sin x)] / cos²x = (cos²x + sin²x) / cos²x.
Use the Pythagorean identity cos²x + sin²x = 1: 1/cos²x = sec²x.

🔍 Example: derivative of sec x

Write sec x = 1/(cos x).
Apply the Quotient Rule: [(cos x)·(d/dx(1)) − 1·(d/dx(cos x))] / cos²x.
Substitute: [(cos x)·0 − 1·(−sin x)] / cos²x = (sin x) / cos²x.
Rewrite: (1/cos x)·(sin x/cos x) = sec x tan x.

📝 Note on cot x and csc x

The excerpt states that the derivatives of cot x and csc x can be found using the Quotient Rule and are left as exercises.
The table provides the results: d/dx(cot x) = −csc²x and d/dx(csc x) = −csc x cot x.
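
As a sanity check (not a proof), the six formulas in the table can be compared against a numerical derivative. The sketch below assumes a central difference quotient with a small step h and an arbitrary test point; the helper names are illustrative.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# name -> (function, derivative claimed in the table)
TRIG_TABLE = {
    "sin": (math.sin, math.cos),
    "cos": (math.cos, lambda x: -math.sin(x)),
    "tan": (math.tan, lambda x: 1 / math.cos(x) ** 2),                    # sec^2 x
    "cot": (lambda x: 1 / math.tan(x), lambda x: -1 / math.sin(x) ** 2),  # -csc^2 x
    "sec": (lambda x: 1 / math.cos(x), lambda x: math.tan(x) / math.cos(x)),        # sec x tan x
    "csc": (lambda x: 1 / math.sin(x), lambda x: -math.cos(x) / math.sin(x) ** 2),  # -csc x cot x
}

def table_errors(x):
    """Absolute gap between the numerical derivative and each table formula at x."""
    return {name: abs(central_difference(f, x) - df(x))
            for name, (f, df) in TRIG_TABLE.items()}

for name, err in table_errors(0.7).items():  # 0.7 is an arbitrary test point
    print(f"{name}: |numeric - formula| = {err:.2e}")
```

The gaps should be tiny compared with the derivative values themselves; they are not exactly zero because the difference quotient is only an approximation.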

The Chain Rule

1.5 The Chain Rule

🧭 Overview

🧠 One-sentence thesis

The Chain Rule enables differentiation of composite functions by multiplying the derivative of the outer function by the derivative of the inner function, which works because the infinitesimals cancel in the fraction multiplication.

📌 Key points (3–5)

What the Chain Rule does: differentiates compositions of functions by treating them as an outer function applied to an inner function.
The formula: if f is a differentiable function of u, and u is a differentiable function of x, then df/dx = (df/du) · (du/dx).
Why it works: the infinitesimals du cancel when multiplying the fractions, and intuitively, if f changes 4 times as fast as u, and u changes 3 times as fast as x, then f changes 12 times as fast as x.
Common confusion: you cannot simply replace x by another expression in a derivative formula (e.g., the derivative of sin 2x is not cos 2x); you must use the Chain Rule.
Key extension: the Chain Rule allows the Power Rule to extend to rational exponents: d/dx(x^r) = r · x^(r−1) for any rational number r.

🔗 Why composition requires a special rule

🚫 The naive approach fails

It is tempting to think that the derivative of sin 2x is simply cos 2x, since the derivative of sin x is cos x.
This is incorrect: the actual derivative of sin 2x is 2 cos 2x (proven using the double-angle formula, Constant Multiple Rule, and Product Rule).
You cannot simply replace x by 2x in the derivative formula for sin x.

🧩 Composition structure

Instead, regard sin 2x as a composition of two functions:
- The sine function: f(u) = sin u
- The inner function: u(x) = 2x
Since f is a function of u, and u is a function of x, then f is a function of x: f(x) = sin 2x.
This layered structure requires the Chain Rule.

🔗 The Chain Rule formula and proof

📐 Statement of the rule

Chain Rule: If f is a differentiable function of u, and u is a differentiable function of x, then f is a differentiable function of x, and its derivative with respect to x is: df/dx = (df/du) · (du/dx)

Alternative notation using composition: if g is a differentiable function of x, and f is a differentiable function on the range of g, then (f ∘ g)'(x) = f'(g(x)) · g'(x).

✂️ Why the infinitesimals cancel

Since f is differentiable with respect to u, and u is differentiable with respect to x, both df/du and du/dx exist.
Multiplying the derivatives gives (df/du) · (du/dx) = df/dx, because the infinitesimals du cancel; that cancellation is the entire proof.
The excerpt notes that some textbooks warn against thinking of du as an actual quantity that can be canceled, but states you can safely ignore those warnings because du is just an infinitesimal and hence can be canceled.

🧠 Intuitive understanding

If df/du = 4, then f is increasing 4 times as fast as u.
If du/dx = 3, then u is increasing 3 times as fast as x.
Overall, f should be increasing 12 = 4 · 3 times as fast as x, exactly as the Chain Rule says.

🛠️ How to apply the Chain Rule

📦 Outer and inner functions

Think of the function as the composition of an "outer" function f and an "inner" function u.
First take the derivative of the "outer" function, then multiply by the derivative of the "inner" function.
Think of the "inner" function as a box (denoted ✷) into which you can put any function of x, and the "outer" function being a function of that empty box.

🔢 Example: sin(x² + x + 1)

Make a substitution: u = x² + x + 1, so that f(x) = sin u.
By the Chain Rule: df/dx = (df/du) · (du/dx) = d/du(sin u) · d/dx(x² + x + 1) = (cos u) · (2x + 1).
Replace u by its definition: df/dx = (2x + 1) cos(x² + x + 1).
The final answer for the derivative should be in terms of x, not u.
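
A quick numerical comparison makes the difference between the Chain Rule answer and the naive shortcut visible. This is a sketch under the assumption that a central difference quotient is an adequate stand-in for the true derivative; the test point 0.5 is arbitrary.

```python
import math

def f(x):
    return math.sin(x ** 2 + x + 1)

def df_chain_rule(x):
    u = x ** 2 + x + 1                  # inner function
    return math.cos(u) * (2 * x + 1)    # (df/du) * (du/dx)

def df_naive(x):
    # Incorrect shortcut: differentiates the outer function only.
    return math.cos(x ** 2 + x + 1)

def central_difference(g, x, h=1e-6):
    return (g(x + h) - g(x - h)) / (2 * h)

x0 = 0.5
numeric = central_difference(f, x0)
print("numeric   :", numeric)
print("chain rule:", df_chain_rule(x0))
print("naive     :", df_naive(x0))
```

The numerical value should agree with the Chain Rule formula and visibly disagree with the naive one, since the missing factor 2x + 1 equals 2 at this point.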

🔢 Example: (2x⁴ − 3 cos x)¹⁰

The "outer" function is f(✷) = ✷¹⁰.
The "inner" function is ✷ = u = 2x⁴ − 3 cos x.
df/dx = (df/du) · (du/dx) = 10 · ✷⁹ · d/dx(✷) = 10 · (2x⁴ − 3 cos x)⁹ · (8x³ + 3 sin x).

📏 Extending the Power Rule

🔢 Rational exponents

Using the Chain Rule, the Power Rule can be extended to include exponents that are rational numbers:
- d/dx(x^r) = r · x^(r−1) for any rational number r.

🧮 Proof sketch

Let r = m/n, where m and n are integers with n ≠ 0.
Then y = x^r = x^(m/n) = (x^m)^(1/n), so y^n = x^m.
Taking the derivative with respect to x of both sides: d/dx(y^n) = d/dx(x^m).
Evaluating the left side by the Chain Rule: n · y^(n−1) · (dy/dx) = m · x^(m−1).
Solving for dy/dx and substituting y = x^(m/n), so that y^(n−1) = x^(m − m/n): dy/dx = (m · x^(m−1)) / (n · x^(m − m/n)) = (m/n) · x^((m/n)−1) = r · x^(r−1).

🔢 Example: √x

Since √x = x^(1/2), by the Power Rule: df/dx = (1/2) · x^((1/2)−1) = (1/2) · x^(−1/2) = 1 / (2√x).

🔢 Example: 2 / (3 · √x)

Rewrite as (2/3) · x^(−1/2).
df/dx = (2/3) · (−1/2) · x^(−3/2) = −1 / (3 · x^(3/2)).

🔗 Extensions of the Chain Rule

🔗 Three or more functions

The Chain Rule can be extended to 3 functions: if u is a differentiable function of x, v is a differentiable function of u, and f is a differentiable function of v, then df/dx = (df/dv) · (dv/du) · (du/dx).
The 3 derivatives are linked together as in a chain (hence the name of the rule).
The Chain Rule can be extended to any finite number of functions by the same technique.

Higher Order Derivatives

1.6 Higher Order Derivatives

🧭 Overview

🧠 One-sentence thesis

Higher order derivatives—obtained by repeatedly differentiating a function—represent rates of change of rates of change, with the second derivative famously capturing acceleration as the rate of change of velocity.

📌 Key points (3–5)

What higher order derivatives are: the result of differentiating a function multiple times (second derivative = derivative of the first derivative, third derivative = derivative of the second, etc.).
Multiple notations exist: prime notation (f′′, f′′′) becomes cumbersome for large n, so alternatives like f⁽ⁿ⁾(x), dⁿy/dxⁿ, and Dⁿf(x) are used.
Physical meaning of second derivative: in motion along a straight line, position s(t) → velocity v(t) = s′(t) → acceleration a(t) = s′′(t).
Common confusion: negative acceleration does not always mean "decelerating". In everyday usage, "accelerating" means the speed (the magnitude of velocity) is increasing and "decelerating" means the speed is decreasing; the sign of a(t) alone does not tell you which.
Polynomials differentiate to zero: the (n+1)-st derivative of any degree-n polynomial is always zero.

🔢 Definition and notation

🔢 What higher order derivatives are

Second derivative f′′(x): the derivative of the first derivative f′(x).
Third derivative f′′′(x): the derivative of the second derivative.
n-th derivative: obtained by differentiating f(x) a total of n times.

The first derivative f′(x) is itself a function, so if it is differentiable, you can take its derivative.
Continuing this process yields the fourth, fifth, and all subsequent derivatives.
All derivatives beyond the first are called higher order derivatives.

📝 Notation systems

The excerpt lists many equivalent notations for the second derivative of y = f(x):

Notation type	Examples
Prime notation	f′′(x), y′′
Superscript notation	f⁽²⁾(x), y⁽²⁾
Leibniz notation	d²y/dx², d²f/dx², d²/dx²(f(x))
Dot notation	ÿ, f̈(x)
Operator notation	D²f(x)

Why multiple notations? Prime notation becomes cumbersome for large n (e.g., writing 50 prime marks for the fiftieth derivative).
Parentheses matter: f⁽ⁿ⁾(x) means "take n derivatives," not "raise to the n-th power."
Leibniz notation advantage: makes the iterative nature clear—d²y/dx² = d/dx(dy/dx), d³y/dx³ = d/dx(d²y/dx²), etc.

🧮 General formula for n-th derivative

For the n-th derivative of y = f(x), all of these are equivalent:

f⁽ⁿ⁾(x)
dⁿy/dxⁿ
dⁿ/dxⁿ(f(x))
y⁽ⁿ⁾
dⁿf/dxⁿ
Dⁿf(x)

The relationship: dⁿy/dxⁿ = d/dx(dⁿ⁻¹y/dxⁿ⁻¹) = dⁿ⁻¹/dxⁿ⁻¹(dy/dx).

🧪 Example calculation

Example: For f(x) = 3x⁴, find f′′(x) and f′′′(x).

First derivative: f′(x) = 12x³
Second derivative: f′′(x) = derivative of 12x³ = 36x²
Third derivative: f′′′(x) = derivative of 36x² = 72x
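
The same repeated differentiation can be sketched in Python. The coefficient-list representation (lowest power first) and the name `nth_derivative` are assumptions made for this illustration.

```python
def nth_derivative(coeffs, n):
    """Differentiate the polynomial [a0, a1, ..., ak] (lowest power first) n times."""
    for _ in range(n):
        # One pass of the Power Rule over every term; the constant term drops out.
        coeffs = [k * a for k, a in enumerate(coeffs)][1:]
    return coeffs

f = [0, 0, 0, 0, 3]            # 3x^4
print(nth_derivative(f, 1))    # [0, 0, 0, 12]  -> 12x^3
print(nth_derivative(f, 2))    # [0, 0, 36]     -> 36x^2
print(nth_derivative(f, 3))    # [0, 72]        -> 72x
```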

🚗 Physical interpretation: motion in a straight line

🚗 Position, velocity, and acceleration

The excerpt emphasizes motion along a straight line as the "most famous example" of higher order derivatives.

Position s(t): location of an object at time t along a line.
Velocity v(t): instantaneous rate of change of position = s′(t) = ds/dt.
Acceleration a(t): instantaneous rate of change of velocity = v′(t) = dv/dt = s′′(t) = d²s/dt².

One direction is positive, the other negative (e.g., up/down or forward/backward).
Velocity is the first derivative of position.
Acceleration is the second derivative of position (or the first derivative of velocity).
Summary: a(t) = d/dt(ds/dt) = d²s/dt² = s′′(t) = s̈(t).

⚾ Worked example: ball thrown upward

Example: A ball is thrown straight up with initial velocity 34 m/s from 2 m off the ground. Position is s(t) = -4.9t² + 34t + 2 (t in seconds, s in meters). Find velocity and acceleration.

Velocity: v(t) = ds/dt = -9.8t + 34 m/s
Acceleration: a(t) = d²s/dt² = d/dt(-9.8t + 34) = -9.8 m/s²
The acceleration is the constant gravitational acceleration on Earth.
At t = 0, v(0) = 34 m/s, confirming the initial velocity.

🔽 Why acceleration is negative throughout

The ball's acceleration is constant at -9.8 m/s² (negative) both while moving upward and while falling.

While moving upward:

Initial velocity is +34 m/s (upward).
Velocity decreases to 0 m/s at maximum height (t = 34/9.8 = 3.47 seconds).
Velocity is decreasing → rate of change of velocity (acceleration) is negative.

While moving downward:

Velocity goes from 0 m/s to negative values (e.g., v(4) = -5.2 m/s).
Negative velocity indicates downward motion.
Velocity continues to decrease (becomes more negative) → acceleration remains negative.
The ball hits the ground at t ≈ 7.00 seconds (the positive solution of s(t) = 0) with velocity v ≈ -34.57 m/s.
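
The key times and velocities in this example can be recomputed directly from s(t) and v(t). The sketch below only uses the formulas already given; the impact time is taken as the positive root of s(t) = 0 from the quadratic formula.

```python
import math

def s(t):
    """Height in meters at time t (seconds)."""
    return -4.9 * t ** 2 + 34 * t + 2

def v(t):
    """Velocity ds/dt in m/s."""
    return -9.8 * t + 34

ACCELERATION = -9.8  # d^2 s / dt^2, constant, in m/s^2

# Maximum height: v(t) = 0
t_peak = 34 / 9.8

# Impact: positive root of 4.9 t^2 - 34 t - 2 = 0
t_impact = (34 + math.sqrt(34 ** 2 + 4 * 4.9 * 2)) / (2 * 4.9)

print(f"time of maximum height: {t_peak:.2f} s")
print(f"v(4) = {v(4):.1f} m/s")
print(f"impact time: {t_impact:.2f} s, impact velocity: {v(t_impact):.2f} m/s")
```

The velocity changes sign at the peak while the acceleration stays at -9.8 m/s² throughout, which is the point of the discussion above.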

⚠️ Common confusion: acceleration vs. "accelerating"

Don't confuse mathematical sign with everyday language:

Acceleration (mathematical): the derivative a(t) = s′′(t); can be positive or negative.
"Accelerating" (common usage): the magnitude (absolute value) of velocity is increasing.
Deceleration: the magnitude of velocity (speed) is decreasing.

Example: As the ball falls, acceleration is negative (-9.8 m/s²), but people commonly say the ball is "accelerating" because its speed (magnitude of velocity) is increasing.

Speed: the magnitude (absolute value) of velocity.

🎢 Beyond the second derivative

🎢 Third and higher derivatives in physics

The excerpt notes that third, fourth, and higher derivatives also have physical meanings:

Derivative order	Name	Meaning
1st	Velocity	Rate of change of position
2nd	Acceleration	Rate of change of velocity
3rd	Jerk	Rate of change of acceleration
4th	Snap	Rate of change of jerk
5th	Crackle	Rate of change of snap
6th	Pop	Rate of change of crackle

Jerk is used in vehicle dynamics (e.g., minimizing jerk for smoother braking).
Snap has applications in flight dynamics (e.g., optimizing drone flight paths).
The names "snap," "crackle," and "pop" are inspired by a breakfast cereal.

🔄 Special cases: zero-th and fractional derivatives

Zero-th derivative: f⁽⁰⁾(x) is defined to be f(x) itself.
Fractional derivatives: e.g., the "one-half derivative" f⁽¹/²⁾(x) exists and will be discussed in Chapter 6 (not covered in this excerpt).

➕ Combining derivative orders

An immediate consequence of the definition:

d^(m+n)/dx^(m+n)(f(x)) = d^m/dx^m(d^n/dx^n(f(x))) for all integers m ≥ 0 and n ≥ 0.
In words: taking m+n derivatives is the same as taking n derivatives, then m more derivatives.

🧮 Polynomials and factorials

🧮 Factorial notation

Factorial n!: the product of all integers from 1 to n.

Examples:

1! = 1
2! = 1·2 = 2
3! = 1·2·3 = 6
4! = 1·2·3·4 = 24
By convention, 0! = 1.

📐 n-th derivative of x^n

The excerpt states (provable by induction):

dⁿ/dxⁿ(xⁿ) = n! for all integers n ≥ 0.

Example: The third derivative of x³ is 3! = 6.

Consequence: The (n+1)-st derivative of xⁿ is zero:

d^(n+1)/dx^(n+1)(xⁿ) = d/dx(dⁿ/dxⁿ(xⁿ)) = d/dx(n!) = 0, because n! is a constant.
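
Both facts can be checked mechanically by tracking the coefficient and exponent of a single power of x as it is differentiated. This is a small sketch; `power_derivative` is a hypothetical helper name.

```python
import math

def power_derivative(n, k):
    """Return (coefficient, exponent) of the k-th derivative of x**n.

    A coefficient of 0 means the derivative is the zero function.
    """
    coeff, exponent = 1, n
    for _ in range(k):
        coeff *= exponent   # Power Rule: d/dx(c * x**e) = c * e * x**(e - 1)
        exponent -= 1
    return coeff, exponent

print(power_derivative(3, 3))      # (6, 0): the constant 3! = 6
print(power_derivative(3, 4)[0])   # 0: one more derivative gives zero
```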

🔚 Polynomials differentiate to zero

Key fact: The (n+1)-st derivative of a degree-n polynomial is 0.

For any polynomial p(x) = a₀ + a₁x + a₂x² + ··· + aₙxⁿ of degree n:

d^(n+1)/dx^(n+1)(p(x)) = 0.

Why?

Polynomials are linear combinations of nonnegative powers of x.
The Sum Rule and Constant Multiple Rule hold for higher-order derivatives.
Each term aₖxᵏ (where k ≤ n) has its (n+1)-st derivative equal to zero.

Common statement: "Any polynomial can be differentiated to 0" by taking enough derivatives.

Example: The polynomial p(x) = 100x¹⁰⁰ + 50x⁹⁹ has degree 100, so differentiating 101 times (or more) yields 0.

Inverse Functions

2.1 Inverse Functions

🧭 Overview

🧠 One-sentence thesis

If a differentiable function has an inverse, that inverse is also differentiable (provided the original derivative is nonzero), and its derivative is the reciprocal of the original function's derivative.

📌 Key points (3–5)

What makes a function invertible: a function must be one-to-one (each output corresponds to exactly one input) to have an inverse.
How to test one-to-one: use the horizontal rule—every horizontal line intersects the graph at most once.
The derivative formula: the derivative of the inverse is dx/dy = 1/(dy/dx), provided dy/dx ≠ 0.
Common confusion: the two sides of dx/dy = 1/(dy/dx) are in different variables (y vs. x), so you must substitute to express everything in the same variable.
When the inverse is not differentiable: even if the inverse exists at a point where dy/dx = 0, it is not differentiable there because the derivative would be 1/0 (undefined).

🔍 Testing for functions and invertibility

🔍 What is a function

A function is a rule that assigns a single object y from one set (the range) to each object x from another set (the domain).

Written as y = f(x), where f is the function.
Vertical rule: f is a function if and only if every vertical line intersects the graph of y = f(x) at most once.
This ensures each x maps to only one y.

🔄 What is a one-to-one function

A function f is one-to-one (1-1) if it assigns distinct values of y to distinct values of x.

Formally: if x₁ ≠ x₂ then f(x₁) ≠ f(x₂).
Equivalently: if f(x₁) = f(x₂) then x₁ = x₂.
Horizontal rule: f is one-to-one if and only if every horizontal line intersects the graph of y = f(x) at most once.
Only one-to-one functions have inverses.

🧩 When a function is one-to-one

Intuitively, a function is one-to-one when it is either strictly increasing or strictly decreasing.
If a function increases and then decreases (has a "turning point"), the horizontal rule is violated.
For differentiable functions:
- Positive derivative → function is increasing.
- Negative derivative → function is decreasing.
Important exception: a function can still be one-to-one even if its derivative is zero at isolated points (not over an entire interval), as long as it is either positive or negative everywhere else.
Example: f(x) = x³ has derivative f'(x) = 3x², which is zero only at x = 0 and positive everywhere else; f is one-to-one over all real numbers.

Don't confuse: having a derivative that is always positive or always negative is sufficient for one-to-one but not necessary; however, having a nonzero derivative is necessary for the inverse to be differentiable.

🔄 Inverse functions

🔄 Definition and properties

If a function f is one-to-one on its domain, then f has an inverse function, denoted f⁻¹, such that y = f(x) if and only if f⁻¹(y) = x.

The domain of f⁻¹ is the range of f.
Key idea: f⁻¹ "undoes" what f does, and vice versa.
f⁻¹(f(x)) = x for all x in the domain of f.
f(f⁻¹(y)) = y for all y in the range of f.

📝 Notation conventions

Functions are often expressed in terms of x, so the inverse is also written in terms of x.
Example: the inverse of f(x) = x³ can be written as f⁻¹(x) = ³√x (not f⁻¹(y) = ³√y).
To handle this, switch the roles of x and y: rewrite y = f(x) as x = f(y), then write y = f⁻¹(x).

📐 Derivative of an inverse function

📐 The reciprocal formula

Derivative of an Inverse Function: If y = f(x) is differentiable and has an inverse function x = f⁻¹(y), then f⁻¹ is differentiable and its derivative is dx/dy = 1/(dy/dx) if dy/dx ≠ 0.

This follows from treating infinitesimals dy and dx like numbers: a/b = 1/(b/a) for nonzero a and b.
The inverse may still exist at a point where dy/dx = 0, but it is not differentiable there because the derivative would be 1/0 (undefined).

🔀 Handling the variable mismatch

Since y is a function of x, dy/dx is in terms of x, so 1/(dy/dx) is also in terms of x.
But dx/dy should normally be in terms of y (since x is a function of y).
Solution: use the formula y = f(x) to solve for x in terms of y, then substitute into dy/dx so that dx/dy = 1/(dy/dx) is in terms of y.
This may not always be possible (e.g., solving for x in y = x sin x).

Example: For f(x) = x³, we have y = x³, so x = f⁻¹(y) = ³√y. The derivative is dx/dy = 1/(dy/dx) = 1/(3x²). Putting it in terms of y: dx/dy = 1/(3(³√y)²) = 1/(3y^(2/3)).

🔁 Alternative approach (switching x and y)

Rewrite y = f(x) = x³ as x = f(y) = y³.
The inverse function is y = f⁻¹(x) = ³√x.
The derivative is dy/dx = 1/(dx/dy) = 1/(3y²). Putting it in terms of x: dy/dx = 1/(3(³√x)²) = 1/(3x^(2/3)).
This agrees with differentiating y = ³√x directly.

🧮 Prime notation formulas

Starting from f⁻¹(f(x)) = x and differentiating both sides using the Chain Rule:

(f⁻¹)'(f(x)) · f'(x) = 1
Therefore: (f⁻¹)'(f(x)) = 1/f'(x) if f'(x) ≠ 0

Two equivalent ways to write this:

Form	Condition
(f⁻¹)'(c) = 1/f'(a)	where c = f(a) and f'(a) ≠ 0
(f⁻¹)'(x) = 1/f'(f⁻¹(x))	if f'(f⁻¹(x)) ≠ 0
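
Both forms can be checked numerically on the running example f(x) = x³. The sketch below assumes a positive test point (so the cube root can be computed with a fractional power) and uses a central difference quotient as the reference value.

```python
def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2

def f_inverse(y):
    # Real cube root, valid for y > 0; the test point below keeps y positive.
    return y ** (1 / 3)

def central_difference(g, y, h=1e-6):
    return (g(y + h) - g(y - h)) / (2 * h)

a = 2.0
c = f(a)                                        # c = f(a) = 8
first_form = 1 / f_prime(a)                     # (f^-1)'(c) = 1 / f'(a)
second_form = 1 / f_prime(f_inverse(c))         # (f^-1)'(x) = 1 / f'(f^-1(x)) at x = c
numeric = central_difference(f_inverse, c)      # direct numerical derivative of f^-1

print(first_form, second_form, numeric)         # all close to 1/12
```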

Trigonometric Functions and Their Inverses

2.2 Trigonometric Functions and Their Inverses

🧭 Overview

🧠 One-sentence thesis

Trigonometric functions become one-to-one and invertible when restricted to specific domains, and their inverse functions have well-defined derivatives that follow from the chain rule and implicit differentiation.

📌 Key points (3–5)

Periodicity prevents one-to-one behavior: sin x, cos x, csc x, and sec x repeat every 2π radians; tan x and cot x repeat every π radians, so they are not one-to-one over their entire domains.
Domain restriction creates inverses: each trigonometric function becomes one-to-one when restricted to a smaller interval, allowing inverse trigonometric functions to be defined.
Inverse functions have specific domains and ranges: for example, sin⁻¹ x is defined only for x in [-1, 1] and outputs angles in [-π/2, π/2].
Common confusion—notation: sin⁻¹ x (or arcsin x) means the inverse function, not 1/sin x; the inverse of sine is not the same as cosecant.
Derivatives of inverse trig functions: derived using implicit differentiation and the chain rule, they involve square roots and absolute values depending on the domain.

📊 The six trigonometric functions and their periods

📊 Periods of trig functions

The excerpt lists six trigonometric functions with two different periods:

Function	Period
sin x	2π
cos x	2π
csc x	2π
sec x	2π
tan x	π
cot x	π

Period means the function repeats the same values after that interval.
Example: sin(x + 2π) = sin x for all x; tan(x + π) = tan x for all x.

📐 Derivatives of the six trig functions

The excerpt recalls the derivatives from an earlier section:

d/dx (sin x) = cos x
d/dx (cos x) = -sin x
d/dx (tan x) = sec² x
d/dx (csc x) = -csc x cot x
d/dx (sec x) = sec x tan x
d/dx (cot x) = -csc² x

These formulas are used later to derive the derivatives of inverse trigonometric functions.

🔄 Restricting domains to create inverse functions

🔄 Why restriction is necessary

The six trigonometric functions are not one-to-one over their entire domains.

Because they are periodic, they repeat values infinitely many times.
A function must be one-to-one (each output corresponds to exactly one input) to have an inverse.
Solution: restrict each function to a smaller domain where it is one-to-one.

🔄 Standard restricted domains

The excerpt specifies the intervals on which each function is one-to-one:

Function	Restricted domain
sin x	[-π/2, π/2]
cos x	[0, π]
tan x	(-π/2, π/2)
csc x	[-π/2, 0) ∪ (0, π/2]
sec x	[0, π/2) ∪ (π/2, π]
cot x	(0, π)

Example: y = sin x is one-to-one when x is restricted to [-π/2, π/2], as shown in Figure 2.2.2.
On this interval, sin x takes every value from -1 to 1 exactly once.

🔁 Inverse trigonometric functions

🔁 Definitions and notation

The inverse trigonometric functions sin⁻¹ x, cos⁻¹ x, tan⁻¹ x, csc⁻¹ x, sec⁻¹ x, and cot⁻¹ x are defined.

Notation: sin⁻¹ x is also written as arcsin x; similarly for the others.
Don't confuse: sin⁻¹ x does NOT mean 1/sin x (which is csc x). The "-1" denotes the inverse function, not a reciprocal.

🔁 Domains and ranges of inverse trig functions

The excerpt provides a table:

Function	Domain	Range
sin⁻¹ x	[-1, 1]	[-π/2, π/2]
cos⁻¹ x	[-1, 1]	[0, π]
tan⁻¹ x	(-∞, ∞)	(-π/2, π/2)
csc⁻¹ x	|x| ≥ 1	[-π/2, 0) ∪ (0, π/2]
sec⁻¹ x	|x| ≥ 1	[0, π/2) ∪ (π/2, π]
cot⁻¹ x	(-∞, ∞)	(0, π)

The domain of the inverse function is the range of the original restricted function.
The range of the inverse function is the domain of the original restricted function.
Example: sin⁻¹ x takes inputs from -1 to 1 and outputs angles from -π/2 to π/2.

🔁 Graphs of inverse trig functions

The excerpt includes graphs (Figures 2.2.3 and 2.2.4) showing:

sin⁻¹ x, cos⁻¹ x, tan⁻¹ x in one set.
csc⁻¹ x, sec⁻¹ x, cot⁻¹ x in another set.
These graphs are reflections of the restricted trig function graphs across the line y = x.

🧮 Derivatives of inverse trigonometric functions

🧮 The six derivative formulas

The excerpt lists:

d/dx (sin⁻¹ x) = 1 / √(1 - x²) for |x| < 1
d/dx (cos⁻¹ x) = -1 / √(1 - x²) for |x| < 1
d/dx (tan⁻¹ x) = 1 / (1 + x²)
d/dx (csc⁻¹ x) = -1 / (|x| √(x² - 1)) for |x| > 1
d/dx (sec⁻¹ x) = 1 / (|x| √(x² - 1)) for |x| > 1
d/dx (cot⁻¹ x) = -1 / (1 + x²)

🧮 Derivation example: cos⁻¹ x

The excerpt walks through the proof for d/dx (cos⁻¹ x):

Let y = cos⁻¹ x, so y is an angle between 0 and π radians, defined for -1 ≤ x ≤ 1.
By definition, cos y = x.
Differentiate both sides with respect to y: dx/dy = -sin y.
Use the identity sin² y = 1 - cos² y = 1 - x², so sin y = ±√(1 - x²).
Since 0 ≤ y ≤ π, sin y must be nonnegative, so sin y = √(1 - x²).
By the inverse function rule, dy/dx = 1 / (dx/dy) = 1 / (-sin y) = -1 / √(1 - x²).

Why the sign matters: The range of cos⁻¹ x ensures sin y ≥ 0, which determines the sign in the derivative.

🧮 Derivation example: sec⁻¹ x

The excerpt also derives d/dx (sec⁻¹ x):

Let y = sec⁻¹ x, defined for |x| ≥ 1.
For x ≥ 1, 0 ≤ y < π/2; for x ≤ -1, π/2 < y ≤ π.
Recall sec y = x and 1 + tan² y = sec² y, so tan² y = sec² y - 1 = x² - 1, hence tan y = ±√(x² - 1).
In both ranges, sec y and tan y have the same sign, so sec y tan y is nonnegative: sec y tan y = |sec y tan y|.
Differentiate sec y = x: dx/dy = sec y tan y.
By the inverse function rule, dy/dx = 1 / (sec y tan y) = 1 / |sec y tan y| = 1 / (|x| √(x² - 1)).

Why absolute value: The product sec y tan y is always nonnegative on the range of sec⁻¹ x, so the absolute value ensures the formula works for both positive and negative x.

🧮 Proofs for other inverses

The excerpt states:

The proofs of the derivative formulas for the remaining inverse trigonometric functions are similar, and are left as exercises.

The same technique (implicit differentiation + inverse function rule + trigonometric identities) applies to sin⁻¹ x, tan⁻¹ x, csc⁻¹ x, and cot⁻¹ x.
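
The six formulas, including the absolute values, can be spot-checked numerically. Python's math module provides only asin, acos, and atan, so this sketch assumes the identities csc⁻¹ x = sin⁻¹(1/x), sec⁻¹ x = cos⁻¹(1/x), and cot⁻¹ x = π/2 − tan⁻¹ x, which produce values in the ranges used in this section.

```python
import math

def central_difference(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

def acsc(x):
    return math.asin(1 / x)            # defined for |x| >= 1

def asec(x):
    return math.acos(1 / x)            # defined for |x| >= 1

def acot(x):
    return math.pi / 2 - math.atan(x)  # values in (0, pi)

# (name, function, claimed derivative, test points inside the open domain)
CHECKS = [
    ("arcsin", math.asin, lambda x: 1 / math.sqrt(1 - x * x), [-0.5, 0.3]),
    ("arccos", math.acos, lambda x: -1 / math.sqrt(1 - x * x), [-0.5, 0.3]),
    ("arctan", math.atan, lambda x: 1 / (1 + x * x), [-2.0, 0.3]),
    ("arccsc", acsc, lambda x: -1 / (abs(x) * math.sqrt(x * x - 1)), [-2.0, 3.0]),
    ("arcsec", asec, lambda x: 1 / (abs(x) * math.sqrt(x * x - 1)), [-2.0, 3.0]),
    ("arccot", acot, lambda x: -1 / (1 + x * x), [-2.0, 0.3]),
]

def worst_error():
    """Largest gap between numerical derivative and formula over all test points."""
    return max(abs(central_difference(f, x) - df(x))
               for _, f, df, xs in CHECKS for x in xs)

print(f"largest discrepancy: {worst_error():.2e}")
```

Testing sec⁻¹ and csc⁻¹ at a negative point such as x = -2 is what exercises the |x| in those two formulas.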

🧪 Examples using the chain rule

🧪 Example: derivative of 3 tan(π - 2x)

The excerpt provides Example 2.3:

Let u = π - 2x, so y = 3 tan u.
By the chain rule, dy/dx = (dy/du) · (du/dx).
dy/du = 3 sec² u and du/dx = -2.
Therefore, dy/dx = (3 sec² u)(-2) = -6 sec²(π - 2x).

🧪 Example: derivative of sin⁻¹(x/4)

The excerpt provides Example 2.4:

Let u = x/4, so y = sin⁻¹ u.
By the chain rule, dy/dx = (dy/du) · (du/dx).
dy/du = 1 / √(1 - u²) and du/dx = 1/4.
Therefore, dy/dx = (1 / √(1 - u²)) · (1/4) = 1 / (4√(1 - (x²/16))) = 1 / √(16 - x²).

Key point: The chain rule applies to inverse trig functions just as it does to regular trig functions; substitute the inner function and multiply by its derivative.

The Exponential and Natural Logarithm Functions

2.3 The Exponential and Natural Logarithm Functions

🧭 Overview

🧠 One-sentence thesis

The exponential function e^x and its inverse, the natural logarithm ln x, are uniquely important because they model quantities whose instantaneous rate of change is directly proportional to the amount present.

📌 Key points (3–5)

Defining e^x for irrational exponents: exponential functions a^x extend to irrational exponents by taking limits of rational approximations.
Why e is special: the function y = A·e^(kt) satisfies dy/dt = k·y, meaning the rate of change is proportional to the current amount—a pattern that appears throughout physics and biology.
Derivative formulas: d/dx(e^x) = e^x and d/dx(ln x) = 1/x; these simple forms make e and ln the most convenient base and logarithm.
Common confusion: ln x is defined only for x > 0, but d/dx(ln|x|) = 1/x works for all x ≠ 0.
Logarithmic differentiation: taking ln of both sides before differentiating simplifies products, quotients, and variable exponents.

📐 Defining exponential functions for all real exponents

📐 Rational exponents

For rational x = m/n (m and n integers, n ≠ 0), a^x is already defined from algebra.
Example: 3^(14/10) = 3^1.4 is well-defined.

🔢 Irrational exponents via limits

For irrational x (e.g., √2), define a^x as the limit of a^(m/n) as the rational numbers m/n approach x.

√2 = 1.414213562... can be approximated by 1.4, 1.41, 1.414, 1.4142, etc.
Then 3^√2 is the number that 3^1.4, 3^1.41, 3^1.414, ... approach.
The excerpt shows: 3^1.4 ≈ 4.6555, 3^1.41 ≈ 4.7070, ..., 3^√2 ≈ 4.7288.
In practice, calculators use efficient algorithms; you never compute this by hand.
All usual exponent rules from algebra (a^x · a^y = a^(x+y), etc.) apply once a^x is defined this way for a > 0 and all real x.
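
The limiting process for 3^√2 can be imitated numerically. This sketch assumes the rational approximations are the decimal truncations 1.4, 1.41, 1.414, … and relies on floating-point powers, so it illustrates the definition rather than implementing it exactly; the helper names are illustrative.

```python
import math
from fractions import Fraction

def decimal_truncations(x, max_digits):
    """Rationals obtained by cutting positive x off after 1, 2, ..., max_digits decimals."""
    return [Fraction(int(x * 10 ** d), 10 ** d) for d in range(1, max_digits + 1)]

def powers_of(base, exponents):
    return [base ** float(r) for r in exponents]

exponents = decimal_truncations(math.sqrt(2), 6)   # 14/10, 141/100, 1414/1000, ...
values = powers_of(3, exponents)

for r, value in zip(exponents, values):
    print(f"3^{float(r)} = {value:.4f}")
print(f"3^sqrt(2) = {3 ** math.sqrt(2):.4f}")
```

The printed values increase toward 3^√2 as more decimal places of √2 are used.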

🌟 The number e and the exponential function

🌟 Definition of e

e = lim (as x → ∞) of (1 + 1/x)^x

e ≈ 2.71828182845905... (the Euler number).
As x becomes larger, (1 + 1/x)^x approaches e.
Example: when x = 5 × 10^6, the value is 2.718281555200129.

🔑 A useful limit

For extremely large x (x ≫ 1), e ≈ (1 + 1/x)^x.
Raising both sides to the power 1/x gives e^(1/x) ≈ 1 + 1/x.
Rearranging: (e^(1/x) − 1)·x ≈ 1.
Letting h = 1/x (so h → 0 as x → ∞) yields:

lim (as h → 0) of (e^h − 1)/h = 1

This limit is central to finding the derivative of e^x.
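
Both limits can be observed numerically. The sketch below simply evaluates the two expressions in floating point; for extremely small h or extremely large x, rounding error eventually dominates, so the sample values are kept moderate.

```python
import math

def compound(x):
    """(1 + 1/x)^x, which approaches e as x grows."""
    return (1 + 1 / x) ** x

def difference_quotient(h):
    """(e^h - 1)/h, which approaches 1 as h shrinks to 0."""
    return (math.exp(h) - 1) / h

for x in (10, 1_000, 100_000, 5_000_000):
    print(f"x = {x:>9}: (1 + 1/x)^x = {compound(x):.9f}")
print(f"e = {math.e:.9f}")

for h in (1e-1, 1e-3, 1e-5):
    print(f"h = {h:g}: (e^h - 1)/h = {difference_quotient(h):.9f}")
```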

🧮 Derivative of e^x

d/dx(e^x) = e^x

Proof sketch from the excerpt:

Start with the limit definition: d/dx(e^x) = lim (as h → 0) of (e^(x+h) − e^x)/h.
Factor: = lim (as h → 0) of e^x·(e^h − 1)/h.
Since e^x does not depend on h, pull it out: = e^x · lim (as h → 0) of (e^h − 1)/h.
By the limit above, this equals e^x · 1 = e^x.

Chain Rule extension:

For a differentiable function u = u(x), d/dx(e^u) = e^u · du/dx.
Example: y = 4·e^(−x²) ⇒ dy/dx = 4·e^(−x²)·(−2x) = −8x·e^(−x²).

🎯 Why e is special

Consider y = A·e^(kt) representing the amount of a quantity at time t (A and k are constants).
Then dy/dt = d/dt(A·e^(kt)) = k·A·e^(kt) = k·y.
Interpretation: the instantaneous rate of change is directly proportional to the amount present.
Many physical quantities (radioactive decay, bacterial growth, electric current) exhibit this behavior.
Conversely, any solution to dy/dt = k·y must be of the form y = A·e^(kt) (proven in Chapter 5).

📊 Graph and properties of e^x

e^x > 0 for all x.
The derivative (e^x)' = e^x > 0 for all x, so e^x is strictly increasing.
Domain: all real numbers; range: all positive real numbers.

🪵 The natural logarithm function

🪵 Definition and relationship to e^x

The natural logarithm ln x is the inverse function of e^x.

Domain of ln x = all x > 0 = range of e^x.
Range of ln x = all real numbers = domain of e^x.
Key equivalence: y = e^x if and only if x = ln y.
Inverse identities:
- e^(ln x) = x for all x > 0.
- ln(e^x) = x for all x.

Notation warning:

Many fields outside mathematics use log x instead of ln x for the natural logarithm.
This text uses ln x for compatibility with other mathematics texts.

📈 Graph of ln x

The graph is the reflection of e^x across the line y = x.
Domain: x > 0; range: all real numbers.

🧮 Derivative of ln x

d/dx(ln x) = 1/x

Derivation from the excerpt:

Start with x = e^y (since y = ln x).
Differentiate implicitly: dy/dx = 1/(dx/dy) = 1/(d/dy(e^y)) = 1/e^y = 1/x.

Chain Rule extension:

For u = u(x), d/dx(ln u) = (1/u)·(du/dx) = u'/u.
Example: y = ln(x² + 3x − 1) ⇒ dy/dx = (2x + 3)/(x² + 3x − 1).

🔍 Derivative of ln|x|

Recall |x| = −x for x < 0.
For x < 0, ln(−x) is defined and d/dx(ln(−x)) = (1/(−x))·(−1) = 1/x.
For x > 0, d/dx(ln x) = 1/x.
Combined result:

d/dx(ln|x|) = 1/x for all x ≠ 0

Don't confuse: ln x is only defined for x > 0, but ln|x| extends the domain to all x ≠ 0.

🧩 Properties of ln and e

The excerpt lists equivalent logarithm and exponential properties:

Logarithm property	Exponential equivalent
ln(a·b) = ln a + ln b	e^a · e^b = e^(a+b)
ln(a/b) = ln a − ln b	e^a / e^b = e^(a−b)
ln(a^b) = b·ln a	(e^a)^b = e^(ab)
ln 1 = 0	e^0 = 1

Numerical note:

When computing ln(a/b) on calculators, prefer ln(a/b) over ln a − ln b.
The subtraction ln a − ln b suffers from "subtractive cancellation" if a and b are nearly equal, potentially giving an incorrect answer of 0.

🧪 Logarithmic differentiation

🧪 The technique

Logarithmic differentiation: take the natural logarithm of both sides of y = f(x), simplify using ln properties, then differentiate both sides and solve for y'.

Useful when the function involves products, quotients, or variable exponents.
After taking ln, use properties like ln(a·b) = ln a + ln b to break apart complicated expressions.

🔢 Example: variable exponent

Problem: Find the derivative of y = x^x (assume x > 0).

Solution:

Cannot use the Power Rule (exponent is not constant) or the exponential rule d/dx(a^x) = (ln a)·a^x (base is not constant).
Take ln of both sides: ln y = ln(x^x) = x·ln x.
Differentiate: d/dx(ln y) = d/dx(x·ln x).
Left side: (y'/y).
Right side: 1·ln x + x·(1/x) = ln x + 1.
So y'/y = ln x + 1 ⇒ y' = y·(ln x + 1) = x^x·(ln x + 1).
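
The result y' = x^x·(ln x + 1) can be compared with a numerical derivative. A minimal sketch, assuming x > 0 and a central difference quotient as the reference:

```python
import math

def y(x):
    return x ** x                      # only considered for x > 0

def y_prime(x):
    return x ** x * (math.log(x) + 1)  # result of logarithmic differentiation

def central_difference(g, x, h=1e-6):
    return (g(x + h) - g(x - h)) / (2 * h)

for x0 in (0.5, 2.0):
    print(x0, y_prime(x0), central_difference(y, x0))
```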

🧩 Example: complicated product and quotient

Problem: Find the derivative of y = [(2x + 1)^7·(3x³ − 7x + 6)^4] / [(1 + sin x)^5].

Solution:

Take ln: ln y = ln[(2x + 1)^7·(3x³ − 7x + 6)^4] − ln[(1 + sin x)^5].
Expand: = ln[(2x + 1)^7] + ln[(3x³ − 7x + 6)^4] − ln[(1 + sin x)^5].
Use ln(a^b) = b·ln a: = 7·ln(2x + 1) + 4·ln(3x³ − 7x + 6) − 5·ln(1 + sin x).
Differentiate: y'/y = 7·(2/(2x + 1)) + 4·((9x² − 7)/(3x³ − 7x + 6)) − 5·(cos x/(1 + sin x)).
Simplify: y'/y = 14/(2x + 1) + (36x² − 28)/(3x³ − 7x + 6) − (5·cos x)/(1 + sin x).
Multiply by y: y' = y·[14/(2x + 1) + (36x² − 28)/(3x³ − 7x + 6) − (5·cos x)/(1 + sin x)].
Substitute y back in for the final answer.
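
As a rough check of the steps above, the following sketch evaluates y·[bracketed sum] at one point and compares it with a finite-difference slope. The test point x0 = 1 and the step h are assumptions made for illustration; they are not from the source.

```python
import math

def y(x):
    """The original quotient y = (2x+1)^7 (3x^3-7x+6)^4 / (1+sin x)^5."""
    return (2*x + 1)**7 * (3*x**3 - 7*x + 6)**4 / (1 + math.sin(x))**5

def dy_formula(x):
    """y' = y * [14/(2x+1) + (36x^2-28)/(3x^3-7x+6) - 5cos x/(1+sin x)]."""
    bracket = (14 / (2*x + 1)
               + (36*x**2 - 28) / (3*x**3 - 7*x + 6)
               - 5 * math.cos(x) / (1 + math.sin(x)))
    return y(x) * bracket

x0 = 1.0    # hypothetical test point where every factor is positive
h = 1e-6    # hypothetical step size
numeric = (y(x0 + h) - y(x0 - h)) / (2 * h)
relative_error = abs(numeric - dy_formula(x0)) / abs(dy_formula(x0))
print(relative_error)
```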

☢️ Applications: exponential decay and growth

☢️ Radioactive decay

Exponential decay: a radioactive substance's amount y at time t ≥ 0 satisfies dy/dt = k·y, where the decay constant k < 0.

General solution: y = A₀·e^(kt), where A₀ is the initial amount at t = 0.
Since the substance is decaying, dy/dt < 0 and y > 0, so k must be negative.

⏱️ Half-life

Half-life t_H: the time required for half the current amount of substance to decay.

Key insight: half-life is constant, independent of the initial amount A₀.
Proof from the excerpt: pick any time t, so y(t) = A₀·e^(kt) is the current amount.
By definition, y(t + t_H) = (1/2)·y(t).
Then A₀·e^(k(t + t_H)) = (1/2)·A₀·e^(kt).
Cancel A₀·e^(kt): e^(k·t_H) = 1/2.
Take ln: k·t_H = ln(1/2) = −ln 2.
Solve: t_H = −(ln 2)/k and k = −(ln 2)/t_H.
Notice t_H depends only on k, not on A₀ or t.

🧮 Example: finding half-life

Problem: 5 mg of a substance decays to 3 mg in 6 hours. Find the half-life.

Solution:

Initial amount A₀ = 5 mg, so y(t) = 5·e^(kt).
At t = 6, y(6) = 3 mg: 3 = 5·e^(6k).
Solve for k: 6k = ln(3/5) = ln(0.6) ⇒ k = (1/6)·ln(0.6).
Half-life: t_H = −(ln 2)/k = −(ln 2)/[(1/6)·ln(0.6)] = 8.14 hours.

Strategy:

Given a time and amount, use them to find k.
Then use k to find t_H (or vice versa: given t_H, find k, then solve for required time t).
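
The two-step strategy can be sketched in a few lines. The helper names below are hypothetical; the numbers are the ones from the example (5 mg decaying to 3 mg in 6 hours).

```python
import math

def decay_constant(initial, later, elapsed):
    """Solve later = initial * e^(k * elapsed) for k."""
    return math.log(later / initial) / elapsed

def half_life(k):
    """t_H = -(ln 2) / k for a decay constant k < 0."""
    return -math.log(2) / k

k = decay_constant(5.0, 3.0, 6.0)   # (1/6) * ln(0.6)
t_half = half_life(k)
print(round(t_half, 2))             # 8.14
```

Going the other way, a known half-life gives k = −(ln 2)/t_H, and the same exponential model can then be solved for any required time.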

🦠 Exponential growth

Example: bacterial cell growth.
Same form y = A₀·e^(kt), but now k > 0 since the number of cells is increasing.

⚡ Electric circuit example

A simple series DC circuit with voltage V, capacitance C, resistance R, and a switch.
Initially uncharged capacitor; switch closes at t = 0.
By Kirchhoff's Second Law: RC·(dI/dt) + I = 0 ⇒ dI/dt = −I/(RC).
Solution: I(t) = I₀·e^(−t/(RC)), where I₀ is the initial current.
By Ohm's Law, V = I₀·R, so I(t) = (V/R)·e^(−t/(RC)).
The current decreases exponentially.

🌍 Atmospheric pressure

Pressure p as a function of height h above the Earth's surface (assuming constant temperature).
Differential equation: dp/dh = −(w₀/p₀)·p, where p₀ is pressure at ground level (h = 0) and w₀ is the weight of a cubic foot of air at pressure p₀.
Solution: p(h) = p₀·e^(−(w₀/p₀)·h).
Atmospheric pressure decreases exponentially with height.
Don't confuse: the variable here is height h, not time t.

2.4 General Exponential and Logarithmic Functions

🧭 Overview

🧠 One-sentence thesis

General exponential functions a^x and their inverses log_a(x) can be differentiated using logarithmic differentiation and expressed in terms of the natural exponential and logarithm, with their derivatives depending on the constant factor ln(a).

📌 Key points (3–5)

Derivative of a^x: The derivative is (ln a) · a^x, derived using logarithmic differentiation; the natural log of the base appears as a constant factor.
Expressing a^x in terms of e: Any exponential a^x equals e^(x ln a), which is how calculators compute exponentials.
Inverse function log_a(x): For a > 0 and a ≠ 1, the function a^x has an inverse called the base-a logarithm; it is strictly increasing when a > 1 and strictly decreasing when 0 < a < 1.
Derivative of log_a(x): The derivative is 1/(x ln a), which can be derived from the relationship log_a(x) = ln(x)/ln(a).
Common confusion: The base a matters—the derivative formulas include ln(a) as a factor, so base-e (natural) exponentials and logarithms are simpler because ln(e) = 1.

📐 Deriving the derivative of a^x

📐 Using logarithmic differentiation

The excerpt shows how to find the derivative of y = a^x by taking the natural logarithm of both sides:

Start with ln(y) = ln(a^x) = x ln(a).
Differentiate both sides: d/dx(ln y) = d/dx(x ln a) = ln(a).
Since d/dx(ln y) = y'/y, we have y'/y = ln(a).
Solve for y': y' = y · ln(a).

Derivative of a^x: d/dx(a^x) = (ln a) · a^x

The constant ln(a) appears because the base a is fixed.
For composite exponents u = u(x): d/dx(a^u) = (ln a) · a^u · du/dx.

🧮 Example: differentiating 2^(cos x)

The excerpt provides an example with a = 2:

y = 2^(cos x).
dy/dx = (ln 2) · 2^(cos x) · d/dx(cos x).
Since d/dx(cos x) = -sin x, the result is dy/dx = -(ln 2)(sin x) · 2^(cos x).

Don't confuse: The derivative of a^x is not x · a^(x-1) (that rule applies to x^a, not a^x).
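
A quick numerical comparison makes the distinction concrete. This is only a sketch: the test point x0 = 0.5 and the step h are assumptions, and the "wrong" function is included purely for contrast.

```python
import math

def y(x):
    return 2 ** math.cos(x)

def dy_formula(x):
    """-(ln 2) * sin(x) * 2^(cos x), from d/dx(a^u) = (ln a) * a^u * du/dx."""
    return -math.log(2) * math.sin(x) * 2 ** math.cos(x)

def wrong_power_rule(x):
    """Misapplied power rule u * a^(u-1); shown only to illustrate the mistake."""
    return math.cos(x) * 2 ** (math.cos(x) - 1)

x0 = 0.5    # hypothetical test point
h = 1e-6    # hypothetical step size
numeric = (y(x0 + h) - y(x0 - h)) / (2 * h)
print(numeric, dy_formula(x0), wrong_power_rule(x0))
```

The finite-difference slope should match the (ln a)·a^u·u′ formula and disagree clearly with the misapplied power rule.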

🔗 Expressing a^x in terms of e^x

🔗 The conversion formula

The excerpt shows that any exponential can be rewritten using the natural exponential:

Since a^x > 0, we can write e^(ln(a^x)) = a^x.
Because ln(a^x) = x ln(a), we have:

Conversion formula: a^x = e^(x ln a)

Computers and calculators use this formula to compute a^x.
This explains why the derivative formula d/dx(a^x) = (ln a) · a^x matches the chain rule applied to e^(x ln a).

🔄 The inverse function: log_a(x)

🔄 When a^x has an inverse

The excerpt explains that y = a^x has an inverse for any a > 0 except a = 1:

When a = 1, y = 1^x = 1 is constant, so no inverse exists.
For 0 < a < 1: ln(a) < 0, so dy/dx = (ln a) · a^x is always negative → a^x is strictly decreasing.
For a > 1: ln(a) > 0, so dy/dx = (ln a) · a^x is always positive → a^x is strictly increasing.
Because a^x is one-to-one in these cases, it has an inverse.

Base-a logarithm: The inverse of f(x) = a^x is denoted f^(-1)(x) = log_a(x), spoken as "log base a of x."

The natural logarithm ln(x) is the special case log_e(x).
The graphs of log_a(x) are reflections of a^x across the line y = x.

🧩 Properties of log_a(x)

The excerpt lists properties parallel to those of a^x:

Property of log_a	Corresponding property of a^x
log_a(bc) = log_a(b) + log_a(c)	a^b · a^c = a^(b+c)
log_a(b/c) = log_a(b) - log_a(c)	a^b / a^c = a^(b-c)
log_a(b^c) = c · log_a(b)	(a^b)^c = a^(bc)
log_a(1) = 0	a^0 = 1

Don't confuse: log_a(b + c) is not log_a(b) + log_a(c); the sum rule applies to products, not sums.

🔗 Expressing log_a(x) in terms of ln(x)

The excerpt derives a conversion formula:

Start with x = a^(log_a(x)).
Take the natural log: ln(x) = ln(a^(log_a(x))) = (log_a(x)) · (ln a).
Divide by ln(a):

Conversion formula: log_a(x) = ln(x) / ln(a)

This is useful on calculators that lack a log_a(x) function.
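
In code the conversion formula is a one-liner. The helper name `log_base` is hypothetical; Python's standard library also accepts a base directly via `math.log(x, a)`.

```python
import math

def log_base(a, x):
    """log_a(x) computed as ln(x) / ln(a); requires a > 0, a != 1, x > 0."""
    return math.log(x) / math.log(a)

print(log_base(2, 8))       # close to 3
print(log_base(10, 1000))   # close to 3
```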

🧮 Derivative of log_a(x)

🧮 Deriving the formula

The excerpt takes the derivative of both sides of log_a(x) = ln(x) / ln(a):

Since ln(a) is a constant, d/dx(log_a(x)) = d/dx(ln(x) / ln(a)) = (1/x) / ln(a).

Derivative of log_a(x): d/dx(log_a(x)) = 1 / (x ln a)

For a composite function u = u(x): d/dx(log_a(u)) = (1 / (u ln a)) · du/dx = u' / (u ln a).

🧮 Example: differentiating log_2(cos 4x)

The excerpt provides an example with a = 2:

y = log_2(cos 4x).
dy/dx = (1 / ((cos 4x)(ln 2))) · d/dx(cos 4x).
Since d/dx(cos 4x) = -4 sin 4x, the result is dy/dx = -4 sin 4x / ((ln 2)(cos 4x)).

🔢 Common bases and their uses

🔢 Base 10 and base 2

The excerpt mentions two commonly used bases besides e:

Base	Name	Use
10	Base 10	How numbers are normally expressed (e.g., 2014 = 2·10³ + 0·10² + 1·10¹ + 4·10⁰)
2	Base 2 (binary)	Computer science; computers represent numbers as sequences of 0s and 1s

Binary format: Numbers are expressed as sums of powers of 2.
Example from the excerpt: 6 in binary is 110, because 1·2² + 1·2¹ + 0·2⁰ = 4 + 2 + 0 = 6.
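
The sum-of-powers description translates directly into code. The two helper functions below are hypothetical names written for illustration; Python's built-in `format(n, "b")` gives the same digits.

```python
def to_binary(n):
    """Binary digits of a nonnegative integer, by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))   # the remainder is the next lowest bit
        n //= 2
    return "".join(reversed(digits))

def from_binary(bits):
    """Sum of digit * 2^position, counting positions from the right."""
    return sum(int(b) * 2 ** i for i, b in enumerate(reversed(bits)))

print(to_binary(6))         # 110
print(from_binary("110"))   # 6
```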

Tangent Lines

3.1 Tangent Lines

🧭 Overview

🧠 One-sentence thesis

The tangent line to a curve at a point is the unique straight line that best approximates the curve locally because it shares the same rate of change (derivative) as the curve at that point.

📌 Key points (3–5)

What a tangent line is: the unique line through a point on a curve with slope equal to the derivative at that point.
Why it approximates the curve: both the tangent line and curve have the same rate of change at the point of tangency, so their values change by roughly the same amount nearby.
How to find it: use the formula y − f(a) = f′(a) · (x − a), where a is the x-value of the point of tangency.
Common confusion: tangent lines to general curves can cut through the curve (unlike circles, where the tangent touches at only one point and stays on one side).
When tangent lines exist: only smooth curves have tangent lines; sharp edges and cusps do not have tangent lines because the derivative does not exist there.

📐 Definition and formula

📐 What is a tangent line

For a curve y = f(x) that is differentiable at x = a, the tangent line to the curve at the point P = (a, f(a)) is the unique line through P with slope m = f′(a). P is called the point of tangency.

The tangent line extends the infinitesimal straight segment of the curve (from the Microstraightness Property) to all x values.
It is the line that passes through the point and has slope equal to the derivative at that point.

🧮 The tangent line equation

The equation is given by:

y − f(a) = f′(a) · (x − a)

Where:

a is the x-coordinate where you want the tangent
f(a) is the y-coordinate (the point is on the curve)
f′(a) is the derivative (the slope of the tangent line)

Example: For y = x² at x = 1, we have f(1) = 1 and f′(1) = 2, so the tangent line is y − 1 = 2(x − 1), or y = 2x − 1.
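
The tangent line formula can be sketched as a small helper that returns the slope and y-intercept. The function name and the choice to pass the derivative in explicitly are assumptions made for illustration.

```python
def tangent_line(f, fprime, a):
    """Return (slope, intercept) of the tangent line y = slope*x + intercept at x = a."""
    slope = fprime(a)
    intercept = f(a) - slope * a    # rearranged from y - f(a) = f'(a) * (x - a)
    return slope, intercept

slope, intercept = tangent_line(lambda x: x**2, lambda x: 2*x, 1)
print(slope, intercept)   # 2 -1
```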

🔍 Key properties of tangent lines

🔍 Why the tangent line is the best approximation

At point P, the tangent line and curve both have the same rate of change, namely f′(a).
Because they start at the same point and change at the same rate, their values remain nearly equal close to P.
If you zoom in on the curve near P with a microscope, it would look almost identical to its tangent line.
The approximation gets worse farther from the point of tangency.

📏 Slope of a curve

The slope of a curve at a particular point is defined as the slope of its tangent line at that point.
Unlike straight lines (constant slope), curves can have varying slopes depending on the point.
Easy memory aid: "slope = derivative."

➖ Tangent line to a straight line

The tangent line to a straight line is the straight line itself.
This makes sense because a straight line's slope (derivative) never changes, so the tangent line—having the same slope—must coincide with the original line.
Example: The tangent line to y = −3x + 2 is y = −3x + 2 at every point.

🔗 Tangent lines as limits of secant lines

🔗 What is a secant line

A secant line to a curve is a line that passes through two points on the curve.

Consider a secant line passing through points P = (x₀, f(x₀)) and Q = (w, f(w)) on the curve.
The slope of this secant line is (f(w) − f(x₀))/(w − x₀).

🎯 Limit relationship

As point Q moves along the curve toward P:

The secant line approaches the tangent line at P (provided the curve is smooth at P).
This is because the limit of the secant line's slope equals f′(x₀), which is the slope of the tangent line.
Mathematically: lim(w→x₀) (f(w) − f(x₀))/(w − x₀) = f′(x₀) = slope of the tangent line.
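
The limit relationship can be observed numerically. The sketch below uses y = x² at x₀ = 1 as an assumed example and lets the second point w = x₀ + h slide toward x₀.

```python
def secant_slope(f, x0, w):
    """Slope of the line through (x0, f(x0)) and (w, f(w)), with w != x0."""
    return (f(w) - f(x0)) / (w - x0)

f = lambda x: x**2
x0 = 1.0
slopes = [secant_slope(f, x0, x0 + h) for h in (0.1, 0.01, 0.001)]
print(slopes)   # approaches f'(1) = 2
```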

⚠️ When tangent lines exist and when they don't

✅ Smooth curves have tangent lines

A curve must be smooth (differentiable) at a point to have a tangent line there.
Smooth means the derivative exists at that point.

❌ Nonsmooth curves do not have tangent lines

Sharp edges: The absolute value function f(x) = |x| has a sharp edge at (0, 0). There is no tangent line at (0, 0) because the derivative does not exist there.

Cusps: Curves with cusps also lack tangent lines at the cusp point.

Many lines can pass through the nonsmooth point, but none can be the tangent line.
Don't confuse: just because a line passes through a point doesn't make it a tangent line—it must also have the correct slope (the derivative).

📐 Tangent lines vs circles

🔄 Different from the trigonometry definition

In trigonometry, a tangent line to a circle touches the circle at only one point and stays on the exterior (one side).
The calculus definition is more general: tangent lines to other curves can cut through the curve.

🔀 Tangent lines can intersect curves multiple times

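Example: For y = x³ at x = 0, the tangent line is y = 0 (the x-axis itself). This tangent line cuts through the curve rather than staying on one side. The tangent line at x = 1, which is y = 3x − 2, also meets the curve a second time at (−2, −8).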

The trigonometry definition is a special case of the calculus definition.
Don't assume a tangent line must touch at only one point or stay on one side.

📐 Angles and normal lines

📐 Angle with the x-axis

Let φ(x) be the smallest angle that the tangent line makes with the positive x-axis, where −90° < φ(x) < 90°.

Since the slope of a line equals rise/run = tan(φ), and the slope is f′(x), we have:

φ(x) = tan⁻¹(f′(x))

Negative slope → −90° < φ < 0°
Positive slope → 0° < φ < 90°
Zero slope (horizontal) → φ = 0°

Example: For y = e^(2x) at x = −1/2, the derivative is f′(−1/2) = 2e⁻¹ ≈ 0.7358, so φ = tan⁻¹(0.7358) ≈ 36.3°.
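
The angle computation in this example can be reproduced with the standard library. The helper name is hypothetical.

```python
import math

def tangent_angle_degrees(slope):
    """Angle in degrees between the tangent line and the positive x-axis."""
    return math.degrees(math.atan(slope))

slope = 2 * math.exp(2 * (-0.5))    # f'(x) = 2e^(2x) evaluated at x = -1/2
phi = tangent_angle_degrees(slope)
print(round(slope, 4), round(phi, 1))   # 0.7358 36.3
```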

⊥ Normal line

The normal line to a curve at point P is the line perpendicular to the tangent line at P.

Since perpendicular lines have slopes that are negative reciprocals:

y − f(a) = −1/f′(a) · (x − a) if f′(a) ≠ 0

If f′(a) = 0 (horizontal tangent), the normal line is vertical: x = a.

Example: For y = x² at x = 1, the tangent line has slope 2, so the normal line has slope −1/2, giving y − 1 = −1/2(x − 1), or y = −1/2 x + 3/2.
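
A sketch of the normal line rule, including the vertical case. The function name and the tuple-based return convention are assumptions made for illustration.

```python
def normal_line(f, fprime, a):
    """Return ('vertical', a) if f'(a) == 0, else (slope, intercept) of the normal line."""
    m = fprime(a)
    if m == 0:
        return ("vertical", a)      # horizontal tangent, so the normal line is x = a
    slope = -1 / m                  # negative reciprocal of the tangent slope
    intercept = f(a) - slope * a
    return (slope, intercept)

print(normal_line(lambda x: x**2, lambda x: 2*x, 1))   # (-0.5, 1.5)
print(normal_line(lambda x: x**2, lambda x: 2*x, 0))   # ('vertical', 0)
```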

Limits: Formal Definition

3.2 Limits: Formal Definition

🧭 Overview

🧠 One-sentence thesis

The formal epsilon-delta definition of limits makes precise the intuitive idea that a function can be made arbitrarily close to a limit value by choosing input values sufficiently close to a point, and this framework extends to one-sided limits, infinite limits, and limits at infinity.

📌 Key points (3–5)

What the formal definition captures: for any desired closeness (epsilon) to the limit L, there exists a sufficiently small interval (delta) around the point a such that the function stays within epsilon of L.
The limit does not depend on the function's value at the point itself: x = a is excluded from the definition, so f(a) can differ from L or even be undefined.
One-sided limits and when limits fail to exist: a limit exists only if both the right limit and left limit exist and are equal; otherwise the limit does not exist.
Common confusion—indeterminate forms: expressions like infinity/infinity or 0/0 do not automatically simplify; they require techniques like L'Hôpital's Rule or algebraic manipulation.
Asymptotic behavior and growth rates: limits at infinity reveal long-term behavior, horizontal and vertical asymptotes, and which functions grow faster (exponential outstrips polynomial, polynomial outstrips logarithmic).

📐 The formal epsilon-delta definition

📐 What the definition says in words

Limit (formal definition): L is the limit of a function f(x) as x approaches a, written lim(x→a) f(x) = L, if for any given number ε > 0 (epsilon) there exists a number δ > 0 (delta) such that |f(x) − L| < ε whenever 0 < |x − a| < δ.

Intuitive meaning: you can make f(x) arbitrarily close to L (within any distance epsilon) by picking x sufficiently close to a (within some distance delta).
Visual interpretation: for any interval around L on the y-axis, you can find at least one small interval around x = a (excluding a itself) on the x-axis that the function maps completely inside that interval on the y-axis.
Choosing smaller intervals around L may force you to find smaller intervals around a.

🔍 Why x = a is excluded

The condition 0 < |x − a| < δ means x = a itself is excluded.
Consequence: f(a) does not have to equal L, or even be defined.
The function just needs to approach L as x approaches a.
Counter-intuitive implication: the existence of the limit does not depend on what happens at x = a itself.
Example: a solid dot at (a, L) could even be a hollow dot; the limit can exist even if f(a) is different or undefined.

🛠️ How epsilon-delta proofs work

The technique is to let ε > 0 be given, then "work backward" from the inequality |f(x) − L| < ε to get an inequality of the form |x − a| < δ.
Delta usually depends on epsilon.
Example (from the excerpt): to show lim(x→a) x = a, start with |f(x) − a| < ε, which is |x − a| < ε, so choosing δ = ε works.
Why this seems silly: it requires extra effort for obvious results, but the formal definition is used most often in proofs of general theorems (e.g., proving the sum rule for limits).
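
To illustrate how delta depends on epsilon, here is a sketch for a different limit that is not from the source: lim(x→2) (3x + 1) = 7. Working backward, |(3x + 1) − 7| = 3·|x − 2|, so δ = ε/3 works. The code only spot-checks sample points; the algebra is the proof, and the sample values are assumptions.

```python
def f(x):
    return 3 * x + 1

def delta_for(epsilon):
    """|f(x) - 7| = 3|x - 2| < epsilon whenever |x - 2| < epsilon / 3."""
    return epsilon / 3

a, L = 2.0, 7.0
results = []
for epsilon in (1.0, 0.1, 0.001):
    delta = delta_for(epsilon)
    # Sample points with 0 < |x - a| < delta on both sides of a.
    samples = [a + sign * delta * t for sign in (-1, 1) for t in (0.1, 0.5, 0.99)]
    results.append(all(abs(f(x) - L) < epsilon for x in samples))
print(results)
```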

🔀 One-sided limits and when limits fail to exist

🔀 Right and left limits

Right limit: L is the right limit of f(x) as x approaches a, written lim(x→a+) f(x) = L, if f(x) approaches L as x approaches a for values of x larger than a.

Left limit: L is the left limit of f(x) as x approaches a, written lim(x→a−) f(x) = L, if f(x) approaches L as x approaches a for values of x smaller than a.

Key equivalence: the limit of a function exists if and only if both its right limit and left limit exist and are equal: lim(x→a) f(x) = L if and only if lim(x→a−) f(x) = L = lim(x→a+) f(x).

❌ When limits do not exist

Disagreement between one-sided limits: if the left and right limits do not agree, the two-sided limit does not exist.
Example (from the excerpt): for the piecewise function f(x) = x² if x < 0 and f(x) = 2 − x if x ≥ 0, the left limit at 0 is 0 (along the parabola) but the right limit is 2 (along the line), so the limit at 0 does not exist.
Oscillation: if f(x) oscillates without settling down, the limit does not exist.
Example (from the excerpt): for f(x) = sin(1/x), as x approaches 0 from the right, sin(1/x) oscillates between 1 and −1 infinitely often, so the right limit does not exist.
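
The oscillation can be made concrete by sampling along two sequences that both shrink toward 0 from the right. The particular sequences below are an assumed choice: 1/x is set to π/2 + 2πk (where sine is 1) and to 3π/2 + 2πk (where sine is −1).

```python
import math

# Points approaching 0 from the right where sin(1/x) is 1, and where it is -1.
peaks = [1 / (math.pi / 2 + 2 * math.pi * k) for k in range(1, 6)]
troughs = [1 / (3 * math.pi / 2 + 2 * math.pi * k) for k in range(1, 6)]

peak_values = [math.sin(1 / x) for x in peaks]
trough_values = [math.sin(1 / x) for x in troughs]
print(peaks[-1], peak_values[-1])
print(troughs[-1], trough_values[-1])
```

Because inputs arbitrarily close to 0 produce values near both 1 and −1, no single number can serve as the right limit.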

♾️ Infinite limits and asymptotes

♾️ Limits equal to infinity or negative infinity

Limit equals infinity: the limit of f(x) equals infinity as x approaches a if f(x) grows without bound as x approaches a, i.e., f(x) can be made larger than any positive number by picking x sufficiently close to a.

Limit equals negative infinity: the limit of f(x) equals negative infinity as x approaches a if f(x) grows negatively without bound as x approaches a, i.e., f(x) can be made smaller than any negative number by picking x sufficiently close to a.

Formal definition (infinity case): for any given number M > 0, there exists a number δ > 0 such that f(x) > M whenever 0 < |x − a| < δ.
These definitions can be modified for one-sided limits.

📏 Vertical asymptotes

Vertical asymptote: if limit as x approaches a of f(x) equals infinity or negative infinity, then the line x = a is a vertical asymptote of f(x), and f(x) approaches the line x = a asymptotically.

Example (from the excerpt): for f(x) = 1/x, as x approaches 0 from the right, 1/x approaches ∞; from the left, 1/x approaches −∞. The y-axis (x = 0) is a vertical asymptote.
Don't confuse: if the right and left limits are both ∞ (or both −∞), the two-sided limit equals ∞ (or −∞). Example: for f(x) = 1/x², both one-sided limits at 0 equal ∞, so the limit equals ∞.

🌅 Limits at infinity and horizontal asymptotes

🌅 Limits as x approaches infinity or negative infinity

Limit as x approaches infinity: for a real number L, the limit of f(x) equals L as x approaches infinity if f(x) can be made arbitrarily close to L for x sufficiently large and positive.

Limit as x approaches negative infinity: for a real number L, the limit of f(x) equals L as x approaches negative infinity if f(x) can be made arbitrarily close to L for x negative and sufficiently large in magnitude.

Interpretation: limit as x approaches infinity of f(x) equals L means the long-term behavior of f(x) is to approach a steady-state at L.
These definitions can be modified for L replaced by infinity or negative infinity.

📏 Horizontal asymptotes

Horizontal asymptote: if limit as x approaches infinity of f(x) equals L or limit as x approaches negative infinity of f(x) equals L, then the line y = L is a horizontal asymptote of f(x), and f(x) approaches the line y = L asymptotically.

Example (from the excerpt): for f(x) = 1/x and f(x) = 1/x², the limits as x → ∞ and as x → −∞ all equal 0, so the x-axis (y = 0) is a horizontal asymptote.

📊 Basic limits at infinity

lim(x→∞) x^n = ∞ for any real n > 0; lim(x→∞) x^n = 0 for any real n < 0.
lim(x→−∞) x^n = ∞ for n = 2, 4, 6, 8, ...; lim(x→−∞) x^n = −∞ for n = 1, 3, 5, 7, ...
lim(x→∞) e^x = ∞; lim(x→−∞) e^x = 0.
lim(x→∞) e^(−x) = 0; lim(x→−∞) e^(−x) = ∞.
lim(x→∞) ln x = ∞; lim(x→0+) ln x = −∞.

🧮 Indeterminate forms and L'Hôpital's Rule

🧮 What indeterminate forms are

Indeterminate forms: expressions like ∞ − ∞, ∞/∞, 0/0, and ∞·0.
Why they are indeterminate: they do not automatically simplify to a single value; they can equal anything depending on the specific functions involved.
Common mistake: thinking that ∞ − ∞ = 0 (the infinities do not necessarily "cancel out").

🔧 L'Hôpital's Rule

L'Hôpital's Rule: if f and g are differentiable functions and lim(x→a) f(x)/g(x) has the indeterminate form ±∞/±∞ or 0/0, then lim(x→a) f(x)/g(x) = lim(x→a) f′(x)/g′(x), provided the limit on the right exists or is ±∞. The number a can be real, ∞, or −∞.

How to use it: differentiate the numerator and denominator separately, then take the limit of the new ratio.
Can be applied repeatedly: if the new limit is still an indeterminate form, apply L'Hôpital's Rule again.
Intuitive justification: the limit of a ratio compares how f changes relative to g as x approaches a, so it is really the rates of change (f prime and g prime) that are being compared.

🚀 Growth rates revealed by L'Hôpital's Rule

Exponential growth outstrips polynomial growth: lim(x→∞) p(x)/e^x = 0 for any polynomial p(x).
Example (from the excerpt): lim(x→∞) (2x − 1)/e^x = 0.
Polynomial growth outstrips logarithmic growth: lim(x→∞) (ln x)/p(x) = 0 for any nonconstant polynomial p(x) (equivalently, lim(x→∞) |p(x)|/(ln x) = ∞).
Example (from the excerpt): lim(x→∞) x/(ln x) = ∞.

🔄 Converting indeterminate forms

Converting ∞·0 to ∞/∞: rewrite the product as a fraction.
Example (from the excerpt): lim(x→∞) x·e^(−2x) = lim(x→∞) x/e^(2x), which has the form ∞/∞; applying L'Hôpital's Rule gives lim(x→∞) 1/(2e^(2x)) = 0.
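
A numerical look at this example, as a sketch with arbitrarily chosen sample points: both the original product and the ratio of derivatives, 1/(2e^(2x)), shrink toward 0 as x grows.

```python
import math

def original(x):
    return x * math.exp(-2 * x)         # has the form infinity * 0 as x grows

def after_lhopital(x):
    return 1 / (2 * math.exp(2 * x))    # derivative of x over derivative of e^(2x)

for x in (1, 5, 10, 20):
    print(x, original(x), after_lhopital(x))
```

Numerical evidence like this suggests the value of a limit but does not replace the rule itself.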

📉 Rational functions (ratios of polynomials)

For lim(x→∞) of a ratio of polynomials, you can discard the lower-order terms (terms of degree less than the highest degree) in the numerator and in the denominator.
When the numerator and denominator have the same degree, the limit is the ratio of the leading coefficients.
Example (from the excerpt): lim(x→∞) (2x² − 7x − 5)/(3x² + 2x − 1) = 2/3 (the ratio of the leading coefficients 2 and 3).
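
Evaluating the ratio at increasingly large inputs shows the lower-order terms losing influence. The sample inputs are assumptions chosen for illustration.

```python
def r(x):
    return (2 * x**2 - 7 * x - 5) / (3 * x**2 + 2 * x - 1)

values = [r(10.0 ** k) for k in (1, 3, 6)]
print(values)   # drifts toward 2/3
```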

🗜️ The Squeeze Theorem

🗜️ Statement of the theorem

Squeeze Theorem: suppose that for some functions f, g, and h there is a number x₀ ≥ 0 such that g(x) ≤ f(x) ≤ h(x) for all x > x₀, and that lim(x→∞) g(x) = lim(x→∞) h(x) = L. Then lim(x→∞) f(x) = L.

Intuitive meaning: if one function is "squeezed" between two functions approaching the same limit, then the function in the middle must also approach that limit.
The theorem also applies to limits as x approaches a finite number a, and to one-sided limits (x approaches a from the right or left).

🗜️ How to use the Squeeze Theorem

Step 1: find two functions g and h such that g(x) ≤ f(x) ≤ h(x) for all x in some appropriate range.
Step 2: show that g and h both approach the same limit L.
Step 3: conclude that f also approaches L.
Example (from the excerpt): to evaluate lim(x→∞) sin(x)/x, use −1 ≤ sin(x) ≤ 1, so dividing by x > 0 gives −1/x ≤ sin(x)/x ≤ 1/x. Since both −1/x and 1/x approach 0 as x → ∞, the Squeeze Theorem gives lim(x→∞) sin(x)/x = 0.
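
The squeeze in this example can be tabulated. The sketch below, with assumed sample points, records the lower bound, the squeezed function, and the upper bound side by side.

```python
import math

xs = (1.0, 10.0, 100.0, 1e4, 1e6)
# Each row is (lower bound -1/x, middle value sin(x)/x, upper bound 1/x).
rows = [(-1 / x, math.sin(x) / x, 1 / x) for x in xs]
for x, row in zip(xs, rows):
    print(x, row)
```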

🔢 Big O notation

🔢 Definition

Big O notation: say that f(x) = O(g(x)) as x → ∞, spoken as "f is big O of g", if there exist positive numbers M and x₀ such that |f(x)| ≤ M·|g(x)| for all x ≥ x₀.

Meaning: in the long run, f grows no faster than a constant multiple of g.
You can think of g as the more basic "type" of function that describes f, as far as long-term behavior.

🔢 Example

To show that 5x⁴ − 2 = O(x⁴):

Use the triangle inequality: |5x⁴ − 2| ≤ |5x⁴| + |−2| = 5·|x⁴| + 2.
For x ≥ 1, x⁴ ≥ 1, so 2 ≤ 2·x⁴.
Thus |5x⁴ − 2| ≤ 7·|x⁴| for all x ≥ 1.
This shows 5x⁴ − 2 = O(x⁴), with M = 7 and x₀ = 1.
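
The bound can be spot-checked at a few sample points. The samples are assumptions, and a finite check illustrates the inequality without proving it. The last line shows why the threshold x₀ matters: below it the bound can fail.

```python
M, x0 = 7, 1    # constants from the example

def f(x):
    return 5 * x**4 - 2

def g(x):
    return x**4

samples = [1, 1.5, 2, 10, 1000]     # all satisfy x >= x0
bound_holds = all(abs(f(x)) <= M * abs(g(x)) for x in samples)
print(bound_holds)                          # True
print(abs(f(0.5)) <= M * abs(g(0.5)))       # False, since 0.5 < x0
```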

Continuity

3.3 Continuity

🧭 Overview

🧠 One-sentence thesis

A function is continuous at a point when its limit equals its value there, and continuous functions have important properties including the ability to pass through all intermediate values and attain maximum and minimum values on closed intervals.

📌 Key points (3–5)

Definition of continuity: A function is continuous at x = a if the limit as x approaches a equals f(a); continuous functions have unbroken graphs over their domain.
Common examples: Polynomials, rational functions, trigonometric functions, exponential functions, and logarithmic functions are all continuous on their domains.
Relationship to differentiability: Every differentiable function is continuous, but the converse is not true (e.g., absolute value at zero).
Common confusion: Continuity on a domain vs. continuity at a point—tan(x) is continuous on its domain (broken intervals) but not continuous over all real numbers.
Key theorems: The Extreme Value Theorem and Intermediate Value Theorem apply only to continuous functions on closed intervals.

📐 Definition and basic concepts

📐 What continuity means

A function f is continuous at x = a if lim(x→a) f(x) = f(a).

This definition has three implicit requirements:

f(a) must be defined (a is in the domain of f)
The limit as x approaches a must exist
The limit must equal the function value

Continuity on an interval: A function is continuous on an interval I if it is continuous at every point in the interval.

For closed intervals [a, b]:

Must be continuous on the open interval (a, b)
Must be right continuous at x = a: lim(x→a+) f(x) = f(a)
Must be left continuous at x = b: lim(x→b−) f(x) = f(b)

🖼️ Visual interpretation

A continuous function has an unbroken graph over its entire domain—you can draw it without lifting your pen.

Example: The excerpt shows a graph with four points:

Continuous at x₁ (limit equals function value)
Discontinuous at x₂ (limit exists but doesn't equal function value)
Discontinuous at x₃ (limit doesn't exist; left and right limits disagree—this is a jump discontinuity)
Discontinuous at x₄ (function value not defined)

🔢 Examples of continuous and discontinuous functions

✅ Standard continuous functions

All of these are continuous on their domains:

Polynomials
Rational functions
Trigonometric functions
Exponential functions
Logarithmic functions

Example: tan(x) is continuous over its domain, which consists of disjoint intervals (−π/2, π/2), (π/2, 3π/2), etc. The graph is unbroken on each interval, but tan(x) is not continuous over all real numbers because it's not defined everywhere.

📊 Floor and ceiling functions (step functions)

Floor function ⌊x⌋: the largest integer less than or equal to x

Examples: ⌊0.1⌋ = 0, ⌊0.9⌋ = 0, ⌊0⌋ = 0, ⌊−1.3⌋ = −2

Ceiling function ⌈x⌉: the smallest integer greater than or equal to x

Examples: ⌈0.1⌉ = 1, ⌈0.9⌉ = 1, ⌈1⌉ = 1, ⌈−1.3⌉ = −1

Both functions:

Have jump discontinuities at all integers
Are continuous at all non-integer values
Are called step functions due to their staircase appearance

Application: Step functions model discrete changes in state, such as gear shifts in a transmission—the gear number jumps from 1 to 2 to 3 as speed increases.
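
Python's standard library provides both step functions, so the values listed above and the jump at an integer can be checked directly. The offset 1e-9 used to approach 2 from each side is an arbitrary choice.

```python
import math

for x in (0.1, 0.9, 0, 1, -1.3):
    print(x, math.floor(x), math.ceil(x))

# One-sided values near the integer 2 show the jump in the floor function.
left, right = math.floor(2 - 1e-9), math.floor(2 + 1e-9)
print(left, right)   # 1 2
```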

⚠️ Extreme discontinuity example

The function f(x) = 0 if x is rational, 1 if x is irrational is discontinuous at every real number. Within any distance δ of any real number, there are infinitely many rational and irrational numbers, so the function keeps jumping between 0 and 1.

🔗 Operations preserving continuity

🔗 Algebraic combinations

If functions are continuous, then these are also continuous:

Sums and differences
Constant multiples
Products
Quotients (where the denominator is nonzero)
Compositions (continuous function of a continuous function)

🔄 Passing continuous functions through limits

If f is continuous and lim(x→a) g(x) exists and is finite, then: f(lim(x→a) g(x)) = lim(x→a) f(g(x))

This is useful for evaluating indeterminate forms like 0⁰, ∞⁰, and 1^∞.

Example: To evaluate lim(x→0+) x^x (form 0⁰):

Let y = lim(x→0+) x^x
Take the natural logarithm and pass ln inside the limit (ln is continuous): ln(y) = lim(x→0+) ln(x^x) = lim(x→0+) x·ln(x)
This limit has the form 0·(−∞)
Rewrite as lim(x→0+) ln(x)/(1/x) → −∞/∞
Apply L'Hôpital's Rule: lim(x→0+) (1/x)/(−1/x²) = lim(x→0+) (−x) = 0
Therefore y = e⁰ = 1
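
The conclusion can be checked numerically by evaluating x·ln(x) and x^x at small positive inputs. The sample values are assumptions chosen for illustration.

```python
import math

# Each row is (x, x * ln x, x^x) for x shrinking toward 0 from the right.
table = [(x, x * math.log(x), x ** x) for x in (0.1, 0.01, 1e-4, 1e-8)]
for row in table:
    print(row)
```

The middle column tends to 0 and the last column tends to 1, matching ln(y) = 0 and y = 1.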

🔗 Relationship to differentiability

🔗 Key theorem

Every differentiable function is continuous.

Proof sketch: If f is differentiable at x = a, then f'(a) exists. This means:

lim(x→a) [f(x) − f(a)]/(x − a) exists
Therefore lim(x→a) [f(x) − f(a)] = lim(x→a) [f(x) − f(a)]/(x − a) · (x − a) = f'(a) · 0 = 0
So lim(x→a) f(x) = f(a), meaning f is continuous at x = a

⚠️ The converse is false

Continuous functions need not be differentiable.

Example: f(x) = |x| is continuous everywhere (unbroken graph) but not differentiable at x = 0 (sharp corner).

Don't confuse: Continuous curves can have sharp edges and cusps, but differentiable curves cannot.

📊 Important theorems

📊 Extreme Value Theorem

If f is continuous on a closed interval [a, b], then f attains both a maximum value and a minimum value on that interval.

Why the closed interval matters: On an open interval (c, d), a continuous function may approach but never attain a maximum or minimum (the excerpt shows this graphically).

📊 Intermediate Value Theorem

If f is continuous on a closed interval [a, b], then f attains every value between f(a) and f(b).

Interpretation: Continuous functions cannot "skip over" intermediate values.

Example application: Show that cos(x) = x has a solution.

Let f(x) = cos(x) − x
f is continuous on [0, 1]
f(0) = 1 > 0 and f(1) ≈ −0.46 < 0
By the Intermediate Value Theorem, there exists c in (0, 1) such that f(c) = 0
Therefore cos(c) = c

Important limitation: The theorem guarantees existence but does not tell you how to find the solution.

🔍 Bisection method

To actually find the solution:

Divide the interval in half
Apply the Intermediate Value Theorem to each half to determine which contains the solution
Repeat on the half-interval containing the solution
Continue until the interval is small enough that its midpoint approximates the solution

The excerpt provides a Python implementation that finds the root of cos(x) − x to be approximately 0.7390851332152 (the number obtained by taking cosine repeatedly).
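
The four steps above can be sketched as follows. This is a minimal illustration, not the implementation from the source text; the function name `bisect`, the tolerance, and the iteration cap are assumptions made here.

```python
import math

def bisect(f, a, b, tol=1e-12, max_iter=200):
    """Approximate a root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = (a + b) / 2
        fm = f(mid)
        # Stop when the midpoint is a root or the half-width is below tol.
        if fm == 0 or (b - a) / 2 < tol:
            return mid
        # Keep the half on which the sign change (and hence a root) persists.
        if fa * fm < 0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    return (a + b) / 2

root = bisect(lambda x: math.cos(x) - x, 0.0, 1.0)
print(root)
```

Each pass halves the interval, so about 40 iterations are enough to shrink [0, 1] below a width of 10⁻¹².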

Implicit Differentiation

3.4 Implicit Differentiation

🧭 Overview

🧠 One-sentence thesis

Implicit differentiation allows us to find the derivative dy/dx even when y cannot be solved explicitly as a function of x, by differentiating both sides of an equation involving x and y and then solving for dy/dx.

📌 Key points (3–5)

When to use it: when an equation involving x and y cannot be solved explicitly for y in terms of x, but y still varies with x.
The procedure: take d/dx of both sides of the equation, apply the Chain Rule to terms involving y (treating y as a function of x), then solve algebraically for dy/dx.
The result: dy/dx is often expressed in terms of both x and y, but can be evaluated at specific points (x, y) on the curve.
Common confusion: the derivative dy/dx = (expression in x and y) is not "incomplete"—it gives a single expression that works for all branches of the curve, even when the original equation defines multiple functions.
Why it matters: enables finding slopes and tangent lines for curves that are not functions in the traditional sense (e.g., circles, ellipses, algebraic curves).

🔍 The core problem and solution

🔍 What implicit differentiation solves

Traditional differentiation works when you have an explicit formula like y = x².
The problem: some equations like x³y²e^(sin(xy)) = x² + xy + y³ describe curves but cannot be solved for y in terms of x.
The insight: even without an explicit formula, y still varies with x as x changes, so the rate of change dy/dx should exist.

🛠️ How implicit differentiation works

Implicit differentiation: the procedure of taking d/dx of both sides of an equation involving x and y, treating y as a function of x, and solving for dy/dx.

The steps:

Start with an equation involving x and y
Take d/dx of both sides
Apply the Chain Rule to any term involving y (since y depends on x)
Solve algebraically for dy/dx

Example: Given x³ + 3x + 2 = y²

Take d/dx of both sides: d/dx(x³ + 3x + 2) = d/dx(y²)
Left side: 3x² + 3
Right side: 2y · dy/dx (by Chain Rule)
Solve: dy/dx = (3x² + 3)/(2y)

📊 Understanding the result

The derivative dy/dx = (3x² + 3)/(2y) is expressed in terms of both x and y.
This is not a problem: evaluate it at specific points (x, y) that satisfy the original equation.
Example: the point (1, √6) satisfies x³ + 3x + 2 = y², so dy/dx at (1, √6) = (3(1)² + 3)/(2√6) = √6/2.
When undefined: dy/dx is not defined when the denominator is zero (e.g., when y = 0 in the example above).

🎯 Key advantage over explicit methods

🎯 Handling multiple branches at once

The limitation of explicit solving: taking the square root of x³ + 3x + 2 = y² gives y = ±√(x³ + 3x + 2), which defines two functions, not one.
The power of implicit differentiation: the single expression dy/dx = (3x² + 3)/(2y) gives the derivative for both branches simultaneously.
You don't need to split the problem into cases; one formula handles all parts of the curve.
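
A quick numerical check of the example, as a sketch: it compares the implicit formula dy/dx = (3x² + 3)/(2y) at (1, √6) with a finite-difference slope computed from the explicit upper branch. The step size `h` and the function names are assumptions chosen for illustration.

```python
import math

def dydx_implicit(x, y):
    """Slope from implicit differentiation of x^3 + 3x + 2 = y^2."""
    return (3 * x**2 + 3) / (2 * y)

def upper_branch(x):
    """Explicit upper branch y = +sqrt(x^3 + 3x + 2)."""
    return math.sqrt(x**3 + 3 * x + 2)

x0 = 1.0
y0 = upper_branch(x0)            # equals sqrt(6)
h = 1e-6                         # assumed step size for the central difference
numeric = (upper_branch(x0 + h) - upper_branch(x0 - h)) / (2 * h)
print(dydx_implicit(x0, y0), numeric, math.sqrt(6) / 2)

# On the lower branch (y = -sqrt(6)) the same formula gives the opposite slope.
print(dydx_implicit(x0, -y0))
```

The same function handles both branches: only the sign of the y argument changes.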

🔄 The Chain Rule is essential

Every time you differentiate a term involving y, remember that y depends on x.
Example: d/dx(y²) = 2y · dy/dx (not just 2y).
Example: d/dx(y³) = 3y² · dy/dx.
Example: d/dx(xy) requires the product rule: 1·y + x·dy/dx = y + x·dy/dx.

📐 Applications to algebraic and elliptic curves

📐 Algebraic curves

Algebraic curve: the set of all points (x, y) satisfying a polynomial equation in x and y, such as x² - 3xy⁴ + 1 = x⁵ - y².

These curves often cannot be expressed as y = f(x).
Implicit differentiation allows us to find slopes and tangent lines anyway.

📐 Elliptic curves

Elliptic curve: a special algebraic curve where the polynomial has the form x³ + ax + b = y², such as x³ + 3x + 2 = y².

Elliptic curves have special properties used in cryptography.
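The curve from Example 3.26 (x³ + 3x + 2 = y²) is an elliptic curve; because x³ + 3x + 2 has only one real root, its graph is a single unbounded branch, symmetric about the x-axis, rather than a closed oval.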

📐 Example: x + y = x³ + y³

This equation defines an algebraic curve consisting of an ellipse with a line through it.
Implicit differentiation:
- d/dx(x + y) = d/dx(x³ + y³)
- 1 + dy/dx = 3x² + 3y² · dy/dx
- Solve: dy/dx = (3x² - 1)/(1 - 3y²)
Special feature: the line y = -x is part of the curve (verify by substituting y = -x into the equation, which gives 0 = 0).
Ambiguity at intersections: where the line intersects the ellipse, is dy/dx the slope of the line or the slope of the ellipse? (This question is left for exercises.)

🧮 Finding tangent lines

🧮 The unit circle example

Problem: Find the tangent line to x² + y² = 1 at the point (4/5, 3/5).

Solution:

Differentiate implicitly:
- d/dx(x² + y²) = d/dx(1)
- 2x + 2y · dy/dx = 0
- dy/dx = -x/y
Evaluate at the point:
- At (4/5, 3/5): dy/dx = -(4/5)/(3/5) = -4/3
- This is the slope m of the tangent line.
Write the tangent line equation:
- Using point-slope form: y - 3/5 = -(4/3)(x - 4/5)

🧮 Geometric interpretation

The unit circle x² + y² = 1 is not a function (it fails the vertical line test).
Yet implicit differentiation gives dy/dx = -x/y, which works at any point on the circle (except where y = 0).
At (1, 0): dy/dx is undefined (division by zero), which makes geometric sense—the tangent line at (1, 0) is vertical.
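
The tangent-line computation can be reproduced with exact fractions. This is a small sketch; the helper name `circle_slope` is an illustrative choice, and the slope-intercept form is just one way to present the result.

```python
from fractions import Fraction

def circle_slope(x, y):
    """dy/dx = -x/y on x^2 + y^2 = 1 (undefined where y == 0)."""
    if y == 0:
        raise ZeroDivisionError("vertical tangent: dy/dx is undefined where y = 0")
    return -x / y

x0, y0 = Fraction(4, 5), Fraction(3, 5)
assert x0**2 + y0**2 == 1          # the point lies on the unit circle
m = circle_slope(x0, y0)           # slope of the tangent line
b = y0 - m * x0                    # intercept, from y - y0 = m (x - x0)
print(f"slope = {m}, tangent line: y = ({m}) x + {b}")
```

Calling `circle_slope(1, 0)` raises an error, mirroring the vertical tangent at (1, 0).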

Related Rates

3.5 Related Rates

🧭 Overview

🧠 One-sentence thesis

Related rates problems use differentiation of equations relating multiple quantities to find how one rate of change determines another, typically with respect to time.

📌 Key points (3–5)

Core technique: differentiate both sides of an equation relating several quantities with respect to a variable (usually time t) to connect their rates of change.
Goal: use known rates of change to determine an unknown related rate.
Common confusion: distinguish between the value of a quantity at a specific moment and its rate of change (derivative); the rate may be constant while the quantity itself varies.
Evaluation timing: often you must find the derivative expression first, then substitute specific values of the quantities to find the rate at that instant.
Chain Rule is essential: when differentiating quantities that depend on time, apply the Chain Rule to connect derivatives.

🔧 The Related Rates Method

🔧 What the technique does

Related rates: a method where differentiating an equation relating several quantities with respect to a variable produces a relation between the rates of change of those quantities.

You start with an equation connecting multiple quantities (e.g., volume, height, radius).
All these quantities are functions of time t.
Differentiate both sides with respect to t to get an equation involving the derivatives (rates).
Substitute known rates and values to solve for the unknown rate.

🧮 Why differentiation with respect to time works

The excerpt emphasizes that quantities like volume V, height h, radius r, angle θ, and position x are all functions of time t.
Even if the original equation does not explicitly show t, implicit differentiation treats each variable as t-dependent.
Example: if V = 30000 h, then differentiating gives dV/dt = 30000 dh/dt, linking the rate of volume change to the rate of height change.

⚙️ The role of the Chain Rule

When differentiating composite functions (e.g., x = 100 cot θ where θ depends on t), the Chain Rule is required.
Example from the excerpt: dx/dt = −100 csc² θ · dθ/dt.
The Chain Rule ensures that the rate of change of one quantity is correctly expressed in terms of the rate of change of another.

📐 Working Through Examples

📐 Water filling a rectangular pool (Example 3.29)

Setup:

Pool dimensions: 300 ft long, 100 ft wide, 10 ft deep.
Volume V = (300)(100)h = 30000 h cubic feet, where h is water height.
Given: dV/dt = 60,000 ft³/min (water pumped in).
Goal: find dh/dt (how fast the height is changing).

Solution steps:

Differentiate V = 30000 h with respect to t: dV/dt = 30000 dh/dt.
Substitute the known rate: 60,000 = 30000 dh/dt.
Solve: dh/dt = 60,000 / 30,000 = 2 ft/min.

Key insight: The rate of height change is constant because the pool has constant cross-sectional area.

📐 Shadow of a pole (Example 3.30)

Setup:

A 100 ft pole perpendicular to the ground.
The sun's angle of elevation θ, measured at the tip of the shadow, is decreasing at dθ/dt = −0.05 rad/min (negative because decreasing).
Shadow length x = 100 cot θ.
Goal: find dx/dt when θ = π/6.

Solution steps:

Differentiate x = 100 cot θ with respect to t: dx/dt = −100 csc² θ · dθ/dt.
Substitute dθ/dt = −0.05: dx/dt = −100 csc² θ · (−0.05) = 5 csc² θ.
Evaluate at θ = π/6: dx/dt = 5 csc²(π/6) = 5 · (2)² = 20 ft/min.

Key insight: First find the general derivative formula, then substitute the specific angle value; the notation dx/dt evaluated at θ = π/6 means "the rate at that instant."
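
To see the "differentiate first, substitute second" order in action, the sketch below compares the formula dx/dt = −100 csc² θ · dθ/dt with a finite-difference estimate. It assumes a linear angle model θ(t) = π/6 − 0.05t near the instant of interest; that model and the step `dt` are assumptions for the check, not part of the example.

```python
import math

def shadow_length(theta):
    """Shadow length x = 100 cot(theta) for a 100 ft pole."""
    return 100 / math.tan(theta)

def theta_at(t):
    """Assumed angle model: theta = pi/6 at t = 0, decreasing at 0.05 rad/min."""
    return math.pi / 6 - 0.05 * t

def dxdt_formula(theta, dtheta_dt=-0.05):
    """dx/dt = -100 csc^2(theta) * dtheta/dt."""
    return -100 / math.sin(theta) ** 2 * dtheta_dt

dt = 1e-6   # assumed small time step, in minutes
numeric = (shadow_length(theta_at(dt)) - shadow_length(theta_at(-dt))) / (2 * dt)
print(dxdt_formula(math.pi / 6), numeric)
```

Both numbers come out close to 20 ft/min, the value found above.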

📐 Cylinder with changing dimensions (Example 3.31)

Setup:

Right circular cylinder with volume V = π r² h.
Radius decreasing: dr/dt = −3 cm/min.
Height increasing: dh/dt = 2 cm/min.
Goal: find dV/dt when r = 8 cm and h = 6 cm.

Solution steps:

Differentiate V = π r² h using the Product Rule: dV/dt = (2π r · dr/dt)h + π r² · dh/dt.
Substitute known rates and values: dV/dt = 2π(8)(−3)(6) + π(8²)(2).
Simplify: dV/dt = −288π + 128π = −160π cm³/min.

Key insight: The Product Rule is needed because both r and h change with time; the negative result means the volume is decreasing (the loss from the shrinking radius, −288π, outweighs the gain from the growing height, +128π).
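
The arithmetic in this example can be checked with a short sketch. The finite-difference comparison assumes r and h vary linearly in t near the instant in question, which is an assumption made only for the check.

```python
import math

def volume(r, h):
    """Volume of a right circular cylinder."""
    return math.pi * r**2 * h

def dVdt(r, h, dr_dt, dh_dt):
    """Product Rule result: dV/dt = 2*pi*r*h*dr/dt + pi*r^2*dh/dt."""
    return 2 * math.pi * r * h * dr_dt + math.pi * r**2 * dh_dt

rate = dVdt(8, 6, -3, 2)

# Central difference around the instant r = 8, h = 6 (assumed linear motion).
dt = 1e-6
numeric = (volume(8 - 3 * dt, 6 + 2 * dt) - volume(8 + 3 * dt, 6 - 2 * dt)) / (2 * dt)
print(rate, -160 * math.pi, numeric)
```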

🧩 Common Patterns and Confusions

🧩 Distinguishing value from rate

Value: the quantity itself at a specific time (e.g., r = 8 cm, θ = π/6).
Rate: the derivative, how fast the quantity is changing (e.g., dr/dt = −3 cm/min).
Don't confuse: the rate can stay constant while the value keeps changing (e.g., dr/dt = −3 cm/min at every instant, but r = 8 cm only at one instant).

🧩 When to substitute values

Step	What to do	Example
1. Relate quantities	Write the equation connecting the variables	V = π r² h
2. Differentiate	Apply differentiation (Chain Rule, Product Rule) to get rates	dV/dt = (2π r · dr/dt)h + π r² · dh/dt
3. Substitute	Plug in known rates and specific values of quantities	Use r = 8, h = 6, dr/dt = −3, dh/dt = 2
4. Solve	Calculate the unknown rate	dV/dt = −160π

The excerpt shows that substitution happens after differentiation, not before.
Example: in Example 3.30, the derivative dx/dt = 5 csc² θ is found first, then θ = π/6 is substituted.

🧩 Signs of rates

A decreasing quantity has a negative rate: dθ/dt = −0.05 means θ is getting smaller.
An increasing quantity has a positive rate: dh/dt = 2 means h is getting larger.
The final answer's sign tells you whether the unknown quantity is increasing (positive) or decreasing (negative).
Example: dV/dt = −160π means the cylinder's volume is shrinking.

📝 Exercise Themes

📝 Geometric scenarios

The exercises involve:

Expanding circles: ripples in water, relating radius and area rates.
Spheres: radius changing, find volume and surface area rates.
Triangles and ladders: right-triangle setups with moving objects (kite, ladder sliding down a wall).
Shadows: person walking away from a light, shadow length changing.
Cones and cylinders: changing radius and height, find volume rate.

📝 Curves and motion

Exercise 6 asks: for an object moving along y = x³, at what points are dx/dt and dy/dt equal?
This requires differentiating y = x³ to get dy/dt = 3x² · dx/dt, then setting dy/dt = dx/dt and solving for x.

📝 Key skills tested

Setting up the geometric or physical relationship (e.g., Pythagorean theorem for ladder problems, volume formulas).
Differentiating with respect to time using Chain Rule and Product Rule.
Substituting given rates and values at the correct step.
Interpreting the sign and units of the result.

Differentials

3.6 Differentials

🧭 Overview

🧠 One-sentence thesis

Differentials provide a flexible way to express physical laws and mathematical relationships by treating infinitesimal changes as algebraic quantities rather than always computing rates of change with respect to a single variable.

📌 Key points (3–5)

What differentials are: infinitesimal changes in quantities, written as df = f′(x) dx, identical to the infinitesimal notation used earlier in the text.
How they differ from derivatives: differentials relate infinitesimal changes themselves (like dP/P + dV/V = dT/T), not rates of change with respect to a specific variable.
Why they're useful: they allow flexible manipulation without committing to a single independent variable, making them natural for stating physical laws.
Common confusion: differentials (infinitesimals like dx) are NOT the same as small real changes (Δx); infinitesimals cannot be assigned real values.
All derivative rules apply: sum, product, quotient, chain, and power rules all have differential versions obtained by multiplying both sides by dx.

📐 Definition and basic properties

📐 What a differential is

For a differentiable function f(x), the differential of f(x) is df = f′(x) dx where dx is an infinitesimal change in x.

This is identical to the infinitesimal notation from earlier sections (Equation 1.9).
Many modern textbooks call infinitesimals "differentials" for compatibility.
Example: For f(x) = x³, the differential is df = 3x² dx, often written as d(x³) = 3x² dx in science texts.

🧮 Differential versions of derivative rules

All standard derivative rules translate directly to differentials by multiplying both sides by dx:

Rule	Differential form
Constant	d(c) = 0
Constant Multiple	d(cf) = c df
Sum	d(f + g) = df + dg
Difference	d(f − g) = df − dg
Product	d(fg) = f dg + g df
Quotient	d(f/g) = (g df − f dg)/g²
Power	d(fⁿ) = n fⁿ⁻¹ df
Chain	d(f(g)) = (df/dg) dg

Proof method: multiply the standard derivative rule by dx on both sides, then cancel dx terms.
Example: Product Rule becomes d(fg)/dx = f(dg/dx) + g(df/dx), multiply by dx, cancel to get d(fg) = f dg + g df.

🔬 Physical applications

🔬 Ideal gas law example

The ideal gas law PV = RT (where R is constant, P is pressure, V is volume per mole, T is temperature) can be expressed in differential form:

dP/P + dV/V = dT/T

Notice: each term is a relative infinitesimal change (infinitesimal divided by the quantity itself, not by time or another variable).
Proof: Take d(PV) = d(RT), apply product rule to get V dP + P dV = R dT, use R = PV/T, divide by PV.
Alternative proof: Use logarithmic differentiation—take ln of both sides, then differential: ln P + ln V = ln R + ln T, so dP/P + dV/V = 0 + dT/T.

🚀 Rocket equation example

For a rocket with total mass M (rocket plus unburnt fuel) at time t, burning fuel mass dm over infinitesimal time dt, with exhaust velocity vₑ relative to the rocket:

vₑ dm = M dv

Uses conservation of momentum over the interval dt.
Key insight: the product (dm)(dv) = 0 because it equals m′(t)v′(t)(dt)² = 0 (infinitesimal squared).
Dividing by dt gives M(dv/dt) = (dm/dt)vₑ, or Ma = (dm/dt)vₑ, the classic rocket acceleration equation.

🔍 Useful differential identity

For natural logarithm: d(ln u) = du/u

Obtained from chain rule: d(ln u) = (d(ln u)/du) du = (1/u) du.
This is frequently used in logarithmic differentiation techniques.

🎯 Why differentials matter

🎯 Flexibility advantage

Differentials don't force you to choose a single independent variable:

Example: For cylinder volume V = πr²h, the differential form is dV = 2πrh dr + πr² dh.
You can divide by any differential (dt, dr, dh, etc.), not just dt as in related rates problems.
This provides more flexibility than the derivative form dV/dt = (2πr · dr/dt)h + πr² · dh/dt, which commits to time as the independent variable.

🎯 Natural for physical laws

Many physical laws are naturally stated as relationships between infinitesimal changes rather than rates:

The ideal gas differential relation dP/P + dV/V = dT/T expresses how relative changes relate.
This form doesn't privilege any variable as "independent."
It's easier to manipulate algebraically for different purposes.

🔺 Geometric interpretation

🔺 Circle area example

The derivative of circle area A = πr² with respect to radius r equals the circumference 2πr, i.e., dA = 2πr dr.

Geometric explanation:

Increase radius by infinitesimal dr, creating a thin ring of area dA.
Slice the ring and roll it flat into a trapezoid.
Height: dr; inner edge: 2πr; outer edge: 2π(r + dr).
The triangular edges have area ½π(dr)² = 0 (infinitesimal squared).
Remaining rectangular portion: area = 2πr · dr, confirming dA = 2πr dr.

This is not a coincidence—it follows from the geometry of infinitesimal changes.

🔺 Why squares don't work the same way

For a square with side length x, area is x², and d(x²) = 2x dx.

But the perimeter is 4x, not 2x.
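The mismatch comes from the choice of variable, not from the corners: increasing the side length by dx adds strips along only two sides of the square, so only half the perimeter appears.
Measured from the center instead, with half-side s = x/2, the area is 4s² and d(4s²) = 8s ds, where 8s = 4x is the full perimeter; this matches the circle, whose radius is also measured from the center.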

⚠️ Important distinctions

⚠️ Differentials vs. small real changes

Critical confusion to avoid:

Differential dx: an infinitesimal, NOT a real number, cannot be assigned any real value no matter how small.
Small change Δx: a small but real value that CAN be assigned actual numbers.
Modern textbooks often conflate these, causing student confusion.
They are fundamentally different concepts and should not be used interchangeably.

⚠️ Linear approximation exercises

The approximation f(x) ≈ f(x₀) + f′(x₀)(x − x₀) when x − x₀ is "small":

Example: √63 ≈ 7.9375 using f(x) = √x, x₀ = 64, Δx = −1.
These exercises have nothing to do with differentials (they use real Δx, not infinitesimal dx).
The excerpt notes these are remnants of the pre-computing era, with dubious modern value.
Don't confuse approximation using a small real Δx with differential calculus using an infinitesimal dx.
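
For reference, the √63 estimate can be reproduced with real numbers only, which underlines that it involves a real Δx = −1 and no infinitesimals. The helper name `linear_approx` is an illustrative choice.

```python
import math

def linear_approx(f, fprime, x0, x):
    """Tangent-line estimate f(x0) + f'(x0) * (x - x0)."""
    return f(x0) + fprime(x0) * (x - x0)

# f(x) = sqrt(x), f'(x) = 1 / (2 sqrt(x)), x0 = 64, x = 63.
estimate = linear_approx(math.sqrt, lambda x: 1 / (2 * math.sqrt(x)), 64, 63)
print(estimate, math.sqrt(63), estimate - math.sqrt(63))
```

The last printed number is the approximation error, which is small but not zero.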

Optimization

4.1 Optimization

🧭 Overview

🧠 One-sentence thesis

Optimization problems find maximum or minimum values of a function by using derivatives to locate critical points, then applying tests to determine whether these points yield global or local extrema.

📌 Key points (3–5)

What optimization solves: finding the largest or smallest value of a quantity (objective function) subject to constraints.
Critical points are key: points where the derivative equals zero are candidates for maxima or minima.
Second Derivative Test: if f′′(c) > 0 at a critical point, it's a local minimum; if f′′(c) < 0, it's a local maximum; if f′′(c) = 0, the test fails.
Common confusion: global vs. local extrema—a global maximum is the largest value everywhere, while a local maximum is only largest in a neighborhood.
Endpoint checking matters: on closed intervals, always compare critical point values with endpoint values to find the true global extremum.

🎯 Types of extrema

🌍 Global extrema

Global maximum at x = c: f(c) ≥ f(x) for all x in the domain of f.

Global minimum at x = c: f(c) ≤ f(x) for all x in the domain of f.

These represent the absolute largest or smallest values over the entire domain.
Physical applications typically seek global extrema, not just local ones.
The Extreme Value Theorem guarantees at least one global maximum and one global minimum for continuous functions on closed intervals.

🏘️ Local extrema

Local maximum at x = c: f(c) ≥ f(x) for all x near c (within some distance δ > 0).

Local minimum at x = c: f(c) ≤ f(x) for all x near c (within some distance δ > 0).

These are "turning points" where the function changes from increasing to decreasing (local max) or vice versa (local min).
Every global extremum is also a local extremum, but not vice versa.
Example: A function on [a, b] might have a global maximum at x = c₁ and a local maximum at x = b, where the local max is smaller than the global max.

🔍 Finding critical points

📍 What are critical points

Critical points (or stationary points): points where the derivative equals zero, f′(c) = 0.

At an internal maximum or minimum, the function must stop increasing or decreasing momentarily.
This means the derivative changes from positive to zero to negative (at a max) or from negative to zero to positive (at a min).
For continuous derivatives, this transition requires f′ = 0 at the turning point.

🧪 Second Derivative Test

The test uses the second derivative to classify critical points:

Condition	Conclusion	Intuition
f′′(c) > 0	Local minimum	Function is concave up (smiling ☺)
f′′(c) < 0	Local maximum	Function is concave down (frowning ☹)
f′′(c) = 0	Test fails	Need other methods

Why it works: If f′′ > 0, then f′ is increasing around c, so f′ goes from negative through zero to positive → minimum.
When it fails: Consider f(x) = x³ at x = 0: f′(0) = 0 and f′′(0) = 0, but x = 0 is neither a max nor min.
Visual mnemonic: The sign of f′′ at a critical point resembles the "eyes" in a face, while the curve shape shows the "mouth."

🛠️ Solution procedure

📋 Standard optimization steps

Identify the objective function: the quantity to maximize or minimize.
Use constraints to eliminate variables: if there's a constraint relating two variables, solve for one in terms of the other.
Substitute to get a single-variable function: rewrite the objective function using only one independent variable.
Find critical points: solve f′(x) = 0.
Apply the Second Derivative Test: determine whether critical points are local maxima or minima.
Check endpoints if applicable: compare values at critical points and endpoints.

🔀 Two main cases

Case 1: Closed interval [a, b]

The global extremum occurs either at an interior critical point or at an endpoint.
Must evaluate f at all critical points and both endpoints, then compare values.
Example: Finding the closest point on a curve to a given point often involves a closed interval.
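
A minimal sketch of the closed-interval comparison in Case 1. The function f(x) = x³ − 3x on [−3, 3] is a hypothetical example chosen here (it is not from the text), and its critical points x = ±1 are supplied by hand from f′(x) = 3x² − 3 = 0.

```python
def global_extrema_on_closed_interval(f, critical_points, a, b):
    """Compare f at the endpoints and at interior critical points of [a, b]."""
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    x_min = min(candidates, key=f)
    x_max = max(candidates, key=f)
    return (x_min, f(x_min)), (x_max, f(x_max))

def f(x):
    """Hypothetical objective function f(x) = x^3 - 3x."""
    return x**3 - 3 * x

(lo_x, lo_val), (hi_x, hi_val) = global_extrema_on_closed_interval(f, [-1, 1], -3, 3)
print(lo_x, lo_val, hi_x, hi_val)
```

In this example both global extrema sit at the endpoints, even though the function has a local maximum at x = −1 and a local minimum at x = 1.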

Case 2: Open interval with one critical point

If the only critical point is a local maximum, it must be the global maximum.
If the only critical point is a local minimum, it must be the global minimum.
No need to check endpoints since they don't exist or aren't included.
Example: Maximizing area of a rectangle with fixed perimeter—dimensions must be positive, giving an open interval.

⚠️ Don't confuse

Local vs. global: A function can have several local maxima with different values, but only one global maximum value (which may be attained at more than one point).
Critical points vs. extrema: Not every critical point is a maximum or minimum (e.g., f(x) = x³ at x = 0).
Mathematical vs. practical optimality: The mathematical optimum might not be practical due to other real-world constraints (like the soda can example where packing and handling requirements override material cost minimization).

📐 Applied examples patterns

🔲 Geometric optimization

Rectangle with fixed perimeter

Constraint: L = 2x + 2y (perimeter is constant)
Objective: A = xy (maximize area)
Result: The optimal rectangle is a square (x = y = L/4)

Cylinder with fixed volume

Constraint: V = πr²h (volume is constant)
Objective: S = 2πr² + 2πrh (minimize surface area)
Result: Optimal when height equals diameter (h = 2r)
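
The h = 2r result can be checked numerically. In this sketch the constraint is used to eliminate h, giving S(r) = 2πr² + 2V/r, and the critical radius comes from S′(r) = 4πr − 2V/r² = 0. The volume value is an arbitrary assumption for illustration.

```python
import math

def surface_area(r, V):
    """S = 2*pi*r^2 + 2*pi*r*h with h = V / (pi*r^2) substituted in."""
    return 2 * math.pi * r**2 + 2 * V / r

V = 355.0                                    # assumed volume (illustrative value)
r_opt = (V / (2 * math.pi)) ** (1 / 3)       # solves 4*pi*r - 2V/r^2 = 0
h_opt = V / (math.pi * r_opt**2)             # height recovered from the constraint
print(r_opt, h_opt, h_opt / r_opt)

# Nearby radii should give a larger surface area than r_opt.
print(surface_area(r_opt, V) < min(surface_area(0.9 * r_opt, V),
                                   surface_area(1.1 * r_opt, V)))
```

The ratio h/r prints as 2 (up to rounding) regardless of the volume chosen.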

🎯 Physics applications

Projectile motion

Objective: Maximize horizontal distance L
Result: Launch angle θ = π/4 (45°) maximizes range
Note: Once the formula L = (v₀² sin 2θ)/g is found, the maximum occurs when sin 2θ = 1
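
A brute-force scan over launch angles illustrates the projectile result. The initial speed `v0` and the value of `g` are assumed sample values, and the 1-degree grid is an arbitrary choice.

```python
import math

def projectile_range(theta, v0=20.0, g=9.8):
    """L = v0^2 * sin(2*theta) / g; v0 and g are assumed sample values."""
    return v0**2 * math.sin(2 * theta) / g

# Scan launch angles from 1 to 89 degrees in 1-degree steps.
angles = [math.radians(d) for d in range(1, 90)]
best = max(angles, key=projectile_range)
print(math.degrees(best))
```

Changing `v0` or `g` rescales every range by the same factor, so the best angle stays the same.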

Fermat's Principle (light reflection)

Principle: Light travels the path of least time
Result: Angle of incidence equals angle of reflection (θ₁ = θ₂)
Method: Minimize total distance traveled, which is equivalent to minimizing time since speed is constant

🚣 Mixed-rate problems

Boat and running problem

Setup: Travel part of the way by one method (rowing) and part by another (running), each with different speeds
Key: Total time T = (distance₁/speed₁) + (distance₂/speed₂)
Important lesson: Always check endpoints—sometimes the optimal solution is at a boundary (e.g., rowing all the way or running all the way)

📏 Distance minimization

Closest point on a curve

Tip: Minimize D² instead of D to avoid messy square root derivatives
Both give the same optimal point since D is minimized exactly when D² is minimized
Must still check endpoints if the domain is a closed interval

⚡ Key insights

💡 Practical considerations

Endpoint checking is crucial: Even when a critical point exists, the global extremum might occur at a boundary.
Small differences matter: In the boat problem, the difference between optimal and suboptimal solutions was only 5.6 minutes—but retrieving the boat later could take 28 minutes, making the "optimal" solution impractical.
Real constraints override mathematical optima: Soda cans aren't designed to minimize aluminum alone; handling, packing, and ergonomics also matter.

🎓 Problem-solving wisdom

When the objective function simplifies to a form like f(x) = constant × g(x) with a positive constant, maximize g(x) instead.
If checking the Second Derivative Test is complicated, just compare function values at all candidate points.
Degenerate cases (like zero width or height) sometimes need consideration as limiting cases.

Curve Sketching

4.2 Curve Sketching

🧭 Overview

🧠 One-sentence thesis

The shape of a function's graph is determined by analyzing its first and second derivatives to identify where it increases or decreases, where it curves upward or downward, and where these behaviors change.

📌 Key points (3–5)

Concavity reveals how a function curves: second derivative positive → concave up (curves like a cup); second derivative negative → concave down (curves like a cap).
Inflection points mark concavity changes: the second derivative must change sign around the point, not just equal zero at that point.
Multiple tests for extrema: the Second Derivative Test uses the second derivative at critical points; the First Derivative Test tracks sign changes of the first derivative; the Nth Derivative Test handles cases where lower derivatives vanish.
Common confusion: a point where the second derivative equals zero is not automatically an inflection point—you must verify that the second derivative changes sign around it (e.g., x⁴ at x = 0 is not an inflection point even though the second derivative is zero there).
Systematic sketching combines all information: critical points, concavity, inflection points, increasing/decreasing intervals, and asymptotes together reveal the function's complete behavior.

📐 Concavity and the second derivative

📐 What concavity means

Concavity: the manner in which a function increases or decreases, determined by the sign of the second derivative.

A function can increase (first derivative positive) in different ways depending on whether the first derivative itself is increasing or decreasing.
Concave up (second derivative > 0): the first derivative is increasing; the graph curves upward like a cup; the function lies below the line joining any two points on the interval.
Concave down (second derivative < 0): the first derivative is decreasing; the graph curves downward like a cap; the function lies above the line joining any two points on the interval.
These definitions apply whether the function itself is increasing or decreasing.

🔍 The Concavity Theorem

Concavity Theorem: For a twice-differentiable function on an interval [a, b]:
(a) If the second derivative is positive on (a, b), then the function lies below the line joining the endpoints.
(b) If the second derivative is negative on (a, b), then the function lies above the line joining the endpoints.

Why this matters:

The theorem formalizes the geometric meaning of concavity.
The proof uses the Extreme Value Theorem and Second Derivative Test to show that the difference between the function and the connecting line cannot have an interior maximum (for case a) or minimum (for case b).

🔄 Inflection points

Inflection point: a point x = c where the concavity of the function changes (from concave up to concave down, or vice versa).

Critical requirement:

The second derivative must change sign around x = c, not merely equal zero at c.
Example: f(x) = x³ has an inflection point at x = 0 because f″(x) = 6x changes from negative (x < 0) to positive (x > 0).
Don't confuse: f(x) = x⁴ has f″(0) = 0, but x = 0 is not an inflection point because f″(x) = 12x² ≥ 0 everywhere (always concave up, no sign change).
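
The sign-change requirement can be illustrated with a small check. Sampling f″ at a single offset on each side of c is only a heuristic, and the offset `h` is an assumed value; it is enough for these two examples.

```python
def changes_sign(g, c, h=1e-3):
    """True if g has opposite signs at c - h and c + h (h is an assumed small offset)."""
    return g(c - h) * g(c + h) < 0

def f2_cubic(x):
    """Second derivative of f(x) = x^3."""
    return 6 * x

def f2_quartic(x):
    """Second derivative of f(x) = x^4."""
    return 12 * x**2

print(changes_sign(f2_cubic, 0))    # sign change: inflection point at x = 0
print(changes_sign(f2_quartic, 0))  # no sign change: not an inflection point
```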

🧪 Tests for local extrema

🧪 Second Derivative Test (standard)

For a critical point x = c where the first derivative equals zero:

If f″(c) > 0 → local minimum at x = c
If f″(c) < 0 → local maximum at x = c
If f″(c) = 0 → test fails; use another method

Limitation: When the second derivative is zero at the critical point, this test gives no information.

🔁 First Derivative Test (alternative)

For a continuous function with a critical point x = c (where f′(c) = 0 or f′(c) does not exist):

If f′(x) changes from negative to positive around x = c → local minimum
If f′(x) changes from positive to negative around x = c → local maximum

Why it works: A function decreases then increases around a minimum; increases then decreases around a maximum.

Example: For f(x) = x^(2/3), the derivative f′(x) = (2/3) · x^(−1/3) is undefined at x = 0, but f′(x) < 0 for x < 0 and f′(x) > 0 for x > 0, so x = 0 is a local minimum by the First Derivative Test.

🔢 Nth Derivative Test (comprehensive)

For a function with continuous derivatives up to order n at x = c:

Conditions: f′(c) = f″(c) = … = f^(n−1)(c) = 0 and f^(n)(c) ≠ 0 (the nth derivative is the first nonzero derivative).

Conclusions:

If n is even and f^(n)(c) > 0 → local minimum
If n is even and f^(n)(c) < 0 → local maximum
If n is odd → inflection point

Example: For f(x) = x⁴ at x = 0, the first three derivatives are zero but f^(4)(0) = 24 > 0. Since n = 4 is even and f^(4)(0) > 0, x = 0 is a local minimum.

Trade-off: This test is complete but can require computing many derivatives for complicated functions.
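
For polynomials the test is easy to automate, because derivatives can be computed exactly from the coefficients. The sketch below is one possible implementation under that restriction; the coefficient-list representation and all function names are choices made here.

```python
def poly_derivative(coeffs):
    """Differentiate a polynomial given as [a0, a1, a2, ...] (a_k multiplies x^k)."""
    return [k * a for k, a in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def nth_derivative_test(coeffs, c):
    """Classify x = c for a polynomial via its first nonzero derivative at c."""
    d = poly_derivative(coeffs)
    if poly_eval(d, c) != 0:
        return "not a critical point"
    n = 1
    while d:
        value = poly_eval(d, c)
        if value != 0:
            # n is the order of the first nonzero derivative at c.
            if n % 2 == 1:
                return "inflection point"
            return "local minimum" if value > 0 else "local maximum"
        d = poly_derivative(d)
        n += 1
    return "constant function"

print(nth_derivative_test([0, 0, 0, 0, 1], 0))   # x^4
print(nth_derivative_test([0, 0, 0, 1], 0))      # x^3
```

With integer coefficients and an integer point c the comparisons with zero are exact; with floating-point inputs a tolerance would be needed.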

🎨 Systematic curve sketching

🎨 Step-by-step procedure

To sketch a function's graph, find:

Critical points: where f′(x) = 0 or f′(x) does not exist
Local extrema: use Second Derivative Test, First Derivative Test, or Nth Derivative Test
Inflection points: where f″(x) changes sign
Increasing/decreasing intervals: based on the sign of f′(x)
Concavity intervals: based on the sign of f″(x)
Asymptotes: horizontal (limits as x → ±∞) and vertical (where the function is undefined and grows without bound, e.g., at zeros of a denominator)

📊 Example: polynomial function

For f(x) = x³ − 6x² + 9x + 1:

Feature	Calculation	Result
Critical points	f′(x) = 3(x − 1)(x − 3) = 0	x = 1, x = 3
Second derivative	f″(x) = 6x − 12	f″(1) = −6, f″(3) = 6
Extrema	Second Derivative Test	Local max at x = 1; local min at x = 3
Inflection point	f″(x) = 0 and changes sign	x = 2
Concavity	Sign of f″(x)	Concave down for x < 2; concave up for x > 2
Increasing/decreasing	Sign of f′(x)	Increasing x < 1 and x > 3; decreasing 1 < x < 3
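
The entries in the table can be verified directly. This short sketch hard-codes f, f′, and f″ for the example; the sample points 1.9 and 2.1 used to show the sign change of f″ are arbitrary choices.

```python
def f(x):
    return x**3 - 6 * x**2 + 9 * x + 1

def fp(x):
    """First derivative f'(x) = 3x^2 - 12x + 9."""
    return 3 * x**2 - 12 * x + 9

def fpp(x):
    """Second derivative f''(x) = 6x - 12."""
    return 6 * x - 12

for c in (1, 3):
    kind = "local max" if fpp(c) < 0 else "local min"
    print(f"x = {c}: f'(c) = {fp(c)}, f''(c) = {fpp(c)}, f(c) = {f(c)} -> {kind}")

# f'' vanishes at x = 2 and changes sign there, so x = 2 is an inflection point.
print(fpp(1.9), fpp(2), fpp(2.1))
```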

📈 Example: rational function

For f(x) = −x / (1 + x²):

Critical points at x = ±1
Local minimum at x = 1 (f″(1) = 1/2 > 0); local maximum at x = −1 (f″(−1) = −1/2 < 0)
Inflection points at x = 0, ±√3 (where f″(x) changes sign)
Horizontal asymptote y = 0 (since the limit as x → ±∞ is zero)
No vertical asymptotes (denominator never zero)

🔬 Applications in science and engineering

🔬 Combining constants and variables

Strategy: When a function contains multiple named constants and variables, combine them into a single dimensionless variable to simplify analysis.

Technique:

Identify a natural scaling constant (e.g., the Bohr radius a₀)
Define a new variable as a ratio (e.g., x = r/a₀)
Multiply stray constants to one side so the other side is dimensionless
Sketch the dimensionless function; critical and inflection points remain at the same relative locations

⚛️ Example: hydrogen atom probability

The radial probability density for a hydrogen atom electron is D(r) = (4/a₀³) r² e^(−2r/a₀), where a₀ is the Bohr radius.

Simplification:

Let x = r/a₀
Then a₀ D(x) = 4x² e^(−2x) (dimensionless)
Sketch a₀ D(x) versus x

Result: The graph shows a local maximum at x = 1 (i.e., r = a₀), meaning the electron is most likely found near the Bohr radius, and probability drops dramatically beyond x = 3 (r = 3a₀).
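A rough numerical check of this result, assuming nothing beyond the dimensionless formula above: the sketch scans a grid of x values for the peak of 4x² e^(−2x) and compares the value at x = 3 with the value at the peak. The grid range [0, 5] and the spacing 0.001 are arbitrary choices, and `scaled_density` is a hypothetical name.

```python
import math


def scaled_density(x):
    """Dimensionless radial density a0*D as a function of x = r/a0."""
    return 4 * x**2 * math.exp(-2 * x)


# Scan a grid on [0, 5] for the largest value.
grid = [i / 1000 for i in range(5001)]
x_peak = max(grid, key=scaled_density)
ratio = scaled_density(3) / scaled_density(1)

print(x_peak, scaled_density(x_peak))
print(ratio)  # how far the density has dropped by x = 3, relative to the peak
```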

🌡️ Example: thermal energy and heat capacity

For a two-state particle in thermal contact with a reservoir:

Average energy U = ε e^(−ε/τ) / (1 + e^(−ε/τ))
Heat capacity C_V = k_B (ε/τ)² e^(ε/τ) / (1 + e^(ε/τ))²

Both are graphed as functions of τ/ε (temperature scaled by energy), with U/ε and C_V/k_B as dimensionless quantities. This reveals how energy and heat capacity behave across different temperature regimes without being obscured by the specific values of ε and k_B.
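The two dimensionless functions can be written down directly, which makes it easy to tabulate or plot them. In this sketch the variable `t` stands for τ/ε, a name introduced here for illustration; the function names are likewise hypothetical.

```python
import math


def scaled_energy(t):
    """U/eps as a function of t = tau/eps."""
    return math.exp(-1 / t) / (1 + math.exp(-1 / t))


def scaled_heat_capacity(t):
    """C_V/k_B as a function of t = tau/eps."""
    return (1 / t) ** 2 * math.exp(1 / t) / (1 + math.exp(1 / t)) ** 2


for t in (0.1, 0.5, 1.0, 5.0):
    print(t, scaled_energy(t), scaled_heat_capacity(t))
```

Very small values of t make `math.exp(1 / t)` overflow, so a production version would need to rewrite the expressions for that regime.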

Numerical Approximation of Roots of Functions

4.3 Numerical Approximation of Roots of Functions

🧭 Overview

🧠 One-sentence thesis

When finding roots of functions (especially derivatives for critical points) cannot be done in closed form, numerical methods like Newton's method and the secant method provide efficient iterative approximations, with Newton's method converging faster but the secant method avoiding derivative computation.

📌 Key points (3–5)

Why numerical methods are needed: In practice, equations like f'(x) = 0 almost never have simple closed-form solutions, so iterative approximation is essential.
Newton's method: Uses tangent lines at successive points to converge rapidly (quadratic convergence) but requires computing derivatives and can fail if f'(x_n) = 0.
Secant method: Uses secant lines through two points instead of tangent lines, avoiding derivative computation, though it may need a few more iterations than Newton's method.
Common confusion: More iterations do not always mean slower execution. The secant method can be faster in practice because it avoids expensive derivative computations and can reuse function values.
Bisection method advantage: Always works (guaranteed convergence) but is much slower; modern computers often make this speed difference negligible.

🔍 Why numerical methods matter

🔍 The practical problem

Finding critical points of a function f means solving f'(x) = 0.
Examples and exercises are usually set up so solutions exist in simple closed form, but in practice this is almost never the case.
Example: For f(x) = sin(x) - x²/2, finding critical points requires solving cos(x) - x = 0, which has no closed-form solution.

Numerical methods: Algorithms for finding roots of a function (where the function equals zero) through iterative approximation.

🎯 Connection to derivatives

Finding critical points = finding roots of the derivative f'.
Finding inflection points = finding roots of the second derivative f''.
Numerical root-finding methods make it possible to sketch graphs of many more functions.

🔧 Newton's method

🔧 The geometric idea

Start with an initial guess x₀.
Go up (or down) to the curve y = f(x) and draw the tangent line at the point (x₀, f(x₀)).
Let x₁ be where that tangent line intersects the x-axis.
Repeat this procedure: use x₁ to get x₂, use x₂ to get x₃, and so on.
The sequence x₀, x₁, x₂, x₃, ... will approach the root x̄.

📐 The formula

The tangent line at (x₀, f(x₀)) has slope f'(x₀), so its equation is:

y - f(x₀) = f'(x₀)(x - x₀)

Since (x₁, 0) is on that line:

0 - f(x₀) = f'(x₀)(x₁ - x₀)
Therefore: x₁ = x₀ - f(x₀)/f'(x₀)

Newton's method algorithm: For an initial guess x₀, compute x_n iteratively as:
x_n = x_(n-1) - f(x_(n-1))/f'(x_(n-1)) for n = 1, 2, 3, ...

Each "next" number x_n depends on the previous number x_(n-1).
The algorithm terminates when f'(x_n) = 0 or when desired accuracy is reached.
If f'(x_n) = 0 for some n ≥ 0, start over with a different initial guess.
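The algorithm above translates almost line for line into Python. This is a minimal sketch, not a reference implementation: the stopping rule (two successive iterates within `tol` of each other), the default tolerance, and the iteration cap `max_iter` are assumptions added here.

```python
import math


def newton(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Return (approximate root, iterations used) via Newton's method."""
    x = x0
    for n in range(1, max_iter + 1):
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError("f'(x_n) = 0; try a different initial guess")
        x_next = x - f(x) / slope  # x_n = x_(n-1) - f(x_(n-1)) / f'(x_(n-1))
        if abs(x_next - x) <= tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


root, steps = newton(lambda x: math.cos(x) - x,
                     lambda x: -math.sin(x) - 1,
                     1.0)
print(root, steps)
```

The reported iteration count depends on the stopping rule, so it need not match counts quoted elsewhere for the same function.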

⚡ Convergence speed

Quadratic rate of convergence: In the long run, each error term (the difference between an approximate root and the actual root) is roughly proportional to the square of the previous one.
More precisely: If x_n converges to root x̄, then error terms ε_n = x_n - x̄ satisfy:
- limit as n→∞ of |ε_(n+1)|/|ε_n|² = C for some constant C.
Squaring error terms when |ε_n| < 1 makes them smaller, not larger, speeding convergence.

Example: For f(x) = cos(x) - x with initial guess x₀ = 1.0, Newton's method found the root 0.7390851332151607 after only 4 iterations.
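The quadratic behavior can be observed numerically. The sketch below runs Newton's method on f(x) = cos(x) − x from x₀ = 1.0 and prints the ratios |ε_(n+1)|/|ε_n|² for the first few steps. As an assumption, the sixth iterate is treated as the exact root; only the early ratios are computed because later errors fall below floating-point precision.

```python
import math

f = lambda x: math.cos(x) - x
f_prime = lambda x: -math.sin(x) - 1

# Generate Newton iterates x_0, ..., x_6 starting from x_0 = 1.0.
xs = [1.0]
for _ in range(6):
    xs.append(xs[-1] - f(xs[-1]) / f_prime(xs[-1]))

root = xs[-1]  # stands in for the exact root when measuring errors
errors = [abs(x - root) for x in xs[:4]]
ratios = [errors[n + 1] / errors[n] ** 2 for n in range(3)]

print(errors)
print(ratios)  # the ratios settle toward a constant
```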

⚠️ Potential pitfalls

When Newton's method fails:

Problem	Geometric reason	What happens
f'(x_n) = 0	Tangent line parallel to x-axis	Division by zero; no "next number" x_(n+1)
Poor choice of x₀	Starting far from root	Moves away from the root instead of closer
Infinite loop	Unlucky geometry	Loops endlessly between same two numbers

Don't confuse: These failures are usually fixable by choosing a different initial guess x₀.
There are conditions on f and x₀ under which Newton's method is guaranteed to converge, and to do so quickly.

🔗 Secant method

🔗 The geometric idea

Start with two initial guesses x₀ and x₁.
Go up (or down) to the curve y = f(x) and draw the secant line through points (x₀, f(x₀)) and (x₁, f(x₁)).
Let x₂ be where that secant line intersects the x-axis.
Repeat this procedure: use x₁ and x₂ to get x₃, and keep repeating.
The sequence x₀, x₁, x₂, x₃, ... will approach the root x̄ under the right conditions.

📐 The formula

The secant line through (x₀, f(x₀)) and (x₁, f(x₁)) has slope:

[f(x₁) - f(x₀)]/(x₁ - x₀)

Since (x₂, 0) is on that line:

x₂ = x₁ - (x₁ - x₀)·f(x₁)/[f(x₁) - f(x₀)]

Secant method algorithm: For two initial guesses x₀ and x₁, compute x_n iteratively as:
x_n = x_(n-1) - (x_(n-1) - x_(n-2))·f(x_(n-1))/[f(x_(n-1)) - f(x_(n-2))] for n = 2, 3, 4, ...

Each "next" number x_n depends on the previous two numbers x_(n-1) and x_(n-2).
The algorithm terminates when x_n = x_(n-1) (numbers start repeating) or when desired accuracy is reached.
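A minimal Python sketch of this algorithm follows. The tolerance, the iteration cap, and the guard against a horizontal secant line are assumptions added here. Each pass through the loop evaluates f only once, because the previous value is saved and reused.

```python
import math


def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Return (approximate root, iterations used) via the secant method."""
    f0, f1 = f(x0), f(x1)
    for n in range(1, max_iter + 1):
        if f1 == f0:
            raise ZeroDivisionError("horizontal secant line; try other guesses")
        x2 = x1 - (x1 - x0) * f1 / (f1 - f0)
        if abs(x2 - x1) <= tol:
            return x2, n
        # Shift the window: the saved f(x1) becomes f(x0) for the next step,
        # so only one new function evaluation is needed per iteration.
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    raise RuntimeError("no convergence within max_iter iterations")


root, steps = secant(lambda x: math.cos(x) - x, 0.0, 1.0)
print(root, steps)
```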

🔄 Key difference from Newton's method

The secant method does not use derivatives.
It replaces the derivative in Newton's method with the slope of a secant line, which approximates the derivative.
Recall: The slope of the tangent line is the limit of the slopes of secant lines.

Example: For f(x) = cos(x) - x with initial guesses x₀ = 0 and x₁ = 1, the secant method found the root after 6 iterations (compared to Newton's 4).

💡 Why the secant method can be preferable

Don't confuse "fewer iterations" with "faster execution":

Newton's method requires computing both f(x_(n-1)) and f'(x_(n-1)) for each iteration.
The secant method needs f(x_(n-1)) and f(x_(n-2)), but a good programmer can save the value of f(x_(n-1)) and reuse it as f(x_(n-2)) in the next iteration.
Result: The secant method avoids recomputing function values, potentially requiring fewer total computations.
Depending on the complexity of the function and its derivative, Newton's method could involve more "expensive" operations (computing values vs. assigning values).
The few extra iterations possibly required by the secant method can be made up for by fewer total computations.

📊 Comparison of methods

📊 Performance summary

For f(x) = cos(x) - x, the methods compare as follows:

Method	Iterations to find root	Starting values	Notes
Newton's method	4	x₀ = 1.0	Fastest convergence
Secant method	6	x₀ = 0.0, x₁ = 1.0	No derivative needed
Bisection method	52 (to same precision)	Interval [0, 1]	Much slower but always works
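The bisection method is not spelled out in this section, so the following is only a sketch of the standard interval-halving idea, assuming f is continuous and changes sign on the starting interval. The tolerance `tol` and the function name are choices made here; the iteration count depends on that tolerance.

```python
import math


def bisection(f, lo, hi, tol=1e-12):
    """Return (approximate root, iterations used); f(lo), f(hi) must differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    steps = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        steps += 1
        if f_mid == 0:
            return mid, steps
        if f_lo * f_mid < 0:
            hi = mid               # sign change is in the left half
        else:
            lo, f_lo = mid, f_mid  # sign change is in the right half
    return (lo + hi) / 2, steps


root, steps = bisection(lambda x: math.cos(x) - x, 0.0, 1.0)
print(root, steps)
```

Each step halves the interval, so the number of steps grows only with the logarithm of the requested precision, but it is still far more than Newton's method or the secant method needs.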

📊 Trade-offs

Newton's method:

✅ Fewest iterations (quadratic convergence)
✅ Fast when derivatives are simple
❌ Requires computing derivatives
❌ Can fail if f'(x_n) = 0
❌ May need careful choice of initial guess

Secant method:

✅ No derivatives needed
✅ Can be faster in practice for complex functions
✅ Can reuse function values
❌ Needs two initial guesses
❌ A few more iterations than Newton's

Bisection method:

✅ Always works (guaranteed convergence)
✅ Simple and robust
✅ Modern computers often make speed difference negligible
❌ Much slower convergence

🎯 Practical recommendation

The speed of modern computers makes the difference in algorithmic efficiency negligible in many cases.
The bisection method has the nice advantage of always working.
Most textbooks on numerical analysis discuss these issues in detail for choosing the right method.

The Mean Value Theorem

4.4 The Mean Value Theorem

🧭 Overview

🧠 One-sentence thesis

The Mean Value Theorem guarantees that for any differentiable function on an interval, there exists at least one point where the instantaneous rate of change (derivative) equals the average rate of change over that interval, providing a powerful tool for proving theoretical results about functions.

📌 Key points (3–5)

Core guarantee: On any interval where a function is continuous and differentiable, the derivative must equal the average rate of change at some interior point.
Geometric interpretation: There exists at least one point where the tangent line is parallel to the secant line connecting the endpoints.
Existence vs. finding: The theorem only guarantees that such a point exists; actually finding it requires solving an equation (often numerically).
Common confusion: The Mean Value Theorem is primarily a theoretical tool, not a computational one—it tells you something exists but not necessarily how to find it.
Why it matters: It enables proofs of fundamental results like "zero derivative implies constant function" and "positive derivative implies increasing function."

📐 The theorem and its geometric meaning

📐 Statement of the Mean Value Theorem

Mean Value Theorem: Let a and b be real numbers such that a < b, and suppose that f is a function such that (a) f is continuous on [a, b], and (b) f is differentiable on (a, b). Then there is at least one number c in the interval (a, b) such that f′(c) = (f(b) − f(a)) / (b − a).

The right side of the equation is the average rate of change over [a, b].
The left side is the instantaneous rate of change at c.
The theorem says these two rates must be equal somewhere inside the interval.

🎨 Geometric picture

Draw a secant line connecting the points (a, f(a)) and (b, f(b)).
The slope of this secant line is (f(b) − f(a)) / (b − a).
At some point c between a and b, the tangent line to the curve has slope f′(c).
The theorem guarantees that these two slopes are equal, meaning the tangent line is parallel to the secant line.
Example: Imagine a smooth hill from point A to point B; somewhere on that hill, the slope of the ground must equal the average slope from A to B.

🔄 Alternative form

The theorem can be rewritten using h = b − a > 0 and a parameter θ in (0, 1):

f(a + h) − f(a) = h · f′(a + θh)
This form expresses the change in f as the step size h times the derivative at some intermediate point a + θh.
The parameter θ tells you "how far" between a and a + h the special point lies (as a fraction of the interval).

🧱 Building block: Rolle's Theorem

🧱 Statement of Rolle's Theorem

Rolle's Theorem: Let a and b be real numbers such that a < b, and suppose that f is a function such that (a) f is continuous on [a, b], (b) f is differentiable on (a, b), and (c) f(a) = f(b) = 0. Then there is at least one number c in the interval (a, b) such that f′(c) = 0.

This is a special case of the Mean Value Theorem where the function starts and ends at zero.
The average rate of change is (0 − 0) / (b − a) = 0, so the theorem says the derivative must be zero somewhere inside.

🎯 Geometric interpretation of Rolle's Theorem

If a continuous, differentiable function starts and ends at the same height (zero in this case), it must have a horizontal tangent line somewhere in between.
Example: If you throw a ball straight up and catch it at the same height, at some moment (the peak) its velocity must be zero.

🔗 How Rolle's Theorem proves the Mean Value Theorem

Define a new function F(x) = f(x) − f(a) − [(f(b) − f(a)) / (b − a)] · (x − a).
This function F "tilts" the graph of f so that the secant line becomes horizontal.
F(a) = F(b) = 0, so Rolle's Theorem applies: F′(c) = 0 for some c in (a, b).
Computing F′(x) = f′(x) − (f(b) − f(a)) / (b − a), we get f′(c) = (f(b) − f(a)) / (b − a).
Don't confuse: Rolle's Theorem is not just a curiosity—it is the key step in proving the Mean Value Theorem.

🔬 Theoretical applications

🔬 Zero derivative implies constant function

If f is a differentiable function on an interval I such that f′(x) = 0 for all x in I, then f is a constant function on I.

Proof by contradiction: Assume f is not constant, so there exist a < b in I with f(a) ≠ f(b).
By the Mean Value Theorem, there exists c in (a, b) such that f′(c) = (f(b) − f(a)) / (b − a).
Since f′(c) = 0, we have (f(b) − f(a)) / (b − a) = 0, which implies f(a) = f(b).
This contradicts f(a) ≠ f(b), so f must be constant.
Why it matters: This formalizes the intuition that "zero slope everywhere means no change in height."

📈 Derivative sign determines increasing/decreasing behavior

Let f be a differentiable function on an interval I. Then: (a) If f′ > 0 on I then f is increasing on I. (b) If f′ < 0 on I then f is decreasing on I.

Proof of (a): Choose any a < b in I. By the Mean Value Theorem, there exists c in (a, b) such that f(b) − f(a) = (b − a) · f′(c).
Since b − a > 0 and f′(c) > 0, we have f(b) − f(a) > 0, so f(b) > f(a).
This holds for any a < b, so f is increasing.
The proof of (b) is similar, with f′(c) < 0 making f(b) − f(a) < 0.
Don't confuse: This is a formal proof of an intuitive fact—positive slope means going uphill.

📏 Proving inequalities

The Mean Value Theorem is useful for establishing bounds on functions.

Example: Show that sin x ≤ x for all x ≥ 0.

For x = 0, sin 0 = 0 ≤ 0 (trivially true).
For x > 0, apply the Mean Value Theorem to f(x) = sin x on [0, x]: there exists c in (0, x) such that (sin x − sin 0) / (x − 0) = cos c.
This gives sin x = x · cos c.
Since cos c ≤ 1 and x > 0, we have sin x ≤ x.
Why this matters: When 0 < x < 1, the inequality sin x ≤ x is sharper (more informative) than sin x ≤ 1.

🔭 Extensions and related results

🔭 Extended Mean Value Theorem

Let a and b be real numbers such that a < b, and suppose that f and g are functions such that (a) f and g are continuous on [a, b], (b) f and g are differentiable on (a, b), and (c) g′(x) ≠ 0 for all x in (a, b). Then there is at least one number c in the interval (a, b) such that f′(c) / g′(c) = (f(b) − f(a)) / (g(b) − g(a)).

This generalizes the Mean Value Theorem by comparing the rates of change of two functions.
The ordinary Mean Value Theorem is the special case where g(x) = x.
Proof technique: Apply Rolle's Theorem to a carefully constructed function F(x) = f(x) − f(a) − [(f(b) − f(a)) / (g(b) − g(a))] · (g(x) − g(a)).

🌊 Darboux's Theorem (intermediate value property for derivatives)

If f is a differentiable function on a closed interval [a, b] then its derivative f′ attains every value between f′(a) and f′(b).

This is similar to the Intermediate Value Theorem, but for derivatives.
Surprising fact: This holds even if f′ is not continuous.
A discontinuous derivative cannot have simple jump discontinuities—it cannot "skip over" intermediate values.
Interpretation: Even a discontinuous derivative behaves "sort of" as if it were continuous in this respect.
Don't confuse: Darboux's Theorem is not the same as the Intermediate Value Theorem—it applies specifically to derivatives, whether continuous or not.

⚙️ Practical considerations

⚙️ Existence vs. computation

Both the Mean Value Theorem and Rolle's Theorem are existence theorems: they guarantee that a certain number exists but do not tell you how to find it.
To actually find the number c, you must solve the equation f′(x) = (f(b) − f(a)) / (b − a) (or f′(x) = 0 for Rolle's Theorem).
Closed-form solutions may be impossible, so numerical root-finding methods (like Newton's method from Section 4.3) may be needed.
Why it matters: The Mean Value Theorem is more useful for theoretical purposes (proving general results) than for direct computation.
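As an illustration of solving for c numerically, the sketch below uses f(x) = x³ on [0, 2], an example chosen here and not taken from the text. It uses interval halving instead of Newton's method, and it assumes that f′(x) minus the average slope changes sign between a and b, which the theorem itself does not guarantee.

```python
import math


def mvt_point(f, f_prime, a, b, tol=1e-12):
    """Find one c in (a, b) where f'(c) equals the average rate of change."""
    slope = (f(b) - f(a)) / (b - a)
    g = lambda x: f_prime(x) - slope  # c is a root of g
    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("no sign change at the endpoints; bracket c differently")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Average slope of x^3 on [0, 2] is 4, so c solves 3c^2 = 4.
c = mvt_point(lambda x: x**3, lambda x: 3 * x**2, 0.0, 2.0)
print(c, 2 / math.sqrt(3))
```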

🧮 Example applications in exercises

The excerpt lists several types of problems:

Checking applicability: Does Rolle's Theorem apply to f(x) = 1 − |x| on [−1, 1]? (No: f is not differentiable at x = 0, so hypothesis (b) fails.)
Real-world interpretation: Two horses start and finish a race together; show they had the same speed at some moment. (This is Rolle's Theorem applied to the difference of their position functions.)
Proving inequalities: Use the Mean Value Theorem to show |sin A − sin B| ≤ |A − B| for all A and B. (Apply the theorem to f(x) = sin x on the interval between A and B.)
More advanced inequalities: Show tan x ≥ x for 0 ≤ x < π/2, or prove bounds involving exponentials, logarithms, and powers.

🔍 Why intuition still matters

The excerpt notes that "intuitive so-called 'hand-waving' explanations often yield more insight than a 'formal' proof."
Formal proofs (like the one showing positive derivative implies increasing function) confirm that intuition has a solid basis.
Don't confuse: Formal proofs are not replacements for intuition—they are confirmations that intuition is correct and can be relied upon.

The Indefinite Integral

5.1 The Indefinite Integral

🧭 Overview

🧠 One-sentence thesis

The indefinite integral reverses differentiation by finding all antiderivatives of a function, which differ only by a constant, and this process is essential for solving problems where rates of change are known but the original function must be recovered.

📌 Key points (3–5)

What an antiderivative is: a function F(x) whose derivative equals the given function f(x); that is, F'(x) = f(x).
Why there are infinitely many antiderivatives: any two antiderivatives of the same function differ only by a constant, so finding one antiderivative gives all of them by adding a generic constant C.
Notation and terminology: the indefinite integral ∫ f(x) dx represents the entire family of antiderivatives; the integral sign acts like a summation symbol for infinitesimal pieces.
Common confusion (one vs. many): there is no single "the" antiderivative; a function that has one antiderivative has a whole family of them, all differing by a constant.
Why it matters: antidifferentiation (integration) solves real-world problems such as finding position from velocity or velocity from acceleration.

🔄 From differentiation to antidifferentiation

🔄 The reverse problem

Differentiation moves from position s(t) to velocity v(t) = s'(t) to acceleration a(t) = v'(t).
The reverse process is needed when you know velocity and want position, or know acceleration and want velocity.
This reverse process is called antidifferentiation.

Antiderivative: An antiderivative F(x) of a function f(x) is a function whose derivative is f(x). In other words, F'(x) = f(x).

🧩 Why antidifferentiation is harder

Differentiation is straightforward: you have learned derivatives of many classes of functions (polynomials, trigonometric, exponential, logarithmic) and rules for combining them (sums, products, quotients).
Antidifferentiation is a different story—it is not as mechanical.

🔢 The family of antiderivatives

🔢 Multiple antiderivatives for the same function

Consider f(x) = 2x. You know that the derivative of x² is 2x, so F(x) = x² is an antiderivative.
But F(x) = x² + 1 also has derivative 2x, as does F(x) = x² + 2.
In fact, any function of the form F(x) = x² + C, where C is some constant, is an antiderivative of f(x) = 2x.

🔑 Key theorem: antiderivatives differ only by a constant

Theorem: Suppose that F(x) and G(x) are antiderivatives of a function f(x). Then F(x) and G(x) differ only by a constant. That is, F(x) = G(x) + C for some constant C.

Why this is true:

Define H(x) = F(x) − G(x) on the common domain I of F and G.
Since F'(x) = G'(x) = f(x), then H'(x) = F'(x) − G'(x) = f(x) − f(x) = 0 for all x in I.
A function with derivative zero everywhere on an interval is constant (from the Mean Value Theorem).
Therefore H(x) = C for some constant C, which means F(x) − G(x) = C, or F(x) = G(x) + C.

Practical consequence:

To find all antiderivatives of a function, it is necessary only to find one antiderivative and then add a generic constant to it.
Example: for f(x) = 2x, since F(x) = x² is one antiderivative, all antiderivatives are of the form F(x) = x² + C.

📐 Notation and meaning of the indefinite integral

📐 The indefinite integral symbol

Indefinite integral: The indefinite integral of a function f(x) is denoted by ∫ f(x) dx and represents the entire family of antiderivatives of f(x).

The large S-shaped symbol ∫ is called an integral sign.
Though ∫ f(x) dx represents all antiderivatives, the integral can be thought of as a single object or function whose derivative is f(x):
- d/dx (∫ f(x) dx) = f(x)

🧮 What the integral sign and dx mean

For an antiderivative F(x) of f(x), the infinitesimal (or differential) dF is given by dF = F'(x) dx = f(x) dx.
Therefore F(x) = ∫ f(x) dx = ∫ dF.
The integral sign acts as a summation symbol: it sums up the infinitesimal "pieces" dF of the function F(x) at each x so that they add up to the entire function F(x).
Think of it as similar to the usual summation symbol Σ used for discrete sums; the integral sign ∫ takes the sum of a continuum of infinitesimal quantities instead.

🗣️ Terminology

Finding (or evaluating) the indefinite integral of a function is called integrating the function.
Integration is antidifferentiation.

🧰 Basic integration formulas

🧰 Simple examples

∫ 0 dx = C: since the derivative of any constant function is 0.
∫ 1 dx = x + C: since the derivative of F(x) = x is F'(x) = 1.
∫ x dx = (x²)/2 + C: since the derivative of F(x) = (x²)/2 is F'(x) = x.

📏 Power Formula

Power Formula:
∫ x^n dx = (x^(n+1))/(n+1) + C if n ≠ −1
∫ x^n dx = ln|x| + C if n = −1

Why:

Since d/dx (x^(n+1)/(n+1)) = x^n for any number n ≠ −1.
Since d/dx (ln|x|) = 1/x = x^(−1).

Examples:

∫ x⁷ dx = (x⁸)/8 + C
∫ √x dx = ∫ x^(1/2) dx = (x^(3/2))/(3/2) + C = (2x^(3/2))/3 + C
∫ (1/x²) dx = ∫ x^(−2) dx = (x^(−1))/(−1) + C = −1/x + C
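One way to sanity-check the Power Formula is to differentiate the claimed antiderivative numerically and compare with x^n. The sketch below does this at the single point x = 2; the step size `h` and the function names are illustrative assumptions.

```python
import math


def power_antiderivative(n):
    """Return one antiderivative of x**n (the C = 0 member of the family)."""
    if n == -1:
        return lambda x: math.log(abs(x))
    return lambda x: x ** (n + 1) / (n + 1)


def numerical_derivative(F, x, h=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


for n in (7, 0.5, -2, -1):
    F = power_antiderivative(n)
    print(n, numerical_derivative(F, 2.0), 2.0 ** n)  # the two values should agree
```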

🔧 Integration rules

Rules for indefinite integrals (let f and g be functions and let k be a constant):

∫ k f(x) dx = k ∫ f(x) dx

∫ (f(x) + g(x)) dx = ∫ f(x) dx + ∫ g(x) dx

∫ (f(x) − g(x)) dx = ∫ f(x) dx − ∫ g(x) dx

Why the first rule is true:

If F(x) = ∫ f(x) dx, then d/dx (k F(x)) = k d/dx (F(x)) = k f(x).
Therefore ∫ k f(x) dx = k F(x) = k ∫ f(x) dx.

General consequence:

For any functions f₁, …, fₙ and constants k₁, …, kₙ:
- ∫ (k₁ f₁(x) + ⋯ + kₙ fₙ(x)) dx = k₁ ∫ f₁(x) dx + ⋯ + kₙ ∫ fₙ(x) dx
Any polynomial (or finite sum of functions) can be integrated term by term.

Example:

∫ (x⁷ − 3x⁴) dx = ∫ x⁷ dx − 3 ∫ x⁴ dx = (x⁸)/8 − 3(x⁵)/5 + C
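Term-by-term integration of a polynomial is mechanical enough to code directly. In this sketch the polynomial is a list of integer coefficients indexed by power, exact fractions come from the standard `fractions` module, and the constant C is taken to be 0; `integrate_polynomial` is a hypothetical name.

```python
from fractions import Fraction


def integrate_polynomial(coeffs):
    """Integrate a0 + a1*x + ... term by term, taking the constant C to be 0.

    coeffs[k] is the coefficient of x**k; the result uses the same layout.
    """
    return [Fraction(0)] + [Fraction(a, k + 1) for k, a in enumerate(coeffs)]


# x^7 - 3x^4, with coefficients listed for powers 0 through 7
integrand = [0, 0, 0, 0, -3, 0, 0, 1]
antiderivative = integrate_polynomial(integrand)
print(antiderivative)  # entries at index 5 and 8 are the x^5 and x^8 coefficients
```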

📐 Trigonometric integrals

∫ cos x dx = sin x + C
∫ sin x dx = −cos x + C
∫ sec² x dx = tan x + C
∫ sec x tan x dx = sec x + C
∫ csc x cot x dx = −csc x + C
∫ csc² x dx = −cot x + C

These are just re-statements of the corresponding derivative formulas for the six basic trigonometric functions.

🌟 Exponential integral

∫ e^x dx = e^x + C (since d/dx (e^x) = e^x)

Example:

∫ (3 sin x + 4 cos x − 5e^x) dx = 3 ∫ sin x dx + 4 ∫ cos x dx − 5 ∫ e^x dx
= −3 cos x + 4 sin x − 5e^x + C

🎯 Applications to motion and physics

🎯 Finding position from acceleration

Scenario: An object is dropped from a height of 100 ft. Show that the height s(t) of the object t seconds after being dropped is s(t) = −16t² + 100 (measured in feet).

Solution:

When the object is dropped at time t = 0, the only force is gravity, causing constant downward acceleration of 32 ft/s².
The object's acceleration is a(t) = −32.
If v(t) is the velocity at time t, then v'(t) = a(t), so:
- v(t) = ∫ a(t) dt = ∫ −32 dt = −32t + C
The constant C is determined by the initial condition: the object was at rest at t = 0, so v(0) = 0.
- 0 = v(0) = −32(0) + C = C, so C = 0.
- Therefore v(t) = −32t for all t ≥ 0.
Since s'(t) = v(t), then:
- s(t) = ∫ v(t) dt = ∫ −32t dt = −16t² + C
The constant C is determined by the initial condition: the object was 100 ft above the ground at t = 0, so s(0) = 100.
- 100 = s(0) = −16(0)² + C = C, so C = 100.
- Therefore s(t) = −16t² + 100 for all t ≥ 0.

Don't confuse: The constant C in an indefinite integral is sometimes generic (representing all possible constants) and sometimes specific (determined by initial conditions in a particular problem).

🚀 General free fall motion formulas

Free fall motion (at time t ≥ 0):

Acceleration: a(t) = −g

Velocity: v(t) = −gt + v₀

Position: s(t) = −(1/2)gt² + v₀t + s₀

Initial conditions: s₀ = s(0), v₀ = v(0)

Units must be consistent.
In metric units, g = 9.8 m/s²; in English units, g = 32 ft/s².
v₀ is positive if thrown upward, negative if thrown downward.
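These formulas can be wrapped in a small helper and applied to the dropped-object scenario. The function name `free_fall` is hypothetical, and the landing time is found by solving s(t) = 0 by hand for the special case v₀ = 0.

```python
import math


def free_fall(g, v0, s0):
    """Return (velocity, position) functions of time for constant acceleration -g."""
    velocity = lambda t: -g * t + v0
    position = lambda t: -0.5 * g * t**2 + v0 * t + s0
    return velocity, position


# Object dropped from rest at 100 ft, using g = 32 ft/s^2.
v, s = free_fall(g=32, v0=0, s0=100)
t_ground = math.sqrt(2 * 100 / 32)  # solves s(t) = 0 when v0 = 0
print(t_ground, s(t_ground), v(t_ground))
```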

🔬 Solving differential equations by integration

🔬 Equations of differentials

Thinking of the indefinite integral as the sum of all infinitesimal "pieces" of a function provides a handy way of integrating a differential equation.
The key idea is to transform the differential equation into an equation of differentials, which has the effect of treating functions as variables.

🧪 Example: exponential growth/decay

Problem: For any constant k, show that every solution of the differential equation dy/dt = ky is of the form y = Ae^(kt) for some constant A. (Assume y(t) > 0 for all t.)

Solution:

Separate the variables (put y terms on the left, t terms on the right):
- dy/y = k dt
Integrate both sides (notice how the function y is treated as a variable):
- ∫ dy/y = ∫ k dt
- ln y + C₁ = kt + C₂ (C₁ and C₂ are constants)
- ln y = kt + C (combine C₁ and C₂ into the constant C)
- y = e^(kt + C) = e^(kt) · e^C = Ae^(kt)
- where A = e^C is a constant.
This is the formula for radioactive decay (or exponential growth).
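A quick numerical check that y = Ae^(kt) satisfies dy/dt = ky: the sketch compares a central-difference estimate of the derivative with k·y at a few times. The values of A and k are arbitrary illustrative choices.

```python
import math


def solution(A, k):
    """Return the function y(t) = A * e^(k t)."""
    return lambda t: A * math.exp(k * t)


A, k = 5.0, -0.3  # illustrative values only
y = solution(A, k)
h = 1e-6

for t in (0.0, 1.0, 4.0):
    dy_dt = (y(t + h) - y(t - h)) / (2 * h)
    print(t, dy_dt, k * y(t))  # the last two columns should agree closely
```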

⚗️ Example: ideal gas law

Problem: Recall the equation of differentials dP/P + dV/V = dT/T relating the pressure P, volume V, and temperature T of an ideal gas. Integrate that equation to obtain the original ideal gas law PV = RT, where R is a constant.

Solution:

Integrate both sides:
- ∫ dP/P + ∫ dV/V = ∫ dT/T
- ln P + ln V = ln T + C (C is a constant)
- ln(PV) = ln T + C
- PV = e^(ln T + C) = e^(ln T) · e^C = T e^C = RT
- where R = e^C is a constant.

🧗 Limitations and next steps

🧗 The challenge of integration

The integration formulas in this section depended on already knowing the derivatives of certain functions and then "working backward" from their derivatives to obtain the original functions.
Without that prior knowledge you would be reduced to guessing, or perhaps recognizing a pattern from some derivative you have encountered.
A number of integration techniques will be presented shortly (in later sections).

The Definite Integral

5.2 The Definite Integral

🧭 Overview

🧠 One-sentence thesis

The definite integral extends the indefinite integral by summing infinitesimals over a specific interval, yielding either a number or a specific function that can be interpreted geometrically as the area under a curve.

📌 Key points (3–5)

Definite vs indefinite: An indefinite integral sums over a generic variable and yields a generic function; a definite integral sums over a specific interval [a, b] and yields a number or specific function.
Geometric interpretation: For nonnegative functions, the definite integral equals the area under the curve between two bounds.
Riemann sum method: The definite integral can be calculated as the limit of sums of rectangular areas as the number of rectangles approaches infinity and their widths approach zero.
Common confusion: The infinitesimal rectangle area f(x)dx appears slightly smaller than the area under the curve, but the gap (a triangle) has zero area because its area is proportional to (dx)².
Sign matters: When the function is negative or changes sign, the definite integral represents net area (positive above the x-axis, negative below).

📐 From indefinite to definite integrals

📐 What makes an integral "definite"

The definite integral of a function f(x) over an interval [a, b] is denoted by ∫ from a to b of f(x)dx and represents the sum of the infinitesimals f(x)dx for all x in [a, b].

The indefinite integral ∫ f(x)dx uses x "in general"—no specific range.
The definite integral ∫ from a to b of f(x)dx sums over the specific interval [a, b].
Output difference: indefinite → generic function; definite → number or specific function.

🔍 Why the infinitesimal is an exact area

The excerpt shows that f(x)dx is not just approximately equal to the area under the curve—it is exactly equal:

The infinitesimal rectangle has height f(x) and width dx, so area = f(x)dx.
There appears to be a small triangular gap between the rectangle top and the curve.
By the Microstraightness Property, the curve is a straight line over the infinitesimal interval [x, x + dx].
The gap is a right triangle with base dx and height df = f'(x)dx.
Area of triangle = (1/2) · dx · f'(x)dx = (1/2)f'(x)(dx)² = 0 (because (dx)² = 0).
Therefore, the rectangle area f(x)dx equals the area under the curve over [x, x + dx].

🎨 Geometric interpretation: area under a curve

🎨 Area formula for nonnegative functions

For a function f(x) ≥ 0 over [a, b], the area under the curve y = f(x) between x = a and x = b is given by A = ∫ from a to b of f(x)dx.

The region R is bounded:
- Above by y = f(x)
- Below by the x-axis
- On the sides by x = a and x = b (with a < b)
The definite integral equals the area of this region.

➖ When the function is negative or changes sign

The excerpt extends the definition beyond nonnegative functions:

Situation	Definition
f(x) ≤ 0 over [a, b]	∫ from a to b of f(x)dx = negative of the area of R
f(x) changes sign	∫ from a to b of f(x)dx = net area (positive above x-axis, negative below)

Example: If part of the curve is above the x-axis and part below, the definite integral subtracts the area below from the area above.

🧮 Calculating definite integrals via Riemann sums

🧮 The five-step procedure

The excerpt provides an algorithm to calculate the area (and thus the definite integral):

Partition: Divide [a, b] into n subintervals [x₀, x₁], [x₁, x₂], ..., [xₙ₋₁, xₙ], where x₀ = a and xₙ = b.
Pick points: In each subinterval [xᵢ₋₁, xᵢ], choose a point x*ᵢ (left endpoint, midpoint, or right endpoint are common).
Form rectangles: Each rectangle has base [xᵢ₋₁, xᵢ] of length Δxᵢ = xᵢ − xᵢ₋₁ and height f(x*ᵢ).
Sum the areas: The Riemann sum is f(x*₁)Δx₁ + f(x*₂)Δx₂ + ... + f(x*ₙ)Δxₙ.
Take the limit: As n → ∞ (so subinterval lengths approach 0), the limit of the Riemann sums equals the area A.

Area A = ∫ from a to b of f(x)dx = limit as n → ∞ of the sum from i=1 to n of f(x*ᵢ)Δxᵢ
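The procedure can be sketched in Python for the common case of equal-length subintervals. The `rule` argument, which selects left endpoints, midpoints, or right endpoints, is a convenience introduced here.

```python
def riemann_sum(f, a, b, n, rule="left"):
    """Riemann sum of f over [a, b] using n equal subintervals."""
    dx = (b - a) / n
    offsets = {"left": 0.0, "mid": 0.5, "right": 1.0}
    shift = offsets[rule]
    # The i-th sample point is a + (i + shift) * dx for i = 0, ..., n - 1.
    return sum(f(a + (i + shift) * dx) for i in range(n)) * dx


square = lambda x: x**2
for n in (10, 100, 1000):
    print(n,
          riemann_sum(square, 1, 2, n, "left"),
          riemann_sum(square, 1, 2, n, "mid"),
          riemann_sum(square, 1, 2, n, "right"))
```

As n grows, all three columns approach the same limit, which is the value of the definite integral.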

📊 Practical considerations

Norm: The limit should be taken over all partitions whose norm (length of the largest subinterval) approaches 0.
Equal-length subintervals: In practice, divide [a, b] into n equal pieces, each of length (b − a)/n, then increase n.
Choice of x*ᵢ: Left endpoints, midpoints, and right endpoints are typical; midpoints usually give better accuracy.
Gap areas approach zero: As n grows and subinterval lengths shrink, gaps between rectangles and the curve vanish (true if f is differentiable or even just continuous).

🧪 Example walkthrough

The excerpt calculates ∫ from 1 to 2 of x²dx:

Divide [1, 2] into n equal subintervals of length 1/n.
Partition points: xᵢ = 1 + i/n for i = 0, 1, ..., n.
Use left endpoints: x*ᵢ = xᵢ₋₁ = 1 + (i−1)/n.
Riemann sum = sum from i=1 to n of (1 + (i−1)/n)² · (1/n).
Expand and apply summation formulas (see below).
Take limit as n → ∞ to get 7/3.

The excerpt also shows a table of numerical approximations using 1, 2, 3, ..., up to 1,000,000 rectangles, demonstrating convergence to approximately 2.333... = 7/3.

Don't confuse: Because y = x² is increasing on [1, 2], left endpoints underestimate the area and right endpoints overestimate it; midpoints usually give the best approximation for a given n.

🔢 Summation notation and formulas

🔢 Sigma notation basics

For real numbers a₁, a₂, ..., aₙ and an integer n ≥ 1, the sum from k=1 to n of aₖ = a₁ + a₂ + ... + aₙ.

Σ is the summation sign (Greek capital letter Sigma).
The index k is a dummy variable; the sum is independent of the letter used.

Rules:

Sum of sums: Σ(aₖ + bₖ) = Σaₖ + Σbₖ
Sum of differences: Σ(aₖ − bₖ) = Σaₖ − Σbₖ
Constant multiple: Σ(c·aₖ) = c·Σaₖ

📐 Key summation formulas

The excerpt provides five formulas useful for calculating Riemann sums:

Formula	Expression	Result
(1)	Sum from k=1 to n of 1	n
(2)	Sum from k=1 to n of k	n(n+1)/2
(3)	Sum from k=1 to n of k²	n(n+1)(2n+1)/6
(4)	Sum from k=1 to n of k³	n²(n+1)²/4
(5)	Sum from k=1 to n of k⁴	n(n+1)(6n³+9n²+n−1)/30

Formula (1) is obvious: adding 1 a total of n times gives n.
Formula (2) can be proved by induction (the excerpt shows the proof steps).
Formulas (3)–(5) can also be proved by induction (left as exercises).
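
As a sanity check (not a substitute for the induction proofs), the five formulas can be compared against direct summation using exact integer arithmetic. The function names `closed_forms` and `direct_sums` are assumptions made for this sketch.

```python
def closed_forms(n):
    """Right-hand sides of formulas (1)-(5) for a given n."""
    return [
        n,
        n * (n + 1) // 2,
        n * (n + 1) * (2 * n + 1) // 6,
        n**2 * (n + 1) ** 2 // 4,
        n * (n + 1) * (6 * n**3 + 9 * n**2 + n - 1) // 30,
    ]


def direct_sums(n):
    """Sum of k^p for k = 1..n, for p = 0, 1, 2, 3, 4."""
    return [sum(k**p for k in range(1, n + 1)) for p in range(5)]


# Spot check for n = 1..200; agreement here is evidence, not a proof.
all_match = all(closed_forms(n) == direct_sums(n) for n in range(1, 201))
print(all_match)
```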

🔬 Using the formulas

Example: In the calculation of ∫ from 1 to 2 of x²dx, the Riemann sum involves:

Sum from i=1 to n of 1/n (uses formula 1)
Sum from i=1 to n of (i−1) = sum from i=1 to n−1 of i (uses formula 2 with n replaced by n−1)
Sum from i=1 to n of (i−1)² = sum from i=1 to n−1 of i² (uses formula 3 with n replaced by n−1)

After substitution and simplification, taking the limit as n → ∞ yields the final answer.
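
One way to see the simplification concretely: applying formulas (1) to (3) to the left-endpoint sum gives the closed form 1 + (n−1)/n + (n−1)(2n−1)/(6n²), whose limit is 1 + 1 + 1/3 = 7/3. The sketch below checks that closed form against the direct sum with exact fractions; the closed form was derived for this illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def left_sum_direct(n):
    """Left-endpoint Riemann sum of x^2 over [1, 2], summed term by term."""
    return sum(Fraction(n + i - 1, n) ** 2 for i in range(1, n + 1)) / n


def left_sum_closed(n):
    """Same sum after applying summation formulas (1)-(3)."""
    return 1 + Fraction(n - 1, n) + Fraction((n - 1) * (2 * n - 1), 6 * n * n)


for n in (1, 2, 5, 50):
    assert left_sum_direct(n) == left_sum_closed(n)

# The gap to 7/3 shrinks like 3/(2n), consistent with the limit.
gap = Fraction(7, 3) - left_sum_closed(1000)
print(gap, float(gap))
```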

📚 Terminology

📚 Standard terms

The excerpt defines several important terms:

Limits of integration: In ∫ from a to b of f(x)dx, the numbers a and b.
- a = lower limit of integration
- b = upper limit of integration
Integrand: The function f(x) being integrated (applies to both definite and indefinite integrals).
Partition: A division of [a, b] into subintervals {x₀ < x₁ < ... < xₙ}.
Norm of a partition: The length of the largest subinterval.
Riemann sum: The sum of rectangular areas f(x*ᵢ)Δxᵢ for a given partition and choice of points.

Don't confuse: "Limits of integration" (the bounds a and b) with "limit of Riemann sums" (the limiting process as n → ∞).

The Fundamental Theorem of Calculus

5.3 The Fundamental Theorem of Calculus

🧭 Overview

🧠 One-sentence thesis

The Fundamental Theorem of Calculus connects differentiation and integration by showing that antiderivatives provide a simple way to calculate definite integrals without tedious Riemann sums.

📌 Key points (3–5)

Part I (area function): The function A(x) defined as the integral from a to x of f(t) dt is differentiable, and its derivative equals f(x).
Part II (evaluation shortcut): Any antiderivative F of f allows you to calculate the definite integral as F(b) minus F(a).
Why it matters: Calculating integrals using antiderivatives is much easier than using Riemann sums, especially for functions beyond low-degree polynomials.
Common confusion: You do not need to add a constant C when using antiderivatives in definite integrals—the constant cancels out when subtracting F(a) from F(b).
Special symmetry cases: Odd functions integrate to zero over symmetric intervals; even functions let you double the integral over half the interval.

🔗 The two parts of the theorem

🔗 Part I: The area function is differentiable

Part I: The function A(x) defined on [a, b] by A(x) = integral from a to x of f(t) dt is differentiable on [a, b], and A'(x) = f(x) for all x in [a, b].

A(x) is called the area function because it represents the area under the curve y = f(x) over the interval [a, x].
The proof uses the Microstraightness Property: over an infinitesimal interval [x, x + dx], the curve is a straight line.
The differential dA = A(x + dx) - A(x) is the infinitesimal area under the curve over [x, x + dx].
In all three cases (f increasing, constant, or decreasing over the infinitesimal interval), dA = f(x) dx.
Therefore A'(x) = dA/dx = f(x).

Why this matters: Integration creates a function whose derivative is the original integrand—integration and differentiation are inverse operations.

🔗 Part II: Antiderivatives evaluate integrals

Part II: If F is an antiderivative of f on [a, b], i.e., F'(x) = f(x) for all x in [a, b], then the integral from a to b of f(x) dx = F(b) - F(a).

Since both A(x) and F(x) are antiderivatives of f(x), they differ by a constant C: A(x) = F(x) + C.
By definition, A(a) = 0 (the area over an interval of zero length).
Therefore 0 = F(a) + C, so C = -F(a), which means A(x) = F(x) - F(a).
Setting x = b gives the integral from a to b of f(x) dx = A(b) = F(b) - F(a).

Notation shorthand: F(x) evaluated from a to b (written with a vertical bar) means F(b) - F(a).

Don't confuse: Unlike indefinite integrals, you do not add "+ C" when using antiderivatives in definite integrals—any constant cancels when you subtract F(a) from F(b).

📐 Using the theorem: worked examples

📐 Basic polynomial example

Example: Calculate the integral from 1 to 2 of x squared dx.

An antiderivative of f(x) = x squared is F(x) = x cubed divided by 3.
By Part II: integral = F(2) - F(1) = (8/3) - (1/3) = 7/3.
This matches the Riemann sum result from the previous section but is much easier to compute.

📐 Trigonometric example

Example: Calculate the integral from 0 to π of sin x dx.

An antiderivative of f(x) = sin x is F(x) = negative cos x.
Integral = -cos(π) - (-cos(0)) = -(-1) - (-1) = 2.

📐 Odd function example

Example: Calculate the integral from -1 to 1 of x cubed dx.

An antiderivative is F(x) = x to the fourth divided by 4.
Integral = (1/4) - (1/4) = 0.
This illustrates the general rule for odd functions.
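
The three worked examples can be cross-checked by comparing F(b) − F(a) with a fine midpoint Riemann sum. This is a rough numerical sketch; `midpoint_sum` and the `examples` list are names invented here, not part of the excerpt.

```python
import math


def midpoint_sum(f, a, b, n=10_000):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


examples = [
    # (integrand f, antiderivative F, lower limit a, upper limit b)
    (lambda x: x**2, lambda x: x**3 / 3, 1.0, 2.0),
    (math.sin, lambda x: -math.cos(x), 0.0, math.pi),
    (lambda x: x**3, lambda x: x**4 / 4, -1.0, 1.0),
]

results = []
for f, F, a, b in examples:
    by_theorem = F(b) - F(a)          # Part II of the Fundamental Theorem
    by_sum = midpoint_sum(f, a, b)    # numerical approximation
    results.append((by_theorem, by_sum))
    print(f"{by_theorem:.6f}  {by_sum:.6f}")
```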

🎭 Symmetry properties

🎭 Odd functions integrate to zero

Odd function rule: If f(-x) = -f(x) for all x, then the integral from -a to a of f(x) dx = 0 for all a > 0 where f is continuous on [-a, a].

The curve is symmetric around the origin.
The area over [0, a] cancels the area over [-a, 0] because one is positive and the other negative.
Example: x cubed is odd, so its integral from -1 to 1 equals zero.

🎭 Even functions double the half-interval

Even function rule: If f(-x) = f(x) for all x, then the integral from -a to a of f(x) dx = 2 times the integral from 0 to a of f(x) dx for all a > 0 where f is continuous on [-a, a].

The curve is symmetric around the y-axis.
The area over [-a, 0] equals the area over [0, a], so you can compute one and double it.

🧮 Additional properties and rules

🧮 Linearity and interval rules

Linearity (same as for indefinite integrals):

Integral of k times f(x) = k times the integral of f(x).
Integral of (f(x) + g(x)) = integral of f(x) + integral of g(x).
Integral of (f(x) - g(x)) = integral of f(x) - integral of g(x).

Interval manipulation:

Integral from a to a of f(x) dx = 0 (zero-length interval).
Integral from a to b of f(x) dx = negative of the integral from b to a of f(x) dx (reversing limits flips the sign).
Integral from a to b of f(x) dx = integral from a to c of f(x) dx + integral from c to b of f(x) dx (splitting at an intermediate point c).

🧮 Chain Rule for integrals

Chain Rule for integrals: If F(x) = integral from a to g(x) of f(t) dt, then F'(x) = f(g(x)) times g'(x).

This combines Part I of the Fundamental Theorem with the Chain Rule from differentiation.
The upper limit is a function g(x) instead of just x, so you must multiply by the derivative of that upper limit.

Example: Let F(x) = integral from 0 to x squared of e to the negative t squared dt. Then F'(x) = e to the negative (x squared) squared times (2x) = 2x times e to the negative x to the fourth.

Why this works: By the Chain Rule, differentiating with respect to x when the upper limit is g(x) requires multiplying by g'(x).
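
The example above can be verified numerically. The sketch below relies on the standard identity that the integral from 0 to y of e^(−t²) dt equals (√π/2)·erf(y), which is an outside fact rather than something stated in the excerpt; the helper `central_difference` is an assumption for illustration.

```python
import math


def F(x):
    # Integral from 0 to x^2 of exp(-t^2) dt, written via the error function.
    return 0.5 * math.sqrt(math.pi) * math.erf(x * x)


def F_prime_formula(x):
    # Chain Rule for integrals: f(g(x)) * g'(x) with g(x) = x^2.
    return 2 * x * math.exp(-(x**4))


def central_difference(g, x, h=1e-5):
    """Symmetric finite-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


for x in (0.3, 0.7, 1.2):
    print(x, central_difference(F, x), F_prime_formula(x))
```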

Integration by Substitution

5.4 Integration by Substitution

🧭 Overview

🧠 One-sentence thesis

Substitution transforms complicated integrals into simpler ones by replacing part of the function with a new variable, essentially reversing the Chain Rule for differentiation.

📌 Key points (3–5)

Core technique: Replace a complicated part of the integrand with a new variable (typically u) to obtain a simpler integral you already know how to solve.
Critical step: Convert dx into du using the derivative relationship; the entire integral must be expressed in terms of the new variable.
Definite integrals: When substituting in definite integrals, also transform the limits of integration from x-values to u-values.
Common confusion: Never substitute u = x (it changes nothing); choose the substitution that simplifies the integral, often guided by recognizing derivatives already present in the integrand.
Connection to Chain Rule: Substitution is the reverse of the Chain Rule—you're undoing a composition of functions.

🔄 The substitution mechanism

🔄 What substitution does

Substitution: A technique that replaces part of the function being integrated with a new variable so that a complicated function of x becomes a simpler function of u that you know how to integrate.

The goal is to eliminate all references to the original variable x, including in the differential dx.
The entire integral must be rewritten in terms of u and du.
For indefinite integrals, the final answer should be converted back to the original variable x.

🔗 Why it works: reversing the Chain Rule

The excerpt states: "If the procedure seems similar to making a substitution when using the Chain Rule to take a derivative, that is because it is similar: you are basically doing the same thing only in reverse."
Just as differentiation uses the Chain Rule to handle compositions, integration uses substitution to undo them.
Example: The derivative of sin(2x) is 2·cos(2x) by the Chain Rule; reversing this, the integral of cos(2x) is (1/2)·sin(2x) + C.

🔍 Converting dx to du

After choosing u = g(x), differentiate to find du = g'(x) dx.
Solve for dx in terms of du: dx = (1/g'(x)) du.
Substitute both u and dx into the original integral.

🎯 Choosing the right substitution

🎯 What to substitute

Never substitute u = x (the excerpt explicitly warns against this).
Look for a "complicated part" inside a function (e.g., the argument of a trigonometric, exponential, or power function).
A strong hint: if the derivative of one part of the integrand appears elsewhere in the integrand (possibly off by a constant multiple), substitute the part whose derivative is present.

🔎 Recognition patterns from the examples

Pattern	What to substitute	Why
cos(2x) or e^(−3x)	The argument inside the function	Simplifies to a known integral
Derivative present	The expression whose derivative appears	The du will match what's already there
(1 + 4x)^5	The entire base expression	Converts to a simple power of u
2x·e^(x²)	u = x²	The derivative 2x is already present
x/√(1 + 3x²)	u = 1 + 3x²	Derivative 6x is almost present (off by constant 6)

⚠️ Don't confuse: which part to substitute

In the integral of (1 + 4x)^5, you might be tempted to let u = 4x, but that would leave (1 + u)^5, which is still complicated.
Instead, let u = 1 + 4x so the integral becomes u^5, which has a known formula.
The excerpt emphasizes: choose the substitution that results in a simpler integral you already know how to solve.

📐 Definite integrals with substitution

📐 Extra step: transform the limits

Follow the same substitution procedure as indefinite integrals.
Additional requirement: Replace the limits of integration x = a and x = b with u = g(a) and u = g(b).
After substitution, evaluate the integral in terms of u using the new limits—no need to convert back to x.

📐 Example walkthrough

From Example 5.26: Evaluate ∫₁² (2x + 1)³ dx.

Let u = 2x + 1, so dx = (1/2) du.
Lower limit: x = 1 becomes u = 2(1) + 1 = 3.
Upper limit: x = 2 becomes u = 2(2) + 1 = 5.
The integral becomes (1/2)∫₃⁵ u³ du = (1/8)[u⁴] from 3 to 5 = (1/8)(625 − 81) = 68.
Note: No need to substitute back to x since the answer is a number.
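
A quick numerical check that transforming the limits gives the same value as the original integral. This is a sketch under the assumption that a midpoint sum with many subintervals is accurate enough; `midpoint_sum` is a helper defined here, not in the excerpt.

```python
def midpoint_sum(f, a, b, n=10_000):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# Original integral in x, limits 1 to 2.
in_x = midpoint_sum(lambda x: (2 * x + 1) ** 3, 1.0, 2.0)

# After u = 2x + 1: factor 1/2 from dx = (1/2) du, limits 3 to 5.
in_u = 0.5 * midpoint_sum(lambda u: u**3, 3.0, 5.0)

# Exact value from the antiderivative u^4 / 8.
exact = (5**4 - 3**4) / 8

print(in_x, in_u, exact)
```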

🔄 Symmetry property for definite integrals

The excerpt presents a useful property:

For any constant a, ∫₀ᵃ f(x) dx = ∫₀ᵃ f(a − x) dx.

Proof uses substitution u = a − x.
This property can simplify certain integrals by allowing you to rewrite the integrand in a more convenient form.
Example 5.27 uses this to evaluate an integral by setting I equal to the original integral, then adding I to itself after applying the property, which cancels the difficult part.

🧮 Common integral formulas from substitution

🧮 Exponential and power formulas

From the examples, these patterns emerge:

∫ e^(ax) dx = (1/a)·e^(ax) + C for any constant a ≠ 0.
For (1 + 4x)⁵: let u = 1 + 4x to get (1/24)(1 + 4x)⁶ + C.
General pattern: when the derivative of the inner function is present (or off by a constant), substitution works cleanly.

🧮 Logarithmic integrals

When the numerator is the derivative of the denominator:

∫ (2x)/((x² − 1)) dx: Let u = x² − 1, so du = 2x dx; result is ln|x² − 1| + C.
∫ tan(x) dx: Rewrite as sin(x)/cos(x); let u = cos(x), so du = −sin(x) dx; result is −ln|cos(x)| + C = ln|sec(x)| + C.

🧮 Inverse trigonometric formulas

The excerpt provides three formulas for any constant a > 0:

Integral form	Result	Condition
∫ dx/√(a² − x²)	sin⁻¹(x/a) + C	|x| < a
∫ dx/(a² + x²)	(1/a)·tan⁻¹(x/a) + C	all x
∫ dx/(|x|√(x² − a²))	(1/a)·sec⁻¹(x/a) + C	|x| > a

These come from reversing the derivatives of inverse trigonometric functions.
Example 5.25: To evaluate ∫ dx/√(4 − 9x²), let u = 3x so 9x² = u² and dx = (1/3) du; result is (1/3)·sin⁻¹(3x/2) + C.

🎓 Strategy summary

🎓 Step-by-step procedure

Identify the complicated part of the integrand (often inside a function or whose derivative appears elsewhere).
Substitute u = (that part), then differentiate to find du.
Solve for dx in terms of du.
Rewrite the entire integral in terms of u and du.
Integrate using known formulas.
Convert back to x (for indefinite integrals) or evaluate at the new limits (for definite integrals).

🎓 Key hints from the examples

If the derivative of an expression appears in the integrand (possibly off by a constant), substitute that expression.
Constants can be adjusted: if you need 6x dx but only have x dx, factor out 1/6.
For definite integrals, transforming limits saves the step of converting back to x.
The symmetry property ∫₀ᵃ f(x) dx = ∫₀ᵃ f(a − x) dx can simplify integrals where direct substitution is difficult.

Improper Integrals

5.5 Improper Integrals

🧭 Overview

🧠 One-sentence thesis

Improper integrals extend integration to infinite intervals and to functions with discontinuities or unbounded behavior, using limits to determine whether the resulting area is finite (convergent) or infinite (divergent).

📌 Key points (3–5)

Two types of improper integrals: integrals over infinite intervals (e.g., [a, ∞)) and integrals of functions with discontinuities or vertical asymptotes within the interval.
Convergence vs divergence: an improper integral converges if its limit exists as a real number; otherwise it diverges.
Common confusion: infinite length does not mean infinite area—a region extending to infinity can have finite area if the function approaches zero fast enough.
How to evaluate: replace the infinite limit or discontinuity point with a variable, compute the definite integral, then take the limit.
Comparison Test: if you can bound your function by another function whose integral is known to converge or diverge, you can determine convergence without computing the integral.

📏 Improper integrals over infinite intervals

📏 Definition and setup

For a continuous function f and a real number a, the improper integral of f over [a, ∞) is defined by the integral from a to ∞ of f(x) dx = limit as b approaches ∞ of the integral from a to b of f(x) dx.

Similarly, for (−∞, a]: limit as b approaches −∞ of the integral from b to a.
For the entire real line (−∞, ∞): split at any point c (typically 0) and evaluate both pieces.
Key mechanism: you evaluate the "proper" definite integral first, then take the limit.

✅ Convergent vs divergent

Convergent: the limit exists and equals a real number → the area is finite.
Divergent: the limit does not exist or equals ±∞ → the area is infinite or indeterminate.
Example: The integral from 1 to ∞ of 1/x dx diverges (area is infinite), but the integral from 1 to ∞ of 1/(x²) dx converges to 1 (area equals 1).

🔍 Why infinite length can have finite area

The curve y = 1/(x²) approaches the x-axis much faster than y = 1/x.
Fast enough decay toward zero allows the "tail" of the region to contribute negligible area.
Don't confuse: length and area are independent—an infinitely long region can have finite area.
Example: The integral from 1 to ∞ of 1/(x²) dx = 1, even though the interval [1, ∞) is infinite.

🌀 Oscillating functions

If the integrand oscillates (e.g., sin x), the limit may not exist even if the function is bounded.
Example: The integral from 0 to ∞ of sin x dx is divergent because −cos b oscillates between −1 and 1 as b → ∞; the limit does not exist.
The net area (positive above the x-axis, negative below) is indeterminate.

🔢 General result for power functions

For any real number a > 0, the improper integral from a to ∞ of 1/(x^p) dx is convergent if p > 1, and divergent if 0 < p ≤ 1.

This is a useful benchmark for comparison.
Example: p = 2 converges; p = 1 diverges; p = 0.5 diverges.
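
The benchmark can be illustrated by evaluating the proper integral from 1 to b in closed form and letting b grow. The function name `p_integral` is an assumption made for this sketch.

```python
import math


def p_integral(p, a, b):
    """Exact value of the integral from a to b of x^(-p) dx, for 0 < a < b."""
    if p == 1:
        return math.log(b / a)
    return (b ** (1 - p) - a ** (1 - p)) / (1 - p)


# p = 2 settles near 1; p = 1 and p = 0.5 keep growing with b.
for p in (2.0, 1.0, 0.5):
    values = [p_integral(p, 1.0, b) for b in (1e2, 1e4, 1e6)]
    print(p, values)
```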

🚧 Improper integrals with discontinuities or vertical asymptotes

🚧 Definition and setup

For a function f continuous on [a, b) but with a discontinuity or vertical asymptote at x = b, the improper integral of f over [a, b) is defined by the integral from a to b of f(x) dx = limit as c approaches b from the left of the integral from a to c of f(x) dx.

Similarly, if the problem is at x = a, take the limit as c approaches a from the right.
If the discontinuity or asymptote is at an interior point c in (a, b), split the integral at c and evaluate both pieces.
Adjust these definitions for infinite intervals as needed.

📊 Examples with vertical asymptotes

Integral	Asymptote at	Result	Interpretation
∫₀¹ 1/x dx	x = 0	Divergent (∞)	Infinite area; region infinite in y direction
∫₀¹ 1/√x dx	x = 0	Convergent (= 2)	Finite area despite vertical asymptote

The function 1/√x grows slowly enough as x → 0⁺ that the area remains finite, whereas 1/x grows too quickly.
Don't confuse: a vertical asymptote does not automatically mean divergence.

🪜 Jump discontinuities

Functions like the floor function ⌊x⌋ have jump discontinuities at integer values.
Example: To evaluate the integral from 1 to 3 of ⌊x⌋ dx, split at x = 2 (the discontinuity inside [1, 3)).
Each piece is evaluated as a limit approaching the discontinuity from the appropriate side.
The result: the integral from 1 to 3 of ⌊x⌋ dx = 3.

🧪 Comparison Test for convergence

🧪 Statement of the test

(a) If |f(x)| ≤ g(x) for all x in [a, ∞), and if the integral from a to ∞ of g(x) dx is convergent, then the integral from a to ∞ of f(x) dx is convergent.

(b) If f(x) ≥ g(x) ≥ 0 for all x in [a, ∞), and if the integral from a to ∞ of g(x) dx is divergent, then the integral from a to ∞ of f(x) dx is divergent.

Part (a) intuition: if f is "squeezed" between −g and g, and g has finite area, then f must also have finite area.
Part (b) intuition: if f is at least as large as g and g has infinite area, then f must also have infinite area.

🔬 Using the Comparison Test

Example: Show that the integral from 1 to ∞ of (sin x)/(x²) dx is convergent.
Since |sin x| ≤ 1 for all x, we have |(sin x)/(x²)| ≤ 1/(x²).
The integral from 1 to ∞ of 1/(x²) dx converges (to 1).
By the Comparison Test, the integral from 1 to ∞ of (sin x)/(x²) dx converges.
Why this helps: you don't need to compute the integral of (sin x)/(x²) explicitly; you only need a suitable comparison function.

🛠️ Properties and practical notes

🛠️ Splitting integrals

If both pieces converge, the whole integral converges and equals the sum of the pieces.
Example: the integral from −∞ to ∞ of 1/(1 + x²) dx = the integral from −∞ to 0 + the integral from 0 to ∞ = π/2 + π/2 = π.
You can split at any convenient point c; the result is the same.

⚡ Shortcut notation

If the integrand is continuous over the entire interval and the antiderivative has a limit at each infinite endpoint, you may plug ±∞ directly into the antiderivative as shorthand for that limit.
Example: the integral from −∞ to ∞ of 1/(1 + x²) dx = arctan(x) evaluated from −∞ to ∞ = π/2 − (−π/2) = π.
Caution: this shortcut is valid only if you understand what "plugging in ±∞" means (i.e., taking the limit) and if there are no discontinuities or asymptotes in the interior of the interval.

🔄 Using symmetry

If f is an even function (f(−x) = f(x)), then the integral from −∞ to ∞ of f(x) dx = 2 times the integral from 0 to ∞ of f(x) dx.
Example: the integral from −∞ to ∞ of 1/(1 + x²) dx = 2 times the integral from 0 to ∞ of 1/(1 + x²) dx = 2(π/2) = π.

⚠️ Common pitfall: cancellation

Just because positive and negative areas "cancel" does not mean the integral is zero.
Example: the integral from 0 to ∞ of sin x dx is divergent, not zero, even though each positive hump is followed by a negative hump.
The limit does not exist because the integral from 0 to b, which equals 1 − cos b, keeps oscillating between 0 and 2 as b → ∞.

📐 Subtraction rule caution

The rule that the integral of (f − g) equals the integral of f minus the integral of g applies only if both integrals converge.
Example: 1/(x(x+1)) = 1/x − 1/(x+1), and the integral from 1 to ∞ of 1/(x(x+1)) dx converges, but both the integral from 1 to ∞ of 1/x dx and the integral from 1 to ∞ of 1/(x+1) dx diverge.
Don't confuse: convergence of the difference does not imply convergence of the individual pieces.

Integration by Parts

6.1 Integration by Parts

🧭 Overview

🧠 One-sentence thesis

Integration by parts transforms the Product Rule for derivatives into an integral formula that simplifies integrals by strategically choosing which part to differentiate and which to integrate.

📌 Key points (3–5)

Core formula: Integration by parts states that the integral of u dv equals uv minus the integral of v du, derived directly from the Product Rule.
Strategic choice: Success depends on choosing dv as something easy to integrate and u as something that simplifies when differentiated.
Common confusion: Choosing u and dv incorrectly (e.g., letting u = exponential and dv = polynomial dx) can lead to a harder integral instead of a simpler one.
Multiple rounds: Some integrals require repeated integration by parts; the tabular method organizes this process efficiently for polynomial times exponential/trigonometric functions.
Self-reappearing integrals: Occasionally the original integral reappears on the right side, allowing algebraic manipulation to solve for it.

🔧 The core mechanism

🔧 Deriving the formula from the Product Rule

Integration by parts formula: For differentiable functions u and v, the integral of u dv equals uv minus the integral of v du.

Start with the Product Rule for differentials: d(uv) = u dv + v du.
Rearrange to isolate u dv: u dv = d(uv) − v du.
Integrate both sides: the integral of u dv = the integral of d(uv) minus the integral of v du.
Since the integral of dF equals F plus a constant, this yields: the integral of u dv = uv minus the integral of v du.

Why it works: The method converts a difficult integral (the integral of u dv) into a hopefully simpler one (the integral of v du).

🎯 When to use it

The excerpt shows it is "typically used when the integral of v du would be simpler than the original integral."
Example: Integrating x times exponential of negative x has no direct formula, but differentiating it via the Product Rule is easy—so reverse the process.
Even single-function integrals like the integral of ln x dx can use this method by treating dx as dv.

🧩 Choosing u and dv correctly

🧩 Guidelines for the choice

The excerpt provides "rough guidelines" but warns "no rules that are guaranteed to always work":

Guideline	Reason	Example from excerpt
Choose dv as a differential you can integrate easily	You must integrate dv to get v	For x times exponential of negative x dx, pick dv = exponential of negative x dx (easy to integrate)
Choose u as a function whose derivative is simpler than u	The derivative du appears in the next integral	For x times exponential of negative x dx, pick u = x (derivative is just dx, simpler)
Eliminate the most complicated function by integrating it as dv	If it simplifies when integrated	For x cubed times exponential of x squared dx, pick dv = x times exponential of x squared dx (integrates to exponential of x squared over 2)

Don't confuse: Choosing dv as something you cannot integrate defeats the purpose—e.g., choosing dv = ln x dx when trying to integrate ln x dx is "pointless, as integrating dv to get v is the original problem."

⚠️ What happens with a bad choice

Example from the excerpt: For the integral of x times exponential of negative x dx, if you let u = exponential of negative x and dv = x dx:

Then du = negative exponential of negative x dx and v = x squared over 2.
The formula gives: exponential of negative x times x squared over 2 plus one half times the integral of x squared times exponential of negative x dx.
Result: "a more difficult integral than the original"—you went in the wrong direction.

🔁 Repeated integration by parts

🔁 When multiple rounds are needed

Some integrals require integrating by parts more than once.
Example: For the integral of x squared times exponential of negative x dx, the first round reduces it to the integral of x times exponential of negative x dx, which itself needs integration by parts.
The excerpt shows: after the first round, "integrate by parts again."

📊 The tabular method

Tabular method: A systematic way to organize repeated integration by parts, especially when u is a polynomial.

How it works:

Write u in the left column and dv in the right column.
Differentiate down the u column until you reach 0 (for polynomials, this happens after degree plus one steps).
Integrate down the dv column.
Multiply diagonally with alternating signs: plus, minus, plus, minus, etc.
Sum all the products.

Example: For the integral of x cubed times exponential of negative x dx:

u column: x cubed → 3x squared → 6x → 6 → 0 (stop).
dv column: exponential of negative x dx → negative exponential of negative x → exponential of negative x → negative exponential of negative x → exponential of negative x.
Products with alternating signs: plus (x cubed)(negative exponential of negative x) minus (3x squared)(exponential of negative x) plus (6x)(negative exponential of negative x) minus (6)(exponential of negative x).
Result: negative x cubed exponential of negative x minus 3x squared exponential of negative x minus 6x exponential of negative x minus 6 exponential of negative x plus C.

Why it's efficient: The tabular method avoids writing out each integration-by-parts step separately; you know in advance how many product terms to expect (the degree of the polynomial plus one).
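
The tabular steps can be sketched in code for the special case of a polynomial times e^(ax). This is an illustrative implementation, not the excerpt's: the function names and the coefficient-list representation (lowest degree first) are assumptions.

```python
from fractions import Fraction


def differentiate(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...], lowest degree first."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def tabular_poly_exp(coeffs, a):
    """Return q (as coefficients) with: integral of p(x) e^(a x) dx = q(x) e^(a x) + C.

    Left column: p, p', p'', ... until 0.
    Right column: e^(a x) integrated repeatedly, giving e^(a x) / a^j.
    Diagonal products are added with alternating signs.
    """
    result = [Fraction(0)] * len(coeffs)
    sign, power_of_a, row = 1, Fraction(a), list(coeffs)
    while row:  # an empty row means the derivative column has reached 0
        for k, c in enumerate(row):
            result[k] += sign * Fraction(c) / power_of_a
        row = differentiate(row)
        sign, power_of_a = -sign, power_of_a * a
    return result


# p(x) = x^3 and a = -1, matching the example above.
print(tabular_poly_exp([0, 0, 0, 1], -1))
```

The returned coefficients [−6, −6, −3, −1] correspond to (−x³ − 3x² − 6x − 6)·e^(−x), which agrees with the result in the example.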

🔄 Special cases

🔄 The original integral reappears

Sometimes after one or two rounds of integration by parts, the original integral shows up again on the right side.

Strategy: Treat it as an algebraic equation and solve for the original integral.

Example 1 (the integral of secant cubed x dx):

After one round: the integral of secant cubed x dx = secant x tangent x plus the integral of secant x dx minus the integral of secant cubed x dx.
The original integral appears on the right with a minus sign.
Move it to the left: 2 times the integral of secant cubed x dx = secant x tangent x plus ln absolute value of (secant x plus tangent x) plus C.
Divide by 2 to solve.

Example 2 (the integral of exponential of x times sine x dx):

After two rounds, the original integral reappears.
Rearrange: 2 times the integral of exponential of x times sine x dx = negative exponential of x times cosine x plus exponential of x times sine x.
Solve: the integral equals exponential of x over 2 times (sine x minus cosine x) plus C.

Don't confuse: This is not an error—it's a valid technique. The reappearance allows you to "trap" the integral algebraically.

🔢 Single-function integrals

The excerpt shows that integration by parts can work even when "ln x appears to be the only" function in the integral of ln x dx.

Key insight: The differential dx itself can serve as dv.

Let u = ln x and dv = dx.
Then du = (1 over x) dx and v = x.
Apply the formula: the integral of ln x dx = (ln x)(x) minus the integral of x times (1 over x) dx = x ln x minus the integral of 1 dx = x ln x minus x plus C.

Why this works: Every integral has a differential; treating it as dv is always an option.

🧮 Definite and improper integrals

🧮 Applying limits of integration

For definite or improper integrals with limits a and b (which may be real numbers or plus/minus infinity):

The integral from a to b of u dv equals uv evaluated from a to b minus the integral from a to b of v du.

Example (the integral from 0 to 1 of x cubed times square root of (1 minus x squared) dx):

Let u = x squared and dv = x times square root of (1 minus x squared) dx.
Then v = negative one third times (1 minus x squared) to the three-halves power.
Apply the formula with limits: negative (x squared over 3) times (1 minus x squared) to the three-halves power evaluated from 0 to 1, plus the integral from 0 to 1 of (2x over 3) times (1 minus x squared) to the three-halves power dx.
Evaluate the boundary term: (0 minus 0) = 0.
Evaluate the remaining integral: negative (2 over 15) times (1 minus x squared) to the five-halves power evaluated from 0 to 1 = 2 over 15.

🌌 The Gamma function application

The excerpt introduces the Gamma function, defined by the integral from 0 to infinity of x to the (t minus 1) times exponential of negative x dx for all t greater than 0.

Evaluating Gamma of 2:

This requires the integral from 0 to infinity of x times exponential of negative x dx.
Using integration by parts (as shown earlier): the antiderivative is negative x exponential of negative x minus exponential of negative x.
Evaluate from 0 to infinity: the limit as x approaches infinity of (negative x over exponential of x minus 1 over exponential of x) minus (0 minus 1).
Both limits approach 0, so the result is 0 minus 0 plus 1 = 1.
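
The limit can be observed numerically by evaluating the antiderivative at increasingly large upper limits. This sketch compares against Python's built-in `math.gamma`; the variable and function names are otherwise illustrative.

```python
import math


def antiderivative(x):
    # From integration by parts: integral of x e^(-x) dx = -x e^(-x) - e^(-x) + C
    return -x * math.exp(-x) - math.exp(-x)


# Proper integrals from 0 to b for growing b; these should approach Gamma(2).
partial = [antiderivative(b) - antiderivative(0.0) for b in (1.0, 10.0, 50.0)]
print(partial)
print(math.gamma(2.0))
```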

Why it matters: The Gamma function "has found many uses" in physics and engineering; integration by parts is essential for evaluating it.

Trigonometric Integrals

6.2 Trigonometric Integrals

🧭 Overview

🧠 One-sentence thesis

Trigonometric integrals can be simplified by applying product-to-sum formulas for mixed angles, using substitution tricks based on trigonometric identities for powers of sine and cosine, and converting secant-tangent products into polynomial forms.

📌 Key points (3–5)

Product-to-sum formulas: Convert products of sines and cosines with different angles into sums of single trigonometric functions that are easier to integrate.
Odd-power technique: For odd powers of sine or cosine, factor out one power and replace the remaining even power using the Pythagorean identity, then substitute to get a polynomial.
Even-power technique: For even powers, use half-angle identities to reduce the power, repeating as needed until the integral becomes manageable.
Secant-tangent integrals: When the secant has even power or tangent has odd power, use Pythagorean identities and substitution to convert into polynomial integrals.
Common confusion: Knowing which identity to use depends on whether the power is odd or even, and which function (sine vs cosine, or secant vs tangent) has the favorable power.

🔄 Product-to-sum formulas for mixed angles

🔄 When to use them

Engineering applications often involve integrals like the integral of cos(alpha·t + phi₁)·cos(beta·t + phi₂) dt, where the angles inside the trigonometric functions differ (for example, when voltage and current are out of phase in AC circuits).

📐 The four formulas

The excerpt provides four product-to-sum formulas:

Product	Sum/Difference Form
sin A cos B	(1/2)(sin(A + B) + sin(A − B))
cos A sin B	(1/2)(sin(A + B) − sin(A − B))
cos A cos B	(1/2)(cos(A + B) + cos(A − B))
sin A sin B	−(1/2)(cos(A + B) − cos(A − B))

🎯 How they simplify integration

These formulas turn products of trigonometric functions into sums of individual trigonometric functions.
Sums are much easier to integrate than products.
Example: The excerpt shows sin x · sin 12x converted to −(1/2)(cos 13x − cos 11x), which integrates directly using basic formulas.
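
A quick numerical check of the sin x · sin 12x example can make the conversion concrete. This is only a sketch: the helper names (simpson, product, as_sum, antiderivative) and the interval [0, 1] are illustrative assumptions, not taken from the source.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def product(x):
    return math.sin(x) * math.sin(12 * x)

def as_sum(x):
    # sin A sin B = -(1/2)(cos(A + B) - cos(A - B)) with A = x, B = 12x
    return -0.5 * (math.cos(13 * x) - math.cos(11 * x))

def antiderivative(x):
    # Term-by-term integral of as_sum (constant of integration omitted)
    return -0.5 * (math.sin(13 * x) / 13 - math.sin(11 * x) / 11)

numeric = simpson(product, 0.0, 1.0)                 # integrate the product directly
exact = antiderivative(1.0) - antiderivative(0.0)    # use the converted form
print(numeric, exact)
```

The two printed values should agree closely, which is the point of the formula: the converted sum has an elementary antiderivative while the product does not obviously have one.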

📡 Modulated waves

The integrand sin x · sin 12x is an example of a modulated wave, commonly used in electronic communications like radio broadcasting.
The curves y = ±sin x form an amplitude envelope for the modulated wave, since |sin 12x| ≤ 1.

🔢 Odd powers of sine and cosine

🔢 The substitution trick for odd powers

For sine raised to an odd power of the form 2n + 1 (where n ≥ 1):

Replace sin²x by 1 − cos²x.
This leaves one sin x factor, which becomes part of the differential du = −sin x dx.
Use substitution u = cos x.
The integral becomes the integral of a polynomial p(u) in terms of u.

🧮 Step-by-step pattern

The general form:

Start with the integral of sin^(2n+1) x dx.
Rewrite as the integral of (sin²x)^n · sin x dx.
Replace sin²x with 1 − cos²x to get the integral of (1 − cos²x)^n · sin x dx.
Substitute u = cos x, du = −sin x dx.
Result: the integral of p(u) du, where p(u) is a polynomial.

🔄 Cosine works the same way

For odd powers of cosine:

Use cos²x = 1 − sin²x.
Substitute u = sin x, du = cos x dx.
The remaining single cos x becomes part of du.

🎯 Mixed sine-cosine products

When integrating sin^m x · cos^n x dx where either m or n is odd:

Apply the odd-power trick to whichever function has the odd power.
Example: For sin²x · cos³x, replace cos²x by 1 − sin²x, then substitute u = sin x.
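
Carrying the sin²x · cos³x example through to the end may help. The following worked equation is a sketch of the odd-power trick applied to that integrand, with u = sin x as in the text.

```latex
\begin{aligned}
\int \sin^2 x \,\cos^3 x \,dx
  &= \int \sin^2 x \,(1 - \sin^2 x)\,\cos x \,dx \\
  &= \int (u^2 - u^4)\,du \qquad (u = \sin x,\ du = \cos x\,dx) \\
  &= \frac{\sin^3 x}{3} - \frac{\sin^5 x}{5} + C
\end{aligned}
```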

📊 Even powers of sine and cosine

📊 The half-angle identity approach

For even powers of sin x or cos x:

Replace sin²x with (1 − cos 2x)/2.
Or replace cos²x with (1 + cos 2x)/2.
Repeat as often as necessary until odd powers appear or the integral becomes simple.

🔁 Why this works

The half-angle identities reduce the power by converting sin²x or cos²x into expressions involving cos 2x.
This doubles the angle but reduces the power.
After expansion, you may need to apply the identity again to any remaining even powers.

📝 Example walkthrough

The excerpt shows the integral of sin⁴x dx:

Replace sin²x with (1 − cos 2x)/2, so sin⁴x = ((1 − cos 2x)/2)².
Expand to get (1/4)(1 − 2cos 2x + cos²2x).
The cos²2x term still has even power, so apply the identity again: cos²2x = (1 + cos 4x)/2.
Final result: (1/8)(3 − 4cos 2x + cos 4x), which integrates term by term.
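
The reduction above can be checked numerically. In this sketch the names reduced, F, and midpoint are illustrative, and the interval [0, π/2] is an arbitrary choice; F is the term-by-term antiderivative of the reduced form.

```python
import math

def reduced(x):
    # Result of applying the half-angle identity twice to sin^4 x
    return (3 - 4 * math.cos(2 * x) + math.cos(4 * x)) / 8

def F(x):
    # Term-by-term antiderivative of reduced(x), constant omitted
    return 3 * x / 8 - math.sin(2 * x) / 4 + math.sin(4 * x) / 32

def midpoint(f, a, b, n=20_000):
    """Midpoint rectangle rule with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

numeric = midpoint(lambda x: math.sin(x) ** 4, 0.0, math.pi / 2)
closed_form = F(math.pi / 2) - F(0.0)
print(numeric, closed_form)
```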

🔐 Secant and tangent integrals

🔐 Two favorable cases

For integrals of the form ∫ sec^m x · tan^n x dx:

When m is even (m = 2k + 2): Use sec²x = 1 + tan²x for all but two powers of sec x, then substitute u = tan x, du = sec²x dx.
When n is odd (n = 2k + 1): Use tan²x = sec²x − 1 for all but one power of tan x, then substitute u = sec x, du = sec x · tan x dx.

🔄 The pattern for even secant power

Start with sec^(2k+2) x · tan^n x.
Rewrite as (sec²x)^k · sec²x · tan^n x.
Replace sec²x with 1 + tan²x in the first factor.
Substitute u = tan x, du = sec²x dx.
Result: integral of a polynomial p(u) in terms of u = tan x.

🔄 The pattern for odd tangent power

Start with sec^m x · tan^(2k+1) x.
Rewrite as sec^(m−1) x · sec x · (tan²x)^k · tan x.
Replace tan²x with sec²x − 1.
Substitute u = sec x, du = sec x · tan x dx.
Result: integral of a polynomial p(u) in terms of u = sec x.
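
Two small instances of these patterns, chosen here for illustration rather than taken from the source, show how each one ends in a polynomial integral.

```latex
\begin{aligned}
\text{Even secant power } (m = 4,\ n = 2):\quad
\int \sec^4 x \,\tan^2 x \,dx
  &= \int (1 + \tan^2 x)\,\tan^2 x \,\sec^2 x \,dx \\
  &= \int (u^2 + u^4)\,du \qquad (u = \tan x) \\
  &= \frac{\tan^3 x}{3} + \frac{\tan^5 x}{5} + C \\[6pt]
\text{Odd tangent power } (m = 3,\ n = 3):\quad
\int \sec^3 x \,\tan^3 x \,dx
  &= \int \sec^2 x \,(\sec^2 x - 1)\,\sec x \tan x \,dx \\
  &= \int (u^4 - u^2)\,du \qquad (u = \sec x) \\
  &= \frac{\sec^5 x}{5} - \frac{\sec^3 x}{3} + C
\end{aligned}
```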

🔁 Cosecant and cotangent

Use the same procedure for integrals of the form ∫ csc^m x · cot^n x dx when either m is even or n is odd.
Use the identity csc²x = 1 + cot²x in a similar manner.

💡 General strategy tip

For some trigonometric integrals, try putting everything in terms of sines and cosines first.

Example: The excerpt shows cot⁴x / csc⁵x converted to (cos⁴x / sin⁴x) · sin⁵x = cos⁴x · sin x, which then integrates easily with u = cos x.
Don't confuse: This is a fallback strategy when the specific power patterns don't apply.

Trigonometric Substitutions

6.3 Trigonometric Substitutions

🧭 Overview

🧠 One-sentence thesis

Trigonometric substitutions transform integrals containing expressions of the form √(a² − u²), √(a² + u²), or √(u² − a²) into simpler trigonometric integrals that can be evaluated using standard techniques.

📌 Key points (3–5)

Core technique: Replace algebraic expressions with trigonometric substitutions (u = a sin θ, u = a tan θ, or u = a sec θ) to exploit Pythagorean identities.
Three main patterns: Each square-root form has a corresponding substitution and identity that simplifies the integral.
Converting back: After integrating in terms of θ, use right triangles or inverse trigonometric identities to express the answer in terms of the original variable.
Common confusion: Different substitutions (e.g., u = a cos θ versus u = a sin θ) can yield antiderivatives that look different but are equivalent, differing only by a constant.
Extended application: The method works even without square roots present and can handle quadratic expressions after completing the square.

🎯 Motivating example: area of a circle

🎯 Why trigonometric substitution matters

The excerpt proves the formula for the area of a circle (A = πr²) using calculus.
The area equals twice the integral from −r to r of √(r² − x²) dx.
This integral cannot be evaluated with basic techniques but becomes straightforward with a trigonometric substitution.

🔄 The substitution process

Any point (x, y) on a circle of radius r can be written as (r cos θ, r sin θ).
Substitute x = r cos θ and dx = −r sin θ dθ.
The square root simplifies: √(r² − r² cos²θ) = r sin θ for 0 ≤ θ ≤ π (using the identity 1 − cos²θ = sin²θ).
The integral becomes 2r² times the integral of sin²θ dθ from 0 to π, which evaluates to πr².
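
As a rough numerical sanity check of this substitution, the sketch below evaluates both forms of the area integral for a sample radius. The radius r = 2, the midpoint helper, and the variable names are illustrative assumptions.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint rectangle rule with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

r = 2.0  # sample radius, chosen only for illustration

# Original form: A = 2 * integral from -r to r of sqrt(r^2 - x^2) dx
direct = 2 * midpoint(lambda x: math.sqrt(r * r - x * x), -r, r)

# After x = r cos(theta): A = 2 r^2 * integral from 0 to pi of sin^2(theta) d(theta)
substituted = 2 * r * r * midpoint(lambda t: math.sin(t) ** 2, 0.0, math.pi)

print(direct, substituted, math.pi * r * r)
```

The substituted form converges much faster here because its integrand is smooth, while the square root has vertical tangents at x = ±r.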

📋 The three standard substitution patterns

📋 Pattern table

Integral contains	Substitution	Identity used
√(a² − u²)	u = a sin θ	1 − sin²θ = cos²θ
√(a² + u²)	u = a tan θ	1 + tan²θ = sec²θ
√(u² − a²)	u = a sec θ	sec²θ − 1 = tan²θ

🔍 Why these substitutions work

Each substitution is designed so the Pythagorean identity eliminates the square root.
Example: For √(a² − u²), substituting u = a sin θ gives √(a² − a² sin²θ) = a cos θ.
The differential also transforms: if u = a sin θ, then du = a cos θ dθ.

🧮 Standard formulas and equivalences

🧮 Formula for √(a² − u²)

The excerpt derives two equivalent formulas:

Using u = a cos θ: ∫ √(a² − u²) du = −(a²/2) cos⁻¹(u/a) + (1/2) u √(a² − u²) + C.
Using u = a sin θ: ∫ √(a² − u²) du = (a²/2) sin⁻¹(u/a) + (1/2) u √(a² − u²) + C.

🔄 Why different substitutions give equivalent answers

The identity sin⁻¹x + cos⁻¹x = π/2 (for all −1 ≤ x ≤ 1) shows that the two antiderivatives differ only by the constant πa²/4, which is absorbed into the generic constant C.

Either substitution (u = a cos θ or u = a sin θ) can be used.
The excerpt notes that u = a sin θ is sometimes preferred to avoid the negative sign in du.

🧮 Formulas for the other two patterns

For √(a² + u²): ∫ √(a² + u²) du = (1/2) u √(a² + u²) + (a²/2) ln|u + √(a² + u²)| + C.
For √(u² − a²): ∫ √(u² − a²) du = (1/2) u √(u² − a²) − (a²/2) ln|u + √(u² − a²)| + C.
Both formulas require the result ∫ sec³θ dθ = (1/2)(sec θ tan θ + ln|sec θ + tan θ|) + C.

🔧 Working examples and techniques

🔧 Example: √(9 − 4x²)

Identify the pattern: this is √(a² − u²) with a = 3 and u = 2x.
Then du = 2 dx, so dx = (1/2) du.
Apply the formula with the substitution and simplify to get (9/4) sin⁻¹(2x/3) + (1/2) x √(9 − 4x²) + C.
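
One way to gain confidence in an antiderivative like this is to differentiate it numerically and compare with the integrand. The sketch below does that at a few sample points; the function names, the step size h, and the sample points are illustrative assumptions.

```python
import math

def integrand(x):
    return math.sqrt(9 - 4 * x * x)

def antiderivative(x):
    # (9/4) asin(2x/3) + (1/2) x sqrt(9 - 4x^2), valid for |x| <= 3/2
    return 2.25 * math.asin(2 * x / 3) + 0.5 * x * math.sqrt(9 - 4 * x * x)

def central_difference(F, x, h=1e-5):
    """Symmetric finite-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Each tuple holds (x, numerical derivative of the antiderivative, integrand value)
checks = [(x, central_difference(antiderivative, x), integrand(x))
          for x in (-1.0, 0.0, 0.5, 1.2)]
for x, slope, value in checks:
    print(x, slope, value)
```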

🔧 Example without square roots: 1/(1 + x²)²

The integrand contains a term of the form a² + u² (with a = 1 and u = x).
Use the substitution x = tan θ, so dx = sec²θ dθ.
The denominator becomes (1 + tan²θ)² = (sec²θ)² = sec⁴θ.
The integral simplifies to ∫ cos²θ dθ, which equals θ/2 + (1/4) sin 2θ + C.

🎨 Converting back using right triangles

After integrating in terms of θ, draw a right triangle with an angle θ matching the original substitution.
Example: If tan θ = x/1, draw a triangle with opposite side x and adjacent side 1; the hypotenuse is √(1 + x²).
Read off sin θ and cos θ from the triangle to express the answer in terms of x.
Alternative: Use trigonometric identities algebraically (e.g., sec²θ = 1 + tan²θ) to solve for sin θ and cos θ.
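
Applying the triangle to the 1/(1 + x²)² example finishes the conversion back to x. The worked step below is a sketch that uses the double-angle identity sin 2θ = 2 sin θ cos θ together with the triangle values.

```latex
\begin{aligned}
\sin\theta = \frac{x}{\sqrt{1+x^2}}, \quad
\cos\theta = \frac{1}{\sqrt{1+x^2}}
\quad &\Longrightarrow \quad
\sin 2\theta = 2\sin\theta\cos\theta = \frac{2x}{1+x^2} \\[4pt]
\int \frac{dx}{(1+x^2)^2}
  = \frac{\theta}{2} + \frac{1}{4}\sin 2\theta + C
  &= \frac{1}{2}\tan^{-1}x + \frac{x}{2(1+x^2)} + C
\end{aligned}
```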

🔧 Completing the square for quadratic expressions

Quadratic expressions like 4x² + 8x − 5 can be rewritten by completing the square.
Example: 4x² + 8x − 5 = 4(x² + 2x) − 5 = 4(x + 1)² − 9.
This is now of the form u² − a² with u = 2(x + 1) and a = 3.
Use the substitution u = a sec θ and proceed as usual.

🛠️ Practical tips

🛠️ When to use trigonometric substitution

The excerpt states: "when other methods fail, use the table as a guide for certain types of integrals."
The Power Formula substitution does not work for integrals like 1/(1 + x²)² (because the entire denominator, not just a factor, is raised to a power).
Integration by parts is not promising for these forms.
Trigonometric substitution is the method of choice for square-root expressions matching the three patterns.

🛠️ Don't confuse: different substitutions, same integral

For the same integral, different valid substitutions can produce antiderivatives that look different.
Example: u = a cos θ versus u = a sin θ for √(a² − u²).
The answers are equivalent because they differ only by a constant absorbed into C.
Always check whether your answer matches the expected form by verifying the constant difference.

Partial Fractions

6.4 Partial Fractions

🧭 Overview

🧠 One-sentence thesis

The method of partial fractions transforms complicated rational functions into sums of simpler fractions that can be integrated using standard formulas.

📌 Key points (3–5)

Core idea: Replace a single complicated rational function with a sum of simpler fractions, each easy to integrate.
When it applies: The numerator's degree must be less than the denominator's degree; if not, divide first.
Four main cases: Distinct linear factors, repeated linear factors, distinct quadratic factors, and repeated quadratic factors—each has its own decomposition pattern.
Common confusion: Quadratic factors are only "quadratic" if they cannot be factored into linear terms (no real roots); otherwise treat them as linear factors.
Solution method: Assume a decomposition form, get a common denominator, then equate coefficients to solve for unknown constants.

🔧 The basic method

🔧 What partial fraction decomposition means

Partial fraction decomposition: expressing a rational function as a sum of simpler rational functions.

Instead of integrating one complicated fraction, you integrate several simple ones.
Example: 1/(x² + x) becomes 1/x - 1/(x + 1).
Each simpler fraction corresponds to a factor in the denominator.

🧮 How to find the constants

Assume a form: Write the rational function as a sum with unknown constants (A, B, C, etc.).
Common denominator: Multiply through to get a common denominator on the right side.
Equate numerators: Since denominators match, the numerators must be equal.
Match coefficients: Equate coefficients of each power of x (constant term, x term, x² term, etc.).
Solve the system: Solve the resulting equations for the unknown constants.

📋 Degree requirement

The method assumes the numerator's degree is less than the denominator's degree.
If the numerator's degree is equal to or greater than the denominator's, divide first (polynomial long division).
A trick when degrees are equal: rewrite the numerator to separate out the denominator, leaving a simpler rational function.

📦 Case 1: Distinct linear factors

📦 The decomposition form

When the denominator factors into distinct (non-repeated) linear terms like (a₁x + b₁)(a₂x + b₂)···(aₙx + bₙ), write:

p(x) / q(x) = A₁/(a₁x + b₁) + A₂/(a₂x + b₂) + ··· + Aₙ/(aₙx + bₙ)

Each linear factor gets one fraction with a constant numerator.
"Distinct" means no factor appears more than once.

🧪 Example scenario

Example: Integrating 1/(x² - 7x + 10).

Factor the denominator: (x - 2)(x - 5).
Assume: 1/[(x - 2)(x - 5)] = A/(x - 2) + B/(x - 5).
Common denominator gives the numerator A(x - 5) + B(x - 2) = (A + B)x + (-5A - 2B), which must equal 1.
Equate coefficients: A + B = 0 and -5A - 2B = 1.
Solve: A = -1/3, B = 1/3.
Integrate: -1/3 ln|x - 2| + 1/3 ln|x - 5| + C.
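
The coefficient-matching step amounts to solving a small linear system. The sketch below solves the two equations from this example with exact fractions; the helper solve_2x2 and the test point x = 7 are illustrative assumptions, and Cramer's rule is just one convenient way to solve a 2-by-2 system.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*A + a12*B = b1 and a21*A + a22*B = b2 by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    A = Fraction(b1 * a22 - a12 * b2) / det
    B = Fraction(a11 * b2 - b1 * a21) / det
    return A, B

# A + B = 0 and -5A - 2B = 1, from equating coefficients
A, B = solve_2x2(1, 1, 0, -5, -2, 1)
print(A, B)

# Spot-check the decomposition at an arbitrary point away from the roots 2 and 5
x = Fraction(7)
lhs = 1 / ((x - 2) * (x - 5))
rhs = A / (x - 2) + B / (x - 5)
print(lhs == rhs)
```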

🔁 Case 2: Repeated linear factors

🔁 When a factor repeats

When one linear factor appears m times (like (ax + b)^m) and all others are distinct, the decomposition includes m terms for the repeated factor:

A₁/(ax + b) + A₂/(ax + b)² + ··· + Aₘ/(ax + b)^m

Each power from 1 up to m gets its own fraction.
Distinct factors still get single fractions as in Case 1.

🧪 Example scenario

Example: Integrating (x² + x - 1)/(x³ + x²).

Factor: x²(x + 1), so the factor x appears twice.
Assume: A/x + B/x² + C/(x + 1).
Equate numerators: x² + x - 1 = Ax(x + 1) + B(x + 1) + Cx², which gives B = -1, A = 2, C = -1.
Integrate: 2 ln|x| + 1/x - ln|x + 1| + C.

⚠️ Multiple repeated factors

If two or more factors are repeated, each gets its own series of terms.
Example: For x²(x + 1)², use A/x + B/x² + C/(x + 1) + D/(x + 1)².

🟦 Case 3 & 4: Quadratic factors

🟦 What counts as quadratic

A factor ax² + bx + c is considered quadratic only if it cannot be factored into linear terms (has no real roots) and a ≠ 0.

If it factors into linear terms, use Case 1 or 2 instead.
Don't confuse: x² + 1 is quadratic (no real roots), but x² - 4 = (x - 2)(x + 2) is not.

🟦 Distinct quadratic factors (Case 3)

When the denominator is a product of distinct quadratic factors, each gets a linear numerator:

p(x) / q(x) = (A₁x + B₁)/(a₁x² + b₁x + c₁) + (A₂x + B₂)/(a₂x² + b₂x + c₂) + ···

Unlike linear factors (constant numerator), quadratic factors need Ax + B in the numerator.
Solve for both A and B constants for each quadratic factor.

🔁 Repeated quadratic factors (Case 4)

When a quadratic factor repeats m times, include m terms with increasing powers:

(A₁x + B₁)/(ax² + bx + c) + (A₂x + B₂)/(ax² + bx + c)² + ··· + (Aₘx + Bₘ)/(ax² + bx + c)^m

Each power gets a linear numerator.
Distinct quadratic factors still get single fractions as in Case 3.

🧪 Example scenario

Example: Integrating 1/[(x² + 1)²(x² + 4)].

x² + 1 is repeated (quadratic), x² + 4 is distinct (quadratic).
Assume: (Ax + B)/(x² + 1) + (Cx + D)/(x² + 1)² + (Ex + F)/(x² + 4).
Expand to a fifth-degree numerator, equate all six coefficients.
Solve the system: A = 0, B = -1/9, C = 0, D = 1/3, E = 0, F = 1/9.
Integrate using arctangent formulas and results from previous sections.
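
With six unknowns it is easy to make an arithmetic slip, so a spot check of the solved constants can be worthwhile. This sketch compares the original fraction with the decomposition at several rational points using exact arithmetic; the function names and sample points are illustrative assumptions.

```python
from fractions import Fraction

# Nonzero constants from the example (A = C = E = 0)
B, D, F = Fraction(-1, 9), Fraction(1, 3), Fraction(1, 9)

def original(x):
    return 1 / ((x * x + 1) ** 2 * (x * x + 4))

def decomposed(x):
    return B / (x * x + 1) + D / (x * x + 1) ** 2 + F / (x * x + 4)

# Exact comparison at x = -3, -2.5, ..., 3
samples = [Fraction(k, 2) for k in range(-6, 7)]
all_match = all(original(x) == decomposed(x) for x in samples)
print(all_match)
```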

🔍 Common pitfalls and tips

🔍 Degree check first

Always verify that the numerator's degree is less than the denominator's.
If not, perform polynomial division or use algebraic tricks to reduce it.

🔍 Factor completely

The method depends on correctly factoring the denominator.
Check whether quadratics can be factored further into linear terms.

🔍 Systematic coefficient matching

After getting a common denominator, expand fully.
Match every power of x: constant, x, x², x³, etc.
Each equation gives one constraint on the unknown constants.

Miscellaneous Integration Methods

6.5 Miscellaneous Integration Methods

🧭 Overview

🧠 One-sentence thesis

Beyond standard integration techniques, methods like the Leibniz integral rule (differentiation under the integral sign), half-angle substitutions, and fractional derivatives extend the toolkit for evaluating challenging integrals and defining derivatives of non-integer order.

📌 Key points (3–5)

Leibniz integral rule: Differentiate a known integral with respect to a constant parameter inside the integrand to produce a new integral formula.
Half-angle substitution: The substitution t = tan(θ/2) converts rational functions of sine and cosine into rational functions of t, enabling integration via partial fractions.
Beta and Gamma functions: Special functions defined by integrals that can be rewritten through substitutions and relate to each other via a product formula.
Fractional derivatives: The Riemann-Liouville definition extends derivatives to non-integer orders (e.g., order 1/2) using an integral formula involving the Gamma function.
Common confusion: The Leibniz rule requires "working backwards"—to find a desired integral, identify which simpler integral to differentiate with respect to a parameter.

🔄 Leibniz integral rule (differentiation under the integral sign)

🔄 Core mechanism

The Leibniz integral rule allows moving differentiation with respect to a parameter inside an integral: d/dα ∫ f(α,x) dx = ∫ (∂f/∂α) dx when the derivative is continuous.

Start with a known integral involving a constant parameter (e.g., α).
Differentiate both sides with respect to that parameter.
The differentiation operator moves inside the integral on the left side.
This produces a new integral formula.
Example: From ∫ e^(αx) dx = (1/α)e^(αx) + C, differentiating with respect to α yields ∫ x e^(αx) dx.
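
Written out, the example above proceeds as follows. This is a sketch of the computation, with the constant of integration treated as independent of α.

```latex
\begin{aligned}
\frac{d}{d\alpha}\int e^{\alpha x}\,dx
  &= \int \frac{\partial}{\partial\alpha}\,e^{\alpha x}\,dx
   = \int x\,e^{\alpha x}\,dx \\[4pt]
\frac{d}{d\alpha}\left(\frac{1}{\alpha}\,e^{\alpha x}\right)
  &= -\frac{1}{\alpha^2}\,e^{\alpha x} + \frac{x}{\alpha}\,e^{\alpha x} \\[4pt]
\Longrightarrow\quad
\int x\,e^{\alpha x}\,dx
  &= \left(\frac{x}{\alpha} - \frac{1}{\alpha^2}\right)e^{\alpha x} + C
\end{aligned}
```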

🎯 Working backwards strategy

If you want to evaluate a specific integral using this rule, identify which simpler integral contains the target when differentiated.
The parameter α is treated as a variable only during differentiation, then can be set to a specific value.
Example: To find ∫ dx/(1+x²)², start from the known formula ∫ dx/(a²+x²) and differentiate with respect to a.

📐 Application to definite integrals

The method also applies to definite integrals, including improper integrals.
Example: The excerpt shows ∫₀^∞ e^(-x²) dx = (1/2)√π by constructing an auxiliary function φ(α) and differentiating under the integral sign.
Don't confuse: The limits of integration remain fixed; only the integrand is differentiated with respect to the parameter.

🔺 Half-angle substitution

🔺 Geometric foundation

The half-angle substitution t = tan(θ/2) identifies points on the unit circle by the slope of a line from (-1,0) through the point to the vertical line x=1.

Each point P on the unit circle (except A = (-1,0)) corresponds to a unique slope t.
The inscribed angle from A to P is half the central angle θ.
This gives: sin θ = 2t/(1+t²), cos θ = (1-t²)/(1+t²), dθ = 2dt/(1+t²).

⚙️ Converting trigonometric integrals

Rational functions of sin θ and cos θ become rational functions of t.
These can then be integrated using partial fractions or other algebraic methods.
Example: ∫ dθ/(1 + sin θ + cos θ) becomes ∫ dt/(t+1) after substitution.
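
The example above can be checked numerically on a definite interval. In this sketch the limits 0 to π/2 are an illustrative choice (they map to t from 0 to 1 under t = tan(θ/2)), and the midpoint helper is an assumption of the sketch.

```python
import math

def midpoint(f, a, b, n=20_000):
    """Midpoint rectangle rule with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Original integrand in theta on [0, pi/2]
theta_form = midpoint(lambda th: 1 / (1 + math.sin(th) + math.cos(th)),
                      0.0, math.pi / 2)

# After t = tan(theta/2): theta in [0, pi/2] corresponds to t in [0, 1]
t_form = midpoint(lambda t: 1 / (t + 1), 0.0, 1.0)

print(theta_form, t_form, math.log(2))
```

Both estimates should be close to ln 2, the exact value of the integral of 1/(t + 1) from 0 to 1.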

🔗 Half-angle identities

The substitution yields useful identities:

tan(θ/2) = sin θ/(1 + cos θ) = (1 - cos θ)/sin θ
These can sometimes simplify integrals directly without full substitution.
Example: ∫ sin θ/(1 + cos θ) dθ = ∫ tan(θ/2) dθ, which is easier to integrate.

🎲 Special functions: Gamma and Beta

🎲 Gamma function transformations

The Gamma function Γ(t) = ∫₀^∞ x^(t-1) e^(-x) dx can be rewritten via substitution x = y²:
Γ(t) = 2∫₀^∞ y^(2t-1) e^(-y²) dy
This form reveals that Γ(1/2) = √π using the result from the Leibniz rule section.

🎲 Beta function definition and transformations

The Beta function: B(x,y) = ∫₀¹ t^(x-1) (1-t)^(y-1) dt for all x > 0 and y > 0.

Alternative form via substitution u = t/(1-t): B(x,y) = ∫₀^∞ u^(x-1)/(1+u)^(x+y) du
Relationship to Gamma: B(x,y) = Γ(x)Γ(y)/Γ(x+y)
The Beta function is symmetric: B(x,y) = B(y,x)
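
The Beta and Gamma relationship can be spot-checked with Python's standard library, which provides math.gamma. The sketch below is illustrative only: beta_numeric is a plain midpoint sum, so it is only reliable when x ≥ 1 and y ≥ 1 (no endpoint singularities), and the sample arguments are arbitrary.

```python
import math

def beta_numeric(x, y, n=100_000):
    """Midpoint estimate of the integral of t^(x-1) (1-t)^(y-1) over [0, 1]."""
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** (x - 1) * (1 - (i + 0.5) * h) ** (y - 1)
                   for i in range(n))

def beta_gamma(x, y):
    """Beta function via B(x, y) = Gamma(x) Gamma(y) / Gamma(x + y)."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

print(beta_numeric(2, 3), beta_gamma(2, 3))   # both near 1/12
print(beta_gamma(2.5, 4), beta_gamma(4, 2.5)) # symmetry
print(math.gamma(0.5) ** 2, math.pi)          # Gamma(1/2) = sqrt(pi)
```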

🔬 Applications

These functions appear in evaluating integrals involving powers and exponentials.
Substitutions can transform integrals into forms involving these special functions.
The relationship between Beta and Gamma allows converting between different integral forms.

🧮 Fractional derivatives

🧮 Riemann-Liouville definition

For 0 < α < 1, the fractional derivative of order α of a function f(x) is: d^α/dx^α f(x) = (1/Γ(1-α)) d/dx ∫₀ˣ f(t)/(x-t)^α dt

This extends the concept of derivatives to non-integer orders.
The definition involves both integration and differentiation.
The Gamma function normalizes the formula.

🧮 Example calculation

For f(x) = x and α = 1/2: d^(1/2)/dx^(1/2) (x) = 2√x/√π
The calculation requires substitution to evaluate the integral before differentiating.
Don't confuse: This is not simply "half of the first derivative"; it's a fundamentally different operation.
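
The calculation for f(x) = x and α = 1/2 can be sketched as follows, using the substitution u = x − t (one possible choice) and the value Γ(1/2) = √π.

```latex
\begin{aligned}
\int_0^x \frac{t}{(x-t)^{1/2}}\,dt
  &= \int_0^x \frac{x-u}{u^{1/2}}\,du \qquad (u = x - t) \\
  &= \Big[\,2x\,u^{1/2} - \tfrac{2}{3}\,u^{3/2}\Big]_0^x
   = \tfrac{4}{3}\,x^{3/2} \\[4pt]
\frac{d^{1/2}}{dx^{1/2}}(x)
  &= \frac{1}{\Gamma(1/2)}\,\frac{d}{dx}\!\left(\tfrac{4}{3}\,x^{3/2}\right)
   = \frac{2\sqrt{x}}{\sqrt{\pi}}
\end{aligned}
```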

🔗 Combining fractional and integer orders

Two half-derivatives can combine to give a whole derivative: d^(1/2)/dx^(1/2) (d^(1/2)/dx^(1/2) f(x)) = d/dx f(x)
For integer n ≥ 1 and fractional 0 < α < 1: d^(n+α)/dx^(n+α) f(x) = d^α/dx^α (d^n/dx^n f(x))
Take the integer-order derivative first, then apply the fractional derivative.

Numerical Integration Methods

6.6 Numerical Integration Methods

🧭 Overview

🧠 One-sentence thesis

Modern computing has made numerical integration methods—rectangle, trapezoid, Simpson's rule, and Gaussian quadrature—practical and efficient for approximating definite integrals that lack closed-form antiderivatives, with differences in efficiency now largely negligible.

📌 Key points (3–5)

Why numerical methods matter: Many functions have no closed-form antiderivative, so approximation methods are essential for evaluating definite integrals.
Core approach: All methods approximate the integral as a weighted sum of function values at specific points.
Three classical methods: Rectangle method (simplest), trapezoid rule (uses trapezoids), and Simpson's rule (uses parabolic curves).
Common confusion: Simpson's rule is slightly more efficient than trapezoid, which beats rectangle method—but with modern computing, all achieve high accuracy quickly, making efficiency differences negligible.
Advanced technique: Gaussian quadrature transforms any interval to [-1, 1] and uses pre-calculated optimal points and weights from tables.

📐 Classical approximation methods

📦 Rectangle method basics

The rectangle method divides an interval into subintervals and approximates the area under the curve using rectangles.

How it works: Partition the interval [a, b] into n equal subintervals; use function values at left endpoints, right endpoints, or midpoints as rectangle heights.
Key insight: More rectangles → better approximation.
Example: For the integral of sin(x²) from 0 to √π with 100,000 subintervals, all three variants (left, right, midpoint) gave accuracy to 9 decimal places.
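
A minimal sketch of the rectangle method follows, written in Python rather than the Octave the text uses later. The function name rectangle and its rule parameter are hypothetical names introduced for this illustration.

```python
import math

def rectangle(f, a, b, n, rule="midpoint"):
    """Rectangle method on n equal subintervals of [a, b].

    rule selects where the height is sampled in each subinterval:
    "left", "right", or "midpoint".
    """
    h = (b - a) / n
    offset = {"left": 0.0, "right": 1.0, "midpoint": 0.5}[rule]
    return h * sum(f(a + (i + offset) * h) for i in range(n))

f = lambda x: math.sin(x * x)
upper = math.sqrt(math.pi)
estimates = {rule: rectangle(f, 0.0, upper, 100_000, rule)
             for rule in ("left", "right", "midpoint")}
for rule, value in estimates.items():
    print(rule, value)
```

For this particular integrand the left and right variants agree unusually well because the function is (numerically) zero at both endpoints, so the two sums differ only by a negligible end term.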

🔺 Trapezoid rule

The trapezoid rule approximates the area using trapezoids instead of rectangles, with the top edge being a straight line connecting consecutive function values.

Formula structure: For partition with equal width h = (b - a)/n and function values y₀, y₁, ..., yₙ:
- Integral ≈ (h/2) × (y₀ + 2y₁ + 2y₂ + ... + 2yₙ₋₁ + yₙ)
Why it helps: Takes advantage of the function's changing slope by using slanted edges instead of horizontal ones.
Pattern in weights: First and last values get weight h/2; all middle values get weight h.
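
The trapezoid formula translates almost directly into code. This is a sketch in Python (the text itself points to Octave's built-in trapz); the function name trapezoid and the test integrands are illustrative.

```python
def trapezoid(f, a, b, n):
    """Trapezoid rule: (h/2)(y0 + 2*y1 + ... + 2*y_{n-1} + yn)."""
    h = (b - a) / n
    ys = [f(a + i * h) for i in range(n + 1)]
    return (h / 2) * (ys[0] + 2 * sum(ys[1:-1]) + ys[-1])

# Exact for straight lines, approximate for curved graphs
print(trapezoid(lambda x: 3 * x + 1, 0.0, 2.0, 4))   # integral is 8
print(trapezoid(lambda x: x * x, 0.0, 1.0, 1000))    # integral is 1/3
```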

🎯 Simpson's rule

Simpson's rule uses parabolic curves (not straight lines) as the top edges of approximating regions, working with pairs of subintervals.

Formula structure: For n ≥ 2 (must be even):
- Integral ≈ (h/3) × (y₀ + 4y₁ + 2y₂ + 4y₃ + 2y₄ + ... + 2yₙ₋₂ + 4yₙ₋₁ + yₙ)
Key requirement: Number of subintervals must be even because the method works with pairs.
Weight pattern: Alternates 4 and 2 for interior points (starting with 4), with first and last getting coefficient 1.
Example: For sin(x²) integral, Simpson's rule with 100,000 subintervals gave essentially the exact value.
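
Simpson's weight pattern can be sketched with list slicing. The function name simpson is illustrative, and the even-n check mirrors the requirement stated above.

```python
def simpson(f, a, b, n):
    """Simpson's rule: (h/3)(y0 + 4*y1 + 2*y2 + ... + 4*y_{n-1} + yn)."""
    if n < 2 or n % 2:
        raise ValueError("n must be even and at least 2")
    h = (b - a) / n
    ys = [f(a + i * h) for i in range(n + 1)]
    odd = sum(ys[1:-1:2])    # y1, y3, ..., y_{n-1} get coefficient 4
    even = sum(ys[2:-1:2])   # y2, y4, ..., y_{n-2} get coefficient 2
    return (h / 3) * (ys[0] + 4 * odd + 2 * even + ys[-1])

# Simpson's rule reproduces cubic integrals even with only two subintervals
print(simpson(lambda x: x ** 3, 0.0, 2.0, 2))   # integral is 4
```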

⚖️ Efficiency comparison

Method	Shape used	Relative efficiency	Modern reality
Rectangle	Rectangles	Least efficient	Negligible difference
Trapezoid	Trapezoids	Slightly better	Negligible difference
Simpson's	Parabolic regions	Most efficient (slightly)	Negligible difference

Historical context: Before modern computing, efficiency differences mattered; alternative methods were created to reduce calculation burden.
Modern reality: With computers, all methods achieve high accuracy (9+ decimal places) in milliseconds, making efficiency differences practically irrelevant.
Don't confuse: "More efficient" means fewer calculations are needed for a given accuracy. The methods still differ in this respect, but computing speed makes the difference unimportant for most applications.

🎓 Gaussian quadrature

🔄 Core transformation idea

Gaussian quadrature transforms any integral over [a, b] into an integral over the standard interval [-1, 1], then uses pre-calculated optimal points and weights.

Substitution formula: u = (2x - a - b)/(b - a), which gives x = ((b - a)/2)u + (a + b)/2
Result: Integral from a to b of f(x)dx = ((b - a)/2) × integral from -1 to 1 of g(u)du
- where g(u) = f(((b - a)/2)u + (a + b)/2)
Why standardize: Allows use of pre-computed optimal points and weights that work for any integral.

📊 Using the tables

Table 6.1: Provides points aᵢ and weights wᵢ for n = 2 to 10 points in [-1, 1].
Approximation formula: After transformation, integral ≈ ((b - a)/2) × sum of wᵢ × g(aᵢ) for i = 1 to n.
Symmetry pattern: Points come in ± pairs; when n is odd, the point 0 is also included.
Example: For integral of 1/(1 + x³) from 0 to 2 with n = 4 points, the method gave 1.091621 vs true value 1.090002.
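
The transformation and weighted sum can be sketched with the smallest standard rule. The code below uses the two-point Gauss-Legendre rule (points ±1/√3, weights 1), which is a well-known rule but is an assumption of this sketch and not a copy of Table 6.1; the name gauss_legendre is illustrative. With only two points it will be less accurate than the four-point result quoted above.

```python
import math

# Standard two-point Gauss-Legendre rule on [-1, 1]
POINTS = (-1 / math.sqrt(3), 1 / math.sqrt(3))
WEIGHTS = (1.0, 1.0)

def gauss_legendre(f, a, b):
    """Approximate the integral of f over [a, b] after mapping to [-1, 1]."""
    half = (b - a) / 2   # scale factor (b - a)/2
    mid = (a + b) / 2    # shift (a + b)/2
    return half * sum(w * f(half * u + mid) for u, w in zip(POINTS, WEIGHTS))

# Two points already integrate cubics exactly
print(gauss_legendre(lambda x: x ** 3, 0.0, 2.0))        # integral is 4
# Rough estimate for the text's example integrand
print(gauss_legendre(lambda x: 1 / (1 + x ** 3), 0.0, 2.0))
```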

♾️ Improper integrals

Gaussian quadrature can handle improper integrals of the form: integral from 0 to ∞ of f(x)e⁻ˣ dx.

Table 6.2: Provides special points and weights for n = 3, 4, or 5 points in [0, ∞).
Direct formula: Integral ≈ sum of wᵢ × f(aᵢ) (no transformation factor needed).
Connection to special functions: The points aᵢ are roots of Laguerre polynomials of degree n.
Example: For integral of x⁵e⁻ˣ from 0 to ∞ with n = 3, approximation was 119.997... vs true value Γ(6) = 5! = 120.
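
The x⁵e⁻ˣ example can be reproduced in a few lines. The points and weights below are rounded standard three-point Gauss-Laguerre values supplied for this sketch; they are not copied from the text's Table 6.2, so the last digits of the result may differ from the value quoted above.

```python
import math

# Rounded standard three-point Gauss-Laguerre points and weights (assumed values)
POINTS = (0.4157745568, 2.2942803603, 6.2899450829)
WEIGHTS = (0.7110930099, 0.2785177336, 0.0103892565)

def gauss_laguerre(f):
    """Approximate the integral of f(x) * exp(-x) over [0, infinity)."""
    return sum(w * f(a) for a, w in zip(POINTS, WEIGHTS))

approx = gauss_laguerre(lambda x: x ** 5)
exact = math.gamma(6)   # Gamma(6) = 5! = 120
print(approx, exact)
```

Note that f here is only the x⁵ factor: the e⁻ˣ factor is built into the weights, which is why no transformation factor appears.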

🔗 Unified framework

All numerical integration methods share a common structure:

General form: Integral ≈ sum of wᵢ × f(aᵢ)
What varies: The choice of points aᵢ and weights wᵢ
Rectangle/trapezoid/Simpson's: Points are evenly spaced; weights follow simple patterns
Gaussian quadrature: Points and weights are optimally chosen (not evenly spaced) for maximum accuracy with fewer evaluations

💻 Computational implementation

🛠️ Domain-specific languages

Why DSLs: Traditional programming requires loops; scientific computing languages like MATLAB/Octave make implementation much simpler.
Octave advantage: Free, open-source alternative to MATLAB.
One-liner capability: Rectangle method can be implemented in a single command line.

📝 Syntax examples

The excerpt demonstrates Octave commands:

linspace(a, b, n): Creates n equally spaced points from a to b (including endpoints)
Array indexing: (1:end-1) gets all but last element; (2:end) gets all but first
Element-wise operations: The dot before operators (e.g., .^2) applies operation to each element
Skipping elements: (2:2:end-1) starts at position 2, moves in increments of 2
Example: linspace(1, 7, 4) creates [1, 3, 5, 7], dividing [1, 7] into 3 subintervals of width 2

⚡ Built-in functions

trapz function: Octave/MATLAB includes built-in trapezoid rule implementation.
Practical advice: Generally better to use built-in functions instead of implementing your own.
Performance: Calculations with 100,000 subintervals take only a few thousandths of a second.
Accuracy achieved: All methods in examples reached 9+ decimal places (equivalent to measuring Detroit-Chicago distance within a toothpick's thickness).

🔢 Advanced implementation

For Gaussian quadrature with many points:

Array operations: Define arrays of points and weights, then use element-wise operations
Conciseness: Seven-point Gaussian quadrature implemented in three lines: define points array, define weights array, compute weighted sum
Don't confuse: Element-wise operations (./) vs matrix operations (/)—the dot is crucial for applying operations to each array element independently

Ellipses

7.1 Ellipses

🧭 Overview

🧠 One-sentence thesis

An ellipse is defined geometrically as the set of all points whose distances to two fixed foci sum to a constant, and this definition leads to a standard algebraic equation, measurable eccentricity, and a remarkable reflection property used in optics and astronomy.

📌 Key points (3–5)

Geometric definition: An ellipse is all points in a plane where the sum of distances to two fixed points (foci) is constant, unlike a circle which uses one fixed point and one fixed distance.
Standard equation and anatomy: The equation x²/a² + y²/b² = 1 (with a > b > 0) describes an ellipse centered at the origin with semi-major axis a, semi-minor axis b, and foci at (±c, 0) where c = √(a² − b²).
Eccentricity measures "ovalness": The ratio e = c/a (where 0 < e < 1) quantifies how "squished" an ellipse is; e = 0 is a circle, e approaching 1 is nearly a line segment.
Common confusion: The larger denominator in x²/__ + y²/__ = 1 tells you which axis is principal—don't assume the x-axis is always the major axis.
Reflection property: Light from one focus reflects off any point on the ellipse to the other focus, a consequence of Fermat's Principle and the geometry of the tangent line.

📐 Geometric foundation

📐 Definition and construction

Ellipse: The set of all points in a plane such that the sum of the distances from each point to two fixed points (foci) is the same constant.

This contrasts with a circle, which uses a single fixed point (center) and fixed distance (radius).
Physical construction: Pin two points on a board, loop a string around them with slack, pull taut with a pencil, and trace—the pencil draws an ellipse because the string length (d₁ + d₂) stays constant.
The symmetry becomes obvious through this hands-on method.

🏷️ Terminology and parts

Term	Definition
Foci	The two fixed points (plural of focus)
Center	Midpoint between the foci
Principal axis	Line containing the foci
Vertexes	Points where the ellipse intersects the principal axis
Major axis	Chord joining the vertexes (length 2a)
Minor axis	Chord through center perpendicular to major axis (length 2b)
Semi-major/minor axes	Halves of the major/minor axes (lengths a and b)

A circle is the special case where both foci coincide at the center.

🧮 Algebraic representation

🧮 Deriving the standard equation

Starting from the geometric definition with foci at (±c, 0) and constant sum 2a:

Use the distance formula: √((x + c)² + y²) + √((x − c)² + y²) = 2a
After algebraic manipulation (isolating radicals, squaring, simplifying), the equation reduces to:
- x²/a² + y²/(a² − c²) = 1
Define b² = a² − c², so the standard form is x²/a² + y²/b² = 1 with a > b > 0.

📏 Key relationships

Foci location: For any ellipse x²/a² + y²/b² = 1 with a > b > 0, the foci are at (±c, 0) where c = √(a² − b²).
Vertexes: At (±a, 0) on the x-axis (when x-axis is principal).
Minor axis endpoints: At (0, ±b).
Alternative form: If a > b but the equation is x²/b² + y²/a² = 1, the principal axis is the y-axis and foci are at (0, ±c).
Don't confuse: The largest denominator indicates the principal axis direction, not the variable name.

Example: x²/25 + y²/16 = 1 has principal axis along x (since 25 > 16), but x²/4 + y²/9 = 1 has principal axis along y (since 9 > 4).
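
The relationships above can be sketched in a few lines of Python. The helper name `ellipse_anatomy` and its return format are illustrative assumptions, not part of the source; it takes the two denominators of x²/__ + y²/__ = 1 and reports the principal axis, the semi-axes, and the foci.

```python
import math

def ellipse_anatomy(x_den, y_den):
    """Describe x^2/x_den + y^2/y_den = 1 (denominators positive and unequal)."""
    a2, b2 = max(x_den, y_den), min(x_den, y_den)
    a, b = math.sqrt(a2), math.sqrt(b2)
    c = math.sqrt(a2 - b2)              # distance from the center to each focus
    if x_den > y_den:                   # larger denominator sits under x^2
        return {"axis": "x", "a": a, "b": b, "foci": [(-c, 0.0), (c, 0.0)]}
    return {"axis": "y", "a": a, "b": b, "foci": [(0.0, -c), (0.0, c)]}

print(ellipse_anatomy(25, 16))   # principal axis along x, foci at (±3, 0)
print(ellipse_anatomy(4, 9))     # principal axis along y, foci at (0, ±√5)
```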

📊 Eccentricity

📊 Definition and meaning

Eccentricity (e): The ratio of the distance between foci to the length of the major axis.

For x²/a² + y²/b² = 1 with a > b > 0:

e = c/a = √(a² − b²)/a
Range: 0 < e < 1 for ellipses
Boundary cases: e = 0 is a circle; e = 1 is a line segment

🌍 Interpreting eccentricity

Measures "ovalness": The closer e gets to 1, the more "squished" or elongated the ellipse.
Real-world example: Earth's orbit has e = 0.017 (nearly circular); Venus and Neptune are even rounder at e = 0.007; Pluto's orbit is most eccentric at e = 0.252.
Alternative equation form: y² = (1 − e²)(a² − x²) expresses the ellipse in terms of eccentricity.

🔬 Properties and applications

📐 Area calculation

The area inside x²/a² + y²/b² = 1 is πab.

Derivation uses symmetry (four times the first-quadrant area) and integration.
In terms of eccentricity: Area = πa²√(1 − e²).
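
As a rough numerical check of the area formula, the sketch below estimates four times the first-quadrant area with a midpoint rule and compares it with πab and πa²√(1 − e²). The function name, the sample values a = 5 and b = 4, and the step count are assumptions chosen for illustration.

```python
import math

def ellipse_area_numeric(a, b, n=100_000):
    """Midpoint-rule estimate of 4 * integral from 0 to a of b*sqrt(1 - x^2/a^2) dx."""
    h = a / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += b * math.sqrt(1.0 - (x / a) ** 2)
    return 4.0 * total * h

a, b = 5.0, 4.0
e = math.sqrt(a * a - b * b) / a
print(ellipse_area_numeric(a, b))               # numerical estimate
print(math.pi * a * b)                          # pi*a*b
print(math.pi * a * a * math.sqrt(1 - e * e))   # same area via eccentricity
```

The three printed values should agree to several decimal places; the estimate is only approximate because the integrand has a vertical tangent at x = a.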

💡 Reflection property

Reflection property: Light shone from one focus to any point on the ellipse will reflect to the other focus.

This follows from Fermat's Principle: angle of incidence equals angle of reflection (measured relative to the tangent line, or equivalently, the normal line).
Proof sketch: Show that the normal line at point P bisects the angle ∠F₁PF₂, which makes the angles α₁ and α₂ equal.
The proof uses the tangent line equation xx₀/a² + yy₀/b² = 1 at point (x₀, y₀) and distance calculations.
Applications: Elliptical mirrors and reflectors. Planetary orbits are a separate occurrence of ellipses rather than a use of reflection: each planet moves on an ellipse with the Sun at one focus and sweeps equal areas in equal times.

🌌 Natural occurrences

Planetary orbits around the Sun are elliptical (Kepler's laws).
Ancient Greeks derived ellipse properties purely geometrically; modern analytic geometry uses coordinate systems to derive the same results algebraically.

Parabolas

7.2 Parabolas

🧭 Overview

🧠 One-sentence thesis

Parabolas are defined geometrically as the set of points equidistant from a fixed focus and a fixed directrix, which produces curves with a constant eccentricity of 1 and a reflection property that makes them useful in engineering applications like headlights and satellite dishes.

📌 Key points (3–5)

Geometric definition: A parabola is the set of all points equidistant from a fixed point (focus) and a fixed line (directrix).
Eccentricity is always 1: Unlike ellipses where the ratio of distances is less than 1, parabolas have a ratio of exactly 1, meaning there is only one vertex.
Standard forms: The equation 4py = x² (vertical axis) or 4px = y² (horizontal axis) describes parabolas with vertex at the origin; the parameter p determines focus location and directrix position.
Common confusion: Not every U-shaped curve is a parabola (e.g., y = x⁴ is not); only curves of the form y = ax² (or equivalent) satisfy the geometric definition.
Reflection property: Light from the focus reflects off the parabola parallel to its axis, which explains applications in headlights, flashlights, and satellite dishes.

📐 Geometric definition and construction

📐 What defines a parabola

A parabola is the set of all points in a plane that are equidistant from a fixed point (the focus) and a fixed line (the directrix).

For any point P on the parabola, the distance PF (to the focus F) equals the distance PG (to the directrix D).
This is similar to the alternative definition of an ellipse, but the ratio PF/PG equals 1 instead of some eccentricity e < 1.
The vertex is the point halfway between the focus and directrix; it is the closest point on the parabola to the directrix.
The axis of the parabola is the line through the focus perpendicular to the directrix.

🛠️ Physical construction method

The excerpt describes a drafting-triangle construction:

Cut a string to length AB (one side of a triangle).
Pin one end at vertex A of the triangle, the other end at a focus F between A and B.
Hold the string taut against edge AB at point P, then slide edge BC along the directrix D.
The traced curve is a parabola because PF = PB: the string has length AB = AP + PF, and the edge itself has length AB = AP + PB, so PF = PB.

Example: Moving the triangle while keeping the string taut ensures every drawn point P satisfies the equal-distance condition.

🔢 Eccentricity of a parabola

The ratio PF/PG = 1 for all points on a parabola.
Therefore, the eccentricity of a parabola is always 1.
This means there is no second vertex, unlike an ellipse (where e < 1 forces two vertices).

🧮 Algebraic equations and standard forms

🧮 Deriving the equation (vertical parabola)

Starting with focus at (0, p) where p > 0, and directrix y = −p:

The vertex is at the origin (0, 0).
For any point (x, y) on the parabola, distance to focus = distance to directrix:
- Distance to focus: √((x − 0)² + (y − p)²)
- Distance to directrix: |y + p|
Setting these equal and squaring: (x − 0)² + (y − p)² = (y + p)²
Simplifying: x² + y² − 2py + p² = y² + 2py + p²
Result: x² = 4py or equivalently y = (1/(4p)) x²

📊 Standard forms and their properties

Form	Focus	Directrix	Vertex	Direction
4py = x² (p > 0)	(0, p)	y = −p	(0, 0)	Opens upward
4py = x² (p < 0)	(0, p)	y = −p	(0, 0)	Opens downward
4px = y² (p > 0)	(p, 0)	x = −p	(0, 0)	Opens rightward
4px = y² (p < 0)	(p, 0)	x = −p	(0, 0)	Opens leftward

🔍 Recognizing parabolas from equations

Any curve of the form y = ax² (where a ≠ 0) is a parabola.
To find focus and directrix: p = 1/(4a), so focus is at (0, 1/(4a)) and directrix is y = −1/(4a).
Example: For y = x², we have a = 1, so p = 1/4; focus is at (0, 1/4) and directrix is y = −1/4.
The excerpt notes that y = ax² + bx + c is also a parabola (proof left as exercise).
Don't confuse: Not every U-shaped curve is a parabola; for example, y = x⁴ is not a parabola because it doesn't satisfy the geometric definition.
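
A minimal sketch of the recipe p = 1/(4a), with a check of the equal-distance definition at a few sample points. The function names and the sample points are illustrative assumptions.

```python
import math

def focus_and_directrix(a):
    """For y = a*x^2 with a != 0, return (focus, directrix height) using p = 1/(4a)."""
    p = 1.0 / (4.0 * a)
    return (0.0, p), -p

def is_equidistant(a, x, tol=1e-9):
    """Check the geometric definition at the point (x, a*x^2)."""
    (fx, fy), d = focus_and_directrix(a)
    y = a * x * x
    return abs(math.hypot(x - fx, y - fy) - abs(y - d)) < tol

print(focus_and_directrix(1.0))   # focus (0, 1/4), directrix y = -1/4
print(all(is_equidistant(1.0, x) for x in (-3.0, -0.5, 0.0, 2.0)))
```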

🪞 Reflection property and applications

🪞 The reflection property

Light shone from the focus to any point on the parabola will reflect in a path parallel to the axis of the parabola.

This means light emanating from the focus reflects off the parabola and travels parallel to the axis.
The property works in both directions: incoming parallel light reflects to the focus.

🔬 Proof of the reflection property

For the parabola 4px = y²:

Consider light from focus F = (p, 0) hitting point P = (x₀, y₀) on the parabola.
The tangent line at P has equation 2p(x + x₀) = y₀y.
This tangent line intersects the x-axis at Q = (−x₀, 0).
The focal radius FP has length: √((p − x₀)² + (0 − y₀)²) = p + x₀.
The distance FQ also equals p + x₀.
Since FQ = FP in triangle FPQ, the angles ∠FPQ = ∠FQP = β.
This shows the angle of incidence equals the angle of reflection, satisfying Fermat's Principle for curved surfaces.
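
The key step FP = FQ = p + x₀ can be checked numerically. This sketch assumes p > 0 and picks the point from its y-coordinate; the function name and the sample values p = 2, y₀ = 3 are illustrative.

```python
import math

def reflection_check(p, y0):
    """For P = (x0, y0) on 4*p*x = y^2 (p > 0), return (FP, FQ, p + x0)."""
    x0 = y0 * y0 / (4.0 * p)
    F = (p, 0.0)
    Q = (-x0, 0.0)                       # where the tangent line meets the x-axis
    FP = math.hypot(x0 - F[0], y0 - F[1])
    FQ = abs(F[0] - Q[0])
    return FP, FQ, p + x0

print(reflection_check(2.0, 3.0))
```

With p = 2 and y₀ = 3 the point is (1.125, 3), and all three returned values equal 3.125, so triangle FPQ is isosceles as the proof requires.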

🔧 Engineering applications

Paraboloid surfaces (formed by revolving a parabola around its axis):

Vehicle headlights: Used to have paraboloid reflective surfaces with a bulb at the focus, so light shines straight ahead in a solid beam.
Flashlights: Many still use this principle.
Satellite dishes and radio telescopes: Wide paraboloids with a signal receiver at the focus maximize reception of incoming reflected signals.

Example: A satellite dish focuses all incoming parallel signals to a single receiver point at the focus, amplifying the signal strength.

🎯 Tangent lines and focal radii

📏 Tangent line equations

For the parabola 4py = x², the tangent line at point (x₀, y₀) is:

2p(y + y₀) = x₀x

For the parabola 4px = y², the tangent line at point (x₀, y₀) is:

2p(x + x₀) = y₀y

These formulas simplify calculations involving tangent lines and are used in proving the reflection property.

📐 Focal radius

The focal radius is the distance from a point on the parabola to the focus.
For a point (x₀, y₀) on the parabola 4px = y², the focal radius FP = p + x₀.
This length plays a key role in the reflection property proof.

🚀 Applications: projectile trajectories

🚀 Parabolic trajectories

The excerpt presents a detailed example showing that projectile paths are parabolas:

An object launched from the ground with initial velocity v₀ at angle θ follows the path:
- y = −(gx²)/(2v₀² cos²θ) + x tan θ
This is a parabola for each angle θ.
Maximum horizontal distance v₀²/g occurs at θ = π/4 (45 degrees).
Maximum vertical height v₀²/(2g) occurs at θ = π/2 (straight up).
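
A short sketch of these trajectory facts. The value g = 32 ft/s² and the launch speed of 64 ft/s are assumptions for illustration; the range formula v₀² sin(2θ)/g is the positive root of the path equation set to zero.

```python
import math

G = 32.0  # ft/s^2, an assumed value of g

def height(x, v0, theta, g=G):
    """Height of the projectile above the ground at horizontal position x."""
    return -(g * x * x) / (2.0 * v0 ** 2 * math.cos(theta) ** 2) + x * math.tan(theta)

def horizontal_range(v0, theta, g=G):
    """Positive root of height(x) = 0, which is v0^2 * sin(2*theta) / g."""
    return v0 ** 2 * math.sin(2.0 * theta) / g

def max_height(v0, theta, g=G):
    """Vertex height of the trajectory: (v0 * sin(theta))^2 / (2g)."""
    return (v0 * math.sin(theta)) ** 2 / (2.0 * g)

v0 = 64.0
best_angle = max(range(1, 90), key=lambda d: horizontal_range(v0, math.radians(d)))
print(best_angle, horizontal_range(v0, math.pi / 4), max_height(v0, math.pi / 2))
```

Scanning whole-degree angles picks out 45°, where the range is v₀²/g = 128 ft; launching straight up gives the greatest height, v₀²/(2g) = 64 ft.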

🌐 Envelope of trajectories

A surprising result: the boundary (envelope) of all possible trajectories is itself a parabola.

Key insight: All parabolic trajectories share the same directrix y = v₀²/(2g), regardless of launch angle θ.

Proof sketch:

Each trajectory is a parabola with vertex at height (v₀² sin²θ)/(2g).
The directrix for each trajectory is at y = v₀²/(2g) (independent of θ).
All foci lie on a circle C₀ of radius v₀²/(2g) centered at the origin.
A point P inside the envelope lies on trajectories whose foci are on both C₀ and a circle C centered at P touching the directrix.
When C and C₀ intersect at exactly one point, P is on the envelope.
This envelope is a parabola with focus at the origin and directrix y = v₀²/g.

Example: Imagine launching a ball at all possible angles—the region reachable by the ball has a parabolic boundary.
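
The two facts behind the proof sketch (a common directrix, and foci on a circle about the origin) can be checked numerically. The vertex coordinates and p = 1/(4a) below come from completing the square in the trajectory equation; the function name and g = 32 ft/s² are assumptions, and the sketch applies to 0 < θ < π/2.

```python
import math

def trajectory_focus_directrix(v0, theta, g=32.0):
    """Focus and directrix height of one trajectory parabola (0 < theta < pi/2)."""
    h = v0 ** 2 * math.sin(theta) * math.cos(theta) / g   # vertex x-coordinate
    k = (v0 * math.sin(theta)) ** 2 / (2.0 * g)           # vertex height
    p = -(v0 * math.cos(theta)) ** 2 / (2.0 * g)          # p = 1/(4a), with a < 0
    return (h, k + p), k - p

for theta in (0.3, 0.8, 1.2):
    focus, directrix = trajectory_focus_directrix(64.0, theta)
    print(round(math.hypot(*focus), 9), round(directrix, 9))
```

For v₀ = 64 every line should print 64.0 twice: each focus is at distance v₀²/(2g) from the origin, and each directrix is y = v₀²/(2g).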

🌉 Suspension bridges

Suspension cables supporting a horizontal bridge form parabolas if the bridge's weight is uniformly distributed.
The cables hang in a parabolic shape, with vertical suspenders connecting the cable to the bridge deck.

Don't confuse: This assumes uniform weight distribution; if the weight distribution changes, the cable shape may not be parabolic.

Hyperbolas

7.3 Hyperbolas

🧭 Overview

🧠 One-sentence thesis

Hyperbolas are conic sections with eccentricity greater than 1, characterized by two symmetric branches, two foci, and asymptotic behavior that distinguishes them from ellipses and parabolas.

📌 Key points (3–5)

Eccentricity classification: Hyperbolas have eccentricity e > 1, completing the family of conic sections (circles e = 0, ellipses 0 < e < 1, parabolas e = 1).
Two equivalent definitions: Either as points with constant ratio (> 1) of distances from focus to directrix, or as points with constant absolute difference of distances from two foci.
Key geometric features: Two symmetric branches, two foci, two directrices, two asymptotes, center, transverse and conjugate axes.
Common confusion: Unlike parabolas (one axis of symmetry), hyperbolas have symmetry about both axes due to their equation form.
Reflection property: Light from one focus reflects off the hyperbola away from the other focus (opposite direction).

📐 Defining hyperbolas by eccentricity

📏 First definition (focus-directrix ratio)

A hyperbola is the set of all points in a plane such that the ratio of the distance from a fixed point (a focus) to the distance from a fixed line (a directrix) is a constant e > 1, called the eccentricity of the hyperbola.

This parallels the second definition of an ellipse (where e < 1) and the parabola definition (where e = 1).
The ratio PF/PG = e > 1 means the distance to the focus always exceeds the distance to the directrix by a constant factor.
Example: If e = 2, a point is always twice as far from the focus as from the directrix.

🔢 Standard equation derivation

Starting with focus at (ea, 0) and directrix at x = a/e, the condition d₁/d₂ = e leads to:

The equation: x²/a² − y²/b² = 1, where b² = (e² − 1)a² > 0
This produces two branches: x = ± (a/b)√(y² + b²)
By symmetry: two foci at (±c, 0) and two directrices at x = ±a²/c, where c = ea
The relationship: b² = c² − a², so c > a (focus is farther from center than vertex)
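
A small sketch tying these quantities together: given a and e it computes b, c, and the right-hand directrix, then confirms that a point on the right branch has focus-to-directrix distance ratio e. The function names and the sample values a = 3, e = 5/3 are illustrative assumptions.

```python
import math

def hyperbola_from_eccentricity(a, e):
    """Return (b, c, directrix_x) for x^2/a^2 - y^2/b^2 = 1 with eccentricity e > 1."""
    b = a * math.sqrt(e * e - 1.0)
    c = e * a
    return b, c, a / e

def focus_directrix_ratio(a, e, y):
    """PF/PG for the right-branch point with y-coordinate y."""
    b, c, d = hyperbola_from_eccentricity(a, e)
    x = (a / b) * math.sqrt(y * y + b * b)
    return math.hypot(x - c, y) / abs(x - d)

print(hyperbola_from_eccentricity(3.0, 5.0 / 3.0))   # b ≈ 4, c ≈ 5, directrix x ≈ 1.8
print(focus_directrix_ratio(3.0, 5.0 / 3.0, 2.0))    # ≈ 5/3
```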

🌌 Physical analogy (orbital mechanics)

The excerpt connects eccentricity to escape velocity:

Velocity v < escape velocity vₑ → elliptical orbit (e < 1)
Velocity v = escape velocity vₑ → parabolic trajectory (e = 1)
Velocity v > escape velocity vₑ → hyperbolic trajectory (e > 1)
The ratio v/vₑ correlates with the eccentricity of the path

🎯 Anatomy of a hyperbola

🔺 Vertices and axes

Vertices: The points on the hyperbola closest to the directrices; for x²/a² − y²/b² = 1, they are (±a, 0)
Center: The midpoint between the foci; at origin (0, 0) for standard form
Transverse axis: The line through the foci (x-axis for x²/a² − y²/b² = 1)
Conjugate axis: Perpendicular line through the center (y-axis for standard form)

📉 Asymptotes

For x²/a² − y²/b² = 1, the asymptotes are y = ±(b/a)x.

Why these are asymptotes: The difference between the line y = (b/a)x and the upper right branch y = (b/a)√(x² − a²) approaches zero as x approaches infinity:

The limit calculation shows: lim[x→∞] (bx/a − (b/a)√(x² − a²)) = 0
This means the hyperbola branches get arbitrarily close to these lines but never touch them
The asymptotes are oblique (diagonal) lines through the center
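
One way to carry out the limit, shown here as a sketch using the standard conjugate trick (for x > a > 0), is:

```latex
\begin{aligned}
\frac{b}{a}x - \frac{b}{a}\sqrt{x^2 - a^2}
  &= \frac{b}{a}\bigl(x - \sqrt{x^2 - a^2}\bigr)\cdot
     \frac{x + \sqrt{x^2 - a^2}}{x + \sqrt{x^2 - a^2}} \\
  &= \frac{b}{a}\cdot\frac{x^2 - (x^2 - a^2)}{x + \sqrt{x^2 - a^2}}
   = \frac{ab}{x + \sqrt{x^2 - a^2}}
   \;\longrightarrow\; 0 \quad \text{as } x \to \infty .
\end{aligned}
```

The numerator ab is constant while the denominator grows without bound, so the vertical gap between the line and the branch shrinks to zero.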

🔄 Rotated form

Switching x and y gives y²/a² − x²/b² = 1:

Transverse axis becomes the y-axis
Vertices at (0, ±a)
Foci at (0, ±c) where c² = a² + b²
Directrices at y = ±a²/c
Asymptotes: y = ±(a/b)x
Example: y² − x² = 1 is just x² − y² = 1 rotated 90°

🔁 Alternative definition (two foci)

📍 Second definition

A hyperbola is the set of all points in a plane such that the absolute value of the difference of the distances from two fixed points (the foci) is a positive constant.

For point P with distances d₁ and d₂ to foci F₁ and F₂: |d₁ − d₂| = constant > 0
The absolute value is needed because the difference can be positive or negative depending on which branch P is on
This definition yields the same equation as the first definition (left as exercise in the text)

✏️ Construction method

The two-foci definition enables hand construction:

Fasten a ruler of length L at focus F₁
Attach a string of length L − d (where 0 < d < F₁F₂) from the ruler's free end A to focus F₂
Hold string taut with pencil against ruler at point P
Rotate ruler about F₁
The difference PF₁ − PF₂ = L − (AP + PF₂) = L − (L − d) = d remains constant
Reverse roles of F₁ and F₂ to draw the other branch

🪞 Reflection property

💡 Statement of the property

Light shone from one focus reflects off the hyperbola along the line through the other focus, traveling directly away from that focus.

Equivalent formulation: The tangent line at point P bisects the angle ∠F₁PF₂, meaning θ₁ = θ₂.

🧮 Proof outline

For hyperbola x²/a² − y²/b² = 1 with point P = (x₀, y₀):

The tangent line has equation: xx₀/a² − yy₀/b² = 1
Its slope is b²x₀/(a²y₀) when y₀ ≠ 0
When y₀ = 0 (at vertices x₀ = ±a), tangent lines are vertical: x = ±a
The proof uses angle relationships in triangles and the tangent subtraction formula
It shows tan θ₁ = tan θ₂ = b²/(cy₀), confirming θ₁ = θ₂

Don't confuse: This is opposite to the ellipse reflection property—hyperbolas reflect light away from the other focus, while ellipses reflect light toward the other focus.

🔺 Conic sections (geometric origin)

🎪 Why "conic sections"

Ellipses, parabolas, and hyperbolas are formed by intersecting planes with a double circular cone of unlimited extent.

Conic	Plane orientation	Nappes intersected	Curve type
Ellipse	Intersects one nappe only	One	Closed, noncircular
Parabola	Parallel to a line on one nappe	One	Open
Hyperbola	Intersects both nappes	Both	Two branches

Nappe: Each half of a double cone (one extending upward, one downward)

🔬 Proof that sections match definitions

For a right circular double cone with intrinsic angle β:

Let plane Pc make an angle α with the plane of any base circle of the cone
Inscribe a sphere touching Pc at point F and the cone along a circle in plane P₀
Let D be the line where Pc and P₀ intersect
For any point P on the curve, the ratio PF/PG = sin α / sin β = e (constant)
If 0° < α < β: then 0 < e < 1 → ellipse
If α = β: then e = 1 → parabola
If β < α ≤ 90°: then e > 1 → hyperbola (intersects both nappes)

Key geometric fact used: Tangent line segments to a sphere from the same external point have equal lengths (proven by Pythagorean Theorem on congruent right triangles).
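
The three cases can be summarized in a short classifier based on e = sin α / sin β. The function name is an illustrative assumption, angles are taken in degrees, and a tolerance is used for the parabola case because floating-point ratios are rarely exactly 1.

```python
import math

def conic_from_angles(alpha_deg, beta_deg):
    """Classify the section via e = sin(alpha)/sin(beta); 0 < alpha <= 90, 0 < beta < 90."""
    e = math.sin(math.radians(alpha_deg)) / math.sin(math.radians(beta_deg))
    if math.isclose(e, 1.0):
        return "parabola", e
    return ("ellipse" if e < 1.0 else "hyperbola"), e

for alpha in (30, 60, 80):
    print(alpha, conic_from_angles(alpha, 60))
```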

Translations and Rotations

7.4 Translations and Rotations

🧭 Overview

🧠 One-sentence thesis

Coordinate transformations—translation and rotation—allow conic sections (ellipses, parabolas, hyperbolas) to be repositioned anywhere in the plane, and the discriminant B² − 4AC determines the type of conic from any second-degree equation.

📌 Key points (3–5)

Translation shifts the origin to a new point (h, k) by replacing x with x − h and y with y − k, moving conic sections to any center.
Rotation turns the coordinate plane by an angle θ about the origin using trigonometric substitutions.
The discriminant B² − 4AC determines whether a second-degree equation represents an ellipse (negative), parabola (zero), or hyperbola (positive).
Common confusion: A nonzero Bxy term indicates rotation; nonzero D or E indicates translation—both can appear in the same equation.
Practical use: "Reverse" rotation and translation equations convert complicated second-degree equations back into standard conic forms for easier graphing.

📐 Translation: shifting the origin

📍 What translation does

Translation: a coordinate transformation that shifts the origin O = (0, 0) to a new point O′ = (h, k), creating a new x′y′-plane.

Any point P = (x, y) in the original plane becomes P′ = (x′, y′) in the new plane.
The translation equations are:
- x′ = x − h
- y′ = y − k
To translate a curve, substitute x → x − h and y → y − k in its equation.

🔄 Translated conic equations

Conic	Standard form (centered at origin)	Translated form (centered at (h, k))
Ellipse (horizontal)	x²/a² + y²/b² = 1	(x − h)²/a² + (y − k)²/b² = 1
Parabola (vertical)	x² = 4py	(x − h)² = 4p(y − k)
Hyperbola (horizontal)	x²/a² − y²/b² = 1	(x − h)²/a² − (y − k)²/b² = 1

Vertices, foci, and axes shift by the same amounts h and k.
Example: An ellipse with center (h, k) = (2, 1) and a horizontal major axis has vertices at (2 ± a, 1) instead of (±a, 0).

🧮 Completing the square for parabolas

A general parabola y = ax² + bx + c can be rewritten in translated form (x − h)² = 4p(y − k) by completing the square.
The vertex is at (h, k) = (−b/(2a), (4ac − b²)/(4a)).
The focus is at (h, k + p) where p = 1/(4a).
The directrix is the line y = k − p.
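
The formulas above translate directly into code. The function name and the sample parabola y = x² − 2x + 3 are illustrative assumptions.

```python
def parabola_features(a, b, c):
    """Vertex, focus and directrix of y = a*x^2 + b*x + c (a != 0)."""
    h = -b / (2.0 * a)
    k = (4.0 * a * c - b * b) / (4.0 * a)
    p = 1.0 / (4.0 * a)
    return {"vertex": (h, k), "focus": (h, k + p), "directrix": k - p}

print(parabola_features(1.0, -2.0, 3.0))
```

Here y = x² − 2x + 3 = (x − 1)² + 2, so the vertex is (1, 2), the focus is (1, 2.25), and the directrix is y = 1.75.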

🔄 Rotation: turning the coordinate plane

🌀 What rotation does

Rotation: a coordinate transformation that turns the xy-plane about the origin by an angle θ, creating a new x′y′-plane with the same origin.

For a point P = (x, y) at distance r from the origin making angle α with the x-axis:
- x = r cos α, y = r sin α
- In the rotated plane: x′ = r cos(α − θ), y′ = r sin(α − θ)
Using sine and cosine subtraction identities:
- x′ = x cos θ + y sin θ
- y′ = −x sin θ + y cos θ

🔁 Rotation substitutions

To rotate a curve by angle θ, substitute:

x → x cos θ + y sin θ
y → −x sin θ + y cos θ

Reverse rotation equations (to convert back to standard form):

x = x′ cos θ − y′ sin θ
y = x′ sin θ + y′ cos θ
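
A minimal sketch of the forward and reverse rotation equations, with θ in radians. The function names are assumptions; applying one after the other should return the original point up to rounding.

```python
import math

def to_rotated(x, y, theta):
    """(x, y) -> (x', y') for axes rotated counterclockwise by theta."""
    return (x * math.cos(theta) + y * math.sin(theta),
            -x * math.sin(theta) + y * math.cos(theta))

def from_rotated(xp, yp, theta):
    """Reverse rotation: (x', y') -> (x, y)."""
    return (xp * math.cos(theta) - yp * math.sin(theta),
            xp * math.sin(theta) + yp * math.cos(theta))

theta = math.radians(30)
xp, yp = to_rotated(2.0, 1.0, theta)
print(from_rotated(xp, yp, theta))   # approximately (2.0, 1.0)
```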

🎯 Example: rotating an ellipse

The ellipse x²/4 + y² = 1 rotated 45° becomes 5x² − 6xy + 5y² − 8 = 0.
Note the appearance of the xy cross-term (coefficient B = −6).
The curve xy = 1 is actually a hyperbola rotated 45° from the standard form x²/a² − y²/a² = 1.

🔍 Identifying conic sections from general equations

📊 The general second-degree equation

General form: Ax² + Bxy + Cy² + Dx + Ey + F = 0 (where A, B, C are not all zero)

B ≠ 0 indicates rotation.
D ≠ 0 or E ≠ 0 indicates translation.

🧪 The discriminant test

The sign of B² − 4AC determines the conic type:

Discriminant	Conic type	Degenerate cases
B² − 4AC < 0	Ellipse	Circle, point, or no curve
B² − 4AC = 0	Parabola	Line, two parallel lines, or no curve
B² − 4AC > 0	Hyperbola	Two intersecting lines

This discriminant is invariant under rotation—it doesn't change when you rotate the coordinate system.
Example: For 5x² + 4xy + 8y² − 36 = 0, we have B² − 4AC = 16 − 160 = −144 < 0, so it's an ellipse.
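
The discriminant test is easy to automate. This sketch (function name assumed) reports only the generic conic type and does not try to detect the degenerate cases listed in the table.

```python
def classify_conic(A, B, C):
    """Classify Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 by the sign of B^2 - 4AC.

    Degenerate cases (point, lines, no curve) are not detected here.
    """
    disc = B * B - 4 * A * C
    if disc < 0:
        return "ellipse"
    if disc == 0:
        return "parabola"
    return "hyperbola"

print(classify_conic(5, 4, 8))    # 16 - 160 = -144 < 0
print(classify_conic(0, 1, 0))    # xy = 1 has discriminant 1 > 0
```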

📐 Finding the rotation angle

If B ≠ 0 and the curve is a conic section:

If A = C: use θ = 45°
If A ≠ C: tan 2θ = B/(A − C), with 0° < θ < 90°

Then use half-angle identities to find sin θ and cos θ.

🎨 Graphing strategy

🛠️ Step-by-step approach

Identify the conic type using B² − 4AC.
Find the rotation angle θ (if B ≠ 0).
Apply reverse rotation equations to eliminate the xy term.
Complete the square (if needed) to handle translation.
Sketch in the x′y′-plane, then rotate back to the xy-plane.

💡 Example walkthrough

For 5x² + 4xy + 8y² − 36 = 0:

B² − 4AC = −144 < 0 → ellipse
tan 2θ = 4/(−3) → θ = tan⁻¹(2) ≈ 63.4°
After substituting reverse rotation equations: x′²/4 + y′²/9 = 1
This is a standard ellipse rotated 63.4° counterclockwise.
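
The walkthrough can be reproduced numerically. In this sketch the function name is an assumption, and the expressions for the new coefficients A′ and C′ come from substituting the reverse rotation equations into Ax² + Bxy + Cy² and collecting the x′² and y′² terms.

```python
import math

def remove_cross_term(A, B, C):
    """Return (theta, A', C') for the rotation that eliminates the xy term (B != 0)."""
    if A == C:
        theta = math.pi / 4
    else:
        theta = 0.5 * math.atan2(B, A - C)   # solves tan(2*theta) = B/(A - C)
        if theta < 0:
            theta += math.pi / 2              # keep 0 < theta < pi/2
    c, s = math.cos(theta), math.sin(theta)
    A_new = A * c * c + B * s * c + C * s * s
    C_new = A * s * s - B * s * c + C * c * c
    return theta, A_new, C_new

theta, A_new, C_new = remove_cross_term(5, 4, 8)
print(math.degrees(theta), A_new, C_new)
```

For 5x² + 4xy + 8y² − 36 = 0 this gives θ ≈ 63.4°, A′ ≈ 9 and C′ ≈ 4, so the equation becomes 9x′² + 4y′² = 36, which is x′²/4 + y′²/9 = 1. Note that A′ + C′ = 13 = A + C, as the rotation invariant predicts.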

⚠️ Don't confuse

Translation vs. rotation: Translation moves the center; rotation turns the axes. Both can occur in the same equation.
Invariants: A + C and B² − 4AC stay the same under rotation; A, B, and C stay the same under translation.

Hyperbolic Functions

7.5 Hyperbolic Functions

🧭 Overview

🧠 One-sentence thesis

Hyperbolic functions arise naturally from the unit hyperbola x² − y² = 1 in the same way that circular (trigonometric) functions arise from the unit circle, and they satisfy analogous identities while appearing frequently in physical applications.

📌 Key points (3–5)

Geometric origin: Points on the unit hyperbola x² − y² = 1 can be expressed as (cosh a, sinh a), where a is twice the area of a hyperbolic sector, paralleling how (cos θ, sin θ) describes the unit circle.
Definition via exponentials: sinh x = (eˣ − e⁻ˣ)/2 and cosh x = (eˣ + e⁻ˣ)/2, making them the odd and even parts of the exponential function, respectively (eˣ = cosh x + sinh x).
Identities parallel trigonometry: cosh² x − sinh² x = 1 (compare cos² θ + sin² θ = 1), and addition formulas exist for sinh and cosh.
Common confusion: The fundamental identity for hyperbolic functions is cosh² x − sinh² x = 1 (subtraction), not addition like the circular case; also, cosh is even and sinh is odd, unlike the exponential function.
Derivatives and inverses: Derivatives of hyperbolic functions resemble trigonometric derivatives (e.g., d/dx(sinh x) = cosh x), and inverse hyperbolic functions can be expressed using natural logarithms.

📐 Geometric foundation

📐 The unit hyperbola analogy

Textbooks sometimes call sine and cosine "circular functions" because any point on the unit circle x² + y² = 1 can be written as (cos θ, sin θ).
The unit hyperbola x² − y² = 1 motivates an analogous construction using hyperbolic functions.

Feature	Circular (unit circle)	Hyperbolic (unit hyperbola)
Equation	x² + y² = 1	x² − y² = 1
Point coordinates	(cos θ, sin θ)	(cosh a, sinh a)
Parameter meaning	θ = angle (also twice sector area)	a = twice the hyperbolic sector area

📏 The hyperbolic angle

Hyperbolic angle a: For a point P = (x, y) on the unit hyperbola x² − y² = 1, the hyperbolic angle a is twice the area of the hyperbolic sector (the region bounded by the hyperbola, the x-axis, and the line from the origin to P).

The area a/2 equals the area of the right triangle with hypotenuse OP and legs x and y, minus the area under the hyperbola from 1 to x.
Using integration and formula manipulation, the excerpt derives:
- a = ln(x + √(x² − 1))
- Solving for x: x = (eᵃ + e⁻ᵃ)/2 = cosh a
- Solving for y: y = (eᵃ − e⁻ᵃ)/2 = sinh a
Why twice the area? The factor of 2 is chosen to obtain "cleaner" final formulas involving a instead of 2a.

🧮 Definitions and basic properties

🧮 The six hyperbolic functions

Hyperbolic sine, cosine, tangent, cotangent, secant, cosecant (sinh, cosh, tanh, coth, sech, csch):

sinh x = (eˣ − e⁻ˣ)/2 for all x

cosh x = (eˣ + e⁻ˣ)/2 for all x

tanh x = sinh x / cosh x for all x

coth x = 1 / tanh x for all x ≠ 0

sech x = 1 / cosh x for all x

csch x = 1 / sinh x for all x ≠ 0

All six functions are defined analogously to the trigonometric functions.
Even and odd: cosh is an even function (cosh(−x) = cosh x), sinh is an odd function (sinh(−x) = −sinh x).
Both cosh x and sinh x grow exponentially as x → ∞ (the e⁻ˣ term becomes negligible).
sinh x decreases exponentially to −∞ as x → −∞.

📊 Graph behavior

y = cosh x: Shaped like a catenary (a uniform cable hanging from two fixed points); always ≥ 1, with minimum at x = 0.
y = sinh x: Passes through the origin, increasing for all x.
y = tanh x: Has horizontal asymptotes at y = ±1.
The excerpt provides graphs for all six hyperbolic functions and notes their domains.

🔗 Identities and relationships

🔗 Fundamental identities

The hyperbolic functions satisfy identities analogous to trigonometric identities:

Type	Identity
Pythagorean-like	cosh² x − sinh² x = 1
	tanh² x + sech² x = 1
	coth² x − csch² x = 1
Even/odd	sinh(−x) = −sinh x, cosh(−x) = cosh x, tanh(−x) = −tanh x
Addition	sinh(u ± v) = sinh u cosh v ± cosh u sinh v
	cosh(u ± v) = cosh u cosh v ± sinh u sinh v
	tanh(u ± v) = (tanh u ± tanh v) / (1 ± tanh u tanh v)
Double angle	sinh 2x = 2 sinh x cosh x
	cosh 2x = cosh² x + sinh² x
	tanh 2x = 2 tanh x / (1 + tanh² x)

🔍 Proving identities

The identity cosh² x − sinh² x = 1 was proved when deriving coordinates on the unit hyperbola (since (cosh a, sinh a) must satisfy x² − y² = 1).
Don't confuse: The hyperbolic identity uses subtraction (cosh² − sinh²), unlike the circular identity (cos² + sin²).
Addition identities can be proved using the exponential definitions.
- Example: sinh u cosh v + cosh u sinh v = [(eᵘ − e⁻ᵘ)/2][(eᵛ + e⁻ᵛ)/2] + [(eᵘ + e⁻ᵘ)/2][(eᵛ − e⁻ᵛ)/2] = (eᵘ⁺ᵛ − e⁻⁽ᵘ⁺ᵛ⁾)/2 = sinh(u + v).
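
A quick numerical spot check of two of these identities, built directly from the exponential definitions. The function names and sample values are illustrative assumptions; a spot check is of course not a proof.

```python
import math

def sinh_exp(x):
    return (math.exp(x) - math.exp(-x)) / 2.0

def cosh_exp(x):
    return (math.exp(x) + math.exp(-x)) / 2.0

def identity_gaps(u, v):
    """Differences that should be (nearly) zero if the identities hold."""
    pythagorean = cosh_exp(u) ** 2 - sinh_exp(u) ** 2 - 1.0
    addition = sinh_exp(u + v) - (sinh_exp(u) * cosh_exp(v) + cosh_exp(u) * sinh_exp(v))
    return pythagorean, addition

for u, v in [(0.3, 1.2), (-0.7, 0.5), (2.0, -1.5)]:
    print(identity_gaps(u, v))   # both entries should be tiny rounding errors
```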

🧪 Calculus and applications

🧪 Derivatives and integrals

The derivatives of hyperbolic functions closely parallel trigonometric derivatives:

Function	Derivative	Integral
sinh x	cosh x	∫cosh x dx = sinh x + C
cosh x	sinh x	∫sinh x dx = cosh x + C
tanh x	sech² x	∫sech² x dx = tanh x + C
coth x	−csch² x	∫csch² x dx = −coth x + C
sech x	−sech x tanh x	∫sech x tanh x dx = −sech x + C
csch x	−csch x coth x	∫csch x coth x dx = −csch x + C

Proofs use the exponential definitions.
- Example: d/dx(cosh x) = d/dx[(eˣ + e⁻ˣ)/2] = (eˣ − e⁻ˣ)/2 = sinh x.
The Chain Rule applies as usual.
- Example: For y = sinh(x³), dy/dx = 3x² cosh(x³).
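
The two derivative examples can be sanity-checked with a central difference quotient. The helper name, the step size h, and the sample point x = 0.8 are assumptions for illustration.

```python
import math

def central_difference(f, x, h=1e-5):
    """Approximate f'(x) by (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 0.8
print(central_difference(math.cosh, x), math.sinh(x))
print(central_difference(lambda t: math.sinh(t ** 3), x), 3 * x * x * math.cosh(x ** 3))
```

Each printed pair should agree to many decimal places.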

🔬 Physical applications

Differential equations: Both y = cosh(at) and y = sinh(at) satisfy y″(t) = a² y(t), which models rectilinear motion under a repulsive force proportional to displacement.
Paramagnetism: The excerpt shows an integral from classical paramagnetism theory that simplifies to an expression involving sinh.
Catenary: The graph of y = cosh x describes the shape of a uniform cable hanging from two fixed points.
Lorentz transformations: Exercises mention hyperbolic functions appearing in special relativity.

🔄 Inverse hyperbolic functions

🔄 Definitions and domains

Inverse hyperbolic functions (sinh⁻¹, cosh⁻¹, tanh⁻¹, coth⁻¹, sech⁻¹, csch⁻¹): The inverses of the hyperbolic functions, defined where the original functions are one-to-one.

Function	Domain	Range
sinh⁻¹ x	all x	all y
cosh⁻¹ x	x ≥ 1	y ≥ 0
tanh⁻¹ x	|x| < 1	all y
coth⁻¹ x	|x| > 1	all y ≠ 0
sech⁻¹ x	0 < x ≤ 1	y ≥ 0
csch⁻¹ x	all x ≠ 0	all y ≠ 0

Since sinh x is increasing for all x (because d/dx(sinh x) = cosh x > 0), its inverse sinh⁻¹ y is defined for all y.
cosh x is not one-to-one over all x, so cosh⁻¹ x is defined only for x ≥ 1 with range y ≥ 0.

🔄 Logarithmic expressions

Inverse hyperbolic functions can be expressed using natural logarithms:

Function	Logarithmic form	Restrictions
sinh⁻¹ x	ln(x + √(x² + 1))	all x
cosh⁻¹ x	ln(x + √(x² − 1))	x ≥ 1
tanh⁻¹ x	(1/2) ln[(1 + x)/(1 − x)]	|x| < 1
coth⁻¹ x	(1/2) ln[(x + 1)/(x − 1)]	|x| > 1
sech⁻¹ x	ln[(1 + √(1 − x²))/x]	0 < x ≤ 1
csch⁻¹ x	ln[(1/x) + √(1/x² + 1)]	x ≠ 0

The formula for cosh⁻¹ x was proved at the beginning of the section when deriving the hyperbolic angle.
The formula for sinh⁻¹ x is derived by setting y = sinh⁻¹ x, so x = sinh y = (eʸ − e⁻ʸ)/2, then solving the resulting quadratic equation in eʸ.
Example: For a point P = (x, y) on the unit hyperbola, the area a/2 of the hyperbolic sector OAP equals (1/2) cosh⁻¹ x, which makes sense because x = cosh a implies cosh⁻¹ x = a.
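
The logarithmic forms can be compared with the inverse hyperbolic functions in Python's standard `math` module. The three wrapper names are assumptions; only the three inverses that `math` provides directly are checked.

```python
import math

def asinh_log(x):
    return math.log(x + math.sqrt(x * x + 1.0))

def acosh_log(x):            # requires x >= 1
    return math.log(x + math.sqrt(x * x - 1.0))

def atanh_log(x):            # requires |x| < 1
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

print(asinh_log(2.0), math.asinh(2.0))
print(acosh_log(2.0), math.acosh(2.0))
print(atanh_log(0.5), math.atanh(0.5))
```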

🔄 Derivatives of inverse hyperbolic functions

Function	Derivative	Restrictions
sinh⁻¹ x	1 / √(x² + 1)	all x
cosh⁻¹ x	1 / √(x² − 1)	x > 1
tanh⁻¹ x	1 / (1 − x²)	|x| < 1
coth⁻¹ x	1 / (1 − x²)	|x| > 1
sech⁻¹ x	−1 / [x√(1 − x²)]	0 < x < 1
csch⁻¹ x	−1 / [|x|√(1 + x²)]	x ≠ 0

These can be derived using either the general formula for the derivative of an inverse function or the logarithmic expressions.
Example derivation for tanh⁻¹ x:
- d/dx(tanh⁻¹ x) = d/dx[(1/2) ln((1 + x)/(1 − x))]
- = (1/2) d/dx[ln(1 + x) − ln(1 − x)]
- = (1/2)[1/(1 + x) − (−1)/(1 − x)]
- = [(1 − x) + (1 + x)] / [2(1 + x)(1 − x)]
- = 1 / (1 − x²).

Parametric Equations

7.6 Parametric Equations

🧭 Overview

🧠 One-sentence thesis

Parametric equations provide a flexible way to describe plane curves by expressing both x and y coordinates as functions of a third variable (the parameter), enabling representation of curves that cannot be written as a single function y = f(x).

📌 Key points (3–5)

What parametric equations are: both x and y coordinates written as functions of a parameter t, so that x = x(t) and y = y(t) for t in some interval.
Why they're useful: they can describe any plane curve shape, not just graphs of single functions; the same curve can have many different parametrizations.
Common confusion: the parameter t often represents time in physical settings, but it can represent any quantity (angle, slope, area, percentage, etc.) and any symbol can be used.
How to find derivatives: use dy/dx = (dy/dt) / (dx/dt) when dx/dt ≠ 0, treating differentials as functions of t.
Key examples: circles, ellipses, hyperbolas, cycloids, and Bézier curves all have natural parametric representations.

🔄 Parametrizations of the unit circle

🔄 Two ways to identify points

The unit circle x² + y² = 1 can be parametrized in at least two distinct ways:

Method	Parameter	Parametric equations	Coverage
By angle	θ	x = cos θ, y = sin θ	All points
By slope	t	x = (1 − t²)/(1 + t²), y = 2t/(1 + t²)	All points except (−1, 0)

The angle method uses the angle θ as the parameter.
The slope method uses t = the slope of lines through (−1, 0) as the parameter; this comes from the half-angle substitution.
Example: Both describe the same geometric object (the unit circle) but trace it differently as the parameter varies.
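
Both parametrizations can be checked against x² + y² = 1. The function names are assumptions; the slope version also confirms that the line from (−1, 0) to the computed point really has slope t.

```python
import math

def by_angle(theta):
    return math.cos(theta), math.sin(theta)

def by_slope(t):
    """Point where the line through (-1, 0) with slope t meets the unit circle again."""
    return (1.0 - t * t) / (1.0 + t * t), 2.0 * t / (1.0 + t * t)

for t in (-2.0, 0.0, 0.5, 1.0):
    x, y = by_slope(t)
    print(t, (x, y), x * x + y * y, y / (x + 1.0))   # last two: 1 and t
```

No finite t produces the point (−1, 0), which matches the "All points except (−1, 0)" entry in the table.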

📐 General definition

Parametric equations of a plane curve C: equations x = x(t) and y = y(t) where C consists of all points (x, y) such that x = x(t) and y = y(t) for the parameter t in some interval I.

Shorthand notation: C : x = x(t), y = y(t), t in I

A curve y = f(x) is the special case where x = t and y = f(t).
The parameter can represent anything: time, angle, slope, area, percentage, etc.
A single curve can have many different parametrizations.

🔵 Standard parametrizations of conic sections

🔵 Circles

For constants ω ≠ 0 and r > 0, the parametric equations

x = h + r cos(ωt), y = k + r sin(ωt) for −∞ < t < ∞

describe the circle (x − h)² + (y − k)² = r² with center (h, k) and radius r.

The constant ω determines how fast and in which direction the circle is traced.
Example: ω = 2 traces the circle counterclockwise at twice the speed of ω = 1.
The circle is re-traced every 2π/|ω| units of t, so t is often restricted to [0, 2π/|ω|] to trace it only once.

🔵 Ellipses

For a > 0 and b > 0, the parametric equations

x = a cos t, y = b sin t for 0 ≤ t ≤ 2π

describe the ellipse x²/a² + y²/b² = 1.

Verification: x²/a² + y²/b² = (a² cos² t)/a² + (b² sin² t)/b² = cos² t + sin² t = 1.
The parameter t is called the eccentric angle of the ellipse.
The entire ellipse is traced as t varies from 0 to 2π.

🔵 Hyperbolas

The parametric equations

x = cosh t, y = sinh t for −∞ < t < ∞

describe the right branch of the unit hyperbola x² − y² = 1.

Verification: x² − y² = cosh² t − sinh² t = 1.
Since cosh t ≥ 1 and sinh t can take any value, the entire right branch is traced.
The left branch has parametric equations x = −cosh t, y = sinh t.
The parameter t represents twice the area of the hyperbolic sector (positive for t > 0, negative for t < 0); equivalently, the sector has area t/2.
General form: for a > 0 and b > 0, the hyperbola x²/a² − y²/b² = 1 has parametric equations x = ±a cosh t, y = b sinh t.

🎨 Bézier curves and applications

🎨 What Bézier curves do

Bézier curves: curves used in Computer Aided Design (CAD) to join the ends of an open polygonal path of noncollinear control points with a smooth curve that models the "shape" of the path.

Created via repeated linear interpolation.
The curve is not the straight path connecting the control points, but a smooth curve influenced by all control points.

🎨 Construction for three control points

For three control points B₀, B₁, B₂ and parameter t in [0, 1]:

Think of t as a percentage (e.g., t = 0.4 = 40%).
Let A₀ be the point that is 100t% of the way from B₀ to B₁.
Let A₁ be the point that is 100t% of the way from B₁ to B₂.
The point P that is 100t% of the way from A₀ to A₁ lies on the Bézier curve.
Repeat for every t in [0, 1] to fill out the entire curve.

The resulting parametric equations are:

x = (1 − t)² x₀ + 2t(1 − t) x₁ + t² x₂
y = (1 − t)² y₀ + 2t(1 − t) y₁ + t² y₂

for 0 ≤ t ≤ 1, where Bᵢ = (xᵢ, yᵢ).

🎨 Properties and extensions

For n ≥ 3 control points, the parametric equations are polynomials of degree n − 1 in t.
Bézier curves can be constructed in three-dimensional space.
Bézier surfaces are used in three dimensions to model the boundary of a polyhedron.
Example: For B₀ = (1, 2), B₁ = (2, 4), B₂ = (4, 1), the curve is x = t² + 2t + 1, y = −5t² + 4t + 2 for 0 ≤ t ≤ 1, which turns out to be part of a parabola.
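
The repeated-interpolation construction can be sketched directly in code. This is a minimal illustration, with hypothetical helper names `lerp` and `bezier3`, that compares the construction against the closed-form polynomials for the example control points B₀ = (1, 2), B₁ = (2, 4), B₂ = (4, 1).

```python
def lerp(p, q, t):
    """Point that is 100t% of the way from p to q."""
    return ((1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1])

def bezier3(b0, b1, b2, t):
    """Quadratic Bezier point via repeated linear interpolation."""
    a0 = lerp(b0, b1, t)   # 100t% of the way from B0 to B1
    a1 = lerp(b1, b2, t)   # 100t% of the way from B1 to B2
    return lerp(a0, a1, t) # 100t% of the way from A0 to A1

def closed_form(t):
    """Polynomials from the example: x = t^2 + 2t + 1, y = -5t^2 + 4t + 2."""
    return (t * t + 2 * t + 1, -5 * t * t + 4 * t + 2)

if __name__ == "__main__":
    for t in (0.0, 0.25, 0.4, 1.0):
        print(t, bezier3((1, 2), (2, 4), (4, 1), t), closed_form(t))
```

The two columns should agree up to floating-point rounding, and the curve starts at B₀ (t = 0) and ends at B₂ (t = 1).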

🌀 Cycloids

🌀 What a cycloid is

Cycloid: the path of a point P on a circle rolling along a straight line.

For a circle of radius a rolling along the x-axis so that P touches the origin during the roll, the parametric equations are:

x = a(θ − sin θ), y = a(1 − cos θ) for −∞ < θ < ∞

🌀 Derivation

Let θ be the central angle (in radians) as shown in the figure.
The horizontal distance from the circle's center to the y-axis equals the arc length aθ (because P touches the origin as the circle rolls).
The center is at (aθ, a).
Using the circle parametrization with an appropriate angle adjustment gives the cycloid equations.

🌀 Properties

The derivative dy/dx = sin θ / (1 − cos θ) = cot(θ/2).
dy/dx is undefined when cos θ = 1, i.e., when θ = 2πk for all integers k.
At these points (x = 2πka), the cycloid has cusps (sharp points).
The cycloid is always concave down.
The area under one arch (from 0 to 2πa) is 3πa².

🌀 The brachistochrone problem

Problem: Find the plane curve joining two points A and B (where B is lower than A but not directly under it) along which an object slides frictionless under gravity alone from A to B in the shortest time.
Solution: The optimal path is not a straight line, but part of an inverted (upside-down) cycloid with a cusp at A.

📊 Calculus with parametric equations

📊 First derivative

For a curve with parametric equations x = x(t) and y = y(t), the derivative dy/dx is found using differentials:

dy/dx = (dy/dt) / (dx/dt) = y'(t) / x'(t)

when dx/dt ≠ 0.

Use the differentials dy = y'(t) dt and dx = x'(t) dt.
Divide one by the other to get dy/dx.
Example: For the cycloid x = a(θ − sin θ), y = a(1 − cos θ), we have dy/dx = (a sin θ) / (a(1 − cos θ)) = sin θ / (1 − cos θ).

📊 Second derivative

The second derivative d²y/dx² can be found via the Chain Rule:

d²y/dx² = (d/dt)(dy/dx) / (dx/dt)

First compute dy/dx as a function of t.
Then differentiate that result with respect to t.
Finally divide by dx/dt.
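
As a worked illustration of these three steps (a sketch, not taken verbatim from the source), consider the cycloid x = a(θ − sin θ), y = a(1 − cos θ), with θ playing the role of t and dy/dx = sin θ / (1 − cos θ) = cot(θ/2):

```latex
\begin{aligned}
\frac{d}{d\theta}\!\left(\frac{dy}{dx}\right)
  &= \frac{d}{d\theta}\,\cot\frac{\theta}{2}
   = -\frac{1}{2}\csc^{2}\frac{\theta}{2}, \\
\frac{dx}{d\theta}
  &= a(1-\cos\theta) = 2a\sin^{2}\frac{\theta}{2}, \\
\frac{d^{2}y}{dx^{2}}
  &= \frac{-\tfrac{1}{2}\csc^{2}(\theta/2)}{2a\sin^{2}(\theta/2)}
   = -\frac{1}{4a\sin^{4}(\theta/2)}
   = -\frac{1}{a(1-\cos\theta)^{2}} .
\end{aligned}
```

The result is negative wherever it is defined, which is consistent with the cycloid being concave down between its cusps.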

📊 Integration

For t in [a, b] with x₁ = x(a) and x₂ = x(b), the integral is:

∫(from x₁ to x₂) y dx = ∫(from a to b) y(t) · x'(t) dt

Substitute x = x(t), so dx = x'(t) dt.
Change the limits from x-values to t-values.
Example: This formula is used to find the area under the cycloid over [0, 2πa], which equals 3πa².
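
The substitution can be checked numerically. The sketch below assumes a simple midpoint rule (not a method from the source) and uses illustrative function names; it approximates ∫ y(θ) x'(θ) dθ for one arch of the cycloid and compares it with 3πa².

```python
import math

def parametric_area(x_prime, y, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of y(t) * x'(t) dt over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += y(t) * x_prime(t)
    return total * h

def cycloid_arch_area(radius):
    """Area under one arch of the cycloid x = a(th - sin th), y = a(1 - cos th)."""
    y = lambda th: radius * (1 - math.cos(th))
    x_prime = lambda th: radius * (1 - math.cos(th))  # dx/dtheta
    return parametric_area(x_prime, y, 0.0, 2 * math.pi)

if __name__ == "__main__":
    a = 2.0
    print(cycloid_arch_area(a), 3 * math.pi * a * a)
```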

Polar Coordinates

7.7 Polar Coordinates

🧭 Overview

🧠 One-sentence thesis

Polar coordinates provide a powerful alternative to Cartesian coordinates for describing curves with radial symmetry, simplifying both the representation of certain curves and the calculation of areas swept out by rotating rays.

📌 Key points (3–5)

What polar coordinates are: a system using distance from origin (r) and angle from the positive x-axis (θ) instead of x and y coordinates.
When they're useful: curves with symmetry about the origin (like spirals and circles centered at the origin) are simpler in polar form.
Common confusion: polar coordinates are not unique—the same point can be represented by infinitely many (r, θ) pairs; also, r can be negative by convention.
Converting between systems: straightforward formulas exist to switch between polar and Cartesian coordinates in both directions.
Area calculation advantage: finding areas swept out by polar curves often involves simpler integrals than Cartesian approaches.

📐 The polar coordinate system

📍 What polar coordinates measure

Polar coordinates (r, θ): a pair where r is the distance from the origin O to point P, and θ is the angle the ray OP makes with the positive x-axis.

The origin is called the "pole" and the positive x-axis is the "polar axis."
Think of the ray OP "swinging around" the pole at the origin.
Example: A spiral that would violate the vertical line test in Cartesian coordinates can have a simple polar equation like r = 1 + θ/(2π).

🔄 Non-uniqueness and conventions

The same point has multiple polar representations:

(r, θ) = (r, θ + 2πk) for any integer k (angles wrap around every full rotation).
Negative r convention: (−r, θ) = (r, θ + π), meaning draw the ray in the opposite direction.
The origin: when r = 0, the point is the origin O regardless of θ value.

Don't confuse: Unlike Cartesian coordinates where each point has exactly one representation, polar coordinates intentionally allow multiple representations of the same point.

🗺️ Polar graphing paper

Polar grid structure differs fundamentally from rectangular grids:

Feature	What it shows	Visual pattern
Concentric circles	Where r is constant	Circles around origin at r = 1, 2, 3, etc.
Lines through origin	Where θ is constant	Radial lines at regular angle intervals

The excerpt notes angles can be shown in degrees or radians (radians preferred for their "unitless" nature).

🔄 Converting between coordinate systems

➡️ Polar to Cartesian

Given polar coordinates (r, θ), find Cartesian (x, y):

x = r cos θ
y = r sin θ

These follow directly from the geometric definition of polar coordinates.

⬅️ Cartesian to polar

Given Cartesian coordinates (x, y), find polar (r, θ):

r = ±√(x² + y²)
tan θ = y/x (if x ≠ 0)

Special cases and sign considerations:

If x = 0, then θ = π/2 or θ = 3π/2.
The equation tan θ = y/x has two solutions in opposite quadrants (for 0 ≤ θ < 2π).
Sign rule for r: if θ is in the same quadrant as point (x, y), then r is positive; otherwise r is negative.
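
In code, the quadrant bookkeeping is usually delegated to a two-argument arctangent. The sketch below is one possible convention (r ≥ 0 and 0 ≤ θ < 2π, up to rounding), chosen here for illustration; the source's ± rule allows other equally valid answers. The function names are hypothetical.

```python
import math

def to_cartesian(r, theta):
    """Polar (r, theta) to Cartesian (x, y); r may be negative."""
    return r * math.cos(theta), r * math.sin(theta)

def to_polar(x, y):
    """Cartesian (x, y) to one polar representation with r >= 0."""
    r = math.hypot(x, y)
    # atan2 picks the quadrant of (x, y), so r can stay nonnegative
    theta = math.atan2(y, x) % (2 * math.pi)
    return r, theta

if __name__ == "__main__":
    print(to_polar(-1.0, -1.0))               # third quadrant
    print(to_cartesian(-2.0, 0.3))            # negative r ...
    print(to_cartesian(2.0, 0.3 + math.pi))   # ... same point as (r, theta + pi)
```

The last two printed points agree up to rounding, illustrating the convention (−r, θ) = (r, θ + π).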

🔍 Conversion examples

Circle centered at origin: x² + y² = 1 becomes simply r = 1 in polar coordinates (much simpler).

Off-center circle: x² + (y − 4)² = 16 becomes r = 8 sin θ in polar coordinates.

The excerpt notes this polar form is actually less intuitive than the Cartesian form.
Why: the Cartesian equation clearly shows the center (0, 4) and radius 4; the polar equation obscures these properties.
Lesson: polar coordinates work best when there is symmetry about the origin; otherwise Cartesian may be clearer.

📊 Calculus with polar curves

📈 Derivatives in polar form

For a curve given by r = r(θ), treat x = r(θ) cos θ and y = r(θ) sin θ as parametric equations in parameter θ.

The formulas become:

dy/dx = [r'(θ) sin θ + r(θ) cos θ] / [r'(θ) cos θ − r(θ) sin θ]
d²y/dx² = [d/dθ(dy/dx)] / [r'(θ) cos θ − r(θ) sin θ]

These follow from the Product Rule applied to the parametric equations.

📉 Finding extrema and inflection points

Example: For r = 1 + cos θ (a cardioid):

First find dy/dx and d²y/dx² using the formulas above.
Set dy/dx = 0 to find horizontal tangent lines (potential maxima/minima).
Check where dy/dx is undefined (vertical tangent lines).
Use the sign of d²y/dx² to determine concavity and classify critical points.
The excerpt shows this curve has a local maximum at θ = π/3 and local minimum at θ = 5π/3.
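
The slope formula for polar curves can be evaluated directly. A minimal sketch for the cardioid r = 1 + cos θ follows, with r'(θ) = −sin θ supplied by hand; the function names are illustrative.

```python
import math

def polar_slope(r, r_prime, theta):
    """dy/dx for a polar curve r = r(theta), from the parametric formula."""
    num = r_prime(theta) * math.sin(theta) + r(theta) * math.cos(theta)
    den = r_prime(theta) * math.cos(theta) - r(theta) * math.sin(theta)
    return num / den  # undefined (division by zero) at vertical tangents

def cardioid(theta):
    return 1 + math.cos(theta)

def cardioid_prime(theta):
    return -math.sin(theta)

if __name__ == "__main__":
    for theta in (math.pi / 3, math.pi / 2, 5 * math.pi / 3):
        print(theta, polar_slope(cardioid, cardioid_prime, theta))
```

The slope should be zero (up to rounding) at θ = π/3 and θ = 5π/3, the horizontal tangents where the local maximum and minimum occur.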

🧮 Area calculations in polar coordinates

🥧 The polar area formula

For a curve r = f(θ) swept out between θ = α and θ = β:

A = ∫[from α to β] (1/2)r² dθ = ∫[from α to β] (1/2)(f(θ))² dθ

🔬 Derivation logic

The excerpt derives this using infinitesimals:

An infinitesimal wedge has angle dθ and radial sides r and r + dr.
By the Microstraightness Property, the curved edge is straight over infinitesimal dθ.
The wedge area equals the area of a triangle with sides r and r + dr with included angle dθ.
Using the triangle area formula (1/2)bc sin A: dA = (1/2)r(r + dr) sin(dθ).
Since sin(dθ) = dθ and dr·dθ = r'(θ)(dθ)² = 0, this simplifies to dA = (1/2)r² dθ.
Summing (integrating) these infinitesimal areas gives the total area.

⚠️ Important warning about periodicity

If f is periodic, choose the angle interval [α, β] so the area is swept out only once.

Example: For a circle r = R, use 0 ≤ θ ≤ 2π (one full rotation).
Using 0 ≤ θ ≤ 4π would incorrectly give area 2πR² instead of πR² (counting the region twice).

💡 Advantages over Cartesian integration

Circle area example: For a circle of radius R centered at the origin:

Polar: A = ∫[0 to 2π] (1/2)R² dθ = (1/2)R²(2π) = πR² (simple integral).
Cartesian: requires trigonometric substitution (much more complex).

The excerpt emphasizes this simplicity as a key advantage of polar coordinates for radially symmetric regions.

🎯 Worked example

For the cardioid r = 1 + cos θ over 0 ≤ θ ≤ 2π:

A = ∫[0 to 2π] (1/2)(1 + cos θ)² dθ
Expand: (1 + cos θ)² = 1 + 2cos θ + cos² θ
Use identity cos² θ = (1 + cos 2θ)/2
Simplify and integrate to get A = 3π/2
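
The cardioid result can be confirmed numerically. The sketch below assumes a midpoint-rule approximation of the polar area integral; `polar_area` is an illustrative name, not something from the source.

```python
import math

def polar_area(f, alpha, beta, n=10_000):
    """Midpoint-rule approximation of the integral of (1/2) f(theta)^2 over [alpha, beta]."""
    h = (beta - alpha) / n
    total = 0.0
    for i in range(n):
        theta = alpha + (i + 0.5) * h
        total += 0.5 * f(theta) ** 2
    return total * h

if __name__ == "__main__":
    cardioid = lambda th: 1 + math.cos(th)
    print(polar_area(cardioid, 0.0, 2 * math.pi), 3 * math.pi / 2)
```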

Area Between Curves

8.1 Area Between Curves

🧭 Overview

🧠 One-sentence thesis

The area between two curves is found by integrating the absolute difference of their y-coordinates over an interval, generalizing the "area under a curve" concept to regions bounded by any two functions.

📌 Key points (3–5)

What it generalizes: extends "area under a curve" (curve above x-axis) to any two curves, neither required to be above the x-axis.
The height function: at each x, the height is the absolute value of the difference between the two y-values, ensuring area is never negative.
How to compute: integrate the absolute difference of the functions over the interval where the region exists.
Common confusion: when curves cross, you must split the integral at intersection points because which function is "on top" changes.
Bounded vs unbounded: a "bounded region" has finite area; the curves intersect to enclose the region.

📐 The height function and area formula

📏 Why absolute value matters

The area A of a region between curves cannot be negative.

At each point x in the interval [a, b], the two curves have y-coordinates f₁(x) and f₂(x).
The vertical distance between them is the height function h(x) = |f₁(x) − f₂(x)|.
The absolute value ensures the height is always nonnegative, no matter which curve is higher.
Example: if f₁(x) = 3 and f₂(x) = 5 at some x, the height is |3 − 5| = 2, not −2.

🧮 The integral formula

The area A between two curves y = f₁(x) and y = f₂(x) over an interval [a, b] is: A = integral from a to b of |f₁(x) − f₂(x)| dx.

Each infinitesimal area element dA has width dx and height h(x), so dA = h(x) dx.
Summing (integrating) all these elements from a to b gives the total area.
The interval can be finite or infinite, as long as the integral is defined.
Neither curve needs to be above the x-axis; the formula works for any configuration.

🔀 Handling curve intersections

✂️ Splitting the integral at crossings

When two curves intersect within the interval, which function is larger changes at the intersection point.
The absolute value |f₁(x) − f₂(x)| changes its expression at these points.
Don't confuse: you cannot ignore intersections—the integral must be split into subintervals where one function consistently stays above the other.

🧩 Example: sine and cosine

In Example 8.3, y = sin x and y = cos x are integrated over [0, π/3].
The curves intersect at x = π/4.
For 0 ≤ x ≤ π/4: cos x ≥ sin x, so the height is cos x − sin x.
For π/4 ≤ x ≤ π/3: sin x ≥ cos x, so the height is sin x − cos x.
The area A must be computed as two separate integrals (one for each subinterval) and then summed.
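
The splitting can be sketched in code by comparing two routes to the same number: a direct numerical integral of |sin x − cos x| over [0, π/3], and the sum of the two exact pieces. The helper names are illustrative, and the midpoint rule is an assumption of this sketch.

```python
import math

def area_between(f1, f2, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of |f1(x) - f2(x)| over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += abs(f1(x) - f2(x))
    return total * h

def split_area():
    """Exact area: split at the crossing x = pi/4 and use antiderivatives."""
    q, t = math.pi / 4, math.pi / 3
    # on [0, pi/4]: integral of (cos x - sin x) is sin x + cos x
    first = (math.sin(q) + math.cos(q)) - (math.sin(0) + math.cos(0))
    # on [pi/4, pi/3]: integral of (sin x - cos x) is -cos x - sin x
    second = (-math.cos(t) - math.sin(t)) - (-math.cos(q) - math.sin(q))
    return first + second

if __name__ == "__main__":
    print(area_between(math.sin, math.cos, 0.0, math.pi / 3), split_area())
```

Integrating sin x − cos x over the whole interval without splitting would give a smaller (signed) value, because the piece on [0, π/4] would count negatively.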

🧪 Worked examples

🧪 Exponential curves (Example 8.1)

Setup: find area between y = eˣ and y = e⁻ˣ over [0, 2].
Which is larger: eˣ ≥ e⁻ˣ for all x in [0, 2], so h(x) = eˣ − e⁻ˣ (no absolute value needed).
Computation: A = integral from 0 to 2 of (eˣ − e⁻ˣ) dx = [eˣ + e⁻ˣ] evaluated from 0 to 2 = e² + e⁻² − 2 = 2(cosh 2 − 1).
Key point: when one function is always above the other, the absolute value simplifies to a single expression.

🧪 Parabola and line (Example 8.2)

Setup: find area of the region bounded by y = x² and y = x.
Finding bounds: the curves intersect where x² = x, so x = 0 and x = 1.
Which is larger: for 0 ≤ x ≤ 1, x ≥ x², so h(x) = x − x².
Computation: A = integral from 0 to 1 of (x − x²) dx = [½x² − ⅓x³] from 0 to 1 = ½ − ⅓ = ⅙.
Bounded region: the intersection points define the finite interval; the region has finite area.

🔑 Key distinctions

🔑 Special case vs general case

Concept	Special case (Chapter 5)	General case (this section)
Second curve	Always the x-axis (y = 0)	Any function y = f₂(x)
Position requirement	Curve must be above x-axis	Neither curve needs to be above x-axis
Height function	Simply f(x) (assumed ≥ 0)	Absolute difference |f₁(x) − f₂(x)|

🔑 Bounded vs unbounded regions

Bounded region: enclosed by the curves, typically between their intersection points (as in Example 8.2), so its area is finite.
Unbounded region: extends to infinity; the interval can be infinite as long as the integral is defined, and the area may be finite or infinite.
Don't confuse: "bounded" describes the extent of the region, not its area; an unbounded region can still have finite area when the integral converges.

Average Value of a Function

8.2 Average Value of a Function

🧭 Overview

🧠 One-sentence thesis

The average value of a function over an interval generalizes the arithmetic mean of finitely many numbers by using the definite integral to sum over a continuum of infinitely many function values.

📌 Key points (3–5)

Core definition: The average value of a function f over [a, b] is (1/(b - a)) times the integral of f from a to b.
Why integration works: The definite integral provides a way to sum over an infinite continuum of values, which the finite arithmetic mean cannot handle.
Symmetry insight: Symmetric functions can have the same average over different intervals if the function values duplicate (e.g., x² over [-1,1] vs [0,1]).
Common confusion: Don't confuse the average value of a function (a single number summarizing the entire interval) with individual function values at specific points.
Practical alternative: When integration is difficult, the Monte Carlo method approximates the average by sampling many random points and computing their arithmetic mean.

🔢 From finite averages to continuous averages

🔢 The finite case

For n numbers x₁, x₂, ..., xₙ, the average (or mean) is:
- average = (x₁ + x₂ + ... + xₙ) / n
This is the sum of the numbers divided by how many numbers there are.

🌌 The infinite continuum problem

Motivating example: A planet orbiting the Sun has uncountably infinitely many distances from the Sun over one complete orbit.
The finite average formula cannot be applied directly because there is no finite count of distances.
Solution needed: A method to "sum" over an infinite continuum of values.

🧮 The integral as an infinite sum

The definite integral is a sum of a continuum of infinitesimal quantities.

Integration already provides the tool needed: it sums infinitely many infinitesimally small pieces.
This insight bridges the gap between finite arithmetic and continuous functions.

📐 Deriving the average value formula

📐 Partition and approximate

Divide the interval [a, b] into n subintervals of equal length Δxᵢ = (b - a)/n using a partition a = x₀ < x₁ < x₂ < ... < xₙ = b.
The n function values f(x₁), f(x₂), ..., f(xₙ) are a finite subset of all function values over [a, b].
Their arithmetic mean approximates the true average:
- ⟨f⟩ ≈ (f(x₁) + f(x₂) + ... + f(xₙ)) / n

🔄 Algebraic manipulation

Rewrite the approximation by dividing the sum by (b - a) and multiplying each term by (b - a):
- ⟨f⟩ ≈ (1/(b - a)) × sum of f(xᵢ) × (b - a)/n
- ⟨f⟩ ≈ (1/(b - a)) × sum of f(xᵢ) × Δxᵢ
The last summation is a Riemann sum for the definite integral of f from a to b, with right endpoints chosen.

∞ Taking the limit

As n → ∞, more and more function values are included in the average.
The Riemann sum converges to the definite integral.
This yields the definition:

Average value ⟨f⟩ of a function f over a closed interval [a, b] is:
⟨f⟩ = (1/(b - a)) × integral from a to b of f(x) dx
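
The limiting process can be watched numerically. This minimal sketch (the function name is illustrative) computes the right-endpoint Riemann-sum average for increasing n; the values should settle toward the integral-based average.

```python
import math

def average_right_endpoints(f, a, b, n):
    """Approximate <f> over [a, b] with a right-endpoint Riemann sum of n terms."""
    dx = (b - a) / n
    riemann_sum = sum(f(a + i * dx) * dx for i in range(1, n + 1))
    return riemann_sum / (b - a)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000):
        # f(x) = x^2 over [0, 1]; the exact average is 1/3
        print(n, average_right_endpoints(lambda x: x * x, 0.0, 1.0, n))
    # f(x) = sin x over [0, pi]; the exact average is 2/pi
    print(average_right_endpoints(math.sin, 0.0, math.pi, 10_000), 2 / math.pi)
```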

🧪 Examples and patterns

🧪 Basic polynomial example

Example: Average value of f(x) = x² over [0, 1].

Using the formula with a = 0 and b = 1:
- ⟨f⟩ = (1/(1 - 0)) × integral from 0 to 1 of x² dx
- = integral from 0 to 1 of x² dx
- = [x³/3] from 0 to 1
- = 1/3 - 0 = 1/3
Interpretation: If you squared all numbers between 0 and 1, the average of those squares is 1/3.

🔁 Symmetry and duplication

Example: Average value of f(x) = x² over [-1, 1].

Using the formula with a = -1 and b = 1:
- ⟨f⟩ = (1/(1 - (-1))) × integral from -1 to 1 of x² dx
- = (1/2) × [x³/3] from -1 to 1
- = (1/2) × (1/3 - (-1/3))
- = 1/3
Why the same result? The function f(x) = x² is symmetric about the y-axis, so values from [-1, 0] duplicate those from [0, 1] and do not change the average.
Don't confuse: Changing the interval does not always change the average; symmetry matters.

📈 Trigonometric example

Example: Average value of f(x) = sin x over [0, π].

Using the formula with a = 0 and b = π:
- ⟨f⟩ = (1/π) × integral from 0 to π of sin x dx
- = (1/π) × [-cos x] from 0 to π
- = -(1/π) × (cos π - cos 0)
- = -(1/π) × (-1 - 1)
- = 2/π

🌍 Geometric application: ellipse distance

Example: Average distance from the ellipse (x²/25) + (y²/9) = 1 to the point (4, 0).

The point (4, 0) is a focus of the ellipse.
For any point (x, y) on the ellipse, the distance d to (4, 0) is:
- d² = (x - 4)² + y²
- Substituting y² = 9(1 - x²/25) = (9/25)(25 - x²):
- d² = ((4x - 25)²)/25
- d = (25 - 4x)/5 for -5 ≤ x ≤ 5 (distance must be non-negative)
By symmetry about the x-axis, only the upper half of the ellipse is needed.
Average distance:
- ⟨d⟩ = (1/(5 - (-5))) × integral from -5 to 5 of (25 - 4x)/5 dx
- = (1/50) × [25x - 2x²] from -5 to 5
- = (1/50) × ((125 - 50) - (-125 - 50))
- = 5
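
A numerical cross-check of this example, as a sketch with illustrative names: it computes the distance from the raw definition d = √((x − 4)² + y²) rather than the simplified form, then averages over −5 ≤ x ≤ 5 with a midpoint rule.

```python
import math

def distance_to_focus(x):
    """Distance from the ellipse point with abscissa x (upper half) to (4, 0)."""
    y_sq = 9 * (1 - x * x / 25)
    return math.sqrt((x - 4) ** 2 + y_sq)

def average_distance(n=10_000):
    """Average of distance_to_focus over -5 <= x <= 5 (midpoint rule)."""
    a, b = -5.0, 5.0
    h = (b - a) / n
    total = sum(distance_to_focus(a + (i + 0.5) * h) for i in range(n))
    return total * h / (b - a)

if __name__ == "__main__":
    print(distance_to_focus(0.0), (25 - 4 * 0.0) / 5)  # both should be 5
    print(average_distance())
```

Note that this average is taken with respect to x, as in the example; averaging with respect to arc length or angle would be a different calculation.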

🎲 Monte Carlo approximation

🎲 When integration is hard

Problem: Some functions are not easily integrable (e.g., no closed-form antiderivative).
Alternative: Use the Monte Carlo method instead of numerical integration techniques.

🎲 The Monte Carlo idea

Go back to the finite arithmetic mean definition.
Take a large number N of random numbers x₁, x₂, ..., x_N, uniformly distributed in [a, b].
Approximate the average:
- ⟨f⟩ ≈ (f(x₁) + f(x₂) + ... + f(x_N)) / N
Why it works: As N increases, the approximations converge to the actual average.
Don't confuse: This is a step backward from calculus (using finite sums), but it is surprisingly useful and simple to implement with computers.

🎲 Implementation example

Example: Approximate the average value of f(x) = x² over [0, 1] using 100 million random numbers.

Known exact average: 1/3 = 0.33333...
Monte Carlo approximation with 10⁸ random numbers: 0.3333292094741531
The approximation is very close to the exact value.

Example: Approximate the average value of f(x) = sin(x²) over [π, 2π].

This function cannot be integrated in closed form.
Actual average: -0.04154374531416104
Monte Carlo approximation with 10⁸ random numbers: -0.04153426177596753
The method provides a practical solution when exact integration is impossible.
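
A minimal sketch of the Monte Carlo idea, using only Python's standard `random` module. The function name, the seed, and the sample size are assumptions of this sketch (far fewer samples than the 10⁸ used above, so the estimates will be less accurate than the quoted figures).

```python
import math
import random

def monte_carlo_average(f, a, b, n, seed=0):
    """Estimate <f> over [a, b] as the mean of f at n uniform random points."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))
    return total / n

if __name__ == "__main__":
    n = 200_000
    print(monte_carlo_average(lambda x: x * x, 0.0, 1.0, n))
    print(monte_carlo_average(lambda x: math.sin(x * x), math.pi, 2 * math.pi, n))
```

The error of such an estimate typically shrinks like 1/√N, so each extra digit of accuracy costs roughly 100 times more samples.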

Arc Length and Curvature

8.3 Arc Length and Curvature

🧭 Overview

🧠 One-sentence thesis

Calculus enables us to compute the length of curves through integration formulas derived from infinitesimal right triangles, and curvature measures how rapidly a curve changes direction per unit of arc length.

📌 Key points (3–5)

Arc length formula foundation: By dividing infinitesimal right triangles by dx (or dt, or dθ), we convert infinitesimal sides into measurable ratios that obey the Pythagorean Theorem, yielding integral formulas for curve length.
Three coordinate systems: Arc length formulas exist for Cartesian (y = f(x)), parametric (x(t), y(t)), and polar (r(θ)) curves, each derived from the same geometric principle.
Elliptic integrals: Most arc length integrals cannot be evaluated in closed form; the ellipse circumference requires special functions called elliptic integrals.
Curvature definition: Curvature κ measures the rate of change of tangent line angle with respect to arc length (dφ/ds), not just the second derivative.
Common confusion: The second derivative alone does not capture curvature—the parabola y = x² has constant second derivative but varying curvature at different points.

📏 Arc length formulas and derivation

📐 The infinitesimal triangle trick

The key insight: For a curve over an infinitesimal interval [x, x + dx], the curve is straight (by the Microstraightness Property) with length ds.

Why you cannot directly apply Pythagorean Theorem: Writing ds = √((dx)² + (dy)²) = √(0 + 0) = 0 is false because ds is a positive infinitesimal.
The solution: Divide all sides of the infinitesimal right triangle by dx, creating a similar but noninfinitesimal triangle with sides 1, dy/dx, and ds/dx.
Now the Pythagorean Theorem applies: (ds/dx)² = 1² + (dy/dx)², so ds = √(1 + (dy/dx)²) dx.
Summing infinitesimals: Integrate ds from a to b to get total arc length s.

📊 Cartesian arc length

Formula (8.4): The arc length s of a curve y = f(x) over [a, b] is: s = integral from a to b of √(1 + (dy/dx)²) dx

Example: For y = cosh x over [0, 1], since dy/dx = sinh x and 1 + sinh²x = cosh²x, the integral simplifies to ∫ cosh x dx = sinh 1 ≈ 1.1752.
Practical limitation: Most arc length integrals cannot be evaluated in closed form and require numerical methods.

🔄 Parametric arc length

Formula (8.5): The arc length s of a parametric curve x = x(t), y = y(t) for a ≤ t ≤ b is: s = integral from a to b of √((dx/dt)² + (dy/dt)²) dt

Derivation: Divide the infinitesimal right triangle by dt instead of dx, then apply Pythagorean Theorem.
Example: For x = cos³t, y = sin³t over [0, π/2], the integrand simplifies to 3 sin t cos t = (3/2) sin 2t, so s = (3/2) ∫ sin 2t dt = 3/2.

🌀 Polar arc length

Formula (8.6): The arc length s of a polar curve r = r(θ) for α ≤ θ ≤ β is: s = integral from α to β of √(r² + (dr/dθ)²) dθ

Derivation: Treat as a special case of parametric with x = r(θ)cos θ and y = r(θ)sin θ.
Example: For a circle r = R (constant), dr/dθ = 0, so s = ∫₀²π R dθ = 2πR, proving the familiar circumference formula.
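
Since most arc length integrals need numerical evaluation anyway, here is a small sketch of the parametric formula (8.5) with a midpoint rule. The Cartesian and polar cases are handled by treating x or θ as the parameter. The function name and the choice of rule are assumptions of this sketch.

```python
import math

def arc_length(x_prime, y_prime, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of sqrt(x'(t)^2 + y'(t)^2) dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += math.hypot(x_prime(t), y_prime(t))
    return total * h

if __name__ == "__main__":
    # y = cosh x over [0, 1], parametrized as x = t, y = cosh t
    print(arc_length(lambda t: 1.0, math.sinh, 0.0, 1.0), math.sinh(1.0))
    # x = cos^3 t, y = sin^3 t over [0, pi/2]
    print(arc_length(lambda t: -3 * math.cos(t) ** 2 * math.sin(t),
                     lambda t: 3 * math.sin(t) ** 2 * math.cos(t),
                     0.0, math.pi / 2))
    # circle of radius R = 2: x = R cos t, y = R sin t over [0, 2 pi]
    print(arc_length(lambda t: -2 * math.sin(t), lambda t: 2 * math.cos(t),
                     0.0, 2 * math.pi), 4 * math.pi)
```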

🥚 Elliptic integrals

🚫 When closed forms fail

The ellipse problem: Attempting to find the circumference of the ellipse (x²/a²) + (y²/b²) = 1 leads to an integral that cannot be evaluated in closed form.
After trigonometric substitution x = a sin θ, the circumference becomes s = 4a ∫₀^(π/2) √(1 - e² sin²θ) dθ, where e is the eccentricity.

📚 Elliptic integral of the second kind

Definition: E(k, φ) = integral from 0 to φ of √(1 - k² sin²θ) dθ; the special case E(k) = E(k, π/2).

Ellipse circumference: s = 4a E(e), where e = c/a is the eccentricity and c = √(a² - b²).
Practical evaluation: Use tables or scientific computing software (Sage, MATLAB, Octave) with built-in elliptic integral functions.
Example: For the ellipse (x²/25) + (y²/9) = 1, we have a = 5, e = 0.8, so s = 20 E(0.8) ≈ 25.527.
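
Without a library that provides elliptic integrals, E(k) can still be approximated by integrating its definition numerically. The sketch below does this with a midpoint rule; the function names are illustrative and this is not how the software packages named above implement it.

```python
import math

def elliptic_e(k, phi=math.pi / 2, n=10_000):
    """Midpoint-rule approximation of E(k, phi), the integral of sqrt(1 - k^2 sin^2 t) on [0, phi]."""
    h = phi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.sqrt(1 - (k * math.sin(t)) ** 2)
    return total * h

def ellipse_circumference(a, b):
    """Circumference 4a E(e) of x^2/a^2 + y^2/b^2 = 1, assuming a >= b > 0."""
    e = math.sqrt(a * a - b * b) / a  # eccentricity e = c/a
    return 4 * a * elliptic_e(e)

if __name__ == "__main__":
    print(ellipse_circumference(5.0, 3.0))
    print(ellipse_circumference(1.0, 1.0), 2 * math.pi)  # circle as a sanity check
```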

📐 Curvature concepts

🔄 Why second derivative is insufficient

The parabola example: For y = x², the second derivative d²y/dx² = 2 everywhere, yet the curve is clearly more curved at the origin than at (1, 1).
What's missing: The second derivative measures rate of change of slope, but does not account for how rapidly the curve is traversed.

📐 Average curvature

Definition (8.7): The average curvature between points A and B is κ̄ = α/s, where s is arc length and α is the angle of contingence (angle between tangent lines at A and B).

Intuition: For the same arc length, a larger angle between tangents indicates greater curvature.
This ratio measures how much direction changes per unit distance traveled.

🎯 Instantaneous curvature

Definition (8.8): The curvature κ at a point is κ = dφ/ds, where φ is the angle the tangent line makes with the positive x-axis and s is arc length.

Interpretation: Curvature is the instantaneous rate of change of direction with respect to distance (not time).
As point B approaches A, the average curvature approaches the instantaneous curvature: lim(B→A) κ̄ = dφ/ds = κ.

🧮 Curvature formulas

📈 Cartesian curvature

Formula (8.9): For y = f(x), the curvature is κ = f''(x) / (1 + (f'(x))²)^(3/2)

Derivation: Use φ = arctan(f'(x)) and the chain rule with ds = √(1 + (f'(x))²) dx.
Sign interpretation: κ > 0 when concave up, κ < 0 when concave down, κ = 0 for straight lines.
Example: For y = x², κ(0) = 2 but κ(1) = 2/(5^(3/2)) ≈ 0.1789, confirming greater curvature at the origin.
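
Formula (8.9) translates directly into code. In this sketch the derivatives are passed in by hand, and the function name is illustrative.

```python
def curvature(f_prime, f_double_prime, x):
    """Signed curvature of y = f(x) at x, from formula (8.9)."""
    return f_double_prime(x) / (1 + f_prime(x) ** 2) ** 1.5

if __name__ == "__main__":
    # y = x^2: f'(x) = 2x and f''(x) = 2
    for x in (0.0, 0.5, 1.0, 2.0):
        print(x, curvature(lambda u: 2 * u, lambda u: 2.0, x))
```

Although f''(x) = 2 everywhere, the printed curvature shrinks as |x| grows, because the denominator accounts for how steeply the curve is rising.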

🔄 Parametric curvature

Formula (8.10): For x = x(t), y = y(t), the curvature is κ = (x'(t)y''(t) - y'(t)x''(t)) / ((x'(t))² + (y'(t))²)^(3/2)

Derivation: Use dy/dx = y'(t)/x'(t) and the chain rule for d²y/dx², then substitute into formula (8.9).

🌀 Polar curvature

Formula (8.11): For r = r(θ), the curvature is κ = (r² + 2(r')² - rr'') / (r² + (r')²)^(3/2)

⭕ Circle curvature

Special result: A circle of radius R has constant curvature κ = 1/R.
General principle: Any planar curve with constant curvature is either a line (κ = 0) or part of a circle.
Don't confuse: Larger circles have smaller curvature (1/R decreases as R increases).
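
The circle result can be checked with the parametric formula (8.10). A minimal sketch with illustrative names, using the counterclockwise parametrization x = R cos t, y = R sin t:

```python
import math

def parametric_curvature(xp, yp, xpp, ypp, t):
    """Signed curvature from formula (8.10); arguments are x', y', x'', y''."""
    num = xp(t) * ypp(t) - yp(t) * xpp(t)
    return num / (xp(t) ** 2 + yp(t) ** 2) ** 1.5

def circle_curvature(radius, t):
    """Curvature of x = R cos t, y = R sin t at parameter t."""
    return parametric_curvature(
        lambda s: -radius * math.sin(s),  # x'
        lambda s: radius * math.cos(s),   # y'
        lambda s: -radius * math.cos(s),  # x''
        lambda s: -radius * math.sin(s),  # y''
        t,
    )

if __name__ == "__main__":
    for radius in (1.0, 2.0, 4.0):
        print(radius, circle_curvature(radius, 0.7), 1 / radius)
```

The value does not depend on t, and doubling the radius halves the curvature.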

Surfaces and Solids of Revolution

8.4 Surfaces and Solids of Revolution

🧭 Overview

🧠 One-sentence thesis

Single-variable calculus can compute surface areas and volumes of three-dimensional objects that possess rotational symmetry by revolving curves or regions around an axis and integrating infinitesimal elements.

📌 Key points (3–5)

Surface of revolution: revolving a curve around an axis produces a 3D surface whose area can be found by summing infinitesimal frustum areas.
Solid of revolution: revolving a region around an axis produces a 3D solid whose volume can be found using the disc method (slicing perpendicular to the axis) or shell method (cylindrical shells parallel to the axis).
Disc vs shell method: disc method integrates cross-sectional areas (discs); shell method integrates cylindrical shell volumes—choose based on the axis of revolution and the region's orientation.
Common confusion: the disc method uses the square of the radius (π r² h), while the shell method uses the circumference times height (2π r h w); mixing these formulas leads to errors.
Why it matters: these techniques extend familiar geometric formulas (sphere volume, surface area) to arbitrary curves and provide a systematic way to handle rotational symmetry.

🎯 Surface of revolution

🔄 What a surface of revolution is

Surface of revolution: the 3D surface produced by revolving a curve y = f(x) around an axis (typically the x-axis or y-axis).

Start with a curve y = f(x) ≥ 0 over an interval [a, b].
Revolve it around the x-axis to create a 3D surface.
The goal is to find the total lateral surface area S of this surface.

📐 How to find surface area

Pick an infinitesimal interval [x, x + dx] on the curve.
By the Microstraightness Property, the curve is a straight line segment of length ds over that tiny interval.
Revolving this segment produces a frustum (a cone with the top chopped off).
The frustum has radii r₁ = f(x) and r₂ = f(x + dx) = f(x) + dy, and slant height l = ds.
The lateral surface area formula for a frustum is π(r₁ + r₂)l.
Substituting gives dS = π(2f(x) + dy) ds; with ds = √(1 + (f'(x))²) dx, the dy·ds term is a multiple of (dx)² and vanishes, leaving:

dS = 2π f(x) √(1 + (f'(x))²) dx
Integrate dS from a to b to get the total surface area S.

📊 Surface area formulas

Axis of revolution	Formula	Notes
x-axis	S = ∫ 2π |y| ds = ∫ 2π |f(x)| √(1 + (f'(x))²) dx	Absolute value handles negative y
y-axis	S = ∫ 2π |x| ds = ∫ 2π x √(1 + (f'(x))²) dx	For 0 ≤ a ≤ x ≤ b

🌐 Example: sphere surface area

Use the upper half of the circle x² + y² = r², which is y = √(r² - x²) over [-r, r].
Revolve around the x-axis to produce a sphere of radius r.
Apply the formula: the derivative f'(x) = -x/√(r² - x²).
After simplification, the integrand becomes 2πr (constant).
Integrating from -r to r gives S = 4πr², the familiar sphere surface area formula.
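
The sphere computation can be reproduced numerically. The sketch below assumes a midpoint rule, which conveniently never evaluates the endpoints x = ±r where f'(x) is undefined; the function names are illustrative.

```python
import math

def surface_area_x_axis(f, f_prime, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of 2 pi |f(x)| sqrt(1 + f'(x)^2) dx."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += 2 * math.pi * abs(f(x)) * math.sqrt(1 + f_prime(x) ** 2)
    return total * h

def sphere_surface_area(r):
    """Surface area from revolving y = sqrt(r^2 - x^2) around the x-axis."""
    f = lambda x: math.sqrt(r * r - x * x)
    f_prime = lambda x: -x / math.sqrt(r * r - x * x)
    return surface_area_x_axis(f, f_prime, -r, r)

if __name__ == "__main__":
    print(sphere_surface_area(2.0), 4 * math.pi * 2.0 ** 2)
```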

🥏 Disc method for volumes

🍞 Core idea: slicing like bread

Disc method: find the volume of a solid of revolution by slicing it into thin discs perpendicular to the axis of revolution, then summing their volumes.

Revolve the region between y = f(x) and the x-axis around the x-axis over [a, b].
This produces a solid of revolution (the surface plus its interior).
At position x, draw an infinitesimal vertical strip of width dx from the x-axis up to the curve.
Revolve this strip around the x-axis to produce a thin disc (right circular cylinder).

🔢 Volume of one disc

The strip is a rectangle of height f(x) and width dx.
The small triangle at the top (from the curve's slope) has area ½ dy·dx = ½ f'(x)(dx)² = 0 (infinitesimal squared).
So only the rectangle contributes: it sweeps out a cylinder of radius r = f(x) and height h = dx.
Volume of a cylinder = π r² h, so:

dV = π (f(x))² dx

📐 Total volume formula

Sum all the disc volumes by integrating:

V = ∫ᵇₐ π (f(x))² dx
No absolute value needed because f(x) is squared (negative values are handled automatically).

🌍 Example: sphere volume

Use the upper half of the circle y = √(r² - x²) over [-r, r].
Revolve the region between this curve and the x-axis around the x-axis.
V = ∫ π(r² - x²) dx from -r to r.
Evaluating: π[r²x - x³/3] from -r to r = 4πr³/3, the familiar sphere volume formula.
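
A numerical version of the disc method, as a sketch with illustrative names: each term in the sum is the volume π(f(x))² dx of one thin disc.

```python
import math

def disc_volume(f, a, b, n=10_000):
    """Sum of thin-disc volumes pi * f(x)^2 * dx over [a, b] (midpoint rule)."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        total += math.pi * f(x) ** 2 * dx
    return total

if __name__ == "__main__":
    r = 3.0
    upper_half_circle = lambda x: math.sqrt(r * r - x * x)
    print(disc_volume(upper_half_circle, -r, r), 4 * math.pi * r ** 3 / 3)
```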

🔄 Discs around other axes

The disc method works for any axis, not just the x-axis.
Key: identify the radius r and height h of each disc.
Example: revolving around a vertical line x = 1 requires horizontal strips with height dy.
The radius becomes the horizontal distance from the strip to the line x = 1.

🥫 Shell method for volumes

🕳️ When to use shells

Shell method: find the volume by revolving vertical (or horizontal) strips to produce cylindrical shells, useful when the solid has a hole in the middle or when the disc method is awkward.

Useful when revolving around an axis parallel to the strips (e.g., revolving around the y-axis using vertical strips).
The region may have a gap between the axis and the region, creating a hollow solid.

🔢 Volume of one shell

At position x in [a, b), draw a vertical strip of width dx from the x-axis up to y = f(x).
Revolve this strip around the y-axis to produce a cylindrical shell.
The shell is the volume between two cylinders: outer radius (x + dx) and inner radius x, both with height f(x).
Volume = π(x + dx)² f(x) - πx² f(x) = 2πx f(x) dx (after dropping (dx)² terms).

dV = 2πx |f(x)| dx

📐 Shell method formula

For revolution around the y-axis over 0 ≤ a ≤ x ≤ b:

V = ∫ᵇₐ 2πx |f(x)| dx

🎯 General shell formula

More generally: dV = 2π r h w, where:
- r = distance from the axis of revolution to the strip
- h = height of the strip
- w = width of the strip (dx or dy)

📊 Example: comparing methods

Region between y = x² and the x-axis for 0 ≤ x ≤ 1, revolved around the y-axis.
Using the shell method: V = ∫₀¹ 2πx · x² dx = π/2.
The disc method would require solving for x in terms of y and integrating over y, which is more complex here.
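
For comparison, here is a sketch of the same volume using horizontal strips (this set-up is an illustration, not worked in the text above). Solving y = x² for x gives x = √y, so a horizontal strip at height y runs from x = √y to x = 1. Revolving it around the y-axis produces a washer with outer radius 1 and inner radius √y:

```latex
V = \int_0^1 \pi\left(1^2 - (\sqrt{y})^2\right)dy
  = \pi \int_0^1 (1 - y)\,dy
  = \pi\left[\,y - \frac{y^2}{2}\,\right]_0^1
  = \frac{\pi}{2}
```

Both methods agree on π/2; the shell set-up simply avoids inverting the function.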

🔀 Choosing the right method

🧭 Disc vs shell decision guide

Situation	Preferred method	Why
Revolving around x-axis, vertical strips natural	Disc method	Discs perpendicular to x-axis
Revolving around y-axis, vertical strips natural	Shell method	Shells parallel to y-axis
Revolving around y-axis, horizontal strips natural	Disc method	Discs perpendicular to y-axis
Solid has a hole / gap from axis	Shell method	Naturally handles hollow solids
Region between two curves	Either	Shell often simpler for subtraction

⚠️ Don't confuse the formulas

Disc method: uses π r² (area of a circle) times height → π(f(x))² dx.
Shell method: uses 2π r (circumference) times height times width → 2πx f(x) dx.
Mixing these leads to wrong answers: remember disc = area × height, shell = circumference × height × width.

🔄 Example: region between two curves

Region between y = x² and y = x for 0 ≤ x ≤ 1, revolved around the y-axis.
Using shells: at position x, the strip has height h = x - x² (upper curve minus lower curve).
dV = 2πx(x - x²) dx.
V = ∫₀¹ 2πx(x - x²) dx = π/6.
This is simpler than using the disc method, which would require splitting into two integrals and subtracting volumes.
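
A numerical cross-check of both approaches for this region. This is a sketch, assuming a simple midpoint rule; `midpoint_integral` is a hypothetical helper, and the washer set-up (outer radius √y, inner radius y) is derived here rather than taken from the text.

```python
import math

def midpoint_integral(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

# Shells: radius x, height (x - x**2), width dx
shell_volume = midpoint_integral(lambda x: 2 * math.pi * x * (x - x * x), 0.0, 1.0)

# Washers: outer radius sqrt(y) (from y = x**2), inner radius y (from y = x), thickness dy
washer_volume = midpoint_integral(lambda y: math.pi * (y - y * y), 0.0, 1.0)

print(shell_volume, washer_volume, math.pi / 6)
```

All three printed numbers should be close to π/6.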

Applications in Physics and Statistics

8.5 Applications in Physics and Statistics

🧭 Overview

🧠 One-sentence thesis

Integrals extend discrete summation formulas to continuous domains, enabling calculations of center of gravity, work done by variable forces, and probabilities for continuous random variables.

📌 Key points (3–5)

From discrete to continuous: Integrals replace finite sums when dealing with continuous distributions of mass, force, or probability.
Center of gravity: The balance point of a region is found by integrating moments (mass times position) and dividing by total mass.
Work with variable forces: Work equals the integral of force over displacement when force changes with position.
Continuous probability: For continuous random variables, probability density functions integrate to 1, and probabilities are found by integrating over intervals.
Common confusion: Work can be negative (when displacement opposes the force's direction). Separately, probability density values can exceed 1; only the integral over all space must equal 1.

⚖️ Center of gravity

⚖️ Discrete case foundation

Center of gravity (discrete): The point where a rod with attached masses balances; the sum of moments divided by total mass.

For masses m₁, m₂, ..., mₙ at positions x₁, x₂, ..., xₙ, the center of gravity is at x̄ = (sum of mₖxₖ) / (sum of mₖ).
Each product mₖxₖ is called the moment of mass mₖ.
The rod balances when total torque (force times position relative to the balance point) equals zero.

📐 Extending to planar regions

Lamina: A thin plate with uniform density; its area serves as its mass.

A region between curves y = f₁(x) and y = f₂(x) over [a, b] is treated as a continuous distribution of mass.
Divide the region into vertical strips of width dx at position x.
Each strip has mass (f₁(x) - f₂(x))dx and its center of gravity is at its geometric center.

🧮 Moment formulas

The moment about the x-axis for a strip:

mₓ = (f₁(x) - f₂(x))dx · (1/2)(f₁(x) + f₂(x)) = (1/2)((f₁(x))² - (f₂(x))²)dx

The moment about the y-axis for a strip:

m_y = (f₁(x) - f₂(x))dx · x = x(f₁(x) - f₂(x))dx

Total moments: Mₓ and M_y are integrals of mₓ and m_y from a to b.

📍 Final formula

For a region between y = f₁(x) and y = f₂(x) over [a, b] with f₁(x) ≥ f₂(x):

x̄ = M_y / M = (integral of x(f₁(x) - f₂(x))dx) / (integral of (f₁(x) - f₂(x))dx)
ȳ = Mₓ / M = (integral of (1/2)((f₁(x))² - (f₂(x))²)dx) / (integral of (f₁(x) - f₂(x))dx)

Example: For the region under y = x² from x = 0 to x = 1, the center of gravity is at (3/4, 3/10).
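
The centroid formulas translate directly into code. This is a minimal sketch, assuming midpoint-rule integration; `centroid` and `midpoint_integral` are hypothetical helper names.

```python
def midpoint_integral(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

def centroid(f1, f2, a, b):
    """Center of gravity of the region between y = f1(x) (top) and y = f2(x) (bottom)."""
    mass = midpoint_integral(lambda x: f1(x) - f2(x), a, b)                     # M
    m_y = midpoint_integral(lambda x: x * (f1(x) - f2(x)), a, b)                # moment about the y-axis
    m_x = midpoint_integral(lambda x: 0.5 * (f1(x) ** 2 - f2(x) ** 2), a, b)    # moment about the x-axis
    return m_y / mass, m_x / mass

# Region under y = x^2 on [0, 1]; the lower curve is the x-axis
x_bar, y_bar = centroid(lambda x: x * x, lambda x: 0.0, 0.0, 1.0)
print(x_bar, y_bar)
```

The output should be close to (0.75, 0.3), in agreement with (3/4, 3/10).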

💪 Work done by forces

💪 Constant force case

Work (constant force): Force times displacement in the direction of the force.

If constant force F moves an object from x = a to x = b, then W = F · (b - a).
Work is a scalar (it has no direction), though its value can be positive or negative.

🔄 Variable force extension

When force F(x) varies with position:

Over an infinitesimal interval [x, x + dx], the force is essentially F(x).
The infinitesimal work is dW = F(x)dx.
Total work: W = integral of F(x)dx from a to b.

⚠️ Important distinctions

Force as a vector in one dimension:

Positive sign = direction toward +∞
Negative sign = direction toward -∞
Magnitude is the absolute value

Work sign conventions:

Positive work: displacement in same direction as force
Negative work: displacement opposite to force direction
Zero work: no displacement in the force's direction (e.g., perpendicular forces)

Example: Lifting an object upward—you do positive work, but gravity does negative work.

🌀 Hooke's law application

Hooke's law: A spring's restoring force is F = -kx, where k is the spring constant and x is displacement from equilibrium.

The force to stretch or compress the spring counters the restoring force: F = kx.
Spring constant: k = F/x (force per unit displacement).
Work to compress or stretch: W = integral of kx dx over the displacement interval.

Example: If k = 50 N/m, compressing a spring 3 cm (0.03 m) requires work W = integral of 50x dx from 0 to -0.03 = 25x² evaluated from 0 to -0.03 = 25(0.0009) = 0.0225 N·m (joules).
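
The spring computation as a short sketch, using the antiderivative kx²/2 of the applied force kx; `spring_work` is a hypothetical helper name.

```python
def spring_work(k, x_start, x_end):
    """Work done against a spring: integral of k*x dx = (k/2) * (x_end**2 - x_start**2)."""
    return 0.5 * k * (x_end ** 2 - x_start ** 2)

# k = 50 N/m, compressing from equilibrium (x = 0) to x = -0.03 m
work = spring_work(50.0, 0.0, -0.03)
print(work)
```

Because x is squared, compressing (negative x) and stretching (positive x) by the same amount from equilibrium require the same work.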

🎲 Probability for continuous random variables

🎲 Discrete vs continuous

Discrete random variable:

Takes only specific isolated values
Probability P(X = x) can be nonzero
Probabilities sum to 1

Continuous random variable:

Takes any value in a continuum (interval or all real numbers)
P(X = x) = 0 for every specific x
Probabilities are found over intervals

📊 Probability density function

Probability density function f(x): A function f(x) ≥ 0 such that the integral of f(x) over all space equals 1, and P(a < X < b) equals the integral of f(x) from a to b.

Requirements:

Integral of f(x)dx from -∞ to ∞ equals 1
P(a < X < b) = integral of f(x)dx from a to b
For continuous variables, < and ≤ are interchangeable (since P(X = a) = 0)

📈 Exponential distribution example

For a component with average lifetime 700 days:

f(x) = λe^(-λx) for x ≥ 0, where λ = 1/700
f(x) = 0 for x < 0

Finding probabilities:

P(600 < X < 800) = integral of f(x)dx from 600 to 800 ≈ 0.1055 (about 10.55% chance)
P(X > 700) = integral of f(x)dx from 700 to ∞ ≈ 0.3679
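
These probabilities can be reproduced with the antiderivative of the density. This sketch assumes the closed form F(x) = 1 - e^(-λx) for x ≥ 0, obtained by integrating λe^(-λt) from 0 to x; `exponential_cdf` is a hypothetical helper name.

```python
import math

def exponential_cdf(x, rate):
    """F(x) = P(X <= x) = 1 - exp(-rate * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

rate = 1 / 700   # lambda for an average lifetime of 700 days

p_between = exponential_cdf(800, rate) - exponential_cdf(600, rate)   # P(600 < X < 800)
p_beyond = 1.0 - exponential_cdf(700, rate)                           # P(X > 700) = e^(-1)
print(round(p_between, 4), round(p_beyond, 4))
```

Note that P(X > 700) is below 1/2: fewer than half of the components outlast the average lifetime.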

🔔 Key concepts mentioned

Distribution function: F(x) = P(X ≤ x); its derivative equals the probability density function f(x).

Expected value: E[X] = integral of x·f(x)dx from -∞ to ∞; the weighted average of all possible values, analogous to center of gravity.

Normal distribution: The "bell curve" with density function involving e^(-(x-μ)²/(2σ²)).

Don't confuse: The density f(x) itself can exceed 1; only the total integral must equal 1.

Sequences and Series

9.1 Sequences and Series

🧭 Overview

🧠 One-sentence thesis

Infinite sequences and series provide a mathematical framework for understanding limits and sums of infinitely many terms, with convergence determined by whether partial sums or sequence terms approach a finite value.

📌 Key points (3–5)

Sequences vs series: A sequence is an ordered list of numbers; a series is the sum of a sequence's terms.
Convergence: A sequence converges if its terms approach a finite limit; a series converges if its partial sums approach a finite limit.
Geometric progressions: Series of the form a + ar + ar² + ... converge to a/(1 - r) when |r| < 1.
Common confusion: The limit of a sequence only means terms get arbitrarily close to the limit value, not that they actually reach it; this distinction matters when applying math to physical problems.
Practical techniques: Limits of sequences can often be found using the same rules as function limits (L'Hôpital's Rule, sum/product rules) by replacing x with n.

📋 Fundamental definitions

📋 What is a sequence

A sequence is an ordered list of objects (in this book, always real numbers), which can be finite or infinite.

Finite sequence: has a last number in the list.
Infinite sequence: every number is followed by another (a "successor").
Order matters: sequences 〈1, 2, 3〉 and 〈1, 3, 2〉 are different, unlike sets where {1, 2, 3} = {1, 3, 2}.
Repetition allowed: numbers may repeat in sequences but not in sets.

Function representation: Any infinite sequence a₀, a₁, a₂, ... can be viewed as a function f mapping the natural numbers N into the real numbers R, with f(n) = aₙ (the sequence lists the values of f in order).

Notation: {aₙ}∞ₙ₌₀ or simply {aₙ} when the starting index is understood (n is always an integer).

📋 What is a series

An infinite series is the sum of an infinite sequence.

If the sequence is {aₙ}∞ₙ₌₀, the series is written as the sum from n=0 to infinity of aₙ = a₀ + a₁ + a₂ + ... + aₙ + ...
The sum is defined through partial sums: sₙ = sum from k=0 to n of aₖ = a₀ + a₁ + a₂ + ... + aₙ.
The series sum equals the limit as n approaches infinity of sₙ.

🎯 Convergence and divergence

🎯 Sequence convergence

A real number L is the limit of an infinite sequence {aₙ}, written as limit as n approaches infinity of aₙ = L or simply aₙ → L, if for any given number ε > 0 there exists an integer N such that |aₙ - L| < ε for all n > N.

Plain language: A sequence converges to L if the terms aₙ can be made arbitrarily close to L for n sufficiently large.

Convergent sequence: converges to some limit L.
Divergent sequence: does not converge.

Example: For aₙ = 1/(2ⁿ) with n ≥ 1, the limit as n approaches infinity is 0, so the sequence converges to 0.

Example: For aₙ = (2n + 1)/(3n + 2), using L'Hôpital's Rule treating n as a real variable x gives limit 2/3, so the sequence converges to 2/3.

Example: For aₙ = eⁿ/(3n + 2), L'Hôpital's Rule gives limit infinity, so the sequence diverges.
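
The ε-N definition can be explored numerically for aₙ = (2n + 1)/(3n + 2). This is a sketch using exact fractions; `first_index_within` is a hypothetical helper. Here |aₙ - 2/3| = 1/(9n + 6) is decreasing, so once one term is within ε of the limit, every later term is too.

```python
from fractions import Fraction

def first_index_within(a, limit, eps, n_start=1, n_max=10**6):
    """Return the first n >= n_start with |a(n) - limit| < eps, or None if none is found."""
    for n in range(n_start, n_max + 1):
        if abs(a(n) - limit) < eps:
            return n
    return None

a = lambda n: Fraction(2 * n + 1, 3 * n + 2)

# For eps = 1/100 we need 1/(9n + 6) < 1/100, i.e. 9n + 6 > 100
n_found = first_index_within(a, Fraction(2, 3), Fraction(1, 100))
print(n_found)
```

If the search returns n₀, then N = n₀ - 1 satisfies the definition for this ε, since all terms with n > N lie within ε of 2/3.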

🎯 Series convergence

A series is convergent if its sequence of partial sums {sₙ} converges to a real number s; if {sₙ} diverges, the series is divergent.

Key insight: To determine if a series converges, examine whether the partial sums converge.

🔧 Practical limit techniques

Most cases don't need the formal definition: Use the same rules from function limits (Chapters 1 and 3) by replacing x with n:

Sums and products of limits
L'Hôpital's Rule
Standard limit formulas

Example: For aₙ = eⁿ/(3n + 2), treat n as a real variable x and apply L'Hôpital's Rule: limit as x → ∞ of eˣ/(3x + 2) = limit as x → ∞ of eˣ/3 = ∞, so the sequence diverges.

🔢 Geometric progressions

🔢 Definition and formula

A geometric progression is a series of the form a + ar + ar² + ar³ + ... + arⁿ + ... with a ≠ 0.

Convergence condition: The series converges when |r| < 1.

Sum formula: When |r| < 1, the sum from n=0 to infinity of arⁿ = a/(1 - r).

Derivation sketch:

Multiply the n-th partial sum sₙ by r: rsₙ = ar + ar² + ar³ + ... + arⁿ + arⁿ⁺¹
Subtract: sₙ - rsₙ = a - arⁿ⁺¹
Solve: sₙ = a(1 - rⁿ⁺¹)/(1 - r)
Take limit as n → ∞: rⁿ⁺¹ → 0 when |r| < 1, so limit of sₙ = a/(1 - r)
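
The partial-sum formula from the derivation can be verified exactly. This is a sketch using rational arithmetic; the values a = 3 and r = 1/2 are arbitrary illustrative choices.

```python
from fractions import Fraction

def partial_sum_direct(a, r, n):
    """s_n = a + a*r + ... + a*r**n, added term by term."""
    return sum(a * r ** k for k in range(n + 1))

def partial_sum_formula(a, r, n):
    """s_n = a * (1 - r**(n + 1)) / (1 - r), valid for r != 1."""
    return a * (1 - r ** (n + 1)) / (1 - r)

a, r = Fraction(3), Fraction(1, 2)
matches = all(partial_sum_direct(a, r, n) == partial_sum_formula(a, r, n) for n in range(20))

limit = a / (1 - r)                            # a / (1 - r)
gap = limit - partial_sum_formula(a, r, 50)    # equals a * r**51 / (1 - r), shrinking as n grows
print(matches, limit, float(gap))
```

The gap between sₙ and a/(1 - r) is exactly a·rⁿ⁺¹/(1 - r), which tends to 0 only when |r| < 1.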

🔢 Applications

Example (Zeno's distances): The sum from n=1 to infinity of 1/(2ⁿ) is a geometric progression with a = 1/2 and r = 1/2, so the sum = (1/2)/(1 - 1/2) = 1.

Example (repeating decimals): Write 0.171717... as a rational number.

This equals 0.17 + 0.0017 + 0.000017 + ...
Geometric progression with a = 0.17 = 17/100 and r = 0.01 = 1/100
Sum = 0.17/(1 - 0.01) = 0.17/0.99 = 17/99
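
The same idea works for any repeating block of digits. This is a minimal sketch; `repeating_decimal` is a hypothetical helper, and it assumes the repeating block starts immediately after the decimal point.

```python
from fractions import Fraction

def repeating_decimal(block, digits):
    """Value of 0.(block)(block)..., where block is an integer written with `digits` digits."""
    a = Fraction(block, 10 ** digits)   # first term, e.g. 17/100
    r = Fraction(1, 10 ** digits)       # common ratio, e.g. 1/100
    return a / (1 - r)                  # geometric sum a / (1 - r)

value = repeating_decimal(17, 2)
print(value)
```

For example, `repeating_decimal(3, 1)` treats 0.333... as a geometric progression with a = 3/10 and r = 1/10.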

🌟 Special sequences

🌟 Fibonacci sequence

Definition: {Fₙ} starts with F₀ = 0, F₁ = 1, and each successive term is the sum of the previous two: Fₙ = Fₙ₋₁ + Fₙ₋₂ for integers n ≥ 2.

This is a recurrence relation.

First ten terms: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34.

Divergence: {Fₙ} diverges since Fₙ → ∞.

🌟 Fibonacci ratios and the golden ratio

Ratio sequence: For n ≥ 2, define aₙ = Fₙ/Fₙ₋₁ (ratio of each term to the previous term).

First few values: a₂ = 1, a₃ = 2, a₄ = 1.5, a₅ ≈ 1.667

Finding the limit (assuming the ratios converge):

Assume aₙ → a for some real number a
Divide Fₙ = Fₙ₋₁ + Fₙ₋₂ by Fₙ₋₁: Fₙ/Fₙ₋₁ = 1 + Fₙ₋₂/Fₙ₋₁
This gives aₙ = 1 + 1/aₙ₋₁
Take limit as n → ∞: a = 1 + 1/a
Solve: a² - a - 1 = 0, so a = (1 ± √5)/2
Since a must be positive, a = (1 + √5)/2 ≈ 1.618

The golden ratio is (1 + √5)/2, the limit of the Fibonacci ratio sequence.
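
A quick numerical look at the ratio sequence. This is a sketch; `fibonacci_ratios` is a hypothetical helper that starts at a₂ = F₂/F₁.

```python
import math

def fibonacci_ratios(count):
    """Return [F_2/F_1, F_3/F_2, ...] with `count` entries."""
    prev, curr = 1, 1              # F_1, F_2
    ratios = []
    for _ in range(count):
        ratios.append(curr / prev)
        prev, curr = curr, prev + curr
    return ratios

ratios = fibonacci_ratios(40)
golden = (1 + math.sqrt(5)) / 2
print(ratios[:4], ratios[-1], golden)
```

The ratios alternate above and below the golden ratio while closing in on it, which is consistent with the limit found above.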

Note: This number is the subject of many claims regarding its appearance in nature and aesthetic appeal.

🤔 Zeno's motion paradox

🤔 The Dichotomy paradox

Setup: Imagine a line segment of length 1 m with a person at one end. Before traversing the entire distance, the person must first travel 1/2 the distance. Before that, 1/4 the distance. Before that, 1/8 the distance, and so on.

Zeno's argument: There is no "first" distance to traverse, so motion cannot even begin.

Key observation: The distance markers 1/2, 1/4, 1/8, ... form an infinite sequence approaching 0, and their sum is an infinite series: 1/2 + 1/4 + 1/8 + ... = 1.

🤔 Why the geometric sum doesn't resolve the paradox

Common mistake: People claim the convergence of the geometric progression (sum = 1) proves Zeno wrong by showing "an infinite number of movements can be completed in a finite time."

Why this fails:

Zeno never argued about time: Time is irrelevant to the paradox.
Circular reasoning: Introducing time brings in speed (distance over time), but using speed assumes distance can be traveled—precisely what Zeno rejects. You cannot prove motion is possible by assuming it is possible.
The reverse problem: If you reverse the person's position to start at the other end (moving 1/2, then 1/4, then 1/8, ...), a new problem arises: there is no "last step," so motion is still impossible.

🤔 Mathematical vs physical reality

Limits only approach, never reach: Partial sums approach 1 but never actually reach it. The limit definition uses an inequality—you can get arbitrarily close, that's all. The equality in the sum formula is shorthand, an abstraction based on the real number system, not necessarily physical reality.

Don't confuse: Mathematical convergence (getting arbitrarily close) with physically reaching a destination.

The real issue: Zeno's paradox is not purely mathematical—it is about space and hence physical, with philosophical elements. All arguments against Zeno end up in circular reasoning.

Possible resolution: If space had some smallest unit that could not be divided further (not infinitely divisible), there would be no paradox: motion over a finite distance could be decomposed into a large but finite number of irreducible steps. Whether space is continuous or discrete (quantized) remains an open question in physics.

🧪 Tests for convergence

🧪 Monotone Bounded Test

A sequence that is bounded and monotone (either always increasing or always decreasing) is convergent.

Intuition: Think of a bound as a wall the sequence can never pass. An increasing sequence moves toward the bound M but can never pass it. It cannot diverge to infinity, and it cannot fluctuate back and forth since it always increases. Thus it must converge somewhere before or at M.

Limitation: This test tells you only that the sequence converges, not what it converges to.

Example: For n ≥ 1, define aₙ = (1·3·5···(2n - 1))/(2·4·6···(2n)).

The sequence is always decreasing: aₙ₊₁ = aₙ · (2n + 1)/(2n + 2) < aₙ · 1 = aₙ
The sequence is bounded: 0 < aₙ ≤ a₁ = 1/2 (every term is positive, and a decreasing sequence is bounded above by its first term)
Therefore the sequence is convergent by the Monotone Bounded Test
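
The two hypotheses can be spot-checked on the first few hundred terms. This is a sketch with exact fractions; `odd_even_ratio_terms` is a hypothetical helper. A finite check illustrates the claims but does not replace the algebraic argument.

```python
from fractions import Fraction

def odd_even_ratio_terms(count):
    """Return [a_1, ..., a_count] for a_n = (1*3*...*(2n-1)) / (2*4*...*(2n))."""
    terms = []
    a = Fraction(1)
    for n in range(1, count + 1):
        a *= Fraction(2 * n - 1, 2 * n)   # a_n = a_(n-1) * (2n - 1) / (2n)
        terms.append(a)
    return terms

terms = odd_even_ratio_terms(200)
is_decreasing = all(later < earlier for earlier, later in zip(terms, terms[1:]))
is_bounded = all(0 < t <= Fraction(1, 2) for t in terms)
print(is_decreasing, is_bounded)
```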

🧪 Comparison Test

If {bₙ} diverges to ∞ and {aₙ} is a sequence such that aₙ ≥ bₙ for all n > N for some N, then {aₙ} also diverges to ∞. Likewise, if {cₙ} diverges to -∞ and aₙ ≤ cₙ for all n > N for some N, then {aₙ} also diverges to -∞.

Intuition: Something larger than a quantity going to infinity must also go to infinity.

🧪 General principle

Finite changes don't matter: Changing or removing a finite number of terms in a sequence does not affect its convergence or divergence.

Tests for Convergence

9.2 Tests for Convergence

🧭 Overview

🧠 One-sentence thesis

A variety of tests—including monotone bounded, comparison, ratio, integral, and telescoping tests—allow us to determine whether sequences and series converge without necessarily computing their limits.

📌 Key points (3–5)

Two main categories: tests for sequence convergence (monotone bounded, comparison) and tests for series convergence (n-th term, ratio, integral, p-series, comparison, limit comparison, telescoping).
Key distinction: convergence tests tell you whether something converges, but usually not what it converges to.
Common confusion: the n-th term test can only prove divergence, never convergence—if the limit of terms is zero, the series might still diverge (e.g., the harmonic series).
Finite terms don't matter: changing or removing a finite number of terms does not affect whether a sequence or series converges or diverges.
Why it matters: these tests provide practical tools to analyze infinite processes without computing exact values.

🔢 Tests for sequence convergence

🔢 Monotone Bounded Test

A sequence that is bounded and monotone—either always increasing or always decreasing—is convergent.

What "monotone" means: the sequence either always increases or always decreases; it doesn't fluctuate back and forth.
Intuition: think of the bound as a wall the sequence can never pass. An increasing sequence moving toward a wall cannot diverge to infinity and cannot fluctuate, so it must converge somewhere before or at the wall.
Important limitation: this test tells you only that the sequence converges, not what it converges to.
Example: the sequence a_n = (1·3·5···(2n−1)) / (2·4·6···(2n)) is always decreasing (since multiplying by (2n+1)/(2n+2) < 1 makes it smaller) and bounded below by 0 and above by 1, so it converges.

🔍 Comparison Test for sequences

If {b_n} diverges to ∞ and a_n ≥ b_n for all n > N for some N, then {a_n} also diverges to ∞. Likewise, if {c_n} diverges to −∞ and a_n ≤ c_n for all n > N, then {a_n} also diverges to −∞.

Intuition: something larger than a quantity going to infinity must also go to infinity.
Key phrase: "for all n > N for some N" means the inequality only needs to hold eventually, not necessarily from the very first term.
Don't confuse: this version is for sequences (individual terms), not series (sums of terms).

📊 Tests for series convergence: basic tests

📊 n-th Term Test

If ∑a_n converges then the limit as n→∞ of a_n = 0.

Logically equivalent form (more useful in practice): If the limit as n→∞ of a_n is not equal to 0, then ∑a_n diverges.
Why it works: if the series converges to L, then each term a_n = s_n − s_(n−1) is the difference of successive partial sums; taking the limit gives L − L = 0.
Critical limitation: this test can never prove convergence, only divergence.
Example: ∑(n/(2n+1)) diverges because the limit of n/(2n+1) = 1/2 ≠ 0.
Common confusion: the harmonic series ∑(1/n) has terms approaching zero, yet it diverges—so having terms approach zero is not sufficient for convergence.

⚖️ Ratio Test

For a series ∑a_n of positive terms, let R = limit as n→∞ of (a_(n+1)/a_n). Then:

if R < 1 then the series converges,

if R > 1 (including R = ∞) then the series diverges,

if R = 1 then the test fails (inconclusive).

When R > 1: the terms do not approach zero, so the series diverges by the n-th term test.
When R = 1: the test is inconclusive—you need another test.
What it doesn't tell you: when the test shows convergence, it does not tell you what the series converges to.
Example: for ∑(n/2^n), R = limit of ((n+1)/2^(n+1)) / (n/2^n) = limit of (n+1)/(2n) = 1/2 < 1, so the series converges.
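
The ratio in this example can be tabulated exactly. This is a sketch using rational arithmetic; the sample indices are arbitrary.

```python
from fractions import Fraction

def term(n):
    """a_n = n / 2**n"""
    return Fraction(n, 2 ** n)

# a_(n+1) / a_n simplifies to (n + 1) / (2n), which approaches 1/2
ratios = [term(n + 1) / term(n) for n in (1, 10, 100, 1000)]
print([float(q) for q in ratios])
```

The ratios settle toward R = 1/2 < 1, so the Ratio Test reports convergence. It says nothing about the value of the sum.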

🧮 Tests for series convergence: integral and p-series

🧮 Integral Test

For a series ∑a_n of positive terms, let f(x) be a decreasing function on [1,∞) such that f(n) = a_n for all integers n ≥ 1. Then ∑a_n and the integral from 1 to ∞ of f(x)dx either both converge or both diverge.

Visual intuition: rectangles of height a_n and width 1 can be compared to the area under the curve y = f(x).
- If the rectangles are under the curve, the integral's area is greater than the sum of rectangle areas, so convergence of the integral implies convergence of the series.
- If the rectangles protrude above the curve, the sum of rectangle areas is greater than the integral, so divergence of the integral implies divergence of the series.
Why the function must be decreasing: this ensures the rectangles are consistently either all below or all above the curve.
Example: for the p-series ∑(1/n^p) with p > 1, the integral of 1/x^p from 1 to ∞ equals 1/(p−1), which is finite, so the series converges.
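
The rectangle picture gives explicit bounds on the partial sums. This is a sketch for p = 2, assuming the standard inequalities for a decreasing f: the integral from 1 to N+1 is at most s_N, and s_N is at most a₁ plus the integral from 1 to N. The helper names are hypothetical.

```python
def p_series_partial_sum(p, n_terms):
    """s_N = 1/1**p + 1/2**p + ... + 1/N**p"""
    return sum(1 / n ** p for n in range(1, n_terms + 1))

def integral_of_power(p, upper):
    """Integral from 1 to `upper` of x**(-p) dx, for p != 1."""
    return (1 - upper ** (1 - p)) / (p - 1)

p, n_terms = 2, 10_000
s = p_series_partial_sum(p, n_terms)
lower = integral_of_power(p, n_terms + 1)    # rectangles protruding above the curve
upper = 1 + integral_of_power(p, n_terms)    # first term plus rectangles under the curve
print(lower, s, upper)
```

Since the upper bound never exceeds 1 + 1/(p - 1), the increasing partial sums are bounded and therefore converge.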

📐 p-series Test

The series ∑(1/n^p) converges for p > 1, and diverges for p ≤ 1.

This is a special case proven using the Integral Test.
The harmonic series (p = 1): ∑(1/n) diverges even though 1/n → 0, illustrating that terms approaching zero is not sufficient for convergence.
Example: ∑(1/n²) converges (p = 2 > 1), but ∑(1/√n) diverges (p = 1/2 < 1).

🔗 Tests for series convergence: comparison tests

🔗 Comparison Test for series

If 0 ≤ a_n ≤ b_n for n > N for some N, and if ∑b_n is convergent then ∑a_n is convergent. Similarly, if 0 ≤ b_n ≤ a_n for n > N for some N, and if ∑b_n is divergent then ∑a_n is divergent.

Convergence reasoning: since ∑b_n converges, its partial sums are bounded. Because 0 ≤ a_n ≤ b_n, the partial sums for ∑a_n are also bounded. Since a_n ≥ 0, the partial sums for ∑a_n are increasing, so by the Monotone Bounded Test they must converge.
Divergence reasoning: clear intuitively—something larger than a divergent series must also diverge.
Example: since n^n ≥ n² > 0 for n > 2, we have 0 < 1/n^n ≤ 1/n². Since ∑(1/n²) converges (p-series with p = 2), ∑(1/n^n) converges by the Comparison Test.

🔗 Limit Comparison Test

For two series ∑a_n and ∑b_n of positive terms, let L = limit as n→∞ of (a_n/b_n). Then:

if 0 < L < ∞ then ∑a_n and ∑b_n either both converge or both diverge,

if L = 0 and ∑b_n converges then ∑a_n converges,

if L = ∞ and ∑b_n diverges then ∑a_n diverges.

Why it works (case 0 < L < ∞): by definition of limit, a_n/b_n can be made arbitrarily close to L. In particular, for large enough n, L/2 < a_n/b_n < 3L/2, which means 0 < a_n < (3L/2)b_n and 0 < (L/2)b_n < a_n. These inequalities allow the regular Comparison Test to apply.
When to use it: useful when you have a "messy" series and want to compare it to a simpler series (like a geometric or p-series).
Example: to test ∑((n+3)/(n·2^n)), compare to ∑(1/2^n) (a convergent geometric series). The limit of ((n+3)/(n·2^n)) / (1/2^n) = limit of (n+3)/n = 1, and since 0 < 1 < ∞, the Limit Comparison Test shows the original series converges.

🔭 Telescoping Series Test

🔭 What is a telescoping series

A series ∑a_n is telescoping if a_n = b_n − b_(n+1) for some sequence {b_n}.

Why "telescoping": when you write out the partial sum, most terms cancel:
- s_n = (b₁ − b₂) + (b₂ − b₃) + ··· + (b_n − b_(n+1)) = b₁ − b_(n+1)
Convergence criterion: ∑a_n converges if and only if the sequence {b_n} converges. If b_n → L, then ∑a_n = b₁ − L.
Example: for ∑(1/(n(n+1))), write 1/(n(n+1)) = 1/n − 1/(n+1). Let b_n = 1/n. Since b_n → 0, the series converges to b₁ − 0 = 1.
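
The cancellation can be confirmed exactly for the first several partial sums. This is a sketch with rational arithmetic; `telescoping_partial_sum` is a hypothetical helper.

```python
from fractions import Fraction

def telescoping_partial_sum(n):
    """s_n = sum of 1/(k(k+1)) for k = 1..n, added term by term."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

# The telescoped form predicts s_n = b_1 - b_(n+1) = 1 - 1/(n + 1)
checks = all(telescoping_partial_sum(n) == 1 - Fraction(1, n + 1) for n in range(1, 50))
print(checks, telescoping_partial_sum(99))
```

As n grows, 1/(n + 1) → 0 and the partial sums approach 1.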

🔭 How to recognize and use it

Recognition: look for terms that can be split into a difference of two consecutive terms.
Strategy: find a sequence {b_n} such that a_n = b_n − b_(n+1), then check if {b_n} converges.
Don't confuse: the starting index matters—if the series starts at n = 1, use b₁; if it starts elsewhere, adjust accordingly.

🧩 Properties of convergent series

🧩 Algebraic properties

Let ∑a_n and ∑b_n be convergent series, and let c be a number. Then:

Property	Statement
Sum/difference	∑(a_n ± b_n) is convergent, with ∑(a_n ± b_n) = ∑a_n ± ∑b_n
Scalar multiplication	∑(c·a_n) is convergent, with ∑(c·a_n) = c·∑a_n

These properties are based on similar properties of limits.
Practical use: you can break apart or factor out constants when analyzing series.

📋 Summary table of tests

Test	What it tells you	Key limitation
Monotone Bounded (sequences)	Bounded + monotone → convergent	Doesn't tell you the limit
n-th Term (series)	If limit of terms ≠ 0 → diverges	Cannot prove convergence
Ratio	R < 1 → converges; R > 1 → diverges	Fails when R = 1
Integral	Series and integral behave the same	Need a decreasing function f(x)
p-series	∑(1/n^p): converges if p > 1	Only applies to this specific form
Comparison	Smaller than convergent → converges; larger than divergent → diverges	Need to find a suitable comparison series
Limit Comparison	Ratio of terms → L determines behavior	Need to choose a good comparison series
Telescoping	If b_n → L then ∑a_n = b₁ − L	Must recognize the telescoping form

Alternating Series

9.3 Alternating Series

🧭 Overview

🧠 One-sentence thesis

Alternating series can converge even when their non-alternating counterparts diverge, and the distinction between absolute and conditional convergence determines whether rearranging terms changes the sum.

📌 Key points (3–5)

Alternating Series Test: if the absolute values of terms decrease to zero, the alternating series converges.
Conditional vs absolute convergence: a series is conditionally convergent if it converges but the series of absolute values diverges; it is absolutely convergent if the series of absolute values converges.
Why absolute convergence matters: absolute convergence implies ordinary convergence, and absolutely convergent series have the same sum regardless of term order.
Common confusion: the alternating harmonic series converges, but the harmonic series diverges—alternating signs can turn divergence into convergence.
Riemann's Rearrangement Theorem: conditionally convergent series can be rearranged to converge to any number, but absolutely convergent series cannot.

🔄 How alternating series converge

🔄 What makes a series alternating

An alternating series is a series where the signs of the terms alternate between positive and negative.

The general form uses a factor like (−1) raised to some power to flip signs.
Example: the series 1 − 1/2 + 1/3 − 1/4 + 1/5 − ... alternates signs with each term.

✅ The Alternating Series Test

Alternating Series Test: If the sum of a_n is an alternating series such that the absolute values of the terms are decreasing to 0, then the series converges.

The condition means: |a_(n+1)| ≤ |a_n| for all n, and |a_n| → 0 as n → ∞.
Why it works (taking the first term to be positive):
- Odd-numbered partial sums decrease from the first term.
- Even-numbered partial sums increase from the second term.
- Both sequences are bounded and monotone, so they converge by the Monotone Bounded Test.
- The difference between consecutive partial sums goes to zero, so both converge to the same limit.
Example: The series 1 − 1/2 + 1/3 − 1/4 + 1/5 − ... has terms whose absolute values (1, 1/2, 1/3, 1/4, ...) decrease to zero, so it converges.
Don't confuse: The harmonic series 1 + 1/2 + 1/3 + 1/4 + ... diverges, but alternating the signs makes it converge—the key is that terms decrease to zero and signs alternate.
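
The squeezing behavior of the partial sums can be seen numerically. This is a sketch for the alternating harmonic series; the comparison value ln 2 is the known sum of this series, a fact not stated in the text above.

```python
import math

def alternating_harmonic_partial_sums(count):
    """Return [s_1, ..., s_count] for 1 - 1/2 + 1/3 - 1/4 + ..."""
    sums, s = [], 0.0
    for n in range(1, count + 1):
        s += (-1) ** (n - 1) / n
        sums.append(s)
    return sums

sums = alternating_harmonic_partial_sums(1000)
odd_sums = sums[0::2]     # s_1, s_3, s_5, ... (decreasing)
even_sums = sums[1::2]    # s_2, s_4, s_6, ... (increasing)
print(even_sums[-1], math.log(2), odd_sums[-1])
```

The even-numbered sums climb, the odd-numbered sums fall, and the limit is trapped between them.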

🧮 Example: testing for convergence

For the series sum of (−1)^n / ln(n) starting at n=2:

The general term is a_n = (−1)^n / ln(n).
Since ln(n+1) > ln(n) for n ≥ 2, we have 1/ln(n+1) < 1/ln(n), so |a_n| decreases.
Since ln(n) → ∞ as n → ∞, we have 1/ln(n) → 0, so |a_n| → 0.
By the Alternating Series Test, the series converges.

🎯 Absolute vs conditional convergence

🎯 Definitions and distinction

A series sum of a_n is conditionally convergent if sum of a_n converges but sum of |a_n| diverges.

A series sum of a_n is absolutely convergent if sum of |a_n| converges.

Type	sum of a_n	sum of |a_n|	Example
Conditionally convergent	Converges	Diverges	1 − 1/2 + 1/3 − 1/4 + ...
Absolutely convergent	Converges	Converges	sum of (−1)^(n−1) / n²

The alternating harmonic series sum of (−1)^(n−1) / n converges, but sum of 1/n diverges, so it is conditionally convergent.
The series sum of (−1)^(n−1) / n² is absolutely convergent because sum of 1/n² converges (by the p-series Test with p=2).

🔗 Absolute Convergence Test

Absolute Convergence Test: If sum of |a_n| converges, then sum of a_n converges.

Why it works:
- Decompose the series into positive terms and negative terms: sum of a_n = sum of a_pos − sum of |a_neg|.
- Each part is a portion of the convergent series sum of |a_n|, so each converges.
- The difference of two convergent series is finite, so sum of a_n converges.
Logically equivalent form: If sum of a_n diverges, then sum of |a_n| diverges.
Don't confuse: Absolute convergence implies ordinary convergence, but the reverse is not true—conditional convergence means ordinary convergence without absolute convergence.

🔀 Rearrangement behavior

🔀 Riemann's Rearrangement Theorem

Riemann's Rearrangement Theorem: The terms of a conditionally convergent series can be rearranged to converge to any number.

How it works for the alternating harmonic series:
- Write 1 − 1/2 + 1/3 − 1/4 + 1/5 − ... as (1 + 1/3 + 1/5 + ...) − (1/2 + 1/4 + 1/6 + ...).
- Both the positive-term series and the negative-term series diverge.
- To converge to any number A: add positive terms until the partial sum exceeds A, then subtract negative terms until it falls below A, and repeat.
- Since terms approach zero, this process can make the series converge to A.
Why this doesn't happen with absolute convergence: An absolutely convergent series has the same sum regardless of how terms are rearranged.
Example: The alternating harmonic series can be rearranged to converge to 0, 1, or any other target value.
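
The rearrangement procedure can be sketched as a greedy loop over the terms of the alternating harmonic series. The function name `rearranged_partial_sum` and the targets below are illustrative; a finite run only suggests the limit, it does not prove it.

```python
def rearranged_partial_sum(target, n_terms):
    """Partial sum of a greedy rearrangement of 1 - 1/2 + 1/3 - ... aimed at `target`."""
    next_odd, next_even = 1, 2    # denominators of the next unused positive / negative term
    s = 0.0
    for _ in range(n_terms):
        if s <= target:
            s += 1 / next_odd     # add the next positive term: 1, 1/3, 1/5, ...
            next_odd += 2
        else:
            s -= 1 / next_even    # subtract the next negative term: 1/2, 1/4, 1/6, ...
            next_even += 2
    return s

for target in (1.5, 0.0, -1.0):
    print(target, rearranged_partial_sum(target, 10_000))
```

Each term of the original series is used exactly once, only in a different order. Once the running sum has crossed the target, its distance from the target is at most the size of the term that caused the latest crossing, and those terms shrink to zero.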

⚠️ Implications for conditionally convergent series

Conditionally convergent series are "fragile"—their sum depends on the order of terms.
Absolutely convergent series are "robust"—rearrangement does not change the sum.
Don't confuse: A divergent alternating series (like 1 − 1 + 1 − 1 + ...) can also appear to converge to different values depending on grouping, but this is different from Riemann's theorem, which applies to conditionally convergent series.

Power Series

9.4 Power Series

🧭 Overview

🧠 One-sentence thesis

Power series are infinite series of the form ∑ aₙ(x − c)ⁿ that converge on specific intervals and can be differentiated or integrated term by term, enabling functions to be represented as polynomials of infinite degree.

📌 Key points (3–5)

What a power series is: an infinite series with constants aₙ and powers of (x − c), where x is a variable and c is a constant.
Interval and radius of convergence: every power series converges on a specific interval (possibly a single point or all real numbers), and the radius R is half the interval's length.
How to find convergence: use the Ratio Test on absolute values to find r(x), then the series converges when r(x) < 1 and diverges when r(x) > 1.
Common confusion: when r(x) = 1, the Ratio Test is inconclusive—you must check those endpoint values individually.
Term-by-term operations: power series can be differentiated and integrated term by term within their radius of convergence.

📐 What power series are

📐 Definition and basic form

Power series: an infinite series whose terms involve constants aₙ and powers of x − c, written as ∑ aₙ(x − c)ⁿ.

The variable x makes this different from ordinary series with constant terms.
Often c = 0, simplifying the form to ∑ aₙxⁿ.
Think of it as a polynomial with infinitely many terms.

🔢 The geometric series example

The geometric progression ∑ rⁿ = 1 + r + r² + r³ + ... converges to 1/(1 − r) when |r| < 1.
Replacing constant r with variable x gives the power series ∑ xⁿ = 1/(1 − x) for −1 < x < 1.
This series diverges for |x| ≥ 1 by the n-th Term Test.
Example: the power series defines a function f(x) = 1/(1 − x) on the interval (−1, 1).

🎯 Convergence concepts

🎯 Interval of convergence

Interval of convergence: the set of all x values for which the power series converges.

The interval can be open, closed, half-open, a single point, or all real numbers.
On this interval, the power series is a well-defined function of x.
Don't confuse: the interval is centered at c, which need not be zero, and it may include one endpoint but not the other, so each endpoint must be checked for the series at hand.

📏 Radius of convergence

Radius of convergence R: half the length of the interval of convergence.

If the interval is all real numbers, we say R = ∞.
Example: for ∑ xⁿ with interval (−1, 1), the radius is R = 1.

🧪 Using the Ratio Test

The Ratio Test finds where a power series ∑ fₙ(x) converges:

Compute r(x) = lim(n→∞) |fₙ₊₁(x)/fₙ(x)|.
Treat x as fixed when taking the limit; r(x) is a function of x.
Convergence: series converges for all x where r(x) < 1.
Divergence: series diverges when r(x) > 1.
Inconclusive: when r(x) = 1, check those x values individually.
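
The steps above can be sketched numerically, assuming the series has the form ∑ aₙ(x − c)ⁿ so that r(x) = |x − c| · lim |aₙ₊₁/aₙ|. The helper names, the single large n used in place of the limit, and the tolerance band around 1 are all assumptions of this sketch; it illustrates the test but does not replace taking the limit by hand.

```python
def ratio_limit_estimate(coeff_ratio, x, c=0.0, n=10**6):
    """Estimate r(x) = |x - c| * lim |a_(n+1) / a_n| by evaluating at one large n."""
    return abs(x - c) * abs(coeff_ratio(n))


def classify(coeff_ratio, x, c=0.0, tol=1e-3):
    """Apply the Ratio Test verdict, treating values within tol of 1 as inconclusive."""
    r = ratio_limit_estimate(coeff_ratio, x, c)
    if r < 1 - tol:
        return "converges"
    if r > 1 + tol:
        return "diverges"
    return "inconclusive"


# For a_n = 1/n the coefficient ratio is a_(n+1)/a_n = n/(n+1).
for x in (0.5, 1.0, 2.0):
    print(x, classify(lambda n: n / (n + 1), x))
```

Because the limit is only estimated, x values very close to the boundary are also reported as inconclusive; those are exactly the points that need a separate test anyway.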

🔬 Examples of finding convergence

🔬 Example: series with factorial in denominator

For ∑ (xⁿ/n!):

Compute r(x) = |x| · lim(n→∞) |1/(n+1)| = |x| · 0 = 0 for any fixed x.
Since r(x) = 0 < 1 for all x, the interval of convergence is all real numbers (−∞ < x < ∞).

🔬 Example: series with n in denominator

For ∑ (xⁿ/n):

Compute r(x) = |x| · lim(n→∞) |n/(n+1)| = |x| · 1 = |x|.
Series converges when |x| < 1, diverges when |x| > 1.
Check endpoints: at x = 1, get ∑(1/n), the harmonic series, which diverges; at x = −1, get ∑(−1)ⁿ/n, an alternating series that converges.
Interval of convergence: −1 ≤ x < 1 (closed on left, open on right).
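
The endpoint behavior can be illustrated with partial sums, taking the series to start at n = 1. This is only a sketch (partial_sum is an illustrative name): finite sums cannot prove convergence or divergence, but they show the trend. At x = −1 the sums settle near −ln 2, while at x = 1 they keep growing.

```python
import math


def partial_sum(x, n_terms):
    """Sum of x^n / n for n = 1 .. n_terms."""
    return sum(x**n / n for n in range(1, n_terms + 1))


for n_terms in (10, 1_000, 100_000):
    at_minus_one = partial_sum(-1.0, n_terms)  # alternating series
    at_plus_one = partial_sum(1.0, n_terms)    # harmonic series
    print(n_terms, at_minus_one, -math.log(2), at_plus_one)
```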

🔬 Example: series with factorial in numerator

For ∑ n!xⁿ:

Compute r(x) = |x| · lim(n→∞) |n+1| = 0 if x = 0, ∞ if x ≠ 0.
Since r(x) = ∞ > 1 for all x ≠ 0, series diverges except at x = 0.
Interval of convergence: the single point x = 0.

⚙️ Operations on power series

⚙️ Differentiation and integration

For a power series f(x) = ∑ aₙ(x − c)ⁿ that converges for |x − c| < R:

Derivative: f′(x) = ∑ n·aₙ(x − c)ⁿ⁻¹ converges for |x − c| < R.
Integral: ∫f(x)dx = C + ∑ [aₙ/(n+1)](x − c)ⁿ⁺¹ converges for |x − c| < R.
Both operations are done term by term.
Important: convergence at endpoints |x − c| = R is not guaranteed—check individually.
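
Term-by-term operations act directly on the coefficients. A minimal sketch, assuming a truncated series is stored as a list where position n holds aₙ (the function names are illustrative only):

```python
def differentiate(coeffs):
    """Coefficients of f'(x): n * a_n becomes the coefficient of (x - c)^(n - 1)."""
    return [n * a for n, a in enumerate(coeffs)][1:]


def integrate(coeffs, constant=0.0):
    """Coefficients of an antiderivative: a_n / (n + 1) multiplies (x - c)^(n + 1)."""
    return [constant] + [a / (n + 1) for n, a in enumerate(coeffs)]


geometric = [1, 1, 1, 1, 1]       # 1 + x + x^2 + x^3 + x^4
print(differentiate(geometric))   # coefficients 1, 2, 3, 4
print(integrate(geometric))       # coefficients C, 1, 1/2, 1/3, 1/4, 1/5
```

Differentiating after integrating returns the original coefficients, which mirrors the fact that the two operations undo each other inside the radius of convergence.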

🧮 Example: differentiating a geometric series

For f(x) = ∑ xⁿ = 1 + x + x² + x³ + ...:

Differentiate term by term: f′(x) = 0 + 1 + 2x + 3x² + ... = ∑ n·xⁿ⁻¹.
Since f(x) = 1/(1 − x) for −1 < x < 1, differentiating the closed form gives f′(x) = 1/(1 − x)² on the same interval.
Check endpoints: at x = ±1, the series for f′(x) diverges by the n-th Term Test.
Result: ∑ n·xⁿ⁻¹ = 1/(1 − x)² for −1 < x < 1.

🌊 Bessel functions application

🌊 What Bessel functions are

Bessel's function of order zero J₀(x): a solution to Bessel's equation (d²y/dx² + (1/x)(dy/dx) + y = 0), defined as a power series.

J₀(x) = ∑ [(−1)ⁿ · x²ⁿ] / [(n!)² · 2²ⁿ] = 1 − x²/2² + x⁴/(2²·4²) − x⁶/(2²·4²·6²) + ...
The Ratio Test shows J₀(x) converges for all x.
Used in engineering and physics applications involving oscillations and mechanical vibrations.

🌊 Higher-order Bessel functions

The general Bessel equation of order m has solution Jₘ(x):

Jₘ(x) = ∑ [(−1)ⁿ / (n! · (n+m)!)] · (x/2)²ⁿ⁺ᵐ for m = 0, 1, 2, ...
Example: J₁(x) = x/2 − x³/(2²·4) + x⁵/(2²·4²·6) − ...
Term-by-term differentiation shows J′₀(x) = −J₁(x).
J₀(x) and J₁(x) behave like "poor man's" cosine and sine functions, respectively.
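
The series for Jₘ(x) can be evaluated directly from its partial sums. The sketch below (bessel_j and the choice of 30 terms are assumptions made for illustration) also checks J′₀(x) = −J₁(x) with a central difference, a numerical stand-in for the term-by-term differentiation mentioned above.

```python
from math import factorial


def bessel_j(m, x, n_terms=30):
    """Partial sum of J_m(x) = sum_n (-1)^n / (n! (n+m)!) * (x/2)^(2n+m)."""
    return sum(
        (-1) ** n / (factorial(n) * factorial(n + m)) * (x / 2) ** (2 * n + m)
        for n in range(n_terms)
    )


x, h = 1.0, 1e-5
# Central-difference estimate of the derivative of J_0 at x.
slope = (bessel_j(0, x + h) - bessel_j(0, x - h)) / (2 * h)
print(bessel_j(0, x), bessel_j(1, x), slope)  # slope should be close to -J_1(x)
```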

Taylor's Series

9.5 Taylor’s Series

🧭 Overview

🧠 One-sentence thesis

Taylor's series provides a systematic method to represent a function that has a power series expansion as an infinite sum built from its derivatives at a single point, enabling polynomial approximations that simplify computations while keeping the error controllable.

📌 Key points (3–5)

What Taylor's series does: converts a function into an infinite sum of polynomial terms using the function's derivatives at a single point.
The core formula: coefficients are determined by the pattern a_n = f^(n)(c) / n!, where c is the expansion point.
Practical use: even one or two terms often suffice for approximation when x is close to c (i.e., |x − c| ≪ 1), making complex functions easier to compute.
Common confusion: Taylor's series is not how calculators compute functions like sin(x) or e^x—those typically use algorithms like CORDIC and lookup tables, not infinite series.
Accuracy control: the Remainder Theorem quantifies approximation error, showing how the n-th degree polynomial differs from the true function.

📐 The Taylor's series formula

📐 Deriving the coefficients

Taylor's series coefficients: a_n = f^(n)(c) / n! for n ≥ 0, where f^(n)(c) is the n-th derivative of f evaluated at x = c.

If a function f(x) can be written as a power series in (x − c), then the coefficients are uniquely determined by its derivatives at c.
The derivation works by:
- Setting f(c) = a_0 (the constant term).
- Differentiating term by term and evaluating at x = c.
- Each derivative "picks out" one coefficient: f'(c) = 1·a_1, f''(c) = 2·1·a_2, f'''(c) = 3·2·1·a_3, etc.
The factorial in the denominator cancels the factorial from repeated differentiation.
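
Written out, the derivation steps look as follows (a worked version of the bullets above, valid inside the interval of convergence where term-by-term differentiation is allowed):

```latex
\begin{align*}
f(x) &= \sum_{n=0}^{\infty} a_n (x-c)^n, & f(c) &= a_0,\\
f'(x) &= \sum_{n=1}^{\infty} n\,a_n (x-c)^{n-1}, & f'(c) &= 1\cdot a_1,\\
f''(x) &= \sum_{n=2}^{\infty} n(n-1)\,a_n (x-c)^{n-2}, & f''(c) &= 2\cdot 1\cdot a_2,\\
f^{(k)}(x) &= \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\,a_n (x-c)^{n-k}, & f^{(k)}(c) &= k!\,a_k .
\end{align*}
```

Setting x = c kills every term except the first one in each sum, so solving the last line for the coefficient gives a_k = f^(k)(c) / k!.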

📐 The full Taylor's formula

Taylor's formula: If f(x) has a power series representation in powers of x − c, then f(x) = sum from n=0 to infinity of [f^(n)(c) / n!] · (x − c)^n for all x in the interval of convergence.

This representation is unique within the interval of convergence.
Special case: when c = 0, this is sometimes called a Maclaurin series (though that term is less common outside pure mathematics).

🔢 Key examples and techniques

🔢 Exponential function: e^x

Since the derivative of e^x is itself, all derivatives at x = 0 equal 1.
Taylor's series: e^x = 1 + x + x²/2! + x³/3! + x⁴/4! + ...
Converges for all x (interval of convergence is all real numbers).
Example application: e^x ≈ 1 + x is a good approximation when |x| ≪ 1.
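
To see how quickly the partial sums close in on e^x, here is a small sketch (exp_taylor is an illustrative name) that builds each term from the previous one rather than recomputing factorials:

```python
import math


def exp_taylor(x, degree):
    """Partial sum 1 + x + x^2/2! + ... + x^degree/degree!."""
    total, term = 0.0, 1.0  # term holds x^n / n!
    for n in range(degree + 1):
        total += term
        term *= x / (n + 1)
    return total


for x in (0.01, 0.5, 3.0):
    # Compare the linear approximation 1 + x and a degree-10 sum with math.exp.
    print(x, exp_taylor(x, 1), exp_taylor(x, 10), math.exp(x))
```

The linear approximation is close for small |x| and degrades as x moves away from 0, which is the behavior the approximation e^x ≈ 1 + x relies on.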

🔢 Trigonometric functions

Sine function:

Derivatives cycle every four steps: sin, cos, −sin, −cos, sin, ...
At x = 0: the derivative values cycle through 0, 1, 0, −1, 0, ...
Taylor's series: sin(x) = x − x³/3! + x⁵/5! − x⁷/7! + ... = sum from n=0 to infinity of [(−1)^n / (2n+1)!] · x^(2n+1)
Only odd powers appear (because sine is an odd function).
Converges for all x.

Cosine function:

Can be found by differentiating the sine series term by term.
Taylor's series: cos(x) = 1 − x²/2! + x⁴/4! − x⁶/6! + ... = sum from n=0 to infinity of [(−1)^n / (2n)!] · x^(2n)
Only even powers appear (because cosine is an even function).
Converges for all x.
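
Both series are easy to evaluate from their general terms. The following sketch (function names are illustrative) sums the first few terms and compares them with the library values near x = 0:

```python
import math


def sin_taylor(x, n_terms):
    """Sum of (-1)^n x^(2n+1) / (2n+1)! for n = 0 .. n_terms - 1."""
    return sum(
        (-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
        for n in range(n_terms)
    )


def cos_taylor(x, n_terms):
    """Sum of (-1)^n x^(2n) / (2n)! for n = 0 .. n_terms - 1."""
    return sum(
        (-1) ** n * x ** (2 * n) / math.factorial(2 * n)
        for n in range(n_terms)
    )


x = 0.3
print(sin_taylor(x, 2), math.sin(x))  # two terms: x - x^3/3!
print(cos_taylor(x, 2), math.cos(x))  # two terms: 1 - x^2/2!
```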

🔢 Logarithmic function

ln(x) is not defined at x = 0, so we expand ln(1 + x) instead.
Derivatives: f'(x) = 1/(1+x), f''(x) = −1/(1+x)², f'''(x) = 2/(1+x)³, etc.
Pattern: f^(n)(x) = (−1)^(n−1) · (n−1)! / (1+x)^n for n ≥ 1.
Taylor's series: ln(1 + x) = x − x²/2 + x³/3 − x⁴/4 + ... = sum from n=1 to infinity of [(−1)^(n−1) / n] · x^n
Converges for −1 < x ≤ 1 (note: includes x = 1 but not x = −1).
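
The step from the derivative pattern to the series coefficients is short enough to write out. With f(x) = ln(1 + x) and expansion point 0:

```latex
\begin{align*}
f(0) &= \ln 1 = 0, \qquad f^{(n)}(0) = (-1)^{n-1}\,(n-1)! \quad (n \ge 1),\\
a_n &= \frac{f^{(n)}(0)}{n!} = \frac{(-1)^{n-1}\,(n-1)!}{n!} = \frac{(-1)^{n-1}}{n},\\
\ln(1+x) &= \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}\,x^{n}.
\end{align*}
```

The factorials cancel down to a single factor of n, which is why the denominators grow only linearly and the interval of convergence is so much narrower than for e^x.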

🔢 Substitution technique

Instead of computing derivatives from scratch, substitute into a known series.
Example: For e^(x²), replace every x in the series for e^x with x²:
- e^(x²) = 1 + x² + x⁴/2! + x⁶/3! + x⁸/4! + ...
This works whenever the substituted value lies within the original series' convergence interval.
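
On the level of coefficients, substituting x^p for the variable just moves each coefficient aₙ to the power p·n. A minimal sketch under that assumption (all three function names are illustrative, and the series is truncated to finitely many terms):

```python
from math import exp, factorial


def exp_coeffs(n_terms):
    """First n_terms coefficients 1/n! of the series for e^u."""
    return [1 / factorial(n) for n in range(n_terms)]


def substitute_power(coeffs, p):
    """Coefficients of g(x) = f(x^p), given those of f(u) = sum a_n u^n."""
    out = [0.0] * (p * (len(coeffs) - 1) + 1)
    for n, a in enumerate(coeffs):
        out[p * n] = a  # a_n u^n becomes a_n x^(p n)
    return out


def evaluate(coeffs, x):
    """Evaluate the truncated series sum a_n x^n."""
    return sum(a * x**n for n, a in enumerate(coeffs))


series = substitute_power(exp_coeffs(20), 2)   # truncated series for e^(x^2)
print(series[:7])                              # 1, 0, 1, 0, 1/2!, 0, 1/3!
print(evaluate(series, 0.5), exp(0.25))
```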

🎯 Practical applications

🎯 Polynomial approximations

n-th degree Taylor polynomial: P_n(x) = sum from k=0 to n of [f^(k)(c) / k!] · (x − c)^k

P_n(x) is the n-th partial sum of the Taylor's series—a polynomial of degree at most n.
Also called the O(x^n) approximation to f(x).
Why use approximations?
- Polynomials are simpler to compute, integrate, and differentiate.
- For |x − c| ≪ 1, higher powers (x − c)^n become negligible, so few terms are needed.
- In many practical applications, only one or two terms suffice.
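
As an illustration with an expansion point other than zero, the sketch below evaluates P_n(x) from a list of derivative values at c. The function name and the choice of f(x) = √x at c = 4 are assumptions made for the example; the derivative values f(4) = 2, f′(4) = 1/4, f″(4) = −1/32 were worked out by hand.

```python
from math import factorial, sqrt


def taylor_polynomial(derivs_at_c, c, x):
    """P_n(x) built from the list [f(c), f'(c), ..., f^(n)(c)]."""
    return sum(
        d / factorial(k) * (x - c) ** k
        for k, d in enumerate(derivs_at_c)
    )


derivs = [2.0, 1 / 4, -1 / 32]   # f, f', f'' for sqrt(x) at c = 4
x = 4.1
print(taylor_polynomial(derivs[:2], 4.0, x))  # degree-1 approximation
print(taylor_polynomial(derivs, 4.0, x))      # degree-2 approximation
print(sqrt(x))
```

With x only 0.1 away from c, two or three terms already agree with the true value to several decimal places.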

🎯 Physical science example

Planck's Law simplification:

The energy density E(λ) of black-body radiation involves e^(hc/(λkT)).
For large wavelengths (λ ≫ hc/(kT)), the exponent hc/(λkT) ≪ 1.
Using e^x ≈ 1 + x, the complex formula simplifies to E(λ) ≈ 8πkT / λ⁴.
This shows how Taylor's series can reduce complicated physics formulas to manageable forms.
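
The simplification can be written out in a few lines. The notes do not reproduce the full formula, so the first line below assumes the standard wavelength form of Planck's Law, with h Planck's constant, c the speed of light (not the expansion point), k Boltzmann's constant, and T the temperature:

```latex
\begin{align*}
E(\lambda) &= \frac{8\pi h c\,\lambda^{-5}}{e^{hc/(\lambda k T)} - 1}
  && \text{(assumed standard form)}\\
e^{x} - 1 &\approx x, \qquad x = \frac{hc}{\lambda k T} \ll 1\\
E(\lambda) &\approx \frac{8\pi h c\,\lambda^{-5}}{hc/(\lambda k T)}
  = \frac{8\pi h c}{\lambda^{5}}\cdot\frac{\lambda k T}{hc}
  = \frac{8\pi k T}{\lambda^{4}}
\end{align*}
```

Only the first-order term of e^x is used; the factor hc cancels, leaving a formula with no exponential at all.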

🎯 Evaluating integrals

Some functions (like e^(−x²)) have no elementary antiderivative.
Replace the function with its Taylor's series, then integrate term by term.
Example: integral of sin(x)/x can be computed by expanding sin(x) = x − x³/3! + x⁵/5! − ..., then dividing by x and integrating.
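
Carrying out that example: dividing the sine series by x gives 1 − x²/3! + x⁴/5! − ..., and integrating term by term from 0 to x gives the sum of (−1)^n · x^(2n+1) / [(2n+1) · (2n+1)!]. A short sketch of evaluating it (the function name and the default of 20 terms are assumptions):

```python
from math import factorial


def sine_integral_series(x, n_terms=20):
    """Integral of sin(t)/t from 0 to x, summed term by term."""
    return sum(
        (-1) ** n * x ** (2 * n + 1) / ((2 * n + 1) * factorial(2 * n + 1))
        for n in range(n_terms)
    )


print(sine_integral_series(1.0))  # integral of sin(t)/t over [0, 1]
```

The same pattern (expand, integrate each power, sum) applies to e^(−x²), whose series comes from the substitution technique.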

📊 Accuracy and limitations

📊 The Remainder Theorem

Remainder formula: f(x) = P_n(x) + R_n(x), where R_n(x) = [f^(n+1)(c + θ(x − c)) / (n+1)!] · (x − c)^(n+1) for some θ between 0 and 1.

R_n(x) measures the error when approximating f(x) by the n-th degree polynomial P_n(x).
Alternative form: R_n(x) = (1/n!) · integral from c to x of (x − t)^n · f^(n+1)(t) dt.
Since θ is unknown, typically only an upper bound on the error can be found.
The integral form may be more practical for numerical estimation.
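
For sin(x) about c = 0, every derivative is bounded by 1 in absolute value, so the remainder formula gives |R_n(x)| ≤ |x|^(n+1) / (n+1)! no matter what θ is. The sketch below uses that bound (function names are illustrative) to find a polynomial degree that guarantees a requested accuracy:

```python
import math


def sin_remainder_bound(x, n):
    """Upper bound on |R_n(x)| for sin about c = 0, using |f^(n+1)| <= 1."""
    return abs(x) ** (n + 1) / math.factorial(n + 1)


def degree_needed(x, tol):
    """Smallest n whose remainder bound is at most tol."""
    n = 0
    while sin_remainder_bound(x, n) > tol:
        n += 1
    return n


for x in (0.1, 1.0, 3.0):
    n = degree_needed(x, 1e-6)
    print(x, n, sin_remainder_bound(x, n))
```

The bound is an upper estimate, so the actual error is often smaller; for other functions the derivative bound has to be worked out case by case.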

📊 Convergence behavior

The excerpt shows graphs comparing sin(x) with its O(x⁷), O(x¹¹), and O(x¹⁵) approximations.
All three are good over [−2, 2]; the O(x¹⁵) approximation remains fairly good over [−6, 6].
For |x| > 6, the approximations become poor quickly—they approach ±∞ while sin(x) remains bounded.
Don't confuse: convergence of the series (true for all x) vs. practical accuracy of finite approximations (good only near the expansion point).
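
The same comparison can be reproduced numerically instead of graphically. This sketch (sin_poly is an illustrative name) tabulates the error of the degree 7, 11, and 15 polynomials at points increasingly far from the expansion point 0:

```python
import math


def sin_poly(x, degree):
    """Taylor polynomial of sin about 0, keeping odd powers up to x^degree."""
    return sum(
        (-1) ** (k // 2) * x**k / math.factorial(k)
        for k in range(1, degree + 1, 2)
    )


for x in (2.0, 6.0, 10.0):
    # Absolute errors for the degree-7, degree-11, and degree-15 polynomials.
    errors = [abs(sin_poly(x, d) - math.sin(x)) for d in (7, 11, 15)]
    print(x, errors)
```

The errors stay small at x = 2, grow noticeably by x = 6, and become very large at x = 10 even for degree 15, although the full series still converges there.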

📊 How calculators really work

Common misconception: hand-held calculators use Taylor's series to compute sin(x), cos(x), e^x, etc.
Reality: Taylor's series would require far too many terms for large x.
Instead, most calculators use:
- CORDIC algorithm (Coordinate Rotation Digital Computer): uses bit-shifting (computationally cheap) to reduce large inputs to a smaller range.
- Lookup tables stored in memory for values in that range.
- Interpolation for numbers between table entries.
This is much faster and more accurate than computing many series terms.
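
To give a feel for the shift-and-add idea, here is a minimal floating-point sketch of CORDIC in rotation mode. It is not any particular calculator's implementation: real hardware uses fixed-point integers and true bit shifts, whereas this sketch multiplies by 2^(−i) as a stand-in, and the names ITERATIONS, ANGLES, GAIN, and cordic_sin_cos are invented for the example. It only handles angles in [−π/2, π/2]; larger inputs would first need to be reduced to that range.

```python
import math

ITERATIONS = 32
# Small table of arctan(2^-i); hardware would store these as constants.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
# Each pseudo-rotation stretches the vector; GAIN pre-compensates for that.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(theta):
    """Return (sin(theta), cos(theta)) for theta in [-pi/2, pi/2]."""
    x, y, z = GAIN, 0.0, theta  # z tracks the angle still to be rotated
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0
        # Rotate by +/- arctan(2^-i) using only a scaling by 2^-i and adds.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x


s, c = cordic_sin_cos(0.5)
print(s, math.sin(0.5))
print(c, math.cos(0.5))
```

Each iteration adds roughly one bit of accuracy, so the work is fixed in advance rather than depending on how large the input is.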

🔍 Important patterns and techniques

🔍 Recognizing series structure

Function	Series pattern	Key feature
e^x	All positive, factorial denominators	Simplest; all derivatives equal at x=0
sin(x)	Alternating signs, odd powers only	Reflects odd-function symmetry
cos(x)	Alternating signs, even powers only	Reflects even-function symmetry
ln(1+x)	Alternating signs, linear denominators	Narrower convergence (−1 < x ≤ 1)

🔍 When to use which method

Direct computation: when derivatives follow a simple pattern (e.g., e^x, sin(x), cos(x)).
Substitution: when the function is a composition with a known series (e.g., e^(x²), cos(x²)).
Differentiation/integration: when the function is the derivative or integral of a known series (e.g., cos(x) from sin(x)).
Don't confuse: finding the series vs. determining its interval of convergence—use the Ratio Test or other convergence tests separately.