Basic Analysis: Introduction to Real Analysis

About this book

0.1 About this book

🧭 Overview

🧠 One-sentence thesis

This first volume provides a one-semester rigorous foundation in basic analysis, teaching students why calculus is true through careful proofs of limits, derivatives, and integrals.

📌 Key points (3–5)

  • Target audience: Students who have completed a basic proof course, suitable for both those not pursuing graduate school and those in more advanced courses covering metric spaces.
  • Core content: Real numbers and completeness, sequences, continuous functions, derivatives, Riemann integrals, sequences of functions, and optionally metric spaces.
  • Pedagogical approach: Prefers direct proofs and contrapositive over contradiction; uses Darboux sums (not tagged partitions) for the Riemann integral; progressively reduces formalism as the book advances.
  • Common confusion: This is not a "how to do calculus" book—it explains why calculus works through rigorous proofs, not just computational techniques.
  • Flexible structure: Can be adapted for slower courses (UIUC 444 style), faster courses with metric spaces (UW 521 style), or year-long courses with volume II covering multivariable topics.

📚 Course structure and prerequisites

📚 What you need before starting

  • A basic proof course is required (examples given: books by Hammack, Franklin, or D'Angelo & West).
  • The book assumes you can construct proofs but may not yet understand the rigorous foundations of calculus.

🎯 Suggested semester paths

The excerpt provides three main course configurations:

| Course type | Pace | Sections covered | Key features |
|---|---|---|---|
| Slower (UIUC 444) | Basic | §0.3, §1.1–§1.4, §2.1–§2.5, §3.1–§3.4, §4.1–§4.2, §5.1–§5.3, §6.1–§6.3 | Does not include metric spaces; ends with Picard's theorem |
| Faster with metric spaces (UW 521) | Rigorous | §0.3, §1.1–§1.4, §2.1–§2.5, §3.1–§3.4, §4.1–§4.2, §5.1–§5.3, §6.1–§6.2, §7.1–§7.6 | Covers metric spaces; proves Picard's theorem via fixed point theorem |
| Faster without metric spaces | Comprehensive | All sections of chapters 0–6 | Covers more topics but skips metric spaces |

📖 Year-long option

  • With volume II, the book supports a full-year course covering multivariable topics.
  • Recommended approach: cover most of volume I in the first semester, leaving metric spaces for the beginning of the second semester.

🎓 Pedagogical philosophy

🎓 Why this book exists

Analysis is the branch of mathematics that deals with inequalities and limits.

  • The excerpt emphasizes that calculus courses teach what is true but not why it is true.
  • This book aims to show students why calculus works through rigorous proofs.
  • Example analogy from the excerpt: An auto mechanic who only knows how to change oil but not how the engine works cannot diagnose new problems; similarly, a teacher who doesn't understand the definition of the Riemann integral or derivative may give nonsensical answers.

✍️ Proof style choices

The book makes deliberate stylistic decisions:

  • Prefers direct proofs and contrapositive over proof by contradiction when possible.
  • Rationale: Contradiction can confuse beginners because "we are talking about objects that do not exist."
  • Uses contradiction only when contrapositive is too awkward or when contradiction follows quickly.

📝 Notation and formalism

  • Uses ≔ (defined as) instead of = to define objects, even in local contexts like a single exercise.
  • Progressively reduces formalism as the book advances, leaving out more details to avoid clutter.
  • Avoids unnecessary formalism where it is unhelpful.

🔧 Technical approach and key choices

🔧 Riemann integral definition

  • Uses Darboux sums instead of tagged partitions.
  • The excerpt states this approach is "far more appropriate for a course of this level."
  • This choice allows fitting a course like UIUC 444 within one semester while still covering interchange of limits and Picard's theorem.

🎯 Capstone theorem

  • The book builds toward Picard's theorem on existence and uniqueness of solutions to ordinary differential equations.
  • The excerpt describes this as "a wonderful example that uses many results proved in the book."
  • Two paths to Picard's theorem:
    • Slower courses: reach it through interchange of limits (Chapter 6).
    • Advanced courses: prove it using the fixed point theorem in metric spaces (Chapter 7).

📊 Chapter structure

The table of contents shows seven main chapters:

  1. Real Numbers (completeness property emphasized)
  2. Sequences (limit superior/inferior, Bolzano–Weierstrass, Cauchy sequences, series)
  3. Continuous Functions (limits, continuity, extreme/intermediate value theorems, uniform continuity)
  4. The Derivative (mean value theorem, Taylor's theorem, inverse function theorem)
  5. The Riemann Integral (properties, fundamental theorem, logarithm/exponential, improper integrals)
  6. Sequences of Functions (pointwise/uniform convergence, interchange of limits, Picard's theorem)
  7. Metric Spaces (open/closed sets, completeness, compactness, fixed point theorem)

📚 Relationship to other texts

📚 Inspiration and comparisons

The excerpt mentions several other analysis books:

  • Rudin's Principles of Mathematical Analysis ("baby Rudin"): The author's favorite; described as "a bit more advanced and ambitious than this present course." The author took "a lot of inspiration and ideas from Rudin."
  • Bartle and Sherbert's Introduction to Real Analysis: The standard book at UIUC; this book's structure "somewhat follows" the UIUC Math 444 syllabus, so it has "some similarities" with Bartle and Sherbert.
  • Rosenlicht's Introduction to Analysis: Described as "an inexpensive and somewhat simpler alternative to Rudin."
  • Trench's Introduction to Real Analysis: A freely downloadable option.

🔍 Major difference from Bartle and Sherbert

  • This book defines the Riemann integral using Darboux sums, not tagged partitions (as Bartle and Sherbert does).
  • This is presented as a key pedagogical improvement for this level of course.

🙏 Acknowledgments and history

🙏 Book origins

  • Started as lecture notes for Math 444 at University of Illinois at Urbana-Champaign (UIUC) in fall 2009.
  • Metric space chapter (Chapter 7) was added for Math 521 at University of Wisconsin–Madison (UW).
  • Volume II was added for Math 4143/4153 at Oklahoma State University (OSU).

🤝 Contributors

The excerpt lists numerous people who provided feedback:

  • Instructors who taught with the book and gave feedback.
  • Frank Beatrous wrote "University of Pittsburgh version extensions" that inspired recent additions.
  • Students who found errors, typos, and made suggestions.

About Analysis

0.2 About analysis

🧭 Overview

🧠 One-sentence thesis

Analysis is the branch of mathematics that proves statements through inequalities and limits rather than direct equalities, providing the rigorous foundation that explains why calculus is true.

📌 Key points (3–5)

  • What analysis is: the branch of mathematics dealing with inequalities and limits, not direct equalities like algebra.
  • Why it matters: understanding analysis means knowing why calculus works, not just how to use it—essential for teaching or working independently.
  • Core method: in analysis, we prove equalities by proving two inequalities through estimation (e.g., to show x = 0, prove x ≤ 0 and x ≥ 0).
  • Common confusion: analysis vs algebra—algebra proves equalities directly; analysis proves inequalities by estimating with arbitrarily small values (epsilon).
  • Course scope: builds from the real number system and completeness through sequences, limits, functions, derivatives, integrals, and metric spaces.

🔍 What analysis is and why it exists

🔍 The branch and its focus

Analysis: the branch of mathematics that deals with inequalities and limits.

  • This course covers the most basic concepts in analysis.
  • The goal is to provide rigorous proofs and set a firm foundation for calculus of one and several variables.

🎯 Why you need it

  • Calculus prepares you to use mathematics but does not tell you why it is true.
  • To use or teach mathematics effectively, you must know why things are true, not just what is true.
  • This course shows why calculus is true and gives a rigorous understanding of limits, derivatives, and integrals.

🔧 The mechanic analogy

The excerpt uses an auto mechanic analogy to illustrate the difference:

| Type of knowledge | What you can do | What you cannot do |
|---|---|---|
| Procedural only (like changing oil, fixing headlights) | Simple, memorized tasks | Diagnose and fix new problems independently |
| Conceptual (understanding how the engine works) | Work independently, solve novel problems | — |

  • Example: A high school teacher who does not understand the definition of the Riemann integral or derivative may not answer all students' questions properly.
  • The author remembers nonsensical statements from a calculus teacher who could "do" textbook problems but did not understand the concept of the limit.

🧮 How analysis differs from algebra

🧮 Direct equality vs estimation

The most important difference:

| Field | What you prove | How you prove it |
|---|---|---|
| Algebra | Equalities directly | Show one object equals another object |
| Analysis | Inequalities | Estimate; use arbitrarily small values |

🔬 The epsilon method

The excerpt gives a key illustrative statement:

Let x be a real number. If x < ε is true for all real numbers ε > 0, then x ≤ 0.

  • This statement captures the general idea of analysis.
  • To prove an equality like x = 0 in analysis:
    • Prove two inequalities: x ≤ 0 and x ≥ 0.
    • To prove x ≤ 0, show x < ε for all positive ε.
    • To prove x ≥ 0, show x > −ε for all positive ε.

🧩 Why this approach works

  • In analysis, we cannot always prove equalities directly.
  • Instead, we "trap" a value by showing it is smaller than every positive number (no matter how small).
  • This estimation technique is the foundation of all proofs in analysis.

📚 Course structure and scope

📚 What the course covers

The course builds through the following topics in order:

  1. Real number system: most importantly its completeness property, which is the basis for everything that follows.
  2. Sequences: the simplest form of a limit.
  3. Functions of one variable: continuity and the derivative.
  4. Riemann integral: definition and the fundamental theorem of calculus.
  5. Sequences of functions: interchange of limits.
  6. Metric spaces: an introduction.

🏷️ Terminology note

  • The term "real analysis" is a bit of a misnomer; the author prefers simply "analysis."
  • Complex analysis builds on this material rather than being distinct.
  • More advanced real analysis courses often discuss complex numbers.
  • The nomenclature is likely historical baggage.

Basic set theory

0.3 Basic set theory

🧭 Overview

🧠 One-sentence thesis

Set theory provides the foundational language for modern analysis by defining how collections of objects (sets) are constructed, compared, and manipulated through operations like union, intersection, and complement, with special attention to the surprising properties of infinite sets.

📌 Key points (3–5)

  • What sets are: collections of objects (elements) defined only by their membership; two sets with identical members are the same set.
  • How to build new sets: use operations (union, intersection, complement) and set-builder notation to create subsets satisfying specific properties.
  • Proof technique for natural numbers: induction allows proving statements for all natural numbers by establishing a base case and an inductive step.
  • Cardinality distinguishes infinite sets: not all infinite sets have the same "size"—some (like the natural numbers) are countably infinite, while others (like the power set of naturals) are uncountable.
  • Common confusion: a proper subset can have the same cardinality as the original set (e.g., even naturals and all naturals)—this characterizes infinite sets.

🧱 Fundamental definitions

🧱 What is a set

Set: a collection of objects called elements or members; a set with no objects is the empty set, denoted ∅.

  • A set is defined only by its members; two sets with the same members are identical.
  • Notation: x ∈ S means x belongs to S; x ∉ S means x does not belong to S.
  • Example: S = {0, 1, 2} contains exactly three elements.

🧱 Subsets and equality

Subset: A set A is a subset of B (written A ⊂ B) if every element of A is also in B.

Equality of sets: A = B if and only if A ⊂ B and B ⊂ A (they contain exactly the same elements).

Proper subset: A ⊊ B means A ⊂ B but A ≠ B.

  • Don't confuse: subset (⊂) allows equality, proper subset (⊊) excludes it.
  • Example: T = {0, 2} is a proper subset of S = {0, 1, 2}.

🧱 Set-builder notation

The notation {x ∈ A : P(x)} defines the subset of A containing all elements satisfying property P(x).

  • Sometimes abbreviated as {x : P(x)} when A is understood from context.
  • Example: {x ∈ S : x ≠ 2} yields {0, 1} when S = {0, 1, 2}.

🔧 Set operations

🔧 Union and intersection

Union: A ∪ B = {x : x ∈ A or x ∈ B}

Intersection: A ∩ B = {x : x ∈ A and x ∈ B}

  • Union collects all elements in either set.
  • Intersection collects only elements in both sets.
  • Disjoint sets: A and B are disjoint if A ∩ B = ∅.

🔧 Complement and difference

Complement of B relative to A (set difference): A \ B = {x : x ∈ A and x ∉ B}

Complement of B: B^c means the universe minus B (when the universe is understood from context).

  • If B ⊂ ℝ, then B^c typically means ℝ \ B.
  • Example: if the universe is {0, 1, 2} and B = {1}, then B^c = {0, 2}.

🔧 DeMorgan's laws

Theorem (DeMorgan): For sets A, B, C:

  • (B ∪ C)^c = B^c ∩ C^c
  • (B ∩ C)^c = B^c ∪ C^c

Or more generally:

  • A \ (B ∪ C) = (A \ B) ∩ (A \ C)
  • A \ (B ∩ C) = (A \ B) ∪ (A \ C)

How to prove set equality: show both directions—if x is in the left side, then x is in the right side, and vice versa.
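These identities are easy to sanity-check on small concrete sets. A minimal Python sketch (the particular sets are illustrative, not from the excerpt):

```python
# Check DeMorgan's laws on small concrete sets (A doubles as the universe).
A = {0, 1, 2, 3, 4, 5}
B = {1, 2, 3}
C = {3, 4}

# Relative-complement forms:
assert A - (B | C) == (A - B) & (A - C)
assert A - (B & C) == (A - B) | (A - C)

# Complement forms, with complements taken relative to the universe A:
comp = lambda S: A - S
assert comp(B | C) == comp(B) & comp(C)
assert comp(B & C) == comp(B) | comp(C)
```

A finite check like this is not a proof, but it mirrors the two-directions argument: both sides of each identity are computed element by element.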

🔧 Infinite unions and intersections

For a collection of sets {A₁, A₂, A₃, ...}:

  • Union from n=1 to ∞ of Aₙ = {x : x ∈ Aₙ for some n ∈ ℕ}
  • Intersection from n=1 to ∞ of Aₙ = {x : x ∈ Aₙ for all n ∈ ℕ}

More generally, for an index set I and sets A_λ for each λ ∈ I:

  • Union over λ ∈ I of A_λ = {x : x ∈ A_λ for some λ ∈ I}
  • Intersection over λ ∈ I of A_λ = {x : x ∈ A_λ for all λ ∈ I}

Don't confuse: order matters when mixing unions and intersections—you cannot generally swap them without proof.

🔁 Induction

🔁 Principle of induction

Well ordering property of ℕ: Every nonempty subset of ℕ has a least (smallest) element.

Theorem (Principle of induction): Let P(n) be a statement depending on n ∈ ℕ. If:

  1. (basis) P(1) is true, and
  2. (induction step) if P(n) is true, then P(n+1) is true,

then P(n) is true for all n ∈ ℕ.

How it works:

  • The induction hypothesis is the assumption that P(n) is true.
  • Use this assumption to prove P(n+1).
  • Example: to prove 2^(n-1) ≤ n! for all n ∈ ℕ, verify for n=1, then assume it holds for n and multiply both sides by 2 to show it holds for n+1.
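The statement P(n) being proved can be spot-checked numerically for small n; a finite check illustrates the claim but is no substitute for the induction proof:

```python
from math import factorial

# Spot-check P(n): 2**(n-1) <= n! for the first several natural numbers.
for n in range(1, 15):
    assert 2**(n - 1) <= factorial(n)

# The induction step in one line: assuming 2**(n-1) <= n!, multiply both
# sides by 2 and use 2 <= n + 1 to get 2**n <= 2 * n! <= (n + 1)!.
```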

🔁 Strong induction

Theorem (Strong induction): Let P(n) be a statement. If:

  1. (basis) P(1) is true, and
  2. (induction step) if P(k) is true for all k = 1, 2, ..., n, then P(n+1) is true,

then P(n) is true for all n ∈ ℕ.

  • Strong induction assumes P holds for all previous values, not just the immediately preceding one.
  • It is equivalent to ordinary induction.

📐 Functions

📐 Definition of function

Cartesian product: A × B = {(x, y) : x ∈ A, y ∈ B}

Function: f : A → B is a subset f of A × B such that for each x ∈ A, there exists a unique y ∈ B with (x, y) ∈ f; we write f(x) = y.

  • The domain of f is A (all inputs).
  • The codomain of f is B (the target set).
  • The range R(f) = {y ∈ B : there exists x ∈ A such that f(x) = y} (actual outputs).
  • Don't confuse: range ⊂ codomain, but they may not be equal.

📐 Images and inverse images

Direct image of C ⊂ A: f(C) = {f(x) ∈ B : x ∈ C}

Inverse image of D ⊂ B: f⁻¹(D) = {x ∈ A : f(x) ∈ D}

Key properties:

  • For inverse images: f⁻¹(C ∪ D) = f⁻¹(C) ∪ f⁻¹(D) and f⁻¹(C ∩ D) = f⁻¹(C) ∩ f⁻¹(D)
  • For inverse images: f⁻¹(C^c) = (f⁻¹(C))^c
  • For direct images: f(C ∪ D) = f(C) ∪ f(D), but f(C ∩ D) ⊂ f(C) ∩ f(D) (only subset, not always equal)
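The asymmetry between direct and inverse images shows up already for f(x) = x² on a tiny domain; a Python sketch with illustrative helper names (`image`, `preimage` are not from the excerpt):

```python
# Direct vs. inverse images for f(x) = x**2, using the classic example
# C = {-1}, D = {1}, where f(C ∩ D) is strictly smaller than f(C) ∩ f(D).
A = {-2, -1, 0, 1, 2}
f = lambda x: x * x

def image(f, C):
    return {f(x) for x in C}

def preimage(f, domain, D):
    return {x for x in domain if f(x) in D}

# Inverse images commute with union and intersection:
assert preimage(f, A, {1} | {4}) == preimage(f, A, {1}) | preimage(f, A, {4})
assert preimage(f, A, {1} & {4}) == preimage(f, A, {1}) & preimage(f, A, {4})

# Direct images: only the subset relation holds for intersections.
C, D = {-1}, {1}
assert image(f, C & D) == set()           # f(C ∩ D) = f(∅) = ∅
assert image(f, C) & image(f, D) == {1}   # but f(C) ∩ f(D) = {1}
```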

📐 Types of functions

Injective (one-to-one): f(x₁) = f(x₂) implies x₁ = x₂; equivalently, each y has at most one preimage.

Surjective (onto): f(A) = B; equivalently, every element of B is the image of some element in A.

Bijective: both injective and surjective; establishes a perfect one-to-one correspondence between A and B.

  • When f is bijective, f⁻¹ can be treated as a function from B to A (the inverse function).
  • Example: f(x) = x³ from ℝ to ℝ is bijective with inverse f⁻¹(x) = ³√x.

📐 Composition

Composition: given f : A → B and g : B → C, the composition g ∘ f : A → C is defined by (g ∘ f)(x) = g(f(x)).

  • Composition of injections is injective; composition of surjections is surjective; composition of bijections is bijective.
  • Example: if f(x) = x³ and g(y) = sin(y), then (g ∘ f)(x) = sin(x³).
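These closure properties can be checked mechanically on small finite sets; the following sketch (the sets and maps are made up for illustration) verifies that composing two bijections gives a bijection:

```python
# Composition of bijections is a bijection, checked on small finite sets.
A, B, C = [0, 1, 2], ['a', 'b', 'c'], [10, 20, 30]
f = {0: 'a', 1: 'b', 2: 'c'}       # a bijection A -> B
g = {'a': 20, 'b': 30, 'c': 10}    # a bijection B -> C

def is_bijection(h, dom, cod):
    vals = [h[x] for x in dom]
    # surjective: every element of cod is hit; injective: no repeats
    return set(vals) == set(cod) and len(set(vals)) == len(vals)

gf = {x: g[f[x]] for x in A}       # (g ∘ f)(x) = g(f(x))
assert is_bijection(f, A, B) and is_bijection(g, B, C)
assert is_bijection(gf, A, C)
```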

🔗 Relations and equivalence

🔗 Binary relations

Binary relation on A: a subset R ⊂ A × A; we write a R b instead of (a, b) ∈ R.

  • Any subset of A × A defines a relation.
  • Example: on A = {1, 2, 3}, the relation '<' corresponds to {(1,2), (1,3), (2,3)}.

🔗 Equivalence relations

Equivalence relation: a relation R that is reflexive (a R a for all a), symmetric (a R b implies b R a), and transitive (a R b and b R c implies a R c).

PropertyMeaning
ReflexiveEvery element is related to itself
SymmetricIf a relates to b, then b relates to a
TransitiveIf a relates to b and b relates to c, then a relates to c

🔗 Equivalence classes

Equivalence class of a: [a] = {x ∈ A : a R x}

Key fact: If R is an equivalence relation on A, then:

  • Every element a ∈ A is in exactly one equivalence class.
  • a R b if and only if [a] = [b].

Example: rational numbers can be defined as equivalence classes of pairs (a, b) ∈ ℤ × ℕ under the relation (a, b) ∼ (c, d) whenever ad = bc; the class [(a, b)] is written as a/b.
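The defining test ad = bc can be run on a finite window of pairs; a sketch (the window bounds are arbitrary, chosen only for illustration):

```python
from itertools import product

# Pairs (a, b) in Z x N with (a, b) ~ (c, d) iff a*d == b*c, restricted
# to a finite window of Z x N.
pairs = list(product(range(-3, 4), range(1, 5)))

def related(p, q):
    (a, b), (c, d) = p, q
    return a * d == b * c

def eq_class(p):
    return frozenset(q for q in pairs if related(p, q))

# (1, 2) and (2, 4) both represent the rational 1/2:
assert related((1, 2), (2, 4))
assert eq_class((1, 2)) == eq_class((2, 4))   # related iff classes coincide
assert eq_class((1, 2)) != eq_class((1, 3))   # 1/2 and 1/3 are different
```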

🔢 Cardinality

🔢 Same cardinality

Same cardinality: sets A and B have the same cardinality if there exists a bijection f : A → B; we write |A| = |B|.

  • The existence of a bijection is an equivalence relation (reflexive, symmetric, transitive).
  • Example: {1, 2, 3} and {a, b, c} have the same cardinality.

🔢 Finite and infinite

Finite: A is finite if |A| = n for some n ∈ ℕ, or if A is empty (|A| = 0).

Infinite: A is infinite if it is not finite.

Characterization of infinite sets: A set is infinite if and only if it is in one-to-one correspondence with a proper subset of itself.

Example: the set of even natural numbers E = {2, 4, 6, ...} has the same cardinality as ℕ via the bijection f(n) = 2n.

🔢 Countable sets

Countably infinite: |A| = |ℕ|; the cardinality of ℕ is denoted ℵ₀ (aleph-naught).

Countable: A is finite or countably infinite.

Uncountable: A is not countable.

Examples of countable sets:

  • The set of even natural numbers (bijection: n ↦ 2n)
  • ℕ × ℕ (arrange pairs by sum: (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ...)
  • The set of rational numbers ℚ (list positive rationals by sum of numerator and denominator, skipping duplicates, then include 0 and negatives)
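The diagonal enumeration of ℕ × ℕ by sum can be written as a short generator; a sketch:

```python
from itertools import count, islice

def pairs_by_sum():
    """Enumerate N x N by increasing sum a + b, as described above."""
    for s in count(2):            # possible sums a + b, starting at 1 + 1
        for a in range(1, s):     # a = 1, ..., s - 1, with b = s - a
            yield (a, s - a)

first = list(islice(pairs_by_sum(), 6))
assert first == [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```

Every pair (a, b) appears at a finite position (within the block of pairs summing to a + b), which is exactly what a bijection with ℕ requires.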

Key fact: If A ⊂ B and B is countable, then A is countable.

🔢 Ordering cardinalities

Cardinality ordering: |A| ≤ |B| if there exists an injection from A to B.

Strict inequality: |A| < |B| if |A| ≤ |B| but A and B do not have the same cardinality.

Cantor–Bernstein–Schröder theorem (stated without proof): A and B have the same cardinality if and only if |A| ≤ |B| and |B| ≤ |A|.

🔢 Power sets and uncountability

Power set: P(A) is the set of all subsets of A.

For finite A with |A| = n, we have |P(A)| = 2ⁿ.

Theorem (Cantor): For any set A, |A| < |P(A)|; in particular, there exists no surjection from A onto P(A).

Proof idea:

  • An injection exists: x ↦ {x}.
  • To show no surjection exists, suppose g : A → P(A) is any function.
  • Define B = {x ∈ A : x ∉ g(x)}.
  • If B = g(x₀) for some x₀, then x₀ ∈ B ⟺ x₀ ∉ g(x₀) = B, a contradiction.
  • Therefore B is not in the range of g.
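For a small finite A the diagonal argument can be verified exhaustively: every one of the |P(A)|^|A| possible functions g misses its diagonal set B. An illustrative sketch:

```python
from itertools import combinations, product

# Exhaustively verify Cantor's argument on a small set: no g : A -> P(A)
# hits the diagonal set B = {x in A : x not in g(x)}.
A = [0, 1, 2]
power = [frozenset(c) for r in range(len(A) + 1)
         for c in combinations(A, r)]            # P(A): all 2**3 = 8 subsets

for values in product(power, repeat=len(A)):     # every function g : A -> P(A)
    g = dict(zip(A, values))
    B = frozenset(x for x in A if x not in g[x])
    assert B not in set(g.values())              # g always misses B
```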

Consequence: There exist progressively larger infinite sets: ℕ, P(ℕ), P(P(ℕ)), P(P(P(ℕ))), etc.

Don't confuse: "infinite" does not mean "all the same size"—uncountable sets are strictly larger than countable ones.

Basic Properties of Real Numbers

1.1 Basic properties

🧭 Overview

🧠 One-sentence thesis

The real numbers form a unique ordered field with the least-upper-bound property, which distinguishes them from the rationals and makes analysis possible.

📌 Key points (3–5)

  • Three defining properties of ℝ: it is an ordered set, a field (supporting addition and multiplication), and complete (has the least-upper-bound property).
  • Supremum and infimum: every nonempty subset bounded above has a least upper bound (supremum); bounded-below sets have greatest lower bounds (infimum).
  • Why ℚ is insufficient: the rationals lack the least-upper-bound property (e.g., {x ∈ ℚ : x² < 2} has no supremum in ℚ), so analysis requires ℝ.
  • Common confusion: an upper bound vs. the least upper bound—many numbers can be upper bounds, but the supremum is the smallest one; it need not belong to the set itself.
  • Ordered field behavior: multiplying an inequality by a positive number preserves direction; multiplying by a negative number reverses it.

📐 Ordered sets and bounds

📐 What an ordered set is

Ordered set: a set S together with a relation < satisfying (i) trichotomy: for all x, y in S, exactly one of x < y, x = y, or y < x holds; (ii) transitivity: if x < y and y < z, then x < z.

  • The relation < must pick exactly one outcome for any pair (no ties, no ambiguity).
  • Transitivity chains inequalities: if A < B and B < C, then A < C.
  • Examples: ℚ, ℕ, ℤ with the usual ordering; countries ordered by landmass; words in lexicographic (dictionary) order.

🔺 Upper and lower bounds

Bounded above: a subset E ⊂ S is bounded above if there exists b ∈ S such that x ≤ b for all x ∈ E; b is called an upper bound.

Bounded below: E is bounded below if there exists b ∈ S such that x ≥ b for all x ∈ E; b is a lower bound.

  • A set is bounded if it is both bounded above and below.
  • Many elements can serve as upper bounds; any number larger than all elements of E qualifies.
  • Example: E = {a, c} in S = {a, b, c, d, e} with a < b < c < d < e has upper bounds c, d, e.

🎯 Supremum and infimum

Supremum (least upper bound): an upper bound b₀ of E such that b₀ ≤ b for all upper bounds b of E. Notation: sup E.

Infimum (greatest lower bound): a lower bound b₀ of E such that b₀ ≥ b for all lower bounds b of E. Notation: inf E.

  • The supremum is the smallest of all upper bounds; the infimum is the largest of all lower bounds.
  • Uniqueness: if both b and b′ are suprema, then b ≤ b′ and b′ ≤ b, so b = b′.
  • Don't confuse: supremum need not be in E. Example: E = {x ∈ ℚ : x < 1} has sup E = 1, but 1 ∉ E. In contrast, G = {x ∈ ℚ : x ≤ 1} has sup G = 1 and 1 ∈ G.
  • A set with no upper bound (e.g., P = {x ∈ ℚ : x ≥ 0}) cannot have a supremum.

✅ Least-upper-bound property (completeness)

Least-upper-bound property: an ordered set S has this property if every nonempty subset E ⊂ S that is bounded above has a supremum in S.

  • Also called the completeness property or Dedekind completeness.
  • ℝ has this property; ℚ does not.
  • Example showing ℚ is incomplete: {x ∈ ℚ : x² < 2} is bounded above in ℚ but has no supremum in ℚ (the supremum is √2, which is irrational).
  • Proof that √2 is irrational: assume x = m/n in lowest terms with x² = 2. Then m² = 2n², so m is even; write m = 2k, then 4k² = 2n², so n² = 2k², making n even—contradiction since m/n was in lowest terms.
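The supremum whose existence completeness guarantees can be approximated numerically; a bisection sketch for sup{x : x² < 2} (the iteration count is an arbitrary choice):

```python
# Bisection for sup{x in R : x**2 < 2}.  The least-upper-bound property
# guarantees this supremum exists in R (it is sqrt(2), which is irrational).
lo, hi = 1.0, 2.0              # invariant: lo**2 < 2 <= hi**2
for _ in range(50):
    mid = (lo + hi) / 2
    if mid * mid < 2:
        lo = mid               # mid lies in the set; the sup is to the right
    else:
        hi = mid               # mid is an upper bound of the set
assert abs(lo * lo - 2) < 1e-12
```

In ℚ this process never terminates at an exact answer, which is the numerical shadow of the incompleteness of ℚ.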

🧮 Fields and algebraic structure

🧮 What a field is

Field: a set F with addition (+) and multiplication (·) satisfying axioms (A1)–(A5) for addition, (M1)–(M5) for multiplication, and (D) the distributive law.

Key axioms:

  • (A1) Closure under addition; (A2) commutativity; (A3) associativity; (A4) additive identity 0; (A5) additive inverses −x.
  • (M1) Closure under multiplication; (M2) commutativity; (M3) associativity; (M4) multiplicative identity 1 ≠ 0; (M5) multiplicative inverses 1/x for x ≠ 0.
  • (D) Distributive law: x(y + z) = xy + xz.
  • ℚ is a field; ℤ is not (fails M5: no multiplicative inverses, e.g., no integer x with 2x = 1).
  • Basic consequences: 0·x = 0 for all x (proved using A4, D, M2, A5, A2, A3).

🔗 Ordered fields

Ordered field: a field F that is also an ordered set such that (i) x < y implies x + z < y + z for all z; (ii) x > 0 and y > 0 implies xy > 0.

  • Terminology: x is positive if x > 0, negative if x < 0, nonnegative if x ≥ 0, nonpositive if x ≤ 0.
  • ℚ with standard ordering is an ordered field.
  • ℂ (complex numbers) is a field but cannot be made into an ordered field: in any ordered field, x² > 0 for all nonzero x, but in ℂ, i² = −1.

🔢 Properties of ordered fields (Proposition 1.1.8)

Key results for ordered field F and elements x, y, z, w:

| Property | Statement | Intuition |
|---|---|---|
| (i) | If x > 0, then −x < 0 (and vice versa) | Signs flip with negation |
| (ii) | If x > 0 and y < z, then xy < xz | Multiplying an inequality by a positive preserves direction |
| (iii) | If x < 0 and y < z, then xy > xz | Multiplying by a negative reverses direction |
| (iv) | If x ≠ 0, then x² > 0 | Squares are always positive |
| (v) | If 0 < x < y, then 0 < 1/y < 1/x | Reciprocals reverse order for positives |
| (vi) | If 0 < x < y, then x² < y² | Squaring preserves order for positives |
| (vii) | If x ≤ y and z ≤ w, then x + z ≤ y + w | Inequalities add |

  • Property (iv) implies 1 > 0 in every ordered field.
  • Don't confuse: xy > 0 does not mean both x and y are positive; both could be negative (e.g., (−1)(−1) = 1 > 0). Proposition 1.1.9 clarifies: if xy > 0, then either both are positive or both are negative.

🔄 Infimum from supremum

Proposition 1.1.11: If F is an ordered field with the least-upper-bound property and A ⊂ F is nonempty and bounded below, then inf A exists.

  • Proof idea: define B = {−x : x ∈ A}. If b is a lower bound for A, then −b is an upper bound for B. Since F has the least-upper-bound property, c = sup B exists. Then −c is the greatest lower bound of A.
  • Consequence: in ℝ, every nonempty set bounded below has an infimum.
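The negation trick in the proof can be checked on a finite set, where inf = min and sup = max (the numbers are illustrative):

```python
# Proof idea of the proposition on a finite set: negating A turns lower
# bounds of A into upper bounds of B = {-x : x in A}, and inf A = -sup B.
A = {2.5, 3.0, 7.0, 4.5}
B = {-x for x in A}
assert min(A) == -max(B)          # inf A = -sup B

# A lower bound b of A corresponds to the upper bound -b of B:
b = 2.0                           # some lower bound for A
assert all(b <= a for a in A) and all(y <= -b for y in B)
```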

🌟 The real number system

🌟 Existence and uniqueness of ℝ (Theorem 1.2.1)

Theorem: There exists a unique ordered field ℝ with the least-upper-bound property such that ℚ ⊂ ℝ.

  • ℝ is characterized by three properties: (1) ordered field structure, (2) least-upper-bound property, (3) contains ℚ.
  • The excerpt assumes this theorem without proof (construction of ℝ from ℚ is omitted).
  • ℕ ⊂ ℚ ⊂ ℝ; by induction, n > 0 for all n ∈ ℕ.

🎯 Proving inequalities in analysis (Proposition 1.2.2)

Proposition: If x ∈ ℝ is such that x ≤ ε for all ε ∈ ℝ where ε > 0, then x ≤ 0.

  • Proof: If x > 0, then 0 < x/2 < x. Take ε = x/2 to get x ≤ x/2, a contradiction. Thus x ≤ 0.
  • How analysts use this: to prove x ≤ 0, show x ≤ ε for every positive ε.
  • Variants:
    • If x ≥ 0 and x ≤ ε for all ε > 0, then x = 0.
    • If |x| ≤ ε for all ε > 0, then x = 0.
    • To prove x ≥ 0, show x ≥ −ε for all ε > 0.

📚 Selected exercises (context)

📚 Finite subsets always have sup and inf

Exercise 1.1.2: Let S be an ordered set and A ⊂ S a nonempty finite subset. Then A is bounded, and inf A and sup A exist and are in A.

  • Hint: use induction on the size of A.
  • Intuition: a finite set has a smallest and largest element.

📚 Supremum in subset vs superset

Exercise 1.1.4: Let S be an ordered set, B ⊂ S bounded, A ⊂ B nonempty. If all inf and sup exist, then inf B ≤ inf A ≤ sup A ≤ sup B.

  • Smaller sets have "tighter" bounds: removing elements can only raise the infimum and lower the supremum.

📚 Upper bound in the set

Exercise 1.1.5: Let S be an ordered set, A ⊂ S, and b an upper bound for A. If b ∈ A, then b = sup A.

  • If the upper bound is in the set, it must be the least upper bound (no smaller upper bound can exist).

📚 Ordered fields contain countably infinite sets

Exercise 1.1.12: Prove that any ordered field must contain a countably infinite set.

  • Hint: consider {1, 1+1, 1+1+1, ...} (the natural numbers embedded in the field).

The Set of Real Numbers

1.2 The set of real numbers

🧭 Overview

🧠 One-sentence thesis

The real numbers form a unique ordered field with the least-upper-bound property that extends the rationals and enables analysts to take suprema freely, prove inequalities using arbitrarily small positive numbers, and rely on the Archimedean property to find natural numbers and rationals densely distributed throughout the real line.

📌 Key points (3–5)

  • Existence and uniqueness: The real numbers ℝ exist as the unique ordered field containing ℚ with the least-upper-bound property.
  • Proving inequalities: Analysts prove x ≤ 0 by showing x ≤ ε for all ε > 0; this technique is fundamental to real analysis.
  • Archimedean property: For any positive x and any y, there exists a natural number n such that nx > y (equivalently, ℕ is unbounded above).
  • Density of rationals: Between any two distinct real numbers lies a rational number; irrationals also exist densely.
  • Common confusion: Strict inequalities (x < y for all pairs) do not always yield strict inequalities for suprema/infima—sup A ≤ inf B can hold even when every element of A is strictly less than every element of B.

🏗️ Construction and fundamental properties

🏗️ Existence theorem

Theorem: There exists a unique ordered field ℝ with the least-upper-bound property such that ℚ ⊂ ℝ.

  • The excerpt does not construct ℝ from ℚ; it simply asserts existence.
  • ℚ itself is an ordered field, but lacks the least-upper-bound property.
  • The natural numbers ℕ ⊂ ℚ ⊂ ℝ, and n > 0 for all n ∈ ℕ (provable by induction).

🔢 Irrational numbers exist

  • The set ℝ \ ℚ (irrational numbers) is nonempty.
  • Example: The square root of 2 exists as a positive real number r such that r² = 2, denoted √2.
  • Proof strategy: Define A = {x ∈ ℝ : x² < 2}, show A is nonempty and bounded above, then let r = sup A and prove r² = 2 by showing both r² ≥ 2 and r² ≤ 2.
  • The proof uses the least-upper-bound property crucially; no such supremum exists in ℚ.
  • Generalization: For any x > 0 and n ∈ ℕ, there exists a unique positive real number r such that rⁿ = x (denoted x^(1/n)).

🔍 The analyst's inequality technique

🔍 Core proposition for proving inequalities

Proposition: If x ∈ ℝ is such that x ≤ ε for all ε ∈ ℝ where ε > 0, then x ≤ 0.

  • Why it works: If x > 0, then ε = x/2 satisfies 0 < x/2 < x, contradicting x ≤ ε for all ε > 0.
  • This is how analysts prove nonstrict inequalities in practice.

🔍 Common variants

  • If x ≥ 0 and x ≤ ε for all ε > 0, then x = 0.
  • If |x| ≤ ε for all ε > 0, then x = 0 (uses absolute value, defined later in the text).
  • To prove x ≥ 0, show x ≥ −ε for all ε > 0.

🔍 Key insight

  • Between any two real numbers a < b, there exists another real number c such that a < c < b.
  • Infinitely many such c exist; one example is c = (a + b)/2.
  • This density underlies the inequality technique.

📏 Archimedean property and density

📏 Archimedean property

Archimedean property: If x, y ∈ ℝ and x > 0, then there exists an n ∈ ℕ such that nx > y.

  • Equivalently: ℕ is not bounded above in ℝ.
  • Proof idea: Suppose ℕ is bounded above with supremum b. Then b − 1 is not an upper bound, so there exists m ∈ ℕ with m > b − 1. But then m + 1 > b, contradicting b being an upper bound.
  • Named after Archimedes of Syracuse (c. 287 BC – c. 212 BC).

📏 Density of rationals in ℝ

Density: If x, y ∈ ℝ and x < y, then there exists an r ∈ ℚ such that x < r < y.

  • Proof strategy (for x ≥ 0):
    1. Find n ∈ ℕ such that y − x > 1/n (using Archimedean property).
    2. Let m be the least natural number such that m > nx (using well-ordering of ℕ).
    3. Then m − 1 ≤ nx, so m ≤ nx + 1 < ny, giving x < m/n < y.
  • For x < 0: if y > 0 take r = 0; if y ≤ 0 apply the result to −y and −x, then negate.
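
The three steps translate directly into code. A sketch (the function name `rational_between` is ours; exact arithmetic via `fractions.Fraction`, and x ≥ 0 is assumed as in the proof):

```python
from fractions import Fraction
import math

def rational_between(x, y):
    """Find r in Q with x < r < y, following the proof's three steps.
    Assumes 0 <= x < y; inputs are Fractions so every step is exact."""
    assert 0 <= x < y
    # Step 1 (Archimedean property): pick n in N with 1/n < y - x.
    n = math.floor(1 / (y - x)) + 1
    # Step 2 (well-ordering): m is the least natural number with m > n*x.
    m = math.floor(n * x) + 1
    # Step 3: m - 1 <= n*x gives m <= n*x + 1 < n*y, hence x < m/n < y.
    return Fraction(m, n)

r = rational_between(Fraction(1, 3), Fraction(1, 2))   # here r = 3/7
```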

📏 Corollary on 1/n

  • inf{1/n : n ∈ ℕ} = 0.
  • Why: 0 is a lower bound; for any a > 0, the Archimedean property gives n such that na > 1, i.e., a > 1/n, so a cannot be a lower bound.
  • Don't confuse: 0 is the infimum but 0 ∉ {1/n : n ∈ ℕ}.

⚙️ Operations with suprema and infima

⚙️ Addition and scalar multiplication

For nonempty A ⊂ ℝ and x ∈ ℝ, define:

  • x + A = {x + y : y ∈ A}
  • xA = {xy : y ∈ A}

Key results (when bounds exist):

  • sup(x + A) = x + sup A
  • inf(x + A) = x + inf A
  • If x > 0: sup(xA) = x(sup A) and inf(xA) = x(inf A)
  • If x < 0: sup(xA) = x(inf A) and inf(xA) = x(sup A)

Don't confuse: Multiplying by a negative number switches supremum and infimum.
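
For a finite set the supremum is just the maximum (and the infimum the minimum), so the scaling rules can be spot-checked directly; a small Python illustration, not a proof:

```python
# For a finite set, sup = max and inf = min, so the scaling rules
# can be checked on concrete numbers.
A = {-1.5, 0.25, 2.0, 3.5}
for x in (2.0, -2.0):
    xA = {x * a for a in A}
    if x > 0:
        assert max(xA) == x * max(A) and min(xA) == x * min(A)
    else:  # multiplying by a negative number swaps sup and inf
        assert max(xA) == x * min(A) and min(xA) == x * max(A)
```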

⚙️ Comparing two sets

Proposition: Let A, B ⊂ ℝ be nonempty. If x ≤ y whenever x ∈ A and y ∈ B, then A is bounded above, B is bounded below, and sup A ≤ inf B.

  • Proof idea: Any x ∈ A is a lower bound for B, so x ≤ inf B for all x ∈ A, making inf B an upper bound for A.
  • Critical subtlety: Even if x < y (strict) for all x ∈ A and y ∈ B, we still only get sup A ≤ inf B (nonstrict).
  • Example: A = {0}, B = {1/n : n ∈ ℕ}. Then 0 < 1/n for all n, but sup A = 0 = inf B.

⚙️ Approximating suprema

Proposition: If S ⊂ ℝ is nonempty and bounded above, then for every ε > 0 there exists an x ∈ S such that (sup S) − ε < x ≤ sup S.

  • This says the supremum can be approached arbitrarily closely by elements of S.
  • A similar result holds for infima.
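
A concrete illustration with S = {1 − 1/n : n ∈ ℕ}, where sup S = 1 (the helper name `sup_witness` is ours): for each ε the Archimedean property produces an element of S within ε of the supremum.

```python
def sup_witness(eps):
    """For S = {1 - 1/n : n in N} with sup S = 1, return an element x of S
    satisfying (sup S) - eps < x <= sup S, via the Archimedean property."""
    n = int(1 / eps) + 1    # ensures 1/n < eps
    return 1 - 1 / n

for eps in (0.5, 0.01, 1e-6):
    x = sup_witness(eps)
    assert 1 - eps < x <= 1
```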

🎯 Extended reals and extrema

🎯 Extended real numbers

To handle unbounded or empty sets, define:

  • If A is empty: sup A = −∞ and inf A = ∞
  • If A is not bounded above: sup A = ∞
  • If A is not bounded below: inf A = −∞

The extended real numbers ℝ* = ℝ ∪ {−∞, ∞} with ordering:

  • −∞ < x < ∞ for all x ∈ ℝ

Warning: ℝ* is not a field; operations like ∞ − ∞, 0·(±∞), and ±∞/±∞ are undefined.

🎯 Maximum and minimum

  • Maximum: When sup A ∈ A, write max A instead of sup A.
  • Minimum: When inf A ∈ A, write min A instead of inf A.
  • Example: max{1, 2.4, π, 100} = 100 and min{1, 2.4, π, 100} = 1.
  • Finite nonempty sets always have maxima and minima.
  • Using max/min emphasizes that the bound is attained in the set itself.
| Term | Condition | Notation | In the set? |
| --- | --- | --- | --- |
| Supremum | Bounded above | sup A | Maybe |
| Maximum | sup A ∈ A | max A | Yes |
| Infimum | Bounded below | inf A | Maybe |
| Minimum | inf A ∈ A | min A | Yes |

1.3 Absolute value and bounded functions

🧭 Overview

🧠 One-sentence thesis

Absolute value measures the "size" of a real number and enables us to define bounded functions, whose outputs stay within a fixed range, and to prove inequalities like the triangle inequality that control how absolute values behave under addition.

📌 Key points (3–5)

  • What absolute value measures: the "size" of a real number, defined piecewise as x when x ≥ 0 and −x when x < 0.
  • Triangle inequality: the absolute value of a sum is at most the sum of the absolute values; this is used repeatedly to find bounds.
  • Bounded functions: a function is bounded if there exists a number M such that the absolute value of every output is at most M.
  • Supremum and infimum: for a bounded function, sup and inf denote the least upper bound and greatest lower bound of the range.
  • Common confusion: when comparing sup and inf of two functions, the inequality sup f(x) ≤ inf g(x) requires the stronger hypothesis f(x) ≤ g(y) for all x and y, not just f(x) ≤ g(x) for each x.

📏 Definition and basic properties of absolute value

📏 Formal definition

Absolute value |x| is defined as:

  • x if x ≥ 0
  • −x if x < 0
  • Think of |x| as the "size" or distance from zero.
  • It strips away the sign and always returns a non-negative number.

✅ Core properties (Proposition 1.3.1)

The excerpt lists six key properties:

| Property | Statement | Intuition |
| --- | --- | --- |
| (i) Non-negativity | \|x\| ≥ 0; \|x\| = 0 iff x = 0 | Size is never negative; only zero has size zero |
| (ii) Symmetry | \|−x\| = \|x\| | Flipping the sign doesn't change size |
| (iii) Multiplicativity | \|xy\| = \|x\| \|y\| | Size of a product is the product of sizes |
| (iv) Squares | \|x\|² = x² | Squaring removes the sign, same as absolute value then squaring |
| (v) Bounding | \|x\| ≤ y iff −y ≤ x ≤ y | "Size at most y" is the same as "x is between −y and y" |
| (vi) Sandwich | −\|x\| ≤ x ≤ \|x\| | Every number lies between its negative size and its size |

Why these matter: Properties (v) and (vi) are used constantly to manipulate inequalities involving absolute values.

🔍 How property (v) works

  • Forward direction: If |x| ≤ y, then either x ≥ 0 (so x ≤ y and −y ≤ 0 ≤ x) or x < 0 (so −x ≤ y, hence x ≥ −y, and y ≥ 0 > x).
  • Reverse direction: If −y ≤ x ≤ y, then when x ≥ 0 we have |x| = x ≤ y; when x < 0 we have |x| = −x ≤ y (from −y ≤ x).
  • Example: |x| ≤ 3 means −3 ≤ x ≤ 3.

🔺 Triangle inequality and its variants

🔺 Standard triangle inequality (Proposition 1.3.2)

|x + y| ≤ |x| + |y| for all x, y in the reals.

Proof idea:

  1. From property (vi), we know −|x| ≤ x ≤ |x| and −|y| ≤ y ≤ |y|.
  2. Add these two inequalities: −(|x| + |y|) ≤ x + y ≤ |x| + |y|.
  3. Apply property (v) with the sum x + y to conclude |x + y| ≤ |x| + |y|.

Why "triangle": In geometry, the length of one side of a triangle is at most the sum of the lengths of the other two sides.

🔄 Reverse triangle inequality (Corollary 1.3.3)

The excerpt gives two additional forms:

  • (i) Reverse triangle inequality: ||a| − |b|| ≤ |a − b|
    (The difference of the sizes, in absolute value, is at most the size of the difference.)
  • (ii) Alternative form: |a − b| ≤ |a| + |b|
    (Replace y with −y in the standard inequality; note |−y| = |y|.)

Proof sketch for (i):

  • Start with |a| = |(a − b) + b| ≤ |a − b| + |b|, so |a| − |b| ≤ |a − b|.
  • Swap a and b to get |b| − |a| ≤ |b − a| = |a − b|.
  • Combine using property (v): the absolute value of (|a| − |b|) is at most |a − b|.

📐 Extended triangle inequality (Corollary 1.3.4)

|x₁ + x₂ + ⋯ + xₙ| ≤ |x₁| + |x₂| + ⋯ + |xₙ|

Proof by induction:

  • Base case n = 1 is trivial; n = 2 is the standard triangle inequality.
  • Induction step: assume true for n terms; then for n + 1 terms, write
    |x₁ + ⋯ + xₙ + xₙ₊₁| ≤ |x₁ + ⋯ + xₙ| + |xₙ₊₁| (standard inequality)
    ≤ |x₁| + ⋯ + |xₙ| + |xₙ₊₁| (induction hypothesis).

Use case: Finding a bound M for a complicated expression by bounding each term separately.

🧮 Example: bounding a polynomial (Example 1.3.5)

Problem: Find M such that |x² − 9x + 1| ≤ M for all −1 ≤ x ≤ 5.

Solution:

  • Apply the triangle inequality: |x² − 9x + 1| ≤ |x²| + |9x| + |1| = |x|² + 9|x| + 1.
  • The expression |x|² + 9|x| + 1 is largest when |x| is largest (because it increases with |x|).
  • On [−1, 5], the largest |x| is 5.
  • So M = 5² + 9(5) + 1 = 71 works.

Don't confuse: This M is not the smallest possible bound (the actual maximum of |x² − 9x + 1| on [−1, 5] is 77/4 = 19.25, attained at the vertex x = 9/2), but the problem only asks for some M that works.
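
A quick grid check in Python that M = 71 works on [−1, 5], and that it is far from tight (a sanity check on sampled points, not a proof):

```python
# Sample the interval [-1, 5] and verify the bound at every grid point.
f = lambda t: abs(t**2 - 9*t + 1)
grid = [-1 + 6 * k / 1000 for k in range(1001)]   # 1001 points of [-1, 5]
assert all(f(t) <= 71 for t in grid)
assert max(f(t) for t in grid) < 20    # the true maximum is 77/4 = 19.25
```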

📦 Bounded functions

📦 Definition of bounded (Definition 1.3.6)

A function f : D → ℝ is bounded if there exists a number M such that |f(x)| ≤ M for all x in D.

  • "Bounded" means the outputs stay within a fixed range [−M, M].
  • The same formula can be bounded on one domain and unbounded on another.
  • Example: x² − 9x + 1 is bounded on [−1, 5] but unbounded on all of ℝ.

🔝 Supremum and infimum notation

For a function f : D → ℝ, the excerpt defines:

  • sup over x in D of f(x) = sup f(D) (least upper bound of the range)
  • inf over x in D of f(x) = inf f(D) (greatest lower bound of the range)

Notation shorthand: The excerpt writes "sup for −1 ≤ x ≤ 5 of (x² − 9x + 1)" to mean the supremum over that interval.

Example: For f(x) = x² − 9x + 1 on [−1, 5]:

  • sup = 11 (attained somewhere in the interval)
  • inf = −77/4 (attained somewhere in the interval)

📊 Comparing suprema and infima (Proposition 1.3.7)

Hypothesis: f and g are bounded functions on D, and f(x) ≤ g(x) for all x in D.

Conclusion:

  • sup f(x) ≤ sup g(x)
  • inf f(x) ≤ inf g(x)

Proof idea for the sup inequality:

  1. Let b be any upper bound for g(D).
  2. Then for all x in D, f(x) ≤ g(x) ≤ b, so b is also an upper bound for f(D).
  3. Take b = sup g(D) (the least upper bound for g).
  4. Then sup g(D) is an upper bound for f(D), so sup f(D) ≤ sup g(D).

Key point: The x on the left and the x on the right are independent; you should think "sup over x of f(x) ≤ sup over y of g(y)."

⚠️ Common mistake: sup vs inf

Wrong conclusion: If f(x) ≤ g(x) for all x, then sup f(x) ≤ inf g(x).

Why it's wrong: The hypothesis f(x) ≤ g(x) only compares f and g at the same point x. To conclude sup f ≤ inf g, you need the stronger hypothesis:

f(x) ≤ g(y) for all x in D and all y in D.

Example of the mistake: Let D = [0, 1], f(x) = x, g(x) = x. Then f(x) ≤ g(x) for all x, but sup f = 1 and inf g = 0, so sup f > inf g.

Don't confuse: "f(x) ≤ g(x) for each x" versus "f(x) ≤ g(y) for every pair (x, y)."

🔗 Additional results on bounded functions

➕ Sums and scalar multiples (Exercise 1.3.7 and 1.3.8)

The excerpt mentions (in exercises) that:

  • If f and g are bounded, then f + g is bounded.
  • If f is bounded and α is a real number, then αf is bounded.
  • Inequality for sums: sup(f + g) ≤ sup f + sup g, and inf(f + g) ≥ inf f + inf g.
  • These inequalities can be strict (equality does not always hold).
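
The strictness claim can be seen with f(x) = x and g(x) = −x on D = [0, 1]; a Python sketch using a finite grid to stand in for D:

```python
# On D = [0, 1]: sup f = 1, sup g = 0, but sup(f + g) = 0 < 1.
D = [k / 100 for k in range(101)]
f = lambda t: t
g = lambda t: -t
sup_f = max(f(t) for t in D)
sup_g = max(g(t) for t in D)
sup_sum = max(f(t) + g(t) for t in D)
assert sup_sum <= sup_f + sup_g      # the proposition's inequality
assert sup_sum < sup_f + sup_g       # and here it is strict: 0 < 1
```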

🔀 Unbounded combinations

From Exercise 1.3.9:

  • If f + g and g are bounded, then f is bounded.
  • If f is bounded but g is unbounded, then f + g is unbounded.
  • Counterexample: Both f and g can be unbounded, yet f + g is bounded (e.g., f(x) = x and g(x) = −x on ℝ).

Don't confuse: "Both unbounded" does not imply "sum is unbounded."


1.4 Intervals and the size of ℝ

🧭 Overview

🧠 One-sentence thesis

All intervals—bounded or unbounded, open or closed—have the same cardinality from a set-theoretic perspective, and the real numbers ℝ are uncountable, meaning there are far more irrational numbers than rational numbers.

📌 Key points (3–5)

  • Interval notation: Intervals are subsets of ℝ defined by endpoints and whether those endpoints are included (closed) or excluded (open).
  • Surprising cardinality result: All intervals have the same cardinality—even bounded intervals like (0, 1) have the same cardinality as the entire real line ℝ.
  • ℝ is uncountable: Cantor's theorem proves that ℝ cannot be listed as a sequence, unlike the rational numbers ℚ which are countable.
  • Common confusion: Cardinality (set size) vs. measure (length)—intervals [0, 1] and [0, 2] have the same cardinality but different "sizes" in the intuitive sense; proper measurement requires additional machinery.
  • Consequence for irrationals: Since ℚ is countable and ℝ is uncountable, there are vastly more irrational numbers than rational numbers.

📏 Interval definitions and types

📏 Bounded intervals

For real numbers a < b, intervals are defined by which endpoints are included:

Closed interval [a, b]: the set {x ∈ ℝ : a ≤ x ≤ b}
Open interval (a, b): the set {x ∈ ℝ : a < x < b}
Half-open intervals (a, b] and [a, b): the sets {x ∈ ℝ : a < x ≤ b} and {x ∈ ℝ : a ≤ x < b}

  • These are called bounded intervals because both endpoints are real numbers.
  • Every open interval (a, b) is nonempty; it always contains at least the midpoint (a + b)/2.

📏 Unbounded intervals

Intervals can extend infinitely in one or both directions:

  • [a, ∞) = {x ∈ ℝ : a ≤ x} and (a, ∞) = {x ∈ ℝ : a < x}
  • (−∞, b] = {x ∈ ℝ : x ≤ b} and (−∞, b) = {x ∈ ℝ : x < b}
  • (−∞, ∞) = ℝ itself

The excerpt notes that [a, ∞), (−∞, b], and ℝ are sometimes called unbounded closed intervals, while (a, ∞), (−∞, b), and ℝ are sometimes called unbounded open intervals.

🧩 Characterization of intervals

Proposition 1.4.1: A set I ⊂ ℝ is an interval if and only if I contains at least 2 points and for all a, c ∈ I and b ∈ ℝ such that a < b < c, we have b ∈ I.

  • In plain language: an interval is a set with at least two points that contains every point between any two of its members.
  • This captures the intuitive idea that intervals have "no gaps."
  • Note: In this book, intervals must have at least 2 points; single-point sets and the empty set are not considered intervals.

🔄 Surprising cardinality facts

🔄 All intervals have the same cardinality

The excerpt emphasizes an "unexpected fact": from a set-theoretic perspective, all intervals have the same cardinality.

Examples of bijections:

  • The map f(x) = 2x takes [0, 1] bijectively to [0, 2].
  • The map f(x) = tan(x) is a bijection from (−π/2, π/2) to ℝ.
  • Hence the bounded interval (−π/2, π/2) has the same cardinality as the entire real line ℝ.

Why this is surprising:

  • Intuitively, [0, 2] seems "twice as large" as [0, 1], but they have the same cardinality.
  • A bounded interval can have the same cardinality as an unbounded one.
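
Numerically, the tangent bijection and its inverse arctangent round-trip as expected (a floating-point illustration with tolerances, not a proof):

```python
import math

# tan restricted to (-pi/2, pi/2) is a bijection onto R, with inverse atan.
for x in (-1.5, -0.3, 0.0, 0.7, 1.5):                 # points of (-pi/2, pi/2)
    assert abs(math.atan(math.tan(x)) - x) < 1e-12
for y in (-1000.0, -3.0, 0.0, 42.0):                   # points of R
    assert abs(math.tan(math.atan(y)) - y) < 1e-9 * max(1.0, abs(y))
```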

🔄 Cardinality vs. measure

Don't confuse:

  • Cardinality measures how many elements are in a set (via bijections).
  • Measure (length) captures the intuitive "size" difference between [0, 1] and [0, 2].

The excerpt notes: "there does exist a way to measure the 'size' of subsets of real numbers that 'sees' the difference between [0, 1] and [0, 2]. However, its proper definition requires much more machinery than we have right now."

🔄 Constructing bijections between closed and open intervals

  • It is "not completely straightforward" to construct a bijection from [0, 1] to (0, 1), but it is possible.
  • The exercises explore explicit constructions of such bijections.

🔢 Uncountability of ℝ

🔢 The main result

Theorem 1.4.2 (Cantor): ℝ is uncountable.

  • This means there is no bijection from ℕ to ℝ; the real numbers cannot be listed as a sequence.
  • The excerpt notes that the cardinality of ℝ is the same as the cardinality of P(ℕ) (the power set of natural numbers), though this is not proved here.

🔢 Proof strategy (Cantor's 1874 version)

The proof is by contrapositive rather than contradiction.

Setup:

  • Assume X ⊂ ℝ is a countably infinite subset such that for every pair a < b, there exists x ∈ X with a < x < b (X is "dense" in ℝ).
  • If ℝ were countable, we could take X = ℝ.
  • The proof shows X must be a proper subset of ℝ, so X cannot equal ℝ, and thus ℝ is uncountable.

Construction:

  • Write X as a sequence x₁, x₂, x₃, ... (since X is countable).
  • Inductively construct two sequences a₁, a₂, a₃, ... and b₁, b₂, b₃, ... such that:
    • a₁ < a₂ < ... and ... < b₂ < b₁
    • Each interval (aₙ, bₙ) excludes x₁, x₂, ..., xₙ
  • Define y = sup{aₙ : n ∈ ℕ}.

Key steps:

  1. Show aₙ < bₘ for all n, m ∈ ℕ.
  2. The number y cannot be in the set A = {aₙ} (if y = aₙ, then y < aₙ₊₁, contradiction).
  3. Similarly, y cannot be in B = {bₙ}.
  4. Therefore aₙ < y < bₙ for all n, so y ∈ (aₙ, bₙ) for every n.
  5. By construction, xₙ ∉ (aₙ, bₙ), so y ≠ xₙ for all n.
  6. Thus y ∉ X, proving X is a proper subset of ℝ.

🔢 Consequences for rational and irrational numbers

Before this theorem:

  • The excerpt notes that ℚ (rational numbers) is countable.
  • We know irrational numbers exist (ℝ \ ℚ is nonempty).

After this theorem:

  • Since ℚ is countable and ℝ is uncountable, the set of irrational numbers must be uncountable.
  • The excerpt states: "there are a lot more irrational numbers than rational numbers."

Example from exercises:

  • Exercise 1.4.9 asks to prove that algebraic numbers (roots of polynomials with integer coefficients) are countable, which implies that transcendental (non-algebraic) numbers must exist and be uncountable.

📐 Additional properties and exercises

📐 Intersections and unions of intervals

From Exercise 1.4.6:

  • Every closed interval [a, b] is the intersection of countably many open intervals.
  • Every open interval (a, b) is a countable union of closed intervals.
  • An intersection of any family of bounded closed intervals is either empty, a single point, or a bounded closed interval.

📐 Disjoint open intervals

From Exercise 1.4.7:

  • If S is a set of disjoint open intervals in ℝ (no two intervals overlap), then S must be countable.
  • This means you cannot have uncountably many non-overlapping open intervals in ℝ.

📐 Explicit bijection constructions

The exercises explore constructing explicit bijections:

  • From (a, b] to (0, 1]
  • From [0, 1] to (0, 1) (marked as "Hard")
  • From (0, 1] to (0, 1) (marked as "Hard")
  • Using a given bijection f : [0, 1] → (0, 1) to construct a bijection from [−1, 1] to ℝ

Hint for (0, 1] to (0, 1):

  • Map (1/2, 1] to (0, 1/2], then (1/4, 1/2] to (1/2, 3/4], and so on.
  • Write down an explicit algorithm and prove it is a bijection.
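
The hint can be written out explicitly. A sketch (the function name `g` is ours; exact arithmetic via `fractions.Fraction`) of the block-shifting map, which sends each block (2⁻ᵏ, 2⁻⁽ᵏ⁻¹⁾] onto (1 − 2⁻⁽ᵏ⁻¹⁾, 1 − 2⁻ᵏ]:

```python
from fractions import Fraction

def g(x):
    """Bijection from (0, 1] to (0, 1) following the hint: shift each
    block (2^-k, 2^-(k-1)] onto (1 - 2^-(k-1), 1 - 2^-k].  These images
    tile (0, 1), each point covered exactly once."""
    assert 0 < x <= 1
    k = 1
    while x <= Fraction(1, 2 ** k):   # locate k with 2^-k < x <= 2^-(k-1)
        k += 1
    return x + 1 - Fraction(3, 2 ** k)

# right endpoint of each block maps to the right endpoint of its image
assert g(Fraction(1)) == Fraction(1, 2)
assert g(Fraction(1, 2)) == Fraction(3, 4)
assert g(Fraction(1, 4)) == Fraction(7, 8)
```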

1.5 Decimal representation of the reals

🧭 Overview

🧠 One-sentence thesis

Every real number in (0, 1] can be uniquely represented by an infinite decimal expansion satisfying a strict inequality condition, and this representation provides a powerful tool for proving the uncountability of the reals.

📌 Key points (3–5)

  • What decimal representation means: an infinite sequence of digits 0.d₁d₂d₃... represents a real number x as the supremum of its finite truncations.
  • Uniqueness condition: requiring that every truncation Dₙ is strictly less than x ensures a unique representation (avoiding 0.4999... vs 0.5000... ambiguity).
  • Rational vs irrational: rational numbers have eventually repeating decimal digits; non-repeating decimals correspond to irrational numbers.
  • Common confusion: not every real has a finite decimal representation (like 1/3 or √2), but every real has an infinite decimal representation.
  • Cantor diagonalization: constructing a number whose nth digit differs from the nth digit of the nth number in any countable list proves (0, 1] is uncountable.

🔢 Building decimal representations

🔢 Finite decimals for integers and some rationals

  • A positive integer n can be written as a finite sum n = d_K·10^K + d_{K−1}·10^(K−1) + ⋯ + d₁·10 + d₀, where each d_j is a digit (0–9).
  • We write this as the digit string d_K d_{K−1} ⋯ d₁ d₀.
  • Some rational numbers also have finite decimal representations: x = d_K·10^K + ⋯ + d₀ + d₋₁·10^(−1) + ⋯ + d₋M·10^(−M).
  • Example: 0.25 = 2·10^(−1) + 5·10^(−2).
  • But: not every rational has a finite representation—1/3 does not, and neither does any irrational like √2.

🔢 Infinite decimals for all reals

  • To represent all real numbers in (0, 1], we allow infinitely many digits: 0.d₁d₂d₃...
  • Each digit dⱼ corresponds to position j in the natural numbers.

Truncation to n digits: Dₙ = d₁/10 + d₂/10² + d₃/10³ + ... + dₙ/10ⁿ.

  • The infinite decimal 0.d₁d₂d₃... represents the real number x = sup{Dₙ : n ∈ ℕ}.
  • In words: x is the least upper bound of all finite truncations.

🎯 Existence and uniqueness (Proposition 1.5.1)

🎯 Every digit sequence represents a unique number

  • Claim (i): every infinite sequence of digits 0.d₁d₂d₃... represents a unique x ∈ [0, 1].
  • Why: the truncations Dₙ are bounded above by 1 (using the geometric series formula for the worst case, all 9s), so their supremum exists and lies in [0, 1].
  • Inequality: Dₙ ≤ x ≤ Dₙ + 1/10ⁿ for all n.
  • The difference between any two truncations Dₘ − Dₙ (m > n) is at most 1/10ⁿ, so the sequence "converges" to x.

🎯 Every number has a representation

  • Claim (ii): for every x ∈ (0, 1], there exists an infinite decimal 0.d₁d₂d₃... that represents x.
  • Construction: start with D₀ = 0. At step n+1, use the Archimedean property to find the smallest integer j such that x − Dₙ ≤ j·10^(−(n+1)). Set dₙ₊₁ = j − 1.
  • This ensures Dₙ₊₁ < x ≤ Dₙ₊₁ + 10^(−(n+1)) for all n.
  • Uniqueness under the strict inequality: if we require Dₙ < x for all n, the representation is unique.
  • Why unique: if two representations satisfy the strict inequality, their digits must match at every position (at each step k, the digit dₖ is the largest integer j such that j < (x − Dₖ₋₁)·10ᵏ).

🎯 Non-uniqueness without the strict condition

  • If we allow Dₙ = x for some n, representations are not unique.
  • Example: 1/2 = 0.5000... = 0.4999...
  • When non-uniqueness occurs: only for numbers that can be written as m/10ⁿ for integers m, n (see Exercise 1.5.3).
  • In these cases, there are exactly two representations: one ending in infinitely many 0s, one ending in infinitely many 9s.

🔁 Rational numbers and repeating decimals

🔁 Rationals have eventually repeating digits (Proposition 1.5.3)

If x ∈ (0, 1] is rational and x = 0.d₁d₂d₃..., then the decimal digits eventually start repeating: there exist positive integers N and P such that for all n ≥ N, dₙ = dₙ₊ₚ.

  • Why: write x = p/q for positive integers p, q. Computing digits amounts to repeatedly dividing remainders by q.
  • At each step, the remainder rₙ is between 0 and q − 1.
  • There are at most q possible remainders, so the process must repeat after at most q steps.
  • The period P is at most q (in fact, at most q − 1; see Exercise 1.5.7).
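
The remainder argument doubles as an algorithm. A Python sketch (the function name is ours; note it produces the 0.5000...-style expansion for terminating decimals, not the strict-inequality 0.4999... one):

```python
def decimal_digits(p, q, n):
    """First n decimal digits of p/q with 0 < p < q, computed by the
    repeated-remainder division from the proof: at each step the digit
    is (10*r) // q and the new remainder is (10*r) % q."""
    assert 0 < p < q
    digits, r = [], p
    for _ in range(n):
        r *= 10
        digits.append(r // q)
        r %= q
    return digits

# 1/7 = 0.142857142857...; the remainders cycle, so the period is 6 <= q - 1
assert decimal_digits(1, 7, 12) == [1, 4, 2, 8, 5, 7] * 2
```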

🔁 Non-repeating decimals are irrational

  • Converse (Exercise 1.5.2): if the digits eventually repeat, x is rational.
  • Example: x = 0.101001000100001... (n zeros, then a 1, then n+1 zeros, then a 1, etc.) is irrational because the digits never start repeating—for any period P, going far enough finds a 1 followed by at least P+1 zeros.
| Type of number | Decimal behavior | Example |
| --- | --- | --- |
| Rational (finite form) | Terminates or repeats immediately | 1/2 = 0.5000..., 1/3 = 0.3333... |
| Rational (general) | Eventually repeating | 1/7 = 0.142857142857... |
| Irrational | Never repeating | √2, 0.101001000100001... |

🔀 Cantor diagonalization and uncountability

🔀 Cantor's second proof (Theorem 1.5.2)

  • Claim: the interval (0, 1] is uncountable.
  • Strategy: show that any countable subset X = {x₁, x₂, x₃, ...} of (0, 1] cannot contain all numbers in (0, 1].

🔀 The diagonalization trick

  • Write each xₙ in its unique decimal representation: xₙ = 0.dₙ₁dₙ₂dₙ₃...
  • Here dₙⱼ is the jth digit of the nth number.
  • Construct a new number y: for each position n, choose the nth digit eₙ of y to be different from the nth digit dₙₙ of xₙ.
  • Specifically: eₙ = 1 if dₙₙ ≠ 1; eₙ = 2 if dₙₙ = 1.
  • This ensures all digits of y are nonzero, so y has the unique representation satisfying Eₙ < y ≤ Eₙ + 10^(−n).
  • Why y is not in X: for every n, the nth digit of y differs from the nth digit of xₙ, so y ≠ xₙ.
  • Since X was an arbitrary countable subset, (0, 1] must be uncountable.

🔀 Visualizing the diagonal

  • Imagine listing the numbers x₁, x₂, x₃, ... in rows, with their digits in columns.
  • The "diagonal" consists of the digits d₁₁, d₂₂, d₃₃, ...
  • The constructed number y differs from this diagonal at every position.
  • Example (from Figure 1.5):
    • x₁ = 0.132..., x₂ = 0.794..., x₃ = 0.301..., x₄ = 0.8925..., x₅ = 0.16024...
    • Diagonal digits: 1, 9, 1, 5, 4, ...
    • Constructed y: 0.21211... (each digit differs from the corresponding diagonal digit).
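
The construction is short enough to code. A Python sketch (the helper name is ours; the unseen digits of the figure's numbers are padded with 0s purely for illustration):

```python
def diagonal_digits(rows):
    """Digits e_n of Cantor's y: e_n = 1 unless the diagonal digit d_nn
    equals 1, in which case e_n = 2.  Then y differs from x_n at position n."""
    return [2 if rows[n][n] == 1 else 1 for n in range(len(rows))]

# digit rows of x_1, ..., x_5 from Figure 1.5 (unseen digits padded with 0)
rows = [
    [1, 3, 2, 0, 0],
    [7, 9, 4, 0, 0],
    [3, 0, 1, 0, 0],
    [8, 9, 2, 5, 0],
    [1, 6, 0, 2, 4],
]
e = diagonal_digits(rows)
assert e == [2, 1, 2, 1, 1]                               # y = 0.21211...
assert all(e[n] != rows[n][n] for n in range(len(rows)))  # differs on the diagonal
```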

🔀 Don't confuse with the first proof

  • This is Cantor's second proof of uncountability (the first used nested intervals and is mentioned in earlier sections).
  • The diagonalization method is more widely known and applies to other contexts (e.g., proving the halting problem is undecidable in computer science).

🧮 Connections and exercises

🧮 Cardinality insights

  • Exercise 1.5.5: using binary (base 2) instead of base 10, one can show the cardinality of ℝ equals the cardinality of the power set P(ℕ).
  • Exercise 1.5.6: surprisingly, one can construct an injection from [0, 1] × [0, 1] to [0, 1] (by interleaving digits), showing the "plane" and the "line" have the same cardinality.

🧮 Other bases

  • Exercise 1.5.4: the entire theory (Proposition 1.5.1) works for any integer base b ≥ 2, not just base 10.
  • The proof structure is identical; only the specific base changes.

🧮 Explicit constructions

  • Exercise 1.5.9: using Proposition 1.5.3 (rationals have repeating decimals), one can explicitly construct an injection from ℝ to ℝ \ ℚ (the irrationals).
  • This shows there are "at least as many" irrationals as reals—in fact, the irrationals are also uncountable.

2.1 Sequences and limits

🧭 Overview

🧠 One-sentence thesis

A sequence converges to a limit if, for any desired closeness, all terms beyond some point stay within that distance of the limit, and monotone bounded sequences always converge to their supremum or infimum.

📌 Key points (3–5)

  • What a sequence is: a function from natural numbers to real numbers, written as {xₙ}∞ₙ₌₁, where order and repetition matter (unlike sets).
  • What convergence means: for every epsilon > 0, there exists an M such that all terms xₙ with n ≥ M satisfy |xₙ − x| < epsilon; the limit x is unique.
  • Monotone convergence theorem: a monotone sequence converges if and only if it is bounded; the limit equals the supremum (if increasing) or infimum (if decreasing).
  • Common confusion: a bounded sequence need not converge (e.g., {(−1)ⁿ}), but every convergent sequence must be bounded; also, convergent subsequences do not guarantee the whole sequence converges.
  • Tail behavior: convergence depends only on the tail of the sequence—the first finitely many terms can be arbitrary.

📐 Sequences: definition and boundedness

📐 What a sequence is

Sequence (of real numbers): a function x : ℕ → ℝ.

  • Instead of x(n), we write xₙ for the nth element.
  • Notation: {xₙ}∞ₙ₌₁ or {xₙ} for short.
  • Sequences vs sets: The sequence {(−1)ⁿ}∞ₙ₌₁ is −1, 1, −1, 1, −1, 1, ... (order and repetition matter), but the range (set of values) is just {−1, 1}.
  • Example: {1/n}∞ₙ₌₁ is the sequence 1, 1/2, 1/3, 1/4, 1/5, ...
  • Example: A constant sequence {c}∞ₙ₌₁ = c, c, c, c, ... repeats the same value c indefinitely.

📏 Bounded sequences

Bounded sequence: a sequence {xₙ}∞ₙ₌₁ is bounded if there exists B ∈ ℝ such that |xₙ| ≤ B for all n ∈ ℕ.

  • Equivalently, the set {xₙ : n ∈ ℕ} is bounded.
  • Similarly defined: bounded below and bounded above.
  • Example: {1/n}∞ₙ₌₁ is bounded (B = 1 works).
  • Example: {n}∞ₙ₌₁ = 1, 2, 3, 4, ... is not bounded (grows without bound).

🎯 Convergence and limits

🎯 Definition of convergence

Convergence: A sequence {xₙ}∞ₙ₌₁ converges to x ∈ ℝ if for every epsilon > 0, there exists M ∈ ℕ such that |xₙ − x| < epsilon for all n ≥ M.

  • The number x is called the limit of the sequence.
  • Since the limit, when it exists, is unique (proved below), we may write lim_{n→∞} xₙ = x.
  • A sequence that converges is convergent; otherwise it diverges or is divergent.
  • Key insight: M can depend on epsilon; we pick M after we know epsilon.
  • Intuition: Eventually, every term gets arbitrarily close to x (we don't need to ever reach x exactly).

🔍 How to read the definition

  • "For every epsilon > 0" means: no matter how small the desired closeness.
  • "There exists M ∈ ℕ" means: we can find a point in the sequence beyond which...
  • "For all n ≥ M" means: ...all subsequent terms stay within epsilon of x.
  • The definition does not require any xₙ to equal x; it only requires getting arbitrarily close.

📊 Examples of convergence and divergence

Example (constant sequence): The sequence 1, 1, 1, 1, ... converges to 1.

  • For every epsilon > 0, pick M = 1. Then |xₙ − 1| = |1 − 1| = 0 < epsilon for all n.

Example (1/n): The sequence {1/n}∞ₙ₌₁ converges to 0.

  • Given epsilon > 0, find M ∈ ℕ such that 1/M < epsilon (by the Archimedean property).
  • For all n ≥ M: |xₙ − 0| = 1/n ≤ 1/M < epsilon.

Example (oscillating sequence): The sequence {(−1)ⁿ}∞ₙ₌₁ = −1, 1, −1, 1, ... diverges.

  • Suppose it converged to some x. Take epsilon = 1/2.
  • For even n ≥ M: |1 − x| < 1/2. For odd n ≥ M: |−1 − x| < 1/2.
  • But then 2 = |1 − x − (−1 − x)| ≤ |1 − x| + |−1 − x| < 1/2 + 1/2 = 1, a contradiction.

Example (rational expression): The sequence {(n² + 1)/(n² + n)}∞ₙ₌₁ converges to 1.

  • Given epsilon > 0, find M such that 1/M < epsilon.
  • For n ≥ M: |(n² + 1)/(n² + n) − 1| = |(n² + 1 − n² − n)/(n² + n)| = |(1 − n)/(n² + n)| = (n − 1)/(n² + n) ≤ n/(n² + n) = 1/(n + 1) ≤ 1/n ≤ 1/M < epsilon.
  • This example shows: sometimes you must simplify and throw away information to get a clean estimate.
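
The estimate gives an explicit M for each epsilon, which we can check numerically (a Python illustration on finitely many terms, not a proof of convergence):

```python
import math

x = lambda n: (n ** 2 + 1) / (n ** 2 + n)

def M_for(eps):
    """An M that works for this sequence: any M in N with 1/M < eps,
    found via the Archimedean property."""
    return math.floor(1 / eps) + 1

for eps in (0.1, 1e-3, 1e-6):
    M = M_for(eps)
    # spot-check the definition on a stretch of terms past M
    assert all(abs(x(n) - 1) < eps for n in range(M, M + 1000))
```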

🔒 Uniqueness and boundedness

Proposition (Uniqueness of limits): A convergent sequence has a unique limit.

  • Proof technique: Show |y − x| < epsilon for all epsilon > 0, forcing y = x.
  • Given two limits x and y, pick epsilon > 0 arbitrarily.
  • Find M₁ such that |xₙ − x| < epsilon/2 for n ≥ M₁.
  • Find M₂ such that |xₙ − y| < epsilon/2 for n ≥ M₂.
  • Take n ≥ max(M₁, M₂): |y − x| = |xₙ − x − (xₙ − y)| ≤ |xₙ − x| + |xₙ − y| < epsilon/2 + epsilon/2 = epsilon.
  • Since |y − x| < epsilon for all epsilon > 0, we have y = x.

Proposition (Convergent implies bounded): A convergent sequence is bounded.

  • Proof: Suppose {xₙ} converges to x. Pick epsilon = 1.
  • There exists M such that |xₙ − x| < 1 for all n ≥ M.
  • For n ≥ M: |xₙ| = |xₙ − x + x| ≤ |xₙ − x| + |x| < 1 + |x|.
  • Let B = max{|x₁|, |x₂|, ..., |x_{M−1}|, 1 + |x|}. Then |xₙ| ≤ B for all n.

Don't confuse: The converse is false—bounded does not imply convergent. Example: {(−1)ⁿ} is bounded but diverges.

📈 Monotone sequences

📈 Definitions

Monotone increasing: xₙ ≤ xₙ₊₁ for all n ∈ ℕ.

Monotone decreasing: xₙ ≥ xₙ₊₁ for all n ∈ ℕ.

  • A sequence is monotone if it is either monotone increasing or monotone decreasing.
  • Example: {n}∞ₙ₌₁ is monotone increasing.
  • Example: {1/n}∞ₙ₌₁ is monotone decreasing.
  • Example: {1}∞ₙ₌₁ (constant) is both monotone increasing and decreasing.
  • Example: {(−1)ⁿ}∞ₙ₌₁ is not monotone.

🏆 Monotone convergence theorem

Theorem (Monotone convergence theorem): A monotone sequence is bounded if and only if it is convergent.

Furthermore:

  • If {xₙ} is monotone increasing and bounded, then lim_{n→∞} xₙ = sup{xₙ : n ∈ ℕ}.
  • If {xₙ} is monotone decreasing and bounded, then lim_{n→∞} xₙ = inf{xₙ : n ∈ ℕ}.

Proof (increasing case):

  • Suppose {xₙ} is monotone increasing and bounded.
  • Let x = sup{xₙ : n ∈ ℕ}.
  • Given epsilon > 0, since x is the supremum, there exists M ∈ ℕ such that x_M > x − epsilon.
  • Since {xₙ} is monotone increasing, xₙ ≥ x_M for all n ≥ M (by induction).
  • For all n ≥ M: |xₙ − x| = x − xₙ ≤ x − x_M < epsilon.
  • Therefore {xₙ} converges to x.
  • The other direction (convergent implies bounded) was already proved.

Practical note:

  • A monotone increasing sequence is always bounded below (by x₁), so we only need to check if it's bounded above.
  • A monotone decreasing sequence is always bounded above (by x₁), so we only need to check if it's bounded below.

🧮 Example: {1/√n}

  • The sequence {1/√n}∞ₙ₌₁ is bounded below: 1/√n > 0 for all n.
  • It is monotone decreasing: √(n+1) ≥ √n implies 1/√(n+1) ≤ 1/√n.
  • By the monotone convergence theorem, it converges and lim_{n→∞} 1/√n = inf{1/√n : n ∈ ℕ}.
  • The infimum is ≥ 0. Suppose b ≥ 0 and b ≤ 1/√n for all n.
  • Squaring: b² ≤ 1/n for all n ∈ ℕ.
  • By the Archimedean property, this implies b² ≤ 0, so b² = 0 and b = 0.
  • Therefore lim_{n→∞} 1/√n = 0.
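
A numeric illustration of the theorem on this example (finitely many terms only, so a sanity check rather than a proof):

```python
import math

xs = [1 / math.sqrt(n) for n in range(1, 10001)]
assert all(a >= b for a, b in zip(xs, xs[1:]))   # monotone decreasing
assert all(t > 0 for t in xs)                    # bounded below by 0
assert xs[-1] == 0.01                            # 1/sqrt(10000): already near inf = 0
```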

⚠️ Caution about boundedness

Example (harmonic sequence): The sequence {1 + 1/2 + ... + 1/n}∞ₙ₌₁ is monotone increasing.

  • It grows slowly and slower as n increases.
  • However, this sequence has no upper bound and does not converge (will be shown later with series).
  • Lesson: Showing a monotone sequence is bounded may be difficult; slow growth does not guarantee boundedness.
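
The unboundedness comes from the classical grouping argument, which gives H(2ᵏ) ≥ 1 + k/2 for the partial sums H(n) = 1 + 1/2 + ⋯ + 1/n; a Python spot check of that lower bound:

```python
def H(n):
    """Partial sum 1 + 1/2 + ... + 1/n of the harmonic sequence."""
    return sum(1 / k for k in range(1, n + 1))

# Slow growth, yet unbounded: grouping terms in blocks of length 2^j,
# each block sums to at least 1/2, so H(2**k) >= 1 + k/2.
for k in (1, 4, 8, 12):
    assert H(2 ** k) >= 1 + k / 2
```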

🔗 Supremum and infimum via sequences

Proposition: Let S ⊂ ℝ be a nonempty bounded set. Then there exist monotone sequences {xₙ}∞ₙ₌₁ and {yₙ}∞ₙ₌₁ such that xₙ, yₙ ∈ S and

  • sup S = lim_{n→∞} xₙ
  • inf S = lim_{n→∞} yₙ

🔄 Tail behavior and subsequences

🔄 Tail of a sequence

K-tail: For a sequence {xₙ}∞ₙ₌₁, the K-tail is the sequence starting at K+1, written as {xₙ₊ₖ}∞ₙ₌₁ or {xₙ}∞ₙ₌ₖ₊₁.

  • Example: The 4-tail of {1/n}∞ₙ₌₁ is 1/5, 1/6, 1/7, 1/8, ...
  • The 0-tail is the sequence itself.

Proposition (Convergence depends only on the tail): The following are equivalent:

  1. The sequence {xₙ}∞ₙ₌₁ converges.
  2. The K-tail {xₙ₊ₖ}∞ₙ₌₁ converges for all K ∈ ℕ.
  3. The K-tail {xₙ₊ₖ}∞ₙ₌₁ converges for some K ∈ ℕ.

Furthermore, if any limit exists, then lim_{n→∞} xₙ = lim_{n→∞} xₙ₊ₖ for all K.

Why this matters:

  • The limit does not care about the beginning of the sequence, only the tail.
  • The first finitely many terms may be arbitrary.
  • Example: The sequence xₙ = n/(n² + 16) is not monotone at the start, but the 3-tail is monotone decreasing. Since the 3-tail is monotone and bounded below by zero, it converges, so the whole sequence converges.
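
The tail example can be checked directly; a short Python illustration (finitely many terms, so a sanity check only):

```python
# x_n = n/(n^2 + 16): not monotone at the start, but the 3-tail
# x_4, x_5, x_6, ... is monotone decreasing (a numerical check).
def x(n):
    return n / (n**2 + 16)

assert x(1) < x(2) < x(3) < x(4)           # rises up to its peak at n = 4
tail = [x(n) for n in range(4, 1000)]
assert all(tail[i + 1] < tail[i] for i in range(len(tail) - 1))
```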

🎣 Subsequences

Subsequence: Let {xₙ}∞ₙ₌₁ be a sequence. Let {nᵢ}∞ᵢ₌₁ be a strictly increasing sequence of natural numbers (n₁ < n₂ < n₃ < ...). The sequence {xₙᵢ}∞ᵢ₌₁ is called a subsequence of {xₙ}∞ₙ₌₁.

  • The subsequence is xₙ₁, xₙ₂, xₙ₃, ...
  • Example: {1/(3i)}∞ᵢ₌₁ = 1/3, 1/6, 1/9, 1/12, ... is a subsequence of {1/n}∞ₙ₌₁ (take nᵢ = 3i).
  • Order must be preserved: 1, 1/3, 1/2, 1/5, ... is not a subsequence of {1/n}.
  • Terms must come from the original sequence: 1, 0, 1/3, 0, 1/5, ... is not a subsequence of {1/n}.
  • A tail is a special type of subsequence.

🔗 Convergence of subsequences

Proposition: If {xₙ}∞ₙ₌₁ is convergent, then every subsequence {xₙᵢ}∞ᵢ₌₁ is also convergent, and lim_{n→∞} xₙ = lim_{i→∞} xₙᵢ.

Proof:

  • Suppose lim_{n→∞} xₙ = x. Given epsilon > 0, find M such that |xₙ − x| < epsilon for all n ≥ M.
  • By induction, nᵢ ≥ i for all i ∈ ℕ.
  • So i ≥ M implies nᵢ ≥ M, hence |xₙᵢ − x| < epsilon for all i ≥ M.

Don't confuse: The converse is false—a convergent subsequence does not imply the whole sequence converges.

Example (oscillating sequence): The sequence 0, 1, 0, 1, 0, 1, ... diverges.

  • The subsequence {x₂ᵢ}∞ᵢ₌₁ = 1, 1, 1, ... converges to 1.
  • The subsequence {x₂ᵢ₊₁}∞ᵢ₌₁ = 0, 0, 0, ... converges to 0.
  • But the original sequence diverges.

| Concept | Implication | Converse |
| --- | --- | --- |
| Convergent sequence | All subsequences converge to the same limit | False: convergent subsequence ≠ convergent sequence |
| Convergent sequence | Sequence is bounded | False: bounded ≠ convergent |
| Monotone + bounded | Convergent | True (monotone convergence theorem) |

2.2 Facts about limits of sequences

🧭 Overview

🧠 One-sentence thesis

Limits of sequences interact predictably with inequalities and algebraic operations, allowing us to compute limits of complex sequences by combining simpler ones and to test convergence using ratios or recursive definitions.

📌 Key points (3–5)

  • Squeeze lemma: if a sequence is trapped between two sequences converging to the same limit, it must converge to that limit too.
  • Algebraic operations preserve limits: limits can be taken past addition, subtraction, multiplication, division (when denominators are nonzero), and roots.
  • Strict vs non-strict inequalities: strict inequalities (less-than) may become non-strict (less-than-or-equal) when limits are applied—a common source of errors.
  • Ratio test: if the ratio of consecutive terms converges to L < 1, the sequence converges to zero; if L > 1, the sequence is unbounded.
  • Recursively defined sequences: convergence can be proven by showing monotonicity and boundedness, then solving for the limit algebraically.

🔗 Limits and inequalities

🗜️ Squeeze lemma

Squeeze lemma: Let {aₙ}, {bₙ}, and {xₙ} be sequences such that aₙ ≤ xₙ ≤ bₙ for all n. If {aₙ} and {bₙ} converge to the same limit, then {xₙ} converges to that same limit.

  • The idea: if xₙ is "squeezed" between two sequences that both approach the same value x, then xₙ must also approach x.
  • The proof uses epsilon-delta: once aₙ and bₙ are both within epsilon of x, then xₙ (trapped between them) must also be within epsilon of x.
  • Example: To show that 1/(n√n) converges to 0, note that 0 ≤ 1/(n√n) ≤ 1/n, and both outer sequences converge to 0.

Don't confuse: The squeeze lemma requires the two bounding sequences to converge to the same limit. If they converge to different limits, the lemma says nothing.
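
A tiny numerical illustration of the squeeze (assuming nothing beyond the example above; checking finitely many n is not a proof):

```python
import math

# 0 <= 1/(n*sqrt(n)) <= 1/n for every n, and both bounds tend to 0,
# so the middle sequence is forced to 0 as well.
for n in (10, 100, 1000, 10_000):
    a_n, x_n, b_n = 0.0, 1 / (n * math.sqrt(n)), 1 / n
    assert a_n <= x_n <= b_n
```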

⚖️ Inequalities and limits

Lemma 2.2.3: If {xₙ} and {yₙ} are convergent sequences with xₙ ≤ yₙ for all n, then lim xₙ ≤ lim yₙ.

  • Non-strict inequalities (≤) are preserved by limits.
  • Critical warning: Strict inequalities (<) may become non-strict (≤) when limits are applied.
  • Example showing the issue: Let xₙ = -1/n and yₙ = 1/n. Then xₙ < 0 < yₙ for all n, but lim xₙ = lim yₙ = 0 (the strict inequalities become equalities).

📦 Corollary for bounded sequences

Corollary 2.2.4 gives two useful facts:

  • If xₙ ≥ 0 for all n and {xₙ} converges, then lim xₙ ≥ 0.
  • If a ≤ xₙ ≤ b for all n and {xₙ} converges, then a ≤ lim xₙ ≤ b.

These follow by applying Lemma 2.2.3 with constant sequences.

➕ Algebraic operations and limits

➕ Addition and subtraction

Proposition 2.2.5 (i, ii): If {xₙ} and {yₙ} converge, then {xₙ + yₙ} and {xₙ - yₙ} converge, and the limit of the sum/difference equals the sum/difference of the limits.

  • In symbols: lim(xₙ + yₙ) = lim xₙ + lim yₙ and lim(xₙ - yₙ) = lim xₙ - lim yₙ.
  • The proof uses the triangle inequality and splits epsilon into two parts (epsilon/2 for each sequence).

✖️ Multiplication

Proposition 2.2.5 (iii): If {xₙ} and {yₙ} converge, then {xₙyₙ} converges and lim(xₙyₙ) = (lim xₙ)(lim yₙ).

  • The proof is more delicate than addition because products can grow large.
  • Strategy: write xₙyₙ - xy = (xₙ - x)y + x(yₙ - y) + (xₙ - x)(yₙ - y) and bound each term.

➗ Division

Proposition 2.2.5 (iv): If {xₙ} and {yₙ} converge, lim yₙ ≠ 0, and yₙ ≠ 0 for all n, then {xₙ/yₙ} converges and lim(xₙ/yₙ) = (lim xₙ)/(lim yₙ).

  • The proof first shows that {1/yₙ} converges to 1/(lim yₙ), then applies the multiplication rule.
  • Key step: if lim yₙ = y ≠ 0, then eventually |yₙ| > |y|/2, so 1/|yₙ| < 2/|y|.

🔢 Powers and roots

  • Powers: lim(xₙᵏ) = (lim xₙ)ᵏ for any natural number k (proved by induction using the multiplication rule).
  • Square roots: If xₙ ≥ 0 for all n and {xₙ} converges, then lim(√xₙ) = √(lim xₙ).
  • The proof for square roots uses the identity √xₙ - √x = (xₙ - x)/(√xₙ + √x).

📏 Absolute value

Proposition 2.2.7: If {xₙ} converges, then {|xₙ|} converges and lim|xₙ| = |lim xₙ|.

  • Uses the reverse triangle inequality: ||xₙ| - |x|| ≤ |xₙ - x|.
  • Warning: The converse is false—{|xₙ|} can converge while {xₙ} diverges (e.g., xₙ = (-1)ⁿ).

⚠️ Common mistake

You must verify that all sequences involved are convergent before applying these rules.

Example of what goes wrong: The sequence {n²/(n+1) - n} converges to -1, but you cannot write this as lim(n²/(n+1)) - lim(n) because neither individual sequence converges.
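
Numerically, the combined sequence is perfectly tame even though the pieces diverge; a quick Python check of the example:

```python
# n^2/(n+1) - n simplifies to -n/(n+1), which tends to -1, even though
# neither n^2/(n+1) nor n converges on its own (illustration only).
def term(n):
    return n**2 / (n + 1) - n

assert abs(term(10**6) + 1) < 1e-5
```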

🔄 Recursively defined sequences

🔄 What are recursive sequences

A recursively defined sequence computes each term from previous terms using a formula.

Example: x₁ = 2 and xₙ₊₁ = xₙ - (xₙ² - 2)/(2xₙ).

🛠️ Strategy for analyzing recursive sequences

Step 1: Prove the sequence is well-defined (e.g., never divides by zero).

Step 2: Prove the sequence is bounded and/or monotone.

  • Often done by induction.
  • Example: Show xₙ > 0 for all n by induction.

Step 3: Apply the monotone convergence theorem to conclude the limit exists.

Step 4: Find the limit by taking limits on both sides of the recursive formula.

  • If xₙ₊₁ = f(xₙ) and lim xₙ = x, then x = f(x).
  • Solve this equation for x.

📐 Newton's method example

The sequence x₁ = 2, xₙ₊₁ = xₙ - (xₙ² - 2)/(2xₙ) converges to √2.

Verification:

  • Rewrite as xₙ₊₁ = (xₙ² + 2)/(2xₙ).
  • Show by induction that xₙ > 0 for all n.
  • Show xₙ² - 2 ≥ 0 for all n ≥ 1, which implies the sequence is decreasing.
  • Since it's decreasing and bounded below by 0, it converges.
  • Taking limits: 2x² = x² + 2, so x² = 2, hence x = √2 (since x ≥ 0).
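
The iteration itself is easy to run; a minimal Python sketch of the Newton recursion above (numerics illustrate the convergence the proof guarantees):

```python
import math

# Newton iteration x_{n+1} = (x_n^2 + 2) / (2 x_n), starting at x_1 = 2.
# The proof shows the sequence decreases to sqrt(2); here we watch it do so.
x = 2.0
for _ in range(6):
    x = (x * x + 2) / (2 * x)

assert abs(x - math.sqrt(2)) < 1e-12
```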

⚠️ Warning about recursive sequences

Don't assume convergence without proof!

Example: If x₁ = 1 and xₙ₊₁ = xₙ² + xₙ, blindly assuming convergence and solving x = x² + x gives x = 0. But the sequence is actually unbounded and diverges.

The same recursive formula can behave differently depending on the initial value—this is studied in the field of dynamics.

🧪 Convergence tests

🧪 Basic convergence test

Proposition 2.2.10: Suppose there exists x ∈ ℝ and a sequence {aₙ} converging to 0 such that |xₙ - x| ≤ aₙ for all n. Then {xₙ} converges to x.

  • This reduces the problem of showing xₙ → x to showing another sequence aₙ → 0.
  • The idea: studying convergence to x is equivalent to studying convergence to 0 (by shifting).

📊 Geometric sequences

Proposition 2.2.11: For c > 0:

  • If c < 1, then cⁿ → 0.
  • If c > 1, then {cⁿ} is unbounded.
  • When c < 1: {cⁿ} is decreasing and bounded below by 0, so it converges. Taking limits in cⁿ⁺¹ = c·cⁿ gives x = cx, so x = 0.
  • When c > 1: Use the fact that 1/c < 1, so (1/c)ⁿ → 0, which means cⁿ → ∞.

📈 Ratio test for sequences

Lemma 2.2.12 (Ratio test): Let {xₙ} be a sequence with xₙ ≠ 0 for all n. Suppose L = lim(|xₙ₊₁|/|xₙ|) exists.

  • If L < 1, then xₙ → 0.
  • If L > 1, then {xₙ} is unbounded.

How it works:

  • If L < 1, pick r with L < r < 1. Eventually |xₙ₊₁|/|xₙ| < r, so the sequence decreases faster than the geometric sequence {rⁿ}.
  • If L > 1, pick r with 1 < r < L. Eventually |xₙ₊₁|/|xₙ| > r, so the sequence grows faster than {rⁿ}.

When L = 1: The test is inconclusive. Different sequences with L = 1 can converge, diverge, or be unbounded.

🔬 Examples using ratio test

Example 1: Show that 2ⁿ/n! → 0.

  • Compute the ratio: (2ⁿ⁺¹/(n+1)!)/(2ⁿ/n!) = 2/(n+1).
  • This ratio converges to 0 < 1, so the sequence converges to 0.

Example 2: Show that n^(1/n) → 1.

  • Consider n/(1+ε)ⁿ for any ε > 0.
  • The ratio of consecutive terms is ((n+1)/n)·(1/(1+ε)) → 1/(1+ε) < 1.
  • So n/(1+ε)ⁿ → 0, which means n < (1+ε)ⁿ eventually, giving n^(1/n) < 1+ε.
  • Since ε was arbitrary, n^(1/n) → 1.
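
Both examples can be probed numerically; a short Python sketch (finite evidence only, not a substitute for the ratio-test argument):

```python
import math

# 2^n/n! has consecutive ratios 2/(n+1) -> 0 < 1, so its terms collapse
# to 0 far faster than any geometric sequence; and n^(1/n) -> 1.
def a(n):
    return 2**n / math.factorial(n)

assert a(50) < 1e-40                            # already astronomically small
assert abs(100_000 ** (1 / 100_000) - 1) < 1e-3
```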

🎯 Key takeaways

🎯 Practical workflow

When computing limits of complex sequences:

  1. Check if the sequence is built from simpler convergent sequences using algebraic operations.
  2. If so, apply Proposition 2.2.5 to compute the limit step-by-step.
  3. For recursive sequences, prove convergence first (monotonicity + boundedness), then solve for the limit.
  4. For sequences involving factorials or exponentials, try the ratio test.

🎯 Common pitfalls to avoid

| Mistake | Why it's wrong | What to do instead |
| --- | --- | --- |
| Assuming strict inequalities are preserved | xₙ < yₙ only implies lim xₙ ≤ lim yₙ | Accept that strict may become non-strict |
| Applying limit rules to divergent sequences | The rules require convergence | Verify convergence first |
| Assuming recursive sequences converge | Some diverge or are unbounded | Prove monotonicity and boundedness |
| Concluding anything when L = 1 in ratio test | L = 1 is inconclusive | Use a different method |

2.3 Limit superior, limit inferior, and Bolzano–Weierstrass

🧭 Overview

🧠 One-sentence thesis

Bounded sequences, even when divergent, always possess limit superior and limit inferior that capture their long-term behavior and guarantee the existence of convergent subsequences.

📌 Key points (3–5)

  • Why limsup and liminf exist: For any bounded sequence, we can construct monotone sequences from suprema and infima of "tail sets" that always converge.
  • When a sequence converges: A bounded sequence converges if and only if its limit inferior equals its limit superior.
  • Bolzano–Weierstrass guarantee: Every bounded sequence contains at least one convergent subsequence.
  • Common confusion: The sequences {aₙ} and {bₙ} used to define limsup and liminf are not subsequences of {xₙ}; they are constructed from suprema and infima of tail sets.
  • Extension to unbounded sequences: Limsup and liminf can take values ∞ or −∞, allowing us to describe the behavior of any sequence.

🔨 Constructing limsup and liminf

🔨 The tail-set construction

For a bounded sequence {xₙ}, define aₙ = sup{xₖ : k ≥ n} and bₙ = inf{xₖ : k ≥ n}.

  • What these represent: aₙ is the supremum of all terms from position n onward; bₙ is the infimum of all terms from position n onward.
  • Why this works: As n increases, the set {xₖ : k ≥ n} shrinks (it's a subset of the previous set), so aₙ can only decrease or stay the same, and bₙ can only increase or stay the same.
  • The key insight: {aₙ} is bounded and monotone decreasing; {bₙ} is bounded and monotone increasing. By the monotone convergence theorem, both sequences converge.

Example: For the sequence xₙ = (n+1)/n if n is odd, 0 if n is even:

  • The infimum of any tail set is always 0 (since even terms keep appearing), so lim inf = 0.
  • The supremum of the tail starting at n approaches 1 (the odd terms approach 1), so lim sup = 1.
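
The tail-set construction can be imitated with long finite tails; a Python sketch of the example above (a truncated tail stands in for the infinite one, so this only illustrates):

```python
# Tail sets for x_n = (n+1)/n (n odd), 0 (n even), truncated at N.
def x(n):
    return (n + 1) / n if n % 2 == 1 else 0.0

N = 10_000

def a(n):  # sup of the truncated tail {x_k : n <= k < N}
    return max(x(k) for k in range(n, N))

def b(n):  # inf of the truncated tail
    return min(x(k) for k in range(n, N))

assert abs(a(1000) - 1) < 0.01   # a_n decreases toward lim sup = 1
assert b(1000) == 0.0            # b_n is identically 0, so lim inf = 0
```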

🔨 Formal definitions

lim sup(xₙ) = lim(aₙ) as n→∞, and lim inf(xₙ) = lim(bₙ) as n→∞.

  • For bounded sequences, these limits always exist (finite values).
  • Always: lim inf(xₙ) ≤ lim sup(xₙ).
  • Alternative characterization: lim sup(xₙ) = inf{aₙ : n ∈ ℕ} and lim inf(xₙ) = sup{bₙ : n ∈ ℕ}.

🔨 Don't confuse with subsequences

  • The sequences {aₙ} and {bₙ} are not made up of terms from {xₙ}.
  • Example: For {1/n}, we have bₙ = 0 for all n, but 0 never appears in the original sequence.
  • The values aₙ and bₙ are bounds, not actual sequence terms.

🔗 Connection to convergence

🔗 The convergence criterion

A bounded sequence {xₙ} converges if and only if lim inf(xₙ) = lim sup(xₙ).

  • Why this makes sense: If the "largest subsequential limit" equals the "smallest subsequential limit," all subsequential limits must be the same value.
  • When they're equal: lim(xₙ) = lim inf(xₙ) = lim sup(xₙ).
  • Proof idea: Use the squeeze lemma—since bₙ ≤ xₙ ≤ aₙ for all n, if bₙ and aₙ converge to the same limit, so must xₙ.

Example: The sequence (−1)ⁿ has lim inf = −1 and lim sup = 1. Since these differ, the sequence diverges.

🔗 Subsequences and their limits

There exists a subsequence {xₙₖ} converging to lim sup(xₙ), and another subsequence {xₘₖ} converging to lim inf(xₙ).

  • How to construct them: Inductively pick terms that are arbitrarily close to the suprema aₙ.
  • What this means: Limsup and liminf are the largest and smallest subsequential limits.
  • For any convergent subsequence: lim inf(xₙ) ≤ lim(xₙₖ) ≤ lim sup(xₙ).

🔗 Testing convergence without knowing the limit

  • If every convergent subsequence converges to the same value x, then the whole sequence converges to x.
  • This provides a way to verify convergence without explicitly computing the limit.

🎯 Bolzano–Weierstrass theorem

🎯 The main statement

Every bounded sequence of real numbers contains a convergent subsequence.

  • Why it's important: Even if a sequence doesn't converge, we can always extract a convergent piece.
  • Named after: Bernhard Bolzano (1781–1848) and Karl Weierstrass (1815–1897).

🎯 First proof (using limsup)

  • Since the sequence is bounded, lim sup(xₙ) exists and is finite.
  • By the theorem connecting limsup to subsequences, there exists a subsequence converging to lim sup(xₙ).
  • Done—we've found a convergent subsequence.

🎯 Second proof (bisection method)

  • Start with an interval [a₁, b₁] containing all terms.
  • At each step, split the interval in half and choose the half containing infinitely many terms.
  • This creates nested intervals with lengths shrinking to zero.
  • The sequences {aᵢ} (left endpoints, increasing) and {bᵢ} (right endpoints, decreasing) converge to the same limit.
  • Pick one term from each interval to form a subsequence; by the squeeze lemma, it converges.

Don't confuse: The bisection proof doesn't require knowing about limsup; it's a direct construction that works in more general settings (like ℝⁿ).
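
A finite caricature of the bisection argument in Python (counting a majority of the first N terms stands in for "infinitely many" — an illustration of the mechanism, not the proof):

```python
# Repeatedly halve an interval, keeping a half that contains many terms;
# the nested intervals home in on a subsequential limit.
def x(n):
    return (-1) ** n + 1 / n    # bounded but divergent sequence

N = 100_000
terms = [x(n) for n in range(1, N + 1)]
lo, hi = -2.0, 2.0
for _ in range(15):
    mid = (lo + hi) / 2
    left = sum(1 for t in terms if lo <= t <= mid)
    right = sum(1 for t in terms if mid < t <= hi)
    if left >= right:
        hi = mid
    else:
        lo = mid

# The shrinking interval pins down the subsequential limit -1.
assert abs(lo + 1) < 1e-6 and hi - lo < 1e-3
```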

🎯 Alternative approach

  • Claim: Every sequence has a monotone subsequence.
  • Proof idea: Define a "peak" as a position n where xₙ ≥ xₘ for all m ≥ n. Either there are infinitely many peaks (giving a decreasing subsequence) or finitely many (allowing construction of an increasing subsequence).
  • Conclusion: Combine with the monotone convergence theorem to get Bolzano–Weierstrass.

🌐 Extension to unbounded sequences

🌐 Allowing infinite values

For unbounded sequences, define aₙ = sup{xₖ : k ≥ n} and bₙ = inf{xₖ : k ≥ n} as extended real numbers (allowing ∞ and −∞).

  • New definitions: lim sup(xₙ) = inf{aₙ : n ∈ ℕ} and lim inf(xₙ) = sup{bₙ : n ∈ ℕ}.
  • When they're infinite: If aₙ = ∞ for all n, then lim sup = ∞. If bₙ = −∞ for all n, then lim inf = −∞.

🌐 Divergence to infinity

A sequence diverges to infinity if for every K ∈ ℝ, there exists M such that xₙ > K for all n ≥ M.

  • Notation: Write lim(xₙ) = ∞.
  • Similarly for −∞: For every K, eventually xₙ < K.
  • For monotone unbounded sequences: An increasing unbounded sequence diverges to ∞; a decreasing unbounded sequence diverges to −∞.

Example: For xₙ = 0 if n is odd, xₙ = n if n is even:

  • aₙ = ∞ for all n (even terms grow without bound).
  • bₙ = 0 for all n (odd terms are always 0).
  • So lim sup = ∞, lim inf = 0, and the sequence does not converge.

🌐 Behavior with infinite limits

| Condition | Conclusion |
| --- | --- |
| lim inf = lim sup = ∞ | Sequence diverges to ∞ |
| lim inf = lim sup = −∞ | Sequence diverges to −∞ |
| lim inf = −∞, lim sup = ∞ | Sequence oscillates wildly; no limit exists |
| lim inf and lim sup both finite | Sequence is bounded |

Don't confuse: "Diverges to infinity" is different from "diverges"—the former has a specific directional behavior, while the latter just means "doesn't converge."

📐 Properties and comparisons

📐 Behavior with subsequences

For any subsequence {xₙₖ}: lim inf(xₙ) ≤ lim inf(xₙₖ) ≤ lim sup(xₙₖ) ≤ lim sup(xₙ).

  • Why this holds: The subsequence uses a subset of the tail sets, so its suprema are no larger and its infima are no smaller.
  • Implication: Limsup and liminf are the extreme (largest and smallest) subsequential limits.

📐 Comparison between sequences

If xₙ ≤ yₙ for all n, then:

  • lim sup(xₙ) ≤ lim sup(yₙ)
  • lim inf(xₙ) ≤ lim inf(yₙ)

Proof idea: The supremum of a subset is at most the supremum of the larger set.

📐 Addition behavior (subtle!)

  • Lower bound: lim inf(xₙ) + lim inf(yₙ) ≤ lim inf(xₙ + yₙ).
  • Upper bound: lim sup(xₙ + yₙ) ≤ lim sup(xₙ) + lim sup(yₙ).
  • Inequalities can be strict: It's possible to construct examples where the inequalities are not equalities.
  • Why it's tricky: The positions where xₙ and yₙ achieve their extreme values may not align.

Example showing strict inequality: Take xₙ and yₙ that oscillate out of phase—when one is large, the other is small, so their sum is more stable than either individually.


2.4 Cauchy sequences

🧭 Overview

🧠 One-sentence thesis

A sequence is Cauchy if its terms eventually become arbitrarily close to each other, and in the real numbers every Cauchy sequence converges—a property equivalent to the least-upper-bound property.

📌 Key points (3–5)

  • What Cauchy means: terms of the sequence are eventually all arbitrarily close to each other (not to a known limit).
  • Why Cauchy matters: it lets us check convergence without knowing the limit in advance.
  • Cauchy ↔ convergence: in ℝ, a sequence is Cauchy if and only if it converges (this is called Cauchy-completeness).
  • Common confusion: Cauchy is stronger than consecutive-term differences going to zero; the harmonic series partial sums have differences 1/n → 0 but are not Cauchy.
  • Key mechanism: the proof uses the least-upper-bound property via limsup and liminf; Cauchy sequences are bounded, so their limsup and liminf exist and must be equal.

🎯 Core definition and motivation

🎯 What a Cauchy sequence is

Cauchy sequence: A sequence {xₙ} is Cauchy if for every ε > 0 there exists an M ∈ ℕ such that for all n ≥ M and all k ≥ M, we have |xₙ − xₖ| < ε.

  • Informally: the terms are eventually all arbitrarily close to each other.
  • The key is that n and k vary independently and can be arbitrarily far apart.
  • You do not need to know the limit to check the Cauchy condition.

🔍 Why we need Cauchy sequences

  • Often we describe a number by a sequence converging to it, but we cannot use the number itself in the proof of convergence.
  • The Cauchy criterion lets us verify convergence without knowing the limit in advance.
  • Example: if you are constructing a number via a sequence, you need a way to show the sequence converges before you have defined the limit.

🧪 Examples and non-examples

✅ Example: {1/n} is Cauchy

  • Given ε > 0, choose M > 2/ε.
  • For n, k ≥ M, we have 1/n < ε/2 and 1/k < ε/2.
  • Therefore |1/n − 1/k| ≤ 1/n + 1/k < ε/2 + ε/2 = ε.

❌ Example: {(−1)ⁿ} is not Cauchy

  • For any M, take n ≥ M even and k = n + 1.
  • Then |(−1)ⁿ − (−1)ᵏ| = |1 − (−1)| = 2.
  • For any ε ≤ 2, the definition cannot be satisfied.
  • The terms oscillate and never get close to each other.

🔗 Cauchy sequences are bounded

🔗 Boundedness proposition

Proposition: If a sequence is Cauchy, then it is bounded.

Proof sketch:

  • Pick M such that for all n, k ≥ M, |xₙ − xₖ| < 1.
  • In particular, for all n ≥ M, |xₙ − x_M| < 1.
  • By the reverse triangle inequality, |xₙ| < 1 + |x_M|.
  • Let B = max{|x₁|, |x₂|, …, |x_{M−1}|, 1 + |x_M|}.
  • Then |xₙ| ≤ B for all n.

🧩 Why boundedness matters

  • Bounded sequences have limsup and liminf that exist (possibly infinite, but here finite).
  • This is the key to proving that Cauchy sequences converge.

🏆 The main theorem: Cauchy ↔ convergence

🏆 Theorem statement

Theorem: A sequence of real numbers is Cauchy if and only if it converges.

➡️ Direction 1: convergent ⇒ Cauchy (easy)

  • Suppose {xₙ} converges to x.
  • Given ε > 0, there exists M such that for n ≥ M, |xₙ − x| < ε/2.
  • For n, k ≥ M, |xₙ − xₖ| = |xₙ − x + x − xₖ| ≤ |xₙ − x| + |x − xₖ| < ε/2 + ε/2 = ε.

⬅️ Direction 2: Cauchy ⇒ convergent (uses least-upper-bound property)

Proof strategy:

  • {xₙ} is Cauchy, so it is bounded.
  • For a bounded sequence, limsup and liminf exist (this uses the least-upper-bound property).
  • Define a = limsup xₙ and b = liminf xₙ.
  • By Theorem 2.3.4, there exist subsequences {x_{nᵢ}} → a and {x_{mᵢ}} → b.
  • Given ε > 0, find M₁ such that |x_{nᵢ} − a| < ε/3 for i ≥ M₁, M₂ such that |x_{mᵢ} − b| < ε/3 for i ≥ M₂, and M₃ such that |xₙ − xₖ| < ε/3 for n, k ≥ M₃.
  • Let M = max{M₁, M₂, M₃}. For i ≥ M, both nᵢ ≥ M and mᵢ ≥ M.
  • Then |a − b| = |a − x_{nᵢ} + x_{nᵢ} − x_{mᵢ} + x_{mᵢ} − b| ≤ |a − x_{nᵢ}| + |x_{nᵢ} − x_{mᵢ}| + |x_{mᵢ} − b| < ε/3 + ε/3 + ε/3 = ε.
  • Since |a − b| < ε for all ε > 0, we have a = b.
  • By Proposition 2.3.5, if limsup = liminf, the sequence converges.

🌐 Completeness

  • The statement "every Cauchy sequence converges" is called Cauchy-completeness (or just completeness).
  • ℝ is Cauchy-complete because it has the least-upper-bound property.
  • One can construct ℝ by "completing" ℚ: throw in just enough points to make all Cauchy sequences converge.
  • The advantage of defining completeness via Cauchy sequences is that it generalizes to more abstract settings like metric spaces.

⚠️ Common confusion: Cauchy vs consecutive differences

⚠️ Consecutive differences going to zero is not enough

  • The Cauchy criterion is stronger than |x_{n+1} − xₙ| → 0 (or |x_{n+j} − xₙ| → 0 for fixed j).
  • Example: the harmonic series partial sums xₙ = 1 + 1/2 + … + 1/n satisfy x_{n+1} − xₙ = 1/(n+1) → 0, yet {xₙ} is divergent.
  • In fact, for that sequence, lim_{n→∞} (x_{n+j} − xₙ) = 0 for every fixed j ∈ ℕ.
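
A quick Python check of this distinction (finitely many terms only illustrate the point):

```python
# Harmonic partial sums: consecutive differences shrink to 0, but the
# Cauchy condition fails -- terms far apart stay far apart.
def s(n):
    return sum(1 / k for k in range(1, n + 1))

assert s(10_001) - s(10_000) < 1e-3      # x_{n+1} - x_n = 1/(n+1) -> 0
assert s(20_000) - s(10_000) > 0.5       # s(2n) - s(n) > 1/2 for every n
```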

🔑 The key difference

  • In the Cauchy definition, n and k vary independently and can be arbitrarily far apart.
  • Checking only consecutive terms (or terms a fixed distance apart) is not sufficient.
  • Don't confuse: "consecutive differences → 0" does not imply Cauchy; but Cauchy does imply "consecutive differences → 0."

📝 Selected exercises (concepts only)

📝 Exercise themes

  • 2.4.2 — If |x_{n+1} − xₙ| ≤ C|xₙ − x_{n−1}| for C < 1, then {xₙ} is Cauchy (uses geometric series formula)
  • 2.4.3 (Challenging) — If an ordered field F contains ℚ densely and every Cauchy sequence of rationals has a limit in F, then F has the least-upper-bound property
  • 2.4.4 — If |x_m − xₖ| ≤ yₖ for m ≥ k and yₙ → 0, then {xₙ} is Cauchy
  • 2.4.5 — If a Cauchy sequence has infinitely many positive and negative terms past every M, it converges to 0
  • 2.4.7 — If a Cauchy sequence equals a constant c infinitely often, it converges to c

  • These exercises reinforce the definition and explore variations of the Cauchy condition.
  • Exercise 2.4.3 shows that Cauchy-completeness can be used to define the least-upper-bound property.

2.5 Series

🧭 Overview

🧠 One-sentence thesis

An infinite series converges if and only if its sequence of partial sums converges, and tests such as comparison, ratio, and root tests allow us to determine convergence without computing the limit explicitly.

📌 Key points (3–5)

  • What a series is: the limit of partial sums; convergence means the sequence of partial sums converges to a finite number.
  • Necessary condition: if a series converges, its terms must go to zero—but terms going to zero does not guarantee convergence (e.g., harmonic series).
  • Absolute vs. conditional convergence: a series converges absolutely if the series of absolute values converges; absolute convergence implies convergence, but not vice versa.
  • Common confusion: the harmonic series Σ(1/n) diverges even though 1/n → 0; terms must go to zero "fast enough."
  • Practical tests: comparison test, p-series test, ratio test, and root test provide ways to decide convergence without finding the sum.

📐 Definitions and basic properties

📐 What is a series?

Series: Given a sequence {xₙ}, the series Σ(n=1 to ∞) xₙ is defined as the limit of the sequence of partial sums {sₖ}, where sₖ = Σ(n=1 to k) xₙ.

  • A series is not just an infinite sum written down; it is the limit of finite sums.
  • The series converges if lim(k→∞) sₖ exists and is finite.
  • Example: Σ(n=1 to ∞) (1/2ⁿ) = 1 because the partial sums approach 1.

🔁 Geometric series

| Condition | Result |
| --- | --- |
| −1 < r < 1 | Σ(n=0 to ∞) rⁿ = 1/(1−r) |
| r ≥ 1 or r ≤ −1 | Σ(n=0 to ∞) rⁿ diverges |

  • The geometric series is one of the few series for which we can explicitly find the limit.
  • Proof uses the formula for finite geometric sums and takes the limit as k → ∞.
  • Example: Σ(n=i to ∞) rⁿ = rⁱ/(1−r) when |r| < 1.
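
The closed form is easy to compare against partial sums; a minimal Python check (numerics for one r, not a proof):

```python
# Partial sums of a geometric series vs the closed form 1/(1-r), |r| < 1.
def partial(r, k):
    return sum(r**n for n in range(k + 1))   # s_k = 1 + r + ... + r^k

r = 0.5
assert abs(partial(r, 60) - 1 / (1 - r)) < 1e-12
```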

🧩 Cauchy series

Cauchy series: A series Σ xₙ is Cauchy if the sequence of partial sums is a Cauchy sequence.

  • A series converges if and only if it is Cauchy (because real sequences converge iff they are Cauchy).
  • Equivalent condition: for every ε > 0, there exists M such that for all n ≥ M and all k > n, |Σ(i=n+1 to k) xᵢ| < ε.
  • This reformulation is often easier to work with than the definition involving differences of partial sums.

➕ Linearity of series

Proposition: If Σ xₙ and Σ yₙ converge and α ∈ ℝ, then:

  • Σ αxₙ converges and equals α Σ xₙ

  • Σ (xₙ + yₙ) converges and equals (Σ xₙ) + (Σ yₙ)

  • Series behave like finite sums with respect to addition and scalar multiplication.

  • Proof: apply linearity to the kth partial sum, then take limits.

  • Don't confuse: multiplying series is not term-by-term; (a+b)(c+d) ≠ ac + bd.

⚠️ Necessary condition and counterexamples

⚠️ Terms must go to zero

Proposition: If Σ xₙ converges, then lim(n→∞) xₙ = 0.

  • Proof: use the Cauchy property with k = n+1 to show |xₙ₊₁| < ε for large n.
  • Example: the geometric series Σ rⁿ diverges when |r| ≥ 1 because the terms do not go to zero.
  • Critical point: the converse is false—terms going to zero does not guarantee convergence.

🎵 The harmonic series

Example: Σ(n=1 to ∞) (1/n) diverges, even though lim(n→∞) (1/n) = 0.

  • Proof strategy: group terms and bound from below.
    • s₂ = 1 + 1/2
    • s₄ = 1 + 1/2 + (1/3 + 1/4) ≥ 1 + 1/2 + (1/4 + 1/4) = 1 + 1/2 + 1/2
    • s₈ ≥ 1 + 1/2 + 1/2 + 1/2
    • In general, s_{2^k} ≥ 1 + k/2
  • Since {k/2} is unbounded (Archimedean property), {s_{2^k}} is unbounded, so the series diverges.
  • This shows that terms must go to zero "fast enough" for convergence.
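
The grouping bound can be verified term by term; a short Python sketch (finite values only illustrate the pattern):

```python
# Check the grouping bound s_{2^k} >= 1 + k/2, which forces the
# harmonic partial sums past every fixed bound.
def s(n):
    return sum(1 / i for i in range(1, n + 1))

for k in range(1, 15):
    assert s(2**k) >= 1 + k / 2
```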

🔍 Absolute vs. conditional convergence

🔍 Definitions

Absolute convergence: A series Σ xₙ converges absolutely if Σ |xₙ| converges.

Conditional convergence: A series converges conditionally if it converges but does not converge absolutely.

  • Absolute convergence is a stronger property than convergence.
  • Example: Σ (−1)ⁿ/n converges (shown later) but Σ 1/n diverges, so Σ (−1)ⁿ/n is conditionally convergent.

🔍 Absolute convergence implies convergence

Proposition: If Σ xₙ converges absolutely, then it converges.

  • Proof: if Σ |xₙ| is Cauchy, then for large n and k > n, |Σ(i=n+1 to k) xᵢ| ≤ Σ(i=n+1 to k) |xᵢ| < ε by the triangle inequality.
  • Therefore Σ xₙ is also Cauchy, hence convergent.
  • Don't confuse: the limits of Σ xₙ and Σ |xₙ| are generally different; computing one does not give the other.

📏 Triangle inequality for series

If Σ xₙ converges absolutely, then:

|Σ(i=1 to ∞) xᵢ| ≤ Σ(i=1 to ∞) |xᵢ|

  • This is the series version of the triangle inequality.
  • It follows from applying the finite triangle inequality to partial sums and taking limits.

🧪 Convergence tests for positive series

🧪 Monotone series

Proposition: If xₙ ≥ 0 for all n, then Σ xₙ converges if and only if the sequence of partial sums is bounded above.

  • When all terms are nonnegative, the partial sums form a monotone increasing sequence.
  • A monotone increasing sequence converges iff it is bounded.
  • This makes positive-term series easier to analyze than general series.

🔬 Comparison test

Proposition: Let Σ xₙ and Σ yₙ be series with 0 ≤ xₙ ≤ yₙ for all n.

  • (i) If Σ yₙ converges, then Σ xₙ converges.

  • (ii) If Σ xₙ diverges, then Σ yₙ diverges.

  • Proof: partial sums satisfy Σ(n=1 to k) xₙ ≤ Σ(n=1 to k) yₙ.

    • If Σ yₙ converges, the right side is bounded, so the left side is bounded, hence Σ xₙ converges.
    • If Σ xₙ diverges, its partial sums are unbounded, so Σ yₙ's partial sums are also unbounded.
  • Example: Σ 1/(n²+1) converges because 1/(n²+1) < 1/n² and Σ 1/n² converges (p-series with p=2).

📊 The p-series test

Proposition: The series Σ(n=1 to ∞) (1/nᵖ) converges if and only if p > 1.

| p | Convergence |
| --- | --- |
| p ≤ 1 | Diverges |
| p > 1 | Converges |

  • Proof for p ≤ 1: use comparison with harmonic series (1/nᵖ ≥ 1/n).
  • Proof for p > 1: group terms and bound from above, similar to harmonic series proof but in reverse.
    • s_{2^k − 1} < 1 + Σ(i=1 to k−1) (1/2^(p−1))ⁱ
    • Since 1/2^(p−1) < 1 when p > 1, the geometric series converges, so partial sums are bounded.
  • Example: Σ 1/n² converges (p=2 > 1), but the test does not tell us the sum equals π²/6.
  • The p-series is a fundamental benchmark for the comparison test.
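
The dichotomy shows up clearly in partial sums; a quick Python illustration (finite numerics suggest, the proofs above decide):

```python
# p-series partial sums: bounded for p = 2, unbounded growth for p = 1.
def s(p, n):
    return sum(1 / k**p for k in range(1, n + 1))

assert s(2, 100_000) < 1.645   # bounded: the full sum is pi^2/6 ~ 1.6449
assert s(1, 100_000) > 12      # harmonic partial sums pass any fixed bound
```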

⚡ Ratio and root tests

⚡ Ratio test

Proposition: Let Σ xₙ be a series with xₙ ≠ 0 for all n, and suppose L = lim(n→∞) |xₙ₊₁|/|xₙ| exists.

  • (i) If L < 1, then Σ xₙ converges absolutely.

  • (ii) If L > 1, then Σ xₙ diverges.

  • The ratio test generalizes the behavior of geometric series (where the ratio is constant).

  • Proof for L < 1: pick r with L < r < 1; eventually |xₙ₊₁|/|xₙ| < r, so |xₙ| < |x_M| r^(n−M) for large n; compare with geometric series.

  • Proof for L > 1: the terms do not go to zero (by a lemma on sequences), so the series diverges.

  • Example: Σ 2ⁿ/n! converges absolutely because lim(n→∞) [2^(n+1)/(n+1)!] / [2ⁿ/n!] = lim 2/(n+1) = 0 < 1.

🌱 Root test

Proposition: Let Σ xₙ be a series and let L = lim sup(n→∞) |xₙ|^(1/n).

  • (i) If L < 1, then Σ xₙ converges absolutely.

  • (ii) If L > 1, then Σ xₙ diverges.

  • The root test also generalizes geometric series behavior.

  • Proof for L < 1: pick r with L < r < 1; eventually |xₙ|^(1/n) < r, so |xₙ| < rⁿ; compare with geometric series.

  • Proof for L > 1: there is a subsequence with |xₙₖ|^(1/nₖ) > r > 1, so |xₙₖ| > 1; terms do not go to zero.

  • The root test uses lim sup, so it applies even when the ordinary limit does not exist.

🔄 Tails and exercises

🔄 Tail of a series

Proposition: Let Σ(n=1 to ∞) xₙ be a series and M ∈ ℕ. Then Σ(n=1 to ∞) xₙ converges if and only if Σ(n=M to ∞) xₙ converges.

  • Proof: the kth partial sum (for k ≥ M) can be written as Σ(n=1 to k) xₙ = [Σ(n=1 to M−1) xₙ] + [Σ(n=M to k) xₙ].
  • The first sum is a fixed number; adding a constant does not affect convergence of a sequence.
  • This means we can ignore finitely many terms when testing convergence.
  • Example: when using the comparison test, we only need 0 ≤ xₙ ≤ yₙ for n ≥ M for some M; the tail determines convergence.

📝 Key exercises

  • Exercise 2.5.3: Practice deciding convergence/divergence using the tests.
  • Exercise 2.5.8: Show Σ (−1)ⁿ/n converges (hint: consider pairs of consecutive terms; this previews the alternating series test in §2.6).
  • Exercise 2.5.10: Prove the triangle inequality for series.
  • Exercise 2.5.15 (Cauchy condensation): For decreasing positive sequences, Σ xₙ converges iff Σ 2ⁿ x_(2ⁿ) converges, a powerful tool for series like Σ 1/(n ln n).
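
The condensation idea can be checked numerically. A minimal sketch, assuming x_n = 1/n² so the condensed series is geometric (all names are ours):

```python
# Cauchy condensation sketch for x_n = 1/n^2: the condensed terms
# 2^n * x_(2^n) = 2^n / 4^n equal (1/2)^n, a convergent geometric series.
def x(n):
    return 1.0 / n**2

condensed = [2**n * x(2**n) for n in range(1, 30)]
for n, c in enumerate(condensed, start=1):
    assert abs(c - 0.5**n) < 1e-15   # exactly geometric in this case
```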

More on series

2.6 More on series

🧭 Overview

🧠 One-sentence thesis

Beyond the ratio test, the root test, alternating series test, and careful treatment of rearrangements and products reveal that absolutely convergent series behave predictably while conditionally convergent series can be rearranged to converge to any value, and power series converge absolutely within a radius determined by their coefficients.

📌 Key points (3–5)

  • Root test: uses the limit supremum of the nth root of absolute values to determine absolute convergence or divergence, similar to the ratio test.
  • Alternating series test: a monotone decreasing sequence of positive terms going to zero guarantees convergence of the alternating series.
  • Absolute vs conditional convergence: absolutely convergent series can be rearranged in any order without changing the sum, but conditionally convergent series can be rearranged to converge to any desired value.
  • Common confusion: conditionally convergent series appear to converge, but their behavior under rearrangement is wildly different from absolutely convergent series.
  • Power series: converge absolutely within a radius of convergence determined by the coefficients, and can be added and multiplied term-by-term within that radius.

🔬 Root test

🔬 What the root test measures

Root test: Let the sum from n=1 to infinity of x_n be a series and let L be the limit supremum as n approaches infinity of the nth root of the absolute value of x_n. (i) If L < 1, then the series converges absolutely. (ii) If L > 1, then the series diverges.

  • The test examines the nth root of the absolute value of each term, not the ratio of consecutive terms.
  • It generalizes the geometric series: if the nth root of absolute value of x_n is eventually less than some r < 1, then absolute value of x_n < r to the power n, and the series is dominated by a convergent geometric series.
  • When L > 1, there is a subsequence where the nth root of absolute value of x_n exceeds some r > 1, so those terms themselves exceed 1 and cannot go to zero.

🔍 How to apply it

  • Compute L = limit supremum of the nth root of absolute value of x_n.
  • If L < 1: pick r such that L < r < 1; eventually all terms satisfy absolute value of x_n < r to the power n, so the series of absolute values is bounded by a convergent geometric series.
  • If L > 1: a subsequence has absolute value of x_n > r to the power n for some r > 1, so the terms do not go to zero and the series diverges.
  • Don't confuse: the root test says nothing when L = 1, just like the ratio test.

⚖️ Alternating series test

⚖️ When alternating series converge

Alternating series test: Let {x_n} be a monotone decreasing sequence of positive real numbers such that the limit as n approaches infinity of x_n = 0. Then the sum from n=1 to infinity of (-1) to the power n times x_n converges.

  • The series alternates signs: subtract x_1, add x_2, subtract x_3, add x_4, etc.
  • Monotone decreasing means each term is smaller than the previous: x_1 ≥ x_2 ≥ x_3 ≥ ... ≥ 0.
  • The terms must go to zero.

📐 Why it works

  • Consider the even partial sums s_2k: they can be grouped as (−x_1 + x_2) + (−x_3 + x_4) + ... + (−x_{2k−1} + x_{2k}).
  • Because the sequence is decreasing, each pair (−x_{2ℓ−1} + x_{2ℓ}) ≤ 0, so the even partial sums form a decreasing sequence.
  • The even partial sums are bounded below by −x_1.
  • A decreasing sequence bounded below converges.
  • The odd partial sums s_{2k+1} = s_{2k} − x_{2k+1} also converge to the same limit because x_{2k+1} → 0.
  • Example: the alternating harmonic series sum of (−1)^(n+1)/n converges by this test, even though it does not converge absolutely.
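
The bracketing of the limit by even and odd partial sums can be seen numerically for the alternating harmonic series (whose sum is known to be ln 2, a fact not proved in the excerpt):

```python
# Partial sums of sum (-1)^(n+1)/n bracket the limit ln 2 ~ 0.6931:
# even partial sums sit below it, odd partial sums above it.
from math import log

def s(k):
    """k-th partial sum of the alternating harmonic series."""
    return sum((-1) ** (n + 1) / n for n in range(1, k + 1))

assert s(100) < log(2) < s(101)        # even below, odd above
assert abs(s(1001) - log(2)) < 1e-3    # error is at most the next term, 1/1002
```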

⚠️ Monotonicity is essential

  • The excerpt notes that monotonicity is necessary (Exercise 2.6.12 asks for a counterexample).
  • Without monotonicity, even if the terms go to zero, the alternating series can diverge.

🔄 Rearrangements of series

🔄 Absolute convergence preserves sums

Rearrangement: Given a bijective function σ from natural numbers to natural numbers, the rearrangement of the series sum of x_n is the series sum of x_{σ(k)}.

  • A rearrangement sums the same terms in a different order.
  • Key result: If the sum from n=1 to infinity of x_n converges absolutely, then any rearrangement converges absolutely to the same sum.
  • The proof uses the fact that for large enough M, the tail sum from n=M+1 to infinity of absolute value of x_n is small, and any rearrangement eventually includes all the first M terms.

🔄 Conditional convergence allows any sum

  • If a series converges conditionally (converges but not absolutely), it can be rearranged to converge to any desired value L.
  • Example: the alternating harmonic series sum of (−1)^(n+1)/n converges to some value, but the sum of odd terms diverges to +∞ and the sum of even terms diverges to −∞.
  • By alternately adding odd terms until the partial sum exceeds L, then adding even terms until it drops below L, and repeating, the rearranged series converges to L.
  • Don't confuse: a series that "converges" conditionally does not have a well-defined sum independent of order; only absolutely convergent series do.
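
The greedy rearrangement described above can be sketched directly (the function name and step count are our choices); it drives the partial sums toward any target L:

```python
# Rearrangement sketch for the alternating harmonic series: add positive
# terms 1/(2k-1) until the running sum exceeds L, then negative terms
# -1/(2k) until it drops below L, and repeat. Terms shrink, so sums -> L.
def rearranged_partial_sum(L, steps):
    total, pos, neg = 0.0, 1, 2   # next odd / even denominators to use
    for _ in range(steps):
        if total <= L:
            total += 1.0 / pos
            pos += 2
        else:
            total -= 1.0 / neg
            neg += 2
    return total

assert abs(rearranged_partial_sum(0.0, 100_000) - 0.0) < 1e-3
assert abs(rearranged_partial_sum(2.5, 100_000) - 2.5) < 1e-3
```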

✖️ Multiplication of series

✖️ Cauchy product

Cauchy product: Given two series sum of a_n and sum of b_n, define c_n = a_0 b_n + a_1 b_{n−1} + ... + a_n b_0. The series sum of c_n is the Cauchy product.

  • The Cauchy product generalizes polynomial multiplication to infinite series.
  • Mertens' theorem: If sum of a_n and sum of b_n both converge, and at least one converges absolutely, then the Cauchy product converges to the product of the two sums.
  • The proof rearranges partial sums and uses the absolute convergence of one series to control the error.

⚠️ Both conditionally convergent: product may diverge

  • If both series converge only conditionally, the Cauchy product may diverge.
  • Example: Let a_n = b_n = (−1)^n / sqrt(n+1). Each series converges by the alternating series test but not absolutely.
  • The Cauchy product c_n has absolute value at least 1 for all n, so the terms do not go to zero and the product diverges.
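
This lower bound on |c_n| is easy to verify numerically (helper names are ours): each product term is at least 2/(n+2), and there are n+1 of them, so |c_n| ≥ 2(n+1)/(n+2) ≥ 1:

```python
# Sketch of the divergent Cauchy product: with a_n = b_n = (-1)^n/sqrt(n+1),
# every product term c_n satisfies |c_n| >= 1, so terms cannot go to zero.
from math import sqrt

def c(n):
    a = lambda k: (-1) ** k / sqrt(k + 1)
    return sum(a(j) * a(n - j) for j in range(n + 1))

for n in range(50):
    assert abs(c(n)) >= 1.0 - 1e-9
```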

📈 Power series

📈 What is a power series

Power series: A power series about x_0 is a series of the form sum from n=0 to infinity of a_n (x − x_0)^n.

  • A power series is a function of x.
  • Convention: 0^0 = 1, so when x = x_0 and n = 0, the term is a_0.
  • The series always converges at x = x_0 (all terms except the first are zero).
  • Convergent means there is at least one x ≠ x_0 where the series converges; divergent means it converges only at x_0.

📏 Radius of convergence

Radius of convergence: If a power series is convergent, then either it converges absolutely for all x, or there exists a number ρ such that the series converges absolutely on the interval (x_0 − ρ, x_0 + ρ) and diverges when x < x_0 − ρ or x > x_0 + ρ.

  • Write ρ = ∞ if the series converges for all x; write ρ = 0 if the series is divergent.
  • At the endpoints x = x_0 ± ρ, the series may or may not converge (the proposition says nothing).
  • How to compute: Let R = limit supremum as n approaches infinity of the nth root of absolute value of a_n. Then ρ = 1/R (with the convention that 1/0 = ∞ and 1/∞ = 0).
  • The proof applies the root test to the series sum of a_n (x − x_0)^n.
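
The formula ρ = 1/R can be sketched numerically for a_n = 1/n (our example choice), where the series Σ xⁿ/n is known to have radius 1:

```python
# Radius-of-convergence sketch: estimate R = lim sup |a_n|^(1/n)
# for a_n = 1/n; since n^(1/n) -> 1, we expect R ~ 1 and rho = 1/R ~ 1.
R_estimates = [(1.0 / n) ** (1.0 / n) for n in range(100, 2000)]
R = max(R_estimates)      # the sequence increases toward 1, so max ~ last term
rho = 1.0 / R
assert abs(rho - 1.0) < 0.05
```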

🧮 Operations on power series

  • Addition: sum of a_n (x − x_0)^n + sum of b_n (x − x_0)^n = sum of (a_n + b_n)(x − x_0)^n.
  • Scalar multiplication: α times sum of a_n (x − x_0)^n = sum of α a_n (x − x_0)^n.
  • Multiplication: The product of two power series is their Cauchy product, sum of c_n (x − x_0)^n where c_n = a_0 b_n + a_1 b_{n−1} + ... + a_n b_0.
  • All operations preserve convergence within the common radius of convergence (the resulting radius may be larger).

📊 Examples of power series

| Series | Radius of convergence | Notes |
| --- | --- | --- |
| Σ (1/n!) xⁿ | ρ = ∞ | Converges for all x (ratio test gives limit 0) |
| Σ (1/n) xⁿ | ρ = 1 | Converges absolutely for −1 < x < 1; converges at x = −1 (alternating series test); diverges at x = 1 |
| Σ nⁿ xⁿ | ρ = 0 | Diverges for all x ≠ 0 (root test gives limit ∞) |
| Geometric series Σ xⁿ | ρ = 1 | Equals 1/(1−x) for −1 < x < 1 |
  • Rational functions: Can be expanded as power series around x_0 (as long as the denominator is not zero at x_0) using the geometric series and algebraic manipulation.
  • Example: x/(1 + 2x + x²) = x/(1 − (−x))² = x times (sum of (−1)^n x^n)² = sum from n=1 to infinity of (−1)^(n+1) n x^n for |x| < 1.

Limits of functions

3.1 Limits of functions

🧭 Overview

🧠 One-sentence thesis

The limit of a function at a point captures how the function behaves as we approach that point, and this concept is precisely characterized by cluster points and convergent sequences.

📌 Key points (3–5)

  • Cluster points are essential: A point c is a cluster point of S if every neighborhood around c contains points of S other than c itself; limits are only defined at cluster points.
  • Limit definition: The limit of f(x) as x approaches c equals L if f(x) can be made arbitrarily close to L by taking x sufficiently close to c (but not equal to c).
  • Sequential characterization: A function limit exists if and only if every convergent sequence approaching c produces a sequence of function values converging to the same limit.
  • Common confusion: The limit at c does not depend on the function value at c itself—f(c) may be undefined, or may differ from the limit.
  • Arithmetic of limits: Limits behave well under addition, subtraction, multiplication, division (when denominators are nonzero), and inequalities.

🎯 Cluster points

🎯 What is a cluster point

Cluster point: A number x in the real numbers is a cluster point of a set S if for every epsilon greater than 0, the interval (x − epsilon, x + epsilon) intersected with S, excluding x itself, is not empty.

  • In plain language: x is a cluster point of S if there are points of S arbitrarily close to x.
  • The cluster point itself need not lie in S.
  • Alternative phrasing: for every epsilon greater than 0, there exists y in S such that y is not equal to x and the absolute value of x − y is less than epsilon.

📋 Examples of cluster points

| Set S | Cluster points | Why |
| --- | --- | --- |
| {1/n : n in natural numbers} | 0 (unique) | Points get arbitrarily close to 0 |
| (0, 1) (open interval) | [0, 1] (closed interval) | Every point in [0, 1] has points of (0, 1) arbitrarily near |
| Rational numbers | All real numbers | Rationals are dense in the reals |
| [0, 1) union {2} | [0, 1] | The isolated point 2 is not a cluster point |
| Natural numbers | None | No point has natural numbers arbitrarily close |

🔗 Sequential characterization of cluster points

Proposition: x is a cluster point of S if and only if there exists a convergent sequence of numbers from S, all different from x, that converges to x.

  • Forward direction: If x is a cluster point, pick x_n within 1/n of x from S (excluding x); then x_n converges to x.
  • Reverse direction: If such a sequence exists, then for any epsilon greater than 0, some term x_M satisfies the absolute value of x_M − x less than epsilon, so x is a cluster point.
  • This gives a practical way to verify cluster points using sequences.

📐 Definition of function limits

📐 The epsilon-delta definition

Limit: Let f : S → real numbers be a function and c a cluster point of S. We say f(x) converges to L as x goes to c if for every epsilon greater than 0, there exists delta greater than 0 such that whenever x is in S excluding c and the absolute value of x − c is less than delta, we have the absolute value of f(x) − L less than epsilon.

  • Notation: limit as x approaches c of f(x) equals L, or f(x) → L as x → c.
  • Key structure: Given any tolerance epsilon, we can find a neighborhood (controlled by delta) around c where f(x) stays within epsilon of L.
  • The value f(c) is irrelevant; f need not even be defined at c.
  • Example: For f(x) = x², the limit as x approaches c equals c². The proof uses delta = min{1, epsilon/(2|c| + 1)}.
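
The claimed delta for f(x) = x² can be sanity-checked numerically. A sketch (the sampling scheme and function names are ours) confirming |x² − c²| < ε whenever |x − c| < δ:

```python
# Check delta = min(1, eps/(2|c| + 1)) for f(x) = x^2: sample points in
# (c - delta, c + delta) and confirm |x^2 - c^2| < eps at each one.
def check(c, eps, samples=1000):
    delta = min(1.0, eps / (2 * abs(c) + 1))
    for i in range(samples):
        x = c - delta + (2 * delta) * (i + 0.5) / samples  # interior points
        assert abs(x - c) < delta
        assert abs(x * x - c * c) < eps

for c in (-3.0, 0.0, 10.0):
    for eps in (1.0, 0.1, 0.001):
        check(c, eps)
```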

🔑 Uniqueness of limits

Proposition: If the limit of f(x) as x goes to c exists, it is unique.

  • Proof idea: Suppose L₁ and L₂ both satisfy the definition. For any epsilon, apply the definition with epsilon/2 to get delta₁ and delta₂. Take delta = min{delta₁, delta₂}. Because c is a cluster point, there exists x in S (not equal to c) with the absolute value of x − c less than delta. Then the absolute value of L₁ − L₂ is less than or equal to the absolute value of L₁ − f(x) plus the absolute value of f(x) − L₂, which is less than epsilon/2 + epsilon/2 = epsilon. Since this holds for arbitrary epsilon, L₁ = L₂.
  • Why cluster points matter: Without the cluster point condition, we could not guarantee the existence of such an x, and uniqueness would fail.

🚫 When limits don't exist

Example: limit as x approaches 0 of sin(1/x) does not exist.

  • Proof: Consider the sequence x_n = 1/(pi·n + pi/2). Then x_n → 0, but sin(1/x_n) = sin(pi·n + pi/2) = (−1)^n, which does not converge.
  • By the sequential characterization (below), the limit cannot exist.
  • Contrast: limit as x approaches 0 of x·sin(1/x) equals 0, because the absolute value of x·sin(1/x) is less than or equal to the absolute value of x, which goes to 0.

🔄 Sequential characterization of limits

🔄 The fundamental lemma

Lemma: f(x) → L as x → c if and only if for every sequence {x_n} with x_n in S excluding c and limit as n approaches infinity of x_n = c, the sequence {f(x_n)} converges to L.

  • Forward direction: If f(x) → L, given epsilon, find delta. Since x_n → c, eventually the absolute value of x_n − c is less than delta, so the absolute value of f(x_n) − L is less than epsilon.
  • Reverse direction (contrapositive): If the limit does not exist, there exists epsilon such that for every delta there is x with the absolute value of x − c less than delta but the absolute value of f(x) − L greater than or equal to epsilon. Use delta = 1/n to construct a sequence {x_n} converging to c but {f(x_n)} not converging to L.
  • Importance: This connects function limits to sequence limits, allowing us to use all sequence limit theorems.

🧮 Arithmetic of limits

Corollary: If limit as x approaches c of f(x) and limit as x approaches c of g(x) both exist, then:

  • (i) limit of (f + g) = limit of f + limit of g
  • (ii) limit of (f − g) = limit of f − limit of g
  • (iii) limit of (f·g) = (limit of f)·(limit of g)
  • (iv) If limit of g is not zero and g(x) is not zero for x in S excluding c, then limit of (f/g) = (limit of f)/(limit of g)

Proof strategy: Use the sequential characterization. Take any sequence x_nc. By the lemma, f(x_n)→L₁ and g(x_n)→L₂. Apply sequence arithmetic theorems to get (f + g)(x_n) → L₁ + L₂, etc. By the lemma again, the function limit exists and equals the claimed value.

📊 Inequality preservation

Corollary: If f(x) ≤ g(x) for all x in S excluding c, and both limits exist, then limit of f ≤ limit of g.

  • Proof: Take a sequence {x_n} in S excluding c converging to c. Then f(x_n) ≤ g(x_n) for all n. By the sequential lemma, f(x_n) → limit of f and g(x_n) → limit of g. By the sequence inequality lemma, limit of f ≤ limit of g.
  • Squeeze theorem: If f(x) ≤ g(x) ≤ h(x) and limit of f = limit of h, then limit of g exists and equals the common value.
  • Bounded functions: If a ≤ f(x) ≤ b for all x in S excluding c, then a ≤ limit of f ≤ b.

🔀 Restrictions and one-sided limits

🔀 Restriction of a function

Restriction: Let f : S → real numbers and A ⊆ S. The restriction f|A : A → real numbers is defined by f|A(x) = f(x) for x in A.

  • The restriction is simply f on a smaller domain.
  • When restrictions preserve limits: If there exists alpha greater than 0 such that (A excluding c) intersected with (c − alpha, c + alpha) equals (S excluding c) intersected with (c − alpha, c + alpha), then c is a cluster point of A if and only if it is a cluster point of S, and the limits of f and f|A at c are the same.
  • Why: Near c, the sets A and S contain exactly the same points (except possibly c itself), so the limit behavior is identical.

⬅️➡️ One-sided limits

One-sided limits:

  • limit as x approaches c from the right of f(x) = limit as x approaches c of f|(S intersect (c, infinity))(x)
  • limit as x approaches c from the left of f(x) = limit as x approaches c of f|(S intersect (−infinity, c))(x)
  • These are limits of restrictions to points greater than c or less than c.
  • Example: Define f(x) = 1 for x less than 0, f(x) = 0 for x greater than or equal to 0. Then limit from the left at 0 equals 1, limit from the right at 0 equals 0, and the two-sided limit does not exist.
  • Don't confuse: One-sided limits can exist even when the two-sided limit does not.

🔗 Connecting one-sided and two-sided limits

Proposition: If c is a cluster point of both S intersect (−infinity, c) and S intersect (c, infinity), then limit as x approaches c of f(x) = L if and only if both one-sided limits exist and equal L.

  • Key observation: (S intersect (−infinity, c)) union (S intersect (c, infinity)) = S excluding c.
  • This gives a practical way to check two-sided limits: verify both one-sided limits exist and agree.

Continuous Functions

3.2 Continuous functions

🧭 Overview

🧠 One-sentence thesis

Continuity—formalized through the epsilon-delta definition—captures the idea that a function behaves predictably near every point, and continuous functions on closed bounded intervals are guaranteed to be bounded and to achieve their extreme values.

📌 Key points (3–5)

  • Formal definition: A function is continuous at a point if for every epsilon there exists a delta such that nearby inputs produce nearby outputs.
  • Equivalent characterizations: Continuity at a cluster point means the limit equals the function value; it also means every convergent sequence maps to a convergent sequence.
  • Building continuous functions: Polynomials, sine, and cosine are continuous; sums, products, quotients, and compositions of continuous functions remain continuous.
  • Common confusion: A function can map some convergent sequences correctly yet still be discontinuous—you need all sequences converging to a point to map correctly.
  • Extreme value theorem: On a closed bounded interval, a continuous function is bounded and achieves both its minimum and maximum values.

📐 The epsilon-delta definition

📐 What continuity means formally

Continuous at c: A function f : S → ℝ is continuous at c ∈ S if for every ε > 0 there exists a δ > 0 such that whenever x ∈ S and |x − c| < δ, we have |f(x) − f(c)| < ε.

  • This definition was hammered out over the 19th century by three great mathematicians: Bolzano, Cauchy, and Weierstrass.
  • The intuitive "draw without lifting the pen" idea is useful but not rigorous.
  • Delta depends on both epsilon and the point c; you don't need one delta for all points.
  • The excerpt emphasizes this is "the most important definition to understand in analysis, and it is not an easy one."

🎯 Geometric interpretation

  • For any horizontal band of width 2ε around f(c), you can find a vertical band of width 2δ around c such that the graph stays within the horizontal band.
  • The figure in the excerpt shows this as a gray region: if |x − c| < δ, then the graph of f(x) must lie within ε of f(c).

🔗 Equivalent characterizations

🔗 Continuity via limits

Proposition 3.2.2 part (ii): If c is a cluster point of S, then f is continuous at c if and only if the limit of f(x) as x → c exists and equals f(c).

  • This connects continuity to the limit concept from the previous section.
  • The key: the limit must exist and equal the function value at that point.
  • If c is not a cluster point, f is automatically continuous at c (part i of the proposition).

🔗 Continuity via sequences

Proposition 3.2.2 part (iii): f is continuous at c if and only if for every sequence {xₙ} in S with limit c, the sequence {f(xₙ)} converges to f(c).

  • This is "particularly powerful" according to the excerpt.
  • It allows applying sequence limit theorems to prove continuity.
  • Don't confuse: Finding one sequence that works is not enough; every sequence converging to c must map to a sequence converging to f(c).

Example from the excerpt: The function that equals −1 for x < 0 and 1 for x ≥ 0 is discontinuous at 0. The sequence {−1/n} converges to 0, but f(−1/n) = −1 for all n, so the limit is −1 ≠ f(0) = 1.

🧱 Building continuous functions

🧱 Basic continuous functions

The excerpt proves or states that the following are continuous:

  • Polynomials: Any function f(x) = a_d xᵈ + ... + a₁x + a₀ is continuous on ℝ.
  • Reciprocal: f(x) = 1/x is continuous on (0, ∞).
  • Trigonometric: sin(x) and cos(x) are continuous on ℝ.

The proofs use the sequential characterization: if {xₙ} → c, then {f(xₙ)} → f(c) by properties of sequence limits.

🧱 Algebraic combinations

Proposition 3.2.5: If f and g are continuous at c, then:

  • f + g is continuous at c
  • f − g is continuous at c
  • f · g is continuous at c
  • f / g is continuous at c (if g(x) ≠ 0 for all x in S)

This follows from the corresponding properties of limits of sequences.

🧱 Composition of continuous functions

Proposition 3.2.7: If g is continuous at c and f is continuous at g(c), then f ∘ g is continuous at c.

  • Proof idea: A sequence converging to c maps under g to a sequence converging to g(c), which then maps under f to a sequence converging to f(g(c)).
  • Example: (sin(1/x))² is continuous on (0, ∞) because it is built from continuous pieces.

🚫 Discontinuous functions

🚫 Testing for discontinuity

Proposition 3.2.9: If there exists a sequence {xₙ} in S with limit c such that {f(xₙ)} does not converge to f(c), then f is discontinuous at c.

  • "Does not converge to f(c)" means either it diverges or converges to something else.
  • This gives a practical way to prove discontinuity: find one bad sequence.

🚫 Types of discontinuity

| Type | Description | Example from excerpt |
| --- | --- | --- |
| Jump discontinuity | Limit from left ≠ limit from right | f(x) = −1 if x < 0, f(x) = 1 if x ≥ 0 |
| Removable discontinuity | Limit exists but ≠ f(c); can be "fixed" | g(x) = 0 if x ≠ 0, g(0) = 1 |
| Everywhere discontinuous | Discontinuous at all points | Dirichlet function: 1 if rational, 0 if irrational |

🚫 Surprising examples

Thomae (popcorn) function: Defined on (0, 1) as f(x) = 1/k if x = m/k in lowest terms (rational), and f(x) = 0 if x is irrational.

  • Continuous at every irrational point
  • Discontinuous at every rational point
  • This seems paradoxical since rationals are dense in the reals, but it works because the function values get arbitrarily small as denominators grow.

The excerpt asks: "Can there exist a function continuous at all irrational numbers, but discontinuous at all rational numbers?" and answers yes with this example.
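
A small numeric sketch of why this works (the sample points and helper name are ours): rationals close to an irrational like 1/√2 need large denominators even after reduction, so the Thomae values there are tiny:

```python
# Thomae-function sketch: f(m/k) = 1/(reduced denominator of m/k).
# Rationals approximating the irrational 1/sqrt(2) keep large denominators,
# so the sampled f-values shrink toward 0.
from fractions import Fraction
from math import sqrt

def thomae_at_rational(m, k):
    """Value of the Thomae function at m/k (Fraction reduces to lowest terms)."""
    return 1.0 / Fraction(m, k).denominator

target = 1 / sqrt(2)
values = [thomae_at_rational(round(target * k), k) for k in (10**3, 10**4, 10**5)]
assert values[0] >= values[1] >= values[2]   # better approximations, smaller f
assert values[2] < 1e-4
```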

🏔️ Extreme value theorem

🏔️ Boundedness on closed intervals

Lemma 3.3.1: A continuous function f : [a, b] → ℝ is bounded.

  • Proof technique: Assume f is unbounded, construct a sequence {xₙ} with f(xₙ) ≥ n.
  • Use Bolzano–Weierstrass to extract a convergent subsequence.
  • The limit point must be in [a, b] (because the interval is closed).
  • But then f cannot be continuous at that limit point (because the function values are unbounded).
  • This is a common proof technique: find a sequence with a property, then use Bolzano–Weierstrass to make it converge.

🏔️ Achieving extreme values

Theorem 3.3.2 (Minimum-maximum theorem / Extreme value theorem): A continuous function f : [a, b] → ℝ achieves both an absolute minimum and an absolute maximum on [a, b].

Absolute minimum: f achieves an absolute minimum at c ∈ S if f(x) ≥ f(c) for all x ∈ S.

Absolute maximum: f achieves an absolute maximum at c ∈ S if f(x) ≤ f(c) for all x ∈ S.

  • The excerpt emphasizes it is "important that the domain of f is a closed and bounded interval [a, b]."
  • Two key ingredients: boundedness of [a, b] allows using Bolzano–Weierstrass; closedness ensures the limit stays in [a, b].
  • The proof begins by using the lemma to show f is bounded, then (the excerpt cuts off but would continue to) show the supremum and infimum are actually achieved.

Extreme and intermediate value theorems

3.3 Extreme and intermediate value theorems

🧭 Overview

🧠 One-sentence thesis

Continuous functions on closed and bounded intervals are guaranteed to achieve both absolute minimum and maximum values and to take on every intermediate value between any two function values, properties that fail when the interval is not closed and bounded or when the function is not continuous.

📌 Key points (3–5)

  • Boundedness lemma: A continuous function on a closed and bounded interval is always bounded.
  • Extreme value theorem: A continuous function on a closed and bounded interval achieves both an absolute minimum and an absolute maximum.
  • Intermediate value theorem: A continuous function on a closed interval takes on every value between any two of its function values.
  • Common confusion: All three hypotheses—continuity, closedness, and boundedness—are necessary; removing any one causes the theorems to fail.
  • Practical application: The bisection method uses the intermediate value theorem to find roots of continuous functions to any desired precision.

📏 Boundedness of continuous functions

📏 The boundedness lemma

Bounded function: A function f : [a, b] → ℝ is bounded if there exists a B ∈ ℝ such that |f(x)| ≤ B for all x ∈ [a, b].

Lemma 3.3.1: A continuous function f : [a, b] → ℝ is bounded.

  • The proof uses contrapositive: assume f is not bounded, then construct a sequence where f(xₙ) ≥ n.
  • The key technique: use the Bolzano–Weierstrass theorem to extract a convergent subsequence from the bounded sequence {xₙ}.
  • Since the interval is closed, the limit point x stays in [a, b].
  • The function values f(xₙᵢ) are unbounded, so f cannot be continuous at x.

Why closedness and boundedness matter:

  • Boundedness of [a, b] allows use of Bolzano–Weierstrass.
  • Closedness ensures the limit point remains in the interval.

🔑 Common proof technique

  • Find a sequence with a desired property.
  • Use Bolzano–Weierstrass to make it converge.
  • Use continuity to pass limits through the function.

🎯 Extreme value theorem (min-max theorem)

🎯 Absolute extrema definitions

Absolute minimum: f : S → ℝ achieves an absolute minimum at c ∈ S if f(x) ≥ f(c) for all x ∈ S.

Absolute maximum: f : S → ℝ achieves an absolute maximum at c ∈ S if f(x) ≤ f(c) for all x ∈ S.

  • The value f(c) itself is called the absolute minimum (or maximum).
  • "Achieves" means there exists a point c where the extremum occurs.

🎯 The theorem statement

Theorem 3.3.2 (Minimum-maximum theorem / Extreme value theorem): A continuous function f : [a, b] → ℝ achieves both an absolute minimum and an absolute maximum on [a, b].

Proof sketch:

  1. By the boundedness lemma, f([a, b]) has a supremum and infimum.
  2. Construct sequences {f(xₙ)} and {f(yₙ)} approaching the infimum and supremum.
  3. Apply Bolzano–Weierstrass to {xₙ} and {yₙ} to get convergent subsequences.
  4. The limits x and y lie in [a, b] (by closedness).
  5. By continuity, f(x) = inf f([a, b]) and f(y) = sup f([a, b]).

⚠️ Why all hypotheses are needed

Example 3.3.4 (unbounded interval): f(x) = x on ℝ achieves neither minimum nor maximum.

  • The interval must be bounded.

Example 3.3.5 (open interval): f(x) = 1/x on (0, 1) achieves neither minimum nor maximum.

  • As x → 0, f(x) → ∞ (unbounded).
  • As x → 1, f(x) → 1, but f(x) > 1 for all x ∈ (0, 1) (no point achieves the infimum).
  • The interval must be closed.

Example 3.3.6 (discontinuous): f : [0, 1] → ℝ with f(x) = 1/x for x > 0 and f(0) = 0 does not achieve a maximum.

  • The function must be continuous.

Example 3.3.3: f(x) = x² + 1 on [−1, 2] achieves minimum 1 at x = 0 and maximum 5 at x = 2.

  • Note: the domain matters—changing the interval changes where extrema occur.

🌉 Intermediate value theorem

🌉 The zero-crossing lemma

Lemma 3.3.7: Let f : [a, b] → ℝ be continuous. If f(a) < 0 and f(b) > 0, then there exists c ∈ (a, b) such that f(c) = 0.

Proof technique (bisection):

  1. Start with a₁ = a, b₁ = b.
  2. At each step, check the midpoint: (aₙ + bₙ)/2.
  3. If f(midpoint) ≥ 0, set aₙ₊₁ = aₙ and bₙ₊₁ = midpoint.
  4. If f(midpoint) < 0, set aₙ₊₁ = midpoint and bₙ₊₁ = bₙ.
  5. The sequences {aₙ} and {bₙ} are monotone and bounded, so they converge.
  6. The interval length bₙ − aₙ = (b − a)/2ⁿ⁻¹ → 0, so both converge to the same limit c.
  7. By construction, f(aₙ) < 0 and f(bₙ) ≥ 0 for all n.
  8. By continuity, f(c) = lim f(aₙ) ≤ 0 and f(c) = lim f(bₙ) ≥ 0, so f(c) = 0.
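
The bisection steps above translate almost line for line into code. A minimal sketch (function and tolerance names are ours) assuming f is continuous with f(a) < 0 and f(b) > 0:

```python
# Bisection sketch following the proof: keep f(a) < 0 and f(b) >= 0
# while halving the interval, so both endpoints converge to a root.
def find_root_bisection(f, a, b, tol=1e-10):
    assert f(a) < 0 < f(b)
    while b - a > tol:
        mid = (a + b) / 2
        if f(mid) >= 0:
            b = mid          # keep f(b) >= 0
        else:
            a = mid          # keep f(a) < 0
    return (a + b) / 2

f = lambda x: x**3 - 2 * x**2 + x - 1   # the polynomial of Example 3.3.9
root = find_root_bisection(f, 1.0, 2.0)
assert abs(root - 1.7549) < 1e-3        # matches the excerpt's approximation
assert abs(f(root)) < 1e-8
```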

🌉 Bolzano's intermediate value theorem

Theorem 3.3.8 (Bolzano's intermediate value theorem): Let f : [a, b] → ℝ be continuous. Suppose y ∈ ℝ is such that f(a) < y < f(b) or f(a) > y > f(b). Then there exists c ∈ (a, b) such that f(c) = y.

What it means: A continuous function on a closed interval achieves all values between the values at the endpoints.

Proof: Define g(x) = f(x) − y (or g(x) = y − f(x) depending on the case) and apply Lemma 3.3.7 to g.

🔧 Bisection method (practical application)

Example 3.3.9: The polynomial f(x) = x³ − 2x² + x − 1 has a real root in (1, 2).

  • f(1) = −1 and f(2) = 1, so by the intermediate value theorem, there exists c ∈ (1, 2) with f(c) = 0.
  • To approximate the root:
    • f(1.5) = −0.625, so root is in (1.5, 2).
    • f(1.75) ≈ −0.016, so root is in (1.75, 2).
    • f(1.875) ≈ 0.44, so root is in (1.75, 1.875).
    • Continue until desired precision is reached.
    • The actual root is c ≈ 1.7549.

Advantages of bisection:

  • Simple and guaranteed to work for any continuous function.
  • Once an interval is found where the theorem applies, a root can be found to any desired precision in finitely many steps.
  • Not just for polynomials—works for any continuous function.

Limitation: The theorem guarantees one c, but there may be other roots; the bisection method finds only one.

🔢 Applications to polynomials

Proposition 3.3.10: Every polynomial of odd degree has at least one real root.

Proof sketch:

  1. Write f(x) = a_d xᵈ + a_{d−1} xᵈ⁻¹ + ⋯ + a₁x + a₀ with a_d ≠ 0 and d odd.
  2. Divide by a_d to get the monic polynomial g(x) = xᵈ + b_{d−1} xᵈ⁻¹ + ⋯ + b₁x + b₀.
  3. Show that for large n, the highest-order term dominates: g(n) > 0 for some large n.
  4. Similarly, g(−n) < 0 for some large n (using that d is odd, so (−n)ᵈ = −nᵈ).
  5. Apply the intermediate value theorem to find c with g(c) = 0.

Example 3.3.11 (existence of kth roots): For any k ∈ ℕ and any y > 0, there exists x > 0 such that xᵏ = y.

  • Define f(x) = xᵏ − y.
  • f(0) = −y < 0.
  • If y < 1, then f(1) = 1 − y > 0.
  • If y > 1, then f(y) = yᵏ − y = y(yᵏ⁻¹ − 1) > 0.
  • Apply Bolzano's theorem to find x > 0 with f(x) = 0, i.e., xᵏ = y.
  • This proves the existence of √2, ∛5, etc., without the hard work of Example 1.2.3.

⚠️ Discontinuous functions with intermediate value property

Example 3.3.12: The function f(x) = sin(1/x) for x ≠ 0 and f(0) = 0 is not continuous at 0, but it has the intermediate value property.

  • Whenever a < b and y is between f(a) and f(b), there exists c ∈ (a, b) with f(c) = y.
  • Don't confuse: the intermediate value property does not imply continuity.

📦 Image of continuous functions on closed intervals

📦 Corollary on the image

Corollary 3.3.13: If f : [a, b] → ℝ is continuous, then the direct image f([a, b]) is a closed and bounded interval or a single number.

What it means:

  • The image is not just any set—it has a specific structure.
  • The endpoints of the image interval are the absolute minimum and maximum of f (by the extreme value theorem).
  • All values between the minimum and maximum are achieved (by the intermediate value theorem).

Hint for proof: See Figure 3.8 and notice what the endpoints of the image interval are.

📊 Summary comparison

| Property | Requires | Fails when | Example of failure |
| --- | --- | --- | --- |
| Boundedness | Continuous, closed, bounded interval | Open interval, unbounded interval, or discontinuous | f(x) = 1/x on (0, 1) |
| Achieves minimum/maximum | Continuous, closed, bounded interval | Open interval, unbounded interval, or discontinuous | f(x) = x on ℝ |
| Intermediate value | Continuous, interval | Discontinuous | f with jump discontinuity |
| Image is closed interval | Continuous, closed bounded interval | Open interval or discontinuous | f(x) = 1/x on (0, 1) |

3.4 Uniform continuity

🧭 Overview

🧠 One-sentence thesis

Uniform continuity strengthens ordinary continuity by requiring a single δ to work for all points simultaneously, and continuous functions on closed bounded intervals are always uniformly continuous.

📌 Key points (3–5)

  • What uniform continuity adds: the δ in the ε–δ definition depends only on ε, not on the point c, so one δ works for all points at once.
  • Key theorem: every continuous function on a closed and bounded interval [a, b] is uniformly continuous.
  • Common confusion: a function can be continuous everywhere but not uniformly continuous (e.g., x² on all of ℝ); the domain matters crucially.
  • Lipschitz continuity: a stronger condition where |f(x) − f(y)| ≤ K|x − y| for some constant K; every Lipschitz function is uniformly continuous.
  • Extension result: a function on an open interval (a, b) is uniformly continuous if and only if it extends continuously to the closed interval [a, b].

🔍 What uniform continuity means

🔍 The definition and key difference

Uniform continuity: Let S ⊂ ℝ and f : S → ℝ be a function. The function f is uniformly continuous if for every ε > 0 there exists a δ > 0 such that whenever x, c ∈ S and |x − c| < δ, then |f(x) − f(c)| < ε.

  • The only difference from ordinary continuity: δ depends only on ε, not on the point c.
  • In ordinary continuity, δ may depend on both ε and c; in uniform continuity, δ depends only on ε and works for all c ∈ S simultaneously.
  • The domain S matters: a function may be uniformly continuous on one set but not on a larger set.
  • Don't confuse: x and c are treated symmetrically in the definition; neither is special.

🧮 Example: x² on [0, 1] is uniformly continuous

  • On the interval [0, 1], the function f(x) = x² is uniformly continuous.
  • Why: For x, c in [0, 1], we have |x² − c²| = |x + c||x − c| ≤ (|x| + |c|)|x − c| ≤ (1 + 1)|x − c| = 2|x − c|.
  • Given ε > 0, choose δ = ε/2. Then |x − c| < δ implies |x² − c²| ≤ 2|x − c| < 2δ = ε.
  • The key: the bound 2 on |x + c| is uniform over the entire interval [0, 1].

❌ Example: x² on ℝ is not uniformly continuous

  • On all of ℝ, the function g(x) = x² is not uniformly continuous.
  • Why: Suppose it were uniformly continuous. For any ε > 0, there would exist δ > 0 such that |x − c| < δ implies |x² − c²| < ε.
  • Take x > 0 and c = x + δ/2. Then |x − c| = δ/2 < δ, but |x² − c²| = |x + c||x − c| = (2x + δ/2)(δ/2) ≥ δx.
  • This gives ε > δx, so x < ε/δ for all x > 0, which is impossible.
  • Intuition: As x grows, the function becomes steeper, so no single δ can control the variation everywhere.
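A quick numeric check (an illustration, not a proof) makes the contrast between the two examples concrete: the choice δ = ε/2 works on [0, 1], but the same δ fails far out on ℝ (the sample points are our choices):

```python
# Contrast: delta = eps/2 controls |x^2 - c^2| on [0, 1], but no single
# delta works on all of R.

eps = 0.1
delta = eps / 2

# On [0, 1]: |x - c| < delta forces |x^2 - c^2| <= 2|x - c| < eps.
pairs = [(0.0, 0.04), (0.5, 0.54), (0.95, 0.99)]
assert all(abs(x**2 - c**2) < eps for c, x in pairs)

# On R: the same delta fails once the points are far out, since
# |x^2 - c^2| = |x + c||x - c| grows with x.
c = 100.0
x = c + delta / 2           # still within delta of c
print(abs(x**2 - c**2))     # about 5, far bigger than eps = 0.1
```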

❌ Example: 1/x on (0, 1) is not uniformly continuous

  • The function f(x) = 1/x on (0, 1) is not uniformly continuous.
  • Why: The inequality ε > |1/x − 1/y| = |y − x|/(xy) holds if and only if |x − y| < xyε.
  • For ε < 1, suppose δ > 0 works. Take x in (0, 1) and y = x + δ/2 in (0, 1). Then |x − y| = δ/2 < δ.
  • The uniform continuity condition would require δ/2 < x(x + δ/2)ε < x, so δ/2 < x for every sufficiently small x > 0, which forces δ ≤ 0, a contradiction.
  • Intuition: Near 0, the function blows up, so no single δ can control the variation near the boundary.

🎯 Main theorem: continuity on closed bounded intervals

🎯 Continuous on [a, b] implies uniformly continuous

Theorem: Let f : [a, b] → ℝ be a continuous function. Then f is uniformly continuous.

  • This is a fundamental result: on closed and bounded intervals, continuity automatically upgrades to uniform continuity.
  • Proof strategy (by contrapositive): Assume f is not uniformly continuous. Show there exists a point c in [a, b] where f is not continuous.
  • If f is not uniformly continuous, there exists ε > 0 such that for every δ > 0, there exist x, y in [a, b] with |x − y| < δ but |f(x) − f(y)| ≥ ε.
  • Build sequences {xₙ} and {yₙ} with |xₙ − yₙ| < 1/n and |f(xₙ) − f(yₙ)| ≥ ε.
  • By Bolzano–Weierstrass, {xₙ} has a convergent subsequence {xₙₖ} → c in [a, b].
  • The corresponding subsequence {yₙₖ} also converges to c (since |yₙₖ − c| ≤ |yₙₖ − xₙₖ| + |xₙₖ − c| < 1/nₖ + |xₙₖ − c| → 0).
  • But |f(xₙₖ) − f(yₙₖ)| ≥ ε, so at least one of {f(xₙₖ)} or {f(yₙₖ)} cannot converge to f(c), contradicting continuity at c.

🔑 Why the theorem needs closed and bounded

  • Bounded is needed: Bolzano–Weierstrass requires the sequence to lie in a bounded set to extract a convergent subsequence.
  • Closed is needed: The limit c of the subsequence must lie in the domain [a, b] for f(c) to be defined.
  • Example: x² is continuous on ℝ (unbounded) but not uniformly continuous.
  • Example: 1/x is continuous on (0, 1) (not closed) but not uniformly continuous.

🔗 Continuous extension

🔗 Uniformly continuous functions preserve Cauchy sequences

Lemma: Let S ⊂ ℝ and let f : S → ℝ be uniformly continuous. Let {xₙ} be a Cauchy sequence in S. Then {f(xₙ)} is Cauchy.

  • Why: Given ε > 0, uniform continuity gives δ > 0 such that |x − y| < δ implies |f(x) − f(y)| < ε.
  • Since {xₙ} is Cauchy, find M such that for all n, k ≥ M, we have |xₙ − xₖ| < δ.
  • Then for all n, k ≥ M, we have |f(xₙ) − f(xₖ)| < ε, so {f(xₙ)} is Cauchy.
  • Key point: The limit of {xₙ} may not be in S, but {f(xₙ)} is still Cauchy (and thus convergent in ℝ).

🔗 Extension to closed intervals

Proposition: A function f : (a, b) → ℝ is uniformly continuous if and only if the limits Lₐ = lim_{x→a} f(x) and L_b = lim_{x→b} f(x) exist and the function f̃ : [a, b] → ℝ defined by f̃(x) = f(x) for x ∈ (a, b), f̃(a) = Lₐ, f̃(b) = L_b is continuous.

  • One direction (easy): If f̃ is continuous on [a, b], then f̃ is uniformly continuous by the main theorem, so f (the restriction) is also uniformly continuous.
  • Other direction: Suppose f is uniformly continuous on (a, b). Must show the limits Lₐ and L_b exist.
  • Take any sequence {xₙ} in (a, b) with xₙ → a. Since {xₙ} is Cauchy, {f(xₙ)} is Cauchy by the lemma, so {f(xₙ)} converges to some limit.
  • Show this limit is independent of the choice of sequence: if {yₙ} also converges to a, then for large n, |xₙ − yₙ| < δ, so |f(xₙ) − f(yₙ)| < ε; since ε > 0 was arbitrary, the two limits coincide.
  • Thus Lₐ = lim_{x→a} f(x) exists. Similarly for L_b.
  • The extended function f̃ is continuous at a and b by definition of the limit, and continuous at c ∈ (a, b) because f is continuous there.

🔗 Application: removable singularities

  • If f : (−1, 0) ∪ (0, 1) → ℝ is uniformly continuous, then lim_{x→0} f(x) exists.
  • The function has a removable singularity at 0: we can extend f to a continuous function on (−1, 1) by defining f(0) = lim_{x→0} f(x).
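As an illustration (the function sin(x)/x is our choice of example, not the excerpt's), a uniformly continuous function on (−1, 0) ∪ (0, 1) has matching one-sided limits at 0, so the singularity is removable:

```python
import math

# sin(x)/x is undefined at 0, but both one-sided limits are 1,
# so setting f(0) = 1 gives a continuous extension.

def f(x):
    return math.sin(x) / x

left  = [f(-1 / 10**n) for n in range(1, 8)]   # sequence from the left
right = [f(1 / 10**n) for n in range(1, 8)]    # sequence from the right
print(left[-1], right[-1])  # both approach 1.0

def f_tilde(x):
    return 1.0 if x == 0 else math.sin(x) / x   # the continuous extension
```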

🏃 Lipschitz continuity

🏃 Definition and basic property

Lipschitz continuity: A function f : S → ℝ is Lipschitz continuous if there exists a K ∈ ℝ such that |f(x) − f(y)| ≤ K|x − y| for all x and y in S.

  • The constant K is called a Lipschitz constant.
  • Proposition: Every Lipschitz continuous function is uniformly continuous.
  • Proof: Given ε > 0, take δ = ε/K (for K > 0; if K = 0 the function is constant and the claim is trivial). Then |x − y| < δ implies |f(x) − f(y)| ≤ K|x − y| < Kδ = ε.
  • Don't confuse: Lipschitz is stronger than uniform continuity; not every uniformly continuous function is Lipschitz.

📐 Geometric interpretation

  • For x ≠ y, the Lipschitz condition says |f(x) − f(y)|/|x − y| ≤ K.
  • The quantity |f(x) − f(y)|/|x − y| is the absolute value of the slope of the secant line between (x, f(x)) and (y, f(y)).
  • Interpretation: f is Lipschitz continuous if and only if every secant line has slope (in absolute value) at most K.
  • The function cannot be "too steep" anywhere.

✅ Examples of Lipschitz functions

| Function | Domain | Lipschitz? | Constant K | Why |
| --- | --- | --- | --- | --- |
| sin(x) | ℝ | Yes | 1 | every secant slope is at most 1 in absolute value |
| cos(x) | ℝ | Yes | 1 | every secant slope is at most 1 in absolute value |
| √x | [1, ∞) | Yes | 1/2 | secant slope equals 1/(√x + √y) ≤ 1/2 |
| √x | [0, ∞) | No | — | secant lines through (0, 0) become vertical as x → 0 |
| 1/x | (c, ∞), c > 0 | Yes | 1/c² | (exercise) |
| 1/x | (0, ∞) | No | — | secant lines become arbitrarily steep near 0 |

🔍 √x on [0, ∞): uniformly continuous but not Lipschitz

  • The function g(x) = √x on [0, ∞) is uniformly continuous but not Lipschitz.
  • Not Lipschitz: Suppose |√x − √y| ≤ K|x − y| for all x, y ≥ 0. Set y = 0 to get √x ≤ Kx, so 1/K² ≤ x for all x > 0, which is impossible.
  • Uniformly continuous: On [0, 1], it is uniformly continuous by the main theorem. On [1, ∞), it is Lipschitz (hence uniformly continuous). Combining these (exercise), √x is uniformly continuous on [0, ∞).
  • Key lesson: Uniform continuity is weaker than Lipschitz continuity; there exist functions that are uniformly continuous but not Lipschitz.
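A numeric sketch of the two claims: secant slopes of √x through the origin blow up (so no Lipschitz constant works), while on [1, ∞) every secant slope stays below 1/2 (the sample points are our choices):

```python
import math

# Secant slopes of sqrt: unbounded near 0, bounded by 1/2 on [1, inf).

def secant_slope(x, y):
    return abs(math.sqrt(x) - math.sqrt(y)) / abs(x - y)

# Slopes through (0, 0) grow without bound as x -> 0:
slopes = [secant_slope(10.0**-n, 0.0) for n in range(1, 6)]
print(slopes)  # roughly [3.16, 10.0, 31.6, 100.0, 316.2]

# Yet on [1, inf) every secant slope is at most 1/2 (Lipschitz with K = 1/2):
assert all(secant_slope(x, y) <= 0.5
           for x in (1.0, 2.0, 9.0) for y in (4.0, 16.0, 100.0))
```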

3.5 Limits at infinity

🧭 Overview

🧠 One-sentence thesis

Limits at infinity extend the notion of limits to describe how functions behave as the input variable grows arbitrarily large (positive or negative), and these limits connect to sequential limits while allowing functions to converge to finite values or diverge to infinity.

📌 Key points (3–5)

  • What "limit at infinity" means: the function approaches a finite value L as the input x grows beyond any bound (positive or negative).
  • Cluster points at infinity: infinity (or negative infinity) is a cluster point of a set S if you can always find elements in S larger (or smaller) than any given real number.
  • Connection to sequences: a continuous limit as x → ∞ equals L if and only if every sequence going to infinity produces a sequence of function values converging to L.
  • Common confusion: continuous limits vs. sequential limits—sin(πn) → 0 as a sequence (n ∈ ℕ), but sin(πx) has no limit as x → ∞ (x ∈ ℝ) because the function oscillates.
  • Infinite limits: functions can "diverge to infinity" in a controlled way, written as lim f(x) = ∞, meaning f(x) grows beyond any bound.

🔍 Core definitions

🔍 Cluster point at infinity

Cluster point at infinity: ∞ is a cluster point of S ⊂ ℝ if for every M ∈ ℝ, there exists an x ∈ S such that x ≥ M.

  • This means S contains arbitrarily large elements.
  • Similarly, −∞ is a cluster point if S contains arbitrarily small (negative) elements: for every M ∈ ℝ, there exists x ∈ S with x ≤ M.
  • Example: the set [0, ∞) has ∞ as a cluster point; the set (−∞, 0] has −∞ as a cluster point.

🔍 Limit as x → ∞

Limit as x → ∞: f(x) converges to L as x goes to ∞ if for every ε > 0, there exists M ∈ ℝ such that |f(x) − L| < ε whenever x ∈ S and x ≥ M.

  • Notation: lim (x → ∞) f(x) = L, or f(x) → L as x → ∞.
  • The definition for x → −∞ is analogous: replace "x ≥ M" with "x ≤ M."
  • The limit L, if it exists, is unique (Proposition 3.5.2).

📐 Examples and non-examples

✅ Example: f(x) = 1/(|x| + 1)

  • Claim: lim (x → ∞) f(x) = 0 and lim (x → −∞) f(x) = 0.
  • Why: given ε > 0, choose M large enough so that 1/(M + 1) < ε. If x ≥ M, then 0 < 1/(|x| + 1) = 1/(x + 1) ≤ 1/(M + 1) < ε.
  • The function gets arbitrarily close to 0 as x grows.

❌ Non-example: f(x) = sin(πx)

  • Claim: lim (x → ∞) sin(πx) does not exist.
  • Why: at x = 2n + 1/2 (n ∈ ℕ), f(x) = 1; at x = 2n + 3/2, f(x) = −1. These values cannot both be within a small ε of a single real number L.
  • Don't confuse: the sequence sin(πn) → 0 as n → ∞ (n ∈ ℕ), because sin(πn) = 0 for all integers n. The continuous limit and the sequential limit are different concepts.
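The distinction can be checked numerically (an illustration of the sequences named above):

```python
import math

# Two sequences going to infinity along which sin(pi*x) takes different
# constant values, so the continuous limit cannot exist -- while along
# the integers the values are all 0.

peaks   = [2*n + 0.5 for n in range(1, 6)]   # sin(pi*x) = 1 here
troughs = [2*n + 1.5 for n in range(1, 6)]   # sin(pi*x) = -1 here

print([round(math.sin(math.pi * x)) for x in peaks])    # [1, 1, 1, 1, 1]
print([round(math.sin(math.pi * x)) for x in troughs])  # [-1, -1, -1, -1, -1]

# The integer sequence sin(pi*n) is identically 0 (up to floating error):
assert all(abs(math.sin(math.pi * n)) < 1e-9 for n in range(1, 100))
```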

🔄 Continuous vs. sequential limits

  • Notation ambiguity: lim (n → ∞) typically means n ∈ ℕ (sequence); lim (x → ∞) means x ∈ ℝ (continuous variable).
  • If confusion is possible, write explicitly: lim (n → ∞, n ∈ ℕ) sin(πn).

🔗 Connection to sequences

🔗 Lemma 3.5.5: sequential characterization

Lemma: lim (x → ∞) f(x) = L if and only if lim (n → ∞) f(xₙ) = L for all sequences {xₙ} in S such that lim (n → ∞) xₙ = ∞.

  • Forward direction: if f(x) → L as x → ∞, then every sequence xₙ → ∞ produces f(xₙ) → L.
    • Given ε > 0, find M such that |f(x) − L| < ε for all x ≥ M.
    • Since xₙ → ∞, there exists N such that xₙ ≥ M for all n ≥ N.
    • Thus |f(xₙ) − L| < ε for all n ≥ N.
  • Reverse direction (contrapositive): if f(x) does not go to L, then there exists a sequence xₙ → ∞ such that f(xₙ) does not converge to L.
    • There exists ε > 0 such that for every n ∈ ℕ, there exists xₙ ∈ S with xₙ ≥ n and |f(xₙ) − L| ≥ ε.
    • The sequence {xₙ} satisfies xₙ ≥ n, so xₙ → ∞, but {f(xₙ)} does not converge to L.

🔗 Why this matters

  • This lemma allows translating results about sequential limits into results about continuous limits.
  • It provides a tool to prove or disprove limits at infinity by examining all possible sequences.

♾️ Infinite limits

♾️ Divergence to infinity

Diverges to infinity: f(x) diverges to infinity as x → ∞ if for every N ∈ ℝ, there exists M ∈ ℝ such that f(x) > N whenever x ∈ S and x ≥ M.

  • Notation: lim (x → ∞) f(x) = ∞, or f(x) → ∞ as x → ∞.
  • This is not a "true" limit (∞ is not a real number), but it distinguishes a specific type of divergence.
  • Similar definitions exist for:
    • f(x) → −∞ as x → ∞
    • f(x) → ∞ or −∞ as x → −∞
    • f(x) → ∞ or −∞ as x → c (finite c)

♾️ Example: (1 + x²)/(1 + x)

  • Claim: lim (x → ∞) (1 + x²)/(1 + x) = ∞.
  • Proof: for x ≥ 1, we have (1 + x²)/(1 + x) ≥ x²/(x + x) = x/2.
  • Given N ∈ ℝ, take M = max{2N + 1, 1}. If x ≥ M, then x ≥ 1 and x/2 > N, so (1 + x²)/(1 + x) ≥ x/2 > N.

♾️ Terminology note

  • Some sources say "converges to infinity"; the excerpt uses "diverges to infinity" to emphasize that ∞ is not a real limit.
  • Exercise 3.5.4 asks to show that if f(x) → ∞, then f(x) does not converge (to any real number).

🧩 Composition of limits

🧩 Proposition 3.5.8: composing functions

Proposition: Suppose f: A → B, g: B → ℝ, a is a cluster point of A, and b is a cluster point of B. Suppose lim (x → a) f(x) = b and lim (y → b) g(y) = c, where additionally g(b) = c if b ∈ B. Then lim (x → a) g(f(x)) = c.

  • Here a, b, c can be real numbers, ∞, or −∞.
  • Key requirement: if b ∈ B (i.e., b is a real number in the domain of g), then g must be continuous at b: g(b) = c.
  • This generalizes earlier composition results (Exercises 3.1.9 and 3.1.14) to include infinite limits.

🧩 Example: h(x) = e^(−x² + x)

  • Claim: lim (x → ∞) h(x) = 0.
  • Why:
    • First, lim (x → ∞) (−x² + x) = −∞ (the quadratic term dominates).
    • Second, lim (y → −∞) eʸ = 0 (standard exponential fact).
    • By composition, lim (x → ∞) e^(−x² + x) = 0.
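A numeric sanity check of the composition (the sample points are our choice):

```python
import math

# The inner limit -x^2 + x -> -inf drives the outer exponential to 0.

def h(x):
    return math.exp(-x**2 + x)

values = [h(x) for x in (1, 2, 5, 10, 20)]
print(values)  # rapidly decreasing toward 0

assert values == sorted(values, reverse=True)  # shrinking along these samples
assert h(20) < 1e-100
```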

🔄 Relationship to finite limits

🔄 Exercise 3.5.2: transforming limits

  • Let f: [1, ∞) → ℝ. Define g: (0, 1] → ℝ by g(x) = f(1/x).
  • Then lim (x → 0⁺) g(x) exists if and only if lim (x → ∞) f(x) exists, and they are equal.
  • This shows that limits at infinity can be "transformed" into limits at a finite point via a change of variable.

🔄 Exercise 3.5.7: sequences as functions

  • A sequence {xₙ} can be viewed as a function f: ℕ → ℝ with f(n) = xₙ.
  • The two notions of limit—lim (n → ∞) xₙ (sequence) and lim (x → ∞) f(x) (continuous)—are equivalent.
  • This formalizes the connection between discrete and continuous limits.

3.6 Monotone functions and continuity

🧭 Overview

🧠 One-sentence thesis

Monotone functions have at most countably many discontinuities, their one-sided limits can be computed via suprema and infima, and strictly monotone functions always have continuous inverses.

📌 Key points (3–5)

  • One-sided limits via sup/inf: For monotone functions, every one-sided limit exists (when it makes sense) and equals the supremum or infimum of the function values on the appropriate side.
  • Discontinuity constraint: A monotone function on an interval can have at most countably many discontinuities—not uncountably many.
  • Continuity ↔ interval image: A non-constant monotone function on an interval has an interval as its image if and only if it is continuous.
  • Inverse continuity: If a function is strictly monotone on an interval, its inverse is automatically continuous (even if the original function is not continuous).
  • Common confusion: Strictly monotone does not require continuity, but the inverse will still be continuous; also, a monotone function can be discontinuous on a dense set like the rationals yet still bounded.

📐 Definitions and basic properties

📐 Monotone and strictly monotone

Increasing: A function f on S is increasing if x < y implies f(x) ≤ f(y).

Strictly increasing: A function f on S is strictly increasing if x < y implies f(x) < f(y).

  • Decreasing and strictly decreasing are defined by reversing the inequalities for f.
  • Monotone means either increasing or decreasing.
  • Strictly monotone means either strictly increasing or strictly decreasing.
  • Alternative terminology: some authors say "nondecreasing" instead of "increasing" to emphasize that the function need not increase strictly.

Why the "strictly" distinction matters:

  • Increasing allows flat segments (f(x) = f(y) even when x ≠ y).
  • Strictly increasing forbids flat segments; it is automatically one-to-one (injective).

🔄 Symmetry between increasing and decreasing

  • If f is increasing, then −f is decreasing, and vice versa.
  • Many results can be proved for increasing functions only; the decreasing case follows by applying the result to −f.

🧮 One-sided limits and suprema/infima

🧮 Computing limits via sup and inf (Proposition 3.6.2)

For an increasing function f on S and a point c:

| Limit direction | Formula | Interpretation |
| --- | --- | --- |
| Left-hand limit (x → c⁻) | sup{f(x) : x < c, x ∈ S} | The limit from the left is the supremum of all values to the left |
| Right-hand limit (x → c⁺) | inf{f(x) : x > c, x ∈ S} | The limit from the right is the infimum of all values to the right |
| Limit at infinity (x → ∞) | sup{f(x) : x ∈ S} | The limit as x → ∞ is the overall supremum |
| Limit at −∞ (x → −∞) | inf{f(x) : x ∈ S} | The limit as x → −∞ is the overall infimum |

For a decreasing function g, the formulas swap sup and inf.

Key insight:

  • All one-sided limits exist whenever they make sense (i.e., whenever c is a cluster point of the appropriate side).
  • The limits may be infinite, but they always exist in the extended real sense.

🔍 Why this works (sketch of proof)

  • For increasing f and left-hand limit: let a = sup{f(x) : x < c, x ∈ S}.
  • If a = ∞, given any M, there exists x_M < c with f(x_M) > M; since f is increasing, all x between x_M and c satisfy f(x) ≥ f(x_M) > M.
  • If a < ∞, given ε > 0, there exists x_ε < c with f(x_ε) > a − ε; for x between x_ε and c, we have a − ε < f(x_ε) ≤ f(x) ≤ a, so |f(x) − a| < ε.

🚫 Discontinuities of monotone functions

🚫 At most countably many (Corollary 3.6.4)

Theorem: If f is monotone on an interval I, then f has at most countably many discontinuities.

Why this is true:

  • Suppose f is increasing and c is a discontinuity (not an endpoint).
  • Let a = lim (x → c⁻) f(x) and b = lim (x → c⁺) f(x).
  • Since c is a discontinuity, a < b (there is a "jump").
  • For each discontinuity c, pick a rational number q in the open interval (a, b).
  • For an increasing function, the jump intervals (a, b) at different discontinuities are disjoint, so the chosen rational numbers are distinct.
  • This defines an injection from the set of discontinuities into ℚ.
  • Since ℚ is countable, the set of discontinuities is countable.

Don't confuse:

  • "At most countably many" allows finitely many or countably infinitely many, but not uncountably many.
  • A monotone function can have infinitely many discontinuities (see Example 3.6.5) but cannot be discontinuous everywhere.

📊 Example: countably many discontinuities (Example 3.6.5)

The excerpt describes a function f : [0, 1] → ℝ defined by:

  • f(x) = x + Σ_{n=0}^{⌊1/(1−x)⌋} 2^(−n) for x < 1,
  • f(1) = 3.

Properties:

  • f is strictly increasing.
  • f is bounded.
  • f has a discontinuity at every point of the form 1 − 1/k for k ∈ ℕ.
  • There are countably many discontinuities, yet f is defined on a closed bounded interval.

Takeaway:

  • Even a bounded, strictly increasing function on a compact interval can have infinitely many discontinuities.
  • The excerpt also notes that one can construct a monotone function discontinuous on a dense set (e.g., all rationals).
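A direct floating-point evaluation of the function in Example 3.6.5 (an approximate sketch; the straddle width 1e−9 and the sampled k values are our choices) exhibits jumps of size 2^(−k):

```python
import math

# At x = 1 - 1/k the floor term gains a summand, producing a jump of 2**(-k).

def f(x):
    if x == 1:
        return 3.0
    n_max = math.floor(1 / (1 - x))
    return x + sum(2.0**(-n) for n in range(0, n_max + 1))

def jump_at(k, eps=1e-9):
    c = 1 - 1/k
    return f(c + eps) - f(c - eps)   # straddle the jump point

for k in (2, 3, 4):
    print(k, round(jump_at(k), 6))  # jump is about 2**(-k)
```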

🔗 Continuity and interval images

🔗 Monotone + continuous ↔ interval image (Corollary 3.6.3)

Theorem: If I is an interval and f : I → ℝ is monotone and not constant, then f(I) is an interval if and only if f is continuous.

Forward direction (continuous ⇒ interval image):

  • Suppose f is continuous and increasing.
  • Take two points f(x₁), f(x₂) in f(I) with f(x₁) < f(x₂).
  • Since f is increasing, x₁ < x₂.
  • By the intermediate value theorem, every y between f(x₁) and f(x₂) is achieved at some c ∈ (x₁, x₂) ⊂ I.
  • Hence f(I) is an interval.

Reverse direction (interval image ⇒ continuous):

  • Prove by contrapositive: assume f is not continuous at some interior point c ∈ I.
  • Let a = lim (x → c⁻) f(x) and b = lim (x → c⁺) f(x).
  • Since c is a discontinuity, a < b.
  • No point in the open interval (a, b) except possibly f(c) is in f(I).
  • But there exist x₁ < c and x₂ > c in I, so f(x₁) ≤ a and f(x₂) ≥ b are both in f(I).
  • Hence f(I) is not an interval (it has a gap).

Example scenario:

  • A strictly increasing function with a jump discontinuity at c: values to the left approach a, values to the right start at b > a, so the interval (a, b) is missing from the image.

🔄 Inverse functions and continuity

🔄 Strictly monotone ⇒ continuous inverse (Proposition 3.6.6)

Theorem: If I is an interval and f : I → ℝ is strictly monotone, then the inverse f⁻¹ : f(I) → I is continuous.

Why strictly monotone functions are injective:

  • If x ≠ y, assume x < y.
  • Strictly increasing: f(x) < f(y), so f(x) ≠ f(y).
  • Strictly decreasing: f(x) > f(y), so f(x) ≠ f(y).
  • Hence f is one-to-one and has an inverse on its range.

Why the inverse is continuous (sketch for strictly increasing f):

  • The inverse f⁻¹ is also strictly increasing.
  • Take c ∈ f(I); we want to show f⁻¹ is continuous at c.
  • Compute the one-sided limits of f⁻¹ at c using sup and inf (since f⁻¹ is monotone).
  • Let x₀ = lim (y → c⁻) f⁻¹(y) = sup{f⁻¹(y) : y < c, y ∈ f(I)} = sup{x ∈ I : f(x) < c}.
  • Let x₁ = lim (y → c⁺) f⁻¹(y) = inf{f⁻¹(y) : y > c, y ∈ f(I)} = inf{x ∈ I : f(x) > c}.
  • Since f is strictly increasing, for x > x₀ we have f(x) > c, so {x ∈ I : x > x₀} ⊂ {x ∈ I : f(x) > c}.
  • Taking infima: x₀ ≥ x₁.
  • But f⁻¹ is increasing, so x₀ ≤ x₁.
  • Hence x₀ = x₁, so the left and right limits agree, and f⁻¹ is continuous at c.

🔍 Example: discontinuous f, continuous f⁻¹ (Example 3.6.7)

Define f : ℝ → ℝ by:

  • f(x) = x if x < 0,
  • f(x) = x + 1 if x ≥ 0.

Properties:

  • f is strictly increasing.
  • f is not continuous at 0 (jump discontinuity).
  • The image of ℝ is (−∞, 0) ∪ [1, ∞), which is not an interval.
  • The inverse f⁻¹ : (−∞, 0) ∪ [1, ∞) → ℝ is given by:
    • f⁻¹(y) = y if y < 0,
    • f⁻¹(y) = y − 1 if y ≥ 1.
  • f⁻¹ is continuous on its domain.

Don't confuse:

  • The proposition does not require f to be continuous.
  • If f(I) happens to be an interval, then by Corollary 3.6.3, both f and f⁻¹ are continuous.
  • If f(I) is not an interval, f may be discontinuous, but f⁻¹ is still continuous.
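Example 3.6.7 is easy to encode directly (a sketch; the sample points are our choices), and the round trips confirm that f⁻¹ inverts f on both pieces:

```python
# f is strictly increasing with a jump at 0, yet its inverse
# (defined on the non-interval image) is continuous.

def f(x):
    return x if x < 0 else x + 1

def f_inv(y):
    # domain: (-inf, 0) union [1, inf)
    return y if y < 0 else y - 1

# Round trips recover the input on both pieces:
for x in (-2.5, -0.25, 0.0, 1.0, 3.5):
    assert f_inv(f(x)) == x

# The gap [0, 1) is missing from the image:
assert f(-1e-9) < 0 and f(0.0) == 1.0
```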

🧩 Summary: strictly monotone on intervals

| Condition | f continuous? | f(I) an interval? | f⁻¹ continuous? |
| --- | --- | --- | --- |
| f strictly monotone, I interval | Not necessarily | Not necessarily | Always |
| f strictly monotone and continuous, I interval | Yes | Yes | Yes |
| f strictly monotone, f(I) = J interval | Yes (by Corollary 3.6.3) | Yes | Yes |

Key takeaway:

  • Strict monotonicity on an interval guarantees the inverse is continuous, regardless of whether f itself is continuous.
  • If both f and f⁻¹ map intervals to intervals and are onto, then both are continuous.

4.1 The derivative

🧭 Overview

🧠 One-sentence thesis

The derivative captures the instantaneous rate of change of a function at a point by measuring the slope of the tangent line to its graph, and it obeys algebraic rules (linearity, product, quotient, and chain rules) that allow us to differentiate complex functions systematically.

📌 Key points (3–5)

  • What the derivative measures: the slope of the tangent line to the graph at a point, obtained as the limit of the difference quotient as x approaches c.
  • Differentiability implies continuity: if a function is differentiable at a point, it must be continuous there (but continuity does not guarantee differentiability).
  • Algebraic structure: derivatives are linear (constant multiples and sums), and products and quotients follow specific rules (product rule and quotient rule).
  • Chain rule for composition: the derivative of a composite function is the product of the derivatives of the outer and inner functions, evaluated appropriately.
  • Common confusion: not every continuous function is differentiable (example: absolute value at zero), and the derivative of a product is not the product of derivatives.

📐 Definition and core concept

📐 What is the derivative

Derivative at c: If f is a function on an interval I and c is in I, the derivative of f at c is the limit L = lim (x→c) [f(x) - f(c)] / (x - c), denoted f′(c), provided this limit exists.

  • The expression [f(x) - f(c)] / (x - c) is called the difference quotient.
  • It represents the slope of the secant line through the points (c, f(c)) and (x, f(x)).
  • As x approaches c, the secant line approaches the tangent line at (c, f(c)).
  • The derivative f′(c) is the slope of this tangent line.

🎯 Geometric interpretation

  • The secant line through (c, f(c)) and (x, f(x)) has slope [f(x) - f(c)] / (x - c).
  • Taking the limit as x → c gives the tangent line with slope f′(c).
  • This captures the "instantaneous" rate of change at the point c.

✅ When differentiability is defined

  • The excerpt allows c to be an endpoint of a closed interval I.
  • Some calculus books exclude endpoints, but including them simplifies the theory.
  • If f is differentiable at all points in I, we say f is differentiable and obtain a function f′ : I → ℝ.

🔗 Relationship to continuity

🔗 Differentiability implies continuity

Proposition: If f is differentiable at c, then f is continuous at c.

Why this is true:

  • We can write f(x) - f(c) = [f(x) - f(c)] / (x - c) · (x - c).
  • The first factor has limit f′(c) as x → c.
  • The second factor has limit 0 as x → c.
  • Therefore lim (x→c) [f(x) - f(c)] = f′(c) · 0 = 0, so lim (x→c) f(x) = f(c).

⚠️ Continuity does not imply differentiability

Example: The absolute value function f(x) = |x| is continuous at 0 but not differentiable there.

  • For x > 0: [|x| - |0|] / (x - 0) = x / x = 1.
  • For x < 0: [|x| - |0|] / (x - 0) = -x / x = -1.
  • The left and right limits differ, so the limit does not exist.

Don't confuse: A function can be continuous everywhere but differentiable nowhere (Weierstrass example mentioned but not constructed in the excerpt).
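The one-sided difference quotients for |x| at 0 can be computed directly (a numeric restatement of the bullets above):

```python
# One-sided difference quotients of |x| at 0 lock onto +1 and -1,
# so no two-sided limit exists.

def dq(f, c, x):
    return (f(x) - f(c)) / (x - c)

right = [dq(abs, 0.0, 10.0**-n) for n in range(1, 6)]
left  = [dq(abs, 0.0, -(10.0**-n)) for n in range(1, 6)]

print(right)  # [1.0, 1.0, 1.0, 1.0, 1.0]
print(left)   # [-1.0, -1.0, -1.0, -1.0, -1.0]
```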

🧮 Basic examples

🧮 Power function: f(x) = x²

  • For any c, the difference quotient is (x² - c²) / (x - c) = (x + c)(x - c) / (x - c) = x + c.
  • Taking the limit as x → c gives f′(c) = 2c.

📏 Linear function: f(x) = ax + b

  • The difference quotient is [a(x - c)] / (x - c) = a for x ≠ c.
  • Therefore f′(c) = a (constant slope everywhere).
  • The excerpt notes: "every differentiable function 'infinitesimally' behaves like the affine function ax + b."

√ Square root: f(x) = √x

  • For c > 0 and x > 0, multiply numerator and denominator by the conjugate:
    • (√x - √c) / (x - c) = 1 / (√x + √c).
  • Taking the limit as x → c gives f′(c) = 1 / (2√c).
  • This shows f is differentiable for x > 0.
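A numeric check of the three worked examples (the sample point c = 1.5 and step h are our choices): the difference quotient approaches 2c, a, and 1/(2√c) respectively.

```python
import math

# One-sided difference quotient, evaluated at a small step h.
def dq(f, c, h):
    return (f(c + h) - f(c)) / h

c, h = 1.5, 1e-7
print(dq(lambda x: x**2, c, h))       # close to 2c = 3.0
print(dq(lambda x: 4*x + 7, c, h))    # close to a = 4.0
print(dq(math.sqrt, c, h))            # close to 1/(2*sqrt(1.5)) ~ 0.4082
```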

⚙️ Algebraic rules

⚙️ Linearity of the derivative

Proposition (Linearity): If f and g are differentiable at c and α is a constant:

  1. (αf)′(c) = α f′(c) (constant multiple rule).
  2. (f + g)′(c) = f′(c) + g′(c) (sum rule).

Why linearity holds:

  • For constant multiples: [αf(x) - αf(c)] / (x - c) = α · [f(x) - f(c)] / (x - c).
  • For sums: [f(x) + g(x) - f(c) - g(c)] / (x - c) = [f(x) - f(c)] / (x - c) + [g(x) - g(c)] / (x - c).
  • Both limits exist by properties of limits, giving the stated formulas.

🔄 Product rule (Leibniz rule)

Proposition (Product rule): If f and g are differentiable at c and h(x) = f(x)g(x), then:

  • h′(c) = f(c)g′(c) + f′(c)g(c).

Key identity (illustrated in Figure 4.2):

  • f(x)g(x) - f(c)g(c) = f(x)[g(x) - g(c)] + [f(x) - f(c)]g(c).
  • This splits the change in area into two rectangles.
  • Dividing by (x - c) and taking limits gives the product rule.

Don't confuse: The derivative of a product is NOT the product of the derivatives.

➗ Quotient rule

Proposition (Quotient rule): If f and g are differentiable at c, g(x) ≠ 0 for all x in I, and h(x) = f(x) / g(x), then:

  • h′(c) = [f′(c)g(c) - f(c)g′(c)] / [g(c)]².

The proof is left as an exercise; one approach is to find the derivative of 1/x first, then use the chain rule and product rule.

🔗 Chain rule for composition

🔗 Statement of the chain rule

Proposition (Chain rule): Suppose:

  • g : I₁ → I₂ is differentiable at c ∈ I₁.
  • f : I₂ → ℝ is differentiable at g(c).
  • h(x) = (f ∘ g)(x) = f(g(x)).

Then h is differentiable at c and:

  • h′(c) = f′(g(c)) · g′(c).

🔍 How the proof works

Key idea: Define auxiliary functions u and v that "capture" the difference quotients:

  • u(y) = [f(y) - f(d)] / (y - d) if y ≠ d, and u(d) = f′(d) where d = g(c).
  • v(x) = [g(x) - g(c)] / (x - c) if x ≠ c, and v(c) = g′(c).

Why this works:

  • u is continuous at d because f is differentiable at d.
  • v is continuous at c because g is differentiable at c.
  • For any x and y: f(y) - f(d) = u(y)(y - d) and g(x) - g(c) = v(x)(x - c).
  • Substituting y = g(x) gives: h(x) - h(c) = u(g(x)) · v(x) · (x - c).
  • Dividing by (x - c): [h(x) - h(c)] / (x - c) = u(g(x)) · v(x).
  • Taking the limit as x → c: both u(g(x)) → f′(g(c)) and v(x) → g′(c) by continuity.

Example application: This rule tells us how a derivative changes when we change variables in a function.


4.2 Mean value theorem

🧭 Overview

🧠 One-sentence thesis

The mean value theorem guarantees that for a continuous, differentiable function on an interval, there exists at least one interior point where the instantaneous rate of change (derivative) equals the average rate of change over the entire interval.

📌 Key points (3–5)

  • Core claim: For a continuous function on [a,b] that is differentiable on (a,b), there exists a point c in (a,b) where f'(c) equals the slope of the secant line from (a, f(a)) to (b, f(b)).
  • Foundation—Rolle's theorem: If a differentiable function has the same value at both endpoints of an interval, its derivative must be zero at some interior point.
  • Critical points: At any relative minimum or maximum in the interior of an interval, the derivative must be zero (if it exists).
  • Common confusion: A function can be strictly increasing even if its derivative is zero at some points (e.g., x cubed at x = 0); the converse of "positive derivative implies strictly increasing" does not hold.
  • Why it matters: The theorem enables solving differential equations, characterizing increasing/decreasing functions, locating extrema, and proving that derivatives satisfy the intermediate value property even when discontinuous.

🏔️ Relative extrema and critical points

🏔️ What relative extrema are

Relative maximum: A function f has a relative maximum at c in S if there exists a δ > 0 such that for all x in S where |x − c| < δ, we have f(x) ≤ f(c). Relative minimum is defined analogously.

  • Unlike absolute extrema (the tallest peak or lowest valley in the entire range), relative extrema are local peaks and valleys.
  • The derivative is a "local concept"—like walking in fog, it can tell you whether you are at the top of some peak, but not whether it is the highest peak overall.

🎯 Critical points and their role

Critical point: A point c where f'(c) = 0 (or where f' does not exist, in broader usage).

Lemma 4.2.2: If f is differentiable at c in (a,b) and has a relative minimum or maximum at c, then f'(c) = 0.

  • Why this works: At a relative maximum, the difference quotient is ≤ 0 when approaching from the right (x > c) and ≥ 0 when approaching from the left (x < c). Taking limits from both sides forces f'(c) = 0.
  • How to find extrema: Check all critical points (where f'(c) = 0 or f' does not exist) plus the endpoints of the interval; evaluate f at each and compare.
  • Example: If f has a peak at c inside (a,b), secants from the left have non-negative slope, secants from the right have non-positive slope, so the tangent (the limit) must have slope zero.

🎲 Rolle's theorem

🎲 The theorem

Rolle's theorem: Let f : [a,b] → ℝ be continuous on [a,b], differentiable on (a,b), and satisfy f(a) = f(b). Then there exists c in (a,b) such that f'(c) = 0.

  • Geometric intuition: If a function starts and ends at the same height, it must have a horizontal tangent somewhere in between (either at a peak, a valley, or a flat section).
  • Proof idea: Since f is continuous on [a,b], it attains an absolute maximum and minimum. If either occurs in the interior (a,b), apply Lemma 4.2.2 to get f'(c) = 0. If both occur at the endpoints, f is constant, so f'(x) = 0 everywhere.

⚠️ Necessity of differentiability everywhere

  • Don't confuse: Rolle's theorem requires differentiability at all x in (a,b).
  • Example: f(x) = |x| on [−1, 1] satisfies f(−1) = f(1) = 1, but there is no c where f'(c) = 0 because f is not differentiable at 0 (the only candidate point).

📐 Mean value theorem

📐 The main theorem

Mean value theorem (Theorem 4.2.4): Let f : [a,b] → ℝ be continuous on [a,b] and differentiable on (a,b). Then there exists c in (a,b) such that f(b) − f(a) = f'(c)(b − a).

  • Rewritten: f'(c) = (f(b) − f(a)) / (b − a), meaning the derivative at c equals the slope of the secant line connecting the endpoints.
  • Geometric interpretation: There is at least one point c where the tangent line is parallel to the secant line from (a, f(a)) to (b, f(b)).
  • Name origin: The slope of the secant is the "mean value" (average) of the derivative over [a,b], and the theorem says this average is actually achieved at some interior point.

🔧 Proof technique

  • How it works: Define a helper function g(x) = f(x) − f(b) − [(f(b) − f(a)) / (b − a)](x − b).
  • This g is constructed so that g(a) = g(b) = 0 and g' = f' minus the constant slope of the secant.
  • Apply Rolle's theorem to g: there exists c where g'(c) = 0, which means f'(c) = (f(b) − f(a)) / (b − a).
  • Generalization: Cauchy's mean value theorem (Theorem 4.2.5) extends this to two functions f and φ, yielding [f(b) − f(a)]φ'(c) = f'(c)[φ(b) − φ(a)].
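For a concrete function, the point c promised by the theorem can be located numerically. Here f(x) = x³ on [0, 1] is an illustrative choice: f' = 3x² is increasing on [0, 1], so bisection finds where the derivative matches the secant slope (the exact answer is c = 1/√3):

```python
# Mean value theorem, numerically: find c in (0, 1) with f'(c) = secant slope.
a, b = 0.0, 1.0
f = lambda x: x ** 3
fprime = lambda x: 3 * x ** 2
slope = (f(b) - f(a)) / (b - a)   # equals 1 here

# bisection on f'(x) - slope, which is increasing on [0, 1]
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if fprime(mid) - slope < 0:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2
assert abs(fprime(c) - slope) < 1e-9
assert abs(c - 3 ** -0.5) < 1e-9   # c = 1/sqrt(3), as predicted
```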

🚗 Real-world example

  • Speed enforcement: Police measure the time a car takes to travel between two points. The mean value theorem guarantees the car must have attained the average speed (distance / time) at some moment, so if the average exceeds the limit, the car was speeding.

🔍 Applications of the mean value theorem

🔍 Solving differential equations

Proposition 4.2.6: If f : I → ℝ is differentiable and f'(x) = 0 for all x in interval I, then f is constant.

  • Proof idea: For any x < y in I, the mean value theorem gives f(y) − f(x) = f'(c)(y − x) for some c in (x,y). Since f'(c) = 0, we have f(y) = f(x).
  • This is the first "differential equation" solved: the only functions with zero derivative everywhere are constants.

📈 Characterizing increasing and decreasing functions

Increasing: f(x) ≤ f(y) whenever x < y. Strictly increasing: f(x) < f(y) whenever x < y. (Decreasing and strictly decreasing reverse the inequalities.)

Proposition 4.2.7: Let f : I → ℝ be differentiable on interval I.

  • f is increasing if and only if f'(x) ≥ 0 for all x in I.
  • f is decreasing if and only if f'(x) ≤ 0 for all x in I.

Proposition 4.2.8: (One direction only)

  • If f'(x) > 0 for all x in I, then f is strictly increasing.
  • If f'(x) < 0 for all x in I, then f is strictly decreasing.

Don't confuse: The converse of Proposition 4.2.8 is false. Example: f(x) = x³ is strictly increasing everywhere, but f'(0) = 0.

🎯 First derivative test for extrema

Proposition 4.2.9: Let f : (a,b) → ℝ be continuous, let c ∈ (a,b), and suppose f is differentiable on (a,c) and (c,b).

  • If f'(x) ≤ 0 on (a,c) and f'(x) ≥ 0 on (c,b), then f has an absolute minimum at c.

  • If f'(x) ≥ 0 on (a,c) and f'(x) ≤ 0 on (c,b), then f has an absolute maximum at c.

  • How it works: The derivative's sign tells us f is decreasing before c and increasing after c (or vice versa), so c must be a valley (or peak).

  • Proof technique: Use Proposition 4.2.7 to show f is decreasing/increasing on each side, then use continuity at c and sequences approaching c to conclude f(x) ≥ f(c) (or ≤) for all x.

  • Note: To find relative extrema, restrict f to a smaller interval (c − δ, c + δ) and apply the test.

  • Converse does not hold: See Example 4.2.12 below.

🔗 Differentiability at endpoints

Proposition 4.2.10:

  • If f : [a,b) → ℝ is continuous, differentiable on (a,b), and lim (x→a) f'(x) = L, then f is differentiable at a and f'(a) = L.

  • (Analogous statement for the right endpoint b.)

  • This allows you to "extend" differentiability to an endpoint if the derivative has a limit there.

  • The proof uses the mean value theorem (Exercise 4.2.13).

🌊 Intermediate value property of derivatives

🌊 Darboux's theorem

Theorem 4.2.11 (Darboux): Let f : [a,b] → ℝ be differentiable. If y is between f'(a) and f'(b) (i.e., f'(a) < y < f'(b) or f'(a) > y > f'(b)), then there exists c in (a,b) such that f'(c) = y.

  • What it means: Derivatives satisfy the intermediate value theorem—even if f' is not continuous, it cannot "jump over" any value.
  • Proof idea: Define g(x) = yx − f(x). Then g'(x) = y − f'(x), so g'(a) > 0 and g'(b) < 0 (assuming f'(a) < y < f'(b)). Since g is continuous on [a,b], it attains a maximum. Because g'(a) > 0, there are points just to the right of a where g exceeds g(a), so the maximum is not at a; similarly, g'(b) < 0 rules out b. Therefore, g attains its maximum at some c in (a,b), where g'(c) = 0, i.e., f'(c) = y.

🚨 Discontinuous derivatives exist

Example 4.2.12: The function f(x) = x² sin²(1/x) for x ≠ 0 and f(0) = 0 is differentiable everywhere, but f' is not continuous at 0.

  • Why f is differentiable at 0: The difference quotient [f(x) − f(0)] / x = x sin²(1/x) satisfies |x sin²(1/x)| ≤ |x|, which goes to 0 as x → 0. So f'(0) = 0.
  • Why f' is discontinuous at 0: For x ≠ 0, f'(x) = 2x sin²(1/x) − 2 sin(1/x) cos(1/x). By choosing sequences x_n = 1/[(8n+1)π/4] and y_n = 1/[(8n+3)π/4], we get lim f'(x_n) = −1 and lim f'(y_n) = 1, so f' has no limit at 0.
  • Additional property: f has an absolute minimum at 0 (since f(x) ≥ 0 and f(0) = 0), yet f' changes sign infinitely often near 0. This shows the converse of Proposition 4.2.9 is false.
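The two sequences can be checked numerically; the closed-form derivative below comes from differentiating x² sin²(1/x) for x ≠ 0:

```python
import math

def fprime(x):
    # d/dx [x**2 * sin(1/x)**2] for x != 0
    return 2 * x * math.sin(1 / x) ** 2 - 2 * math.sin(1 / x) * math.cos(1 / x)

n = 10 ** 6
x_n = 1 / ((8 * n + 1) * math.pi / 4)   # here sin(1/x) = cos(1/x) = sqrt(2)/2
y_n = 1 / ((8 * n + 3) * math.pi / 4)   # here sin(1/x) = sqrt(2)/2, cos(1/x) = -sqrt(2)/2
assert abs(fprime(x_n) - (-1)) < 1e-3   # f'(x_n) -> -1
assert abs(fprime(y_n) - 1) < 1e-3      # f'(y_n) -> +1
```

Both sequences tend to 0, yet f' evaluated along them tends to −1 and +1 respectively, so f' has no limit at 0.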

📚 Continuously differentiable functions

Continuously differentiable: f is differentiable and f' is continuous. Notation: f ∈ C¹(I).

  • It is sometimes useful to assume f' is continuous to avoid pathologies like Example 4.2.12.
  • Darboux's theorem shows that even without continuity, derivatives cannot have jump discontinuities—they can only have oscillatory discontinuities.

4.3 Taylor’s theorem

🧭 Overview

🧠 One-sentence thesis

Taylor's theorem generalizes the mean value theorem by showing that any n-times differentiable function can be approximated near a point by a polynomial, with an error term that behaves like (x − x₀) raised to the (n+1) power.

📌 Key points (3–5)

  • What Taylor's theorem does: extends the mean value theorem from first-derivative approximation to higher-order polynomial approximation using multiple derivatives.
  • The Taylor polynomial: a polynomial built from the function's derivatives at a point x₀, matching the function's first n derivatives at that point.
  • The remainder term: measures the approximation error and involves the (n+1)th derivative at some unknown point c between x₀ and x.
  • Common confusion: the Taylor polynomial may approximate well near x₀ but can fail far away; convergence of the Taylor series is not guaranteed everywhere, and even when it converges it may not equal the function.
  • Practical application: the second derivative test for finding local minima/maxima is a direct consequence of Taylor's theorem.

📐 Higher-order derivatives

📐 Building up derivatives

First derivative f′: the derivative of f.
Second derivative f″: the derivative of f′.
nth derivative f⁽ⁿ⁾: the result of differentiating f a total of n times.

  • When f possesses n derivatives, we say f is n times differentiable.
  • Notation: f′, f″, f‴, f⁗, then f⁽ⁿ⁾ for larger n (to avoid excessive prime marks).
  • Each derivative is itself a function from the interval I to the real numbers.

🧮 The Taylor polynomial and theorem

🧮 What the Taylor polynomial is

nth order Taylor polynomial for f at x₀:
P_x₀^n(x) = f(x₀) + f′(x₀)(x − x₀) + [f″(x₀)/2](x − x₀)² + [f‴(x₀)/6](x − x₀)³ + … + [f⁽ⁿ⁾(x₀)/n!](x − x₀)ⁿ

  • This is a sum from k = 0 to n of [f⁽ᵏ⁾(x₀) / k!] times (x − x₀)ᵏ.
  • The polynomial is constructed so that its kth derivative at x₀ equals f⁽ᵏ⁾(x₀) for k = 0, 1, 2, …, n.
  • Why it works: matching derivatives at x₀ makes the polynomial behave like f near that point.

🎯 Taylor's theorem statement

Taylor's Theorem: If f has n continuous derivatives on [a, b] and f⁽ⁿ⁺¹⁾ exists on (a, b), then for distinct points x₀ and x in [a, b], there exists a point c between x₀ and x such that:
f(x) = P_x₀^n(x) + [f⁽ⁿ⁺¹⁾(c) / (n+1)!](x − x₀)^(n+1)

  • The extra term is called the remainder term R_x₀^n(x), written in Lagrange form.
  • The point c depends on both x and x₀ (and is generally unknown).
  • Connection to mean value theorem: when n = 0, this reduces to the mean value theorem (approximation by a constant plus a first-derivative correction).

🔍 How the approximation works

  • The error behaves like (x − x₀)^(n+1) near x₀.
  • For large n, (x − x₀)^(n+1) becomes very small in a small interval around x₀, so the approximation improves.
  • Example: The sine function at x₀ = 0 has odd-degree Taylor polynomials (even-degree terms vanish because even derivatives of sine are zero at the origin); higher-degree polynomials track sine more closely near zero.
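The sine example can be made concrete. This sketch compares sin with its degree-5 Taylor polynomial at x₀ = 0; since the degree-6 coefficient of sine vanishes, the polynomial equals P⁶ as well, and the Lagrange remainder with n = 6 bounds the error by |x|⁷/7!:

```python
import math

def P5(x):
    # degree-5 Taylor polynomial of sin at x0 = 0
    return x - x ** 3 / 6 + x ** 5 / 120

for x in [0.1, 0.5, 1.0]:
    err = abs(math.sin(x) - P5(x))
    bound = abs(x) ** 7 / math.factorial(7)   # Lagrange remainder bound
    assert err <= bound
```

Note how the bound shrinks rapidly as x approaches 0, matching the claim that the error behaves like (x − x₀)^(n+1).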

⚠️ Limitations and convergence

⚠️ Where approximations can fail

  • Near vs. far: Taylor polynomials are good approximations near x₀ but not necessarily everywhere.
  • Example: The function x/(1 − x) expanded around 0 gives polynomials that approximate well for x in (−1, 1) but get worse outside that interval (even as degree increases for x < −1).
  • The excerpt shows graphs where the 20th-degree polynomial still fails for x < −1.

⚠️ Taylor series and analytic functions

Taylor series: the infinite sum (from k = 0 to ∞) of [f⁽ᵏ⁾(x₀) / k!](x − x₀)ᵏ, when f is infinitely differentiable.

  • No guarantee of convergence: the series may not converge for any x ≠ x₀.
  • No guarantee it equals f: even where the series converges, it may converge to something other than f(x).
  • Analytic functions: functions whose Taylor series converges to the function itself in some open interval around every point x₀; many common functions are analytic, but not all smooth functions are.
  • Don't confuse: infinitely differentiable ≠ analytic; a function can have derivatives of all orders yet its Taylor series may not represent it.

🧪 Application: the second derivative test

🧪 Using Taylor's theorem for optimization

Strict relative minimum at c: there exists δ > 0 such that f(x) > f(c) for all x in (c − δ, c + δ) with x ≠ c.
Strict relative maximum: defined similarly with f(x) < f(c).

Second Derivative Test (Proposition 4.3.3): Suppose f is twice continuously differentiable on (a, b), x₀ is in (a, b), f′(x₀) = 0, and f″(x₀) > 0. Then f has a strict relative minimum at x₀.

🧪 Why the test works

  • By continuity of f″, there exists δ > 0 such that f″(c) > 0 for all c near x₀.
  • Taylor's theorem (n = 1) gives: f(x) = f(x₀) + f′(x₀)(x − x₀) + [f″(c)/2](x − x₀)².
  • Since f′(x₀) = 0, this simplifies to f(x) = f(x₀) + [f″(c)/2](x − x₀)².
  • Because f″(c) > 0 and (x − x₀)² > 0, we have f(x) > f(x₀) for x ≠ x₀ near x₀.
  • Generalization: the excerpt mentions an nth derivative test (left as an exercise) that extends this logic to higher derivatives.
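A small sketch of the test on the illustrative polynomial f(x) = x⁴ − 2x² (not from the text), whose critical points are −1, 0, 1; the maximum case uses the analogous statement with f″(x₀) < 0:

```python
# Second derivative test on f(x) = x**4 - 2*x**2.
# f'(x) = 4x**3 - 4x vanishes at x = -1, 0, 1; f''(x) = 12x**2 - 4.
f = lambda x: x ** 4 - 2 * x ** 2

for x0, d2 in [(-1.0, 8.0), (0.0, -4.0), (1.0, 8.0)]:
    assert abs((12 * x0 ** 2 - 4) - d2) < 1e-12   # value of f''(x0)
    # nearby points confirm the min/max behavior predicted by the sign of f''(x0)
    nearby = [f(x0 - 0.01), f(x0 + 0.01)]
    if d2 > 0:
        assert all(v > f(x0) for v in nearby)   # strict relative minimum
    else:
        assert all(v < f(x0) for v in nearby)   # strict relative maximum
```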

🔧 Proof sketch and key ideas

🔧 How Taylor's theorem is proved

  • Step 1: Define a number M_(x,x₀) so that f(x) = P_x₀^n(x) + M_(x,x₀)(x − x₀)^(n+1).
  • Step 2: Construct an auxiliary function g(s) = f(s) − P_x₀^n(s) − M_(x,x₀)(s − x₀)^(n+1).
  • Step 3: Show that g(x₀) = g′(x₀) = g″(x₀) = … = g⁽ⁿ⁾(x₀) = 0 (because the Taylor polynomial matches f's derivatives at x₀).
  • Step 4: Also g(x) = 0 by construction.
  • Step 5: Apply the mean value theorem repeatedly (n+1 times) to find a point c where g⁽ⁿ⁺¹⁾(c) = 0.
  • Step 6: Compute g⁽ⁿ⁺¹⁾(s) = f⁽ⁿ⁺¹⁾(s) − (n+1)! M_(x,x₀), so M_(x,x₀) = f⁽ⁿ⁺¹⁾(c) / (n+1)!.

🔧 Why the polynomial is a good approximation

  • The Taylor polynomial's kth derivative at x₀ equals f⁽ᵏ⁾(x₀) for k = 0, 1, …, n.
  • Matching derivatives means the polynomial "starts out" behaving like f at x₀.
  • The remainder term quantifies how much the function deviates from the polynomial as you move away from x₀.

4.4 Inverse function theorem

🧭 Overview

🧠 One-sentence thesis

The inverse function theorem guarantees that a continuously differentiable function with nonzero derivative at a point has a locally invertible, continuously differentiable inverse whose derivative is the reciprocal of the original function's derivative.

📌 Key points (3–5)

  • Core formula: If f is differentiable at x₀ with f'(x₀) ≠ 0, then the inverse function g satisfies g'(y₀) = 1 / f'(x₀), where y₀ = f(x₀).
  • Local vs global: The theorem guarantees invertibility only on a small interval around x₀, not necessarily on the entire domain.
  • Nonzero derivative requirement: The condition f'(x₀) ≠ 0 is essential; when f'(x₀) = 0, the inverse may fail to be differentiable at the corresponding point.
  • Common confusion: A function may be globally invertible but the inverse function theorem only applies locally where the derivative is nonzero (e.g., x³ is globally invertible but its inverse has no derivative at 0 because the original derivative vanishes there).
  • Application: The theorem rigorously establishes the existence and differentiability of nth roots.

🔗 The basic lemma for inverse derivatives

🔗 Setup and formula

Lemma 4.4.1: Let I, J be intervals. If f : I → J is strictly monotone, onto, differentiable at x₀ ∈ I, and f'(x₀) ≠ 0, then the inverse f⁻¹ is differentiable at y₀ = f(x₀) and (f⁻¹)'(y₀) = 1 / f'(x₀).

  • The lemma assumes f is already globally invertible (strictly monotone and onto).
  • The key insight: the derivative of the inverse is the reciprocal of the derivative of the original function.
  • If f is continuously differentiable and f' is never zero, then f⁻¹ is continuously differentiable.

📐 Geometric intuition

The proof uses the difference quotient relationship:

  • For the inverse g = f⁻¹, write y = f(x).
  • Then [g(y) - g(y₀)] / [y - y₀] = [x - x₀] / [f(x) - f(x₀)].
  • This is the reciprocal of the difference quotient for f.
  • As x → x₀, the left side approaches g'(y₀) and the right side approaches 1 / f'(x₀).

Don't confuse: The formula is not g'(y₀) = 1 / f'(y₀); it is g'(y₀) = 1 / f'(g(y₀)), evaluated at the corresponding point in the domain of f.
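A numerical sketch of the reciprocal formula, using the illustrative function f(x) = x³ + x (strictly increasing since f'(x) = 3x² + 1 > 0, hence globally invertible) and inverting it by bisection:

```python
# Check (f_inv)'(y0) = 1 / f'(x0) at x0 = 1, y0 = f(1) = 2, f'(1) = 4.
f = lambda x: x ** 3 + x

def f_inv(y, lo=-10.0, hi=10.0):
    # bisection: f is strictly increasing, so the preimage is unique
    for _ in range(80):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

eps = 1e-5
g_prime = (f_inv(2 + eps) - f_inv(2 - eps)) / (2 * eps)
assert abs(g_prime - 0.25) < 1e-6   # 1 / f'(1) = 1/4
```

Note that the reciprocal is taken at x₀ = g(y₀) = 1, not at y₀ = 2, exactly as the "Don't confuse" remark warns.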

🎯 The inverse function theorem (local version)

🎯 Statement and guarantees

Theorem 4.4.2 (Inverse function theorem): Let f : (a, b) → ℝ be continuously differentiable and let x₀ ∈ (a, b) be a point where f'(x₀) ≠ 0. Then there exists an open interval I ⊂ (a, b) with x₀ ∈ I such that the restriction of f to I is injective, with a continuously differentiable inverse g : J → I defined on the interval J = f(I), and g'(y) = 1 / f'(g(y)) for all y ∈ J.

  • The theorem does not claim f is invertible on all of (a, b).
  • It only guarantees a small interval I around x₀ where f is locally invertible.
  • The interval I is constructed so that f' has the same sign throughout I (using continuity of f').

🔧 How the proof works

  1. Without loss of generality, assume f'(x₀) > 0.
  2. By continuity of f', there exists an interval I = (x₀ - δ, x₀ + δ) where f'(x) > 0 for all x ∈ I.
  3. By Proposition 4.2.8, f is strictly increasing on I, hence injective.
  4. By continuity of f and the intermediate value theorem, f(I) is an interval J.
  5. Apply Lemma 4.4.1 to conclude that the inverse is continuously differentiable.

Example: For f(x) = x², the derivative f'(x₀) ≠ 0 only when x₀ ≠ 0. If x₀ > 0, we can take I = (0, ∞), but no larger—we cannot include negative numbers because f is not injective on any interval containing both positive and negative numbers.

📏 Application to nth roots

📏 Existence and uniqueness

Corollary 4.4.3: Given n ∈ ℕ and x ≥ 0, there exists a unique number y ≥ 0 (denoted x^(1/n)) such that yⁿ = x. Furthermore, g(x) = x^(1/n) is continuously differentiable on (0, ∞) and g'(x) = 1 / [n · x^((n-1)/n)].

  • For x = 0, the unique root is trivially y = 0.
  • For x > 0, define f(y) = yⁿ on (0, ∞).
  • f is continuously differentiable with f'(y) = n·y^(n-1) > 0 for y > 0.
  • f is strictly increasing, hence injective.
  • f is onto (0, ∞): given x > 0, pick ε and M with εⁿ < x < Mⁿ; the intermediate value theorem then guarantees x ∈ f([ε, M]).
  • The inverse g = f⁻¹ exists and is continuously differentiable by Lemma 4.4.1.

🧮 Derivative formula

Using the inverse derivative formula:

  • g'(x) = 1 / f'(g(x)) = 1 / [n · (x^(1/n))^(n-1)] = 1 / [n · x^((n-1)/n)].
  • This can also be written as (1/n) · x^((1-n)/n).

Convention: x^(m/n) means (x^(1/n))^m.
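The derivative formula can be spot-checked numerically (n = 5 is an illustrative choice):

```python
# g(x) = x**(1/n) and g'(x) = (1/n) * x**((1-n)/n), checked for n = 5.
n = 5
g = lambda x: x ** (1 / n)

for x in [0.5, 2.0, 10.0]:
    # the defining property of the nth root: g(x)**n == x
    assert abs(g(x) ** n - x) < 1e-9
    # derivative formula vs. a symmetric difference quotient
    eps = 1e-6
    numeric = (g(x + eps) - g(x - eps)) / (2 * eps)
    formula = (1 / n) * x ** ((1 - n) / n)
    assert abs(numeric - formula) < 1e-6
```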

⚠️ When the theorem fails

⚠️ The zero derivative case

Example 4.4.5: Consider f(x) = x³.

  • f : ℝ → ℝ is globally one-to-one and onto, so f⁻¹(y) = y^(1/3) exists on the entire real line.
  • f has a continuous derivative everywhere.
  • However, f⁻¹ has no derivative at the origin.
  • The reason: f'(0) = 0.
  • At the origin, the inverse function has a vertical tangent.

Key lesson: The condition f'(x₀) ≠ 0 is not just technical—it is essential for the inverse to be differentiable.

⚠️ Local vs global invertibility

| Situation | f globally invertible? | Inverse differentiable everywhere? | Why? |
|---|---|---|---|
| f(x) = ax, a ≠ 0 | Yes | Yes | f'(x) = a ≠ 0 everywhere |
| f(x) = x² on ℝ | No | N/A | Not injective |
| f(x) = x² on (0, ∞) | Yes (onto (0, ∞)) | Yes | f'(x) = 2x > 0 for x > 0 |
| f(x) = x³ on ℝ | Yes | No (not at 0) | f'(0) = 0 |

Don't confuse: Global invertibility does not guarantee the inverse is differentiable everywhere; you still need the derivative to be nonzero at each point.

⚠️ Continuity of the derivative matters

Example 4.4.6 (from exercises): f(x) = x + 2x² sin(1/x) for x ≠ 0, f(0) = 0.

  • f is differentiable everywhere.
  • f'(0) > 0.
  • Yet f is not invertible on any open interval containing the origin.
  • The issue: f' is not continuous at 0, so the theorem does not apply.

Lesson: The theorem requires f to be continuously differentiable, not just differentiable.


5.1 The Riemann integral

🧭 Overview

🧠 One-sentence thesis

The Riemann integral is defined by approximating the area under a curve with rectangles from above and below, and a function is Riemann integrable when these upper and lower approximations converge to the same value.

📌 Key points (3–5)

  • What the integral measures: The integral represents the area under the curve, not an antiderivative (that connection requires separate proof).
  • How integrability is defined: A bounded function is Riemann integrable when its lower Darboux integral equals its upper Darboux integral.
  • The approximation method: Partitions divide the interval into subintervals; lower sums use infimum heights, upper sums use supremum heights.
  • Common confusion: The integral and the antiderivative are different concepts—computing an antiderivative using the integral is a nontrivial result that must be proved.
  • Refinement makes bounds tighter: Adding more partition points increases lower sums and decreases upper sums, squeezing toward the true integral value.

📐 Partitions and Darboux sums

📐 What a partition is

Partition: A finite set of numbers {x₀, x₁, x₂, ..., xₙ} such that a = x₀ < x₁ < x₂ < ⋯ < xₙ₋₁ < xₙ = b.

  • A partition divides the interval [a, b] into n subintervals.
  • Each subinterval has width Δxᵢ = xᵢ − xᵢ₋₁.
  • Example: {0, 0.5, 1, 2} is a partition of [0, 2] into three subintervals.

📊 Lower and upper Darboux sums

For a bounded function f on [a, b] and partition P:

  • mᵢ = infimum of f(x) on [xᵢ₋₁, xᵢ] (lowest value f takes on that subinterval)
  • Mᵢ = supremum of f(x) on [xᵢ₋₁, xᵢ] (highest value f takes on that subinterval)
  • Lower sum L(P, f) = sum of mᵢ Δxᵢ (area of shaded rectangles using minimum heights)
  • Upper sum U(P, f) = sum of Mᵢ Δxᵢ (area of all rectangles using maximum heights)

Geometric meaning: The lower sum underestimates the area under the curve; the upper sum overestimates it.

🔢 Basic bounds on Darboux sums

If m ≤ f(x) ≤ M for all x in [a, b], then for every partition P:

m(b − a) ≤ L(P, f) ≤ U(P, f) ≤ M(b − a)

  • The lower sum is at least the area of a rectangle of height m.
  • The upper sum is at most the area of a rectangle of height M.
  • Lower sums never exceed upper sums.
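These bounds can be observed directly. A sketch computing Darboux sums for f(x) = x² on [0, 1] with a uniform partition (an illustrative choice); since f is increasing there, the infimum and supremum on each subinterval sit at its endpoints:

```python
# Darboux sums for f(x) = x**2 on [0, 1], uniform partition with n pieces.
n = 100
f = lambda x: x * x
xs = [i / n for i in range(n + 1)]

# f increasing: inf at left endpoint, sup at right endpoint of each piece
L = sum(f(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
U = sum(f(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))

m, M = 0.0, 1.0                          # bounds of f on [0, 1]
assert m * 1 <= L <= U <= M * 1          # the chain of inequalities above
assert L <= 1 / 3 <= U                   # the true area, 1/3, is squeezed between
assert abs((U - L) - (M - m) / n) < 1e-12  # gap telescopes to 1/n for monotone f
```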

🔍 Lower and upper Darboux integrals

🔍 Defining the two auxiliary integrals

Lower Darboux integral: The supremum (least upper bound) of all lower sums L(P, f) over all partitions P of [a, b].

Upper Darboux integral: The infimum (greatest lower bound) of all upper sums U(P, f) over all partitions P of [a, b].

Notation:

  • Lower integral: ∫ from a to b (underbar) of f
  • Upper integral: ∫ from a to b (overbar) of f

Why both are needed: These are defined for all bounded functions, even those that are not Riemann integrable.

🧮 Key inequality

For any bounded function f on [a, b]:

m(b − a) ≤ (lower integral) ≤ (upper integral) ≤ M(b − a)

  • The lower integral never exceeds the upper integral.
  • Both are bounded by the minimum and maximum rectangle areas.

⚠️ When the two integrals differ

Example (Dirichlet function): f(x) = 1 if x is rational, f(x) = 0 if x is irrational, on [0, 1].

  • For any partition, every subinterval contains both rationals and irrationals.
  • So mᵢ = 0 and Mᵢ = 1 for every i.
  • Lower sum L(P, f) = 0 for all P, so lower integral = 0.
  • Upper sum U(P, f) = 1 for all P, so upper integral = 1.
  • The two integrals are different, so this function is not Riemann integrable.

🔧 Refinements and their role

🔧 What a refinement is

Refinement: Partition P̃ is a refinement of partition P if P ⊂ P̃ (P̃ contains all points of P plus possibly more).

  • A refinement cuts the subintervals into smaller pieces.
  • Example: {0, 0.2, 0.5, 1, 1.5, 1.75, 2} is a refinement of {0, 0.5, 1, 2}.

📈 How refinements improve bounds

Key property: If P̃ is a refinement of P, then:

  • L(P, f) ≤ L(P̃, f) (lower sum increases or stays the same)
  • U(P̃, f) ≤ U(P, f) (upper sum decreases or stays the same)

Why this matters: Refinements make the lower and upper sums closer together, tightening the bounds on the integral.
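A tiny sketch of the refinement property for the illustrative choice f(x) = x on [0, 1]:

```python
# Compare Darboux sums for P = {0, 0.5, 1} and its refinement
# P_tilde = {0, 0.25, 0.5, 0.75, 1}.  f is increasing, so inf/sup on each
# subinterval sit at the endpoints.
f = lambda x: x

def darboux(points):
    lower = sum(f(points[i - 1]) * (points[i] - points[i - 1])
                for i in range(1, len(points)))
    upper = sum(f(points[i]) * (points[i] - points[i - 1])
                for i in range(1, len(points)))
    return lower, upper

L1, U1 = darboux([0, 0.5, 1])
L2, U2 = darboux([0, 0.25, 0.5, 0.75, 1])
assert L1 <= L2 and U2 <= U1   # refining raises lower sums, lowers upper sums
assert L1 <= U2 and L2 <= U1   # any lower sum <= any upper sum
```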

🔗 Comparing arbitrary partitions

For any two partitions P₁ and P₂:

  • Form P̃ = P₁ ∪ P₂ (union of both partitions).
  • P̃ is a refinement of both P₁ and P₂.
  • Therefore L(P₁, f) ≤ U(P₂, f) for any two partitions.

This proves that the lower integral ≤ upper integral.

✅ Riemann integrability

✅ Definition of Riemann integrable

Riemann integrable: A bounded function f on [a, b] is Riemann integrable if its lower Darboux integral equals its upper Darboux integral.

Notation: The set of Riemann integrable functions on [a, b] is denoted R[a, b].

The Riemann integral: When f is Riemann integrable, the integral ∫ from a to b of f is defined as the common value of the lower and upper integrals.

Don't confuse: A function must be bounded to be Riemann integrable; unbounded functions are not covered by this definition.

🎯 Practical integrability criterion

Proposition: f is Riemann integrable if and only if for every ε > 0, there exists a partition P such that:

U(P, f) − L(P, f) < ε

Interpretation: The gap between upper and lower sums can be made arbitrarily small.

Geometric meaning: The total area of the "white parts" (unshaded rectangles between upper and lower bounds) can be made smaller than any given ε.

📏 Bounds on the integral

If f is Riemann integrable and m ≤ f(x) ≤ M for all x in [a, b], then:

m(b − a) ≤ ∫ from a to b of f ≤ M(b − a)

  • The integral is bounded by the areas of rectangles with heights m and M.
  • If f(x) ≤ M for all x, then the integral ≤ M(b − a).

🧪 Examples of integration

🧪 Constant functions

If f(x) = c (constant), then f is integrable on [a, b] and:

∫ from a to b of f = c(b − a)

Why: For any partition, mᵢ = Mᵢ = c, so L(P, f) = U(P, f) = c(b − a).

🧪 Step function with one jump

Define f on [0, 2] by: f(x) = 1 if x < 1, f(x) = 1/2 if x = 1, f(x) = 0 if x > 1.

Proof of integrability:

  • For any 0 < ε < 1, take partition P = {0, 1 − ε, 1 + ε, 2}.
  • Lower sum L(P, f) = 1·(1 − ε) + 0·(2ε) + 0·(1 − ε) = 1 − ε.
  • Upper sum U(P, f) = 1·(1 − ε) + 1·(2ε) + 0·(1 − ε) = 1 + ε.
  • Gap: U(P, f) − L(P, f) = 2ε, which can be made arbitrarily small.
  • Therefore f is integrable and ∫ from 0 to 2 of f = 1.

Key technique: Isolate the discontinuity in a small subinterval of width 2ε.
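A sketch reproducing the example's sums, approximating the infimum and supremum on each subinterval by dense sampling (adequate here, since f takes only the values 0, 1/2, and 1):

```python
def f(x):
    return 1.0 if x < 1 else (0.5 if x == 1 else 0.0)

eps = 0.01
P = [0.0, 1 - eps, 1 + eps, 2.0]
L = U = 0.0
for a, b in zip(P, P[1:]):
    # sample each subinterval to approximate inf and sup of f on it
    samples = [f(a + k * (b - a) / 1000) for k in range(1001)]
    L += min(samples) * (b - a)
    U += max(samples) * (b - a)
assert abs(L - (1 - eps)) < 1e-9   # lower sum is 1 - eps
assert abs(U - (1 + eps)) < 1e-9   # upper sum is 1 + eps, gap is 2*eps
```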

🧪 Integrating 1/(1 + x)

To show f(x) = 1/(1 + x) is integrable on [0, b]:

  • Take uniform partition with n subintervals: xᵢ = ib/n.
  • Since f is decreasing, mᵢ = 1/(1 + xᵢ) and Mᵢ = 1/(1 + xᵢ₋₁).
  • Gap: U(P, f) − L(P, f) = (b/n) · sum of [1/(1 + (i−1)b/n) − 1/(1 + ib/n)].
  • The sum telescopes: = (b/n) · [1/1 − 1/(1 + b)] = b²/[n(b + 1)].
  • Choose n large enough so b²/[n(b + 1)] < ε.

Telescoping: Successive terms cancel, leaving only the first and last terms.
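The telescoping identity is easy to confirm numerically (b = 2 and n = 1000 are illustrative choices):

```python
# Verify U(P, f) - L(P, f) = b**2 / (n * (b + 1)) for f(x) = 1/(1 + x)
# on [0, b] with a uniform partition.
b, n = 2.0, 1000
f = lambda x: 1 / (1 + x)
xs = [i * b / n for i in range(n + 1)]

# f is decreasing: sup at left endpoint, inf at right endpoint of each piece
U = sum(f(xs[i - 1]) * (b / n) for i in range(1, n + 1))
L = sum(f(xs[i]) * (b / n) for i in range(1, n + 1))
assert abs((U - L) - b ** 2 / (n * (b + 1))) < 1e-12
```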

📝 Additional notation and conventions

📝 Integration on larger domains

If f is defined on a set S containing [a, b], we say f is Riemann integrable on [a, b] if the restriction of f to [a, b] is Riemann integrable.

📝 Reversed limits and zero-width intervals

Reversed limits: If b < a and f is integrable on [b, a], define:

∫ from a to b of f = − ∫ from b to a of f

Zero-width interval: For any function f, define:

∫ from a to a of f = 0

📝 Dummy variable notation

When the variable x has another meaning, use a different letter:

∫ from a to b of f(s) ds = ∫ from a to b of f(x) dx

The variable of integration is a "dummy variable"—the integral's value doesn't depend on which letter is used.

💡 Conceptual insights

💡 Integral vs antiderivative

Critical distinction: The integral is the area under the curve, nothing else.

  • That we can compute an integral using an antiderivative is a nontrivial result (the Fundamental Theorem of Calculus) that must be proved separately.
  • Don't confuse: The definition of the integral makes no reference to derivatives.

💡 Local vs global

Derivatives are local: They describe behavior at a single point.

Integrals are global: They aggregate information over an entire interval.

  • The integral "sums" f(x) dx over all x in the interval.
  • The integral sign (long S) was chosen by Leibniz to represent summation.
  • Applications: total distance traveled, average temperature, total charge—all require "global" answers.

💡 Why refinements matter

  • Coarser partitions give loose bounds (large gap between upper and lower sums).
  • Finer partitions (refinements) give tighter bounds (smaller gap).
  • Integrability means the gap can be made arbitrarily small by choosing fine enough partitions.

5.2 Properties of the integral

🧭 Overview

🧠 One-sentence thesis

The Riemann integral satisfies additivity, linearity, and monotonicity properties, and continuous functions (even with finitely many discontinuities) are always Riemann integrable.

📌 Key points (3–5)

  • Additivity: Integrating over an interval split at a point equals the sum of integrals over the two subintervals.
  • Linearity: The integral of a constant times a function equals the constant times the integral; the integral of a sum equals the sum of integrals.
  • Monotonicity: If one function is always less than or equal to another, its integral is also less than or equal.
  • Continuous functions are integrable: Any continuous function on a closed interval is Riemann integrable; even functions with finitely many discontinuities are integrable.
  • Common confusion: Changing a function's values at finitely many points does not change the integral—the integral "ignores" isolated point changes.

➕ Additivity of the integral

➕ What additivity means

Additivity property: If you split an interval at an interior point, the integral over the whole interval equals the sum of integrals over the two pieces.

  • The excerpt states: if a < b < c and f is bounded on [a, c], then the integral from a to c equals the integral from a to b plus the integral from b to c.
  • This holds for both lower and upper Darboux integrals (Lemma 5.2.1) and for the Riemann integral itself (Proposition 5.2.2).
  • Why it matters: You can break complicated integrals into simpler pieces.

➕ How the proof works

  • Take partitions P₁ of [a, b] and P₂ of [b, c]; their union P is a partition of [a, c].
  • The lower sum L(P, f) splits naturally: L(P, f) = L(P₁, f) + L(P₂, f).
  • Taking the supremum over all such partitions gives the additivity for lower integrals.
  • The same argument with upper sums (using infimum) gives additivity for upper integrals.
  • For Riemann integrable functions, lower and upper integrals coincide, so the formula holds for the integral.

➕ Integrability on subintervals

  • Corollary 5.2.3: If f is Riemann integrable on [a, b] and [c, d] is a subinterval, then f restricted to [c, d] is also Riemann integrable.
  • Example: A function integrable on [0, 10] is automatically integrable on [3, 7].
  • Don't confuse: The converse (integrable on every subinterval implies integrable on the whole) requires additional argument, as shown in Proposition 5.2.2.

📐 Linearity and monotonicity

📐 Linearity (Proposition 5.2.4)

Linearity: The integral is linear—it respects scalar multiplication and addition.

  • Scalar multiplication: The integral of α times f equals α times the integral of f (for any real α).
  • Addition: The integral of (f + g) equals the integral of f plus the integral of g.
  • Why this works: For α ≥ 0, multiplication by α commutes with infimum and supremum; for α < 0, you need to handle the sign flip (left as an exercise in the excerpt).
  • Example: If you know the integrals of f and g separately, you can immediately compute the integral of any linear combination like 3f + 5g.

📐 Monotonicity (Proposition 5.2.6)

Monotonicity: If f(x) ≤ g(x) for all x, then the integral of f is less than or equal to the integral of g.

  • The proof uses the fact that if f ≤ g pointwise, then the infimum of f on any subinterval is ≤ the infimum of g, so lower sums satisfy L(P, f) ≤ L(P, g).
  • Taking suprema preserves the inequality.
  • The same holds for upper integrals and for Riemann integrals.
  • Example: If one curve lies entirely below another, the area under the first curve is smaller.
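
Monotonicity is easy to observe numerically; a minimal sketch using sin(x) ≤ x on [0, 1] (the helper name is ours):

```python
import math

def integral(h, a, b, n=1000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

# sin(x) <= x for x >= 0, so the integrals compare the same way
low = integral(math.sin, 0.0, 1.0)
high = integral(lambda x: x, 0.0, 1.0)
print(low, high)  # low <= high
```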

📐 Inequalities for Darboux integrals (Proposition 5.2.5)

  • For the sum f + g, the lower integral satisfies an inequality (not equality): lower integral of (f + g) ≥ lower integral of f + lower integral of g.
  • Similarly for upper integrals: upper integral of (f + g) ≤ upper integral of f + upper integral of g.
  • Don't confuse: These are inequalities for Darboux integrals; equality holds only when f and g are both Riemann integrable (by linearity).

🔄 Continuous functions and integrability

🔄 Continuous functions are integrable (Lemma 5.2.7)

If f is continuous on [a, b], then f is Riemann integrable.

  • The proof uses uniform continuity: on a closed bounded interval, a continuous function is uniformly continuous.
  • Given ε > 0, find δ such that |f(x) - f(y)| < ε/(b - a) whenever |x - y| < δ.
  • Take a partition with all subintervals shorter than δ; then on each subinterval, the maximum minus the minimum of f is < ε/(b - a).
  • Summing over all subintervals, the difference between upper and lower sums is < ε.
  • Since ε is arbitrary, the upper and lower integrals coincide, so f is integrable.
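
The shrinking gap between upper and lower sums can be watched directly; a sketch with f = sin on [0, 1], where monotonicity makes the subinterval infima and suprema the endpoint values (helper name is ours):

```python
import math

def darboux_gap(f, a, b, n):
    """U(P, f) - L(P, f) on a uniform partition with n subintervals;
    endpoint values give the exact inf/sup because sin is monotone on [0, 1]."""
    h = (b - a) / n
    gap = 0.0
    for i in range(n):
        y0, y1 = f(a + i * h), f(a + (i + 1) * h)
        gap += (max(y0, y1) - min(y0, y1)) * h
    return gap

gaps = [darboux_gap(math.sin, 0.0, 1.0, n) for n in (10, 100, 1000)]
print(gaps)  # decreasing toward 0
```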

🔄 Functions with finitely many discontinuities (Theorem 5.2.9)

A bounded function with finitely many discontinuities is Riemann integrable.

  • Finitely many discontinuities: There exists a finite set S = {x₁, x₂, ..., xₙ} such that f is continuous at all points outside S.
  • The proof divides [a, b] into subintervals so that f is continuous on the interior of each subinterval.
  • By Lemma 5.2.8, if f is integrable on intervals [aₙ, bₙ] that approach [a, b] from inside, then f is integrable on [a, b].
  • By additivity, f is integrable on the union of these intervals.
  • Example: A function that jumps at three isolated points but is otherwise continuous is integrable.

🔄 Integrability "from the inside" (Lemma 5.2.8)

  • If f is bounded on [a, b] and integrable on every interval [aₙ, bₙ] where a < aₙ < bₙ < b and aₙ → a, bₙ → b, then f is integrable on [a, b].
  • The integral on [a, b] equals the limit of integrals on [aₙ, bₙ].
  • The proof uses boundedness to show that the sequence of integrals is bounded, extracts a convergent subsequence, and shows that the limit must equal both the lower and upper integrals on [a, b].
  • Don't confuse: This does not say f must be continuous at the endpoints; it only needs to be integrable "inside."

🔧 Additional properties

🔧 Changing values at finitely many points (Proposition 5.2.10)

If f is Riemann integrable and g differs from f only at finitely many points, then g is also integrable and has the same integral.

  • The integral "ignores" changes at isolated points.
  • The proof uses additivity to split the interval so that f = g except at endpoints, then applies Lemma 5.2.8.
  • Example: If you redefine a continuous function at five points, the integral does not change.
  • Why this matters: You can "fix" bad values without affecting the integral.

🔧 Monotone functions are integrable (Proposition 5.2.11)

Any monotone (increasing or decreasing) function on [a, b] is Riemann integrable.

  • The excerpt states this result but leaves the proof as an exercise (5.2.14).
  • The hint suggests using a uniform partition (all subintervals of equal length).
  • Example: A function that only increases (or only decreases) is always integrable, even if it has jump discontinuities.
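
The uniform-partition hint can be made concrete: for increasing f the difference of upper and lower sums telescopes to (f(b) − f(a)) · (b − a)/n, even across a jump. A sketch (the test function is ours):

```python
# Increasing function with a jump at 0.5; still integrable.
f = lambda x: x + (1.0 if x >= 0.5 else 0.0)

def gap(f, a, b, n):
    """U(P, f) - L(P, f) on a uniform partition: for increasing f, the sup is
    the right endpoint value and the inf the left, so the sum telescopes."""
    h = (b - a) / n
    return sum((f(a + (i + 1) * h) - f(a + i * h)) * h for i in range(n))

for n in (10, 100, 1000):
    print(n, gap(f, 0.0, 1.0, n))  # equals (f(1) - f(0)) / n = 2 / n
```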

🔧 Summary table

| Property | Statement | Key idea |
| --- | --- | --- |
| Additivity | ∫ₐᶜ f = ∫ₐᵇ f + ∫ᵇᶜ f | Split interval at b |
| Linearity (scalar) | ∫ₐᵇ αf = α ∫ₐᵇ f | α commutes with sup/inf |
| Linearity (sum) | ∫ₐᵇ (f + g) = ∫ₐᵇ f + ∫ₐᵇ g | Requires both integrable |
| Monotonicity | f ≤ g ⇒ ∫ₐᵇ f ≤ ∫ₐᵇ g | Pointwise inequality preserved |
| Continuous ⇒ integrable | f continuous ⇒ f integrable | Use uniform continuity |
| Finitely many jumps OK | Bounded + finitely many discontinuities ⇒ integrable | Split into continuous pieces |
| Point changes ignored | f = g except at finitely many points ⇒ same integral | Isolated points have zero "weight" |
| Monotone ⇒ integrable | f increasing or decreasing ⇒ integrable | Uniform partition works |

Fundamental Theorem of Calculus

5.3 Fundamental theorem of calculus

🧭 Overview

🧠 One-sentence thesis

The fundamental theorem of calculus connects derivatives and integrals by showing that integration and differentiation are inverse operations, enabling us to compute integrals via antiderivatives and vice versa.

📌 Key points (3–5)

  • Two forms of the theorem: the first form computes integrals using antiderivatives (if F′ = f, then the integral of f equals F(b) − F(a)); the second form constructs antiderivatives using integrals (the integral from a to x of f is differentiable with derivative f).
  • Why it matters: the theorem allows explicit computation of integrals when we know an antiderivative, and it solves differential equations by expressing solutions as integrals.
  • Continuity vs differentiability: the second form guarantees F is always continuous, but F is differentiable at c only if f is continuous at c; discontinuities in f can prevent differentiability of F.
  • Common confusion: not every integral can be written in "closed form" using elementary functions—many important functions (like ln x and erf x) are themselves defined as integrals, and numerical approximation is often necessary.
  • Change of variables: the theorem enables substitution techniques for solving integrals, but the hypotheses (continuous differentiability, correct domains) must be satisfied—mindless symbol manipulation leads to errors.

📐 First form: computing integrals via antiderivatives

📐 Statement of the first form

Theorem 5.3.1: Let F : [a, b] → ℝ be continuous and differentiable on (a, b). Let f be Riemann integrable on [a, b] such that f(x) = F′(x) for x in (a, b). Then the integral of f from a to b equals F(b) − F(a).

  • What it says in plain language: if you know a function F whose derivative is f, then the integral of f over [a, b] is just the difference F(b) − F(a).
  • Why this is powerful: instead of computing limits of Riemann sums, you evaluate F at two points and subtract.
  • Example: To compute the integral of x² from 0 to 1, notice that x² is the derivative of (x³)/3. The theorem gives: integral = (1³)/3 − (0³)/3 = 1/3.
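
The example can be double-checked against a Riemann sum; a minimal sketch (midpoint rule, helper name ours):

```python
def riemann(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# F(x) = x**3 / 3 is an antiderivative of x**2, so the integral is F(1) - F(0)
approx = riemann(lambda x: x ** 2, 0.0, 1.0, 1000)
exact = 1.0 ** 3 / 3 - 0.0 ** 3 / 3
print(approx, exact)  # the sum is close to 1/3
```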

🔍 How the proof works

  • Key idea: use the mean value theorem on each subinterval of a partition.
  • For each subinterval [x_{i−1}, x_i], the mean value theorem guarantees a point c_i where f(c_i) Δx_i = F(x_i) − F(x_{i−1}).
  • Summing over all subintervals, the middle terms cancel (telescoping sum), leaving F(b) − F(a).
  • This quantity is squeezed between the lower sum L(P, f) and upper sum U(P, f) for every partition P.
  • Taking supremum and infimum over all partitions, and using that f is Riemann integrable, forces the integral to equal F(b) − F(a).

⚠️ Generalization note

  • The excerpt mentions the theorem can be generalized to allow F to be non-differentiable at finitely many points in [a, b], as long as F remains continuous.
  • This generalization is left as an exercise (Exercise 5.3.3).

🔄 Second form: constructing antiderivatives via integrals

🔄 Statement of the second form

Theorem 5.3.3: Let f : [a, b] → ℝ be Riemann integrable. Define F(x) as the integral of f from a to x. Then: (1) F is continuous on [a, b]; (2) if f is continuous at c in [a, b], then F is differentiable at c and F′(c) = f(c).

  • What it says in plain language: integrating f from a fixed base point a to a variable upper limit x produces a function F that is always continuous; moreover, wherever f is continuous, F is differentiable and its derivative is f.
  • Why this is powerful: it solves the differential equation F′(x) = f(x) by defining F as an integral.
  • Example (implicit): the natural logarithm is defined as ln x = integral from 1 to x of (1/s) ds. The second form guarantees that ln is differentiable and (ln x)′ = 1/x.

🧱 Proof sketch: continuity

  • Because f is bounded (say |f(x)| ≤ M for all x), we have |F(x) − F(y)| = |integral from y to x of f| ≤ M |x − y|.
  • This shows F is Lipschitz continuous, hence continuous.

🧱 Proof sketch: differentiability

  • Assume f is continuous at c. Given ε > 0, choose δ > 0 so that |x − c| < δ implies |f(x) − f(c)| < ε.
  • Then for x in this δ-neighborhood, f(c) − ε < f(x) < f(c) + ε.
  • Integrating this inequality from c to x (or x to c if x < c) and dividing by (x − c), we get: |[F(x) − F(c)]/(x − c) − f(c)| ≤ ε.
  • This proves F′(c) = f(c).
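
The difference-quotient argument can be observed numerically; a sketch with f = cos and c = 0.7 (F approximated by a midpoint sum; names ours):

```python
import math

def F(x, n=20_000):
    """Midpoint approximation of the integral of cos from 0 to x."""
    dx = x / n
    return sum(math.cos((i + 0.5) * dx) for i in range(n)) * dx

c, h = 0.7, 1e-3
dq = (F(c + h) - F(c)) / h
print(dq, math.cos(c))  # difference quotient is close to f(c) = cos(c)
```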

🔀 Arbitrary base point

  • Remark 5.3.4: the theorem still holds if we define F(x) as the integral from any fixed point d in [a, b] to x (not just from a).
  • The proof is left as an exercise (Exercise 5.3.4).

🔗 Relationship between continuity and differentiability

🔗 When f is discontinuous

  • Example from the excerpt: Let f(x) = −1 if x < 0 and f(x) = 1 if x ≥ 0. Then F(x) = integral from 0 to x of f equals |x|.
  • f is discontinuous at 0, and F is not differentiable at 0.
  • Lesson: discontinuity of f at a point can cause F to fail to be differentiable there.

🔗 Converse does not hold

  • Example from the excerpt: Let g(x) = 0 if x ≠ 0 and g(0) = 1. Then G(x) = integral from 0 to x of g equals 0 for all x.
  • g is discontinuous at 0, but G′(0) exists and equals 0.
  • Lesson: f can be discontinuous at c yet F can still be differentiable at c (though F′(c) may not equal f(c)).

📊 Summary table

| Condition on f at c | Guaranteed property of F at c |
| --- | --- |
| f Riemann integrable | F is continuous at c |
| f continuous at c | F is differentiable at c and F′(c) = f(c) |
| f discontinuous at c | F may or may not be differentiable at c |

🧮 Integrals not in closed form

🧮 The "closed form" misconception

  • Common misunderstanding: students often think integrals that cannot be expressed using elementary functions (polynomials, trig, exponentials, etc.) are somehow deficient or unsolvable.
  • Reality: most integrals cannot be computed in closed form. This is normal and not a problem.

🧮 Natural logarithm as an integral

  • The natural logarithm is defined as: ln x = integral from 1 to x of (1/s) ds.
  • Key point: writing ln x does not "simplify" the integral—it merely gives the integral a name.
  • When a computer computes ln x, it often numerically approximates this integral.
  • Lesson: "closed form" is often just a convenient label, not a fundamentally different kind of answer.

🧮 The erf function

  • The error function is defined as: erf(x) = (2/√π) times the integral from 0 to x of e^(−s²) ds.
  • This function appears frequently in applied mathematics.
  • It is simply the antiderivative of (2/√π) e^(−x²) that equals zero at zero.
  • The second form of the fundamental theorem tells us we can write it as an integral; to compute specific values, we numerically approximate the integral.
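
A sketch of that numerical approximation (midpoint rule; compared against Python's built-in math.erf):

```python
import math

def erf_approx(x, n=100_000):
    """(2/sqrt(pi)) * integral of exp(-s**2) from 0 to x, midpoint rule."""
    dx = x / n
    total = sum(math.exp(-((i + 0.5) * dx) ** 2) for i in range(n))
    return (2.0 / math.sqrt(math.pi)) * total * dx

print(erf_approx(1.0), math.erf(1.0))  # the two values agree closely
```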

🔄 Change of variables (u-substitution)

🔄 Statement of the theorem

Theorem 5.3.5 (Change of variables): Let g : [a, b] → ℝ be continuously differentiable, let f : [c, d] → ℝ be continuous, and suppose the image g([a, b]) is contained in [c, d]. Then: integral from a to b of f(g(x)) g′(x) dx = integral from g(a) to g(b) of f(s) ds.

  • What it says in plain language: you can substitute s = g(x), ds = g′(x) dx, and change the limits of integration accordingly.
  • Why it works: define F(y) = integral from g(a) to y of f(s) ds. By the second form, F′(y) = f(y). By the chain rule, (F ∘ g)′(x) = F′(g(x)) g′(x) = f(g(x)) g′(x). Apply the first form to F ∘ g.

🔄 Example of correct use

  • Problem: compute the integral from 0 to √π of x cos(x²) dx.
  • Solution: Let g(x) = x², so g′(x) = 2x. Then x cos(x²) = cos(x²) · (1/2) · 2x = (1/2) cos(g(x)) g′(x).
  • The integral becomes (1/2) times the integral from 0 to π of cos(s) ds = (1/2)[sin(π) − sin(0)] = 0.
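
Both sides of the substitution can be checked numerically; a sketch (midpoint rule, helper name ours):

```python
import math

def midpoint(f, a, b, n=50_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

lhs = midpoint(lambda x: x * math.cos(x ** 2), 0.0, math.sqrt(math.pi))
rhs = 0.5 * midpoint(math.cos, 0.0, math.pi)
print(lhs, rhs)  # both near 0
```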

⚠️ Example of incorrect use

  • Problem: consider the integral from −1 to 1 of [ln|x|]/x dx.
  • Temptation: let g(x) = ln|x|, so g′(x) = 1/x. Try to write: integral from g(−1) to g(1) of s ds = integral from 0 to 0 of s ds = 0.
  • Why this is wrong:
    1. [ln|x|]/x is not continuous on [−1, 1] (undefined at 0 and cannot be made continuous).
    2. [ln|x|]/x is not Riemann integrable on [−1, 1] (it is unbounded).
    3. g(x) = ln|x| is not continuous on [−1, 1], let alone continuously differentiable.
  • Lesson: the hypotheses of the theorem must be satisfied. Do not manipulate symbols mindlessly; verify that the functions and integrals actually make sense.

🛡️ Hypotheses checklist

Before applying change of variables, verify:

  • g is continuously differentiable on [a, b].
  • f is continuous on [c, d].
  • The image of g lies in [c, d].
  • The resulting integrand is Riemann integrable.

The logarithm and the exponential

5.4 The logarithm and the exponential

🧭 Overview

🧠 One-sentence thesis

The natural logarithm and exponential function can be rigorously defined through integral calculus and their unique characterizing properties, which then allow us to extend exponentiation to arbitrary real powers.

📌 Key points (3–5)

  • Logarithm construction: The natural logarithm is uniquely defined as the integral from 1 to x of 1/t, satisfying ln(1) = 0 and derivative 1/x.
  • Exponential as inverse: The exponential function is the inverse of the logarithm, uniquely characterized by E(0) = 1 and the property that its derivative equals itself.
  • Extending exponentiation: For irrational y, x to the power y is defined as exp(y ln(x)), making exponentiation continuous for all real exponents.
  • Common confusion: The logarithm is defined first via integration, then the exponential is defined as its inverse—not the other way around as in elementary calculus.
  • Uniqueness from derivatives: Both functions are uniquely determined by their derivative properties alone: ln'(x) = 1/x with ln(1) = 0, and E'(x) = E(x) with E(0) = 1.

📐 Constructing the logarithm

🔨 Definition via integration

The logarithm is defined as:

L(x) = integral from 1 to x of (1/t) dt, for x > 0.

  • This is the starting point, not derived from any prior notion of logarithm.
  • Property (i): L(1) = 0 follows immediately (integral from 1 to 1 is zero).
  • Property (ii): L is differentiable and L'(x) = 1/x follows from the second fundamental theorem of calculus.
  • The excerpt emphasizes that this definition is constructive: we define a candidate function and then verify it has all required properties.

📊 Key properties of the logarithm

The function L satisfies five characterizing properties:

| Property | Statement | Why it matters |
| --- | --- | --- |
| (i) Initial value | L(1) = 0 | Anchors the function at a known point |
| (ii) Derivative | L'(x) = 1/x | Makes L strictly increasing; connects to integration |
| (iii) Behavior | Strictly increasing, bijective, limits ±∞ | Ensures an inverse exists |
| (iv) Product rule | L(xy) = L(x) + L(y) | Converts multiplication to addition |
| (v) Power rule | L(x^q) = q L(x) for rational q | Extends to rational exponents |

🔗 Proving the product rule

The excerpt proves property (iv) by change of variables:

  • Start with L(x) = integral from 1 to x of (1/t) dt.
  • Substitute u = yt in the integral: L(x) = integral from y to xy of (1/u) du.
  • Split: integral from 1 to xy minus integral from 1 to y = L(xy) - L(y).
  • Therefore L(x) = L(xy) - L(y), which rearranges to L(xy) = L(x) + L(y).

Example: L(6) = L(2·3) = L(2) + L(3).
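
The product rule can be verified numerically from the integral definition (midpoint rule; the helper L is ours and assumes x > 1):

```python
import math

def L(x, n=100_000):
    """Midpoint approximation of the integral of 1/t from 1 to x (x > 1)."""
    dt = (x - 1.0) / n
    return sum(1.0 / (1.0 + (i + 0.5) * dt) for i in range(n)) * dt

print(L(6.0), L(2.0) + L(3.0), math.log(6.0))  # all three agree closely
```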

♾️ Proving the limits

To show L maps onto all real numbers:

  • Since 1/t ≥ 1/2 on [1, 2], we have L(2) ≥ 1/2 > 0.
  • By induction using property (iv): L(2^n) = n L(2).
  • By the Archimedean property, for any y > 0 there exists n such that L(2^n) > y.
  • The intermediate value theorem gives some x₁ in (1, 2^n) with L(x₁) = y.
  • Since L is strictly increasing (derivative always positive), lim as x→∞ of L(x) = ∞.
  • Using L(x) = -L(1/x) (from L(x · 1/x) = L(x) + L(1/x) = L(1) = 0), we get lim as x→0⁺ of L(x) = -∞.

Don't confuse: The limit as x→0 is only from the right (x > 0), since L is only defined on positive numbers.

🎯 Uniqueness

The excerpt proves uniqueness using the first fundamental theorem of calculus:

  • If any function satisfies L(1) = 0 and L'(x) = 1/x, then by the fundamental theorem it must equal the integral from 1 to x of (1/t) dt.
  • Therefore properties (i) and (ii) alone uniquely determine L.

After proving existence and uniqueness, the excerpt defines the natural logarithm: ln(x) = L(x).

🚀 Constructing the exponential

🔄 Definition as inverse

The exponential is defined as:

E(x) = the inverse function of L = ln.

  • Since ln is strictly increasing and bijective from (0, ∞) to ℝ, it has an inverse from ℝ to (0, ∞).
  • Property (i): E(0) = 1 follows because ln(1) = 0.
  • Property (ii): E'(x) = E(x) follows from the inverse function theorem (Lemma 4.4.1): E'(x) = 1/L'(E(x)) = 1/(1/E(x)) = E(x).

📊 Key properties of the exponential

The function E satisfies five characterizing properties:

| Property | Statement | Connection |
| --- | --- | --- |
| (i) Initial value | E(0) = 1 | Inverse of ln(1) = 0 |
| (ii) Derivative | E'(x) = E(x) | Self-replicating growth |
| (iii) Behavior | Strictly increasing, bijective, limits 0 and ∞ | Inverse properties of ln |
| (iv) Sum rule | E(x + y) = E(x) E(y) | Inverse of ln product rule |
| (v) Rational multiples | E(qx) = E(x)^q for rational q | Inverse of ln power rule |

🔗 Proving the sum rule

Property (iv) is proved using the corresponding property of the logarithm:

  • Take any x, y in ℝ.
  • Since L is bijective, find a and b such that x = L(a) and y = L(b).
  • Then E(x + y) = E(L(a) + L(b)) = E(L(ab)) = ab = E(x) E(y).

Example: E(3 + 5) = E(3) · E(5).

🎯 Uniqueness from the differential equation

The excerpt proves uniqueness using only properties (i) and (ii):

  • Suppose E and F both satisfy E(0) = F(0) = 1 and E'(x) = E(x), F'(x) = F(x).
  • Consider the product F(x) E(-x).
  • Its derivative is F'(x) E(-x) - F(x) E'(-x) = F(x) E(-x) - F(x) E(-x) = 0 (using the chain rule on E(-x)).
  • By Proposition 4.2.6, F(x) E(-x) is constant, equal to F(0) E(0) = 1.
  • From E(x - x) = E(x) E(-x) = 1, we have E(-x) = 1/E(x) ≠ 0.
  • Since F(x) E(-x) = 1 = E(x) E(-x) and E(-x) ≠ 0, it follows that F(x) = E(x) for all x.

Don't confuse: The uniqueness proof works even if we only assume E and F map ℝ to ℝ (not necessarily into positive reals), because the derivative condition forces E(x) ≠ 0 everywhere.

After proving existence and uniqueness, the excerpt defines the exponential function: exp(x) = E(x).

🔢 Extending exponentiation to real powers

🌉 Bridging rational and irrational exponents

For rational exponents, the excerpt shows:

  • If y is rational and x > 0, then x^y = exp(ln(x^y)) = exp(y ln(x)).
  • This uses property (v) of the logarithm: ln(x^y) = y ln(x) for rational y.

For irrational exponents, the excerpt defines:

If x > 0 and y is irrational, define x^y = exp(y ln(x)).

  • This makes x^y a continuous function of y for all real y.
  • The excerpt notes that we would get the same result by taking a sequence of rational numbers {yₙ} approaching y and defining x^y = lim as n→∞ of x^(yₙ).

Example: √2^(√2) is now rigorously defined as exp(√2 · ln(√2)).
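
A quick check that the definition matches ordinary floating-point exponentiation:

```python
import math

# x**y is defined as exp(y * ln(x)); compare with Python's built-in power
x = y = math.sqrt(2.0)
via_def = math.exp(y * math.log(x))
print(via_def, x ** y)  # the two values agree
```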

🎲 Defining Euler's number

The excerpt defines:

e = exp(1).

This is called Euler's number or the base of the natural logarithm.

The notation e^x for exp(x) is justified:

  • e^x = exp(x ln(e)) = exp(x · 1) = exp(x).
  • Here we used ln(e) = ln(exp(1)) = 1 (since ln and exp are inverses).

📐 Extended properties

Proposition 5.4.3 states that the logarithm and exponential properties extend to irrational powers:

  • (i) exp(xy) = (exp(x))^y for all x, y in ℝ.
  • (ii) If x > 0, then ln(x^y) = y ln(x) for all y in ℝ.

The proof is immediate from the definitions using irrational exponents.

🔄 Alternative approaches

🛤️ Other equivalent definitions

Remark 5.4.4 mentions alternative ways to define the exponential and logarithm:

| Approach | Starting point | Reference |
| --- | --- | --- |
| Integral (this section) | ln(x) = integral of 1/t | Main approach |
| Differential equation | E'(x) = E(x), E(0) = 1 | Example 6.3.3 |
| Power series | Define by infinite sum | Example 6.2.14 |

  • All three approaches are equivalent—they define the same functions.
  • The excerpt emphasizes that the uniqueness proofs show that any function satisfying the key properties must be the same function.

🔑 Minimal characterization

Remark 5.4.5 highlights that uniqueness follows from minimal conditions:

  • For the logarithm: L(1) = 0 and L'(x) = 1/x alone determine L.
  • For the exponential: E'(x) = E(x) and E(0) = 1 alone determine E.
  • Existence also follows from just these properties.
  • Alternatively, uniqueness can be proved from the laws of exponents (see exercises).

Don't confuse: The excerpt proves uniqueness first from the derivative conditions, then separately notes that the algebraic properties (product rule, power rule) also characterize the functions.


Improper integrals

5.5 Improper integrals

🧭 Overview

🧠 One-sentence thesis

Improper integrals extend the Riemann integral to unbounded intervals or unbounded functions by taking limits of proper integrals, and they converge when these limits exist.

📌 Key points (3–5)

  • What improper integrals are: limits of Riemann integrals over expanding intervals or approaching singularities, not true integrals themselves.
  • Two main types: integrals over infinite intervals (e.g., from a to infinity) and integrals of unbounded functions on bounded intervals.
  • Convergence vs divergence: an improper integral converges if the limit exists; otherwise it diverges.
  • Common confusion: you cannot always split an improper integral into two separate improper integrals—the limit of a sum is not always the sum of limits when both parts diverge.
  • Why it matters: the p-test, comparison test, and integral test provide practical tools to determine convergence and estimate sums of series.

🔧 Definition and basic setup

🔧 What makes an integral "improper"

The Riemann integral is defined only for bounded functions on bounded intervals. Improper integrals handle two situations:

  • Unbounded intervals: integrating from a finite point to infinity, or over the entire real line.
  • Unbounded functions: integrating functions that blow up at one or both endpoints of a bounded interval.

Improper integral: a limit of Riemann integrals rather than a Riemann integral itself.

📐 Formal definition for infinite right endpoint

Suppose f is defined on [a, ∞) and is Riemann integrable on every finite interval [a, c]. Then:

  • The improper integral from a to infinity of f is defined as the limit as c approaches infinity of the integral from a to c of f.
  • If this limit exists, the improper integral converges; otherwise it diverges.

Similar definitions apply for left endpoints approaching negative infinity or finite endpoints where the function is unbounded.

🔄 Bounded functions at finite endpoints

If f is bounded on [a, b), then by Lemma 5.2.8 the improper integral definition adds nothing new—it agrees with the ordinary Riemann integral. What is new: the definition now applies to unbounded functions.

🧪 The p-test for integrals

🧪 Statement of the p-test

Proposition (p-test): The improper integral of 1/x^p from 1 to ∞ converges to 1/(p − 1) if p > 1, and diverges if 0 < p ≤ 1.

The improper integral of 1/x^p from 0 to 1 converges to 1/(1 − p) if 0 < p < 1, and diverges if p ≥ 1.

🔍 Why the p-test works

  • The proof uses the fundamental theorem of calculus to compute the antiderivative of x to the power negative p.
  • For p greater than 1, the term 1 over (b to the power (p minus 1)) goes to zero as b approaches infinity.
  • For p equal to 1, the antiderivative is the logarithm, which diverges to infinity.
  • Example: the integral from 1 to infinity of 1 over x squared converges (p equals 2 is greater than 1), but the integral from 1 to infinity of 1 over x diverges (p equals 1).

Don't confuse: the behavior at infinity (p greater than 1 converges) is opposite to the behavior near zero (p less than 1 converges).
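
The p-test can be illustrated with the closed-form proper integrals from the proof; a sketch (function name ours):

```python
import math

def proper_integral(p, b):
    """Integral of x**(-p) from 1 to b, via the antiderivative."""
    if p == 1:
        return math.log(b)
    return (1.0 - b ** (1.0 - p)) / (p - 1.0)

print([proper_integral(2, b) for b in (10, 100, 1000)])  # approaches 1
print([proper_integral(1, b) for b in (10, 100, 1000)])  # grows without bound
```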

🧩 Key properties and techniques

🧩 Tails of improper integrals

Proposition (Tails): For any b greater than a, the integral from b to infinity of f converges if and only if the integral from a to infinity of f converges. When both converge, the integral from a to infinity equals the integral from a to b plus the integral from b to infinity.

  • This means convergence depends only on behavior "at infinity," not on any finite starting point.
  • You can shift the lower limit without affecting convergence, though the value changes by a finite amount.

➕ Nonnegative functions are easier

Proposition: If f is nonnegative and integrable on every [a, b], then the improper integral from a to infinity of f equals the supremum of all integrals from a to x for x greater than or equal to a.

  • For nonnegative f, the integral from a to x is an increasing function of x.
  • Convergence means this increasing sequence is bounded above.
  • Any sequence x_n going to infinity gives the same limit (if it exists).

Warning: This proposition holds only for nonnegative functions. The exercises show it fails otherwise.

🔬 Comparison test

Proposition (Comparison test): Suppose f(x) is less than or equal to g(x) for all x greater than or equal to a.

  • If the integral of g converges, then the integral of f converges, and the integral of f is less than or equal to the integral of g.
  • If the integral of f diverges, then the integral of g diverges.

Example: To show the integral from 0 to infinity of sin(x squared) times (x plus 2) over (x cubed plus 1) converges, observe that for x greater than or equal to 1, the absolute value is bounded by 3 over x squared. Since the integral of 3 over x squared converges (p-test with p equals 2), the original integral converges by comparison.

⚠️ Common pitfalls

⚠️ Cannot always split improper integrals

The excerpt gives a critical warning: you cannot split an improper integral into two parts if both parts diverge individually.

Example: The integral from 2 to infinity of 2 over (x squared minus 1) converges. But if you write 2 over (x squared minus 1) equals 1 over (x minus 1) minus 1 over (x plus 1) and try to integrate each term separately, both diverge to infinity. You get "infinity minus infinity," which is meaningless.

  • The limit of a sum is not always the sum of limits.
  • You must take the limit of the combined integral, not split first.

⚠️ Double limits require care

When integrating over the entire real line or over an interval with two bad endpoints, you must take limits at both ends.

Definition: The integral from negative infinity to positive infinity of f is defined as the limit as c approaches negative infinity of the limit as d approaches positive infinity of the integral from c to d of f.

Proposition: If either iterated limit exists, both exist and are equal, and you can also compute it as the limit as a approaches infinity of the integral from negative a to a.

Example: The integral from negative infinity to positive infinity of 1 over (1 plus x squared) equals pi (using arctan).

Warning: The limit as a approaches infinity of the integral from negative a to a may exist even when the double-sided improper integral does not converge.

Example: Let f(x) equal x over absolute value of x for x not equal to zero. The integral from negative a to a of f equals zero for all a, so the limit is zero. But the improper integral does not converge because for any fixed negative c, the limit as d approaches infinity of the integral from c to d diverges.
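
Since f(x) = x/|x| only takes the values ±1, its integrals reduce to interval lengths and can be computed exactly; a sketch (function name ours):

```python
def integral_sign(c, d):
    """Integral of x/|x| over [c, d]:
    (length where x > 0) - (length where x < 0)."""
    neg = max(0.0, min(d, 0.0) - c)   # length of [c, d] in (-inf, 0)
    pos = max(0.0, d - max(c, 0.0))   # length of [c, d] in (0, inf)
    return pos - neg

print([integral_sign(-a, a) for a in (1.0, 10.0, 100.0)])      # symmetric: all 0
print([integral_sign(-1.0, d) for d in (10.0, 100.0, 1000.0)])  # grows with d
```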

🌟 The sinc function example

🌟 A remarkable function

The sinc function is defined as sin(x) over x for x not equal to zero, and 1 at x equals zero.

Key facts:

  • The integral from negative infinity to positive infinity of sinc(x) equals pi (converges).
  • The integral from negative infinity to positive infinity of absolute value of sinc(x) equals infinity (diverges).

This is analogous to the alternating harmonic series (which converges) versus the harmonic series (which diverges).

🌟 Why sinc converges

The proof uses careful estimation over intervals [2πn, 2π(n+1)]:

  • On alternating half-periods, sin(x) is positive then negative.
  • The integral over each full period is bounded by 1 over (πn(n plus 1)).
  • The sum of these bounds is a convergent series.
  • Therefore the improper integral converges.

Don't confuse: convergence here cannot be shown by comparison test (the function changes sign). Direct analysis of cancellation is required.

📊 Integral test for series

📊 Connecting integrals and series

Proposition (Integral test): Suppose f is decreasing and nonnegative on [k, ∞). Then the series sum from n equals k to infinity of f(n) converges if and only if the integral from k to infinity of f converges.

When both converge:

  • The integral from k to infinity of f is less than or equal to the series sum.
  • The series sum is less than or equal to f(k) plus the integral from k to infinity of f.

📊 Why it works

  • Because f is decreasing, the integral from n to n+1 of f is less than or equal to f(n) (rectangle above the curve).
  • Also f(n) is less than or equal to the integral from n-1 to n of f (rectangle below the curve).
  • Summing these inequalities gives the bounds.

📊 Estimating series sums

Example: Estimate the sum from n equals 1 to infinity of 1 over n squared to within 0.01.

  • The integral from k to infinity of 1 over x squared equals 1 over k.
  • So 1 over k plus the sum from n equals 1 to k minus 1 of 1 over n squared is within 1 over k squared of the true sum.
  • For k equals 10, the error is 0.01.
  • Computation gives the sum is between 1.6397 and 1.6497.
  • (The actual sum is pi squared over 6, approximately 1.6449.)

This technique turns series convergence questions into integral convergence questions, which are often easier to analyze.
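
The k = 10 computation is easy to reproduce; a sketch of the bounds (variable names ours):

```python
import math

k = 10
partial = sum(1.0 / n ** 2 for n in range(1, k))  # sum of the first k-1 terms
low = partial + 1.0 / k        # lower bound: add the tail integral 1/k
high = low + 1.0 / k ** 2      # the error is at most f(k) = 1/k**2
print(low, high)  # brackets pi**2 / 6, approximately 1.6449
```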


Pointwise and uniform convergence

6.1 Pointwise and uniform convergence

🧭 Overview

🧠 One-sentence thesis

Uniform convergence is a stronger condition than pointwise convergence because it requires a single threshold N that works for all points in the domain simultaneously, not just for each point individually.

📌 Key points (3–5)

  • Pointwise convergence: for each x in the domain, the sequence of function values converges to the limit function's value at that x; N can depend on both ε and x.
  • Uniform convergence: for any ε > 0, there exists one N that works for all x simultaneously; the entire function stays within an ε-strip around the limit.
  • Key distinction: uniform convergence implies pointwise convergence, but the converse does not hold.
  • Common confusion: in pointwise convergence, N may depend on x (different points may need different N); in uniform convergence, N depends only on ε and must work for all x at once.
  • Domain matters: a sequence may converge pointwise on a set but not uniformly, yet converge uniformly on a smaller subset.

🎯 Pointwise convergence

📖 Definition and meaning

Pointwise convergence: A sequence of functions {fₙ} converges pointwise to f : S → ℝ if for every x ∈ S, we have f(x) = lim(n→∞) fₙ(x).

  • This means: at each individual point x, the sequence of numbers {fₙ(x)} converges to f(x).
  • The limit function f is unique (because limits of sequences of numbers are unique).
  • You check convergence separately at each point.

🔍 Reformulation with ε-N language

The excerpt provides an equivalent characterization:

{fₙ} converges pointwise to f if and only if for every x ∈ S and every ε > 0, there exists an N ∈ ℕ such that |fₙ(x) − f(x)| < ε for all n ≥ N.

Crucial detail: N can depend on both ε and x.

  • For different points x, you may need different values of N.
  • Example: one point might converge quickly (small N), another slowly (large N).

🧪 Example: power functions on [−1, 1]

The excerpt gives fₙ(x) = x^(2n) on [−1, 1].

Pointwise limit:

  • For x ∈ (−1, 1): since 0 ≤ x² < 1, we have (x²)ⁿ → 0, so fₙ(x) → 0.
  • For x = 1 or x = −1: x^(2n) = 1 for all n, so fₙ(x) → 1.
  • The limit function is f(x) = 1 if x = ±1, and f(x) = 0 otherwise.

Why this works: at each fixed x, the sequence of numbers converges; the N needed depends on how close x² is to 1.
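
The dependence of N on x can be made concrete; the sketch below (an illustration, not from the book) solves |x|^(2n) < ε for the smallest admissible n:

```python
import math

# f_n(x) = x^(2n) -> 0 on (-1, 1): find the smallest n with |x|^(2n) < eps.
# Since |x|^(2n) < eps  <=>  n > ln(eps) / (2 ln|x|), the threshold depends on x.
def smallest_n(x, eps=0.1):
    if x == 0:
        return 1
    return math.floor(math.log(eps) / (2 * math.log(abs(x)))) + 1

for x in [0.1, 0.5, 0.9, 0.99]:
    print(x, smallest_n(x))   # the required n blows up as |x| -> 1
```

For ε = 0.1 the required n jumps from 2 at x = 0.5 to over a hundred at x = 0.99, which is exactly why no single N can serve every x.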

🧪 Example: geometric series

The excerpt mentions the series sum from k=0 to ∞ of xᵏ.

  • The partial sums fₙ(x) = sum from k=0 to n of xᵏ.
  • These converge pointwise to 1/(1−x) on the interval (−1, 1).
  • Important subtlety: even though 1/(1−x) is defined for all x ≠ 1, and fₙ(x) is defined for all x, convergence only happens on (−1, 1).
  • When we write f(x) = sum of xᵏ, we mean f is defined on (−1, 1) and is the pointwise limit there.
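
A quick numeric sanity check (an illustration, not part of the excerpt) comparing partial sums against 1/(1−x):

```python
# Partial sums f_n(x) = sum_{k=0}^{n} x^k versus the pointwise limit 1/(1 - x).
def partial_sum(x, n):
    return sum(x**k for k in range(n + 1))

x = 0.5
limit = 1 / (1 - x)   # = 2
for n in [5, 10, 20]:
    print(n, partial_sum(x, n), abs(partial_sum(x, n) - limit))
```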

🧪 Non-example: sin(nx)

The excerpt states fₙ(x) = sin(nx) does not converge pointwise to any function on any interval.

  • It may converge at certain isolated points (e.g., x = 0 or x = π).
  • In any interval [a, b], there exists an x such that sin(nx) does not have a limit as n → ∞.
  • Don't confuse: convergence at some points vs. convergence on an interval.

🎯 Uniform convergence

📖 Definition and meaning

Uniform convergence: A sequence {fₙ} converges uniformly to f if for every ε > 0, there exists an N ∈ ℕ such that for all n ≥ N, |fₙ(x) − f(x)| < ε for all x ∈ S.

Key difference from pointwise: N depends only on ε, not on x.

  • Given ε, you must find one N that works simultaneously for every x in the domain.
  • Visually: for n ≥ N, the entire graph of fₙ lies within an ε-strip around f (see Figure 6.3 in the excerpt).

🔗 Relationship to pointwise convergence

The excerpt states:

Uniform convergence implies pointwise convergence.

Why: if you have one N that works for all x, then in particular it works for each individual x, so pointwise convergence holds.

The converse does not hold: pointwise convergence does not imply uniform convergence.

🧪 Example: x^(2n) does not converge uniformly on [−1, 1]

The excerpt shows fₙ(x) = x^(2n) converges pointwise on [−1, 1] but not uniformly.

Proof by contradiction:

  • Suppose convergence were uniform.
  • Take ε = 1/2. Then there would exist N such that |x^(2N) − 0| < 1/2 for all x ∈ (−1, 1).
  • But x^(2N) is continuous, and as x → 1 from below, x^(2N) → 1.
  • This gives the contradiction: 1 = lim(x→1⁻) x^(2N) ≤ 1/2.

Don't confuse: the limit function is discontinuous (jumps at x = ±1), but the fₙ are continuous; this mismatch is a sign that convergence is not uniform.

🧪 Example: uniform convergence on a smaller domain

The excerpt shows that restricting the domain can restore uniform convergence.

On [−a, a] where 0 < a < 1:

  • fₙ(x) = x^(2n) converges uniformly to 0.
  • Why: a^(2n) → 0 as n → ∞. Given ε > 0, pick N so that a^(2n) < ε for all n ≥ N.
  • For any x ∈ [−a, a], |x| ≤ a, so |x^(2n)| = |x|^(2n) ≤ a^(2n) < ε for all n ≥ N.
  • Key point: N depends only on ε (and the fixed a), not on the individual x.

Lesson: a sequence may fail to converge uniformly on a large set but converge uniformly on a smaller subset where the functions behave more "uniformly."
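
The x-independent bound can be checked numerically: the supremum of |fₙ| over a grid in [−a, a] matches a^(2n). A minimal sketch (the grid size is an arbitrary choice):

```python
# sup over [-a, a] of |x^(2n)| equals a^(2n): a bound independent of x.
def sup_norm(n, a, grid=1000):
    xs = [-a + 2 * a * i / grid for i in range(grid + 1)]
    return max(abs(x) ** (2 * n) for x in xs)

a = 0.9
for n in [1, 10, 50]:
    print(n, sup_norm(n, a), a ** (2 * n))   # the two columns agree
```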

🔄 Comparing the two notions

📊 Summary table

| Aspect | Pointwise convergence | Uniform convergence |
|---|---|---|
| Definition | For each x, fₙ(x) → f(x) | For all x simultaneously, fₙ(x) → f(x) |
| N depends on | Both ε and x | Only ε |
| Visual | Each vertical slice converges | Entire graph stays in ε-strip |
| Strength | Weaker | Stronger (implies pointwise) |
| Limit function | May be discontinuous even if fₙ continuous | Better preserves properties (continuity, integrability, etc.) |

🧩 How to tell them apart

  • Pointwise but not uniform: look for points where convergence is "slow" or where the limit function has a discontinuity while the fₙ are continuous.
  • Uniform: if you can bound |fₙ(x) − f(x)| by something that goes to 0 as n → ∞ and does not depend on x, then convergence is uniform.

⚠️ Common pitfall

Don't assume that because a sequence converges at every point, it converges uniformly.

  • Example: fₙ(x) = x^(2n) on [−1, 1] converges at every point but not uniformly.
  • The "speed" of convergence varies with x: near x = ±1, convergence is very slow; near x = 0, it is fast.

Interchange of limits

6.2 Interchange of limits

🧭 Overview

🧠 One-sentence thesis

Uniform convergence of a sequence of functions allows us to interchange limiting operations—such as taking limits, integrals, and derivatives—while pointwise convergence alone does not guarantee this interchange.

📌 Key points (3–5)

  • Core problem: When we have two limits (e.g., limit of a sequence and limit defining continuity, or limit and integral), we cannot always swap their order; uniform convergence provides the condition under which interchange is valid.
  • Continuity: Uniform convergence of continuous functions guarantees the limit function is continuous; pointwise convergence does not.
  • Integration: Uniform convergence allows swapping the limit and the integral; pointwise convergence can fail even when each function is integrable.
  • Differentiation: Uniform convergence of the sequence alone is insufficient; we need uniform convergence of the derivatives plus pointwise convergence at one point.
  • Common confusion: Pointwise vs. uniform convergence—pointwise convergence can destroy continuity, integrability, and differentiability, while uniform convergence preserves these properties under interchange.

🔄 Why interchange of limits matters

🔄 The central question

Modern analysis deals mainly with the question: when can we swap two limiting operations?

  • A chain of two limits cannot always be interchanged.
  • Example (from the excerpt): lim(k→∞) lim(n→∞) n/(n+k) = 1 (the inner limit is 1 for each fixed k), but lim(n→∞) lim(k→∞) n/(n+k) = 0 (the inner limit is 0 for each fixed n)—the two orders disagree.
  • When working with sequences of functions, this issue arises constantly: continuity, integration, differentiation, and power series all involve nested limits.

🎯 Four main instances covered

The excerpt examines four scenarios where interchange matters:

| Scenario | What we want to swap | Condition needed |
|---|---|---|
| Continuity | limit of sequence with limit defining continuity | Uniform convergence |
| Integration | limit of sequence with integral | Uniform convergence |
| Differentiation | limit of sequence with derivative | Uniform convergence of derivatives + one pointwise value |
| Power series | sum with derivative/integral | Automatic within radius of convergence |

🧵 Continuity of the limit

🧵 The interchange problem for continuity

If {fₙ} is a sequence of continuous functions converging pointwise to f, and if xₖ → x, we want to know whether lim(k→∞) f(xₖ) = f(x).

  • This is equivalent to asking: can we swap lim(k→∞) and lim(n→∞)?
  • Written out: lim(k→∞) lim(n→∞) fₙ(xₖ) ?= lim(n→∞) lim(k→∞) fₙ(xₖ).
  • The question mark indicates this equality does not always hold.

❌ Pointwise convergence is not enough

Example 6.2.1 (from the excerpt):

  • Define fₙ(x) = 1 - nx if x < 1/n, and 0 if x ≥ 1/n, on [0,1].
  • Each fₙ is continuous.
  • For x > 0, once n ≥ 1/x, we have fₙ(x) = 0, so lim fₙ(x) = 0.
  • But fₙ(0) = 1 for all n, so lim fₙ(0) = 1.
  • The pointwise limit f is 1 at x=0 and 0 elsewhere—not continuous at 0.
  • Don't confuse: each fₙ is continuous, but the limit is not.
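
A small sketch (not from the excerpt) of Example 6.2.1's sequence, showing the value at 0 stuck at 1 while any fixed x > 0 is eventually mapped to 0:

```python
# Example 6.2.1: f_n(x) = 1 - n*x for x < 1/n, else 0, on [0, 1].
def f(n, x):
    return 1 - n * x if x < 1 / n else 0.0

print([f(n, 0) for n in [1, 10, 100]])     # always 1 at x = 0
print([f(n, 0.05) for n in [1, 10, 100]])  # eventually 0 at any fixed x > 0
```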

✅ Uniform convergence guarantees continuity

Theorem 6.2.2: If {fₙ} is a sequence of continuous functions on S ⊂ ℝ converging uniformly to f, then f is continuous.

Why it works (proof idea from the excerpt):

  • Fix x ∈ S and a sequence xₘ → x.
  • Given ε > 0, uniform convergence gives us one k such that |fₖ(y) - f(y)| < ε/3 for all y.
  • Continuity of fₖ at x gives us N such that |fₖ(xₘ) - fₖ(x)| < ε/3 for m ≥ N.
  • Triangle inequality: |f(xₘ) - f(x)| ≤ |f(xₘ) - fₖ(xₘ)| + |fₖ(xₘ) - fₖ(x)| + |fₖ(x) - f(x)| < ε/3 + ε/3 + ε/3 = ε.
  • Hence f(xₘ) → f(x), so f is continuous at x.

📐 Integral of the limit

📐 The interchange problem for integration

We want to know whether ∫ lim(n→∞) fₙ = lim(n→∞) ∫ fₙ.

  • Again, pointwise convergence is insufficient.

❌ Pointwise convergence fails for integrals

Example 6.2.3 (from the excerpt):

  • Define fₙ(x) = 0 if x=0, n - n²x if 0 < x < 1/n, and 0 if x ≥ 1/n.
  • Each fₙ is Riemann integrable and ∫₀¹ fₙ = 1/2 (by the fundamental theorem).
  • For x > 0, once n ≥ 1/x, fₙ(x) = 0, so lim fₙ(x) = 0.
  • Also fₙ(0) = 0 for all n.
  • The pointwise limit is the zero function, so ∫₀¹ lim fₙ = 0.
  • But lim ∫₀¹ fₙ = 1/2 ≠ 0.
  • The "mass" under the graph escapes to infinity as n increases, even though the height at each fixed point goes to zero.
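
The escaping mass can be seen numerically; the sketch below (midpoint-rule quadrature, an illustration only) shows each ∫₀¹ fₙ staying near 1/2 while fₙ(x) → 0 at every fixed x:

```python
# Example 6.2.3: f_n(x) = n - n^2*x on (0, 1/n), else 0.
# The graph is a triangle of base 1/n and height n, so every integral is 1/2.
def f(n, x):
    return n - n**2 * x if 0 < x < 1 / n else 0.0

def integral(n, m=100000):
    # midpoint-rule approximation of the integral over [0, 1]
    return sum(f(n, (i + 0.5) / m) for i in range(m)) / m

for n in [2, 5, 10]:
    print(n, integral(n))   # stays near 0.5 while f_n -> 0 pointwise
```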

✅ Uniform convergence allows interchange

Theorem 6.2.4: If {fₙ} is a sequence of Riemann integrable functions on [a,b] converging uniformly to f, then f is Riemann integrable and ∫ₐᵇ f = lim(n→∞) ∫ₐᵇ fₙ.

Why it works (proof sketch from the excerpt):

  • Uniform convergence: for large n, |fₙ(x) - f(x)| < ε/(2(b-a)) for all x.
  • This bounds f and shows it is integrable.
  • Then |∫ f - ∫ fₙ| = |∫ (f - fₙ)| ≤ ∫ |f - fₙ| < ε/(2(b-a)) · (b-a) = ε/2 < ε.
  • Hence ∫ fₙ → ∫ f.

🔍 Application example

Example 6.2.5: Compute lim(n→∞) ∫₀¹ (nx + sin(nx²))/n dx.

  • The integrand (nx + sin(nx²))/n converges uniformly to x on [0,1].
  • By Theorem 6.2.4, lim ∫₀¹ (nx + sin(nx²))/n dx = ∫₀¹ x dx = 1/2.
  • We computed the limit without finding an antiderivative for sin(nx²).
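
The same computation can be approximated numerically (midpoint rule; the grid sizes are arbitrary choices, not from the excerpt):

```python
import math

# |(n*x + sin(n*x^2))/n - x| <= 1/n for every x, so convergence to x is
# uniform on [0, 1] and the integrals must approach 1/2.
def integral(n, m=20000):
    total = 0.0
    for i in range(m):
        x = (i + 0.5) / m            # midpoint rule on [0, 1]
        total += (n * x + math.sin(n * x * x)) / n
    return total / m

for n in [1, 10, 1000]:
    print(n, integral(n))
```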

⚠️ Pointwise limits can be non-integrable or unbounded

Example 6.2.6: A sequence of integrable functions can converge pointwise to a non-integrable function.

  • Define fₙ(x) = 1 if x = p/q in lowest terms with q ≤ n, else 0.
  • Each fₙ is integrable (differs from zero at finitely many points), ∫ fₙ = 0.
  • The pointwise limit is the Dirichlet function (1 on rationals, 0 on irrationals), which is not Riemann integrable.

Example 6.2.7: Pointwise limits of bounded functions need not be bounded.

  • Define fₙ(x) = 0 if x < 1/n, else 1/x, on [0,1].
  • Each fₙ is bounded: fₙ(x) ≤ n.
  • The pointwise limit is f(x) = 0 if x=0, else 1/x—unbounded on [0,1].

🔀 Derivative of the limit

🔀 Uniform convergence alone is not enough

Even if {fₙ} converges uniformly to f, we cannot conclude that f is differentiable or that f' = lim fₙ'.

Example 6.2.8: fₙ(x) = sin(nx)/n converges uniformly to 0.

  • The derivative of the limit is 0.
  • But fₙ'(x) = cos(nx) does not converge even pointwise (e.g., fₙ'(π) = (-1)ⁿ oscillates).
  • At x=0, fₙ'(0) = 1 for all n, which converges to 1 ≠ 0.

Example 6.2.9: fₙ(x) = 1/(1 + nx²).

  • Converges pointwise to a function that is 1 at x=0 and 0 elsewhere—not continuous at 0.
  • The derivatives fₙ'(x) = -2nx/(1 + nx²)² converge pointwise to 0, but not uniformly on any interval containing 0.
  • The limit is not differentiable at 0 (not even continuous there).

✅ Uniform convergence of derivatives works

Theorem 6.2.10: Let fₙ : I → ℝ be continuously differentiable on a bounded interval I. Suppose:

  1. {fₙ'} converges uniformly to g : I → ℝ, and
  2. {fₙ(c)} converges for some c ∈ I.

Then {fₙ} converges uniformly to a continuously differentiable function f : I → ℝ, and f' = g.

Why it works (proof idea from the excerpt):

  • By the fundamental theorem, fₙ(x) = fₙ(c) + ∫ from c to x of fₙ'.
  • Since fₙ' converges uniformly, we can pass the limit inside the integral: f(x) = f(c) + ∫ from c to x of g.
  • Differentiate: f'(x) = g(x).
  • Uniform convergence of fₙ to f follows from the uniform convergence of fₙ' and the convergence at c.

⚠️ Don't confuse: need convergence at one point

  • Uniform convergence of derivatives alone is not enough; we also need {fₙ(c)} to converge at some point c.
  • Without this, the sequence can drift: fₙ(x) = n has fₙ' ≡ 0 (derivatives converge uniformly), yet {fₙ(x)} diverges at every x.
  • The bounded interval also matters: fₙ(x) = x/n on all of ℝ has fₙ' → 0 uniformly and fₙ(0) = 0 for every n, but fₙ → 0 only pointwise, not uniformly.

🌀 Convergence of power series

🌀 Power series converge uniformly on compact subsets

Proposition 6.2.11: Let Σ cₙ(x - a)ⁿ be a power series with radius of convergence ρ (0 < ρ ≤ ∞). Then:

  • The series converges uniformly on [a - r, a + r] for every 0 < r < ρ.
  • The limit function is continuous on (a - ρ, a + ρ).

Why it works (proof sketch from the excerpt):

  • For r < ρ, the series Σ |cₙ| rⁿ converges (absolute convergence at x = a + r).
  • For x ∈ [a - r, a + r], |cₙ(x - a)ⁿ| ≤ |cₙ| rⁿ.
  • The partial sums are uniformly Cauchy on [a - r, a + r], hence converge uniformly.
  • Uniform limit of continuous functions (polynomials) is continuous.

🔁 Integration and differentiation of power series

Corollary 6.2.12 (term-by-term integration): If f(x) = Σ cₙ(x - a)ⁿ on interval I with radius ρ, then

  • ∫ₐˣ f = Σ (cₙ₋₁/n)(x - a)ⁿ, with the sum starting at n = 1,
  • and this integrated series has radius of convergence at least ρ.

Corollary 6.2.13 (term-by-term differentiation): If f(x) = Σ cₙ(x - a)ⁿ on interval I with radius ρ, then

  • f is differentiable and f'(x) = Σ (n+1)cₙ₊₁(x - a)ⁿ,
  • and this differentiated series has radius of convergence exactly ρ.

Why differentiation preserves radius (proof idea from the excerpt):

  • The radius is determined by lim sup |cₙ|^(1/n) = R.
  • For the derivative series, lim sup |ncₙ|^(1/n) = lim sup n^(1/n) · |cₙ|^(1/n).
  • Since n^(1/n) → 1, we get the same R.
  • Hence the differentiated series has the same radius ρ = 1/R.

🔍 Application examples

Example 6.2.14: The exponential function.

  • f(x) = Σ xⁿ/n! has radius ρ = ∞.
  • f(0) = 1.
  • Differentiating term by term: f'(x) = Σ xⁿ⁻¹/(n-1)! = f(x).
  • This characterizes the exponential function.

Example 6.2.15: Σ nxⁿ = x/(1-x)² on (-1,1).

  • Start with Σ xⁿ = 1/(1-x) on (-1,1).
  • Differentiate: Σ nxⁿ⁻¹ = 1/(1-x)².
  • Multiply by x: Σ nxⁿ = x/(1-x)².
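
A numeric spot check (not from the excerpt) of the resulting identity at a few points of (−1, 1):

```python
# Compare partial sums of sum n*x^n with the closed form x/(1-x)^2.
def series(x, terms=200):
    return sum(n * x**n for n in range(1, terms + 1))

def closed_form(x):
    return x / (1 - x) ** 2

for x in [-0.5, 0.3, 0.7]:
    print(x, series(x), closed_form(x))
```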

🎯 Key takeaway

Within the radius of convergence, power series behave like polynomials: you can differentiate and integrate term by term, and the resulting series converge to the derivative and integral of the original function.


Picard's theorem

6.3 Picard’s theorem

🧭 Overview

🧠 One-sentence thesis

Picard's theorem guarantees that under continuity and a Lipschitz condition on the right-hand side, a first-order ordinary differential equation with an initial condition has exactly one solution on some interval around the initial point.

📌 Key points (3–5)

  • What the theorem solves: the first-order ODE y′ = F(x, y) with initial condition y(x₀) = y₀, guaranteeing both existence and uniqueness of a solution on some interval [x₀ − h, x₀ + h].
  • Key hypotheses: F must be continuous and Lipschitz in the second variable (y), meaning |F(x, y) − F(x, z)| ≤ L|y − z| for some constant L.
  • Proof technique (Picard iteration): convert the differential equation into an integral equation, then iteratively plug approximations into the right-hand side to generate a sequence of functions that converges uniformly to the solution.
  • Common confusion—Lipschitz vs continuity: continuity of F alone guarantees existence (Peano theorem, mentioned but not proved), but the Lipschitz condition is needed for uniqueness; without it, multiple solutions may exist.
  • Why it matters: the theorem not only proves a solution exists and is unique, but also provides a constructive method (Picard iterates) to approximate the solution.

🔧 The differential equation and setup

🔧 What is a first-order ODE

First-order ordinary differential equation: an equation of the form y′ = F(x, y), where the unknown is a function y(x) and the equation involves the derivative y′.

  • "Ordinary" means the unknown is a function of one variable (x), not partial derivatives.
  • The equation relates the derivative y′(x) to both the independent variable x and the function value y(x) itself.
  • Initial condition: we specify y(x₀) = y₀, a starting point for the solution.
  • Solution: a differentiable function y(x) such that y′(x) = F(x, y(x)) and y(x₀) = y₀.

📊 Slope field interpretation

  • The function F(x, y) assigns a slope at every point (x, y) in the plane.
  • A solution is a curve that "follows" these slopes: at each point (x, y(x)) on the curve, the tangent slope y′(x) equals F(x, y(x)).
  • Example: Figure 6.8 shows F(x, y) = x(1 − y); the solution through (x₀, y₀) = (1, 0.3) is drawn following the slope directions.

🧩 Continuity in two variables

Continuity at (x, y): F is continuous at (x, y) if for every sequence (xₙ, yₙ) in the domain with xₙ → x and yₙ → y, we have F(xₙ, yₙ) → F(x, y).

  • This is the sequential definition of continuity for functions of two variables.
  • It generalizes the one-variable definition: limits in both coordinates must be respected.

🎯 Statement of Picard's theorem

🎯 Hypotheses

The theorem requires:

  • Domain: closed bounded intervals I and J in ℝ; the initial point (x₀, y₀) lies in the interior I° × J°.
  • Continuity: F : I × J → ℝ is continuous.
  • Lipschitz in y: there exists L ∈ ℝ such that |F(x, y) − F(x, z)| ≤ L|y − z| for all y, z ∈ J and x ∈ I.

Don't confuse: "Lipschitz in the second variable" means the Lipschitz inequality holds for any fixed x when varying y and z; it does not require Lipschitz behavior in x.

🎯 Conclusion

There exists h > 0 such that [x₀ − h, x₀ + h] ⊂ I and a unique differentiable function f : [x₀ − h, x₀ + h] → J satisfying:

  • f′(x) = F(x, f(x)) for all x in the interval,
  • f(x₀) = y₀.

Why h may be small: the proof constructs h depending on the size of I, J, the bound M on |F|, and the Lipschitz constant L; the theorem guarantees some h > 0 exists, but not necessarily the largest possible h.

🔁 Proof strategy: Picard iteration

🔁 From differential to integral equation

  • If f is a solution to f′(x) = F(x, f(x)) with f(x₀) = y₀, then by the fundamental theorem of calculus:

    f(x) = y₀ + ∫ from x₀ to x of F(t, f(t)) dt.

  • This integral equation is equivalent to the differential equation plus initial condition.

  • Key idea: iteratively plug approximate solutions into the right-hand side to generate better approximations on the left-hand side.

🔁 Constructing the Picard iterates

  • Start: f₀(x) ≡ y₀ (constant function).

  • Inductive step: given fₖ₋₁, define

    fₖ(x) = y₀ + ∫ from 0 to x of F(t, fₖ₋₁(t)) dt.

    (The proof assumes x₀ = 0 without loss of generality.)

  • Each fₖ is continuous (by the fundamental theorem) and maps [−h, h] into [y₀ − α, y₀ + α] for carefully chosen h and α.

  • Choice of h: h = min{α, α/(M + Lα)}, where M bounds |F| on I × J and α ensures the intervals fit inside I and J.

🔁 Uniform convergence of the iterates

  • Define C = Lα/(M + Lα); the proof shows C < 1.

  • By induction, the uniform norm satisfies:

    ‖fₙ − fₖ‖ ≤ Cᵏ α.

  • Since C < 1, the sequence {fₙ} is uniformly Cauchy and converges uniformly to some continuous function f on [−h, h].

  • The uniform limit f maps [−h, h] into [y₀ − α, y₀ + α] ⊂ J.

🔁 Verifying f solves the integral equation

  • Because fₙ → f uniformly, F(t, fₙ(t)) → F(t, f(t)) uniformly (using the Lipschitz condition).

  • Uniform convergence allows interchange of limit and integral:

    f(x) = lim fₙ₊₁(x) = lim [y₀ + ∫ F(t, fₙ(t)) dt] = y₀ + ∫ F(t, f(t)) dt.

  • By the fundamental theorem of calculus, f is differentiable and f′(x) = F(x, f(x)).

  • Clearly f(0) = y₀, so f solves the original differential equation.

🔁 Uniqueness

  • Suppose g is another solution on [−h, h].
  • Then |f(x) − g(x)| = |∫ [F(t, f(t)) − F(t, g(t))] dt| ≤ Lh ‖f − g‖.
  • Taking the supremum over x, and noting that h ≤ α/(M + Lα) gives Lh ≤ Lα/(M + Lα) = C, we get ‖f − g‖ ≤ C ‖f − g‖ with C < 1.
  • This is only possible if ‖f − g‖ = 0, so f = g.

Don't confuse: the proof uses the uniform norm (supremum over the interval) to control the iteration; pointwise bounds alone would not suffice.

📚 Examples and counterexamples

📚 Example: the exponential (f′ = f, f(0) = 1)

  • Here F(x, y) = y, which is continuous and Lipschitz with L = 1.
  • Picard iterates:
    • f₀(x) = 1,
    • f₁(x) = 1 + x,
    • f₂(x) = 1 + x + x²/2,
    • f₃(x) = 1 + x + x²/2 + x³/6.
  • These are the partial sums of the Taylor series for eˣ.
  • The proof guarantees h < 1/2 for any choice of intervals, though the exponential is defined for all x ∈ ℝ.
  • By iteratively applying the theorem on overlapping intervals, one can construct the exponential on all of ℝ.
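
The iterates for f′ = f can be generated exactly by integrating polynomial coefficient lists; the sketch below (an illustration, using exact rational arithmetic) reproduces the partial sums of the series for eˣ:

```python
from fractions import Fraction

# Picard iteration for f' = f, f(0) = 1, on exact polynomial coefficients
# (index i holds the coefficient of x^i):  f_k(x) = 1 + integral_0^x f_{k-1}(t) dt.
def picard_step(coeffs):
    integrated = [Fraction(0)] + [c / (i + 1) for i, c in enumerate(coeffs)]
    integrated[0] = Fraction(1)   # add the initial value y0 = 1
    return integrated

f = [Fraction(1)]                 # f_0(x) = 1
for _ in range(4):
    f = picard_step(f)
print(f)                          # coefficients 1, 1, 1/2, 1/6, 1/24
```

Each step integrates the previous polynomial and adds the constant y₀, so after k steps the coefficients are exactly 1/j! for j ≤ k.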

📚 Example: f′ = f², f(0) = 1

  • The solution is f(x) = 1/(1 − x), defined only on (−∞, 1).
  • F(x, y) = y² is continuous but not globally Lipschitz in y (its derivative 2y is unbounded).
  • As x → 1⁻, the solution grows without bound, and the required Lipschitz constant L must grow.
  • The proof guarantees h ≈ 0.134 for x₀ = 0, y₀ = 1, even though any h < 1 should work; the theorem's h is not optimal.

📚 Counterexample: f′ = 2√f, f(0) = 0

  • F(x, y) = 2√y is continuous but not Lipschitz near y = 0 (the derivative 1/√y blows up).
  • Two solutions exist: f(x) = x² for x ≥ 0 (with f(x) = 0 for x < 0) and f(x) ≡ 0.
  • Lesson: without the Lipschitz condition, uniqueness fails even if a solution exists.
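
Both candidate solutions can be checked against the equation on x ≥ 0 (a sketch, not from the excerpt; the x² branch is verified via its exact derivative):

```python
import math

# f' = 2*sqrt(f) with f(0) = 0 has (at least) two solutions on x >= 0:
# f(x) = x^2, whose derivative 2x equals 2*sqrt(x^2), and f identically 0.
def rhs(y):
    return 2.0 * math.sqrt(y)

def check_square(x):          # exact derivative of x^2 minus the right-hand side
    return 2.0 * x - rhs(x * x)

def check_zero(x):            # derivative of the zero function minus the RHS at 0
    return 0.0 - rhs(0.0)

for x in [0.0, 0.25, 1.0, 2.0]:
    print(x, check_square(x), check_zero(x))   # both residuals vanish
```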

📚 Counterexample: discontinuous F

  • Let φ(x) = 0 if x ∈ ℚ, φ(x) = 1 if x ∉ ℚ; consider y′ = φ(x).
  • F(x, y) = φ(x) is discontinuous.
  • No solution exists: a derivative must satisfy the intermediate value property (Darboux's theorem), but φ does not.
  • Lesson: continuity of F is essential for existence.

📚 Summary table

| Condition on F | Existence | Uniqueness | Example |
|---|---|---|---|
| Continuous + Lipschitz in y | ✓ | ✓ | f′ = f, f′ = f² |
| Continuous, not Lipschitz | ✓ (Peano) | ✗ | f′ = 2√f |
| Discontinuous | ✗ | — | f′ = φ(x), φ discontinuous |

Don't confuse: the Peano existence theorem (mentioned but not proved) guarantees existence under continuity alone, but Picard's theorem is stronger because it also guarantees uniqueness via the Lipschitz condition.

🔍 Remarks and extensions

🔍 Weak solutions

  • The excerpt mentions that one can weaken "solution" to mean satisfying the integral equation f(x) = y₀ + ∫ F(t, f(t)) dt, even if f is not everywhere differentiable.
  • Example: y′ = H(x), y(0) = 0, where H is the Heaviside function (H(x) = 0 for x < 0, H(x) = 1 for x ≥ 0).
  • The "solution" is the ramp function f(x) = 0 for x < 0, f(x) = x for x ≥ 0.
  • f′(0) does not exist, so f is only a weak solution.

🔍 Why this theorem matters

  • Constructive: Picard iteration provides an explicit algorithm to approximate solutions.
  • Foundational: differential equations are the language of modern science; proving solutions exist and are unique is essential.
  • Sophisticated: the proof combines uniform convergence, the fundamental theorem of calculus, the Lipschitz condition, and the Cauchy criterion—nearly everything learned in the course.
  • Pièce de résistance: the excerpt calls this a highlight theorem, more sophisticated than the fundamental theorem of calculus.

Metric Spaces

7.1 Metric spaces

🧭 Overview

🧠 One-sentence thesis

Metric spaces provide a unified framework for defining distance and taking limits across diverse mathematical contexts—from real numbers to function spaces—by abstracting the essential properties of distance into four axioms.

📌 Key points (3–5)

  • What a metric space is: a set equipped with a distance function satisfying four axioms (nonnegativity, identity of indiscernibles, symmetry, triangle inequality).
  • Why metric spaces matter: they unify limit concepts across different contexts (sequences of numbers, points in space, functions) so theorems need not be reproved in each setting.
  • Common examples: real numbers with standard distance, n-dimensional Euclidean space, continuous functions with supremum distance, discrete metric (all points equally distant).
  • Common confusion: the same set can carry different metrics (e.g., ℝ with standard vs. nonstandard metrics), and balls/open sets depend on which metric is used.
  • Boundedness: a subset is bounded if all its points lie within some fixed distance from a reference point; this generalizes the notion from real analysis.

📐 The four axioms of a metric

📏 What is a metric space

Metric space: A pair (X, d) where X is a set and d : X × X → ℝ is a function (called the metric or distance function) satisfying four properties for all x, y, z in X.

The four required properties are:

| Property | Formal statement | Intuitive meaning |
|---|---|---|
| (i) Nonnegativity | d(x, y) ≥ 0 | Distance is never negative |
| (ii) Identity of indiscernibles | d(x, y) = 0 if and only if x = y | Only identical points have zero distance |
| (iii) Symmetry | d(x, y) = d(y, x) | Distance from x to y equals distance from y to x |
| (iv) Triangle inequality | d(x, z) ≤ d(x, y) + d(y, z) | Direct path is never longer than a detour |

🔺 The triangle inequality

  • The triangle inequality says: the distance from x to z is at most the distance from x to y plus the distance from y to z.
  • Geometric interpretation: going directly from x to z cannot be longer than going via an intermediate point y.
  • This property is crucial for proving many results in analysis.
  • Example: If you want to travel from x to z, taking a detour through y might be longer, but never shorter than the direct route.

🎯 Standard examples

🔢 Real numbers with standard metric

Standard metric on ℝ: d(x, y) = |x − y|.

  • This is the familiar absolute value distance.
  • Properties (i)–(iii) are immediate from properties of absolute value.
  • Triangle inequality follows from the standard triangle inequality for real numbers: |x − z| = |(x − y) + (y − z)| ≤ |x − y| + |y − z|.
  • When ℝ is mentioned as a metric space without specifying a metric, this standard metric is assumed.

📊 n-dimensional Euclidean space

Standard metric on ℝⁿ: For x = (x₁, x₂, ..., xₙ) and y = (y₁, y₂, ..., yₙ), define d(x, y) = square root of the sum from k=1 to n of (xₖ − yₖ)².

  • This generalizes the Pythagorean distance formula to n dimensions.
  • For n = 1, it reduces to the standard metric on ℝ.
  • The triangle inequality requires the Cauchy–Schwarz inequality for its proof.

🔬 Cauchy–Schwarz inequality

Cauchy–Schwarz inequality: For x = (x₁, ..., xₙ) and y = (y₁, ..., yₙ) in ℝⁿ, the square of the sum from k=1 to n of xₖyₖ is at most the product of (sum of xₖ²) and (sum of yₖ²).

  • This inequality is essential for proving the triangle inequality in Euclidean space.
  • The proof uses the fact that a sum of squares is nonnegative.
  • It generalizes to infinite-dimensional settings (mentioned for the space ℓ²).

🌐 Complex numbers

Complex modulus: For z = x + iy, define |z| = square root of (x² + y²).

  • The set ℂ of complex numbers is treated as the metric space ℝ² for taking limits.
  • Distance between z₁ = x₁ + iy₁ and z₂ = x₂ + iy₂ is d(z₁, z₂) = |z₁ − z₂|.
  • The complex conjugate z̄ = x − iy satisfies |z|² = zz̄, which is useful for computations.

🔀 Nonstandard and special metrics

🔄 Alternative metric on ℝ

The excerpt gives an example of a different metric on the real numbers:

d(x, y) = |x − y| / (|x − y| + 1).

  • This is a valid metric on ℝ, but not the standard one.
  • With this metric, d(x, y) < 1 for all x, y in ℝ—every two points are less than 1 unit apart.
  • The triangle inequality requires showing that the function φ(t) = t/(t+1) is increasing and subadditive.
  • Don't confuse: The same set (ℝ) can carry different metrics, leading to different geometric properties.
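
The triangle inequality for this nonstandard metric can be spot-checked on a grid of triples (a sanity check, not a proof; the point set is an arbitrary choice):

```python
import itertools

# d(x, y) = |x - y| / (|x - y| + 1): check the triangle inequality on sample triples.
def d(x, y):
    t = abs(x - y)
    return t / (t + 1)

points = [-3.0, -0.5, 0.0, 1.0, 2.5, 10.0]
violations = [(x, y, z) for x, y, z in itertools.product(points, repeat=3)
              if d(x, z) > d(x, y) + d(y, z) + 1e-12]
print(violations)           # expect no violations
print(d(0.0, 10.0))         # always below 1, however far apart the points are
```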

🎲 Discrete metric

Discrete metric: For any set X, define d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y.

  • All distinct points are equally distant (distance 1) from each other.
  • This works on any set X, finite or infinite.
  • Example use: as a "smell test" for statements about metric spaces—if a claim fails for the discrete metric, it's false in general.
  • With the discrete metric, any set X is bounded (diameter at most 1).

🌍 Great circle distance on a sphere

Great circle distance on unit sphere S²: For points x and y on the unit sphere in ℝ³, let θ be the angle between the lines from the origin through x and y; then d(x, y) = θ.

  • This is the shortest distance between points if travel is restricted to the sphere's surface.
  • Formula: d(x, y) = arccos(x₁y₁ + x₂y₂ + x₃y₃).
  • For a sphere of radius r, multiply by r: d(x, y) = rθ.
  • Example: This is the standard distance for computing airplane routes on Earth's surface (e.g., London to Los Angeles).
  • The excerpt notes that the triangle inequality is harder to prove and requires trigonometry and linear algebra.
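
A minimal sketch of the arccos formula (assuming unit vectors; the clamp guards against floating-point rounding pushing the dot product outside [−1, 1]):

```python
import math

# Great-circle distance on the unit sphere: d(x, y) = arccos(x . y).
def great_circle(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))   # clamp against rounding error

north = (0.0, 0.0, 1.0)
equator = (1.0, 0.0, 0.0)
print(great_circle(north, equator))   # a quarter circle: pi/2
```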

📚 Function spaces

📈 Continuous functions with supremum metric

Metric on C([a, b], ℝ): For continuous functions f and g on [a, b], define d(f, g) = supremum over x in [a, b] of |f(x) − g(x)|.

  • This is the uniform norm from earlier chapters, now used as a metric.
  • The distance measures the maximum vertical separation between the graphs of f and g.
  • Properties (i)–(iii) are straightforward; (iv) uses the standard triangle inequality pointwise.
  • Why it matters: Treating sets of functions as metric spaces allows powerful results (like Picard's theorem) to be proved in a unified way.
  • When C([a, b], ℝ) is mentioned as a metric space without specifying a metric, this supremum metric is assumed.

🔍 Verifying the metric properties

For the function space example:

  • Finiteness: d(f, g) is finite because |f(x) − g(x)| is continuous on the closed bounded interval [a, b], hence bounded.
  • Nonnegativity: Clear, as it's the supremum of nonnegative numbers.
  • Identity: If f = g, then |f(x) − g(x)| = 0 for all x, so d(f, g) = 0. Conversely, if d(f, g) = 0, then |f(x) − g(x)| ≤ 0 for all x, forcing f(x) = g(x) for all x.
  • Symmetry: |f(x) − g(x)| = |g(x) − f(x)|.
  • Triangle inequality: d(f, g) = sup |f(x) − g(x)| = sup |f(x) − h(x) + h(x) − g(x)| ≤ sup(|f(x) − h(x)| + |h(x) − g(x)|) ≤ sup |f(x) − h(x)| + sup |h(x) − g(x)| = d(f, h) + d(h, g).
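
The same verification can be approximated in code by sampling the supremum on a grid (the grid and the three functions are arbitrary illustrations, not from the excerpt):

```python
import math

# Approximate the supremum metric on C([0, 1]) by sampling on a grid, then
# spot-check d(f, g) <= d(f, h) + d(h, g) for three sample functions.
GRID = [i / 1000 for i in range(1001)]

def d(f, g):
    return max(abs(f(x) - g(x)) for x in GRID)

f, g, h = math.sin, math.cos, (lambda x: x * x)
print(d(f, g), d(f, h) + d(h, g))   # the left value never exceeds the right
```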

🧩 Subspaces and boundedness

🔗 Subspaces

Subspace: If (X, d) is a metric space and Y ⊂ X, then (Y, d') with d' = d restricted to Y × Y is called a subspace of (X, d).

  • The restriction of a metric to a subset is automatically a metric (trivial to verify).
  • The metric on Y is often written as d (same symbol) since it's just the restriction.
  • Common confusion: Balls and open sets in a subspace may differ from those in the larger space.
  • Example: In [0, 1] as a subspace of ℝ, B[0,1](0, 1/2) = [0, 1/2), which differs from Bℝ(0, 1/2) = (−1/2, 1/2).

📏 Boundedness

Bounded subset: A subset S ⊂ X is bounded if there exists a point p in X and a number B in ℝ such that d(p, x) ≤ B for all x in S.

  • The metric space (X, d) is bounded if X itself is bounded as a subset.
  • Example: ℝ with the standard metric is not bounded, but ℝ with the discrete metric is bounded.
  • Equivalent characterizations (for nonempty X):
    • For every p in X, there exists B > 0 such that d(p, x) ≤ B for all x in S.
    • The diameter diam(S) = supremum of {d(x, y) : x, y in S} is finite.
  • Don't confuse: Boundedness depends on the metric; the same set can be bounded in one metric and unbounded in another.

📐 Diameter

Diameter: For a nonempty subset S, diam(S) = supremum of {d(x, y) : x, y in S}.

  • This measures the "width" of the set—the largest distance between any two points in S.
  • A set is bounded if and only if its diameter is finite.
  • Example: For the interval [a, b] in ℝ, diam([a, b]) = b − a.

Open and closed sets

7.2 Open and closed sets

🧭 Overview

🧠 One-sentence thesis

Open and closed sets in metric spaces formalize the intuition that open sets allow "wiggling room" around every point while closed sets contain all points that can be approached from within, and these concepts underpin the topology needed for convergence and continuity.

📌 Key points (3–5)

  • Open sets: A set is open if every point has a ball around it entirely contained in the set (no boundary points included).
  • Closed sets: A set is closed if its complement is open (equivalently, it contains all points that can be approached from within).
  • Common confusion: Most sets are neither open nor closed; also, the same set can be open in one space but not in another (subspace topology matters).
  • Connected sets: A set is connected if it cannot be split into two disjoint nonempty open pieces; in ℝ, connected sets are exactly intervals.
  • Closure and boundary: The closure adds all approachable points; the boundary consists of points approachable from both the set and its complement.

🎯 Balls: the building blocks

⚪ Open ball

Open ball of radius δ around x: B(x, δ) = {y ∈ X : d(x, y) < δ}

  • Contains all points strictly within distance δ from x.
  • The inequality is strict (< not ≤).
  • Example: In ℝ with standard metric, B(x, δ) = (x − δ, x + δ), an open interval.

⚫ Closed ball

Closed ball of radius δ around x: C(x, δ) = {y ∈ X : d(x, y) ≤ δ}

  • Contains all points at distance at most δ from x.
  • The inequality includes equality (≤).
  • Example: In ℝ with standard metric, C(x, δ) = [x − δ, x + δ], a closed interval.

⚠️ Subspace caution

  • When working in a subspace Y ⊂ X, balls are computed within Y only.
  • Example: In [0, 1] as a subspace of ℝ, B[0,1](0, 1/2) = [0, 1/2), which looks "closed on the left" but is actually an open ball in [0, 1].
  • Always keep track of which metric space you are working in.
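A tiny sketch of the subspace caution: ball membership depends only on the metric and the points that actually exist in the space. In the subspace Y = [0, 1], the point 0 belongs to the ball of radius 1/2 around itself; the ball only looks "closed on the left" because ℝ's negative numbers are not in Y at all. The `ball` helper is our illustration.

```python
def ball(center, radius, d):
    # membership predicate for the open ball B(center, radius)
    return lambda y: d(center, y) < radius

d = lambda x, y: abs(x - y)
# Working inside Y = [0, 1]: the ball B_Y(0, 1/2) is [0, 1/2).
B = ball(0.0, 0.5, d)
print(B(0.0), B(0.4), B(0.5))  # True True False
```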

🔓 Open sets

🔓 Definition

A subset V ⊂ X is open if for every x ∈ V, there exists a δ > 0 such that B(x, δ) ⊂ V.

  • Intuition: An open set does not include its "boundary"—you can wiggle a little around any point and stay inside.
  • The radius δ depends on the point x (different points may need different radii).
  • Example: (0, ∞) ⊂ ℝ is open because for any x ∈ (0, ∞), taking δ = x gives B(x, δ) = (0, 2x) ⊂ (0, ∞).

🏗️ Properties of open sets

| Property | Statement | Key idea |
|---|---|---|
| Empty and whole space | ∅ and X are open | Vacuous truth for ∅; every ball in X stays in X |
| Finite intersection | Intersection of finitely many open sets is open | Take δ = min{δ₁, δ₂, ..., δₖ} |
| Arbitrary union | Union of any collection of open sets is open | If x is in the union, it's in some Vλ, so a ball around x fits in Vλ and hence in the union |

  • Don't confuse: Infinite intersections of open sets need not be open. Example: The intersection of (−1/n, 1/n) for all n ∈ ℕ is {0}, which is not open in ℝ.
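The infinite-intersection caveat can be checked numerically; `in_interval` is our helper for membership in (−1/n, 1/n).

```python
def in_interval(x, n):
    # membership in the open interval (-1/n, 1/n)
    return -1.0 / n < x < 1.0 / n

# 0 survives every intersection step ...
assert all(in_interval(0.0, n) for n in range(1, 10**4))
# ... but any x != 0 is expelled once 1/n <= |x|, so the full
# intersection is {0}, which is not open
x = 0.001
first_excluded = next(n for n in range(1, 10**4) if not in_interval(x, n))
print(first_excluded)  # 1000
```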

✅ Open balls are open

  • The excerpt proves that B(x, δ) is indeed an open set.
  • Proof idea: For any y ∈ B(x, δ), let α = δ − d(x, y) > 0. Then B(y, α) ⊂ B(x, δ) by the triangle inequality.
  • This justifies the terminology "open ball."

🔒 Closed sets

🔒 Definition

A subset E ⊂ X is closed if its complement Eᶜ = X \ E is open.

  • Intuition: A closed set contains all its boundary points; everything not in E is some distance away from E.
  • Example: [0, ∞) ⊂ ℝ is closed because its complement (−∞, 0) is open.

🏗️ Properties of closed sets

| Property | Statement | Key idea |
|---|---|---|
| Empty and whole space | ∅ and X are closed | Their complements are open |
| Arbitrary intersection | Intersection of any collection of closed sets is closed | Complement is a union of open sets |
| Finite union | Union of finitely many closed sets is closed | Complement is a finite intersection of open sets |

  • Proof strategy: Apply properties of open sets to complements.
  • Don't confuse: Infinite unions of closed sets need not be closed (dual to the open case).

✅ Closed balls are closed

  • The excerpt states that C(x, δ) is a closed set (proof left as exercise).
  • This justifies the terminology "closed ball."

🤔 Neither open nor closed

  • Most sets are neither open nor closed.
  • Example: [0, 1) ⊂ ℝ is neither open (no ball around 0 fits inside) nor closed (its complement is not open because balls around 1 contain points in [0, 1)).
  • A singleton {x} is always closed, but may or may not be open depending on the space.

🔗 Connected sets

🔗 Definition

A nonempty metric space X is connected if the only subsets that are both open and closed (clopen) are ∅ and X itself.

  • Equivalently: X is connected if it cannot be written as X = X₁ ∪ X₂ where X₁ and X₂ are nonempty, disjoint, and both open.
  • Intuition: You cannot split a connected set into two separated pieces.

🧩 Characterization for subsets

  • A nonempty S ⊂ X is disconnected if and only if there exist open sets U₁ and U₂ in X such that:
    • U₁ ∩ S ≠ ∅ and U₂ ∩ S ≠ ∅ (both pieces are nonempty),
    • U₁ ∩ U₂ ∩ S = ∅ (the pieces don't overlap in S),
    • S = (U₁ ∩ S) ∪ (U₂ ∩ S) (S is the union of the two pieces).
  • Example: If S ⊂ ℝ and there exist x < z < y with x, y ∈ S but z ∉ S, then S is disconnected (take U₁ = (−∞, z) and U₂ = (z, ∞)).
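For a finite S ⊂ ℝ with at least two points, such a disconnecting z can be found mechanically: the midpoint between any two consecutive points works. This sketch (and its function name) is ours, for illustration only.

```python
def disconnection_witness(S):
    # find z strictly between two points of S with z not in S;
    # then U1 = (-inf, z) and U2 = (z, inf) disconnect S
    pts = sorted(S)
    for a, b in zip(pts, pts[1:]):
        z = (a + b) / 2
        if z not in S:
            return z
    return None

S = {0.0, 1.0, 2.5}
z = disconnection_witness(S)
U1_part = {s for s in S if s < z}   # U1 ∩ S
U2_part = {s for s in S if s > z}   # U2 ∩ S
print(z, U1_part, U2_part)  # 0.5 {0.0} {1.0, 2.5}
```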

📏 Connected sets in ℝ

A nonempty set S ⊂ ℝ is connected if and only if S is an interval or a single point.

  • Proof idea (interval → connected): Suppose S is an interval split as S = (U₁ ∩ S) ∪ (U₂ ∩ S) with x ∈ U₁ ∩ S and y ∈ U₂ ∩ S, x < y. Let z = inf(U₂ ∩ [x, y]). If z were in U₂, openness of U₂ would put points of U₂ ∩ [x, y] strictly below z, contradicting the infimum; since z ∈ [x, y] ⊂ S, it follows that z ∈ U₁. But U₁ is open, so a ball around z lies in U₁, and that ball contains points of U₂ ∩ [x, y] (arbitrarily close to the infimum), contradicting U₁ ∩ U₂ ∩ S = ∅.
  • This characterization is specific to ℝ; other metric spaces have different connected sets.
  • Example: In a two-point discrete metric space {a, b}, the ball B(a, 2) = {a, b} is not connected.

📦 Closure, interior, and boundary

📦 Closure

The closure of A ⊂ X is Ā = ⋂{E ⊂ X : E is closed and A ⊂ E}.

  • Ā is the intersection of all closed sets containing A.
  • Ā is closed, and A ⊂ Ā.
  • If A is already closed, then Ā = A.
  • Example: The closure of (0, 1) in ℝ is [0, 1].
  • Subspace caution: The closure of (0, 1) in (0, ∞) is (0, 1] (not [0, 1], because 0 is not in the ambient space).

🎯 Characterization of closure

x ∈ Ā if and only if for every δ > 0, B(x, δ) ∩ A ≠ ∅.

  • Intuition: Ā consists of all points that can be "approached" from within A.
  • Proof idea: x ∉ Ā means there's a closed set E containing A with x ∉ E, so Eᶜ is open and contains x, giving a ball around x disjoint from A.

🏠 Interior

The interior of A is A° = {x ∈ A : there exists δ > 0 such that B(x, δ) ⊂ A}.

  • A° consists of points that have "wiggle room" entirely within A.
  • A° is open, and A° ⊂ A.
  • A is open if and only if A° = A.
  • Example: For A = (0, 1] in ℝ, A° = (0, 1).

🚧 Boundary

The boundary of A is ∂A = Ā \ A°.

  • ∂A consists of points in the closure but not in the interior.
  • ∂A is closed.
  • Example: For A = (0, 1] in ℝ, ∂A = {0, 1}.

🔍 Characterization of boundary

x ∈ ∂A if and only if for every δ > 0, both B(x, δ) ∩ A and B(x, δ) ∩ Aᶜ are nonempty.

  • Intuition: Boundary points can be approached from both the set and its complement.
  • Corollary: ∂A is the intersection of the closure of A with the closure of Aᶜ (the boundary is where the closures of A and its complement meet).
  • A is closed if and only if ∂A ⊂ A.
  • A is open if and only if ∂A ∩ A = ∅.
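The ball criterion for ∂A can be tested numerically for A = [0, 1) ⊂ ℝ. The two interval-overlap predicates below are our encoding of "B(p, δ) meets A" and "B(p, δ) meets Aᶜ"; sampling several radii is a sanity check, not a proof.

```python
# A = [0, 1) in R, so B(p, d) = (p - d, p + d) meets A iff the intervals
# overlap, and similarly for the complement (-inf, 0) ∪ [1, inf).
A_meets  = lambda p, d: p - d < 1 and p + d > 0
Ac_meets = lambda p, d: p - d < 0 or p + d > 1

def on_boundary(p, deltas):
    # sample the criterion at several radii
    return all(A_meets(p, d) and Ac_meets(p, d) for d in deltas)

deltas = [10**-k for k in range(1, 8)]
print([on_boundary(p, deltas) for p in (0.0, 0.5, 1.0)])
# [True, False, True] -- matching ∂A = {0, 1}
```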

🔄 Subspace topology

🔄 Open sets in subspaces

U ⊂ Y is open in Y (with subspace metric) if and only if there exists an open set V ⊂ X such that V ∩ Y = U.

  • Example: Let X = ℝ, Y = [0, 1], U = [0, 1/2). Then U is open in Y because U = V ∩ Y where V = (−1/2, 1/2) is open in ℝ.
  • Proof idea (V ∩ Y = U → U open in Y): For x ∈ U, since V is open in X, there's a δ > 0 with Bₓ(x, δ) ⊂ V. Then Bᵧ(x, δ) = Bₓ(x, δ) ∩ Y ⊂ V ∩ Y = U.

🎯 Special cases

  • If V ⊂ X is open and U ⊂ V, then U is open in V if and only if U is open in X.
    • Proof: If U is open in V, write U = W ∩ V for some W open in X. Then U is the intersection of two open sets in X, hence open in X.
  • If E ⊂ X is closed and F ⊂ E, then F is closed in E if and only if F is closed in X.
    • Similar reasoning using complements.

Sequences and convergence

7.3 Sequences and convergence

🧭 Overview

🧠 One-sentence thesis

Sequences in metric spaces converge when their terms eventually stay arbitrarily close to a limit point, and this convergence behavior is fully determined by the topology (the collection of open sets) of the space.

📌 Key points (3–5)

  • What a sequence is: a function from natural numbers into a metric space, written {xₙ}∞ₙ₌₁.
  • What convergence means: for every epsilon > 0, there exists M such that all terms beyond M lie within epsilon of the limit p.
  • Topology encodes convergence: a sequence converges to p if and only if every open neighborhood of p eventually contains all terms.
  • Common confusion: convergence in different metrics—uniform convergence of functions is exactly convergence in the uniform-norm metric space, but pointwise convergence cannot be captured by any metric.
  • Closed sets and limits: closed sets contain the limits of all their convergent sequences; the closure of a set A consists of A plus all limits of sequences in A.

📐 Basic definitions

📐 What is a sequence in a metric space

Sequence: A function x : ℕ → X from the natural numbers into a metric space (X, d). We write xₙ for the nth element and {xₙ}∞ₙ₌₁ for the whole sequence.

  • This generalizes sequences of real numbers by replacing ℝ with an arbitrary metric space X.
  • The notation and intuition are the same as for real sequences.

📏 Bounded sequences

Bounded sequence: A sequence {xₙ}∞ₙ₌₁ is bounded if there exists a point p ∈ X and B ∈ ℝ such that d(p, xₙ) ≤ B for all n ∈ ℕ.

  • Equivalently: the set {xₙ : n ∈ ℕ} is a bounded set.
  • All terms lie within some fixed distance from a reference point.

🔗 Subsequences

Subsequence: If {nₖ}∞ₖ₌₁ is a sequence of natural numbers with nₖ₊₁ > nₖ for all k, then {xₙₖ}∞ₖ₌₁ is a subsequence of {xₙ}∞ₙ₌₁.

  • You pick out infinitely many terms in increasing order.
  • Example: if the original sequence is x₁, x₂, x₃, x₄, ..., a subsequence might be x₂, x₄, x₆, x₈, ...

🎯 Convergence

🎯 Definition of convergence

Convergence: A sequence {xₙ}∞ₙ₌₁ in (X, d) converges to p ∈ X if for every ε > 0, there exists M ∈ ℕ such that d(xₙ, p) < ε for all n ≥ M.

  • Plain language: eventually, all terms get arbitrarily close to p and stay close.
  • The point p is called a limit of the sequence.
  • A sequence that converges is convergent; otherwise it is divergent.
  • If the limit is unique, we write lim(n→∞) xₙ = p.

✨ Uniqueness of limits

Proposition: A convergent sequence in a metric space has a unique limit.

  • Proof idea: suppose xₙ converges to both x and y. For any ε > 0, find M₁ and M₂ such that d(xₙ, x) < ε/2 and d(xₙ, y) < ε/2 for large enough n. Then by the triangle inequality, d(x, y) ≤ d(x, xₙ) + d(xₙ, y) < ε. Since this holds for every ε > 0, we have d(x, y) = 0, so x = y.
  • Don't confuse: in some contexts (not metric spaces), limits might not be unique, but in metric spaces they always are.

🔍 Basic properties of convergent sequences

Convergent sequences are bounded: If {xₙ}∞ₙ₌₁ converges to p, there exists B such that d(p, xₙ) ≤ B for all n.

Subsequences inherit convergence: If {xₙ}∞ₙ₌₁ converges to p, then every subsequence {xₙₖ}∞ₖ₌₁ also converges to p.

Tails don't matter: If the K-tail {xₙ}∞ₙ₌K₊₁ converges to p, then the full sequence {xₙ}∞ₙ₌₁ converges to p.

  • Intuition: convergence is about eventual behavior; finitely many initial terms are irrelevant.

🔄 Alternative characterization via real sequences

Proposition: A sequence {xₙ}∞ₙ₌₁ in (X, d) converges to p ∈ X if and only if there exists a sequence {aₙ}∞ₙ₌₁ of real numbers such that d(xₙ, p) ≤ aₙ for all n ∈ ℕ, and lim(n→∞) aₙ = 0.

  • This reduces convergence in a metric space to convergence of real numbers.
  • The distances d(xₙ, p) must be squeezed to zero by a sequence of real numbers converging to zero.
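The squeeze characterization lends itself to a quick check in ℝ². The example sequence below is ours: its distances to the origin equal 1/n, so they are dominated by aₙ = 2/n → 0.

```python
import math

def dominated(x_seq, p, d, a_seq, N=10**4):
    # verify d(x_n, p) <= a_n for n = 1, ..., N
    return all(d(x_seq(n), p) <= a_seq(n) for n in range(1, N + 1))

d = lambda x, y: math.hypot(x[0] - y[0], x[1] - y[1])  # Euclidean on R^2
x_seq = lambda n: (math.cos(n) / n, math.sin(n) / n)   # -> (0, 0)
print(dominated(x_seq, (0.0, 0.0), d, lambda n: 2.0 / n))  # True
```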

🌐 Special cases

🌐 Convergence in Euclidean space ℝⁿ

Proposition: A sequence {xₘ}∞ₘ₌₁ in ℝⁿ, where xₘ = (xₘ,₁, xₘ,₂, ..., xₘ,ₙ), converges if and only if each component sequence {xₘ,ₖ}∞ₘ₌₁ converges for k = 1, 2, ..., n. In that case:

lim(m→∞) xₘ = (lim(m→∞) xₘ,₁, lim(m→∞) xₘ,₂, ..., lim(m→∞) xₘ,ₙ).

  • Proof sketch (one direction): if xₘ → y = (y₁, ..., yₙ), then for any k, the absolute difference |yₖ - xₘ,ₖ| is at most the Euclidean distance d(y, xₘ), which goes to zero.
  • Proof sketch (other direction): if each component converges, pick M so that |yₖ - xₘ,ₖ| < ε/√n for all k when m ≥ M. Then the Euclidean distance is the square root of the sum of squares, which is less than ε.
  • Example: for complex numbers ℂ ≅ ℝ², a sequence {zₙ}∞ₙ₌₁ = {xₙ + iyₙ}∞ₙ₌₁ converges to z = x + iy if and only if {xₙ}∞ₙ₌₁ → x and {yₙ}∞ₙ₌₁ → y.

📚 Convergence in function spaces

Example: Consider C([a, b], ℝ), the set of continuous functions on [a, b] with the uniform norm metric.

  • Convergence in this metric space is exactly uniform convergence of functions.
  • That is, {fₙ}∞ₙ₌₁ converges uniformly if and only if it converges in the metric space sense.
  • Remark: There is no metric on the set of all functions f : [a, b] → ℝ (continuous or not) that gives pointwise convergence. (Proof beyond scope.)

🏛️ Convergence and topology

🏛️ Open neighborhoods characterize convergence

Proposition: A sequence {xₙ}∞ₙ₌₁ converges to p ∈ X if and only if for every open neighborhood U of p, there exists M ∈ ℕ such that xₙ ∈ U for all n ≥ M.

  • Plain language: convergence means the sequence eventually enters and stays inside every open set containing p.
  • Proof (one direction): if xₙ → p, then for any open U containing p, there exists ε > 0 such that the ball B(p, ε) ⊂ U. By convergence, find M such that d(p, xₙ) < ε for n ≥ M, so xₙ ∈ B(p, ε) ⊂ U.
  • Proof (other direction): given ε > 0, let U = B(p, ε). By hypothesis, there exists M such that xₙ ∈ U = B(p, ε) for n ≥ M, so d(p, xₙ) < ε.
  • Why this matters: convergence depends only on the topology (the collection of open sets), not on the specific metric.

🔒 Closed sets contain limits

Proposition: Let (X, d) be a metric space, E ⊂ X a closed set, and {xₙ}∞ₙ₌₁ a sequence in E that converges to some p ∈ X. Then p ∈ E.

  • Proof: suppose p ∉ E, i.e., p ∈ Eᶜ. Since Eᶜ is open, the previous proposition gives M such that xₙ ∈ Eᶜ for all n ≥ M, contradicting that every xₙ lies in E.
  • Intuition: closed sets are "closed under taking limits."
  • Don't confuse: this does not say every sequence in E converges; it says if a sequence in E converges (to something in X), that limit must be in E.

🧲 Closure and sequences

Proposition: Let (X, d) be a metric space and A ⊂ X. Then p ∈ Ā (the closure of A) if and only if there exists a sequence {xₙ}∞ₙ₌₁ of elements in A such that lim(n→∞) xₙ = p.

  • Proof (one direction): if p ∈ Ā, then for every n ∈ ℕ, there exists xₙ ∈ B(p, 1/n) ∩ A (by a previous proposition about closure). Since d(p, xₙ) < 1/n, we have lim(n→∞) xₙ = p.
  • Proof (other direction): if a sequence in A converges to p, then p is in the closure of A (left as exercise in the excerpt).
  • Plain language: the closure of A consists of A plus all limits of sequences in A.

| Concept | What it means | Key property |
|---|---|---|
| Convergence | Terms eventually stay within ε of p | Characterized by open neighborhoods |
| Closed set | Contains all its limit points | If a sequence in E converges, the limit is in E |
| Closure Ā | Smallest closed set containing A | Points in Ā are limits of sequences in A |

🧪 Exercises and extensions

🧪 Discrete metric

Exercise insight: In a metric space with the discrete metric (d(x, y) = 1 if x ≠ y, 0 if x = y), if a sequence {xₙ}∞ₙ₌₁ converges, then there exists K ∈ ℕ such that xₙ = xₖ for all n ≥ K.

  • Intuition: in the discrete metric, every point is isolated, so convergence means the sequence is eventually constant.

🧪 Dense sets

Dense set: A set S ⊂ X is dense in X if X ⊂ S̄ (the closure of S equals X), or equivalently, if for every p ∈ X, there exists a sequence {xₙ}∞ₙ₌₁ in S that converges to p.

  • Example: ℝⁿ contains a countable dense subset (e.g., points with rational coordinates).
  • Intuition: a dense set "fills up" the space in the sense that you can approximate any point arbitrarily well using points from the dense set.

🧪 Alternative metrics on ℝⁿ

Exercise insight: On ℝⁿ, the standard Euclidean metric d, the sum metric d′(x, y) = sum of |xₖ - yₖ|, and the max metric d″(x, y) = max of |xₖ - yₖ| all give the same notion of convergence.

  • A sequence converges in one metric if and only if it converges in the others (to the same limit).
  • This shows that different metrics can generate the same topology.
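The equivalence comes from the standard comparison d″ ≤ d ≤ d′ ≤ n·d″: each metric is squeezed by a constant multiple of the others, so a sequence converging in one converges in all. A quick randomized check of these inequalities (our sketch):

```python
import math
import random

def d_euclid(x, y): return math.sqrt(sum((a - b)**2 for a, b in zip(x, y)))
def d_sum(x, y):    return sum(abs(a - b) for a, b in zip(x, y))
def d_max(x, y):    return max(abs(a - b) for a, b in zip(x, y))

# check d'' <= d <= d' <= n * d'' on random points of R^5
random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    assert d_max(x, y) <= d_euclid(x, y) + 1e-12
    assert d_euclid(x, y) <= d_sum(x, y) + 1e-12
    assert d_sum(x, y) <= n * d_max(x, y) + 1e-12
print("inequalities hold on 1000 random samples")
```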

🧪 Extended reals

Exercise insight: The extended reals ℝ* = {-∞} ∪ ℝ ∪ {∞} can be given a metric such that:

  • A sequence of real numbers converges to ∞ in (ℝ*, d) if and only if for every M ∈ ℝ, there exists N such that xₙ ≥ M for all n ≥ N.
  • A sequence of real numbers converges to a real number in (ℝ*, d) if and only if it converges in ℝ with the standard metric.

Completeness and compactness

7.4 Completeness and compactness

🧭 Overview

🧠 One-sentence thesis

Completeness ensures every Cauchy sequence converges within the space, while compactness guarantees every sequence has a convergent subsequence and every open cover has a finite subcover, with closed and bounded subsets of ℝⁿ being precisely the compact ones.

📌 Key points (3–5)

  • Cauchy sequences and completeness: A metric space is complete if every Cauchy sequence converges to a point in the space; ℝⁿ with the standard metric is complete.
  • Compactness via covers: A set is compact if every open cover has a finite subcover; equivalently (in metric spaces), every sequence has a convergent subsequence.
  • Heine–Borel theorem: In ℝⁿ, a set is compact if and only if it is closed and bounded—this equivalence does not hold in general metric spaces.
  • Common confusion: Compact always implies closed and bounded, but closed and bounded does not always imply compact (only in ℝⁿ); also, completeness depends on the metric, but compactness depends only on the topology.
  • Why it matters: Compactness is preserved under continuous functions, closed subsets of compact sets are compact, and the Lebesgue covering lemma provides a uniform "δ" for any open cover of a compact set.

🔄 Cauchy sequences and completeness

🔄 What is a Cauchy sequence

Cauchy sequence: A sequence {xₙ} in a metric space (X, d) such that for every ε > 0, there exists M ∈ ℕ so that for all n ≥ M and all k ≥ M, we have d(xₙ, xₖ) < ε.

  • The definition generalizes the concept from real numbers to arbitrary metric spaces.
  • A sequence of real numbers is Cauchy in the sense of chapter 2 if and only if it is Cauchy in the metric space sense with the standard metric d(x, y) = |x − y|.
  • Key property: Every convergent sequence is Cauchy (the converse is not always true).

Proof sketch: If {xₙ} converges to p, given ε > 0, find M such that d(p, xₙ) < ε/2 for all n ≥ M. Then for n, k ≥ M, the triangle inequality gives d(xₙ, xₖ) ≤ d(xₙ, p) + d(p, xₖ) < ε/2 + ε/2 = ε.

🔄 Complete metric spaces

Complete (Cauchy-complete) metric space: A metric space (X, d) in which every Cauchy sequence converges to a point in X.

  • Completeness means "no missing limit points"—every sequence that should converge (i.e., is Cauchy) actually does converge within the space.
  • Example: ℝⁿ with the standard metric is complete.
  • Example: The space of continuous functions C([a, b], ℝ) with the uniform norm is complete.
  • Non-example: (0, 1] with the subspace metric is not complete, because {1/n} is Cauchy but converges to 0, which is not in (0, 1].
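The non-example {1/n} can be probed numerically for a single fixed ε. The search below is our illustration: it finds the least M that works for ε = 0.01 on indices up to N, showing the Cauchy condition kicking in even though the would-be limit 0 is missing from (0, 1].

```python
def cauchy_index(seq, eps, N):
    # find the least M (up to N) with |x_n - x_k| < eps for all
    # M <= n, k < N -- the Cauchy condition for this one eps
    for M in range(1, N):
        if all(abs(seq(n) - seq(k)) < eps
               for n in range(M, N) for k in range(M, N)):
            return M
    return None

seq = lambda n: 1.0 / n      # lives in (0, 1]; Cauchy, "limit" 0 missing
print(cauchy_index(seq, eps=0.01, N=400))  # 80
```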

🔄 Completeness of ℝⁿ

Proof strategy: Reduce to the one-dimensional case.

  • Let {xₘ} be a Cauchy sequence in ℝⁿ, where xₘ = (xₘ,₁, xₘ,₂, …, xₘ,ₙ).
  • For each coordinate k = 1, 2, …, n, the sequence {xₘ,ₖ} is Cauchy in ℝ because |xᵢ,ₖ − xⱼ,ₖ| ≤ d(xᵢ, xⱼ).
  • Since ℝ is complete, each coordinate sequence converges: yₖ = lim xₘ,ₖ.
  • Define y = (y₁, y₂, …, yₙ). Then {xₘ} converges to y in ℝⁿ.

🔄 Closed subsets of complete spaces

Proposition: If (X, d) is a complete metric space and E ⊂ X is closed, then E is complete with the subspace metric.

  • Intuition: A closed set "contains all its limit points," so Cauchy sequences in E cannot escape E.
  • Example: [0, 1] is complete (as a subspace of ℝ), but (0, 1) is not.

🎯 Compactness: definition and basic properties

🎯 Definition via open covers

Compact set: A set K in a metric space (X, d) such that every open cover {Uλ}λ∈I of K has a finite subcover.

  • An open cover of K is a collection of open sets whose union contains K.
  • K is compact means: no matter how you cover K with open sets, you can always find finitely many that still cover K.

Example: {0} ⊂ ℝ is compact. Given any open cover, there must be some Uλ₀ containing 0, and that single set is a finite subcover.

Non-example: ℝ is not compact. Cover ℝ by Uⱼ = (−j, j) for j ∈ ℕ. Any finite subcollection Uⱼ₁ ∪ … ∪ Uⱼₘ equals Uⱼₘ (where jₘ is the largest index), but jₘ ∉ Uⱼₘ, so ℝ is not covered.

Non-example: (0, 1) is not compact. Cover by Uⱼ = (1/j, 1 − 1/j) for j = 3, 4, 5, …. Any finite subcollection equals some Uⱼ, which does not contain all of (0, 1).
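The failure of finite subcovers for (0, 1) can be made concrete: any finite batch U₃, ..., Uₘ misses every point below 1/m. The membership predicate below is our encoding.

```python
# U_j = (1/j, 1 - 1/j) for j >= 3 covers (0, 1), but the finite batch
# U_3, ..., U_m misses every point below 1/m
in_U = lambda j, x: 1 / j < x < 1 - 1 / j

m = 100
x = 1 / (2 * m)                                  # in (0, 1), below 1/m
print(any(in_U(j, x) for j in range(3, m + 1)))  # False: x not covered
print(in_U(4 * m, x))                            # True: a later U_j covers x
```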

🎯 Compact implies closed and bounded

Proposition: If K ⊂ X is compact, then K is closed and bounded.

Proof (bounded): Fix p ∈ X. Cover K by B(p, n) for n = 1, 2, 3, …. A finite subcover exists, say B(p, n₁) ∪ … ∪ B(p, nₘ) = B(p, nₘ). Thus K is contained in a ball, so K is bounded.

Proof (closed): Suppose K ≠ K̄, so there exists x ∈ K̄ \ K. Cover K by the complements of closed balls: K ⊂ ⋃ₙ₌₁^∞ C(x, 1/n)ᶜ. Any finite subcollection equals C(x, 1/nₘ)ᶜ for the largest index nₘ. But x ∈ K̄ implies C(x, 1/nₘ) ∩ K ≠ ∅, so K is not covered by the finite subcollection. Hence K is not compact.

Don't confuse: Compact ⇒ closed and bounded, but closed and bounded ⇏ compact in general metric spaces (only in ℝⁿ by Heine–Borel).

🎯 Sequential compactness

Sequentially compact: A set K such that every sequence in K has a subsequence converging to a point in K.

Theorem: In a metric space, K is compact if and only if K is sequentially compact.

  • This equivalence is specific to metric spaces; in more general topological spaces, the two notions can differ.
  • Proof direction 1 (compact ⇒ sequentially compact): If K is compact and {xₙ} is a sequence in K, there exists x ∈ K such that every ball B(x, δ) contains xₙ for infinitely many n (otherwise every point of K would have a ball containing only finitely many terms, and a finite subcover by such balls would leave only finitely many terms of the sequence—impossible). Construct a subsequence inductively: pick n₁ such that xₙ₁ ∈ B(x, 1), then nⱼ > nⱼ₋₁ such that xₙⱼ ∈ B(x, 1/j). Then {xₙⱼ} converges to x.
  • Proof direction 2 (sequentially compact ⇒ compact): Use the Lebesgue covering lemma (see below).

📏 Lebesgue covering lemma and applications

📏 The Lebesgue covering lemma

Lebesgue covering lemma: If K is sequentially compact and {Uλ}λ∈I is an open cover of K, there exists δ > 0 (the Lebesgue number) such that for every x ∈ K, there exists λ ∈ I with B(x, δ) ⊂ Uλ.

  • The δ depends on the cover, but not on x—it is a uniform constant.
  • Proof by contrapositive: If no such δ exists, then for each n ∈ ℕ, there is xₙ ∈ K such that B(xₙ, 1/n) is not contained in any Uλ. For any x ∈ K, pick λ such that x ∈ Uλ. Since Uλ is open, B(x, ε) ⊂ Uλ for some ε > 0. For large n (n ≥ M with 1/M < ε/2), if y ∈ B(x, ε/2), then B(y, 1/n) ⊂ B(x, ε) ⊂ Uλ. Thus xₙ ∉ B(x, ε/2) for all n ≥ M, so {xₙ} has no subsequence converging to x. Since x was arbitrary, {xₙ} has no convergent subsequence, contradicting sequential compactness.

Example: Let K = [−10, 10] and Uₙ = (n, n + 2) for n ∈ ℤ. Then δ = 1/2 works: for any x ∈ K, if n ≤ x < n + 1/2, then B(x, 1/2) ⊂ Uₙ₋₁; if n + 1/2 ≤ x < n + 1, then B(x, 1/2) ⊂ Uₙ.
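The claim δ = 1/2 for this cover can be verified on a grid sample of K; the containment test B(x, δ) ⊂ (lo, hi) reduces to two inequalities. This check is our sketch, not a proof.

```python
def lebesgue_number_ok(K_sample, delta, intervals):
    # check: each sampled x in K has some open interval (lo, hi) in
    # the cover with B(x, delta) = (x - delta, x + delta) inside it
    return all(
        any(lo <= x - delta and x + delta <= hi for lo, hi in intervals)
        for x in K_sample
    )

# K = [-10, 10], U_n = (n, n + 2) for integers n; claim: delta = 1/2 works
cover = [(n, n + 2) for n in range(-12, 12)]
K_sample = [k / 100 for k in range(-1000, 1001)]  # grid sample of K
print(lebesgue_number_ok(K_sample, 0.5, cover))   # True
```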

📏 Proof that sequentially compact implies compact

Proof sketch: Given an open cover {Uλ}, use the Lebesgue lemma to find δ > 0. Pick x₁ ∈ K and find λ₁ such that B(x₁, δ) ⊂ Uλ₁. If K ⊂ Uλ₁, stop. Otherwise, pick x₂ ∈ K \ Uλ₁ (note d(x₂, x₁) ≥ δ) and find λ₂ such that B(x₂, δ) ⊂ Uλ₂. Continue inductively. Either we stop at a finite subcover, or we obtain an infinite sequence {xₙ} with d(xₙ, xₖ) ≥ δ for all n ≠ k. Such a sequence has no Cauchy subsequence, hence no convergent subsequence, contradicting sequential compactness.

🏛️ Heine–Borel theorem and examples

🏛️ Heine–Borel theorem

Heine–Borel theorem: A subset K ⊂ ℝⁿ is compact if and only if it is closed and bounded.

  • This is the key characterization of compactness in Euclidean spaces.
  • Only in ℝⁿ: The theorem does not hold in general metric spaces, even complete ones.

Proof for ℝ: If K ⊂ ℝ is closed and bounded, then K ⊂ [a, b] for some closed interval. The interval [a, b] is compact (by Bolzano–Weierstrass: every sequence in [a, b] has a convergent subsequence with limit in [a, b]). Since K is a closed subset of the compact set [a, b], K is compact.

Proof for ℝ²: If K ⊂ ℝ² is closed and bounded, then K ⊂ B = [a, b] × [c, d]. Let {(xₖ, yₖ)} be a sequence in B. The sequence {xₖ} is bounded, so it has a convergent subsequence {xₖⱼ}. The subsequence {yₖⱼ} is also bounded, so it has a convergent subsequence {yₖⱼᵢ}. Then {(xₖⱼᵢ, yₖⱼᵢ)} converges to some (x, y) ∈ B. Thus B is sequentially compact, hence compact. Since K is a closed subset of B, K is compact.

Extension to ℝⁿ: Use induction on n, extracting convergent subsequences coordinate by coordinate.

🏛️ Examples and counterexamples

| Set | Compact? | Reason |
|---|---|---|
| {0} ⊂ ℝ | Yes | Finite sets are always compact |
| [0, 1] ⊂ ℝ | Yes | Closed and bounded in ℝ (Heine–Borel) |
| (0, 1) ⊂ ℝ | No | Not closed |
| ℝ | No | Not bounded |
| (0, 1] ⊂ ℝ | No | Not closed |
| Closed unit ball in C([a, b], ℝ) | No | Closed and bounded, but not compact (Heine–Borel fails outside ℝⁿ) |

Discrete metric example: Let (X, d) be an infinite set with the discrete metric (d(x, y) = 1 if x ≠ y).

  • (X, d) is complete.
  • Every subset is closed and bounded.
  • A subset K ⊂ X is compact if and only if K is finite.
  • The Lebesgue covering lemma conclusion holds for any K (e.g., δ = 1/2), even noncompact K.

🏛️ Closed subsets of compact sets

Proposition: If (X, d) is a metric space and K ⊂ X is compact, then every closed subset E ⊂ K is compact.

Proof: Let {xₙ} be a sequence in E. Since E ⊂ K and K is compact, {xₙ} has a subsequence {xₙⱼ} converging to some x ∈ K. Since E is closed and all terms of the subsequence are in E, the limit x must also be in E. Thus E is sequentially compact, hence compact.

Application: To show a set is compact, it suffices to show it is a closed subset of a known compact set.

⚠️ Important distinctions and subtleties

⚠️ Completeness vs compactness

| Property | Completeness | Compactness |
|---|---|---|
| Definition | Every Cauchy sequence converges | Every open cover has a finite subcover (or: every sequence has a convergent subsequence) |
| Depends on | The metric | Only the topology (which sets are open) |
| Examples | ℝⁿ, C([a, b], ℝ) | [a, b] ⊂ ℝⁿ, finite sets |
| Relation | Compact ⇒ complete (as a subspace) | Complete ⇏ compact (e.g., ℝ is complete but not compact) |

Don't confuse: A metric space can be complete without being compact (ℝ), or compact without being complete in the original metric (but compact sets are always complete as subspaces).

⚠️ Closed and bounded vs compact

  • In ℝⁿ: Closed and bounded ⇔ compact (Heine–Borel).
  • In general metric spaces: Compact ⇒ closed and bounded, but closed and bounded ⇏ compact.
  • Example: In C([0, 1], ℝ) with the uniform norm, the closed unit ball is closed and bounded but not compact.
  • Example: (0, 1) with the subspace metric from ℝ is bounded but not closed, hence not compact.

⚠️ Dependence on ambient space

  • Closed: Depends on the ambient space. (0, 1] is not closed in ℝ but is closed in (0, ∞).
  • Compact: Does not depend on the ambient space. If K is compact in (X, d), then K is compact in the subspace metric on K itself.
  • Bounded: Depends on the ambient space and the metric.

⚠️ Topology vs metric

Remark: Compactness and convergence depend only on the topology (which sets are open), but Cauchy sequences and completeness depend on the actual metric.

Example: Consider ℝ with two metrics:

  • d(x, y) = |x − y| (standard metric)
  • d′(x, y) = |x/(1 + |x|) − y/(1 + |y|)|

The two metrics give the same topology (same open sets), so:

  • A set is compact in (ℝ, d) ⇔ compact in (ℝ, d′).
  • A sequence converges in (ℝ, d) ⇔ converges in (ℝ, d′).

But:

  • (ℝ, d) is complete, but (ℝ, d′) is not complete.
  • There exist sequences that are Cauchy in (ℝ, d′) but not Cauchy in (ℝ, d).
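A concrete witness: the sequence xₙ = n. Far-out terms stay a huge d-distance apart, but under d′ they crowd toward the squashed value 1, so the sequence is d′-Cauchy without converging anywhere in ℝ.

```python
def d_std(x, y):
    return abs(x - y)

def d_prime(x, y):
    phi = lambda t: t / (1 + abs(t))   # squashes R into (-1, 1)
    return abs(phi(x) - phi(y))

# x_n = n: d-distances between far-out terms stay huge, while the
# d'-distances shrink -- so {n} is d'-Cauchy but not d-Cauchy
print(d_std(10**6, 2 * 10**6))           # 1000000
print(d_prime(10**6, 2 * 10**6) < 1e-6)  # True
```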

Continuous functions

7.5 Continuous functions

🧭 Overview

🧠 One-sentence thesis

Continuity in metric spaces generalizes the real-line definition through epsilon-delta conditions, preserves compactness, and can be characterized topologically by the behavior of preimages of open sets.

📌 Key points (3–5)

  • Definition of continuity: f : X → Y is continuous at c if for every ε > 0 there exists δ > 0 such that d_X(x, c) < δ implies d_Y(f(x), f(c)) < ε.
  • Sequential characterization: f is continuous at c if and only if for every sequence converging to c, the image sequence converges to f(c).
  • Topological characterization: f is continuous if and only if the preimage of every open set is open.
  • Compactness preservation: continuous functions map compact sets to compact sets, and continuous functions on compact spaces achieve absolute maximum and minimum.
  • Common confusion: taking limits separately in each variable does not guarantee continuity—a function can be continuous in each variable separately but still fail to be continuous as a function of both variables.

🔍 Core definition and characterizations

🔍 Epsilon-delta definition

Continuity at a point: Let (X, d_X) and (Y, d_Y) be metric spaces and c ∈ X. Then f : X → Y is continuous at c if for every ε > 0 there is a δ > 0 such that whenever x ∈ X and d_X(x, c) < δ, then d_Y(f(x), f(c)) < ε.

  • This extends the familiar real-line definition to arbitrary metric spaces.
  • When f is continuous at all c ∈ X, we say f is a continuous function.
  • The definition agrees with chapter 3 when f is a real-valued function on the real line with the standard metric.

📊 Sequential characterization (Proposition 7.5.2)

Statement: f : X → Y is continuous at c ∈ X if and only if for every sequence {x_n} in X converging to c, the sequence {f(x_n)} converges to f(c).

Why this matters:

  • Sequences are often easier to work with than epsilon-delta arguments.
  • This characterization shows that continuity is determined by convergent sequences.

Proof sketch:

  • Forward direction: If f is continuous, given ε > 0, find δ > 0 from continuity; since x_n → c, eventually d_X(x_n, c) < δ, so d_Y(f(x_n), f(c)) < ε.
  • Reverse direction (contrapositive): If f is not continuous at c, there exists ε > 0 such that for every n ∈ ℕ there exists x_n with d_X(x_n, c) < 1/n but d_Y(f(x_n), f(c)) ≥ ε. Then x_n → c but f(x_n) does not converge to f(c).

⚠️ Separate continuity vs joint continuity

Important warning: A function of two variables can be continuous in each variable separately but still fail to be continuous.

Example: Define f : ℝ² → ℝ by f(x, y) = xy/(x² + y²) for (x, y) ≠ (0, 0) and f(0, 0) = 0.

  • For every fixed y, the function g(x) = f(x, y) is continuous.
  • For every fixed x, the function h(y) = f(x, y) is continuous.
  • However, f is not continuous at the origin (Exercise 7.5.2).
  • Don't confuse: separate continuity does NOT imply joint continuity.
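Evaluating the example function along two different paths to the origin makes the failure visible: both axes give values 0, but the diagonal y = x gives the constant value 1/2.

```python
def f(x, y):
    # the example function: 0 at the origin, xy/(x^2 + y^2) elsewhere
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x**2 + y**2)

# along either axis f is identically 0, so each partial function
# is continuous at 0 ...
print([f(t, 0.0) for t in (0.1, 0.01, 0.001)])  # [0.0, 0.0, 0.0]
# ... yet along the diagonal (t, t) -> (0, 0) the values sit at 1/2,
# so f is not continuous at the origin
print([f(t, t) for t in (0.1, 0.01, 0.001)])    # [0.5, 0.5, 0.5]
```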

🧮 Polynomials are continuous (Example 7.5.3)

A polynomial in n variables is continuous:

  • For ℝ², a polynomial has the form f(x, y) = sum of a_jk x^j y^k.
  • If (x_n, y_n) → (x, y), then x_n → x and y_n → y separately.
  • By properties of limits (Proposition 2.2.5), lim f(x_n, y_n) = f(x, y).
  • Similarly for polynomials in any number of variables.

🔢 Complex-valued functions (Example 7.5.4)

For f : X → ℂ with f(p) = g(p) + ih(p):

  • f is continuous at c if and only if both g (real part) and h (imaginary part) are continuous at c.
  • This follows because a sequence in ℂ converges if and only if both real and imaginary parts converge.

🎯 Compactness and continuity

🎯 Continuous images of compact sets (Lemma 7.5.5)

Statement: Let f : X → Y be continuous. If K ⊂ X is compact, then f(K) is compact.

Proof idea:

  • Take any sequence {f(x_n)} in f(K), where {x_n} is a sequence in K.
  • Since K is compact, there exists a convergent subsequence x_{n_j} → x ∈ K.
  • By continuity, f(x_{n_j}) → f(x) ∈ f(K).
  • So every sequence in f(K) has a convergent subsequence in f(K), hence f(K) is compact.

Why this matters: Continuous maps preserve compactness, even though they don't necessarily preserve closedness.

📈 Extreme value theorem (Theorem 7.5.6)

Statement: Let (X, d) be a nonempty compact metric space and f : X → ℝ continuous. Then f is bounded and achieves both an absolute minimum and an absolute maximum on X.

Absolute minimum: f achieves an absolute minimum at c ∈ X if f(x) ≥ f(c) for all x ∈ X.

Absolute maximum: f achieves an absolute maximum at c ∈ X if f(x) ≤ f(c) for all x ∈ X.

Proof sketch:

  • Since X is compact and f is continuous, f(X) ⊂ ℝ is compact.
  • Hence f(X) is closed and bounded.
  • In particular, sup f(X) ∈ f(X) and inf f(X) ∈ f(X) (because f(X) is closed).
  • Therefore there exist x, y ∈ X such that f(x) = sup f(X) and f(y) = inf f(X).
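
A small numerical sketch of the theorem (my own example, not from the text): on the compact interval [0, 3], the continuous function f(x) = x·e^(−x) must attain its maximum; a fine grid locates it near x = 1 with value 1/e.

```python
import math

def f(x):
    # continuous on the compact interval [0, 3]
    return x * math.exp(-x)

N = 10**5
xs = [3.0 * i / N for i in range(N + 1)]
vals = [f(x) for x in xs]
x_max = xs[vals.index(max(vals))]
print(round(x_max, 3), round(max(vals), 4))  # near x = 1, value 1/e ≈ 0.3679
```

On a non-compact domain such as (0, 1) with f(x) = x, no such maximizer exists, which is why compactness is essential.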

🔓 Topological characterization

🔓 Neighborhood characterization (Lemma 7.5.7)

Statement: f : X → Y is continuous at c ∈ X if and only if for every open neighborhood U of f(c) in Y, the set f^{-1}(U) contains an open neighborhood of c in X.

Intuition: Continuity means that points close to c map to points close to f(c), which translates to: open neighborhoods around f(c) pull back to open neighborhoods around c.

Proof sketch:

  • Forward: If f is continuous at c and U is an open neighborhood of f(c), then B_Y(f(c), ε) ⊂ U for some ε > 0. By continuity, there exists δ > 0 such that B_X(c, δ) ⊂ f^{-1}(B_Y(f(c), ε)) ⊂ f^{-1}(U).
  • Reverse: Given ε > 0, if f^{-1}(B_Y(f(c), ε)) contains an open neighborhood W of c, then W contains a ball B_X(c, δ) for some δ > 0, giving the epsilon-delta condition.

🗺️ Global topological characterization (Theorem 7.5.8)

Statement: f : X → Y is continuous if and only if for every open U ⊂ Y, f^{-1}(U) is open in X.

  • This follows from Lemma 7.5.7 applied at every point.
  • This characterization depends only on the topology (the collection of open sets), not on the specific metric.

🔒 Preimages of closed sets (Example 7.5.9)

If f : X → Y is continuous and E ⊂ Y is closed, then f^{-1}(E) is closed:

  • f^{-1}(E) = X \ f^{-1}(E^c), and E^c is open, so f^{-1}(E^c) is open, hence f^{-1}(E) is closed.

Applications:

  • The zero set of a continuous function f : X → ℝ, namely f^{-1}(0) = {x ∈ X : f(x) = 0}, is closed.
  • This is the most basic result in algebraic geometry: the zero set of a polynomial is closed.
  • The set where f is nonnegative, f^{-1}([0, ∞)) = {x : f(x) ≥ 0}, is closed.
  • The set where f is positive, f^{-1}((0, ∞)) = {x : f(x) > 0}, is open.

🌐 Uniform continuity

🌐 Definition and basic properties

Uniform continuity: f : X → Y is uniformly continuous if for every ε > 0 there is a δ > 0 such that whenever p, q ∈ X and d_X(p, q) < δ, we have d_Y(f(p), f(q)) < ε.

Key difference from continuity:

  • Continuity: for each point c and each ε, we find a δ (which may depend on both c and ε).
  • Uniform continuity: for each ε, we find a single δ that works for all points simultaneously.
  • Every uniformly continuous function is continuous, but not vice versa.

🎯 Compactness implies uniform continuity (Theorem 7.5.11)

Statement: Let f : X → Y be continuous and X compact. Then f is uniformly continuous.

Proof idea (uses Lebesgue covering lemma):

  • Given ε > 0, for each c ∈ X, pick δ_c > 0 such that d_Y(f(x), f(c)) < ε/2 whenever x ∈ B(c, δ_c).
  • The balls B(c, δ_c) cover X, and X is compact.
  • By the Lebesgue covering lemma, there exists δ > 0 such that for every x ∈ X, there is a c ∈ X for which B(x, δ) ⊂ B(c, δ_c).
  • If d_X(p, q) < δ, find c such that B(p, δ) ⊂ B(c, δ_c). Then q ∈ B(c, δ_c), so by the triangle inequality, d_Y(f(p), f(q)) ≤ d_Y(f(p), f(c)) + d_Y(f(c), f(q)) < ε/2 + ε/2 = ε.

🔗 Lipschitz continuity (Example 7.5.13)

Lipschitz (or K-Lipschitz) function: f : X → Y is Lipschitz if there exists K ∈ ℝ such that d_Y(f(p), f(q)) ≤ K d_X(p, q) for all p, q ∈ X.

  • Every Lipschitz function is uniformly continuous: take δ = ε/K.
  • Not every uniformly continuous function is Lipschitz: the square root function on [0, 1] is uniformly continuous but not Lipschitz.
  • Practical note: If a function is Lipschitz, it is often easiest to simply show it is Lipschitz, even if we only need continuity.
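
The square-root counterexample can be seen numerically. A sketch (my own, under the assumption p = h, q = 0): the difference quotient |√p − √q| / |p − q| equals 1/√h, which is unbounded as h → 0, so no Lipschitz constant K can work.

```python
import math

# sqrt is uniformly continuous on [0, 1] but not Lipschitz:
# the difference quotient at the pair (h, 0) is sqrt(h)/h = 1/sqrt(h).
for n in range(1, 6):
    h = 10.0 ** (-n)
    ratio = (math.sqrt(h) - math.sqrt(0.0)) / (h - 0.0)
    print(h, ratio)  # ratio = 1/sqrt(h), growing without bound
```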

🧪 Application: continuity of integrals (Proposition 7.5.12)

Statement: If f : [a, b] × [c, d] → ℝ is continuous, then g : [c, d] → ℝ defined by g(y) = integral from a to b of f(x, y) dx is continuous.

Proof idea:

  • The rectangle [a, b] × [c, d] is compact, so f is uniformly continuous.
  • Given ε > 0, there exists δ > 0 such that |z - y| < δ implies |f(x, z) - f(x, y)| < ε/(b - a) for all x ∈ [a, b].
  • Then |g(z) - g(y)| = |integral of (f(x, z) - f(x, y)) dx| ≤ (b - a) · ε/(b - a) = ε.

Application: If f is continuous on [a, b] × ℝ, then g is continuous on all of ℝ (apply the proposition to [a, b] × [y₀ - r, y₀ + r] for any y₀ and any r > 0).
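
The proposition can be illustrated numerically. A sketch with my own choice of integrand (not from the text): for f(x, y) = sin(xy) on [0, 1] × ℝ, the function g(y) = ∫₀¹ sin(xy) dx = (1 − cos y)/y is continuous; a midpoint Riemann sum approximates it, and nearby values of y give nearby values of g.

```python
import math

def g(y, n=10000):
    """Midpoint-rule approximation of the integral of sin(x*y) over [0, 1]."""
    h = 1.0 / n
    return sum(math.sin((i + 0.5) * h * y) for i in range(n)) * h

exact = (1 - math.cos(2.0)) / 2.0   # closed form of g(2)
print(abs(g(2.0) - exact))           # small discretization error
print(abs(g(2.0) - g(2.0 + 1e-6)))   # nearby y's give nearby values of g
```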

🎯 Limits of functions

🎯 Cluster points (Definition 7.5.14)

Cluster point: A point p ∈ X is a cluster point of S ⊂ X if for every ε > 0, the set B(p, ε) ∩ S \ {p} is not empty.

  • It is not enough that p is in the closure of S; p must be in the closure of S \ {p}.
  • p is a cluster point if and only if there exists a sequence in S \ {p} that converges to p.

🎯 Limit of a function (Definition 7.5.15)

Limit: Let S ⊂ X, p ∈ X a cluster point of S, and f : S → Y. We say f(x) converges to L ∈ Y as x goes to p if for every ε > 0, there exists δ > 0 such that whenever x ∈ S \ {p} and d_X(x, p) < δ, then d_Y(f(x), L) < ε.

  • If L is unique, we write lim_{x → p} f(x) = L.
  • If f(x) does not converge as x goes to p, we say f diverges at p.

📊 Properties of limits

Uniqueness (Proposition 7.5.16): If f(x) converges as x goes to p, then the limit is unique.

Sequential characterization (Lemma 7.5.17): f(x) converges to L as x goes to p if and only if for every sequence {x_n} in S \ {p} such that lim x_n = p, the sequence {f(x_n)} converges to L.

Connection to continuity: For cluster points p of S ⊂ X, the function f : S → Y is continuous at p if and only if lim_{x → p} f(x) = f(p).


7.6 Fixed point theorem and Picard’s theorem again

🧭 Overview

🧠 One-sentence thesis

The contraction mapping principle guarantees a unique fixed point in complete metric spaces, and this theorem provides a streamlined proof of Picard's existence and uniqueness theorem for ordinary differential equations.

📌 Key points (3–5)

  • Contraction mapping principle: every contraction on a nonempty complete metric space has exactly one fixed point.
  • What makes a contraction: a map that is k-Lipschitz for some k < 1, meaning it shrinks distances by a factor strictly less than 1.
  • Constructive proof: the fixed point is found by iterating the map from any starting point; the sequence converges to the unique fixed point.
  • Common confusion: both completeness and the contraction property (k < 1) are necessary; a 1-Lipschitz map or a contraction on a non-complete space may have no fixed point.
  • Application to differential equations: Picard's theorem converts the ODE into an integral equation and applies the fixed point theorem to a suitable function space.

🔧 Contraction mappings and fixed points

🔧 What is a contraction

Contraction (or contractive map): Let (X, d_X) and (Y, d_Y) be metric spaces. A map φ: X → Y is a contraction if it is k-Lipschitz for some k < 1, i.e., there exists k < 1 such that d_Y(φ(p), φ(q)) ≤ k d_X(p, q) for all p, q in X.

  • A contraction is a Lipschitz map with Lipschitz constant strictly less than 1.
  • It shrinks distances: the image points are closer together than the original points.
  • Example: the map f(x) = kx + b on the real line with 0 < k < 1 is a contraction.

🎯 What is a fixed point

Fixed point: Given a map φ: X → X, a point x in X is a fixed point if φ(x) = x.

  • The map leaves the point unchanged.
  • Example: for f(x) = x - x², the point x = 0 is a fixed point because f(0) = 0.

🏆 The contraction mapping principle

🏆 Statement of the theorem

Theorem (Contraction mapping principle or Banach fixed point theorem): Let (X, d) be a nonempty complete metric space and φ: X → X a contraction. Then φ has a unique fixed point.

  • Both hypotheses are necessary:
    • Complete: the space must be complete (every Cauchy sequence converges).
    • Contraction: the map must have k < 1 (not just k = 1).
  • The theorem is named after Stefan Banach (1892–1945), who stated it in 1922.

🔨 How the proof works

Construction of the fixed point:

  • Start with any point x₀ in X.
  • Define a sequence by iteration: x_{n+1} = φ(x_n).
  • The contraction property implies d(x_{n+1}, x_n) ≤ k^n d(x₁, x₀).
  • For m > n, the triangle inequality and geometric series give:
    • d(x_m, x_n) ≤ k^n d(x₁, x₀) · (sum of k^i from i=0 to ∞) = k^n d(x₁, x₀) / (1 - k).
  • Since k < 1, the sequence is Cauchy.
  • By completeness, the sequence converges to some x in X.

Verification:

  • x is a fixed point: Because φ is a contraction, it is Lipschitz continuous, so φ(x) = φ(lim x_n) = lim φ(x_n) = lim x_{n+1} = x.
  • Uniqueness: If x and y are both fixed points, then d(x, y) = d(φ(x), φ(y)) ≤ k d(x, y). Since k < 1, this forces d(x, y) = 0, so x = y.

Don't confuse: The proof is constructive—it tells you how to find the fixed point by iteration, not just that it exists.

🧮 Practical use

  • The iteration scheme provides better and better approximations to the fixed point.
  • You can estimate how far you are from the fixed point at each step.
  • This makes the theorem useful in real-world applications (e.g., numerical methods).
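
The iteration scheme is easy to run. A minimal sketch (my own example): cos maps [0, 1] into itself and is a contraction there (|sin x| ≤ sin 1 < 1 bounds its Lipschitz constant below 1), so iterating from any starting point converges to the unique fixed point of cos(x) = x.

```python
import math

x = 0.0                  # any starting point in [0, 1] works
for _ in range(100):
    x = math.cos(x)      # the iteration x_{n+1} = φ(x_n)
print(x)                 # ≈ 0.7390851, the unique solution of cos(x) = x
```

Each iteration shrinks the remaining distance to the fixed point by at least the contraction factor, which is what the error estimate k^n d(x₁, x₀)/(1 − k) quantifies.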

📐 Application: Picard's theorem for ODEs

📐 The differential equation problem

Setup:

  • Consider the ordinary differential equation (ODE): dy/dx = F(x, y).
  • Given an initial condition (x₀, y₀), we want a function y = f(x) such that:
    • f(x₀) = y₀ (initial condition).
    • f'(x) = F(x, f(x)) (the ODE is satisfied).

Examples from the excerpt:

  • y' = y, y(0) = 1 has solution y(x) = e^x.
  • y' = -2xy, y(0) = 1 has solution y(x) = e^(-x²) (Gaussian).
  • y' = y², y(0) = 1 has solution y(x) = 1/(1 - x), which "blows up" at x = 1 (solution does not exist for all x).

🔄 Converting to an integral equation

Why work with integrals:

  • Instead of looking for a differentiable function, we solve the equivalent integral equation:
    • f(x) = y₀ + integral from x₀ to x of F(t, f(t)) dt.
  • By the fundamental theorem of calculus, if f satisfies this integral equation, then f is differentiable and f'(x) = F(x, f(x)).
  • We only need f to be continuous to make sense of the integral, so we search in the space C([a, b], ℝ) of continuous functions.

Don't confuse: We are looking for a differentiable solution, but we work in the space of continuous functions because the integral equation is easier to handle there.

🧪 The metric space for Picard's theorem

The space C([a, b], ℝ):

  • Functions f: [a, b] → ℝ that are continuous.
  • Metric: d(f, g) = ||f - g||_{[a,b]} = sup_{x ∈ [a,b]} |f(x) - g(x)| (uniform norm).
  • Convergence in this metric is uniform convergence.
  • This space is complete (Proposition 7.4.5).

🎓 Statement of Picard's theorem

Theorem (Picard's theorem on existence and uniqueness): Let I, J ⊂ ℝ be closed and bounded intervals, let (x₀, y₀) be in the interior of I × J. Suppose F: I × J → ℝ is continuous and Lipschitz in the second variable, i.e., there exists L in ℝ such that |F(x, y) - F(x, z)| ≤ L |y - z| for all y, z in J, x in I. Then there exists h > 0 such that [x₀ - h, x₀ + h] ⊂ I and a unique differentiable function f: [x₀ - h, x₀ + h] → J such that f'(x) = F(x, f(x)) and f(x₀) = y₀.

Hypotheses:

  • F is continuous on a compact rectangle I × J.
  • F is Lipschitz in the second variable (the y-variable).
  • The initial point (x₀, y₀) is in the interior.

Conclusion:

  • A solution exists on some interval [x₀ - h, x₀ + h].
  • The solution is unique.

🛠️ How the proof uses the fixed point theorem

Step 1: Set up the complete metric space

  • Without loss of generality, assume x₀ = 0.
  • F is bounded on the compact set I × J, say |F(x, y)| ≤ M.
  • Choose α > 0 so that [-α, α] ⊂ I and [y₀ - α, y₀ + α] ⊂ J.
  • Let h = min{α, α/(M + Lα)}.
  • Define Y = {f in C([-h, h], ℝ) : f([-h, h]) ⊂ J}.
  • Y is a closed subset of C([-h, h], ℝ), so Y is a complete metric space with the subspace metric.

Step 2: Define the operator T

  • Define T: Y → C([-h, h], ℝ) by T(f)(x) = y₀ + integral from 0 to x of F(t, f(t)) dt.
  • T is well-defined because f is continuous, so F(t, f(t)) is continuous in t, so the integral exists.
  • For f in Y and |x| ≤ h:
    • |T(f)(x) - y₀| = |integral from 0 to x of F(t, f(t)) dt| ≤ |x| M ≤ hM ≤ α.
  • So T(f)([-h, h]) ⊂ [y₀ - α, y₀ + α] ⊂ J, meaning T(f) is in Y.
  • Thus T maps Y to Y.

Step 3: Show T is a contraction

  • For f, g in Y and x in [-h, h]:
    • |F(x, f(x)) - F(x, g(x))| ≤ L |f(x) - g(x)| ≤ L d(f, g).
  • Therefore:
    • |T(f)(x) - T(g)(x)| = |integral from 0 to x of [F(t, f(t)) - F(t, g(t))] dt| ≤ |x| L d(f, g) ≤ hL d(f, g).
  • Since h ≤ α/(M + Lα), we have hL ≤ Lα/(M + Lα) < 1.
  • Taking the supremum over x in [-h, h]: d(T(f), T(g)) ≤ (Lα/(M + Lα)) d(f, g).
  • So T is a contraction with k = Lα/(M + Lα) < 1.

Step 4: Apply the fixed point theorem

  • By the contraction mapping principle, there exists a unique f in Y such that T(f) = f.
  • This means f(x) = y₀ + integral from 0 to x of F(t, f(t)) dt.
  • Clearly f(0) = y₀.
  • By the fundamental theorem of calculus, f is differentiable and f'(x) = F(x, f(x)).
  • So f is the unique differentiable solution on [-h, h].

Don't confuse: The solution may not exist for all x (as in the y' = y² example); the theorem only guarantees existence on a small interval [x₀ - h, x₀ + h].
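
The operator T can be iterated numerically. A sketch with my own discretization choices (trapezoid rule, 20 iterations): for y' = y, y(0) = 1, the operator is T(f)(x) = 1 + ∫₀ˣ f(t) dt, and iterating from the constant function f ≡ 1 produces the Taylor partial sums 1, 1 + x, 1 + x + x²/2, …, converging uniformly to e^x on a small interval.

```python
import math

N = 2000                        # grid points on [0, h]
h = 0.5
xs = [h * i / N for i in range(N + 1)]

f = [1.0] * (N + 1)             # start from the constant function 1
for _ in range(20):             # apply the Picard operator T twenty times
    integral, new_f = 0.0, [1.0]
    for i in range(N):
        # trapezoid rule for the integral of f over [xs[i], xs[i+1]]
        integral += 0.5 * (f[i] + f[i + 1]) * (xs[i + 1] - xs[i])
        new_f.append(1.0 + integral)
    f = new_f

err = max(abs(f[i] - math.exp(xs[i])) for i in range(N + 1))
print(err)                      # small: the iterates converge to e^x
```

The uniform (sup-norm) error shrinks geometrically with each application of T, mirroring the contraction estimate in Step 3.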

🔍 Key insights and common confusions

🔍 Why completeness matters

  • A contraction on a non-complete space may have no fixed point.
  • Example (Exercise 7.6.6a): find a contraction on a non-complete space with no fixed point.
  • Completeness ensures that the Cauchy sequence constructed by iteration actually converges to a point in the space.

🔍 Why k < 1 matters

  • A 1-Lipschitz map (k = 1) may have no fixed point.
  • Example (Exercise 7.6.6b): find a 1-Lipschitz map on a complete space with no fixed point.
  • The strict inequality k < 1 is what forces the distance to shrink and the sequence to converge.

🔍 The constructive nature of the proof

  • The proof tells you how to find the fixed point: start anywhere and iterate.
  • You can estimate the error at each step: d(x, x_n) ≤ k^n d(x₁, x₀) / (1 - k).
  • This is useful for numerical approximation (e.g., Newton's method for √2 in Exercise 7.6.9).
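
A hedged sketch of the idea behind Exercise 7.6.9 (my own code, not the book's): Newton's iteration for √2, x ↦ (x + 2/x)/2, behaves as a contraction near its fixed point √2, so iterating gives rapidly improving approximations.

```python
x = 1.0                        # any reasonable starting point
for _ in range(6):
    x = (x + 2.0 / x) / 2.0    # Newton step for x^2 - 2 = 0
    print(x)                   # converges quickly toward 1.41421356...
```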

🔍 Why Picard's theorem uses an integral equation

  • The integral equation f(x) = y₀ + integral of F(t, f(t)) dt is equivalent to the ODE f'(x) = F(x, f(x)) with initial condition f(x₀) = y₀.
  • Working with the integral equation allows us to search in the space of continuous functions, which is complete.
  • The operator T defined by the integral is a contraction under the right conditions (Lipschitz in y, small interval).