An Introduction to the Theory of Numbers

1

Compositions and Partitions

Chapter 1. Compositions and Partitions

🧭 Overview

🧠 One-sentence thesis

Compositions and partitions count the ways to write a number as a sum—compositions care about order, partitions do not—and both can be studied through combinatorial reasoning or generating-function algebra.

📌 Key points (3–5)

  • Compositions vs partitions: compositions count ordered sums (c(n)), partitions count unordered sums (p(n)); for n=3, c(3)=4 but p(3)=3.
  • Two main methods: combinatorial arguments (counting directly) and algebraic arguments (manipulating generating series).
  • Compositions are simpler: c(n) = 2^(n−1) by a slash-insertion argument; restricted compositions (e.g., odd parts only) connect to Fibonacci numbers.
  • Partitions are harder: no simple formula for p(n), but Euler found a recursion using pentagonal numbers; Ferrers diagrams enable visual proofs.
  • Common confusion: don't mix up "distinct parts" vs "odd parts"—they count the same number of partitions (proven algebraically), but they are conceptually different restrictions.

🔢 Compositions: ordered sums

🔢 Definition and basic count

Composition: a way to write n as a sum where the order of terms matters.

  • Denoted c(n).
  • Example: 3 = 3, 3 = 1+2, 3 = 2+1, 3 = 1+1+1 → c(3) = 4.
  • Combinatorial argument: write n as n ones with (n−1) spaces between them; each space can have a slash or not → 2^(n−1) possibilities.
  • Example: 3 = 1 1 1 can become 1/1 1 or 1 1/1 or 1/1/1, plus no slashes → 4 compositions.

🧮 Algebraic method for c(n)

  • The generating function is the sum over n of c(n) x^n.
  • This equals the sum over m of (x + x² + x³ + ...)^m = the sum over m of (x/(1−x))^m.
  • Simplifying: x/(1−2x) = sum of 2^(n−1) x^n.
  • This confirms c(n) = 2^(n−1) algebraically.

🎯 Restricted compositions

The excerpt gives three exercises (combinatorial + algebraic):

  1. Exactly m parts: the number is (n−1 choose m−1) (Catalan).
  2. Even parts only: 2^(n/2 − 1) if n is even, 0 if n is odd.
  3. Even vs odd number of parts: the counts are equal.

Don't confuse "even parts" (each summand is even) with "even number of parts" (the count of summands is even).

🌀 Odd parts and Fibonacci numbers

  • Let c*(n) = number of compositions of n into odd parts.
  • Algebraic approach: generating function = x/(1−x−x²).
  • Cross-multiplying shows F_{n+2} = F_n + F_{n+1}, F₀=0, F₁=1 → the Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, ...
  • Two explicit formulas:
    • By partial fractions: F_n = (1/√5) [ ((1+√5)/2)^n − ((1−√5)/2)^n ].
    • By expanding (Lucas): F_n = (n−1 choose 0) + (n−2 choose 1) + (n−3 choose 2) + ...
  • The excerpt suggests proving Lucas's formula combinatorially as an exercise.

⚖️ Summands at most 2 vs at least 2

  • a(n) = compositions of n with all summands ≤ 2.
  • b(n) = compositions of n with all summands ≥ 2.
  • Result: a(n) = b(n+2).
  • Proof:
    • a(n) satisfies a(n) = a(n−1) + a(n−2) (every composition ends in 1 or 2; delete it).
    • a(1)=1, a(2)=2 → a(n) = F_n.
    • b(n) satisfies the same recursion: if last summand is 2, delete it; if >2, reduce by 1.
    • b(2)=b(3)=1 → b(n) = F_{n−2}.
    • Therefore a(n) = F_n = b(n+2).
  • The excerpt asks for a reasonable generalization (open problem).

🏋️ Weight of a composition

Weight: the product of all summands in a composition.

  • w(n) = sum of weights of all compositions of n.
  • Generating function: x/(1−3x+x²).
  • Recursion: w(n) = 3w(n−1) − w(n−2).
  • Exercise: prove w(n) = F_{2n−1}.

🧩 Partitions: unordered sums

🧩 Definition and difficulty

Partition: a way to write n as a sum where order does not matter.

  • Denoted p(n).
  • Example: 3 = 3, 3 = 2+1, 3 = 1+1+1 → p(3) = 3.
  • No simple explicit formula for p(n) (unlike compositions).
  • Main goal: prove Euler's recursion formula.

📊 Generating function

  • The generating function for p(n) is 1 / [(1−x)(1−x²)(1−x³)···].
  • This product encodes unrestricted partitions.
  • Algebraic manipulations of this function yield results for restricted partitions.

🖼️ Ferrers diagrams

  • A Ferrers diagram (also called Young diagram) represents a partition as rows of dots.
  • Example: 7 = 4+2+1 is drawn as:
    • Row 1: •••• (4 dots)
    • Row 2: •• (2 dots)
    • Row 3: • (1 dot)

🔄 Conjugate partitions

Conjugate partition: reflect the Ferrers diagram across a 45° line from top-left.

  • Example: 7=4+2+1 (rows 4,2,1) conjugates to 7=3+2+1+1 (columns 3,2,1,1).
  • Immediate consequences:
    • Number of partitions of n into m parts = number of partitions of n with largest part m.
    • Number of partitions of n into ≤m parts = number of partitions of n into parts ≤m.

🎲 Restricted partitions and identities

🎲 Odd parts vs distinct parts

Theorem: The number of partitions of n into odd parts equals the number of partitions of n into distinct parts.

  • Algebraic proof: show that 1/[(1−x)(1−x³)(1−x⁵)···] = (1+x)(1+x²)(1+x³)···.
  • Cross-multiplying makes the result "intuitive" (the excerpt does not elaborate).
  • Don't confuse: "odd parts" means each summand is odd; "distinct parts" means no summand repeats.

🔺 Euler's pentagonal-number theorem

Theorem (Euler):
(1−x)(1−x²)(1−x³)··· = 1 − x − x² + x⁵ + x⁷ − x¹² − x¹⁵ + ···

  • The exponents are pentagonal numbers: (1/2)k(3k±1) for k=1,2,3,...
  • These are: 1, 2, 5, 7, 12, 15, 22, 26, ...

🧮 Combinatorial proof via E(n) and O(n)

  • Expand (1−x)(1−x²)··· = sum of (E(n) − O(n)) x^n.
  • E(n) = partitions of n into an even number of distinct parts.
  • O(n) = partitions of n into an odd number of distinct parts.
  • Goal: show E(n) − O(n) = 0 except when n is pentagonal.

The bijection:

  • Draw a Ferrers diagram for a partition into distinct parts.
  • Base: the bottom row AB.
  • Wing: the longest south-west diagonal line from the top-right corner C.
  • Operation: try to move the base to become a new wing (parallel to the old wing), or vice versa.
  • This operation flips parity (even ↔ odd number of parts), so usually E(n) = O(n).

Exceptional cases:

  • The operation fails when the diagram has a special shape:
    • Type 1: k + (k+1) + ··· + (2k−1) = (1/2)(3k²−k).
    • Type 2: (k+1) + (k+2) + ··· + 2k = (1/2)(3k²+k).
  • In both cases, there is one extra partition (even or odd) depending on whether k is even or odd.
  • Therefore E(n) − O(n) = (−1)^k when n is pentagonal, 0 otherwise.

🔁 Euler's recursion for p(n)

From the pentagonal theorem and the generating function:

p(n) = p(n−1) + p(n−2) − p(n−5) − p(n−7) + p(n−12) + p(n−15) − ···

  • The indices are n minus pentagonal numbers.
  • Signs alternate in pairs: +, +, −, −, +, +, −, −, ...
  • This recursion allows computation of p(n) without a closed formula.

🔗 Methods and philosophy

🔗 Three methods (only two covered)

The excerpt mentions three approaches to compositions and partitions:

MethodDescriptionCoverage
CombinatorialDirect counting arguments, diagrams✓ Covered
AlgebraicGenerating series manipulations✓ Covered
AnalyticAnalytic operations on generating series✗ Not covered

🧠 Combinatorial vs algebraic trade-offs

  • Combinatorial: often more intuitive, visual (e.g., Ferrers diagrams, slash insertion).
  • Algebraic: can be faster for deriving formulas (e.g., Fibonacci connection, Euler's theorem).
  • The excerpt recommends practicing both methods on the same problem (see exercises).

🎯 Open problems hinted

  • Generalize a(n) = b(n+2) beyond "≤2 vs ≥2."
  • Prove Lucas's binomial formula for Fibonacci numbers combinatorially.
  • The excerpt notes that "natural problems in number theory are either too easy or much too difficult," so it focuses on problems with "reasonable hope for a solution."
2

Arithmetic Functions

Chapter 2. Arithmetic Functions

🧭 Overview

🧠 One-sentence thesis

Arithmetic functions—such as the divisor function, sum-of-divisors function, and Euler's totient—can be unified through Dirichlet multiplication, revealing deep interconnections and enabling systematic derivation of identities.

📌 Key points (3–5)

  • Core arithmetic functions: τ(n) counts divisors, σ(n) sums divisors, φ(n) counts integers ≤ n relatively prime to n, and μ(n) is the Möbius function.
  • Multiplicative property: functions like τ, σ, and φ satisfy f(ab) = f(a)f(b) when a and b are coprime.
  • Dirichlet multiplication: the operation (f × g)(n) = sum over d|n of f(d)g(n/d) corresponds to multiplying Dirichlet series and reveals hidden relationships.
  • Common confusion: Dirichlet multiplication is not pointwise; it involves summing over all divisor pairs, not just multiplying function values.
  • Möbius inversion: the formula g = f × I₀ if and only if f = μ × g provides a powerful tool for inverting summation formulas.

📐 Core arithmetic functions and their formulas

📊 Key functions defined

The excerpt introduces several fundamental arithmetic functions:

FunctionNotationDefinitionMeaning
Prime countingπ(n)sum over p ≤ n of 1Number of primes not exceeding n
Distinct prime factorsω(n)sum over p dividing n of 1Number of distinct prime factors
Prime power factorsΩ(n)sum over pⁱ dividing n of 1Number of prime power factors (with multiplicity)
Divisor functionτ(n)sum over d dividing n of 1Number of divisors
Sum of divisorsσ(n)sum over d dividing n of dSum of all divisors
Euler totientφ(n)sum over 1 ≤ a ≤ n with (a,n)=1 of 1Count of integers ≤ n relatively prime to n

🔢 Generalized functions

  • σₖ(n): sum over d dividing n of dᵏ (the sum of kth powers of divisors).
    • Special cases: σ₀(n) = τ(n) and σ₁(n) = σ(n).
  • φₖ(n): Jordan's generalization, counting k-tuples ≤ n whose g.c.d. is relatively prime to n.

🧩 The Möbius function μ(n)

μ(d) = 0 if d has a square factor; μ(1) = 1; μ(p₁p₂...pₛ) = (−1)ˢ for distinct primes.

  • The function takes values 0, 1, −1 only.
  • It appears "artificial" at first but plays a central role in number theory.
  • Example: if n = 6 = 2·3, then μ(6) = (−1)² = 1; if n = 12 = 2²·3, then μ(12) = 0 (has square factor 4).

🔧 Multiplicative functions and prime factorization

🔑 Multiplicative property

A function f is called weakly multiplicative (or simply multiplicative) if f(ab) = f(a)f(b) whenever (a,b) = 1.

  • The functions τ, σ, φ, and σₖ all have this property.
  • This property allows explicit formulas using prime factorization.

🧮 Formulas from prime factorization

Suppose n = p₁^α₁ · p₂^α₂ · ... · pₛ^αₛ.

For τ(n):

  • τ(n) = product over p|n of (α + 1).
  • Example: 60 = 2² · 3¹ · 5¹, so τ(60) = (2+1)(1+1)(1+1) = 3·2·2 = 12.

For σ(n):

  • σ(n) = product over p|n of (1 + p + p² + ... + p^α) = product over p|n of (p^(α+1) − 1)/(p − 1).
  • Example: σ(60) = (1+2+4)(1+3)(1+5) = 7·4·6 = 168.

For φ(n):

  • φ(n) = n · product over p|n of (1 − 1/p).
  • Example: φ(60) = 60·(1 − 1/2)·(1 − 1/3)·(1 − 1/5) = 60·(1/2)·(2/3)·(4/5) = 16.

🧷 Deriving φ(n) via inclusion-exclusion

Principle of Inclusion and Exclusion: Given N objects and characteristics A₁, A₂, ..., the number with none of these characteristics is N − sum of N(Aᵢ) + sum of N(Aᵢ,Aⱼ) − sum of N(Aᵢ,Aⱼ,Aₖ) + ..., with alternating signs.

  • An integer is relatively prime to n only if it is not divisible by any prime factor of n.
  • Let A₁, A₂, ..., Aₛ denote divisibility by p₁, p₂, ..., pₛ.
  • Applying inclusion-exclusion: φ(n) = n − sum of n/pᵢ + sum of n/(pᵢpⱼ) − sum of n/(pᵢpⱼpₖ) + ...
  • This factors into φ(n) = n · product over p|n of (1 − 1/p).

⚙️ Dirichlet multiplication and the calculus of arithmetic functions

🔗 Dirichlet product definition

The Dirichlet product (f × g)(n) = sum over d|n of f(d)g(n/d) = sum over dd′=n of f(d)g(d′).

  • Motivation: If F(s) = sum of f(n)n⁻ˢ and G(s) = sum of g(n)n⁻ˢ, then F(s)·G(s) = sum of h(n)n⁻ˢ where h = f × g.
  • Dirichlet multiplication of functions corresponds to ordinary multiplication of their Dirichlet series.
  • Properties: commutative (f × g = g × f) and associative ((f × g) × h = f × (g × h)).

🎯 The unity function ℓ

  • Define ℓ(n) = 1 if n=1, else 0 (written as 1, 0, 0, ...).
  • For any function f: f × ℓ = f.
  • ℓ is the identity element for Dirichlet multiplication.

🔄 Inverses and regular functions

  • A function f is regular if f(1) ≠ 0.
  • Every regular function has a Dirichlet inverse f⁻¹ such that f × f⁻¹ = ℓ.
  • The regular functions form a group under Dirichlet multiplication.
  • Key fact: The Dirichlet product of multiplicative functions is again multiplicative.

🧩 Building functions from Iₖ and ℓ

Define Iₖ(n) = nᵏ (the sequence 1ᵏ, 2ᵏ, 3ᵏ, ...).

Möbius function:

  • μ = I₀⁻¹ (the Dirichlet inverse of I₀).
  • This means μ × I₀ = ℓ, i.e., sum over d|n of μ(d) = ℓ(n) = 1 if n=1, else 0.

Sum of divisors:

  • σₖ = I₀ × Iₖ.
  • This means σₖ(n) = sum over d|n of dᵏ·1.
  • Special cases: τ = I₀ × I₀ = I₀² and σ = I₀ × I₁.

Euler totient:

  • φₖ = μ × Iₖ = I₀⁻¹ × Iₖ.
  • This means φₖ(n) = sum over d|n of μ(d)·(n/d)ᵏ.
  • Special case: φ = φ₁ = μ × I₁.

Don't confuse: These are definitions via Dirichlet multiplication, not pointwise operations.

🔀 Möbius inversion and key identities

🔁 Möbius inversion formula

g = f × I₀ if and only if f = μ × g.

  • Written explicitly: g(n) = sum over d|n of f(d) if and only if f(n) = sum over d|n of μ(d)g(n/d).
  • This provides a systematic way to invert summation formulas over divisors.

📋 Important identities from inversion

From σₖ = I₀ × Iₖ:

  • Inverting: Iₖ = μ × σₖ.
  • Meaning: sum over d|n of μ(d)σₖ(n/d) = nᵏ.
  • Special cases:
    • sum over d|n of μ(d)τ(n/d) = 1
    • sum over d|n of μ(d)σ(n/d) = n

From φₖ = I₀⁻¹ × Iₖ:

  • Inverting: Iₖ = I₀ × φₖ.
  • Meaning: sum over d|n of φₖ(d) = nᵏ.
  • Special case: sum over d|n of φ(d) = n (a fundamental identity).

🔗 Combined identities

From σₖ × φₖ = I₀ × Iₖ × I₀⁻¹ × Iₖ = Iₖ × Iₖ:

  • sum over d|n of σₖ(d)φₖ(n/d) = sum over d|n of dᵏ(n/d)ᵏ = sum over d|n of nᵏ = τ(n)nᵏ.
  • Special case: sum over d|n of σ(d)φ(n/d) = nτ(n).

🎼 Dirichlet series correspondences

📡 Basic correspondences

The correspondence F(s) ↔ f(n) means F(s) = sum of f(n)n⁻ˢ.

Starting points:

  • I₀ ↔ ζ(s) (the Riemann zeta function)
  • Iₖ ↔ sum of nᵏn⁻ˢ = ζ(s − k)
  • μ ↔ 1/ζ(s)
  • I₀′ ↔ sum of (−log n)n⁻ˢ = ζ′(s)

🧮 Derived series formulas

For divisor functions:

  • sum of σₖ(n)n⁻ˢ = ζ(s)ζ(s − k)
  • Special cases:
    • sum of τ(n)n⁻ˢ = ζ²(s)
    • sum of σ(n)n⁻ˢ = ζ(s)ζ(s − 1)

For totient functions:

  • sum of φₖ(n)n⁻ˢ = ζ(s − k)/ζ(s)
  • Special case: sum of φ(n)n⁻ˢ = ζ(s − 1)/ζ(s)

🔢 Numerical examples

Using known values of ζ (e.g., ζ(2) = π²/6, ζ(4) = π⁴/90):

  • sum of τ(n)n⁻² = ζ²(2) = π⁴/36
  • sum of σ₄(n)n⁻² = ζ(2)·ζ(4) = (π²/6)·(π⁴/90) = π⁶/540
  • sum of μ(n)n⁻² = 6/π²

🌟 The von Mangoldt function Λ(n)

Λ(n) = log p if n = p^α for some prime p and α ≥ 1; otherwise Λ(n) = 0.

  • Key property: sum over d|n of Λ(d) = log n.
  • In Dirichlet notation: Λ × I₀ = −I₀′.
  • Therefore: Λ = I₀⁻¹ × (−I₀′) = μ × (−I₀′).
  • Explicit formula: Λ(n) = sum over d|n of μ(d)log(n/d) = − sum over d|n of μ(d)log d.
  • Dirichlet series: sum of Λ(n)n⁻ˢ = −ζ′(s)/ζ(s).
  • The prime number theorem depends on estimating Ψ(x) = sum from n=1 to x of Λ(n), aiming to show Ψ(x) ~ x.
  • Contour integration with −ζ′(s)/ζ(s) requires knowing where ζ(s) vanishes—a central problem in number theory.

🎭 Other functions and generating series

🔄 The Liouville function λ(n)

If n = p₁^α₁ · p₂^α₂ · ... · pₛ^αₛ, define λ(n) = (−1)^(α₁ + α₂ + ... + αₛ).

  • Key property: sum over d|n of λ(d) = 1 if n = r² (a perfect square), else 0.
  • Define s(n) = 1 if n is a perfect square, else 0.
  • Then ζ(2s) = sum of s(n)n⁻ˢ.
  • Since λ × I₀ = s, we have sum of λ(n)n⁻ˢ · ζ(s) = ζ(2s), so sum of λ(n)n⁻ˢ = ζ(2s)/ζ(s).
  • Example: sum of λ(n)n⁻² = (π⁴/90)/(π²/6) = π²/15.

🌊 Lambert series

Lambert series have the form sum of f(n)xⁿ/(1 − xⁿ).

  • Key relationship: If F = f × I₀, then sum of f(n)xⁿ/(1 − xⁿ) = sum of F(n)xⁿ.

Special cases:

  • f = I₀: sum of xⁿ/(1 − xⁿ) = sum of τ(n)xⁿ
  • f = μ: sum of μ(n)xⁿ/(1 − xⁿ) = x
  • f = φ: sum of φ(n)xⁿ/(1 − xⁿ) = sum of nxⁿ = x/(1 − x)²

Numerical example: Taking x = 1/10 in the last equality: φ(1)/9 + φ(2)/99 + φ(3)/999 + ... = 10/81.

3

Distribution of Primes

Chapter 3. Distribution of Primes

🧭 Overview

🧠 One-sentence thesis

The distribution of primes can be understood through a progression of increasingly sophisticated results—from Euclid's proof of infinitude to estimates of the prime-counting function π(x) and average behavior of arithmetic functions—culminating in bounds that show π(x) is asymptotically x/log x.

📌 Key points (3–5)

  • Infinitude of primes: Euclid's classical proof and Euler's analytic approach (via the zeta function) both establish that there are infinitely many primes, with Euler showing that the sum of reciprocals of primes diverges.
  • Chebycheff's three main results: (1) there is always a prime between n and 2n; (2) π(x) is bounded above and below by constant multiples of x/log x; (3) if π(x)log x/x approaches a limit, that limit is 1.
  • Bertrand's Postulate and binomial coefficients: The central binomial coefficient (2n choose n) encodes information about which primes divide it and with what multiplicity, enabling proofs about prime gaps.
  • Average order of arithmetic functions: Functions like τ(n), σ(n), and φ(n) behave erratically on individual integers but have predictable average values (e.g., the average of φ(n) is ∼ (3/π²)n).
  • Common confusion—individual vs average behavior: An arithmetic function may have no simple closed form for a single n, yet its sum over 1 to x can be approximated by an elementary function; don't confuse pointwise bounds with average estimates.

🔢 Proving infinitude of primes

🔢 Euclid's classical argument

Euclid's proof: If p were the largest prime, then (2·3·5···p) + 1 would not be divisible by any prime up to p, hence would be a product of primes exceeding p—a contradiction.

  • The proof is "extremely simple" yet raises hard open questions: are the numbers (2·3···p) + 1 usually prime or composite? No general answer is known.
  • Variations like (2·3···p) − 1, p! + 1, or p! − 1 are also mysterious; almost nothing is known about their factorizations.
  • Example curiosity: 4! + 1 = 25, 5! + 1 = 121, 7! + 1 = 5041 are all perfect squares, but no other cases are known, and it is unknown whether infinitely many n! + 1 are squares.

🔢 Euler's analytic proof via the zeta function

Euler's identity: ζ(s) = sum over n of 1/nˢ = product over primes p of 1/(1 − 1/pˢ).

  • This identity is "essentially an analytic statement of the unique factorization theorem."
  • Formally: the product expands as (1 + 1/2ˢ + 1/4ˢ + ···)(1 + 1/3ˢ + ···)··· = sum of 1/nˢ.
  • For s = 1, the harmonic series diverges, so the product over primes must be infinite, implying the sum of 1/p diverges.
  • Euler's original argument "although not quite valid, can certainly be made valid."

🔢 A new proof that sum of 1/p diverges

The excerpt presents a proof "somewhat related to Euclid's" that uses a lemma about subseries of the harmonic series.

Lemma: If a sequence p₁ < p₂ < ··· of positive integers has counting function π(x) = sum over p ≤ x of 1, and if R(x) = sum over p ≤ x of 1/p converges to a finite limit, then π(x)/x → 0.

  • Proof sketch: π(x) = sum from k=1 to x of k·(R(k) − R(k−1)); dividing by x and rearranging shows π(x)/x = R(x) − [average of R(0),…,R(x−1)], which → 0 if R converges.
  • Proof by contradiction: Assume sum of 1/p converges. Then there exist n and m such that sum over p > n of 1/p < 1/2 and π(n!m)/(n!m) < 1/(2n!).
  • Consider the m numbers Tᵢ = i·n! − 1 for i = 1,…,m. None have prime factors ≤ n or ≥ mn!. If p divides both Tᵢ and Tⱼ, then p divides (i − j), so multiples of p are spaced p apart; hence at most m/p + 1 of the Tᵢ are divisible by p.
  • Summing over primes n < p < n!m: sum of (m/p + 1) ≥ m, i.e., sum of 1/p + π(n!m)/m ≥ 1. But by our assumptions the left side is < 1/2 + 1/2 = 1, a contradiction.

🔢 Euler's result refined

The excerpt shows that sum over p ≤ x of 1/p = log log x + O(1).

  • This is derived from the identity sum over p ≤ x of (log p)/p = log x + O(1) (obtained by taking logarithms of n! and approximating the exponent eₚ(n) by n/p).
  • The result R(x) ∼ log log x is used later in the chapter.

📐 Chebycheff's estimates and Bertrand's Postulate

📐 Chebycheff's three main results

Chebycheff obtained the first substantial progress toward the Prime Number Theorem (π(x) ∼ x/log x):

  1. Bertrand's Postulate: There is a prime between n and 2n for every n > 1.
  2. Two-sided bounds: There exist positive constants c₁ and c₂ such that (c₂ x)/log x < π(x) < (c₁ x)/log x.
  3. Uniqueness of the limit: If π(x)log x / x approaches a limit, that limit is 1.
  • The proofs use "methods as modified by Landau, Erdős and, to a minor extent, L. Moser."
  • Don't confuse: Chebycheff did not prove the Prime Number Theorem itself (that π(x) ∼ x/log x), only that if the ratio has a limit, it must be 1, and that π(x) is squeezed between two multiples of x/log x.

📐 Lemmas on factorials and binomial coefficients

The proofs require estimates for n! and (2n choose n).

Lemma 1: nⁿ e⁻ⁿ < n! < nⁿ.

  • Proof: eⁿ > nⁿ/n! (since nⁿ/n! is one term in the expansion of eⁿ).

Lemma 2: 4ⁿ/(2n) < (2n choose n) < 4ⁿ.

  • Upper bound: (1+1)²ⁿ = sum of (2n choose k) ≥ (2n choose n), so (2n choose n) < 4ⁿ.
  • Lower bound: by induction, (2n choose n) > 4ⁿ/(2n) for n > 1.

Lemma 3: (2n+1 choose n) < 4ⁿ.

  • (2n+1 choose n) is one of two equal largest terms in (1+1)²ⁿ⁺¹.

📐 Prime factorization of n! and (2n choose n)

Lemma 4 (Legendre): The exponent eₚ(n) of prime p in n! is eₚ(n) = floor(n/p) + floor(n/p²) + floor(n/p³) + ···.

  • Interpretation: floor(n/p) counts multiples of p up to n; floor(n/p²) adds the extra contribution from multiples of p², etc.
  • Example: e₃(30) = floor(30/3) + floor(30/9) + floor(30/27) + ··· = 10 + 3 + 1 = 14.
  • Alternative formula: eₚ(n) = (n − sₚ(n))/(p − 1), where sₚ(n) is the sum of digits of n in base p.

Exponent Eₚ(n) in (2n choose n):

Eₚ(n) = eₚ(2n) − 2eₚ(n) = sum over i of {floor(2n/pⁱ) − 2floor(n/pⁱ)}.

  • Each term in the sum is 0 or 1, and the number of terms ≤ log_p(2n).

Lemma 5: Eₚ(n) ≤ log_p(2n).

Lemma 6: The contribution of p to (2n choose n) does not exceed 2n.

Lemma 7: Every prime n < p < 2n appears exactly once in (2n choose n).

Lemma 8: No prime (2n)/3 < p < n divides (2n choose n).

Lemma 9: No prime p > √(2n) appears more than once in (2n choose n).

📐 Proof of Theorem 1: product of primes ≤ n is < 4ⁿ

Theorem 1: Product over p ≤ n of p < 4ⁿ.

  • Proof by induction on n, considering cases n = 2m and n = 2m+1 separately.
  • For n = 2m: product over p ≤ 2m of p = product over p ≤ 2m−1 of p < 4^(2m−1) by induction.
  • For n = 2m+1: product over p ≤ 2m+1 of p = (product over p ≤ m+1 of p)·(product over m+1 < p < 2m+1 of p) < 4^(m+1)·(2m+1 choose m) ≤ 4^(m+1)·4^m = 4^(2m+1).
  • Deeper methods (Rosser) show product < (2.83)ⁿ.

📐 Proof of Theorems 2 and 3: bounds on π(n)

Theorem 2: π(n) < c·n/log n.

  • From Theorem 1: 4ⁿ > product over p ≤ n of p > product over √n ≤ p ≤ n of p > (√n)^(π(n) − π(√n)).
  • Taking logarithms: n log 4 > (π(n) − π(√n))·(1/2)log n, so π(n) < (4 log 2)·n/log n + √n < c·n/log n.

Theorem 3: π(n) > c·n/log n.

  • From Lemmas 6 and 2: (2n)^π(2n) > (2n choose n) > 4ⁿ/(2n).
  • Taking logarithms: (π(n) + 1)log(2n) > 2n log 2, so for even m, π(m) + 1 > (m/log m)log 2.

📐 Proof of Bertrand's Postulate (weakened form)

Theorem 6: For every integer r there exists a prime p with 3·2^(2r−1) < p < 3·2^(2r).

  • The proof compares (3·2^(2r) choose 3·2^(2r−1)) with a product of binomial coefficients of the form (2^k choose 2^(k−1)).
  • Key restated lemmas:
    1. If n < p < 2n, then p occurs exactly once in (2n choose n).
    2. If 2·2^(2r−1) < p < 3·2^(2r−1), then p does not occur in (3·2^(2r) choose 3·2^(2r−1)).
    3. If p > 2^(r+1), then p occurs at most once in (3·2^n choose 3·2^(n−1)).
    4. No prime occurs more than 2r+1 times in (3·2^r choose 3·2^(2r−1)).
  • Proof by contradiction: Assume no prime in the range 3·2^(2r−1) < p < 3·2^(2r). Then every prime in (3·2^(2r) choose 3·2^(2r−1)) also appears in the product (2^(2r) choose 2^(2r−1))···(2^1 choose 1)·{(2^(r+1) choose 2^r)···(2^1 choose 1)}^(2r) with at least the same multiplicity.
  • But for r ≥ 6, 3·2^(2r) > (sum of 2^k for k ≤ 2r) + 2^r·(sum of 2^k for k ≤ r+1), so by interpreting (2n choose n) as the number of ways to choose n objects from 2n, the product is strictly smaller than (3·2^(2r) choose 3·2^(2r−1))—contradiction.
  • For r ≤ 6, the primes 7, 29, 97, 389, 1543 verify the theorem directly.
  • The full Bertrand's Postulate (prime between n and 2n) is left as an exercise.

📐 Applications of Bertrand's Postulate

The excerpt lists four applications (proofs not given):

  1. 1/2 + 1/3 + ··· + 1/n is never an integer.
  2. Every integer > 6 can be written as the sum of distinct primes.
  3. Every prime pₙ can be expressed as ±2 ± 3 ± 5 ± ··· ± p_(n−1) with an error of at most 1 (Scherk).
  4. The equation π(n) = φ(n) has exactly 8 solutions.

🧮 Average order of arithmetic functions

🧮 Heuristic for average values

The excerpt gives a "purely heuristic argument" for the average value of σₖ(n).

  • The probability that r divides n is 1/r.
  • If r | n, then n/r contributes (n/r)^k to σₖ(n).
  • Expected value of σₖ(n) is (1/1)·(n/1)^k + (1/2)·(n/2)^k + ··· + (1/n)·(n/n)^k = n^k·(1/1^(k+1) + 1/2^(k+1) + ··· + 1/n^(k+1)).
  • For k = 0 (i.e., τ(n)), this is ≈ n log n.
  • For k ≥ 1, this is ≈ n^k·ζ(k+1); e.g., for k=1, ≈ n·ζ(2) = n·π²/6.

Don't confuse: this is a probabilistic heuristic, not a proof; the rigorous proofs use summation techniques below.

🧮 Inversion of order of summation

Let f be an arithmetic function, F(n) = sum from d=1 to n of f(d), and g(n) = sum over d | n of f(d) (i.e., g = f × I).

Key identity: sum from n=1 to x of g(n) = sum from d=1 to x of floor(x/d)·f(d) = sum from n=1 to x of F(floor(x/n)).

  • Proof idea: sum from n=1 to x of sum over d | n of f(d) can be rearranged by summing "along vertical lines" (fixed d) or "along diagonals" (fixed quotient floor(x/n)).
  • Example: the array of f(d) for d | n, n=1,2,3,…, can be summed column-by-column (giving sum of floor(x/d)·f(d)) or along diagonal lines (giving sum of F(floor(x/n))).

🧮 Average order of τ(n)

Taking f = 1 (so g = τ):

sum from n=1 to x of τ(n) = sum from n=1 to x of floor(x/n) = x log x + O(x).

  • Geometric interpretation: τ(r) is the number of lattice points on the hyperbola xy = r (x,y > 0). So sum of τ(r) for r ≤ x is the number of lattice points in the region xy ≤ x, x,y > 0.
  • Using symmetry about x = y and setting u = floor(√x):
    • sum of τ(r) = 2·(floor(x/1) + ··· + floor(x/u)) − u² = 2x·h(u) − x + O(√x) = x log x + (2γ − 1)x + O(√x), where h(n) = 1 + 1/2 + ··· + 1/n and γ is Euler's constant.

🧮 Average order of σ(n)

Taking f = I (identity function, so g = σ):

sum from n=1 to x of σ(n) = sum from n=1 to x of (1 + 2 + ··· + floor(x/n)) = (1/2)·sum from n=1 to x of floor(x/n)·(floor(x/n) + 1) = (1/2)·sum from n=1 to ∞ of x²/n² + O(x log x) = (x²·ζ(2))/2 + O(x log x) = (π²x²)/12 + O(x log x).

🧮 Average order of φ(n)

Recall sum over d | n of φ(d) = n. So:

sum from n=1 to x of n = x(x+1)/2 = sum from d=1 to x of floor(x/d)·φ(d) = sum from n=1 to x of Φ(floor(x/n)),

where Φ(n) = sum from d=1 to n of φ(d).

  • From this: sum from d=1 to x of φ(d)/d ≥ (x+1)/2, revealing that on average φ(d) > d/2.

🧮 Visible lattice points and φ(n)

A lattice point (x,y) is visible from the origin if gcd(x,y) = 1.

  • φ(r) is the number of visible lattice points on the vertical segment x = r, 0 < y < r.
  • So φ(1) + φ(2) + ··· + φ(n) is the number of visible lattice points in the region n > x > y > 0.

Heuristic: A point (x,y) is invisible by virtue of prime p if p | x and p | y, which occurs with probability 1/p². The probability that (x,y) is invisible is product over p of (1 − 1/p²) = 1/ζ(2) = 6/π². Hence the probability of visibility is 6/π², so the number of visible points should be (6/π²)·(area).

Rigorous outline: Let R be a region with finite Jordan measure M(R) and finite perimeter. Let tR denote R magnified by t, L(tR) the number of lattice points in tR, V(tR) the number of visible lattice points in tR.

  • L(tR) = M(tR) + O(t) and M(tR) = t²M(R).
  • L(tR) = V(tR) + V(tR/2) + V(tR/3) + ···.
  • By Möbius inversion: V(tR) = sum over d of μ(d)·L(tR/d) = sum over d of μ(d)·M(tR/d) ≈ M(tR)·sum over d of μ(d)/d² ≈ M(tR)·(6/π²) = t²M(R)·(6/π²).

For R the region n > x > y > 0, M(R) = n²/2, so:

φ(1) + φ(2) + ··· + φ(n) ≈ (n²/2)·(6/π²) = (3/π²)n².

  • More careful analysis yields φ(1) + ··· + φ(n) = (3/π²)n² + O(n log n).
  • The error term cannot be reduced to O(n log log log n) (Chowla), but can be replaced by O(n log^(3/4) n) (Walfitz).
  • Erdős and Shapiro showed that φ(1) + ··· + φ(n) − (3/π²)n² changes sign infinitely often.

🧮 Interpretation as probability

If a pair of integers (a,b) are chosen at random, the probability that they are relatively prime is 6/π².

  • If (a,b) are chosen at random, the expected value of gcd(a,b) is π²/6.
  • If f(x) is one of a certain class of arithmetic functions (including x^α for 0 < α < 1), the probability that gcd(x, f(x)) = 1 is 6/π², and the expected value of gcd(x, f(x)) is π²/6 (Lambek and Moser).
  • The probability that n numbers chosen at random are relatively prime is 1/ζ(n).

🧮 Quadratfrei and k-th power-free numbers

Quadratfrei (square-free): an integer not divisible by any perfect square > 1.

  • The number Q(n) of quadratfrei numbers ≤ n is ∼ (6/π²)n.
  • Proof sketch: sum of Q(floor(n²/r²)) = n², so by Möbius inversion Q(n²) = sum of μ(r)·floor(n/r)² ∼ n²/ζ(2).
  • More detailed: Q(x) = (6x)/π² + O(√x).
  • This can be written: sum from n=1 to x of |μ(n)| ∼ (6/π²)x.

The number Qₖ(n) of k-th power-free numbers ≤ n is n/ζ(k).

🧮 Average order of ω(n) and Ω(n)

ω(n): the number of distinct prime divisors of n.
Ω(n): the number of prime divisors of n counted with multiplicity.

sum from n=1 to x of ω(n) = sum over p ≤ x of floor(x/p) ∼ x log log x.

  • Thus the average value of ω(n) is log log x.
  • The same holds for Ω(n).

Erdős–Kac theorem (stated without proof): Let A(x; α, β) be the number of integers n ≤ x for which α√(log log n) + log log n < ω(n) < β√(log log n) + log log n. Then

lim as x → ∞ of A(x; α, β)/x = (1/√(2π))·integral from α to β of e^(−u²/2) du.

  • Interpretation: ω(n) is "normally distributed" around log log n with standard deviation √(log log n).
  • Example: ω(n) < log log n about half the time.
  • A similar result holds for ω(f(n)) where f(x) is an irreducible polynomial with integer coefficients (Tatuzawa).

🧮 Density of integers with large prime factors

The number C(x, α) of integers ≤ x having a prime divisor > x^α (for 1 > α > 1/2) is ∼ −x log α.

  • Proof: C(x, α) = sum over x^α < p < x of x/p ∼ x·sum over x^α < p < x of 1/p = x·(log log x − log log x^α) = x·(log log x − log log x − log α) = −x log α.
  • Example: the density of numbers having a prime factor exceeding their square root is log 2 ≈ 0.7.

🔍 Absolute bounds and extremal behavior

🔍 Bounds on φ(n)/n and σ(n)/n

  • 1 > φ(n)σ(n)/n² > ε > 0 for all n (for some ε).
  • n > φ(n) > c·n/log log n for some constant c.
  • n < σ(n) < c·n log log n for some constant c.

🔍 Bounds on τ(n)

  • τ(n) > (log n)^k infinitely often for every k.
  • τ(n) < n^ε for every ε and n sufficiently large.

Main theorem (stated without proof): If ε > 0, then

  • τ(n) < 2^((1+ε)log n / log log n) for all n > n₀(ε),
  • τ(n) > 2^((1−ε)log n / log log n) infinitely often.

Don't confuse: the upper bound holds for all large n, but the lower bound holds only infinitely often (not for all n).

🔍 Sum of digits in various bases

Let sᵣ(n) be the sum of the digits of n when written in base r.

  • Mirsky proved: sᵣ(1) + sᵣ(2) + ··· + sᵣ(n) = ((r−1)/2)·n·logᵣ n + O(n).
  • R. Trollope (University of Alberta master's thesis) considered similar sums where the elements run over sequences such as primes, squares, etc.
  • Trollope also showed: (s₁(n) + s₂(n) + ··· + sₙ(n))/n² ∼ 1 − π²/12.

🔬 Advanced topics (stated without proof)

🔬 Elementary proof of the Prime Number Theorem

Around 1949, Erdős and Selberg discovered an elementary proof of the Prime Number Theorem (π(x) ∼ x/log x).

Selberg's inequality (the main new result):

sum over p ≤ x of (log p)² + sum over pq ≤ x of (log p)(log q) = 2x log x + O(x).

  • Although this inequality is "still far from the Prime Number Theorem," the latter can be deduced from it "without recourse to any further number theoretical results."
  • Several proofs of Selberg's lemma have been given; "perhaps the simplest being due to Tatuzawa and Iseki."
  • Selberg's original proof uses the functions λₙ = λₙ,ₓ = sum over d | n of μ(d)·log²(x/d) and T(x) = sum from n=1 to x of λₙ·x/n.

🔬 Lambek–Moser rational logarithm analogue

About five years before the excerpt, J. Lambek and L. Moser showed that Selberg's lemma can be proved "in a completely elementary way, i.e., using properties of integers only."

Rational analogue of the logarithm: Let h(n) = 1 + 1/2 + 1/3 + ··· + 1/n and ℓₖ(n) = h(kn) − h(k).

  • They prove in an elementary way: |ℓₖ(ab) − ℓₖ(a) − ℓₖ(b)| < 1/k.
  • This rational function ℓₖ mimics the logarithm's additivity property log(ab) = log a + log b, but with a small error.

🔬 Second Möbius inversion formula

Theorem: If G(x) = sum from n=1 to x of F(floor(x/n)), then F(x) = sum from n=1 to x of μ(n)·G(floor(x/n)).

  • Proof: sum from n=1 to x of μ(n)·G(floor(x/n)) = sum from n=1 to x of μ(n)·sum from m=1 to floor(x/n) of F(floor(x/(mn))) = sum from k=1 to x of F(floor(x/k))·sum over n | k of μ(n) = F(x).
  • This is a "second" inversion formula (distinct from the usual f ↔ g = f × I inversion).

🔬 Miscellaneous results

  • sum over gcd(a,b)=1 of 1/(a²b²) = 5/2 (proof left as exercise).
  • sum from n=1 to x of μ(n) = M(x) = o(x) (known but difficult to prove; equivalent in depth to the Prime Number Theorem).
  • The excerpt mentions results on the distribution of ω(f(n)) for irreducible polynomials f, and various density results, but does not give proofs.
4

Irrational Numbers

Chapter 4. Irrational Numbers

🧭 Overview

🧠 One-sentence thesis

The chapter demonstrates various techniques for proving irrationality and transcendence, ranging from simple contradiction arguments to sophisticated approximation theorems that distinguish algebraic numbers from transcendental ones.

📌 Key points (3–5)

  • Novel proof for √2: a minimal-denominator contradiction that avoids divisibility arguments.
  • Niven's integral method: proves π is irrational by constructing an integral that must be an integer yet becomes arbitrarily small.
  • Liouville's approximation theorem: algebraic numbers of degree n cannot be approximated "too well" by rationals (not better than order n), enabling construction of transcendental numbers.
  • Common confusion: proving irrationality vs proving transcendence—transcendence (not satisfying any polynomial with integer coefficients) is much harder; even well-known constants like Euler's γ remain unproven as irrational.
  • Roth's breakthrough: improved Liouville's bound to f(n) = 2 + ε, nearly optimal given Dirichlet's result that f(n) = 2 is the limit.

🔢 Simple irrationality proofs

🔢 √2 without divisibility

The excerpt presents a proof that √2 is irrational using a "smallest denominator" contradiction:

  • Assume √2 = a/b with b as small as possible.
  • Then b < a < 2b (since 1 < √2 < 2).
  • Algebraic manipulation yields √2 = (2b − a)/(a − b).
  • But a − b < b, so we have a representation with a smaller denominator than the "smallest possible"—contradiction.

Why this matters: the proof avoids the usual divisibility/parity arguments, offering a fresh perspective suitable for students.

🔟 log₁₀ 2 is irrational

  • If log₁₀ 2 = a/b, then 10^(a/b) = 2, hence 10^a = 2^b.
  • Left side is divisible by 5; right side is not—contradiction.

Example: this is a quick way to convince students that irrationals exist beyond √2.

📐 cos 1° and sin 1° are irrational

  • From cos 45° + i sin 45° = (cos 1° + i sin 1°)^45, deduce that cos 45° is a polynomial with integer coefficients in cos 1°.
  • If cos 1° were rational, then cos 45° = 1/√2 would also be rational—but √2 is irrational.

Don't confuse: this uses the algebraic relationship between angles, not direct approximation.

🥧 Niven's proof that π is irrational

🥧 The integral construction

The excerpt presents Niven's 1947 proof:

  • Assume π = a/b.
  • Define f(x) = x^n (a − bx)^n / n! and F(x) = f(x) − f^(2)(x) + f^(4)(x) − ⋯ (alternating derivatives).
  • Key property: f(x) and all derivatives have integral values at x = 0 and x = π.

🧮 The contradiction

By calculus, the derivative of [F′(x) sin x − F(x) cos x] equals f(x) sin x, so:

∫₀^π f(x) sin x dx = [F′(x) sin x − F(x) cos x] from 0 to π = an integer.

But for 0 < x < π:

  • 0 < f(x) sin x < π^n a^n / n!
  • As n → ∞, this upper bound → 0.

Hence the integral is a positive integer that becomes arbitrarily small—impossible.

Why it works: the polynomial f(x) is designed so its values at endpoints are integers, yet the integral over (0, π) shrinks to zero.

🔄 Extensions of Niven's method

  • Niven also proved: cosine of a rational number is irrational (so if π were rational, cos π = −1 would be irrational—contradiction).
  • The method extends to irrationality of certain numbers defined as roots of solutions to second-order differential equations with special boundary conditions.
  • A variation avoids integrals/infinite series but is more complicated.

Current status: a really simple proof that π is transcendental (not just irrational) is still lacking.

🌌 Transcendental numbers: existence and construction

🌌 Three types of problems

The excerpt identifies three levels of difficulty:

  1. Prove existence of transcendental numbers (easiest).
  2. Construct specific transcendental numbers.
  3. Prove that naturally occurring numbers are transcendental (much harder).

Examples of proven transcendentals: π, e, e^(−π), log 3 / log 2.

Still unproven: Euler's constant γ and ∑(1/n^(2s+1)) for integer s have not even been proved irrational.

🔢 Cantor's existence proof

  • Algebraic numbers (roots of polynomials with integer coefficients) are countable.
  • Real numbers are uncountable.
  • Therefore transcendental numbers exist (in fact, "most" reals are transcendental).

Example: there exists a transcendental number of the form e^(iθ) for some 0 < θ < π/2.

🎩 Kuratowski's "disappearing stunt"

The excerpt describes a curious construction using a transcendental e^(iθ):

  • Start at origin O; build set S̃ by repeatedly applying:
    • T: translate 1 unit right (add 1).
    • R: rotate by θ about O (multiply by e^(iθ)).
  • Every point in S̃ is a polynomial in e^(iθ) with positive integer coefficients, hence has a unique representation (else e^(iθ) would satisfy a polynomial equation, contradicting transcendence).
  • Partition S̃ into T̃ (points whose last operation was T) and R̃ (last operation R).
  • Magic: translating S̃ by 1 maps S̃ into T̃ (R̃ vanishes); rotating S̃ by θ maps S̃ into R̃ (T̃ vanishes).

Why it's interesting: a geometric illustration of transcendence via set operations.

📏 Liouville's approximation theorem and refinements

📏 Definitions

Algebraic number of degree n: satisfies a polynomial equation of degree n with integer coefficients.

Approximable to order n: the inequality |λ − a/b| < c/b^n has infinitely many solutions (for some constant c).

📏 Liouville's theorem

Statement: A real algebraic number of degree n is not approximable to any order greater than n.

Proof sketch:

  • Let λ satisfy f(λ) = a₀λ^n + a₁λ^(n−1) + ⋯ + aₙ = 0.
  • There exists M such that |f′(x)| < M near λ.
  • For any rational approximation p/q ≠ λ close enough:
    • |f(p/q)| ≥ 1/q^n (since f(p/q) is a nonzero integer divided by q^n).
    • |f(p/q) / (λ − p/q)| < M (by mean value theorem).
    • Hence |λ − p/q| > c/q^n.

Application: construct transcendental numbers by building numbers approximable to arbitrarily high order.

Example: λ = 1/10^(1!) + 1/10^(2!) + ⋯ + 1/10^(p!) + ⋯ satisfies |λ − λₚ| < 1/λ^(n+1) for every p, so λ is approximable to order n for any n—hence transcendental.

🔬 Refinements of Liouville's bound

The excerpt traces improvements to the approximation bound:

YearMathematicianBest f(n)Meaning
~1844LiouvillenAlgebraic degree n ⇒ not approximable beyond order n
~1909Thuen/2Improved bound
1921Siegel2√nFurther improvement
~1947Dyson, Schneider√(2n)Slight refinement
1955Roth2 + εNear-optimal (for any ε > 0)

Roth's result: For algebraic λ of degree n, |λ − p/q| < M/q^(2+ε) has at most finitely many solutions.

Why 2 + ε is best possible: Dirichlet proved that for any irrational λ, |λ − p/q| < 1/q² has infinitely many solutions.

🔍 Dirichlet's result

Statement: For irrational λ, there exist infinitely many solutions to |λ − p/q| < 1/q².

Proof sketch:

  • Fix n; consider fractional parts (λ), (2λ), …, (nλ) on (0, 1).
  • These n points are distinct; by pigeonhole, two are within distance ≤ 1/n.
  • Say (iλ) and (jλ) differ by < 1/n, so kλ − m < 1/n for integers k, m ≤ n.
  • Then |λ − m/k| ≤ 1/(nk) ≤ 1/n², as required.

Don't confuse: Dirichlet's result applies to all irrationals; Roth's applies only to algebraic numbers and shows they cannot be approximated much better than order 2.

5

Congruences

Chapter 5. Congruences

🧭 Overview

🧠 One-sentence thesis

Congruence theory provides a systematic framework for studying divisibility and residue patterns modulo m, with applications ranging from Euler's theorem to the distribution of quadratic residues, revealing deep structural properties of integers under modular arithmetic.

📌 Key points (3–5)

  • Congruence notation and properties: a ≡ b (mod m) means m divides (a − b); congruences behave like equations under addition and multiplication, but cancellation requires care.
  • Euler's and Fermat's theorems: if (a, m) = 1, then a raised to the power φ(m) ≡ 1 (mod m); for primes p, a raised to the power (p − 1) ≡ 1 (mod p).
  • Group-theoretic viewpoint: integers relatively prime to m form a multiplicative group of order φ(m); for primes p, this group is cyclic with a primitive root g.
  • Common confusion—cancellation in congruences: ka ≡ kb (mod m) does NOT always imply a ≡ b (mod m); cancellation is valid only when (k, m) = 1.
  • Quadratic residues and reciprocity: the distribution of squares modulo p reveals patterns governed by the Law of Quadratic Reciprocity and the cyclic structure of the multiplicative group.

🔢 Congruence fundamentals

🔢 Definition and basic properties

Congruence notation: a ≡ b (mod m) means m divides (a − b).

  • This notation is due to Gauss.
  • Congruence is an equivalence relation:
    • Reflexive: a ≡ a
    • Symmetric: a ≡ b ⇒ b ≡ a
    • Transitive: a ≡ b and b ≡ c ⇒ a ≡ c

➕ Arithmetic with congruences

Addition and multiplication:

  • If a ≡ b and c ≡ d, then ac ≡ bd (mod m).
  • In particular, a ≡ b ⇒ ka ≡ kb (mod m).

Cancellation pitfall:

  • The converse is not true in general.
  • Example: 2 × 3 = 4 × 3 (mod 6) does NOT imply 2 ≡ 4 (mod 6).
  • When cancellation works: if (k, m) = 1, then ka ≡ kb (mod m) does imply a ≡ b (mod m).
  • Don't confuse: cancellation in ordinary equations always works; in congruences it requires the multiplier to be coprime to the modulus.

🔄 Complete residue systems

Theorem: If a₁, a₂, …, a_φ(m) form a complete residue system mod m, then so does aa₁, aa₂, …, aa_φ(m) provided (a, m) = 1.

Proof idea:

  • We have φ(m) residues.
  • If two are congruent: aa_i ≡ aa_j, then a(a_i − a_j) ≡ 0.
  • Since (a, m) = 1, we can cancel a, giving a_i ≡ a_j.
  • This means the new system has no duplicates, so it is complete.

🎯 Euler's and Fermat's theorems

🎯 Euler's theorem

Theorem: If (a, m) = 1, then a raised to the power φ(m) ≡ 1 (mod m).

Proof using complete residue systems:

  • Let a₁, a₂, …, a_φ(m) be a complete residue system mod m.
  • Then aa₁, aa₂, …, aa_φ(m) is also a complete residue system (in some order).
  • Their products are congruent: a raised to φ(m) times (a₁a₂⋯a_φ(m)) ≡ a₁a₂⋯a_φ(m).
  • Cancel the product of the a_i (which is coprime to m) to get a raised to φ(m) ≡ 1.

🔴 Fermat's Little Theorem

Special case: If p is prime and (a, p) = 1, then a raised to the power (p − 1) ≡ 1 (mod p).

  • Multiplying by a: a raised to the power p ≡ a (mod p) for all a.

Alternative proofs:

  1. Induction on a: Use the binomial expansion (a + 1) raised to p = a raised to p + (binomial coefficients) + 1; all middle binomial coefficients are divisible by p.
  2. Multinomial theorem: Consider (1 + 1 + ⋯ + 1) raised to p with a ones.
  3. Combinatorial (colored polygons): Count regular convex p-gons with each edge colored in one of a colors. Total: a raised to p. Monochromatic: a. Non-monochromatic: a raised to p − a, which come in sets of p by rotation. Hence a raised to p − a ≡ 0 (mod p).

🔁 Group-theoretic viewpoint

Multiplicative group mod m:

  • The integers relatively prime to m and less than m form a group under multiplication mod m.
  • Order of the group: φ(m).
  • Key property: every element has an inverse in the system.

Lagrange's theorem application:

  • If a is an element of a group G of order m, then a raised to m = 1.
  • Applied to our group: a raised to φ(m) ≡ 1 (mod m).

For primes p:

  • The integers under p form a field with respect to addition and multiplication.
  • The multiplicative group (containing p − 1 elements) is cyclic.
  • There exists a primitive root g such that 1 = g⁰, g¹, g², …, g^(p−1) are all incongruent mod p.

🧮 Wilson's theorem and polynomial congruences

🧮 Wilson's theorem

Theorem: (p − 1)! ≡ −1 (mod p) for prime p.

Derivation from polynomial factorization:

  • In the ring of polynomials with coefficients mod p, x raised to p − x ≡ 0 has at most p solutions.
  • Since every residue class 0, 1, 2, …, p − 1 is a root, we have the factorization:
    • x raised to p − x ≡ x(x − 1)(x − 2)⋯(x − p + 1) (mod p).
  • Comparing coefficients of x: (p − 1)! ≡ −1 (mod p).

Cayley's geometrical proof:

  • Consider directed p-gons with vertices of a regular p-gon.
  • Total number: (p − 1)!
  • Regular ones: p − 1.
  • Non-regular: (p − 1)! − (p − 1), which come in sets of p by rotation.
  • Hence (p − 1)! − (p − 1) ≡ 0 (mod p), giving (p − 1)! + 1 ≡ 0 (mod p).

📐 Polynomial congruences

Key properties:

  • If f(x) is a polynomial with coefficients in residue classes mod p, then f(x) ≡ 0 (mod p) has at most p solutions.
  • If r is a root, then (x − r) is a factor.
  • Not guaranteed: f(x) ≡ 0 (mod p) does not necessarily have at least one root.

Primality criterion from Wilson:

  • p is prime if and only if (p − 1)! ≡ −1 (mod p).
  • This is a necessary and sufficient condition, but not practical for testing primality.

🧩 Compatible sets and coefficient of compatibility

Compatible set (mod n): a set of distinct residue classes a₁, a₂, …, a_k such that there exists a polynomial f(x) with integer coefficients for which f(x) ≡ 0 (mod n) has exactly these roots and no others.

  • Let C(n) = number of compatible sets (mod n).
  • Coefficient of compatibility: c(n) = C(n) / 2^n.
  • For primes p: c(p) = 1 (every subset is compatible via the polynomial (x − a₁)(x − a₂)⋯(x − a_k)).
  • For composite n ≠ 4: c(n) < 1.
  • c(4) = 1, but c(n) < 1 for n = 6, 8, 9, 10.

🎲 Quadratic residues and reciprocity

🎲 Quadratic residues definition

Quadratic residue: a is a quadratic residue of prime p if there exists x such that x² ≡ a (mod p).

Legendre symbol notation:

  • (a/p) = +1 if a is a residue of p.
  • (a/p) = −1 if a is a non-residue of p.
  • (a/p) = 0 if p divides a.

Example for p = 7:

  • 1² ≡ 1 ≡ 6², 2² ≡ 4 ≡ 5², 3² ≡ 2 ≡ 4².
  • Residues: 1, 2, 4. Non-residues: 3, 5, 6.
  • Sequence: + + − + − −.

🔄 Group structure of residues

Cyclic multiplicative group:

  • Residue classes mod p form a field; the multiplicative group has p − 1 elements and is cyclic.
  • If g is a generator: g¹, g², …, g^(p−1) = 1 are all incongruent.
  • Even powers of g are quadratic residues (form a subgroup of index 2).
  • Odd powers of g are non-residues.

Multiplication rules:

  • res × res = res
  • res × nonres = nonres
  • nonres × nonres = res

Inverse property:

  • 1/a (the unique inverse of a mod p) is a residue or non-residue according as a itself is a residue or non-residue.

⚖️ Law of Quadratic Reciprocity

Law of Quadratic Reciprocity: For distinct odd primes p and q,
(p/q) × (q/p) = (−1) raised to the power [(p−1)(q−1)/4].

  • First proved by Gauss around 1800.
  • Over 50 proofs have been given (now over 200).
  • Provides an algorithm for deciding the value of (p/q).

Gauss's lemma (used in his first proof):

  • For p ≡ 1 (mod 4), the least non-residue of p does not exceed 2√p + 1.

📊 Distribution of quadratic residues

Dirichlet's theorem (1839):

  • If p ≡ 3 (mod 4), then among the integers 1, 2, …, (p−1)/2, there are more residues than non-residues.
  • All published proofs involve Fourier series; no elementary proof is known.

Consecutive residues and non-residues:

  • Let R_n = number of blocks of n consecutive residues.
  • Let N_n = number of blocks of n consecutive non-residues.
  • Conjecture: R_n ∼ N_n ∼ p / 2^n.
  • A. Brauer's result (using Van der Waerden's theorem): for p > p₀(n), R_n > 0 and N_n > 0.

Bounds on the least non-residue n_p:

  • Nagel: for p ≠ 7, 23, n_p < √p.
  • Vinogradov: for p > p₀, n_p < p raised to the power [1/(2√e log² p)] < p^0.303.
  • Pillai: n_p ≠ o(log log p).
  • Using the Riemann hypothesis, Ankeny: n_p ≠ O(log² p).

🔍 Lattice methods and explicit bounds

🔍 Visible lattice points

Counting visible points:

  • V(m) = number of visible lattice points in a square of side m.
  • Exact formula: V(m) = sum over d≥1 of μ(d) × floor(m/d)².
  • Asymptotic: V(m) ∼ (6/π²)m².
  • Explicit bound: for all m, V(m) > 0.6m².

🔢 Application to residue bounds

Method:

  • Take m = floor(√p).
  • For reasonably large p: V(floor(√p)) > 0.59m².
  • Associate each visible lattice point (a, b) with the number a/b (mod p).
  • Distinct visible points correspond to distinct numbers: if a/b = c/d (mod p), then ad ≡ bc; since ad < p and bc < p, we have ad = bc, so a/b = c/d as fractions; since (a,b) = (c,d) = 1, we get (a,b) = (c,d).

Result:

  • At least 0.59 distinct numbers represented by fractions a/b with a, b < √p.
  • At least 0.09 of these correspond to non-residues.
  • Let R = number of residues < √p, N = number of non-residues < √p.
  • Then R + N = √p and 2RN > 0.09p.
  • Solving: R, N > 0.04√p.

Comparison:

  • Weaker than Nagel's result but holds for both p ≡ 1 (mod 4) and p ≡ 3 (mod 4).
  • Exceptions: only primes 7 and 23.
6

Diophantine Equations

Chapter 6. Diophantine Equations

🧭 Overview

🧠 One-sentence thesis

This chapter explores several specific Diophantine equation problems—including the factorial-square conjecture and the sum-of-powers equation—demonstrating how modular arithmetic and prime-distribution arguments can establish extremely large lower bounds for potential solutions even when complete solutions remain unknown.

📌 Key points (3–5)

  • The factorial-square conjecture: The equation n! + 1 = x² is conjectured to have only three solutions (n = 4, 5, 7), verified up to n = 5000 by Kraitchik's modular-arithmetic method, but remains unproven.
  • Finite solutions via prime-form arguments: The equation n! + 1 = x⁸ has only finitely many solutions because the prime factorizations on each side have incompatible distributions of primes of the form 4k + 1 versus 4k + 3.
  • The sum-of-powers equation: The equation 1ⁿ + 2ⁿ + ⋯ + (m − 1)ⁿ = mⁿ has only the trivial solution 1 + 2 = 3 known; if other solutions exist, then m > 10¹⁰⁰⁰⁰⁰⁰.
  • Common confusion—checking vs proving: Brute-force verification (e.g., up to n = 5000) shows no counterexamples but does not constitute a proof; the methods here use congruences and prime-sum estimates to derive theoretical bounds.
  • Why it matters: These problems illustrate how number-theoretic tools (modular arithmetic, prime distribution, Fermat's theorem) can yield powerful partial results even when full solutions are out of reach.

🔢 The factorial-square conjecture

🧩 Statement and history

Brochard's conjecture (1885): The only solutions to n! + 1 = x² are 4! + 1 = 5², 5! + 1 = 11², and 7! + 1 = 71².

  • Ramanujan independently made the same conjecture around 1895 but made no progress.
  • Around 1940 it appeared as an "elementary" problem in the Monthly; no correct solutions were offered, and an incorrect solution was published in 1950.
  • Kraitchik verified the conjecture up to n = 5000 using modular methods; that is where the problem stands.

🔍 Kraitchik's modular method

How it works:

  • To check whether 100! + 1 is a perfect square, work modulo a prime p > 100 (e.g., p = 103).
  • By Wilson's theorem, 100! · (−2) · (−1) ≡ −1 (mod 103), so 100! + 1/2 ≡ 0, hence 100! + 1 ≡ 1/2 ≡ 52 (mod 103).
  • If 52 is a nonresidue modulo 103, then 100! + 1 cannot be a perfect square.
  • If 52 is a residue, try another prime (e.g., 107).
  • Don't confuse: Working modulo 101 gives 100! + 1 ≡ 0 (mod 101), which provides no information about whether it is a square.

Variations:

  • Kraitchik used variations of this method to eliminate many candidates wholesale, achieving verification up to n = 5000.

🔢 Finite solutions for n! + 1 = x⁸

Outline of the proof:

  • The proof relies on two facts (stated without proof in the excerpt):
    1. Every odd prime divisor of x² + 1 is of the form 4n + 1.
    2. There are roughly as many primes of the form 4n + 1 as 4n + 3.
  • Rewrite n! + 1 = x⁸ as n! = x⁸ − 1 = (x⁴ + 1)(x² + 1)(x² − 1).
  • On the right side, primes of the form 4k + 1 and 4k − 1 contribute roughly equally.
  • On the left side, all odd prime factors of (x⁴ + 1)(x² + 1)—about (n!)³⁄₄ of the product—are of the form 4n + 1.
  • This imbalance can hold only for finitely many n.

Example: If n is large, the left side n! has roughly equal contributions from 4k + 1 and 4k + 3 primes, but the right side requires a disproportionate number of 4k + 1 primes, leading to a contradiction for all but finitely many n.

🧮 The sum-of-powers equation

🧩 Statement and near-solutions

The equation:

1ⁿ + 2ⁿ + ⋯ + (m − 1)ⁿ = mⁿ

  • The only known solution is the trivial 1 + 2 = 3.
  • Erdős conjectured this is the only solution.

Near-solutions (not exact solutions):

  • 3² + 4² = 5²
  • 3³ + 4³ + 5³ = 6³
  • 1⁶ + 2⁶ + 4⁶ + 7⁶ + 9⁶ + 12⁶ + 13⁶ + 15⁶ + 16⁶ + 18⁶ + 20⁶ + 22⁶ + 23⁶ = 28⁶

🔍 Main result: m > 10¹⁰⁰⁰⁰⁰⁰

Theorem: If the equation has a solution with n > 1, then m > 10¹⁰⁰⁰⁰⁰⁰.

Strategy:

  • Examine the equation modulo various primes.
  • Combine the resulting congruences to obtain extremely large lower bounds for m without laborious computation.

🧷 Key lemma and initial congruences

🧷 Lemma 1: Sum of powers modulo p

Lemma 1: If p is a prime and εₙ(p) is defined by εₙ(p) = −1 when (p − 1) | n and εₙ(p) = 0 otherwise, then Sₙ(p) ≡ εₙ(p) (mod p), where Sₙ(m) denotes the sum 1ⁿ + 2ⁿ + ⋯ + (m − 1)ⁿ.

🔢 Congruence from primes dividing m − 1

  • Suppose p | (m − 1). Then:
    • Sₙ(m) ≡ (m − 1)/p · εₙ(p) (mod p).
    • Since m ≡ 1 (mod p) and Sₙ(m) ≡ mⁿ (mod p), we get (m − 1)/p · εₙ(p) ≡ 1 (mod p).
    • Hence εₙ(p) ≢ 0 (mod p), so εₙ(p) = −1, which means (p − 1) | n.
    • This gives (m − 1)/p + 1 ≡ 0 (mod p), or m − 1 + p ≡ 0 (mod p²).

Consequences:

  • m − 1 is squarefree.
  • m − 1 ≠ 2, so m − 1 has at least one prime divisor, hence n is even.

🔢 Combined congruence for m − 1

  • Multiply together all congruences of the form (m − 1)/p + 1 ≡ 0 (mod p) for each prime p dividing m − 1.

  • The modulus becomes m − 1, and products with two or more distinct factors (m − 1)/p are divisible by m − 1.

  • Result:

    ∑ (1/p) + 1/(m − 1) ≡ 0 (mod 1), where the sum is over primes p dividing m − 1.

  • The only values m ≤ 1000 satisfying this are 3, 7, 43.

🧮 Additional congruences

🔢 Congruence for m + 1

  • Rewrite the equation as Sₙ(m + 2) = 2mⁿ + (m + 1)ⁿ.
  • If p | (m + 1), then (p − 1) | n and (m + 1)/p + 2 ≡ 0 (mod p).
  • No odd prime appears with exponent > 1 in m + 1.
  • Examining modulo 4 (using n even): m + 1 ≡ 1 or 4 (mod 8), so m + 1 is odd or contains 2 exactly to the second power.
  • If m + 1 is even, multiply congruences to get:

    ∑ (1/p) + 2/(m + 1) ≡ 0 (mod 1), where the sum is over primes p dividing m + 1.

🔢 Congruence for 2m − 1

  • If p | (2m − 1), a similar argument yields:

    ∑ (1/p) + 2/(2m − 1) ≡ 0 (mod 1), where the sum is over primes p dividing 2m − 1.

  • 2m − 1 is squarefree.

🔢 Congruence for 2m + 1

  • Rewrite the equation as Sₙ(m + 1) = 2mⁿ.
  • If p | (2m + 1), then:

    ∑ (1/p) + 4/(2m + 1) ≡ 0 (mod 1), where the sum is over primes p dividing 2m + 1.

  • 2m + 1 is squarefree.

🔍 Combining congruences to bound m

🧩 The key inequality

  • Assume m + 1 is even (so all four congruences hold).

  • Add the left-hand sides of the four congruences; the result is an integer ≥ 4.

  • No prime p > 3 can divide more than one of m − 1, m + 1, 2m − 1, 2m + 1.

  • Primes 2 and 3 can divide at most two of these numbers.

  • Let M = (m − 1)(m + 1)(2m − 1)(2m + 1). Then:

    ∑ (1/p) + 1/(m − 1) + 2/(m + 1) + 2/(2m − 1) + 4/(2m + 1) ≥ 4 − 1/2 − 1/3, where the sum is over primes p dividing M.

  • Simplifying: ∑ (1/p) > 3.16, where the sum is over primes p dividing M.

🔢 Lemma 2 and the final bound

Lemma 2: ∑ (1/p) < 3.16 for p ≤ 10⁷.

Proof sketch:

  • By direct computation, ∑ (1/p) < 2.18 for p ≤ 10⁸.
  • Using Lehmer's table and Rosser's estimates for π(x), one can check that π(x) < 1.2x / log x for 10³ < x < 10⁷.
  • Combining these with the integral formula ∑ (1/p) = π(x)/x + ∫₂ˣ π(t)/t² dt gives the result.

Consequence:

  • If ∑ (1/p) > 3.16, then M > ∏ (p for p ≤ 10⁷).
  • Using Rosser's result ∑ log p > (1 − 1/log x)x for x < e¹⁰⁰, we get:
    • log M > (1 − 1/(7 log 10)) · 10⁷ > 0.93 · 10⁷.
    • Since M < 4m², we have log m > (log M − log 4)/2 > 0.231 · 10⁷.
    • Hence m > e^(0.231 · 10⁷) > 10¹⁰⁰⁰⁰⁰⁰.

🔍 The case m − 1 odd

  • If m − 1 is odd, we cannot use the congruence for m + 1.

  • Let N = (m − 1)(2m − 1)(2m + 1). Then:

    ∑ (1/p) + 1/(m − 1) + 2/(2m − 1) + 4/(2m + 1) > 3 − 1/3, where the sum is over primes p dividing N.

  • Since the prime 2 does not appear on the left, this is actually a stronger condition than the previous inequality.

  • Hence in all cases, m > 10¹⁰⁰⁰⁰⁰⁰.

Don't confuse: The bound m > 10¹⁰⁰⁰⁰⁰⁰ does not prove there are no other solutions; it only shows that if another solution exists, m must be astronomically large.

📊 Summary table

EquationKnown solutionsStatusMethod
n! + 1 = x²n = 4, 5, 7Conjectured complete; verified to n = 5000Modular arithmetic (Kraitchik)
n! + 1 = x⁸UnknownOnly finitely many solutionsPrime-form distribution argument
1ⁿ + 2ⁿ + ⋯ + (m − 1)ⁿ = mⁿ1 + 2 = 3If n > 1, then m > 10¹⁰⁰⁰⁰⁰⁰Combining congruences + prime-sum estimates
7

Combinatorial Number Theory

Chapter 7. Combinatorial Number Theory

🧭 Overview

🧠 One-sentence thesis

Combinatorial number theory explores how integers can be partitioned, sequenced, or combined under various constraints, revealing surprising thresholds—such as Schur's theorem guaranteeing that sufficiently many consecutive integers split into n classes must contain a solution to a + b = c in at least one class, and Van der Waerden's theorem ensuring arithmetic progressions appear in at least one class when enough integers are partitioned.

📌 Key points (3–5)

  • Schur's theorem: If you partition 1, 2, …, [n! e] into n classes, at least one class contains a, b, c with a + b = c; but clever partitions can avoid this up to smaller bounds like (3^(n)−1)/2.
  • Van der Waerden's theorem: Given k classes and a desired progression length , there exists W(k, ) such that partitioning 1, 2, …, W into k classes forces at least one class to contain terms in arithmetic progression.
  • Nonaveraging sequences: A sequence is nonaveraging if no element equals the average of two others (aᵢ + aⱼ ≠ 2aₖ); the maximum density of such sequences was long conjectured to be very sparse, and Roth (1954) proved A(n)/n → 0.
  • Common confusion: Schur's lower bound (3^(n)−1)/2 vs. upper bound [n! e]—the gap reveals an open problem about the true threshold.
  • Addition chains and sum-distinct sets: Problems about representing integers as sums (e.g., ensuring all 2^k subset sums are distinct, or finding shortest addition chains) connect combinatorics, representation, and density questions.

🎯 Schur's theorem and partition thresholds

🎯 What Schur's theorem says

Schur's theorem: If the integers 1, 2, …, Tₙ (where Tₙ = [n! e]) are partitioned into n classes in any manner, at least one class contains a solution to a + b = c.

  • Equivalently: no class can contain the difference of two of its elements (since a + b = cca = b).
  • The bound Tₙ = n! (1 + 1/1! + 1/2! + … + 1/n!) = [n! e] is the smallest integer ≥ n! e.

🧱 Counterexamples showing the bound is not tight

The excerpt gives three constructions that avoid a + b = c within each class for smaller ranges:

ConstructionRangeMethodWhy it works
Powers of 21, …, 2^n − 1 into n classesPut m in class k if m = 2^k θ (θ odd)Sum of two elements in class k has factor 2^k, so lies in a higher class
Doubling blocks1, …, 2^n − 1Class 1: {1}, class 2: {2,3}, class 3: {4,5,6,7}, …Each class is a block of 2^(k−1) consecutive integers; sums exceed the block
Complicated distribution1, …, (3^n − 1)/2Pattern: 1 / 2 5 / 3 4 6 / 10 11 7 / …Carefully arranged so no class has a + b = c

Open problem: Can the [n! e] in Schur's theorem be replaced by a much smaller number? The gap between (3^n − 1)/2 and [n! e] is enormous.

🔍 Proof sketch of Schur's theorem

The proof is by contradiction and uses a "successive differencing" argument:

  1. Assume 1, 2, …, Tₙ are partitioned into n classes with no class containing a + b = c.
  2. Start with class A₁ containing at least ⌊Tₙ/n⌋ + 1 = Tₙ₋₁ + 1 elements a₁ < a₂ < ….
  3. Form Tₙ₋₁ differences: b₁ = a₂ − a₁, b₂ = a₃ − a₁, … (all b's are differences of a's, so none lie in A₁).
  4. These Tₙ₋₁ b's lie in the remaining n − 1 classes; at least Tₙ₋₂ + 1 lie in some class A₂.
  5. Form their Tₙ₋₂ differences (the c's), which lie outside A₁ and A₂.
  6. Continue until you obtain T₀ = 1 number not in any of the n classes—contradiction.

Don't confuse: The b's, c's, d's are all differences of original a's, so they remain among 1, 2, …, Tₙ; the process exhausts classes one by one.

🔗 Connection to Fermat's Last Theorem

  • A natural approach to Fermat's Last Theorem (x^n + y^n = z^n has no nontrivial integer solutions) would be to show x^n + y^nz^n (mod p) is unsolvable for some prime p not dividing xyz.
  • Schur's theorem implies this approach must fail: if p > n! e, then x^n + y^nz^n (mod p) does have a solution with pxyz.
  • This shows modular obstructions alone cannot prove Fermat's Last Theorem.

🌈 Van der Waerden's theorem

🌈 Statement and significance

Van der Waerden's theorem: Given integers k (number of classes) and (desired progression length), there exists W = W(k, ) such that if 1, 2, …, W are partitioned into k classes in any manner, at least one class contains terms in arithmetic progression.

  • Original motivation: Can one assert that arithmetic progressions of arbitrary length exist in at least one class when all integers are divided into two classes? (Related to quadratic residues.)
  • Van der Waerden (1928) solved this by generalizing the problem (a common strategy: make it harder to make it easier).

🔢 Known bounds for W(k, )

The excerpt notes:

  • Van der Waerden's original proof is "extremely tricky, difficult to see through, and leads only to fantastically large bounds."
  • Open problem: Find a simpler proof and reasonable bounds.
  • Moser (using nonaveraging techniques) proved W(k, ) > k^(log k).
  • Erdős and Rado (different method) showed W(k, ) > √2 k^.

Don't confuse: These are lower bounds (showing you can partition up to that many integers without forcing a progression); the actual W(k, ) is much larger.

🚫 Nonaveraging sequences

🚫 Definition and basic examples

Nonaveraging sequence: A sequence A: a₁ < a₂ < … such that no element is the average of two others, i.e., aᵢ + aⱼ ≠ 2aₖ for ij.

  • Let A(n) = number of elements in A not exceeding n.
  • Main question: How large can A(n) be?

Greedy construction (Szekeres' sequence):

  • Start with 1, 2, … and always take the smallest number that doesn't violate nonaveraging.
  • Result: 1, 2, 4, 5, 10, 11, 13, 14, 28, 29, 31, …
  • Interesting fact: This equals {integers whose base-3 representation uses only 0's and 1's} + 1.
  • Related to the Cantor ternary set.
  • Counting function: A(n) ∼ n^(log 2 / log 3) ≈ n^0.631.
  • Szekeres (1930) conjectured this was the densest possible nonaveraging set.

🎉 Salem–Spencer construction (1942)

Salem and Spencer disproved Szekeres' conjecture by constructing a nonaveraging set with at least n^(1 − c/√log log n) elements ≤ n—much denser than n^0.631.

Construction via bracketing:

  • Write x in base 10 and bracket digits: first digit in bracket 1, next two in bracket 2, next three in bracket 3, etc.
  • Example: 32653200200 → (00003)(2653)(200)(20)(0).
  • Let r = number of nonempty brackets; call the number in bracket i as xᵢ (i = 1, …, r − 2), and let = number in last two brackets together (excluding last digit).
  • xR if and only if:
    1. Last digit of x is 1.
    2. xᵢ begins with 0 for i = 1, …, r − 2.
    3. x₁² + … + xᵣ₋₂² = .

Why R is nonaveraging:

  • If x, yR have different numbers of brackets, their average violates (1).
  • If x, yR have the same r brackets, averaging bracket-by-bracket gives = ( + ȳ)/2 = (Σ xᵢ² + Σ yᵢ²)/2.
  • But if zR, then = Σ zᵢ² = Σ ((xᵢ + yᵢ)/2)².
  • This forces Σ (xᵢ − yᵢ)²/2 = 0, so xᵢ = yᵢ for all i, hence x = y.
  • Therefore no three distinct elements of R are in arithmetic progression.

Counting:

  • For r brackets, the first r − 2 can be filled in 10^(((r−1)(r−2))/2) ways; the last two are determined by (3).
  • For n with 10^((r(r+1))/2) ≤ n < 10^(((r+1)(r+2))/2), we have R(n) ≥ 10^(((r−1)(r−2))/2).
  • From r + 2 > √(2 log n), we get R(n) ≥ 10^(log nc√log n) = 10^((log n)(1 − c/√log n)) = n^(1 − c/√log n).

🏆 Roth's theorem (1954)

  • Old conjecture: A(n)/n → 0 for every nonaveraging sequence.
  • Roth (1954) proved this conjecture (non-elementary proof).
  • This means nonaveraging sets have density zero, even though Salem–Spencer showed they can be much denser than Szekeres' sequence.

🔢 Sum-distinct sets and addition chains

🔢 Sum-distinct sets

Problem (Erdős): What is the maximum number k of integers a₁ < … < aₖ ≤ n such that all 2^k sums of distinct a's are distinct?

Basic bounds:

  • Powers of 2 show k + 1 integers ≤ 2^k work; in fact k + 2 integers < 2^k can satisfy the condition.
  • All sums are < kn, so 2^kkn, giving k < (log n)/log 2 + (1 + o(1))(log log n)/log 2.

Improved bound (Erdős–Moser):

  • Estimate Σᵢ (sᵢ − A/2)² where sᵢ are the sums and A = a₁ + … + aₖ.
  • Σᵢ (sᵢ − A/2)² = 2^k Σ aᵢ² < 2^(k−2) n² k.
  • At most 2^(k−1) sums satisfy |sᵢ − A/2| ≥ n/√k.
  • So 2^(k−1) distinct sums lie in a range of length 2n/√k, giving 2^(k−1) ≤ 2n/√k.
  • Hence 2^k < 4√(kn), improving to k < (log n)/log 2 + (1 + o(1))(log log n)/(2 log 2).

Erdős' conjecture: k = (log n)/log 2 + o(1).

➕ Addition chains

Addition chain for n: A sequence 1 = a₀ < a₁ < … < aᵣ = n such that every aₚ = aσ + aτ for some σ, τ < p.

  • Length (n): The smallest r for which such a chain exists.
  • Example for n = 666: both 1, 2, 4, 8, 16, 24, 40, 80, 160, 320, 640, 664, 666 and 1, 2, 3, 6, 9, 18, 27, 54, 81, 162, 324, 648, 666 have length 12.

Scholz's results:

  1. m + 1 ≤ (n) ≤ 2m for 2^m + 1 ≤ n ≤ 2^(m+1) (m ≥ 1).
  2. (ab) ≤ (a) + (b).
  3. (2^(m+1) − 1) ≤ m + (m + 1).

Proof of (1): Write n in binary. Build the chain by doubling and adding 1 when needed. Each binary digit "costs" at most 2 steps, so < 2 log₂ n. The lower bound m + 1 is trivial.

Brauer's improvement: (n) ∼ log₂ n.

  • Idea: Build a "stock" of all r-digit binary numbers first (cost: 2^r).
  • Break n's binary representation into blocks of r digits each.
  • Use doubling between blocks and add from stock at block boundaries.
  • Total cost ≈ 2^r + m + m/r (where n ≈ 2^m).
  • Choose r to minimize 2^r + m/r, making it negligible compared to m.
  • Result: (n) < log₂ n {1 + 1/log log n + 2 log 2 / (log n)^(1 − log 2)}.

Open question: Can Scholz's inequality (3) be improved or is it false?

📊 Other combinatorial problems

📊 Representation functions

Let a₁ < a₂ < … be an infinite sequence of integers. Define f(n) = number of solutions to n = aᵢ + aⱼ (counting each solution once).

Dirac–Newman theorem: f(n) cannot be constant from some stage on.

  • Proof sketch: If f( + 1) = f( + 2) = … = a, then
    • ½ (Σ z^(aₖ))² + Σ z^(2aₖ) = Σ f(n) z^n = Pₗ(z) + a z^(+1) / (1 − z),
    • where Pₗ(z) is a polynomial of degree ≤ .
    • As z → −1 on the real axis, the right side stays bounded, but the left side → ∞ (both terms positive, second → ∞).
    • Contradiction.

Conjectures:

  • Turán–Erdős: If f(n) > 0 for all large n, then lim sup f(n) = ∞.
  • Stronger: If aₖ > ck², then lim sup f(n) = ∞.
  • Best known: lim sup f(n) ≥ 2.

Erdős–Fuchs result: Σₖ₌₁ⁿ f(k) = cn + o(n^(1/4) log n) is impossible (much more general than Hardy–Landau's result for lattice points in circles).

🎯 Difference-set problems

Erdős conjecture: Given 2n integers a₁ < … < a₂ₙ in [1, 4n] and the remaining 2n integers b₁ < … < b₂ₙ, there exists x such that aᵢ + x = bⱼ has at least n solutions.

Easy bound: Total solutions Σ (aᵢ + y = bⱼ) = 4n²; possible y values: 8n. So some y₀ gives ≥ n/2 solutions.

Improvements:

  • Scherk: ≥ n(2 − √2) ≈ 0.586n.
  • Moser: ≥ 0.712n.
  • Selfridge–Ralston–Motzkin (using computer): found counterexamples where no x gives > 0.8n solutions, disproving the original conjecture.

🧩 Lorentz's theorem

Erdős conjecture: Given any infinite sequence {aᵢ}, there exists a sequence {bᵢ} of zero density such that every integer is of the form aᵢ + bⱼ.

  • Lorentz proved this conjecture.

Open problem: Find a sequence B: b₁ < b₂ < … with B(n) < cn/log n such that every integer is of the form 2^k + bⱼ.

8

Geometry of Numbers

Chapter 8. Geometry of Numbers

🧭 Overview

🧠 One-sentence thesis

Minkowski's fundamental theorem establishes that any convex, centrally symmetric region in the plane with area greater than 4 contains a lattice point other than the origin, and this geometric tool yields powerful results about approximating irrationals by rationals and solving linear inequalities in integers.

📌 Key points (3–5)

  • The fundamental theorem: A region in the x-y plane with area > 4, symmetric about the origin and convex, must contain a lattice point other than the origin.
  • Why the constant 4 is sharp: A square of side 2 − ε centered at the origin shows that 4 cannot be replaced by any smaller number.
  • What symmetry and convexity really mean: Together they ensure that if vectors V₁ and V₂ are in R, so is ½(V₁ − V₂); this weaker condition is actually sufficient.
  • Common confusion: Neither symmetry nor convexity alone is indispensable; what matters is the combined property that the region contains ½(V₁ − V₂) whenever it contains V₁ and V₂.
  • Applications: The theorem gives bounds on rational approximations to irrationals, determines visibility distance in a lattice "forest," and provides conditions for integer solutions to linear inequalities.

🎯 The fundamental theorem and its proof

🎯 Statement of the theorem

Minkowski's fundamental theorem (simplest form): Let R be a region in the x-y plane of area A > 4, symmetric about the origin and convex. Then R contains a lattice point other than the origin.

  • The constant 4 is best possible: consider a square of side 2 − ε centered at the origin, which has area 4(1 − ε/2)² < 4 and contains no lattice point except the origin.
  • The theorem generalizes to n dimensions by replacing 4 with 2ⁿ.

🔍 What the conditions mean

Symmetry: If vector V₁ is in R, then −V₁ is also in R.

Convexity: If vectors V₁ and V₂ are in R, then ½(V₁ + V₂) is also in R.

The key combined property: Symmetry and convexity together imply that if V₁ and V₂ are in R, so is ½(V₁ − V₂).

  • This last condition is actually sufficient for the theorem and may replace symmetry and convexity.
  • It is implied by symmetry and convexity but does not imply either condition individually.
  • Don't confuse: the theorem does not require both symmetry and convexity as separate necessities; the weaker combined property suffices.

🧩 Hajos' proof (1934)

The proof uses a "chessboard compression" argument:

  1. Cut up the plane: Divide the x-y plane into an infinite chessboard with basic squares of area 4, determined by |x| ≤ 1, |y| ≤ 1.
  2. Superimpose squares: Cut along the edges and superimpose all squares that contain parts of region R.
  3. Force overlap: Since R has area > 4 and we compress it into a region of area 4, there must be overlapping—a pin can pierce two points V₁ and V₂ of R.
  4. Find the lattice point: When reassembled, the x and y coordinates of V₁ and V₂ differ by a multiple of 2, so V₁ ≡ V₂ (mod 2).
  5. Conclusion: This implies ½(V₁ − V₂) ≡ 0 (mod 1), so ½(V₁ − V₂) is a lattice point different from O (since V₁ ≠ V₂) that lies in R.

📐 Irrational approximation application

📐 The irrational-slope line example

Consider a line through O with irrational slope tan θ (Figure 4 in the excerpt).

  • This line passes through no lattice point other than the origin.
  • Take a long segment extending length R on either side of O.
  • There will be a lattice point (p, q) closest to the segment, at distance r from it.

🔢 What the theorem tells us

No matter how large R is, we can construct a rectangle containing this line segment with:

  • Long side: 2R
  • Short side: 2r
  • Area: 4rR

By Minkowski's theorem, since the rectangle contains no lattice point other than O, its area cannot exceed 4:

  • 4rR ≤ 4
  • Therefore r ≤ 1/R

Implication: If (p, q) is a lattice point on the border of the rectangle, then p/q ≈ tan θ, so the fundamental theorem gives information about how closely an irrational number can be approximated by rationals.

🌲 Visibility in a lattice forest (Pólya's problem)

🌲 The problem setup

  • Every lattice point other than O is surrounded by a circle of radius r ≤ ½ (representing trees in a forest).
  • A person stands at O.
  • In direction θ, they can see a distance f(r, θ).
  • Question: What is the furthest they can see in any direction? That is, determine F(r) = max over θ of f(θ, r).

👁️ The solution using Minkowski's theorem

Lower bound: By looking past the circle centered at (1, 0), we can see almost a distance 1/r.

Upper bound: Suppose we can see a distance F(r) in direction θ.

  • Construct a rectangle about this line of vision with side 2r.
  • This rectangle contains no lattice point (otherwise the tree centered there would obstruct vision).
  • By Minkowski's theorem: 4F(r)r ≤ 4
  • Therefore F(r) ≤ 1/r

Conclusion: F(r) = 1/r (approximately).

Note: No lattice point can be in either semicircle in the diagram, which enables a slight improvement on Pólya's result.

🔧 Solving linear inequalities in integers

🔧 The problem

Consider the system of n inequalities:

  • |a₁₁x₁ + a₁₂x₂ + ⋯ + a₁ₙxₙ| ≤ λ₁
  • |a₂₁x₁ + a₂₂x₂ + ⋯ + a₂ₙxₙ| ≤ λ₂
  • |aₙ₁x₁ + aₙ₂x₂ + ⋯ + aₙₙxₙ| ≤ λₙ

where aᵢⱼ are real numbers and λ₁, λ₂, …, λₙ are positive numbers.

Goal: Find sufficient conditions for the existence of integers x₁, …, xₙ (not all 0) satisfying the system.

✅ The solution criterion

Sufficient condition: A solution exists if the absolute value of the determinant det(aᵢⱼ) is less than the product λ₁ · λ₂ · ⋯ · λₙ.

Why it works:

  • Geometrically, the inequalities determine an n-dimensional parallelepiped.
  • Its volume (content) is: (1/|det(aᵢⱼ)|) · 2ⁿ · λ₁ · λ₂ · ⋯ · λₙ
  • If λ₁ · λ₂ · ⋯ · λₙ > |det(aᵢⱼ)|, then the content exceeds 2ⁿ.
  • By the n-dimensional Minkowski theorem, the region contains a lattice point different from O.

🔬 Recent results and open problems

🔬 Non-symmetric regions

Recent analogue: Let R be a convex region, not necessarily symmetric about O, but having its centroid at O. If its area exceeds 9/2, then it contains a lattice point other than O.

  • The constant 9/2 is best possible.
  • An n-dimensional analogue is unknown.

🤔 Conjectured generalization

Let R be a convex region containing the origin, defined by r = f(θ), 0 ≤ θ < 2π.

Conjecture: If the integral from 0 to π of f(θ)f(θ + π) dθ > 4, then R contains a nontrivial lattice point.

  • For symmetrical regions, f(θ) = f(θ + π), and the conjecture reduces to the fundamental theorem of Minkowski.
  • This conjecture remains unproven.

📊 The covering problem

Define M(n) as the smallest number such that any convex region of area M(n) can be placed to cover n lattice points.

nKnown result
1M(1) = 0
2M(2) = π/4 (any convex region exceeding the area of a circle of diameter 1 can cover 2 lattice points)
3Difficult to determine

Known bounds:

  • M(n) ≤ n − 1 (easy to prove)
  • Conjecture: There exists a positive constant c such that M(n) < n − c√n