Yet Another Introductory Number Theory Textbook (Cryptology Emphasis Version)


The Well-Ordering Principle and Mathematical Induction

1.1. The Well-Ordering Principle and Mathematical Induction

🧭 Overview

🧠 One-sentence thesis

The well-ordering principle, pigeonhole principle, and mathematical induction are three fundamental proof tools for establishing properties of integers, with induction relying on well-ordering to show that properties holding for one natural number propagate to all subsequent numbers.

📌 Key points (3–5)

  • Well-ordering principle: every non-empty set of natural numbers has a least element (smallest member).
  • Two forms of induction: the first principle uses a single previous case (k implies k+1), while the second uses all previous cases (1 through k imply k+1).
  • How induction works: prove a base case (usually n=1), then prove that if the statement holds for k, it must hold for k+1; this chain covers all natural numbers.
  • Common confusion: the two induction principles differ in their inductive step—don't confuse "assuming P(k)" with "assuming P(1) through P(k)."
  • Why these tools matter: they provide rigorous methods to prove statements about all integers without checking infinitely many cases individually.

🏗️ Foundational principles

🏗️ Well-ordering principle

Every non-empty set of natural numbers has a least element.

  • This means: if you have any collection of natural numbers (1, 2, 3, ...) that isn't empty, there is always a smallest one.
  • The excerpt notes this is "often taken as an axiom"—a starting assumption rather than something proved.
  • Example: the set {5, 17, 3, 100} has least element 3; the set of all even numbers has least element 2.

🕳️ Pigeonhole principle

If s objects are placed in k boxes where s > k, then at least one box contains more than one object.

  • Plain language: if you have more items than containers, at least one container must hold multiple items.
  • The proof uses contradiction: if no box had more than one object, there would be at most k objects total, contradicting s > k.
  • Example: if 10 students sit in 9 chairs, at least one chair must have 2 or more students.
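The principle can be checked by brute force for small cases. The sketch below (Python, not from the excerpt; `pigeonhole_holds` is an illustrative name) enumerates every possible assignment of objects to boxes and confirms that some box always receives more than one object:

```python
from collections import Counter
from itertools import product

def pigeonhole_holds(s, k):
    """Brute-force check: every way of placing s objects in k boxes
    (with s > k) puts more than one object in some box."""
    return all(max(Counter(boxes).values()) > 1
               for boxes in product(range(k), repeat=s))  # each object -> a box

assert pigeonhole_holds(4, 3)  # 4 objects, 3 boxes: all 81 assignments checked
assert pigeonhole_holds(5, 4)
```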

🔁 First principle of mathematical induction

🔁 The structure

The first principle has two required steps:

| Step | What you prove | Why it matters |
| --- | --- | --- |
| Base case | Property P(1) is true | Establishes a starting point |
| Inductive step | For any k, if P(k) is true then P(k+1) is true | Creates a chain: 1→2→3→... |

  • If both steps succeed, the property holds for all natural numbers.
  • The excerpt states it more formally: if a set S contains 1, and whenever S contains k it also contains k+1, then S equals all natural numbers.

🔗 How the proof works

  • The proof uses well-ordering by contradiction.
  • Assume some natural numbers are NOT in S; call this set T.
  • By well-ordering, T has a least element ℓ.
  • Since 1 is in S, we know ℓ > 1, so ℓ-1 is a natural number.
  • Since ℓ is the least element not in S, ℓ-1 must be in S.
  • But the inductive property says if ℓ-1 is in S, then (ℓ-1)+1 = ℓ must also be in S—contradiction!
  • Therefore T is empty and S contains all natural numbers.

📐 Example: sum formula

The excerpt proves that the sum 1 + 2 + ... + n equals n(n+1)/2.

Base case (n=1):

  • Left side: 1
  • Right side: 1·2/2 = 1
  • They match ✓

Inductive step:

  • Assume the formula holds for some n (this is the "inductive hypothesis").
  • Prove it for n+1: the sum 1 + 2 + ... + n + (n+1).
  • Split this as [1 + 2 + ... + n] + (n+1).
  • By the inductive hypothesis, the bracketed part equals n(n+1)/2.
  • So the total is n(n+1)/2 + (n+1) = (n+1)(n+2)/2, which is the formula for n+1 ✓
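A quick numeric spot-check of the closed form (a Python sketch, not part of the excerpt; induction, not testing, is what proves it for all n):

```python
# Verify 1 + 2 + ... + n == n(n+1)/2 for the first hundred cases.
def triangular_sum(n):
    return sum(range(1, n + 1))

for n in range(1, 101):
    assert triangular_sum(n) == n * (n + 1) // 2
```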

📈 Example: factorial inequality

The excerpt proves n! ≤ n^n for all natural numbers n (where n! means n factorial and n^n means n to the nth power).

Base case (n=1):

  • 1! = 1 and 1^1 = 1, so 1 ≤ 1 ✓

Inductive step:

  • Assume k! ≤ k^k for some k.
  • Prove (k+1)! ≤ (k+1)^(k+1).
  • Note (k+1)! = (k+1)·k! ≤ (k+1)·k^k by the inductive hypothesis.
  • Since k^k < (k+1)^k, we get (k+1)·k^k < (k+1)·(k+1)^k = (k+1)^(k+1) ✓
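The inequality can likewise be spot-checked numerically (a sketch for illustration; the induction above is the actual proof):

```python
import math

# Verify n! <= n^n for small n.
for n in range(1, 21):
    assert math.factorial(n) <= n ** n
```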

🔁🔁 Second principle of mathematical induction

🔁🔁 The stronger assumption

The second principle differs in the inductive step:

  • First principle: assume P(k) is true, prove P(k+1).
  • Second principle: assume P(1), P(2), ..., P(k) are ALL true, prove P(k+1).

The second form is "stronger" because you can use more information in your proof—not just the immediately previous case, but all previous cases.

🔗 Relationship to the first principle

  • The excerpt proves the second principle using the first principle.
  • Define a new property with a fresh name: "Q(n) means that P(1), P(2), ..., P(n) are all true."
  • Apply the first principle to Q: its inductive step, assuming only Q(k), is exactly the second principle's step of assuming P(1) through P(k).
  • Details are left to the reader, but the key idea is that the second principle is not fundamentally different—it's a consequence of the first.

🎯 When to use which

  • Use the first principle when the step from n to n+1 depends only on the case n.
  • Use the second principle when proving P(n+1) might require looking back at multiple earlier cases (e.g., P(n-1), P(n-2), etc.).
  • Don't confuse: both principles are valid; the second just gives you more flexibility in the inductive step.

Algebraic Operations with Integers

1.2. Algebraic Operations with Integers

🧭 Overview

🧠 One-sentence thesis

The integers support addition and multiplication with familiar properties (commutativity, associativity, distributivity), but while every integer has an additive inverse, only 1 has a multiplicative inverse, which constrains how subtraction and division are defined.

📌 Key points (3–5)

  • Two basic operations: addition (+) and multiplication (·) are binary operations on the integers, satisfying commutativity, associativity, and distributivity.
  • Identity elements: 0 is the additive identity and 1 is the multiplicative identity.
  • Additive vs multiplicative inverses: every integer has an additive inverse (−a), but only 1 has a multiplicative inverse in the integers.
  • Common confusion: subtraction is always defined for any two integers, but division is not a binary operation—it is defined only for specific pairs where one integer is a multiple of the other.
  • How subtraction and division are defined: subtraction is defined as adding the additive inverse; division a/b is defined as c only when a = b · c for some integer c.

🔢 The two basic operations and their properties

➕ Addition and multiplication

Addition (denoted by +) and multiplication (denoted by ·) are the two basic binary operations on the set of integers Z.

  • A binary operation takes two integers and produces another integer.
  • These operations satisfy three fundamental properties.

🔄 Commutativity

  • What it means: the order of operands does not matter.
  • For all integers a and b:
    • a + b = b + a
    • a · b = b · a
  • Example: 3 + 5 equals 5 + 3; 2 · 7 equals 7 · 2.

🔗 Associativity

  • What it means: when combining three integers, grouping does not matter.
  • For all integers a, b, c:
    • (a + b) + c = a + (b + c)
    • (a · b) · c = a · (b · c)
  • Example: (2 + 3) + 4 equals 2 + (3 + 4).

🌐 Distributivity

  • What it means: multiplication distributes over addition.
  • For all integers a, b, c:
    • a · (b + c) = a · b + a · c
  • Example: 2 · (3 + 4) equals 2 · 3 + 2 · 4.

🎯 Identity elements

🎯 Additive identity

0 is the identity element for addition.

  • For every integer a:
    • a + 0 = 0 + a = a
  • Adding 0 leaves the integer unchanged.

🎯 Multiplicative identity

1 is the identity element for multiplication.

  • For every integer a:
    • a · 1 = 1 · a = a
  • Multiplying by 1 leaves the integer unchanged.

↩️ Inverses: addition vs multiplication

↩️ Additive inverse (always exists)

For every integer a in Z, there exists another integer denoted by −a such that a + (−a) = 0.

  • Key point: every integer has an additive inverse.
  • The additive inverse "undoes" addition.
  • Example: the additive inverse of 5 is −5, because 5 + (−5) = 0.

↩️ Multiplicative inverse (only for 1)

An integer a has a multiplicative inverse (denoted a⁻¹ or 1/a) if there exists another integer such that a · a⁻¹ = 1.

  • Key restriction: only the integer 1 has a multiplicative inverse in Z (namely, 1 itself).
  • Other integers do not have integer multiplicative inverses.
  • Example: 2 has no integer b such that 2 · b = 1, because 1/2 is not an integer.
  • Don't confuse: this is about inverses within the integers; rational numbers allow more inverses, but the excerpt is restricted to Z.

➖ Derived operations: subtraction and division

➖ Subtraction (always a binary operation)

For all integers a, b in Z, subtraction a − b is defined to be a + (−b).

  • Subtraction is defined for any two integers.
  • It uses the additive inverse: subtracting b means adding the additive inverse of b.
  • Example: 7 − 3 is defined as 7 + (−3).

➗ Division (not always defined)

Given integers a, b in Z where b ≠ 0, division a/b is defined to be c if there exists an integer c such that a = b · c.

  • Key restriction: division is not a binary operation on Z.
  • It is defined only for specific pairs of integers.
  • The condition: a must be a multiple of b (i.e., a = b · c for some integer c).
  • Example: 12/3 is defined as 4 because 12 = 3 · 4; but 7/3 is not defined in Z because there is no integer c such that 7 = 3 · c.
  • Don't confuse: division in Z is not the same as division in rational numbers; here, the result must be an integer.
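This partial definition can be made concrete in code. The sketch below (Python; `int_divide` is an illustrative name, not from the excerpt) succeeds exactly when a is a multiple of b:

```python
# Division as a partial operation on Z: a/b is defined only when
# there is an integer c with a = b * c.
def int_divide(a, b):
    if b == 0 or a % b != 0:
        raise ValueError(f"{a}/{b} is not defined in Z")
    return a // b

assert int_divide(12, 3) == 4   # defined: 12 = 3 * 4
try:
    int_divide(7, 3)            # undefined: no integer c with 7 = 3 * c
except ValueError:
    pass
```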

🔍 Why division is restricted

  • Because only 1 has a multiplicative inverse in Z, most integers cannot be "divided out" to produce another integer.
  • Division a/b exists only when a is an exact multiple of b.
  • This restriction leads to the concept of divisibility, which the excerpt mentions will be discussed in the next section.

Divisibility and the Division Algorithm

1.3. Divisibility and the Division Algorithm

🧭 Overview

🧠 One-sentence thesis

The concept of divisibility formalizes when one integer is a multiple of another, and the Division Algorithm guarantees that any integer can be uniquely expressed as a quotient times a divisor plus a remainder.

📌 Key points (3–5)

  • What divisibility means: integer a divides integer b if there exists an integer k such that b equals k times a.
  • Key properties: divisibility is transitive (if a divides b and b divides c, then a divides c) and extends to linear combinations (if c divides both a and b, then c divides any combination ma + nb).
  • The Division Algorithm: for any integer a and positive integer b, there exist unique integers q (quotient) and r (remainder) such that a = qb + r with 0 ≤ r < b.
  • Common confusion: divisibility (a divides b) is not the same as division (a divided by b); divisibility requires the result to be an integer, while division may not be defined for all integer pairs.
  • Why it matters: divisibility underpins definitions like even/odd, and the Division Algorithm is foundational for number theory proofs and algorithms.

🔢 Understanding divisibility

🔢 Core definition

Divisibility: If a and b are integers such that a ≠ 0, then a divides b (written a | b) if there exists an integer k such that b = ka.

  • This is about whether b is an exact multiple of a.
  • Terminology: if a divides b, then a is a factor or divisor of b, and b is a multiple of a.
  • If a does not divide b, we write a ∤ b.
  • Example: 2 divides 4 (written 2 | 4) because 4 = 2 × 2; 7 divides 63 (7 | 63) because 63 = 7 × 9; but 5 does not divide 26 (5 ∤ 26).
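In code, the definition reduces to a remainder check (a Python sketch; `divides` is an illustrative name, not from the excerpt):

```python
# a | b  iff  there is an integer k with b = k * a  iff  b % a == 0 (a != 0).
def divides(a, b):
    return b % a == 0

assert divides(2, 4) and divides(7, 63)   # 4 = 2*2 and 63 = 7*9
assert not divides(5, 26)                 # 5 does not divide 26
```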

🔢 Even and odd integers

Even: an integer a is even if 2 | a, i.e., if there exists an integer k such that a = 2k.

Odd: an integer a is odd if 2 ∤ a.

  • The excerpt notes that if a is odd, then (as a consequence of the Division Algorithm) there exists an integer k such that a = 2k + 1.
  • Don't confuse: even means "divisible by 2," odd means "not divisible by 2" (and leaves remainder 1 when divided by 2).

🔢 Basic divisibility facts

The excerpt lists several propositions:

  • Every integer divides zero: for all a in the integers, a | 0 (because 0 = a × 0).
  • Absolute value constraint: if a is a positive integer and b is a nonzero integer with |b| < a, then a does not divide b.
  • Absolute value equivalence: given integers a and b, a divides b if and only if a divides the absolute value of b.

🔗 Properties of divisibility

🔗 Transitivity

Theorem: If a, b, and c are integers such that a | b and b | c, then a | c.

  • Why it works: Since a | b, there exists an integer k₁ such that b = k₁a. Since b | c, there exists an integer k₂ such that c = k₂b. Substituting gives c = k₂k₁a, so a | c.
  • Example: Since 6 | 18 and 18 | 36, then 6 | 36.

🔗 Linear combinations

Theorem: For all integers a, b, c, m, n, if c | a and c | b, then c divides (ma + nb).

  • Why it works: Since c | a and c | b, there exist integers k₁ and k₂ such that a = k₁c and b = k₂c. Then ma + nb = mk₁c + nk₂c = c(mk₁ + nk₂), so c divides the linear combination.
  • Generalization: This extends to any finite linear combination: if a divides b₁, b₂, ..., b_n, then a divides the sum of k_j times b_j for any integers k₁, ..., k_n.

📐 The Division Algorithm

📐 Statement of the theorem

The Division Algorithm: Given integers a and b such that b > 0, there exist unique integers q and r such that a = qb + r and 0 ≤ r < b.

  • q is called the quotient and r is called the remainder when a is divided by b.
  • Example: If a = 71 and b = 6, then 71 = 6 × 11 + 5, so q = 11 and r = 5.
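As it happens, Python's built-in `divmod` follows exactly this convention when b > 0: the remainder satisfies 0 ≤ r < b even for negative a (a quick illustration, not from the excerpt):

```python
# a = q*b + r with 0 <= r < b, per the Division Algorithm.
q, r = divmod(71, 6)
assert (q, r) == (11, 5)            # 71 = 6*11 + 5

q, r = divmod(-100, 13)             # floor division keeps r non-negative
assert -100 == q * 13 + r and 0 <= r < 13
```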

📐 Why it works (proof sketch)

The proof uses the well-ordering principle:

  1. Existence: Consider the set A = {a − bk : k is an integer and a − bk ≥ 0}. This set is non-empty (for k < a/b, a − bk > 0). By the well-ordering principle, A has a least element r = a − qb for some integer q. By construction, r ≥ 0. If r ≥ b, then r − b = a − (q + 1)b ≥ 0 would be a smaller element of A, contradicting the minimality of r. Thus 0 ≤ r < b.

  2. Uniqueness: Suppose a = q₁b + r₁ and a = q₂b + r₂ with 0 ≤ r₁ < b and 0 ≤ r₂ < b. Then (q₁ − q₂)b = r₂ − r₁, so b divides (r₂ − r₁). Since both remainders lie in [0, b), the absolute value of (r₂ − r₁) is less than b, and the only multiple of b in that range is 0. Hence r₂ = r₁, which forces q₁ = q₂.

📐 Relationship to division operation

  • Recall from the excerpt's background: division (a/b) is defined only when there exists an integer c such that a = b × c. In that case, a/b is defined to be c.
  • The Division Algorithm applies to any integer a and positive integer b, not just when b divides a.
  • When b divides a, the remainder r is zero and the quotient q equals a/b.
  • Don't confuse: divisibility is a yes/no property; the Division Algorithm always produces a quotient and remainder, even when divisibility fails.

🧮 Selected exercises and applications

The excerpt includes exercises that illustrate key applications:

  • Verifying divisibility: Show that 5 | 25, 19 | 38, and 2 | 98.
  • Using the Division Algorithm: Find quotient and remainder when 76 is divided by 13, or when −100 is divided by 13.
  • Products of divisors: If a | b and c | d, then ac | bd.
  • Ordering: If a and b are positive integers and a | b, then a ≤ b.
  • Even and odd sums/products: The sum of two even integers is even; the sum of two odd integers is even; the sum of an even and an odd integer is odd. The product of two even integers is even; the product of two odd integers is odd; the product of an even and an odd integer is even.
  • Divisibility patterns: Show that 3 divides m³ − m for any integer m; the square of every odd integer is of the form 8m + 1; the square of any integer is of the form 3m or 3m + 1 but not 3m + 2.
  • Cancellation and equality: If ac | bc, then a | b. If a | b and b | a, then a = ±b.
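Several of these exercise claims are easy to spot-check over a range of integers (a Python sketch, for illustration only; the exercises ask for proofs, not tests):

```python
# Spot-check three of the divisibility-pattern exercises.
for m in range(-50, 51):
    assert (m**3 - m) % 3 == 0     # 3 divides m^3 - m
    if m % 2 != 0:
        assert (m * m) % 8 == 1    # odd squares have the form 8k + 1
    assert (m * m) % 3 != 2        # squares are 3k or 3k + 1, never 3k + 2
```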

Representations of Integers in Different Bases

1.4. Representations of Integers in Different Bases

🧭 Overview

🧠 One-sentence thesis

Any positive integer can be uniquely expressed in any base greater than 1, and the proof establishes both the existence and uniqueness of this representation through repeated division.

📌 Key points (3–5)

  • Core theorem: Every natural number m can be written uniquely as a sum of powers of any base b > 1, with coefficients between 0 and b-1.
  • How to convert: Repeatedly divide by the base and collect remainders to convert from decimal to another base; multiply powers and sum to convert back to decimal.
  • Notation: An integer a written in base b is denoted (a)_b; the sequence of digits is written without commas (e.g., (21221)_3).
  • Common confusion: The coefficients must satisfy 0 ≤ a_j < b, not 0 ≤ a_j ≤ b; this constraint is essential for uniqueness.
  • Practical bases: Binary (base 2) uses only 0 and 1, making it ideal for computers; octal (base 8), hexadecimal (base 16), and sexagesimal (base 60) are also used in specific contexts.

🔢 The fundamental theorem

🔢 Existence of base representation

Theorem 1.4.1: Let b be an integer with b > 1. Then for every natural number m, there exists a natural number l and integers a_0, a_1, ..., a_l such that m = a_l · b^l + a_(l-1) · b^(l-1) + ... + a_1 · b + a_0, where 0 ≤ a_j < b for j = 0, 1, ..., l, and a_l ≠ 0.

How the proof works:

  • Start by dividing m by b to get m = q_0 · b + a_0, where 0 ≤ a_0 < b.
  • If q_0 ≠ 0, divide q_0 by b to get q_0 = q_1 · b + a_1, where 0 ≤ a_1 < b.
  • Continue this process: q_1 = q_2 · b + a_2, and so on.
  • The sequence q_0, q_1, q_2, ... is decreasing and non-negative, so it must eventually reach 0.
  • Substitute backwards to build the full expansion.

Why it works: Each division step extracts one digit (the remainder), and the quotients decrease until they reach zero, guaranteeing the process terminates.

🔒 Uniqueness of base representation

The proof shows that if two different expansions existed for the same number, they would lead to a contradiction:

  • Suppose m has two expansions with coefficients a_j and c_j.
  • Subtracting them gives (a_l - c_l) · b^l + ... + (a_0 - c_0) = 0.
  • If any coefficient differs (say a_j ≠ c_j), then b divides (a_j - c_j).
  • But since 0 ≤ a_j < b and 0 ≤ c_j < b, we have -(b-1) ≤ (a_j - c_j) ≤ (b-1).
  • The only multiple of b in this range is 0, so a_j = c_j—a contradiction.

Don't confuse: The constraint 0 ≤ a_j < b is strict on the upper end; a_j cannot equal b. This strictness is what guarantees uniqueness.

🔄 Converting between bases

🔄 Decimal to another base

Method: Repeatedly divide by the target base and collect remainders.

Example from the excerpt: To convert 214 to base 3:

  • 214 = 3 · 71 + 1 (remainder 1)
  • 71 = 3 · 23 + 2 (remainder 2)
  • 23 = 3 · 7 + 2 (remainder 2)
  • 7 = 3 · 2 + 1 (remainder 1)
  • 2 = 3 · 0 + 2 (remainder 2)

Reading the remainders from bottom to top: (214)_10 = (21221)_3.

Why this works: Each division extracts the next digit (from right to left) in the base expansion, and the process stops when the quotient becomes 0.
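The repeated-division procedure translates directly into a short loop (a Python sketch; `to_base` is an illustrative name, not from the excerpt):

```python
# Convert a natural number to base b by repeated division,
# collecting remainders (least significant digit first).
def to_base(m, b):
    digits = []
    while m > 0:
        m, r = divmod(m, b)
        digits.append(r)
    return digits[::-1] or [0]   # reverse: most significant digit first

assert to_base(214, 3) == [2, 1, 2, 2, 1]   # (214)_10 = (21221)_3
```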

🔄 Another base to decimal

Method: Multiply each digit by its corresponding power of the base and sum.

Example from the excerpt: To convert (364)_7 to decimal:

  • 4 · 7^0 + 6 · 7^1 + 3 · 7^2
  • = 4 + 42 + 147
  • = 193

So (364)_7 = (193)_10.

Why this works: This is simply evaluating the definition of the base expansion directly.
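The reverse direction is just evaluating the expansion; Horner's rule groups the powers so no exponentiation is needed (a Python sketch; `from_base` is an illustrative name):

```python
# Evaluate a base-b digit list, most significant digit first.
def from_base(digits, b):
    value = 0
    for d in digits:
        value = value * b + d    # Horner's rule: fold in one digit at a time
    return value

assert from_base([3, 6, 4], 7) == 193        # (364)_7 = (193)_10
assert from_base([2, 1, 2, 2, 1], 3) == 214  # (21221)_3 = (214)_10
```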

💻 Special bases and applications

💻 Binary (base 2)

  • Called binary representation.
  • Coefficients are only 0 or 1.
  • Why computers use it: Each digit can be represented by a wire with voltage (1) or no voltage (0).
  • The word "bit" is a contraction of "binary digit."

💻 Other common bases

| Base | Name | Digits used | Application |
| --- | --- | --- | --- |
| 2 | Binary | 0, 1 | Computers (fundamental) |
| 8 | Octal | 0–7 | Computer programming |
| 10 | Decimal | 0–9 | Daily life (ten fingers) |
| 16 | Hexadecimal (hex) | 0–9, A–F | Computer programming |
| 60 | Sexagesimal | 0–59 | Babylonians |

For bases ≥ 10: Use letters to represent digits beyond 9. For example:

  • 10 → A
  • 11 → B
  • 12 → C
  • etc.

Cultural note: The excerpt humorously suggests we use base 10 "probably only because we have ten fingers," and jokes that Babylonians must have had "30 fingers on each hand, or 15 on each hand and each foot" to use base 60.

💻 Notation conventions

Definition 1.4.2: The base b expression for m is the sequence of digits m_b = a_ℓ...a_1a_0.

  • The subscript indicates the base: (214)_10 means 214 in base 10.
  • Digits are written without separators: (21221)_3, not (2,1,2,2,1)_3.
  • The leftmost digit (a_ℓ) is the most significant; the rightmost (a_0) is the least significant.

The Greatest Common Divisor

1.5. The Greatest Common Divisor

🧭 Overview

🧠 One-sentence thesis

The greatest common divisor of two integers is not only the largest integer dividing both, but can always be expressed as a linear combination of those integers, which is the foundation for many number-theoretic results.

📌 Key points (3–5)

  • What the gcd is: the largest integer that divides both given integers (not both zero).
  • Key property—linear combination: the gcd of two integers can always be written as ma + nb for some integers m and n.
  • Invariance under addition: adding a multiple of one integer to the other does not change the gcd.
  • Common confusion: mutually relatively prime vs pairwise relatively prime—the first means gcd of all is 1; the second means every pair has gcd 1.
  • Scaling property: dividing both integers by their gcd produces relatively prime integers.

🔢 Definition and basic properties

🔢 What the greatest common divisor is

Greatest common divisor: Given integers a and b (not both zero), the greatest common divisor is the largest integer that divides both a and b, written gcd(a, b) or sometimes (a, b).

  • Two integers not both zero have only finitely many divisors, so only finitely many common divisors exist.
  • The gcd is the largest of these common divisors.
  • By convention, gcd(0, 0) = 0 to simplify some formulas.
  • Example: gcd(24, 18) = 6 because 6 is the largest integer dividing both 24 and 18.

➕ Sign does not matter

  • Every integer has both positive and negative divisors: if a divides m, then −a also divides m.
  • Therefore: gcd(a, b) = gcd(|a|, |b|).
  • The gcd is always considered as a positive integer (or zero in the special case).

🤝 Relatively prime integers

Relatively prime: Integers a and b are relatively prime if gcd(a, b) = 1.

  • Example: gcd(9, 16) = 1, so 9 and 16 are relatively prime.
  • This means the only common divisor is 1.

🔧 Key theorems about the gcd

🔧 Dividing by the gcd produces relatively prime integers

Theorem 1.5.5: If integers a and b have gcd(a, b) = d, then gcd(a/d, b/d) = 1.

  • Why this works: Suppose some natural number k divides both a/d and b/d. Then a/d = km and b/d = kn for some natural numbers m and n. This means a = kmd and b = knd, so kd is a common divisor of a and b. Since kd ≥ d and d is the greatest common divisor, we must have k = 1.
  • Implication: You can always "reduce" two integers to relatively prime integers by dividing out their gcd.
  • Example: gcd(24, 18) = 6, so gcd(24/6, 18/6) = gcd(4, 3) = 1.

➕ Adding multiples does not change the gcd

Theorem 1.5.6: For integers a, b, c, we have gcd(a, b) = gcd(a + cb, b).

  • Why this works: Every common divisor of a and b also divides a + cb (by Theorem 1.3.9). Conversely, every common divisor of a + cb and b also divides (a + cb) − cb = a. Therefore a and b have exactly the same common divisors as a + cb and b, so their greatest common divisors are equal.
  • Example: gcd(4, 14) = gcd(4, 14 − 3·4) = gcd(4, 2) = 2.
  • Why it matters: This property is the basis for the Euclidean algorithm (covered in the next section).
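The invariance is easy to confirm numerically with the standard library's `math.gcd` (a quick check for illustration, not a proof):

```python
import math

# Spot-check Theorem 1.5.6: gcd(a, b) = gcd(a + c*b, b).
for a in range(1, 25):
    for b in range(1, 25):
        for c in (-3, -1, 2, 5):
            assert math.gcd(a, b) == math.gcd(a + c * b, b)
```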

🧮 The gcd as a linear combination

Theorem 1.5.8: Let a and b be integers, not both zero. Then gcd(a, b) is the smallest natural number that can be written as d = ma + nb for some integers m and n.

Proof outline:

  • Consider the set S of all positive integers of the form ma + nb for integers m and n.
  • S is non-empty because a = 1·a + 0·b and b = 0·a + 1·b are both in S.
  • By the well-ordering principle, S has a least element d = ma + nb for some integers m and n.
  • Using the division algorithm, write a = qd + r with 0 ≤ r < d. Then r = a − qd = a − q(ma + nb) = (1 − qm)a − qnb, so r is also a linear combination of a and b.
  • Since d is the smallest positive linear combination and 0 ≤ r < d, we must have r = 0, so d divides a.
  • The same argument shows d divides b.
  • If any integer c divides both a and b, then c divides every linear combination of a and b (by Theorem 1.3.9), so c divides d. Therefore c ≤ d.
  • Hence d is the greatest common divisor.

Corollary 1.5.9: If a and b are relatively prime, then there exist integers m and n such that ma + nb = 1.

  • This follows immediately from Theorem 1.5.8 with gcd(a, b) = 1.
  • Why it matters: This result is fundamental in modular arithmetic and solving linear Diophantine equations.

🔀 Multiple integers and types of relative primality

🔀 Greatest common divisor of more than two integers

Greatest common divisor (multiple integers): For natural number n, let a₁, a₂, ..., aₙ be integers not all zero. The greatest common divisor is the largest integer that divides all of them, denoted gcd(a₁, ..., aₙ).

🤝 Mutually relatively prime

Mutually relatively prime: For natural number n, integers a₁, a₂, ..., aₙ are mutually relatively prime if gcd(a₁, a₂, ..., aₙ) = 1.

  • Example: The integers 3, 6, 7 are mutually relatively prime because gcd(3, 6, 7) = 1, even though gcd(3, 6) = 3.
  • Don't confuse: This does NOT mean every pair has gcd 1; it only means the gcd of all of them together is 1.

👥 Pairwise relatively prime

Pairwise relatively prime: For natural number n, integers a₁, a₂, ..., aₙ are pairwise relatively prime if for all i, j ≤ n with i ≠ j, we have gcd(aᵢ, aⱼ) = 1.

  • Example: The integers 3, 14, 25 are pairwise relatively prime (and also mutually relatively prime).
  • Comparison:
| Property | Condition | Example |
| --- | --- | --- |
| Mutually relatively prime | gcd(all together) = 1 | 3, 6, 7: gcd(3,6,7)=1 but gcd(3,6)=3 |
| Pairwise relatively prime | Every pair has gcd = 1 | 3, 14, 25: gcd(3,14)=1, gcd(3,25)=1, gcd(14,25)=1 |

Proposition 1.5.15: If integers a₁, a₂, ..., aₙ are pairwise relatively prime, then they are mutually relatively prime.

  • Direction matters: Pairwise relatively prime implies mutually relatively prime, but the converse is false (as shown by the example 3, 6, 7).

The Euclidean Algorithm

1.6. The Euclidean Algorithm

🧭 Overview

🧠 One-sentence thesis

The Euclidean algorithm provides a systematic method to find the greatest common divisor of two integers by repeatedly applying division and tracking remainders until reaching zero.

📌 Key points (3–5)

  • Core mechanism: the gcd of two integers equals the gcd of the smaller integer and the remainder from division—this allows successive reduction.
  • Termination guarantee: remainders decrease at each step and are non-negative integers, so the process must eventually reach zero.
  • Extended version: the algorithm can also express the gcd as a linear combination of the original two integers using auxiliary sequences s_j and t_j.
  • Common confusion: "mutually relatively prime" vs "pairwise relatively prime"—the former only requires gcd of all together = 1; the latter requires gcd = 1 for every pair.
  • Practical outcome: the last non-zero remainder in the sequence is the gcd.

🔑 Foundational lemma

🔑 The key substitution property

Lemma 1.6.1: If a, b, q, r are integers and a = qb + r, then gcd(a, b) = gcd(r, b).

  • This lemma is the heart of why the algorithm works.
  • It says you can replace the larger number with the remainder without changing the gcd.
  • Why it matters: each division step produces a smaller remainder, and the gcd stays the same throughout.
  • Example: if you divide a by b and get remainder r, finding gcd(a, b) is equivalent to finding gcd(b, r).

🔄 The algorithm structure

🔄 Successive division steps

The algorithm starts with two natural numbers a ≥ b and performs repeated divisions:

  • Set r₀ = a and r₁ = b.
  • Apply division algorithm: r_j = r_{j+1} · q_{j+1} + r_{j+2} where 0 ≤ r_{j+2} < r_{j+1}.
  • Continue until some remainder r_{n+1} = 0.
  • The last non-zero remainder r_n is the gcd.

Concrete sequence:

  • r₀ = q₁ · r₁ + r₂
  • r₁ = q₂ · r₂ + r₃
  • r₂ = q₃ · r₃ + r₄
  • ...
  • r_{n-1} = q_n · r_n (remainder is zero)

⏹️ Why the algorithm terminates

  • Every new remainder is strictly smaller than the previous one: r_{j+2} < r_{j+1}.
  • All remainders are non-negative integers.
  • A strictly decreasing sequence of non-negative integers must reach zero in finitely many steps.

🧮 Worked example

To find gcd(4147, 10672):

  • 10672 = 4147 · 2 + 2378
  • 4147 = 2378 · 1 + 1769
  • 2378 = 1769 · 1 + 609
  • 1769 = 609 · 2 + 551
  • 609 = 551 · 1 + 58
  • 551 = 58 · 9 + 29
  • 58 = 29 · 2 (remainder 0)

Result: gcd(4147, 10672) = 29 (the last non-zero remainder).
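The chain of divisions above collapses into a two-line loop (a Python sketch; `euclid` is an illustrative name, not from the excerpt):

```python
# Repeatedly replace (a, b) by (b, a mod b); the last non-zero value is the gcd.
def euclid(a, b):
    while b != 0:
        a, b = b, a % b
    return a

assert euclid(10672, 4147) == 29
assert euclid(4147, 10672) == 29   # order doesn't matter: the first step swaps them
```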

🔧 Extended Euclidean Algorithm

🔧 Auxiliary sequences s_j and t_j

The extended version tracks additional sequences:

  • Initialize: s₀ = 1, s₁ = 0, t₀ = 0, t₁ = 1.
  • Recurrence: s_{j+1} = s_{j-1} - q_{j+1} · s_j and t_{j+1} = t_{j-1} - q_{j+1} · t_j.
  • These coefficients allow expressing the gcd as a linear combination.

🎯 Linear combination result

Theorem 1.6.2 (extended part): gcd(a, b) = r_n = s_{n+1} · a + t_{n+1} · b.

  • The gcd can be written as an integer linear combination of the original two numbers.
  • The excerpt notes that the proof of this linear-combination claim is left as an exercise.
  • Don't confuse: the basic Euclidean Algorithm only finds the gcd; the extended version also finds the coefficients s and t.
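The recurrences for s_j and t_j can be carried along inside the same loop. This is a sketch of that idea in Python (an iterative rendering for illustration, not the excerpt's exact indexing): at every step the current remainder equals s·a + t·b, so when the loop ends the gcd comes with its coefficients.

```python
# Extended Euclidean algorithm: returns (g, s, t) with g = gcd(a, b) = s*a + t*b.
def extended_gcd(a, b):
    s0, s1 = 1, 0      # running coefficients of a
    t0, t1 = 0, 1      # running coefficients of b
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        s0, s1 = s1, s0 - q * s1   # same recurrence as the s_j sequence
        t0, t1 = t1, t0 - q * t1   # same recurrence as the t_j sequence
    return a, s0, t0

g, s, t = extended_gcd(4147, 10672)
assert g == 29 and s * 4147 + t * 10672 == 29
```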

🔀 Relatively prime concepts (background)

🔀 Mutually vs pairwise relatively prime

The excerpt includes definitions that clarify common confusion:

| Term | Definition | Example from excerpt |
| --- | --- | --- |
| Mutually relatively prime | gcd(a₁, a₂, ..., a_n) = 1 (gcd of all together) | 3, 6, 7 are mutually relatively prime since gcd(3,6,7)=1, even though gcd(3,6)=3 |
| Pairwise relatively prime | gcd(a_i, a_j) = 1 for every pair i ≠ j | 3, 14, 25 are pairwise relatively prime (every pair has gcd=1) |

➡️ Implication direction

Proposition 1.6.15: If integers are pairwise relatively prime, then they are mutually relatively prime.

  • The reverse is not true: mutually relatively prime does not imply pairwise relatively prime.
  • Example: 3, 6, 7 are mutually relatively prime but not pairwise (since gcd(3,6)=3 ≠ 1).

Introduction to Congruences

2.1. Introduction to Congruences

🧭 Overview

🧠 One-sentence thesis

Congruences, introduced by Gauss, provide a powerful framework for reasoning about divisibility that behaves much like ordinary equality but with important differences when dividing.

📌 Key points (3–5)

  • What a congruence is: a statement that n divides the difference (a − b), written a ≡ b (mod n).
  • How congruences behave like equality: they satisfy symmetry, transitivity, and can be added, subtracted, and multiplied on both sides.
  • Key difference from equality: dividing both sides by the same number does not always preserve the congruence—it depends on the greatest common divisor.
  • Common confusion: when canceling/dividing, you must check gcd(a, n); if gcd(a, n) = 1, you can cancel a from both sides safely, but otherwise the modulus changes to n/d.
  • Technical tools: Euclid's Lemma and counting solutions modulo n provide essential techniques for working with congruences.

🔤 Definition and basic examples

🔤 What congruence means

Definition 2.1.1: Given a, b ∈ Z and n ∈ N, we say that a is congruent to b modulo n if n | (a − b), i.e., if there exists k ∈ Z such that a = b + kn. We write a ≡ b (mod n).

  • In plain language: a and b are congruent modulo n when their difference is a multiple of n.
  • Equivalently: a and b leave the same remainder when divided by n.
  • Example: 19 ≡ 5 (mod 7) because 19 − 5 = 14 = 2 · 7.
  • Example: every odd number 2k + 1 is congruent to 1 modulo 2, because (2k + 1) − 1 = 2k is divisible by 2.

🔍 How to check congruence

  • Check whether n divides (a − b).
  • Or: write a = b + kn for some integer k.
  • Don't confuse: congruence is not equality; 19 ≠ 5, but 19 ≡ 5 (mod 7).
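Both checks in the list above translate directly to code; a minimal Python sketch (the helper name `is_congruent` is ours, not from the text):

```python
def is_congruent(a: int, b: int, n: int) -> bool:
    """True when n divides (a - b), i.e. a ≡ b (mod n)."""
    return (a - b) % n == 0

# 19 ≡ 5 (mod 7) because 19 - 5 = 14 = 2 * 7
print(is_congruent(19, 5, 7))   # True
print(19 == 5)                  # False: congruence is not equality
```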

⚖️ Properties that mirror equality

⚖️ Symmetry and transitivity (Theorem 2.1.3, parts 1–2)

| Property | Statement | Why it works |
| --- | --- | --- |
| Symmetry | If a ≡ b (mod n), then b ≡ a (mod n) | If n divides (a − b), then n divides (b − a) = −(a − b) |
| Transitivity | If a ≡ b (mod n) and b ≡ c (mod n), then a ≡ c (mod n) | If a = b + kn and b = c + ln, then a = c + (k + l)n |

  • These properties mean congruence is an equivalence relation.
  • Example: 22 ≡ 10 (mod 6) and 10 ≡ 4 (mod 6), so 22 ≡ 4 (mod 6).

➕ Adding and subtracting (Theorem 2.1.3, parts 3–4, 7–8)

  • Adding the same number to both sides: If a ≡ b (mod n), then a + c ≡ b + c (mod n).
    • Proof idea: n | (a − b) implies n | ((a + c) − (b + c)).
    • Example: 50 ≡ 20 (mod 15), so 50 + 5 = 55 ≡ 20 + 5 = 25 (mod 15).
  • Subtracting the same number: If a ≡ b (mod n), then a − c ≡ b − c (mod n).
    • Example: 50 ≡ 20 (mod 15), so 50 − 5 = 45 ≡ 20 − 5 = 15 (mod 15).
  • Adding two congruences: If a ≡ b (mod n) and c ≡ d (mod n), then a + c ≡ b + d (mod n).
    • Proof idea: (a − b) + (c − d) = (a + c) − (b + d) is divisible by n.
    • Example: 19 ≡ 3 (mod 8) and 17 ≡ 9 (mod 8), so 19 + 17 = 36 ≡ 3 + 9 = 12 (mod 8).
  • Subtracting two congruences: If a ≡ b (mod n) and c ≡ d (mod n), then a − c ≡ b − d (mod n).
    • Example: 19 ≡ 3 (mod 8) and 17 ≡ 9 (mod 8), so 19 − 17 = 2 ≡ 3 − 9 = −6 (mod 8).

✖️ Multiplying (Theorem 2.1.3, parts 5–6, 9)

  • Multiplying both sides by the same number: If a ≡ b (mod n), then ac ≡ bc (mod n).
    • Proof idea: if a − b = kn, then ac − bc = (kc)n.
    • Example: 19 ≡ 16 (mod 3), so 2(19) = 38 ≡ 2(16) = 32 (mod 3).
  • Multiplying both sides and the modulus: If c > 0 and a ≡ b (mod n), then ac ≡ bc (mod nc).
    • Proof idea: if a − b = kn, then ac − bc = (kc)n = k(nc), so nc divides the difference.
    • Example: 19 ≡ 16 (mod 3), so 38 ≡ 32 (mod 6).
  • Multiplying two congruences: If a ≡ b (mod n) and c ≡ d (mod n), then ac ≡ bd (mod n).
    • Proof idea: ca − cb = (ck)n and bc − bd = (bl)n, so ac − bd = (kc + bl)n.
    • Example: 19 ≡ 3 (mod 8) and 17 ≡ 9 (mod 8), so 19 · 17 = 323 ≡ 3 · 9 = 27 (mod 8).

⚠️ The key difference: dividing/canceling

⚠️ When you can cancel (Theorem 2.1.7)

Theorem 2.1.7: Given a, b, c ∈ Z and n ∈ N, define d = gcd(a, n). If ab ≡ ac (mod n), then b ≡ c (mod n/d).

  • In plain language: you can "cancel" a from both sides, but the modulus may shrink from n to n/d.
  • Special case: If gcd(a, n) = 1, then ab ≡ ac (mod n) if and only if b ≡ c (mod n).
    • When a and n are relatively prime, you can cancel a without changing the modulus.
    • Example: 38 ≡ 10 (mod 7). Since gcd(2, 7) = 1, we have 19 ≡ 5 (mod 7).
  • Why it works: If ab ≡ ac (mod n), then n | a(b − c). Dividing by d gives (n/d) | (a/d)(b − c). Since gcd(a/d, n/d) = 1, Euclid's Lemma tells us (n/d) | (b − c).

🚫 Don't confuse with ordinary division

  • In ordinary equations, dividing both sides by a nonzero number preserves equality.
  • In congruences, dividing both sides by the same integer does not necessarily preserve the congruence unless you adjust the modulus.
  • Example scenario: If 6x ≡ 6y (mod 9), you cannot simply conclude x ≡ y (mod 9). Instead, since gcd(6, 9) = 3, you get x ≡ y (mod 9/3), i.e., x ≡ y (mod 3).
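A quick numeric check of this scenario, using only the arithmetic from the theorem (the representative values x = 1, y = 4 are our choice):

```python
from math import gcd

# Theorem 2.1.7 in action: 6*1 ≡ 6*4 (mod 9), yet 1 ≢ 4 (mod 9).
a, n = 6, 9
x, y = 1, 4
assert (a * x - a * y) % n == 0   # 6x ≡ 6y (mod 9)
assert (x - y) % n != 0           # but x ≢ y (mod 9)
d = gcd(a, n)                     # d = 3, so cancel only modulo n // d
assert (x - y) % (n // d) == 0    # x ≡ y (mod 3), as the theorem promises
```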

🔧 Technical tools

🔧 Euclid's Lemma (Lemma 2.1.6)

Euclid's Lemma: Given x, y, z ∈ Z, if x | yz and gcd(x, y) = 1, then x | z.

  • In plain language: if x divides a product yz and x shares no common factor with y, then x must divide z.
  • Proof idea: Since gcd(x, y) = 1, there exist m, n such that mx + ny = 1. Multiply by z: mxz + nyz = z. Since x divides both mxz and nyz, x divides their sum z.
  • Why it matters: this lemma is used repeatedly in proofs about congruences, especially when canceling factors.
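The Bézout step in the proof idea can be computed explicitly with the extended Euclidean algorithm; a sketch in Python (the helper name `ext_gcd` is illustrative, not from the text):

```python
def ext_gcd(a: int, b: int):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b = g."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = ext_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

# Euclid's Lemma: x | yz and gcd(x, y) = 1 force x | z.
x, y, z = 5, 6, 10           # 5 | 60 and gcd(5, 6) = 1, so 5 | 10
g, m, n = ext_gcd(x, y)      # m*x + n*y = 1
assert g == 1
# multiply Bézout's identity by z: m*x*z + n*(y*z) = z; both terms are divisible by x
assert m * x * z + n * y * z == z
assert z % x == 0
```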

🔧 Divisibility by coprime factors (Theorem 2.1.4)

Theorem 2.1.4: Given a, b, c ∈ Z, if a | c, b | c, and a and b are relatively prime, then ab | c.

  • In plain language: if two coprime numbers both divide c, then their product divides c.
  • Proof idea: Write c = pa and c = qb. Since gcd(a, b) = 1, there exist m, n with ma + nb = 1. Then c = c(ma + nb) = mca + ncb = mqba + npab = (mq + np)ab.
  • Example scenario: If 3 divides c and 5 divides c, and gcd(3, 5) = 1, then 15 divides c.

🔧 Counting solutions modulo n (Theorem 2.1.9)

Theorem 2.1.9: Given n, d ∈ N such that d | n, there are exactly d values x ∈ Z, up to congruence modulo n, satisfying x ≡ 0 (mod n/d).

  • In plain language: the number of distinct solutions modulo n to x ≡ 0 (mod n/d) is exactly d.
  • The d solutions are x₀ = 0, x₁ = n/d, x₂ = 2(n/d), ..., x_(d−1) = (d − 1)(n/d).
  • Proof idea: Any solution x is a multiple of n/d. Use the Division Algorithm: x = qn + r with 0 ≤ r < n. Then r is also a multiple of n/d in the range [0, n), so r must be one of the d values listed above.
  • Why it matters: this result is useful for understanding the structure of solutions to congruences.
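A brute-force count confirming Theorem 2.1.9 for a small case of our choosing, n = 12 and d = 4:

```python
n, d = 12, 4                 # d | n, so n // d = 3
step = n // d
# residues modulo n satisfying x ≡ 0 (mod n/d)
solutions = [x for x in range(n) if x % step == 0]
print(solutions)             # [0, 3, 6, 9] — exactly d = 4 of them
assert len(solutions) == d
assert solutions == [j * step for j in range(d)]
```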

📝 Linear congruences (preview)

📝 What a linear congruence is

Definition 2.2.1: Given constants a, b ∈ Z and n ∈ N, a congruence of the form ax ≡ b (mod n) where x ∈ Z is unknown is called a linear congruence in one variable.

  • This is the congruence analogue of the linear equation ax = b.
  • Example: 3x ≡ 7 (mod 10) is a linear congruence.

📝 Infinitely many solutions (Theorem 2.2.2)

  • If x is a solution to ax ≡ b (mod n), then any x′ ≡ x (mod n) is also a solution.
  • In other words: if you find one solution, you automatically have infinitely many (all differing by multiples of n).
  • Proof idea: If ax ≡ b (mod n) and x′ ≡ x (mod n), then ax′ ≡ ax ≡ b (mod n).

📝 Connection to Diophantine equations

Definition 2.2.3: An algebraic equation whose constants and variables are all integers is called a Diophantine equation.

  • The linear congruence ax ≡ b (mod n) is equivalent to the linear Diophantine equation ax − ny = b in two unknowns x and y.
  • Historically, before Gauss introduced congruence notation, people worked directly with Diophantine equations.

📝 Existence of solutions (Theorem 2.2.4, partial)

  • The excerpt states that the characterization depends on d = gcd(a, n):
    • If d does not divide b, then the congruence has no solutions.
    • If d divides b, then... (the excerpt cuts off here).
  • This preview indicates that the gcd plays a crucial role in determining whether solutions exist.

Linear Congruences

2.2. Linear Congruences

🧭 Overview

🧠 One-sentence thesis

Linear congruences—equations of the form ax ≡ b (mod n)—have either no solutions or exactly d solutions modulo n (where d = gcd(a, n)), and when a and n are relatively prime, the solution is unique and defines a modular inverse.

📌 Key points (3–5)

  • What a linear congruence is: an equation ax ≡ b (mod n) where x is unknown, analogous to linear equations in algebra but using congruence instead of equality.
  • When solutions exist: the congruence has solutions if and only if gcd(a, n) divides b; otherwise there are no solutions.
  • How many solutions: if d = gcd(a, n) divides b, there are exactly d solutions that are distinct modulo n.
  • Common confusion: if one solution exists, infinitely many exist—but only d of them are distinct modulo n (the rest are congruent to one of these d).
  • Special case—modular inverses: when gcd(a, n) = 1, the congruence ax ≡ 1 (mod n) has a unique solution called the inverse of a modulo n, denoted a⁻¹.

🔍 What linear congruences are

🔍 Definition and analogy to algebra

Linear congruence in one variable: Given constants a, b ∈ Z and n ∈ N, a congruence of the form ax ≡ b (mod n) where x ∈ Z is unknown.

  • Because congruence is analogous to equality, linear congruences are the natural analogue of linear equations in algebra.
  • Instead of solving ax = b for equality, we solve ax ≡ b (mod n) for congruence.
  • The excerpt notes that historically, before Gauss, mathematicians worked with Diophantine equations (algebraic equations with integer constants and variables); the modern linear congruence ax ≡ b (mod n) is equivalent to the linear Diophantine equation ax - ny = b in two unknowns x and y.

♾️ Infinitely many solutions from one

Theorem 2.2.2: Given constants a, b ∈ Z, n ∈ N, and a solution x ∈ Z to the linear congruence ax ≡ b (mod n), any other x′ ∈ Z satisfying x′ ≡ x (mod n) is also a solution of the same congruence.

  • If you find one solution x, then x + n, x + 2n, x + 3n, ... are all solutions.
  • This means: if a linear congruence has one solution, it has infinitely many.
  • Don't confuse: "infinitely many" does not mean "infinitely many distinct modulo n"—many of these solutions are congruent to each other modulo n.

🧮 Existence and number of solutions

🧮 The main characterization theorem

Theorem 2.2.4: Let a, b ∈ Z and n ∈ N and consider the linear congruence ax ≡ b (mod n). Setting d = gcd(a, n), we have:

  1. If d does not divide b, then the congruence has no solutions.
  2. If d divides b, then the congruence has exactly d solutions which are distinct modulo n.
  • The key quantity is d = gcd(a, n).
  • Existence condition: solutions exist if and only if d | b.
  • Count: when solutions exist, there are exactly d of them that are distinct modulo n.

🔧 Why this works (proof sketch)

Part 1 (no solutions when d ∤ b):

  • The proof uses the contrapositive: assume a solution exists, meaning there exist x, k ∈ Z such that ax - b = kn, or ax - kn = b.
  • Since d = gcd(a, n) divides both a and n, it must divide any linear combination ax - kn = b.
  • Hence d | b. Contrapositive: if d does not divide b, no solution exists.

Part 2 (d solutions when d | b):

  • Assume d | b, so there exists k ∈ Z such that kd = b.
  • From Theorem 1.5.8 (Bézout's identity), there exist p, q ∈ Z such that d = pa + qn.
  • Multiply by k: kpa + kqn = kd = b, or a(kp) - b = (-kq)n.
  • Hence n | a(kp) - b, i.e., a(kp) ≡ b (mod n), so x = kp is one solution.
  • Let y be any other solution. Then ax ≡ b ≡ ay (mod n), so x ≡ y (mod n/d) (by Theorem 2.1.7, part 2).
  • Setting δ = y - x, we have δ ≡ 0 (mod n/d).
  • By Theorem 2.1.9, there are exactly d possibilities for δ modulo n.
  • Thus there are d solutions of the form x + δ.

📋 Example: solving 3x ≡ 12 (mod 6)

Example 2.2.6:

  • Find all solutions of 3x ≡ 12 (mod 6).
  • Notice gcd(3, 6) = 3 and 3 | 12, so there are three incongruent solutions modulo 6.
  • Using the Euclidean Algorithm to solve 3x - 6y = 12, we get a solution x₀ = 6.
  • The three incongruent (modulo 6) solutions are:
    • x₀ = 6 ≡ 0 (mod 6)
    • x₁ = 6 + 2 = 8 ≡ 2 (mod 6)
    • x₂ = 6 + 4 = 10 ≡ 4 (mod 6)
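The procedure of Example 2.2.6 generalizes to a small solver; a hedged Python sketch (the function name is ours, and it uses the built-in modular inverse `pow(a, -1, n)` from Python 3.8+ instead of the hand-worked Euclidean Algorithm in the text):

```python
from math import gcd

def solve_linear_congruence(a: int, b: int, n: int):
    """All solutions of a*x ≡ b (mod n) distinct modulo n (Theorem 2.2.4)."""
    d = gcd(a, n)
    if b % d != 0:
        return []                         # no solutions when d does not divide b
    # one solution via the inverse of a/d modulo n/d, then d shifted copies
    a0, n0 = a // d, n // d
    b0 = (b // d) % n0
    x0 = (pow(a0, -1, n0) * b0) % n0      # gcd(a0, n0) = 1, so the inverse exists
    return [(x0 + k * n0) % n for k in range(d)]

print(solve_linear_congruence(3, 12, 6))  # [0, 2, 4]
```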

🔑 Modular inverses

🔑 The special case when gcd(a, n) = 1

Remark 2.2.5: If a ∈ Z and n ∈ N are relatively prime, then for all b ∈ Z there is a unique solution modulo n to the equation ax ≡ b (mod n).

  • When gcd(a, n) = 1, the number of solutions d = 1, so the solution is unique modulo n.
  • This special case is important enough to deserve its own definition.

🔑 Definition of modular inverse

Inverse of a modulo n: Given a ∈ Z and n ∈ N with gcd(a, n) = 1, a solution to the congruence ax ≡ 1 (mod n) is called the inverse of a modulo n, denoted a⁻¹ (with n understood from context).

Corollary 2.2.8: Given a ∈ Z and n ∈ N which are relatively prime, the modular inverse a⁻¹ exists and is unique modulo n.

  • This follows directly from Remark 2.2.5 with b = 1.
  • The inverse is unique modulo n, meaning there is exactly one residue class modulo n that serves as the inverse.

🔑 Example and property of inverses

Example 2.2.9: The modular inverse 7⁻¹ of 7 modulo 48 is 7. A solution of 7x ≡ 1 (mod 48) is x ≡ 7 (mod 48).

Exercise 2.2.4 (property of inverses): Given a ∈ Z and n ∈ N, if a⁻¹ is the inverse of a modulo n and b⁻¹ is the inverse of b modulo n, then a⁻¹b⁻¹ is the inverse of ab modulo n.

  • This shows that modular inverses behave like reciprocals: the inverse of a product is the product of the inverses.
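A minimal sketch of computing inverses and checking the product property (helper name ours; `pow(a, -1, n)` requires Python 3.8+):

```python
from math import gcd

def mod_inverse(a: int, n: int) -> int:
    """Inverse of a modulo n; exists iff gcd(a, n) = 1 (Corollary 2.2.8)."""
    if gcd(a, n) != 1:
        raise ValueError("no inverse: gcd(a, n) != 1")
    return pow(a, -1, n)

print(mod_inverse(7, 48))   # 7, since 7 * 7 = 49 ≡ 1 (mod 48)

# Exercise 2.2.4: the inverse of a product is the product of the inverses
a, b, n = 5, 11, 48
lhs = (mod_inverse(a, n) * mod_inverse(b, n)) % n
assert (lhs * a * b) % n == 1   # so lhs is the inverse of ab modulo n
```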

📊 Summary table

| Condition | Number of solutions modulo n | Notes |
| --- | --- | --- |
| d = gcd(a, n) does not divide b | 0 | No solutions exist |
| d = gcd(a, n) divides b | exactly d | d distinct solutions modulo n |
| gcd(a, n) = 1 (relatively prime) | exactly 1 | Unique solution; defines the modular inverse when b = 1 |

The Chinese Remainder Theorem

2.3. The Chinese Remainder Theorem

🧭 Overview

🧠 One-sentence thesis

The Chinese Remainder Theorem guarantees that a system of congruences with pairwise relatively prime moduli has exactly one solution (modulo the product of all moduli), and provides a constructive method to find it.

📌 Key points (3–5)

  • What the theorem solves: systems of congruences with different moduli (e.g., find a number leaving remainder 1 mod 2, remainder 2 mod 3, remainder 3 mod 5).
  • When a unique solution exists: the moduli must be pairwise relatively prime (any two share no common factors).
  • How the solution is unique: the solution is unique modulo N, where N is the product of all moduli.
  • Common confusion: the solution is not unique as an integer—many integers satisfy the system—but all solutions are congruent modulo N (they differ by multiples of N).
  • Construction method: the proof provides an explicit formula using modular inverses to build the solution.

🔑 The theorem statement

🔑 What system it solves

The Chinese Remainder Theorem addresses systems like:

  • x ≡ b₁ (mod n₁)
  • x ≡ b₂ (mod n₂)
  • ...
  • x ≡ bₖ (mod nₖ)

where you are given k integers b₁, ..., bₖ and k natural numbers n₁, ..., nₖ.

Example scenario: Find a number that leaves remainder 1 when divided by 2, remainder 2 when divided by 3, and remainder 3 when divided by 5.

✅ Conditions for a solution

The system has a solution x if the moduli n₁, n₂, ..., nₖ are pairwise relatively prime.

  • Pairwise relatively prime means gcd(nᵢ, nⱼ) = 1 for every pair i ≠ j.
  • This is stronger than just saying the gcd of all moduli together is 1.
  • Without this condition, the theorem does not guarantee a solution.

🎯 Uniqueness of the solution

  • The solution exists as an integer x.
  • The solution is unique modulo N, where N = n₁ · n₂ · ... · nₖ (the product of all moduli).
  • This means: if x and y both solve the system, then x ≡ y (mod N), or equivalently N divides (x − y).
  • Don't confuse: "unique modulo N" does not mean there is only one integer solution; it means all integer solutions differ by multiples of N.

🔨 How to construct the solution

🔨 The construction formula

The proof provides an explicit construction:

  1. Let N = n₁ · n₂ · ... · nₖ (product of all moduli).
  2. For each j from 1 to k, define Nⱼ = N / nⱼ (the product of all moduli except nⱼ).
  3. Since the moduli are pairwise relatively prime, gcd(Nⱼ, nⱼ) = 1, so by Corollary 2.2.8 there exists a modular inverse yⱼ such that Nⱼ · yⱼ ≡ 1 (mod nⱼ).
  4. The solution is: x = b₁N₁y₁ + b₂N₂y₂ + ... + bₖNₖyₖ.
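The four steps above can be sketched directly in Python (function name ours), checked here against the system x ≡ 1 (mod 2), x ≡ 2 (mod 3), x ≡ 3 (mod 5):

```python
from math import prod

def crt(residues, moduli):
    """Solve x ≡ b_j (mod n_j) for pairwise coprime moduli; answer unique mod prod(moduli)."""
    N = prod(moduli)
    x = 0
    for b_j, n_j in zip(residues, moduli):
        N_j = N // n_j                 # product of all moduli except n_j
        y_j = pow(N_j, -1, n_j)        # exists since gcd(N_j, n_j) = 1
        x += b_j * N_j * y_j
    return x % N

print(crt([1, 2, 3], [2, 3, 5]))       # 23
```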

🧩 Why the construction works

  • For any i ≠ j, Nⱼ is divisible by nᵢ (since Nⱼ includes nᵢ as a factor), so Nⱼ ≡ 0 (mod nᵢ).
  • Therefore, when computing x modulo nⱼ, all terms except bⱼNⱼyⱼ vanish:
    • x ≡ bⱼNⱼyⱼ (mod nⱼ).
  • By construction, Nⱼyⱼ ≡ 1 (mod nⱼ), so:
    • x ≡ bⱼ · 1 ≡ bⱼ (mod nⱼ).
  • This holds for every j, so x satisfies all congruences in the system.

🔒 Why the solution is unique modulo N

  • Suppose x and y are both solutions.
  • Then x ≡ bⱼ ≡ y (mod nⱼ) for every j, which means nⱼ divides (x − y) for all j.
  • Since the moduli are pairwise relatively prime, Theorem 2.1.4 (applied k times) implies that their product N also divides (x − y).
  • Therefore x ≡ y (mod N).

📝 Worked example

📝 Solving the introductory system

The excerpt solves:

  • x ≡ 1 (mod 2)
  • x ≡ 2 (mod 3)
  • x ≡ 3 (mod 5)

Step 1: Compute N = 2 · 3 · 5 = 30.

Step 2: Compute Nⱼ for each modulus:

  • N₁ = 30 / 2 = 15
  • N₂ = 30 / 3 = 10
  • N₃ = 30 / 5 = 6

Step 3: Find modular inverses yⱼ:

  • Solve 15y₁ ≡ 1 (mod 2): y₁ ≡ 1 (mod 2)
  • Solve 10y₂ ≡ 1 (mod 3): y₂ ≡ 1 (mod 3)
  • Solve 6y₃ ≡ 1 (mod 5): y₃ ≡ 1 (mod 5)

Step 4: Compute x:

  • x = 1 · 15 · 1 + 2 · 10 · 1 + 3 · 6 · 1 = 15 + 20 + 18 = 53
  • Reduce modulo 30: x ≡ 23 (mod 30).

Result: The solution is x ≡ 23 (mod 30), meaning any integer of the form 23 + 30k (for integer k) solves the system.

🧮 Background: modular inverses

🧮 What a modular inverse is

Given a ∈ Z and n ∈ N, a solution to the congruence ax ≡ 1 (mod n) for gcd(a, n) = 1 is called the inverse of a modulo n, denoted a⁻¹.

  • The inverse exists and is unique modulo n when a and n are relatively prime (Corollary 2.2.8).
  • This is essential for the Chinese Remainder Theorem construction, which requires finding yⱼ = Nⱼ⁻¹ (mod nⱼ).

🧮 Example from the excerpt

The modular inverse of 7 modulo 48 is 7, because 7 · 7 = 49 ≡ 1 (mod 48).


Another Way to Work with Congruences: Equivalence Classes

2.4. Another Way to Work with Congruences: Equivalence Classes

🧭 Overview

🧠 One-sentence thesis

Congruence classes provide an alternative framework for working with congruences by organizing integers into equivalence classes under modular arithmetic, which allows arithmetic operations to be performed directly on these classes.

📌 Key points (3–5)

  • Equivalence relations are relations satisfying reflexivity, symmetry, and transitivity; congruence modulo n is one such relation.
  • Congruence classes partition the integers into n distinct classes, with representatives 0 through n−1, forming the set Z/nZ.
  • Arithmetic on classes is well-defined: addition and multiplication can be performed on congruence classes independently of which representatives are chosen.
  • Common confusion: the operations are defined on classes (sets of integers), not individual integers; different representatives of the same class yield the same result.
  • Key property: an element [a] in Z/nZ has a multiplicative inverse if and only if gcd(a, n) = 1.

🔑 Equivalence relations and classes

🔑 What is an equivalence relation

Equivalence relation: A relation ∼ on a set S that satisfies three properties: reflexivity (every element is related to itself), symmetry (if x relates to y, then y relates to x), and transitivity (if x relates to y and y relates to z, then x relates to z).

  • Reflexivity: For all x in S, x ∼ x.
  • Symmetry: For all x, y in S, x ∼ y if and only if y ∼ x.
  • Transitivity: For all x, y, z in S, if x ∼ y and y ∼ z, then x ∼ z.

These three properties together ensure the relation behaves like "sameness" in some respect.

📦 Equivalence classes

Equivalence class of x: The set [x] = {y in S | y ∼ x}, containing all elements related to x.

  • Every element x belongs to its own equivalence class [x] (by reflexivity).
  • For any two elements x and y, their equivalence classes are either identical or completely disjoint—they cannot partially overlap.
  • Why this matters: Equivalence classes partition the entire set S into non-overlapping subsets.

Proof sketch of the disjoint-or-equal property: If some element z belongs to both [x] and [y], then z ∼ x and z ∼ y. By symmetry and transitivity, this forces x ∼ y, which means every element in [x] is also in [y] and vice versa, so [x] = [y].

🔄 Representative of a class

Representative: Any element r in S such that C = [r] for some equivalence class C.

  • A single equivalence class can have many representatives—any element in the class can serve as a representative.
  • Example: The rational numbers Q can be described as equivalence classes of pairs (n, m) where (a, b) ∼ (c, d) if ad = bc; different pairs like (1, 2) and (2, 4) represent the same rational number.

🔢 Congruence as an equivalence relation

🔢 Congruence modulo n

The excerpt defines a specific equivalence relation on the integers:

a ∼ b (under congruence mod n) if and only if a ≡ b (mod n).

  • This relation satisfies reflexivity, symmetry, and transitivity, making it an equivalence relation.
  • Don't confuse: The relation ∼ is defined by congruence, but the framework of equivalence classes is a general mathematical structure that applies to many different relations.

🏷️ Congruence class

Congruence class of a mod n: The equivalence class [a]_n = {b in Z | b ≡ a (mod n)}, written [a] when n is clear from context.

  • The set of all congruence classes is called the integers mod n, written Z/nZ (or Z/n or Z_n by some authors).
  • Example: For n = 5, the congruence class [2]_5 contains {..., −8, −3, 2, 7, 12, 17, ...}—all integers leaving remainder 2 when divided by 5.

📊 Structure of Z/nZ

Theorem: Z/nZ has exactly n elements, with representatives 0, 1, 2, ..., n−1.

| Property | Explanation |
| --- | --- |
| Number of classes | Exactly n distinct equivalence classes |
| Standard representatives | 0, 1, 2, ..., n−1 (one for each remainder) |
| Why these work | By the Division Algorithm, every integer a can be written as qn + r with 0 ≤ r < n, so a ≡ r (mod n) |
| Uniqueness | Each integer belongs to exactly one of these n classes |

Proof idea: The Division Algorithm guarantees every integer a has a unique remainder r in {0, ..., n−1} when divided by n. Since a ≡ r (mod n), every integer belongs to [r] for some r in this range. Since each r belongs only to [r], the n classes [0], [1], ..., [n−1] are all distinct.

➕ Arithmetic on congruence classes

➕ Defining operations

Addition and multiplication on Z/nZ: Given congruence classes C and D, define C + D = [a + b] and C · D = [a · b], where a and b are any representatives of C and D, respectively.

  • The key word is "any"—you can pick any representative from each class.
  • Example: In Z/5Z, to compute [3] + [4], you can use 3 + 4 = 7, so [3] + [4] = [7] = [2] (since 7 ≡ 2 mod 5). Or you could use 8 (another representative of [3]) and 9 (another representative of [4]): 8 + 9 = 17, and [17] = [2] as well.
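The representative-independence claimed here can be checked numerically for the example above:

```python
n = 5
# representatives 3 and 8 name the same class in Z/5Z, as do 4 and 9
assert (3 + 4) % n == (8 + 9) % n == 2   # [3] + [4] = [2] either way
assert (3 * 4) % n == (8 * 9) % n == 2   # [3] * [4] = [12] = [2] either way
```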

✅ Well-definedness

Theorem: The operations + and · on Z/nZ are well-defined—they do not depend on the choice of representatives.

Why this matters: If you pick different representatives for the same classes, you must get the same result. Otherwise, the operations would be ambiguous and unusable.

Proof sketch: Suppose C = [a] = [p] and D = [b] = [q]. Then a ≡ p (mod n) and b ≡ q (mod n). By properties of congruence (from earlier sections), a + b ≡ p + q (mod n) and a · b ≡ p · q (mod n). Therefore [a + b] = [p + q] and [a · b] = [p · q], so the result is the same regardless of which representatives are used.

Don't confuse: Well-definedness is not automatic—it must be proven. The proof relies on the fact that congruence "respects" addition and multiplication.

🧮 Properties of arithmetic in Z/nZ

🧮 Basic algebraic properties

Theorem: Addition and multiplication on Z/nZ satisfy:

  1. Commutativity and associativity: Both operations can be reordered and regrouped.
  2. Distributivity: Multiplication distributes over addition.
  3. Identities: [0] is the additive identity; [1] is the multiplicative identity.
  4. Additive inverses: Every element [a] has an additive inverse [−a] (equivalently, [n − a]).
  5. Multiplicative inverses: An element [a] has a multiplicative inverse [a]^(−1) if and only if gcd(a, n) = 1; when it exists, the inverse is unique.

Why these matter: Properties 1–4 make Z/nZ behave like the integers. Property 5 is special: unlike the integers, some elements in Z/nZ do have multiplicative inverses.

🔍 Multiplicative inverses

  • When they exist: [a] has an inverse in Z/nZ if and only if gcd(a, n) = 1.
  • What this means: If gcd(a, n) = 1, there exists [b] such that [a] · [b] = [1].
  • Example: In Z/7Z, [3] has an inverse because gcd(3, 7) = 1. Indeed, 3 · 5 = 15 ≡ 1 (mod 7), so [3]^(−1) = [5].
  • Don't confuse: In the integers Z, only 1 and −1 have multiplicative inverses. In Z/nZ, many more elements can have inverses, depending on n.

🧩 Solving equations in Z/nZ

Theorem: Given a, b in Z and n in N, let d = gcd(a, n). Consider the equation [a] · x = [b] where x is in Z/nZ.

| Condition | Number of solutions |
| --- | --- |
| d does not divide b | 0 solutions |
| d divides b | Exactly d solutions |

Why this works: This restates an earlier theorem about linear congruences in the language of congruence classes. The solvability and number of solutions depend on the greatest common divisor.

Example scenario: To solve [6] · x = [4] in Z/9Z, compute d = gcd(6, 9) = 3. Since 3 does not divide 4, there are no solutions. But for [6] · x = [3] in Z/9Z, since 3 divides 3, there are exactly 3 solutions.


Euler's φ Function

2.5. Euler’s φ Function

🧭 Overview

🧠 One-sentence thesis

Euler's φ function counts invertible elements in modular arithmetic and exhibits a multiplicative property that can be proven using the Chinese Remainder Theorem.

📌 Key points (3–5)

  • What φ(n) counts: the number of non-negative integers less than n that are relatively prime to n (i.e., gcd equals 1).
  • Connection to invertibility: φ(n) equals the number of multiplicatively invertible elements in Z/nZ.
  • Multiplicative property: when gcd(n, m) = 1, then φ(nm) = φ(n)·φ(m).
  • Common confusion: φ is only multiplicative for relatively prime inputs—the condition gcd(n, m) = 1 is essential.
  • Proof technique: the multiplicative property is proven by constructing a bijection using the Chinese Remainder Theorem.

🔢 Definition and basic meaning

🔢 What φ(n) counts

Euler's φ function (Euler's totient function): Given n ∈ N, φ(n) equals the count of integers m where 0 ≤ m < n and gcd(m, n) = 1.

  • The function counts how many non-negative integers less than n are relatively prime to n.
  • "Totient" rhymes with "quotient"; the name was given by mathematician Sylvester.
  • Example: to find φ(n), list all integers from 0 to n−1, then count only those that share no common divisor with n (other than 1).
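The counting recipe in the last bullet is a one-liner; a brute-force Python sketch (the name `phi` is ours):

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient by direct count: integers m in [0, n) with gcd(m, n) = 1."""
    return sum(1 for m in range(n) if gcd(m, n) == 1)

print([phi(n) for n in range(1, 11)])   # [1, 1, 2, 2, 4, 2, 6, 4, 6, 4]
```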

🔗 Connection to invertible elements

Theorem: Given n ∈ N, φ(n) is the number of elements of Z/nZ which are multiplicatively invertible.

  • This follows from an earlier result (Theorem 2.4.9 part 5) about when congruence classes have multiplicative inverses.
  • An element [m] in Z/nZ is invertible if and only if gcd(m, n) = 1.
  • Don't confuse: φ(n) counts invertible elements, not all elements—Z/nZ has n elements total, but only φ(n) of them are invertible.

✖️ Multiplicative property

✖️ The main theorem

Theorem: Given n, m ∈ N, if gcd(n, m) = 1 then φ(nm) = φ(n)·φ(m).

  • The function φ is multiplicative, but only for relatively prime inputs.
  • The condition gcd(n, m) = 1 is essential—without it, the equality does not hold.
  • Example: if n and m share no common factors, then the number of integers relatively prime to nm equals the product of those relatively prime to n and those relatively prime to m separately.

🎯 Why the condition matters

  • The proof relies on the Chinese Remainder Theorem, which requires gcd(n, m) = 1.
  • Without relative primality, the bijection constructed in the proof breaks down.
  • Don't confuse: φ(nm) ≠ φ(n)·φ(m) in general—only when n and m are coprime.
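A quick numeric check of both directions (self-contained, so it redefines the brute-force totient count; the coprime pair 3, 5 and the non-coprime pair 2, 2 are our choices):

```python
from math import gcd

def phi(n: int) -> int:
    """Brute-force totient count."""
    return sum(1 for m in range(n) if gcd(m, n) == 1)

# holds when the inputs are coprime...
assert phi(3 * 5) == phi(3) * phi(5) == 8
# ...and fails when they are not: phi(4) = 2 but phi(2) * phi(2) = 1
assert phi(2 * 2) != phi(2) * phi(2)
```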

🔍 Proof strategy using bijections

🔍 Setting up the correspondence

The proof constructs a bijection between two sets of the same size:

  • Define (Z/kZ)* as the set of invertible elements in Z/kZ, so φ(k) = #((Z/kZ)*).
  • Define the Cartesian product (Z/nZ)* × (Z/mZ)* as all pairs (a, b) where a is invertible in Z/nZ and b is invertible in Z/mZ.
  • The number of such pairs is #((Z/nZ)*) · #((Z/mZ)*) = φ(n)·φ(m), since each component is chosen independently.

🔗 The bijection function

The function F maps from (Z/(nm)Z)* to (Z/nZ)* × (Z/mZ)*:

  • F([x]_nm) = ([x]_n, [x]_m)
  • In words: take a congruence class modulo nm and map it to the pair of its congruence classes modulo n and modulo m.
  • If F is a bijection, then the two sets have the same size, so φ(nm) = φ(n)·φ(m).

✅ Proving F is well-defined

Well-defined means the function's output depends only on the congruence class, not on the choice of representative:

  • If y is another representative of [x]_nm, then y ≡ x (mod nm).
  • This means y − x = k(nm) for some integer k.
  • Rewriting: y − x = (km)n = (kn)m, so y ≡ x (mod n) and y ≡ x (mod m).
  • Therefore [x]_n = [y]_n and [x]_m = [y]_m, so F([x]_nm) is the same regardless of representative.

✅ Proving F is one-to-one (injective)

One-to-one means different inputs produce different outputs:

  • Assume F([x]_nm) = F([y]_nm), i.e., [x]_n = [y]_n and [x]_m = [y]_m.
  • Let z = x − y. Then z ≡ 0 (mod n) and z ≡ 0 (mod m).
  • By the Chinese Remainder Theorem, this system has a unique solution modulo nm.
  • Since z = 0 also solves the system, we have z ≡ 0 (mod nm), so x ≡ y (mod nm).
  • Therefore [x]_nm = [y]_nm, proving F is one-to-one.

✅ Proving F is onto (surjective)

Onto means every element in the target set is hit by some input:

  • Given any pair ([x]_n, [y]_m) in (Z/nZ)* × (Z/mZ)*, consider the system:
    • z ≡ x (mod n)
    • z ≡ y (mod m)
  • Since gcd(n, m) = 1, the Chinese Remainder Theorem guarantees a solution z.
  • Then F([z]_nm) = ([x]_n, [y]_m), so every pair is reached.
  • Therefore F is onto.

🎓 Conclusion of the proof

  • F is well-defined, one-to-one, and onto, so it is a bijection.
  • Bijective sets have the same number of elements.
  • Therefore φ(n)·φ(m) = #((Z/nZ)* × (Z/mZ)*) = #((Z/(nm)Z)*) = φ(nm).
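The bijection F can be verified exhaustively for a small coprime pair, say n = 3 and m = 4 (variable names ours):

```python
from math import gcd

n, m = 3, 4                      # coprime moduli
units = lambda k: [a for a in range(k) if gcd(a, k) == 1]

# F([x]_nm) = ([x]_n, [x]_m) on the invertible classes
image = {(x % n, x % m) for x in units(n * m)}
target = {(a, b) for a in units(n) for b in units(m)}
assert image == target           # F is a bijection: phi(12) = phi(3) * phi(4) = 4
```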

📝 Exercises and patterns

📝 Exploring prime inputs

The exercises ask to compute φ(n) for n = 2, 3, 5, 7, 11, 13, 17 and make a conjecture:

  • All these values are prime numbers.
  • The pattern suggests a formula for φ(p) when p is prime.
  • Hint: if p is prime, every integer from 1 to p−1 is relatively prime to p.

📝 Exploring powers of 2

The exercises ask to compute φ(n) for n = 2, 4, 8, 16, 32, 64 and conjecture a formula for φ(2^k):

  • These are successive powers of 2.
  • The pattern should reveal a formula for φ(2^k) in terms of k.
  • This is a special case of the multiplicative property applied repeatedly.

Basics and the Fundamental Theorem of Arithmetic

3.1. Basics and the FTA

🧭 Overview

🧠 One-sentence thesis

Every natural number greater than 1 can be written as a product of primes in exactly one way (up to reordering), making primes the fundamental building blocks of all integers.

📌 Key points (3–5)

  • What primes are: natural numbers greater than 1 divisible only by 1 and themselves; composites are natural numbers greater than 1 that are not prime.
  • Testing for compositeness: if a number is composite, it must have a divisor no larger than its square root.
  • How primes divide products: if a prime divides a product, it must divide at least one of the factors (Euclid's Lemma for primes).
  • The FTA existence claim: every natural number ≥ 2 can be factored into primes.
  • The FTA uniqueness claim: the prime factorization is unique up to reordering—this is what makes primes "atoms" of the integers.

🔢 Prime and composite numbers

🔢 Definition of prime

A natural number p is prime if p > 1 and the only natural numbers that divide p are 1 and p.

  • Examples: 2, 3, 5, 7, 11, 13, 17.
  • The number 2 is the only even prime (any other even number is divisible by 2 and thus not prime).
  • The excerpt notes that 2 has unusual properties and jokes that "2 is the oddest prime."

🧱 Definition of composite

A natural number c is composite if c > 1 and c is not prime.

  • In other words, a composite number has divisors other than 1 and itself.
  • The excerpt uses the metaphor: primes are "atoms" and composites are "molecules" built from primes.

🔍 Testing for compositeness

Theorem 3.1.4: If n is composite, then it has a positive divisor d satisfying d ≤ √n.

  • Why this works: Suppose n has a divisor a. Then n = a · (n/a), so n/a is also a divisor.
  • a and n/a cannot both be greater than √n, because that would give n = a · (n/a) > √n · √n = n, a contradiction; hence at least one of them is a divisor ≤ √n.
  • Practical implication: To check if n is composite, you only need to test divisors up to √n.
  • Example: To test if a number is composite, check divisibility by primes up to its square root; if none divide it, the number is prime.
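Trial division up to √n implements this test directly; a minimal sketch (function name ours; it checks all d ≥ 2 rather than only primes, which is correct but slightly more work):

```python
def is_composite(n: int) -> bool:
    """Trial division up to sqrt(n), per Theorem 3.1.4 (assumes n >= 2)."""
    d = 2
    while d * d <= n:            # only divisors d with d*d <= n need checking
        if n % d == 0:
            return True
        d += 1
    return False

print([n for n in range(2, 20) if not is_composite(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```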

🧩 How primes interact with products

🧩 Euclid's Lemma for primes

Proposition 3.1.5: Suppose p is prime and a, b are integers. If p divides ab, then p divides a or p divides b.

  • Why this is true: Since p is prime, gcd(p, a) is either 1 or p.
    • If gcd(p, a) = p, then p divides a and we're done.
    • If gcd(p, a) = 1, then by Euclid's Lemma (2.1.6), p must divide b.
  • Don't confuse: This property is special to primes; composite numbers do not always have this property.
  • Example: If a prime divides the product of Sender and Receiver's numbers, it must divide at least one of their individual numbers.

🔗 Extension to multiple factors

Corollary 3.1.6: If p is prime and p divides a product a₁ · a₂ · ... · aₖ, then p divides at least one of the factors aⱼ.

  • This follows by induction on the number of factors k.
  • The base case (k = 2) is Proposition 3.1.5.
  • This property is crucial for proving uniqueness in the Fundamental Theorem of Arithmetic.

🏛️ The Fundamental Theorem of Arithmetic

🏛️ Statement of the theorem

Theorem 3.1.7 (FTA): Let n be a natural number with n ≥ 2. Then:

  1. Existence: There exist a natural number k and primes p₁, ..., pₖ such that n = p₁ · ... · pₖ.
  2. Uniqueness: If q₁, ..., qₗ are also primes with n = q₁ · ... · qₗ, then l = k and the q factorization is just a reordering of the p factorization.
  • This theorem justifies calling primes the "atoms" of integers: every integer is built from primes in exactly one way.

✅ Proof of existence

The proof uses the Second Principle of Mathematical Induction:

  • Base case (n = 2): The number 2 is prime, so k = 1 and p₁ = 2 works.
  • Inductive step: Assume every integer k < n has a prime factorization. Now consider n:
    • If n is prime, then k = 1 and p₁ = n works.
    • If n is composite, it has a divisor d with 1 < d < n. Then n = d · (n/d), where both d and n/d are less than n.
    • By the inductive hypothesis, both d and n/d have prime factorizations.
    • Multiplying these factorizations gives a prime factorization of n.
  • Therefore, every n ≥ 2 has a prime factorization.
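The inductive existence argument translates directly into a recursive procedure; a sketch (function name is mine, not the text's):

```python
def prime_factors(n: int) -> list[int]:
    """Mirror the inductive proof: split off the smallest divisor d >= 2
    (necessarily prime) and recurse on n // d."""
    if n < 2:
        return []
    d = 2
    while d * d <= n:
        if n % d == 0:
            return [d] + prime_factors(n // d)  # n = d * (n // d), both factorable
        d += 1
    return [n]  # n has no divisor d with 1 < d < n, so n is prime (base case)

print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```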

🔒 Proof of uniqueness

Suppose n has two prime factorizations: p₁ · ... · pₖ = n = q₁ · ... · qₗ.

  • Since p₁ divides the left side, it divides the right side q₁ · ... · qₗ.
  • By Corollary 3.1.6, p₁ divides one of the qⱼ.
  • Since both p₁ and qⱼ are prime, this means p₁ = qⱼ.
  • Cancel p₁ from the left and qⱼ from the right: p₂ · ... · pₖ = q₁ · ... · qⱼ₋₁ · qⱼ₊₁ · ... · qₗ.
  • Repeat this process. We cannot run out of primes on one side before the other, because that would make a product of primes equal to 1, which is impossible.
  • Therefore, k = l and the factorizations are reorderings of each other.

📊 Why uniqueness matters

| Aspect | Implication |
| --- | --- |
| Atoms metaphor | Primes are the unique building blocks; every integer has exactly one "chemical formula" |
| gcd and lcm | Prime factorizations allow systematic computation of greatest common divisors and least common multiples |
| Theoretical foundation | Uniqueness is essential for many proofs in number theory |

🧪 Related concepts

🧪 Square-free numbers

A number n > 1 is square-free if it is not divisible by the square of any natural number other than 1.

  • Equivalent characterization (Exercise 3.1.3): An integer n ≥ 1 is square-free if and only if it is the product of distinct primes.
  • Example: If an organization's ID number is square-free, its prime factorization uses each prime at most once.
  • Don't confuse: "square-free" means no repeated prime factors, not "not a perfect square" (though perfect squares are never square-free except 1).

🔗 Prime factorizations and gcd

Exercise 3.1.2 asks to relate the prime factorizations of a, b, and gcd(a, b).

  • The excerpt does not provide the answer, but the exercise suggests that gcd can be understood through prime factorizations.
  • This follows from the uniqueness part of the FTA: since factorizations are unique, the gcd is determined by the common prime factors.
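The excerpt leaves Exercise 3.1.2 open; the standard relationship (sketched here as an assumption, not the book's stated answer) is that each common prime appears in gcd(a, b) with the minimum of its exponents in a and b:

```python
from collections import Counter
from math import gcd, prod

def factor(n: int) -> Counter:
    """Prime factorization as a multiset {prime: exponent}."""
    f, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            f[d] += 1
            n //= d
        d += 1
    if n > 1:
        f[n] += 1
    return f

def gcd_from_factors(a: int, b: int) -> int:
    fa, fb = factor(a), factor(b)
    # each common prime contributes the *minimum* of its two exponents
    return prod(p ** min(fa[p], fb[p]) for p in fa.keys() & fb.keys())

print(gcd_from_factors(360, 84), gcd(360, 84))  # 12 12
```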

3.2. Wilson’s Theorem

🧭 Overview

🧠 One-sentence thesis

Wilson's Theorem establishes that a number p (at least 2) is prime if and only if (p − 1)! is congruent to −1 modulo p, providing a theoretical primality test based on factorial congruences.

📌 Key points (3–5)

  • The theorem statement: p is prime if and only if (p − 1)! ≡ −1 (mod p).
  • Key lemma: A number n equals its own inverse mod p (where p is prime) if and only if n ≡ ±1 (mod p).
  • Proof strategy: Pair up inverses in (p − 1)! so that only 1 and (p − 1) remain unpaired, yielding the result −1.
  • Common confusion: The theorem works both directions—it's an "if and only if" statement, so the factorial congruence characterizes primes completely.
  • Practical limitation: While Wilson's Theorem gives a primality test, computing (n − 1)! is impractical for large n.

🔑 The foundational lemma

🔑 Self-inverse elements modulo a prime

Lemma 3.2.1: Let p be a prime. Then n ∈ ℕ equals its own inverse mod p if and only if p divides n + 1 or p divides n − 1, i.e., n ≡ ±1 (mod p).

What it means:

  • "Equals its own inverse mod p" means n · n ≡ 1 (mod p), or n² ≡ 1 (mod p).
  • This happens precisely when n ≡ 1 (mod p) or n ≡ −1 (mod p).

Why it's true:

  • If n² ≡ 1 (mod p), then p divides n² − 1 = (n + 1)(n − 1).
  • Since p is prime, it must divide one of the factors: either p | (n + 1) or p | (n − 1).
  • Conversely, if n ≡ ±1 (mod p), then n² ≡ (±1)² = 1 (mod p).

Why it matters:

  • This lemma identifies exactly which elements in the factorial product cannot be paired with distinct inverses.
  • Only 1 and (p − 1) are self-inverse modulo p; all other elements from 2 to p − 2 can be paired with different inverses.
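A quick numerical check of the lemma (a sketch, not from the text):

```python
p = 13
# Lemma 3.2.1 predicts that only 1 and p - 1 satisfy n^2 ≡ 1 (mod p)
self_inverse = [n for n in range(1, p) if (n * n) % p == 1]
print(self_inverse)  # [1, 12]
```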

🎯 Wilson's Theorem statement and proof

🎯 The theorem

Theorem 3.2.2 (Wilson's Theorem): Given p ∈ ℕ such that p ≥ 2, p is prime if and only if (p − 1)! ≡ −1 (mod p).

Both directions:

  • Forward: If p is prime, then (p − 1)! ≡ −1 (mod p).
  • Reverse: If p ≥ 2 and (p − 1)! ≡ −1 (mod p), then p is prime.

🔄 Forward direction: prime implies the congruence

Setup:

  • Assume p is prime.
  • Consider the product (p − 2)! = 1 · 2 · 3 · ... · (p − 2).

Pairing strategy:

  • Every term from 2 to p − 2 has an inverse modulo p (by Corollary 2.2.8).
  • By Lemma 3.2.1, only 1 equals its own inverse in this range (p − 1 is not in (p − 2)!).
  • Pair each element with its distinct inverse: there are (p − 3)/2 such pairs.
  • Each pair multiplies to 1 modulo p.

Result:

  • (p − 2)! ≡ 1^((p−3)/2) ≡ 1 (mod p).
  • Therefore (p − 1)! = (p − 1) · (p − 2)! ≡ (p − 1) · 1 ≡ p − 1 ≡ −1 (mod p).

Example from the excerpt (p = 7):

  • Find inverses: 2 · 4 ≡ 1 (mod 7) and 3 · 5 ≡ 1 (mod 7).
  • So 5! = (5 · 3) · (4 · 2) · 1 ≡ 1 · 1 · 1 ≡ 1 (mod 7).
  • Then 6! = 6 · 5! ≡ 6 · 1 ≡ 6 ≡ −1 (mod 7).
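The p = 7 computation generalizes; a sketch verifying the congruence for several small primes, and showing a composite failing it:

```python
from math import factorial

# for primes, (p - 1)! ≡ -1 ≡ p - 1 (mod p), as Wilson's Theorem asserts
assert all(factorial(p - 1) % p == p - 1 for p in [2, 3, 5, 7, 11, 13])

# a composite modulus fails the congruence: 5! = 120 ≡ 0 (mod 6)
print(factorial(5) % 6)  # 0, not 6 - 1 = 5
```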

🔙 Reverse direction: the congruence implies prime

Setup:

  • Assume p ≥ 2 and (p − 1)! ≡ −1 (mod p).
  • Rewrite as (p − 1)! + 1 ≡ 0 (mod p), meaning p | ((p − 1)! + 1).

Proof by contradiction:

  • Let d be any divisor of p with d ≠ p.
  • Since d divides p and d ≠ p, we have 1 ≤ d ≤ p − 1.
  • Therefore d appears in the product (p − 1)!, so d | (p − 1)!.
  • Also, d | p and p | ((p − 1)! + 1), so d | ((p − 1)! + 1).
  • By Theorem 1.3.9, d divides the difference: d | [((p − 1)! + 1) − (p − 1)!] = 1.
  • This means d = 1.

Conclusion:

  • Every divisor of p is either p itself or 1.
  • Therefore p is prime.

💡 Significance and limitations

💡 As a primality test

What the theorem offers:

  • A computational criterion: to test if n is prime, check whether (n − 1)! ≡ −1 (mod n).
  • This is the first example in the excerpt of a "simple computational congruence" that characterizes primes.

Why it's impractical:

  • Computing (n − 1)! for large n is computationally infeasible.
  • The factorial grows extremely rapidly, making this test "entirely impractical" for real-world use.

Don't confuse:

  • Theoretical importance vs. practical utility: Wilson's Theorem is mathematically elegant and useful for proofs, but not for actual primality testing in applications.

📜 Historical note

  • The theorem is "usually named after an 18th century English mathematician."
  • However, "it was actually first stated by Ibn al-Haytham nearly 800 years earlier."
  • This is a reminder that mathematical attribution often reflects European history rather than the full global history of mathematics.

3.3. Multiplicative Order and Applications

🧭 Overview

🧠 One-sentence thesis

The multiplicative order of an integer modulo n leads directly to Euler's Theorem and Fermat's Little Theorem, which state that powers of integers relatively prime to n eventually return to 1 modulo n.

📌 Key points (3–5)

  • Multiplicative order: the smallest positive power k such that a to the k is congruent to 1 mod n (when a and n are relatively prime).
  • Key structural result: the order of a mod n always divides φ(n), the Euler totient function.
  • Euler's Theorem: if gcd(a, n) = 1, then a to the φ(n) is congruent to 1 mod n.
  • Fermat's Little Theorem: a special case when n is prime p; then a to the (p−1) is congruent to 1 mod p (when gcd(a, p) = 1).
  • Common confusion: Fermat's Little Theorem has two forms: one requires gcd(a, p) = 1 and concludes a to the (p−1) ≡ 1 mod p; the other holds for all a and concludes a to the p ≡ a mod p.

🔢 Defining multiplicative order

🔢 What is multiplicative order?

Multiplicative order of a modulo n (written ord_n(a)): the smallest k in the natural numbers such that a to the k is congruent to 1 mod n, where n ≥ 2 and gcd(a, n) = 1.

  • The definition requires that a and n be relatively prime.
  • "Order" is shorthand when the context (multiplicative, and the modulus n) is clear.
  • Example: if a to the 5 ≡ 1 mod n and no smaller positive power works, then ord_n(a) = 5.

✅ Why the order is well-defined

  • Potential problem: maybe no positive power of a is ever congruent to 1 mod n.
  • Resolution via Pigeonhole Principle:
    • Consider the infinite sequence of congruence classes [a to the i] for i = 1, 2, 3, ...
    • There are only n congruence classes in Z/nZ (n "boxes").
    • By the Pigeonhole Principle, two distinct exponents i and j (with i > j) must land in the same box: [a to the i] = [a to the j].
    • This means a to the i ≡ a to the j mod n.
    • Since gcd(a, n) = 1, the inverse a to the (−1) exists mod n.
    • Multiply both sides by (a to the (−1)) to the j: a to the (i−j) ≡ 1 mod n.
    • So the set of k for which a to the k ≡ 1 mod n is non-empty, and the smallest such k exists by the Well-Ordering Principle.
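The definition suggests a direct (if naive) computation; a sketch assuming gcd(a, n) = 1, which guarantees the loop terminates:

```python
from math import gcd

def ord_n(a: int, n: int) -> int:
    """Smallest k >= 1 with a**k ≡ 1 (mod n); requires gcd(a, n) == 1,
    which guarantees such a k exists (pigeonhole argument above)."""
    assert n >= 2 and gcd(a, n) == 1
    k, power = 1, a % n
    while power != 1:
        k += 1
        power = (power * a) % n
    return k

print(ord_n(2, 7))  # 3, since 2^3 = 8 ≡ 1 (mod 7)
print(ord_n(3, 7))  # 6
```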

🧮 The order divides φ(n)

🧮 Main structural theorem

Theorem 3.3.3: Given relatively prime numbers n and a, ord_n(a) divides φ(n).

  • This is a translation of Lagrange's Theorem from abstract algebra into the language of congruences.
  • The proof strategy: show that the group (Z/nZ)* can be partitioned into pieces (cosets), each of which has exactly ord_n(a) elements.

🔧 Proof sketch: cosets and partitions

  • The cyclic subgroup ⟨a⟩:
    • Define ⟨a⟩ = {[a to the j] | j ∈ N, j ≤ ord_n(a)}.
    • This set has exactly ord_n(a) elements.
    • It contains [1] because a to the ord_n(a) ≡ 1 mod n.
    • Every element in ⟨a⟩ has an inverse mod n, so ⟨a⟩ is a subset of (Z/nZ)*.
  • Cosets of ⟨a⟩:
    • A coset is a set of the form x⟨a⟩ = {[x] · [a to the j] = [x a to the j] | j ∈ N, j ≤ ord_n(a)}.
    • Every element [x] in (Z/nZ)* belongs to some coset (because [1] ∈ ⟨a⟩, so [x] ∈ x⟨a⟩).
    • Each coset x⟨a⟩ is bijective with ⟨a⟩ (via the map that sends [a to the j] to [x a to the j]; the inverse comes from x to the (−1)).
  • Cosets are either identical or disjoint:
    • If two cosets x⟨a⟩ and y⟨a⟩ share any element [z], then x a to the j ≡ y a to the k mod n for some j, k.
    • Multiplying by (a to the (−1)) to the j shows x ≡ y a to the (k−j) mod n.
    • This implies every element of x⟨a⟩ can be written as an element of y⟨a⟩ and vice versa, so x⟨a⟩ = y⟨a⟩.
  • Conclusion:
    • (Z/nZ)* is partitioned into m disjoint cosets, each with ord_n(a) elements.
    • So m · ord_n(a) = #((Z/nZ)*) = φ(n).
    • Therefore ord_n(a) divides φ(n).
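Theorem 3.3.3 is easy to spot-check numerically; a brute-force sketch (helper names are mine, not the text's):

```python
from math import gcd

def phi(n: int) -> int:
    """Brute-force Euler totient: count 1 <= k <= n coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def ord_n(a: int, n: int) -> int:
    k, power = 1, a % n
    while power != 1:
        k, power = k + 1, (power * a) % n
    return k

n = 20
for a in range(1, n):
    if gcd(a, n) == 1:
        # Theorem 3.3.3: the order of every unit divides phi(20) = 8
        assert phi(n) % ord_n(a, n) == 0
print("ord_n(a) divides phi(n) for every unit a mod", n)
```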

🎯 Euler's Theorem and Fermat's Little Theorem

🎯 Euler's Theorem

Theorem 3.3.4 (Euler's Theorem): If a and n are integers with gcd(a, n) = 1, then a to the φ(n) ≡ 1 mod n.

  • Proof:
    • By Theorem 3.3.3, ord_n(a) divides φ(n).
    • So there exists an integer m such that m · ord_n(a) = φ(n).
    • By definition of order, a to the ord_n(a) ≡ 1 mod n.
    • Therefore a to the φ(n) = a to the (m · ord_n(a)) = (a to the ord_n(a)) to the m ≡ 1 to the m ≡ 1 mod n.

🔺 Fermat's Little Theorem (first form)

Corollary 3.3.5 (Fermat's Little Theorem): If p is prime and a is an integer with gcd(p, a) = 1, then a to the (p−1) ≡ 1 mod p.

  • Proof:
    • For a prime p, φ(p) = p − 1.
    • Apply Euler's Theorem with n = p.
  • This is a special case of Euler's Theorem.

🔺 Fermat's Little Theorem (second form)

Theorem 3.3.6: If p is prime, then for all integers a, a to the p ≡ a mod p.

  • Proof (two cases):
    • Case 1: gcd(a, p) = 1:
      • By Fermat's Little (first form), a to the (p−1) ≡ 1 mod p.
      • Multiply both sides by a: a to the p ≡ a mod p.
    • Case 2: gcd(a, p) ≠ 1:
      • Since p is prime, the only divisors of p are 1 and p.
      • So gcd(a, p) = p, which means p divides a.
      • Then a ≡ 0 mod p, and all powers of a are also ≡ 0 mod p.
      • Therefore a to the p ≡ 0 ≡ a mod p.
  • Don't confuse: the first form requires gcd(a, p) = 1 and gives a to the (p−1) ≡ 1; the second form holds for all a and gives a to the p ≡ a.
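Both forms can be checked mechanically with Python's built-in three-argument pow (a sketch, not from the text):

```python
p = 11
# second form (Theorem 3.3.6): a^p ≡ a (mod p) for ALL a, even multiples of p
assert all(pow(a, p, p) == a % p for a in range(100))
# first form (Corollary 3.3.5): needs gcd(a, p) = 1, gives a^(p-1) ≡ 1 (mod p)
assert all(pow(a, p - 1, p) == 1 for a in range(1, p))
print("both forms of Fermat's Little Theorem verified for p =", p)
```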

🧪 Applications and primality testing

🧪 Wilson's Theorem as a primality test

  • Context from earlier in the excerpt: Wilson's Theorem states that (p−1)! ≡ −1 mod p if and only if p is prime.
  • The excerpt notes that Wilson's Theorem can be used to test primality: check if (n−1)! ≡ −1 mod n; if so, n is prime.
  • Practicality: the excerpt calls this "entirely impractical" (computing (n−1)! is very expensive for large n).
  • Significance: it is "our first example of a simple computational congruence involving an integer which can tell us that that integer is prime."

🧪 Fermat's Little Theorem as a primality test

  • Exercise 3.3.3 asks: can we build a primality test from Fermat's Little Theorem? Would it be more efficient than Wilson's?
  • The excerpt does not provide the answer but suggests:
    • Write a formal statement of the proposed test.
    • Optionally implement it in code.
    • Research whether this has been investigated and what the conclusion was (either a formal result or a counterexample).
  • Example idea: test if a to the (n−1) ≡ 1 mod n for some a with gcd(a, n) = 1; if true, n might be prime (but the converse is not guaranteed—there are composite numbers that pass this test for some bases a, called pseudoprimes).
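The proposed test from the example idea can be sketched as follows (an illustration of the exercise, not the book's answer); note how the composite 341 = 11 · 31 slips through for base 2, the classic pseudoprime counterexample:

```python
def fermat_test(n: int, a: int = 2) -> bool:
    """Base-a Fermat test: True means n *might* be prime (necessary,
    not sufficient, condition from Fermat's Little Theorem)."""
    return pow(a, n - 1, n) == 1

print(fermat_test(101))  # True: 101 is prime
print(fermat_test(91))   # False: 91 = 7 * 13 is caught by base 2
print(fermat_test(341))  # True, yet 341 = 11 * 31 -- a base-2 pseudoprime
```

Unlike Wilson's factorial, modular exponentiation is fast even for very large n; the catch is exactly the pseudoprime phenomenon shown above.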

🔄 Alternative proof approach (Section 3.4 preview)

🔄 Fermat's Little via multiples

  • The excerpt previews another proof of Fermat's Little Theorem (Section 3.4) that does not use the order concept.
  • Proof sketch for Fermat's Little (redux):
    • Consider the set M_a = {[0], [a], [2a], ..., [(p−1)a]} of multiples of [a] in Z/pZ.
    • Claim: M_a has p distinct elements.
    • Why: suppose [ja] = [ka] for 0 ≤ j, k < p. Then ja ≡ ka mod p, so p divides (j−k)a.
    • By Proposition 3.1.5 (Euclid's Lemma for primes), either p divides (j−k) or p divides a.
    • Since gcd(p, a) = 1, p does not divide a, so p divides (j−k).
    • But −p+1 < j−k < p−1, so the only multiple of p in this range is 0, meaning j = k.
    • Therefore all p elements in M_a are distinct, so M_a = Z/pZ (just a relabeling).
  • The excerpt breaks off here, but the idea is to multiply all non-zero elements together in two ways and compare, leading to a to the (p−1) ≡ 1 mod p.
  • Why this matters: this approach "usefully fills out our understanding of multiplication in Z/nZ" and is "more historically accurate" (Fermat and Euler did not use the abstract-algebra language of orders and cosets).

3.4. Another Approach to Fermat’s Little and Euler’s Theorems

🧭 Overview

🧠 One-sentence thesis

Fermat's Little Theorem and Euler's Theorem can both be proved by showing that multiplying all invertible elements in modular arithmetic by a coprime number simply rearranges those elements, allowing cancellation to reveal the power-of-a result.

📌 Key points (3–5)

  • Core strategy: construct a set of multiples of a coprime element and prove it equals the original set of invertible elements, just rearranged.
  • Key mechanism: if two multiples are equal mod n, then the original elements must be equal (using coprimality and divisibility properties).
  • Fermat's Little Theorem: for prime p and a coprime to p, multiplying all nonzero elements by a gives the same set, leading to a to the power (p minus 1) congruent to 1 mod p.
  • Euler's Theorem: for any n and a coprime to n, the same logic applies to the phi(n) invertible elements, yielding a to the power phi(n) congruent to 1 mod n.
  • Common confusion: the proofs don't compute new elements; they show that multiplying by a coprime element is a rearrangement (permutation) of the invertible set.

🔄 The rearrangement strategy

🔄 What the approach does

  • Instead of proving the theorems directly, this method examines what happens when you multiply every invertible element by a fixed coprime number a.
  • The claim: the resulting set of multiples is exactly the same as the original set of invertible elements, just in a different order.
  • Once this is established, multiplying all elements together on both sides allows algebraic cancellation to isolate the desired power of a.

🧩 Why rearrangement matters

  • If the set of multiples were different (contained new elements or missed old ones), the product equality would fail.
  • The proof hinges on showing two things:
    • Every multiple is still invertible (remains in the invertible set).
    • All multiples are distinct (no collisions).
  • Example: In Z/5Z, the nonzero elements are [1], [2], [3], [4]. Multiplying each by [2] gives [2], [4], [6]=[1], [8]=[3]—the same four elements, rearranged.
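The rearrangement claim is easy to verify by machine, for both the prime example above and a composite modulus (a sketch; names are mine, not the book's):

```python
from math import gcd

def multiples_mod(a: int, n: int):
    """Return the invertible elements mod n and the sorted set of their
    multiples by a; equality means multiplication by a is a permutation."""
    units = [x for x in range(1, n) if gcd(x, n) == 1]
    return units, sorted((x * a) % n for x in units)

units, shuffled = multiples_mod(2, 5)
print(units, shuffled)  # [1, 2, 3, 4] [1, 2, 3, 4] -- same set, rearranged

# also works for composite n, restricted to the invertible elements
units, shuffled = multiples_mod(7, 12)
assert units == shuffled  # {1, 5, 7, 11} is permuted by multiplication by 7
```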

🔢 Fermat's Little Theorem proof

🔢 Setting up the multiples

The set M_a = {[0]_p, [a]_p, [2a]_p, ..., [(p−1)a]_p} of multiples of [a]_p in Z/pZ.

  • Start with all p congruence classes: 0, a, 2a, ..., (p−1)a mod p.
  • Goal: prove this set has exactly p elements and equals Z/pZ.

🔍 Proving distinctness

  • Suppose [ja]_p = [ka]_p for some 0 ≤ j, k < p.
  • Then p divides (j − k)a.
  • By Proposition 3.1.5 (Euclid's Lemma), since gcd(p, a) = 1, either p divides (j − k) or p divides a.
  • Since a is coprime to p, p cannot divide a, so p must divide (j − k).
  • But j and k are both between 0 and p−1, so (j − k) is between −(p−1) and (p−1), and the only multiple of p in that range is 0.
  • Therefore j = k, so all p elements in M_a are distinct.
  • Since Z/pZ has exactly p elements, M_a = Z/pZ.

🧮 Multiplying and canceling

  • Multiply all nonzero elements of M_a:
    • a · 2a · ... · (p−1)a ≡ 1 · 2 · ... · (p−1) (mod p)
  • Rearrange the left side:
    • a to the power (p−1) times (p−1)! ≡ (p−1)! (mod p)
  • This means p divides (a to the power (p−1) − 1) times (p−1)!.
  • Since p is prime, gcd(p, (p−1)!) = 1, so by Proposition 3.1.5, p must divide (a to the power (p−1) − 1).
  • Conclusion: a to the power (p−1) ≡ 1 (mod p).

📜 Historical note

  • The excerpt notes this proof is similar to Euler's original proof of Fermat's Little Theorem.
  • Fermat himself gave no proof of his theorem (like his famous Last "Theorem").

🌐 Euler's Theorem proof

🌐 The invertible elements

The set (Z/nZ)* of (multiplicatively) invertible elements in Z/nZ can be written as {[b₁]_n, ..., [b_φ(n)]_n} where b₁, ..., b_φ(n) satisfy 1 ≤ b₁ < ... < b_φ(n) < n and are all relatively prime to n.

  • Unlike the prime case, not all nonzero elements mod n are invertible when n is composite.
  • Only the phi(n) elements coprime to n are invertible.
  • The proof constructs M*_a = {[b₁a]_n, ..., [b_φ(n)a]_n} and shows it equals (Z/nZ)*.

🔍 Proving M*_a is a rearrangement

Step 1: Every element of M*_a is invertible

  • Each b_j is relatively prime to n, and a is relatively prime to n.
  • Therefore b_j · a is also relatively prime to n (the primes dividing b_j·a divide either b_j or a, none of which divide n).
  • So M*_a ⊆ (Z/nZ)*.

Step 2: All elements are distinct

  • Suppose [b_j·a]_n = [b_k·a]_n for some 1 ≤ j, k ≤ φ(n).
  • Then n divides (b_j − b_k)a.
  • By Euclid's Lemma (2.1.6), since gcd(a, n) = 1, n must divide (b_j − b_k).
  • Since 1 ≤ b₁ < ... < b_φ(n) < n, we have −n+1 < b_j − b_k < n−1, and the only multiple of n in that range is 0.
  • Thus b_j = b_k, so all φ(n) elements in M*_a are distinct.

Step 3: Conclusion

  • M*_a has φ(n) distinct elements and is a subset of (Z/nZ)*, which also has φ(n) elements.
  • Therefore M*_a = (Z/nZ)*.

🧮 Multiplying and canceling

  • Multiply all elements in M*_a and in (Z/nZ)*:
    • b₁a · ... · b_φ(n)a ≡ b₁ · ... · b_φ(n) (mod n)
  • Regroup:
    • a to the power φ(n) times b₁ · ... · b_φ(n) ≡ b₁ · ... · b_φ(n) (mod n)
  • Since all the b's are invertible mod n, multiply by their inverses one by one.
  • Result: a to the power φ(n) ≡ 1 (mod n).

⚠️ Don't confuse

  • The proof does not claim that multiplying by a creates new invertible elements; it shows that multiplication by a coprime element permutes the existing invertible elements.
  • The key is that gcd(a, n) = 1 ensures both that multiples remain coprime to n and that the multiplication is reversible (no collisions).

4.1. Some Speculative History

🧭 Overview

🧠 One-sentence thesis

The invention of writing created new information security challenges—confidentiality, integrity, authentication, and non-repudiation—that early cryptosystems like the scytale attempted to address by using keys to protect messages even when encryption algorithms became known.

📌 Key points (3–5)

  • Why writing changed everything: face-to-face communication has natural controls (identity verification, direct transmission), but written messages can be intercepted, altered, or shared with unintended parties.
  • Four core security goals: confidentiality (only intended recipient reads it), integrity (message not altered), authentication (verify sender identity), and non-repudiation (sender cannot deny sending).
  • Key vs algorithm: the scytale example shows that even when the encryption method is known, a secret parameter (the key—here, the scytale's diameter) can preserve confidentiality.
  • Kerckhoffs's Principle: security should rely on keeping the key secret, not the algorithm; revealing algorithms publicly allows the crypto community to test and strengthen systems, following the scientific method.
  • Common confusion: "security through obscurity" (hiding the algorithm) is weak and ill-advised, unlike key-based security where the algorithm is public but the key remains secret.

📜 The problem writing created

📜 Loss of face-to-face controls

Before writing, when two people spoke face-to-face:

  • The listener could judge the speaker's trustworthiness.
  • Both could verify each other's identity and choose what to share.
  • Words passed directly from speaker to listener without possibility of change in transit.

With writing (invented over 5000 years ago):

  • Words frozen in physical form can be taken and shared with unintended parties.
  • If the author cannot hand the work directly to the intended reader, both parties lose certainty:
    • Is the other person who they claim to be?
    • Has the writing been changed since it was originally set down?

🎭 The Alice, Bob, and Eve framework

The excerpt formalizes these issues using characters:

  • Alice: the sender of a message.
  • Bob: the intended recipient.
  • Eve: represents the environment and potential eavesdropper who may disrupt or intrude.

Example: Alice wants to send a message to Bob, but Eve might listen in or intercept it.

🛡️ Four information security goals

🛡️ Confidentiality

Confidentiality means that only the intended recipient can extract the content of the message.

  • Alice wants only Bob to get her message, not Eve—even if Eve listens or intercepts the message in transit.

🔒 Integrity

Message integrity means that the recipient can be sure the message was not altered.

  • Bob wants to know that what he receives is what Alice wrote, not what Eve changed it into.

🆔 Authentication

Sender authentication means that the recipient can determine from the message the identity of the sender.

  • Bob wants to be sure the message actually originated with Alice.

📝 Non-repudiation

Sender non-repudiation means that the sender should not be able to deny sending that message.

  • Bob wants to be able to hold Alice to her promises.

🔄 Same-person use case

Alice and Bob may be the same person sending a message from past self to future self.

Example: Someone may want to keep records that must be confidential and have reliable integrity, even if there is a break-in to the site keeping those records.

🏛️ The scytale: an early cryptosystem

🏛️ Historical context

One of the earliest schemes attempting to achieve confidentiality was probably used around the 7th century BCE in Greece by military commanders.

  • Goal: get dispatches from far-away soldiers and send them orders in a way that even if the messenger is captured, the message will not be readable by the enemy.

🎲 How the scytale worked

A scytale was a cylindrical stick (maybe a spear shaft; the word means "baton" in ancient Greek).

Process:

  1. A long strip of parchment was wrapped in a spiral around the scytale.
  2. A message was written in parallel rows along the length of the scytale.
  3. When the strip was unwound, the letters were all jumbled and the message was unreadable.
  4. On the other end, the receiver (who also has a scytale) wound the strip in a spiral and read lengthwise—the message reappeared.
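The wrapping process can be modeled as a transposition of letters; a sketch (my model, assuming the strip length is a multiple of the key k, i.e. the number of letter rows along the rod):

```python
def scytale(text: str, k: int) -> str:
    """Write text in k parallel rows along the rod, then unwind the strip:
    the unwound strip interleaves every k-th letter.
    Assumes len(text) is a multiple of k (pad with 'X' if needed)."""
    return "".join(text[i::k] for i in range(k))

msg = "HELPMEIAMUNDERATTACK"        # 20 letters, no spaces
c = scytale(msg, 4)                  # encrypt on a rod fitting 4 rows
print(c)                             # HMMETEEURALINACPADTK
# decrypting = re-winding, which is the same operation with key 20 // 4 = 5
assert scytale(c, len(msg) // 4) == msg
```

Note how the key (here the row count, standing in for the rod's diameter) is the only secret: the jumbling procedure itself is assumed known.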

🔑 The scytale's key

Even if a spy sees scouts wrapping parchment around spears and writing along the spear (meaning Eve learns the encryption algorithm), confidentiality may be preserved.

Key: additional information used in encryption and necessary for successful decryption.

  • In the scytale case, the diameter of the scytale is the key.
  • There is conjecture that the scytale also provided simple authentication: only Alice had the matching scytale to Bob's, so if Bob could read any coherent sequence of letters at all along his scytale, he knew Alice had sent it.

⚠️ Vulnerabilities

The vulnerabilities of this cryptosystem are fairly clear. (The excerpt notes that Athens did lose the Peloponnesian War.)

🔓 Core cryptology terminology

🔓 Plaintext

Plaintext: the message that Alice wishes to send to Bob, in its actual original (and final) form.

🔐 Cipher and encryption

Cipher: an algorithm that Alice uses to transform (obfuscate) the plaintext into a form that will remain confidential even if observed by Eve while in flight.

  • We also say that Alice encrypts the (plaintext) message.

🔏 Ciphertext

Ciphertext: after the message has been encrypted, it is called the ciphertext.

🔓 Decryption

Decrypt: when Bob receives the ciphertext, he applies another algorithm to decrypt it and recover the plaintext.

📊 The basic flow

| Step | Who | Action | Form |
| --- | --- | --- | --- |
| 1 | Alice | Encrypts plaintext m to c | Plaintext → Ciphertext |
| 2 | Alice | Transmits c | Ciphertext in transit |
| 3 | Bob | Receives c | Ciphertext |
| 4 | Bob | Decrypts c to recover m | Ciphertext → Plaintext |

🔬 Kerckhoffs's Principle vs security through obscurity

🔬 Why reveal algorithms publicly

Many hundreds of years of cryptology have shown that it is better to:

  • Reveal the algorithms of your cryptosystem to the public.
  • Only keep the key for a particular communication channel (between Alice and Bob) secret.

🔬 Reasons for public algorithms

Primary reason: the author of a cryptosystem can never be sure they have thought of all the possible methods of cryptanalysis which will be used against it.

  • The system's author is better off letting the general crypto community do its best against the system.
  • Only systems which have withstood such assault should be used.
  • This is basic to the scientific method itself: scientists publish all the details of their experiments, so others can try them as well and give independent verification or find something to criticize.

🔬 Kerckhoffs's Principle

Kerckhoffs's Principle: the security of a cryptosystem should be based on the secrecy of the key but not of the algorithm.

  • Named after a set of cryptosystem design ideas written down in 1883 by a French military cryptographer.

⚠️ Security through obscurity

Kerckhoffs's Principle is held in opposition to cryptosystems that rely upon no one ever finding out the algorithms.

  • Such systems rely instead on the ill-advised security through obscurity paradigm.
  • These are thought of as very weak.

Don't confuse: keeping the algorithm secret (security through obscurity, weak) vs. keeping the key secret while the algorithm is public (Kerckhoffs's Principle, strong).


4.2. The Caesar Cipher and Its Variants

🧭 Overview

🧠 One-sentence thesis

The Caesar cipher and its variants—including Vigenère and the one-time pad—illustrate a progression from trivially breakable substitution ciphers to information-theoretically secure systems, with the trade-off being key distribution complexity.

📌 Key points (3–5)

  • Caesar cipher: shifts every letter by a fixed number k places in the alphabet; encryption and decryption use the same operation with opposite keys.
  • Vigenère cipher: uses a repeating sequence of shifts (a tuple of integers) instead of a single shift, making it essentially Caesar ciphers in parallel.
  • One-time pad: a Vigenère cipher where the key is as long as the message, random, and never reused—provably perfectly secure but suffers from key distribution problems.
  • Common confusion: ROT13 (key = 13) is self-inverse, but most Caesar keys are not; Vigenère with key length 1 is identical to Caesar.
  • Security vs practicality: one-time pads are information-theoretically secure (no computational assumptions needed), but require sharing enormous keys in advance.

🔐 The Caesar cryptosystem

🔐 How Caesar encryption works

Caesar cryptosystem: Alice removes spaces and punctuation, converts to one case, then shifts each letter k places down the alphabet (wrapping Z to A), where k is the shared secret key; Bob decrypts by shifting k places backward (encrypting with key −k).

  • The key k is a fixed integer known only to Alice and Bob.
  • Example: Julius Caesar used k = 3; his nephew Augustus used k = −1.
  • The transformation is purely mechanical: same shift for every letter.
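The shift-with-wraparound is modular arithmetic on letter positions; a minimal sketch assuming the message is already uppercase A–Z with spaces and punctuation removed, as described:

```python
def caesar(text: str, k: int) -> str:
    """Shift each letter k places down the alphabet, mod 26.
    Decryption is encryption with key -k."""
    return "".join(chr((ord(ch) - ord('A') + k) % 26 + ord('A')) for ch in text)

c = caesar("ATTACKATDAWN", 3)  # Julius Caesar's own key
print(c)                       # DWWDFNDWGDZQ
assert caesar(c, -3) == "ATTACKATDAWN"
# ROT13 (k = 13) is self-inverse: shifting by 13 twice wraps around fully
assert caesar(caesar("SPOILER", 13), 13) == "SPOILER"
```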

🔄 ROT13: the self-inverse variant

  • With a 26-letter alphabet, key k = 13 has a special property: encryption and decryption are identical.
  • Applying ROT13 twice returns the original message.
  • Modern use: early Internet chat rooms and bulletin boards used ROT13 to hide spoilers or sensitive content—not for security, but to require a deliberate action to read.
  • Don't confuse: ROT13 was never secure; some commercial products (e.g., parts of Windows) mistakenly used it for actual security despite it being "completely insecure."

🔗 The Vigenère cryptosystem

🔗 How Vigenère extends Caesar

Vigenère cryptosystem: Alice uses a key that is a tuple of integers k⃗ = (k₁, …, kₗ); she shifts each plaintext letter by the corresponding key number, cycling back to the start of the key when she runs out; Bob decrypts by shifting with the negative key −k⃗ = (−k₁, …, −kₗ).

  • The key length is a natural number; if = 1, Vigenère is exactly Caesar.
  • Traditionally, the key is a memorized word; each letter corresponds to a shift (A = 1, B = 2, etc.).
  • Why it's stronger: Vigenère is essentially ℓ Caesar ciphers running in parallel, plus the attacker doesn't know ℓ.
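The cycling-key scheme can be sketched as below, using a hypothetical key "cat" and numbering a = 0, b = 1, … for convenience (the excerpt's traditional convention is A = 1):

```python
import string
from itertools import cycle

ALPHABET = string.ascii_lowercase

def vigenere_encrypt(plaintext: str, key: list[int]) -> str:
    """Shift the i-th letter by key[i mod len(key)], cycling through the key."""
    letters = [c for c in plaintext.lower() if c in ALPHABET]
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26]
                   for c, k in zip(letters, cycle(key)))

def vigenere_decrypt(ciphertext: str, key: list[int]) -> str:
    """Decrypt by shifting with the negative key."""
    return vigenere_encrypt(ciphertext, [-k for k in key])

key = [2, 0, 19]  # the word "cat", numbered from a = 0
msg = "tobeornottobe"
assert vigenere_decrypt(vigenere_encrypt(msg, key), key) == msg
assert vigenere_encrypt("abc", [3]) == "def"  # key length 1 is exactly Caesar
```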

📜 Historical context

  • Vigenère was considered unbreakable for hundreds of years and served as the main diplomatic cipher in European courts.
  • Eventually, cryptanalysts developed methods to break it (the excerpt notes it is "ℓ + 1 times harder to crack than Caesar").

🎲 The one-time pad

🎲 What makes the one-time pad special

One-time pad: a Vigenère cryptosystem where the key is as long as the message, chosen randomly, and never reused.

  • This is the opposite extreme from ℓ = 1 (Caesar): here ℓ equals the message length.
  • The key itself is also called the "one-time pad."
  • Sometimes called "Vernam Cipher," though the excerpt notes this is historically inappropriate.

🔒 Perfect security

  • With a truly random one-time pad, the system is information-theoretically secure: security does not depend on assumptions about the attacker's computational power.
  • The proof of security is referenced to Shannon (1948, 1949).
  • Contrast with other systems: most cryptosystems are only secure if the attacker is limited to a "probabilistic polynomial-time Turing machine" (a computational complexity assumption).

⚠️ Critical requirement: never reuse the pad

  • If the same pad is used twice, an attacker can subtract one ciphertext from the other, completely removing the pad.
  • At that point, the message can usually be determined quickly.
  • Don't confuse: the one-time pad is only secure if the key is random, as long as the message, and never reused—violating any condition breaks security.

💾 Digital implementation

  • In modern digital communications, messages are bits (0s and 1s).
  • Shifting in a binary alphabet means: even shift = no change; odd shift = flip 0 ↔ 1.
  • Equivalently, encryption is adding corresponding bits modulo 2 (treating bits as elements [0], [1] in ℤ/2ℤ).
  • A good one-time pad is a very random, very long string of bits.
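The binary description above amounts to bitwise addition mod 2; a minimal sketch:

```python
import secrets

def xor_bits(bits: list[int], pad: list[int]) -> list[int]:
    """Add corresponding bits mod 2 (an even shift leaves a bit alone, odd flips it)."""
    return [(b + p) % 2 for b, p in zip(bits, pad)]

message = [0, 1, 1, 0, 1, 0, 0, 1]
pad = [secrets.randbits(1) for _ in message]  # random pad as long as the message

ciphertext = xor_bits(message, pad)
# Decryption is the same operation, since x + p + p ≡ x (mod 2):
assert xor_bits(ciphertext, pad) == message
```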

🚧 The key distribution problem

🚧 Why one-time pads are impractical

  • Key distribution: Alice and Bob must share enormous keys in advance—keys as large as all their future messages combined.
  • This makes one-time pads impractical for most real-world scenarios.

🎭 Pseudorandomness as a workaround

  • Idea: Alice and Bob share a small secret and use it to generate arbitrarily long sequences of bits that appear random to attackers.
  • These pseudorandom sequences are not truly random (Alice and Bob can both generate them from the shared secret), but they look random to everyone else.
  • If successful, pseudorandom sequences could be used as one-time pads.
  • The excerpt references Luby (1996) for more on pseudorandomness.
| Cipher | Key structure | Security | Key distribution |
| --- | --- | --- | --- |
| Caesar | Single integer k | Completely broken before 1000 CE | Easy (one number) |
| Vigenère | Tuple (k₁, …, kₗ) | Broken after centuries; ℓ + 1 times harder than Caesar | Moderate (a word) |
| One-time pad | Random sequence as long as message | Information-theoretically secure | Impractical (huge keys) |
18

First Steps into Cryptanalysis: Frequency Analysis

4.3. First Steps into Cryptanalysis: Frequency Analysis

🧭 Overview

🧠 One-sentence thesis

Frequency analysis exploits the characteristic distribution of letters in natural language to break simple substitution ciphers like Caesar and Vigenère by finding the decryption key that produces letter frequencies closest to standard English.

📌 Key points (3–5)

  • Keyspace size matters: Caesar cipher has only 26 possible keys, making brute-force attacks trivial; Vigenère with longer keys (e.g., length 5) has millions of keys, requiring automated detection of valid plaintext.
  • Frequency analysis core idea: natural language has predictable letter frequency patterns (e.g., 'e' is most common in English); ciphertexts that are shifted versions of plaintext preserve this pattern shape but shifted.
  • Automated decryption method: try all possible keys, measure how close each candidate plaintext's letter frequency is to standard English using total square error, and pick the key with minimum error.
  • Common confusion—single letter vs. full distribution: looking only at the most frequent letter (hoping it decrypts to 'e') often fails; the full frequency distribution comparison is more robust, even for short texts.
  • Vigenère weakness: when key length ℓ is unknown, the attacker tests multiple ℓ values, splits ciphertext into ℓ substrings (every ℓ-th letter), applies Caesar cracking to each, and identifies the correct ℓ by finding which produces the smallest square error against standard English.

🔑 Keyspace and brute-force attacks

🔑 What is keyspace

Keyspace: The set of all valid keys for some cryptosystem.

  • It is not the set of messages or ciphertexts; it is the set of possible keys an attacker must search.
  • Example: Caesar cipher keyspace = the alphabet (26 letters in modern Latin alphabet, 23 in Julius Caesar's time).
  • Example: Vigenère cipher with key length ℓ has N^ℓ keys (N = alphabet size); if key length is unknown but ≤ L, there are sum from j=1 to L of N^j possible keys.
  • Example: one-time pad for a message of length M has N^M possible pads.

💪 Brute-force attack (exhaustive search)

Brute-force attack (exhaustive search): trying all possible keys and checking which one yields valid plaintext.

  • For Caesar cipher: only 26 keys to try—feasible even in ancient times.
  • For Vigenère with key length 5: nearly 12 million keys—before computers, completely intractable; even with computers, a human would need to inspect 12 million candidate decryptions.
  • Why automation is needed: no human can inspect millions of candidates, so a computer program must distinguish gibberish from real messages automatically; once it can, brute force succeeds almost instantly.

📝 Message space

Message space: The set of possible messages for an encrypted communication.

  • Without structure in the message space, cryptanalysis becomes nearly impossible.
    • Example: if Alice sends only a safe combination, brute force has no way to identify the correct decryption among candidates.
  • When structure helps: if the message space is "short or long English texts," English has recognizable structure (words, letter patterns, frequency distributions) that can be automated.

📊 Frequency analysis fundamentals

📊 Why frequency analysis works

  • Natural language texts have characteristic letter frequency distributions.
    • Example: in Shakespeare's plays, 'e' is the most common letter by a noticeable margin (see Figure 4.3.1 in the excerpt).
  • A simple substitution cipher (like Caesar) shifts the entire distribution but preserves its shape.
    • Example: if plaintext 'e' is most common, ciphertext letter corresponding to 'e' will also be most common (assuming the text is long enough).
  • Key insight: ciphertext that "does not look like English" often means it has the wrong letter frequencies—too few vowels, wrong letter combinations, etc.

🔍 Naive approach: most frequent letter only

  • Idea: assume the most frequent letter in ciphertext decrypts to 'e'; use the Caesar key that maps that letter to 'e'.
  • Why it fails: the excerpt shows an example where this yields gibberish:
    ezmpzcyzeezmpesletdespbfpdetzy
    hspespcetdyzmwpctyespxtyoezdfqqpc
    ...
    
    "Not so good. Apparently looking only at the most frequent letter is insufficient: the most frequent letter in the plaintext of this message was not 'e'."
  • Lesson: short texts or unusual word choices can make the most frequent letter something other than 'e'.

📏 Better approach: full distribution + distance measure

  • Instead of looking only at the most frequent letter, compare the entire frequency distribution of each candidate decryption to standard English.
  • Distance measure: total square error.

Total square error: a distance between two distributions f and g given by d(f, g) = sum from j=0 to 25 of (f(j) − g(j))^2, where letters are numbered 'a'=0, 'b'=1, ..., 'z'=25.

  • This is equivalent to least squares in statistics.
  • Method: try all possible Caesar keys (only 26), compute square error for each, pick the key with minimum error.
  • Example: for the ciphertext at the beginning of the section, the minimal square error comes from decryption key 23 (encryption key 3), yielding:
    tobeornottobethatisthequestion
    whethertisnoblerinthemindtosuffer
    ...
    
    which is recognizable English (Hamlet).
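The whole method fits in a few lines of Python; this sketch assumes a standard published table of approximate English letter frequencies (the exact values below are illustrative, not taken from the excerpt):

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

# Approximate relative frequencies of letters in English text, 'a' through 'z'.
ENGLISH = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
           0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
           0.063, 0.091, 0.028, 0.010, 0.024, 0.002, 0.020, 0.001]

def frequencies(text: str) -> list[float]:
    counts = Counter(text)
    return [counts[c] / len(text) for c in ALPHABET]

def square_error(f: list[float], g: list[float]) -> float:
    """d(f, g) = sum over j of (f(j) - g(j))^2."""
    return sum((fj - gj) ** 2 for fj, gj in zip(f, g))

def crack_caesar(ciphertext: str) -> int:
    """Try all 26 decryption shifts; return the one whose output best fits English."""
    def shift(text, k):
        return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in text)
    return min(range(26),
               key=lambda k: square_error(frequencies(shift(ciphertext, k)), ENGLISH))

plaintext = "tobeornottobethatisthequestionwhethertisnoblerinthemindtosuffer"
ciphertext = "".join(ALPHABET[(ALPHABET.index(c) + 3) % 26] for c in plaintext)
assert crack_caesar(ciphertext) == 23  # decryption key 23 undoes encryption key 3
```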

🎯 Robustness with short texts

  • The excerpt emphasizes that this approach works even for quite short texts, which might have frequency distributions far from the standard.
  • Example: ciphertext rkkrtbrkeffespkyvtifjjifruj (very short) still yields correct decryption key 9 (encryption key 17), producing attackatnoonbythecrossroads.
  • Why it works: even short texts tend to have frequency distributions closer to standard English than to random gibberish.

🔐 Breaking the Vigenère cipher

🔐 The challenge: unknown key length

  • If the key length ℓ were known, breaking Vigenère would be straightforward:
    • Divide ciphertext into ℓ substrings: 1st, (ℓ+1)-th, (2ℓ+1)-th, ... letters form substring 1; 2nd, (ℓ+2)-th, ... form substring 2; etc.
    • Each substring is Caesar-encrypted with one letter of the key.
    • Apply the Caesar cracker (frequency analysis) to each substring to recover the ℓ digits of the key.
  • Problem: Eve does not know ℓ in advance.

🌊 Why Vigenère resists simple frequency analysis

  • The excerpt shows an example: Hamlet encrypted with a 35-letter Vigenère key hippopotomonstrosesquipedaliophobia.
  • The letter frequency distribution (Figure 4.3.5) is much flatter than Caesar ciphertext (Figure 4.3.2):
    • No zero values, no letters much more common than others.
    • Reason: the distribution is the average of many different shifts (one for each letter in the key), which smooths out the characteristic shape of English.
  • This makes it hard to identify the correct decryption shift by eye.

🧪 Testing wrong key lengths

  • If Eve guesses a wrong key length (e.g., ℓ = 5 when the true length is 35), the substrings she extracts will mix letters encrypted with different key letters.
  • Example: letters at positions congruent to 1 (mod 5) in the Hamlet ciphertext, when analyzed with Caesar cracking, produce a graph (Figure 4.3.6) where:
    • The minimum square error is at decryption key 23, but it is not significantly less than other possibilities.
    • All errors are around ten times higher than in correct Caesar decryptions (compare Figure 4.3.3).
  • Interpretation: the frequency distribution is still too flat (averaged over multiple shifts), so the square error remains large.

🛠️ Vigenère cryptanalytic approach (step-by-step)

  1. Choose a bound L: based on available computer time and ensuring that every ℓ-th letter yields enough letters for frequency analysis.
  2. For each candidate key length ℓ from 1 to L:
    • Extract ℓ substrings: substring k consists of letters at positions k, k+ℓ, k+2ℓ, ... (for k = 1, ..., ℓ).
    • Apply the Caesar cracker to each substring to find the optimal Caesar key for that substring.
    • This produces a candidate Vigenère key k_ℓ = (k_{ℓ,1}, ..., k_{ℓ,ℓ}).
  3. Decrypt the entire ciphertext with each candidate key k_ℓ.
  4. Compute the square error d_ℓ between the decrypted text's letter frequency and standard English.
  5. Report the ℓ and k_ℓ with the smallest d_ℓ as the true key length and key.

Why this works: the correct ℓ will produce substrings that are true Caesar encryptions, so their frequency distributions will be close to standard English (small square error); wrong ℓ values will produce flat, averaged distributions (large square error).
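The five steps above can be sketched by running a Caesar cracker on each substring; as before, the frequency table is a standard approximation, not the excerpt's own figures:

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

# Approximate relative frequencies of letters in English text, 'a' through 'z'.
ENGLISH = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
           0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
           0.063, 0.091, 0.028, 0.010, 0.024, 0.002, 0.020, 0.001]

def frequencies(text):
    counts = Counter(text)
    return [counts[c] / len(text) for c in ALPHABET]

def square_error(f, g):
    return sum((fj - gj) ** 2 for fj, gj in zip(f, g))

def shift(text, k):
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in text)

def crack_caesar(text):
    return min(range(26),
               key=lambda k: square_error(frequencies(shift(text, k)), ENGLISH))

def crack_vigenere(ciphertext: str, max_len: int) -> list[int]:
    """Steps 1-5: for each candidate length, crack each substring, keep the best fit."""
    best_err, best_key = float("inf"), None
    for ell in range(1, max_len + 1):
        # Every ell-th letter was shifted by the same key digit (a Caesar cipher).
        key = [crack_caesar(ciphertext[i::ell]) for i in range(ell)]
        plain = "".join(ALPHABET[(ALPHABET.index(c) + key[i % ell]) % 26]
                        for i, c in enumerate(ciphertext))
        err = square_error(frequencies(plain), ENGLISH)
        if err < best_err:
            best_err, best_key = err, key
    return best_key

hamlet = "tobeornottobethatisthequestionwhethertisnoblerinthemindtosuffer" * 3
ciphertext = "".join(ALPHABET[(ALPHABET.index(c) + [3, 11][i % 2]) % 26]
                     for i, c in enumerate(hamlet))
key = crack_vigenere(ciphertext, max_len=6)
assert key == [23, 15]  # decryption shifts undoing encryption key (3, 11)
```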

⚠️ Don't confuse: Vigenère vs. Caesar frequency patterns

| Cipher | Frequency distribution shape | Why |
| --- | --- | --- |
| Caesar | Shifted version of standard English; peaks and valleys preserved | Single shift applied to all letters |
| Vigenère (wrong ℓ guess) | Flat, smoothed; no clear peaks | Averaging over multiple shifts mixes the pattern |
| Vigenère (correct ℓ, correct substring) | Shifted version of standard English | Each substring is a true Caesar encryption |

🧩 Practical considerations and limitations

🧩 Ambiguity in short texts

  • Exercise 4.3.1 asks: can a Caesar ciphertext have two decryptions that both seem like valid English?
  • Implication: very short texts might yield multiple plausible plaintexts, making unique decryption uncertain.
  • Exercise 4.3.2 asks whether this is harder or easier for Vigenère.
    • Likely harder: more keys to test, and longer keys mean more possible ambiguous decryptions.

🔄 Combining cryptosystems (Exercise 4.3.3)

  • Naive idea: encrypt with Caesar, then re-encrypt the ciphertext with Vigenère (or vice versa) using different keys—does this make the system "substantially more secure"?
  • The excerpt does not provide the answer but poses the question: will this result in a substantially more secure cryptosystem? Why or why not?
  • Hint for reasoning: both Caesar and Vigenère are substitution ciphers; composing them may not add much security if the combined effect is still a substitution cipher with a predictable structure.

📚 Historical note

Frequency analysis in cryptanalysis was apparently first described by the Muslim philosopher and mathematician al-Kindi in his 9th century work Manuscript on Deciphering Cryptographic Messages.

  • Al-Kindi also brought Hindu numbers (with place-value notation) into the Muslim world.
  • This shows that frequency analysis is a centuries-old technique, long predating modern computers.
19

Public-Key Crypto: the RSA Cryptosystem

4.4. Public-Key Crypto: the RSA Cryptosystem

🧭 Overview

🧠 One-sentence thesis

Public-key cryptography solves the problem of secure communication between parties who have never met by using separate encryption and decryption keys, with RSA relying on the computational difficulty of factoring large composite numbers to ensure security.

📌 Key points (3–5)

  • The core problem: symmetric (private-key) systems require Alice and Bob to exchange a secret key in advance, which is impossible if they've never met.
  • How asymmetric systems work: Bob publishes an encryption key publicly but keeps the decryption key private; anyone can encrypt messages to Bob, but only Bob can decrypt them.
  • Why RSA is secure: multiplying two large primes is fast, but factoring the product back into primes is computationally infeasible—this is a one-way function.
  • Common confusion: encryption and decryption keys are different in public-key systems (asymmetric), whereas they are the same in symmetric systems.
  • Practical requirements: the message space must be large enough (often using cryptographic salt) to prevent Eve from simply encrypting all possible messages and comparing them to intercepted ciphertexts.

🔐 Symmetric vs asymmetric cryptosystems

🔐 Symmetric (private-key) systems

Symmetric cipher (symmetric cryptosystem): a system with a message space M, a keyspace K, an encryption algorithm that creates ciphertext c = e_k(m), and a decryption algorithm that recovers the message d_k(c) such that d_k(e_k(m)) = m for all keys k and messages m.

  • Both Alice and Bob use the same key k for encryption and decryption.
  • They must agree on this key through private communication before using the system.
  • Also called private key cryptosystems because the key must remain secret.
  • Security requirement: the keyspace K must be large, or the system must make it hard to tell if a decryption is valid, to resist brute-force attacks (Eve trying every possible key).

Example: Alice and Bob meet in person, agree on key k, then later Alice encrypts message m_A as c_A = e_k(m_A), sends c_A over a public network, and Bob decrypts it as m_A = d_k(c_A).

🔓 Asymmetric (public-key) systems

Asymmetric cipher (asymmetric cryptosystem): a system with a message space M, an encryption keyspace K_e, a decryption keyspace K_d, a one-to-one function E associating decryption keys to encryption keys, an encryption algorithm c = e_{k_e}(m), and a decryption algorithm d_{k_d}(c) such that d_{k_d}(e_{E(k_d)}(m)) = m for all keys and messages.

  • Bob picks a decryption key k_d (his private key) and keeps it secret.
  • Bob computes the corresponding encryption key k_e = E(k_d) (his public key) and publishes it (e.g., on his website).
  • Alice downloads Bob's public key k_e, encrypts her message m as c = e_{k_e}(m), and sends c to Bob.
  • Only Bob, with his private key k_d, can decrypt: m = d_{k_d}(c).
  • Also called public-key cryptosystems.

Don't confuse: In symmetric systems, the same key encrypts and decrypts; in asymmetric systems, different keys are used, and only the encryption key is public.

📊 Comparison table

| Feature | Symmetric | Asymmetric |
| --- | --- | --- |
| Key exchange | Requires private meeting or channel | No prior meeting needed |
| Encryption key | Same as decryption key | Different from decryption key |
| Key secrecy | Both parties must keep key secret | Only decryption key is secret |
| Public information | Algorithms and keyspaces | Algorithms, keyspaces, and encryption key |

🧂 Protecting the message space

🧂 The small-message-space attack

  • Since Eve knows Bob's public encryption key k_e and the encryption algorithm, if the message space M is small, she can encrypt every possible message and compare the results to an intercepted ciphertext c.
  • Example: if M has only 1000 possible messages, Eve computes e_{k_e}(m) for all 1000 messages and finds which one matches c.

🧂 Cryptographic salt

Cryptographic salt: random data added to a message before encryption, which is automatically removed after decryption.

  • Salting makes the effective message space much larger (messages plus all possible random salt values).
  • Additional benefit: even if Alice sends the same plaintext message twice, the ciphertext will be different each time (because the salt is random).
  • This prevents traffic analysis: Eve cannot tell when the same message is sent on separate occasions, and cannot correlate repeated ciphertexts with real-world events to learn their meaning.
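A minimal sketch of salting as described (the sizes and function names are ours):

```python
import secrets

SALT_BYTES = 8  # illustrative size; real protocols choose their own

def add_salt(message: bytes) -> bytes:
    """Prepend random bytes before encryption; stripped again after decryption."""
    return secrets.token_bytes(SALT_BYTES) + message

def remove_salt(padded: bytes) -> bytes:
    return padded[SALT_BYTES:]

m = b"attack at noon"
assert remove_salt(add_salt(m)) == m
# Identical plaintexts get different salts, so their ciphertexts would differ too
# (this assertion fails only with probability 2**-64):
assert add_salt(m) != add_salt(m)
```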

🔑 One-way functions and RSA foundations

🔑 What makes public-key crypto possible

[Cryptographic] one-way function: a function f: A → B that is one-to-one and can be computed in a reasonable amount of time on a standard (classical) computer, but whose inverse cannot feasibly be computed.

  • Bob must be able to compute E (the function from decryption keys to encryption keys) during setup.
  • But if Eve can compute the inverse of E, she can recover Bob's private key from his public key, breaking the system.
  • Feasible computation in cryptology means time less than a polynomial function of the logarithm of the input size.

🔢 Multiplication vs factoring

  • Multiplying two numbers (even very large ones) is fast on a computer.
  • Testing if a number is prime or composite is also surprisingly fast (a result from 2004).
  • Factoring large composite numbers into their prime factors is computationally infeasible—no one has found a fast algorithm.
  • Example: RSA-768, a 232-digit composite number, took hundreds of computers about two years (in 2009) to factor; larger numbers are astronomically harder.
  • Therefore, multiplying two large primes to get a composite is a good candidate for a one-way function.

Don't confuse: Determining whether a number is prime is fast; finding the prime factors of a composite number is slow.

🔐 The RSA cryptosystem

🔐 RSA setup (Bob's side)

RSA modulus: the product n = pq of two very large primes p and q.

RSA exponent: a number e (typically with few 1's in binary, like 3 or 65537) satisfying gcd(e, φ(n)) = 1.

RSA public (encryption) key: the pair k_e = (n, e).

RSA private (decryption) key: the pair k_d = (n, d), where d = e^(−1) (mod φ(n)) (the modular inverse of e modulo φ(n)).

Bob's steps:

  1. Pick two very large primes p and q.
  2. Compute the RSA modulus n = pq.
  3. Pick an RSA exponent e with gcd(e, φ(n)) = 1.
  4. Publish the public key k_e = (n, e) (e.g., on his website).
  5. Compute d = e^(−1) (mod φ(n)) using the Extended Euclidean Algorithm.
  6. Keep the private key k_d = (n, d) secret.

🔐 RSA encryption and decryption

  • Message space: M = {m ∈ Z | 0 ≤ m < n}.
  • Encryption (Alice's side): c = e_{(n,e)}(m) = m^e (mod n).
  • Decryption (Bob's side): d_{(n,d)}(c) = c^d (mod n).

Why it works: The proof shows that for any message m, d_{(n,d)}(e_{(n,e)}(m)) = m. The key insight is that e · d ≡ 1 (mod φ(n)), so e · d = 1 + k·φ(n) for some integer k. By Euler's Theorem, m^(φ(n)) ≡ 1 (mod n) when gcd(m, n) = 1, so m^(ed) = m^(1 + k·φ(n)) ≡ m · (m^(φ(n)))^k ≡ m · 1^k ≡ m (mod n).

Example: Alice downloads Bob's public key (n, e), computes c = m^e (mod n), and sends c. Bob receives c and computes m = c^d (mod n) using his private key.
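A toy run with deliberately tiny primes shows the round trip (hopelessly insecure at this size; real primes are hundreds of digits long). Python's three-argument pow performs fast modular exponentiation, and pow(e, -1, phi) computes the modular inverse:

```python
from math import gcd

# Toy parameters (real RSA primes are hundreds of digits long).
p, q = 61, 53
n = p * q                  # RSA modulus n = 3233
phi = (p - 1) * (q - 1)    # phi(n) = 3120
e = 17                     # RSA exponent with gcd(e, phi(n)) = 1
assert gcd(e, phi) == 1
d = pow(e, -1, phi)        # private exponent d = e^(-1) mod phi(n)

m = 65                     # a message with 0 <= m < n
c = pow(m, e, n)           # Alice: c = m^e mod n
assert pow(c, d, n) == m   # Bob: c^d mod n recovers m
```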

🔐 The association function E

  • The function E: K_d → K_e maps decryption keys to encryption keys.
  • For RSA: E((n, d)) = (n, d^(−1) (mod φ(n))), since d and e are inverses of each other modulo φ(n).
  • This function is one-to-one.
  • Computing E requires knowing φ(n) = (p − 1)(q − 1), which requires knowing the prime factors p and q.
  • Security: Eve knows n and e (the public key), but to find d she would need to compute φ(n), which requires factoring n—the computationally infeasible step.

⚙️ Practical feasibility of RSA

⚙️ Why RSA is practical

The excerpt walks through each step to show that RSA can be implemented efficiently:

| Step | Task | Why it's feasible |
| --- | --- | --- |
| 1 | Choose large primes p and q | Algorithms exist to test primality in time polynomial in log(k) |
| 2 | Ensure enough primes exist | Prime Number Theorem: probability a random k is prime ≈ 1/ln(k); for 200-digit numbers, test ~461 candidates |
| 3 | Compute φ(n) = (p−1)(q−1) | Nearly instant using the formula |
| 4 | Find e with gcd(e, φ(n)) = 1 | Many such e exist; probability two random numbers are coprime is 6/π²; testing uses Euclidean Algorithm (feasible) |
| 5 | Compute d = e^(−1) (mod φ(n)) | Extended Euclidean Algorithm is feasible |
| 6 | Encrypt/decrypt (exponentiation mod n) | Fast modular exponentiation algorithm is polynomial in log of inputs |

Feasible computation: time less than a polynomial function of the logarithm of the input numbers.

⚙️ Encoding real messages

  • Alice's actual message (text, not just a number) must be converted to numbers in the range [0, n).
  • ASCII code (1960): 7-bit binary numbers for Latin alphabet letters (upper/lower case), symbols, and control characters; stored in 8 bits (1 byte) with a leading 0.
  • Unicode (version 6.3, 2013): over 110,000 characters for global alphabets and writing systems; stored in encodings like UTF-8 or UTF-16 (2–4 bytes per character).
  • Process: Alice writes her message in ASCII or Unicode, concatenates all bytes, then cuts the result into blocks of b bits, where b is the largest value such that 2^b < n. Each block represents an integer m ∈ [0, n), which can be encrypted with RSA.
  • Space must be left for cryptographic salt within these blocks.

Example: If n is a 2048-bit number, Alice might use blocks of 2040 bits (leaving 8 bits for salt), encrypt each block separately, and send the sequence of ciphertexts.
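A naive sketch of the cutting-into-blocks step (zero-padding the last block; a real scheme would record the message length and reserve room for salt):

```python
def text_to_blocks(message: str, n: int) -> list[int]:
    """Encode UTF-8 bytes, then cut into b-bit integers with 2**b < n."""
    b = n.bit_length() - 1          # largest b with 2**b < n (n is odd for RSA)
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    bits += "0" * (-len(bits) % b)  # naive padding of the final partial block
    return [int(bits[i:i + b], 2) for i in range(0, len(bits), b)]

blocks = text_to_blocks("to be or not to be", n=2**64 + 13)
assert all(0 <= m < 2**64 + 13 for m in blocks)  # every block is a valid RSA message
```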

🖊️ Digital signatures

🖊️ The signature problem

  • Public-key systems enable new use-cases beyond encryption.
  • Scenario: Bob wants to send a legally binding contract (e.g., a signed lease) to Larry via email.
  • Larry needs confidence that the email truly originated from Bob, not an imposter.

🖊️ How RSA enables signatures

  • Bob takes the document m (e.g., the lease plus his agreement and identifying information).
  • Bob applies his decryption algorithm using his private key k_d: s = d_{k_d}(m).
  • This s is Bob's signature on message m.
  • Bob emails both m and s to Larry.
  • Larry receives (m, s), detaches the signature s, and computes e_{k_e}(s) using Bob's public key k_e (from Bob's website).
  • Since e_{k_e} and d_{k_d} are inverses, the result should be m.
  • If e_{k_e}(s) = m, Larry knows that whoever created s must have had access to Bob's private key—so the message came from Bob.

Why it works: Only Bob knows his private key k_d, so only Bob can compute s = d_{k_d}(m). Anyone can verify the signature by checking e_{k_e}(s) = m using Bob's public key.

Don't confuse: For encryption, Alice uses Bob's public key to encrypt; for signatures, Bob uses his own private key to sign, and others use his public key to verify.

20

Digital Signatures

4.5. Digital Signatures

🧭 Overview

🧠 One-sentence thesis

Digital signatures use public-key cryptography to authenticate electronic documents by allowing a sender to sign a message with their private key so that anyone with the public key can verify the signature came from that sender.

📌 Key points (3–5)

  • Core mechanism: Bob signs a message by applying his decryption (private) key to the message; anyone can verify by applying Bob's encryption (public) key and checking if the result matches the original message.
  • Why hash functions are needed: signing the entire message doubles its size, so instead Bob signs a small "digest" produced by a cryptographic hash function that depends on every part of the message.
  • What makes a good hash function: it must be easy to compute, but infeasible to reverse (pre-image resistance), find another input with the same hash (second pre-image resistance), or find any two inputs with the same hash (collision resistance).
  • Common confusion: the signature is created by applying the private (decryption) key, not the public (encryption) key—this reverses the usual encryption flow.
  • Terminology shift: in the signature context, the private key is called the "signing key" and the public key is called the "verification key."

🔏 How digital signatures work

🔏 Basic signature creation and verification

The excerpt describes a scenario where Bob wants to send a legally binding lease agreement to Larry:

  • Bob's process:

    • Takes the message m (the lease plus his agreement and identifying information).
    • Applies his decryption algorithm using his private key k_d, yielding s = d_{k_d}(m).
    • This s is Bob's signature.
    • Sends both m and s to Larry.
  • Larry's verification:

    • Detaches the signature s.
    • Applies Bob's encryption algorithm using the public key k_e (downloaded from Bob's website).
    • Computes e_{k_e}(s).
    • Since encryption and decryption are inverses and order doesn't matter, the result should equal m.
    • If e_{k_e}(s) = m, Larry accepts; otherwise, he rejects.

Why this works: only someone with access to Bob's secret key could have created s such that encrypting it with the public key yields m.

Example: If an imposter tried to forge a signature, they would need Bob's private key to create an s that passes verification—without it, e_{k_e}(s) will not equal m.

⚠️ The size problem

  • The basic scheme doubles the message size because both m and s (which is the same size as m) must be transmitted.
  • The excerpt calls this "one problem with this scheme" and motivates the need for hash functions.

🧮 Cryptographic hash functions

🧮 What a hash function does

A cryptographic hash function: a function h that takes arbitrary-length bit strings as input and produces fixed-length bit strings as output.

  • The hash h(m) is a small "digested piece of data" that depends on every part of the input m.
  • Bob signs h(m) instead of m, making the signature much smaller.

Why every part matters: if h(m) depended only on (say) the first 100 bits, a malicious attacker could alter the rest of the message undetected.

🛡️ Four resistance properties

The excerpt defines four requirements for a cryptographic hash function:

| Property | Definition | Why it matters |
| --- | --- | --- |
| Ease of computation | Feasible to compute h(m) for any m | Must be practical to use |
| Pre-image resistance | Given hash value t, infeasible to find m such that h(m) = t | Prevents reversing the hash |
| Second pre-image resistance | Given m₁, infeasible to find different m₂ such that h(m₂) = h(m₁) | Prevents forging a signature on a different message |
| Collision resistance | Infeasible to find any two messages m₁ and m₂ such that h(m₂) = h(m₁) | Prevents creating two messages with the same signature |

"Feasible" and "infeasible" mean the same as in the previous section: computation time bounded (or not) by a polynomial function of input size.

🔍 Why collisions must exist but shouldn't be findable

  • The excerpt notes that "since a hash function takes inputs of arbitrary length but has a fixed output size, there will necessarily be an infinite number of collisions."
  • The goal is not to eliminate collisions (impossible) but to make them infeasible to find.

Don't confuse: collision existence (guaranteed by mathematics) vs. collision resistance (a computational hardness requirement).

🎨 The "black art" of hash design

The excerpt explains why hash functions are hard to design:

  • If a candidate hash has "clear structure (usually mathematical)" or is very fast to compute, the cryptological community usually finds a way to break one of the resistance requirements.
  • Current widely-used algorithms "tend to be very ad hoc computations that just seem messy and have resisted attempts at inversion or breaking resistance."

📚 Real-world hash function examples

📚 MD5 (early 1990s–2004)

  • Developed by Ron Rivest, published 1992.
  • Output size: 128 bits.
  • Thought flawed since mid-1990s; real attack published 2004 showing it is not collision resistant.
  • Still used today to verify non-malicious data corruption (called a "fingerprint" in this context).

Don't confuse: MD5 is broken for cryptographic purposes (malicious attacks) but still useful for detecting accidental transmission errors.

📚 SHA-1 (late 1990s–2010)

  • Output size: 160 bits.
  • Developed by US National Security Agency (NSA) in a semi-public process.
  • Adopted by US National Institute of Standards and Technology (NIST) as part of Federal Information Processing Standards.
  • 2004: work published indicating vulnerability to certain attacks.
  • 2010: NIST required many US federal applications to move to another hash function.

📚 SHA-2 / SHA-256 (current)

  • At the time of the excerpt's writing, most security-conscious users and organizations recommend SHA-2, usually the "SHA-256" variant.
  • Output size: 256 bits.
  • The excerpt notes concern that NSA participated in SHA-2 development, given "recent revelations of the NSA's involvement – and weakening of – cryptographic protocols."
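For a quick look at a hash in practice, Python's hashlib exposes SHA-256; note the fixed output size and that changing a single character scrambles the digest:

```python
import hashlib

d1 = hashlib.sha256(b"to be or not to be").hexdigest()
d2 = hashlib.sha256(b"to be or not to bf").hexdigest()  # one character changed

assert len(d1) == 64  # 256 bits = 64 hex digits, whatever the input length
assert d1 != d2       # the digest depends on every part of the input
```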

🔐 RSA digital signatures (formal definition)

🔐 Setup and signing process

The excerpt provides a formal definition:

Bob's setup:

  • Performs RSA setup (as in Definition 4.4.5).
  • Chooses a cryptographic hash function h.
  • Publishes both his public encryption key k_e and the description of h on his website.

Signing a message:

  • Bob wants to sign message m.
  • Computes the signature: s = d_{k_d}(h(m)) (applies private key to the hash).
  • Transmits both m and s to the recipient (Charlie).

🔐 Verification process

Charlie's verification:

  • Receives signed message (m, s) claiming to be from Bob.
  • Downloads Bob's public key k_e and hash function description h from Bob's website.
  • Computes e_{k_e}(s) and compares it to h(m).
  • If they are equal, accepts the signature; if not, rejects.
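The sign/verify flow can be sketched with toy RSA parameters (hopelessly small; real moduli are thousands of bits). Reducing the digest mod n is a toy simplification here; real schemes pad a digest that is already smaller than n:

```python
import hashlib

# Toy RSA keys (illustrative sizes only).
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)  # Bob's signing (private) exponent

def digest(m: bytes) -> int:
    # Toy simplification: reduce the 256-bit SHA-256 digest mod n so it fits.
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n

def sign(m: bytes) -> int:
    return pow(digest(m), d, n)          # s = d_kd(h(m))

def verify(m: bytes, s: int) -> bool:
    return pow(s, e, n) == digest(m)     # accept iff e_ke(s) equals h(m)

s = sign(b"I agree to the lease")
assert verify(b"I agree to the lease", s)
```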

🏷️ Terminology for signature context

The excerpt introduces new names for the keys:

  • Bob's private/decryption key k_d is called the signing key.
  • Bob's public/encryption key k_e is called the verification key.

This reflects their roles in the signature workflow rather than the encryption workflow.

21

Man-In-The-Middle Attacks, Certificates, and Trust

4.6. Man-In-The-Middle Attacks, Certificates, and Trust

🧭 Overview

🧠 One-sentence thesis

Public-key cryptography's security depends on reliably linking a real person's identity to their public key, a gap that man-in-the-middle attacks exploit and that certificate authorities or webs of trust attempt to close.

📌 Key points (3–5)

  • The core vulnerability: public-key crypto assumes the bits claiming to be from Bob actually are from Bob, but an active attacker can substitute their own key.
  • Man-in-the-middle attack: Eve intercepts Bob's public key, substitutes her own, decrypts Alice's message, then re-encrypts it with Bob's real key—violating confidentiality and integrity while appearing normal.
  • Certificate authorities (CAs): trusted third parties that digitally sign public keys to certify ownership, forming a public key infrastructure (PKI).
  • Web of trust: an alternative where individuals personally verify and sign each other's keys, creating chains of trust without central authority.
  • Common confusion: symmetric vs asymmetric key exchange—symmetric systems gain trust through in-person key exchange, while asymmetric systems must solve the identity-to-key binding problem over the network.

🕳️ The fundamental gap in public-key cryptography

🕳️ Person vs bits claiming to be from that person

  • The excerpt identifies a "dangerous gap" between casual assumptions and reality.
  • The distinction: there is a difference between "Bob, the person" and "the bits arriving over the wire to Alice or Larry which claim to be from Bob."
  • This gap matters little if Eve is only a passive observer, but becomes critical if Eve can actively replace communications.

🔓 When symmetric crypto has an advantage

  • Symmetric cryptosystems require Alice and Bob to meet in person to exchange keys.
  • At that "perfect, prelapsarian moment," they can verify each other's identity directly—they wouldn't exchange keys without seeing "lots of official-looking identification materials."
  • Asymmetric systems lack this built-in identity verification step.

🎭 Man-in-the-middle attack mechanism

🎭 How the attack works

Man-in-the-middle attack: an active network attack where Eve intercepts and replaces public keys, allowing her to decrypt, read, and modify messages while making both parties believe communication is normal.

Step-by-step scenario:

  1. Alice requests Bob's public key from his web page.
  2. Eve intercepts Bob's web server response containing his real public key k_B^e.
  3. Eve substitutes her own key k_E^e (for which only she knows the decryption key k_E^d) into the web page data.
  4. Alice encrypts her message m using what she thinks is Bob's key: c_A = encrypt with k_E^e (m).
  5. Eve intercepts c_A, decrypts it with her private key k_E^d to read m, and stores it.
  6. Eve re-encrypts the message (possibly modified to m') with Bob's real public key: c_E = encrypt with k_B^e (m').
  7. Bob receives c_E, decrypts it with his private key k_B^d, and reads a sensible message that appears to come from Alice.
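A minimal simulation of the seven steps, using assumed toy RSA keypairs for Bob and Eve (the numbers are illustrative, not from the text):

```python
# Toy RSA keypairs for Bob and Eve (assumed small parameters, for
# illustration only; the keys and message are not from the text).
def keypair(p, q, e):
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e), (n, pow(e, -1, phi))   # (public, private)

def crypt(key, m):
    # RSA is the same modular exponentiation in both directions.
    n, exp = key
    return pow(m, exp, n)

bob_pub, bob_priv = keypair(61, 53, 17)
eve_pub, eve_priv = keypair(67, 71, 17)

m = 42                       # Alice's message, encoded as a small number

# Steps 1-4: Eve swaps keys, so Alice unknowingly encrypts with Eve's key.
c_A = crypt(eve_pub, m)

# Step 5: Eve decrypts and reads (and could modify) the message...
m_read = crypt(eve_priv, c_A)

# Step 6: ...then re-encrypts it with Bob's real public key.
c_E = crypt(bob_pub, m_read)

# Step 7: Bob decrypts a perfectly sensible message; nothing looks wrong.
assert m_read == m and crypt(bob_priv, c_E) == m
```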

💥 What Eve achieves

  • Confidentiality violation: Eve reads all messages between Alice and Bob.
  • Integrity violation: Eve can modify messages "any time she liked, still making it appear to come legitimately from Alice."
  • Invisibility: both Alice and Bob think everything is working normally, so they continue their "revealing chatter."
  • From Bob's perspective, the decrypted message makes perfect sense and Alice behaves off-line as if she sent it—which she did, just not in the encrypted form Bob received.

⚠️ Don't confuse

  • This is not a passive eavesdropping attack—Eve must actively intercept and replace messages.
  • The attack doesn't break the encryption itself; it exploits the lack of verified identity-to-key binding.

🏛️ Certificate authorities and PKI

🏛️ How CAs establish trust

Certificate authority (CA): a trusted third party that issues digital certificates for public keys by signing them with the CA's signing key.

  • Individuals or organizations request certificates from a CA.
  • The certificate is a digital signature on the public key itself.
  • The CA's verification key is "assumed widely disseminated across the Internet or perhaps built into the basic operating system software or computer hardware."

🔐 Using certificates

  • When Alice wants to use Bob's public key, she first checks the validity of the associated certificate.
  • If the certificate is signed by a trusted third party (TTP) whose public key Alice already has, she can verify that the key belongs to Bob "at least as far as the TTP knew."
  • This transfers trust: Alice trusts the CA, the CA vouches for Bob's key, so Alice can trust the key.

🏗️ Public key infrastructure (PKI)

Public key infrastructure (PKI): the entire aggregate of certificates, CAs, pre-installed or widely distributed verification keys, etc.

🚧 Limitations in practice

  • The excerpt notes that CAs often "cannot do more than certify that a certain public key is owned by an individual who has access to a certain e-mail address, or the webmaster password to a certain site."
  • This is "purely Internet-based" verification—connecting to a real external identity would require checking government-issued ID, which is "rarely done."
  • The excerpt suggests governments could act as CAs, with IDs containing RFID chips that transmit certificates (as recent US passports do).

🕸️ Web of trust alternative

🕸️ Peer-to-peer trust model

Web of trust: an approach where individuals who know each other personally sign each other's public keys, creating chains of signatures instead of relying on central authorities.

  • Favored by "those who mistrust authority but are willing to trust some others they know personally."
  • Individuals with "faith in each other's good cryptologic habits" sign each other's public keys.
  • Each signature is added to those already collected on a key.

🔗 Using the web of trust

  • When you want to use someone's public key, you "unwrap a chain of digital signatures, each signing the next."
  • You follow the chain until you find a signature by "a person whom you know personally, have met, and whose public key you have verified."
  • This approach is "strongly supported by GnuPG and other OpenPGP-compatible organizations."

🎉 Key-signing parties

  • To be useful, a web of trust needs "as many individuals and their keys as possible" with "as many connections" as possible.
  • Key-signing party: an event to quickly build connections.
  • If you're standing next to someone you trust who vouches for another person, "you might be willing to sign both of their keys right there."
  • In practice, you can verify just the md5 fingerprint (shorter than the whole key) because you're physically present with someone you trust, so you don't expect "anyone there has devoted large computational resources to finding a second md5 pre-image."

🔄 Comparison: CA vs web of trust

| Aspect | Certificate Authority (CA) | Web of Trust |
| --- | --- | --- |
| Trust model | Centralized: trust a third-party authority | Decentralized: trust personal connections |
| Who signs keys | The CA signs keys after verification | Individuals sign each other's keys |
| Verification basis | Often just email/website access; rarely real ID | Personal meetings and direct verification |
| Philosophy | Institutional trust | Peer trust, mistrust of authority |
| Distribution | CA verification keys pre-installed in systems | Chains of signatures between individuals |
22

More Properties of Multiplicative Order

5.1. More Properties of Multiplicative Order

🧭 Overview

🧠 One-sentence thesis

The multiplicative order of an element modulo n determines exactly which powers of that element are congruent to 1, and the order of any power of the element can be computed using the greatest common divisor.

📌 Key points (3–5)

  • When powers equal 1: a raised to power k is congruent to 1 mod n if and only if the order of a divides k.
  • When two powers are equal: two powers of a are congruent mod n if and only if their exponents are congruent modulo the order of a.
  • Order of a power: the order of a to the k-th power equals the order of a divided by the gcd of the order and k.
  • Common confusion: powers of a generator live in the cyclic subgroup, but the exponents themselves behave like integers modulo the order—don't confuse the modulus n with the modulus ord_n(a) for exponents.
  • Subgroup containment: smaller cyclic subgroups generated by powers of a are subsets of the cyclic subgroup generated by a, with size determined by the order formula.

🔑 When powers equal 1

🔑 The divisibility criterion (Theorem 5.1.1)

For n ≥ 2 and gcd(a, n) = 1: a to the k-th power is congruent to 1 mod n for some natural number k if and only if ord_n(a) divides k.

Forward direction (easy):

  • If ord_n(a) divides k, then there exists d such that k = ord_n(a) times d.
  • Then a^k = (a^ord_n(a))^d ≡ 1^d = 1 mod n.

Reverse direction (uses Division Algorithm):

  • Suppose a^k ≡ 1 mod n.
  • Write k = ord_n(a) times q + r where 0 ≤ r < ord_n(a).
  • Then 1 ≡ a^k = (a^ord_n(a))^q · a^r ≡ 1^q · a^r ≡ a^r mod n.
  • But ord_n(a) is the smallest positive integer such that a to that power is congruent to 1.
  • The only way a^r ≡ 1 mod n with 0 ≤ r < ord_n(a) can be true is if r = 0.
  • Therefore k = ord_n(a) times q, so ord_n(a) divides k.

Why it matters:

  • This tells us exactly which exponents produce the identity element.
  • Example: if ord_n(a) = 6, then a^12 ≡ 1, a^18 ≡ 1, a^24 ≡ 1 (all multiples of 6), but a^7 ≢ 1, a^10 ≢ 1 (not multiples of 6).
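The criterion is easy to confirm numerically for a small modulus. This sketch uses an assumed helper `order` (names are mine, not the text's) to check Theorem 5.1.1 for n = 7, a = 3:

```python
from math import gcd

def order(a, n):
    # Multiplicative order: smallest k >= 1 with a^k ≡ 1 (mod n).
    assert gcd(a, n) == 1
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

# Theorem 5.1.1: a^k ≡ 1 (mod n) exactly when ord_n(a) divides k.
n, a = 7, 3
s = order(a, n)                # ord_7(3) = 6
for k in range(1, 40):
    assert (pow(a, k, n) == 1) == (k % s == 0)
```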

🔄 When two powers are equal

🔄 Exponents modulo the order (Theorem 5.1.2)

For n ≥ 2 and gcd(a, n) = 1: a^j ≡ a^k mod n for natural numbers j, k if and only if j ≡ k mod ord_n(a).

Key insight:

  • Powers of a in the cyclic subgroup should be treated as if the exponents live in Z/(ord_n(a))Z.
  • The modulus for the base is n, but the modulus for the exponents is ord_n(a).

Forward direction:

  • If j ≡ k mod ord_n(a), then there exists ℓ such that j = k + ord_n(a) times ℓ.
  • Then a^j = a^k · (a^ord_n(a))^ℓ ≡ a^k · 1^ℓ = a^k mod n.

Reverse direction:

  • Suppose a^j ≡ a^k mod n, and without loss of generality j ≥ k.
  • Multiply both sides by (a^(-1))^k, which exists since gcd(a, n) = 1.
  • Get a^(j-k) ≡ 1 mod n.
  • By Theorem 5.1.1, ord_n(a) divides (j - k), so j ≡ k mod ord_n(a).

Don't confuse:

  • The congruence a^j ≡ a^k mod n (base modulo n) is controlled by j ≡ k mod ord_n(a) (exponents modulo the order).
  • These are two different moduli for two different objects.

🧮 Example from the tables

The excerpt notes that Table 5.0.1 illustrates these theorems:

  • Each cyclic subgroup (row) has a number of elements that divides φ(n).
  • Powers of the generator a are only defined up to ord_n(a)—meaning exponents repeat modulo the order.

🔢 Order of a power

🔢 The order formula (Theorem 5.1.4)

For n ≥ 2, gcd(a, n) = 1, and any natural number k: ord_n(a^k) = ord_n(a) / gcd(ord_n(a), k).

Setup:

  • Let r = ord(a^k) and s = ord(a).
  • Goal: prove r = s / gcd(s, k).
  • Note that gcd(s, k) divides both s and k, so s/gcd(s, k) and k/gcd(s, k) are both natural numbers.

Proof sketch:

  1. Show r divides s/gcd(s, k):

    • Compute (a^k)^(s/gcd(s,k)) = (a^s)^(k/gcd(s,k)) = 1^(k/gcd(s,k)) ≡ 1 mod n.
    • By Theorem 5.1.1, r divides s/gcd(s, k).
  2. Show s/gcd(s, k) divides r:

    • Since a^(kr) = (a^k)^r ≡ 1 mod n, by Theorem 5.1.1, s divides kr.
    • So there exists m such that ms = kr.
    • Divide both sides by gcd(s, k): m(s/gcd(s,k)) = (k/gcd(s,k))r.
    • Therefore s/gcd(s,k) divides (k/gcd(s,k))r.
    • By a theorem (1.5.5), gcd(s/gcd(s,k), k/gcd(s,k)) = 1.
    • By Euclid's Lemma (2.1.6), s/gcd(s,k) divides r.
  3. Conclusion:

    • r and s/gcd(s,k) divide each other, so they are equal.

🧩 Subgroup containment example

The excerpt gives a concrete example with n = 7 and a = 3:

Case 1: b = a^2 ≡ 2

  • The cyclic subgroup ⟨b⟩ = {b, b^2, b^3} = {2, 4, 1} = {3^2, 3^4, 3^6}.
  • This is a subset of ⟨a⟩ = {3, 3^2, 3^3, 3^4, 3^5, 3^6}.
  • ⟨b⟩ consists of half the elements of ⟨a⟩, namely the even powers of a.

Case 2: c = a^3 ≡ 6

  • The cyclic subgroup ⟨c⟩ = {c, c^2} = {6, 1} = {3^3, 3^6}.
  • This is a subset of ⟨a⟩.
  • ⟨c⟩ consists of one third of the elements of ⟨a⟩, the powers of a which are multiples of 3.

General pattern:

  • The size of ⟨a^k⟩ as a subset of ⟨a⟩ is just the order of a^k.
  • By Theorem 5.1.4, this size is ord_n(a) / gcd(ord_n(a), k).
  • Example: if ord_n(a) = 6 and k = 2, then ord_n(a^2) = 6 / gcd(6, 2) = 6 / 2 = 3, so ⟨a^2⟩ has 3 elements (half of ⟨a⟩'s 6 elements).
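The order formula and the subgroup sizes above can be checked by brute force for n = 7, a = 3 (the helper `order` is an assumed illustration, not from the text):

```python
from math import gcd

def order(a, n):
    # Multiplicative order of a mod n (assumes gcd(a, n) = 1).
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

# Theorem 5.1.4 for n = 7, a = 3: ord_n(a^k) = ord_n(a) / gcd(ord_n(a), k).
n, a = 7, 3
s = order(a, n)                                  # ord_7(3) = 6
for k in range(1, 13):
    assert order(pow(a, k, n), n) == s // gcd(s, k)

# The subgroup sizes from the excerpt: <3^2> has 3 elements, <3^3> has 2.
assert order(pow(a, 2, n), n) == 3
assert order(pow(a, 3, n), n) == 2
```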

📝 Exercises and applications

📝 Selected exercise highlights

The excerpt includes four exercises:

Exercise 5.1.1:

  • If there exists a in (Z/nZ)* such that ord_n(a) = n - 1, then n is prime.
  • This connects order to primality testing.

Exercise 5.1.2:

  • For odd prime p and a in (Z/pZ)*, if ord_p(a) = 2k for some k, then a^k ≡ -1 mod p.
  • Hint references the proof of Lemma 3.2.1.

Exercise 5.1.3:

  • For n ≥ 2 and a, b both relatively prime to n: ord_n(ab) divides ord_n(a) times ord_n(b).
  • This gives a bound on the order of a product.

Exercise 5.1.4:

  • Prove there are infinitely many primes congruent to 1 mod 4.
  • Uses the fact that if n^2 ≡ -1 mod p for odd prime p, then 4 divides φ(p).
  • Follows a contradiction argument similar to Euclid's proof of infinitely many primes.
23

A Necessary Digression: Gauss's Theorem on Sums of Euler's Function

5.2. A Necessary Digression: Gauss’s Theorem on Sums of Euler’s Function

🧭 Overview

🧠 One-sentence thesis

Gauss's theorem states that for any natural number n, the sum of Euler's phi function over all divisors of n equals n itself, and this can be proved either by counting elements or by exploiting multiplicative properties.

📌 Key points (3–5)

  • The theorem: For all natural numbers n, summing phi(d) over all positive divisors d of n gives exactly n.
  • First proof strategy: partition the integers from 1 to n by their gcd with n, count each partition using phi, and show the total is n.
  • Second proof strategy: define a function F(n) as the sum of phi over divisors, prove F is multiplicative for coprime factors, then use prime factorization.
  • Common confusion: the sum is over divisors of n (not all numbers up to n), and each divisor d contributes phi(d) (not d itself).
  • Why it matters: this result is needed later in the chapter on indices and discrete logarithms.

🔢 The theorem statement

🔢 What Gauss proved

Theorem 5.2.1: For all n in the natural numbers, the sum of phi(d) over all positive divisors d of n equals n.

  • Notation: sum over d such that d divides n, of phi(d), equals n.
  • Euler's phi function phi(d) counts how many numbers from 1 to d are relatively prime to d.
  • The theorem says: add up phi for every divisor of n, and you get n back.
  • Example: if n = 6, divisors are 1, 2, 3, 6; phi(1)=1, phi(2)=1, phi(3)=2, phi(6)=2; sum is 1+1+2+2=6.
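A quick brute-force check of the identity (the helper functions `phi` and `divisor_phi_sum` are assumed names for illustration):

```python
from math import gcd

def phi(d):
    # Euler's phi by direct count (fine at this scale).
    return sum(1 for k in range(1, d + 1) if gcd(k, d) == 1)

def divisor_phi_sum(n):
    # Sum phi(d) over the positive divisors d of n.
    return sum(phi(d) for d in range(1, n + 1) if n % d == 0)

# Gauss's theorem: the sum equals n itself.
assert divisor_phi_sum(6) == 6                             # 1+1+2+2
assert all(divisor_phi_sum(n) == n for n in range(1, 200))
```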

🧩 Why this is not obvious

  • You are not summing the divisors themselves (that would give a different number).
  • You are summing how many numbers are coprime to each divisor.
  • The fact that this always equals n is a non-trivial counting identity.

🧮 First proof: partition and count

🧮 Partitioning by gcd

  • Fix n. For each positive divisor d of n, define the set S_d as all k from 1 to n such that gcd(k, n) = d.
  • These sets are disjoint: every k has exactly one gcd with n, so k belongs to exactly one S_d.
  • The union of all S_d (over all divisors d of n) is the entire set {1, ..., n}.

🔍 Counting each partition

  • For k in S_d, gcd(k, n) = d, which (by a theorem) is equivalent to gcd(k/d, n/d) = 1.
  • So the number of elements in S_d equals the count of integers ℓ from 1 to n/d that are coprime to n/d.
  • That count is exactly phi(n/d).
  • Therefore #(S_d) = phi(n/d).

🔄 Reindexing the sum

  • Summing over all divisors d: n = sum over d dividing n of #(S_d) = sum over d dividing n of phi(n/d).
  • Key observation: as d runs over all divisors of n, so does n/d (the set of divisors is symmetric under this map).
  • Therefore sum over d of phi(n/d) = sum over d of phi(d), which is the desired formula.

Don't confuse: the reindexing step relies on the fact that {d : d divides n} = {n/d : d divides n}; this is a bijection between divisors.

🏗️ Second proof: multiplicative function

🏗️ Defining the function F

  • Define F(n) = sum over d dividing n of phi(d).
  • Goal: prove F(n) = n for all n.
  • Strategy: show F is multiplicative (F(ab) = F(a)F(b) when gcd(a,b)=1), then use prime factorization.

🔢 F on prime powers

  • If p is prime and k is a natural number, the divisors of p^k are 1, p, p^2, ..., p^k.
  • phi(1)=1, phi(p)=p−1, phi(p^2)=p^2−p, ..., phi(p^k)=p^k − p^(k−1).
  • Summing these: F(p^k) = 1 + (p−1) + (p^2−p) + ... + (p^k − p^(k−1)) = p^k (telescoping sum).

🔗 F on products of distinct primes

  • If p and q are distinct primes, divisors of pq are 1, p, q, pq.
  • Since gcd(p,q)=1, phi(pq) = phi(p)phi(q) (by a theorem).
  • F(pq) = phi(1) + phi(p) + phi(q) + phi(pq) = 1 + phi(p) + phi(q) + phi(p)phi(q) = (1 + phi(p))(1 + phi(q)) = F(p)F(q).

🧩 General multiplicativity

  • The excerpt states (and leaves as an exercise) that F(ab) = F(a)F(b) whenever gcd(a,b)=1.
  • This means F has the same multiplicative property as phi.

🎯 Finishing the proof

  • Any n > 1 has a prime factorization n = p_1^(k_1) · ... · p_r^(k_r) with distinct primes.
  • Since the prime powers are pairwise coprime, F(n) = F(p_1^(k_1)) · ... · F(p_r^(k_r)).
  • By the prime-power calculation, each F(p_i^(k_i)) = p_i^(k_i).
  • Therefore F(n) = p_1^(k_1) · ... · p_r^(k_r) = n.
  • For n=1, the theorem is trivial (the only divisor is 1, phi(1)=1).

🔄 Comparing the two proofs

| Aspect | Proof 1 | Proof 2 |
| --- | --- | --- |
| Main idea | Partition {1,...,n} by gcd, count each piece | Define F, prove it's multiplicative, use prime factorization |
| Key technique | Direct counting and reindexing | Multiplicative property and induction on prime structure |
| What it illustrates | Combinatorial interpretation of the sum | Algebraic structure of arithmetic functions |
| Complexity | One direct argument | Requires proving F is multiplicative (exercise) |

Don't confuse: both proofs arrive at the same theorem, but they highlight different aspects—one is combinatorial (counting elements), the other is algebraic (exploiting multiplicativity).

24

Primitive Roots

5.3. Primitive Roots

🧭 Overview

🧠 One-sentence thesis

Primitive roots are special elements in modular arithmetic that generate the entire group of units, and for prime moduli there are exactly φ(p − 1) such roots, though certain powers of 2 have none at all.

📌 Key points (3–5)

  • What a primitive root is: an element a in (Z/nZ)* whose order equals φ(n), meaning its powers cycle through all units mod n.
  • How many primitive roots exist: if n has any primitive root, it has exactly φ(φ(n)) of them.
  • When primitive roots are guaranteed: every prime p has φ(p − 1) primitive roots.
  • Common confusion: not all moduli have primitive roots—powers of 2 greater than or equal to 8 have none.
  • Why it matters: primitive roots let you express every unit as a power of a single element, enabling "discrete logarithms" (indices).

🔑 Definition and basic properties

🔑 What is a primitive root?

Primitive root mod n: an element a ∈ (Z/nZ)* such that ord_n(a) = φ(n).

  • In plain language: a primitive root is an element whose order is as large as possible—it equals the size of the group of units.
  • An integer x ∈ Z is called a primitive root mod n if its congruence class [x]_n is a primitive root.
  • Example: For n = 7, the element 3 is a primitive root because ord₇(3) = 6 = φ(7).

📊 Examples from small moduli

The excerpt provides a table of primitive roots for small n:

| n | Primitive roots mod n | φ(n) | φ(φ(n)) |
| --- | --- | --- | --- |
| 2 | 1 | 1 | 1 |
| 3 | 2 | 2 | 1 |
| 4 | 3 | 2 | 1 |
| 5 | 2, 3 | 4 | 2 |
| 6 | 5 | 2 | 1 |
| 7 | 3, 5 | 6 | 2 |
| 8 | none | 4 | 2 |
| 16 | none | 8 | 4 |
| 17 | 3, 5, 6, 7, 10, 11, 12, 14 | 16 | 8 |
  • Notice: n = 8 and n = 16 have no primitive roots.
  • The column φ(φ(n)) often matches the count of primitive roots.
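The table can be reproduced by brute force for small n; a sketch, with assumed helper names:

```python
from math import gcd

def phi(n):
    # Euler's phi by direct count.
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def order(a, n):
    # Multiplicative order of a mod n (assumes gcd(a, n) = 1).
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

def primitive_roots(n):
    # Units whose order is as large as possible, i.e. equals phi(n).
    return [a for a in range(1, n) if gcd(a, n) == 1 and order(a, n) == phi(n)]

assert primitive_roots(7) == [3, 5]
assert primitive_roots(8) == []                    # no primitive roots mod 8
assert primitive_roots(17) == [3, 5, 6, 7, 10, 11, 12, 14]
assert len(primitive_roots(17)) == phi(phi(17))    # φ(φ(17)) = 8
```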

🔢 Counting primitive roots

🔢 The general counting theorem

Theorem 5.3.3: If n ≥ 2 has any primitive root, then it has exactly φ(φ(n)) primitive roots.

Why this works:

  • Let a be a primitive root, so ⟨a⟩ = (Z/nZ)* (the powers of a fill the entire group).
  • Any other primitive root b must be some power of a, say b = a^k.
  • The order of b = a^k is φ(n) / gcd(φ(n), k).
  • For b to be a primitive root, we need gcd(φ(n), k) = 1.
  • By definition of Euler's φ function, there are exactly φ(φ(n)) such values of k.

Don't confuse: This theorem does not say every n has a primitive root; it only counts them if at least one exists.

🧮 Polynomial roots mod p (Lagrange's theorem)

Theorem 5.3.4: For prime p, a polynomial of degree n with leading coefficient not divisible by p has at most n solutions in Z/pZ.

How the proof works:

  • Base case (n = 1): a linear congruence a₁x + a₀ ≡ 0 (mod p) with gcd(p, a₁) = 1 has exactly one solution.
  • Inductive step: suppose a(x) has degree n + 1; if it has no roots we are done, so let z₁ be a root and divide to get a(x) = b(x)(x − z₁) + r(x) where r(x) is constant.
  • Plugging in z₁ shows r(x) ≡ 0 (mod p), so a(x) ≡ b(x)(x − z₁) (mod p).
  • By the "zero product property" (valid in Z/pZ for prime p), roots of a(x) come from roots of b(x) or of (x − z₁).
  • Since b(x) has degree n and hence at most n roots by induction, a(x) has at most n + 1 roots.

🎯 Exact count for x^d = 1 (mod p)

Theorem 5.3.5: If p is prime and d divides p − 1, then x^d ≡ 1 (mod p) has exactly d solutions.

Why:

  • Write m = (p − 1)/d, so dm = p − 1.
  • Factor: x^(p−1) − 1 = (x^d − 1)(x^(dm−d) + x^(dm−2d) + ⋯ + x^d + 1).
  • By Fermat's Little Theorem, x^(p−1) − 1 has exactly p − 1 roots (all nonzero classes).
  • The second factor has degree dm − d = p − 1 − d, so at most p − 1 − d roots.
  • Therefore the first factor x^d − 1 must have at least d roots; by Lagrange's theorem it has at most d roots, so exactly d.
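Theorem 5.3.5 is easy to confirm numerically; a sketch counting solutions directly for p = 17:

```python
# For prime p and each divisor d of p - 1, count solutions of x^d ≡ 1 (mod p).
p = 17
for d in range(1, p):
    if (p - 1) % d == 0:
        count = sum(1 for x in range(1, p) if pow(x, d, p) == 1)
        assert count == d   # exactly d solutions, as Theorem 5.3.5 asserts
```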

🏆 Primitive roots for primes

🏆 Counting elements of each order

Theorem 5.3.6: If p is prime and d divides p − 1, there are exactly φ(d) congruence classes of order d in Z/pZ.

How the proof works:

  • Define ψ(k) = number of elements of order k.
  • Every element of order dividing p − 1 has some order, so Σ ψ(k) = p − 1 (sum over k dividing p − 1).
  • By Gauss's theorem (5.2.1), Σ φ(k) = p − 1 (same sum).
  • Goal: show ψ(k) ≤ φ(k) for all k dividing p − 1; then equality must hold everywhere.
  • If ψ(k) = 0 (no elements of order k), then ψ(k) ≤ φ(k) trivially.
  • If ψ(k) > 0, let a be an element of order k. Then a, a², ..., a^k are k distinct solutions to x^k ≡ 1 (mod p).
  • By Theorem 5.3.5, these are all the solutions.
  • By Theorem 5.1.4, a^ℓ has order k / gcd(k, ℓ), so order k exactly when gcd(k, ℓ) = 1.
  • There are φ(k) such ℓ in {1, ..., k}, so ψ(k) = φ(k).
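The counting argument can be verified directly for p = 17 (helper names `phi`, `order`, `psi` are assumed for illustration):

```python
from math import gcd

def phi(n):
    # Euler's phi by direct count (adequate for small n).
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def order(a, n):
    # Multiplicative order of a mod n (assumes gcd(a, n) = 1).
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

# psi(d) = number of elements of order d in (Z/17Z)*; compare with phi(d).
p = 17
psi = {}
for a in range(1, p):
    d = order(a, p)
    psi[d] = psi.get(d, 0) + 1
for d, count in psi.items():
    assert (p - 1) % d == 0 and count == phi(d)
assert sum(psi.values()) == p - 1
```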

🎉 Corollary for primitive roots

Corollary 5.3.7: If p is prime, there are φ(p − 1) primitive roots in Z/pZ.

  • This follows immediately from Theorem 5.3.6 with d = p − 1.
  • Primitive roots have order p − 1, which divides p − 1.
  • Example: For p = 17, φ(16) = 8, so there are 8 primitive roots (confirmed in the table: 3, 5, 6, 7, 10, 11, 12, 14).

🚫 When primitive roots don't exist

🚫 Powers of 2 have no primitive roots

Theorem 5.3.8: For k ≥ 3, n = 2^k has no primitive roots.

Proof by induction:

  • Claim: For all odd a, a^(2^(k−2)) ≡ 1 (mod 2^k).
  • Base case (k = 3): Check all odd classes mod 8:
    • 1² = 1 ≡ 1 (mod 8)
    • 3² = 9 ≡ 1 (mod 8)
    • 5² = 25 ≡ 1 (mod 8)
    • 7² = 49 ≡ 1 (mod 8)
  • Inductive step: Assume a^(2^(k−2)) ≡ 1 (mod 2^k), so a^(2^(k−2)) = 2^k ℓ + 1 for some integer ℓ.
  • Square both sides: a^(2^(k−1)) = (2^k ℓ + 1)² = 2^(2k) ℓ² + 2·2^k ℓ + 1 = 2^(k+1)(2^(k−1) ℓ² + ℓ) + 1.
  • So a^(2^(k−1)) ≡ 1 (mod 2^(k+1)), proving the claim for k + 1.

Why this means no primitive roots:

  • For all a ∈ (Z/2^k Z)* (all odd a), ord_(2^k)(a) ≤ 2^(k−2).
  • But φ(2^k) = 2^(k−1).
  • Since 2^(k−2) < 2^(k−1) for k ≥ 3, no element has order φ(2^k).
  • Therefore n = 2^k has no primitive roots.

Don't confuse: n = 2 and n = 4 do have primitive roots (see the table); the theorem only applies to 2^k for k ≥ 3.
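The claim behind Theorem 5.3.8 can be checked exhaustively for small k (the helper `order` is an assumed illustration):

```python
def order(a, n):
    # Multiplicative order of a mod n (assumes gcd(a, n) = 1).
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

# For n = 2^k with k >= 3, every odd a satisfies a^(2^(k-2)) ≡ 1 (mod 2^k),
# so no order reaches φ(2^k) = 2^(k-1): there is no primitive root.
for k in range(3, 9):
    n = 2 ** k
    orders = [order(a, n) for a in range(1, n, 2)]
    assert max(orders) <= 2 ** (k - 2) < n // 2   # n // 2 = φ(2^k)
```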

🔍 Introduction to indices

🔍 The index as a discrete logarithm

The excerpt begins to define the concept of an index:

Index (partial definition): If n has a primitive root a, then for any b with gcd(b, n) = 1, the smallest k ∈ N such that b ≡ a^k (mod n) is called...

  • The definition is incomplete in the excerpt, but the idea is clear: the index is the exponent k.
  • Since a is a primitive root, every unit b can be written as a power of a.
  • The index is the "discrete logarithm" of b to the base a.
  • Example: If a = 3 is a primitive root mod 7 and b = 5, we find k such that 3^k ≡ 5 (mod 7). Since 3³ = 27 ≡ 6 (mod 7), 3⁴ = 81 ≡ 4 (mod 7), 3⁵ = 243 ≡ 5 (mod 7), the index of 5 (base 3) is 5.
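The search in the example is exactly a brute-force discrete log; a sketch (the function name `index` is my choice, for illustration):

```python
def index(b, a, n):
    # Brute-force index (discrete log): smallest k >= 1 with a^k ≡ b (mod n).
    # Only feasible for tiny n, which is precisely the cryptographic point.
    x = 1
    for k in range(1, n):
        x = x * a % n
        if x == b % n:
            return k
    raise ValueError("b is not a power of a mod n")

assert index(5, 3, 7) == 5    # 3^5 = 243 ≡ 5 (mod 7), matching the example
assert index(6, 3, 7) == 3    # 3^3 = 27 ≡ 6 (mod 7)
```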
25

Indices

5.4. Indices

🧭 Overview

🧠 One-sentence thesis

Indices (discrete logarithms) provide a way to express elements in modular arithmetic as powers of a primitive root, enabling algebraic manipulation of congruences by converting multiplication into addition.

📌 Key points (3–5)

  • What an index is: the smallest exponent k such that b ≡ a^k (mod n), where a is a primitive root of n.
  • Why indices behave like logarithms: they convert multiplication to addition (ind_a(bc) ≡ ind_a(b) + ind_a(c)) and exponentiation to multiplication (ind_a(b^k) ≡ k · ind_a(b)), both modulo φ(n).
  • Common confusion: the index congruence uses modulus φ(n), not the original modulus n—this is a frequent mistake.
  • Practical use: indices solve exponential congruences by reducing them to linear congruences in the index.
  • Cryptographic significance: computing b = a^k (mod n) is fast, but finding k = ind_a(b) given b is computationally hard, making indices useful for one-way functions.

🔢 Definition and basic structure

🔢 What an index is

Index of b relative to a: If n ∈ ℕ has a primitive root a, then for any b ∈ ℤ with gcd(b, n) = 1, the smallest k ∈ ℕ such that b ≡ a^k (mod n) is called the index of b relative to a, denoted ind_a(b).

  • The definition requires that n has a primitive root a.
  • Since a is a primitive root, the cyclic group generated by a equals all of (ℤ/nℤ)*, so every b coprime to n is some power of a.
  • The index tells you which power.
  • Example: If 3 is a primitive root mod 7 and 4 ≡ 3^4 (mod 7), then ind_3(4) = 4.

📋 Index tables

The excerpt provides tables for small moduli (n = 2, 3, 4, 5, 7) and a detailed table for n = 17 showing ind_a(b) for all primitive roots a and all b ∈ (ℤ/17ℤ)*.

  • For n = 17, there are 8 primitive roots (3, 5, 6, 7, 10, 11, 12, 14).
  • Each primitive root gives a different index function, but they all satisfy the same algebraic properties.

🧮 Logarithm-like properties

🧮 Four fundamental rules

The excerpt states that indices behave like logarithms with base a:

| Property | Statement | Modulus |
| --- | --- | --- |
| Identity | ind_a(1) = φ(n) ≡ 0 (mod φ(n)) | φ(n) |
| Base | ind_a(a) = 1 | — |
| Product | ind_a(bc) ≡ ind_a(b) + ind_a(c) | φ(n) |
| Power | ind_a(b^k) ≡ k · ind_a(b) | φ(n) |
  • Why the product rule works: if b ≡ a^m and c ≡ a^n, then bc ≡ a^(m+n), so ind_a(bc) = m + n = ind_a(b) + ind_a(c).
  • Why the power rule works: if b ≡ a^m, then b^k ≡ a^(km), so ind_a(b^k) = km = k · ind_a(b).
  • All congruences for indices are modulo φ(n), not modulo n.

⚠️ Common mistake: wrong modulus

The excerpt explicitly warns:

Caution: A common mistake with indices is to use the same modulus with the indices as with the original congruence. Instead, when the congruence is in mod n, for n ∈ ℕ, the index congruence is in mod φ(n)!

  • Original congruence: mod n.
  • Index congruence: mod φ(n).
  • Don't confuse: the modulus changes when you take indices.

🔧 Solving congruences with indices

🔧 The method

Indices convert exponential congruences into linear congruences, which are easier to solve.

Steps:

  1. Start with a congruence in mod n (e.g., 3x^5 ≡ 4 (mod 7)).
  2. Take ind_a of both sides, using the product and power rules.
  3. Solve the resulting linear congruence in mod φ(n).
  4. Convert the solution back by looking up which element has that index.

🧪 Worked example

The excerpt solves 3x^5 ≡ 4 (mod 7) using primitive root a = 5:

  • Take ind_5 of both sides: ind_5(3) + 5 · ind_5(x) ≡ ind_5(4) (mod 6).
  • From the table: ind_5(3) = 5 and ind_5(4) = 2.
  • Substitute: 5 + 5 · ind_5(x) ≡ 2 (mod 6).
  • Solve: 5 · ind_5(x) ≡ -3 ≡ 3 (mod 6), so ind_5(x) ≡ 5^(-1) · 3 ≡ 5 · 3 ≡ 3 (mod 6).
  • From the table: ind_5(6) = 3, so x = 6.
  • Check: 3 · 6^5 = 23328 ≡ 4 (mod 7). ✓
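A brute-force cross-check of the worked example, trying every unit mod 7:

```python
# Confirm that x = 6 is the unique unit mod 7 solving 3 * x^5 ≡ 4 (mod 7),
# the same answer the index computation produced.
solutions = [x for x in range(1, 7) if 3 * pow(x, 5, 7) % 7 == 4]
assert solutions == [6]
```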

🔄 Choice of primitive root

The excerpt shows that using a different primitive root (a = 3 instead of a = 5) yields the same solution:

  • ind_3(3) + 5 · ind_3(x) ≡ ind_3(4) (mod 6).
  • 1 + 5 · ind_3(x) ≡ 4 (mod 6).
  • 5 · ind_3(x) ≡ 3 (mod 6), so ind_3(x) ≡ 3 (mod 6).
  • Again, x = 6.

The choice of primitive root affects the intermediate calculations but not the final answer.

🔐 Cryptographic application

🔐 One-way function property

The excerpt notes that indices are important for cryptography because of an asymmetry:

| Direction | Problem | Feasibility |
| --- | --- | --- |
| Forward | Given n, a, k, compute b = a^k (mod n) | Fast (feasible) using fast modular exponentiation |
| Backward | Given n, a, b, find k = ind_a(b) such that a^k ≡ b (mod n) | No known feasible algorithm (hard) |
  • Forward direction: exponentiation in (ℤ/nℤ)* is computationally easy.
  • Backward direction: finding the discrete logarithm (index) is computationally hard.
  • This asymmetry makes indices an "excellent candidate one-way function" for cryptological applications.
  • The excerpt mentions that discrete logarithm-based cryptological protocols will be discussed in the next sections.

🔬 Terminology

Indices are called discrete logarithms in the computer science literature.

  • The term "discrete logarithm" emphasizes the analogy with ordinary logarithms.
  • "Discrete" refers to the finite, modular arithmetic setting (as opposed to continuous real numbers).
26

Diffie-Hellman Key Exchange

5.5. Diffie-Hellman Key Exchange

🧭 Overview

🧠 One-sentence thesis

Diffie-Hellman key exchange (DHKE) allows two parties to establish a shared secret over a public network by using modular exponentiation as a one-way function, with security depending on the difficulty of solving the Diffie-Hellman problem.

📌 Key points (3–5)

  • What DHKE does: Alice and Bob create a shared secret key over a public channel without ever directly transmitting the secret itself.
  • How it works: Both parties use a public prime p and primitive root r, each choose a private exponent, publish partial results, then compute the same shared secret using the other's public value.
  • Why it's secure: Breaking DHKE requires solving the Diffie-Hellman problem (computing r to the power xy from r to the power x and r to the power y), which is infeasible on classical computers.
  • Common confusion: The security does NOT require that computing discrete logs is the only way to break DHKE—any solution to the DHP would break it, but no efficient classical algorithm for DHP is known.
  • Practical use: DHKE is widely deployed in Internet protocols including SSL, TLS, SSH, IPsec, and many VPNs.

🔐 The DHKE protocol

🔐 Setup and public parameters

Diffie-Hellman key exchange (DHKE): A protocol where Alice and Bob agree on a large prime p and a primitive root r in (ℤ/pℤ)*, then publish both values.

  • These two values (p and r) are public—anyone, including an eavesdropper Eve, knows them.
  • The prime p should be large for security.
  • Finding p is feasible (discussed in § 4.4), and modular exponentiations are also computationally feasible.

🔑 Private choices and public exchange

Alice's steps:

  1. Chooses a secret integer α satisfying 1 ≤ α ≤ p − 1.
  2. Computes A = r to the power α (mod p).
  3. Keeps α secret but publishes A.

Bob's steps:

  1. Chooses a secret integer β satisfying 1 ≤ β ≤ p − 1.
  2. Computes B = r to the power β (mod p).
  3. Keeps β secret and publishes B.
  • The values A and B travel over the public network.
  • Neither Alice nor Bob ever reveals their private exponents α or β.

🤝 Computing the shared secret

Alice computes:

  • S = B to the power α (mod p).

Bob computes:

  • S = A to the power β (mod p).

Why they get the same value (Proposition 5.5.2):

  • Alice computes B to the power α = (r to the power β) to the power α = r to the power (β · α).
  • Bob computes A to the power β = (r to the power α) to the power β = r to the power (α · β).
  • By commutativity of multiplication, β · α = α · β, so both compute the same S.

Shared key (shared secret): The value S that both Alice and Bob compute but Eve does not have.

  • Alice and Bob then use S as the key for a symmetric cryptosystem they had previously agreed upon.

📋 Example walkthrough (Example 5.5.3)

| Step | Alice | Bob |
| --- | --- | --- |
| Public agreement | p = 617, r = 17 | p = 617, r = 17 |
| Private choice | α = 19 | β = 13 |
| Public value sent | A = 17^19 ≡ 385 (mod 617) | B = 17^13 ≡ 227 (mod 617) |
| Shared secret computed | S = 227^19 ≡ 127 (mod 617) | S = 385^13 ≡ 127 (mod 617) |
  • Both arrive at S = 127, which they use for symmetric encryption.
  • All communication of A and B can happen over insecure channels (e.g., unencrypted email).
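The walkthrough above can be reproduced in a few lines of Python (a sketch; the three-argument form of the built-in `pow` performs fast modular exponentiation):

```python
# Diffie-Hellman key exchange with the parameters from Example 5.5.3.
p, r = 617, 17          # public prime and primitive root

alpha = 19              # Alice's secret exponent
beta = 13               # Bob's secret exponent

A = pow(r, alpha, p)    # Alice publishes A = r^alpha mod p
B = pow(r, beta, p)     # Bob publishes B = r^beta mod p

S_alice = pow(B, alpha, p)   # Alice computes S = B^alpha mod p
S_bob = pow(A, beta, p)      # Bob computes S = A^beta mod p

assert S_alice == S_bob      # both sides hold the same shared secret
print(A, B, S_alice)         # 385 227 127
```

Only A and B ever cross the network; α and β never leave their owners' machines.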

🛠️ Practical implementation issues

🛠️ Finding the primitive root r

The excerpt discusses two strategies:

Strategy 1: Reuse a known pair

  • Find one prime p and an associated primitive root r once.
  • Publish this pair as a standard.
  • Each user picks their own secret exponent (α or β) without overlap or conflict.
  • This approach is suggested in some Internet standards (references [HC] and [LK08]).

Strategy 2: Use Sophie Germain primes

Sophie Germain prime: A prime p such that 2p + 1 is also prime.

  • The excerpt notes that Sophie Germain primes have independent mathematical interest.
  • It is conjectured (but not proven) that there are infinitely many Sophie Germain primes.
  • There are precise conjectures on their asymptotic density and efficient algorithmic techniques to generate them (reference [Sho09]).
  • Example: The first seventeen Sophie Germain primes are 2, 3, 5, 11, 23, 29, 41, 53, 83, 89, 113, 131, 173, 179, 191, 233, 239.
  • The largest known Sophie Germain prime (at the time of writing) is 18543637900515 × 2^666667 − 1, discovered in 2012 by Philipp Bliedung and a distributed network.
  • The usefulness of Sophie Germain primes in DHKE is explored in the exercises (the excerpt does not explain the connection in detail).
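The list of small Sophie Germain primes above is easy to regenerate. The following is a minimal sketch using naive trial-division primality testing (an illustrative stand-in; the excerpt's reference [Sho09] covers efficient generation techniques for cryptographic sizes):

```python
def is_prime(n):
    """Naive trial division -- fine for small n, far too slow for cryptographic sizes."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def sophie_germain_primes(count):
    """Return the first `count` primes p such that 2p + 1 is also prime."""
    found, p = [], 2
    while len(found) < count:
        if is_prime(p) and is_prime(2 * p + 1):
            found.append(p)
        p += 1
    return found

print(sophie_germain_primes(17))
# [2, 3, 5, 11, 23, 29, 41, 53, 83, 89, 113, 131, 173, 179, 191, 233, 239]
```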

⚠️ Feasibility of computations

  • Finding large primes p is feasible (discussed in § 4.4).
  • Modular exponentiations (computing A, B, and S) are feasible using fast modular exponentiation.
  • The remaining challenge is finding the primitive root r, addressed by the two strategies above.
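The fast modular exponentiation mentioned above is the square-and-multiply method, which needs only about log₂(exponent) squarings. A minimal sketch (Python's built-in three-argument `pow` implements the same idea):

```python
def mod_exp(base, exp, mod):
    """Square-and-multiply: compute base^exp mod mod in O(log exp) multiplications."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:                   # current binary digit of exp is 1
            result = result * base % mod
        base = base * base % mod      # square for the next binary digit
        exp >>= 1
    return result

assert mod_exp(17, 19, 617) == pow(17, 19, 617)  # agrees with the builtin
```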

🔒 Security of DHKE

🔒 The one-way function assumption

  • DHKE security depends on modular exponentiation being a one-way function.
  • Forward direction (feasible): Computing r to the power x (mod p) is fast using fast modular exponentiation.
  • Backward direction (infeasible): Computing the discrete log (finding x given r to the power x) is believed to be infeasible.

🔒 What would break DHKE

If Eve could compute discrete logs:

  • Starting with public values A and B, she would compute α = ind_r(A) and β = ind_r(B).
  • She could then compute S as B to the power α, or A to the power β, or directly as r to the power (α · β).
  • Having S, she could decrypt all of Alice and Bob's communications.
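Eve's attack can be illustrated on the toy parameters of Example 5.5.3, where p is small enough that a brute-force discrete log search succeeds (a hypothetical sketch to show why p must be large; this search is hopeless for cryptographic-size primes):

```python
def discrete_log(r, target, p):
    """Brute-force x with r^x ≡ target (mod p) -- feasible only for small p."""
    value = 1
    for x in range(p - 1):
        if value == target:
            return x
        value = value * r % p
    return None

# Eve attacks the toy exchange: p = 617, r = 17, public values A = 385, B = 227.
p, r, A, B = 617, 17, 385, 227
alpha = discrete_log(r, A, p)   # recovers Alice's secret exponent
S = pow(B, alpha, p)            # Eve now holds the shared secret
print(alpha, S)                 # 19 127
```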

The Diffie-Hellman problem (DHP):

Diffie-Hellman problem (DHP): Given a prime p, a primitive root r in (ℤ/pℤ)*, and two elements a, b in (ℤ/pℤ)* known to be of the form a = r to the power x and b = r to the power y (where x and y are not known), compute r to the power (xy).

  • An efficient discrete log algorithm would solve the DHP.
  • However, it is not known whether there might be a solution to the DHP that does not require computing full discrete logs.
  • Most precise security statement: Breaking DHKE amounts to solving the DHP, which is not feasible on a classical computer.

Don't confuse:

  • The security claim is NOT "discrete logs are the only way to break DHKE."
  • The security claim IS "solving the DHP (by any method) would break DHKE, and no efficient classical algorithm for DHP is known."

🔒 Quantum computing caveat

  • The excerpt notes that there is a known algorithm for a quantum computer that solves the DHP efficiently (reference [Sho94]).
  • This is similar to the situation with factoring.
  • On classical computers, however, the DHP remains infeasible.

🔒 Man-in-the-middle vulnerability

  • The excerpt mentions (in Exercise 5.5.3) that a man-in-the-middle attack is possible against DHKE.
  • The exercise asks to spell out the steps in precise mathematical detail, but the excerpt does not provide the attack description itself.

🌐 Real-world deployment

🌐 Widespread use

  • DHKE is one of the most widely used cryptologic protocols on the Internet.
  • It is part of:
    • SSL (Secure Sockets Layer)
    • TLS (Transport Layer Security)
    • SSH (Secure Shell)
    • IPsec (Internet Protocol Security)
    • Many VPNs (Virtual Private Networks)

🌐 Historical context

  • Whitfield Diffie and Martin Hellman published "New directions in cryptography" [DH76] about a year before the RSA cryptosystem was invented.
  • This was the first full description of a working public key cryptosystem in the open scientific literature.
  • The excerpt notes that some workable public key crypto had been invented earlier within the US/UK intelligence community but was not shared with the public—there have been mathematical theorems and proofs held in secret by large governments since early in the Cold War.

The ElGamal Cryptosystem

5.6. The ElGamal Cryptosystem

🧭 Overview

🧠 One-sentence thesis

ElGamal builds a public-key cryptosystem and digital signature scheme by using modular exponentiation as a one-way function, where encryption scrambles messages through multiplication by a random power and decryption relies on knowledge of a secret discrete logarithm.

📌 Key points (3–5)

  • Core one-way function: exponentiation mod p (a public prime) is fast in the forward direction, but its inverse, the discrete log, is thought to be infeasible, even when the primitive root r is public.
  • Encryption mechanism: Bob multiplies the message m by a random power of Alice's public value a, producing a two-part ciphertext that Alice can decrypt using her secret exponent α.
  • Decryption relies on discrete log knowledge: Alice knows α such that a = r^α, allowing her to cancel the scrambling multiplication without computing discrete logs.
  • Digital signatures: Alice signs a message by creating a pair (x, y) that Bob verifies by checking whether a^x · x^y ≡ r^m (mod p); acceptance proves Alice knows the secret α.
  • Common confusion: ElGamal encryption produces two ciphertext components (c₁, c₂), not one like RSA; the first component helps the recipient undo the random scrambling.

🔐 How ElGamal encryption works

🔑 Alice's key setup

Alice performs the following steps to create her public and private keys:

  • Pick a large prime p.
  • Find a primitive root r mod p.
  • Choose a secret value α ∈ ℕ with 2 ≤ α ≤ p − 1.
  • Compute a = r^α mod p.

ElGamal public (encryption) key: (p, r, a)
ElGamal private (decryption) key: (p, r, α)

The association is E : (p, r, α) ↦ (p, r, r^α).

  • The message space is M = {m ∈ ℤ | 2 ≤ m ≤ p − 1}, assumed to be numerical encodings of meaningful messages.
  • Alice posts (p, r, a) on her website; she keeps α secret.

📤 Bob's encryption process

When Bob wants to send cleartext message m ∈ M to Alice:

  1. Download Alice's public key (p, r, a).
  2. For each new message m, generate a fresh random number β ∈ ℕ such that 2 ≤ β ≤ p − 2.
  3. Build the ciphertext as two pieces:
    • c₁ = r^β mod p
    • c₂ = m · a^β mod p
  4. Transmit the ciphertext c = (c₁, c₂) to Alice.

Why two parts? The first component c₁ = r^β carries information about the random exponent β (without revealing β itself), which Alice will use to undo the scrambling.

Example: If m = 42, β = 123, p = 11717, r = 103, a = 1020, Bob computes c₁ = 103^123 mod 11717 and c₂ = 42 · 1020^123 mod 11717, then sends (c₁, c₂).

📥 Alice's decryption process

When Alice receives ciphertext c = (c₁, c₂):

  • Compute the cleartext by:
    m = c₂ · c₁^(p − 1 − α) mod p

Why this works (correctness proof):

  • c₂ · c₁^(p − 1 − α) ≡ (m · a^β) · (r^β)^(p − 1 − α) (mod p)
  • ≡ m · (r^α)^β · (r^β)^(p − 1 − α) (mod p) (since a = r^α)
  • ≡ m · r^(αβ + β(p − 1) − αβ) (mod p)
  • ≡ m · r^(β(p − 1)) (mod p)
  • ≡ m · (r^(p − 1))^β (mod p)
  • ≡ m · 1^β (mod p) (by Fermat's Little Theorem, since r^(p−1) ≡ 1 mod p)
  • ≡ m (mod p)

Don't confuse: The exponent p − 1 − α is used to compute (c₁^(−1))^α without using negative powers directly, applying the fact that c₁^(p−1) ≡ 1 mod p.
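The encrypt-decrypt round trip above can be sketched with small toy parameters (p = 23, r = 5, α = 6, β = 15, m = 10 are illustrative choices, not from the excerpt; 5 is a primitive root mod 23):

```python
# Toy ElGamal round trip (assumed parameters; a real p must be large).
p, r = 23, 5          # Alice's public prime and primitive root
alpha = 6             # Alice's secret exponent
a = pow(r, alpha, p)  # Alice's public value a = r^alpha mod p

m, beta = 10, 15      # Bob's message and fresh random exponent
c1 = pow(r, beta, p)          # c1 = r^beta mod p
c2 = m * pow(a, beta, p) % p  # c2 = m * a^beta mod p

# Alice decrypts with m = c2 * c1^(p - 1 - alpha) mod p,
# avoiding negative powers via c1^(p-1) ≡ 1 (mod p).
recovered = c2 * pow(c1, p - 1 - alpha, p) % p
assert recovered == m
print(recovered)  # 10
```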

✍️ ElGamal digital signatures

🖊️ Alice signs a message

Suppose Alice has private key (p, r, α) and wants to digitally sign message m ∈ M:

  1. Choose a random γ ∈ ℕ such that 1 < γ < p − 1 and gcd(γ, p − 1) = 1.
  2. Compute the signature as two values:
    • x = r^γ mod p
    • y = γ^(−1) · (m − α · r^γ) mod (p − 1)
      (the inverse γ^(−1) is taken in mod p − 1)
  3. The digital signature on m is (x, y).

Why mod (p − 1) for y? Because exponents in mod p arithmetic live in the group of order p − 1 (by Fermat's Little Theorem).

✅ Bob verifies the signature

To verify signature (x, y) on message m using Alice's public key (p, r, a):

  • Check whether a^x · x^y ≡ r^m (mod p).
  • If the congruence holds, accept; otherwise, reject.

Why this works (correctness proof):

  • a^x · x^y ≡ a^(r^γ) · (r^γ)^(γ^(−1)(m − α·r^γ)) (mod p)
  • ≡ (r^α)^(r^γ) · (r^γ)^(γ^(−1)(m − α·r^γ)) (mod p) (since a = r^α)
  • ≡ r^(α·r^γ + γ·γ^(−1)(m − α·r^γ)) (mod p)
  • ≡ r^(α·r^γ + m − α·r^γ) (mod p) (since γ·γ^(−1) ≡ 1 mod (p−1))
  • ≡ r^m (mod p)

So Bob will accept all legitimately signed messages from Alice.

Example: If Alice signs m = 97 with signature (6220, 10407) using public key (11717, 103, 1020), Bob computes 1020^6220 · 6220^10407 mod 11717 and checks if it equals 103^97 mod 11717.
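The sign-and-verify steps can likewise be sketched with small toy parameters (p = 23, r = 5, α = 6, γ = 3 are illustrative choices, not from the excerpt; `pow(g, -1, n)` computes a modular inverse in Python 3.8+):

```python
# Toy ElGamal signature (assumed parameters; a real p must be large).
p, r = 23, 5
alpha = 6
a = pow(r, alpha, p)          # Alice's public value a = r^alpha mod p

m = 10                        # message to sign
gamma = 3                     # random choice with gcd(gamma, p - 1) = 1
x = pow(r, gamma, p)          # x = r^gamma mod p
gamma_inv = pow(gamma, -1, p - 1)          # inverse of gamma mod p - 1
y = gamma_inv * (m - alpha * x) % (p - 1)  # y = gamma^(-1)(m - alpha*x) mod p - 1

# Bob's check: a^x * x^y ≡ r^m (mod p)?
valid = pow(a, x, p) * pow(x, y, p) % p == pow(r, m, p)
assert valid
```

Note that x enters the check both as a base and inside the exponent of a, exactly as in the correctness proof above.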

🔒 Security foundation

🧱 The one-way function

ElGamal's security rests on the same principle as DHKE:

  • Forward direction (easy): exponentiation mod p is fast and feasible.
  • Inverse direction (hard): discrete logarithm with respect to a primitive root r mod p is thought to be infeasible, even when r is known to the public.

Comparison to RSA:

| Aspect | RSA | ElGamal |
| --- | --- | --- |
| Encryption operation | Exponentiation in mod n = pq | Multiplication by a random power in mod p |
| Decryption key | Multiplicative inverse in mod φ(n) of the encryption exponent | Secret discrete log α such that a = r^α |
| Key knowledge | Owner knows (p, q), so can compute φ(n) | Owner knows α, so can cancel the scrambling |
| Ciphertext size | One component | Two components (c₁, c₂) |

⚠️ Quantum vulnerability

The excerpt notes (in passing) that there is a known quantum computer algorithm (Shor's algorithm, reference [Sho94]) which solves the discrete log problem, and hence the DHP, efficiently.

  • This means ElGamal, like DHKE and RSA (whose security rests on factoring), would be vulnerable to sufficiently powerful quantum computers.
  • Don't confuse: This is a theoretical future threat; classical computers are still believed unable to solve discrete log efficiently for large primes.

🛠️ Practical workflow summary

📋 Encryption workflow

  1. Alice (setup): pick p, find primitive root r, choose secret α, compute a = r^α, publish (p, r, a).
  2. Bob (encrypt): download (p, r, a); given message m, choose random β, compute c₁ = r^β mod p and c₂ = m · a^β mod p, transmit (c₁, c₂).
  3. Alice (decrypt): receive (c₁, c₂), compute m = c₂ · c₁^(p − 1 − α) mod p.

📋 Signature workflow

  1. Alice (setup): same as encryption—publish (p, r, a), keep α secret.
  2. Alice (sign): given message m, pick random γ with gcd(γ, p−1)=1, compute s₁ = r^γ mod p and s₂ = γ^(−1) · (m − α·r^γ) mod (p−1), transmit (m, s₁, s₂).
  3. Bob (verify): download (p, r, a), receive (m, s₁, s₂), check if a^(s₁) · s₁^(s₂) ≡ r^m (mod p); if yes, accept; if no, reject.

Key difference from encryption: Signatures prove Alice knows α without revealing it; encryption hides the message m from everyone except Alice.