Counting Rocks! An Introduction to Combinatorics

1.1 Introducing combinatorics with a handshake

🧭 Overview

🧠 One-sentence thesis

The handshake problem demonstrates that the same counting question can be solved through multiple approaches—arithmetic reasoning, sequential counting, and visual graph modeling—all yielding the same answer and introducing core combinatorial techniques.

📌 Key points (3–5)

The problem: five students each shake hands with all others; the goal is to count the total number of handshakes.
Three solution methods: (1) arithmetic with double-counting correction, (2) sequential counting by student, (3) visual graph representation.
Graph terminology: vertices represent students, edges represent handshakes, and a complete graph connects every pair of vertices.
Common confusion: intersections of line segments in the graph drawing are not vertices—only the labeled dots representing students are vertices.
Why it matters: this simple problem introduces counting techniques, mathematical proof approaches, and graph theory that will be extended to larger, more complex scenarios.

🔢 Three ways to count handshakes

🔢 Arithmetic with double-counting

Each of 5 students shakes 4 hands, giving 5 times 4 equals 20.
But each handshake involves 2 students, so we counted every handshake twice.
Correction: divide 20 by 2 to get 10 handshakes.
Example: when student a shakes hands with student b, that single handshake is counted once for a and once for b.

📝 Sequential counting by student

Label the students a, b, c, d, e and count handshakes in order:
- Student a shakes hands with 4 others.
- Student b shakes hands with 3 others (excluding a, already counted).
- Student c shakes 2 more hands (excluding a and b).
- Student d shakes 1 more hand (with e).
- Student e does not need to shake any more hands (all already counted).
Total: 4 plus 3 plus 2 plus 1 equals 10.
This method avoids double-counting by ensuring each handshake is recorded only once.

🎨 Visual graph model

Draw a dot for each student and a line segment connecting each pair of dots.
Counting the line segments gives 10 handshakes.
This visual tool is called a graph.
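The three methods above can be checked against each other with a short sketch. This is plain Python rather than anything from the excerpt, and the variable names are illustrative only.

```python
from itertools import combinations

students = ["a", "b", "c", "d", "e"]
n = len(students)

# Method 1: each student shakes n - 1 hands, and every handshake is counted twice.
double_counting = n * (n - 1) // 2

# Method 2: student a adds 4 new handshakes, b adds 3, c adds 2, d adds 1, e adds 0.
sequential = sum(range(n))

# Method 3: list the edges of the graph, one per unordered pair of students.
edges = list(combinations(students, 2))

print(double_counting, sequential, len(edges))  # 10 10 10
```

Changing the list of students is enough to repeat the comparison for a larger group.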

📊 Graph terminology and structure

📊 Basic definitions

Vertices: the dots representing the students in the graph.

Edges: the line segments between vertices representing the handshakes.

Complete graph: a graph where every two vertices are connected by an edge.

The graph shown is called K5, the "complete graph on 5 vertices."
Don't confuse: intersection points of line segments that are not labeled are not vertices—they should be ignored.

🔗 Why "complete"

The graph is complete because every pair of students is connected.
In K5, every one of the 5 vertices has an edge to every other vertex.
This models the handshake scenario where every student shakes hands with all others.

🧩 What this problem introduces

🧩 Counting problems

The handshake problem is a prototype for combinatorial counting.
The excerpt states that similar questions will become more complicated when the number of students increases beyond 5.
Example: for 101 students, the handshake count involves adding numbers from 1 to 100.

🧩 Mathematical proofs and techniques

The excerpt mentions that the class will cover:

Basic counting, sets, and binomial coefficients.
Advanced counting techniques including bijections, recurrence relations, and generating functions.
Graph theory and optimization.
Combinatorial geometry, including planar graphs and coloring problems.

🧩 Extensions of the handshake problem

The excerpt lists example questions that build on the handshake scenario:

Topic area	Example question
Basic counting	How many ways can 5 students line up? How many ways can 3 students be chosen for a project?
Counting techniques	How many ways can 10 rocks be distributed among students?
Graph theory	How many phone calls are needed to share news? How many cables to connect 40 buildings? Can you trace the edges of K5 without lifting your pencil?
Combinatorial geometry	Can K5 be redrawn so edges do not intersect? How many regions can 4 students' lines divide a paper into?

🎯 Pedagogical approach

🎯 Group work and communication

The excerpt emphasizes that developing good communication patterns is more important at this stage than answering problems.
A group works best if everyone speaks for about the same amount of time and questions are valued as much as answers.
For unsolved problems, groups should decide if the question is clear or ambiguous, if all needed information is provided, and provide ideas or strategies.

🎯 Problem-solving process

The exercises ask students to:

Introduce themselves and share why they are taking the class.
Try to solve some problems.
Develop strategies for approaching other problems.
Generate ideas about more advanced problems.
Write down explanations of answers and thought processes for solved problems.

1.2 Three classical counting formulas

🧭 Overview

🧠 One-sentence thesis

Three fundamental counting techniques—triangular numbers, factorials, and binomial coefficients—provide powerful methods for solving classical problems about handshakes, orderings, and subset selection.

📌 Key points (3–5)

Triangular numbers count cumulative sums (like handshakes) using the formula n(n+1)/2.
Factorials count the number of ways to arrange n distinct objects in order.
Binomial coefficients count the number of ways to choose k elements from n elements when order doesn't matter.
Common confusion: choosing vs. arranging—binomial coefficients ignore order (choosing), while factorial-based counts care about order (arranging).
Two proof strategies: rearrangement proofs (algebraic manipulation) and combinatorial proofs (counting the same thing two different ways).

🔢 Triangular numbers and cumulative sums

🔢 What triangular numbers count

Triangular numbers Tₙ = n(n+1)/2 represent the sum of the first n integers: 1 + 2 + ... + n.

The name comes from arranging objects in a triangular pattern.
The first several triangular numbers are 1, 3, 6, 10, 15, ...
Real-world application: The number of handshakes among n+1 people equals Tₙ.
Example: For 5 students, handshakes = 1+2+3+4 = 10; for 101 students, handshakes = 1+2+...+100 = 5050.

🧮 Proof by rearrangement

Strategy: Pair numbers from opposite ends of the sequence.
Pair (1 + n), (2 + (n-1)), (3 + (n-2)), etc.
Each pair sums to n+1.
If n is even: there are n/2 pairs, so total = (n+1) × n/2.
If n is odd: there are (n-1)/2 pairs plus the middle number (n+1)/2, which also gives n(n+1)/2.

🎨 Combinatorial proof using rocks

Strategy: Count the same collection of objects in two different ways.
Arrange n(n+1) rocks in a rectangular grid with n rows and n+1 columns.
Paint the lower-left half blue: 1 rock in row 1, 2 in row 2, ..., n in row n.
First count: Blue rocks = 1 + 2 + ... + n.
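Second count: The unpainted rocks form a matching staircase (n in row 1, n-1 in row 2, ..., 1 in row n), so the blue rocks are exactly half of the n(n+1) total rocks = n(n+1)/2.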
Since both counts describe the same blue rocks, the formula is proven.
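As a sanity check (not a proof), the two counts of blue rocks can be compared numerically. The sketch below is plain Python; the function names `triangular` and `blue_rocks` are made up for this illustration.

```python
def triangular(n):
    """Closed form T_n = n(n+1)/2."""
    return n * (n + 1) // 2


def blue_rocks(n):
    """Build an n-by-(n+1) grid and paint r rocks blue in row r, then count them."""
    grid = [
        ["blue" if col < row else "plain" for col in range(n + 1)]
        for row in range(1, n + 1)
    ]
    return sum(cell == "blue" for row in grid for cell in row)


print([triangular(n) for n in range(1, 6)])  # [1, 3, 6, 10, 15]
print(triangular(100))                       # 5050
print(all(blue_rocks(n) == sum(range(1, n + 1)) == triangular(n)
          for n in range(1, 30)))            # True
```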

🔄 Factorials and ordering problems

🔄 What factorials count

Factorial n! = n · (n-1) · (n-2) · ... · 3 · 2 · 1 counts the number of ways to arrange n distinct objects in order.

Definition: 1! = 1, and for n > 1, n! = n · (n-1)!
Examples: 2! = 2, 3! = 6, 4! = 24, 5! = 120.

📋 The reasoning behind factorials

Why multiplication works: Each position has a shrinking number of choices.
Example: 5 students lining up at a desk.
- First position: 5 choices.
- Second position: 4 remaining choices.
- Third position: 3 choices.
- Fourth position: 2 choices.
- Fifth position: 1 choice.
- Total: 5 · 4 · 3 · 2 · 1 = 5! = 120.

🎯 Partial arrangements

When only some objects are arranged, use a truncated product.
Example: If only 3 of 5 students line up, the count is 5 · 4 · 3 = 5!/2!.
Example: 47 students seated in 47 labeled chairs = 47! ways.
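For small cases the orderings can simply be listed and counted. A minimal sketch in plain Python, assuming the five students are labeled a through e as in the handshake problem:

```python
from itertools import permutations
from math import factorial

students = ["a", "b", "c", "d", "e"]

# Every way to line up all five students.
full_lineups = sum(1 for _ in permutations(students))

# Every way to line up only three of the five students.
partial_lineups = sum(1 for _ in permutations(students, 3))

print(full_lineups, factorial(5))                     # 120 120
print(partial_lineups, factorial(5) // factorial(2))  # 60 60
```

Listing works only for small n; 47! orderings could never be enumerated, which is why the formula matters.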

🎲 Binomial coefficients and subset selection

🎲 What binomial coefficients count

The binomial coefficient (n choose k), written as C(n,k), is the number of ways to choose k elements from a set of n elements when order does not matter.

Pronounced "n choose k."
Formula: C(n,k) = n! / [k!(n-k)!]

🔍 Why order doesn't matter

Key distinction: Choosing vs. arranging.
When choosing 2 numbers from {1, 2, ..., n}, selecting {2, 5} is the same as selecting {5, 2}.
Example: C(n,2) = n(n-1)/2 because there are n choices for the first element and n-1 for the second, then divide by 2 since order doesn't matter.

🧩 Two methods to derive the formula

Method 1: Adjust for overcounting

Start with ordered selections: n · (n-1) · ... · (n-k+1) = n!/(n-k)!
The same k-element set can be chosen in k! different orders.
Divide by k! to remove the ordering: n! / [k!(n-k)!]

Method 2: Line up and select

There are n! ways to line up all n numbers.
Choose the first k elements in line.
Divide by k! because order among chosen numbers doesn't matter.
Divide by (n-k)! because order among unchosen numbers doesn't matter.
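Both derivations can be compared against a direct enumeration of subsets. This is a sketch in plain Python; the helper names are hypothetical, and n = 7, k = 3 is an arbitrary small test case.

```python
from itertools import combinations, permutations
from math import comb, factorial


def choose_by_overcount(n, k):
    """Method 1: count ordered selections, then divide by the k! orders of each set."""
    ordered = sum(1 for _ in permutations(range(n), k))  # equals n!/(n-k)!
    return ordered // factorial(k)


def choose_by_lineup(n, k):
    """Method 2: line up all n, then divide out the order within each group."""
    return factorial(n) // (factorial(k) * factorial(n - k))


n, k = 7, 3
direct = sum(1 for _ in combinations(range(n), k))  # list the k-element subsets
print(direct, choose_by_overcount(n, k), choose_by_lineup(n, k), comb(n, k))
# 35 35 35 35
```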

📊 Combining binomial coefficients with factorials

Problem	Setup	Answer	Reasoning
47 students, 50 labeled chairs	Choose chairs, then assign students	C(50,47) · 47!	Choose which 47 chairs to use, then arrange students in those chairs
Same problem	Add 3 ghost students	50! / 3!	Arrange all 50 "students" (including ghosts), then divide by arrangements of ghosts

Don't confuse: Both answers equal the same value because C(50,47) · 47! = [50!/(47!·3!)] · 47! = 50!/3!
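The equality of the two answers can be confirmed with exact integer arithmetic. The sketch below is plain Python; the smaller case of 2 students and 4 chairs is an extra illustration, not from the excerpt, chosen so the seatings can be listed.

```python
from itertools import permutations
from math import comb, factorial

# 47 students, 50 labeled chairs: the two setups from the table.
choose_then_seat = comb(50, 47) * factorial(47)
ghost_students = factorial(50) // factorial(3)
print(choose_then_seat == ghost_students)  # True

# Small analogue: 2 students, 4 chairs. A seating is an ordered pair of distinct chairs.
seatings = sum(1 for _ in permutations(range(4), 2))
print(seatings, comb(4, 2) * factorial(2), factorial(4) // factorial(2))  # 12 12 12
```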

🔑 Proof strategies recap

🔑 Rearrangement proofs

Manipulate the algebraic expression directly.
Example: Pairing numbers from opposite ends to show 1+2+...+n = n(n+1)/2.

🔑 Combinatorial proofs

Count the same collection of objects in two different ways.
If both methods count the same thing, the two expressions must be equal.
Example: Counting blue rocks in a rectangular grid proves the triangular number formula.

1.3 Introduction to graphs

🧭 Overview

🧠 One-sentence thesis

Graphs are visual tools that represent relationships between objects using vertices and edges, and they help solve problems about connections, walks, and equivalence between different-looking structures.

📌 Key points (3–5)

What a graph represents: vertices stand for objects (people, places) and edges represent connections (handshakes, bridges).
Bipartite graphs: useful when modeling connections between two different types of things (e.g., students and employers).
Walks on graphs: some graphs allow you to trace every edge exactly once without lifting your pencil; others (like Königsberg) do not.
Common confusion: two graphs that look different in a drawing can actually be the same graph—only which pairs of vertices are connected matters, not positions or shapes.
How to tell graphs apart: compare number of vertices, number of edges, or other structural properties like degree.

🎨 What graphs are and how to use them

🎨 Basic definition

A graph uses vertices to represent objects and edges to represent connections between pairs of objects.

The excerpt uses handshakes as an example: each person is a vertex, each handshake is an edge.
The graph K_n has n vertices, and every pair of vertices is connected by an edge; the number of edges in K_n is "n choose 2."
Example: K_5 represents 5 students shaking hands with each other.

🔗 Bipartite graphs

A bipartite graph models connections between two different types of things.

The excerpt gives the example of 5 students meeting 3 employers at a job fair: each employer shakes hands with each student.
Draw one set of vertices for students and another set for employers; edges only connect students to employers, not within the same group.
Bipartite graphs are useful for studying relationships like students-and-classes or students-and-employers.
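A graph can be stored as nothing more than a list of edges. The sketch below, in plain Python with made-up vertex labels, builds K_5 and the job-fair graph and counts their edges.

```python
from itertools import combinations, product
from math import comb


def complete_graph_edges(n):
    """Edges of K_n: one edge per unordered pair of the vertices 0..n-1."""
    return list(combinations(range(n), 2))


def complete_bipartite_edges(left, right):
    """Edges joining every vertex on the left to every vertex on the right."""
    return list(product(left, right))


students = [f"s{i}" for i in range(1, 6)]   # hypothetical labels s1..s5
employers = [f"e{j}" for j in range(1, 4)]  # hypothetical labels e1..e3

k5 = complete_graph_edges(5)
job_fair = complete_bipartite_edges(students, employers)

print(len(k5), comb(5, 2))  # 10 10
print(len(job_fair))        # 15
```

No edge in `job_fair` joins two students or two employers, which is what makes that graph bipartite.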

🚶 Walks on graphs

🌉 The Königsberg bridge problem

The problem asks: can you plan a walk that crosses each of the seven bridges exactly once?
In 1736, Euler proved it is impossible.
The excerpt presents a related question: can you trace your pencil along the edges of the Königsberg graph so that each edge is drawn exactly once, without lifting your pencil?

🖊️ Tracing edges

The challenge is to draw every edge exactly once in one continuous path.
Whether this is possible depends on the structure of the graph, not on how it is drawn.

🔄 When are two graphs the same?

🔄 What "sameness" means

When drawing a graph, the position of vertices and edges does not matter.
The only relevant information is which pairs of vertices are connected.
Two pictures can look completely different but represent the same graph if the connection pattern is identical.

🔍 How to tell if two graphs are different

The excerpt lists methods:

Method	What to check
Number of vertices	If the counts differ, the graphs are different.
Number of edges	If the counts differ, the graphs are different.
Other structural properties	Degree of vertices, planarity, bipartiteness, etc.

Don't confuse: two graphs with the same number of vertices and edges can still be different if their connection patterns differ.
Example: the excerpt asks you to find two such graphs that are still different, and to identify other ways of distinguishing graphs.
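One possible pair, offered here as an illustration rather than as the excerpt's intended answer: a path and a star on four vertices both have 3 edges, yet their degree lists differ. The sketch is plain Python with hypothetical names.

```python
from collections import Counter


def degree_sequence(vertices, edges):
    """Sorted list of vertex degrees (number of edge endpoints at each vertex)."""
    degree = Counter({v: 0 for v in vertices})
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.values())


vertices = [1, 2, 3, 4]
path = [(1, 2), (2, 3), (3, 4)]  # 1 - 2 - 3 - 4
star = [(1, 2), (1, 3), (1, 4)]  # vertex 1 joined to all others

print(degree_sequence(vertices, path))  # [1, 1, 2, 2]
print(degree_sequence(vertices, star))  # [1, 1, 1, 3]
```

Because the sorted degree lists differ, no relabeling of vertices can turn one graph into the other.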

📐 Graph properties

📐 Degree of a vertex

The degree of a vertex is the number of edges incident to that vertex, that is, the number of edges that have it as an endpoint.

The excerpt uses the cube graph as an example and asks for the degree of each vertex.
Degree is a structural property that can help distinguish graphs.

🗺️ Planar graphs

A planar graph can be drawn on a piece of paper with no edges crossing.

The excerpt asks whether the cube graph is planar.
Planarity is another way to classify and compare graphs.

🎨 Bipartite property revisited

A graph is bipartite if its vertices can be colored with two colors (e.g., green and gold) so that vertices of the same color are not connected by an edge.

The excerpt asks whether the cube graph is bipartite.
This property is useful for problems involving two distinct groups.
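Degree and two-colorability can both be tested mechanically. The sketch below assumes the cube graph is modeled as the 8 corners of a cube labeled by 3-bit strings, with an edge between labels that differ in exactly one bit; that labeling and the function name `two_color` are choices made for this example.

```python
from collections import deque
from itertools import product

# Corners of a cube as 3-bit labels; corners are joined when one bit differs.
vertices = ["".join(bits) for bits in product("01", repeat=3)]
edges = [
    (u, v)
    for i, u in enumerate(vertices)
    for v in vertices[i + 1:]
    if sum(a != b for a, b in zip(u, v)) == 1
]

neighbors = {v: [] for v in vertices}
for u, v in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)


def two_color(neighbors):
    """Return a green/gold coloring with no same-colored edge, or None if impossible."""
    color = {}
    for start in neighbors:
        if start in color:
            continue
        color[start] = "green"
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in color:
                    color[v] = "gold" if color[u] == "green" else "green"
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # an edge joins two vertices of the same color
    return color


print(len(vertices), len(edges))                    # 8 12
print({len(nbrs) for nbrs in neighbors.values()})   # {3}
print(two_color(neighbors) is not None)             # True

triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(two_color(triangle) is not None)              # False
```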

🧩 Application examples

🧩 Snow plow operator problem

Vertices represent apartment buildings, edges represent streets.
The operator needs to plow each street, return home, and minimize travel.
The excerpt asks:
- How to remove edges from the Königsberg graph to make the operator happy.
- How to add edges to make the operator happy.
- Whether the location of the operator's house matters.
- What must be true about a graph for the operator to be happy (in mathematical terms).

📬 Mail carrier problem

Vertices represent apartment buildings, edges represent streets.
The carrier needs to deliver mail to each building, return home, and minimize road travel.
The excerpt asks:
- Draw graphs where the carrier is happy and where the carrier is unhappy.
- Whether the location of the carrier's house matters.
- What must be true about a graph for the carrier to be happy (in mathematical terms).

🌉 Relation to Königsberg

The excerpt asks how the graph of Königsberg relates to the picture of the bridge problem.
It also asks how the pencil-tracing question relates to the bridge-crossing question.
Both involve the same underlying graph structure, just presented in different ways.

1.4 Introduction to SAGE

🧭 Overview

🧠 One-sentence thesis

SAGE is a free, online, open-source computing tool that allows students to compute combinatorial numbers and expressions quickly and accurately without the slowness and errors of manual calculator work.

📌 Key points (3–5)

Why use SAGE: Exact formulas for triangular numbers, factorials, binomial coefficients, and other famous numbers can be slow and error-prone on calculators; SAGE is faster and more powerful.
What SAGE is: A free online open-source program based on Python, accessible through web interfaces without installation.
How to use it: Type simple commands (e.g., arithmetic, factoring) and evaluate; for larger jobs or saving work, create a free CoCalc account.
Key advantage over manual methods: SAGE can compute many values at once using list comprehensions and conditional filtering, making exploration much faster.

💻 What SAGE is and where to find it

💻 Definition and access

SAGE: a free online open-source program, based on Python.

Quick access: Go to http://sagecell.sagemath.org/ for immediate use without an account.
For larger jobs or saving work: Create a free CoCalc account at http://www.sagemath.org/.
Documentation: The SAGE reference manual is at http://doc.sagemath.org/html/en/reference/.
Advanced combinatorics: A tutorial for using SAGE in combinatorics is at https://doc.sagemath.org/html/en/reference/combinat/sage/combinat/tutorial.html.

🧮 Why use computing software instead of calculators

Speed: Computing many values manually or on a calculator is slow.
Accuracy: Calculators round large values and invite keying mistakes; SAGE returns exact integer results.
Power: SAGE can handle complex expressions and large numbers that are impractical for manual computation.
Example: Computing factorials of large numbers or powers like 4 to the 4 to the 4 is tedious by hand but instant in SAGE.

🔧 Basic SAGE commands

➕ Simple arithmetic and factoring

Arithmetic: Type expressions like 2+3 and evaluate.
Factoring: Use factor(2021) to factor integers.
Powers: Compute exponents with the caret symbol, e.g., 3^6 gives 729.
Factorials: Use factorial(6) to compute 6 factorial (which equals 720).

🔍 Comparing values

Example from the excerpt: Which is bigger, 3 to the 6th power or 6 factorial?

Method 1: Compute each separately:
- 3^6 outputs 729
- factorial(6) outputs 720
Method 2: Use a comparison expression:
- 3^6 - factorial(6) > 0 outputs True, confirming that 3^6 is larger.

This approach avoids manual comparison and gives a direct true/false answer.
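The same comparison can be run outside SAGE. The sketch below is plain Python, which is an assumption of this example; note that plain Python writes powers with `**`, because there `^` means bitwise XOR, whereas SAGE accepts the caret for powers.

```python
from math import factorial

power = 3 ** 6  # plain Python exponentiation; SAGE also accepts 3^6

print(power)                  # 729
print(factorial(6))           # 720
print(power > factorial(6))   # True
print(power - factorial(6))   # 9
```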

📋 Computing multiple values at once

📋 List comprehensions

SAGE can compute many related values in a single command using list comprehensions.

Example from the excerpt: For which values of k is the binomial coefficient (11 choose k) bigger than 100?

Slow method: Compute (11 choose k) for each k from 0 to 11 individually and check each result.
Fast method: Compute all at once:
- [binomial(11, k) for k in range(12)] outputs [1, 11, 55, 165, 330, 462, 462, 330, 165, 55, 11, 1]
- From this list, you can see that (11 choose k) > 100 exactly when 3 ≤ k ≤ 8.

🎯 Filtering with conditions

You can combine computation with filtering to get only the values that meet a condition.

Command: [k for k in range(12) if binomial(11,k) >100]
Output: [3, 4, 5, 6, 7, 8]
This directly gives the values of k that satisfy the condition, without needing to manually inspect the full list.

⚠️ Important note about `range(m)`

The command range(m) includes all integers from 0 up to m minus 1.
Example: range(12) includes 0, 1, 2, ..., 11 (not 12).
Don't confuse: range(12) does not include 12 itself; it stops at 11.
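The list comprehensions above also run in plain Python if SAGE's `binomial` is swapped for `math.comb` (available in Python 3.8 and later). That substitution is an assumption of this sketch, not part of the excerpt.

```python
from math import comb

row = [comb(11, k) for k in range(12)]
big = [k for k in range(12) if comb(11, k) > 100]

print(list(range(12)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(row)              # [1, 11, 55, 165, 330, 462, 462, 330, 165, 55, 11, 1]
print(big)              # [3, 4, 5, 6, 7, 8]
```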

🧪 Example problems from the excerpt

🧪 Comparing exponentials and factorials

Problem: Which is bigger, 3^6 or 6!?

Compute 3^6 → 729
Compute factorial(6) → 720
Conclusion: 3^6 is bigger by 9.

🧪 Finding binomial coefficients above a threshold

Problem: For which values of k is (11 choose k) bigger than 100?

Generate all binomial coefficients: [binomial(11, k) for k in range(12)]
Inspect the list or filter directly: [k for k in range(12) if binomial(11,k) >100]
Answer: k = 3, 4, 5, 6, 7, 8.

🧪 Very large computations

The exercises mention computing 4^(4^4) and (4!)! (factorial of 4 factorial).

The numbers 4^(4^4) = 4^256 and (4!)! = 24! are far too large to compute comfortably by hand or on most calculators.
SAGE handles them instantly, demonstrating its power for combinatorial and number-theoretic exploration.
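Exact big-integer arithmetic makes both values easy to inspect. A sketch in plain Python (an assumption; the excerpt uses SAGE), which also shows why the parentheses in 4^(4^4) matter:

```python
from math import factorial

tower = 4 ** (4 ** 4)              # 4^(4^4) = 4^256
nested = factorial(factorial(4))   # (4!)! = 24!

print(len(str(tower)))  # 155  (number of decimal digits)
print(nested)           # 620448401733239439360000
print((4 ** 4) ** 4)    # 4294967296, a much smaller number than 4^(4^4)
```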

1.5 Theorems, Lemmas, and Propositions oh my!

🧭 Overview

🧠 One-sentence thesis

Mathematics uses different labels—theorem, lemma, proposition, corollary, conjecture, definition, example, and remark—to organize facts by their role and proof status, helping readers understand which statements are proven tools, which are unproven beliefs, and which are simply explanations.

📌 Key points (3–5)

Proven facts have different labels: theorem, lemma, proposition, and corollary are all proven statements, but they serve different purposes in mathematical exposition.
Lemmas vs propositions: lemmas are proven as tools to help prove bigger theorems (like subroutines in coding), while propositions are interesting on their own but less central than the main theorem.
Conjectures vs theorems: a conjecture is a statement mathematicians believe to be true but have not yet proven; once proven, it becomes a theorem.
Common confusion: definitions require no proof—they simply describe what a term means—while theorems, lemmas, propositions, and corollaries all require rigorous proofs.
Why labels matter: these tags help readers navigate formal mathematics by signaling whether a statement is a major result, a supporting tool, an immediate consequence, or an unproven hypothesis.

🏛️ Proven statements and their roles

🏛️ What is a theorem

A theorem in mathematics is a statement that is known to be true and has a rigorous proof (or several known proofs).

Lemmas, propositions, and corollaries are all types of theorems—they differ only in how they are used in mathematical exposition.
The label "theorem" is typically reserved for important, central results.

🔧 What is a lemma

A lemma is a fact that is proven before proving a big important theorem, because it will be used as a tool in the main proof.

Why lemmas exist: they break down complex proofs into manageable pieces.
Analogy: lemmas are like subroutines in coding—reusable building blocks.
When to use: if the base case of an induction is complicated, it can be proven as a lemma first and then referred to in the main proof.
Sometimes an important mathematical fact is called a lemma because it is used to prove many different things.

📄 What is a proposition

A proposition is a fact that is interesting on its own, is not just a tool to prove other theorems, but is perhaps not quite as big or interesting as the theorem you're talking about in that section or paper.

Key distinction: propositions stand alone as interesting results, unlike lemmas which are primarily tools.
Subjectivity: the excerpt notes "it is indeed all rather subjective"—the choice between calling something a theorem or a proposition depends on the author's judgment of its importance.

➡️ What is a corollary

A corollary is a fact that follows immediately from a theorem that was just stated, perhaps with just one or two lines of proof needed to explain why it follows.

Immediacy: corollaries require minimal additional work beyond the theorem they follow.
They capture consequences that are obvious once the main theorem is proven.

🤔 Unproven and non-proof statements

🔮 What is a conjecture

A conjecture is a statement that mathematicians believe to be true but do not have a proof.

Example from the excerpt: if you notice that the sum of the first 5 odd numbers is 5 squared, and the sum of the first 6 odd numbers is 6 squared, you might conjecture that the sum of the first n odd numbers is always n squared—but until you prove it, it remains a conjecture, not a theorem.
Famous example: Goldbach's conjecture states that every even number greater than two is the sum of two prime numbers; mathematicians have many such conjectures they don't know how to prove.
Don't confuse: a conjecture becomes a theorem only after someone provides a rigorous proof.
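A computer can gather evidence for a conjecture but cannot turn it into a theorem by checking cases. The plain-Python sketch below tests the odd-number pattern for the first 1000 values of n; the cutoff and the function name are arbitrary choices for this illustration.

```python
def sum_first_odds(n):
    """Sum of the first n odd numbers: 1 + 3 + ... + (2n - 1)."""
    return sum(2 * i + 1 for i in range(n))


print(sum_first_odds(5), sum_first_odds(6))  # 25 36

# Evidence only: this checks finitely many cases and says nothing about larger n.
checked = all(sum_first_odds(n) == n * n for n in range(1, 1001))
print(checked)  # True
```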

📖 What is a definition

A definition is a statement that describes what a term means in mathematics.

No proof needed: definitions require no proof—they are just definitions.
Sometimes definitions are also referred to as notation.
Key distinction: unlike theorems, lemmas, propositions, and corollaries, definitions do not make claims that need proving; they simply establish meaning.

🧪 What is an example

An example is an instance of a theorem in a special case, usually obtained by plugging in numbers for the variables in the theorem or considering an application.

Examples illustrate how a theorem works in practice.
They help readers understand abstract statements by showing concrete cases.

💡 What is a remark

A remark is an observation whose proof is often clear or brief.

Purpose after a theorem: remarks often add insight that might not be obvious from simply reading a theorem statement.
Pointing to more: other times, a remark may point to a stronger theorem which is known to be true, but whose proof is not given in the current exposition.

📊 Summary comparison

Label	Proven?	Purpose
Theorem	Yes	Important, central result with rigorous proof
Lemma	Yes	Tool proven to help prove bigger theorems (like a subroutine)
Proposition	Yes	Interesting on its own, but less central than the main theorem
Corollary	Yes	Follows immediately from a theorem with minimal extra proof
Conjecture	No	Believed true but not yet proven
Definition	N/A	Describes what a term means; no proof needed
Example	N/A	Special case illustrating a theorem
Remark	Often yes, briefly	Adds insight or points to related results

2.1 Motivation

🧭 Overview

🧠 One-sentence thesis

This section uses student dialogue to illustrate common counting mistakes (overcounting due to order, miscounting when cases overlap or roles are confused, and misapplying probability) and motivates the need to learn systematic counting principles.

📌 Key points (3–5)

Two example problems: choosing 6 numbers from {1, …, 20} with at least four odd, and counting poker "full house" hands.
Student a's mistake: multiplies choices in sequence but forgets that order doesn't matter, leading to overcounting.
Student b's insight and error: correctly recognizes order doesn't matter and uses combinations, but still makes mistakes (e.g., in the first problem, the "at least four" cases of four, five, or six odd numbers are not separated, so selections with five or six odd numbers are counted more than once).
Student c's confusion: tries to use probability but gets a non-integer answer, revealing a conceptual error.
Common confusion: when order matters vs. when it doesn't; when to multiply vs. when to add; how to handle "at least" conditions (multiple cases).

🎭 The two counting problems

🎲 Problem 2.1.1: Choosing numbers with odd constraints

Question 2.1.1: How many ways are there to choose 6 numbers from the set {1, …, 20} so that at least four of them are odd?

The set {1, …, 20} contains 10 odd and 10 even numbers.
"At least four odd" means four, five, or six odd numbers among the six chosen.
This is a combination problem (order doesn't matter) with a constraint that requires breaking into cases.

🃏 Problem 2.1.2: Poker full house

Question 2.1.2: In a game of poker, you are dealt five cards from a standard 52-card deck. How many ways can you be dealt a full house?

A full house is defined as a triple of one number (rank) plus a pair of a different number.
Example: 6♥, 6♣, 6♦ together with 3♥, 3♠.
The deck has 13 ranks and 4 suits per rank.

🧑‍🎓 Student a: Overcounting by treating order as mattering

🔢 Approach on Problem 2.1.1

Student a multiplies: 10 · 9 · 8 · 7 · 16 · 15 = 1,209,600.
Logic: pick the first odd (10 choices), second odd (9 choices), third odd (8), fourth odd (7), then any two from the remaining 16 numbers (16 · 15).
Error: this counts each set of six numbers multiple times, once for each order in which they could be chosen.
Example: choosing {1, 3, 5, 7, 2, 4} is counted differently from {3, 1, 5, 7, 2, 4}, but they are the same set.

🃏 Approach on Problem 2.1.2

Student a multiplies: 52 · 3 · 48 · 3 = 22,464.
Logic: pick the first card (52 choices), then two more of that rank (3 ways to choose 2 from 3 remaining suits), then pick another card (48 choices), then one more of that rank (3 ways).
Error: again, order of dealing doesn't matter in a hand, so this overcounts.
Student b points out: "That's over counting again because it doesn't matter which card is dealt first."

🧑‍🎓 Student b: Recognizes combinations but mishandles cases

🔢 Approach on Problem 2.1.1

Student b calculates: (10 choose 4) · (16 choose 2) = 25,200.
Logic: choose 4 odd numbers from 10, then choose 2 numbers from the remaining 16.
Correct insight: order doesn't matter, so use combinations.
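Error: "at least four odd" means four or five or six odd numbers, and the remaining 16 numbers still include 6 odd ones, so the single product covers all three cases but counts some selections repeatedly.
Overcount: a selection with exactly five odd numbers is counted 5 times (once for each choice of which four odd numbers were picked first), and a selection with six odd numbers is counted 15 times.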

🃏 Approach on Problem 2.1.2

Student b calculates: (13 choose 2) · (4 choose 3) · (4 choose 2) = 1,872.
Logic: choose 2 ranks from 13, then choose 3 suits for the first rank (the triple) and 2 suits for the second rank (the pair).
Error: (13 choose 2) treats the two ranks symmetrically, but a full house distinguishes which rank is the triple and which is the pair; the correct count should first choose which rank is the triple (13 choices), then which rank is the pair (12 choices), not (13 choose 2).
The formula undercounts by a factor of 2: since 13 · 12 = 2 · (13 choose 2), the correct count is 13 · 12 · (4 choose 3) · (4 choose 2) = 3,744.

🧑‍🎓 Student c: Misapplying probability

🔢 Approach on Problem 2.1.1

Student c calculates: (20 choose 6) · (1/2)^4 = 2422.5.
Logic: total ways to choose 6 from 20, times the "probability" that four are odd (treating each number as having probability 1/2 of being odd).
Error: the answer is not an integer, which is impossible for a counting problem.
The probability approach is incorrect here because the constraint "at least four odd" is not the same as a binomial probability with independent trials; the numbers are chosen without replacement from a finite set.

🃏 Approach on Problem 2.1.2

Student c doesn't know what a full house is and asks for clarification.
This highlights the importance of understanding the problem definition.

🧑‍🎓 Student d: Using external data without understanding

🔢 Approach on Problem 2.1.1

Student d suggests moving to the next question when student c gets a non-integer answer.
Correct observation: asks "What if there are more than four odd numbers?"—recognizing that "at least four" includes multiple cases.

🃏 Approach on Problem 2.1.2

Student d looks up the probability of a full house (0.001441) and the total number of 5-card hands ((52 choose 5) = 2,598,960).
Multiplies them: 0.001441 · 2,598,960 ≈ 3745.10136.
Correct insight: the number of full houses should equal the probability times the total number of hands.
Error: the final answer should be an integer; the decimal appears because the looked-up probability is rounded to four significant digits.
Student d is close but doesn't verify the reasoning or the exact value.

🧩 Common mistakes and lessons

❌ Order matters vs. order doesn't matter

Don't confuse: when you care about the sequence (permutations) vs. when you only care about the set (combinations).
Student a's errors stem from treating unordered problems as ordered.
Example: choosing {1, 3, 5} is the same as choosing {3, 1, 5} if order doesn't matter.

❌ "At least" means multiple cases

"At least four odd" means four, five, or six odd numbers.
Student b used a single product, (10 choose 4) · (16 choose 2), which counts selections with five or six odd numbers more than once.
Lesson: break "at least" constraints into separate cases and add them.
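The case split for Question 2.1.1 can be checked against a brute-force listing, since there are only (20 choose 6) = 38,760 selections. This is a sketch in plain Python, not the excerpt's own solution.

```python
from itertools import combinations
from math import comb

# Cases: exactly 4, 5, or 6 odd numbers (from 10 odd), the rest even (from 10 even).
by_cases = sum(comb(10, odd) * comb(10, 6 - odd) for odd in (4, 5, 6))

# Brute force: list every 6-element subset and keep those with at least 4 odd numbers.
brute_force = sum(
    1
    for chosen in combinations(range(1, 21), 6)
    if sum(x % 2 for x in chosen) >= 4
)

print(by_cases, brute_force)  # 12180 12180
```

The three cases contribute 9,450, 2,520, and 210 selections respectively.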

❌ Probability vs. counting

Probability formulas (like binomial probability) assume independent trials; counting problems with "choose without replacement" require combinatorial methods.
Student c's non-integer answer signals a conceptual error.

❌ Symmetry and labeling

In the full house problem, the triple and the pair are not interchangeable.
Student b's use of (13 choose 2) treats them symmetrically, which is incorrect.
Lesson: distinguish labeled vs. unlabeled choices.
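A labeled count for the full house, compared with the students' numbers. The sketch is plain Python; the variable names are illustrative, and the figures for students a, b, and d are the ones quoted in the dialogue.

```python
from math import comb

# Choose the triple's rank and 3 of its 4 suits, then the pair's rank and 2 of its suits.
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)

student_a = 52 * 3 * 48 * 3
student_b = comb(13, 2) * comb(4, 3) * comb(4, 2)

print(full_houses)                            # 3744
print(student_a // full_houses)               # 6  (student a counts each hand 6 times)
print(full_houses // student_b)               # 2  (student b counts half of the hands)
print(round(full_houses / comb(52, 5), 6))    # 0.001441, the probability student d found
```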

📝 Exercises and takeaways

📝 What the exercises ask

Identify something correct and something wrong in each student's reasoning.
Assess how close the students are to solving each problem and what advice to give.

📝 Takeaway

The section ends with student c saying "Maybe we need to read more of the book."
This motivates the rest of Chapter 2, which will introduce systematic counting principles (addition, subtraction, multiplication, division, and inclusion-exclusion) to avoid these common mistakes.
The dialogue format illustrates real pitfalls: overcounting, undercounting, misapplying probability, and not breaking problems into cases.

2.2 The addition and subtraction principles

🧭 Overview

🧠 One-sentence thesis

The addition and subtraction principles speed up counting by splitting collections into shorter lists or by counting a larger set and removing unwanted items.

📌 Key points (3–5)

Why these principles matter: Counting one-by-one is always possible for finite sets, but it is slow and error-prone; splitting or subtracting speeds things up.
Addition principle: count separate sub-collections and add the totals together.
Subtraction principle: count a larger collection and subtract the unwanted items.
Common confusion: addition requires separate collections (e.g., things ending in 0 vs. things ending in 1), not overlapping groups.
When to use which: use addition when you can naturally split items into non-overlapping groups; use subtraction when it's easier to count "everything" and remove what you don't want.

🧮 Why we need smarter counting strategies

🐌 The problem with counting one-by-one

Counting "by 1's" (adding 1 for every item) always works for finite collections.
But it is very slow and easy to make mistakes:
- You might forget some items.
- You might count some items twice.
Even using a computer to generate the list can be slow if the list is very long.

🧩 The solution: split or subtract

Split: break the collection into shorter lists, count each, then add.
Subtract: count a larger set that includes everything, then remove what you don't want.

➕ The addition principle

📐 Definition and core idea

Addition Principle: The sum a + b counts the total number of things in a collection formed by adding a collection of b things to a collection of a things.

You have two (or more) separate collections.
Count each collection separately.
Add the counts together to get the total.

🔢 Example: two-digit integers ending in 0 or 1

Goal: count two-digit positive integers ending in 0 or 1.
Step 1: count those ending in 0 → 9 numbers (one for each tens digit: 10, 20, 30, …, 90).
Step 2: count those ending in 1 → 9 numbers (11, 21, 31, …, 91).
Step 3: add them → 9 + 9 = 18 total.
The two groups do not overlap (a number cannot end in both 0 and 1), so addition is safe.
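
The count above is small enough to confirm by brute force. A short sketch (the variable names are illustrative, not from the text):

```python
two_digit = range(10, 100)  # the two-digit positive integers 10..99

ends_in_0 = [n for n in two_digit if n % 10 == 0]
ends_in_1 = [n for n in two_digit if n % 10 == 1]
ends_in_0_or_1 = [n for n in two_digit if n % 10 in (0, 1)]

# The two lists share no number, so their lengths add up to the combined count.
print(len(ends_in_0), len(ends_in_1), len(ends_in_0_or_1))
```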

🍎 Example: apples and bananas

You have 90 pieces of fruit (apples, bananas, and oranges).
You want to count apples and bananas together.
Instead of counting each apple and banana one-by-one, you can count apples, count bananas, then add.
(The excerpt mentions this scenario to motivate the subtraction principle next.)

➖ The subtraction principle

📐 Definition and core idea

Subtraction Principle: If b things are removed from a collection of a things, then there are a − b things left.

Count a larger, easier-to-count collection.
Subtract the count of the items you don't want.
What remains is the count you need.

🍊 Example: apples and bananas (using subtraction)

You have 90 pieces of fruit total.
You can easily see there are 3 oranges.
So apples and bananas together = 90 − 3 = 87.
This is faster than counting apples and bananas one-by-one.

🔢 Example: two-digit integers not ending in 9

Goal: count two-digit positive integers ending in 0, 1, 2, 3, 4, 5, 6, 7, or 8.
Alternative to addition: instead of adding 9 + 9 + 9 + … (nine times), use subtraction.
Step 1: count all two-digit positive integers → 90 (from 10 to 99).
Step 2: count those ending in 9 → 9 (19, 29, …, 99).
Step 3: subtract → 90 − 9 = 81.
This is simpler than nine separate additions.
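
As a sanity check, the subtraction can be compared with a direct count. A brief sketch with illustrative names:

```python
two_digit = range(10, 100)

all_count = len(two_digit)                            # every two-digit integer
ends_in_9 = sum(1 for n in two_digit if n % 10 == 9)  # the unwanted ones

by_subtraction = all_count - ends_in_9
by_direct_count = sum(1 for n in two_digit if n % 10 != 9)

print(by_subtraction, by_direct_count)
```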

🧷 When to use each principle

➕ Use addition when…

You can naturally split the collection into non-overlapping groups.
Each group is easy to count separately.
Example: counting numbers ending in 0 or 1 (two separate groups).

➖ Use subtraction when…

It is easier to count a larger set that includes everything.
The unwanted items are easy to count.
Example: counting all two-digit numbers, then removing those ending in 9.

⚠️ Don't confuse: overlapping vs. separate

Addition only works if the sub-collections do not overlap.
If a number could be in both groups, you would count it twice.
Example: "ends in 0" and "ends in 1" are separate; "ends in 0" and "is even" overlap (10, 20, … are in both), so simple addition would overcount.

2.3 The multiplication and division principles

🧭 Overview

🧠 One-sentence thesis

The multiplication and division principles provide systematic ways to count outcomes when choices are made in sequence or when identical groupings must be accounted for, and they explain why formulas like combinations divide by factorial terms.

📌 Key points (3–5)

Multiplication principle: counts the number of ways to make one choice in a ways and then another choice in b ways, giving a · b total outcomes.
Division principle (two forms): either counts how many collections result when a things are grouped evenly into size b (a/b collections), or counts the size of each collection when a things are divided into c equal groups (a/c per group).
Why division matters for combinations: when order doesn't matter, we've overcounted by the number of ways to rearrange items, so we divide to correct for this.
Common confusion: addition vs multiplication—use addition when choosing "one thing or another" (separate cases); use multiplication when choosing "one thing and then another" (sequential choices).
Combining principles: real problems often require mixing addition (for cases), multiplication (for sequential choices), and division (for overcounting corrections) in the same solution.

🔢 The multiplication principle

🔢 What it counts

Multiplication Principle: The product a · b counts the number of ways to choose one thing in a ways and then another in b ways.

It applies when you make sequential choices where the second choice has the same number of options regardless of the first choice.
The key word is "and then"—you perform one action followed by another.

👕 Example: outfits

Student k owns 15 shirts and 6 pairs of pants. How many outfits (one shirt and one pair of pants)?
First choose a shirt (15 ways), then choose pants (6 ways).
Total: 15 · 6 = 90 possible outfits.

🔢 Example: two-digit integers

Count two-digit positive integers ending in 0, 4, or 8.
First digit: 9 ways (1 through 9, since it's a two-digit number).
Second digit: 3 possibilities (0, 4, or 8).
Total: 9 · 3 = 27 possibilities.
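
Both examples can be reproduced by listing every "first choice, then second choice" combination. The sketch below uses `itertools.product` from the Python standard library; the shirt and pants labels are placeholders invented for illustration.

```python
from itertools import product

# Outfits: 15 shirts, then 6 pairs of pants (labels are hypothetical).
shirts = [f"shirt{i}" for i in range(1, 16)]
pants = [f"pants{i}" for i in range(1, 7)]
outfits = list(product(shirts, pants))

# Two-digit integers ending in 0, 4, or 8: tens digit first, then last digit.
tens_digits = range(1, 10)
last_digits = (0, 4, 8)
numbers = [10 * t + u for t, u in product(tens_digits, last_digits)]

print(len(outfits), len(numbers))
```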

➗ The division principles

➗ Division Principle 1: grouping into fixed-size collections

Division Principle 1: If a things are grouped evenly into collections of size b, there are a/b collections.

You know the size of each group and want to find how many groups.
Example: 90 pieces of fruit, 10 pieces per basket → 90 / 10 = 9 baskets needed.
Example: 36 students, 4 people per van → 36 / 4 = 9 vans needed.

➗ Division Principle 2: dividing into a fixed number of collections

Division Principle 2: If a things are grouped evenly into c collections of equal size, the number of things in each collection is a/c.

You know the number of groups and want to find the size of each group.
Example: 90 pieces of fruit divided evenly into 9 baskets → 90 / 9 = 10 pieces per basket.
Example: 36 students divided evenly into 9 cars → 36 / 9 = 4 people per car.

🎟️ Example: lotto tickets and overcounting

A lotto ticket is 5 different numbers from {1, ..., 90}, where order does not matter.
Overcounting approach: There are 90 · 89 · 88 · 87 · 86 ways to pick 5 different numbers one at a time (multiplication principle).
But this counts each ticket multiple times—once for each ordering of the same 5 numbers.
Each set of 5 numbers can be arranged in 5! = 120 different orders.
So we've counted each ticket 5! times.
Correction: Divide by 5! to get the true count: (90 · 89 · 88 · 87 · 86) / 5! = 43,949,268.
This explains why the combination formula divides by the factorial.
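
The overcount-then-divide argument can be checked numerically. A small sketch using the standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later):

```python
from math import comb, factorial, perm

ordered_picks = perm(90, 5)  # 90 * 89 * 88 * 87 * 86 ordered selections
orderings = factorial(5)     # each ticket appears once per ordering of its 5 numbers

tickets = ordered_picks // orderings

print(tickets)
print(tickets == comb(90, 5))  # same value as the built-in binomial coefficient
```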

🤝 Example: pairing people

🤝 Problem setup

How many different ways can 10 people be grouped into 5 pairs of two?

🤝 Solution 1: multiplication principle

Order the people by a predetermined criterion (e.g., age).
The youngest person has 9 possible partners.
The youngest remaining has 7 possible partners.
The youngest remaining has 5 possible partners.
The youngest remaining has 3 possible partners.
The last two people are paired together.
Total: 9 · 7 · 5 · 3 · 1 = 945.

🤝 Solution 2: division principle

There are 10! ways to arrange 10 people in a line.
Pair the first and second, third and fourth, etc.
But this overcounts:
- There are 5! ways to rearrange the ordering of the 5 pairs (doesn't change who is paired with whom).
- There are 2 ways to switch the order within each of the 5 pairs (2⁵ total), which also doesn't change pairings.
Total: 10! / (5! · 2⁵) = 945.
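
The two solutions can be compared against a direct enumeration. The sketch below is one possible way to do it: the helper `pairings` is hypothetical (not from the text) and mirrors Solution 1 by always pairing the first remaining person with each possible partner.

```python
from math import factorial

def pairings(people):
    """Yield every way to split `people` (even length) into unordered pairs."""
    if not people:
        yield []
        return
    first, rest = people[0], people[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

people = list("abcdefghij")  # 10 people, labeled arbitrarily

by_enumeration = sum(1 for _ in pairings(people))
by_multiplication = 9 * 7 * 5 * 3 * 1
by_division = factorial(10) // (factorial(5) * 2**5)

print(by_enumeration, by_multiplication, by_division)
```

All three values should agree, since the enumeration never produces the same set of pairs twice.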

🔀 Combining the principles

🔀 When to use each principle

Principle	When to use	Key word
Addition (a + b)	Choosing one thing or another (separate cases)	"or"
Subtraction (a − b)	Removing b things from a things	"not" / "except"
Multiplication (a · b)	Choosing one thing and then another (sequential choices)	"and then"
Division (a/b)	Grouping evenly or correcting for overcounting	"grouped into" / "order doesn't matter"

🔀 Distinguishing addition vs multiplication

Addition: making one choice or another (mutually exclusive cases).
Multiplication: making one choice and then another (successive choices).
Example: 3 car models and 4 bike models.
- Buy one vehicle (car or bike): 3 + 4 = 7 ways.
- Buy one car and one bike: 3 · 4 = 12 ways.

👔 Example: matching outfit colors

Bob has 5 brown shirts and 4 blue shirts, 3 brown pants and 4 blue pants.
He wants shirt and pants to have the same color. How many ways?
Split into cases (addition): brown outfit or blue outfit.
- Brown case: pick one of 5 brown shirts and then one of 3 brown pants → 5 · 3 = 15 (multiplication).
- Blue case: pick one of 4 blue shirts and then one of 4 blue pants → 4 · 4 = 16 (multiplication).
Add the cases: 15 + 16 = 31 total outfits.

🔀 Summary table

Expression	Combinatorial meaning
a + b	Total size of a collection formed by adding b things to a things
a − b	Number of elements left after removing b things from a things
a · b	Number of ways to choose one thing from a things and then one from b things
a/b	If a things can be sorted into collections of size b, then a/b is the number of collections

2.4 Combining the principles

🧭 Overview

🧠 One-sentence thesis

Solving counting problems requires carefully choosing and combining the four basic principles—addition, subtraction, multiplication, and division—by recognizing whether you are making one choice or another (addition) versus one choice then another (multiplication).

📌 Key points (3–5)

Four basic principles summarized: addition (total size after adding), subtraction (elements left after removing), multiplication (ways to choose one thing then another), and division (number of collections when sorting).
Common confusion—addition vs multiplication: use addition when making one choice or another; use multiplication when making one choice then another in succession.
Combining principles: real problems often require splitting into cases (addition) and then applying multiplication within each case.
Key difficulty: small errors in choosing the wrong principle at any step can make the entire answer wrong.

📋 The four basic principles

➕ Addition principle

a + b: The total size of a collection formed by adding b things to a things.

Use addition when you are counting "this or that"—mutually exclusive alternatives.
Example: 3 car models and 4 bike models; buying one vehicle means 3 + 4 = 7 ways (you pick a car or a bike, not both).

➖ Subtraction principle

a − b: The number of elements left after removing b things from a things.

Straightforward: start with a, take away b, count what remains.

✖️ Multiplication principle

a · b: The number of ways to choose one thing from a things and then one from b things.

Use multiplication when you are making one choice then another in succession, and the number of options for the later choice does not depend on which earlier choice was made.
Example: 3 car models and 4 bike models; buying one car and one bike means 3 · 4 = 12 ways (you pick a car then a bike).

➗ Division principle

a/b: If a things can be sorted into collections of size b, then a/b is the number of collections.

Used when grouping or partitioning objects into equal-sized sets.

🔀 Distinguishing addition and multiplication

🔀 The "or" vs "then" rule

Addition: one choice or another—alternatives that don't happen together.
Multiplication: one choice then another—sequential choices made in succession.
Don't confuse: two problems can look similar but require different principles, depending on whether you are choosing between alternatives or making several choices in succession.

🧪 Example: vehicles

Question	Principle	Calculation	Reason
Buy one vehicle (car or bike)	Addition	3 + 4 = 7	You choose a car or a bike
Buy one car and one bike	Multiplication	3 · 4 = 12	You choose a car then a bike

🧩 Combining principles in practice

🧩 Strategy: split into cases

Many problems require breaking into mutually exclusive cases (addition) and then applying multiplication within each case.
Example: Bob has 5 brown shirts, 4 blue shirts, 3 brown pants, and 4 blue pants. He wants an outfit where shirt and pants match in color.
- Case 1 (brown outfit): pick one of 5 brown shirts then one of 3 brown pants → 5 · 3 = 15 ways.
- Case 2 (blue outfit): pick one of 4 blue shirts then one of 4 blue pants → 4 · 4 = 16 ways.
- Total: brown or blue → 15 + 16 = 31 outfits.
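
The case split can be verified by listing every shirt and pants combination and keeping the color matches. This is a sketch; the tuples used to label the clothes are an illustrative encoding, not something from the text.

```python
from itertools import product

# Each item is (color, index); the indices only make the items distinct.
shirts = [("brown", i) for i in range(5)] + [("blue", i) for i in range(4)]
pants = [("brown", i) for i in range(3)] + [("blue", i) for i in range(4)]

# Multiplication principle: all shirt-then-pants choices, filtered to matching colors.
matching = [(s, p) for s, p in product(shirts, pants) if s[0] == p[0]]

# Addition principle: the brown and blue cases are mutually exclusive.
brown = sum(1 for s, _ in matching if s[0] == "brown")
blue = sum(1 for s, _ in matching if s[0] == "blue")

print(brown, blue, len(matching))
```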

⚠️ Watch for errors

The excerpt emphasizes: "Small errors in the set-up can make an answer completely wrong."
At each step, ask: am I choosing alternatives (add) or making sequential choices (multiply)?
Always identify which principle you are using at each step to avoid mixing them up.

2.5 Sets, subsets, and the number of subsets

🧭 Overview

🧠 One-sentence thesis

A set with n elements has exactly 2 to the power of n subsets, because forming each subset requires n independent yes-or-no decisions about whether to include each element.

📌 Key points (3–5)

What a set is: an unordered collection of distinct objects; order does not matter, and elements are not repeated.
What a subset is: a set contained in a bigger set—every element of the subset is also in the larger set.
The subset-counting formula: a set with n elements has 2 to the power of n subsets (including the empty set and the set itself).
Why the formula works: for each of the n elements, you make 2 choices (include it or not), so by the multiplication principle the total is 2 × 2 × … × 2 (n times).
Common confusion: the empty set is a subset of every set, and every set is a subset of itself—these are often overlooked when listing subsets.

📦 What sets are and how to write them

📦 Basic definition of a set

Set: a collection of distinct objects (in no particular order). The objects in a set are called elements.

Sets are written inside curly brackets: { }.
Example: {a, b, c} and {1, 2, 3, 4} are both sets.
Order does not matter: {a, b, c} is the same set as {b, c, a}.
Don't confuse: a set is not a sequence or list—rearranging elements does not create a different set.

🔢 Size (cardinality) of a set

Size (or cardinality) of a finite set A (written |A|): the number of elements in A.

Example: |{2, 4, 5}| = 3.
Counting problems can be phrased as computing the cardinality of a set.
The excerpt notes that infinite sets also have cardinality, and there are even different sizes of infinity, but this topic is not covered.

✍️ Set builder notation

Set builder notation: {x ∈ A | x has property P} represents the set of all elements x in the set A that satisfy property P.

Sometimes written {x | x has property P} if the containing set A is clear.
Example: {x ∈ {2, 4, 5} | x is even} = {2, 4}.
Example: the set of all integers whose last digit is zero can be written {10 · k | k is an integer}.
This notation helps describe large or infinite sets without listing every element.
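
Set builder notation has a close analogue in Python's set comprehensions. A small sketch; since a program cannot hold an infinite set, the second example restricts k to a finite window chosen only for illustration.

```python
A = {2, 4, 5}

# {x ∈ A | x is even}
evens_in_A = {x for x in A if x % 2 == 0}

# A finite slice of {10 · k | k is an integer}, here with k from -3 to 3.
multiples_of_10 = {10 * k for k in range(-3, 4)}

print(evens_in_A)
print(sorted(multiples_of_10))
```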

🌐 Famous sets

The excerpt introduces standard notation for several important sets, most of them infinite:

Symbol	Name	Description
∅	Empty set	The unique set containing no elements: `{ }`
N	Natural numbers	`{0, 1, 2, 3, 4, …}` (nonnegative integers; note: 0 is included in this book)
Z	Integers	`{…, −2, −1, 0, 1, 2, …}`
Q	Rational numbers	`{a/b : a, b ∈ Z, b ≠ 0}`
R	Real numbers	All real numbers

Example: 1.73 ∈ Q is a good approximation for √3, but √3 itself is not in Q (proven in Section 5.5).
Example: π is a real number but not in Q (Lambert proved this in the 1760s).
Example: the set of all even natural numbers can be written {x ∈ N | x is even} = {0, 2, 4, 6, 8, …} = {2 · k | k ∈ N}.

🔣 Membership notation

b ∈ A means b is an element of the set A.
b ∉ A means b is not an element of A.
Example: 4 ∈ {2, 4, 5} but 3 ∉ {2, 4, 5}.

🧩 Subsets and containment

🧩 What a subset is

Subset: a set A is a subset of a set B (written A ⊆ B) if every element of A is also an element of B.

Example: {2, 4} ⊆ {2, 3, 4}, but {2, 4, 5} is not a subset of {2, 3, 4}.
Don't confuse: subset vs element—{2} is a subset of {2, 3, 4}, but 2 is an element of {2, 3, 4}.

🔄 Every set is a subset of itself

For any set A, A ⊆ A because every element of A is an element of A.
The empty set ∅ is a subset of every set A because there are no elements in ∅ to violate the condition.

🪆 Nested inclusions

Example: all natural numbers are integers, all integers are rational numbers, and all rational numbers are real numbers.
This can be written as a nested sequence: ∅ ⊆ N ⊆ Z ⊆ Q ⊆ R.

🔢 Counting the number of subsets

🎯 The subset-counting problem

Motivating example: four students say they might go to office hours on Monday. How many different groups might show up?
This is the same as counting the possible subsets of the set {a, b, c, d} of four students.
The general question: how many subsets does a set of size n have?

🔍 Small cases to build intuition

The excerpt works through small examples:

Set size n	Set example	All subsets	Number of subsets
1	{a}	∅, {a}	2 = 2¹
2	{a, b}	∅, {a}, {b}, {a, b}	4 = 2²
3	{a, b, c}	∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}	8 = 2³

Pattern: the number of subsets is a power of 2.
For a set with 4 elements, we might guess 2 to the power of 4 = 16 subsets.

🧮 The theorem and its proof

Theorem 2.5.16: If S is a set with n elements, then S has 2 to the power of n subsets.

Proof idea:

Label the elements of S as a₁, a₂, a₃, …, aₙ.
Picking a subset of S is the same as:
- First, choosing if a₁ is in the subset or not (2 choices),
- Next, choosing if a₂ is in the subset or not (2 choices),
- …
- Finally, choosing if aₙ is in the subset or not (2 choices).
At each step i, there are 2 choices (include aᵢ or not).
Since we make a decision n times, the total number of subsets is 2 × 2 × … × 2 (n times) = 2 to the power of n by the multiplication principle.

🌳 Visualizing the proof

The excerpt provides a tree diagram for n = 3 with S = {a₁, a₂, a₃}:

At each level, you decide "yes" or "no" for one element.
The tree has 3 levels (one per element).
The leaves of the tree correspond to the 8 subsets: ∅, {a₃}, {a₂}, {a₂, a₃}, {a₁}, {a₁, a₃}, {a₁, a₂}, {a₁, a₂, a₃}.
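
The include-or-exclude decisions in the proof translate directly into code. The sketch below (the function name `subsets` and the element labels are illustrative) makes one yes/no decision per element with `itertools.product`, so the subsets come out in the same order as the leaves listed above.

```python
from itertools import product

def subsets(elements):
    """List all subsets by making one include/exclude decision per element."""
    result = []
    for decisions in product([False, True], repeat=len(elements)):
        result.append({e for e, keep in zip(elements, decisions) if keep})
    return result

S = ["a1", "a2", "a3"]
all_subsets = subsets(S)

for subset in all_subsets:
    print(sorted(subset))
print(len(all_subsets))  # one subset per sequence of n decisions
```

Calling `subsets([])` returns a single subset (the empty set), matching the n = 0 case of the theorem.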

🔍 Edge case: the empty set

The theorem even works when n = 0.
The empty set ∅ is the unique set with 0 elements.
There is 1 = 2 to the power of 0 subset of ∅, namely ∅ itself.

🧠 Abstract perspective

You can think of forming a new set T whose elements are the subsets of S.
Then 2 to the power of n is the cardinality of T.
In other words, the "set of all subsets" has size 2 to the power of n.

🎓 Examples and applications

🎓 Office hours example

Four students {a, b, c, d} say they might go to office hours.
Number of different groups that might show up: 2 to the power of 4 = 16.
This includes the empty set (no one shows up) and the full set (everyone shows up).

🎓 Even-sized subsets

Students a, b, c say they might show up to office hours.
Among the 8 possible outcomes, how many have an even number of students?
Answer: 4, namely ∅, {a, b}, {a, c}, {b, c}.
(Even cardinality means 0, 2, or 4 students; here the maximum is 3, so 0 or 2.)
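
A quick enumeration confirms the count. The sketch uses `itertools.combinations` to build subsets of each even size; names are illustrative.

```python
from itertools import combinations

students = ["a", "b", "c"]

# Subsets of size 0 and 2 (the even sizes that do not exceed 3).
even_sized = [
    set(group)
    for size in range(0, len(students) + 1, 2)
    for group in combinations(students, size)
]

print(even_sized)
print(len(even_sized))
```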

🎓 Rocks on a cairn (Exercise 4)

You have 20 different rocks and want to place a subset on a cairn.
Requirements: use at least 1 rock and do not use all 20.
Total subsets: 2 to the power of 20 = 1,048,576.
Subtract 2 (the empty set and the full set): (2 to the power of 20) − 2 = 1,048,574.

🎓 Book donation (Exercise 9)

You have 6 different books and want to donate a subset to the library.
Requirements: donate at least 1 book and do not donate all 6.
Total subsets: 2 to the power of 6.
Subtract 2 (the empty set and the full set): (2 to the power of 6) − 2 = 64 − 2 = 62.

🎓 Subsets containing a specific element (Exercise 10)

How many subsets of {1, 2, 3, 4, 5} contain the number 1?
If 1 must be in the subset, you only decide for the other 4 elements.
Answer: 2 to the power of 4 = 16.
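
The smaller exercises can be checked by listing subsets outright. In this sketch the helper `power_set` is a hypothetical name, and the books are represented by the integers 0 to 5 purely for illustration.

```python
from itertools import combinations

def power_set(items):
    """Return every subset of `items` as a list of sets."""
    items = list(items)
    return [set(c) for k in range(len(items) + 1) for c in combinations(items, k)]

# Book donation: at least 1 book, but not all 6.
books = range(6)
donations = [s for s in power_set(books) if 0 < len(s) < 6]

# Subsets of {1, 2, 3, 4, 5} that contain the number 1.
containing_1 = [s for s in power_set([1, 2, 3, 4, 5]) if 1 in s]

print(len(donations), len(containing_1))
```

The same filter would work for the 20 rocks, although listing over a million subsets is slower than evaluating the formula.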

2.6 Addition and subtraction from the perspective of set theory

🧭 Overview

🧠 One-sentence thesis

The addition and subtraction principles from counting can be expressed precisely using set operations—union for addition (when sets don't overlap) and difference for subtraction (when one set is contained in another).

📌 Key points (3–5)

Union and addition: when two finite sets have no elements in common, the size of their union equals the sum of their sizes.
Intersection: the intersection of two sets contains only the elements that belong to both sets.
Difference and subtraction: when set B is a subset of set A, the size of A minus B equals the size of A minus the size of B.
Common confusion: "or" in set union means "or both," not the exclusive "either-or" meaning in everyday phrases like "soup or salad."
Complement: the complement of B in A is just another name for A minus B when B is a subset of A.

🔗 Union and intersection operations

🔗 Union definition and meaning

Union of sets A and B is A ∪ B = {x : x ∈ A or x ∈ B}.

The "or" here means x belongs to A, or to B, or to both.
This is different from the exclusive "or" in everyday language (like "paper or plastic").
The union collects all elements that appear in at least one of the sets.
Example: {2, 4, 5} ∪ {2, 5, 7, 9} = {2, 4, 5, 7, 9}—every element from either set appears once in the union.
Visually, the union covers the entire region of overlapping circles in a diagram.

🔗 Intersection definition and meaning

Intersection of sets A and B is A ∩ B = {x : x ∈ A and x ∈ B}.

The intersection contains only elements that belong to both sets simultaneously.
Example: {2, 4, 5} ∩ {2, 5, 7, 9} = {2, 5}—only 2 and 5 appear in both sets.
Visually, it is the common region where the circles overlap.
If two sets have no elements in common, their intersection is the empty set ∅.
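
Python's built-in `set` type supports both operations, which makes the examples easy to reproduce. A minimal sketch:

```python
A = {2, 4, 5}
B = {2, 5, 7, 9}

union = A | B          # elements in A, in B, or in both
intersection = A & B   # elements in both A and B

# Two sets with nothing in common have an empty intersection.
no_overlap = {1, 3} & {2, 4}

print(sorted(union), sorted(intersection), no_overlap == set())
```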

➕ The addition principle for sets

➕ When addition works

Addition principle for sets: If A and B are finite sets with no elements in common (that is, A ∩ B = ∅), then the union A ∪ B has size |A| + |B|.

This is the set-theory version of the counting addition principle.
The key requirement: the sets must have no overlap (their intersection is empty).
Why it works: if no element is counted twice, you can simply add the sizes.
Example: A student has climbed 3 big mountains A = {Longs peak, Pikes peak, Mt. Evans} and 2 small mountains B = {Arthur's rock, Horsetooth}. Since there is no overlap between big and small mountains, the total is |A| + |B| = 3 + 2 = 5 mountains.

➕ What happens when sets overlap

The addition principle does not apply when A ∩ B ≠ ∅.
If sets share elements, simply adding |A| + |B| would count the shared elements twice.
Don't confuse: the union still exists when sets overlap, but its size is not |A| + |B|; you need the Principle of Inclusion-Exclusion (covered in the next section).

➖ Difference, complement, and the subtraction principle

➖ Set difference

Difference of sets A and B is A − B = {x ∈ A | x ∉ B}.

A − B contains elements that are in A but not in B.
Order matters: A − B and B − A are generally different.
Example: {2, 4, 5} − {2, 5, 7, 9} = {4}, but {2, 5, 7, 9} − {2, 4, 5} = {7, 9}.
If A and B have no overlap (A ∩ B = ∅), then A − B = A.

➖ Complement

Complement of B in A: when B ⊆ A, the complement of B in A is the difference A − B, written as B^c when A is clear from context.

The complement is just a special name for the difference when one set is contained in the other.
The "c" stands for complement.
Context matters: the complement depends on which larger set A you are working within.

➖ The subtraction principle for sets

Subtraction principle for sets: If A and B are finite sets and B ⊆ A, then |A − B| = |A| − |B|.

This is the set-theory version of the counting subtraction principle.
The key requirement: B must be a subset of A (every element of B is also in A).
Why it works: removing |B| elements from A leaves |A| − |B| elements.
Example: How many numbers in A = {1, …, 25} are not a multiple of 5? Let B = {5, 10, 15, 20, 25} be the multiples of 5. Since B ⊆ A, the answer is |A − B| = 25 − 5 = 20.

➖ When subtraction applies

The subtraction principle requires B ⊆ A.
If B is not a subset of A, you cannot use |A| − |B| to find |A − B|.
Don't confuse: A − B always exists as a set, but the size formula |A − B| = |A| − |B| only holds when B ⊆ A.
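
The sketch below contrasts a case where the size formula applies with one where it does not. It reuses the sets from the examples in this section; the names `C` and `D` are introduced here only to keep the two cases apart.

```python
# Case 1: B is a subset of A, so |A − B| = |A| − |B|.
A = set(range(1, 26))              # {1, ..., 25}
B = {n for n in A if n % 5 == 0}   # multiples of 5 in A
not_multiple_of_5 = A - B

# Case 2: D is not a subset of C, so the formula fails.
C = {2, 4, 5}
D = {2, 5, 7, 9}
c_minus_d = C - D                  # {4}
d_minus_c = D - C                  # {7, 9}
naive_size = len(C) - len(D)       # -1, which cannot be the size of a set

print(len(not_multiple_of_5), len(A) - len(B))
print(len(c_minus_d), naive_size)
```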

🧮 Comparing the principles

Principle	Set operation	Key requirement	Size formula
Addition	Union A ∪ B	A ∩ B = ∅ (no overlap)	|A ∪ B| = |A| + |B|
Subtraction	Difference A − B	B ⊆ A (B is a subset of A)	|A − B| = |A| − |B|

Both principles translate familiar arithmetic operations into set language.
Both require specific conditions: no overlap for addition, subset for subtraction.
When the conditions are not met, the formulas do not hold and you need more advanced techniques (like Inclusion-Exclusion for overlapping unions).

2.7 Venn diagrams and the Principle of Inclusion-Exclusion

🧭 Overview

🧠 One-sentence thesis

The Principle of Inclusion-Exclusion provides a systematic way to count elements in unions of overlapping sets by adding individual set sizes, subtracting intersections, and (for three or more sets) adding back higher-order intersections to avoid double-counting.

📌 Key points (3–5)

What the principle solves: counting elements in a union when sets overlap non-trivially (share common elements).
Core mechanism for two sets: add the sizes of both sets, then subtract the intersection to correct for double-counting.
Extension to three sets: add all individual sizes, subtract all pairwise intersections, then add back the triple intersection.
Common confusion: forgetting that elements in intersections get counted multiple times when you simply add set sizes—the principle systematically corrects this.
Visualization tool: Venn diagrams (overlapping circles) help see which regions are counted how many times.

🎯 The two-set principle

🎯 Core formula and meaning

Principle of Inclusion-Exclusion for two sets: If A and B are finite sets, then |A ∪ B| = |A| + |B| − |A ∩ B|.

Why subtract the intersection: when you add |A| + |B|, every element in A ∩ B is counted twice (once in A, once in B).
Subtracting |A ∩ B| once removes the duplicate count.
Example: 5 votes for mint, 4 for caramel, total 9 votes—but two students voted for both, so the actual number of students is 5 + 4 − 2 = 7.

📐 Venn diagram visualization

Draw two overlapping circles, one for A and one for B.
The overlap region represents A ∩ B.
The union A ∪ B covers all parts of both circles.
Visually, adding |A| and |B| counts the overlap twice; the formula corrects this.

🔢 Worked example: multiples of 3 or 7

Setup: S = {1, …, 210}; A = multiples of 3 in S; B = multiples of 7 in S.
|A| = 70, |B| = 30, |A ∩ B| = 10 (multiples of 21).
|A ∪ B| = 70 + 30 − 10 = 90.
Finding the complement: numbers with no factor in common with 21 = |S − (A ∪ B)| = 210 − 90 = 120.
Don't confuse: the complement counts what is not in the union; use the subtraction principle after finding the union size.
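
The numbers in this example can be confirmed with explicit sets. A short sketch with illustrative variable names:

```python
from math import gcd

S = set(range(1, 211))              # {1, ..., 210}
A = {n for n in S if n % 3 == 0}    # multiples of 3
B = {n for n in S if n % 7 == 0}    # multiples of 7

union_by_formula = len(A) + len(B) - len(A & B)  # inclusion-exclusion
union_by_listing = len(A | B)                    # direct count of the union

complement = S - (A | B)                         # divisible by neither 3 nor 7
coprime_to_21 = {n for n in S if gcd(n, 21) == 1}

print(union_by_formula, union_by_listing, len(complement))
print(complement == coprime_to_21)
```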

🔺 The three-set principle

🔺 Core formula and structure

Principle of Inclusion-Exclusion for three sets: If A, B, C are finite sets, then |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |B ∩ C| − |A ∩ C| + |A ∩ B ∩ C|.

Pattern: add all individual sizes, subtract all pairwise intersections, add back the triple intersection.
Why add back the triple intersection: elements in A ∩ B ∩ C are counted three times in the first sum, subtracted three times in the pairwise subtractions (net zero), so you must add them back once.

🧩 Step-by-step logic

Start by adding |A| + |B| + |C|: every element in exactly one set is counted once; elements in two sets are counted twice; elements in all three are counted three times.
Subtract |A ∩ B|, |B ∩ C|, |A ∩ C|: this removes one count from each pairwise overlap, but elements in A ∩ B ∩ C are subtracted three times (now net count is zero for them).
Add |A ∩ B ∩ C| once: restores the count for elements in all three sets to exactly one.

🍓 Worked example: ice cream flavors

Setup: 7 students; 5 like mint, 4 like caramel, 2 like strawberry; 2 students like both mint and caramel; 1 likes mint and strawberry but not caramel; 0 like strawberry and caramel but not mint.
Fill in Venn diagram regions step by step.
Conclusion: exactly 1 student likes all three flavors (the only way to account for the 2 strawberry votes given the constraints).
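
The conclusion can be cross-checked against the three-set formula. In the sketch below, the union has size 7 because the two-set example already gives |mint ∪ caramel| = 5 + 4 − 2 = 7, which is every student. The loop tries each possible size for the triple intersection and keeps the values that make the formula balance; all variable names are illustrative.

```python
students = 7
mint, caramel, strawberry = 5, 4, 2
mint_and_caramel = 2          # includes anyone who likes all three
mint_strawberry_only = 1      # mint and strawberry, but not caramel
strawberry_caramel_only = 0   # strawberry and caramel, but not mint

solutions = []
for all_three in range(strawberry + 1):
    # Each pairwise intersection is its "only" region plus the triple region.
    mint_and_strawberry = mint_strawberry_only + all_three
    strawberry_and_caramel = strawberry_caramel_only + all_three
    union = (mint + caramel + strawberry
             - mint_and_caramel - mint_and_strawberry - strawberry_and_caramel
             + all_three)
    if union == students:
        solutions.append(all_three)

print(solutions)  # the sizes of the triple intersection consistent with the data
```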

🔢 Worked example: multiples of 2, 3, or 5

Setup: S = {1, …, 210}; A = multiples of 2, B = multiples of 3, C = multiples of 5.
|A| = 105, |B| = 70, |C| = 42.
Pairwise intersections (using relative primality): |A ∩ B| = 35 (multiples of 6), |A ∩ C| = 21 (multiples of 10), |B ∩ C| = 14 (multiples of 15).
Triple intersection: |A ∩ B ∩ C| = 7 (multiples of 30).
|A ∪ B ∪ C| = 105 + 70 + 42 − 35 − 14 − 21 + 7 = 154.
Finding the complement: numbers with no factor in common with 30 = |S − (A ∪ B ∪ C)| = 210 − 154 = 56.
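
As with the two-set example, explicit sets make it possible to check every term. A sketch with illustrative names:

```python
S = set(range(1, 211))              # {1, ..., 210}
A = {n for n in S if n % 2 == 0}    # multiples of 2
B = {n for n in S if n % 3 == 0}    # multiples of 3
C = {n for n in S if n % 5 == 0}    # multiples of 5

union_by_formula = (len(A) + len(B) + len(C)
                    - len(A & B) - len(B & C) - len(A & C)
                    + len(A & B & C))
union_by_listing = len(A | B | C)

complement_size = len(S) - union_by_formula  # divisible by none of 2, 3, 5

print(union_by_formula, union_by_listing, complement_size)
```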

🖼️ Using Venn diagrams

🖼️ What a Venn diagram shows

Each set is drawn as a circle (or other closed shape).
Overlapping regions represent intersections.
The entire area covered by all circles is the union.
Helpful for visualizing which elements are counted in which parts of the formula.

🧮 Filling in a Venn diagram

Start with known totals and constraints.
Work from the most specific regions (e.g., the triple intersection) outward.
Use subtraction to find regions: e.g., "likes mint and strawberry but not caramel" = (mint ∩ strawberry) − (mint ∩ strawberry ∩ caramel).
Example: in the ice cream problem, knowing 1 student likes mint and strawberry but not caramel, and 2 total like strawberry, you can deduce the remaining strawberry region.

⚠️ Common pitfall

Don't confuse "likes A or B" (union) with "likes A and B" (intersection).
The principle counts the union; intersections are subtracted to avoid double-counting.

🔗 Relationship to complements

🔗 Counting what is not in the union

Often the goal is to count elements that satisfy none of several conditions.
This is the complement of the union: |S − (A ∪ B ∪ C)|.
Use the subtraction principle: |S − (A ∪ B ∪ C)| = |S| − |A ∪ B ∪ C|.
First apply inclusion-exclusion to find |A ∪ B ∪ C|, then subtract from the total.

🔢 Example pattern

"Numbers with no factor in common with 30" = numbers not divisible by 2, 3, or 5.
Let A, B, C be multiples of 2, 3, 5 respectively.
Compute |A ∪ B ∪ C| using inclusion-exclusion.
Answer = |S| − |A ∪ B ∪ C|.

📊 Summary table

Number of sets	Formula structure	Key idea
Two (A, B)	|A| + |B| − |A ∩ B|	Add sizes, subtract overlap once
Three (A, B, C)	|A| + |B| + |C| − |A ∩ B| − |B ∩ C| − |A ∩ C| + |A ∩ B ∩ C|	Add sizes, subtract pairwise overlaps, add back triple overlap
Complement	|S| − |A ∪ B ∪ …|	Use inclusion-exclusion first, then subtract from total

The Multiplication Principle from the Perspective of Set Theory

2.8 The multiplication principle from the perspective of set theory

🧭 Overview

🧠 One-sentence thesis

The Cartesian product of two sets formalizes the multiplication principle by showing that the number of ordered pairs from sets A and B equals the product of their sizes.

📌 Key points

Cartesian product definition: the set of all ordered pairs (a, b) where a comes from set A and b comes from set B.
Multiplication principle for sets: if A has |A| elements and B has |B| elements, then A × B has |A| · |B| elements.
Ordered pairs matter: the Cartesian product produces ordered pairs, not just combinations—order is significant.
Common confusion: don't confuse the Cartesian product (all ordered pairs) with simply listing elements from both sets; the product creates pairs of elements.
Real-world application: counting problems can be reframed as finding the size of a Cartesian product of smaller, easier-to-count sets.

🔢 Cartesian product fundamentals

🔢 Definition and notation

Cartesian product: If A and B are sets, then their Cartesian product, denoted A × B, is the set consisting of all ordered pairs of elements from A and B: A × B = {(a, b) | a ∈ A and b ∈ B}.

The product is a new set whose elements are ordered pairs.
Each pair has a first component from A and a second component from B.
The notation (a, b) emphasizes that order matters: (a, b) is different from (b, a) unless a = b.

📐 Simple numeric example

The excerpt gives {2, 3, 4} × {x, y}:

Result: {(2, x), (2, y), (3, x), (3, y), (4, x), (4, y)}
Each element from the first set pairs with each element from the second set.
Total pairs: 3 elements × 2 elements = 6 ordered pairs.

Don't confuse: the Cartesian product is not {2, 3, 4, x, y}; it is a set of pairs, not a union of elements.
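
As a small illustration, Python's standard `itertools.product` builds exactly this set of ordered pairs. The list names `A` and `B` are chosen here for the sketch.

```python
from itertools import product

A = [2, 3, 4]
B = ["x", "y"]

# All ordered pairs (a, b) with a from A and b from B
pairs = list(product(A, B))

print(pairs)
# [(2, 'x'), (2, 'y'), (3, 'x'), (3, 'y'), (4, 'x'), (4, 'y')]
print(len(pairs) == len(A) * len(B))  # True
```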

🗺️ Geometric interpretation

The excerpt mentions R × R (the product of the real numbers with themselves):

This represents all points (x, y) in the usual coordinate plane R².
Called the Cartesian plane, which is where the term "Cartesian product" originates.
Example: [0, 1] × [0, 1] is the unit square: all points (x, y) with both coordinates between 0 and 1.

🧮 The multiplication principle for sets

🧮 The formal statement

Lemma (Multiplication principle for sets): If set A has |A| elements and set B has |B| elements, then set A × B has |A| · |B| elements.

|A| denotes the size (number of elements) of set A.
The principle connects set size to multiplication: counting ordered pairs reduces to multiplying the sizes of the component sets.
This is the set-theoretic foundation for the multiplication principle in combinatorics.

🎯 Why it works

For each of the |A| choices from A, there are |B| choices from B.
Total number of distinct ordered pairs: |A| choices × |B| choices = |A| · |B| pairs.
Example from the excerpt: {2, 3, 4} × {2, 5} has 3 · 2 = 6 elements: {(2, 2), (2, 5), (3, 2), (3, 5), (4, 2), (4, 5)}.

🧩 Applied counting examples

🧩 Granola bar indexing

The excerpt describes three boxes of granola bars, each with five bars stacked:

Index each bar by (i, j) where i ∈ A = {1, 2, 3} (box number) and j ∈ B = {1, ..., 5} (position in box).
The set of all bars is A × B.
Total bars: |A × B| = 3 · 5 = 15.

Key insight: real-world objects can be indexed by Cartesian products when they have multiple independent attributes.

🔢 Odd numbers not divisible by 5

The excerpt finds how many numbers in {1, ..., 40} are odd and not multiples of 5:

These numbers have last digit in C = {1, 3, 7, 9} (odd, not 5).
Their first digit (tens place) is in B = {0, 1, 2, 3} (to stay ≤ 40).
The set A of such numbers can be identified with B × C (via the pair (tens digit, ones digit)).
Size: |A| = |B| · |C| = 4 · 4 = 16.

Strategy: break a complex counting problem into independent choices (digits, positions, attributes), then use the Cartesian product to count.
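
The identification of A with B × C can be sketched in a few lines and compared against a direct count. This is an illustrative check only; the variable names are assumptions.

```python
from itertools import product

B = [0, 1, 2, 3]   # tens digits
C = [1, 3, 7, 9]   # ones digits: odd and not 5

# Rebuild each number from its (tens digit, ones digit) pair
via_product = sorted(10 * t + u for t, u in product(B, C))

# Direct count: odd numbers in {1, ..., 40} that are not multiples of 5
brute_force = [n for n in range(1, 41) if n % 2 == 1 and n % 5 != 0]

print(len(via_product))            # 16
print(via_product == brute_force)  # True
```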

🔄 Comparison: products vs. other set operations

Concept	What it produces	Size formula	Example
Cartesian product A × B	Set of ordered pairs (a, b)	|A| · |B|	{2, 3} × {x, y} = {(2,x), (2,y), (3,x), (3,y)}
Union A ∪ B (not in excerpt)	Elements in A or B	(not directly multiplicative)	Not covered here
Intersection A ∩ B (not in excerpt)	Elements in both A and B	(not directly multiplicative)	Not covered here

Don't confuse: the Cartesian product creates new elements (pairs) from the originals; it does not simply combine or filter existing elements.

Set partitions, the division principle, and equivalence relations

2.9 Set partitions, the division principle, and equivalence relations

🧭 Overview

🧠 One-sentence thesis

Set partitions organize a set into disjoint blocks, enabling the division principle to count arrangements and equivalence relations to group elements systematically.

📌 Key points (3–5)

Set partition structure: a collection of disjoint blocks whose union is the entire set.
Division principle: when a set is partitioned into equal-sized blocks, the number of blocks equals the set size divided by block size.
Equivalence relations: three properties (reflexive, symmetric, transitive) that naturally create set partitions.
Common confusion: distinguishing when elements are "the same" (equivalent) vs. "different"—the COOL anagram example shows how treating identical letters as distinct leads to overcounting.
Modular arithmetic application: congruence modulo m partitions integers into m blocks based on remainders.

🧱 Set partitions and blocks

🧱 What is a set partition

Set partition: a collection of blocks {B₁, B₂, ..., Bₖ} of a finite set A such that (1) the blocks are disjoint (Bᵢ ∩ Bⱼ = ∅ for all i ≠ j), and (2) the union of the blocks is A (B₁ ∪ B₂ ∪ ... ∪ Bₖ = A).

A block is a non-empty subset of A.
"Disjoint" means no element appears in more than one block.
"Union is A" means every element of A appears in exactly one block.
Example: {{1, 3, 4}, {2, 6}, {5, 7}} is a set partition of {1, 2, 3, 4, 5, 6, 7}.

🔢 The division principle for sets

Division principle: If {B₁, ..., Bₖ} is a set partition of A where each block has the same cardinality m, then |A|/m = k and |A|/k = m.

This applies only when all blocks have equal size.
It connects three quantities: total set size, block size, and number of blocks.
Knowing any two lets you compute the third.

📝 Counting with equal blocks: the COOL example

Problem: How many ways can the letters in "COOL" be rearranged?

If we treat the two O's as different (O₁ and O₂), there are 4! = 24 arrangements.
But the two O's are actually identical, so arrangements differing only by swapping O₁ and O₂ look the same.
We partition the 24 labeled arrangements into blocks of size 2; each block is a pair such as {CLO₁O₂, CLO₂O₁} whose members differ only by swapping O₁ and O₂.
By the division principle: 24/2 = 12 distinct rearrangements.
Don't confuse: the 24 counts distinguishable arrangements when O's are labeled; the 12 counts actual distinct arrangements when O's are indistinguishable.
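
A quick way to see the blocks of size 2 is to generate all 24 labeled arrangements and group the ones that look identical. This is a minimal sketch using the standard library, not a method from the text.

```python
from collections import Counter
from itertools import permutations

# permutations() treats the two O's as distinct positions, so it yields 4! tuples
labeled = list(permutations("COOL"))

# Group arrangements that look the same once the labels on the O's are dropped
blocks = Counter(labeled)

print(len(labeled))          # 24
print(len(blocks))           # 12
print(set(blocks.values()))  # {2}
```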

🔗 Equivalence relations

🔗 Definition and three properties

Equivalence relation: a way of identifying elements x ∼ y with three rules: (1) reflexive: x ∼ x; (2) symmetric: if x ∼ y then y ∼ x; (3) transitive: if x ∼ y and y ∼ z then x ∼ z.

Reflexive: every element is equivalent to itself.
Symmetric: equivalence works both ways.
Transitive: equivalence chains together.
These three properties ensure that equivalence relations naturally create set partitions.

🎡 Round table seating example

Problem: How many ways can students A, B, C, D be seated at a rotating round table?

There are 4! = 24 ways to seat them if the table were fixed.
Two arrangements are equivalent if one is a rotation of the other.
Example: ABCD ∼ DABC ∼ CDAB ∼ BCDA (all look the same after rotation).
Each arrangement is equivalent to 3 others, so blocks have size 4.
By the division principle: 24/4 = 6 distinct seating arrangements.
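
The rotation classes can be listed explicitly. The sketch below picks one representative per class (the alphabetically smallest rotation); the helper name `canonical` is hypothetical.

```python
from itertools import permutations

def canonical(seating):
    """Return the alphabetically smallest rotation of a seating string."""
    return min(seating[i:] + seating[:i] for i in range(len(seating)))

seatings = ["".join(p) for p in permutations("ABCD")]

# Group seatings into equivalence classes under rotation
classes = {}
for s in seatings:
    classes.setdefault(canonical(s), []).append(s)

print(len(seatings), len(classes))  # 24 6
print(classes["ABCD"])              # ['ABCD', 'BCDA', 'CDAB', 'DABC']
```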

🔢 Modular arithmetic and congruence

🔢 Congruence modulo m

Congruence modulo m: integers x and y are congruent modulo m (written x ≡ y mod m) when m divides y − x.

This means x and y have the same remainder when divided by m.
Key lemma: If r is the remainder when x is divided by m, then x ≡ r mod m.
Special case: if m divides x, then x ≡ 0 mod m.

🕐 Everyday examples of congruence

Context	Modulus	Example
Clock arithmetic	12	11 + 2 = 13 ≡ 1 mod 12 (1 o'clock after waiting 2 hours from 11 o'clock)
Even/odd	2	Every even integer ≡ 0 mod 2; every odd integer ≡ 1 mod 2
Last digit	10	The last digit of a number indicates its congruence modulo 10

📊 Counting with congruence blocks

Problem: How many numbers in S = {1, ..., 40} are congruent to 1 modulo 5?

List them: {1, 6, 11, 16, 21, 26, 31, 36}.
Answer: 8 numbers.
General pattern: S can be partitioned into 5 blocks (one for each remainder 0, 1, 2, 3, 4), each of size 8.
Two numbers are in the same block exactly when they are congruent modulo 5.

🧮 General lemma for modular partitions

Lemma 2.9.13: If N is a multiple of m, then S = {1, ..., N} has a set partition into m blocks, each of size N/m, where two numbers are in the same block exactly when they are congruent modulo m.

This works because every integer has exactly one remainder when divided by m (from 0 to m−1).
The m blocks correspond to the m possible remainders.
Each block has equal size N/m because N is a multiple of m.
Example: For N = 35 and m = 5, there are 5 blocks of size 7 each.
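
The partition in the lemma can be built directly by grouping numbers by remainder. This is a small sketch; the function name `residue_blocks` is made up for the example.

```python
def residue_blocks(N, m):
    """Partition {1, ..., N} into blocks by remainder modulo m."""
    blocks = {r: [] for r in range(m)}
    for x in range(1, N + 1):
        blocks[x % m].append(x)
    return blocks

blocks_40 = residue_blocks(40, 5)
print(blocks_40[1])                           # [1, 6, 11, 16, 21, 26, 31, 36]
print([len(b) for b in blocks_40.values()])   # [8, 8, 8, 8, 8]

blocks_35 = residue_blocks(35, 5)
print([len(b) for b in blocks_35.values()])   # [7, 7, 7, 7, 7]
```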

Additional problems for Chapter 2

2.10 Additional problems for Chapter 2

🧭 Overview

🧠 One-sentence thesis

This problem set applies counting principles, set operations, modular arithmetic, and binary string analysis to reinforce the combinatorial and number-theoretic techniques introduced in Chapter 2.

📌 Key points (3–5)

Counting techniques: multiplication and division principles are used to count handshakes, clothing combinations, and sequences.
Set operations: problems use cardinality formulas (inclusion-exclusion), Venn diagrams, and complement operations to verify identities and count elements.
Number theory applications: Euler's totient function counts numbers relatively prime to a given integer; modular arithmetic classifies integers.
Binary strings and symmetry: binary strings of length m correspond to corners of geometric shapes; complement operations reveal structural symmetries.
Common confusion: distinguishing between "with repetition" vs "without repetition" in counting problems, and understanding when complement sets equal the original set (depends on parity).

🤝 Counting with multiplication and division principles

🤝 Handshakes between people

Problem 1: Count handshakes among 7 people if every pair shakes hands exactly once.
Problem 2: Generalize to n people; prove the count is n times (n minus 1) divided by 2.
Why division: Each handshake involves two people, so counting "person by person" double-counts every handshake; divide by 2 to correct.
Example: With 3 people (A, B, C), pairs are AB, AC, BC → 3 times 2 divided by 2 equals 3 handshakes.
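
The formula can be compared with an explicit list of pairs for a few small values of n. This is an illustrative check, not a proof; the function name is an assumption.

```python
from itertools import combinations

def handshakes(n):
    """Number of handshakes among n people: n(n - 1)/2."""
    return n * (n - 1) // 2

for n in (3, 5, 7):
    pairs = list(combinations(range(n), 2))  # every unordered pair once
    print(n, handshakes(n), len(pairs))
# 3 3 3
# 5 10 10
# 7 21 21
```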

👕 Clothing combinations

Problem 3: Bob has 4 pairs of socks, 2 pairs of shoes, 5 pairs of pants, 10 shirts, and 1 hat. Count ways to get dressed (one of each type).
Multiplication principle: Multiply the number of choices for each independent item: 4 × 2 × 5 × 10 × 1 = 400.
Don't confuse: An outfit here is one choice per category; the order in which Bob puts the items on is not counted.

🔢 Even-digit integers

Problem 4: How many three-digit positive integers have only even digits?
Even digits: 0, 2, 4, 6, 8.
First digit cannot be 0 (must be positive three-digit), so 4 choices; second and third digits each have 5 choices.
Total: 4 × 5 × 5 = 100.

📝 Letter sequences

Problem 12: Count sequences of 5 letters from A, E, I, O, U, M, S, allowing repeats (e.g., MMSSU).
7 letters available, 5 positions, repetition allowed → 7 to the power 5.
Problem 13: Count sequences with exactly 2 vowels (A, E, I, O, U are vowels; M, S are consonants).
Choose 2 positions out of 5 for vowels; fill those with 5 choices each; fill remaining 3 with 2 choices each.
Calculation involves binomial coefficient for positions and multiplication for letter choices.
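
The two counts can be sketched as follows, with a brute-force enumeration as a cross-check. The code assumes the reading given above (positions chosen first, then letters).

```python
from itertools import product
from math import comb

LETTERS = "AEIOUMS"
VOWELS = set("AEIOU")

# Problem 12: 7 choices in each of 5 positions
all_sequences = 7 ** 5

# Problem 13: choose 2 vowel positions, then 5 vowels each, 2 consonants each
by_formula = comb(5, 2) * 5**2 * 2**3

# Cross-check by listing every sequence and counting vowels
by_brute_force = sum(
    1 for seq in product(LETTERS, repeat=5)
    if sum(ch in VOWELS for ch in seq) == 2
)

print(all_sequences, by_formula, by_brute_force)  # 16807 2000 2000
```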

🧑‍💼 Job assignments

Problem 14: Assign 5 jobs to 4 people so each person gets at least one job.
This is a surjective function problem: every person must receive at least one job.
Use inclusion-exclusion or Stirling numbers of the second kind multiplied by permutations.
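
One way to sketch the inclusion-exclusion count, assuming the 5 jobs are distinct and the 4 people are distinct (the problem statement above does not say otherwise):

```python
from itertools import product
from math import comb

jobs, people = 5, 4

# Brute force: each job goes to one person; keep assignments that use every person
brute = sum(
    1 for assignment in product(range(people), repeat=jobs)
    if len(set(assignment)) == people
)

# Inclusion-exclusion over the number i of people forced to receive no job
incl_excl = sum(
    (-1) ** i * comb(people, i) * (people - i) ** jobs
    for i in range(people + 1)
)

print(brute, incl_excl)  # 240 240
```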

🔤 Rearrangements with repetition

Problem 15: Rearrange the letters of PUPPY (including the original).
PUPPY has 5 letters: P appears 3 times, U once, Y once.
Formula: 5 factorial divided by (3 factorial times 1 factorial times 1 factorial) = 120/6 = 20.
Don't confuse: This is a permutation with repetition, not a simple permutation.
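
The formula generalizes to any word: divide the factorial of the length by the factorial of each letter's multiplicity. The helper below is a hypothetical sketch of that idea, checked against direct enumeration.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_rearrangements(word):
    """n! divided by the factorial of each letter's multiplicity."""
    count = factorial(len(word))
    for multiplicity in Counter(word).values():
        count //= factorial(multiplicity)
    return count

print(distinct_rearrangements("PUPPY"))  # 20
print(len(set(permutations("PUPPY"))))   # 20
print(distinct_rearrangements("COOL"))   # 12
```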

🧮 Set operations and cardinality

🧮 Inclusion-exclusion principle

Problem 5: Given sets A and B with |A| = 21, |B| = 15, and |A ∪ B| = 30, find |A ∩ B|.
Formula: |A ∪ B| = |A| + |B| − |A ∩ B|.
Rearrange: |A ∩ B| = |A| + |B| − |A ∪ B| = 21 + 15 − 30 = 6.

🖼️ Venn diagram identities

Problem 6: Prove A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) using a Venn diagram.
This is the distributive law for intersection over union.
Shade both sides on separate diagrams; they match.
Problem 7: Prove A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) using a Venn diagram.
This is the distributive law for union over intersection.
Again, shade both sides; regions coincide.

🔢 Number theory and modular arithmetic

🔢 Euler's totient function

Problem 8: Count numbers in {1, ..., 105} that are relatively prime to 105.
A number is relatively prime to 105 if it shares no prime factors with 105.
Use Euler's totient function φ(n); for n a product of distinct primes p₁, p₂, ..., φ(n) = n times (1 − 1/p₁) times (1 − 1/p₂) ... For 105 = 3 · 5 · 7 this gives 105 · (2/3) · (4/5) · (6/7) = 48.
Problem 9: For n = pq (p, q distinct primes), show there are pq − p − q + 1 numbers in {1, ..., n} relatively prime to n.
Numbers not relatively prime: multiples of p or q.
Use inclusion-exclusion: start with the total n = pq, subtract the q multiples of p, subtract the p multiples of q, and add back the single multiple of pq (namely n itself).
Result: pq − q − p + 1.
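
Both problems can be spot-checked with a direct gcd count. This is a sketch for small inputs only; the function name `coprime_count` is an assumption.

```python
from math import gcd

def coprime_count(n):
    """Count x in {1, ..., n} with gcd(x, n) = 1."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

# Problem 8
print(coprime_count(105))  # 48

# Problem 9: compare with pq - p - q + 1 for a few pairs of distinct primes
checks = [(p, q, coprime_count(p * q), p * q - p - q + 1)
          for p, q in [(3, 5), (5, 7), (2, 11)]]
print(checks)  # [(3, 5, 8, 8), (5, 7, 24, 24), (2, 11, 10, 10)]
```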

🔢 Modular arithmetic classification

Problem 16: In {1, 2, ..., 24}, find integers that are 2 mod 8 and compute their sum.
Numbers: 2, 10, 18 (each is 2 more than a multiple of 8).
Sum: 2 + 10 + 18 = 30.

🧊 Binary strings and geometric symmetry

🧊 Binary strings of length m

Problem 10(a): Count binary strings of length m.
Each position has 2 choices (0 or 1) → 2 to the power m.

🔄 Complement operation

Problem 10(b): For a string s, the complement sᶜ swaps 0 ↔ 1.
Example for m = 3: strings are 000, 001, 010, 011, 100, 101, 110, 111.
Pairs: 000 ↔ 111, 001 ↔ 110, 010 ↔ 101, 011 ↔ 100.

🧊 Geometric interpretation

Problem 10(c): Plot each binary string s = s₁s₂s₃ as the point (s₁, s₂, s₃) in 3D space.
The 8 points are the corners of a unit cube.
Complement pairs are opposite corners (diagonal through the cube's center).

🧊 Subsets and their complements

Problem 11(a): For m = 2, list all subsets of S (the set of binary strings of length 2) of size k = 3.
S = {00, 01, 10, 11}; subsets of size 3: {00, 01, 10}, {00, 01, 11}, {00, 10, 11}, {01, 10, 11}.
Problem 11(b): For a subset T of S containing k strings, Tᶜ is the subset of their k complements.
Match each subset with its complement subset.

🔄 Parity and self-complementarity

Problem 11(c): If k is odd, prove T ≠ Tᶜ for any subset T of size k.
If T = Tᶜ, then every string in T has its complement also in T → strings come in pairs → |T| must be even.
Contradiction when k is odd.
Problem 11(d): If k is even, construct T such that T = Tᶜ.
Take a set R of k/2 strings that contains no string together with its complement; let T = R ∪ Rᶜ.
Then R and Rᶜ are disjoint, so T has size k, and T = Tᶜ (T is closed under complement).
Don't confuse: Self-complementarity is possible only when k is even.
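
For small m the claim can be checked exhaustively. The sketch below uses m = 3 (an arbitrary choice for illustration) and records the sizes of all subsets T with T = Tᶜ.

```python
from itertools import combinations, product

FLIP = str.maketrans("01", "10")

def complement(s):
    """Swap 0 and 1 in a binary string."""
    return s.translate(FLIP)

m = 3
strings = ["".join(bits) for bits in product("01", repeat=m)]

# Sizes k for which some subset T of size k equals its complement set
sizes = set()
for k in range(1, len(strings) + 1):
    for T in combinations(strings, k):
        if {complement(s) for s in T} == set(T):
            sizes.add(k)

print(complement("011"))  # 100
print(sorted(sizes))      # [2, 4, 6, 8]
```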

🔍 Lemma verification

🔍 Problem 9 (Chapter 2.9)

Task: Explain why Lemma 2.9.13 is true using Lemma 2.9.3.
Lemma 2.9.13 is the modular-partition lemma from Section 2.9; the statement of Lemma 2.9.3 is not given in the excerpt, so the problem requires referring back to that section.
This is a proof-writing exercise linking two results within the chapter.

Investigation: Divisors of a positive integer

2.11 Investigation: Divisors of a positive integer

🧭 Overview

🧠 One-sentence thesis

Divisor functions—including the count of divisors, their sum, and the Möbius function—can be computed through formulas based on the prime factorization of a positive integer, and these functions share the multiplicative property when applied to relatively prime integers.

📌 Key points (3–5)

Divisor functions depend on prime factorization: the value at N depends on how N breaks down into prime factors.
Three main divisor functions: sigma-zero (count of divisors), sigma-one (sum of divisors), and the Möbius function (alternating sign based on distinct prime factors).
Multiplicative property: all three functions satisfy f(N·M) = f(N)·f(M) when N and M share no prime factors (relatively prime).
Common confusion: the multiplicative property only holds when the integers are relatively prime; it fails when they share common prime factors.
Building formulas progressively: start with prime powers, then products of distinct primes, then general prime factorizations.

🔢 The number of divisors function

🔢 Definition and basic examples

sigma-zero(N): the number of positive divisors of N, including 1 and N itself.

For N = 22, the divisors are 1, 2, 11, 22, so sigma-zero(22) = 4.
For N = 23 (a prime), the divisors are only 1 and 23, so sigma-zero(23) = 2.
For N = 24, the divisors are 1, 2, 3, 4, 6, 8, 12, 24, so sigma-zero(24) = 8.

📐 Formula for prime powers

When N = p raised to power e (where p is prime):

The divisors are 1, p, p-squared, up to p to the e-th power.
Formula: sigma-zero(N) = e + 1.
Example: for N = 81 = 3 to the 4th power, sigma-zero(81) = 5 (the divisors are 1, 3, 9, 27, 81).
Don't confuse: this formula requires p to be prime; it does not work for composite bases.

🧮 Formula for products of distinct primes

When N = p·q where p and q are distinct primes:

Formula: sigma-zero(N) = 4 (specifically, 2 times 2).
The divisors are: 1, p, q, and p·q.
Example: N = 21 = 3·7 has divisors 1, 3, 7, 21, so sigma-zero(21) = 4.

When N = p·q·r where p, q, r are distinct primes:

Formula: sigma-zero(N) = 2 times 2 times 2 = 8.
Each prime can either appear or not appear in a divisor.

🎯 General formula for any factorization

When N = p-one to the e-one power · p-two to the e-two power · ... · p-n to the e-n power (distinct primes):

Formula: sigma-zero(N) = (e-one + 1) · (e-two + 1) · ... · (e-n + 1).
Why it works: each divisor is formed by choosing an exponent from 0 to e-i for each prime p-i; the choices are independent and multiply by the counting principle.
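
The general formula can be sketched in code and compared with a direct divisor count. Trial division is used here only because it is short; the function names are assumptions.

```python
def prime_factorization(n):
    """Return {prime: exponent} for n >= 1, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def sigma_zero(n):
    """Product of (exponent + 1) over the prime factorization."""
    result = 1
    for e in prime_factorization(n).values():
        result *= e + 1
    return result

def count_divisors(n):
    """Direct count of the positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

for n in (21, 22, 23, 24, 81):
    print(n, sigma_zero(n), count_divisors(n))
# 21 4 4
# 22 4 4
# 23 2 2
# 24 8 8
# 81 5 5
```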

➕ The sum of divisors function

➕ Definition and basic examples

sigma-one(N): the sum of the positive divisors of N, including 1 and N itself.

For N = 22: 1 + 2 + 11 + 22 = 36, so sigma-one(22) = 36.
For N = 23: 1 + 23 = 24, so sigma-one(23) = 24.
For N = 24: 1 + 2 + 3 + 4 + 6 + 8 + 12 + 24 = 60, so sigma-one(24) = 60.

📊 Formula for prime powers

When N = p to the e-th power (p is prime):

The divisors are 1, p, p-squared, ..., p to the e-th power.
Formula: sigma-one(N) = 1 + p + p-squared + ... + p to the e-th power.
This is a geometric series, with closed form (p to the (e+1)-th power − 1) divided by (p − 1).

🔗 Formula for products of primes

When N = p-one · p-two (distinct primes):

Formula: sigma-one(N) = sigma-one(p-one) · sigma-one(p-two).
The sum factors because every divisor of N is uniquely a product of a divisor of p-one and a divisor of p-two.

When N = p-one to the e-one · p-two to the e-two · ... · p-n to the e-n (distinct primes):

Formula: sigma-one(N) = sigma-one(p-one to the e-one) · sigma-one(p-two to the e-two) · ... · sigma-one(p-n to the e-n).
Each factor is the geometric sum for that prime power.
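
A minimal sketch of the product formula, multiplying one geometric sum per prime power and comparing with a direct sum of divisors. The function names are hypothetical.

```python
def sigma_one(n):
    """Sum of divisors via the product of geometric sums over prime powers."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            term, power = 1, 1
            while n % p == 0:
                n //= p
                power *= p
                term += power      # builds 1 + p + p^2 + ... + p^e
            result *= term
        p += 1
    if n > 1:                      # one remaining prime factor with exponent 1
        result *= 1 + n
    return result

def sum_divisors(n):
    """Direct sum of the positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

for n in (22, 23, 24):
    print(n, sigma_one(n), sum_divisors(n))
# 22 36 36
# 23 24 24
# 24 60 60
```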

🔄 The Möbius function

🔄 Definition and rules

The Möbius function mu(N) is defined by three cases:

If p-squared divides N for some prime p, then mu(N) = 0.

mu(1) = 1.

If N = p-one · p-two · ... · p-n (distinct primes), then mu(N) = negative-one to the n-th power.

The function alternates sign based on the number of distinct prime factors.
It is zero whenever N has a repeated prime factor (not square-free).

🧪 Example and sum property

For N = 22 = 2 · 11:

Divisors: 1, 2, 11, 22.
mu(1) = 1, mu(2) = -1, mu(11) = -1, mu(22) = 1.
Sum over all divisors: 1 + (-1) + (-1) + 1 = 0.

The excerpt asks to compute this sum for other values and make a conjecture (the investigation guides the student to discover a pattern).
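
The three cases of the definition translate into a short function, sketched below with trial division. It reproduces the N = 22 computation; the helper `mobius_divisor_sum` is a hypothetical name and can be pointed at other values of N when exploring the pattern.

```python
def mobius(n):
    """Mobius function for n >= 1, following the three cases of the definition."""
    if n == 1:
        return 1
    sign = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0       # p squared divides the original n
            sign = -sign       # one more distinct prime factor
        p += 1
    if n > 1:                  # one remaining prime factor
        sign = -sign
    return sign

def mobius_divisor_sum(n):
    """Sum of mu(d) over all positive divisors d of n."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)

divisors = [d for d in range(1, 23) if 22 % d == 0]
print(divisors)                       # [1, 2, 11, 22]
print([mobius(d) for d in divisors])  # [1, -1, -1, 1]
print(mobius_divisor_sum(22))         # 0
```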

🔗 Multiplicative functions

🔗 Definition of multiplicativity

A function f on positive integers is multiplicative if f(N·M) = f(N)·f(M) whenever N and M are relatively prime (share no common prime factors).

Relatively prime means gcd(N, M) = 1.
The property is about factoring the function value, not about addition.

✅ All three functions are multiplicative

Function	Multiplicative?	Example showing failure when not relatively prime
sigma-zero	Yes	Must provide example where gcd(N,M) > 1
sigma-one	Yes	Must provide example where gcd(N,M) > 1
mu	Yes	Must provide example where gcd(N,M) > 1

The excerpt asks students to verify multiplicativity and to find counterexamples when N and M share a common factor.
Don't confuse: multiplicativity requires the inputs to be relatively prime; the property does not hold in general.
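
A numerical spot check of the property for sigma-zero and sigma-one is sketched below (mu can be tested the same way). The bound 30 and the function names are arbitrary choices for illustration; a finite check like this is evidence, not a proof.

```python
from math import gcd

def sigma_zero(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def sigma_one(n):
    return sum(d for d in range(1, n + 1) if n % d == 0)

def multiplicative_up_to(f, limit):
    """Check f(a*b) == f(a)*f(b) for all relatively prime a, b <= limit."""
    return all(
        f(a * b) == f(a) * f(b)
        for a in range(1, limit + 1)
        for b in range(1, limit + 1)
        if gcd(a, b) == 1
    )

print(multiplicative_up_to(sigma_zero, 30))  # True
print(multiplicative_up_to(sigma_one, 30))   # True

# With N = M = 2 (not relatively prime) the product rule fails
print(sigma_zero(4), sigma_zero(2) * sigma_zero(2))  # 3 4
print(sigma_one(4), sigma_one(2) * sigma_one(2))     # 7 9
```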

The types of combinations

3.1 The types of combinations

🧭 Overview

🧠 One-sentence thesis

The number of ways to choose k objects from n objects depends critically on whether order matters and whether repeats are allowed, yielding four distinct counting scenarios.

📌 Key points (3–5)

The central question: "How many ways can you choose k objects from n objects?" has no single answer—it depends on two factors.
Two key distinctions: whether order matters (does sequence count?) and whether repeats are allowed (can you pick the same object more than once?).
Four different answers: the same problem (e.g., choosing 2 fruits from 5) gives 10, 20, 25, or 15 depending on the combination of order/repeat rules.
Common confusion: "choosing 2 from 5" sounds like one problem, but you must always clarify order and repeats before counting.
Goal of the chapter: find general formulas for each of the four cases when choosing k objects from n objects.

🍎 The four scenarios illustrated

🍎 Order does not matter, repeats not allowed

This is the classic "combination" scenario: selecting a subset where sequence is irrelevant and no duplicates.

Example from the excerpt: Pick 2 pieces of fruit from 5 types (apple, banana, orange, pear, mango) to bring to work.
You care only which fruits, not in what order you grab them.
You cannot pick the same fruit twice (only one of each type in the fridge).
Count: "5 choose 2" = 5 factorial divided by (2 factorial times 3 factorial) = 10 ways.

🗓️ Order matters, repeats not allowed

Now sequence is important, but you still cannot reuse an object.

Example from the excerpt: Eat one fruit on Monday and a different fruit on Tuesday.
Monday vs. Tuesday matters: apple-then-banana is different from banana-then-apple.
Once you eat a fruit on Monday, it's gone—only 4 choices remain for Tuesday.
Count: 5 choices for Monday × 4 choices for Tuesday = 20 ways.

🔁 Order matters, repeats allowed

Sequence matters and you can pick the same object multiple times.

Example from the excerpt: The fridge is packed with plenty of each fruit type; you can eat an apple both Monday and Tuesday.
Order still matters (Monday vs. Tuesday).
Each day you have all 5 choices available again.
Count: 5 choices for Monday × 5 choices for Tuesday = 25 ways.

🎒 Order does not matter, repeats allowed

You care only about the collection (not sequence), but duplicates are permitted.

Example from the excerpt: Grab 2 pieces of fruit for your lunch bag; the fridge has plenty of each type.
You can pick two of the same kind (e.g., two apples).
Order doesn't matter: apple-apple is just one outcome, not two.
Count: 5 ways to pick two of the same kind + 10 ways to pick two different kinds = 15 ways.
Don't confuse: this is not the same as "order matters, repeats allowed" (25); allowing repeats when order doesn't matter requires careful counting to avoid double-counting pairs.

📊 Summary table

The excerpt provides a table that captures all four cases for choosing 2 fruits from 5 types:

	Order does not matter	Order matters
Repeats not allowed	10	20
Repeats allowed	15	25

Each cell answers the same base question ("choose 2 from 5") under different rules.
The chapter's goal is to generalize these numbers: find formulas for choosing k objects from n objects in each of the four cases.
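
Python's `itertools` happens to have one generator for each of the four cases, so the table can be reproduced by enumeration. This is an illustrative sketch; the dictionary layout is an assumption.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

fruits = ["apple", "banana", "orange", "pear", "mango"]
k = 2

counts = {
    "no order, no repeats": len(list(combinations(fruits, k))),
    "order, no repeats": len(list(permutations(fruits, k))),
    "no order, repeats": len(list(combinations_with_replacement(fruits, k))),
    "order, repeats": len(list(product(fruits, repeat=k))),
}

for rule, count in counts.items():
    print(rule, count)
# no order, no repeats 10
# order, no repeats 20
# no order, repeats 15
# order, repeats 25
```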

🎯 What the chapter will cover

🎯 General formulas for all four cases

The excerpt states that the chapter will study each case "in depth."
The aim is to compute the number of ways to choose k objects from n objects for every combination of order/repeat rules.
Throughout, k and n are assumed to be natural numbers (positive integers).

🎯 Why this matters

The same counting question can have radically different answers depending on context.
Before applying any formula, you must first determine:
1. Does order matter in this situation?
2. Are repeats allowed?
Example: "How many ways can 7 people fill 4 student government positions?" depends on whether the positions are distinct roles (order matters) or just a committee (order does not matter).

Sequences

3.2 Sequences

🧭 Overview

🧠 One-sentence thesis

Sequences provide a framework for counting ordered arrangements of objects, with the formula n^k covering cases where order matters and repeats are allowed, and n! covering permutations where all n objects are arranged without repetition.

📌 Key points (3–5)

What a sequence is: an ordered list of numbers or symbols, either finite or infinite, written with parentheses and commas like (5, 2, 3, 6).
Counting sequences with repeats allowed: when choosing k objects from n objects where order matters and repeats are allowed, there are n^k possibilities.
Permutations (no repeats): a permutation arranges all n objects in order with no repeats; there are n! permutations of n objects.
Common confusion: the fruit-choosing examples show that "choose 2 from 5" can yield 10, 20, 25, or 15 depending on whether order matters and whether repeats are allowed—context determines the correct formula.
Generalization with different alphabets: when each position in a sequence can draw from a different set of symbols (sizes n₁, n₂, ..., nₖ), the total count is n₁ · n₂ · ... · nₖ.

📝 What sequences are

📝 Definition and notation

Sequence: a list of numbers or other symbols written in order; can be finite or infinite.

A finite sequence has a specific length (number of entries), e.g., (5, 2, 3, 6) has length 4.
An infinite sequence continues indefinitely, e.g., (2, 4, 6, 8, 10, ...).
Notation: (a₁, a₂, ..., aₙ) denotes a finite sequence of length n; (a₁, a₂, a₃, ...) denotes an infinite sequence where the i-th entry is aᵢ for each integer i ≥ 1.

🔤 Alphabet and strings

Alphabet: the set of numbers or symbols used in a sequence.

Common alphabets: binary {0, 1}, English letters, ASCII, or natural numbers.
String or word: a sequence written without parentheses and commas for convenience, e.g., (a, p, p, l, e) becomes "apple" and (1, 0, 1, 1, 0) becomes "10110".
Warning: don't write (1, 0, 10, 100, 1000) as "10101001000" because you lose information about entry boundaries.

🔢 Alternative view as functions

A sequence of length n can be thought of as a function assigning a number or symbol to every positive integer from 1 to n.
An infinite sequence assigns a number or symbol to every positive integer.

🔄 Zero-indexing variation

Sometimes it is convenient to start a sequence with a₀ rather than a₁ (zero-indexing).

🧮 Counting sequences: order matters, repeats allowed

🧮 The n^k formula

Theorem: The number of sequences of length k composed from an alphabet with n symbols is n^k.

More generally: there are n^k combinations of k objects chosen from n objects where order matters and repeats are allowed.
Why: there are n ways to choose the first symbol, n ways to choose the second, and so on up to the k-th, giving n · n · n · ... · n = n^k possibilities.
This fills one entry in the "choose k from n" table:

	Order does not matter	Order matters
Repeats not allowed	?	?
Repeats allowed	?	n^k

🔢 Examples with uniform alphabets

Binary strings of length k:

There are 2^k binary strings (n = 2).
Example: for k = 3, all 8 = 2³ strings are 000, 001, 010, 011, 100, 101, 110, 111.

Three-letter strings from English alphabet:

There are 26³ = 17,576 possibilities, e.g., "pxy" or "csu".

Three-character strings (letters or digits):

There are (26 + 10)³ = 36³ = 46,656 possibilities, e.g., "p17", "2xy", or "pqm".

Buffet snacks:

With 3 types of snacks (samosas, chicken skewers, carrots) and eating 10 snacks in a row, there are 3¹⁰ ways.
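
These counts are easy to confirm: small cases can be listed outright, and the larger ones are single powers. A minimal sketch:

```python
from itertools import product

# All binary strings of length 3, generated as sequences over the alphabet {0, 1}
binary_strings = ["".join(bits) for bits in product("01", repeat=3)]
print(binary_strings)
# ['000', '001', '010', '011', '100', '101', '110', '111']

# n^k for the other examples
three_letters = 26 ** 3      # three-letter strings
three_chars = 36 ** 3        # letters or digits
buffet = 3 ** 10             # 10 snacks, 3 types each time
print(three_letters, three_chars, buffet)  # 17576 46656 59049
```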

🔀 Generalization: different alphabets per position

Theorem: There are n₁ · n₂ · ... · nₖ possible sequences of length k made by choosing one of n₁ symbols as the 1st entry, one of n₂ symbols as the 2nd entry, ..., one of nₖ symbols as the k-th entry.

The n^k formula is the special case where n₁ = n₂ = ... = nₖ = n.

Example: 4-digit numbers with no leading zeros:

First digit: 9 choices (1–9).
Remaining three digits: 10 choices each (0–9).
Total: 9 · 10 · 10 · 10 = 9,000.

Example: consonant-vowel-consonant strings:

First letter: 21 consonants.
Middle letter: 5 vowels (A, E, I, O, U).
Last letter: 21 consonants.
Total: 21 · 5 · 21 = 2,205.
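
The generalized theorem is a single product over the alphabet sizes. The sketch below uses `math.prod` (Python 3.8+); the function name `count_sequences` is hypothetical.

```python
from math import prod

def count_sequences(alphabet_sizes):
    """Number of sequences when position i draws from alphabet_sizes[i] symbols."""
    return prod(alphabet_sizes)

print(count_sequences([9, 10, 10, 10]))  # 9000
print(count_sequences([21, 5, 21]))      # 2205
print(count_sequences([26] * 3))         # 17576, the n^k special case

# Cross-check the 4-digit count by listing the numbers directly
four_digit = sum(1 for n in range(10000) if len(str(n)) == 4)
print(four_digit)                        # 9000
```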

🔄 Permutations: order matters, no repeats

🔄 What a permutation is

Permutation: an ordering or arrangement of all n objects; can be expressed as a sequence or string.

A permutation uses each object exactly once.
Example: the set {a, b, c} has 6 permutations: abc, acb, bac, bca, cab, cba.
For sets with multi-digit numbers like {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, use sequence notation to avoid confusion, e.g., (3, 6, 7, 2, 1, 10, 4, 5, 9, 8).

🔄 The n! formula

Theorem: The number of permutations of n objects is n! (n factorial), where n! = n · (n − 1) · (n − 2) · ... · 3 · 2 · 1.

Why: there are n choices for the first object, n − 1 for the second, n − 2 for the third, and so on until only 1 choice remains for the n-th object, giving n · (n − 1) · (n − 2) · ... · 1 = n!.
Special case: 0! = 1, because there is one way to arrange zero objects (the "empty sequence" or "empty string").
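
As a quick illustration, the factorial count matches a direct listing of permutations, including the empty case:

```python
from itertools import permutations
from math import factorial

abc = ["".join(p) for p in permutations("abc")]
print(abc)                      # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
print(len(abc), factorial(3))   # 6 6

# The empty set has exactly one arrangement: the empty sequence
empty = list(permutations([]))
print(empty, factorial(0))      # [()] 1
```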

🍦 Examples of permutations

Seven students lining up:

There are 7! = 5,040 ways for seven students {a, b, c, d, e, f, g} to line up for ice cream.

Five fruits for five weekdays:

Assign one of 5 different fruits (apple, banana, orange, pear, mango) to each day of the work week, using each fruit exactly once.
This is a permutation of {A, B, O, P, M}.
Total: 5! = 5 · 4 · 3 · 2 · 1 = 120.

🔍 Don't confuse permutations with sequences allowing repeats

Permutations require all n objects used exactly once (no repeats, all objects included).
Sequences with repeats allowed (n^k) let you reuse objects and may not use all objects.
Example: choosing 2 fruits from 5 where order matters gives 5 · 4 = 20 if repeats are not allowed (a partial permutation, covered in the next section) but 5² = 25 if repeats are allowed.

🔗 Preview: ordered subsets

🔗 What comes next

The excerpt introduces the concept of ordered subsets: choosing k objects from n objects where order matters and repeats are not allowed.
This is a generalization of permutations (which arrange all n objects) to cases where only k < n objects are selected and arranged.
The excerpt ends before providing the formula or detailed examples for ordered subsets, noting that the ice cream adventurers face a new problem requiring this concept.

Permutations and other sequences with distinct entries

3.3 Permutations and other sequences with distinct entries

🧭 Overview

🧠 One-sentence thesis

The number of ways to arrange n objects in order is n factorial, and when arranging only k objects from n, the count is n!/(n−k)!, both representing scenarios where order matters and repeats are not allowed.

📌 Key points (3–5)

Permutations count all arrangements: A permutation arranges all n objects in order, and there are n! such arrangements.
Ordered subsets for partial arrangements: When choosing and ordering only k objects from n (where k ≤ n), there are n!/(n−k)! ways.
Order matters, no repeats: Both permutations and ordered subsets require distinct entries—no object appears twice in the same arrangement.
Common confusion: Ordered subsets vs permutations—ordered subsets arrange only k out of n objects, while permutations arrange all n; when k = n, the formulas coincide.
Building block for the counting table: These formulas fill the "order matters, repeats not allowed" row of the general counting framework.

🔢 Permutations: arranging all objects

🔢 What a permutation is

Permutation of n objects: an ordering or arrangement of all n of them, expressed as a sequence or string.

A permutation uses every object exactly once.
The order in which objects appear matters—different orders are different permutations.
Example: The set {a, b, c} has 6 permutations: abc, acb, bac, bca, cab, cba.

🧮 Counting permutations: n! formula

Theorem: The number of permutations of n objects is n!.

Why n!: By the multiplication principle, there are n choices for the first position, then n−1 for the second, n−2 for the third, and so on down to 1 choice for the last position.
The product n · (n−1) · (n−2) · ... · 3 · 2 · 1 equals n!.
Example: Seven students lining up for ice cream can arrange themselves in 7! = 5,040 ways.
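
The n! count can be checked by brute force for small n. The following is a minimal sketch, assuming Python's standard itertools and math modules; the variable names are illustrative and not from the text.

```python
from itertools import permutations
from math import factorial

# List every ordering of a small set and compare the count with n!.
letters = ["a", "b", "c"]
orderings = ["".join(p) for p in permutations(letters)]
print(orderings)  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
print(len(orderings) == factorial(len(letters)))  # True

# Seven students: count the line-ups one by one instead of trusting the formula.
seven_students = "abcdefg"
line_count = sum(1 for _ in permutations(seven_students))
print(line_count, factorial(7))  # 5040 5040
```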

🍎 Everyday scenarios

Fruit assignment: If you have 5 different fruits and want to bring one each day of a 5-day work week (no repeats), there are 5! = 120 ways to assign them.
License plates and codes: The excerpt mentions problems like counting license plate combinations and donor account codes, which use similar multiplication principles.

🕳️ The empty case: 0! = 1

When there are 0 objects, there is exactly one way to arrange them: do nothing (the "empty sequence").
By convention, 0! = 1, which keeps formulas such as n!/(n−k)! valid when k = n.

📦 Ordered subsets: arranging k out of n objects

📦 What an ordered subset is

Ordered subset of size k from a set A: a string of k distinct elements chosen from A, where order matters.

Unlike a full permutation, an ordered subset arranges only k objects (k ≤ n), not all n.
Each element in the string must be distinct—no repeats.
Example: Choosing 4 students from 7 to form a line (where the shop can only serve 4 customers) creates an ordered subset of size 4.

🧮 Counting ordered subsets: n!/(n−k)!

Theorem: The number of ordered subsets of size k from a set with n elements is
n · (n−1) · (n−2) · ... · (n−k+1) = n! / (n−k)!.

Why this formula:
- There are n choices for the first position.
- Then n−1 for the second, n−2 for the third, and so on.
- For the kth (final) position, there are n−k+1 remaining choices.
- The product has exactly k factors.
The decreasing number of choices at each step ensures no repeated entries.
Example: Forming a line of 4 students from 7 gives 7 · 6 · 5 · 4 = 840 ways, which equals 7! / 3!.
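
As a rough check of the theorem, the sketch below counts lines of 4 students out of 7 in three ways: by enumeration, by the factorial quotient, and with math.perm (available in Python 3.8 and later). The helper name ordered_subsets is an illustrative assumption.

```python
from itertools import permutations
from math import factorial, perm

def ordered_subsets(n: int, k: int) -> int:
    """Count strings of k distinct elements from an n-element set: n!/(n-k)!."""
    return factorial(n) // factorial(n - k)

students = "abcdefg"
# permutations(iterable, 4) yields every ordered choice of 4 distinct students.
lines_of_four = sum(1 for _ in permutations(students, 4))

print(lines_of_four)          # 840
print(ordered_subsets(7, 4))  # 840
print(perm(7, 4))             # 840
```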

🍊 Practical examples

Scenario	n	k	Count	Explanation
Fruits on Monday and Tuesday	5	2	5!/3! = 20	Choose and order 2 fruits from 5
47 students in 50 seats	50	47	50!/3!	Assign 47 students to 47 of 50 seats in order
Visit 5 state capitals from 50	50	5	50!/45!	Order matters: Sacramento→Dover→... differs from Dover→...

💡 Computation tip

When dividing factorials, expand both and cancel before multiplying.
Example: 5!/3! = (5·4·3·2·1)/(3·2·1) = 5·4 = 20, which is easier than computing 120/6.

🔄 Don't confuse: ordered subsets vs permutations

Permutations arrange all n objects (k = n).
Ordered subsets arrange only k objects from n (k can be less than n).
When k = n, the ordered subset formula n!/(n−k)! = n!/0! = n! reduces to the permutation formula.

🗂️ Filling the counting table

🗂️ The "order matters, repeats not allowed" row

The excerpt completes part of a general counting framework:

Condition	Order doesn't matter	Order matters
Repeats not allowed	(n choose k) = n!/(k!(n−k)!)	n!/(n−k)!
Repeats allowed	(formula not in this excerpt)	n^k

The right column's top entry, n!/(n−k)!, counts ordered subsets (this section's focus).
The left column's top entry, (n choose k), counts unordered subsets (covered in Section 3.4, briefly mentioned here).
The bottom-right entry, n^k, was covered earlier (repeats allowed, order matters).

🔗 Connection to unordered subsets (preview)

The excerpt mentions that (n choose k) = n!/(k!(n−k)!) counts subsets where order does not matter.
The proof idea: ordered subsets count each unordered subset k! times (once for each permutation of the k chosen objects), so divide n!/(n−k)! by k! to get (n choose k).
Example: The ice cream contest chooses 4 winners from 107 entrants; since all winners receive the same prize, order doesn't matter, so the answer is (107 choose 4).

🧩 Key distinctions and reminders

🧩 Order matters vs order doesn't matter

Order matters: "First place, second place, third place" are different roles → use n!/(n−k)!.
Order doesn't matter: "Four winners all get the same prize" → use (n choose k).
Example: Ranking your top 10 restaurants (#1 to #10) from 50 uses ordered subsets (order matters); choosing 4 contest winners from 107 uses unordered subsets (order doesn't matter).

🧩 Repeats allowed vs repeats not allowed

Repeats not allowed: Each object used at most once → formulas involve factorials (n! or n!/(n−k)!).
Repeats allowed: Objects can be reused → formula is n^k (from earlier sections).
Example: Scrambling 7 distinct letters into a 7-letter word (each letter used exactly once) is a permutation (7!); scrambling into a 5-letter word (each letter used at most once) is an ordered subset (7!/2!).

🧩 Practical problem-solving checklist

Does order matter? (Line order, ranking, sequence → yes; prize winners, committee members → no)
Are repeats allowed? (Distinct entries required → no; can reuse → yes)
Are we arranging all n or only k? (All n → permutation n!; only k → ordered subset n!/(n−k)!)

Example: "How many ways to stack 5 different rocks with your favorite on top?" → Order matters (stack position), no repeats (distinct rocks), arrange all 5 but one position is fixed → 4! ways to arrange the remaining 4 rocks below the top one.

Sets: When order doesn't matter

3.4 Sets: When order doesn’t matter

🧭 Overview

🧠 One-sentence thesis

The binomial coefficient formula counts how many ways you can choose k objects from n objects when order does not matter and repeats are not allowed, and this counting principle extends to surprising applications like binary strings and multisets.

📌 Key points (3–5)

What the binomial coefficient counts: the number of subsets of size k from a set of size n, where order does not matter and repeats are not allowed.
How it relates to ordered counting: the formula divides ordered subsets by k! because each unordered subset is counted k! times when order matters.
Common confusion: choosing k winners is the same as eliminating (n − k) losers, so (n choose k) equals (n choose n − k).
Surprising application: the formula also counts binary strings with exactly k zeroes and (n − k) ones, even though strings are ordered.
Why it matters: completes the counting table for "order does not matter, repeats not allowed" and provides a foundation for multiset counting.

🧮 The binomial coefficient formula

🧮 What it counts

The number of subsets of size k in a set of size n is (n choose k) = n! / (k!(n − k)!).

More generally, this is the number of ways to choose k objects from n objects where order does not matter and repeats are not allowed.
This fills in the upper-left entry of the counting table.

Situation	Order does not matter	Order matters
Repeats not allowed	(n choose k)	n! / (n − k)!
Repeats allowed	(to be covered)	n^k

🔍 Why the formula works (proof sketch)

The proof uses the division principle:

Start with ordered subsets: by an earlier theorem, the number of ordered subsets of size k is n! / (n − k)!.
Each unordered subset is overcounted: every unordered subset gets counted k! times (once for each permutation of the k objects).
Divide to correct the overcount: divide n! / (n − k)! by k! to get n! / (k!(n − k)!).

Example: If you have 5 fruits and choose 2 to bring to work on the same day (order doesn't matter), the answer is (5 choose 2) = 5! / (2! · 3!) = (5 · 4 · 3 · 2 · 1) / (2 · 1 · 3 · 2 · 1) = 5 · 2 = 10.
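
The division step in the proof sketch can be illustrated on the fruit example. This is a small sketch, not part of the original text: it groups the ordered pairs by their underlying set and confirms that each set is hit exactly k! times.

```python
from itertools import combinations, permutations
from math import comb, factorial

fruits = ["A", "B", "O", "P", "M"]
k = 2

ordered = list(permutations(fruits, k))    # order matters: (A, B) differs from (B, A)
unordered = list(combinations(fruits, k))  # order does not matter

# Group the ordered pairs by the set of fruits they contain.
groups = {}
for pair in ordered:
    groups.setdefault(frozenset(pair), []).append(pair)

print(len(ordered), len(unordered))                          # 20 10
print(all(len(g) == factorial(k) for g in groups.values()))  # True
print(len(ordered) // factorial(k) == comb(len(fruits), k))  # True
```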

💡 Computation tip

When dividing one factorial by another, expand both as products and cancel terms before multiplying.

Don't compute 5! = 120 and 3! = 6 separately and then divide.
Instead, write 5! / 3! = (5 · 4 · 3 · 2 · 1) / (3 · 2 · 1), cancel the 3 · 2 · 1, and get 5 · 4 = 20 directly.

🎯 Worked examples

🎯 Ice cream contest winners

Problem: Out of 107 entrants, how many ways are there to choose 4 winners for an "Ice Cream for a Year" contest? All winners receive the same prize.

Answer: (107 choose 4) = 107! / (4! · 103!) = 5,160,610.

Order does not matter because all winners get the same prize.
Repeats are not allowed because each person can win at most once.

🏈 NCAA football teams (unranked)

Problem: How many ways are there to select the top 25 NCAA DI football teams (out of 128) without ranking them?

Answer: (128 choose 25) = 128! / (25! · 103!).

This is the number of unordered subsets of size 25.
Don't confuse with the earlier exercise about ranking: if you rank them, order matters and the answer is 128! / 103!.

🎁 Wrapping teddy bears

Problem: You have 7 different rolls of wrapping paper and 4 identical teddy bears. How many ways can you wrap the teddy bears such that no two bears are wrapped in the same paper?

Answer: (7 choose 4).

You are choosing 4 different wrapping papers out of 7.
Order does not matter because the teddy bears are identical.

🔄 Symmetry property

🔄 Choosing vs eliminating

Key insight: Choosing k winners out of n entrants is the same as eliminating (n − k) losers.

In the ice cream contest, choosing 4 winners out of 107 is the same as eliminating 103 people who did not win.
Both give the same count: (107 choose 4) = (107 choose 103).

📐 Corollary formula

(n choose k) = (n choose n − k).

This symmetry often simplifies calculations.
Example: (128 choose 25) = (128 choose 103), so you can compute whichever is easier.
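
The symmetry can be seen concretely by pairing each set of winners with its complementary set of losers. The sketch below uses a hypothetical contest with 6 entrants so the sets are small enough to enumerate.

```python
from itertools import combinations
from math import comb

entrants = range(6)  # hypothetical small contest, not from the text
k = 2
everyone = frozenset(entrants)

# Pick the winners directly...
winners_direct = {frozenset(c) for c in combinations(entrants, k)}
# ...or pick the losers and keep whoever is left.
winners_via_losers = {
    everyone - frozenset(losers)
    for losers in combinations(entrants, len(everyone) - k)
}

print(winners_direct == winners_via_losers)  # True
print(comb(107, 4) == comb(107, 103))        # True
```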

🔢 Surprising application: binary strings

🔢 Counting binary strings with fixed numbers of 0s and 1s

Problem: How many binary strings (from the alphabet {0, 1}) have exactly k zeroes and (n − k) ones?

Answer: (n choose k).

Why this works:

A binary string of length n has n positions.
To form a string with exactly k zeroes, you simply choose k positions (out of n) for the 0's.
The remaining (n − k) positions automatically get 1's.
Choosing k positions is an unordered choice, even though the string itself is ordered.

Don't confuse: The string is ordered (position matters), but the choice of which positions get 0's is unordered (you are choosing a subset of positions).

Example: If n = 5 and k = 2, there are (5 choose 2) = 10 binary strings with exactly two 0's and three 1's.
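
The argument above can be sketched in code: build each string from a chosen set of zero positions, then compare against a brute-force filter over all 2^n strings. Variable names are illustrative.

```python
from itertools import combinations, product
from math import comb

n, k = 5, 2

# Method 1: choose which k of the n positions hold a 0; the rest hold a 1.
from_positions = set()
for zero_positions in combinations(range(n), k):
    s = "".join("0" if i in zero_positions else "1" for i in range(n))
    from_positions.add(s)

# Method 2: generate all 2^n binary strings and keep those with exactly k zeroes.
brute_force = {
    "".join(bits) for bits in product("01", repeat=n) if bits.count("0") == k
}

print(len(from_positions), comb(n, k))  # 10 10
print(from_positions == brute_force)    # True
```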

🧩 Transition to multisets

🧩 What's next: repeats allowed

The excerpt introduces the next topic: multisets, which are sets where elements can repeat (have multiplicity greater than 1).

A multiset is a set together with a positive integer multiplicity assigned to each element.
Example: {apple, apple, orange, pear, pear} is a multiset with two apples, one orange, and two pears.
Notation: The number of multisets of size k from n types is written ((n choose k)), read "n multichoose k" or "n choose k with repeats."

🍦 Ice cream sundae problem

Setup: There are 10 flavors of ice cream and a large sundae has 15 scoops. Order does not matter (the scoops melt together) and repeats are allowed (you can have multiple scoops of the same flavor). How many different sundaes can you order?

This is a "repeats allowed, order does not matter" problem.
The excerpt begins to introduce the "sticks and stones" method (also called "stars and bars") to count multisets, but the full solution is not provided in this excerpt.

Multisets: sets with repeats allowed

3.5 Multisets: sets with repeats allowed

🧭 Overview

🧠 One-sentence thesis

Multisets extend the idea of sets by allowing repeated elements, and counting them uses the "sticks and stones" method to transform the problem into choosing positions for dividers and items.

📌 Key points (3–5)

What a multiset is: a collection where the same element can appear multiple times, each with a specified multiplicity.
The counting formula: the number of multisets of size k from n types is written ((n choose k)), read "n choose k with repeats," and equals (n+k−1 choose k).
Sticks and stones method: represent multisets as strings of k stones (items) and n−1 sticks (dividers between types), then count positions.
Common confusion: don't confuse multisets with ordinary sets—order still doesn't matter, but repetition is now allowed.
Applications: multisets solve distribution problems (identical objects to people) and composition problems (ordered lists of nonnegative integers with fixed sum).

🎯 What multisets are

🎯 Definition and notation

Multiset: a set together with a positive integer multiplicity assigned to each element; equivalently, a set S with a function f : S → N − {0}.

Unlike ordinary sets, elements can repeat.
Example: {apple, apple, orange, pear, pear} is a multiset with two apples, one orange, and two pears.

✍️ Ways to write multisets

Three equivalent notations:

List with repeats: {4, 4, 4, 5, 6, 6} or {4, 5, 4, 6, 4, 6} (order doesn't matter)
Multiplicity notation: {4×3, 5×1, 6×2} (the ×3 means "occurs 3 times," not exponentiation)
Function form: f : {4, 5, 6} → N with f(4) = 3, f(5) = 1, f(6) = 2

🔢 Notation for counting

((n choose k)) or ((n multichoose k)): the number of multisets of size k whose elements come from n types.
Pronounced "n multichoose k" or "n choose k with repeats."

🪨 The sticks and stones counting method

🪨 How the method works

The key insight: represent each multiset as a string of stones and sticks.

Stones (•): represent individual items
Sticks (|): act as dividers between different types

Example: A bag with 1 apple, 2 bananas, 2 mangoes, 0 oranges, 1 pear (from 5 fruit types A, B, M, O, P) becomes:

Multiset: {A, B, B, M, M, P}
String: • | • • | • • | | •

The empty space between the last two sticks shows zero oranges.
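
The encoding can be sketched as a short function. The name sticks_and_stones and its arguments are illustrative assumptions; the function writes one stone per item, grouped by type, with a stick between consecutive types.

```python
from collections import Counter

def sticks_and_stones(multiset, types):
    """Encode a multiset as stones (•) grouped by type, with sticks (|) between types."""
    counts = Counter(multiset)  # a missing type counts as 0
    tokens = []
    for i, t in enumerate(types):
        tokens.extend(["•"] * counts[t])
        if i < len(types) - 1:  # n types need only n - 1 dividers
            tokens.append("|")
    return " ".join(tokens)

bag = ["A", "B", "B", "M", "M", "P"]
encoded = sticks_and_stones(bag, types="ABMOP")
print(encoded)  # • | • • | • • | | •
```

The two adjacent sticks near the end mark the empty orange group.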

🧮 Why it gives the formula

For k items from n types:

Need k stones (the items)
Need n−1 sticks (to separate n types)
Total positions: k + (n−1) = n + k − 1
Choose which k positions get stones: (n+k−1 choose k)

Theorem: The number of multisets of size k from n types equals (n+k−1 choose k).

🍦 Ice cream example

Student c orders a 15-scoop sundae from 10 flavors:

k = 15 scoops, n = 10 flavors
Answer: (10+15−1 choose 15) = (24 choose 15) = 1,307,504 different sundaes
That's about 3,582 sundaes per day to try them all in a year!

🎈 Balloon example

A store sells 6 colors of balloons; you buy 10 balloons:

k = 10 balloons, n = 6 colors
Answer: ((6 choose 10)) = (6+10−1 choose 10) = (15 choose 10) = 3,003 ways
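
For a sanity check of the theorem and the two examples, the sketch below computes the formula and, for the balloon case, also enumerates the multisets with itertools.combinations_with_replacement. The helper name multichoose is an assumption made for illustration.

```python
from itertools import combinations_with_replacement
from math import comb

def multichoose(n: int, k: int) -> int:
    """Number of multisets of size k drawn from n types: (n+k-1 choose k)."""
    return comb(n + k - 1, k)

# Balloons: enumerate every multiset of 10 balloons from 6 colors.
colors = range(6)
balloon_purchases = sum(1 for _ in combinations_with_replacement(colors, 10))
print(balloon_purchases, multichoose(6, 10))  # 3003 3003

# Sundaes: 15 scoops from 10 flavors, computed by formula only.
print(multichoose(10, 15))  # 1307504
```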

🎁 Variation: at least one of each type

🎁 The constraint changes the count

Problem: Choose m elements from n types where each type must appear at least once (m ≥ n).

Solution approach:

First, pick one of each type (uses n items)
Then choose the remaining m − n items freely from n types
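Apply the multiset formula to the remaining items: (n + (m−n) − 1 choose m−n) = (m−1 choose m−n) = (m−1 choose n−1), using the symmetry (n choose k) = (n choose n−k) in the last step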

Corollary: The number of ways is (m−1 choose n−1).
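
The corollary can be tested by brute force on small cases. This is a sketch under the assumption that types are labeled 0 to n−1; the function name is illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

def count_with_every_type(n: int, m: int) -> int:
    """Brute force: multisets of size m from n types in which every type appears."""
    return sum(
        1
        for ms in combinations_with_replacement(range(n), m)
        if len(set(ms)) == n  # all n types are present
    )

# Compare the brute-force count with (m-1 choose n-1) for a few small pairs.
for n, m in [(2, 5), (3, 7), (4, 9)]:
    print(n, m, count_with_every_type(n, m), comb(m - 1, n - 1))
```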

🍦 Sundae with every flavor

If the sundae must have at least one scoop of each of the 10 flavors:

First 10 scoops determined (one per flavor)
Choose 5 more scoops from 10 flavors
Answer: (14 choose 5) = 2,002 sundaes (about 6 per day for a year)

🎈 Bouquet with every color

10 balloons from 6 colors, at least one of each color:

m = 10, n = 6
Answer: (10−1 choose 6−1) = (9 choose 5)

💰 Distribution problems

💰 Distributing identical objects

Corollary: The number of ways to distribute k identical pennies to n people is (n+k−1 choose k).

Why it works:

Label people A₁, A₂, ..., Aₙ
Create a multiset where each person's label appears as many times as pennies they receive
This is exactly a multiset of size k from n types

Example: 5 pennies to 4 people

Multiset {A₁, A₁, A₃, A₄, A₄} means: person A₁ gets 2, A₂ gets 0, A₃ gets 1, A₄ gets 2
Total ways: (4+5−1 choose 5) = (8 choose 5) = 56
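
The correspondence between multisets of labels and distributions can be sketched as follows. The labels A1 to A4 stand in for A₁ to A₄, and the helper name shares is an assumption made for illustration.

```python
from collections import Counter
from itertools import product
from math import comb

people = ["A1", "A2", "A3", "A4"]
pennies = 5

def shares(multiset):
    """Turn a multiset of person labels into the number of pennies each person gets."""
    counts = Counter(multiset)
    return tuple(counts[p] for p in people)

print(shares(["A1", "A1", "A3", "A4", "A4"]))  # (2, 0, 1, 2)

# Brute force: every way to give each person 0..5 pennies, keeping totals of 5.
distributions = [
    d for d in product(range(pennies + 1), repeat=len(people)) if sum(d) == pennies
]
print(len(distributions), comb(len(people) + pennies - 1, pennies))  # 56 56
```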

💰 Distribution with minimum requirement

If each person must receive at least one penny:

First give 1 penny to each person (uses n pennies)
Distribute remaining m − n pennies freely
Answer: (m−1 choose n−1)

Example: 9 pennies to 4 people, each gets at least one

Answer: (9−1 choose 4−1) = (8 choose 3) = 56

📝 Compositions: ordered lists with fixed sum

📝 What compositions count

Composition: an ordered list of nonnegative integers with fixed length and sum.

Example: Compositions of 9 of length 4 include:

(4, 0, 2, 3)
(0, 4, 2, 3)
(0, 0, 0, 9)

Note: (4, 0, 2, 3) and (0, 4, 2, 3) are different because order matters.

📝 Connection to distribution

A composition of m of length n encodes distributing m identical pennies to n people:

Position i in the list = number of pennies person i receives
Therefore: number of compositions of m of length n = (n+m−1 choose m)

Example: Compositions of 9 of length 4

Same as distributing 9 pennies to 4 people
Answer: ((4 choose 9)) = (12 choose 9) = 220

📝 Compositions with positive integers

If all entries must be positive (at least 1):

Each person must get at least one penny
Subtract 1 from each position
Reduces to compositions of m−n of length n
Answer: (m−1 choose n−1)

Example: Ordered lists of 4 positive integers summing to 9

Same as compositions of 5 of length 4
Answer: (8 choose 3) = 56
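
Both composition counts can be checked with a small recursive generator. This is a minimal sketch; the function name and the minimum parameter are illustrative assumptions, with minimum=0 for nonnegative entries and minimum=1 for positive entries.

```python
from math import comb

def compositions(total, length, minimum=0):
    """Yield ordered tuples of `length` integers >= minimum that sum to `total`."""
    if length == 0:
        if total == 0:
            yield ()
        return
    for first in range(minimum, total + 1):
        for rest in compositions(total - first, length - 1, minimum):
            yield (first,) + rest

nonnegative = list(compositions(9, 4))
positive = list(compositions(9, 4, minimum=1))

print(len(nonnegative), comb(12, 9))  # 220 220
print(len(positive), comb(8, 3))      # 56 56
print((4, 0, 2, 3) in nonnegative, (0, 4, 2, 3) in nonnegative)  # True True
```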

📊 Summary table

📊 Complete counting formulas

Scenario	Order doesn't matter	Order matters
Repeats not allowed	(n choose k)	n! / (n−k)!
Repeats allowed	(n+k−1 choose k)	nᵏ

The multiset formula (n+k−1 choose k) completes the lower-left cell of this fundamental counting table.

Summary of Counting Formulas

3.6 Summary

🧭 Overview

🧠 One-sentence thesis

This section consolidates the four major counting formulas that distinguish whether order matters and whether repetition is allowed when choosing k objects from n objects.

📌 Key points (3–5)

What the section provides: a summary table of the four fundamental counting formulas from the chapter.
Two key dimensions: order (does it matter or not?) and repetition (allowed or not allowed?).
Common confusion: distinguishing when to use combinations vs permutations, and when repetition changes the formula.
The four cases: no-order/no-repeat uses binomial coefficients; order/no-repeat uses factorial division; no-order/with-repeat uses stars-and-bars; order/with-repeat uses powers.

📊 The four counting formulas

📊 Summary table

The excerpt presents a table organizing the formulas by two criteria:

Order matters?	Repeats allowed?	Formula	Notation
No	No	n choose k	(n choose k)
No	Yes	n + k − 1 choose k	((n + k − 1) choose k)
Yes	No	n factorial divided by (n − k) factorial	n! / (n − k)!
Yes	Yes	n to the power k	n^k

🔍 What each formula counts

(n choose k): choosing k objects from n when order does not matter and you cannot pick the same object twice.
- Example: selecting 3 students from a class of 10 for a committee.
((n + k − 1) choose k): choosing k objects from n types when order does not matter but you can pick the same type multiple times.
- Example: distributing 10 identical candies to 4 students (stars-and-bars).
n! / (n − k)!: arranging k objects chosen from n when order matters and no repetition is allowed.
- Example: assigning 1st, 2nd, 3rd place from 8 contestants.
n^k: arranging k objects from n types when order matters and repetition is allowed.
- Example: creating a 5-character password where each character can be any of 26 letters.
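
The four cases can be collected into one helper. The function name count_selections and its boolean parameters are hypothetical, chosen only to mirror the two questions in the table; the sample calls reuse the four examples above.

```python
from math import comb, perm

def count_selections(n: int, k: int, order_matters: bool, repeats_allowed: bool) -> int:
    """Pick the counting formula from the two yes/no questions in the table."""
    if order_matters and repeats_allowed:
        return n ** k              # strings of length k over n symbols
    if order_matters:
        return perm(n, k)          # n! / (n - k)!
    if repeats_allowed:
        return comb(n + k - 1, k)  # multisets (stars-and-bars)
    return comb(n, k)              # subsets

print(count_selections(10, 3, order_matters=False, repeats_allowed=False))  # 120
print(count_selections(4, 10, order_matters=False, repeats_allowed=True))   # 286
print(count_selections(8, 3, order_matters=True, repeats_allowed=False))    # 336
print(count_selections(26, 5, order_matters=True, repeats_allowed=True))    # 11881376
```

In the candy example the 4 students play the role of the n types and the 10 candies are the k choices.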

🧩 How to choose the right formula

🧩 Does order matter?

Order matters means different arrangements of the same objects count as different outcomes.
- If you care about sequence or position → use the "order matters" row.
Order does not matter means you only care about which objects are selected, not their arrangement.
- If you care only about the set of chosen objects → use the "order does not matter" row.

🔁 Are repeats allowed?

Repeats not allowed means each object can be chosen at most once.
- Typical in selection problems where objects are distinct and cannot be reused.
Repeats allowed means the same object (or type) can be chosen multiple times.
- Common in distribution problems with identical items or when sampling with replacement.

⚠️ Common confusion: combinations with repetition

Don't confuse the "no order, repeats allowed" case with the basic binomial coefficient.
The formula ((n + k − 1) choose k) accounts for distributing k identical items into n bins (stars-and-bars method).
Example: How many ways to buy 10 pieces of sushi from 3 types? Use ((3 + 10 − 1) choose 10) = ((12) choose 10) = 66, not (3 choose 10).

🎯 Context and application

🎯 What this summary consolidates

The excerpt states these are "the major results in this chapter."
The formulas are presented without derivation; they summarize earlier sections (likely Sections 3.1–3.5).
The table is a reference tool for quickly identifying which formula applies to a given counting problem.

🎯 How to use the table

Identify whether the problem cares about order (arrangement vs selection).
Determine whether the same object can appear more than once.
Locate the corresponding cell in the table.
Apply the formula with the appropriate values of n (number of object types or total objects) and k (number of choices or positions).

Additional Problems for Chapter 3

3.7 Additional problems for Chapter 3

🧭 Overview

🧠 One-sentence thesis

This section provides practice problems that apply the counting formulas from Chapter 3 to a variety of scenarios including permutations, combinations with and without repetition, and both ordered and unordered selections.

📌 Key points (3–5)

Core formulas to apply: the chapter summary table shows four counting formulas based on whether order matters and whether repeats are allowed.
Problem types: permutations of letters with repetition, distributing identical or distinct objects, forming strings with constraints, and counting outcomes with dice or cards.
Common confusion: distinguishing when order matters vs. when it does not—strings and ranked lists are ordered; distributing identical items to people (without ranking the people) is unordered.
Repeats allowed vs. not allowed: giving 20 identical rocks to 12 students allows repeats (one student can get multiple rocks); choosing distinct letters does not.
Real-world application: the poker investigation (section 3.8) demonstrates how to count card hands by breaking down constraints (suits, values, runs, flushes).

📊 Summary of counting formulas

📊 The four core formulas

The excerpt provides a table summarizing the major results:

Order matters?	Repeats allowed?	Formula	Meaning
No	No	(n choose k)	Choose k objects from n, no order, no repeats
Yes	No	n! / (n - k)!	Arrange k objects from n, order matters, no repeats
No	Yes	(n + k - 1 choose k)	Choose k objects from n, no order, repeats allowed
Yes	Yes	n^k	Arrange k objects from n, order matters, repeats allowed

These formulas are the backbone for solving all the problems in section 3.7.
The table helps you decide which formula to use based on the problem's constraints.

🔤 Permutation and arrangement problems

🔤 Permutations with repeated letters

Problem 1: How many different strings are permutations of the letters in PILLOW?

The word PILLOW has repeated letters (two L's).
This is a permutation problem where order matters but some objects are identical.
Don't confuse: if all letters were distinct, the answer would be 6 factorial; repetition reduces the count because swapping identical letters does not create a new permutation.
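
One way to test this reasoning is to generate all 6! letter orderings and discard duplicates. The sketch below is an illustration and not the text's worked solution; it assumes the division principle applies with 2! for the two L's.

```python
from itertools import permutations
from math import factorial

word = "PILLOW"

# permutations() treats the two L's as different objects, so each distinct
# string appears twice; a set keeps one copy of each.
distinct_strings = {"".join(p) for p in permutations(word)}

print(len(distinct_strings))         # 360
print(factorial(6) // factorial(2))  # 360
```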

🏆 Ranked lists and ordered selections

Problem 2: Baseball team A has 7 pitchers, team B has 5 catchers. A trade involves 3 of A's pitchers for 2 of B's catchers. The newspaper makes a ranked list of their top 4 favorite possible trades. How many different ranked lists can the newspaper make?

First, count the number of possible trades: choose 3 pitchers from 7 and 2 catchers from 5 (order does not matter within a trade).
Then, arrange 4 trades into a ranked list (order matters for the ranking).
Example: Trade X ranked first is different from Trade X ranked second.
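
The two-stage count can be sketched as follows, assuming the four ranked trades must all be different. Variable names are illustrative.

```python
from math import comb, perm

# Stage 1: a trade is an unordered choice of 3 pitchers and 2 catchers.
trades = comb(7, 3) * comb(5, 2)
print(trades)  # 350

# Stage 2: a ranked list is an ordered subset of 4 distinct trades.
ranked_lists = perm(trades, 4)
print(ranked_lists == 350 * 349 * 348 * 347)  # True
```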

🔢 Strings with mixed constraints

Problem 3: How many ordered strings with 10 symbols have exactly 7 numbers (0–9) and exactly 3 letters (a–z)?

Order matters: the string abc1222233 is different from 12c2a3b322.
Step 1: choose 7 positions out of 10 for the numbers (the remaining 3 are for letters).
Step 2: fill the 7 number positions (repeats allowed, 10 choices each).
Step 3: fill the 3 letter positions (repeats allowed, 26 choices each).
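
The three steps multiply together. The sketch below applies them to Problem 3 and checks the same reasoning by brute force on a scaled-down analogue (length 3, digits from {0, 1}, letters from {a, b}); the small case and the function name count_strings are assumptions made for illustration.

```python
from itertools import product
from math import comb

def count_strings(length, num_digits, digit_alphabet, letter_alphabet):
    """Positions for digits, times fillings of digit slots, times fillings of letter slots."""
    num_letters = length - num_digits
    return (
        comb(length, num_digits)
        * len(digit_alphabet) ** num_digits
        * len(letter_alphabet) ** num_letters
    )

# Scaled-down check: exactly 2 digits and 1 letter in a string of length 3.
small_formula = count_strings(3, 2, "01", "ab")
small_brute = sum(
    1 for s in product("01ab", repeat=3) if sum(ch in "01" for ch in s) == 2
)
print(small_formula, small_brute)  # 24 24

# Problem 3: (10 choose 7) * 10^7 * 26^3.
problem_3 = count_strings(10, 7, "0123456789", "abcdefghijklmnopqrstuvwxyz")
print(problem_3)
```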

🪨 Distribution problems

🪨 Distributing identical objects

Problem 4(a): Find the number of ways to give 20 identical rocks to 12 students (a through l).

Distributing identical objects to distinct recipients with no restrictions: use the formula (n + k - 1 choose k), where n is the number of recipients and k is the number of objects.

Here, order does not matter (we don't rank the students), and repeats are allowed (one student can receive multiple rocks).
This is the "stars and bars" problem.

🪨 Distributing with a minimum constraint

Problem 4(b): Same as (a), but every student receives at least one rock.

First, give 1 rock to each of the 12 students (using 12 rocks).
Then, distribute the remaining 20 - 12 = 8 rocks with no restrictions.
Don't confuse: "at least one" changes the problem by reducing the number of objects left to distribute freely.

🪨 Distributing distinct objects

Problem 4(c): The rocks are all different and there are no restrictions (one student can receive all 20 rocks).

Each of the 20 distinct rocks can go to any of the 12 students.
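Record the outcome as a string of length 20 whose i-th entry is the student who receives rock i; different strings are different outcomes, so order matters, and a student can appear more than once, so repeats are allowed.
This is the formula n^k with n = 12 students and k = 20 rocks: 12^20.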
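
Parts (a) through (c) of Problem 4 can be computed side by side. This is a sketch that follows the reasoning above rather than a worked solution from the text; variable names are illustrative.

```python
from math import comb

students, rocks = 12, 20

# (a) identical rocks, no restrictions: multisets of size 20 from 12 types.
part_a = comb(students + rocks - 1, rocks)

# (b) identical rocks, everyone gets at least one: hand out 12 rocks first,
#     then distribute the remaining 8 freely.
remaining = rocks - students
part_b = comb(students + remaining - 1, remaining)

# (c) distinct rocks: each rock independently goes to one of 12 students.
part_c = students ** rocks

print(part_a, part_b, part_c)
print(part_b == comb(rocks - 1, students - 1))  # True, matching (m-1 choose n-1)
```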

🧬 String and sequence problems

🧬 Sequences with a fixed alphabet

Problem 5: How many different DNA sequences of length 5 are there? (Alphabet: A, C, G, T)

Order matters, repeats allowed.
4 choices for each of 5 positions → 4^5.

Problem 6: How many strings of four distinct letters from the English alphabet (26 letters)?

Order matters, repeats not allowed (distinct means no letter appears twice).
26 choices for the first position, 25 for the second, 24 for the third, 23 for the fourth.
Formula: 26! / (26 - 4)!

🧬 Binary strings with balance constraints

Problem 7: How many binary strings of length 8 have the same number of 0's as 1's?

Length 8 with equal 0's and 1's means exactly 4 zeros and 4 ones.
Choose 4 positions out of 8 for the zeros (the rest are ones).
Formula: (8 choose 4).

🧬 Strings with exact or bounded counts

Problem 8: How many strings of length 7 from the alphabet {0, 1, 2} have exactly three 0's?

Choose 3 positions out of 7 for the 0's.
Fill the remaining 4 positions with 1 or 2 (2 choices each).
Formula: (7 choose 3) × 2^4.

Problem 9: How many strings of length 7 from {0, 1, 2} have less than three 0's?

"Less than three" means 0, 1, or 2 zeros.
Count each case separately and add them up.
Don't confuse: "less than three" is not the same as "exactly three."
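
Problems 8 and 9 can be checked against a brute-force pass over all 3^7 = 2,187 strings. The sketch below is an illustration; the helper name strings_with_zeros is an assumption.

```python
from itertools import product
from math import comb

def strings_with_zeros(length: int, zeros: int) -> int:
    """Choose positions for the 0's, then fill every other position with 1 or 2."""
    return comb(length, zeros) * 2 ** (length - zeros)

exactly_three = strings_with_zeros(7, 3)
fewer_than_three = sum(strings_with_zeros(7, j) for j in range(3))  # j = 0, 1, 2

all_strings = list(product("012", repeat=7))
brute_exact = sum(1 for s in all_strings if s.count("0") == 3)
brute_fewer = sum(1 for s in all_strings if s.count("0") < 3)

print(exactly_three, brute_exact)     # 560 560
print(fewer_than_three, brute_fewer)  # 1248 1248
```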

🎲 Dice and card problems

🎲 Rolling dice with a sum constraint

Problem 10: How many ways can you roll a sum of 13 with three six-sided dice?

Each die shows 1 through 6.
Order matters (die 1, die 2, die 3 are distinguishable).
Count all ordered triples (a, b, c) where a + b + c = 13 and 1 ≤ a, b, c ≤ 6.
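
Because there are only 6^3 = 216 possible rolls, the ordered triples can be listed directly. This is a brute-force sketch, not the method the text may intend.

```python
from itertools import product

# Each roll is an ordered triple (a, b, c) with every entry between 1 and 6.
rolls = [r for r in product(range(1, 7), repeat=3) if sum(r) == 13]

print(len(rolls))  # 21
print(rolls[:3])   # [(1, 6, 6), (2, 5, 6), (2, 6, 5)]
```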

🃏 Poker hand counting (section 3.8 preview)

The excerpt introduces a poker investigation that applies counting formulas to card hands.

Key poker vocabulary:

A standard deck has 52 cards: 4 suits (hearts, spades, diamonds, clubs), 13 values per suit (2–10, Jack, Queen, King, Ace).
A hand is a set of 5 cards.
A run is 3 or more consecutive values (suit does not matter); Ace can be low or high but not in the middle.

Example hands and counts:

Hand type	Definition	Example count formula	Result
Royal flush	Run of A, K, Q, J, 10, all same suit	(4 choose 1)	4
Straight flush	Run of 5, all same suit (not royal)	(10 choose 1) × (4 choose 1) - 4	36
Four of a kind	Four cards of one value, any fifth card	(13 choose 1) × (48 choose 1)	624
Full house	Three of one value, two of another	(13 choose 1) × (4 choose 3) × (12 choose 1) × (4 choose 2)	3,744

Royal flush: values are fixed, so only choose the suit.
Straight flush: choose the starting value (10 options: Ace through 10) and the suit, then subtract the 4 royal flushes.
Four of a kind: choose the value for the four cards, then choose any of the remaining 48 cards for the fifth.
Full house: choose the value for the three cards, pick 3 suits out of 4 for them, choose a different value for the pair, pick 2 suits out of 4 for the pair.

Don't confuse: a run (straight) cares about consecutive values but not suit; a flush cares about all cards being the same suit but not consecutive values; a straight flush requires both.

Investigation: Counting problems in the game of poker

3.8 Investigation: Counting problems in the game of poker

🧭 Overview

🧠 One-sentence thesis

Counting poker hands requires systematically choosing card values and suits while carefully subtracting overlapping cases to avoid double-counting hands that belong to higher-ranking categories.

📌 Key points (3–5)

What we're counting: the number of distinct five-card hands that satisfy specific poker hand definitions, using combinations and the multiplication principle.
Core method: choose values first, then suits, then subtract any hands that would belong to a better (already-counted) category.
Common confusion: order matters vs. order doesn't matter. When the last few cards of a hand are chosen one at a time, dividing by a factorial (such as 2!) accounts for the fact that drawing the same cards in a different order produces the same hand.
Why subtraction is necessary: some counting methods initially include hands that belong to higher-ranking categories (e.g., counting flushes initially includes straight flushes), so we must subtract those to get the correct count.
Key constraint: a standard deck has 52 cards in 4 suits, with 13 values per suit; each hand contains exactly 5 cards.

🃏 Deck and poker basics

🃏 Standard deck structure

A standard deck contains 52 cards sorted into four suits (♥ hearts, ♠ spades, ♦ diamonds, ♣ clubs), with 13 cards in each suit.

Each suit contains cards with values: 2, 3, 4, 5, 6, 7, 8, 9, 10, J (Jack), Q (Queen), K (King), A (Ace).
A hand is a set of five cards held by a player.
The excerpt emphasizes that 13 × 4 = 52 (quick check).

🔢 Runs and Aces

A run is a set of three or more cards with consecutive values where suit does not matter.

Example: 3♥, 4♠, 5♠ is a run of three cards; 9♥, 10♠, J♣, Q♣, K♦ is a run of five.
Special Ace rule: An Ace can be either the lowest or highest card in a run, but not in the middle.
- Ace high: Q♣, K♦, A♦
- Ace low: A♣, 2♥, 3♥
- Not a run: K♥, A♦, 2♣ (Ace cannot bridge King and 2)

🏆 High-ranking hands (rare)

👑 Royal flush

A run consisting of an Ace, King, Queen, Jack, and 10, all of the same suit.

Number of hands: (4 choose 1) = 4
Why: The five card values are fixed (A, K, Q, J, 10), so we only choose one of the four suits.

🌊 Straight flush

A run of five cards, all of the same suit (but not a royal flush).

Number of hands: (10 choose 1) × (4 choose 1) − 4 = 36
Why:
- Choose the starting card value: 10 ways (Ace through 10).
- Choose the suit: 4 ways.
- Subtract the 4 royal flushes (runs starting with 10).
Don't confuse: If we tried to start with a Jack, we wouldn't have enough cards (Aces end runs).

🎯 Four of a kind

Four cards of the same value and any other card.

Number of hands: (13 choose 1) × (48 choose 1) = 624
Why:
- Pick one value for the four cards: 13 ways.
- Choose the fifth card from the remaining 52 − 4 = 48 cards.

🏠 Full house

Three cards of one value and two cards of a second value.

Number of hands: (13 choose 1) × (4 choose 3) × (12 choose 1) × (4 choose 2) = 3,744
Why:
- Choose the value for the triple: 13 ways.
- Choose 3 suits for those 3 cards: (4 choose 3) ways.
- Choose the value for the pair from the remaining 12 values: 12 ways.
- Choose 2 suits for those 2 cards: (4 choose 2) ways.

💧 Flush and straight hands

💧 Flush

All five cards have the same suit, but the hand is not a run (not a straight flush or royal flush).

Number of hands: (4 choose 1) × (13 choose 5) − 40 = 5,108
Why:
- Choose one suit: 4 ways.
- Choose 5 values from the 13 possible: (13 choose 5) ways.
- Subtract 40 (the total number of straight flushes and royal flushes: 36 + 4).
Key idea: The initial count includes some straight/royal flushes, so we must subtract them.

📏 Straight

A run of five cards with at least two suits (not all the same suit).

Number of hands: (10 choose 1) × (4 choose 1)^5 − 40 = 10,200
Why:
- Pick the starting card value: 10 ways.
- Pick a suit for each of the 5 cards: 4^5 ways.
- Subtract 40 (straight flushes and royal flushes, which are all the same suit).

🎲 Mid-ranking hands (pairs and triples)

🎲 Three of a kind

Exactly three cards of the same value and two other cards with distinct values (not a full house).

Number of hands: (13 choose 1) × (4 choose 3) × [(48 choose 1) × (44 choose 1)] / 2! = 54,912
Why:
- Choose the value and three suits for the triple: (13 choose 1) × (4 choose 3).
- Choose the first remaining card from 48 cards (to avoid four of a kind).
- Choose the second remaining card from 44 cards (to avoid a full house).
- Divide by 2! because the order in which we choose the last two cards doesn't matter (e.g., 4♥ then 9♥ is the same hand as 9♥ then 4♥).

👥 Two pairs

Two pairs of cards (each pair has the same value) and a third card of a distinct value.

Number of hands: (13 choose 2) × (4 choose 2)^2 × (11 choose 1) × (4 choose 1) = 123,552
Why:
- Choose two values for the pairs: (13 choose 2).
- Once chosen (e.g., King and 10), one is larger; choose 2 suits for the larger-valued pair: (4 choose 2).
- Choose 2 suits for the smaller-valued pair: (4 choose 2).
- Choose the value for the last card from the remaining 11 values (to avoid three of a kind): 11 ways.
- Choose the suit of the last card: 4 ways.

🎴 Pair

Exactly two cards of the same value (different suits), and all other cards have different values.

Number of hands: (13 choose 1) × (4 choose 2) × [(48 choose 1) × (44 choose 1) × (40 choose 1)] / 3! = 1,098,240
Why:
- Choose the value and 2 suits for the pair: (13 choose 1) × (4 choose 2).
- Choose the first remaining card from 48 cards (to avoid three/four of a kind).
- Choose the second from 44 cards (to avoid two pairs).
- Choose the third from 40 cards (again to avoid two pairs).
- Divide by 3! because the order of drawing the last three cards doesn't matter.
Don't confuse: If the other three cards included another pair, it would be "two pairs"; if they matched the first pair's value, it would be three or four of a kind.

🃏 High card and subtraction principle

🃏 High card

A hand that does not fit any of the above categories.

Number of hands: 1,302,540
Combinatorial proof: Left as an exercise in the excerpt.
Method hint (from exercises): Define set A = hands with at least two cards of the same number, B = hands where all cards are the same suit, C = all straights. Then high cards = complement of A ∪ B ∪ C. Use the Inclusion-Exclusion Principle.
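The method hint can be sketched numerically. This is a minimal Python sketch, not the excerpt's own solution; it assumes Python 3.8+ for `math.comb`, and the variable names (`size_a`, `b_and_c`, and so on) are illustrative choices. The overlap sizes are filled in from the definitions of A, B, and C: a flush or a straight always has five distinct values, so only B ∩ C is non-empty.

```python
from math import comb

TOTAL = comb(52, 5)  # all five-card hands

# A: at least two cards share a value = all hands minus hands with five distinct values.
size_a = TOTAL - comb(13, 5) * 4**5
# B: all five cards share a suit (this includes straight and royal flushes).
size_b = 4 * comb(13, 5)
# C: five consecutive values in any suits (this includes straight and royal flushes).
size_c = 10 * 4**5

# Overlaps: hands in B or C have five distinct values, so they never meet A.
a_and_b = 0
a_and_c = 0
b_and_c = 40  # 36 straight flushes + 4 royal flushes
a_and_b_and_c = 0

# Inclusion-Exclusion for |A ∪ B ∪ C|, then take the complement.
union = (size_a + size_b + size_c
         - a_and_b - a_and_c - b_and_c
         + a_and_b_and_c)
high_card = TOTAL - union
print(high_card)  # 1302540
```

The result agrees with the count of 1,302,540 stated for high-card hands.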

➖ Why subtraction is crucial

Many counting methods initially produce a superset that includes higher-ranking hands.
Example: Counting flushes by choosing a suit and 5 values includes straight flushes, so we subtract 40.
Example: Counting straights by choosing a starting value and suits for each card includes straight/royal flushes, so we subtract 40.
Common pattern: Count broadly, then subtract the overlapping "better" hands already counted elsewhere.

🔄 Order doesn't matter: dividing by factorials

🔄 When and why to divide

Key principle: If we choose multiple cards one at a time, but the order of selection doesn't affect the final hand, we must divide by the number of orderings (factorial).
Three of a kind: We choose the last two cards sequentially (48, then 44), but drawing them in either order gives the same hand → divide by 2!.
Pair: We choose the last three cards sequentially (48, 44, 40), but any ordering of these three gives the same hand → divide by 3!.
Don't confuse: We do not divide when the order of selection corresponds to a meaningful distinction (e.g., choosing the larger vs. smaller pair in "two pairs").

🔄 Example: Three of a kind

Without dividing: choosing 4♥ first and 9♥ second is counted separately from choosing 9♥ first and 4♥ second.
With dividing by 2!: these two sequences are recognized as the same hand, so we divide the count by 2.
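As a sanity check on the division step, the same counts can be reached without ordering the leftover cards at all. The sketch below is an illustration under stated assumptions (Python 3.8+, hypothetical variable names), not the excerpt's method: it compares "ordered picks divided by a factorial" against "choose the leftover values as a set, then a suit for each".

```python
from math import comb, factorial

# Three of a kind: value and suits of the triple.
triple = comb(13, 1) * comb(4, 3)
# Ordered picks of the last two cards (48, then 44), divided by 2! orderings.
three_by_division = triple * (48 * 44) // factorial(2)
# Unordered alternative: 2 of the other 12 values, then one of 4 suits for each.
three_by_selection = triple * comb(12, 2) * 4**2

# Pair: value and suits of the pair.
pair = comb(13, 1) * comb(4, 2)
# Ordered picks of the last three cards (48, 44, 40), divided by 3! orderings.
pair_by_division = pair * (48 * 44 * 40) // factorial(3)
# Unordered alternative: 3 of the other 12 values, then one of 4 suits for each.
pair_by_selection = pair * comb(12, 3) * 4**3

print(three_by_division, three_by_selection)  # 54912 54912
print(pair_by_division, pair_by_selection)    # 1098240 1098240
```

Both routes agree, which is what dividing by the number of orderings is meant to guarantee.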

📋 Summary table of hand counts

Hand	Definition	Count	Key subtraction or division
Royal flush	A, K, Q, J, 10 of same suit	4	None (values fixed)
Straight flush	Run of 5, same suit (not royal)	36	Subtract 4 royal flushes
Four of a kind	Four same value + any card	624	None
Full house	Triple + pair	3,744	None
Flush	Same suit, not a run	5,108	Subtract 40 (straight/royal flushes)
Straight	Run of 5, ≥2 suits	10,200	Subtract 40 (straight/royal flushes)
Three of a kind	Triple + 2 distinct others	54,912	Divide by 2! (order of last 2 cards)
Two pairs	Two pairs + 1 other	123,552	None (larger/smaller pair distinguished)
Pair	One pair + 3 distinct others	1,098,240	Divide by 3! (order of last 3 cards)
High card	None of the above	1,302,540	Use Inclusion-Exclusion (exercise)
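A quick consistency check on the table: every five-card hand falls in exactly one category, so the ten counts should add up to (52 choose 5). The following Python sketch (assuming Python 3.8+ for `math.comb`; the dictionary name `counts` is an illustrative choice) recomputes each row from the formula given in the notes, except the high-card row, which is copied from the table.

```python
from math import comb

# Each entry mirrors the formula given for that hand in the notes.
counts = {
    "royal flush": comb(4, 1),
    "straight flush": comb(10, 1) * comb(4, 1) - 4,
    "four of a kind": comb(13, 1) * comb(48, 1),
    "full house": comb(13, 1) * comb(4, 3) * comb(12, 1) * comb(4, 2),
    "flush": comb(4, 1) * comb(13, 5) - 40,
    "straight": comb(10, 1) * comb(4, 1) ** 5 - 40,
    "three of a kind": comb(13, 1) * comb(4, 3) * (48 * 44) // 2,
    "two pairs": comb(13, 2) * comb(4, 2) ** 2 * comb(11, 1) * comb(4, 1),
    "pair": comb(13, 1) * comb(4, 2) * (48 * 44 * 40) // 6,
    "high card": 1_302_540,  # copied from the table, not derived here
}

total = sum(counts.values())
print(total, comb(52, 5))  # 2598960 2598960
```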

Pascal's Triangle

4.1 Pascal’s triangle

🧭 Overview

🧠 One-sentence thesis

Pascal's triangle provides a fast recursive method to compute binomial coefficients, where each entry equals the sum of the two entries above it, and these coefficients appear as the multipliers in binomial expansions.

📌 Key points (3–5)

What Pascal's triangle is: an infinite triangular chart where the entry in the n-th row and k-th column is the binomial coefficient "n choose k."
How to compute entries quickly: edges are always 1; interior entries equal the sum of the two entries directly above (left and right).
Physical interpretation: each entry counts the number of distinct paths a ball can take to reach that position when falling through the grid (moving down-left or down-right at each step).
Common confusion: the triangle can be used both as a lookup table for binomial coefficients and as a path-counting tool—these are the same numbers but viewed differently.
Why it matters: Pascal's triangle gives the coefficients in the Binomial Theorem, connecting combinatorics to algebra.

🔢 Structure and notation

🔢 How the triangle is organized

The triangle is arranged in rows numbered n = 0, 1, 2, 3, ...
Within each row, entries are indexed by k = 0, 1, 2, ..., n (diagonal columns).
The entry in row n and column k is the binomial coefficient "n choose k," written as (n k).
Example: Row n = 4 contains the entries (4 0), (4 1), (4 2), (4 3), (4 4).

📐 Visual layout

The first few rows look like this:

n = 0: 1
n = 1: 1 1
n = 2: 1 2 1
n = 3: 1 3 3 1
n = 4: 1 4 6 4 1
n = 5: 1 5 10 10 5 1
n = 6: 1 6 15 20 15 6 1

The triangle is symmetric: the left and right edges are both 1, and entries mirror across the center.

🧮 Pascal's Recurrence

🧮 The recursive rule (Theorem 4.1.1)

Pascal's Recurrence: For any natural number n, (a) the edge entries satisfy (n 0) = 1 and (n n) = 1, and (b) if 0 ≤ k < n, then (n k) + (n k+1) = (n+1 k+1).

Part (a) says all entries on the left and right edges of the triangle are 1.
Part (b) says any interior entry is the sum of the two entries immediately above it (one to the left, one to the right).
Example: To find the middle entry in row n = 2, compute 2 = 1 + 1. To find the third entry (k = 2) in row n = 6, compute 15 = 5 + 10.
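The recurrence translates directly into a row-by-row construction. A minimal Python sketch, where the function name `pascal_rows` is a hypothetical helper and `count` is assumed to be at least 1:

```python
def pascal_rows(count):
    """Return rows n = 0 .. count - 1 of Pascal's triangle (count >= 1)."""
    rows = [[1]]  # row n = 0
    for _ in range(count - 1):
        prev = rows[-1]
        # Part (b): each interior entry is the sum of the two entries above it.
        interior = [prev[k] + prev[k + 1] for k in range(len(prev) - 1)]
        # Part (a): both edge entries are 1.
        rows.append([1] + interior + [1])
    return rows

for row in pascal_rows(7):
    print(row)
# last line printed: [1, 6, 15, 20, 15, 6, 1]
```

Only additions are used, so no factorials need to be computed.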

🔍 Why this works (algebraic proof)

The excerpt proves part (b) by manipulating factorials:

Start with the formula for binomial coefficients: (n k) = n! / (k!(n - k)!).
Add (n k) + (n k+1) by finding a common denominator.
Multiply the first term by (k+1)/(k+1) and the second by (n - k)/(n - k).
Combine to get n!(k + 1 + n - k) / ((k+1)!(n - k)!) = n!(n+1) / ((k+1)!(n - k)!).
Simplify to (n+1)! / ((k+1)!(n+1 - (k+1))!) = (n+1 k+1).

Don't confuse: This is an algebraic proof; the excerpt also mentions a "more elegant combinatorial proof" in a later section (Example 5.3.3).

🎯 Path-counting interpretation

🎯 The falling ball model (Lemma 4.1.2)

Lemma 4.1.2: The entry (n k) counts the number of possible ways a ball could have traveled to that location in the grid.

Imagine a ball starting at the top of the triangle.
At each step, the ball falls down one row, moving either left (L) or right (R).
To reach row n, the ball makes n moves.
To reach the entry (n k), the ball must move right exactly k times and left n - k times.
The number of such routes is (n k), because you choose which k of the n steps are "right."

🛤️ Examples of routes

Example 4.1.3: To reach (4 0), there is only one route: LLLL. To reach (4 1), there are four routes: LLLR, LLRL, LRLL, RLLL.
Example 4.1.4 (grid walking): Starting at a corner, walk two blocks north and three blocks west to reach a destination. After rotating Pascal's triangle, the number of such paths is 10. The paths can be written as sequences like NNWWW, NWNWW, etc.

🧩 Why recurrence makes sense for paths

Part (a): There is only one way to reach the outer edges—always move left or always move right.
Part (b): To reach an interior spot, the ball must have come from one of two positions in the row above (either from the left or from the right). The total number of paths is the sum of paths to those two positions.

🔗 Connection to the Binomial Theorem

🔗 Binomial coefficients in algebra (Theorem 4.2.2)

Binomial Theorem: The expansion of (x + y) raised to the power n is (x + y)^n = (n 0)x^n + (n 1)x^(n-1)y^1 + ... + (n k)x^(n-k)y^k + ... + (n n)y^n.

The coefficients in the expansion are exactly the entries in row n of Pascal's triangle.
Example expansions:
- (x + y)^0 = 1
- (x + y)^1 = x + y
- (x + y)^2 = x^2 + 2xy + y^2
- (x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3
- (x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4

🧪 Why the coefficients match (proof sketch)

Expanding (x + y)^n means multiplying (x + y) by itself n times.
At each of the n factors, you choose either x or y.
To get the term x^(n-k) y^k, you must choose y exactly k times and x the remaining n - k times.
There are (n k) ways to choose which k factors contribute y.
This is the "giant foil method" generalization mentioned in Remark 4.2.1.

Don't confuse: The binomial coefficients are the same numbers whether you compute them via factorials, Pascal's recurrence, path counting, or algebraic expansion—they are all equivalent views of the same combinatorial object.

🧪 Practical uses and exercises

🧪 Computing rows

To find row n = 7 or n = 8, start from row n = 6 and apply Pascal's recurrence repeatedly.
Each new row has one more entry than the previous row.

🧪 Grid path problems

General grid paths: The number of paths of length m + n from the lower left to the upper right corner of an m × n grid (moving only along edges, not backtracking) is a binomial coefficient.
Example: Walking four blocks north and two blocks west (without going south or east) corresponds to choosing which 2 of the 6 total steps are "west."

🧪 Probability and symmetry

If a ball falls with equal probability of going left or right at each step, the probability of landing at a specific entry depends on the number of paths to that entry divided by the total number of paths (which is 2^n for row n).
Example (Exercise 9): Compute the probability the ball lands at the 3rd spot in the 5th row.
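Parity observation (Exercise 10): In row 10, the ball has a 50% probability of landing where k is even. Symmetry alone does not show this; it follows because the even-k entries and the odd-k entries of any row n ≥ 1 have equal sums (the alternating row sum is zero), and all 2^n paths are equally likely.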
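The row-10 probability can be computed exactly. A small Python sketch, assuming Python 3.8+ and using a hypothetical helper name `prob_even_k`; exact fractions avoid floating-point rounding.

```python
from fractions import Fraction
from math import comb

def prob_even_k(n):
    """Probability that a fair falling ball lands at an even k in row n."""
    even_paths = sum(comb(n, k) for k in range(0, n + 1, 2))
    return Fraction(even_paths, 2**n)  # paths to even k / all 2^n paths

print(prob_even_k(10))  # 1/2
```

The same value comes out for every row n ≥ 1; row 0 is the exception, since its only entry has k = 0.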

🧪 Divisibility property (Exercise 6)

If p is a prime number and 1 ≤ k ≤ p - 1, then (p k) is divisible by p (i.e., (p k) ≡ 0 mod p).
This follows from the factorial formula and properties of primes.
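The divisibility property is easy to spot-check. The sketch below is only a numerical illustration for a few small values, not a proof; `interior_divisible_by` is a hypothetical helper name and Python 3.8+ is assumed.

```python
from math import comb

def interior_divisible_by(p):
    """True if every interior entry (p choose k), 1 <= k <= p - 1, is divisible by p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))

print([p for p in (2, 3, 5, 7, 11, 13) if interior_divisible_by(p)])
# [2, 3, 5, 7, 11, 13]
print(interior_divisible_by(4), interior_divisible_by(6))
# False False  (for example, (4 choose 2) = 6 is not divisible by 4)
```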

The Binomial Theorem

4.2 The Binomial Theorem

🧭 Overview

🧠 One-sentence thesis

The Binomial Theorem states that the coefficients in the expansion of (x + y) to the power n are precisely the binomial coefficients from Pascal's triangle, which can be explained by counting how many ways to choose variables from each factor.

📌 Key points (3–5)

The pattern: When you expand (x + y) to the power n, the coefficients are the numbers in row n of Pascal's triangle—this is why they are called "binomial coefficients."
The formula: The expansion is the sum of terms (n choose k) times x to the (n minus k) times y to the k, for k from 0 to n.
Why it works: To get a term like x to the (n minus k) times y to the k, you must choose y from exactly k of the n factors (x + y), and there are (n choose k) ways to do that.
Common confusion: The "foil" method for (x + y) squared is a special case; the Binomial Theorem is the "giant foil method" for any power.
Practical use: By reading a row from Pascal's triangle, you can immediately write down the expansion of any binomial power without multiplying everything out.

🔢 The pattern in powers of binomials

🔢 Observing the coefficients

The excerpt shows the first few powers of (x + y):

(x + y) to the 0 equals 1
(x + y) to the 1 equals x + y
(x + y) to the 2 equals x squared + 2xy + y squared
(x + y) to the 3 equals x cubed + 3 x squared y + 3 x y squared + y cubed
(x + y) to the 4 equals x to the 4 + 4 x cubed y + 6 x squared y squared + 4 x y cubed + y to the 4

When you ignore the x's and y's and look only at the coefficients (1; 1, 1; 1, 2, 1; 1, 3, 3, 1; 1, 4, 6, 4, 1), they are exactly the rows of Pascal's triangle.

📛 Why "binomial coefficients"

The numbers (n choose k) are called binomial coefficients because they appear as the coefficients in the expansion of a binomial (x + y) to the power n.

This naming convention directly reflects the pattern observed above.

🧮 The Binomial Theorem statement and proof

🧮 The theorem

Binomial Theorem: Let x and y be variables. Then the expansion of (x + y) to the power n is: (x + y) to the n equals (n choose 0) x to the n plus (n choose 1) x to the (n minus 1) y to the 1 plus ... plus (n choose k) x to the (n minus k) y to the k plus ... plus (n choose n minus 1) x to the 1 y to the (n minus 1) plus (n choose n) y to the n.

The excerpt attributes the discovery of this theorem to the Persian mathematician and engineer Al-Karaji (approximately 953–1029).
Each term has the form (n choose k) times x to the (n minus k) times y to the k.
The powers of x decrease from n to 0, while the powers of y increase from 0 to n; the sum of the exponents in each term is always n.

🔍 Why the theorem works (the "giant foil method")

The proof explains the counting argument:

Start with the product: (x + y) to the n means multiplying (x + y) by itself n times.
Expand by choosing: To expand, you take one entry (either x or y) from each of the n factors, multiply them together, and add all possible products.
Count the ways to get a specific term: To produce the monomial x to the (n minus k) times y to the k, you must choose y from exactly k of the n factors and x from the remaining (n minus k) factors.
Apply the binomial coefficient: There are (n choose k) ways to choose which k factors contribute y (and which n minus k contribute x), so the coefficient of x to the (n minus k) times y to the k is (n choose k).

Example: In (x + y) to the 10, the coefficient in front of x to the 8 times y squared is (10 choose 2) equals 45, because you need to choose y from 2 of the 10 factors (and x from the remaining 8).
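The "giant foil" argument can be mimicked by multiplying out one factor at a time. In this minimal Python sketch (the function name `expand_power` is an illustrative assumption; Python 3.8+ is assumed for `math.comb`), a list stores the coefficients of (x + y)^n indexed by the power of y.

```python
from math import comb

def expand_power(n):
    """Coefficients of (x + y)**n, indexed by the power k of y."""
    coeffs = [1]  # (x + y)**0 = 1
    for _ in range(n):
        nxt = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c      # pick x from the new factor: power of y unchanged
            nxt[k + 1] += c  # pick y from the new factor: power of y goes up by 1
        coeffs = nxt
    return coeffs

row10 = expand_power(10)
print(row10[2], comb(10, 2))  # 45 45
```

Each pass through the outer loop is one application of Pascal's recurrence, which is why the output matches row n of the triangle.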

🧩 Connection to "foil"

The excerpt notes that some students know the "foil" (first, outer, inner, last) method for (x + y) squared:

(x + y) squared equals x times x plus y times x plus x times y plus y times y.

The Binomial Theorem generalizes this into a "giant foil method" for any power n, systematically counting all the ways to pick x or y from each factor.

🛠️ Using the theorem with Pascal's triangle

🛠️ Quick expansion by reading rows

By combining the Binomial Theorem with Pascal's triangle, you can expand powers without multiplying:

Example 1: Row n equals 6 of Pascal's triangle is 1, 6, 15, 20, 15, 6, 1, so:

(x + y) to the 6 equals x to the 6 plus 6 x to the 5 y plus 15 x to the 4 y squared plus 20 x cubed y cubed plus 15 x squared y to the 4 plus 6 x y to the 5 plus y to the 6.

Example 2: Row n equals 3 is 1, 3, 3, 1, so:

(a squared + 5b) to the 3 equals (a squared) cubed plus 3 times (a squared) squared times (5b) plus 3 times (a squared) times (5b) squared plus (5b) cubed
Simplifying: a to the 6 plus 15 a to the 4 b plus 75 a squared b squared plus 125 b cubed.
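The numeric coefficients in Example 2 can be reproduced mechanically. This is a sketch under stated assumptions: `binomial_terms` is a hypothetical helper, and the expression is treated as (1·u + 5·v)^3 with u = a squared and v = b.

```python
from math import comb

def binomial_terms(n, cx, cy):
    """Coefficients of (cx*u + cy*v)**n, indexed by the power k of v."""
    return [comb(n, k) * cx ** (n - k) * cy ** k for k in range(n + 1)]

# (a^2 + 5b)^3 with u = a^2, v = b
print(binomial_terms(3, 1, 5))  # [1, 15, 75, 125]
```

The list entries are the coefficients of a to the 6, a to the 4 times b, a squared times b squared, and b cubed, in that order.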

🎯 Finding a single coefficient

You don't need to expand the entire expression to find one coefficient—just use the appropriate binomial coefficient.

Example: What is the coefficient of x to the 5 times y cubed in (x + y) to the 8?

The term x to the 5 times y cubed corresponds to k equals 3 (since y appears 3 times).
The coefficient is (8 choose 3) equals 56.

Don't confuse: here k is the exponent on y, not the exponent on x. By symmetry, (8 choose 5) gives the same value, but reading k from y matches the indexing in the theorem.

📐 Additional patterns in Pascal's triangle

📐 Symmetry

The excerpt mentions that (n choose k) equals (n choose n minus k), which is illustrated by the reflective symmetry about the vertical axis in Pascal's triangle.

Row	Entries	Symmetry
n = 4	1, 4, 6, 4, 1	Reads the same forwards and backwards
n = 5	1, 5, 10, 10, 5, 1	Reads the same forwards and backwards

📐 Row sums are powers of 2

The sum of all entries in row n equals 2 to the n.

Example: Row n equals 4 is 1, 4, 6, 4, 1, and 1 + 4 + 6 + 4 + 1 = 16 = 2 to the 4.

First identities in Pascal's triangle

4.3 First identities in Pascal’s triangle

🧭 Overview

🧠 One-sentence thesis

Pascal's triangle exhibits three fundamental patterns—symmetry, row sums equaling powers of 2, and alternating row sums equaling zero—all of which can be proven using the Binomial Theorem and interpreted through a falling-ball probability model.

📌 Key points (3–5)

Symmetry pattern: The triangle is symmetric about its vertical axis, reflecting the identity that binomial coefficient (n choose k) equals (n choose n minus k).
Row sums pattern: The sum of all entries in row n equals 2 to the power n, proven by substituting x equals 1 and y equals 1 into the Binomial Theorem.
Alternating sums pattern: For positive n, the alternating sum (adding and subtracting consecutive entries) in row n equals zero, proven by substituting x equals 1 and y equals negative 1 into the Binomial Theorem.
Common confusion: The falling-ball probability interpretation connects combinatorial counting to probability—each binomial coefficient (n choose k) divided by 2 to the power n gives the probability of reaching that location, not just a count.
Why it matters: These patterns provide multiple ways to understand binomial coefficients—geometric (symmetry), algebraic (Binomial Theorem), and probabilistic (falling ball).

🔄 The symmetry pattern

🪞 What symmetry means

Symmetry in Pascal's triangle: The triangle is reflectively symmetric about its vertical axis.

This reflects the identity: (n choose k) equals (n choose n minus k).
The excerpt references Corollary 3.4.5 as the source of this identity.
Example: In row n equals 4, the entry (4 choose 1) equals 4 and (4 choose 3) equals 4; both are equidistant from the center.

🔍 Visual interpretation

The vertical axis passes through the center entries: 1, 2, 6, 20, and so on.
Each entry at position k from the left mirrors the entry at position n minus k from the left.
Don't confuse: Symmetry is about position within a row, not about comparing different rows.

➕ Row sums equal powers of 2

📐 The pattern statement

Row sum pattern: The sum of all entries in row n of Pascal's triangle equals 2 to the power n.

In binomial coefficient notation: (n choose 0) plus (n choose 1) plus (n choose 2) plus ... plus (n choose n) equals 2 to the power n.
Example: For n equals 4, the sum is 1 plus 4 plus 6 plus 4 plus 1 equals 16 equals 2 to the power 4.

🧮 Proof using the Binomial Theorem

The Binomial Theorem states: (x plus y) to the power n equals the sum over k from 0 to n of (n choose k) times x to the power (n minus k) times y to the power k.
Set x equals 1 and y equals 1: (1 plus 1) to the power n equals the sum of (n choose k) times 1 to the power (n minus k) times 1 to the power k.
Simplify: 2 to the power n equals the sum of all (n choose k) from k equals 0 to n.
This proves Proposition 4.3.1.

🎲 Falling-ball probability interpretation

Imagine a ball falling through Pascal's triangle, with equal probability (1/2) of going left or right at each step.
After n steps, the ball has taken one of 2 to the power n possible routes, each with probability 1 divided by 2 to the power n.
The number of routes passing through location (n choose k) is exactly (n choose k).
Therefore, the probability of passing through (n choose k) is (n choose k) divided by 2 to the power n (Lemma 4.3.2).

🔗 Connecting probability to row sums

The ball must land somewhere in row n after n steps, so the sum of all probabilities equals 1.
Multiply each probability by 2 to the power n: the sum of all (n choose k) equals 2 to the power n.
This reproduces the row sum pattern (Remark 4.3.3).
Example: For n equals 4, there is 1 path to (4 choose 0) with probability 1/16; there are 4 paths to (4 choose 1) with probability 4/16 equals 1/4 (Example 4.3.4).

➖ Alternating sums equal zero

📐 The alternating sum pattern

Alternating sum pattern: For positive n, the alternating sum (n choose 0) minus (n choose 1) plus (n choose 2) minus (n choose 3) plus ... equals 0.

The sign alternates between plus and minus.
The last term is positive if n is even, negative if n is odd.
Example: For n equals 4, the alternating sum is 1 minus 4 plus 6 minus 4 plus 1 equals 0.

🧮 Proof using the Binomial Theorem

Set x equals 1 and y equals negative 1 in the Binomial Theorem: (1 plus (negative 1)) to the power n equals the sum of (n choose k) times 1 to the power (n minus k) times (negative 1) to the power k.
Simplify: 0 to the power n equals (n choose 0) minus (n choose 1) plus (n choose 2) minus (n choose 3) plus ... (with alternating signs).
For positive n, 0 to the power n equals 0, proving Proposition 4.3.5.
The excerpt notes that another proof will be given in Proposition 5.4.8.
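Both substitutions can be checked numerically for small rows. A minimal Python sketch, assuming Python 3.8+; the helper names `row`, `row_sum`, and `alternating_sum` are illustrative.

```python
from math import comb

def row(n):
    """Row n of Pascal's triangle."""
    return [comb(n, k) for k in range(n + 1)]

def row_sum(n):
    """Binomial Theorem with x = 1, y = 1."""
    return sum(row(n))

def alternating_sum(n):
    """Binomial Theorem with x = 1, y = -1."""
    return sum((-1) ** k * c for k, c in enumerate(row(n)))

print(row_sum(4), 2**4)     # 16 16
print(alternating_sum(4))   # 0
print(alternating_sum(0))   # 1, which is why the pattern requires positive n
```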

📊 Summary of the three patterns

Pattern	Formula	Proof method	Interpretation
Symmetry	(n choose k) equals (n choose n minus k)	Corollary 3.4.5	Reflective symmetry about vertical axis
Row sums	Sum of (n choose k) for k from 0 to n equals 2 to the power n	Binomial Theorem with x equals 1, y equals 1	Total probability equals 1; ball must land somewhere
Alternating sums	Alternating sum of (n choose k) equals 0 (for positive n)	Binomial Theorem with x equals 1, y equals negative 1	Cancellation of positive and negative contributions

🎯 How to distinguish the patterns

Row sums vs alternating sums: Row sums add all entries with the same sign; alternating sums change the sign at each step.
Symmetry vs sums: Symmetry compares two entries within the same row; sums combine all entries in a row into a single number.
Binomial Theorem substitutions: Different substitutions (x equals 1, y equals 1 versus x equals 1, y equals negative 1) yield different patterns.

Additional identities in Pascal's triangle

4.4 Additional identities in Pascal’s triangle

🧭 Overview

🧠 One-sentence thesis

Beyond basic patterns, Pascal's triangle contains deeper identities such as the sum-of-squares formula and the hockey stick identity, which reveal hidden relationships between binomial coefficients.

📌 Key points (3–5)

Sum of squares pattern: the sum of the squares of all entries in row n equals the middle entry of row 2n.
Hockey stick identity: a diagonal sum of binomial coefficients along a "hockey stick" shape equals a single binomial coefficient at the end.
Two versions of hockey stick: version 1 starts on the left edge at (n choose 0) and runs down a diagonal; version 2 starts on the right edge at (n choose n) and stays in column k = n; they are related by symmetry.
Common confusion: the hockey stick identity involves entries from multiple rows, not just one row, and the sum traces a diagonal path.
Why it matters: these identities connect different parts of Pascal's triangle and will be proved later using combinatorial arguments.

🔢 Sum of squares identity

🔢 What the pattern says

Proposition 4.4.1 (Sum of squares): If n is a natural number, then (n choose 0)² + (n choose 1)² + ... + (n choose n)² = (2n choose n).

This identity states that if you square every entry in row n and add them up, the result is the middle entry of row 2n.
Example: for n = 2, we have 1² + 2² + 1² = 6, which is the middle entry of row 4.

📊 Numerical examples

The excerpt lists:

Row n	Sum of squares	Result	Where it appears
0	1²	1	Middle of row 0
1	1² + 1²	2	Middle of row 2
2	1² + 2² + 1²	6	Middle of row 4
3	1² + 3² + 3² + 1²	20	Middle of row 6
4	1² + 4² + 6² + 4² + 1²	70	Middle of row 8

The sequence 1, 2, 6, 20, 70, ... appears as the middle entries in even-numbered rows.
Don't confuse: this is not about adding entries in one row; it's about squaring each entry first, then summing.
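The table can be regenerated with a short check. This sketch only verifies the identity for small n and is not a proof; `sum_of_squares` is a hypothetical helper name and Python 3.8+ is assumed.

```python
from math import comb

def sum_of_squares(n):
    """Sum of the squared entries of row n."""
    return sum(comb(n, k) ** 2 for k in range(n + 1))

print([sum_of_squares(n) for n in range(5)])  # [1, 2, 6, 20, 70]
print([comb(2 * n, n) for n in range(5)])     # [1, 2, 6, 20, 70]
```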

🔮 Proof deferred

The excerpt states that Proposition 4.4.1 will be proved later in Proposition 5.3.4.
The proof is not given here; the section focuses on recognizing the pattern.

🏒 Hockey stick identity

🏒 What the pattern says

Proposition 4.4.2 (Hockey stick identity): Suppose n is a positive integer and m is a non-negative integer.

Version 1: (n choose 0) + (n+1 choose 1) + (n+2 choose 2) + ... + (n+m choose m) = (n+m+1 choose m).

Version 2: (n choose n) + (n+1 choose n) + (n+2 choose n) + ... + (n+m choose n) = (n+m+1 choose n+1).

The name comes from the visual shape: the binomial coefficients involved trace a diagonal path resembling a hockey stick in Pascal's triangle.
Version 2 is obtained from version 1 by applying the symmetry property to each binomial coefficient.

🎯 How to read the identity

Version 1: start at (n choose 0), move diagonally down-right, adding entries until you reach (n+m choose m); the sum equals the entry in the next row with the same k, (n+m+1 choose m), which sits just below and to the left of the last term when the triangle is drawn centered.
Version 2: start at (n choose n), move straight down in the same column, adding entries; the sum equals an entry one row below and one column to the right.
Don't confuse: the sum involves entries from different rows, not a single row.

🖼️ Visual example

The excerpt illustrates version 1 with n = 2 and m = 4:

Start at row 2, position 0: entry 1 (which is (2 choose 0)).
Move diagonally: row 3, position 1: entry 3 (which is (3 choose 1)).
Continue: row 4, position 2: entry 6; row 5, position 3: entry 10; row 6, position 4: entry 15.
Sum: 1 + 3 + 6 + 10 + 15 = 35, which equals (7 choose 4).

The excerpt shows this path circled in Pascal's triangle:

n = 0:   1
n = 1:   1   1
n = 2:   1   2   1       ← start here (1)
n = 3:   1   3   3   1   ← next (3)
n = 4:   1   4   6   4   1   ← next (6)
n = 5:   1   5   10  10  5   1   ← next (10)
n = 6:   1   6   15  20  15  6   1   ← next (15)
n = 7:   1   7   21  35  35  21  7   1   ← result (35)
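Both versions of the identity can be checked for the n = 2, m = 4 example. A minimal sketch, with hypothetical function names and Python 3.8+ assumed:

```python
from math import comb

def hockey_stick_v1(n, m):
    """(n choose 0) + (n+1 choose 1) + ... + (n+m choose m)."""
    return sum(comb(n + i, i) for i in range(m + 1))

def hockey_stick_v2(n, m):
    """(n choose n) + (n+1 choose n) + ... + (n+m choose n)."""
    return sum(comb(n + i, n) for i in range(m + 1))

print(hockey_stick_v1(2, 4), comb(7, 4))  # 35 35
print(hockey_stick_v2(2, 4), comb(7, 3))  # 35 35
```

Version 2 adds the same numbers as version 1 (1, 3, 6, 10, 15), read from the mirrored positions in each row.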

🔮 Proof deferred

The excerpt states that Proposition 4.4.2 will be proved later in Proposition 5.2.10.
One exercise hints at a combinatorial proof: separate the 35 paths ending at (7 choose 4) into sets of sizes 15, 10, 6, 3, 1 using Lemma 4.1.2 (paths interpretation).

🧪 Exercises and applications

🧪 Drawing and verifying

Exercise 1 asks to draw version 2 of the hockey stick identity for n = 2, m = 4 on Pascal's triangle.
Exercise 2 asks to explain version 1 using path-counting (Lemma 4.1.2): the 35 paths to (7 choose 4) can be grouped by where they pass through earlier entries.

📈 Chi-square distribution application

Exercise 3 introduces the chi-square distribution: Q_n = x₀² + ... + xₙ² for the entries in row n.
Part (a): use Proposition 4.4.1 to compute Q_n for n = 1, 2, 3, 4, 5 (these are the middle entries of rows 2, 4, 6, 8, 10).
Part (b) and (c): ask what happens to Q_n as n grows, and what the limit is as n approaches infinity.
Don't confuse: this is an application of the sum-of-squares identity to probability theory, not a new identity.

Counting anagrams with multinomial coefficients

4.5 Counting anagrams with multinomial coefficients

🧭 Overview

🧠 One-sentence thesis

Multinomial coefficients count the number of distinct anagrams of a word by dividing the total permutations by the factorial product of each repeated letter's count.

📌 Key points (3–5)

What anagrams are: rearrangements of letters in a word where some letters may repeat (different from permutations where all elements are distinct).
Why simple permutation counting fails: when letters repeat, the standard n! formula overcounts because swapping identical letters produces the same anagram.
The multinomial coefficient formula: for a word with n letters of m different types (with k₁, k₂, ..., kₘ occurrences), the number of anagrams is n! divided by (k₁! · k₂! · ... · kₘ!).
Common confusion: permutations vs anagrams—permutations treat every position as distinct; anagrams recognize that identical letters are interchangeable, so fewer distinct arrangements exist.
Special case: when m = 2 (only two letter types), the multinomial coefficient reduces to the familiar binomial coefficient.

🔤 What makes anagrams different from permutations

🔤 Anagrams defined

An anagram is a rearrangement of the letters in a word.

Unlike permutations, anagrams account for repeated letters.
When letters repeat, many permutations represent the same anagram.
Example: the word ZOO has two O's; if we label them O₁ and O₂, we get 3! = 6 permutations (ZO₁O₂, ZO₂O₁, O₁ZO₂, O₂ZO₁, O₁O₂Z, O₂O₁Z), but only 3 distinct anagrams (ZOO, OZO, OOZ) because swapping the two O's doesn't create a new anagram.

🔄 Why repetition causes overcounting

Standard permutation counting assumes all elements are distinct.
When k identical letters exist, each anagram appears k! times in the full permutation list.
Example: SASSY has 5 letters, so 5! = 120 permutations, but three S's can be rearranged in 3! = 6 ways without changing the anagram, so the actual number of anagrams is 5! / 3! = 20.
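The overcounting can be seen directly by generating every permutation and discarding repeats. This brute-force sketch is only practical for short words; `anagrams` is a hypothetical helper name.

```python
from itertools import permutations

def anagrams(word):
    """Distinct rearrangements of the letters of word, in sorted order."""
    # permutations() treats repeated letters as distinct positions;
    # the set collapses arrangements that spell the same string.
    return sorted({"".join(p) for p in permutations(word)})

print(len(list(permutations("ZOO"))))  # 6
print(anagrams("ZOO"))                 # ['OOZ', 'OZO', 'ZOO']
print(len(anagrams("SASSY")))          # 20
```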

🧮 The multinomial coefficient

🧮 Definition and notation

Let n be a natural number. Let k₁, ..., kₘ be m positive natural numbers such that k₁ + ... + kₘ = n. Let W be a word with n letters of m different types, in which the letter of type i appears kᵢ times, for 1 ≤ i ≤ m. The multinomial coefficient (n choose k₁, ..., kₘ) is defined to be the number of anagrams of W.

The notation is (n choose k₁, ..., kₘ).
It generalizes the binomial coefficient to more than two types of elements.
The k values must sum to n (total letter count).

📐 The formula

The multinomial coefficient formula is:

(n choose k₁, ..., kₘ) = n! / (k₁! · k₂! · ... · kₘ!)

Start with n! (all permutations if letters were distinct).
Divide by each kᵢ! to correct for the overcounting caused by each letter type's repetitions.
Example: FORTCOLLINS has 11 letters with two L's and two O's (and 7 other distinct letters), so the count is 11! / (2! · 2! · 1! · 1! · 1! · 1! · 1! · 1! · 1!) = 11! / 4 = 9,979,200.

🔢 Worked example: MISSISSIPPI

Total letters: n = 11
Letter types: m = 4
Counts: k₁ = 4 (I's), k₂ = 1 (M), k₃ = 2 (P's), k₄ = 4 (S's)
Number of anagrams: 11! / (4! · 1! · 2! · 4!) = 34,650

Don't confuse: the denominator includes a factorial for every letter type, even if it appears only once (like the M, which contributes 1! = 1).
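The formula translates directly into a few lines of code. This is a minimal sketch, assuming helper names `multinomial` and `count_anagrams` chosen here for illustration; it divides n! by the factorial of each letter count.

```python
from collections import Counter
from math import factorial

def multinomial(*ks: int) -> int:
    """Return n! / (k1! * k2! * ... * km!) where n = k1 + ... + km."""
    result = factorial(sum(ks))
    for k in ks:
        # Each division is exact because the partial quotient is still an integer.
        result //= factorial(k)
    return result

def count_anagrams(word: str) -> int:
    """Count anagrams of a word using its letter multiplicities."""
    return multinomial(*Counter(word).values())

print(count_anagrams("MISSISSIPPI"))  # 34650
print(count_anagrams("FORTCOLLINS"))  # 9979200
```

Letters that appear once contribute a factor of 1! = 1, so including them changes nothing numerically.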

🔗 Connection to binomial coefficients

🔗 Special case when m = 2

When there are only two letter types, the multinomial coefficient becomes the binomial coefficient.
If k₁ + k₂ = n, then (n choose k₁, k₂) = n! / (k₁! · k₂!) = n! / (k₁! · (n − k₁)!) = (n choose k₁).
Example: a binary string of length n with k zeros and (n − k) ones has (n choose k) anagrams, because the anagram is determined exactly by where the k zeros are placed.

🔗 Why the formulas match

The binomial coefficient formula n! / (k₁! · (n − k₁)!) from earlier theorems is identical to the multinomial formula when m = 2.
This shows that multinomial coefficients are a true generalization of binomial coefficients.

🌐 Applications beyond word anagrams

🌐 Paths in higher dimensions

The excerpt mentions counting paths from (0, 0, 0) to (3, 4, 5) in three-dimensional space using only unit steps in directions (1, 0, 0), (0, 1, 0), and (0, 0, 1).
This is equivalent to counting anagrams of a "word" with 3 steps of type 1, 4 steps of type 2, and 5 steps of type 3.
Total steps: n = 3 + 4 + 5 = 12; answer: 12! / (3! · 4! · 5!).
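One way to sanity-check the path count is to compute it independently by recursion on the last step taken. The sketch below assumes a helper named `count_paths` (illustrative, not from the text) and compares its result with the multinomial formula.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def count_paths(x: int, y: int, z: int) -> int:
    """Count unit-step paths from (0, 0, 0) to (x, y, z)."""
    if x < 0 or y < 0 or z < 0:
        return 0
    if x == y == z == 0:
        return 1
    # The last step came from one of the three neighbouring lattice points.
    return count_paths(x - 1, y, z) + count_paths(x, y - 1, z) + count_paths(x, y, z - 1)

formula = factorial(12) // (factorial(3) * factorial(4) * factorial(5))
print(count_paths(3, 4, 5), formula)  # 27720 27720
```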

🌐 Ordered set partitions preview

The excerpt briefly introduces ordered set partitions as a generalization.
Example: choosing 2 fruits on Monday, 3 on Tuesday, and 2 on Wednesday from 7 fruits gives (7 choose 2) · (5 choose 3) · (2 choose 2) = 7! / (2! · 3! · 2!) = 210 ways.
This is the same structure as the multinomial coefficient: dividing a set into ordered groups of specified sizes.

Ordered Set Partitions and the Multinomial Theorem

4.6 Ordered set partitions and the multinomial theorem

🧭 Overview

🧠 One-sentence thesis

Ordered set partitions generalize the binomial coefficient by counting ways to separate a set into more than two labeled parts, and the multinomial coefficient formula directly extends the binomial theorem to powers of sums with more than two terms.

📌 Key points (3–5)

What ordered set partitions count: ways to separate a set into multiple non-overlapping labeled blocks (not just two parts like binomial coefficients).
The multinomial coefficient formula: n! divided by the product of all block sizes' factorials, counting ordered set partitions.
Connection to anagrams: anagrams of a word with repeated letters correspond exactly to ordered set partitions of position indices.
Common confusion: ordered set partitions vs. set partitions—ordered partitions assign labels/roles to blocks (e.g., Monday/Tuesday/Wednesday), while set partitions treat all blocks as interchangeable.
Why it matters: the multinomial theorem generalizes the binomial theorem, giving coefficients for expanding powers of sums with more than two variables.

🧩 Core concept: ordered set partitions

🧩 What an ordered set partition is

Ordered set partition: a sequence of non-empty subsets (blocks) B₁, ..., Bₘ of a set S such that any two distinct blocks are disjoint and the union of all blocks is S; the blocks are ordered (labeled).

This generalizes the binomial coefficient idea: instead of splitting S into "chosen" and "not chosen" (two parts), we split S into m labeled parts.
The blocks are ordered, meaning B₁ is distinct from B₂ even if they have the same size.
Example: Splitting 7 fruits into "Monday's 2 fruits," "Tuesday's 3 fruits," and "Wednesday's 2 fruits" creates an ordered set partition with three blocks.

🔢 The multinomial coefficient

Multinomial coefficient: (n choose k₁, k₂, ..., kₘ) = n! / (k₁! · k₂! · ... · kₘ!), where k₁ + k₂ + ... + kₘ = n.

This counts the number of ordered set partitions of an n-element set into blocks of sizes k₁, k₂, ..., kₘ.
The proof works by successive counting: choose k₁ elements for the first block (n choose k₁ ways), then k₂ from the remaining (n−k₁ choose k₂ ways), and so on; all intermediate factorials cancel, leaving only n! in the numerator and the block-size factorials in the denominator.
Example: 7 fruits into blocks of sizes 2, 3, 2 gives (7 choose 2,3,2) = 7!/(2!·3!·2!) = 210.

🎁 Example: distributing presents

11 different presents to 4 people in order: 4 to person a, 1 to person b, 2 to person c, 4 to person d.
This is an ordered set partition with n=11 and block sizes 4, 1, 2, 4.
Number of ways = (11 choose 4,1,2,4) = 11!/(4!·1!·2!·4!) = 34,650.
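The successive-counting argument can be mirrored in code: multiply the binomial coefficients for each block in turn and compare with the factorial formula. This is a sketch under the assumption that the block sizes are given as a list; the function name is illustrative.

```python
from math import comb, factorial

def ordered_partitions_by_successive_choice(sizes: list[int]) -> int:
    """Choose each block in turn from the elements that are still unassigned."""
    remaining = sum(sizes)
    total = 1
    for k in sizes:
        total *= comb(remaining, k)  # ways to pick this block
        remaining -= k               # those elements are now used up
    return total

print(ordered_partitions_by_successive_choice([2, 3, 2]))     # 210
print(ordered_partitions_by_successive_choice([4, 1, 2, 4]))  # 34650
```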

🔤 Connection to anagrams

🔤 How anagrams correspond to ordered set partitions

An anagram of a word with repeated letters can be encoded as an ordered set partition of position indices {1, 2, ..., n}.
Given an anagram, for each distinct letter i (in alphabetical order), collect the positions where that letter appears into block Bᵢ.
Example: the anagram IMSSSPIIPSI of MISSISSIPPI has I's in positions {1,7,8,11}, M in {2}, P's in {6,9}, S's in {3,4,5,10}; this gives the ordered set partition with those four blocks.
Conversely, given an ordered set partition with blocks of sizes matching the letter counts, place the i-th letter in the positions numbered by block Bᵢ to reconstruct an anagram.
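The two directions of this correspondence can be sketched as a pair of functions. The names `anagram_to_blocks` and `blocks_to_anagram` are hypothetical, and the example uses a small anagram of SASSY to keep the output short.

```python
def anagram_to_blocks(anagram: str) -> list[list[int]]:
    """Collect the 1-based positions of each letter, with letters in alphabetical order."""
    positions: dict[str, list[int]] = {}
    for index, letter in enumerate(anagram, start=1):
        positions.setdefault(letter, []).append(index)
    return [positions[letter] for letter in sorted(positions)]

def blocks_to_anagram(blocks: list[list[int]], letters: str) -> str:
    """Place the i-th letter at every position listed in the i-th block."""
    word = [""] * sum(len(block) for block in blocks)
    for letter, block in zip(letters, blocks):
        for index in block:
            word[index - 1] = letter
    return "".join(word)

blocks = anagram_to_blocks("YSSAS")
print(blocks)                            # [[4], [2, 3, 5], [1]]
print(blocks_to_anagram(blocks, "ASY"))  # YSSAS
```

Because each function undoes the other, the map is a bijection, which is what the counting argument relies on.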

🧮 Proof of the anagram counting formula

By the correspondence above, the number of anagrams of a word W with k₁ copies of letter a₁, k₂ copies of a₂, ..., kₘ copies of aₘ equals the number of ordered set partitions of {1, ..., n} into blocks of sizes k₁, ..., kₘ.
By the multinomial coefficient theorem, this count is n! / (k₁! · k₂! · ... · kₘ!).
This proves Theorem 4.5.7 (the anagram counting formula).

🆚 Ordered vs. unordered set partitions

🆚 The key distinction

Feature	Ordered set partition	Set partition
Blocks labeled?	Yes—blocks have distinct roles (e.g., Monday/Tuesday/Wednesday or person a/b/c/d)	No—blocks are interchangeable
Counting formula	(n choose k₁,...,kₘ) = n!/(k₁!·...·kₘ!)	Divide the ordered count by the number of ways to permute identical-sized blocks

Don't confuse: if blocks have assigned roles or labels, use ordered set partitions; if blocks are just "groups" with no distinction, use set partitions.

🎓 Example: students in groups

Problem 1 (ordered): 12 students separated into 4 groups of 3, assigned to present on Monday, Tuesday, Wednesday, Thursday.
- Each assignment is an ordered set partition into blocks B₁, B₂, B₃, B₄ of size 3.
- Answer: (12 choose 3,3,3,3) = 12!/(3!·3!·3!·3!).
Problem 2 (unordered): 12 students separated into 4 groups of 3, with no day assignment.
- This is a set partition; the 4 blocks are interchangeable.
- There are 4! ways to order the 4 blocks, so divide the ordered count by 4!.
- Answer: 12!/(3!·3!·3!·3! · 4!).
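The division by 4! can be checked numerically, and a brute-force count on a smaller case confirms the same reasoning. This is a sketch assuming equal-sized groups whose size divides the number of students; all function names are illustrative.

```python
from itertools import permutations
from math import factorial

def ordered_equal_groups(n: int, size: int) -> int:
    """Labeled groups: n! divided by size! once per group."""
    return factorial(n) // factorial(size) ** (n // size)

def unordered_equal_groups(n: int, size: int) -> int:
    """Unlabeled groups: divide the labeled count by the orderings of the groups."""
    return ordered_equal_groups(n, size) // factorial(n // size)

def brute_force_unordered(n: int, size: int) -> int:
    """Enumerate set partitions directly (only feasible for small n)."""
    seen = set()
    for perm in permutations(range(n)):
        groups = frozenset(frozenset(perm[i:i + size]) for i in range(0, n, size))
        seen.add(groups)
    return len(seen)

print(ordered_equal_groups(12, 3))    # 369600
print(unordered_equal_groups(12, 3))  # 15400
print(unordered_equal_groups(6, 2), brute_force_unordered(6, 2))  # 15 15
```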

🧪 The multinomial theorem

🧪 Statement of the theorem

Multinomial Theorem: The expansion of (x₁ + x₂ + ... + xₘ)ⁿ is the sum over all k₁ + ... + kₘ = n of (n choose k₁,...,kₘ) · x₁^k₁ · ... · xₘ^kₘ.

This directly generalizes the binomial theorem (which handles m=2) to sums of more than two variables.
The coefficient of any monomial x₁^k₁ · ... · xₘ^kₘ in the expansion is the multinomial coefficient (n choose k₁,...,kₘ).
Example: (x + y + z)³ = x³ + y³ + z³ + 3x²y + 3xy² + 3x²z + 3xz² + 3y²z + 3yz² + 6xyz; the coefficient 3 of x²y¹z⁰ equals (3 choose 2,1,0) = 3.

🧪 Why the multinomial coefficient appears

To form the monomial x₁^k₁ · ... · xₘ^kₘ in the product (x₁ + ... + xₘ)ⁿ, label the n factors as F₁, ..., Fₙ.
Choose k₁ factors from which to pick x₁, then k₂ factors from which to pick x₂, and so on.
This creates an ordered set partition of the set of factors {F₁, ..., Fₙ} into blocks of sizes k₁, ..., kₘ.
By Theorem 4.6.3, the count is (n choose k₁,...,kₘ), which is the coefficient of that monomial.
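The argument above can be imitated directly: pick one variable from each of the n factors and tally the resulting exponent patterns. This is a minimal sketch with a hypothetical function name `expand_power`; it is exponential in n and is meant only for small checks.

```python
from collections import Counter
from itertools import product

def expand_power(variables: str, n: int) -> Counter:
    """Expand (v1 + ... + vm)^n, keyed by exponent tuples."""
    coefficients: Counter = Counter()
    # Each element of the product is one way to pick a variable from every factor.
    for choice in product(variables, repeat=n):
        exponents = tuple(choice.count(v) for v in variables)
        coefficients[exponents] += 1
    return coefficients

coefficients = expand_power("xyz", 3)
print(coefficients[(2, 1, 0)])  # 3  (coefficient of x^2 y)
print(coefficients[(1, 1, 1)])  # 6  (coefficient of xyz)
print(len(coefficients))        # 10 distinct monomials
```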

🎯 Practice problems

🎯 Distributing distinct objects

Pillows to students: 20 different pillows to 12 students (a through l), with 2 pillows each to students a–h and 1 pillow each to students i–l.
- This is an ordered set partition with n=20 and block sizes 2,2,2,2,2,2,2,2,1,1,1,1.
- Answer: (20 choose 2,2,2,2,2,2,2,2,1,1,1,1) = 20!/((2!)⁸ · (1!)⁴).
Months to nurses: 12 months assigned to 3 nurses (a, b, c), 4 months each.
- Ordered set partition with n=12 and block sizes 4,4,4.
- Answer: (12 choose 4,4,4) = 12!/(4!·4!·4!).

🎯 Multi-stage problems

Conference room assignments: 9 friends in 3 hotel rooms (sizes 4, 3, 2) for 5 nights; room assignments can change each night.
- Each night is an independent ordered set partition of 9 people into blocks of sizes 4, 3, 2.
- For one night: (9 choose 4,3,2) = 9!/(4!·3!·2!).
- For 5 nights: raise this to the 5th power (since each night is independent).
- Answer: [9!/(4!·3!·2!)]⁵.
Stickers on windows: 8 different stickers on 4 windows, 2 stickers per window.
- Ordered set partition with n=8 and block sizes 2,2,2,2.
- Answer: (8 choose 2,2,2,2) = 8!/(2!·2!·2!·2!).

Investigation: Combinatorial problems on a chessboard

4.7 Investigation: Combinatorial problems on a chessboard

🧭 Overview

🧠 One-sentence thesis

This investigation uses the chessboard as a structured setting to explore counting problems involving placement constraints, attacking relationships, and material distinctions among chess pieces.

📌 Key points (3–5)

The setting: an 8×8 chessboard (or n×n generalization) with pieces that move according to chess rules; the board cannot be rotated or flipped.
Core constraint types: no two pieces on the same square, no attacking between pieces of different materials, and maximizing the number of pieces under constraints.
Rook problems: focus on row/column restrictions (rooks attack along rows and columns).
Queen problems: more complex because queens attack along rows, columns, and diagonals.
Common confusion: "attacking" only applies between pieces of different materials; pieces of the same material can be placed to avoid attacking each other, but the constraint changes when materials differ.

♟️ The chessboard framework

♟️ Board structure

A chessboard is an 8×8 board tiled in 1×1 squares, colored black and white so that no two squares of the same color share an edge.

The board is fixed to the table—it cannot be rotated or turned over.
This means positions are absolute; rotating the board would create a different configuration.

🎯 Piece movement and attacking

Rook: moves to any square in the same row or column.
Queen: moves to any square in the same row, column, or diagonal.
Attacking definition: two pieces of different materials attack each other if one can move to the other's square in one move.
Example: a gold rook and a silver rook attack each other if they share a row or column; two gold rooks do not "attack" in the sense used here.

🧱 Material distinctions

Problems involve pieces made from different materials (gold, silver, titanium, diamond).
An unlimited supply of rooks (or queens) of each material is available.
The key constraint is often preventing attacks between different materials, not within the same material.

🏰 Rook placement problems

🏰 Basic rook constraints

Problem 1: Place two gold rooks and two silver rooks.

(a) No square has more than one rook.
(b) Additionally, no gold rook attacks a silver rook (i.e., no gold and silver rook share a row or column).

📏 Maximum non-attacking rooks (same material)

Problem 2: What is the largest number of rooks of the same material that can be placed so no two share a row or column?

Since a rook attacks along its row and column, each rook must occupy a unique row and a unique column.
On an 8×8 board, the maximum is 8 rooks (one per row, one per column).
The problem also asks: how many ways can this be done?

🌈 Multi-material rook arrangements

Problem 3: Place 16 rooks (four of each of four materials) so that no pair of rooks with different materials attacks each other.

This requires partitioning the board so that rooks of different materials never share rows or columns.
The problem asks for a demonstration that this is possible and a count of the number of ways.

⚖️ The peaceful rooks problem

Problem 4: Find the largest number m such that m gold rooks and m silver rooks can be placed with no gold rook attacking a silver rook.

This is a maximization problem under the constraint that the two groups of rooks do not interfere.
The problem also asks for the number of ways to achieve this maximum.

👑 Queen placement problems

👑 Queen mobility

Problem 5: Analyze a single queen's movement options.

(a) Minimum mobility: find a position where the queen can move to the fewest squares. (Likely a corner or edge position.)
(b) Maximum mobility: find a position where the queen can move to the most squares. (Likely a central position.)

🔢 The n-queens puzzle

Problem 6: Place n gold queens on an n×n board so no two attack each other.

(a) Show this is impossible for n=2 or n=3.
(b)–(c) Find solutions for n=4 and n=5.
(d) The "8 queens puzzle" is a famous instance; the problem asks students to look it up and draw a solution.
Don't confuse: this is about queens of the same material not attacking each other, which is different from the multi-material constraint.
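For small boards, the n-queens condition can be explored by exhaustive search. The sketch below is one possible approach, not the investigation's intended method: it places one queen per row, tries every assignment of columns, and keeps the placements in which no two queens share a diagonal. The function name is illustrative.

```python
from itertools import permutations

def n_queens_solutions(n: int) -> list[tuple[int, ...]]:
    """Return placements as tuples: entry r is the column of the queen in row r."""
    solutions = []
    # A permutation already guarantees distinct rows and distinct columns.
    for cols in permutations(range(n)):
        diagonals = {r + c for r, c in enumerate(cols)}
        anti_diagonals = {r - c for r, c in enumerate(cols)}
        if len(diagonals) == n and len(anti_diagonals) == n:
            solutions.append(cols)
    return solutions

for n in range(2, 6):
    print(n, len(n_queens_solutions(n)))
```

The search examines n! placements, so it is comfortable for n up to about 8 and slow beyond that.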

🕊️ The peaceable queens problem

Problem 7: On an n×n board, find the maximum m such that m gold queens and m silver queens can be placed with no attacks between different materials.

Example: when n=3, m=1 (one gold queen in a corner, one silver queen in the middle of an opposite side).
(a) For n=4, show m≥2 by constructing a valid placement.
(b) For n=5, show m≥4.
(c) Look up the peaceable queens problem and find m for n=8.
This generalizes the peaceful rooks problem to the more complex queen movement pattern.

🔗 Connection to broader combinatorics

🔗 Why these problems matter

The excerpt transitions to Chapter 5, which covers proof techniques in combinatorics.
Chessboard problems illustrate counting under constraints, a central theme in combinatorics.
They require systematic reasoning about placement rules, material distinctions, and optimization—skills that proofs formalize.

📚 Proofs as explanation

Proofs are valuable because they explain the reasons why statements are true.

The excerpt emphasizes that proofs are not just verification but also understanding.
Chessboard problems naturally lead to questions like "Why is this the maximum?" or "How many ways exist?"—questions that require proof.
Example: proving that 8 is the maximum number of non-attacking rooks (same material) on an 8×8 board involves showing both that 8 is achievable and that 9 is impossible.

Why are proofs necessary?

5.1 Why are proofs necessary?

🧭 Overview

🧠 One-sentence thesis

Proofs are essential in mathematics because numerical patterns and computational checks can be misleading, and only rigorous reasoning can establish that a statement holds without exception.

📌 Key points (3–5)

Why data misleads: patterns observed in the first few cases can break down at larger values, so checking examples is not enough.
What proofs provide: they show a statement is true without doubt and explain why it is true, not just that it appears true.
Historical examples: famous conjectures (Fermat numbers, Fermat's Last Theorem, Four Color Theorem) show both the power of proof and the danger of assuming patterns continue.
Common confusion: "almost integers" and pseudo-primes look like they satisfy a property but do not—appearance is not the same as proof.
Why proofs matter: they are the foundation of mathematics, connecting ideas across time and enabling us to build reliable knowledge.

🚫 When patterns fail

🚫 Divisors of factorials

Example 5.1.1 asks: How many positive divisors does n! have?

The first few cases yield 1, 2, 4, 8, 16 divisors for n = 1, 2, 3, 4, 5.
This suggests the pattern "doubles each time," so you might guess 6! has 32 divisors.
Reality: 6! has only 30 divisors—the pattern breaks.
Lesson: We need mathematical proofs to show a pattern continues indefinitely; we cannot rely on a finite number of checks because a counterexample may be too large to find by hand.
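The broken pattern is easy to reproduce. This sketch counts divisors from the prime factorization (if n = p₁^e₁ · ... · pᵣ^eᵣ, the number of divisors is (e₁+1)···(eᵣ+1)); the function name `count_divisors` is an assumption made for illustration.

```python
from math import factorial

def count_divisors(n: int) -> int:
    """Count positive divisors of n using trial-division factorization."""
    count = 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        count *= exponent + 1
        p += 1
    if n > 1:
        count *= 2  # one leftover prime factor with exponent 1
    return count

print([count_divisors(factorial(n)) for n in range(1, 7)])  # [1, 2, 4, 8, 16, 30]
```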

🎭 Almost integers and hoaxes

Example 5.1.2 presents "almost integers": numbers so close to integers that they appear to be whole numbers.

e^(π√43) ≈ 884736743.9997
e^(π√67) ≈ 147197952743.999998
e^(π√163) ≈ 262537412640768743.9999999999992 (Ramanujan's constant)
In April 1975, Gardner claimed Ramanujan conjectured e^(π√163) is an integer—this was an April Fool's hoax.
Don't confuse: "very close to an integer" with "is an integer"; these are transcendental numbers, not integers.

🔢 Fermat numbers and false conjectures

Example 5.1.4 discusses Fermat numbers F_a = 2^(2^a) + 1.

The first five Fermat numbers are all prime: F₀ = 3, F₁ = 5, F₂ = 17, F₃ = 257, F₄ = 65537.
Fermat conjectured F_a is always prime based on these cases.
Counterexample: In 1732, Euler showed F₅ = 4294967297 = 641 · 6700417, which is composite.
Currently no one knows how many Fermat numbers are prime; even with modern computing, no more Fermat primes have been found beyond F₄.
Lesson: Even a pattern holding for the first five cases can fail at the sixth.
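Euler's counterexample can be reproduced with simple trial division. The helper names below are illustrative, and trial division is used only because the numbers involved are small; it is not how large Fermat numbers are studied.

```python
def fermat_number(a: int) -> int:
    """Return F_a = 2^(2^a) + 1."""
    return 2 ** (2 ** a) + 1

def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime dividing n (n itself if n is prime)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n

for a in range(6):
    f = fermat_number(a)
    p = smallest_prime_factor(f)
    print(a, f, "prime" if p == f else f"{p} * {f // p}")
```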

🎲 Pseudo-primes and Carmichael numbers

Example 5.1.5 examines Fermat's little theorem and its converse.

Fermat's little theorem:

If p is prime, then p divides a^(p−1) − 1 for all a such that gcd(a, p) = 1.

Example: p = 5 divides a⁴ − 1 for a = 1, 2, 3, 4.
Question: Is the converse true? If n divides a^(n−1) − 1 for all a with gcd(a, n) = 1, must n be prime?
Counterexample: n = 561 divides a⁵⁶⁰ − 1 for all such a, but 561 = 3 · 11 · 17 is composite.
561 is a Carmichael number, also called a pseudo-prime (fake prime).
Don't confuse: satisfying a property of primes with actually being prime.
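The claim about 561 can be verified exhaustively with modular exponentiation. This is a small sketch; the function name is hypothetical, and it relies on Python's built-in three-argument `pow` for a^(n−1) mod n.

```python
from math import gcd

def passes_fermat_test_for_all_bases(n: int) -> bool:
    """Check that n divides a^(n-1) - 1 for every a in 1..n-1 coprime to n."""
    return all(pow(a, n - 1, n) == 1 for a in range(1, n) if gcd(a, n) == 1)

print(passes_fermat_test_for_all_bases(5))    # True  (5 is prime)
print(passes_fermat_test_for_all_bases(15))   # False (composite, and the test detects it)
print(passes_fermat_test_for_all_bases(561))  # True  (yet 561 = 3 * 11 * 17)
```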

🏛️ Why proofs are valuable

🏛️ Beyond verification

The excerpt emphasizes that proofs do more than just confirm truth:

Show without doubt: proofs establish that a statement is true in all cases, not just the cases we can check.
Explain why: proofs reveal the reasons behind a statement, not just that it holds.
Beauty and power: many mathematicians find proofs beautiful and have favorite proofs that illustrate core ideas in their fields.
Foundation vs language: some view proofs as the solid base needed to build mathematics; others see them as a living language connecting ancient and modern thought.

🏆 Famous proof achievements

Problem	Key facts	Outcome
Four Color Theorem (Example 5.1.3)	Any map can be colored with four colors so no adjacent countries share a color	Proven in 1976 by Appel and Haken using a computer to exhaustively check cases after categorizing possible maps
Fermat's Last Theorem (Example 5.1.6)	X^n + Y^n = Z^n has no integer solutions for n ≥ 3 unless XYZ = 0	Conjectured in 1637; motivated centuries of work in algebraic number theory and arithmetic geometry; proven in 1994 by Wiles (building on work of Frey, Ribet, Serre, Taniyama, Shimura); Wiles received the Abel Prize (~$700,000)

💻 Computers in proofs

The Four Color Theorem proof used a computer to check many cases after mathematical reasoning reduced the problem to a finite (but large) set of cases.
Computers can make proofs much faster when there are many cases to check.
Fermat numbers remain useful because they can have very large prime factors; in 2020, a megaprime factor of F₅₅₂₃₈₅₈ was found as part of the PrimeGrid collaboration.

🎯 What this chapter will cover

🎯 Learning to write proofs

Writing proofs is not easy; students sometimes feel it is like drill exercises.
The chapter will explain proof techniques often useful in combinatorics:
- Induction
- Counting in two ways
- Bijective proofs
- Proof by contradiction
- The pigeonhole principle
Goal: by the end, writing a proof should seem more manageable; maybe you will have a favorite proof too.

🎯 Why this matters for combinatorics

The chapter focuses on proof techniques specifically useful in combinatorics.
Proofs are needed to show statements are true without doubt, especially when patterns in data can be misleading.
The examples (divisors, Fermat numbers, pseudo-primes) all illustrate combinatorial or number-theoretic phenomena where intuition from small cases fails.

Induction

5.2 Induction

🧭 Overview

🧠 One-sentence thesis

Mathematical induction proves that a property holds for all natural numbers by establishing a base case and showing that each true case implies the next, like dominoes falling in sequence.

📌 Key points (3–5)

The domino analogy: induction works by knocking over the first domino (base case) and ensuring each falling domino knocks over the next (inductive step), causing all infinitely many to fall.
Two required components: you must prove both (1) the property is true for n = 0 (or some starting value), and (2) if true for an arbitrary k, then true for k+1.
Common confusion: assuming what you want to prove. You cannot start by writing "A = B" and simplify; instead, assume P_k is true and use it to prove P_(k+1).
When statements fail for small n: if a property is false for small values but true from some c onward, modify the base case to start at n = c instead of n = 0.
Why it works: the inductive step applies to an arbitrary k, so combined with the base case, it guarantees the property holds for every natural number.

🎯 The inductive principle

🎯 What induction proves

The inductive principle: If (1) property P_n is true for n = 0, and (2) whenever P_k is true for an arbitrary integer k ≥ 0, then P_(k+1) is also true, then P_n is true for all natural numbers n.

The excerpt defines the natural numbers as N = {0, 1, 2, 3, ...}.
A property P_n can be a formula or fact that might be true or false for each n.
Example from the excerpt: P_n: "1 + 2 + ... + n = n(n+1)/2" is true for all n ≥ 1.
Counter-example: P_n: "The Fermat number F_n is prime" is true for n = 0, 1, 2, 3, 4 but false for n = 5, so it does not hold for all n and no proof of it can exist.

🪜 The three-step method

Step 1: Base case

Prove that P_n is true for n = 0 (or whatever starting value is appropriate).
The excerpt notes this is usually an easy computation.

Step 2: Inductive hypothesis

Let k ∈ N be arbitrary and assume P_k is true for this specific value k.
The statement P_k is called the inductive hypothesis.
Choosing k to be arbitrary means showing that no matter what domino falls, it will always knock over the next one.

Step 3: Inductive step

Using the assumption that P_k is true, prove that P_(k+1) is true.
Steps 2 and 3 together prove the "if...then" statement: to prove such a statement in mathematics, assume the hypothesis is true and then prove the conclusion holds under that assumption.

🔄 Aesthetic variation

The excerpt mentions you can replace the inductive step with: if P_(k-1) is true for an arbitrary integer k ≥ 1, then P_k is true.
In other words, in step 2 let k ≥ 1 (rather than k ≥ 0) and assume P_(k-1), then in step 3 prove P_k.
This is a purely aesthetic choice but sometimes makes the algebra simpler.

📐 Worked examples

📐 Sum of first n odd integers

Property: P_n: 1 + 3 + 5 + ... + (2n - 1) = n² for all integers n ≥ 1.

Base case: For n = 1, note that 1 = 1².

Inductive step: Let k ≥ 1 be arbitrary and suppose P_k is true, meaning 1 + 3 + 5 + ... + (2k - 1) = k². Adding 2k + 1 to both sides gives:

1 + 3 + 5 + ... + (2k - 1) + (2k + 1) = k² + 2k + 1 = (k + 1)²
This shows P_(k+1) is true.

Hence by induction, P_n is true for all n ≥ 1.
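A numerical check is no substitute for the proof, but it can make the inductive step concrete: adding the next odd number 2k + 1 to k² lands on (k + 1)². The sketch below uses an illustrative helper name.

```python
def sum_of_first_odds(n: int) -> int:
    """Return 1 + 3 + 5 + ... + (2n - 1)."""
    return sum(2 * i - 1 for i in range(1, n + 1))

for k in range(1, 6):
    before = sum_of_first_odds(k)   # equals k^2 by P_k
    after = before + (2 * k + 1)    # add the next odd number
    print(k, before, after, (k + 1) ** 2)
```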

📐 Sum of consecutive integers

Property: P_n: 1 + 2 + ... + n = n(n+1)/2 for all integers n ≥ 1.

The excerpt references this as Lemma 1.2.1 and states it can be proven by induction (left as an exercise).

📐 Sum of powers of 2

Property: 1 + 2 + 4 + 8 + ... + 2^(n-1) = 2^n - 1 for all integers n ≥ 1.

The excerpt states this can be proven by induction (left as an exercise).

📐 Factorial inequality

Property: Prove that n! ≥ 2^n for all n ≥ 4.

The excerpt notes the statement is false when n = 1, 2, 3 but true when n = 4.
Base case: The inequality is true when n = 4 because 24 = 4! ≥ 2⁴ = 16.
Inductive step: Let k ≥ 4 be arbitrary and suppose k! ≥ 2^k. Then:
- (k+1)! = (k+1) · k! ≥ (k+1) · 2^k ≥ 5 · 2^k ≥ 2^(k+1)
- This shows P_(k+1) is true.

Hence by induction, P_n is true for all n ≥ 4.
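A quick scan shows where the inequality n! ≥ 2^n fails and where it starts to hold, which is what motivates the base case n = 4. This is only a finite check over an arbitrarily chosen range, not a proof.

```python
from math import factorial

# Values of n in 1..20 for which n! >= 2^n is false.
failures = [n for n in range(1, 21) if factorial(n) < 2 ** n]
print(failures)  # [1, 2, 3]
```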

🔧 Modified base cases

🔧 When to start at c instead of 0

Sometimes a statement P_n is false for small values of n. If there is a positive integer c for which P_c is true and the inductive step works, then we can modify the method of induction to prove that statement P_n is true for all n ≥ c (instead of n ≥ 0).

How to modify:

Change the base case to prove P_c rather than prove P_0.
Change the assumption on the inductive step to choose k ≥ c and assume P_k.

Example: The factorial inequality n! ≥ 2^n is false for n = 1, 2, 3, so the base case starts at n = 4 and the inductive step assumes k ≥ 4.

⚠️ Common mistakes

⚠️ Assuming what you want to prove

One common mistake that people make in writing proofs is to assume A = B and then simplify. Warning: this does not always work!

False example from the excerpt:

Claim: 3 = 5.
"Proof": Note 3 = 5 implies 3·0 = 5·0 implies 0 = 0 which is true.
Why it's wrong: You cannot start by assuming the conclusion and work backward; the logic is reversed.

⚠️ Inductive step failing for small k

False example: All horses are the same color

The excerpt presents a "proof" by induction:

Base case: In any herd of 1 horse, all horses in that herd are the same color. ✓
Inductive step: Let k ≥ 1 be arbitrary and suppose all horses in a herd of size k are the same color. Consider a herd of k+1 horses:
- Removing the last horse, the first horse is the same color as the horses in the middle.
- Removing the first horse, the last horse is the same color as the horses in the middle.
- Hence all k+1 horses are the same color.

Where it goes wrong: The inductive step fails when going from k = 1 to k = 2, even though it works for all larger values of k. When k = 1, the middle herd is empty, so the argument breaks down. This is why the bound on k in the declaration "Let k ≥ 1 be arbitrary" is so important.

Don't confuse: An inductive step that works for most values of k is not enough; it must work for every k starting from the base case.

🧮 Two-variable induction

🧮 Hockey Stick Identity

Property: For positive integer n and non-negative integer m,

(n choose 0) + (n+1 choose 1) + (n+2 choose 2) + ... + (n+m choose m) = (n+m+1 choose m)

Key point: There are two variables n and m in the formula, so the excerpt specifies that induction proceeds on m for a fixed n.

Base case: The identity is true when m = 0 because (n choose 0) = 1 = (n+0+1 choose 0).

Inductive step: Let m ∈ N be arbitrary and assume:

(n choose 0) + (n+1 choose 1) + (n+2 choose 2) + ... + (n+m choose m) = (n+m+1 choose m)

Must show:

(n choose 0) + (n+1 choose 1) + (n+2 choose 2) + ... + (n+m choose m) + (n+m+1 choose m+1) = (n+m+2 choose m+1)

By substituting the inductive hypothesis, the left-hand side simplifies to:

(n+m+1 choose m) + (n+m+1 choose m+1)

which equals (n+m+2 choose m+1) by Pascal's recurrence (Theorem 4.1.1).

Hence the statement is true for all non-negative integers m by induction.
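The identity can be spot-checked for small n and m before trusting the algebra. The sketch below assumes a helper named `hockey_stick_lhs` (illustrative) and uses `math.comb` for the binomial coefficients.

```python
from math import comb

def hockey_stick_lhs(n: int, m: int) -> int:
    """Return (n choose 0) + (n+1 choose 1) + ... + (n+m choose m)."""
    return sum(comb(n + j, j) for j in range(m + 1))

print(hockey_stick_lhs(1, 2), comb(1 + 2 + 1, 2))  # 6 6
print(all(hockey_stick_lhs(n, m) == comb(n + m + 1, m)
          for n in range(1, 10) for m in range(10)))  # True
```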

🔍 Strong induction

🔍 Variation of the method

There is a variation of the inductive process, called Strong induction. In the inductive step of this variant, we suppose that P_k is true for all 1 ≤ k ≤ n, and show that P_k is true when k = n+1.

In regular induction, you assume P_k is true for one specific k and prove P_(k+1).
In strong induction, you assume P_k is true for all values from 1 up to n, and prove P_(n+1).
This gives you more to work with in the inductive step.

🧩 Counting in two ways

🧩 The principle

Counting in Two Ways: If a set has n elements and it also has m elements then n = m.

The excerpt calls this a "trivial principle" but notes it is incredibly useful in proving algebraic identities in combinatorics.
How to use it: To prove F = G where F and G are both formulas, find a collection of objects such that both F and G are the answer to "how many are there in the collection?"

🧩 Multiplication as repeated addition

Property: a · b = b + b + ... + b (a summands), where a and b are positive integers.

Right-hand side: By repeated application of the Addition Principle, b + b + ... + b counts the number of students in a school that has a classes with b students each.
Left-hand side: By the Multiplication Principle, a · b counts the number of ways to first walk into one of the a classrooms and then single out one of the b students in that class.
Thus a · b is also equal to the total number of students, so a · b = b + b + ... + b.

🧩 Pascal's recurrence combinatorially

Property: (n+1 choose k) = (n choose k-1) + (n choose k)

Combinatorial proof: Find the number of ways of choosing k people from a classroom that has n students and 1 professor.

Since there are n+1 people in the room, the answer is (n+1 choose k).
On the other hand, separate the ways of choosing the group into:
- (i) those that do not contain the professor: (n choose k) choices
- (ii) those that do contain the professor: (n choose k-1) choices (choose k-1 from the n students)
By the Addition Principle, the total is (n choose k-1) + (n choose k).

Hence (n+1 choose k) = (n choose k-1) + (n choose k).

Counting in Two Ways

5.3 Counting in two ways

🧭 Overview

🧠 One-sentence thesis

Counting in two ways proves algebraic identities by showing that two different formulas both count the same collection of objects, so they must be equal.

📌 Key points (3–5)

The principle: if a set has n elements and also has m elements, then n = m—sounds trivial but is powerful for proving combinatorial identities.
How to use it: to prove F = G, find a collection of objects such that both F and G answer "how many are there in the collection?"
Proof strategy: count the same set in two different ways—one way gives the left-hand side of the identity, the other way gives the right-hand side.
Common confusion: the principle is not about comparing two different sets; it's about counting the same set using two different methods.
Why it matters: provides combinatorial (intuitive, story-based) proofs for algebraic formulas involving binomial coefficients and sums.

🧩 The core principle

🧩 What "counting in two ways" means

Lemma (Counting in Two Ways): If a set has n elements and it also has m elements, then n = m.

This sounds obvious, but it is the foundation for proving many combinatorial identities.
The key insight: if you can count the same collection using two different methods, the two answers must be equal.
Example: if you count students by adding up each classroom's size, or by multiplying number of classrooms times students per classroom, you get the same total.

🔍 How to apply the principle

Goal: prove that formula F equals formula G.
Method:
- Find a concrete collection of objects (e.g., students, subsets, choices).
- Show that F counts this collection one way.
- Show that G counts the same collection a different way.
- Conclude F = G.
Don't confuse: you are not counting two different sets and comparing them; you are counting one set in two ways.

🎯 Simple examples

🎯 Proving multiplication equals repeated addition

The excerpt proves that a · b = b + b + ⋯ + b (with a summands).

The collection: all students in a school with a classes, each having b students.
First way (right-hand side): by the Addition Principle, b + b + ⋯ + b (a times) counts the total students.
Second way (left-hand side): by the Multiplication Principle, a · b counts the number of ways to walk into one of a classrooms and single out one of b students—this also equals the total number of students.
Conclusion: both formulas count the same thing, so they are equal.

🎯 Proving Pascal's recurrence

The excerpt proves that (n+1 choose k) = (n choose k−1) + (n choose k).

The collection: ways to choose k people from a room with n students and 1 professor (total n+1 people).
First way (left-hand side): (n+1 choose k) counts all such choices.
Second way (right-hand side): separate into two cases:
- (i) Choices that do not contain the professor: (n choose k) ways.
- (ii) Choices that do contain the professor: (n choose k−1) ways (choose only k−1 students).
- By the Addition Principle, total is (n choose k−1) + (n choose k).
Conclusion: both count the same collection, so the identity holds.
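
The case split above can be checked by brute force for small rooms. This is only an illustrative sketch: the function name `pascal_cases` and the labels used for the people are assumptions, not part of the excerpt.

```python
from itertools import combinations
from math import comb

def pascal_cases(n, k):
    """Split the k-person choices from n students + 1 professor by whether the professor is chosen."""
    people = ["prof"] + [f"s{i}" for i in range(1, n + 1)]
    choices = list(combinations(people, k))
    with_prof = sum(1 for c in choices if "prof" in c)
    without_prof = len(choices) - with_prof
    return len(choices), with_prof, without_prof

total, with_prof, without_prof = pascal_cases(5, 3)
# total is (6 choose 3); the two cases are (5 choose 2) and (5 choose 3)
print(total, with_prof, without_prof)  # 20 10 10
```

Enumeration confirms individual cases only; the counting argument is what proves the identity for all n and k.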

🧮 Advanced examples

🧮 Sum of squares formula

The excerpt proves that (n choose 0)² + (n choose 1)² + ⋯ + (n choose n)² = (2n choose n).

Proof:

The collection: ways to choose n soccer players out of 2n total players.
First way (left-hand side): (2n choose n).
Second way (right-hand side):
- Divide the 2n players into two groups: n with gold jerseys, n with green jerseys.
- To choose n players total, choose k from gold and n−k from green.
- For a fixed k, there are (n choose k) · (n choose n−k) ways.
- Since (n choose n−k) = (n choose k), this is (n choose k)².
- Sum over all possible k from 0 to n: Σ (n choose k)² = (2n choose n).
Conclusion: both formulas count the same collection.

🧮 Vandermonde's identity

The excerpt proves that (n+m choose ℓ) = Σ (from k=0 to ℓ) (n choose k)(m choose ℓ−k).

Proof:

The collection: ways to choose ℓ snacks from n granola bars and m bags of trail mix (total n+m snacks).
First way (left-hand side): (n+m choose ℓ).
Second way (right-hand side):
- Let D_k be the number of ways to choose exactly k granola bars and ℓ−k bags of trail mix.
- D_k = (n choose k)(m choose ℓ−k).
- Sum over all possible k from 0 to ℓ: Σ D_k = Σ (n choose k)(m choose ℓ−k).
Conclusion: both count the same collection.
Special case: substitute m = n = ℓ in Vandermonde's identity to get (2n choose n) = Σ (n choose i)², which is another proof of the sum of squares formula.

🔧 Convention and notation

🔧 Handling out-of-range binomial coefficients

In the summation for Vandermonde's identity, the excerpt sets (n choose k) = 0 if k > n.
Similarly, (m choose ℓ−k) = 0 if ℓ−k > m.
This convention ensures the sum is well-defined even when some terms would otherwise be undefined.
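
A quick numeric check of Vandermonde's identity, as a sketch. Python's `math.comb(n, k)` already returns 0 when k > n, so it follows the convention described above; the helper name `vandermonde_rhs` is an assumption made for illustration.

```python
from math import comb

def vandermonde_rhs(n, m, l):
    """Right-hand side: sum over k of (n choose k)(m choose l-k)."""
    # math.comb returns 0 when the lower index exceeds the upper one
    return sum(comb(n, k) * comb(m, l - k) for k in range(l + 1))

print(comb(3, 5))                             # 0
print(vandermonde_rhs(3, 2, 4), comb(5, 4))   # 5 5
```

With n = 3, m = 2, ℓ = 4, three of the five terms vanish by the convention and the remaining two add up to (5 choose 4).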

🔧 Exercises hint

The excerpt lists several identities to prove by counting in two ways:

(n choose k) = (n choose n−k): count subsets of size k vs. their complements of size n−k.
(n choose 0) + (n choose 1) + ⋯ + (n choose n) = 2^n: count all subsets of an n-element set.
(n choose k) · k = n · (n−1 choose k−1): count ways to choose a committee and designate a leader.
Hockey Stick identity: count paths or cumulative choices.

🔗 Connection to bijections

🔗 Preview of bijective proofs

The excerpt mentions that counting in two ways is not always enough; sometimes you need a bijection to compare the sizes of two different sets.
Example question: how many ways can you choose 4 numbers from {1, …, 100} so that none are consecutive?
This requires pairing elements of one set (4-element subsets with no consecutive numbers) with elements of another set (some easier-to-count collection) via a bijection.
Don't confuse: counting in two ways counts one set in two ways; bijective proofs establish a one-to-one correspondence between two different sets to show they have the same size.

Bijective proofs

5.4 Bijective proofs

🧭 Overview

🧠 One-sentence thesis

Bijective proofs establish that two sets have the same size by constructing a one-to-one correspondence between their elements, enabling solutions to counting problems that are difficult to solve directly.

📌 Key points (3–5)

What a bijection is: a map between two sets that is both 1-to-1 (no element is hit twice) and onto (every element is hit), creating a perfect pairing.
Why bijections matter for counting: if there is a bijection between sets A and B, then A and B have the same number of elements, even if counting one set directly is hard.
How to construct a bijection: define a map f from A to B and show it has an inverse map g from B to A that reverses the process.
Common confusion: bijections differ from "counting in two ways"—bijections compare two different sets, while counting in two ways counts one set using two methods.
Practical technique: transform a hard counting problem into an easier one by finding a bijection to a set whose size is already known.

🔗 What bijections are

🔗 Formal definition

Bijection: A map f : A → B that is both 1-to-1 and onto.

1-to-1 (injective): whenever a₁ ≠ a₂, then f(a₁) ≠ f(a₂); equivalently, if f(a₁) = f(a₂), then a₁ = a₂.
- Intuition: no element of B is hit twice.
Onto (surjective): for every b ∈ B, there exists an a ∈ A such that f(a) = b.
- Intuition: the map hits every element of B; the range of f is all of B.
A bijection creates a perfect pairing or matching between elements of A and elements of B.

🔄 Inverse maps

A map f : A → B is a bijection if and only if it has an inverse map g : B → A, meaning that g ∘ f is the identity on A and f ∘ g is the identity on B.

How to define the inverse: Given b ∈ B, since f is onto, there exists a unique a ∈ A (unique because f is 1-to-1) such that f(a) = b; define g(b) = a.
Why this works: (g ∘ f)(a) = g(f(a)) = g(b) = a and (f ∘ g)(b) = f(g(b)) = f(a) = b.
Conversely: if g is an inverse of f, then f is a bijection:
- To show 1-to-1: if f(a₁) = f(a₂), then g(f(a₁)) = g(f(a₂)), so a₁ = a₂.
- To show onto: for any b ∈ B, let a = g(b); then f(a) = f(g(b)) = b.

Example: Let A be positive integers ending in 3, B be positive integers ending in 7. The map f(a) = a + 4 is a bijection with inverse g(b) = b − 4.

Example: Let A be all integers, B be all multiples of 7. The map f(a) = 7a is a bijection with inverse g(b) = b/7.
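
The inverse-map criterion can be tested mechanically on finite pieces of the sets. The sketch below checks the first example on integers below 1000; the helper `is_inverse_pair` is a hypothetical name, and a finite check like this illustrates the criterion rather than proving it for the infinite sets.

```python
def is_inverse_pair(f, g, A, B):
    """Check on finite sets that f maps A into B, g maps B into A, and each undoes the other."""
    return (all(f(a) in B and g(f(a)) == a for a in A)
            and all(g(b) in A and f(g(b)) == b for b in B))

# Finite windows of the first example: integers ending in 3 and in 7
A = set(range(3, 1000, 10))
B = set(range(7, 1000, 10))
print(is_inverse_pair(lambda a: a + 4, lambda b: b - 4, A, B))  # True
```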

🧮 The bijective proof principle

🧮 Core counting rule

Lemma (Bijective proof): If A has m elements, B has n elements, and there is a bijection f : A → B, then m = n.

This sounds simple but can be tricky to execute in practice.
Strategy: to prove two quantities are equal, construct two sets A and B whose sizes represent those quantities, then find a bijection between them.
Don't confuse with "counting in two ways": bijections compare two different sets; counting in two ways counts one set using two different methods.

🎯 Classic examples

🎯 Subsets of complementary size

Problem: Show that the number of k-element subsets of {1, …, n} equals the number of (n − k)-element subsets.

Let A be the set of k-element subsets of Ω = {1, …, n}.
Let B be the set of (n − k)-element subsets of Ω.
Bijection: map each subset S ∈ A to its complement Ω − S ∈ B.
Since every subset has exactly one complement, this is a bijection.
Conclusion: the number of k-element subsets equals the number of (n − k)-element subsets, i.e., (n choose k) = (n choose n−k).

🎯 Even-sized vs odd-sized subsets

Problem (Proposition 5.4.9): For Ω = {1, …, n}, show that the number of even-sized subsets equals the number of odd-sized subsets.

This proves the alternating sum identity: (n choose 0) − (n choose 1) + (n choose 2) − … ± (n choose n) = 0, which can be rewritten as (n choose 0) + (n choose 2) + (n choose 4) + … = (n choose 1) + (n choose 3) + (n choose 5) + …

Case 1: n is odd

If S has size k, then Ω − S has size n − k.
If k is even, then n − k is odd (since n is odd), and vice versa.
Bijection: map each even-sized subset S to its complement Ω − S, which is odd-sized.
Example: For Ω = {1, 2, 3}, pair ∅ ↔ {1,2,3}, {1} ↔ {2,3}, {2} ↔ {1,3}, {3} ↔ {1,2}.

Case 2: n is even

The complement trick doesn't work (even − even = even).
New bijection: Pick a distinguished element, say 1 ∈ Ω. Group subsets into pairs where the two subsets differ only in whether they contain 1.
Each pair contains one even-sized subset and one odd-sized subset.
Example: For Ω = {1,2,3,4}, pair {1,2} ↔ {2}, {1,3} ↔ {3}, {1,2,3} ↔ {2,3}, etc.
Every subset is in exactly one pair, so the counts are equal.

Conclusion: Since there are 2ⁿ subsets total, the number of even-sized subsets is 2ⁿ / 2 = 2ⁿ⁻¹.
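
The "differ only in whether they contain 1" pairing can be written as a single toggle operation. A minimal sketch for n = 4, with `toggle` as an assumed helper name:

```python
from itertools import combinations

def toggle(subset, x=1):
    """Add x if it is missing, remove it if present; toggling twice restores the subset."""
    return subset ^ {x}

n = 4
omega = range(1, n + 1)
subsets = [frozenset(c) for r in range(n + 1) for c in combinations(omega, r)]
evens = [s for s in subsets if len(s) % 2 == 0]
odds = {toggle(s) for s in evens}   # images of the even-sized subsets

print(len(evens), len(odds))  # 8 8
```

Because `toggle` is its own inverse, it is a bijection, and it changes the size of a subset by exactly one, so it swaps even-sized and odd-sized subsets.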

🍬 Multisets and the M&M's proof

🍬 Choosing with repetition allowed

Problem (Proposition 5.4.12): How many ways can you choose a multiset of size k (a selection with repeats allowed) from a set of size n?

Answer: (n + k − 1 choose k).
This reproves the "sticks and stones" theorem (Theorem 3.5.6) using a bijection.

🍬 The bijection construction

Set A: cups filled with k M&M's, where there are n different possible colors.

Size of A = number of ways to choose k items from n types with repetition.

Set B: binary sequences with k zeroes and (n − 1) ones.

Total length: n + k − 1.
Size of B = (n + k − 1 choose k), since we choose which k positions contain the 0's.

Map f : A → B (M&M's to binary sequence):

Spill the k M&M's onto a plate and sort them by color (using a fixed color ordering, e.g., rainbow order).
The M&M's are now separated into n sections (some may be empty).
Place a toothpick between each section, using (n − 1) toothpicks total.
Put on color-filter goggles so all M&M's look grey.
Write a 0 for each grey M&M and a 1 for each toothpick.
Result: a sequence of k zeroes and (n − 1) ones.

Inverse map g : B → A (binary sequence to M&M's):

Interpret the sequence as k grey M&M's separated by (n − 1) toothpicks.
Color the M&M's according to the specified ordering of colors.
Put them into a cup.

Since these processes reverse each other, f and g are inverses, so f is a bijection.
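
The two maps can be sketched in code, assuming colors are numbered 1 to n and a cup is a tuple of color numbers. The names `cup_to_bits` and `bits_to_cup` are illustrative, not from the excerpt.

```python
from itertools import combinations_with_replacement
from math import comb

def cup_to_bits(cup, n):
    """Sort by color, then write 0 per candy and 1 per toothpick between color sections."""
    counts = [cup.count(color) for color in range(1, n + 1)]
    return "1".join("0" * c for c in counts)

def bits_to_cup(bits):
    """Read the sections between toothpicks; section i holds candies of color i."""
    return tuple(color
                 for color, section in enumerate(bits.split("1"), start=1)
                 for _ in section)

n, k = 3, 4
cups = list(combinations_with_replacement(range(1, n + 1), k))
codes = {cup_to_bits(cup, n) for cup in cups}
print(cup_to_bits((1, 1, 3, 3), n))                 # 001100
print(len(cups), len(codes), comb(n + k - 1, k))    # 15 15 15
```

The cup (1, 1, 3, 3) has an empty middle section, which shows up as two adjacent 1's in the sequence.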

🔢 Non-consecutive number selection

🔢 The motivating problem

Question 5.4.1: How many ways can you choose 4 numbers from {1, …, 100} so that none of them are consecutive?

We know there are (100 choose 4) ways to choose 4 numbers, but the "non-consecutive" condition is hard to handle directly.
Strategy: transform the problem into an easier one via bijection.

🔢 The transformation

Set A: 4-tuples (a₁, a₂, a₃, a₄) with 1 ≤ a₁ < a₂ − 1 < a₃ − 2 < a₄ − 3 ≤ 97.

This encodes the non-consecutive condition: a₂ > a₁ + 1, a₃ > a₂ + 1, a₄ > a₃ + 1.

Set B: 4-tuples (b₁, b₂, b₃, b₄) with 1 ≤ b₁ < b₂ < b₃ < b₄ ≤ 97.

This is just choosing 4 distinct numbers from {1, …, 97}.
Size of B = (97 choose 4).

Bijection f : A → B:

Define f(a₁, a₂, a₃, a₄) = (a₁, a₂ − 1, a₃ − 2, a₄ − 3).
Let b₁ = a₁, b₂ = a₂ − 1, b₃ = a₃ − 2, b₄ = a₄ − 3.
The condition a₂ > a₁ + 1 becomes b₂ + 1 > b₁ + 1, i.e., b₂ > b₁.
Similarly, a₃ > a₂ + 1 becomes b₃ > b₂, and a₄ > a₃ + 1 becomes b₄ > b₃.
Also, a₄ ≤ 100 becomes b₄ ≤ 97.

Inverse g : B → A:

Define g(b₁, b₂, b₃, b₄) = (b₁, b₂ + 1, b₃ + 2, b₄ + 3).
Since f and g reverse each other, f is a bijection.

Answer: The number of ways to choose 4 non-consecutive numbers from {1, …, 100} is (97 choose 4).

🔢 Why this works

The shifts (−1, −2, −3) "absorb" the gaps required by the non-consecutive condition.
The transformed problem (choosing 4 distinct numbers from a smaller range) is straightforward to count.
Don't confuse: the original problem has a₁ < a₂ < a₃ < a₄ with gaps; the transformed problem has b₁ < b₂ < b₃ < b₄ with no gap requirement, but from a smaller set.
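
A brute-force check of the shift argument on a smaller range. The range {1, …, 12} is an assumption chosen to keep enumeration fast; the same shifts give (N − 3 choose 4) for {1, …, N}. Function names are illustrative.

```python
from itertools import combinations
from math import comb

def shift_down(a):
    """The map f: subtract 0, 1, 2, ... from the sorted entries."""
    return tuple(x - i for i, x in enumerate(a))

def count_nonconsecutive(N, k):
    """Brute-force count of k-subsets of {1..N} with no two consecutive numbers."""
    return sum(1 for a in combinations(range(1, N + 1), k)
               if all(y - x > 1 for x, y in zip(a, a[1:])))

print(shift_down((2, 4, 7, 100)))                # (2, 3, 5, 97)
print(count_nonconsecutive(12, 4), comb(9, 4))   # 126 126
print(comb(97, 4))                               # 3464840
```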

5.5 Proof by contradiction

🧭 Overview

🧠 One-sentence thesis

Proof by contradiction establishes a claim by assuming its opposite and deriving an impossible conclusion, which forces the original claim to be true.

📌 Key points (3–5)

Core strategy: assume the negation of what you want to prove, then show this assumption leads to something definitely false.
When to use it: especially useful when no direct formula or construction is available (e.g., proving infinitely many primes exist).
Common confusion: don't overuse—a direct proof is better when available; contradiction is fun but not always necessary.
Number theory applications: the excerpt demonstrates irrationality proofs (√3) and Euclid's proof of infinitely many primes.

🔄 The contradiction strategy

🔄 How the method works

Proof by contradiction: assume the opposite (negation) of what you want to show; then show that this implies something that is definitely false; this means that the initial assumption must have been false.

You start by supposing the claim is not true.
You follow logical steps from that assumption.
You arrive at a statement that contradicts known facts or your own assumptions.
Because the chain of reasoning was valid, the starting assumption must be wrong.
Therefore, the original claim must be true.

✈️ Everyday example

Claim: If I want to catch my 10 am flight from Denver, I need to leave the house before 7 am.

Proof sketch:

Assume I leave at 7 am or later.
Then, at the earliest, I arrive at the parking lot at 8:15, reach the airport at 8:30, finish bag check at 8:45, clear security at 9:00, board the train at 9:15, and reach the gate at 9:45.
But the doors close before 9:45—this is too late.
So the assumption (leaving at 7 or later) leads to missing the flight, which contradicts the goal.
Therefore, I must leave before 7 am.

🔢 Number theory proofs

🔢 Proving √3 is irrational

Claim: √3 is not a fraction.

Proof outline:

Assume √3 is rational, so √3 = a/b where a and b are integers, b ≠ 0, and gcd(a, b) = 1 (the fraction is in simplest form).
Squaring both sides: 3 = a²/b², so 3b² = a².
This means 3 divides a². Since 3 is prime, 3 must divide a (by unique prime factorization).
Write a = 3a₁ for some integer a₁.
Substitute: 3b² = (3a₁)² = 9a₁², so b² = 3a₁².
Then 3 divides b² and therefore 3 divides b.
But now both a and b are divisible by 3, so gcd(a, b) is a multiple of 3—contradicting gcd(a, b) = 1.
Therefore, √3 cannot be a fraction.

Key insight: The contradiction arises because assuming √3 is rational forces both numerator and denominator to share a common factor, violating the simplest-form assumption.

∞ Euclid's proof of infinitely many primes

Claim: There are infinitely many prime numbers.

Historical note: Euclid discovered this proof around 300 B.C. Contradiction is used because no one has found a formula to generate all primes directly.

Proof outline:

Assume there are only finitely many primes. Let r be the total number of primes, and let S = {p₁, …, pᵣ} be the complete set of all primes.
Consider the number N = 1 + (product of all primes in S) = 1 + p₁ · p₂ · … · pᵣ.
N must be divisible by at least one prime q (every integer greater than 1 has a prime factorization).
But N is not divisible by any prime in S, because N ≡ 1 mod pᵢ for each i (dividing N by any pᵢ leaves remainder 1).
So q is a new prime not in S.
This contradicts the assumption that S contains all primes.
Therefore, there must be infinitely many primes.
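
A small numeric illustration of the construction, pretending that the first six primes are all the primes. Note that N itself need not be prime; the argument only needs some prime factor q outside S. The helper `smallest_prime_factor` is an assumed name using plain trial division.

```python
from math import prod

def smallest_prime_factor(n):
    """Trial division; returns the least prime dividing n (n >= 2)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

S = [2, 3, 5, 7, 11, 13]          # pretend these are all the primes
N = 1 + prod(S)
q = smallest_prime_factor(N)
print(N, q, q in S)                # 30031 59 False
print([N % p for p in S])          # [1, 1, 1, 1, 1, 1]
```

Here N = 30031 = 59 · 509 is composite, yet both of its prime factors are missing from S.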

Why contradiction is needed: There is no known simple formula that generates all primes, so the claim cannot be proved directly by exhibiting them.

⚠️ When to use contradiction

⚠️ Appropriate use cases

When no direct construction or formula exists (e.g., proving infinitely many primes).
When the negation of the claim is easier to work with logically.
When you need to show existence without being able to exhibit the object directly (the excerpt mentions "combinatorial existence proofs").

⚠️ Don't overuse

Sometimes proofs by contradiction are so fun that people use them more often than needed. It is better to use a direct proof when you can.

Contradiction can be elegant, but it is not always the simplest approach.
If you can prove something directly, that is usually clearer and more informative.
Example: Don't assume "not P" and derive P when you can just prove P from the start.

The Pigeonhole Principle

5.6 The Pigeonhole Principle

🧭 Overview

🧠 One-sentence thesis

The pigeonhole principle states that when objects are distributed into fewer containers than there are objects, at least one container must hold multiple objects, enabling proofs of grouping and existence in combinatorics.

📌 Key points (3–5)

Core mechanism: If you place more than n objects into n boxes, at least one box must contain 2 or more objects (Version 1).
Generalized version: If you place more than k·n objects into n boxes, at least one box must contain k+1 or more objects (Version 2).
Proof strategy: Uses contradiction—assume each box has at most the threshold number of objects, then show the total would be too small.
Common confusion: The principle says nothing about how many boxes are empty; all objects could be in one box.
Problem-solving approach: Identify k+1 (your goal), define n boxes cleverly so objects in the same box satisfy your goal, then verify object count exceeds k·n.

🎯 What the principle states

🎯 Version 1: Basic form

Pigeonhole principle Version 1: If we place more than n objects into n boxes, then some box contains 2 or more objects.

Plain language: More items than containers guarantees at least one container holds multiple items.
Why it works: If each box held at most 1 object, the total would be at most n objects—contradicting that we have more than n.
Example: 11 numbers chosen from 1 to 100, divided into 10 boxes (1–10, 11–20, ..., 91–100). Since 11 > 10, two numbers must share a box, so their difference is less than 10.

🎯 Version 2: Generalized form

Pigeonhole principle Version 2: Suppose k is a non-negative integer. If we place more than k·n objects into n boxes, then some box contains at least k+1 objects.

Relationship to Version 1: Substitute k=1 into Version 2 to recover Version 1.
Why it works: If each box held at most k objects, the total would be at most k·n objects—contradicting that we have more than k·n.
Example: Colorado population 5,612,000 people, n=1,000,000 boxes (one per possible hair count 0 to 999,999), k=5. Since 5,612,000 > 5·1,000,000, at least one box contains 6 or more people with the same hair count.

⚠️ What the principle does NOT say

Empty boxes: The principle says nothing about how many boxes are empty.
Distribution: It is possible that all objects are stuffed into the same box.
Don't confuse: The principle guarantees at least one box meets the threshold; it does not describe the distribution across all boxes.

🔧 How to apply the principle

🔧 Three-step problem-solving strategy

The excerpt recommends:

Identify k+1: Determine your goal (e.g., "at least 6 people" means k+1=6, so k=5).
Define n boxes: Design boxes so that if enough objects land in a single box, you accomplish the problem's goal.
Verify object count > k·n: Confirm the number of objects exceeds k·n to trigger the principle.
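
Step 3 can be turned into a one-line calculation: for a given number of objects and boxes, find the largest k with objects > k·n and report k+1. A minimal sketch, with `guaranteed_load` as an assumed name and at least one object assumed:

```python
def guaranteed_load(objects, boxes):
    """Largest k+1 such that objects > k * boxes, i.e. what Version 2 guarantees in some box."""
    return (objects - 1) // boxes + 1

print(guaranteed_load(11, 10))                # 2  (11 numbers, 10 intervals)
print(guaranteed_load(5_612_000, 1_000_000))  # 6  (Colorado hair counts)
```

The formula is the ceiling of objects divided by boxes, written with integer arithmetic.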

📐 Geometric example: Triangle target

Problem: An equilateral triangle target with side-length 2 meters is hit by 5 arrows. Prove two arrows strike within 1 meter of each other.

Solution:

Divide the triangle into n=4 smaller equilateral triangles, each with side-length 1 meter (the "boxes").
The 5 arrow striking points are the "objects."
Since 5 > 4, by Version 1, at least one smaller triangle contains at least 2 striking points.
The distance between any two points in the same smaller triangle is at most 1 meter (the side length).

🤝 Handshake example: Hidden constraint

Problem: At a party with n guests (n ≥ 2), some pairs shake hands (no one shakes their own hand). Prove at least two guests shook hands with the same number of people.

Solution:

Each guest shook between 0 and n−1 hands.
Label boxes 0 through n−1; place a person in box i if they shook exactly i hands.
Key insight: Either box 0 or box n−1 must be empty:
- If someone shook 0 hands, then no one shook all n−1 hands.
- If someone shook all n−1 hands, then no one shook 0 hands.
So we assign n people to n−1 boxes (either 0 through n−2, or 1 through n−1).
Since n > n−1, by Version 1, two people must be in the same box, so they shook the same number of hands.
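
For small parties the claim can be confirmed exhaustively by trying every possible set of handshakes. This sketch treats guests as numbers 0 to n−1; the function name is an assumption.

```python
from itertools import combinations, product

def some_degree_repeats(n):
    """Check every possible set of handshakes among n guests for a repeated handshake count."""
    pairs = list(combinations(range(n), 2))
    for chosen in product([False, True], repeat=len(pairs)):
        degree = [0] * n
        for (u, v), shook in zip(pairs, chosen):
            if shook:
                degree[u] += 1
                degree[v] += 1
        if len(set(degree)) == n:   # all n counts distinct: would be a counterexample
            return False
    return True

print(all(some_degree_repeats(n) for n in range(2, 6)))  # True
```

The search grows as 2 to the power (n choose 2), so it is only practical for a handful of guests; the pigeonhole argument covers every n ≥ 2.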

🪑 Consecutive seating problems

🪑 Row of chairs

Problem: 17 people seated in a row of 20 chairs. Prove some consecutive set of 5 chairs are filled.

Solution:

There are 3 empty chairs.
These separate the row into n=4 "boxes" (far left, center left, center right, far right), each containing consecutively filled chairs.
Let k+1=5, so k=4.
Since 17 > 16 = 4·4 = k·n, by Version 2, some box contains at least 5 consecutively filled chairs.

🔄 Circle of chairs

Problem: 17 people seated in a circle of 20 chairs. Prove some consecutive set of 6 chairs are filled.

Solution:

The 3 empty chairs partition the circle into n=3 "boxes" of consecutively filled chairs.
Let k+1=6, so k=5.
Since 17 > 15 = 5·3 = k·n, by Version 2, some box contains at least 6 consecutively filled chairs.

Don't confuse: Circle vs. row. The same 3 empty chairs create fewer segments in a circle (n=3) than in a row (n=4), because the two end segments of a row join into one. With fewer boxes, the same 17 people force a longer filled run: 6 chairs instead of 5.
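
Both seating results are small enough to verify by trying all 1140 placements of the 3 empty chairs. The sketch below also shows that the bounds 5 and 6 cannot be improved; function names are assumptions.

```python
from itertools import combinations

def longest_run(filled, circular):
    """Length of the longest block of consecutive filled chairs."""
    if all(filled):
        return len(filled)
    if circular:
        # Rotate so the list ends with an empty chair; then no run wraps around
        i = filled.index(False)
        filled = filled[i + 1:] + filled[:i + 1]
    best = run = 0
    for seat in filled:
        run = run + 1 if seat else 0
        best = max(best, run)
    return best

def worst_case(chairs, people, circular):
    """Smallest possible longest run over all seatings."""
    results = []
    for empty in combinations(range(chairs), chairs - people):
        filled = [i not in empty for i in range(chairs)]
        results.append(longest_run(filled, circular))
    return min(results)

print(worst_case(20, 17, circular=False))  # 5
print(worst_case(20, 17, circular=True))   # 6
```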

🎨 Application to graph coloring (Ramsey theory)

🎨 Triangle in a colored complete graph

Problem: Consider the complete graph K₆ with 6 vertices (every pair connected by an edge). If you color the edges green and gold, show there must be either a green triangle or a gold triangle.

Solution:

Pick one vertex v. It has 5 edges connecting it to the 5 other vertices.
Let n=2 (for the 2 colors) and k=2.
Since 5 > 2·2 = 4, by Version 2, at least 3 of these 5 edges have the same color (say green).
These 3 green edges connect v to 3 other vertices w, x, y.
Case analysis:
- If any edge among w, x, y is green (e.g., w–x), then v, w, x form a green triangle.
- If all three edges among w, x, y are gold, then w, x, y form a gold triangle.
Either way, a monochromatic triangle exists.
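
Since K₆ has only 15 edges, all 2 to the power 15 colorings can be checked directly. The sketch encodes the two colors as 0 and 1 (an assumption) and also shows that 6 vertices are needed: K₅ has a coloring with no monochromatic triangle.

```python
from itertools import combinations, product

def has_mono_triangle(n, color):
    """color maps each edge (u, v) with u < v to 0 or 1."""
    return any(color[(a, b)] == color[(a, c)] == color[(b, c)]
               for a, b, c in combinations(range(n), 3))

def every_coloring_has_triangle(n):
    """Try all 2-colorings of the edges of the complete graph on n vertices."""
    edges = list(combinations(range(n), 2))
    return all(has_mono_triangle(n, dict(zip(edges, bits)))
               for bits in product((0, 1), repeat=len(edges)))

print(every_coloring_has_triangle(6))  # True
print(every_coloring_has_triangle(5))  # False
```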

Why this matters: This is an example of Ramsey theory, which studies unavoidable patterns in large structures.

Additional Problems for Chapter 5

5.7 Additional problems for Chapter 5

🧭 Overview

🧠 One-sentence thesis

This problem set applies combinatorial reasoning, the pigeonhole principle, and modular arithmetic to prove identities, spatial constraints, and number-theoretic results like Fermat's Little Theorem.

📌 Key points (3–5)

Combinatorial proofs: demonstrate identities by counting the same set in two different ways or by constructing bijections between sets.
Pigeonhole principle applications: prove that certain configurations (consecutive filled chairs, nearby points, number differences) must exist when items are distributed into limited containers.
Fermat's Little Theorem: states that a to the power p is congruent to a modulo p for any prime p and positive integer a; can be proven algebraically or combinatorially.
Common confusion: pigeonhole problems in rows vs circles require different consecutive-set sizes because circles have no endpoints.
Induction and well-ordering: these principles are closely related but differ in structure—induction builds upward step-by-step, while well-ordering guarantees a smallest element in any nonempty subset.

🧮 Combinatorial proof techniques

🧮 What combinatorial proofs are

Combinatorial proof: a demonstration that two expressions are equal by showing they count the same set in two different ways, or by constructing a bijection between two sets.

Instead of algebraic manipulation, you interpret both sides of an equation as answers to counting questions.
The excerpt asks for proofs of factorial formulas, binomial coefficient identities, and summation identities using these methods.

🔢 Factorial and power formulas

The problems ask to prove:

n factorial equals n times (n−1) times (n−2) down to 3 times 2 times 1
n to the power k equals the product of k copies of n
The falling factorial n times (n−1) times ... times (n−k+1) equals n factorial divided by (n−k) factorial
The binomial coefficient (n choose k) equals n factorial divided by the product of k factorial and (n−k) factorial

How to approach: interpret each side as counting arrangements or selections.

Example: n factorial counts the number of ways to arrange n distinct objects in a row; the product n·(n−1)·...·1 counts the same thing by choosing the first position (n choices), then the second (n−1 choices), etc.

🔗 Summation identities

The excerpt includes:

Sum from k=0 to n of k times (n choose k) squared equals n times (2n−1 choose n−1)
Sum from k=0 to n of (n choose k) times s to the power k times t to the power (n−k) equals (s+t) to the power n (binomial theorem with positive integers s, t, n)
Sum from k=0 to n of 2 to the power k times (n choose k) equals 3 to the power n

How to approach: find a counting scenario where one side counts by cases and the other counts directly.

Example: for the identity involving 2 to the power k times (n choose k), think of choosing a subset of n items and then assigning each chosen item one of two additional labels; the left side counts by subset size, the right side counts all possibilities at once (each of n items has 3 states: not chosen, chosen with label 1, chosen with label 2).
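
Before hunting for a counting argument, it can help to confirm an identity numerically. A sketch that checks the three identities above for small n; the sample values s = 2 and t = 5 and the function name are arbitrary assumptions.

```python
from math import comb

def check_identities(n, s=2, t=5):
    """Return whether each of the three summation identities holds for this n."""
    squares = sum(k * comb(n, k) ** 2 for k in range(n + 1)) == n * comb(2 * n - 1, n - 1)
    binomial = sum(comb(n, k) * s**k * t**(n - k) for k in range(n + 1)) == (s + t) ** n
    powers = sum(2**k * comb(n, k) for k in range(n + 1)) == 3**n
    return squares, binomial, powers

print(all(all(check_identities(n)) for n in range(1, 12)))  # True
```

A numeric check is evidence, not a proof; the exercises still ask for a combinatorial argument.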

🕳️ Pigeonhole principle problems

🕳️ What the pigeonhole principle says

Pigeonhole principle: if items are placed into fewer containers than there are items, at least one container must hold more than one item.

The excerpt applies this to prove existence of number pairs with small differences, points within a certain distance, and consecutive filled chairs.

🔢 Number and point problems

Problem 8 (from earlier in the excerpt): if 17 numbers are chosen from 1 to 100, show that two of them have a difference less than 17.

Divide the range 1–100 into intervals (pigeonholes); with 17 numbers and fewer than 17 intervals, two numbers must fall in the same interval, guaranteeing a small difference.

Problem 9: among any 10 points inside a unit square, there must exist 3 of them all within square root of 2 divided by 2 of each other.

Partition the unit square into 4 squares of side 1/2 (pigeonholes); since 10 > 2·4 = 8, Version 2 with k=2 puts at least 3 points in one small square, and any two points in that square are at most its diagonal, square root of 2 divided by 2, apart.

🪑 Consecutive chair problems

The excerpt presents several variations:

Setup	Result	Key difference
94 people, 100 chairs in a row	Some consecutive 14 chairs are filled	Row has endpoints
94 people, 100 chairs in a circle	Some consecutive 16 chairs are filled	Circle has no endpoints; requires larger set
57 people, 64 chairs in a circle	At least 9 consecutively filled chairs	Fewer people relative to chairs
75 people, 80 chairs in a row	At least 13 consecutively filled chairs	More people relative to chairs

How to approach: as in Section 5.6, let the empty chairs cut the filled chairs into runs (pigeonholes). With e empty chairs there are e+1 runs in a row and e runs in a circle; apply Version 2 with the people as objects to show that some run is long enough.

Don't confuse: rows and circles require different arguments because in a circle, every position is equivalent (no first or last chair), which changes the pigeonhole structure.

🎯 Group difference problem

Problem 4: if 30 numbers are chosen from 1 to 82, show that there is a group of at least 5 of them such that the difference between any two numbers in this group is less than 12.

Partition the range into 7 intervals of at most 12 consecutive numbers (pigeonholes): 1–12, 13–24, ..., 73–82.
Since 30 > 4·7 = 28, Version 2 with k=4 guarantees that some interval contains at least 5 of the chosen numbers.
All numbers in the same interval have pairwise differences less than 12.

🔁 Induction and well-ordering

🔁 Well-ordering principle

Problem 7a: look up the statement of the well-ordering principle and explain how it is similar to and different from induction.

The excerpt does not provide the statement itself, but asks you to compare it to induction.
The problem hints that well-ordering guarantees any nonempty subset of natural numbers has a least element.

🔁 Proving well-ordering from strong induction

Problem 7b: prove the well-ordering principle using only strong induction and the axiom that 0 is smaller than every positive integer.

Define P(n) as "any subset of the natural numbers containing n has a least element."
Base case: prove P(0)—if a subset contains 0, then 0 is the least element (by the axiom).
Inductive step: assume P(0), P(1), ..., P(n) are all true; prove P(n+1).
- If a subset contains n+1, either it contains some smaller number (in which case P(0)...P(n) guarantee a least element) or n+1 is the smallest.
This shows that every nonempty subset of the natural numbers has a least element, proving well-ordering.

Don't confuse: induction proves statements for all n by building upward; well-ordering guarantees a minimum exists in any nonempty set—they are logically equivalent but conceptually distinct.

🔐 Fermat's Little Theorem

🔐 Statement of the theorem

Fermat's Little Theorem: for any prime p and positive integer a, a to the power p is congruent to a modulo p.

Equivalently: p divides (a to the power p minus a).
Example: when p=7, then 7 divides (1 to the power 7 minus 1), 7 divides (2 to the power 7 minus 2), 7 divides (3 to the power 7 minus 3), etc.
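
The congruence is easy to test with Python's three-argument `pow`, which computes a power modulo p efficiently. A sketch, with `fermat_holds` and the cutoff `a_max` as assumed names:

```python
def fermat_holds(p, a_max=50):
    """Check a**p ≡ a (mod p) for a = 1..a_max using modular exponentiation."""
    return all(pow(a, p, p) == a % p for a in range(1, a_max + 1))

print([p for p in range(2, 20) if fermat_holds(p)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```

In this range only the primes pass. Passing the check does not certify primality in general: some composites, such as 561, also satisfy the congruence for every a.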

🧪 Algebraic proof via binomial expansion

Problem 8: prove Fermat's Little Theorem using modular arithmetic and binomial coefficients.

(a) Generalize a previous exercise to show: (x₁ + ... + xₐ) to the power p is congruent to (x₁ to the power p + ... + xₐ to the power p) modulo p for any a ≥ 2.
- This uses the fact that binomial coefficients (p choose k) for 0 < k < p are divisible by p when p is prime.
(b) Substitute x₁ = 1, ..., xₐ = 1 (so there are a copies of 1).
- Left side: (1 + 1 + ... + 1) to the power p = a to the power p.
- Right side: 1 to the power p + ... + 1 to the power p = a.
- Therefore a to the power p ≡ a (mod p).

🎲 Combinatorial proof via sequences

Problem 9: prove Fermat's Little Theorem by counting sequences.

(a) Let S be the set of sequences of length p with entries from the alphabet {1, ..., a} such that not all entries are the same.
- Total sequences: a to the power p.
- Sequences with all entries the same: a.
- Therefore |S| = a to the power p minus a.
(b) Define an equivalence relation: two sequences are equivalent if one is a rotation of the other.
- Example: when p=5, (1,2,3,4,5) is equivalent to (2,3,4,5,1).
- Each equivalence class has size p: when p is prime and not all entries are the same, the p rotations of a sequence are all distinct.
- Why p must be prime: if p were composite and a sequence had a repeating pattern with period dividing p, rotations would not all be distinct; primality ensures no such period exists.
(c) By the division principle, the number of equivalence classes is |S| divided by p = (a to the power p minus a) divided by p.
- This must be an integer, so p divides (a to the power p minus a).

Don't confuse: the algebraic proof uses properties of binomial coefficients; the combinatorial proof uses counting and symmetry—both are valid but use different reasoning.
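
The rotation classes in the combinatorial proof can be listed explicitly for small cases. The sketch below groups the non-constant sequences by rotation and reports the class sizes, comparing a prime length with a composite one; the function name is an assumption.

```python
from itertools import product

def rotation_class_sizes(p, a):
    """Group non-constant length-p sequences over {1..a} by rotation; return sorted class sizes."""
    seen, sizes = set(), []
    for seq in product(range(1, a + 1), repeat=p):
        if len(set(seq)) == 1 or seq in seen:
            continue
        rotations = {seq[i:] + seq[:i] for i in range(p)}
        seen |= rotations
        sizes.append(len(rotations))
    return sorted(sizes)

print(rotation_class_sizes(5, 2))  # [5, 5, 5, 5, 5, 5]
print(rotation_class_sizes(4, 2))  # [2, 4, 4, 4]
```

For p = 5 the 2⁵ − 2 = 30 sequences fall into 6 classes of size 5. For length 4 the sequence (1, 2, 1, 2) has only 2 distinct rotations, so the division argument breaks down.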

Pascal's Triangle Modulo 2 and Sierpinski's Triangle

5.8 Investigation: Pascal’s triangle modulo 2 and Sierpinski’s triangle

🧭 Overview

🧠 One-sentence thesis

Pascal's triangle modulo 2 reveals a self-similar fractal pattern that exactly matches the structure of Sierpinski's triangle, demonstrating a deep connection between combinatorics and geometric fractals.

📌 Key points (3–5)

What modulo 2 means for Pascal's triangle: replace all even binomial coefficients with 0 and all odd coefficients with 1 to reveal hidden patterns.
Key structural property: rows that are powers of 2 (n = 2^m) contain only 1's at the first and last positions, with all middle entries being 0.
Self-similarity pattern: the rows from n = 0 to n = 2^m - 1 repeat on both the left and right sides of rows n = 2^m to n = 2^(m+1) - 1, with a triangle of zeros in the middle.
Common confusion: the pattern is not about the original binomial coefficient values but about their remainders when divided by 2 (odd vs even).
Fractal connection: the 1's in Pascal's triangle mod 2 outline Sierpinski's triangle, a famous self-similar fractal shape.

🔢 Understanding Pascal's Triangle Modulo 2

🔢 Converting to modulo 2

If an integer a is even, then a ≡ 0 mod 2; if a is odd, then a ≡ 1 mod 2.

The process: take each entry in Pascal's triangle and replace it with its remainder when divided by 2.
Even numbers become 0, odd numbers become 1.
Example from the excerpt:
- Row n = 4: 1, 4, 6, 4, 1 becomes 1, 0, 0, 0, 1
- Row n = 5: 1, 5, 10, 10, 5, 1 becomes 1, 1, 0, 0, 1, 1

🔍 Computing the next row

You can determine the next row of Pascal's triangle mod 2 from the previous row without computing the original triangle.
The excerpt asks students to discover this method through observation of patterns.
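
One standard rule, which follows from Pascal's recurrence reduced mod 2, is that each interior entry is the exclusive-or of the two entries above it. The excerpt leaves the rule for students to find, so treat this sketch as one possible answer; `next_row_mod2` is an assumed name.

```python
def next_row_mod2(row):
    """Each interior entry is the sum mod 2 (XOR) of the two entries above it."""
    return [1] + [a ^ b for a, b in zip(row, row[1:])] + [1]

row = [1]                      # row n = 0
for n in range(9):
    print(n, "".join(map(str, row)))
    row = next_row_mod2(row)
```

In the printed triangle, row n = 4 appears as 10001 and row n = 5 as 110011, matching the examples above.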

🎯 The Power-of-2 Row Property

🎯 Statement of the key proposition

Proposition 5.8.4: If n = 2^m is a power of 2, then the numbers in row n of Pascal's triangle mod 2 are all 0 except for the first and last entry.

This means rows n = 1, 2, 4, 8, 16, 32, ... have the pattern: 1, 0, 0, ..., 0, 1
Rows consisting entirely of 1's occur just before each power of 2, at n = 2^m - 1: after row n = 3 (which is 1, 1, 1, 1), the next all-1's row is n = 7.

🛠️ Three proof methods

The excerpt outlines three different approaches to prove this proposition:

Method	Key idea
Factorial/prime factorization	Show the numerator of (2^m choose k) has more factors of 2 than the denominator
Algebraic	Use induction to show (x + y)^(2^m) ≡ x^(2^m) + y^(2^m) mod 2
Bijective/combinatorial	Count subsets and show the count is even using complements and induction

🧮 The algebraic approach details

Step 1: Show that (x + y)^2 ≡ x^2 + y^2 mod 2 (base case)
Step 2: Use induction on m, assuming (x + y)^(2^(m-1)) ≡ x^(2^(m-1)) + y^(2^(m-1)) mod 2
Step 3: Square both sides to get the result for 2^m
The key insight: reducing coefficients (but not exponents) modulo 2 eliminates all middle terms

🔁 Self-Similarity and Repetition Patterns

🔁 The upside-down triangle of zeros

In rows n = 2^m through n = 2^(m+1) - 2, there is an upside-down triangular region of zeros in the middle: it is widest in row 2^m and shrinks by one entry per row.
The excerpt uses a "ball-dropping" metaphor: balls placed at the edges of row n = 2^m can only land in certain positions, creating this zero region.

🪞 Left and right repetition

The rows from n = 0 to n = 2^m - 1 repeat on both the left-hand side and right-hand side of rows n = 2^m to n = 2^(m+1) - 1.
Mathematically: if 0 ≤ n ≤ 2^m - 1 and 0 ≤ k ≤ n, then (n choose k) ≡ (n + 2^m choose k) mod 2 and (n choose k) ≡ (n + 2^m choose k + 2^m) mod 2.
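
The two congruences can be spot-checked numerically. The sketch below is a check for small m, not a proof, and the function name check_repetition is made up for illustration.

```python
from math import comb


def check_repetition(m):
    """Check both congruences for every 0 <= k <= n <= 2^m - 1."""
    shift = 2 ** m
    for n in range(shift):
        for k in range(n + 1):
            base = comb(n, k) % 2
            left = comb(n + shift, k) % 2           # left-hand copy
            right = comb(n + shift, k + shift) % 2  # right-hand copy
            if base != left or base != right:
                return False
    return True


print(all(check_repetition(m) for m in range(6)))  # True
```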

🧪 Proving the pattern

The excerpt shows that (x + y)^(n + 2^m) ≡ (x^(2^m) + y^(2^m)) · (x + y)^n mod 2.

This algebraic identity explains both the upside-down triangle of zeros and the self-similar repetition.
The factor (x^(2^m) + y^(2^m)) creates the splitting into left and right copies.

🔺 Connection to Sierpinski's Triangle

🔺 What is Sierpinski's triangle

Sierpinski's triangle is a famous example of a fractal, which is a shape that exhibits self-similarity.

It is made up of three shrunken copies of itself, each half the total size, arranged in a triangle.
Construction method 1: Start with a small triangle, copy it to lower left and lower right, repeat infinitely.
Construction method 2: Start with a black equilateral triangle, divide into 4 smaller triangles, color the middle one white, repeat for each remaining black triangle.

🎨 The pattern match

The pattern of 1's in Pascal's triangle mod 2 outlines a Sierpinski triangle shape.
The self-similarity discovered in Pascal's triangle mod 2 (rows repeating on left and right) is exactly the same as the self-similarity in Sierpinski's triangle.
Example activity: Print the first 16 rows of Pascal's triangle mod 2 and connect the 1's with lines in three triangular directions to see the Sierpinski pattern emerge.
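
A minimal sketch for producing that printout follows. It computes each entry directly with math.comb and shows 0's as dots so the triangles stand out. The dot and the function name sierpinski_rows are display choices made here, not taken from the excerpt.

```python
from math import comb


def sierpinski_rows(num_rows=16):
    """Return centred text lines for Pascal's triangle mod 2."""
    lines = []
    for n in range(num_rows):
        # "1" for odd coefficients, "." for even ones.
        cells = " ".join("1" if comb(n, k) % 2 else "." for k in range(n + 1))
        lines.append(cells.center(2 * num_rows - 1))
    return lines


print("\n".join(sierpinski_rows()))
```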

🌀 Why this connection exists

Both structures exhibit the same recursive pattern: a shape made of three half-size copies of itself (top, lower left, and lower right) with a gap in the middle.
The algebraic structure of binomial coefficients modulo 2 naturally produces the geometric structure of the Sierpinski fractal.
Don't confuse: Sierpinski's triangle is the infinite limit of the construction process, while any finite printout of Pascal's triangle mod 2 is only an approximation.

Fibonacci numbers

6.1 Fibonacci numbers

🧭 Overview

🧠 One-sentence thesis

The Fibonacci sequence, defined recursively as F_n = F_(n−1) + F_(n−2), appears in diverse counting problems—from rabbit populations to staircase climbing—and can be computed directly using a closed-form formula involving the golden ratio.

📌 Key points (3–5)

What a recursive relation is: a formula that defines each term in a sequence using previous terms in that sequence.
The Fibonacci recurrence: F_1 = 1, F_2 = 1, and for n ≥ 3, F_n = F_(n−1) + F_(n−2), producing the sequence 1, 1, 2, 3, 5, 8, 13, 21, ...
Staircase problem connection: the number of ways to climb n stairs (taking 1 or 2 steps at a time) equals F_(n+1), proven by showing the same recurrence holds.
Common confusion: the staircase sequence S_n and Fibonacci F_n follow the same recurrence but are shifted—S_n = F_(n+1), not S_n = F_n.
Closed-form formula: F_n can be computed directly using the golden ratio α = (1 + √5)/2 without calculating all prior terms, though the formula surprisingly involves irrational numbers yet always yields integers.

🐇 The rabbit problem and Fibonacci definition

🐇 Original rabbit question

The Fibonacci numbers were first published in 1202 by Leonardo of Pisa in response to this question:

Suppose that rabbits mature in one month, that an adult pair of rabbits produces exactly one baby pair of rabbits each month, and that no rabbits ever die. Starting with one pair of baby rabbits in month 1, how many pairs of rabbits will there be in month 17?

The answer is F_17 = 1597 pairs.
The problem is completely unrealistic, but it led to a sequence that appears in nature (leaves on stems, pine cones, artichokes, pineapples, bee family trees) and applications in economics, logic, optics, and pseudo-random number generators.

📐 Formal definition

Recursive relation: A formula for a_n in terms of the previous values in the sequence a_1, a_2, ...

n-th Fibonacci number F_n: Set F_1 = 1 and F_2 = 1. For n ≥ 3, F_n = F_(n−1) + F_(n−2).

Each term is the sum of the two preceding terms.
The sequence begins: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, ...
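
The definition translates directly into a short loop. The sketch below uses the indexing F_1 = F_2 = 1 from the definition above; the function name fibonacci is illustrative.

```python
def fibonacci(n):
    """Return F_n with F_1 = F_2 = 1, computed iteratively."""
    if n < 1:
        raise ValueError("n must be at least 1")
    prev, curr = 1, 1  # F_1, F_2
    for _ in range(n - 2):
        prev, curr = curr, prev + curr
    return curr


print([fibonacci(n) for n in range(1, 11)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(fibonacci(17))  # 1597
```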

🪜 The staircase climbing problem

🪜 The question

A staircase has n steps. How many ways can you go up the stairs if you can take one or two steps at a time and never go down?

Let S_n be the number of ways to climb n steps.
Example: for n = 4, there are 5 ways: (1,1,1,1), (1,1,2), (1,2,1), (2,1,1), (2,2).

🔍 Why the recurrence holds

The excerpt explains the logic for S_4 = S_3 + S_2:

To climb 4 steps, you must start with either:
- a size-1 step, leaving 3 stairs remaining → S_3 ways to proceed
- a size-2 step, leaving 2 stairs remaining → S_2 ways to proceed
By the addition principle, S_4 = S_3 + S_2.

More generally, for n ≥ 3:

Every sequence of steps up n stairs starts with either a size-1 step (leaving n−1 stairs, so S_(n−1) ways) or a size-2 step (leaving n−2 stairs, so S_(n−2) ways).
Therefore S_n = S_(n−1) + S_(n−2).

🔗 Relationship to Fibonacci numbers

Lemma 6.1.5: S_n = F_(n+1) for all n ≥ 1.

Proof sketch (using strong induction):

Base cases: S_1 = 1 = F_2 and S_2 = 2 = F_3.
Inductive step: Assume S_i = F_(i+1) for all 1 ≤ i ≤ n−1. Then:
- S_n = S_(n−1) + S_(n−2) (by the recurrence for stairs)
- = F_n + F_(n−1) (by the inductive hypothesis)
- = F_(n+1) (by the Fibonacci recurrence).

Don't confuse: The staircase values S_n are Fibonacci numbers, but shifted left by one index compared to F_n. For example, S_5 = F_6 = 8.

n (stairs)	1	2	3	4	5
S_n	1	2	3	5	8
Corresponding F	F_2	F_3	F_4	F_5	F_6
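
The lemma can be spot-checked by brute force for small n. The sketch below enumerates every sequence of 1's and 2's and keeps those that sum to n. It is practical only for small staircases, and both function names are illustrative.

```python
from itertools import product


def count_climbs(n):
    """Count sequences of 1s and 2s that sum to n by brute-force enumeration."""
    total = 0
    for length in range(1, n + 1):
        for steps in product((1, 2), repeat=length):
            if sum(steps) == n:
                total += 1
    return total


def fibonacci(n):
    """Return F_n with F_1 = F_2 = 1."""
    prev, curr = 1, 1
    for _ in range(n - 2):
        prev, curr = curr, prev + curr
    return curr


print([count_climbs(n) for n in range(1, 6)])   # [1, 2, 3, 5, 8]
print([fibonacci(n + 1) for n in range(1, 6)])  # [1, 2, 3, 5, 8]
```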

🌟 The golden ratio and closed-form formula

🌟 What the golden ratio is

Golden ratio: α = (1 + √5)/2 ≈ 1.61803...

Conjugate: α̅ = (1 − √5)/2 ≈ −0.61803...

Both α and α̅ are roots of the quadratic polynomial f(x) = x² − x − 1 (by the quadratic formula).
The golden ratio is important in Greek architecture and natural objects like sea shells for the ratio of length to height.

📐 Closed-form formula for Fibonacci numbers

Theorem 6.1.6: For all n ≥ 0, F_n = (1/√5) × [ ((1+√5)/2)^n − ((1−√5)/2)^n ].

In words: F_n equals one over the square root of 5, times the difference between α to the n-th power and α̅ to the n-th power.

Why this is surprising:

The right-hand side involves irrational numbers (√5) but always produces an integer.
It is not initially clear why the Fibonacci numbers are connected to the golden ratio.

🔢 Practical consequence

Drawback of recursion: computing F_100 requires computing F_1 through F_99 first.
Closed-form advantage: the formula allows direct computation of F_n without computing all prior terms.
Approximation: Because |α̅| < 1, powers of α̅ approach 0 as n grows large. Therefore F_n ≈ α^n/√5 for large n; in fact F_n is the integer nearest to α^n/√5.
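
A floating-point sketch of the closed form is given below. It is an illustration under one stated assumption: ordinary double-precision floats are used, which are accurate enough only for moderate n (roughly n below 70), so an exact value of F_100 would need exact arithmetic instead. The names are illustrative.

```python
from math import sqrt

ALPHA = (1 + sqrt(5)) / 2      # golden ratio
ALPHA_BAR = (1 - sqrt(5)) / 2  # its conjugate


def fib_closed_form(n):
    """Evaluate the closed-form formula in floating point."""
    return (ALPHA ** n - ALPHA_BAR ** n) / sqrt(5)


def fib_rounded(n):
    """Drop the small ALPHA_BAR term and round to the nearest integer."""
    return round(ALPHA ** n / sqrt(5))


print(fib_rounded(17))  # 1597
```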

Note: The excerpt does not prove Theorem 6.1.6 in this chapter; it mentions that explanation appears in Section 6.2 (Example 6.2.14) and that induction proof will be revisited in Chapter 7.

🧮 Additional Fibonacci patterns (from exercises)

🧮 Sums and identities

The excerpt includes several patterns proven by induction:

Identity	Formula	Meaning
Sum of first n Fibonacci	F_1 + F_2 + ⋯ + F_n = F_(n+2) − 1	The sum telescopes to one less than F_(n+2)
Sum of odd-indexed	F_1 + F_3 + F_5 + ⋯ + F_(2n−1) = F_(2n)	Odd-indexed Fibonacci sum to an even-indexed one
Sum of squares	F₁² + F₂² + ⋯ + F_n² = F_n × F_(n+1)	Squares sum to product of consecutive terms
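
The three identities can be spot-checked numerically before attempting the induction proofs. The sketch below tests n = 1 to 14 only, so it is evidence rather than proof; fib_list is an illustrative helper.

```python
def fib_list(count):
    """Return [F_1, ..., F_count]."""
    values = [1, 1]
    while len(values) < count:
        values.append(values[-1] + values[-2])
    return values[:count]


F = [0] + fib_list(40)  # pad with a dummy entry so that F[i] is F_i

for n in range(1, 15):
    assert sum(F[1:n + 1]) == F[n + 2] - 1                            # sum of first n
    assert sum(F[2 * i - 1] for i in range(1, n + 1)) == F[2 * n]     # odd-indexed
    assert sum(F[i] ** 2 for i in range(1, n + 1)) == F[n] * F[n + 1]  # squares
print("all three identities hold for n = 1..14")
```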

🎨 Other counting problems yielding Fibonacci

The exercises describe additional scenarios that produce Fibonacci numbers:

Domino tiling: D_n = number of ways to cover an n-by-2 board with 2-by-1 dominoes (no overlap) equals F_(n+1).
Subway seats: S_n = number of ways to fill n seats so no two people sit next to each other equals F_(n+2) for n ≥ 3.
Rock painting: number of ways to paint n rocks silver or navy so no two silver rocks are adjacent equals F_(n+2).
Subset without consecutive integers: the number of subsets of {1, 2, 3, ..., 9} containing no two consecutive integers is a Fibonacci number.

Common structure: All these problems involve "take it or leave it" choices with a constraint that prevents two consecutive selections, leading to the same recurrence S_n = S_(n−1) + S_(n−2).

Linear recurrence relations

6.2 Linear recurrence relations

🧭 Overview

🧠 One-sentence thesis

Linear recurrence relations can be solved systematically using the characteristic polynomial method to produce closed-form formulas that avoid the computational slowdown of recursive calculation.

📌 Key points (3–5)

What a linear recurrence relation is: an equation defining each term as a weighted sum of previous terms (e.g., a_n = c₁·a_{n-1} + c₂·a_{n-2}).
Why closed-form formulas matter: computers compute early entries quickly but slow down dramatically for later entries (e.g., the 100th term); closed formulas solve this.
The characteristic polynomial method: convert the recurrence into a polynomial, find its roots, then express a_n as a sum of powers of those roots.
Common confusion: the characteristic polynomial is not the recurrence relation itself—it is x^d - c₁·x^{d-1} - ... - c_d, where the signs flip and x replaces the index.
Order matters: order 2 recurrences need 2 initial values and produce 2 roots; order d recurrences need d initial values and produce d roots.

🧩 Core definitions and examples

🧩 What is a linear recurrence relation?

Linear recurrence relation of order d: an equation of the form a_n = c₁·a_{n-1} + c₂·a_{n-2} + ... + c_d·a_{n-d}, where c₁, ..., c_d are constants and c_d ≠ 0.

Each term is a linear combination of the previous d terms.
The initial values are the first d terms: a₁, a₂, ..., a_d.
Sometimes called "order d homogeneous linear recurrence with constant coefficients."

🪜 Example: Staircase climbing (order 3)

Problem: How many ways can you climb n stairs if you can take 1, 2, or 3 steps at a time?

Let T_n = number of ways to climb n stairs.
Recurrence: T_n = T_{n-1} + T_{n-2} + T_{n-3}.
Why: every sequence starts with either a 1-step (leaving T_{n-1} ways for the rest), a 2-step (leaving T_{n-2} ways), or a 3-step (leaving T_{n-3} ways).
Base cases (computed by hand): T₁ = 1, T₂ = 2, T₃ = 4.
Using the recurrence: T₄ = 7, T₅ = 13, T₆ = 24, T₇ = 44, T₈ = 81, T₉ = 149.

Example: For 9 stairs, there are 149 distinct sequences of steps.

🐝 Example: Wasp colony (order 2)

wasp_n = wasp_{n-1} + 20·wasp_{n-2}, with wasp₁ = 1, wasp₂ = 1.
This is linear of order 2 with constants c₁ = 1, c₂ = 20.
The excerpt notes that computing wasp(100) directly via recursion becomes very slow.

🔧 The characteristic polynomial method (order 2)

🔧 Building the characteristic polynomial

Characteristic polynomial of a_n = c₁·a_{n-1} + c₂·a_{n-2}: the degree 2 polynomial c(x) = x² - c₁·x - c₂.

Don't confuse: the recurrence has plus signs; the characteristic polynomial has minus signs.
The roots r₁ and r₂ are found by the quadratic formula: (c₁ ± √(c₁² + 4c₂)) / 2.

🧮 The closed-form formula

Theorem (order 2, distinct roots): If r₁ ≠ r₂, then there exist unique constants z₁ and z₂ such that

a_n = z₁·r₁ⁿ + z₂·r₂ⁿ for all n ≥ 1.

To find z₁ and z₂, substitute n = 1 and n = 2 into the formula and solve the system:
- a₁ = z₁·r₁ + z₂·r₂
- a₂ = z₁·r₁² + z₂·r₂²

🌟 Example: Fibonacci numbers

Recurrence: F_n = F_{n-1} + F_{n-2}, with F₁ = 1, F₂ = 1.
Characteristic polynomial: c(x) = x² - x - 1.
Roots: α = (1 + √5)/2 (the golden ratio) and α̅ = (1 - √5)/2.
Solving for z₁ and z₂ using the initial values:
- 1 = z₁·α + z₂·α̅
- 1 = z₁·α² + z₂·α̅²
Result: z₁ = 1/√5, z₂ = -1/√5.
Closed formula: F_n = (1/√5)·αⁿ - (1/√5)·α̅ⁿ.

Example: This formula lets you compute F₁₀₀ instantly, whereas recursive computation becomes impractically slow.

🚀 Higher-order recurrences

🚀 General definition (order d)

Linear recurrence relation of order d: a_n = c₁·a_{n-1} + c₂·a_{n-2} + ... + c_d·a_{n-d}, where c_d ≠ 0.

Requires d initial values: a₁, a₂, ..., a_d.
The staircase problem (T_n = T_{n-1} + T_{n-2} + T_{n-3}) is order 3 with c₁ = c₂ = c₃ = 1 and initial values T₁ = 1, T₂ = 2, T₃ = 4.

🔍 Characteristic polynomial (order d)

Characteristic polynomial of order d recurrence: c(x) = x^d - c₁·x^{d-1} - c₂·x^{d-2} - ... - c_{d-1}·x - c_d.

The polynomial has degree d, so it has d roots (call them r₁, r₂, ..., r_d).

🧮 Closed-form formula (order d)

Theorem (order d, distinct roots): If the d roots r₁, ..., r_d are all different, then there exist unique constants z₁, ..., z_d such that

a_n = z₁·r₁ⁿ + z₂·r₂ⁿ + ... + z_d·r_dⁿ for all n ≥ 1.

To find z₁, ..., z_d, substitute n = 1, 2, ..., d into the formula and solve the resulting system of d equations.
Don't confuse: the theorem requires distinct roots; if roots repeat, the formula changes (not covered in this excerpt).

💻 Implementing recurrences in SAGE

💻 Defining a recursive function

The excerpt shows how to define the staircase sequence T_n in SAGE:

def T(x):
    if x == 1: return 1
    if x == 2: return 2
    if x == 3: return 4
    return T(x-1) + T(x-2) + T(x-3)

The base cases (x = 1, 2, 3) are hard-coded.
For x ≥ 4, the function calls itself recursively.
To print the first 10 values: [print(T(n)) for n in [1..10]].
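
For readers working outside SAGE, the same function can be sketched in plain Python. The version below adds memoization with functools.lru_cache, which is a technique swapped in here and not part of the excerpt: each T(n) is computed once, so the repeated recursive calls no longer pile up.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """Number of ways to climb n stairs with steps of size 1, 2, or 3."""
    if n == 1:
        return 1
    if n == 2:
        return 2
    if n == 3:
        return 4
    return T(n - 1) + T(n - 2) + T(n - 3)


print([T(n) for n in range(1, 10)])  # [1, 2, 4, 7, 13, 24, 44, 81, 149]
print(T(100))  # finishes quickly because earlier values are cached
```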

💻 Finding roots and constants

The excerpt demonstrates the workflow for Fibonacci:

Define the characteristic polynomial: c = x^2 - x - 1.
Find roots: c.roots() returns a list; extract r₁ and r₂.
Set up equations for z₁ and z₂ using the initial values.
Solve the system: solve(eqnfibo, z1, z2).
Define the closed-form function and compute large values (e.g., Fibo(100)).

Why this matters: The recursive definition of T(100) or fibo(100) becomes extremely slow; the closed-form formula computes it instantly.

💻 Example: Wasp colony in SAGE

Define the recurrence: wasp_n = wasp_{n-1} + 20·wasp_{n-2}.
Characteristic polynomial: c(x) = x² - x - 20.
Find roots r₁ and r₂, then solve for z₁ and z₂ using wasp₁ = 1, wasp₂ = 1.
Check by computing wasp₃; then compute wasp₁₀₀ using the closed formula.
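
The same workflow can be sketched in plain Python with exact fractions. The roots 5 and -4 come from factoring x² - x - 20 = (x - 5)(x + 4). Solving the 2-by-2 system with Cramer's rule is a choice made for this sketch and is not the excerpt's SAGE code.

```python
from fractions import Fraction


def wasp_recurrence(n):
    """Return wasp_n directly from the recurrence."""
    prev, curr = 1, 1  # wasp_1, wasp_2
    for _ in range(n - 2):
        prev, curr = curr, curr + 20 * prev
    return curr


R1, R2 = 5, -4  # roots of x^2 - x - 20

# Solve z1*R1 + z2*R2 = 1 and z1*R1^2 + z2*R2^2 = 1 by Cramer's rule.
det = R1 * R2**2 - R2 * R1**2
Z1 = Fraction(R2**2 - R2, det)
Z2 = Fraction(R1 - R1**2, det)


def wasp_closed_form(n):
    """Return wasp_n from z1*r1^n + z2*r2^n using exact arithmetic."""
    value = Z1 * R1**n + Z2 * R2**n
    return int(value)


print(Z1, Z2)               # 1/9 -1/9
print(wasp_closed_form(3))  # 21
```

Under these assumptions the closed formula is wasp_n = (5ⁿ - (-4)ⁿ)/9, which agrees with wasp₃ = 1 + 20·1 = 21.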

🧷 Key distinctions and common confusions

🧷 Recurrence vs characteristic polynomial

Aspect	Recurrence relation	Characteristic polynomial
Form	a_n = c₁·a_{n-1} + c₂·a_{n-2}	c(x) = x² - c₁·x - c₂
Signs	Addition	Subtraction
Purpose	Defines the sequence	Tool to find closed formula

Don't confuse: The characteristic polynomial is derived from the recurrence but is not the same equation.

🧷 Order and initial values

Order d means the recurrence depends on the previous d terms.
You need exactly d initial values to uniquely determine the sequence.
Example: Fibonacci (order 2) needs F₁ and F₂; staircase (order 3) needs T₁, T₂, and T₃.

🧷 Distinct roots assumption

The theorems in this excerpt assume all roots are pairwise distinct (rᵢ ≠ rⱼ whenever i ≠ j).
If roots repeat, the closed-form formula changes (the excerpt mentions this is covered in Chapter 7).
Example: If c(x) = (x - 2)², the formula involves terms like n·2ⁿ, not just 2ⁿ.

The Tower of Hanoi

6.3 The tower of Hanoi

🧭 Overview

🧠 One-sentence thesis

The Tower of Hanoi game can be modeled by a nonlinear recurrence relation that yields a closed-form solution of 2^n − 1 moves to transfer n disks optimally.

📌 Key points (3–5)

What the game is: a puzzle with 3 pegs and n disks of different sizes, where you move disks one at a time without placing a larger disk on a smaller one.
The recurrence relation: H₁ = 1 and Hₙ = 2·Hₙ₋₁ + 1 for n ≥ 2, which is nonlinear (not solvable by the methods of section 6.2).
Closed-form solution: Hₙ = 2^n − 1 gives the minimal number of moves for n disks.
Common confusion: this is a nonlinear recurrence (the relation includes "+ 1" in addition to the recursive term), unlike the linear recurrences of section 6.2.
Why it matters: the recurrence captures the optimal strategy—move all but the largest disk, move the largest, then move the rest back on top.

🎮 The game and its rules

🎮 Setup and objective

Three pegs: left, middle, right.
n disks: each has a hole in the center; they start stacked on the left peg in order from largest (bottom) to smallest (top).
Goal: move all disks to another peg.

🚫 Movement constraints

On each move, transfer exactly one disk to another peg.
Never place a larger disk on top of a smaller one.
Example: if you have 2 disks, you must move the small disk off the large one before you can move the large disk.

🔢 Small cases

n	Hₙ	Explanation
1	1	Just move the single disk to another peg.
2	3	Move small disk to middle peg, move big disk to right peg, move small disk on top of big disk.
3	7	(Shown in exercises.)
4	15	(Shown in exercises.)

🔁 The recurrence relation

🔁 Defining Hₙ

Hₙ: the minimal number of moves needed to finish the game for n disks.

It is not "any number of moves that works," but the smallest number that guarantees success.
The excerpt asks "Can the tower of Hanoi be represented as a recurrence relation?" and answers yes.

🧩 The recursive structure (Lemma 6.3.2)

The recurrence is:

Base case: H₁ = 1
Recursive case: Hₙ = 2·Hₙ₋₁ + 1 for n ≥ 2

Why this works (proof sketch from the excerpt):

By definition, it takes Hₙ₋₁ moves to transfer all the disks except the biggest one to another peg (say, the middle peg).
Then it takes 1 move to move the biggest disk to the empty peg (say, the right peg).
Then, by definition, it takes Hₙ₋₁ moves to transfer all the other disks from the middle peg on top of the biggest disk on the right peg.
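Total: Hₙ₋₁ + 1 + Hₙ₋₁ = 2·Hₙ₋₁ + 1. No strategy can do better, because the biggest disk cannot move until all n − 1 smaller disks are stacked on a single other peg.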

Example: for n = 2, H₂ = 2·H₁ + 1 = 2·1 + 1 = 3, which matches the manual count.
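
The three-stage argument can be sketched as a recursive function that lists the moves. The peg names and the function name hanoi_moves are illustrative, and disks are numbered 1 (smallest) to n (largest).

```python
def hanoi_moves(n, source="left", target="right", spare="middle"):
    """Return the list of (disk, from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    moves = hanoi_moves(n - 1, source, spare, target)   # clear the top n-1 disks
    moves.append((n, source, target))                   # move the largest disk
    moves += hanoi_moves(n - 1, spare, target, source)  # restack the n-1 disks
    return moves


for n in range(1, 6):
    print(n, len(hanoi_moves(n)))  # lengths 1, 3, 7, 15, 31
```

The length of the returned list follows the same recurrence as Hₙ, since each call makes two recursive calls on n − 1 disks plus one extra move.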

⚠️ Nonlinear vs linear

The excerpt labels this a nonlinear recurrence.
Don't confuse: a linear recurrence in the sense of section 6.2 has the form aₙ = c₁·aₙ₋₁ + c₂·aₙ₋₂ + … with no added constant term.
Here, the "+ 1" is such an added constant, so the recurrence does not fit that form and the methods of section 6.2 do not apply directly (other texts call this kind of recurrence non-homogeneous linear).
Exercise 2 asks "True or False: the recursion for the Tower of Hanoi problem is a linear recurrence that can be solved with the methods of section 6.2." The answer is False.

📐 The closed-form solution

📐 Formula (Lemma 6.3.3)

Hₙ = 2^n − 1

This gives the minimal number of moves directly without computing all previous terms.
Example: for n = 5, H₅ = 2^5 − 1 = 32 − 1 = 31 (mentioned in exercise 6).

🔍 Proof by induction

The excerpt provides a proof:

Base case (n = 1): H₁ = 2^1 − 1 = 1, which is true by definition.
Inductive hypothesis: assume Hₙ₋₁ = 2^(n−1) − 1.
Inductive step: by Lemma 6.3.2, Hₙ = 2·Hₙ₋₁ + 1.
- Substitute: Hₙ = 2·(2^(n−1) − 1) + 1 = 2^n − 2 + 1 = 2^n − 1.
Conclusion: the result is true by induction.

📊 Verification table

n	2^n − 1	Matches Hₙ?
1	1	Yes (H₁ = 1)
2	3	Yes (H₂ = 3)
3	7	Yes (H₃ = 7, per exercises)
4	15	Yes (H₄ = 15, per exercises)
5	31	Yes (exercise 6 mentions 31 steps)

🛠️ Exercises and strategy

🛠️ Exercises overview

The exercises ask you to:

Compute H₅ (answer: 31).
Verify that the recurrence is nonlinear (True/False question).
Write out explicit move sequences for n = 3 (7 moves) and n = 4 (15 moves).
Observe patterns: which disks does the smallest disk (d₁) rest on along the way?
For n = 5, what rules must you follow about where d₁ and d₂ can rest to achieve 31 steps?
Write an algorithm to solve the tower of Hanoi most efficiently.

🧠 Strategy insight

The recurrence relation itself encodes the optimal strategy: recursively move all but the largest disk, move the largest, then recursively move the rest back.
Don't confuse: the "+ 1" in the recurrence is the single move of the largest disk; the two Hₙ₋₁ terms are the two recursive sub-problems.
Example: to move 3 disks optimally, you must move the top 2 disks (H₂ = 3 moves), move the largest disk (1 move), then move the top 2 disks again (H₂ = 3 moves), for a total of 3 + 1 + 3 = 7 moves.

Regions of the plane

6.4 Regions of the plane

🧭 Overview

🧠 One-sentence thesis

When drawing n non-parallel lines in the plane so that at most two lines meet at any point, the number of regions grows according to a recurrence relation that yields a closed-form quadratic formula.

📌 Key points (3–5)

What P_n measures: the number of regions formed by drawing n non-parallel lines with at most two lines intersecting at any point.
The recurrence relation: P_0 = 1 (no lines = one region, the whole plane) and P_n = P_(n−1) + n for n ≥ 1.
Why the recurrence works: the nth line crosses n−1 existing lines, splitting n existing regions in two, adding n new regions.
Closed-form formula: P_n = (n² + n + 2)/2 for all n ≥ 1, proven by induction.
Common confusion: the nth line does not create n new regions from scratch; it splits n existing regions by crossing n−1 lines.

📐 Defining the problem

📐 Setup and constraints

Draw lines on a piece of paper.
Lines must be non-parallel (so every pair eventually intersects).
At most 2 lines intersect at a point (no three lines meet at one point).
P_n counts the resulting regions.

🔢 Small examples

The excerpt provides concrete values:

P_1 = 2 (one line splits the plane into two regions)
P_2 = 4 = P_1 + 2
P_3 = 7 = P_2 + 3
P_4 = 11 = P_3 + 4

Example: with zero lines, the entire plane is one region, so P_0 = 1.

🔁 The recurrence relation

🔁 Base case and recursive step

Recurrence relation: P_0 = 1 and P_n = P_(n−1) + n for n ≥ 1.

Base case: no lines means one region (the whole plane), so P_0 = 1.
Inductive step: start with n−1 lines (P_(n−1) regions). When you add the nth line, it crosses n−1 other lines (because lines are non-parallel and at most two meet at a point).

✂️ Why adding the nth line adds n regions

The nth line intersects the n−1 existing lines at n−1 distinct points.
These points cut the nth line into n pieces (for n ≥ 2: two rays and n−2 segments).
Each piece lies inside one existing region and divides it into two, so the nth line adds n new regions in total.
Hence P_n = P_(n−1) + n.

Don't confuse: the nth line does not create n brand-new regions; it splits n regions that were already there.

📊 Closed-form formula

📊 The formula

Closed form: P_n = (n² + n + 2)/2 for all n ≥ 1.

The excerpt proves this by induction.

🧮 Proof by induction

Base case (n = 1):

P_1 = 2.
Formula gives (1² + 1 + 2)/2 = 4/2 = 2. ✓

Inductive step:

Assume P_n = (n² + n + 2)/2.
Want to show P_(n+1) = ((n+1)² + (n+1) + 2)/2 = (n² + 3n + 4)/2.
By the recurrence relation, P_(n+1) = P_n + (n+1).
Substitute the inductive hypothesis:
- P_(n+1) = (n² + n + 2)/2 + (n+1)
- = (n² + n + 2)/2 + (2n + 2)/2
- = (n² + 3n + 4)/2. ✓
Hence the formula holds by induction.

📋 Computed values

The excerpt provides a table of values using the recurrence:

n	0	1	2	3	4	5	6	7	8	9	10
P_n	1	2	4	7	11	16	22	29	37	46	56

Example: for n = 10, the formula gives (100 + 10 + 2)/2 = 112/2 = 56.
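
The table can be reproduced by running the recurrence and comparing with the closed form. This is a small sketch with illustrative function names.

```python
def regions(n):
    """Return P_n by applying P_k = P_(k-1) + k starting from P_0 = 1."""
    p = 1  # P_0: the whole plane
    for k in range(1, n + 1):
        p += k
    return p


def regions_closed_form(n):
    """Return (n^2 + n + 2) / 2, which is always an integer."""
    return (n * n + n + 2) // 2


print([regions(n) for n in range(11)])
# [1, 2, 4, 7, 11, 16, 22, 29, 37, 46, 56]
```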

🔄 Related problem: circles in general position

🔄 Circles instead of lines

The exercises introduce a similar problem with circles:

Draw n circles in general position: each pair of circles intersects in exactly two points.
Let a_n be the number of regions formed.

🔄 Key differences

Lines: the nth line crosses n−1 lines at n−1 points, adding n regions.
Circles: the nth circle crosses n−1 circles at 2(n−1) points, adding 2(n−1) regions.
Recurrence for circles: a_n = a_(n−1) + 2(n−1).
Closed form for circles: a_n = n² − n + 2 (to be shown by induction in the exercises).

Don't confuse: lines and circles have different recurrence relations because a circle intersects each other circle in two points, not one.
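
The circle recurrence can be checked against its closed form in the same way. The sketch assumes the starting value a_1 = 2 (one circle separates inside from outside), which the notes above do not state explicitly but which agrees with the closed form at n = 1.

```python
def circle_regions(n):
    """Return a_n by applying a_k = a_(k-1) + 2(k-1), assuming a_1 = 2."""
    a = 2  # a_1: inside and outside of a single circle
    for k in range(2, n + 1):
        a += 2 * (k - 1)
    return a


def circle_regions_closed_form(n):
    return n * n - n + 2


print([circle_regions(n) for n in range(1, 7)])  # [2, 4, 8, 14, 22, 32]
```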

Derangements

6.5 Derangements

🧭 Overview

🧠 One-sentence thesis

Derangements count the permutations where no element remains in its original position, and they satisfy elegant recurrence relations that allow efficient computation.

📌 Key points (3–5)

What a derangement is: a permutation where every element moves away from its initial position.
How to count them: use recurrence relations rather than brute-force enumeration; two main formulas exist.
Key recurrence: D_n = (n − 1)(D_{n−1} + D_{n−2}) splits the problem by whether the first element swaps with exactly one other element or participates in a larger cycle.
Common confusion: don't confuse "permutation" (any rearrangement) with "derangement" (only rearrangements where no element stays in place).
Why it matters: derangements model real-world scenarios where every item must change position, and they connect to the constant e through the approximation D_n ≈ n!/e.

🔢 What derangements are

🔢 Definition and basic examples

Derangement of a sequence a₁, a₂, ..., aₙ: a permutation such that no element aᵢ appears in its initial position (the i-th spot).

Let D_n be the number of derangements of n distinct objects.
Example: for the letters in MAT (3 letters), only 2 out of 3! = 6 permutations are derangements: ATM and TMA.
- In ATM: M moved from position 1 to 3, A moved from 2 to 1, T moved from 3 to 2—no letter stayed.
- Contrast with MAT itself or AMT, where at least one letter remains in its original spot.
Example: for MATH (4 letters), 9 out of 4! = 24 permutations are derangements.

📊 Small values

n	D_n	Total permutations n!	Interpretation
1	0	1	A single element cannot move away from itself
2	1	2	Only one swap is possible
3	2	6	Two derangements out of six permutations
4	9	24	Nine derangements out of twenty-four
5	44	120	Forty-four derangements
10	1,334,961	3,628,800	Over a million derangements

Don't confuse: D_n counts only the permutations where every element moves; the total number of permutations is n!, which includes cases where some elements stay in place.

🔄 The main recurrence relation

🔄 Formula: D_n = (n − 1)(D_{n−1} + D_{n−2})

For n ≥ 3, the number of derangements satisfies:
- D_n = (n − 1)(D_{n−1} + D_{n−2})
This recurrence splits the problem into two cases based on where the first element goes.

🧩 Proof idea: where does element 1 go?

Setup: consider derangements of the sequence 1, 2, 3, ..., n.

Let c be the position where element 1 appears.
Since 1 cannot stay in position 1 (else not a derangement), there are n − 1 choices for c.

Case 1: elements 1 and c swap spots

Element 1 goes to position c, and element c goes to position 1.
Example: when n = 5 and c = 3, the derangement 34152 has 1 in position 3 and 3 in position 1.
The remaining n − 2 elements (excluding 1 and c) must form a derangement among themselves.
Number of ways: D_{n−2}.

Case 2: elements 1 and c do not swap

Element 1 goes to position c, but element c does not go to position 1.
Example: when n = 5 and c = 3, the derangement 23154 has 1 in position 3 but 3 in position 2 (not position 1).
Now we have n − 1 elements (2, 3, ..., n) to place in n − 1 spots (positions 1, 2, ..., n, excluding position c).
Each element is restricted from exactly one spot:
- Element c cannot go to position 1 (to stay in Case 2).
- Element j (j ≠ c) cannot go to position j (its original spot).
This is exactly a derangement of n − 1 objects.
Number of ways: D_{n−1}.

Combining the cases

Cases 1 and 2 do not overlap.
For a fixed choice of c, the total is D_{n−1} + D_{n−2}.
Since there are n − 1 choices for c, the total number of derangements is (n − 1)(D_{n−1} + D_{n−2}).

📐 Alternative formulas

📐 Second recurrence: D_n = n·D_{n−1} + (−1)ⁿ

For n ≥ 2, another recurrence relation holds:
- D_n = n·D_{n−1} + (−1)ⁿ
This formula is proven by induction using the first recurrence.
It provides a simpler recursive computation when only the previous term is needed.

📐 Closed-form formula

The number of derangements has an explicit formula:
- D_n = n! · (1 − 1/1! + 1/2! − 1/3! + ... + (−1)ⁿ/n!)
This is an alternating sum that resembles the Taylor series for 1/e.
The formula can be derived using the inclusion-exclusion principle (see exercises).

🔍 Connection to e

An interesting fact: D_n is the integer closest to n!/e.
This means the probability that a random permutation is a derangement approaches 1/e ≈ 0.368 as n grows large.
Example: for n = 5, D₅ = 44 and 5!/e ≈ 44.15, so the distance is about 0.15.
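Don't confuse: n!/e is never itself an integer (e is irrational), so D_n only approximates n!/e; rounding n!/e to the nearest integer, however, recovers D_n exactly for every n ≥ 1, not just for large n.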
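
The connection to e can be observed numerically. The sketch below computes D_n with the second recurrence D_n = n·D_{n−1} + (−1)ⁿ and compares it with n!/e in floating point. It assumes double precision is adequate, which holds for the small n used here.

```python
from math import e, factorial


def derangements(n):
    """Return D_n via D_n = n*D_(n-1) + (-1)^n, starting from D_1 = 0."""
    d = 0  # D_1
    for k in range(2, n + 1):
        d = k * d + (-1) ** k
    return d


for n in range(1, 11):
    exact = derangements(n)
    estimate = factorial(n) / e
    # Columns: n, D_n, nearest integer to n!/e, fraction of permutations.
    print(n, exact, round(estimate), exact / factorial(n))
```

The last column settles near 1/e ≈ 0.368 after only a few rows.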

🧮 Computing derangements

🧮 Using the recurrence

Start with base cases: D₁ = 0, D₂ = 1.
Apply the recurrence D_n = (n − 1)(D_{n−1} + D_{n−2}) iteratively:
- D₃ = 2·(1 + 0) = 2
- D₄ = 3·(2 + 1) = 9
- D₅ = 4·(9 + 2) = 44
- D₆ = 5·(44 + 9) = 265
- And so on.
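
The iteration above can be sketched in code and cross-checked against a brute-force count over all permutations. The brute-force check is feasible only for small n, and both function names are illustrative.

```python
from itertools import permutations


def derangements_recurrence(n):
    """Return D_n using D_n = (n-1)(D_(n-1) + D_(n-2)), D_1 = 0, D_2 = 1."""
    if n == 1:
        return 0
    prev2, prev1 = 0, 1  # D_1, D_2
    for k in range(3, n + 1):
        prev2, prev1 = prev1, (k - 1) * (prev1 + prev2)
    return prev1


def derangements_brute_force(n):
    """Count permutations of 0..n-1 with no element in its own position."""
    return sum(
        all(p[i] != i for i in range(n))
        for p in permutations(range(n))
    )


print([derangements_recurrence(n) for n in range(1, 7)])  # [0, 1, 2, 9, 44, 265]
print(derangements_brute_force(5))  # 44
```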

🧮 Using inclusion-exclusion

The closed-form formula comes from counting permutations that fix at least one element, then taking the complement.
Let W_k be the set of permutations that fix element k (i.e., element k stays in position k).
The union of all W_k is the set of permutations that fix at least one element.
The complement of this union is the set of derangements.
Inclusion-exclusion gives:
- Size of union = sum from i=1 to n of (−1)^(i+1) · (n choose i) · (n − i)!, since any i chosen elements are simultaneously fixed by exactly (n − i)! permutations.
- D_n = n! − (size of union) = n! · sum from i=0 to n of (−1)ⁱ/i!.
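
Both forms of the count can be evaluated exactly. The sketch below uses the fact that the permutations fixing any i chosen elements number (n − i)!, and it uses Fraction so that the alternating sum of 1/i! is not rounded. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def derangements_inclusion_exclusion(n):
    """Return n! minus the number of permutations fixing at least one element."""
    union = sum(
        (-1) ** (i + 1) * comb(n, i) * factorial(n - i)
        for i in range(1, n + 1)
    )
    return factorial(n) - union


def derangements_formula(n):
    """Return n! * sum_{i=0}^{n} (-1)^i / i! using exact fractions."""
    total = sum(Fraction((-1) ** i, factorial(i)) for i in range(n + 1))
    return int(factorial(n) * total)


print(derangements_inclusion_exclusion(4), derangements_formula(4))  # 9 9
```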

🧮 Practical example

For n = 4 (the word MATH):
- Total permutations: 4! = 24.
- Using the recurrence: D₄ = 3·(2 + 1) = 9.
- So 9 out of 24 permutations are derangements.
- The 9 derangements are listed: AMHT, ATHM, AHMT, TMHA, THMA, THAM, HMAT, HTMA, HTAM.

The Catalan numbers

6.6 The Catalan numbers

🧭 Overview

🧠 One-sentence thesis

The Catalan numbers form a celebrated combinatorial sequence defined by a simple recursion but counting an astonishing variety of structures—from lattice paths to parenthesizations—all unified by the same underlying pattern.

📌 Key points (3–5)

Definition: The Catalan numbers C₀, C₁, C₂, … start with C₀ = 1 and satisfy the recursion C_{n+1} = C₀·C_n + C₁·C_{n-1} + … + C_n·C₀.
Dyck paths as the key interpretation: C_n counts Dyck paths of length 2n—lattice paths from (0,0) to (n,n) using only up and right steps that stay on or above the main diagonal.
Closed-form formula: C_n = (2n choose n) − (2n choose n+1) = 1/(n+1) · (2n choose n).
Common confusion: Dyck paths vs arbitrary lattice paths—Dyck paths must stay on or above the diagonal at all times, not just reach (n,n).
Remarkable universality: The same numbers count 214 different combinatorial objects, including parenthesizations, polygon triangulations, grid tilings, and binary trees.

🔢 Definition and basic values

🔢 The recursive definition

Catalan numbers: The sequence C₀, C₁, C₂, … with initial value C₀ = 1 and recurrence relation C_{n+1} = C₀·C_n + C₁·C_{n-1} + C₂·C_{n-2} + … + C_n·C₀.

The recursion is a sum of products: each term pairs C_i with C_{n−i}.
This structure hints at "splitting" or "decomposing" a structure of size n+1 into two smaller pieces.

📊 First ten values

Starting from n = 0, the first ten Catalan numbers are:

n	0	1	2	3	4	5	6	7	8	9
C_n	1	1	2	5	14	42	132	429	1430	4862

The sequence grows rapidly but is determined entirely by the recursion.
Example: C₃ = 5 means there are 5 Dyck paths of length 6, 5 ways to parenthesize 4 letters, etc.
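
The recursion can be run directly to reproduce the table. This is a short sketch; catalan_numbers is an illustrative name.

```python
def catalan_numbers(count):
    """Return [C_0, ..., C_(count-1)] from C_(n+1) = sum of C_i * C_(n-i)."""
    values = [1]  # C_0
    while len(values) < count:
        n = len(values) - 1
        values.append(sum(values[i] * values[n - i] for i in range(n + 1)))
    return values[:count]


print(catalan_numbers(10))
# [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]
```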

🛤️ Dyck paths: the central interpretation

🛤️ What is a Dyck path?

Dyck path of length 2n: A path on the integer lattice grid from (0,0) to (n,n) using only up and right steps that stays on or above the main diagonal.

"On or above the diagonal" means at every point (x,y) on the path, y ≥ x.
The path never dips below the line y = x.
Example: For n = 3 (length 6), there are exactly 5 such paths.

🔤 Dyck words as an alternative representation

A Dyck path can be encoded as a Dyck word: a sequence of n U's (up steps) and n R's (right steps).
Constraint: reading left to right, at any point the number of U's read so far must be at least the number of R's.
Example: The five Dyck words of length 6 are UUURRR, UURURR, UURRUR, URUURR, URURUR.
Don't confuse: any sequence of n U's and n R's is a valid word, but only those satisfying the "U's ≥ R's at all times" condition are Dyck words.
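
The prefix condition is easy to test by brute force. The sketch below filters all words with n U's and n R's; the helper names `is_dyck_word` and `dyck_words` are assumptions made for illustration.

```python
from itertools import combinations

def is_dyck_word(word):
    """True if every prefix has at least as many U's as R's and the totals match."""
    height = 0
    for step in word:
        height += 1 if step == "U" else -1
        if height < 0:  # more R's than U's so far: the path dipped below y = x
            return False
    return height == 0

def dyck_words(n):
    """All Dyck words with n U's and n R's, found by filtering every word."""
    words = []
    for up_positions in combinations(range(2 * n), n):
        word = "".join("U" if i in up_positions else "R" for i in range(2 * n))
        if is_dyck_word(word):
            words.append(word)
    return words

print(dyck_words(3))
print(len(dyck_words(4)))
```

For n = 3 only 5 of the 20 candidate words survive the filter.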

🧩 Proof that Dyck paths satisfy the Catalan recursion

The proof uses a "first return" argument:

Base case: There is exactly one Dyck path of length 0 (the empty path), so D₀ = 1 = C₀.
Recursion: Consider a Dyck path of length 2(n+1). After the first up step, the path eventually returns to the diagonal for the first time at some height i (where 1 ≤ i ≤ n+1).
Counting by first return height:
- The portion from (0,0) to (i,i) is determined by a Dyck path from (0,1) to (i−1,i), giving D_{i−1} choices.
- The remaining portion from (i,i) to (n+1,n+1) is a Dyck path of length 2(n+1−i), giving D_{n+1−i} choices.
- By the multiplication principle, the number of Dyck paths of length 2(n+1) whose first return is at height i is D_{n+1,i} = D_{i−1} · D_{n+1−i}.
Summing over all first return heights: D_{n+1} = D₀·D_n + D₁·D_{n−1} + … + D_n·D₀, which matches the Catalan recursion.
Conclusion: Since D_n satisfies the same recursion and initial condition as C_n, we have D_n = C_n for all n.
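
The first-return count can be checked experimentally for a small case. The sketch below groups the Dyck words of length 8 by first return height and compares each group size with D_{i−1} · D_{n+1−i}. The generator `dyck_words`, the helper `first_return_height`, and the variable `m` (standing in for n+1) are illustrative assumptions.

```python
from collections import Counter

def dyck_words(n):
    """Generate all Dyck words with n U's and n R's by extending valid prefixes."""
    def extend(prefix, ups, rights):
        if ups == n and rights == n:
            yield prefix
            return
        if ups < n:
            yield from extend(prefix + "U", ups + 1, rights)
        if rights < ups:  # an R is allowed only while R's stay at most U's
            yield from extend(prefix + "R", ups, rights + 1)
    yield from extend("", 0, 0)

def first_return_height(word):
    """Height i at which the path first comes back to the diagonal."""
    height = 0
    for position, step in enumerate(word, start=1):
        height += 1 if step == "U" else -1
        if height == 0:
            return position // 2
    raise ValueError("not a nonempty Dyck word")

m = 4  # plays the role of n + 1 in the proof
d = [sum(1 for _ in dyck_words(k)) for k in range(m + 1)]  # D_0 .. D_m
by_height = Counter(first_return_height(w) for w in dyck_words(m))
for i in range(1, m + 1):
    print(i, by_height[i], d[i - 1] * d[m - i])
```

In each printed row the second and third numbers should agree, and the group sizes add up to D_4 = 14.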

🧮 Closed-form formula

🧮 The binomial formula

Theorem: The nth Catalan number is C_n = (2n choose n) − (2n choose n+1) = 1/(n+1) · (2n choose n).

The two expressions in the formula are equivalent.
"(2n choose n)" means the binomial coefficient: the number of ways to choose n items from 2n.
The formula will be proven later in Section 7.6 (not included in this excerpt).
Example: For n = 3, C₃ = 1/4 · (6 choose 3) = 1/4 · 20 = 5.
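
Both expressions in the theorem can be evaluated with Python's standard `math.comb`. This is only a numerical check for small n, not a proof; the function names are illustrative.

```python
from math import comb

def catalan_closed_form(n):
    """C_n = (2n choose n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)

def catalan_difference_form(n):
    """C_n = (2n choose n) - (2n choose n+1)."""
    return comb(2 * n, n) - comb(2 * n, n + 1)

for n in range(10):
    print(n, catalan_closed_form(n), catalan_difference_form(n))
```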

🔍 Why this formula matters

The recursion is easy to understand but slow to compute for large n.
The closed form allows direct calculation without computing all previous values.
It connects Catalan numbers to binomial coefficients, a fundamental combinatorial object.

🌟 The 214 interpretations

🌟 Why Catalan numbers are celebrated

The excerpt states there are 214 different known combinatorial descriptions of the Catalan numbers.
All these seemingly unrelated counting problems yield the same sequence.
This universality suggests a deep underlying structure in combinatorics.

📋 Seven interpretations when n = 3 (C₃ = 5)

The fact that C₃ = 5 is equivalent to counting:

Interpretation	Description	Example constraint
Parenthesizations	Ways to put parentheses around 4 letters	((xy)z)w, (x(yz))w, (xy)(zw), x((yz)w), x(y(zw))
Polygon triangulations	Ways to divide a convex pentagon into triangles	Non-crossing diagonals
Stair tilings	Ways to tile a stair of height 3 with 3 rectangles	Rectangles fit the stair shape
Grid arrangements	Ways to arrange {1,…,6} in a 2×3 grid	Each row and column increasing
Non-crossing pairings	Ways to pair vertices of a hexagon	Line segments do not intersect
Binary trees	Rooted binary trees with 4 leaves	Every vertex has 0 or 2 children
Balanced sequences	Sequences of 1's and −1's of length 6	All partial sums ≥ 0, total sum = 0

🔗 Connection between interpretations

The balanced sequences (interpretation 7) directly correspond to Dyck words: replace 1 with U and −1 with R.
"Partial sums ≥ 0" translates to "number of U's ≥ number of R's at all times."
Don't confuse: not all sequences with equal 1's and −1's are valid—only those where partial sums never go negative.

🎯 Related counting problems

🎯 Variations on Dyck paths

The exercises explore related problems:

On or below the diagonal: Paths from (0,0) to (n,n) that stay on or below y = x (by symmetry, also counted by C_n).
Strictly above the diagonal: Paths that never touch the diagonal except at endpoints (a different count).
Ballot problem: Counting vote sequences where one candidate is never ahead until the last ballot—this is equivalent to counting certain restricted Dyck-like sequences.

🧩 Generalizing to n = 4

When n = 4, C₄ = 14. The exercises ask to enumerate all 14 instances for each interpretation:

14 Dyck paths of length 8
14 ways to parenthesize 5 letters
14 ways to triangulate a hexagon
14 ways to tile a stair of height 4
14 ways to arrange {1,…,8} in a 2×4 increasing grid
14 non-crossing pairings of an octagon
14 binary trees with 5 leaves
14 balanced sequences of length 8

Don't confuse: each interpretation counts different objects, but the count is always the same Catalan number.

Making change

7.1 Making change

🧭 Overview

🧠 One-sentence thesis

The number of ways to make change for any amount using a given set of coins can be found by multiplying polynomials where exponents encode coin values and coefficients count the number of ways to achieve each total.

📌 Key points (3–5)

Core technique: represent each coin type as a polynomial where exponents are possible cent values, then multiply the polynomials together.
Reading the result: the coefficient of x^N in the expanded product tells you how many ways to make N cents.
Why exponents work: when you multiply polynomials, exponents add, which mirrors adding coin values to reach a total.
Common confusion: the coefficient is not the amount of money—it's the number of different ways to make that amount; the exponent is the amount.
General principle: works for any currency system and any number of each coin type, including unlimited supplies (infinite series).

🪙 The polynomial encoding technique

🪙 How to build the polynomial for one coin type

For each type of coin worth k cents, if you have m of that coin, the polynomial is 1 + x^k + x^(2k) + ··· + x^(mk).

Each term x^(jk) represents using j coins of that type, contributing jk cents.
The constant term 1 = x^0 represents using zero of that coin.
Example: 6 pennies (1 cent each) → 1 + x + x² + x³ + x⁴ + x⁵ + x⁶.
Example: 2 nickels (5 cents each) → 1 + x⁵ + x^10.

🔢 Why multiply the polynomials

You want to combine coins from different types.
When you multiply polynomials, exponents add: x^a · x^b = x^(a+b).
This addition of exponents mirrors adding the cent values from different coin types.
Example: the term x⁶ · x⁵ = x^11 represents 6 cents in pennies plus 5 cents in nickels, totaling 11 cents.

📖 Reading the expanded product

After expanding the product, look at the coefficient of x^N.
That coefficient is the number of distinct ways to make N cents.
Example: in the expansion, 2x⁶ means there are 2 ways to make 6 cents (either 6 pennies + 0 nickels, or 1 penny + 1 nickel).
If a term like x^17 is missing (coefficient 0), there are zero ways to make 17 cents with those coins.

💰 Worked examples

💰 Simple example: 6 pennies and 2 nickels

Pennies: x⁰ + x + x² + x³ + x⁴ + x⁵ + x⁶
Nickels: x⁰ + x⁵ + x^10
Product expands to: x⁰ + x + x² + x³ + x⁴ + 2x⁵ + 2x⁶ + x⁷ + x⁸ + x⁹ + 2x^10 + 2x^11 + x^12 + x^13 + x^14 + x^15 + x^16
The largest exponent is 16 (maximum you can give: 6 pennies + 2 nickels).
The smallest exponent is 0 (give nothing).
The coefficient of x^11 is 2, meaning two ways to give 11 cents: (6 pennies + 5 cents in nickels) or (1 penny + 10 cents in nickels).
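
The expansion above can be reproduced with plain coefficient lists, where index N holds the coefficient of x^N. This is a small sketch; `coin_polynomial` and `multiply` are hypothetical helper names.

```python
def coin_polynomial(value, supply):
    """Coefficients of 1 + x^value + ... + x^(supply*value), indexed by exponent."""
    coeffs = [0] * (value * supply + 1)
    for used in range(supply + 1):
        coeffs[used * value] = 1
    return coeffs

def multiply(p, q):
    """Multiply two polynomials given as coefficient lists."""
    product = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b  # exponents add: x^i * x^j = x^(i+j)
    return product

pennies = coin_polynomial(1, 6)
nickels = coin_polynomial(5, 2)
ways = multiply(pennies, nickels)
print(ways)
print(ways[11])
```

The list has 17 entries (exponents 0 through 16), and entry 11 is the number of ways to give 11 cents.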

💰 More complex example: making change for a dollar

Coins: 6 pennies, 2 nickels, 4 dimes (10 cents each), 3 quarters (25 cents each).
Polynomials:
- Pennies: 1 + x + x² + x³ + x⁴ + x⁵ + x⁶
- Nickels: 1 + x⁵ + x^10
- Dimes: 1 + x^10 + x^20 + x^30 + x^40
- Quarters: 1 + x^25 + x^50 + x^75
Multiply all four and expand (using a computer algebra system like Sage).
The coefficient of x^100 is 5, so there are 5 ways to make 100 cents (one dollar).
The excerpt lists those 5 ways explicitly: (3 quarters + 2 dimes + 1 nickel), (3 quarters + 2 dimes + 5 pennies), etc.
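
Without a computer algebra system, the same coefficient can be found with a short script. The sketch below computes the product as a dictionary from exponent to coefficient and cross-checks it by listing every valid combination of coins. The dictionary layout and variable names are assumptions for illustration.

```python
from itertools import product

# (value in cents, number available) for each coin type in the example
supply = {"penny": (1, 6), "nickel": (5, 2), "dime": (10, 4), "quarter": (25, 3)}

# Polynomial side: multiply the four factors, tracking coefficients by exponent.
coeffs = {0: 1}
for value, count in supply.values():
    updated = {}
    for exponent, ways in coeffs.items():
        for used in range(count + 1):
            total = exponent + used * value
            updated[total] = updated.get(total, 0) + ways
    coeffs = updated

# Brute-force side: list every choice of coin counts that sums to 100 cents.
names = list(supply)
ranges = [range(supply[name][1] + 1) for name in names]
ways_to_dollar = [
    dict(zip(names, counts))
    for counts in product(*ranges)
    if sum(c * supply[name][0] for c, name in zip(counts, names)) == 100
]

print(coeffs[100])
for way in ways_to_dollar:
    print(way)
```

The brute-force list serves as an independent check that the coefficient of x^100 really counts coin combinations.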

💰 Non-standard currency: Harry Potter example

Currency: 1 Knut (base unit), 1 Sickle = 29 Knuts, 1 Galleon = 493 Knuts.
Coins: 3 Knuts, 2 Sickles, 2 Galleons.
Polynomials:
- Knuts: 1 + x + x² + x³
- Sickles: 1 + x^29 + x^58
- Galleons: 1 + x^493 + x^986
After expansion, every coefficient is 1, meaning exactly one way to make each amount that appears (1047, 1046, 1045, 1044, 1018, ...).
No ways to make amounts whose exponents do not appear in the expansion.

🧮 General theorem and proof

🧮 Statement of the theorem

Theorem 7.1.5: Suppose there are m types of coins of values k₁, k₂, ..., k_m, and you have n₁, n₂, ..., n_m of each type respectively. The number of ways to make change for a total value of N is the coefficient of x^N in the product (1 + x^(k₁) + x^(2k₁) + ··· + x^(n₁k₁)) · (1 + x^(k₂) + x^(2k₂) + ··· + x^(n₂k₂)) ··· (1 + x^(k_m) + x^(2k_m) + ··· + x^(n_m k_m)).

🧮 Why the theorem works (proof sketch)

When you multiply one term from each factor, you get x^(a₁k₁) · x^(a₂k₂) ··· x^(a_m k_m) = x^(a₁k₁ + a₂k₂ + ··· + a_m k_m), where 0 ≤ aᵢ ≤ nᵢ for each i.
The exponent a₁k₁ + a₂k₂ + ··· + a_m k_m equals N exactly when using a₁ coins of type 1, a₂ coins of type 2, etc., adds up to N cents.
Each such product corresponds to one way of making change for N.
The number of times x^N appears in the expansion equals the number of ways to make change for N.
Don't confuse: each individual product of terms gives one way; the sum of all those products (collected as the coefficient) counts all the ways.
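
The theorem translates into a short general routine. The sketch below is one possible implementation, applied to the Harry Potter coins from the earlier example; the function name `change_counts` and its (value, supply) input format are assumptions.

```python
from functools import reduce

def change_counts(coins):
    """Coefficient list of the product in Theorem 7.1.5.

    `coins` is a list of (value, supply) pairs; entry N of the result is the
    number of ways to make change for N.
    """
    def factor(value, supply):
        poly = [0] * (value * supply + 1)
        for used in range(supply + 1):
            poly[used * value] = 1
        return poly

    def multiply(p, q):
        out = [0] * (len(p) + len(q) - 1)
        for i, a in enumerate(p):
            for j, b in enumerate(q):
                out[i + j] += a * b
        return out

    return reduce(multiply, (factor(v, m) for v, m in coins), [1])

# 3 Knuts, 2 Sickles (29 Knuts each), 2 Galleons (493 Knuts each)
wizard = change_counts([(1, 3), (29, 2), (493, 2)])
print(max(wizard), sum(wizard), len(wizard) - 1)
```

The three printed numbers are the largest coefficient, the total number of coin selections, and the degree of the product.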

🔄 Extension to unlimited coins (generating functions)

🔄 From finite to infinite

If a cashier has an unlimited supply of each coin, the polynomials become infinite series.
Example for unlimited pennies: 1 + x + x² + x³ + ···
Example for unlimited nickels: 1 + x⁵ + x^10 + x^15 + ···
These infinite series are called generating functions.
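
A computer cannot store an infinite series, but the coefficient of x^N only depends on terms of degree at most N, so truncating each series at x^N loses nothing for that coefficient. The sketch below uses this to count the ways to make 100 cents from unlimited pennies, nickels, dimes, and quarters; the helper names are illustrative assumptions.

```python
def unlimited_coin_series(value, max_degree):
    """Coefficients of 1 + x^value + x^(2*value) + ..., cut off after x^max_degree."""
    return [1 if k % value == 0 else 0 for k in range(max_degree + 1)]

def multiply_truncated(a, b, max_degree):
    """Product of two series, keeping only terms up to x^max_degree."""
    c = [0] * (max_degree + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j > max_degree:
                break
            c[i + j] += ai * bj
    return c

max_degree = 100
series = [1] + [0] * max_degree  # the constant series 1
for value in (1, 5, 10, 25):
    factor = unlimited_coin_series(value, max_degree)
    series = multiply_truncated(series, factor, max_degree)
print(series[100])
```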

🔄 Definition of generating function

Generating function of the sequence a₀, a₁, a₂, ... is the series a₀ + a₁x + a₂x² + a₃x³ + ···, written as the sum from i=0 to infinity of aᵢxⁱ.

The variable x is just a formal symbol; the coefficients are what matter.
Any polynomial can be viewed as a generating function of a sequence that is eventually zero.
Example: the sequence 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, ... has generating function 1 + x + x² + x³ + x⁴ + x⁵ + x⁶ (the polynomial for 6 pennies).

Generating Functions

7.2 Generating functions

🧭 Overview

🧠 One-sentence thesis

Generating functions encode sequences as infinite series where the coefficients (not the function values) are the focus, enabling combinatorial problems like counting coin-change combinations to be solved through algebraic manipulation.

📌 Key points (3–5)

What a generating function is: an infinite series where each coefficient represents a term in a sequence, written as a₀ + a₁x + a₂x² + a₃x³ + ···
The variable x is formal: we don't substitute numerical values into x; it's just a "clothesline" to display the sequence of coefficients
Polynomials are special cases: any polynomial is a generating function of a sequence that eventually becomes all zeros
Common confusion: generating functions look like functions but aren't used as functions—we study the coefficients, not outputs from plugging in x values
Connection to combinatorics: multiplying generating functions (like for coins) counts combinations, similar to how polynomial multiplication worked for finite supplies

🎯 What generating functions are

🎯 The basic definition

Generating function: the series sum from i=0 to infinity of aᵢxⁱ = a₀ + a₁x + a₂x² + a₃x³ + ··· for a sequence a₀, a₁, a₂, ...

The sequence provides the coefficients
The variable x is raised to increasing powers
Example: the sequence 1, 2, 3, 4, 5, ... becomes 1 + 2x + 3x² + 4x³ + ···

🔢 How polynomials fit in

Any polynomial can be viewed as a generating function of a sequence that ends in infinitely many zeros
Example: the sequence 1, 1, 3, 0, 0, 0, ... gives 1 + x + 3x² + 0x³ + 0x⁴ + ··· = 1 + x + 3x²
This connects finite counting problems (limited coin supplies) to infinite ones (unlimited supplies)

🪙 From coin problems to infinite series

🪙 The motivation: unlimited coins

When a cashier has an unlimited supply of each coin type, the polynomials from finite problems become infinite:

Pennies: 1 + x + x² + x³ + ···
Nickels: 1 + x⁵ + x¹⁰ + x¹⁵ + ···
Dimes: 1 + x¹⁰ + x²⁰ + x³⁰ + ···
Quarters: 1 + x²⁵ + x⁵⁰ + x⁷⁵ + ···

🔗 Recording which positions are nonzero

The polynomial 1 + x + x² + x³ + x⁴ + x⁵ + x⁶ encodes the sequence 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, ... (six pennies available)
The polynomial 1 + x⁵ + x¹⁰ encodes 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, ... (ones in positions 0, 5, and 10 for two nickels)
The exponents show which amounts are possible; the coefficient shows how many ways

📚 Key examples

📚 Fibonacci sequence

The Fibonacci sequence 1, 1, 2, 3, 5, 8, ... has generating function 1 + x + 2x² + 3x³ + 5x⁴ + 8x⁵ + ···
The excerpt notes that by the end of the chapter, generating functions will be used to find an explicit formula for Fibonacci numbers

📚 Binomial coefficients

The sequence (n choose 0), (n choose 1), (n choose 2), ..., (n choose n) has generating function:

(1 + x)ⁿ = (n choose 0) + (n choose 1)x + (n choose 2)x² + ··· + (n choose n)xⁿ

Combinatorial interpretation:

n people each hold a card showing 0 or 1 (or equivalently, choose monomial 1 or x)
The coefficient of xᵏ in (1 + x)ⁿ counts ways to choose exactly k people (those who chose x)
This equals (n choose k)
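
The identity can be checked by multiplying by (1 + x) repeatedly and comparing with `math.comb`. This is a minimal sketch; `times_one_plus_x` is a hypothetical helper name.

```python
from math import comb

def times_one_plus_x(poly):
    """Multiply a coefficient list by (1 + x)."""
    # poly * 1 keeps each coefficient in place; poly * x shifts it up by one
    return [a + b for a, b in zip(poly + [0], [0] + poly)]

n = 5
power = [1]
for _ in range(n):
    power = times_one_plus_x(power)
print(power)
```

Each multiplication by (1 + x) corresponds to one more person choosing between 1 and x.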

🧵 The "clothesline" perspective

🧵 Not actually functions

The notation A(x) = a₀ + a₁x + a₂x² + ··· is just convenient shorthand
We do not substitute numerical values for x
If we tried, the infinite series might not even converge for certain x values

🧵 Why the metaphor matters

Quote from Herbert Wilf: "A generating function is a clothesline on which we hang up a sequence of numbers for display."

The coefficients (the sequence a₀, a₁, a₂, ...) are what we're studying
The powers of x just organize and display them
We manipulate the series algebraically to learn about the coefficients

🧵 Variable choice is flexible

Any variable works: the sequence 1, 2, 3, 4, 5, ... can be written as:
- 1 + 2x + 3x² + 4x³ + ··· (in variable x)
- 1 + 2y + 3y² + 4y³ + ··· (in variable y)
- The variable is just a placeholder

➕ Operations on generating functions

➕ Addition of generating functions

Addition combines corresponding coefficients:

A(x) + B(x) = (a₀ + b₀) + (a₁ + b₁)x + (a₂ + b₂)x² + ···
Formula: sum from n=0 to infinity of (aₙ + bₙ)xⁿ

How it works:

Line up terms with the same power of x
Add the coefficients in each position
Example structure: (a₀ + a₁x + a₂x² + ···) + (b₀ + b₁x + b₂x² + ···) = (a₀ + b₀) + (a₁ + b₁)x + (a₂ + b₂)x² + ···

🔄 Other operations mentioned

The excerpt notes four main operations will be covered:

Addition (defined above)
Multiplication
Substitution of a monomial
Differentiation

(The excerpt cuts off before fully explaining multiplication and the others)

Operations on generating functions

7.3 Operations on generating functions

🧭 Overview

🧠 One-sentence thesis

Generating functions can be manipulated through addition, multiplication, substitution of monomials, and differentiation—operations that mirror polynomial arithmetic and calculus—to produce closed-form formulas and solve counting problems.

📌 Key points (3–5)

Four main operations: addition (add coefficients term-by-term), multiplication (use convolution of coefficients), substitution of monomials (replace x with ax^n), and differentiation (apply the power rule term-by-term).
Geometric series identity: the infinite series 1 + x + x² + x³ + ... equals the closed form 1/(1 − x), which is the foundation for many other identities.
Multiplication via convolution: the coefficient of x^n in A(x)B(x) is the sum a₀b_n + a₁b_(n−1) + ... + a_n b₀, called the n-th convolution.
Common confusion: generating functions are formal power series, not actual functions—do not substitute constants or non-monomial expressions without caution; composition G(F(x)) only makes sense when F(x) has no constant term.
Calculus rules still apply: despite being formal objects, generating functions obey the sum rule, product rule, quotient rule, and chain rule for monomials.

➕ Addition and subtraction

➕ How addition works

Addition of generating functions: add corresponding coefficients term-by-term.

If A(x) = sum of a_n x^n and B(x) = sum of b_n x^n, then A(x) + B(x) = sum of (a_n + b_n) x^n.
Visually: (a₀ + a₁x + a₂x² + ...) + (b₀ + b₁x + b₂x² + ...) = (a₀ + b₀) + (a₁ + b₁)x + (a₂ + b₂)x² + ...
Example: adding 1 + x + x² + x³ + ... and 1 − x + x² − x³ + ... gives 2 + 2x² + 2x⁴ + 2x⁶ + ...

➖ Subtraction

Subtraction works the same way: subtract corresponding coefficients.
Any generating function minus itself equals 0 (the generating function of the sequence 0, 0, 0, ...).

✖️ Multiplication and convolution

✖️ How multiplication works

Multiplication of generating functions: multiply every term from the left factor with every term from the right factor, then collect terms with the same power of x.

The coefficient of x^n in A(x)B(x) is a₀b_n + a₁b_(n−1) + a₂b_(n−2) + ... + a_n b₀.
This sum is called the n-th convolution of the sequences a₀, a₁, ... and b₀, b₁, ...
The excerpt organizes products in a table where down-and-left diagonals have the same exponent of x.

🧮 Example: squaring the sequence 1, 2, 3, 4, ...

The generating function is 1 + 2x + 3x² + 4x³ + ...
Squaring it: (1 + 2x + 3x² + ...)(1 + 2x + 3x² + ...)
Collect terms along diagonals:
- Constant term: 1
- Coefficient of x: 2 + 2 = 4
- Coefficient of x²: 3 + 4 + 3 = 10
- Coefficient of x³: 4 + 6 + 6 + 4 = 20
Result: 1 + 4x + 10x² + 20x³ + ...
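
The diagonal sums are exactly the convolution formula, which is a one-liner on coefficient lists. In this sketch `convolve` is an assumed helper name, and only the first few coefficients are computed.

```python
def convolve(a, b, terms):
    """First `terms` coefficients of the product of two series."""
    # coefficient of x^n is a_0*b_n + a_1*b_(n-1) + ... + a_n*b_0
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(terms)]

terms = 6
a = list(range(1, terms + 1))  # 1, 2, 3, 4, 5, 6
print(convolve(a, a, terms))
```

The first four printed values match the hand computation above.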

🔄 Multiplying by x or x^k

Multiplying a generating function by x shifts all coefficients one position to the right.
If A(x) is the generating function of a₀, a₁, a₂, ..., then xA(x) has coefficients 0, a₀, a₁, a₂, ...
In summation notation: x · (sum of a_n x^n) = sum of a_n x^(n+1) = sum (starting at m=1) of a_(m−1) x^m.
Example: multiplying the geometric series by x gives x + x² + x³ + ... = x/(1 − x).
Multiplying by 2x² gives 2x² + 2x³ + 2x⁴ + ... = 2x²/(1 − x).

🔑 The geometric series and closed forms

🔑 Geometric series identity

Geometric series: 1 + x + x² + x³ + ... = 1/(1 − x).

Proof idea: multiply (1 + x + x² + ...) by (1 − x). The coefficient of x⁰ is 1, and all other coefficients are 1 − 1 = 0, so the product equals 1.
This is the most important generating function identity.

📐 Closed form formulas

A closed form formula is a formula that does not involve an infinite summation.
The geometric series has closed form 1/(1 − x).
Closed forms are useful for computing coefficients and solving problems.

🔄 Related identities

Alternating series: 1 − x + x² − x³ + x⁴ − ... = 1/(1 + x).
Squaring the geometric series: (1 + x + x² + ...)² = 1 + 2x + 3x² + 4x³ + ... = 1/(1 − x)².
- This gives a closed form for the sequence 1, 2, 3, 4, ...
- Can be derived either by multiplication (convolution) or by differentiation (see below).
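
These identities can be spot-checked on truncated coefficient lists, since the coefficient of x^n in a product only involves coefficients up to index n. The sketch below is a numerical check, not a proof; `convolve` is an assumed helper name.

```python
def convolve(a, b, terms):
    """First `terms` coefficients of the product of two series."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(terms)]

terms = 8
geometric = [1] * terms                          # 1 + x + x^2 + ...
alternating = [(-1) ** n for n in range(terms)]  # 1 - x + x^2 - ...
one_minus_x = [1, -1] + [0] * (terms - 2)
one_plus_x = [1, 1] + [0] * (terms - 2)

print(convolve(one_minus_x, geometric, terms))   # (1 - x)(1 + x + x^2 + ...)
print(convolve(one_plus_x, alternating, terms))  # (1 + x)(1 - x + x^2 - ...)
print(convolve(geometric, geometric, terms))     # (1 + x + x^2 + ...)^2
```

The first two products should be the series 1, confirming the closed forms 1/(1 − x) and 1/(1 + x).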

🔀 Substitution of monomials

🔀 What substitution means

Monomial: a term of the form ax^n where a is a number and n is a positive integer (the degree).

You may substitute a monomial of positive degree into a generating function identity to produce new identities.
Example: starting with G(x) = 1 + x + x² + x³ + ... = 1/(1 − x), substitute 2x for x:
- G(2x) = 1 + 2x + (2x)² + (2x)³ + ... = 1 + 2x + 4x² + 8x³ + ...
- Closed form: 1/(1 − 2x).

🧩 Another example: even powers

To find a closed form for 1 + x² + x⁴ + x⁶ + ..., substitute x² for x in the geometric series:
- G(x²) = 1/(1 − x²).
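
On coefficient lists, substituting ax^k moves the n-th coefficient to position kn and scales it by a^n. The sketch below applies this to both substitution examples; `substitute_monomial` is a hypothetical helper that assumes k is at least 1.

```python
def substitute_monomial(series, a, k, terms):
    """First `terms` coefficients of G(a * x^k), given the coefficients of G(x)."""
    result = [0] * terms
    for n, g_n in enumerate(series):
        if k * n >= terms:
            break
        result[k * n] = g_n * a ** n  # (a x^k)^n = a^n x^(k n)
    return result

terms = 8
geometric = [1] * terms
print(substitute_monomial(geometric, 2, 1, terms))  # G(2x)
print(substitute_monomial(geometric, 1, 2, terms))  # G(x^2)
```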

⚠️ Warning: do not substitute constants or non-monomials

Common confusion: do not substitute constants (like y = 1) or expressions that are not monomials (like y = 1 + x) into generating functions without caution.
The composition G(F(x)) only makes sense when F(x) has no constant term.
Example of what goes wrong: substituting y = 1 + x into 1 + y + y² + ... would make every coefficient an infinite sum (the constant term alone would be 1 + 1 + 1 + ...), which is not well-defined.

🧮 Differentiation

🧮 How differentiation works

Differentiation of generating functions: apply the power rule term-by-term.

Definition: d/dx (sum of a_n x^n) = sum (starting at n=1) of n·a_n x^(n−1) = sum (starting at n=0) of (n+1)·a_(n+1) x^n.
Visually: d/dx (a₀ + a₁x + a₂x² + a₃x³ + ...) = a₁ + 2a₂x + 3a₃x² + 4a₄x³ + ...
The constant term disappears, and each coefficient is multiplied by its original exponent.

📏 Calculus rules apply

Even though generating functions are formal power series (not actual functions), all the usual differentiation rules from calculus still hold:
- Sum rule: d/dx (F(x) + G(x)) = d/dx F(x) + d/dx G(x)
- Product rule: d/dx (F(x)G(x)) = G(x)·d/dx F(x) + F(x)·d/dx G(x)
- Quotient rule: if G(x) has a nonzero constant term, then d/dx (F(x)/G(x)) = (G(x)·d/dx F(x) − F(x)·d/dx G(x)) / G(x)²
- Chain rule for monomials: if H(x) = d/dx F(x) and ax^n is a monomial, then d/dx F(ax^n) = n·a·x^(n−1)·H(ax^n).

🔍 Example: deriving 1/(1 − x)²

Start with the geometric series: 1 + x + x² + x³ + ... = 1/(1 − x).
Differentiate both sides: 1 + 2x + 3x² + 4x³ + ... = d/dx (1/(1 − x)) = 1/(1 − x)².
This gives another proof of the closed form for the sequence 1, 2, 3, 4, ... (previously found by multiplication).

🧪 Example: the exponential series

Define E(x) = 1 + x + (1/2!)x² + (1/3!)x³ + ... (the generating function for the sequence 1, 1, 1/2, 1/6, ...).
Taking the derivative: d/dx E(x) = 1 + x + (1/2!)x² + ... = E(x).
This makes E(x) behave like the function e^x (whose derivative is itself).
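
Both differentiation examples can be replayed on coefficient lists. The sketch below uses exact fractions for the exponential series; `differentiate` is an assumed helper name, and the lists are truncated, so the derivative is one term shorter than its input.

```python
from fractions import Fraction
from math import factorial

def differentiate(series):
    """Term-by-term derivative of a coefficient list (power rule)."""
    # a_n x^n becomes n * a_n x^(n-1); the constant term drops out
    return [n * a_n for n, a_n in enumerate(series)][1:]

terms = 8
geometric = [1] * terms
print(differentiate(geometric))  # coefficients of 1/(1 - x)^2

exponential = [Fraction(1, factorial(n)) for n in range(terms)]
derivative = differentiate(exponential)
print(derivative == exponential[:-1])  # E'(x) agrees with E(x) term by term
```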

📊 Summary table of operations

Operation	How it works	Example
Addition	Add corresponding coefficients	(1 + x + x² + ...) + (1 − x + x² − ...) = 2 + 2x² + 2x⁴ + ...
Multiplication	Convolution: coefficient of x^n is sum of a_k·b_(n−k) for k = 0 to n	(1 + x + x² + ...)² = 1 + 2x + 3x² + 4x³ + ...
Substitution	Replace x with a monomial ax^n	Substitute x² into 1/(1 − x) to get 1/(1 − x²)
Differentiation	Apply power rule term-by-term	d/dx (1 + x + x² + ...) = 1 + 2x + 3x² + ...

🔗 How operations connect

Multiplication and differentiation can both be used to find the same closed form (e.g., 1/(1 − x)² for the sequence 1, 2, 3, ...).
Substitution extends basic identities (like the geometric series) to new sequences.
All operations preserve the algebraic structure, allowing systematic manipulation of generating functions.

Solving recurrences with generating functions

7.4 Solving recurrences with generating functions

🧭 Overview

🧠 One-sentence thesis

Generating functions transform recursive sequence definitions into algebraic equations that can be solved to yield explicit formulas for the sequence terms.

📌 Key points (3–5)

Three-step method: define the generating function, solve for it using the recurrence relation, then expand to find the coefficient formula.
Key algebraic tools: geometric series formula, partial fractions, and derivatives of power series are used to manipulate and expand generating functions.
Common confusion: the summation index often starts at n=1 after multiplying by x, so you must adjust by subtracting and adding back the a₀ term to match the full generating function.
Why it works: matching two different expansions of the same generating function lets you equate coefficients, revealing the explicit formula for the sequence.
Applications: the method solves both simple linear recurrences and more complex problems like Tower of Hanoi and plane-region counting.

🔧 The three-step method

🔧 Step 1: Define the generating function

Generating function for a sequence a₀, a₁, a₂, ...: A(x) = a₀ + a₁x + a₂x² + ⋯ = sum from n=0 to infinity of aₙxⁿ.

You assign a formal power series to the recursively defined sequence.
This step is purely definitional—no computation yet.
Example: for the sequence defined by a₀=3 and aₙ=2aₙ₋₁, you write A(x) = a₀ + a₁x + a₂x² + a₃x³ + ⋯.

🔧 Step 2: Solve for the generating function

Use the recurrence relation to create an equation that A(x) satisfies.
Key technique: multiply A(x) by x to shift indices—the coefficient of xⁿ in xA(x) is aₙ₋₁ for n≥1.
Index adjustment: after multiplying by x, the sum starts at n=1, so subtract and add a₀ to align with the full A(x).
Solve the resulting algebraic equation to get a closed form for A(x) in terms of x.

Example from the excerpt:

For aₙ = 2aₙ₋₁, multiplying A(x) by 2x gives: 2xA(x) = sum from n=1 to infinity of 2aₙ₋₁xⁿ = sum from n=1 to infinity of aₙxⁿ.
Adjust: 2xA(x) = -a₀ + a₀ + sum from n=1 to infinity of aₙxⁿ = -3 + A(x).
Solve: (2x - 1)A(x) = -3, so A(x) = 3/(1 - 2x).

🔧 Step 3: Expand to compute coefficients

Expand the closed form A(x) as a power series using known formulas (geometric series, derivatives, partial fractions).
Match the expansion sum from n=0 to infinity of (formula)xⁿ with sum from n=0 to infinity of aₙxⁿ.
Since the two series are equal, their coefficients must match: aₙ equals the formula.

Example from the excerpt:

A(x) = 3/(1 - 2x) = 3 · sum from n=0 to infinity of (2x)ⁿ = sum from n=0 to infinity of 3·2ⁿxⁿ.
Therefore aₙ = 3·2ⁿ for all n.
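
A quick numerical check of this example compares the recurrence with the formula and confirms that (1 - 2x)A(x) collapses to the constant 3. This is a sketch with illustrative variable names.

```python
def recurrence_terms(count):
    """First `count` terms of a_0 = 3, a_n = 2 * a_(n-1)."""
    a = [3]
    while len(a) < count:
        a.append(2 * a[-1])
    return a

count = 10
a = recurrence_terms(count)
closed_form = [3 * 2 ** n for n in range(count)]

# Coefficients of (1 - 2x) * A(x): a_0, then a_n - 2*a_(n-1) for n >= 1
cleared = [a[0]] + [a[n] - 2 * a[n - 1] for n in range(1, count)]

print(a == closed_form)
print(cleared)
```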

🏗️ Worked examples

🏗️ Tower of Hanoi

Recurrence: H₁=1, Hₙ = 2Hₙ₋₁ + 1 for n≥2; set H₀=0 so the recurrence holds for all n≥1.
Step 1: H(x) = H₀ + H₁x + H₂x² + H₃x³ + ⋯.
Step 2: substitute the recurrence into the coefficients:
- H(x) = H₀ + (2H₀ + 1)x + (2H₁ + 1)x² + (2H₂ + 1)x³ + ⋯.
- Split into two sums: H(x) = 2x(H₀ + H₁x + H₂x² + ⋯) + (x + x² + x³ + ⋯).
- Simplify: H(x) = 2xH(x) + x/(1 - x).
- Solve: H(x)(1 - 2x) = x/(1 - x), so H(x) = x/[(1 - x)(1 - 2x)].
Step 3: use partial fractions to write x/[(1 - x)(1 - 2x)] = a/(1 - x) + b/(1 - 2x).
- Finding a common denominator shows a = -1 and b = 1.
- H(x) = 1/(1 - 2x) - 1/(1 - x) = sum from n=0 to infinity of 2ⁿxⁿ - sum from n=0 to infinity of xⁿ = sum from n=0 to infinity of (2ⁿ - 1)xⁿ.
- Therefore Hₙ = 2ⁿ - 1 for all n.
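
The result and the partial fractions constants can both be verified numerically. The sketch below is only a sanity check; `hanoi_by_recurrence` is an assumed name.

```python
def hanoi_by_recurrence(count):
    """First `count` terms of H_0 = 0, H_n = 2 * H_(n-1) + 1."""
    h = [0]
    for n in range(1, count):
        h.append(2 * h[n - 1] + 1)
    return h

h = hanoi_by_recurrence(12)
print(h[:6])
print(h == [2 ** n - 1 for n in range(12)])

# Partial fractions check: a(1 - 2x) + b(1 - x) = (a + b) + (-2a - b)x must equal x.
a, b = -1, 1
numerator = [a + b, -2 * a - b]  # [constant term, coefficient of x]
print(numerator)
```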

🏗️ Regions of the plane

Recurrence: P₀=1, Pₙ = Pₙ₋₁ + n (number of regions cut by n lines, no two parallel, no three concurrent).
Step 1: P(x) = sum from n=0 to infinity of Pₙxⁿ.
Step 2: use the recurrence:
- Multiply P(x) by x to get xP(x) = sum from n=1 to infinity of Pₙ₋₁xⁿ.
- The generating function for n is sum from n=1 to infinity of nxⁿ = x/(1 - x)² (from Example 7.3.7).
- Add: sum from n=1 to infinity of (Pₙ₋₁ + n)xⁿ = xP(x) + x/(1 - x)².
- This equals sum from n=1 to infinity of Pₙxⁿ = P(x) - P₀.
- Solve: P(x) - xP(x) = 1 + x/(1 - x)², so P(x)(1 - x) = 1 + x/(1 - x)², giving P(x) = 1/(1 - x) + x/(1 - x)³.
Step 3: expand both terms:
- 1/(1 - x) = sum from n=0 to infinity of xⁿ (geometric series).
- For x/(1 - x)³, take the derivative of 1/(1 - x)² = sum from n=0 to infinity of (n+1)xⁿ:
  - d/dx[(1 - x)⁻²] = 2/(1 - x)³ = sum from n=1 to infinity of n(n+1)xⁿ⁻¹.
  - So 1/(1 - x)³ = (1/2)sum from n=1 to infinity of n(n+1)xⁿ⁻¹.
  - Multiply by x: x/(1 - x)³ = sum from n=1 to infinity of [n(n+1)/2]xⁿ = sum from n=0 to infinity of [n(n+1)/2]xⁿ (since the n=0 term is 0).
- Combine: P(x) = sum from n=0 to infinity of [1 + n(n+1)/2]xⁿ = sum from n=0 to infinity of [(n² + n + 2)/2]xⁿ.
- Therefore Pₙ = (n² + n + 2)/2 for all n.
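
As a check on the algebra, the recurrence and the closed form can be compared term by term. This is a minimal sketch with an assumed function name.

```python
def regions_by_recurrence(count):
    """First `count` terms of P_0 = 1, P_n = P_(n-1) + n."""
    p = [1]
    for n in range(1, count):
        p.append(p[n - 1] + n)
    return p

p = regions_by_recurrence(10)
closed_form = [(n * n + n + 2) // 2 for n in range(10)]
print(p)
print(p == closed_form)
```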

🧰 Key algebraic tools

🧰 Geometric series formula

Formula: 1/(1 - x) = 1 + x + x² + x³ + ⋯ = sum from n=0 to infinity of xⁿ.
Monomial substitution: replace x with cx to get 1/(1 - cx) = sum from n=0 to infinity of cⁿxⁿ.
Example: 1/(1 - 2x) = sum from n=0 to infinity of 2ⁿxⁿ.
This is the most frequently used expansion in step 3.

🧰 Partial fractions

Partial fractions: a method to decompose a rational function into a sum of simpler fractions.

Used when the closed form A(x) is a ratio of polynomials with a factored denominator.
Example from Tower of Hanoi: x/[(1 - x)(1 - 2x)] = a/(1 - x) + b/(1 - 2x).
Find constants a and b by adding the fractions back together (common denominator) and matching numerators.
Each simpler fraction can then be expanded using the geometric series.

🧰 Derivatives of power series

Technique: differentiate both sides of a known generating function identity to obtain new identities.
Example from the excerpt:
- Start with 1/(1 - x)² = sum from n=0 to infinity of (n+1)xⁿ.
- Differentiate: d/dx[(1 - x)⁻²] = 2/(1 - x)³ = sum from n=1 to infinity of n(n+1)xⁿ⁻¹.
- Rearrange: 1/(1 - x)³ = (1/2)sum from n=1 to infinity of n(n+1)xⁿ⁻¹.
- Multiply by x: x/(1 - x)³ = sum from n=0 to infinity of [n(n+1)/2]xⁿ.
This tool is essential for handling generating functions with higher powers in the denominator.
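
The expansion of x/(1 - x)³ can be cross-checked without calculus by cubing the geometric series and shifting. In this sketch `convolve` is an assumed helper, and all lists are truncated to the first few coefficients.

```python
def convolve(a, b, terms):
    """First `terms` coefficients of the product of two series."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(terms)]

terms = 8
geometric = [1] * terms                              # 1/(1 - x)
squared = convolve(geometric, geometric, terms)      # 1/(1 - x)^2
cubed = convolve(squared, geometric, terms)          # 1/(1 - x)^3
shifted = [0] + cubed[:-1]                           # multiply by x

print(shifted)
print(shifted == [n * (n + 1) // 2 for n in range(terms)])
```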

🔍 Common pitfalls and tips

🔍 Index alignment

The issue: after multiplying A(x) by x, the sum starts at n=1 instead of n=0.
The fix: subtract and add the a₀ term to rewrite the sum starting at n=0, matching the definition of A(x).
Example: 2xA(x) = sum from n=1 to infinity of aₙxⁿ = -a₀ + a₀ + sum from n=1 to infinity of aₙxⁿ = -a₀ + A(x).
Don't confuse: the sum from n=1 to infinity of aₙxⁿ is not equal to A(x); it equals A(x) - a₀.

🔍 Concrete handle tip

Recommendation: compute the first few terms of the sequence using the recurrence before manipulating the generating function.
Example: for a₀=3, aₙ=2aₙ₋₁, compute a₁=6, a₂=12, a₃=24.
Write out: A(x) = 3 + 6x + 12x² + 24x³ + ⋯ and 2xA(x) = 6x + 12x² + 24x³ + ⋯.
Lining up terms visually shows that A(x) - 2xA(x) = 3, making the algebra more intuitive.

🔍 Notation flexibility

The excerpt shows two styles: using "⋯" (dot-dot-dot) notation and using summation (∑) notation.
Both are valid; the dot-dot-dot style can be clearer for beginners because you see the pattern in the first few terms.
The summation style is more formal and precise, especially when adjusting indices.

📊 Linear recurrences preview

📊 General linear recurrence form

The excerpt introduces solving linear recurrences at the end (section 7.5).
Example recurrence: a₀=0, a₁=1, aₙ = 5aₙ₋₁ - 6aₙ₋₂ for n≥2.
The first several terms are 0, 1, 5, 19, 65, ⋯.
Lemma result: the explicit formula is aₙ = 3ⁿ - 2ⁿ for all n.
The proof follows the same three-step method, using the recurrence to derive a closed form for A(x), then expanding via partial fractions.

📊 Why generating functions work for recurrences

The recurrence relation gives a rule for each coefficient in terms of previous coefficients.
Multiplying the generating function by x (or x², etc.) shifts the indices, aligning terms so the recurrence can be applied.
The result is an algebraic equation in A(x) that can be solved using standard algebra (factoring, partial fractions).
Expanding the solution back into a power series reveals the explicit formula for the coefficients.

Generating functions and linear recurrences

7.5 Generating functions and linear recurrences

🧭 Overview

🧠 One-sentence thesis

Generating functions provide a systematic method to solve linear recurrences by converting the recurrence relation into an algebraic equation, factoring, and using partial fractions to extract an explicit formula for the sequence.

📌 Key points (3–5)

Core technique: multiply the generating function by the reverse of the characteristic polynomial to eliminate most coefficients, leaving only initial terms.
Partial fractions decomposition: after solving for the generating function in closed form, split it into simpler fractions that can be expanded as geometric series.
Extracting the formula: the coefficient of x to the power n in the expanded generating function gives the explicit formula for the n-th term.
Common confusion: the "reverse" characteristic polynomial (1 minus c₁x minus c₂x² etc.) is used, not the original characteristic polynomial (x² minus c₁x minus c₂ etc.).
General result: when all roots of the characteristic polynomial are distinct, the solution is always a linear combination of powers of those roots.

🔧 The generating function method

🔧 Setting up the generating function

Generating function A(x) for a sequence a₀, a₁, a₂, ... is the infinite series A(x) = a₀ + a₁x + a₂x² + ... = sum from n=0 to infinity of aₙxⁿ.

The generating function encodes the entire sequence as coefficients of a power series.
Example: for the sequence 0, 1, 5, 19, 65, ... the generating function starts as 0 + x + 5x² + 19x³ + 65x⁴ + ...

🔄 Using the recurrence relation

Substitute the recurrence relation into the infinite sum starting from the appropriate index.
For aₙ = 5aₙ₋₁ - 6aₙ₋₂ (n ≥ 2), write A(x) = a₀ + a₁x + sum from n=2 to infinity of (5aₙ₋₁ - 6aₙ₋₂)xⁿ.
Re-index the sums and factor out powers of x to express everything in terms of A(x) itself.
This produces an algebraic equation relating A(x) to itself.

⚙️ The reverse characteristic polynomial trick

Reverse characteristic polynomial: obtained by reversing the coefficients of the characteristic polynomial; for aₙ - c₁aₙ₋₁ - c₂aₙ₋₂ - ... - c_d·a_{n-d} = 0, it is 1 - c₁x - c₂x² - ... - c_d·xᵈ.

The excerpt shows two equivalent approaches:
1. Manipulate the generating function algebraically until you isolate A(x).
2. Directly multiply A(x) by the reverse characteristic polynomial.
Why this works: multiplying by (1 - c₁x - c₂x² - ... - c_d·xᵈ) causes coefficients to cancel for n ≥ d, leaving only the initial terms.
Example: for aₙ = 5aₙ₋₁ - 6aₙ₋₂, the characteristic polynomial is x² - 5x + 6, so the reverse is 1 - 5x + 6x².
Don't confuse: the reverse polynomial has the constant term first (1 - ...), not the highest power first.
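
The cancellation can be checked numerically on a truncated series. This is a minimal sketch for the running example aₙ = 5aₙ₋₁ - 6aₙ₋₂ with a₀ = 0, a₁ = 1; the helper names `sequence_terms` and `multiply_by_polynomial` are illustrative and not taken from the text.

```python
def sequence_terms(count):
    """First `count` terms of a_n = 5a_{n-1} - 6a_{n-2} with a_0 = 0, a_1 = 1."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(5 * terms[-1] - 6 * terms[-2])
    return terms[:count]


def multiply_by_polynomial(poly, series):
    """Coefficients of poly(x) * series(x), truncated to len(series) terms."""
    product = [0] * len(series)
    for i, p in enumerate(poly):
        for j, s in enumerate(series):
            if i + j < len(series):
                product[i + j] += p * s
    return product


reverse_poly = [1, -5, 6]  # coefficients of 1 - 5x + 6x^2, constant term first
cancelled = multiply_by_polynomial(reverse_poly, sequence_terms(10))
print(cancelled)  # every coefficient from x^2 onward should be zero
```

Only the x term survives, which matches the numerator x in A(x) = x/(1 - 5x + 6x²).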

🧮 Solving for the explicit formula

🧮 Factoring and partial fractions

After solving for A(x), factor the denominator into linear factors (1 - r₁x)(1 - r₂x)...(1 - r_d·x), where r₁, r₂, ..., r_d are the roots of the characteristic polynomial.
Use partial fractions to decompose A(x) into a sum: A(x) = z₁/(1 - r₁x) + z₂/(1 - r₂x) + ... + z_d/(1 - r_d·x).
Solve for the constants z₁, z₂, ..., z_d by combining fractions and comparing coefficients.

Example from the excerpt:

For A(x) = x/((1 - 2x)(1 - 3x)), write A(x) = r/(1 - 2x) + s/(1 - 3x).
Combine to get (r + s - (3r + 2s)x)/((1 - 2x)(1 - 3x)) = x/((1 - 2x)(1 - 3x)).
Equate numerators: r + s = 0 and 3r + 2s = -1, giving r = -1 and s = 1.

📐 Expanding as geometric series

Each fraction 1/(1 - rx) expands as the geometric series 1 + rx + r²x² + r³x³ + ... = sum from n=0 to infinity of rⁿxⁿ.
Multiply by the constant from partial fractions: z/(1 - rx) = sum from n=0 to infinity of z·rⁿxⁿ.
Add all the expanded series together.
The coefficient of xⁿ in the final sum is the explicit formula for aₙ.

Example: A(x) = -1/(1 - 2x) + 1/(1 - 3x) expands to sum from n=0 to infinity of (3ⁿ - 2ⁿ)xⁿ, so aₙ = 3ⁿ - 2ⁿ.

🌟 Worked examples

🌟 Example: aₙ = 5aₙ₋₁ - 6aₙ₋₂

Initial conditions: a₀ = 0, a₁ = 1.
Sequence: 0, 1, 5, 19, 65, ...
After manipulation, A(x) = x/(1 - 5x + 6x²) = x/((1 - 2x)(1 - 3x)).
Partial fractions: A(x) = -1/(1 - 2x) + 1/(1 - 3x).
Explicit formula: aₙ = 3ⁿ - 2ⁿ for all n.
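
As a sanity check on this worked example, the sketch below compares terms produced by the recurrence with the coefficients read off from the two geometric series. The function names are illustrative assumptions.

```python
def recurrence_terms(count):
    """a_0 .. a_{count-1} for a_n = 5a_{n-1} - 6a_{n-2}, a_0 = 0, a_1 = 1."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(5 * terms[-1] - 6 * terms[-2])
    return terms[:count]


def partial_fraction_terms(count):
    """Coefficients of -1/(1 - 2x) + 1/(1 - 3x), one geometric series each."""
    return [(-1) * 2**n + 1 * 3**n for n in range(count)]


print(recurrence_terms(8))
print(partial_fraction_terms(8))
```

Both lines should print the same list, since the coefficient of xⁿ in the partial-fraction form is 3ⁿ - 2ⁿ.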

🌟 Example: Fibonacci numbers

Initial conditions: F₀ = 0, F₁ = 1, recurrence Fₙ = Fₙ₋₁ + Fₙ₋₂ for n ≥ 2.
Generating function: G(x) = sum from n=0 to infinity of Fₙxⁿ.
Multiply by reverse characteristic polynomial (1 - x - x²): G(x)(1 - x - x²) = x.
Solve: G(x) = x/(1 - x - x²).
Factor denominator: (1 - ax)(1 - bx) where a = (1 + √5)/2 and b = (1 - √5)/2.
Partial fractions: G(x) = (1/√5)/(1 - ax) - (1/√5)/(1 - bx).
Explicit formula: Fₙ = (1/√5)·((1 + √5)/2)ⁿ - (1/√5)·((1 - √5)/2)ⁿ.
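
The explicit formula can be compared against the recurrence numerically. This sketch uses floating-point square roots, so it rounds the result and is only trustworthy for moderately small n; `binet` is a name chosen here for illustration.

```python
from math import sqrt


def fibonacci(count):
    """F_0 .. F_{count-1} from the recurrence F_n = F_{n-1} + F_{n-2}."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


def binet(n):
    """Evaluate (a^n - b^n) / sqrt(5) with a, b the two roots from the text."""
    root5 = sqrt(5)
    a = (1 + root5) / 2
    b = (1 - root5) / 2
    return (a**n - b**n) / root5


for n, exact in enumerate(fibonacci(10)):
    print(n, exact, round(binet(n)))
```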

📜 General theorem for distinct roots

📜 Statement of the theorem

Theorem 7.5.5: If r₁, ..., r_d are the distinct roots of the characteristic polynomial of the recurrence aₙ = c₁aₙ₋₁ + c₂aₙ₋₂ + ... + c_d·a_{n-d}, then there exist unique constants z₁, ..., z_d such that aₙ = z₁r₁ⁿ + z₂r₂ⁿ + ... + z_d·(r_d)ⁿ for all n ≥ 1.

The theorem applies only when all roots are different (distinct).
The constants z₁, ..., z_d are determined uniquely by the initial conditions.

📜 Proof outline

Define the reverse polynomial r(x) = 1 - c₁x - c₂x² - ... - c_d·xᵈ.
Multiply the generating function A(x) by r(x); all coefficients from xᵈ onward cancel, leaving r(x)A(x) = p(x) for some polynomial p(x).
Show that r(x) factors as (1 - r₁x)(1 - r₂x)...(1 - r_d·x) by relating it to the characteristic polynomial c(x) = (x - r₁)(x - r₂)...(x - r_d).
Key identity: r(x) = xᵈ·c(1/x), which after simplification gives the factorization.
Use partial fractions to decompose A(x) = p(x)/r(x) into z₁/(1 - r₁x) + ... + z_d/(1 - r_d·x).
Expand each term as a geometric series; the coefficient of xⁿ is z₁r₁ⁿ + ... + z_d·(r_d)ⁿ.

📜 Why the reverse polynomial works

The characteristic polynomial c(x) = xᵈ - c₁xᵈ⁻¹ - ... - c_d has roots r₁, ..., r_d.
The reverse polynomial r(x) = 1 - c₁x - c₂x² - ... - c_d·xᵈ is related by r(x) = xᵈ·c(1/x).
Substituting c(x) = (x - r₁)...(x - r_d) into this identity and simplifying yields r(x) = (1 - r₁x)...(1 - r_d·x).
This factorization is essential for partial fractions decomposition.

The Catalan numbers

7.6 The Catalan numbers

🧭 Overview

🧠 One-sentence thesis

The Catalan numbers can be derived using generating functions by recognizing that their recursive definition corresponds to squaring the generating function, leading to an explicit closed formula.

📌 Key points (3–5)

Recursive definition: Each Catalan number is defined by a sum involving products of earlier Catalan numbers: C_{n+1} = C_0·C_n + C_1·C_{n-1} + ... + C_n·C_0, with C_0 = 1.
Generating function approach: The recursion translates into the equation C(x)² = (C(x) - C_0)/x, which becomes a quadratic equation solvable by the quadratic formula.
Generalized binomial coefficients: To expand the square root in the solution, the text extends binomial coefficients to rational exponents, not just positive integers.
Common confusion: When solving C(x) = (1 ± √(1-4x))/(2x), you must choose the negative sign so the constant term cancels and all terms are divisible by 2x.
Explicit formula: The final result is C_n = (1/(n+1)) · (2n choose n), derived by expanding the generating function and extracting coefficients.

🔢 The Catalan recursion and generating function setup

🔢 Recursive definition

The Catalan numbers are defined by the recursion C_{n+1} = C_0·C_n + C_1·C_{n-1} + C_2·C_{n-2} + ... + C_n·C_0, with initial condition C_0 = 1.

This means each new Catalan number is built from products of all earlier pairs.
Example: C_1 = C_0·C_0, C_2 = C_0·C_1 + C_1·C_0, C_3 = C_0·C_2 + C_1·C_1 + C_2·C_0.

📊 Defining the generating function

The generating function is:

C(x) = C_0 + C_1·x + C_2·x² + C_3·x³ + ...
This packages all Catalan numbers into a single formal power series.

🧮 Solving for the generating function

🧮 Recognizing the product structure

When you multiply C(x) by itself:

C(x)² = C(x) · C(x) = C_0·C_0 + (C_0·C_1 + C_1·C_0)·x + (C_0·C_2 + C_1·C_1 + C_2·C_0)·x² + ...
The coefficient of x^n in C(x)² exactly matches the right-hand side of the recursion for C_{n+1}.
Therefore: C(x)² = C_1 + C_2·x + C_3·x² + C_4·x³ + ...

🔧 Manipulating to isolate C(x)

To align subscripts with powers of x:

Multiply both sides by x: x·C(x)² = C_1·x + C_2·x² + C_3·x³ + ...
The right side is almost C(x), but missing the C_0 term.
Add C_0 to both sides: x·C(x)² + C_0 = C_0 + C_1·x + C_2·x² + ... = C(x).
Since C_0 = 1, this becomes: x·C(x)² + 1 = C(x), or x·C(x)² - C(x) + 1 = 0.

🎯 Applying the quadratic formula

Treating this as a quadratic in C(x):

C(x) = (1 ± √(1 - 4x)) / (2x)
Which sign to choose? Use the negative sign (minus) so that the constant term 1 in the numerator cancels and all terms become divisible by 2x.
Don't confuse: both signs solve the quadratic algebraically, but only the minus sign gives a valid power series starting with C_0 = 1.

🔬 Expanding the square root

🔬 Interpreting √(1 - 4x) as a generating function

To find coefficients, think of √(1 - 4x) as "the generating function whose square is 1 - 4x."

Write (a_0 + a_1·x + a_2·x² + ...)² = 1 - 4x.
Expand the left side: a_0² + (a_0·a_1 + a_1·a_0)·x + (a_0·a_2 + a_1·a_1 + a_2·a_0)·x² + ... = 1 - 4x.
Match coefficients term by term:
- From x⁰: a_0² = 1, so a_0 = 1 (take the positive root by convention).
- From x¹: 2·a_0·a_1 = -4, so a_1 = -2.
- From x²: 2·a_0·a_2 + a_1² = 0, so a_2 = -2.
Each coefficient a_n is uniquely determined by earlier coefficients.

📐 Generalized binomial coefficients

Definition: For any rational number α and nonnegative integer k, define (α choose k) = α(α-1)(α-2)···(α-k+1) / k! if k ≥ 1, and (α choose 0) = 1.

When α is a positive integer n, this matches the ordinary binomial coefficient (n choose k).
Example: (1/2 choose 3) = (1/2)(-1/2)(-3/2) / 3! = (3/8) / 6 = 1/16.

🧪 Generalized binomial theorem

Theorem: As a generating function, (1 + x)^α = Σ_{k=0}^∞ (α choose k)·x^k.

This extends the ordinary binomial theorem to rational exponents.
Example: Set α = 1/2 and substitute -4x for x:
- √(1 - 4x) = Σ_{k=0}^∞ (1/2 choose k)·(-4x)^k = Σ_{k=0}^∞ (1/2 choose k)·(-1)^k·4^k·x^k.
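
The definition and the expansion can be tried out with exact rational arithmetic. In this sketch `generalized_binomial` and `multiply_truncated` are illustrative names; the code computes the first few coefficients of √(1 - 4x) from the theorem and then squares the truncated series to see whether 1 - 4x comes back.

```python
from fractions import Fraction
from math import factorial


def generalized_binomial(alpha, k):
    """(alpha choose k) for rational alpha and integer k >= 0."""
    numerator = Fraction(1)
    for i in range(k):
        numerator *= Fraction(alpha) - i
    return numerator / factorial(k)


def multiply_truncated(p, q, terms):
    """Coefficients of p(x) * q(x), keeping powers below x^terms."""
    product = [0] * terms
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < terms:
                product[i + j] += a * b
    return product


half = Fraction(1, 2)
print(generalized_binomial(half, 3))

# Coefficient of x^k in sqrt(1 - 4x) is (1/2 choose k) * (-4)^k.
sqrt_coefficients = [generalized_binomial(half, k) * (-4) ** k for k in range(5)]
print(sqrt_coefficients)
print(multiply_truncated(sqrt_coefficients, sqrt_coefficients, 5))
```

The first three coefficients agree with the values a_0 = 1, a_1 = -2, a_2 = -2 found by matching coefficients.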

🎓 Deriving the explicit formula

🎓 Extracting the coefficient of x^n

Starting from C(x) = (1 - √(1 - 4x)) / (2x):

Substitute the expansion: C(x) = (1 - (Σ_{k=0}^∞ (1/2 choose k)·(-1)^k·4^k·x^k)) / (2x).
Separate the k=0 term: C(x) = (1 - (1 + Σ_{k=1}^∞ (1/2 choose k)·(-1)^k·4^k·x^k)) / (2x).
Simplify: C(x) = (-Σ_{k=1}^∞ (1/2 choose k)·(-1)^k·4^k·x^k) / (2x).
Substitute n = k - 1: C(x) = -Σ_{n=0}^∞ (1/2 choose n+1)·(-1)^{n+1}·4^{n+1}·x^{n+1} / (2x).
Cancel x and simplify, using -(-1)^{n+1} = (-1)^n and 4^{n+1}/2 = 2^{2n+1}: C(x) = Σ_{n=0}^∞ (1/2 choose n+1)·(-1)^n·2^{2n+1}·x^n.

🏆 Final closed formula

From Example 7.6.3, (1/2 choose n+1) = (2n)!·(-1)^n / (2^{2n+1}·n!·(n+1)!).

Substitute into the coefficient of x^n:
- C_n = [(2n)!·(-1)^n / (2^{2n+1}·n!·(n+1)!)] · [(-1)^n·2^{2n+1}].
- The (-1)^n and 2^{2n+1} terms cancel.
- C_n = (2n)! / (n!·(n+1)!) = (1/(n+1)) · (2n)! / (n!·n!) = (1/(n+1)) · (2n choose n).

Corollary: The n-th Catalan number is C_n = (1/(n+1)) · (2n choose n).

This completes the derivation using generating functions.
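
A quick way to gain confidence in the corollary is to compute the Catalan numbers twice, once from the recursion and once from the closed formula. The sketch below does that; the function names are illustrative.

```python
from math import comb


def catalan_by_recursion(count):
    """C_0 .. C_{count-1} from C_{n+1} = C_0*C_n + C_1*C_{n-1} + ... + C_n*C_0."""
    c = [1]  # C_0 = 1
    for n in range(count - 1):
        c.append(sum(c[i] * c[n - i] for i in range(n + 1)))
    return c


def catalan_closed_form(n):
    """C_n = (2n choose n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)


print(catalan_by_recursion(10))
print([catalan_closed_form(n) for n in range(10)])
```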

Additional Problems for Chapter 7

7.7 Additional problems for Chapter 7

🧭 Overview

🧠 One-sentence thesis

This section provides practice problems on generating functions, including coin-change polynomials, closed-form coefficients, recursive sequences, and an introduction to partition theory using infinite product formulas.

📌 Key points (3–5)

Coin-change problems: model the number of ways to make change by multiplying polynomials whose exponents represent coin values.
Coefficient extraction: find coefficients in generating functions with closed forms like 1/(1 − 3x) or x²/(1 − 3x).
Recursive sequences: use generating functions to find closed formulas for sequences defined by recurrence relations (e.g., Fibonacci-like sequences).
Partitions: a partition of n is a way to write n as a sum of positive integers where order doesn't matter; p(n) counts all partitions, q(n) counts partitions with distinct parts.
Common confusion: in partitions, (2, 2, 1) and (2, 1, 2) are the same partition because order doesn't matter; parts are written in weakly decreasing order by convention.

💰 Coin-change and polynomial methods

💰 Modeling coin combinations

Problem setup: you have 8 pennies, 3 nickels, and 1 dime; you want to give a subset to a friend.
For each coin type, write a polynomial whose exponents are the possible values you could give:
- Pennies: exponents 0, 1, 2, …, 8 (since you have 8 pennies, each worth 1 cent).
- Nickels: exponents 0, 5, 10, 15 (since you have 3 nickels, each worth 5 cents).
- Dimes: exponents 0, 10 (since you have 1 dime worth 10 cents).
Example polynomial for dimes: f_d = 1 + x^10.

🧮 Multiplying polynomials to count ways

Multiply the three polynomials f_p, f_n, f_d to get a product F.
The coefficient of x^k in F tells you how many ways you can give k cents.
Example questions:
- What is the degree of F? (The largest power of x that appears.)
- How many ways can you give 17 cents? (Look at the coefficient of x^17.)
- How many ways can you give 20 cents? (Look at the coefficient of x^20.)
If the number of dimes changes to 2, the polynomial for dimes becomes f_d = 1 + x^10 + x^20, which changes the degree and the coefficients.
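
For readers without Sage at hand, the same product can be formed in plain Python by storing each polynomial as a list of coefficients. This is a sketch under that representation; `coin_polynomial` and `multiply` are illustrative helpers, not commands from the text.

```python
def coin_polynomial(value, count):
    """Coefficient list of 1 + x^value + x^(2*value) + ... + x^(count*value)."""
    poly = [0] * (value * count + 1)
    for used in range(count + 1):
        poly[used * value] = 1
    return poly


def multiply(p, q):
    """Coefficient list of the product p(x) * q(x)."""
    product = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product


f_p = coin_polynomial(1, 8)    # 8 pennies
f_n = coin_polynomial(5, 3)    # 3 nickels
f_d = coin_polynomial(10, 1)   # 1 dime
F = multiply(multiply(f_p, f_n), f_d)

degree = len(F) - 1
print(degree, F[17], F[20])
```

Changing the last argument of `coin_polynomial(10, 1)` to 2 models the two-dime variant.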

🖥️ Using Sage for computation

The excerpt suggests using Sage (sagecell.sagemath.org) to define and expand the polynomials.
Useful commands:
- Define a polynomial ring: R = PolynomialRing(RR, x).
- Expand the product: F = expand(fp * fn * fd).
- Extract a specific coefficient: F.coefficients()[17] returns the coefficient of x^17 and the number 17.
Don't confuse: ending a Sage cell with F returns the entire polynomial; using F.coefficients() returns only the coefficients.

🔢 Coefficient extraction from closed forms

🔢 Geometric series and powers

Problem 3: What is the coefficient of x^4 in 1/(1 − 3x)?
- Use the geometric series formula: 1/(1 − 3x) = sum from k=0 to infinity of (3x)^k = sum of 3^k x^k.
- The coefficient of x^4 is 3^4 = 81.
Problem 4: What is the coefficient of x^4 in x²/(1 − 3x)?
- Rewrite: x²/(1 − 3x) = x² · (sum of 3^k x^k) = sum of 3^k x^(k+2).
- For x^4, we need k+2 = 4, so k = 2; the coefficient is 3^2 = 9.

🔢 Squared denominators

Problem 5: What is the coefficient of x^4 in 1/(1 − 3x)²?
- Use the formula for 1/(1 − rx)²: the coefficient of x^n is (n+1) r^n.
- Here r = 3, so the coefficient of x^4 is (4+1) · 3^4 = 5 · 81 = 405.
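
Problems 3 to 5 can be cross-checked by working with truncated power series directly. In this sketch a series is a list of coefficients, and the helper names are assumptions made for illustration.

```python
def geometric_series(ratio, terms):
    """Coefficients of 1/(1 - ratio*x) up to x^(terms-1)."""
    return [ratio**k for k in range(terms)]


def multiply_truncated(p, q, terms):
    """Coefficients of p(x) * q(x), keeping powers below x^terms."""
    product = [0] * terms
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < terms:
                product[i + j] += a * b
    return product


terms = 6
g = geometric_series(3, terms)              # 1/(1 - 3x)
shifted = [0, 0] + g[: terms - 2]           # x^2/(1 - 3x): shift by two places
squared = multiply_truncated(g, g, terms)   # 1/(1 - 3x)^2

print(g[4], shifted[4], squared[4])
```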

🔢 Fibonacci generating functions

Problem 6: Let F(x) = F_0 + F_1 x + F_2 x² + … be the generating function of the Fibonacci numbers (F_0 = 0, F_1 = 1, F_n = F_(n−1) + F_(n−2) for n ≥ 2).
- Compute the coefficient of x^5 in 1/(1 − x F(x)).
- This requires expanding 1/(1 − x F(x)) as a geometric series in x F(x) and extracting the x^5 term.
Problem 7: What is the coefficient of x^5 in d/dx F(x)?
- Differentiate F(x) term by term: d/dx (F_n x^n) = n F_n x^(n−1).
- The coefficient of x^5 in d/dx F(x) is 6 F_6 (since the x^6 term in F(x) becomes 6 F_6 x^5 after differentiation).

🔁 Recursive sequences and closed formulas

🔁 Finding closed forms for recurrences

Problem 8: Find a closed formula for the generating function of the sequence defined by b_0 = 5, b_1 = 12, and b_n = 5 b_(n−1) + 6 b_(n−2) for all n ≥ 2.
- Let B(x) = sum of b_n x^n.
- Multiply the recurrence by x^n and sum to relate B(x) to itself.
- Solve for B(x) to get a closed form (typically a rational function).

🔁 Cumulative sequences

Problem 9: Find a closed formula for the generating function of the sequence defined by e_0 = 1 and e_n = e_(n−1) + e_(n−2) + … + e_0 for all n ≥ 1.
- Notice that e_n is the sum of all previous terms.
- Let E(x) = sum of e_n x^n.
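- The recurrence e_n = sum from k=0 to n−1 of e_k implies E(x) = 1 + x E(x) / (1 − x), because the coefficient of x^n in E(x)/(1 − x) is the running sum e_0 + … + e_n, and multiplying by x shifts that sum to the coefficient of x^(n+1), where it equals e_(n+1).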
- Solve for E(x).
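
Before solving for E(x), the functional equation itself can be tested on the first few terms. This sketch builds the sequence from its definition and then forms the coefficients of 1 + x E(x)/(1 − x) from running sums; both helper names are illustrative.

```python
def cumulative_sequence(count):
    """e_0 .. e_{count-1} with e_0 = 1 and e_n = e_0 + ... + e_{n-1}."""
    e = [1]
    while len(e) < count:
        e.append(sum(e))
    return e


def right_hand_side(e):
    """Coefficients of 1 + x*E(x)/(1 - x), truncated to len(e) terms."""
    running, partial_sums = 0, []
    for term in e:
        running += term
        partial_sums.append(running)      # coefficients of E(x)/(1 - x)
    shifted = [0] + partial_sums[:-1]     # multiply by x
    shifted[0] += 1                       # add the constant 1
    return shifted


e = cumulative_sequence(10)
print(e)
print(right_hand_side(e))
```

If the two printed lists agree, the relation holds at least through the terms computed.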

🔁 Proving identities with generating functions

Problem 10: Prove, using generating functions, that sum from k=0 to n of k · (n choose k) = n · 2^(n−1).
- Consider the generating function (1 + x)^n = sum of (n choose k) x^k.
- Differentiate both sides with respect to x: n (1 + x)^(n−1) = sum of k (n choose k) x^(k−1).
- Set x = 1 to get n · 2^(n−1) = sum of k (n choose k).
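
The identity is easy to spot-check for small n before or after proving it. A minimal sketch, with `weighted_sum` as an illustrative name:

```python
from math import comb


def weighted_sum(n):
    """Left-hand side: sum over k of k * (n choose k)."""
    return sum(k * comb(n, k) for k in range(n + 1))


for n in range(1, 8):
    print(n, weighted_sum(n), n * 2 ** (n - 1))
```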

🔁 Fibonacci expansion

Problem 11: The generating function for the Fibonacci numbers is sum of F_n x^n = x / (1 − x − x²).
- Rewrite as x / (1 − (x + x²)) and use the geometric series formula: 1 / (1 − r) = sum of r^k.
- Here r = x + x², so x / (1 − (x + x²)) = x · sum of (x + x²)^k.
- Expand (x + x²)^k and collect terms to prove that F_n = sum from k=0 to n−1 of (n − k − 1 choose k).
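
The binomial-sum formula can be compared with the Fibonacci recurrence numerically. In this sketch, `math.comb` returns 0 whenever the lower index exceeds the upper one, which is what makes the sum stop on its own.

```python
from math import comb


def fibonacci(count):
    """F_0 .. F_{count-1} from the recurrence."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


def binomial_sum(n):
    """Sum from k = 0 to n - 1 of (n - k - 1 choose k)."""
    return sum(comb(n - k - 1, k) for k in range(n))


print(fibonacci(12))
print([binomial_sum(n) for n in range(12)])
```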

🧩 Partitions and their generating functions

🧩 What is a partition?

Partition of a positive integer n: a way of writing n as a sum of other positive integers, where the order of the summands doesn't matter.

The summands are called the parts of the partition.
Convention: write parts in weakly decreasing order: λ_1 ≥ λ_2 ≥ … ≥ λ_k.
Example: since 5 = 2 + 1 + 2, the tuple (2, 2, 1) is a partition of 5 (written in decreasing order).

🧩 Counting partitions

p(n): the number of partitions of n (all partitions).
q(n): the number of partitions of n whose parts are all distinct.
Example: the seven partitions of 5 are:
- (5), (4, 1), (3, 2), (3, 1, 1), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1).
- So p(5) = 7.
- The partitions with distinct parts are (5), (4, 1), (3, 2), so q(5) = 3.
Example: p(6) = 11 and q(6) = 4.
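
The small values above can be reproduced by listing partitions directly. This is a brute-force sketch meant only for small n; the generator `partitions` and the counters `p` and `q` are written here for illustration.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing tuples with parts <= largest."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def p(n):
    """Number of partitions of n."""
    return sum(1 for _ in partitions(n))


def q(n):
    """Number of partitions of n whose parts are all distinct."""
    return sum(1 for lam in partitions(n) if len(set(lam)) == len(lam))


print(list(partitions(5)))
print(p(5), q(5), p(6), q(6))
```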

🧩 Generating function for partitions

The generating function for the partition numbers p(n) has an infinite product formula:
- sum from n=0 to infinity of p(n) x^n = 1 / ((1 − x) · (1 − x²) · (1 − x³) · …).
Each factor 1/(1 − x^k) corresponds to the choice of how many parts of size k to include in a partition.
Don't confuse: this is an infinite product, not a finite polynomial product like the coin-change problems.
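
Although the product is infinite, only the factors with k ≤ n affect the coefficient of x^n, so a truncated computation recovers p(n) exactly. The sketch below multiplies by one factor 1/(1 − x^k) at a time; `partition_numbers` is an illustrative name.

```python
def partition_numbers(limit):
    """p(0) .. p(limit) from the product of 1/(1 - x^k) for k = 1 .. limit."""
    coefficients = [1] + [0] * limit
    for k in range(1, limit + 1):
        # Multiplying by 1/(1 - x^k) = 1 + x^k + x^(2k) + ... in place:
        # each new coefficient adds the already-updated one k places earlier.
        for n in range(k, limit + 1):
            coefficients[n] += coefficients[n - k]
    return coefficients


print(partition_numbers(10))
```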

🧩 Distinct parts

For partitions with distinct parts (counted by q(n)), the generating function is different (the excerpt is cut off, but typically involves products of (1 + x^k) instead of 1/(1 − x^k)).

Investigation: Partitions and their generating functions

7.8 Investigation: Partitions and their generating functions

🧭 Overview

🧠 One-sentence thesis

Partitions of integers can be enumerated using generating functions expressed as infinite products, and their Young diagrams reveal deep combinatorial identities and bijections.

📌 Key points (3–5)

What a partition is: a way to write a positive integer as a sum of positive integers where order doesn't matter, with parts written in weakly decreasing order.
Generating functions for partitions: the number of partitions p(n) has generating function equal to an infinite product of terms 1/(1 − x^i), while distinct-part partitions q(n) correspond to the product of (1 + x^i).
Convergence of infinite products: an infinite product of generating functions converges when the coefficients of partial products stabilize; not all infinite products converge.
Common confusion: partitions vs compositions—partitions ignore order (e.g., (2,2,1) and (2,1,2) are the same partition), while the parts must be arranged in weakly decreasing order as a canonical form.
Young diagrams and bijections: visual representations of partitions as stacks of boxes reveal connections to lattice paths, Catalan numbers, and self-conjugate partitions.

🔢 Partition fundamentals

🔢 Definition and notation

A partition of a positive integer n is a way of writing n as a sum of other positive integers, where the order of the summands doesn't matter.

The summands are called parts of the partition.
Notation: λ = (λ₁, λ₂, ..., λₖ) represents the partition n = λ₁ + λ₂ + ⋯ + λₖ.
Parts are ordered in weakly decreasing order: λ₁ ≥ λ₂ ≥ ⋯ ≥ λₖ.
Example: Since 5 = 2 + 1 + 2, the tuple (2, 2, 1) is a partition of 5 (parts reordered as 2 ≥ 2 ≥ 1).

📊 Counting partitions

p(n): the number of partitions of n (all partitions).
q(n): the number of partitions of n whose parts are all distinct.
Example: The seven partitions of 5 are (5), (4,1), (3,2), (3,1,1), (2,2,1), (2,1,1,1), (1,1,1,1,1), so p(5) = 7.
Among these, only (5), (4,1), and (3,2) have all distinct parts, so q(5) = 3.

🔄 Order matters for canonical form

Don't confuse: The partition (2,2,1) is the same as (2,1,2) or (1,2,2) because order doesn't matter in the definition.
However, we write parts in weakly decreasing order as a canonical choice to avoid counting the same partition multiple times.

∞ Infinite products and generating functions

∞ Generating function for p(n)

The generating function for partition numbers has an infinite product formula:

Sum from n=0 to ∞ of p(n)x^n = 1/(1−x) · 1/(1−x²) · 1/(1−x³) · 1/(1−x⁴) · ⋯

Each factor 1/(1−x^i) corresponds to allowing parts of size i in the partition.
The video explains this informally; the excerpt explores rigorous details.

🔍 Convergence of infinite products

An infinite product of generating functions F₁(x)F₂(x)F₃(x)⋯ converges to G(x) = Σbᵢx^i if the partial products Pₘ(x) = F₁(x)⋯Fₘ(x) have coefficients that eventually match those of G(x).

"Eventually match" means: for all i, there exists N such that if m > N, the coefficient of x^i in Pₘ(x) equals bᵢ.
If the product converges to G(x), we say the infinite product equals G(x).

⚠️ When products don't converge

Example from Question 7.8.7: If all Fᵢ(x) = 1 + x, then:

P₁(x) = 1 + x
P₂(x) = (1 + x)²
P₃(x) = (1 + x)³

The coefficient of x in Pₘ(x) is m (from the binomial expansion), which grows without bound as m increases. This pattern does not stabilize, so the infinite product (1+x)(1+x)(1+x)⋯ does not converge to any generating function.

✅ When products do converge

Example from Question 7.8.8: Define Fᵢ(x) = 1 + x^(2^(i−1)), so:

F₁(x) = 1 + x
F₂(x) = 1 + x²
F₃(x) = 1 + x⁴
F₄(x) = 1 + x⁸

The partial products expand to include all possible sums of distinct powers of 2. The infinite product (1+x)(1+x²)(1+x⁴)(1+x⁸)⋯ converges because each coefficient of x^i stabilizes once enough factors are included to represent i in binary.
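
The two examples can be compared side by side by computing partial products and watching the low-order coefficients. This is a sketch with truncated coefficient lists; all helper names are illustrative.

```python
def multiply_truncated(p, q, terms):
    """Coefficients of p(x) * q(x), keeping powers below x^terms."""
    product = [0] * terms
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < terms:
                product[i + j] += a * b
    return product


def partial_product(factors, terms):
    """Truncated product of a list of polynomials given as coefficient lists."""
    result = [1] + [0] * (terms - 1)
    for factor in factors:
        result = multiply_truncated(result, factor, terms)
    return result


def one_plus_power(exponent):
    """Coefficient list of 1 + x^exponent."""
    return [1] + [0] * (exponent - 1) + [1]


terms = 8
for m in range(1, 6):
    constant_factors = [one_plus_power(1)] * m
    binary_factors = [one_plus_power(2 ** (i - 1)) for i in range(1, m + 1)]
    x_coefficient = partial_product(constant_factors, terms)[1]
    print(m, x_coefficient, partial_product(binary_factors, terms))
```

In the first column of results the coefficient of x keeps growing with m, while the list for the second product stops changing once enough factors are included.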

🎯 Special partition generating functions

🎯 Distinct-part partitions

From Question 7.8.9:

Product from i=1 to ∞ of (1 + x^i) = Sum from n=0 to ∞ of q(n)x^n

Each factor (1 + x^i) represents the choice: either include part i once (contributing x^i) or don't include it (contributing 1).
This counts partitions where all parts are distinct.

🎯 Even-part partitions

From Question 7.8.10:

Product from i=1 to ∞ of 1/(1−x^(2i)) = Sum of E(n)x^n

where E(n) is the number of partitions of n that use only even parts.

Each factor 1/(1−x^(2i)) allows using the even part 2i any number of times (0, 1, 2, ...).

🔗 A remarkable identity

Question 7.8.11 asks to multiply the two products above:

[Product of (1 + x^i)] · [Product of 1/(1−x^(2i))]

By factoring 1 − x^(2i) = (1 − x^i)(1 + x^i), the (1 + x^i) terms cancel, leaving:

Product from i=1 to ∞ of 1/(1−x^i)

This is the generating function for all partitions p(n). Comparing coefficients yields a combinatorial identity relating q(n), E(n), and p(n).
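
Assuming the intended identity is the usual coefficient rule for a product of two series, namely p(n) = q(0)·E(n) + q(1)·E(n−1) + ⋯ + q(n)·E(0), it can be checked numerically. The sketch below computes all three sequences from their truncated products; the function names are illustrative.

```python
def distinct_part_counts(limit):
    """q(0) .. q(limit) from the product of (1 + x^i)."""
    q = [1] + [0] * limit
    for i in range(1, limit + 1):
        # Go downward so that the part i is used at most once.
        for n in range(limit, i - 1, -1):
            q[n] += q[n - i]
    return q


def restricted_partition_counts(limit, parts):
    """Counts of partitions of 0 .. limit using only the given part sizes."""
    counts = [1] + [0] * limit
    for part in parts:
        for n in range(part, limit + 1):
            counts[n] += counts[n - part]
    return counts


limit = 12
q = distinct_part_counts(limit)
E = restricted_partition_counts(limit, range(2, limit + 1, 2))   # even parts only
p = restricted_partition_counts(limit, range(1, limit + 1))      # all parts
convolution = [sum(q[k] * E[n - k] for k in range(n + 1)) for n in range(limit + 1)]

print(p)
print(convolution)
```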

📐 Young diagrams and visual combinatorics

📐 Definition and construction

The Young diagram of a partition λ is the left-justified stack of boxes in which the i-th row from the bottom has λᵢ boxes.

Visual representation: each part corresponds to a row of boxes.
Example: The partition (3,2) of 5 has a Young diagram with 3 boxes in the bottom row and 2 boxes in the second row.
The seven partitions of 5 correspond to seven distinct Young diagrams.

🔄 Transpose (conjugate) partitions

The transpose or conjugate of a partition is the partition formed by reflecting the Young diagram about the diagonal line going southwest-northeast.

Reflection swaps rows and columns.
Example: Two partitions shown in the excerpt are conjugates of each other (specific diagrams not reproduced here).
Question 7.8.18 asks which partitions of 5 are self-conjugate (equal to their own transpose).
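
Reflecting the diagram turns columns into rows, so the j-th part of the conjugate is the number of parts of the original that are at least j. A minimal sketch under that description, with `conjugate` as an illustrative name and partitions given as weakly decreasing tuples:

```python
def conjugate(partition):
    """Transpose of a partition given as a weakly decreasing tuple."""
    if not partition:
        return ()
    largest = partition[0]
    return tuple(
        sum(1 for part in partition if part >= j) for j in range(1, largest + 1)
    )


for lam in [(5,), (4, 1), (3, 2), (3, 1, 1)]:
    print(lam, "->", conjugate(lam))
```

A partition is self-conjugate exactly when `conjugate(lam) == lam`.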

🔗 Connection to lattice paths

Question 7.8.14: Counting Young diagrams that fit inside a 3×4 grid is equivalent to counting lattice paths from the top-left corner to the bottom-right corner.

Each Young diagram corresponds to the lattice path that traces the boundary of the diagram.
This connects partition enumeration to earlier material on lattice path counting.
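
The correspondence can be checked by counting both sides. This sketch assumes that "fits inside a 3×4 grid" means at most 3 rows with at most 4 boxes each, and that the empty diagram is counted; the lattice paths are then counted by a binomial coefficient.

```python
from itertools import combinations_with_replacement
from math import comb

rows, columns = 3, 4

# A diagram is a weakly decreasing tuple of row lengths, padded with zeros.
diagrams = list(combinations_with_replacement(range(columns, -1, -1), rows))

# A corner-to-corner lattice path takes rows + columns unit steps and is
# determined by which of them go in the "row" direction.
lattice_paths = comb(rows + columns, rows)

print(len(diagrams), lattice_paths)
```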

🌟 Connection to Catalan numbers

Question 7.8.15: The number of partitions (λ₁, ..., λₙ) with n − i < λᵢ ≤ n for all i equals the Catalan number Cₙ.

Hint: The Young diagram's border corresponds to a Dyck path (a lattice path that never goes below the diagonal).
This reveals a deep connection between partitions and Catalan structures.

🔄 Self-conjugate partitions and odd parts

Question 7.8.19: There is a bijection between:

Self-conjugate partitions of n (partitions equal to their own transpose).
Partitions of n having all distinct odd parts.

This bijection is a classic result in partition theory, showing that two seemingly different partition classes are equinumerous.
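
The bijection itself is not constructed here, but the equality of the two counts can be tested by brute force for small n. All names in this sketch are illustrative.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def conjugate(lam):
    """Transpose: the j-th part counts the parts of lam that are >= j."""
    if not lam:
        return ()
    return tuple(sum(1 for part in lam if part >= j) for j in range(1, lam[0] + 1))


def count_self_conjugate(n):
    return sum(1 for lam in partitions(n) if conjugate(lam) == lam)


def count_distinct_odd(n):
    return sum(
        1
        for lam in partitions(n)
        if all(part % 2 == 1 for part in lam) and len(set(lam)) == len(lam)
    )


for n in range(1, 13):
    print(n, count_self_conjugate(n), count_distinct_odd(n))
```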

Graphs in combinatorics

8.1 Graphs in combinatorics

🧭 Overview

🧠 One-sentence thesis

Graphs provide a simplified data structure that represents relationships between objects through vertices and edges, enabling powerful tools to solve optimization and pattern problems across mathematics and computer science.

📌 Key points (3–5)

What a graph is: a collection of vertices (dots) and edges (line segments) that describe adjacency (connections or relationships) between vertices.
Why graphs matter: they simplify complex maps and networks, making it easier to solve optimization problems and reveal patterns.
Standard restrictions: unless stated otherwise, graphs in this book do not allow self-loops (edges from a vertex to itself) or multiple edges between the same pair of vertices.
Common confusion: the same graph can be drawn in different ways—a graph is defined only by its vertex set and edge set, not by how it looks on paper.
Key examples: complete graphs (every pair connected), path graphs (vertices in a line), and cycle graphs (vertices forming a loop).

🎯 What graphs represent

🎯 Real-world simplification

The excerpt motivates graphs with a bus route problem: students want to travel between campus locations (Corbett Hall, Library, Weber, Recreation Center) during a hailstorm.
A map with all details is confusing; a graph strips away unnecessary information.
Vertices represent locations (bus stops).
Edges represent direct routes between stops.
Example: Corbett Hall and Moby Arena are adjacent bus stops, so they are connected by an edge.

🌐 Social networks

The excerpt also describes a social network graph.
Each vertex corresponds to a person (or at least a mammal, the excerpt jokes).
An edge between two people means they have met before.
Application question: "Which person would you want to contact first?" to collect votes—this hints at centrality and influence in the network.

📐 Formal definitions

📐 Graph, vertices, edges

Graph G = (V, E): consists of a finite set V of vertices (or nodes) and a finite set E of edges, where each edge e ∈ E is of the form e = {u, v} with u, v ∈ V.

An edge {u, v} is often written as uv.
Unless mentioned otherwise, uv is the same as vu (order does not matter).
The graph is defined only by its vertex set and edge set, not by how it is drawn.

🚫 Standard restrictions

The excerpt emphasizes two default rules (Remark 8.1.2):

No self-loops: an edge uu (starting and ending at the same vertex u) is not allowed.
No multiple edges: at most one edge between any pair of vertices {u, v}.

These restrictions are automatic because each edge e is a set of two distinct vertices (so {u, u}, which is just the one-element set {u}, is not a valid edge) and E is a set (so duplicate edges are not allowed).
Later chapters will consider more general graphs that do allow self-loops and multiple edges.
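
The set-based definition translates almost word for word into code. In this sketch a graph is a pair of Python sets and each edge is a two-element `frozenset`; the function `make_graph` is an illustrative helper, not notation from the text.

```python
def make_graph(vertices, pairs):
    """Return (V, E) with each edge stored as a two-element frozenset."""
    V = set(vertices)
    E = set()
    for u, v in pairs:
        e = frozenset({u, v})
        if len(e) != 2:
            raise ValueError(f"self-loop at {u!r} is not an edge")
        if not e <= V:
            raise ValueError(f"edge {u!r}{v!r} uses a vertex outside V")
        E.add(e)  # adding an edge that is already present changes nothing
    return V, E


# uv and vu are the same edge, and listing an edge twice has no effect.
G1 = make_graph("abc", [("a", "b"), ("b", "c")])
G2 = make_graph("cba", [("c", "b"), ("b", "a"), ("a", "b")])
print(G1 == G2)
```

Because equality compares only the two sets, the order in which vertices and edges are listed (or drawn) plays no role.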

🔗 Adjacency

Adjacent vertices u and v: two vertices are adjacent if there is an edge uv connecting them. If u and v are adjacent, we write u ∼ v.

Adjacency describes a direct connection or relationship.
Example: in the bus route graph, Corbett Hall ∼ Moby Arena because there is a direct bus route.

🏷️ Labeling

A labeling assigns numbers to each vertex.
If a graph has n vertices, labels are always 0 through n − 1, each used exactly once.
The same graph can be drawn in different ways; what matters is the vertex and edge sets, not the visual layout.

🏗️ Common graph families

🏗️ Complete graphs K_n

Complete graph K_n: the graph with n vertices having an edge between every pair of distinct vertices.

Every possible pair of vertices is connected.
The excerpt gives examples: K₁, K₂, K₃, K₄.
How many edges? K_n has "n choose 2" edges, because there are that many pairs of distinct vertices.
Example (from the excerpt): What is the largest number of edges a graph with 9 vertices can have? Answer: "9 choose 2" edges, which is the complete graph K₉.

🛤️ Path graphs P_n

Path graph P_n: the graph with n vertices and n − 1 edges that can be drawn so that all vertices and edges lie on a straight line.

Vertices are arranged in a line, each connected to the next.
Example: P₅ has five vertices and four edges.
The bus route in Figure 8.2 contains a path graph P₆ connecting Moby Arena → Corbett Hall → Library → CSU Transit Center → Engineering → MAX Station.

🔄 Cycle graphs C_n

Cycle graph C_n (for n ≥ 3): the graph with n edges and n vertices such that every vertex is adjacent to exactly two others.

Vertices form a closed loop.
Each vertex has exactly two neighbors.
The excerpt defines cycle graphs but does not provide a worked example in the given text.
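
As a small worked example, the sketch below builds the edge sets of P_n and C_n on vertices labeled 0 through n − 1 and tabulates how many neighbors each vertex has. The function names are illustrative.

```python
def path_graph_edges(n):
    """Edges of P_n: vertex i is joined to vertex i + 1."""
    return {frozenset({i, i + 1}) for i in range(n - 1)}


def cycle_graph_edges(n):
    """Edges of C_n: the path edges plus one edge closing the loop."""
    if n < 3:
        raise ValueError("a cycle graph needs n >= 3")
    return {frozenset({i, (i + 1) % n}) for i in range(n)}


def neighbor_counts(n, edges):
    """Number of neighbors of each vertex 0 .. n-1."""
    counts = [0] * n
    for edge in edges:
        for vertex in edge:
            counts[vertex] += 1
    return counts


print(len(path_graph_edges(5)), neighbor_counts(5, path_graph_edges(5)))
print(len(cycle_graph_edges(6)), neighbor_counts(6, cycle_graph_edges(6)))
```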

🧮 Counting edges and graphs

🧮 Maximum edges

Question: What is the largest number of edges a graph with 5 vertices can have?
Answer: the complete graph K₅ has "5 choose 2" edges.
General rule: a graph with n vertices can have at most "n choose 2" edges (no self-loops, no multiple edges).

🧮 Counting distinct graphs

The excerpt includes exercises that ask:

Labeled vertices: How many different graphs (V, E) are there if the vertices are labeled (e.g., V = {v₁, v₂, v₃, v₄})? In this case, the graph with edge v₁v₂ is different from the graph with edge v₂v₃.
Unlabeled vertices: How many different graphs are there if vertices are not labeled? In this case, all graphs with 4 vertices and one edge are considered the same.

Don't confuse: labeled vs. unlabeled graphs count differently because labeling distinguishes otherwise identical structures.

🧮 Graphs with a fixed number of edges

Exercise: How many different graphs (V, E) with exactly three edges are there if V has 5 vertices?
- (a) Labeled vertices.
- (b) Unlabeled vertices.
The excerpt does not provide answers, but the distinction between labeled and unlabeled is key.
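
For small vertex sets, both kinds of count can be found by brute force. This sketch treats two labeled graphs as the same unlabeled graph when some relabeling of the vertices turns one edge set into the other, and picks the smallest relabeled edge list as a canonical form. It is a slow enumeration suitable only for about five vertices, and `count_graphs` is an illustrative name.

```python
from itertools import combinations, permutations


def count_graphs(n, edge_count=None):
    """Return (labeled, unlabeled) counts of graphs on n vertices.

    If edge_count is given, only graphs with exactly that many edges count.
    """
    pairs = list(combinations(range(n), 2))
    sizes = range(len(pairs) + 1) if edge_count is None else [edge_count]
    relabelings = list(permutations(range(n)))
    labeled = 0
    classes = set()
    for size in sizes:
        for edges in combinations(pairs, size):
            labeled += 1
            canonical = min(
                tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
                for perm in relabelings
            )
            classes.add(canonical)
    return labeled, len(classes)


print(count_graphs(4))
print(count_graphs(5, edge_count=3))
```

With one edge on four vertices, the function reports six labeled graphs but a single unlabeled one, matching the remark above.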

🔍 Drawing and recognizing graphs

🔍 Same graph, different drawings

The excerpt emphasizes: "A graph is defined only by its set of vertices and its set of edges; the same graph may be drawn in different ways."
Two graphs below (in the excerpt) are the same graph, just drawn differently in the plane.
Don't confuse: visual appearance vs. structural identity. If two drawings have the same vertex set and edge set, they are the same graph.

🔍 Ignoring edge crossings (for now)

In the social network example (Figure 8.3), "The one intersection of edges that does not have a dot means nothing and can be ignored."
Edge crossings in a drawing do not create new vertices unless explicitly marked.
Later (Chapter 11), the excerpt mentions graphs that can be drawn with no crossing edges, but for now crossings are just artifacts of the drawing.

Everyday graphs

8.2 Everyday graphs

🧭 Overview

🧠 One-sentence thesis

Certain graph structures appear so frequently in applications that they have standard names—complete graphs, paths, cycles, and bipartite graphs—each capturing a different pattern of connections.

📌 Key points (3–5)

Complete graphs (Kₙ): every pair of vertices is connected; the maximum number of edges for n vertices is "n choose 2."
Path and cycle graphs (Pₙ, Cₙ): paths are vertices arranged in a line; cycles close the path into a loop where every vertex has exactly two neighbors.
Bipartite graphs: vertices split into two groups (left and right) with edges only between groups, never within a group—useful for matching problems.
Common confusion: a cycle graph Cₙ requires n ≥ 3 and every vertex must have degree exactly 2; don't confuse it with a path, which has endpoints of degree 1.
Why it matters: recognizing these named graphs helps model real-world networks like bus routes, ordering patterns, and matching scenarios.

🔗 Complete graphs

🔗 What a complete graph is

Complete graph Kₙ: the graph with n vertices having an edge between every pair of distinct vertices.

Every possible connection exists.
Example: K₄ has 4 vertices and every pair is connected.

🧮 Counting edges in Kₙ

The excerpt asks: "What is the largest number of edges that a graph with 9 vertices can have?"
Answer: at most "9 choose 2" edges, which is the number of ways to pick 2 vertices from 9.
Generalization: Kₙ has "n choose 2" edges.
Example: if you have 9 bus stops and want the maximum number of direct routes (at most one route between any two stops), the answer is the complete graph K₉.

🛤️ Path and cycle graphs

🛤️ Path graph Pₙ

Path graph Pₙ: the graph with n vertices and n − 1 edges that can be drawn so that all vertices and edges lie on a straight line.

Vertices are arranged in a line; each vertex (except the two endpoints) is adjacent to exactly two others.
The two endpoints have degree 1.
Example: P₅ has 5 vertices and 4 edges.
Real-world example from the excerpt: a bus route connecting Moby Arena → Corbett Hall → Library → CSU Transit Center → Engineering → MAX Station forms a path graph P₆.

🔄 Cycle graph Cₙ

Cycle graph Cₙ (for n ≥ 3): the graph with n edges and n vertices such that every vertex is adjacent to exactly two others.

A cycle closes the path into a loop.
Usually drawn as a circle.
Example: C₆ has 6 vertices and 6 edges arranged in a circle.
Real-world example: a bus loop connecting Weber → Engineering → MAX Station → back to Weber forms a cycle graph C₃.
Don't confuse: a path has endpoints (degree 1); a cycle has no endpoints (every vertex has degree 2).
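
The difference between a path and a cycle shows up directly in the degrees. The sketch below builds Pₙ and Cₙ as edge lists on vertices 0..n−1; the helper names are hypothetical and chosen for illustration.

```python
def path_edges(n):
    """Edges of P_n: vertex i is joined to vertex i + 1."""
    return [(i, i + 1) for i in range(n - 1)]

def cycle_edges(n):
    """Edges of C_n: the path edges plus one edge closing the loop."""
    if n < 3:
        raise ValueError("a cycle graph needs n >= 3")
    return [(i, (i + 1) % n) for i in range(n)]

def degrees(n, edges):
    """Degree of each vertex 0..n-1, counted from the edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

print(degrees(5, path_edges(5)))   # [1, 2, 2, 2, 1]
print(degrees(6, cycle_edges(6)))  # [2, 2, 2, 2, 2, 2]
```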

🍕 Bipartite graphs

🍕 What a bipartite graph is

Bipartite graph G = (V, E): V is the disjoint union of two sets L (left) and R (right), and every edge connects a vertex in L with a vertex in R; no edges exist between two vertices in L or between two vertices in R.

The vertex set splits into two groups.
Edges only cross between groups, never within a group.
Example: three pizza parlors (left) and four student houses (right); an edge means "that house ordered from that parlor last month."
No edges between two parlors (a parlor never orders from another parlor).
No edges between two houses (a house never orders from another house).

🍕 Why bipartite graphs matter

Many real-world problems involve matching two different types of objects:
- Students with jobs
- Pets with owners
- Houses with pizza parlors
The structure naturally models "two-sided" relationships.

🔗 Complete bipartite graph Bₙ,ₘ

Complete bipartite graph Bₙ,ₘ: the bipartite graph with n vertices on the left and m vertices on the right, such that every vertex on the left is adjacent to every vertex on the right.

Every possible cross-group connection exists.
Example: B₃,₄ models the situation where all 4 houses have ordered pizza from all 3 parlors within the last month.
Counting:
- Vertices: n + m total.
- Edges: n × m (every left vertex connects to every right vertex).
Special case: B₁,ₘ is called a "fan" (one vertex on the left connects to m vertices on the right).
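
The vertex and edge counts for Bₙ,ₘ can be confirmed by construction. This is a small sketch; the labels `L0, L1, ...` and `R0, R1, ...` and the function name are assumptions made for the example.

```python
def complete_bipartite(n, m):
    """Return (left, right, edges) for B_{n,m}: every left vertex joined to every right vertex."""
    left = [f"L{i}" for i in range(n)]
    right = [f"R{j}" for j in range(m)]
    edges = [(u, v) for u in left for v in right]
    return left, right, edges

left, right, edges = complete_bipartite(3, 4)
print(len(left) + len(right), len(edges))  # 7 12
```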

📐 Degree of a vertex

📐 What degree means

Degree of v, denoted deg(v): the number of edges incident to v, that is, the edges that have v as an endpoint.

It counts how many connections a vertex has.
Example: at the CSU Transit Center bus stop, you can visit three other stops next → the degree of that vertex is 3.
Example: at Horsetooth, there is only one route → the degree of that vertex is 1.

📐 Why degree matters

Degree tells you how many choices you have at a given vertex.
The excerpt introduces the question: "Given a set of n non-negative numbers, can this be the set of degrees of the vertices in a graph with n vertices?"
This question explores whether a proposed degree sequence is realizable as a graph.


8.3 The degree of a vertex

🧭 Overview

🧠 One-sentence thesis

The degree of a vertex counts how many edges connect to it, and the sum of all degrees in any graph must be even because each edge contributes exactly two to the total count.

📌 Key points (3–5)

What degree measures: the number of edges incident to a given vertex (edges that have it as an endpoint).
The fundamental constraint: the sum of all vertex degrees equals twice the number of edges, so the total is always even.
Odd-degree vertices come in pairs: because the sum must be even, there must be an even number of vertices with odd degree.
Common confusion: a vertex of degree 4 in a 5-vertex graph connects to all other vertices, so no vertex of degree 0 can exist in the same graph.
Degree sequences: not every list of numbers can be the degrees of vertices in a valid graph—constraints like even sums and logical adjacency must hold.

🔢 What degree means

🔢 Definition and intuition

Degree of a vertex v, denoted deg(v): the number of edges incident to v, that is, the edges that have v as an endpoint.

It counts how many connections a vertex has.
Example: In the bus route graph, the CSU Transit Center connects to three other stops (Weber, Engineering, Horsetooth), so its degree is 3.
Example: Horsetooth connects to only one other stop, so its degree is 1.

🚫 Don't confuse degree with other properties

Degree is not the number of vertices in the graph; it is the number of edges touching one specific vertex.
A vertex can have degree 0 (isolated, no edges) or degree equal to n−1 in an n-vertex graph (connected to every other vertex).

🧮 The fundamental theorem about degrees

🧮 Sum of degrees equals twice the number of edges

Theorem 8.3.6: The sum of the degrees of all vertices in a graph is twice the number of edges.

Why: Each edge connects two vertices, so when you add up all degrees, every edge is counted once at each of its two endpoints.
Example: A graph with 6 edges has total degree sum = 2 × 6 = 12.
Consequence: The sum of all vertex degrees must be an even number.
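
Theorem 8.3.6 can be checked numerically on a graph with 6 edges. The sketch below uses K₄ as the test graph (an illustrative choice, since K₄ has exactly 6 edges) and also counts the odd-degree vertices.

```python
from itertools import combinations

def degree_map(vertices, edges):
    """Map each vertex to its degree; every edge adds 1 at each of its two endpoints."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

vertices = list(range(4))
edges = list(combinations(vertices, 2))  # K_4: 6 edges
deg = degree_map(vertices, edges)

total = sum(deg.values())
odd_count = sum(1 for d in deg.values() if d % 2 == 1)
print(total, 2 * len(edges), odd_count)  # 12 12 4
```

Here all four vertices have odd degree 3, and four is an even count, as the theorem requires.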

🎭 Odd-degree vertices must come in pairs

Because the total degree sum is even, the number of vertices with odd degree must also be even.
Example: You cannot have exactly one vertex of odd degree, or exactly three—only 0, 2, 4, 6, etc.
Two proofs given:
1. Direct from the theorem: sum is even, so odd-degree vertices must pair up to contribute an even total.
2. Induction on edges: adding one edge raises the degrees of its two endpoints by 1 each. If both were even, the count of odd-degree vertices rises by 2; if one was even and one odd, the count is unchanged; if both were odd, it falls by 2. In all three cases the count stays even.

🧩 Valid and invalid degree sequences

✅ Valid examples

Vertices	Degrees	Valid?	Why
5	1, 2, 2, 2, 3	Yes	Sum = 10 (even); no logical conflict
9	1, 2, 2, 2, 3, 3, 3, 4, 4	Yes	Figure 8.2 is an example

❌ Invalid examples and why

❌ Logical conflict: degree 0 and degree 4 in a 5-vertex graph

Degrees: 0, 1, 2, 3, 4
Why invalid: A vertex of degree 4 in a 5-vertex graph must connect to all other four vertices, so no vertex can have degree 0 (isolated).
Don't confuse: the sum here is 10 (even), but the adjacency logic is impossible.

❌ Odd sum

Degrees: 2, 2, 3, 3, 3 (5 vertices)
Sum: 2 + 2 + 3 + 3 + 3 = 13 (odd)
Why invalid: Theorem 8.3.6 requires the sum to be even.

🧪 How to check a degree sequence

Check the sum: Is it even? If not, invalid.
Check odd-degree count: Is the number of odd degrees even? If not, invalid.
Check logical conflicts: Does a high-degree vertex force connections that contradict a low-degree vertex?
Try to draw it: Sometimes the only way to confirm is to attempt construction.
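
The first three checks can be automated. The sketch below tests only necessary conditions: a list that passes may still fail to be the degree list of any graph, so the "try to draw it" step remains. The function name and the wording of the messages are assumptions for illustration. The odd-count check is folded into the sum check, since the sum is even exactly when the number of odd degrees is even.

```python
def failed_checks(degrees):
    """Return the list of necessary conditions that a proposed degree list violates."""
    n = len(degrees)
    problems = []
    # In a graph on n vertices, a vertex can have at most n - 1 neighbors.
    if any(d < 0 or d > n - 1 for d in degrees):
        problems.append("degree outside 0..n-1")
    # Theorem 8.3.6: the degree sum is twice the number of edges.
    if sum(degrees) % 2 == 1:
        problems.append("odd degree sum")
    # A vertex joined to everyone cannot coexist with an isolated vertex.
    if n > 1 and 0 in degrees and (n - 1) in degrees:
        problems.append("degree 0 conflicts with degree n-1")
    return problems

print(failed_checks([1, 2, 2, 2, 3]))  # []
print(failed_checks([0, 1, 2, 3, 4]))  # ['degree 0 conflicts with degree n-1']
print(failed_checks([2, 2, 3, 3, 3]))  # ['odd degree sum']
```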

🔗 Degree in special graphs

🔗 Complete bipartite graph Bₙ,ₘ

Left side (n vertices): each connects to all m vertices on the right, so degree = m.
Right side (m vertices): each connects to all n vertices on the left, so degree = n.
Example: In B₃,₄, the 3 left vertices each have degree 4, and the 4 right vertices each have degree 3.

🔗 Fan graph B₁,ₘ

One vertex on the left connects to m vertices on the right.
The single left vertex has degree m; each of the m right vertices has degree 1.
Called a "fan" because it looks like a central hub with spokes radiating out.

8.4 Subgraphs

🧭 Overview

🧠 One-sentence thesis

A subgraph is formed by taking a subset of a graph's vertices and edges, and this concept allows us to study how graphs change when parts are removed or isolated.

📌 Key points (3–5)

What a subgraph is: a graph whose vertices and edges are all contained in the original graph.
How to identify valid subgraphs: both the vertex set and edge set must be subsets of the original; missing a vertex for an edge or adding new edges disqualifies it.
Spanning subgraphs: special subgraphs that keep all vertices from the original graph.
Common confusion: an object with edges but missing their endpoint vertices is not a subgraph—it's not even a valid graph.
Boundary cases: every graph is a subgraph of itself, and the empty graph (no vertices or edges) is a subgraph of any graph.

🏗️ What makes a subgraph

🏗️ Formal definition

Subgraph: Let G = (V, E) be a graph. A graph G′ = (V′, E′) is a subgraph of G if V′ ⊆ V and E′ ⊆ E.

The vertex set V′ must be a subset of the original vertex set V.
The edge set E′ must be a subset of the original edge set E.
Both conditions must hold simultaneously.

🚫 What disqualifies a subgraph

The excerpt provides clear examples of invalid subgraphs:

Case	Why it fails	Example from excerpt
Missing vertex for an edge	Not even a valid graph	Object (c): has an edge but missing one of its endpoint vertices
Extra edge not in original	Contains elements outside the original edge set	Graph (d): contains an edge that is not in G

Don't confuse: removing an edge is allowed, but you cannot have an edge whose endpoints are not both present in your vertex subset.
Example: if you keep an edge {a, b}, you must keep both vertices a and b.

🌉 Special types and examples

🌉 Spanning subgraphs

Spanning subgraph: a subgraph that contains all of the vertices of G.

"Spanning" means the vertex set is complete (V′ = V), but the edge set can be any subset (E′ ⊆ E).
Example from the bus route scenario: when a single bus route closes but all stations remain open, the result is a spanning subgraph with one edge removed.
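
The subgraph and spanning-subgraph conditions translate directly into set comparisons. This is a minimal sketch in which a graph is a pair (set of vertices, set of edges) and each edge is a `frozenset` of two vertices; that representation and the function names are assumptions for the example.

```python
def is_valid_graph(vertices, edges):
    """Every edge must join two distinct vertices that are both present."""
    return all(len(e) == 2 and e <= vertices for e in edges)

def is_subgraph(sub_vertices, sub_edges, vertices, edges):
    """G' is a subgraph of G if G' is a graph, V' is a subset of V, and E' is a subset of E."""
    return (is_valid_graph(sub_vertices, sub_edges)
            and sub_vertices <= vertices
            and sub_edges <= edges)

def is_spanning_subgraph(sub_vertices, sub_edges, vertices, edges):
    """A spanning subgraph keeps every vertex of G."""
    return sub_vertices == vertices and is_subgraph(sub_vertices, sub_edges, vertices, edges)

V = {"a", "b", "c"}
E = {frozenset({"a", "b"}), frozenset({"b", "c"})}

print(is_subgraph({"a", "b"}, {frozenset({"a", "b"})}, V, E))  # True
print(is_subgraph({"a"}, {frozenset({"a", "b"})}, V, E))       # False: endpoint b is missing
print(is_subgraph({"a", "c"}, {frozenset({"a", "c"})}, V, E))  # False: edge is not in G
print(is_spanning_subgraph(V, {frozenset({"a", "b"})}, V, E))  # True: one edge removed
```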

🚌 Real-world scenario: bus routes

The excerpt uses a bus route graph to illustrate subgraph formation:

First change: one bus route (edge) closes → spanning subgraph with one fewer edge.
Second change: a station (vertex) closes → three adjacent routes must also close → subgraph with one fewer vertex and three fewer edges.
This shows how removing a vertex forces removal of all its incident edges.

🔢 Boundary cases and structure

🔢 Trivial subgraphs

The graph itself: every graph G is always a subgraph of itself (V′ = V, E′ = E).
The empty graph: the graph with no vertices and no edges is a subgraph of any graph.
These are the two extremes of the subgraph relationship.

📊 Subgraph enumeration

The excerpt shows a complete enumeration for a small graph (three vertices, one edge):

All possible subgraphs are drawn in a hierarchy.
Each connection represents removing a single vertex or edge.
This illustrates that even small graphs can have many subgraphs.
The exercises hint at systematic counting: count subgraphs with 0 vertices, then 1 vertex, then 2 vertices, etc.
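
The systematic count suggested by the exercises can be sketched as follows: choose the vertex subset first, then note that each edge with both endpoints in the subset may be kept or dropped independently. The function name and the labeled example graph are assumptions for illustration, and subgraphs are counted as labeled objects.

```python
from itertools import combinations

def count_subgraphs(vertices, edges):
    """Count labeled subgraphs: for each vertex subset, 2**(number of edges inside it)."""
    vertices = list(vertices)
    total = 0
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            chosen = set(subset)
            # Only edges with both endpoints kept are available to the subgraph.
            available = sum(1 for u, v in edges if u in chosen and v in chosen)
            total += 2 ** available
    return total

# Three vertices a, b, c with the single edge {a, b}
print(count_subgraphs("abc", [("a", "b")]))  # 10
```

In this example the two vertex subsets containing both a and b contribute 2 subgraphs each, and the other six subsets contribute 1 each.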

🏷️ Labeled vs unlabeled

When vertices are labeled (e.g., a, b, c), subgraphs with the same structure but different labels are considered different.
Example: a subgraph with vertices {a, b} and edge {a, b} is different from one with vertices {b, c} and edge {b, c}, even though both have two vertices and one edge.

8.5 Walks and connected graphs

🧭 Overview

🧠 One-sentence thesis

A graph is connected if you can travel between any two vertices by walking along edges, and disconnected graphs break uniquely into maximal connected components.

📌 Key points (3–5)

What a walk is: a sequence of vertices and edges where each edge connects consecutive vertices in the sequence.
What connected means: for any two vertices in the graph, there exists a walk between them.
Disconnected graphs: when no walk exists between some pairs of vertices, the graph splits into separate connected components.
Common confusion: a connected component must be maximal—you cannot add more vertices or edges from the original graph and still have it connected.
Why it matters: removing edges can disconnect a graph; connected components let you work separately with each piece of a disconnected graph.

🚶 Walks in graphs

🚶 What a walk is

Walk: a sequence of vertices and edges v₀, e₁, v₁, e₂, v₂, …, vₖ₋₁, eₖ, vₖ such that edge eᵢ connects vertices vᵢ₋₁ and vᵢ, for 1 ≤ i ≤ k.

A walk is not just a list of vertices; it explicitly includes the edges that connect them.
Each edge in the sequence must actually connect the two vertices before and after it.
Example: In a graph with vertices and edges, the sequence v₀, e₁, v₁, e₂, v₂, e₃, v₃ is a walk if e₁ connects v₀ and v₁, e₂ connects v₁ and v₂, and e₃ connects v₂ and v₃.
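
The definition of a walk can be checked mechanically. In this sketch a walk is a Python list alternating vertices and edge names, and `endpoints` is a hypothetical dictionary mapping each edge name to the pair of vertices it connects.

```python
def is_walk(sequence, endpoints):
    """Check that sequence = [v0, e1, v1, ..., ek, vk] and each e_i joins v_{i-1} and v_i."""
    if len(sequence) % 2 == 0:
        return False  # must start and end with a vertex
    for i in range(1, len(sequence), 2):
        before, edge, after = sequence[i - 1], sequence[i], sequence[i + 1]
        if edge not in endpoints or endpoints[edge] != frozenset({before, after}):
            return False
    return True

endpoints = {
    "e1": frozenset({"a", "b"}),
    "e2": frozenset({"b", "c"}),
}

print(is_walk(["a", "e1", "b", "e2", "c"], endpoints))  # True
print(is_walk(["a", "e1", "b", "e1", "a"], endpoints))  # True: a walk may reuse an edge
print(is_walk(["a", "e2", "c"], endpoints))             # False: e2 does not join a and c
```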

🎯 Real-world motivation

The excerpt uses a bus system example: students need to travel from Weber to Moby Arena.
When Bus 2 breaks down, three red edges disappear from the graph.
Result: there is no longer a bus route (walk) from Weber to Moby Arena.
This illustrates how removing edges can destroy connectivity.

🔗 Connected graphs

🔗 Definition of connected

Connected graph: a graph G is connected if for any two vertices u and v of G, there is a walk in G from u to v.

"Connected" means you can get from anywhere to anywhere by following edges.
It does not matter how long the walk is; as long as one exists, the graph is connected.
Example: The excerpt shows a graph where "no matter which two vertices you consider, you can always find a walk between them."

❌ What disconnected looks like

A graph is not connected when there is no walk between some pair of vertices.
Example: The excerpt describes a graph with vertices on the left side and vertices on the right side, with no walk between them.
Don't confuse: a graph can have many edges and still be disconnected if the edges do not form paths between all vertices.

🧩 Connected components

🧩 What a connected component is

Connected component: a maximal subgraph H of a graph G that is connected.

"Maximal" means you cannot add any more vertices or edges from G and still have H connected.
In other words: if H′ is another connected subgraph of G, and H is a subgraph of H′, then H = H′ (they are the same).
Each disconnected graph breaks uniquely into its connected components.

🔍 Why "maximal" matters

The excerpt gives an example: if G is a certain graph, then two specific subgraphs are the connected components.
Another subgraph is shown that is not a connected component because it is not maximal—you could add more vertices/edges and still be connected.
Don't confuse: a connected subgraph is not necessarily a connected component; it must be as large as possible within G.

🛠️ Working with disconnected graphs

If a graph is disconnected, you can work separately with each connected component.
This is often more convenient than dealing with the entire disconnected graph at once.
Example: The excerpt notes "it is then possible to work separately with each connected component of a graph."
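
One common way to find the connected components is a breadth-first search from each vertex not yet reached. The excerpt does not prescribe an algorithm, so this is only a sketch of one standard approach; it returns the vertex set of each component, and the names are illustrative.

```python
from collections import deque

def connected_components(vertices, edges):
    """Return a list of vertex sets, one per connected component."""
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)

    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        # Collect everything reachable from start by some walk.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacent[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

def is_connected(vertices, edges):
    """A graph is connected when it has at most one component."""
    return len(connected_components(vertices, edges)) <= 1

print(connected_components(range(6), [(0, 1), (1, 2), (3, 4)]))
# [{0, 1, 2}, {3, 4}, {5}]
```

Each returned set is maximal by construction: the search only stops when no further vertex can be reached.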

🔧 Removing edges and connectivity

🔧 How edge removal affects connectivity

The excerpt states: "by deleting an edge from a connected graph, it is possible to make it disconnected."
Removing certain edges can break a graph into multiple connected components.
Example exercises ask: what is the smallest number of edges you must remove to disconnect a graph, or to divide it into a specific number of components?

⚙️ Special cases

Removing an edge from a cycle: If a connected graph contains a cycle and you remove an edge from that cycle, the remaining graph is still connected (stated in Exercise 5).
Adding an edge: If two vertices u and v are not connected by an edge, adding the edge uv creates a cycle if and only if u and v are already in the same connected component (Exercise 6).
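Don't confuse: removing an edge from a cycle never disconnects a connected graph, because the rest of the cycle still provides a route between the edge's two endpoints. Only an edge that lies on no cycle can disconnect the graph when removed.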

📊 Summary table

Concept	Definition	Key property
Walk	Sequence v₀, e₁, v₁, e₂, …, eₖ, vₖ where each eᵢ connects vᵢ₋₁ and vᵢ	Describes a route along edges; vertices and edges may repeat
Connected graph	For any two vertices u and v, there is a walk from u to v	You can reach any vertex from any other vertex
Disconnected graph	Some pair of vertices has no walk between them	The graph splits into separate pieces
Connected component	A maximal connected subgraph H of G	Cannot add more vertices/edges from G and stay connected


8.6 Graph complements

🧭 Overview

🧠 One-sentence thesis

The complement of a graph swaps which pairs of vertices are connected and which are not, transforming the structure in ways that reveal relationships like introducing people who haven't met or converting complete bipartite graphs into disjoint complete graphs.

📌 Key points (3–5)

What a complement is: a new graph with the same vertices but edges only where the original graph had no edges.
How to construct it: for every pair of vertices, if they were connected in G, disconnect them in the complement; if they were not connected, connect them.
Set-theoretic view: the complement's edge set is the set difference between all possible pairs and the original edge set.
Common confusion: the complement is not a subgraph of the original graph unless the original is complete (all vertices already connected).
Why it matters: complements help solve problems like "who needs to meet whom" and reveal hidden structure (e.g., bipartite complements split into complete graphs).

🔄 What is a graph complement

🔄 Core definition

Graph complement: Given a graph G, its complement G̅ has the same vertex set as G, and has an edge between vertices u and v precisely when G does not.

The complement "flips" the edges: present edges disappear, absent edges appear.
The vertex set stays identical; only the edge relationships change.
Example: In the picnic scenario, the original graph shows who has met; the complement shows who has not met and needs to be introduced.

🎯 Picnic scenario

Original graph: 7 vertices (Dr. Pries, Dr. Adams, Dr. Gillespie, Cam the Ram, Student A, Student B, Student C); edges represent "have met."
Complement graph: same 7 vertices; edges represent "have not yet met."
Student A draws the complement to plan introductions—each edge in the complement is a pair that needs to meet.

🧮 Two ways to define the complement

🧮 Vertex-pair definition

Let G = (V, E) be a graph with vertex set V and edge set E.
Let S denote the set of all unordered pairs of distinct vertices in V.
E is a subset of S (the pairs that are actually connected).
The complement can be defined as G̅ = (V, S − E), where S − E is the set difference (all pairs minus the original edges).
This agrees with Definition 8.6.1 because S − E contains exactly the pairs that were not edges in G.

🔍 Why the two definitions match

Definition 8.6.1 says: "edge in G̅ if and only if no edge in G."
Set-theoretic definition says: "edge set of G̅ is all possible pairs minus the original edge set."
Both produce the same result: every non-edge becomes an edge, every edge becomes a non-edge.
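
The set-theoretic definition G̅ = (V, S − E) is short enough to write out directly. In this sketch edges are stored as `frozenset` pairs so that {u, v} and {v, u} are the same edge; the function name is an assumption for the example.

```python
from itertools import combinations

def complement_edges(vertices, edges):
    """Edge set of the complement: all pairs of distinct vertices, minus the original edges."""
    all_pairs = {frozenset(p) for p in combinations(vertices, 2)}  # the set S
    original = {frozenset(e) for e in edges}                       # the set E
    return all_pairs - original                                    # S - E

p3_edges = [("a", "b"), ("b", "c")]
p3_complement = complement_edges("abc", p3_edges)
print(p3_complement == {frozenset({"a", "c"})})  # True

# Complementing twice returns the original edge set.
print(complement_edges("abc", p3_complement) == {frozenset(e) for e in p3_edges})  # True
```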

📐 Examples and patterns

📐 Path graph complements

The excerpt shows path graphs P₂ through P₅ and their complements.
A path is a linear chain of vertices; its complement connects vertices that were not adjacent in the path.
Example: In P₃ (three vertices in a line), the two endpoints are not adjacent; the complement contains only the edge joining the two endpoints, and the middle vertex becomes isolated.

📐 Cycle graph C₅

Exercise asks to describe the complement of C₅ (a 5-vertex cycle).
In a cycle, each vertex connects to exactly two neighbors; the complement will connect each vertex to the two non-neighbors.

📐 Complete bipartite graph B₃,₄

Exercise asks for the complement of B₃,₄ (complete bipartite graph with parts of size 3 and 4).
A later exercise proves: the complement of Bₙ,ₘ has two connected components, which are Kₙ and Kₘ (complete graphs on n and m vertices).
Intuition: in a complete bipartite graph, vertices within the same part have no edges and every cross-part pair is an edge; in the complement, each part becomes fully connected and no edges join the two parts.

🧩 Key properties and exercises

🧩 Degree in the complement

Question: If vertex v has degree d in a graph G on n vertices, what is the degree of v in G̅?
Answer (from structure): In G, v connects to d vertices; in G̅, v connects to all the others it didn't connect to, so degree is (n − 1) − d.

🧩 When is the complement a subgraph?

Exercise 4: Prove that G̅ is a subgraph of G only if G is the complete graph on its vertex set.
Reasoning: If G̅ ⊆ G, every edge in G̅ must also be in G. But G̅ has edges exactly where G does not, so the only way both can hold is if G has all possible edges (complete graph) and G̅ has none.

🧩 Complement symmetry

Exercise 7: True or False: If G is the complement of H, then H is the complement of G.
Answer: True. Complementation is symmetric: flipping edges twice returns the original graph.

🧩 Connectivity of complements

Exercise 2: Is the complement of C₆ connected?
This requires checking whether the complement has a path between every pair of vertices.
Don't confuse: a connected original graph does not guarantee a connected complement (or vice versa).

🔗 Relationship to other concepts

🔗 Complete graphs

A complete graph Kₙ has all possible edges.
Its complement has no edges (every pair was already connected, so the complement disconnects them all).
Conversely, a graph on n vertices with no edges has the complete graph Kₙ as its complement.

🔗 Bipartite complements

Exercise 8: The complement of Bₙ,ₘ has two connected components, Kₙ and Kₘ.
In Bₙ,ₘ, edges only go between the two parts; within each part, no edges exist.
In the complement, edges only go within each part, fully connecting each part into a complete graph; no edges go between parts.
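
The claim about the complement of Bₙ,ₘ can be spot-checked for B₃,₄. This sketch (with hypothetical vertex labels) confirms that every complement edge lies inside one part, that the parts receive 3 and 6 edges (the edge counts of K₃ and K₄), and that a left vertex's degree changes from 4 to (7 − 1) − 4 = 2.

```python
from itertools import combinations

left = ["L0", "L1", "L2"]
right = ["R0", "R1", "R2", "R3"]
vertices = left + right

edges = {frozenset((u, v)) for u in left for v in right}       # B_{3,4}
all_pairs = {frozenset(p) for p in combinations(vertices, 2)}
complement = all_pairs - edges

inside_left = {e for e in complement if e <= set(left)}
inside_right = {e for e in complement if e <= set(right)}

def degree(v, edge_set):
    """Number of edges in edge_set that have v as an endpoint."""
    return sum(1 for e in edge_set if v in e)

print(len(inside_left), len(inside_right))              # 3 6
print(inside_left | inside_right == complement)         # True
print(degree("L0", edges), degree("L0", complement))    # 4 2
```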

🔗 Path and cycle complements

Exercise 5: For which n is the complement of Pₙ a path?
Exercise 6: For which n is the complement of Cₙ a cycle?
These questions explore when the complement preserves the same graph family (path or cycle).


8.7 Storage structures for graphs

🧭 Overview

🧠 One-sentence thesis

Graphs can be stored in computers using either adjacency matrices (which are easy to edit but use more space) or edge lists (which are more compact for sparse graphs but harder to modify), and the choice depends on the graph's properties and the operations needed.

📌 Key points (3–5)

Two main representations: adjacency matrix (an n × n matrix of 0s and 1s) and edge list (a list of vertex pairs).
Adjacency matrix trade-off: easy to edit (add/remove edges by changing single entries) but uses n² space.
Edge list trade-off: more compact for sparse graphs (only 2e integers) but harder to edit (requires searching and shuffling).
Common confusion: adjacency matrices depend on vertex labeling—different labelings produce different matrices (via row/column permutation), but they represent the same graph.
Special case: for rooted trees, edge lists are inefficient for finding descendants, so a materialized path (storing the entire walk from root to each vertex) is preferred.

📊 Adjacency matrix representation

📊 What an adjacency matrix is

Adjacency matrix: Given a graph G with n vertices labeled 0 to n−1, an n × n matrix where entry (i, j) is 1 if edge {i, j} exists in G, and 0 otherwise.

The matrix has n² entries, each either 0 or 1.
Example: The excerpt shows graph C₆ (a 6-vertex cycle) with its 6 × 6 adjacency matrix.

🔄 Symmetry and redundancy

Symmetric: Entry (i, j) equals entry (j, i) because edge {i, j} is the same as edge {j, i}.
Diagonal is zero: All n diagonal entries are 0 (no self-loops allowed in graphs).
Storage optimization: Only need to store (n² − n)/2 entries above the diagonal to fully represent the matrix.

🏷️ Labeling dependency

The adjacency matrix depends on how vertices are labeled (0 to n−1).
Changing the labeling changes the matrix by permuting rows and columns.
Don't confuse: different matrices can represent the same graph structure if they differ only by relabeling.

✏️ Editability advantage

Easy to add/remove edges: To add edge {i, j}, change entries (i, j) and (j, i) from 0 to 1; to remove, change from 1 to 0.
Only two matrix entries need updating per edge operation.
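
The adjacency matrix and its two-entry edit can be sketched with nested lists. The helper names below are illustrative; vertices are labeled 0 to n−1 as in the definition.

```python
def adjacency_matrix(n, edges):
    """Build the n x n 0/1 matrix; each edge {i, j} sets entries (i, j) and (j, i)."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix

def set_edge(matrix, i, j, present):
    """Add (present=True) or remove (present=False) edge {i, j} by updating two entries."""
    value = 1 if present else 0
    matrix[i][j] = value
    matrix[j][i] = value

c6 = adjacency_matrix(6, [(i, (i + 1) % 6) for i in range(6)])
print(c6[0])  # [0, 1, 0, 0, 0, 1]

set_edge(c6, 0, 3, True)   # add a chord between vertices 0 and 3
print(c6[0][3], c6[3][0])  # 1 1
```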

📝 Edge list representation

📝 What an edge list is

Edge list: A representation storing n (number of vertices) and a list of size 2 × e (where e is the number of edges), with each column storing the two vertices incident to that edge.

Requires storing 2e integers between 0 and n−1.
Example: The excerpt shows C₆'s edge list as two rows: 0 1 2 3 4 5 and 1 2 3 4 5 0.

💾 Space efficiency for sparse graphs

Advantage for sparse graphs: Uses less storage when the number of edges is small compared to the number of possible vertex pairs (specifically, when 2e < n²).
Comparison: adjacency matrix uses n² space regardless of edge count; edge list uses only 2e space.
Example: A sparse graph with 1000 vertices and 2000 edges needs 1000² = 1,000,000 entries for the adjacency matrix but only 2 × 2000 = 4,000 integers for the edge list.

⚠️ Editing difficulty

Disadvantage: Harder to modify than adjacency matrices.
To remove an edge: must search through the entire list to find the edge's location, then shuffle all later entries left.
This is time-consuming compared to the two-entry update in adjacency matrices.
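
The space comparison and the search-then-shift removal can be sketched as follows. The edge list is held as two parallel Python lists (one per row of the 2 × e layout); that layout and the function names are assumptions for the example.

```python
def storage_cost(n, e):
    """Number of stored values: n*n matrix entries versus 2*e edge-list integers."""
    return {"adjacency_matrix": n * n, "edge_list": 2 * e}

def remove_edge(first_row, second_row, u, v):
    """Scan for edge {u, v}; deleting it shifts every later column one place left."""
    for k in range(len(first_row)):
        if {first_row[k], second_row[k]} == {u, v}:
            del first_row[k]
            del second_row[k]
            return True
    return False

# Edge list of C_6 as two rows
first_row = [0, 1, 2, 3, 4, 5]
second_row = [1, 2, 3, 4, 5, 0]

removed = remove_edge(first_row, second_row, 3, 2)
print(removed, first_row, second_row)  # True [0, 1, 3, 4, 5] [1, 2, 4, 5, 0]
print(storage_cost(1000, 2000))
```

The scan may have to look at every column before finding the edge, which is the cost the excerpt contrasts with the two-entry matrix update.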

🌳 Special case: rooted trees and materialized paths

🌳 Descendant-finding problem

For rooted trees (connected graphs with no cycles and a designated root), finding all descendants of a vertex v is important.
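Descendants: All vertices other than v whose walk from the root passes through v (these lie further from the root than v, but not every vertex further from the root is a descendant of v).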
With an edge list, finding descendants requires repeated queries through the list, which is very time-consuming.

🛤️ Materialized path solution

Materialized path: A storage structure that stores the entire walk from the root to each vertex.

Replaces edge lists in the rooted tree context.
Makes descendant queries more efficient by storing complete paths rather than individual edges.
Don't confuse: this is not a general replacement for edge lists—only useful for rooted tree structures where path queries are common.
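
With materialized paths, a descendant query becomes a prefix test. The sketch below stores each vertex's path from the root as a tuple; the small tree and the function name are hypothetical, made up for illustration.

```python
# Hypothetical rooted tree: root has children a and b; a has child c; c has child d.
paths = {
    "root": ("root",),
    "a": ("root", "a"),
    "b": ("root", "b"),
    "c": ("root", "a", "c"),
    "d": ("root", "a", "c", "d"),
}

def descendants(paths, v):
    """Vertices whose stored path strictly extends the path stored for v."""
    prefix = paths[v]
    k = len(prefix)
    return sorted(w for w, p in paths.items() if len(p) > k and p[:k] == prefix)

print(descendants(paths, "a"))  # ['c', 'd']
print(descendants(paths, "b"))  # []
```

Each vertex is examined once per query, with no repeated passes through an edge list.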

📐 Comparison summary

Representation	Space used	Edit ease	Best for
Adjacency matrix	n² entries	Easy (change 2 entries)	Dense graphs, frequent edits
Edge list	2e integers	Hard (search + shuffle)	Sparse graphs, few edits
Materialized path	Full paths stored	(Not discussed)	Rooted trees, descendant queries

🔑 Choosing the right representation

Adjacency matrix: Choose when the graph is dense or when you need to frequently add/remove edges.
Edge list: Choose when the graph is sparse (2e much smaller than n²) and edits are rare.
Materialized path: Choose for rooted trees when you need to efficiently find descendants or query paths from the root.


8.8 Eulerian walks and Hamiltonian cycles

🧭 Overview

🧠 One-sentence thesis

Euler's 1736 theorem completely characterizes when a connected graph has an Eulerian walk (traversing every edge exactly once) based on vertex degrees, whereas determining the existence of a Hamiltonian cycle (visiting every vertex exactly once) remains computationally very hard.

📌 Key points (3–5)

Eulerian walks: a walk through a connected graph that goes through every edge exactly once; existence depends entirely on the number of vertices with odd degree.
Euler's theorem: a connected graph has an Eulerian walk if and only if it has zero or exactly two vertices of odd degree; when all degrees are even, every Eulerian walk is closed.
Hamiltonian cycles: a closed walk that passes through each vertex exactly once; no efficient general method exists to determine whether a graph has one (the problem is NP-complete).
Common confusion: Eulerian walks focus on edges (traverse every edge once), while Hamiltonian cycles focus on vertices (visit every vertex once)—these are fundamentally different problems with very different difficulty levels.
Historical significance: Euler solved the Königsberg bridge problem by transforming it into graph theory, creating one of the earliest theorems in the field.

🌉 The Königsberg bridge problem

🌉 Historical context and transformation

The original problem: is it possible to walk through Königsberg crossing each of seven bridges over the Pregel River exactly once, without needing to return to the starting location?
Euler's innovation: transform the bridge-crossing question into an abstract graph theory problem.
- Each land mass becomes a vertex.
- Each bridge becomes an edge.
- Multiple bridges between the same land masses create multiple edges between two vertices.
This was groundbreaking because "no one had ever thought about graph theory abstractly before then."

🚫 Solution to the Königsberg problem

The resulting graph has four vertices of odd degree.
By Euler's theorem part (a), a graph with more than two vertices of odd degree has no Eulerian walks.
Conclusion: it is not possible to take a walk in Königsberg that crosses each of the seven bridges exactly once.

🔢 Euler's theorem on Eulerian walks

🔢 Definition and basic concepts

Eulerian walk: a walk through a connected graph that goes through every edge exactly once.

Recall: a walk is a sequence of vertices and edges v₀, e₁, v₁, e₂, v₂, ..., vₖ₋₁, eₖ, vₖ where edge eᵢ connects vertices vᵢ₋₁ and vᵢ.
A walk is closed if it begins and ends at the same location (v₀ = vₖ).
Don't confuse: an Eulerian walk may or may not be closed, depending on the graph structure.

📐 The three cases of Euler's theorem (1736)

For a connected graph G:

Vertices with odd degree	Eulerian walk exists?	Starting/ending behavior
More than two	No	No Eulerian walks possible
Exactly two	Yes	Every Eulerian walk starts at one of these two vertices and ends at the other (not closed)
Zero (all even)	Yes	Every Eulerian walk is necessarily closed
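
The three cases in the table reduce to counting odd degrees. The sketch below assumes the graph is already known to be connected (it does not check this) and takes the list of vertex degrees as input. The Königsberg degrees 5, 3, 3, 3 are the standard values for the seven-bridge graph; the function name is illustrative.

```python
def eulerian_case(degrees):
    """Classify a CONNECTED graph by its number of odd-degree vertices."""
    odd = sum(1 for d in degrees if d % 2 == 1)
    if odd == 0:
        return "Eulerian walk exists and is closed"
    if odd == 2:
        return "Eulerian walk exists between the two odd-degree vertices"
    return "no Eulerian walk"

print(eulerian_case([5, 3, 3, 3]))         # Königsberg: no Eulerian walk
print(eulerian_case([2, 2, 2, 2, 2, 2]))   # C_6: Eulerian walk exists and is closed
print(eulerian_case([1, 2, 2, 1]))         # P_4: walk between the two endpoints
```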

🧮 Why vertex degree matters: odd-degree vertices

Claim: If a vertex v has odd degree, then any Eulerian walk must start or end at v.

Proof idea:

If the walk doesn't start at v, then it must repeatedly enter and leave v.
Each enter-and-leave pair uses up two edges.
Since v has odd degree, after using edges in pairs, exactly one edge remains.
The walk must enter along this final edge and cannot leave.
Therefore, if the walk doesn't start at v, it must end at v.

Implication: This proves part (a) and the second sentence of part (b) of Euler's theorem—you can't have more than two odd-degree vertices because each one must be a start or end point, and a walk has only two endpoints.

🔄 Even-degree vertices

Claim: If a vertex v has even degree, then any Eulerian walk either starts and ends at v, or starts and ends somewhere else.

The proof is similar to the odd-degree case.
Even degree means edges can be paired: enter-leave, enter-leave, etc.
If the walk neither starts nor ends at v, every visit pairs an entering edge with a leaving edge, using all of v's edges. If the walk starts at v, the first edge leaves v unpaired, so the last unused edge at v must bring the walk back to end there.
This proves the second sentence of part (c): when all vertices have even degree, Eulerian walks must be closed.

❓ Why can't exactly one vertex have odd degree?

Question: Why can't we have one vertex of odd degree in a graph?

Answer:

The sum of all vertex degrees equals twice the number of edges (from Section 8.3).
Therefore, the sum of all degrees is always even.
If some vertices have odd degree, their count must be even (odd + odd = even; you need pairs of odd numbers to sum to even).
Conclusion: in any graph, the number of vertices with odd degree is even—you cannot have exactly one.

🎯 Hamiltonian cycles: a much harder problem

🎯 Definition and contrast with Eulerian walks

Hamiltonian cycle: a closed walk that passes through each vertex exactly once.

Hamiltonian walk: a walk that encounters each vertex exactly once (but is not necessarily closed).

Key distinction:

Eulerian: focus on edges—traverse every edge exactly once.
Hamiltonian: focus on vertices—visit every vertex exactly once.
Example: A graph might have an Eulerian walk but no Hamiltonian cycle, or vice versa.

🖼️ Examples

The excerpt shows two examples of graphs with Hamiltonian cycles (drawn in red).
The Petersen graph is shown as an example of a graph with no Hamiltonian cycle.
Unlike Eulerian walks, there is no simple degree-based criterion to determine Hamiltonian cycle existence.

💻 Computational difficulty

Determining whether a graph has a Hamiltonian cycle is NP-complete.
This is "a very difficult class of problems in theoretical computer science."
In general, it is "very hard to determine whether a graph has a Hamiltonian cycle or not."
Contrast: Eulerian walks have an efficient, complete characterization (Euler's theorem), but Hamiltonian cycles do not.
Don't confuse: just because both are walks in graphs doesn't mean they have similar solution methods—Eulerian walks are easy to check, Hamiltonian cycles are computationally hard.
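
In contrast to the degree count for Eulerian walks, a general search for a Hamiltonian cycle has to try extending partial routes and back up when stuck. The sketch below is a plain backtracking search, one possible exhaustive method rather than anything given in the excerpt. Its worst-case running time grows exponentially with the number of vertices, which is consistent with the problem being hard. The function name and the vertex numbering of the Petersen graph are assumptions for the example.

```python
def has_hamiltonian_cycle(n, edges):
    """Exhaustive backtracking search on vertices 0..n-1; exponential time in the worst case."""
    if n < 3:
        return False
    adjacent = [set() for _ in range(n)]
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)

    visited = [False] * n
    visited[0] = True  # a cycle passes through every vertex, so start at 0

    def extend(current, count):
        if count == n:
            return 0 in adjacent[current]  # can the route close back to the start?
        for w in adjacent[current]:
            if not visited[w]:
                visited[w] = True
                if extend(w, count + 1):
                    return True
                visited[w] = False  # undo the choice and try another neighbor
        return False

    return extend(0, 1)

c5 = [(i, (i + 1) % 5) for i in range(5)]
petersen = (
    [(i, (i + 1) % 5) for i in range(5)]            # outer 5-cycle
    + [(i, i + 5) for i in range(5)]                # spokes
    + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]  # inner pentagram
)

print(has_hamiltonian_cycle(5, c5))         # True
print(has_hamiltonian_cycle(10, petersen))  # False
```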

🔍 Proof techniques and omitted details

🔍 What the excerpt proves

The excerpt provides complete proofs for:
- Why odd-degree vertices must be start/end points (proves part (a) and part of (b)).
- Why even-degree vertices behave differently (proves part of (c)).
- Why exactly one odd-degree vertex is impossible.

🔍 What is omitted

The excerpt omits proofs that Eulerian walks actually exist in cases (b) and (c).
Main idea for case (c): if a walk arrives at a vertex v by one edge, it can continue leaving by another edge, reducing unused edges at v by 2.
Main idea for case (b): add an edge between the two odd-degree vertices u and v, find an Eulerian cycle using case (c), adjust the starting point to u, then delete the added edge to produce an Eulerian walk starting and ending at u and v.
The excerpt describes these as "very interesting" but does not provide full proofs.

Investigation: The number of walks

8.9 Investigation: The number of walks

🧭 Overview

🧠 One-sentence thesis

The number of walks of length N between any two vertices in a graph can be computed by raising the adjacency matrix to the Nth power, where each entry in the resulting matrix counts the walks between corresponding vertices.

📌 Key points (3–5)

What we count: the number of distinct walks (sequences of edges) of exactly length N from vertex i to vertex j in a graph.
The matrix method: the adjacency matrix M₁ encodes which vertices are adjacent; raising it to the Nth power (M₁ to the N) gives a matrix where each entry is the count of N-step walks.
How matrix multiplication works here: to find walks of length N from i to j, sum over all intermediate vertices v the product of (walks from i to v in N−1 steps) times (whether v connects to j in 1 step).
Common confusion: don't confuse "number of walks" with "number of paths"—walks can repeat vertices and edges; the matrix counts all possible sequences, not just simple paths.
Why it matters: this technique applies even to graphs with self-loops or directed edges, and it reveals patterns like Fibonacci numbers appearing in walk counts for certain graphs.

🐹 The hamster cage example

🐹 Setting up the problem

The excerpt introduces the idea through a hamster moving in a cage shaped like a graph:

The hamster moves along edges every minute, never staying put.
We label vertices 1 through 5 and want to count routes.
h(i,j)(N): the number of ways the hamster can travel from vertex i to vertex j in exactly N minutes.

Example: Starting at vertex 1, after 1 minute the hamster must be at vertex 2 (the only neighbor), so h(1,1)(1) = 0 (not back at 1 yet).

🧮 Computing small cases by hand

For N = 2 starting at vertex 1:

Possible routes: 1 → 2 → 1, 1 → 2 → 3, 1 → 2 → 5.
This gives h(1,1)(2) = 1, h(1,3)(2) = 1, h(1,5)(2) = 1, and h(1,2)(2) = 0, h(1,4)(2) = 0.

For N = 3 starting at vertex 1:

Five possible routes are listed (e.g., 1 → 2 → 3 → 4, 1 → 2 → 5 → 4, etc.).
The hamster is never back at vertex 1 after 3 steps, so h(1,1)(3) = 0.

📊 Organizing data in matrices

To track all starting and ending pairs for a given N, the excerpt defines an n × n matrix M_N:

The entry in row i, column j is h(i,j)(N).
M₁ is the adjacency matrix: entry (i,j) is 1 if vertices i and j are adjacent, 0 otherwise.

Example for N = 1: M₁ is simply the adjacency matrix of the cage graph.

For N = 2, the excerpt computes M₂ by hand; notice the diagonal entries equal the vertex degrees (because a 2-step walk from i back to i means going to a neighbor and returning).

🔍 The recursive idea

To compute h(1,4)(3), the excerpt breaks it down:

Consider all 2-minute routes from vertex 1 to some intermediate vertex v.
Check if v is adjacent to vertex 4.
Sum: h(1,4)(3) = h(1,1)(2)·0 + h(1,2)(2)·0 + h(1,3)(2)·1 + h(1,4)(2)·0 + h(1,5)(2)·1 = 1·1 + 1·1 = 2.

This matches the two explicit routes: 1 → 2 → 3 → 4 and 1 → 2 → 5 → 4.

Don't confuse: this is not about finding the shortest path; it counts all possible N-step sequences, even if they revisit vertices.
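The hand computation can be reproduced in a few lines. This is a sketch under one loud assumption: the cage's edge list below (1-2, 2-3, 2-5, 3-4, 4-5) is inferred from the routes quoted above, because the original figure is not part of these notes. The helper names are hypothetical.

```python
# ASSUMPTION: edges inferred from the listed routes, not read off the figure.
EDGES = [(1, 2), (2, 3), (2, 5), (3, 4), (4, 5)]
N_VERTICES = 5


def adjacency_matrix(n, edges):
    """Build the symmetric 0/1 matrix for vertices labeled 1..n."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u - 1][v - 1] = 1
        m[v - 1][u - 1] = 1
    return m


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def h(i, j, m):
    """Look up the entry for vertices i and j (1-based labels)."""
    return m[i - 1][j - 1]


M1 = adjacency_matrix(N_VERTICES, EDGES)
M2 = mat_mul(M1, M1)
M3 = mat_mul(M2, M1)

# Recursive step for h(1,4)(3): sum over every intermediate vertex v.
terms = [h(1, v, M2) * h(v, 4, M1) for v in range(1, N_VERTICES + 1)]
print(terms, sum(terms), h(1, 4, M3))

# Diagonal of M2 compared with the vertex degrees.
print([M2[i][i] for i in range(N_VERTICES)], [sum(row) for row in M1])
```

The list `terms` is exactly the five-term sum written out above, one term per intermediate vertex.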

🧮 The main theorem

🧮 Statement of Theorem 8.9.2

Theorem 8.9.2: Let G be a labeled graph with n vertices. Let M_N be the n × n matrix whose (i,j)th entry is the number of walks in G of length N starting at vertex i and ending at vertex j. Then M₁ is the adjacency matrix of G and M_N = (M₁) to the Nth power.

In words: the number of N-step walks from i to j equals the (i,j) entry of the adjacency matrix raised to the Nth power.

🔧 How the proof works

The proof uses induction on N:

Base case (N = 1):

M₁ is the adjacency matrix by definition.
The (i,j) entry is 1 if i and j are adjacent (one 1-edge walk exists), 0 otherwise.

Inductive step (N ≥ 2):

Assume M_(N−1) = (M₁)^(N−1) is correct.
Let A = (M₁)^(N−1) and B = M₁.
By matrix multiplication, the (i,j) entry of A × B is the dot product of row i of A with column j of B:
- c(i,j) = a(i,1)·b(1,j) + a(i,2)·b(2,j) + ... + a(i,n)·b(n,j).
Interpret each term a(i,v)·b(v,j):
- a(i,v) counts (N−1)-step walks from i to v (by inductive hypothesis).
- b(v,j) is 1 if v is adjacent to j, 0 otherwise.
- So a(i,v)·b(v,j) counts N-step walks from i to j that pass through v as the second-to-last vertex.
Summing over all v gives the total count of N-step walks from i to j.
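As a sanity check on the theorem (not a substitute for the induction), one can compare the matrix power against a brute-force count of vertex sequences. The graph below is a hypothetical example chosen for this sketch, and all function names are illustrative.

```python
from itertools import product


def count_walks_brute_force(adj, i, j, length):
    """Count sequences i = w0, w1, ..., w_length = j whose consecutive vertices are adjacent."""
    n = len(adj)
    count = 0
    for middle in product(range(n), repeat=length - 1):
        seq = (i,) + middle + (j,)
        if all(adj[a][b] for a, b in zip(seq, seq[1:])):
            count += 1
    return count


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Return m raised to a positive integer power by repeated multiplication."""
    result = m
    for _ in range(power - 1):
        result = mat_mul(result, m)
    return result


# Hypothetical graph: triangle 0-1-2 with a pendant vertex 3 attached to 2.
ADJ = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def theorem_holds(adj, max_length):
    """Compare both counts for every pair (i, j) and every length up to max_length."""
    n = len(adj)
    return all(
        count_walks_brute_force(adj, i, j, N) == mat_pow(adj, N)[i][j]
        for N in range(1, max_length + 1)
        for i in range(n)
        for j in range(n)
    )


print(theorem_holds(ADJ, 5))
```

The brute-force count examines n^(N−1) candidate sequences per pair, so it is only practical for tiny graphs; the matrix power is what makes larger cases feasible.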

🌐 Generality

The theorem also applies to:

Graphs with self-loops (an edge from a vertex to itself).
Directed graphs (edges that can only be traversed in one direction).

Example: In the hamster cage, if there were a self-loop at vertex 1, the adjacency matrix would have a 1 in the (1,1) position.

🔢 Fibonacci numbers from walks

🔢 The directed graph example

The excerpt presents a directed graph with two vertices u and v:

u has a self-loop and an edge to v.
v has an edge to u.

Question: How many routes of length N are there from u to v?

📋 Brainstorming with a table

The excerpt lists counts for N = 0 to 5:

N	0	1	2	3	4	5
# ways	0	1	1	2	3	5

This matches the Fibonacci sequence: F₀ = 0, F₁ = 1, F₂ = 1, F₃ = 2, F₄ = 3, F₅ = 5.

🧩 Proof by recurrence relation

Let P_N be the number of routes of length N from u to v.

P₀ = 0, P₁ = 1.
For N ≥ 1, every route of length N+1 from u to v starts with either:
- The self-loop at u (then N more steps from u to v remain, giving P_N ways), or
- The edge from u to v, which must be followed by the edge from v back to u (the only edge leaving v), leaving N−1 more steps from u to v (giving P_(N−1) ways).
So P_(N+1) = P_N + P_(N−1) for N ≥ 1, the Fibonacci recurrence.
Since P₀ = F₀ and P₁ = F₁, we conclude P_N = F_N for all N.

🧮 Proof by matrix powers

The adjacency matrix for this directed graph (u first, v second) is:

M₁ = 
  1  1
  1  0

(The (1,1) entry is 1 because of the self-loop at u.)

By Theorem 8.9.2, the number of N-step routes from u to v is the (1,2) entry of (M₁)^N.

Claim: For N ≥ 1,

(M₁)^N = 
  F_(N+1)  F_N
  F_N      F_(N−1)

Proof by induction:

Base case (N = 1): F₂ = 1, F₁ = 1, F₀ = 0, so the claim holds.

Inductive step: Assume the claim for N−1. Then multiply:

(M₁)^N = (M₁)^(N−1) × M₁
       = [F_N      F_(N−1)]   [1  1]
         [F_(N−1)  F_(N−2)] × [1  0]
       = [F_N + F_(N−1),  F_N        ]
         [F_(N−1) + F_(N−2), F_(N−1)]
       = [F_(N+1),  F_N    ]
         [F_N,      F_(N−1)]

using the Fibonacci recurrence F_(N+1) = F_N + F_(N−1).

This confirms that the (1,2) entry of (M₁)^N is F_N, the Nth Fibonacci number.
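The claim is easy to spot-check numerically. A minimal sketch, with hypothetical helper names, that multiplies out the powers of M₁ and compares them with Fibonacci numbers computed independently:

```python
def mat_mul(a, b):
    """Multiply two 2 x 2 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def fib(n):
    """Return F_n with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


M1 = [[1, 1], [1, 0]]  # u first, v second; the (1,1) entry is the self-loop at u


def routes_u_to_v(max_n):
    """Return the (1,2) entries of M1^N for N = 1..max_n."""
    entries = []
    power = M1
    for _ in range(max_n):
        entries.append(power[0][1])
        power = mat_mul(power, M1)
    return entries


def claim_holds(max_n):
    """Check M1^N == [[F_(N+1), F_N], [F_N, F_(N-1)]] for N = 1..max_n."""
    power = M1
    for n in range(1, max_n + 1):
        if power != [[fib(n + 1), fib(n)], [fib(n), fib(n - 1)]]:
            return False
        power = mat_mul(power, M1)
    return True


print(routes_u_to_v(10))
print(claim_holds(30))
```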

🧪 Exercises and patterns

🧪 Complete graph K₄

The exercises ask students to:

Write down the adjacency matrix M₁.
Compute M_n for n = 2 to 6 using software (SAGE).
Count routes of length 6 from vertex 1 to itself and from vertex 1 to vertex 2.
Observe symmetry: all diagonal entries are the same (call it d), all off-diagonal entries are the same (call it e).
Conjecture the value of d − e depending on whether n is even or odd.

Why symmetry? In a complete graph, every vertex has the same degree and the same "role," so walk counts depend only on whether start and end are the same vertex.
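The exercise suggests SAGE; the sketch below uses plain Python instead so that it runs without extra software. It tabulates the diagonal entry d and the off-diagonal entry e of M_n for K₄, leaving the conjecture about d − e to the reader. The function names are hypothetical.

```python
def complete_graph_matrix(k):
    """Adjacency matrix of the complete graph on k vertices."""
    return [[0 if i == j else 1 for j in range(k)] for i in range(k)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def has_two_values(m):
    """True if all diagonal entries agree and all off-diagonal entries agree."""
    n = len(m)
    diag = {m[i][i] for i in range(n)}
    off = {m[i][j] for i in range(n) for j in range(n) if i != j}
    return len(diag) == 1 and len(off) == 1


def diagonal_and_off_diagonal(max_power):
    """Map each n in 1..max_power to (d, e) read from the first row of M_n."""
    m1 = complete_graph_matrix(4)
    power = m1
    table = {}
    for n in range(1, max_power + 1):
        assert has_two_values(power)
        table[n] = (power[0][0], power[0][1])
        power = mat_mul(power, m1)
    return table


for n, (d, e) in diagonal_and_off_diagonal(6).items():
    print(n, d, e, d - e)
```

The last column of the printout is the quantity the exercise asks about; the row for n = 6 also answers the two route-counting questions.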

🔄 Cycle graph C₄

For the 4-cycle:

Half the entries of M_n are zero.
Explanation splits by parity of n:
- If n is even, walks of even length from vertex i can only reach vertices at even distance from i.
- If n is odd, walks of odd length from vertex i can only reach vertices at odd distance from i.
The non-zero entries follow a pattern students are asked to conjecture.

🛤️ Path graph P₄

For the path graph (a line of 4 vertices):

Again, at least half the entries of M_n are zero (same parity reasoning).
The non-zero entries in the center of M₆ are larger because central vertices have more routes passing through them.
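The zero pattern for C₄ and P₄ can be inspected directly. This sketch assumes the vertices are numbered consecutively around the cycle and along the path; the helper names are illustrative.

```python
def cycle_matrix(k):
    """Adjacency matrix of the cycle on vertices 0..k-1."""
    m = [[0] * k for _ in range(k)]
    for i in range(k):
        j = (i + 1) % k
        m[i][j] = m[j][i] = 1
    return m


def path_matrix(k):
    """Adjacency matrix of the path 0 - 1 - ... - (k-1)."""
    m = [[0] * k for _ in range(k)]
    for i in range(k - 1):
        m[i][i + 1] = m[i + 1][i] = 1
    return m


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def powers(m1, max_power):
    """Return the list [M_1, M_2, ..., M_max_power]."""
    result = [m1]
    for _ in range(max_power - 1):
        result.append(mat_mul(result[-1], m1))
    return result


def zero_count(m):
    return sum(row.count(0) for row in m)


c4_powers = powers(cycle_matrix(4), 6)
p4_powers = powers(path_matrix(4), 6)

print([zero_count(m) for m in c4_powers])
print([zero_count(m) for m in p4_powers])
for row in p4_powers[-1]:   # M_6 for the path graph
    print(row)
```

Printing M₆ for the path shows the larger entries sitting at the two central vertices.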

➡️ Directed graph with 10 vertices

Exercise 4 asks students to verify that there are 149 routes from vertex 0 to vertex 9 in a specific directed graph (arrows indicate allowed directions).

The diagram shows intermediate counts at each vertex.
This reinforces that the matrix method works for directed graphs.

Don't confuse: in directed graphs, the adjacency matrix is not symmetric; entry (i,j) = 1 only if there is an edge from i to j, not necessarily from j to i.

The language of trees

9.1 The language of trees

🧭 Overview

🧠 One-sentence thesis

Trees are the simplest possible graphs because they achieve a "sweet-spot": they are the minimal connected graphs (removing any edge disconnects them) and the maximal cycle-free graphs (adding any edge creates a cycle).

📌 Key points (3–5)

What a tree is: a connected graph that contains no cycle as a subgraph.
Why trees are simple: between every pair of vertices there is exactly one walk that does not backtrack (because connected + no cycles).
Minimal connected property: a tree is connected, but deleting any edge makes it disconnected.
Maximal cycle-free property: a tree contains no cycles, but adding any new edge creates a cycle.
Common confusion: don't confuse "minimal connected" with "maximal cycle-free"—trees satisfy both properties simultaneously, which is what makes them special.

🌳 Core definition and structure

🌳 What a tree is

Tree: A graph G = (V, E) is a tree if it is connected and contains no cycle as a subgraph.

Connected means there is a walk between every pair of vertices.
No cycles means there is no subgraph that forms a closed loop.
Together, these two properties guarantee that for every pair of vertices there is a unique walk between them that does not backtrack.
This uniqueness is one reason why trees are the most simple graphs.

🍃 Leaf

Leaf: A vertex of degree 1 in a tree.

A leaf is a vertex with exactly one edge connected to it.
Leaves are the "endpoints" of a tree structure.

🔍 Two characterizations of trees

🔍 Minimal connected graphs (Theorem 9.1.3)

Statement: A graph G is a tree if and only if it is connected, but deleting any edge makes it disconnected.

Why this works:

Forward direction: If G is a tree (connected + no cycles), then removing any edge uv must disconnect it. If removing uv still left G connected, there would be a path from u to v in the remaining graph, and adding back the edge uv would create a cycle—contradicting that G is a tree.
Reverse direction: If G is connected but removing any edge disconnects it, then G cannot have cycles. If G had a cycle, you could remove any edge from that cycle and the graph would still be connected (you can go around the rest of the cycle)—a contradiction.

Interpretation: Among all connected graphs on a given vertex set, trees are the smallest possible—you cannot remove any edge without breaking connectivity.
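Theorem 9.1.3 translates directly into a (slow but simple) tree test: check connectivity, then check that deleting each edge in turn breaks it. A minimal sketch, assuming vertices 0..n-1 and an edge list; the names and example graphs are hypothetical.

```python
def is_connected(n, edges):
    """Depth-first search from vertex 0; True if every vertex is reached."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def is_minimally_connected(n, edges):
    """Connected, and deleting any single edge disconnects the graph."""
    if not is_connected(n, edges):
        return False
    return all(
        not is_connected(n, edges[:k] + edges[k + 1:])
        for k in range(len(edges))
    )


star = [(0, 1), (0, 2), (0, 3)]               # a tree
square = [(0, 1), (1, 2), (2, 3), (3, 0)]     # a 4-cycle, not a tree

print(is_minimally_connected(4, star))
print(is_minimally_connected(4, square))
```

For the 4-cycle, deleting any one edge leaves a path that still reaches every vertex, which is the "go around the rest of the cycle" step in the reverse direction of the proof.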

🔍 Maximal cycle-free graphs (Theorem 9.1.4)

Statement: A graph G is a tree if and only if it contains no cycles, but adding any new edge creates a cycle.

Why this matters:

Among all graphs without cycles on a given vertex set, trees are the largest possible—you cannot add any edge without creating a cycle.
The excerpt notes the proof is omitted but included in exercises.

Comparison table:

Property	What it means	Implication
Minimal connected	Connected, but removing any edge disconnects	Trees are the "smallest" connected graphs
Maximal cycle-free	No cycles, but adding any edge creates a cycle	Trees are the "largest" cycle-free graphs

Don't confuse: These are two different ways to characterize the same object (trees). One focuses on connectivity (minimal), the other on cycles (maximal). Each characterization on its own is equivalent to being a tree, so every tree satisfies both.

🌲 Spanning trees

🌲 What a spanning tree is

Spanning tree: If G is a connected graph with n vertices, then a spanning tree of G is a subgraph of G that is a tree with n vertices.

A spanning tree keeps all the vertices of the original graph but removes edges to eliminate cycles while maintaining connectivity.
The excerpt notes there can be many spanning trees in a single graph.

Example: The excerpt mentions that both B₁,₆ (a star with 1 center and 6 leaves) and P₇ (a path with 7 vertices) are spanning trees inside K₇ (the complete graph on 7 vertices).

🌲 General principle

The excerpt states (in an exercise) that every connected graph contains a spanning tree.
This means you can always "trim down" a connected graph to a tree by removing edges that create cycles, while keeping all vertices.
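The "trim down" idea can be sketched as a loop that keeps deleting any edge whose removal leaves the graph connected. This is one straightforward way to realize the idea, not necessarily the method intended by the exercise; the function names are hypothetical and the input is assumed to be connected.

```python
def is_connected(n, edges):
    """Depth-first search from vertex 0; True if every vertex is reached."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def spanning_tree_by_trimming(n, edges):
    """Delete removable edges until none is left; assumes the input is connected."""
    tree = list(edges)
    changed = True
    while changed:
        changed = False
        for k in range(len(tree)):
            candidate = tree[:k] + tree[k + 1:]
            if is_connected(n, candidate):
                tree = candidate      # this edge was on a cycle, drop it
                changed = True
                break
    return tree


# Example: the complete graph K7 on vertices 0..6.
k7 = [(i, j) for i in range(7) for j in range(i + 1, 7)]
tree = spanning_tree_by_trimming(7, k7)
print(len(k7), len(tree), is_connected(7, tree))
```

When the loop stops, the graph is connected and no edge can be removed, so by Theorem 9.1.3 what remains is a tree on all the original vertices.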

The number of edges in a tree

9.2 The number of edges in a tree

🧭 Overview

🧠 One-sentence thesis

Every tree with n vertices has exactly n − 1 edges, regardless of the tree's shape or structure.

📌 Key points (3–5)

The fundamental edge-count formula: every tree with n vertices has exactly n − 1 edges.
Proof strategy: strong induction by removing one edge to split the tree into two smaller trees, then counting edges in each component.
Extension to forests: a forest with n vertices and m connected components has n − m edges.
Common confusion: many different tree shapes exist for the same n, but all have the same edge count (n − 1).
Storage implications: the fixed edge count means trees can be stored more efficiently than arbitrary graphs.

🌳 The core theorem and its proof

🌳 Theorem statement

Theorem 9.2.1: Every tree with n vertices has n − 1 edges.

Despite the wide variety of possible tree shapes with 14 vertices (or any n), every such tree has exactly 13 edges (or n − 1 edges).
This is a structural invariant: the edge count depends only on the vertex count, not on the tree's shape.

🔍 Proof by strong induction

Base case (n = 1):

A tree with 1 vertex has no edges.
This matches the formula: 1 − 1 = 0 edges.

Inductive step (n ≥ 2):

Assumption: every tree with m < n vertices has m − 1 edges.
Goal: prove every tree with n vertices has n − 1 edges.
Strategy: take an arbitrary tree G with n vertices and at least one edge e = v₁v₂.

✂️ Removing an edge splits the tree

By Theorem 9.1.3, removing edge e leaves a disconnected graph G′.
G′ has exactly two connected components (putting e back reconnects the graph, and a single edge can join at most two components).
Let the component containing v₁ have m₁ vertices, and the component containing v₂ have m₂ vertices.
Note: m₁ + m₂ = n.

🧮 Counting edges in the components

Since G has no cycles, each component is also a tree.
The component with v₁ has m₁ vertices, so by the inductive assumption it has m₁ − 1 edges.
The component with v₂ has m₂ vertices, so it has m₂ − 1 edges.
When we add edge e back in, the total edge count is:
- (m₁ − 1) + (m₂ − 1) + 1 = m₁ + m₂ − 1 = n − 1.
This completes the induction.

Example: In the diagram mentioned, m₁ = 5 and m₂ = 4, so the components have 4 and 3 edges respectively; adding back the removed edge gives 4 + 3 + 1 = 8 edges for a tree with 9 vertices.
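The inductive step can be traced on a concrete tree. The 9-vertex tree below is a made-up example chosen so that removing one edge gives components of sizes 5 and 4, matching the numbers in the example above; it is not the tree from the excerpt's diagram.

```python
def component_of(n, edges, start):
    """Return the set of vertices reachable from start."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def split_counts(n, edges, removed):
    """Remove one edge and return (m1, m2, edges in part 1, edges in part 2)."""
    rest = [e for e in edges if e != removed]
    part1 = component_of(n, rest, removed[0])
    part2 = component_of(n, rest, removed[1])
    edges1 = sum(1 for u, v in rest if u in part1)
    edges2 = sum(1 for u, v in rest if u in part2)
    return len(part1), len(part2), edges1, edges2


# Hypothetical tree on vertices 0..8; the edge (4, 5) plays the role of e.
TREE_EDGES = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5), (5, 6), (6, 7), (6, 8)]

m1, m2, e1, e2 = split_counts(9, TREE_EDGES, (4, 5))
print(m1, m2, e1, e2)
print(e1 + e2 + 1, len(TREE_EDGES))
```

Each component has one fewer edge than it has vertices, and adding back the removed edge recovers the full count.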

🌲 Extension to forests

🌲 What is a forest

Forest: a graph whose connected components are all trees.

A forest can consist of multiple separate trees.
Example: the excerpt describes a forest with six trees, one of which is a single isolated vertex.

🌲 Edge count in forests

Theorem 9.2.3: Every forest with n vertices and m connected components has n − m edges.

This generalizes Theorem 9.2.1 (a tree is a forest with m = 1 component).
Proof strategy (from exercises): apply Theorem 9.2.1 to each component separately.
- If component i has nᵢ vertices, it has nᵢ − 1 edges.
- Sum over all m components: total edges = (n₁ − 1) + (n₂ − 1) + ... + (nₘ − 1) = (n₁ + n₂ + ... + nₘ) − m = n − m.

Don't confuse: a forest is not a single tree; it is a collection of disjoint trees, so the formula adjusts for the number of components.
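The formula n − m can be checked by counting components. The sketch below uses a union-find structure, which is an implementation choice made here rather than something from the excerpt, and the forest is an invented example.

```python
def count_components(n, edges):
    """Count connected components with a simple union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


# Hypothetical forest on 10 vertices: three trees with edges plus the isolated vertex 9.
FOREST_EDGES = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (6, 8)]
n = 10
m = count_components(n, FOREST_EDGES)

print(m, len(FOREST_EDGES), n - m)
```

Starting from n isolated vertices, every edge of a forest merges two components, which gives another way to see why the edge count is n − m.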

💾 Storage efficiency for trees

💾 Why tree storage can be more efficient

General graphs with n vertices can have up to n(n − 1)/2 edges.
Trees with n vertices always have exactly n − 1 edges, a much smaller number.
This fixed structure allows more efficient storage than general graph representations.

💾 Adjacency matrix for trees

An n × n adjacency matrix can store 2^((n² − n)/2) different labeled graphs, one for each subset of the n(n − 1)/2 possible edges.
But there are only n^(n − 2) labeled trees on n vertices (Cayley's theorem, covered in Section 9.3).
The adjacency matrix is inefficient for trees because it reserves space for many more edges than a tree can have.

💾 Edge list for trees

A tree with n vertices has n − 1 edges, so it can be stored as a 2 × (n − 1) list of edges.
This requires 2(n − 1) entries from {0, 1, ..., n − 1}.
This representation can store n^(2(n − 1)) different lists, still more than the n^(n − 2) actual labeled trees.
It is closer to efficient but not perfectly efficient.
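To get a feel for these three counts, they can be tabulated side by side. This sketch takes the three formulas quoted above at face value; the function name and dictionary keys are hypothetical.

```python
from math import log2


def storage_counts(n):
    """Number of distinct objects each representation can encode (n >= 2)."""
    return {
        "adjacency_matrix": 2 ** (n * (n - 1) // 2),   # subsets of possible edges
        "edge_list": n ** (2 * (n - 1)),               # 2(n-1) entries, n choices each
        "labeled_trees": n ** (n - 2),                 # Cayley's formula
    }


for n in (4, 8, 16, 32):
    counts = storage_counts(n)
    # log2 of each count is the number of bits needed to tell the objects apart.
    print(n, {name: round(log2(value), 1) for name, value in counts.items()})
```

With these formulas the edge-list count is actually the larger of the two for small n, the two coincide at n = 16, and the edge list only becomes the more economical representation beyond that.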

💾 Binary search trees

Mentioned as a reasonably efficient storage structure for labeled trees.
Also easy to edit (adding or removing nodes).
The excerpt invites thinking about other good storage structures for trees.

🔗 Related concepts from exercises

🔗 Characterizing trees by edge count

Exercise 1: if G is a connected graph with n vertices and n − 1 edges, then G is a tree.
This is the converse of Theorem 9.2.1: the edge count n − 1 (plus connectedness) characterizes trees.

🔗 Creating spanning trees

Spanning tree: a subgraph of G that is a tree with all n vertices of G.
To create a spanning tree from a complete graph Kₙ, remove all but n − 1 of its n(n − 1)/2 edges, that is, remove (n − 1)(n − 2)/2 edges.
To create a spanning tree from a complete bipartite graph Bₙ,ₘ, which has nm edges, remove nm − (n + m − 1) edges to leave (n + m) − 1 edges.

🔗 Degree sum in trees

Exercise 7: for a tree with n vertices, the sum of the degrees of all vertices equals 2(n − 1).
This follows from the handshaking lemma: sum of degrees = 2 × number of edges = 2(n − 1).

Labeled trees: Cayley's theorem

9.3 Labeled trees: Cayley’s theorem

🧭 Overview

🧠 One-sentence thesis

Cayley's theorem states that the number of distinct labeled trees on n vertices is n^(n−2), a surprisingly simple formula for a complex counting problem.

📌 Key points (3–5)

What a labeled tree is: a tree where each vertex is assigned a unique number from 0 to n−1; different labelings produce different labeled trees.
The counting question: how many distinct labeled trees exist for a given number of vertices n.
Cayley's formula: L(n) = n^(n−2) gives the exact count of labeled trees on n vertices.
Common confusion: swapping labels on two vertices can produce the same labeled tree (e.g., n=2), but in general different labelings yield different trees; also, unlabeled trees that look identical can correspond to multiple distinct labeled trees.
Why the proof is non-trivial: simple enumeration methods (counting all possible edge sets and excluding non-trees) become impractical for larger n; Cayley's theorem requires deeper combinatorial techniques.

🏷️ What labeled trees are

🏷️ Definition and labeling

A labeling of a tree is an assignment of numbers to each vertex. If a tree has n vertices, then we will always use the labels 0 through n−1, where each such integer is used exactly once as a label.

Each vertex gets a distinct integer label from 0 to n−1.
Two trees with the same structure but different labelings are considered different labeled trees.
Example: A tree with vertices labeled 0,1,2,4,3 in one arrangement is different from the same structure with labels 0,1,2,3,4 in a different arrangement.

🔄 Labeled vs unlabeled trees

Unlabeled trees: focus only on the graph structure (which vertices connect to which).
Labeled trees: the specific assignment of labels matters.
The excerpt shows two trees that are "the same as unlabeled trees" but would be different if labels were assigned differently.
Don't confuse: for n=2, swapping the labels 0 and 1 on the single edge still results in the same labeled tree (because the structure and label set are identical), but for larger n, different labelings generally produce different labeled trees.

🔢 Counting labeled trees for small n

🔢 Small cases: n=1,2,3

n	L(n)	Description
1	1	Single vertex labeled 0
2	1	One edge with vertices labeled 0 and 1 (swapping labels gives the same tree)
3	3	Three distinct labeled trees shown in the excerpt

For n=3, the excerpt draws all three labeled trees explicitly.
These small cases help build intuition but do not easily reveal a general pattern.

🔢 Case n=4: detailed enumeration

By Theorem 9.2.1, a tree on 4 vertices has exactly 3 edges.
There are (4 choose 2) = 6 possible locations for edges between 4 labeled vertices.
Total labeled graphs with 3 edges: (6 choose 3) = 20.
Of these 20, exactly 4 are not trees: each is a triangle on three of the vertices together with an isolated fourth vertex, so it both contains a cycle and is disconnected.
Therefore, L(4) = 20 − 4 = 16 labeled trees on 4 vertices.
The excerpt shows the 4 non-tree graphs explicitly.
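The same enumeration can be carried out mechanically. The sketch below relies on the fact (Exercise 1 of Section 9.2) that a connected graph with n vertices and n − 1 edges is a tree, so only connectivity needs to be tested; the function names are hypothetical.

```python
from itertools import combinations


def is_connected(n, edges):
    """Depth-first search from vertex 0; True if every vertex is reached."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def count_labeled_trees(n):
    """Return (number of (n-1)-edge graphs, number of those that are trees)."""
    possible_edges = list(combinations(range(n), 2))
    candidates = list(combinations(possible_edges, n - 1))
    trees = sum(1 for edge_set in candidates if is_connected(n, edge_set))
    return len(candidates), trees


candidates, trees = count_labeled_trees(4)
print(candidates, trees, candidates - trees)
```

The number of candidates is a binomial coefficient that grows quickly with n, which is the practical obstacle to pushing this approach much further.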

🔢 Why enumeration becomes impractical

The method used for n=4 (enumerate all possible edge sets, then exclude non-trees) grows rapidly in complexity.
The excerpt notes: "try to expand the methods we used to compute L(4) to compute L(5) or L(6)" to see the difficulty.
This motivates the need for a closed-form formula.

🎯 Cayley's theorem

🎯 The formula

Theorem 9.3.1 (Cayley's Theorem): The number of labeled trees on n vertices is L(n) = n^(n−2).

This formula gives the exact count for any n.
Verification with small cases:
- L(1) = 1^(1−2) = 1^(−1) = 1 (matches)
- L(2) = 2^(2−2) = 2^0 = 1 (matches)
- L(3) = 3^(3−2) = 3^1 = 3 (matches)
- L(4) = 4^(4−2) = 4^2 = 16 (matches)
The formula is remarkably simple given the complexity of the counting problem.

🎯 Depth of the proof

The excerpt emphasizes: "The proof of Cayley's theorem is much deeper than a simple count."
Direct enumeration (as done for n=4) does not scale or reveal the underlying structure.
The proof requires advanced combinatorial techniques (the excerpt references a visual proof by Joyal in the exercises).
Don't confuse: the statement of Cayley's theorem is simple, but the proof is non-trivial and not covered in this excerpt.

🌳 Related concepts: binary trees

🌳 Definition of binary trees

A binary tree is either a single vertex with no edges, or a graph on more than one vertex consisting of a root vertex v and two edges, L and R, each connecting v to the root vertex of a smaller binary tree.

Binary trees are defined recursively.
Base case: a single vertex with no edges.
Recursive case: a root vertex connected to two smaller binary trees via a left edge (L) and a right edge (R).

🌳 Structure and terminology

Root vertex: the top vertex in the tree.
Left child / right child: the vertices connected to a parent via the left or right edge.
Parent: the vertex above a given vertex in the tree.
Every vertex in a binary tree either has exactly two children (left and right) or no children (a leaf).
The recursive definition ensures that left/right distinctions are well-defined throughout the tree.

🌳 Informal definition

A binary tree is a graph formed by starting with a root vertex at the top, and drawing edges and vertices downwards to the left or right in such a way that every vertex either has two children (a left and right child) or no children.

This informal definition emphasizes the visual/structural aspect: edges go "down-and-left" or "down-and-right."
Example: the excerpt shows building up binary trees step-by-step, starting from a single vertex and combining smaller trees.

🧮 Exercises and extensions

🧮 Rooted and doubly rooted trees

The exercises ask for counts of:
- Trees on 5 labeled vertices (direct application of Cayley's theorem).
- Rooted trees on 5 labeled vertices (a tree with a designated root vertex).
- Doubly rooted trees on 5 labeled vertices (a tree with two designated root vertices).
These variants extend the basic labeled tree counting problem.

🧮 NCAA bracket example

The excerpt uses a single-elimination tournament (64 teams, 63 games) as an example of a rooted binary tree with 64 leaves.
Unfilled bracket: a partially labeled binary tree where only the 64 leaf vertices (teams) are labeled.
Filled bracket: all vertices are labeled with predicted winners.
Key insight: a filled NCAA bracket is not a labeled tree in the sense of Cayley's theorem, because the labels are not unique integers 0 to n−1; instead, labels (team names) repeat as teams advance.
The basketball counting argument: each of 64 teams brings one ball; one ball is used per game; the losing team takes the used ball; at the end, 1 ball remains unused (the champion's), so 63 balls were used, meaning 63 games were played.
Number of ways to fill out the bracket: 2^63 (each of the 63 games has 2 possible outcomes).

Binary trees

9.4 Binary trees

🧭 Overview

🧠 One-sentence thesis

Binary trees can be counted using Catalan numbers, and their labeled variants (increasing binary trees) correspond exactly to permutations, providing powerful combinatorial tools for modeling tournaments and other recursive structures.

📌 Key points (3–5)

What a binary tree is: a recursive structure where each vertex has either two children (left and right) or none, built from a root vertex downward.
Counting leaves vs vertices: the number of binary trees with n leaves equals the (n−1)st Catalan number; counting by vertices requires a modified definition ("at-most binary trees").
Common confusion: binary trees vs at-most binary trees—at-most binary trees allow vertices to have only a left child, only a right child, both, or neither; standard binary trees require either two children or none.
Tournament application: binary trees model tournament brackets where players face off in pairs; the number of possible tournament structures grows as Catalan numbers.
Increasing binary trees: labeling at-most binary trees with increasing numbers along downward paths creates a bijection with permutations, so there are n! increasing binary trees on n vertices.

🌳 Core definition and structure

🌳 What a binary tree is

Binary tree: either a single vertex with no edges, or a graph consisting of a root vertex v and two edges, L and R, each connecting v to the root vertex of a smaller binary tree.

The definition is recursive: you build larger trees by joining smaller ones.
Each new edge is marked as either a left edge or right edge.
The vertex at the top of an edge is the parent; the vertex at the bottom is the child (left child or right child).
Example: start with a single vertex •, then join two single-vertex graphs to a new root to get a three-node tree with one parent and two children.

🔄 Informal definition

Binary tree (informal): a graph formed by starting with a root vertex at the top and drawing edges and vertices downward to the left or right, so that every vertex either has two children (a left and right child) or no children.

This phrasing emphasizes the "downward branching" structure.
The key constraint: every vertex has exactly two children or none—no vertex has only one child in a standard binary tree.

🏆 Tournament application and counting by leaves

🏆 Modeling tournaments

Binary trees can represent tournament brackets where players face off in pairs and winners advance.
The leaves (vertices with no children) represent the initial players.
The left vs right distinction matters: for example, in a chess tournament, the left child goes first and the right child goes second.
Example: with 5 players labeled 1, 2, 3, 4, 5, you can seed them into a binary tree with 5 leaves; players at sibling positions face off, and the winner advances to the parent node.

🔢 Counting binary trees by leaves

Question: How many binary trees have exactly n leaves?

Answer: The number of binary trees with n leaves is the (n−1)st Catalan number, C_(n−1).

Why this works:

Let B_n be the number of binary trees with n leaves.
Initial condition: B₁ = 1 (only one tree with one leaf), and C₀ = 1, so B₁ = C₀.
Recursion: any tree with n+1 leaves has some number k ≥ 1 of leaves on the left subtree and n+1−k on the right subtree.
Counting all cases: B_(n+1) = B₁·B_n + B₂·B_(n−1) + ⋯ + B_n·B₁.
This matches the Catalan recursion C_n = C₀·C_(n−1) + C₁·C_(n−2) + ⋯ + C_(n−1)·C₀ (with indices shifted by 1).
Therefore B_n = C_(n−1) for all n ≥ 1.
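The recursion is short enough to run directly. This sketch computes the leaf counts from the recursion and compares them with the standard closed form for Catalan numbers, (2n choose n)/(n+1); the function names are hypothetical.

```python
from math import comb


def binary_trees_by_leaves(max_leaves):
    """Return a list B where B[n] is the number of binary trees with n leaves."""
    B = [0, 1]                      # B[0] is a placeholder; B[1] = 1
    for n in range(1, max_leaves):
        # B[n+1] = B[1]*B[n] + B[2]*B[n-1] + ... + B[n]*B[1]
        B.append(sum(B[k] * B[n + 1 - k] for k in range(1, n + 1)))
    return B


def catalan(n):
    """Closed form for the nth Catalan number."""
    return comb(2 * n, n) // (n + 1)


B = binary_trees_by_leaves(10)
for n in range(1, 11):
    print(n, B[n], catalan(n - 1))
```

The two columns agree, and the row for n = 5 is the count used in the tournament example.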

🎲 Counting labeled tournaments

For 5 players, there are C₄ = 14 unlabeled binary trees with 5 leaves.
Using the explicit Catalan formula C_n = (1/(n+1)) × (2n choose n), we get C₄ = (1/5) × (8 choose 4) = 14.
If you have 5 different players, the number of ways to assign them to the 5 leaves is 5! = 120.
Total number of different tournaments for 5 players: 14 × 120 = 1680.

🌿 At-most binary trees and counting by vertices

🌿 What an at-most binary tree is

At-most binary tree: a tree constructed recursively starting from a root node at the top, where every node has either a left child only, a right child only, both a left and right child, or neither.

This is a relaxed version of the standard binary tree definition.
A vertex can now have only one child (left or right), not just two or none.
Don't confuse: standard binary trees require two children or none; at-most binary trees allow more flexibility.

🔢 Counting at-most binary trees by vertices

Question: How many at-most binary trees have exactly n vertices?

Answer: The number of at-most binary trees on n vertices is the nth Catalan number, C_n.

Why this works:

Let A_n be the number of at-most binary trees on n vertices.
Initial condition: consider the empty tree as an at-most binary tree, so A₀ = 1 = C₀.
Recursion: for n ≥ 1, any tree has a root vertex, and the root has either a left child only, a right child only, both, or neither.
Both the left and right branches form (possibly empty) at-most binary trees, with a total of n−1 vertices in the left and right trees combined.
If there are k vertices on the left and n−1−k on the right, there are A_k·A_(n−1−k) possible trees.
Summing over all k: A_n = A₀·A_(n−1) + A₁·A_(n−2) + ⋯ + A_(n−1)·A₀.
This matches the Catalan recursion, so A_n = C_n.

📊 Example

All at-most binary trees on 3 vertices: there are 5 of them, which equals C₃ = 5.
The excerpt shows these 5 trees visually.
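The recursive description also gives a way to list the trees, not just count them. In this sketch a tree is encoded as `None` (the empty tree) or a pair `(left, right)`; that encoding and the function name are choices made for illustration.

```python
def at_most_binary_trees(n):
    """Return every at-most binary tree shape on n vertices."""
    if n == 0:
        return [None]               # the empty tree
    shapes = []
    for k in range(n):              # k vertices on the left, n-1-k on the right
        for left in at_most_binary_trees(k):
            for right in at_most_binary_trees(n - 1 - k):
                shapes.append((left, right))
    return shapes


for shape in at_most_binary_trees(3):
    print(shape)

print([len(at_most_binary_trees(n)) for n in range(6)])
```

The nested loops mirror the sum A₀·A_(n−1) + ⋯ + A_(n−1)·A₀: each choice of k contributes one product of left shapes and right shapes.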

🔢 Increasing binary trees and permutations

🔢 What an increasing binary tree is

Increasing binary tree: an at-most binary tree with vertices labeled by the numbers 1, 2, …, n such that the labels increase as one reads any downward path in the tree.

Warning: despite the name, an increasing binary tree may not be a standard binary tree—it only needs to be an at-most binary tree.
The key property: labels increase along any path from root to leaf.

🔄 Bijection with permutations

Claim: Increasing binary trees on n labeled vertices are in bijection with permutations of {1, 2, 3, …, n}.

How to convert a tree to a permutation:

The label 1 is at the root and corresponds to the position of 1 in the permutation.
The left subtree contains entries to the left of 1 in the permutation; the right subtree contains entries to the right of 1.
Repeat this analysis recursively on each subtree: the root of each subtree is the smallest entry in its block of the permutation, and it splits that block into a left part and a right part.
Example: given a tree with root 1, left subtree rooted at 3, and right subtree rooted at 2, you can reconstruct the permutation step by step.

How to convert a permutation to a tree:

Place 1 at the root.
The left child of 1 is the smallest element to the left of 1 in the permutation; the right child is the smallest element to the right of 1.
Recursively determine the left and right children of each successive node.
This process is reversible, so the correspondence is a bijection.
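Both directions of the correspondence fit in a few lines. In this sketch a tree is `None` or a triple `(label, left, right)`; the encoding and the function names are assumptions made for illustration. Reading a tree back is an in-order traversal (left subtree, root, right subtree).

```python
from itertools import permutations


def permutation_to_tree(perm):
    """Split at the smallest entry, then recurse on the left and right blocks."""
    if not perm:
        return None
    i = perm.index(min(perm))
    return (perm[i],
            permutation_to_tree(perm[:i]),
            permutation_to_tree(perm[i + 1:]))


def tree_to_permutation(tree):
    """Read the labels back: left subtree, then root, then right subtree."""
    if tree is None:
        return []
    label, left, right = tree
    return tree_to_permutation(left) + [label] + tree_to_permutation(right)


def is_increasing(tree, parent_label=0):
    """Check that labels increase along every downward path."""
    if tree is None:
        return True
    label, left, right = tree
    return (label > parent_label
            and is_increasing(left, label)
            and is_increasing(right, label))


def count_increasing_trees(n):
    """Count the distinct trees produced by all permutations of 1..n."""
    return len({permutation_to_tree(list(p)) for p in permutations(range(1, n + 1))})


tree = permutation_to_tree([3, 1, 2])
print(tree)
print(tree_to_permutation(tree), is_increasing(tree))
print(count_increasing_trees(4))
```

For instance, the permutation 3, 1, 2 gives a tree with root 1, left child 3, and right child 2, and reading it back returns 3, 1, 2.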

🎯 Counting increasing binary trees

Theorem: There are exactly n! increasing binary trees on n vertices.

Why: Since increasing binary trees are in bijection with permutations, and there are n! permutations of {1, 2, …, n}, there must be n! increasing binary trees.

📋 Summary table

Tree type	Constraint	Counted by	Formula
Binary tree (by leaves)	Every vertex has 2 children or none	Number of leaves n	C<sub>n−1</sub> (Catalan)
At-most binary tree	Every vertex has 0, 1, or 2 children	Number of vertices n	C<sub>n</sub> (Catalan)
Increasing binary tree	At-most binary tree with increasing labels	Number of vertices n	n! (permutations)

Minimal Spanning Trees

10.1 Minimal spanning trees

🧭 Overview

🧠 One-sentence thesis

Kruskal's greedy algorithm efficiently finds a minimal spanning tree—the cheapest way to connect all vertices in a weighted graph—by repeatedly adding the cheapest edge that doesn't create a cycle.

📌 Key points (3–5)

What a minimal spanning tree is: a sub-tree containing all vertices of a weighted connected graph with the smallest possible total edge weight.
Why brute force fails: the number of spanning trees to check explodes (for the complete graph on n = 50 vertices, roughly 3.6 × 10⁸¹ trees, on the order of the number of atoms in the universe).
How Kruskal's algorithm works: start with all vertices, then repeatedly add the cheapest remaining edge that connects two separate components until the graph is connected.
Common confusion: Kruskal's is a greedy algorithm (locally optimal at each step) but surprisingly also achieves the global optimum (the minimal spanning tree).
Uniqueness caveat: if no two edges have the same weight, the minimal spanning tree is unique; otherwise, multiple minimal spanning trees may exist with the same total cost.

🌳 What is a minimal spanning tree?

🏗️ Weighted graphs and sub-trees

Weighted graph: a graph equipped with a real number weight on each edge.

Sub-tree of a graph G: a subgraph of G which also happens to be a tree.

The weight represents cost, distance, or any quantity you want to minimize.
Example: the fiber-optic cable problem asks for the cheapest way to connect 6 towns; each edge's weight is the installation cost.

🎯 Minimal spanning tree definition

Minimal spanning tree: given a weighted connected graph G, a sub-tree containing all vertices of G whose total sum of edge weights is as small as possible.

It must be a tree (connected, no cycles).
It must span all vertices (every vertex is included).
Its total edge weight is minimized.
Example: connecting towns with fiber-optic cable—you want all towns reachable but with the lowest total installation cost.

🔀 Uniqueness and ties

If no two edges have the same weight, the minimal spanning tree is unique.
If some edges have the same weight, multiple minimal spanning trees may exist (the excerpt shows two examples: one graph with a unique minimal spanning tree, another with two different minimal spanning trees).
Don't confuse: different minimal spanning trees can exist, but they all have the same total cost.

🚫 Why brute force is impractical

💥 Exponential explosion

A poor algorithm: compute the cost of all possible spanning trees, then pick the cheapest.
The complete graph on n labeled vertices has nⁿ⁻² spanning trees (by Cayley's Theorem).
Example: for n = 50 vertices, that's 50⁴⁸ ≈ 3.6 × 10⁸¹ trees—comparable to the number of atoms in the universe.
This brute-force approach is computationally infeasible for even moderately sized graphs.
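
The size of this count is easy to reproduce with exact integer arithmetic. This is a small sketch; the function name is made up for illustration.

```python
def labeled_tree_count(n: int) -> int:
    """Cayley's formula: number of labeled trees on n vertices (n >= 2)."""
    return n ** (n - 2)


print(labeled_tree_count(4))             # 16
print(f"{labeled_tree_count(50):.1e}")   # 3.6e+81
```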

🛠️ Kruskal's algorithm

📋 Step-by-step procedure

Kruskal's algorithm to produce a minimal spanning tree from a weighted connected graph G:

(1) Include all vertices of the graph G.
(2) Include the cheapest edge (the one with the smallest weight). Remove that edge from consideration.
(3) If the subgraph is not connected, add the cheapest remaining edge that connects two of its connected components to the subgraph. Remove that edge from consideration.
(4) Repeat step (3) until the graph is connected.

At each step beyond the first, exactly one edge is added and the output has no cycles.
When the algorithm terminates, the output is a connected graph with no cycles (a tree) containing all vertices of G.
For a graph with n vertices, the algorithm stops after adding exactly n − 1 edges.
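
One way to sketch the procedure in code is shown below. The union-find bookkeeping used to track connected components is an implementation choice, not something the text prescribes, and the small example graph is hypothetical (it is not the fiber-optic graph).

```python
def kruskal(vertices, edges):
    """Return (tree_edges, total_cost); edges is a list of (weight, u, v) tuples."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # shorten the path as we go
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):      # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # joins two components, so no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
        if len(tree) == len(vertices) - 1:  # n - 1 edges: the tree is complete
            break
    return tree, sum(w for w, _, _ in tree)


# Hypothetical 4-vertex example.
example_edges = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c"), (4, "c", "d"), (5, "b", "d")]
tree, cost = kruskal("abcd", example_edges)
print(tree, cost)  # [(1, 'a', 'b'), (2, 'b', 'c'), (4, 'c', 'd')] 7
```

Sorting the edges once and skipping any edge whose endpoints are already in the same component is equivalent to repeatedly choosing the cheapest remaining edge that connects two components.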

🍀 Greedy strategy

Greedy algorithm: at each step you do the cheapest possible thing to make progress.

Kruskal's algorithm is greedy: always pick the cheapest edge that doesn't form a cycle.
What is surprising: this local optimization at each step also achieves the global optimum (the minimal spanning tree).
Example: you don't need to look ahead or reconsider earlier choices; just pick the cheapest valid edge each time.

🎲 Handling ties

If there is more than one cheapest remaining edge that does not form a cycle, ties can be broken arbitrarily.
Whichever way you break the tie will still lead to a minimal spanning tree.
Question from the excerpt: If you and your friend each perform Kruskal's algorithm on a weighted graph, will you necessarily get the same minimal spanning trees as output? Will the minimal spanning trees you get necessarily have the same costs?
- Answer (implicit): you may get different trees if there are ties, but the total costs will always be the same.

✅ Why Kruskal's algorithm works

🧾 Proof outline

The excerpt provides a proof that Kruskal's algorithm returns a minimal spanning tree:

Let T be the spanning tree found by Kruskal's algorithm.
Let G₀ be any other spanning tree.
Goal: show cost(T) ≤ cost(G₀), which implies T is minimal.
Let e be the first edge (when constructing T via Kruskal's) that is not in G₀.
Adding e to G₀ creates a cycle C (by a theorem: adding any new edge between pre-existing vertices in a tree creates a cycle).
This cycle C is not contained in T, so there is an edge f in C that is not in T.
Define G₁ = (G₀ ∪ e) \ f (add e, remove f).
The weight of e is less than or equal to the weight of f: every edge Kruskal's algorithm chose before e lies in G₀, as does f, so f would not have closed a cycle at that stage, yet the algorithm chose e over f.
Therefore cost(G₁) ≤ cost(G₀).
Repeat this process: replace G₀ with G₁, obtaining G₂, and so on.
Each iteration adds one more edge from T and removes one edge not in T, so eventually Gₖ = T for some integer k.
We have cost(G₀) ≥ cost(G₁) ≥ cost(G₂) ≥ ... ≥ cost(Gₖ) = cost(T).
Hence Kruskal's algorithm always produces a minimal spanning tree.

🔑 Key insight

At each step, Kruskal's picks an edge e that costs no more than any edge f it could have picked instead.
By swapping edges (add e, remove f) in any other spanning tree, you can transform it into the tree produced by Kruskal's without increasing cost.
This shows that Kruskal's tree is at least as cheap as any other spanning tree.

🧩 Context and applications

🌐 Real-world example

The excerpt opens with a fiber-optic cable problem: connect 6 towns (Laramie, Cheyenne, Greeley, Denver, Boulder, Fort Collins) with the cheapest total installation cost.
Each edge has a cost label (e.g., 3, 4, 5, 5.5, 6, 7, 11, 12, 13, 14).
The minimal spanning tree gives the cheapest way to connect all towns.

📊 Optimization problems on graphs

The excerpt situates minimal spanning trees within a broader class of optimization problems on graphs.
Some problems (like minimal spanning trees) can be solved efficiently via known algorithms.
Other problems (like the traveling salesperson problem, mentioned in Section 10.2) are difficult to solve exactly, though efficient approximate algorithms may exist.

Problem type	Solvability	Example
Minimal spanning tree	Efficiently solvable (Kruskal's algorithm)	Fiber-optic cable installation
Traveling salesperson	Difficult to solve exactly	(Mentioned in next section)

10.2 Traveling salesperson problem

🧭 Overview

🧠 One-sentence thesis

The traveling salesperson problem seeks a minimal-cost tour visiting every vertex, and while finding the optimal solution is computationally very hard (NP-hard), the Tree Shortcut Algorithm can approximate it within a factor of 2 when the graph satisfies the triangle inequality.

📌 Key points (3–5)

What a tour is: a closed walk that visits every vertex and is allowed to retrace edges.
Why it's hard: greedy approaches fail; finding an optimal tour is NP-hard (no polynomial-time algorithm is known, and the running time is likely super-polynomial).
Triangle inequality: a condition where one edge of a triangle never costs more than the sum of the other two; complete graphs with Euclidean distances always satisfy it.
Tree Shortcut Algorithm: uses a minimal spanning tree to build an approximate tour that costs at most twice the optimal tour (when triangle inequality holds).
Common confusion: the Tree Shortcut Algorithm does not guarantee an optimal tour, only one within a factor of 2 of optimal—and this guarantee requires the triangle inequality.

🎯 Problem definition and real-world context

🎯 What is a tour?

A tour of a connected graph G is a closed walk that is allowed to retrace its steps (cross an edge more than once) and that visits every vertex of G.

The goal: find a tour of minimal cost in a weighted connected graph.
A tour may use the same edge more than once if needed.
Example: In a graph with vertices 1, 2, 3 and edges with costs 1, 17, 3, 2, 2, an optimal tour may utilize an edge twice.

🏭 Real-world applications

The excerpt gives two scenarios:

Salesperson: visiting cities and driving back home while minimizing gas usage.
Manufacturing plant: a drill bit drilling holes on a metal plate and returning to its starting position before the next plate arrives; the tour is repeated thousands of times per day, so minimizing distance is critical.

🚫 Why greedy algorithms fail

🚫 Greedy approaches don't work

In Section 10.1, Kruskal's algorithm (a greedy method) successfully finds minimal spanning trees.
For the traveling salesperson problem, greedy approaches fail miserably.
If you add the cheapest edges first, you may be forced later to choose extremely expensive edges, leading to a suboptimal solution.

🧮 Computational hardness

Finding an optimal tour is an NP-hard problem, which roughly means that the running time of a computer program is likely super-polynomial in terms of the number of vertices in a graph.

"Super-polynomial" means the time grows faster than any polynomial function of the number of vertices.
In practice, this makes finding exact optimal tours infeasible for large graphs.

📐 Triangle inequality and its role

📐 What is the triangle inequality?

A weighted complete graph G satisfies the triangle inequality if for any triangle (or subgraph C₃) in the graph G, the edge weights a, b, and c of this triangle satisfy:
a ≤ b + c, b ≤ a + c, and c ≤ a + b.

In other words: the cost of traversing one side of a triangle is never more expensive than the cost of traversing the other two sides.
Complete graphs drawn in the plane with edge weights given by Euclidean distances always satisfy the triangle inequality.

📐 Why it matters

The Tree Shortcut Algorithm (described next) requires the triangle inequality to guarantee that the approximate tour costs at most twice the optimal tour.
Without the triangle inequality, the algorithm can produce tours much worse than optimal (the excerpt mentions an example where the tour can be longer than 1000 times an optimal tour).

🌲 Tree Shortcut Algorithm

🌲 The algorithm steps

The Tree Shortcut Algorithm has three steps:

(a) Find a minimal spanning tree for the complete weighted graph G.
(b) Consider the tour that wraps around the outside of the minimal spanning tree, crossing every edge in this tree twice.
(c) When traversing the tour in (b), whenever possible skip visiting a vertex that has already been visited.

The excerpt illustrates this on a complete graph with 7 vertices, with edge labels given by Euclidean distances.
The starting edge (indicated by a red arrow) affects the final shortcut tour; different starting edges can produce different shortcut tours.

🌲 Non-uniqueness

There is not a unique way to proceed from step (b) to step (c).
Starting at a different edge may yield a different shortcut tour.
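
The three steps can be sketched for points in the plane, where Euclidean distances guarantee the triangle inequality. This is a sketch under stated assumptions: the minimal spanning tree is built with Prim's algorithm rather than Kruskal's (any method works for step (a)), the walk always starts at vertex 0, and the sample points are hypothetical.

```python
import math


def tree_shortcut_tour(points):
    """Return (visit_order, tour_cost, mst_cost) for the complete Euclidean graph."""
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Step (a): minimal spanning tree, grown from vertex 0 (Prim's algorithm).
    in_tree = {0}
    children = {i: [] for i in range(n)}
    mst_cost = 0.0
    while len(in_tree) < n:
        d, u, v = min(
            (dist(u, v), u, v)
            for u in in_tree
            for v in range(n)
            if v not in in_tree
        )
        children[u].append(v)
        in_tree.add(v)
        mst_cost += d

    # Steps (b) and (c): walking around the tree while skipping vertices that
    # were already visited lists the vertices in depth-first preorder.
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))

    # Close the tour by returning to the starting vertex.
    tour_cost = sum(dist(order[i], order[(i + 1) % n]) for i in range(n))
    return order, tour_cost, mst_cost


sample_points = [(0, 0), (1, 0), (2, 1), (0, 2), (3, 3), (1, 3), (4, 0)]
order, tour_cost, mst_cost = tree_shortcut_tour(sample_points)
print(order, round(tour_cost, 2), round(mst_cost, 2))
```

Visiting a vertex's children in a different order, or starting from a different vertex, corresponds to starting the walk at a different edge and may give a different shortcut tour.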

🎓 Theoretical guarantee: the 2-approximation

🎓 Main theorem

Theorem 10.2.3: If the nonnegative costs in a weighted complete graph G satisfy the triangle inequality, then the Tree Shortcut Algorithm finds a tour that costs at most twice as much as an optimal tour.

This is called a "2-approximation" or "within a factor of 2."
The proof uses two lemmas.

🎓 Lemma 1: Optimal tour ≥ Minimal spanning tree

Lemma 10.2.4: In a connected graph with nonnegative edge weights, the cost of an optimal tour is at least as much as the cost of a minimal spanning tree.

Proof idea:

Let M be the cost of a minimal spanning tree in G.
[cost of an optimal tour] ≥ [cost of an optimal tour minus any edge] ≥ M.
The first inequality: removing an edge (with nonnegative cost) cannot increase the total cost.
The second inequality: a tour is connected and reaches every vertex, so removing edges from it until no cycles remain leaves a spanning tree, which must cost at least M (since M is minimal).

🎓 Lemma 2: Tree Shortcut output ≤ 2 × Minimal spanning tree

Lemma 10.2.5: If the edge costs in a weighted complete graph G satisfy the triangle inequality, then the Tree Shortcut Algorithm finds a tour that costs at most twice as much as a minimal spanning tree.

Proof idea:

Let M be the cost of a minimal spanning tree in G.
Step (b) produces a tour of cost 2M (wrapping around the tree, crossing every edge twice).
Step (c) shortcuts the tour by skipping already-visited vertices.
By the triangle inequality, skipping a vertex (taking a direct edge instead of two edges) cannot increase the cost.
Therefore, [cost of output of Tree Shortcut Algorithm] ≤ 2M.

🎓 Proof of Theorem 10.2.3

Combining the two lemmas:

[cost of output of Tree Shortcut Algorithm] ≤ 2M (Lemma 10.2.5)
2M ≤ 2 · [cost of optimal tour] (Lemma 10.2.4)
Therefore, the Tree Shortcut output costs at most twice the optimal tour.

⚠️ Don't confuse: when the guarantee fails

Question 10.2.6 asks: Can you find an example to show that Theorem 10.2.3 may fail if the edge weights don't satisfy the triangle inequality?
The excerpt mentions (in the exercises) that without the triangle inequality, a tour found by the Tree Shortcut Algorithm can be longer than 1000 times an optimal tour.
The 2-approximation guarantee is not universal; it depends critically on the triangle inequality.

📝 Summary table: Key results

Concept	Definition / Result
Tour	Closed walk visiting every vertex, may retrace edges
Optimal tour	Tour of minimal cost
NP-hard	Finding an optimal tour is computationally very hard (running time likely super-polynomial)
Triangle inequality	For any triangle with edges a, b, c: a ≤ b + c, b ≤ a + c, c ≤ a + b
Tree Shortcut Algorithm	(a) Find minimal spanning tree; (b) wrap around it (2M cost); (c) shortcut by skipping visited vertices
Theorem 10.2.3	If triangle inequality holds, Tree Shortcut output ≤ 2 × optimal tour
Lemma 10.2.4	Optimal tour cost ≥ minimal spanning tree cost
Lemma 10.2.5	Tree Shortcut output ≤ 2 × minimal spanning tree cost (under triangle inequality)

Matchings

10.3 Matchings

🧭 Overview

🧠 One-sentence thesis

Finding matchings in graphs—especially maximum and saturated matchings—is a central combinatorial optimization problem, and Hall's Marriage Theorem provides a precise condition for when a saturated matching exists in a bipartite graph.

📌 Key points (3–5)

What a matching is: a subset of edges where no two edges share a common vertex.
Maximal vs maximum: maximal means "cannot add more edges," but maximum means "has the largest possible number of edges"—every maximum matching is maximal, but not vice versa.
Augmenting paths: a technique to improve a matching by finding a path that alternates between matched and unmatched edges, allowing you to increase the matching size by one.
Saturated matchings in bipartite graphs: Hall's Marriage Theorem states that a saturated matching (matching every vertex on the smaller side) exists if and only if every subset X of the left side has a neighborhood at least as large as X.
Common confusion: maximal and maximum sound similar but differ—maximal only means locally "stuck," while maximum is globally optimal.

🔗 Core definitions

🔗 What is a matching?

A matching in a graph G = (V, E) is a subset S of the edge set E such that no two of the edges in S share a common vertex.

In other words, each vertex is an endpoint of at most one edge in the matching.
Example: In the complete graph K₅, a matching with two edges means four vertices are matched and one is left out.

📏 Maximal matching

A matching is maximal if it is not contained in a larger matching.

This means you cannot add any more edges without violating the matching property (two edges sharing a vertex).
Don't confuse: maximal does not mean "best possible"—it only means "locally stuck."
Example: In the path graph P₄ on 4 vertices, a single edge in the middle is maximal (you cannot add another edge without overlap), but a matching with two edges (one at each end) is also maximal and has more edges.

🏆 Maximum matching

A maximum matching is a matching with the largest possible number of edges.

This is the globally optimal matching.
Every maximum matching is also maximal, but not every maximal matching is maximum.
Example: In P₄, the two-edge matching is both maximal and maximum; the one-edge matching is maximal but not maximum.

✨ Perfect matching

A perfect matching is a matching such that every vertex of V is an endpoint of (exactly one) edge in S.

A perfect matching is always maximum (and maximal).
Example: In K₅ (5 vertices), no perfect matching exists because you cannot pair all five vertices; the maximum matching size is 2 (four vertices matched).

🛠️ Finding matchings

🛠️ Greedy algorithm for maximal matching

How it works: Label vertices 1, 2, 3, …, n. Choose the edge (i, j) where i is minimal and j is smallest among all edges from i. Delete those two vertices and repeat until no more edges can be added.
Result: This always produces a maximal matching.
Limitation: The greedy algorithm does not guarantee a maximum matching—it may get "stuck" in a locally good but globally suboptimal solution.
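
A minimal sketch of this greedy procedure follows, assuming vertices are labeled 1 to n and edges are given as pairs; the function name and the example labeling of P₄ are illustrative choices.

```python
def greedy_maximal_matching(n, edges):
    """Greedy matching on vertices 1..n; edges is an iterable of (i, j) pairs."""
    adjacent = {frozenset(e) for e in edges}
    unmatched = set(range(1, n + 1))
    matching = []
    for i in range(1, n + 1):
        if i not in unmatched:
            continue
        # Smallest still-unmatched neighbor of i, if there is one.
        partners = [
            j for j in sorted(unmatched)
            if j != i and frozenset((i, j)) in adjacent
        ]
        if partners:
            j = partners[0]
            matching.append((i, j))
            unmatched -= {i, j}  # delete both endpoints and continue
    return matching


# P4 labeled along the path as 3 - 1 - 2 - 4: greedy takes the middle edge.
print(greedy_maximal_matching(4, [(1, 3), (1, 2), (2, 4)]))  # [(1, 2)]
# P4 labeled along the path as 1 - 2 - 3 - 4: greedy finds both end edges.
print(greedy_maximal_matching(4, [(1, 2), (2, 3), (3, 4)]))  # [(1, 2), (3, 4)]
```

The first labeling shows the limitation: the result is maximal but has only one edge, while the maximum matching has two.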

🔄 Augmenting paths

An augmenting path for a matching M is a path that starts and ends at two unmatched vertices and alternates between edges not in M and edges in M, with an odd number of edges total.

Why it matters: If you find an augmenting path, you can "flip" the edges (remove the matched edges in the path from M and add the unmatched edges) to get a new matching with one more edge.
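Key property: The symmetric difference of M and P (remove from M the path's matched edges, then add the path's unmatched edges) is a matching with one more edge than M. When P contains every edge of M, this is simply P − M.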
Example: In the complete bipartite graph B₄,₄ (minus one edge), starting with a matching of size 3, an augmenting path with 7 edges (alternating unmatched, matched, unmatched, …) can be used to construct a matching of size 4.

🔍 Hopcroft-Karp algorithm

The excerpt mentions this as an algorithm for finding a maximum matching; it applies to bipartite graphs.
The full algorithm is not described, but the key idea is to repeatedly find augmenting paths to improve the matching until no augmenting path exists.
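
The idea of repeatedly searching for augmenting paths can be sketched for bipartite graphs as follows. This is the simple one-path-at-a-time search, not the Hopcroft-Karp algorithm itself; the function name and the example graph are hypothetical.

```python
def maximum_bipartite_matching(neighbors):
    """neighbors maps each left vertex to a list of right vertices.

    Returns a dict mapping each matched right vertex to its left partner.
    """
    match_of = {}  # right vertex -> left vertex currently matched to it

    def try_augment(u, seen):
        # Look for an augmenting path that starts at the left vertex u.
        for r in neighbors[u]:
            if r in seen:
                continue
            seen.add(r)
            # Either r is free, or its current partner can be re-matched elsewhere.
            if r not in match_of or try_augment(match_of[r], seen):
                match_of[r] = u  # flip the edges along the path
                return True
        return False

    for u in neighbors:
        try_augment(u, set())
    return match_of


example = {"a": ["w", "x"], "b": ["w"], "c": ["x", "y"], "d": ["y", "z"]}
print(maximum_bipartite_matching(example))
```

Each successful call to the inner search increases the matching size by one, and the loop ends once no left vertex has an augmenting path.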

🌉 Bipartite graphs and saturated matchings

🌉 Saturated matching

In a bipartite graph with left set L and right set R where |L| ≤ |R|, a saturated matching is one that matches every element of L with some element of R.

A saturated matching is necessarily a maximum matching (because you cannot match more than |L| vertices from L).
Example: Four students and multiple dorms—if each student is matched to a distinct dorm, that is a saturated matching.

🏘️ Neighborhood of a set

The neighborhood N_G(S) of a subset S ⊆ V is the set of all vertices v such that v is connected by an edge to some element of S.

Example: If S = {v} is a single vertex, then N_G(S) is the set of vertices adjacent to v, and |N_G(S)| is the degree of v.
This concept is central to Hall's Marriage Theorem.

💍 Hall's Marriage Theorem

Hall's Marriage Theorem: Let G be a bipartite graph with left set L, right set R, and |L| ≤ |R|. A saturated matching exists if and only if, for every subset X ⊆ L, |X| ≤ |N_G(X)|.

Intuition (necessity): If some subset X of the left side has a neighborhood smaller than X, then by the pigeonhole principle you cannot match all of X to distinct vertices on the right—so the condition is necessary.
Sufficiency: The interesting part is that this condition is also sufficient (the proof is complicated and not given in full).
How to use it: To check if a saturated matching exists, verify that every subset X of L has at least |X| neighbors in R.
Example: In the student-dorm problem, if students a, b, d together have only two dorm choices (Parmelee and Braiden), then |X| = 3 but |N_G(X)| = 2, violating the condition—so no saturated matching exists.

🧪 Applying Hall's Theorem

Example 1 (violation): Students a, b, d have only two dorm choices between them, so |{a, b, d}| = 3 > 2 = |N_G({a, b, d})|—no saturated matching.
Example 2 (satisfied): After students change preferences so each checks two dorms, you can verify that every subset X has |X| ≤ |N_G(X)|, so a saturated matching exists (e.g., a → Academic Village, b → Braiden, c → Corbett, d → Parmelee).
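
Hall's condition can be checked directly by trying every subset X of L. The sketch below does exactly that, so it takes exponential time and is only practical for small examples. The preference lists are hypothetical: they are chosen to be consistent with the two examples above, not copied from the excerpt.

```python
from itertools import combinations


def hall_violation(neighbors):
    """Return a subset X of the left side with |X| > |N(X)|, or None if none exists."""
    left = list(neighbors)
    for size in range(1, len(left) + 1):
        for X in combinations(left, size):
            neighborhood = set().union(*(neighbors[x] for x in X))
            if len(X) > len(neighborhood):
                return set(X)
    return None


# Hypothetical preferences in which a, b, d share only two dorms.
before = {
    "a": {"Parmelee", "Braiden"},
    "b": {"Parmelee", "Braiden"},
    "c": {"Corbett"},
    "d": {"Parmelee", "Braiden"},
}
# Hypothetical preferences in which each student checks two dorms.
after = {
    "a": {"Academic Village", "Braiden"},
    "b": {"Braiden", "Parmelee"},
    "c": {"Corbett", "Academic Village"},
    "d": {"Parmelee", "Corbett"},
}
print(hall_violation(before))  # the violating set {a, b, d}
print(hall_violation(after))   # None
```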

🎯 Key distinctions and common confusions

🎯 Maximal vs maximum

Term	Meaning	Relationship
Maximal	Cannot add more edges without violating the matching property	Locally "stuck"
Maximum	Has the largest possible number of edges globally	Globally optimal
Relationship	Every maximum matching is maximal, but not every maximal matching is maximum	A maximal matching may be suboptimal

Don't confuse: "Maximal" does not mean "best"—it only means you cannot improve it by adding one more edge locally.
Example: In P₄, a one-edge matching in the middle is maximal but not maximum; a two-edge matching is both.

🎯 Perfect vs maximum

A perfect matching is always maximum (because it matches all vertices, which is the best you can do).
A maximum matching is not always perfect (e.g., K₅ has a maximum matching of size 2, but no perfect matching because 5 is odd).

🎯 Saturated vs perfect

In a bipartite graph with |L| ≤ |R|, a saturated matching matches all of L but not necessarily all of R.
A perfect matching matches all vertices on both sides, so it requires |L| = |R| and is a special case of saturated.

Ramsey Theory

10.4 Ramsey theory

🧭 Overview

🧠 One-sentence thesis

Ramsey theory proves that in any sufficiently large complete graph with edges colored in a fixed number of colors, there must exist a monochromatic clique of a guaranteed minimum size.

📌 Key points (3–5)

Core idea: In a group of 6 people, either some 3 all know each other or some 3 all do not know each other—this is modeled as finding monochromatic cliques in edge-colored graphs.
Ramsey numbers R(m₁, m₂, ..., mₖ): the smallest n such that any k-coloring of the edges of Kₙ must contain a monochromatic mᵢ-clique of color cᵢ for at least one i.
Key result: Ramsey's theorem guarantees that every Ramsey number is finite, but computing the exact values is an unsolved problem in general.
Common confusion: The property depends on any coloring—you cannot avoid the monochromatic clique by clever coloring once n is large enough.
Why it matters: Ramsey theory is a major area of graph optimization and coloring problems, showing that complete disorder is impossible at sufficient scale.

🎨 Modeling the problem with colored graphs

🎨 From people to graphs

The motivating example: 6 people in a room, some know each other and some do not.
Model this as a graph on 6 vertices (one per person), with edges colored in two ways:
- Solid edge = the two people know each other.
- Dashed edge = the two people do not know each other.
The question becomes: does this graph contain a monochromatic 3-clique (three vertices all connected by edges of the same color)?

🔷 Key definitions

m-clique: A subgraph isomorphic to Kₘ (a complete graph on m vertices).

Monochromatic m-clique: An m-clique in which all edges are the same color.

Example: In the 6-person graph, vertices 3, 4, 6 form a monochromatic 3-clique with all dashed edges (they all mutually do not know each other).
Don't confuse: A clique is about the structure (complete subgraph); monochromatic adds the constraint that all edges share one color.

🧮 The classical result: R(3, 3) = 6

🧮 Proof that K₆ always contains a monochromatic 3-clique

The excerpt provides a pigeonhole-based proof:

Pick any vertex x in K₆.
x is connected to 5 other vertices by edges that are either solid or dashed.
By the pigeonhole principle, at least 3 of these 5 edges share the same color. Assume (by symmetry) they are solid, connecting x to vertices y, z, w.
Now examine the triangle formed by y, z, w:
- If any edge among y, z, w is solid, it forms a solid triangle with x → monochromatic solid 3-clique.
- If all three edges among y, z, w are dashed, then y, z, w form a monochromatic dashed 3-clique.
In all cases, a monochromatic 3-clique exists.

🔍 Why 6 is the minimum

The excerpt notes that 6 is the smallest number for which this is guaranteed.
It is possible to color the edges of K₅ in two colors without creating any monochromatic 3-clique (see Exercise 1).
Therefore R(3, 3) = 6.
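
Both halves of this claim are small enough to confirm by exhaustive search: K₅ has 2¹⁰ two-colorings and K₆ has 2¹⁵. The sketch below is a brute-force check, not a proof technique from the excerpt, and the function names are made up for illustration.

```python
from itertools import combinations, product


def has_mono_triangle(n, color):
    """color maps each edge (i, j) with i < j to 0 or 1."""
    return any(
        color[(a, b)] == color[(a, c)] == color[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )


def every_coloring_has_mono_triangle(n):
    """Check all 2-colorings of the edges of the complete graph on n vertices."""
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(edges, bits)))
        for bits in product((0, 1), repeat=len(edges))
    )


print(every_coloring_has_mono_triangle(5))  # False
print(every_coloring_has_mono_triangle(6))  # True
```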

📐 Ramsey numbers: general definition

📐 Two-color Ramsey numbers R(m₁, m₂)

Ramsey number R(m₁, m₂): The smallest number n such that any coloring of the edges of Kₙ using 2 colors c₁ and c₂ must have either:

A monochromatic m₁-clique of color c₁, or

A monochromatic m₂-clique of color c₂.

Example: R(3, 3) = 6 means any 2-coloring of K₆ contains a monochromatic 3-clique (of one color or the other).
Example: R(2, 2) = 2 because a 2-clique is just an edge, so you only need one edge of any color; thus n = 2 is the smallest.

🌈 Multi-color Ramsey numbers R(m₁, m₂, ..., mₖ)

Ramsey number R(m₁, m₂, ..., mₖ): The smallest number n such that any coloring of the edges of Kₙ using k colors c₁, c₂, ..., cₖ must have one of the following:

A monochromatic m₁-clique of color c₁, or

A monochromatic m₂-clique of color c₂, or

...

A monochromatic mₖ-clique of color cₖ.

Example: R(3, 3, 2) is the smallest n such that any 3-coloring (red, green, blue) of Kₙ contains either a red 3-clique, a green 3-clique, or a blue 2-clique.
The excerpt shows R(3, 3, 2) = 6:
- If there is any blue edge, that is a blue 2-clique.
- If there are no blue edges, the graph is colored only red and green, and since R(3, 3) = 6, there must be a red or green 3-clique.
- Since K₅ can be colored red and green without a monochromatic 3-clique (and no blue edges), R(3, 3, 2) > 5.
- Therefore R(3, 3, 2) = 6.

🔬 Ramsey's theorem and open problems

🔬 Ramsey's theorem

Ramsey's theorem: The Ramsey number R(m₁, m₂, ..., mₖ) is always finite.

This means: there always exists some sufficiently large n for which the statement is true.
In other words, you cannot avoid monochromatic cliques indefinitely by making the graph larger—eventually, structure (a monochromatic clique) must emerge.

❓ The unsolved problem

What is the exact value of R(m₁, m₂, ..., mₖ)?
The study of computing these numbers is called Ramsey theory.
It is one of a large class of optimization problems in graph theory involving coloring graphs.
The excerpt notes that this is "in general an unsolved problem."

📊 Known and unknown values

Ramsey number	Value or status (from excerpt)
R(2, 2)	2 (trivial: just need one edge)
R(3, 3)	6 (proven in the excerpt)
R(3, 4)	9 (mentioned in exercises)
R(4, 4)	Students are asked to look this up online
R(5, 5)	Unknown; students are asked to investigate what is known
R(3, 3, 3)	17 (stated as known)

Don't confuse: Knowing that a Ramsey number is finite (Ramsey's theorem) vs. knowing its exact value (often unknown).

🧩 Exercises and further exploration

🧩 Simple Ramsey numbers

R(n) = n for all positive integers n (Exercise 4).
R(2, n) = n for all positive integers n (Exercise 5).
R(a, b, 2) = R(a, b) for any positive integers a and b (Exercise 7): adding a 2-clique condition (just an edge) does not change the threshold.
R(a, b) = R(b, a) for any positive integers a and b (Exercise 8): symmetry in the two colors.

🧩 Constructing counterexamples

Exercise 1: Color the edges of K₅ in two colors to avoid a monochromatic 3-clique (showing R(3, 3) > 5).
Exercise 12: Investigate R(3, 3, 3) = 17 by constructing 3-colorings of K₆, K₇, etc., that avoid monochromatic 3-cliques, showing R(3, 3, 3) > 6, R(3, 3, 3) > 7, and so on up to R(3, 3, 3) > 16.

Planar graphs

11.1 Planar graphs

🧭 Overview

🧠 One-sentence thesis

A graph is planar if it can be drawn on paper with no edges crossing; a particular drawing may contain crossings even when the graph itself is planar, so what matters is whether some crossing-free drawing exists.

📌 Key points (3–5)

What planar means: a connected graph drawn in the plane so that its edges do not cross.
Drawing-dependent property: the same graph can be drawn with or without crossings; a graph is planar if some drawing exists without crossings.
Practical importance: minimizing edge crossings matters in computer chip design (fewer wire crossings).
Common confusion: "planar graph" refers to both (1) a specific drawing with no crossings and (2) any graph that can be drawn without crossings—context determines which meaning applies.
Straight-edge theorem: Fáry's Theorem guarantees that any planar graph can be drawn with all edges straight.

🎨 What makes a graph planar

🎨 Definition and visual intuition

Planar graph: a connected graph drawn in the plane so that its edges do not cross.

The key criterion is no edge crossings in the drawing.
A graph is a set of vertices and edges; there are many ways to represent it (incidence matrix, edge list, or visual drawing).
This section focuses on the visual representation: drawing vertices as points and edges as lines or curves.
Example: K₄ (the complete graph on 4 vertices) can be drawn with crossings or without crossings; the second drawing is planar.

🔄 Drawing-dependent vs graph-level property

The excerpt distinguishes two uses of "planar":
- A specific drawing is a planar graph if edges don't cross in that drawing.
- A connected graph (as an abstract structure) is planar if there exists some way to draw it without crossings.
The same graph may have both planar and non-planar drawings.
Don't confuse: seeing one drawing with crossings does not prove the graph is non-planar; you must check whether any crossing-free drawing exists.

🧪 Examples of planar graphs

🧪 Example 1: Five-vertex graph (A, B, C, D, E)

Question: Is the graph planar?
Answer: Yes. The excerpt shows two different planar drawings (rearranging vertices A, B, C, D, E so edges don't cross).
Lesson: even if the first drawing looks tangled, redrawing can reveal a planar structure.

🧪 Example 2: Adding an edge (CD)

Question: Is the graph still planar after adding edge CD?
Answer: Yes. The excerpt notes "Adding another edge, CD, only makes this harder" but still provides two planar drawings.
Lesson: more edges increase the difficulty of finding a planar drawing, but do not automatically make the graph non-planar.

🔧 Applications and special properties

🔧 Computer chip design

Minimizing edge crossings reduces the number of wire crossings in electrical circuits.
This translates the engineering problem into a graph theory question: how to draw a graph with the fewest crossings.
The excerpt mentions that Section 11.2 will study properties of planar graphs to help decide whether a graph is planar.

📐 Fáry's Theorem

Fáry's Theorem: if a graph is planar, then it can be drawn as a planar graph in which every edge is straight.

This means curves are not necessary; straight-line segments suffice.
The excerpt references an interactive game (www.jasondavies.com/planarity) where you drag vertices to find a planar embedding with straight edges.

❓ Open question: integer-length edges

It is not known whether every planar graph can be drawn with all edges straight and every edge length equal to an integer.
This highlights an unsolved problem in the field.

📊 Euler's formula preview

📊 Vertices, edges, and faces

Faces: the regions into which the edges divide the plane, including the unbounded region(s) touching the edge of the paper.

Let v = number of vertices, e = number of edges, f = number of faces.
The excerpt provides three examples with their counts:
- 6 vertices, 7 edges, 3 faces
- 7 vertices, 9 edges, 4 faces
- 7 vertices, 10 edges, 5 faces

📊 Euler characteristic

Euler characteristic: χ = v − e + f

This quantity connects the three counts in a planar graph.
The excerpt introduces the definition and mentions that Euler's formula (Section 11.2) will give a "remarkable connection" between edges, vertices, and faces.
Example: for K₄, v = 4 (the excerpt cuts off here, but the formula will be explored in the next section).

11.2 Euler’s formula for planar graphs

🧭 Overview

🧠 One-sentence thesis

Euler's formula establishes that for every connected planar graph, the relationship v − e + f always equals 2, where v is the number of vertices, e is the number of edges, and f is the number of faces.

📌 Key points (3–5)

What the formula states: For any connected planar graph, v − e + f = 2 (the Euler characteristic χ equals 2).
What faces are: The edges of a planar graph divide the plane into regions called faces, including the unbounded outer region.
How the proof works: By induction on the number of faces, starting from trees (f = 1) and removing edges from cycles to reduce face count.
Common confusion: Don't forget to count the outer unbounded region as a face—it always counts toward f.
Why it matters: Euler's formula provides a fundamental constraint that helps determine whether certain graphs can be drawn without edge crossings.

📐 Core definitions and counting

📐 Vertices, edges, and faces

In a planar graph, let v be the number of vertices and let e be the number of edges. The edges divide the plane into regions, which are called faces. Let f be the number of faces, including the face(s) which touch the edge of the paper.

Vertices (v): The points or nodes in the graph.
Edges (e): The lines connecting vertices.
Faces (f): The regions created when edges divide the plane.
Important: Always include the outer unbounded region as one of the faces.

Example: A graph drawn as a triangle has v = 3, e = 3, and f = 2 (the inside region and the outside region).

🔢 The Euler characteristic

Define χ = v − e + f; we call this quantity the Euler characteristic of the graph.

This is simply the formula: (number of vertices) minus (number of edges) plus (number of faces).
The remarkable discovery is that this value is always 2 for connected planar graphs.

🧮 Examples demonstrating the formula

🧮 Complete graph K₄

For the planar graph K₄:

v = 4 vertices
e = 6 edges
f = 4 faces
χ = 4 − 6 + 4 = 2

🔄 Cycle graph Cₙ

For the graph Cₙ, drawn as a regular n-gon in the plane:

v = n vertices
e = n edges
f = 2 faces (the inside and the outside regions)
χ = n − n + 2 = 2

🛤️ Path graph Pₙ

For the graph Pₙ, drawn as a path in the plane:

v = n vertices
e = n − 1 edges
f = 1 face (the path makes a slit in the one face)
χ = n − (n − 1) + 1 = 2

🔗 Complete bipartite graph B₂,₃

When drawn as a planar graph:

v = 5 vertices
e = 6 edges
f = 3 faces
χ = 5 − 6 + 3 = 2

Don't confuse: Different crossing-free drawings of the same connected graph may look different, but their counts v, e, and f always satisfy Euler's formula.
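The four examples above can be checked mechanically. The following is a minimal sketch; the function names are illustrative and not taken from the excerpt, and the counts are the ones quoted above.

```python
def euler_characteristic(v, e, f):
    """Return chi = v - e + f for a drawing with the given counts."""
    return v - e + f

def cycle_counts(n):
    """(v, e, f) for C_n drawn as an n-gon: the inside face plus the outside face."""
    return n, n, 2

def path_counts(n):
    """(v, e, f) for P_n drawn as a path: only the outer face."""
    return n, n - 1, 1

K4_COUNTS = (4, 6, 4)    # counts quoted above for K4
B23_COUNTS = (5, 6, 3)   # counts quoted above for B_{2,3}

results = [euler_characteristic(*K4_COUNTS), euler_characteristic(*B23_COUNTS)]
results += [euler_characteristic(*cycle_counts(n)) for n in range(3, 10)]
results += [euler_characteristic(*path_counts(n)) for n in range(1, 10)]
print(set(results))  # {2}
```

Checking a range of n is of course not a proof; it only illustrates the pattern that the theorem below establishes in general.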

🎯 Euler's formula theorem

🎯 The main theorem

Theorem (Euler's formula): In a connected planar graph, let v be the number of vertices, let e be the number of edges, and let f be the number of faces. Then χ = v − e + f = 2.

This holds for every connected planar graph, not just specific examples.
The formula applies to graphs that allow loop edges (from a vertex to itself) and multiple edges between vertices.

🔧 Proof machinery

🔧 Key lemma about cycles

Lemma: Let G be a connected graph which is not a tree. If e is an edge that belongs to a cycle in G, then the subgraph G′ = (G with edge e removed) is still connected.

Why this matters for the proof:

Removing an edge from a cycle doesn't disconnect the graph.
Any walk that used the removed edge can "go the other way around the cycle" instead.
This allows us to systematically reduce the number of faces while maintaining connectivity.

How it works:

If a walk from vertex u to vertex w doesn't use edge e, it still exists in G′.
If a walk does use edge e, there's a second walk that goes around the cycle the other way, avoiding e.
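The lemma can be illustrated on a small graph. The sketch below uses a hypothetical example (a 4-cycle with one extra pendant vertex) and a breadth-first search to test connectivity; none of these names come from the excerpt.

```python
from collections import deque

def is_connected(vertices, edges):
    """Breadth-first search from one vertex; True if every vertex is reached."""
    vertices = list(vertices)
    if not vertices:
        return True
    adj = {x: set() for x in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen) == len(vertices)

# Hypothetical example: the cycle 0-1-2-3-0 with a pendant vertex 4 attached to 0.
vertices = range(5)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4)]

cycle_edge = (1, 2)   # lies on the cycle
pendant_edge = (0, 4) # lies on no cycle

without_cycle_edge = [ed for ed in edges if ed != cycle_edge]
without_pendant_edge = [ed for ed in edges if ed != pendant_edge]

print(is_connected(vertices, without_cycle_edge))
print(is_connected(vertices, without_pendant_edge))
```

Removing the cycle edge leaves the graph connected, as the lemma predicts, while removing the pendant edge (which is on no cycle) isolates vertex 4.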

🧩 Proof by induction on faces

Base case (f = 1):

If f = 1, the graph has no cycles; since it is also connected, it is a tree.
A tree with v vertices has v − 1 edges.
Therefore χ = v − (v − 1) + 1 = 2.

Inductive step (f ≥ 2):

Assume Euler's formula is true for every connected planar graph with fewer than f faces.
Since f ≥ 2, the graph G is not a tree, so it has a cycle.
Pick an edge in that cycle and remove it to get G′. By the lemma above, G′ is still connected, so the inductive hypothesis applies to it.
In G′: v′ = v (same vertices), e′ = e − 1 (one fewer edge), f′ = f − 1 (one fewer face).
By the inductive hypothesis: 2 = v′ − e′ + f′ = v − (e − 1) + (f − 1) = v − e + f.

Visualization:

Think of the removed edge as a dam separating two regions.
Once removed, the two regions merge into one, reducing the face count by 1.
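The induction can be read in reverse as a way to count faces: start from a spanning tree (one face) and add back the remaining edges, each of which closes a cycle and splits one face into two. The sketch below follows that idea with a small union-find structure. It assumes the input is a connected planar graph on vertices 0..n−1; the function name is illustrative only.

```python
def count_faces(num_vertices, edges):
    """Faces of a connected planar graph, counted as in the induction:
    a spanning tree has 1 face, and each further edge adds one face."""
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    faces = 1
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            faces += 1        # endpoints already joined: the edge closes a cycle
        else:
            parent[ra] = rb   # tree edge: joins two pieces, no new face
    return faces

k4_edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
f = count_faces(4, k4_edges)
print(f, 4 - len(k4_edges) + f)  # 4 2
```

The function never looks at a drawing, which reflects the fact that for a connected planar graph the face count is forced by v and e.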

🚫 Identifying non-planar graphs

🚫 Edge-face inequality

Lemma: If G is a planar graph, then e ≥ 3f/2.

Why this is true:

Let E be the sum of the number of edges on every face.
Every edge borders exactly two faces, so 2e = E.
Since the graph has no loops or multi-edges (and assuming it is connected with at least 3 vertices), each face has at least 3 edges.
Therefore E ≥ 3f, which means 2e ≥ 3f, or e ≥ 3f/2.

Example: A planar graph with faces bounded by 3, 4, and 5 edges has e = (3 + 4 + 5)/2 = 6, which is ≥ (3 + 3 + 3)/2 = 9/2.

🔍 Proving K₅ is not planar

Theorem: The complete graph K₅ on 5 vertices is not planar.

Proof strategy (by contradiction):

Suppose K₅ were planar.
Then it would have to satisfy both Euler's formula (v − e + f = 2) and the edge-face inequality (e ≥ 3f/2).
For K₅: v = 5 and e = 10 (every pair of 5 vertices is connected).
From Euler's formula: f = 2 − v + e = 2 − 5 + 10 = 7.
From the edge-face inequality: 10 ≥ 3(7)/2 = 10.5.
This is a contradiction (10 is not ≥ 10.5), so K₅ cannot be planar.

11.3 Graphs that are not planar

🧭 Overview

🧠 One-sentence thesis

The complete graph K₅ and the complete bipartite graph B₃,₃ are not planar, and any graph that contains either of them as a subgraph or as a minor is also non-planar.

📌 Key points (3–5)

Core inequality for planar graphs: Every planar graph satisfies e ≥ 3f/2 (twice the number of edges is at least three times the number of faces).
K₅ is not planar: Using Euler's formula and the inequality above leads to a contradiction (10 ≥ 10.5), proving K₅ cannot be drawn without crossings.
Minors generalize subgraphs: A minor is obtained by contracting edges, deleting edges, and removing isolated vertices; if a graph contains K₅ or B₃,₃ as a minor, it is not planar.
Common confusion: The inequality e ≥ 3f/2 comes from the fact that every edge borders two faces and every face has at least 3 edges (no multi-edges).
Kuratowski's Theorem: A graph is planar if and only if it does not contain K₅ or B₃,₃ (also written K₃,₃) as a minor (though the excerpt notes this theorem is for general knowledge and not always required for determining planarity).

🔢 The fundamental inequality for planar graphs

🔢 Lemma: e ≥ 3f/2

Lemma 11.3.1: If G is a planar graph, then e ≥ 3f/2.

Why this inequality holds:

Let E be the sum of the number of edges on every face.
Every edge borders exactly two faces, so 2e = E.
The graph has no loops or multi-edges (and is assumed connected with at least 3 vertices), so each face has at least 3 edges.
Therefore E ≥ 3f.
Combining: 2e ≥ 3f, which gives e ≥ 3f/2.

Don't confuse:

E (sum of edges counted per face) vs e (total number of edges in the graph).
The factor of 2 comes from double-counting: each edge is shared by two faces.

📐 Examples of the inequality

Example 11.3.2:

A planar graph with faces having 3, 4, and 5 edges.
e = (3 + 4 + 5)/2 ≥ (3 + 3 + 3)/2 = 3f/2.

Example 11.3.3:

A planar graph with faces having 3, 3, 5, and 7 edges.
e = (3 + 3 + 5 + 7)/2 ≥ (3 + 3 + 3 + 3)/2 = 3f/2.
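Both examples follow the same arithmetic, which can be sketched as a small helper. The function name is hypothetical; the face sizes are the ones given in Examples 11.3.2 and 11.3.3.

```python
def edges_from_face_sizes(face_sizes):
    """Recover e from the number of edges around each face, using 2e = E."""
    total = sum(face_sizes)  # this is E, which counts every edge twice
    if total % 2 != 0:
        raise ValueError("face sizes must sum to an even number")
    return total // 2

for face_sizes in ([3, 4, 5], [3, 3, 5, 7]):
    e = edges_from_face_sizes(face_sizes)
    f = len(face_sizes)
    # Compare 2e with 3f to avoid fractions; this is the same as e >= 3f/2.
    print(face_sizes, e, f, 2 * e >= 3 * f)
```

Working with 2e ≥ 3f instead of e ≥ 3f/2 keeps everything in integers.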

🚫 Proving K₅ is not planar

🚫 The contradiction argument

Theorem 11.3.4: The complete graph K₅ on 5 vertices is not planar.

Proof by contradiction:

Assume K₅ can be drawn as a planar graph.
Count vertices and edges: v = 5 and e = (5 choose 2) = 10.
Apply Euler's formula: v − e + f = 2, so 5 − 10 + f = 2, which gives f = 7.
Apply the inequality: e ≥ 3f/2, so 10 ≥ (3 × 7)/2 = 10.5.
Contradiction: 10 ≥ 10.5 is false.
Conclusion: K₅ cannot be planar.
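The same computation can be run for other complete graphs. The sketch below assumes a connected planar drawing exists, solves Euler's formula for f, and then tests the inequality; the function name is illustrative.

```python
from math import comb

def passes_edge_face_test(v, e):
    """Assume a connected planar drawing exists, so f = 2 - v + e by Euler's
    formula, then test 2e >= 3f (the inequality e >= 3f/2 without fractions)."""
    f = 2 - v + e
    return 2 * e >= 3 * f

for n in range(3, 7):
    v, e = n, comb(n, 2)  # K_n has n vertices and (n choose 2) edges
    print(f"K_{n}: e = {e}, passes = {passes_edge_face_test(v, e)}")
```

A failed test proves the graph is not planar. A passed test proves nothing by itself: the inequality is only a necessary condition, so a non-planar graph can still satisfy it.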

🔗 Consequences for other graphs

Any graph that contains K₅ as a subgraph is not planar.
Similarly, B₃,₃ (the complete bipartite graph with 3 vertices in each part) is not planar (shown in exercises).
Any graph containing B₃,₃ as a subgraph is also not planar.

Example application:

The "three houses and three utilities" problem asks whether you can connect three houses to water, electricity, and gas without crossing lines—this is equivalent to asking whether B₃,₃ is planar, and the answer is no.

🔧 Minors and Kuratowski's Theorem

🔧 What is an edge contraction

Definition 11.3.5: An edge contraction is a way to make a new graph by removing an edge from G and merging the two vertices that were at its endpoints.

How it works:

Take an edge connecting two vertices.
Remove the edge.
Merge the two endpoints into a single vertex.
All edges that were connected to either endpoint are now connected to the merged vertex.
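The four steps above can be sketched in code. This version makes a simplifying assumption that is not in the definition: the graph is stored as a set of two-element frozensets, so any parallel edges created by the merge collapse into a single edge.

```python
def contract_edge(edges, u, w):
    """Contract the edge {u, w}: remove it and merge w into u.
    Assumes a simple graph stored as a set of 2-element frozensets."""
    if frozenset((u, w)) not in edges:
        raise ValueError("u and w are not adjacent")
    new_edges = set()
    for edge in edges:
        a, b = tuple(edge)
        a = u if a == w else a   # redirect edges that ended at w
        b = u if b == w else b
        if a != b:               # this drops the contracted edge itself
            new_edges.add(frozenset((a, b)))
    return new_edges

# Hypothetical example: contracting one side of a square leaves a triangle.
square = {frozenset(p) for p in [(0, 1), (1, 2), (2, 3), (3, 0)]}
triangle = contract_edge(square, 0, 1)
print(sorted(sorted(edge) for edge in triangle))  # [[0, 2], [0, 3], [2, 3]]
```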

🔧 What is a minor

Definition 11.3.5: A graph H is a minor of G if H can be obtained from G by contracting some edges, deleting some edges, and deleting some isolated vertices.

Why minors matter:

Minors generalize the idea of subgraphs.
They allow for a more flexible way to "find" K₅ or B₃,₃ hidden inside a graph.
Some texts use "subdivisions" instead of minors to express similar ideas.

🏆 Kuratowski's Theorem

Theorem 11.3.6 (Kuratowski's Theorem): A graph G is planar if and only if it does not contain K₅ or K₃,₃ as a minor.

What this means:

Every non-planar graph is non-planar because either K₅ or B₃,₃ (also written K₃,₃) is hidden in it as a minor.
This theorem provides a complete characterization of planar graphs.

Note from the excerpt:

The excerpt includes this theorem "just for your general knowledge."
Students are often asked to determine planarity without using Kuratowski's Theorem.

🌟 Example: The Petersen graph

The Petersen graph is not planar because it has K₅ as a minor.
You can contract all five edges connecting the "inner star" to the "outer pentagon" to form a copy of K₅.
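This contraction can be verified directly. The sketch below uses one common labeling of the Petersen graph, which is an assumption here: outer pentagon 0 to 4, inner star 5 to 9, and spokes joining i to 5 + i.

```python
from itertools import combinations

outer = [(i, (i + 1) % 5) for i in range(5)]          # outer pentagon
inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]  # inner star (pentagram)
spokes = [(i, 5 + i) for i in range(5)]               # pentagon-to-star edges
petersen = outer + inner + spokes

# Contracting the spoke {i, 5 + i} identifies vertex 5 + i with vertex i.
merged = {x: x % 5 for x in range(10)}

minor = {
    frozenset((merged[a], merged[b]))
    for a, b in petersen
    if merged[a] != merged[b]  # the contracted spokes disappear
}
k5 = {frozenset(pair) for pair in combinations(range(5), 2)}
print(minor == k5)  # True
```

After the contraction, the pentagon supplies the pairs that differ by 1 modulo 5 and the star supplies the pairs that differ by 2, which together are all ten edges of K₅.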

📊 Special case: Bipartite planar graphs

📊 Stronger inequality for bipartite graphs

Exercise 3 in the excerpt:

If G is a bipartite planar graph, then e ≥ 2f (a stronger inequality than e ≥ 3f/2).

Why this is stronger:

In bipartite graphs, every cycle has even length (at least 4).
Therefore, every face has at least 4 edges (not just 3).
This leads to 2e ≥ 4f, or e ≥ 2f.

Application to B₃,₃:

Exercise 4 asks to prove B₃,₃ is not planar without using Kuratowski's Theorem.
B₃,₃ is bipartite, so you can use the stronger inequality e ≥ 2f along with Euler's formula to derive a contradiction.

11.4 Euler’s formula for polyhedra

🧭 Overview

🧠 One-sentence thesis

Euler's formula states that for any convex polyhedron, the Euler characteristic (vertices minus edges plus faces) always equals 2, which can be proven by flattening the polyhedron into a planar graph.

📌 Key points (3–5)

What the formula says: For any convex polyhedron, v − e + f = 2 (vertices minus edges plus faces equals 2).
How to prove it: Remove one face, flatten the polyhedron into a planar graph (Schlegel diagram), then apply the planar graph Euler characteristic result.
Examples: All standard polyhedra (Platonic solids, prisms, sports balls) satisfy v − e + f = 2.
Duality: Each polyhedron has a dual where vertices become faces and faces become vertices; dual pairs share the same number of edges but swap vertex and face counts.
Common confusion: The formula applies to convex polyhedra in three dimensions, not arbitrary graphs or surfaces (the excerpt later mentions the torus has a different Euler characteristic).

🔷 What is a polyhedron

🔷 Definition and restrictions

A polyhedron is a three-dimensional shape whose faces are flat polygons and whose edges are straight line segments intersecting at vertices.

The excerpt restricts attention to convex polyhedra in three-dimensional space.
Convex means the polyhedron is an intersection of half-spaces or the convex hull of its vertices.
Example: prisms (triangular, pentagonal) and Platonic solids are all convex polyhedra.

🌟 Platonic solids

These are the only convex regular polyhedra: all faces are the same shape, all edges the same length, all interior angles the same.
There are exactly five:

Platonic solid	Vertices (v)	Edges (e)	Faces (f)	v − e + f
Tetrahedron	4	6	4	2
Cube	8	12	6	2
Octahedron	6	12	8	2
Dodecahedron	20	30	12	2
Icosahedron	12	30	20	2

🏐 Everyday examples

The excerpt lists sports balls as polyhedra (with "puffed out" sides):

Object	Vertices	Edges	Faces	v − e + f
Soccer ball	60	90	32 (12 pentagons, 20 hexagons)	2
Basketball	6	12	8	2
Football	2	4	4	2
Volleyball	32	48	18	2

All satisfy the Euler characteristic χ = v − e + f = 2.
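The rows of the two tables can be checked in a few lines. The counts below are copied from the tables; the dictionary name is illustrative.

```python
# (v, e, f) for each row of the two tables above
POLYHEDRA = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
    "soccer ball": (60, 90, 12 + 20),  # 12 pentagons and 20 hexagons
    "basketball": (6, 12, 8),
    "football": (2, 4, 4),
    "volleyball": (32, 48, 18),
}

for name, (v, e, f) in POLYHEDRA.items():
    print(f"{name}: v - e + f = {v - e + f}")
```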

🧮 Euler's formula and its proof

🧮 The formula

Theorem (Euler): In any convex polyhedron we have χ = v − e + f = 2, where v is the number of vertices, e is the number of edges, and f is the number of faces.

This is the Euler characteristic for convex polyhedra.
The formula holds for all the examples above (prisms, Platonic solids, sports balls).

🛠️ How the proof works

The proof connects polyhedra to planar graphs:

Remove one face from the polyhedron.
Stretch and flatten the remaining structure to lie flat in the plane.
This produces a planar graph where:
- The outside face of the planar map corresponds to the removed face.
- The planar graph has the same values for v, e, and f as the original polyhedron.
Apply the planar graph result: For any connected planar graph, χ = v − e + f = 2 (by Theorem 11.2.7, referenced in the excerpt).
Therefore, the same is true for the polyhedron.

Don't confuse: The formula relies on the polyhedron being convex and the flattening step producing a valid planar graph.

📐 Schlegel diagram

The planar graph obtained by removing one face and flattening is called a Schlegel diagram of the polyhedron.

The excerpt suggests a physical demonstration: hold a flashlight near the front face of a polyhedron made of sticks and balls; the projection of the edges onto a board gives the Schlegel diagram.
Example: A cube can be flattened into a planar graph with one square face on the outside and the other faces nested inside.

🔄 Duality of polyhedra

🔄 What is a dual polyhedron

Each polyhedron P has a dual P^D. Informally, P^D is found by replacing each vertex with a face and replacing each face with a vertex.

Two vertices of P^D are connected by an edge if and only if the corresponding faces of P share an edge.
The dual operation swaps the roles of vertices and faces but preserves edges.

🔄 Dual pairs among Platonic solids

The excerpt lists the dual relationships:

Polyhedron	Dual	Evidence
Cube	Octahedron	Same number of edges (12); vertices and faces flipped (8↔6)
Dodecahedron	Icosahedron	Same number of edges (30); vertices and faces flipped (20↔12)
Tetrahedron	Tetrahedron (self-dual)	Number of vertices equals number of faces (4)

Example: The cube has 8 vertices and 6 faces; the octahedron has 6 vertices and 8 faces; both have 12 edges.
Don't confuse: Duality is a structural relationship, not just counting; the dual polyhedron has a specific geometric construction.
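The counting evidence in the table can be reproduced with a short sketch. As the note above says, matching counts are only evidence of duality, not a construction of the dual; the helper names are illustrative.

```python
# (v, e, f) for the five Platonic solids
SOLIDS = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}

def dual_counts(v, e, f):
    """Counts of the dual: vertices and faces trade places, edges stay."""
    return f, e, v

def find_dual(name):
    """Name of the solid whose counts match the dual of `name`."""
    target = dual_counts(*SOLIDS[name])
    return next(other for other, counts in SOLIDS.items() if counts == target)

for name in SOLIDS:
    print(name, "<->", find_dual(name))
```

Swapping v and f leaves v − e + f unchanged, so the dual automatically satisfies Euler's formula as well.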

🌍 Euler characteristic on other surfaces

🌍 Beyond the plane and sphere

The excerpt notes that for connected graphs on the plane or sphere, χ = v − e + f = 2.
This was used to show that certain graphs (K₅ and K₃,₃) are not planar.
The Euler characteristic χ is an important concept in topology.

🍩 Graphs on the torus

The excerpt introduces the torus (surface of a doughnut) as an example of a different surface.
One way to make a torus: take a piece of paper, tape top to bottom (forming a cylinder), then tape left side to right.
Another way: a "pacman" board where each point on the right edge is identified with the corresponding point on the left edge.
The excerpt hints that graphs on the torus have a different Euler characteristic (not equal to 2), but does not provide the value in the given text.

11.5 Investigation: Graphs on other surfaces

🧭 Overview

🧠 One-sentence thesis

The Euler characteristic χ = v − e + f is a topological invariant that equals 2 for graphs on the plane or sphere but equals 0 for graphs on the torus, revealing that the characteristic depends only on the surface, not on the specific graph drawn on it.

📌 Key points (3–5)

What the Euler characteristic is: the quantity χ = v − e + f (vertices minus edges plus faces) for a graph subdividing a surface.
Key property: χ depends only on the surface, not on the particular graph, as long as faces are connected regions with no non-contractible loops.
Sphere vs torus: graphs on the plane or sphere have χ = 2; graphs on the torus have χ = 0.
Common confusion: counting vertices, edges, and faces on the torus (especially the pacman board) requires careful attention to identifications—corners and edges that appear separate may actually be the same point.
Practical consequence: some graphs that cannot be drawn without crossings on the plane (like K₅ and K₃,₃) can be drawn without crossings on the torus.

🍩 Understanding the torus

🍩 What the torus is

The torus is the surface of a doughnut.

It is a different surface from the plane or sphere.
The excerpt provides two equivalent ways to construct a torus:
1. Take a piece of paper, tape top to bottom (making a cylinder), then tape left side to right.
2. Use a "pacman board": a flat piece of paper where the right edge is identified with the left edge (points at the same distance from the bottom), and the top edge is identified with the bottom edge (points at the same distance from the left).
These two descriptions are the same surface in topology (they are homeomorphic).
The tape in the first construction forms two loops; cutting along these loops stretches the torus back to a flat piece of paper.

🔄 Extra flexibility for drawing graphs

On the torus, edges can loop around the hole or through the hole.
On the pacman board, this flexibility appears as:
- An edge going off the right side comes in on the left.
- An edge going off the top comes in on the bottom.
This extra flexibility allows some graphs that are non-planar (cannot be drawn without crossings on the plane) to be drawn without crossings on the torus.

🧮 Counting vertices, edges, and faces on the torus

🧮 The pacman board example

The excerpt walks through a specific example on the pacman board to illustrate the counting challenges:

What it looks like	Actual count	Reason
5 vertices	2 vertices	The 4 corners are all the same point due to edge identifications
8 edges	6 edges	Top and bottom edges are identified (count as 1); left and right edges are identified (count as 1)
4 faces	4 faces	Top region is separated from bottom by the edge along the top; left region is separated from right by the edge along the left

Therefore: v = 2, e = 6, f = 4.
So χ = v − e + f = 2 − 6 + 4 = 0.

⚠️ Common mistake: ignoring identifications

Don't confuse: what appears as multiple separate vertices or edges on the pacman board may actually be the same vertex or edge due to the identification of edges.
The 4 corners all represent the same point because both the top-bottom and left-right identifications meet there.
Similarly, the top edge and bottom edge are the same edge, and the left edge and right edge are the same edge.

🔢 The Euler characteristic as a topological invariant

🔢 The key property

The Euler characteristic of a graph on a surface is an integer that depends only on the surface, and not on the graph, so long as the graph subdivides the surface into faces which are connected regions with no non-contractible loops.

This is described as a "surprising fact" in the excerpt.
For the plane or sphere: χ = 2 (as established in earlier sections).
For the torus: χ = 0 (as shown in the example and stated generally).

🌐 Independence from the specific graph

The Euler characteristic χ = 0 holds for any graph on the torus (meeting the conditions).
This is true even if the number of vertices, edges, and faces are very different from the example.
Example: even if v = 100 instead of v = 2, the relationship v − e + f = 0 still holds.
The excerpt emphasizes that χ depends on the surface, not on how many vertices or edges the graph has.
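One family of examples makes this independence concrete. The sketch below is not from the excerpt: it assumes an m × n grid drawn on the pacman board with wraparound, so every vertex has one edge going right and one going up, and every face is a square.

```python
def torus_grid_counts(m, n):
    """(v, e, f) for an m-by-n grid with wraparound drawn on the torus."""
    v = m * n
    e = 2 * m * n  # each vertex owns one edge to the right and one upward
    f = m * n      # each vertex owns the square cell above and to its right
    return v, e, f

def euler_characteristic(v, e, f):
    return v - e + f

for m, n in [(1, 1), (2, 3), (10, 10)]:
    counts = torus_grid_counts(m, n)
    print(counts, euler_characteristic(*counts))

# The pacman board example from the excerpt: v = 2, e = 6, f = 4
print(euler_characteristic(2, 6, 4))  # 0
```

The 10 × 10 grid has v = 100, and χ is still 0, matching the example above.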

🎨 Applications: non-planar graphs on the torus

🎨 Complete graphs on the torus

The excerpt states an interesting fact about complete graphs:

K₅, K₆, and K₇ can be drawn on the torus without any edges crossing.
K₈ cannot be drawn on the torus without edges crossing.
Recall: K₅ and K₃,₃ are not planar (they cannot be drawn on the plane without crossings), but they can be drawn on the torus without crossings.
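These facts are consistent with a counting test analogous to the one used for K₅ in the plane. The sketch below is an addition, not from the excerpt: it assumes a simple graph drawn on the torus so that χ = 0 and every face has at least 3 edges. Then f = e − v, and 2e ≥ 3f simplifies to e ≤ 3v.

```python
from math import comb

def passes_torus_edge_test(n):
    """Necessary condition for K_n on the torus under the stated assumptions:
    chi = 0 gives f = e - v, and 2e >= 3f then simplifies to e <= 3v."""
    v, e = n, comb(n, 2)
    return e <= 3 * v

for n in range(5, 9):
    print(f"K_{n}: e = {comb(n, 2)}, 3v = {3 * n}, passes = {passes_torus_edge_test(n)}")
```

K₇ meets the bound with equality (21 = 21) and K₈ fails it (28 > 24). Passing the test does not show that a crossing-free drawing exists; it only fails to rule one out.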

🔍 Why this matters

The torus provides more "room" for drawing graphs because of the extra flexibility (edges can wrap around or through the hole).
This demonstrates that whether a graph can be drawn without crossings depends on the surface; planarity refers specifically to drawings on the plane (or sphere).
The exercises ask students to draw K₅ and K₃,₃ on the torus (using the pacman board representation) to see this concretely.

12.1 The chromatic number of a graph

🧭 Overview

🧠 One-sentence thesis

The chromatic number of a graph is the minimum number of colors needed to color vertices so that no two adjacent vertices share the same color, and it provides a way to solve scheduling problems like minimizing the number of exam days.

📌 Key points (3–5)

What the chromatic number measures: the smallest number of colors required to color a graph's vertices so that adjacent vertices have different colors.
Real-world application: scheduling final exams to minimize the number of days while avoiding conflicts when students are enrolled in multiple courses.
How to tell if a graph is k-colorable: a graph is k-colorable if you can assign at most k colors to its vertices without adjacent vertices sharing a color.
Common confusion: vertex-coloring vs edge-coloring—vertex-coloring (the focus here) colors vertices so adjacent vertices differ; edge-coloring colors edges so edges meeting at a vertex differ.
Special case: 2-colorable graphs are exactly the bipartite graphs, which contain no odd cycles.

🎨 Core concept: vertex-coloring and chromatic number

🎨 What is a vertex-coloring?

Vertex-coloring: a way of coloring the vertices of a graph G so that any two adjacent vertices have different colors.

Adjacent means connected by an edge.
The goal is to avoid conflicts: if two vertices are neighbors, they must have different colors.
Example: in the exam scheduling scenario, each course is a vertex, and an edge exists if at least one student is enrolled in both courses; coloring ensures no student has two exams on the same day.

🔢 What is the chromatic number?

Chromatic number C(G): the smallest integer k such that G is k-colorable.

A k-coloring is a vertex-coloring that uses at most k colors.
A graph is k-colorable if it has a k-coloring.
The chromatic number is the minimum number of colors needed.
Example: if a graph can be colored with 3 colors but not with 2, then C(G) = 3.

🧩 How the exam scheduling problem works

Construct a graph with n vertices representing n courses.
Draw an edge between two vertices if at least one student is enrolled in both courses.
Assign colors to represent days: blue for Monday, orange for Tuesday, yellow for Wednesday, etc.
No two adjacent vertices can share the same color (no conflicts).
The chromatic number is the minimum number of days needed for finals.
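The construction can be sketched with made-up enrollment data. The coloring step below is a simple greedy strategy, which is an assumption of this sketch rather than the excerpt's method: it always produces a conflict-free schedule, but it may use more days than the chromatic number.

```python
from itertools import combinations

def conflict_graph(enrollments):
    """Map each course to the set of courses sharing at least one student."""
    graph = {}
    for courses in enrollments.values():
        for course in courses:
            graph.setdefault(course, set())
        for a, b in combinations(sorted(courses), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def greedy_schedule(graph):
    """Give each course the smallest day number not used by an already
    scheduled conflicting course. Valid, but not always minimal."""
    day = {}
    for course in sorted(graph):
        used = {day[other] for other in graph[course] if other in day}
        d = 0
        while d in used:
            d += 1
        day[course] = d
    return day

# Hypothetical enrollment data (student -> courses)
enrollments = {
    "ana": {"algebra", "biology"},
    "ben": {"biology", "chemistry"},
    "cy": {"algebra", "chemistry"},
    "dee": {"drama"},
}
graph = conflict_graph(enrollments)
schedule = greedy_schedule(graph)
print(schedule)
```

Here algebra, biology, and chemistry conflict pairwise, so three days are unavoidable, and drama can share a day with any of them.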

📚 Examples of chromatic numbers

📚 Complete graph K_n

Example: The chromatic number of K_n is C(K_n) = n.

Why: K_n has every vertex connected to every other vertex.
K_n is n-colorable because each vertex can be a different color.
K_n is not (n − 1)-colorable: by the pigeonhole principle, if we have only n − 1 colors for n vertices, at least 2 vertices must share the same color, but all vertices are adjacent, so this violates the coloring rule.

📚 Empty graph

Example: The chromatic number of the empty graph with n ≥ 1 vertices (no edges) is 1.

Why: no edges means no adjacency constraints, so all vertices can be the same color.

📚 Path graph P_n

Example: The chromatic number of the path graph P_n with n vertices is C(P_n) = 2 if n ≥ 2.

Why: a path alternates between two colors along the sequence of vertices.

📚 Cycle graph C_n

Example: The chromatic number of the cycle graph C_n is 2 when n is even and 3 when n ≥ 3 is odd.

Why: an even cycle can alternate two colors around the loop; an odd cycle cannot alternate perfectly, requiring a third color.
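For small graphs, these values can be confirmed by brute force. The sketch below tries every assignment of k colors for k = 1, 2, ..., so it takes exponential time and is only meant for tiny examples; all names are illustrative.

```python
from itertools import product

def chromatic_number(n, edges):
    """Smallest k for which vertices 0..n-1 have a valid coloring with k colors.
    Exhaustive search: only practical for very small graphs."""
    for k in range(1, n + 1):
        for colors in product(range(k), repeat=n):
            if all(colors[a] != colors[b] for a, b in edges):
                return k
    return 0  # graph with no vertices

def complete(n):
    return [(a, b) for a in range(n) for b in range(a + 1, n)]

def path(n):
    return [(i, i + 1) for i in range(n - 1)]

def cycle(n):
    return [(i, (i + 1) % n) for i in range(n)]

print(chromatic_number(4, complete(4)))  # 4
print(chromatic_number(5, []))           # 1
print(chromatic_number(5, path(5)))      # 2
print(chromatic_number(6, cycle(6)))     # 2
print(chromatic_number(5, cycle(5)))     # 3
```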

🔗 2-colorable graphs and bipartite graphs

🔗 When is a graph 2-colorable?

Proposition: Let G be a connected graph with at least 2 vertices. Then G is bipartite if and only if C(G) = 2.

A graph is bipartite if and only if it is 2-colorable.
How they match: identify vertices on the left side with one color (e.g., gold) and vertices on the right side with another color (e.g., green).
The condition that adjacent vertices have different colors is the same as the condition that all edges connect vertices on opposite sides.
At least two colors are needed if the graph is connected and has at least two vertices.

🔗 Odd cycles and 2-colorability

Theorem: A graph G has C(G) ≤ 2 if and only if it contains no odd cycles.

Forward direction (⇒): an odd cycle is not 2-colorable, so any graph containing an odd cycle is not 2-colorable.
Reverse direction (⇐): if G contains no odd cycles, then it can be 2-colored (the proof starts by picking a vertex "a", coloring it red, coloring all neighbors of a blue, then continuing to color neighbors of the newly colored vertices).
Don't confuse: even cycles are 2-colorable (they alternate colors perfectly), but odd cycles require a third color.

🆚 Vertex-coloring vs edge-coloring

🆚 What is edge-coloring?

Edge-coloring: a way of coloring the edges of G such that all the edges ending at a vertex v have different colors, for each vertex v of G.

Edge-coloring number: the smallest number k such that G has an edge-coloring with k colors.

Key difference: vertex-coloring colors vertices so adjacent vertices differ; edge-coloring colors edges so edges meeting at a vertex differ.
The excerpt focuses on vertex-coloring; whenever "chromatic number" is mentioned without qualification, it refers to the minimum number of colors for a vertex-coloring.
Example: in edge-coloring, if three edges meet at a vertex, they must all have different colors.

Concept	What is colored	Constraint
Vertex-coloring	Vertices	Adjacent vertices have different colors
Edge-coloring	Edges	Edges meeting at a vertex have different colors

12.2 What graphs are 2-Colorable?

🧭 Overview

🧠 One-sentence thesis

A graph can be colored with at most two colors if and only if it is bipartite, which is equivalent to saying it contains no odd cycles.

📌 Key points

Core equivalence: 2-colorable graphs, bipartite graphs, and graphs with no odd cycles are all the same thing.
How bipartite structure maps to coloring: one side of the bipartite graph gets one color, the other side gets the second color.
Why odd cycles matter: an odd cycle cannot be 2-colored because adjacent vertices must alternate colors, but an odd number of vertices forces two adjacent vertices to share a color.
Common confusion: don't confuse "no odd cycles" with "no cycles at all"—even cycles (cycles with an even number of vertices) are perfectly 2-colorable.
Practical algorithm: the proof provides a method to test whether a graph is 2-colorable by coloring neighbors in alternating waves.

🎨 The bipartite–2-colorable equivalence

🎨 What bipartite means for coloring

A graph is bipartite if and only if it is 2-colorable.

How the mapping works: vertices on the left side of a bipartite graph are assigned one color (e.g., gold), and vertices on the right side are assigned a second color (e.g., green).
Why it works: in a bipartite graph, all edges connect vertices on opposite sides, so adjacent vertices automatically have different colors.
Minimum requirement: at least two colors are needed if the graph is connected and has at least two vertices.

🔗 The formal statement

Proposition 12.2.1: Let G be a connected graph with at least 2 vertices. Then G is bipartite if and only if C(G) = 2.

C(G) denotes the chromatic number, the minimum number of colors needed to color the graph so that no two adjacent vertices share a color.
The condition for a 2-coloring (adjacent vertices have different colors) is identical to the condition for a bipartite graph (all edges connect vertices on opposite sides).

🔄 Odd cycles and 2-colorability

🔄 The odd-cycle criterion

Theorem 12.2.2: A graph G has C(G) ≤ 2 if and only if it contains no odd cycles.

Forward direction (⇒), stated as its contrapositive: if a graph contains an odd cycle, it cannot be 2-colored.
- Why: in any cycle, adjacent vertices must alternate colors; an odd number of vertices forces the last vertex to be adjacent to the first vertex with the same color—a contradiction.
Reverse direction (⇐): if a graph contains no odd cycles, it can be 2-colored.

🌊 The wave-coloring algorithm

The proof provides a constructive method to 2-color a graph with no odd cycles:

Pick any starting vertex "a" and color it red.
Color all neighbors of "a" blue.
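Color all uncolored neighbors of the newly colored vertices red.
Continue alternating colors (red, blue, red, blue, ...) in waves until every vertex reachable from "a" is colored; if the graph is disconnected, repeat from an uncolored vertex in each remaining component.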

Why this works:

If two adjacent vertices u and v end up with the same color, trace the shortest paths from u to "a" and from v to "a".
These paths alternate colors. Let w be the first junction point where the two paths meet.
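Since u and v have the same color and the colors alternate along each path, the paths from u to w and from v to w have lengths of the same parity, so together they contain an even number of edges. Adding the edge uv closes them into a cycle of odd length, contradicting the assumption that the graph has no odd cycles.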
Therefore, no two adjacent vertices can have the same color, and the graph is 2-colorable.

🧪 Example scenario

Example: Consider a square (a 4-cycle, which is even).

Start at one corner, color it red.
Color its two neighbors blue.
Color the remaining corner (opposite the start) red.
No two adjacent corners share a color → the square is 2-colorable.

Example: Consider a triangle (a 3-cycle, which is odd).

Start at one vertex, color it red.
Color its two neighbors blue.
But those two neighbors are also adjacent to each other, so they cannot both be blue → the triangle is not 2-colorable.

🛠️ Practical implications

🛠️ Testing for 2-colorability

Remark 12.2.3: The proof of Theorem 12.2.2 also gives an algorithm for deciding if a graph is 2-colorable (i.e., if it has no odd cycles).

You don't need to explicitly search for odd cycles; just try the wave-coloring algorithm.
If the algorithm completes without conflict, the graph is 2-colorable.
If a conflict arises (two adjacent vertices get the same color), the graph contains an odd cycle and is not 2-colorable.
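The wave-coloring test can be sketched as a breadth-first search. The adjacency-dictionary format and the function name are assumptions of this sketch; restarting in each connected component handles disconnected graphs.

```python
from collections import deque

def two_color(adj):
    """Try to 2-color a graph given as {vertex: list of neighbors}.
    Returns a dict vertex -> 'red' or 'blue', or None if a conflict is found."""
    color = {}
    for start in adj:  # restart in each connected component
        if start in color:
            continue
        color[start] = "red"
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    # The next wave gets the opposite color.
                    color[w] = "blue" if color[u] == "red" else "red"
                    queue.append(w)
                elif color[w] == color[u]:
                    return None  # adjacent vertices share a color: odd cycle
    return color

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

print(two_color(square))
print(two_color(triangle))  # None
```

On the square the search colors opposite corners alike; on the triangle it finds the two blue neighbors adjacent to each other and reports failure.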

🔍 Don't confuse: even vs. odd cycles

Even cycles (4-cycles, 6-cycles, etc.) are 2-colorable because alternating colors works perfectly around the cycle.
Odd cycles (3-cycles, 5-cycles, etc.) are not 2-colorable because alternating colors forces a conflict.
A graph can have many cycles and still be 2-colorable, as long as all cycles are even.

12.3 Bounds on chromatic numbers

🧭 Overview

🧠 One-sentence thesis

The chromatic number of any graph with n vertices lies between 1 and n, and subgraphs provide lower bounds because any coloring of the larger graph must also validly color its subgraphs.

📌 Key points (3–5)

General bounds: For any graph with n vertices (n ≥ 1), the chromatic number C(G) satisfies 1 ≤ C(G) ≤ n.
Extreme cases: The empty graph achieves the lower bound (C = 1), and the complete graph K_n achieves the upper bound (C = n).
Subgraph lower bounds: If H is a subgraph of G, then C(H) ≤ C(G); the chromatic number of any subgraph provides a lower bound for the whole graph.
Common confusion: A graph containing a specific subgraph (like K_5 or an odd cycle) inherits colorability constraints—if the subgraph needs k colors, the whole graph needs at least k colors.
Key examples: Any graph containing K_n is not (n − 1)-colorable; any graph containing an odd cycle C_n (n ≥ 3 odd) is not 2-colorable.

📏 General bounds theorem

📏 The 1-to-n range

Theorem 12.3.1: Let G be a graph with n vertices, where n ≥ 1. Then 1 ≤ C(G) ≤ n.

The chromatic number of any graph with n vertices is at least 1 and at most n.
This range is tight: both extremes are achievable.

🔺 Upper bound: complete graphs

The complete graph K_n has chromatic number exactly n.
Every vertex is adjacent to every other vertex, so each vertex must receive a unique color.
Example: K_3 (a triangle) requires 3 colors; no two vertices can share a color.

⚪ Lower bound: empty graphs

The empty graph (n vertices, no edges) has chromatic number 1.
Since no vertices are adjacent, all can receive the same color.

🔀 Intermediate cases

If G is not complete: there exist two non-adjacent vertices x_1 and y_1.
- Assign x_1 and y_1 the same color.
- Assign each of the remaining (n − 2) vertices its own color.
- Result: C(G) ≤ n − 1 < n.
If G is not empty: there exist two adjacent vertices x_2 and y_2.
- These must receive different colors.
- Result: C(G) ≥ 2 > 1.
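
For small graphs, the bounds can be checked by exhaustive search. The sketch below is an illustrative brute-force method (the helper names `is_k_colorable` and `chromatic_number` are made up for this example) and is only practical for a handful of vertices.

```python
from itertools import combinations

def is_k_colorable(vertices, edges, k):
    """Backtracking test: can the graph be properly colored with k colors?"""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    order = list(vertices)
    color = {}

    def extend(i):
        if i == len(order):
            return True
        v = order[i]
        for c in range(k):
            # c is allowed if no already-colored neighbor uses it
            if all(color.get(w) != c for w in adj[v]):
                color[v] = c
                if extend(i + 1):
                    return True
                del color[v]
        return False

    return extend(0)

def chromatic_number(vertices, edges):
    """Smallest k >= 1 admitting a proper k-coloring (needs at least one vertex)."""
    k = 1
    while not is_k_colorable(vertices, edges, k):
        k += 1
    return k

n = 5
complete = list(combinations(range(n), 2))          # K_5
almost_complete = [e for e in complete if e != (0, 1)]  # K_5 minus one edge
print(chromatic_number(range(n), complete))         # upper extreme: n
print(chromatic_number(range(n), []))               # lower extreme: 1
print(chromatic_number(range(n), almost_complete))  # not complete: at most n - 1
```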

🧩 Subgraphs as lower bounds

🧩 The subgraph principle

Proposition 12.3.4: If G is a graph and H is a subgraph of G, then C(H) ≤ C(G).

The chromatic number of any subgraph H provides a lower bound for the chromatic number of the whole graph G.
Why: any valid coloring of G, when restricted to the vertices and edges of H, must also be a valid coloring of H.

🔍 Proof idea

Suppose for contradiction that G were k-colorable, where k < C(H).
Take such a k-coloring of G and discard every vertex and edge of G that is not in H.
What remains is a proper coloring of H with at most k colors, because every edge of H is also an edge of G.
Contradiction: H requires at least C(H) > k colors by definition.
Therefore, G cannot be colored with fewer colors than H.

💡 Practical implication

To show that a graph G needs at least k colors, find a subgraph H that requires k colors.
Example: If G contains a triangle (K_3), then C(G) ≥ 3.
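
The restriction step in the proof idea can be made concrete. In this small sketch the graph G (a triangle a, b, c with an extra vertex d attached to c) and its coloring are hypothetical examples, not taken from the text.

```python
def is_proper(edges, coloring):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[w] for u, w in edges)

def restrict(coloring, sub_vertices):
    """Keep only the colors of the vertices that belong to the subgraph."""
    return {v: coloring[v] for v in sub_vertices}

# Hypothetical G: triangle a-b-c plus a vertex d attached to c.
G_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
G_coloring = {"a": 0, "b": 1, "c": 2, "d": 0}

# Subgraph H: the triangle alone.
H_vertices = ["a", "b", "c"]
H_edges = [("a", "b"), ("b", "c"), ("a", "c")]
H_coloring = restrict(G_coloring, H_vertices)

print(is_proper(G_edges, G_coloring), is_proper(H_edges, H_coloring))
```

Restricting never introduces a new color, which is why C(H) ≤ C(G).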

🔺 Complete subgraphs

🔺 K_n subgraphs

Remark 12.3.2: Any graph containing a copy of K_n is not (n − 1)-colorable.

The complete graph K_n requires exactly n colors.
If G contains K_n as a subgraph, then C(G) ≥ n.
Example: A graph containing K_5 (complete graph on 5 vertices) is not 4-colorable; it needs at least 5 colors.

🎯 Why this matters

Spotting a complete subgraph immediately gives a lower bound.
The excerpt shows a graph containing K_5, which proves that graph needs at least 5 colors.

🔄 Odd cycle subgraphs

🔄 Odd cycles and 2-colorability

Remark 12.3.3: Any graph containing a copy of C_n for n ≥ 3 odd is not 2-colorable.

Recall: any cycle graph with an odd number of vertices is not 2-colorable (from earlier sections).
If G contains an odd cycle as a subgraph, then C(G) ≥ 3.
Example: A graph containing a 7-cycle (C_7) is not 2-colorable.

🔁 Triangles as the simplest case

The triangle C_3 = K_3 is the smallest odd cycle.
Any graph containing a triangle is not 2-colorable.
This generalizes: C_3, C_5, C_7, C_9, ... all require at least 3 colors.

⚠️ Don't confuse

Even cycles (C_4, C_6, ...) are 2-colorable by alternating colors.
Odd cycles (C_3, C_5, C_7, ...) are not 2-colorable; they require at least 3 colors.
The parity (odd vs even) of the cycle length determines 2-colorability.

📊 Summary table

Subgraph type	Chromatic number of subgraph	Lower bound for G	Example
K_n (complete graph on n vertices)	n	C(G) ≥ n	K_5 → C(G) ≥ 5
C_n (odd cycle, n ≥ 3 odd)	3	C(G) ≥ 3	C_7 → not 2-colorable
Empty graph	1	C(G) ≥ 1	No constraint

12.4 Brooks’ Theorem

🧭 Overview

🧠 One-sentence thesis

Brooks' Theorem guarantees that any graph where every vertex has degree at most d can be colored with at most d+1 colors, providing an upper bound on the chromatic number based on maximum degree.

📌 Key points (3–5)

What Brooks' Theorem states: if every vertex in a graph has degree at most d, then the graph is (d+1)-colorable.
Why 3-coloring is hard: the number of options to try grows exponentially, with a running time of roughly 2^(n/2) for n vertices.
Real-world application: graph coloring solves scheduling problems—vertices represent tasks, edges represent conflicts, and colors represent resources (e.g., instructors or time slots).
Common confusion: Brooks' Theorem gives an upper bound (d+1 colors), but the actual chromatic number C(G) may be smaller; the theorem guarantees colorability, not optimality.
Special cases: a graph with maximum degree d is actually d-colorable (one fewer color) unless it contains an odd cycle (when d=2) or a complete graph K_(d+1) (when d>2).

🎨 The coloring explosion problem

💥 Why 3-coloring is computationally hard

When trying to 3-color a graph (similar to how we 2-color), the number of choices grows explosively.
The excerpt illustrates this with a triangle: after fixing three vertices as Red, Blue, and Green, the next vertex has multiple valid color options, and each choice branches into more choices.
Running time: deciding if a graph with n vertices is 3-colorable takes approximately 2^(n/2) time—this is expensive.
Nothing substantially faster is known for k-colorings when k ≥ 3.

📅 Scheduling application example

One example application of coloring graphs is to scheduling.

Scenario: you are a department chair with math classes at fixed times; you need to hire the minimum number of instructors.
How to model it:
- Each vertex = one math class
- Edge between two vertices = the classes overlap in time
- Each color = one instructor
Solution: the minimum number of colors needed to color this graph equals the minimum number of instructors you need to hire.
Example: if two classes overlap, they need different instructors (different colors); if they don't overlap, one instructor can teach both (same color allowed).
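
The modeling steps above can be sketched as follows. The timetable is invented purely for illustration, and the exhaustive search used to find the minimum is an assumption of this example (it only scales to small departments).

```python
from itertools import combinations

# Hypothetical timetable (illustrative data only): class -> (start hour, end hour)
classes = {
    "Calculus I": (9, 10),
    "Calculus II": (9.5, 11),
    "Linear Algebra": (10, 11),
    "Statistics": (10.5, 12),
    "Combinatorics": (13, 14),
}

def overlaps(a, b):
    """Two half-open time intervals [start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

# Conflict graph: one vertex per class, one edge per overlapping pair.
names = list(classes)
conflicts = {name: set() for name in names}
for x, y in combinations(names, 2):
    if overlaps(classes[x], classes[y]):
        conflicts[x].add(y)
        conflicts[y].add(x)

def assign(k):
    """Try to give each class one of k instructors (colors) by backtracking."""
    assignment = {}

    def extend(i):
        if i == len(names):
            return True
        for instructor in range(k):
            if all(assignment.get(o) != instructor for o in conflicts[names[i]]):
                assignment[names[i]] = instructor
                if extend(i + 1):
                    return True
                del assignment[names[i]]
        return False

    return assignment if extend(0) else None

k = 1
while assign(k) is None:   # smallest k that works = minimum number of instructors
    k += 1
schedule = assign(k)
print(k, schedule)
```

In this made-up timetable, Calculus II, Linear Algebra, and Statistics pairwise overlap, so they form a triangle in the conflict graph and three instructors are needed.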

🔢 Brooks' Theorem statement and proof

📐 The theorem

Theorem 12.4.3 (Brooks' Theorem): If every vertex in a graph G has degree at most d, then G is (d+1)-colorable.

What "degree at most d" means: each vertex is connected to at most d other vertices.
What the theorem guarantees: you can always color such a graph with d+1 colors.
Example: if every vertex has degree at most 3, the graph is 4-colorable.

🧮 Proof by induction

Base case: if the graph has n ≤ d+1 vertices, use a different color for each vertex—clearly (d+1)-colorable.

Inductive step: assume any such graph with n vertices is (d+1)-colorable; prove the same for n+1 vertices.

Start with a graph G with n+1 vertices.
Delete one vertex v, leaving a graph with n vertices.
By the inductive assumption, color the remaining graph with d+1 colors.
Now add v back: since v has at most d neighbors, at most d colors are used by its neighbors.
At least one of the d+1 colors is free, so assign that free color to v.
This produces a valid (d+1)-coloring of the entire graph.

Illustration (when d=3):

Remove vertex v from G.
Color the rest (say with Red, Yellow, Purple, Blue).
v has at most 3 neighbors, so at least one of the 4 colors is unused by its neighbors.
Assign that free color to v.
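
The inductive proof translates into a greedy procedure: add vertices one at a time and give each the first color its already-colored neighbors do not use. The sketch below is one possible rendering; the function name and the cube graph used as test input (every vertex has degree 3) are choices made for this example.

```python
def greedy_coloring(adj):
    """Color vertices in dictionary order with the smallest free color.

    If every vertex has degree at most d, colors 0..d always suffice,
    because at most d colors are blocked when a vertex is added.
    """
    color = {}
    for v in adj:
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Cube graph: vertices 0..7, adjacent when their binary labels differ in one bit.
cube = {v: [v ^ 1, v ^ 2, v ^ 4] for v in range(8)}
d = max(len(neighbors) for neighbors in cube.values())
coloring = greedy_coloring(cube)
print(d, coloring)
```

The greedy result is guaranteed to use at most d+1 colors, but it is not guaranteed to use the minimum number.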

🎯 The stronger result (aside)

Even better bound: a graph with maximum degree d is actually d-colorable (not d+1) unless:
- d=2 and some connected component is an odd cycle, or
- d>2 and some connected component is the complete graph K_(d+1).
The proof of this stronger result is very hard (not given in the excerpt).

🚫 Non-3-colorable graph examples

🔺 First example: 7-vertex graph

Setup: vertices 1,2,3,4,5,6,7 with specific edges (not fully described in excerpt, but the solution is given).

Proof by contradiction:

Assume the graph is 3-colorable.
Without loss of generality, let vertex 1 be Red.
Then vertices 2 and 3 must be Blue and Green (in some order), which forces vertex 6 to be Red.
Similarly, vertices 4 and 5 must be Blue and Green (in some order), which forces vertex 7 to be Red.
But vertices 6 and 7 are adjacent, so they cannot both be Red—contradiction.
Therefore, the graph is not 3-colorable.

Note: the excerpt remarks that there is no K_4 (complete graph on 4 vertices) in this graph, so we cannot immediately conclude it needs 4 colors; we must use a more detailed argument.

🌟 Second example: star-like graph with 5 twin pairs

Structure:

One central vertex P.
Five pairs of "twins": A₁ and A₂, B₁ and B₂, C₁ and C₂, D₁ and D₂, E₁ and E₂.
In each pair, the "outer twin" (e.g., A₁) and "inner twin" (e.g., A₂) are connected.
The central vertex P is connected to all 5 inner twins.
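The 5 outer twins form an outer 5-cycle.
Each inner twin is also connected to the two outer-cycle neighbors of its outer twin (the key observation in the proof below relies on these edges).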

Proof by contradiction:

Assume the graph is 3-colorable.
Without loss of generality, let the central vertex P be Red.
All 5 inner twins must be Blue or Green (since they are adjacent to P).
Consider any outer twin colored Red: replace Red with the color of its inner twin (Blue or Green).
This produces a coloring of the outer 5-cycle using only Blue and Green (a 2-coloring).
Key observation: if an outer twin v is changed from Red to Blue, both of v's adjacent outer twins must be Green:
- They cannot be Blue (they are connected to v's inner twin, which is Blue).
- They cannot have been changed from Red (because v was originally Red, so its neighbors were not Red).
But this means we have a valid 2-coloring of the outer 5-cycle, which is an odd cycle.
Contradiction: no odd cycle can be 2-colored.
Therefore, the original graph is not 3-colorable.
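
The conclusion can be double-checked by exhaustive search. The edge list below is an assumption reconstructed from the description and from the proof's key observation (each inner twin is joined to the two outer neighbors of its outer twin); the cyclic order A, B, C, D, E of the outer cycle is also assumed.

```python
outer = ["A1", "B1", "C1", "D1", "E1"]
inner = ["A2", "B2", "C2", "D2", "E2"]

edges = set()
for i in range(5):
    left, right = outer[i - 1], outer[(i + 1) % 5]
    edges.add(frozenset((outer[i], right)))      # outer 5-cycle
    edges.add(frozenset((outer[i], inner[i])))   # twin edge
    edges.add(frozenset(("P", inner[i])))        # center to inner twin
    edges.add(frozenset((inner[i], left)))       # assumed from the proof
    edges.add(frozenset((inner[i], right)))      # assumed from the proof

vertices = ["P"] + outer + inner
adj = {v: set() for v in vertices}
for e in edges:
    u, w = tuple(e)
    adj[u].add(w)
    adj[w].add(u)

def colorable(k):
    """Backtracking search for a proper coloring with k colors."""
    color = {}

    def extend(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(k):
            if all(color.get(w) != c for w in adj[v]):
                color[v] = c
                if extend(i + 1):
                    return True
                del color[v]
        return False

    return extend(0)

print(colorable(3), colorable(4))
```

Under these assumptions the search finds no 3-coloring but does find a 4-coloring, matching the proof by contradiction.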

📊 Comparing Brooks' bound with actual chromatic number

📏 When k differs from C(G)

Brooks' Theorem guarantees that G is k-colorable for k = d+1, where d is the maximum degree of G, but the actual chromatic number C(G) might be smaller.

Don't confuse:

k (from Brooks' Theorem) = d+1 = an upper bound on the chromatic number.
C(G) = the actual minimum number of colors needed.
Always: C(G) ≤ k, but C(G) may be much smaller.

🔍 Examples to compare

The excerpt lists exercises comparing k and C(G) for:

(a) Path graph P_n: for n ≥ 3 a path has maximum degree 2, so k=3, but C(P_n)=2 (paths are bipartite).
(b) Cycle graph C_n: maximum degree 2, so k=3; C(C_n)=2 if n is even, 3 if n is odd.
(c) Platonic solids: various regular polyhedra (specific values not computed in excerpt).
(d) Petersen graph: a well-known graph (specific values not computed in excerpt).
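
For parts (a) and (b), the comparison between k = d+1 and C(G) can be tabulated with a short script. This is a brute-force sketch for small n only; the helper names are invented for the example.

```python
def max_degree(n, edges):
    deg = [0] * n
    for u, w in edges:
        deg[u] += 1
        deg[w] += 1
    return max(deg)

def chromatic_number(n, edges):
    """Smallest k with a proper k-coloring, found by backtracking."""
    adj = [set() for _ in range(n)]
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)

    def colorable(k):
        color = [None] * n

        def extend(v):
            if v == n:
                return True
            for c in range(k):
                if all(color[w] != c for w in adj[v]):
                    color[v] = c
                    if extend(v + 1):
                        return True
                    color[v] = None
            return False

        return extend(0)

    k = 1
    while not colorable(k):
        k += 1
    return k

def path(n):
    return [(i, i + 1) for i in range(n - 1)]

def cycle(n):
    return [(i, (i + 1) % n) for i in range(n)]

for name, n, edges in [("P_5", 5, path(5)), ("C_6", 6, cycle(6)), ("C_7", 7, cycle(7))]:
    print(name, "k =", max_degree(n, edges) + 1, "C(G) =", chromatic_number(n, edges))
```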

🧮 Introduction to chromatic polynomials

🔢 What the chromatic polynomial counts

The chromatic polynomial of G is the polynomial whose value at the input k is the number of k-colorings.

Definition: P_G(x) is a polynomial such that P_G(k) = the number of ways to k-color the vertices of G for any positive integer k.
Real-world scenario: if you have k employees and n jobs (some conflicting), the chromatic polynomial tells you how many ways to assign employees to jobs.
Modeling: build a graph where vertices = jobs, edges = conflicts, colors = employees; then P_G(k) = number of valid assignments.

⚠️ When no coloring exists

If k < C(G) (fewer colors than the chromatic number), there is no proper k-coloring, so P_G(k) = 0.
For k ≥ C(G), P_G(k) counts the number of different k-colorings.

🎯 Why it's remarkable

Theorem 12.5.1: If G is a labeled graph with n vertices, there exists a polynomial P_G(x) such that P_G(k) equals the number of ways to k-color G for any positive integer k.

Amazing property: the number of colors k can be much larger than the number of coefficients of the polynomial.
The chromatic polynomial is not a generating function (the excerpt notes this but does not elaborate further).

12.5 The chromatic polynomial

🧭 Overview

🧠 One-sentence thesis

The chromatic polynomial of a labeled graph is a polynomial whose value at any positive integer k gives the exact number of ways to properly k-color the graph's vertices, enabling us to count valid assignments even when k is much larger than the chromatic number.

📌 Key points (3–5)

What the chromatic polynomial is: a polynomial P_G(x) where P_G(k) equals the number of k-colorings of graph G for any positive integer k.
Real-world application: assigning k employees to n jobs when some jobs conflict—each valid assignment corresponds to a k-coloring of the conflict graph.
Key property: the polynomial has degree n (number of vertices) and is monic (leading coefficient is 1).
Deletion-contraction formula: P_G(x) = P_{G-α}(x) − P_{G⊗α}(x), where G-α deletes edge α and G⊗α contracts it; this recursively builds the polynomial.
Common confusion: the chromatic polynomial is not a generating function—the number of k-colorings is the value P_G(k), not the coefficient of x^k.

🎯 Definition and motivation

🎯 What the chromatic polynomial counts

The chromatic polynomial of G is the polynomial whose value at input k is the number of k-colorings.

For a labeled graph G with n vertices, there exists a polynomial P_G(x) such that P_G(k) counts all valid k-colorings for any positive integer k.
"Valid k-coloring" means assigning one of k colors to each vertex so that adjacent vertices have different colors.
The polynomial captures this count for all values of k simultaneously.

🏢 Real-world scenario: job assignment

Suppose you have k employees and n jobs, and some jobs conflict with each other.
Model this as a labeled graph G with n vertices (one per job) and edges between conflicting jobs.
Each of the k employees corresponds to one of the k colors.
Making an assignment of employees to jobs is the same as making a k-coloring of G.
If k < C(G) (the chromatic number), there is no way to create a proper vertex-coloring.
For any k ≥ C(G), the chromatic polynomial P_G(k) tells you how many different valid assignments exist.

🔍 Not a generating function

The chromatic polynomial is not a generating function.
The number of k-colorings is the value P_G(k), not the k-th coefficient of the polynomial.
This distinction is important: the polynomial's value at k (which can be arbitrarily large) gives the count, even though the polynomial itself has only n+1 coefficients.

📐 Basic examples and formulas

📐 Complete graph K_n

If G is the labeled complete graph on n vertices, then P_G(x) = x(x − 1)(x − 2)···(x − n + 1).
Reason: the first vertex can be any of x colors, the second must differ from the first (x − 1 choices), the third must differ from the first two (x − 2 choices), and so on.

📐 Empty graph

If G is the empty labeled graph on n vertices (no edges), then P_G(x) = x^n.
Reason: each vertex can independently be any of x colors, since no adjacencies impose restrictions.

📐 Path graph P_n

If G is the labeled path graph P_n, then P_G(x) = x(x − 1)^(n−1).
Reason: the first vertex has x choices; each subsequent vertex along the path must differ from its single neighbor, leaving (x − 1) choices for each of the remaining n − 1 vertices.

🌳 Tree with n vertices

If T is a labeled tree with n vertices, then P_T(x) = x(x − 1)^(n−1).
Proof sketch:
- Color the first labeled vertex: x choices.
- Consider vertices adjacent to the first: each must avoid the first vertex's color, so (x − 1) choices.
- Continue along branches: each new vertex is adjacent to exactly one already-colored vertex (its parent), because a tree has no cycles, so only the parent's color is forbidden and (x − 1) colors remain available.
- This holds for all n − 1 remaining vertices.
Example: a tree "branches out" from a starting vertex, and each branch can use any of the (x − 1) colors not used by its parent.
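
The closed forms in this section can be spot-checked by counting colorings directly. The sketch below enumerates every assignment of k colors to 4 vertices, so it is only feasible for tiny graphs; the star graph stands in as an example of a tree that is not a path.

```python
from itertools import combinations, product
from math import perm

def count_colorings(n, edges, k):
    """Number of proper colorings of vertices 0..n-1 with colors 0..k-1."""
    return sum(
        all(c[u] != c[w] for u, w in edges)
        for c in product(range(k), repeat=n)
    )

n = 4
complete = list(combinations(range(n), 2))
empty = []
path = [(i, i + 1) for i in range(n - 1)]
star = [(0, i) for i in range(1, n)]      # a tree that is not a path

for k in range(1, 6):
    print(
        k,
        count_colorings(n, complete, k),  # compare with k(k-1)(k-2)(k-3)
        count_colorings(n, empty, k),     # compare with k^4
        count_colorings(n, path, k),      # compare with k(k-1)^3
        count_colorings(n, star, k),      # compare with k(k-1)^3
    )
```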

🔧 The deletion-contraction formula

🔧 Statement of the formula

Let G be a labeled graph and α be an edge of G. Let G−α be the graph obtained by deleting α, and G⊗α be the graph obtained by contracting edge α. Then P_G(x) = P_{G−α}(x) − P_{G⊗α}(x).

Deleting edge α (G−α): remove the edge but keep both endpoints as separate vertices.
Contracting edge α (G⊗α): merge the two endpoints of α into a single vertex, combining their adjacencies.
This formula recursively breaks down the chromatic polynomial into simpler graphs.

🔧 Why the formula works

Let v₁ and v₂ be the two vertices connected by edge α.
In G−α, v₁ and v₂ are no longer adjacent, so k-colorings of G−α fall into two disjoint cases:
1. v₁ and v₂ have the same color: these are exactly the k-colorings of G⊗α (since contracting α forces v₁ and v₂ to share a color).
2. v₁ and v₂ have different colors: these are exactly the k-colorings of G (the original graph with edge α).
Therefore, P_{G−α}(k) = P_G(k) + P_{G⊗α}(k).
Rearranging: P_G(k) = P_{G−α}(k) − P_{G⊗α}(k).
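
The recursion can be run directly to evaluate P_G(k) for a given k. The following is a minimal sketch (the function name and the set-of-pairs representation are assumptions of this example); it takes exponential time in the number of edges and is meant only to mirror the formula.

```python
def chromatic_count(vertices, edges, k):
    """Evaluate P_G(k) via P_G = P_{G-alpha} - P_{G contracted along alpha}."""
    vertices = frozenset(vertices)
    edges = frozenset(frozenset(e) for e in edges)
    if not edges:
        return k ** len(vertices)          # empty graph: k^n colorings
    alpha = next(iter(edges))              # pick any edge
    u, v = tuple(alpha)
    deleted = edges - {alpha}              # G - alpha
    contracted = set()                     # merge v into u
    for e in deleted:
        merged = frozenset(u if x == v else x for x in e)
        if len(merged) == 2:               # a set, so parallel edges collapse
            contracted.add(merged)
    return (chromatic_count(vertices, deleted, k)
            - chromatic_count(vertices - {v}, contracted, k))

c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(chromatic_count(range(4), c4, 3))
print(chromatic_count(range(5), c5, 3))
```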

🔧 Example: cycle C₄

Let G be the labeled cycle C₄.
Choose an edge α: then G−α is the labeled path P₄, and G⊗α is the labeled cycle C₃.
Chromatic polynomial for P₄: P_{P₄}(x) = x(x − 1)³ = x⁴ − 3x³ + 3x² − x.
Chromatic polynomial for C₃: P_{C₃}(x) = x(x − 1)(x − 2) = x³ − 3x² + 2x.
Apply the formula: P_{C₄}(x) = P_{P₄}(x) − P_{C₃}(x) = (x⁴ − 3x³ + 3x² − x) − (x³ − 3x² + 2x) = x⁴ − 4x³ + 6x² − 3x = x(x − 1)(x² − 3x + 3).

🔧 Example: cycle C₅

Let G be the labeled cycle C₅.
Choose an edge α: then G−α is the labeled path P₅, and G⊗α is the labeled cycle C₄.
Chromatic polynomial for P₅: P_{P₅}(x) = x(x − 1)⁴.
Chromatic polynomial for C₄: P_{C₄}(x) = x⁴ − 4x³ + 6x² − 3x (from the previous example).
Apply the formula: P_{C₅}(x) = x(x − 1)⁴ − (x⁴ − 4x³ + 6x² − 3x) = x(x − 1)(x − 2)(x² − 2x + 2).
Roots of P_{C₅}(x): 0, 1, and 2 are roots, but 3 is not.
Therefore, the chromatic number is C(C₅) = 3 (the smallest positive integer k for which P_{C₅}(k) > 0).
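
The polynomial arithmetic in the two cycle examples can be reproduced with coefficient lists. In this sketch a polynomial is stored lowest degree first (an arbitrary convention for the example), and all helper names are invented.

```python
def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_sub(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def poly_eval(p, x):
    return sum(c * x ** i for i, c in enumerate(p))

def path_poly(n):
    """Coefficients of x(x - 1)^(n - 1)."""
    p = [0, 1]                       # x
    for _ in range(n - 1):
        p = poly_mul(p, [-1, 1])     # multiply by (x - 1)
    return p

c3 = poly_mul(poly_mul([0, 1], [-1, 1]), [-2, 1])   # x(x - 1)(x - 2)
c4 = poly_sub(path_poly(4), c3)                     # P_{P4} - P_{C3}
c5 = poly_sub(path_poly(5), c4)                     # P_{P5} - P_{C4}

print(c4)   # [0, -3, 6, -4, 1], i.e. x^4 - 4x^3 + 6x^2 - 3x
print(c5)   # [0, 4, -10, 10, -5, 1]
print([poly_eval(c5, k) for k in range(4)])         # [0, 0, 0, 30]

# Smallest positive integer k with a positive value of the polynomial.
smallest_k = next(k for k in range(1, len(c5)) if poly_eval(c5, k) > 0)
print(smallest_k)
```

Expanded, the C₅ polynomial is x⁵ − 5x⁴ + 10x³ − 10x² + 4x, which agrees with the factored form given above.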

📊 Properties of the chromatic polynomial

📊 Degree and leading coefficient

If G is a labeled graph with n vertices, then P_G(x) has degree n and is monic.
Degree n: the highest power of x in P_G(x) is x^n.
Monic: the coefficient of x^n is 1.
This follows from the recursive structure: the empty graph on n vertices has P_G(x) = x^n, and in deletion-contraction G−α still has n vertices while G⊗α has only n − 1, so subtracting P_{G⊗α}(x) changes neither the degree nor the leading coefficient.

📊 Finding the chromatic number from the polynomial

The chromatic number C(G) is the smallest positive integer k such that P_G(k) > 0.
Example: for C₅, P_{C₅}(0) = 0, P_{C₅}(1) = 0, P_{C₅}(2) = 0, but P_{C₅}(3) > 0, so C(C₅) = 3.
Don't confuse: the chromatic number is the smallest k that works, not the degree of the polynomial or the number of roots.

12.6 Coloring regions with two colors

🧭 Overview

🧠 One-sentence thesis

Any set of circles drawn in the plane, no matter how they intersect, can always be colored using only two colors so that regions sharing a boundary arc have different colors.

📌 Key points (3–5)

Main result: circles in the plane divide it into regions that can always be 2-colored so adjacent regions differ in color.
What "adjacent" means: two regions are adjacent if they share a boundary arc of non-zero length; touching only at a single point does not count.
Proof method: induction on the number of circles—remove one circle, color the remaining regions, then restore the circle and flip colors inside it.
Graph connection: the regions form a bipartite graph (2-colorable graph), which by definition contains no odd cycles.
Common confusion: the graph has no odd cycles because each circle contributes an even number of edges to any cycle (in, out, in, out…), not because of the 2-coloring itself—though both facts are equivalent.

🎨 The two-color theorem for circles

🎨 Statement and definitions

Theorem 12.6.1: The regions formed by circles in the plane can be colored with two colors so that regions sharing a boundary arc are colored differently.

"Regions" are the areas created when circles divide the plane.
"Adjacent" means sharing a boundary arc of non-zero length.
Regions that only touch at a single point are not considered adjacent.

🔍 Why two colors suffice

No matter how many circles you draw or how they intersect, you never need more than two colors.
This is surprising because the number of regions grows quickly, yet the coloring complexity does not.

🧮 Proof by induction

🧮 Base case: one circle

When there is only one circle (n = 1), color the inside gray and the outside white.
This trivially satisfies the requirement: the two regions are adjacent and have different colors.

🔁 Inductive step: adding the nth circle

Assume the theorem holds for any n − 1 circles.
Remove one circle C from a drawing of n circles.
Color the remaining n − 1 circles' regions using two colors (gray and white) by the inductive hypothesis.
Restore circle C and reverse the color of every region inside C (gray ↔ white).

✅ Why the reversal works

Boundary arcs outside C: the regions on both sides are unchanged, so they still have different colors.
Boundary arcs inside C: the regions on both sides are flipped together, so two adjacent regions that differed before still differ.
Boundary arcs on C itself: before C was restored, the two sides of such an arc belonged to one region and had the same color; flipping only the inside makes them differ.

Example: Suppose before restoring C, two regions inside C are adjacent and colored gray and white. After flipping, they become white and gray—still different. A region inside C (now flipped) and a region outside C (not flipped) that share part of C's boundary will also differ in color.
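
Unrolling the induction suggests an explicit rule: every circle that contains a point flips its color once, so a region's color depends only on whether it lies inside an odd or even number of circles. The sketch below applies that rule; the circle coordinates are made up, and the representation of a circle as (center x, center y, radius) is an assumption of this example.

```python
def region_color(point, circles):
    """Color of the region containing `point` (assumed not to lie on any circle).

    circles is a list of (cx, cy, r) triples. Each enclosing circle flips
    the color once, starting from white outside all circles.
    """
    x, y = point
    inside = sum((x - cx) ** 2 + (y - cy) ** 2 < r ** 2 for cx, cy, r in circles)
    return "gray" if inside % 2 else "white"

# Two hypothetical overlapping unit circles.
circles = [(0, 0, 1), (1, 0, 1)]
for p in [(-0.5, 0), (0.5, 0), (1.5, 0), (5, 5)]:
    print(p, region_color(p, circles))
```

Crossing a single boundary arc changes the count of enclosing circles by exactly one, which is why adjacent regions always receive different colors under this rule.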

🔗 Connection to graph theory

🔗 From regions to graphs

Construction: place one vertex in each region and draw an edge between two vertices exactly when their regions are adjacent.
This graph is called the dual graph of the circle arrangement.

🟢 Bipartite graphs and 2-colorability

A graph is bipartite (or 2-colorable) if its vertices can be colored with two colors so that no two adjacent vertices share the same color.

The graph constructed from circle regions is bipartite because the regions themselves can be 2-colored.
By Theorem 12.2.2 (referenced in the excerpt), a graph is 2-colorable if and only if it contains no odd cycles.

🔄 Why there are no odd cycles

Key insight: each edge in the dual graph crosses exactly one circle.
When you walk around any cycle in this graph, you cross each circle an even number of times (in, out, in, out, …).
Therefore, every cycle has even length—no odd cycles exist.
Don't confuse: the absence of odd cycles is both a consequence of 2-colorability and an independent geometric fact about how circles intersect.

🖍️ Extensions and remarks

🖍️ Elementary school drawings

The theorem extends to "elementary school drawings": shapes (triangles, circles, quadrilaterals, etc.) drawn on top of a checkerboard grid.
Method: start with a checkerboard (already 2-colored), then inductively add each shape and apply the circle-flipping argument from Theorem 12.6.1.
Example: draw a triangle overlapping a checkerboard; reverse colors inside the triangle to maintain the 2-coloring property.

🧩 Exercises and applications

The excerpt includes exercises asking whether certain region arrangements can be 2-colored (testing understanding of adjacency and the theorem's scope).
It also asks: if G is a connected bipartite graph, how many ways can you 2-color it? (Once one vertex's color is fixed, every other color is determined, so there are exactly 2 colorings.)
If G is bipartite with n connected components, the counts multiply: each component can be flipped independently, giving 2^n colorings.
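
The counting claim can be tested on a small bipartite graph: since each component has two colorings, a graph with c components should have 2^c of them. The graph below (a path, a single edge, and an isolated vertex) is a hypothetical example, and the helper names are invented.

```python
from itertools import product

def count_two_colorings(n, edges):
    """Brute-force count of proper colorings of vertices 0..n-1 with 2 colors."""
    return sum(
        all(c[u] != c[w] for u, w in edges)
        for c in product(range(2), repeat=n)
    )

def count_components(n, edges):
    """Number of connected components, via a simple union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, w in edges:
        parent[find(u)] = find(w)
    return len({find(v) for v in range(n)})

# Path 0-1-2, edge 3-4, isolated vertex 5: three components.
n, edges = 6, [(0, 1), (1, 2), (3, 4)]
print(count_components(n, edges), count_two_colorings(n, edges))
```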

12.7 The four color theorem

🧭 Overview

🧠 One-sentence thesis

The four color theorem states that any planar map can be colored with only four colors so that adjacent regions have different colors, and while the full proof requires computer assistance, a simpler argument shows that six colors always suffice.

📌 Key points (3–5)

What the theorem claims: every planar map (or planar graph) can be colored with only 4 colors so that adjacent regions (or vertices) have different colors.
Historical proof challenge: the first correct proof (1976, Appel & Haken) required computer simulations and raised philosophical questions about what constitutes a valid proof.
Map-to-graph translation: any map coloring problem can be converted to a planar graph vertex coloring problem by placing a vertex in each region and connecting adjacent regions with edges.
Common confusion: proving 4-colorability is hard and requires computers, but proving 6-colorability (or 5-colorability) can be done with shorter, human-verifiable proofs.
Key structural fact: every planar graph has at least one vertex of degree at most 5, which is the foundation for the 6-colorability proof.

🗺️ From maps to graphs

🗺️ The cartographer's problem

In 1852, mapmakers needed to know: how many colors are needed to color any planar map so that regions sharing a border have different colors?
The excerpt shows a map example that requires at least 4 colors.
After coloring many maps successfully with 4 colors, the question became: can all planar maps be colored with only 4 colors?

🔄 Translating maps into planar graphs

The excerpt describes a systematic conversion process:

Map element	Graph element
Each region	One vertex
Adjacent regions (sharing a boundary)	An edge between their vertices
Valid map coloring (no adjacent regions same color)	Valid vertex coloring (no adjacent vertices same color)

This translation means proving the four color theorem for maps is equivalent to proving it for planar graphs.
Example: the excerpt shows a map with regions labeled N, C, O, I, U, A and demonstrates how it becomes a planar graph.

🎨 Why at least 4 colors are needed

The excerpt walks through a proof by contradiction for a specific map:

Suppose 3 colors suffice.
Color region N red, C blue, O green.
Region I is adjacent to red and green, so it must be blue.
Region U is adjacent to red and blue, so it must be green.
Region A is now adjacent to blue, red, and green—no valid color remains.
Therefore at least 4 colors are needed for this map.

Don't confuse: "at least 4 needed" (proven by example) vs. "at most 4 suffice" (the four color theorem itself, much harder to prove).

📜 History and philosophical implications

📜 Timeline of the four color theorem

1852: Francis Guthrie first raised the question in England.
1879: Alfred Kempe published an incorrect proof, believed correct for a decade.
1886: The problem was posed to students with instructions that solutions must fit on one page of manuscript and one page of diagrams (extremely difficult).
1976: Appel & Haken gave the first correct proof, using 1,000 hours of computer time to check various cases.
Present day: the proof has been simplified but still requires computer assistance.

🤔 Philosophical questions raised

The computer-assisted proof raised questions at the intersection of philosophy, mathematics, and computer science:

What is a proof?
What does it mean for a proof to be verified or certified by a computer?
What are the implications when a proof is too long for every step to be understood by a team of human readers?

The excerpt notes that computer-assisted proofs have since grown in number, scope, and importance.

🔢 Proving six colors suffice

🔑 Key structural lemma

Lemma 12.7.2: Every planar graph has a vertex of degree at most 5.

Proof idea (by contradiction):

Assume every vertex has degree at least 6.
Let e = number of edges, v = number of vertices.
The sum of all vertex degrees equals 2e (by a theorem from earlier in the text).
If every vertex has degree at least 6, then 2e ≥ 6v, so e ≥ 3v.
But a planar graph with v ≥ 3 vertices can have at most 3v − 6 edges (from an earlier exercise), so e ≤ 3v − 6 < 3v.
This contradicts e ≥ 3v.

Why this matters: knowing that every planar graph (and every subgraph) has a low-degree vertex is the key to the 6-colorability proof.

🧩 General coloring theorem

Theorem 12.7.3: Let d be a positive integer. Suppose every subgraph of a graph G contains a vertex whose degree is less than or equal to d. Then G is (d + 1)-colorable.

Proof strategy (vertex removal and induction):

Start with graph G with v vertices.
Remove vertices one-by-one to form a sequence: G = G_v, G_{v−1}, G_{v−2}, ..., G_3, G_2, G_1.
At each step, remove a vertex whose degree in the current subgraph is at most d (guaranteed by the assumption, since each G_i is a subgraph of G).
The base case G_1 (single vertex) is clearly (d + 1)-colorable.
If G_{i−1} is (d + 1)-colorable, then G_i is also (d + 1)-colorable, because the vertex added back has at most d neighbors in G_i, and we have d + 1 colors available.
Work backwards from G_1 to G_v to conclude that G is (d + 1)-colorable.

Example: when you remove a vertex with degree 5 or less, and you have 6 colors available, you can always find a color for that vertex when adding it back (since at most 5 colors are used by its neighbors).
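
The removal-and-restore strategy can be sketched as a two-pass procedure. This is an illustrative implementation under the theorem's hypothesis, with invented names; the octahedron (a planar graph in which every vertex has degree 4) serves as the test input with d = 5.

```python
def degenerate_coloring(adj, d):
    """Color a graph in which every subgraph has a vertex of degree <= d.

    Pass 1 removes low-degree vertices one by one (G_v, G_{v-1}, ..., G_1).
    Pass 2 adds them back in reverse order, each time using a color from
    0..d that none of the already-restored neighbors uses.
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    removal_order = []
    while remaining:
        v = next((u for u, ns in remaining.items() if len(ns) <= d), None)
        if v is None:
            raise ValueError("some subgraph has minimum degree greater than d")
        removal_order.append(v)
        for w in remaining[v]:
            remaining[w].discard(v)
        del remaining[v]

    color = {}
    for v in reversed(removal_order):
        used = {color[w] for w in adj[v] if w in color}   # at most d colors
        color[v] = next(c for c in range(d + 1) if c not in used)
    return color

# Octahedron: vertices 0..5, each adjacent to all others except its opposite.
octahedron = {v: [w for w in range(6) if w != v and w != v ^ 1] for v in range(6)}
coloring = degenerate_coloring(octahedron, d=5)
print(coloring)
```

With d = 5 this is the procedure behind the 6-colorability of planar graphs, although it may well use fewer than six colors on a particular graph.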

✅ Six-colorability corollary

Corollary 12.7.4: Any planar graph G, and hence any planar map, is 6-colorable.

Proof:

Any subgraph of a planar graph is also planar.
By Lemma 12.7.2, every planar subgraph has a vertex of degree at most d = 5.
By Theorem 12.7.3, G is colorable with d + 1 = 5 + 1 = 6 colors.

Don't confuse: this proof shows 6 colors suffice (relatively easy), but the four color theorem shows 4 colors suffice (much harder, requires computers). Similar proofs exist for 5-colorability, though they are "a bit longer."

🔗 Connection to bipartite graphs

🔗 Circle-region graphs are 2-colorable

The excerpt mentions an earlier example involving regions formed by circles in the plane:

Place a vertex in each region.
Draw an edge between two vertices when their regions are adjacent.
This graph is 2-colorable (bipartite).

🔄 Why no odd cycles exist

The excerpt provides a direct proof:

Each edge of this graph crosses a single circle.
When walking around a cycle, each circle C contributes an even number of edges (alternating in, out, in, out, ... of C).
Therefore every cycle has even length.
By an earlier theorem (Theorem 12.2.2), a graph is 2-colorable if and only if it has no odd cycles.

Don't confuse: the circle-region case (2-colorable) vs. general planar maps (4-colorable). The special structure of circle-formed regions makes them much easier to color.
