Linear Algebra

Section I: Solving Linear Systems

🧭 Overview

🧠 One-sentence thesis

Gauss's Method systematically reduces linear systems to determine whether they have a unique solution, infinitely many solutions, or no solution, depending on whether contradictory equations appear and whether all variables lead rows.

📌 Key points (3–5)

  • What Gauss's Method does: transforms a linear system through row operations into a simpler form that reveals the solution structure.
  • Three possible outcomes: unique solution (every variable leads a row, no contradictions), infinitely many solutions (some variables don't lead rows, no contradictions), or no solution (contradictory equation appears).
  • How to recognize each case: contradictory equation (e.g., 0 = -1) means no solution; variables not leading rows mean infinitely many solutions; otherwise, unique solution.
  • Common confusion: a system with more unknowns than equations does not always have infinitely many solutions—it can still have no solution if contradictions arise.
  • Row operations preserve solutions: operations like adding a multiple of one row to another do not change the solution set.

🔧 The Gauss's Method procedure

🔧 What row operations do

  • Gauss's Method uses row operations written in notation like "−(1/2)ρ₁ + ρ₂" (subtract half of row 1 from row 2) or "ρ₁ ↔ ρ₄" (swap rows 1 and 4).
  • These operations transform the system step-by-step into a simpler form.
  • The goal is to isolate variables and make the structure of solutions visible.

🔄 Back-substitution

  • Once the system is reduced, solve for variables starting from the bottom row and working upward.
  • Example: If the final form is "x + y + z = 5; −2y − z = −5; z = 4", first find z = 4, then substitute into the second equation to find y, then substitute both into the first to find x.
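
The bottom-up pass can be sketched in Python with exact fractions (an illustrative helper, not from the text; it assumes a triangular system given as rows of [coefficients..., constant] with nonzero diagonal entries):

```python
from fractions import Fraction

def back_substitute(rows):
    """Solve a triangular system, bottom equation first, working upward.
    Each row is [coefficients..., constant]; diagonal entries are nonzero."""
    n = len(rows)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        coeffs, const = rows[i][:-1], Fraction(rows[i][-1])
        # plug in the already-known values of the later variables
        known = sum(Fraction(coeffs[j]) * x[j] for j in range(i + 1, n))
        x[i] = (const - known) / Fraction(coeffs[i])
    return x

# x + y + z = 5;  -2y - z = -5;  z = 4  (the example above)
sol = back_substitute([[1, 1, 1, 5], [0, -2, -1, -5], [0, 0, 1, 4]])
```

Here z = 4 forces −2y − 4 = −5, so y = 1/2, and then x = 5 − 1/2 − 4 = 1/2.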

🧩 Recognizing solution types

✅ Unique solution

A system has a unique solution when there is no contradictory equation and every variable leads some row.

  • "Every variable leads a row" means each variable appears as the first nonzero entry in at least one equation after reduction.
  • Example: A system that reduces to "x − z = 0; y + 3z = 1; −3z = 3" has the unique solution x = −1, y = 4, z = −1.

♾️ Infinitely many solutions

A system has infinitely many solutions when there is no contradictory equation but at least one variable does not lead any row.

  • Variables that don't lead rows are called "free variables" and can take any value.
  • Example: The system "x − 3y + z = 1; 4y + z = 13" has variable z not leading a row, so there are many solutions.
  • Don't confuse: "many solutions" does not mean "all possible values work"—only certain combinations satisfy the equations.

❌ No solution

A system has no solution when a contradictory equation appears (an equation like 0 = k where k ≠ 0).

  • Example: Reduction produces "−x − y = 1; 0 = −1", which is impossible, so no solution exists.
  • A contradictory equation means the original equations are inconsistent.

🔍 More unknowns than equations

  • The excerpt shows that a system with more unknowns than equations can still have no solution.
  • Example: "x + y + z = 0; x + y + z = 1" has three unknowns and two equations, but the two equations contradict each other (the left sides are identical but the right sides differ).

🧮 Working with parameters

🧮 Systems depending on constants

  • Some systems include parameters (like b₁, b₂, b₃) instead of specific numbers.
  • Gauss's Method reveals conditions on the parameters for the system to be consistent.
  • Example: After reduction, if the final rows are "0 = 2b₁ − b₂ + b₃" and "0 = b₁ − b₂ + b₄", the system is consistent if and only if both b₃ = −2b₁ + b₂ and b₄ = −b₁ + b₂.

🧮 Always consistent systems

  • Some systems are consistent for any values of the parameters.
  • Example: A system that reduces to three equations with three leading variables "always has a unique solution" regardless of the parameter values.

🔗 Relationships between equations

🔗 Linear combinations of rows

  • One equation can be a linear combination of others (e.g., c₁ρ₁ + c₂ρ₂).
  • To check if a given row is a combination, solve for the coefficients c₁, c₂, etc.
  • Example: To see if "6x − 9y + 5z = −2" comes from "2x + y − z = 4" and "6x − 3y + z = 5", solve the system relating coefficients: 2c₁ + 6c₂ = 6, c₁ − 3c₂ = −9, etc.
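
That check can be sketched in code: solve for c₁ and c₂ from the first two coefficient equations (here via the 2×2 determinant formula), then confirm the remaining coordinates (a hypothetical helper; rows are tuples of coefficients plus the constant):

```python
from fractions import Fraction

def combo_coeffs(r1, r2, target):
    """Find (c1, c2) with c1*r1 + c2*r2 == target, or None if impossible.
    Solves the 2x2 system from the first two coordinates, then verifies."""
    a, b = Fraction(r1[0]), Fraction(r2[0])
    c, d = Fraction(r1[1]), Fraction(r2[1])
    det = a * d - b * c
    if det == 0:
        return None
    c1 = (Fraction(target[0]) * d - b * Fraction(target[1])) / det
    c2 = (a * Fraction(target[1]) - Fraction(target[0]) * c) / det
    # the remaining coordinates must match as well
    ok = all(c1 * r1[i] + c2 * r2[i] == target[i] for i in range(len(target)))
    return (c1, c2) if ok else None

# rows as (coefficients..., constant): 2x + y - z = 4 and 6x - 3y + z = 5
coeffs = combo_coeffs((2, 1, -1, 4), (6, -3, 1, 5), (6, -9, 5, -2))
```

Here the result is c₁ = −3 and c₂ = 2, so the given row is −3ρ₁ + 2ρ₂.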

🔗 Verifying solutions

  • When using substitution methods (solving for one variable in terms of others), always substitute back into all original equations to verify.
  • Example: If x = 1 − 3y is derived from one equation, check that it satisfies all other equations in the system.

📐 Special applications

📐 Trigonometric systems

  • Systems can involve trigonometric functions by substituting variables.
  • Example: Let x = sin α, y = cos β, z = tan γ, then solve the resulting linear system.
  • Important: After finding numerical solutions, check whether they correspond to valid angles (e.g., sin α = 2 has no real solution because sine values must be between −1 and 1).

📐 Finding polynomial functions

  • Given points that a polynomial must pass through, set up a system for the coefficients.
  • Example: If f(1) = 2, f(−1) = 6, f(2) = 3 for f(x) = ax² + bx + c, the system is "a + b + c = 2; a − b + c = 6; 4a + 2b + c = 3". Solving gives f(x) = x² − 2x + 3.
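
The coefficient system can be set up and solved mechanically (an illustrative routine with exact arithmetic; it assumes the three x-values are distinct so the system is nonsingular):

```python
from fractions import Fraction

def fit_quadratic(points):
    """Return (a, b, c) with a*x**2 + b*x + c through three given points,
    via Gaussian elimination and back-substitution (exact arithmetic)."""
    rows = [[Fraction(x) ** 2, Fraction(x), Fraction(1), Fraction(y)]
            for x, y in points]
    for i in range(3):                       # forward elimination
        p = next(r for r in range(i, 3) if rows[r][i] != 0)
        rows[i], rows[p] = rows[p], rows[i]  # swap in a nonzero pivot
        for r in range(i + 1, 3):
            m = rows[r][i] / rows[i][i]
            rows[r] = [v - m * w for v, w in zip(rows[r], rows[i])]
    coeffs = [Fraction(0)] * 3               # back-substitution
    for i in (2, 1, 0):
        known = sum(rows[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (rows[i][3] - known) / rows[i][i]
    return tuple(coeffs)

# f(1) = 2, f(-1) = 6, f(2) = 3
a, b, c = fit_quadratic([(1, 2), (-1, 6), (2, 3)])
```

The result (a, b, c) = (1, −2, 3) matches f(x) = x² − 2x + 3.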

📐 Chemical reactions

  • The excerpt notes that chemical reactions can be modeled as linear systems.
  • If a reaction occurs, there must be at least one nonzero solution, and any multiple of a solution is also a solution (representing the same reaction in different quantities).
Linear Systems: Gauss's Method and Solution Types

Section II: Linear Geometry

🧭 Overview

🧠 One-sentence thesis

Gauss's Method systematically reduces linear systems to reveal whether they have a unique solution, infinitely many solutions, or no solution at all, depending on whether contradictory equations appear and whether all variables lead rows.

📌 Key points (3–5)

  • What Gauss's Method does: transforms a system of linear equations through row operations into a simpler form that reveals the solution structure.
  • Three possible outcomes: unique solution (every variable leads a row, no contradictions), infinitely many solutions (some variables don't lead rows, no contradictions), or no solution (contradictory equation appears).
  • Common confusion: distinguishing when a system has infinitely many solutions vs. no solution—look for contradictory equations like "0 = -1" (no solution) vs. free variables that don't lead any row (infinitely many solutions).
  • Row operations preserve solutions: operations like adding a multiple of one equation to another do not change the solution set.
  • Coefficients matter: in some cases the coefficients alone determine whether a unique solution exists, independent of the constant terms.

🔧 How Gauss's Method works

🔧 The reduction process

Gauss's Method: a systematic procedure that uses row operations to transform a linear system into a simpler equivalent form.

  • Row operations include:
    • Adding a multiple of one equation (row) to another, written as "c·ρ₁ + ρ₂"
    • Swapping equations, written as "ρ₁ ↔ ρ₂"
  • Each operation transforms the system but preserves the solution set.
  • The goal is to create a "staircase" pattern in which each row's leading variable lies strictly to the right of the leading variable in the row above.

Example: The system "2x + 3y = 13" and "x − y = −1" becomes "2x + 3y = 13" and "−(5/2)y = −15/2" after the operation "−(1/2)ρ₁ + ρ₂", revealing y = 3 and then x = 2.

🔄 Back-substitution

  • Once the system is reduced, solve from the bottom equation upward.
  • Each equation should have fewer variables than the one above it.
  • Substitute known values into earlier equations to find remaining variables.

Example: From "z = 4", "2y + z = 2", and "x + 3y + z = 0", work backward: z = 4, then y = -1, then x = -1.

🎯 Recognizing solution types

✅ Unique solution

  • When it occurs: every variable leads some row, and no contradictory equation appears.
  • The final form has as many leading variables as there are unknowns.
  • Each variable can be determined by back-substitution.

Example: The reduced system "x + y - z = 10", "-4y + 3z = -20", "(5/4)z = 0" gives the unique solution (x, y, z) = (5, 5, 0).

♾️ Infinitely many solutions

  • When it occurs: no contradictory equation exists, but some variables do not lead any row.
  • These non-leading variables are free variables—they can take any value.
  • For each choice of free variable values, back-substitution yields a different solution.

Example: The system "x - 3y + z = 1" and "4y + z = 13" has z as a free variable (it doesn't lead a row), so there are infinitely many solutions.

❌ No solution

  • When it occurs: a contradictory equation appears, such as "0 = -1" or "0 = 3".
  • This happens when row operations produce an equation where all variable coefficients are zero but the constant is nonzero.

Example: The reduction "-x - y = 1" and "0 = -1" shows no solution exists.

Don't confuse: "0 = 0" (which is always true and indicates a redundant equation) with "0 = nonzero" (which is impossible and indicates no solution).
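
Assuming the system is already in echelon form, the three cases can be read off mechanically (a sketch; the function name and the row layout [coefficients..., constant] are illustrative):

```python
def classify(aug_rows):
    """Classify an echelon-form augmented system: 'none' if some row reads
    0 = nonzero, 'unique' if every variable leads a row, else 'many'."""
    nvars = len(aug_rows[0]) - 1
    leading = 0
    for row in aug_rows:
        coeffs, const = row[:-1], row[-1]
        if all(v == 0 for v in coeffs):
            if const != 0:
                return "none"        # contradictory equation, e.g. 0 = -1
        else:
            leading += 1             # this row has a leading variable
    return "unique" if leading == nvars else "many"
```

On the examples above: three rows with a pivot in every column classify as "unique", the two-row system with free z as "many", and any system containing 0 = −1 as "none".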

🧮 Parameters and consistency

🧮 Systems with parameters

  • When coefficients or constants contain parameters (like b₁, b₂, k), the solution type can depend on parameter values.
  • Reduction reveals consistency conditions: relationships among parameters that must hold for a solution to exist.

Example: The system reduced to "x - 3y = b₁", "10y = -3b₁ + b₂", "0 = 2b₁ - b₂ + b₃", "0 = b₁ - b₂ + b₄" is consistent only if both b₃ = -2b₁ + b₂ and b₄ = -b₁ + b₂.

🧮 Coefficient conditions for unique solutions

  • For a two-equation, two-variable system "ax + by = j" and "cx + dy = k":
    • If a ≠ 0, there is a unique solution if and only if ad - bc ≠ 0.
    • The constants j and k do not affect whether a unique solution exists, only its value.
  • This condition (ad - bc ≠ 0) ensures that reduction does not produce a zero coefficient for the second variable.

Why it matters: The structure of coefficients determines solution type before you even look at the constant terms.
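
A minimal check of the condition (the helper name is illustrative):

```python
def has_unique_solution(a, b, c, d):
    """For ax + by = j and cx + dy = k: a unique solution exists
    exactly when a*d - b*c != 0 (the constants j, k play no role)."""
    return a * d - b * c != 0

# 2x + 3y = j, x - y = k: one solution for any j, k
# x + y = j,  x + y = k: never a unique solution
```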

🔗 Properties and special cases

🔗 Equivalent systems and row operations

  • Two systems are equivalent if they have the same solution set.
  • Any row operation produces an equivalent system.
  • A given equation is a valid row operation result if it can be written as a linear combination of the original equations.

Example: To check if "6x - 9y + 5z = -2" comes from "2x + y - z = 4" and "6x - 3y + z = 5", solve for c₁ and c₂ in "c₁·(2, 1, -1, 4) + c₂·(6, -3, 1, 5) = (6, -9, 5, -2)". Finding c₁ = -3 and c₂ = 2 confirms it is -3ρ₁ + 2ρ₂.

🔗 More unknowns than equations

  • A system with more unknowns than equations can still have no solution.
  • Common misconception: "more unknowns than equations always means infinitely many solutions."

Example: The system "x + y + z = 0" and "x + y + z = 1" has three unknowns but two equations, yet has no solution because the equations contradict.

🔗 Substitution method

  • An alternative to Gauss's Method: solve one equation for a variable, substitute into others.
  • Important: always check potential solutions by substituting back into all original equations, not just the ones used in derivation.

Example: From "x = 1 - 3y" and "2x + 2y = 0", substituting gives y = 1/2, but this must be verified in both original equations.

Section III: Reduced Echelon Form

🧭 Overview

🧠 One-sentence thesis

Reduced echelon form provides a unique canonical representation of a matrix that simplifies solving linear systems and determining properties like linear independence and span.

📌 Key points (3–5)

  • Gauss-Jordan reduction extends Gaussian elimination by continuing upward to produce reduced echelon form (diagonal ones with zeros above and below).
  • Uniqueness: Every matrix has exactly one reduced echelon form, making it useful for comparing matrices and solution sets.
  • Row equivalence: Two matrices are row equivalent if and only if they have the same reduced echelon form.
  • Common confusion: Reduced echelon form vs. echelon form—reduced form requires zeros above leading entries, not just below.
  • Applications: Determining bases, checking linear independence, and classifying solution sets by their canonical forms.

🔧 Gauss-Jordan reduction process

🔧 What Gauss-Jordan adds to Gaussian elimination

Gauss-Jordan reduction: The process of applying row operations to transform a matrix into reduced echelon form, where each leading entry is 1 and all entries above and below each leading 1 are zero.

  • Gaussian elimination produces echelon form (zeros below leading entries).
  • Gauss-Jordan continues by eliminating entries above each leading 1.
  • The result is a diagonal-like structure in the pivot columns.

Example: After reaching echelon form with leading entries, use row operations like "−ρ₂ + ρ₁" to clear entries above the second pivot.

🔢 Mechanical steps

  1. First perform forward elimination (Gauss's Method) to get echelon form.
  2. Scale each pivot row so the leading entry becomes 1.
  3. Work upward: for each pivot column, eliminate all entries above the leading 1.
  4. The result has leading 1s and zeros in all other positions of pivot columns.

Don't confuse: The order matters—you must reach echelon form before working upward, or you may create new nonzero entries below.
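
The whole procedure can be sketched as a single pass that, at each pivot, scales the row and clears the entire pivot column; the output is the same reduced echelon form as the two-phase description (illustrative code, exact arithmetic):

```python
from fractions import Fraction

def rref(mat):
    """Gauss-Jordan reduction: returns the unique reduced echelon form."""
    m = [[Fraction(x) for x in row] for row in mat]
    nrows, ncols = len(m), len(m[0])
    r = 0
    for c in range(ncols):
        if r == nrows:
            break
        piv = next((i for i in range(r, nrows) if m[i][c] != 0), None)
        if piv is None:
            continue                       # no pivot in this column
        m[r], m[piv] = m[piv], m[r]        # row swap
        p = m[r][c]
        m[r] = [x / p for x in m[r]]       # scale leading entry to 1
        for i in range(nrows):             # clear the rest of the column
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [u - f * w for u, w in zip(m[i], m[r])]
        r += 1
    return m

R = rref([[2, 3], [1, -1]])   # a nonsingular 2x2 reduces to the identity
```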

🎯 Properties and uniqueness

🎯 Canonical form property

  • Every matrix has exactly one reduced echelon form.
  • This uniqueness makes reduced echelon form a "canonical representative" of its row-equivalence class.
  • Two matrices are row equivalent ⟺ they have identical reduced echelon forms.

Why this matters: You can test whether two matrices are row equivalent by reducing both and comparing the results.

📊 Classification by reduced form

The excerpt shows that matrices can be grouped into equivalence classes based on their reduced echelon form.

  • 2×2 matrices: four distinct types of reduced form (including the zero matrix, the identity, and forms with free variables).
  • Larger matrices: the number of distinct reduced forms depends on the number of pivots and free variables.

Example: All 2×2 matrices that reduce to the identity matrix form one equivalence class—the nonsingular 2×2 matrices.

🔍 Reading solution sets from reduced form

  • Free variables correspond to columns without leading 1s.
  • Each free variable becomes a parameter in the solution set.
  • The reduced form immediately shows whether the system has no solution, unique solution, or infinitely many solutions.

Don't confuse: A column without a leading 1 vs. a zero row—non-pivot columns indicate free variables; zero rows indicate dependent equations.

🧩 Linear independence and span connections

🧩 Testing linear independence

The Linear Combination Lemma (referenced in the excerpt) states that in reduced echelon form, no nonzero row is a linear combination of the other rows.

  • A set of vectors is linearly independent ⟺ the matrix formed from them reduces to a form with a pivot in every row.
  • If the reduced form has a zero row, the original vectors were linearly dependent.

🌐 Determining span

  • The span of a set of vectors equals the span of the nonzero rows in the reduced echelon form.
  • The number of nonzero rows in reduced form equals the dimension of the span.

Example: If three vectors reduce to a matrix with two nonzero rows, the vectors span a two-dimensional subspace.
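
Counting the nonzero rows left after forward elimination gives that dimension (a sketch; `rank` is an assumed helper name, and the vectors are taken as rows):

```python
from fractions import Fraction

def rank(mat):
    """Forward elimination; the number of pivot rows found equals the
    number of nonzero rows in echelon form, i.e. the span's dimension."""
    m = [[Fraction(x) for x in row] for row in mat]
    nrows, ncols = len(m), len(m[0])
    r = 0
    for c in range(ncols):
        if r == nrows:
            break
        piv = next((i for i in range(r, nrows) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, nrows):
            f = m[i][c] / m[r][c]
            m[i] = [u - f * w for u, w in zip(m[i], m[r])]
        r += 1
    return r

# the third vector is the sum of the first two, so the span is a plane
dim = rank([[1, 0, 1], [0, 1, 1], [1, 1, 2]])
```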

⚙️ Basis extraction

From the excerpt's discussion: given a spanning set, reduce the matrix formed by the vectors. The columns corresponding to pivot columns in the reduced form identify a basis subset.

Don't confuse: Pivot columns in the original matrix vs. in the reduced form—the positions of pivots tell you which original vectors to keep, not the reduced vectors themselves.

📐 Computational considerations

📐 Accuracy issues

The excerpt includes exercises on computational accuracy, noting that:

  • Rounding errors accumulate during row operations.
  • Small pivots can amplify errors.
  • Row swapping can improve numerical stability.

Example: When a pivot is very small (like 0.0003), dividing by it creates large multipliers that magnify rounding errors.
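
A 2×2 float experiment shows the effect (illustrative; the value of eps and the equations are chosen only to demonstrate the phenomenon):

```python
def solve2(a11, a12, b1, a21, a22, b2):
    """Eliminate x from the second equation using the first row's pivot,
    then back-substitute (plain floats, no pivoting strategy)."""
    m = a21 / a11
    y = (b2 - m * b1) / (a22 - m * a12)
    x = (b1 - a12 * y) / a11
    return x, y

eps = 1e-20
# eps*x + y = 1 and x + y = 2; the true solution is very close to x = y = 1
bad = solve2(eps, 1.0, 1.0, 1.0, 1.0, 2.0)   # tiny pivot used first
good = solve2(1.0, 1.0, 2.0, eps, 1.0, 1.0)  # rows swapped before eliminating
```

Without the swap the computed x collapses to 0.0; with the swap both components come out as 1.0.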

🔄 Equivalence relations

The excerpt proves that row equivalence is an equivalence relation:

  • Reflexive: Every matrix is row equivalent to itself.
  • Symmetric: If A is row equivalent to B, then B is row equivalent to A.
  • Transitive: If A ~ B and B ~ C, then A ~ C.

This justifies treating reduced echelon form as a canonical representative of each equivalence class.

🎲 Uniqueness of representation

The excerpt establishes that if a set is a basis, then every vector has a unique representation as a linear combination of basis elements. The reduced echelon form makes this uniqueness computationally explicit—the solution to the system is unique when there are no free variables.

Section I: Definition of Vector Space

🧭 Overview

🧠 One-sentence thesis

A vector space is a set equipped with addition and scalar multiplication operations that satisfy ten specific algebraic properties, and subspaces are subsets that inherit these operations and remain closed under them.

📌 Key points (3–5)

  • What a vector space is: a set with two operations (addition and scalar multiplication) satisfying ten conditions—five for addition (closure, commutativity, associativity, zero element, additive inverse) and five for scalar multiplication (closure, two distributive laws, associativity, identity).
  • How to verify a vector space: check all ten conditions systematically; many conditions rely on properties of real numbers (commutativity, associativity) so the checks are often routine.
  • What a subspace is: a nonempty subset of a vector space that is closed under the inherited operations; by Lemma 2.9, checking only nonemptiness and closure suffices.
  • Common confusion—subset vs subspace: not every subset is a subspace; for example, a set may contain vectors but fail closure (adding two members gives a non-member), or it may lack the zero vector.
  • Spanning sets: the span of a set S is the collection of all linear combinations of vectors in S; it is always a subspace, and finding a spanning set gives a parametrization of the subspace.

📐 The ten conditions for a vector space

📐 Conditions on addition (1–5)

A vector space V over R consists of a set V with two operations: vector addition (denoted +) and scalar multiplication (denoted ·), satisfying ten conditions.

Conditions 1–5 concern addition:

  • (1) Closure under addition: if v, w ∈ V then v + w ∈ V.
  • (2) Commutativity: v + w = w + v for all v, w ∈ V.
  • (3) Associativity: (u + v) + w = u + (v + w) for all u, v, w ∈ V.
  • (4) Zero vector: there exists a zero vector 0 ∈ V such that v + 0 = v for all v ∈ V.
  • (5) Additive inverse: for each v ∈ V there exists w ∈ V such that v + w = 0.

Why these matter:

  • Conditions (2) and (3) are inherited from real-number arithmetic when vectors have real components; the verifications in the excerpt repeatedly invoke "real number addition commutes" or "real number addition associates."
  • Condition (4) pins down a unique additive identity (the excerpt notes that every vector space has only one zero vector).
  • Condition (5) ensures every vector has a unique additive inverse (the excerpt proves this in Two.I.1.38).

Example: For the space of 2×2 real matrices, closure (1) holds because the sum of two 2×2 matrices is a 2×2 matrix; commutativity (2) holds entry-by-entry because real addition commutes.

📐 Conditions on scalar multiplication (6–10)

Conditions 6–10 concern scalar multiplication (r, s denote real scalars):

  • (6) Closure under scalar multiplication: if r ∈ R and v ∈ V then r · v ∈ V.
  • (7) Distributivity over vector addition: (r + s) · v = r · v + s · v.
  • (8) Distributivity over scalar addition: r · (v + w) = r · v + r · w.
  • (9) Associativity of scalar multiplication: (rs) · v = r · (s · v).
  • (10) Identity: 1 · v = v.

Why these matter:

  • Condition (6) ensures scalar multiples stay in the space.
  • Conditions (7)–(9) link scalar arithmetic with vector operations; they are often verified by expanding definitions and using real-number properties.
  • Condition (10) says that multiplying by the scalar 1 does nothing.

Example: For 2×2 matrices, (7) expands as (r + s)·(matrix) = (r + s)·(each entry) = r·(each entry) + s·(each entry) = r·(matrix) + s·(matrix).

Don't confuse: Scalar multiplication r · v is not the same as vector addition v + w; one mixes a number and a vector, the other combines two vectors.

🧪 Verifying a candidate space

🧪 Typical verification workflow

The excerpt's exercises (Two.I.1.19, Two.I.1.20, Two.I.1.21) follow a standard pattern:

  1. State three example elements to show the set is nonempty.
  2. Check closure (1) and (6) explicitly by showing sums and scalar multiples have the required form.
  3. Check (2), (3), (7)–(10) by appealing to real-number properties (these checks are often "routine" or "similar to the prior item").
  4. Check (4) and (5) by identifying the zero vector and showing each vector has an additive inverse in the set.

Example (from Two.I.1.19(b)): The set P = {a + bx | a − 2b = 0} (linear polynomials with a restriction).

  • Closure (1): if a + bx and c + dx satisfy a − 2b = 0 and c − 2d = 0, then (a + c) + (b + d)x satisfies (a + c) − 2(b + d) = (a − 2b) + (c − 2d) = 0.
  • Zero element (4): 0 + 0x satisfies 0 − 2·0 = 0, so it is in P.
  • Additive inverse (5): if a + bx ∈ P then −a − bx ∈ P because (−a) − 2(−b) = −(a − 2b) = 0.
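
These closure checks can be spot-checked in code, representing a + bx by the pair (a, b) (a finite spot-check over sample members and scalars, not a proof; the helper names are illustrative):

```python
from fractions import Fraction
from itertools import product

def in_P(p):
    """Membership in P = {a + bx | a - 2b = 0}, with p = (a, b)."""
    a, b = p
    return a - 2 * b == 0

def closed_on_samples(member, samples, scalars):
    """Check that sums and scalar multiples of the samples stay members."""
    sums = all(member((u[0] + v[0], u[1] + v[1]))
               for u, v in product(samples, repeat=2))
    scales = all(member((r * v[0], r * v[1]))
                 for r in scalars for v in samples)
    return sums and scales

samples = [(2, 1), (0, 0), (-4, -2), (6, 3)]   # each satisfies a = 2b
scalars = [Fraction(-1), 0, Fraction(5, 2)]
ok = closed_on_samples(in_P, samples, scalars)
```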

🧪 Common failures

When a set is not a vector space:

  • Not closed under addition (Two.I.1.22(a)–(c)): adding two members gives a non-member.
    • Example: the set of 3-tall vectors with first component 1 is not closed; (1 0 0) + (1 0 0) = (2 0 0) is not in the set.
  • Not closed under scalar multiplication (Two.I.1.22(d)): multiplying by a scalar gives a non-member.
    • Example: {polynomials with all coefficients positive} fails because (−1)·(1 + x + x²) has negative coefficients.
  • Empty set (Two.I.1.22(e)): violates condition (4) because there is no zero vector.
  • Operations not inherited (Two.I.2.38(a)): if the operations are defined differently from the enclosing space, the set may not be a vector space even if it satisfies some conditions.

Don't confuse: A set containing the zero vector is not automatically a subspace; it must also be closed. Conversely, a set that is closed under addition and scalar multiplication but does not contain the zero vector cannot be a subspace (Two.I.2.38(c)).

🔲 Subspaces

🔲 Definition and Lemma 2.9

A subspace of a vector space V is a subset S ⊆ V that is itself a vector space under the operations inherited from V.

Lemma 2.9 (the shortcut): To verify that a nonempty subset S of a vector space V is a subspace, it suffices to check:

  • S is nonempty.
  • S is closed under addition: if v, w ∈ S then v + w ∈ S.
  • S is closed under scalar multiplication: if r ∈ R and v ∈ S then r · v ∈ S.

Why this works:

  • The other eight conditions (commutativity, associativity, etc.) are automatically inherited from V.
  • Nonemptiness plus closure under scalar multiplication (with r = 0) implies S contains the zero vector (Two.I.1.40(a)).
  • Closure under scalar multiplication (with r = −1) implies S contains additive inverses.

Example (Two.I.2.20(a)): The set of 2×2 matrices with zero in positions (1,2) and (2,1) is a subspace of M₂ₓ₂.

  • Nonempty: it contains the zero matrix.
  • Closed under addition: sum of two such matrices is again such a matrix.
  • Closed under scalar multiplication: scalar multiple of such a matrix is again such a matrix.

🔲 Trivial and improper subspaces

  • Trivial subspace: {0}, the set containing only the zero vector; it is a subspace of every vector space (Two.I.2.36).
  • Improper subspace: the entire space V is a subspace of itself (Two.I.2.31).
  • All other subspaces are proper and nontrivial.

Don't confuse: "Subspace" does not mean "smaller space"; the entire space is a subspace of itself.

🔲 Operations on subspaces

Intersection (Two.I.2.45(a)):

  • The intersection A ∩ B of two subspaces is always a subspace.
  • Proof: both contain 0, so A ∩ B is nonempty; if v, w ∈ A ∩ B then v, w ∈ A and v, w ∈ B, so r·v + s·w ∈ A and r·v + s·w ∈ B, hence r·v + s·w ∈ A ∩ B.

Union (Two.I.2.45(b)):

  • The union A ∪ B is a subspace only if one subspace contains the other (i.e., A ⊆ B or B ⊆ A).
  • Counterexample: in R³, let A be the x-axis and B be the y-axis; then (1 0 0) ∈ A and (0 1 0) ∈ B, but (1 1 0) ∉ A ∪ B.

Complement (Two.I.2.45(c)):

  • The complement V − A is never a subspace (it does not contain the zero vector).

🌐 Spanning sets

🌐 Definition of span

The span of a set S, denoted [S], is the set of all linear combinations of vectors in S: [S] = {r₁s₁ + r₂s₂ + ··· + rₙsₙ | r₁, …, rₙ ∈ R, s₁, …, sₙ ∈ S, n ∈ N}.

Key properties:

  • The span of any set S is a subspace (it is nonempty because it contains 0 = 0·s for any s ∈ S, and it is closed by the definition of linear combination).
  • If S ⊆ T then [S] ⊆ [T] (Two.I.2.48(a)).
  • The span does not depend on the enclosing space (Two.I.2.46); a linear combination in a subspace W gives the same sum as in the larger space V.

Example (Two.I.2.27(a)): The set {(c, b, c) | b, c ∈ R} can be written as {b(0, 1, 0) + c(1, 0, 1) | b, c ∈ R}, so it is the span of {(0, 1, 0), (1, 0, 1)}.

🌐 Finding a spanning set (parametrization)

Workflow:

  1. Write the subspace description as a set of vectors satisfying some equations.
  2. Solve for some components in terms of others (the free variables become parameters).
  3. Factor out the parameters to express each vector as a linear combination of fixed vectors.
  4. Those fixed vectors form a spanning set.

Example (Two.I.2.27(c)): The subspace {(a, b, c, d) | a + 3b = 0, 2a − c − d = 0}.

  • Solve: b = −(c + d)/6 and a = (c + d)/2.
  • Rewrite: (a, b, c, d) = ((c + d)/2, −(c + d)/6, c, d) = c·(1/2, −1/6, 1, 0) + d·(1/2, −1/6, 0, 1).
  • Spanning set: {(1/2, −1/6, 1, 0), (1/2, −1/6, 0, 1)}.
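The workflow above can be sanity-checked numerically. This is a minimal sketch, assuming NumPy is available: it verifies that both candidate spanning vectors, and an arbitrary combination of them, satisfy the subspace's defining equations a + 3b = 0 and 2a − c − d = 0.

```python
import numpy as np

# Candidate spanning vectors read off from the parametrization above.
v1 = np.array([0.5, -1/6, 1.0, 0.0])
v2 = np.array([0.5, -1/6, 0.0, 1.0])

def satisfies(v):
    a, b, c, d = v
    # Defining equations of the subspace: a + 3b = 0 and 2a - c - d = 0.
    return np.isclose(a + 3*b, 0) and np.isclose(2*a - c - d, 0)

# Both spanning vectors lie in the subspace, and so does any combination,
# because the defining equations are homogeneous.
combo = 3.0*v1 - 2.5*v2
print(satisfies(v1), satisfies(v2), satisfies(combo))  # True True True
```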

Don't confuse: A spanning set is not unique; many different sets can span the same subspace (Two.I.2.44).

🌐 Membership in a span

To check if a vector v is in [S]:

  • Set up the equation v = r₁s₁ + ··· + rₙsₙ (where s₁, …, sₙ are the vectors in S).
  • This gives a linear system in the unknowns r₁, …, rₙ.
  • If the system has a solution, v ∈ [S]; otherwise v ∉ [S].

Example (Two.I.2.22): Is (1, 0, 3) in the span of {(2, 1, −1), (1, −1, 1)}?

  • Set up: (1, 0, 3) = c₁(2, 1, −1) + c₂(1, −1, 1).
  • System: 2c₁ + c₂ = 1, c₁ − c₂ = 0, −c₁ + c₂ = 3.
  • Row reduction shows no solution, so (1, 0, 3) ∉ span.
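The membership test can be phrased in terms of rank: v ∈ [S] exactly when appending v as an extra column does not raise the rank of the matrix whose columns are the vectors of S. A minimal sketch, assuming NumPy is available (the vector (3, 0, 0) is an extra illustrative example, not from the text):

```python
import numpy as np

# Columns are the candidate spanning vectors (2, 1, -1) and (1, -1, 1).
A = np.array([[2.0, 1.0],
              [1.0, -1.0],
              [-1.0, 1.0]])

def in_span(A, v):
    # v is a combination of A's columns iff adding it doesn't raise the rank.
    return np.linalg.matrix_rank(np.column_stack([A, v])) == np.linalg.matrix_rank(A)

v = np.array([1.0, 0.0, 3.0])
w = np.array([3.0, 0.0, 0.0])  # = 1*(2,1,-1) + 1*(1,-1,1), so it is in the span
print(in_span(A, v), in_span(A, w))  # False True
```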

🌐 Span and closure

Lemma (Two.I.2.37, Linear Combination Lemma):

  • The span of the span equals the span: [[S]] = [S].
  • In words: a linear combination of linear combinations of vectors from S is itself a linear combination of vectors from S.

Adding a vector to a spanning set (Two.I.2.44):

  • If v is already in [S], then [S ∪ {v}] = [S] (adding v does not enlarge the span).
  • Conversely, if [S ∪ {v}] = [S], then v ∈ [S].

🧮 Examples and edge cases

🧮 Standard vector spaces

The excerpt verifies several standard spaces:

  • P₁ (Two.I.1.19(a)): linear polynomials a + bx with real coefficients; spanning set {1, x}.
  • M₂ₓ₂ (Two.I.1.20(a)): 2×2 real matrices; spanning set consists of the four matrices with a single 1 and three 0s.
  • (Two.I.1.21(a)): three-component row vectors; spanning set {(1, 0, 0), (0, 1, 0), (0, 0, 1)}.
  • Functions (Two.I.1.29): real-valued functions on R with pointwise addition and scalar multiplication; the subset of functions satisfying f(7) = 0 is a subspace.

🧮 Non-standard operations

Exotic vector space (Two.I.2.38(b)):

  • The set of three-tall vectors with x + y + z = 1, equipped with operations:
    • Addition: (x₁, y₁, z₁) + (x₂, y₂, z₂) = (x₁ + x₂ − 1, y₁ + y₂, z₁ + z₂).
    • Scalar multiplication: r·(x, y, z) = (rx − r + 1, ry, rz).
  • The zero element is (1, 0, 0) (not the usual zero vector).
  • All ten conditions can be verified; the space is isomorphic to the plane x + y + z = 0 in R³, "displaced" by 1 along the x-axis.

Don't confuse: A set with non-inherited operations may still be a vector space, but you must verify all ten conditions from scratch (Two.I.2.38(a) gives a counterexample where the operations fail).

🧮 Infinite-dimensional spaces

Polynomial spaces (Two.I.2.28(e)):

  • The space of all polynomials (of any degree) has spanning set {1, x, x², x³, …}.
  • Any nontrivial vector space over R is infinite (Two.I.1.40(c)): if v ≠ 0 then {k·v | k ∈ R} is infinite.

Function spaces (Two.I.2.30):

  • Even functions {f | f(−x) = f(x)} form a subspace of the space of all functions.
  • Odd functions {f | f(−x) = −f(x)} also form a subspace.
  • Differentiable functions form a subspace (Two.I.1.41).

🧮 Degenerate cases

  • R¹ subspaces (Two.I.2.34): the only subspaces of R¹ are {0} and R¹ itself (any nonzero vector spans the whole line).
  • Empty set: not a vector space (violates condition 4).
  • Singleton {v} where v ≠ 0: not a subspace (does not contain 0, not closed under addition).

Linear Independence

Section II: Linear Independence

🧭 Overview

🧠 One-sentence thesis

A set of vectors is linearly independent if the only way to combine them to produce the zero vector is the trivial combination where all coefficients are zero, and this property determines whether vectors provide genuinely distinct directions in a vector space.

📌 Key points (3–5)

  • Core definition: A set is linearly independent when the equation c₁·v₁ + c₂·v₂ + ⋯ + cₙ·vₙ = 0 forces all coefficients c₁, c₂, …, cₙ to be zero; otherwise the set is linearly dependent.
  • How to test: Set up the linear relationship equal to the zero vector, form a homogeneous system, and apply Gauss's Method—if the only solution is the trivial one (all coefficients zero), the set is independent.
  • Common confusion: A set containing the zero vector is always dependent (because you can multiply the zero vector by any nonzero scalar); a two-vector set is dependent if and only if one vector is a scalar multiple of the other.
  • Size limits: In Rⁿ, any set with more than n vectors must be linearly dependent (more unknowns than equations guarantees a free variable).
  • Why it matters: Independence determines whether vectors provide new directions or are redundant; it underlies the concept of dimension and basis.

🔍 Testing for independence

🔍 The fundamental procedure

To determine whether a set {v₁, v₂, …, vₙ} is linearly independent:

  1. Write the linear relationship: c₁·v₁ + c₂·v₂ + ⋯ + cₙ·vₙ = 0 (where 0 is the zero vector).
  2. This produces a homogeneous linear system (right-hand side all zeros).
  3. Apply Gauss's Method to the coefficient matrix.
  4. Independent: if the only solution is c₁ = 0, c₂ = 0, …, cₙ = 0 (the trivial solution).
  5. Dependent: if there are infinitely many solutions (at least one free variable).

Linear independence: A set of vectors is linearly independent if the only solution to c₁·v₁ + c₂·v₂ + ⋯ + cₙ·vₙ = 0 is the trivial solution where all coefficients are zero.

Example: For the set {(1, 2, 0), (−1, 1, 0)} in R³, the system becomes:

  • c₁ − c₂ = 0
  • 2c₁ + c₂ = 0
  • 0 = 0

Reduction gives c₁ = 0 and c₂ = 0 as the only solution, so the set is independent.
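The same conclusion follows from a rank computation: a set is independent exactly when the matrix with those vectors as columns has full column rank. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

# The vectors (1, 2, 0) and (-1, 1, 0) as columns.
A = np.array([[1.0, -1.0],
              [2.0, 1.0],
              [0.0, 0.0]])
# Full column rank means only the trivial solution to A c = 0.
independent = np.linalg.matrix_rank(A) == A.shape[1]
print(independent)  # True
```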

🔍 Exhibiting dependence

When a set is dependent, you must show a nontrivial linear relationship (at least one coefficient nonzero).

Example: The set {(1, −3, 5), (2, 2, 4), (4, −4, 14)} is dependent because:

  • After Gauss's Method, there is a free variable.
  • Setting c₃ = 1 gives c₂ = −1 and c₁ = −2.
  • So: −2·(1, −3, 5) − 1·(2, 2, 4) + 1·(4, −4, 14) = (0, 0, 0).
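A dependence relation is easy to verify once found. This sketch, assuming NumPy is available, checks the combination with coefficients c₁ = −2, c₂ = −1, c₃ = 1 on the set {(1, −3, 5), (2, 2, 4), (4, −4, 14)}:

```python
import numpy as np

v1 = np.array([1.0, -3.0, 5.0])
v2 = np.array([2.0, 2.0, 4.0])
v3 = np.array([4.0, -4.0, 14.0])

# The claimed nontrivial combination; it should be the zero vector.
relation = -2*v1 - 1*v2 + 1*v3
dependent = np.allclose(relation, 0)
print(dependent)  # True
```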

Don't confuse: "Dependent" does not mean "all coefficients are nonzero"—it means "at least one coefficient can be nonzero."

🎯 Special cases and quick tests

🎯 Singleton sets

A set containing a single vector {v} is linearly independent if and only if v ≠ 0.

  • Why: The relationship c·v = 0 forces c = 0 only when v is nonzero.
  • Zero vector rule: Any set containing the zero vector is automatically dependent (you can write 5·0 = 0, a nontrivial relationship).

🎯 Two-vector sets

A set {v, w} with two vectors is linearly independent if and only if neither vector is a scalar multiple of the other.

  • Equivalently: dependent ⟺ one is a multiple of the other.
  • Why: If v = r·w, then 1·v − r·w = 0 is a nontrivial dependence.
  • Example: {(1, 0), (2, 0)} is dependent because (2, 0) = 2·(1, 0).

Don't confuse: two nonzero vectors pointing in different directions are independent—"not a scalar multiple of the other" is the only requirement, not orthogonality or anything stronger.

🎯 Sets with more vectors than dimension

In Rⁿ, any set with more than n vectors is automatically linearly dependent.

  • Why: The homogeneous system has more unknowns (coefficients) than equations (components), so Gauss's Method must leave at least one variable free.
  • Example: In R², any set of 3 or more vectors is dependent; in R⁴, any set of 5 or more vectors is dependent.
  • Any maximal linearly independent set in Rⁿ has exactly n vectors (such sets are not unique, but all have the same size).

🧩 Working with function spaces

🧩 Functions as vectors

The excerpt includes examples where functions (polynomials, trigonometric functions) are treated as vectors.

Testing independence of functions:

  1. Write c₁·f(x) + c₂·g(x) + ⋯ = Z(x), where Z is the zero function (Z(x) = 0 for all x).
  2. Plug in convenient values of x to generate a system of equations.
  3. Solve for the coefficients.

Example: To test {1, sin(x), sin(2x)}:

  • Plug in x = π, x = π/2, x = π/4 to get three equations.
  • The resulting system has only the trivial solution, so the set is independent.
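The plug-in-values technique amounts to building a matrix whose rows evaluate the functions at the sample points and checking its rank. A minimal sketch, assuming NumPy is available; a nonsingular sample matrix shows that only the trivial combination vanishes at all x:

```python
import numpy as np

xs = np.array([np.pi, np.pi/2, np.pi/4])
# Each row evaluates (1, sin x, sin 2x) at one sample point.
A = np.column_stack([np.ones_like(xs), np.sin(xs), np.sin(2*xs)])
# Rank 3 means the only coefficients vanishing at all three points are zero,
# so no nontrivial combination can be the zero function.
independent = np.linalg.matrix_rank(A) == 3
print(independent)  # True
```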

🧩 Using known identities

Sometimes you can spot dependence using known relationships.

Example: The set {4·sin²(x), cos²(x), 2} is dependent because:

  • The identity sin²(x) + cos²(x) = 1 gives 2·sin²(x) + 2·cos²(x) = 2 for every x.
  • Rearranging: (1/2)·(4·sin²(x)) + 2·cos²(x) − 1·(2) = 0 is a nontrivial linear relationship among the three functions, so the set is dependent.
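A quick numeric check of this relationship, assuming NumPy is available: the combination should vanish at every sample point, since a dependence among functions must hold for all x.

```python
import numpy as np

xs = np.linspace(0, 2*np.pi, 100)
# The relation (1/2)*(4 sin^2 x) + 2*cos^2 x - 1*(2) should be zero for every x.
residual = 0.5*(4*np.sin(xs)**2) + 2*np.cos(xs)**2 - 2
is_dependence = np.allclose(residual, 0)
print(is_dependence)  # True
```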

Don't confuse: A relationship like cos(x) = r·x for some scalar r is not a linear relationship—it would need to hold for all x, which it doesn't (cosine is not a scalar multiple of the identity function).

🔧 Theoretical properties

🔧 Subsets and supersets

  • Any subset of an independent set is independent (fewer vectors means fewer ways to form dependencies).
  • Any superset of a dependent set is dependent (if you already have a dependence, adding more vectors doesn't fix it).

Example: If {v₁, v₂, v₃} is independent, then {v₁, v₂} is also independent.

🔧 Uniqueness of representation

Key result from the excerpt: If S is linearly independent and a vector v can be written as a linear combination of vectors from S, then that representation is unique (up to reordering and omitting zero-coefficient terms).

  • Why: Suppose v = c₁·s₁ + ⋯ + cₙ·sₙ = d₁·t₁ + ⋯ + dₘ·tₘ (where all vectors are from S).
  • Rearranging: c₁·s₁ + ⋯ + cₙ·sₙ − d₁·t₁ − ⋯ − dₘ·tₘ = 0.
  • By independence, all coefficients are zero, so the two representations are the same.

Don't confuse: A dependent set can have multiple distinct representations of the same vector.

🔧 Reducing to independent subsets

Important theorem: Every finite set S has a linearly independent subset T with the same span: [T] = [S].

  • How to find it: If S is dependent, by Corollary 1.19 some vector is a linear combination of others—remove it. Repeat until independent.
  • Why it works: Removing a vector that's a combination of others doesn't change the span (Corollary 1.3).

📊 Computational examples

📊 Matrix form and row reduction

The excerpt shows many examples of setting up the coefficient matrix and reducing it.

| Set | System after reduction | Conclusion |
| --- | --- | --- |
| {(1, 2, 0), (−1, 1, 0)} | Unique solution c₁ = c₂ = 0 | Independent |
| {(1, −3, 5), (2, 2, 4), (4, −4, 14)} | Free variable (c₃) | Dependent |
| {(0, 0, −1), (1, 0, 4), (0, 0, 0)} | Contains zero vector | Dependent |

Pattern: After Gauss's Method, if the reduced matrix has:

  • A pivot in every column → independent.
  • At least one free variable → dependent.

📊 Determinant criterion (for square matrices)

For three vectors in R³, the excerpt derives:

  • The set {(a, d, g), (b, e, h), (c, f, i)} is independent ⟺ aei + bgf + cdh − hfa − idb − gec ≠ 0.
  • This expression is the determinant formula (though the excerpt doesn't name it).
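The expanded expression can be checked against a determinant routine. A minimal sketch, assuming NumPy is available; the sample numbers are illustrative:

```python
import numpy as np

# Vectors (a,d,g), (b,e,h), (c,f,i) as the columns of a 3x3 matrix.
a, d, g = 1.0, 2.0, 0.0
b, e, h = 2.0, 2.0, 4.0
c, f, i = 4.0, -4.0, 14.0
M = np.array([[a, b, c],
              [d, e, f],
              [g, h, i]])

# The formula from the text, term by term.
formula = a*e*i + b*g*f + c*d*h - h*f*a - i*d*b - g*e*c
print(np.isclose(formula, np.linalg.det(M)))  # True: it is det(M)
independent = not np.isclose(formula, 0)      # nonzero here, so independent
```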

Don't confuse: This only applies when the number of vectors equals the dimension of the space.


Section III: Basis and Dimension

Section III: Basis and Dimension

🧭 Overview

🧠 One-sentence thesis

A basis is a linearly independent spanning set for a vector space, and the dimension—the number of vectors in any basis—is a fundamental invariant that characterizes the space and determines properties like the uniqueness of representations and the solvability of linear systems.

📌 Key points (3–5)

  • What a basis is: A set of vectors that both spans the space and is linearly independent; equivalently, every vector has a unique representation as a linear combination of basis vectors.
  • Dimension as invariant: All bases for a given space have the same number of elements, called the dimension; this number is well-defined and does not depend on which basis you choose.
  • How to find bases: Parametrize the solution set or span, then verify linear independence; alternatively, use row reduction on transposed vectors to eliminate redundancies.
  • Common confusion—span vs. independence: A set can span without being independent (redundant vectors), or be independent without spanning (too few vectors); only when both hold do you have a basis.
  • Why dimension matters: Dimension determines the number of free variables in homogeneous systems, the rank of matrices, and whether subspaces can be complements.

🧩 What is a basis?

🧩 Definition and uniqueness

Basis: A set of vectors in a vector space that is both linearly independent and spans the space.

  • A basis gives every vector in the space exactly one representation as a linear combination of the basis vectors.
  • The uniqueness property is key: if you can write a vector as a combination of basis vectors in two different ways, the coefficients must be identical.
  • Example: In the polynomial space P₂, the set {1, x, x²} is a basis; any polynomial a₀ + a₁x + a₂x² has unique coefficients a₀, a₁, a₂.

🔍 Two conditions must both hold

  • Spanning: Every vector in the space can be written as a linear combination of the basis vectors.
  • Linear independence: The only way to combine basis vectors to get the zero vector is the trivial combination (all coefficients zero).
  • Don't confuse: A set can satisfy one condition without the other. For instance, {1 + x, 2x + 1} spans a subspace of P₂ but does not span all of P₂ because no combination gives a quadratic term.

🎯 Representation with respect to a basis

  • Once you have a basis B = ⟨β₁, ..., βₙ⟩, you can write any vector v uniquely as v = c₁β₁ + ... + cₙβₙ.
  • The coefficients (c₁, ..., cₙ) form the representation of v with respect to B, written Rep_B(v).
  • Example: If v = (1, 2) and B = ⟨(1, 0), (0, 1)⟩, then Rep_B(v) = (1, 2); but if B changes, the representation changes even though v stays the same.

📏 Dimension: the fundamental invariant

📏 All bases have the same size

  • Dimension: The number of vectors in any basis for a vector space.
  • The excerpt proves (via the Exchange Lemma) that if one basis has n vectors, every other basis also has exactly n vectors.
  • This makes dimension well-defined: it is a property of the space itself, not of any particular basis.

🔢 Computing dimension

  • To find the dimension of a space, find any basis and count its elements.
  • For solution sets of homogeneous systems: parametrize the solutions; the number of free variables equals the dimension.
  • Example: The solution set {(4x₂ - 3x₃ + x₄, x₂, x₃, x₄) | x₂, x₃, x₄ ∈ ℝ} has three parameters, so dimension is 3.
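The free-variable count is the nullity of the coefficient matrix. A minimal sketch, assuming NumPy is available; the single constraint x₁ − 4x₂ + 3x₃ − x₄ = 0 is reconstructed from the parametrization above:

```python
import numpy as np

# Coefficient matrix of the homogeneous system with one equation in four unknowns.
A = np.array([[1.0, -4.0, 3.0, -1.0]])
# Dimension of the solution space = number of unknowns - rank.
nullity = A.shape[1] - np.linalg.matrix_rank(A)
print(nullity)  # 3: three free variables, so the solution set has dimension 3
```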

🧮 Standard dimensions

| Space | Dimension |
| --- | --- |
| ℝⁿ | n |
| Pₙ (polynomials degree ≤ n) | n + 1 |
| M_{m×n} (m×n matrices) | m · n |
| Trivial subspace {0} | 0 |

⚠️ Don't confuse dimension with other counts

  • Dimension is not the number of parameters in a description (though they often coincide after proper parametrization).
  • Dimension is not the number of vectors in a spanning set (which may have redundancies).
  • Dimension is the size of a minimal spanning set, which is the same as a maximal linearly independent set.

🔧 Finding bases in practice

🔧 Method 1: Parametrize and extract

  • Write the space as {combinations of parameters}.
  • The vectors multiplying the parameters form a natural candidate for a basis.
  • Verify linear independence (usually straightforward after parametrization).
  • Example: For {(a, b, c) | a + b = 0}, parametrize as {(-b, b, c) | b, c ∈ ℝ} = {b·(-1, 1, 0) + c·(0, 0, 1)}, giving basis ⟨(-1, 1, 0), (0, 0, 1)⟩.

🔧 Method 2: Row reduction on transposed vectors

  • If you have a spanning set (possibly with redundancies), transpose the vectors to rows.
  • Apply Gaussian elimination; the nonzero rows form a basis for the row space.
  • Transpose back to get a basis for the original column space.
  • This eliminates linear dependencies automatically.
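Method 2 can be sketched with SymPy's exact row reduction (assuming SymPy is available; the spanning set here is an illustrative one with a built-in redundancy, not from the text):

```python
from sympy import Matrix

# A spanning set with a redundancy: the third vector is the sum of the first two.
rows = Matrix([[1, 1, 0],
               [0, 1, 1],
               [1, 2, 1]])
rref, pivots = rows.rref()
# The nonzero rows of the echelon form are a basis for the row space.
basis = [rref.row(k) for k in range(len(pivots))]
print(len(basis))  # 2: the redundancy was eliminated automatically
```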

🔧 Method 3: Extend or shrink

  • Extending: If you have a linearly independent set that doesn't span, add vectors until it does (Corollary 2.12).
  • Shrinking: If you have a spanning set that isn't independent, remove redundant vectors until independence holds.
  • Both processes terminate at a basis.

🛠️ Common scenario: subspaces defined by equations

  • Given conditions like a - 2b + c - d = 0, solve for leading variables in terms of free variables.
  • Each free variable corresponds to one basis vector.
  • Example: From a = 2b - c + d, get {(2b - c + d, b, c, d)} = {b·(2, 1, 0, 0) + c·(-1, 0, 1, 0) + d·(1, 0, 0, 1)}, so basis has three vectors.

🔗 Rank: connecting matrices and dimension

🔗 Row rank equals column rank

  • Row rank: dimension of the row space (span of the rows).
  • Column rank: dimension of the column space (span of the columns).
  • The excerpt establishes that these are always equal; this common value is called the rank of the matrix.

🔗 Rank from row reduction

  • Apply Gaussian elimination; the number of nonzero rows in echelon form is the rank.
  • This works because row operations preserve the row space's dimension.
  • Example: A 3×4 matrix reducing to two nonzero rows has rank 2.
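A minimal sketch of that example, assuming NumPy is available; the third row is deliberately built as a combination of the first two:

```python
import numpy as np

# A 3x4 matrix whose third row equals 2*(row 1) + 3*(row 2).
A = np.array([[1.0, 0.0, 2.0, 3.0],
              [0.0, 1.0, 1.0, -1.0],
              [2.0, 3.0, 7.0, 3.0]])
rank = np.linalg.matrix_rank(A)
print(rank)  # 2: only two independent rows survive elimination
```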

🔗 Rank and linear systems

  • For a system with coefficient matrix A and augmented matrix (A|b):
    • The system has a solution if and only if rank(A) = rank(A|b).
    • If rank(A) = number of variables, solutions (if they exist) are unique.
    • The dimension of the solution space of the homogeneous system is (number of variables) - rank(A).
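The consistency criterion rank(A) = rank(A|b) is mechanical to check. A minimal sketch, assuming NumPy is available; the system is an illustrative 2×2 example:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])        # rank 1: the second row is twice the first
b_good = np.array([3.0, 6.0])     # consistent right-hand side
b_bad = np.array([3.0, 7.0])      # inconsistent right-hand side

def solvable(A, b):
    # Ax = b has a solution iff appending b does not raise the rank.
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

print(solvable(A, b_good), solvable(A, b_bad))  # True False
```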

📊 Rank inequalities

| Condition | Implication |
| --- | --- |
| rank(A) = number of rows | Full row rank; system Ax = b always has a solution |
| rank(A) = number of columns | Full column rank; homogeneous system has only the trivial solution |
| rank(A) < min(rows, columns) | Matrix is "deficient"; some dependencies exist |

➕ Combining subspaces: sums and intersections

➕ Sum of subspaces

  • Sum W₁ + W₂: The set of all vectors of the form w₁ + w₂ where w₁ ∈ W₁ and w₂ ∈ W₂.
  • Equivalently, the span of W₁ ∪ W₂.
  • The sum is the smallest subspace containing both W₁ and W₂.

➕ Direct sum

Direct sum W₁ ⊕ W₂: A sum W₁ + W₂ where every vector has a unique decomposition as w₁ + w₂.

  • This happens if and only if W₁ ∩ W₂ = {0} (trivial intersection).
  • Equivalently, dim(W₁ ⊕ W₂) = dim(W₁) + dim(W₂).
  • Example: In ℝ³, the xy-plane and the z-axis form a direct sum because their intersection is just the origin.

🔀 Dimension formula for sums

  • For any subspaces U and W: dim(U + W) = dim(U) + dim(W) - dim(U ∩ W).
  • This generalizes the inclusion-exclusion principle from set theory.
  • Example: Two 8-dimensional subspaces of ℝ¹⁰ must intersect in a subspace of dimension at least 6.

🧩 Complements

  • Subspaces W₁ and W₂ are complements if V = W₁ ⊕ W₂.
  • Every subspace has a complement (though not unique).
  • To construct: take a basis for W₁, extend to a basis for V; the added vectors span a complement.

⚠️ Don't confuse sum with union

  • W₁ + W₂ is generally much larger than W₁ ∪ W₂.
  • The union is usually not even a subspace (not closed under addition).
  • Example: The union of the x-axis and y-axis in ℝ² is not a subspace, but their sum is all of ℝ².

Note: This excerpt consists primarily of exercise solutions demonstrating techniques for finding bases, computing dimensions, and working with matrix rank. The core theory (definitions, theorems, proofs) appears in earlier sections not included here.


Isomorphisms: Definition and Examples

Section I: Isomorphisms

🧭 Overview

🧠 One-sentence thesis

A function is one-to-one (injective) when it never maps two different inputs to the same output, which can be verified by showing that equal outputs imply equal inputs.

📌 Key points (3–5)

  • What one-to-one means: a map is one-to-one if whenever two domain elements map to the same image, those two domain elements must be equal.
  • How to verify: assume f(x) = f(y), then use the definition of f to show x = y.
  • Component-wise equality: vectors are equal only when all their corresponding components are equal.
  • Common confusion: one-to-one is about uniqueness of inputs for each output, not about the function's formula looking simple.

🔍 What one-to-one means

🔍 The definition in action

A function f is one-to-one when: if f maps two members of the domain to the same image, then those two members are equal.

  • This is a conditional statement: "if f(x) = f(y), then x = y."
  • The excerpt demonstrates this with a specific map from row vectors to column vectors.
  • Example: if f sends (a b) to the column vector with entries a and b, and f also sends (c d) to the same column vector, then we must have a = c and b = d.

🧮 The verification technique

The excerpt shows a standard proof pattern:

  1. Assume f((a b)) = f((c d))
  2. Apply the definition of f to both sides
  3. Use properties of equality (here: column vectors are equal only when components match)
  4. Conclude that the original inputs were equal: a = c and b = d

🎯 Component-wise equality

🎯 Vector equality rule

  • The excerpt states: "column vectors are equal only if they have equal components."
  • This is a fundamental property used in the proof.
  • It allows us to break down vector equality into simpler scalar equalities.
  • Don't confuse: two vectors being "close" or "similar" is not the same as being equal—every component must match exactly.

Section II: Homomorphisms

Section II: Homomorphisms

🧭 Overview

🧠 One-sentence thesis

A homomorphism is a linear map between vector spaces that preserves vector addition and scalar multiplication, and understanding its range space and null space reveals whether the map is one-to-one, onto, or an isomorphism.

📌 Key points (3–5)

  • What a homomorphism is: a function between vector spaces that preserves linear combinations (addition and scalar multiplication).
  • How to verify linearity: check that h(c₁·v₁ + c₂·v₂) = c₁·h(v₁) + c₂·h(v₂) for all vectors and scalars.
  • Range space and null space: the range is the set of all outputs; the null space is the set of inputs that map to zero.
  • Rank-nullity theorem: rank (dimension of range) plus nullity (dimension of null space) equals the dimension of the domain.
  • Common confusion: a map can preserve some independent sets without being one-to-one; it must preserve all independent sets to be one-to-one.

🔍 Definition and verification

🔍 What makes a map a homomorphism

A homomorphism h : V → W is a function between vector spaces that preserves linear combinations: h(c₁·v₁ + c₂·v₂) = c₁·h(v₁) + c₂·h(v₂).

  • The map must respect both addition and scalar multiplication.
  • If a map fails to preserve even one combination, it is not a homomorphism.
  • Example: the map h from R³ to R² given by (x, y, z) ↦ (x, x+y+z) is a homomorphism because h(c₁·(x₁, y₁, z₁) + c₂·(x₂, y₂, z₂)) = c₁·h(x₁, y₁, z₁) + c₂·h(x₂, y₂, z₂).

✅ How to check if a map is linear

The excerpt shows many verification examples. The standard procedure:

  1. Take a linear combination c₁·v₁ + c₂·v₂ in the domain.
  2. Apply h to get h(c₁·v₁ + c₂·v₂).
  3. Expand using the definition of h.
  4. Rearrange to show it equals c₁·h(v₁) + c₂·h(v₂).

Example from the excerpt: for h from R³ to R² given by (x, y, z) ↦ (2x+y, 3y−4z), the check shows:

  • h((c₁x₁+c₂x₂, c₁y₁+c₂y₂, c₁z₁+c₂z₂)) = (2(c₁x₁+c₂x₂)+(c₁y₁+c₂y₂), 3(c₁y₁+c₂y₂)−4(c₁z₁+c₂z₂))
  • This equals c₁·(2x₁+y₁, 3y₁−4z₁) + c₂·(2x₂+y₂, 3y₂−4z₂) = c₁·h(x₁,y₁,z₁) + c₂·h(x₂,y₂,z₂).
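The same check can be run numerically on the map (x, y, z) ↦ (2x + y, 3y − 4z). A minimal sketch, assuming NumPy is available; random vectors and scalars stand in for the symbolic verification:

```python
import numpy as np

def h(v):
    # The map (x, y, z) -> (2x + y, 3y - 4z) from the example above.
    x, y, z = v
    return np.array([2*x + y, 3*y - 4*z])

rng = np.random.default_rng(0)
v1, v2 = rng.normal(size=3), rng.normal(size=3)
c1, c2 = 1.5, -0.75
# Linearity: h(c1*v1 + c2*v2) must equal c1*h(v1) + c2*h(v2).
preserves = np.allclose(h(c1*v1 + c2*v2), c1*h(v1) + c2*h(v2))
print(preserves)  # True
```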

❌ When a map fails to be linear

A map is not a homomorphism if it fails to preserve even one linear combination.

Example from the excerpt: the map h from R³ to R² given by (x, y, z) ↦ (1, 1) is not linear because h((0,0,0) + (0,0,0)) = (1,1) but h(0,0,0) + h(0,0,0) = (1,1) + (1,1) = (2,2), so addition is not preserved.

Don't confuse: a map that works for some vectors but not all is still not a homomorphism.

🎯 Range space and null space

🎯 Range space (image)

The range space R(h) = {h(v) | v ∈ V} is the set of all outputs of h; it is a subspace of the codomain W.

  • The range is nonempty because V is nonempty.
  • It is closed under linear combinations: if w₁ and w₂ are in the range, then c₁·w₁ + c₂·w₂ is also in the range.
  • The rank of h is the dimension of the range space.
  • Example: for h : P₃ → P₄ given by h(p(x)) = x·p(x), the range is all polynomials in P₄ with zero constant term.

🕳️ Null space (kernel)

The null space N(h) = {v ∈ V | h(v) = 0_W} is the set of all inputs that map to the zero vector; it is a subspace of the domain V.

  • The null space always contains at least the zero vector of V.
  • It is closed under linear combinations.
  • The nullity of h is the dimension of the null space.
  • Example: for the derivative map d/dx on Pₙ, the null space is the set of constant polynomials {a₀ | a₀ ∈ R}, so the nullity is 1.

⚖️ Rank-nullity theorem

Rank plus nullity equals the dimension of the domain: rank(h) + nullity(h) = dim(V).

  • This is a fundamental constraint on any linear map.
  • If the domain has dimension 5 and the rank is 3, then the nullity must be 2.
  • Example from the excerpt: for a map from R³ to R², if the range is all of R² (rank = 2), then the nullity is 3 − 2 = 1.
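The theorem can be confirmed by computing rank and nullity independently and comparing their sum to the number of columns. A sketch assuming SymPy is available, on an illustrative map from R⁵ to R³:

```python
from sympy import Matrix

# Matrix of a map from R^5 to R^3.
A = Matrix([[1, 0, 2, 0, 1],
            [0, 1, 3, 0, 0],
            [0, 0, 0, 1, 4]])
rank = A.rank()
nullity = len(A.nullspace())   # dimension of the null space, computed directly
print(rank, nullity, rank + nullity == A.cols)  # 3 2 True
```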

🔗 One-to-one and onto

🔗 When a homomorphism is one-to-one

A homomorphism h is one-to-one if and only if its null space is trivial (contains only the zero vector), i.e., nullity = 0.

  • Equivalently, h is one-to-one if and only if h(v₁) = h(v₂) implies v₁ = v₂.
  • By the rank-nullity theorem, if h is one-to-one, then rank(h) = dim(V).
  • Example: the derivative map d/dx on Pₙ is not one-to-one because both f₀(x) = 0 and f₁(x) = 1 have the same derivative (zero in P_{n−1}).

Don't confuse: a map that preserves some independent sets is not necessarily one-to-one; it must preserve all independent sets.

🎯 When a homomorphism is onto

A homomorphism h is onto if its range equals the entire codomain W.

  • Equivalently, h is onto if and only if rank(h) = dim(W).
  • Example from the excerpt: the map h : P₃ → R² given by h(a₀ + a₁x + a₂x² + a₃x³) = (a₀, a₁) is onto because any (a, b) in R² is the image of the polynomial a + bx.

🔄 Isomorphisms

A homomorphism that is both one-to-one and onto is called an isomorphism.

  • If the domain and codomain have the same dimension, then h is one-to-one if and only if it is onto.
  • An isomorphism has an inverse that is also a homomorphism.
  • Example: the logarithm map from positive reals (under multiplication) to all reals (under addition) is an isomorphism because ln(ab) = ln(a) + ln(b) and ln(aʳ) = r·ln(a), and it has the exponential map as its inverse.

🧩 Properties and examples

🧩 Homomorphisms determined by basis images

A linear map is completely determined by where it sends the basis vectors.

  • If B = {β₁, ..., βₙ} is a basis for V, then specifying h(β₁), ..., h(βₙ) uniquely determines h on all of V.
  • For any v = c₁β₁ + ... + cₙβₙ, we have h(v) = c₁h(β₁) + ... + cₙh(βₙ).
  • Example from the excerpt: if h sends β₁ to 0 and β₂ to 0, ..., and βₙ to 0, then h is the zero map.

🔢 Matrix representations

Any linear map from Rⁿ to Rᵐ can be represented by an m×n matrix.

  • The map h(x₁, ..., xₙ) = (a₁,₁x₁ + ... + a₁,ₙxₙ, ..., aₘ,₁x₁ + ... + aₘ,ₙxₙ) is linear.
  • The verification that this preserves linear combinations is routine.
  • Example: the projection map from R³ to R² given by (x, y, z) ↦ (x, y) is linear and can be represented by the 2×3 matrix with rows (1, 0, 0) and (0, 1, 0).

📐 Geometric examples

The excerpt includes several geometric homomorphisms:

| Map | Description | Properties |
| --- | --- | --- |
| Projection to xz-plane | (x, y, z) ↦ (x, 0, z) | Linear; not one-to-one; not onto R³ |
| Projection to x-axis | (x, y, z) ↦ (x, 0, 0) | Linear; null space is the yz-plane |
| Shadow map | Projects 3D objects to 2D | Preserves scalar multiples: shadow of cv is c times shadow of v |

🧮 Calculus examples

Differentiation and integration are homomorphisms on polynomial spaces.

  • The derivative map d/dx : Pₙ → Pₙ₋₁ is linear because d/dx(f + g) = d/dx(f) + d/dx(g) and d/dx(r·f) = r·d/dx(f).
  • The null space of d/dx on Pₙ is the set of constant polynomials.
  • The null space of dᵏ/dxᵏ on Pₙ is the set of polynomials of degree less than k.
  • Differentiation and integration are not inverses: the composition (differentiate, then integrate from 0 to x) applied to the constant polynomial 1 gives 0, not 1.

🔍 Inverse images and cosets

🔍 Inverse images

The inverse image of a vector w is the set h⁻¹(w) = {v ∈ V | h(v) = w}.

  • If h(v₀) = w, then h⁻¹(w) = {v₀ + n | n ∈ N(h)}.
  • In other words, the inverse image is a translate of the null space.
  • Example from the excerpt: for the map h : R² → R given by h(x, y) = 2x + y, the inverse images are lines with slope −2.
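A minimal numeric sketch of that example, assuming NumPy is available: every point of the form v₀ + t·n, where v₀ is one preimage and n spans the null space, lies in the same inverse image.

```python
import numpy as np

def h(v):
    # h(x, y) = 2x + y, the example map from R^2 to R.
    return 2*v[0] + v[1]

w = 5.0
v0 = np.array([0.0, 5.0])   # one particular solution: h(v0) = 5
n = np.array([1.0, -2.0])   # spans the null space: h(n) = 0
# Every v0 + t*n lies on the inverse-image line h^{-1}(5), which has slope -2.
all_on_line = all(np.isclose(h(v0 + t*n), w) for t in np.linspace(-3, 3, 7))
print(all_on_line)  # True
```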

📊 General = Particular + Homogeneous

The excerpt emphasizes this pattern:

  • The general solution to h(v) = w is a particular solution plus the general solution to the homogeneous equation h(v) = 0.
  • This applies to linear systems of equations and differential equations.
  • Example: if v₀ is one solution to h(v) = w, then all solutions are {v₀ + n | n ∈ N(h)}.

🎓 Special results

🎓 Composition of homomorphisms

If f : U → V and g : V → W are linear, then the composition g ∘ f : U → W is also linear.

  • The verification: (g ∘ f)(c₁u₁ + c₂u₂) = g(f(c₁u₁ + c₂u₂)) = g(c₁f(u₁) + c₂f(u₂)) = c₁g(f(u₁)) + c₂g(f(u₂)) = c₁(g ∘ f)(u₁) + c₂(g ∘ f)(u₂).

🎓 Transpose as a homomorphism

The transpose operation T : M_{m×n} → M_{n×m} is a linear map.

  • The entry in row i and column j of Mᵀ is the entry from row j and column i of M.
  • Verification: (rA + sB)ᵀ = rAᵀ + sBᵀ.

🎓 Dual space

The dual space V* is the set of all linear maps from V to R.

  • V* is isomorphic to Rⁿ when V has dimension n.
  • The isomorphism is given by mapping h to the vector (h(β₁), ..., h(βₙ)) where B = {β₁, ..., βₙ} is a basis for V.

🎓 Preservation of convexity

If h : Rⁿ → Rᵐ is linear and C ⊆ Rⁿ is convex, then h(C) is also convex.

  • A set is convex if the line segment between any two points in the set is also in the set.
  • Linear maps send line segments to line segments (possibly degenerate).
  • Therefore, if C is convex, so is h(C).

Computing Linear Maps

Section III: Computing Linear Maps

🧭 Overview

🧠 One-sentence thesis

Any matrix represents a linear map between vector spaces once bases are chosen for the domain and codomain, and this representation preserves the map's properties such as rank, nullity, and whether it is one-to-one or onto.

📌 Key points (3–5)

  • Matrix-vector multiplication computes map images: multiplying the representation matrix by a domain vector's representation gives the codomain vector's representation.
  • Domain dimension = number of columns; codomain dimension = number of rows: the matrix shape directly encodes the dimensions of the spaces involved.
  • Rank and nullity are basis-independent: the rank of the matrix equals the dimension of the range space, and nullity equals the dimension of the null space, regardless of which bases are chosen.
  • Common confusion: the same matrix can represent different maps if the bases change; conversely, the same map can be represented by different matrices with different basis choices.
  • Isomorphism criterion: a square matrix represents an isomorphism if and only if it is nonsingular (has full rank).

🧮 Matrix-vector multiplication and map computation

🧮 How to compute the image of a vector

The excerpt shows a systematic three-step process:

  1. Represent the input vector with respect to the domain basis B: write the vector as a linear combination of basis vectors and extract coefficients.
  2. Multiply the representation matrix by this coefficient column vector.
  3. Interpret the result with respect to the codomain basis D: the output column tells you how to combine the codomain basis vectors to get the image.

Example: If a map h is represented by a matrix H with respect to bases B and D, and a vector v in the domain is represented as Rep_B(v) = column of c_i, then Rep_D(h(v)) = H · Rep_B(v).
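The three-step process can be sketched concretely. This example, assuming NumPy is available, uses an illustrative map h(x, y) = (x + y, x − y), the domain basis B = ⟨(1, 1), (1, −1)⟩, and the standard codomain basis (all choices here are the author's own, for demonstration):

```python
import numpy as np

# Domain basis vectors as the columns of B.
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])

def h(v):
    # An illustrative linear map R^2 -> R^2.
    return np.array([v[0] + v[1], v[0] - v[1]])

# Step 1: represent v with respect to B (solve B c = v for the coefficients).
v = np.array([3.0, 1.0])
rep_B = np.linalg.solve(B, v)

# The representing matrix H has columns Rep_D(h(beta_i)); with the standard
# codomain basis those columns are just the images of the basis vectors.
H = np.column_stack([h(B[:, 0]), h(B[:, 1])])

# Steps 2-3: multiply, then read the result in the (standard) codomain basis.
rep_D = H @ rep_B
print(np.allclose(rep_D, h(v)))  # True: matches computing h(v) directly
```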

📐 Transparent representation with standard bases

When both domain and codomain use standard bases (E_n), vectors represent themselves:

  • Rep_{E_n}(vector) = the vector itself (as a column).
  • This makes calculations simpler because no basis conversion is needed.

The excerpt repeatedly uses this fact: "with respect to E_2, a column vector represents itself."

🔢 Dimensions, rank, and nullity from the matrix

🔢 Reading dimensions from matrix shape

The dimension of the domain space is the number of columns; the dimension of the codomain space is the number of rows.

  • An m×n matrix (m rows, n columns) represents a map from an n-dimensional space to an m-dimensional space.
  • This is immediate from the matrix shape and does not depend on which bases are chosen.

📊 Rank and range space

The rank of the map equals the rank of the matrix (the dimension of the column space).

  • The range space R(h) consists of all vectors in the codomain that are images of some domain vector.
  • In terms of representations: R(h) = {all linear combinations of the matrix columns}.
  • The map is onto if and only if rank = dimension of codomain (i.e., the matrix has full row rank).

Example from the excerpt: a 3×3 matrix with rank 2 has a two-dimensional range space, so the map is not onto because the range is a proper subspace of the three-dimensional codomain.

🕳️ Nullity and null space

The nullity is the dimension of the null space N(h).

The null space N(h) is the set of domain vectors that map to the zero vector in the codomain.

  • In matrix terms: solve H · Rep_B(v) = zero column to find Rep_B of null space members.
  • The map is one-to-one if and only if nullity = 0 (i.e., the null space is trivial).
  • Rank-nullity theorem: rank + nullity = dimension of domain (= number of columns).

Example: a matrix with one free variable in its solution has nullity 1, so the map is not one-to-one.
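A minimal sketch of reading rank and nullity off a matrix by row reduction; the matrix H is an invented example, and the rank function is my own helper.

```python
from fractions import Fraction

def rank(M):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0  # next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue               # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

H = [[1, 2, 1],
     [2, 4, 2],
     [0, 1, 1]]
rk = rank(H)
nullity = len(H[0]) - rk           # rank + nullity = number of columns
print(rk, nullity)                 # 2 1
```

Here rank 2 and nullity 1 match the pattern in the example above: one free variable means nullity 1, so the represented map is not one-to-one.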

🔄 Same matrix, different maps; same map, different matrices

🔄 Different bases yield different maps from the same matrix

The excerpt emphasizes this repeatedly:

  • The identity matrix I can represent the identity map (if domain basis = codomain basis) or a completely different map (e.g., swapping components) if the bases differ.
  • Example: the 2×2 identity with respect to E_2 for domain and D = ⟨e_2, e_1⟩ for codomain represents the map (x, y) ↦ (y, x), which is reflection about y = x, not the identity.

Don't confuse: a matrix is not "the map"; it is a representation of a map with respect to chosen bases.

🔄 Different matrices can represent the same map

Conversely, the same linear map can be represented by different matrices if you change bases.

Example from the excerpt: projection onto the x-axis is represented by both

  • (1 0; 0 0) with respect to E_2, E_2, and
  • (0 1; 0 0) with respect to C = ⟨e_2, e_1⟩, E_2.

The map itself (projection) is unchanged; only the matrix representation changes.

🧩 Constructing the representation matrix

🧩 The column-by-column recipe

To build the matrix Rep_{B,D}(h) representing a map h with respect to domain basis B = ⟨β_1, …, β_n⟩ and codomain basis D:

  1. Compute the image of each domain basis vector: h(β_i).
  2. Represent each image with respect to the codomain basis D: Rep_D(h(β_i)) is a column of scalars.
  3. Adjoin these columns side by side to form the matrix.

The excerpt applies this recipe in many exercises, e.g., for differentiation, integration, evaluation, and geometric transformations.

🧩 Example: differentiation

For the derivative map d/dx on polynomials with basis B = ⟨1, x, x², x³⟩:

  • d/dx(1) = 0 → Rep_B(0) = (0, 0, 0, 0)
  • d/dx(x) = 1 → Rep_B(1) = (1, 0, 0, 0)
  • d/dx(x²) = 2x → Rep_B(2x) = (0, 2, 0, 0)
  • d/dx(x³) = 3x² → Rep_B(3x²) = (0, 0, 3, 0)

Adjoin these columns:

Rep_{B,B}(d/dx) = 
( 0  1  0  0 )
( 0  0  2  0 )
( 0  0  0  3 )
( 0  0  0  0 )

This is an upper-triangular matrix with the derivative coefficients on the superdiagonal.
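The column-by-column recipe can be checked in code; the basis is the excerpt's B = ⟨1, x, x², x³⟩, while the sample polynomial and helper names are mine.

```python
# Sketch: build Rep_{B,B}(d/dx) column by column and apply it.

def deriv_matrix(n):
    """Matrix of d/dx on polynomials of degree < n, basis <1, x, ..., x^(n-1)>."""
    H = [[0] * n for _ in range(n)]
    for j in range(1, n):          # d/dx(x^j) = j x^(j-1)
        H[j - 1][j] = j            # column j is Rep_B(j x^(j-1))
    return H

def mat_vec(H, v):
    return [sum(a * b for a, b in zip(row, v)) for row in H]

H = deriv_matrix(4)
# p(x) = 5 + 2x + 3x^2 + x^3, so Rep_B(p) = [5, 2, 3, 1]
print(mat_vec(H, [5, 2, 3, 1]))   # [2, 6, 3, 0], i.e. p'(x) = 2 + 6x + 3x^2
```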

🔍 Isomorphisms and nonsingular matrices

🔍 When a matrix represents an isomorphism

A map is an isomorphism if and only if it is both one-to-one and onto.

For a square n×n matrix H representing a map h : V → W (both n-dimensional):

  • H is nonsingular (invertible, full rank n) ⟺ h is an isomorphism.
  • This is because nonsingularity ⟺ unique solution for every codomain vector ⟺ onto and one-to-one.

The excerpt proves this equivalence in Exercise Three.III.2.25.

🔍 Non-square matrices cannot represent isomorphisms between spaces of the same dimension

If the matrix is m×n with m ≠ n, the domain and codomain have different dimensions, so the map cannot be an isomorphism (isomorphisms require equal dimensions).

However, a non-square matrix can still represent an injective (one-to-one) or surjective (onto) map:

  • More columns than rows (n > m): cannot be one-to-one (by rank-nullity, nullity ≥ n − m > 0).
  • More rows than columns (m > n): cannot be onto (rank ≤ n < m).

🎯 Practical examples and geometric maps

🎯 Geometric transformations in ℝ²

The excerpt gives several 2×2 matrices and their geometric interpretations with respect to standard bases:

  • (3 0; 0 2): stretch by 3 in the x-direction and by 2 in the y-direction.
  • (1 0; 0 0): projection onto the x-axis.
  • (0 1; 1 0): reflection about the line y = x (swap components).
  • (1 3; 0 1): shear parallel to the x-axis, sliding each point by 3 times its distance from the x-axis.

🎯 Rotations in ℝ³

For rotation by angle θ about the z-axis (with standard basis E_3):

( cos θ  −sin θ   0 )
( sin θ   cos θ   0 )
(   0       0     1 )

The first two basis vectors rotate in the xy-plane; the third (along z-axis) is unchanged.

Similar formulas appear for rotations about the x- and y-axes.

🎯 Polynomial maps

  • Integration ∫ from B_n to B_{n+1}: the matrix has 0s on the main diagonal and 1, 1/2, 1/3, … on the subdiagonal (because ∫ xⁱ dx = xⁱ⁺¹/(i+1)).
  • Evaluation at a point (e.g., eval_3): the matrix is a single row (1, 3, 9, …, 3ⁿ) because it maps polynomials to scalars.
  • Shift map (x ↦ x+1): the matrix is Pascal's triangle, because (x+1)ⁿ expands via binomial coefficients.
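The Pascal's-triangle claim for the shift map is easy to verify; this sketch builds the matrix for the basis ⟨1, x, x², x³⟩ (the helper name shift_matrix is mine).

```python
from math import comb

# Column j is Rep_B((x+1)^j); its entries are the binomial
# coefficients C(j, i), so the columns read off Pascal's triangle.

def shift_matrix(n):
    return [[comb(j, i) for j in range(n)] for i in range(n)]

H = shift_matrix(4)
for row in H:
    print(row)
# Column 2 reads (1, 2, 1, 0): indeed (x+1)^2 = 1 + 2x + x^2.
```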

⚠️ Common pitfalls and clarifications

⚠️ "Is homomorphic to" is not always an equivalence relation

The excerpt notes that if "V is homomorphic to W" means "there exists a homomorphism from V into W," then every space is homomorphic to every other (via the zero map), so the relation is trivial.

If it means "there exists an onto homomorphism," the relation is not symmetric: there is an onto map ℝ³ → ℝ² but none ℝ² → ℝ³. (The excerpt says "not reflexive," which is likely a slip for "not symmetric.")

⚠️ Range depends on the map, not just the matrix

Two matrices that are equal can represent maps with different ranges if the codomain bases differ.

Example from the excerpt: the same matrix can represent different maps with different codomains, so the range spaces (as subsets of those codomains) differ.

⚠️ Upper-triangular matrices and subspace chains

If a basis B for the domain is chosen so that the image of each β_i lies in the span of the first i codomain basis vectors, the matrix is upper-triangular.

The excerpt sketches this in Exercise Three.III.2.35: define W_i = span{h(β_1), …, h(β_i)} and extend to a basis for the codomain; the matrix will be upper-triangular.

This is a preview of more advanced topics (Jordan form, flag structures) not fully developed in this excerpt.


Section IV: Matrix Operations

🧭 Overview

🧠 One-sentence thesis

Matrix addition, scalar multiplication, and matrix multiplication are defined entry-by-entry or via composition of linear maps, and these operations preserve the structure of linear transformations while obeying algebraic laws like associativity and distributivity.

📌 Key points (3–5)

  • Matrix addition and scalar multiplication: defined entry-by-entry; correspond to adding or scaling the represented linear maps.
  • Matrix multiplication: the i,j entry of GH is the dot product of row i of G and column j of H; represents composition of linear maps (g ∘ h).
  • Inverses: a square matrix H has an inverse H⁻¹ if HH⁻¹ = H⁻¹H = I; not all matrices are invertible (e.g., if determinant ad - bc = 0 for 2×2 matrices).
  • Common confusion: matrix multiplication is not commutative (GH ≠ HG in general), even though composition order matters (right-to-left).
  • Why it matters: these operations let us compute with linear maps algebraically, solve systems, and understand transformations via their matrix representations.

➕ Addition and scalar multiplication

➕ Entry-by-entry definition

Matrix addition G + H: add corresponding entries g_i,j + h_i,j. Scalar multiplication r·H: multiply every entry by r, giving rh_i,j.

  • These operations are defined only when matrices have the same dimensions.
  • They mirror vector-space operations: matrices of a fixed size form a vector space.
  • Example: if G and H are 2×2, then (G + H)_1,1 = g_1,1 + h_1,1.

🔗 Connection to linear maps

  • The excerpt shows that if G and H represent linear maps g and h (with respect to fixed bases), then:
    • G + H represents the map g + h (pointwise sum of outputs).
    • r·H represents the map r·h (scaling outputs by r).
  • This is Theorem 1.4 (referenced but not fully quoted): representation preserves linear combinations.
  • Don't confuse: the matrix sum is not a composition; it's a pointwise operation on the maps.

📐 Algebraic properties

The excerpt lists properties (commutativity, associativity, zero matrix, etc.) and notes two ways to verify them:

  • G + H = H + G: entries g_i,j + h_i,j = h_i,j + g_i,j; maps g(v) + h(v) = h(v) + g(v).
  • (G + H) + J = G + (H + J): entries by associativity of real addition; maps by associativity of vector addition.
  • G + Z = G: entries by adding zero entries; maps by g(v) + 0 = g(v).
  • r·(G + H) = r·G + r·H: entries by the distributive law for reals; maps by r·(g(v) + h(v)) = r·g(v) + r·h(v).
  • The zero matrix represents the zero map (sending every vector to zero).
  • Trace (sum of diagonal entries) is a linear map from M_n×n to R: trace(H + G) = trace(H) + trace(G), trace(r·H) = r·trace(H).

✖️ Matrix multiplication

✖️ Definition via dot products

The i,j entry of the product GH is the dot product of row i of G and column j of H: (GH)_i,j = g_i,1 h_1,j + g_i,2 h_2,j + ··· + g_i,r h_r,j.

  • For the product to be defined, the number of columns of G must equal the number of rows of H.
  • If G is m×r and H is r×n, then GH is m×n.
  • Example: multiplying a 2×3 matrix by a 3×2 matrix yields a 2×2 matrix.
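A sketch of the dot-product definition; the matrices G and H are invented examples, and mat_mul is my own helper name.

```python
# The i,j entry of GH is the dot product of row i of G with column j of H.

def mat_mul(G, H):
    rows, inner, cols = len(G), len(H), len(H[0])
    assert len(G[0]) == inner, "columns of G must match rows of H"
    return [[sum(G[i][k] * H[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

G = [[1, 2, 3],
     [4, 5, 6]]     # 2x3
H = [[1, 0],
     [0, 1],
     [1, 1]]        # 3x2
print(mat_mul(G, H))  # a 2x2 result: [[4, 5], [10, 11]]
```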

🔄 Composition of linear maps

  • The excerpt emphasizes that matrix multiplication represents composition: if H represents h: V → W and G represents g: W → X, then GH represents g ∘ h: V → X.
  • The order is right-to-left: GH means "apply h first, then g."
  • Example: if h rotates by π/6 and g rotates by π/6, then g ∘ h rotates by π/3.
  • Don't confuse: the product GH is not the same as HG (composition is not commutative in general).

⚠️ Non-commutativity

  • The excerpt gives explicit examples where GH ≠ HG (e.g., rotation matrices about different axes).
  • Even when both products are defined and have the same size, they usually differ.
  • Exception: diagonal matrices commute with each other; scalar multiples of the identity commute with everything.

🧮 Algebraic laws

  • Associativity: (FG)H = F(GH) (verified via indices or via associativity of function composition).
  • Distributivity: F(G + H) = FG + FH and (G + H)F = GF + HF.
  • Scalar compatibility: r·(GH) = (r·G)H = G(r·H).
  • Transpose: (GH)ᵀ = HᵀGᵀ (order reverses).
  • Trace: trace(GH) = trace(HG) (even though GH ≠ HG).

🔢 Special matrices and operations

🔢 Identity and elementary matrices

The identity matrix I has 1s on the diagonal and 0s elsewhere; it satisfies HI = IH = H for any square H of the same size.

  • Elementary matrices: represent row operations.
    • P_i,j swaps rows i and j.
    • C_i,j(k) adds k times row j to row i.
    • M_i(r) multiplies row i by r.
  • Multiplying on the left performs row operations; on the right performs column operations.
  • Example: C_1,2(-2) applied to the left subtracts 2 times row 2 from row 1.

🔺 Triangular and diagonal matrices

  • Upper triangular: all entries below the diagonal are zero (i > j ⇒ h_i,j = 0).
  • The product of two upper triangular matrices is upper triangular.
  • Diagonal matrices: all off-diagonal entries are zero; they commute with each other.
  • The set of n×n diagonal matrices is a subspace of dimension n.

🔄 Permutation matrices

  • A permutation matrix has exactly one 1 in each row and column, all other entries 0.
  • Multiplying by a permutation matrix rearranges rows (left) or columns (right).
  • Permutation matrices are orthogonal: PᵀP = I.

🔁 Matrix inverses

🔁 Definition and uniqueness

A square matrix H is invertible (nonsingular) if there exists a matrix H⁻¹ such that HH⁻¹ = H⁻¹H = I.

  • The inverse is unique if it exists (Lemma 4.2, referenced).
  • Not all matrices have inverses; a matrix is invertible if and only if it has full rank (rank = n for n×n).
  • Example: for 2×2 matrices, H is invertible ⇔ ad - bc ≠ 0.

🧮 Computing inverses

  • Gauss-Jordan method: augment H with I and row-reduce [H | I] to [I | H⁻¹].
  • If the left side reduces to I, the right side becomes H⁻¹.
  • If the left side does not reduce to I (i.e., H is singular), no inverse exists.
  • Example: reducing [H | I] for a 3×3 matrix step-by-step yields H⁻¹ on the right.
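A sketch of the Gauss-Jordan method with exact rational arithmetic; the 2×2 input is my own example, not one from the excerpt.

```python
from fractions import Fraction

# Row-reduce [H | I]; if the left half becomes I, the right half is H^{-1}.

def inverse(H):
    n = len(H)
    # build the augmented matrix [H | I]
    A = [[Fraction(x) for x in row] + [Fraction(i == j) for j in range(n)]
         for i, row in enumerate(H)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular; no inverse exists")
        A[c], A[pivot] = A[pivot], A[c]
        p = A[c][c]
        A[c] = [x / p for x in A[c]]               # scale pivot row to 1
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

H = [[1, 2],
     [3, 4]]
print(inverse(H))   # H^{-1} = (-2 1; 3/2 -1/2)
```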

🔢 Formula for 2×2 inverses

For a 2×2 matrix H = (a b; c d), if ad - bc ≠ 0, then:

H⁻¹ = (1/(ad - bc)) · (d -b; -c a)

  • The quantity ad - bc is the determinant.
  • If ad - bc = 0, the matrix is singular (not invertible).

🔗 Properties of inverses

  • (H⁻¹)⁻¹ = H: the inverse of the inverse is the original matrix.
  • (GH)⁻¹ = H⁻¹G⁻¹: the inverse of a product reverses the order.
  • (Hᵀ)⁻¹ = (H⁻¹)ᵀ: transpose and inverse commute.
  • (r·H)⁻¹ = (1/r)·H⁻¹ (for r ≠ 0).
  • Elementary matrices are invertible: e.g., C_i,j(k)⁻¹ = C_i,j(-k).

⚠️ Common pitfalls

  • No inverse for sums: (H + G)⁻¹ ≠ H⁻¹ + G⁻¹ in general (even if all inverses exist).
  • Left vs right inverses: for non-square matrices, a matrix may have a left inverse but no right inverse (or vice versa).
  • Example: a 2×3 matrix can have infinitely many right inverses but no left inverse.
  • Zero divisors: if AB = 0 (zero matrix) and B ≠ 0, then A cannot be invertible.

🧩 Rank and structure

🧩 Rank and products

  • The rank of a product GH is at most the minimum of rank(G) and rank(H).
  • Example: multiplying two rank-1 matrices can yield a rank-0 (zero) matrix.
  • The rank can drop: if H has a column of zeros, GH also has a column of zeros.

🔄 Row and column operations

  • Row operations (left multiplication by elementary matrices) do not change the row space or rank (except M_i(0), which is not allowed).
  • Column operations (right multiplication) similarly preserve column space.
  • The excerpt shows that row equivalence preserves invertibility: if A is row-equivalent to B and A is invertible, then B is invertible.

🧮 Trace and rank

  • Trace: the sum of diagonal entries; it is invariant under similarity (trace(GH) = trace(HG)).
  • Rank: the dimension of the column space (or row space); it is the number of leading 1s in reduced form.
  • The excerpt notes that the trace is a linear map, and that rank interacts with products in specific ways (e.g., rank(A²) = rank(A) - dim(R(A) ∩ N(A))).

Section V: Change of Basis

🧭 Overview

🧠 One-sentence thesis

Change of basis matrices allow us to translate vector and linear map representations between different coordinate systems, preserving the underlying mathematical structure while altering the numerical form.

📌 Key points (3–5)

  • What a change of basis matrix does: it converts a vector's representation from one basis to another by encoding how the new basis vectors are expressed in the old basis.
  • Nonsingularity requirement: a matrix can serve as a change of basis matrix if and only if it is nonsingular (invertible), ensuring the transformation is reversible.
  • Matrix equivalence: two matrices are matrix equivalent if they represent the same linear transformation with respect to different bases, related by Ĥ = P·H·Q where P and Q are nonsingular.
  • Common confusion: changing representations of vectors (one basis change) versus changing representations of maps (two basis changes—one for domain, one for codomain).
  • Rank preservation: matrix equivalence preserves rank, so matrices fall into equivalence classes determined by their size and rank.

🔄 Changing vector representations

🔄 The basic mechanism

A change of basis matrix converts a vector's coordinate representation from one basis to another.

Change of basis matrix Rep_{B,D}(id): the matrix whose columns are Rep_D(β_i), the representations of the starting basis vectors in the ending basis.

  • To build the matrix: represent each vector from the starting basis B in terms of the ending basis D, then concatenate these representations as columns.
  • The identity map "id" appears because we are not changing the vector itself, only how we write it down.
  • Example: if B = ⟨β₁, β₂⟩ and we want to change to basis D, compute Rep_D(β₁) and Rep_D(β₂), then place them side-by-side as columns.

🔁 Inverting the direction

To go the opposite direction (from D to B), use the inverse matrix.

  • Rep_{D,B}(id) = [Rep_{B,D}(id)]⁻¹
  • For 2×2 matrices, the excerpt shows the explicit inverse formula makes this easy.
  • Don't confuse: the matrix that changes B→D is not the same as the matrix that changes D→B; they are inverses of each other.
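A quick check with an invented basis B = ⟨(1,1), (1,−1)⟩ and ending basis E₂ that the two change of basis matrices are inverses (helper names are mine):

```python
from fractions import Fraction

# Rep_{B,E_2}(id) has the beta_i themselves as columns; the reverse
# direction D -> B is its inverse, and their product is the identity.

def inv2(M):
    """Inverse of a 2x2 matrix via the explicit formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    assert det != 0, "a singular matrix cannot change bases"
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(G, H):
    return [[sum(G[i][k] * H[k][j] for k in range(len(H)))
             for j in range(len(H[0]))] for i in range(len(G))]

B_to_E = [[1, 1],
          [1, -1]]          # columns are (1,1) and (1,-1)
E_to_B = inv2(B_to_E)
print(mat_mul(B_to_E, E_to_B))   # the product is the identity
```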

🧮 Row operations and basis changes

Row operations on a representation correspond to specific basis changes.

  • Row multiplication M_i(k): changes from a basis where the i-th vector is scaled by k.
  • Row swap P_{i,j}: changes from a basis where the i-th and j-th vectors are swapped.
  • Row combination C_{i,j}(k): changes from a basis where the i-th vector has k times the j-th vector added to it.
  • Each reduction matrix represents a change from some other basis to the current basis B.

🗺️ Changing map representations

🗺️ The double change formula

When changing how a linear map is represented, both domain and codomain bases may change.

The arrow diagram shows:

V wrt B  --[T]-->  W wrt D
   |                  |
  [Q]                [P]
   ↓                  ↓
V wrt B̂  --[T̂]-->  W wrt D̂

  • Formula: T̂ = P·T·Q where Q = Rep_{B̂,B}(id) and P = Rep_{D,D̂}(id)
  • The operation done first is written on the right (Q converts the input basis, then T acts, then P converts the output basis).
  • Example: to find T̂, compute the two change of basis matrices P and Q, then multiply in the order P·T·Q.

🎯 Matrix equivalence definition

Matrix equivalence: matrices H₁ and H₂ are matrix equivalent if there exist nonsingular matrices P and Q such that H₁ = P·H₂·Q.

  • This is an equivalence relation (reflexive, symmetric, transitive).
  • Two matrices are matrix equivalent if and only if they have the same size and the same rank.
  • Don't confuse with similarity: matrix equivalence allows different bases for domain and codomain; similarity requires the same basis for both (covered later).

📐 Canonical form

Every m×n matrix is matrix equivalent to a standard form with ones on the diagonal and zeros elsewhere.

  • The canonical form has rank r if it has r ones on the diagonal, then all other entries zero.
  • This means every m×n matrix of rank r belongs to the same equivalence class.
  • The number of equivalence classes for m×n matrices is k+1, where k = min(m,n), corresponding to ranks 0, 1, 2, ..., k.

🔍 Key properties and applications

🔍 Nonsingularity and bases

A matrix can serve as a change of basis matrix if and only if it is nonsingular.

  • Nonsingular means the matrix has full rank (rank equals the number of columns).
  • This ensures the columns form a linearly independent set, hence a basis.
  • The columns of a change of basis matrix H = Rep_{B,E_n}(id) are exactly the basis vectors β_i (when the ending basis is the standard basis, representations are transparent).

🔍 Rank preservation

Matrix equivalence preserves rank.

  • Same rank: if H₁ = P·H₂·Q with P, Q nonsingular, then rank(H₁) = rank(H₂).
  • Equivalence classes: matrices are grouped by size and rank.
  • Transpose: a matrix and its transpose have the same rank, so they are matrix equivalent.

🔍 Special cases

When only one basis changes, the formula simplifies.

  • Domain only: if only the domain basis changes (W basis stays D), then T̂ = T·Q where Q = Rep_{B₂,B₁}(id).
  • Codomain only: if only the codomain basis changes (V basis stays B), then T̂ = P·T where P = Rep_{D₁,D₂}(id).
  • Same starting and ending basis: when B = D (domain and codomain use the same basis), this leads to the concept of similarity (next subsection).

⚠️ What doesn't preserve

Not all operations preserve matrix equivalence.

  • Addition: equivalence classes are not closed under addition; example: H + (-H) has rank zero even if H has higher rank.
  • Powers: if H₁ and H₂ are matrix equivalent, their squares H₁² and H₂² need not be matrix equivalent (unless additional conditions hold, like representing the same map with respect to the same starting and ending bases).
  • Inverses: if H₁ = P·H₂·Q and all three are invertible, then H₁⁻¹ = Q⁻¹·H₂⁻¹·P⁻¹ (the order reverses).

Section VI: Projection

🧭 Overview

🧠 One-sentence thesis

Orthogonal projection decomposes vectors into components lying in a subspace and components perpendicular to it, enabling applications from least-squares fitting to understanding linear map geometry.

📌 Key points (3–5)

  • What projection does: splits any vector into a part inside a subspace (the projection) and a part orthogonal to it.
  • How to compute it: for a line spanned by s, use the formula (v • s)/(s • s) · s; for higher-dimensional subspaces, use Gram-Schmidt or matrix formulas.
  • Gram-Schmidt process: converts any basis into an orthogonal (or orthonormal) basis by successively removing projections.
  • Common confusion: projection into a line vs. projection into a subspace—lines use a single vector formula; subspaces require summing projections or using a matrix A(AᵀA)⁻¹Aᵀ.
  • Why it matters: orthogonal projection minimizes distance, underpins least-squares solutions, and simplifies representations via orthonormal bases.

📐 Projection into a line

📐 The basic formula

Orthogonal projection of vector v into the line spanned by s: proj_[s](v) = ((v • s)/(s • s)) · s

  • This formula gives the component of v that lies along the direction of s.
  • The result is always a scalar multiple of s, so it lies in the line.
  • Example: projecting (2, 1) onto the line spanned by (3, −2) gives (4/13)·(3, −2) = (12/13, −8/13).
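The formula and the example above can be verified directly (the helper names are mine):

```python
from fractions import Fraction

# proj_[s](v) = ((v . s)/(s . s)) s, using the text's example v = (2,1), s = (3,-2).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj_line(v, s):
    c = Fraction(dot(v, s), dot(s, s))
    return [c * si for si in s]

p = proj_line([2, 1], [3, -2])
print(p)                                             # (12/13, -8/13)
residual = [a - b for a, b in zip([2, 1], p)]
print(dot(residual, [3, -2]))                        # 0: v - proj is orthogonal to s
```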

🔍 Why it's orthogonal

  • The vector v − proj_[s](v) is perpendicular to s.
  • Proof: compute (v − proj_[s](v)) • s; after substituting the formula it equals zero.
  • This perpendicularity is the defining property: the projection is the closest point in the line to v.

📏 Minimizing distance

  • Among all vectors in the line, proj_[s](v) is the one closest to v.
  • Reason: the triangle formed by v, proj_[s](v), and any other point w in the line has a right angle at proj_[s](v), so by the Pythagorean theorem the hypotenuse v − w is longer than v − proj_[s](v).
  • Don't confuse: "closest" means smallest Euclidean distance, not smallest component.

⚙️ Properties

  • Projection is idempotent: projecting twice gives the same result as projecting once, because once v is in the line, projecting again leaves it unchanged.
  • Linearity: proj_[s](c₁v₁ + c₂v₂) = c₁ proj_[s](v₁) + c₂ proj_[s](v₂).
  • Scaling invariance: replacing s by any nonzero scalar multiple cs does not change the projection.

🔧 Gram-Schmidt orthogonalization

🔧 The algorithm

Given a basis {β₁, β₂, ..., βₙ}, Gram-Schmidt produces an orthogonal basis {κ₁, κ₂, ..., κₙ}:

  1. Set κ₁ = β₁.
  2. For each i = 2, 3, ..., n, compute κᵢ = βᵢ − proj_[κ₁](βᵢ) − proj_[κ₂](βᵢ) − ... − proj_[κᵢ₋₁](βᵢ).
  3. Each κᵢ is orthogonal to all prior κⱼ (j < i).
  • The process "removes" the components of βᵢ that lie in the span of the earlier vectors.
  • Example: starting with {(1,1), (2,1)}, we get κ₁ = (1,1) and κ₂ = (2,1) − (3/2)·(1,1) = (1/2, −1/2).
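A sketch of the algorithm, checked against the example above (helper names are mine):

```python
from fractions import Fraction

# Gram-Schmidt: subtract from each beta its projections onto the
# previously produced kappas, leaving an orthogonal basis.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    kappas = []
    for beta in basis:
        k = [Fraction(x) for x in beta]
        for kappa in kappas:       # remove the component along each prior kappa
            c = dot(k, kappa) / dot(kappa, kappa)
            k = [ki - c * qi for ki, qi in zip(k, kappa)]
        assert any(ki != 0 for ki in k), "input vectors were linearly dependent"
        kappas.append(k)
    return kappas

print(gram_schmidt([[1, 1], [2, 1]]))   # [(1, 1), (1/2, -1/2)]
```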

🎯 Orthonormal bases

  • After Gram-Schmidt, normalize each κᵢ by dividing by its length: κᵢ / ‖κᵢ‖.
  • The result is an orthonormal basis: each vector has length 1 and all pairs are orthogonal.
  • Orthonormal bases simplify computations: representing a vector v becomes v = (v • κ₁)κ₁ + (v • κ₂)κ₂ + ....

🔄 Why it works

  • At each step, κᵢ is nonzero (otherwise the original vectors would be linearly dependent).
  • The dot product κᵢ • κⱼ (for j < i) equals zero by construction: the projection formula ensures orthogonality.
  • Don't confuse: Gram-Schmidt requires a linearly independent set; if the input is dependent, some κᵢ will be zero.

📊 Applications

  • Change of basis: the matrix Q whose columns are the orthonormal vectors satisfies QᵀQ = I.
  • Simplifying representations: in an orthonormal basis, coordinates are just dot products.
  • Projection into subspaces: once you have an orthonormal basis for a subspace, projecting is easy (see next section).

🧮 Projection into a subspace

🧮 General definition

Orthogonal projection of v into subspace M: the unique vector p ∈ M such that v − p is orthogonal to every vector in M.

  • This generalizes line projection: p is the "closest point" in M to v.
  • The vector v − p is called the orthogonal complement component.

🔢 Computing with an orthonormal basis

If M has orthonormal basis {κ₁, κ₂, ..., κₖ}, then:

  • proj_M(v) = (v • κ₁)κ₁ + (v • κ₂)κ₂ + ... + (v • κₖ)κₖ.
  • This is just the sum of projections into the lines spanned by each basis vector.
  • Example: projecting (1, 1, 2) into the plane spanned by orthonormal {κ₁, κ₂} gives (v • κ₁)κ₁ + (v • κ₂)κ₂.

🔨 Matrix formula

If M is spanned by the columns of a matrix A (with linearly independent columns, so that AᵀA is invertible), then:

  • proj_M(v) = A(AᵀA)⁻¹Aᵀv.
  • The matrix P = A(AᵀA)⁻¹Aᵀ is the projection matrix.
  • Properties: P² = P (idempotent), Pᵀ = P (symmetric).

🧩 Direct sum decomposition

  • Any vector space V can be written as V = M ⊕ M⊥, where M⊥ is the orthogonal complement of M.
  • M⊥ = {all vectors perpendicular to every vector in M}.
  • Every v ∈ V decomposes uniquely as v = m + n with m ∈ M and n ∈ M⊥.
  • The projection is proj_M(v) = m.

🔍 Finding the orthogonal complement

To find M⊥ when M is spanned by {β₁, β₂, ..., βₖ}:

  1. M⊥ = {v | v • β₁ = 0, v • β₂ = 0, ..., v • βₖ = 0}.
  2. Solve this homogeneous system to get a basis for M⊥.
  3. Example: if M = span{(1, 2, 3)}, then M⊥ is the solution set of x + 2y + 3z = 0.

⚠️ Common pitfall

  • Don't confuse: projection into M along N (general, non-orthogonal) vs. orthogonal projection into M.
  • Orthogonal projection is the special case where N = M⊥.
  • In general projection along N, the complement need not be perpendicular.

📈 Applications and geometry

📈 Least-squares fitting

  • Problem: solve Ax = b when no exact solution exists.
  • Solution: find x that minimizes ‖Ax − b‖.
  • Method: project b onto the column space of A; the solution satisfies AᵀAx = Aᵀb (the normal equations).
  • Example: fitting a line y = mx to data points uses projection to find the best m.
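For the simple model y = mx, the normal equations AᵀAm = Aᵀb reduce to a one-line computation; the data points below are invented.

```python
from fractions import Fraction

# With A the single column of x-values and b the y-values, the normal
# equation gives the best slope m = (sum x_i y_i) / (sum x_i^2).

xs = [1, 2, 3]
ys = [2, 3, 7]

m = Fraction(sum(x * y for x, y in zip(xs, ys)), sum(x * x for x in xs))
print(m)   # 29/14, the least-squares slope
```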

🔺 Minimizing distance

  • The projection proj_M(v) is the vector in M closest to v.
  • For any other w ∈ M, the distance ‖v − w‖ is larger than ‖v − proj_M(v)‖.
  • Proof: the triangle with sides v − proj_M(v), proj_M(v) − w, and v − w has a right angle, so the hypotenuse is longest.

🎭 Orthonormal matrices

  • A matrix Q is orthonormal (or orthogonal) if its columns form an orthonormal set.
  • Equivalently: QᵀQ = I.
  • Properties: Q⁻¹ = Qᵀ, and Q preserves lengths and angles (distance-preserving).
  • Example: rotation matrices are orthonormal.

🔄 Projection as a linear map

  • Projection into M (along M⊥) is a linear transformation t: V → V.
  • It satisfies t² = t (idempotent) and t(v) = v for all v ∈ M.
  • The range of t is M, and the null space is M⊥.

📐 Geometry of linear maps

  • Any linear map can be factored using projections and changes of basis.
  • Orthonormal bases simplify this: the matrix representation becomes cleaner.
  • Example: a rotation followed by a scaling can be expressed via orthonormal transformations.

🔬 Advanced properties

🔬 Bessel's inequality

For an orthonormal set {κ₁, κ₂, ..., κₖ} and any vector v:

  • (v • κ₁)² + (v • κ₂)² + ... + (v • κₖ)² ≤ ‖v‖².
  • Equality holds if and only if v is in the span of the orthonormal set.
  • Interpretation: the sum of squared projections cannot exceed the squared length.

🧮 Perpendicular subspaces

  • If M ⊆ N, then N⊥ ⊆ M⊥.
  • (M⊥)⊥ = M (taking the perpendicular twice returns the original subspace).
  • (M + N)⊥ = M⊥ ∩ N⊥.

🔧 Projection and null space

For a linear map f: Rⁿ → Rᵐ represented by matrix H:

  • The null space N(f) = {v | Hv = 0}.
  • N(f)⊥ = span of the rows of H (the row space).
  • This connects projection to solving linear systems.

⚙️ Uniqueness and existence

  • The orthogonal projection into a subspace M is unique: there is exactly one vector p ∈ M such that v − p ⊥ M.
  • Proof: if there were two, their difference would be in M and perpendicular to M, hence zero.
  • Existence follows from the Gram-Schmidt process or the matrix formula.

Determinants: Definition and Properties

Section I: Definition

🧭 Overview

🧠 One-sentence thesis

The determinant is a unique function on square matrices that satisfies multilinearity, sign-change under row swaps, and normalization, and it can be computed either by row reduction or by the permutation expansion formula.

📌 Key points (3–5)

  • What the determinant measures: a scalar value assigned to square matrices that indicates singularity (determinant zero) or nonsingularity (determinant nonzero).
  • Three defining properties: multilinearity in rows, sign-change when rows are swapped, and the identity matrix has determinant 1.
  • Two main computation methods: Gauss's Method (row reduction to echelon form) and the permutation expansion (sum over all permutations).
  • Common confusion: determinants are not linear over matrix addition—det(A + B) ≠ det(A) + det(B) in general.
  • Key connection: a matrix is nonsingular (invertible) if and only if its determinant is nonzero.

🧮 Computing determinants by row reduction

🧮 Gauss's Method approach

  • Reduce the matrix to echelon form using row operations.
  • The determinant of an echelon form matrix is the product down the diagonal.
  • Track how row operations change the determinant:
    • Row combination (kρᵢ + ρⱼ): determinant unchanged.
    • Row swap (ρᵢ ↔ ρⱼ): determinant changes sign.
    • Row rescaling (kρᵢ): determinant multiplied by k.

Example: To find the determinant of a 4×4 matrix, apply Gauss's Method without rescaling rows. If the echelon form has diagonal entries 1, −7, −1, −1 and one row swap occurred, the determinant is −(1·(−7)·(−1)·(−1)) = +7.
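A sketch of this method: eliminate with row combinations (which leave the determinant unchanged), flip the sign for each swap, then multiply down the diagonal. The test matrices are my own.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gauss's Method with exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in M]
    n, sign = len(A), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)             # no pivot: singular, determinant 0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            sign = -sign                   # each row swap flips the sign
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    prod = Fraction(1)
    for i in range(n):
        prod *= A[i][i]                    # product down the diagonal
    return sign * prod

print(det([[1, 2], [3, 4]]))   # -2
print(det([[0, 1], [1, 0]]))   # -1 (one swap occurred)
```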

🧮 Triangular matrices

  • Upper-triangular and lower-triangular matrices are already in (or close to) echelon form.
  • Their determinants are simply the product of diagonal entries.
  • Don't confuse: a diagonal matrix is a special case of both upper- and lower-triangular.

🔢 The permutation expansion formula

🔢 What the formula says

Determinant via permutation expansion: |T| = Σ (over all permutations φ) t₁,φ(1) t₂,φ(2) ··· tₙ,φ(n) · |Pφ|

  • A permutation φ is a rearrangement of {1, 2, ..., n}.
  • Each term picks one entry from each row and each column (no two from the same row or column).
  • Pφ is the permutation matrix corresponding to φ.
  • |Pφ| is +1 or −1 depending on whether φ is an even or odd permutation (the signum of φ).

🔢 Signum of a permutation

  • The signum sgn(φ) equals the determinant of the permutation matrix Pφ.
  • It is +1 if φ can be obtained from the identity by an even number of swaps; −1 if odd.
  • Example: For n=3, there are 3! = 6 permutations; three have signum +1 and three have signum −1.

🔢 Small cases

  • 2×2 matrix: |A| = a·d·|P₁| + b·c·|P₂| = ad − bc (two permutations: identity and one swap).
  • 3×3 matrix: six terms corresponding to six permutations; three positive, three negative.
  • Don't confuse: the "diagonal mnemonic" (adding products along diagonals) works for 2×2 and 3×3 but fails for 4×4 and larger.
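A sketch of the permutation expansion by brute-force enumeration (practical only for small n, since there are n! terms); the helper names are mine.

```python
from itertools import permutations

# |T| = sum over all phi of sgn(phi) * t_{1,phi(1)} ... t_{n,phi(n)}

def sign(perm):
    """Signum: +1 for an even permutation, -1 for an odd one."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:      # count inversions
                s = -s
    return s

def det_perm(T):
    n = len(T)
    total = 0
    for phi in permutations(range(n)):
        term = sign(phi)
        for i in range(n):
            term *= T[i][phi[i]]       # one entry per row and per column
        total += term
    return total

print(det_perm([[1, 2], [3, 4]]))                     # ad - bc = -2
print(det_perm([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))    # 24
```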

🧩 Defining properties of determinants

🧩 The three axioms

A determinant function d on n×n matrices satisfies:

  1. Multilinearity in rows: d(ρ₁, ..., k₁v₁ + k₂v₂, ..., ρₙ) = k₁·d(ρ₁, ..., v₁, ..., ρₙ) + k₂·d(ρ₁, ..., v₂, ..., ρₙ).
  2. Alternating (sign-change on swap): swapping two rows multiplies the determinant by −1.
  3. Normalization: d(I) = 1 (the identity matrix has determinant 1).

🧩 Uniqueness

  • These three properties uniquely determine the determinant function.
  • Any function satisfying these axioms must equal the determinant.

🧩 Consequences of the axioms

  • A matrix with two equal rows has determinant zero (swap them: d = −d implies d = 0).
  • A matrix with a zero row has determinant zero (factor out 0 from that row).
  • Row combination operations do not change the determinant (follows from multilinearity and the alternating property).

🔗 Key determinant properties

🔗 Transpose

  • |Aᵀ| = |A|: the determinant of the transpose equals the determinant of the original.
  • Consequence: all row properties have column analogues (e.g., swapping columns changes sign, columns can be multilinear).

🔗 Product of matrices

  • |AB| = |A|·|B|: the determinant of a product is the product of determinants.
  • Consequence: |A⁻¹| = 1/|A| (since |A·A⁻¹| = |I| = 1).

🔗 Similar matrices

  • If B = P⁻¹AP then |B| = |A| (similar matrices have the same determinant).
  • Proof: |B| = |P⁻¹|·|A|·|P| = |A|·|P⁻¹|·|P| = |A|·|I| = |A|.

🔗 Singularity test

| Condition | Determinant | Interpretation |
| --- | --- | --- |
| Matrix is nonsingular (invertible) | ≠ 0 | Columns/rows are linearly independent |
| Matrix is singular (not invertible) | = 0 | Columns/rows are linearly dependent |

Example: To test whether a system has a unique solution, compute the determinant of the coefficient matrix; nonzero determinant guarantees unique solution.


🎯 Special cases and patterns

🎯 Block matrices

  • For a block upper-triangular matrix with blocks J (p×p) and K (q×q) on the diagonal and zeros in the lower-left block:
    • |T| = |J|·|K| (determinant is the product of the block determinants).

🎯 Vandermonde determinant

  • The excerpt mentions a 3×3 case:
    | 1  a  a² |
    | 1  b  b² | = (b−a)(c−a)(c−b)
    | 1  c  c² |
    
  • This generalizes to n×n Vandermonde matrices.
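The 3×3 identity can be spot-checked numerically (a pure-Python sketch; the sample values a=2, b=5, c=7 are arbitrary assumptions):

```python
def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def vandermonde3(a, b, c):
    return [[1, a, a*a], [1, b, b*b], [1, c, c*c]]

a, b, c = 2, 5, 7
lhs = det3(vandermonde3(a, b, c))
rhs = (b - a) * (c - a) * (c - b)
print(lhs, rhs)   # both 30
```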

🎯 Permutation matrices

  • A permutation matrix P has |P| = ±1.
  • |P| = +1 if P corresponds to an even permutation; −1 if odd.
  • Permutation matrices satisfy PPᵀ = I (they are orthogonal).

⚠️ Common pitfalls

⚠️ Not linear over addition

  • False: |A + B| = |A| + |B|.
  • Example from excerpt: |2A| ≠ 2|A| in general; instead, |cA| = cⁿ|A| for an n×n matrix (factor c out of each of n rows).

⚠️ The diagonal mnemonic fails for n ≥ 4

  • The excerpt explicitly warns: the pattern of summing products along diagonals works for 2×2 and 3×3 but gives wrong answers for 4×4 matrices.
  • Example: A 4×4 matrix with two equal rows is singular (determinant 0), but the diagonal mnemonic gives a nonzero value.

⚠️ Scalar multiplication

  • Multiplying one row by k multiplies the determinant by k.
  • Multiplying the entire matrix by k (all entries) multiplies the determinant by kⁿ (not k).
  • Don't confuse: |kA| = kⁿ|A|, not k|A|.
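A quick numeric check of |kA| = kⁿ|A| (pure-Python sketch; the sample matrix is an assumption):

```python
def det2(m):
    (a, b), (c, d) = m
    return a*d - b*c

A = [[3, 1], [2, 1]]             # det(A) = 1
k = 5
kA = [[k * x for x in row] for row in A]
# |kA| = k^n |A| for an n x n matrix, not k |A|
print(det2(kA), k**2 * det2(A))  # 25 25
```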

🔍 Geometric and algebraic meaning

🔍 Area and volume interpretation

  • The excerpt gives a 2×2 geometric argument: the determinant equals the signed area of a parallelogram with column vectors as sides.
  • The sign distinguishes orientation (counterclockwise vs. clockwise).

🔍 Linear independence

  • A set of n vectors in Rⁿ is linearly independent if and only if the matrix with those vectors as columns has nonzero determinant.
  • Equivalently: the vectors form a basis for Rⁿ.

🔍 Eigenvalues (brief mention)

  • The excerpt shows that |T − xI| is a polynomial of degree n in x.
  • The roots of this polynomial (at most n) are related to when T − xI is singular.

📐 Worked examples from the excerpt

📐 Example: 3×3 via Gauss's Method

| 3  1  2 |     | 3  1   2 |     | 3  1   2 |
| 3  1  0 | --> | 0  0  -2 | --> | 0  1   4 | = 6
| 0  1  4 |     | 0  1   4 |     | 0  0  -2 |

One row swap (sign change), then multiply diagonal: −(3·1·(−2)) = 6.

📐 Example: 2×2 permutation expansion

| 2  1 |
| 3  1 | = 2·1·|P₁| + 1·3·|P₂| = 2·1·(+1) + 1·3·(−1) = 2 − 3 = −1

Two permutations: identity (sgn +1) and swap (sgn −1).

📐 Example: collinear points

  • Three points (x₁,y₁), (x₂,y₂), (x,y) are collinear if and only if:
    | 1  x₁  y₁ |
    | 1  x₂  y₂ | = 0
    | 1  x   y  |
    
  • Expanding this determinant gives the equation of the line through the first two points.

Section II: Geometry of Determinants

🧭 Overview

🧠 One-sentence thesis

Determinants measure the size (area, volume, or higher-dimensional content) of boxes formed by vectors and reveal how linear transformations scale and orient these geometric objects.

📌 Key points (3–5)

  • Determinants as size functions: the absolute value of a determinant gives the volume of the box formed by a sequence of vectors.
  • How transformations affect size: when a linear transformation acts on a box, the determinant of the transformation matrix tells you the scaling factor—multiply the original size by the absolute determinant to get the new size.
  • Orientation matters: the sign of the determinant indicates whether a transformation preserves orientation (positive) or reverses it (negative).
  • Common confusion: a box is formed by an ordered sequence of vectors (order matters, coefficients in [0,1]), not a span (unordered set, any coefficients); also, preserving area does not mean preserving individual vector lengths.
  • Determinant multiplication rule: the determinant of a product of matrices equals the product of their determinants, which explains why composition of transformations multiplies their scaling factors.

📦 Boxes and their volumes

📦 What is a box

A box is formed by an ordered sequence of vectors; it consists of all points that can be written as weighted sums of those vectors with coefficients in the interval [0,1].

  • The volume (or area in 2D) of a box is the absolute value of the determinant formed by those vectors.
  • Example: to find if a vector lies inside a box, solve for the coefficients; if all coefficients are in [0,1], the vector is inside.
  • Don't confuse: a box requires exactly n vectors in n-dimensional space, and the order matters; a span can have any number of vectors and order doesn't matter.

📏 Computing box size

  • To find the size of a box formed by vectors, arrange the vectors as columns (or rows) of a matrix and compute the absolute value of the determinant.
  • If the box is not positioned at the origin, slide it to the origin first (translation does not change size).
  • Example: a box formed by the sequence ⟨(3,0), (2,1)⟩ has area equal to the absolute value of the determinant with columns (3,0) and (2,1), which is 3.

🔄 How transformations change size

🔄 The scaling rule

  • When a linear transformation T acts on a box S, the size of the image box T(S) equals the size of S multiplied by the absolute value of the determinant of T.
  • Formula: |T(S)| = |T| · |S|.
  • Example: if a starting box has area 6 and the transformation matrix has determinant −14, the image box has area 6 × 14 = 84.

🔄 Determinant of products

  • The determinant of a product of matrices equals the product of their determinants: |AB| = |A| · |B|.
  • This explains why composing transformations multiplies their scaling factors.
  • Don't confuse: |AB| = |BA| even though AB and BA may be different matrices; the scaling effect is commutative even if the transformations are not.

🔄 Finding image size step-by-step

  1. Represent the transformation with respect to standard bases (as a matrix).
  2. Multiply each vector in the box by the transformation matrix to get the image vectors.
  3. Compute the determinant of the matrix formed by the image vectors.
  4. The absolute value of this determinant is the size of the image box.
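The steps above can be sketched in pure Python (the particular box and transformation are assumptions chosen so the area-6, determinant −14 numbers from the example reappear):

```python
def det2(m):
    (a, b), (c, d) = m
    return a*d - b*c

def apply(t, v):
    """Multiply a 2x2 matrix t by a column vector v."""
    return [t[0][0]*v[0] + t[0][1]*v[1],
            t[1][0]*v[0] + t[1][1]*v[1]]

T = [[2, 0], [0, -7]]            # hypothetical map with det(T) = -14
box = [[3, 0], [0, 2]]           # box vectors: area 6
image = [apply(T, v) for v in box]
image_matrix = [[image[0][0], image[1][0]],   # image vectors as columns
                [image[0][1], image[1][1]]]
print(abs(det2(image_matrix)))   # 84 = |-14| * 6
```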

🧭 Orientation and sign

🧭 What orientation means

  • The sign of the determinant indicates orientation: positive means the transformation preserves orientation, negative means it reverses orientation.
  • Example: if the determinant of a transformation matrix is positive and the starting box is positively oriented, the ending box is also positively oriented.

🧭 Positive vs negative orientation

| Orientation | Determinant sign | Interpretation |
| --- | --- | --- |
| Positive | +1 (or positive) | "Right-hand" orientation; standard basis orientation |
| Negative | −1 (or negative) | "Left-hand" orientation; reversed from standard |

  • In 3D, positive orientation is called "right-hand orientation": if you curl your fingers from the first basis vector to the second, your thumb points along the third.
  • Example: rotating the standard basis by π/4 preserves orientation (determinant +1); reflecting across a line reverses orientation (determinant −1).

🧭 Cycles of bases

  • As you continuously rotate a basis, the determinant changes continuously but cannot jump from positive to negative without passing through zero.
  • In 2D, there are two cycles: one with all determinants +1 (positive orientation) and one with all determinants −1 (negative orientation).
  • In 3D, there are 48 possible bases formed by unit vectors along the axes; half are positively oriented and half are negatively oriented.

🔍 Special properties and cases

🔍 When determinants are zero

  • If the vectors forming a box are linearly dependent, the determinant is zero (the box is "flat" with no volume).
  • If a transformation has determinant zero, it collapses all boxes to zero volume (the image of any linearly dependent set is linearly dependent).

🔍 Determinants and inverses

  • For any invertible matrix A: |A · A⁻¹| = |A| · |A⁻¹| = 1, so |A⁻¹| = 1/|A|.
  • If |A|² = 1, then |A| = ±1; but this does not mean A is a permutation matrix (having the transpose equal the inverse is a stronger condition).
  • Example: the matrix with rows (3,1) and (2,1) has determinant 1 but is not a permutation matrix.

🔍 Determinants and transposes

  • The determinant of a matrix equals the determinant of its transpose: |Aᵀ| = |A|.
  • This means row operations and column operations have symmetric effects on determinants.

🔍 Rotation matrices

  • A rotation matrix in 2D has the form with cos θ and sin θ entries; its determinant is always 1.
  • This confirms that rotations preserve area and orientation.

🧮 Geometric applications

🧮 Area of triangles

  • The area of a triangle with vertices (x₁,y₁), (x₂,y₂), (x₃,y₃) is half the absolute value of the determinant with rows (x₁,y₁,1), (x₂,y₂,1), (x₃,y₃,1).
  • Geometric insight: the three vectors (with z=1) form a box in 3D; the triangle is the base of this box, so its area is half the box volume.
  • Example: to find the area of a triangle in 3D, form two edge vectors, find a unit vector orthogonal to both, and compute half the absolute determinant of the three vectors.
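The determinant formula for area can be sketched directly (pure Python; the sample triangles are assumptions):

```python
def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def triangle_area(p1, p2, p3):
    """Half the absolute determinant with rows (x, y, 1)."""
    m = [[p1[0], p1[1], 1], [p2[0], p2[1], 1], [p3[0], p3[1], 1]]
    return abs(det3(m)) / 2

print(triangle_area((0, 0), (4, 0), (0, 3)))   # 6.0
print(triangle_area((0, 0), (1, 1), (2, 2)))   # 0.0: collinear points
```

A zero result is exactly the collinearity test of the next subsection: degenerate triangles have zero area.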

🧮 Collinearity test

  • Three points (x₁,y₁), (x₂,y₂), (x₃,y₃) are collinear if and only if the determinant with rows (x₁,y₁,1), (x₂,y₂,1), (x₃,y₃,1) equals zero.
  • This is because the "box" formed by these three vectors has zero volume when the triangle is degenerate (all three points on a line).

🧮 Altitude and distance formulas

  • The altitude of a triangle through a vertex can be expressed using the determinant formula divided by the square root of the sum of squares of coordinate differences.
  • This reveals how determinants encode not just area but also perpendicular distances in geometric configurations.

⚙️ Determinant function properties

⚙️ Uniqueness and definition

  • Determinant functions are unique: any function satisfying the four defining properties (multilinearity, sign change on row swap, scaling, identity = 1) must be the determinant.
  • This uniqueness allows us to define determinants geometrically (as size functions) and prove they satisfy the algebraic properties.

⚙️ Row operations preserve the determinant relationship

  • Adding a multiple of one row to another does not change the determinant.
  • Swapping two rows changes the sign of the determinant.
  • Multiplying a row by a scalar k multiplies the determinant by k.
  • These properties hold whether you perform the row operation on the matrix first or on the product with another matrix.

⚙️ Similarity preserves determinant

  • If H = P⁻¹GP, then |H| = |G| (similar matrices have the same determinant).
  • Proof: |H| = |P⁻¹| · |G| · |P| = |P⁻¹P| · |G| = |G|.
  • This means the determinant is a property of the linear transformation itself, not just its matrix representation.

Section III: Laplace's Formula

🧭 Overview

🧠 One-sentence thesis

Laplace's expansion provides a systematic method to compute determinants by expanding along any row or column using cofactors, and this leads to practical formulas for matrix inverses and solving linear systems.

📌 Key points (3–5)

  • Cofactor definition: The (i,j) cofactor is (-1)^(i+j) times the determinant of the (i,j) minor (the submatrix with row i and column j deleted).
  • Expansion flexibility: A determinant can be expanded along any row or column, not just the first row—all give the same result.
  • Adjoint matrix: The adjoint (adj(T)) is the transpose of the cofactor matrix; it satisfies T · adj(T) = det(T) · I, leading to the inverse formula T^(-1) = adj(T) / det(T).
  • Common confusion: Don't confuse minors (submatrices) with cofactors (signed determinants of minors); the cofactor includes the checkerboard sign pattern (-1)^(i+j).
  • Why it matters: Laplace's formula enables Cramer's Rule for solving systems and provides explicit formulas for inverses, though direct computation is often impractical for large matrices.

🧮 Computing determinants via expansion

🧮 The cofactor formula

Cofactor T_{i,j}: The (i,j) cofactor is (-1)^(i+j) times the determinant of the (i,j) minor, where the minor is the (n-1)×(n-1) submatrix obtained by deleting row i and column j.

  • The sign (-1)^(i+j) creates a checkerboard pattern: positive when i+j is even, negative when i+j is odd.
  • Example: For a 3×3 matrix, the (2,3) cofactor is (-1)^(2+3) = -1 times the determinant of the 2×2 minor formed by deleting row 2 and column 3.

📐 Expansion along any row or column

The determinant equals the sum of entries in row i (or column j) times their cofactors:

  • Row expansion: |T| = t_{i,1}·T_{i,1} + t_{i,2}·T_{i,2} + ... + t_{i,n}·T_{i,n}
  • Column expansion: |T| = t_{1,j}·T_{1,j} + t_{2,j}·T_{2,j} + ... + t_{n,j}·T_{n,j}

The excerpt demonstrates this with multiple examples (Four.III.1.13, Four.III.1.14) showing that expanding along different rows or columns yields the same determinant value.
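Row expansion translates into a recursive sketch (pure Python, not from the excerpt; each minor is itself expanded along its first row):

```python
def minor(t, i, j):
    """Submatrix with row i and column j deleted."""
    return [row[:j] + row[j+1:] for k, row in enumerate(t) if k != i]

def det_laplace(t, row=0):
    """Expand along the given row: sum of entry times cofactor,
    where the cofactor carries the checkerboard sign (-1)^(i+j)."""
    n = len(t)
    if n == 1:
        return t[0][0]
    return sum((-1) ** (row + j) * t[row][j] * det_laplace(minor(t, row, j))
               for j in range(n))

m = [[3, 1, 2], [3, 1, 0], [0, 1, 4]]
print(det_laplace(m, row=0), det_laplace(m, row=2))  # 6 6: any row works
```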

🎯 Strategic expansion choices

  • Choose rows or columns with zeros to minimize computation.
  • Example (Four.III.1.14a): Expanding along a row with a zero means one fewer cofactor to compute.
  • Don't confuse: "Expanding down the diagonal" (multiplying diagonal entries) does NOT generally equal the determinant—this only works for triangular matrices.

🔧 The adjoint matrix and inverses

🔧 Constructing the adjoint

Adjoint matrix adj(T): The transpose of the cofactor matrix; entry (i,j) of adj(T) is the cofactor T_{j,i}.

  • Note the index swap: the (i,j) entry of the adjoint is T_{j,i}, not T_{i,j}.
  • The excerpt shows detailed calculations (Four.III.1.12, Four.III.1.15, Four.III.1.16) building adjoints by computing all cofactors and transposing.

| Step | What to do | Example notation |
| --- | --- | --- |
| 1. Compute all cofactors | Calculate T_{i,j} for each position | T_{1,1}, T_{1,2}, ..., T_{n,n} |
| 2. Arrange in matrix | Place cofactors in their positions | Cofactor matrix |
| 3. Transpose | Swap rows and columns | adj(T) |

🔄 The fundamental identity

The key relationship: T · adj(T) = det(T) · I

  • This holds for any square matrix, singular or not.
  • When det(T) ≠ 0, dividing both sides by det(T) gives: T^(-1) = adj(T) / det(T)
  • Example (Four.III.1.17): To find the inverse, compute the adjoint and divide each entry by the determinant.
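The identity can be verified mechanically for a sample 3×3 matrix (a pure-Python sketch; the matrix T here is an arbitrary assumption, not from the excerpt):

```python
def det2(m):
    (a, b), (c, d) = m
    return a*d - b*c

def minor2(t, i, j):
    return [row[:j] + row[j+1:] for k, row in enumerate(t) if k != i]

def det3(t):
    return sum((-1) ** j * t[0][j] * det2(minor2(t, 0, j)) for j in range(3))

def adjoint3(t):
    """Transpose of the cofactor matrix: entry (i, j) is cofactor T_{j,i}."""
    return [[(-1) ** (i + j) * det2(minor2(t, j, i)) for j in range(3)]
            for i in range(3)]

def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

T = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
print(det3(T))                   # 25
print(matmul3(T, adjoint3(T)))   # det(T) times the identity
```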

⚠️ Special matrix structures

The excerpt proves (Four.III.1.23, Four.III.1.26) that diagonal and triangular matrices have special adjoint properties:

  • Diagonal matrix: If D has diagonal entries d₁, d₂, ..., dₙ, then adj(D) is diagonal with entries (product of all other d's).
  • Upper triangular: If T is upper triangular, then adj(T) is also upper triangular.
  • Don't confuse: The adjoint of a transpose equals the transpose of the adjoint (Four.III.1.24), because (−1)^(i+j) = (−1)^(j+i) and minors transpose correspondingly.

📊 Applications to linear systems

📊 Cramer's Rule

For a system Ax = b with det(A) ≠ 0, each variable is:

x_i = det(B_i) / det(A)

where B_i is A with column i replaced by b.

  • The excerpt derives this (Topic: Cramer's Rule, problem 4) by writing A·Xᵢ = Bᵢ, where Xᵢ is the identity matrix with its i-th column replaced by the solution vector; since det(Xᵢ) = xᵢ, taking determinants gives det(A)·xᵢ = det(Bᵢ).
  • Example (problem 1a): Solving a 2×2 system by computing two determinants, one for each variable.
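For a 2×2 system this can be sketched with exact rational arithmetic (a minimal pure-Python sketch; the sample system is an assumption, not from the excerpt):

```python
from fractions import Fraction

def det2(m):
    (a, b), (c, d) = m
    return a*d - b*c

def cramer2(a, b):
    """Solve a 2x2 system Ax = b by Cramer's Rule (needs det(A) != 0)."""
    d = det2(a)
    if d == 0:
        raise ValueError("Cramer's Rule needs a nonsingular coefficient matrix")
    b1 = [[b[0], a[0][1]], [b[1], a[1][1]]]   # column 1 replaced by b
    b2 = [[a[0][0], b[0]], [a[1][0], b[1]]]   # column 2 replaced by b
    return Fraction(det2(b1), d), Fraction(det2(b2), d)

# hypothetical system: 2x + y = 5, x + 3y = 10
print(cramer2([[2, 1], [1, 3]], [5, 10]))   # x = 1, y = 3
```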

🔢 Integer solutions

When A and b have integer entries and det(A) ≠ 0:

  • Each x_i = det(B_i) / det(A) is a ratio of integers.
  • Since B_i has integer entries, det(B_i) is an integer (problem 5).
  • This guarantees rational solutions (though not necessarily integer solutions unless det(A) = ±1).

🚫 Singular systems

For systems where det(A) = 0:

  • Cramer's Rule does not apply.
  • Infinitely many solutions: All det(B_i) = 0 as well (problem 7).
  • No solutions: At least one det(B_i) ≠ 0.
  • Example (problem 8): A system with c = 6 has infinitely many solutions (all determinants zero), but c ≠ 6 has no solutions (some B_i determinant nonzero).

🔍 Theoretical foundations

🔍 Permutation expansion connection

The excerpt proves (Four.III.1.27) that Laplace expansion is equivalent to the permutation definition:

  • Each term in the permutation expansion contains exactly one entry from each row.
  • Factoring out the row i entries: |T| = t_{i,1}·(sum of the remaining factors of the terms containing t_{i,1}) + ... + t_{i,n}·(sum of the remaining factors of the terms containing t_{i,n}).
  • The "sum of terms without row i" for position (i,j) equals the cofactor T_{i,j}.
  • Key insight: An n-permutation φ with φ(n) = n has the same sign as the corresponding (n-1)-permutation, because they have the same number of inversions.

🔄 Determinant of transpose

The proof (Four.III.1.28) uses induction on matrix size:

  • Base case: For 1×1 matrices, |T| = |T^T| trivially.
  • Inductive step: Assume true for (n-1)×(n-1) matrices. Expanding T on row i and T^T on column i gives matching terms because:
    • The signs (−1)^(i+j) = (−1)^(j+i) are equal.
    • The (j,i) minor of T^T is the transpose of the (i,j) minor of T.
    • By induction, these minors have equal determinants.

📐 Geometric interpretation

The excerpt includes (Four.II.1.31) a geometric view for 3×3 determinants:

  • Three vectors ending in the z=1 plane form a box.
  • The determinant equals zero if and only if the triangle formed by the endpoints is degenerate (collinear points).
  • This connects to the formula for triangle area: Area = (1/2)|determinant| when coordinates are in the z=1 plane.
  • Application: When coordinates are integers, the determinant is an integer, so the minimum nonzero triangle area is 1/2.

Complex Vector Spaces and Similarity

Section I: Complex Vector Spaces

🧭 Overview

🧠 One-sentence thesis

This section demonstrates how homogeneous coordinate vectors represent geometric objects (points and lines) in projective geometry and how transformation matrices encode reflections, rotations, and translations in computer graphics.

📌 Key points (3–5)

  • Homogeneous coordinates: Points and lines are represented by three-component vectors where scalar multiples represent the same geometric object.
  • Intersection representation: Points defined as intersections of two lines can be expressed by combining the homogeneous coordinate vectors of those lines with parameters.
  • Parameter relationships: When a point lies on multiple lines, the parameters in its homogeneous coordinate vectors must satisfy specific relationships (e.g., u₀u₁u₂ = 1).
  • Transformation matrices: In computer graphics, 3×3 matrices encode geometric transformations (translation, reflection, rotation) using homogeneous coordinates in R².
  • Common confusion: Homogeneous coordinate vectors are not unique—any scalar multiple represents the same point, so different parameter combinations can describe the same geometric object.

📐 Homogeneous coordinate representation

📍 What homogeneous coordinates encode

Homogeneous coordinate vectors: three-component vectors that represent points or lines in projective geometry, where scalar multiples represent the same geometric object.

  • A point or line is not tied to a single vector; any non-zero multiple of the vector represents the same object.
  • The excerpt shows points like U₂ and U₁ expressed as linear combinations of basis vectors with parameters.
  • Example: U₂ has the form c(1,0,1) + d(1,1,1) = (c+d, d, c+d); since scalar multiples represent the same point, this is the homogeneous coordinate vector (1, u₂, 1) with u₂ = d/(c+d).

🔗 Intersection points and parameter equations

  • When a point is defined as the intersection of two lines, its homogeneous coordinate vector is found by setting two linear combinations equal.
  • The excerpt shows V₁ = T₀U₂ ∩ U₀T₂ leads to equations: g(1,0,0) + h(1,u₂,1) = i(u₀,1,1) + j(0,0,1).
  • Solving these component-wise gives relationships among parameters (e.g., g+h = iu₀, hu₂ = i, h = i+j).
  • After substitution, V₁ has the two-parameter homogeneous coordinate vector (u₀u₂, u₂, 1).

🔄 Multiple representations of the same point

  • A single point can have different homogeneous coordinate vector forms depending on which lines define it.
  • The excerpt shows V₁ can be written as (u₀u₂, u₂, 1) from one intersection and as (q, p+q, qu₁) from lying on line T₁U₁.
  • These must represent the same point, so there is a relationship: comparing the two forms gives (u₀u₁u₂, u₁u₂, u₁) and leads to the constraint u₀u₁u₂ = 1.
  • Don't confuse: different parameter sets can describe the same geometric object; the key is recognizing when two forms are scalar multiples.

🖼️ Computer graphics transformations

🔄 Transformation matrices in R²

  • In computer graphics, 3×3 matrices encode transformations using homogeneous coordinates (x, y, 1) for points in R².
  • The bottom row is typically (0, 0, 1) to preserve the homogeneous structure.
  • The excerpt shows a reflection about the line y = 2x encoded as:
    (-3/5,  4/5, 0)
    ( 4/5,  3/5, 0)
    (   0,    0, 1)
    

🪞 Reflection about a line through the origin

  • To find the reflection matrix M, identify how basis vectors transform.
  • The excerpt uses two test vectors: (1, 2) maps to itself (on the line), and (-2, 1) maps to (2, -1) (perpendicular to the line).
  • Solving M * [(1, -2), (2, 1)] = [(1, 2), (2, -1)] gives the matrix M.
  • The homogeneous form adds a third row and column to embed this in 3×3 format.

🔀 Composing transformations

  • Complex transformations are built by multiplying simpler transformation matrices.
  • The excerpt shows reflection about a line not through the origin: translate by -4, reflect about y = 2x, then translate back by +4.
  • Matrix product:
    (translate by -4) × (reflect) × (translate by +4)
    
    gives the combined transformation matrix.
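The composition can be sketched with explicit homogeneous matrices (a pure-Python sketch; the fixed line y = 2x − 4 and the sample point are assumptions chosen to match the product order above):

```python
def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply3(m, point):
    """Apply a homogeneous 3x3 matrix to a point (x, y)."""
    x, y = point
    v = [m[i][0]*x + m[i][1]*y + m[i][2] for i in range(3)]
    return (v[0] / v[2], v[1] / v[2])

translate_down = [[1, 0, 0], [0, 1, -4], [0, 0, 1]]
reflect = [[-3/5, 4/5, 0], [4/5, 3/5, 0], [0, 0, 1]]   # about y = 2x
translate_up = [[1, 0, 0], [0, 1, 4], [0, 0, 1]]

# (translate by -4) x (reflect) x (translate by +4): acting on column
# vectors, the rightmost factor is applied first
combined = matmul3(translate_down, matmul3(reflect, translate_up))

# a point on the line y = 2x - 4 is fixed by the combined reflection
print(apply3(combined, (1, -2)))   # approximately (1.0, -2.0)
```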

🔄 Rotation and translation combined

  • The excerpt demonstrates that rotation followed by translation is encoded as:
    (cos θ, -sin θ, tₓ)
    (sin θ,  cos θ, tᵧ)
    (    0,      0,  1)
    
  • This is the product of a translation matrix and a rotation matrix.
  • For 3D (4×4 matrices), the excerpt shows rotation matrices with an extra dimension, where the third row/column handles the z-axis.

🧮 Parameter relationships and constraints

🔗 Deriving parameter constraints

  • When a point lies on multiple lines, its different homogeneous coordinate representations must be scalar multiples.
  • The excerpt shows V₁ on line T₁U₁ has form (q, p+q, qu₁) (from equation *).
  • From a previous part, V₁ also has form (u₀u₂, u₂, 1).
  • Matching these (allowing for scalar multiplication) gives (u₀u₁u₂, u₁u₂, u₁) and the relationship u₀u₁u₂ = 1.

🎯 Using constraints to simplify forms

  • Once a constraint like u₀u₁u₂ = 1 is known, it can simplify other homogeneous coordinate vectors.
  • The excerpt shows V₂ has form (u₀u₁u₂, u₂, u₁u₂), which becomes (1, u₂, u₁u₂) after substituting the constraint.
  • Verifying V₂ lies on line T₂U₂ (form r(0,0,1) + s(1,u₂,1)) requires choosing parameters s=1 and r=u₁u₂-1.
  • This confirms consistency: the homogeneous coordinate vector derived from one intersection matches the form required by another line.

Section II: Similarity

🧭 Overview

🧠 One-sentence thesis

Two matrices are similar when they represent the same linear transformation with respect to different bases, and a matrix is diagonalizable when it can be made diagonal through a change of basis, which happens precisely when the transformation has enough eigenvectors to form a basis.

📌 Key points (3–5)

  • What similarity means: matrices T and S are similar if T = PSP⁻¹ for some invertible P; they represent the same transformation with different bases.
  • Diagonalizability criterion: a matrix is diagonalizable if and only if there exists a basis of eigenvectors; this happens when the geometric multiplicity equals the algebraic multiplicity for every eigenvalue.
  • Eigenvalues and eigenvectors: λ is an eigenvalue of T if T𝑣⃗ = λ𝑣⃗ for some nonzero 𝑣⃗ (an eigenvector); eigenvalues are roots of the characteristic polynomial det(T - xI) = 0.
  • Common confusion: algebraic vs geometric multiplicity—algebraic multiplicity is how many times λ appears as a root of the characteristic equation; geometric multiplicity is the dimension of the eigenspace (how many independent eigenvectors exist for λ).
  • Why it matters: similar matrices share invariants (rank, determinant, eigenvalues); diagonalization simplifies computations like matrix powers.

🔗 Similarity as change of basis

🔗 Definition of similarity

Two matrices T and S are similar if there exists an invertible matrix P such that T = PSP⁻¹.

  • This is not just any matrix equation; it encodes a geometric relationship.
  • The matrices represent the same linear transformation t, but with respect to different bases.
  • The matrix P is the change-of-basis matrix: P = Rep_{B,D}(id).

🗺️ The similarity diagram

The relationship is captured by this commutative diagram:

V wrt B  --t→  V wrt B
   ↓ id         ↓ id
V wrt D  --t→  V wrt D
  • Top arrow: representation of t with respect to basis B (matrix T).
  • Bottom arrow: representation of t with respect to basis D (matrix S).
  • Vertical arrows: identity map, represented by change-of-basis matrices P and P⁻¹.
  • Reading the diagram: T = P⁻¹SP, or equivalently S = PTP⁻¹.

Example: If T = Rep_{E₂,E₂}(t) represents a transformation with respect to the standard basis, and we want S = Rep_{B,B}(t) with respect to a new basis B, then S = PTP⁻¹ where P = Rep_{E₂,B}(id) (equivalently, S = P⁻¹TP with P = Rep_{B,E₂}(id)).

🔄 Properties preserved by similarity

Similarity is an equivalence relation (reflexive, symmetric, transitive).

Similar matrices share these invariants:

  • Rank: both represent the same transformation, so the dimension of the range is the same.
  • Determinant: |PSP⁻¹| = |P| · |S| · |P⁻¹| = |P| · |S| · |P|⁻¹ = |S|.
  • Characteristic polynomial: similar matrices have the same eigenvalues.

Don't confuse: similarity with matrix equivalence—similarity is stricter; all similar matrices are equivalent, but not all equivalent matrices are similar (Exercise 14 confirms this).

🎯 Diagonalizability

🎯 What it means to diagonalize

A matrix T is diagonalizable if it is similar to a diagonal matrix D.

  • In other words: T = PDP⁻¹ for some diagonal D and invertible P.
  • Geometrically: the transformation has a basis B such that Rep_{B,B}(t) is diagonal.
  • When T is diagonal, T𝛽ᵢ = λᵢ𝛽ᵢ for each basis vector 𝛽ᵢ.

🧱 The diagonalization procedure

To diagonalize a matrix T:

  1. Find eigenvalues: solve the characteristic equation det(T - xI) = 0.
  2. Find eigenvectors: for each eigenvalue λ, solve (T - λI)𝑣⃗ = 0⃗.
  3. Check dimension: verify that you have n independent eigenvectors (for an n×n matrix).
  4. Build P: columns of P are the eigenvectors.
  5. Build D: diagonal entries of D are the corresponding eigenvalues.

Example (from Exercise 2.6): To diagonalize T = (4 -2; 1 1):

  • Solve det(T - xI) = 0 gives (4-x)(1-x) + 2 = x² - 5x + 6 = (x-2)(x-3) = 0.
  • Eigenvalues: λ₁ = 2, λ₂ = 3.
  • For λ₁ = 2: solve (2 -2; 1 -1)(b₁; b₂) = (0; 0) gives 𝛽₁ = (1; 1).
  • For λ₂ = 3: solve (1 -2; 1 -2)(b₁; b₂) = (0; 0) gives 𝛽₂ = (2; 1).
  • Result: D = (2 0; 0 3) = P⁻¹TP where P = (1 2; 1 1).
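The computation in Exercise 2.6 can be checked mechanically (a minimal pure-Python sketch of the verification D = P⁻¹TP):

```python
def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a*d - b*c
    return [[d/det, -b/det], [-c/det, a/det]]

T = [[4, -2], [1, 1]]
P = [[1, 2], [1, 1]]             # columns are the eigenvectors
D = matmul2(inverse2(P), matmul2(T, P))
print(D)                         # [[2.0, 0.0], [0.0, 3.0]]
```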

⚠️ When diagonalization fails

Not every matrix is diagonalizable.

Example (Exercise 2.10): The matrix N = (0 0; 1 0) is not diagonalizable because solving (0 0; 1 0)(b₁; b₂) = x(b₁; b₂) gives only one independent eigenvector (0; 1) for eigenvalue x = 0.

The problem: we need n independent eigenvectors to form a basis, but this matrix doesn't have enough.

🔢 Eigenvalues and eigenvectors

🔢 Core definitions

A scalar λ is an eigenvalue of transformation t (or matrix T) if there exists a nonzero vector 𝑣⃗ such that t(𝑣⃗) = λ𝑣⃗.

The nonzero vector 𝑣⃗ satisfying t(𝑣⃗) = λ𝑣⃗ is an eigenvector associated with eigenvalue λ.

The eigenspace V_λ is the set of all vectors 𝑣⃗ (including the zero vector) such that t(𝑣⃗) = λ𝑣⃗.

  • An eigenspace is a subspace: it is the null space of (T - λI).
  • Eigenvectors are special: the transformation simply scales them by λ.
  • The zero vector is in every eigenspace but is not an eigenvector.

📐 Finding eigenvalues: the characteristic polynomial

The characteristic polynomial of T is det(T - xI), where x is a variable.

  • Eigenvalues are precisely the roots of the characteristic equation det(T - xI) = 0.
  • Why: λ is an eigenvalue ⟺ (T - λI)𝑣⃗ = 0⃗ has nonzero solutions ⟺ T - λI is singular ⟺ det(T - λI) = 0.

Example (Exercise 3.23a): For T = (10 -9; 4 -2), the characteristic polynomial is det(T - xI) = (10-x)(-2-x) + 36 = x² - 8x + 16 = (x-4)². So λ = 4 is the only eigenvalue (with algebraic multiplicity 2).

Shortcut for triangular matrices (Exercise 3.32): If T is triangular, det(T - xI) is the product down the diagonal: (t₁,₁ - x)(t₂,₂ - x)···(tₙ,ₙ - x). So the eigenvalues are just the diagonal entries.

🧮 Algebraic vs geometric multiplicity

The algebraic multiplicity of eigenvalue λ is the number of times (λ - x) appears as a factor in the characteristic polynomial.

The geometric multiplicity of eigenvalue λ is the dimension of the eigenspace V_λ.

  • Geometric multiplicity ≤ algebraic multiplicity (always).
  • For diagonalizability, we need equality for every eigenvalue.

Example (Exercise 3.28c): For the matrix with characteristic polynomial x³ - 5x² + 8x - 4 = (x-1)(x-2)²:

  • λ₁ = 1: algebraic multiplicity 1, geometric multiplicity 1 (eigenspace is 1-dimensional).
  • λ₂ = 2: algebraic multiplicity 2, geometric multiplicity 1 (eigenspace is only 1-dimensional).
  • This matrix is not diagonalizable because the geometric multiplicity of λ₂ is less than its algebraic multiplicity.

Don't confuse: a repeated root in the characteristic polynomial does not automatically mean the matrix is non-diagonalizable; you must check the dimension of the eigenspace (Exercise 3.28b shows a case where algebraic multiplicity 2 equals geometric multiplicity 2).

🧩 Computational techniques

🧩 Solving for eigenvectors

Once you have an eigenvalue λ, find eigenvectors by solving the homogeneous system (T - λI)𝑣⃗ = 0⃗.

Example (Exercise 3.24a): For T = (3 0; 8 -1) with eigenvalue λ₁ = 3:

  • Set up: (3-3 0; 8 -1-3)(b₁; b₂) = (0 0; 8 -4)(b₁; b₂) = (0; 0).
  • Row reduce: second equation gives 8b₁ - 4b₂ = 0, so b₂ = 2b₁.
  • Eigenspace: {(b₁; 2b₁) | b₁ ∈ ℂ}.
  • Eigenvector: (1; 2).

🔧 Powers of diagonal matrices

If D is diagonal with entries d₁, ..., dₙ, then D^p has diagonal entries d₁^p, ..., dₙ^p.

This makes computing powers of diagonalizable matrices easy:

  • If T = PDP⁻¹, then T^k = PD^kP⁻¹.
  • Just raise the diagonal entries to the k-th power.

Example (Exercise 2.17): To compute (matrix)^k, first diagonalize, then raise the diagonal entries to the k-th power, then transform back.
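The three steps can be sketched with numpy (assumed available); T here is an illustrative diagonalizable matrix, not the exercise's:

```python
import numpy as np

# T^k = P D^k P^{-1}: diagonalize, raise the diagonal entries to the
# k-th power, transform back.
T = np.array([[4.0, 1.0],
              [2.0, 3.0]])          # eigenvalues 5 and 2
eigvals, P = np.linalg.eig(T)       # columns of P are eigenvectors
k = 5

Dk = np.diag(eigvals ** k)          # just power the diagonal entries
Tk = P @ Dk @ np.linalg.inv(P)      # transform back

# Same answer as repeated multiplication.
print(bool(np.allclose(Tk, np.linalg.matrix_power(T, k))))  # True
```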

🎲 Polynomials of matrices

If T = PSP⁻¹ and p(x) is a polynomial, then p(T) = Pp(S)P⁻¹.

Why (Exercise 2.21):

  • p(T) = cₙT^n + ··· + c₁T + c₀I
  • = cₙ(PSP⁻¹)^n + ··· + c₁(PSP⁻¹) + c₀I
  • = cₙPS^nP⁻¹ + ··· + c₁PSP⁻¹ + c₀PP⁻¹
  • = P(cₙS^n + ··· + c₁S + c₀I)P⁻¹
  • = Pp(S)P⁻¹.

This is useful because if S is diagonal, p(S) is easy to compute.
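A numerical check of the identity, assuming numpy; both the matrix and the polynomial p(x) = x² - 3x + 2 are illustrative choices:

```python
import numpy as np

# Check p(T) = P p(S) P^{-1} on an illustrative diagonalizable matrix.
T = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals, P = np.linalg.eig(T)
S = np.diag(eigvals)                 # T = P S P^{-1}, S diagonal

def p(M):
    """p(x) = x^2 - 3x + 2, applied to a matrix."""
    I = np.eye(M.shape[0])
    return M @ M - 3 * M + 2 * I

# p(S) only touches the diagonal, so it is cheap to compute.
print(bool(np.allclose(p(T), P @ p(S) @ np.linalg.inv(P))))  # True
```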

🧪 Special cases and examples

🧪 2×2 matrices

For a 2×2 matrix T = (a b; c d):

  • Characteristic polynomial: x² - (a+d)x + (ad - bc).
  • Eigenvalues: use the quadratic formula.

Example (Exercise 3.34): The characteristic polynomial is x² - (trace)x + (determinant).
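The trace/determinant shortcut plus the quadratic formula, sketched in numpy on the matrix from Exercise 3.23a:

```python
import numpy as np

# For a 2x2 matrix, the characteristic polynomial is
# x^2 - (trace)x + (determinant); the quadratic formula gives the roots.
a, b, c, d = 10.0, -9.0, 4.0, -2.0
tr, det = a + d, a * d - b * c            # 8 and 16

disc = tr * tr - 4 * det                  # zero: a repeated root
roots = ((tr + np.sqrt(disc)) / 2, (tr - np.sqrt(disc)) / 2)
print(roots)  # (4.0, 4.0)
```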

🧪 Symmetric matrices

The excerpt mentions (Exercise 2.20b) that real symmetric 2×2 matrices are always similar to real diagonal matrices.

  • The discriminant in the quadratic formula is (a + d)² - 4(ad - b²) = (a - d)² + 4b², which is never negative.
  • This means eigenvalues are always real.

🧪 Matrices that cannot be diagonalized

Example 1 (Exercise 2.19): The matrix (1 c; 0 1) with c ≠ 0 cannot be diagonalized. The proof shows that assuming it is diagonalizable forces the change-of-basis matrix P to be singular, a contradiction.

Example 2 (Exercise 3.36): The matrix (0 1; 0 0) has only eigenvalue 0, but its eigenspace is only 1-dimensional, so it cannot be diagonalized.

🧪 Zero and identity matrices

  • The zero matrix is similar only to itself (Exercise 1.6a).
  • The identity matrix is similar only to itself (Exercise 1.12).
  • Scalar multiples of the identity cI are similar only to themselves (Exercise 1.6b).

Don't confuse: these are extreme cases; most matrices have many matrices in their similarity class.


Section III: Nilpotence

🧭 Overview

🧠 One-sentence thesis

Nilpotent transformations eventually send all vectors to zero after repeated application, and their structure is completely captured by analyzing how the chains of range and null spaces stabilize, leading to a canonical form built from "strings" of basis vectors.

📌 Key points (3–5)

  • What nilpotence means: a transformation is nilpotent if some power of it (applying it repeatedly) becomes the zero map; the smallest such power is the index of nilpotency.
  • How chains stabilize: repeatedly applying a transformation produces nested chains of range spaces (shrinking) and null spaces (growing) that eventually stop changing at some power k.
  • String basis structure: nilpotent maps can be represented by a special basis where each basis vector either maps to zero or to the next vector in a "string," and the canonical form is a block matrix of subdiagonal ones.
  • Common confusion: the index of nilpotency is bounded by the dimension of the space—by the n-th iteration, the null space has grown as large as possible (Lemma 1.4).
  • Why it matters: every nilpotent matrix is similar to a canonical form determined entirely by the lengths of its strings, and nilpotent matrices have only zero as an eigenvalue.

🔗 Chains of range and null spaces

🔗 How the chains are built

  • Start with a transformation t on a vector space V.
  • The range space chain: V ⊇ R(t) ⊇ R(t²) ⊇ R(t³) ⊇ …
    • Each range space is contained in the previous one because applying t again can only shrink or keep the image the same.
  • The null space chain: {0} ⊆ N(t) ⊆ N(t²) ⊆ N(t³) ⊆ …
    • If t^j(v) = 0, then t^(j+1)(v) = t(t^j(v)) = t(0) = 0, so each null space contains the previous one.

⏹️ When the chains stabilize

  • Lemma 1.4: For a map on an n-dimensional space, the nullity (dimension of the null space) has grown as large as possible by the n-th iteration.
  • Once the dimension of R(t^k) stops decreasing, the containments become equalities: R(t^k) = R(t^(k+1)) = R(t^(k+2)) = … (the "generalized range space" R^∞(t)).
  • Similarly, once the dimension of N(t^k) stops increasing, N(t^k) = N(t^(k+1)) = … (the "generalized null space" N^∞(t)).
  • The dimensions of R(t^j) and N(t^j) always sum to n, so when one stops changing, so does the other.

🔍 Examples of stabilization

  • Zero transformation: R(t) = R(t²) = … = {0} and N(t) = N(t²) = … = V.
  • Identity transformation: R(t) = R(t²) = … = V and N(t) = N(t²) = … = {0}.
  • Shift operator (x, y, z) ↦ (0, x, y): the null space after one application is all triples starting with one zero; after two applications, triples starting with two zeros; after three, the entire space. The map stabilizes after three iterations.
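The shift-operator stabilization is easy to watch numerically; a sketch assuming numpy:

```python
import numpy as np

# Null space chain for the shift (x, y, z) -> (0, x, y): the nullities
# grow 1, 2, 3 and then stop changing once N^3 = 0.
N = np.array([[0, 0, 0],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)

def nullity(M):
    return int(M.shape[0] - np.linalg.matrix_rank(M))

chain = [nullity(np.linalg.matrix_power(N, j)) for j in range(1, 5)]
print(chain)  # [1, 2, 3, 3]
```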

🧵 String bases and canonical form

🧵 What a t-string is

A t-string starting from v is the sequence v, t(v), t²(v), …, t^(k−1)(v), where t^k(v) = 0 but t^(k−1)(v) ≠ 0.

  • The length of the string is k (the number of non-zero vectors in the sequence).
  • Example: if β₁ ↦ β₂ ↦ β₃ ↦ 0, this is a string of length 3.

🏗️ Building a string basis

  • A string basis is a basis for V made up entirely of t-strings.
  • Each string starts with some vector v and continues until it hits zero.
  • The basis may contain multiple strings of different lengths.
  • Example from Five.III.2.23:
    • β₁ ↦ β₂ ↦ β₃ ↦ 0 (length 3)
    • β₄ ↦ 0 (length 1)
    • β₅ ↦ 0 (length 1)

🧱 Canonical form matrix

  • With respect to a string basis, the transformation is represented by a block-diagonal matrix.
  • Each block corresponds to one string and has the form:
    [ 0  0  0 ]
    [ 1  0  0 ]
    [ 0  1  0 ]
    
    (subdiagonal ones, zeros elsewhere).
  • Example: a 5×5 canonical form with a length-3 string and a length-2 string:
    [ 0  0  0  0  0 ]
    [ 1  0  0  0  0 ]
    [ 0  1  0  0  0 ]
    [ 0  0  0  0  0 ]
    [ 0  0  0  1  0 ]
    

🔄 How to find the canonical form

  1. Compute the chains of null spaces N(t), N(t²), … until they stabilize.
  2. Count dimensions at each power to determine how many strings of each length are needed.
  3. Build a string basis accordingly.
  4. Write the matrix with respect to that basis.

Example (Five.III.2.21a):

  • Domain dimension 4; N(t) has dimension 2, N(t²) has dimension 4.
  • Index of nilpotency is 2.
  • String structure: two vectors map to zero after one application, two more after two applications.
  • Canonical form:
    [ 0  0  0  0 ]
    [ 1  0  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  1  0 ]
    
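The construction can be sketched in code, assuming numpy; `canonical_form` is a hypothetical helper name that builds the block matrix from a list of string lengths:

```python
import numpy as np

# Build the nilpotent canonical form from its string lengths: each
# length-k string contributes a k x k block of subdiagonal ones.
def canonical_form(string_lengths):
    n = sum(string_lengths)
    M = np.zeros((n, n))
    start = 0
    for k in string_lengths:
        for i in range(1, k):
            M[start + i, start + i - 1] = 1  # subdiagonal one
        start += k
    return M

# The Five.III.2.21a example: two strings of length 2 in dimension 4.
C = canonical_form([2, 2])
print(C)
# Index of nilpotency = longest string, so C^2 is the zero matrix.
print(bool(np.allclose(np.linalg.matrix_power(C, 2), 0)))  # True
```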

🧮 Properties and tests

🧮 Index of nilpotency

The index of nilpotency is the smallest power k such that t^k is the zero map.

  • For an n-dimensional space, the index is at most n (by Lemma 1.4).
  • The index can be zero only when V is the trivial space {0}.
  • To test if a matrix is nilpotent:
    • For 2×2 matrices, check if the square is zero.
    • For 3×3 matrices, check if the cube is zero.
    • In general, check the n-th power for an n×n matrix.
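The general test is one line in numpy (assumed available); by Lemma 1.4 there is no need to try powers beyond n:

```python
import numpy as np

# An n x n matrix M is nilpotent iff M^n is the zero matrix.
def is_nilpotent(M):
    n = M.shape[0]
    return bool(np.allclose(np.linalg.matrix_power(M, n), 0))

shift = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
print(is_nilpotent(shift))      # True
print(is_nilpotent(np.eye(2)))  # False
```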

🔢 Eigenvalues of nilpotent matrices

  • All eigenvalues of a nilpotent matrix are zero.
  • Reason: a nilpotent matrix sends all vectors to zero after some iterations, which conflicts with eigenvector behavior v ↦ λv unless λ = 0.
  • Another view: the canonical form is lower-triangular with zeros on the diagonal, so its characteristic polynomial is (−x)^n, which has only zero as a root.

⚠️ Common confusion: similar matrices

  • Two nilpotent matrices are similar if and only if they have the same canonical form (same string lengths).
  • Example (Five.III.2.35): these two matrices both have only zero eigenvalues but are not similar:
    [ 0  0 ]    [ 0  0 ]
    [ 0  0 ]    [ 1  0 ]
    
    (They are already in canonical form and differ.)

🔧 Working with nilpotent maps

🔧 Checking nilpotence

  • A transformation n is nilpotent if and only if its matrix representation N (with respect to any basis) is nilpotent.
  • This is because only the zero matrix represents the zero map, so n^j is the zero map ⟺ N^j is the zero matrix.
  • Similar matrices represent the same map, so they have the same index of nilpotency.

🔧 Subspaces and nilpotence

  • If W is a subspace of V and t(W) ⊆ W, then the span of a t-string starting from v is t-invariant (closed under t).
  • The set {v, t(v), …, t^(k−1)(v)} is linearly independent (unless v = 0).
    • Proof: write a linear combination equal to 0, apply t^(k−1) to both sides to isolate the first coefficient, then continue with t^(k−2), etc.

🔧 Dimension constraints

  • The dimension of R(t³) for a nilpotent map of index 4 can be any size ≥ 1.
    • It must be at least 1 (there must be at least one string of length 4).
    • It can be arbitrarily large (add more strings of length 4).
  • A map is nonsingular ⟺ it preserves dimension ⟺ R(t) = V.
    • For a transformation t : V → V, nonsingular ⟺ onto ⟺ R(t) = V.

🔧 Left and right multiplication

  • Left multiplication by a block of subdiagonal ones shifts the rows of a matrix downward.
  • Right multiplication shifts the columns.
  • Distinct blocks act on distinct parts of the matrix.
  • Example:
    [ 0  0 ] [ a  b ]   [ 0  0 ]
    [ 1  0 ] [ c  d ] = [ a  b ]
    

📊 Summary table: canonical forms

| String structure | Dimension | Index | Canonical form block structure |
|---|---|---|---|
| β₁ ↦ β₂ ↦ 0 | 2 | 2 | One 2×2 block |
| β₁ ↦ β₂ ↦ 0, β₃ ↦ 0 | 3 | 2 | One 2×2 block, one 1×1 block |
| β₁ ↦ β₂ ↦ β₃ ↦ 0 | 3 | 3 | One 3×3 block |
| β₁ ↦ β₂ ↦ β₃ ↦ 0, β₄ ↦ β₅ ↦ 0 | 5 | 3 | One 3×3 block, one 2×2 block |

Don't confuse: the number of blocks vs. the index—the index is the length of the longest string, not the number of strings.


Section IV: Jordan Form

🧭 Overview

🧠 One-sentence thesis

Jordan form provides a canonical representation for linear transformations by combining nilpotent structure with eigenvalue information, enabling systematic analysis of recurrences, matrix powers, and dynamical systems.

📌 Key points (3–5)

  • Jordan form structure: Combines a nilpotent matrix N with eigenvalue λ to create Jordan blocks J = N + λI, organizing transformations into canonical representatives.
  • Minimal vs characteristic polynomials: The minimal polynomial divides the characteristic polynomial and contains the same linear factors but possibly to lower powers; both determine Jordan structure.
  • Cayley-Hamilton theorem: Every matrix satisfies its own characteristic polynomial, linking algebraic properties to geometric structure.
  • Common confusion: Jordan form vs diagonalization—a matrix is diagonalizable if and only if its minimal polynomial has no repeated roots; otherwise Jordan form has subdiagonal 1's.
  • Applications span multiple domains: Linear recurrences, population dynamics, page ranking, and coupled oscillators all reduce to eigenvalue/Jordan form analysis.

🧩 Polynomials of Maps and Matrices

🔍 Cayley-Hamilton Theorem

The Cayley-Hamilton Theorem (Theorem 1.8) states that every matrix satisfies its own characteristic polynomial. Consequently, the minimal polynomial must contain the same linear factors as the characteristic polynomial, although possibly to lower powers (but never to power zero).

  • The minimal polynomial m(x) divides the characteristic polynomial c(x).
  • Both polynomials share the same roots (eigenvalues), but powers may differ.
  • Example: If c(x) = (x - 3)⁴, then m(x) could be (x - 3), (x - 3)², (x - 3)³, or (x - 3)⁴.

Why it matters: The minimal polynomial determines the smallest degree polynomial that annihilates the transformation, revealing the true complexity of the nilpotent structure.

🔢 Finding Minimal Polynomials

The excerpt demonstrates a systematic method:

  1. Factor the characteristic polynomial to identify eigenvalues.
  2. List all possible minimal polynomials (same factors, varying powers).
  3. Test candidates by substituting the matrix until finding the zero matrix.

Example from the excerpt: For a 4×4 matrix with c(x) = x²(x² - 4), the minimal polynomial candidates are x(x² - 4) or x²(x² - 4). Testing shows m(x) = x(x² - 4) works.

Don't confuse: The characteristic polynomial always has degree n for an n×n matrix, but the minimal polynomial can have degree anywhere from 1 to n.

🎯 Special Cases

🎯 Nilpotent transformations

  • A transformation is nilpotent if and only if its only eigenvalue is zero.
  • The minimal polynomial has the form m(x) = xᵏ for some k.
  • Example: A map n is nilpotent if and only if nʲ is the zero map for some j.

🎯 Diagonal matrices

For diagonal matrices, the minimal polynomial is the product of distinct linear factors (x - tᵢ,ᵢ), with no repeats needed.

Example: For a diagonal matrix with entries 3, 3, 1, the characteristic polynomial is (x - 3)²(x - 1) but the minimal polynomial is just (x - 3)(x - 1).
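Candidate testing by substitution, sketched in numpy on that diagonal example:

```python
import numpy as np

# For diag(3, 3, 1) the product of distinct factors already annihilates
# T, so the minimal polynomial is m(x) = (x - 3)(x - 1).
T = np.diag([3.0, 3.0, 1.0])
I = np.eye(3)

candidate = (T - 3 * I) @ (T - I)
print(bool(np.allclose(candidate, 0)))  # True

# Neither single factor is zero on its own, so both factors are needed.
print(bool(np.allclose(T - 3 * I, 0)))  # False
print(bool(np.allclose(T - I, 0)))      # False
```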

🏗️ Jordan Canonical Form Construction

📐 The Jordan Form Structure

A Jordan block Jλ for eigenvalue λ has:

  • λ on the diagonal
  • 1's on the subdiagonal
  • 0's elsewhere

The full Jordan form is a block-diagonal matrix with one block per eigenvalue "string."

🔨 Construction Algorithm

The excerpt outlines this procedure:

  1. Factor the characteristic polynomial: c(x) = (x - λ₁)^p₁(x - λ₂)^p₂...
  2. For each eigenvalue λᵢ:
    • Compute powers of (T - λᵢI) until the null space stabilizes
    • This gives the generalized null space N^∞(t - λᵢ)
  3. Track nullities: How the dimension of null spaces grows reveals the string structure
  4. Build the nilpotent part: The pattern of nullity increases determines Nλᵢ
  5. Add the eigenvalue: Jλᵢ = Nλᵢ + λᵢI

Example from the excerpt: For a matrix with eigenvalue λ = 0 and nullities 1, 2, the action is β₁ → β₂ → 0, giving Jordan form with one subdiagonal 1.

🧵 String Basis Interpretation

The nullity pattern tells us the action on a string basis:

  • If the nullity jumps by 1 at each power: one long string.
  • If the nullity jumps by 2 at some power: two strings reach at least that length.
  • Stable nullity: all strings have terminated.

Don't confuse: The power at which the null space stabilizes is the index of nilpotency; it equals the size of the largest Jordan block, not necessarily of every block.

🔄 Invariant Subspaces and Generalized Eigenspaces

🎪 Invariant Subspaces

A subspace M is t-invariant if t(m) ∈ M for all m ∈ M.

Key properties from the excerpt:

  • The intersection of t-invariant subspaces is t-invariant
  • The sum of t-invariant subspaces is t-invariant
  • The complement of an invariant subspace need not be invariant

🌟 Generalized Null Spaces

The generalized null space N^∞(t - λ) is where (t - λ)ᵏ eventually maps everything to zero.

  • It equals the null space of (t - λ)ᵏ for sufficiently large k
  • It is t-invariant (by Lemma 2.11)
  • The restriction of t - λ to N^∞(t - λ) is nilpotent

Example: Computing N(t - λ), N((t - λ)²), N((t - λ)³)... until no change reveals N^∞(t - λ).
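That iteration can be sketched in numpy; the 2×2 matrix here is an illustrative single Jordan block for λ = 2, not from the excerpt:

```python
import numpy as np

# Iterate null spaces of (t - lambda) until the nullity stops growing;
# the stable space is the generalized null space N^inf(t - lambda).
T = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0
A = T - lam * np.eye(2)

def nullity(M):
    return int(M.shape[0] - np.linalg.matrix_rank(M))

nullities = [nullity(A)]
k = 2
while True:
    nullities.append(nullity(np.linalg.matrix_power(A, k)))
    if nullities[-1] == nullities[-2]:
        break
    k += 1
print(nullities)  # [1, 2, 2]: stabilizes at power 2
```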

📊 Applications

📈 Linear Recurrences

The excerpt shows recurrence relations f(n) = a₁f(n-1) + a₂f(n-2) + ... can be expressed in matrix form:

[f(n), f(n-1), ..., f(n-k+1)]ᵀ = T · [f(n-1), f(n-2), ..., f(n-k)]ᵀ

The characteristic equation of T determines solutions of the form f(n) = c₁λ₁ⁿ + c₂λ₂ⁿ + ...

Example: Fibonacci numbers satisfy f(n) = f(n-1) + f(n-2), leading to solutions involving powers of (1 + √5)/2.
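The matrix form of the Fibonacci recurrence, sketched in numpy with the standard companion matrix:

```python
import numpy as np

# [f(n+1); f(n)] = T^n [f(1); f(0)] with f(0) = 0, f(1) = 1; the
# dominant eigenvalue (1 + sqrt(5))/2 governs the growth rate.
T = np.array([[1, 1],
              [1, 0]])

def fib(n):
    v = np.linalg.matrix_power(T, n) @ np.array([1, 0])
    return int(v[1])

print([fib(n) for n in range(1, 10)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34]
print(round(fib(30) / fib(29), 6))      # close to (1 + 5**0.5)/2
```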

🌍 Stable Populations

Population dynamics with migration between regions:

  • State vector: [park population, rest-of-world population]ᵀ
  • Transition matrix T encodes stay/leave probabilities
  • Equilibrium: eigenvector with eigenvalue 1
  • Growth/decline: other eigenvalues

Example from excerpt: Park with 90% retention and 1% immigration from outside has eigenvalues determining long-term population ratios.

🔍 Page Ranking

The excerpt describes Google's PageRank as an eigenvalue problem:

  • G = αH + (1-α)S where H is the hyperlink matrix, S is uniform
  • Steady-state ranking is the eigenvector with eigenvalue 1
  • Parameter α (typically 0.85) balances link-following vs random jumps
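A toy sketch of the construction, assuming numpy; the 3-page link graph here is made up for illustration:

```python
import numpy as np

# G = alpha*H + (1 - alpha)*S; the ranking is the eigenvector for
# eigenvalue 1, normalized to a probability vector.
alpha = 0.85
H = np.array([[0.0, 0.5, 1.0],     # column j lists page j's out-links,
              [0.5, 0.0, 0.0],     # so each column sums to 1
              [0.5, 0.5, 0.0]])
S = np.full((3, 3), 1 / 3)         # uniform random-jump matrix
G = alpha * H + (1 - alpha) * S

eigvals, eigvecs = np.linalg.eig(G)
i = int(np.argmax(eigvals.real))   # the dominant eigenvalue is 1
rank = eigvecs[:, i].real
rank = rank / rank.sum()           # normalize to a probability vector
print(round(float(eigvals[i].real), 6))  # 1.0
print(np.round(rank, 3))                 # the three page scores, sum 1
```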

⚡ Method of Powers

Iterative eigenvalue finding:

  1. Start with vector v
  2. Repeatedly compute v ← Tv
  3. Normalize to prevent overflow
  4. The ratio (Tv)·v / v·v converges to the largest eigenvalue

Limitation: Only finds the dominant eigenvalue; fails if starting vector has no component in that eigenspace.
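The four steps above can be sketched in numpy on an illustrative matrix with eigenvalues 5 and 2:

```python
import numpy as np

# Method of powers: repeated multiplication plus normalization pulls
# out the dominant eigenvalue.
T = np.array([[4.0, 1.0],
              [2.0, 3.0]])

v = np.array([1.0, 0.0])   # any start with a component in the dominant
                           # eigenspace works
for _ in range(50):
    v = T @ v
    v = v / np.linalg.norm(v)   # normalize to prevent overflow

dominant = v @ T @ v            # Rayleigh quotient (v has unit length)
print(round(float(dominant), 6))  # 5.0
```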

🔧 Computational Techniques

🧮 Matrix Calculations

The excerpt demonstrates systematic computation:

| Task | Method | Example |
|---|---|---|
| Characteristic polynomial | det(T - xI) | For triangular matrices, read off the diagonal |
| Minimal polynomial | Test candidates by substitution | Compute T - λI, (T - λI)², … |
| Null spaces | Solve (T - λI)ᵏv = 0 | Gaussian elimination |
| Jordan form | Track nullity growth | Pattern reveals string structure |

🎯 Verification Checks

  • Similar matrices have equal characteristic polynomials
  • Similar matrices have equal minimal polynomials
  • f(A) = Pf(B)P⁻¹ if A = PBP⁻¹ (polynomial functions preserve similarity)
  • Trace is invariant under similarity (sum of eigenvalues)
  • Determinant is invariant under similarity (product of eigenvalues)

Don't confuse: Similarity (A = PBP⁻¹) vs equivalence (A = PBQ for possibly different P, Q). Only similarity preserves eigenvalues.

🔬 Special Results

From the excerpt:

  • A matrix is invertible ⟺ the constant term of its minimal polynomial is nonzero
  • A transformation is nilpotent ⟺ its only eigenvalue is zero
  • Minimal polynomial degree ≤ characteristic polynomial degree (= n for n×n matrices)
  • For projections π² = π, the minimal polynomial is x² - x (except for trivial cases)