Linear Algebra

Organizing Information

1.1 Organizing Information

🧭 Overview

🧠 One-sentence thesis

Organizing information by explicitly specifying the order of variables transforms ambiguous columns of numbers into unambiguous function inputs, making functions of several variables tractable.

📌 Key points (3–5)

  • The ambiguity problem: a column of numbers like (1 2 3) has no meaning until we specify which variable each position represents.
  • How ordering solves it: writing the function itself as an ordered tuple (e.g., (24 80 35)) and labeling the input order (e.g., subscript B) removes all ambiguity.
  • Same numbers, different meanings: the identical column (1 2 3) can produce completely different outputs (289 vs 267) depending on the chosen order.
  • Common confusion: don't confuse the function notation with the input—changing the variable order changes both the function's notation and how inputs are interpreted.
  • Why it matters: explicit ordering is essential for readers to understand what is written and is a way of organizing information for linear algebra.

🧩 The ambiguity of unordered inputs

🧩 Why a column of numbers is not enough

  • The excerpt asks: what is V(1 2 3)?
  • Without knowing the order of variables, we cannot compute an output.
  • The column could mean:
    • 1 share of Google, 2 of Netflix, 3 of Apple, or
    • 1 share of Netflix, 2 of Google, 3 of Apple, or
    • any other permutation.
  • Do we multiply the first number by 24 or by 35? No one has specified.

📝 The tedious alternative

  • We could write "1 share of Google, 2 shares of Netflix, and 3 shares of Apple" every time.
  • The excerpt calls this "unacceptably tedious."
  • The goal: use ordered triples of numbers to concisely describe inputs—but only if we make the order explicit.

🔢 Notation that encodes order

🔢 Writing the function as an ordered tuple

The function V itself can be denoted as an ordered triple of numbers that reminds us what to do to each number from the input.

  • Instead of writing V(x, y) = 3x + 5y in one line, we can write V as a tuple that matches the chosen variable order.
  • Example from the excerpt: if the order is (Google, Apple, Netflix), write V as (24 80 35).

🏷️ Subscripts to label the order

  • The excerpt uses subscripts like B and B′ to name different orderings.
  • These subscripts are "just symbols" but the distinction is critical.
  • The same column of numbers with different subscripts represents different inputs.
| Notation | Order chosen | Interpretation of (1 2 3) | Calculation | Output |
| --- | --- | --- | --- | --- |
| V(1 2 3)_B | (G A N) | 1 share G, 2 shares A, 3 shares N | 24(1) + 80(2) + 35(3) | 289 |
| V(1 2 3)_B′ | (N A G) | 1 share N, 2 shares A, 3 shares G | 35(1) + 80(2) + 24(3) | 267 |
  • Don't confuse: the column (1 2 3) looks identical, but the subscript changes its meaning entirely.

📊 Example: stock portfolio value

📊 The setup

  • You own stock in three companies: Google, Netflix, and Apple.
  • The value V of your portfolio depends on the number of shares you own: s_N, s_G, s_A.
  • The formula is: 24s_G + 80s_A + 35s_N.

📊 Applying the ordering system

  • Order B = (G A N):
    • Write V as (24 80 35).
    • Input (1 2 3)_B means s_G=1, s_A=2, s_N=3.
    • Compute: 24(1) + 80(2) + 35(3) = 289.
  • Order B′ = (N A G):
    • Write V as (35 80 24).
    • Input (1 2 3)_B′ means s_N=1, s_A=2, s_G=3.
    • Compute: 35(1) + 80(2) + 24(3) = 267.
  • The excerpt emphasizes: "V assigns completely different numbers to the same columns of numbers with different subscripts."
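
The arithmetic above is easy to spot-check in a few lines of Python. This is a minimal sketch, not part of the excerpt; the price dictionary and function name are made up for illustration.

```python
# Minimal sketch: the same column of numbers gives different portfolio
# values depending on which ordering of the companies is chosen.
PRICES = {"G": 24, "A": 80, "N": 35}   # dollars per share, from the example

def portfolio_value(column, order):
    """Interpret `column` according to `order` and total up the value."""
    return sum(PRICES[company] * shares for company, shares in zip(order, column))

column = (1, 2, 3)
print(portfolio_value(column, ("G", "A", "N")))   # order B:  24*1 + 80*2 + 35*3 = 289
print(portfolio_value(column, ("N", "A", "G")))   # order B': 35*1 + 80*2 + 24*3 = 267
```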

🔄 Six possible orderings

  • There are six different ways to order three companies.
  • Each way gives:
    • Different notation for the same function V.
    • A different way of assigning numbers to columns of three numbers.
  • It is critical to make clear which ordering is used if the reader is to understand what is written.

🎯 Why this matters

🎯 Organizing information

  • The excerpt states: "Doing so is a way of organizing information."
  • Explicit ordering transforms ambiguous data into unambiguous function inputs.
  • This is the foundation for making "problems involving linear functions of many variables easy (or at least tractable)."

🎯 Hint at a central idea

  • The excerpt notes that the symbols B and B′ were chosen "because we are hinting at a central idea in the course: choosing a basis."
  • The subscripts are not arbitrary—they foreshadow a key concept in linear algebra.

What are Vectors?

1.2 What are Vectors?

🧭 Overview

🧠 One-sentence thesis

Vectors are any objects that can be added together and multiplied by scalars, and this general concept encompasses far more than just columns of numbers.

📌 Key points (3–5)

  • What vectors are: things you can add and scalar multiply, not just stacks of numbers.
  • Many kinds of vectors: numbers, n-vectors (columns of numbers), polynomials, power series, and functions are all examples of different kinds of vectors.
  • Common confusion: vectors of different kinds cannot be added to each other—you can only add vectors of the same kind.
  • Zero vectors: every kind of vector has its own zero vector, produced by scalar multiplying any vector by 0.
  • Why this matters: recognizing that many mathematical objects are vectors allows you to organize information and apply linear algebra techniques broadly.

🧩 What makes something a vector

🧩 The defining property

Vectors are things you can add and scalar multiply.

  • A vector is not defined by its appearance (columns of numbers, polynomials, etc.) but by what you can do with it.
  • If you can add two objects together and multiply them by numbers (scalars), they are vectors.
  • This is a much broader concept than the "arrows" or "columns of numbers" often introduced first.

🔢 Two operations required

Vector addition: combining two vectors of the same kind to produce a new vector of that kind.

  • Example: adding two 3-vectors: (1, 1, 0) + (0, 1, 1) = (1, 2, 1).
  • Example: adding two polynomials: if p(x) = 1 + x − 2x² + 3x³ and q(x) = x + 3x² − 3x³ + x⁴, then p(x) + q(x) = 1 + 2x + x² + x⁴.

Scalar multiplication: multiplying a vector by a number to produce a new vector of the same kind.

  • Example: 4 times the 3-vector (1, 1, 0) equals (4, 4, 0).
  • Example: one-third times the 3-vector (1, 1, 0) equals (1/3, 1/3, 0).
  • Scalar multiplication extends naturally from repeated addition: 4x means x + x + x + x.
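
As a concrete illustration of "add and scalar multiply" for two different kinds of vectors, here is a minimal Python sketch; NumPy's array and Polynomial types are used purely for convenience and are not part of the excerpt.

```python
# Minimal sketch: the same two operations for two kinds of vectors.
import numpy as np
from numpy.polynomial import Polynomial

# 3-vectors
u = np.array([1, 1, 0])
v = np.array([0, 1, 1])
print(u + v)            # [1 2 1]   vector addition
print(4 * u)            # [4 4 0]   scalar multiplication

# Polynomials, stored by their coefficients (constant term first)
p = Polynomial([1, 1, -2, 3])       # 1 + x - 2x^2 + 3x^3
q = Polynomial([0, 1, 3, -3, 1])    # x + 3x^2 - 3x^3 + x^4
print((p + q).coef)                 # [1. 2. 1. 0. 1.]  ->  1 + 2x + x^2 + x^4
```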

🎭 Different kinds of vectors

🎭 Five examples from the excerpt

The excerpt lists five distinct kinds of vectors:

| Kind | Example | Addition example |
| --- | --- | --- |
| Numbers | 3 and 5 | 3 + 5 = 8 |
| 3-vectors | (1, 1, 0) | (1, 1, 0) + (0, 1, 1) = (1, 2, 1) |
| Polynomials | 1 + x − 2x² + 3x³ | p(x) + q(x) combines like terms |
| Power series | 1 + x + (1/2!)x² + (1/3!)x³ + ... | f(x) + g(x) adds corresponding coefficients |
| Functions | e^x and e^(−x) | e^x + e^(−x) = 2 cosh x |
  • Each kind follows the same pattern: addition stems from the rules for adding numbers.
  • The excerpt emphasizes that these are different kinds—stacks of numbers are not the only vectors.

⚠️ You cannot mix kinds

  • Don't confuse: just because two things are both vectors does not mean you can add them.
  • The excerpt asks: "What possible meaning could the following have? (9, 3) + e^x"
  • You should only add vectors of the same kind.
  • Example: you can add two polynomials, or two 3-vectors, but not a 3-vector to a polynomial.

🔵 The zero vector

🔵 Every kind has its own zero

  • The zero vector is produced by scalar multiplying any vector by the number 0.
  • Each of the five kinds of vectors has a different zero vector:
| Kind | Zero vector |
| --- | --- |
| Numbers | 0 (the zero number) |
| 3-vectors | (0, 0, 0) (the zero 3-vector) |
| Polynomials | 0 (the zero polynomial) |
| Power series | 0 + 0x + 0x² + 0x³ + ... (the zero power series) |
| Functions | 0 (the zero function) |
  • Don't confuse: these are five very different kinds of zero, even though they may all be written as "0."
  • The zero vector for each kind behaves consistently: adding it to any vector of that kind leaves the vector unchanged.

📦 Organizing information with vectors

📦 Choosing how to describe inputs

  • The excerpt's opening example shows that the same function V can be denoted differently depending on the order chosen for variables.
  • Using the order (G, A, N) and naming it B, the notation V with input (1, 2, 3) subscript B means calculate 24(1) + 80(2) + 35(3) = 289.
  • Using the order (N, A, G) and naming it B′, the notation V with input (1, 2, 3) subscript B′ means calculate 35(1) + 80(2) + 24(3) = 267.
  • The subscripts B and B′ are symbols reminding the reader how to interpret the column of numbers.

🎯 Why order matters

  • The same column of numbers (1, 2, 3) is assigned completely different values by V depending on the subscript.
  • There are six different ways to order three companies, each giving different notation for the same function V.
  • Critical distinction: it is essential to make clear which ordering is used so the reader can understand what is written.
  • This choice of order is an example of choosing a basis, a central idea in linear algebra.

🎯 Freedom in organizing information

  • The excerpt hints at a main lesson: you have considerable freedom in how you organize information about certain functions.
  • You can use that freedom to uncover aspects of functions that don't change with the choice, make calculations maximally easy, and approximate functions of several variables.
  • Don't confuse: the example of choosing an order is an example of choosing a basis, not the full definition of "basis"—you cannot learn the definition from this example alone, just as you cannot learn the definition of "bird" by seeing only a penguin.

What are Linear Functions?

1.3 What are Linear Functions?

🧭 Overview

🧠 One-sentence thesis

Linear algebra studies functions of vectors that respect vector addition and scalar multiplication—called linear functions—which obey the additivity and homogeneity properties and can often be represented as matrices.

📌 Key points (3–5)

  • What linear functions are: functions from vectors to vectors that obey two properties: additivity L(u + v) = L(u) + L(v) and homogeneity L(cu) = cL(u).
  • Why linearity matters: most functions do not obey these properties; linear algebra focuses on the special class that does.
  • Common confusion: linearity means the function respects vector operations—it doesn't matter whether you add/scale first then apply L, or apply L first then add/scale the outputs.
  • How many problems reduce to one form: questions (A)–(D) in the excerpt all become "find vector X such that L(X) = B" where L is a known linear transformation.
  • What matrices are: matrices are linear functions of a certain kind that result from organizing information related to linear functions.

🔄 Functions of vectors

🔄 What functions of vectors look like

  • In calculus, functions took a real number x and output a real number f(x).
  • In linear algebra, functions take vectors as inputs and produce vectors as outputs.
  • Since vectors can be numbers, n-vectors, polynomials, power series, or functions, the input/output types vary widely.

🎯 Five disguised examples

The excerpt presents five questions that are all secretly about functions of vectors:

| Question | Input vector type | Output vector type | Function description |
| --- | --- | --- | --- |
| (A) 10x = 3 | number | number | multiply by 10 |
| (B) 4 × (1,1,0) × u = (0,1,1) | 3-vector | 3-vector | cross product with a fixed vector |
| (C) integrals of p equal 0 and 1 | polynomial | 2-vector | two definite integrals |
| (D) x d/dx f(x) − 2f(x) = 0 | power series | power series | differential operator |
| (E) 4x² = 1 | number | number | square then multiply by 4 |
  • All have the form "What vector X satisfies f(X) = B?" with f known, B known, X unknown.
  • Example (C) is especially important: the inputs are functions (polynomials), showing that "vectors" can themselves be functions.

⚙️ The defining properties of linearity

⚙️ Additivity

Additivity: L(u + v) = L(u) + L(v)

  • The function L respects vector addition.
  • It doesn't matter if you:
    • first add u and v, then input their sum into L, or
    • first input u and v into L separately, then add the outputs.
  • Both paths give the same result.

⚙️ Homogeneity

Homogeneity: L(cu) = cL(u)

  • The function L respects scalar multiplication.
  • It doesn't matter if you:
    • first multiply u by scalar c, then input into L, or
    • first input u into L, then multiply the output by c.
  • Both paths give the same result.

🔑 Why these two properties matter

  • Most functions of vectors do not obey these requirements.
  • Example from the excerpt: if f(x) = x², then f(1 + 1) = 4, but f(1) + f(1) = 2, so f is not additive.
  • Linear algebra is the study of the special functions that do obey additivity and homogeneity.
  • Together, these two properties are called linearity.
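
A quick numerical check of these two properties, as a hedged sketch; the matrix and test vectors below are arbitrary choices, not from the excerpt.

```python
# Minimal sketch: additivity and homogeneity hold for a matrix map L,
# but fail for the squaring function f(x) = x^2.
import numpy as np

A = np.array([[2.0, 6.0],
              [4.0, 8.0]])

def L(v):
    """A linear map: multiply by the fixed matrix A."""
    return A @ v

def f(x):
    """Not linear: squaring."""
    return x ** 2

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
c = 5.0
print(np.allclose(L(u + v), L(u) + L(v)))   # True  (additivity)
print(np.allclose(L(c * u), c * L(u)))      # True  (homogeneity)
print(f(1 + 1) == f(1) + f(1))              # False (4 != 2)
```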

🧮 Linear combinations

A sum of multiples of vectors cu + dv is called a linear combination of u and v.

  • Linearity implies: L(cu + dv) = cL(u) + dL(v).
  • This "feels a lot like the regular rules of algebra for numbers."
  • Don't confuse: we write Lu (like multiplication), but "uL" makes no sense—order matters.

🎓 Terminology and notation

🎓 Equivalent names

The excerpt lists three interchangeable terms:

  • Function
  • Transformation
  • Operator

All refer to the same concept when discussing linear maps.

🎓 Notation shorthand

  • For linear maps L, we often write Lu instead of L(u).
  • This is because linearity lets us treat L(u) as "multiplying" vector u by operator L.
  • The linearity property makes this notation consistent with algebraic rules.

🔗 Connection to solving equations

🔗 The standard form

  • Questions (A)–(D) can all be restated as Lv = w, where:
    • v is an unknown vector
    • w is a known vector
    • L is a known linear transformation
  • To verify this, check the rules for adding vectors (inputs and outputs) and confirm linearity of L.

🔗 The derivative operator example

Example: The derivative operator is linear.

For any two functions f(x), g(x) and any number c, calculus taught:

  1. d/dx (cf) = c · d/dx f (homogeneity)
  2. d/dx (f + g) = d/dx f + d/dx g (additivity)
  • If we view functions as vectors (with function addition and scalar multiplication), these familiar derivative properties are exactly the linearity property of linear maps.
  • This shows that differential equations are a special case of linear algebra problems.
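
The two calculus rules can also be confirmed symbolically; this is a small SymPy sketch with arbitrarily chosen functions (the specific f and g are not from the excerpt).

```python
# Minimal sketch: d/dx respects scalar multiplication and addition.
import sympy as sp

x, c = sp.symbols('x c')
f = sp.exp(x) * sp.sin(x)   # any convenient function of x
g = x**3 + 1                # another one

homogeneity = sp.diff(c * f, x) - c * sp.diff(f, x)
additivity = sp.diff(f + g, x) - (sp.diff(f, x) + sp.diff(g, x))
print(sp.simplify(homogeneity))   # 0
print(sp.simplify(additivity))    # 0
```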

🔗 Solving linear equations

  • Solving Lv = w often amounts to solving systems of linear equations (covered in Chapter 2 of the source material).
  • This is a central skill in linear algebra.

🧱 What matrices are

🧱 Matrices as organized information

Matrices are linear functions of a certain kind.

  • Matrices appear "almost ubiquitously in linear algebra" because:
    • Matrices are the result of organizing information related to linear functions.
  • This idea takes time to develop; the excerpt notes it will be explored through studying systems of linear equations.
  • The excerpt provides an elementary example in Section 1.1 (not included in this excerpt).

So, What is a Matrix?

1.4 So, What is a Matrix?

🧭 Overview

🧠 One-sentence thesis

Matrices are linear functions that organize information about systems of linear equations, allowing us to transform vectors in a way that preserves addition and scalar multiplication.

📌 Key points (3–5)

  • What a matrix is: a linear function represented by an array of numbers that takes vectors as inputs and produces vectors as outputs.
  • Why matrices matter: they organize information related to linear functions and provide efficient notation for solving systems of linear equations.
  • How matrices work: multiplying a matrix by a vector produces a linear combination of the matrix's columns.
  • Common confusion: matrix-vector multiplication vs systems of equations—they are two notations for the same thing; the matrix equation encodes the entire system.
  • Matrix multiplication as composition: placing two matrix operations end-to-end corresponds to multiplying the matrices together.

🔢 From equations to matrices

🍎 The fruit container problem

The excerpt introduces matrices through a concrete scenario:

  • A room contains x bags and y boxes of fruit
  • Each bag: 2 apples and 4 bananas
  • Each box: 6 apples and 8 bananas
  • Total: 20 apples and 28 bananas
  • Goal: find x and y

This translates to a system of linear equations:

  • 2x + 6y = 20
  • 4x + 8y = 28

📝 What makes equations "linear"

A system of linear equations is a collection of equations in which variables are multiplied by constants and summed, with no variables multiplied together.

Characteristics:

  • No powers of variables (like x squared or y to the fifth)
  • No non-integer or negative powers (like y to the one-seventh or x to the negative three)
  • No products of variables (like xy)

Example: The fruit problem satisfies all these conditions—only constants times single variables, then added together.

🔄 Two perspectives on information

The excerpt highlights that container information can be stored two ways:

  1. In terms of apples and bananas (the output)
  2. In terms of bags and boxes (the input)

Going from (2) to (1) is easy: multiply the number of containers by the fruit per container.

Going from (1) to (2) is harder: it feels like the opposite of multiplication, i.e., division.

Matrix notation clarifies what we are "multiplying" and "dividing" by.

🧮 Matrix notation and definition

📐 Building the matrix representation

The excerpt shows how to rewrite the system as vector equations:

Starting with:

  • 2x + 6y = 20
  • 4x + 8y = 28

Rewrite as a single vector equation:

  • (2x + 6y, 4x + 8y) = (20, 28)

Factor out x and y:

  • x(2, 4) + y(6, 8) = (20, 28)

🎯 The matrix function definition

The function represented by the matrix (2 6; 4 8) is defined by: (2 6; 4 8)(x, y) := x(2, 4) + y(6, 8).

Key insight: The matrix takes a 2-vector as input and produces a 2-vector as output.

The general rule for a 2×2 matrix:

  • (p q; r s)(x, y) := (px + qy, rx + sy) = x(p, r) + y(q, s)

📦 Bigger matrices work the same way

Example of a 3×4 matrix:

  • (1 0 3 4; 5 0 3 4; -1 6 2 5)(x, y, z, w) := x(1, 5, -1) + y(0, 0, 6) + z(3, 3, 2) + w(4, 4, 5)

The pattern: each input component multiplies one column of the matrix, then all columns are added together.
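
The column-by-column pattern is easy to verify numerically; a minimal sketch in which the input vector is an arbitrary choice.

```python
# Minimal sketch: a matrix times a vector equals the linear combination
# of the matrix's columns weighted by the input entries.
import numpy as np

M = np.array([[ 1, 0, 3, 4],
              [ 5, 0, 3, 4],
              [-1, 6, 2, 5]])
v = np.array([2, -1, 0, 3])

by_columns = sum(v[j] * M[:, j] for j in range(M.shape[1]))
print(M @ v)                             # [14 22  7]
print(np.allclose(M @ v, by_columns))    # True
```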

🎪 The concise problem statement

The fruit problem becomes:

What vector (x, y) satisfies (2 6; 4 8)(x, y) = (20, 28)?

This has the form Lv = w:

  • The matrix encodes "fruit per container"
  • The equation is roughly "fruit per container times number of containers equals fruit"
  • To solve for number of containers, we want to "divide" by the matrix
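
For this small system the "division" can be delegated to a numerical solver; a minimal NumPy sketch in which the variable names are for illustration only.

```python
# Minimal sketch: fruit per container times number of containers = fruit.
import numpy as np

fruit_per_container = np.array([[2, 6],    # apples per bag, apples per box
                                [4, 8]])   # bananas per bag, bananas per box
fruit = np.array([20, 28])                 # total apples, total bananas

containers = np.linalg.solve(fruit_per_container, fruit)
print(containers)    # [1. 3.]  ->  x = 1 bag, y = 3 boxes
```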

🌟 Matrices as linear functions

🔑 The column space concept

The column space is the set of all possible outputs of a matrix times a vector (also called the image of the linear function defined by the matrix).

Why this matters: The second form of the output (px + qy, rx + sy) = x(p, r) + y(q, s) tells us that all possible outputs are sums of the columns of the matrix multiplied by scalars.

Example: For the fruit matrix (2 6; 4 8), every possible output is some combination of (2, 4) and (6, 8).

✅ Verifying linearity

Matrices satisfy both linearity properties:

Property 1 (Homogeneity):

  • (2 6; 4 8)[λ(x, y)] = λ[(2 6; 4 8)(x, y)]

The excerpt shows the detailed verification:

  • Left side: (2 6; 4 8)(λa, λb) = λa(2, 4) + λb(6, 8) = (2λa + 6λb, 4λa + 8λb)
  • Right side: λ[(2 6; 4 8)(a, b)] = λ[a(2, 4) + b(6, 8)] = λ(2a + 6b, 4a + 8b) = (2λa + 6λb, 4λa + 8λb)
  • The two resulting expressions are identical

Property 2 (Additivity):

  • (2 6; 4 8)[(x, y) + (x', y')] = (2 6; 4 8)(x, y) + (2 6; 4 8)(x', y')

The excerpt verifies:

  • Left side: (2 6; 4 8)(a+c, b+d) = (a+c)(2, 4) + (b+d)(6, 8) = (2a + 2c + 6b + 6d, 4a + 4c + 8b + 8d)
  • Right side: a(2, 4) + b(6, 8) + c(2, 4) + d(6, 8) = (2a + 2c + 6b + 6d, 4a + 4c + 8b + 8d)
  • Both expressions match

📋 Matrix equations

Any equation of the form Mv = w with M a matrix and v, w n-vectors is called a matrix equation.

The excerpt emphasizes: matrices are examples of the linear operators that appear in algebra problems, and Chapter 2 is about efficiently solving systems of linear equations (equivalently, matrix equations).

🔗 Matrix multiplication as composition

🏭 Machines in series

The excerpt uses a machine metaphor: what happens if we place two machines end-to-end?

The output of the first machine becomes the input to the second:

  1. Input: (x, y)
  2. First machine (2 6; 4 8): produces (2x + 6y, 4x + 8y)
  3. Second machine (1 2; 0 1): takes that output and produces (1·(2x + 6y) + 2·(4x + 8y), 0·(2x + 6y) + 1·(4x + 8y)) = (10x + 22y, 4x + 8y)

🎯 The single equivalent machine

The same final result could be achieved with a single machine that directly produces (10x + 22y, 4x + 8y).

Matrix multiplication notation for this composition: (1 2; 0 1)(2 6; 4 8) = (10 22; 4 8).

Key insight: Matrix multiplication represents composition of functions—applying one linear transformation after another is equivalent to applying a single combined transformation.

Don't confuse: Matrix multiplication is not element-by-element multiplication; it encodes the composition of the two linear functions.

Matrix Multiplication is Composition of Functions

1.4.1 Matrix Multiplication is Composition of Functions

🧭 Overview

🧠 One-sentence thesis

Matrix multiplication represents the composition of linear functions, where chaining two linear transformations end-to-end corresponds to multiplying their matrices.

📌 Key points (3–5)

  • What matrix multiplication represents: placing two machines (linear transformations) end-to-end, feeding the output of the first into the second.
  • How composition works: if function f maps U to V and function g maps V to W, then g ∘ f maps U directly to W by applying f first, then g.
  • Matrix notation for composition: the composition g ∘ f is computed by multiplying the matrices representing g and f.
  • Common confusion: matrices are just notation—linear algebra is fundamentally about linear functions, and matrices only appear when we make specific notational choices.
  • Why it matters: matrix multiplication is the computational tool for finding the composition of linear functions.

🔗 Chaining transformations end-to-end

🏭 The two-machine setup

The excerpt uses a metaphor of "expensive machines" placed end-to-end:

  • The output of the first machine becomes the input to the second machine.
  • Example: Start with (x, y) → first machine produces (2x + 6y, 4x + 8y) → second machine takes those outputs and produces (1·(2x + 6y) + 2·(4x + 8y), 0·(2x + 6y) + 1·(4x + 8y)) = (10x + 22y, 4x + 8y).

🎯 Single-machine equivalent

The same final result can be achieved with a single machine that directly maps (x, y) to (10x + 22y, 4x + 8y).

  • This single machine represents the composition of the two original transformations.
  • Matrix notation captures this: multiplying the two matrices gives the matrix of the composed transformation.

🔢 Matrix multiplication as composition notation

🧮 The multiplication formula

The excerpt shows:

Matrix multiplication: (1 2; 0 1) times (2 6; 4 8) equals (10 22; 4 8)

  • The first matrix represents the second machine (applied second).
  • The second matrix represents the first machine (applied first).
  • The product matrix represents the combined effect.
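
A small numerical check that composing the two machines matches the product matrix; the test vector is an arbitrary choice, not from the excerpt.

```python
# Minimal sketch: applying the machines in sequence agrees with the
# single machine given by the matrix product (second matrix on the left).
import numpy as np

first = np.array([[2, 6], [4, 8]])     # applied first
second = np.array([[1, 2], [0, 1]])    # applied second

v = np.array([3, -2])
print(np.array_equal(second @ (first @ v), (second @ first) @ v))   # True
print(second @ first)                  # [[10 22]
                                       #  [ 4  8]]
```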

🔄 Function composition notation

Composition of functions: if f : U → V and g : V → W, then g ∘ f : U → W where (g ∘ f)(u) = g(f(u)).

  • The notation g ∘ f means "apply f first, then apply g to the result."
  • Matrix multiplication is the computational tool for this composition when the functions are linear.
  • Don't confuse: the order matters—g ∘ f means g comes after f, just as the matrix for g is written to the left of the matrix for f.

🧩 Matrices are just notation

🎭 The fundamental idea

The excerpt emphasizes:

"Linear algebra is about linear functions, not matrices."

  • Matrices only appear when we make specific notational choices.
  • The same linear function can be represented by different matrices depending on the notation chosen.

📝 Example: the derivative operator

The excerpt illustrates with the differential operator (d/dx + 2):

  • This operator takes quadratic functions (of the form ax² + bx + c) and produces other quadratic functions.
  • Notational choice: denote ax² + bx + c as a column with entries a, b, c (with subscript B to mark this convention).
  • Under this notation, the operator becomes a matrix with entries (2 0 0; 2 2 0; 0 1 2).
  • The matrix representation depends entirely on the notational convention chosen for representing the functions.

⚠️ Don't confuse notation with substance

  • The matrix is not the operator itself; it is a representation that depends on how we write down the inputs and outputs.
  • Different notational conventions (different "subscripts B") would produce different matrices for the same operator.
  • The excerpt states that "matrices only get involved in linear algebra when certain notational choices are made."

🔗 Connection to solving equations

📐 Matrix equations

Matrix equation: any equation of the form M v = w with M a matrix and v, w n-vectors.

  • Systems of linear equations can be written as matrix equations.
  • The excerpt notes that Chapter 2 is about efficiently solving such systems.
  • Matrix multiplication (composition) becomes essential when solving more complex systems involving multiple transformations.

1.4.2 The Matrix Detour

🧭 Overview

🧠 One-sentence thesis

Matrices are not the subject of linear algebra—they are merely notational tools that arise when we choose a particular way to represent linear functions, and the same linear function can be represented by different matrices depending on the notational convention.

📌 Key points (3–5)

  • What matrices really are: notational representations of linear functions that depend on how we choose to denote vectors.
  • The matrix detour workflow: sometimes a linear equation is too hard to solve directly, but organizing information into a matrix equation makes finding solutions tractable.
  • Same function, different matrices: one linear function can be represented by many different matrices depending on the notational convention for vectors.
  • Common confusion: matrices are not the core object—linear functions are; matrices only appear after we pick a way to write vectors as n-vectors.
  • Why notation matters: changing how we denote vectors (e.g., ordering coefficients differently) changes the matrix but not the underlying linear function.

🎯 The central idea: matrices as notation

🎯 Linear algebra is about functions, not matrices

  • The excerpt emphasizes: "Linear algebra is about linear functions, not matrices."
  • Matrices only get involved when certain notational choices are made.
  • The presentation is meant to keep you thinking about this idea constantly throughout the course.

📝 How matrices arise

Matrices come into linear algebra when we choose a particular way to denote vectors as n-vectors.

  • When we pick a notational convention for vectors (e.g., writing a quadratic function as a column of coefficients), the linear operator automatically gets a matrix representation.
  • Example: the differential operator d/dx + 2 becomes a matrix only after we decide how to write quadratic functions as 3-vectors.

🔄 The matrix detour workflow

🔄 Why take the detour

The excerpt describes a process:

  1. Start with a linear equation that may be too hard to solve directly.
  2. Organize information by choosing a notational convention.
  3. Reformulate the equation as a matrix equation.
  4. The process of finding solutions becomes tractable.

🛠️ When the detour is useful

  • The excerpt notes that sometimes the detour is unnecessary (e.g., if you already know how to anti-differentiate).
  • But the general idea is that organizing information into matrix form can make solving linear equations more systematic.

📐 Example: the differential operator

📐 First notational convention (B)

The excerpt works through a detailed example with the equation:

  • (d/dx + 2)f = x + 1
  • Unknown: f (a quadratic function)

Convention B: denote ax² + bx + c as a column vector [a, b, c] with subscript B.

Applying the operator:

  • (d/dx + 2)(ax² + bx + c) = 2ax² + (2a + 2b)x + (b + 2c)
  • This becomes [2a, 2a + 2b, b + 2c] in notation B.

The induced matrix:

  • The operator becomes a 3×3 matrix with entries [2, 0, 0; 2, 2, 0; 0, 1, 2].
  • The original equation becomes: matrix times [a, b, c] = [0, 1, 1].

System of equations:

  • 2a = 0
  • 2a + 2b = 1
  • b + 2c = 1

Solution: [0, 1/2, 1/4] in notation B, which represents (1/2)x + 1/4.
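
The whole convention-B detour can be reproduced with SymPy; a minimal sketch in which LUsolve and the variable names are implementation choices, not part of the excerpt.

```python
# Minimal sketch: the matrix detour for (d/dx + 2)f = x + 1 in convention B,
# where ax^2 + bx + c is written as the column [a, b, c].
import sympy as sp

M_B = sp.Matrix([[2, 0, 0],
                 [2, 2, 0],
                 [0, 1, 2]])        # matrix of d/dx + 2 in convention B
rhs_B = sp.Matrix([0, 1, 1])        # x + 1 = 0x^2 + 1x + 1 in convention B

abc = M_B.LUsolve(rhs_B)
print(abc.T)                        # Matrix([[0, 1/2, 1/4]])

# sanity check against the original differential equation
x = sp.symbols('x')
f = abc[0] * x**2 + abc[1] * x + abc[2]
print(sp.expand(sp.diff(f, x) + 2 * f))   # x + 1
```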

📐 Second notational convention (B′)

The excerpt then shows a different convention to prove the point about changeability.

Convention B′: denote a + bx + cx² as [a, b, c] with subscript B′ (note the reversed order of terms).

Applying the same operator:

  • (d/dx + 2)(a + bx + cx²) = (2a + b) + (2b + 2c)x + 2cx²
  • This becomes [2a + b, 2b + 2c, 2c] in notation B′.

The induced matrix:

  • The operator becomes a different 3×3 matrix: [2, 1, 0; 0, 2, 2; 0, 0, 2].
  • The equation becomes: different matrix times [a, b, c] = [1, 1, 0].

Solution: [1/4, 1/2, 0] in notation B′, which represents 1/4 + (1/2)x (the same function as before).

🔍 Key observation

| Aspect | Convention B | Convention B′ |
| --- | --- | --- |
| Vector notation | ax² + bx + c → [a, b, c] | a + bx + cx² → [a, b, c] |
| Matrix for d/dx + 2 | [2, 0, 0; 2, 2, 0; 0, 1, 2] | [2, 1, 0; 0, 2, 2; 0, 0, 2] |
| Solution vector | [0, 1/2, 1/4] | [1/4, 1/2, 0] |
| Actual function | (1/2)x + 1/4 | 1/4 + (1/2)x (same!) |

Don't confuse: different matrices and different n-vectors can represent the same linear function and the same vector—it all depends on the notational convention.

🎓 The main lesson

🎓 One function, many matrices

The excerpt explicitly states:

  • "We have obtained a different matrix for the same linear function."
  • "We have obtained a different 3-vector for the same vector."
  • "One linear function can be represented (denoted) by a huge variety of matrices."

🎓 What determines the representation

The representation only depends on how vectors are denoted as n-vectors.

  • The underlying linear function does not change.
  • Only the matrix notation changes when we change how we write vectors.
  • This is why the course emphasizes thinking about linear functions, not matrices.

Review Problems for Linear Algebra Foundations

1.5 Review Problems

🧭 Overview

🧠 One-sentence thesis

These review problems consolidate the foundational skills needed for linear algebra—understanding how the same linear function or vector can be represented by different matrices or n-vectors depending on the chosen basis, and practicing core operations like matrix multiplication, composition, and recognizing special matrix types.

📌 Key points (3–5)

  • Representation depends on notation: the same linear function can be represented by different matrices, and the same vector by different n-vectors, depending on the basis or coordinate system chosen.
  • Linear operators map between vector spaces: problems involve identifying domain and codomain sets (V and W) and understanding when operators preserve linearity.
  • Matrix multiplication encodes composition: applying one linear operator after another corresponds to multiplying their matrices, and the order matters.
  • Common confusion: uniqueness vs. representation—a function or operator is unique, but its matrix representation changes with the choice of basis or ordering.
  • Special matrices have special properties: diagonal matrices commute under multiplication; identity-like operators have unique representations.

🔄 Representation and notation

🔄 Same function, different matrices

The excerpt emphasizes a crucial point:

"One linear function can be represented (denoted) by a huge variety of matrices. The representation only depends on how vectors are denoted as n-vectors."

  • A linear function is an abstract object; its matrix form depends on the coordinate system.
  • Example from the excerpt: the differential equation (d/dx + 2)f = x + 1 yields different matrices and different coefficient vectors when the basis changes.
  • The solution vector changes from one triple to another (e.g., (1/4, 1/2, 0)) because it represents the same polynomial (1/4 + 1/2 x) in a different basis.
  • Don't confuse: the function itself is unchanged; only its numerical representation varies.

🗂️ Domain and codomain

  • Every linear operator L maps vectors from a set V to a set W, written L: V → W.
  • Problem 1 asks: for each example, identify the sets V and W where input v and output w live.
  • Understanding domain and codomain is essential for checking whether operations are well-defined.

🧮 Matrix operations and composition

🧮 Matrix multiplication as composition

Problem 6 explores why matrix multiplication is defined the way it is:

  • If you apply matrix N to vector v, then apply matrix M to the result, you get M(Nv).
  • The composition MN must also be a linear operator.
  • The rule for computing entries of MN follows from the requirement that (MN)v = M(Nv) for all v.
  • The excerpt asks: "Is there any sense in which these rules for matrix multiplication are unavoidable, or are they just a notation?"
  • Answer direction: the rules are unavoidable if you want composition of linear operators to correspond to matrix multiplication.

🔲 Diagonal matrices

Problem 7 introduces diagonal matrices:

"If all the off-diagonal entries of a matrix vanish, we say that the matrix is diagonal."

  • Diagonal entries: m_ii (row and column index the same).
  • Off-diagonal entries: m_ij with i ≠ j.
  • For an n×n matrix: n diagonal entries, n² - n off-diagonal entries.
  • Special property: if D and D′ are both diagonal, then DD′ = D′D (they commute).
  • The excerpt asks whether this property holds for arbitrary matrices (it does not) or when only one matrix is diagonal (generally no).
  • Diagonal matrices will play a recurring special role throughout linear algebra.
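
A quick check of the commuting claim, as a minimal sketch with arbitrary entries chosen for illustration.

```python
# Minimal sketch: diagonal matrices commute; arbitrary matrices generally do not.
import numpy as np

D1 = np.diag([1, 2, 3])
D2 = np.diag([4, 5, 6])
print(np.array_equal(D1 @ D2, D2 @ D1))   # True

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
print(np.array_equal(A @ B, B @ A))       # False
```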

🔍 Uniqueness and special operators

🔍 Identity and zero operators

Problem 8 asks for unique linear operators with special properties:

| Operator type | Property | Uniqueness | Matrix form |
| --- | --- | --- | --- |
| Identity | Output equals input for all inputs | Unique | Yes (identity matrix) |
| Zero/constant | Output is the same regardless of input | Unique | Yes (zero matrix) |
  • To prove uniqueness, the hint suggests proof by contradiction: assume two such operators exist, then show they must be identical.
  • Example: if L₁ and L₂ both satisfy "output = input," then for any v, L₁(v) = v = L₂(v), so L₁ = L₂.

🎯 Cross product and torque

Problem 2 applies linear algebra to physics:

  • Torque τ = r × F (cross product of position vector and force vector).
  • Cross product formula for 3-vectors is given explicitly.
  • The problem asks: find force F to produce a given torque with a given wrench position r.
  • Key insight: infinitely many solutions exist because you can add any multiple of r to a solution and still get the same torque (forces along the wrench don't create rotation).
  • This illustrates that linear systems can have multiple solutions.
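
The "add any multiple of r" observation can be checked directly; a minimal sketch with hypothetical numbers (r and F below are made up, not the excerpt's data).

```python
# Minimal sketch: adding a force along the wrench does not change the torque.
import numpy as np

r = np.array([1.0, 0.0, 0.0])    # hypothetical wrench position
F = np.array([0.0, 2.0, 0.0])    # one force that produces the torque below
F2 = F + 7 * r                   # another valid force

print(np.cross(r, F))            # [0. 0. 2.]
print(np.cross(r, F2))           # [0. 0. 2.]  ->  same torque
```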

🧩 Functions, ordering, and representation

🧩 Functions on unordered sets

Problem 9 explores how ordering affects representation:

  • A set S = {∗, ?, #} has no inherent order; {∗, ?, #} = {#, ?, ∗}.
  • A function with domain S and codomain ℝ assigns one real number to each symbol.
  • To write the function as a triple of numbers, you must choose an ordering.
  • Different orderings give different triples, but they represent the same function.
  • Don't confuse: the function is the assignment rule; the triple is just one way to write it down.

📐 Time-dependent differential equations

Problem 4 contrasts two differential equations:

  • (d/dt)f = 2f has constant proportionality; solutions are exponential: f(t) = f(0)e^(2t).
  • (d/dt)f = 2t·f has time-dependent proportionality.
  • Both can be rewritten as Df = 0 where D is a linear operator (by moving terms to one side).
  • The question asks whether the second DE also describes exponential growth (it does not, because the "constant" changes with time).

🍎 Applied problem: nutrition representation

Problem 5 involves translating between representations:

  • Pablo represents a barrel by its total fruit and total sugar, where each orange counts for twice as much sugar as an apple.
  • Everyday representation: (apples, oranges).
  • Task: find the linear operator (matrix) relating the two representations.
  • Hint: let λ = sugar per apple; then sugar = λ·apples + 2λ·oranges.
  • This illustrates how different coordinate systems require transformation matrices.

🔬 Background skills

🔬 Prerequisites emphasized

The excerpt stresses that "understanding sets, functions and basic logical operations is a must to do well in linear algebra."

Recommended review topics:

  • Logic
  • Sets
  • Functions
  • Equivalence relations
  • Proofs (especially proof by contradiction)

These are not optional; they are foundational for understanding linear algebra concepts like vector spaces, linear operators, and matrix representations.

Gaussian Elimination

2.1 Gaussian Elimination

🧭 Overview

🧠 One-sentence thesis

Gaussian elimination is an algorithm that systematically simplifies systems of linear equations into Reduced Row Echelon Form (RREF) through elementary row operations, making it straightforward to read off all solutions.

📌 Key points (3–5)

  • What the algorithm does: transforms an augmented matrix into RREF using three elementary row operations (row swap, scalar multiplication, row addition) without changing the solution set.
  • Why RREF matters: it is a maximally simplified form with as many zeros and ones as possible, from which solutions can be directly read off.
  • Pivot vs. non-pivot variables: pivot variables (those with a 1 in a pivot position) are expressed in terms of non-pivot (free) variables; the number of free variables determines the dimension of the solution set.
  • Common confusion: RREF is not always the identity matrix—redundant or inconsistent equations, or more unknowns than equations, prevent reaching the identity; the algorithm still produces a canonical simplest form.
  • Solution structure: every solution set has the form "one particular solution plus any combination of homogeneous solutions," indexed by the free variables.

📝 Augmented matrix notation

📝 What an augmented matrix is

Augmented matrix: a compact notation for a system of linear equations, consisting of the coefficient matrix with a vertical line separating it from the constant terms on the right-hand side.

  • Why use it: more efficient than writing out full equations or matrix equations.
  • Example: The system
    x + y = 27
    2x − y = 0
    becomes the augmented matrix
    (1 1 | 27)
    (2 −1 | 0).
  • The same system can also be written as a matrix equation:
    (1 1; 2 −1)(x, y) = (27, 0)
  • All three notations represent the same problem.

🔢 Index conventions

  • For r equations in k unknowns, the augmented matrix has r rows and k+1 columns (k for coefficients, 1 for constants).
  • Entries left of the divide carry two indices: superscript for row number, subscript for column number.
  • Important: superscripts here are not exponents.

🧩 Interpretation as vector combinations

  • The augmented matrix can be read as: "find which combination of the column vectors (left of the divide) adds up to the vector on the right."
  • Example:
    x(1, 2) + y(1, −1) = (27, 0)
    means "find scalars x and y so that x times the first column plus y times the second column equals the right-hand side."

🔄 Elementary row operations and equivalence

🔄 The three EROs

Elementary Row Operations (EROs) change the augmented matrix without changing the solution set:

| Operation | What it does | Why it preserves solutions |
| --- | --- | --- |
| Row Swap | Exchange any two rows | Reordering equations doesn't change solutions |
| Scalar Multiplication | Multiply any row by a non-zero constant | Scaling an equation doesn't change its solutions |
| Row Addition | Add one row to another row | Adding equations produces an equivalent equation |
  • These operations are the only moves allowed in Gaussian elimination.
  • Don't confuse: multiplying by zero is not allowed (it would lose information).

↔️ Row equivalence (∼)

Two augmented matrices are row equivalent (written A ∼ B) if one can be obtained from the other by a sequence of EROs.

  • Row-equivalent matrices represent systems with the same solution set.
  • Example from the excerpt:
    (1 1 | 27; 2 −1 | 0) ∼ (1 0 | 9; 2 −1 | 0) ∼ (1 0 | 9; 0 1 | 18)
  • Each step uses an ERO; the solution set never changes.

🎯 Pivots

A pivot is the matrix entry (always 1 in RREF) used to eliminate (make zero) all other entries in its column.

  • The algorithm uses the top-left nonzero entry as the first pivot, then moves down and right.
  • Example: In the matrix
    (1 1 | 5)
    (0 1 | 3),
    the top-left 1 is a pivot (used to zero out below it), and the bottom-right 1 is also a pivot (used to zero out above it).

🏁 Reduced Row Echelon Form (RREF)

🏁 Definition of RREF

An augmented matrix is in Reduced Row Echelon Form if it satisfies three properties:

  1. In every row, the leftmost non-zero entry is 1 (called a pivot).
  2. Each pivot is to the right of the pivot in the row above it (staircase pattern).
  3. Each pivot is the only non-zero entry in its column (all other entries in that column are 0).
  • Example of RREF:
    (1 0 7 | 0)
    (0 1 3 | 0)
    (0 0 0 | 1)
    (0 0 0 | 0)
  • Example NOT in RREF:
    (1 0 3 | 0)
    (0 0 2 | 0) — violates rule 1 (pivot is 2, not 1)
    (0 1 0 | 1) — violates rule 2 (pivot is left of the one above)
    (0 0 0 | 1)

🎯 The Gaussian elimination algorithm

Goal: Transform any augmented matrix into RREF using EROs.

Steps (brute-force version):

  1. Make the leftmost nonzero entry in the top row equal to 1 (by scalar multiplication).
  2. Use that 1 as a pivot to eliminate (make zero) everything below it (by row addition).
  3. Move to the next row; make its leftmost nonzero entry 1.
  4. Use that 1 as a pivot to eliminate everything below and above it.
  5. Repeat for each row.
  • If the first entry of the first row is zero, swap rows first.
  • If an entire column is zero, skip it and continue with the next column.
  • Don't confuse: RREF is not always the identity matrix—it depends on the system.
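
In practice the whole algorithm can be handed to a computer algebra system; a minimal sketch using SymPy's rref() on the earlier 2×2 example.

```python
# Minimal sketch: Gaussian elimination to RREF with SymPy.
import sympy as sp

augmented = sp.Matrix([[1,  1, 27],
                       [2, -1,  0]])
rref_matrix, pivot_cols = augmented.rref()
print(rref_matrix)   # Matrix([[1, 0, 9], [0, 1, 18]])  ->  x = 9, y = 18
print(pivot_cols)    # (0, 1)  ->  pivots in the first two columns
```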

🚫 When RREF is not the identity

🔁 Redundant equations

  • If one equation is a multiple of another, elimination produces a row of zeros.
  • Example:
    x + y = 2
    2x + 2y = 4
    becomes
    (1 1 | 2)
    (0 0 | 0) in RREF.
  • Solutions still exist (e.g., x=1, y=1), but there are infinitely many.

❌ Inconsistent equations

  • If elimination produces a row like (0 0 | 1), the system has no solutions.
  • Example:
    x + y = 2
    2x + 2y = 5
    becomes
    (1 1 | 2)
    (0 0 | 1) in RREF.
  • This says "0 + 0 = 1," which is impossible.

📐 More unknowns than equations

  • If there are more columns (unknowns) than rows (equations), some columns will have no pivot.
  • Example:
    (1 1 1 0 | 2)
    (0 0 0 1 | 0) in RREF has no pivots in columns 2 and 3.

🧮 Reading solutions from RREF

🧮 Pivot vs. non-pivot variables

  • Pivot variables: variables corresponding to columns with a pivot.
  • Non-pivot (free) variables: variables corresponding to columns without a pivot.
  • Standard approach: express pivot variables in terms of non-pivot variables.

📋 The standard approach to solution sets

Steps:

  1. Write the augmented matrix.
  2. Perform EROs to reach RREF.
  3. Express pivot variables in terms of non-pivot variables.

Example:
RREF:
(1 0 0 3 | −5)
(0 1 0 2 | 6)
(0 0 1 4 | 8)

Equations:
x + 3w = −5
y + 2w = 6
z + 4w = 8

Solution:
x = −5 − 3w
y = 6 − 2w
z = 8 − 4w
w = w (free)

In vector form:
(x, y, z, w) = (−5, 6, 8, 0) + w × (−3, −2, −4, 1)

Solution set: all vectors of this form for any real number w.
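
The structure of this solution set can be verified numerically; a minimal sketch in which the value of the free variable is an arbitrary choice.

```python
# Minimal sketch: checking the particular + homogeneous structure of the
# solution read off from the RREF above.
import numpy as np

A = np.array([[1, 0, 0, 3],
              [0, 1, 0, 2],
              [0, 0, 1, 4]])       # coefficient part of the RREF
b = np.array([-5, 6, 8])

x_p = np.array([-5, 6, 8, 0])      # particular solution
x_h = np.array([-3, -2, -4, 1])    # homogeneous solution

print(np.array_equal(A @ x_p, b))              # True
print(np.array_equal(A @ x_h, np.zeros(3)))    # True
w = 2.5                                        # any value of the free variable
print(np.allclose(A @ (x_p + w * x_h), b))     # True
```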

🔢 Multiple free variables

  • If RREF has n columns without pivots, there are n free variables.
  • The solution set has the form:
    {one particular solution + (free variable 1) × (homogeneous solution 1) + (free variable 2) × (homogeneous solution 2) + … : all free variables in R}.

Example:
RREF:
(1 0 7 0 | 4)
(0 1 3 4 | 1)
(0 0 0 0 | 0)
(0 0 0 0 | 0)

Columns 3 and 4 have no pivots, so z and w are free.

Solution:
(x, y, z, w) = (4, 1, 0, 0) + z × (−7, −3, 1, 0) + w × (0, −4, 0, 1)

🏠 Particular and homogeneous solutions

A homogeneous solution to the equation Lx = v is a vector x_H such that Lx_H = 0 (the zero vector).

A particular solution x_P satisfies Lx_P = v.

  • Key fact: particular solution + any combination of homogeneous solutions = another particular solution.
  • In the examples above, the constant vector (without free variables) is the particular solution; the vectors multiplied by free variables are homogeneous solutions.
  • Check: substituting a homogeneous solution into the left side of the matrix equation gives the zero vector.
  • Don't confuse: the solution set is not just one vector—it is the set of all combinations of the form x_P + (free variables) × (homogeneous solutions).

🎓 Canonical choice of free variables

  • The standard approach always uses non-pivot variables as free variables.
  • This choice is canonical: everyone following the algorithm will pick the same free variables.
  • Other choices of free variables are valid (e.g., solving for w in terms of z), but the standard approach ensures consistency and is easier to check.

2.1.1 Augmented Matrix Notation

🧭 Overview

🧠 One-sentence thesis

Augmented matrix notation provides a more efficient way to represent and manipulate systems of linear equations than either standard equation form or full matrix equation form.

📌 Key points (3–5)

  • What augmented matrices are: a compact notation that combines the coefficient matrix and the result vector, separated by a vertical line.
  • Why they are useful: simpler and more efficient than writing out full systems of equations or full matrix equations.
  • How to read them: rows represent equations, columns left of the line represent unknowns, and the column right of the line represents the results.
  • Common confusion: superscripts in augmented matrix entries denote row number, NOT exponents.
  • The underlying question: finding which combination of the coefficient matrix columns adds up to the result vector.

📝 Three equivalent representations

📝 Standard system of equations

The excerpt shows that the same information can be written in three ways:

  • As a system of equations with variables
  • As a matrix equation
  • As an augmented matrix

Example: The system "x + y = 27" and "2x - y = 0" can be written as:

  • System form: two equations with braces
  • Matrix equation: a 2×2 matrix times a column vector of variables equals a column vector of results
  • Augmented matrix: a 2×3 array with a vertical line separating coefficients from results

🔄 Vector combination interpretation

Another way to view the same problem:

  • "x times (1, 2) + y times (1, -1) = (27, 0)"
  • This shows we are finding which combination of column vectors adds up to the result vector
  • Example solution given: "9 times (1, 2) + 18 times (1, -1)"

🔢 Augmented matrix structure

🔢 General form for r equations in k unknowns

The excerpt describes the general augmented matrix structure:

  • Number of rows = number of equations (r)
  • Number of columns left of the vertical line = number of unknowns (k)
  • One column right of the vertical line = the result values
  • Total size: r rows by (k+1) columns

🏷️ Index notation

Entries left of the divide carry two indices; subscripts denote column number and superscripts row number.

Important warning from the excerpt:

  • Superscripts here do NOT denote exponents
  • They indicate which row (which equation) the entry belongs to
  • Subscripts indicate which column (which variable) the entry belongs to

Example: In the general form, "a₁²" means the entry in row 2, column 1.

📋 Larger example

The excerpt provides a 3-equation, 4-unknown system:

  • "1x + 3y + 2z + 0w = 9"
  • "6x + 2y + 0z - 2w = 0"
  • "-1x + 0y + 1z + 1w = 3"

This becomes a 3×5 augmented matrix (3 rows, 4 columns of coefficients plus 1 result column).

The corresponding matrix equation shows a 3×4 coefficient matrix times a 4×1 variable vector equals a 3×1 result vector.

🎯 The core question

🎯 What we are trying to find

The excerpt emphasizes repeatedly:

We are trying to find which combination of the columns of the matrix adds up to the vector on the right hand side.

This is the fundamental interpretation:

  • Each column left of the line corresponds to one unknown variable
  • We seek coefficients (the values of the unknowns) that make the linear combination equal the result vector
  • This is the same question whether written as equations, matrix equation, or augmented matrix

✅ Required skill

The excerpt states:

Make sure you can write out the system of equations and the associated matrix equation for any augmented matrix.

This means being able to convert freely between all three representations.

🔧 Row operations preview

🔧 Example of systematic manipulation

The excerpt includes Example 11, which shows a step-by-step solution process:

  • Starting system: "x + y = 27" and "2x - y = 0"
  • The excerpt performs operations like "replace the first equation by the sum of the two equations" or "divide by 3"
  • Each step is shown in all three notations side by side

Key observation from the excerpt:

Everywhere in the instructions above we can replace the word "equation" with the word "row" and interpret them as telling us what to do with the augmented matrix instead of the system of equations.

🎯 The strategy revealed

The excerpt describes the strategy as:

To eliminate y from the first equation and then eliminate x from the second. The result was the solution to the system.

This systematic process is called Gaussian elimination (mentioned at the end of the excerpt).

🔄 Solutions remain unchanged

The excerpt emphasizes that operations on rows change the appearance of the augmented matrix but not its solutions—this is the foundation for the elimination algorithm.


2.1.2 Equivalence and the Act of Solving

🧭 Overview

🧠 One-sentence thesis

Row equivalence allows us to transform an augmented matrix step-by-step without changing its solutions, and Gaussian elimination systematically uses this principle to solve systems of linear equations.

📌 Key points (3–5)

  • The tilde symbol ∼: "is (row) equivalent to" means the augmented matrix changes by row operations but its solutions remain the same.
  • Equivalence as a solving method: setting up a string of equivalences transforms the system into a form where the solution is obvious.
  • Pivot entries: the matrix entry used to "zero out" other entries in its column during elimination.
  • Common confusion: row operations change the appearance of the augmented matrix, but the underlying solution set does not change—equivalence preserves solutions.
  • Goal of elimination: convert the left part of the augmented matrix into the identity matrix, which directly reveals the solution.

🔄 Three representations of the same system

🔄 Equations, matrix equations, and augmented matrices

The excerpt emphasizes that a system of linear equations can be written in three equivalent ways:

  • System of equations: e.g., x + y = 27, 2x − y = 0
  • Matrix equation: e.g., a coefficient matrix times a variable vector equals a constant vector
  • Augmented matrix: the coefficient matrix with the constants appended as an extra column

All three representations describe the same question and can be manipulated in parallel.

🔧 Operations on equations mirror operations on rows

The excerpt shows that every step performed on equations (adding, subtracting, dividing) can be reinterpreted as a row operation on the augmented matrix:

  • "Replace the first equation by the sum of the two equations" ↔ "Replace the first row by the sum of the two rows"
  • "Divide the first equation by 3" ↔ "Divide the first row by 3"
  • "Replace the second equation by the second minus two times the first" ↔ "Replace the second row by the second row minus two times the first row"

This parallelism is the foundation of Gaussian elimination.

≈ Row equivalence and the tilde symbol

≈ What the tilde means

The symbol ∼ is called "tilde" but should be read as "is (row) equivalent to" because at each step the augmented matrix changes by an operation on its rows but its solutions do not.

  • The tilde connects two augmented matrices that represent the same system of equations.
  • Example from the excerpt:
    • (1 1 27 / 2 −1 0) ∼ (1 0 9 / 2 −1 0) ∼ (1 0 9 / 0 1 18)
  • Each transformation is a row operation; the solution set remains unchanged.

🔍 Don't confuse: changing the matrix vs changing the solutions

  • Row operations change the entries of the augmented matrix.
  • But the solutions (the values of x, y, etc. that satisfy the system) stay the same.
  • Equivalence means "different appearance, same solution."

🎯 Pivots and elimination strategy

🎯 What a pivot is

The name pivot is used to indicate the matrix entry used to "zero out" the other entries in its column; the pivot is the number used to eliminate another number in its column.

  • In Example 12, the top left 1 is a pivot used to make the bottom left entry zero.
  • The bottom right entry (before dividing) is also a pivot, used to make the top right entry vanish.

🧩 The elimination strategy

The excerpt describes the strategy as:

  • Eliminate y from the first equation (or zero out entries in the y-column above/below the pivot).
  • Eliminate x from the second equation (or zero out entries in the x-column above/below the pivot).
  • The result is a system where each equation has only one variable, making the solution obvious.

Example from the excerpt:

| Step | Augmented matrix | What happened |
| --- | --- | --- |
| Start | (1 1 5 / 1 2 8) | Original system |
| After first pivot | (1 1 5 / 0 1 3) | Used top-left 1 to zero out bottom-left entry |
| After second pivot | (1 0 2 / 0 1 3) | Used bottom-right entry to zero out top-right entry |
| Solution | x = 2, y = 3 | Directly readable |
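
The two pivot steps in the table can be replayed as explicit row operations; a minimal NumPy sketch.

```python
# Minimal sketch: the two pivot steps on the augmented matrix (1 1 5 / 1 2 8).
import numpy as np

aug = np.array([[1.0, 1.0, 5.0],
                [1.0, 2.0, 8.0]])

aug[1] = aug[1] - aug[0]   # top-left pivot zeroes out the entry below it
print(aug)                 # [[1. 1. 5.]
                           #  [0. 1. 3.]]

aug[0] = aug[0] - aug[1]   # second pivot zeroes out the entry above it
print(aug)                 # [[1. 0. 2.]
                           #  [0. 1. 3.]]  ->  x = 2, y = 3
```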

🏁 The goal: identity matrix and obvious solutions

🏁 What the identity matrix is

The Identity Matrix I has 1's along its diagonal and all off-diagonal entries vanish.

  • For two equations: I = (1 0 / 0 1)
  • For larger systems: I has 1's on the diagonal and 0's everywhere else.

🏁 Why the identity is the goal

  • If the left part of the augmented matrix becomes the identity, the system reads directly as x = a, y = b, etc.
  • Example: (1 0 9 / 0 1 18) means x = 9, y = 18.
  • The excerpt notes that for many systems, reaching the identity is not possible, but Gaussian elimination still aims for a form with the maximum number of components eliminated.

🧠 Equivalence as the act of solving

The excerpt states:

Setting up a string of equivalences like this is a means of solving a system of linear equations.

  • Solving is not a single step; it is a sequence of row-equivalent transformations.
  • Each ∼ step preserves solutions while simplifying the matrix.
  • The final form reveals the solution directly.

Reduced Row Echelon Form

2.1.3 Reduced Row Echelon Form

🧭 Overview

🧠 One-sentence thesis

Reduced Row Echelon Form (RREF) is the maximally simplified version of an augmented matrix obtained through Gaussian elimination, which reveals whether a system has a unique solution, infinitely many solutions, or no solution at all.

📌 Key points (3–5)

  • Goal of Gaussian elimination: transform the augmented matrix so the left side becomes the identity matrix (1's on the diagonal, 0's elsewhere), which directly gives solutions x = a, y = b, etc.
  • What RREF is: the endpoint of Gaussian elimination—a standardized form with the maximum number of entries eliminated using three elementary row operations.
  • Three elementary row operations (EROs): row swap, scalar multiplication (by non-zero constant), and row addition—these do not change the system's solutions.
  • Common confusion: not all systems can reach the identity matrix; redundant equations produce rows of zeros, and inconsistent equations produce impossible statements like 0 = 1.
  • Why RREF matters: it reveals the structure of solutions—whether the system is solvable, has redundancy, or is inconsistent.

🎯 The goal: reaching the identity matrix

🎯 What the identity matrix looks like

Identity Matrix I: a matrix with 1's along its diagonal and all off-diagonal entries zero.

  • For two equations: I = (1 0 / 0 1).
  • For larger systems: I has 1's down the main diagonal and 0's everywhere else.
  • When the left side of the augmented matrix becomes I, the system directly states x = a, y = b, etc.

🚧 When the identity is unreachable

The excerpt shows two obstructions:

Situation | What happens | Example outcome
--- | --- | ---
Redundant equations | One equation is a multiple of another; elimination produces a row of zeros | (1 1 2 / 0 0 0); solutions still exist but are not unique
Inconsistent equations | Elimination produces an impossible statement like 0 + 0 = 1 | (1 1 2 / 0 0 1); no solutions exist

  • Don't confuse: a row of zeros (0 0 0) means redundancy; a row like (0 0 1) means inconsistency.

🔧 Elementary row operations (EROs)

🔧 The three operations that preserve solutions

The excerpt emphasizes that these operations do not change the system's solutions:

  1. Row Swap: exchange any two rows.
  2. Scalar Multiplication: multiply any row by a non-zero constant.
  3. Row Addition: add one row to another row.
  • Why non-zero for multiplication: multiplying by zero would destroy information and change solutions.
  • Example from the excerpt: swapping rows when the top-left entry is zero (the "silly order of equations" example).

🔄 Pivots and elimination

Pivot: the matrix entry used to "zero out" (eliminate) the other entries in its column.

  • The excerpt shows that a pivot (e.g., the top-left 1) is used to make entries below it zero.
  • Later pivots are used to eliminate entries both above and below.
  • Example: in the system x + y = 5, x + 2y = 8, the top-left 1 is a pivot to eliminate the bottom-left entry; then the bottom-right entry becomes a pivot to eliminate the top-right entry.

🛠️ The RREF algorithm

🛠️ Step-by-step process

The excerpt provides a "brute force" algorithm (a minimal Python sketch follows this list):

  1. Make the leftmost nonzero entry in the top row equal to 1 (by scalar multiplication).
  2. Use that 1 as a pivot to eliminate everything below it (by row addition).
  3. Move to the next row; make its leftmost nonzero entry 1.
  4. Use that 1 as a pivot to eliminate everything below and above it.
  5. Repeat for each subsequent row.
  • If the first entry of the first row is zero: swap rows first.
  • If an entire column is zero: skip it and apply the algorithm to remaining columns.
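
The sketch below is my own illustration (not code from the text), using exact fractions so the pivots stay clean:

```python
from fractions import Fraction

def rref(rows):
    """Reduce an augmented matrix (list of lists) to reduced row echelon form."""
    A = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(n_cols):
        # Find a usable pivot in this column; if the column is all zero, skip it.
        r = next((i for i in range(pivot_row, n_rows) if A[i][col] != 0), None)
        if r is None:
            continue
        A[pivot_row], A[r] = A[r], A[pivot_row]                       # row swap
        A[pivot_row] = [x / A[pivot_row][col] for x in A[pivot_row]]  # make the pivot 1
        for i in range(n_rows):                                       # zero out above and below
            if i != pivot_row and A[i][col] != 0:
                factor = A[i][col]
                A[i] = [a - factor * p for a, p in zip(A[i], A[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return A

# The system x + y = 5, x + 2y = 8 reduces to x = 2, y = 3:
print([[float(x) for x in row] for row in rref([[1, 1, 5], [1, 2, 8]])])
# [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
```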

📐 What RREF looks like

The excerpt gives the general form of RREF:

1  *  0  *  0  ...  0  *  b₁
0  0  1  *  0  ...  0  *  b₂
0  0  0  0  1  ...  0  *  b₃
...
0  0  0  0  0  ...  1  *  bₖ
0  0  0  0  0  ...  0  0  bₖ₊₁
...
0  0  0  0  0  ...  0  0  bᵣ
  • The asterisks (*) denote entries that may be nonzero.
  • Each leading 1 (pivot) is the only nonzero entry in its column.
  • Rows of all zeros appear at the bottom.

⚠️ Special cases in RREF

Redundant equations (Example 13):

  • System: x + y = 2, 2x + 2y = 4.
  • RREF: (1 1 2 / 0 0 0).
  • Interpretation: the second equation adds no new information; solutions exist (e.g., (1, 1)) but are not unique.

Inconsistent equations (Example 14):

  • System: x + y = 2, 2x + 2y = 5.
  • RREF: (1 1 2 / 0 0 1).
  • Interpretation: the bottom row says 0 + 0 = 1, which is impossible; no solutions exist.
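
In both examples the telltale bottom row comes from the same single step, R₂ → R₂ − 2R₁ (the excerpt states the RREFs but not this step):

```latex
\left(\begin{array}{cc|c} 1 & 1 & 2 \\ 2 & 2 & 4 \end{array}\right)
\sim
\left(\begin{array}{cc|c} 1 & 1 & 2 \\ 0 & 0 & 0 \end{array}\right),
\qquad
\left(\begin{array}{cc|c} 1 & 1 & 2 \\ 2 & 2 & 5 \end{array}\right)
\sim
\left(\begin{array}{cc|c} 1 & 1 & 2 \\ 0 & 0 & 1 \end{array}\right).
```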

Silly order (Example 15):

  • Starting with 0x + y = -2, x + y = 7 puts a zero in the top-left, blocking the algorithm.
  • Solution: swap rows first to get x + y = 7, 0x + y = -2.
  • RREF: (1 0 9 / 0 1 -2), giving x = 9, y = -2.

🧩 Why RREF is the "favorite" form

🧩 Maximum simplification

RREF: the version of the matrix that has the maximum number of components eliminated.

  • Even when the identity matrix is unreachable, RREF is the simplest possible form.
  • The excerpt states "no more than two components can be eliminated" in the redundant-equation example, meaning RREF is as far as you can go.

🧩 Direct solution reading

  • When RREF reaches the identity on the left side, the right side directly gives the solution values.
  • Example from the excerpt: (1 0 2 / 0 1 3) corresponds to x = 2, y = 3.
  • This is the "main idea" of section 2.1.3: setting up equivalences to solve systems by transforming to RREF.
12

Solution Sets and RREF

2.1.4 Solution Sets and RREF

🧭 Overview

🧠 One-sentence thesis

Reduced Row Echelon Form (RREF) provides a maximally simplified system of equations from which all solutions—whether unique, infinitely many, or none—can be systematically read off by expressing pivot variables in terms of free (non-pivot) variables.

📌 Key points (3–5)

  • What RREF achieves: maximally simplifies a system by making as many coefficients 0 or 1 as possible, making solutions easier to extract.
  • Pivot vs non-pivot variables: pivot variables correspond to columns with a leading 1 in RREF; non-pivot variables become free variables that index (parameterize) the solution set.
  • Standard approach: express pivot variables in terms of non-pivot variables; the number of non-pivot variables equals the number of free parameters in the solution.
  • Common confusion: different choices of free variables can describe the same solution set—the standard approach is canonical (everyone gets the same RREF and same free variables), but non-standard choices are valid if they yield the same set of solutions.
  • General solution structure: solution sets with n free variables take the form "one particular solution plus n homogeneous solutions," each multiplied by a free parameter.

🔧 Elementary Row Operations and the algorithm

🔧 Three Elementary Row Operations (EROs)

The excerpt defines three operations that do not change a system's solutions:

  • (Row Swap): Exchange any two rows.
  • (Scalar Multiplication): Multiply any row by a non-zero constant.
  • (Row Addition): Add one row to another row.

These are the building blocks of Gaussian elimination.

📋 Algorithm for obtaining RREF

The excerpt gives a step-by-step "brute force" algorithm:

  1. Make the leftmost nonzero entry in the top row equal to 1 (by scalar multiplication).
  2. Use that 1 as a pivot to eliminate (make zero) everything below it (by row addition).
  3. Move to the next row; make its leftmost nonzero entry 1.
  4. Use that 1 as a pivot to eliminate everything below and above it.
  5. Repeat for each subsequent row.

Special cases:

  • If the first entry of the first row is zero, swap it with another row whose first entry is nonzero.
  • If an entire column is zero, skip it and apply the algorithm to the remaining columns.

Don't confuse: the algorithm eliminates below and above each pivot, not just below—this is what makes the form "reduced."

🎯 What RREF looks like

Reduced Row Echelon Form (RREF): an augmented matrix satisfying three properties:

  1. In every row, the leftmost non-zero entry is 1 (called a pivot).
  2. The pivot of any given row is always to the right of the pivot of the row above it.
  3. The pivot is the only non-zero entry in its column.

The general form includes asterisks (arbitrary numbers) in columns that have no pivot.

Example of RREF (from Example 16):

1  0  7  0
0  1  3  0
0  0  0  1
0  0  0  0

Columns 1, 2, and 4 have pivots; column 3 has no pivot.

Example NOT in RREF (from Example 17):

1  0  3  0
0  0  2  0
0  1  0  1
0  0  0  1

This breaks all three rules: the second row's first nonzero entry is not 1, pivots are not in descending staircase order, and pivots are not the only nonzero entries in their columns.

🧩 Reading solutions from RREF

🧩 Why RREF simplifies solution extraction

RREF is a maximally simplified version of the original system in the following sense:

  • As many coefficients of the variables as possible are 0.
  • As many coefficients of the variables as possible are 1.

This makes it easier to read off solutions, even when there are infinitely many.

🔑 Pivot variables vs non-pivot variables

  • Pivot variables: variables that appear with a pivot coefficient (a leading 1) in RREF.
  • Non-pivot variables: variables whose columns have no pivot.
  • Free variables: variables not expressed in terms of others; in the standard approach, these are the non-pivot variables.

The excerpt emphasizes: "There are always exactly enough non-pivot variables to index your solutions."

Example: In Example 19, variables x, y, and z are pivot variables; w is a non-pivot variable and becomes the free variable.

📐 The standard approach to solution sets

The excerpt outlines a three-step process:

  1. Write the augmented matrix.
  2. Perform EROs to reach RREF.
  3. Express the pivot variables in terms of the non-pivot variables.

Then add trivial equations for the free variables (e.g., w = w) and rewrite the system in vector form.

Example walkthrough (Example 19):

  • Original system (already in near-RREF form):
    • x + y + 5w = 1
    • y + 2w = 6
    • z + 4w = 8
  • After further row operations, RREF gives:
    • x + 3w = −5
    • y + 2w = 6
    • z + 4w = 8
  • Express pivot variables in terms of w:
    • x = −5 − 3w
    • y = 6 − 2w
    • z = 8 − 4w
    • w = w
  • Vector form:
    (x, y, z, w) = (−5, 6, 8, 0) + w(−3, −2, −4, 1)
    
  • Solution set: all vectors of this form for any real number w.
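
A quick numeric check (mine, not the excerpt's) that every vector in this family satisfies the original three equations, whatever w is:

```python
import numpy as np

for w in np.linspace(-5, 5, 7):
    x, y, z = -5 - 3*w, 6 - 2*w, 8 - 4*w   # pivot variables expressed in terms of w
    assert np.isclose(x + y + 5*w, 1)      # x + y + 5w = 1
    assert np.isclose(y + 2*w, 6)          # y + 2w = 6
    assert np.isclose(z + 4*w, 8)          # z + 4w = 8
print("all checks passed")
```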

Don't confuse: you could solve for a different variable (e.g., express w in terms of z), but the standard approach is canonical—everyone using RREF will choose the same free variables, making communication clearer.

🔢 Multiple free variables

When RREF has multiple columns without pivots, there are multiple free variables.

Example (Example 20):

  • RREF augmented matrix:
    1  0  7  0  4
    0  1  3  4  1
    0  0  0  0  0
    0  0  0  0  0
    
  • Columns 3 and 4 (variables z and w) have no pivots → two free variables.
  • Express pivot variables:
    • x = 4 − 7z
    • y = 1 − 3z − 4w
    • z = z
    • w = w
  • Vector form:
    (x, y, z, w) = (4, 1, 0, 0) + z(−7, −3, 1, 0) + w(0, −4, 0, 1)
    
  • Solution set: all such vectors for any real z and w.

The excerpt notes: "You can imagine having three, four, or fifty-six non-pivot columns and the same number of free variables indexing your solution set."

🏗️ Structure of solution sets

🏗️ General form with n free variables

The excerpt gives the general structure:

A solution set to a system of equations with n free variables will be of the form: { x_P + μ₁x_H1 + μ₂x_H2 + ⋯ + μₙx_Hn : μ₁, …, μₙ ∈ ℝ }

  • x_P: a particular solution (the constant vector).
  • x_H1, x_H2, …, x_Hn: homogeneous solutions (the direction vectors multiplied by free parameters).

🔬 Particular and homogeneous solutions

Homogeneous solution: A vector x_H such that L x_H = 0 (where 0 is the zero vector), for a linear equation L x = v.

Particular solution: A vector x_P such that L x_P = v.

Key property: If you add any sum of multiples of homogeneous solutions to a particular solution, you obtain another particular solution.

The excerpt emphasizes: "This will come up over and over again."

Analogy (from the excerpt): Consider the differential equation d²f/dx² = 3.

  • Particular solution: (3/2)x²
  • Homogeneous solutions: x and 1
  • General solution: { (3/2)x² + a x + c : a, c ∈ ℝ }

This mirrors the structure of linear system solutions.
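
A quick check of the analogy (not spelled out in the excerpt): differentiating the general solution twice annihilates the homogeneous pieces and leaves the right-hand side 3:

```latex
\frac{d^2}{dx^2}\Bigl(\tfrac{3}{2}x^2 + a\,x + c\Bigr) = 3 + 0 + 0 = 3.
```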

⚠️ Inconsistent systems

The excerpt notes: "If any of the numbers b<sub>k+1</sub>, …, b<sub>r</sub> [in the bottom rows of RREF] are non-zero then the system of equations is inconsistent and has no solutions."

Example: A row like [0 0 0 | 5] corresponds to the equation 0 = 5, which is impossible.

🎓 Practical skills and reminders

🎓 Converting RREF back to equations

The excerpt stresses: "It is important that you are able to convert RREF back into a system of equations."

Each row of the RREF augmented matrix corresponds to one equation. Read the coefficients and the constant term directly.

🎓 Canonical vs non-standard approaches

Aspect | Standard (canonical) approach | Non-standard approach
--- | --- | ---
Free variables | Non-pivot variables | Any choice of variables
Advantage | Everyone gets the same answer; easier to compare | May be simpler in specific cases
Validity | Always correct | Correct if the solution set is the same

The excerpt uses a metaphor: "You might think of this as the difference between using Google Maps or Mapquest; although their maps may look different, the place they are describing is the same!"

🎓 Practice and mastery

The excerpt repeatedly emphasizes practice:

  • "Learning to perform this algorithm by hand is the first step to learning linear algebra."
  • "You need to learn it well. So start practicing as soon as you can, and practice often."
  • "You are going to need to get really good at this!"
  • "You need to become very adept at reading off solution sets… it is a basic skill for linear algebra, and we will continue using it up to the last page of the book!"
13

Review Problems: Homogeneous and Particular Solutions

2.2 Review Problems

🧭 Overview

🧠 One-sentence thesis

The solution to a linear equation splits into a particular solution plus any combination of homogeneous solutions, and mastering this structure—along with reading solution sets from RREF—is fundamental to all of linear algebra.

📌 Key points (3–5)

  • Homogeneous vs particular: a homogeneous solution satisfies Lx = 0; a particular solution satisfies Lx = v; adding any homogeneous solution to a particular solution yields another particular solution.
  • Why this structure matters: the pattern of "particular + homogeneous" appears repeatedly in linear systems, differential equations, and throughout the book.
  • Core skill: reading solution sets directly from the RREF of an augmented matrix is essential and will be used up to the last page.
  • Common confusion: row equivalence vs solution-set equality—two matrices can have the same solution set without being row equivalent; row equivalence is defined by elementary row operations, not by solution sets.
  • RREF uniqueness: the RREF of a matrix is unique, but row echelon form (REF) is not.

🧩 Solution structure: particular + homogeneous

🧩 Homogeneous solution

A homogeneous solution to a linear equation Lx = v (with L and v known) is a vector x_H such that Lx_H = 0, where 0 is the zero vector.

  • It is a solution to the "zero version" of the equation.
  • The excerpt emphasizes that homogeneous solutions correspond to the parts of the general solution with free variables as coefficients.

🔧 Particular solution

  • A particular solution x_P satisfies the original equation Lx_P = v.
  • The excerpt states: if you add a sum of multiples of homogeneous solutions to a particular solution, you obtain another particular solution.
  • Example (from the excerpt): for the differential equation d²f/dx² = 3, a particular solution is (3/2)x², while x and 1 are homogeneous solutions; the full solution set is {(3/2)x² + ax + c : a, c ∈ ℝ}.

🔍 Why this decomposition works

  • The key insight: Lx_P = v and Lx_H = 0 together imply L(x_P + x_H) = v.
  • This structure recurs in differential equations and linear systems.
  • Don't confuse: a homogeneous solution does not solve the original problem Lx = v; it solves the associated equation Lx = 0, and it enters the general solution as the part multiplied by the free parameters.

🧮 Reading solution sets from RREF

🧮 The essential skill

  • The excerpt emphasizes: "You need to become very adept at reading off solution sets of linear systems from the RREF of their augmented matrix; it is a basic skill for linear algebra."
  • This skill is used continuously throughout the book.
  • The review problems ask you to check whether augmented matrices are in RREF and compute their solution sets.

📋 What RREF tells you

  • Pivot columns correspond to leading variables (determined by other variables).
  • Free variables (columns without pivots) generate the homogeneous solutions.
  • The augmented column gives the particular solution when free variables are set to zero.

🔄 Row equivalence and its properties

🔄 Definition and operations

  • Two matrices are row equivalent if one can be obtained from the other by elementary row operations (EROs).
  • The excerpt notes that row equivalence is defined by the operations, not by having the same solution set.

🚫 What affects row equivalence

Operation | Effect on row equivalence | Explanation from excerpt
--- | --- | ---
Removing columns | Never affects it | Problem 3 asks you to verify that removing columns from row-equivalent matrices leaves them row-equivalent.
Removing rows | Can affect it | Problem 4 asks whether removing a row can change row equivalence; the answer is yes (removing constraints can change the system).

⚠️ Common confusion: row equivalence ≠ same solution set

  • Problem 11 explicitly asks you to find two augmented matrices that are not row equivalent but do have the same solution set.
  • This shows that row equivalence is a stronger condition than solution-set equality.

🎯 RREF vs REF

🎯 RREF is unique

  • Problem 6 asks you to show that the RREF of a matrix is unique.
  • Hint: consider what happens if the same augmented matrix had two different RREFs; try removing columns.

🔀 REF is not unique

Row Echelon Form (REF): pivots are not necessarily set to one, and only the entries below and to the left of each pivot are required to be zero; entries above a pivot may be non-zero.

  • Problem 7 asks for a counterexample to show that REF is not unique.
  • Once in REF, the system can be solved by back substitution: write the REF matrix as a system of equations, then solve from the bottom row upward.
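
A minimal Python sketch of back substitution (my own illustration), assuming the coefficient block of the REF matrix is square and upper triangular with nonzero pivots:

```python
def back_substitute(ref_rows):
    """Solve an REF augmented matrix [A | b] with square, upper-triangular A."""
    n = len(ref_rows)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                        # bottom row upward
        known = sum(ref_rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (ref_rows[i][n] - known) / ref_rows[i][i]  # pivots need not equal 1
    return x

# An REF (not RREF) of x + y = 5, x + 2y = 8:
print(back_substitute([[1, 1, 5], [0, 1, 3]]))  # [2.0, 3.0]
```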

🧪 Special cases and conditions

🧪 No solution

  • Problem 4 gives an augmented matrix with no solutions; you are asked to check whether removing a row changes this.
  • Problem 5 shows a system with no solutions (a row like [0 0 0 | 6] in RREF) and asks for which values of a parameter k the system has a solution.

🧪 Geometric interpretation

  • Problem 9 asks for a geometric reason why a system has no solution: plot the column vectors in the plane.
  • For a general augmented matrix with entries a, b, c, d, e, f, you are asked to find a condition on a, b, c, d that corresponds to the geometric condition.
  • Example: if the first two columns are parallel (proportional), the system may have no solution or infinitely many solutions depending on the third column.

🧪 Equivalence relations

  • Problem 10 asks you to show that row equivalence is an equivalence relation:
    • Reflexive: any matrix is row equivalent to itself.
    • Symmetric: if matrix A is row equivalent to B, then B is row equivalent to A.
    • Transitive: if A is row equivalent to B and B is row equivalent to C, then A is row equivalent to C.
14

Elementary Row Operations

2.3 Elementary Row Operations

🧭 Overview

🧠 One-sentence thesis

Elementary row operations can be represented as matrices, allowing us to "undo" a matrix step by step by applying these operation matrices to both sides of an equation, just like dividing both sides in elementary algebra.

📌 Key points (3–5)

  • EROs as matrices: Each elementary row operation can be written as a matrix that multiplies the original matrix.
  • Step-by-step undoing: Instead of finding a single inverse all at once, we can undo a matrix one ERO at a time, applying the same operations to both sides of an equation.
  • Recording EROs with (M | I): Augment the matrix M with the identity matrix I, then perform Gaussian elimination; the left side becomes I while the right side becomes the matrix that undoes M.
  • Common confusion: The order of matrix multiplication—when M N = I, both M N and N M equal the identity (the excerpt shows both orders work).
  • Why it matters: This gives a concrete, familiar way to "divide by a matrix" and solve systems of linear equations.

🔢 EROs represented as matrices

🔢 What it means to perform EROs with matrices

  • The excerpt shows that each elementary row operation can be written as a matrix.
  • When you multiply an augmented matrix by these ERO matrices, you perform the corresponding row operation.
  • Example from the excerpt: multiplying by a matrix with 1/2 in a diagonal position scales that row by 1/2; multiplying by a matrix with a 1 and -1 in specific positions subtracts one row from another.

🧮 Concrete example

The excerpt demonstrates:

  • Start with an augmented matrix.
  • Multiply on the left by a sequence of ERO matrices (swap rows, scale rows, add/subtract rows).
  • Each multiplication performs one row operation.
  • The result is the same as doing Gaussian elimination step by step.
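
A small numpy illustration of the same idea (the numbers are mine, not the excerpt's): a matrix that is almost the identity performs one row operation when it multiplies from the left.

```python
import numpy as np

A = np.array([[2., 4., 10.],     # an illustrative augmented matrix
              [1., 3., 7.]])

E_scale = np.array([[0.5, 0.],   # identity with 1/2 on the diagonal:
                    [0.,  1.]])  # "multiply row 1 by 1/2"
E_sub = np.array([[1., 0.],      # identity with -1 off the diagonal:
                  [-1., 1.]])    # "subtract row 1 from row 2"

print(E_scale @ A)           # rows: [1, 2, 5] and [1, 3, 7]
print(E_sub @ E_scale @ A)   # rows: [1, 2, 5] and [0, 1, 2]
```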

🔄 Undoing a matrix step by step

🔄 The algebra analogy

"Dividing by a matrix": applying ERO matrices to both sides of an equation to isolate the variable.

  • Just as 6x = 12 can be solved by multiplying both sides by 3⁻¹, then by 2⁻¹, a matrix equation can be solved by applying ERO matrices to both sides.
  • The excerpt shows Example 23: 6x = 12 ⇔ 3⁻¹(6x) = 3⁻¹(12) ⇔ 2x = 4 ⇔ 2⁻¹(2x) = 2⁻¹(4) ⇔ x = 2.
  • This breaks down "dividing by 6" into two steps: divide by 3, then divide by 2.

🧩 Matrix equation example

Example 24 in the excerpt shows a 3×3 system:

  • Start with a matrix M times a vector (x, y, z) equals a vector (7, 4, 4).
  • Apply ERO matrices one at a time to both sides.
  • Each step simplifies the left side closer to the identity matrix.
  • When the left side becomes the identity matrix, the right side is the solution vector (2, 3, 4).
  • The excerpt notes: "This is another way of thinking about Gaussian elimination which feels more like elementary algebra in the sense that you 'do something to both sides of an equation' until you have a solution."

🎯 Why this approach works

  • Instead of finding a single inverse matrix M⁻¹ directly, you compose multiple simple ERO matrices.
  • Each ERO matrix undoes one aspect of M (e.g., swaps rows, scales a row, eliminates an entry).
  • Applying them in sequence fully undoes M.

📋 Recording EROs with (M | I)

📋 The augmentation technique

Augment by the identity matrix (not just a single column) and then perform Gaussian elimination.

  • Write the matrix M next to the identity matrix I as (M | I).
  • Perform Gaussian elimination on the left side (M) to turn it into I.
  • As you do this, the right side (I) transforms into the matrix that undoes M.
  • The excerpt emphasizes: "There is no need to write the EROs as systems of equations or as matrices while doing this"—just do row operations as usual.

🧪 Concrete example

Example 25 shows:

  • Start with a 3×3 matrix M augmented by the 3×3 identity: (M | I).
  • Perform row swaps, scaling, and row additions.
  • The left side becomes the identity matrix.
  • The right side becomes the matrix that undoes M.

✅ Verification

Example 26 demonstrates checking that one matrix undoes another:

  • Multiply the original matrix M by the matrix N obtained from the (M | I) process.
  • The result is the identity matrix: M N = I.
  • The excerpt also shows that the order doesn't matter in this case: N M = I as well.
  • Don't confuse: matrix multiplication order usually matters, but when M N = I, both orders work (this is a special property of inverses).

🔑 The inverse matrix

🔑 Definition

Whenever the product of two matrices M N = I, we say that N is the inverse of M or N = M⁻¹.

  • The inverse is the single matrix that undoes M.
  • It is the result of composing all the ERO matrices used in Gaussian elimination.
  • The (M | I) technique gives a systematic way to find M⁻¹.

🔄 Composition of EROs

  • The excerpt mentions "put together 3⁻¹ 2⁻¹ = 6⁻¹ to get a single thing to apply to both sides of 6x = 12 to undo 6."
  • Similarly, composing multiple ERO matrices gives a single matrix M⁻¹.
  • This single matrix can then be applied to both sides of M x = b to solve for x directly: x = M⁻¹ b.

🧭 Relationship to Gaussian elimination

  • Gaussian elimination is the process of applying EROs to simplify a system.
  • Recording these EROs as matrices (via the (M | I) technique) captures the entire process in a single inverse matrix.
  • This connects the procedural view (row operations) with the algebraic view (matrix multiplication).
15

EROs and Matrices

2.3.1 EROs and Matrices

🧭 Overview

🧠 One-sentence thesis

Elementary row operations can be represented as matrices, allowing us to "undo" a matrix step-by-step by multiplying both sides of an equation, just like dividing both sides in elementary algebra.

📌 Key points (3–5)

  • EROs as matrices: every elementary row operation can be written as a matrix that multiplies the augmented matrix.
  • Step-by-step undoing: multiplying by ERO matrices one at a time reverses the effect of a matrix, similar to dividing both sides of 6x = 12 by 3 then by 2.
  • Collecting all EROs at once: augment the original matrix with the identity matrix (M | I) and perform Gaussian elimination; the right side becomes the inverse matrix.
  • Common confusion: the order of multiplication—when MN = I, either order (MN or NM) gives the identity, and N is called the inverse of M (written M⁻¹).
  • Why it matters: this approach makes Gaussian elimination feel like familiar algebra ("do the same thing to both sides") and provides a concrete way to compute matrix inverses.

🔢 EROs as matrix multiplication

🔢 What it means to perform EROs with matrices

Each elementary row operation can be represented as a matrix that left-multiplies the augmented matrix.

  • Instead of writing row operations in words, you multiply the augmented matrix by a sequence of ERO matrices.
  • Example 22 in the excerpt shows three ERO matrices multiplying an augmented matrix step-by-step, transforming the left side toward row-echelon form.
  • The result is the same as performing the row operations directly, but now each step is a matrix product.

🧮 Why this representation is useful

  • It allows "dividing by a matrix" in a concrete sense: you can multiply both sides of Ax = b by matrices that undo A.
  • This mirrors elementary algebra: to solve 6x = 12, you multiply both sides by 3⁻¹ then by 2⁻¹.
  • Example 23 shows 6x = 12 solved by multiplying both sides first by 3⁻¹ (getting 2x = 4) then by 2⁻¹ (getting x = 2).

🔄 Undoing a matrix step-by-step

🔄 The process: multiply both sides by ERO matrices

  • Start with a matrix equation like Mx = b.
  • Multiply both sides by a sequence of ERO matrices, one at a time, until the left side becomes the identity matrix.
  • Example 24 demonstrates this for a 3×3 system: six ERO matrices are applied in sequence to both sides, transforming the left side from M to I and the right side from the original vector to the solution vector.

🎯 The goal: isolate the variable vector

  • After all ERO matrices are applied, the left side is Ix (which equals x), and the right side is the solution.
  • This approach "feels more like elementary algebra" because you explicitly "do something to both sides of an equation" at each step.
  • Don't confuse: you are not solving for a single number; you are isolating a vector by undoing the matrix multiplication.
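
A small numpy sketch of this "do the same thing to both sides" view, using the 2×2 system x + y = 5, x + 2y = 8 from earlier instead of the excerpt's 3×3 example:

```python
import numpy as np

M = np.array([[1., 1.],
              [1., 2.]])
b = np.array([5., 8.])

E1 = np.array([[1., 0.],    # R2 := R2 - R1
               [-1., 1.]])
E2 = np.array([[1., -1.],   # R1 := R1 - R2
               [0., 1.]])

M, b = E1 @ M, E1 @ b       # left side becomes [[1, 1], [0, 1]]
M, b = E2 @ M, E2 @ b       # left side becomes the identity
print(M)                    # the 2x2 identity
print(b)                    # [2. 3.]  ->  x = 2, y = 3
```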

🧩 Collecting EROs into a single inverse matrix

🧩 The augmented identity method

To find the matrix that undoes M, augment M with the identity matrix (M | I) and perform Gaussian elimination.

  • Instead of writing out each ERO matrix separately, perform row operations on the augmented system (M | I).
  • As the left side changes from M to I, the right side changes from I to the matrix that undoes M.
  • Example 25 shows this process: the left side is transformed to the identity, and the right side becomes the inverse matrix.

🔍 Why this works

  • Each row operation you perform is implicitly multiplying both sides by an ERO matrix.
  • The right side accumulates the product of all these ERO matrices.
  • When the left side reaches I, the right side is the product of all ERO matrices, which is exactly the matrix that undoes M.

📝 No need to write EROs explicitly

  • The excerpt emphasizes: "There is no need to write the EROs as systems of equations or as matrices while doing this."
  • You simply perform Gaussian elimination on (M | I) and read off the inverse from the right side.

🔁 The inverse matrix and commutativity

🔁 Definition and notation

When the product of two matrices MN = I, we say that N is the inverse of M, written N = M⁻¹.

  • The inverse matrix "undoes" M: multiplying M by M⁻¹ gives the identity matrix.
  • This is the matrix analogue of dividing by a number: just as 6 · (6⁻¹) = 1, we have M · M⁻¹ = I.

🔄 Order does not matter for inverses

  • Example 26 shows that both MN and NM equal the identity matrix.
  • This is a special property: in general, matrix multiplication is not commutative (AB ≠ BA), but when N is the inverse of M, both orders give I.
  • Don't confuse: this commutativity holds only for a matrix and its inverse, not for arbitrary matrices.

✅ Checking your work

  • To verify that you have correctly computed M⁻¹, multiply M by your candidate inverse in either order.
  • If the result is the identity matrix, your inverse is correct.
  • Example 26 demonstrates this check for the matrix from Example 25.
16

Recording EROs in (M | I)

2.3.2 Recording EROs in (M | I)

🧭 Overview

🧠 One-sentence thesis

By augmenting a matrix M with the identity matrix and performing Gaussian elimination to reduce M to I, the identity side transforms into the inverse matrix M⁻¹ that undoes M.

📌 Key points (3–5)

  • The core method: Augment M with the identity matrix as (M | I), then apply EROs until the left side becomes I; the right side becomes M⁻¹.
  • What "undoing" means: The inverse M⁻¹ is the matrix that, when multiplied with M, produces the identity matrix (M⁻¹M = I and MM⁻¹ = I).
  • When a matrix is invertible: A matrix M is invertible if and only if its RREF (reduced row echelon form) is the identity matrix.
  • Common confusion: The order of multiplication—for invertible matrices, M⁻¹M and MM⁻¹ both equal I (the order doesn't matter for the result).
  • Building blocks: Every invertible matrix can be expressed as a product of elementary row operation matrices, similar to how integers factor into primes.

🔧 The augmentation method

🔧 Why augment with the identity

  • Just as you combine multiple steps in algebra (like 3⁻¹ × 2⁻¹ = 6⁻¹) to undo a single number, you combine multiple EROs to undo a matrix.
  • The identity matrix I acts as a "recorder": as you perform EROs on M, the same operations applied to I accumulate into the inverse.
  • No need to write out each ERO as a separate matrix or system of equations—just perform the row operations on the augmented matrix (M | I).

📝 Step-by-step process

  1. Start with (M | I): the original matrix M on the left, the identity matrix I on the right.
  2. Apply elementary row operations to reduce the left side to the identity matrix.
  3. As the left side changes from M to I, the right side changes from I to M⁻¹.
  4. The final form is (I | M⁻¹).

How to find M⁻¹: (M | I) ∼ (I | M⁻¹)

🧪 Example walkthrough

The excerpt shows a 3×3 matrix being reduced:

  • Start: (M | I) with M having entries like [0,1,1; 2,0,0; 0,0,1] and I as [1,0,0; 0,1,0; 0,0,1].
  • After row swaps and scaling: the left side becomes the identity.
  • The right side becomes the matrix [0, 1/2, 0; 1, 0, -1; 0, 0, 1], which is M⁻¹.

Don't confuse: The process is not about solving a system with a single right-hand side column; you're augmenting with the entire identity matrix to capture all the operations at once.
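
A numpy walk-through of this example, writing out by hand the three row operations the excerpt alludes to (swap, scale, row addition):

```python
import numpy as np

M = np.array([[0., 1., 1.],
              [2., 0., 0.],
              [0., 0., 1.]])
aug = np.hstack([M, np.eye(3)])   # the augmented matrix (M | I)

aug[[0, 1]] = aug[[1, 0]]         # row swap:      R1 <-> R2
aug[0] = aug[0] / 2               # scaling:       R1 := (1/2) R1
aug[1] = aug[1] - aug[2]          # row addition:  R2 := R2 - R3

M_inv = aug[:, 3:]                # left block is now I, right block is the inverse
print(M_inv)                      # rows: [0, 0.5, 0], [1, 0, -1], [0, 0, 1]
assert np.allclose(M @ M_inv, np.eye(3))
assert np.allclose(M_inv @ M, np.eye(3))
```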

🔄 Understanding the inverse

🔄 What the inverse does

Inverse of M (denoted M⁻¹): A matrix N such that MN = I.

  • When you multiply M by its inverse M⁻¹, you get the identity matrix.
  • Conversely, M is the inverse of M⁻¹, so M = (M⁻¹)⁻¹.
  • The excerpt shows that order doesn't matter for invertible matrices: M⁻¹M = I and MM⁻¹ = I both hold.

✅ Checking the inverse

Example from the excerpt:

  • Multiply the original matrix M by the computed inverse.
  • Both orders of multiplication yield the identity matrix:
    • M⁻¹ × M = I
    • M × M⁻¹ = I
  • This confirms that the computed matrix is indeed the inverse.

Example: If M is [0,1,1; 2,0,0; 0,0,1] and the computed inverse is [0, 1/2, 0; 1, 0, -1; 0, 0, 1], multiplying them in either order produces [1,0,0; 0,1,0; 0,0,1].

🧱 Invertibility and structure

🧱 When a matrix is invertible

Invertible matrix: A matrix M whose RREF is the identity matrix.

  • Key condition: You can only find M⁻¹ if M reduces to the identity matrix through EROs.
  • If the RREF of M is not the identity (e.g., has a row of zeros), M is not invertible.
  • The excerpt emphasizes: "This is only true if the RREF of M is the identity matrix."

🏗️ Matrices as products of EROs

  • The process (M | I) ∼ (I | M⁻¹) can be written symbolically as:
    • Apply EROs E₁, E₂, ... in sequence: (E₂E₁M | E₂E₁) ∼ ... ∼ (I | ...E₂E₁).
    • The product ...E₂E₁ equals M⁻¹, because (...E₂E₁)M = I.
  • Inverse relationship: If M⁻¹ = ...E₂E₁, then M = E₁⁻¹E₂⁻¹... (each ERO has an inverse).
  • Verification: M⁻¹M = (⋯E₂E₁)(E₁⁻¹E₂⁻¹⋯) = ⋯E₂(E₁E₁⁻¹)E₂⁻¹⋯ = ⋯E₂E₂⁻¹⋯ = I, with each adjacent pair EᵢEᵢ⁻¹ collapsing to the identity.

🧬 Fundamental building blocks

  • The excerpt draws an analogy to fundamental theorems in mathematics:
    • Fundamental theorem of arithmetic: integers factor into primes.
    • Fundamental theorem of algebra: polynomials factor into first-order (complex) polynomials.
    • Invertible matrices: can be expressed as products of elementary row operation matrices.
  • EROs are the "atoms" of invertible matrices—every invertible matrix is built from them.

🔢 The three elementary matrices

🔢 Types of ERO matrices

The excerpt introduces three kinds of elementary matrices, each corresponding to a type of row operation. All are close to the identity matrix:

Type | Description | Form
--- | --- | ---
Row Swap | Swaps two rows | Identity matrix with two rows swapped
Scalar Multiplication | Multiplies one row by a nonzero scalar | Identity matrix with one diagonal entry not equal to 1
Row Sum | Adds a multiple of one row to another | Identity matrix with one off-diagonal entry not equal to 0

🔍 Why they're close to the identity

  • Each elementary matrix differs from the identity matrix by a small, specific change.
  • This makes them easy to construct and recognize.
  • The excerpt notes that concrete examples and applications follow from understanding these three types.

Don't confuse: An elementary matrix is not the same as the ERO itself; it is the matrix representation of the operation that you can multiply with another matrix to perform the operation.

17

The Three Elementary Matrices

2.3.3 The Three Elementary Matrices

🧭 Overview

🧠 One-sentence thesis

Elementary row operations can be represented as matrices that are simple modifications of the identity matrix, and every invertible matrix can be expressed as a product of these elementary matrices.

📌 Key points (3–5)

  • What elementary matrices are: matrices corresponding to the three types of elementary row operations (row swap, scalar multiplication, row sum), each formed by performing that operation on the identity matrix.
  • How to construct them: row swap matrices swap two rows of the identity; scalar multiplication matrices change one diagonal entry; row sum matrices add one off-diagonal entry.
  • Core factorization principle: any invertible matrix can be written as a product of elementary matrices, similar to how integers factor into primes or polynomials factor into linear terms.
  • Common confusion: the order matters—when expressing M as a product of inverse ERO matrices, the order is reversed from the elimination steps.
  • Practical applications: LU, LDU, and PLDU factorizations break elimination into blocks of different ERO types for computational efficiency.

🔑 Invertibility and elementary operations

🔑 What makes a matrix invertible

Definition: A matrix M is invertible if its RREF is an identity matrix.

  • Invertibility means you can "undo" the matrix using elementary row operations.
  • The process: perform EROs to bring M to the identity matrix I.
  • Notation: (M | I) ~ (E₁M | E₁) ~ (E₂E₁M | E₂E₁) ~ ⋯ ~ (I | ⋯E₂E₁).
  • The result on the right side is the inverse: M⁻¹ = ⋯E₂E₁.

🔄 How to find the inverse

Procedure: (M | I) ~ (I | M⁻¹)

  • Augment M with the identity matrix.
  • Apply EROs to transform M into I.
  • The same operations transform I into M⁻¹.
  • This works because each ERO has an inverse: M = E₁⁻¹E₂⁻¹⋯ and M⁻¹ = ⋯E₂E₁.

🧱 The fundamental building-block principle

  • If M is invertible, then M can be expressed as the product of EROs.
  • The same is true for its inverse M⁻¹.
  • Analogy: This resembles the fundamental theorem of arithmetic (integers as products of primes) or the fundamental theorem of algebra (polynomials as products of first-order factors).
  • EROs are the building blocks of invertible matrices.

🎯 The three types of elementary matrices

🔀 Row swap matrices

Form: Identity matrix with two rows swapped.

  • Example: To swap the 2nd and 4th rows, swap those rows in the identity matrix.
  • The resulting matrix has 1s in positions (1,1), (2,4), (3,3), (4,2), (5,5) instead of the usual diagonal.
  • Key property: Applying this matrix to another matrix performs the same row swap.

✖️ Scalar multiplication matrices

Form: Identity matrix with one diagonal entry not equal to 1.

  • Example: To replace the 3rd row with 7 times the 3rd row, put 7 in the (3,3) position instead of 1.
  • All other diagonal entries remain 1; all off-diagonal entries remain 0.
  • Key property: Changes the scale of one row only.

➕ Row sum matrices

Form: Identity matrix with one off-diagonal entry not equal to 0.

  • Example: To replace the 4th row with (4th row + 9 times the 2nd row), put 9 in the (4,2) position.
  • All diagonal entries remain 1; only one off-diagonal entry is non-zero.
  • Key property: Adds a multiple of one row to another row.
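
A numpy sketch that builds 3×3 analogues of these three types by editing a copy of the identity, applies each to a sample matrix, and checks that each inverse is an elementary matrix of the same type (the sample values are mine):

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)   # a sample matrix to act on

E_swap = np.eye(3)
E_swap[[0, 1]] = E_swap[[1, 0]]    # identity with rows 1 and 2 exchanged
E_scale = np.eye(3)
E_scale[2, 2] = 7.0                # identity with 7 in the (3,3) position
E_sum = np.eye(3)
E_sum[2, 0] = 9.0                  # identity with 9 in the (3,1) position

print(E_swap @ A)    # rows 1 and 2 of A exchanged
print(E_scale @ A)   # row 3 of A multiplied by 7
print(E_sum @ A)     # row 3 of A replaced by (row 3 + 9 * row 1)

assert np.allclose(E_swap @ E_swap, np.eye(3))         # swapping twice undoes the swap
assert np.isclose(np.linalg.inv(E_scale)[2, 2], 1/7)   # inverse scales row 3 by 1/7
assert np.isclose(np.linalg.inv(E_sum)[2, 0], -9.0)    # inverse adds -9 * (row 1) to row 3
```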

🔢 Working with elementary matrix products

🔢 Expressing a matrix as a product of EROs

Process: Keep track of EROs used during elimination to RREF.

  • Example from the excerpt: M goes through three EROs (E₁, E₂, E₃) to reach I.
  • The elimination sequence: M ~ E₁M ~ E₂E₁M ~ E₃E₂E₁M = I.
  • This means E₃E₂E₁M = I, so M = E₁⁻¹E₂⁻¹E₃⁻¹.
  • Order reversal: The product of inverse ERO matrices is in reverse order from the elimination steps.

🔄 Finding inverse ERO matrices

Each type of ERO has a straightforward inverse:

ERO type | How to invert
--- | ---
Row swap | Same matrix (swapping twice returns to original)
Scalar multiplication by c | Scalar multiplication by 1/c
Add c times row i to row j | Add -c times row i to row j

  • Example from excerpt: E₂ multiplies row 1 by 1/2, so E₂⁻¹ multiplies row 1 by 2.
  • Example: E₃ adds -1 times row 2 to row 3, so E₃⁻¹ adds +1 times row 2 to row 3.

✅ Verification by multiplication

Symbolic verification: M⁻¹M = ⋯E₂E₁E₁⁻¹E₂⁻¹⋯ = ⋯E₂E₂⁻¹⋯ = ⋯ = I.

  • Each ERO cancels with its inverse in sequence.
  • The excerpt shows this step-by-step: adjacent inverse pairs collapse to I.
  • Example: Multiplying E₁⁻¹E₂⁻¹E₃⁻¹ in the excerpt recovers the original matrix M.

📦 Matrix factorizations for computation

🔺 LU factorization

Goal: Stop elimination halfway—eliminate only below the diagonal.

  • Upper triangular (U): Result after eliminating entries below the diagonal.
  • Lower triangular (L): Product of inverse ERO matrices that performed the elimination.
  • The factorization: M = LU = (E₁⁻¹E₂⁻¹E₃⁻¹)U.
  • Why it matters: Frequently used in large computations in sciences and engineering.

Example structure from the excerpt:

  • Apply EROs E₁, E₂, E₃ to eliminate below diagonal, reaching U.
  • Then M = E₁⁻¹E₂⁻¹E₃⁻¹U.
  • L is lower triangular (has non-zero entries only on and below diagonal).
  • U is upper triangular (has non-zero entries only on and above diagonal).
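
A short numpy sketch of this idea (a generic illustration assuming no row swaps are needed, i.e., every pivot encountered is nonzero): eliminate below the diagonal and record in L the inverse of each elimination step.

```python
import numpy as np

def lu_no_pivot(M):
    """Factor M = L U by eliminating below the diagonal (assumes nonzero pivots)."""
    n = M.shape[0]
    U = M.astype(float)
    L = np.eye(n)
    for col in range(n - 1):
        for row in range(col + 1, n):
            factor = U[row, col] / U[col, col]
            U[row] -= factor * U[col]   # the ERO: subtract factor * (pivot row)
            L[row, col] = factor        # its inverse: add factor * (pivot row) back
    return L, U

M = np.array([[2., 0., -3.],
              [4., 1., -5.],
              [-2., 1., 6.]])           # illustrative 3x3, not the excerpt's 4x4
L, U = lu_no_pivot(M)
assert np.allclose(L @ U, M)
assert np.allclose(L, np.tril(L)) and np.allclose(U, np.triu(U))
```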

🔷 LDU factorization

Goal: Further separate the diagonal scaling from the elimination.

  • L: Product of inverse EROs that eliminate below the diagonal (lower triangular).
  • D: Product of inverse EROs that set diagonal elements to 1 (diagonal matrix).
  • U: Product of inverse EROs that eliminate above the diagonal (upper triangular with 1s on diagonal).
  • The factorization: M = LDU.

Process breakdown:

  1. First, eliminate below diagonal → intermediate upper triangular form.
  2. Then, scale rows so diagonal entries become 1 → EROs E₄, E₅, E₆.
  3. Rearrange: M = (E₁⁻¹E₂⁻¹E₃⁻¹)(E₄⁻¹E₅⁻¹E₆⁻¹)U.
  4. Name the blocks: L (lower), D (diagonal), U (upper with 1s on diagonal).

🧩 Understanding the factorization blocks

Each block corresponds to a type of ERO:

Block | ERO type | Matrix form
--- | --- | ---
L | Row addition below diagonal | Lower triangular
D | Row multiplication (scaling) | Diagonal
U | Row addition above diagonal | Upper triangular with 1s on diagonal

  • Don't confuse: The U in LU factorization is different from the U in LDU factorization—the LDU version has 1s on the diagonal because scaling has been separated into D.
  • The excerpt mentions PLDU factorization as an extension (P for permutation/row swaps), though details are not fully covered in this section.
18

LU, LDU, and PLDU Factorizations

2.3.4 LU, LDU, and PLDU Factorizations

🧭 Overview

🧠 One-sentence thesis

Matrix factorizations (LU, LDU, and PLDU) decompose a matrix into products of simpler matrices that encode different types of elementary row operations, enabling efficient solution of linear systems.

📌 Key points (3–5)

  • What factorization achieves: A matrix M can be written as products of matrices encoding EROs (elementary row operations)—L for elimination below the diagonal, D for scaling the diagonal, U for elimination above the diagonal, and P for row exchanges.
  • LU vs LDU structure: LU splits into lower-triangular (L) and upper-triangular (U) factors; LDU further separates out a diagonal matrix (D) from U.
  • When row exchange is needed: If bringing M to RREF requires row exchanges, the factorization becomes PLDU (or LDPU), where P encodes the row-exchange operations.
  • Common confusion: The inverses of ERO matrices appear in the factorization, not the ERO matrices themselves—e.g., if E eliminates, then E⁻¹ appears in the factorization.
  • Why it matters: Factorizations organize the elimination process systematically and allow solving multiple systems with the same matrix efficiently.

🧩 Structure of LDU factorization

🧩 What each factor represents

The excerpt shows that a matrix M can be written as M = LDU, where:

  • L (lower triangular): Product of inverses of EROs that eliminate below the diagonal by row addition. L has ones on the diagonal and nonzero entries only below the diagonal.
  • D (diagonal): Product of inverses of EROs that scale diagonal elements to 1 by row multiplication. D has nonzero entries only on the diagonal.
  • U (upper triangular): Product of inverses of EROs that eliminate above the diagonal by row addition. U has ones on the diagonal (in the LDU form) and nonzero entries only above the diagonal.

LDU factorization: A factorization of a matrix into blocks of EROs of various types—L is the product of the inverses of EROs which eliminate below the diagonal by row addition, D the product of inverses of EROs which set the diagonal elements to 1 by row multiplication, and U is the product of inverses of EROs which eliminate above the diagonal by row addition.

🔄 How the factorization is built

The excerpt shows the equation U = E₆E₅E₄E₃E₂E₁M can be rearranged as:

M = (E₁⁻¹E₂⁻¹E₃⁻¹)(E₄⁻¹E₅⁻¹E₆⁻¹)U

  • The first group (E₁⁻¹E₂⁻¹E₃⁻¹) becomes L.
  • The second group (E₄⁻¹E₅⁻¹E₆⁻¹) becomes D.
  • The final result of elimination is U.

Don't confuse: The factorization uses E⁻¹ (the inverse of the ERO matrix), not E itself. If you apply ERO E to eliminate, the factorization records E⁻¹.

📐 Example structure

The excerpt gives a concrete 4×4 example:

Factor | Structure | Role
--- | --- | ---
L | Lower triangular with 1s on diagonal | Encodes elimination below diagonal
D | Diagonal matrix | Encodes scaling of pivots
U | Upper triangular with 1s on diagonal | Encodes elimination above diagonal

Example: The matrix with entries (2, 0, -3, 1) in the first row factors as L (with entries like -2, -1, 1 below the diagonal) times D (with diagonal 2, 1, 3, -3) times U (with entries like -3/2, 1/2, 2, 4/3 above the diagonal).

🔀 When row exchange is needed: PLDU factorization

🔀 The missing operation

The excerpt notes:

You may notice that one of the three kinds of row operation is missing from this story. Row exchange may be necessary to obtain RREF.

  • So far, the chapter assumed M can be brought to identity using only row multiplication and row addition.
  • If row exchange is necessary, the factorization becomes LDPU, where P is the product of inverses of EROs that perform row exchange.

🔀 LDPU structure

The excerpt shows Example 31 where the original matrix has a zero in the top-left position, requiring a row swap:

  • First, a permutation P swaps rows to avoid the zero.
  • Then the standard LDU factorization proceeds on the swapped matrix.
  • The final form is M = PLDU (or equivalently, P⁻¹M = LDU).

Don't confuse: The position of P can vary depending on convention. The excerpt writes both "LDPU" and shows P⁻¹M = LDU, meaning P appears on the left when you solve for M.

Example: A matrix starting with (0, 1, 2, 2) in the first row needs a row swap with the second row (2, 0, -3, 1) before elimination can proceed. The permutation matrix P encodes this swap.

🔗 Relationship to solving systems

🔗 Why factorizations help

Although the excerpt does not explicitly state applications, it places factorizations in the context of "Systems of Linear Equations" (chapter title) and mentions solving "multiple matrix equations with the same matrix" in the review problems.

  • Once you have M = LDU, solving Mx = b can be done in stages: solve Ly = b, then Dz = y, then Ux = z.
  • Each stage involves a triangular or diagonal system, which is easier than solving the original system directly.
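
A sketch of the staged solve using scipy (my own example matrix; scipy.linalg.lu returns M = P L U with the diagonal folded into U, so D is split out by hand to match the excerpt's three stages):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

M = np.array([[2., 0., -3.],
              [4., 1., -5.],
              [-2., 1., 6.]])
b = np.array([1., 2., 3.])

P, L, U = lu(M)                     # M = P @ L @ U, with L unit lower triangular
d = np.diag(U)                      # the diagonal factor D
U1 = U / d[:, None]                 # unit-diagonal upper factor, so U = D @ U1

y = solve_triangular(L, P.T @ b, lower=True)   # stage 1: L y = P^T b
z = y / d                                      # stage 2: D z = y
x = solve_triangular(U1, z)                    # stage 3: U1 x = z
assert np.allclose(M @ x, b)
```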

🔗 Connection to EROs

The excerpt emphasizes that factorizations organize the ERO process:

  • Each factor records a specific type of ERO (or its inverse).
  • This systematic organization makes the elimination process transparent and reusable.

Don't confuse: The factorization is not a different method from Gaussian elimination—it is a way of recording and organizing the elimination steps you already perform.

19

Review Problems: Matrix Factorization and Elementary Row Operations

2.4 Review Problems

🧭 Overview

🧠 One-sentence thesis

Matrix factorization (LU, LDU, LDPU) expresses a matrix as a product of simpler matrices corresponding to different types of elementary row operations, enabling systematic solution of linear systems.

📌 Key points (3–5)

  • LDU factorization structure: L captures row additions below the diagonal, D captures diagonal scaling, and U captures row additions above the diagonal—each built from inverses of elementary row operation matrices.
  • When row exchange is needed: if a matrix cannot reach identity through only row addition and multiplication, a permutation matrix P is required, yielding LDPU factorization.
  • Common confusion: the factorization matrices (L, D, U, P) are products of inverses of the elementary row operation matrices used during elimination, not the ERO matrices themselves.
  • Practical advantage: once factored, multiple systems with the same coefficient matrix can be solved efficiently by reusing the factorization.
  • Solution set structure: linear systems Ax = b have exactly one solution, no solutions, or infinitely many solutions—never exactly two or three solutions.

🔢 Matrix factorization types

🔢 LDU factorization components

LDU factorization: M = LDU, where L is lower triangular (from eliminating below the diagonal), D is diagonal (from scaling diagonal entries to 1), and U is upper triangular (from eliminating above the diagonal).

  • Each factor corresponds to a specific type of elementary row operation (ERO):
    • L: product of inverses of EROs that add multiples of one row to rows below it
    • D: product of inverses of EROs that multiply rows by constants to set diagonal entries to 1
    • U: product of inverses of EROs that add multiples of one row to rows above it
  • The excerpt shows that L has 1s on the diagonal and non-zero entries only below the diagonal.
  • D is diagonal with the pivot values on the diagonal.
  • U has 1s on the diagonal and non-zero entries only above the diagonal.

Example: The excerpt demonstrates that a 4×4 matrix can be written as M = LDU where L contains the elimination steps below the diagonal, D contains the scaling factors, and U contains the elimination steps above the diagonal.

🔄 When row exchange is necessary (LDPU)

  • The excerpt notes that so far, examples assumed M could reach identity using only row multiplication and row addition.
  • When row exchange is required: the factorization becomes M = LDPU, where P is the product of inverses of row exchange EROs.
  • The excerpt's Example 31 shows a matrix starting with 0 in the top-left position, requiring a row swap before elimination can proceed.

Don't confuse: P is not just any permutation—it specifically captures the row exchanges needed during the elimination process, applied in the order they were needed.

🔨 Building factorizations step-by-step

🔨 The inverse relationship

  • The factorization equation U = E₆E₅E₄E₃E₂E₁M can be rearranged as M = (E₁⁻¹E₂⁻¹E₃⁻¹)(E₄⁻¹E₅⁻¹E₆⁻¹)U.
  • Each Eᵢ is an elementary row operation matrix applied during forward elimination.
  • The factorization uses the inverses E⁻¹ᵢ of these matrices.
  • The excerpt groups these inverses by type: first three for L (below-diagonal elimination), next three for D (diagonal scaling), remaining for U (above-diagonal elimination).

🧮 Example structure from the excerpt

The excerpt shows:

  • E₁, E₂, E₃ eliminate below the diagonal → their inverses form L
  • E₄, E₅, E₆ scale the diagonal → their inverses form D
  • Any remaining operations eliminate above the diagonal → their inverses form U

Example: The excerpt displays explicit 4×4 matrices for E₄, E₅, E₆ and their inverses E₄⁻¹, E₅⁻¹, E₆⁻¹, showing how diagonal scaling operations (like multiplying row 1 by 1/2, row 3 by 1/3, row 4 by -1/3) are reversed in the factorization.

📝 Review problem themes

📝 Gaussian elimination with explicit ERO notation

  • Problem 1 asks students to write "the full system of equations describing the new rows in terms of the old rows above each equivalence symbol."
  • This reinforces that each elimination step is a linear combination of previous rows.
  • The excerpt emphasizes making the relationship between old and new rows explicit at every step.

📝 Solving systems using ERO matrices

  • Problem 2: "Solve the vector equation by applying ERO matrices to each side of the equation to perform elimination. Show each matrix explicitly."
  • This approach treats the system as a matrix equation and applies the same ERO matrices to both sides.
  • Problem 3: Find the inverse through (M | I) ~ (I | M⁻¹) and apply M⁻¹ to both sides.

Comparison of approaches:

Method | What you do | When it's efficient
--- | --- | ---
ERO matrices explicitly | Apply E₁, E₂, ... to both sides | Understanding the mechanics
Find M⁻¹ first | Compute (M ∣ I) ~ (I ∣ M⁻¹), then solve | Multiple systems with same M

📝 Simultaneous solution of multiple systems

  • Problem 5a: "Solve both systems by performing elimination on just one augmented matrix."
  • When multiple systems share the same coefficient matrix, you can augment with multiple right-hand-side vectors and solve all at once.
  • Problem 5b asks for interpretation: "Give an interpretation of the columns of M⁻¹ in (M | I) ~ (I | M⁻¹) in terms of solutions to certain systems of linear equations."
  • This connects the columns of M⁻¹ to solutions of systems where the right-hand side is each column of the identity matrix.
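
A small numpy illustration of both points (matrix and right-hand sides are mine): stacking several right-hand sides as columns solves all the systems at once, and when those columns are the columns of I, the solutions are exactly the columns of M⁻¹.

```python
import numpy as np

M = np.array([[1., 1.],
              [1., 2.]])
B = np.array([[5., 1., 0.],     # three right-hand sides, stacked as columns
              [8., 0., 1.]])

X = np.linalg.solve(M, B)       # solves M x = b for every column b in one call
print(X[:, 0])                  # [2. 3.]  ->  solution of the first system

# The last two right-hand sides are e1 and e2, so those solution columns form the inverse.
assert np.allclose(X[:, 1:], np.linalg.inv(M))
```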

⚠️ Common mistake warning

Problem 6 asks: "How can you convince your fellow students to never make this mistake?"

The excerpt shows an error where three row operations are applied simultaneously:

  • R'₁ = R₁ + R₂
  • R'₂ = R₁ - R₂
  • R'₃ = R₁ + 2R₂

The problem: R'₃ uses R₂, but R₂ has already been changed to R'₂. When performing multiple row operations, you must either:

  • Apply them one at a time, or
  • Use the original rows on the right-hand side for all operations in a simultaneous step
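
A tiny numeric illustration of the pitfall (numbers mine): reusing the already-updated R₂ in the third operation gives a different row than using the original R₂.

```python
import numpy as np

R1, R2 = np.array([1., 2.]), np.array([3., 4.])

# Correct simultaneous step: every right-hand side uses the ORIGINAL rows.
correct_R3 = R1 + 2 * R2        # -> [7., 10.]

# The mistake: R2 is overwritten first and then reused.
R2_new = R1 - R2                # -> [-2., -2.]
wrong_R3 = R1 + 2 * R2_new      # -> [-3., -2.], not the row that was intended

print(correct_R3, wrong_R3)
```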

🔍 Uniqueness and problem creation

  • Problem 7: "Is LU factorization of a matrix unique? Justify your answer."
  • The excerpt does not provide the answer but prompts investigation.
  • Problem ∞ (the "infinity" problem): advice on creating practice problems by working backward—start with a simple RREF or factored form, then apply operations to make it look complicated.

Why this works: Starting with a random matrix often produces messy fractions; starting with the answer and working backward ensures clean numbers.

🌐 Solution set geometry

🌐 The trichotomy for linear systems

For linear equations Ax = b with A a linear operator and real scalars, there are exactly three possibilities: one solution, no solutions, or infinitely many solutions.

  • Never exactly two, three, or any other finite number greater than one.
  • The excerpt contrasts this with nonlinear equations like x(x - 1) = 0, which can have exactly two solutions.

🌐 1×1 case (number line)

The excerpt analyzes three scenarios:

Equation | Number of solutions | Solution set geometry
--- | --- | ---
6x = 12 | One (x = 2) | A single point
0x = 12 | None | Empty set
0x = 0 | Infinitely many | The entire number line ℝ

  • When the operator (the coefficient) is invertible (6 ≠ 0), there is exactly one solution.
  • When the operator is not invertible (coefficient is 0), there are either no solutions or infinitely many.

🌐 2×2 case (plane)

The excerpt extends the pattern to 2×2 matrices:

Case 1 (invertible matrix):

  • The matrix is invertible (both diagonal entries non-zero in the example).
  • Exactly one solution: a single point in the plane.

Case 2a (no solution):

  • Matrix is not invertible.
  • The system is inconsistent (second equation 0 = 1 is impossible).
  • Solution set is empty.

Case 2bi (line solution set):

  • Matrix is not invertible.
  • Solution set: {(4, 0) + y(-3, 1) : y ∈ ℝ}.
  • This describes a line in the plane: a particular solution (4, 0) plus all multiples of a direction vector (-3, 1).

Case 2bii (entire plane):

  • The zero matrix: all equations are 0 = 0.
  • Solution set: {(x, y) : x, y ∈ ℝ}, the entire plane.

Don't confuse: "Infinitely many solutions" can mean different geometric objects—a line in 2D, a plane in 3D, etc.—depending on the dimension and the rank of the matrix.

🌐 Higher dimensions

  • The excerpt begins to discuss r equations in k variables.
  • For three variables, each equation represents a plane in 3D space.
  • The solution set is where all these planes intersect: a point, a line, a plane, empty, etc.

Pattern: As dimension increases, the geometric objects representing solution sets become higher-dimensional (points, lines, planes, hyperplanes), but the trichotomy (one, none, or infinitely many) always holds.

20

Solution Sets for Systems of Linear Equations

2.5 Solution Sets for Systems of Linear Equations

🧭 Overview

🧠 One-sentence thesis

Linear systems Ax = b have exactly one of three outcomes—one solution, no solutions, or infinitely many solutions—and the geometry of the solution set is determined by the number of free (non-pivot) variables.

📌 Key points (3–5)

  • Trichotomy property: Linear systems with real scalars always have either exactly one solution, no solutions, or infinitely many solutions (never two or three solutions like polynomial equations).
  • Invertibility determines outcome: When the linear operator is invertible, there is exactly one solution; when not invertible, there may be no solutions or infinitely many.
  • Free variables control geometry: The number of free (non-pivot) variables determines the dimension of the solution set—zero free variables gives a point, one gives a line, two gives a plane, etc.
  • Common confusion: For k unknowns, there are k + 2 possible outcomes (not k): no solutions, plus k + 1 cases corresponding to 0, 1, 2, ..., k free parameters.
  • Hyperplanes generalize planes: Solution sets with free parameters are called hyperplanes, which behave like planes in three-dimensional space but exist in higher dimensions.

🔢 The three-outcome property

🔢 Why linear systems differ from polynomial equations

  • Polynomial equations like x(x - 1) = 0 can have multiple discrete solutions (0 and 1 in this case).
  • Linear equations Ax = b with real scalars never have "two solutions" or "three solutions."

Linear system trichotomy: If A is a linear operator and b is known, then Ax = b has either (1) one solution, (2) no solutions, or (3) infinitely many solutions.

  • This is a fundamental property distinguishing linear from nonlinear problems.

🔑 Role of invertibility

  • When the linear operator A is invertible: exactly one solution exists.
  • When A is not invertible: either no solutions or infinitely many solutions.
  • Example: The 1×1 case shows this clearly:
    • 6x = 12 (invertible) → one solution: x = 2
    • 0x = 12 (not invertible) → no solution
    • 0x = 0 (not invertible) → infinitely many solutions (all of R)

📐 Geometric interpretation of solution sets

📐 1×1 matrices: points and lines

| Case | Equation | Invertible? | Solution set | Geometry |
| --- | --- | --- | --- | --- |
| 1 | 6x = 12 | Yes | x = 2 | A point on the number line |
| 2a | 0x = 12 | No | None | Empty |
| 2b | 0x = 0 | No | All of R | The whole number line |

📐 2×2 matrices: points, lines, and planes

The excerpt gives four examples with 2×2 matrices:

Case 1 (invertible):

  • Matrix with 6 and 2 on diagonal → one solution: (2, 3)
  • Geometry: a single point in the plane

Case 2a (no solutions):

  • Matrix with second row all zeros, but right-hand side has 1 in second position
  • Geometry: empty (contradictory equations)

Case 2bi (line):

  • Same matrix but right-hand side has 0 in second position
  • Solution set: {(4, 0) + y(-3, 1) : y ∈ R}
  • Geometry: a line in the plane (one free parameter y)

Case 2bii (plane):

  • Zero matrix with zero right-hand side
  • Solution set: {(x, y) : x, y ∈ R}
  • Geometry: the entire plane (two free parameters)

📐 Three variables: planes intersecting

For r equations in three variables, each equation represents a plane in three-dimensional space.

  • Solutions are the common intersection of all these planes.

Five possible outcomes:

  1. Unique solution: The planes intersect at exactly one point.
  2. No solutions (2a): Some equations are contradictory; planes do not share a common intersection.
  3. Line (2bi): The planes intersect along a common line (one free parameter).
  4. Plane (2bii): Either one equation only, or all equations coincide geometrically (two free parameters).
  5. All of R³ (2biii): No constraints; any point in three-dimensional space is a solution (three free parameters).

🧮 Counting outcomes and free parameters

🧮 The k + 2 formula

For systems with k unknowns:

  • There are k + 2 possible outcomes total.
  • This breaks down as:
    • 1 outcome with no solutions
    • k + 1 outcomes corresponding to 0, 1, 2, ..., k free parameters

🧮 Free parameters determine dimension

  • 0 free parameters → unique solution (a point)
  • 1 free parameter → line
  • 2 free parameters → plane
  • k free parameters → all of R^k

Don't confuse: The number of free parameters is not the same as the number of variables; it depends on how many columns lack pivots after row reduction.

🏗️ Hyperplanes as generalized planes

🏗️ What hyperplanes are

Hyperplanes: Generalizations of planes that behave like planes in R³ in many ways.

  • Solution sets with free parameters are called hyperplanes.
  • They extend the concept of planes to higher dimensions.
  • Example: In R³, a plane is a two-parameter family of points; in R⁴, a hyperplane might be a three-parameter family.

🏗️ How to identify the geometry

The excerpt states that non-pivot variables determine the geometry of the solution set.

Process:

  1. Reduce the system to reduced row echelon form.
  2. Identify pivot columns (columns containing a pivot).
  3. Variables corresponding to non-pivot columns are free variables.
  4. The number of free variables = the dimension of the solution set.

Example from the excerpt:

  • Matrix in reduced row echelon form with 4 variables (x₁, x₂, x₃, x₄)
  • Pivot columns: columns 1 and 2
  • Non-pivot columns: columns 3 and 4
  • Free variables: x₃ and x₄
  • Geometry: a two-dimensional hyperplane (like a plane) in four-dimensional space

The excerpt notes: "Following the standard approach, express the pivot variables [in terms of free variables]" (text cuts off here, but the idea is to write pivot variables as functions of free variables to describe the solution set).
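
A minimal sketch of the pivot/free-variable count using sympy. The augmented matrix below is an assumption reconstructed from the worked solution that appears later in the excerpt (x₁ = 1 − x₃ + x₄, x₂ = 1 + x₃ − x₄); the excerpt's own matrix is not reproduced here.

```python
from sympy import Matrix

# Augmented matrix in reduced row echelon form, reconstructed from the worked
# solution quoted later in the excerpt; treat the exact entries as an assumption.
aug = Matrix([[1, 0,  1, -1, 1],
              [0, 1, -1,  1, 1]])

rref, pivots = aug.rref()
print(pivots)                        # (0, 1): pivots in columns 1 and 2 (0-indexed)

n_vars = 4
free = [j for j in range(n_vars) if j not in pivots]
print(free)                          # [2, 3]: x3 and x4 are the free variables
print(len(free))                     # 2 free variables -> a plane in R^4
```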

21

The Geometry of Solution Sets: Hyperplanes

2.5.1 The Geometry of Solution Sets: Hyperplanes

🧭 Overview

🧠 One-sentence thesis

Linear systems Ax = b have solution sets that are geometric objects—points, lines, planes, or hyperplanes—whose shape depends on the number of free variables, and every solution can be written as one particular solution plus any combination of homogeneous solutions.

📌 Key points (3–5)

  • Three possibilities for linear systems: exactly one solution, no solutions, or infinitely many solutions (never "finitely many but more than one").
  • Geometry determined by free variables: the number of non-pivot variables (free parameters) determines whether the solution set is a point, line, plane, or higher-dimensional hyperplane.
  • Structure of infinite solution sets: any solution can be written as x_P + μ₁x_H1 + μ₂x_H2 + ... where x_P is one particular solution and x_H are homogeneous solutions.
  • Common confusion: homogeneous solutions do NOT solve the original equation Mx = v; they solve the associated equation Mx = 0.
  • Invertibility matters: when the matrix is invertible there is exactly one solution; when not invertible, there may be no solutions or infinitely many.

🔢 The three-outcome rule for linear systems

🔢 Why linear equations behave differently

For equations Ax = b where A is a linear operator with real scalars, there are exactly three possibilities: one solution, no solutions, or infinitely many solutions.

  • This contrasts with nonlinear algebra problems (e.g., x(x - 1) = 0) which can have multiple but finite solutions.
  • The linearity of the operator A forces this trichotomy.
  • You will never see "exactly two solutions" or "exactly five solutions" for a linear system.

🔍 Small examples illustrating all three cases

1×1 case (single variable):

  • 6x = 12 → one solution: x = 2 (invertible operator)
  • 0x = 12 → no solution (not invertible, inconsistent)
  • 0x = 0 → infinitely many solutions: the entire real line R (not invertible, consistent)

2×2 case (two variables):

  • Invertible matrix → one solution (a point in the plane)
  • Non-invertible with inconsistency → no solutions
  • Non-invertible but consistent → infinitely many solutions (a line or the entire plane)

Example: the system with matrix (1 3; 0 0) and right-hand side (4; 0) has solution set {(4, 0) + y(-3, 1) : y ∈ R}, which is a line in the plane.

🌐 Geometric interpretation: hyperplanes

🌐 Three equations in three variables

Each equation in three variables represents a plane in three-dimensional space. The solution set is the common intersection of all these planes.

Five geometric possibilities:

  1. Unique solution: the planes meet at exactly one point.
  2a. No solutions: some equations are contradictory; the planes do not share a common intersection.
  2bi. Line: the planes intersect along a common line (one free parameter).
  2bii. Plane: all equations describe the same plane, or there is only one equation (two free parameters).
  2biii. All of R³: no constraints; any point works (three free parameters).

📐 General pattern for k unknowns

  • There are k + 2 possible outcomes for a system with k unknowns.
  • The outcomes correspond to 0, 1, 2, ..., k free parameters, plus the "no solutions" case.
  • These solution sets are called hyperplanes: generalizations of planes that behave like planes in R³.

Don't confuse: The number of free parameters determines the dimension of the solution set, not the number of equations or variables alone.

🧩 Particular solution + homogeneous solutions

🧩 How free variables determine geometry

Non-pivot variables (free variables) are those corresponding to columns without a pivot in reduced row echelon form.

  • The number of free variables determines the geometric shape of the solution set.
  • Pivot variables are expressed in terms of non-pivot variables.
  • Non-pivot variables can take any value; they are the parameters μ₁, μ₂, etc.

📝 Example breakdown

Consider the system:

  • Matrix in reduced row echelon form with columns 1 and 2 having pivots; columns 3 and 4 are non-pivot.
  • Variables x₃ and x₄ are free.
  • Express pivot variables in terms of free ones:
    • x₁ = 1 - x₃ + x₄
    • x₂ = 1 + x₃ - x₄
    • x₃ = x₃
    • x₄ = x₄

This can be rewritten as:

  • (x₁, x₂, x₃, x₄) = (1, 1, 0, 0) + x₃(-1, 1, 1, 0) + x₄(1, -1, 0, 1)

Or in set notation:

  • S = {(1, 1, 0, 0) + μ₁(-1, 1, 1, 0) + μ₂(1, -1, 0, 1) : μ₁, μ₂ ∈ R}

This solution set forms a plane (two free parameters).

🔑 Terminology: particular vs homogeneous

  • x_P = (1, 1, 0, 0) is a particular solution: it solves the original equation Mx = v.
  • x_H1 = (-1, 1, 1, 0) and x_H2 = (1, -1, 0, 1) are homogeneous solutions: they solve the associated equation Mx = 0.

The general solution set is written:

  • S = {x_P + μ₁x_H1 + μ₂x_H2 : μ₁, μ₂ ∈ R}

🧮 Why this structure works: linearity

🧮 Verifying the solution structure

Because matrices are linear operators, we can use linearity to verify the solution structure:

  • M(x_P + μ₁x_H1 + μ₂x_H2) = Mx_P + μ₁Mx_H1 + μ₂Mx_H2

Setting μ₁ = μ₂ = 0:

  • We get Mx_P = v, confirming x_P is a particular solution.

Setting μ₁ = 1, μ₂ = 0 and subtracting Mx_P = v:

  • We get Mx_H1 = 0, confirming x_H1 is a homogeneous solution.

Setting μ₁ = 0, μ₂ = 1:

  • We get Mx_H2 = 0, confirming x_H2 is a homogeneous solution.
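
These three checks can be reproduced numerically. A minimal sketch in numpy; the matrix M below encodes the reduced system x₁ + x₃ − x₄ = 1, x₂ − x₃ + x₄ = 1 implied by the worked example, since the excerpt's original M and v are not reproduced here.

```python
import numpy as np

# Reduced system standing in for the excerpt's M and v (an assumption, see lead-in).
M = np.array([[1., 0.,  1., -1.],
              [0., 1., -1.,  1.]])
v = np.array([1., 1.])

xP  = np.array([ 1.,  1., 0., 0.])   # particular solution
xH1 = np.array([-1.,  1., 1., 0.])   # homogeneous solution 1
xH2 = np.array([ 1., -1., 0., 1.])   # homogeneous solution 2

print(np.allclose(M @ xP, v))        # True: M xP = v
print(np.allclose(M @ xH1, 0))       # True: M xH1 = 0
print(np.allclose(M @ xH2, 0))       # True: M xH2 = 0

# Linearity: any xP + mu1*xH1 + mu2*xH2 also satisfies M x = v.
mu1, mu2 = 2.5, -7.0
print(np.allclose(M @ (xP + mu1 * xH1 + mu2 * xH2), v))   # True
```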

⚠️ Critical distinction

Don't confuse: Homogeneous solutions x_H1 and x_H2 do NOT solve the original equation Mx = v. They solve the associated homogeneous equation Mx = 0.

  • The particular solution x_P solves Mx = v.
  • The homogeneous solutions describe the "directions" you can move from x_P while staying in the solution set.
  • Every solution is x_P plus some linear combination of homogeneous solutions.
| Type | Equation it solves | Role |
| --- | --- | --- |
| Particular solution x_P | Mx = v | One specific solution to the original problem |
| Homogeneous solutions x_H | Mx = 0 | Directions that span the solution space |
| General solution | Mx = v | x_P + all combinations of x_H solutions |
22

Particular Solution + Homogeneous Solutions

2.5.2 Particular Solution + Homogeneous Solutions

🧭 Overview

🧠 One-sentence thesis

The solution set to a linear system Ax = b always consists of one particular solution to the original equation plus all homogeneous solutions to the associated equation Ay = 0.

📌 Key points (3–5)

  • Structure of solution sets: Every solution can be written as a particular solution plus a combination of homogeneous solutions.
  • Particular solution: Any single vector x_P that satisfies Mx_P = v (the original equation).
  • Homogeneous solutions: Vectors x_H that satisfy the associated homogeneous equation My = 0 (right-hand side is zero).
  • Common confusion: Homogeneous solutions do not solve the original equation Mx = v; they solve Mx = 0, but adding them to a particular solution yields other particular solutions.
  • Free parameters determine geometry: The number of free variables (non-pivot variables) determines the shape of the solution set (point, line, plane, hyperplane).

🧩 Core structure of solution sets

🧩 The fundamental decomposition

Fundamental lesson: The solution set to Ax = b, where A is a linear operator, consists of a particular solution plus homogeneous solutions.

The excerpt states this as:

  • {Solutions} = {Particular solution + Homogeneous solutions}

How it works:

  • Start with any one solution x_P to the original equation Mx_P = v.
  • Find all solutions x_H to the homogeneous equation My = 0.
  • Every solution to the original equation can be written as x_P plus some combination of the homogeneous solutions.

Example: If the solution set is written as

  • x_P + μ₁ x_H1 + μ₂ x_H2 (where μ₁, μ₂ are any real numbers),
  • then x_P is the particular solution,
  • and x_H1, x_H2 are the homogeneous solutions.

🔍 Why this decomposition works (linearity)

The excerpt explains this using the linearity of matrix multiplication:

  • M(x_P + μ₁ x_H1 + μ₂ x_H2) = Mx_P + μ₁ Mx_H1 + μ₂ Mx_H2
  • Since Mx_P = v and Mx_H1 = 0 and Mx_H2 = 0,
  • the result is v + μ₁·0 + μ₂·0 = v.

Key insight: Adding any multiple of a homogeneous solution to the particular solution yields another particular solution.

🎯 Particular solutions

🎯 What is a particular solution

Particular solution x_P: A vector that satisfies the original equation Mx_P = v.

How to identify it:

  • In the solution set notation, it is the constant vector (the one without any free parameters μ).
  • Setting all free parameters to zero (μ₁ = μ₂ = ... = 0) gives the particular solution.

Example from the excerpt:

  • The solution set is written as (1,1,0,0) + μ₁(-1,1,1,0) + μ₂(1,-1,0,1).
  • The particular solution is x_P = (1,1,0,0).
  • Plugging this into the original matrix equation confirms it satisfies Mx_P = v.

🔄 Not the only solution

The excerpt emphasizes: "this is not the only solution."

  • The particular solution is just one example of a solution.
  • The full solution set includes infinitely many solutions (if there are free parameters).

🏠 Homogeneous solutions

🏠 What is a homogeneous solution

Homogeneous solution x_H: A vector that satisfies the associated homogeneous equation My = 0.

Key distinction:

  • Homogeneous solutions do not solve the original equation Mx = v.
  • They solve the related equation where the right-hand side is zero.

Example from the excerpt:

  • x_H1 = (-1,1,1,0) satisfies Mx_H1 = 0.
  • x_H2 = (1,-1,0,1) satisfies Mx_H2 = 0.

🔗 How homogeneous solutions relate to the full solution set

The excerpt explains:

  • Each homogeneous solution is multiplied by a free parameter (μ₁, μ₂, etc.).
  • These parameters can be any real numbers.
  • The homogeneous solutions span the "directions" in which the solution set extends.

Don't confuse:

  • Homogeneous solutions alone are not solutions to the original problem.
  • But when added to a particular solution, they generate all solutions.

🧮 Free variables and geometry

🧮 Non-pivot variables determine geometry

The excerpt states: "It is the number of free variables that determines the geometry of the solution set."

How it works:

  • After reducing to row echelon form, columns without pivots correspond to free variables.
  • Each free variable introduces one free parameter (μ₁, μ₂, etc.) in the solution set.
  • The number of free parameters determines the dimension of the solution set.
| Number of free variables | Geometry of solution set |
| --- | --- |
| 0 | Unique solution (a point) |
| 1 | Line |
| 2 | Plane |
| k | k-dimensional hyperplane |

📐 Example walkthrough

The excerpt provides a detailed example:

  • Original system: three equations in four unknowns (x₁, x₂, x₃, x₄).
  • After row reduction, x₃ and x₄ are non-pivot (free) variables.
  • Express pivot variables in terms of free variables:
    • x₁ = 1 - x₃ + x₄
    • x₂ = 1 + x₃ - x₄
    • x₃ = x₃
    • x₄ = x₄
  • Rewrite as: (x₁, x₂, x₃, x₄) = (1,1,0,0) + x₃(-1,1,1,0) + x₄(1,-1,0,1).
  • Two free variables → the solution set forms a plane.

🔢 Set notation

The preferred way to write the solution set uses set notation:

  • S = {(1,1,0,0) + μ₁(-1,1,1,0) + μ₂(1,-1,0,1) : μ₁, μ₂ ∈ R}
  • This notation makes clear that μ₁ and μ₂ can be any real numbers.

Note: The excerpt mentions that "the first two components of the second two terms come from the non-pivot columns," linking the structure of the homogeneous solutions to the original matrix.

23

Solutions and Linearity

2.5.3 Solutions and Linearity

🧭 Overview

🧠 One-sentence thesis

The solution set to a linear equation Mx = v always decomposes into a particular solution plus all homogeneous solutions, which is a fundamental structural property of linear systems.

📌 Key points (3–5)

  • Core structure: Any solution to Mx = v can be written as x_P + μ₁x_H₁ + μ₂x_H₂ + ... where x_P solves the original equation and each x_H solves the associated homogeneous equation My = 0.
  • Particular vs homogeneous: The particular solution satisfies Mx_P = v; homogeneous solutions satisfy Mx_H = 0 and do not solve the original equation.
  • Why this works: Linearity of the matrix operator M allows us to split M(x_P + μ₁x_H₁ + μ₂x_H₂) = Mx_P + μ₁Mx_H₁ + μ₂Mx_H₂ = v + 0 + 0 = v.
  • Common confusion: Homogeneous solutions alone do not solve Mx = v; they must be added to a particular solution to generate the full solution set.
  • Geometric interpretation: The solution set forms a plane (or higher-dimensional affine subspace) shifted from the origin by the particular solution.

🧩 The fundamental decomposition

🧩 Solution set structure

Fundamental lesson of linear algebra: {Solutions} = {Particular solution + Homogeneous solutions}

  • For the matrix equation Mx = v, the complete solution set is written:
    • S = {x_P + μ₁x_H₁ + μ₂x_H₂ : μ₁, μ₂ ∈ ℝ}
  • Here μ₁ and μ₂ are free parameters (real numbers) that can take any value.
  • The excerpt emphasizes this is true for any linear operator A and equation Ax = b.

🔍 Why the decomposition works

The excerpt derives this structure from the linearity property of matrices:

  1. Start with M(x_P + μ₁x_H₁ + μ₂x_H₂)
  2. Apply linearity: = Mx_P + μ₁Mx_H₁ + μ₂Mx_H₂
  3. This equals v for any choice of μ₁, μ₂ ∈ ℝ

Key insight: Because M is linear, we can distribute it across the sum and pull out scalar multiples.

🎯 Particular solutions

🎯 What makes a solution "particular"

Particular solution x_P: a specific vector that satisfies the original equation Mx_P = v.

  • To isolate the particular solution from the general form, set all free parameters to zero: μ₁ = μ₂ = 0.
  • Then M(x_P + 0·x_H₁ + 0·x_H₂) = Mx_P = v.
  • Example from the excerpt: x_P = (1, 1, 0, 0) is one particular solution, but it is not the only solution to the original equation.

🔄 Generating other particular solutions

  • Adding any multiple of a homogeneous solution to the particular solution yields another particular solution.
  • Example: If x_P solves Mx = v and x_H solves Mx_H = 0, then M(x_P + μx_H) = Mx_P + μMx_H = v + 0 = v.
  • This is why the solution set contains infinitely many solutions (when homogeneous solutions exist).

🏠 Homogeneous solutions

🏠 What makes a solution "homogeneous"

Homogeneous solution x_H: a vector that solves the associated homogeneous equation My = 0.

  • The homogeneous equation replaces the right-hand side v with the zero vector 0.
  • Homogeneous solutions do not solve the original equation Mx = v.
  • Instead, they describe the "directions" along which you can move from one particular solution to another.

🔬 Deriving homogeneous solutions

The excerpt shows how to extract homogeneous solutions from the general form:

  1. Set μ₁ = 1, μ₂ = 0 in the general solution.
  2. This gives M(x_P + x_H₁) = Mx_P + Mx_H₁ = v.
  3. Subtract Mx_P = v from both sides: Mx_H₁ = 0.
  4. Similarly, setting μ₁ = 0, μ₂ = 1 gives Mx_H₂ = 0.

Don't confuse: The homogeneous solutions x_H₁, x_H₂ are not solutions to Mx = v; they are solutions to the different equation Mx = 0.

📐 Worked example

📐 Example 33 breakdown

The excerpt revisits a matrix equation with solution set:

S = {(1, 1, 0, 0) + μ₁(-1, 1, 1, 0) + μ₂(1, -1, 0, 1) : μ₁, μ₂ ∈ ℝ}

| Component | Vector | What it satisfies | Role |
| --- | --- | --- | --- |
| x_P | (1, 1, 0, 0) | Mx_P = v | Particular solution to original equation |
| x_H₁ | (-1, 1, 1, 0) | Mx_H₁ = 0 | Homogeneous solution |
| x_H₂ | (1, -1, 0, 1) | Mx_H₂ = 0 | Homogeneous solution |

🧮 Connection to non-pivot variables

  • The excerpt notes that x₃ and x₄ are non-pivot variables (free variables).
  • The first two components of the homogeneous vectors come from the non-pivot columns of the matrix.
  • This explains why there are two homogeneous solutions: there are two free parameters corresponding to two non-pivot variables.

🌐 Geometric interpretation

  • The excerpt states "the solution set forms a plane."
  • The particular solution (1, 1, 0, 0) anchors the plane at a specific location.
  • The two homogeneous solutions (-1, 1, 1, 0) and (1, -1, 0, 1) span the plane's directions.
  • Every point on the plane is reached by starting at x_P and moving μ₁ units along x_H₁ and μ₂ units along x_H₂.
24

Review Problems

2.6 Review Problems

🧭 Overview

🧠 One-sentence thesis

The solution set to any linear equation Ax = b consists of one particular solution plus all homogeneous solutions (solutions to Ax = 0), a fundamental structure that applies whenever A is a linear operator.

📌 Key points (3–5)

  • Solution structure: Solutions = Particular solution + Homogeneous solutions; adding any multiple of a homogeneous solution to a particular solution yields another particular solution.
  • What homogeneous solutions are: solutions to the equation Mx = 0 (where the right-hand side is zero).
  • Einstein summation notation: a compact way to write sums where repeated indices are automatically summed, and dummy indices can be relabeled.
  • Common confusion: the solution set is not just one answer—it's a particular solution plus a space of homogeneous solutions; different solution methods give different descriptions of the same set.
  • Linearity matters: this particular + homogeneous structure holds for linear operators but may fail for non-linear operators.

🧩 Solution set structure

🧩 Particular plus homogeneous

Solution set structure: {Solutions} = {Particular solution + Homogeneous solutions}

  • The excerpt calls this a "fundamental lesson of linear algebra."
  • For a matrix equation Ax = b:
    • A particular solution x_P satisfies Ax_P = b.
    • A homogeneous solution x_H satisfies Ax_H = 0 (the homogeneous equation).
  • Any solution can be written as x_P + (some combination of homogeneous solutions).

🔄 How adding homogeneous solutions works

  • Adding any multiple of a homogeneous solution to the particular solution yields another particular solution.
  • Example from the excerpt: if x_P solves the original equation and x_H₁, x_H₂ solve the homogeneous equation, then x_P + μ₁x_H₁ + μ₂x_H₂ (for any real numbers μ₁, μ₂) is also a solution.
  • This explains why linear systems can have infinitely many solutions: you can scale and combine the homogeneous solutions freely.

📋 Example 33 breakdown

The excerpt gives a concrete example:

  • Solution set: S = {(1,1,0,0) + μ₁(−1,1,1,0) + μ₂(1,−1,0,1) : μ₁, μ₂ ∈ ℝ}
  • (1,1,0,0) is the particular solution (Mx_P = v).
  • (−1,1,1,0) is one homogeneous solution (Mx_H₁ = 0).
  • (1,−1,0,1) is another homogeneous solution (Mx_H₂ = 0).
  • The particular solution is "not the only solution"—the full solution set includes all combinations.

🔢 Einstein summation notation

🔢 What it is

Einstein summation notation: a shorthand where repeated indices in a product are automatically summed, allowing expressions like a₂₁x¹ + a₂₂x² + ⋯ + a₂ₖxᵏ to be written simply as a₂ⱼxʲ.

  • Invented by Albert Einstein to write sums more compactly.
  • The index j is a dummy variable: a₂ⱼxʲ ≡ a₂ᵢxⁱ (relabeling dummy indices gives the same result).
  • Important: x² does not mean "x squared"—it means the second component of the vector x; superscripts are indices, not exponents.

⚠️ Common pitfall: products of sums

  • When dealing with products of sums, you must introduce a new dummy for each term.
  • Wrong: writing the product (Σᵢ aᵢxⁱ)(Σᵢ bᵢyⁱ) as aᵢxⁱbᵢyⁱ; reusing the same index i turns it into the single sum Σᵢ aᵢxⁱbᵢyⁱ instead of the product.
  • Right: aᵢxⁱbⱼyʲ = (Σᵢ aᵢxⁱ)(Σⱼ bⱼyʲ) (each factor gets its own dummy index).
  • Don't confuse: the dummy index is just a placeholder; relabeling it doesn't change the value, but mixing dummies incorrectly changes the meaning.
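
numpy's einsum follows the same convention and makes the dummy-index pitfall concrete. A minimal sketch with made-up vectors (the numbers are illustrative, not from the excerpt):

```python
import numpy as np

a = np.array([1., 2., 3.])
x = np.array([4., 5., 6.])
b = np.array([1., 1., 2.])
y = np.array([2., 0., 1.])

# a_j x^j: the repeated index j is summed over.
print(np.einsum("j,j->", a, x))      # 1*4 + 2*5 + 3*6 = 32.0

# A product of sums needs a distinct dummy index per factor:
right = np.einsum("i,i->", a, x) * np.einsum("j,j->", b, y)   # (sum_i a_i x^i)(sum_j b_j y^j)
wrong = np.einsum("i,i,i,i->", a, x, b, y)                    # sum_i a_i x^i b_i y^i: a different quantity
print(right, wrong)                  # 128.0 44.0
```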

📝 Review problem themes

📝 Types of solution sets (Problem 1)

  • The excerpt asks for examples of augmented matrices corresponding to "five types of solution sets for systems of equations with three unknowns."
  • This refers to the geometric classification: no solution, unique solution, line of solutions, plane of solutions, etc.
  • Goal: practice recognizing how different augmented matrices lead to different solution structures.

📝 Multiple solution descriptions (Problem 2)

  • Invent a system with multiple solutions.
  • Solve it using the standard approach and a non-standard approach.
  • Key question: "Is the solution set different with different approaches?"
  • Answer (implied by the excerpt): No—the solution set is the same; only the description (choice of particular solution and basis for homogeneous solutions) differs.

📝 Matrix-vector multiplication rule (Problem 3)

  • Given a matrix M with entries aᵢⱼ and a vector x with components xʲ, propose a rule for Mx using Einstein summation notation.
  • The rule should make Mx = 0 equivalent to the system of linear equations shown.
  • Also show that this rule obeys the linearity property (i.e., M(cx + dy) = cMx + dMy for scalars c, d and vectors x, y).

📝 Standard basis vectors (Problem 4)

Standard basis vector eᵢ: a column vector with a one in the i-th row and zeroes everywhere else.

  • Using the matrix-vector multiplication rule from Problem 3, find a simple rule for Meᵢ.
  • Hint: multiplying M by eᵢ should "pick out" the i-th column of M.
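
A quick numerical check of the hint, using numpy and a made-up 2×3 matrix (not from the excerpt):

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

e2 = np.array([0, 1, 0])             # standard basis vector e_2: 1 in the 2nd row
print(M @ e2)                        # [2 5], the 2nd column of M
```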

📝 Non-linear operators (Problem 5)

  • Question: If A is a non-linear operator, can solutions to Ax = b still be written as "particular + homogeneous"?
  • The excerpt asks for examples to explore this.
  • Implication: the particular + homogeneous structure is special to linear operators; it may break down when A is non-linear.

📝 Inequalities and restricted ranges (Problem 6)

  • Find a system of equations whose solution set is "the walls of a 1×1×1 cube."
  • Hint: you may need to restrict the ranges of the variables; the equations might not be linear.
  • This problem foreshadows the next chapter (Chapter 3, The Simplex Method), which deals with inequalities and optimization rather than pure equalities.

🔗 Connections and context

🔗 Webwork problems

The excerpt lists specific problem numbers:

  • Reading problems: 4, 5
  • Solution sets: 20, 21, 22
  • Geometry of solutions: 23, 24, 25, 26

These are practice problems to reinforce the concepts.

🔗 Transition to Chapter 3

  • The excerpt ends with a preview: Chapter 3 will cover situations where inequalities appear instead of equalities.
  • Such problems involve finding an optimal solution that extremizes a quantity of interest.
  • For linear functions, these are called linear programming problems, solved by methods like the simplex algorithm.
  • Example: Pablo's problem (designing a school lunch program with constraints on fruit quantities and minimizing sugar intake).
25

Pablo's Problem

3.1 Pablo’s Problem

🧭 Overview

🧠 One-sentence thesis

Linear programming problems with inequality constraints can be solved by finding the optimal value of a linear objective function within the feasible region defined by those constraints, and the optimal solution always lies at a vertex of that region.

📌 Key points (3–5)

  • What linear programming handles: situations with linear inequalities (constraints) where you want to optimize (maximize or minimize) a linear function.
  • Feasible region: the set of all variable values that satisfy all the constraints; in two variables, this is a region in the plane.
  • Key insight: for linear programming problems, the optimal answer must lie at a vertex (corner) of the feasible region, not in the middle.
  • Common confusion: constraints vs. the objective function—constraints are inequalities that define what solutions are allowed; the objective function is what you're trying to optimize.
  • Graphical method: when there are only two variables, you can plot the feasible region and visually identify the optimal vertex.

🍎 The setup: translating a word problem into mathematics

🍎 Pablo's original problem

Pablo must design a school lunch program with competing requirements:

  • The school board (influenced by fruit growers) requires at least 7 oranges and 5 apples per week.
  • Parents and teachers want at least 15 pieces of fruit per week.
  • Janitors insist on no more than 25 pieces of fruit per week (to avoid mess).
  • Oranges have twice as much sugar as apples; apples have 5 grams of sugar each.
  • Pablo's goal: minimize children's sugar intake.

🔢 Mathematical restatement

Let x = number of apples and y = number of oranges.

Constraints (inequalities that must be satisfied):

  • x ≥ 5 (at least 5 apples)
  • y ≥ 7 (at least 7 oranges)
  • x + y ≥ 15 (at least 15 pieces of fruit total)
  • x + y ≤ 25 (at most 25 pieces of fruit total)

Objective function (what to minimize):

  • s = 5x + 10y (total grams of sugar)

The problem asks: minimize s subject to the four linear inequalities.

Don't confuse: The constraints define what is allowed; the objective function defines what you're trying to optimize.

📐 Graphical solution method

📐 Feasible region concept

Feasible region: the set of all values of the variables that satisfy all the constraints.

  • When there are two variables, the feasible region can be plotted in the (x, y) plane.
  • Each inequality constraint defines a half-plane; the feasible region is where all these half-planes overlap.
  • In Pablo's problem, the feasible region is bounded by four lines: x = 5, y = 7, x + y = 15, and x + y = 25.

Example: The feasible region for Pablo's problem is a quadrilateral (four-sided polygon) in the plane, with vertices at the corners where constraint lines intersect.

🎯 Finding the optimal solution

The excerpt states a key principle:

The optimal answer must lie at a vertex of the feasible region.

Why this works (intuitive explanation from the excerpt):

  • The objective function s(x, y) = 5x + 10y is a plane through the origin when plotted in three dimensions.
  • Restricting to the feasible region gives a flat surface (lamina) in 3-space.
  • Since the function is linear and non-zero, if you pick any point in the middle of this surface, you can always increase or decrease the function by moving to an edge, and then along that edge to a corner.
  • Therefore, the extreme values (maximum or minimum) must occur at the vertices.

🍊 Pablo's solution

Applying the vertex principle to Pablo's problem:

  • Oranges are very sugary (10 grams each vs. 5 grams for apples), so minimize y.
  • The less fruit the better (to minimize sugar), so the answer should lie on the lower boundary x + y = 15.
  • Combining these: y = 7 (the minimum allowed) and x + y = 15 gives x = 8.
  • The optimal vertex is (8, 7): 8 apples and 7 oranges.
  • Total sugar: s = 5(8) + 10(7) = 40 + 70 = 110 grams per week.

Don't confuse: The feasible region has multiple vertices; you must evaluate the objective function at each vertex to find which one is optimal (or use reasoning about the direction of optimization, as shown here).
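
A minimal sketch of the vertex check. The corner list is derived here by intersecting the binding constraint lines (x = 5, y = 7, x + y = 15, x + y = 25); the excerpt itself only reasons its way to the optimal corner.

```python
# Corners of Pablo's feasible region (derived from the four constraint lines above).
vertices = [(8, 7), (18, 7), (5, 10), (5, 20)]

def sugar(x, y):
    return 5 * x + 10 * y            # objective s = 5x + 10y

for v in vertices:
    print(v, sugar(*v))              # (8, 7) -> 110 is the smallest, matching the excerpt
```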

🔧 When to use graphical methods

🔧 Limitations and scope

  • The graphical technique works when the number of variables is small (preferably 2).
  • With three or more variables, visualization becomes difficult or impossible.
  • The excerpt mentions that a more general algorithm (the simplex algorithm) will be introduced for handling larger problems.

🆚 Linear vs. non-linear optimization

The excerpt contrasts linear programming with non-linear optimization:

  • For linear functions, the optimal solution lies at a vertex of the feasible region.
  • The excerpt hints that non-linear optimization behaves differently (the sentence is cut off, but the implication is that non-linear functions may have optima in the interior of the feasible region, not just at vertices).
| Feature | Linear programming | Non-linear (implied contrast) |
| --- | --- | --- |
| Objective function | Linear (e.g., s = 5x + 10y) | Non-linear |
| Optimal location | Always at a vertex | May be in the interior |
| Graphical appearance | Plane (flat surface) | Curved surface |
26

Graphical Solutions

3.2 Graphical Solutions

🧭 Overview

🧠 One-sentence thesis

For linear programming problems with few variables, graphical methods reveal that the optimal solution always lies at a vertex of the feasible region defined by the constraints.

📌 Key points (3–5)

  • When graphical methods work: when the number of variables is small, preferably 2, constraints can be plotted to visualize the feasible region.
  • Feasible region: the set of all variable values that satisfy all the constraints (inequalities).
  • Where the answer lies: the optimal solution to a linear programming problem must be at a vertex (corner) of the feasible region, not in the middle.
  • Common confusion: linear vs non-linear optimization—linear functions only need checking endpoints/vertices, while non-linear functions require checking derivatives for interior extrema.
  • Why vertices matter: because the objective function is linear, moving from any interior point toward an edge and then to a corner will always improve (or maintain) the value.

📐 Constraints and the feasible region

📏 What constraints are

Constraints: inequalities that the variables must satisfy in a linear programming problem.

  • In Pablo's problem, the constraints are:
    • x ≥ 5 (at least 5 apples)
    • y ≥ 7 (at least 7 oranges)
    • 15 ≤ x + y ≤ 25 (between 15 and 25 pieces of fruit total)
  • These are linear inequalities that restrict which values of x and y are allowed.

🗺️ The feasible region

Feasible region: the set of all values of the variables that satisfy all the constraints.

  • For two variables, this region can be plotted in the (x, y) plane.
  • The excerpt shows that Pablo's feasible region is bounded by the four constraint lines.
  • Only points inside (or on the boundary of) this region are valid solutions.
  • Example: A point like (3, 7) would violate x ≥ 5, so it is not in the feasible region.

🎯 Finding the optimal solution

🔺 Why the answer is at a vertex

  • The excerpt states: "the optimal answer must lie at a vertex of the feasible region."
  • Reasoning: Since the objective function (what we want to minimize or maximize) is linear and non-zero, if you pick any point in the middle of the feasible region, you can always improve the function value by moving to an edge, and then along that edge to a corner.
  • Example: In Pablo's problem, the sugar function s(x, y) = 5x + 10y is minimized at the vertex (8, 7), which gives 110 grams of sugar per week.

🍊 Solving Pablo's problem graphically

  • Step 1: Plot the feasible region using the four constraints.
  • Step 2: Identify the objective—minimize sugar s = 5x + 10y.
  • Step 3: Recognize that oranges have more sugar (10 grams) than apples (5 grams), so keep y as low as possible: y = 7.
  • Step 4: To minimize total fruit (and thus sugar), stay on the lower boundary x + y = 15.
  • Step 5: The intersection of y = 7 and x + y = 15 gives the vertex (8, 7).
  • Answer: 8 apples and 7 oranges, for a total of 110 grams of sugar per week.
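
As an independent cross-check on the graphical answer, the same problem can be handed to an off-the-shelf solver. A minimal sketch using scipy.optimize.linprog (scipy is not mentioned in the excerpt):

```python
from scipy.optimize import linprog

# minimize s = 5x + 10y  subject to  15 <= x + y <= 25,  x >= 5,  y >= 7
c = [5, 10]
A_ub = [[-1, -1],                    # -(x + y) <= -15  encodes  x + y >= 15
        [ 1,  1]]                    #    x + y <= 25
b_ub = [-15, 25]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(5, None), (7, None)], method="highs")
print(res.x, res.fun)                # approximately [8. 7.] and 110.0
```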

📊 Visualizing the objective function

  • The excerpt describes plotting the sugar function s(x, y) = 5x + 10y in three dimensions.
  • A linear function of two variables forms a plane through the origin.
  • Restricting to the feasible region gives a "lamina" (a flat surface) in 3-space.
  • Because the function is linear, the minimum or maximum must occur at a corner of this lamina, not in the interior.

🔄 Linear vs non-linear optimization

🆚 Key difference

| Type | Where to check | Why |
| --- | --- | --- |
| Linear function | Only endpoints/vertices | Moving toward a corner always improves or maintains the value |
| Non-linear function | Endpoints + interior points | Must compute derivatives to find extrema inside the interval |

🧮 Linear case

  • To optimize a linear function f(x) over an interval [a, b], compute and compare f(a) and f(b).
  • No need to check the interior—the extreme value is always at a boundary.
  • Example: Minimizing f(x) = 2x on [1, 5] → just check f(1) = 2 and f(5) = 10; minimum is at x = 1.

🌀 Non-linear case

  • For non-linear functions, extrema can occur inside the interval.
  • Must compute the derivative df/dx and solve for where it equals zero.
  • Don't confuse: linear programming avoids this complexity because linearity guarantees corner solutions.

🚀 When graphical methods are practical

📉 Limitations of graphical solutions

  • The excerpt notes that graphical techniques work "when the number of variables is small (preferably 2)."
  • With three or more variables, visualization becomes difficult or impossible.
  • Many real applications have "thousands or even millions of variables and constraints," making graphical methods impractical.
  • This motivates the need for algorithmic approaches like the simplex algorithm (covered in the next section).

✅ When to use graphical methods

  • Best for two-variable problems where you can draw the feasible region on a plane.
  • Useful for understanding the geometry of linear programming and why vertices matter.
  • Example: Pablo's problem is ideal for graphical solution because it has only two variables (apples and oranges).
27

Dantzig's Algorithm

3.3 Dantzig’s Algorithm

🧭 Overview

🧠 One-sentence thesis

Dantzig's simplex algorithm solves linear programming problems with many variables and constraints by systematically performing row operations on an augmented matrix until the optimal solution is found at a vertex of the feasible region.

📌 Key points (3–5)

  • Why an algorithm is needed: graphical methods work for two variables, but real applications may have thousands or millions of variables and constraints, requiring a computer-implementable method.
  • How the algorithm works: arrange the objective function and constraints in an augmented matrix, then use elementary row operations (EROs) to zero out negative coefficients in the last row while keeping constraint values positive.
  • Termination condition: the algorithm stops when all coefficients in the last row (except the objective value) are non-negative, at which point setting certain variables to zero maximizes the objective.
  • Common confusion: negative coefficients in the objective row mean those variables are determined by constraints and must be eliminated, not that they help the objective directly.
  • Standard form requirements: the problem must be set up as "maximize a linear function subject to equality constraints with non-negative variables"—inequalities and minimization require transformation tricks.

🎯 The standard problem format

🎯 What Dantzig's algorithm solves

Standard problem: Maximize f(x₁, ..., xₙ) where f is linear, xᵢ ≥ 0 (i = 1, ..., n) subject to Mx = v, where M is an m × n matrix and v is an m × 1 column vector.

  • The function f must be linear (no squared terms, products, etc.).
  • All variables must be non-negative (greater than or equal to zero).
  • Constraints must be equalities (Mx = v), not inequalities.
  • The goal is maximization, not minimization.

🔧 Key insight: adding constraints to the objective

  • Suppose you want to maximize f(x₁, ..., xₙ) subject to a constraint c(x₁, ..., xₙ) = k.
  • You can instead maximize f(x₁, ..., xₙ) + α·c(x₁, ..., xₙ) for any constant α.
  • Why: this only shifts f by a constant α·k, which doesn't change where the maximum occurs.
  • This insight justifies adding multiples of constraint rows to the objective row during the algorithm.

🔢 Setting up the augmented matrix

🔢 Matrix structure

The information is arranged as:

| Part | What it contains | Location |
| --- | --- | --- |
| First n columns | The n variables (and any slack/artificial variables) | All rows |
| Constraint rows | Coefficients from Mx = v | All rows except last |
| Objective row | Coefficients from the objective function equation | Last row only |
| Last column | Constraint values (top) and objective value (bottom) | Right edge |

📝 Encoding the objective function

  • If the objective is f = 3x - 3y - z + 4w, rewrite it as an equation: -3x + 3y + z - 4w + f = 0.
  • This becomes the last row of the augmented matrix.
  • The last entry in this row tracks the current value of the objective function.

Example: For f = 3x - 3y - z + 4w with constraints c₁ and c₂, the matrix is:

[ 1   1   1   1   0   5 ]  ← c₁ = 5
[ 1   2   3   2   0   6 ]  ← c₂ = 6
[-3   3   1  -4   1   0 ]  ← f = 3x - 3y - z + 4w

🔄 Running the algorithm

🔄 Step 1: Find the most negative coefficient

  • Scan the last row (objective row) for negative coefficients (ignoring the last entry).
  • Pick the most negative coefficient—this identifies a variable that needs to be eliminated from the objective.
  • Don't confuse: a negative coefficient like -4 multiplying a positive variable w does NOT help the objective; it means w is constrained and must be removed from the objective equation.

🔄 Step 2: Choose which constraint row to use

  • You will add a multiple of one constraint row to the objective row to zero out the negative coefficient.
  • Decision rule: choose the constraint row that adds the smallest constant to f.
  • How to check: for each candidate row, divide the last column entry (constraint value) by the coefficient in the column you're zeroing out; pick the smallest ratio.

Example: To zero out -4 in column 4:

  • Using row 1 (coefficient 1): would add 4 × 5 = 20 to f.
  • Using row 2 (coefficient 2): would add 2 × 6 = 12 to f.
  • Choose row 2 because 12 < 20.

🔄 Step 3: Perform the row operation

  • Add the appropriate multiple of the chosen constraint row to the objective row to zero out the negative coefficient.
  • Then use that same constraint row to zero out all other entries in that column (except the constraint row itself).
  • This ensures you don't "undo good work" in later steps.
  • Critical: all entries in the last column (constraint values) must remain positive after each operation; this is why step 2's choice matters.

🔄 Step 4: Repeat or terminate

  • Repeat steps 1–3 until all coefficients in the last row (except the last entry) are non-negative.
  • Termination: when no negative coefficients remain, the algorithm is done.
  • At termination, set all variables with positive coefficients in the objective row to zero; the remaining variables are determined by the constraints.
  • The last entry in the last row is the maximum value of the objective function.

Example: If the final objective row is [0 7 6 0 1 16], then f = 16 - 7y - 6z. To maximize, set y = 0 and z = 0, giving f = 16.
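
The two pivots in this example can be replayed with explicit row operations. A minimal sketch in numpy that follows the steps described above rather than implementing a general simplex routine:

```python
import numpy as np

# Augmented matrix from the example above (columns: x, y, z, w, f | value).
M = np.array([
    [ 1.,  1.,  1.,  1., 0.,  5.],   # c1 = 5
    [ 1.,  2.,  3.,  2., 0.,  6.],   # c2 = 6
    [-3.,  3.,  1., -4., 1.,  0.],   # -3x + 3y + z - 4w + f = 0
])

# First pivot: most negative objective entry is -4 (the w column).
# Row c2 adds only 2*6 = 12 to f versus 4*5 = 20 via row c1, so pivot on row c2.
M[2] += 2 * M[1]                     # zero out the -4 in the objective row
M[0] -= 0.5 * M[1]                   # clear the rest of the w column

# Second pivot: the most negative entry is now -1 (the x column).
# Ratios 2/0.5 = 4 (row c1) versus 6/1 = 6 (row c2), so pivot on row c1.
M[2] += 2 * M[0]

print(M[2])                          # [0. 7. 6. 0. 1. 16.]: f = 16 - 7y - 6z, so max f = 16
```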

🛠️ Transforming problems into standard form

🛠️ Handling non-negative variable requirements

  • Problem: variables like x and y may not satisfy xᵢ ≥ 0.
  • Solution: introduce new variables with a shift.

Example: If x ≥ 5 and y ≥ 7, define x₁ = x - 5 and x₂ = y - 7. Now x₁ ≥ 0 and x₂ ≥ 0.

🛠️ Converting inequalities to equalities: slack variables

  • Problem: constraints like x + y ≥ 3 or x + y ≤ 13 are inequalities, not equalities.
  • Solution: introduce slack variables (new positive variables) to "take up the slack."

Slack variables: positive variables added to convert inequality constraints into equality constraints.

Example:

  • x₁ + x₂ ≥ 3 becomes x₁ + x₂ - x₃ = 3 with x₃ ≥ 0.
  • x₁ + x₂ ≤ 13 becomes x₁ + x₂ + x₄ = 13 with x₄ ≥ 0.

🛠️ Minimization vs maximization

  • Problem: the standard form maximizes, but you may need to minimize (e.g., minimize sugar s).
  • Solution: maximize the negative of the objective.

Example: To minimize s = 5x + 10y, define f = -s (or f = -s + constant for convenience). Maximizing f is equivalent to minimizing s.

🛠️ Artificial variables: the "dirty trick"

  • Problem: after setting up the matrix, the simplex algorithm may terminate prematurely because setting all original variables to zero doesn't satisfy the constraints.
  • Solution: introduce artificial variables x₅, x₆, ... and modify the constraints and objective.

Artificial variables: positive variables added temporarily to help the algorithm start; they are penalized heavily in the objective so they vanish at the optimum.

  • Shift each constraint: c₁ → c₁ - x₅, c₂ → c₂ - x₆.
  • Modify the objective: f → f - α·x₅ - α·x₆ for a large positive α (e.g., α = 10).
  • Why it works: for large α, the objective is only maximal when artificial variables are zero, so the original problem is unchanged.
  • Before running the main algorithm, perform row operations to zero out the artificial variable coefficients in the objective row.

Example: If the objective row initially has coefficients for x₅ and x₆, subtract multiples of the constraint rows to make those coefficients zero before proceeding.

🎓 Why the algorithm works

🎓 Linear functions and vertices

  • For linear optimization over a feasible region, the optimum always occurs at a vertex (corner) of the region.
  • Intuition: a linear function is a plane; if you pick a point in the middle of the feasible region, you can always move to an edge and then to a corner to increase/decrease the function.
  • Contrast with calculus: for non-linear functions, you must compute derivatives to find extrema inside the interval; for linear functions, you only need to check the endpoints (vertices).

🎓 What the algorithm is doing

  • Each iteration of the simplex algorithm moves from one vertex of the feasible region to an adjacent vertex.
  • The row operations systematically explore vertices by "pivoting" on constraint equations.
  • Termination occurs when no adjacent vertex can improve the objective—this is the optimal vertex.

🎓 Advantages over graphical methods

  • Graphical method: only works for 2 (or maybe 3) variables; requires plotting and visual inspection.
  • Simplex algorithm: works for any number of variables; can be implemented on a computer; much faster for large problems.
  • Trade-off: the simplex algorithm is slower by hand for small problems (like Pablo's fruit problem), but essential for real applications with thousands of variables.
28

Pablo Meets Dantzig

3.4 Pablo Meets Dantzig

🧭 Overview

🧠 One-sentence thesis

To apply the simplex algorithm to real problems like Pablo's, you must transform the problem into standard form using slack variables and artificial variables, converting inequalities into equalities and minimization into maximization.

📌 Key points (3–5)

  • Standard form requirements: variables must be non-negative, constraints must be equalities (M x = v), and the objective must be maximized.
  • Slack variables: introduced to convert inequality constraints into equalities by absorbing the "slack" difference.
  • Artificial variables: added when the initial solution doesn't satisfy constraints; combined with a large penalty coefficient to force them to zero at the optimum.
  • Common confusion: minimizing sugar s is equivalent to maximizing f = −s (plus any constant); the constant doesn't change the optimal solution.
  • Why it matters: these transformations allow any linear programming problem to be solved algorithmically by computer, much faster than checking all vertices manually.

🔧 Transforming variables to standard form

🔧 Making variables non-negative

The simplex algorithm requires all variables to satisfy x_i ≥ 0, but Pablo's original variables are bounded below by 5 and 7 rather than by 0.

Solution: Define new variables as shifts of the originals:

  • x₁ = x − 5
  • x₂ = y − 7

This ensures x₁ and x₂ can be treated as non-negative in the standard formulation.

Don't confuse: The shift values (5 and 7) come from Pablo's specific problem context; they are not arbitrary—they represent baseline values that make the new variables naturally non-negative.

📐 Converting inequalities to equalities

📐 Slack variables

Slack variables: positive variables introduced to take up the "slack" required to convert inequality constraints into equality constraints.

Pablo's fruit constraints are inequalities:

  • 15 ≤ x + y ≤ 25

In terms of the new variables:

  • x₁ + x₂ ≥ 3
  • x₁ + x₂ ≤ 13

How slack variables work:

  • For the lower bound (≥ 3): introduce x₃ ≥ 0 and write c₁ := x₁ + x₂ − x₃ = 3
  • For the upper bound (≤ 13): introduce x₄ ≥ 0 and write c₂ := x₁ + x₂ + x₄ = 13

The slack variable x₃ "absorbs" how much the sum exceeds the minimum, while x₄ absorbs how much room remains below the maximum.

📐 Matrix form

The two equality constraints can now be written as M x = v:

| Coefficient matrix row | Variables | Equals | Right-hand side |
| --- | --- | --- | --- |
| 1 1 −1 0 | x₁, x₂, x₃, x₄ | = | 3 |
| 1 1 0 1 | x₁, x₂, x₃, x₄ | = | 13 |

🎯 Transforming the objective function

🎯 Minimization to maximization

Pablo wants to minimize sugar s = 5x + 10y, but the standard simplex algorithm maximizes an objective function f.

Solution: Define the objective function as f = −s + 95 = −5x₁ − 10x₂.

Why this works:

  • Maximizing −s is equivalent to minimizing s
  • Adding the constant 95 doesn't change which solution is optimal; it is chosen to cancel the constant term in −s = −5x₁ − 10x₂ − 95, so that f = −5x₁ − 10x₂ has no constant left over
  • The optimal value of s can be recovered: s = −f + 95

🎯 Initial augmented matrix

The augmented matrix includes the objective function equation 5x₁ + 10x₂ + f = 0 as the last row:

 1   1  −1   0   0   3
 1   1   0   1   0  13
 5  10   0   0   1   0

Problem: The last row has only positive coefficients, suggesting the algorithm terminates immediately with x₁ = 0 = x₂. However, setting x₁ = x₂ = 0 forces x₃ = −3 in the first constraint, violating x₃ ≥ 0, so the constraints cannot be satisfied with positive slack variables x₃ and x₄.

🎭 Artificial variables trick

🎭 Why artificial variables are needed

When setting all decision variables to zero doesn't satisfy the constraints (even with slack variables), the simplex algorithm cannot start from a feasible solution.

The "very dirty" trick: Add artificial variables x₅ and x₆ (both positive) to shift each constraint:

  • c₁ → c₁ − x₅
  • c₂ → c₂ − x₆

🎭 Penalty mechanism

Modify the objective function to f − αx₅ − αx₆ where α is a large positive number (the excerpt uses α = 10).

How it works:

  • For large α, the modified objective is only maximal when the artificial variables vanish (x₅ = 0, x₆ = 0)
  • When artificial variables vanish, the original problem is unchanged
  • The choice of α = 10 is sufficient; the solution does not depend on the exact value as long as it's large enough

🎭 Modified augmented matrix

With artificial variables and α = 10:

 1   1  −1   0   1   0   0   3
 1   1   0   1   0   1   0  13
 5  10   0   0  10  10   1   0

Initial row operation: Perform R'₃ = R₃ − 10R₁ − 10R₂ to zero out the coefficients of the artificial variables:

 1   1  −1   0   1   0   0    3
 1   1   0   1   0   1   0   13
−15 −10  10 −10   0   0   1 −160

Now the algorithm is ready to run exactly as in the standard simplex method.

🔄 Running the simplex algorithm

🔄 First pivot operation

Use the 1 in the top of the first column to zero out the most negative entry (−15) in the last row:

 1   1  −1   0   1   0   0    3
 1   1   0   1   0   1   0   13
 0   5  −5 −10  15   0   1 −115

Then R'₂ = R₂ − R₁:

 1   1  −1   0   1   0   0    3
 0   0   1   1  −1   1   0   10
 0   5  −5 −10  15   0   1 −115

🔄 Second pivot operation

R'₃ = R₃ + 10R₂:

 1   1  −1   0   1   0   0    3
 0   0   1   1  −1   1   0   10
 0   5   5   0   5  10   1  −15

🔄 Reading the solution

Now the variables x₂, x₃, x₅, x₆ carry positive coefficients in the last row (only x₁ and x₄ have zero coefficients), so they must be set to zero to maximize f.

Optimal values:

  • f = −15, so s = −f + 95 = 110 (the minimum sugar)
  • From the constraints: x₁ = 3 and x₄ = 10
  • Converting back: x = x₁ + 5 = 8 and y = x₂ + 7 = 7

This agrees with the previous graphical result.
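
The full sequence of row operations above can be replayed numerically. A minimal sketch in numpy, using the same column order and α = 10 as the excerpt:

```python
import numpy as np

# Columns: x1, x2, x3, x4, x5, x6, f | value, with alpha = 10 as in the excerpt.
A = np.array([
    [ 1.,  1., -1.,  0.,  1.,  0., 0.,   3.],
    [ 1.,  1.,  0.,  1.,  0.,  1., 0.,  13.],
    [ 5., 10.,  0.,  0., 10., 10., 1.,   0.],
])

A[2] -= 10 * A[0] + 10 * A[1]        # R3' = R3 - 10 R1 - 10 R2: clear the artificial variables
A[2] += 15 * A[0]                    # zero out the -15 in the x1 column
A[1] -= A[0]                         # R2' = R2 - R1: clear the rest of the x1 column
A[2] += 10 * A[1]                    # zero out the -10 in the x4 column

print(A[2])                          # [0. 5. 5. 0. 5. 10. 1. -15.]
f_max = A[2, -1]                     # maximum of f is -15
print(95 - f_max)                    # minimum sugar s = 95 - f = 110.0
```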

💻 Algorithm vs. manual calculation

💻 Trade-offs

By hand: The simplex algorithm was slow and complex for Pablo's problem—many transformations and row operations were needed.

Key advantage: It is an algorithm that can be fed to a computer.

Scalability: For problems with many variables, this method is much faster than checking all vertices manually (as done in the graphical method of section 3.2).

Example: A problem with dozens of variables and constraints would have an impractical number of vertices to check, but the simplex algorithm systematically finds the optimum through pivot operations.

29

Review Problems

3.5 Review Problems

🧭 Overview

🧠 One-sentence thesis

These review problems apply linear programming techniques—both graphical methods and the simplex algorithm—to optimization problems with constraints, demonstrating how to find maximum values by checking feasible region corners or using algorithmic row operations.

📌 Key points (3–5)

  • Two solution methods: graphical (sketch constraints, check corners) and algorithmic (simplex method with slack/artificial variables).
  • Real-world application: the Conoil problem shows how to model profit maximization under resource and budget constraints.
  • Simplex algorithm workflow: introduce slack variables to convert inequalities to equalities, add artificial variables if needed, then perform row operations to find the optimum.
  • Common confusion: the simplex algorithm is slower by hand but much faster on computers for many-variable problems; graphical methods only work well for two-variable cases.
  • Key result verification: both methods should yield the same answer, providing a double-check on correctness.

📝 Problem 1: Basic linear programming

📝 The optimization task

Maximize f(x, y) = 2x + 3y subject to:

  • x ≥ 0
  • y ≥ 0
  • x + 2y ≤ 2
  • 2x + y ≤ 2

🗺️ Part (a): Graphical method

  • Sketch the region in the xy-plane defined by all four constraints.
  • The feasible region is bounded by the intersection of these inequalities.
  • Check the value of f at each corner (vertex) of this region.
  • The maximum value occurs at one of these corners.

🔢 Part (b): Simplex algorithm

  • Introduce slack variables to convert inequalities to equalities.
  • The hint explicitly tells you to use slack variables.
  • Apply the simplex algorithm as shown in section 3.3.
  • The result should match the graphical method's answer.
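
A minimal sketch of the corner check for Problem 1(a). The corner coordinates are computed here from the stated constraints; they are not given in the excerpt, so treat this as a hint rather than the official solution.

```python
import numpy as np

# Corner check for Problem 1(a): maximize f = 2x + 3y over
# x >= 0, y >= 0, x + 2y <= 2, 2x + y <= 2.
corner = np.linalg.solve(np.array([[1., 2.],
                                   [2., 1.]]), np.array([2., 2.]))  # x + 2y = 2 meets 2x + y = 2
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), tuple(corner)]

def f(x, y):
    return 2 * x + 3 * y

for v in vertices:
    print(v, f(*v))                  # the corner (2/3, 2/3) gives f = 10/3, the largest value
```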

🛢️ Problem 2: Conoil optimization

🛢️ The business scenario

Context:

  • Conoil operates two oil wells (A and B) in southern Grease.
  • Goal: determine how many barrels to pump from each well to maximize profit.

Key constraints:

  • Well A oil is worth 50% more per barrel than well B oil (better quality).
  • Total pumping cannot exceed 6 million barrels per year (environmental regulation).
  • Well A costs twice as much as well B to operate.
  • Operating budget limits pumping to at most 10 million barrels from well B per year.

💰 Profit considerations

  • All profit goes to shareholders, not operating costs.
  • The quality difference means well A generates more revenue per barrel.
  • Operating costs differ: well A is twice as expensive to run as well B.
  • The optimization must balance higher revenue from A against higher operating costs.

🔍 Solution approach

Two methods required:

  1. Graphical method: plot constraints and find the corner that maximizes profit.
  2. Dantzig's algorithm (the simplex method): use as a double-check to verify the graphical solution.

Why both methods?

  • The excerpt emphasizes using the algorithmic approach "as a double check."
  • Agreement between methods confirms the answer is correct.
  • Demonstrates that different techniques solve the same problem.

⚠️ Don't confuse

  • "10 million barrels from well B" is the budget constraint (what the budget can afford), not a production target.
  • The 6 million barrel limit applies to total pumping from both wells combined, not each well individually.
  • Well A's higher value per barrel doesn't automatically mean pumping only from A—operating costs matter too.

🔄 Connection to the simplex method

🔄 Why the simplex algorithm matters here

The excerpt's earlier example (Pablo's problem) showed:

  • By hand, the simplex algorithm is "slow and complex."
  • However, it is an algorithm that can be fed to a computer.
  • For problems with many variables, this method is "much faster than simply checking all vertices."

🔄 Application to these problems

  • Problem 1 is a practice exercise to master the technique.
  • Problem 2 (Conoil) is still a two-variable problem (barrels from A, barrels from B), so graphical methods work.
  • The double-check requirement reinforces understanding of both approaches.
  • In real-world scenarios with more wells or constraints, only the algorithmic method would be practical.
30

Addition and Scalar Multiplication in Rⁿ

4.1 Addition and Scalar Multiplication in Rⁿ

🧭 Overview

🧠 One-sentence thesis

n-vectors can be added component-wise and multiplied by scalars, enabling algebraic operations that extend familiar geometric ideas to arbitrarily high dimensions.

📌 Key points (3–5)

  • What n-vectors are: ordered lists of n numbers where order matters; the set of all n-vectors is denoted Rⁿ.
  • Two fundamental operations: addition (add corresponding components) and scalar multiplication (multiply every component by the scalar).
  • The zero vector: a special vector with all components equal to zero; it labels the origin and has zero magnitude with no particular direction.
  • Common confusion: superscript notation—aᵢ denotes the i-th component of vector a, not "a to the power i."
  • Geometric extension: lines, planes, and hyperplanes generalize to Rⁿ using parametric equations built from vector addition and scalar multiplication.

📐 What n-vectors are

📐 Definition and notation

An n-vector is an ordered list of n numbers: a = (a¹, ..., aⁿ).

  • The components are labeled with superscripts: a¹ is the first component, a² is the second, and so on.
  • Don't confuse: a² means "the second component of a," not "a squared."
  • Order is essential—two vectors with the same components in different orders are not equal.

Example: The vector (7, 4, 2, 5) is not equal to (7, 2, 4, 5) because the second and third components are swapped.

🗂️ The set Rⁿ

Rⁿ := {(a¹, ..., aⁿ) | a¹, ..., aⁿ ∈ R}

  • Rⁿ is the set of all n-vectors whose components are real numbers.
  • This notation extends familiar spaces: R² is the plane, R³ is 3D space, and Rⁿ generalizes to any number of dimensions.

➕ The two fundamental operations

➕ Vector addition

Given two n-vectors a = (a¹, ..., aⁿ) and b = (b¹, ..., bⁿ), their sum is a + b := (a¹ + b¹, ..., aⁿ + bⁿ).

  • Add corresponding components: first to first, second to second, and so on.
  • Both vectors must have the same number of components.

Example: If a = (1, 2, 3, 4) and b = (4, 3, 2, 1), then a + b = (5, 5, 5, 5).

✖️ Scalar multiplication

Given a scalar λ, the scalar multiple is λa := (λa¹, ..., λaⁿ).

  • Multiply every component of the vector by the scalar.
  • The result is another n-vector in the same space.

Example: If a = (1, 2, 3, 4), then 3a = (3, 6, 9, 12).

🔗 Combining operations

  • You can combine addition and scalar multiplication in one expression.

Example: With a = (1, 2, 3, 4) and b = (4, 3, 2, 1), the expression 3a − 2b = (3·1 − 2·4, 3·2 − 2·3, 3·3 − 2·2, 3·4 − 2·1) = (−5, 0, 5, 10).
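
These componentwise rules translate directly into code. Below is a minimal sketch in plain Python; the helper names add and scale are ad hoc, not from the text.

```python
# Componentwise operations on n-vectors represented as tuples.
def add(a, b):
    assert len(a) == len(b), "vectors must have the same number of components"
    return tuple(ai + bi for ai, bi in zip(a, b))

def scale(c, a):
    return tuple(c * ai for ai in a)

a = (1, 2, 3, 4)
b = (4, 3, 2, 1)
print(add(a, b))                         # (5, 5, 5, 5)
print(add(scale(3, a), scale(-2, b)))    # 3a - 2b = (-5, 0, 5, 10)
```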

🎯 The zero vector

🎯 Definition and properties

The zero vector has all components equal to zero: 0 = (0, ..., 0) =: 0ₙ.

  • It is the only vector with zero magnitude.
  • It is the only vector that points in no particular direction.
  • In Euclidean geometry, the zero vector labels the origin O.

🧭 Role in geometry

  • n-vectors label points P in space.
  • The zero vector is the reference point (origin) from which all other points are measured.
  • Don't confuse: the zero vector is not "nothing"—it is a specific vector with a geometric role.

🌐 Hyperplanes and geometric objects

📏 Lines in Rⁿ

A line L along direction v through point P (labeled by vector u) is L = {u + tv | t ∈ R}.

  • The parameter t ranges over all real numbers.
  • As t varies, u + tv traces out all points on the line.

Example: The set {(1, 2, 3, 4) + t(1, 0, 0, 0) | t ∈ R} describes a line in R⁴ parallel to the x¹-axis.

🛫 Planes in Rⁿ

A plane determined by two vectors u and v through point P is {P + su + tv | s, t ∈ R}.

  • Two parameters s and t allow movement in two independent directions.
  • Caution: u and v must not be scalar multiples of each other; otherwise they lie on the same line and do not determine a plane.

Example: The set {(3, 1, 4, 1, 5, 9) + s(1, 0, 0, 0, 0, 0) + t(0, 1, 0, 0, 0, 0) | s, t ∈ R} describes a plane in 6-dimensional space parallel to the xy-plane.

🔲 k-dimensional hyperplanes

A set of k+1 vectors P, v₁, ..., vₖ in Rⁿ (with k ≤ n) determines a k-dimensional hyperplane: {P + λ₁v₁ + ... + λₖvₖ | λᵢ ∈ R}.

  • This is a recursive definition generalizing lines (k=1) and planes (k=2) to higher dimensions.
  • Exception: if any vector vⱼ can be written as a combination of the other k−1 vectors, the set does not determine a k-dimensional hyperplane—it collapses to a lower dimension.

Example: The set {(3, 1, 4, 1, 5, 9) + s(1, 0, 0, 0, 0, 0) + t(0, 1, 0, 0, 0, 0) + u(1, 1, 0, 0, 0, 0) | s, t, u ∈ R} is not a 3-dimensional hyperplane because (1, 1, 0, 0, 0, 0) = 1·(1, 0, 0, 0, 0, 0) + 1·(0, 1, 0, 0, 0, 0); it is actually a 2-dimensional plane.

👁️ Visualization limits

  • Vectors in Rⁿ are impossible to visualize unless n is 1, 2, or 3.
  • However, the algebraic definitions of lines, planes, and hyperplanes remain valid and useful for any n.
  • Don't confuse: inability to visualize does not mean the objects are undefined or meaningless—they are fully specified by their parametric equations.

4.2 Hyperplanes

🧭 Overview

🧠 One-sentence thesis

Hyperplanes generalize lines and planes to any dimension by using parametric combinations of direction vectors, and they can be specified either parametrically or by a single algebraic equation when the dimension is n−1.

📌 Key points (3–5)

  • Lines and planes in n dimensions: A line is defined by a point plus a scalar multiple of one direction vector; a plane by a point plus scalar multiples of two direction vectors.
  • k-dimensional hyperplanes: A k-dimensional hyperplane is determined by a point P and k direction vectors, written as P plus all linear combinations of those vectors.
  • When vectors fail to determine a hyperplane: If any direction vector is a linear combination of the others, the set of vectors does not determine a k-dimensional hyperplane but instead collapses to a lower dimension.
  • Common confusion: "Hyperplane" without a dimension qualifier usually means (n−1)-dimensional in R^n, which is the solution set to one linear equation; don't confuse this with the general k-dimensional case.
  • Two ways to specify: Hyperplanes can be written parametrically (point plus direction vectors) or as the solution set to a linear equation (for (n−1)-dimensional hyperplanes).

📐 Lines and planes in n-dimensional space

📏 Line definition

A line L along the direction defined by a vector v and through a point P labeled by a vector u can be written as L = {u + tv | t ∈ R}.

  • The line is the set of all points you get by starting at u and moving any scalar multiple t of the direction vector v.
  • Order: one point, one direction vector, one parameter t.
  • Example: The set {(1,2,3,4) + t(1,0,0,0) | t ∈ R} describes a line in R^4 parallel to the x₁-axis.

🗺️ Plane definition

The plane determined by two vectors u and v can be written as {P + su + tv | s, t ∈ R}.

  • A plane requires a point P and two direction vectors u and v.
  • The sum u + v corresponds to laying the two vectors head-to-tail; if u and v determine a plane, their sum lies in that plane.
  • When two vectors fail to determine a plane: If both vectors lie on the same line (one is a scalar multiple of the other), they do not determine a plane.
  • Example: The set {(3,1,4,1,5,9) + s(1,0,0,0,0,0) + t(0,1,0,0,0,0) | s, t ∈ R} describes a plane in 6-dimensional space parallel to the xy-plane.

🔢 k-dimensional hyperplanes

🧮 Parametric definition

A set of k+1 vectors P, v₁, …, vₖ in R^n with k ≤ n determines a k-dimensional hyperplane: {P + sum of λᵢvᵢ (i from 1 to k) | λᵢ ∈ R}.

  • The hyperplane is the set of all points formed by starting at P and adding any linear combination of the k direction vectors.
  • This is a recursive definition: it generalizes lines (k=1) and planes (k=2) to any dimension k.

⚠️ When vectors do not determine a k-dimensional hyperplane

The definition fails if any of the direction vectors vⱼ lives in the (k−1)-dimensional hyperplane determined by the other k−1 vectors.

  • In other words: if one direction vector is a linear combination of the others, the set collapses to a lower dimension.
  • Example: The set S = {(3,1,4,1,5,9) + s(1,0,0,0,0,0) + t(0,1,0,0,0,0) + u(1,1,0,0,0,0) | s, t, u ∈ R} is not a 3-dimensional hyperplane because (1,1,0,0,0,0) = 1·(1,0,0,0,0,0) + 1·(0,1,0,0,0,0).
  • The third vector is redundant; the set can be rewritten with only two direction vectors, so it is actually a 2-dimensional hyperplane.

🔤 Default meaning of "hyperplane"

When the dimension k is not specified, "hyperplane" usually means k = n−1 for a hyperplane inside R^n.

  • This is the kind of object specified by one algebraic equation in n variables.
  • Don't confuse: a general k-dimensional hyperplane (parametric form with k direction vectors) vs. an (n−1)-dimensional hyperplane (solution set to one equation).

🧾 Algebraic specification of hyperplanes

📝 One equation, (n−1) dimensions

An (n−1)-dimensional hyperplane in R^n can be written as the solution set to one linear equation.

  • Example: The solution set to x₁ + x₂ + x₃ + x₄ + x₅ = 1 is a 4-dimensional hyperplane in R^5.
  • Rewriting the equation: (x₁, x₂, x₃, x₄, x₅) = (1 − x₂ − x₃ − x₄ − x₅, x₂, x₃, x₄, x₅).
  • Parametric form: (1,0,0,0,0) + s₂(−1,1,0,0,0) + s₃(−1,0,1,0,0) + s₄(−1,0,0,1,0) + s₅(−1,0,0,0,1) for s₂, s₃, s₄, s₅ ∈ R.
  • This shows the same hyperplane in two forms: algebraic (one equation) and parametric (one point plus four direction vectors).
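
As a quick numerical sanity check (not in the excerpt) that the parametric form above really satisfies x₁ + x₂ + x₃ + x₄ + x₅ = 1, here is a short plain-Python sketch using the point and direction vectors listed above:

```python
import random

point = (1, 0, 0, 0, 0)
directions = [(-1, 1, 0, 0, 0),
              (-1, 0, 1, 0, 0),
              (-1, 0, 0, 1, 0),
              (-1, 0, 0, 0, 1)]

for _ in range(5):
    s = [random.uniform(-10, 10) for _ in directions]   # random parameters s2..s5
    x = [point[i] + sum(sj * d[i] for sj, d in zip(s, directions)) for i in range(5)]
    print(round(sum(x), 10))                             # always 1.0
```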

🔄 Comparison of representations

Representation | Form | Dimension | Parameters
Parametric | Point + direction vectors | k-dimensional | k scalars (λ₁, …, λₖ)
Algebraic | One linear equation | (n−1)-dimensional in R^n | n−1 free variables
  • Parametric form is more general (works for any k); algebraic form is standard for (n−1)-dimensional hyperplanes.
  • Both describe the same geometric object when k = n−1.

4.3 Directions and Magnitudes

🧭 Overview

🧠 One-sentence thesis

The dot product of n-vectors allows us to define Euclidean length, angles, and orthogonality in n-dimensional space, and these concepts generalize through inner products to other geometries like special relativity.

📌 Key points (3–5)

  • Dot product definition: multiply corresponding components and sum them; this operation underlies length and angle calculations.
  • Length (norm) from dot product: the square root of a vector dotted with itself gives Euclidean length.
  • Angle and orthogonality: the dot product relates to the cosine of the angle between vectors; zero dot product means perpendicular.
  • Common confusion: the dot product is not the only way to measure length and angle—other inner products (like the Lorentzian) change what "distance" and "time" mean.
  • Key inequalities: Cauchy–Schwarz bounds the dot product by the product of lengths; the triangle inequality says the direct path is shortest.

📐 The dot product and its geometric meaning

📐 Defining the dot product

Dot product: For u = (u₁, …, uₙ) and v = (v₁, …, vₙ), the dot product is u · v := u₁v₁ + ··· + uₙvₙ.

  • It is a single number, not a vector.
  • You multiply corresponding components and add them all up.
  • Example: The dot product of (1, 2, 3, 4, …, 100) and (1, 1, 1, 1, …, 1) is 1 + 2 + 3 + ··· + 100 = 5050 (the sum Gauß famously computed as a child).

📏 Euclidean length (norm)

Length (norm, magnitude) of an n-vector v is ‖v‖ := √(v · v).

  • In words: the square root of the sum of the squares of all components.
  • Equivalently, ‖v‖ = √((v₁)² + (v₂)² + ··· + (vₙ)²).
  • Example: The norm of (1, 2, 3, 4, …, 101) is √(1² + 2² + ··· + 101²) = √348,551 ≈ 590.4, using the formula 1² + ··· + n² = n(n+1)(2n+1)/6 with n = 101.

📐 Deriving the angle formula with the Law of Cosines

The excerpt shows how the dot product emerges from the Law of Cosines applied to two vectors u and v that span a plane in Rⁿ:

  • Connect the ends of u and v with the vector v − u.
  • Law of Cosines: ‖v − u‖² = ‖u‖² + ‖v‖² − 2‖u‖‖v‖ cos θ.
  • Expand ‖v − u‖² and simplify to isolate cos θ.
  • Result: ‖u‖‖v‖ cos θ = u₁v₁ + ··· + uₙvₙ = u · v.

Angle θ between two vectors is determined by u · v = ‖u‖‖v‖ cos θ.

  • Example: The angle between (1, 2, 3, …, 101) and (1, 0, 1, 0, …, 1) in R¹⁰¹ is arccos(2601 / (√348,551 · √51)): the dot product is 1 + 3 + 5 + ··· + 101 = 51² = 2601, and the two norms are √348,551 and √51 (see the sketch below).
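
A small plain-Python sketch reproducing the dot product, norm, and angle computations above; the helper names dot and norm are ad hoc:

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

gauss = dot(range(1, 101), [1] * 100)              # 1 + 2 + ... + 100 = 5050
u = list(range(1, 102))                            # (1, 2, ..., 101)
v = [1 if i % 2 == 0 else 0 for i in range(101)]   # (1, 0, 1, 0, ..., 1)
theta = math.acos(dot(u, v) / (norm(u) * norm(v)))
print(gauss, dot(u, v), dot(u, u), math.degrees(theta))
# 5050  2601  348551  theta is about 51.9 degrees
```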

⊥ Orthogonality (perpendicularity)

Orthogonal (perpendicular) vectors: two vectors whose dot product is zero.

  • When u · v = 0, the angle θ satisfies cos θ = 0, so θ = 90°.
  • Example: (1, 1, 1, …, 1) · (1, −1, 1, …, −1) = 0 because the +1 and −1 products cancel in pairs, so these vectors are orthogonal in any even-dimensional space such as R¹⁰⁰ (in odd dimension, such as R¹⁰¹, the alternating vector ends in +1 and the dot product is 1 instead).
  • Special case: The zero vector 0ₙ is orthogonal to every vector in Rⁿ, because 0ₙ · v = 0 for all v.

🔧 Properties of the dot product

The dot product has four key algebraic properties:

Property | Formula | Meaning
Symmetric | u · v = v · u | Order doesn't matter
Distributive | u · (v + w) = u · v + u · w | Distributes over addition
Bilinear | u · (cv + dw) = c(u · v) + d(u · w) and (cu + dw) · v = c(u · v) + d(w · v) | Linear in both arguments
Positive definite | u · u ≥ 0, and u · u = 0 only when u = 0 | Self-dot is always non-negative; zero only for the zero vector
  • These properties are what make the dot product an inner product.
  • The excerpt notes that other inner products exist; they are usually written ⟨u, v⟩ to avoid confusion with the standard dot product.

🌌 Beyond Euclidean geometry: other inner products

🌌 The Lorentzian inner product in special relativity

The excerpt introduces a non-Euclidean example:

Lorentzian inner product on R⁴: ⟨u, v⟩ = u₁v₁ + u₂v₂ + u₃v₃ − u₄v₄.

  • The fourth coordinate is "time"; the first three are spatial.
  • Not positive definite: the squared-length ‖v‖² = x² + y² + z² − t² can be zero or negative even for non-zero v.
  • Physical interpretation depends on the sign of ⟨X₁, X₂⟩ for two spacetime points:
    • If ⟨X₁, X₂⟩ ≥ 0, they are separated by a distance √⟨X₁, X₂⟩.
    • If ⟨X₁, X₂⟩ ≤ 0, they are separated by a time √(−⟨X₁, X₂⟩).
  • Don't confuse: the difference in time coordinates t₂ − t₁ is not the time between the two points (just as in polar coordinates, θ₂ − θ₁ is not the distance).
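
A minimal sketch of the Lorentzian inner product described above (plain Python); the specific spacetime points are made up for illustration:

```python
import math

def lorentz(u, v):
    # <u, v> = u1*v1 + u2*v2 + u3*v3 - u4*v4, with the 4th slot playing the role of time
    return u[0]*v[0] + u[1]*v[1] + u[2]*v[2] - u[3]*v[3]

X = (3, 0, 0, 1)
print(lorentz(X, X), math.sqrt(lorentz(X, X)))     # 8 > 0: a distance of sqrt(8)
Y = (1, 0, 0, 2)
print(lorentz(Y, Y), math.sqrt(-lorentz(Y, Y)))    # -3 < 0: a time of sqrt(3)
Z = (1, 0, 0, 1)
print(lorentz(Z, Z))                               # 0: a non-zero vector with zero "length"
```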

🔄 General inner products

  • An inner product is any operation ⟨ , ⟩ with the four properties listed above (symmetric, distributive, bilinear, positive definite).
  • Some contexts relax the positive definite requirement (as in the Lorentzian case).
  • Changing the inner product changes the notions of length and angle.

📊 Two fundamental inequalities

📊 Cauchy–Schwarz inequality

Cauchy–Schwarz inequality: For any non-zero vectors u and v with an inner product ⟨ , ⟩, |⟨u, v⟩| / (‖u‖‖v‖) ≤ 1.

  • In words: the absolute value of the inner product is at most the product of the lengths.
  • Equivalently, |⟨u, v⟩| ≤ ‖u‖‖v‖.
  • Why it holds: the easiest argument uses the fact that cos θ ≤ 1; the excerpt also gives an algebraic proof by considering the positive quadratic polynomial 0 ≤ ⟨u + αv, u + αv⟩ and finding its minimum.
  • Example: For a = (1, 2, 3, 4) and b = (4, 3, 2, 1), a · b = 20 and ‖a‖‖b‖ = 30, so 20 < 30 ✓.

📐 Triangle inequality

Triangle inequality: For any u, v ∈ Rⁿ, ‖u + v‖ ≤ ‖u‖ + ‖v‖.

  • In words: the length of the sum is at most the sum of the lengths (the direct path is shortest).
  • Proof sketch: expand ‖u + v‖² = ‖u‖² + ‖v‖² + 2‖u‖‖v‖ cos θ, note that cos θ ≤ 1, so ‖u + v‖² ≤ (‖u‖ + ‖v‖)².
  • Example: For a = (1, 2, 3, 4) and b = (4, 3, 2, 1), a + b = (5, 5, 5, 5), so ‖a + b‖² = 100 and (‖a‖ + ‖b‖)² = 120; indeed 100 < 120 ✓.
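
A short numerical check of both inequalities for the vectors a and b used above (plain Python):

```python
import math

a = (1, 2, 3, 4)
b = (4, 3, 2, 1)
dot = sum(x * y for x, y in zip(a, b))                    # 20
na = math.sqrt(sum(x * x for x in a))                     # sqrt(30)
nb = math.sqrt(sum(x * x for x in b))                     # sqrt(30)
nsum = math.sqrt(sum((x + y) ** 2 for x, y in zip(a, b))) # ||a + b|| = 10

print(abs(dot) <= na * nb)    # Cauchy-Schwarz: 20 <= 30 -> True
print(nsum <= na + nb)        # Triangle: 10 <= 2*sqrt(30) ~ 10.95 -> True
```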

🛒 Vectors as functions: the notation R^S

🛒 From shopping lists to functions

The excerpt uses a shopping list analogy:

  • A set S = {apple, orange, onion, milk, carrot} by itself is not a vector.
  • A list that assigns a number to each item (e.g., 5 apples, 3 oranges, …) is a function f : S → R.
  • This function is a vector in disguise.

🔢 n-vectors as functions on {1, …, n}

An n-vector can be thought of as a function whose domain is the set {1, …, n}.

  • Two equivalent notations:
    • Rⁿ = {column vectors (a₁, …, aₙ) | a₁, …, aₙ ∈ R}
    • Rⁿ = {a : {1, …, n} → R} =: R^{1,…,n}
  • When the domain is ordered (like {1, …, n}), we naturally write components in order.

🔤 General function spaces R^S

R^S: the set of all functions from a set S to R.

  • For any set S, R^S := {f : S → R}.
  • Example: Let S = {∗, ?, #}. A particular element of R^S is the function a defined by a(?) = 3, a(#) = 5, a(∗) = −2.
  • Don't confuse: when S has no natural ordering, writing components in a column might cause confusion—there is no canonical "first" element.

4.4 Vectors, Lists and Functions: R^S

🧭 Overview

🧠 One-sentence thesis

Vectors can be understood as functions from a set S to the real numbers, unifying the idea of ordered lists (like n-vectors) with more general "shopping list" assignments where the domain has no natural ordering.

📌 Key points (3–5)

  • What R^S means: the set of all functions from a set S to the real numbers, generalizing the idea of n-vectors.
  • Two equivalent views of n-vectors: either as ordered lists of n numbers or as functions from {1, ..., n} to R.
  • Key difference: when S has no natural ordering (like {∗, ?, #}), writing components in a fixed order can cause confusion, unlike R^n where indices 1, ..., n provide natural ordering.
  • Common confusion: sets vs. vectors—a set like {apple, orange, onion} is not a vector because you cannot add sets; assigning numbers to each element (a function) makes it a vector.
  • Why it matters: vector operations (addition, scalar multiplication) work on R^S just as they do on R^n, because they are really operations on functions.

🛒 From shopping lists to functions

🛒 Sets are not vectors

  • A simple shopping list like S = {apple, orange, onion, milk, carrot} is just a set.
  • Problem: there is no information about ordering or quantity, and you cannot add such sets to one another.
  • Example: if you have {apple, orange} and I have {orange, carrot}, what does "addition" mean?

🔢 Assigning numbers makes a vector

  • A more careful shopping list assigns a number to each item: "5 apples, 3 oranges, 2 onions, 1 milk, 4 carrots."
  • This is really a function f : S → R, where each element of S is mapped to a real number (the quantity).
  • Why this is a vector: given two such lists, you can add them element-by-element (5 apples + 3 apples = 8 apples).
  • The excerpt emphasizes: "the second list is really a 5-vector in disguise."

🔄 Two equivalent notations for n-vectors

🔄 Ordered lists vs. functions

The excerpt gives two equivalent definitions of R^n:

View | Notation | Meaning
Ordered list | R^n := column vectors with entries a₁, ..., aₙ in R | Traditional vector notation
Function | R^n = {a : {1, ..., n} → R} = R^{1,...,n} | Each n-vector is a function from {1, ..., n} to R
  • Key insight: thinking of an n-vector as a function whose domain is {1, ..., n} is equivalent to thinking of it as an ordered list of n numbers.
  • The set {1, ..., n} has a natural ordering, so writing components in order (a₁, a₂, ..., aₙ) is unambiguous.

📐 General definition of R^S

R^S := {f : S → R}, the set of all functions from S to R.

  • For any set S, R^S denotes the collection of all functions that assign a real number to each element of S.
  • When S = {1, ..., n}, we recover R^n.
  • When S is an arbitrary set (like a shopping list), we get a more general notion of "vector."

🔀 When ordering matters (and when it doesn't)

🔀 Unordered sets cause notation problems

  • Example from the excerpt: S = {∗, ?, #} has no natural ordering.
  • A function a ∈ R^S might be defined by a_? = 3, a_# = 5, a_∗ = −2.
  • Problem: it is not natural to write a as a column vector like (3, 5, −2) or (−2, 3, 5), because the elements of S do not have an ordering.
  • As sets, {∗, ?, #} = {?, #, ∗}, so any fixed ordering is arbitrary and might cause confusion.

🔀 Similarities to R^3

  • Despite the ordering issue, R^S behaves like R^3 in important ways:
    • You can add two elements of R^S.
    • You can multiply elements of R^S by scalars.
  • The excerpt notes: "What is more evident are the similarities."

➕ Vector operations on R^S

➕ Addition in R^S

Example from the excerpt: if a, b ∈ R^{∗,?,#} with

  • a_? = 3, a_# = 5, a_∗ = −2
  • b_? = −2, b_# = 4, b_∗ = 13

then a + b is the function defined by:

  • (a + b)_? = 3 − 2 = 1
  • (a + b)_# = 5 + 4 = 9
  • (a + b)_∗ = −2 + 13 = 11

How it works: add the values at each element of S separately, just like adding n-vectors component-by-component.

✖️ Scalar multiplication in R^S

Example from the excerpt: if a ∈ R^{∗,?,#} with a_? = 3, a_# = 5, a_∗ = −2, then 3a is the function:

  • (3a)_? = 3 · 3 = 9
  • (3a)_# = 3 · 5 = 15
  • (3a)_∗ = 3(−2) = −6

How it works: multiply the value at each element of S by the scalar, just like scalar multiplication for n-vectors.
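
Because an element of R^S is just a function on S, a Python dict keyed by the elements of S is a faithful informal model. This sketch reproduces the two examples above; the helper names vec_add and vec_scale are ad hoc:

```python
# Elements of R^S modeled as dicts from the (unordered) set S = {'*', '?', '#'} to numbers.
S = {'*', '?', '#'}

def vec_add(f, g):
    return {s: f[s] + g[s] for s in S}

def vec_scale(c, f):
    return {s: c * f[s] for s in S}

a = {'?': 3, '#': 5, '*': -2}
b = {'?': -2, '#': 4, '*': 13}
print(vec_add(a, b))      # {'?': 1, '#': 9, '*': 11}  (up to dict ordering)
print(vec_scale(3, a))    # {'?': 9, '#': 15, '*': -6}
```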

🌉 Bridging abstract and concrete

🌉 Visualization and abstraction

  • We visualize R² and R³ in terms of axes.
  • R⁴, R⁵, and R^n for larger n are "more abstract."
  • R^S seems "even more abstract" because the domain S may have no geometric structure.

🌉 Everyday objects as vectors

  • Key point from the excerpt: when thought of as a simple "shopping list," vectors in R^S can describe everyday objects.
  • This bridges the gap between abstract mathematics and practical applications.
  • The excerpt notes that chapter 5 will introduce "the general definition of a vector space that unifies all these different notions of a vector."

🌉 Don't confuse

  • Set vs. vector: a set S is not a vector; a function from S to R is a vector in R^S.
  • Ordering: R^n has a natural ordering (indices 1, ..., n), but R^S for arbitrary S may not; this affects notation but not the underlying operations.

4.5 Review Problems

🧭 Overview

🧠 One-sentence thesis

This collection of review problems applies vector operations, geometric interpretations, and the formal definition of vector spaces to concrete scenarios ranging from lawn-mowing economics to high-dimensional hyperplanes.

📌 Key points (3–5)

  • Dot product applications: The dot product can compute real-world quantities like total earnings by combining rates, areas, and frequencies.
  • Geometric angles in n-dimensions: The angle between a diagonal and coordinate axis changes systematically as dimension increases, with a limiting behavior as n approaches infinity.
  • Vector space axioms: A vector space requires closure under addition and scalar multiplication, plus eight additional properties (commutativity, associativity, zero element, inverses, distributivity, and unity).
  • Common confusion: Don't confuse the scalar product (number times vector → vector) with the dot product (vector times vector → number); they have different inputs and outputs.
  • High-dimensional geometry: Planes in R³ generalize to hyperplanes in higher dimensions using the same normal-vector equation structure.

🧮 Applied vector operations

💰 Economic interpretation of dot products

The lawn-mowing problem (Problem 1) shows how dot products encode real-world calculations:

  • Vector A lists lawn areas (in square feet).
  • Vector f lists mowing frequencies (times per year).
  • The dot product A · f computes total square footage mowed across all lawns.
  • To find earnings: multiply by the rate (5¢ per square foot), which can be expressed as a scalar multiple of the dot product.

Example: If one lawn is 200 sq ft mowed 20 times, it contributes 200 × 20 = 4000 sq ft to the total.

🔢 Variable rates extension

Problem 1(d) asks how to handle different rates for different customers:

  • Create a third vector r containing the rate for each customer.
  • Compute element-wise: sum over all customers of (area × frequency × rate).
  • This is equivalent to the dot product of A with the element-wise product of f and r.
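
A small sketch of the earnings calculation described above (plain Python). The specific areas, frequencies, and per-customer rates are made-up illustration values, not data from the problem:

```python
# Hypothetical data: areas (sq ft), mowing frequencies (per year), rates ($ per sq ft).
A = [200, 300, 150]
f = [20, 15, 25]
r = [0.05, 0.05, 0.04]      # 5 cents, 5 cents, 4 cents

total_sqft = sum(a * fi for a, fi in zip(A, f))                    # A . f
flat_rate_earnings = 0.05 * total_sqft                             # single 5-cent rate
variable_earnings = sum(a * fi * ri for a, fi, ri in zip(A, f, r)) # per-customer rates
print(total_sqft, flat_rate_earnings, variable_earnings)
```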

📐 Geometry in n-dimensions

📏 Diagonal-to-axis angles

Problem 2 explores how angles change with dimension:

  • In R²: the unit square diagonal makes a specific angle with each axis.
  • In R³: the unit cube diagonal makes a different (smaller) angle.
  • In Rⁿ: there's a general formula for this angle in terms of n.
  • As n → ∞: the angle approaches a limiting value (the problem asks students to find this limit).

Why it matters: This illustrates how geometric intuition from 2D/3D doesn't always extend to higher dimensions—angles behave differently as dimensionality increases.
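
A sketch of the limiting behavior, assuming the diagonal in question is (1, 1, …, 1) and the axis is a standard basis vector, in which case the angle is arccos(1/√n); that formula follows from the dot product and is an assumption about the problem's setup, not a quote from it:

```python
import math

def diagonal_axis_angle(n):
    # angle between (1, 1, ..., 1) and (1, 0, ..., 0) in R^n:
    # cos(theta) = 1 / (sqrt(n) * 1)
    return math.degrees(math.acos(1 / math.sqrt(n)))

for n in (2, 3, 10, 100, 10_000):
    print(n, round(diagonal_axis_angle(n), 2))
# 45.0, 54.74, 71.57, 84.26, 89.43 -> the angle approaches 90 degrees as n grows
```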

🔄 Matrix transformations

Problem 3 examines the rotation matrix M with cos θ and sin θ:

  • Part (a): visualizing how M transforms vectors geometrically.
  • Part (b): computing the ratio of lengths ||MX|| / ||X||.
  • Part (c): the result reveals that M preserves lengths (it's a rotation), showing the connection between algebraic and geometric properties.

⚡ Lorentzian geometry

Problem 4 introduces a non-standard inner product (Lorentzian):

  • Unlike the usual dot product, this can give zero length for non-zero vectors.
  • Students find and sketch the "light cone"—the set of all zero-length vectors in 2D and 3D Lorentzian space-time.
  • This is a preview of special relativity geometry, where the inner product structure differs from Euclidean space.

Don't confuse: Zero length in Lorentzian geometry doesn't mean the zero vector; it's a property of the non-Euclidean metric.

🏗️ High-dimensional structures

🌐 Hyperplanes in R¹⁰¹

Problems 5–6 extend plane equations to higher dimensions:

Dimension | Object | Equation form
R³ | Plane | n · (x, y, z) = n · p
R¹⁰¹ | Hyperplane | N · X = N · P
  • N is the normal vector (perpendicular to the hyperplane).
  • P is any point on the hyperplane.
  • X is the variable vector (x₁, x₂, ..., x₁₀₁).
  • A 99-dimensional hyperplane in R¹⁰¹ can be described by two independent normal equations.

🎯 Vector projections

Problem 7 asks for projections in R¹⁰¹:

  • The projection of v onto u finds the component of v in the u direction.
  • The hint reminds students that two vectors always define a plane, so the 2D projection formula applies.
  • This shows that some geometric operations work the same way regardless of the ambient dimension.

🔧 Solution sets and linear systems

📊 Parametric vs implicit descriptions

Problems 8–9 connect two ways to describe geometric objects:

Parametric form (Problem 9):

  • Start with a base point plus free parameters times direction vectors.
  • Example: p + c₁v₁ + c₂v₂ describes a 2-parameter family (a plane).

Implicit form (system of equations):

  • Constraints that solutions must satisfy.
  • General procedure: find vectors perpendicular to all direction vectors, then write normal equations.

🧩 Special solution properties

Problem 10 explores a key insight:

  • If both v and cv (for any scalar c) solve Ax = b, what does this tell us about b?
  • Since A(cv) = cA(v) = cb by linearity, while A(cv) must also equal b, we get cb = b for every scalar c.
  • This is only possible if b = 0 (the zero vector).
  • Implication: Non-trivial scaling of solutions only works for homogeneous systems.

🏛️ Vector space axioms

📋 The formal definition

A vector space requires ten properties organized into two groups:

Addition properties (+i through +v):

  • Closure: sum of vectors is a vector.
  • Commutativity: order doesn't matter.
  • Associativity: grouping doesn't matter.
  • Zero element: an identity for addition.
  • Inverses: every vector has an additive inverse.

Scalar multiplication properties (·i through ·v):

  • Closure: scalar times vector is a vector.
  • Distributivity over scalar addition: (c + d)·v = c·v + d·v.
  • Distributivity over vector addition: c·(u + v) = c·u + c·v.
  • Associativity: (cd)·v = c·(d·v).
  • Unity: 1·v = v.

⚠️ Notation distinctions

The excerpt emphasizes critical differences:

Operation | Notation | Input → Output | Meaning
Scalar product | · | (number, vector) → vector | Scaling
Dot product | · | (vector, vector) → number | Inner product
Addition | + | (vector, vector) → vector | Combining

Don't confuse: The scalar product c · v and dot product u · v use similar notation but are fundamentally different operations with different types.


5.1 Examples of Vector Spaces

🧭 Overview

🧠 One-sentence thesis

A vector space is any set closed under addition and scalar multiplication that satisfies ten axioms, and examples range from familiar spaces like R^n to function spaces, solution sets of homogeneous equations, and even spaces over different fields like complex numbers or bits.

📌 Key points (3–5)

  • What makes a vector space: a set V with addition and scalar multiplication satisfying ten properties (five for addition, five for scalar multiplication).
  • Function spaces are vector spaces: sets like R^N (infinite sequences), R^R (all real functions), and differentiable functions all form vector spaces under pointwise operations.
  • Solution sets to homogeneous equations: the set of solutions to Mx = 0 is always a vector space (called a subspace or kernel), but non-homogeneous equations fail to be vector spaces.
  • Common confusion: most sets of n-vectors are NOT vector spaces—breaking even one axiom disqualifies the set; for example, non-negative vectors fail scalar closure, and nowhere-zero functions fail additive closure.
  • Different base fields: vector spaces can be built over any field (real numbers R, complex numbers C, rationals Q, or bits B₂), changing which "scalars" are allowed.

📐 The Ten Axioms of a Vector Space

📐 Additive properties (five axioms)

A vector space (V, +, ·, R) is a set V with two operations + and · satisfying ten properties for all u, v in V and c, d in R.

The five addition axioms:

Axiom | Name | Statement | Plain language
(+i) | Additive Closure | u + v ∈ V | Adding two vectors gives a vector
(+ii) | Additive Commutativity | u + v = v + u | Order of addition does not matter
(+iii) | Additive Associativity | (u + v) + w = u + (v + w) | Order of adding many vectors does not matter
(+iv) | Zero | There exists 0_V ∈ V such that u + 0_V = u | A special zero vector exists
(+v) | Additive Inverse | For every u ∈ V there exists w ∈ V such that u + w = 0_V | Every vector has an opposite

📐 Scalar multiplication properties (five axioms)

The five scalar multiplication axioms:

Axiom | Name | Statement | Plain language
(·i) | Multiplicative Closure | c · v ∈ V | Scalar times a vector is a vector
(·ii) | Distributivity | (c + d) · v = c · v + d · v | Distributes over addition of scalars
(·iii) | Distributivity | c · (u + v) = c · u + c · v | Distributes over addition of vectors
(·iv) | Associativity | (cd) · v = c · (d · v) | Scalar multiplication is associative
(·v) | Unity | 1 · v = v | Multiplying by 1 does nothing

🔍 Notation clarifications

  • Shorthand: Instead of writing (V, +, ·, R), we say "let V be a vector space over R" or just "let V be a vector space" when the base field is obvious.
  • Don't confuse scalar product with dot product:
    • Scalar product · takes one number and one vector, returns a vector: · : R × V → V
    • Dot product takes two vectors, returns a number: · : V × V → R
  • After verifying axioms, we write cv instead of c · v for efficiency.

🔢 Function Spaces

🔢 Infinite sequences: R^N

R^N = {f | f : N → R}: the set of functions from natural numbers to real numbers.

  • Addition: (f₁ + f₂)(n) = f₁(n) + f₂(n)
  • Scalar multiplication: (c · f)(n) = c f(n)
  • Interpretation: think of these as infinitely long ordered lists of numbers.
  • Example: the function f(n) = n³ looks like the infinite column vector [1, 8, 27, ..., n³, ...].

Why it's a vector space (checking two axioms):

  • (+i) Additive Closure: (f₁ + f₂)(n) = f₁(n) + f₂(n) is indeed a function N → R, since the sum of two real numbers is a real number.
  • (+iv) Zero: The constant zero function g(n) = 0 works because f(n) + g(n) = f(n) + 0 = f(n).
  • The other axioms follow from properties of real numbers.

Important note: We cannot write explicit infinite lists; we use implicit definitions like f(n) = n³ or algebraic formulas.
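
Treating sequences as functions makes the pointwise operations easy to express. A minimal plain-Python sketch, modeling sequences as callables on the natural numbers:

```python
# Sequences in R^N modeled as Python functions from n to a real number.
def cubes(n):
    return n ** 3

def add_seq(f1, f2):
    return lambda n: f1(n) + f2(n)      # (f1 + f2)(n) = f1(n) + f2(n)

def scale_seq(c, f):
    return lambda n: c * f(n)           # (c . f)(n) = c * f(n)

g = add_seq(cubes, scale_seq(2, cubes))     # the sequence 3n^3
print([g(n) for n in range(1, 6)])          # [3, 24, 81, 192, 375]
```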

🔢 All real functions: R^R

R^R = {f | f : R → R}: the set of all functions from real numbers to real numbers.

  • Addition and scalar multiplication: pointwise, just like R^N.
  • Even more infinite: infinitely many components between any two components.
  • Most vectors cannot be defined algebraically.
  • Example of a non-algebraic function: the nowhere continuous function f(x) = 1 if x is rational, 0 if x is irrational.

🔢 Finite function spaces: R^S

R^{∗,?,#} = {f : {∗, ?, #} → R}: functions from a three-element set to real numbers.

  • This generalizes: R^S is a vector space for any set S.
  • Addition and scalar multiplication of functions show this is a vector space.

Common confusion: You might guess all vector spaces are of the form R^S for some set S, but the next example shows this is false.

🔢 Differentiable functions

{f : R → R | d/dx f exists}: the set of all differentiable functions.

  • Why it's a vector space:
    • The sum of two differentiable functions is differentiable (derivative distributes over addition).
    • A scalar multiple of a differentiable function is differentiable (derivative commutes with scalar multiplication: d/dx (cf) = c d/dx f).
    • The zero function is 0(x) = 0 for every x.
    • Other properties inherited from addition and scalar multiplication in R.
  • Similarly, functions with at least k derivatives, or infinitely many derivatives, are vector spaces.
  • Cannot be written as R^S for any set S.

🧩 Subspaces and Solution Sets

🧩 Solution sets to homogeneous equations

The solution set to Mx = 0 is always a vector space, called a subspace or the kernel of M.

Example: For the matrix M = [[1,1,1], [2,2,2], [3,3,3]], the solution set to Mx = 0 is:

  • {c₁[-1,1,0] + c₂[-1,0,1] | c₁, c₂ ∈ R}
  • This is not equal to R³ (e.g., it doesn't contain [1,0,0]).

Why it's a vector space (closure properties):

  • The sum of any two solutions is a solution.
  • Any scalar multiple of a solution is a solution.
  • Example: 2[-1,1,0] + 3[-1,0,1] plus 7[-1,1,0] + 5[-1,0,1] equals 9[-1,1,0] + 8[-1,0,1].

General proof of closure: If Mx₁ = 0 and Mx₂ = 0, then M(c₁x₁ + c₂x₂) = c₁Mx₁ + c₂Mx₂ = 0 + 0 = 0 (using linearity of matrix multiplication).

Subspace theorem: Based on closure properties alone, homogeneous solution sets are guaranteed to be vector spaces.

More generally: Any hyperplane through the origin of V is a vector space.
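
A quick numerical check of the closure argument for this particular M (plain Python); the combination coefficients c₁ and c₂ are arbitrary:

```python
M = [[1, 1, 1],
     [2, 2, 2],
     [3, 3, 3]]
v1 = [-1, 1, 0]
v2 = [-1, 0, 1]

def matvec(M, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

# Any combination c1*v1 + c2*v2 stays in the kernel of M.
c1, c2 = 2.0, -7.5
x = [c1 * a + c2 * b for a, b in zip(v1, v2)]
print(matvec(M, x))        # [0.0, 0.0, 0.0]
```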

🧩 Planes inside function spaces

Example: In R^R, consider f(x) = e^x and g(x) = e^(2x). By taking combinations:

  • {c₁f + c₂g | c₁, c₂ ∈ R} forms a plane inside R^R.
  • This is a vector space.
  • Examples of vectors in it: 4e^x - 31e^(2x), πe^(2x) - 4e^x, and (1/2)e^(2x).

Don't confuse: A hyperplane that does NOT contain the origin cannot be a vector space because it fails condition (+iv), the zero axiom.

🧩 Product spaces

If V and W are sets, their product is V × W = {(v, w) | v ∈ V, w ∈ W}: all ordered pairs of elements from V and W.

  • If V and W are vector spaces, then V × W is also a vector space.

Example: The real numbers R form a vector space. The product R × R = {(x, y) | x ∈ R, y ∈ R} has:

  • Addition: (x, y) + (x', y') = (x + x', y + y')
  • Scalar multiplication: c·(x, y) = (cx, cy)
  • This is just the vector space R² = R^{1,2}.

❌ Non-Examples (What Breaks the Rules)

❌ Non-homogeneous equations

The solution set to a linear non-homogeneous equation is NOT a vector space because it does not contain the zero vector and therefore fails axiom (+iv).

Example: The solution set to [[1,1],[0,0]] [x,y] = [1,0] is:

  • {[1,0] + c[-1,1] | c ∈ R}
  • The vector [0,0] is not in this set.

Key point: Breaking just one axiom disqualifies the set. Most sets of n-vectors are not vector spaces.

❌ Non-negative vectors

Example: P := {[a,b] | a, b ≥ 0} is NOT a vector space.

  • It fails (·i) multiplicative closure.
  • [1,1] ∈ P, but -2[1,1] = [-2,-2] ∉ P.

❌ Nowhere-zero functions

Example: {f : R → R | f(x) ≠ 0 for any x ∈ R} is NOT a vector space.

  • It fails (+i) additive closure.
  • f(x) = x² + 1 and g(x) = -5 are in the set.
  • But their sum (f + g)(x) = x² - 4 = (x+2)(x-2) is not, since (f + g)(2) = 0.

Lesson: Sets of functions other than those of the form R^S should be carefully checked for compliance with all ten axioms.

🌐 Other Base Fields

🌐 What is a field?

A field is a collection of "numbers" satisfying certain properties (listed in appendix B).

  • Above, we defined vector spaces over the real numbers R.
  • One can define vector spaces over any field (choosing a different base field).

🌐 Complex numbers: C

C = {x + iy | i² = -1, x, y ∈ R}: the complex numbers.

Example from quantum physics: Vector spaces over C describe all possible states a physical system can have.

  • V = {[λ, μ] | λ, μ ∈ C} is the set of possible states for an electron's spin.
  • [1,0] and [0,1] describe spin "up" and "down" along a given direction.
  • Other vectors like [i, -i] are permissible since the base field is C.
  • Such states represent a mixture of spin up and down (counterintuitive but experimentally verified), or a given spin in another direction.

Why complex numbers are useful: Every polynomial over C factors into a product of linear polynomials.

  • Example: x² + 1 doesn't factor over R, but over C it factors into (x + i)(x - i).
  • There are two solutions to x² = -1: x = i and x = -i.
  • This property has far-reaching consequences: problems difficult over R often become simpler over C (e.g., diagonalizing matrices).

🌐 Rational numbers: Q

  • The rationals Q are also a field.
  • Importance in computer algebra: A real number with infinite decimal digits can't be stored by a computer, so rational approximations are used.
  • Since Q is a field, the mathematics of vector spaces still applies.

🌐 Bits: B₂ = Z₂ = {0, 1}

Addition and multiplication rules:

+ | 0 | 1
0 | 0 | 1
1 | 1 | 0

× | 0 | 1
0 | 0 | 0
1 | 0 | 1
  • Summarized by the relation 2 = 0.
  • For bits, it follows that -1 = 1.
  • The theory of fields is typically covered in abstract algebra or Galois theory.

5.1.1 Non-Examples

🧭 Overview

🧠 One-sentence thesis

Most sets fail to be vector spaces because they violate at least one of the required properties, most commonly failing to contain the zero vector, lacking closure under scalar multiplication, or lacking closure under addition.

📌 Key points (3–5)

  • Non-homogeneous solution sets: Solution sets to non-homogeneous linear equations are not vector spaces because they do not contain the zero vector.
  • Scalar multiplication failure: A set can fail to be a vector space if multiplying a vector by a scalar produces something outside the set.
  • Addition closure failure: A set can fail if adding two vectors in the set produces a vector outside the set.
  • Common confusion: Just one broken rule is enough—if any single vector space property fails, the entire set is disqualified as a vector space.
  • Most sets are not vector spaces: The majority of sets of n-vectors or functions do not satisfy all the required properties.

❌ Non-homogeneous equations

❌ Why they fail

The solution set to a linear non-homogeneous equation is not a vector space because it does not contain the zero vector and therefore fails (iv).

  • A non-homogeneous equation has a non-zero constant term on the right-hand side.
  • The zero vector cannot satisfy such an equation, so it is not in the solution set.
  • Property (iv) requires that the zero vector must be in any vector space.

🔢 Concrete example

The excerpt gives the equation:

  • The 2×2 matrix with rows (1, 1) and (0, 0), applied to the vector (x, y), must equal (1, 0); in other words, x + y = 1.
  • The solution set is: {(1 0) + c(−1 1) | c ∈ ℝ}.
  • The zero vector (0 0) is not in this set.
  • Example: every vector in the set has the form (1 − c, c); for this to equal (0, 0) we would need c = 1 and c = 0 at the same time, which is impossible.

Don't confuse: Homogeneous equations (right-hand side = 0) do form vector spaces; non-homogeneous equations (right-hand side ≠ 0) do not.

🚫 Scalar multiplication failures

🚫 Restricted sign example

The excerpt defines:

  • P := {(a b) | a, b ≥ 0} (vectors with non-negative components).
  • This set is not a vector space.

🔍 Why it fails

  • The set fails property (·i): if a vector is in the set, then any scalar multiple should also be in the set.
  • The vector (1 1) is in P because both components are non-negative.
  • But −2 times (1 1) equals (−2 −2), which has negative components.
  • Therefore (−2 −2) is not in P, violating closure under scalar multiplication.

Key insight: Restricting vectors to non-negative components breaks scalar multiplication closure because negative scalars produce vectors outside the set.

➕ Addition closure failures

➕ Nowhere-zero functions

The excerpt considers:

  • The set of all functions from ℝ to ℝ that are nowhere zero: {f : ℝ → ℝ | f(x) ≠ 0 for any x ∈ ℝ}.
  • This set does not form a vector space.

🔍 Why it fails

  • The set fails property (+i): adding two vectors in the set should produce another vector in the set.
  • The function f(x) = x² + 1 is in the set (always positive, never zero).
  • The function g(x) = −5 is in the set (always −5, never zero).
  • Their sum is (f + g)(x) = x² + 1 − 5 = x² − 4 = (x + 2)(x − 2).
  • This sum equals zero at x = 2, so (f + g)(2) = 0.
  • Therefore f + g is not in the set, violating closure under addition.

Key insight: Even though both functions individually avoid zero, their sum can hit zero, breaking closure.

🎯 General principles

🎯 One violation is enough

  • The excerpt emphasizes: "if just one of the vector space rules is broken, the example is not a vector space."
  • You do not need to check all properties; finding one failure is sufficient to disqualify a set.

🎯 Most sets fail

  • The excerpt states: "Most sets of n-vectors are not vector spaces."
  • Vector spaces require all properties to hold simultaneously, which is a strong constraint.
  • Sets of functions should be "carefully checked for compliance with the definition of a vector space."

🎯 Summary table

Example | What fails | Why
Non-homogeneous solution set | Property (iv): zero vector | Zero vector does not satisfy the equation
Non-negative vectors P | Property (·i): scalar multiplication | Negative scalars produce vectors outside the set
Nowhere-zero functions | Property (+i): addition | Sum of two nowhere-zero functions can be zero

5.2 Other Fields

🧭 Overview

🧠 One-sentence thesis

Vector spaces can be defined over any field (not just the real numbers), and choosing different base fields—such as complex numbers, rationals, or bits—enables different mathematical applications while preserving vector space structure.

📌 Key points (3–5)

  • What a field is: a collection of "numbers" satisfying certain properties (detailed in appendix B); examples include real numbers ℝ, complex numbers ℂ, rationals ℚ, and bits B₂.
  • Base field flexibility: the definition of vector spaces works over any field, not just ℝ; this is called "choosing a different base field."
  • Complex numbers ℂ: every polynomial over ℂ factors into linear polynomials (e.g., x² + 1 = (x + i)(x − i)), making many problems simpler; used in quantum physics for describing physical states.
  • Rationals ℚ and bits B₂: ℚ is important for computer algebra (computers can't store infinite decimals); B₂ = {0, 1} with addition and multiplication mod 2 (so 2 = 0 and −1 = 1).
  • Common confusion: the scalars come from the base field—if the base field is ℂ, then scalar multiplication uses complex numbers; if it's B₂, scalars are only 0 and 1.

🔢 What is a field and why it matters

🔢 Definition of a field

A field is a collection of "numbers" satisfying properties listed in appendix B.

  • The excerpt does not list the properties explicitly but refers to an appendix.
  • Fields generalize the idea of "numbers you can add, subtract, multiply, and divide (except by zero)."
  • Why it matters: once you have a field, you can build vector spaces over it using the same axioms.

🌐 Base field

  • The base field is the set from which scalars are drawn.
  • In earlier sections, vector spaces were defined "over the real numbers" (base field = ℝ).
  • Changing the base field changes which scalars are allowed in scalar multiplication, but the vector space axioms remain the same.

🧮 Examples of fields

🧮 Complex numbers ℂ

ℂ = {x + iy | i² = −1, x, y ∈ ℝ}

  • Special property: every polynomial over ℂ factors into linear polynomials.
    • Example: x² + 1 does not factor over ℝ, but over ℂ it factors as (x + i)(x − i).
    • This means x² = −1 has two solutions: x = i and x = −i.
  • Why it matters: problems that are difficult over ℝ often become simpler over ℂ; this is important for diagonalizing matrices (chapter 13).
  • Application in quantum physics (Example 68):
    • Vector spaces over ℂ describe all possible states of a physical system.
    • V = {(λ, μ) | λ, μ ∈ ℂ} represents possible states for an electron's spin.
    • (1, 0) = spin "up"; (0, 1) = spin "down"; (i, −i) = a mixture (spin in another direction).
    • Vectors like (i, −i) are allowed because the base field is ℂ, so scalars can be complex.

🧮 Rational numbers ℚ

  • ℚ is the field of fractions (ratios of integers).
  • Why it matters for computers: a real number with an infinite decimal expansion cannot be stored exactly by a computer.
  • Rational approximations are used instead.
  • Since ℚ is a field, the mathematics of vector spaces still applies.

🧮 Bits B₂ = Z₂ = {0, 1}

  • Addition and multiplication tables:
+ | 0 | 1
0 | 0 | 1
1 | 1 | 0

× | 0 | 1
0 | 0 | 0
1 | 0 | 1
  • Key relation: 2 = 0 (because 1 + 1 = 0 in this field).
  • Consequence: −1 = 1 (since 1 + 1 = 0, the number 1 is its own additive inverse).
  • This field is useful in coding theory and computer science.
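
Since B₂ is just arithmetic mod 2, a few lines of plain Python reproduce the tables and the relations 2 = 0 and −1 = 1:

```python
# Arithmetic in B2 = {0, 1} is addition and multiplication mod 2.
add = lambda a, b: (a + b) % 2
mul = lambda a, b: (a * b) % 2

print([[add(a, b) for b in (0, 1)] for a in (0, 1)])   # [[0, 1], [1, 0]]
print([[mul(a, b) for b in (0, 1)] for a in (0, 1)])   # [[0, 0], [0, 1]]
print(add(1, 1))      # 1 + 1 = 0, i.e. 2 = 0 in B2
print((-1) % 2)       # 1, i.e. -1 = 1 in B2
```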

🔍 How base field affects vector spaces

🔍 Scalar multiplication depends on the base field

  • If the base field is ℝ, scalars are real numbers.
  • If the base field is ℂ, scalars are complex numbers—so vectors like (i, −i) are valid.
  • If the base field is B₂, scalars are only 0 and 1—scalar multiplication is very restricted.

🔍 Don't confuse: same vector space structure, different scalars

  • The vector space axioms (closure, associativity, zero vector, etc.) are the same regardless of base field.
  • What changes is which scalars you can use and which vectors are in the space.
  • Example: over ℝ, the vector (i, −i) is not in ℝ²; over ℂ, it is in ℂ².

🧪 Why different fields are useful

🧪 Complex numbers simplify problems

  • Polynomials always factor over ℂ.
  • Many mathematical problems become "relatively simple" when working over ℂ instead of ℝ.
  • Example application: diagonalizing matrices (chapter 13).

🧪 Rationals for computation

  • Computers cannot store arbitrary real numbers (infinite decimals).
  • Using ℚ as the base field allows exact symbolic computation with fractions.
  • The field structure ensures vector space operations remain valid.

🧪 Bits for discrete systems

  • B₂ is useful in coding theory, cryptography, and digital systems.
  • The relation 2 = 0 and −1 = 1 reflect modular arithmetic (mod 2).

🧪 Further study

  • The theory of fields is covered in abstract algebra or Galois theory courses.

5.3 Review Problems

🧭 Overview

🧠 One-sentence thesis

These review problems reinforce the definition of a vector space by asking students to verify axioms for various sets, propose operations, and explore how different base fields affect vector space structure.

📌 Key points (3–5)

  • Core task: verify that candidate sets satisfy all parts of the vector space definition, including addition, scalar multiplication, zero vector, and additive inverses.
  • Base field matters: the same set can behave differently as a vector space over R versus over C.
  • Common confusion: not every natural set is a vector space—divergent sequences fail closure, while convergent sequences succeed.
  • Function spaces: sets of functions (like sequences or functions from finite sets) form vector spaces when addition and scalar multiplication are defined pointwise.
  • Non-standard operations: vector spaces can use unusual definitions of "addition" and "scalar multiplication" (like multiplication and exponentiation) as long as all axioms hold.

✅ Verification problems

✅ Standard Euclidean space

Problem 1 asks students to check that R² with usual operations satisfies all vector space axioms.

  • This is the most familiar example: vectors are ordered pairs of real numbers.
  • Addition is componentwise; scalar multiplication scales each component.
  • The exercise builds fluency with the axiom list.

✅ Complex numbers as a vector space

Problem 2 explores C = {x + iy | i² = −1, x, y ∈ R}.

  • (a) Over C as base field: addition is complex addition; scalar multiplication is complex multiplication.
  • (b) Over R as base field: the excerpt hints at comparing with problem 1, suggesting students explore what changes when the scalars are restricted to real numbers.

🔍 Subsets and closure

🔍 Convergent vs divergent sequences

Problem 3 contrasts two subsets of R^N (the space of all sequences):

  • (a) Convergent sequences: V = {f | f: N → R, lim_{n→∞} f(n) ∈ R}.
    • Students must check whether this subset is closed under addition and scalar multiplication.
    • If the sum of two convergent sequences converges, and a scalar multiple of a convergent sequence converges, then this is a vector space.
  • (b) Divergent sequences: V = {f | f: N → R, lim does not exist or is ±∞}.
    • Don't confuse: divergent sequences are not closed under addition—two divergent sequences can sum to a convergent sequence.
    • Example: f(n) = n and g(n) = −n both diverge, but f + g = 0 converges.

🧩 Proposing operations

🧩 Matrices

Problem 4: the set of 2×4 matrices with complex entries.

  • Students must propose componentwise addition and scalar multiplication.
  • Zero vector: the matrix with all entries zero.
  • Additive inverse: negate every entry.

🧩 Polynomials

Problem 5: P^R_3 = polynomials with real coefficients of degree ≤ 3.

  • (a) Addition: add coefficients; scalar multiplication: multiply each coefficient.
  • (b) Zero vector: the zero polynomial; additive inverse of −3 − 2x + x² is 3 + 2x − x².
  • (c) Over C: the excerpt notes that P^R_3 is not a vector space over C because the coefficients are restricted to R, but scalars would be in C. A small change: allow complex coefficients to make it a vector space over C.

🎲 Non-standard examples

🎲 Positive reals with exotic operations

Problem 6: V = R⁺ = {x ∈ R | x > 0}.

  • Define x ⊕ y = xy (multiplication as "addition").
  • Define λ ⊗ x = x^λ (exponentiation as "scalar multiplication").
  • Students verify all vector space axioms hold with these unusual operations.
  • Zero vector: the number 1 (since 1 · y = y).
  • Additive inverse of x: 1/x (since x · (1/x) = 1).
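
A small plain-Python sketch that spot-checks a few of the axioms for these exotic operations; math.isclose guards against floating-point noise, and the test values are arbitrary:

```python
import math

plus = lambda x, y: x * y            # "addition" on R+ is ordinary multiplication
times = lambda lam, x: x ** lam      # "scalar multiplication" is exponentiation

x, y, lam, mu = 2.0, 5.0, 3.0, -0.5
print(math.isclose(plus(x, y), plus(y, x)))             # commutativity of (+)
print(math.isclose(plus(x, 1.0), x))                    # 1 acts as the zero vector
print(math.isclose(plus(x, 1.0 / x), 1.0))              # 1/x is the additive inverse
print(math.isclose(times(lam + mu, x),
                   plus(times(lam, x), times(mu, x))))  # (lam+mu).x = lam.x (+) mu.x
```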

🎲 Matrices as functions

Problem 7: a 2×2 matrix has entries m_{ij} for i, j ∈ {1, 2}.

  • This is equivalent to a function from the set S = {1, 2} × {1, 2} to R.
  • Generalization: an m×n matrix corresponds to R^S where S = {1,…,m} × {1,…,n}.

🧮 Function spaces

🧮 Functions from finite sets

Problem 8: R^{∗,?,#} = functions from the three-element set {∗, ?, #} to R.

  • The excerpt defines three special functions e_∗, e_?, e_# that output 1 on one symbol and 0 on the others.
  • Any function f in this space can be written as f = f(∗)e_∗ + f(?)e_? + f(#)e_#.
  • These are analogous to basis vectors in R³.

🧮 General function spaces

Problem 9: if V is a vector space and S is any set, then V^S (all functions S → V) is a vector space.

  • Addition rule: (f + g)(s) = f(s) + g(s) for all s ∈ S (pointwise addition in V).
  • Scalar multiplication: (λf)(s) = λ · f(s).
  • This generalizes sequences (S = N) and finite function spaces.

6.1 The Consequence of Linearity

🧭 Overview

🧠 One-sentence thesis

Linear functions are remarkably simple because, despite potentially infinite domains, they are completely specified by a very small amount of information—namely, their action on a basis.

📌 Key points (3–5)

  • Core advantage of linearity: A linear function on an n-dimensional space is fully determined by its values on just n carefully chosen input vectors, even though the domain contains infinitely many vectors.
  • How it works: Linearity (additivity + homogeneity) lets you compute the output for any vector by combining the outputs of basis vectors.
  • Matrix representation: For linear functions on R^n, knowing the action on the n standard basis vectors immediately gives the matrix form.
  • Common confusion: Not all linear functions can be written as square matrices—when the domain is a subspace (like a hyperplane), the natural matrix representation may have different dimensions or require solving a system.
  • Two key properties: Linear functions are special because (1) they act on vector spaces and (2) they respect addition and scalar multiplication.

🔑 Why linear functions are simple

🔑 Infinite outputs from finite information

A linear function L : V → W satisfies L(ru + sv) = rL(u) + sL(v) for all vectors u, v and scalars r, s.

  • The key insight: Even though a general real function requires specifying one output for every input (infinite information), a linear function needs far less.
  • Homogeneity in action: If you know L maps one vector to a particular output, you automatically know how it maps all scalar multiples of that vector.
    • Example: If L((1, 0)) = (5, 3), then by homogeneity L(5·(1, 0)) = 5·L((1, 0)) = 5·(5, 3) = (25, 15).
    • This single piece of information determines infinitely many outputs.

📐 Complete specification in R^2

  • Two outputs determine everything: For a linear function on R^2, knowing the outputs for (1, 0) and (0, 1) is sufficient.
  • Why this works: Every vector in R^2 can be written as (x, y) = x·(1, 0) + y·(0, 1).
  • Computing any output: By additivity and homogeneity,
    • L((x, y)) = L(x·(1, 0) + y·(0, 1)) = x·L((1, 0)) + y·L((0, 1))
    • Example: If L((1, 0)) = (5, 3) and L((0, 1)) = (2, 2), then L((x, y)) = x·(5, 3) + y·(2, 2) = (5x + 2y, 3x + 2y).
  • Matrix connection: The function acts exactly like the matrix with columns [5, 3] and [2, 2].
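
A plain-Python sketch of the "two outputs determine everything" idea: L is reconstructed purely from its values on the standard basis vectors given above.

```python
# A linear map on R^2 is pinned down by its values on (1, 0) and (0, 1).
L_e1 = (5, 3)      # L((1, 0))
L_e2 = (2, 2)      # L((0, 1))

def L(x, y):
    # additivity + homogeneity: L((x, y)) = x*L(e1) + y*L(e2)
    return (x * L_e1[0] + y * L_e2[0], x * L_e1[1] + y * L_e2[1])

print(L(1, 1))     # (7, 5)
print(L(5, 0))     # (25, 15), matching homogeneity applied to L((1, 0))
print(L(2, -3))    # (5*2 + 2*(-3), 3*2 + 2*(-3)) = (4, 0)
```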

🧮 General pattern for R^n

🧮 Standard basis vectors

  • The n-vector pattern: A linear transformation on R^n is completely specified by its action on the n standard basis vectors.
    • For R^3: the three vectors (1, 0, 0), (0, 1, 0), (0, 0, 1).
    • For R^n: the n vectors with exactly one non-zero component (equal to 1).

📋 Reading off the matrix

  • Once you know how L acts on each standard basis vector, you can immediately write the matrix form.
  • The outputs on the standard basis vectors become the columns of the matrix.
  • Don't confuse: This straightforward matrix representation works cleanly when the domain is R^n itself; other domains require more care.

⚠️ When domains are not R^n

⚠️ Linear functions on hyperplanes

  • The complication: Not all linear functions have R^n as their domain; some are defined only on subspaces like hyperplanes.
  • Example setup: Let V be the plane spanned by (1, 1, 0) and (0, 1, 1) in R^3, and let L : V → R^3 be linear with:
    • L((1, 1, 0)) = (0, 1, 0)
    • L((0, 1, 1)) = (0, 1, 0)
  • Computing outputs: For any vector c₁·(1, 1, 0) + c₂·(0, 1, 1) in V, linearity gives L(c₁·(1, 1, 0) + c₂·(0, 1, 1)) = (c₁ + c₂)·(0, 1, 0).
  • The range is just the line through the origin in the x₂ direction.

🚫 Matrix representation challenges

  • The ambiguity: It's not immediately clear how to write L as a matrix.
  • Why 3×3 doesn't work: You might try to write L as a 3×3 matrix acting on (c₁, c₁ + c₂, c₂), but:
    • All 3×3 matrices naturally have R^3 as their domain (by the natural domain convention).
    • The domain of L is smaller—only the plane V, not all of R^3.
  • The resolution: When L is eventually realized as a matrix, it will be a 3×2 matrix (not 3×3), reflecting the two-dimensional domain.
  • Don't confuse: Matrix size reflects the dimension of the domain and codomain, not just the ambient space where vectors happen to live.

6.2 Linear Functions on Hyperplanes

🧭 Overview

🧠 One-sentence thesis

Linear operators on hyperplanes require careful matrix representation because their domain is smaller than the full coordinate space, and the matrix representation depends on how you label points in that subspace.

📌 Key points (3–5)

  • Why linear operators are special: they are completely specified by a finite amount of information (outputs at a few basis vectors), even though they act on infinitely many inputs.
  • How many outputs specify a linear function: in R^n, knowing the output at n linearly independent vectors determines the entire function through additivity and homogeneity.
  • Hyperplane domains complicate matrix form: when the domain is a hyperplane (a subspace smaller than R^n), the natural 3×3 matrix representation does not work because the domain is restricted.
  • Common confusion: a linear function on a hyperplane in R^3 is not the same as a 3×3 matrix acting on all of R^3; the matrix size depends on the dimension of the domain and codomain, not the ambient space.
  • Key requirement for matrix representation: you must specify both the matrix and the coordinate system (labeling scheme) used to represent points in the hyperplane.

🎯 Why linear operators are simple

🔢 Finite information specifies infinite outputs

A linear function is completely specified by a very small amount of information, even though it can have infinitely many elements in its domain.

  • Contrast with general functions: a real function of one variable requires specifying one output for each input—an infinite amount of information.
  • Linear functions exploit structure: because they obey additivity and homogeneity, knowing a few outputs determines all others.

🧮 Homogeneity extends one output to infinitely many

  • If L is linear and you know L applied to vector (1, 0) equals (5, 3), then you can compute L applied to (5, 0) by homogeneity.
  • By homogeneity: L applied to (5, 0) equals L applied to 5 times (1, 0), which equals 5 times L applied to (1, 0), which equals 5 times (5, 3) = (25, 15).
  • One piece of information (the output at one vector) determines the output at infinitely many scalar multiples.

➕ Additivity combines outputs to cover the entire domain

  • If L is linear and you know L applied to (1, 0) equals (5, 3) and L applied to (0, 1) equals (2, 2), you can compute L applied to (1, 1).
  • By additivity: L applied to (1, 1) equals L applied to (1, 0) plus (0, 1), which equals L applied to (1, 0) plus L applied to (0, 1), which equals (5, 3) plus (2, 2) = (7, 5).
  • In R^2: every vector (x, y) can be written as x times (1, 0) plus y times (0, 1), so two outputs determine the entire function.
  • Example: L applied to (x, y) equals x times (5, 3) plus y times (2, 2) = (5x + 2y, 3x + 2y).
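
A minimal NumPy sketch (not from the text) of this idea: record the two given outputs L(1, 0) = (5, 3) and L(0, 1) = (2, 2) as the columns of a matrix, and linearity recovers L on every input.

```python
import numpy as np

# Outputs of L on the standard basis, taken from the example above.
L_e1 = np.array([5, 3])   # L(1, 0)
L_e2 = np.array([2, 2])   # L(0, 1)

# The columns of the matrix are the images of the standard basis vectors.
M = np.column_stack([L_e1, L_e2])

def L(x, y):
    """Apply L by linearity: L(x, y) = x*L(e1) + y*L(e2)."""
    return M @ np.array([x, y])

# Matches the closed form (5x + 2y, 3x + 2y).
assert np.array_equal(L(1, 1), np.array([7, 5]))
print(L(2, -3))  # [4 0]
```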

🔑 Two characteristics make linear functions simple

  1. They act on vector spaces: the domain has structure (addition and scalar multiplication).
  2. They act additively and homogeneously: L applied to (u + v) equals L applied to u plus L applied to v, and L applied to (c times u) equals c times L applied to u.
  • In R^n: a linear transformation is completely specified by its action on n standard basis vectors (vectors with exactly one non-zero component equal to 1).
  • The matrix form can be read directly from this information.

🛤️ Linear functions on hyperplanes

🌐 What is a hyperplane domain

  • Not all linear functions have nice domains like R^n.
  • A hyperplane is a subspace smaller than the full coordinate space.
  • Example: V is the set of all vectors of the form c₁ times (1, 1, 0) plus c₂ times (0, 1, 1), where c₁ and c₂ are real numbers.
  • This V is a plane (2-dimensional subspace) inside R^3.

🧩 Specifying the function on a hyperplane

Consider L mapping V to R^3, where:

  • L applied to (1, 1, 0) equals (0, 1, 0).
  • L applied to (0, 1, 1) equals (0, 1, 0).

By linearity, for any vector in V:

  • L applied to c₁ times (1, 1, 0) plus c₂ times (0, 1, 1) equals (c₁ + c₂) times (0, 1, 0).
  • The domain is a plane; the range is a line through the origin in the x₂ direction.

⚠️ Why 3×3 matrices don't work

  • It is not clear how to write L as a 3×3 matrix.
  • You might try to write L applied to (c₁, c₁ + c₂, c₂) as a 3×3 matrix times that vector, but this does not work.
  • Reason: by the natural domain convention, all 3×3 matrices have R^3 as their domain, but the domain of L is smaller (a 2-dimensional plane).
  • Don't confuse: a linear function on a hyperplane in R^3 with a linear function on all of R^3.

📐 Correct matrix representation: 3×2 matrix

  • The domain of L is 2-dimensional (a plane), and the codomain is 3-dimensional (R^3).
  • Therefore, L should be represented as a 3×2 matrix.
  • Rewrite: L applied to c₁ times (1, 1, 0) plus c₂ times (0, 1, 1) equals c₁ times (0, 1, 0) plus c₂ times (0, 1, 0).
  • In matrix form: L applied to the vector equals the 3×2 matrix with columns (0, 1, 0) and (0, 1, 0) times the column vector (c₁, c₂).

| Representation | Matrix size | Domain | Codomain |
| --- | --- | --- | --- |
| Incorrect 3×3 | 3×3 | R^3 (too large) | R^3 |
| Correct 3×2 | 3×2 | 2-dimensional plane V | R^3 |
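
A small NumPy sketch (mine, not the excerpt's) checking that the 3×2 matrix with columns (0, 1, 0) and (0, 1, 0), fed the labels (c₁, c₂), reproduces L(c₁b₁ + c₂b₂) = (c₁ + c₂)(0, 1, 0).

```python
import numpy as np

b1 = np.array([1, 1, 0])
b2 = np.array([0, 1, 1])

# 3x2 matrix whose columns are L(b1) and L(b2), both equal to (0, 1, 0).
M = np.column_stack([[0, 1, 0], [0, 1, 0]])

c1, c2 = 2.0, -5.0
v = c1 * b1 + c2 * b2            # a point of the plane V, written in R^3
image = M @ np.array([c1, c2])   # L applied to the label (c1, c2)

assert np.allclose(image, (c1 + c2) * np.array([0, 1, 0]))
print(M.shape, image)  # (3, 2) [ 0. -3.  0.]
```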

🚨 Critical warning

The matrix specifies L only if you also provide the information that you are labeling points in the plane V by the two numbers (c₁, c₂).

  • The matrix alone is not enough; you must specify the coordinate system (how points in V are represented by pairs of numbers).
  • The same linear function can have different matrix representations depending on the choice of coordinates for the hyperplane.

🧮 Linear differential operators

📚 Derivatives as linear operators

  • The derivative operator is linear, which simplifies calculus.
  • Instead of using the limit definition every time, you use linearity plus a few basic rules.

🔬 Example: derivatives of polynomials

Let V be the vector space of polynomials of degree 2 or less:

  • V consists of all polynomials a₀ times 1 plus a₁ times x plus a₂ times x².

The derivative operator d/dx maps V to V. Three equations, along with linearity, determine the derivative of any second-degree polynomial:

  • d/dx applied to 1 equals 0.
  • d/dx applied to x equals 1.
  • d/dx applied to x² equals 2x.

By linearity:

  • d/dx applied to (a₀ times 1 plus a₁ times x plus a₂ times x²) equals a₀ times (d/dx applied to 1) plus a₁ times (d/dx applied to x) plus a₂ times (d/dx applied to x²).
  • This equals a₀·0 + a₁·1 + a₂·2x = a₁ + 2a₂x.

Result: the derivative acting on infinitely many second-order polynomials is determined by its action on just three inputs (1, x, x²).
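
A tiny Python sketch (not from the text) of the same point: represent a₀ + a₁x + a₂x² by its coefficient triple, and differentiate using nothing but the three rules above plus linearity.

```python
def ddx(coeffs):
    """Derivative of a0 + a1*x + a2*x^2, using only d/dx(1) = 0,
    d/dx(x) = 1, d/dx(x^2) = 2x and linearity."""
    a0, a1, a2 = coeffs
    return (a1, 2 * a2, 0)   # constant term a1, x-term 2*a2, no x^2 term

print(ddx((5, 3, 7)))  # (3, 14, 0), i.e. the polynomial 3 + 14x
```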


6.3 Linear Differential Operators

🧭 Overview

🧠 One-sentence thesis

Linear differential operators, like the derivative, exploit linearity to determine their action on infinitely many inputs by specifying their behavior on just a few key vectors or functions.

📌 Key points (3–5)

  • Core principle: A linear operator is completely determined by how it acts on a small set of "building block" vectors or functions.
  • Differential operators as linear maps: The derivative operator on polynomials is linear, so knowing derivatives of a few basic polynomials (like 1, x, x²) determines all polynomial derivatives.
  • Matrix representation challenges: Not all linear functions have straightforward matrix forms—domains that are subspaces (like hyperplanes) require careful specification of coordinate systems.
  • Common confusion: A matrix alone doesn't fully specify a linear operator; you must also know how the domain is being labeled or parameterized.
  • Basis flexibility: Many different sets of vectors can serve as "building blocks" (bases) for the same space, giving freedom in how to represent linear operators.

🔧 Linear operators on restricted domains

🔧 Hyperplanes as domains

The excerpt shows that when a linear function's domain is a hyperplane (a plane through the origin), writing it as a matrix requires extra care.

  • Example scenario: A linear function L maps a 2-dimensional plane V inside R³ to R³.
  • The plane V is described by all vectors of the form c₁(1,1,0) + c₂(0,1,1), where c₁ and c₂ are real numbers.
  • L is specified by how it acts on the two "direction vectors" (1,1,0) and (0,1,1).

📐 Matrix representation pitfalls

Warning: A matrix specifies a linear operator only when you also provide the information about how points in the domain are labeled by coordinates.

  • The excerpt shows that L can be written as a 3×2 matrix (0,0; 1,1; 0,0) acting on the coordinate pair (c₁, c₂).
  • Why 3×2? The domain is 2-dimensional (the plane V) and the codomain is 3-dimensional (R³).
  • Don't confuse: A 3×3 matrix would have domain R³ by convention, but L's domain is smaller—only the plane V.
  • The matrix form only makes sense when paired with the coordinate system (c₁, c₂) for the plane.

📏 The derivative as a linear operator

📏 Polynomial differentiation

The excerpt uses the derivative operator d/dx on polynomials of degree 2 or less as a key example.

Domain: V := {a₀·1 + a₁x + a₂x² | a₀, a₁, a₂ ∈ R}, the vector space of polynomials of degree ≤ 2.

  • The derivative operator d/dx maps V to itself (V → V).
  • It is a linear operator, meaning it respects addition and scalar multiplication.

🔑 Three equations determine everything

The excerpt emphasizes that knowing just three derivative values determines the derivative of any second-degree polynomial:

| Input | Derivative |
| --- | --- |
| 1 | 0 |
| x | 1 |
| x² | 2x |

  • How it works: For any polynomial a₀·1 + a₁x + a₂x², linearity gives:
    • d/dx(a₀·1 + a₁x + a₂x²) = a₀·(d/dx 1) + a₁·(d/dx x) + a₂·(d/dx x²)
    • = a₀·0 + a₁·1 + a₂·2x = a₁ + 2a₂x
  • Why this matters: Instead of using the limit definition for each polynomial, you apply linearity and three simple rules.
  • Example: The derivative of 5 + 3x + 7x² is 0 + 3 + 14x = 3 + 14x, computed purely by linearity.

🌟 Infinitely many inputs, finitely many rules

  • The vector space V contains infinitely many polynomials (one for each choice of a₀, a₁, a₂).
  • Yet the derivative operator is completely determined by its action on just three inputs: 1, x, and x².
  • This is the "hidden simplicity" the excerpt refers to—linear functions compress infinite information into finite specifications.

🧩 Bases and complete specification

🧩 What makes a basis

The excerpt introduces the idea that different sets of vectors can serve as "building blocks" for a space.

Basis for R²: A pair of vectors such that any vector in R² can be expressed as a linear combination of them.

  • The standard example: (1,0) and (0,1).
  • Another valid basis: (1,1) and (1,−1).
  • Key property: When used as columns of a matrix, a basis gives an invertible matrix.

🔄 Example with a non-standard basis

The excerpt walks through how a linear operator L on R² is completely specified by its values on (1,1) and (1,−1):

  • Given: L(1,1) = (2,4) and L(1,−1) = (6,8).
  • Step 1: Express any vector (x,y) as a combination of (1,1) and (1,−1).
    • Solve the system: a(1,1) + b(1,−1) = (x,y).
    • Solution: a = (x+y)/2, b = (x−y)/2.
  • Step 2: Apply linearity:
    • L(x,y) = L[(x+y)/2·(1,1) + (x−y)/2·(1,−1)]
    • = (x+y)/2·L(1,1) + (x−y)/2·L(1,−1)
    • = (x+y)/2·(2,4) + (x−y)/2·(6,8)
    • = (4x−2y, 6x−2y).
  • Result: Knowing L on just two vectors determines L on all of R².
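
A NumPy sketch (mine) of the two-step recipe: solve for (a, b), then apply linearity; the assertion checks the closed form (4x − 2y, 6x − 2y).

```python
import numpy as np

B = np.column_stack([[1, 1], [1, -1]])           # basis vectors (1,1), (1,-1) as columns
L_on_basis = np.column_stack([[2, 4], [6, 8]])   # L(1,1) and L(1,-1) as columns

def L(x, y):
    # Step 1: find (a, b) with (x, y) = a*(1,1) + b*(1,-1).
    a, b = np.linalg.solve(B, np.array([x, y]))
    # Step 2: apply linearity.
    return a * L_on_basis[:, 0] + b * L_on_basis[:, 1]

x, y = 3.0, 1.0
assert np.allclose(L(x, y), [4 * x - 2 * y, 6 * x - 2 * y])
print(L(x, y))  # [10. 16.]
```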

🔓 Freedom in choosing bases

  • The excerpt notes there are infinitely many pairs of vectors that form a basis for R².
  • Similarly, infinitely many triples form a basis for R³.
  • Why it matters: This freedom is what makes linear algebra powerful—you can choose the most convenient basis for a given problem.
  • Don't confuse: Different bases describe the same space, but coordinates of vectors will differ depending on the basis chosen.

6.4 Bases (Take 1)

🧭 Overview

🧠 One-sentence thesis

A basis is a set of vectors that allows any other vector in the space to be uniquely expressed as a linear combination, and the freedom to choose different bases is what makes linear algebra powerful.

📌 Key points (3–5)

  • Core idea: Linear operators on a space are completely determined by how they act on a basis—just a few vectors specify behavior on infinitely many.
  • What makes a basis: A set of vectors such that every vector in the space can be written as a unique linear combination of them; for R² or R³, this corresponds to columns of an invertible matrix.
  • Dimension: The number of vectors in a basis (all bases for the same space have the same count); roughly, the number of independent directions available.
  • Common confusion: There are infinitely many different bases for the same space—any invertible-matrix column set works—but they all have the same dimension.
  • Why freedom matters: Choosing a good basis can dramatically reduce calculation time.

🔑 The power of bases

🔑 Specifying linear operators with fewer inputs

  • A linear operator acting on R² is completely specified by how it acts on just two vectors (a basis).
  • The excerpt shows that the standard pair (1,0) and (0,1) works, but so does (1,1) and (1,−1).
  • Once you know the operator's output on the basis vectors, linearity lets you compute its action on any vector by:
    1. Expressing the target vector as a linear combination of the basis.
    2. Applying linearity to distribute the operator.

Example: If L is given by L(1,1) = (2,4) and L(1,−1) = (6,8), then for any (x,y):

  • First solve (x,y) = a(1,1) + b(1,−1) to get a = (x+y)/2 and b = (x−y)/2.
  • Then L(x,y) = (x+y)/2 · L(1,1) + (x−y)/2 · L(1,−1) = (x+y)/2 · (2,4) + (x−y)/2 · (6,8) = (4x−2y, 6x−2y).

🧩 Why this works

  • Linearity means the operator respects addition and scalar multiplication.
  • If every vector can be written as a combination of basis vectors, the operator's action on the basis determines everything.
  • The excerpt emphasizes: "any vector can be expressed as a linear combination of them."

🧱 What is a basis?

🧱 Definition and criteria

A basis is a set of vectors in terms of which it is possible to uniquely express any other vector.

  • For R²: any pair of vectors whose columns form an invertible matrix.
  • For R³: any triple of vectors whose columns form an invertible matrix.
  • For a 2-dimensional subspace V in R³: any pair of vectors such that every vector in V can be written as a linear combination of them.

🔄 Infinitely many bases

  • The excerpt states: "there are infinitely many pairs of vectors from R² with the property that any vector can be expressed as a linear combination of them."
  • Example: the plane V = {c₁(1,1,0) + c₂(0,1,1) | c₁,c₂ ∈ R} can also be written as:
    • V = {c₁(1,1,0) + c₂(0,2,2) | c₁,c₂ ∈ R}
    • V = {c₁(1,1,0) + c₂(1,3,2) | c₁,c₂ ∈ R}
  • All these pairs are bases for the same space V.
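
A quick NumPy check (not in the text) that these pairs really span the same plane: adjoining either alternative pair to the first one leaves the rank at 2.

```python
import numpy as np

pairs = [
    ([1, 1, 0], [0, 1, 1]),   # original basis of V
    ([1, 1, 0], [0, 2, 2]),
    ([1, 1, 0], [1, 3, 2]),
]

ref = np.column_stack(pairs[0])
for u, w in pairs[1:]:
    # Same plane <=> adding the new pair adds no new directions (rank stays 2).
    combined = np.column_stack([ref, np.array(u), np.array(w)])
    print(np.linalg.matrix_rank(combined) == 2)  # True, True
```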

⚠️ Don't confuse

  • Many bases, one dimension: Every vector space has infinitely many bases, but all bases for a particular space have the same number of vectors.
  • That common count is the dimension of the space.

📏 Dimension

📏 Intuitive meaning

  • Dimension is "the number of independent directions available."
  • The excerpt describes a process:
    1. Stand at the origin and pick a direction.
    2. If there are vectors not in that direction, pick another direction not in the line of the first.
    3. If there are vectors not in the plane of the first two, pick a third direction, and so on.
  • You stop when you have a minimal set of independent vectors (a basis).

📏 Formal definition (preview)

  • The excerpt notes: "the careful mathematical definition is given in Chapter 11."
  • Key idea: a basis is a minimal set of independent vectors; the number of vectors in the basis is the dimension.
  • All bases for the same space have the same count, so dimension is well-defined.

| Space | Dimension | Basis size |
| --- | --- | --- |
| R² | 2 | 2 vectors |
| R³ | 3 | 3 vectors |
| V (plane in R³) | 2 | 2 vectors |

🚀 Why freedom in choosing bases matters

🚀 Computational efficiency

  • The excerpt emphasizes: "That freedom is what makes linear algebra powerful."
  • "Often a good choice of basis can reduce the time required to run a calculation in dramatic ways!"
  • Different bases can make the same problem easier or harder to compute.

🚀 The central idea

  • "The central idea of linear algebra is to exploit the hidden simplicity of linear functions."
  • "It ends up there is a lot of freedom in how to do this."
  • By choosing a basis that aligns with the problem structure, you can simplify calculations significantly.

6.5 Review Problems

🧭 Overview

🧠 One-sentence thesis

A linear transformation is completely determined by its action on a basis, and this fact allows us to express any linear operator as a matrix once we choose bases for the domain and codomain.

📌 Key points (3–5)

  • Complete specification by basis: A linear operator on R² is fully determined by how it acts on any pair of vectors that form a basis (not just the standard basis).
  • What makes a basis: A set of vectors forms a basis if every vector in the space can be uniquely expressed as a linear combination of them; for Rⁿ, this corresponds to columns of an invertible matrix.
  • Dimension as independent directions: The dimension of a vector space is the number of vectors in any basis; all bases for the same space have the same number of vectors.
  • Common confusion: Linearity has two equivalent formulations—the two-condition version (additivity + homogeneity) and the single-condition version (linear combination preservation).
  • Why bases matter: Choosing a good basis can dramatically reduce calculation time; every vector space has infinitely many bases.

🔢 Complete specification by basis

🔢 How a basis determines a linear operator

  • The excerpt shows that a linear operator L on R² is completely specified by its values on just two basis vectors.
  • Example: If L(1,1) = (2,4) and L(1,−1) = (6,8), then L is fully determined.
  • Why this works: any vector (x,y) can be written as a linear combination of (1,1) and (1,−1), then linearity extends L to all vectors.

🧮 The calculation process

The excerpt demonstrates a three-step process:

  1. Express the target vector as a linear combination of basis vectors

    • Solve a linear system to find coefficients a and b such that (x,y) = a(1,1) + b(1,−1)
    • The excerpt shows: a = (x+y)/2 and b = (x−y)/2
  2. Apply linearity

    • L((x,y)) = L[a(1,1) + b(1,−1)] = aL(1,1) + bL(1,−1)
  3. Substitute known values

    • L((x,y)) = ((x+y)/2)(2,4) + ((x−y)/2)(6,8) = (4x−2y, 6x−2y)

Don't confuse: The operator is not defined by a formula first; rather, the formula is derived from the basis values using linearity.

🧩 What is a basis

🧩 Definition and criteria

Basis: A set of vectors in terms of which it is possible to uniquely express any other vector.

For different spaces:

  • R²: Any pair of vectors that form the columns of an invertible matrix
  • R³: Any triple of vectors that form the columns of an invertible matrix
  • General vector space V: Any set of vectors such that every vector in V can be expressed as a linear combination of them

🔄 Multiple representations of the same space

The excerpt shows that the space V = {c₁(1,1,0) + c₂(0,1,1) | c₁,c₂ ∈ R} can be written equivalently as:

  • V = {c₁(1,1,0) + c₂(0,2,2) | c₁,c₂ ∈ R}
  • V = {c₁(1,1,0) + c₂(1,3,2) | c₁,c₂ ∈ R}

All three are valid bases for the same space V.

✅ Infinitely many bases

  • The excerpt emphasizes: "there are infinitely many pairs of vectors from R² with the property that any vector can be expressed as a linear combination of them."
  • The key test: when used as columns of a matrix, they give an invertible matrix.
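
A small NumPy sketch (mine) of that test for R²: the candidate vectors form a basis exactly when the matrix built from them as columns has nonzero determinant.

```python
import numpy as np

def is_basis(*vectors):
    """True when the vectors, used as columns, give a square invertible matrix."""
    M = np.column_stack(vectors)
    return M.shape[0] == M.shape[1] and not np.isclose(np.linalg.det(M), 0.0)

print(is_basis([1, 0], [0, 1]))    # True  (standard basis of R^2)
print(is_basis([1, 1], [1, -1]))   # True  (non-standard basis)
print(is_basis([1, 2], [2, 4]))    # False (second vector is a multiple of the first)
```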

📐 Dimension

📐 Intuitive understanding

The excerpt gives an informal definition:

Dimension is the number of independent directions available.

The process for finding dimension:

  1. Stand at the origin and pick a direction
  2. If vectors exist outside that direction, pick another independent direction
  3. Continue until all vectors in the space lie in the span of chosen directions
  4. The number of directions chosen = dimension

📐 Well-defined property

| Property | What the excerpt says |
| --- | --- |
| Minimal independent set | A basis is a minimal set of independent vectors |
| Consistency | Every vector space has many bases, but all bases for a particular space have the same number of vectors |
| Implication | Dimension is well-defined (does not depend on which basis you choose) |

Don't confuse: The number of vectors in a basis is an invariant property of the space itself, not a choice.

📝 Review problem themes

📝 Equivalent formulations of linearity

Problem 1 asks to show equivalence between:

Two-condition version:

  • L(u + v) = L(u) + L(v) (additivity)
  • L(cv) = cL(v) (homogeneity)

Single-condition version:

  • L(ru + sv) = rL(u) + sL(v) (linear combination preservation)

Both must hold for all vectors u, v and all scalars c, r, s.

📝 Determining linear functions from data

  • Problem 2: How many points on the graph of a linear function of one variable are needed to specify it?
  • Problem 3: Given specific input-output pairs, determine whether a function can be linear.
  • Problem 4: Given L(1,2) = 0 and L(2,3) = 1, find L(x,y).
  • Problem 5: Given L on polynomials with L(1) = 4, L(t) = t³, L(t²) = t−1, find L for arbitrary polynomials.

📝 Linearity of specific operators

  • Problem 6: Show that the integral operator I (mapping f to its antiderivative from 0 to x) is linear on continuous functions.
  • Problem 7: Complex conjugation c(x,y) = (x,−y) is linear over R but not over C—illustrating that linearity depends on the choice of scalar field.

Key insight: Linearity is not just about the function's formula; it's about how the function interacts with the vector space structure (addition and scalar multiplication).


7.1 Linear Transformations and Matrices

🧭 Overview

🧠 One-sentence thesis

Ordered bases allow us to represent linear operators as matrices by recording how the operator maps each input basis vector to a combination of output basis vectors.

📌 Key points (3–5)

  • Basis notation encodes vectors: An ordered basis lets us write any vector as a column of coefficients, which is not the same as the vector itself but encodes it.
  • Order matters: Changing the order of basis elements changes the column vector representation of the same vector.
  • Matrix construction rule: To find the matrix of a linear operator, apply it to each input basis vector and express the results in terms of the output basis vectors.
  • Common confusion: The column vector of a vector in a non-standard basis does not equal the column vector the vector "actually is" (only true for the standard basis in R^n).
  • Why bases matter: Different bases can make the same vector or computation simpler; there is no universal "best" basis.

📐 Basis notation and column vectors

📐 What basis notation means

The column vector of a vector v in an ordered basis B = (b₁, b₂, ..., bₙ) is the list of coefficients (α₁, α₂, ..., αₙ) such that v = α₁b₁ + α₂b₂ + ⋯ + αₙbₙ.

  • The notation [α₁, α₂, ..., αₙ] with subscript B means "multiply each basis vector by the corresponding coefficient and sum them."
  • This column vector is not equal to the vector v itself; it only encodes v relative to the chosen basis.
  • Example: In the vector space of 2×2 real matrices, the matrix [[a, b], [c, d]] can be written as [a, b, c, d] in the basis of standard matrix units, but the 4-vector is not the same object as the matrix.

🔄 Standard vs non-standard bases

| Basis type | Example in R² | Column vector of (x, y) | Notes |
| --- | --- | --- | --- |
| Standard E = (e₁, e₂) | e₁ = (1, 0), e₂ = (0, 1) | (x, y) in E | Column vector agrees with the actual vector |
| Non-standard B = (b, β) | b = (1, 1), β = (1, −1) | (x, y) in B = (x+y, x−y) in E | Column vector differs from the actual vector |

  • Only in the standard basis does the column vector representation match the column vector the vector "actually is."
  • Don't confuse: The same numbers (x, y) represent different vectors in different bases.

🎯 Why order matters

  • There is no inherent order to basis vectors; we must choose one arbitrarily.
  • Changing the order changes the column vector representation.
  • Example: For the hyperplane basis B = (b₁, b₂) vs B′ = (b₂, b₁), the column vector (x, y) in B equals (y, x) in B′.

🔍 Finding column vectors

  • To find the column vector of a given vector v in basis B = (b₁, b₂, ..., bₙ), solve the linear system: v = α₁b₁ + α₂b₂ + ⋯ + αₙbₙ.
  • Example: For the Pauli matrices basis, finding the column vector of a given 2×2 matrix requires solving a system of four equations for the three coefficients.

🔧 From linear operators to matrices

🔧 The matrix construction rule

A matrix records how a linear operator maps each element of the input basis to a sum of multiples of the output basis vectors.

  • If L is a linear operator from V to W, with input basis B = (b₁, b₂, ...) and output basis B′ = (β₁, β₂, ...), the matrix entry mⱼᵢ is the coefficient of βⱼ when L(bᵢ) is written in the output basis.
  • Formally: L(bᵢ) = m₁ᵢβ₁ + m₂ᵢβ₂ + ⋯ + mⱼᵢβⱼ + ⋯

📝 Step-by-step procedure

  1. Apply L to each input basis vector: L(b₁), L(b₂), ..., L(bᵢ), ...
  2. Express each result as a linear combination of output basis vectors.
  3. Collect the coefficients into columns: the i-th column contains the coefficients from L(bᵢ).
  • Example: For L : V → R³ with L(b₁) = 0e₁ + 1e₂ + 0e₃ and L(b₂) = 0e₁ + 1e₂ + 0e₃, the matrix is [[0, 0], [1, 1], [0, 0]].

🎼 Derivative operator example

  • For the derivative operator d/dx acting on polynomials of degree ≤ 2, with basis B = (1, x, x²):
    • d/dx(1) = 0 = 0·1 + 0·x + 0·x²
    • d/dx(x) = 1 = 1·1 + 0·x + 0·x²
    • d/dx(x²) = 2x = 0·1 + 2·x + 0·x²
  • The matrix is [[0, 1, 0], [0, 0, 2], [0, 0, 0]].
  • Important: This matrix representation only makes sense when the input and output bases are specified.

⚠️ Basis dependence

  • The same linear operator has different matrix representations in different bases.
  • Don't confuse: A matrix is not the operator itself; it is a representation that depends on the choice of ordered input and output bases.
  • The excerpt emphasizes: "Linear operators become matrices when given ordered input and output bases."

🧮 Why different bases matter

🧮 Simplicity varies by basis

  • The same vector can have simpler or more complicated column vectors in different bases.
  • Example: The vector v = (1, 1) in R² has column vector (1, 1) in the standard basis E, but (1, 0) in the basis B = ((1, 1), (1, −1))—the latter is simpler.
  • Key insight: The existence of many bases allows us to choose one that makes our computation easiest.

🌐 Beyond standard bases

  • The "standard basis" only makes sense for R^n.
  • For other vector spaces (e.g., solutions to differential equations, matrices, polynomials), there is no universal standard basis.
  • Example: For the vector space of trace-free complex matrices, the Pauli matrices form a natural basis, not any "standard" choice.

7.1.1 Basis Notation

🧭 Overview

🧠 One-sentence thesis

An ordered basis allows us to encode any vector in a vector space as a column of numbers (components), and the same vector will have different column representations in different bases, which lets us choose the basis that makes our computations simplest.

📌 Key points (3–5)

  • What basis notation does: it expresses an arbitrary vector as a sum of multiples of basis elements, storing the coefficients in a column vector.
  • The column vector is not the vector itself: the notation with a basis subscript stands for the actual vector obtained by multiplying each coefficient by the corresponding basis element and summing.
  • Order matters: changing the order of basis elements changes the column vector representation of the same vector.
  • Common confusion: in the standard basis E for R^n, the column vector of v happens to equal the column vector that v actually is, but this is special—in other bases the column representation differs from the "raw" column vector.
  • Why multiple bases are useful: different bases give different column representations, so we can choose a basis that makes the column vector simpler or the computation easier.

🔤 What basis notation means

🔤 Writing a vector in terms of a basis

Given a vector v and an ordered basis B = (b₁, b₂, ..., bₙ), the column vector of v in basis B is the column of coefficients (α₁, α₂, ..., αₙ) such that v = α₁b₁ + α₂b₂ + ⋯ + αₙbₙ.

  • The notation with subscript B means: multiply each basis element by the corresponding scalar in the column and sum them.
  • Two equivalent shorthand notations are used:
    • Column vector with subscript: [α₁, α₂, ..., αₙ] with subscript B
    • Basis list times column: (b₁, b₂, ..., bₙ) times the column [α₁, α₂, ..., αₙ]
  • The second notation can be read like matrix multiplication of a row vector times a column vector, except the row entries are themselves vectors.

🧮 Finding the column vector: a linear systems problem

  • To find the column vector of a given vector v in a given basis, solve the equation v = α₁b₁ + α₂b₂ + ⋯ + αₙbₙ for the unknowns α₁, α₂, ..., αₙ.
  • This is a linear systems problem.
  • Example: For the Pauli matrices basis B = (σ_x, σ_y, σ_z) and a given matrix v, solve v = α_xσ_x + α_yσ_y + α_zσ_z by equating components, which gives four equations for the three unknowns.

🎯 Examples across different vector spaces

🎯 2×2 real matrices

  • Vector space V = {2×2 real matrices}.
  • One basis: B = (e₁₁, e₁₂, e₂₁, e₂₂) where e₁₁ = [[1,0],[0,0]], e₁₂ = [[0,1],[0,0]], etc.
  • An arbitrary matrix v = [[a,b],[c,d]] is written as ae₁₁ + be₁₂ + ce₂₁ + de₂₂.
  • The column vector [a, b, c, d] with subscript B encodes the matrix v but is NOT equal to it (v is a matrix, the column is in R⁴).

🎯 Standard basis of R²

  • The standard basis vectors are e₁ = [1, 0] and e₂ = [0, 1].
  • The ordered basis is E = (e₁, e₂).
  • An arbitrary vector v = [x, y] is written as xe₁ + ye₂.
  • Notation: [x, y] with subscript E := (e₁, e₂) times [x, y] := xe₁ + ye₂ = v.
  • Read as: "The column vector of the vector v in the basis E is [x, y]."

🎯 Hyperplane in R³

  • Vector space V = {c₁[1,1,0] + c₂[0,1,1] | c₁, c₂ in R}.
  • One ordered basis: B = (b₁, b₂) where b₁ = [1,1,0] and b₂ = [0,1,1].
  • The column [x, y] with subscript B = xb₁ + yb₂ = [x, x+y, y] with subscript E.
  • If we reverse the order to B′ = (b₂, b₁), then [x, y] with subscript B′ = [y, x+y, x] with subscript E.
  • Don't confuse: the same column [x, y] represents different vectors in B and B′ because order matters.

🎯 Pauli matrices (complex trace-free 2×2 matrices)

  • Vector space V = {[[z, u], [v, -z]] | z, u, v in C} over C.
  • Basis B = (σ_x, σ_y, σ_z) where σ_x = [[0,1],[1,0]], σ_y = [[0,-i],[i,0]], σ_z = [[1,0],[0,-1]].
  • To find the column vector of v = [[-2+i, 1+i], [3-i, 2-i]] in basis B, solve the equation v = α_xσ_x + α_yσ_y + α_zσ_z.
  • This gives four equations (one for each matrix entry) for the three unknowns α_x, α_y, α_z.
  • Solution: α_x = 2, α_y = −1 − i, α_z = −2 + i, so v = [2, −1−i, −2+i] with subscript B.
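
A NumPy sketch (not from the text) of this linear-systems problem: flatten each 2×2 matrix into a length-4 vector so the four equations in three unknowns can be handed to a least-squares solver; the system is consistent, so the result is exact.

```python
import numpy as np

# Pauli basis and the target matrix from the example above.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
v = np.array([[-2 + 1j, 1 + 1j], [3 - 1j, 2 - 1j]])

# Each 2x2 matrix becomes a length-4 column: four equations, three unknowns.
A = np.column_stack([sx.ravel(), sy.ravel(), sz.ravel()])
alpha, *_ = np.linalg.lstsq(A, v.ravel(), rcond=None)

print(np.round(alpha, 10))  # approximately [ 2, -1-1j, -2+1j ]
assert np.allclose(alpha[0] * sx + alpha[1] * sy + alpha[2] * sz, v)
```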

🔄 Standard vs non-standard bases

🔄 Non-standard basis of R²

  • Consider b = [1, 1] and β = [1, -1] as a basis for R².
  • There is no a priori reason to order them one way or the other, but we must choose an order to encode vectors.
  • Choose ordered basis B = (b, β).
  • The column [x, y] with subscript B := (b, β) times [x, y] := xb + yβ = x[1,1] + y[1,-1] = [x+y, x-y].
  • Don't confuse: [x, y] with subscript B = [x+y, x-y] (as a raw column), but [x, y] with subscript E = [x, y] (as a raw column).
  • Only in the standard basis E does the column vector of v agree with the column vector that v actually is.

🔄 Why non-standard bases can be simpler

  • The vector v = [1, 1] in the standard basis E is [1, 1] with subscript E (easy to calculate).
  • But in the basis B = (b, β) where b = [1,1], we find v = [1, 0] with subscript B, which is actually a simpler column vector.
  • Key insight: The fact that there are many bases for any given vector space allows us to choose a basis in which our computation is easiest.
  • The standard basis only makes sense for R^n; for other vector spaces (e.g., solutions to a differential equation), there is no obvious "standard" basis.

📊 Components and notation summary

📊 Definition of components

The numbers (α₁, α₂, ..., αₙ) in the column vector representation of v in basis B are called the components of the vector v.

  • To find them, solve the linear systems problem v = α₁b₁ + α₂b₂ + ⋯ + αₙbₙ.

📊 Two shorthand notations

| Notation | Meaning | Interpretation |
| --- | --- | --- |
| [α₁, α₂, ..., αₙ] with subscript B | The vector v expressed in basis B | Multiply each αᵢ by bᵢ and sum |
| (b₁, b₂, ..., bₙ) times [α₁, α₂, ..., αₙ] | Same as above | Read like matrix multiplication: row of basis vectors times column of coefficients |

  • Both notations stand for the vector obtained by multiplying the coefficients by the corresponding basis element and summing.
  • The second notation is useful because it resembles matrix multiplication, even though the "row vector" entries are themselves vectors.

7.1.2 From Linear Operators to Matrices

🧭 Overview

🧠 One-sentence thesis

A matrix encodes how a linear operator transforms basis vectors of the domain into linear combinations of basis vectors in the target space, fully specifying the operator once input and output bases are chosen.

📌 Key points (3–5)

  • Core idea: A matrix records how each input basis vector is mapped to a sum of multiples of output basis vectors.
  • Why matrices work: Linear functions are fully specified by their values on any basis for their domain (from Chapter 6).
  • The construction process: Compute what the linear transformation does to every input basis vector, then express each result in terms of the output basis vectors.
  • Common confusion: A matrix for a linear operator has no meaning without specifying which ordered bases are used for input and output.
  • Key notation: The column vector of a vector v in basis B is found by solving v = α₁b₁ + α₂b₂ + ⋯ + αₙbₙ; the αᵢ are called the components of v.

🧩 Column vectors and components

🧩 What a column vector represents

The column vector of a vector v in an ordered basis B = (b₁, b₂, ..., bₙ) is defined by solving the linear system v = α₁b₁ + α₂b₂ + ⋯ + αₙbₙ.

  • The column vector is not the vector itself; it is a representation of the vector relative to a chosen basis.
  • The numbers (α₁, α₂, ..., αₙ) are called the components of the vector v.
  • Shorthand notation: v = [α₁, α₂, ..., αₙ] with subscript B, or v = (b₁, b₂, ..., bₙ)[α₁, α₂, ..., αₙ].

🔍 How to find components

  • Set up the equation: v equals some linear combination of the basis vectors.
  • Solve for the coefficients αᵢ.
  • Example from the excerpt: For the Pauli-basis example, the system is α_x − iα_y = 1 + i, α_x + iα_y = 3 − i, α_z = −2 + i (and −α_z = 2 − i); the solution is α_x = 2, α_y = −1 − i, α_z = −2 + i, so v = [2, −1−i, −2+i] in basis B.

🔧 Constructing the matrix of a linear operator

🔧 The definition

If L is a linear operator from V to W, the matrix for L in ordered bases B = (b₁, b₂, ...) for V and B′ = (β₁, β₂, ...) for W is the array of numbers mⱼᵢ specified by L(bᵢ) = m₁ᵢβ₁ + ⋯ + mⱼᵢβⱼ + ⋯

  • The matrix entry mⱼᵢ tells you the coefficient of the j-th output basis vector when the i-th input basis vector is transformed.
  • The i-th column of the matrix is the column vector of L(bᵢ) in the output basis B′.

📝 Step-by-step construction

  1. Apply L to each input basis vector: Compute L(b₁), L(b₂), ..., L(bᵢ), ...
  2. Express each result in the output basis: Write each L(bᵢ) as a linear combination of β₁, β₂, ..., βⱼ, ...
  3. Assemble the matrix: The coefficients from step 2 become the columns of the matrix.

The excerpt gives the formula:

  • (L(b₁), L(b₂), ..., L(bᵢ), ...) = (β₁, β₂, ..., βⱼ, ...) times the matrix with columns [m₁₁, m₂₁, ..., mⱼ₁, ...], [m₁₂, m₂₂, ..., mⱼ₂, ...], ..., [m₁ᵢ, m₂ᵢ, ..., mⱼᵢ, ...], ...

⚠️ Basis dependence

  • Don't confuse: The same linear operator can have different matrices depending on which bases you choose.
  • The excerpt emphasizes: "This last line makes no sense without explaining which bases we are using!"
  • Example: The derivative operator on polynomials of degree 2 or less has matrix [[0, 1, 0], [0, 0, 2], [0, 0, 0]] in the basis (1, x, x²), but would have a different matrix in a different basis.

📐 Worked examples

📐 Example: L from V to R³

  • Setup: L : V → R³ with L([1, 1, 0]) = [0, 1, 0] and L([0, 1, 1]) = [0, 1, 0].
  • Input basis: B = ([1, 1, 0], [0, 1, 1]) = (b₁, b₂).
  • Output basis: E = ([1, 0, 0], [0, 1, 0], [0, 0, 1]) = (e₁, e₂, e₃).
  • Calculation:
    • Lb₁ = 0e₁ + 1e₂ + 0e₃, so the first column is [0, 1, 0].
    • Lb₂ = 0e₁ + 1e₂ + 0e₃, so the second column is [0, 1, 0].
    • Matrix: [[0, 0], [1, 1], [0, 0]].
  • Interpretation: L acts like the matrix [[0, 0], [1, 1], [0, 0]] when given these bases.
  • The excerpt notes: "We had trouble expressing this linear operator as a matrix" before choosing bases, showing that bases are essential.

🧮 Example: Derivative operator on polynomials

  • Vector space: V = {a₀·1 + a₁x + a₂x² | a₀, a₁, a₂ ∈ R} (polynomials of degree 2 or less).
  • Ordered basis: B = (1, x, x²).
  • Action of d/dx:
    • d/dx(1) = 0 = 0·1 + 0·x + 0·x², so first column is [0, 0, 0].
    • d/dx(x) = 1 = 1·1 + 0·x + 0·x², so second column is [1, 0, 0].
    • d/dx(x²) = 2x = 0·1 + 2·x + 0·x², so third column is [0, 2, 0].
  • Matrix: [[0, 1, 0], [0, 0, 2], [0, 0, 0]].
  • Using the matrix: If [a, b, c] in basis B represents a + bx + cx², then d/dx applied to it gives [b, 2c, 0] in basis B, which represents b + 2cx.
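
A short NumPy sketch (mine) that assembles the matrix of d/dx in the ordered basis (1, x, x²) column by column and applies it to a coefficient column.

```python
import numpy as np

# Columns = coordinates, in the basis (1, x, x^2), of d/dx applied to 1, x, x^2.
D = np.column_stack([
    [0, 0, 0],   # d/dx(1)   = 0
    [1, 0, 0],   # d/dx(x)   = 1
    [0, 2, 0],   # d/dx(x^2) = 2x
])
print(D)
# [[0 1 0]
#  [0 0 2]
#  [0 0 0]]

# Apply to the coefficient column of a + b*x + c*x^2: the result (b, 2c, 0)
# is the coefficient column of b + 2c*x.
a, b, c = 5, 3, 7
print(D @ np.array([a, b, c]))  # [ 3 14  0]
```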

🔑 The fundamental rule

🔑 Linear operators become matrices

The excerpt states the general rule:

Linear operators become matrices when given ordered input and output bases.

  • Why this matters: Without bases, a linear operator is an abstract function; with bases, it becomes a concrete array of numbers that can be computed with.
  • How to use it: Once you have the matrix, applying the linear operator to any vector in the domain is equivalent to matrix multiplication.
  • Example from the excerpt: L([x, y] in basis B) = [[0, 0], [1, 1], [0, 0]] times [x, y], which equals (x+y)·[0, 1, 0], i.e. [0, x+y, 0] in basis E.

🧠 Why linear functions are special

  • The excerpt recalls from Chapter 6: "linear functions are very special kinds of functions; they are fully specified by their values on any basis for their domain."
  • This means: Once you know what L does to each basis vector, you know what L does to every vector (by linearity).
  • The matrix is simply a compact way to record this information.

7.2 Review Problems

🧭 Overview

🧠 One-sentence thesis

Linear operators can be represented as matrices once we choose ordered input and output bases, which transforms abstract linear transformations into concrete computational tools.

📌 Key points (3–5)

  • Core principle: Linear operators become matrices when given ordered input and output bases.
  • How it works: Express where each basis vector maps, then read off the coefficients as matrix columns.
  • Common confusion: The same linear operator produces different matrices depending on which bases you choose—the matrix representation is not unique without specifying bases.
  • Why it matters: Matrix representation allows efficient computation and storage of linear transformations.
  • Key skill: Matching basis vectors to matrix columns by expressing outputs as linear combinations of output basis vectors.

🔧 Building matrix representations

🔧 The general procedure

The excerpt establishes the fundamental rule:

Linear operators become matrices when given ordered input and output bases.

  • Start with a linear operator L acting on vectors.
  • Choose an ordered input basis B = (b₁, b₂, ...) and output basis E = (e₁, e₂, ...).
  • For each input basis vector, compute where L sends it.
  • Express each result as a linear combination of output basis vectors.
  • The coefficients form the columns of the matrix.

📝 The worked example with custom bases

The excerpt shows an operator L where:

  • L maps (1,1,0) to (0,1,0)
  • L maps (0,1,1) to (0,1,0)
  • By linearity: L maps c₁(1,1,0) + c₂(0,1,1) to (c₁ + c₂)(0,1,0)

Building the matrix:

  • Input basis B = ((1,1,0), (0,1,1))
  • Output basis E = ((1,0,0), (0,1,0), (0,0,1))
  • Lb₁ = 0e₁ + 1e₂ + 0e₃ → first column is (0,1,0)
  • Lb₂ = 0e₁ + 1e₂ + 0e₃ → second column is (0,1,0)
  • Result: the matrix is a 3×2 matrix with columns (0,1,0) and (0,1,0)

Don't confuse: The matrix size depends on the dimensions of the bases—here 3 rows (output basis size) and 2 columns (input basis size).

📐 Derivative operator example

📐 Polynomials of degree 2 or less

The excerpt demonstrates the derivative operator on V = {a₀·1 + a₁x + a₂x²}.

In basis B = (1, x, x²):

  • A polynomial is written as (a,b,c)_B = a·1 + bx + cx²
  • Taking the derivative: d/dx of (a,b,c)_B = b·1 + 2cx + 0x² = (b, 2c, 0)_B
  • Reading the transformation pattern:
    • First basis vector 1 → 0
    • Second basis vector x → 1
    • Third basis vector x² → 2x
  • The matrix representation is:
    0  1  0
    0  0  2
    0  0  0
    

Critical note from the excerpt: "Notice this last line makes no sense without explaining which bases we are using!"

📐 Different basis, different matrix

The review problems ask students to find the matrix for d/dx in basis B = (x², x, 1) and also in basis B' = (x² + x, x² - x, 1).

  • The same operator produces different matrices in different bases.
  • Example: The problems then ask students to solve the same differential equation using each matrix and compare results.

🧪 Review problem patterns

🧪 Supply package problem

Problem 1 asks students to view packages as functions:

  • Package f contains 3 slabs, 4 fasteners, 6 brackets
  • Package g contains 7 slabs, 5 fasteners, 3 brackets
  • These can be written as vectors in R³ (after choosing an ordering for supplies)
  • A manufacturing process L takes supply packages and outputs (doors, frames)
  • Given Lf = 1 door + 2 frames and Lg = 3 doors + 1 frame, find the matrix for L

Key insight: Real-world processes can be modeled as linear operators between vector spaces.

🧪 Synthesizer problem

Problem 2 models a keyboard where:

  • First key with intensity a produces a·sin(t)
  • Second key with intensity b produces b·sin(2t)
  • Sounds can be added (superposition principle)

Students must distinguish between (3,5) in R^{1,2} versus (3,5)_B where B = (sin(t), sin(2t)).

🧪 Integration operator

Problem 7 asks for the matrix of the integration operator I: V → W defined by I(p(x)) = integral from 1 to x of p(t)dt.

  • Input space V: polynomials of degree ≤ 2
  • Output space W: polynomials of degree ≤ 3
  • Input basis B = (1, x, x²)
  • Output basis B' = (x³, x², x, 1)

Don't confuse: Integration increases polynomial degree, so the output space must be larger.

🧪 Polynomial interpolation

Problem 8 generalizes finding mx + b from two points:

  • How many points determine a constant function? (1 point)
  • How many points determine a first-order polynomial? (2 points)
  • How many points determine an n-th order polynomial R → R? (n+1 points)
  • How many points determine a first-order polynomial R² → R²? (The problem asks students to work this out)

Key pattern: The number of constraints needed matches the number of free parameters in the function space.

🔍 Matrix definition and terminology

🔍 Basic structure

An r × k matrix M = (m^i_j) for i = 1,...,r; j = 1,...,k is a rectangular array of real (or complex) numbers.

  • The superscript i indexes the row
  • The subscript j indexes the column
  • The numbers m^i_j are called entries

🔍 Special cases

| Type | Dimensions | Notation | Example |
| --- | --- | --- | --- |
| Column vector | r × 1 | v = (v^r) | (1, 2, 3) written vertically |
| Row vector | 1 × k | v = (v_k) | (1 2 3) written horizontally |

🔍 Transpose operation

The transpose of a column vector is the corresponding row vector and vice versa.

  • If v is a column vector (1, 2, 3), then v^T is the row vector (1 2 3)
  • (v^T)^T = v
  • This is an involution: an operation that does nothing when performed twice

💾 Practical applications

💾 Image storage

The excerpt mentions .gif files as an example:

  • The file is essentially a matrix
  • Matrix size is specified at the start
  • Each entry indicates the color of a pixel
  • Rows may be shuffled (e.g., every eighth row) for progressive display during download
  • Compression algorithms are applied to reduce file size

💾 Graph theory preview

The excerpt begins to mention that graphs (vertices and connections) can be represented as matrices, setting up applications in telephone networks and airline routes.


7.3 Properties of Matrices

🧭 Overview

🧠 One-sentence thesis

Matrices are rectangular arrays of numbers that efficiently store information and represent linear operators, with addition, scalar multiplication, and a special multiplication rule that respects linearity and dimensional constraints.

📌 Key points (3–5)

  • What matrices are: rectangular arrays of real (or complex) numbers that serve as efficient storage and representations of linear operators.
  • Matrix as a vector space: the set of all r×k matrices forms a vector space with entry-wise addition and scalar multiplication.
  • Matrix multiplication rule: multiply an r×k matrix M by a k×s matrix N by treating N as s column vectors and multiplying M by each, yielding an r×s matrix.
  • Common confusion: matrix multiplication requires matching dimensions—the number of columns in the first matrix must equal the number of rows in the second; the order matters and is not commutative.
  • Two views of multiplication: either as M acting on each column of N, or as dot products of rows of M with columns of N.

📐 Matrix structure and notation

📐 Definition and entries

An r×k matrix M = (m_ij) for i = 1,...,r; j = 1,...,k is a rectangular array of real (or complex) numbers.

  • The superscript i indexes the row.
  • The subscript j indexes the column.
  • The numbers m_ij are called entries.
  • Example: a 3×2 matrix has 3 rows and 2 columns.

📏 Special cases: vectors

  • Column vector: an r×1 matrix v = (v_r), written vertically.
  • Row vector: a 1×k matrix v = (v_k), written horizontally.
  • Transpose: flipping a column vector gives the corresponding row vector and vice versa; applying transpose twice returns the original (an involution).

Example: if v is a column vector [1, 2, 3] vertically, then v^T is the row vector (1 2 3), and (v^T)^T = v.

🗂️ Matrices as information storage

🗂️ Practical uses

The excerpt emphasizes that a matrix is an efficient way to store information, not just an abstract mathematical object.

| Application | How the matrix encodes information |
| --- | --- |
| Computer graphics (.gif files) | Each entry indicates the color of a pixel; the file starts with matrix size, then lists entries (rows may be shuffled for progressive display, then compressed). |
| Graph theory | An adjacency matrix where m_ij indicates the number of edges between vertex i and vertex j. |

🔗 Symmetric matrices

  • In the graph example, the adjacency matrix is symmetric: m_ij = m_ji.
  • This reflects that an edge between vertex i and vertex j is the same as an edge between j and i.

➕ Matrix arithmetic

➕ Addition and scalar multiplication

The set of all r×k matrices M_rk forms a vector space.

  • Addition: M + N = (m_ij) + (n_ij) = (m_ij + n_ij)
    Add corresponding entries.
  • Scalar multiplication: rM = r(m_ij) = (r·m_ij)
    Multiply every entry by the scalar r.

Don't confuse: these operations are entry-wise; they do not involve the more complex matrix multiplication rule.

🔢 Connection to column vectors

  • The vector space M_n1 (n×1 matrices) is exactly the vector space R^n of column vectors.
  • This shows that column vectors are a special case of matrices.

✖️ Matrix multiplication

✖️ Multiplying matrix by column vector

The excerpt recalls the rule for multiplying an r×k matrix M by a k×1 column vector V:

  • Result: an r×1 column vector.
  • Formula: (MV)_i = sum from j=1 to k of m_ij · v_j.
  • This is the foundation for the general matrix multiplication rule.

🧩 General multiplication rule: column-by-column view

To multiply an r×k matrix M by a k×s matrix N:

  • Think of N as s column vectors N_1, N_2, ..., N_s, each of dimension k×1.
  • Multiply M by each column vector: M·N_1, M·N_2, ..., M·N_s (each result is r×1).
  • Place these s result columns side-by-side to get an r×s matrix MN.

Concisely: if M = (m_ij) for i=1,...,r; j=1,...,k and N = (n_ij) for i=1,...,k; j=1,...,s, then MN = L where L = (ℓ_ij) for i=1,...,r; j=1,...,s is given by
ℓ_ij = sum from p=1 to k of m_ip · n_pj.
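
A minimal Python sketch (not from the text) of the column-by-column rule, checked against NumPy's built-in product.

```python
import numpy as np

def matmul_by_columns(M, N):
    """Multiply an r x k matrix M by a k x s matrix N by applying M to each
    column of N and placing the resulting r-vectors side by side."""
    assert M.shape[1] == N.shape[0], "columns of M must equal rows of N"
    return np.column_stack([M @ N[:, j] for j in range(N.shape[1])])

M = np.array([[1, 2], [3, 4], [5, 6]])   # 3 x 2
N = np.array([[1, 0, 2], [0, 1, 3]])     # 2 x 3
print(matmul_by_columns(M, N))           # a 3 x 3 matrix
assert np.array_equal(matmul_by_columns(M, N), M @ N)
```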

🎯 Dimensional constraints

For the product MN to make sense:

  • M is r×k, N is s×m.
  • We need k = s (columns of M must equal rows of N).
  • The result is r×m.

Shorthand diagram: (r × k) times (k × m) is (r × m).

Example: multiplying a (3×1) matrix by a (1×2) matrix yields a (3×2) matrix.
[1; 3; 2] times (2 3) = [2 3; 6 9; 4 6].

Don't confuse: for the product NM, we would need m = r. Matrix multiplication is not commutative; order matters.

🔍 Dot product view

An alternative way to understand matrix multiplication:

  • The (i,j)-entry of MN is the dot product of the i-th row of M with the j-th column of N.

Example: Let M be a 3×2 matrix with rows u^T, v^T, w^T, and N be a 2×3 matrix with columns a, b, c. Then
MN has entries [u·a, u·b, u·c; v·a, v·b, v·c; w·a, w·b, w·c].

📏 Linearity and orthogonality

The multiplication rule obeys linearity.

Important consequence (Theorem 7.3.1): Let M be a matrix and x a column vector. If Mx = 0, then the vector x is orthogonal to the rows of M.

  • Why: each entry of Mx is a dot product of a row of M with x; if all are zero, x is orthogonal to every row.
  • This connects matrix-vector multiplication to geometric orthogonality.
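
A small NumPy check (mine) of Theorem 7.3.1 for one particular matrix and one vector in its null space.

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, -2.0, 1.0])   # satisfies Mx = 0 for this particular M

assert np.allclose(M @ x, 0)
# Each entry of Mx is a dot product of a row of M with x, so x is orthogonal
# to every row of M.
print([float(np.dot(row, x)) for row in M])  # [0.0, 0.0]
```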

7.3.1 Associativity and Non-Commutativity

🧭 Overview

🧠 One-sentence thesis

Matrix multiplication is associative (the order of grouping does not matter) but not commutative (the order of the matrices does matter), meaning that M(NR) = (MN)R but typically MN ≠ NM.

📌 Key points (3–5)

  • Associativity holds: For matrices M, N, and R, the product M(NR) equals (MN)R, just as with real numbers.
  • Commutativity fails: For generic square matrices M and N, MN is usually not equal to NM—the order matters.
  • Why associativity works: The proof relies on the distributive and associative properties of real numbers applied to the summation rule for matrix multiplication.
  • Common confusion: Don't assume matrices behave like numbers in all ways—while grouping doesn't matter (associativity), the sequence of multiplication does (non-commutativity).
  • Geometric meaning: The non-commutativity reflects that the order of successive linear transformations matters (e.g., rotating in different planes produces different results).

🔄 Associativity of matrix multiplication

🔄 What associativity means

Associativity: For matrices M, N, and R (of compatible sizes), M(NR) = (MN)R.

  • This property mirrors the behavior of real numbers: x(yz) = (xy)z.
  • The order in which you group the multiplications does not change the result.
  • Example: If M is m×n, N is n×r, and R is r×t, then both (MN)R and M(NR) yield the same m×t matrix.

🧮 Why associativity holds

The excerpt provides a detailed proof:

  1. Start with the definition: Write out the matrix multiplication rule using summations.
    • The (i, l) entry of (MN)R is: sum over k from 1 to r of [sum over j from 1 to n of m_ij n_jk] r_kl.
  2. Apply distributive property: Move the summation symbol outside brackets.
    • This uses the fact that x(y + z) = xy + xz for real numbers.
  3. Apply associativity of real numbers: Remove the square brackets around individual products.
    • This uses the fact that (ab)c = a(bc) for real numbers.
  4. Repeat for M(NR): The same reasoning shows M(NR) produces the identical expression.
  • The key insight: Matrix associativity inherits from the associativity and distributivity of the real numbers in the summation formula.
  • Don't confuse: This does not mean you can rearrange the matrices themselves—only the grouping of parentheses.

❌ Non-commutativity of matrix multiplication

❌ What non-commutativity means

Non-commutativity: For generic n×n square matrices M and N, MN ≠ NM.

  • Unlike real numbers (where xy = yx), the order of matrix multiplication matters.
  • The excerpt emphasizes "generic" matrices—there are special cases where MN = NM, but this is not the general rule.

🔢 Concrete numerical example

The excerpt provides a 2×2 example:

| Computation | Result |
| --- | --- |
| (1 1; 0 1)(1 0; 1 1) | (2 1; 1 1) |
| (1 0; 1 1)(1 1; 0 1) | (1 1; 1 2) |

  • The two products are different matrices, demonstrating MN ≠ NM.
  • Example interpretation: Even with simple 2×2 matrices, swapping the order changes the outcome.

🌐 Geometric interpretation: rotations in 3D

The excerpt gives a three-dimensional rotation example:

  • Matrix M: Rotates vectors by angle θ in the xy-plane.
    • M = (cos θ, sin θ, 0; −sin θ, cos θ, 0; 0, 0, 1)
  • Matrix N: Rotates vectors by angle θ in the yz-plane.
    • N = (1, 0, 0; 0, cos θ, sin θ; 0, −sin θ, cos θ)
  • Applying M then N vs. N then M: The excerpt shows a picture of a colored block rotated by 90° in each case.
    • The final orientations are different, so MN ≠ NM.

Why this matters:

  • Matrices represent linear transformations.
  • The order of successive transformations affects the result.
  • Example: Rotating first around one axis, then another, produces a different final position than rotating in the opposite order.
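
A NumPy sketch (not from the text) of the two rotation matrices above at θ = 90°, confirming that MN and NM are different.

```python
import numpy as np

theta = np.pi / 2  # 90 degrees, as in the block picture described above
c, s = np.cos(theta), np.sin(theta)

M = np.array([[c, s, 0],
              [-s, c, 0],
              [0, 0, 1]])    # rotation in the xy-plane
N = np.array([[1, 0, 0],
              [0, c, s],
              [0, -s, c]])   # rotation in the yz-plane

print(np.round(M @ N, 3))
print(np.round(N @ M, 3))
print(np.allclose(M @ N, N @ M))  # False: the order of the rotations matters
```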

🚫 Don't confuse with associativity

  • Associativity (holds): You can regroup: M(NR) = (MN)R.
  • Commutativity (fails): You cannot reorder: MN ≠ NM in general.
  • The excerpt explicitly contrasts these two properties to highlight that matrices share some, but not all, properties of real numbers.

🧩 Connection to linear transformations

🧩 Why order matters

  • Since n×n matrices represent linear transformations from R^n to R^n, matrix multiplication corresponds to composing transformations.
  • The order of composition matters: applying transformation M and then N is not the same as applying N and then M.
  • Example: In the 3D rotation case, rotating around the x-axis and then the y-axis produces a different orientation than rotating around the y-axis first and then the x-axis.

7.3.2 Block Matrices

🧭 Overview

🧠 One-sentence thesis

Block matrices allow us to partition large matrices into smaller sub-matrices and perform operations by treating the blocks themselves as matrix entries, simplifying calculations when the structure suggests natural groupings.

📌 Key points (3–5)

  • What block matrices are: a way to partition a matrix into smaller rectangular sub-matrices called blocks.
  • How to partition correctly: blocks must fit together to form a rectangle; not every arrangement of blocks is valid.
  • How to compute with blocks: matrix operations (like multiplication) can be carried out by treating blocks as if they were individual entries.
  • When to use blocks: context or patterns in the matrix (e.g., large blocks of zeros or identity-like blocks) suggest useful ways to partition.
  • Common confusion: blocks must align properly—the arrangement must respect the rectangular structure of the original matrix.

🧩 What block matrices are

🧩 Definition and structure

Block matrix: a matrix partitioned into smaller matrices called blocks.

  • Instead of viewing a matrix as individual numbers, you group entries into rectangular sub-matrices.
  • The blocks must fit together to form the original rectangular matrix.
  • Example: A 4×4 matrix M can be written as a 2×2 arrangement of blocks: M = (A B; C D), where A is 3×3, B is 3×1, C is 1×3, and D is 1×1.

✅ Valid vs invalid block arrangements

  • Valid: blocks align so rows and columns match up across the entire matrix.
    • Example: (B A; D C) makes sense if the dimensions fit.
  • Invalid: blocks do not align properly.
    • Example: (C B; D A) does not work because the blocks cannot form a rectangle.
  • The key constraint: each row of blocks must have the same total height, and each column of blocks must have the same total width.

🔧 How to work with block matrices

🔧 Operations using blocks

  • You can perform matrix operations by treating blocks as matrix entries.
  • Block multiplication: if M = (A B; C D) and you compute M squared, you get M² = (A² + BC, AB + BD; CA + DC, CB + D²).
  • Each "entry" in the block product is itself a matrix computed using standard matrix operations.

📐 Example calculation

The excerpt gives a detailed example:

  • Start with M = (A B; C D) where A is 3×3, B is 3×1, C is 1×3, D is 1×1.
  • Compute M² = (A B; C D)(A B; C D) = (A² + BC, AB + BD; CA + DC, CB + D²).
  • Calculate each block separately:
    • A² + BC is a 3×3 matrix.
    • AB + BD is a 3×1 matrix.
    • CA + DC is a 1×3 matrix.
    • CB + D² is a 1×1 matrix.
  • Assemble the four blocks into the final 4×4 result.
  • The excerpt confirms: "This is exactly M²."
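
A NumPy sketch (mine) of this check on one concrete 4×4 matrix: square it block-wise and compare with squaring it directly (np.block assembles a matrix from blocks).

```python
import numpy as np

# A 4x4 matrix partitioned as M = (A B; C D) with A 3x3, B 3x1, C 1x3, D 1x1.
A = np.arange(1, 10).reshape(3, 3)
B = np.array([[1], [2], [3]])
C = np.array([[4, 5, 6]])
D = np.array([[7]])

M = np.block([[A, B], [C, D]])

# Square M block-wise, treating the blocks as entries...
M2_blocks = np.block([[A @ A + B @ C, A @ B + B @ D],
                      [C @ A + D @ C, C @ B + D @ D]])

# ...and compare with squaring M directly.
assert np.array_equal(M2_blocks, M @ M)
print(M2_blocks)
```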

🎯 When and why to use block matrices

🎯 Choosing a useful partition

  • Many ways to partition: an n×n matrix can be divided into blocks in many different ways.
  • Context matters: the entries or structure of the matrix often suggest a natural partition.
  • Look for patterns: large blocks of zeros, identity-like blocks, or other regularities make block partitioning useful.

💡 Benefits of block structure

  • Simplifies calculations by reducing the problem to smaller sub-problems.
  • Makes patterns and structure more visible.
  • Example: if a matrix has a large zero block, you can immediately see that certain products will be zero without computing every entry.

⚠️ Don't confuse

  • Not every grouping of blocks is valid—blocks must align to form a rectangle.
  • Treating blocks as entries only works if you respect the rules of matrix multiplication (dimensions must match for block products).

7.3.3 The Algebra of Square Matrices

🧭 Overview

🧠 One-sentence thesis

Square matrices of the same size can always be multiplied in either order and can be used as inputs to polynomial and convergent Taylor series functions, enabling powerful algebraic operations despite multiplication not commuting.

📌 Key points

  • Why square matrices are special: any two n×n matrices can be multiplied in either order (though the results differ), unlike rectangular matrices where dimensions must match.
  • Matrix powers and polynomials: square matrices can be raised to powers (M², M³, etc.), and any polynomial can accept a square matrix as input.
  • Matrix functions via Taylor series: functions defined by convergent Taylor series can be extended to matrices by substituting M for x; the matrix exponential always converges.
  • Trace extracts essential information: the trace (sum of diagonal entries) is a key property that is invariant under cyclic permutation of products—tr(MN) = tr(NM) even though MN ≠ NM.
  • Common confusion: matrix multiplication does not commute (MN ≠ NM in general), but the trace of products does commute (tr(MN) = tr(NM)).

🔢 Why square matrices are algebraically special

🔢 Dimension compatibility for multiplication

  • For general matrices, an r×k matrix M and an s×l matrix N can only be multiplied if k = s (columns of left = rows of right).
  • Square matrices of the same size bypass this restriction: two n×n matrices can always be multiplied in either order.
  • This does not mean the results are the same—MN and NM are typically different matrices.

🔁 Matrix powers and the identity

M⁰ = I, the identity matrix, analogous to x⁰ = 1 for numbers.

  • Because square matrices can be multiplied repeatedly, we can define:
    • M² = MM
    • M³ = MMM
    • And so on for any positive integer power.
  • Setting M⁰ = I allows consistent notation and algebra.

📐 Polynomials and functions of matrices

📐 Polynomials with matrix inputs

  • Any polynomial f(x) can accept a square matrix M as input by replacing x with M and using matrix powers.
  • Example from the excerpt: Let f(x) = x − 2x² + 3x³ and M = [[1, t], [0, 1]].
    • M² = [[1, 2t], [0, 1]]
    • M³ = [[1, 3t], [0, 1]]
    • f(M) = M − 2M² + 3M³ = [[1, t], [0, 1]] − 2[[1, 2t], [0, 1]] + 3[[1, 3t], [0, 1]] = [[2, 6t], [0, 2]].
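
A small check of this computation with a concrete value of t (t = 2 is a hypothetical choice; the excerpt keeps t symbolic):

```python
import numpy as np

t = 2.0  # hypothetical concrete value for the parameter t
M = np.array([[1.0, t], [0.0, 1.0]])

# f(x) = x - 2x^2 + 3x^3, evaluated on the matrix M using matrix powers.
f_M = M - 2 * np.linalg.matrix_power(M, 2) + 3 * np.linalg.matrix_power(M, 3)

print(f_M)                                            # [[ 2. 12.] [ 0.  2.]]
print(np.allclose(f_M, [[2.0, 6 * t], [0.0, 2.0]]))   # True, matching [[2, 6t], [0, 2]]
```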

🌀 Taylor series and matrix functions

  • If f(x) has a convergent Taylor series f(x) = f(0) + f′(0)x + (1/2!)f″(0)x² + …, we can define the matrix function:
    • f(M) = f(0) + f′(0)M + (1/2!)f″(0)M² + …
  • Matrix exponential: exp(M) = I + M + (1/2)M² + (1/3!)M³ + … always converges for any square matrix M.
  • The excerpt notes that convergence techniques for matrix Taylor series rely on the fact that convergence is simple for diagonal matrices.
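
A minimal sketch of the series definition, summing finitely many terms (that is exact here because the chosen matrix is nilpotent; for a general matrix this is only a truncation):

```python
import numpy as np
from math import factorial

def exp_series(M, terms=20):
    """Approximate exp(M) = I + M + M^2/2! + ... by summing the first `terms` terms."""
    result = np.zeros_like(M, dtype=float)
    power = np.eye(M.shape[0])
    for k in range(terms):
        result += power / factorial(k)
        power = power @ M
    return result

# For the nilpotent matrix N = [[0, 1], [0, 0]] we have N^2 = 0, so exp(N) = I + N exactly.
N = np.array([[0.0, 1.0], [0.0, 0.0]])
print(exp_series(N))                              # [[1. 1.] [0. 1.]]
print(np.allclose(exp_series(N), np.eye(2) + N))  # True
```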

🎯 Trace: extracting essential information

🎯 Definition and computation

Trace of a square matrix M = (mᵢⱼ): the sum of its diagonal entries, tr M = Σᵢ mᵢᵢ (sum from i=1 to n).

  • The trace picks out only the diagonal elements and adds them.
  • Example from the excerpt: tr([[2, 7, 6], [9, 5, 1], [4, 3, 8]]) = 2 + 5 + 8 = 15.
  • Why it matters: a large matrix contains much information, some of which may be redundant (e.g., due to inefficient basis choice); the trace extracts essential information invariant under certain transformations.

🔄 Trace commutes under cyclic permutation

  • Key property: tr(MN) = tr(NM) for any square matrices M and N, even though MN ≠ NM in general.
  • Proof sketch from the excerpt:
    • tr(MN) = Σᵢ (MN)ᵢᵢ = Σᵢ Σₗ Mᵢₗ Nₗᵢ
    • Reorder the double sum: Σₗ Σᵢ Nₗᵢ Mᵢₗ = Σₗ (NM)ₗₗ = tr(NM).
  • Example from the excerpt: M = [[1, 1], [0, 1]], N = [[1, 0], [1, 1]].
    • MN = [[2, 1], [1, 1]] ≠ NM = [[1, 1], [1, 2]].
    • But tr(MN) = 2 + 1 = 3 = 1 + 2 = tr(NM).
  • Don't confuse: the matrices themselves do not commute, but their traces do.
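
Checking the example numerically (same M and N as in the excerpt):

```python
import numpy as np

M = np.array([[1, 1], [0, 1]])
N = np.array([[1, 0], [1, 1]])

print(np.array_equal(M @ N, N @ M))      # False: the products are different matrices
print(np.trace(M @ N), np.trace(N @ M))  # 3 3: but the traces agree
```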

🔁 Trace and transpose

  • tr M = tr Mᵀ because the trace only uses diagonal entries, which are unchanged by transposing.
  • Example from the excerpt: tr([[1, 1], [2, 3]]) = 1 + 3 = 4, and its transpose [[1, 2], [1, 3]] has the same trace, 1 + 3 = 4.

📏 Trace as a linear transformation

  • The trace is a linear transformation from matrices to real numbers.
  • This means tr(aM + bN) = a·tr(M) + b·tr(N) for scalars a, b and matrices M, N.
  • The excerpt states this is "easy to check" from the definition.
52

Trace

7.3.4 Trace

🧭 Overview

🧠 One-sentence thesis

The trace extracts essential information from a square matrix by summing its diagonal entries, and it has the remarkable property that the trace of a product of matrices does not depend on the order of multiplication.

📌 Key points (3–5)

  • What trace measures: the sum of the diagonal entries of a square matrix, which captures essential information while ignoring details that may depend on inefficient problem setup.
  • Key property—order independence: tr(MN) = tr(NM), even though matrix multiplication itself does not commute (MN ≠ NM in general).
  • Common confusion: matrix multiplication order matters for the product itself, but trace "cancels out" the order—the trace of MN equals the trace of NM.
  • Additional properties: trace is unchanged by transpose (tr M = tr M^T) and is a linear transformation from matrices to real numbers.

🎯 What trace is and why it matters

🎯 Definition and motivation

Trace of a square matrix M = (m_ij): the sum of its diagonal entries, tr M = sum from i=1 to n of m_ii.

  • A large matrix contains a great deal of information, some of which often reflects inefficient problem setup (e.g., poor choice of basis).
  • The excerpt emphasizes that trace is a way to extract essential information from a matrix.
  • Example: For the 3×3 matrix with entries (2, 7, 6) in row 1, (9, 5, 1) in row 2, and (4, 3, 8) in row 3, the trace is 2 + 5 + 8 = 15 (only the diagonal entries 2, 5, 8 are used).

🔢 Only for square matrices

  • The definition requires a square matrix (n×n) so that diagonal entries m_11, m_22, ..., m_nn exist.
  • The excerpt notes that n must be finite; otherwise convergence subtleties arise.

🔄 Order independence of trace in products

🔄 The key theorem

Theorem 7.3.3: For any square matrices M and N, tr(MN) = tr(NM).

  • This is surprising because matrix multiplication does not commute: in general, MN ≠ NM.
  • Yet the trace of the product is the same regardless of multiplication order.

🔍 Proof sketch

The excerpt provides the proof:

  • tr(MN) = sum over i of (MN)_ii = sum over i, sum over l of M_il N_li
  • Rearranging the double sum: sum over l, sum over i of N_li M_il
  • This equals sum over l of (NM)_ll = tr(NM).

Why it works: the trace sums over the diagonal, and the double summation can be reordered because we only care about the i = j diagonal entries.

🧩 Example showing the distinction

From Example 92:

  • M = matrix with entries (1, 1) in row 1 and (0, 1) in row 2
  • N = matrix with entries (1, 0) in row 1 and (1, 1) in row 2
  • MN = matrix with entries (2, 1) in row 1 and (1, 1) in row 2
  • NM = matrix with entries (1, 1) in row 1 and (1, 2) in row 2
  • MN ≠ NM (the products are different matrices)
  • But tr(MN) = 2 + 1 = 3 and tr(NM) = 1 + 2 = 3, so tr(MN) = tr(NM).

Don't confuse: the matrices MN and NM themselves are different, but their traces are equal.

🧮 Additional properties of trace

🔁 Trace and transpose

Property: tr M = tr M^T

  • The trace only uses diagonal entries, which are unchanged by the transpose operation.
  • Example: tr of matrix (1, 1; 2, 3) = 1 + 3 = 4, and tr of its transpose (1, 2; 1, 3) = 1 + 3 = 4.
  • The excerpt notes "the diagonal entries are fixed by the transpose."

➕ Linearity of trace

Property: Trace is a linear transformation from matrices to real numbers.

  • The excerpt states "trace is a linear transformation" and notes "this is easy to check."
  • This means tr(aM + bN) = a·tr(M) + b·tr(N) for scalars a, b and matrices M, N.
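
A quick numerical spot-check of both properties, using the 2×2 matrix from the transpose example together with a hypothetical second matrix and hypothetical scalars:

```python
import numpy as np

M = np.array([[1.0, 1.0], [2.0, 3.0]])
N = np.array([[4.0, 0.0], [1.0, 2.0]])  # hypothetical second matrix
a, b = 2.0, -3.0                        # hypothetical scalars

print(np.trace(M) == np.trace(M.T))  # True: transpose invariance
print(np.isclose(np.trace(a * M + b * N),
                 a * np.trace(M) + b * np.trace(N)))  # True: linearity
```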

📊 Summary table

| Property | Statement | Key insight |
|---|---|---|
| Definition | tr M = sum of diagonal entries m_ii | Extracts essential information |
| Order independence | tr(MN) = tr(NM) | Holds even though MN ≠ NM |
| Transpose invariance | tr M = tr M^T | Diagonal unchanged by transpose |
| Linearity | tr(aM + bN) = a·tr(M) + b·tr(N) | Trace is a linear map to reals |
53

7.4 Review Problems

7.4 Review Problems

🧭 Overview

🧠 One-sentence thesis

This section provides practice problems that consolidate matrix multiplication, transpose properties, trace invariance, and the relationship between matrix operations and linear transformations.

📌 Key points (3–5)

  • Matrix multiplication practice: compute products of various sizes and shapes to reinforce the mechanics of matrix multiplication.
  • Transpose of products: prove that the transpose of a product reverses the order: (MN)ᵀ = NᵀMᵀ.
  • Trace properties: explore how trace behaves under multiplication and transpose, and investigate whether trace is basis-independent for linear transformations.
  • Common confusion: distinguish between left and right multiplication—both are linear transformations, but they act on different matrix spaces.
  • Applications: connect abstract properties (symmetry, dot products as matrix operations) to concrete computational techniques.

🧮 Matrix multiplication mechanics

🧮 Computing products

The first problem asks you to compute several matrix products of varying dimensions:

  • Products of 3×3 matrices with 3×3 matrices
  • A row vector times a column vector (yielding a scalar)
  • A column vector times a row vector (yielding a matrix)
  • Chains of three or more matrices

Why this matters: Fluency with matrix multiplication is essential because:

  • You must track dimensions carefully (an m×n matrix times an n×p matrix yields an m×p matrix)
  • Order matters—matrix multiplication does not commute
  • Special cases (row times column vs. column times row) produce very different results

Example: A 1×5 row vector times a 5×1 column vector gives a 1×1 result (a number), but a 5×1 column vector times a 1×5 row vector gives a 5×5 matrix.

🔄 Verifying the transpose product rule

Problem 2 walks through a structured proof that (MN)ᵀ = NᵀMᵀ:

  • Write M and N in terms of their entries mᵢⱼ and nᵢⱼ
  • Compute the (i,j)-entry of MN
  • Compute the (i,j)-entry of (MN)ᵀ
  • Compute the (i,j)-entry of NᵀMᵀ
  • Show these are equal

Don't confuse: The transpose reverses the order of multiplication, just like the inverse does: (AB)⁻¹ = B⁻¹A⁻¹.
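
A numerical sanity check of the identity, reusing the 2×2 matrices from Example 92 earlier in these notes (any matrices of compatible sizes would do):

```python
import numpy as np

M = np.array([[1, 1], [0, 1]])
N = np.array([[1, 0], [1, 1]])

print(np.array_equal((M @ N).T, N.T @ M.T))  # True
print(np.array_equal((M @ N).T, M.T @ N.T))  # False: the order must be reversed
```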

🔍 Trace and symmetry

🔍 Trace of products

Problem 3 asks you to:

  • Compute AAᵀ and AᵀA for a specific matrix A
  • Show that for any m×n matrix M, both MᵀM and MMᵀ are symmetric
  • Determine the relationship between their traces

Key insight: Even though MᵀM and MMᵀ have different sizes (n×n vs. m×m), their traces are equal because trace is invariant under cyclic permutation: tr(MᵀM) = tr(MMᵀ).
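
A spot-check with a hypothetical 2×3 matrix: the two products have different sizes, but both are symmetric and their traces are equal (each equals the sum of squares of the entries of M):

```python
import numpy as np

M = np.array([[-2.0, 1.0, 3.0],
              [-4.0, 4.0, 1.0]])  # hypothetical 2x3 matrix

MtM = M.T @ M   # 3x3
MMt = M @ M.T   # 2x2

print(np.allclose(MtM, MtM.T), np.allclose(MMt, MMt.T))  # True True: both symmetric
print(np.trace(MtM), np.trace(MMt))                      # 47.0 47.0: equal traces
```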

🔍 Basis independence of trace

Problem 6 explores a deep property:

  • Compute the matrix of a linear transformation L in one basis B
  • Compute the matrix of the same L in a different basis B′
  • Compare the traces

What you should find: The trace is the same in both bases. This means trace is an intrinsic property of the linear transformation itself, not dependent on the choice of basis. It makes sense to talk about "the trace of a linear transformation" without specifying a basis.

🧩 Special matrix operations

🧩 Dot product as matrix multiplication

Problem 4 asks you to show that the dot product of column vectors x and y can be written as xᵀIy, where I is the identity matrix.

Why this form matters: This expresses the dot product as a matrix operation, connecting geometric intuition (dot product) with algebraic machinery (matrix multiplication).

🧩 Right multiplication is linear

Problem 5 asks you to prove that right multiplication by a k×m matrix R is a linear transformation from the space of s×k matrices to the space of s×m matrices.

What to show: For matrices A and B of size s×k and scalars c:

  • (A + B)R = AR + BR
  • (cA)R = c(AR)

Don't confuse: Left multiplication by an r×s matrix N takes s×k matrices to r×k matrices; right multiplication by a k×m matrix R takes s×k matrices to s×m matrices. Both are linear, but they change different dimensions.

🎯 Diagonal and special matrices

🎯 Diagonal matrix multiplication

Problem 7 asks you to explain what happens when you multiply a matrix by a diagonal matrix on the left vs. on the right.

Key observations:

  • Left multiplication by a diagonal matrix scales the rows of the matrix
  • Right multiplication by a diagonal matrix scales the columns of the matrix

Example: If D is diagonal with entries d₁, d₂, d₃, then DM multiplies row i of M by dᵢ, while MD multiplies column j of M by dⱼ.
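
A quick check with hypothetical diagonal entries and a 3×3 matrix of ones, where the scaling pattern is easy to see:

```python
import numpy as np

D = np.diag([2.0, 3.0, 5.0])   # hypothetical diagonal entries d1, d2, d3
M = np.ones((3, 3))

print(D @ M)  # each ROW i of M is multiplied by d_i
print(M @ D)  # each COLUMN j of M is multiplied by d_j
```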

🎯 Matrix exponential

Problem 8 asks you to compute exp(A) for specific 2×2 matrices, including:

  • A diagonal matrix with λ on the diagonal
  • An upper triangular matrix
  • A nilpotent matrix (one where some power equals zero)

Recall: The matrix exponential is defined as exp(A) = I + A + (A²/2!) + (A³/3!) + ...

🎯 Block multiplication

Problem 9 introduces block matrix multiplication: divide a large matrix into named blocks (submatrices) and multiply the blocks as if they were scalars, provided dimensions are compatible.

Advantage: Block multiplication can simplify computation when the matrix has special structure (e.g., contains an identity block or zero blocks).

🔀 Symmetric and anti-symmetric decomposition

🔀 Decomposing any matrix

Problem 10 asks you to show that every n×n matrix M can be written as M = A + S, where:

  • A is anti-symmetric (Aᵀ = −A)
  • S is symmetric (Sᵀ = S)

Hint from the excerpt: Consider M + Mᵀ and M − Mᵀ.

Solution approach:

  • M + Mᵀ is symmetric
  • M − Mᵀ is anti-symmetric
  • Set S = (M + Mᵀ)/2 and A = (M − Mᵀ)/2
  • Then M = A + S

Why this matters: This decomposition separates the symmetric and anti-symmetric parts of any matrix, which is useful in many applications (e.g., physics, where symmetric and anti-symmetric tensors have different physical meanings).
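
A sketch of the decomposition on a hypothetical 3×3 matrix, following the hint:

```python
import numpy as np

M = np.array([[1.0, 2.0, 0.0],
              [5.0, 3.0, 1.0],
              [4.0, 7.0, 6.0]])  # hypothetical matrix

S = (M + M.T) / 2   # symmetric part
A = (M - M.T) / 2   # anti-symmetric part

print(np.allclose(S, S.T))    # True: S is symmetric
print(np.allclose(A, -A.T))   # True: A is anti-symmetric
print(np.allclose(A + S, M))  # True: M = A + S
```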

⚠️ Non-associative operations

⚠️ Cross product example

Problem 11 explores the cross product, which is not associative:

  • Part (a): Find vectors u, v, w such that u × (v × w) ≠ (u × v) × w
  • Part (b): Reconcile this with the fact that matrix multiplication is associative

Key insight: The cross product operator B = u× can be written as a matrix (given a basis), and composing such operators corresponds to matrix multiplication, which is associative. The resolution is that the cross product itself is not the same as composition of cross-product operators.

Don't confuse: An operation being linear does not imply it is associative. The cross product is linear in each argument but not associative.
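
One concrete choice of vectors answering part (a) (a hypothetical example; many choices work):

```python
import numpy as np

u = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 0.0, 0.0])
w = np.array([0.0, 1.0, 0.0])

left = np.cross(u, np.cross(v, w))   # u x (v x w) = [0, -1, 0]
right = np.cross(np.cross(u, v), w)  # (u x v) x w = [0, 0, 0]

print(left, right, np.array_equal(left, right))  # ... False: not associative
```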

54

Inverse Matrix

7.5 Inverse Matrix

🧭 Overview

🧠 One-sentence thesis

A square matrix is invertible when there exists another matrix that undoes its effect through multiplication, and this inverse allows immediate solution of linear systems and reveals whether homogeneous systems have only the trivial solution.

📌 Key points (3–5)

  • What invertibility means: a square matrix M is invertible if there exists M⁻¹ such that M⁻¹M = I = MM⁻¹; otherwise M is singular.
  • How to compute inverses: use Gaussian elimination on the augmented matrix (M | I) until the left side becomes I, then the right side is M⁻¹.
  • Why inverses matter for systems: if M⁻¹ exists and is known, the solution to Mx = v is immediately x = M⁻¹v.
  • Common confusion: the order reverses when taking inverses of products—(AB)⁻¹ = B⁻¹A⁻¹, not A⁻¹B⁻¹.
  • Connection to homogeneous systems: M is invertible if and only if Mx = 0 has no non-zero solutions.

🔑 Definition and basic properties

🔑 What invertible means

Invertible (or nonsingular): a square matrix M is invertible if there exists a matrix M⁻¹ such that M⁻¹M = I = MM⁻¹.

Singular (or non-invertible): a matrix M that has no inverse.

  • The inverse must work from both sides: left multiplication and right multiplication both yield the identity.
  • Only square matrices can be invertible (the excerpt only discusses square matrices).
  • Example: if M is 3×3 and M⁻¹ exists, then M⁻¹M = I₃ and MM⁻¹ = I₃.

🧮 The 2×2 formula

For a 2×2 matrix M = (a b; c d), define N = (d -b; -c a).

  • Multiplying gives MN = (ad - bc, 0; 0, ad - bc) = (ad - bc)I.
  • Therefore M⁻¹ = 1/(ad - bc) · (d -b; -c a), so long as ad - bc ≠ 0.
  • The quantity ad - bc is the determinant; if it equals zero, the matrix is not invertible.
  • The excerpt emphasizes this formula is worth memorizing (see Figure 7.1 reference).

Example scenario: For M = (2 3; 1 4), we have ad - bc = 2·4 - 3·1 = 5, so M⁻¹ = 1/5 · (4 -3; -1 2).

🔄 Three key properties of inverses

🔄 Inverse of an inverse

Property 1: If B is the inverse of A, then A is the inverse of B, since AB = I = BA.

  • This gives the identity (A⁻¹)⁻¹ = A.
  • Taking the inverse twice brings you back to the original matrix.

🔀 Inverse of a product

Property 2: (AB)⁻¹ = B⁻¹A⁻¹

  • The order reverses, just like with transposes.
  • Why: B⁻¹A⁻¹AB = B⁻¹IB = I, and similarly ABB⁻¹A⁻¹ = I.
  • Don't confuse: (AB)⁻¹ is not A⁻¹B⁻¹; the factors must be reversed.

Example: If you apply transformation A then B, the inverse operation applies B⁻¹ first, then A⁻¹.

🔁 Inverse of a transpose

Property 3: (A⁻¹)ᵀ = (Aᵀ)⁻¹

  • Since Iᵀ = I, we have (A⁻¹A)ᵀ = Aᵀ(A⁻¹)ᵀ = I.
  • Similarly (AA⁻¹)ᵀ = (A⁻¹)ᵀAᵀ = I.
  • The inverse of the transpose equals the transpose of the inverse.

🛠️ Computing inverses via Gaussian elimination

🛠️ The augmented matrix method

To compute M⁻¹, solve the collection of systems Mx = eₖ, where eₖ is the column vector of zeroes with a 1 in the kth entry.

  • The identity matrix can be viewed as I = (e₁ e₂ ··· eₙ).
  • Set up the augmented matrix (M | I) and row-reduce the left side to I.
  • When the left side becomes I, the right side becomes M⁻¹: (M | I) ∼ (I | M⁻¹).

Why this works:

  • Solving Mx = V gives x = M⁻¹V on the right side after reduction.
  • Solving Mx = eₖ gives x = M⁻¹eₖ, which is the kth column of M⁻¹.
  • Putting all columns together gives the full inverse matrix.

📝 Worked example

The excerpt shows finding the inverse of M = (-1 2 -3; 2 1 0; 4 -2 5).

Steps:

  1. Write augmented matrix (-1 2 -3 | 1 0 0; 2 1 0 | 0 1 0; 4 -2 5 | 0 0 1).
  2. Apply row reduction to the left side until it becomes I.
  3. The result is (I | M⁻¹) where M⁻¹ = (-5 4 -3; 10 -7 6; 8 -6 5).

Always check: Verify MM⁻¹ = I (or M⁻¹M = I) to catch arithmetic errors, since row reduction is lengthy and error-prone.

⚠️ Why checking matters

  • Row reduction involves many arithmetic steps with room for mistakes.
  • The excerpt explicitly recommends checking the answer by confirming MM⁻¹ = I.
  • Example: in the worked example, multiplying M by the computed M⁻¹ indeed yields the 3×3 identity matrix.
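
A minimal sketch of the augmented-matrix method in code (no row swaps, which happens to suffice for this particular M; a general-purpose routine would add pivoting):

```python
import numpy as np

def inverse_by_row_reduction(M):
    """Row-reduce the augmented matrix (M | I) to (I | M^-1). No pivoting; illustration only."""
    n = M.shape[0]
    aug = np.hstack([M.astype(float), np.eye(n)])
    for i in range(n):
        aug[i] = aug[i] / aug[i, i]                   # scale the pivot row so the pivot is 1
        for j in range(n):
            if j != i:
                aug[j] = aug[j] - aug[j, i] * aug[i]  # clear the rest of column i
    return aug[:, n:]

M = np.array([[-1,  2, -3],
              [ 2,  1,  0],
              [ 4, -2,  5]])
M_inv = inverse_by_row_reduction(M)
print(M_inv)                              # [[-5, 4, -3], [10, -7, 6], [8, -6, 5]]
print(np.allclose(M @ M_inv, np.eye(3)))  # True: always check M M^-1 = I
```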

📐 Using inverses to solve linear systems

📐 Immediate solution when inverse is known

If M⁻¹ exists and is known, then Mx = v is equivalent to x = M⁻¹v.

  • Multiply both sides of Mx = v by M⁻¹ on the left: M⁻¹(Mx) = M⁻¹v.
  • Since M⁻¹M = I, this simplifies to x = M⁻¹v.
  • The solution is immediate—no further row reduction needed.

Summary: When M⁻¹ exists, Mx = v ⇔ x = M⁻¹v.

📝 Example with the computed inverse

Consider the system:

  • -x + 2y - 3z = 1
  • 2x + y = 2
  • 4x - 2y + 5z = 0

This is Mx = (1; 2; 0) where M is the matrix from the previous section.

Solution:

  • x = M⁻¹(1; 2; 0) = (-5 4 -3; 10 -7 6; 8 -6 5)(1; 2; 0) = (3; -4; -4).
  • The solution is x = 3, y = -4, z = -4.

🔗 Connection to homogeneous systems

🔗 The invertibility criterion

Theorem: A square matrix M is invertible if and only if the homogeneous system Mx = 0 has no non-zero solutions.

Proof (one direction):

  • Suppose M⁻¹ exists. Then Mx = 0 ⇒ x = M⁻¹0 = 0.
  • So if M is invertible, Mx = 0 has only the trivial solution x = 0.

Proof (other direction):

  • Mx = 0 always has the solution x = 0.
  • If no other solutions exist, M can be put into reduced row echelon form with every variable a pivot.
  • In this case, M⁻¹ can be computed using the augmented matrix method.

🧩 What this means

  • Invertibility is equivalent to having full rank (every variable is a pivot).
  • If Mx = 0 has a non-zero solution, then M is singular (not invertible).
  • Don't confuse: "no non-zero solutions" means "only the zero solution exists."

💾 Bit matrices and Z₂

💾 Matrices over Z₂

A bit matrix is a matrix with entries in Z₂ = {0, 1}, where addition and multiplication follow special rules:

Addition in Z₂:

| + | 0 | 1 |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |

Multiplication in Z₂:

| × | 0 | 1 |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |

  • Notice that -1 = 1 in Z₂, since 1 + 1 = 0.
  • All linear algebra concepts (vector spaces, inverses, etc.) apply to matrices over Z₂.
  • Computers can add and multiply bits very quickly, making bit matrices practical.

🔐 Example and application

The excerpt gives an example: (1 0 1; 0 1 1; 1 1 1) is invertible over Z₂, with inverse (0 1 1; 1 0 1; 1 1 1).
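
Verifying the Z₂ example numerically (all arithmetic reduced mod 2):

```python
import numpy as np

M     = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 1]])
M_inv = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 1]])

print((M @ M_inv) % 2)  # identity matrix: M_inv really is the inverse over Z2
print((M_inv @ M) % 2)  # identity matrix again
```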

Application—Cryptography:

  • A simple way to hide information is a substitution cipher, which permutes the alphabet.
  • Example: ROT-13 exchanges each letter with the letter thirteen places away.
  • (The excerpt cuts off here, but the implication is that bit matrices can be used for encoding/decoding.)
55

7.5.1 Three Properties of the Inverse

7.5.1 Three Properties of the Inverse

🧭 Overview

🧠 One-sentence thesis

The inverse of a matrix behaves predictably under composition and transposition: the inverse of an inverse returns the original matrix, the inverse of a product reverses the order of factors, and the inverse and transpose operations commute.

📌 Key points (3–5)

  • Property 1 (double inverse): The inverse of A⁻¹ is A itself.
  • Property 2 (product rule): The inverse of a product AB reverses the order: (AB)⁻¹ = B⁻¹A⁻¹.
  • Property 3 (transpose rule): The inverse of a transpose equals the transpose of the inverse: (A⁻¹)ᵀ = (Aᵀ)⁻¹.
  • Common confusion: Like the transpose, taking the inverse of a product reverses the order of the factors—do not assume (AB)⁻¹ = A⁻¹B⁻¹.
  • Why it matters: These properties simplify manipulations of matrix equations and ensure consistency when combining operations.

🔄 Property 1: The inverse of an inverse

🔄 Double inverse returns the original

If A is a square matrix and B is the inverse of A, then A is the inverse of B, since AB = I = BA. So we have the identity (A⁻¹)⁻¹ = A.

  • Plain language: Inverting twice brings you back to where you started.
  • Why: If B is the inverse of A, then by definition AB = I and BA = I. This means A satisfies the definition of being the inverse of B.
  • Example: If matrix M has inverse M⁻¹, then (M⁻¹)⁻¹ = M.
  • Don't confuse: This is not saying "the inverse exists"; it assumes the inverse already exists and describes what happens when you invert it again.

🔀 Property 2: Inverse of a product

🔀 Order reversal for products

Notice that B⁻¹A⁻¹AB = B⁻¹IB = I = ABB⁻¹A⁻¹ so (AB)⁻¹ = B⁻¹A⁻¹.

  • Plain language: The inverse of a product of two matrices is the product of their inverses in reverse order.
  • Why: The excerpt shows that B⁻¹A⁻¹ satisfies the definition of the inverse of AB by verifying both:
    • B⁻¹A⁻¹ times AB equals I
    • AB times B⁻¹A⁻¹ equals I
  • Example: If you have matrices A and B, then (AB)⁻¹ = B⁻¹A⁻¹, not A⁻¹B⁻¹.
  • Don't confuse: This is exactly like the transpose rule (AB)ᵀ = BᵀAᵀ—both operations reverse the order of a product.

🔗 Analogy to transpose

The excerpt explicitly notes:

Thus, much like the transpose, taking the inverse of a product reverses the order of the product.

  • Both transpose and inverse flip the order when applied to a product.
  • This parallel helps remember the rule: if you know (AB)ᵀ = BᵀAᵀ, then (AB)⁻¹ = B⁻¹A⁻¹ follows the same pattern.

🔃 Property 3: Inverse and transpose commute

🔃 Inverse of transpose equals transpose of inverse

Finally, recall that (AB)ᵀ = BᵀAᵀ. Since Iᵀ = I, then (A⁻¹A)ᵀ = Aᵀ(A⁻¹)ᵀ = I. Similarly, (AA⁻¹)ᵀ = (A⁻¹)ᵀAᵀ = I. Then: (A⁻¹)ᵀ = (Aᵀ)⁻¹.

  • Plain language: You can take the inverse first then transpose, or transpose first then invert—the result is the same.
  • Why: The excerpt verifies that (A⁻¹)ᵀ satisfies the definition of the inverse of Aᵀ by showing:
    • (A⁻¹)ᵀ times Aᵀ equals I (by transposing A⁻¹A = I)
    • Aᵀ times (A⁻¹)ᵀ equals I (by transposing AA⁻¹ = I)
  • Example: If you have matrix M, then (M⁻¹)ᵀ = (Mᵀ)⁻¹.
  • Don't confuse: This does not say the transpose and inverse are the same operation; it says the two operations can be performed in either order with the same result.
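
A numerical spot-check of Properties 2 and 3 with hypothetical invertible 2×2 matrices:

```python
import numpy as np
from numpy.linalg import inv

A = np.array([[2.0, 3.0], [1.0, 4.0]])  # hypothetical invertible matrices
B = np.array([[1.0, 1.0], [0.0, 1.0]])

print(np.allclose(inv(A @ B), inv(B) @ inv(A)))  # True: (AB)^-1 = B^-1 A^-1
print(np.allclose(inv(A @ B), inv(A) @ inv(B)))  # False: order matters
print(np.allclose(inv(A).T, inv(A.T)))           # True: (A^-1)^T = (A^T)^-1
```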

🧩 Why the identity transpose matters

The excerpt notes "Since Iᵀ = I":

  • The identity matrix is symmetric, so transposing it does nothing.
  • This fact is crucial: when you transpose A⁻¹A = I, you get Aᵀ(A⁻¹)ᵀ = Iᵀ = I, which proves (A⁻¹)ᵀ is the inverse of Aᵀ.

📋 Summary table

| Property | Formula | Key insight |
|---|---|---|
| Double inverse | (A⁻¹)⁻¹ = A | Inverting twice returns the original |
| Product inverse | (AB)⁻¹ = B⁻¹A⁻¹ | Order reverses (like transpose) |
| Transpose inverse | (A⁻¹)ᵀ = (Aᵀ)⁻¹ | Inverse and transpose commute |
56

7.5.2 Finding Inverses (Redux)

7.5.2 Finding Inverses (Redux)

🧭 Overview

🧠 One-sentence thesis

Gaussian elimination can find the inverse of a square matrix by row-reducing the augmented matrix (M | I) until the left side becomes the identity, at which point the right side becomes M⁻¹.

📌 Key points (3–5)

  • Core method: augment M with the identity matrix I, then row-reduce the left side to I; the right side becomes M⁻¹.
  • Why it works: solving MX = V by row reduction produces M⁻¹V on the right; solving MX = eₖ for all k produces the columns of M⁻¹.
  • Practical workflow: write (M | I), apply Gaussian elimination to the left, check the result by verifying MM⁻¹ = I.
  • Common confusion: the identity matrix on the right is not the answer—it's the starting point; the answer appears after row reduction.
  • Connection to systems: once M⁻¹ is known, solving MX = V becomes immediate: X = M⁻¹V.

🔧 The augmented-matrix method

🔧 Why augmenting with I works

  • The excerpt explains that for an invertible matrix M, the system MX = V has a unique solution X = M⁻¹V.
  • Row-reducing the augmented matrix (M | V) produces (I | M⁻¹V) on the right.
  • To get M⁻¹ itself (not M⁻¹V), we need M⁻¹I = M⁻¹ on the right side.
  • This is achieved by solving MX = eₖ for each standard basis vector eₖ (a column of zeros with a 1 in the kth position).
  • Grouping all eₖ together forms the identity matrix: Iₙ = (e₁ e₂ ⋯ eₙ).

📐 The reduction process

To compute M⁻¹, augment M with the identity matrix and row-reduce: (M | I) ∼ (I | M⁻¹).

  • Start with the augmented matrix (M | I).
  • Apply Gaussian row operations only to the left side until it becomes the identity matrix.
  • Once the left side is I, the right side is M⁻¹.
  • Don't confuse: the identity on the right is the input; the identity on the left is the goal.

✅ Verification step

  • The excerpt emphasizes checking the answer because row reduction involves many arithmetic steps with room for error.
  • Verify by computing MM⁻¹ = I (or M⁻¹M = I).
  • Example: the excerpt shows a 3×3 matrix M and its computed inverse, then confirms the product equals the identity matrix.

🧮 Worked example

🧮 Finding the inverse of a 3×3 matrix

The excerpt provides Example 93: find the inverse of the matrix with rows (−1, 2, −3), (2, 1, 0), (4, −2, 5).

Step 1: Write the augmented matrix

  • Left side: the original matrix M.
  • Right side: the 3×3 identity matrix.

Step 2: Row-reduce the left side

  • The excerpt shows three intermediate steps (∼ symbols indicate row equivalence).
  • Operations transform the left side from M to I.

Step 3: Read off M⁻¹

  • Final augmented matrix: (I | M⁻¹).
  • The inverse is the 3×3 matrix with rows (−5, 4, −3), (10, −7, 6), (8, −6, 5).

Step 4: Check

  • Multiply M by M⁻¹; the result is the identity matrix, confirming correctness.

🔗 Applications to linear systems

🔗 Immediate solution via M⁻¹

  • If M⁻¹ is known, solving MX = V becomes trivial: X = M⁻¹V.
  • Example 94: the system with equations −x + 2y − 3z = 1, 2x + y = 2, 4x − 2y + 5z = 0 corresponds to MX = V where V = (1, 2, 0).
  • Using the inverse from Example 93, X = M⁻¹V = (−5, 4, −3; 10, −7, 6; 8, −6, 5) times (1, 2, 0) = (3, −4, −4).
  • The solution is x = 3, y = −4, z = −4.

🔗 Summary equivalence

When M⁻¹ exists, MX = V if and only if X = M⁻¹V.

  • This equivalence allows immediate reading of solutions without further row reduction.

🧪 Invertibility and homogeneous systems

🧪 Theorem 7.5.1

A square matrix M is invertible if and only if the homogeneous system MX = 0 has no non-zero solutions.

Direction 1: If M⁻¹ exists, then MX = 0 implies X = 0

  • Multiply both sides by M⁻¹: X = M⁻¹0 = 0.
  • So the only solution is the trivial solution.

Direction 2: If MX = 0 has only the trivial solution, then M⁻¹ exists

  • MX = 0 always has the solution X = 0.
  • If no other solutions exist, M can be row-reduced to a form where every variable is a pivot.
  • In this case, M⁻¹ can be computed using the augmented-matrix process.

🧪 Why this matters

  • This theorem provides a test for invertibility: check whether the homogeneous system has non-zero solutions.
  • If it does, M is not invertible; if it doesn't, M is invertible.

🖥️ Bit matrices

🖥️ Matrices over Z₂

  • The excerpt introduces bit matrices: matrices with entries in Z₂ = {0, 1}.
  • Addition and multiplication in Z₂ follow special tables:
    • Addition: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0.
    • Multiplication: 0 × 0 = 0, 0 × 1 = 0, 1 × 0 = 0, 1 × 1 = 1.
  • Note: −1 = 1 in Z₂ because 1 + 1 = 0.

🖥️ Linear algebra over Z₂

  • All linear algebra concepts (vector spaces, matrices, inverses) apply to Z₂ entries.
  • Example 95: the 3×3 matrix with rows (1, 0, 1), (0, 1, 1), (1, 1, 1) is invertible over Z₂.
  • Its inverse is the matrix with rows (0, 1, 1), (1, 0, 1), (1, 1, 1).
  • Verification: multiply the two matrices using Z₂ arithmetic; the result is the identity matrix.

🖥️ Application hint

  • The excerpt mentions cryptography: substitution ciphers permute the alphabet to hide information.
  • Bit matrices can be used in such schemes (the excerpt begins to describe ROT-13 but is cut off).
57

Linear Systems and Inverses

7.5.3 Linear Systems and Inverses

🧭 Overview

🧠 One-sentence thesis

When a matrix inverse exists, it provides an immediate solution to any linear system associated with that matrix by converting the system into a simple multiplication problem.

📌 Key points (3–5)

  • Direct solution method: If M⁻¹ is known, solving Mx = v reduces to computing x = M⁻¹v.
  • Verification is essential: After computing an inverse through row reduction, always check that M M⁻¹ = I to catch arithmetic errors.
  • Connection to homogeneous systems: A square matrix is invertible if and only if the homogeneous system Mx = 0 has no non-zero solutions.
  • Common confusion: The inverse solves any system with that matrix, not just one specific equation—once you have M⁻¹, you can solve Mx = v for any vector v.
  • Practical applications: Bit matrices (matrices with entries 0 and 1 under modulo-2 arithmetic) use the same inverse principles for cryptography and data encoding.

🔑 Solving systems with inverses

🔑 The fundamental equivalence

When M⁻¹ exists: Mx = v ⇔ x = M⁻¹v

  • This transforms a system of equations into a single matrix multiplication.
  • The solution is obtained directly without further row reduction or substitution.
  • Why this works: Multiplying both sides of Mx = v by M⁻¹ gives M⁻¹Mx = M⁻¹v, which simplifies to x = M⁻¹v because M⁻¹M = I.

📝 Step-by-step example

The excerpt shows a system:

  • Negative x plus 2y minus 3z equals 1
  • 2x plus y equals 2
  • 4x minus 2y plus 5z equals 0

Solution process:

  1. Write as matrix equation MX = v where v is the column vector [1, 2, 0]
  2. Apply the known inverse: X = M⁻¹v
  3. Multiply the inverse matrix by the vector [1, 2, 0]
  4. Result: x = 3, y = -4, z = -4
  • Once the equation is in the form x = M⁻¹v with the product computed, the solution can be read off directly.

✅ Verification and error checking

✅ Why verification matters

  • Row reduction is "lengthy and involved" with "lots of room for arithmetic errors."
  • Always check your computed inverse by confirming M M⁻¹ = I (or equivalently M⁻¹M = I).

✅ How to verify

The excerpt demonstrates:

  • Multiply the original matrix M by the computed inverse M⁻¹
  • The result should be the identity matrix (1s on the diagonal, 0s elsewhere)
  • Example: The product shown equals the 3×3 identity matrix, confirming the inverse is correct

Don't confuse: Verification is not optional—it's a required step to ensure your answer is correct.

🔗 Connection to homogeneous systems

🔗 The invertibility criterion

Theorem: A square matrix M is invertible if and only if the homogeneous system Mx = 0 has no non-zero solutions.

Two directions of the proof:

| Direction | Logic | Implication |
|---|---|---|
| If M⁻¹ exists | Mx = 0 implies x = M⁻¹0 = 0 | Only the zero solution exists |
| If only x = 0 solves Mx = 0 | M can be put in reduced row echelon form with every variable a pivot | M⁻¹ can be computed |

🔗 What this means

  • The homogeneous system Mx = 0 always has at least one solution: x = 0 (the trivial solution).
  • Key question: Are there any other solutions?
  • If no non-zero solutions exist, the matrix has full rank and is invertible.
  • If non-zero solutions exist, the matrix is singular (not invertible).

Don't confuse: "No non-zero solutions" means the only solution is the zero vector—it doesn't mean "no solutions at all."

💻 Bit matrices and applications

💻 What are bit matrices

A bit matrix: a matrix with entries in Z₂ = {0, 1} with addition and multiplication modulo 2.

Arithmetic rules in Z₂:

  • Addition: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0

  • Multiplication: 0 × 0 = 0, 0 × 1 = 0, 1 × 0 = 0, 1 × 1 = 1

  • Special property: -1 = 1 (since 1 + 1 = 0)

  • All linear algebra concepts apply to bit matrices.

  • A bit is the basic unit of information in computers (a single 0 or 1).

💻 Example from the excerpt

The 3×3 bit matrix with entries [1,0,1; 0,1,1; 1,1,1] is invertible, and its inverse is [0,1,1; 1,0,1; 1,1,1].

  • Verification works the same way: multiply them to get the identity matrix (using modulo-2 arithmetic).

🔐 Cryptography application

Substitution ciphers:

  • Simple version: systematically exchange each letter for another (e.g., ROT-13 shifts each letter 13 positions).
  • Easy to break when the same substitution is used throughout.

Matrix-based encryption:

  • Represent characters as bit vectors (e.g., ASCII uses 8 bits per character, forming a vector in Z₂⁸).
  • Choose an invertible bit matrix M (e.g., 8×8 for single characters, 16×16 for pairs).
  • Encrypt: multiply each character vector by M.
  • Decrypt: multiply the encrypted vector by M⁻¹.
  • Harder to break than simple substitution, especially with larger matrices.
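
A toy sketch of the idea using the 3×3 invertible bit matrix from Example 95 in these notes and a hypothetical 3-bit "message" (a real scheme would use 8×8 or larger matrices acting on ASCII bit vectors):

```python
import numpy as np

M     = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 1]])  # invertible over Z2 (Example 95)
M_inv = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 1]])  # its inverse over Z2

x = np.array([1, 0, 1])              # hypothetical 3-bit message

encrypted = (M @ x) % 2              # encode: multiply by M, reduce mod 2
decrypted = (M_inv @ encrypted) % 2  # decode: multiply by M^-1, reduce mod 2

print(encrypted, decrypted, np.array_equal(decrypted, x))  # ... True
```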

One-time pad:

  • Uses a different substitution for each letter.
  • "Practically uncrackable" as long as each set of substitutions is used only once.

Don't confuse: The matrix method is not a one-time pad—it uses the same matrix M for all characters, but it's still more secure than simple letter-by-letter substitution because it mixes information across multiple bits.

58

Homogeneous Systems

7.5.4 Homogeneous Systems

🧭 Overview

🧠 One-sentence thesis

A homogeneous system Mx = 0 has only the trivial solution x = 0 if and only if the matrix M is invertible, which means M can be reduced to a form where every variable is a pivot.

📌 Key points (3–5)

  • Invertibility and solutions: If M⁻¹ exists, then Mx = 0 has no non-zero solutions; conversely, if Mx = 0 has only the zero solution, then M is invertible.
  • Always has at least one solution: The homogeneous system Mx = 0 always has the trivial solution x = 0.
  • Pivot structure: If no solutions other than x = 0 exist, M can be put into reduced row echelon form with every variable a pivot.
  • Common confusion: Don't confuse "no solutions" (impossible for homogeneous systems) with "no non-zero solutions" (which means M is invertible).
  • Computing the inverse: When Mx = 0 has only the trivial solution, M⁻¹ can be computed using the process from the previous section.

🔗 The invertibility-solution connection

🔗 When M is invertible

If M⁻¹ exists, then Mx = 0 implies x = M⁻¹0 = 0.

  • Start with the equation Mx = 0.
  • Multiply both sides by M⁻¹ (on the left).
  • This gives x = M⁻¹(Mx) = M⁻¹0 = 0.
  • Therefore: invertibility guarantees that the only solution is the zero vector.

🔄 The reverse direction

  • The homogeneous system Mx = 0 always has at least one solution: x = 0 (the trivial solution).
  • If no other solutions exist beyond x = 0, then M can be transformed into reduced row echelon form where every variable is a pivot.
  • When every variable is a pivot, M is invertible.
  • In this case, M⁻¹ can be computed using the standard process (from the previous section).

🎯 What this means for solving homogeneous systems

🎯 Two possible outcomes

| Situation | What it means | Implication for M |
|---|---|---|
| Only x = 0 solves Mx = 0 | No non-zero solutions exist | M is invertible (M⁻¹ exists) |
| Other solutions exist | Non-zero solutions exist | M is not invertible (singular) |

🧮 The pivot criterion

  • "Every variable a pivot" means that in reduced row echelon form, each column has a leading 1 in a different row.
  • This is equivalent to saying the matrix has full rank.
  • Example: If M is a 3×3 matrix and you can reduce it so that all three columns have pivots, then M is invertible and Mx = 0 has only x = 0 as a solution.

🔍 Key distinctions

🔍 Homogeneous vs non-homogeneous systems

  • A homogeneous system has the form Mx = 0 (right-hand side is zero).
  • A non-homogeneous system has the form Mx = b where b ≠ 0.
  • Don't confuse: Non-homogeneous systems can have no solutions at all, but homogeneous systems always have at least x = 0.

🔍 Trivial vs non-trivial solutions

  • The trivial solution is x = 0.
  • A non-trivial solution is any solution x ≠ 0.
  • The excerpt's main point: M is invertible ⟺ no non-trivial solutions exist for Mx = 0.
59

Bit Matrices

7.5.5 Bit Matrices

🧭 Overview

🧠 One-sentence thesis

Invertible bit matrices can encode and decode messages by multiplying ASCII character vectors, creating substitution ciphers that become harder to break when applied to longer sequences of letters.

📌 Key points (3–5)

  • ASCII as bit vectors: English characters stored in ASCII are 8-bit strings, which can be treated as vectors in Z₂⁸ (8-dimensional space with entries of 0 or 1).
  • How bit-matrix encryption works: multiply each character vector by an invertible 8×8 bit matrix M to encode; multiply by M⁻¹ to decode.
  • Scaling up security: treating pairs or longer sequences of letters as vectors in higher-dimensional spaces (e.g., Z₂¹⁶) with larger invertible matrices makes decoding tougher.
  • Common confusion: this is not a simple substitution cipher like ROT13—each character is transformed via matrix multiplication, and the same matrix can be reused (unlike one-time pads, which use different substitutions for each letter).
  • Why invertibility matters: the matrix must be invertible so that M⁻¹ exists and the message can be decoded uniquely.

🔐 From substitution ciphers to bit matrices

🔤 Simple substitution ciphers

  • Traditional substitution ciphers replace each letter with another letter (e.g., ROT13 shifts each letter 13 positions in the alphabet).
  • Example: HELLO becomes URYYB; applying the algorithm again decodes it back to HELLO.
  • These are easy to break.

🔒 One-time pads

One-time pad: a system that uses a different substitution for each letter in the message.

  • As long as a particular set of substitutions is not used on more than one message, the one-time pad is unbreakable.
  • This extends the basic substitution idea into a practically uncrackable system.

🧮 Bit matrices for encryption

🧮 ASCII as vectors

  • English characters are often stored in ASCII format.
  • In ASCII, a single character is represented by a string of eight bits.
  • We can consider this 8-bit string as a vector in Z₂⁸ (like vectors in R⁸, but entries are zeros and ones).

🔢 Encoding with an invertible bit matrix

  • Choose an 8×8 invertible bit matrix M.
  • Multiply each letter of the message (as an 8-bit vector) by M to encode it.
  • To decode, multiply each encoded 8-character string by M⁻¹.
  • Example: Sender encodes character vector X by computing MX; Receiver decodes by computing M⁻¹(MX) = X.

🔐 Increasing difficulty

  • To make the message tougher to decode, consider pairs (or longer sequences) of letters as a single vector in Z₂¹⁶ (or a higher-dimensional space).
  • Use an appropriately-sized invertible matrix (e.g., 16×16 for pairs).
  • This increases the complexity because the attacker must analyze patterns across multiple characters at once.

🧩 Key requirements and properties

🧩 Why the matrix must be invertible

  • The matrix M must be invertible so that M⁻¹ exists.
  • Without M⁻¹, the encoded message cannot be decoded uniquely.
  • Don't confuse: "invertible" means the matrix has an inverse; "singular" means it does not (see review problems for identifying singular bit matrices).

🧩 Bit matrices vs. one-time pads

| Feature | Bit matrix cipher | One-time pad |
|---|---|---|
| Substitution pattern | Same matrix M used for all characters | Different substitution for each letter |
| Reusability | Matrix can be reused on multiple messages | Each set of substitutions used only once |
| Security | Harder to break with larger matrices/longer sequences | Unbreakable if used correctly |
  • The bit matrix approach is a middle ground: more secure than simple substitution, but not as secure as a one-time pad.
60

7.6 Review Problems

7.6 Review Problems

🧭 Overview

🧠 One-sentence thesis

This section reviews key matrix concepts—singularity, left/right inverses for non-square matrices, LU decomposition, and elementary operations—through problems that connect invertibility, linear systems, and computational techniques.

📌 Key points (3–5)

  • Singularity and solutions: A matrix is singular (non-invertible) if a linear system has either two distinct solutions or no solution; non-singular matrices guarantee unique solutions.
  • Left and right inverses: Non-square matrices can have one-sided inverses (right inverse for n×m with n<m, left inverse for n>m), but these are not unique.
  • LU decomposition: Any square matrix can be written as M = LU (lower triangular × upper triangular), which makes solving linear systems and computing determinants much faster.
  • Common confusion: Elementary Row Operations (EROs) vs Elementary Column Operations (ECOs)—both preserve certain properties, but ECOs are less commonly used.
  • Range and column relationships: If a matrix's range is lower-dimensional (e.g., a plane in R³), one column is a linear combination of others, and this relationship appears in the null space solutions.

🔍 Singularity and solution existence

🔍 Two distinct solutions imply singularity

  • Problem setup: If MX = V has two distinct solutions, then M must be singular (have no inverse).
  • Why: If two different X values produce the same V, the transformation is not one-to-one, so it cannot be reversed.
  • Example: If both X₁ and X₂ satisfy MX = V with X₁ ≠ X₂, then M cannot have an inverse.

🚫 No solution also implies singularity

  • Problem setup: If MX = V has no solutions for some vector V, then M must be singular.
  • Why: A non-singular matrix can map to any output vector; if some V is unreachable, M is not invertible.

✅ Non-singular matrices guarantee unique solutions

  • Conclusion: If M is non-singular, then for any column vector V, there is a unique solution to MX = V.
  • Why: Non-singular means M⁻¹ exists, so X = M⁻¹V is the unique solution.
  • Don't confuse: "No solution" and "multiple solutions" both indicate singularity, but for different reasons (range vs kernel issues).

🔄 Left and right inverses for non-square matrices

🔄 Right inverse definition and construction

Right inverse B for matrix A: A matrix B such that AB = I.

  • When it exists: For an n×m matrix A with n < m (more columns than rows).
  • Construction formula (from the excerpt):
    1. Compute AA^T
    2. Compute (AA^T)⁻¹
    3. Set B := A^T(AA^T)⁻¹
  • Verification: The problem asks to verify that AB = I for the given example.
  • Example: For A = (0 1 1; 1 1 0), compute B using the formula and check AB = I.
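
Verifying the construction on the matrix from the problem (the formula B := Aᵀ(AAᵀ)⁻¹ is quoted from the excerpt):

```python
import numpy as np
from numpy.linalg import inv

A = np.array([[0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])        # 2x3, so n < m

B = A.T @ inv(A @ A.T)                 # right inverse candidate: B = A^T (A A^T)^-1

print(B)                               # a 3x2 matrix
print(np.allclose(A @ B, np.eye(2)))   # True: AB = I (but BA is 3x3 and is NOT I)
```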

🔄 Left inverse definition and construction

Left inverse C for matrix A: A matrix C such that CA = I.

  • When it exists: For an n×m matrix A with n > m (more rows than columns).
  • Suggested formula (from the excerpt): Assume A^T A has an inverse, then construct C using a similar pattern.
  • Test case: For A = (1; 2), find C such that CA = I.

⚠️ Non-uniqueness of one-sided inverses

  • Key fact: The excerpt asks "True or false: Left and right inverses are unique."
  • Answer: False—one-sided inverses are not unique.
  • Why: BA may not even be defined (different dimensions), and multiple matrices can satisfy the one-sided inverse property.
  • Don't confuse: Two-sided inverses (for square matrices) are unique, but one-sided inverses are not.

🧮 LU decomposition fundamentals

🧮 What LU decomposition is

LU decomposition: Writing a square matrix M as M = LU, where L is lower triangular and U is upper triangular.

  • Lower triangular L: All entries above the main diagonal are zero (l_ij = 0 for all j > i).
  • Upper triangular U: All entries below the main diagonal are zero (u_ij = 0 for all i > j).
  • Why it matters: Triangular matrices are much easier to work with—inverses, determinants, and linear systems are all faster to compute.

🔧 Using LU to solve linear systems

Three-step process for solving MX = LUX = V:

  1. Step 1: Set W = UX (introduce an intermediate variable).
  2. Step 2: Solve LW = V by forward substitution (easy because L is lower triangular).
  3. Step 3: Solve UX = W₀ by backward substitution (easy because U is upper triangular).

Example from excerpt:

  • System: 6x + 18y + 3z = 3; 2x + 12y + z = 19; 4x + 15y + 3z = 0
  • LU decomposition: M = (3 0 0; 1 6 0; 2 3 1) × (2 6 1; 0 1 0; 0 0 1)
  • Step 2 gives W₀ = (1; 3; -11)
  • Step 3 gives X = (-3; 3; -11)
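
A sketch of the two-triangular-solve process for this example, using the L, U, and V given above (forward substitution, then back substitution):

```python
import numpy as np

L = np.array([[3.0, 0.0, 0.0],
              [1.0, 6.0, 0.0],
              [2.0, 3.0, 1.0]])
U = np.array([[2.0, 6.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
V = np.array([3.0, 19.0, 0.0])

def forward_sub(L, b):
    """Solve L w = b for lower triangular L."""
    w = np.zeros_like(b)
    for i in range(len(b)):
        w[i] = (b[i] - L[i, :i] @ w[:i]) / L[i, i]
    return w

def back_sub(U, b):
    """Solve U x = b for upper triangular U."""
    x = np.zeros_like(b)
    for i in reversed(range(len(b))):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

W0 = forward_sub(L, V)   # [ 1.  3. -11.]
X = back_sub(U, W0)      # [-3.  3. -11.]
print(W0, X)
```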

🔨 Finding an LU decomposition

Key technique: Use elementary matrices to perform row operations.

  • Elementary matrix example: E = (1 0; λ 1) performs the row operation R₂ → R₂ + λR₁ when multiplying EM.
  • Inverse: E⁻¹ = (1 0; -λ 1), so M = E⁻¹EM.
  • Process: Create sequences L₁, L₂, ... and U₁, U₂, ... such that L_i U_i = M at each step.
  • Lower unit triangular: The unique LU decomposition where L has ones on the diagonal.

Construction steps (from excerpt):

  1. Use the first row to zero out entries below it in the first column.
  2. Record the multipliers (with minus sign) in the first column of L₁.
  3. Continue for subsequent rows and columns.

Example: For M = (6 18 3; 2 12 1; 4 15 3), perform R₂ → R₂ - (1/3)R₁ and R₃ → R₃ - (2/3)R₁ to get U₁, and set L₁'s first column to (1; 1/3; 2/3).

🔗 Range, columns, and null space

🔗 Range as a plane implies column dependence

Problem statement: If the range of a 3×3 matrix M (viewed as a function R³ → R³) is a plane, then one column is a sum of multiples of the other columns.

  • Why: A plane is 2-dimensional, so the three column vectors cannot all be independent.
  • Preservation under EROs: This relationship between columns is preserved when performing elementary row operations.

🔗 Null space describes column relationships

Key insight: The solutions to Mx = 0 describe the relationship between the columns.

  • How: If Mx = 0 has a non-trivial solution x = (a; b; c), then a·(column 1) + b·(column 2) + c·(column 3) = 0.
  • Interpretation: The null space encodes exactly which linear combinations of columns give zero.

🧩 Additional matrix properties

🧩 Products of invertible and singular matrices

Problem 6: If M⁻¹ exists and N⁻¹ does not exist, does (MN)⁻¹ exist?

  • Answer: No, (MN)⁻¹ does not exist.
  • Why: If N is singular, then MN is also singular (multiplying by M cannot "fix" the singularity).

🧩 Matrix exponential and invertibility

Problem 7: If M is a square matrix which is not invertible, is e^M invertible?

  • Key question: Does the matrix exponential of a singular matrix become non-singular?
  • The excerpt poses this as a problem to explore.

🧩 Elementary Column Operations (ECOs)

Problem 8: ECOs can be defined in the same 3 types as EROs.

  • Task: Describe the 3 kinds of ECOs (analogous to the 3 types of row operations).
  • Key result: If maximal elimination using ECOs produces a column of zeros, the matrix is not invertible.
  • Don't confuse: EROs are standard for solving systems; ECOs are less common but reveal similar structural properties.
61

7.7 LU Redux

7.7 LU Redux

🧭 Overview

🧠 One-sentence thesis

LU decomposition expresses a matrix as the product of a lower triangular matrix and an upper triangular matrix, and the method extends naturally to non-square matrices and block matrices.

📌 Key points (3–5)

  • Core idea: Any matrix M can be written as M = LU, where L is lower triangular and U is upper triangular.
  • Construction method: Build L and U iteratively by zeroing out columns below the diagonal using row operations, recording the inverse operations in L.
  • Key property: The product of lower triangular matrices is always lower triangular; the constants from row operations appear directly in L.
  • Common confusion: Standard form uses a lower unit triangular L (ones on diagonal), but you can scale columns of L and rows of U by reciprocal constants without changing the product.
  • Extensions: LU decomposition works for non-square (m×n) matrices and for block matrices with invertible blocks.

🔧 Building LU decomposition step by step

🔧 Starting point and first elimination

  • Begin with L₀ = I (identity matrix) and U₀ = M (the original matrix).
  • Zero out the first column below the diagonal using row operations on U₀ to produce U₁.
  • The matrix that undoes this row operation becomes part of L₁.

Example from the excerpt:

  • Start with a 3×3 matrix M.
  • After the first elimination step: L₁ = [[1,0,0], [1/3,1,0], [2/3,0,1]] and U₁ has zeros below the first diagonal entry.

🔄 Iterative elimination

  • Repeat the process for each subsequent column: zero out entries below the diagonal in column k using row k.
  • Each step produces a new Lᵢ and Uᵢ.
  • The final L is the product of all the elimination matrices; the final U is upper triangular.

Example from the excerpt:

  • Second step uses row operation R₃ → R₃ - (1/2)R₂ to produce U₂ = [[6,18,3], [0,6,0], [0,0,1]].
  • The corresponding inverse operation matrix is [[1,0,0], [0,1,0], [0,1/2,1]].
  • L₂ = L₁ × (this new matrix) = [[1,0,0], [1/3,1,0], [2/3,1/2,1]].
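
A minimal sketch of the construction in code (no pivoting, which is fine for this example), recording the elimination multipliers directly in L:

```python
import numpy as np

def lu_no_pivot(M):
    """Return lower unit triangular L and upper triangular U with L @ U = M (no row swaps)."""
    n = M.shape[0]
    U = M.astype(float).copy()
    L = np.eye(n)
    for k in range(n):                    # use row k to zero out column k below the diagonal
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]   # the multiplier goes straight into L
            U[i] = U[i] - L[i, k] * U[k]  # row operation R_i -> R_i - (multiplier) R_k
    return L, U

M = np.array([[6, 18, 3],
              [2, 12, 1],
              [4, 15, 3]])
L, U = lu_no_pivot(M)
print(L)                      # [[1, 0, 0], [1/3, 1, 0], [2/3, 1/2, 1]]
print(U)                      # [[6, 18, 3], [0, 6, 0], [0, 0, 1]]
print(np.allclose(L @ U, M))  # True
```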

🔑 Key property: lower triangular products

The product of lower triangular matrices is always lower triangular.

  • This guarantees that L remains lower triangular throughout the construction.
  • The constants used in row operations appear directly in the appropriate columns of L (with a minus sign).
  • Continue until U is fully upper triangular (nothing left to zero out).

🎨 Scaling and standard form

🎨 Flexibility in representation

  • For any LU decomposition, you can multiply a column of L by a constant λ and divide the corresponding row of U by λ without changing the product LU.
  • This allows you to eliminate fractions or adjust the form.

Example from the excerpt:

  • Original: L = [[1,0,0], [1/3,1,0], [2/3,1/2,1]], U = [[6,18,3], [0,6,0], [0,0,1]].
  • Insert a diagonal scaling matrix: multiply column 1 of L by 3, column 2 by 6, etc., and divide the corresponding rows of U.
  • Result: L = [[3,0,0], [1,6,0], [2,3,1]], U = [[2,6,1], [0,1,0], [0,0,1]].

📐 Standard form vs. scaled form

| Form | L diagonal | Appearance | Use case |
|---|---|---|---|
| Standard (lower unit triangular) | All ones | L has 1s on diagonal | Most common convention |
| Scaled | Arbitrary non-zero | Cleaner numbers, no fractions | Computational convenience |
  • The excerpt notes the scaled form "looks nicer, but isn't in standard (lower unit triangular matrix) form."

📏 Non-square matrices

📏 Extending to m×n matrices

  • LU decomposition still makes sense for non-square matrices.
  • Given an m×n matrix M, write M = LU where:
    • L is an m×m square lower unit triangular matrix.
    • U is an m×n rectangular matrix (same shape as M).

🧮 Process for rectangular matrices

  • The process is exactly the same as for square matrices.
  • Start with L₀ = I (m×m identity) and U₀ = M.
  • Zero out columns below the diagonal one by one.

Example from the excerpt (Example 99):

  • M is 2×3: M = [[-2,1,3], [-4,4,1]].
  • Decomposition: L is 2×2, U is 2×3.
  • Start with L₀ = [[1,0], [0,1]].
  • Zero out first column: subtract 2 times row 1 from row 2.
  • Result: L₁ = [[1,0], [2,1]], U₁ = [[-2,1,3], [0,2,-5]].
  • U₁ is already upper triangular, so done.

⚠️ When to stop

  • For a 2×3 matrix, after zeroing the first column, U is already upper triangular (all entries below the diagonal are zero).
  • With a larger matrix, continue the process column by column until complete.

🧱 Block LDU decomposition

🧱 Block matrix setup

  • Let M be a square block matrix with square blocks X, Y, Z, W.
  • Assume X⁻¹ exists (X is invertible).
  • M can be decomposed as a block LDU decomposition, where D is block diagonal.

🔲 Block structure

  • The excerpt introduces the form: M = [[X, Y], [Z, W]].
  • The decomposition follows the same logic as scalar LU, but operates on blocks instead of individual entries.
  • (The excerpt cuts off before completing the formula, but the setup is clear.)

Key requirement:

  • X must be invertible for the block decomposition to proceed (needed to eliminate the Z block).
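
For reference only (this standard Schur-complement form is not quoted from the excerpt, which stops before the formula): when X⁻¹ exists, the block LDU factorization of M = (X Y; Z W) can be written as

```latex
M =
\begin{pmatrix} X & Y \\ Z & W \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ Z X^{-1} & I \end{pmatrix}
\begin{pmatrix} X & 0 \\ 0 & W - Z X^{-1} Y \end{pmatrix}
\begin{pmatrix} I & X^{-1} Y \\ 0 & I \end{pmatrix}.
```

Multiplying the three factors back together recovers M, mirroring the scalar LU idea: the left factor is block lower unit triangular, the middle factor is block diagonal, and the right factor is block upper unit triangular.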
62

Using LU Decomposition to Solve Linear Systems

7.7.1 Using LU Decomposition to Solve Linear Systems

🧭 Overview

🧠 One-sentence thesis

LU decomposition breaks a matrix M into a lower triangular matrix L and an upper triangular matrix U through systematic row operations, enabling efficient solution of linear systems.

📌 Key points (3–5)

  • Uniqueness condition: there is a unique LU decomposition when L has ones on the diagonal (called lower unit triangular).
  • Core method: build sequences L₁, L₂, ... and U₁, U₂, ... where each Lᵢ Uᵢ = M, using row operations to zero out columns below the diagonal.
  • Elementary matrix trick: multiplying by a special matrix E performs a row operation; its inverse E⁻¹ undoes it, letting us track operations in L.
  • Common confusion: the L matrix records minus the constants used in row operations, not the constants themselves.
  • Flexibility: LU decomposition works for non-square matrices and can be scaled to avoid fractions, though this loses the standard lower unit triangular form.

🔧 The elementary matrix mechanism

🔧 How elementary matrices perform row operations

The excerpt introduces a key tool:

Elementary matrix E: a matrix of the form E = (1 0; λ 1) that, when multiplied on the left of M, performs the row operation R₂ → R₂ + λR₁.

  • When you compute EM, the second row becomes d + λa, e + λb, f + λc, ... (adding λ times the first row to the second row).
  • This is the fundamental trick for building the decomposition.

🔄 The inverse operation

  • The inverse of E is E⁻¹ = (1 0; -λ 1).
  • This satisfies E⁻¹E = I (the identity matrix).
  • Therefore M = E⁻¹(EM): the original matrix equals a lower triangular matrix times the row-operated matrix.
  • Why this matters: we can "undo" row operations by multiplying by lower triangular matrices, which is how we construct L.
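
A quick numerical check of the elementary-matrix trick with a hypothetical λ and a hypothetical 2×3 matrix:

```python
import numpy as np
from numpy.linalg import inv

lam = 4.0                        # hypothetical value of lambda
E = np.array([[1.0, 0.0],
              [lam, 1.0]])
M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # hypothetical 2x3 matrix

EM = E @ M                          # row 2 becomes row2 + lam * row1
print(EM)                           # [[1, 2, 3], [8, 13, 18]]
print(inv(E))                       # [[1, 0], [-lam, 1]]: the inverse undoes the row operation
print(np.allclose(inv(E) @ EM, M))  # True: M = E^-1 (E M)
```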

🏗️ Building the LU decomposition step by step

🏗️ The iterative construction process

The excerpt describes creating two sequences:

  • Start with L₀ = I (identity) and U₀ = M.
  • At each step i, maintain Lᵢ Uᵢ = M.
  • Each Lᵢ is lower triangular; only the final Uᵢ is upper triangular.

🎯 Zeroing out the first column

Example from the excerpt: M = (6 18 3; 2 12 1; 4 15 3)

  • Goal: use the first row to zero out all entries below the first diagonal entry.
  • Row operations needed: R₂ → R₂ - (1/3)R₁ and R₃ → R₃ - (2/3)R₁.
  • Result: U₁ = (6 18 3; 0 6 0; 0 3 1).

📝 Recording operations in L₁

  • L₁ is constructed by filling its first column with minus the constants used to zero out M's first column.
  • From the example: L₁ = (1 0 0; 1/3 1 0; 2/3 0 1).
  • Key rule: L records minus the constants used in the row operations (the operation R₂ → R₂ − (1/3)R₁ uses the constant −1/3, so +1/3 goes into L).
  • By construction, L₁U₁ = M (you should verify this yourself, as the excerpt advises).

🔁 Continuing to subsequent columns

  • Next step: zero the second column of U₁ below the diagonal using R₃ → R₃ - (1/2)R₂.
  • Result: U₂ = (6 18 3; 0 6 0; 0 0 1), which is now upper triangular.
  • New L matrix: multiply the matrix that undoes this operation with L₁.
  • The excerpt shows: L₂ = (1 0 0; 1/3 1 0; 2/3 1/2 1).

✅ Important property

The product of lower triangular matrices is always lower triangular.

  • This is why multiplying the "undo" matrices together keeps L lower triangular.
  • The final L is obtained by recording minus the constants from all row operations in the appropriate columns.
  • When to stop: when U becomes upper triangular (nothing left below the diagonal to zero out).

🎨 Variations and flexibility

🎨 Avoiding fractions by scaling

The excerpt notes that fractions in L can be "ugly." You can clean them up:

  • For matrices LU, multiply one column of L by a constant λ and divide the corresponding row of U by the same λ.
  • The product LU remains unchanged.
  • Example from excerpt: the standard form (1 0 0; 1/3 1 0; 2/3 1/2 1)(6 18 3; 0 6 0; 0 0 1) can be rewritten as (3 0 0; 1 6 0; 2 3 1)(2 6 1; 0 1 0; 0 0 1).
  • Trade-off: the result looks nicer but is no longer in standard lower unit triangular form (L no longer has ones on the diagonal).
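
A quick arithmetic check that the rescaled factors still multiply to M: row 1 of the product is (3·2, 3·6, 3·1) = (6, 18, 3), row 2 is (1·2, 1·6 + 6·1, 1·1) = (2, 12, 1), and row 3 is (2·2, 2·6 + 3·1, 2·1 + 1·1) = (4, 15, 3), which is exactly the original matrix M.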

📐 Non-square matrices

LU decomposition extends to m × n matrices:

  • Given an m × n matrix M, write M = LU where L is m × m (square lower unit triangular) and U is m × n (same shape as M).
  • Process: exactly the same as for square matrices—build sequences Lᵢ and Uᵢ starting with L₀ = I and U₀ = M.

Example from excerpt: M = (-2 1 3; -4 4 1) is 2 × 3.

  • Start with L₀ = I₂ = (1 0; 0 1).
  • Zero out the first column below the diagonal: subtract 2 times the first row from the second row.
  • Result: L₁ = (1 0; 2 1), U₁ = (-2 1 3; 0 2 -5).
  • Since U₁ is upper triangular, done.

🧱 Block LDU decomposition (mentioned)

The excerpt briefly introduces block decomposition:

  • For a square block matrix M = (X Y; Z W) where X⁻¹ exists, M can be decomposed into block LDU form.
  • D is block diagonal.
  • The excerpt does not provide the full formula (it is cut off).

🔍 Common pitfalls

🔍 Sign confusion

  • Don't confuse: the constants you use in row operations vs. what goes into L.
  • If you perform R₂ → R₂ - (1/3)R₁, you put positive 1/3 in L, not negative.
  • The excerpt emphasizes: L records minus the constants used to zero out columns.

🔍 Uniqueness vs. flexibility

  • Standard form: unique when L has ones on the diagonal (lower unit triangular).
  • Scaled forms: infinitely many LU decompositions exist if you allow scaling (trading factors between L and U).
  • Don't confuse a "nicer-looking" decomposition with the standard one—they serve different purposes.
63

Finding an LU Decomposition

7.7.2 Finding an LU Decomposition.

🧭 Overview

🧠 One-sentence thesis

LU decomposition systematically transforms a matrix into the product of a lower triangular matrix L and an upper triangular matrix U by recording the row operations used to zero out entries below the diagonal.

📌 Key points (3–5)

  • The iterative process: zero out each column below the diagonal one at a time using row operations, building U step-by-step while tracking the inverse operations in L.
  • Recording operations in L: the constants used in row operations (with a sign flip) are recorded in the appropriate positions of L, which remains lower triangular throughout.
  • Standard vs scaled forms: the standard form uses a lower unit triangular L (ones on diagonal), but you can scale columns of L and rows of U by reciprocal constants without changing the product.
  • Common confusion: L is not built by applying the same row operations to the identity—it records the inverse operations that undo each step.
  • Extension to non-square matrices: for an m×n matrix M, L is m×m (square lower triangular) and U is m×n (same shape as M), using the identical process.

🔧 The step-by-step construction process

🔧 Starting point and first elimination

  • Begin with L₀ = I (identity matrix) and U₀ = M (the original matrix).
  • Zero out the first column of U₀ below the diagonal using row operations based on the first row.
  • Example from excerpt: for matrix M with entries [6,18,3; 2,12,1; 4,15,3], the operation R₂ → R₂ - (1/3)R₁ eliminates the (2,1) entry.

🔧 Building L₁ from the inverse operation

  • The matrix L₁ records the operation that undoes the row operation.
  • If you used R₂ → R₂ - (1/3)R₁, the inverse is R₂ → R₂ + (1/3)R₁.
  • The excerpt shows L₁ = [1,0,0; 1/3,1,0; 2/3,0,1] after eliminating the first column.
  • Key insight: "it is obtained by recording minus the constants used for all our row operations in the appropriate columns."

🔧 Continuing to subsequent columns

  • After producing U₁, repeat the process for the second column below the diagonal.
  • Use the second row of U₁ to zero out entries below it (e.g., R₃ → R₃ - (1/2)R₂).
  • Build L₂ by multiplying L₁ with the new inverse-operation matrix.
  • The excerpt emphasizes: "The product of lower triangular matrices is always lower triangular!"

🔧 Final result

  • Continue until all entries below the diagonal are zero, producing the final U (upper triangular).
  • The final L is the accumulated product of all inverse-operation matrices.
  • Verification: M = LU by construction, but "you should compute this yourself as a double check."

🎨 Scaling and alternative forms

🎨 Removing fractions by scaling

For two matrices LU, we can multiply one entire column of L by a constant λ and divide the corresponding row of U by the same constant without changing the product.

  • The standard form has ones on the diagonal of L (lower unit triangular).
  • To eliminate fractions, multiply a column of L by a constant and divide the corresponding row of U by the same constant.
  • Example from excerpt: multiply column 1 of L by 3 and column 2 by 6, then divide row 1 of U by 3 and row 2 of U by 6.
  • Trade-off: "The resulting matrix looks nicer, but isn't in standard (lower unit triangular matrix) form."

🎨 Why the product stays the same

  • When you scale column j of L by λ, every entry in that column is multiplied by λ.
  • When you divide row j of U by λ, every entry in that row is divided by λ.
  • In the matrix product LU, the contributions from column j of L and row j of U cancel out the scaling, leaving the product unchanged.

📐 Non-square matrices

📐 Shape requirements

  • For an m×n matrix M, the decomposition M = LU has:
    • L is m×m (square lower unit triangular)
    • U is m×n (same shape as M)
  • "From here, the process is exactly the same as for a square matrix."

📐 Example with 2×3 matrix

  • The excerpt shows M = [-2,1,3; -4,4,1] (2 rows, 3 columns).
  • Start with L₀ = I₂ (2×2 identity) and U₀ = M.
  • Zero out the first column below diagonal: subtract 2 times row 1 from row 2.
  • Result: L₁ = [1,0; 2,1] and U₁ = [-2,1,3; 0,2,-5].
  • Since U₁ is already upper triangular (no more entries below diagonal), the process is complete.
  • Don't confuse: "upper triangular" for non-square matrices means all entries below the diagonal are zero, even though there may be more columns than rows.
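
As a quick check, L₁U₁ = (1 0; 2 1)(−2 1 3; 0 2 −5): the first row is unchanged, and the second row is (2·(−2) + 0, 2·1 + 2, 2·3 − 5) = (−4, 4, 1), so the product recovers M.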

🧱 Block LDU decomposition

🧱 Block matrix structure

  • For a square block matrix M = [X,Y; Z,W] where X is invertible and square:
  • M can be decomposed as M = [I,0; ZX⁻¹,I] · [X,0; 0,W-ZX⁻¹Y] · [I,X⁻¹Y; 0,I]
  • This is a block LDU form: lower triangular blocks, diagonal blocks, upper triangular blocks.

🧱 Special case: 1×1 blocks

  • When each "block" is a single number (1×1), the block LDU formula reduces to the standard LDU decomposition.
  • Example from excerpt: [1,2; 3,4] = [1,0; 3,1] · [1,0; 0,-2] · [1,2; 0,1]
  • "By multiplying the diagonal matrix by the upper triangular matrix, we get the standard LU decomposition."

🧱 Verification method

  • The excerpt notes: "This can be checked explicitly simply by block-multiplying these three matrices."
  • Block multiplication follows the same rules as ordinary matrix multiplication, treating each block as a single element.
64

Block LDU Decomposition

7.7.3 Block LDU Decomposition

🧭 Overview

🧠 One-sentence thesis

Block LDU decomposition extends the standard LDU factorization to block matrices by treating submatrices as single elements, allowing any square block matrix with an invertible top-left block to be factored into lower triangular, block diagonal, and upper triangular block matrices.

📌 Key points (3–5)

  • What block LDU is: a decomposition of a square block matrix M into three block matrices (lower triangular, block diagonal, upper triangular) where each "entry" is itself a submatrix.
  • Key requirement: the top-left block X must be invertible (have an inverse X⁻¹).
  • The formula: M = (X Y; Z W) = (I 0; ZX⁻¹ I)(X 0; 0 W - ZX⁻¹Y)(I X⁻¹Y; 0 I).
  • Common confusion: block decomposition vs standard LDU—in block form, each "element" is a submatrix, not a scalar; the same structure applies at a higher level.
  • Connection to standard LDU: when blocks are 1×1 (single entries), block LDU reduces to the familiar scalar LDU decomposition.

🧩 What block LDU decomposition is

🧩 Definition and structure

Block LDU decomposition: For a square block matrix M with square blocks X, Y, Z, W such that X⁻¹ exists, M can be decomposed into a product of three block matrices: a lower block-triangular matrix, a block-diagonal matrix, and an upper block-triangular matrix.

  • The matrix M is written as a 2×2 arrangement of blocks: M = (X Y; Z W).
  • Each of X, Y, Z, W is itself a submatrix (a block), not a single number.
  • The decomposition treats these blocks as if they were single entries in a standard matrix.

🔍 The three factor matrices

The decomposition formula is:

M = (I 0; ZX⁻¹ I) × (X 0; 0 W - ZX⁻¹Y) × (I X⁻¹Y; 0 I)

Breaking down each factor:

Factor | Type | Structure | Role
(I 0; ZX⁻¹ I) | Lower block-triangular | Identity blocks on diagonal, ZX⁻¹ in lower-left | Encodes lower-block operations
(X 0; 0 W - ZX⁻¹Y) | Block diagonal | X in top-left, W - ZX⁻¹Y in bottom-right, zeros elsewhere | Contains the "pivot" blocks
(I X⁻¹Y; 0 I) | Upper block-triangular | Identity blocks on diagonal, X⁻¹Y in upper-right | Encodes upper-block operations
  • The excerpt states this can be verified "explicitly simply by block-multiplying these three matrices."
  • Block multiplication follows the same rules as scalar multiplication, but each "entry" operation is a matrix multiplication.

🔑 Key requirement and mechanics

🔑 Why X must be invertible

  • The formula requires computing X⁻¹ (the inverse of block X).
  • X⁻¹ appears in multiple places: ZX⁻¹, X⁻¹Y, and ZX⁻¹Y.
  • Without X⁻¹, the decomposition formula cannot be constructed.
  • This is analogous to requiring non-zero pivots in standard LU decomposition.

⚙️ The Schur complement term

  • The bottom-right block of the diagonal matrix is W - ZX⁻¹Y.
  • This is called the Schur complement of X in M.
  • It represents what remains of W after accounting for the influence of X, Y, and Z.
  • Example: if M represents a system, the Schur complement isolates the part of the system that cannot be explained by the top-left block alone.
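
A minimal numeric sketch of the block factorization (Python/numpy; the 2×2 blocks below are example values chosen here, not from the excerpt):

    import numpy as np

    # A 4×4 matrix built from 2×2 blocks X, Y, Z, W, with X invertible.
    X = np.array([[2.0, 1.0], [1.0, 3.0]])
    Y = np.array([[1.0, 0.0], [2.0, 1.0]])
    Z = np.array([[0.0, 1.0], [1.0, 1.0]])
    W = np.array([[4.0, 2.0], [1.0, 5.0]])
    M = np.block([[X, Y], [Z, W]])

    I = np.eye(2)
    O = np.zeros((2, 2))
    Xinv = np.linalg.inv(X)
    S = W - Z @ Xinv @ Y                          # Schur complement of X in M

    lower = np.block([[I, O], [Z @ Xinv, I]])     # block lower triangular
    diag  = np.block([[X, O], [O, S]])            # block diagonal
    upper = np.block([[I, Xinv @ Y], [O, I]])     # block upper triangular

    print(np.allclose(lower @ diag @ upper, M))   # True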

📐 Connection to standard LDU

📐 Scalar case as a special instance

The excerpt provides Example 100 to illustrate:

For a 2×2 matrix, we can regard each entry as a 1×1 block.

  • Consider M = (1 2; 3 4).
  • Treat each scalar as a 1×1 block: X = [1], Y = [2], Z = [3], W = [4].
  • Then X⁻¹ = [1] (since 1×1 = 1).
  • Applying the block formula:
    • Lower: (1 0; 3 1)
    • Diagonal: (1 0; 0 -2) where -2 = 4 - 3×1×2
    • Upper: (1 2; 0 1)
  • The excerpt notes: "By multiplying the diagonal matrix by the upper triangular matrix, we get the standard LU decomposition of the matrix."

🔄 Why the same structure works

  • Block operations follow the same algebraic rules as scalar operations.
  • The decomposition logic (eliminate lower blocks, record operations, isolate diagonal) is identical at both scales.
  • Don't confuse: block LDU is not a different algorithm; it is the same decomposition applied to a coarser-grained view of the matrix.

🛠️ Verification and usage

🛠️ How to verify the decomposition

  • The excerpt states: "This can be checked explicitly simply by block-multiplying these three matrices."
  • Multiply the three block matrices together using block multiplication rules.
  • Each block product is computed as a matrix multiplication.
  • The result should equal the original M = (X Y; Z W).

📝 Practical note

  • Block LDU is useful when M has natural substructure (e.g., systems with grouped variables).
  • It allows reasoning about large matrices in terms of their subcomponents.
  • The decomposition can be nested: if blocks themselves have structure, further decomposition is possible.
65

Review Problems for LU Decomposition and Determinants

7.8 Review Problems

🧭 Overview

🧠 One-sentence thesis

This problem set consolidates LU decomposition techniques—including non-square and block forms—and introduces the determinant as the single number that determines whether a square matrix is invertible.

📌 Key points (3–5)

  • LU for non-square matrices: the process is the same as for square matrices, but L is m×m and U is m×n when M is m×n.
  • Block LDU decomposition: when a block matrix has an invertible block X, it can be factored into block lower-triangular, block-diagonal, and block upper-triangular matrices.
  • Determinant as invertibility test: for a square matrix, the determinant is a single number; the matrix is invertible if and only if the determinant is non-zero.
  • Common confusion: LU decomposition applies to rectangular matrices, not just square ones; the shapes of L and U adapt to match M.
  • Why it matters: LU decomposition simplifies solving linear systems, and the determinant provides a quick invertibility check.

🧩 LU decomposition for non-square matrices

🧩 Shape and process

For an m×n matrix M, LU decomposition writes M = LU with L an m×m square lower unit triangular matrix and U an m×n rectangular matrix (same shape as M).

  • The process is exactly the same as for square matrices.
  • Start with L₀ = I (the m×m identity) and U₀ = M.
  • Create a sequence of matrices Lᵢ and Uᵢ until U becomes upper triangular.

🔢 Example: 2×3 matrix

The excerpt gives M = U₀ =

( -2  1  3 )
( -4  4  1 )
  • M is 2×3, so L will be 2×2 and U will be 2×3.
  • Start with L₀ = I₂ =
( 1  0 )
( 0  1 )
  • Zero out the first column below the diagonal: subtract 2 times row 1 from row 2.
  • Result: L₁ =
( 1  0 )
( 2  1 )

and U₁ =

( -2  1   3 )
(  0  2  -5 )
  • Since U₁ is upper triangular, the decomposition is complete.
  • Don't confuse: L is always square (m×m), even when M is not square.

🧱 Block LDU decomposition

🧱 Setup and formula

For a square block matrix M with square blocks X, Y, Z, W such that X⁻¹ exists, M can be decomposed into block lower-triangular, block-diagonal, and block upper-triangular matrices.

The formula is:

M = ( X  Y )
    ( Z  W )

equals

(  I      0   ) ( X        0        ) ( I  X⁻¹Y )
( ZX⁻¹   I   ) ( 0  W - ZX⁻¹Y ) ( 0    I   )
  • The middle matrix is block-diagonal.
  • The term W - ZX⁻¹Y is called the Schur complement.
  • This can be verified by block-multiplying the three matrices.

🔢 Example: 2×2 scalar matrix

Treat each entry as a 1×1 block:

( 1  2 )
( 3  4 )

equals

( 1  0 ) ( 1   0 ) ( 1  2 )
( 3  1 ) ( 0  -2 ) ( 0  1 )
  • Multiplying the diagonal matrix by the upper triangular matrix gives the standard LU decomposition.

🧱 Block size requirements (Problem 2)

If M is n×n and block W has r rows, then:

  • W is r×r
  • X is (n-r)×(n-r)
  • Y is (n-r)×r
  • Z is r×(n-r)

🔍 Review problem highlights

🔍 Solving lower triangular systems (Problem 1)

The linear system:

x₁ = v₁
l₂₁x₁ + x₂ = v₂
...
lₙ₁x₁ + lₙ₂x₂ + ... + xₙ = vₙ
  • x₁: directly given as v₁.
  • x₂: substitute x₁ into the second equation: x₂ = v₂ - l₂₁x₁.
  • x₃: substitute x₁ and x₂ into the third equation.
  • General xₖ: use a recursive method—solve for xₖ by substituting all previously found x₁, x₂, ..., xₖ₋₁.
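
A short sketch of this recursive method in code (plain Python; the matrix name L and the example numbers are chosen here for illustration):

    def forward_substitute(L, v):
        # Solve L x = v where L is lower triangular with 1s on the diagonal.
        n = len(v)
        x = [0.0] * n
        for k in range(n):
            # x_k = v_k minus the contributions of the already-known x_1, ..., x_{k-1}
            x[k] = v[k] - sum(L[k][j] * x[j] for j in range(k))
        return x

    # Example: x₁ = 1, 2x₁ + x₂ = 5, 3x₁ + 4x₂ + x₃ = 0  →  x = [1, 3, -15]
    L = [[1, 0, 0],
         [2, 1, 0],
         [3, 4, 1]]
    print(forward_substitute(L, [1, 5, 0]))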

🔍 UDL decomposition (Problem 2)

When W is invertible (instead of X), find a UDL decomposition:

( X  Y )   ( I  * ) ( *  0 ) ( I  0 )
( Z  W ) = ( 0  I ) ( 0  * ) ( *  I )
  • This is the "reverse" of the standard block LDU.
  • The problem asks you to fill in the stars using the invertibility of W.

🔍 Zero diagonal in LU (Problem 3)

Claim: If M is not invertible, then either L or U in M = LU has a zero on its diagonal.

  • L is lower unit triangular, so its diagonal is all 1s.
  • Therefore, if M is singular, U must have a zero on its diagonal.
  • This connects invertibility to the structure of the decomposition.

🔍 Geometric interpretation (Problem 4)

Describe what upper and lower triangular matrices do to the unit hypercube in their domain.

  • Upper triangular: shears and scales along coordinate directions.
  • Lower triangular: similar, but in the opposite order.
  • The problem asks for a geometric description, not a formula.

🔍 LDPU and solving systems (Problem 5)

Since row exchanges (permutation matrices P) are sometimes necessary, the complete decomposition is LDPU.

  • Suggest a procedure for using LDPU to solve linear systems.
  • Generalize the LU-based solving method to account for permutations.

🔍 LU vs UL (Problem 6)

Is there a reason to prefer LU over UL, or is it just convention?

  • The problem invites you to think about whether the order matters mathematically or is chosen for convenience.

🔍 Transpose and inverse (Problem 7)

If M is invertible, what are the LU, LDU, and LDPU decompositions of:

  • Mᵀ (transpose)?
  • M⁻¹ (inverse)?
  • Express these in terms of the decompositions for M.

🔍 Symmetric matrices (Problem 8)

Claim: If M is symmetric, then L = Uᵀ in the LDU decomposition of M.

  • Symmetry means M = Mᵀ.
  • Use this property to argue that the lower and upper triangular parts are transposes of each other.

🔢 Introduction to determinants

🔢 What the determinant is

The determinant boils down a square matrix to a single number; that number determines whether the square matrix is invertible or not.

  • For a 1×1 matrix M = (m), the determinant is m; M is invertible if and only if m ≠ 0.
  • For a 2×2 matrix M =
( m₁₁  m₁₂ )
( m₂₁  m₂₂ )

the determinant is m₁₁m₂₂ - m₁₂m₂₁; M is invertible if and only if det M ≠ 0.

🔢 2×2 determinant formula

The excerpt emphasizes:

det M = m₁₁m₂₂ - m₁₂m₂₁
  • Memorize this formula (see Figure 8.1 in the excerpt).
  • The inverse of M is (1 / det M) times the adjugate matrix.
  • Example: if det M = 0, the denominator is zero, so M⁻¹ does not exist.
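
A small worked example (numbers chosen here, not from the excerpt): for M = (1 2; 3 4), det M = 1·4 − 2·3 = −2 ≠ 0, so M is invertible, and M⁻¹ = (1/−2)·(4 −2; −3 1) = (−2 1; 3/2 −1/2).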

🔢 3×3 determinant (Example 101)

For a 3×3 matrix M, the determinant formula begins:

det M = m₁₁m₂₂m₃₃ - m₁₁...
  • The excerpt is cut off, but it indicates that the determinant is a sum of products of entries.
  • M is non-singular (invertible) if and only if det M ≠ 0.

🔢 Determinant and invertibility

Matrix size | Determinant formula | Invertibility condition
1×1 | m | m ≠ 0
2×2 | m₁₁m₂₂ - m₁₂m₂₁ | det M ≠ 0
3×3 | (formula continues) | det M ≠ 0
  • Don't confuse: the determinant is not the same as the trace (sum of diagonal entries); it is a specific combination of all entries.
66

The Determinant Formula

8.1 The Determinant Formula

🧭 Overview

🧠 One-sentence thesis

The determinant reduces a square matrix to a single number that determines whether the matrix is invertible, and it is computed by summing signed products of matrix entries over all permutations of column indices.

📌 Key points (3–5)

  • What the determinant does: boils down a square matrix to a single number that tells whether the matrix is invertible (invertible if and only if determinant ≠ 0).
  • How it's computed: sum over all permutations of products of matrix entries, one from each row, with signs determined by whether the permutation is even or odd.
  • Permutation basics: a permutation is a shuffle of n objects; every permutation can be built by swapping pairs, and the parity (even/odd) of the number of swaps determines the sign.
  • Common confusion: the determinant formula involves all permutations, not just a few patterns—for n=4 there are already 24 terms, for n=5 there are 120 terms.
  • Key property: if any row of the matrix is all zeros, the determinant is zero.

🔢 Small matrix cases

🔢 1×1 and 2×2 matrices

For a 1×1 matrix M = (m), the determinant is simply m, and M is invertible if and only if m ≠ 0.

For a 2×2 matrix:

det M = det(m₁₁ m₁₂; m₂₁ m₂₂) = m₁₁m₂₂ − m₁₂m₂₁.

  • The matrix is invertible if and only if this quantity is not zero.
  • The inverse formula involves dividing by this determinant.
  • Memorize this formula for 2×2 matrices (the excerpt emphasizes this).

🔢 3×3 matrices

For a 3×3 matrix M with entries mᵢⱼ:

det M = m₁₁m₂₂m₃₃ − m₁₁m₂₃m₃₂ + m₁₂m₂₃m₃₁ − m₁₂m₂₁m₃₃ + m₁₃m₂₁m₃₂ − m₁₃m₂₂m₃₁

  • M is non-singular (invertible) if and only if det M ≠ 0.
  • Notice: in the subscripts, each ordering of the numbers 1, 2, and 3 occurs exactly once—these are all the permutations of {1, 2, 3}.

🔀 Permutations

🔀 What a permutation is

A permutation is a shuffle of n objects labeled 1 through n; it is an invertible function from the set {1, 2, ..., n} to itself.

  • Notation: σ = [σ(1) σ(2) σ(3) σ(4) σ(5)] lists the values σ(1), σ(2), ..., σ(5) in order.
  • Example: σ = [4 2 5 1 3] means σ(1) = 4, σ(2) = 2, σ(3) = 5, σ(4) = 1, σ(5) = 3 (object 1 is sent to 4, object 2 stays put, and so on).
  • The top row [1 2 3 4 5] is always the same, so it can be omitted.

🔀 Counting and building permutations

How many permutations exist:

  • There are n! (n factorial) permutations of n distinct objects.
  • Reason: n choices for the first object, n−1 for the second, and so on.

How to build any permutation:

  • Every permutation can be built by successively swapping pairs of objects.
  • Example: to build [3 1 2] from [1 2 3], first swap 2 and 3, then swap 1 and 3.

🔀 Even and odd permutations

For any permutation σ, there is some number of swaps it takes to build up the permutation. If this number is even, σ is an even permutation; if odd, σ is an odd permutation.

  • Any way of building the same permutation from swaps will have the same parity (even or odd).
  • For all n ≥ 2, exactly half of the n! permutations are even and half are odd.
  • The trivial permutation (i → i for every i) is even because it uses zero swaps.

🔀 The sign function

The sign function sgn sends permutations to the set {−1, 1} with sgn(σ) = 1 if σ is even, and sgn(σ) = −1 if σ is odd.

  • This function assigns a sign to each permutation based on its parity.
  • It is used to determine whether a term in the determinant formula is added or subtracted.

📐 The general determinant formula

📐 Definition

The determinant of an n×n matrix M is: det M = sum over all permutations σ of sgn(σ) times m₁σ(1) m₂σ(2) ··· mₙσ(n).

What this means:

  • The sum is over all permutations of n objects—all invertible functions σ from {1, ..., n} to itself.
  • Each summand (term) is a product of n entries from the matrix.
  • Each factor in a product comes from a different row.
  • The column numbers in each term are shuffled by a different permutation σ.
  • Each term is multiplied by +1 or −1 depending on whether σ is even or odd.
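
A minimal sketch of this definition in code (plain Python, written here for illustration; the sign is computed by counting inversions, which has the same parity as any sequence of swaps building σ):

    from itertools import permutations

    def sgn(sigma):
        # +1 for an even permutation, -1 for an odd one, via the inversion count.
        n = len(sigma)
        inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                         if sigma[a] > sigma[b])
        return 1 if inversions % 2 == 0 else -1

    def det(M):
        # Sum over all permutations σ of sgn(σ)·m_{1σ(1)}·m_{2σ(2)}···m_{nσ(n)}.
        n = len(M)
        total = 0
        for sigma in permutations(range(n)):      # n! terms
            prod = 1
            for i in range(n):
                prod *= M[i][sigma[i]]            # one entry from each row
            total += sgn(sigma) * prod
        return total

    print(det([[1, 2], [3, 4]]))                  # 1·4 - 2·3 = -2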

📐 Why the formula grows quickly

  • For n=4, there are 24 = 4! permutations, so 24 terms in the sum.
  • For n=5, there are already 120 = 5! permutations.
  • Example: a 4×4 matrix determinant starts with terms like m₁₁m₂₂m₃₃m₄₄ − m₁₁m₂₃m₃₂m₄₄ − m₁₁m₂₄... and continues for 24 terms total.
  • Don't confuse: the determinant is not just a few simple patterns; it involves every permutation.

📐 Key property: zero rows

Theorem: If M has a row consisting entirely of zeros, then det M = 0.

Why:

  • If row i is all zeros, then mᵢσ(i) = 0 for every permutation σ.
  • Every term in the determinant sum contains a factor from row i, so every term is zero.
  • Therefore the entire sum is zero.

🔗 Connection to invertibility

🔗 The fundamental question

The excerpt opens with: "Given a square matrix, is there an easy way to know when it is invertible?"

The answer:

  • The determinant provides this test.
  • A square matrix M is invertible if and only if det M ≠ 0.
  • This works for all sizes: 1×1, 2×2, 3×3, and general n×n matrices.

🔗 How it works in practice

Matrix size | Determinant formula | Invertibility condition
1×1 | det(m) = m | m ≠ 0
2×2 | m₁₁m₂₂ − m₁₂m₂₁ | This quantity ≠ 0
3×3 | Six-term formula (all permutations of {1,2,3}) | This quantity ≠ 0
n×n | Sum over all n! permutations | This quantity ≠ 0
  • The determinant "boils down" the entire matrix to a single number.
  • That single number determines invertibility.
67

Simple Examples of Determinants

8.1.1 Simple Examples

🧭 Overview

🧠 One-sentence thesis

The determinant reduces a square matrix to a single number that determines whether the matrix is invertible, and for small matrices this formula can be computed directly from the entries.

📌 Key points (3–5)

  • What the determinant does: boils down a square matrix to a single number that tells you if the matrix is invertible.
  • Invertibility criterion: a square matrix M is invertible if and only if det M ≠ 0.
  • Size matters: for 1×1 matrices the determinant is just the entry itself; for 2×2 it's a simple cross-product formula; for 3×3 and larger the formula involves permutations.
  • Common confusion: the determinant formula grows rapidly—4×4 matrices have 24 terms, 5×5 have 120 terms—so the direct formula is practical only for small cases.
  • Key structure: each term in the determinant picks exactly one entry from each row and each column, weighted by the sign of the corresponding permutation.

🔢 Determinants for small matrices

🔢 The 1×1 case

  • If M is a 1×1 matrix, then M = (m) and M⁻¹ = (1/m).
  • M is invertible if and only if m ≠ 0.
  • In other words, the determinant of a 1×1 matrix is just the single entry m.

🔢 The 2×2 case

Determinant of a 2×2 matrix: det M = det(m₁₁ m₁₂; m₂₁ m₂₂) = m₁₁m₂₂ − m₁₂m₂₁.

  • The excerpt emphasizes: memorize this formula (see Figure 8.1).
  • The matrix M is invertible if and only if m₁₁m₂₂ − m₁₂m₂₁ ≠ 0.
  • The inverse formula (from chapter 7, section 7.5) has this determinant in the denominator: M⁻¹ = 1/(m₁₁m₂₂ − m₁₂m₂₁) times the matrix (m₂₂ −m₁₂; −m₂₁ m₁₁).
  • Example: the cross-product structure means you multiply down the main diagonal (m₁₁m₂₂) and subtract the product of the off-diagonal (m₁₂m₂₁).

🔢 The 3×3 case

  • For a 3×3 matrix M with entries mᵢⱼ (i, j = 1, 2, 3), the determinant is:
    • det M = m₁₁m₂₂m₃₃ − m₁₁m₂₃m₃₂ + m₁₂m₂₃m₃₁ − m₁₂m₂₁m₃₃ + m₁₃m₂₁m₃₂ − m₁₃m₂₂m₃₁.
  • M is non-singular (invertible) if and only if det M ≠ 0.
  • Key observation: in the subscripts, each ordering of the numbers 1, 2, and 3 occurs exactly once.
  • Each such ordering is called a permutation of the set {1, 2, 3}.
  • This pattern generalizes to larger matrices via the permutation-based definition.
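
A concrete illustration (numbers chosen here): for M = (1 2 3; 4 5 6; 7 8 10), the six terms give det M = 1·5·10 − 1·6·8 + 2·6·7 − 2·4·10 + 3·4·8 − 3·5·7 = 50 − 48 + 84 − 80 + 96 − 105 = −3, so this M is invertible.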

🔄 Permutations and the general determinant formula

🔄 What is a permutation?

Permutation: a shuffle of n objects labeled 1 through n; equivalently, an invertible function from the set {1, 2, ..., n} to itself.

  • Example: σ = [4 2 5 1 3] means σ(1) = 4, σ(2) = 2, σ(3) = 5, σ(4) = 1, σ(5) = 3.
  • Notation: the top row [1 2 3 4 5] is always the same, so we omit it and write just the bottom row.
  • There are n! (n factorial) permutations of n distinct objects, because there are n choices for the first position, n−1 for the second, and so on.

🔄 Even and odd permutations

  • Every permutation can be built by successively swapping pairs of objects.
  • Example: to build [3 1 2] from [1 2 3], first swap 2 and 3 (getting [1 3 2]), then swap 1 and 3 (getting [3 1 2]).
  • The number of swaps needed has a consistent parity (even or odd) no matter which sequence of swaps you choose.
  • If the number of swaps is even, σ is an even permutation; if odd, σ is an odd permutation.
  • For n ≥ 2, exactly half of the n! permutations are even and half are odd.
  • The trivial permutation (i → i for all i) is even, since it uses zero swaps.

🔄 The sign function

Sign function: sgn(σ) = +1 if σ is even, −1 if σ is odd.

  • This function maps permutations to the set {−1, +1}.
  • It is used to weight each term in the determinant formula.

🔄 The general determinant definition

Determinant of an n×n matrix M: det M = sum over all permutations σ of sgn(σ) · m₁σ(1) · m₂σ(2) · ... · mₙσ(n).

  • The sum runs over all n! permutations of n objects.
  • Each term (summand) is a product of n entries from the matrix, with exactly one entry from each row.
  • The column indices are shuffled by the permutation σ.
  • Each term is multiplied by +1 or −1 depending on whether σ is even or odd.
  • Don't confuse: this is a sum of products, not a product of sums; each product picks one entry per row and one per column.

📐 Properties and practical considerations

📐 Zero rows

Theorem 8.1.1: If M has a row consisting entirely of zeros, then det M = 0.

  • Reason: for every permutation σ, the factor mᵢσ(i) = 0 for some i (the zero row), so every term in the sum is zero.
  • This is consistent with the invertibility criterion: a matrix with a zero row cannot be invertible.

📐 Explosion of terms

  • For n = 4, there are 24 = 4! permutations, so the determinant formula has 24 terms.
  • For n = 5, there are 120 = 5! permutations.
  • Example: the excerpt begins to write out the determinant for a 4×4 matrix and notes that even this case is already very long.
  • Practical implication: the direct permutation formula is useful for understanding the determinant conceptually and for small matrices, but not for hand computation of large matrices.
  • Later sections (not in this excerpt) will introduce more efficient methods (e.g., cofactor expansion, row reduction).

📐 Structure of each term

Feature | Description
One entry per row | Each term picks exactly one entry from each row (indices 1, 2, ..., n in the first position).
One entry per column | The permutation σ ensures each column index appears exactly once.
Sign weighting | Each term is multiplied by sgn(σ), which alternates the sign based on the parity of the permutation.
  • Example: in the 3×3 case, the term m₁₁m₂₂m₃₃ corresponds to the trivial permutation [1 2 3], which is even (zero swaps), so it has a + sign.
  • The term −m₁₁m₂₃m₃₂ corresponds to the permutation [1 3 2], which is odd (one swap), so it has a − sign.
68

Permutations

8.1.2 Permutations

🧭 Overview

🧠 One-sentence thesis

Permutations—all possible orderings of n objects—provide the foundation for defining the determinant as a sum over all permutations, where each term's sign depends on whether the permutation is even or odd.

📌 Key points (3–5)

  • What a permutation is: a shuffle or reordering of n labeled objects, viewed as an invertible function from {1, 2, ..., n} to itself.
  • Counting permutations: there are n! (n factorial) permutations of n distinct objects.
  • Even vs odd permutations: every permutation can be built by swapping pairs; if the number of swaps has even parity, the permutation is even (sign +1), otherwise odd (sign −1).
  • Common confusion: any sequence of swaps that builds a given permutation will have the same parity (even or odd), even if the number of swaps differs.
  • Why it matters for determinants: the determinant is defined as a sum over all permutations, with each term weighted by the sign of the permutation.

🔢 What permutations are

🔢 Definition and notation

Permutation: each possible shuffle of n objects labeled 1 through n.

  • A permutation σ is an invertible function from the set [n] := {1, 2, ..., n} to [n].
  • Two-line notation: write the original order on top and the shuffled order below:
    σ = [ 1  2  3  4  5 ]
        [ 4  2  5  1  3 ]
    
  • Shorthand notation: since the top line is always 1, 2, ..., n, we can omit it and write only the bottom line: σ = [4 2 5 1 3].
  • Example: σ(3) = 5 means "object 3 is sent to position 5."

🧮 Counting permutations

  • There are n! permutations of n distinct objects.
  • Reasoning: n choices for the first position, (n − 1) for the second once the first is chosen, and so on.
  • Example: for n = 4, there are 24 permutations; for n = 5, there are 120 permutations.

🔄 Building permutations with swaps

🔄 Swaps and parity

  • Key property: every permutation can be built by successively swapping pairs of objects.
  • Example: to build [3 1 2] from [1 2 3], first swap 2 and 3 to get [1 3 2], then swap 1 and 3 to get [3 1 2].
  • Parity: for any given permutation σ, the number of swaps needed has a fixed parity—either even or odd.
    • You don't have to use the minimum number of swaps; any way of building the permutation will have the same parity.
  • Even permutation: built with an even number of swaps.
  • Odd permutation: built with an odd number of swaps.
  • The trivial permutation (i → i for every i) is even, since it uses zero swaps.

⚖️ Distribution of even and odd permutations

  • For all n ≥ 2, n! is even.
  • Exactly half of the permutations are even and the other half are odd.

➕ The sign function

➕ Definition

Sign function (sgn): a function that sends permutations to the set {−1, 1} with the rule:

  • sgn(σ) = 1 if σ is even
  • sgn(σ) = −1 if σ is odd
  • The sign function captures the parity of a permutation in a single number.
  • Example: if σ is built with 3 swaps (odd), then sgn(σ) = −1.

🧩 Determinants via permutations

🧩 The determinant formula

Determinant of an n × n matrix M: det M = sum over all permutations σ of sgn(σ) · m₁σ(1) · m₂σ(2) · ... · mₙσ(n)

  • The sum runs over all permutations of n objects.
  • Each summand is a product of n entries from the matrix, with each factor from a different row.
  • The column indices are shuffled by the permutation σ.
  • The sign of each term is determined by sgn(σ).

📋 Example: 4 × 4 determinant

  • For a 4 × 4 matrix, there are 24 permutations, so the determinant has 24 terms.
  • Example terms:
    • m₁₁m₂₂m₃₃m₄₄ (from the identity permutation, even, so +)
    • −m₁₁m₂₃m₃₂m₄₄ (from an odd permutation, so −)
    • −m₁₁m₂₂m₃₄m₄₃ (from an odd permutation, so −)
    • ... plus 21 more terms.
  • This is very cumbersome for general matrices.

🔳 Special case: diagonal matrices

  • If M is diagonal (mᵢⱼ = 0 whenever i ≠ j), all summands involving off-diagonal entries vanish.
  • Only the identity permutation contributes: det M = m₁₁m₂₂ · ... · mₙₙ.
  • The determinant of a diagonal matrix is the product of its diagonal entries.
  • Since the identity matrix is diagonal with all diagonal entries equal to 1, det I = 1.

🚫 Zero-row property

  • Theorem: If M has a row consisting entirely of zeros, then det M = 0.
  • Reason: for every permutation σ, the product m₁σ(1) · m₂σ(2) · ... · mₙσ(n) includes a factor from the zero row, so every term is zero.

🔀 Row operations and determinants

🔀 Swapping rows

  • Let M′ be the matrix M with rows i and j swapped.
  • For each permutation σ, define σ̂ as the permutation obtained by swapping positions i and j.
  • Key observation: sgn(σ̂) = −sgn(σ).
  • The determinant calculation shows:
    • det M′ = sum over σ of sgn(σ) · (product with rows i and j swapped)
    • = sum over σ of (−sgn(σ̂)) · (product with σ̂)
    • = −sum over σ̂ of sgn(σ̂) · (product with σ̂)
    • = −det M.
  • Swapping rows changes the sign of the determinant.
  • Don't confuse: the step replacing sum over σ by sum over σ̂ holds because we sum over all permutations (the set of permutations is the same, just relabeled).
69

Elementary Matrices and Determinants

8.2 Elementary Matrices and Determinants

🧭 Overview

🧠 One-sentence thesis

Elementary matrices—which perform row operations—have predictable effects on determinants, and this relationship reveals that the determinant of a product equals the product of determinants.

📌 Key points (3–5)

  • Three elementary matrices: row swap (E_ij), row multiplication (R_i(λ)), and row addition (S_ij(μ)) correspond to the three row operations.
  • How each affects determinants: row swap flips the sign (det = -1), row multiplication scales by λ (det = λ), and row addition leaves the determinant unchanged (det = 1).
  • Product formula: for any elementary matrix E and matrix M, det(EM) = det(E) · det(M).
  • Common confusion: row addition does not change the determinant even though it changes the matrix—this is because the "extra" term creates a matrix with two identical rows, which has determinant zero.
  • Why it matters: any matrix can be reduced to RREF by elementary matrices, so understanding how elementary matrices affect determinants lets us compute determinants of any matrix.

🔧 The three elementary matrices

🔄 Row swap: E_ij

E_ij: the identity matrix with rows i and j swapped.

  • When you multiply M by E_ij, you get M with rows i and j swapped: M' = E_ij · M.
  • Effect on determinant: det(E_ij) = -1, and det(M') = -det(M).
  • Swapping rows changes the sign of the determinant.
  • Example: if M is the identity matrix I, then E_ij is I with two rows swapped, so det(E_ij) = -1.

Why the sign flips: The determinant formula sums over all permutations σ. When you swap rows i and j, each permutation σ is replaced by a permutation σ̂ (obtained by swapping positions i and j), and sgn(σ̂) = -sgn(σ). Summing over all σ̂ instead of σ gives the same sum but with opposite sign.

✖️ Row multiplication: R_i(λ)

R_i(λ): the identity matrix with the i-th diagonal entry replaced by λ.

  • When you multiply M by R_i(λ), you get M with row i multiplied by λ: M' = R_i(λ) · M.
  • Effect on determinant: det(R_i(λ)) = λ, and det(M') = λ · det(M).
  • Multiplying a row by λ multiplies the determinant by λ.
  • Example: if you double row 2 of a matrix, you double its determinant.

Why it scales: In the determinant formula, every term contains exactly one entry from row i. When row i is multiplied by λ, every term in the sum is multiplied by λ, so the entire determinant is multiplied by λ.

➕ Row addition: S_ij(μ)

S_ij(μ): the identity matrix with an additional μ in the (i,j) position.

  • When you multiply M by S_ij(μ), you get M with μ times row j added to row i: M' = S_ij(μ) · M.
  • Effect on determinant: det(S_ij(μ)) = 1, and det(M') = det(M).
  • Adding a multiple of one row to another does not change the determinant.
  • Example: if you add 3 times row 1 to row 2, the determinant stays the same.

Why it doesn't change: When you expand det(M'), you get det(M) plus μ times the determinant of a matrix M'' that has two identical rows (rows i and j are the same). Since any matrix with two identical rows has determinant zero, the extra term vanishes.

Don't confuse: Row addition changes the matrix but not its determinant; row multiplication changes both the matrix and the determinant.
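
A quick numeric check (matrix chosen here): for M = (1 2; 3 4), det M = −2. Adding 3 times row 1 to row 2 gives (1 2; 6 10), whose determinant is 1·10 − 2·6 = −2, unchanged. By contrast, doubling row 2 of M gives (1 2; 6 8), whose determinant is 1·8 − 2·6 = −4 = 2·det M.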

🧮 Determinants of products

🔗 The product formula for elementary matrices

The excerpt establishes:

Elementary matrix | Determinant | Product formula
E_ij (row swap) | det(E_ij) = -1 | det(E_ij · M) = det(E_ij) · det(M)
R_i(λ) (row multiply) | det(R_i(λ)) = λ | det(R_i(λ) · M) = det(R_i(λ)) · det(M)
S_ij(μ) (row add) | det(S_ij(μ)) = 1 | det(S_ij(μ) · M) = det(S_ij(μ)) · det(M)

General theorem: If E is any elementary matrix (E_ij, R_i(λ), or S_ij(μ)), then det(EM) = det(E) · det(M).

🔗 Why this matters: chains of row operations

  • Any matrix M can be reduced to RREF by a sequence of row operations.
  • Each row operation corresponds to left-multiplying by an elementary matrix.
  • So RREF(M) = E_1 · E_2 · ... · E_k · M, where each E_i is elementary.
  • By repeatedly applying the product formula, we can relate det(M) to det(RREF(M)).

Implication: Since we know the determinants of elementary matrices and can compute the determinant of RREF(M) easily, we can determine det(M).
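
A minimal sketch of this idea in code (plain Python, written here for illustration rather than taken from the excerpt): reduce M by row operations while multiplying together the determinants of the elementary matrices used, which here are only swaps (factor −1) and row additions (factor 1).

    def det_by_row_reduction(M):
        # Gaussian elimination without normalizing pivots: only swaps (det flips sign)
        # and row additions (det unchanged) are used, so det(M) = (±1)·(product of pivots).
        A = [[float(x) for x in row] for row in M]
        n = len(A)
        det = 1.0
        for i in range(n):
            pivot = next((r for r in range(i, n) if A[r][i] != 0), None)
            if pivot is None:
                return 0.0                    # no usable pivot: M is not invertible
            if pivot != i:
                A[i], A[pivot] = A[pivot], A[i]
                det = -det                    # row swap E_ij contributes -1
            for r in range(i + 1, n):
                mu = A[r][i] / A[i][i]
                A[r] = [A[r][c] - mu * A[i][c] for c in range(n)]   # S_ij(μ): no change
            det *= A[i][i]                    # pivot of the resulting upper triangular matrix
        return det

    print(det_by_row_reduction([[6, 18, 3], [2, 12, 1], [4, 15, 3]]))   # 36.0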

📐 Special cases and properties

🔲 Diagonal matrices

  • A diagonal matrix M has M_ij = 0 whenever i ≠ j (all off-diagonal entries are zero).
  • Determinant of a diagonal matrix: det(M) = m_11 · m_22 · ... · m_nn (the product of diagonal entries).
  • Why: In the determinant formula, any term involving an off-diagonal entry is zero. The only nonzero term corresponds to the trivial permutation (i → i for all i), which has sign +1.
  • Identity matrix: Since I is diagonal with all diagonal entries equal to 1, det(I) = 1.

🚫 Matrices with zero or identical rows

Theorem: If M has a row consisting entirely of zeros, then det(M) = 0.

  • Why: Every term in the determinant formula includes one entry from each row. If row i is all zeros, then m_i,σ(i) = 0 for every permutation σ, so every term is zero.

Theorem: If M has two identical rows, then det(M) = 0.

  • Why: Swapping the two identical rows changes the sign of the determinant but leaves the matrix unchanged. So det(M) = -det(M), which implies det(M) = 0.

🧩 Determinant of RREF

The excerpt begins to address: What is det(RREF(M))?

Case 1: If M is not invertible, then some row of RREF(M) contains only zeros.

  • You can multiply the zero row by any constant λ without changing the matrix.
  • By the row-multiplication rule, this scales the determinant by λ.
  • The only number that equals λ times itself for all λ is zero, so det(RREF(M)) = 0 when M is not invertible.

(The excerpt cuts off before completing Case 2, which would address invertible matrices.)

70

Row Swap

8.2.1 Row Swap

🧭 Overview

🧠 One-sentence thesis

Swapping two rows of a matrix flips the sign of its determinant, which is the first key property needed to understand how row operations affect determinants and matrix invertibility.

📌 Key points (3–5)

  • What row swap does: swapping rows i and j changes the determinant from det M to −det M.
  • Why the sign flips: the permutation formula for determinants shows that swapping positions in a permutation reverses its sign, so the entire sum reverses sign.
  • Elementary matrix for row swap: the matrix Eᵢⱼ (identity with rows i and j swapped) has determinant −1.
  • Product rule hint: det(Eᵢⱼ M) = det(Eᵢⱼ) · det(M), suggesting determinants multiply when matrices multiply.
  • Common confusion: the step replacing ∑ over σ by ∑ over σ̂ often confuses readers, but it holds because both sums run over all permutations.

🔄 How row swap changes the determinant

🔄 The sign-flip rule

When rows i and j of matrix M are swapped to produce M′, det M′ = −det M.

  • This is the central result of the section.
  • It applies to any pair of rows, regardless of matrix size.
  • The sign change is exact: one swap → one sign flip.

🧮 Why the sign flips (permutation argument)

The excerpt walks through the determinant formula step by step:

  1. Start with det M′ = ∑ over σ of sgn(σ) · (product of entries from swapped rows).
  2. For each permutation σ, define σ̂ as the permutation obtained by swapping positions i and j.
  3. Key fact: sgn(σ̂) = −sgn(σ) (swapping two positions in a permutation reverses its sign).
  4. Rewrite the sum using σ̂ instead of σ; since both run over all permutations, the sums are equal.
  5. Factor out the minus sign: det M′ = −det M.

Don't confuse: The step replacing ∑ over σ by ∑ over σ̂ is valid because both sums run over all permutations—swapping positions just relabels them.

🧩 Block-matrix view

The excerpt also presents row swap in block notation:

  • Write M as a column of row-blocks: M = [... Rᵢ ... Rⱼ ...]ᵀ.
  • After swapping, M′ = [... Rⱼ ... Rᵢ ...]ᵀ.
  • This makes it clear that only the order of rows changes, not the entries themselves.

🧱 The row-swap elementary matrix

🧱 Definition of Eᵢⱼ

Eᵢⱼ is the identity matrix with rows i and j swapped.

  • It is an elementary matrix because it performs a single row operation.
  • Multiplying M on the left by Eᵢⱼ swaps rows i and j: M′ = Eᵢⱼ M.

🔢 Determinant of Eᵢⱼ

  • Start with the identity matrix I, which has det I = 1 (since I is diagonal with all diagonal entries equal to 1).
  • Swapping rows i and j of I produces Eᵢⱼ.
  • By the sign-flip rule, det Eᵢⱼ = −det I = −1.

Example: For a 3×3 identity, swapping rows 1 and 2 gives E₁₂ with determinant −1.

🔗 Product of determinants

The excerpt observes:

  • M′ = Eᵢⱼ M.
  • det M′ = −det M (by the row-swap rule).
  • det Eᵢⱼ = −1.
  • Therefore det(Eᵢⱼ M) = det(Eᵢⱼ) · det(M).

This is the first hint that determinants of products equal products of determinants, a general rule that will be explored further.
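
A small numeric check (matrix chosen here): for M = (2 5; 1 4), det M = 2·4 − 5·1 = 3; swapping the two rows gives (1 4; 2 5), whose determinant is 1·5 − 4·2 = −3 = det(E₁₂)·det(M).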

🎯 Consequences and special cases

🎯 Matrices with identical rows

Theorem 8.1.2: If M has two identical rows, then det M = 0.

Why:

  • Swapping the two identical rows leaves M unchanged.
  • But swapping rows flips the sign of the determinant: det M = −det M.
  • The only number equal to its own negative is zero.

Don't confuse: This is a consequence of the row-swap rule, not a separate axiom.

🧭 Connection to invertibility

The excerpt motivates the determinant by asking:

  • We want to use the determinant to decide whether a matrix is invertible.
  • Previously, we computed inverses using row operations (Gaussian elimination).
  • Therefore, we need to understand what row operations do to the determinant.

Row swap is the first of three elementary row operations examined; the others (row multiplication and row addition) follow in subsequent subsections.

71

Row Multiplication

8.2.2 Row Multiplication

🧭 Overview

🧠 One-sentence thesis

Multiplying a row of a matrix by a scalar λ multiplies the determinant by λ, and this operation can be performed by an elementary matrix R_i(λ) whose determinant is also λ.

📌 Key points (3–5)

  • What row multiplication does: multiplying row i by scalar λ changes the matrix and scales the determinant by λ.
  • The elementary matrix R_i(λ): the identity matrix with the i-th diagonal entry replaced by λ; multiplying M by R_i(λ) multiplies row i by λ.
  • Determinant of R_i(λ): det R_i(λ) = λ (not 1, unlike row addition).
  • Product formula holds: det(R_i(λ)M) = det(R_i(λ)) · det(M) = λ · det(M).
  • Common confusion: don't confuse with row addition (which leaves the determinant unchanged); row multiplication scales the determinant.

🔧 The row multiplication operation

🔧 What the operation does

  • Start with a matrix M written as rows: M = [R₁, ..., Rₙ]ᵀ (where R_i are row vectors).
  • Multiply row i by a scalar λ to get M′ = [R₁, ..., λR_i, ..., Rₙ]ᵀ.
  • This is one of the three elementary row operations used in Gaussian elimination.

🧮 The elementary matrix R_i(λ)

R_i(λ): the identity matrix with the i-th diagonal entry replaced by λ.

  • Explicitly: R_i(λ) has 1s on the diagonal except in position (i, i), where it has λ.
  • All off-diagonal entries are 0.
  • Left-multiplying M by R_i(λ) performs the row multiplication: M′ = R_i(λ)M.

Example: If you want to multiply row 2 of a 3×3 matrix by 5, use R₂(5) = diag(1, 5, 1).

📐 Effect on the determinant

📐 How the determinant changes

The excerpt shows the calculation using the permutation formula for determinants:

  • det M′ = sum over all permutations σ of sgn(σ) · m₁,σ(1) · ... · (λ · m_i,σ(i)) · ... · m_n,σ(n)
  • Factor out λ from every term: det M′ = λ · sum over σ of sgn(σ) · m₁,σ(1) · ... · m_i,σ(i) · ... · m_n,σ(n)
  • The remaining sum is just det M.

Result:

  • det M′ = λ · det M
  • In words: multiplying one row by λ multiplies the determinant by λ.

🔍 Determinant of R_i(λ) itself

  • R_i(λ) is the identity matrix with row i multiplied by λ.
  • Applying the rule above to the identity matrix (det I = 1):
    • det R_i(λ) = λ · det I = λ · 1 = λ.

Don't confuse: The determinant of R_i(λ) is λ, not 1. This is different from row swap (det = -1) and row addition (det = 1).

🧩 The product formula

🧩 Determinant of products

The excerpt verifies that the product formula holds for row multiplication:

  • We have M′ = R_i(λ)M.
  • We showed det M′ = λ · det M.
  • We also showed det R_i(λ) = λ.
  • Therefore: det(R_i(λ)M) = det(R_i(λ)) · det(M).

This is the same pattern seen with row swap elementary matrices, hinting at a general rule for determinants of products.

📊 Summary of elementary matrix determinants

Elementary matrix | What it does | Determinant
E_ij | Swap rows i and j | -1
R_i(λ) | Multiply row i by λ | λ
S_ij(μ) | Add μ times row j to row i | 1

Key insight: Each elementary matrix E satisfies det(EM) = det(E) · det(M), building toward a general determinant product rule.

72

Row Addition

8.2.3 Row Addition

🧭 Overview

🧠 One-sentence thesis

Adding a multiple of one row to another row leaves the determinant unchanged, which is achieved by multiplying by an elementary matrix S_ij(μ) that itself has determinant 1.

📌 Key points (3–5)

  • The row addition operation: adding μ times row j to row i is performed by multiplying by the elementary matrix S_ij(μ), which is the identity matrix with an additional μ in the i,j position.
  • Effect on determinant: row addition does not change the determinant—det(M') = det(M).
  • Why the determinant is unchanged: the added term creates a matrix M'' with two identical rows, which has determinant 0, so the extra term vanishes.
  • Common confusion: unlike row multiplication (which scales the determinant by λ) or row swapping (which negates the determinant), row addition leaves the determinant completely unchanged.
  • The elementary matrix itself: det(S_ij(μ)) = 1 for any μ, and the product formula det(S_ij(μ)M) = det(S_ij(μ)) det(M) holds.

🔧 The row addition elementary matrix

🔧 Structure of S_ij(μ)

S_ij(μ): the identity matrix with an additional μ in the i,j position.

  • Start with the identity matrix (all 1s on the diagonal, 0s elsewhere).
  • Place μ in the position at row i, column j (off the diagonal).
  • All other entries remain as in the identity matrix.

🔄 How multiplication performs row addition

When you multiply S_ij(μ) by a matrix M:

  • The result replaces row i of M with (R_i + μR_j).
  • Row j and all other rows remain unchanged.
  • Example: if M has rows R_1, ..., R_i, ..., R_j, ..., R_n, then S_ij(μ)M has rows R_1, ..., (R_i + μR_j), ..., R_j, ..., R_n.

🧮 Why the determinant stays the same

🧮 The algebraic proof

Let M' = S_ij(μ)M. The determinant expands as:

  • det(M') = sum over all permutations σ of sgn(σ) times the product m_1,σ(1) ··· (m_i,σ(i) + μm_j,σ(i)) ··· m_n,σ(n).
  • Split the sum into two parts: one with m_i,σ(i) terms and one with μm_j,σ(i) terms.
  • The first part equals det(M).
  • The second part equals μ det(M''), where M'' is M with row i replaced by row j.

🔍 The duplicate-row argument

  • M'' has two identical rows (both equal to R_j).
  • A matrix with two identical rows has determinant 0 (a known property from earlier sections).
  • Therefore μ det(M'') = 0, and det(M') = det(M) + 0 = det(M).

Don't confuse: This is different from row multiplication, where multiplying a row by λ multiplies the determinant by λ. Row addition adds no scaling factor to the determinant.

📐 Properties of the row addition matrix

📐 Determinant of S_ij(μ)

  • Apply the row addition rule to the identity matrix: det(S_ij(μ)) = det(S_ij(μ)I) = det(I) = 1.
  • This holds for any value of μ.
  • The product formula holds: det(S_ij(μ)M) = det(S_ij(μ)) det(M) = 1 · det(M) = det(M).

🔁 Comparison with other elementary matrices

Elementary matrix | Operation | Determinant of E | Effect on det(M)
E_ij | Swap rows i and j | -1 | Negates det(M)
R_i(λ) | Multiply row i by λ | λ | Multiplies det(M) by λ
S_ij(μ) | Add μ times row j to row i | 1 | Leaves det(M) unchanged

Common confusion: All three elementary matrices affect the determinant differently—swapping negates, scaling multiplies, but adding leaves it unchanged.

73

Determinant of Products

8.2.4 Determinant of Products

🧭 Overview

🧠 One-sentence thesis

The determinant of a product of matrices equals the product of their determinants, a result that follows from analyzing how elementary row operations affect determinants and connects determinants to invertibility.

📌 Key points (3–5)

  • Elementary matrices and determinants: Each type of elementary matrix (row swap, row multiplication, row addition) has a predictable effect on determinants and obeys det(EM) = det(E) det(M).
  • Determinant and invertibility: A square matrix is invertible if and only if its determinant is non-zero.
  • Product formula: For any square matrices M and N, det(MN) = det(M) det(N)—an extremely important result.
  • Common confusion: Row addition leaves the determinant unchanged (det = 1 for S_ij(μ)), but row multiplication scales the determinant by λ (det = λ for R_i(λ)); don't confuse these two operations.
  • Why it matters: The product formula connects matrix multiplication to determinants and provides a computational tool for checking invertibility.

🔧 Elementary matrices and their determinants

🔄 Row swap: E_ij

E_ij = identity matrix with rows i and j swapped; det(E_ij) = -1.

  • Swapping two rows flips the sign of the determinant.
  • The excerpt shows: det(E_ij M) = det(E_ij) det(M) = -1 · det(M).
  • Example: If M has determinant 5, swapping two rows gives a matrix with determinant -5.

✖️ Row multiplication: R_i(λ)

R_i(λ) = identity matrix with λ in position i,i (the i-th diagonal entry); det(R_i(λ)) = λ.

  • Multiplying row i by λ multiplies the determinant by λ.
  • The excerpt derives: det(R_i(λ)M) = λ det(M) by factoring λ out of the determinant sum.
  • Don't confuse: This scales the determinant, unlike row addition which leaves it unchanged.
  • Example: Doubling a row doubles the determinant; multiplying by zero makes the determinant zero.

➕ Row addition: S_ij(μ)

S_ij(μ) = identity matrix with an additional μ in the i,j position; det(S_ij(μ)) = 1.

  • Adding μ times row j to row i does not change the determinant.
  • The excerpt proves: det(M') = det(M) + μ det(M''), where M'' has two identical rows (so det(M'') = 0).
  • Therefore det(S_ij(μ)M) = det(M).
  • Example: Adding 3 times row 2 to row 1 leaves the determinant unchanged.

📋 Summary table

Elementary matrix | What it does | Determinant of E | Effect on det(M)
E_ij | Swap rows i and j | -1 | Flips sign
R_i(λ) | Multiply row i by λ | λ | Scales by λ
S_ij(μ) | Add μ·(row j) to row i | 1 | No change

🔗 Determinants and invertibility

🎯 The invertibility criterion

Theorem: For any square matrix M, det(M) ≠ 0 if and only if M is invertible.

  • The excerpt establishes this by examining reduced row echelon form (RREF).
  • Any matrix M can be written as RREF(M) = E₁E₂···E_k M, where each E_i is an elementary matrix.

🔍 Two cases for RREF determinants

Case 1: M is not invertible

  • RREF(M) has a row of all zeros.
  • Multiplying the zero row by any λ doesn't change the matrix, but scales the determinant by λ.
  • This means det(RREF(M)) = λ det(RREF(M)) for any λ, so det(RREF(M)) = 0.

Case 2: M is invertible

  • Every row of RREF(M) has a pivot on the diagonal.
  • For a square matrix, this means RREF(M) is the identity matrix.
  • Therefore det(RREF(M)) = 1.

🧮 Why this proves the theorem

  • Since det(RREF(M)) = det(E₁)···det(E_k) det(M), and each elementary matrix has non-zero determinant, we have:
    • det(RREF(M)) = 0 if and only if det(M) = 0.
  • Invertible ↔ RREF is identity ↔ det(RREF) = 1 ↔ det(M) ≠ 0.

🔄 Corollary on elementary matrices

  • Any elementary matrix E_ij, R_i(λ), or S_ij(μ) is invertible, except R_i(0).
  • The inverse of an elementary matrix is another elementary matrix.
  • Example: R_i(0) multiplies a row by zero, creating a zero row, so it's not invertible.

🎯 The product formula

🧩 Main result

For any square matrices M and N, det(MN) = det(M) det(N).

  • The excerpt emphasizes: "This result is extremely important; do not forget it!"
  • This formula works for any square matrices, whether invertible or not.

📐 Proof when M is invertible

The excerpt writes M and N using their RREF decompositions:

  • M = E₁E₂···E_k RREF(M)
  • N = F₁F₂···F_l RREF(N)

If M is invertible, then RREF(M) = I (identity), so:

  • det(MN) = det(E₁E₂···E_k I F₁F₂···F_l RREF(N))
  • = det(E₁)···det(E_k) det(I) det(F₁)···det(F_l) det(RREF(N))
  • = det(M) det(N)

📐 Proof when M is not invertible

If M is not invertible:

  • det(M) = 0 and RREF(M) has a zero row.
  • Multiplying the zero row by any λ doesn't change RREF(M), so R_n(λ) RREF(M) = RREF(M).
  • Then: det(MN) = det(E₁)···det(E_k) det(RREF(M)N)
    • = det(E₁)···det(E_k) det(R_n(λ) RREF(M)N)
    • = det(E₁)···det(E_k) λ det(RREF(M)N)
    • = λ det(MN)
  • This holds for any λ, so det(MN) = 0 = det(M) det(N).

🔑 Why it matters

  • The product formula connects matrix multiplication to determinants in a simple, multiplicative way.
  • It allows computing det(MN) without multiplying the matrices first.
  • Example: If det(M) = 2 and det(N) = 3, then det(MN) = 6, regardless of the matrix entries.
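
A quick numeric sanity check of the product formula (Python/numpy; the matrices are example values chosen here, and numpy.linalg.det is used only to confirm the identity):

    import numpy as np

    M = np.array([[2.0, 1.0], [0.0, 3.0]])        # det M = 2·3 = 6
    N = np.array([[1.0, 4.0], [2.0, 5.0]])        # det N = 1·5 - 4·2 = -3

    print(np.linalg.det(M @ N))                   # ≈ -18
    print(np.linalg.det(M) * np.linalg.det(N))    # ≈ -18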

🧠 Key theorem summary

📊 The fundamental connection

The excerpt establishes a chain of equivalent statements for square matrices:

Property | Equivalent to
M is invertible | det(M) ≠ 0
M is not invertible | det(M) = 0
RREF(M) = I | det(M) ≠ 0
RREF(M) has a zero row | det(M) = 0

🎓 The product rule in context

  • The formula det(EM) = det(E) det(M) holds for all elementary matrices.
  • This extends to any product: det(MN) = det(M) det(N).
  • Don't confuse: This is about determinants of products, not sums; there is no simple formula for det(M + N).
74

Review Problems: Determinants and Elementary Matrices

8.3 Review Problems

🧭 Overview

🧠 One-sentence thesis

A matrix is invertible if and only if its determinant is non-zero, and the determinant of a product equals the product of determinants—a fundamental multiplicative property that connects row operations, elementary matrices, and matrix invertibility.

📌 Key points (3–5)

  • Invertibility criterion: det M ≠ 0 if and only if M is invertible (Theorem 8.2.2).
  • Multiplicative property: det(MN) = det M · det N for any square matrices M and N—an extremely important result.
  • Elementary matrices are invertible: Every elementary matrix (except R_i(0)) is invertible, and its inverse is also an elementary matrix.
  • Common confusion: The determinant is NOT a linear transformation—det(M + N) does not equal det M + det N.
  • Row operations and determinants: The RREF of M has determinant 0 if M is not invertible, or determinant 1 if M is invertible.

🔗 Determinants and invertibility

🔍 The invertibility test

Theorem 8.2.2: For any square matrix M, det M ≠ 0 if and only if M is invertible.

  • This theorem establishes determinants as a test for invertibility.
  • The proof relies on the relationship between RREF(M) and det M through elementary matrices.
  • If M is not invertible, RREF(M) has a row of zeros, so det RREF(M) = 0.
  • If M is invertible, RREF(M) is the identity matrix, so det RREF(M) = 1.

Why this works:

  • Since RREF(M) = E₁E₂···E_k M for elementary matrices E_i, we have det RREF(M) = det(E₁)···det(E_k) det M.
  • Each E_i has non-zero determinant, so det RREF(M) = 0 if and only if det M = 0.

🧩 Elementary matrices and their inverses

Corollary 8.2.3: Any elementary matrix E_ij, R_i(λ), S_ij(μ) is invertible, except for R_i(0). The inverse of an elementary matrix is another elementary matrix.

  • Elementary matrices correspond to elementary row operations.
  • R_i(0) is the exception because multiplying a row by zero destroys information.
  • Example: If E_ij swaps rows i and j, then E_ij · E_ij = I (swapping twice returns to the original).

✖️ The multiplicative property

🎯 Product of determinants

The excerpt proves that for any square matrices M and N:

det(MN) = det M · det N

The excerpt emphasizes: "This result is extremely important; do not forget it!"

📐 Proof outline for invertible M

When M is invertible (RREF(M) = I):

  1. Write M = E₁E₂···E_k RREF(M) and N = F₁F₂···F_l RREF(N).
  2. Then det(MN) = det(E₁E₂···E_k I F₁F₂···F_l RREF(N)).
  3. Apply the multiplicative property of elementary matrices step by step.
  4. Result: det(MN) = det(E₁)···det(E_k) det(I) det(F₁)···det(F_l) det RREF(N) = det M · det N.

📐 Proof outline for non-invertible M

When M is not invertible:

  • det M = 0 and RREF(M) has a row of zeros.
  • For any λ, R_n(λ) RREF(M) = RREF(M) (multiplying a zero row does nothing).
  • Then det(MN) = det(E₁)···det(E_k) det(RREF(M)N).
  • Using the zero-row property: det(MN) = λ det(MN) for any λ, which implies det(MN) = 0.
  • Thus det(MN) = 0 = det M · det N.

Don't confuse: This multiplicative property does NOT extend to addition—det(M + N) ≠ det M + det N (see Problem 9g).

🛠️ Review problem themes

🧮 Computing determinants via row operations

Problem 1: Use row operations to put a 3×3 matrix M into row echelon form and prove M is non-singular if and only if a specific expression (the 3×3 determinant formula) is non-zero.

  • This connects the abstract definition to the concrete formula for 3×3 determinants.
  • The assumptions m₁₁ ≠ 0 and m₁₁m₂₂ - m₂₁m₁₂ ≠ 0 ensure the row reduction can proceed without division by zero.

Problem 8: Calculate a determinant by factoring the matrix into elementary matrices times simpler matrices, using det(M) = det(E⁻¹) det(EM).

  • This exploits the relationship between elementary row operations and determinants.
  • Each ERO matrix must be shown explicitly.

🔄 Elementary matrices in action

Problem 2: Identify what specific elementary matrices do under left and right multiplication.

| Matrix | Action under left multiplication | Goal |
| --- | --- | --- |
| E₁₂ | Swaps rows 1 and 2 | Row exchange |
| R_i(λ) | Multiplies row i by λ | Row scaling |
| S₁₂(λ) | Adds λ times row 2 to row 1 | Row addition |

Problem 11: Find the inverses of each elementary matrix type and verify that the product equals the identity.

  • Example: E_ij is its own inverse (swapping twice returns to the original).
  • R_i(λ) has inverse R_i(1/λ) (assuming λ ≠ 0).
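
A minimal sketch, with helper names of my own, that builds the three types of elementary matrices for n = 3 and verifies the inverses discussed in Problems 2 and 11:

```python
import numpy as np

def E_swap(n, i, j):
    """Elementary matrix that swaps rows i and j (under left multiplication)."""
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

def R_scale(n, i, lam):
    """Elementary matrix that multiplies row i by lam."""
    R = np.eye(n)
    R[i, i] = lam
    return R

def S_add(n, i, j, mu):
    """Elementary matrix that adds mu * (row j) to row i."""
    S = np.eye(n)
    S[i, j] = mu
    return S

I = np.eye(3)
# E_ij is its own inverse: swapping twice restores the original order.
assert np.allclose(E_swap(3, 0, 1) @ E_swap(3, 0, 1), I)
# R_i(lambda) has inverse R_i(1/lambda), provided lambda != 0.
assert np.allclose(R_scale(3, 2, 5.0) @ R_scale(3, 2, 1 / 5.0), I)
# S_ij(mu) has inverse S_ij(-mu): add, then subtract, the same multiple of row j.
assert np.allclose(S_add(3, 0, 1, 7.0) @ S_add(3, 0, 1, -7.0), I)
```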

🔀 Permutations and sign

Problem 3: Explore how swapping outputs in a permutation σ affects sums over all permutations.

  • Key observation: ∑_σ F(σ) = ∑_σ F(σ̂), where σ̂ swaps a given pair of objects.
  • This symmetry underlies the sign changes in determinant formulas when rows or columns are swapped.

Problem 4: Explain why det M = -det(S_ij M), where S_ij M is M with rows i and j switched.

  • This is a fundamental property: swapping two rows changes the sign of the determinant.

Problem 5: Show that swapping two columns also changes the sign: det M' = -det M.

🧪 Properties and non-properties

Problem 9g: Compute det(M + N) - (det M + det N) for 2×2 matrices.

  • This demonstrates that the determinant is not a linear transformation.
  • The determinant is multiplicative (det(MN) = det M · det N) but not additive.

Problem 6: Show that the scalar triple product u · (v × w) equals the determinant of the matrix with columns u, v, w.

  • This connects geometric intuition (volume) to the algebraic determinant.
  • Permuting the factors changes the sign according to permutation parity.

Problem 7: Show that if one row (or column) is a sum of multiples of others, then det M = 0.

  • This reflects linear dependence: the matrix is not invertible.

🔬 Advanced explorations

Problem 12: Compute det(A + tI₂) and identify the first-order term (the t¹ coefficient) in terms of tr(A) (the trace of A).

  • The result is a polynomial in t called the characteristic polynomial.
  • This generalizes to n×n matrices.
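
A symbolic sketch of the Problem 12 computation for a generic 2×2 matrix; the entry names a, b, c, d and the use of sympy are my own choices:

```python
import sympy as sp

a, b, c, d, t = sp.symbols('a b c d t')
A = sp.Matrix([[a, b],
               [c, d]])

p = sp.expand((A + t * sp.eye(2)).det())   # the polynomial det(A + t*I)
print(p)              # t**2 + (a + d)*t + (a*d - b*c), up to term ordering
print(p.coeff(t, 1))  # a + d: the first-order coefficient ...
print(A.trace())      # ... equals tr(A)
```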

Problem 13: Compute directional derivatives of the determinant function.

  • The determinant det: M_nn → ℝ is a function of n² variables.
  • The problem asks for limits of the form lim_{t→0} [det(I_n + tE) - det(I_n)] / t for various directions E.

Problem 14: Count the number of invertible functions f: {1,...,n} → {1,...,n}.

  • Invertible functions are exactly the permutations.
  • There are n! permutations of n objects.
  • The set {1,...,n}^{1,...,n} (all functions, not just invertible ones) has n^n elements.
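
A tiny enumeration for Problem 14 (n = 3 is an arbitrary choice) confirming the counts n! and n^n:

```python
from itertools import permutations, product
from math import factorial

n = 3
domain = range(1, n + 1)

all_functions = list(product(domain, repeat=n))   # each tuple lists f(1), ..., f(n)
bijections = list(permutations(domain))           # the invertible functions

print(len(all_functions), n ** n)      # 27 27
print(len(bijections), factorial(n))   # 6 6
```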

🧮 Expansion by minors

🔍 What is a minor?

Minor: The determinant of a square matrix obtained from M by deleting one row and one column.

  • Each entry m_ij of a square matrix M is associated with a minor: delete row i and column j, then take the determinant of what remains.

📝 Expansion formula

The determinant can be written in terms of minors of the first row:

det M = m₁₁ · (minor₁₁) - m₁₂ · (minor₁₂) + m₁₃ · (minor₁₃) - ...

  • The sign alternates: (-1)^(j-1) for the j-th column.
  • The formula sums over the first row, but you can expand along any row or column.

🧮 Example: 3×3 determinant

Example 103 computes det M for M = [[1,2,3],[4,5,6],[7,8,9]]:

  • det M = 1·det[[5,6],[8,9]] - 2·det[[4,6],[7,9]] + 3·det[[4,5],[7,8]]
  • = 1·(5·9 - 8·6) - 2·(4·9 - 7·6) + 3·(4·8 - 7·5)
  • = 1·(-3) - 2·(-6) + 3·(-3) = -3 + 12 - 9 = 0

Conclusion: det M = 0, so M is not invertible (M⁻¹ does not exist).
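
A short recursive sketch of expansion along the first row; it reproduces Example 103's conclusion that det M = 0 (an illustration of the formula, not an efficient algorithm):

```python
def det_by_minors(M):
    """Determinant via expansion by minors along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j, then take the determinant of what remains.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det_by_minors(minor)
    return total

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(det_by_minors(M))   # 0, so M is not invertible
```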

Example 104 notes that sometimes matrix entries allow simplification—though the excerpt cuts off before showing the example.

Don't confuse: Expansion by minors is a computational tool, not the definition of the determinant; the definition uses the sum over all permutations.


Properties of the Determinant

8.4 Properties of the Determinant

🧭 Overview

🧠 One-sentence thesis

The determinant behaves predictably under row operations, transposes, and matrix multiplication, enabling efficient computation strategies and revealing structural relationships like the determinant of the inverse.

📌 Key points (3–5)

  • Row operations before computing: performing row operations (especially to create zeros) before expanding in minors can greatly simplify determinant calculations.
  • Transpose symmetry: the determinant of a matrix equals the determinant of its transpose, so column operations work exactly like row operations.
  • Multiplicative property for inverses: det(M⁻¹) = 1/det(M), derived from det(I) = 1 and det(MN) = det(M)·det(N).
  • Common confusion: row vs. column operations—because det(Mᵀ) = det(M), expansion by minors works over columns just as it does over rows.
  • Adjoint matrix: for 2×2 matrices, the adjoint is a special matrix that, when multiplied by the original, yields det(M)·I.

🔧 Computational strategies

🔧 Using row operations to simplify

  • The excerpt emphasizes that knowing how determinants change under row operations makes it "often very beneficial to perform row operations before computing the determinant by brute force."
  • Strategy: create zeros in a row (or column) to reduce the number of terms when expanding in minors.

Example from the excerpt:

  • Matrix N has second row [4, 0, 0]. Switching first and second rows introduces a sign change, then expanding along the new first row (which has two zeros) leaves only one term: −4·det(2×2 submatrix) = 24.

🧮 Creating zero rows

  • In Example 105, row operations transform the matrix step-by-step until an entire row becomes [0, 0, 0].
  • A matrix with a zero row has determinant 0.
  • The excerpt invites the reader to "try to determine which row operations we made at each step."

Don't confuse: performing row operations changes the determinant in a controlled way (row swap → sign flip; row scaling → multiply determinant; row addition → no change), so you must track these changes to get the correct final value.
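
A sketch of this strategy in code: reduce to triangular form while tracking how each operation changes the determinant (sign flip for a swap, a scale factor for a scaling, no change for a row addition). The test matrix is my own, not the excerpt's N.

```python
import numpy as np

def det_by_row_reduction(A):
    """Compute det(A) by row reduction, tracking the effect of each operation."""
    A = A.astype(float).copy()
    n = A.shape[0]
    det = 1.0
    for col in range(n):
        # Find a non-zero pivot in this column; if none exists, the determinant is 0.
        pivot = next((r for r in range(col, n) if abs(A[r, col]) > 1e-12), None)
        if pivot is None:
            return 0.0
        if pivot != col:
            A[[col, pivot]] = A[[pivot, col]]   # row swap: flip the sign
            det = -det
        # Scaling the pivot row by 1/pivot divides the determinant by the pivot,
        # so multiply the running factor by the pivot to compensate.
        det *= A[col, col]
        A[col] = A[col] / A[col, col]
        for r in range(col + 1, n):
            A[r] = A[r] - A[r, col] * A[col]    # row addition: no change to det
    return det

A = np.array([[0, 2, 1],
              [4, 0, 0],
              [1, 1, 3]])
print(det_by_row_reduction(A), np.linalg.det(A))   # both ≈ -20
```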

🔄 Transpose and symmetry

🔄 Determinant of the transpose

If M is a square matrix then det(Mᵀ) = det(M).

  • The proof uses the permutation definition of the determinant and the fact that every permutation σ has a unique inverse σ⁻¹ with the same sign: sgn(σ) = sgn(σ⁻¹).
  • Rewriting the sum over permutations as a sum over inverse permutations shows that det(M) = det(Mᵀ).

🔄 Why it matters: column operations

  • Because transposing leaves the determinant unchanged, "expansion by minors also works over columns."
  • Example 106: matrix M has first column [1, 0, 0]ᵀ. Expanding along this column (using det(M) = det(Mᵀ)) gives 1·det(2×2 submatrix) = −3.

Common confusion: rows vs. columns—students often think determinant properties apply only to rows, but the transpose theorem guarantees that every row property has a column analogue.

🔗 Determinants and matrix inverses

🔗 Determinant of the inverse

  • The excerpt recalls two earlier results:
    • det(MN) = det(M)·det(N) (multiplicative property)
    • det(I) = 1 (identity has determinant 1)
  • Combining these: 1 = det(I) = det(M·M⁻¹) = det(M)·det(M⁻¹).
  • Solving for det(M⁻¹):

Theorem 8.4.1: det(M⁻¹) = 1/det(M)

Why it works: the determinant is multiplicative, so the determinant of the inverse must "undo" the determinant of M to yield 1.

🧩 Adjoint of a matrix

🧩 The 2×2 adjoint

  • For a 2×2 matrix M = [[m₁₁, m₁₂], [m₂₁, m₂₂]], the adjoint is the matrix [[m₂₂, −m₁₂], [−m₂₁, m₁₁]].
  • Key property: adjoint(M)·M = det(M)·I.
  • This gives the inverse formula:

M⁻¹ = (1/det(M))·adjoint(M), provided det(M) ≠ 0.

Example from the excerpt:

  • The matrix [[a, b], [c, d]] has adjoint [[d, −b], [−c, a]].
  • Multiplying: [[d, −b], [−c, a]]·[[a, b], [c, d]] = (ad − bc)·I = det(M)·I.

🧩 Generalizing the adjoint

  • The excerpt introduces the adjoint for 2×2 matrices and notes that "the matrix ... that appears above is a special matrix, called the adjoint of M."
  • The text ends mid-sentence ("Let's define the adjoint for"), suggesting the general n×n definition follows in the next part.

Don't confuse: the adjoint is not the same as the transpose; it involves cofactors (signed minors) and is specifically designed so that adjoint(M)·M = det(M)·I.


Determinant of the Inverse

8.4.1 Determinant of the Inverse

🧭 Overview

🧠 One-sentence thesis

The determinant of the inverse of a matrix is the reciprocal of the determinant of the original matrix, which follows directly from the multiplicative property of determinants.

📌 Key points (3–5)

  • Core formula: det(M⁻¹) = 1 / det(M), meaning the inverse's determinant is the reciprocal of the original determinant.
  • Why it works: the proof uses det(MN) = det(M) det(N) and det(I) = 1 together with the definition M M⁻¹ = I.
  • Connection to adjoint: the adjoint matrix is a special matrix that appears in the formula for the inverse of a 2×2 matrix.
  • Common confusion: the determinant of the inverse is not the negative of the original determinant; it is the reciprocal (1 divided by the original).

🔢 The main theorem

🔢 Statement of the result

Theorem 8.4.1: det(M⁻¹) = 1 / det(M)

  • This says: to find the determinant of an inverse matrix, take the reciprocal of the original matrix's determinant.
  • Example: if det(M) = 5, then det(M⁻¹) = 1/5.
  • This only makes sense when det(M) ≠ 0 (otherwise M has no inverse).

🧮 Why the formula holds

The proof uses three facts that were established earlier:

  1. Multiplicative property: det(MN) = det(M) det(N) for any n×n matrices M and N.
  2. Identity determinant: det(I) = 1.
  3. Definition of inverse: M M⁻¹ = I.

Putting these together:

  • Start with 1 = det(I).
  • Substitute I = M M⁻¹ to get 1 = det(M M⁻¹).
  • Apply the multiplicative property: 1 = det(M) det(M⁻¹).
  • Solve for det(M⁻¹): det(M⁻¹) = 1 / det(M).

Don't confuse: This is not saying det(M⁻¹) = -det(M). The operation is division (reciprocal), not negation.
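
A numerical check of Theorem 8.4.1, reusing the 3×3 matrix from the adjoint example in the next section (its determinant is 6):

```python
import numpy as np

M = np.array([[3.0, -1.0, -1.0],
              [1.0,  2.0,  0.0],
              [0.0,  1.0,  1.0]])

print(np.linalg.det(M))                  # ≈ 6.0
print(np.linalg.det(np.linalg.inv(M)))   # ≈ 0.1667, i.e. 1/6
assert np.isclose(np.linalg.det(np.linalg.inv(M)), 1 / np.linalg.det(M))
```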

🧷 Introduction to the adjoint

🧷 The 2×2 case

The excerpt introduces the adjoint as a special matrix that appears in the inverse formula for 2×2 matrices.

For a 2×2 matrix M = (m₁₁ m₁₂ / m₂₁ m₂₂), the inverse formula is:

  • M⁻¹ = (1 / det(M)) × adjoint(M)
  • where det(M) = m₁₁m₂₂ - m₁₂m₂₁
  • and the adjoint is the matrix (m₂₂ -m₁₂ / -m₂₁ m₁₁)

Pattern: The adjoint swaps the diagonal entries (m₁₁ and m₂₂) and negates the off-diagonal entries (m₁₂ and m₂₁).

🔍 The relationship

The excerpt shows that:

(adjoint of M) × M = det(M) × I

In other words:

  • (m₂₂ -m₁₂ / -m₂₁ m₁₁) × (m₁₁ m₁₂ / m₂₁ m₂₂) = (m₁₁m₂₂ - m₁₂m₂₁) × (1 0 / 0 1)

This relationship is what makes the adjoint useful for computing inverses: multiply both sides by 1/det(M) to get M⁻¹.

Condition: This only works when det(M) ≠ 0, because otherwise you cannot divide by det(M).


Adjoint of a Matrix

8.4.2 Adjoint of a Matrix

🧭 Overview

🧠 One-sentence thesis

The adjoint matrix provides an explicit formula for computing the inverse of any invertible square matrix through cofactors and the determinant.

📌 Key points (3–5)

  • What the adjoint is: the transpose of the matrix formed by cofactors of each entry.
  • Core relationship: for any square matrix M, the product M times adj M equals (det M) times the identity matrix.
  • Inverse formula: when det M is nonzero, the inverse is M⁻¹ = (1 / det M) times adj M.
  • Common confusion: the adjoint is not just the cofactor matrix—it is the transpose of the cofactor matrix.
  • Connection to 2×2 case: the familiar 2×2 inverse formula is a special case of the general adjoint formula.

🧩 Building blocks: cofactors and the adjoint

🧩 What a cofactor is

Cofactor of M corresponding to entry m_ij: the product of the minor associated to m_ij and (−1)^(i+j).

  • The minor is the determinant of the submatrix obtained by deleting row i and column j.
  • The sign factor (−1)^(i+j) alternates in a checkerboard pattern.
  • Notation: cofactor(m_ij).

🔄 Defining the adjoint matrix

Adjoint matrix adj M: the transpose of the matrix formed by all cofactors of M.

  • Start with the matrix M = (m_ij).
  • Form the cofactor matrix: replace each entry m_ij with cofactor(m_ij).
  • Transpose that cofactor matrix to get adj M.
  • In symbols: adj M = (cofactor(m_ij))^T.

Example from the excerpt:

  • For the 3×3 matrix with entries (3, −1, −1; 1, 2, 0; 0, 1, 1), the adjoint is computed by:
    • Finding the cofactor of each entry (e.g., cofactor of the (1,1) entry is det of the 2×2 submatrix (2, 0; 1, 1) = 2).
    • Arranging all nine cofactors into a matrix.
    • Transposing that matrix to get adj M = (2, 0, 2; −1, 3, −1; 1, −3, 7).

🔗 The fundamental product: M times adj M

🔗 Why M adj M = (det M) I

  • The (i, j) entry of M times adj M is the dot product of the i-th row of M and the j-th column of adj M.
  • When i = j (diagonal entries): this dot product is exactly the expansion by minors of det M along the i-th row, so the result is det M.
  • When i ≠ j (off-diagonal entries): this dot product is the same as expanding by minors a matrix where the j-th row has been replaced by the i-th row—a matrix with a repeated row, which has determinant zero.
  • Therefore: M adj M = (det M) I.

🧮 Implication for the inverse

  • Rearrange the equation M adj M = (det M) I.
  • Divide both sides by det M (assuming det M ≠ 0): M times (1 / det M) adj M = I.
  • This shows that M⁻¹ = (1 / det M) adj M.

Theorem 8.4.2:

For M a square matrix with det M ≠ 0 (equivalently, if M is invertible), then M⁻¹ = (1 / det M) adj M.

🔍 Worked example and Cramer's Rule

🔍 Computing the inverse via the adjoint

Continuing the 3×3 example:

  • M = (3, −1, −1; 1, 2, 0; 0, 1, 1).
  • adj M = (2, 0, 2; −1, 3, −1; 1, −3, 7).
  • Multiply M times adj M: the result is (6, 0, 0; 0, 6, 0; 0, 0, 6) = 6 I.
  • Therefore det M = 6.
  • The inverse is M⁻¹ = (1/6) adj M = (1/6) times (2, 0, 2; −1, 3, −1; 1, −3, 7).
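
A sketch reproducing this worked example with sympy; note that sympy calls the adjoint defined here the adjugate:

```python
import sympy as sp

M = sp.Matrix([[3, -1, -1],
               [1,  2,  0],
               [0,  1,  1]])

adjM = M.adjugate()               # transpose of the cofactor matrix
print(adjM)                       # Matrix([[2, 0, 2], [-1, 3, -1], [1, -3, 7]])
print(M * adjM)                   # 6 times the identity, so det M = 6
print(M.inv() - adjM / M.det())   # the zero matrix: M⁻¹ = (1/det M) adj M
```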

📐 Cramer's Rule

  • The excerpt states: "This process for finding the inverse matrix is sometimes called Cramer's Rule."
  • It refers to using the adjoint formula M⁻¹ = (1 / det M) adj M to compute the inverse explicitly.

🔄 Connection to the 2×2 case

🔄 The familiar 2×2 inverse

Recall the 2×2 formula:

  • For M = (m₁₁, m₁₂; m₂₁, m₂₂), the inverse is M⁻¹ = (1 / det M) times (m₂₂, −m₁₂; −m₂₁, m₁₁).
  • The matrix (m₂₂, −m₁₂; −m₂₁, m₁₁) is the adjoint of M.
  • The excerpt shows this explicitly: the adjoint is formed by cofactors, and for 2×2 matrices the cofactors are m₂₂, −m₁₂, −m₂₁, m₁₁ (with appropriate signs).

Don't confuse:

  • The 2×2 adjoint looks like "swap the diagonal, negate the off-diagonal," but this is a special case.
  • For larger matrices, you must compute each cofactor (minor with sign) and then transpose.

📊 Summary table: adjoint properties

| Property | Description |
| --- | --- |
| Definition | adj M = (cofactor(m_ij))^T |
| Fundamental product | M adj M = (det M) I |
| Inverse formula | M⁻¹ = (1 / det M) adj M (when det M ≠ 0) |
| Relation to 2×2 | The familiar 2×2 inverse formula is the adjoint formula for n = 2 |
| Cramer's Rule | Using the adjoint to compute the inverse explicitly |

Application: Volume of a Parallelepiped

8.4.3 Application: Volume of a Parallelepiped

🧭 Overview

🧠 One-sentence thesis

The volume of a parallelepiped determined by three vectors in three-dimensional space equals the absolute value of the determinant of the matrix formed by those vectors as columns.

📌 Key points (3–5)

  • What a parallelepiped is: a "squished" box whose edges are parallel to three given vectors u, v, and w in three-dimensional space.
  • Volume formula: the volume equals the absolute value of the determinant of the matrix whose columns are the three vectors.
  • Connection to calculus: this determinant formula is equivalent to the triple scalar product (u · (v × w)) from calculus.
  • Why determinants matter: the determinant provides a direct algebraic way to compute geometric volume without needing cross products.

📦 The geometric object

📦 What a parallelepiped is

A parallelepiped determined by three vectors u, v, w in three-dimensional space: the "squished" box whose edges are parallel to u, v, and w.

  • Think of it as a three-dimensional generalization of a parallelogram.
  • The shape is formed by starting at the origin and extending edges along the directions of the three vectors.
  • The excerpt describes it as "squished" to emphasize that the edges need not be perpendicular—it is a general box shape, not necessarily rectangular.
  • Example: if u, v, w point in three different directions, the parallelepiped is the solid region swept out by moving along those three directions.

🧮 Computing the volume

🧮 The determinant formula

The excerpt states:

Volume = |det(u v w)|

where (u v w) denotes the matrix whose columns are the three vectors.

  • The vertical bars indicate absolute value, so the volume is always non-negative.
  • The determinant can be computed by expansion by minors (as discussed earlier in the chapter).
  • The sign of the determinant depends on orientation, but volume is a magnitude, so we take the absolute value.

🔗 Connection to the triple scalar product

The excerpt notes:

You probably learnt in a calculus course that the volume of this object is |u · (v × w)|.

  • The triple scalar product u · (v × w) is a calculus formula for the same volume.
  • The excerpt states: "This is the same as expansion by minors of the matrix whose columns are u, v, w."
  • Don't confuse: the determinant formula and the triple scalar product are two different computational methods for the same geometric quantity.
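
A numerical comparison of the two formulas on three arbitrary vectors of my own choosing:

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, 1.0])
w = np.array([2.0, 0.0, 3.0])

vol_det = abs(np.linalg.det(np.column_stack([u, v, w])))   # |det(u v w)|
vol_triple = abs(np.dot(u, np.cross(v, w)))                # |u · (v × w)|

print(vol_det, vol_triple)   # the same value (7.0 for these vectors)
assert np.isclose(vol_det, vol_triple)
```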

🔍 Why this formula works

🔍 Determinant as volume measure

  • The determinant of a matrix measures how much the linear transformation represented by that matrix scales volumes.
  • When the columns are u, v, w, the determinant measures the signed volume of the parallelepiped they span.
  • The absolute value removes the sign, leaving only the geometric volume.

🔍 Practical advantage

  • Using the determinant formula allows volume computation through pure linear algebra (matrix operations).
  • No need to compute cross products explicitly; just form the matrix and compute its determinant.
  • Example: given three vectors as columns, apply expansion by minors or any other determinant method to find the volume directly.

8.5 Review Problems

🧭 Overview

🧠 One-sentence thesis

This section provides practice problems that apply determinant computation techniques (expansion by minors, properties of determinants, and computational efficiency) and introduces the concept of subspaces as subsets of vector spaces that are themselves vector spaces.

📌 Key points (3–5)

  • Determinant computation practice: problems cover expanding by minors for larger matrices and properties of determinants for non-square matrix products.
  • Permutation relationships: exploring how determinants relate to permutations and their inverses.
  • Computational efficiency: analyzing the complexity (number of operations) of different determinant calculation methods.
  • Subspace introduction: a subset U of vector space V is a subspace if it is closed under addition and scalar multiplication (μu₁ + νu₂ ∈ U).
  • Common confusion: the Subspace Theorem simplifies checking—you don't need to verify all ten vector space properties, only closure under linear combinations.

🧮 Determinant computation problems

🧮 Expansion by minors

  • Problem 1 asks to find the determinant of a 4×4 matrix using expansion by minors.
  • This applies the technique from earlier sections where you expand along a row or column.
  • The method involves computing smaller 3×3 determinants (minors) and combining them with appropriate signs (cofactors).

🔄 Non-square matrix products

  • Problem 2 examines whether det(MM^T) = det(M^T M) for non-square matrices M.
  • Even though M itself is not square, both MM^T and M^T M are square matrices, so their determinants exist.
  • The problem also asks about the trace: tr(MM^T) = tr(M^T M).

🔀 Permutations and inverses

  • Problem 3 explores the relationship between a permutation σ and its inverse σ⁻¹.
  • You must write out sums over permutations explicitly for a function f on {1, 2, 3, 4}.
  • The goal is to observe and explain why ∑_σ F(σ) = ∑_σ F(σ⁻¹) for any function F on permutations.
  • This relates to the symmetry properties used in determinant definitions.

⚡ Computational efficiency

⚡ LU decomposition advantage

  • Problem 4: if M = LU is an LU decomposition, how do you efficiently compute det M?
  • Key insight: the determinant of a triangular matrix is the product of its diagonal entries.
  • Since L and U are triangular, det M = det L · det U can be computed quickly.
  • Invertibility check: M is invertible if and only if all diagonal entries of L and U are non-zero.
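
A sketch of the idea behind Problem 4 with hand-picked triangular factors (not taken from the problem set): det(LU) is just the product of all the diagonal entries of L and U.

```python
import numpy as np

# Illustrative triangular factors; any lower/upper triangular pair would do.
L = np.array([[1.0, 0.0, 0.0],
              [2.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])
U = np.array([[2.0, 1.0, 1.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 4.0]])
M = L @ U

# The determinant of a triangular matrix is the product of its diagonal entries.
det_fast = np.prod(np.diag(L)) * np.prod(np.diag(U))   # (1*3*6) * (2*1*4) = 144
print(det_fast, np.linalg.det(M))                      # both ≈ 144
```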

📊 Complexity analysis

Problem 5 analyzes algorithm complexity by counting operations:

| Method | Operations needed | Notes |
| --- | --- | --- |
| 2×2 determinant | Few additions and multiplications | Direct formula |
| n×n by definition | Factorial growth in operations | Sum over all n! permutations |
| 3×3 by minors | Fewer than definition method | Recursive breakdown |

  • Addition takes a seconds, multiplication takes m seconds.
  • Example: computing 2·6 − 5 takes a + m seconds.
  • The problem asks you to compare expansion by minors versus the permutation definition for a 3×3 matrix (assuming m = 2a).

🏗️ Introduction to subspaces

🏗️ What is a subspace

Subspace: A subset U of a vector space V is a subspace of V if U is itself a vector space under the inherited addition and scalar multiplication operations of V.

  • Not every subset of a vector space is a subspace.
  • The subset must satisfy all vector space properties on its own.
  • Example from the excerpt: a plane P in R³ through the origin defined by ax + by + cz = 0.

✅ The Subspace Theorem (simplified checking)

Subspace Theorem: Let U be a non-empty subset of vector space V. Then U is a subspace if and only if μu₁ + νu₂ ∈ U for arbitrary u₁, u₂ in U and arbitrary constants μ, ν.

Why this matters:

  • You only need to check one condition: closure under linear combinations.
  • You don't need to verify all ten vector space properties individually.
  • The other eight properties are automatically inherited from V.

Don't confuse: "closure" here means specifically that any linear combination μu₁ + νu₂ stays inside U; it's not just about addition or scalar multiplication separately, but both together.

🔍 Plane through the origin example

The excerpt shows that a plane P in R³ through the origin (ax + by + cz = 0) is a subspace:

  • This can be written as a homogeneous system MX = 0.
  • If X₁ and X₂ are solutions, then μX₁ + νX₂ is also a solution by linearity: M(μX₁ + νX₂) = μMX₁ + νMX₂ = 0.
  • P contains the origin (set μ = ν = 0).
  • Therefore P is closed under addition and scalar multiplication, making it a subspace.

Subspaces

9.1 Subspaces

🧭 Overview

🧠 One-sentence thesis

A subspace is a subset of a vector space that is itself a vector space under the same operations, and the span of any set of vectors always forms a subspace.

📌 Key points (3–5)

  • What a subspace is: a subset of a vector space that is itself a vector space under the inherited operations.
  • How to check for subspaces: use the Subspace Theorem—only need to verify that any linear combination of two vectors in the subset stays in the subset (closure under linear combinations).
  • Building subspaces via span: the span of any set of vectors is the set of all finite linear combinations of those vectors, and it always forms a subspace.
  • Common confusion: a set of a few vectors (like two vectors in R³) is not itself a subspace, but the span of those vectors (all their linear combinations) is a subspace.
  • Why it matters: subspaces arise naturally as kernels of linear maps (solutions to homogeneous equations) and as spans of vector sets, helping us understand the structure of vector spaces.

🔍 What is a subspace?

🔍 Definition and intuition

Subspace: A subset U of a vector space V is a subspace of V if U is a vector space under the inherited addition and scalar multiplication operations of V.

  • A subspace is not just any subset; it must satisfy all ten vector space properties.
  • The key idea: you can add vectors in U and multiply them by scalars, and you always stay inside U.
  • Example: A plane through the origin in R³ is a subspace, but a plane not through the origin is not (it doesn't contain the zero vector).

✅ The Subspace Theorem (checking tool)

Subspace Theorem: Let U be a non-empty subset of a vector space V. Then U is a subspace if and only if μu₁ + νu₂ ∈ U for arbitrary u₁, u₂ in U, and arbitrary constants μ, ν.

  • This theorem simplifies checking: instead of verifying all ten vector space properties, you only need to check closure under linear combinations.
  • Two questions to ask:
    1. If you add any two vectors in U, do you get a vector in U?
    2. If you multiply any vector in U by any constant, do you get a vector in U?
  • If both answers are yes, then U is a subspace.
  • Why this works: the other eight vector space properties are automatically inherited from the larger space V.

🧪 Example: a plane through the origin

The excerpt gives a plane P in R³ defined by ax + by + cz = 0.

  • This can be written as a homogeneous system: MX = 0, where M = (a b c).
  • If X₁ and X₂ are solutions, then μX₁ + νX₂ is also a solution by linearity of matrix multiplication:
    • M(μX₁ + νX₂) = μMX₁ + νMX₂ = 0.
  • So P is closed under addition and scalar multiplication.
  • P contains the origin (set μ = ν = 0).
  • All other vector space requirements hold because they hold for all vectors in R³.
  • Therefore, P is a subspace.

Don't confuse: A plane through the origin is a subspace, but a plane not through the origin is not—it fails to contain the zero vector and is not closed under scalar multiplication.
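
A small numerical illustration of the closure argument, with plane coefficients and sample points of my own choosing:

```python
import numpy as np

a, b, c = 1.0, -2.0, 3.0
M = np.array([[a, b, c]])        # the plane is the solution set of M X = 0

# Two particular solutions (points on the plane), found by hand.
X1 = np.array([2.0, 1.0, 0.0])   # 1*2 - 2*1 + 3*0 = 0
X2 = np.array([3.0, 0.0, -1.0])  # 1*3 - 2*0 - 3*1 = 0
assert np.allclose(M @ X1, 0) and np.allclose(M @ X2, 0)

# Any linear combination is again a solution: M(μX1 + νX2) = μ M X1 + ν M X2 = 0.
mu, nu = -4.0, 7.5
assert np.allclose(M @ (mu * X1 + nu * X2), 0)
```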

🏗️ Building subspaces with span

🏗️ What is span?

Span of S: Let V be a vector space and S = {s₁, s₂, ...} ⊂ V. The span of S, denoted span(S), is the set of all finite linear combinations of elements of S:

span(S) := {r₁s₁ + r₂s₂ + ⋯ + r_N s_N | rᵢ ∈ R, N ∈ N}.

  • The span is the set of all vectors you can make by taking any finite sum of the form "a constant times s₁ plus a constant times s₂ plus a constant times s₃ and so on."
  • Important: only finite linear combinations are allowed; N must be finite (though it can be any finite number).
  • The span captures "all vectors reachable" by combining the vectors in S.

🧩 Example: spanning the xy-plane

Consider U = {(1,0,0), (0,1,0)} ⊂ R³.

  • U itself is not a vector space—it contains only two vectors, not their multiples or sums.
  • But these two vectors define the xy-plane in R³.
  • The span of U is:
    • span(U) = {x(1,0,0) + y(0,1,0) | x, y ∈ R}.
  • Any vector in the xy-plane is of the form (x, y, 0) = x(1,0,0) + y(0,1,0) ∈ span(U).
  • So span(U) is the xy-plane, which is a vector space.

Don't confuse: The set U (just two vectors) is not a subspace, but span(U) (all linear combinations of those vectors) is a subspace.

🧩 Another example: spanning with the x-axis and a point

Let V = R³, X be the x-axis, P = (0,1,0), and S = X ∪ {P}.

  • The vector (2,3,0) is in span(S) because (2,3,0) = (2,0,0) + 3(0,1,0).
  • The vector (-12, 17.5, 0) is in span(S) because (-12, 17.5, 0) = (-12,0,0) + 17.5(0,1,0).
  • Any vector of the form (x,0,0) + y(0,1,0) = (x,y,0) is in span(S).
  • On the other hand, any vector in span(S) must have a zero in the z-coordinate (because all vectors in S have z = 0, and linear combinations preserve this).
  • So span(S) is the xy-plane, which is a vector space.

📐 Span is always a subspace

Lemma: For any subset S ⊂ V, span(S) is a subspace of V.

Proof idea:

  • Let u, v ∈ span(S) and λ, μ be constants.
  • By definition of span, there are constants cᵢ and dᵢ such that:
    • u = c₁s₁ + c₂s₂ + ⋯
    • v = d₁s₁ + d₂s₂ + ⋯
  • Then:
    • λu + μv = λ(c₁s₁ + c₂s₂ + ⋯) + μ(d₁s₁ + d₂s₂ + ⋯)
    • = (λc₁ + μd₁)s₁ + (λc₂ + μd₂)s₂ + ⋯
  • This last sum is a linear combination of elements of S, so it is in span(S).
  • Therefore span(S) is closed under linear combinations, and is thus a subspace of V.

Key takeaway: The span construction always produces a subspace, no matter what set S you start with.

🔧 Practical applications

🔧 When does a span equal the whole space?

Example: For which values of a does span{(1,0,a), (1,2,-3), (a,1,0)} = R³?

  • Given an arbitrary vector (x,y,z) in R³, we need to find constants r₁, r₂, r₃ such that:
    • r₁(1,0,a) + r₂(1,2,-3) + r₃(a,1,0) = (x,y,z).
  • This can be written as a linear system:
    • M(r₁, r₂, r₃)ᵀ = (x,y,z)ᵀ, where M is the matrix with columns (1,0,a), (1,2,-3), (a,1,0).
  • If M is invertible, then we can solve for (r₁, r₂, r₃) = M⁻¹(x,y,z) for any vector (x,y,z) ∈ R³.
  • So we need det M ≠ 0.
  • Computing: det M = -2a² + 3 + a = -(2a - 3)(a + 1).
  • The span is R³ if and only if a ≠ -1 and a ≠ 3/2.

Key insight: A set of vectors spans the whole space if and only if the matrix formed by those vectors is invertible (has nonzero determinant).
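
A symbolic check of this example, treating a as a symbol (the use of sympy is my own choice):

```python
import sympy as sp

a = sp.symbols('a')
M = sp.Matrix([[1, 1, a],
               [0, 2, 1],
               [a, -3, 0]])   # columns are (1,0,a), (1,2,-3), (a,1,0)

d = sp.factor(M.det())
print(d)               # -(a + 1)*(2*a - 3), up to ordering of the factors
print(sp.solve(d, a))  # [-1, 3/2]: exactly the values of a for which the span is not R³
```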

🔧 Kernel of a linear map

Example: Suppose L: U → V is a linear map. The kernel of L is:

ker L := {u ∈ U | L(u) = 0} ⊂ U.

  • If L(u) = 0 and L(u') = 0, then by linearity:
    • L(αu + βu') = αL(u) + βL(u') = α0 + β0 = 0.
  • So the kernel is closed under linear combinations.
  • By the Subspace Theorem, the kernel is a subspace of U.
  • Finding a kernel means solving a homogeneous linear equation.

Don't confuse: The kernel is a subset of the domain U, not the codomain V. It consists of all vectors that map to zero.

🔧 Image of a linear map

The excerpt mentions the image of a linear map but does not complete the example. The image is the set of all vectors in V that are outputs of L, i.e., {L(u) | u ∈ U}.


Building Subspaces

9.2 Building Subspaces

🧭 Overview

🧠 One-sentence thesis

The span of any set of vectors forms a subspace, and important subspaces arise naturally from linear maps through kernels, images, and eigenspaces.

📌 Key points (3–5)

  • Span always creates a subspace: for any subset S of a vector space V, span(S) is guaranteed to be a subspace of V.
  • How span works: span(S) consists of all possible linear combinations of vectors from S (with finitely many terms).
  • Three key subspace constructions from linear maps: kernel (inputs mapped to zero), image (all possible outputs), and eigenspace (vectors scaled by a specific factor).
  • Common confusion: span(S) may not fill the entire space—for example, the span of vectors in the xy-plane cannot produce vectors with nonzero z-coordinates.
  • Determining full span: a set of vectors spans the entire space (e.g., R³) if and only if the associated matrix is invertible.

🏗️ What span means and how it works

🏗️ Definition of span

span(S) := {r₁s₁ + r₂s₂ + ⋯ + r_N s_N | rᵢ ∈ R, sᵢ ∈ S, N ∈ N}

  • Span(S) is the set of all linear combinations of vectors from S.
  • The coefficients rᵢ come from the base field (usually the real numbers R).
  • Important restriction: only finitely many terms are allowed in each linear combination (N must be finite, though it can be any finite number).

🎯 Concrete example: the xy-plane

The excerpt gives S = {(2, 0, 0), (0, 1, 0)}.

  • The vector (3, 0, 0) is in span(S) because (3, 0, 0) = 1.5·(2, 0, 0) + 0·(0, 1, 0).
  • The vector (−12, 17.5, 0) is in span(S) because (−12, 17.5, 0) = −6·(2, 0, 0) + 17.5·(0, 1, 0).
  • Any vector of the form (x, y, 0) is in span(S).
  • Key observation: any vector in span(S) must have a zero in the z-coordinate, because both generating vectors have z = 0.
  • Therefore span(S) is the xy-plane, which is a vector space.

Don't confuse: span(S) with the entire space V—span may be strictly smaller (a proper subspace).

🔒 Span always forms a subspace

🔒 The fundamental lemma

Lemma 9.2.1: For any subset S ⊂ V, span(S) is a subspace of V.

🔍 Why this is true (proof sketch)

The proof shows that span(S) is closed under linear combinations:

  • Take any two vectors u, v in span(S).
  • By definition, u = c₁s₁ + c₂s₂ + ⋯ and v = d₁s₁ + d₂s₂ + ⋯ for some coefficients cᵢ, dᵢ and vectors sᵢ from S.
  • For any constants λ, μ, compute:
    • λu + μv = λ(c₁s₁ + c₂s₂ + ⋯) + μ(d₁s₁ + d₂s₂ + ⋯)
    • = (λc₁ + μd₁)s₁ + (λc₂ + μd₂)s₂ + ⋯
  • This result is again a linear combination of elements of S, so it is in span(S).
  • Therefore span(S) is closed under linear combinations, which makes it a subspace.

Note: The excerpt emphasizes that "this proof, like many proofs, consisted of little more than just writing out the definitions."

🧮 When does a span fill the whole space?

🧮 Example: spanning R³

Question: For which values of a does span{(1, 0, a), (1, 2, −3), (a, 1, 0)} = R³?

🔧 Method: invertibility test

To span all of R³, we need to be able to write any vector (x, y, z) as:

  • r₁(1, 0, a) + r₂(1, 2, −3) + r₃(a, 1, 0) = (x, y, z)

This translates to the linear system:

  • Matrix M = [[1, 1, a], [0, 2, 1], [a, −3, 0]]
  • M · [r₁, r₂, r₃]ᵀ = [x, y, z]ᵀ

Key insight: If M is invertible, we can solve for [r₁, r₂, r₃] for any vector (x, y, z).

✅ Solution

  • Compute det(M) = −2a² + 3 + a = −(2a − 3)(a + 1).
  • M is invertible when det(M) ≠ 0.
  • Therefore the span is R³ if and only if a ≠ −1 and a ≠ 3/2.

🗺️ Three important subspaces from linear maps

🗺️ The kernel of a linear map

ker L := {u ∈ U | L(u) = 0} ⊂ U

  • The kernel is the set of all input vectors that map to the zero vector.
  • Why it's a subspace: If L(u) = 0 and L(u′) = 0, then by linearity L(αu + βu′) = αL(u) + βL(u′) = α·0 + β·0 = 0.
  • Finding the kernel means solving a homogeneous linear equation.

🎯 The image of a linear map

im L := {L(u) | u ∈ U} ⊂ V

  • The image is the set of all possible output vectors of L.
  • Why it's a subspace: If v = L(u) and v′ = L(u′), then by linearity αv + βv′ = αL(u) + βL(u′) = L(αu + βu′).
  • The image captures "what outputs are reachable."

🔢 An eigenspace of a linear map

Vλ := {v ∈ V | L(v) = λv}

  • An eigenspace is the set of all vectors that satisfy the eigenvector equation L(v) = λv for a specific scalar λ.
  • Why it's a subspace: If L(u) = λu and L(v) = λv, then L(αu + βv) = αL(u) + βL(v) = αλu + βλv = λ(αu + βv).
  • For most scalars λ, the only solution is v = 0 (the trivial subspace {0}).
  • When nontrivial solutions exist, λ is called an eigenvalue and carries essential information about the map L.

Don't confuse: These three constructions—kernel, image, and eigenspace—are all subspaces, but they capture different aspects of a linear map (inputs that vanish, reachable outputs, and directions that are only scaled, respectively).


Review Problems: Subspaces and Spanning Sets

9.3 Review Problems

🧭 Overview

🧠 One-sentence thesis

This section provides practice problems on determining whether vectors belong to spans, whether unions and intersections of subspaces remain subspaces, and how to find kernels, images, and eigenspaces of linear maps.

📌 Key points (3–5)

  • Span membership: determining if a given vector can be written as a linear combination of a set of vectors.
  • Subspace closure under set operations: unions and intersections of subspaces behave differently—one preserves the subspace property, the other may not.
  • Kernel, image, and eigenspaces: these are all subspaces built from a linear map, expressed using span notation.
  • Common confusion: union vs intersection—only intersection of subspaces is guaranteed to be a subspace; union typically is not unless one subspace contains the other.

🔍 Span membership problems

🔍 Checking if a vector is in a span

Problem 1 asks: Is x − x³ in span{x², 2x + x², x + x³}?

How to solve:

  • Write x − x³ as a linear combination: x − x³ = r₁(x²) + r₂(2x + x²) + r₃(x + x³).
  • Expand and collect like terms (by powers of x).
  • Set up a system of equations for the coefficients r₁, r₂, r₃.
  • If a solution exists, the vector is in the span; otherwise it is not.

Why it matters:

  • Span membership is equivalent to solving a linear system.
  • This connects the geometric idea of "reachability" to algebraic solvability.

🧮 Span in polynomial spaces

  • The vectors here are polynomials: x², 2x + x², x + x³.
  • Linear combinations form new polynomials.
  • Example: If you can express x − x³ as a sum of scaled versions of the given polynomials, it lies in their span.

🧩 Subspaces under set operations

🧩 Union of subspaces (Problem 2a)

Question: If U and W are subspaces of V, is U ∪ W a subspace?

Answer (from the excerpt's hint structure):

  • Generally no.
  • A subspace must be closed under addition.
  • If you take one vector from U (not in W) and one from W (not in U), their sum may not lie in either U or W, so it is not in U ∪ W.
  • Exception: U ∪ W is a subspace only if one subspace contains the other (U ⊆ W or W ⊆ U).

Example in R³:

  • Let U be the x-axis and W be the y-axis.
  • U ∪ W is the union of two lines through the origin.
  • Take u = (1,0,0) ∈ U and w = (0,1,0) ∈ W.
  • Their sum u + w = (1,1,0) is not on either axis, so not in U ∪ W.
  • Therefore U ∪ W is not closed under addition and is not a subspace.

🧩 Intersection of subspaces (Problem 2b)

Question: If U and W are subspaces of V, is U ∩ W a subspace?

Answer:

  • Yes, always.
  • If u, v ∈ U ∩ W, then u, v ∈ U and u, v ∈ W.
  • Since U and W are both subspaces, any linear combination αu + βv is in U and also in W.
  • Therefore αu + βv ∈ U ∩ W, so U ∩ W is closed under linear combinations.

Example in R³:

  • Let U be the xy-plane and W be the xz-plane.
  • U ∩ W is the x-axis (the set of vectors with y = 0 and z = 0).
  • The x-axis is a subspace (a line through the origin).

Don't confuse:

  • Union typically fails the subspace test; intersection always passes.

🔧 Kernel, image, and eigenspaces

🔧 Problem setup (Problem 3)

Given a linear map L : R³ → R³ defined by

L(x, y, z) = (x + 2y + z, 2x + y + z, 0)

Find:

  1. ker L (kernel)
  2. im L (image)
  3. The eigenspaces R³₋₁ and R³₃

All answers should be subsets of R³ expressed in span notation.

🛠️ Finding the kernel

ker L := {u ∈ U | L(u) = 0}

How to find it:

  • Solve L(x, y, z) = (0, 0, 0).
  • This gives the homogeneous system:
    • x + 2y + z = 0
    • 2x + y + z = 0
    • 0 = 0 (always true)
  • Solve for x, y, z in terms of free variables.
  • Express the solution set as span of basis vectors.

Why it's a subspace:

  • The excerpt (Example 112) shows that if L(u) = 0 and L(u′) = 0, then L(αu + βu′) = αL(u) + βL(u′) = 0.
  • So ker L is closed under linear combinations.
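
A sketch of the kernel computation for the map in Problem 3, taking the null space of its standard-basis matrix (sympy is my own choice of tool):

```python
import sympy as sp

# Matrix of L(x, y, z) = (x + 2y + z, 2x + y + z, 0) in the standard basis.
A = sp.Matrix([[1, 2, 1],
               [2, 1, 1],
               [0, 0, 0]])

print(A.nullspace())
# [Matrix([[-1/3], [-1/3], [1]])]: clearing denominators, ker L = span{(1, 1, -3)}
```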

🛠️ Finding the image

im L := {L(u) | u ∈ U}

How to find it:

  • The image is the set of all possible outputs of L.
  • Write L(x, y, z) = x(1, 2, 0) + y(2, 1, 0) + z(1, 1, 0).
  • So im L = span{(1, 2, 0), (2, 1, 0), (1, 1, 0)}.
  • Simplify by finding which vectors are independent (remove redundant ones if needed).

Why it's a subspace:

  • The excerpt (Example 113) shows that if v = L(u) and v′ = L(u′), then αv + βv′ = L(αu + βu′).
  • So im L is closed under linear combinations.

🛠️ Finding eigenspaces

Vλ := {v ∈ V | L(v) = λv}

For eigenspace R³₋₁:

  • Solve L(x, y, z) = −1 · (x, y, z).
  • This gives:
    • x + 2y + z = −x → 2x + 2y + z = 0
    • 2x + y + z = −y → 2x + 2y + z = 0
    • 0 = −z → z = 0
  • Solve this system and express solutions as span.

For eigenspace R³₃:

  • Solve L(x, y, z) = 3 · (x, y, z).
  • This gives:
    • x + 2y + z = 3x → −2x + 2y + z = 0
    • 2x + y + z = 3y → 2x − 2y + z = 0
    • 0 = 3z → z = 0
  • Solve and express as span.

Why eigenspaces are subspaces:

  • The excerpt (Example 114) shows that if L(u) = λu and L(v) = λv, then L(αu + βv) = λ(αu + βv).
  • So each eigenspace is closed under linear combinations.

Don't confuse:

  • Kernel is the eigenspace for λ = 0.
  • For most values of λ, the eigenspace is just {0} (the trivial subspace).
  • Only special values of λ (eigenvalues) give nontrivial eigenspaces.
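
A sketch of the eigenspace computations for the same map, using sympy's eigenvects, which returns each eigenvalue together with a basis of its eigenspace:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1],
               [2, 1, 1],
               [0, 0, 0]])   # matrix of L(x, y, z) = (x + 2y + z, 2x + y + z, 0)

for eigenvalue, multiplicity, basis in A.eigenvects():
    print(eigenvalue, [list(v) for v in basis])
# Eigenvalues -1, 0, 3: λ = 0 reproduces ker L, while λ = -1 and λ = 3
# give the eigenspaces R³₋₁ and R³₃ asked for in the problem.
```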

📊 Summary table

| Concept | Definition | How to find it | Key property |
| --- | --- | --- | --- |
| ker L | {u ∈ U \| L(u) = 0} | Solve L(u) = 0 (homogeneous system) | Always a subspace of the domain |
| im L | {L(u) \| u ∈ U} | Find span of output vectors | Always a subspace of the codomain |
| Eigenspace Vλ | {v ∈ V \| L(v) = λv} | Solve L(v) = λv | Subspace of V; nontrivial only for eigenvalues λ |
| U ∩ W | Vectors in both U and W | Set intersection | Always a subspace if U, W are subspaces |
| U ∪ W | Vectors in U or W | Set union | Usually NOT a subspace unless U ⊆ W or W ⊆ U |

Showing Linear Dependence

10.1 Showing Linear Dependence

🧭 Overview

🧠 One-sentence thesis

A set of vectors is linearly dependent if at least one vector can be written as a linear combination of the others, which can be detected by finding nontrivial solutions to a homogeneous system.

📌 Key points (3–5)

  • What linear dependence means: vectors are linearly dependent when there exist constants (not all zero) such that their linear combination equals zero.
  • How to test for dependence: build a matrix from the vectors and check if it is singular (determinant equals zero).
  • Finding the coefficients: solve the homogeneous system to find the specific linear combination that equals zero.
  • Common confusion: linear dependence is equivalent to expressing one vector as a combination of preceding vectors, not just any vectors.
  • Key restriction: the zero vector can never be on a list of independent vectors.

🔍 What linear dependence means

🔍 The formal definition

Vectors v₁, v₂, ..., vₙ are linearly dependent if there exist constants c₁, c₂, ..., cₙ not all zero such that c₁v₁ + c₂v₂ + ⋯ + cₙvₙ = 0.

  • The key phrase is "not all zero"—at least one coefficient must be nonzero.
  • If the only solution is all coefficients equal to zero, the vectors are linearly independent instead.
  • Example: If 3v₁ + 2v₂ − v₃ + v₄ = 0, then {v₁, v₂, v₃, v₄} are linearly dependent.

🚫 Why the zero vector matters

  • The zero vector can never appear in a list of independent vectors.
  • Reason: for any scalar α, we have α·0 = 0, so we can always write a nontrivial combination equaling zero.
  • Don't confuse: the zero vector itself versus the zero result of a linear combination.

🧮 How to test for linear dependence

🧮 The matrix method

The excerpt shows a systematic approach:

  1. Set up the equation: c₁v₁ + c₂v₂ + c₃v₃ = 0
  2. Build a matrix: create matrix M with the vectors as columns: M = (v₁ v₂ v₃)
  3. Check the determinant: if det(M) = 0, the matrix is singular and nontrivial solutions exist
  4. Conclusion: nontrivial solutions exist if and only if the vectors are linearly dependent

Example from the excerpt: For three vectors in R³, the matrix determinant was 0, proving linear dependence.

🔧 Finding the specific coefficients

Once you know vectors are dependent, you can find the actual coefficients:

  • Solve the homogeneous system by row reduction.
  • The solution set describes all possible linear combinations that equal zero.
  • Example: solution set {μ(−2, −1, 1) | μ ∈ R} means any multiple of (−2, −1, 1) works.
  • Choosing μ = 1 gives: −2v₁ − v₂ + v₃ = 0.
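
The excerpt does not list the three vectors, so the sketch below uses hypothetical vectors chosen to satisfy the same relation −2v₁ − v₂ + v₃ = 0; the point is the method (take the null space of the matrix whose columns are the vectors):

```python
import sympy as sp

# Hypothetical vectors with v3 = 2*v1 + v2, so that -2*v1 - v2 + v3 = 0.
v1 = sp.Matrix([0, 0, 1])
v2 = sp.Matrix([1, 2, 1])
v3 = 2 * v1 + v2              # = (1, 2, 3)

M = sp.Matrix.hstack(v1, v2, v3)
print(M.det())         # 0: the vectors are linearly dependent
print(M.nullspace())   # [Matrix([[-2], [-1], [1]])]: the coefficients (-2, -1, 1)
```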

🔗 The equivalence theorem

🔗 Linear dependence and linear combinations

Theorem (Linear Dependence): An ordered set of non-zero vectors (v₁, ..., vₙ) is linearly dependent if and only if one of the vectors vₖ is expressible as a linear combination of the preceding vectors.

This theorem has two directions:

| Direction | What it says | Why it matters |
| --- | --- | --- |
| If vₖ is a combination of earlier vectors → dependent | Rewrite as c₁v₁ + ⋯ + cₖ₋₁vₖ₋₁ − vₖ = 0 | Shows dependence directly |
| If dependent → some vₖ is a combination of earlier vectors | Rearrange the dependence equation to isolate vₖ | Explains what dependence means geometrically |

🎯 The proof strategy

Direction 1 (combination → dependent):

  • Start with vₖ = c₁v₁ + ⋯ + cₖ₋₁vₖ₋₁
  • Rearrange: c₁v₁ + ⋯ + cₖ₋₁vₖ₋₁ − vₖ + 0vₖ₊₁ + ⋯ + 0vₙ = 0
  • This is a vanishing combination with not all coefficients zero (the coefficient of vₖ is −1).

Direction 2 (dependent → combination):

  • Start with c₁v₁ + c₂v₂ + ⋯ + cₙvₙ = 0 where not all cᵢ are zero.
  • Let k be the largest index where cₖ ≠ 0.
  • Note: k > 1, otherwise c₁v₁ = 0 would imply v₁ = 0, contradicting the assumption.
  • Rearrange: vₖ = −(c₁/cₖ)v₁ − (c₂/cₖ)v₂ − ⋯ − (cₖ₋₁/cₖ)vₖ₋₁.

📐 Geometric interpretation

  • In the plane example: three non-parallel vectors in a plane through the origin are dependent because any plane is spanned by just two vectors.
  • The third vector must be expressible using the first two.
  • Example: if P = span{u, v}, then w = d₁u + d₂v for some constants d₁, d₂.

💡 Working examples

💡 Polynomial example

In the vector space P₂(t) of polynomials of degree ≤ 2:

  • v₁ = 1 + t
  • v₂ = 1 + t²
  • v₃ = t + t²
  • v₄ = 2 + t + t²
  • v₅ = 1 + t + t²

The set {v₁, ..., v₅} is linearly dependent because v₄ = v₁ + v₂.

This shows dependence directly by exhibiting one vector as a combination of preceding ones.

💡 What a linear combination is

A linear combination of vectors v₁, ..., vₖ multiplied by scalars c₁, ..., cₖ is: c₁v₁ + ⋯ + cₖvₖ.

  • This is the fundamental building block for understanding both span and dependence.
  • Dependence asks: when does a linear combination equal zero with nonzero coefficients?
  • Span asks: what vectors can be reached as linear combinations?

10.2 Showing Linear Independence

🧭 Overview

🧠 One-sentence thesis

To prove a set of vectors is linearly independent, we must show that the only solution to the equation c₁v₁ + c₂v₂ + ⋯ + cₙvₙ = 0 is when all coefficients are zero.

📌 Key points (3–5)

  • What linear independence requires: every linear combination with non-vanishing coefficients must give something other than the zero vector.
  • How to test independence: check whether the homogeneous system (v₁ v₂ ⋯ vₙ) times the coefficient vector equals zero has only the trivial solution (all coefficients zero).
  • Determinant test: if the matrix M formed by the vectors has non-zero determinant, the vectors are linearly independent; if det(M) = 0, they are dependent.
  • Common confusion: linear dependence vs independence—dependence means some non-trivial combination equals zero; independence means only the trivial combination equals zero.
  • Practical implication: when vectors are dependent, we can remove redundant vectors and still span the same space.

🔍 Testing for linear independence

🔍 The core requirement

To show that the set v₁, v₂, …, vₙ is linearly independent, we must show that the equation c₁v₁ + c₂v₂ + ⋯ + cₙvₙ = 0 has no solutions other than c₁ = c₂ = ⋯ = cₙ = 0.

  • This is the only solution condition: the trivial solution (all zeros) must be the only one.
  • Contrast with dependence: if any non-trivial solution exists (at least one coefficient non-zero), the set is dependent.
  • The excerpt emphasizes "non-vanishing coefficients"—if all coefficients are zero, that doesn't count as evidence of dependence.

🧮 Rewriting as a homogeneous system

The excerpt shows how to convert the independence question into a matrix problem:

  1. Write c₁v₁ + c₂v₂ + ⋯ + cₙvₙ = 0 as a matrix equation: (v₁ v₂ ⋯ vₙ) times the column vector of coefficients equals the zero vector.
  2. Build matrix M whose columns are the vectors v₁, v₂, …, vₙ.
  3. The system has non-trivial solutions if and only if M is singular (determinant equals zero).

Example: For three vectors in R³, form the 3×3 matrix with those vectors as columns, then compute the determinant.

🧪 Worked examples

🧪 Example with non-zero determinant (independent)

The excerpt gives vectors in R³:

  • v₁ = (0, 0, 2), v₂ = (2, 2, 1), v₃ = (1, 4, 3).
  • Form matrix M with these as columns.
  • Compute det(M) = 12 ≠ 0.
  • Conclusion: Since the determinant is non-zero, the only solution to the system is c₁ = c₂ = c₃ = 0, so the vectors are linearly independent.
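
A quick numerical check of this example:

```python
import numpy as np

# Columns are v1 = (0, 0, 2), v2 = (2, 2, 1), v3 = (1, 4, 3).
M = np.array([[0, 2, 1],
              [0, 2, 4],
              [2, 1, 3]])

print(np.linalg.det(M))   # ≈ 12, non-zero, so the vectors are linearly independent
```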

🧪 Example with zero determinant (dependent)

The excerpt also shows bit-valued vectors in Z₂³:

  • Three vectors: (1, 1, 0), (1, 0, 1), (0, 1, 1).
  • Form the matrix and compute the determinant (in Z₂ arithmetic).
  • det = 0 (in Z₂, because −1 − 1 = 1 + 1 = 0).
  • Conclusion: Non-trivial solutions exist, so the set is not linearly independent (i.e., it is dependent).

Don't confuse: The determinant being zero means the matrix is singular, which means non-trivial solutions exist, which means the vectors are dependent. Non-zero determinant means independent.

🧹 Removing redundant vectors from dependent sets

🧹 Why we can drop vectors

The excerpt explains that when vectors v₁, …, vₙ are linearly dependent with c₁v₁ + c₂v₂ + ⋯ + cₙvₙ = 0 and c₁ ≠ 0, then:

span{v₁, …, vₙ} = span{v₂, …, vₙ}

  • Any vector x in the span of all n vectors can be rewritten as a combination of only the remaining n−1 vectors.
  • The excerpt shows the algebra: solve for v₁ in terms of the others, then substitute wherever v₁ appears.
  • Goal: When writing a vector space as the span of a list, we want the list to be as short as possible.

🧹 Iterating the procedure

The excerpt gives an example with polynomials of degree ≤ 2:

  • Five vectors: v₁ = 1 + t, v₂ = 1 + t², v₃ = t + t², v₄ = 2 + t + t², v₅ = 1 + t + t².
  • First, notice v₄ = v₁ + v₂, so v₄ is redundant.
  • Remove v₄: span{v₁, v₂, v₃, v₄, v₅} = span{v₁, v₂, v₃, v₅}.
  • Next, notice v₅ = (1/2)v₁ + (1/2)v₂ + (1/2)v₃, so v₅ is also redundant.
  • The excerpt shows we can keep removing extraneous vectors by expressing them as linear combinations of the remaining ones.

Example: Any expression involving v₄ can be rewritten by substituting v₄ = v₁ + v₂, eliminating v₄ from the span without losing any vectors.

📊 Summary table: dependence vs independence

| Property | Linearly dependent | Linearly independent |
| --- | --- | --- |
| Equation c₁v₁ + ⋯ + cₙvₙ = 0 | Has non-trivial solutions (some cᵢ ≠ 0) | Only trivial solution (all cᵢ = 0) |
| Determinant of matrix M | det(M) = 0 (singular) | det(M) ≠ 0 (non-singular) |
| Geometric meaning | At least one vector is a combination of others | No vector is a combination of the others |
| Span behavior | Can remove at least one vector without changing span | Removing any vector shrinks the span |

10.3 From Dependent to Independent

10.3 From Dependent to Independent

🧭 Overview

🧠 One-sentence thesis

When vectors are linearly dependent, you can remove the redundant vectors one by one until you reach a minimal spanning set (a basis) where all remaining vectors are linearly independent.

📌 Key points (3–5)

  • Core mechanism: if vectors are linearly dependent, one vector can be expressed as a combination of the others, so you can drop it without changing the span.
  • Iterative removal: repeat the removal process until no more vectors can be eliminated—what remains is linearly independent.
  • Minimal spanning set (basis): a set of vectors that spans the space and is linearly independent; no vector can be removed without losing coverage.
  • Common confusion: linear dependence vs independence—dependence means at least one vector is redundant; independence means every vector is essential.
  • Why it matters: finding a basis gives the shortest list of vectors needed to describe the entire vector space.

🔄 Removing redundant vectors

🔄 The key equality

The excerpt proves that if vectors v₁, …, vₙ are linearly dependent with c₁v₁ + c₂v₂ + ⋯ + cₙvₙ = 0 (where c₁ ≠ 0), then:

span{v₁, …, vₙ} = span{v₂, …, vₙ}

  • In other words, v₁ is redundant: every vector in the span can still be reached using only v₂, …, vₙ.
  • The proof rewrites any linear combination involving v₁ by solving for v₁ from the dependence equation: v₁ = −(c₂/c₁)v₂ − ⋯ − (cₙ/c₁)vₙ.
  • Substituting this expression eliminates v₁ from the combination.

🧹 Why remove vectors

  • Goal: write the vector space as the span of the shortest possible list.
  • Method: iterate the removal procedure—each time you find a dependence relation, drop one redundant vector.
  • The excerpt states: "When we write a vector space as the span of a list of vectors, we would like that list to be as short as possible."

🧮 Worked example: polynomial space

🧮 Starting set

The excerpt revisits a previous example:

S = span{1 + t, 1 + t², t + t², 2 + t + t², 1 + t + t²}

(Five polynomial vectors.)

🧮 First removal

  • From an earlier example, the excerpt found v₄ = v₁ + v₂, i.e., 2 + t + t² = (1 + t) + (1 + t²).
  • So v₄ is redundant; substitute v₄ = v₁ + v₂ in any combination to eliminate v₄.
  • Result: S = span{1 + t, 1 + t², t + t², 1 + t + t²}.

🧮 Second removal

  • Now notice that 1 + t + t² = ½(1 + t) + ½(1 + t²) + ½(t + t²).
  • So v₅ is also redundant.
  • Result: S = span{1 + t, 1 + t², t + t²}.

🧮 Checking independence

  • The excerpt states: "you can check that there are no (non-zero) solutions to c₁(1 + t) + c₂(1 + t²) + c₃(t + t²) = 0."
  • This means the remaining three vectors are linearly independent.
  • No more vectors can be removed without losing the span.
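
A sketch of the whole reduction in coordinates: each polynomial is represented by its coefficient vector with respect to (1, t, t²), a representation chosen here for illustration, and the pivot columns of the RREF pick out the vectors to keep:

```python
import sympy as sp

# Coefficient vectors (constant, t, t²) of 1+t, 1+t², t+t², 2+t+t², 1+t+t².
V = sp.Matrix([[1, 1, 0, 2, 1],
               [1, 0, 1, 1, 1],
               [0, 1, 1, 1, 1]])

rref_form, pivot_cols = V.rref()
print(pivot_cols)       # (0, 1, 2): the first three polynomials already span S
print(V[:, :3].det())   # -2, non-zero, so {1+t, 1+t², t+t²} is linearly independent: a basis
```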

🎯 Minimal spanning set (basis)

🎯 Definition

A basis for a vector space S is a set of vectors that:

  1. spans S, and
  2. is linearly independent.
  • The excerpt calls this a "minimal spanning set."
  • "Minimal" means you cannot remove any vector without shrinking the span.

🎯 Why independence matters

  • Linear independence ensures that every vector in the basis is essential.
  • If the set were dependent, you could still remove at least one vector, so it wouldn't be minimal.
  • Example: the final set {1 + t, 1 + t², t + t²} is a basis for S because it spans S and is linearly independent.

🎯 Don't confuse

  • Spanning set (not necessarily minimal): any collection of vectors whose span equals the space; may contain redundant vectors.
  • Basis (minimal spanning set): a spanning set with no redundancies; every vector is needed.

🔍 Summary of the procedure

| Step | Action | Result |
| --- | --- | --- |
| 1. Start with a spanning set | May be linearly dependent | Span is correct but list may be too long |
| 2. Find a dependence relation | Solve for one vector in terms of others | Identify a redundant vector |
| 3. Remove the redundant vector | Substitute and drop it | Span unchanged, list shorter |
| 4. Repeat until independent | Check for more dependence relations | Final set is a basis |
  • The excerpt emphasizes iteration: "This can be achieved by iterating the above procedure."
  • The process stops when the remaining vectors are linearly independent.

Review Problems

10.4 Review Problems

🧭 Overview

🧠 One-sentence thesis

A basis is a minimal linearly independent set of vectors that spans a vector space, and every finite-dimensional vector space has a well-defined dimension equal to the number of vectors in any basis.

📌 Key points (3–5)

  • What a basis is: a set of vectors that is both linearly independent and spans the entire vector space.
  • Dimension: the number of vectors in a basis; for finite-dimensional spaces, this number is the same regardless of which basis you choose.
  • How to find a basis: remove vectors that can be written as linear combinations of preceding ones until you have a linearly independent spanning set.
  • Common confusion: a spanning set is not automatically a basis—it must also be linearly independent (no redundant vectors).
  • Practical tool: Gaussian elimination (RREF) helps determine linear independence and spanning properties.

🧩 Core concept: Basis

🧩 What a basis is

Basis: A set S is a basis for a vector space V if S is linearly independent and V = span S.

  • A basis is a "minimal spanning set"—you cannot remove any vector without losing the ability to span the entire space.
  • The excerpt emphasizes that once you have a spanning set, you can remove vectors until the remaining ones are linearly independent; what remains is a basis.
  • Example: If you have three vectors that span a space S and they are linearly independent (no non-zero solutions to the zero linear combination), then those three vectors form a basis for S.

📏 Dimension

Finite-dimensional: A vector space V is finite-dimensional if it has a basis S with only finitely many elements.

Dimension: The number of vectors in a basis S is the dimension of V.

  • The excerpt notes that if two different bases S and T exist for the same space V, one might worry they have different sizes—but the definition implies dimension is well-defined (the same for any basis).
  • Don't confuse: the number of vectors in a spanning set can vary, but the number in a basis (a minimal spanning set) is always the same for a given space.

🔍 Testing for linear independence and spanning

🔍 Using Gaussian elimination (RREF)

The excerpt describes using the reduced row echelon form (RREF) of a matrix M whose columns are the vectors in question:

| RREF result | Linear independence? | Spanning ℝⁿ? | What it means |
| --- | --- | --- | --- |
| Identity matrix | Yes | Yes (here m = n) | The vectors form a basis for ℝⁿ |
| Has a row of zeros | Depends | No | Rank < n, so the vectors cannot span ℝⁿ; they may or may not be independent |
| Neither | No (m > n forces a dependence) | Yes | No zero rows means rank n, so they span, but with more than n vectors they are dependent |
  • The excerpt asks: "If they are linearly dependent, does RREF(M) tell you which vectors could be removed to yield an independent set?"—yes, the pivot columns correspond to independent vectors.
  • Example: If RREF shows the second column is a combination of the first, you can remove the second vector.

🧮 Step-by-step: removing dependent vectors

The excerpt outlines a procedure (Problem 3):

  1. Form a matrix M with the vectors as columns.
  2. Compute RREF(M) to see dependencies.
  3. Write each vector as a linear combination of preceding ones (if possible).
  4. Remove vectors that are combinations of earlier ones to get a linearly independent set.
  • This process yields a basis by eliminating redundancy.
  • Don't confuse: the order matters here—you check each vector against the ones before it, not all vectors simultaneously.

🧷 Worked examples from the problems

🧷 Bit-valued vectors over ℤ₂ (Problem 1)

The excerpt introduces Bₙ, the space of n × 1 column vectors with entries in {0, 1} and arithmetic mod 2.

  • (a) How many vectors? There are 2ⁿ different vectors in Bₙ (each of n positions can be 0 or 1).
  • (b) Find a basis of B₃: You need a linearly independent spanning set.
  • (c) Express other vectors: Once you have a basis, every other vector in B₃ can be written as a linear combination (with coefficients 0 or 1, mod 2) of the basis vectors.
  • (d) Can two vectors span B₃? The hint suggests thinking about dimension—if B₃ has dimension 3, you need at least 3 linearly independent vectors to span it.
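
A small enumeration supports the hint in (d); this sketch assumes numpy is available, and the helper span_mod2 is hypothetical:

```python
import itertools
import numpy as np

def span_mod2(vectors):
    # All linear combinations (coefficients 0 or 1, arithmetic mod 2) of the given vectors.
    combos = set()
    for coeffs in itertools.product([0, 1], repeat=len(vectors)):
        v = np.zeros(len(vectors[0]), dtype=int)
        for c, w in zip(coeffs, vectors):
            v = (v + c * np.array(w)) % 2
        combos.add(tuple(v))
    return combos

print(len(span_mod2([(1, 0, 0), (0, 1, 0)])))             # 4: two vectors span at most 2^2 vectors
print(len(span_mod2([(1, 0, 0), (0, 1, 0), (0, 0, 1)])))  # 8 = 2^3: these three span all of B_3
```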

🧷 Standard basis vectors in ℝⁿ (Problem 2)

The excerpt defines eᵢ as the vector in ℝⁿ with 1 in the i-th position and 0 elsewhere.

  • (a) Linear independence: The set {e₁, …, eₙ} is linearly independent (no non-trivial combination gives zero).
  • (b) Expressing an arbitrary vector: Any vector v in ℝⁿ can be written as v = Σᵢ (v · eᵢ) eᵢ (where v · eᵢ picks out the i-th coordinate).
  • (c) What is span{e₁, …, eₙ}? It is all of ℝⁿ—so {e₁, …, eₙ} is a basis for ℝⁿ.

Example: In ℝ³, the vectors (1,0,0), (0,1,0), (0,0,1) form the standard basis; any vector (a,b,c) = a(1,0,0) + b(0,1,0) + c(0,0,1).

🧷 Checking a set from ℝ³ (Problem 3)

The excerpt gives four vectors from ℝ³: (1,2,3), (2,4,6), (1,0,1), (1,4,5).

  • (a) Use RREF: Put these as columns of a matrix M and compute RREF(M) to check linear independence.
  • (b) Express as combinations: If a vector is a multiple or combination of earlier ones, write it out.
    • Notice (2,4,6) = 2·(1,2,3), so it is dependent on the first vector.
  • (c) Remove dependent vectors: Drop (2,4,6); since (1,4,5) = 2·(1,2,3) − (1,0,1), it is also redundant and should be dropped.
    • The remaining vectors (1,2,3) and (1,0,1) form a basis for their span, which is a plane in ℝ³ (the rank of M is 2), not all of ℝ³.

Don't confuse: removing a dependent vector does not change the span, but it does make the set linearly independent.
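
The RREF computation for these four vectors can be sketched with sympy (assumed available); the pivot tuple identifies which columns to keep:

```python
from sympy import Matrix

# Columns are (1,2,3), (2,4,6), (1,0,1), (1,4,5).
M = Matrix([[1, 2, 1, 1],
            [2, 4, 0, 4],
            [3, 6, 1, 5]])

R, pivots = M.rref()
print(R)        # Matrix([[1, 2, 0, 2], [0, 0, 1, -1], [0, 0, 0, 0]])
print(pivots)   # (0, 2): the 1st and 3rd columns are pivot columns
# Non-pivot columns are combinations of earlier pivot columns:
#   column 2 = 2 * column 1, and column 4 = 2 * column 1 - column 3.
```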

📊 General principles (Problem 4)

📊 Three cases for RREF

The excerpt asks you to consider three scenarios for a matrix M with columns (v₁, v₂, …, vₘ) ⊂ ℝⁿ:

  1. RREF(M) is the identity matrix:

    • The vectors are linearly independent.
    • If m = n, they also span ℝⁿ (so they form a basis).
    • Example: The standard basis vectors in ℝⁿ.
  2. RREF(M) has a row of zeros:

    • There are fewer pivot rows than n, so the vectors do not span ℝⁿ.
    • They may or may not be linearly independent (it depends on whether RREF(M) has non-pivot columns, i.e., free variables).
    • Example: Two vectors in ℝ³ can be independent but cannot span all of ℝ³.
  3. Neither (a) nor (b):

    • RREF is not the identity and has no zero rows.
    • This means m > n (more vectors than dimensions), so the vectors are linearly dependent.
    • Since there are no zero rows, every row has a pivot, so the rank is n and the vectors do span ℝⁿ.
    • Example: Four vectors in ℝ³—at least one must be redundant.

📊 Which vectors to remove

  • RREF tells you: the pivot columns correspond to linearly independent vectors; non-pivot columns are combinations of earlier pivot columns.
  • To get a basis, keep only the vectors corresponding to pivot columns.
  • Example: If RREF shows pivots in columns 1, 3, and 4, then v₁, v₃, v₄ are independent; you can remove v₂.

Bases in Rⁿ

11.1 Bases in Rⁿ

🧭 Overview

🧠 One-sentence thesis

Bases allow us to uniquely represent abstract vectors as column vectors and linear transformations as matrices, and any two bases for a finite-dimensional vector space must contain the same number of vectors.

📌 Key points (3–5)

  • Uniqueness of representation: Every vector in a vector space can be written in exactly one way as a linear combination of basis vectors.
  • All bases have the same size: Any two bases for the same finite-dimensional vector space contain the same number of vectors (this number is the dimension).
  • Testing for a basis in Rⁿ: A set of n vectors forms a basis for Rⁿ if and only if the matrix formed by those vectors has nonzero determinant.
  • Common confusion: Bases are not unique—infinitely many different bases exist for the same space—but the number of vectors in any basis is always the same.
  • Why it matters: Bases convert abstract vector space problems into concrete matrix computations.

🔑 Uniqueness of basis representation

🔑 One representation per vector

If S is a basis for a vector space V, then every vector w in V can be written uniquely as a linear combination of the vectors in S.

  • "Uniquely" means there is exactly one set of coefficients that works.
  • The excerpt proves this by contradiction: if two different sets of coefficients both worked, you could use their difference to write one basis vector as a combination of others, contradicting linear independence.
  • Example: If w = c₁v₁ + c₂v₂ + ... + cₙvₙ and also w = d₁v₁ + d₂v₂ + ... + dₙvₙ, then cᵢ = dᵢ for every i.

📝 Converting to column vectors

  • By ordering the basis set S, we obtain an ordered basis B = (v₁, ..., vₙ).
  • We can then write any vector w as a column vector of coefficients with respect to that basis.
  • Don't confuse: The column vector representation depends on the chosen basis—most vector spaces are not made from columns of numbers, so dropping the basis subscript makes no sense.

🔢 All bases have the same size

🔢 The replacement lemma

If S = {v₁, ..., vₙ} is a basis for V and T = {w₁, ..., wₘ} is a linearly independent set in V, then m ≤ n.

  • The proof works by replacing vectors in S one at a time with vectors from T, maintaining a basis at each step.
  • Key mechanism: Start with {w₁, v₁, ..., vₙ}, which is linearly dependent. Express w₁ in terms of the vᵢ, then discard one vᵢ to get a new basis S₁.
  • After each replacement, the new set S₁ = {w₁, v₁, ..., vᵢ₋₁, vᵢ₊₁, ..., vₙ} is still a basis (both linearly independent and spanning).
  • If m > n, we would eventually have more w vectors than we can fit, leading to a contradiction of T's linear independence.

🎯 Same dimension for all bases

For a finite-dimensional vector space V, any two bases for V have the same number of vectors.

  • Proof: If S has n vectors and T has m vectors, the replacement lemma gives m ≤ n. Swapping the roles of S and T gives n ≤ m. Therefore m = n.
  • This common number is called the dimension of the vector space.
  • Example: Rⁿ has dimension n, because the standard basis has n vectors.

📐 The standard basis for Rⁿ

📐 What it is

The standard (or canonical) basis for Rⁿ consists of vectors e₁, e₂, ..., eₙ, where eᵢ has a 1 in the iᵗʰ position and 0s everywhere else.

  • Written as an ordered basis: (e₁, e₂, ..., eₙ).
  • Each vector eᵢ points along the iᵗʰ coordinate axis and has unit length.
  • In multivariable calculus for R³, this is often written {î, ĵ, k̂}.
  • The excerpt confirms this set is linearly independent and spans Rⁿ, so dim Rⁿ = n.

🔄 Bases are not unique

  • While representation in a given basis is unique, the choice of basis itself is far from unique.
  • Example: Both {(1, 0), (0, 1)} and {(1, 1), (1, -1)} are bases for R².
  • Even requiring all basis vectors to have unit length still leaves infinitely many bases.
  • Rescaling any vector in a basis produces a new basis, so there are infinitely many possibilities.

✅ Testing whether a set is a basis for Rⁿ

✅ The determinant test

A set S = {v₁, ..., vₘ} of vectors in Rⁿ is a basis for Rⁿ if and only if m equals n and det M ≠ 0, where M is the matrix whose columns are the vectors in S.

  • To check if S is a basis, we must verify:
    1. Linear independence: no nontrivial solution to 0 = x₁v₁ + ... + xₙvₙ.
    2. Spanning: every vector w in Rⁿ can be written as w = x₁v₁ + ... + xₙvₙ.
  • Both conditions translate to: the matrix M must be invertible.
  • Invertibility is equivalent to det M ≠ 0.

🔍 Alternative test using RREF

  • The excerpt notes that S is a basis if and only if RREF(M) = I (the identity matrix).
  • This is another way to check invertibility without computing the determinant.

💡 Example

  • For S = {(1, 0), (0, 1)}, the matrix Mₛ = [[1, 0], [0, 1]]. Since det Mₛ = 1 ≠ 0, S is a basis for R².
  • For T = {(1, 1), (1, -1)}, the matrix Mₜ = [[1, 1], [1, -1]]. Since det Mₜ = -2 ≠ 0, T is a basis for R².
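
These determinant checks are easy to reproduce numerically (a sketch assuming numpy):

```python
import numpy as np

M_S = np.array([[1, 0],
                [0, 1]])
M_T = np.array([[1, 1],
                [1, -1]])

print(np.linalg.det(M_S))   # 1.0  -> S is a basis for R^2
print(np.linalg.det(M_T))   # -2.0 -> T is a basis for R^2
```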

🔗 Matrices of linear transformations

🔗 From transformations to matrices

  • Bases allow us to express linear transformations as matrices, enabling concrete computation.
  • Suppose L: V → W is a linear transformation, with ordered bases E = (e₁, ..., eₙ) for V and F = (f₁, ..., fₘ) for W.
  • For each basis vector eⱼ in V, L(eⱼ) is a vector in W, so it can be written uniquely as a linear combination of the fᵢ.

🧮 Building the matrix

  • Write L(eⱼ) = f₁m₁ⱼ + f₂m₂ⱼ + ... + fₘmₘⱼ.
  • The number mᵢⱼ is the iᵗʰ component of L(eⱼ) in the basis F.
  • The jᵗʰ column of the matrix M is the column vector of coefficients [m₁ⱼ, m₂ⱼ, ..., mₘⱼ].
  • For any vector v = e₁v₁ + ... + eₙvₙ, linearity gives:
    • L(v) = v₁L(e₁) + v₂L(e₂) + ... + vₙL(eₙ)
    • This equals the matrix M times the column vector [v₁, v₂, ..., vₙ].
  • Don't confuse: The matrix representation depends on the choice of both input and output bases.

Matrix of a Linear Transformation (Redux)

11.2 Matrix of a Linear Transformation (Redux)

🧭 Overview

🧠 One-sentence thesis

Bases allow us to represent any linear transformation as a matrix by expressing how the transformation acts on each input basis vector in terms of the output basis vectors.

📌 Key points (3–5)

  • What the matrix represents: The matrix of a linear transformation L with respect to chosen bases E and F has columns that record how L acts on each input basis vector, expanded in the output basis.
  • How to build the matrix: Apply L to each input basis vector e_j, express the result as a linear combination of output basis vectors f_i, and the coefficients m_ij form the j-th column of the matrix.
  • Why bases matter: The same linear transformation has different matrix representations depending on which bases you choose for the input and output spaces.
  • Common confusion: The matrix changes when you change either the input basis or the output basis—it is not an intrinsic property of the transformation alone.
  • Standard basis simplification: When using the standard basis in R^n, the matrix of L is simply the matrix whose i-th column is L(e_i).

🏗️ Building the matrix from basis vectors

🏗️ The construction process

The excerpt describes a systematic way to build a matrix M for a linear transformation L: V → W given ordered bases E = (e_1, ..., e_n) for V and F = (f_1, ..., f_m) for W.

Step-by-step:

  • For each input basis vector e_j, compute L(e_j), which is a vector in W.
  • Express L(e_j) as a linear combination of the output basis vectors: L(e_j) = f_1 m_1j + ... + f_m m_mj.
  • The coefficients m_1j, ..., m_mj form the j-th column of the matrix M.

The number m_ij is the i-th component of L(e_j) in the basis F.

Why this works:

  • Any vector v in V can be written as v = e_1 v_1 + ... + e_n v_n.
  • By linearity, L(v) = v_1 L(e_1) + v_2 L(e_2) + ... + v_n L(e_n).
  • Substituting the expansions of each L(e_j) and collecting terms gives the matrix-vector product.

📐 The matrix equation

The excerpt shows that the action of L on a vector v (with coordinates v_1, ..., v_n in basis E) can be written as:

L applied to the column vector (v_1, ..., v_n) in basis E equals the matrix M times that column vector, with the result expressed in basis F.

In notation:

  • Input: column vector with entries v_j (coordinates in basis E)
  • Output: M times that column vector (coordinates in basis F)
  • The matrix M has entries m_ij where the j-th column records L(e_j) in the F basis.

🔄 Basis dependence

The excerpt emphasizes: "This matrix will change if we change either of the bases."

What this means:

  • The matrix M is not just a property of L alone.
  • It depends on both the choice of input basis E and output basis F.
  • Different bases for the same spaces produce different matrices for the same transformation.

Don't confuse: The linear transformation L is a geometric/algebraic object that exists independently of coordinates, but its matrix representation is always relative to a choice of bases.

🎯 Example: Polynomial transformation

🎯 Non-standard basis example

The excerpt provides Example 123: L: P_1(t) → P_1(t) defined by L(a + bt) = (a + b)t.

Setup:

  • Both input and output space are P_1(t) (polynomials of degree at most 1).
  • Chosen basis B = (1 - t, 1 + t) for both input and output.

Computing the matrix:

  • Apply L to the first basis vector: L(1 - t) = (1 - 1)t = 0 = (1 - t)·0 + (1 + t)·0, so the first column is (0, 0).
  • Apply L to the second basis vector: L(1 + t) = (1 + 1)t = 2t = (1 - t)·(-1) + (1 + t)·1, so the second column is (-1, 1).
  • The matrix is M = [[0, -1], [0, 1]].

Interpretation:

  • The first column records how L(1 - t) is expressed in basis B.
  • The second column records how L(1 + t) is expressed in basis B.
  • To find L of any polynomial (with coordinates (a, b) in basis B), multiply M by the column vector (a, b).
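
A short sketch (assuming numpy; the helper L_in_coords is hypothetical) confirming that the matrix reproduces L on B-coordinates:

```python
import numpy as np

M = np.array([[0, -1],
              [0,  1]])

def L_in_coords(a, b):
    # p = a(1 - t) + b(1 + t), so L(p) = 2b*t = -b*(1 - t) + b*(1 + t),
    # i.e. the B-coordinates of L(p) are (-b, b).
    return np.array([-b, b])

a, b = 3.0, 5.0
print(M @ np.array([a, b]))   # [-5.  5.]
print(L_in_coords(a, b))      # [-5.  5.]  -- multiplying by M matches applying L
```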

🔢 The standard basis case

🔢 Standard basis vectors in R^n

The excerpt defines the standard ordered basis (e_1, ..., e_n) for R^n:

  • e_i has a 1 in the i-th position and 0s everywhere else.
  • Example: e_1 = (1, 0, ..., 0), e_2 = (0, 1, ..., 0), etc.

Key observation: For any matrix M, the product M e_i equals the i-th column of M.

🔢 Finding the matrix in the standard basis

The excerpt states: "the matrix representing L in the standard basis is just the matrix whose i-th column is L(e_i)."

Why this is simpler:

  • You don't need to solve for coefficients—L(e_i) is already a column vector in R^n.
  • Just compute L(e_1), L(e_2), ..., L(e_n) and place them as columns.

Example from the excerpt (Example 124):

  • Given: L(e_1) = (1, 4, 7), L(e_2) = (2, 5, 8), L(e_3) = (3, 6, 9).
  • The matrix is M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]].

🔢 Alternative presentation

The excerpt shows that transformation information is often given as:

L applied to (x, y, z) equals (x + 2y + 3z, 4x + 5y + 6z, 7x + 8y + 9z).

Two ways to extract the matrix:

| Method | Description |
| --- | --- |
| Direct | Rewrite as L(x, y, z) = M times (x, y, z) and read off M |
| Circuitous | Expand (x, y, z) = x e₁ + y e₂ + z e₃, apply linearity, collect terms |

Both give the same matrix M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]].

Don't confuse: The "circuitous route" is not a different method—it's just making the linearity argument explicit. The direct method is faster but relies on recognizing the matrix-vector product form.
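
Both routes can be checked numerically; a minimal sketch assuming numpy:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(M @ np.array([1, 0, 0]))   # [1 4 7]: M e_1 is the first column of M

x, y, z = 2, -1, 3
print(M @ np.array([x, y, z]))   # [ 9 21 33]
print(np.array([x + 2*y + 3*z,
                4*x + 5*y + 6*z,
                7*x + 8*y + 9*z]))  # [ 9 21 33]: same as the formula for L
```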

🔗 Connection to basis testing

🔗 Using determinants to check bases

The excerpt briefly recalls Theorem 11.1.1 from the previous section:

Let S = {v_1, ..., v_m} be a collection of vectors in R^n. Let M be the matrix whose columns are the vectors in S. Then S is a basis for V if and only if m is the dimension of V and det M ≠ 0.

Why this matters for transformations:

  • To build a matrix for L, you need bases for both V and W.
  • You can verify that your chosen sets are actually bases by forming a matrix from the vectors and checking the determinant.

🔗 Example verification

Example 122 checks two bases for R^2:

  • S = {(1, 0), (0, 1)}: matrix M_S = [[1, 0], [0, 1]], det M_S = 1 ≠ 0, so S is a basis.
  • T = {(1, 1), (1, -1)}: matrix M_T = [[1, 1], [1, -1]], det M_T = -2 ≠ 0, so T is a basis.

Remark: The excerpt also notes that S is a basis if and only if the reduced row echelon form of M is the identity matrix I.


Review Problems

11.3 Review Problems

🧭 Overview

🧠 One-sentence thesis

These review problems consolidate understanding of how to identify bases, compute matrix representations of linear transformations, and apply dimension theory to various vector spaces.

📌 Key points (3–5)

  • Matrix representation from basis images: The matrix of a linear transformation L in the standard basis has L(eᵢ) as its i-th column.
  • Basis criteria: A set of n vectors in an n-dimensional space forms a basis if they are linearly independent OR if they span the space (either condition implies the other).
  • Unique representation property: If every vector can be written uniquely as a linear combination of a set S, then S is a basis.
  • Common confusion: Don't confuse "n vectors" with "basis for n-dimensional space"—the vectors must also be linearly independent (or span the space).
  • Applications: Bases exist for many vector spaces beyond ℝⁿ, including spaces of matrices (symmetric, anti-symmetric) and linear transformations.

🔢 Matrix representation of linear transformations

🔢 Standard basis construction

The matrix representing L in the standard basis is the matrix whose i-th column is L(eᵢ).

  • The excerpt shows that if you know where L sends each standard basis vector, you immediately know the matrix.
  • Example: If L sends (1,0,0) to (1,4,7), (0,1,0) to (2,5,8), and (0,0,1) to (3,6,9), then the matrix is the 3×3 matrix with those three vectors as columns.

🔄 Alternative presentation

The excerpt presents two equivalent ways to extract the matrix:

| Method | Description | When to use |
| --- | --- | --- |
| Direct rewrite | Given L(x,y,z) = (x+2y+3z, 4x+5y+6z, 7x+8y+9z), recognize the coefficients form the matrix columns | When L is given as a formula |
| Circuitous route | Expand L(x,y,z) = xL(e₁) + yL(e₂) + zL(e₃), then collect coefficients | When you want to see the linearity explicitly |
  • Both methods yield the same matrix.
  • The key insight: linearity means L(xv + yw) = xL(v) + yL(w), so knowing L on a basis determines L everywhere.

🧩 Basis identification problems

🧩 Unit vector bases (Problem 1)

The problem asks when adding a unit vector x to {(1,0)} or to {(1,0,0), (0,1,0)} gives a basis.

  • In ℝ²: Sₓ = {(1,0), x} is a basis for ℝ² when x is any unit vector except (1,0) and (-1,0).

    • Why: Two vectors form a basis in ℝ² if and only if they are not parallel (linearly independent).
    • Don't confuse: "unit vector" does not guarantee linear independence; direction matters.
  • In ℝ³: Sₓ = {(1,0,0), (0,1,0), x} is a basis for ℝ³ when x is any unit vector not in the xy-plane.

    • Why: Three vectors form a basis in ℝ³ if the third is not in the span of the first two.
  • Generalization to ℝⁿ: The pattern extends—adding a vector to n-1 standard basis vectors gives a basis if and only if the new vector is not in the span of those n-1 vectors.
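
The determinant test makes the ℝ² case concrete (a sketch assuming numpy; is_basis_with_e1 is a hypothetical helper):

```python
import numpy as np

def is_basis_with_e1(x):
    # {(1,0), x} is a basis for R^2 exactly when the column matrix has nonzero determinant.
    return not np.isclose(np.linalg.det(np.column_stack([[1, 0], x])), 0)

theta = 0.3   # any unit vector x = (cos t, sin t)
print(is_basis_with_e1([np.cos(theta), np.sin(theta)]))     # True
print(is_basis_with_e1([1, 0]), is_basis_with_e1([-1, 0]))  # False False
```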

🔢 Bit vector bases (Problem 2)

The problem asks for the number of bases in Bₙ, the vector space of n-dimensional column vectors with entries 0 or 1.

  • Building a basis: Choose vectors one at a time such that each new vector is not in the span of the previous ones.
  • Hint interpretation:
    • One vector spans itself (2 vectors total in its span, including zero).
    • Two vectors span at most 4 vectors.
    • k vectors span at most 2^k vectors.
  • The problem asks for a conjecture about the number of bases for Bₙ.

🎯 Basis characterization theorems

🎯 n independent vectors in n-dimensional space (Problem 3a)

In an n-dimensional vector space V, any n linearly independent vectors form a basis.

  • Why this works:
    • Start with n linearly independent vectors {w₁, ..., wₙ} and a known basis {v₁, ..., vₙ}.
    • Apply Lemma 11.0.2 (referenced but not shown in excerpt).
    • Since the space is n-dimensional, n independent vectors must span the entire space.

🎯 n spanning vectors in n-dimensional space (Problem 3b)

In an n-dimensional vector space V, any set of n vectors that span V forms a basis.

  • Proof strategy:

    • Suppose you have n vectors that span V but do not form a basis.
    • Then they must be linearly dependent (not independent).
    • Remove dependent vectors to get a smaller spanning set.
    • Use Corollary 11.0.3 to derive a contradiction (a basis must have exactly n vectors).
  • Don't confuse: "spanning" alone doesn't guarantee basis; you also need the right number of vectors (n) in an n-dimensional space.

🔑 Unique representation criterion (Problem 4)

If every vector w in V can be expressed uniquely as a linear combination of vectors in S = {v₁, ..., vₙ}, then S is a basis of V.

  • What to show:

    1. S is linearly independent.
    2. S spans V.
  • Why uniqueness matters:

    • Spanning: Every w can be written as a combination (given).
    • Independence: If c₁v₁ + ... + cₙvₙ = 0 had a non-trivial solution, then the zero vector would have multiple representations (contradiction).
  • This is the converse to Theorem 11.0.1 (referenced but not shown).

🏗️ Vector spaces of transformations and matrices

🏗️ Linear transformations as a vector space (Problem 5)

The problem asks to show that all linear transformations ℝ³ → ℝ form a vector space and to find a basis.

  • Hint interpretation:

    • Represent ℝ³ as column vectors.
    • A linear transformation T: ℝ³ → ℝ is just a row vector (1×3 matrix).
    • Example: T(x,y,z) = ax + by + cz corresponds to the row vector [a b c].
  • Basis: Three transformations corresponding to [1 0 0], [0 1 0], [0 0 1].

    • Any T can be written uniquely as a linear combination of these three.
  • Generalization: The same argument works for ℝⁿ → ℝ (n basis transformations), ℝⁿ → ℝᵐ (n×m basis transformations).

🔲 Symmetric and anti-symmetric matrices (Problem 6)

Sₙ := {M : ℝⁿ → ℝⁿ | M = Mᵀ} (symmetric matrices)
Aₙ := {M : ℝⁿ → ℝⁿ | M = -Mᵀ} (anti-symmetric matrices)

  • Finding bases:

    • Use the matrices Fᵢⱼ that have a 1 in the i-th row and j-th column, 0 elsewhere.
    • Note: {Fᵢⱼ | 1 ≤ i ≤ r, 1 ≤ j ≤ k} is a basis for all r×k matrices.
  • For S₃ (3×3 symmetric):

    • A symmetric matrix has M = Mᵀ, so Mᵢⱼ = Mⱼᵢ.
    • Only the upper triangle (including diagonal) is independent: 6 basis matrices.
  • For A₃ (3×3 anti-symmetric):

    • An anti-symmetric matrix has M = -Mᵀ, so Mᵢⱼ = -Mⱼᵢ and diagonal entries are 0.
    • Only the strictly upper triangle is independent: 3 basis matrices.
  • General pattern: Describe bases for Sₙ and Aₙ using combinations of Fᵢⱼ matrices that respect the symmetry/anti-symmetry constraint.
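
One concrete choice of such bases, sketched with numpy (0-based indices; the helper F is hypothetical):

```python
import numpy as np

n = 3

def F(i, j):
    # Matrix with a single 1 in row i, column j (0-indexed), zeros elsewhere.
    M = np.zeros((n, n))
    M[i, j] = 1
    return M

sym_basis = ([F(i, i) for i in range(n)]
             + [F(i, j) + F(j, i) for i in range(n) for j in range(i + 1, n)])
anti_basis = [F(i, j) - F(j, i) for i in range(n) for j in range(i + 1, n)]

print(len(sym_basis), len(anti_basis))                   # 6 3
print(all(np.array_equal(B, B.T) for B in sym_basis))    # True: all symmetric
print(all(np.array_equal(B, -B.T) for B in anti_basis))  # True: all anti-symmetric
```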

🧮 Matrix representation with non-standard bases (Problem 7)

🧮 When output basis is images of input basis (7a)

Given: L: V → W with input basis B = (v₁, ..., vₙ) and output basis B' = (L(v₁), ..., L(vₙ)).

  • What the matrix looks like: Each L(vᵢ) is expressed in the output basis B'.
  • Since B' consists of the images themselves, L(vᵢ) = 1·L(vᵢ) + 0·L(v₂) + ... in the B' basis.
  • The matrix is the identity matrix.

🧮 Diagonal transformations (7b)

Given: L: V → V with B = B' = (v₁, ..., vₙ) and L(vᵢ) = λᵢvᵢ for all i.

  • What the matrix looks like: L(vᵢ) = λᵢvᵢ means the i-th column has λᵢ in the i-th position and 0 elsewhere.
  • The matrix is diagonal with entries λ₁, λ₂, ..., λₙ on the diagonal.
  • Example: If L(v₁) = 3v₁, L(v₂) = -2v₂, L(v₃) = 5v₃, the matrix is diag(3, -2, 5).

Invariant Directions

12.1 Invariant Directions

🧭 Overview

🧠 One-sentence thesis

A linear transformation can have special directions (eigenvectors) that it leaves unchanged except for scaling, and finding these directions allows us to represent the transformation with a simple diagonal matrix.

📌 Key points (3–5)

  • Invariant direction: a direction that a linear transformation stretches or shrinks but does not rotate.
  • Eigenvector and eigenvalue: a non-zero vector v satisfying Lv = λv is an eigenvector with eigenvalue λ; any scalar multiple cv is equally valid.
  • How to find them: solve the characteristic polynomial det(λI − M) = 0 for eigenvalues, then solve (M − λI)v = 0 for each eigenvalue to get eigenvectors.
  • Common confusion: multiplicity vs number of independent eigenvectors—an eigenvalue can appear multiple times (multiplicity) but may have fewer independent eigenvectors.
  • Why it matters: if a transformation has enough independent eigenvectors to form a basis, its matrix becomes diagonal in that basis, making calculations much simpler (diagonalization).

🎯 The core idea: invariant directions

🎯 What an invariant direction means

Invariant direction: a direction in which a linear transformation L acts only by stretching or shrinking, not rotating.

  • If Lv = λv for some non-zero vector v and scalar λ, then L leaves the direction of v unchanged.
  • The transformation may stretch (λ > 1), shrink (0 < λ < 1), reverse (λ < 0), or even collapse (λ = 0) the vector, but the line through v stays the same.
  • Example: L maps (3, 5) to itself, so the direction of (3, 5) is invariant with no stretching (λ = 1).

🔢 Eigenvector and eigenvalue definitions

Eigenvector: a non-zero vector v such that Lv = λv for some scalar λ.
Eigenvalue: the scalar λ corresponding to an eigenvector.

  • Any scalar multiple cv (c ≠ 0) is also an eigenvector with the same eigenvalue, because L(cv) = cL(v) = λcv.
  • The direction is what matters; different eigenvectors pointing the same way are equivalent.
  • Example: if L(1, 2) = 2(1, 2), then (1, 2) is an eigenvector with eigenvalue 2, and so is (5, 10) or any other multiple.

🧮 The eigenvalue–eigenvector equation

The fundamental equation is Lv = λv, which can be rewritten as:

  • (M − λI)v = 0 in matrix form, where M is the matrix of L and I is the identity.
  • This is a homogeneous system that has non-zero solutions only when M − λI is singular (non-invertible).
  • Don't confuse: this is not asking "what does L do to v?" but "for which v does L only scale v?"

🔍 Finding eigenvectors and eigenvalues

🔍 Step 1: The characteristic polynomial

Characteristic polynomial: P_M(λ) = det(λI − M), whose roots are the eigenvalues.

  • For an n×n matrix, this is a degree-n polynomial in λ.
  • Setting det(λI − M) = 0 ensures that (M − λI) is singular, so non-zero solutions exist.
  • Example: for M = (−4, 3; −10, 7), the characteristic polynomial is (λ − 1)(λ − 2), giving eigenvalues λ = 1 and λ = 2.

🔍 Step 2: Solve for each eigenvalue

Once you have eigenvalues λ₁, λ₂, ..., solve the system (M − λᵢI)v = 0 for each λᵢ:

  • Plug λᵢ into M − λᵢI and row-reduce the augmented matrix.
  • The solution space gives all eigenvectors for that eigenvalue.
  • Example: for λ = 2, solving (−6, 3; −10, 5)(x, y) = (0, 0) gives y = 2x, so any (x, 2x) works; choose (1, 2) for simplicity.

🔍 Step 3: Interpret the solution space

  • If the solution is one-dimensional (one free parameter), you get a line of eigenvectors.
  • If two-dimensional (two free parameters), you get a plane of eigenvectors.
  • Don't confuse: the number of free parameters is the dimension of the eigenspace, which may be less than the multiplicity of the eigenvalue.
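
For the matrix used above, the eigenvalues and eigenvectors can be checked numerically (a sketch assuming numpy):

```python
import numpy as np

M = np.array([[-4, 3],
              [-10, 7]])

evals, evecs = np.linalg.eig(M)
print(evals)                            # [1. 2.] (order may vary)
for lam, v in zip(evals, evecs.T):      # columns of evecs are unit eigenvectors
    print(np.allclose(M @ v, lam * v))  # True, True: each satisfies Mv = lambda v
```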

🎨 Why invariant directions matter: diagonalization

🎨 Writing vectors in the eigenvector basis

If v₁ and v₂ are eigenvectors with eigenvalues λ₁ and λ₂, and any vector w can be written as w = rv₁ + sv₂:

  • Then L(w) = rL(v₁) + sL(v₂) = rλ₁v₁ + sλ₂v₂.
  • In the eigenvector basis, L just multiplies the coordinates by the eigenvalues.
  • Example: if w = rv₁ + sv₂ with λ₁ = 1 and λ₂ = 2, then L multiplies the v₁-component by 1 and the v₂-component by 2.

🎨 Diagonal matrix representation

Diagonalization: representing L by a diagonal matrix whose entries are the eigenvalues.

  • In the eigenvector basis, the matrix of L is diagonal: (λ₁, 0; 0, λ₂) for a 2×2 case.
  • This is much simpler than the original matrix, where each output mixes both inputs.
  • Example: instead of (ax + by, cx + dy), you get (λ₁s, λ₂t) where s and t are coordinates in the eigenvector basis.

🎨 When diagonalization is possible

  • You need enough independent eigenvectors to form a basis.
  • If an n×n matrix has n linearly independent eigenvectors, it can be diagonalized.
  • Don't confuse: having n eigenvalues (counting multiplicity) does not guarantee n independent eigenvectors; some matrices cannot be diagonalized.
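
Continuing the same example, placing the eigenvectors (3, 5) and (1, 2) as columns of P diagonalizes M (a sketch assuming numpy):

```python
import numpy as np

M = np.array([[-4, 3],
              [-10, 7]])
P = np.array([[3, 1],     # columns are the eigenvectors (3,5) and (1,2)
              [5, 2]])

D = np.linalg.inv(P) @ M @ P
print(np.round(D, 10))    # [[1. 0.] [0. 2.]]: diagonal matrix of eigenvalues
```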

🧩 Eigenspaces and linear combinations

🧩 Eigenspace definition

Eigenspace for λ: the set of all vectors v (including the zero vector) such that Lv = λv.

  • This is a subspace: it contains the zero vector and is closed under addition and scalar multiplication.
  • Any linear combination of eigenvectors with the same eigenvalue is also an eigenvector with that eigenvalue.
  • Example: if v₁ and v₂ both have eigenvalue 1, then any c₁v₁ + c₂v₂ also has eigenvalue 1.

🧩 Why sums of eigenvectors work

If Lv₁ = λv₁ and Lv₂ = λv₂ (same λ), then:

  • L(c₁v₁ + c₂v₂) = c₁Lv₁ + c₂Lv₂ (by linearity)
  • = c₁λv₁ + c₂λv₂ = λ(c₁v₁ + c₂v₂) (factoring out λ)
  • So the sum is also an eigenvector with eigenvalue λ.
  • Don't confuse: this only works if the eigenvalues are the same; you cannot add eigenvectors with different eigenvalues and expect the sum to be an eigenvector.

🧩 Dimension of eigenspaces

  • The dimension of an eigenspace is the number of free parameters when solving (M − λI)v = 0.
  • This can be less than the multiplicity of λ in the characteristic polynomial.
  • Example: eigenvalue λ = 1 with multiplicity 2 might have a two-dimensional eigenspace (a plane) or only a one-dimensional eigenspace (a line).

Orthonormal Bases and Complements

12.2 The Eigenvalue–Eigenvector Equation

🧭 Overview

🧠 One-sentence thesis

Orthonormal bases allow vectors to be decomposed easily using dot products, and any vector space can be split into a subspace and its orthogonal complement, providing a natural geometric structure.

📌 Key points (3–5)

  • Orthonormal bases: vectors that are mutually perpendicular and each has unit length; they generalize the standard basis.
  • Easy coefficient formula: if {u₁, …, uₙ} is orthonormal, any vector v equals the sum of (v · uᵢ)uᵢ—no solving systems needed.
  • Gram–Schmidt procedure: an algorithm to convert any linearly independent set into an orthogonal (or orthonormal) basis by subtracting projections.
  • Orthogonal complements: for any subspace U, the orthogonal complement U⊥ consists of all vectors perpendicular to U, and the whole space is the direct sum U ⊕ U⊥.
  • Common confusion: orthogonal matrices have orthonormal columns (unit length), not merely orthogonal columns; change-of-basis matrices between orthonormal bases are orthogonal matrices.

🧩 What makes a basis orthonormal

🧩 Orthogonal vs orthonormal

Orthogonal basis {v₁, …, vₙ}: vᵢ · vⱼ = 0 if i ≠ j (all vectors perpendicular to each other).

Orthonormal basis {u₁, …, uₙ}: uᵢ · uⱼ = δᵢⱼ (orthogonal and each vector has length 1).

  • The Kronecker delta δᵢⱼ is 1 when i = j and 0 otherwise.
  • The standard basis {e₁, …, eₙ} in Rⁿ is orthonormal: each eᵢ has length 1 and eᵢ · eⱼ = 0 for i ≠ j.
  • Example: In R³, {(1,0,0), (0,1,0), (0,0,1)} is orthonormal; {(2,0,0), (0,3,0), (0,0,1)} is orthogonal but not orthonormal.

🔍 Why orthonormality simplifies coordinates

  • For an orthonormal basis {u₁, …, uₙ}, any vector v can be written v = c₁u₁ + ⋯ + cₙuₙ.
  • To find cᵢ, just compute v · uᵢ: because uᵢ · uⱼ = δᵢⱼ, all cross-terms vanish and v · uᵢ = cᵢ.
  • Formula: v = Σᵢ (v · uᵢ)uᵢ.
  • No need to solve a system of equations; the coefficients are immediate dot products.
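
A quick numerical illustration (assuming numpy; the orthonormal basis below is chosen for this sketch):

```python
import numpy as np

u1 = np.array([1, 1]) / np.sqrt(2)    # an orthonormal basis of R^2
u2 = np.array([1, -1]) / np.sqrt(2)

v = np.array([3.0, 5.0])
c1, c2 = v @ u1, v @ u2               # the coefficients are just dot products
print(np.allclose(c1 * u1 + c2 * u2, v))   # True: v = (v.u1) u1 + (v.u2) u2
```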

📐 Inner products and orthonormal bases

  • In general vector spaces (not just Rⁿ), an inner product ⟨·, ·⟩ replaces the dot product.
  • If {u₁, …, uₙ} is orthonormal with respect to ⟨·, ·⟩, then ⟨uᵢ, uⱼ⟩ = δᵢⱼ.
  • The same coordinate formula holds: v = Σᵢ ⟨v, uᵢ⟩uᵢ.
  • Once you express vectors in an orthonormal basis, the inner product ⟨v, v′⟩ equals the dot product of their coordinate vectors.
  • Example: For polynomials V = span{1, x} with inner product ⟨p, q⟩ = ∫₀¹ p(x)q(x)dx, the set {1, 2√3(x − 1/2)} is orthonormal; any polynomial a + bx has coordinates (a + b/2, b/(2√3)) in this basis, and the inner product of two polynomials equals the dot product of their coordinate vectors.

🔄 Changing between orthonormal bases

🔄 Orthogonal matrices

Orthogonal matrix P: a square matrix such that P⁻¹ = Pᵀ.

  • If {u₁, …, uₙ} and {w₁, …, wₙ} are two orthonormal bases, the change-of-basis matrix P has entries pⱼᵢ = uⱼ · wᵢ.
  • The excerpt shows PPᵀ = Iₙ, so P is orthogonal.
  • Key property: the columns of an orthogonal matrix form an orthonormal set.
  • Don't confuse: "orthogonal matrix" means the columns are orthonormal (unit length), not just orthogonal.

🔄 Symmetry from orthogonal change of basis

  • If D is a diagonal matrix and M = PDPᵀ for an orthogonal matrix P, then M is symmetric: Mᵀ = (PDPᵀ)ᵀ = PDPᵀ = M.
  • This is because Dᵀ = D (diagonal matrices are symmetric) and (Pᵀ)ᵀ = P.
  • Orthogonal changes of basis preserve symmetry and are central to diagonalization of symmetric matrices.

🛠️ Gram–Schmidt orthogonalization

🛠️ The core idea: subtracting projections

  • Given a linearly independent set {v₁, v₂, …}, Gram–Schmidt builds an orthogonal set {v₁⊥, v₂⊥, …} spanning the same space.
  • Start with v₁⊥ := v₁.
  • For each subsequent vector vᵢ, subtract its projections onto all previous orthogonal vectors:
    • v₂⊥ = v₂ − [(v₁⊥ · v₂)/(v₁⊥ · v₁⊥)]v₁⊥
    • v₃⊥ = v₃ − [(v₁⊥ · v₃)/(v₁⊥ · v₁⊥)]v₁⊥ − [(v₂⊥ · v₃)/(v₂⊥ · v₂⊥)]v₂⊥
    • and so on.
  • Each vᵢ⊥ is orthogonal to all previous vⱼ⊥ (j < i).
  • To get an orthonormal basis, divide each vᵢ⊥ by its length.

🛠️ Orthogonal decomposition

  • For a single vector v and a unit vector u, write v = v‖ + v⊥ where:
    • v‖ = [(u · v)/(u · u)]u (projection of v onto u)
    • v⊥ = v − v‖ (component perpendicular to u)
  • Check: u · v⊥ = u · v − (u · v) = 0.
  • Gram–Schmidt generalizes this: at each step, vᵢ⊥ is vᵢ minus all its projections onto the already-orthogonalized vectors.

🛠️ Practical tips

  • The order of the input vectors matters: changing the order produces a different orthogonal basis.
  • To simplify arithmetic, start with the vector that has the most zeros (it will be used most often).
  • Example: Starting with {(1,1,0), (1,1,1), (3,1,1)} in R³, choose v₁ = (1,1,0) first; then v₂⊥ = (0,0,1) and v₃⊥ = (1,−1,0), yielding the orthogonal basis {(1,1,0), (0,0,1), (1,−1,0)}.
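
The procedure is short to code; this sketch (assuming numpy; gram_schmidt is a hypothetical helper) reproduces the example above:

```python
import numpy as np

def gram_schmidt(vectors):
    # Build an orthogonal basis by subtracting, from each vector,
    # its projections onto the previously orthogonalized vectors.
    ortho = []
    for v in vectors:
        w = v.astype(float).copy()
        for u in ortho:
            w -= (u @ v) / (u @ u) * u
        ortho.append(w)
    return ortho

basis = gram_schmidt([np.array([1, 1, 0]),
                      np.array([1, 1, 1]),
                      np.array([3, 1, 1])])
print(basis)   # [(1,1,0), (0,0,1), (1,-1,0)] as float arrays
```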

🧮 QR decomposition

🧮 Factoring M = QR

QR decomposition: any matrix M can be written M = QR, where Q is an orthogonal matrix and R is upper triangular.

  • Apply Gram–Schmidt to the columns of M to get orthonormal vectors (the columns of Q).
  • The matrix R records the Gram–Schmidt steps: entry (i, j) of R equals the dot product of the i-th column of Q with the j-th column of M.
  • This decomposition is useful for solving linear systems, eigenvalue problems, and least-squares approximations.

🧮 How to construct QR

  • Start with M and apply Gram–Schmidt to its columns, keeping track of the coefficients.
  • At each step, replace a column of M by the orthogonalized vector and adjust a triangular matrix to undo that replacement.
  • Finally, normalize the columns to get Q (orthonormal) and scale the rows of the triangular matrix accordingly to get R.
  • Example: For a 3×3 matrix M, the first column of Q is the first column of M normalized; the second column of Q is the Gram–Schmidt result from the first two columns of M, normalized; and so on. The matrix R is upper triangular with entries that encode the original columns in terms of the orthonormal basis.
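
In practice the factorization is usually obtained from a library routine; a sketch assuming numpy, using the same three columns as the Gram–Schmidt example:

```python
import numpy as np

M = np.array([[1., 1., 3.],
              [1., 1., 1.],
              [0., 1., 1.]])     # columns (1,1,0), (1,1,1), (3,1,1)

Q, R = np.linalg.qr(M)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: columns of Q are orthonormal
print(np.allclose(np.triu(R), R))        # True: R is upper triangular
print(np.allclose(Q @ R, M))             # True: M = QR
```

(The columns of Q agree with the normalized Gram–Schmidt vectors up to sign.)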

⊥ Orthogonal complements and direct sums

⊥ What is an orthogonal complement

Orthogonal complement U⊥: the set of all vectors in W that are orthogonal to every vector in the subspace U.

  • Formally, U⊥ = {w ∈ W | w · u = 0 for all u ∈ U}.
  • U⊥ is itself a subspace: if v, w ∈ U⊥, then αv + βw ∈ U⊥ (closure under linear combinations).
  • Example: In R³, if U is the xy-plane, then U⊥ is the z-axis (the line perpendicular to the plane).

⊥ Direct sum decomposition

Direct sum U ⊕ V: the sum of subspaces U and V when U ∩ V = {0}.

  • For any subspace U in a finite-dimensional space W, the whole space decomposes as W = U ⊕ U⊥.
  • This means every vector w ∈ W can be written uniquely as w = u + u⊥ with u ∈ U and u⊥ ∈ U⊥.
  • The uniqueness comes from U ∩ U⊥ = {0}: if u ∈ U and u ∈ U⊥, then u · u = 0, so u = 0.
  • Example: In R⁴, let L = span{(1,1,1,1)} (a line). Then L⊥ is the 3-dimensional subspace {(x,y,z,w) | x + y + z + w = 0}, and R⁴ = L ⊕ L⊥.

⊥ Finding U⊥ with Gram–Schmidt

  • To construct an orthonormal basis for U⊥, start with any basis for U⊥ (e.g., from solving the equation u · x = 0) and apply Gram–Schmidt.
  • The excerpt shows an example: for L = span{(1,1,1,1)} in R⁴, a basis for L⊥ is {(1,−1,0,0), (1,0,−1,0), (1,0,0,−1)}; Gram–Schmidt converts this to an orthogonal basis, then normalize to get an orthonormal basis.
  • The operation ⊥ is an involution: (U⊥)⊥ = U (taking the orthogonal complement twice returns the original subspace).
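
For the ℝ⁴ example, an orthonormal basis of L⊥ can be computed directly (a sketch assuming scipy is available):

```python
import numpy as np
from scipy.linalg import null_space

# L = span{(1,1,1,1)}; L-perp is the null space of the single row [1 1 1 1].
A = np.array([[1., 1., 1., 1.]])
B = null_space(A)                 # columns form an orthonormal basis of L-perp
print(B.shape)                    # (4, 3): L-perp is 3-dimensional
print(np.allclose(A @ B, 0))      # True: every column is orthogonal to (1,1,1,1)
```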

⊥ Don't confuse: sum vs direct sum

  • Sum U + V = span(U ∪ V) = {u + v | u ∈ U, v ∈ V}: always a subspace, but vectors may have multiple representations if U ∩ V ≠ {0}.
  • Direct sum U ⊕ V: only when U ∩ V = {0}; every vector has a unique decomposition.
  • Example: If U and V overlap (share a nonzero vector), their sum is not a direct sum, and dimension(U + V) < dimension(U) + dimension(V).

🔗 Outer products and projection matrices

🔗 Outer product of basis vectors

  • For column vectors v and w, the outer product vwᵀ is a square matrix.
  • For the standard basis, Πᵢ = eᵢeᵢᵀ is the diagonal matrix with 1 in the i-th position and 0 elsewhere.
  • Properties: ΠᵢΠⱼ = Πᵢ if i = j, and 0 if i ≠ j.
  • Any diagonal matrix D with diagonal entries λ₁, …, λₙ can be written D = λ₁Π₁ + ⋯ + λₙΠₙ.
  • For an orthonormal basis {u₁, …, uₙ}, the sum Σᵢ uᵢuᵢᵀ equals the identity matrix Iₙ: this is because (Σᵢ uᵢuᵢᵀ)v = Σᵢ uᵢ(uᵢᵀv) = Σᵢ (v · uᵢ)uᵢ = v for any vector v.

🔗 Projection interpretation

  • The matrix uuᵀ (for a unit vector u) is the projection onto the line spanned by u.
  • Applying uuᵀ to any vector v gives (u · v)u, the component of v along u.
  • The Gram–Schmidt procedure repeatedly subtracts such projections to build orthogonal vectors.
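
A small check of these projector identities (assuming numpy; the orthonormal basis is chosen for this sketch):

```python
import numpy as np

u1 = np.array([[1], [1]]) / np.sqrt(2)     # orthonormal basis of R^2, as column vectors
u2 = np.array([[1], [-1]]) / np.sqrt(2)

P1, P2 = u1 @ u1.T, u2 @ u2.T              # outer products = projection matrices
print(np.allclose(P1 @ P1, P1))                 # True: projecting twice changes nothing
print(np.allclose(P1 @ P2, np.zeros((2, 2))))   # True: orthogonal directions annihilate
print(np.allclose(P1 + P2, np.eye(2)))          # True: the sum of u_i u_i^T is the identity
```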

Eigenspaces

12.3 Eigenspaces

🧭 Overview

🧠 One-sentence thesis

Orthonormal bases allow any vector to be expressed as a simple sum of projections onto basis vectors, and they enable inner products in general vector spaces to be computed as ordinary dot products of coordinate vectors.

📌 Key points (3–5)

  • What orthonormal bases are: bases where all vectors are perpendicular (orthogonal) and each has unit length.
  • Key advantage: coefficients in a linear combination are found simply by taking dot products with basis vectors.
  • Common confusion: orthogonal vs orthonormal—orthogonal means perpendicular; orthonormal adds the requirement that each vector has length 1.
  • Why it matters: orthonormal bases let you compute inner products in abstract vector spaces using familiar dot products of coordinate vectors.
  • Connection to standard basis: the standard basis in R^n is orthonormal and serves as the model for all orthonormal bases.

🎯 Properties of the standard basis

🎯 Unit length and orthogonality

The standard basis vectors e₁, e₂, ..., eₙ in R^n have two key properties:

  • Unit length: Each vector has length 1, i.e., the norm of eᵢ equals the square root of eᵢ · eᵢ, which is 1.
  • Orthogonality: Any two different basis vectors are perpendicular, meaning eᵢ · eⱼ = 0 when i ≠ j.

Kronecker delta: δᵢⱼ = 1 if i = j, and 0 if i ≠ j.

These properties are summarized by eᵢᵀ eⱼ = δᵢⱼ. Notice that the Kronecker delta gives the entries of the identity matrix.

🔲 Outer products and projection matrices

  • The inner product vᵀw is ordinary matrix multiplication of a row vector times a column vector, giving a scalar.
  • The outer product vwᵀ is a column vector times a row vector, giving a square matrix.

For the standard basis, define Πᵢ = eᵢeᵢᵀ. This is a diagonal matrix with a 1 in the i-th diagonal position and zeros everywhere else.

Key property: Πᵢ Πⱼ = Πᵢ if i = j, and 0 if i ≠ j.

Any diagonal matrix D with diagonal entries λ₁, ..., λₙ can be written as D = λ₁Π₁ + ⋯ + λₙΠₙ.
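
A short sketch of the Πᵢ matrices and the decomposition of a diagonal matrix (assuming numpy):

```python
import numpy as np

e = np.eye(3)                                          # columns are e_1, e_2, e_3
Pi = [np.outer(e[:, i], e[:, i]) for i in range(3)]    # Pi_i = e_i e_i^T

lams = [2.0, -1.0, 5.0]
D = sum(l * P for l, P in zip(lams, Pi))
print(D)                                               # diag(2, -1, 5)
print(np.allclose(Pi[0] @ Pi[0], Pi[0]),               # True: Pi_i Pi_i = Pi_i
      np.allclose(Pi[0] @ Pi[1], 0))                   # True: Pi_i Pi_j = 0 for i != j
```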

🧩 Orthogonal and orthonormal bases

🧩 Definitions and distinctions

The excerpt distinguishes two types of bases:

| Type | Definition | Requirement |
| --- | --- | --- |
| Orthogonal basis | {v₁, ..., vₙ} | vᵢ · vⱼ = 0 if i ≠ j (all vectors perpendicular) |
| Orthonormal basis | {u₁, ..., uₙ} | uᵢ · uⱼ = δᵢⱼ (orthogonal + each has unit length) |

Don't confuse: Orthogonal only requires perpendicularity; orthonormal additionally requires each vector to have length 1.

🎁 Finding coefficients easily

Suppose T = {u₁, ..., uₙ} is an orthonormal basis for R^n. Any vector v can be written uniquely as v = c₁u₁ + ⋯ + cₙuₙ.

The key advantage: Because T is orthonormal, finding the coefficients is trivial—just take dot products:

  • Take the dot product of v with uᵢ: v · uᵢ = c₁(u₁ · uᵢ) + ⋯ + cᵢ(uᵢ · uᵢ) + ⋯ + cₙ(uₙ · uᵢ).
  • All terms vanish except cᵢ · 1, so v · uᵢ = cᵢ.

Theorem 14.2.1: For an orthonormal basis {u₁, ..., uₙ}, any vector v can be expressed as v = Σᵢ (v · uᵢ) uᵢ.

Example: If you know v and an orthonormal basis, you don't need to solve a system of equations—just compute n dot products.

🔗 Orthonormal bases and inner products

🔗 From abstract inner products to dot products

When working with general vector spaces V (not just R^n), the ordinary dot product may not make sense. Instead, you choose an inner product ⟨·, ·⟩ suited to your problem.

If V has an orthonormal basis O = (u₁, ..., uₙ) with respect to this inner product—meaning ⟨uᵢ, uⱼ⟩ = δᵢⱼ—then you can relate the abstract inner product to an ordinary dot product.

📐 Encoding vectors as coordinate columns

Any vector v in V can be written as v = ⟨v, u₁⟩u₁ + ⋯ + ⟨v, uₙ⟩uₙ.

This means v is encoded by the column vector of its coordinates in the orthonormal basis:

  • v corresponds to the column [⟨v, u₁⟩, ..., ⟨v, uₙ⟩] in basis O.
  • Similarly, v′ corresponds to [⟨v′, u₁⟩, ..., ⟨v′, uₙ⟩] in basis O.

Why it matters: The inner product ⟨v, v′⟩ in the abstract space V can now be computed as the ordinary dot product of these two coordinate column vectors. This bridges abstract vector spaces and concrete computation in R^n.


Orthonormal Bases and Coordinate Representation

12.4 Review Problems

🧭 Overview

🧠 One-sentence thesis

Orthonormal bases allow any vector to be expressed as a simple sum of dot products with basis vectors, and they convert abstract inner products into familiar dot product computations.

📌 Key points (3–5)

  • Orthonormal vs orthogonal: orthonormal bases require both perpendicularity (dot product zero for different vectors) and unit length (dot product one with itself), summarized by the Kronecker delta condition.
  • Easy coefficient formula: for an orthonormal basis, the coefficient of each basis vector in a linear combination is simply the dot product of the target vector with that basis vector.
  • Inner products become dot products: once you choose an orthonormal basis, computing inner products in abstract vector spaces reduces to ordinary dot product calculations on coordinate vectors.
  • Common confusion: the standard basis is orthonormal, but many other orthonormal bases exist; the key property is the Kronecker delta relation, not the specific vectors.
  • Why it matters: orthonormal bases simplify finding coordinates, computing lengths and angles, and working with inner products in general vector spaces.

🔢 Kronecker delta and basis properties

🔢 The Kronecker delta

Kronecker delta δᵢⱼ: equals 1 when i = j, and 0 when i ≠ j.

  • The Kronecker delta gives the entries of the identity matrix.
  • For the standard basis vectors eᵢ and eⱼ in Rⁿ, the dot product eᵢ · eⱼ = δᵢⱼ.
  • This property is the defining feature of an orthonormal basis.

📐 Orthogonal vs orthonormal bases

| Type | Definition | Condition |
| --- | --- | --- |
| Orthogonal basis | All vectors perpendicular to each other | vᵢ · vⱼ = 0 if i ≠ j |
| Orthonormal basis | Orthogonal and each vector has unit length | uᵢ · uⱼ = δᵢⱼ |
  • Orthonormal is stricter: it requires both perpendicularity and normalization.
  • The standard basis {e₁, …, eₙ} is orthonormal, but many other orthonormal bases exist.

🧮 Outer products and projection matrices

  • The outer product of column vectors v and w is vwᵀ, which gives a square matrix (contrast with the inner product vᵀw, which is a scalar).
  • For the standard basis, Πᵢ = eᵢeᵢᵀ is a diagonal matrix with 1 in the i-th diagonal position and zeros elsewhere.
  • These matrices satisfy Πᵢ Πⱼ = Πᵢ if i = j, and 0 if i ≠ j.
  • Any diagonal matrix D with diagonal entries λ₁, …, λₙ can be written as D = λ₁Π₁ + ⋯ + λₙΠₙ.

🎯 Finding coefficients in an orthonormal basis

🎯 The coefficient formula (Theorem 14.2.1)

For an orthonormal basis {u₁, …, uₙ}, any vector v can be expressed as v = Σᵢ (v · uᵢ) uᵢ.

  • Because the basis is orthonormal, finding the coefficient cᵢ for uᵢ is straightforward: just compute v · uᵢ.
  • Why this works: Take the dot product of v = c₁u₁ + ⋯ + cₙuₙ with uᵢ:
    • v · uᵢ = c₁(u₁ · uᵢ) + ⋯ + cᵢ(uᵢ · uᵢ) + ⋯ + cₙ(uₙ · uᵢ)
    • All terms vanish except cᵢ(uᵢ · uᵢ) = cᵢ · 1 = cᵢ.
  • Example: If v is a vector in R³ and {u₁, u₂, u₃} is orthonormal, then v = (v · u₁)u₁ + (v · u₂)u₂ + (v · u₃)u₃.

🔍 Don't confuse with general bases

  • For a general (non-orthonormal) basis, finding coefficients requires solving a system of equations.
  • Orthonormality makes the process trivial: no matrix inversion or system solving needed.

🔗 Connecting inner products and dot products

🔗 Encoding vectors as coordinate columns

  • Suppose V is a vector space with an orthonormal basis O = (u₁, …, uₙ) and inner product ⟨·, ·⟩.
  • Any vector v in V can be written as v = ⟨v, u₁⟩u₁ + ⋯ + ⟨v, uₙ⟩uₙ.
  • This corresponds to the coordinate column vector with entries ⟨v, u₁⟩, …, ⟨v, uₙ⟩.

🔗 Inner product equals dot product of coordinates

  • For two vectors v and v′ in V, the inner product ⟨v, v′⟩ equals the dot product of their coordinate vectors:
    • ⟨v, v′⟩ = ⟨v, u₁⟩⟨v′, u₁⟩ + ⋯ + ⟨v, uₙ⟩⟨v′, uₙ⟩
  • Why this works: Expand both v and v′ in the orthonormal basis, then use linearity of the inner product and the fact that ⟨uᵢ, uⱼ⟩ = δᵢⱼ.
  • Most cross terms vanish because ⟨uᵢ, uⱼ⟩ = 0 when i ≠ j; only diagonal terms ⟨uᵢ, uᵢ⟩ = 1 survive.

📝 Practical consequence

  • Once you have an orthonormal basis, you can compute inner products (lengths, angles) using familiar dot product formulas on coordinate vectors.
  • Conversely, dot product computations can always be reinterpreted as inner product computations if needed.
  • The excerpt notes that dot product notation is often used even when technically an inner product is meant, because they coincide in an orthonormal basis.

📚 Example: Polynomial space with integral inner product

📚 The setup

  • Consider the vector space V = span{1, x} of polynomials with inner product ⟨p, p′⟩ = ∫₀¹ p(x)p′(x) dx.
  • The obvious basis B = (1, x) is not orthonormal under this inner product.
  • An orthonormal basis is O = (1, 2√3(x − 1/2)).

📚 Verifying orthonormality

  • Check perpendicularity: ⟨2√3(x − 1/2), 1⟩ = 2√3 ∫₀¹ (x − 1/2) dx = 0.
  • Check unit length: ⟨x − 1/2, x − 1/2⟩ = ∫₀¹ (x − 1/2)² dx = 1/12 = (1/(2√3))², so ⟨2√3(x − 1/2), 2√3(x − 1/2)⟩ = 1.

📚 Finding coordinates

  • An arbitrary polynomial v = a + bx can be expressed in the orthonormal basis O:
    • v = (a + b/2) · 1 + b(x − 1/2)
    • Coordinate vector: (a + b/2, b/(2√3)) in basis O.
  • The inner product of a + bx and a′ + b′x can now be computed as the dot product of their coordinate vectors.
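
The orthonormality and the coordinate formula can be verified symbolically (a sketch assuming sympy):

```python
from sympy import symbols, integrate, sqrt, Rational, simplify

x = symbols('x')
u1 = 1
u2 = 2*sqrt(3)*(x - Rational(1, 2))
ip = lambda p, q: integrate(p*q, (x, 0, 1))       # the inner product from this example

print(ip(u1, u1), ip(u2, u2), ip(u1, u2))         # 1 1 0 -> (u1, u2) is orthonormal

a, b, a2, b2 = symbols('a b a2 b2')
coords = lambda s, t: (s + t/2, t/(2*sqrt(3)))    # coordinates of s + t*x in basis O
dot = sum(c*d for c, d in zip(coords(a, b), coords(a2, b2)))
print(simplify(ip(a + b*x, a2 + b2*x) - dot))     # 0: inner product = dot of coordinates
```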

🔍 Don't confuse bases

  • The basis (1, x) is simpler to write down, but not orthonormal for this inner product.
  • The basis (1, 2√3(x − 1/2)) is orthonormal, so it makes inner product calculations straightforward.
  • Different inner products on the same space will have different orthonormal bases.

Diagonalizability: Orthonormal Bases and Change of Basis

13.1 Diagonalizability

🧭 Overview

🧠 One-sentence thesis

Orthonormal bases allow inner products to be computed as simple dot products, and the change-of-basis matrices between orthonormal bases are always orthogonal matrices (their inverse equals their transpose).

📌 Key points (3–5)

  • Inner products become dot products: once you express vectors in an orthonormal basis, the inner product of two vectors equals the dot product of their coordinate vectors.
  • Orthogonal matrices: the change-of-basis matrix P between two orthonormal bases satisfies P inverse = P transpose.
  • Gram–Schmidt procedure: an algorithm that converts any linearly independent set of vectors into an orthogonal (or orthonormal) basis for the same span.
  • Common confusion: "orthogonal matrix" vs "orthonormal basis"—an orthogonal matrix has columns that form an orthonormal set of vectors (the terminology is slightly mismatched).
  • Orthogonal decomposition: any vector v can be split into v parallel (along a direction u) and v perpendicular (orthogonal to u), which is the foundation of Gram–Schmidt.

🔄 Inner products and orthonormal bases

🔄 Why orthonormal bases simplify inner products

Once vectors are expressed in an orthonormal basis, the inner product of two vectors equals the dot product of their coordinate vectors.

  • Suppose u₁, …, uₙ is an orthonormal basis.
  • Any vector v can be written as ⟨v, u₁⟩u₁ + … + ⟨v, uₙ⟩uₙ.
  • The coordinate vector of v in this basis is (⟨v, u₁⟩, …, ⟨v, uₙ⟩).
  • The inner product ⟨v, v′⟩ then equals the dot product of the coordinate vectors: ⟨v, u₁⟩⟨v′, u₁⟩ + … + ⟨v, uₙ⟩⟨v′, uₙ⟩.
  • This works because the basis is orthonormal: ⟨uᵢ, uⱼ⟩ is 1 if i = j and 0 otherwise, so all cross terms vanish.

Why it matters: You can use familiar dot-product notation and computation even when working with abstract inner-product spaces.

📐 Example with polynomials

The excerpt gives a concrete example in the space V = span{1, x} with inner product ⟨p, p′⟩ = integral from 0 to 1 of p(x)p′(x) dx.

  • The obvious basis (1, x) is not orthonormal.
  • An orthonormal basis is O = (1, 2√3(x − 1/2)).
  • Any polynomial a + bx has coordinates (a + b/2, b/(2√3)) in basis O.
  • The inner product of a + bx and a′ + b′x can be computed as the dot product of these coordinate vectors, yielding aa′ + (1/2)(ab′ + a′b) + (1/3)bb′.
  • This matches the direct integral computation.

Don't confuse: The basis vectors themselves (like 2√3(x − 1/2)) look complicated, but once you have coordinates, the inner product becomes a simple dot product.

🔀 Relating orthonormal bases

🔀 Change-of-basis matrix between orthonormal bases

Suppose T = {u₁, …, uₙ} and R = {w₁, …, wₙ} are two orthonormal bases.

  • Each wᵢ can be written as (u₁ · wᵢ)u₁ + … + (uₙ · wᵢ)uₙ.
  • The change-of-basis matrix P from T to R has entries pⱼᵢ = uⱼ · wᵢ.
  • The excerpt shows that P Pᵀ = Iₙ (the identity matrix), so Pᵀ = P⁻¹.

Orthogonal matrix: a matrix P such that P⁻¹ = Pᵀ.

Key result: A change-of-basis matrix relating two orthonormal bases is always an orthogonal matrix.

🧮 Why P Pᵀ = I

The excerpt uses a "dirty trick" for products of dot products:

  • (u · v)(w · z) = uᵀ(vwᵀ)z, where vwᵀ is an outer-product matrix.
  • The sum over i of (uⱼ · wᵢ)(wᵢ · uₖ) can be rewritten using this trick.
  • The key identity is: the sum over i of wᵢwᵢᵀ equals the identity matrix Iₙ.
  • This is shown by checking that (sum of wᵢwᵢᵀ) applied to any vector v equals v itself, using the fact that v = sum of cⱼwⱼ and wᵢᵀwⱼ = δᵢⱼ (1 if i = j, 0 otherwise).

📊 Example in R³

The excerpt gives an orthonormal basis S = (u₁, u₂, u₃) in R³.

  • The change-of-basis matrix from the standard basis E to S is P = (u₁ u₂ u₃), whose columns are the new basis vectors.
  • The inverse is Pᵀ, whose rows are u₁ᵀ, u₂ᵀ, u₃ᵀ.
  • Checking PᵀP = I amounts to checking that the dot products of rows and columns give the identity matrix, which follows from orthonormality.

Important note: The columns of an orthogonal matrix form an orthonormal set of vectors (the terminology "orthogonal matrix" refers to the property P⁻¹ = Pᵀ, not to the columns being merely orthogonal).
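
A numerical check that a change-of-basis matrix built from an orthonormal basis is orthogonal (a sketch assuming numpy; the basis below is chosen for illustration):

```python
import numpy as np

u1 = np.array([1, 1, 0]) / np.sqrt(2)      # an orthonormal basis of R^3
u2 = np.array([0, 0, 1.0])
u3 = np.array([1, -1, 0]) / np.sqrt(2)

P = np.column_stack([u1, u2, u3])          # columns are the basis vectors
print(np.allclose(P.T @ P, np.eye(3)))     # True: P^T P = I
print(np.allclose(np.linalg.inv(P), P.T))  # True: P^-1 = P^T, so P is orthogonal
```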

🔷 Symmetric matrices from orthonormal change of basis

If D is a diagonal matrix and P is an orthogonal change-of-basis matrix, then M = P D Pᵀ is the matrix of D in the new basis.

  • The transpose of M is (P D Pᵀ)ᵀ = (Pᵀ)ᵀ Dᵀ Pᵀ = P D Pᵀ = M.
  • So M is symmetric.

Why it matters: Changing to an orthonormal basis preserves certain nice properties (like symmetry) when working with diagonal matrices.

🛠️ Gram–Schmidt orthogonalization

🛠️ Orthogonal decomposition of a single vector

Given a vector v and another vector u (not in span{v}), you can construct a new vector v⊥ orthogonal to u:

v⊥ := v − (u · v / u · u) u

  • The term (u · v / u · u) u is called v‖ (the component of v parallel to u).
  • v⊥ is orthogonal to u because u · v⊥ = u · v − (u · v / u · u)(u · u) = 0.
  • This gives an orthogonal basis {u, v⊥} for span{u, v}.
  • Normalizing these vectors (dividing by their lengths) gives an orthonormal basis.

Example: If you have a direction u and a vector v, you split v into v = v⊥ + v‖, where v‖ points along u and v⊥ is perpendicular to u.
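
A small numpy sketch of this split, using illustrative vectors; the helper name `decompose` is ours, not the excerpt's.

```python
# Sketch: split v into a component along u and a component orthogonal to u.
import numpy as np

def decompose(v, u):
    """Return (v_parallel, v_perp) relative to the direction u."""
    v_par = (u @ v) / (u @ u) * u
    v_perp = v - v_par
    return v_par, v_perp

u = np.array([1.0, 1.0, 0.0])
v = np.array([3.0, 1.0, 1.0])
v_par, v_perp = decompose(v, u)
assert np.allclose(v_par + v_perp, v)   # v = v_par + v_perp
assert np.isclose(u @ v_perp, 0.0)      # v_perp is orthogonal to u
```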

🔧 Extending to three vectors

Given linearly independent vectors u, v, w, first compute v⊥ (orthogonal to u), then compute w⊥ orthogonal to both u and v⊥:

w⊥ := w − (u · w / u · u) u − (v⊥ · w / v⊥ · v⊥) v⊥

  • The excerpt verifies that u · w⊥ = 0 and v⊥ · w⊥ = 0.
  • So {u, v⊥, w⊥} is an orthogonal basis for span{u, v, w}.

Don't confuse: w⊥ is orthogonal to both u and v⊥, not to the original v; you must use the already-orthogonalized vectors at each step.

🔄 The Gram–Schmidt procedure (general case)

Given an ordered set (v₁, v₂, …) of linearly independent vectors, define:

  • v⊥₁ := v₁
  • v⊥₂ := v₂ − (v⊥₁ · v₂ / v⊥₁ · v⊥₁) v⊥₁
  • v⊥₃ := v₃ − (v⊥₁ · v₃ / v⊥₁ · v⊥₁) v⊥₁ − (v⊥₂ · v₃ / v⊥₂ · v⊥₂) v⊥₂
  • v⊥ᵢ := vᵢ − sum over j < i of (v⊥ⱼ · vᵢ / v⊥ⱼ · v⊥ⱼ) v⊥ⱼ

How it works:

  • Each v⊥ᵢ is built by subtracting from vᵢ all components along the previously computed orthogonal vectors v⊥₁, …, v⊥ᵢ₋₁.
  • The result is an orthogonal basis {v⊥₁, v⊥₂, …} for span{v₁, v₂, …}.
  • To get an orthonormal basis, divide each v⊥ᵢ by its length.

Important: The algorithm depends on the order of the input vectors; changing the order gives a different orthogonal basis.
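
The formulas above translate directly into a short routine. This is a sketch of classical Gram–Schmidt for illustration only (for numerical work, modified Gram–Schmidt or a library QR routine is preferable); the function name `gram_schmidt` is ours.

```python
# Sketch of the Gram–Schmidt procedure: subtract from each vector its components
# along the previously computed orthogonal vectors. Assumes linear independence.
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis for the span of the given ordered vectors."""
    orthogonal = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in orthogonal:
            w = w - (u @ w) / (u @ u) * u   # remove the component along u
        orthogonal.append(w)
    return orthogonal

basis = gram_schmidt([(1, 1, 0), (1, 1, 1), (3, 1, 1)])
# basis -> (1,1,0), (0,0,1), (1,-1,0), matching the worked example below
orthonormal = [u / np.linalg.norm(u) for u in basis]
```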

📝 Example in R³

The excerpt applies Gram–Schmidt to the set {(1,1,1), (1,1,0), (3,1,1)}.

  • The vectors are reordered as (v₁, v₂, v₃) = ((1,1,0), (1,1,1), (3,1,1)) to simplify computation (choosing the vector with the most zeros first).
  • v⊥₁ = v₁ = (1,1,0).
  • v⊥₂ = (1,1,1) − (2/2)(1,1,0) = (0,0,1).
  • v⊥₃ = (3,1,1) − (4/2)(1,1,0) − (1/1)(0,0,1) = (1,−1,0).
  • The orthogonal basis is {(1,1,0), (0,0,1), (1,−1,0)}.
  • Normalizing gives the orthonormal basis {(1/√2, 1/√2, 0), (0,0,1), (1/√2, −1/√2, 0)}.

Practical tip: Start with the vector that has the most zeros to reduce arithmetic.

🔗 Summary and connections

🔗 Why orthonormal bases matter

| Property | Benefit |
| --- | --- |
| Inner products become dot products | Simplifies computation in abstract spaces |
| Change-of-basis matrices are orthogonal | P⁻¹ = Pᵀ, so inverses are easy |
| Gram–Schmidt always works | Any linearly independent set can be orthogonalized |

🔗 Common workflow

  1. Start with a linearly independent set of vectors (a basis or spanning set).
  2. Apply Gram–Schmidt to get an orthogonal basis.
  3. Normalize (divide by lengths) to get an orthonormal basis.
  4. Use the orthonormal basis to simplify inner-product and change-of-basis computations.

Don't confuse: "Orthogonal basis" (vectors are pairwise orthogonal) vs "orthonormal basis" (orthogonal and each has length 1). Gram–Schmidt produces orthogonal; you must normalize separately for orthonormal.

Gram-Schmidt Orthogonalization and QR Decomposition

13.2 Change of Basis

🧭 Overview

🧠 One-sentence thesis

The Gram-Schmidt procedure systematically transforms any set of linearly independent vectors into an orthogonal (or orthonormal) basis by subtracting projections, and this process underlies the QR decomposition of matrices.

📌 Key points (3–5)

  • What Gram-Schmidt does: converts linearly independent vectors into an orthogonal basis for the same span by iteratively removing components along previously orthogonalized vectors.
  • How it works step-by-step: each new orthogonal vector is built by subtracting all projections onto earlier orthogonal vectors from the original vector.
  • Order matters: changing the order of input vectors produces a different orthogonal basis; you must choose an ordering before applying the algorithm.
  • Common confusion: the algorithm uses the already orthogonalized vectors (v⊥₁, v⊥₂, etc.) for projections, not the original vectors—each step depends on all previous orthogonal results.
  • Practical application: the procedure directly leads to QR decomposition, where a matrix M is factored into an orthogonal matrix Q and an upper triangular matrix R.

🔧 Building orthogonal vectors step by step

🔧 From two vectors to three

The excerpt begins by extending the idea from two orthogonal vectors to three:

Given a third vector w, first check that w does not lie in the span {u, v}, i.e., check that u, v, and w are linearly independent.

If w is independent, define:

  • w⊥ := w − (projection of w onto u) − (projection of w onto v⊥)
  • The projection formula: (u · w / u · u) u removes the u-component; (v⊥ · w / v⊥ · v⊥) v⊥ removes the v⊥-component.

Why this works:

  • The excerpt verifies that u · w⊥ = 0 and v⊥ · w⊥ = 0 by expanding the dot products.
  • Since w⊥ is orthogonal to both u and v⊥, the set {u, v⊥, w⊥} forms an orthogonal basis for span{u, v, w}.

Don't confuse: You subtract projections onto the orthogonalized vectors (v⊥), not the original v.

📐 The general recursive formula

For an ordered set (v₁, v₂, ...) of linearly independent vectors, define:

  • v⊥₁ := v₁ (the first vector stays as-is)
  • v⊥₂ := v₂ − (v⊥₁ · v₂ / v⊥₁ · v⊥₁) v⊥₁
  • v⊥₃ := v₃ − (v⊥₁ · v₃ / v⊥₁ · v⊥₁) v⊥₁ − (v⊥₂ · v₃ / v⊥₂ · v⊥₂) v⊥₂
  • ...
  • v⊥ᵢ := vᵢ − (sum of projections onto all v⊥ⱼ for j < i)

Key properties:

  • Each v⊥ᵢ depends on all previous v⊥ⱼ (j < i), allowing inductive/algorithmic construction.
  • The result is a linearly independent, orthogonal set {v⊥₁, v⊥₂, ...} with span{v⊥₁, v⊥₂, ...} = span{v₁, v₂, ...}.
  • This is an orthogonal basis for the original span.

⚙️ Why order matters

The excerpt emphasizes:

The set of vectors you start out with needs to be ordered to uniquely specify the algorithm; changing the order of the vectors will give a different orthogonal basis.

  • Different orderings produce different orthogonal bases (all valid, but not identical).
  • You may need to choose the order yourself; a practical tip from the example: "choose the vector with the most zeros to be first in hopes of simplifying computations."

📚 The Gram-Schmidt procedure

📚 What it is

Gram–Schmidt orthogonalization procedure: an algorithm to build an orthogonal basis from a linearly independent set.

  • Named after Gram (worked at a Danish insurance company over one hundred years ago) and Schmidt (a student of Hilbert, the famous German mathematician).
  • The procedure is both a theoretical tool and a practical algorithm.

🧮 Worked example in R³

The excerpt provides Example 135: obtain an orthogonal basis for R³ from the linearly independent set {(1,1,1), (1,1,0), (3,1,1)}.

Step 1: Choose order

  • Reorder as (v₁, v₂, v₃) := ((1,1,0), (1,1,1), (3,1,1)) to put the vector with most zeros first.

Step 2: Apply Gram-Schmidt

  • v⊥₁ := v₁ = (1,1,0)
  • v⊥₂ := (1,1,1) − (2/2)(1,1,0) = (0,0,1)
  • v⊥₃ := (3,1,1) − (4/2)(1,1,0) − (1/1)(0,0,1) = (1,−1,0)

Result:

  • Orthogonal basis: {(1,1,0), (0,0,1), (1,−1,0)}
  • To get an orthonormal basis, divide each by its length: {(1/√2, 1/√2, 0), (0,0,1), (1/√2, −1/√2, 0)}

Don't confuse: Orthogonal means dot products are zero; orthonormal means orthogonal and each vector has length 1.

🔢 QR decomposition

🔢 What QR decomposition is

The Gram-Schmidt procedure suggests a matrix factorization:

M = QR, where Q is an orthogonal matrix and R is an upper triangular matrix.

  • Q's columns are the orthonormal basis vectors produced by Gram-Schmidt.
  • R records the steps of the Gram-Schmidt procedure.
  • QR decompositions are useful for solving linear systems, eigenvalue problems, and least squares approximations.

🧩 How to construct QR

The excerpt walks through Example 136: find the QR decomposition of M = [[2,−1,1], [1,3,−2], [0,1,−2]].

Strategy:

  • Think of M's columns as three 3-vectors.
  • Use Gram-Schmidt to build an orthonormal basis (these become Q's columns).
  • Use matrix R to record Gram-Schmidt steps so that QR = M.

Step-by-step:

  1. First column stays; orthogonalize second column:

    • Replace M's second column by the Gram-Schmidt result from the first two columns: (−7/5, 14/5, 1).
    • Write M = [first matrix with orthogonal first two columns] × [almost-identity matrix with +1/5 in position (1,2)].
    • The +1/5 entry "undoes" the Gram-Schmidt step when multiplied, recovering the original second column.
  2. Orthogonalize third column:

    • Use Gram-Schmidt to deduce the third orthogonal vector: (−1/6, 1/3, −7/6).
    • Write M = [matrix with mutually orthogonal columns] × [upper triangular matrix with entries recording the projections].
  3. Normalize to get Q:

    • The excerpt notes "this is not quite the answer because the first matrix is now made of mutually orthogonal column vectors, but a bona fide orthogonal matrix is comprised of orthonormal [vectors]."
    • Divide each column by its length to obtain Q.
    • Adjust R accordingly to maintain M = QR.

Key insight:

  • R is upper triangular because each Gram-Schmidt step only involves earlier vectors (j < i), so R has zeros below the diagonal.
  • The entries of R record the projection coefficients and normalizations.
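
A minimal numpy sketch of the whole construction for this example (not from the excerpt): it builds Q by a Gram–Schmidt pass over M's columns, takes R = QᵀM, and checks both QR = M and the dot-product description of R given above.

```python
# Sketch: Q from Gram–Schmidt on M's columns, R = Qᵀ M; then Q R = M and
# entry (i, j) of R is the dot product of Q's i-th column with M's j-th column.
import numpy as np

M = np.array([[2., -1.,  1.],
              [1.,  3., -2.],
              [0.,  1., -2.]])

Q = np.zeros_like(M)
for j in range(3):
    w = M[:, j].copy()
    for k in range(j):
        w -= (Q[:, k] @ w) * Q[:, k]    # remove components along earlier columns
    Q[:, j] = w / np.linalg.norm(w)     # normalize

R = Q.T @ M
assert np.allclose(Q @ R, M)
assert np.allclose(np.triu(R), R)       # R is upper triangular (up to rounding)
```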

🔍 Comparison with LU decomposition

The excerpt mentions another decomposition from an earlier chapter:

| Decomposition | Form | Matrices | Use cases (from excerpt) |
| --- | --- | --- | --- |
| LU | M = LU | L lower triangular, U upper triangular | Solving linear systems |
| QR | M = QR | Q orthogonal, R upper triangular | Solving linear systems, eigenvalue problems, least squares |

Don't confuse: LU uses triangular matrices; QR uses an orthogonal matrix (columns are orthonormal) paired with an upper triangular matrix.

13.3 Changing to a Basis of Eigenvectors

🧭 Overview

🧠 One-sentence thesis

The Gram–Schmidt orthogonalization procedure transforms any linearly independent set of vectors into an orthogonal (or orthonormal) basis spanning the same space, and this process underlies the QR decomposition of matrices.

📌 Key points (3–5)

  • What Gram–Schmidt does: builds an orthogonal basis from a linearly independent set by inductively subtracting projections.
  • Order matters: changing the order of input vectors produces a different orthogonal basis; you must choose an ordering.
  • QR decomposition: decomposes a matrix M into Q (orthogonal matrix) times R (upper triangular matrix) by applying Gram–Schmidt to the columns of M.
  • Common confusion: orthogonal vs orthonormal—orthogonal means perpendicular vectors; orthonormal means perpendicular and unit length (divide by length to convert).
  • Why it matters: QR decompositions solve linear systems, eigenvalue problems, and least squares approximations.

🔧 The Gram–Schmidt procedure

🔧 How the algorithm works

Gram–Schmidt orthogonalization procedure: an inductive/algorithmic method to build a linearly independent, orthogonal set of vectors {v⊥₁, v⊥₂, ...} such that span{v⊥₁, v⊥₂, ...} = span{v₁, v₂, ...}.

  • Start with a linearly independent set {v₁, v₂, ...}.
  • Each new orthogonal vector v⊥ᵢ depends on all previous v⊥ⱼ for j < i.
  • The algorithm subtracts projections: each v⊥ᵢ is constructed by removing components of vᵢ that lie in the directions of all earlier orthogonal vectors.
  • The result is an orthogonal basis for the same vector space.

📐 Order dependency

  • The algorithm requires an ordered set of input vectors to uniquely specify the result.
  • Changing the order produces a different orthogonal basis (still valid, but different).
  • Practical tip from the excerpt: choose the vector with the most zeros first to simplify computations.

🎓 Historical note

  • Named after Gram (worked at a Danish insurance company over 100 years ago) and Schmidt (a student of the famous German mathematician Hilbert).

📝 Worked example in R³

📝 Setting up the problem

  • Goal: obtain an orthogonal basis for R³ from the linearly independent set {(1,1,1), (1,1,0), (3,1,1)}.
  • Chosen order (to minimize computation): (v₁, v₂, v₃) := ((1,1,0), (1,1,1), (3,1,1)).

📝 Step-by-step construction

  1. First vector: Set v⊥₁ := v₁ = (1,1,0).
  2. Second vector: v⊥₂ := (1,1,1) − (projection onto v⊥₁) = (1,1,1) − (2/2)(1,1,0) = (0,0,1).
  3. Third vector: v⊥₃ := (3,1,1) − (projection onto v⊥₁) − (projection onto v⊥₂) = (3,1,1) − (4/2)(1,1,0) − (1/1)(0,0,1) = (1,−1,0).

Result: {(1,1,0), (0,0,1), (1,−1,0)} is an orthogonal basis for R³.

📝 Converting to orthonormal

  • Divide each vector by its length.
  • Orthonormal basis: {(1/√2, 1/√2, 0), (0, 0, 1), (1/√2, −1/√2, 0)}.
  • Don't confuse: orthogonal means perpendicular; orthonormal means perpendicular and unit length.

🔀 QR decomposition

🔀 What QR decomposition is

QR decomposition: a matrix M is decomposed into a product M = QR, where Q is an orthogonal matrix and R is an upper triangular matrix.

  • Q's columns form an orthonormal basis (obtained by Gram–Schmidt on M's columns).
  • R records the steps of the Gram–Schmidt procedure.
  • Applications: solving linear systems, eigenvalue problems, least squares approximations.

🔀 How to construct QR

The excerpt demonstrates the process through a 3×3 example:

  1. Apply Gram–Schmidt to columns of M: treat each column as a vector; build orthogonal vectors step by step.
  2. Record the adjustments in R: each time you subtract a projection, record the coefficient in the upper triangular matrix R so that QR = M.
  3. Normalize to get Q: divide each orthogonal column by its length to make it orthonormal; multiply the corresponding row of R by the same length to preserve the product.

Example from the excerpt:

  • Start with M whose columns are (2,1,0), (−1,3,1), and (1,−2,−2).
  • After Gram–Schmidt and normalization: M = QR where Q has orthonormal columns and R is upper triangular.

🔀 Geometric interpretation

  • The process rotates the original vectors so that:
    • The first lies along the x-axis.
    • The second lies in the xy-plane.
    • The third lies in some other generic direction.
  • Useful check: entry (i,j) of R equals the dot product of the i-th column of Q with the j-th column of M.

➕ Orthogonal complements and sums of subspaces

➕ Sum of subspaces

Sum of subspaces U and V: U + V := span(U ∪ V) = {u + v | u ∈ U, v ∈ V}.

  • This is adding vector spaces, not vectors.
  • All elements take the form u + v with u ∈ U and v ∈ V.
  • Note: U ∪ V is not a subspace, but span(U ∪ V) is.

➕ Intersection vs union

| Concept | Is it a subspace? | Explanation |
| --- | --- | --- |
| U ∩ V | Yes | The intersection of two subspaces is a subspace |
| U ∪ V | No | The union of two subspaces is generally not a subspace |
| span(U ∪ V) | Yes | The span of any subset of a vector space is a subspace |

Example from the excerpt:

  • span{(1,1,0,0), (0,1,1,0)} + span{(0,1,1,0), (0,0,1,1)} = span{(1,1,0,0), (0,1,1,0), (0,0,1,1)}.
  • The two subspaces share the vector (0,1,1,0), but their sum is the span of all three basis vectors.

QR Decomposition and Orthogonal Complements

13.4 Review Problems

🧭 Overview

🧠 One-sentence thesis

The QR decomposition expresses a matrix as the product of an orthogonal matrix Q and an upper triangular matrix R by applying Gram–Schmidt orthogonalization to the columns, while orthogonal complements allow any finite-dimensional vector space to be decomposed as a direct sum of a subspace and its orthogonal complement.

📌 Key points (3–5)

  • QR decomposition: Any matrix M can be factored as M = QR, where Q has orthonormal columns and R is upper triangular, by recording the Gram–Schmidt process in matrix form.
  • Direct sum vs ordinary sum: U ⊕ V requires U ∩ V = {0}, which guarantees unique decomposition of every vector; U + V allows overlap and non-unique representations.
  • Orthogonal complement: For any subspace U in W, the orthogonal complement U⊥ consists of all vectors in W orthogonal to every vector in U.
  • Common confusion: Direct sum U ⊕ V vs ordinary sum U + V—only the direct sum guarantees that each vector has a unique decomposition; ordinary sums allow shared non-zero vectors.
  • Fundamental decomposition theorem: Every finite-dimensional vector space W can be written as W = U ⊕ U⊥, providing a natural unique decomposition.

🔄 QR Decomposition Mechanics

🔄 What QR decomposition is

QR decomposition: A factorization M = QR where Q is an orthogonal matrix (columns are orthonormal) and R is an upper triangular matrix.

  • The decomposition records the Gram–Schmidt orthogonalization process in matrix form.
  • The columns of M are treated as vectors; Gram–Schmidt builds an orthonormal basis from them to form Q.
  • Matrix R encodes the steps needed to reconstruct M from Q.

🛠️ How to construct QR

The procedure works in three stages:

  1. Apply Gram–Schmidt to columns: Replace each column of M with the orthogonal vector produced by Gram–Schmidt, keeping track of the linear combinations used.
  2. Record the combinations in R: Build an upper triangular matrix R that "undoes" the Gram–Schmidt changes when multiplied with the modified matrix.
  3. Normalize to orthonormal: Divide each column of the intermediate matrix by its length to get Q, and multiply the corresponding row of R by the same length.

Example from the excerpt: For M with columns (2,1,0), (−1,3,1), (1,−2,−2):

  • First column stays as (2,1,0).
  • Second column becomes (−7/5, 14/5, 1) after subtracting its projection onto the first.
  • Third column becomes (−1/6, 1/3, −7/6) after subtracting projections onto the first two.
  • Then normalize each column by its length to get Q, adjusting R accordingly.

✅ Verification check

The excerpt notes: entry (i,j) of matrix R equals the dot product of the i-th column of Q with the j-th column of M.

  • This provides both a check of correctness and an alternative recipe for computing QR decompositions.
  • Understanding why this is true tests comprehension of the construction process.

➕ Sums of Subspaces

➕ Ordinary sum of subspaces

Sum of subspaces: U + V := span(U ∪ V) = {u + v | u ∈ U, v ∈ V}

  • This is adding vector spaces, not individual vectors, to produce a new vector space.
  • The dimension of U + V can be less than dim U + dim V if the subspaces share non-zero vectors.

Example from the excerpt:

  • span{(1,1,0,0), (0,1,1,0)} + span{(0,1,1,0), (0,0,1,1)} = span{(1,1,0,0), (0,1,1,0), (0,0,1,1)}
  • Both addends are 2-dimensional and share vector (0,1,1,0), so their sum is 3-dimensional, not 4-dimensional.

⊕ Direct sum of subspaces

Direct sum: If U ∩ V = {0_W}, then U ⊕ V := span(U ∪ V) = {u + v | u ∈ U, v ∈ V}

  • The key requirement: U and V have no non-zero vectors in common.
  • When U ∩ V = {0_W}, then U + V = U ⊕ V; when U ∩ V ≠ {0_W}, then U + V ≠ U ⊕ V.

🎯 Uniqueness property of direct sums

Theorem: If w ∈ U ⊕ V, there is only one way to write w as the sum of a vector in U and a vector in V.

Why this matters:

  • Suppose u + v = u' + v' with u, u' ∈ U and v, v' ∈ V.
  • Rearranging: (u − u') = −(v − v').
  • Since U and V are subspaces, (u − u') ∈ U and −(v − v') ∈ V.
  • But they are equal, so (u − u') ∈ V as well.
  • Since U ∩ V = {0}, we must have (u − u') = 0, so u = u'.
  • Similarly, v = v'.

Don't confuse: Ordinary sums allow multiple representations; only direct sums guarantee unique decomposition.

⊥ Orthogonal Complements

⊥ Definition and meaning

Orthogonal complement: If U is a subspace of W, then U⊥ := {w ∈ W | w · u = 0 for all u ∈ U}

  • Read as "U-perp."
  • This is the set of all vectors in W orthogonal to every vector in U.
  • For a general inner product (not just dot product), replace w · u = 0 with ⟨w, u⟩ = 0.

Example from the excerpt:

  • If P is any plane through the origin in R³, then P⊥ is the line through the origin orthogonal to P.
  • If P is the xy-plane, then R³ = P ⊕ P⊥ = {(x,y,0) | x,y ∈ R} ⊕ {(0,0,z) | z ∈ R}.

🏗️ Structure theorem for orthogonal complements

Theorem: Let U be a subspace of a finite-dimensional vector space W. Then U⊥ is a subspace of W, and W = U ⊕ U⊥.

Why U⊥ is a subspace (closure check):

  • Suppose v, w ∈ U⊥, so v · u = 0 = w · u for all u ∈ U.
  • Then u · (αv + βw) = αu · v + βu · w = 0 for all u ∈ U.
  • Therefore αv + βw ∈ U⊥.

Why U ∩ U⊥ = {0}:

  • If u ∈ U and u ∈ U⊥, then u · u = 0.
  • This implies u = 0.

Why W = U ⊕ U⊥ (every w ∈ W can be decomposed):

  • Let e₁, ..., eₙ be an orthonormal basis for U.
  • Define u = (w · e₁)e₁ + ··· + (w · eₙ)eₙ ∈ U.
  • Define u⊥ = w − u.
  • Then u⊥ ∈ U⊥: for each basis vector eᵢ, u⊥ · eᵢ = w · eᵢ − u · eᵢ = w · eᵢ − w · eᵢ = 0 (the same computation as a Gram–Schmidt step), so u⊥ is orthogonal to everything in U.
  • So w = u + u⊥, proving w ∈ U ⊕ U⊥.

📐 Computing orthogonal complements

Example from the excerpt: Let L = span{(1,1,1,1)} be a line in R⁴.

Then:

  • L⊥ = {(x,y,z,w) ∈ R⁴ | (x,y,z,w) · (1,1,1,1) = 0}
  • L⊥ = {(x,y,z,w) ∈ R⁴ | x + y + z + w = 0}

To find an orthonormal basis for L⊥:

  1. Start with any basis for L⊥, e.g., {(1,−1,0,0), (1,0,−1,0), (1,0,0,−1)}.
  2. Apply Gram–Schmidt to get orthogonal vectors.
  3. Normalize each vector by dividing by its length.

The result is an orthonormal basis for L⊥, and R⁴ = L ⊕ L⊥ decomposes R⁴ into a line and its three-dimensional orthogonal complement.
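
A numpy sketch of steps 1–3 for this example (the starting basis for L⊥ matches the excerpt; the loop is a plain Gram–Schmidt pass, written out for illustration):

```python
# Sketch: an orthonormal basis for L⊥, where L = span{(1,1,1,1)} in R⁴.
import numpy as np

L = np.array([1., 1., 1., 1.])
basis_Lperp = [np.array([1., -1., 0., 0.]),
               np.array([1., 0., -1., 0.]),
               np.array([1., 0., 0., -1.])]

ortho = []
for v in basis_Lperp:
    w = v.copy()
    for u in ortho:
        w -= (u @ w) / (u @ u) * u        # Gram–Schmidt step
    ortho.append(w)
onb = [u / np.linalg.norm(u) for u in ortho]  # normalize

# Every basis vector of L⊥ is orthogonal to L, consistent with R⁴ = L ⊕ L⊥.
assert all(np.isclose(L @ u, 0.0) for u in onb)
```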

🔁 Involution property

Key observation: For any subspace U, the orthogonal complement of U⊥ is just U again: (U⊥)⊥ = U.

Involution: A mathematical operation which, performed twice, does nothing.

  • The ⊥ operation is an involution on the set of subspaces of a vector space.
  • Applying it twice returns you to the original subspace.

🔗 Connections and Applications

🔗 QR for solving systems

The excerpt mentions QR decompositions are useful for:

  • Solving linear systems
  • Eigenvalue problems
  • Least squares approximations

The geometric interpretation: rotating vectors so the first lies along the x-axis, the second in the xy-plane, and the third in a generic direction.

🔗 Natural decomposition via orthogonality

The fundamental question: Given a subspace U in W, how can we write W as the direct sum U ⊕ V for some V?

  • There is not a unique answer in general (many possible choices for V).
  • However, using the inner product, U⊥ is the natural candidate for the second subspace.
  • The theorem W = U ⊕ U⊥ provides a canonical, unique decomposition using orthogonality.

| Decomposition type | Requirement | Uniqueness | Natural choice |
| --- | --- | --- | --- |
| U + V | None | No unique representation | Many options |
| U ⊕ V | U ∩ V = {0} | Unique representation | Many options |
| U ⊕ U⊥ | Orthogonality | Unique representation | One natural choice |

14.1 Properties of the Standard Basis

🧭 Overview

🧠 One-sentence thesis

The standard basis in R^n has special orthonormal properties that can be generalized to other bases, enabling easy computation of vector coefficients and relating inner products to dot products.

📌 Key points (3–5)

  • Standard basis properties: Each standard basis vector has unit length and all are mutually orthogonal (perpendicular).
  • Orthonormal vs orthogonal: Orthonormal bases require both perpendicularity and unit length; orthogonal bases only require perpendicularity.
  • Easy coefficient formula: For orthonormal bases, the coefficient c_i of basis vector u_i in any vector v is simply the dot product v · u_i.
  • Common confusion: The Kronecker delta δ_ij summarizes both properties (unit length when i = j, orthogonality when i ≠ j) and gives the entries of the identity matrix.
  • Bridge to general spaces: Orthonormal bases allow inner products in abstract vector spaces to be computed as dot products of coordinate vectors.

📏 Standard basis structure

📐 Length and orthogonality properties

The standard basis vectors e₁, e₂, ..., eₙ in R^n have two key properties:

Unit length: Each standard basis vector has length 1, i.e., ||e_i|| = √(e_i · e_i) = 1.

Orthogonality: Standard basis vectors are perpendicular to each other, i.e., e_i · e_j = 0 when i ≠ j.

These properties are summarized compactly:

  • e_i^T e_j = δ_ij (equivalently e_i · e_j = δ_ij), where δ_ij is the Kronecker delta
  • δ_ij = 1 if i = j, and δ_ij = 0 if i ≠ j
  • The Kronecker delta gives the entries of the identity matrix

🔲 Outer products and projection matrices

The excerpt introduces outer products of standard basis vectors:

  • Inner product: v^T w produces a scalar (dot product)
  • Outer product: v w^T produces a square matrix

For standard basis vectors, define Π_i = e_i e_i^T:

  • Π_i is a diagonal matrix with 1 in the i-th diagonal position and zeros elsewhere
  • Multiplication rule: Π_i Π_j = Π_i if i = j, and 0 if i ≠ j
  • Any diagonal matrix D with diagonal entries λ₁, ..., λₙ can be written as D = λ₁Π₁ + ··· + λₙΠₙ

Example: Π₁ in R³ would be the matrix with 1 in position (1,1) and zeros everywhere else; it "projects" onto the first coordinate.
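
A short numpy sketch of these outer-product facts (the dimension and the diagonal weights below are illustrative, not taken from the excerpt):

```python
# Sketch: Πᵢ = eᵢ eᵢᵀ projects onto the i-th coordinate, and a diagonal matrix
# is a weighted sum of these projectors.
import numpy as np

n = 3
e = np.eye(n)                                  # columns are the standard basis vectors
Pi = [np.outer(e[:, i], e[:, i]) for i in range(n)]

assert np.allclose(Pi[0] @ Pi[0], Pi[0])             # Πᵢ Πⱼ = Πᵢ when i = j
assert np.allclose(Pi[0] @ Pi[1], np.zeros((n, n)))  # Πᵢ Πⱼ = 0 when i ≠ j

lam = [5.0, -2.0, 7.0]
D = sum(l * P for l, P in zip(lam, Pi))
assert np.allclose(D, np.diag(lam))                  # D = λ₁Π₁ + ··· + λₙΠₙ
```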

🔄 Generalizing to other bases

🔄 Orthogonal and orthonormal bases

The standard basis properties can be found in other bases:

Orthogonal basis {v₁, ..., vₙ}: vectors satisfy v_i · v_j = 0 if i ≠ j (all vectors are perpendicular).

Orthonormal basis {u₁, ..., uₙ}: vectors satisfy u_i · u_j = δ_ij (perpendicular and each has unit length).

Don't confuse: Orthogonal only requires perpendicularity; orthonormal additionally requires unit length. Every orthonormal basis is orthogonal, but not every orthogonal basis is orthonormal.

| Property | Orthogonal basis | Orthonormal basis |
| --- | --- | --- |
| Perpendicularity | v_i · v_j = 0 (i ≠ j) | u_i · u_j = 0 (i ≠ j) |
| Unit length | Not required | Required: u_i · u_i = 1 |
| Summary condition | Only perpendicularity | u_i · u_j = δ_ij |

⚡ Easy coefficient computation

Theorem 14.2.1: For an orthonormal basis {u₁, ..., uₙ}, any vector v can be expressed as v = Σ_i (v · u_i) u_i.

Why this works:

  • Since the basis is orthonormal, write v = c₁u₁ + ··· + cₙuₙ
  • Take the dot product of both sides with u_i:
    • v · u_i = c₁(u₁ · u_i) + ··· + c_i(u_i · u_i) + ··· + cₙ(uₙ · u_i)
    • All terms vanish except c_i(u_i · u_i) = c_i · 1 = c_i
  • Therefore c_i = v · u_i

Example: To find how much of u₃ is in v, just compute the dot product v · u₃—no need to solve a system of equations.
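
A numpy sketch of this shortcut, using a randomly generated orthonormal basis (the columns of U) rather than any specific basis from the excerpt:

```python
# Sketch: in an orthonormal basis, coefficients are just dot products (no linear solve).
import numpy as np

rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # columns u₁,…,u₄ are orthonormal
v = rng.standard_normal(4)

coeffs = U.T @ v                    # cᵢ = v · uᵢ
assert np.allclose(U @ coeffs, v)   # v = Σᵢ cᵢ uᵢ
```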

🌉 Connecting inner products to dot products

🌉 From abstract spaces to coordinates

The excerpt addresses how to work with inner products in general vector spaces V (not just R^n):

Setup: Suppose V has an orthonormal basis O = (u₁, ..., uₙ) with respect to some inner product ⟨·, ·⟩, meaning ⟨u_i, u_j⟩ = δ_ij.

Encoding vectors as column vectors:

  • Any vector v in V can be written as v = ⟨v, u₁⟩u₁ + ··· + ⟨v, uₙ⟩uₙ
  • This is represented by the coordinate column vector with entries ⟨v, u₁⟩, ..., ⟨v, uₙ⟩
  • Similarly for another vector v′ with coordinates ⟨v′, u₁⟩, ..., ⟨v′, uₙ⟩

Key insight: The dot product of these two coordinate column vectors equals the inner product ⟨v, v′⟩ in the original space.

Don't confuse: The inner product ⟨·, ·⟩ is defined on the abstract vector space V, while the dot product operates on coordinate vectors in R^n. An orthonormal basis provides the bridge between them.

Example: In a function space with an appropriate inner product, once you choose an orthonormal basis, computing inner products reduces to computing dot products of coefficient vectors.


14.2 Orthogonal and Orthonormal Bases

🧭 Overview

🧠 One-sentence thesis

Orthonormal bases allow any vector to be expressed as a simple sum of dot products with basis vectors, and they convert inner products into ordinary dot products.

📌 Key points (3–5)

  • Orthogonal vs orthonormal: orthogonal bases have perpendicular vectors; orthonormal bases are orthogonal and every vector has unit length.
  • Easy coefficient formula: for an orthonormal basis, the coefficient of each basis vector in a linear combination is just the dot product of the target vector with that basis vector.
  • Inner products become dot products: once you choose an orthonormal basis, computing inner products in abstract vector spaces reduces to computing dot products of coordinate vectors.
  • Common confusion: the standard basis is orthonormal, but many other orthonormal bases exist; the key property is the Kronecker delta relation (dot product equals 1 if same vector, 0 otherwise).

🔑 Definitions and basic properties

🔑 Orthogonal basis

Orthogonal basis {v₁, …, vₙ}: vᵢ · vⱼ = 0 if i ≠ j.

  • All vectors in the basis are perpendicular to each other.
  • No restriction on the length of each vector.

🔑 Orthonormal basis

Orthonormal basis {u₁, …, uₙ}: uᵢ · uⱼ = δᵢⱼ (the Kronecker delta).

  • The Kronecker delta δᵢⱼ equals 1 when i = j and 0 when i ≠ j.
  • This means:
    • Each basis vector has unit length (uᵢ · uᵢ = 1).
    • Different basis vectors are perpendicular (uᵢ · uⱼ = 0 when i ≠ j).
  • The standard basis {e₁, …, eₙ} is orthonormal because eᵢᵀ eⱼ = δᵢⱼ.
  • Don't confuse: orthonormal is stricter than orthogonal—it adds the unit-length requirement.

🧮 Finding coefficients in an orthonormal basis

🧮 The coefficient formula

Theorem 14.2.1: For an orthonormal basis {u₁, …, uₙ}, any vector v can be expressed as

v = Σᵢ (v · uᵢ) uᵢ.

  • Because the basis is orthonormal, you don't need to solve a system of equations to find coefficients.
  • Each coefficient cᵢ is simply the dot product v · uᵢ.

🔍 Why the formula works

  • Start with the general expansion: v = c₁u₁ + ⋯ + cₙuₙ.
  • Take the dot product of both sides with any basis vector uᵢ:
    • v · uᵢ = c₁(u₁ · uᵢ) + ⋯ + cᵢ(uᵢ · uᵢ) + ⋯ + cₙ(uₙ · uᵢ).
  • Because the basis is orthonormal:
    • All terms except the i-th vanish (uⱼ · uᵢ = 0 when j ≠ i).
    • The i-th term simplifies to cᵢ · 1 (uᵢ · uᵢ = 1).
  • Result: v · uᵢ = cᵢ.
  • Example: if v = 3u₁ + 5u₂, then v · u₁ = 3 and v · u₂ = 5 directly.

🌉 Connecting inner products and dot products

🌉 Encoding vectors as coordinate columns

  • In a general vector space V with an orthonormal basis O = (u₁, …, uₙ) and inner product ⟨·, ·⟩, any vector v can be written as:
    • v = ⟨v, u₁⟩u₁ + ⋯ + ⟨v, uₙ⟩uₙ.
  • This is encoded as the column vector of coordinates:
    • [⟨v, u₁⟩, …, ⟨v, uₙ⟩] in basis O.
  • Similarly for another vector v′: [⟨v′, u₁⟩, …, ⟨v′, uₙ⟩] in basis O.

🌉 Inner product equals dot product of coordinates

  • The inner product ⟨v, v′⟩ in the abstract space equals the dot product of the coordinate vectors:
    • ⟨v, v′⟩ = ⟨v, u₁⟩⟨v′, u₁⟩ + ⋯ + ⟨v, uₙ⟩⟨v′, uₙ⟩.
  • Why: expand both vectors in the orthonormal basis and use linearity of the inner product; all cross terms vanish because ⟨uᵢ, uⱼ⟩ = δᵢⱼ.
  • Practical consequence: once you have an orthonormal basis, computing inner products becomes as simple as computing dot products of coordinate vectors.
  • Don't confuse: the dot product notation is often used even when the underlying operation is an inner product, because they coincide in an orthonormal basis.

📐 Example with polynomials

📐 Setting up the space

  • Consider the vector space V = span{1, x} (polynomials up to degree 1) with inner product:
    • ⟨p, p′⟩ = ∫₀¹ p(x) p′(x) dx.
  • The obvious basis B = (1, x) is not orthonormal under this inner product.

📐 Constructing an orthonormal basis

  • An orthonormal basis is O = (1, 2√3(x − 1/2)).
  • Verification:
    • ⟨2√3(x − 1/2), 1⟩ = 2√3 ∫₀¹ (x − 1/2) dx = 0 (orthogonality).
    • ⟨x − 1/2, x − 1/2⟩ = ∫₀¹ (x − 1/2)² dx = 1/12 = (1/(2√3))² (unit length after scaling).

📐 Expressing a vector in the orthonormal basis

  • An arbitrary polynomial v = a + bx is expressed in the orthonormal basis O as:
    • v = (a + b/2)·1 + b(x − 1/2).
  • The coordinate vector is [(a + b/2), b/(2√3)] in basis O.
  • Practical use: to compute the inner product of a + bx and a′ + b′x, just take the dot product of their coordinate vectors in O.

14.2.1 Orthonormal Bases and Dot Products

🧭 Overview

🧠 One-sentence thesis

Any vector in a space with an orthonormal basis can be expressed as a sum of projections onto those basis vectors, and the inner product between two vectors equals the dot product of their coordinate vectors in that basis.

📌 Key points (3–5)

  • Expansion formula: In an orthonormal basis, any vector v equals the sum of (v · uᵢ)uᵢ over all basis vectors uᵢ.
  • Inner product becomes dot product: When using an orthonormal basis, the inner product of two vectors in the abstract space equals the dot product of their coordinate column vectors.
  • Change of basis between orthonormal bases: The change-of-basis matrix P from one orthonormal basis to another satisfies P Pᵀ = I, meaning Pᵀ = P⁻¹.
  • Common confusion: Dot products only make sense in Rⁿ; for general vector spaces you must use an inner product, but an orthonormal basis lets you compute the inner product as if it were a dot product of coordinates.

🧮 Expressing vectors in orthonormal bases

🧮 The expansion theorem

Theorem 14.2.1: For an orthonormal basis {u₁, ..., uₙ}, any vector v can be expressed as v = ∑ᵢ (v · uᵢ) uᵢ.

  • The coefficient cᵢ in front of each basis vector uᵢ is simply the inner product (or dot product) v · uᵢ.
  • Why this works: Write v = c₁u₁ + ··· + cₙuₙ, then take the dot product with uᵢ:
    • v · uᵢ = c₁(u₁ · uᵢ) + ··· + cᵢ(uᵢ · uᵢ) + ··· + cₙ(uₙ · uᵢ)
    • Because the basis is orthonormal, uⱼ · uᵢ = 0 when j ≠ i and uᵢ · uᵢ = 1.
    • So v · uᵢ = cᵢ · 1 = cᵢ.
  • Example: If you know v and an orthonormal basis, you can immediately read off the coordinates by computing inner products.

📐 Orthonormality condition

  • An orthonormal basis satisfies ⟨uᵢ, uⱼ⟩ = δᵢⱼ, where δᵢⱼ is 1 if i = j and 0 otherwise.
  • This means:
    • Different basis vectors are orthogonal (inner product zero).
    • Each basis vector has unit length (inner product with itself is 1).

🔗 Connecting inner products and dot products

🔗 Why dot products are not always available

  • In Rⁿ: The standard dot product is defined as (v₁, ..., vₙ) · (w₁, ..., wₙ) = v₁w₁ + ··· + vₙwₙ.
  • In general vector spaces: The dot product "makes no sense" because vectors may not be lists of numbers (e.g., polynomials, functions).
  • Instead, you must choose an "appropriate inner product" suited to the problem.

🔗 How orthonormal bases bridge the gap

Given an orthonormal basis O = (u₁, ..., uₙ) with inner product ⟨·, ·⟩:

  1. Encode vectors as column vectors: Any vector v in V can be written as a column of its coordinates:

    • v = ⟨v, u₁⟩u₁ + ··· + ⟨v, uₙ⟩uₙ
    • Coordinate vector: (⟨v, u₁⟩, ..., ⟨v, uₙ⟩) in basis O.
  2. Dot product of coordinates equals inner product:

    • The dot product of the coordinate vectors (⟨v, u₁⟩, ..., ⟨v, uₙ⟩) · (⟨v′, u₁⟩, ..., ⟨v′, uₙ⟩) equals ⟨v, v′⟩.
    • Why: Expand ⟨v, v′⟩ using linearity:
      • ⟨v, v′⟩ = ⟨∑ᵢ ⟨v, uᵢ⟩uᵢ, ∑ⱼ ⟨v′, uⱼ⟩uⱼ⟩
      • = ∑ᵢ ∑ⱼ ⟨v, uᵢ⟩⟨v′, uⱼ⟩⟨uᵢ, uⱼ⟩
      • Because ⟨uᵢ, uⱼ⟩ = δᵢⱼ, only terms with i = j survive.
      • = ⟨v, u₁⟩⟨v′, u₁⟩ + ··· + ⟨v, uₙ⟩⟨v′, uₙ⟩, which is the dot product formula.
  3. Practical consequence: Once you have an orthonormal basis, you can compute inner products using the simpler dot product of coordinates.

📝 Polynomial example

Setup: V = span{1, x} with inner product ⟨p, p′⟩ = ∫₀¹ p(x)p′(x) dx.

  • The basis B = (1, x) is not orthonormal.
  • An orthonormal basis is O = (1, 2√3(x − 1/2)).
  • Verification:
    • ⟨2√3(x − 1/2), 1⟩ = 2√3 ∫₀¹ (x − 1/2) dx = 0 (orthogonal).
    • ⟨x − 1/2, x − 1/2⟩ = ∫₀¹ (x − 1/2)² dx = 1/12 = (1/(2√3))², so 2√3(x − 1/2) has unit length.

Using the orthonormal basis:

  • An arbitrary polynomial v = a + bx has coordinates in O:
    • v = (a + b/2)·1 + b(x − 1/2) = (a + b/2, b/(2√3)) in basis O.
  • Inner product of a + bx and a′ + b′x via dot product:
    • (a + b/2, b/(2√3)) · (a′ + b′/2, b′/(2√3)) = (a + b/2)(a′ + b′/2) + bb′/12 = aa′ + (1/2)(ab′ + a′b) + (1/3)bb′.
  • Direct computation confirms: ⟨a + bx, a′ + b′x⟩ = ∫₀¹ (a + bx)(a′ + b′x) dx = aa′ + (1/2)(ab′ + a′b) + (1/3)bb′.

Don't confuse: The inner product is defined on the polynomial space; the dot product is computed on coordinate vectors after choosing an orthonormal basis.

🔄 Relating two orthonormal bases

🔄 Change-of-basis matrix between orthonormal bases

Suppose T = {u₁, ..., uₙ} and R = {w₁, ..., wₙ} are two orthonormal bases for Rⁿ.

  • Each wᵢ can be expanded in the T basis: wᵢ = ∑ⱼ (uⱼ · wᵢ)uⱼ.
  • The change-of-basis matrix P from T to R has entries pⱼᵢ = uⱼ · wᵢ.

🔄 The orthogonality property: P Pᵀ = I

Claim: The matrix P satisfies P Pᵀ = Iₙ, so Pᵀ = P⁻¹.

Proof sketch:

  • The (j, k) entry of P Pᵀ is ∑ᵢ (uⱼ · wᵢ)(wᵢ · uₖ).
  • Use the "dirty trick": (uⱼ · wᵢ)(wᵢ · uₖ) = uⱼᵀ(wᵢwᵢᵀ)uₖ.
  • Sum over i: ∑ᵢ (uⱼ · wᵢ)(wᵢ · uₖ) = uⱼᵀ(∑ᵢ wᵢwᵢᵀ)uₖ.
  • Key equality: ∑ᵢ wᵢwᵢᵀ = Iₙ (explained below).
  • Therefore: uⱼᵀ Iₙ uₖ = uⱼᵀ uₖ = δⱼₖ.
  • So P Pᵀ = Iₙ.

🔄 Why ∑ᵢ wᵢwᵢᵀ = Iₙ

  • For any vector v, write v = ∑ⱼ cⱼwⱼ (since {wⱼ} is a basis).
  • Apply (∑ᵢ wᵢwᵢᵀ) to v:
    • (∑ᵢ wᵢwᵢᵀ)v = (∑ᵢ wᵢwᵢᵀ)(∑ⱼ cⱼwⱼ) = ∑ⱼ cⱼ ∑ᵢ wᵢ(wᵢᵀwⱼ).
    • wᵢᵀwⱼ = wᵢ · wⱼ = δᵢⱼ (orthonormality).
    • So ∑ᵢ wᵢ(wᵢᵀwⱼ) = wⱼ.
    • Therefore (∑ᵢ wᵢwᵢᵀ)v = ∑ⱼ cⱼwⱼ = v.
  • Since this holds for all v, ∑ᵢ wᵢwᵢᵀ = Iₙ.

Interpretation: The sum ∑ᵢ wᵢwᵢᵀ is the identity because it reconstructs any vector from its projections onto the orthonormal basis.
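
A numpy sketch of this identity, with an orthonormal basis generated via np.linalg.qr (illustrative, not from the excerpt):

```python
# Sketch: for an orthonormal basis w₁,…,wₙ (the columns of W), Σᵢ wᵢwᵢᵀ = W Wᵀ = Iₙ.
import numpy as np

rng = np.random.default_rng(3)
n = 5
W, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthonormal columns

outer_sum = sum(np.outer(W[:, i], W[:, i]) for i in range(n))
assert np.allclose(outer_sum, np.eye(n))
```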


14.3 Relating Orthonormal Bases

🧭 Overview

🧠 One-sentence thesis

When changing between two orthonormal bases, the change-of-basis matrix is orthogonal (its inverse equals its transpose), which preserves inner products and simplifies computations.

📌 Key points

  • Inner products become dot products: Once you express vectors in an orthonormal basis, computing inner products reduces to computing dot products of coordinate vectors.
  • Orthogonal matrices: A change-of-basis matrix between two orthonormal bases satisfies P inverse equals P transpose, making it an orthogonal matrix.
  • Gram-Schmidt procedure: Any set of linearly independent vectors can be systematically converted into an orthogonal (or orthonormal) basis.
  • Common confusion: "Orthogonal matrix" vs "orthonormal basis"—an orthogonal matrix has columns that form an orthonormal set of vectors, not just orthogonal vectors.
  • Orthogonal decomposition: Any vector v can be split into a component parallel to a given direction u and a component perpendicular to u.

🔄 Inner products and orthonormal bases

🔄 Why orthonormal bases simplify inner products

  • When you have an orthonormal basis u₁, …, uₙ, the inner product of two vectors v and v′ can be computed as a dot product of their coordinate vectors.
  • Specifically: the inner product of v and v′ equals the sum of products of their coordinates: ⟨v, u₁⟩⟨v′, u₁⟩ + ⋯ + ⟨v, uₙ⟩⟨v′, uₙ⟩.
  • This works because the basis is orthonormal: ⟨uᵢ, uⱼ⟩ equals 1 if i = j and 0 otherwise, so all cross terms vanish.

📐 Example with polynomials

The excerpt gives a concrete example in the space V = span{1, x} with inner product defined by integration from 0 to 1.

  • The obvious basis B = (1, x) is not orthonormal.
  • An orthonormal basis is O = (1, 2√3(x − 1/2)).
  • For an arbitrary polynomial v = a + bx, its coordinates in the orthonormal basis are (a + b/2, b/(2√3)).
  • The inner product of a + bx and a′ + b′x can now be computed as a dot product: (a + b/2)(a′ + b′/2) + (bb′)/12, which matches the integral formula.

Don't confuse: The inner product is defined by integration here, not by the usual dot product; the orthonormal basis allows you to compute it as if it were a dot product.

🔀 Change of basis between orthonormal bases

🔀 The change-of-basis matrix

Suppose T = {u₁, …, uₙ} and R = {w₁, …, wₙ} are two orthonormal bases.

  • Each wᵢ can be written as a linear combination of the uⱼ: wᵢ = (wᵢ · u₁)u₁ + ⋯ + (wᵢ · uₙ)uₙ.
  • The change-of-basis matrix from T to R is P = (pⱼᵢ) where pⱼᵢ = uⱼ · wᵢ.

🔑 Orthogonal matrices

Orthogonal matrix: A matrix P is orthogonal if P inverse equals P transpose.

  • The excerpt proves that the product P Pᵀ equals the identity matrix Iₙ, which means Pᵀ = P⁻¹.
  • The key step uses the identity: the sum over i of wᵢwᵢᵀ equals Iₙ (this is shown by checking that it fixes every vector v).
  • Theorem 14.3.1: A change-of-basis matrix relating two orthonormal bases is an orthogonal matrix.

📊 Example in R³

The excerpt provides an example with an orthonormal basis S = (u₁, u₂, u₃) in R³ and the standard basis E.

  • The change-of-basis matrix from E to S is P = (u₁ u₂ u₃), whose columns are the new basis vectors.
  • The inverse is simply Pᵀ, whose rows are the transposes of the basis vectors.
  • Verification: PᵀP = I because matrix multiplication amounts to taking dot products between rows and columns, and the uᵢ are orthonormal.

Important note: The columns of an orthogonal matrix form an orthonormal set of vectors, not just an orthogonal set.

🔷 Symmetric matrices from orthogonal change of basis

If D is a diagonal matrix and P is an orthogonal matrix, then M = PDPᵀ is symmetric.

  • Proof: Mᵀ = (PDPᵀ)ᵀ = (Pᵀ)ᵀDᵀPᵀ = PDPᵀ = M.
  • This shows that orthogonal changes of basis preserve symmetry when starting from a diagonal matrix.

⊥ Orthogonal decomposition

⊥ Splitting a vector into parallel and perpendicular parts

Given a vector v and another vector u not in span{v}, you can construct a new vector v⊥ orthogonal to u:

  • v⊥ := v − (u · v / u · u)u.
  • This is orthogonal to u because u · v⊥ = u · v − (u · v / u · u)(u · u) = 0.
  • The term v‖ := (u · v / u · u)u is the component of v parallel to u.
  • So v = v⊥ + v‖ is an orthogonal decomposition of v.

Don't confuse: This decomposition depends on the choice of u; changing u changes both v⊥ and v‖.

🔨 Building an orthogonal basis from two vectors

If u and v are linearly independent, then {u, v⊥} is an orthogonal basis for span{u, v}.

  • Normalizing these vectors gives an orthonormal basis: {u/|u|, v⊥/|v⊥|}.
  • In R³, you can extend this to a full orthogonal basis {u, v⊥, u × v⊥}.
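
A numpy sketch of this R³ construction, reusing the vectors u = (1,1,0) and v = (1,1,1) that appear in the worked example below:

```python
# Sketch: extend {u, v⊥} to an orthogonal basis of R³ with the cross product.
import numpy as np

u = np.array([1.0, 1.0, 0.0])
v = np.array([1.0, 1.0, 1.0])
v_perp = v - (u @ v) / (u @ u) * u   # orthogonal decomposition step
w = np.cross(u, v_perp)              # orthogonal to both u and v_perp

# All pairwise dot products vanish, so {u, v_perp, w} is an orthogonal basis of R³.
assert np.isclose(u @ v_perp, 0) and np.isclose(u @ w, 0) and np.isclose(v_perp @ w, 0)
```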

🔧 Extending to three vectors

Given a third vector w not in span{u, v}, define:

  • w⊥ := w − (u · w / u · u)u − (v⊥ · w / v⊥ · v⊥)v⊥.
  • The excerpt verifies that u · w⊥ = 0 and v⊥ · w⊥ = 0.
  • So {u, v⊥, w⊥} is an orthogonal basis for span{u, v, w}.

🔄 The Gram-Schmidt procedure

🔄 The algorithm

Given an ordered set (v₁, v₂, …) of linearly independent vectors, the Gram-Schmidt procedure produces an orthogonal basis (v⊥₁, v⊥₂, …) for the same span.

The formulas:

  • v⊥₁ := v₁
  • v⊥₂ := v₂ − (v⊥₁ · v₂ / v⊥₁ · v⊥₁)v⊥₁
  • v⊥₃ := v₃ − (v⊥₁ · v₃ / v⊥₁ · v⊥₁)v⊥₁ − (v⊥₂ · v₃ / v⊥₂ · v⊥₂)v⊥₂
  • In general: v⊥ᵢ := vᵢ − sum over j < i of (v⊥ⱼ · vᵢ / v⊥ⱼ · v⊥ⱼ)v⊥ⱼ

🧩 How it works

  • Each v⊥ᵢ depends on all previous v⊥ⱼ for j < i, allowing you to build up the orthogonal set inductively.
  • The resulting set {v⊥₁, v⊥₂, …} is linearly independent and orthogonal, and spans the same space as {v₁, v₂, …}.
  • To get an orthonormal basis, divide each v⊥ᵢ by its length.

Important: The order of the input vectors matters; changing the order gives a different orthogonal basis.

📝 Example in R³

The excerpt applies Gram-Schmidt to the set {(1,1,1), (1,1,0), (3,1,1)}.

  • Reorder to (v₁, v₂, v₃) = ((1,1,0), (1,1,1), (3,1,1)) to simplify computations (choosing the vector with the most zeros first).
  • v⊥₁ := v₁ = (1,1,0).
  • v⊥₂ := (1,1,1) − (2/2)(1,1,0) = (0,0,1).
  • v⊥₃ := (3,1,1) − (4/2)(1,1,0) − (1/1)(0,0,1) = (1,−1,0).
  • The orthogonal basis is {(1,1,0), (0,0,1), (1,−1,0)}.
  • Normalizing gives the orthonormal basis {(1/√2, 1/√2, 0), (0,0,1), (1/√2, −1/√2, 0)}.

Practical tip: Choose the order of input vectors to minimize computation; putting vectors with more zeros first often helps.

🏛️ Historical note

The algorithm is named after Gram (who worked at a Danish insurance company over a hundred years ago) and Schmidt (a student of the famous German mathematician Hilbert).

🔢 QR decomposition (preview)

🔢 Connection to matrix factorization

The excerpt mentions that just as a matrix M can be decomposed into lower and upper triangular matrices M = LU, the Gram-Schmidt procedure is related to another decomposition called QR decomposition.

  • The excerpt does not provide details, but signals that Section 14.5 will cover this topic.
  • This suggests that Gram-Schmidt has applications beyond just finding orthonormal bases—it can be used to factor matrices in useful ways.

Note: The excerpt ends before explaining QR decomposition, so no further details are available here.

14.4 Gram-Schmidt & Orthogonal Complements

🧭 Overview

🧠 One-sentence thesis

The Gram-Schmidt procedure systematically transforms any linearly independent set of vectors into an orthogonal (or orthonormal) basis for the same span, enabling applications like QR decomposition.

📌 Key points (3–5)

  • What Gram-Schmidt does: converts a linearly independent set of vectors into an orthogonal basis for their span, preserving the span.
  • How it works: each new orthogonal vector is built by subtracting projections onto all previously computed orthogonal vectors.
  • Order matters: the algorithm requires an ordered input set; changing the order produces a different orthogonal basis.
  • Common confusion: the procedure builds each vector inductively—each v⊥ᵢ depends on all earlier v⊥ⱼ (j < i), not on the original vectors alone.
  • Why it matters: Gram-Schmidt enables QR decomposition, which is useful for solving linear systems, eigenvalue problems, and least squares approximations.

🔧 Building orthogonal vectors step-by-step

🔧 The basic idea: removing projections

The excerpt shows how to construct an orthogonal vector w⊥ from a third vector w, given two already-orthogonal vectors u and v⊥:

w⊥ := w − (u · w / u · u) u − (v⊥ · w / v⊥ · v⊥) v⊥

  • Start with the original vector w.
  • Subtract the projection of w onto u.
  • Subtract the projection of w onto v⊥.
  • The result w⊥ is orthogonal to both u and v⊥.

Why this works: The excerpt verifies that u · w⊥ = 0 and v⊥ · w⊥ = 0 by expanding the dot products and using the fact that u is orthogonal to v⊥.

Example: If you have three linearly independent vectors u, v, w, you first make v⊥ orthogonal to u, then make w⊥ orthogonal to both u and v⊥. The set {u, v⊥, w⊥} is an orthogonal basis for span{u, v, w}.

🔄 Inductive construction

  • Each orthogonal vector v⊥ᵢ depends on all previously computed orthogonal vectors v⊥₁, v⊥₂, ..., v⊥ᵢ₋₁.
  • This allows you to build up the orthogonal set algorithmically, one vector at a time.
  • The span is preserved: span{v⊥₁, v⊥₂, ...} = span{v₁, v₂, ...}.

Don't confuse: You do not subtract projections onto the original vectors vⱼ; you subtract projections onto the already-orthogonalized vectors v⊥ⱼ.

📐 The Gram-Schmidt procedure

📐 The general algorithm

Given an ordered set (v₁, v₂, ...) of linearly independent vectors, define:

  • v⊥₁ := v₁
  • v⊥₂ := v₂ − (v⊥₁ · v₂ / v⊥₁ · v⊥₁) v⊥₁
  • v⊥₃ := v₃ − (v⊥₁ · v₃ / v⊥₁ · v⊥₁) v⊥₁ − (v⊥₂ · v₃ / v⊥₂ · v⊥₂) v⊥₂
  • ...
  • v⊥ᵢ := vᵢ − (v⊥₁ · vᵢ / v⊥₁ · v⊥₁) v⊥₁ − (v⊥₂ · vᵢ / v⊥₂ · v⊥₂) v⊥₂ − ... − (v⊥ᵢ₋₁ · vᵢ / v⊥ᵢ₋₁ · v⊥ᵢ₋₁) v⊥ᵢ₋₁

The result is a linearly independent, orthogonal set {v⊥₁, v⊥₂, ...} that forms an orthogonal basis for the same vector space.

🔀 Order dependence

  • The algorithm requires an ordered set to uniquely specify the result.
  • Changing the order of the input vectors will produce a different orthogonal basis.
  • Practical tip: The excerpt suggests choosing the vector with the most zeros to be first, because it is used most often in the algorithm and simplifies computations.

🎯 From orthogonal to orthonormal

  • The Gram-Schmidt procedure produces an orthogonal basis.
  • To obtain an orthonormal basis, divide each orthogonal vector by its length.
  • Example: The excerpt shows that after Gram-Schmidt on three vectors in R³, dividing each result by its length yields an orthonormal basis.

🧮 Worked example in R³

🧮 Setup and ordering choice

The excerpt applies Gram-Schmidt to the linearly independent set {(1,1,1), (1,1,0), (3,1,1)}.

Ordering strategy: Choose (v₁, v₂, v₃) := ((1,1,0), (1,1,1), (3,1,1)) because the first vector has the most zeros, simplifying later computations.

🧮 Step-by-step computation

  1. Set v⊥₁ := v₁ = (1,1,0).
  2. Compute v⊥₂ := (1,1,1) − (2/2)(1,1,0) = (0,0,1).
  3. Compute v⊥₃ := (3,1,1) − (4/2)(1,1,0) − (1/1)(0,0,1) = (1,−1,0).

Result: The set {(1,1,0), (0,0,1), (1,−1,0)} is an orthogonal basis for R³.

Orthonormal version: Divide each by its length to get {(1/√2, 1/√2, 0), (0,0,1), (1/√2, −1/√2, 0)}.

🔢 QR decomposition

🔢 What QR decomposition is

M = QR, where Q is an orthogonal matrix and R is an upper triangular matrix.

  • The columns of Q are obtained by applying Gram-Schmidt to the columns of M and normalizing.
  • The matrix R records the steps of the Gram-Schmidt procedure.
  • QR decompositions are useful for solving linear systems, eigenvalue problems, and least squares approximations.

🔢 How R encodes the Gram-Schmidt steps

The excerpt shows a 3×3 example where:

  • The first matrix in the decomposition has orthogonal columns (produced by Gram-Schmidt).
  • The matrix R on the right "undoes" the Gram-Schmidt transformations so that the product QR equals the original M.
  • R is almost the identity matrix, with entries above the diagonal that record the projection coefficients subtracted during Gram-Schmidt.

Example structure: M = (first matrix with orthogonal columns) × (upper triangular R).

🔢 Building Q and R together

The excerpt demonstrates:

  1. Replace the second column of M with the Gram-Schmidt result from the first two columns.
  2. Record the projection coefficient (e.g., 1/5) in the (1,2) entry of R.
  3. Repeat for the third column, recording coefficients in the first row of R.
  4. The first matrix now has mutually orthogonal columns; normalize them to get Q (the excerpt notes this final step is needed for a "bona fide" orthogonal matrix).

Don't confuse: The intermediate matrix has orthogonal columns but not necessarily unit length; the final Q must have orthonormal columns.

14.4.1 The Gram-Schmidt Procedure

🧭 Overview

🧠 One-sentence thesis

The Gram-Schmidt procedure algorithmically transforms any linearly independent set of vectors into an orthogonal (or orthonormal) basis that spans the same vector space.

📌 Key points (3–5)

  • What the procedure does: builds an orthogonal basis from a linearly independent set while preserving the span.
  • How it works: each new orthogonal vector is constructed by subtracting projections onto all previously computed orthogonal vectors.
  • Order matters: changing the order of the input vectors produces a different orthogonal basis.
  • Common confusion: the first vector is used most often in the algorithm, so choosing the vector with the most zeros first can simplify computation.
  • Practical outcome: the resulting orthogonal basis can be normalized (divided by vector lengths) to obtain an orthonormal basis.

🔧 How the algorithm works

🔧 Inductive construction

The excerpt explains that each orthogonal vector v⊥ᵢ depends on all previously computed orthogonal vectors v⊥ⱼ for every j < i.

  • The procedure builds up the set {v⊥₁, v⊥₂, ...} step by step.
  • At each step, you subtract off components in the directions of all earlier orthogonal vectors.
  • This ensures that the new vector is orthogonal to all previous ones.

Why this works:

  • Subtracting projections removes any overlap with earlier directions.
  • The resulting vectors remain linearly independent and span the same space as the original set.

📐 The projection formula

The excerpt shows the pattern:

  • Each v⊥ᵢ is computed by taking the original vector vᵢ and subtracting its projections onto all earlier v⊥ⱼ.
  • The notation involves dot products and division by lengths (or squared lengths).

Example from the excerpt:

  • v⊥₂ := v₂ − (projection onto v⊥₁)
  • v⊥₃ := v₃ − (projection onto v⊥₁) − (projection onto v⊥₂)

Don't confuse:

  • The projections are onto the new orthogonal vectors v⊥ⱼ, not the original vectors vⱼ.

🎯 Practical considerations

🎯 Order dependence

The set of vectors you start out with needs to be ordered to uniquely specify the algorithm; changing the order of the vectors will give a different orthogonal basis.

  • The algorithm is not symmetric: the first vector is used most frequently.
  • You may need to choose an ordering yourself.
  • Strategy from the excerpt: choose the vector with the most zeros to be first to simplify computations.

🎯 From orthogonal to orthonormal

The excerpt demonstrates two stages:

  1. Orthogonal basis: vectors are perpendicular but not necessarily unit length.
  2. Orthonormal basis: divide each orthogonal vector by its length.

Example: The excerpt shows the orthogonal basis {(1,1,0), (0,0,1), (1,−1,0)} for R³, then normalizes each vector by dividing by √2, 1, and √2 respectively.

📝 Worked example: R³

📝 Setup and ordering choice

The excerpt applies Gram-Schmidt to the linearly independent set {(1,1,1), (1,1,0), (3,1,1)} in R³.

Ordering decision:

  • The algorithm uses the first vector most often.
  • The excerpt chooses (1,1,0) as v₁ because it has the most zeros.
  • Final order: v₁ = (1,1,0), v₂ = (1,1,1), v₃ = (3,1,1).

📝 Step-by-step computation

  1. First vector: v⊥₁ := v₁ = (1,1,0) (no change).
  2. Second vector: v⊥₂ := (1,1,1) − (projection onto v⊥₁) = (0,0,1).
    • The excerpt shows the projection coefficient as 2/2.
  3. Third vector: v⊥₃ := (3,1,1) − (projection onto v⊥₁) − (projection onto v⊥₂) = (1,−1,0).
    • The excerpt shows projection coefficients 4/2 and 1/1.

Result:

  • Orthogonal basis: {(1,1,0), (0,0,1), (1,−1,0)}.
  • Orthonormal basis: {(1/√2, 1/√2, 0), (0,0,1), (1/√2, −1/√2, 0)}.

🏛️ Historical note

🏛️ Origin of the name

The excerpt mentions:

  • Gram: worked at a Danish insurance company over one hundred years ago.
  • Schmidt: was a student of Hilbert, the famous German mathematician.

This context explains why the procedure is called "Gram–Schmidt orthogonalization."


14.5 QR Decomposition

🧭 Overview

🧠 One-sentence thesis

QR decomposition expresses a matrix as the product of an orthogonal matrix Q and an upper triangular matrix R by applying the Gram–Schmidt procedure to the columns of the original matrix, and this decomposition is useful for solving linear systems, eigenvalue problems, and least squares approximations.

📌 Key points (3–5)

  • What QR decomposition is: a factorization M = QR where Q is orthogonal (orthonormal columns) and R is upper triangular.
  • How to build it: apply Gram–Schmidt to the columns of M to create orthogonal vectors, then normalize them to form Q; record the Gram–Schmidt steps in R so that QR = M.
  • Key normalization step: after Gram–Schmidt produces orthogonal columns, divide each column by its length (to make Q orthonormal) and multiply the corresponding row of R by the same length.
  • Common confusion: the first matrix after Gram–Schmidt has orthogonal columns, but Q must have orthonormal columns—don't forget the final normalization step.
  • Why it matters: QR decompositions are useful for solving linear systems, eigenvalue problems, and least squares approximations.

🔧 Building the QR decomposition

🔧 The Gram–Schmidt backbone

  • Think of the columns of M as vectors and use Gram–Schmidt to build an orthonormal basis from them.
  • These orthonormal vectors become the columns of the orthogonal matrix Q.
  • The matrix R records the steps of the Gram–Schmidt procedure so that the product QR equals M.

📝 Step-by-step construction

The excerpt walks through a concrete example with a 3×3 matrix:

  1. First column: keep it as is.
  2. Second column: replace it with the orthogonal vector produced by Gram–Schmidt from the first two columns of M.
  3. Third column: use Gram–Schmidt to produce a vector orthogonal to the first two.
  4. Record the steps: use R (initially almost the identity matrix) to "undo" the changes so that multiplying the two matrices gives back M.

Example from the excerpt: the second column of M is replaced by a vector orthogonal to the first column, and R gets a +1/5 entry in the first row, second column, which undoes this change when the matrices are multiplied.

🔄 The normalization step

After Gram–Schmidt, the first matrix has mutually orthogonal column vectors, but not yet orthonormal.

An orthogonal matrix is comprised of orthonormal vectors (unit length).

  • How to fix: divide each column of the first matrix by its length.
  • Balance the equation: multiply the corresponding row of the second matrix by the same amount.
  • This ensures QR still equals M while making Q truly orthogonal (orthonormal).

Don't confuse: "orthogonal columns" (perpendicular) vs. "orthonormal columns" (perpendicular and unit length)—Q requires the latter.

🧮 Understanding the R matrix

🧮 What R records

  • R is upper triangular.
  • Entry (i, j) of R equals the dot product of the i-th column of Q with the j-th column of M.
  • The excerpt notes: "Some people memorize this fact and use it as a recipe for computing QR decompositions."
  • This dot-product property is a useful check and an alternative way to think about constructing R.
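
A small numpy sketch of this construction under an assumed 3×3 example matrix (not the excerpt's): it builds Q by Gram–Schmidt with normalization, recovers R as QᵀM, and checks that R is upper triangular and that QR = M.

```python
import numpy as np

M = np.array([[1.0, 1.0, 3.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])   # any matrix with linearly independent columns works here

# Gram–Schmidt on the columns of M, normalizing each result to build Q.
Q = np.zeros_like(M)
for j in range(M.shape[1]):
    w = M[:, j].copy()
    for k in range(j):
        w -= (w @ Q[:, k]) * Q[:, k]   # remove the components along earlier orthonormal columns
    Q[:, j] = w / np.linalg.norm(w)

R = Q.T @ M                            # entry (i, j) is the dot product of column i of Q with column j of M
print(np.allclose(np.tril(R, -1), 0))  # True: R is upper triangular
print(np.allclose(Q @ R, M))           # True: QR reproduces M
```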

🌐 Geometric interpretation

The excerpt provides a geometric view:

  • Start with three vectors given by the columns of M.
  • Rotate them so that:
    • The first lies along the x-axis.
    • The second lies in the xy-plane.
    • The third lies in some other generic direction.
  • This rotation is what the QR decomposition accomplishes.

Example: in the worked example, the third vector ends up in the yz-plane.

🧩 Orthogonal complements (related concept)

🧩 Subspace sums and direct sums

The excerpt introduces notation for combining subspaces:

Sum of U and V: U + V := span(U ∪ V) = {u + v | u ∈ U, v ∈ V}

Direct sum: If U ∩ V = {0_W} then U ⊕ V := span(U ∪ V) = {u + v | u ∈ U, v ∈ V}

Key distinction:

  • When U ∩ V = {0_W}, U + V = U ⊕ V (direct sum).
  • When U ∩ V ≠ {0_W}, U + V ≠ U ⊕ V (not a direct sum).

Why direct sums matter: If w ∈ U ⊕ V, there is only one way to write w as the sum of a vector in U and a vector in V (uniqueness property).

⊥ Orthogonal complement definition

Orthogonal complement of U in W: U⊥ := {w ∈ W | w · u = 0 for all u ∈ U}

  • Read as "U-perp."
  • This is the set of all vectors in W orthogonal to every vector in U.
  • For a general inner product: U⊥ := {w ∈ W | ⟨w, u⟩ = 0 for all u ∈ U}.

🏗️ Key theorem about orthogonal complements

The excerpt states:

Theorem 14.6.2: Let U be a subspace of a finite-dimensional vector space W. Then U⊥ is a subspace of W, and W = U ⊕ U⊥.

What this means:

  • U⊥ is itself a subspace (closure holds).
  • U ∩ U⊥ = {0} (only the zero vector is in both).
  • Every vector w ∈ W can be uniquely written as w = u + u⊥ where u ∈ U and u⊥ ∈ U⊥.

How to construct the decomposition: Let e₁, …, eₙ be an orthonormal basis for U. For any w ∈ W:

  • Set u = (w · e₁)e₁ + ⋯ + (w · eₙ)eₙ ∈ U.
  • Set u⊥ = w − u.
  • Then u⊥ ∈ U⊥ and w = u + u⊥.
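
A short numpy check of this construction, taking U to be the xy-plane in R³ with the standard basis vectors as its orthonormal basis (an illustrative choice, not from the excerpt):

```python
import numpy as np

# Orthonormal basis for U = xy-plane in R^3 (illustrative choice).
e = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]

w = np.array([3.0, -2.0, 5.0])
u = sum((w @ ei) * ei for ei in e)   # component of w in U
u_perp = w - u                       # component of w in U-perp

print(u, u_perp)                                    # [ 3. -2.  0.] [0. 0. 5.]
print(all(np.isclose(u_perp @ ei, 0) for ei in e))  # True: u_perp is orthogonal to U
```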

📐 Examples of orthogonal complements

Example (R³): Consider any plane P through the origin in R³. Then P⊥ is the line through the origin orthogonal to P.

  • If P is the xy-plane, then R³ = P ⊕ P⊥ = {(x, y, 0) | x, y ∈ R} ⊕ {(0, 0, z) | z ∈ R}.

Example (R⁴): Let L = span{(1, 1, 1, 1)} be a line in R⁴. Then:

  • L⊥ = {(x, y, z, w) ∈ R⁴ | x + y + z + w = 0}, a 3-dimensional subspace.
  • The excerpt shows how to use Gram–Schmidt to find an orthogonal (then orthonormal) basis for L⊥.
  • R⁴ = L ⊕ L⊥, a decomposition into a line and its three-dimensional orthogonal complement.

🔁 Involution property

The excerpt notes: for any subspace U, (U⊥)⊥ is just U again.

An involution is any mathematical operation which, performed twice, does nothing.

So ⊥ is an involution on the set of subspaces of a vector space.

105

Orthogonal Complements

14.6 Orthogonal Complements

🧭 Overview

🧠 One-sentence thesis

Every finite-dimensional vector space can be decomposed into a subspace and its orthogonal complement, which together span the entire space without overlap.

📌 Key points (3–5)

  • What orthogonal complement is: the set of all vectors orthogonal to every vector in a given subspace U, denoted U-perp.
  • Key theorem: For any subspace U of a finite-dimensional space W, the orthogonal complement U-perp is also a subspace, and W = U ⊕ U-perp (direct sum).
  • Involution property: Taking the orthogonal complement twice returns the original subspace: (U-perp)-perp = U.
  • Common confusion: U-perp is not just "vectors perpendicular to U"; it must be perpendicular to every vector in U, not just a basis.
  • Why it matters: Orthogonal complements allow any vector to be uniquely decomposed into components parallel and perpendicular to a subspace.

🔍 Definition and basic concept

🔍 What U-perp means

Orthogonal complement U-perp: the set of all vectors w in W such that the dot product w · u = 0 for all u in U.

  • Notation: U-perp is read "U-perp" or "U orthogonal."
  • The definition assumes the dot product as the inner product; for a general inner product, replace w · u with ⟨w, u⟩.
  • Key requirement: orthogonality to every vector in U, not just one or a few.

🎯 Geometric intuition

  • Example from excerpt: Consider any plane P through the origin in R³. Then P-perp is the line through the origin orthogonal to P.
    • If P is the xy-plane, then R³ = P ⊕ P-perp = {(x, y, 0) | x, y in R} ⊕ {(0, 0, z) | z in R}.
  • Another example: Consider any line L through the origin in R⁴. Then L-perp is a 3-dimensional subspace orthogonal to L.
    • If L = span of (1, 1, 1, 1), then L-perp = {(x, y, z, w) in R⁴ | x + y + z + w = 0}.

🧩 Main theorem and proof structure

🧩 Three claims of the theorem

The theorem states that for subspace U of finite-dimensional W:

  1. U-perp is a subspace of W.
  2. U ∩ U-perp = {0} (they intersect only at the zero vector).
  3. W = U ⊕ U-perp (every vector in W can be written as u + u-perp).

✅ Why U-perp is a subspace

  • Closure check: Suppose v, w are in U-perp. Then v · u = 0 and w · u = 0 for all u in U.
  • For any scalars α, β: u · (αv + βw) = α(u · v) + β(u · w) = 0 for all u in U.
  • Therefore αv + βw is in U-perp, proving closure under linear combinations.

🔒 Why U and U-perp intersect only at zero

  • Suppose u is in both U and U-perp.
  • Then u · u = 0 (since u is in U-perp and must be orthogonal to every vector in U, including itself).
  • But u · u = 0 implies u = 0.
  • Don't confuse: U and U-perp are not disjoint; both contain the zero vector, and the claim is that zero is their only shared vector.

🏗️ Why every vector decomposes into U ⊕ U-perp

  • Construction: Let e₁, ..., eₙ be an orthonormal basis for U.
  • For any w in W, define:
    • u = (w · e₁)e₁ + ... + (w · eₙ)eₙ (this is in U)
    • u-perp = w - u
  • The vector u-perp is in U-perp (can be checked using the Gram-Schmidt procedure).
  • Then w = u + u-perp, so w is in U ⊕ U-perp.
  • Note: This step requires W to be finite-dimensional.

🔄 Involution property

🔄 Double complement returns to the original

  • For any subspace U, the orthogonal complement of U-perp is U itself: (U-perp)-perp = U.
  • This makes "perp" an involution: a mathematical operation that, performed twice, does nothing (returns to the starting point).
  • Example: If P is a plane in R³ and L is the perpendicular line, then L-perp = P.

📐 Worked example: finding an orthonormal basis for L-perp

📐 Setup

  • Let L = span of (1, 1, 1, 1) be a line in R⁴.
  • Then L-perp = {(x, y, z, w) in R⁴ | x + y + z + w = 0}.
  • The excerpt finds an orthonormal basis for L-perp using Gram-Schmidt.

🛠️ Step-by-step construction

  1. Start with a basis for L-perp: The set {(1, -1, 0, 0), (1, 0, -1, 0), (1, 0, 0, -1)} forms a basis.
  2. Apply Gram-Schmidt:
    • Set v-perp₁ = v₁ = (1, -1, 0, 0).
    • Compute v-perp₂ = v₂ - projection of v₂ onto v-perp₁ = (1/2, 1/2, -1, 0).
    • Compute v-perp₃ = v₃ - projection onto v-perp₁ - projection onto v-perp₂ = (1/3, 1/3, 1/3, -1).
  3. Normalize: Divide each vector by its length to get an orthonormal basis.
  4. Result: R⁴ = L ⊕ L-perp, a decomposition into a line and its 3-dimensional orthogonal complement.
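
A quick numpy check of the vectors produced in step 2 (a verification sketch, not part of the excerpt):

```python
import numpy as np

v1 = np.array([1.0, -1.0, 0.0, 0.0])
v2 = np.array([0.5,  0.5, -1.0, 0.0])
v3 = np.array([1/3,  1/3,  1/3, -1.0])

for v in (v1, v2, v3):
    print(v.sum())                 # 0.0: each vector satisfies x + y + z + w = 0, so it lies in L-perp
print(v1 @ v2, v1 @ v3, v2 @ v3)   # all (numerically) zero: the vectors are mutually orthogonal
```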

🔍 Key observation

  • The dimension of L is 1, and the dimension of L-perp is 3.
  • In general, if U has dimension k in an n-dimensional space, U-perp has dimension n - k.
106

14.7 Review Problems

14.7 Review Problems

🧭 Overview

🧠 One-sentence thesis

This collection of review problems consolidates techniques for working with orthogonal and orthonormal bases, the Gram-Schmidt procedure, symmetric matrices, and their eigenvalue properties in various vector spaces.

📌 Key points (3–5)

  • Orthogonal vs orthonormal bases: orthogonal bases have perpendicular vectors but not necessarily unit length; orthonormal bases require both perpendicularity and unit length.
  • Gram-Schmidt procedure: a systematic method to convert any basis into an orthogonal or orthonormal basis, applicable to both finite-dimensional spaces and function spaces with inner products.
  • Symmetric matrices always have real eigenvalues: the discriminant formula for 2×2 symmetric matrices shows why eigenvalues must be real, and this generalizes to all symmetric matrices.
  • Common confusion: orthogonal matrices (Q) preserve inner products, but the relationship between eigenvectors/eigenvalues and orthogonal transformations requires careful analysis.
  • Orthogonal complements: for any subspace U within W, the orthogonal complement U⊥ is also a subspace, and direct sum decompositions depend on the inner product structure.

🔧 Working with orthogonal bases

🔧 Finding coefficients in orthogonal bases

Problem 2 asks: if S = {v₁, ..., vₙ} is an orthogonal (not orthonormal) basis for Rⁿ, and any vector v can be written as v = sum of cᵢvᵢ, how do you find the constants cᵢ?

  • Key difference from orthonormal bases: orthonormal bases allow immediate coefficient extraction via dot products because each basis vector has length 1.
  • For orthogonal bases, the vectors are perpendicular but may have different lengths.
  • The hint suggests using the orthogonality property: dot v with each basis vector to isolate the corresponding coefficient.
  • Example approach: take v · vⱼ to eliminate all terms except cⱼ(vⱼ · vⱼ), then solve for cⱼ.
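
A minimal numpy sketch of this coefficient formula, using an assumed orthogonal (but not orthonormal) basis of R²:

```python
import numpy as np

# Orthogonal but not orthonormal basis of R^2 (illustrative choice).
basis = [np.array([2.0, 0.0]), np.array([0.0, 3.0])]
v = np.array([4.0, 9.0])

# c_j = (v · v_j) / (v_j · v_j): dotting v with v_j kills every other term.
coeffs = [(v @ b) / (b @ b) for b in basis]
print(coeffs)                                                     # [2.0, 3.0]
print(np.allclose(sum(c * b for c, b in zip(coeffs, basis)), v))  # True: the combination recovers v
```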

📐 Constructing orthonormal bases from two vectors

Problem 3 guides construction of an orthonormal basis for R³ starting from two linearly independent vectors u and v:

(a) The perpendicular component v⊥

  • The formula v⊥ := v - (u·v)/(u·u) u removes the component of v parallel to u.
  • This vector lies in the plane P = span{u, v} because it is a linear combination of u and v.

(b) Angle between v⊥ and u

  • The construction ensures v⊥ is perpendicular to u.
  • The cosine of the angle between perpendicular vectors is 0.

(c) Finding a third perpendicular vector

  • Need a vector perpendicular to both u and v⊥.
  • The hint suggests using properties of the plane spanned by u and v⊥.

(d) Testing with concrete vectors

  • The problem provides u = (1, 2, 0) and v = (0, 1, 1) for verification.
  • Apply the abstract formulas to check the procedure works.

🔄 Gram-Schmidt procedure applications

🔄 Systematic orthogonalization in R⁴

Problem 4 outlines a step-by-step procedure to find an orthonormal basis for R⁴ that includes (1, 1, 1, 1):

  • Step (a): find v₂ perpendicular to v₁ by solving v₁ᵀx = 0.
  • Step (b): find v₃ perpendicular to v₁ and v₂ by solving (v₁ᵀ, v₂ᵀ)x = 0.
  • Step (c): find v₄ perpendicular to v₁, v₂, v₃ by solving (v₁ᵀ, v₂ᵀ, v₃ᵀ)x = 0.
  • Step (d): normalize all four vectors by dividing each by its length.
  • Key technique: use Gaussian elimination to pick specific vectors from the solution sets.
  • The procedure specifies which variable (x₂, x₃, etc.) should be the coefficient at each step.
  • Don't confuse: finding perpendicular vectors vs. normalizing them—these are separate steps.

📊 Gram-Schmidt on function spaces

Problems 5 and 6 apply Gram-Schmidt to vector spaces of functions:

Problem 5: Polynomial space

  • Vector space V = span{1, x, x², x³}
  • Inner product: f · g := integral from 0 to 1 of f(x)g(x)dx
  • Apply Gram-Schmidt to {1, x, x², x³}

Problem 6: Trigonometric functions

  • Vector space V = span{sin(x), sin(2x), sin(3x)}
  • Inner product: f · g := integral from 0 to 2π of f(x)g(x)dx
  • The problem asks to extend the pattern to build an orthonormal basis for span{sin(nx) | n ∈ N}

Why function spaces matter: the same orthogonalization techniques work for infinite-dimensional spaces when equipped with appropriate inner products.
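
For Problem 5's inner product, a sketch using sympy for the exact integrals (sympy and the helper inner are assumptions for illustration, not the excerpt's method):

```python
import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    # inner product from Problem 5: integral of f*g over [0, 1]
    return sp.integrate(f * g, (x, 0, 1))

basis = [sp.Integer(1), x, x**2, x**3]
orthogonal = []
for v in basis:
    w = v - sum(inner(v, u) / inner(u, u) * u for u in orthogonal)
    orthogonal.append(sp.expand(w))

orthonormal = [sp.simplify(w / sp.sqrt(inner(w, w))) for w in orthogonal]
print(orthogonal)   # 1, x - 1/2, x**2 - x + 1/6, x**3 - 3*x**2/2 + 3*x/5 - 1/20
```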

🔢 Concrete Gram-Schmidt example

Problem 8 asks to carefully write out the Gram-Schmidt procedure for the vectors (1,1,1), (1,-1,1), (1,1,-1):

  • This is a worked example to practice the algorithm step-by-step.
  • Additional question: can the second vector obtained be rescaled to have integer components?
  • Example: if the procedure produces a vector with fractional entries, find an integer multiple that clears denominators.

🔐 Properties of orthogonal matrices

🔐 Inner product preservation

Problem 7(a) asks to show that orthogonal n×n matrices Q preserve inner products:

If Q is orthogonal, then u · v = (Qu) · (Qv) for any u, v ∈ Rⁿ.

  • What this means: applying Q to both vectors does not change their dot product.
  • This property characterizes orthogonal matrices geometrically: they preserve angles and lengths.
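
A tiny numerical illustration with a rotation matrix (the specific Q and vectors are assumptions, not from the excerpt):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation, hence orthogonal

u = np.array([1.0, 2.0])
v = np.array([-3.0, 0.5])
print(np.isclose(u @ v, (Q @ u) @ (Q @ v)))   # True: the inner product is preserved
print(np.allclose(Q.T @ Q, np.eye(2)))        # True: Q^T Q = I
```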

🔄 Outer products and eigenstructure

Problem 7 continues with related questions:

(b) Outer product preservation

  • Does Q preserve the outer product?
  • Don't confuse inner products (scalars) with outer products (matrices).

(c) Eigenvalues of special matrices

  • If {u₁, ..., uₙ} is orthonormal and M = sum of λᵢuᵢuᵢᵀ, what are the eigenvalues and eigenvectors?
  • The structure suggests the uᵢ are eigenvectors with eigenvalues λᵢ.

(d) Effect of orthogonal transformation

  • How do eigenvectors and eigenvalues change if we replace {u₁, ..., uₙ} by {Qu₁, ..., Quₙ}?
  • This tests understanding of how orthogonal transformations interact with eigenstructure.

🧮 Linear independence and Gram-Schmidt

🧮 Independence preservation

Problem 9 explores how Gram-Schmidt preserves linear independence:

(a) Two vectors

  • If u and v are linearly independent, show u and v⊥ are also linearly independent.
  • Explain why {u, v⊥} is a basis for span{u, v}.
  • Hint provided suggests a specific approach.

(b) Three vectors

  • Extend the argument to three independent vectors u, v, w.
  • Show that u, v⊥, w⊥ (as defined by Gram-Schmidt) remain independent.

Why this matters: Gram-Schmidt transforms a basis into an orthogonal basis without losing the spanning property.

⚠️ When Gram-Schmidt fails

Problem 11 asks: given any three vectors u, v, w, when do v⊥ or w⊥ vanish?

  • Key insight: if v⊥ = 0, then v is already in span{u} (linearly dependent).
  • Similarly, w⊥ = 0 means w is in span{u, v⊥}.
  • Example: if the original vectors are not linearly independent, Gram-Schmidt produces zero vectors at some step.

🏗️ Matrix factorizations and decompositions

🏗️ QR factorization

Problem 10 asks to find the QR factorization of a specific 3×3 matrix:

  • What QR factorization is: decompose M = QR where Q is orthogonal and R is upper triangular.
  • The Gram-Schmidt procedure on the columns of M produces Q.
  • R encodes the coefficients from the orthogonalization process.

🧩 Subspace complements

Problem 12: use the subspace theorem to check that U⊥ is a subspace of W when U is a subspace of W.

  • Subspace theorem requirements: must contain zero vector, closed under addition, closed under scalar multiplication.
  • For orthogonal complement: if x and y are both perpendicular to all vectors in U, then so is any linear combination.

🔲 Symmetric and antisymmetric matrices

Problem 13 explores the decomposition of n×n matrices:

Definitions

  • Sₙ = space of n×n symmetric matrices (Mᵀ = M)
  • Aₙ = space of n×n antisymmetric matrices (Mᵀ = -M)
  • Mₙₙ = space of all n×n matrices

Questions

  • What are dim Mₙₙ, dim Sₙ, and dim Aₙ?
  • Show that Mₙₙ = Sₙ + Aₙ (every matrix is a sum of symmetric and antisymmetric parts).
  • Define inner product M · N = tr(MN) (trace of the product).
  • Is Aₙ⊥ = Sₙ? Is Mₙₙ = Sₙ ⊕ Aₙ (direct sum)?

Key distinction: sum vs. direct sum—direct sum requires the intersection to be {0}.

🎯 Symmetric matrices and eigenvalues

🎯 Real eigenvalues of symmetric matrices

The excerpt proves that 2×2 symmetric matrices always have real eigenvalues:

Definition: A matrix M is symmetric if Mᵀ = M.

Example 140: General 2×2 symmetric matrix

  • Consider the matrix with entries a, b, b, d (symmetric because both off-diagonal entries are b).
  • Characteristic polynomial: P(λ) = λ² − (a+d)λ + (ad − b²)
  • Eigenvalue formula: λ = (a+d)/2 ± sqrt(b² + ((a−d)/2)²)
  • Key observation: the quantity under the square root, b² + ((a−d)/2)², is a sum of squares and therefore never negative.
  • Therefore eigenvalues must be real (no imaginary part from the square root).

Why this matters: symmetric matrices arise naturally in applications (like the distance table example), and real eigenvalues guarantee certain geometric properties.
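
A numerical spot-check of the eigenvalue formula above (the entries a, b, d are chosen arbitrarily for illustration):

```python
import numpy as np

a, b, d = 2.0, 5.0, -1.0
M = np.array([[a, b],
              [b, d]])                 # symmetric by construction

root = np.sqrt(b**2 + ((a - d) / 2)**2)
formula = np.array([(a + d) / 2 + root, (a + d) / 2 - root])

print(np.sort(np.linalg.eigvalsh(M)))  # eigvalsh assumes symmetry and returns real eigenvalues
print(np.sort(formula))                # matches the formula from the excerpt
```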

🔗 Eigenvectors of distinct eigenvalues

The excerpt begins to explore eigenvectors:

  • Suppose symmetric matrix M has two distinct eigenvalues λ ≠ μ with eigenvectors x and y.
  • Mx = λx and My = μy
  • The excerpt starts to consider the dot product x · y = xᵀy...
  • (The excerpt cuts off here, but this setup typically leads to proving eigenvectors are orthogonal.)

📐 Function space orthogonal complements

Problem 14 applies orthogonal complement concepts to a function space:

  • Vector space V = span{sin(t), sin(2t), sin(3t)} (the source lists sin(3t) twice, likely a typo).
  • Inner product: f · g := integral from 0 to 2π of f(t)g(t)dt
  • Find the orthogonal complement to U = span{sin(t) + sin(2t)} in V.
  • Express sin(t) - sin(2t) as the sum of vectors from U and U⊥.

Example approach:

  • A vector in U⊥ must have inner product zero with sin(t) + sin(2t).
  • Use the orthogonality of sine functions over [0, 2π] to find the complement.
  • Decompose sin(t) - sin(2t) = (component in U) + (component in U⊥).
107

15.1 Review Problems

15.1 Review Problems

🧭 Overview

🧠 One-sentence thesis

Every real symmetric matrix can be diagonalized by an orthogonal matrix built from an orthonormal basis of its eigenvectors, and the review problems explore the properties of eigenvalues, eigenvectors, and eigenspaces that make this diagonalization possible.

📌 Key points (3–5)

  • Core theorem: A matrix M is symmetric (M = M^T) if and only if M can be written as PDP^T, where P is orthogonal and D is diagonal with eigenvalues on the diagonal.
  • How to diagonalize: Build an orthogonal matrix P from an orthonormal basis of eigenvectors; then D = P^T M P gives the diagonalized form.
  • Reality of eigenvalues: Symmetric matrices with real entries have real eigenvalues (Problem 1 proves this using complex conjugates).
  • Dimension relationship: The sum of the dimensions of all eigenspaces equals n for an n×n symmetric matrix (explored in Problem 4).
  • Common confusion: Not all matrices are diagonalizable, but symmetric matrices always are; the key is finding an orthonormal eigenvector basis.

🔑 The diagonalization theorem

🔑 Statement of the main result

Theorem 15.0.2: Every symmetric matrix is similar to a diagonal matrix of its eigenvalues. In other words, M = M^T if and only if M = PDP^T where P is an orthogonal matrix and D is a diagonal matrix whose entries are the eigenvalues of M.

  • The theorem guarantees that symmetric matrices can always be diagonalized.
  • The matrix P is built from eigenvectors and is orthogonal, meaning P^T P = I.
  • The diagonal matrix D contains the eigenvalues of M on its diagonal.

📐 How the diagonalization procedure works

The excerpt describes a successive procedure:

  • Start with a symmetric matrix M and find an eigenvector x₁ with eigenvalue λ₁.
  • Normalize x₁ to unit length and use it as the first column of P.
  • The transformation P^T M P produces a matrix with λ₁ in the top-left corner and zeros below it.
  • The remaining (n-1)×(n-1) block (denoted M̂) is also symmetric.
  • Repeat the procedure on M̂ to find the next eigenvalue and eigenvector.
  • Continue until all eigenvalues are isolated on the diagonal.

Don't confuse: The matrix P is not arbitrary; it must be built from orthonormal eigenvectors (unit length and mutually perpendicular).

🧮 Worked example: 2×2 case

🧮 Example 142 walkthrough

The excerpt provides a concrete 2×2 example:

Given matrix:

M = ( 2  1 )
    ( 1  2 )

Eigenvalues and eigenvectors:

  • Eigenvalue 3 with eigenvector (1, 1)
  • Eigenvalue 1 with eigenvector (1, -1)

Building the orthogonal matrix P:

  • Normalize the eigenvectors to unit length: (1/√2, 1/√2) and (1/√2, -1/√2)
  • Place these as columns of P:
P = ( 1/√2   1/√2  )
    ( 1/√2  -1/√2  )

Verification:

  • P^T P = I (P is orthogonal)
  • MP = PD, where D = diag(3, 1) is the diagonal matrix of eigenvalues
  • Therefore D = P^T M P is the diagonalized form
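
A numpy check of this 2×2 example (a verification sketch, not part of the excerpt):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
P = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2)   # normalized eigenvectors as columns

print(np.allclose(P.T @ P, np.eye(2)))     # True: P is orthogonal
print(P.T @ M @ P)                         # approximately diag(3, 1)
```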

Example extension: The excerpt mentions a "3×3 Example" but does not provide details.

🔬 Problem 1: Reality of eigenvalues

🔬 Complex conjugates and inner products

Problem 1 guides the reader through a proof that symmetric matrices with real entries have real eigenvalues.

Key steps:

  • (a) For a complex number z = x + iy, compute zz̄ (where z̄ = x - iy is the complex conjugate). The result is x² + y², which is a real, non-negative number.

  • (b) If λ = λ̄, then the imaginary part y must be zero, so λ is real.

  • (c) For a vector x in C^n, define x† as the row vector of complex conjugates. Then x†x is a sum of terms z̄ᵢzᵢ, which by part (a) are all real and non-negative, so x†x is real and non-negative.

🔬 The eigenvalue argument

  • (d) For a symmetric matrix M with real entries and eigenvector x with eigenvalue λ, compute (x†Mx)/(x†x). Since Mx = λx, this equals λ(x†x)/(x†x) = λ.

  • (e–g) Show that x†Mx is a 1×1 matrix (a scalar) and that it equals its own conjugate transpose, because M is real and symmetric; a scalar equal to its own conjugate is real.

  • (h) Since x†Mx is real (from the previous step) and x†x is real and positive, the ratio λ = (x†Mx)/(x†x) from part (d) must be real.

Why it matters: This proof explains why symmetric matrices always have real eigenvalues, which is essential for the diagonalization theorem.

🧩 Problem 2: Building orthonormal bases

🧩 Constructing orthonormal vectors

Problem 2 asks: given a unit vector x₁ = (a, b, c) in R³ (where a² + b² + c² = 1), find vectors x₂ and x₃ such that {x₁, x₂, x₃} is an orthonormal basis.

What this means:

  • x₂ and x₃ must be unit vectors (length 1).
  • x₁, x₂, x₃ must be mutually perpendicular (dot products zero).

The resulting matrix P:

  • P has columns x₁, x₂, x₃.
  • Because the columns form an orthonormal basis, P is an orthogonal matrix (P^T P = I).

Connection to diagonalization: This exercise practices the key skill needed to diagonalize symmetric matrices—building an orthogonal matrix from orthonormal eigenvectors.

🔍 Problem 3: Existence of eigenvalues

🔍 Linear dependence argument

Problem 3 proves that every linear transformation L: V → V (where V is finite-dimensional) has at least one eigenvalue.

The proof outline:

  • (a) The list (v, Lv, L²v, ..., Lⁿv) has n+1 vectors in an n-dimensional space, so it must be linearly dependent.

  • (b) Therefore there exist scalars αᵢ (not all zero) such that α₀v + α₁Lv + α₂L²v + ... + αₙLⁿv = 0.

  • (c) Let m be the largest index with αₘ ≠ 0, and define the polynomial p(z) = α₀ + α₁z + α₂z² + ... + αₘzᵐ. This polynomial can be factored as p(z) = αₘ(z - λ₁)(z - λ₂)...(z - λₘ).

  • (d) Substituting L for z gives (L - λ₁)(L - λ₂)...(L - λₘ)v = 0.

  • (e) Since v ≠ 0, at least one of the factors (L - λᵢ) must send some nonzero vector to zero (i.e., have a non-trivial kernel), so that λᵢ is an eigenvalue of L.

Why it matters: This shows that eigenvalues always exist (though they may be complex), which is foundational for diagonalization theory.

📊 Problem 4: Dimensions of eigenspaces

📊 Example matrix

Problem 4 works with the specific matrix:

A = ( 4   0   0 )
    ( 0   2  -2 )
    ( 0  -2   2 )

Tasks:

  • (a) Find all eigenvalues of A.
  • (b) Find a basis for each eigenspace and compute the sum of the dimensions of all eigenspaces.

📊 General pattern

  • (c) Based on the example, guess a formula for the sum of the dimensions of the eigenspaces of a real n×n symmetric matrix.

Expected result: The sum of the dimensions of all eigenspaces equals n for an n×n symmetric matrix.

Why this works: Because symmetric matrices are diagonalizable, they have a complete set of n linearly independent eigenvectors, which span the entire n-dimensional space.

Don't confuse: For non-symmetric matrices, the sum of eigenspace dimensions can be less than n (the matrix may not be diagonalizable).
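
A numpy sketch for parts (a) and (b) of Problem 4 (np.linalg.eigh is an assumed convenience; the problem expects hand computation):

```python
import numpy as np

A = np.array([[4.0,  0.0,  0.0],
              [0.0,  2.0, -2.0],
              [0.0, -2.0,  2.0]])

vals, vecs = np.linalg.eigh(A)   # A is symmetric, so eigh applies and returns real eigenvalues
print(np.round(vals, 6))         # [0. 4. 4.] up to rounding: eigenvalue 0 once, eigenvalue 4 twice
# The eigenspace dimensions are 1 (for 0) and 2 (for 4); their sum is 3 = n.
```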

🔄 Problem 5: Non-square matrices and SVD

🔄 Symmetric products of non-square matrices

Problem 5 explores what happens when M is not square (so M cannot be symmetric).

Key observations:

  • MM^T and M^T M are both symmetric, so both are diagonalizable.
  • MM^T is a (number of rows of M) × (number of rows of M) matrix.
  • M^T M is a (number of columns of M) × (number of columns of M) matrix.

🔄 Relationship between eigenvalues and eigenvectors

  • (a) Do all eigenvalues of MM^T also appear as eigenvalues of M^T M?
  • (b) How can you obtain an eigenvector of M^T M from an eigenvector of MM^T?

🔄 Singular Value Decomposition example

  • (c) For the specific matrix M = ((1, 2), (3, 3), (2, 1)), compute orthonormal eigenvector bases for both MM^T and M^T M.
  • Change the input and output bases for M to these orthonormal bases.
  • The hint indicates this leads to the Singular Value Decomposition Theorem.

Why it matters: The Singular Value Decomposition extends diagonalization ideas to non-square matrices, which is widely used in data analysis and numerical methods.
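
A numpy sketch for part (c), using eigh and svd as stand-ins for the hand computation the problem asks for (the printed values are approximate):

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 3.0],
              [2.0, 1.0]])

vals_big, _ = np.linalg.eigh(M @ M.T)     # 3x3 symmetric
vals_small, _ = np.linalg.eigh(M.T @ M)   # 2x2 symmetric
print(np.round(vals_big, 6))              # approx [0, 1, 27]: the nonzero eigenvalues plus an extra 0
print(np.round(vals_small, 6))            # approx [1, 27]: the same nonzero eigenvalues

U, s, Vt = np.linalg.svd(M)               # singular values are square roots of the shared eigenvalues
print(np.round(s**2, 6))                  # approx [27, 1]
```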

108

Range

16.1 Range

🧭 Overview

🧠 One-sentence thesis

The range of a linear transformation (or matrix) is the set of all outputs it can produce, and it can be efficiently computed by finding the span of the pivot columns or by using elementary column operations.

📌 Key points (3–5)

  • What range is: the subset of the codomain consisting of all elements that the function actually maps to.
  • How to find range of a matrix: the range equals the span of the matrix's columns, simplified to a linearly independent set (pivot columns).
  • Efficient computation method: use elementary column operations (ECOs) to transform the matrix to Column Reduced Echelon Form (CREF) and delete zero columns.
  • Common confusion: range vs image—range is the image of the entire domain; image can refer to the output of any subset of the domain.
  • Why it matters: understanding range is a step toward determining whether a linear transformation has an inverse.

📐 Definition and basic concept

📐 What range means

Range of a function f : S → T is the set ran(f) := {f(s) | s ∈ S} ⊂ T.

  • It is the subset of the codomain T consisting of elements that f actually maps to.
  • In other words: the things in T you can reach by starting in S and applying f.
  • The range is always a subset of the codomain, but may not equal the entire codomain.

🔗 Range vs image distinction

  • Range: the image of the entire domain S.
  • Image of a subset U: the set f(U) = {f(x) | x ∈ U} for any subset U of S.
  • Don't confuse: the excerpt notes that "for most subsets U of the domain S of a function f the image of U is not a vector space," but the range (image of the whole domain) is a vector space when f is a linear transformation.

🧮 Computing the range of a matrix

🧮 Basic method: span of columns

  • The range of a matrix is the span of its columns.
  • This is "very easy" until the last step: simplification.
  • You should end by writing the vector space as the span of a linearly independent set.

🔢 Example walkthrough (Example 143)

The excerpt computes:

ran of the 3×4 matrix with columns [1,1,0]ᵀ, [2,2,0]ᵀ, [0,1,1]ᵀ, [1,2,1]ᵀ

Step 1: Write as span of all columns.

  • ran(matrix) = {x·[1,1,0]ᵀ + y·[2,2,0]ᵀ + z·[0,1,1]ᵀ + w·[1,2,1]ᵀ | x,y,z,w ∈ ℝ}

Step 2: Identify pivot columns using RREF.

  • After row reduction, the RREF shows that the second and fourth columns are non-pivot columns.
  • Non-pivot columns can be expressed as linear combinations of columns to their left.

Step 3: Remove non-pivot columns.

  • Final answer: ran(matrix) = span{[1,1,0]ᵀ, [0,1,1]ᵀ}
  • This is a linearly independent set.

⚙️ Efficient method: elementary column operations (ECOs)

Key insight: The span of a set of vectors does not change when you replace the vectors through an invertible process.

Procedure:

  1. Apply elementary column operations (ECOs) to the matrix.
  2. Transform to Column Reduced Echelon Form (CREF).
  3. Delete zero columns.
  4. The range of the resulting matrix equals the range of the original matrix.

🔢 Example with ECOs (Example 144)

The excerpt computes the range of a 3×3 matrix:

  • Start: original matrix with columns [0,1,1]ᵀ, [1,3,2]ᵀ, [1,1,0]ᵀ.
  1. Swap columns 1 and 3 → columns [1,1,0]ᵀ, [1,3,2]ᵀ, [0,1,1]ᵀ.
  2. Replace column 2 with (column 2 − column 1) → columns [1,1,0]ᵀ, [0,2,2]ᵀ, [0,1,1]ᵀ.
  3. Scale column 2 by 1/2 → columns [1,1,0]ᵀ, [0,1,1]ᵀ, [0,1,1]ᵀ.
  4. Replace column 3 with (column 3 − column 2) → columns [1,1,0]ᵀ, [0,1,1]ᵀ, [0,0,0]ᵀ.
  5. Delete the zero column → final columns [1,1,0]ᵀ, [0,1,1]ᵀ.
  • The final range is the span of the two non-zero columns.
  • The excerpt notes this is "an efficient way to compute and encode the range of a matrix."
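
A sympy sketch applied to Example 143's 3×4 matrix: rref exposes the pivot columns and columnspace returns a basis for the range (sympy is an assumed tool here, not the excerpt's method):

```python
import sympy as sp

M = sp.Matrix([[1, 2, 0, 1],
               [1, 2, 1, 2],
               [0, 0, 1, 1]])

rref, pivots = M.rref()
print(pivots)           # (0, 2): the first and third columns are the pivot columns
print(M.columnspace())  # basis vectors (1, 1, 0) and (0, 1, 1), as in the excerpt
```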

🖼️ Image of a subset

🖼️ Definition of image

Image of a subset U of the domain S of a function f : S → T is f(U) = Im U := {f(x) | x ∈ U}.

  • The image is defined for any subset U, not just the entire domain.
  • When U is the entire domain S, the image equals the range.

🧊 Geometric example (Example 145)

The excerpt gives a concrete example:

  • Input: the unit cube U in ℝ³, defined as {a·[1,0,0]ᵀ + b·[0,1,0]ᵀ + c·[0,0,1]ᵀ | a,b,c ∈ [0,1]}.
  • Transformation: multiply by matrix M with columns [1,1,0]ᵀ, [0,1,0]ᵀ, [0,1,1]ᵀ.
  • Output: a parallelepiped {a·[1,1,0]ᵀ + b·[0,1,0]ᵀ + c·[0,1,1]ᵀ | a,b,c ∈ [0,1]}.

Important note: "For most subsets U of the domain S of a function f the image of U is not a vector space."

  • Example: the image of the unit cube is a parallelepiped, which is not a vector space (it is not closed under addition or scalar multiplication).
  • Don't confuse: the range (image of the entire domain) is a vector space for linear transformations.
109

Image

16.2 Image

🧭 Overview

🧠 One-sentence thesis

The range (or image) of a function is the set of all outputs actually produced, and understanding pre-images, one-to-one, and onto properties determines whether a function has an inverse.

📌 Key points (3–5)

  • Range vs codomain: the range is the subset of the codomain that is actually mapped to; it is always a vector space when the function is a matrix.
  • Pre-image: the set of all domain elements that map into a given subset of the codomain; it reverses the direction of thinking.
  • One-to-one (injective): different inputs always produce different outputs; this is a condition on pre-images.
  • Onto (surjective): every element of the codomain is mapped to by at least one domain element; this is a condition on the range.
  • Common confusion: "image" can mean either the entire range or a single output element; the excerpt prefers "range" to avoid this ambiguity.

🔍 Range and image terminology

🔍 What range means

Range of f: the image of the entire domain; the set of all outputs actually produced by f.

  • Notation: ran f = Img S, also written im(f) or f(S).
  • For a matrix, the range is always a span of vectors, hence a vector space.
  • Example: if a function maps the domain S to a codomain T, the range is the subset of T that is actually hit.

⚠️ Why "range" is preferred over "image"

  • The word "image" has two meanings:
    1. The entire set of outputs (synonymous with range).
    2. A single output element assigned to a single input.
  • Example: for A(x) = 2x − 1, one might say "the image of 2 is 3" (meaning the single output).
  • By contrast, "the range of 2 is 3" makes no sense because 2 is not a function and 3 is not a set.
  • Don't confuse: "range" always refers to a set; "image" can refer to either a set or a single element depending on context.

🔄 Pre-image

🔄 What pre-image means

Pre-image of a subset U ⊂ T: f⁻¹(U) := {s ∈ S | f(s) ∈ U} ⊂ S.

  • The pre-image is the set of all domain elements that map into U.
  • It reverses the direction: instead of asking "where does s go?", we ask "what maps into U?"
  • Example: the pre-image of a line segment U under a matrix M is the set of all vectors x such that Mx is in U.

🧮 Example from the excerpt

  • Given U = {a [2, 1, 1]ᵀ | a ∈ [0, 1]} (a line segment) and matrix M = [[1, 0, 1], [0, 1, 1], [0, 1, 1]],
  • The pre-image M⁻¹U is the set {x | Mx = v for some v ∈ U}.
  • After row reduction, the solution is M⁻¹U = {a [2, 1, 0]ᵀ + b [−1, −1, 1]ᵀ | a ∈ [0, 1], b ∈ ℝ}, a strip from a plane in ℝ³.

🎯 One-to-one and onto

🎯 One-to-one (injective)

One-to-one (1:1): different elements in S always map to different elements in T.

  • Formally: if x ≠ y in S, then f(x) ≠ f(y).
  • Also called injective or a monomorphism.
  • This is a condition on the pre-images of f: each output has at most one pre-image.
  • Example: if two different inputs produced the same output, the function would not be one-to-one.

🎯 Onto (surjective)

Onto: every element of T is mapped to by some element of S.

  • Formally: for any t ∈ T, there exists some s ∈ S such that f(s) = t.
  • Also called surjective or an epimorphism.
  • This is a condition on the range of f: the range equals the entire codomain.
  • Don't confuse: onto means every codomain element is hit; one-to-one means no two domain elements hit the same codomain element.

🔗 Bijective (isomorphism)

  • A function is bijective (or an isomorphism) if it is both injective and surjective.
  • This means: every codomain element is hit exactly once.

🔁 Inverse functions

🔁 When an inverse exists

Theorem: A function f : S → T has an inverse function g : T → S if and only if f is bijective.

The proof has two parts:

🔁 Existence of inverse implies bijective

  • Suppose f has an inverse g.
  • Injective: If f(s) = f(s′), then g(f(s)) = g(f(s′)), so s = s′. Thus different inputs cannot produce the same output.
  • Surjective: Let t be any element of T. Then f(g(t)) = t, so g(t) is an element of S that maps to t. Thus every element of T is mapped to.

🔁 Bijective implies existence of inverse

(The excerpt does not complete this direction of the proof.)

  • One-to-one (injective): a condition on pre-images; different inputs go to different outputs.
  • Onto (surjective): a condition on the range; every codomain element is hit.
  • Bijective (isomorphism): both conditions; every codomain element is hit exactly once, so an inverse exists.
110

One-To-One and Onto

16.2.1 One-To-One and Onto

🧭 Overview

🧠 One-sentence thesis

A function has an inverse if and only if it is both one-to-one (injective) and onto (surjective), and for linear transformations these properties can be checked by examining the kernel and range.

📌 Key points

  • One-to-one (injective): different elements in the domain always map to different elements in the codomain.
  • Onto (surjective): every element of the codomain is mapped to by at least one element in the domain.
  • Bijective functions: both injective and surjective; these are exactly the functions that have inverses.
  • Common confusion: injectivity concerns pre-images (what maps to what), while surjectivity concerns the range (what gets covered).
  • Special property for linear maps: a linear transformation is one-to-one if and only if only the zero vector maps to the zero vector.

🔍 Injective functions (one-to-one)

🔍 What one-to-one means

A function f is one-to-one (sometimes denoted 1:1) if different elements in S always map to different elements in T.

  • Formally: if x ≠ y in S, then f(x) ≠ f(y).
  • Alternative name: injective functions (also called monomorphisms).
  • The excerpt emphasizes that "injectivity is a condition on the pre-images of f."

🧪 How to check injectivity

  • For general functions: verify that no two distinct inputs produce the same output.
  • For linear transformations: the excerpt states that a linear transformation is one-to-one if and only if 0_V is the only vector sent to 0_W.
  • Don't confuse: you only need to check one special vector (the zero vector) for linear maps, unlike arbitrary functions where you must check all pairs.

🎯 Surjective functions (onto)

🎯 What onto means

A function f is onto if every element of T is mapped to by some element of S.

  • Formally: for any t ∈ T, there exists some s ∈ S such that f(s) = t.
  • Alternative name: surjective functions (also called epimorphisms).
  • The excerpt notes that "surjectivity is a condition on the range of f."

🔗 Relationship to range

  • A function is onto when its range equals its entire codomain.
  • Example: if the codomain is T but only some subset gets mapped to, the function is not onto.

🔄 Bijective functions and inverses

🔄 What bijective means

If f is both injective and surjective, it is bijective (or an isomorphism).

  • A bijective function is both one-to-one and onto.
  • The excerpt provides a theorem: a function f : S → T has an inverse function g : T → S if and only if f is bijective.

🧩 Why bijectivity guarantees an inverse

The excerpt proves both directions:

  • Inverse exists ⇒ bijective: if g is an inverse, then f must be injective (because g(f(s)) = s forces distinct inputs to have distinct outputs) and surjective (because every t has a pre-image g(t)).
  • Bijective ⇒ inverse exists: if f is bijective, every element t has exactly one pre-image (at least one from surjectivity, at most one from injectivity), so define g(t) to be that unique pre-image.
  • Example: if f is bijective, for any t in the codomain, there is exactly one s in the domain with f(s) = t, so we can "reverse" the function.

🧮 Special properties for linear transformations

🧮 The zero-vector test

  • The excerpt states: "a linear transformation is one-to-one if and only if 0_V is the only vector that is sent to 0_W."
  • Why this works: linear functions always send 0_V to 0_W (i.e., f(0_V) = 0_W).
  • By looking at just one very special vector, we can determine whether f is one-to-one—this is unique to linear functions, not true for arbitrary functions.

🔑 The kernel

If L : V → W is a linear function, then the set ker L = {v ∈ V | Lv = 0_W} ⊂ V is called the kernel of L.

  • The kernel is the set of all vectors in the domain that map to the zero vector in the codomain.
  • Connection to injectivity: if L is not injective, then we can find v₁ ≠ v₂ such that Lv₁ = Lv₂, which means L(v₁ − v₂) = 0 but v₁ − v₂ ≠ 0.
  • Practical check: if L has matrix M in some basis, finding the kernel is equivalent to solving the homogeneous system MX = 0.
  • Don't confuse: the kernel concerns injectivity (pre-images of zero), while the range concerns surjectivity (what gets covered).

📐 Pre-image and related concepts

📐 What pre-image means

The pre-image of any subset U ⊂ T is f⁻¹(U) := {s ∈ S | f(s) ∈ U} ⊂ S.

  • The pre-image of a set U is the set of all elements of S which map to U.
  • Example from the excerpt: for a matrix M and a line segment U, the pre-image M⁻¹U is the set of all vectors x such that Mx = v for some v ∈ U; in the given case, this turns out to be a strip from a plane in ℝ³.

📝 Range vs image terminology

  • The excerpt prefers "range of f" over "image of f" to avoid confusion with homophones.
  • The word "image" can also describe a single element of the codomain assigned to a single element of the domain.
  • Example: for A(x) = 2x − 1, one might say "the image of 2 is 3" (a single output), but would never say "the range of 2 is 3" (since 2 is not a function and 3 is not a set).
111

Kernel

16.2.2 Kernel

🧭 Overview

🧠 One-sentence thesis

The kernel of a linear transformation is the set of all vectors that map to zero, and a transformation is injective if and only if its kernel contains only the zero vector.

📌 Key points (3–5)

  • What the kernel is: the set of all input vectors that a linear transformation sends to the zero vector in the codomain.
  • Injectivity test: a linear transformation is one-to-one (injective) if and only if its kernel is trivial (contains only the zero vector).
  • Kernel as a subspace: the kernel of any linear transformation is always a subspace of the domain.
  • Common confusion: a trivial kernel guarantees injectivity but does not guarantee invertibility—the transformation must also be surjective.
  • Computational method: finding the kernel is equivalent to solving the homogeneous system M X = 0, where M is the matrix of the transformation.

🔍 What the kernel measures

🔍 Definition and meaning

Kernel of L: If L : V → W is a linear function, then ker L = { v ∈ V | Lv = 0_W } ⊂ V.

  • The kernel collects all vectors in the domain that are "collapsed" to zero by the transformation.
  • It is a subset of the domain V, not the codomain W.
  • The excerpt emphasizes that if L is not injective, then there exist v₁ ≠ v₂ such that Lv₁ = Lv₂, which means v₁ − v₂ ≠ 0 but L(v₁ − v₂) = 0, so v₁ − v₂ is in the kernel.

🧮 How to compute the kernel

  • If L has matrix M in some basis, finding ker L is equivalent to solving the homogeneous system M X = 0.
  • Use row operations to reduce M to row-reduced echelon form (RREF).
  • The kernel is described by the span of vectors corresponding to non-pivot columns.
  • Example from the excerpt: ker of a matrix is found by writing a string of equalities between kernels of matrices that differ by row operations, then reading off linear relations between columns.

🧩 Kernel and injectivity

🧩 The injectivity criterion

Theorem 16.2.2: A linear transformation L : V → W is injective if and only if ker L = { 0_V }.

  • Why this works: Linear transformations are special—by looking at just one vector (the zero vector), we can determine whether L is one-to-one.
  • If ker L = { 0 }, then the only vector sent to zero is the zero vector itself, so distinct inputs cannot map to the same output.
  • Example from the excerpt: L(x, y) = (x + y, x + 2y, y). Solving M X = 0 gives x = y = 0, so ker L = { 0 } and L is injective.

⚠️ Trivial kernel does not guarantee invertibility

  • A trivial kernel only gives half of what is needed for invertibility.
  • The transformation must also be surjective (onto).
  • Example from the excerpt: the matrix [1 0; 1 1; 0 1] has ker = {(0, 0)} (no non-pivot columns), so it is injective, but it maps R² → R³ and many vectors in R³ (like (1, 0, 0)) are not in its range, so it is not invertible.
  • Don't confuse: injective ≠ invertible; you need both injective and surjective.

🏗️ Kernel as a subspace

🏗️ Subspace property

Theorem 16.2.3: If L : V → W is linear, then ker L is a subspace of V.

  • Proof idea: If L(v) = 0 and L(u) = 0, then for any constants c, d, L(cu + dv) = cL(u) + dL(v) = 0. By the subspace theorem, ker L is a subspace.
  • The kernel is closed under addition and scalar multiplication.
  • Example from the excerpt: Let L : R³ → R be defined by L(x, y, z) = x + y + z. Then ker L = { (x, y, z) ∈ R³ | x + y + z = 0 }, which is a plane through the origin and thus a subspace of R³.

🔗 Connection to eigenspaces

  • When L : V → V, if L has a zero eigenvalue, the associated eigenspace is exactly the kernel of L.
  • The 0-eigenspace consists of all vectors v such that Lv = 0v = 0.

📐 Computational examples

📐 Example with trivial kernel

Example 147 from the excerpt:

  • L(x, y) = (x + y, x + 2y, y).
  • Matrix: [1 1; 1 2; 0 1]. Row-reduce to [1 0; 0 1; 0 0].
  • All solutions of M X = 0 are x = y = 0.
  • Therefore ker L = { 0 } and L is injective.

📐 Example with non-trivial kernel

Example 148 from the excerpt:

  • Matrix: [1 2 0 1; 1 2 1 2; 0 0 1 1]. Row-reduce to [1 2 0 1; 0 0 1 1; 0 0 0 0].
  • Non-pivot columns: columns 2 and 4.
  • ker = span { (−2, 1, 0, 0), (−1, 0, −1, 1) }.
  • These vectors describe linear relations: −2c₁ + c₂ = 0 and −c₁ − c₃ + c₄ = 0.
  • General rule: one vector in the spanning set for each non-pivot column, with a 1 as the last non-zero entry in each vector.
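
A sympy check of Example 148 (nullspace is an assumed convenience; the excerpt does the row reduction by hand):

```python
import sympy as sp

M = sp.Matrix([[1, 2, 0, 1],
               [1, 2, 1, 2],
               [0, 0, 1, 1]])

print(M.rref()[0])    # RREF with rows [1, 2, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]
print(M.nullspace())  # basis vectors (-2, 1, 0, 0) and (-1, 0, -1, 1), as in the excerpt
```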

🔧 Efficient kernel computation

  • Write a string of equalities between kernels of matrices that differ by row operations.
  • Once RREF is reached, note the linear relationships between columns for a basis of the kernel.
  • If a matrix has more than one element in its kernel, it is not invertible (multiple solutions to M x = 0 implies RREF M ≠ I).

📊 Nullity and rank

📊 Definitions

Rank of L: the dimension of the range of L, denoted rank L := dim L(V) = dim ran L.

Nullity of L: the dimension of the kernel of L, denoted null L := dim ker L.

📊 The Dimension Formula

Theorem 16.2.5 (Dimension Formula): Let L : V → W be a linear transformation, with V finite-dimensional. Then:

dim V = dim ker L + dim L(V) = null L + rank L.

  • This formula relates the dimension of the domain to the dimensions of the kernel and range.
  • It is a fundamental constraint on linear transformations.
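
A quick numerical illustration of the formula (the matrix is chosen arbitrarily; matrix_rank and the subtraction n − rank stand in for rank L and null L):

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])        # maps R^3 -> R^2; the rows are parallel

n = M.shape[1]                          # dimension of the domain
rank = np.linalg.matrix_rank(M)
nullity = n - rank                      # dimension formula rearranged
print(rank, nullity, rank + nullity)    # 1 2 3 = dim of the domain
```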

🌐 Range as a subspace

🌐 Range subspace theorem

Theorem 16.2.4: If L : V → W is linear, then the range L(V) is a subspace of W.

  • Why: If w = L(v) and w′ = L(v′), then for any constants c, d, linearity ensures cw + dw′ = L(cv + dv′).
  • By the subspace theorem, the range is a subspace.
  • Example from the excerpt: L(x, y) = (x + y, x + 2y, y) maps R² to a plane through the origin in R³, which is a subspace.

🌐 Column space interpretation

  • When a basis and a linear transformation are given, the range is often called the column space of the corresponding matrix.
  • Example from the excerpt: L(x, y) = [1 1; 1 2; 0 1] (x, y) = x (1, 1, 0) + y (1, 2, 1), so L(R²) = span { (1, 1, 0), (1, 2, 1) }.
  • The columns of the matrix encode the possible outputs of L.

🌐 Finding a basis for the range

  • Start with a basis S = { v₁, …, vₙ } for V.
  • The most general output is L(α₁v₁ + ⋯ + αₙvₙ) = α₁Lv₁ + ⋯ + αₙLvₙ ∈ span { Lv₁, …, Lvₙ }.
  • Thus L(V) = span L(S) = span { Lv₁, …, Lvₙ }.
  • However, { Lv₁, …, Lvₙ } may not be linearly independent; solve c₁Lv₁ + ⋯ + cₙLvₙ = 0 to find relations and discard vectors until a basis is obtained.
  • The size of this basis is the rank of L.
112

Kernel, Range, Nullity, Rank: Summary

16.3 Summary

🧭 Overview

🧠 One-sentence thesis

A linear transformation is invertible if and only if it is bijective, and this invertibility can be characterized in at least sixteen equivalent ways involving the matrix, its kernel, its range, and its rank.

📌 Key points (3–5)

  • Rank and nullity: rank is the dimension of the range (image); nullity is the dimension of the kernel; together they satisfy the dimension formula.
  • Dimension formula: for a linear transformation L : V → W, dim V = null L + rank L.
  • Row rank equals column rank: the number of linearly independent rows equals the number of linearly independent columns in any matrix.
  • Common confusion: invertibility requires a square matrix; if M is not n × n, many of the sixteen conditions are no longer equivalent.
  • Sixteen equivalent conditions: for an n × n matrix representing a linear transformation on an n-dimensional space, invertibility, bijectivity, full rank, nonzero determinant, and fourteen other properties are all equivalent.

🔢 Rank and nullity definitions

🔢 What rank measures

Rank of a linear transformation L: the dimension of its range (image).

  • Notation: rank L := dim L(V) = dim ran L.
  • The rank counts how many dimensions the output space actually occupies.
  • Example: if L : R² → R³ and the image is a plane through the origin, rank L = 2.

🕳️ What nullity measures

Nullity of a linear transformation: the dimension of the kernel.

  • Notation: null L := dim ker L.
  • The nullity counts how many dimensions of input collapse to zero.
  • Example: if ker L is a line through the origin in R³, null L = 1.

🧮 Column space and basis for the range

  • When a linear transformation L is given by a matrix M, the range is called the column space of M.
  • To find a basis for the range:
    • Start with a basis S = {v₁, …, vₙ} for the domain V.
    • The most general output is L(α₁v₁ + ⋯ + αₙvₙ) = α₁Lv₁ + ⋯ + αₙLvₙ.
    • Therefore L(V) = span{Lv₁, …, Lvₙ}.
    • The set {Lv₁, …, Lvₙ} may not be linearly independent; solve c₁Lv₁ + ⋯ + cₙLvₙ = 0 to find relations and discard dependent vectors until a basis remains.
  • The size of this basis is the rank of L.

📐 The dimension formula

📐 Statement and meaning

Dimension Formula: Let L : V → W be a linear transformation, with V finite-dimensional. Then dim V = dim ker L + dim L(V) = null L + rank L.

  • This formula partitions the input space into two parts: the part that collapses to zero (kernel) and the part that spans the output (range).
  • Example: if L : R⁵ → R³ has a 2-dimensional kernel, then the range must be 5 − 2 = 3-dimensional.

🧩 Proof sketch

  • Pick a basis for V: {v₁, …, vₚ, u₁, …, u_q}, where {v₁, …, vₚ} is a basis for ker L.
  • Then p = null L and p + q = dim V.
  • Show that {L(u₁), …, L(u_q)} is a basis for L(V):
    • Spanning: any w in L(V) can be written w = L(c₁v₁ + ⋯ + cₚvₚ + d₁u₁ + ⋯ + d_qu_q) = d₁L(u₁) + ⋯ + d_qL(u_q), since L(vᵢ) = 0.
    • Linear independence: suppose d₁L(u₁) + ⋯ + d_qL(u_q) = 0 with not all dⱼ zero. Then L(d₁u₁ + ⋯ + d_qu_q) = 0, so d₁u₁ + ⋯ + d_qu_q is in ker L. But ker L = span{v₁, …, vₚ}, which contradicts the assumption that {v₁, …, vₚ, u₁, …, u_q} is a basis for V.
  • Therefore q = rank L, and dim V = p + q = null L + rank L.

⚠️ Finite-dimensional assumption

  • The formula assumes V is finite-dimensional.
  • For infinite-dimensional spaces (e.g., the space of all polynomials), the formula ∞ = ∞ + x cannot be solved for x, so the dimension formula is not useful for computation.

🔄 Row rank equals column rank

🔄 Column rank

  • For an m × n matrix M representing L : V → W (with dim V = n, dim W = m):
    • The column rank is the number of linearly independent columns of M.
    • This equals the rank of L, i.e., the dimension of the image.

🔄 Row rank

  • The row rank is the number of linearly independent rows (viewed as vectors).
  • Each linearly independent row gives an independent equation in the system M x = 0.
  • Each independent equation reduces the size of the kernel by one.

🔄 Why they are equal

  • From the dimension formula: null L = n − r, where r is the column rank.
  • If the row rank is s, then null L + s = n (because s independent equations reduce the kernel dimension).
  • Combining: null L + s = n and null L = n − r.
  • Therefore r = s: row rank equals column rank.

✅ Sixteen equivalent conditions for invertibility

✅ The invertibility theorem

Invertibility Theorem: Let V be an n-dimensional vector space and L : V → V a linear transformation with matrix M in some basis. Then M is an n × n matrix, and the following sixteen statements are equivalent:

  1. For any vector v in Rⁿ, the system M x = v has exactly one solution (solvability).
  2. M is row-equivalent to the identity matrix (row operations).
  3. For any vector v in V, L(x) = v has exactly one solution (transformation solvability).
  4. M is invertible (matrix property).
  5. The homogeneous system M x = 0 has no nonzero solutions (kernel).
  6. det(M) ≠ 0 (determinant).
  7. Mᵀ is invertible (transpose).
  8. M does not have 0 as an eigenvalue (eigenvalues of the matrix).
  9. L does not have 0 as an eigenvalue (eigenvalues of the transformation).
  10. det(λI − M) does not have 0 as a root (characteristic polynomial).
  11. The columns (or rows) of M span Rⁿ (spanning).
  12. The columns (or rows) of M are linearly independent (independence).
  13. The columns (or rows) of M are a basis for Rⁿ (basis).
  14. L is injective (one-to-one).
  15. L is surjective (onto).
  16. L is bijective.

⚠️ Square matrix requirement

  • Critical: M must be an n × n matrix (square).
  • If M is not square, it cannot be invertible, and many of the sixteen statements are no longer equivalent to each other.
  • Don't confuse: a non-square matrix can have full rank (e.g., all columns independent), but it still cannot be invertible.

🔗 Why these are equivalent

  • The excerpt states that many equivalences were proved in earlier chapters; some were left as exercises.
  • The key insight: for a square matrix on a finite-dimensional space, being one-to-one (injective) is equivalent to being onto (surjective), which is equivalent to being bijective, which is equivalent to being invertible.
  • Example: if L is injective, then ker L = {0}, so null L = 0. By the dimension formula, rank L = dim V = n, so L is surjective.
113

16.4 Review Problems

16.4 Review Problems

🧭 Overview

🧠 One-sentence thesis

The kernel, range, nullity, and rank of a linear transformation are interconnected through the dimension formula and determine fundamental properties such as injectivity, surjectivity, and invertibility.

📌 Key points (3–5)

  • Kernel and range relationship: The kernel and range of a matrix M determine orthogonal decompositions of both the domain and target spaces (Fundamental Theorem of Linear Algebra).
  • Trivial kernel ⟷ injectivity: A linear transformation is one-to-one if and only if its kernel contains only the zero vector.
  • Computing bases: The columns of a matrix corresponding to pivot columns in RREF form a basis for the range; the kernel can be found from the RREF.
  • Common confusion: For infinite-dimensional spaces, the dimension formula can break down—one of kernel or range can be finite while the other is infinite.
  • Dimension formula verification: For finite-dimensional spaces, dimension of domain equals nullity plus rank; this must be verified in concrete examples.

🔗 Fundamental relationships between kernel and range

🔗 Orthogonal decompositions (Fundamental Theorem)

The excerpt establishes that for any matrix M : R^m → R^n:

  • R^m = ker M ⊕ ran M^T (domain decomposition)
  • R^n = ker M^T ⊕ ran M (target decomposition)

The proof outline in Problem 1 shows:

  • M x = 0 if and only if x is perpendicular to all columns of M^T
  • Therefore ker M is perpendicular to ran M^T
  • This gives orthogonal direct sum decompositions

The Fundamental Theorem of Linear Algebra: A linear transformation M determines orthogonal decompositions of both its domain and target space.

Why it matters: These decompositions show that the kernel and range are complementary subspaces that together account for the entire space.

🎯 Kernel-injectivity equivalence

Problem 2 asks to prove a key characterization:

Theorem: L : V → W is one-to-one (injective) if and only if ker L = {0_V}.

Two directions must be shown:

  • (Trivial kernel ⇒ injective): If ker L contains only the zero vector, then L is one-to-one
  • (Injective ⇒ trivial kernel): If L is one-to-one, then ker L = {0_V}

Example: If L(x₁) = L(x₂), then L(x₁ - x₂) = 0, so x₁ - x₂ is in ker L. If ker L is trivial, then x₁ - x₂ = 0, hence x₁ = x₂, proving injectivity.

Don't confuse: A trivial kernel means the transformation is injective, but this does not automatically make it surjective or invertible unless dimensions match.

🧮 Computing bases for kernel and range

🧮 Range basis from RREF

Problem 4 presents an algorithm for finding a basis for the range:

Key insight: If M is row-equivalent to a matrix in RREF, the columns of the original matrix M corresponding to pivot columns in the RREF form a basis for L(R^n).

Example from Problem 4(a): Given RREF with pivots in columns 1, 2, 3:

1  0  0  -1
0  1  0   1
0  0  1   1

The first three columns of the original matrix M form a basis for L(R^4).

Why this works: Row operations preserve the linear relations among the columns (even though they can change the column space itself), and the pivot columns indicate which original columns are linearly independent.

🔍 General algorithm for range basis

Problem 4(b) asks for a general procedure:

  1. Compute RREF of the matrix M
  2. Identify pivot columns in the RREF
  3. Select the corresponding columns from the original matrix M
  4. These columns form a basis for L(R^n)

🧩 Range from basis of domain

Problem 3 establishes: If {v₁, ..., vₙ} is a basis for V and L : V → W is linear, then:

L(V) = span{Lv₁, ..., Lvₙ}

Why: Every vector in V can be written as a linear combination of basis vectors, and L preserves linear combinations, so every vector in L(V) is a linear combination of the images of the basis vectors.

🔧 Extending kernel basis to domain basis

Problem 5 claims: If {v₁, ..., vₙ} is a basis for ker L (where L : V → W), it is always possible to extend this set to a basis for V.

This reflects the dimension formula: the kernel is a subspace of the domain, and any basis for a subspace can be extended to a basis for the whole space.

📐 Dimension formula in action

📐 Derivative operator example

Problem 6 examines d/dx : Pₙ(x) → Pₙ(x):

  • Kernel: Constants (polynomials of degree 0), so dimension 1
  • Range: Polynomials of degree at most n-1, so dimension n
  • Verification: dim(domain) = n+1 = 1 + n = dim(ker) + dim(range)

What changes with different target spaces:

  • If target is Pₙ₋₁(x): the derivative is now surjective
  • If target is Pₙ₊₁(x): the range is a proper subspace

📐 Partial derivative operator

Problem 6 also considers L = ∂/∂x + ∂/∂y : P₂(x,y) → P₂(x,y):

  • Example: L(xy) = y + x
  • Task: find a basis for ker L and verify the dimension formula

Key: The kernel consists of polynomials where the sum of partial derivatives is zero.
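
A sketch of that task, assuming P₂(x,y) means polynomials of total degree at most 2 with ordered basis (1, x, y, x², xy, y²), so the space is 6-dimensional; the matrix of L is read off from the images of the basis vectors.

```python
from sympy import symbols, diff, expand, Matrix, S

x, y = symbols('x y')
basis = [S(1), x, y, x**2, x*y, y**2]
L = lambda p: diff(p, x) + diff(p, y)

# Images of the basis under L; note L(x*y) = x + y as in the excerpt.
print([expand(L(b)) for b in basis])    # [0, 1, 1, 2*x, x + y, 2*y]

# Matrix of L in this basis (column j = coordinates of L(basis[j])).
M = Matrix([[0, 1, 1, 0, 0, 0],
            [0, 0, 0, 2, 1, 0],
            [0, 0, 0, 0, 1, 2],
            [0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0]])

print(M.rank(), len(M.nullspace()))             # 3 3
print(M.rank() + len(M.nullspace()) == 6)       # True: the dimension formula holds
# The kernel vectors correspond to the polynomials 1, x - y, and (x - y)^2.
```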

⚠️ Infinite-dimensional cases

⚠️ When dimension formula breaks down

Problem 7 demonstrates failures in infinite dimensions:

⚠️ Derivative on all polynomials

For D = d/dx on R[x] (all polynomials):

  • Range: All of R[x] (every polynomial is a derivative of some other polynomial)
  • Kernel: The constants, a one-dimensional subspace, so D is surjective but not injective
  • The finite-dimensional formula dim(domain) = dim(ker) + dim(range) cannot be applied, because the domain R[x] is infinite-dimensional

⚠️ Multiplication operator

For L(p(x)) = x·p(x) on R[x]:

  • Kernel: {0} (only the zero polynomial)
  • Range: All polynomials with zero constant term (infinite-dimensional)
  • One side is trivial, the other infinite

⚠️ General principle for infinite dimensions

Problem 7(c) shows: If V is infinite-dimensional and L : V → V is linear:

  • If dim ker L < ∞, then dim L(V) is infinite
  • If dim L(V) < ∞, then dim ker L is infinite

Don't confuse: In finite dimensions, small kernel means large range and vice versa; in infinite dimensions, at least one must be infinite.

🎲 Probability and linear independence

🎲 Random bit vectors

Problem 8 explores: "What is the probability that a random bit vector lies in the span of other vectors?"

🎲 Linear independence test

Part (i): A collection S of k bit vectors in B³ is linearly independent if and only if the kernel of the matrix M (with columns from S) is trivial (contains only the zero vector).

Connection: Linear independence means no non-trivial combination equals zero, which is exactly the definition of trivial kernel.

🎲 Adding a random vector

Part (ii): Given 2 linearly independent bit vectors in B³, when is S ∪ {v} linearly independent for a random v?

  • Test: Check if v is in span(S)
  • If v is not in the span, then S ∪ {v} is linearly independent

🎲 Eigenvalues and probability

Part (iii) connects to eigenvalues:

  • Characteristic polynomial of 3×3 bit matrix has degree 3
  • Coefficients are 0 or 1, giving finitely many possibilities
  • Key question: What is the probability that 0 is an eigenvalue?
  • Connection to kernel: 0 is an eigenvalue ⟷ kernel is non-trivial ⟷ columns are linearly dependent

Probability that columns form a basis: Equals the probability that the kernel is trivial, which equals the probability that 0 is not an eigenvalue.
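
A brute-force check of that probability (not in the excerpt): enumerate all 2⁹ = 512 bit matrices and count those whose columns form a basis. Over B, a 0/1 matrix has trivial kernel exactly when its integer determinant is odd, since reducing the determinant mod 2 gives the determinant over B.

```python
from itertools import product

def det3(rows):
    """Integer determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

count = sum(1 for entries in product((0, 1), repeat=9)
            if det3([entries[0:3], entries[3:6], entries[6:9]]) % 2 == 1)

print(count, count / 512)    # 168 matrices, so the probability is 168/512 = 21/64
```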

Contrast with real vectors: For real vectors chosen "at random," the probability that a random vector lies in the span of n-1 other vectors in R^n is zero (the span has measure zero).


Projection Matrices

17.1 Projection Matrices

🧭 Overview

🧠 One-sentence thesis

Projection matrices allow us to find the component of a vector that lies in the range of a matrix, which is the closest approximation when the original system has no exact solution.

📌 Key points (3–5)

  • Why projection matters: when M X = V has no solution, the best approximation is to solve M X = V_r, where V_r is the part of V in the range of M.
  • The projection formula: the matrix M (M^T M)^(-1) M^T projects any vector V onto the range of M.
  • How it works: multiplying V by this projection matrix extracts V_r, the component in ran M, discarding the component in ker M^T.
  • Common confusion: the projection matrix is not M itself; it is a composite expression that depends on both M and M^T.
  • Connection to least squares: solving M^T M X = M^T V gives X, and then M X recovers V_r, the projection of V.

🧩 The geometry behind projection

🧩 Decomposing the codomain

The codomain of M is the direct sum codom M = ran M ⊕ ker M^T.

  • Any vector V in the codomain can be uniquely written as V = V_r + V_k, where:
    • V_r is in the range of M (ran M)
    • V_k is in the kernel of M transpose (ker M^T)
  • The system M X = V has a solution if and only if V is entirely in ran M, i.e., V_k = 0.
  • If V_k ≠ 0, there is no exact solution, but we can solve for the "closest thing": M X = V_r.

🎯 Why V_r is the best approximation

  • The excerpt states that "the closest thing to a solution of M X = V is a solution to M X = V_r."
  • V_r is the part of V that M can actually "reach."
  • V_k is orthogonal to the range of M (it lies in ker M^T), so it cannot be produced by M acting on any X.
  • Example: if you want to hit a target V but M can only produce vectors in a plane, V_r is the point in that plane nearest to V.

🔧 Deriving the projection matrix

🔧 Starting from the least-squares solution

The excerpt walks through a chain of implications:

  1. Suppose X is a solution to M X = V_r.
  2. Then M^T M X = M^T V_r.
  3. Since V_k is in ker M^T, we have M^T V_k = 0, so M^T (V_r + V_k) = M^T V_r.
  4. Therefore M^T M X = M^T V, the familiar least-squares equation.
  5. If M^T M is invertible, then X = (M^T M)^(-1) M^T V.

🧮 The projection formula

  • By assumption, X solves M X = V_r.
  • Substituting X = (M^T M)^(-1) M^T V into M X gives:
    • M (M^T M)^(-1) M^T V = V_r.
  • Conclusion: the matrix M (M^T M)^(-1) M^T projects V onto its ran M component.

Projection matrix onto ran M: P = M (M^T M)^(-1) M^T

  • Applying P to any vector V extracts V_r.
  • Don't confuse: this is not the same as M or M^T alone; it is a specific combination that "undoes" the effect of M^T M.

📐 Worked example

📐 Projecting onto a plane

The excerpt provides Example 154:

Goal: project the vector (1, 1, 1) onto the span of (1, 1, 0) and (1, -1, 0).

Setup:

  • The span is the range of the matrix M with columns (1, 1, 0) and (1, -1, 0):
    • M = matrix with rows [1, 1], [1, -1], [0, 0] (3×2 matrix).

Steps:

  1. Compute M^T M:
    • M^T = matrix with rows [1, 1, 0] and [1, -1, 0] (2×3 matrix).
    • M^T M = matrix with rows [2, 0], [0, 2] (2×2 matrix).
  2. Invert M^T M:
    • (M^T M)^(-1) = (1/2) times the identity 2×2 matrix.
  3. Form the projection matrix:
    • P = M (M^T M)^(-1) M^T = (1/2) M M^T.
    • After multiplication, P = (1/2) times the matrix with rows [2, 0, 0], [0, 2, 0], [0, 0, 0].
  4. Apply P to (1, 1, 1):
    • P (1, 1, 1) = (1, 1, 0).

Result: the projection of (1, 1, 1) onto the plane is (1, 1, 0).

🔍 Interpreting the result

  • The original vector (1, 1, 1) has a component (1, 1, 0) in the plane and a component (0, 0, 1) perpendicular to it.
  • The projection matrix extracts only the in-plane part.
  • Notice that the third row of P is all zeros, which "kills" the third coordinate (the out-of-plane component).
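
The same computation can be reproduced numerically. A short numpy sketch of Example 154 that also confirms the two defining properties of a projection, P² = P and P^T = P:

```python
import numpy as np

M = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [0.0,  0.0]])

P = M @ np.linalg.inv(M.T @ M) @ M.T       # projection onto ran M

print(P)                                    # [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
print(P @ np.array([1.0, 1.0, 1.0]))        # [1, 1, 0]: the projection of (1, 1, 1)
print(np.allclose(P @ P, P), np.allclose(P.T, P))   # True True: idempotent and symmetric
```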

🔗 Connection to least squares

🔗 From projection back to solutions

  • The excerpt emphasizes that "we learned to find solutions to this in the previous subsection."
  • The least-squares method solves M^T M X = M^T V to find X.
  • The projection matrix provides an alternative view: instead of solving for X directly, we first project V to V_r, then solve M X = V_r (which now has a solution).

🔗 Why both perspectives are useful

  • Least squares: focuses on finding the best-fit X.
  • Projection: focuses on understanding what part of V is "reachable" by M.
  • Both rely on the same formula M^T M X = M^T V, but projection makes the geometry explicit.
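
A small numerical check of this equivalence, reusing the matrix and vector from Example 154 (an illustrative choice): the least-squares solution X returned by numpy satisfies M X = P V.

```python
import numpy as np

M = np.array([[1.0, 1.0], [1.0, -1.0], [0.0, 0.0]])
V = np.array([1.0, 1.0, 1.0])

X, *_ = np.linalg.lstsq(M, V, rcond=None)        # solves M^T M X = M^T V
P = M @ np.linalg.inv(M.T @ M) @ M.T             # projection onto ran M

print(np.allclose(M @ X, P @ V))                 # True: M X recovers V_r, the projection of V
```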

Singular Value Decomposition

17.2 Singular Value Decomposition

🧭 Overview

🧠 One-sentence thesis

Singular value decomposition provides an analog of the eigenvalue problem for non-square matrices by finding orthonormal bases for the domain and range that reveal how the linear transformation stretches input vectors.

📌 Key points (3–5)

  • What SVD solves: when a linear transformation L : V → W has different input and output dimensions (m ≠ n), the matrix M is not square, so there is no standard eigenvalue problem—SVD provides the analog.
  • How it works: even though M is not square, both M M^T and M^T M are square and symmetric, so they have eigenvalue problems with orthonormal eigenvector bases.
  • What singular values are: the square roots of the eigenvalues (√λᵢ) that appear along the diagonal of the transformed matrix, showing the stretching factors.
  • Common confusion: right singular vectors (orthonormal basis for the input space V) vs left singular vectors (orthonormal basis for the output space W)—they come from eigenvectors of M^T M and M M^T respectively.
  • Geometric meaning: SVD provides orthonormal bases for domain and range and reveals the factors by which L stretches the orthonormal input basis vectors.

🔧 The setup and key matrices

🔧 Why non-square matrices need a different approach

  • When L : V → W is a linear transformation with dim V = n and dim W = m, the matrix M is m × n.
  • If n ≠ m, the matrix is not square, so there is no standard eigenvalue problem.
  • However, if both V and W have inner products, we can use orthonormal bases and construct square matrices from M.

🔄 Constructing square symmetric matrices

Even though M is not square, both M M^T and M^T M are square and symmetric.

  • M^T is the matrix of a linear transformation L* : W → V (the adjoint).
  • This gives two compositions:
    • L* L : V → V (represented by M^T M, an n × n matrix)
    • L L* : W → W (represented by M M^T, an m × m matrix)
  • Both are square and symmetric, so both can be diagonalized with orthonormal eigenvector bases (as shown in Chapter 15).

🎯 Finding the singular value decomposition

🎯 Starting with eigenvectors of L* L

The excerpt assumes ker L = {0} (trivial kernel) to simplify the computation.

Step 1: Find orthonormal eigenvectors for the input space

  • Find an orthonormal basis (u₁, …, uₙ) for V composed of eigenvectors of L* L.
  • These satisfy: L* L uᵢ = λᵢ uᵢ.

Step 2: Show that L uᵢ are eigenvectors of L L*

  • Multiply the equation L* L uᵢ = λᵢ uᵢ by L:
    • L L* L uᵢ = λᵢ L uᵢ
  • This means L uᵢ is an eigenvector of L L*.
  • The vectors (L u₁, …, L uₙ) are linearly independent because ker L = {0}.

📏 Computing lengths and angles

The excerpt computes the inner product of the transformed vectors:

  • (M uᵢ) · (M uⱼ) = uᵢ^T M^T M uⱼ = λⱼ uᵢ^T uⱼ = λⱼ δᵢⱼ
  • This shows the vectors (L u₁, …, L uₙ) are orthogonal but not orthonormal.
  • The length of L uᵢ is √λᵢ.

Normalizing to get orthonormal vectors:

  • Divide each by its length: (L u₁ / √λ₁, …, L uₙ / √λₙ)
  • These are orthonormal and linearly independent.

🧩 Completing the output basis

  • Since ker L = {0}, we have dim L(V) = dim V = n.
  • Since n ≤ m (input dimension at most output dimension), these n normalized vectors span only L(V); when n < m they are not yet a full basis for W.
  • However, they are a subset of the eigenvectors of L L*, so we can extend them to an orthonormal basis:
    • O' = (L u₁ / √λ₁, …, L uₙ / √λₙ, vₙ₊₁, …, vₘ) =: (v₁, …, vₘ)

📐 The singular value decomposition result

📐 The diagonal-like matrix

With orthonormal basis O = (u₁, …, uₙ) for V and O' = (v₁, …, vₘ) for W, compute the matrix of L:

  • L O = (L u₁, …, L uₙ) = (√λ₁ v₁, …, √λₙ vₙ)
  • This can be written as (v₁, …, vₘ) times a matrix with √λᵢ along the leading diagonal and zeros elsewhere.

The resulting matrix has the form:

√λ₁   0    ···  0
0    √λ₂   ···  0
⋮     ⋮    ⋱   ⋮
0     0    ··· √λₙ
0     0    ···  0
⋮     ⋮    ⋱   ⋮
0     0    ···  0

The numbers √λᵢ along the leading diagonal are called the singular values of L.

🔑 Right and left singular vectors

| Term | Definition | Source |
| --- | --- | --- |
| Right singular vectors | Orthonormal eigenvectors of M^T M (basis for input space V) | The uᵢ vectors |
| Left singular vectors | Orthonormal eigenvectors of M M^T (basis for output space W) | The normalized L uᵢ / √λᵢ vectors |

Don't confuse: Right singular vectors live in the input space and come from M^T M; left singular vectors live in the output space and come from M M^T.

🔄 Change of basis representation

If P is the matrix whose columns are the right singular vectors and Q is the matrix whose columns are the left singular vectors, then:

  • M' = Q⁻¹ M P
  • where M' is the diagonal-like matrix with singular values.

🎨 Geometric interpretation

🎨 What singular values reveal

Singular vectors and values provide orthonormal bases for the domain and range of L and give the factors by which L stretches the orthonormal input basis vectors.

  • The right singular vectors form an orthonormal basis for the input space.
  • The left singular vectors form an orthonormal basis for the output space.
  • Each singular value √λᵢ tells you how much L stretches the i-th input basis vector uᵢ.
  • Example: if √λ₁ = 2, then L stretches u₁ by a factor of 2 to produce 2v₁.

🌟 Why SVD matters (Wigner's "Unreasonable Effectiveness")

The excerpt quotes Eugene Wigner's observation about "canonical quantities"—those that do not depend on any choices we make for calculating them—corresponding to important features of systems.

Applications mentioned:

  • Eigenvalues of certain equations encode notes and harmonics that a guitar string can play.
  • Singular values appear in many linear algebra applications, especially those involving very large data sets such as statistics and signal processing.

📝 Worked example

📝 Computing SVD for a specific matrix

Given matrix:

M = [ 1/2    1/2  ]
    [  -1      1  ]
    [-1/2  -1/2  ]

Step 1: Find right singular vectors (from M^T M)

  • M^T M = [  3/2  -1/2 ]
            [ -1/2   3/2 ]
  • Eigenvalues and eigenvectors:
    • λ = 1, u₁ = (1/√2, 1/√2)
    • λ = 2, u₂ = (1/√2, -1/√2)
  • Orthonormal input basis O = (u₁, u₂)

Step 2: Compute M uᵢ and find left singular vectors

  • M u₁ = (1/√2, 0, -1/√2) (eigenvector of M M^T with eigenvalue 1)
  • M u₂ = (0, -√2, 0) (eigenvector of M M^T with eigenvalue 2)
  • Normalize: v₁ = (1/√2, 0, -1/√2), v₂ = (0, -1, 0)
  • Third eigenvector (eigenvalue 0): v₃ = (1/√2, 0, 1/√2)
  • Orthonormal output basis O' = (v₁, v₂, v₃)

Step 3: The SVD result

  • The new matrix M' with respect to bases O and O' is:
    M' = [ 1    0  ]
         [ 0   √2  ]
         [ 0    0  ]
    
  • Singular values are 1 and √2.
  • Relationship: M' = Q⁻¹ M P, where P and Q are change-of-basis matrices formed from the right and left singular vectors.
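
A numpy sketch of this worked example: build P and Q from the singular vectors above, confirm M' = Q⁻¹ M P, and cross-check the singular values with the built-in SVD.

```python
import numpy as np

M = np.array([[ 0.5,  0.5],
              [-1.0,  1.0],
              [-0.5, -0.5]])

s = 1 / np.sqrt(2)
P = np.array([[s,  s],             # columns: right singular vectors u1, u2
              [s, -s]])
Q = np.array([[ s,  0.0, s],       # columns: left singular vectors v1, v2, v3
              [0.0, -1.0, 0.0],
              [-s,  0.0, s]])

M_prime = np.linalg.inv(Q) @ M @ P
print(np.round(M_prime, 10))       # approx [[1, 0], [0, 1.414...], [0, 0]]

# The built-in SVD returns the singular values in descending order: sqrt(2) and 1.
print(np.linalg.svd(M, compute_uv=False))
```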

Review Problems

17.3 Review Problems

🧭 Overview

🧠 One-sentence thesis

These review problems consolidate understanding of linear transformations, kernels, inner products, orthogonalization, and singular value decomposition by asking students to prove properties, construct examples, and apply techniques to differential equations.

📌 Key points (3–5)

  • Solution sets and kernels: To describe all solutions to L(u) = v, you need both a particular solution and the kernel of L.
  • Properties of M^T M: When M has trivial kernel, the matrix M^T M is symmetric, positive semi-definite, and maps zero only to zero.
  • Gram-Schmidt reformulation: The classical algorithm can be rewritten using projection matrices.
  • Common confusion: M^T M is invertible even when M itself is not square or invertible (if columns are linearly independent).
  • Applications: Singular value decomposition and least squares extend to approximating solutions of differential equations in polynomial spaces.

🔍 Linear systems and kernels

🔍 Why the kernel matters for solution sets

Problem 1 asks: given L(u_ps) = v, why must you compute ker L to describe the full solution set of L(u) = v?

  • You have found one solution u_ps.
  • The complete solution set is u_ps + ker L (all vectors of the form u_ps + k where k is in the kernel).
  • Without the kernel, you only know one solution, not all solutions.
  • Example: If ker L contains non-zero vectors, there are infinitely many solutions; if ker L = {0}, u_ps is the unique solution.

Don't confuse: A particular solution vs. the general solution—the general solution is the particular solution plus the entire kernel.
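
A small sympy illustration (the system below is an arbitrary choice with a two-dimensional kernel): adding any combination of kernel vectors to a particular solution still solves L(u) = v, which is why the kernel is needed to describe the full solution set.

```python
from sympy import Matrix, symbols, expand

L = Matrix([[1, 2, 3],
            [2, 4, 6]])
v = Matrix([6, 12])

u_ps = Matrix([1, 1, 1])             # one particular solution: L * u_ps = v
kernel = L.nullspace()               # basis of ker L (two vectors here)

a, b = symbols('a b')
u_general = u_ps + a * kernel[0] + b * kernel[1]
print((L * u_general).applyfunc(expand))   # Matrix([[6], [12]]) for every a, b
```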

🧮 Properties of M^T M

🧮 Symmetry and positivity

Problem 2 explores three properties when M is m × n with trivial kernel:

| Property | Statement | Why it holds |
| --- | --- | --- |
| Symmetry | u^T M^T M v = v^T M^T M u | M^T M is symmetric; the transpose of a scalar equals itself |
| Positive semi-definite | v^T M^T M v ≥ 0 | This is (Mv)^T (Mv) = ‖Mv‖², which is always non-negative |
| Zero only at zero | If v^T M^T M v = 0, then v = 0 | ‖Mv‖² = 0 implies Mv = 0; trivial kernel means v = 0 |

🔑 The hint about dot products

  • The expression v^T M^T M v can be rewritten as (Mv) · (Mv) in R^n.
  • The dot product of a vector with itself is the squared length.
  • A vector has zero length if and only if it is the zero vector.
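
A quick numerical check of the hint (the random matrix is chosen only for illustration): v^T M^T M v equals ‖Mv‖², and M^T M is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))          # tall matrix; columns are almost surely independent
A = M.T @ M
v = rng.standard_normal(3)

print(np.allclose(A, A.T))                                   # True: symmetric
print(np.isclose(v @ A @ v, np.linalg.norm(M @ v) ** 2))     # True: v^T M^T M v = ||Mv||^2 >= 0
```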

🔄 Gram-Schmidt and projections

🔄 Reformulation using projection matrices

Problem 3 asks to rewrite the Gram-Schmidt algorithm in terms of projection matrices.

  • Classical Gram-Schmidt subtracts components along previous vectors.
  • Each subtraction step can be expressed as applying a projection matrix.
  • This reformulation makes the geometric meaning clearer: at each step, you project onto the orthogonal complement of the span of previous vectors.
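
One way such a reformulation might look, sketched in numpy (an interpretation of the problem, not the textbook's own solution): at each step, project the next vector onto the orthogonal complement of the span of the vectors kept so far using the projection matrix Q (Q^T Q)⁻¹ Q^T, where Q collects the accepted vectors.

```python
import numpy as np

def gram_schmidt_via_projections(vectors):
    """Orthonormalize linearly independent `vectors` using projection matrices."""
    basis = []
    for v in vectors:
        if basis:
            Q = np.column_stack(basis)
            P = Q @ np.linalg.inv(Q.T @ Q) @ Q.T   # projection onto span(basis)
            v = v - P @ v                           # keep only the orthogonal component
        basis.append(v / np.linalg.norm(v))
    return np.column_stack(basis)

Q = gram_schmidt_via_projections([np.array([1.0, 1.0, 0.0]),
                                  np.array([1.0, 0.0, 1.0]),
                                  np.array([0.0, 1.0, 1.0])])
print(np.allclose(Q.T @ Q, np.eye(3)))    # True: the columns are orthonormal
```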

🧱 Invertibility of M^T M

Problem 4 proves: if v₁, ..., vₖ are linearly independent, then M = (v₁ ··· vₖ) may not be invertible, but M^T M is invertible.

  • M is m × k; if m ≠ k, M cannot be square, so not invertible in the usual sense.
  • Linear independence of columns means ker M = {0}.
  • From Problem 2, M^T M maps zero only to zero, so M^T M is invertible.
  • Don't confuse: M being non-square vs. M^T M always being square (k × k).

📐 Singular value decomposition examples

📐 Constructing simple SVD examples

Problem 5 asks to write out the SVD theorem for:

  • A 3 × 1 matrix
  • A 3 × 2 matrix
  • A 3 × 3 symmetric matrix

Requirements:

  • No zero components (to make the structure clear).
  • Simple computations (to focus on understanding, not arithmetic).
  • Explain the choice of matrices.

🎯 Why choose specific matrices

The problem emphasizes pedagogical choices:

  • Non-zero entries reveal the full structure of the decomposition.
  • Simple numbers (e.g., 1, √2) make hand computation feasible.
  • Symmetric matrices have special properties (eigenvectors are singular vectors).

🧪 Application to differential equations

🧪 Polynomial approximation of solutions

Problem 6 applies least squares to approximate a solution to the differential equation:

d/dx f = x + x²

Setup:

  • Domain and codomain are both span{1, x, x²} (polynomials of degree at most 2).
  • The derivative operator is a linear transformation between these spaces.
  • Define bases for domain and codomain.

Approach:

  • The equation may not have an exact polynomial solution in this space.
  • Use least squares to find the "best" polynomial approximation.
  • This is a finite-dimensional analogue of function approximation.

🔧 The hint about bases

  • Begin by choosing bases for both the domain (polynomials before differentiation) and codomain (polynomials after differentiation).
  • Express the derivative operator as a matrix with respect to these bases.
  • The right-hand side x + x² becomes a vector in the codomain.
  • Solve the resulting least-squares problem.
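
One possible numerical setup, using the ordered basis (1, x, x²) for both domain and codomain (the specific route below is an assumption, not the excerpt's worked solution). No exact solution exists because no derivative of a degree-2 polynomial contains an x² term, so least squares drops that unreachable component.

```python
import numpy as np

# Matrix of d/dx in the basis (1, x, x^2): d/dx(1) = 0, d/dx(x) = 1, d/dx(x^2) = 2x.
D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

b = np.array([0.0, 1.0, 1.0])        # coordinates of the right-hand side x + x^2

# lstsq returns the minimum-norm least-squares solution, handling the singular D^T D.
c, *_ = np.linalg.lstsq(D, b, rcond=None)
print(c)         # [0, 0, 0.5]: approximate solution f(x) = x^2 / 2
print(D @ c)     # [0, 1, 0]: its derivative x, the projection of x + x^2 onto ran(d/dx)
```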