A complete interactive guide — from vectors to eigenvalues, with worked examples at every step.
Estimated reading time: 2-3 hours.
"Algebra" means relationships between quantities. "Linear Algebra" means line-like relationships — predictable, proportional, no surprises.
Think of a rooftop: move 3 feet forward, rise 1 foot. Move 30 feet forward — you expect a 10-foot rise. That's linear. Climbing a dome? Each foot forward raises you a different amount. Not linear.
A function $F$ is linear if it obeys two rules:
If I double the input, the output doubles. If I triple it, the output triples.
The effect of combined inputs = the combination of individual effects.
You can combine both rules into one: $F(ax + by) = aF(x) + bF(y)$. This is called superposition.
| Function | Linear? | Why? |
|---|---|---|
| $F(x) = 3x$ | Yes | $F(2x) = 6x = 2 \cdot 3x = 2F(x)$ ✓ |
| $F(x) = x^2$ | No | $F(2x) = 4x^2 \neq 2x^2 = 2F(x)$ ✗ |
| $F(x) = x + 3$ | No | $F(0) = 3 \neq 0$. Linear functions must pass through the origin ✗ |
| $F(x) = \sin(x)$ | No | $\sin(2x) \neq 2\sin(x)$ in general ✗ |
| $F(x) = 0$ | Yes | $F(ax) = 0 = a \cdot 0 = aF(x)$ ✓ (the "boring" linear function) |
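The verdicts in the table can be spot-checked numerically. A minimal Python sketch (the helper name `is_linear` is mine, not a standard function):

```python
# Numerically spot-check the two linearity rules for a few candidate functions.
import math

def is_linear(F, xs=(1.0, 2.5, -3.0), scalars=(2.0, -1.5)):
    """Test F(a*x) == a*F(x) (scaling) and F(x + y) == F(x) + F(y) (additivity)."""
    close = lambda p, q: math.isclose(p, q, abs_tol=1e-9)
    for x in xs:
        for a in scalars:
            if not close(F(a * x), a * F(x)):
                return False
    for x in xs:
        for y in xs:
            if not close(F(x + y), F(x) + F(y)):
                return False
    return True

print(is_linear(lambda x: 3 * x))        # True  — pure scaling
print(is_linear(lambda x: x ** 2))       # False — squaring breaks scaling
print(is_linear(lambda x: x + 3))        # False — misses the origin
print(is_linear(lambda x: math.sin(x)))  # False
print(is_linear(lambda x: 0.0))          # True  — the "boring" one
```

Spot-checks like this can't prove a function is linear, but they catch non-linear ones quickly.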
Compare $F(x) = ax$ (linear) with $F(x) = x^n$ (non-linear). Watch how doubling the input affects the output.
The force in a spring is $F = kx$, where $k$ is the spring constant and $x$ is the displacement:
Double the stretch, double the force. Automotive suspension engineers, bridge designers, and seismologists rely on this linearity. When deformation exceeds the elastic limit the relationship becomes non-linear — exactly when the math gets hard.
Maxwell's equations for electromagnetism are linear. This means if $E_1$ is the electric field from charge A and $E_2$ is the field from charge B, then the total field is simply $E_1 + E_2$. You can analyse each charge separately.
If the equations were non-linear (like in general relativity), you couldn't just add fields — the presence of one charge would change how the other charge's field works. This is why gravity is so much harder to calculate than electromagnetism.
Before we can do linear algebra, we need something to do it to. That something is the vector.
A vector is an ordered list of numbers. That's it. But depending on context, it can represent many things:
We write vectors as columns (by convention) and use bold or arrows:
$$\mathbf{v} = \vec{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix} \qquad \text{a 2D vector}$$

$$\mathbf{w} = \begin{bmatrix} 1 \\ -2 \\ 5 \end{bmatrix} \qquad \text{a 3D vector}$$

Add vectors component by component. Geometrically: put them tip-to-tail.
Multiply every component by the same number (the "scalar"). This stretches or shrinks the vector.
Multiplying by 2 doubles the length. Multiplying by $-1$ flips the direction.
Drag the sliders to change vectors $\mathbf{u}$ and $\mathbf{v}$. See their sum and scalar multiples.
Given $\mathbf{a} = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} -1 \\ 4 \\ 2 \end{bmatrix}$, compute $3\mathbf{a} - 2\mathbf{b}$.
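If you'd like to verify the arithmetic in code, NumPy performs the same component-wise operations:

```python
import numpy as np

a = np.array([2, -1, 3])
b = np.array([-1, 4, 2])

# Scaling and adding happen component by component.
result = 3 * a - 2 * b
print(result.tolist())  # [8, -11, 5]
```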
The length of a vector comes from the Pythagorean theorem:
Example: $\left\|\begin{bmatrix} 3 \\ 4 \end{bmatrix}\right\| = \sqrt{9 + 16} = \sqrt{25} = 5$
A unit vector has length 1. To "normalise" any vector, divide by its length: $\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$. This keeps the direction but makes the length exactly 1.
Normalise $\mathbf{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$.
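A quick NumPy check of the normalisation (assuming NumPy is available):

```python
import numpy as np

v = np.array([3.0, 4.0])
v_hat = v / np.linalg.norm(v)         # divide by the length, here 5
print(v_hat.tolist())                  # [0.6, 0.8]
print(np.isclose(np.linalg.norm(v_hat), 1.0))  # True — unit length
```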
Your phone's position is a vector $\mathbf{p} = (x, y, z)$ in 3D space relative to the Earth's centre. The satellite's position is another vector $\mathbf{s}$. The distance between you and the satellite is $\|\mathbf{p} - \mathbf{s}\|$ — the length of the difference vector. Your phone receives timing signals from 4+ satellites and solves for the position vector $\mathbf{p}$ where all the distances are consistent. Pure vector arithmetic.
Now the payoff: combining linear operations. Which operations are linear?
The crucial insight: we can combine multiple linear functions and the result is still linear:
This is called a linear combination. Each input is scaled by a constant (a "weight"), then the results are added.
Let's verify the linearity of $G$ explicitly:
Enter coefficients $a, b, c$ and two input vectors. Verify that $G(\mathbf{u} + \mathbf{v}) = G(\mathbf{u}) + G(\mathbf{v})$.
Express $\begin{bmatrix} 7 \\ 1 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ -1 \end{bmatrix}$.
We need $a$ and $b$ such that $a\begin{bmatrix} 1 \\ 2 \end{bmatrix} + b\begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 7 \\ 1 \end{bmatrix}$.
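Finding the weights is a 2×2 linear system, which `np.linalg.solve` handles directly. A sketch:

```python
import numpy as np

# Columns of A are the two given vectors; we want A @ [a, b] = target.
A = np.array([[1.0, 3.0],
              [2.0, -1.0]])
target = np.array([7.0, 1.0])

coeffs = np.linalg.solve(A, target)
print(coeffs.round(4).tolist())  # ≈ [1.4286, 1.8571], i.e. a = 10/7, b = 13/7
```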
A sound engineer mixes three microphone tracks — each channel is scaled by a gain coefficient and summed:
This is a linear combination. Doubling all input volumes doubles the mix — no distortion (in the linear regime). Every DAW, mixing console, and hearing aid processes audio this way.
The dot product (or "inner product") takes two vectors and produces a single number. It measures how much two vectors point in the same direction.
Multiply corresponding components, then add them all up.
$\begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 4 \\ -2 \\ 5 \end{bmatrix} = (2)(4) + (3)(-2) + (-1)(5) = 8 - 6 - 5 = -3$
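The same computation in NumPy, for checking your own examples:

```python
import numpy as np

a = np.array([2, 3, -1])
b = np.array([4, -2, 5])

# Multiply matching components, then sum: 8 - 6 - 5
print(np.dot(a, b))   # -3
print(a @ b)          # -3, same thing with the @ operator
```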
The dot product has an elegant geometric interpretation:
where $\theta$ is the angle between the two vectors.
This tells us everything about how two vectors relate:
| $\mathbf{a} \cdot \mathbf{b}$ | $\cos\theta$ | Meaning |
|---|---|---|
| Positive | $> 0$ | Vectors point roughly the same way ($\theta < 90°$) |
| Zero | $= 0$ | Vectors are perpendicular / orthogonal ($\theta = 90°$) |
| Negative | $< 0$ | Vectors point roughly opposite ($\theta > 90°$) |
| $= \|\mathbf{a}\|\|\mathbf{b}\|$ | $= 1$ | Vectors point the exact same direction ($\theta = 0°$) |
The projection of $\mathbf{a}$ onto $\mathbf{b}$ tells you "how far does $\mathbf{a}$ go in the direction of $\mathbf{b}$?"
The scalar part $\frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{b}\|}$ is called the scalar projection (a signed length).
Adjust vectors $\mathbf{a}$ and $\mathbf{b}$. Watch the dot product value and the projection change.
Find the angle between $\mathbf{a} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 4 \\ -5 \\ 6 \end{bmatrix}$.
Project $\mathbf{a} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ onto $\mathbf{b} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ (the x-axis).
This makes intuitive sense: the "shadow" of $(3,4)$ onto the x-axis is just $(3,0)$.
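Both exercises reduce to a few NumPy lines. A sketch (the angle is rounded to one decimal place):

```python
import numpy as np

# Exercise 1: angle between two 3D vectors.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -5.0, 6.0])
cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.degrees(np.arccos(cos_theta))
print(round(float(theta), 1))   # ≈ 68.6 degrees

# Exercise 2: projection of (3, 4) onto the x-axis is its "shadow" (3, 0).
u = np.array([3.0, 4.0])
e = np.array([1.0, 0.0])
proj = (u @ e) / (e @ e) * e
print(proj.tolist())   # [3.0, 0.0]
```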
Netflix represents each user as a vector of movie ratings: User A = (5, 3, 0, 4, ...) and User B = (4, 3, 0, 5, ...). The dot product (or cosine similarity $\frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{a}\|\|\mathbf{b}\|}$) measures how similar their tastes are. A high value means "these users like the same things" → recommend User B's top-rated movies to User A. This is called collaborative filtering.
In physics, work = force $\cdot$ displacement = $\mathbf{F} \cdot \mathbf{d} = \|\mathbf{F}\|\|\mathbf{d}\|\cos\theta$. Push a box at an angle? Only the component of force along the direction of motion does work. That's the dot product in action — it extracts "the useful part" of the force vector.
With vectors and linear combinations under our belt, three big questions arise:
The span of a set of vectors = all possible linear combinations of those vectors. It's the set of every point you can reach by scaling and adding them.
Think of it like a colour mixing analogy:
Vectors are linearly independent if none of them can be written as a combination of the others. Equivalently:
The only way to combine them to get zero is if all coefficients are zero.
Intuition: each independent vector adds a genuinely new direction. If a vector is dependent, it's "wasted" — it points somewhere the others already covered.
Are $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 3 \\ 6 \end{bmatrix}$ linearly independent?
Are $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ independent?
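A practical test for independence: stack the vectors as columns and check the rank. A NumPy sketch of both exercises:

```python
import numpy as np

# Stack candidate vectors as columns; full column rank ⇔ independent.
dependent = np.column_stack(([1, 2], [3, 6]))
independent = np.column_stack(([1, 0], [0, 1]))

print(np.linalg.matrix_rank(dependent))    # 1 — v2 = 3·v1, so dependent
print(np.linalg.matrix_rank(independent))  # 2 — independent
```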
A basis for a space is a set of vectors that is:
The standard basis for $\mathbb{R}^3$ is:
$$\mathbf{e}_1 = \begin{bmatrix} 1\\0\\0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0\\1\\0 \end{bmatrix}, \quad \mathbf{e}_3 = \begin{bmatrix} 0\\0\\1 \end{bmatrix}$$

Any 3D vector is a combination: $\begin{bmatrix} 3\\-2\\5 \end{bmatrix} = 3\mathbf{e}_1 - 2\mathbf{e}_2 + 5\mathbf{e}_3$.
Two vectors in 2D. When they're independent, they span the whole plane (shown in blue). When dependent, they only span a line.
Your screen creates colours using three basis vectors: Red, Green, Blue. The "span" of {R, G, B} is every colour your screen can display. The dimension is 3 (you need exactly 3 primaries). If you only had R and G (dimension 2), you'd be stuck in a plane of the colour cube — no blues at all.
Any colour = $r \cdot \text{Red} + g \cdot \text{Green} + b \cdot \text{Blue}$ with $r,g,b \in [0,1]$. This is literally a linear combination with the colour primaries as basis vectors.
We have a bunch of inputs to track, and predictable linear operations to perform. How do we organise?
Inputs go in vertical columns (vectors):
Operations go in horizontal rows. If $F(x,y,z) = 3x + 4y + 5z$, we abbreviate the entire function as the row $[3\;\;4\;\;5]$. Each row is a recipe: "take this much of input 1, this much of input 2, etc."
Multiple operations stack into rows; multiple inputs sit side-by-side as columns:
Read it as: "Row 1 says: take 3 of the first input + 4 of the second + 5 of the third. Row 2 says: take 3 of the first and ignore the rest."
To find entry $(i, j)$ of the output: take row $i$ of the operations matrix and column $j$ of the input matrix. Multiply corresponding entries and sum.
It's a dot product of row $i$ with column $j$.
Size convention: $m \times n$ means $m$ rows, $n$ columns. Multiply $[m \times \mathbf{n}] \cdot [\mathbf{n} \times p] = [m \times p]$. The inner dimensions must match — the number of columns in the first must equal the number of rows in the second.
Compute $\begin{bmatrix} 2 & -1 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 5 \\ 2 \end{bmatrix}$.
Compute $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$.
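Both products can be checked with NumPy's `@` operator, which implements exactly this row-times-column rule:

```python
import numpy as np

# Exercise 1: matrix times vector.
A = np.array([[2, -1],
              [3, 4]])
x = np.array([5, 2])
print((A @ x).tolist())   # [8, 23]

# Exercise 2: matrix times matrix.
B = np.array([[1, 2],
              [3, 4]])
C = np.array([[5, 6],
              [7, 8]])
# Entry (i, j) = dot product of row i of B with column j of C.
print((B @ C).tolist())   # [[19, 22], [43, 50]]
```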
Digital cameras apply a $3 \times 3$ colour correction matrix to each pixel's RGB values:
Each row is an operation: "How much of each input colour contributes to this output channel." Every phone camera and Instagram filter is a matrix applied to millions of pixel vectors.
Imagine "pouring" each input column through each operation row. As an input passes an operation, it creates one output entry.
There's an even more powerful way to think about it: each column of the output is a linear combination of the columns of the operations matrix, with the input vector providing the coefficients.
This is the "column picture" — the output is $x_1$ of column 1 $+ x_2$ of column 2 $+ x_3$ of column 3.
Enter two matrices and watch the multiplication step by step. (2×3) × (3×2) = (2×2).
In a neural network, each layer computes $\mathbf{y} = W\mathbf{x} + \mathbf{b}$, where $W$ is a weight matrix. For a layer with 3 inputs and 2 outputs:
GPT, image classifiers, self-driving car vision — they're all stacks of matrix multiplications. Training adjusts the weights; inference is just pouring data through matrices.
Some matrices appear everywhere because they do useful things:
Copies input to output unchanged. $IA = A$ and $AI = A$. It's the "do nothing" matrix — the multiplicative equivalent of 1.
Destroys everything. $OA = O$. The additive identity — $A + O = A$.
Scales each input independently. No mixing between components. Easiest matrix to understand.
Sum all inputs, or average them. Single-row matrices that reduce a vector to a number.
Zeros below the diagonal. Appears in Gaussian elimination. Easy to solve by back-substitution.
A matrix is symmetric if $A = A^T$ (it equals its transpose — same when you flip rows↔columns). Symmetric matrices have special properties: their eigenvalues are always real, and their eigenvectors are orthogonal. They appear in physics, statistics, and optimisation.
Select a preset operation matrix or type your own. See how it transforms the input vector.
GPS satellites broadcast positions in the Earth-Centred Earth-Fixed (ECEF) system. Your phone converts to local East-North-Up (ENU) coordinates using a rotation matrix derived from your latitude $\phi$ and longitude $\lambda$:
Every time your phone shows a blue dot on a map, it has multiplied satellite vectors through this matrix.
The transpose of a matrix flips it across its main diagonal — rows become columns and columns become rows.
If $A$ is $m \times n$, then $A^T$ is $n \times m$. Entry $(A^T)_{ij} = A_{ji}$.
The transpose connects to the dot product: $\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T \mathbf{b}$. Writing the dot product as matrix multiplication makes many proofs cleaner.
Compute $\mathbf{a}^T\mathbf{b}$ where $\mathbf{a} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$.
This is identical to the dot product $\mathbf{a} \cdot \mathbf{b} = 32$. The transpose turns a column into a row, enabling the "row × column" multiplication.
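The same computation in NumPy, treating the vectors as 3×1 matrices so the transpose is visible:

```python
import numpy as np

a = np.array([[1], [2], [3]])   # column vectors as 3×1 matrices
b = np.array([[4], [5], [6]])

# a^T b is a 1×1 matrix whose single entry is the dot product.
print((a.T @ b).item())   # 32
print(a.T.shape)          # (1, 3) — transposing turns the column into a row
```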
If a matrix $A$ transforms input into output ($\mathbf{y} = A\mathbf{x}$), the inverse $A^{-1}$ goes backwards: $\mathbf{x} = A^{-1}\mathbf{y}$. It "undoes" the transformation.
Applying $A$ then $A^{-1}$ (or vice versa) gets you back to where you started.
Not all matrices have inverses. An inverse exists if and only if:
These are all equivalent statements — if any one is true, they're all true.
Swap $a$ and $d$, negate $b$ and $c$, divide by the determinant.
Find the inverse of $A = \begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}$.
Find the inverse of $A = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix}$.
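NumPy makes the invertible/singular distinction concrete. A sketch covering both exercises:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [5.0, 2.0]])
print(np.linalg.det(A))                      # ≈ 1.0 — non-zero, so invertible
print(np.linalg.inv(A).round(6).tolist())    # [[2.0, -1.0], [-5.0, 3.0]]

B = np.array([[2.0, 4.0],
              [1.0, 2.0]])
print(np.linalg.det(B))                      # 0.0 — the columns are dependent
try:
    np.linalg.inv(B)
except np.linalg.LinAlgError:
    print("no inverse — matrix is singular")
```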
The Hill cipher encrypts a message by multiplying letter-pair vectors by a key matrix $K$. To decrypt, multiply by $K^{-1}$. If $K = \begin{bmatrix} 3 & 3 \\ 2 & 5 \end{bmatrix}$, then $\det K = 9$ and $K^{-1} = \frac{1}{9}\begin{bmatrix} 5 & -3 \\ -2 & 3 \end{bmatrix}$, where, working mod 26, the factor $\frac{1}{9}$ means "multiply by 3" (since $9 \cdot 3 = 27 \equiv 1 \pmod{26}$). The inverse literally reverses the encryption — no inverse means no decryption, which is why singular matrices make bad encryption keys.
Let's see linear algebra as a "mini-spreadsheet" in action.
Suppose a new product launches: Apple stock jumps 20%, Google drops 5%, Microsoft stays flat. We want to (1) update each stock value and (2) compute total profit.
Three inputs enter, four outputs leave. The first three rows are a "modified identity" (update each value); the fourth row computes the change. Read each row as a recipe!
Holdings: AAPL = $1000, GOOG = $2000, MSFT = $500.
Enter stock holdings and market changes. The matrix does the rest.
Nobel laureate Wassily Leontief modelled entire economies with matrices. If three industries each consume outputs of the others, the total production $\mathbf{x}$ needed to meet external demand $\mathbf{d}$ is:
Governments and the World Bank still use this matrix model to forecast economic ripple effects.
When we treat inputs as 2D coordinates, a $2 \times 2$ matrix becomes a geometric transformation. Here are the big four:
Stretch horizontally by $s_x$, vertically by $s_y$.
Rotates every point by angle $\theta$ counter-clockwise around the origin.
Flips the y-coordinate. Like looking in a mirror along the x-axis.
Tilts shapes sideways. $k > 0$ leans right; $k < 0$ leans left.
Rotate $(3, 1)$ by $90°$ counter-clockwise.
Find the matrix that reflects across the line $y = x$.
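Both transformations as NumPy matrices (a sketch; the tiny floating-point residue from $\cos 90°$ is rounded away):

```python
import numpy as np

theta = np.pi / 2   # 90° counter-clockwise
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

p = np.array([3.0, 1.0])
print((R @ p).round(10).tolist())   # [-1.0, 3.0]

# Reflection across y = x just swaps the coordinates.
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print((M @ p).tolist())             # [1.0, 3.0]
```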
Pick a transform and watch a unit square warp in real time. The grid shows how the entire space is affected.
Every frame in a 3D game, each vertex is transformed by a series of matrices:
M (Model) positions the object. V (View) moves relative to the camera. P (Projection) flattens 3D to 2D. At 60 FPS with millions of vertices, GPUs perform billions of matrix multiplications per second.
A system of linear equations can be written as a single matrix equation $M\mathbf{x} = \mathbf{b}$:
If $M$ is invertible, the solution is simply $\mathbf{x} = M^{-1}\mathbf{b}$. But we usually don't compute the inverse directly — instead we use Gaussian elimination.
The idea: use legal row operations (add/subtract multiples of rows) to turn the matrix into an upper triangle, then solve from the bottom up.
Legal operations:
Solve: $2x + y = 5$ and $x - y = 1$.
Solve: $x + 2y + z = 9$, $2x - y + 3z = 8$, $3x + y - z = 3$.
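In practice you'd hand either system to a solver, which performs Gaussian elimination internally. A NumPy sketch:

```python
import numpy as np

# 2x + y = 5,  x - y = 1
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])
b = np.array([5.0, 1.0])
print(np.linalg.solve(A, b).tolist())   # [2.0, 1.0] → x = 2, y = 1

# x + 2y + z = 9,  2x - y + 3z = 8,  3x + y - z = 3
M = np.array([[1.0, 2.0, 1.0],
              [2.0, -1.0, 3.0],
              [3.0, 1.0, -1.0]])
c = np.array([9.0, 8.0, 3.0])
x = np.linalg.solve(M, c)
print(np.allclose(M @ x, c))            # True — the solution checks out
```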
| Case | What happens | Geometrically (2D) |
|---|---|---|
| Unique solution | $\det \neq 0$, lines/planes meet at one point | Two lines cross at one point |
| No solution | Contradictory equations (e.g. $0 = 5$) | Parallel lines, never meet |
| Infinite solutions | Redundant equations, free variables | Same line, every point is a solution |
Enter coefficients for two equations. See the solution graphically (intersection of two lines).
An electrical circuit with 3 loops and 3 unknown currents yields:
SPICE simulators solve systems of thousands of linear equations per simulation step.
The determinant is a single number that captures the "essence" of a square matrix. It measures how the matrix scales area (2D) or volume (3D).
| $\det(A)$ | Meaning |
|---|---|
| $\lvert\det\rvert > 1$ | The transformation expands area/volume |
| $\lvert\det\rvert < 1$ | The transformation shrinks area/volume |
| $\lvert\det\rvert = 1$ | Area/volume is preserved (e.g., rotations) |
| $\det = 0$ | Singular! Output collapses to lower dimension. No inverse. |
| $\det < 0$ | Orientation is flipped (mirror reflection) |
$\det\begin{bmatrix} 3 & 1 \\ 2 & 4 \end{bmatrix} = (3)(4) - (1)(2) = 12 - 2 = 10$
The matrix scales area by a factor of 10. Since it's positive, orientation is preserved.
$\det\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$
Determinant is 0! This matrix is singular. Why? Row 3 = 2·Row 2 − Row 1 (check: $2(4,5,6) - (1,2,3) = (7,8,9)$). The three row vectors are linearly dependent — they all lie in a plane, so they can't fill 3D space.
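Both determinants, checked numerically (floating-point arithmetic may give a value near zero rather than exactly zero for the singular case):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
print(np.linalg.det(A))   # ≈ 10.0 — area scales by 10, orientation kept

B = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
# Dependent rows ⇒ determinant 0 (up to round-off).
print(abs(np.linalg.det(B)) < 1e-12)   # True — singular
```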
Adjust matrix entries. The shaded region shows the transformed unit square and its area = |det|.
In finite element analysis, the stiffness matrix $K$ relates forces to displacements: $K\mathbf{u} = \mathbf{f}$. If $\det(K) = 0$, the structure has a mechanism — it can move freely without resistance. Engineers check the determinant to verify structural stability before construction.
Every matrix defines two important subspaces that tell you fundamental things about what the matrix can and cannot do.
The column space of $A$ = the set of all possible outputs $A\mathbf{x}$. It's the span of the columns of $A$.
If $\mathbf{b}$ is in the column space, then $A\mathbf{x} = \mathbf{b}$ has a solution. If it's not, no solution exists.
The rank of a matrix = dimension of the column space = number of linearly independent columns.
The null space of $A$ = the set of all inputs that get mapped to zero.
If the null space is just $\{\mathbf{0}\}$, the matrix is injective (one-to-one) — different inputs always produce different outputs.
If the null space contains non-zero vectors, some information is being destroyed.
The number of "useful" dimensions (rank) plus the number of "destroyed" dimensions (nullity) always equals the total input dimensions. What the matrix doesn't use goes to zero.
Find the null space of $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}$.
Find the column space of $A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}$.
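Rank, nullity, and a null-space basis can all be extracted with NumPy (using the SVD for the basis). A sketch for the first exercise:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

rank = np.linalg.matrix_rank(A)
print(rank)                 # 1 — every column is a multiple of one vector
print(A.shape[1] - rank)    # nullity = 3 - 1 = 2 (rank-nullity theorem)

# Null-space basis from the SVD: right singular vectors beyond the rank.
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[rank:]
print(np.allclose(A @ null_basis.T, 0))   # True — these inputs map to zero
```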
MRI machines acquire far fewer measurements than pixels in the final image. The measurement matrix $A$ maps the unknown image $\mathbf{x}$ to the measurements $\mathbf{b}$: $A\mathbf{x} = \mathbf{b}$. Since $A$ has more columns than rows, the null space is non-trivial — infinitely many images could produce the same measurements. Compressed sensing adds the constraint "find the sparsest $\mathbf{x}$" to pick the right one. Understanding null space = understanding which information is lost.
This is one of the most important ideas in all of linear algebra.
Consider spinning a globe: every point moves to a new position — except the points on the axis (the poles). In matrix terms, most vectors change direction when you apply a matrix. But some special vectors only get scaled:
An eigenvector $\mathbf{v}$ is an input that doesn't change direction through the matrix — it only scales by factor $\lambda$ (the eigenvalue).
"Eigen" is German for "own" or "characteristic" — these are the matrix's own vectors, the directions it naturally acts along.
| $\lambda$ | What happens to the eigenvector |
|---|---|
| $\lambda > 1$ | Stretches (gets longer) |
| $0 < \lambda < 1$ | Shrinks (gets shorter) |
| $\lambda = 1$ | Unchanged (fixed direction AND length) |
| $\lambda = 0$ | Collapsed to zero (this direction is destroyed) |
| $\lambda < 0$ | Flipped direction and scaled |
| Complex $\lambda$ | Rotation (no real eigenvector stays on its line) |
Start from $M\mathbf{v} = \lambda\mathbf{v}$, rewrite as $(M - \lambda I)\mathbf{v} = \mathbf{0}$. For a non-zero $\mathbf{v}$ to exist, the matrix $(M - \lambda I)$ must be singular:
Solve this polynomial for $\lambda$. Then find eigenvectors by solving $(M - \lambda I)\mathbf{v} = \mathbf{0}$.
Find the eigenvalues and eigenvectors of $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$.
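`np.linalg.eig` solves the characteristic polynomial numerically. A sketch verifying the defining equation $A\mathbf{v} = \lambda\mathbf{v}$:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.sort(eigenvalues).round(10).tolist())   # [2.0, 5.0]

# Each column of `eigenvectors` satisfies A v = λ v.
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True for both eigenvalues
```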
Eigenvectors reveal the "natural axes" of a transformation. Along these axes, the matrix acts simply (just scaling). In any other direction, it does complicated stretching + rotating. This is why eigenvectors are central to:
Set a 2×2 matrix. The eigenvectors (red/blue lines) stay on their line when transformed.
The web is a giant matrix $M$ where $M_{ij}$ = probability of clicking from page $j$ to page $i$. The principal eigenvector (with $\lambda = 1$) gives the steady-state probability of a random surfer being on each page — this is the PageRank score:
Pages with high eigenvector components rank higher. The same eigenvector technique powers social-network influence scores, recommendation engines, and epidemiological models.
A bridge's stiffness matrix $K$ and mass matrix $M$ define the eigenvalue problem $K\mathbf{v} = \omega^2 M\mathbf{v}$. Each eigenvector $\mathbf{v}$ is a natural vibration mode (the shape the bridge oscillates in), and $\omega$ is the frequency. The Tacoma Narrows Bridge collapsed in 1940 because wind excited an eigenmode — the bridge oscillated along an eigenvector of its structural matrix until it broke apart. Engineers now compute these eigenvalues to ensure no natural frequency matches expected wind or traffic patterns.
We've been using the standard basis $\{(1,0), (0,1)\}$, but any set of independent vectors can serve as a basis. Change of basis translates coordinates between different "reference frames."
Suppose a new basis $B = \{\mathbf{b}_1, \mathbf{b}_2\}$. The change-of-basis matrix $P$ has the new basis vectors as columns:
From new to standard: $\mathbf{v}_{\text{standard}} = P \cdot \mathbf{v}_{\text{new basis}}$
From standard to new: $\mathbf{v}_{\text{new basis}} = P^{-1} \cdot \mathbf{v}_{\text{standard}}$
If $A$ is a transformation expressed in the standard basis, the same transformation expressed in basis $B$ is:
Read right-to-left: convert from $B$-coords to standard ($P$), apply the transformation ($A$), convert back to $B$-coords ($P^{-1}$).
The matrix $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$ has eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 1\\1 \end{bmatrix}$ ($\lambda_1=5$) and $\mathbf{v}_2 = \begin{bmatrix} 1\\-2 \end{bmatrix}$ ($\lambda_2=2$).
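Changing to the eigenvector basis diagonalises $A$. A NumPy check:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
# Change-of-basis matrix: eigenvectors as columns.
P = np.array([[1.0, 1.0],
              [1.0, -2.0]])

# In the eigenvector basis, A is just independent scaling by 5 and 2.
D = np.linalg.inv(P) @ A @ P
print(np.allclose(D, [[5.0, 0.0], [0.0, 2.0]]))   # True — diagonal of eigenvalues
```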
In data science, you compute the covariance matrix of your data, find its eigenvectors (the "principal components"), and change basis to these eigenvectors. In the new basis, the data is uncorrelated and sorted by variance. This reveals which features actually matter and lets you compress high-dimensional data (e.g., 1000 gene expressions → 5 principal components) without losing much information.
A powerful idea: we can treat the operations matrix as input to another matrix. Applying one operations matrix to another gives a new matrix that does both transformations in order:
$X$ first applies $N$, then $T$. We combined the operations themselves — no data needed.
Want to apply the same transform $k$ times? Use $M^k$.
First rotate by $90°$, then scale by 2.
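The composition as a NumPy product (remember: right-to-left):

```python
import numpy as np

R = np.array([[0.0, -1.0],    # rotate 90° counter-clockwise
              [1.0,  0.0]])
S = np.array([[2.0, 0.0],     # scale everything by 2
              [0.0, 2.0]])

# "First R, then S" reads right-to-left in the product.
M = S @ R
print((M @ np.array([1.0, 0.0])).tolist())   # [0.0, 2.0]
```

The point $(1, 0)$ rotates to $(0, 1)$, then doubles to $(0, 2)$, all in one matrix.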
Pick two 2D transformations. See the individual and combined effects on the unit square.
A robot arm computes the gripper position by composing rotation matrices for each joint:
Factory robots, surgical arms, and Mars rovers chain dozens of matrices for sub-millimetre precision.
Linear transformations always keep the origin fixed. But what about translation (sliding everything over)? Translation is NOT linear — it violates $F(\mathbf{0}) = \mathbf{0}$.
The trick: add a dummy "1" entry to the input. Now the matrix has an extra column to add constants:
We pretend the input lives in one higher dimension and place a "1" there. A shear in the higher dimension looks like a slide (translation) in the original dimensions. The dummy entry stays 1, ready for more slides.
Rotate $(2, 0)$ by $90°$, then translate by $(3, 1)$.
Combine rotation and translation — something a plain 2×2 matrix can't do alone.
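The same exercise with $3 \times 3$ homogeneous matrices, sketched in NumPy:

```python
import numpy as np

# 3×3 homogeneous matrices: rotation in the top-left block,
# translation in the extra column.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
T = np.array([[1.0, 0.0, 3.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

p = np.array([2.0, 0.0, 1.0])   # the point (2, 0) with the dummy "1"
result = T @ R @ p               # rotate first, then translate
print(result.tolist())           # [3.0, 3.0, 1.0] → the point (3, 3)
```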
A self-driving car uses $4 \times 4$ homogeneous matrices to track its position. Each LiDAR point cloud is transformed from sensor frame to world frame:
Without homogeneous coordinates, you'd need separate rotation and addition steps. Every self-driving car, drone, and warehouse robot composes these matrices thousands of times per second.
The cross product takes two 3D vectors and returns a third vector that is perpendicular to both. It only works in 3D.
Magnitude: $\|\mathbf{a} \times \mathbf{b}\| = \|\mathbf{a}\|\|\mathbf{b}\|\sin\theta$ = area of the parallelogram formed by $\mathbf{a}$ and $\mathbf{b}$.
Direction follows the right-hand rule: point your fingers from $\mathbf{a}$ toward $\mathbf{b}$; your thumb points in the direction of $\mathbf{a} \times \mathbf{b}$.
$\begin{bmatrix} 1\\2\\3 \end{bmatrix} \times \begin{bmatrix} 4\\5\\6 \end{bmatrix}$
The cross product can be written as a (symbolic) determinant:
This is a mnemonic — expand along the first row using cofactors to get the formula above.
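`np.cross` implements this formula. A quick check of the worked example, including perpendicularity:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

c = np.cross(a, b)
print(c.tolist())                    # [-3, 6, -3]
print(np.dot(c, a), np.dot(c, b))    # 0 0 — perpendicular to both inputs
```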
Torque $\boldsymbol{\tau} = \mathbf{r} \times \mathbf{F}$ (position × force). The cross product gives a vector perpendicular to both — pointing along the axis of rotation. Its magnitude is $\|\mathbf{r}\|\|\mathbf{F}\|\sin\theta$, which is exactly the "turning effectiveness" of the force. Every physics simulation of spinning objects, gyroscopes, and orbital mechanics uses cross products extensively.
Here's a map of everything we covered and how the pieces connect:
The recurring theme: linear algebra lets you decompose complex problems into simple pieces, solve each piece, and recombine. Whether you're building a search engine, training an AI, designing a bridge, or rendering a video game — this decompose-solve-recombine pattern is why linear algebra is the most widely used branch of mathematics.
Inspired by: BetterExplained by Kalid Azad, Khan Academy Linear Algebra, and 3Blue1Brown — Essence of Linear Algebra.