An interactive guide — mini-spreadsheets for your math equations
"Algebra" means relationships. "Linear Algebra" means line-like relationships — predictable, proportional, no surprises.
A function $F$ is linear if it obeys two rules: scaling the input scales the output, $F(ax) = aF(x)$, and adding inputs adds the outputs, $F(x + y) = F(x) + F(y)$.
Think of a rooftop: move 3 feet forward, rise 1 foot. Move 30 feet forward — you expect a 10-foot rise. That's linear. Climbing a dome? Each foot forward raises you a different amount. Not linear.
Compare $F(x) = ax$ (linear) with $F(x) = x^n$ (non-linear). Watch how doubling the input affects the output.
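A quick numeric version of the same check, as a minimal Python sketch (the coefficient $a = 3$, the exponent $n = 2$, and the input value are arbitrary choices):

```python
# Doubling the input: a linear function doubles its output, a power function does not.
def f_linear(x, a=3):      # F(x) = ax
    return a * x

def f_power(x, n=2):       # F(x) = x^n
    return x ** n

x = 5
print(f_linear(2 * x), 2 * f_linear(x))   # 30 30   -> doubling the input doubles the output
print(f_power(2 * x), 2 * f_power(x))     # 100 50  -> doubling the input quadruples the output
```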
The force in a spring is $F = kx$, where $k$ is the spring constant and $x$ is the displacement. This is a linear relationship:
Double the stretch, double the force. Automotive suspension engineers, bridge designers, and seismologists rely on this linearity when designing systems of springs and dampers. When deformation exceeds the elastic limit the relationship becomes non-linear — exactly when the math gets hard.
Which operations are linear?
The useful insight: we can combine multiple linear functions into a bigger one that's still linear:
This is "mini arithmetic": multiply each input by a constant, then add the results. Because it's linear, we can split inputs apart, analyse them individually, and combine the results:
Enter coefficients $a, b, c$ and two input vectors. Verify that $G(\mathbf{u} + \mathbf{v}) = G(\mathbf{u}) + G(\mathbf{v})$.
A sound engineer mixes three microphone tracks. Each channel is scaled by a gain coefficient and summed:
This is a linear combination. Doubling all input volumes doubles the mix — no distortion (in the linear regime). Every DAW (digital audio workstation), mixing console, and hearing aid processes audio this way: linear combinations of signals, thousands of times per second.
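A minimal NumPy sketch of that mix (the gain values and the tiny three-sample "tracks" are invented for illustration):

```python
import numpy as np

# Three short "tracks" (sample values) and one gain per channel.
tracks = np.array([[0.1, 0.4, -0.2],    # mic 1
                   [0.3, 0.0,  0.5],    # mic 2
                   [-0.1, 0.2, 0.1]])   # mic 3
gains = np.array([0.8, 0.5, 1.2])

mix = gains @ tracks                       # linear combination: g1*t1 + g2*t2 + g3*t3
mix_doubled = gains @ (2 * tracks)

print(mix)                                 # the mixed signal
print(np.allclose(mix_doubled, 2 * mix))   # True: doubling the inputs doubles the mix
```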
We have a bunch of inputs to track, and predictable linear operations to perform. How do we organise?
Inputs go in vertical columns:
Operations go in horizontal rows. If $F(x,y,z) = 3x + 4y + 5z$, we abbreviate the entire function as the row $[3\;\;4\;\;5]$.
Multiple operations stack into rows; multiple inputs sit side-by-side as columns:
Size convention: $m \times n$ means $m$ rows, $n$ columns. Multiply $[m \times n] \cdot [n \times p] = [m \times p]$. The inner dimensions must match.
Digital cameras apply a $3 \times 3$ colour correction matrix to each pixel's RGB values:
Each row is an operation: "How much of each input colour contributes to this output channel." Every phone camera, Photoshop filter, and Instagram effect is a matrix applied to millions of pixel vectors — pure linear algebra running in real time on your GPU.
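A hedged NumPy sketch of that idea (the correction matrix below is made up; real cameras calibrate their own):

```python
import numpy as np

# Hypothetical 3x3 colour correction matrix: each row says how much of each
# input channel (R, G, B) contributes to one output channel.
ccm = np.array([[1.20, -0.10, -0.10],
                [-0.05, 1.10, -0.05],
                [0.00, -0.20,  1.20]])

pixel = np.array([0.5, 0.4, 0.3])   # one RGB pixel
corrected = ccm @ pixel             # the same matrix is applied to every pixel
print(corrected)
```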
Imagine "pouring" each input column through each operation row. As an input passes an operation, it creates one output entry.
Enter two matrices and watch the multiplication step by step. (2×3) × (3×2) = (2×2).
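A minimal numeric check of the same size rule, reusing the row $[3\;\;4\;\;5]$ from above (the input columns are arbitrary):

```python
import numpy as np

ops = np.array([[3, 4, 5],      # 2x3: two operations on three inputs
                [1, 0, 2]])
inputs = np.array([[1, 4],      # 3x2: two input columns with three entries each
                   [2, 5],
                   [3, 6]])

result = ops @ inputs           # (2x3) @ (3x2) = (2x2)
print(result)
# [[26 62]
#  [ 7 16]]
```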
In a neural network, each layer computes $\mathbf{y} = W\mathbf{x} + \mathbf{b}$, where $W$ is a weight matrix and $\mathbf{x}$ is the input vector. For a layer with 3 inputs and 2 outputs:
GPT, image classifiers, self-driving car vision systems — they're all stacks of matrix multiplications. Training adjusts the weights; inference is just pouring data through matrices, billions of times per second on specialised hardware (GPUs/TPUs).
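A minimal sketch of one such layer in NumPy (the weights, bias, and input are random placeholders, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))       # weight matrix: 2 outputs x 3 inputs
b = rng.normal(size=2)            # bias vector
x = np.array([0.5, -1.0, 2.0])    # input vector

y = W @ x + b                     # one layer: matrix multiply, then shift
print(y.shape)                    # (2,)
```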
Some important matrices to know (for 3 inputs):
Identity matrix $I$: copies the input to the output unchanged. $IA = A$.
Swap (permutation) matrix: reorders the entries, $(x,y,z) \to (x,z,y)$.
Select a preset operation matrix or type your own. See how it transforms the input vector.
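A quick NumPy check of those two presets (a minimal sketch; the input vector is arbitrary):

```python
import numpy as np

I = np.eye(3)                          # identity: copies the input unchanged
swap_yz = np.array([[1, 0, 0],         # (x, y, z) -> (x, z, y)
                    [0, 0, 1],
                    [0, 1, 0]])

v = np.array([1, 2, 3])
print(I @ v)        # [1. 2. 3.]
print(swap_yz @ v)  # [1 3 2]
```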
GPS satellites broadcast positions in the Earth-Centred Earth-Fixed (ECEF) coordinate system. Your phone converts to local East-North-Up (ENU) coordinates using a rotation matrix derived from your latitude $\phi$ and longitude $\lambda$:
This is a "reorder + rotate" matrix. Every time your phone shows a blue dot on a map, it has just multiplied satellite vectors through this matrix.
The key insight: linear algebra gives you mini-spreadsheets for your math equations. Let's see it in action.
Suppose a new product launches: Apple stock jumps 20%, Google drops 5%, Microsoft stays flat. We want to (1) update each stock value and (2) compute total profit.
Three inputs enter, four outputs leave. The first three rows are a "modified identity" (update each value); the fourth row computes the change.
Enter stock holdings and market changes. The matrix does the rest.
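A minimal NumPy sketch of that $4 \times 3$ matrix, using the scenario above (the holdings are placeholder dollar values):

```python
import numpy as np

# Rows 1-3: update each stock (+20%, -5%, flat). Row 4: the net change.
M = np.array([[1.20,  0.00, 0.00],
              [0.00,  0.95, 0.00],
              [0.00,  0.00, 1.00],
              [0.20, -0.05, 0.00]])

holdings = np.array([1000.0, 2000.0, 500.0])   # Apple, Google, Microsoft
print(M @ holdings)   # [1200. 1900.  500.  100.] -> new values, plus total profit
```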
Nobel laureate Wassily Leontief modelled entire economies with matrices. If three industries (agriculture, manufacturing, services) each consume outputs of the others, the total production $\mathbf{x}$ needed to meet external demand $\mathbf{d}$ is:
where $A$ is the "consumption matrix" (how much each industry uses from the others per unit of output). Governments and the World Bank still use this matrix model to forecast the economic ripple effects of policy changes, trade disruptions, and infrastructure investments.
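A hedged sketch of the computation (the consumption matrix and demand vector below are invented numbers, just to show the shape of the calculation):

```python
import numpy as np

# A[i, j]: units of industry i's output consumed to make one unit of industry j's output.
A = np.array([[0.1, 0.2, 0.0],   # agriculture
              [0.3, 0.1, 0.2],   # manufacturing
              [0.1, 0.3, 0.1]])  # services
d = np.array([100.0, 150.0, 80.0])   # external (final) demand

# Total production must cover internal consumption plus demand: x = A x + d,
# so (I - A) x = d.
x = np.linalg.solve(np.eye(3) - A, d)
print(x)
```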
When we treat inputs as 2D coordinates, a $2 \times 2$ matrix becomes a geometric transformation:
Pick a transform and watch a unit square warp in real time. The grid shows how the entire space is affected.
Every frame in a 3D game, each vertex of every model is transformed by a series of matrices:
M (Model) positions the object in the world. V (View) moves the world relative to the camera. P (Projection) flattens 3D to 2D screen coordinates. At 60 FPS with millions of vertices, GPUs perform billions of matrix-vector multiplications per second — the entire visual world you see in games, VR, and film CGI is built on $4 \times 4$ transformation matrices.
A system of linear equations can be written as a single matrix equation $M\mathbf{x} = \mathbf{b}$:
Gauss-Jordan elimination transforms the augmented matrix $[M|\mathbf{b}]$ into $[I|\mathbf{x}]$ by scaling rows and adding or subtracting multiples of one row from another, revealing the solution without rewriting full equations.
Enter coefficients for two equations in two unknowns. See the solution graphically (intersection of two lines).
An electrical circuit with 3 loops and 3 unknown currents $I_1, I_2, I_3$ yields:
Electrical engineers solve these matrix equations daily — from PCB design to power grid load balancing. SPICE simulators (used in every chip design) solve systems of thousands of linear equations per simulation step.
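A minimal sketch of solving such a system with NumPy (the resistance and voltage numbers are invented; a real netlist would supply them):

```python
import numpy as np

# Coefficient matrix from the three loop equations (hypothetical values, in ohms).
M = np.array([[10.0, -2.0,  0.0],
              [-2.0,  8.0, -3.0],
              [ 0.0, -3.0, 12.0]])
b = np.array([5.0, 0.0, 9.0])   # loop voltages, in volts

I_loops = np.linalg.solve(M, b)       # [I1, I2, I3], in amps
print(I_loops)
print(np.allclose(M @ I_loops, b))    # True: the currents satisfy every equation
```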
The determinant measures how a matrix scales area (2D) or volume (3D). Feed in a unit square; the determinant tells you the area of the output parallelogram.
Adjust matrix entries. The shaded region shows the transformed unit square and its area = |det|.
In finite element analysis (FEA), the stiffness matrix $K$ of a structure relates forces to displacements: $K\mathbf{u} = \mathbf{f}$. If $\det(K) = 0$, the structure has a mechanism — it can move freely without resistance (think of a four-bar linkage that collapses). Engineers check the determinant (or condition number) of the stiffness matrix to verify structural stability before construction begins.
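A quick numeric illustration of both ideas (a sketch; the matrices are toy $2 \times 2$ examples, not a real stiffness matrix):

```python
import numpy as np

shear = np.array([[1.0, 0.5],
                  [0.0, 1.0]])
print(np.linalg.det(shear))       # 1.0: the sheared unit square still has area 1

collapse = np.array([[1.0, 2.0],
                     [2.0, 4.0]])   # second row is a multiple of the first
print(np.linalg.det(collapse))      # 0.0 (up to rounding): the square is flattened onto a line
```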
Consider spinning a globe: every point moves to a new position — except the points on the axis (the poles). In matrix terms:
An eigenvector $\mathbf{v}$ is an input that doesn't change direction through the matrix — it only scales by factor $\lambda$ (the eigenvalue).
If $\lambda > 1$: the eigenvector stretches. If $0 < \lambda < 1$: it shrinks. If $\lambda < 0$: it flips direction.
Set a 2×2 matrix. The eigenvectors (red/blue lines) stay on their line when transformed.
The original Google search algorithm modelled the web as a giant matrix $M$ where $M_{ij}$ is the probability of clicking from page $j$ to page $i$. The principal eigenvector of $M$ (with $\lambda = 1$) gives the steady-state probability of a random surfer being on each page — this is the PageRank score.
Pages with high eigenvector components rank higher. The same technique powers social-network influence scores, recommendation engines, and epidemiological models (where the dominant eigenvalue determines whether a disease spreads or dies out).
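A minimal sketch of the idea on a tiny three-page web, using power iteration (the link structure is made up; each column holds the click probabilities out of one page and sums to 1):

```python
import numpy as np

# M[i, j]: probability of clicking from page j to page i.
M = np.array([[0.0, 0.5, 0.3],
              [0.7, 0.0, 0.7],
              [0.3, 0.5, 0.0]])

rank = np.ones(3) / 3
for _ in range(50):                 # power iteration converges to the eigenvector with lambda = 1
    rank = M @ rank
    rank /= rank.sum()

print(rank)                          # steady-state probabilities = PageRank scores
print(np.allclose(M @ rank, rank))   # True: M v = 1 * v
```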
A funky thought: we can treat the operations matrix as input to another matrix. Applying one operations matrix to another gives a new matrix that does both transformations in order:
$X$ first applies $N$, then $T$. We didn't need any input data — we combined the operations themselves.
Want to apply the same transform $k$ times? Use $M^k$.
Pick two 2D transformations. See the individual and combined effects on the unit square.
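A minimal sketch of composing two transforms and repeating one (NumPy; the rotation angle and scale factors are arbitrary):

```python
import numpy as np

theta = np.pi / 6
N = np.array([[np.cos(theta), -np.sin(theta)],   # rotate by 30 degrees
              [np.sin(theta),  np.cos(theta)]])
T = np.diag([2.0, 0.5])                          # stretch x, squash y

X = T @ N                      # one matrix that first rotates (N), then scales (T)
v = np.array([1.0, 0.0])
print(X @ v, T @ (N @ v))      # identical results

# Apply the rotation 12 times: 12 * 30 degrees is a full turn, back to the start.
print(np.linalg.matrix_power(N, 12) @ v)   # ~[1. 0.]
```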
A robot arm with 3 joints computes the position of its gripper by composing rotation matrices for each joint:
Each $R_i$ is a rotation matrix for joint $i$. By composing them, the controller instantly knows where the gripper ends up for any combination of joint angles. Factory robots, surgical arms, and Mars rovers all chain dozens of matrices together to plan movements with sub-millimetre precision.
Our mini-arithmetic has multiplications but no plain addition. But we can cheat: add a dummy "1" entry to the input. Now the matrix has an extra column to play with:
We pretend the input lives in one higher dimension and place a "1" there. A skew in the higher dimension looks like a slide (translation) in the original. The dummy entry stays 1, ready for more slides.
Combine rotation and translation — something a plain 2×2 matrix can't do alone.
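A minimal sketch of that trick in 2D (NumPy; the angle and the slide offsets are arbitrary):

```python
import numpy as np

theta = np.pi / 2
# 3x3 homogeneous matrix: the top-left 2x2 rotates, the extra column slides (translates).
H = np.array([[np.cos(theta), -np.sin(theta), 3.0],
              [np.sin(theta),  np.cos(theta), 1.0],
              [0.0,            0.0,           1.0]])

p = np.array([1.0, 0.0, 1.0])   # the point (1, 0) with a dummy "1" appended
print(H @ p)                     # [3. 2. 1.] -> rotated to (0, 1), then slid by (3, 1)
```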
A self-driving car uses $4 \times 4$ homogeneous transformation matrices to track its position. Each sensor reading (LiDAR point cloud) is in the sensor's frame. To place it on the map:
Each $T$ is a $4 \times 4$ matrix encoding both rotation and translation. Without homogeneous coordinates, you'd need separate rotation and addition steps — messy, slow, and error-prone when chaining dozens of coordinate frames. Every self-driving car, drone, and warehouse robot composes these matrices thousands of times per second.
Inspired by: BetterExplained — An Intuitive Guide to Linear Algebra by Kalid Azad.