Engineering Math — From Foundations to Kalman Filters

A comprehensive guide for software engineers entering AV/ML/GPS — assumes minimal math background

You write code for a living. You think in logic, loops, and data structures. This guide translates math into that mindset — every concept is grounded in something physical you can picture, and connected to the AV/ML systems you want to understand.

Algebra → Functions → Limits → Exponents & Logs → Derivatives → Integrals → Diff Equations → Systems → Numerical Methods → Control → Statistics → Kalman Filter

0. Algebra: The Language

Algebra is just using letters to represent unknown numbers, then finding those numbers using rules. You already do this in code: let x = totalPrice / quantity. Math does the same thing.

Variables and Expressions

A variable is a placeholder for a number you don't know yet (like a function parameter). An expression is a recipe that uses variables:

$3x + 7$ means "take some number $x$, multiply by 3, add 7"
If $x = 4$:   $3(4) + 7 = 19$
If $x = -2$:   $3(-2) + 7 = 1$

Multiplication is often written without a sign: $3x$ means $3 \times x$. Parentheses work like code: evaluate the inside first.

Equations: Finding the Unknown

An equation says two expressions are equal. Solving means finding what value of the variable makes it true.

Golden rule: Whatever you do to one side, do to the other side. The equation stays balanced.

Worked example: Solve $3x + 7 = 22$

Step 1: Subtract 7 from both sides → $3x = 15$
Step 2: Divide both sides by 3 → $x = 5$
Check: $3(5) + 7 = 22$ ✓
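The two-step recipe generalises to any linear equation $ax + b = c$: subtract $b$, then divide by $a$. A minimal sketch in code (the function name is mine):

```javascript
// Solve a*x + b = c for x: subtract b from both sides, then divide by a.
function solveLinear(a, b, c) {
  if (a === 0) throw new Error("no unique solution when a = 0");
  return (c - b) / a;
}

console.log(solveLinear(3, 7, 22)); // 5 — the worked example above
```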

Worked example: Solve $\frac{v - v_0}{a} = t$ for $v$

Step 1: Multiply both sides by $a$ → $v - v_0 = at$
Step 2: Add $v_0$ to both sides → $v = v_0 + at$

This is the velocity equation from physics. You just derived it by rearranging.

Interactive: Equation Solver

Solve $ax + b = c$ — enter coefficients and check your answer.


Subscripts and Greek Letters

Math uses subscripts to label related variables: $v_0$ means "the initial velocity", $v_f$ means "the final velocity". They're just names — like v_initial and v_final in code.

Greek letters are used because we run out of Roman ones: $\theta$ (angles), $\omega$ (angular frequency), $\phi$ (phase), $\alpha$ (rates and gains), $\tau$ (time constants), and $\eta$ (learning rate) all appear later in this guide. Treat them as ordinary variable names.

Summation Notation (Σ)

This is literally a for-loop:

$$\sum_{i=1}^{5} i^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 55$$

In code: let sum = 0; for (let i = 1; i <= 5; i++) sum += i*i;

Real-World Example: GPS Averaging for Better Accuracy

Surveyors collect $N$ GPS readings and average them to reduce noise:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

This is just algebra + summation. With 100 readings, the average position is ~10× more precise than a single reading. The $\frac{1}{N}$ in front and the $\Sigma$ are the only "math" — the rest is collecting data.
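The formula above is a one-liner in code — a sketch of the averaging loop (the function name is mine):

```javascript
// xbar = (1/N) * sum of x_i — the summation formula as a loop.
function mean(readings) {
  let sum = 0;
  for (const x of readings) sum += x; // the Σ part
  return sum / readings.length;      // the 1/N part
}

console.log(mean([10.2, 9.8, 10.1, 9.9])); // ≈ 10.0
```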

1. Functions: Input → Output Machines

A function is a machine that takes an input and produces exactly one output. In code, it's literally a function:

Math: $f(x) = x^2 + 1$
Code: function f(x) { return x*x + 1; }

$f(3) = 10, \quad f(-2) = 5, \quad f(0) = 1$

The input variable (here $x$) is called the independent variable. The output $f(x)$ is the dependent variable (it depends on what you feed in). We often write $y = f(x)$.

Graphing a Function

A graph is a picture of all input-output pairs. The x-axis is the input, the y-axis is the output. Every point $(x, y)$ on the curve satisfies $y = f(x)$.

Key Function Types You'll See Everywhere

Linear: $f(x) = mx + b$

A straight line. $m$ = slope (rise/run), $b$ = y-intercept (where it crosses the y-axis). Constant rate of change.

Example: distance = speed × time. If speed is constant, distance is linear in time.

Quadratic: $f(x) = ax^2 + bx + c$

A parabola (U-shape or ∩-shape). The rate of change itself is changing.

Example: distance under constant acceleration: $d = \frac{1}{2}at^2$.

Exponential: $f(x) = a \cdot b^x$

Grows (or decays) by a constant percentage each step. The most important function in engineering — covered in depth next section.

Sinusoidal: $f(x) = A\sin(\omega x + \phi)$

Oscillates forever between $-A$ and $A$. $A$ = amplitude (height), $\omega$ = angular frequency (speed of oscillation), $\phi$ = phase shift (horizontal offset). Everything that repeats — waves, vibrations, AC power, wheel rotation, seasonal patterns — is modeled with sine and cosine.

Example: A wheel spinning at $\omega = 10$ rad/s traces $y = r\sin(10t)$. An AC outlet delivers $V(t) = 170\sin(120\pi t)$ — a 60 Hz oscillation with 170 V peak. More detail in the trig guide.

Interactive: Function Explorer

Pick a function type and adjust parameters. See how the graph changes.


Composition: Functions Feeding Functions

If $f(x) = x^2$ and $g(x) = x + 3$, then $f(g(x)) = f(x+3) = (x+3)^2$. This is like piping in Unix: the output of $g$ becomes the input of $f$. In AV systems, sensor data flows through chains of functions: raw signal → filter → transform → estimate.

Real-World Example: Temperature Sensor Pipeline

A thermistor outputs resistance $R$. The processing pipeline is a chain of functions:

$$R \xrightarrow{f_1} \text{voltage} \xrightarrow{f_2} \text{digital count} \xrightarrow{f_3} \text{temperature in °C} \xrightarrow{f_4} \text{filtered temperature}$$

Each arrow is a function. The full pipeline is $f_4(f_3(f_2(f_1(R))))$ — function composition. Every sensor in an AV has a similar pipeline, and understanding what each function does (and can go wrong) is core to sensor engineering.
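The pipeline idea translates directly to code. A sketch with made-up placeholder transfer functions — the real $f_1 \dots f_4$ depend on the actual hardware and calibration:

```javascript
// Left-to-right composition: pipeline(f1, f2, f3)(x) = f3(f2(f1(x)))
const pipeline = (...fns) => (x) => fns.reduce((acc, f) => f(acc), x);

// Toy stand-ins for the thermistor chain (invented for illustration):
const toVoltage = (r) => 5 * r / (r + 10000);       // voltage divider
const toCounts  = (v) => Math.round(v / 5 * 1023);  // 10-bit ADC
const toCelsius = (n) => n * 0.1 - 10;              // fake calibration curve

const readTempC = pipeline(toVoltage, toCounts, toCelsius);
```

Calling `readTempC(10000)` walks a resistance reading through every stage — the same structure as $f_3(f_2(f_1(R)))$, just spelled left-to-right.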

2. Limits: Getting Infinitely Close

Before we can define derivatives or integrals, we need one key idea: what value does a function approach as the input gets close to some number? This is a limit — the mathematical tool for handling "infinitely close" without actually dividing by zero.

The Intuition

What is $f(x) = \frac{x^2 - 1}{x - 1}$ when $x = 1$? Plugging in gives $\frac{0}{0}$ — undefined. But watch what happens as $x$ gets close to 1:

$f(0.9) = 1.9 \quad f(0.99) = 1.99 \quad f(0.999) = 1.999$
$f(1.001) = 2.001 \quad f(1.01) = 2.01 \quad f(1.1) = 2.1$

The function approaches $2$. We write: $\displaystyle\lim_{x \to 1}\frac{x^2-1}{x-1} = 2$

Why? Factor the numerator: $\frac{x^2-1}{x-1} = \frac{(x-1)(x+1)}{x-1} = x + 1$ (when $x \neq 1$). As $x \to 1$, this approaches $1 + 1 = 2$. The function has a "hole" at $x = 1$, but the limit fills it in.
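You can probe a limit numerically exactly the way the table above does — evaluate the function at inputs marching toward the critical point:

```javascript
// Numerically probe lim_{x -> 1} (x^2 - 1)/(x - 1)
const f = (x) => (x * x - 1) / (x - 1);

for (const h of [0.1, 0.01, 0.001, 1e-6]) {
  console.log(f(1 - h), f(1 + h)); // both columns approach 2
}
```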

Formal Notation

$$\lim_{x \to a} f(x) = L$$

"As $x$ gets arbitrarily close to $a$ (but not equal to $a$), $f(x)$ gets arbitrarily close to $L$."

The function doesn't need to be defined at $a$ — limits care about what happens near $a$, not at $a$. This is exactly the situation with derivatives: we divide by $h$, then let $h \to 0$.

Computing Limits: The Toolkit

1. Direct Substitution

If plugging in gives a real number (no $0/0$, no blowup), that's your answer:

$\displaystyle\lim_{x \to 3}(2x + 1) = 7 \qquad \lim_{x \to 0}\cos(x) = 1 \qquad \lim_{x \to \pi}\sin(x) = 0$

2. Factor and Cancel (resolving $0/0$)

Worked example: $\displaystyle\lim_{x \to 2}\frac{x^2 - 4}{x - 2}$

Step 1: Direct substitution gives $\frac{4-4}{2-2} = \frac{0}{0}$ — indeterminate.
Step 2: Factor: $\frac{(x-2)(x+2)}{x-2} = x + 2$ (valid when $x \neq 2$)
Step 3: Now substitute: $\displaystyle\lim_{x \to 2}(x+2) = 4$

3. Multiply by Conjugate (another $0/0$ trick)

Worked example: $\displaystyle\lim_{x \to 0}\frac{\sqrt{x+4}-2}{x}$

Step 1: Direct sub gives $\frac{2-2}{0} = \frac{0}{0}$.
Step 2: Multiply by conjugate: $\frac{\sqrt{x+4}-2}{x}\cdot\frac{\sqrt{x+4}+2}{\sqrt{x+4}+2} = \frac{(x+4)-4}{x(\sqrt{x+4}+2)} = \frac{1}{\sqrt{x+4}+2}$
Step 3: Now substitute: $\frac{1}{\sqrt{4}+2} = \frac{1}{4}$

4. Limits at Infinity

What happens as $x$ grows without bound? The highest-power terms dominate:

$\displaystyle\lim_{x \to \infty}\frac{1}{x} = 0 \qquad \lim_{x \to \infty}\frac{3x^2 + 5x}{x^2 + 1} = 3 \qquad \lim_{x \to \infty}e^{-x} = 0$

For $\frac{3x^2+5x}{x^2+1}$: divide top and bottom by $x^2$ → $\frac{3+5/x}{1+1/x^2} \to \frac{3}{1}$. Same idea as Big-O: keep the dominant term.

5. The Squeeze Theorem

If $g(x) \le f(x) \le h(x)$ near $a$, and $\lim_{x\to a} g(x) = \lim_{x\to a} h(x) = L$, then $\lim_{x\to a} f(x) = L$ too. The function is "squeezed" to the limit. This is how the most important trig limit is proven.

Critical Limits (Memorise These)

The sinc limit (fundamental for trigonometry & signal processing): $$\lim_{x \to 0}\frac{\sin x}{x} = 1$$

Even though $\sin(0)/0$ is undefined, the ratio approaches exactly 1. Proven by squeezing: $\cos x \le \frac{\sin x}{x} \le 1$ near $x = 0$. This single limit is why $\frac{d}{dx}\sin x = \cos x$ — it's the foundation of all trig calculus.

The companion cosine limit: $$\lim_{x \to 0}\frac{1 - \cos x}{x} = 0$$

Used to derive $\frac{d}{dx}\cos x = -\sin x$. Together with the sinc limit, these two unlock every trigonometric derivative and integral.

The definition of $e$ (continuous compounding): $$\lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^n = e \approx 2.71828$$

Covered in depth in the next chapter — the natural base of growth and decay.

Exponential dominates polynomial: $$\lim_{x \to \infty}\frac{x^n}{e^x} = 0 \quad\text{for any } n$$

No matter how large $n$ is, $e^x$ eventually dwarfs $x^n$. This is why $O(2^n)$ algorithms are impractical and why exponential decay kills any polynomial signal.

One-Sided Limits

Sometimes a function approaches different values from the left vs right:

$\displaystyle\lim_{x \to 0^+}\frac{1}{x} = +\infty \qquad \lim_{x \to 0^-}\frac{1}{x} = -\infty$

Since left ≠ right, $\lim_{x \to 0}\frac{1}{x}$ does not exist. A two-sided limit exists only when both one-sided limits agree. In engineering: $\tan(\theta)$ blows up at $\theta = \pi/2$ (90°) — important when computing steering angles near full lock.

Continuity: When Limits Behave

A function is continuous at $x = a$ if: (1) $f(a)$ exists, (2) $\lim_{x\to a}f(x)$ exists, and (3) they're equal. No holes, no jumps, no blowups.

Polynomials, $e^x$, $\sin x$, $\cos x$ are continuous everywhere. Rational functions like $\frac{1}{x}$ are continuous except where the denominator is zero. $\tan(x) = \frac{\sin x}{\cos x}$ is continuous except at $x = \frac{\pi}{2} + n\pi$ (where $\cos = 0$).

Why this matters for engineering: Continuous functions are predictable — small input changes produce small output changes. If a sensor transfer function has a discontinuity, tiny noise can cause huge output jumps. Control systems, Kalman filters, and numerical solvers all assume smoothness. When reality isn't smooth (wheel hitting a curb, GPS signal lost), special handling is needed — and understanding limits tells you where things break.

Interactive: Limit Explorer

Watch what value $f(x)$ approaches as $x$ gets close to the critical point. The hollow circle shows where the function is undefined. Increase zoom to approach more closely.

Real-World Example: The Sinc Function in Signal Reconstruction

In digital signal processing, the ideal reconstruction filter is the sinc function:

$$\text{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$$

At $x = 0$, this is $0/0$, but $\lim_{x\to 0}\text{sinc}(x) = 1$. When your phone plays audio, it reconstructs a continuous waveform from discrete samples using sinc interpolation — the limit at zero is what makes the math work. This connects to the Nyquist theorem: to perfectly reconstruct a signal with frequencies up to $f$, you must sample at $\geq 2f$ Hz. GPS receivers, LiDAR sensors, and camera frame rates are all constrained by this limit-based result.
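In code, the $0/0$ hole must be patched explicitly with the limit value — a sketch:

```javascript
// sinc(x) = sin(pi*x) / (pi*x), with the x = 0 hole filled by its limit, 1.
function sinc(x) {
  if (x === 0) return 1; // 0/0 in the formula; the limit says 1
  return Math.sin(Math.PI * x) / (Math.PI * x);
}

console.log(sinc(0), sinc(0.5)); // 1, then 2/pi ≈ 0.637
```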

Real-World Example: Numerical Derivatives — Limits in Your Code

Every numerical derivative is an approximation of a limit:

const derivative = (f(x + h) - f(x)) / h; // h = 1e-8

You can't set $h = 0$ (division by zero). You make $h$ small and hope it's "close enough" to the limit. But there's a tradeoff: too large → truncation error (the limit isn't reached). Too small → floating-point cancellation ($f(x+h) - f(x)$ rounds to 0). The sweet spot is $h \approx \sqrt{\epsilon_{\text{machine}}} \approx 10^{-8}$ for 64-bit floats. Understanding limits tells you why this tradeoff exists.
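You can watch the tradeoff happen. For $f(x) = e^x$ at $x = 0$ the true derivative is exactly 1, so the forward-difference error is easy to measure:

```javascript
// Forward-difference error for f(x) = e^x at x = 0 (true slope: 1).
const f = Math.exp;
const fdError = (h) => Math.abs((f(0 + h) - f(0)) / h - 1);

for (const h of [1e-2, 1e-5, 1e-8, 1e-12, 1e-15]) {
  console.log(h, fdError(h)); // error falls, bottoms out near h ≈ 1e-8, then rises
}
```

Large `h` loses to truncation error; tiny `h` loses to floating-point cancellation — the sweet spot is near $\sqrt{\epsilon_{\text{machine}}}$, as the text says.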

3. Exponents, Logarithms & the Number $e$

This chapter is critical. Almost every differential equation solution involves $e^{something}$. You can't skip this.

Exponents: Repeated Multiplication

$2^3 = 2 \times 2 \times 2 = 8$   (three factors of 2 multiplied together)
$5^2 = 25, \quad 10^4 = 10000, \quad 3^1 = 3, \quad 7^0 = 1$ (any nonzero number to the 0th power = 1)

Key rules (these are worth memorising):

$$a^m \cdot a^n = a^{m+n} \qquad \frac{a^m}{a^n} = a^{m-n} \qquad (a^m)^n = a^{mn}$$ $$a^{-n} = \frac{1}{a^n} \qquad a^{1/2} = \sqrt{a} \qquad a^{1/n} = \sqrt[n]{a}$$

Why exponents matter: They describe anything that grows or shrinks by a percentage. If a population doubles every hour: $P(t) = P_0 \cdot 2^t$. If a signal halves every second: $S(t) = S_0 \cdot (0.5)^t$.

Logarithms: The Inverse of Exponents

A logarithm answers: "what power do I need?"

$$\log_b(x) = y \quad\Longleftrightarrow\quad b^y = x$$

"$\log$ base $b$ of $x$ equals $y$" means "$b$ raised to $y$ gives $x$."

$\log_2(8) = 3$ because $2^3 = 8$
$\log_{10}(1000) = 3$ because $10^3 = 1000$
$\log_2(1) = 0$ because $2^0 = 1$

Think of it as the inverse function: exponent goes up, logarithm comes back down.

$$b^{\log_b(x)} = x \qquad\text{and}\qquad \log_b(b^x) = x$$

They undo each other — like encrypt(decrypt(msg)) = msg.

Log rules (mirror the exponent rules):

$$\log(xy) = \log(x) + \log(y) \qquad \log(x/y) = \log(x) - \log(y) \qquad \log(x^n) = n\log(x)$$

The Number $e$ ≈ 2.71828...

This is the most important number in calculus and engineering. Here's why it exists:

Imagine you invest $1 at 100% annual interest. Compounded once, you end the year with $2. Compounded twice (50% each half-year): $(1 + \frac{1}{2})^2 = 2.25$. Monthly: $(1 + \frac{1}{12})^{12} \approx 2.61$. Compounding $n$ times gives $(1 + \frac{1}{n})^n$, and as $n \to \infty$ the total approaches a limit:

$$e = \lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^n \approx 2.71828$$

$e$ is what you get when you compound continuously — it's the natural base of growth and decay.

The natural logarithm $\ln(x) = \log_e(x)$ is the log with base $e$. In code, Math.exp(x) = $e^x$ and Math.log(x) = $\ln(x)$.
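JavaScript only ships the natural log directly (plus `Math.log2` and `Math.log10`); any other base comes from the change-of-base rule $\log_b(x) = \frac{\ln x}{\ln b}$. A quick sketch:

```javascript
// log base b of x, via change of base: ln(x) / ln(b)
const logBase = (b, x) => Math.log(x) / Math.log(b);

console.log(logBase(2, 8));          // ≈ 3, because 2^3 = 8
console.log(Math.exp(Math.log(7)));  // ≈ 7 — exp and ln undo each other
```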

Why $e$ is everywhere in engineering: $e^x$ is the unique function that is its own derivative — $\frac{d}{dx}e^x = e^x$. The rate of growth equals the current value. This makes $e^x$ the natural solution to "something changes proportionally to itself" — which is how almost every physical system works (cooling, charging, decaying, growing).

Interactive: Exponential Growth & Decay

See how $y = a \cdot e^{kx}$ behaves. Positive $k$ = growth, negative $k$ = decay.

Real-World Example: WiFi Signal Strength Decay

Radio signal power decays exponentially with distance (in addition to inverse-square spreading). GPS signal attenuation through building walls:

$$P(d) = P_0 \cdot e^{-\alpha d}$$

where $\alpha$ is the attenuation constant of the material. Concrete has $\alpha \approx 0.4$/m — after 5 m of concrete, signal strength is $e^{-2} \approx 13.5\%$ of original. This is why GPS doesn't work indoors, and why autonomous vehicles must handle GPS-denied environments.

4. Derivatives: Measuring Change

The Problem: How Fast Is Something Changing Right Now?

You're driving. Your odometer reads 100 km at 2:00 PM and 160 km at 3:00 PM. Average speed = $\frac{160-100}{1} = 60$ km/h. But were you going exactly 60 the whole time? Probably not — you sped up, slowed down, maybe stopped.

To know your speed at exactly 2:15 PM, you'd shrink the time interval: check at 2:15:00 and 2:15:01 (1 second). Even smaller: 2:15:00.000 and 2:15:00.001. The derivative is the limit of this process — the instantaneous rate of change.

Slope of a Line (Average Rate of Change)

$$\text{slope} = \frac{\text{rise}}{\text{run}} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{f(x + h) - f(x)}{h}$$

This gives the average rate of change between two points. As $h$ gets smaller, the two points get closer, and the line becomes a tangent — touching the curve at exactly one point.

The Derivative: Slope as $h \to 0$

$$f'(x) = \frac{df}{dx} = \lim_{h \to 0}\frac{f(x+h) - f(x)}{h}$$

Worked example from scratch: Find the derivative of $f(x) = x^2$.

Step 1: Compute $f(x+h) = (x+h)^2 = x^2 + 2xh + h^2$
Step 2: Compute the difference: $f(x+h) - f(x) = 2xh + h^2$
Step 3: Divide by $h$: $\frac{2xh + h^2}{h} = 2x + h$
Step 4: Let $h \to 0$: $\lim_{h \to 0}(2x + h) = 2x$
Result: If $f(x) = x^2$ then $f'(x) = 2x$. At $x = 3$, the slope is $6$.

Derivative Rules (Shortcuts)

You don't go through the limit every time. Patterns emerge:

Power rule: $\frac{d}{dx}x^n = nx^{n-1}$
  $x^2 \to 2x, \quad x^3 \to 3x^2, \quad x^5 \to 5x^4, \quad x^1 \to 1, \quad x^0 \to 0$

Constant multiplier: $\frac{d}{dx}[cf(x)] = c \cdot f'(x)$
  $\frac{d}{dx}[5x^3] = 5 \cdot 3x^2 = 15x^2$

Sum rule: $\frac{d}{dx}[f + g] = f' + g'$
  $\frac{d}{dx}[x^3 + 4x^2 - 7x + 2] = 3x^2 + 8x - 7$

Exponential: $\frac{d}{dx}e^x = e^x$   (this is why $e$ is special!)

Chain rule: $\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)$
  $\frac{d}{dx}e^{3x} = e^{3x} \cdot 3 = 3e^{3x}$   (outer derivative × inner derivative)

Trig derivatives:
  $\frac{d}{dx}\sin x = \cos x \qquad \frac{d}{dx}\cos x = -\sin x \qquad \frac{d}{dx}\tan x = \sec^2 x$
  $\frac{d}{dt}\sin(\omega t) = \omega\cos(\omega t)$   (chain rule)    $\frac{d}{dt}\cos(\omega t) = -\omega\sin(\omega t)$

Why does $\frac{d}{dx}\sin x = \cos x$? It comes from the limit $\lim_{h\to 0}\frac{\sin h}{h} = 1$ (Chapter 2). That single limit powers all of trig calculus.

Physical Meanings

Position $x(t)$ → Velocity $v(t) = \frac{dx}{dt}$ → Acceleration $a(t) = \frac{dv}{dt} = \frac{d^2x}{dt^2}$

Angular position $\theta(t)$ → Angular velocity $\omega = \frac{d\theta}{dt}$ → Angular acceleration $\alpha = \frac{d\omega}{dt}$

Temperature $T(t)$ → Cooling/heating rate $\frac{dT}{dt}$

Bank balance $B(t)$ → Rate of earning/spending $\frac{dB}{dt}$

Worked example (trig + chain rule): A wheel of radius $R = 0.3$ m spins with angular position $\theta(t) = 5t$ rad. A point on the rim traces a circle: $x(t) = R\cos(5t)$, $y(t) = R\sin(5t)$. Find the velocity.

Step 1: $v_x = \frac{dx}{dt} = -R\sin(5t) \cdot 5 = -1.5\sin(5t)$ m/s   (chain rule on cosine)
Step 2: $v_y = \frac{dy}{dt} = R\cos(5t) \cdot 5 = 1.5\cos(5t)$ m/s   (chain rule on sine)
Speed: $|v| = \sqrt{v_x^2 + v_y^2} = \sqrt{1.5^2(\sin^2\!+\!\cos^2)} = 1.5$ m/s $= R\omega$ ✓

This is how AV wheel encoders work: measure $\theta(t)$, differentiate with trig to get x/y velocity components for dead reckoning navigation.
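The worked example above in code — the chain-rule results evaluated directly:

```javascript
// Rim velocity of a wheel: x = R*cos(wt), y = R*sin(wt), differentiated.
const R = 0.3, omega = 5;                     // radius (m), rad/s — from the example
const vx = (t) => -R * omega * Math.sin(omega * t); // chain rule on cosine
const vy = (t) =>  R * omega * Math.cos(omega * t); // chain rule on sine
const speed = (t) => Math.hypot(vx(t), vy(t));      // sqrt(vx^2 + vy^2)

console.log(speed(0.7)); // 1.5 m/s at any t, since sin^2 + cos^2 = 1
```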

Interactive: Derivative Visualiser

See $f(x)$, its derivative $f'(x)$, and the tangent line at any point. When the function is flat, the derivative is zero. When the function is steep, the derivative is large.

Real-World Example: Speed Cameras Use Derivatives

An ANPR (Automatic Number Plate Recognition) camera records your car's position at two points. Your average speed is the finite difference $\Delta x / \Delta t$. But your instantaneous speed (what a radar gun measures) is the derivative $dx/dt$. Autonomous vehicles compute derivatives of everything: rate of change of distance to the car ahead ($\frac{dD}{dt}$ — closing speed), rate of change of steering angle ($\frac{d\delta}{dt}$ — steering rate), rate of change of lateral position ($\frac{dy}{dt}$ — drift speed).

5. Integrals: Accumulating Change

The integral is the reverse of the derivative. If the derivative chops a quantity into rates, the integral adds the rates back up to recover the quantity.

The Problem: Adding Up Tiny Pieces

You have a graph of your car's speed over time. How far did you travel? Distance = speed × time. But speed keeps changing! So you chop time into tiny pieces $\Delta t$, multiply each by the speed at that moment, and add them up:

$$\text{distance} \approx \sum_{i=1}^{N} v(t_i) \cdot \Delta t$$

This is a Riemann sum — literally a for-loop. As $\Delta t \to 0$ and $N \to \infty$, the sum becomes exact:

$$\text{distance} = \int_{t_1}^{t_2} v(t)\,dt$$

The integral symbol $\int$ is just a stretched "S" for "Sum". The $dt$ tells you what variable you're summing over.

Antiderivatives

Since integration is the reverse of differentiation, we ask: "what function, when differentiated, gives me this?"

$$\text{If } \frac{d}{dx}[F(x)] = f(x), \text{ then } \int f(x)\,dx = F(x) + C$$

$C$ is the "constant of integration" — because the derivative of any constant is 0, we don't know what constant was there.

Common antiderivatives (reverse the derivative table):

$$\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \quad(n \neq -1)$$ $$\int e^x\,dx = e^x + C$$ $$\int \frac{1}{x}\,dx = \ln|x| + C$$ $$\int \cos x\,dx = \sin x + C \qquad \int \sin x\,dx = -\cos x + C$$

Worked example: $\int (6x^2 + 4x - 3)\,dx$

Step 1: Integrate each term using the power rule: $\frac{6x^3}{3} + \frac{4x^2}{2} - 3x + C$
Result: $2x^3 + 2x^2 - 3x + C$
Check: Differentiate: $6x^2 + 4x - 3$ ✓ (we get back the original)

Definite Integrals (Area Under the Curve)

$$\int_a^b f(x)\,dx = F(b) - F(a)$$

The Fundamental Theorem of Calculus: to find the area under $f$ from $a$ to $b$, find the antiderivative $F$, then compute $F(b) - F(a)$.

Worked example: Area under $f(x) = x^2$ from $x = 0$ to $x = 3$

Step 1: Antiderivative: $F(x) = \frac{x^3}{3}$
Step 2: $F(3) - F(0) = \frac{27}{3} - 0 = 9$
Result: The area under $x^2$ from 0 to 3 is exactly 9 square units.

Worked example (trig): Energy in one half-cycle of AC current: $\int_0^{\pi}\sin(x)\,dx$

Step 1: Antiderivative of $\sin(x)$: $F(x) = -\cos(x)$
Step 2: $F(\pi) - F(0) = -\cos(\pi) - (-\cos(0)) = -(-1) + 1 = 2$
Result: The area under one arch of $\sin(x)$ is exactly 2.

In AC power, this integral gives energy per half-cycle. The full cycle $\int_0^{2\pi}\sin(x)\,dx = 0$ because positive and negative halves cancel — this is why AC needs rectification to charge a DC battery.
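A Riemann sum confirms both integrals numerically — this is the "for-loop" view of the definite integral (a midpoint-rule sketch):

```javascript
// Midpoint Riemann sum: approximate the integral of f over [a, b] with n slices.
function riemann(f, a, b, n) {
  const dx = (b - a) / n;
  let sum = 0;
  for (let i = 0; i < n; i++) sum += f(a + (i + 0.5) * dx) * dx;
  return sum;
}

console.log(riemann(Math.sin, 0, Math.PI, 1000));     // ≈ 2 (one arch)
console.log(riemann(Math.sin, 0, 2 * Math.PI, 1000)); // ≈ 0 (halves cancel)
```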

Interactive: Riemann Sum → Integral

Watch the rectangles approximate the area. More rectangles = better approximation → integral in the limit.

Real-World Example: Odometer — Integration in Your Dashboard

Your car's odometer computes $\text{distance} = \int_0^T v(t)\,dt$ every moment. At each tick of the wheel speed sensor (say every 10 ms), the ECU does:

distance += speed * 0.01; // Euler integration (Riemann sum)

That's a Riemann sum with $\Delta t = 0.01$ s. The GPS receiver does the same with Doppler-derived velocities. Every position estimate on your phone is a running integral of velocity. Every velocity estimate is a running integral of acceleration. It's integrals all the way down.

6. Differential Equations: Rules of Change

Now we combine derivatives with equations. A differential equation (DE) is an equation containing derivatives. Instead of telling you a value, it tells you the rule by which a value changes.

Ordinary equation: $x = 5$   (tells you the value)
Differential equation: $\frac{dx}{dt} = -2x$   (tells you the rule of change)

The SWE analogy: An ordinary equation is like a constant: const x = 5. A differential equation is like a loop that updates state: while (running) { x += -2*x*dt; }. You define the update rule, and the system evolves.

Why DEs, Not Direct Answers?

In the real world, we rarely know the answer directly. Instead, we know how things relate in the moment: coffee cools at a rate proportional to how much hotter it is than the room; a capacitor charges at a rate proportional to the remaining voltage gap; a population grows at a rate proportional to its current size. Each statement is a rule of change — a DE.

Terminology

A quick glossary: the order of a DE is the highest derivative it contains ($\frac{dv}{dt}$ → first-order, $\frac{d^2x}{dt^2}$ → second-order). An initial condition (IC) pins down where the system starts. A solution is a function that satisfies the rule of change for all $t$.

Worked example: $\frac{dv}{dt} = 9.8, \quad v(0) = 0$ (a ball dropped from rest)

Step 1: This says "velocity changes at a constant rate of 9.8 m/s²" (gravity).
Step 2: Integrate both sides w.r.t. $t$: $v(t) = 9.8t + C$
Step 3: Use IC: $v(0) = 0 \Rightarrow C = 0$
Result: $v(t) = 9.8t$. After 3 seconds: $v(3) = 29.4$ m/s.

Interactive: Physical System Modeler

Pick a real system. See its DE, the solution, and the curve. Think about why the curve has that shape.

Real-World Example: Why ML Training Loss is a DE

Gradient descent updates weights: $w_{n+1} = w_n - \eta \frac{\partial L}{\partial w}$. In the continuous limit, this is a DE:

$$\frac{dw}{dt} = -\eta \nabla L(w)$$

For a simple quadratic loss $L = w^2$, this becomes $\frac{dw}{dt} = -2\eta w$, whose solution is $w(t) = w_0 e^{-2\eta t}$ — exponential decay toward the minimum. The learning rate $\eta$ controls how fast. This is why training loss curves look like exponential decays — they literally are.
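You can check the discrete updates against the continuous-limit solution — a sketch with toy values (my own choices) for $\eta$ and $w_0$:

```javascript
// Gradient descent on L(w) = w^2: each step subtracts eta * dL/dw = eta * 2w.
const eta = 0.01, steps = 100, w0 = 4;

let w = w0;
for (let n = 0; n < steps; n++) w -= eta * 2 * w; // equivalently w *= (1 - 2*eta)

// Continuous-limit prediction w(t) = w0 * e^(-2*eta*t), with t = steps here
const predicted = w0 * Math.exp(-2 * eta * steps);
console.log(w, predicted); // ≈ 0.53 vs ≈ 0.54 — close when eta is small
```

The smaller the learning rate, the closer the discrete loop tracks the exponential-decay DE solution.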

7. First-Order ODEs — Step by Step

Two equations handle the vast majority of engineering. Let's solve them completely, no steps skipped.

Equation A: Exponential Growth/Decay — $\frac{dy}{dt} = ky$

This says: "the rate of change of $y$ is proportional to $y$ itself." Big $y$ → big change. Small $y$ → small change.

Full solution by separation of variables:

Step 1 — Separate: Get all $y$ stuff on left, all $t$ stuff on right: $\frac{dy}{y} = k\,dt$
Step 2 — Integrate both sides: $\int \frac{1}{y}\,dy = \int k\,dt$ → $\ln|y| = kt + C_1$
Step 3 — Solve for $y$: Exponentiate both sides: $|y| = e^{kt + C_1} = e^{C_1} \cdot e^{kt}$. Call $e^{C_1} = C$: $y = Ce^{kt}$
Step 4 — Apply IC: If $y(0) = y_0$: $y_0 = Ce^{0} = C$, so $C = y_0$
Result: $y(t) = y_0 e^{kt}$

$k > 0$: exponential growth. $k < 0$: exponential decay. The half-life (time to halve) is $t_{1/2} = \frac{\ln 2}{|k|}$.

Equation B: Linear with Forcing — $\frac{dy}{dt} + ay = b$

This says: "$y$ is being pulled toward the value $b/a$ at a rate controlled by $a$." Like a thermostat: the room is pushed toward the set temperature.

Full solution by integrating factor:

Step 1 — Standard form: $\frac{dy}{dt} + ay = b$ (already there)
Step 2 — Integrating factor: $\mu(t) = e^{\int a\,dt} = e^{at}$
Step 3 — Multiply through: $e^{at}\frac{dy}{dt} + ae^{at}y = be^{at}$. The left side is the derivative of $ye^{at}$:
   $\frac{d}{dt}\left[ye^{at}\right] = be^{at}$
Step 4 — Integrate: $ye^{at} = \frac{b}{a}e^{at} + C$
Step 5 — Solve for $y$: $y = \frac{b}{a} + Ce^{-at}$
Step 6 — Apply IC: $y(0) = y_0$: $y_0 = \frac{b}{a} + C$ → $C = y_0 - \frac{b}{a}$
Result: $y(t) = \underbrace{\frac{b}{a}}_{\text{steady state}} + \underbrace{\left(y_0 - \frac{b}{a}\right)e^{-at}}_{\text{decaying transient}}$

The time constant $\tau = 1/a$ is how long the transient takes to decay to ~37% of its initial value (or equivalently, how long to reach ~63% of the way to steady state).
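A quick numeric check of the 63% claim, using toy values for $a$, $b$, $y_0$ (my own choices):

```javascript
// y(t) = b/a + (y0 - b/a) * e^(-a t): at t = tau = 1/a the transient has
// decayed to e^-1 ≈ 37%, so y has covered ≈ 63% of the gap to steady state.
const a = 2, b = 6, y0 = 0;          // steady state b/a = 3, tau = 0.5 s
const y = (t) => b / a + (y0 - b / a) * Math.exp(-a * t);

const tau = 1 / a;
console.log(y(tau) / (b / a));       // ≈ 0.632 — 63% of the way there
```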

Interactive: First-Order ODE Explorer

Adjust $a$, $b$, $y_0$. Watch the solution curve. Orange dot = start. Dashed = steady state. Vertical line = time constant $\tau$.

Real-World Example: RC Circuit — Fully Worked

Capacitor ($C = 100\,\mu$F) charges through resistor ($R = 10\,$kΩ) from 5V supply. Kirchhoff's voltage law gives:

$$V_s = V_R + V_C = RC\frac{dV_C}{dt} + V_C$$ $$\Rightarrow \frac{dV_C}{dt} + \frac{1}{RC}V_C = \frac{V_s}{RC}$$

This is Equation B with $a = 1/RC = 1/(10000 \times 0.0001) = 1$, $b = V_s/RC = 5$.

$\tau = RC = 1$ s. Steady state = $V_s = 5$ V.

$$V_C(t) = 5(1 - e^{-t})$$

After $1\tau$: 63% → 3.15V. After $3\tau$: 95% → 4.75V. After $5\tau$: 99.3% → 4.97V.

Real-World Example: GPS Smoothing Filter — From Scratch

Raw GPS position $x_{\text{GPS}}$ jumps around. We want a smooth estimate $\hat{x}$. First-order filter:

$$\frac{d\hat{x}}{dt} = \alpha(x_{\text{GPS}} - \hat{x})$$

Rearrange: $\frac{d\hat{x}}{dt} + \alpha\hat{x} = \alpha\, x_{\text{GPS}}$. This is Equation B with $a = \alpha$, $b = \alpha \cdot x_{\text{GPS}}$. The estimate exponentially approaches the GPS reading with time constant $\tau = 1/\alpha$. Large $\alpha$ → fast but noisy. Small $\alpha$ → smooth but laggy. This tradeoff is the central dilemma of all sensor filtering.
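The discrete (Euler) version of this filter is a one-line update per sensor tick — a sketch with a constant fake reading to show convergence:

```javascript
// Discrete form of d(xhat)/dt = alpha * (x_gps - xhat).
const alpha = 2.0, dt = 0.01;        // gain (1/s), tick interval (s)
let xhat = 0;
const gps = 10;                      // pretend reading held constant

for (let t = 0; t < 3; t += dt) {
  xhat += alpha * (gps - xhat) * dt; // one filter update per tick
}
console.log(xhat); // ≈ 10 after several time constants (tau = 1/alpha = 0.5 s)
```

Crank `alpha` up and the estimate snaps to the reading (fast but noisy); turn it down and it lags (smooth but slow) — the tradeoff described above, now tunable in one constant.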

8. Systems of DEs & State Space

Real systems have multiple quantities changing at once, often affecting each other. A car has position AND velocity AND heading — all changing, all connected.

From One DE to Many

A single second-order DE can be split into two first-order DEs. Newton's law $F = ma$ means $m\frac{d^2x}{dt^2} = F$. Define $v = \frac{dx}{dt}$:

$$\frac{dx}{dt} = v \qquad \frac{dv}{dt} = \frac{F}{m}$$

Now we have two first-order DEs instead of one second-order. This is always possible, and it's how computers actually solve higher-order DEs.

State-Space Form

Pack all your variables into a state vector and write the system as a matrix equation:

$$\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}$$

For a car with drag: $\mathbf{x} = \begin{bmatrix} x \\ v \end{bmatrix}$, $A = \begin{bmatrix} 0 & 1 \\ 0 & -b \end{bmatrix}$, $B = \begin{bmatrix} 0 \\ 1/m \end{bmatrix}$, $u = F$

Row 1: $\dot{x} = 0 \cdot x + 1 \cdot v = v$ (position changes by velocity)

Row 2: $\dot{v} = 0 \cdot x + (-b) \cdot v + F/m$ (velocity changes by force minus drag)

Why state space? It turns any system — no matter how many variables — into a single matrix equation. The tools from your linear algebra guide apply directly. Eigenvalues of $A$ determine stability. Matrix exponential $e^{At}$ gives the solution. Everything connects.
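One Euler step of $\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}$ for the 2-state car, with toy values for drag and mass (my own choices):

```javascript
// State [x, v]; xdot = A x + B u with A = [[0,1],[0,-b]], B = [0, 1/m].
const b = 0.5, m = 1.0;              // toy drag coefficient and mass
const A = [[0, 1], [0, -b]];
const B = [0, 1 / m];

function step(state, u, dt) {
  const xdot = [
    A[0][0] * state[0] + A[0][1] * state[1],            // row 1: xdot = v
    A[1][0] * state[0] + A[1][1] * state[1] + B[1] * u, // row 2: vdot = -b*v + F/m
  ];
  return [state[0] + dt * xdot[0], state[1] + dt * xdot[1]];
}

console.log(step([0, 2], 0, 0.1)); // position advances by v*dt, velocity decays by drag
```

The 15-state navigation filter below uses exactly this pattern — only the sizes of `A`, `B`, and the state vector change.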

Trig in state space: When a vehicle has heading angle $\theta$, position updates use trigonometry: $\dot{x} = v\cos\theta$, $\dot{y} = v\sin\theta$. The state vector becomes $[x, y, \theta, v]^T$. The $\cos$ and $\sin$ terms make this nonlinear — the "$A$ matrix" depends on the current state $\theta$. This is why real AV navigation uses the Extended Kalman Filter, which linearises trig functions at each time step using their derivatives ($\cos\theta \to -\sin\theta$, etc.).

Interactive: 2D Vehicle Model

A car with position and velocity. Adjust throttle and drag. Left: time plots. Right: phase portrait (state-space trajectory).

Real-World Example: The 15-State GPS/INS Navigation Vector

A real navigation filter tracks:

$$\mathbf{x} = [\text{lat}, \text{lon}, \text{alt}, v_N, v_E, v_D, \text{roll}, \text{pitch}, \text{yaw}, b_{ax}, b_{ay}, b_{az}, b_{gx}, b_{gy}, b_{gz}]^T$$

15 states. The $A$ matrix is 15×15 — 225 entries encoding how every state affects every other. But the structure is the same as our 2-state car: $\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}$. Understanding the simple case gives you the pattern for the complex one.

9. Numerical Methods: How Computers Solve DEs

As a SWE, this is where you live. You'll rarely solve a DE by hand in production. But you must understand how numerical solvers work, because the choice of method affects accuracy, stability, and speed of your code.

The core idea: a continuous curve is approximated by stepping forward in discrete time increments $\Delta t$ (just like game physics or animation frames).

Euler's Method

$$y_{n+1} = y_n + \Delta t \cdot f(t_n, y_n)$$

"Current value + step size × current slope = next value"

Think of it as walking with a compass: check direction, walk straight for a bit, check again. With small steps it works. With big steps you drift off course.

In code:

let y = y0;
for (let t = 0; t < tMax; t += dt) {
  y += dt * f(t, y);
}

Runge-Kutta 4 (RK4)

Instead of one slope sample, take four across the step and average them (weighted). This is dramatically more accurate.

$$k_1 = f(t_n,\; y_n)$$ $$k_2 = f(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2}k_1)$$ $$k_3 = f(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2}k_2)$$ $$k_4 = f(t_n + h,\; y_n + hk_3)$$ $$y_{n+1} = y_n + \frac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4)$$

In code:

function rk4Step(f, t, y, h) {
  const k1 = f(t, y);                     // slope at the start of the step
  const k2 = f(t + h/2, y + h/2 * k1);    // slope at the midpoint, using k1
  const k3 = f(t + h/2, y + h/2 * k2);    // midpoint again, refined with k2
  const k4 = f(t + h, y + h * k3);        // slope at the end of the step
  return y + h/6 * (k1 + 2*k2 + 2*k3 + k4); // weighted average of the four
}

Interactive: Euler vs RK4

Solve $dy/dt = -2y + 4$, $y(0) = 0$ (exact: $y = 2 - 2e^{-2t}$). Increase $\Delta t$ to see Euler fail while RK4 stays close.

Real-World Example: GPS Dead Reckoning in a Tunnel

GPS signal lost. The car's IMU (accelerometer + gyro) dead-reckons:

// Euler integration at 100 Hz
velocity += acceleration * 0.01;
position += velocity * 0.01;

This is Euler's method on $\dot{v} = a$ and $\dot{x} = v$. At 100 Hz ($\Delta t = 0.01$ s), Euler works fine. But accelerometer noise accumulates (random walk) — after 30 s, position error can be metres. This is the fundamental reason GPS/INS fusion (Kalman filter) exists.

10. Control Systems & PID

A control system is a DE with a feedback loop. It measures where you are, compares to where you want to be, and applies a correction. Your thermostat, cruise control, and every robot on Earth uses this pattern.

The Feedback Loop

$$e(t) = \text{target} - \text{actual} \qquad\text{(error: how far off am I?)}$$ $$u(t) = \text{controller}(e) \qquad\text{(correction to apply)}$$ $$\text{system evolves} \qquad\text{(physics responds to correction)}$$ $$\text{measure again} \qquad\text{(loop back to the top)}$$

PID: Three-Part Correction

$$u(t) = \underbrace{K_p \cdot e(t)}_{\text{Proportional}} + \underbrace{K_i \int_0^t e(\tau)\,d\tau}_{\text{Integral}} + \underbrace{K_d \cdot \frac{de}{dt}}_{\text{Derivative}}$$

P (Proportional) — "The further I am, the harder I push." Steering: the more you're drifting from the lane center, the more you turn the wheel. Problem: can overshoot.

I (Integral) — "If I've been off for a while, push harder." A persistent crosswind pushes you right; the I term accumulates that bias and gradually adds leftward correction. Problem: integral windup, where the accumulated error keeps pushing after the target is reached, causing overshoot and sluggish oscillation.

D (Derivative) — "If I'm approaching the target fast, ease off." You're almost at the right lane position but zooming toward it — D applies braking. Problem: amplifies noise in measurements.
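
The three terms drop straight into code. A minimal discrete PID sketch (`makePid` is a hypothetical helper, and the gains here are illustrative, not tuned for any real system):

```javascript
// Discrete PID: sample the error every dt seconds and apply the
// three corrections from the formula above.
function makePid(kp, ki, kd, dt) {
  let integral = 0;
  let prevError = 0;
  let first = true;
  return function step(target, actual) {
    const error = target - actual;                           // e(t)
    integral += error * dt;                                  // approximates ∫ e dτ
    const derivative = first ? 0 : (error - prevError) / dt; // approximates de/dt
    first = false;
    prevError = error;
    return kp * error + ki * integral + kd * derivative;
  };
}

// Drive a unit mass toward position 5 with force = PID output.
const pid = makePid(5.0, 1.0, 2.0, 0.01);
let x = 0;
let v = 0;
for (let i = 0; i < 5000; i++) { // 50 s at 100 Hz
  const force = pid(5, x);
  v += force * 0.01;             // a = F/m with m = 1
  x += v * 0.01;
}
console.log(x.toFixed(3)); // settles at the target
```

Note the derivative is skipped on the first sample: with no previous error yet, a finite difference would produce a spurious spike.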

Trig connection: In vehicle control, steering angle $\delta$ changes heading via $\dot{\theta} = \frac{v}{L}\tan(\delta)$ (bicycle model, $L$ = wheelbase). The PID output becomes $\delta$, which through $\tan$, $\sin$, $\cos$ propagates into position changes: $\dot{x} = v\cos\theta$, $\dot{y} = v\sin\theta$. Trig is the bridge between control output and physical motion.
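
These kinematics are one Euler step away from code. A sketch of the bicycle model (the speed, wheelbase, and steering angle are illustrative values):

```javascript
// One Euler step of the kinematic bicycle model:
//   x' = v cos(theta),  y' = v sin(theta),  theta' = (v/L) tan(delta)
function bicycleStep(state, delta, v, L, dt) {
  return {
    x: state.x + v * Math.cos(state.theta) * dt,
    y: state.y + v * Math.sin(state.theta) * dt,
    theta: state.theta + (v / L) * Math.tan(delta) * dt,
  };
}

// Hold a constant steering angle: the car drives in a circle of
// radius R = L / tan(delta).
const v = 10;      // m/s
const L = 2.5;     // m wheelbase
const delta = 0.1; // rad steering angle
let s = { x: 0, y: 0, theta: 0 };
for (let i = 0; i < 1000; i++) s = bicycleStep(s, delta, v, L, 0.01);
console.log(s.theta.toFixed(3)); // heading grew at a constant (v/L)tan(delta) rad/s
```

Because $\dot{\theta}$ is constant at fixed $\delta$ and $v$, the heading integrates exactly even with Euler; the $x, y$ trace is where step-size error shows up.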

Interactive: PID Controller

A mass must reach the target (dashed line). Tune the three gains. Try: P only (oscillates), P+D (fast, no overshoot), P+I (eliminates steady-state error), all three (balanced).

Real-World Example: Tesla's Lane Keeping

A simplified lateral controller for lane keeping:

$$\delta = K_p \cdot e_{\text{lateral}} + K_d \cdot \dot{e}_{\text{lateral}} + K_i \int e_{\text{lateral}}\,dt$$

where $\delta$ is the steering angle and $e_{\text{lateral}}$ is the distance from lane center. Camera detects lane lines → computes error → PID computes steering → car turns → new camera image → repeat at 30 Hz. Modern AVs use MPC (Model Predictive Control), which is PID's sophisticated cousin, but the feedback loop concept is identical.

11. Probability & Statistics for Sensors

Every sensor lies. GPS gives a position, but it's off by a few metres. An accelerometer reads $9.81\,\text{m/s}^2$, but the true value might be $9.79$. Statistics quantifies this uncertainty and tells you how much to trust each measurement.

Probability Basics

Probability = "how likely is this?" A number between 0 (impossible) and 1 (certain).

$$P(\text{event}) = \frac{\text{number of favorable outcomes}}{\text{total possible outcomes}}$$

Fair coin: $P(\text{heads}) = 1/2$. Fair die: $P(\text{six}) = 1/6$.

Key rules:

$P(\text{not } A) = 1 - P(A)$   (complement: the event either happens or it doesn't)
$P(A \text{ or } B) = P(A) + P(B)$ when $A$ and $B$ can't both happen (mutually exclusive)
$P(A \text{ and } B) = P(A) \cdot P(B)$ when $A$ and $B$ don't influence each other (independent)

Mean (Average), Variance, Standard Deviation

Mean: $\mu = \frac{1}{N}\sum_{i=1}^N x_i$   (center of the data — arr.reduce((a,b) => a+b) / arr.length)

Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2$   (average squared deviation from mean)

Standard deviation: $\sigma = \sqrt{\sigma^2}$   (spread, in the same units as the data)

Worked example: GPS readings (metres): 10.2, 9.8, 10.5, 10.1, 9.4

Mean: $\mu = (10.2 + 9.8 + 10.5 + 10.1 + 9.4)/5 = 50/5 = 10.0$ m
Deviations: $0.2, -0.2, 0.5, 0.1, -0.6$
Squared deviations: $0.04, 0.04, 0.25, 0.01, 0.36$
Variance: $\sigma^2 = (0.04 + 0.04 + 0.25 + 0.01 + 0.36)/5 = 0.14$ m²
Std dev: $\sigma = \sqrt{0.14} \approx 0.374$ m
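
The worked example maps directly onto the reduce one-liner above; this sketch recomputes it:

```javascript
// Recompute the GPS worked example: mean, variance, std dev.
const readings = [10.2, 9.8, 10.5, 10.1, 9.4];

const mean = readings.reduce((a, b) => a + b, 0) / readings.length;
const variance =
  readings.reduce((sum, x) => sum + (x - mean) ** 2, 0) / readings.length;
const stdDev = Math.sqrt(variance);

console.log(mean, variance, stdDev); // ≈ 10, ≈ 0.14, ≈ 0.374 (up to float rounding)
```

This is the population variance (divide by $N$), matching the formula above; sample variance divides by $N - 1$ instead.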

The Gaussian (Normal) Distribution

Most sensor errors follow a bell curve — small errors are common, large errors are rare.

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

The entire shape is defined by just two numbers: $\mu$ (center) and $\sigma$ (width). This is why Gaussians dominate engineering — two parameters describe an entire noise profile.
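Because $p(x)$ is a density, probabilities come from integrating it. A sketch that codes the formula and numerically verifies the 68% rule with a simple trapezoid-rule integrator:

```javascript
// Gaussian density p(x) for mean mu and std dev sigma.
function gaussianPdf(x, mu, sigma) {
  const z = (x - mu) / sigma;
  return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2 * Math.PI));
}

// Trapezoid rule: area under f from a to b with n slices.
function integrate(f, a, b, n = 10000) {
  const h = (b - a) / n;
  let sum = (f(a) + f(b)) / 2;
  for (let i = 1; i < n; i++) sum += f(a + i * h);
  return sum * h;
}

// Probability of landing within 1 sigma of the mean.
const within1Sigma = integrate(x => gaussianPdf(x, 0, 1), -1, 1);
console.log(within1Sigma.toFixed(4)); // ≈ 0.6827, the 68% rule
```

Integrating over $[\mu - 2\sigma, \mu + 2\sigma]$ the same way gives $\approx 0.9545$, the 95% band in the explorer above.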

Interactive: Gaussian Explorer

Adjust $\mu$ and $\sigma$. The shaded bands show 1σ, 2σ, 3σ intervals.

Real-World Example: GPS Accuracy Spec Decoded

A GPS module spec says "CEP 2.5 m". CEP (Circular Error Probable) = 50% of fixes within this radius. Assuming Gaussian errors:

$$\sigma \approx \frac{\text{CEP}}{1.18} \approx 2.12 \text{ m}$$

So each horizontal axis has $\sigma \approx 2.12$ m: about 68% of per-axis errors fall within 2.12 m and 95% within 4.24 m. An autonomous vehicle using this GPS can't know its position better than ~2 m from GPS alone. That's why you fuse it with camera, LiDAR, wheel odometry, and maps — each has a different $\sigma$, and the Kalman filter combines them optimally.

12. The Kalman Filter

This section uses everything before it. Algebra (equations), functions (state transitions), exponents (uncertainty growth), derivatives (system model), integrals (prediction), DEs (physics model), systems (state space), numerical methods (discrete propagation), control (correction), and statistics (Gaussian uncertainty). It is the most important algorithm in GPS/AV engineering.

The Core Problem

You have two sources of information about where a car is:

  1. A physics model (DE): "I was going 20 m/s north, so after 1 second I should be 20 m further north." This drifts over time (imperfect model).
  2. A sensor measurement (GPS): "You are at position X." This is noisy (imperfect sensor).

Neither is perfect. The Kalman filter answers: "What is the best estimate given both?"

The Intuition: Smart Weighted Average

If the model says 100 m and the GPS says 103 m, where are you really? It depends on which you trust more. Trust both equally and you split the difference (101.5 m); trust the model far more and you stay near 100 m. The Kalman gain is exactly this trust weighting, computed from the two uncertainties.

The Two-Step Loop

Step 1: PREDICT (use the DE)

$$\hat{x}_{k|k-1} = A\hat{x}_{k-1} + Bu_k$$

"Based on physics (the state-space model), where should I be now?"

$$P_{k|k-1} = AP_{k-1}A^T + Q$$

"My uncertainty grew because the model isn't perfect ($Q$ = process noise covariance)."

Step 2: UPDATE (use the measurement)

$$K_k = \frac{P_{k|k-1}H^T}{HP_{k|k-1}H^T + R}$$

$K$ = Kalman gain. If sensor is precise ($R$ small), $K \to 1$ (trust sensor). If model is precise ($P$ small), $K \to 0$ (trust model).

$$\hat{x}_k = \hat{x}_{k|k-1} + K_k\underbrace{(z_k - H\hat{x}_{k|k-1})}_{\text{innovation (surprise)}}$$

"New estimate = prediction + gain × (measurement − predicted measurement)."

$$P_k = (I - K_kH)P_{k|k-1}$$

"Uncertainty shrinks because we incorporated new information."

Predict → Measure → Compute Gain → Correct → Repeat forever
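
In the 1D case ($A = H = 1$) the whole loop fits in a few lines. A sketch (`kalman1d` is a hypothetical helper; the noise values and velocity are illustrative):

```javascript
// 1D Kalman filter: scalar position state, direct measurement (H = 1).
function kalman1d({ q, r, x0, p0 }) {
  let x = x0; // state estimate
  let p = p0; // estimate variance
  return {
    predict(velocity, dt) {
      x += velocity * dt; // x_hat = A x + B u  (A = 1, u = velocity*dt)
      p += q;             // P = A P A^T + Q: uncertainty grows
    },
    update(z) {
      const k = p / (p + r); // Kalman gain
      x += k * (z - x);      // correct by gain × innovation
      p *= 1 - k;            // uncertainty shrinks
    },
    get state() { return { x, p }; },
  };
}

// Car moving at 1 m/s, one noisy "GPS" fix per second.
const kf = kalman1d({ q: 0.1, r: 2.0, x0: 0, p0: 1 });
for (let step = 1; step <= 50; step++) {
  kf.predict(1.0, 1.0);                       // model says: moved 1 m
  const z = step + (Math.random() - 0.5) * 2; // noisy measurement near truth
  kf.update(z);
}
console.log(kf.state); // x tracks the true position; p converges
```

Notice that $p$ settles to a steady value: the growth from $Q$ in each predict exactly cancels the shrinkage from each update, which is the equilibrium the interactive demo above shows.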

Interactive: 1D Kalman Filter

A car moves at constant velocity. GPS (red dots) is noisy. The Kalman filter (green) fuses the motion model with measurements. Adjust noise levels — when Q is high, the filter trusts GPS more; when R is high, it trusts the model more.

Real-World Example: GPS/INS Fusion — The Full Picture

A self-driving car runs an Extended Kalman Filter at 100 Hz:

Predict (at 100 Hz, from IMU): Integrate accel + gyro → propagate position, velocity, attitude. Uncertainty grows.

Update (at 10 Hz, from GPS): Compare predicted position to GPS fix. Kalman gain is high → large correction. Uncertainty shrinks.

Update (at 30 Hz, from camera): Compare predicted lane position to detected lane lines. Another correction, another uncertainty reduction.

In a tunnel (no GPS), the filter prediction uncertainty grows and grows. When GPS returns, the first fix causes a large correction (high Kalman gain × big innovation). This predict-update loop — DEs + statistics + linear algebra + numerical methods — runs in every vehicle, drone, phone, and satellite navigation system in the world.


The Complete Map

Algebra (language) → Functions (input/output) → Limits (approaching) → Exp/Log (growth/decay) → Derivatives (rates) → Integrals (accumulation) → DEs (rules of change) → Systems (multi-variable) → Numerical (computers solve) → Control (feedback) → Statistics (uncertainty) → Kalman (fuse everything)

Every topic feeds the next. The math isn't abstract — it's the literal code running inside every GPS receiver, every autonomous vehicle, every drone, every ML training loop. You now have the complete foundation.


Source: Comprehensive engineering math curriculum. See also: Trigonometry guide and Linear Algebra guide in this series.