An interactive guide — the foundation of everything in math and CS
A set is a collection of distinct things. That's it. No order, no duplicates. The "things" are called elements or members.
You use sets every day without thinking about it:
A set of the first 5 positive integers. Curly braces = "this is a set."
Key properties that make a set a set:
When you run SELECT DISTINCT user_id FROM orders, you get a set. The DISTINCT keyword literally enforces the set property — no duplicates.
In Python: user_ids = set(row['user_id'] for row in orders)
In JavaScript: const userIds = new Set(orders.map(o => o.userId))
Every programming language has a Set data structure because sets are that fundamental.
When you run git diff --name-only, you get a set of filenames that changed. Each file appears at most once, order doesn't matter — you care about which files changed, not in what sequence.
Math has compact notation for sets. Here's the Rosetta Stone:
$x \in A$ — "$x$ is an element of $A$" (x belongs to A)
$x \notin A$ — "$x$ is NOT an element of $A$"
$|A|$ — "the size of $A$" (how many elements, also called cardinality)
$\emptyset$ or $\{\}$ — the empty set (a set with nothing in it)
1. Roster notation — just list the elements:
2. Set-builder notation — describe a rule:
Read the "|" as "such that". This says: "the set of all x, such that x is an integer and x is divisible by 2."
Set-builder notation is exactly a filter/comprehension:
Math: $B = \{x \in \text{Users} \mid x.\text{age} \geq 18\}$
Python: B = {u for u in users if u.age >= 18}
SQL: SELECT * FROM users WHERE age >= 18
JS: const B = new Set(users.filter(u => u.age >= 18))
Same idea, different syntax. The math notation is actually shorter than most code.
Mathematicians use special letters for number sets you'll see everywhere:
$\emptyset = \{\}$ — a set with zero elements. Think: the result of a search query that matches nothing.
Search "asdfghjklqwerty" on Amazon. You get $\emptyset$ — an empty set of products. The set exists (it's a valid container), it just has nothing in it.
In code: results = set() # valid, but empty
Important: $\emptyset \neq \{\emptyset\}$. An empty bag is different from a bag containing an empty bag!
In programming, we only ever deal with finite sets (memory is finite). But infinite sets matter in theory — they're why algorithms have complexity classes.
The universal set $U$ is "everything we're currently talking about." It depends on context:
If every element of $A$ is also in $B$, then $A$ is a subset of $B$:
And $B$ is a superset of $A$: $B \supseteq A$.
If $A$ is a subset but $A \neq B$ (meaning $B$ has extra stuff), it's a proper subset: $A \subset B$.
Quick examples:
In any app with roles and permissions:
$\text{ViewerPerms} = \{\text{read}\}$
$\text{EditorPerms} = \{\text{read}, \text{write}\}$
$\text{AdminPerms} = \{\text{read}, \text{write}, \text{delete}, \text{manage\_users}\}$
Notice: $\text{ViewerPerms} \subset \text{EditorPerms} \subset \text{AdminPerms}$
That's exactly how permission hierarchies work. Checking "does this user have permission X?" is literally checking set membership: $X \in \text{UserPerms}$.
Checking "can a Viewer do everything an Editor can?" is checking: $\text{EditorPerms} \subseteq \text{ViewerPerms}$? No — so Viewers can't do all Editor tasks.
TypeScript's type system is literally set theory:
type Animal = { name: string }
type Dog = { name: string; breed: string }
Dog is a subtype of Animal because every Dog has everything an Animal has (and more). In set theory terms: the set of all possible Dog objects is a subset of the set of all possible Animal objects.
Union types (string | number) and intersection types (A & B) are literally set union and intersection. TypeScript IS applied set theory.
The union of two sets is everything that's in either set (or both):
Think of it as merging two lists and removing duplicates.
Example:
The 3 appears in both, but only once in the union.
Properties of union:
Google searches multiple indexes. Each index returns a set of results. The final results page is the union of all those sets.
results = google_web_results | google_news_results | google_scholar_results
In Python, the | operator on sets is literally union: set_a | set_b
In SQL: SELECT ... UNION SELECT ... gives you the union (with automatic dedup).
$\text{EmailNotifs} = \{\text{orders}, \text{promotions}, \text{security}\}$
$\text{PushNotifs} = \{\text{messages}, \text{orders}, \text{reminders}\}$
$\text{AllNotifTypes} = \text{EmailNotifs} \cup \text{PushNotifs} = \{\text{orders}, \text{promotions}, \text{security}, \text{messages}, \text{reminders}\}$
"Show me all the types of notifications this user gets through any channel" = union.
The intersection is everything that's in BOTH sets:
Example:
If two sets have no elements in common, their intersection is empty: $A \cap B = \emptyset$. These sets are called disjoint.
Properties:
Facebook's "mutual friends" feature is literally set intersection:
$\text{YourFriends} \cap \text{TheirFriends} = \text{MutualFriends}$
In Python: mutual = your_friends & their_friends (the & operator on sets)
In SQL: SELECT friend_id FROM your_friends INTERSECT SELECT friend_id FROM their_friends
$\text{EnabledFlags} = \{\text{dark\_mode}, \text{new\_checkout}, \text{ai\_search}\}$
$\text{BetaFlags} = \{\text{new\_checkout}, \text{ai\_search}, \text{voice\_nav}\}$
$\text{EnabledFlags} \cap \text{BetaFlags} = \{\text{new\_checkout}, \text{ai\_search}\}$
"Which beta features are currently enabled?" = intersection.
$A \setminus B$ (also written $A - B$) is everything in $A$ that's NOT in $B$:
Example:
Unlike union and intersection, difference is NOT commutative: $A \setminus B \neq B \setminus A$ (usually).
$A \triangle B$ is everything in exactly one of the sets, but NOT in both:
The complement of $A$ (written $A'$ or $\overline{A}$ or $A^c$) is everything in the universal set $U$ that's NOT in $A$:
$U$ = all emails in your inbox
$\text{Read}$ = set of emails you've opened
$\text{Unread} = U \setminus \text{Read} = \text{Read}^c$
Your "unread count" is $|U \setminus \text{Read}|$.
$\text{FilesOnMain} = \{a.js, b.js, c.js, d.js\}$
$\text{FilesOnBranch} = \{a.js, b.js, e.js, f.js\}$
Files added on branch: $\text{FilesOnBranch} \setminus \text{FilesOnMain} = \{e.js, f.js\}$
Files deleted on branch: $\text{FilesOnMain} \setminus \text{FilesOnBranch} = \{c.js, d.js\}$
Files that changed (added or deleted): $\text{FilesOnMain} \triangle \text{FilesOnBranch} = \{c.js, d.js, e.js, f.js\}$
Files unchanged: $\text{FilesOnMain} \cap \text{FilesOnBranch} = \{a.js, b.js\}$
Venn diagrams visualize set operations. Enter elements for sets A and B, then see the result of each operation:
The Cartesian product $A \times B$ is the set of ALL possible ordered pairs — one element from $A$ and one from $B$:
Example:
Size: $|A \times B| = |A| \cdot |B|$. With 2 elements and 2 elements, you get 4 pairs.
This extends to more sets: $A \times B \times C$ gives you ordered triples $(a, b, c)$.
A CROSS JOIN in SQL is literally the Cartesian product:
SELECT * FROM sizes CROSS JOIN colors
If sizes = {S, M, L} and colors = {Red, Blue}, you get 6 rows: (S,Red), (S,Blue), (M,Red), (M,Blue), (L,Red), (L,Blue).
This is why CROSS JOINs can be dangerous — if both tables have 10,000 rows, the result has 100,000,000 rows. $10000 \times 10000 = 10^8$.
Other JOINs (INNER, LEFT, etc.) are Cartesian products followed by a filter (WHERE clause).
$\text{Sizes} = \{S, M, L, XL\}$
$\text{Colors} = \{\text{Black}, \text{White}, \text{Navy}\}$
$\text{AllVariants} = \text{Sizes} \times \text{Colors}$
Total variants = $4 \times 3 = 12$: (S,Black), (S,White), (S,Navy), (M,Black), ...
Add a third dimension: $\text{Materials} = \{\text{Cotton}, \text{Polyester}\}$
$\text{AllSKUs} = \text{Sizes} \times \text{Colors} \times \text{Materials}$, giving $4 \times 3 \times 2 = 24$ SKUs.
The power set $\mathcal{P}(A)$ is the set of ALL possible subsets of $A$ — including the empty set and $A$ itself:
Size: $|\mathcal{P}(A)| = 2^{|A|}$. A set with 3 elements has $2^3 = 8$ subsets.
Why $2^n$? For each element, you have a binary choice: include it or not. That's $n$ yes/no decisions, giving $2^n$ combinations.
You have 4 features to test: $\{\text{dark\_mode}, \text{new\_nav}, \text{ai\_chat}, \text{redesigned\_cart}\}$.
The power set gives you ALL possible combinations = $2^4 = 16$ test groups, including "no features" (control) and "all features."
This is why testing all combinations of $n$ features grows exponentially. With 10 features, you'd need $2^{10} = 1024$ test groups. With 20 features: over a million. This is exactly why full factorial testing is impractical and we use partial factorial designs.
These two laws are the most important rules connecting union, intersection, and complement. They show up everywhere — in SQL, boolean logic, circuit design, and everyday coding.
Law 1: $(A \cup B)^c = A^c \cap B^c$
"NOT (A OR B)" is the same as "(NOT A) AND (NOT B)"
Law 2: $(A \cap B)^c = A^c \cup B^c$
"NOT (A AND B)" is the same as "(NOT A) OR (NOT B)"
In English:
You write this guard clause:
if (!(isAdmin || isModerator)) { deny(); }
By De Morgan's Law 1, this is identical to:
if (!isAdmin && !isModerator) { deny(); }
Both mean "if the user is neither an admin nor a moderator, deny access."
Another one:
if (!(hasTicket && hasID)) { reject(); }
By Law 2:
if (!hasTicket || !hasID) { reject(); }
"If they don't have both a ticket and ID" = "If they're missing a ticket, or missing ID, or both."
WHERE NOT (status = 'active' OR role = 'admin')
is equivalent to (by De Morgan):
WHERE status != 'active' AND role != 'admin'
Query optimizers use De Morgan's laws automatically to rewrite queries into more efficient forms.
Pick an element and see how both sides of De Morgan's laws evaluate to the same thing:
| x ∈ A | x ∈ B | (A∪B)ᶜ | Aᶜ ∩ Bᶜ | Match? | (A∩B)ᶜ | Aᶜ ∪ Bᶜ | Match? |
|---|
Cardinality = size of a set. $|A|$ is how many elements are in $A$.
The most important counting formula in set theory is the Inclusion-Exclusion Principle:
You can't just add the sizes — you'd count the overlap twice. Subtract it once to fix.
For three sets:
A survey of 200 developers:
How many know at least one? $|JS \cup Py| = 120 + 90 - 50 = 160$
How many know neither? $200 - 160 = 40$
How many know only JavaScript? $120 - 50 = 70$
How many know only Python? $90 - 50 = 40$
Check: $70 + 40 + 50 + 40 = 200$ ✓ (only-JS + only-Py + both + neither = total)
Your bug tracker shows:
Total unique bugs: $|A \cup B| = 45 + 30 - 8 = 67$
Without inclusion-exclusion, you'd report 75 bugs. 8 of those are duplicates counted twice.
| Operation | Notation | Code | Meaning |
|---|---|---|---|
| Union | $A \cup B$ | a | b | In A or B (or both) |
| Intersection | $A \cap B$ | a & b | In both A and B |
| Difference | $A \setminus B$ | a - b | In A but not B |
| Symmetric Diff | $A \triangle B$ | a ^ b | In one but not both |
| Complement | $A^c$ | u - a | Not in A |
| Subset | $A \subseteq B$ | a <= b | Every element of A is in B |
| Cardinality | $|A|$ | len(a) | Number of elements |
| Cartesian Product | $A \times B$ | itertools.product(a,b) | All pairs |
| Power Set | $\mathcal{P}(A)$ | combinations loop | All subsets |