Advanced Graphs
Master topological sorting (Kahn's BFS and DFS post-order) for dependency ordering and cycle detection in directed graphs, plus Union-Find (Disjoint Set Union) with path compression and union by rank for connected components, cycle detection in undirected graphs, and element grouping.
Learn & Reference
Understanding the Pattern
Advanced Graphs Pattern
This pattern covers two powerful graph techniques that appear frequently in interviews: Topological Sort for directed acyclic graphs and Union-Find (Disjoint Set Union) for undirected connectivity problems. While they solve different categories of problems, both are essential tools in the advanced graph toolkit.
Part 1: Topological Sort
A topological sort produces a linear ordering of vertices in a directed acyclic graph (DAG) such that for every directed edge u -> v, vertex u comes before v in the ordering. If the graph has a cycle, no valid topological ordering exists.
Think of it like course prerequisites: if Course A requires Course B, then B must appear before A in any valid schedule. Topological sort finds such a schedule — or detects that one is impossible.
Prerequisites: Topological Order:
0 -> 1 -> 3 [0, 2, 1, 3] or [2, 0, 1, 3]
2 -> 1
A DAG can have multiple valid topological orderings. Any ordering where all edges point "forward" is correct.
Kahn's Algorithm (BFS with In-Degrees)
The most intuitive approach. Track the in-degree (number of incoming edges) of each node. Nodes with in-degree 0 have no unsatisfied dependencies and can go first.
1. Build adjacency list and compute in-degree for each node
2. Add all nodes with in-degree 0 to a queue
3. While queue is not empty:
a. Remove node from queue -> add to result
b. For each neighbor, decrement its in-degree
c. If neighbor's in-degree becomes 0, add to queue
4. If result has all nodes -> valid ordering
If result is shorter -> cycle exists
DFS Post-Order Reversal
Process nodes in DFS order, and add each node to the result after all its descendants are processed (post-order). Reverse the result to get topological order.
1. For each unvisited node, run DFS
2. In DFS, mark node as "in progress" (on current path)
3. Recurse on all neighbors
4. If we visit an "in progress" node -> cycle detected
5. After all neighbors done, mark as "done" and push to stack
6. Pop all from stack -> topological order
Cycle Detection in Directed Graphs
Both approaches naturally detect cycles:
- Kahn's: If the result has fewer nodes than the graph, a cycle exists (some nodes never reached in-degree 0)
- DFS: If we encounter a node that's currently "in progress" on the recursion stack, we've found a back edge (cycle)
Building the Graph from Constraints
Many problems don't give you an explicit graph — you derive edges from constraints:
- Course prerequisites:
[course, prereq]-> edge from prereq to course - Alien dictionary: Compare adjacent words to find character ordering -> edges between characters
- Build systems: Dependencies between tasks -> edges from dependency to dependent
Part 2: Union-Find (Disjoint Set Union)
The Union-Find data structure tracks a collection of non-overlapping sets. It supports two core operations:
- Find(x) — determine which set element x belongs to (returns the set's representative/root)
- Union(x, y) — merge the sets containing x and y
Initial: {0} {1} {2} {3} {4} <- 5 separate sets
Union(0,1): {0,1} {2} {3} {4} <- merged 0 and 1
Union(2,3): {0,1} {2,3} {4} <- merged 2 and 3
Union(1,3): {0,1,2,3} {4} <- merged via representatives
Find(2) == Find(0)? -> true <- same set
Find(4) == Find(0)? -> false <- different sets
Under the hood, each set is represented as a tree where each node points to its parent. The root of the tree is the set's representative. Initially, every element is its own root (parent[i] = i).
Path Compression
During find(x), make every node on the path point directly to the root. This flattens the tree so future lookups are faster.
Before find(4): After find(4):
0 0
/ \ / | \ \
1 2 1 2 3 4
|
3
|
4
Union by Rank
When merging two sets, attach the shorter tree under the taller tree's root. This prevents the tree from growing tall. Combined with path compression, this gives O(alpha(n)) amortized time per operation — essentially constant.
Connected Components Count
Initialize a counter equal to the number of elements. Each successful union (where two elements were in different sets) decrements the counter. The final value is the number of connected components.
Cycle Detection in Undirected Graphs
Process edges one by one. For each edge (u, v):
- If
find(u) == find(v), they're already connected -> this edge creates a cycle - Otherwise,
union(u, v)to merge their components
Choosing Between Them
| Scenario | Use | |----------|-----| | Dependency ordering, scheduling | Topological Sort | | Cycle detection in directed graphs | Topological Sort (DFS or Kahn's) | | Cycle detection in undirected graphs | Union-Find | | Connected components (static) | Union-Find | | Grouping by equivalence (accounts, synonyms) | Union-Find | | Character ordering from comparisons | Topological Sort |
Common Mistakes
- Forgetting to handle disconnected components in topo sort — In Kahn's algorithm, initialize ALL nodes with in-degree 0 into the queue (not just node 0). In DFS, iterate over ALL nodes as starting points.
- Wrong edge direction for prerequisites — For prerequisites
[a, b]meaning "b before a," the edge isb -> a, nota -> b. Getting this backwards produces the reverse ordering. - Confusing "visited" states in DFS cycle detection — You need THREE states: unvisited, in-progress (on current recursion path), and done. Using only two states (visited/unvisited) can't detect cycles in directed graphs.
- Not detecting cycles in topo sort — Always check if the result includes all nodes. Returning a partial ordering when a cycle exists is a common source of wrong answers.
- Forgetting path compression in Union-Find — Without it, find() degrades to O(n) on skewed trees. Always compress the path during find.
- Wrong rank update in Union-Find — Only increment rank when merging two trees of equal rank. If ranks differ, the rank of the larger tree stays the same.
- Not initializing Union-Find parent array correctly — Every element must start as its own parent:
parent[i] = i. Initializing all to 0 breaks everything. - Using Union-Find for directed graphs — Union-Find tracks undirected connectivity. For directed graph problems (course prerequisites, alien dictionary), use topological sort instead.
- Off-by-one with 1-indexed nodes — Many problems use 1-indexed nodes. Make sure your parent/in-degree arrays are sized correctly (n+1) and you initialize all indices.
When to Use
Template
1# ===== Kahn's Algorithm — Topological Sort (BFS) =====2from collections import deque34def topo_sort(num_courses, prerequisites):5graph = [[] for _ in range(num_courses)]6in_degree = [0] * num_courses78for course, prereq in prerequisites:9graph[prereq].append(course)10in_degree[course] += 11112queue = deque(i for i in range(num_courses) if in_degree[i] == 0)13order = []1415while queue:16node = queue.popleft()17order.append(node)18for neighbor in graph[node]:19in_degree[neighbor] -= 120if in_degree[neighbor] == 0:21queue.append(neighbor)2223return order if len(order) == num_courses else []242526# ===== Union-Find (Disjoint Set Union) =====27class UnionFind:28def __init__(self, n):29self.parent = list(range(n))30self.rank = [0] * n3132def find(self, x):33if self.parent[x] != x:34self.parent[x] = self.find(self.parent[x]) # path compression35return self.parent[x]3637def union(self, x, y):38px, py = self.find(x), self.find(y)39if px == py:40return False # already in same set41if self.rank[px] < self.rank[py]:42px, py = py, px43self.parent[py] = px44if self.rank[px] == self.rank[py]:45self.rank[px] += 146return True # merged two different sets
Syntax Reference
1# --- Kahn's Algorithm Setup (Topological Sort) ---2from collections import deque, defaultdict34graph = defaultdict(list) # node -> [neighbors]5in_degree = [0] * num_nodes67for src, dst in edges:8graph[src].append(dst)9in_degree[dst] += 11011# Initialize queue with in-degree 0 nodes12queue = deque(i for i in range(num_nodes) if in_degree[i] == 0)1314# Process: pop from queue, decrement neighbor in-degrees15while queue:16node = queue.popleft()17for neighbor in graph[node]:18in_degree[neighbor] -= 119if in_degree[neighbor] == 0:20queue.append(neighbor)2122# --- Union-Find Class ---23class UnionFind:24def __init__(self, n):25self.parent = list(range(n))26self.rank = [0] * n2728def find(self, x):29if self.parent[x] != x:30self.parent[x] = self.find(self.parent[x]) # path compression31return self.parent[x]3233def union(self, x, y):34px, py = self.find(x), self.find(y)35if px == py:36return False # already in same set37if self.rank[px] < self.rank[py]:38px, py = py, px39self.parent[py] = px40if self.rank[px] == self.rank[py]:41self.rank[px] += 142return True # merged two different sets
Common Recipes
1# Recipe 1: Kahn's Algorithm — Topological Sort (BFS)2from collections import deque, defaultdict34def topo_sort_kahn(num_nodes, edges):5graph = defaultdict(list)6in_degree = [0] * num_nodes7for src, dst in edges:8graph[src].append(dst)9in_degree[dst] += 11011queue = deque(i for i in range(num_nodes) if in_degree[i] == 0)12order = []1314while queue:15node = queue.popleft()16order.append(node)17for neighbor in graph[node]:18in_degree[neighbor] -= 119if in_degree[neighbor] == 0:20queue.append(neighbor)2122return order if len(order) == num_nodes else [] # empty = cycle2324# Recipe 2: Cycle Detection in Directed Graph (Kahn's)25def has_cycle_directed(num_nodes, edges):26order = topo_sort_kahn(num_nodes, edges)27return len(order) != num_nodes # True if cycle exists2829# Recipe 3: Union-Find — Count Connected Components30def count_components(n, edges):31uf = UnionFind(n)32components = n33for u, v in edges:34if uf.union(u, v):35components -= 136return components3738# Recipe 4: Union-Find — Cycle Detection in Undirected Graph39def has_cycle_undirected(n, edges):40uf = UnionFind(n)41for u, v in edges:42if not uf.union(u, v):43return True # u and v already connected -> cycle44return False
Complexity
| Operation | Time | Space | Notes |
|---|---|---|---|
| Kahn's Algorithm (BFS topo sort) | O(V + E) | O(V + E) | Adjacency list + in-degree array + queue |
| DFS Post-Order (topo sort) | O(V + E) | O(V + E) | Adjacency list + color array + recursion stack |
| Cycle Detection (directed) | O(V + E) | O(V + E) | Same as topo sort -- cycle = incomplete ordering |
| Union-Find: Find(x) | O(alpha(n)) ~ O(1) | O(n) | With path compression; alpha is inverse Ackermann |
| Union-Find: Union(x, y) | O(alpha(n)) ~ O(1) | O(n) | Dominated by two find() calls |
| Union-Find: Build (n unions) | O(n * alpha(n)) ~ O(n) | O(n) | Processing all edges once with path compression + rank |
| Cycle Detection (undirected) | O(E * alpha(n)) ~ O(E) | O(n) | One find per edge using Union-Find |
Watch Out
- ✗Forgetting to handle disconnected components in topo sort — In Kahn's algorithm, initialize ALL nodes with in-degree 0 into the queue (not just node 0). In DFS, iterate over ALL nodes as starting points.
- ✗Wrong edge direction for prerequisites — For prerequisites `[a, b]` meaning "b before a," the edge is `b -> a`, not `a -> b`. Getting this backwards produces the reverse ordering.
- ✗Confusing "visited" states in DFS cycle detection — You need THREE states: unvisited, in-progress (on current recursion path), and done. Using only two states (visited/unvisited) can't detect cycles in directed graphs.
- ✗Not detecting cycles in topo sort — Always check if the result includes all nodes. Returning a partial ordering when a cycle exists is a common source of wrong answers.
- ✗Forgetting path compression in Union-Find — Without it, find() degrades to O(n) on skewed trees. Always compress the path during find.
- ✗Wrong rank update in Union-Find — Only increment rank when merging two trees of equal rank. If ranks differ, the rank of the larger tree stays the same.
- ✗Not initializing Union-Find parent array correctly — Every element must start as its own parent: `parent[i] = i`. Initializing all to 0 breaks everything.
- ✗Using Union-Find for directed graphs — Union-Find tracks undirected connectivity. For directed graph problems (course prerequisites, alien dictionary), use topological sort instead.
- ✗Off-by-one with 1-indexed nodes — Many problems use 1-indexed nodes. Make sure your parent/in-degree arrays are sized correctly (n+1) and you initialize all indices.