PATTERN 18 OF 22

Advanced Graphs

Master topological sorting (Kahn's BFS and DFS post-order) for dependency ordering and cycle detection in directed graphs, plus Union-Find (Disjoint Set Union) with path compression and union by rank for connected components, cycle detection in undirected graphs, and element grouping.

Time: Topological Sort: O(V + E); Union-Find: O(alpha(n)) per operation (amortized near-constant)
Space: Topological Sort: O(V + E) for adjacency list; Union-Find: O(n) for parent and rank arrays

Learn & Reference

Understanding the Pattern

Advanced Graphs Pattern

This pattern covers two powerful graph techniques that appear frequently in interviews: Topological Sort for directed acyclic graphs and Union-Find (Disjoint Set Union) for undirected connectivity problems. While they solve different categories of problems, both are essential tools in the advanced graph toolkit.


Part 1: Topological Sort

A topological sort produces a linear ordering of vertices in a directed acyclic graph (DAG) such that for every directed edge u -> v, vertex u comes before v in the ordering. If the graph has a cycle, no valid topological ordering exists.

Think of it like course prerequisites: if Course A requires Course B, then B must appear before A in any valid schedule. Topological sort finds such a schedule — or detects that one is impossible.

Prerequisites:              Topological Order:
0 -> 1 -> 3                 [0, 2, 1, 3] or [2, 0, 1, 3]
2 -> 1

A DAG can have multiple valid topological orderings. Any ordering where all edges point "forward" is correct.

Kahn's Algorithm (BFS with In-Degrees)

The most intuitive approach. Track the in-degree (number of incoming edges) of each node. Nodes with in-degree 0 have no unsatisfied dependencies and can go first.

1. Build adjacency list and compute in-degree for each node
2. Add all nodes with in-degree 0 to a queue
3. While queue is not empty:
   a. Remove node from queue -> add to result
   b. For each neighbor, decrement its in-degree
   c. If neighbor's in-degree becomes 0, add to queue
4. If result has all nodes -> valid ordering
   If result is shorter -> cycle exists

DFS Post-Order Reversal

Process nodes in DFS order, and add each node to the result after all its descendants are processed (post-order). Reverse the result to get topological order.

1. For each unvisited node, run DFS
2. In DFS, mark node as "in progress" (on current path)
3. Recurse on all neighbors
4. If we visit an "in progress" node -> cycle detected
5. After all neighbors done, mark as "done" and push to stack
6. Pop all from stack -> topological order

Cycle Detection in Directed Graphs

Both approaches naturally detect cycles:

  • Kahn's: If the result has fewer nodes than the graph, a cycle exists (some nodes never reached in-degree 0)
  • DFS: If we encounter a node that's currently "in progress" on the recursion stack, we've found a back edge (cycle)

Building the Graph from Constraints

Many problems don't give you an explicit graph — you derive edges from constraints:

  • Course prerequisites: [course, prereq] -> edge from prereq to course
  • Alien dictionary: Compare adjacent words to find character ordering -> edges between characters
  • Build systems: Dependencies between tasks -> edges from dependency to dependent

Part 2: Union-Find (Disjoint Set Union)

The Union-Find data structure tracks a collection of non-overlapping sets. It supports two core operations:

  • Find(x) — determine which set element x belongs to (returns the set's representative/root)
  • Union(x, y) — merge the sets containing x and y
Initial:    {0} {1} {2} {3} {4}      <- 5 separate sets
Union(0,1): {0,1} {2} {3} {4}        <- merged 0 and 1
Union(2,3): {0,1} {2,3} {4}          <- merged 2 and 3
Union(1,3): {0,1,2,3} {4}            <- merged via representatives
Find(2) == Find(0)?  -> true          <- same set
Find(4) == Find(0)?  -> false         <- different sets

Under the hood, each set is represented as a tree where each node points to its parent. The root of the tree is the set's representative. Initially, every element is its own root (parent[i] = i).

Path Compression

During find(x), make every node on the path point directly to the root. This flattens the tree so future lookups are faster.

Before find(4):            After find(4):
    0                          0
   / \                      / | \ \
  1   2                    1  2  3  4
      |
      3
      |
      4

Union by Rank

When merging two sets, attach the shorter tree under the taller tree's root. This prevents the tree from growing tall. Combined with path compression, this gives O(alpha(n)) amortized time per operation — essentially constant.

Connected Components Count

Initialize a counter equal to the number of elements. Each successful union (where two elements were in different sets) decrements the counter. The final value is the number of connected components.

Cycle Detection in Undirected Graphs

Process edges one by one. For each edge (u, v):

  • If find(u) == find(v), they're already connected -> this edge creates a cycle
  • Otherwise, union(u, v) to merge their components

Choosing Between Them

| Scenario | Use | |----------|-----| | Dependency ordering, scheduling | Topological Sort | | Cycle detection in directed graphs | Topological Sort (DFS or Kahn's) | | Cycle detection in undirected graphs | Union-Find | | Connected components (static) | Union-Find | | Grouping by equivalence (accounts, synonyms) | Union-Find | | Character ordering from comparisons | Topological Sort |

Common Mistakes

  1. Forgetting to handle disconnected components in topo sort — In Kahn's algorithm, initialize ALL nodes with in-degree 0 into the queue (not just node 0). In DFS, iterate over ALL nodes as starting points.
  2. Wrong edge direction for prerequisites — For prerequisites [a, b] meaning "b before a," the edge is b -> a, not a -> b. Getting this backwards produces the reverse ordering.
  3. Confusing "visited" states in DFS cycle detection — You need THREE states: unvisited, in-progress (on current recursion path), and done. Using only two states (visited/unvisited) can't detect cycles in directed graphs.
  4. Not detecting cycles in topo sort — Always check if the result includes all nodes. Returning a partial ordering when a cycle exists is a common source of wrong answers.
  5. Forgetting path compression in Union-Find — Without it, find() degrades to O(n) on skewed trees. Always compress the path during find.
  6. Wrong rank update in Union-Find — Only increment rank when merging two trees of equal rank. If ranks differ, the rank of the larger tree stays the same.
  7. Not initializing Union-Find parent array correctly — Every element must start as its own parent: parent[i] = i. Initializing all to 0 breaks everything.
  8. Using Union-Find for directed graphs — Union-Find tracks undirected connectivity. For directed graph problems (course prerequisites, alien dictionary), use topological sort instead.
  9. Off-by-one with 1-indexed nodes — Many problems use 1-indexed nodes. Make sure your parent/in-degree arrays are sized correctly (n+1) and you initialize all indices.

When to Use

The problem involves **ordering with dependencies** (prerequisites, build order, task scheduling) -> Topological Sort
You need to detect if a **directed dependency graph has a cycle** -> Topological Sort
You need to derive an **ordering from pairwise comparisons** (alien dictionary, custom sort) -> Topological Sort
The problem asks about **connected components** or grouping elements -> Union-Find
You need to **detect cycles in an undirected graph** -> Union-Find
Elements need to be **merged into equivalence classes** (accounts, synonyms, regions) -> Union-Find
The problem involves **dynamic connectivity** — are these two elements connected? -> Union-Find
You process **edges one at a time** and need to track which components merge -> Union-Find

Template

1# ===== Kahn's Algorithm — Topological Sort (BFS) =====
2from collections import deque
3
4def topo_sort(num_courses, prerequisites):
5 graph = [[] for _ in range(num_courses)]
6 in_degree = [0] * num_courses
7
8 for course, prereq in prerequisites:
9 graph[prereq].append(course)
10 in_degree[course] += 1
11
12 queue = deque(i for i in range(num_courses) if in_degree[i] == 0)
13 order = []
14
15 while queue:
16 node = queue.popleft()
17 order.append(node)
18 for neighbor in graph[node]:
19 in_degree[neighbor] -= 1
20 if in_degree[neighbor] == 0:
21 queue.append(neighbor)
22
23 return order if len(order) == num_courses else []
24
25
26# ===== Union-Find (Disjoint Set Union) =====
27class UnionFind:
28 def __init__(self, n):
29 self.parent = list(range(n))
30 self.rank = [0] * n
31
32 def find(self, x):
33 if self.parent[x] != x:
34 self.parent[x] = self.find(self.parent[x]) # path compression
35 return self.parent[x]
36
37 def union(self, x, y):
38 px, py = self.find(x), self.find(y)
39 if px == py:
40 return False # already in same set
41 if self.rank[px] < self.rank[py]:
42 px, py = py, px
43 self.parent[py] = px
44 if self.rank[px] == self.rank[py]:
45 self.rank[px] += 1
46 return True # merged two different sets
{ }

Syntax Reference

1# --- Kahn's Algorithm Setup (Topological Sort) ---
2from collections import deque, defaultdict
3
4graph = defaultdict(list) # node -> [neighbors]
5in_degree = [0] * num_nodes
6
7for src, dst in edges:
8 graph[src].append(dst)
9 in_degree[dst] += 1
10
11# Initialize queue with in-degree 0 nodes
12queue = deque(i for i in range(num_nodes) if in_degree[i] == 0)
13
14# Process: pop from queue, decrement neighbor in-degrees
15while queue:
16 node = queue.popleft()
17 for neighbor in graph[node]:
18 in_degree[neighbor] -= 1
19 if in_degree[neighbor] == 0:
20 queue.append(neighbor)
21
22# --- Union-Find Class ---
23class UnionFind:
24 def __init__(self, n):
25 self.parent = list(range(n))
26 self.rank = [0] * n
27
28 def find(self, x):
29 if self.parent[x] != x:
30 self.parent[x] = self.find(self.parent[x]) # path compression
31 return self.parent[x]
32
33 def union(self, x, y):
34 px, py = self.find(x), self.find(y)
35 if px == py:
36 return False # already in same set
37 if self.rank[px] < self.rank[py]:
38 px, py = py, px
39 self.parent[py] = px
40 if self.rank[px] == self.rank[py]:
41 self.rank[px] += 1
42 return True # merged two different sets
📋

Common Recipes

1# Recipe 1: Kahn's Algorithm — Topological Sort (BFS)
2from collections import deque, defaultdict
3
4def topo_sort_kahn(num_nodes, edges):
5 graph = defaultdict(list)
6 in_degree = [0] * num_nodes
7 for src, dst in edges:
8 graph[src].append(dst)
9 in_degree[dst] += 1
10
11 queue = deque(i for i in range(num_nodes) if in_degree[i] == 0)
12 order = []
13
14 while queue:
15 node = queue.popleft()
16 order.append(node)
17 for neighbor in graph[node]:
18 in_degree[neighbor] -= 1
19 if in_degree[neighbor] == 0:
20 queue.append(neighbor)
21
22 return order if len(order) == num_nodes else [] # empty = cycle
23
24# Recipe 2: Cycle Detection in Directed Graph (Kahn's)
25def has_cycle_directed(num_nodes, edges):
26 order = topo_sort_kahn(num_nodes, edges)
27 return len(order) != num_nodes # True if cycle exists
28
29# Recipe 3: Union-Find — Count Connected Components
30def count_components(n, edges):
31 uf = UnionFind(n)
32 components = n
33 for u, v in edges:
34 if uf.union(u, v):
35 components -= 1
36 return components
37
38# Recipe 4: Union-Find — Cycle Detection in Undirected Graph
39def has_cycle_undirected(n, edges):
40 uf = UnionFind(n)
41 for u, v in edges:
42 if not uf.union(u, v):
43 return True # u and v already connected -> cycle
44 return False
O(n)

Complexity

OperationTimeSpaceNotes
Kahn's Algorithm (BFS topo sort)O(V + E)O(V + E)Adjacency list + in-degree array + queue
DFS Post-Order (topo sort)O(V + E)O(V + E)Adjacency list + color array + recursion stack
Cycle Detection (directed)O(V + E)O(V + E)Same as topo sort -- cycle = incomplete ordering
Union-Find: Find(x)O(alpha(n)) ~ O(1)O(n)With path compression; alpha is inverse Ackermann
Union-Find: Union(x, y)O(alpha(n)) ~ O(1)O(n)Dominated by two find() calls
Union-Find: Build (n unions)O(n * alpha(n)) ~ O(n)O(n)Processing all edges once with path compression + rank
Cycle Detection (undirected)O(E * alpha(n)) ~ O(E)O(n)One find per edge using Union-Find

Watch Out

  • Forgetting to handle disconnected components in topo sort — In Kahn's algorithm, initialize ALL nodes with in-degree 0 into the queue (not just node 0). In DFS, iterate over ALL nodes as starting points.
  • Wrong edge direction for prerequisites — For prerequisites `[a, b]` meaning "b before a," the edge is `b -> a`, not `a -> b`. Getting this backwards produces the reverse ordering.
  • Confusing "visited" states in DFS cycle detection — You need THREE states: unvisited, in-progress (on current recursion path), and done. Using only two states (visited/unvisited) can't detect cycles in directed graphs.
  • Not detecting cycles in topo sort — Always check if the result includes all nodes. Returning a partial ordering when a cycle exists is a common source of wrong answers.
  • Forgetting path compression in Union-Find — Without it, find() degrades to O(n) on skewed trees. Always compress the path during find.
  • Wrong rank update in Union-Find — Only increment rank when merging two trees of equal rank. If ranks differ, the rank of the larger tree stays the same.
  • Not initializing Union-Find parent array correctly — Every element must start as its own parent: `parent[i] = i`. Initializing all to 0 breaks everything.
  • Using Union-Find for directed graphs — Union-Find tracks undirected connectivity. For directed graph problems (course prerequisites, alien dictionary), use topological sort instead.
  • Off-by-one with 1-indexed nodes — Many problems use 1-indexed nodes. Make sure your parent/in-degree arrays are sized correctly (n+1) and you initialize all indices.