Union-Find
Use the Disjoint Set Union (DSU) structure to efficiently group elements, detect cycles in undirected graphs, and count connected components. Master path compression and union by rank for near-constant-time operations.
Learn & Reference
Understanding the Pattern
Union-Find Pattern
The Union-Find (also called Disjoint Set Union or DSU) data structure tracks a collection of non-overlapping sets. It supports two core operations:
- Find(x) — determine which set element x belongs to (returns the set's representative/root)
- Union(x, y) — merge the sets containing x and y
Initial: {0} {1} {2} {3} {4} ← 5 separate sets
Union(0,1): {0,1} {2} {3} {4} ← merged 0 and 1
Union(2,3): {0,1} {2,3} {4} ← merged 2 and 3
Union(1,3): {0,1,2,3} {4} ← merged via representatives
Find(2) == Find(0)? → true ← same set
Find(4) == Find(0)? → false ← different sets
Core Concept
Under the hood, each set is represented as a tree where each node points to its parent. The root of the tree is the set's representative. Initially, every element is its own root (parent[i] = i).
parent = [0, 1, 2, 3, 4] ← each element is its own root
After Union(0, 1): parent = [0, 0, 2, 3, 4] ← 1 points to 0
After Union(2, 3): parent = [0, 0, 2, 2, 4] ← 3 points to 2
After Union(1, 3): parent = [0, 0, 0, 2, 4] ← 2's root points to 0's root
Without optimization, trees can become long chains (O(n) per find). Two optimizations make operations nearly O(1):
Key Techniques
1. Path Compression
During find(x), make every node on the path point directly to the root. This flattens the tree so future lookups are faster.
Before find(4): After find(4):
0 0
/ \ / | \ \
1 2 1 2 3 4
|
3
|
4
2. Union by Rank (or Size)
When merging two sets, attach the shorter tree under the taller tree's root. This prevents the tree from growing tall. Combined with path compression, this gives O(α(n)) amortized time — essentially constant.
Rank tracks tree height (approximate):
rank = [0, 0, 0, 0]
Union(0, 1): attach 1 under 0, rank[0] stays 0 (both rank 0, so rank[0] becomes 1)
Union(2, 3): attach 3 under 2, rank[2] becomes 1
Union(0, 2): both rank 1, attach 2 under 0, rank[0] becomes 2
3. Connected Components Count
Initialize a counter equal to the number of elements. Each successful union (where two elements were in different sets) decrements the counter. The final value is the number of connected components.
4. Cycle Detection in Undirected Graphs
Process edges one by one. For each edge (u, v):
- If
find(u) == find(v), they're already connected → this edge creates a cycle - Otherwise,
union(u, v)to merge their components
This is the key insight for problems like Redundant Connection.
Common Mistakes
- Forgetting path compression — Without it, find() degrades to O(n) on skewed trees. Always compress the path during find.
- Wrong rank/size update — Only increment rank when merging two trees of equal rank. If ranks differ, the rank of the larger tree stays the same.
- Not initializing parent array correctly — Every element must start as its own parent:
parent[i] = i. Initializing all to 0 breaks everything. - Using Union-Find for directed graphs — Union-Find works for undirected connectivity. For directed graphs, use DFS/BFS or topological sort instead.
- Off-by-one with 1-indexed nodes — Many problems use 1-indexed nodes. Make sure your parent array is sized correctly (n+1) and you initialize all indices.
When to Use
Template
1class UnionFind:2def __init__(self, n):3self.parent = list(range(n))4self.rank = [0] * n56def find(self, x):7# TODO: Implement with path compression8pass910def union(self, x, y):11# TODO: Implement with union by rank12pass1314def solve(n, edges):15uf = UnionFind(n)16# TODO: Process edges using union-find17pass
Syntax Reference
1# Union-Find (Disjoint Set Union)2# Built-in: None — implement manually34class UnionFind:5def __init__(self, n):6self.parent = list(range(n))7self.rank = [0] * n89def find(self, x):10if self.parent[x] != x:11self.parent[x] = self.find(self.parent[x]) # path compression12return self.parent[x]1314def union(self, x, y):15px, py = self.find(x), self.find(y)16if px == py:17return False # already in same set18if self.rank[px] < self.rank[py]:19px, py = py, px20self.parent[py] = px21if self.rank[px] == self.rank[py]:22self.rank[px] += 123return True # merged two different sets
Common Recipes
1# Recipe 1: Count connected components2def count_components(n, edges):3uf = UnionFind(n)4components = n5for u, v in edges:6if uf.union(u, v):7components -= 18return components910# Recipe 2: Cycle detection in undirected graph11def has_cycle(n, edges):12uf = UnionFind(n)13for u, v in edges:14if not uf.union(u, v):15return True # u and v already connected → cycle16return False1718# Recipe 3: Group elements by component19def group_by_component(n, edges):20uf = UnionFind(n)21for u, v in edges:22uf.union(u, v)23groups = {}24for i in range(n):25root = uf.find(i)26groups.setdefault(root, []).append(i)27return list(groups.values())2829# Recipe 4: Union-Find with string keys (e.g., emails)30def union_strings(pairs):31parent = {}32def find(x):33parent.setdefault(x, x)34if parent[x] != x:35parent[x] = find(parent[x])36return parent[x]37def union(x, y):38px, py = find(x), find(y)39if px != py:40parent[px] = py41for a, b in pairs:42union(a, b)43return parent
Complexity
| Operation | Without Optimization | With Path Compression + Rank | Notes |
|---|---|---|---|
| Find(x) | O(n) | O(α(n)) ≈ O(1) | α is inverse Ackermann — grows incredibly slowly |
| Union(x, y) | O(n) | O(α(n)) ≈ O(1) | Dominated by two find() calls |
| Connected(x, y) | O(n) | O(α(n)) ≈ O(1) | Just find(x) == find(y) |
| Build (n unions) | O(n²) | O(n · α(n)) ≈ O(n) | Processing all edges once |
| Space | O(n) | O(n) | parent[] + rank[] arrays |
Watch Out
- ✗Forgetting path compression — Without it, find() degrades to O(n) on skewed trees. Always compress the path during find.
- ✗Wrong rank/size update — Only increment rank when merging two trees of equal rank. If ranks differ, the rank of the larger tree stays the same.
- ✗Not initializing parent array correctly — Every element must start as its own parent: `parent[i] = i`. Initializing all to 0 breaks everything.
- ✗Using Union-Find for directed graphs — Union-Find works for undirected connectivity. For directed graphs, use DFS/BFS or topological sort instead.
- ✗Off-by-one with 1-indexed nodes — Many problems use 1-indexed nodes. Make sure your parent array is sized correctly (n+1) and you initialize all indices.