PATTERN 0 OF 22

Union-Find

Use the Disjoint Set Union (DSU) structure to efficiently group elements, detect cycles in undirected graphs, and count connected components. Master path compression and union by rank for near-constant-time operations.

Time: O(α(n)) ≈ O(1) amortized per operation with path compression + union by rank
Space: O(n) for parent and rank arrays

Learn & Reference

Understanding the Pattern

Union-Find Pattern

The Union-Find (also called Disjoint Set Union or DSU) data structure tracks a collection of non-overlapping sets. It supports two core operations:

  • Find(x) — determine which set element x belongs to (returns the set's representative/root)
  • Union(x, y) — merge the sets containing x and y
Initial:    {0} {1} {2} {3} {4}      ← 5 separate sets
Union(0,1): {0,1} {2} {3} {4}        ← merged 0 and 1
Union(2,3): {0,1} {2,3} {4}          ← merged 2 and 3
Union(1,3): {0,1,2,3} {4}            ← merged via representatives
Find(2) == Find(0)?  → true          ← same set
Find(4) == Find(0)?  → false         ← different sets

Core Concept

Under the hood, each set is represented as a tree where each node points to its parent. The root of the tree is the set's representative. Initially, every element is its own root (parent[i] = i).

parent = [0, 1, 2, 3, 4]   ← each element is its own root

After Union(0, 1):  parent = [0, 0, 2, 3, 4]   ← 1 points to 0
After Union(2, 3):  parent = [0, 0, 2, 2, 4]   ← 3 points to 2
After Union(1, 3):  parent = [0, 0, 0, 2, 4]   ← 2's root points to 0's root

Without optimization, trees can become long chains (O(n) per find). Two optimizations make operations nearly O(1):

Key Techniques

1. Path Compression

During find(x), make every node on the path point directly to the root. This flattens the tree so future lookups are faster.

Before find(4):            After find(4):
    0                          0
   / \                      / | \ \
  1   2                    1  2  3  4
      |
      3
      |
      4

2. Union by Rank (or Size)

When merging two sets, attach the shorter tree under the taller tree's root. This prevents the tree from growing tall. Combined with path compression, this gives O(α(n)) amortized time — essentially constant.

Rank tracks tree height (approximate):
  rank = [0, 0, 0, 0]

Union(0, 1): attach 1 under 0, rank[0] stays 0 (both rank 0, so rank[0] becomes 1)
Union(2, 3): attach 3 under 2, rank[2] becomes 1
Union(0, 2): both rank 1, attach 2 under 0, rank[0] becomes 2

3. Connected Components Count

Initialize a counter equal to the number of elements. Each successful union (where two elements were in different sets) decrements the counter. The final value is the number of connected components.

4. Cycle Detection in Undirected Graphs

Process edges one by one. For each edge (u, v):

  • If find(u) == find(v), they're already connected → this edge creates a cycle
  • Otherwise, union(u, v) to merge their components

This is the key insight for problems like Redundant Connection.

Common Mistakes

  1. Forgetting path compression — Without it, find() degrades to O(n) on skewed trees. Always compress the path during find.
  2. Wrong rank/size update — Only increment rank when merging two trees of equal rank. If ranks differ, the rank of the larger tree stays the same.
  3. Not initializing parent array correctly — Every element must start as its own parent: parent[i] = i. Initializing all to 0 breaks everything.
  4. Using Union-Find for directed graphs — Union-Find works for undirected connectivity. For directed graphs, use DFS/BFS or topological sort instead.
  5. Off-by-one with 1-indexed nodes — Many problems use 1-indexed nodes. Make sure your parent array is sized correctly (n+1) and you initialize all indices.

When to Use

The problem asks about **connected components** or grouping elements
You need to **detect cycles in an undirected graph**
Elements need to be **merged into equivalence classes** (accounts, synonyms, regions)
The problem involves **dynamic connectivity** — are these two elements connected?
You process **edges one at a time** and need to track which components merge
The problem can be modeled as merge if related — friends of friends, overlapping intervals, shared attributes

Template

1class UnionFind:
2 def __init__(self, n):
3 self.parent = list(range(n))
4 self.rank = [0] * n
5
6 def find(self, x):
7 # TODO: Implement with path compression
8 pass
9
10 def union(self, x, y):
11 # TODO: Implement with union by rank
12 pass
13
14def solve(n, edges):
15 uf = UnionFind(n)
16 # TODO: Process edges using union-find
17 pass
{ }

Syntax Reference

1# Union-Find (Disjoint Set Union)
2# Built-in: None — implement manually
3
4class UnionFind:
5 def __init__(self, n):
6 self.parent = list(range(n))
7 self.rank = [0] * n
8
9 def find(self, x):
10 if self.parent[x] != x:
11 self.parent[x] = self.find(self.parent[x]) # path compression
12 return self.parent[x]
13
14 def union(self, x, y):
15 px, py = self.find(x), self.find(y)
16 if px == py:
17 return False # already in same set
18 if self.rank[px] < self.rank[py]:
19 px, py = py, px
20 self.parent[py] = px
21 if self.rank[px] == self.rank[py]:
22 self.rank[px] += 1
23 return True # merged two different sets
📋

Common Recipes

1# Recipe 1: Count connected components
2def count_components(n, edges):
3 uf = UnionFind(n)
4 components = n
5 for u, v in edges:
6 if uf.union(u, v):
7 components -= 1
8 return components
9
10# Recipe 2: Cycle detection in undirected graph
11def has_cycle(n, edges):
12 uf = UnionFind(n)
13 for u, v in edges:
14 if not uf.union(u, v):
15 return True # u and v already connected → cycle
16 return False
17
18# Recipe 3: Group elements by component
19def group_by_component(n, edges):
20 uf = UnionFind(n)
21 for u, v in edges:
22 uf.union(u, v)
23 groups = {}
24 for i in range(n):
25 root = uf.find(i)
26 groups.setdefault(root, []).append(i)
27 return list(groups.values())
28
29# Recipe 4: Union-Find with string keys (e.g., emails)
30def union_strings(pairs):
31 parent = {}
32 def find(x):
33 parent.setdefault(x, x)
34 if parent[x] != x:
35 parent[x] = find(parent[x])
36 return parent[x]
37 def union(x, y):
38 px, py = find(x), find(y)
39 if px != py:
40 parent[px] = py
41 for a, b in pairs:
42 union(a, b)
43 return parent
O(n)

Complexity

OperationWithout OptimizationWith Path Compression + RankNotes
Find(x)O(n)O(α(n)) ≈ O(1)α is inverse Ackermann — grows incredibly slowly
Union(x, y)O(n)O(α(n)) ≈ O(1)Dominated by two find() calls
Connected(x, y)O(n)O(α(n)) ≈ O(1)Just find(x) == find(y)
Build (n unions)O(n²)O(n · α(n)) ≈ O(n)Processing all edges once
SpaceO(n)O(n)parent[] + rank[] arrays

Watch Out

  • Forgetting path compression — Without it, find() degrades to O(n) on skewed trees. Always compress the path during find.
  • Wrong rank/size update — Only increment rank when merging two trees of equal rank. If ranks differ, the rank of the larger tree stays the same.
  • Not initializing parent array correctly — Every element must start as its own parent: `parent[i] = i`. Initializing all to 0 breaks everything.
  • Using Union-Find for directed graphs — Union-Find works for undirected connectivity. For directed graphs, use DFS/BFS or topological sort instead.
  • Off-by-one with 1-indexed nodes — Many problems use 1-indexed nodes. Make sure your parent array is sized correctly (n+1) and you initialize all indices.