Union-Find Data Structure
Overview
The Union-Find data structure, also known as Disjoint Set Union (DSU), is a powerful data structure used to efficiently manage and query disjoint sets. It provides an elegant solution for tracking which elements belong to the same set and for merging sets together. The data structure maintains a collection of disjoint (non-overlapping) sets and supports two fundamental operations:
- Find: Determines the representative (root) of the set to which a particular element belongs. This operation answers the question: "Which set does this element belong to?"
- Union: Merges two sets into one by connecting their roots. This operation combines two separate sets into a single set.
The Union-Find data structure uses a tree-based representation where each set is represented as a tree, with one element serving as the root (representative) of that set. Initially, each element is its own parent, forming n singleton sets. The Find operation traverses up the tree to find the root, while Union connects two trees by making one root point to the other.
Key Optimizations
To achieve near-constant time complexity, Union-Find employs two key optimizations:
- Path Compression: During the Find operation, all nodes along the path from the queried element to the root are directly connected to the root. This flattens the tree structure, making future Find operations faster.
- Union by Rank: When merging two sets, the smaller tree (by rank/height) is attached to the root of the larger tree. This prevents the tree from becoming too tall, keeping operations efficient.
With these optimizations, both Find and Union operations achieve nearly constant amortized time complexity, approximately O(α(n)), where α(n) is the inverse Ackermann function, which grows extremely slowly and is effectively constant for all practical purposes. The space complexity is O(n) to store the parent and rank arrays.
How It Works
Union-Find works by:
- Initialization: Each element starts as its own parent, creating n singleton sets
- Find Operation: Traverses up the tree to find the root, applying path compression along the way
- Union Operation: Connects two trees by making one root point to the other, using union by rank to keep trees balanced
- Path Compression: During Find, all nodes on the path are connected directly to the root
Union-Find Algorithm Pseudocode
Initialize:
parent = [i for i in range(n)]
rank = [1] * n
Find(x):
if x != parent[x]:
parent[x] = Find(parent[x]) # Path compression
return parent[x]
Union(x, y):
root_x = Find(x)
root_y = Find(y)
if root_x != root_y:
if rank[root_x] > rank[root_y]:
parent[root_y] = root_x
elif rank[root_x] < rank[root_y]:
parent[root_x] = root_y
else:
parent[root_y] = root_x
rank[root_x] += 1
Implementation
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [1] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # Path compression
return self.parent[x]
def union(self, x, y):
root_x = self.find(x)
root_y = self.find(y)
if root_x != root_y:
if self.rank[root_x] > self.rank[root_y]:
self.parent[root_y] = root_x
elif self.rank[root_x] < self.rank[root_y]:
self.parent[root_x] = root_y
else:
self.parent[root_y] = root_x
self.rank[root_x] += 1
def connected(self, x, y):
return self.find(x) == self.find(y)
Complexity Analysis
- Time Complexity:
- Find: O(α(n)) amortized (nearly constant)
- Union: O(α(n)) amortized (nearly constant)
- Without optimizations: O(n) worst case
- Space Complexity: O(n) - to store parent and rank arrays
The inverse Ackermann function α(n) grows extremely slowly. For all practical values of n (even up to 2^65536), α(n) is less than 5, making Union-Find operations effectively constant time.
Path Compression Explained
Path compression is a crucial optimization that flattens the tree structure during Find operations. When we find the root of an element, we update all nodes along the path to point directly to the root.
Example: If we have a path A → B → C → D (root), after Find(A), the structure becomes:
Before: A → B → C → D
After: A → D
B → D
C → D
This makes future Find operations on A, B, or C much faster.
Union by Rank Explained
Union by rank ensures that when merging two trees, we always attach the smaller tree to the root of the larger tree. This prevents the tree from becoming too tall.
The rank represents an upper bound on the height of the tree. When two trees have the same rank, we increment the rank of the new root.
Example:
Tree 1 (rank 2): Tree 2 (rank 1):
A D
/ \ / \
B C E F
After Union(A, D): Attach smaller tree to larger
A (rank 2)
/|\
B C D
/ \
E F
Common Trends in Problems where Union-Find is Applied
- Connected Components: Union-Find excels at tracking and merging connected components in graphs, making it ideal for problems that ask about connectivity or require grouping connected elements.
- Dynamic Connectivity: When you need to answer connectivity queries as edges are added or removed over time, Union-Find provides an efficient solution.
- Cycle Detection: Union-Find can detect cycles in undirected graphs by checking if two nodes already belong to the same set before adding an edge between them.
- Minimum Spanning Tree: Kruskal's algorithm uses Union-Find to efficiently determine if adding an edge would create a cycle, enabling the construction of MSTs.
- Equivalence Relations: Problems involving equivalence classes, transitive relationships, or "friends of friends" scenarios naturally map to Union-Find operations.
Union-Find Framework
- Initialize the data structure with each element as its own parent and rank set to 1.
- Use the Find operation to locate the root representative of an element's set.
- Apply path compression during Find to optimize future queries by flattening the tree structure.
- Use Union by rank to merge sets efficiently, always attaching the smaller tree to the larger one.
- Check if elements are in the same set by comparing their root representatives from Find operations.
Example
Using Union-Find to track connected components:
# Initialize with 5 elements: [0, 1, 2, 3, 4]
uf = UnionFind(5)
# Initially, each element is its own parent
# parent = [0, 1, 2, 3, 4]
# rank = [1, 1, 1, 1, 1]
# Union operations
uf.union(0, 1) # Connect 0 and 1
uf.union(2, 3) # Connect 2 and 3
uf.union(1, 2) # Connect 1 and 2 (connects all 0,1,2,3)
# Find operations
uf.find(0) == uf.find(3) # True - same component
uf.find(0) == uf.find(4) # False - different components
Real-World Applications
- Network Connectivity: Determining if nodes in a network are connected
- Image Processing: Connected component labeling in images
- Social Networks: Finding friend groups or communities
- Kruskal's Algorithm: Building minimum spanning trees
- Percolation Theory: Modeling physical systems
- Equivalence Testing: Determining if two elements are equivalent
Related Algorithms
Explore other graph algorithms:
- Depth-First Search (DFS) - Graph traversal
- Breadth-First Search (BFS) - Level-by-level traversal
- Dijkstra's Algorithm - Shortest path in weighted graphs
- Back to Graph Algorithms Overview