Mastering Algorithms

Union-Find Data Structure

Overview

The Union-Find data structure, also known as Disjoint Set Union (DSU), is a powerful data structure used to efficiently manage and query disjoint sets. It provides an elegant solution for tracking which elements belong to the same set and for merging sets together. The data structure maintains a collection of disjoint (non-overlapping) sets and supports two fundamental operations:

  • Find: Determines the representative (root) of the set to which a particular element belongs. This operation answers the question: "Which set does this element belong to?"
  • Union: Merges two sets into one by connecting their roots. This operation combines two separate sets into a single set.

The Union-Find data structure uses a tree-based representation where each set is represented as a tree, with one element serving as the root (representative) of that set. Initially, each element is its own parent, forming n singleton sets. The Find operation traverses up the tree to find the root, while Union connects two trees by making one root point to the other.

Key Optimizations

To achieve near-constant time complexity, Union-Find employs two key optimizations:

  • Path Compression: During the Find operation, all nodes along the path from the queried element to the root are directly connected to the root. This flattens the tree structure, making future Find operations faster.
  • Union by Rank: When merging two sets, the smaller tree (by rank/height) is attached to the root of the larger tree. This prevents the tree from becoming too tall, keeping operations efficient.

With these optimizations, both Find and Union operations achieve nearly constant amortized time complexity, approximately O(α(n)), where α(n) is the inverse Ackermann function, which grows extremely slowly and is effectively constant for all practical purposes. The space complexity is O(n) to store the parent and rank arrays.

How It Works

Union-Find works by:

  1. Initialization: Each element starts as its own parent, creating n singleton sets
  2. Find Operation: Traverses up the tree to find the root, applying path compression along the way
  3. Union Operation: Connects two trees by making one root point to the other, using union by rank to keep trees balanced
  4. Path Compression: During Find, all nodes on the path are connected directly to the root

Union-Find Algorithm Pseudocode


Initialize:
    parent = [i for i in range(n)]
    rank = [1] * n

Find(x):
    if x != parent[x]:
        parent[x] = Find(parent[x])  # Path compression
    return parent[x]

Union(x, y):
    root_x = Find(x)
    root_y = Find(y)
    if root_x != root_y:
        if rank[root_x] > rank[root_y]:
            parent[root_y] = root_x
        elif rank[root_x] < rank[root_y]:
            parent[root_x] = root_y
        else:
            parent[root_y] = root_x
            rank[root_x] += 1
                

Implementation


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [1] * n
    
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]
    
    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)
        
        if root_x != root_y:
            if self.rank[root_x] > self.rank[root_y]:
                self.parent[root_y] = root_x
            elif self.rank[root_x] < self.rank[root_y]:
                self.parent[root_x] = root_y
            else:
                self.parent[root_y] = root_x
                self.rank[root_x] += 1
    
    def connected(self, x, y):
        return self.find(x) == self.find(y)
                

Complexity Analysis

  • Time Complexity:
    • Find: O(α(n)) amortized (nearly constant)
    • Union: O(α(n)) amortized (nearly constant)
    • Without optimizations: O(n) worst case
  • Space Complexity: O(n) - to store parent and rank arrays

The inverse Ackermann function α(n) grows extremely slowly. For all practical values of n (even up to 2^65536), α(n) is less than 5, making Union-Find operations effectively constant time.

Path Compression Explained

Path compression is a crucial optimization that flattens the tree structure during Find operations. When we find the root of an element, we update all nodes along the path to point directly to the root.

Example: If we have a path A → B → C → D (root), after Find(A), the structure becomes:

Before:  A → B → C → D
After:   A → D
         B → D
         C → D
                

This makes future Find operations on A, B, or C much faster.

Union by Rank Explained

Union by rank ensures that when merging two trees, we always attach the smaller tree to the root of the larger tree. This prevents the tree from becoming too tall.

The rank represents an upper bound on the height of the tree. When two trees have the same rank, we increment the rank of the new root.

Example:

Tree 1 (rank 2):     Tree 2 (rank 1):
    A                    D
   / \                  / \
  B   C                E   F

After Union(A, D): Attach smaller tree to larger
    A (rank 2)
   /|\
  B C D
      / \
     E   F
                

Union-Find Framework

  • Initialize the data structure with each element as its own parent and rank set to 1.
  • Use the Find operation to locate the root representative of an element's set.
  • Apply path compression during Find to optimize future queries by flattening the tree structure.
  • Use Union by rank to merge sets efficiently, always attaching the smaller tree to the larger one.
  • Check if elements are in the same set by comparing their root representatives from Find operations.

Example

Using Union-Find to track connected components:


# Initialize with 5 elements: [0, 1, 2, 3, 4]
uf = UnionFind(5)

# Initially, each element is its own parent
# parent = [0, 1, 2, 3, 4]
# rank = [1, 1, 1, 1, 1]

# Union operations
uf.union(0, 1)  # Connect 0 and 1
uf.union(2, 3)  # Connect 2 and 3
uf.union(1, 2)  # Connect 1 and 2 (connects all 0,1,2,3)

# Find operations
uf.find(0) == uf.find(3)  # True - same component
uf.find(0) == uf.find(4)  # False - different components
                

Real-World Applications

  • Network Connectivity: Determining if nodes in a network are connected
  • Image Processing: Connected component labeling in images
  • Social Networks: Finding friend groups or communities
  • Kruskal's Algorithm: Building minimum spanning trees
  • Percolation Theory: Modeling physical systems
  • Equivalence Testing: Determining if two elements are equivalent

Related Algorithms

Explore other graph algorithms: