Mastering Algorithms

Greedy Algorithms

Introduction to Greedy Algorithms

Greedy algorithms are a class of algorithms that make locally optimal choices at each step with the hope of finding a global optimum. The term "greedy" comes from the strategy of always choosing the option that looks best at the moment, without considering the broader context or future consequences.

Unlike dynamic programming, which considers all possible subproblems, greedy algorithms make a single choice and never reconsider it. This makes them typically faster and simpler, but they only work correctly when the problem has specific properties that guarantee the greedy choice leads to an optimal solution.

Greedy algorithms are widely used in optimization problems, scheduling, graph algorithms, and compression. They're particularly valuable when you need efficient solutions and can prove that the greedy strategy is optimal.

When to Use Greedy Algorithms

Greedy algorithms are suitable when:

  • The problem has the greedy choice property - a globally optimal solution can be reached by making locally optimal choices
  • The problem has optimal substructure - an optimal solution contains optimal solutions to subproblems
  • You need a fast, efficient solution (often O(n log n) or better)
  • The problem can be solved by making a series of choices where each choice is independent

Greedy vs Dynamic Programming

Understanding when to use greedy algorithms versus dynamic programming is crucial:

Feature            Greedy Algorithms                          Dynamic Programming
Decision Making    Make one choice, never reconsider          Consider all possibilities, store results
Time Complexity    Usually O(n log n) or O(n)                 Usually O(n²) or higher
Space Complexity   Usually O(1) or O(n)                       Usually O(n) or O(n²)
Optimality         Requires proof of greedy choice property   Guaranteed optimal if problem has optimal substructure
Best For           Optimization with greedy choice property   Problems with overlapping subproblems

Key Properties of Greedy Algorithms

1. Greedy Choice Property

A problem exhibits the greedy choice property if a globally optimal solution can be reached by making a locally optimal (greedy) choice at each step. This means that at each decision point, we can make the choice that looks best right now, and this will lead to an optimal solution.

Example: In the activity selection problem, always choosing the activity that ends earliest (greedy choice) leads to the maximum number of non-overlapping activities (optimal solution).

2. Optimal Substructure

A problem has optimal substructure if an optimal solution to the problem contains optimal solutions to its subproblems. This property is shared with dynamic programming, but greedy algorithms use it differently - they make one choice and solve one subproblem, rather than solving all subproblems.

Example: In the minimum spanning tree problem, removing any edge from an MST splits it into two subtrees, and each subtree is itself a minimum spanning tree of the vertices it connects.

3. Proving Greedy Algorithms

To prove a greedy algorithm is correct, you typically need to:

  1. Prove the greedy choice property: Show that making the greedy choice is always safe (doesn't exclude optimal solutions)
  2. Prove optimal substructure: Show that the problem can be solved by combining the greedy choice with an optimal solution to the remaining subproblem
  3. Use exchange argument: Show that any optimal solution can be transformed to include the greedy choice without making it worse

Activity Selection Problem

The activity selection problem is a classic greedy algorithm problem: given a set of activities with start and finish times, select the maximum number of non-overlapping activities.

Problem Statement

Given n activities with start times s[i] and finish times f[i], where activity i takes place during [s[i], f[i]), find the maximum number of activities that can be scheduled without overlapping.

Greedy Strategy

The greedy choice is to always select the activity that finishes earliest. This leaves the maximum amount of time for remaining activities.

Algorithm

def activity_selection(activities):
    if not activities:
        return []
    
    # Sort activities by finish time
    activities.sort(key=lambda x: x[1])
    
    selected = [activities[0]]  # The earliest-finishing activity is always safe
    last_finish = activities[0][1]
    
    for start, finish in activities[1:]:
        if start >= last_finish:  # No overlap
            selected.append((start, finish))
            last_finish = finish
    
    return selected

# Example
activities = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 9), (5, 9)]
result = activity_selection(activities)
# Result: [(1, 4), (5, 7), (8, 9)] - 3 activities

Complexity Analysis

  • Time Complexity: O(n log n) - dominated by sorting
  • Space Complexity: O(n) - for storing selected activities

Why It Works

The greedy choice property holds because if we have an optimal solution that doesn't include the earliest-finishing activity, we can replace the first activity in that solution with the earliest-finishing one without reducing the total count (since it finishes earlier, it can't conflict with more activities).
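One way to sanity-check this on small inputs is to compare the greedy answer against exhaustive search. The sketch below reuses the activity_selection function above; max_non_overlapping_brute is a hypothetical helper written only for this comparison.

from itertools import combinations

def max_non_overlapping_brute(activities):
    # Try subsets from largest to smallest; return the size of the first
    # subset whose activities are pairwise non-overlapping. Exponential,
    # so only suitable for tiny test inputs.
    for size in range(len(activities), 0, -1):
        for subset in combinations(activities, size):
            ordered = sorted(subset, key=lambda x: x[1])
            if all(ordered[i][1] <= ordered[i + 1][0]
                   for i in range(len(ordered) - 1)):
                return size
    return 0

activities = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 9), (5, 9)]
assert len(activity_selection(activities)) == max_non_overlapping_brute(activities)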

Fractional Knapsack Problem

The fractional knapsack problem allows taking fractions of items, making it solvable with a greedy approach. This contrasts with the 0-1 knapsack problem, which requires dynamic programming.

Problem Statement

Given items with weights and values, and a knapsack with capacity W, maximize the total value. You can take fractions of items (unlike 0-1 knapsack where items are indivisible).

Greedy Strategy

Sort items by value-to-weight ratio (value/weight) in descending order. Take items in this order, taking as much as possible of each item until the knapsack is full.

Algorithm

def fractional_knapsack(items, capacity):
    # Sort by value/weight ratio (descending)
    items.sort(key=lambda x: x[1] / x[0], reverse=True)
    
    total_value = 0
    remaining_capacity = capacity
    
    for weight, value in items:
        if remaining_capacity >= weight:
            # Take entire item
            total_value += value
            remaining_capacity -= weight
        else:
            # Take fraction of item
            fraction = remaining_capacity / weight
            total_value += value * fraction
            break
    
    return total_value

# Example
items = [(10, 60), (20, 100), (30, 120)]  # (weight, value)
capacity = 50
result = fractional_knapsack(items, capacity)
# Takes all of item 1 (10, 60), all of item 2 (20, 100), 
# and 2/3 of item 3 (20, 80) = 240 total value

Complexity Analysis

  • Time Complexity: O(n log n) - sorting dominates
  • Space Complexity: O(1) - only using a few variables

Why Greedy Works for Fractional Knapsack

Since we can take fractions, the item with the highest value-to-weight ratio should be prioritized. If we don't take as much as possible of the best ratio item, we can always improve the solution by replacing some of a lower-ratio item with the higher-ratio item.
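To see concretely why the same ratio-based greedy is unsafe once items become indivisible, here is a small sketch applying it to the 0-1 variant; knapsack_01_greedy is a deliberately flawed illustration, not a recommended algorithm.

def knapsack_01_greedy(items, capacity):
    # Same ratio-based greedy, but items can no longer be split.
    items = sorted(items, key=lambda x: x[1] / x[0], reverse=True)
    total = 0
    for weight, value in items:
        if weight <= capacity:
            total += value
            capacity -= weight
    return total

items = [(10, 60), (20, 100), (30, 120)]  # (weight, value)
print(knapsack_01_greedy(items, 50))  # 160: takes items 1 and 2, no room for item 3
# The true 0-1 optimum is 220 (items 2 and 3), which requires DP to find.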

Huffman Coding

Huffman coding is a lossless data compression algorithm that creates optimal prefix-free codes for characters based on their frequencies. It's used in file compression formats like ZIP and JPEG.

Problem Statement

Given characters and their frequencies, assign binary codes to each character such that:

  • No code is a prefix of another (prefix-free property)
  • The total encoded length is minimized

Greedy Strategy

Build a binary tree bottom-up by repeatedly merging the two nodes with lowest frequencies. The characters with higher frequencies get shorter codes.

Algorithm

import heapq

class Node:
    def __init__(self, char, freq, left=None, right=None):
        self.char = char
        self.freq = freq
        self.left = left
        self.right = right
    
    def __lt__(self, other):
        return self.freq < other.freq

def build_huffman_tree(frequencies):
    # Create priority queue (min-heap)
    heap = [Node(char, freq) for char, freq in frequencies.items()]
    heapq.heapify(heap)
    
    # Build tree by merging nodes
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(None, left.freq + right.freq, left, right)
        heapq.heappush(heap, merged)
    
    return heap[0]  # Root of Huffman tree

def build_codes(root, code="", codes=None):
    if codes is None:
        codes = {}  # avoid sharing a mutable default argument across calls
    if root.char is not None:
        codes[root.char] = code if code else "0"
    else:
        build_codes(root.left, code + "0", codes)
        build_codes(root.right, code + "1", codes)
    return codes

# Example
frequencies = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}
tree = build_huffman_tree(frequencies)
codes = build_codes(tree)
# Result: {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 
#          'b': '1101', 'e': '111'}
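To show the codes in use, here is a minimal sketch of encoding and decoding against the tree built above. The encode and decode helpers are hypothetical additions, and the decoder assumes at least two distinct symbols so the root is not itself a leaf.

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, root):
    # Walk the tree bit by bit; emit a character at each leaf.
    result, node = [], root
    for bit in bits:
        node = node.left if bit == "0" else node.right
        if node.char is not None:
            result.append(node.char)
            node = root
    return "".join(result)

encoded = encode("fade", codes)  # '0' + '1100' + '101' + '111'
assert decode(encoded, tree) == "fade"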

Complexity Analysis

  • Time Complexity: O(n log n) - building heap and merging nodes
  • Space Complexity: O(n) - for the tree and codes

Applications

  • File compression (ZIP, GZIP)
  • Image compression (JPEG uses Huffman coding)
  • Network protocols
  • Data transmission optimization

Minimum Spanning Tree (MST)

A minimum spanning tree connects all vertices in a weighted, undirected graph with minimum total edge weight. Two famous greedy algorithms solve this: Kruskal's and Prim's.

Kruskal's Algorithm

Kruskal's algorithm builds the MST by repeatedly adding the smallest edge that doesn't form a cycle. It uses the Union-Find data structure to efficiently detect cycles.

Algorithm

def kruskal_mst(edges, num_vertices):
    # Sort edges by weight
    edges.sort(key=lambda x: x[2])  # (u, v, weight)
    
    mst = []
    parent = list(range(num_vertices))
    
    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])  # Path compression
        return parent[x]
    
    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_x] = root_y
            return True
        return False
    
    for u, v, weight in edges:
        if union(u, v):  # No cycle
            mst.append((u, v, weight))
            if len(mst) == num_vertices - 1:
                break
    
    return mst
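A quick usage sketch on a small 4-vertex graph (the edge list below is made up for illustration):

edges = [(0, 1, 10), (0, 2, 6), (0, 3, 5), (1, 3, 15), (2, 3, 4)]
mst = kruskal_mst(edges, 4)
print(mst)  # [(2, 3, 4), (0, 3, 5), (0, 1, 10)] - total weight 19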

Complexity

  • Time Complexity: O(E log E) - sorting edges dominates
  • Space Complexity: O(V) - for Union-Find structure

Prim's Algorithm

Prim's algorithm grows the MST from a starting vertex, always adding the minimum-weight edge that connects a vertex in the MST to a vertex outside it.

Algorithm

import heapq

def prim_mst(graph, start):
    n = len(graph)
    mst = []
    visited = [False] * n
    pq = [(0, start, -1)]  # (weight, vertex, parent)
    
    while pq and len(mst) < n - 1:
        weight, u, parent = heapq.heappop(pq)
        
        if visited[u]:
            continue
        
        visited[u] = True
        if parent != -1:
            mst.append((parent, u, weight))
        
        for v, w in graph[u]:
            if not visited[v]:
                heapq.heappush(pq, (w, v, u))
    
    return mst
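For comparison, here is the same made-up graph from the Kruskal example, represented as the adjacency list the code above expects (graph[u] is a list of (neighbor, weight) pairs):

graph = {
    0: [(1, 10), (2, 6), (3, 5)],
    1: [(0, 10), (3, 15)],
    2: [(0, 6), (3, 4)],
    3: [(0, 5), (1, 15), (2, 4)],
}
mst = prim_mst(graph, 0)
print(mst)  # [(0, 3, 5), (3, 2, 4), (0, 1, 10)] - same total weight, 19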

Complexity

  • Time Complexity: O(E log V) with binary heap, O(E + V log V) with Fibonacci heap
  • Space Complexity: O(V) - for priority queue and visited array

Kruskal's vs Prim's

Feature           Kruskal's                   Prim's
Data Structure    Union-Find                  Priority Queue
Best For          Sparse graphs (E ≈ V)       Dense graphs (E ≈ V²)
Time Complexity   O(E log E)                  O(E log V)
Approach          Edge-based (adds edges)     Vertex-based (grows from a vertex)

For more details on Union-Find, see Graph Algorithms.

Interval Scheduling Problems

Interval scheduling problems are a class of problems where you need to schedule intervals (time periods, tasks, etc.) optimally. Many variations use greedy strategies.

Maximum Non-Overlapping Intervals

This is the same as the activity selection problem - find the maximum number of non-overlapping intervals.

Interval Partitioning (Minimum Rooms)

Given intervals, find the minimum number of resources (rooms, machines) needed so no two overlapping intervals use the same resource.

Algorithm

def min_rooms(intervals):
    # Create events: (time, type) where type: 1=start, -1=end
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    
    # Sort by time, then by type (end before start at same time)
    events.sort(key=lambda x: (x[0], x[1]))
    
    max_rooms = 0
    current_rooms = 0
    
    for time, event_type in events:
        current_rooms += event_type
        max_rooms = max(max_rooms, current_rooms)
    
    return max_rooms

# Example
intervals = [(1, 4), (2, 5), (3, 6), (5, 7)]
# Result: 3 rooms needed

Complexity

  • Time Complexity: O(n log n) - sorting events
  • Space Complexity: O(n) - for events array

Coin Change (Greedy Version)

The coin change problem asks: given coins of different denominations and a target amount, find the minimum number of coins needed. A greedy approach works for certain coin systems (like US currency) but not all.

When Greedy Works

Greedy is optimal only for so-called canonical coin systems, such as US currency: 1, 5, 10, 25, 50, 100. No simple spacing rule guarantees this. Even a system where every coin is at least double the next smaller one can fail: with coins 1, 5, 11 and amount 15, greedy gives 11 + 1 + 1 + 1 + 1 (5 coins) while 5 + 5 + 5 (3 coins) is optimal. When in doubt, verify against a dynamic programming solution.

Algorithm

def coin_change_greedy(coins, amount):
    coins.sort(reverse=True)  # Sort descending
    count = 0
    i = 0
    
    while amount > 0 and i < len(coins):
        if coins[i] <= amount:
            num_coins = amount // coins[i]
            count += num_coins
            amount -= num_coins * coins[i]
        i += 1
    
    return count if amount == 0 else -1

# Example (US coins)
coins = [1, 5, 10, 25]
amount = 67
# Result: 6 coins (2 quarters, 1 dime, 1 nickel, 2 pennies)

When Greedy Fails

Greedy fails for coin systems like [1, 3, 4] and amount 6:

  • Greedy: 4 + 1 + 1 = 3 coins
  • Optimal: 3 + 3 = 2 coins

For general coin systems, use dynamic programming (see Dynamic Programming). A minimal bottom-up DP sketch is shown below for reference; coin_change_dp is written here for illustration.
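def coin_change_dp(coins, amount):
    # dp[x] = minimum number of coins summing to x (infinity if unreachable)
    INF = float("inf")
    dp = [0] + [INF] * amount
    for x in range(1, amount + 1):
        for c in coins:
            if c <= x and dp[x - c] + 1 < dp[x]:
                dp[x] = dp[x - c] + 1
    return dp[amount] if dp[amount] != INF else -1

print(coin_change_dp([1, 3, 4], 6))      # 2  (3 + 3)
print(coin_change_greedy([1, 3, 4], 6))  # 3  (4 + 1 + 1)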

Common Greedy Patterns

1. Sorting + Greedy Choice

Many greedy algorithms start by sorting the input, then make greedy choices in order:

  • Activity Selection: Sort by finish time
  • Fractional Knapsack: Sort by value/weight ratio
  • Interval Scheduling: Sort by end time

2. Priority Queue / Heap

Use a priority queue when you need to repeatedly select the best option:

  • Huffman Coding: Merge nodes with lowest frequencies
  • Prim's MST: Always add minimum-weight edge
  • Dijkstra's Algorithm: Process closest unvisited vertex

3. Two-Pointer Technique

Some greedy problems can be solved with two pointers:

  • Meeting room scheduling
  • Merging intervals
  • Container with most water (sketched below)
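As an example of the pattern, here is a sketch of the container-with-most-water problem (LeetCode 11). The greedy insight is that the shorter wall limits the area, so moving it inward is the only move that can possibly improve the result.

def max_water(heights):
    left, right = 0, len(heights) - 1
    best = 0
    while left < right:
        area = (right - left) * min(heights[left], heights[right])
        best = max(best, area)
        # Only advancing the shorter wall can ever increase the area.
        if heights[left] < heights[right]:
            left += 1
        else:
            right -= 1
    return best

print(max_water([1, 8, 6, 2, 5, 4, 8, 3, 7]))  # 49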

Proving Greedy Algorithms Correct

Exchange Argument

The most common proof technique is the exchange argument:

  1. Assume you have an optimal solution that doesn't use the greedy choice
  2. Show you can exchange/swap elements to include the greedy choice
  3. Prove the new solution is at least as good (optimal)
  4. Conclude the greedy choice is safe

Example: Activity Selection Proof

Claim: Always choosing the activity that ends earliest is optimal.

Proof: Let S be an optimal solution that doesn't include the earliest-finishing activity a, and let a' be the activity in S that finishes first. Since a finishes no later than a', swapping a' for a keeps every other activity in S compatible: each of them starts at or after a' finishes, hence at or after a finishes. The new solution has the same number of activities, so it is still optimal and now contains the greedy choice. Therefore, including a is safe.

Mathematical Induction

Another common approach is to prove by induction that after making k greedy choices, there exists an optimal solution that includes those k choices.

Common Pitfalls and Mistakes

1. Assuming Greedy Always Works

Not all problems can be solved greedily. Always verify the greedy choice property and optimal substructure before using a greedy approach.

2. Wrong Greedy Criterion

Choosing the wrong criterion for the greedy choice leads to incorrect solutions. For example:

  • Activity Selection: Must sort by finish time, not start time (see the counterexample after this list)
  • Fractional Knapsack: Must use value/weight ratio, not just value
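The counterexample below makes the first point concrete, reusing activity_selection from earlier; select_by_start is a deliberately wrong variant written for illustration.

def select_by_start(activities):
    # Same greedy loop as activity_selection, but with the wrong sort key.
    acts = sorted(activities, key=lambda x: x[0])
    selected, last_finish = [acts[0]], acts[0][1]
    for start, finish in acts[1:]:
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected

activities = [(0, 10), (1, 2), (3, 4)]
print(select_by_start(activities))     # [(0, 10)] - only 1 activity
print(activity_selection(activities))  # [(1, 2), (3, 4)] - 2 activities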

3. Not Handling Edge Cases

Greedy algorithms often have edge cases:

  • Empty input
  • Single element
  • All elements identical
  • Boundary conditions

4. Confusing with Dynamic Programming

Remember: Greedy makes one choice and never looks back. DP considers all possibilities and stores results. If you need to reconsider previous choices, use DP, not greedy.

Practice Problems

Easy

  • LeetCode 455: Assign Cookies - Match children with cookies greedily
  • LeetCode 860: Lemonade Change - Greedy change making
  • LeetCode 1005: Maximize Sum Of Array After K Negations - Greedy negation strategy

Medium

  • LeetCode 435: Non-overlapping Intervals - Activity selection variant
  • LeetCode 452: Minimum Number of Arrows to Burst Balloons - Interval scheduling
  • LeetCode 621: Task Scheduler - Greedy task scheduling
  • LeetCode 763: Partition Labels - Greedy partitioning
  • LeetCode 122: Best Time to Buy and Sell Stock II - Greedy profit maximization

Hard

  • LeetCode 135: Candy - Greedy distribution with constraints
  • LeetCode 330: Patching Array - Greedy patching strategy
  • LeetCode 502: IPO - Greedy project selection

Real-World Applications

Network Design

  • Minimum spanning trees for network topology
  • Shortest path routing (Dijkstra's algorithm)
  • Network flow optimization

Data Compression

  • Huffman coding for file compression
  • Lempel-Ziv-Welch (LZW) compression
  • Image and video compression

Scheduling

  • CPU task scheduling
  • Meeting room allocation
  • Job scheduling in operating systems
  • Resource allocation

Optimization

  • Portfolio optimization (fractional knapsack)
  • Resource allocation problems
  • Approximation algorithms for NP-hard problems

Summary

Greedy algorithms are powerful tools for optimization problems that exhibit the greedy choice property and optimal substructure. Key takeaways:

  • Greedy algorithms make locally optimal choices at each step
  • They're typically faster than dynamic programming (O(n log n) vs O(n²))
  • They require proof that the greedy choice leads to optimal solutions
  • Common patterns include sorting + greedy choice, priority queues, and two pointers
  • Classic problems: Activity Selection, Fractional Knapsack, Huffman Coding, MST
  • Always verify the greedy choice property before using a greedy approach

When in doubt between greedy and DP, consider: if you need to reconsider previous choices, use DP. If making one choice and moving forward works, greedy might be the right approach.
