Greedy Algorithms
Introduction to Greedy Algorithms
Greedy algorithms are a class of algorithms that make locally optimal choices at each step with the hope of finding a global optimum. The term "greedy" comes from the strategy of always choosing the option that looks best at the moment, without considering the broader context or future consequences.
Unlike dynamic programming, which considers all possible subproblems, greedy algorithms make a single choice and never reconsider it. This makes them typically faster and simpler, but they only work correctly when the problem has specific properties that guarantee the greedy choice leads to an optimal solution.
Greedy algorithms are widely used in optimization problems, scheduling, graph algorithms, and compression. They're particularly valuable when you need efficient solutions and can prove that the greedy strategy is optimal.
When to Use Greedy Algorithms
Greedy algorithms are suitable when:
- The problem has the greedy choice property - a globally optimal solution can be reached by making locally optimal choices
- The problem has optimal substructure - an optimal solution contains optimal solutions to subproblems
- You need a fast, efficient solution (often O(n log n) or better)
- The problem can be solved by making a series of choices where each choice is independent
Greedy vs Dynamic Programming
Understanding when to use greedy algorithms versus dynamic programming is crucial:
| Feature | Greedy Algorithms | Dynamic Programming |
|---|---|---|
| Decision Making | Make one choice, never reconsider | Consider all possibilities, store results |
| Time Complexity | Usually O(n log n) or O(n) | Usually O(n²) or higher |
| Space Complexity | Usually O(1) or O(n) | Usually O(n) or O(n²) |
| Optimality | Requires proof of greedy choice property | Guaranteed optimal if problem has optimal substructure |
| Best For | Optimization with greedy choice property | Problems with overlapping subproblems |
Key Properties of Greedy Algorithms
1. Greedy Choice Property
A problem exhibits the greedy choice property if a globally optimal solution can be reached by making a locally optimal (greedy) choice at each step. This means that at each decision point, we can make the choice that looks best right now, and this will lead to an optimal solution.
Example: In the activity selection problem, always choosing the activity that ends earliest (greedy choice) leads to the maximum number of non-overlapping activities (optimal solution).
2. Optimal Substructure
A problem has optimal substructure if an optimal solution to the problem contains optimal solutions to its subproblems. This property is shared with dynamic programming, but greedy algorithms use it differently - they make one choice and solve one subproblem, rather than solving all subproblems.
Example: In the minimum spanning tree problem, removing any edge from an MST splits it into two subtrees, and each subtree is itself an MST of the subgraph induced by its vertices.
3. Proving Greedy Algorithms
To prove a greedy algorithm is correct, you typically need to:
- Prove the greedy choice property: Show that making the greedy choice is always safe (doesn't exclude optimal solutions)
- Prove optimal substructure: Show that the problem can be solved by combining the greedy choice with an optimal solution to the remaining subproblem
- Use an exchange argument: Show that any optimal solution can be transformed to include the greedy choice without making it worse
Activity Selection Problem
The activity selection problem is a classic greedy algorithm problem: given a set of activities with start and finish times, select the maximum number of non-overlapping activities.
Problem Statement
Given n activities with start times s[i] and finish times f[i], where activity i takes place during [s[i], f[i]), find the maximum number of activities that can be scheduled without overlapping.
Greedy Strategy
The greedy choice is to always select the activity that finishes earliest. This leaves the maximum amount of time for remaining activities.
Algorithm
```python
def activity_selection(activities):
    if not activities:
        return []
    # Sort activities by finish time
    activities.sort(key=lambda x: x[1])
    selected = [activities[0]]  # The earliest-finishing activity is always safe
    last_finish = activities[0][1]
    for start, finish in activities[1:]:
        if start >= last_finish:  # No overlap with the last selected activity
            selected.append((start, finish))
            last_finish = finish
    return selected

# Example
activities = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 9), (5, 9)]
result = activity_selection(activities)
# Result: [(1, 4), (5, 7), (8, 9)] - 3 activities
```
Complexity Analysis
- Time Complexity: O(n log n) - dominated by sorting
- Space Complexity: O(n) - for storing selected activities
Why It Works
The greedy choice property holds because if we have an optimal solution that doesn't include the earliest-finishing activity, we can replace the first activity in that solution with the earliest-finishing one without reducing the total count (since it finishes no later, it remains compatible with every other activity in the solution).
Fractional Knapsack Problem
The fractional knapsack problem allows taking fractions of items, making it solvable with a greedy approach. This contrasts with the 0-1 knapsack problem, which requires dynamic programming.
Problem Statement
Given items with weights and values, and a knapsack with capacity W, maximize the total value. You can take fractions of items (unlike 0-1 knapsack where items are indivisible).
Greedy Strategy
Sort items by value-to-weight ratio (value/weight) in descending order. Take items in this order, taking as much as possible of each item until the knapsack is full.
Algorithm
```python
def fractional_knapsack(items, capacity):
    # Sort by value/weight ratio (descending)
    items.sort(key=lambda x: x[1] / x[0], reverse=True)
    total_value = 0
    remaining_capacity = capacity
    for weight, value in items:
        if remaining_capacity >= weight:
            # Take the entire item
            total_value += value
            remaining_capacity -= weight
        else:
            # Take a fraction of the item and stop: the knapsack is full
            fraction = remaining_capacity / weight
            total_value += value * fraction
            break
    return total_value

# Example
items = [(10, 60), (20, 100), (30, 120)]  # (weight, value)
capacity = 50
result = fractional_knapsack(items, capacity)
# Takes all of item 1 (10, 60), all of item 2 (20, 100), and 2/3 of
# item 3 (weight 20 of 30, adding value 80), for a total value of 240
```
Complexity Analysis
- Time Complexity: O(n log n) - sorting dominates
- Space Complexity: O(1) auxiliary - only a few variables (the input list is sorted in place)
Why Greedy Works for Fractional Knapsack
Since we can take fractions, the item with the highest value-to-weight ratio should be prioritized. If we don't take as much as possible of the best ratio item, we can always improve the solution by replacing some of a lower-ratio item with the higher-ratio item.
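To make this concrete, here is the exchange step as a one-line inequality (a sketch; $r_i = v_i / w_i$ denotes the value-to-weight ratio of item $i$). Shifting a small weight $\delta > 0$ from a lower-ratio item $j$ to a higher-ratio item $i$ changes the total value by

$$\Delta V = \delta\,(r_i - r_j) \ge 0 \quad \text{whenever } r_i \ge r_j,$$

so any solution that leaves capacity unused on a higher-ratio item can be exchanged into one at least as good; repeating the exchange yields the greedy solution.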
Huffman Coding
Huffman coding is a lossless data compression algorithm that creates optimal prefix-free codes for characters based on their frequencies. It's used in file compression formats like ZIP and JPEG.
Problem Statement
Given characters and their frequencies, assign binary codes to each character such that:
- No code is a prefix of another (prefix-free property)
- The total encoded length is minimized
Greedy Strategy
Build a binary tree bottom-up by repeatedly merging the two nodes with lowest frequencies. The characters with higher frequencies get shorter codes.
Algorithm
```python
import heapq

class Node:
    def __init__(self, char, freq, left=None, right=None):
        self.char = char
        self.freq = freq
        self.left = left
        self.right = right

    def __lt__(self, other):  # Lets heapq compare nodes by frequency
        return self.freq < other.freq

def build_huffman_tree(frequencies):
    # Create priority queue (min-heap) of leaf nodes
    heap = [Node(char, freq) for char, freq in frequencies.items()]
    heapq.heapify(heap)
    # Build the tree by repeatedly merging the two lowest-frequency nodes
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(None, left.freq + right.freq, left, right)
        heapq.heappush(heap, merged)
    return heap[0]  # Root of the Huffman tree

def build_codes(root, code="", codes=None):
    if codes is None:  # Avoid the shared-mutable-default-argument pitfall
        codes = {}
    if root.char is not None:
        codes[root.char] = code if code else "0"  # Single-character edge case
    else:
        build_codes(root.left, code + "0", codes)
        build_codes(root.right, code + "1", codes)
    return codes

# Example
frequencies = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}
tree = build_huffman_tree(frequencies)
codes = build_codes(tree)
# Result: {'f': '0', 'c': '100', 'd': '101', 'a': '1100',
#          'b': '1101', 'e': '111'}
```
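To see the prefix-free property in action, here is a minimal encode/decode sketch built on the `tree` and `codes` from the example above (the helper names `huffman_encode` and `huffman_decode` are illustrative, not from any library; the decoder assumes at least two distinct characters):

```python
def huffman_encode(text, codes):
    # Concatenate each character's code; prefix-freeness makes this reversible
    return "".join(codes[ch] for ch in text)

def huffman_decode(bits, root):
    # Walk the tree bit by bit; reaching a leaf emits a character
    result, node = [], root
    for bit in bits:
        node = node.left if bit == "0" else node.right
        if node.char is not None:
            result.append(node.char)
            node = root
    return "".join(result)

encoded = huffman_encode("abc", codes)   # '1100' + '1101' + '100'
assert huffman_decode(encoded, tree) == "abc"
```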
Complexity Analysis
- Time Complexity: O(n log n) - building heap and merging nodes
- Space Complexity: O(n) - for the tree and codes
Applications
- File compression (ZIP, GZIP)
- Image compression (JPEG uses Huffman coding)
- Network protocols
- Data transmission optimization
Minimum Spanning Tree (MST)
A minimum spanning tree connects all vertices in a weighted, undirected graph with minimum total edge weight. Two famous greedy algorithms solve this: Kruskal's and Prim's.
Kruskal's Algorithm
Kruskal's algorithm builds the MST by repeatedly adding the smallest edge that doesn't form a cycle. It uses the Union-Find data structure to efficiently detect cycles.
Algorithm
```python
def kruskal_mst(edges, num_vertices):
    # Sort edges by weight
    edges.sort(key=lambda x: x[2])  # Each edge is (u, v, weight)
    mst = []
    parent = list(range(num_vertices))

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])  # Path compression
        return parent[x]

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_x] = root_y
            return True
        return False

    for u, v, weight in edges:
        if union(u, v):  # Edge joins two components, so no cycle
            mst.append((u, v, weight))
            if len(mst) == num_vertices - 1:
                break
    return mst
```
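A quick usage sketch on a small made-up graph (vertices numbered 0-3):

```python
# Edges as (u, v, weight) on 4 vertices
edges = [(0, 1, 10), (0, 2, 6), (0, 3, 5), (1, 3, 15), (2, 3, 4)]
mst = kruskal_mst(edges, 4)
# mst == [(2, 3, 4), (0, 3, 5), (0, 1, 10)], total weight 19
```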
Complexity
- Time Complexity: O(E log E) - sorting edges dominates (equivalently O(E log V), since E ≤ V²)
- Space Complexity: O(V) - for Union-Find structure
Prim's Algorithm
Prim's algorithm grows the MST from a starting vertex, always adding the minimum-weight edge that connects a vertex in the MST to a vertex outside it.
Algorithm
```python
import heapq

def prim_mst(graph, start):
    # graph[u] is a list of (neighbor, weight) pairs (adjacency list)
    n = len(graph)
    mst = []
    visited = [False] * n
    pq = [(0, start, -1)]  # (weight, vertex, parent)
    while pq and len(mst) < n - 1:
        weight, u, parent = heapq.heappop(pq)
        if visited[u]:
            continue
        visited[u] = True
        if parent != -1:
            mst.append((parent, u, weight))
        for v, w in graph[u]:
            if not visited[v]:
                heapq.heappush(pq, (w, v, u))
    return mst
```
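Since `prim_mst` expects an adjacency list, here is a usage sketch on the same made-up graph as the Kruskal example, converted to that format:

```python
# graph[u] = list of (neighbor, weight); undirected, so each edge appears twice
graph = [
    [(1, 10), (2, 6), (3, 5)],   # vertex 0
    [(0, 10), (3, 15)],          # vertex 1
    [(0, 6), (3, 4)],            # vertex 2
    [(0, 5), (1, 15), (2, 4)],   # vertex 3
]
mst = prim_mst(graph, 0)
# mst == [(0, 3, 5), (3, 2, 4), (0, 1, 10)], total weight 19
```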
Complexity
- Time Complexity: O(E log V) with binary heap, O(E + V log V) with Fibonacci heap
- Space Complexity: O(V) - for priority queue and visited array
Kruskal's vs Prim's
| Feature | Kruskal's | Prim's |
|---|---|---|
| Data Structure | Union-Find | Priority Queue |
| Best For | Sparse graphs (E ≈ V) | Dense graphs (E ≈ V²) |
| Time Complexity | O(E log E) | O(E log V) |
| Approach | Edge-based (adds edges) | Vertex-based (grows from vertex) |
For more details on Union-Find, see Graph Algorithms.
Interval Scheduling Problems
Interval scheduling problems are a class of problems where you need to schedule intervals (time periods, tasks, etc.) optimally. Many variations use greedy strategies.
Maximum Non-Overlapping Intervals
This is the same as the activity selection problem - find the maximum number of non-overlapping intervals.
Interval Partitioning (Minimum Rooms)
Given intervals, find the minimum number of resources (rooms, machines) needed so no two overlapping intervals use the same resource.
Algorithm
```python
def min_rooms(intervals):
    # Create events: (time, type) where type 1 = start, -1 = end
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Sort by time, then by type (ends before starts at the same time,
    # so back-to-back intervals don't require an extra room)
    events.sort(key=lambda x: (x[0], x[1]))
    max_rooms = 0
    current_rooms = 0
    for time, event_type in events:
        current_rooms += event_type
        max_rooms = max(max_rooms, current_rooms)
    return max_rooms

# Example
intervals = [(1, 4), (2, 5), (3, 6), (5, 7)]
result = min_rooms(intervals)
# Result: 3 rooms needed (at time 3, three intervals overlap)
```
Complexity
- Time Complexity: O(n log n) - sorting events
- Space Complexity: O(n) - for events array
Coin Change (Greedy Version)
The coin change problem asks: given coins of different denominations and a target amount, find the minimum number of coins needed. A greedy approach works for certain coin systems (like US currency) but not all.
When Greedy Works
Greedy is optimal only for so-called canonical coin systems; US currency (1, 5, 10, 25, 50, 100) is one. Note that there is no simple rule such as "each coin is at least twice the next smaller one": the system [1, 5, 11] satisfies that rule yet greedy fails (for amount 15, greedy gives 11 + 1 + 1 + 1 + 1 = 5 coins, but 5 + 5 + 5 = 3 is optimal), while [1, 2, 3] violates it yet greedy is always optimal. In practice, canonicity can be verified by comparing greedy against dynamic programming over a sufficient range of amounts.
Algorithm
```python
def coin_change_greedy(coins, amount):
    coins.sort(reverse=True)  # Sort denominations descending
    count = 0
    i = 0
    while amount > 0 and i < len(coins):
        if coins[i] <= amount:
            num_coins = amount // coins[i]  # Take as many of this coin as fit
            count += num_coins
            amount -= num_coins * coins[i]
        i += 1
    return count if amount == 0 else -1

# Example (US coins)
coins = [1, 5, 10, 25]
amount = 67
result = coin_change_greedy(coins, amount)
# Result: 6 coins (2 quarters, 1 dime, 1 nickel, 2 pennies)
```
When Greedy Fails
Greedy fails for coin systems like [1, 3, 4] and amount 6:
- Greedy: 4 + 1 + 1 = 3 coins
- Optimal: 3 + 3 = 2 coins
For general coin systems, use dynamic programming (see Dynamic Programming).
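For illustration, a minimal bottom-up DP sketch (one standard formulation, not the only one), run against the greedy function above, shows the gap on the [1, 3, 4] system:

```python
def coin_change_dp(coins, amount):
    # dp[a] = minimum coins needed to make amount a (inf if unreachable)
    INF = float("inf")
    dp = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and dp[a - c] + 1 < dp[a]:
                dp[a] = dp[a - c] + 1
    return dp[amount] if dp[amount] != INF else -1

print(coin_change_greedy([1, 3, 4], 6))  # 3 (4 + 1 + 1)
print(coin_change_dp([1, 3, 4], 6))      # 2 (3 + 3)
```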
Common Greedy Patterns
1. Sorting + Greedy Choice
Many greedy algorithms start by sorting the input, then make greedy choices in order:
- Activity Selection: Sort by finish time
- Fractional Knapsack: Sort by value/weight ratio
- Interval Scheduling: Sort by end time
2. Priority Queue / Heap
Use a priority queue when you need to repeatedly select the best option:
- Huffman Coding: Merge nodes with lowest frequencies
- Prim's MST: Always add minimum-weight edge
- Dijkstra's Algorithm: Process closest unvisited vertex (sketched below)
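Dijkstra's algorithm is covered in Graph Algorithms, but a minimal heap-based sketch illustrates the pattern: the priority queue repeatedly surfaces the greedily best (closest) vertex. The adjacency-list format assumed here matches the Prim's example above:

```python
import heapq

def dijkstra(graph, start):
    # graph[u] = list of (neighbor, weight); returns shortest distances from start
    dist = {start: 0}
    pq = [(0, start)]  # (distance, vertex); the heap surfaces the closest vertex
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # Stale entry; a shorter path to u was already found
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist
```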
3. Two-Pointer Technique
Some greedy problems can be solved with two pointers:
- Meeting room scheduling
- Merging intervals
- Container with most water (sketched below)
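As an illustration of this pattern, here is a minimal sketch of container-with-most-water (LeetCode 11). The greedy insight is that the shorter line limits the area of every container it could form, so moving its pointer inward is always safe:

```python
def max_area(heights):
    left, right = 0, len(heights) - 1
    best = 0
    while left < right:
        # Area is limited by the shorter of the two lines
        best = max(best, (right - left) * min(heights[left], heights[right]))
        # Greedy step: the shorter line can never do better, so move past it
        if heights[left] < heights[right]:
            left += 1
        else:
            right -= 1
    return best

print(max_area([1, 8, 6, 2, 5, 4, 8, 3, 7]))  # 49
```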
Proving Greedy Algorithms Correct
Exchange Argument
The most common proof technique is the exchange argument:
- Assume you have an optimal solution that doesn't use the greedy choice
- Show you can exchange/swap elements to include the greedy choice
- Prove the new solution is at least as good (optimal)
- Conclude the greedy choice is safe
Example: Activity Selection Proof
Claim: Always choosing the activity that ends earliest is optimal.
Proof: Let S be an optimal solution that doesn't include activity a (the earliest-finishing one). Let a' be the first activity in S, ordered by finish time. Since a finishes no later than a', replacing a' with a keeps every other activity in S compatible. The new solution has the same number of activities, so it is still optimal and includes a. Therefore, the greedy choice is safe.
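The same argument in symbols (a sketch): let $a_1$ be the earliest-finishing activity, and let $S = (a'_1, a'_2, \dots, a'_k)$ be an optimal solution ordered by finish time with $a_1 \notin S$. Then

$$f(a_1) \le f(a'_1) \le s(a'_2),$$

so $S' = (a_1, a'_2, \dots, a'_k)$ is also feasible with $|S'| = |S| = k$: an optimal solution containing the greedy choice always exists.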
Mathematical Induction
Another common approach is to prove by induction that after making k greedy choices, there exists an optimal solution that includes those k choices.
Common Pitfalls and Mistakes
1. Assuming Greedy Always Works
Not all problems can be solved greedily. Always verify the greedy choice property and optimal substructure before using a greedy approach.
2. Wrong Greedy Criterion
Choosing the wrong criterion for the greedy choice leads to incorrect solutions. For example:
- Activity Selection: Must sort by finish time, not start time
- Fractional Knapsack: Must use value/weight ratio, not just value
3. Not Handling Edge Cases
Greedy algorithms often have edge cases:
- Empty input
- Single element
- All elements identical
- Boundary conditions
4. Confusing with Dynamic Programming
Remember: Greedy makes one choice and never looks back. DP considers all possibilities and stores results. If you need to reconsider previous choices, use DP, not greedy.
Practice Problems
Easy
- LeetCode 455: Assign Cookies - Match children with cookies greedily
- LeetCode 860: Lemonade Change - Greedy change making
- LeetCode 1005: Maximize Sum Of Array After K Negations - Greedy negation strategy
Medium
- LeetCode 435: Non-overlapping Intervals - Activity selection variant
- LeetCode 452: Minimum Number of Arrows to Burst Balloons - Interval scheduling
- LeetCode 621: Task Scheduler - Greedy task scheduling
- LeetCode 763: Partition Labels - Greedy partitioning
- LeetCode 122: Best Time to Buy and Sell Stock II - Greedy profit maximization
Hard
- LeetCode 135: Candy - Greedy distribution with constraints
- LeetCode 330: Patching Array - Greedy patching strategy
- LeetCode 502: IPO - Greedy project selection
Real-World Applications
Network Design
- Minimum spanning trees for network topology
- Shortest path routing (Dijkstra's algorithm)
- Network flow optimization
Data Compression
- Huffman coding for file compression
- Lempel-Ziv-Welch (LZW) compression
- Image and video compression
Scheduling
- CPU task scheduling
- Meeting room allocation
- Job scheduling in operating systems
- Resource allocation
Optimization
- Portfolio optimization (fractional knapsack)
- Resource allocation problems
- Approximation algorithms for NP-hard problems
Summary
Greedy algorithms are powerful tools for optimization problems that exhibit the greedy choice property and optimal substructure. Key takeaways:
- Greedy algorithms make locally optimal choices at each step
- They're typically faster than dynamic programming (O(n log n) vs O(n²))
- They require proof that the greedy choice leads to optimal solutions
- Common patterns include sorting + greedy choice, priority queues, and two pointers
- Classic problems: Activity Selection, Fractional Knapsack, Huffman Coding, MST
- Always verify the greedy choice property before using a greedy approach
When in doubt between greedy and DP, consider: if you need to reconsider previous choices, use DP. If making one choice and moving forward works, greedy might be the right approach.
Related Topics
Continue exploring related algorithm topics:
- Dynamic Programming - Compare greedy vs DP approaches
- Graph Algorithms - MST algorithms (Kruskal's, Prim's) and shortest paths
- Sorting Algorithms - Many greedy algorithms rely on sorting
- Trees - Huffman coding uses binary trees