Greedy Algorithms

— Christopher Genovese and Alex Reinhart

In solving optimization problems, we make choices at each of a sequence of steps. If the number or complexity of the choices is high, finding an optimal solution can be hard, perhaps infeasible. But if we can reduce the number of choices to few – or even one – things become considerably easier.

A greedy algorithm is one in which we make the choice that looks best at each particular moment.

This is a local strategy in the sense that we make our choice based on the information we have at the moment, without consideration of its downstream implications.

A greedy algorithm works when this sequence of local choices leads to a global optimum.

Question: Consider stepwise regression – a greedy algorithm for variable selection in linear regression.

We start with only an intercept and successively add variables. At each stage, we choose to add the variable with the largest “F-to-enter” (the squared regression t-statistic for that variable if it is added to the current set), or none if all the F-to-enter values are too small.

Does this greedy algorithm give an optimal solution?
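(Think about it before reading on.) As a concrete sketch of the greedy step, here is forward stepwise selection in plain NumPy. The entry threshold `f_min` and the exact degrees-of-freedom bookkeeping are assumptions of this sketch; real implementations differ in details:

```python
import numpy as np

def forward_stepwise(X, y, f_min=4.0):
    """Greedy variable selection: at each stage, add the candidate with
    the largest F-to-enter; stop when none exceeds f_min."""
    n, p = X.shape
    selected = []

    def rss(cols):
        # residual sum of squares for OLS on intercept + chosen columns
        A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return r @ r

    current = rss(selected)
    while len(selected) < p:
        best_j, best_f = None, f_min
        for j in range(p):
            if j in selected:
                continue
            candidate = rss(selected + [j])
            df = n - len(selected) - 2   # residual df with the candidate in
            f_stat = (current - candidate) / (candidate / df)
            if f_stat > best_f:
                best_j, best_f = j, f_stat
        if best_j is None:
            break                        # no F-to-enter clears the threshold
        selected.append(best_j)
        current = rss(selected)
    return selected
```

Note that each stage conditions only on the variables already chosen, which is exactly the local character of a greedy method.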

We will see important problems as we go where greedy solutions are optimal, including Minimum Spanning Tree and Orthogonal Matching Pursuit. For now, we look at two.

Example: Resource Scheduling

We have a single resource (e.g., a car, a seminar room, a 3-D printer) and \(n\) individuals who want to use it. Assume that user \(1 \le i \le n\) needs to start using the resource at time \(s_i\) and will finish using it at time \(f_i\), with \(s_i < f_i\).

Two users \(i\) and \(j\) are compatible if \(s_i \ge f_j\) or \(s_j \ge f_i\), that is, if their intervals of use do not overlap.

Goal: select the maximum-sized subset of mutually compatible users.

Assume, for convenience, that users \(1, \ldots, n\) are labeled in order of increasing finish time (i.e., \(f_1 \le f_2 \le \cdots \le f_n\)).

A Subproblem Decomposition

Let \(S_{ij}\) be the set of users who start after \(i\) finishes and finish before \(j\) starts. Subproblem: Find a maximum-size set \(A_{ij}\) of mutually compatible users in \(S_{ij}\).

Suppose \(k \in A_{ij}\). Then \(A_{ik} = A_{ij} \cap S_{ik}\) and \(A_{kj} = A_{ij} \cap S_{kj}\). So, \[ A_{ij} = A_{ik} \cup \{k\} \cup A_{kj}. \] The original problem is of this form (with fictitious users 0 and \(n+1\) with suitable start and finish times).

We can thus express an optimal solution to our problem in terms of subproblems, though we have to search for subproblem solutions. (This looks like divide and conquer, but it can actually be solved by the technique of dynamic programming, which we will cover next week.)

But what if we are greedy…

A Greedy Decomposition

Intuitively, what would a greedy choice be?

At each stage, choose the compatible user with the earliest finishing time!

This leaves us only one subproblem at each choice. And it turns out – this can be proved mathematically – that the greedy solution is optimal.

schedule(n, s, f, k):
  m <- k + 1
  while m <= n and s[m] < f[k]:
    m <- m + 1

  if m <= n:
    return {m} union schedule(n, s, f, m)
  else:
    return emptyset()

Question: Can we transform this into an iterative algorithm?

Start with k = 1, A = {1}
for m = 2 to n:
  if s[m] >= f[k]:
    A = A union {m}
    k = m
return A
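Yes. Here is the iterative version as runnable Python; in this sketch the intervals arrive unsorted, so we sort by finish time first rather than assuming the labeling above:

```python
def schedule(intervals):
    """Greedy interval scheduling: repeatedly pick the compatible user
    with the earliest finish time. `intervals` is a list of (start, finish)
    pairs; returns indices of a maximum mutually compatible subset."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][1])
    chosen, last_finish = [], float('-inf')
    for i in order:
        start, finish = intervals[i]
        if start >= last_finish:       # compatible with everything chosen
            chosen.append(i)
            last_finish = finish
    return chosen
```

The sort dominates the running time, so the whole algorithm is \(O(n \log n)\).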

Example: Huffman Coding and Data Compression

We have a stream of tokens drawn from some distribution. We want to encode that stream to compress the data. Specifically, we assign a binary codeword (a string of 0’s and 1’s) to each token such that more frequent tokens have shorter codewords.

We want to come as close as possible to the information bound on data compression for this source.

Note: We will consider only prefix codes, in which no codeword is a prefix of any other codeword. This makes decoding efficient and (it turns out) costs us nothing in effectiveness.
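To see why prefix codes make decoding efficient, here is a minimal decoding sketch (assuming a `codes` dict mapping tokens to bit strings): scan the bits left to right, and the first codeword matched is unambiguous precisely because no codeword is a prefix of another.

```python
def decode(bits, codes):
    """Decode a bit string under a prefix code given as {token: codeword}."""
    inverse = {code: token for token, code in codes.items()}
    out, buffer = [], ''
    for bit in bits:
        buffer += bit
        if buffer in inverse:          # prefix-freeness: the match is final
            out.append(inverse[buffer])
            buffer = ''
    return out
```

Each bit is examined once, so decoding is linear in the length of the stream, with no backtracking.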

We will represent binary prefix codes by binary trees, with left branches corresponding to 0 and right branches corresponding to 1, and with tokens at the leaves.

The problem of designing an optimal code for distributions with specified frequencies has the greedy-choice property.

We start with a forest containing a singleton tree for each possible token value (with its associated frequency), and greedily merge the two lowest-frequency trees until one tree remains.

In Python, using the standard library’s heapq as the priority queue (a counter breaks frequency ties so the heap never has to compare the tree dictionaries):

import heapq
from itertools import count

def huffman(tokens, freqs):
    tie = count()
    q = [(f, next(tie), token) for f, token in zip(freqs, tokens)]
    heapq.heapify(q)
    for _ in range(len(tokens) - 1):
        freq1, _, min1 = heapq.heappop(q)
        freq2, _, min2 = heapq.heappop(q)
        merged = {'left': min1, 'right': min2, 'freq': freq1 + freq2}
        heapq.heappush(q, (freq1 + freq2, next(tie), merged))
    return heapq.heappop(q)[2]

This produces a binary tree with each token represented by a leaf node. We read off the codeword by assigning a 1 bit for every right child we visit and a 0 bit for every left child we visit. This gives us a binary string that we use for the code. It can be proved to be optimal.
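Reading off the codewords is itself a short recursion; this sketch assumes the tree shape built above (internal nodes are dicts with 'left'/'right' children, tokens at the leaves):

```python
def codewords(tree, prefix=''):
    """Assign a 0 bit for each left branch and a 1 bit for each right branch."""
    if not isinstance(tree, dict):        # leaf: a token
        return {tree: prefix or '0'}      # a lone token still needs one bit
    codes = {}
    codes.update(codewords(tree['left'], prefix + '0'))
    codes.update(codewords(tree['right'], prefix + '1'))
    return codes
```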

Here’s how it might go with the following token frequencies:

| token | a    | b    | c    | d    | e    | f    |
|-------|------|------|------|------|------|------|
| freq  | 0.45 | 0.13 | 0.12 | 0.16 | 0.09 | 0.05 |
f e c b d a
c b [f e] d a
[f e] d [c b] a
[c b] [[f e] d] a
[[c b] [[f e] d]] a
[[[c b] [[f e] d]] a]

Reading off the final tree (with the lower-frequency tree popped first, hence on the left) gives the codewords a = 0, c = 100, b = 101, d = 111, f = 1100, e = 1101, for an expected length of 2.24 bits per token.