In solving optimization problems, we make choices at each of a
sequence of steps. If the number or complexity of the choices is high,
finding an optimal solution can be hard, perhaps infeasible. But if we
can reduce the number of choices to a few – or even *one* – things
become considerably easier.

A **greedy algorithm** is one in which we make the choice that *looks best
at each particular moment*.

This is a local strategy in the sense that we make our choice based on the information we have at the moment without concern or consideration for its downstream implications.

A greedy algorithm works when this sequence of *local* choices leads to
a *global* optimum.

**Question**: Consider stepwise regression – a greedy algorithm for
variable selection in linear regression.

We start with only an intercept and successively add variables. At each stage, we add the variable with the largest “F-to-enter” (the squared regression t-statistic that variable would have if added to the current set), or stop if all the F-to-enters are too small.

Does this greedy algorithm give an optimal solution?
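To make the greedy step concrete, here is a minimal sketch of one forward-selection step with numpy. The function names and the threshold value are illustrative assumptions, not from any particular package:

```python
import numpy as np

def f_to_enter(X, y, model, j):
    """Squared t-statistic of variable j if it were added to the current model.
    Fits OLS on an intercept, column j, and the columns already in `model`."""
    n = len(y)
    cols = [j] + list(model)
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])       # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)           # covariance of beta-hat
    t = beta[1] / np.sqrt(cov[1, 1])                # beta[1] is column j's coefficient
    return t * t

def forward_step(X, y, model, threshold=4.0):
    """One greedy step: return the candidate with the largest F-to-enter,
    or None if no candidate exceeds the (illustrative) threshold."""
    candidates = [j for j in range(X.shape[1]) if j not in model]
    if not candidates:
        return None
    best = max(candidates, key=lambda j: f_to_enter(X, y, model, j))
    return best if f_to_enter(X, y, model, best) > threshold else None
```

Running `forward_step` repeatedly, adding each returned variable to `model`, gives the full stepwise procedure.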

As we go, we will see important problems where greedy solutions are optimal, including Minimum Spanning Tree and Orthogonal Matching Pursuit. For now, we look at two examples.

## Example: Resource Scheduling #

We have a single resource (e.g., a car, a seminar room, a 3-D printer) and \(n\) individuals who want to use it. Assume that user \(1 \le i \le n\) needs to start using the resource at time \(s_i\) and will finish using it at time \(f_i\), with \(s_i < f_i\).

Two users \(i\) and \(j\) are *compatible* if \(s_i \ge f_j\) or \(s_j \ge f_i\),
that is, if their intervals of use do not overlap.
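The compatibility test is a one-liner (a sketch; the function name is ours):

```python
def compatible(s_i, f_i, s_j, f_j):
    """True when the usage intervals [s_i, f_i) and [s_j, f_j) do not overlap."""
    return s_i >= f_j or s_j >= f_i
```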

Goal: select the *maximum-sized* subset of mutually compatible users.

Assume, for convenience, that users \(1, \ldots, n\) are labeled in
order of *increasing finish time* (i.e., \(f_1 \le f_2 \le \cdots \le f_n\)).

### A Subproblem Decomposition #

Let \(S_{ij}\) be the set of users who start after \(i\) finishes and finish before \(j\) starts. Subproblem: find a maximum-size set \(A_{ij}\) of mutually compatible users in \(S_{ij}\).

Suppose \(k \in A_{ij}\). Then \(A_{ik} = A_{ij} \cap S_{ik}\) and \(A_{kj} = A_{ij} \cap S_{kj}\). So, \[ A_{ij} = A_{ik} \cup \{k\} \cup A_{kj}. \] The original problem is of this form (with fictitious users 0 and \(n+1\) with suitable start and finish times).

We can thus express an optimal solution to our problem in terms of
subproblems, though we have to search for subproblem solutions.
(This looks like divide and conquer, but it can actually be solved
by the technique of *dynamic programming*, which we will cover next week.)

But what if we are greedy…

### A Greedy Decomposition #

Intuitively, what would a greedy choice be?

```
At each stage, choose the compatible user with the
earliest finishing time!
```

This leaves us only **one subproblem** at each choice.
And it turns out – this can be proved mathematically –
that the greedy solution is optimal.

```
schedule(n, s, f, k):
    m <- k + 1
    while m <= n and s[m] < f[k]:
        m <- m + 1
    if m <= n:
        return {m} union schedule(n, s, f, m)
    else:
        return emptyset()
```

Question: Can we transform this into an *iterative* algorithm?

```
Start with k = 1, A = {1}
for m = 2 to n:
    if s[m] >= f[k]:
        A = A union {m}
        k = m
return A
```
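The same loop in runnable Python, under the same indexing assumption (users sorted by finish time, with a dummy entry at index 0):

```python
def schedule_iter(n, s, f):
    """Iterative greedy scheduler. Users 1..n are sorted by finish time;
    s and f are 1-indexed via a dummy entry at index 0."""
    A, k = {1}, 1              # user 1 has the earliest finish time of all
    for m in range(2, n + 1):
        if s[m] >= f[k]:       # m starts after the last selected user finishes
            A.add(m)
            k = m
    return A
```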

## Example: Huffman Coding and Data Compression #

We have a stream of tokens drawn from some distribution. We want to encode that stream to compress the data. Specifically, we assign a binary codeword (a string of 0’s and 1’s) to each token such that more frequent tokens have shorter codewords.

We want to come as close as possible to the information bound on data compression for this source.

Note: We will consider only **prefix codes**, in which no codeword is a
prefix of any other codeword. This makes decoding efficient and (it
turns out) costs us nothing in effectiveness.
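To see why the prefix property makes decoding efficient, here is a sketch of a greedy decoder (the function name is ours): because no codeword is a prefix of another, the first codeword matched at each position is the only possibility.

```python
def decode(bits, codewords):
    """Decode a bit string under a prefix code by greedy matching.
    `codewords` maps token -> binary string."""
    inverse = {w: t for t, w in codewords.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:     # a complete codeword; no other can extend it
            out.append(inverse[cur])
            cur = ""
    return out
```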

We will represent binary prefix codes by binary trees, with left branches corresponding to 0 and right branches corresponding to 1, and with tokens at the leaves.

The problem of designing an optimal code for distributions with specified frequencies has the greedy-choice property.

We start with a forest containing a singleton tree for each possible token value (with its associated frequency).

In python-ish:

```
def huffman(tokens, freqs):
    q = PriorityQueue(tokens, freqs)   # min-priority queue keyed on frequency
    n = len(tokens)
    for iteration in range(1, n):      # n - 1 merges leave a single tree
        min1 = q.pop_min()
        min2 = q.pop_min()
        q.insert({'left': min1, 'right': min2,
                  'freq': min1['freq'] + min2['freq']})
    return q.pop_min()
```

This produces a binary tree with each token represented by a *leaf node*.
We read off a token’s codeword by writing a 0 bit for each left branch
and a 1 bit for each right branch on the path from the root down to
that leaf. The resulting code can be proved to be optimal.
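The python-ish sketch above can be made runnable with the standard library’s `heapq` as the priority queue. A sketch under an assumed node layout (leaves are tokens, internal nodes are `(left, right)` pairs):

```python
import heapq
from itertools import count

def huffman(freqs):
    """Build a Huffman tree from {token: frequency}; return {token: codeword}."""
    tick = count()  # tie-breaker so heapq never compares tree nodes directly
    heap = [(f, next(tick), tok) for tok, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest-frequency trees...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))  # ...merged
    _, _, root = heap[0]
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse
            walk(node[0], prefix + "0")      # left branch = 0
            walk(node[1], prefix + "1")      # right branch = 1
        else:
            codes[node] = prefix or "0"      # single-token edge case
    walk(root, "")
    return codes
```

On the frequency table below, this assigns a 1-bit codeword to `a`, 3-bit codewords to `b`, `c`, and `d`, and 4-bit codewords to `e` and `f`.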

Here’s how it might go with the following token frequencies:

| token | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| freq | 0.45 | 0.13 | 0.12 | 0.16 | 0.09 | 0.05 |

```
f e c b d a
c b [f e] d a
[f e] d [c b] a
[c b] [[f e] d] a
[[c b] [[f e] d]] a
[[[c b] [[f e] d]] a]
```