Dynamic Programming

— Christopher Genovese and Alex Reinhart

In this chapter, we consider a useful algorithmic strategy called dynamic programming that is based on decomposing problems into sub-problems in a particular way.

Note: The term “programming” here is used in the old sense: referring to planning, scheduling, routing, assignment – and the optimization thereof. As with “linear programming,” “quadratic programming,” and “mathematical programming” you can think of it as a synonym for optimization.

(The name was chosen by Richard Bellman in the 1950s essentially because it sounded cool and interesting, and hence the Air Force, which was funding his research, wouldn’t notice that he was actually just doing mathematics research.)

And indeed, dynamic programming is useful for a wide variety combinatorial optimization problems, as we will see. But first, let’s consider some examples.

Illustrative Examples #

Fibonacci Revisited #

We saw earlier a simple recursive algorithm for computing Fibonacci numbers.

   if n == 0: return 0
   if n == 1: return 1
   return fib(n-1) + fib(n-2)

Recall that this does not perform well. Why?

fib(7) -> fib(6) -> fib(5), fib(4) -> fib(4), fib(3), fib(3), ...
          fib(5) -> fib(4), fib(3)

We end up recomputing the same value multiple times.

Two key strategies:

  1. Solve the problem in a particular order (from smaller n to larger n).
  2. Store the solutions of problems we have already solved (memoization).

Together these strategies give us a performant algorithm.

Making Change #

When we talked about divide and conquer algorithms, we solved the problem of how to make change in an optimal way using coins of specified denominations. How did we go about that?

Suppose we have coin denominations 1, 5, 10, and 25 and a purchase leaving 67 cents in change. What do we do?

Shortest Distance #

Consider a (one-way) road network connecting sites in a town, where each path from a site to a connected site has a cost.


What is the lowest-cost path from S to E? How do we find it?

Solution #

Start from E and work backward.

  • The best route from E to E costs 0.
  • The best route from D to E costs 1.
  • The best route from B to E costs 2.
  • The best route from A to E costs 8.
  • The best route from C to E costs 4.
  • The best route from S to E costs 6. [S -> C -> D -> E]

Why does this work? #

The lowest-cost path from a given node to E is a subproblem of the original.

We can arrange these subproblems in an ordering so that we can solve the easiest subproblems first and then solve the harder subproblems in terms of the easier ones.

What ordering?


This is just the topological sort of the DAG that we saw above. This gives us a subproblem dependent order:

S, C, A, B, D, E;

Illustrative Challenge: Making Change #

Dr. Ecco, whom we will call a “consulting statistician,” receives a visit from the director of a mint from a small country. The president of the country (who is a mathematician herself) has asked him to design the country’s currency, but the president is obsessed with minimizing the number of coins that her citizens need (on average) to make a purchase.

The mint has studied the situation and has a good estimate of the distribution of item cost remainders (e.g., costs mod 100) across the country and of the costs to the mint of producing any given size of the coin set. The president believes she knows the proper trade off between number of these latter costs with the efficiency of making change.

The general question 1 is: given this information, can we find coin denominations of each size that minimize the expected number of coins required for a purchase, and in turn balance the costs of each coin set size against the benefits from efficiency? Because costs can be 1 cent (assume the country uses currency units with 100 “cents” per unit), we will consider cases where “pennies” are always included, but the more general case is an interesting extension.

We distinguish two types of counting:

the coins for a given purchase cost can always be obtained by choosing the largest denomination that works at each step. (Ex: 1, 5, 10, 25 is intuitive.)
the set is not intuitive. (Ex: 1, 10, 25.)

A related question is whether non-intuitive sets are ever required given a constrained number of coins.

Exercise: For a given candidate set of possible coins, how do you find the optimal change to make, in the intuitive and non-intuitive cases, for every purchase cost from 1 to 99. You want to minimize the number of coins used to make change.

For example: You have coins 1, 10, and 25. You need 30 cents. Using 10, 10, 10 is better than using 25, 1, 1, 1, 1, 1.1

The more general question seems to require searching over all subsets of the allowed denominations, which would be of exponential complexity in general, though manageable for reasonably small sizes. Can we do better?


  1. What are the individual problems we want to solve?
  2. Can we use the solutions to some problems to help us solve other problems faster?
  3. In what order should we solve the problems?

See documents/Activities/dynamic-programming/change.py for some code to do this.

Related question: What are some boundary test cases to consider with this code?

Overview: Dynamic Programming (DP) #

  1. Decompose a problem into (possibly many) smaller subproblems.

  2. Arrange those subproblems in the order they will be needed.

  3. Compute solutions to the subproblems in order, storing the solution to each subproblem for later use.

  4. The solution to a subproblem combines the solutions to earlier subproblems in a specific way.

Reminder: Topological Sorting DAGS #

A topological sort of a DAG is a linear ordering of the DAG’s nodes such that if \((u,v)\) is a directed edge in the graph, node \(u\) comes before node \(v\) in the ordering.

Example: A directed graph


and a rearrangment showing a topological sort


The sorted nodes are S C A B D E.

For a general DAG, how do we use DFS to do a topological sort?

The Key Ideas Revisited #

  1. Decompose a problem into smaller subproblems.

    Implicitly, each subproblem is a node in a directed graph, and there is a directed edge \((u,v)\) in that graph when the result of one subproblem is required in order to solve the other.

    \((v,u)\) is an edge when the result of solving subproblem \(v\) requires the result of subproblem \(u\).

  2. Arrange those subproblems in the topologically sorted order of the graph.

    A topological sort of the underlying DAG yields an ordering of the subproblems. We will call this a subproblem order or dependent order.

  3. Compute solutions to the subproblems in order, storing the result of each subproblem for later use if needed. This storing approach is called memoization or caching.

    One common scenario is when the subroblems are computed by a single function, and we store our previous solution by memoizing the function. That is, when we call the function, we check if we have called it with these particular arguments before. If so, return the previously computed value. Otherwise, compute the value and store it, marking these arguments as being previously computed.

  4. The solution to a subproblem combines the solutions to earlier subproblems through a specific mathematical relation.

    The mathematical relationship between a subproblem solution and the solution of previous subproblems is often embodied in an equation, or set of equations, called the Bellman equations. We will see examples below.

Question #

For the Fibonacci example we just saw, what are the subproblems? What is the DAG? What does memoizing look like?

Examples #

Example #1: Shortest Path in a Graph #

Above, we saw Dijkstra’s single-source shortest path algorithm; I claim that this is a dynamic programming problem.

We considered a (one-way) road network connecting sites in a town, where each path from a site to a connected site has a cost.


What is the lowest-cost path from S to E? How did we find it?

Formalizing our Solution #

\[ \gdef\dist{\text{dist}} \]

For nodes \(u\) in our graph, let \(\dist(u)\) be the minimal cost of a path from \(u\) to \(E\) (the end node). We want \(\dist(S)\). Finding \(\dist(u)\) is a subproblem.

For subproblem nodes \(u, v\) with an edge \(u \to v\) connecting them, let \(c(u,v) \equiv c(v,u)\) be the cost of that edge.

Here is our algorithm:

  1. Initialize \(\dist(u) = \infty\) for all \(u\).
  2. Set \(\dist(E) = 0\).
  3. Topologically sort the graph, giving us a sequence of nodes from S to E. Call this “subproblem dependent order”.
  4. Work through the sorted nodes in reverse. For each node \(v\), set

    \[ \dist(v) = \min_{u \to v} \left(\dist(u) + c(u,v)\right) \]

These last equations are called the Bellman equations.

Let’s try it. The subproblem dependent order is S, C, A, B, D, E, yielding:

dist(E) =                                 0
dist(D) = dist(E) + 1                   = 1
dist(B) = min(dist(E) + 2, dist(D) + 1) = 2
dist(A) = dist(B) + 6                   = 8
dist(C) = min(dist(A) + 4, dist(D) + 3) = 4
dist(S) = min(dist(A) + 1, dist(C) + 2) = 6

Exercise #

Write a function min_cost_path that returns the minimal cost path to a target node from every other node in a weighted, directed graph, along with the minimal cost. If there is no directed path from a node to the target node, the path should be empty and the cost should be infinite.

Your function should take a representation of the graph and a list of nodes in subproblem order. You can represent the graph anyway you prefer; however, one convenient interface, especially for R users, would be:

min_cost_path(target_node, dag_nodes, costs)

where target_node names the target node, dag_nodes lists all the nodes in dependent order, and costs is a symmetric matrix of edge weights with rows and columns arranged in dependent order. Assume: costs[u,v] = Infinity if there is no edge between u and v.

Note: You can use the above example as a test case.

Example #2: Longest Increasing Subsequence #

Given a sequence s of n ordinals, find the longest subsequence whose elements are strictly increasing.

5, 2, 8, 6, 3, 6, 9, 7  ->   2, 3, 6, 9

Let’s sketch a dynamic-programming solution for this problem. Work with a partner to answer these questions.

  • What are the subproblems?
  • Are they arranged in a DAG? If so, what are the relations?
  • What are the Bellman equations for these subproblems?
  • Sketch the DP algorithm here.
  • We can find the longest length, how do we get the path?
  • How would a straightforward recursion implementation perform? What goes wrong?

Example #3: Matrix Product Ordering #

Suppose we have three matrices \(A\), \(B\), and \(C\). To compute \(ABC\), we have two choices \((AB)C\) or \(A(BC)\). Which is better?

Assuming standard matrix multiplication, multiplying an \(n\times p\) by a \(p \times r\) takes \(O(npr)\) operations.

Example: Suppose \(A\), \(B\), and \(C\) are respectively \(100 \times 20\), \(20 \times 100\) and \(100 \times 20\).

  • \((AB)C\) takes \(100\cdot 20 \cdot 100 + 100 \cdot 100 \cdot 20 = 2\cdot 20\cdot 100^2\)
  • \(A(BC)\) takes \(100\cdot 20 \cdot 20 + 20\cdot100\cdot 20 = 2\cdot 20^2 \cdot 100\)

This is a factor of 5 difference.

Problem: Given matrices \(A_1, \ldots, A_n\) and their dimensions, what is the best way to “parenthesize” them in computing the products?

There are exponentially many choices, so brute force is out.

Subproblems: For each pair \(i \le j\), parenthesize \(A_i \cdots A_j\).

This gives \(n^2\) subproblems. How long does each subproblem take to solve? Look at the Bellman equations.

cost(i,j) = min_{k in i..j} cost(i,k) + cost(k+1,j) + combinationCost(i,j,k)

This is \(O(n)\) for each subproblem, giving \(O(n^3)\) total. (We can improve this.)

Example #4: Hidden Markov Models and Reinforcement Learning #

Hidden Markov Models #

In a Hidden Markov Model, the states of a system follow a discrete-time Markov chain with initial distribution \(\alpha\) and transition probability matrix \(P\), but we do not get to observe those states.

Instead we observe signals/symbols that represent noisy measurements of the state. In state \(s\), we observe symbol \(y\) with probability \(Q_{sy}\). These measurements are assumed conditionally independent of everything else given the state.

This model is denoted HMM\(\langle\alpha,P,Q\rangle\).

Many applications:

  • Natural language processing
  • Bioinformatics
  • Image Analysis
  • Learning Sciences

If we denote the hidden Markov chain by \(X\) and the observed signals/symbols by \(Y\), then there are fast recursive algorithms for finding the marginal distribution of a state given the observed signals/symbols and the parameters.

But usually, we want to reconstruct the sequence of hidden states. That is, we want to find:

\[ \mathop{\rm argmax}\limits_{s_{0..n}} P_\theta\left\{ X_{0..n} = s_{0..n} \mid Y_{0..n} = y_{0..n} \right\} \]

It is sufficient for our purposes to solve a related problem

\[ v_k(s) = \mathop{\rm argmax}\limits_{s_{0..k-1}} P_\theta\left\{ X_{0..k-1} = s_{0..k-1}, X_k = s, Y_{0..k} = y_{0..k} \right\} \]

Then we have

\[ v_k(s) = Q_{s,y_k} \max_r P_{rs} v_{k-1}( r), \]

which gives us a dynamic programming solution.

\[ \max_s v_n(s) = Q_{w_n,y_n} \max_r P_{r,w_n} v_{n-1} ( r) \]

where \(w_n = {\rm argmax}_s v_k(s)\).

This is a DAG of subproblems.

Reinforcement Learning #

Reinforcement Learning builds on a similar representation where we have a Markov chain whose state transitions we can influence through actions. We get rewards and penalties over time depending on what state the chain is in.

We learn by interacting with our environment: making decisions, taking actions, and then (possibly much later) achieving some reward or penalty.

Reinforcement learning is a statistical framework for capturing this process.

A goal-directed learner, or agent, learns how to map situations (i.e., states) to actions so as to maximize some numerical reward.

  • The agent must discover the value of an action by trying it.
  • The agent’s actions may affect not only the immediate reward but also the next state – and through that the downstream rewards as well.

These two features – value discovery and delayed reward – are important features of the reinforcement learning problem.

  • Examples

    • Playing tic-tac-toe against an imperfect player.

    • The agent plays a game of chess against an opponent.

    • An adaptive controller adjusts a petroleum refinery’s operation in real time to optimize some trade-off among yield, cost, and quantity.

    • A gazelle calf struggles to stand after birth and is running in less than an hour.

    • An autonomous, cleaning robot searches for trash among several rooms but must take care to return to a recharging station when power gets low.

    • Phil prepares his breakfast.

    Common themes: interaction, goals, rewards, uncertainty, planning, monitoring

  • Elements of Reinforcement Learning

    • The agent – the entity that is learning

    • The environment – the context in which the agent is operating, often represented by a state space.

    • A policy – determines the agent’s behavior.

      It is a function mapping from states of the environment to actions (or distributions over actions) to take in that state.

    • A reward function – defines the agent’s goal.

      It maps each state or each state and action pair to a single number (called the reward) that measures the desirability of that state or state-and-action.

      The reward at any state (or state-action pair) is immediate. The agent’s overall goal is to maximize the accumulated reward.

    • A value function – indicates the accumulated desirability of each state (or state-action pair).

      Roughly speaking, the value function at a state is the expected accumulated reward an agent can receive starting at that state (by choosing ``good” actions). The value function captures the delayed rewards.

    • An environment model (Optional) – mimics the environment for planning purposes.

  • Markov Decision Processes

    We model reinforcment using a Markov Decision Process.

    This is a discrete-time stochastic process on a state space S where the probabilities of transitions between states are governed by Markov transition probabilities for each action the agent can choose from a set A.

    The goal is to maximize the cumulative reward (possibly discounted) over time (with reward r(X_k) at time k).

    Using the Markov structure of the process allows us to find good (or even optimal) decision policies.

Activity/Example: Edit Distance between Strings #

When you make a spelling mistake, you have usually produced a “word” that is close in some sense to your target word. What does close mean here?

The edit distance between two strings is the minimum number of edits – insertions, deletions, and character substitutions – that converts one string into another.

Example: Snowy vs. Sunny. What is the edit distance?


Three changes transformed one into the other.

Another way to look at this is how the two strings align, where we can mark insertions/deletions in this alignment. Here are two possible alignments of Snowy and Sunny

S _ n o w y                         _ S n o w _ y
S u n n _ y                         S u n _ _ n y

0 1 0 1 1 0   Total cost: 3         1 1 0 1 1 1 0   Total cost: 5

How can we find the edit distance for any two strings, edit(s,t)?

To do that, we need to identify subproblems whose solutions we can combine to solve the larger problem.

To this end, consider another example: EXPONENTIAL vs. POLYNOMIAL.

We will consider prefixes of each string. So consider several prefix pairs:


edit = x       edit = y     edit = z


What is the edit distance of EXPONENTIAL and POLYNOMIAL given x, y, and z?

  • We can add a ‘__’ after EXPONENTIA giving cost 1 + x.
  • We can add a ‘__’ after POLYNOMIA giving cost 1 + y.
  • The L after POLYNOMIA and EXPONENTIA giving cost z.

Thus, the edit distance is: min(1 + x, 1 + y, z).

These prefixes form our subproblems. The result is:

E X P O N E N _ T I A L
_ _ P O L Y N O M I A L
1 1 0 0 1 1 0 1 1 0 0 0

Edit distance: 6

Questions #

  • Formally, what are the subproblems?



  • Are these subproblems arranged in a DAG?

  • How do we combine subproblems? (The Bellman Equations)

Application: Fast file differences #

Programs diff, git-diff, rsync use such algorithms (along with related dynamic programming problem Longest Common Subsequence) to quickly find meaningful ways to describe differences between arbitrary text files.

Application: Genetic Alignment #

Use edit distance logic to find the best alignment between two sequences of genetic bases (A, T, C, G). We allow our alignment to include gaps (’_’) in either or both sequences.

Given two sequences, we can score our alignment by summing a score at each position based on whether the bases match, mismatch, or include a gap.


C _ G _ A A T G C C _ A A A
m g m g m m x m m m g x m m

Score = 3*gap + 2*mismatch + 9*match

With (sub-)sequences, S and T, let S’ and T’ respectively, be the sequences without the last base. There are then three subproblems to solve to align(S,T):

  • align(S,T’)
  • align(S’,T)
  • align(S’,T’)

The score for S and T is the biggest score of:

  • score(align(S,T’)) + gap
  • score(align(S’,T)) + gap
  • score(align(S’,T’)) + match if last characters of S,T match
  • score(align(S’,T’)) + mismatch if last characters do not match

The boundary cases (e.g., zero or one character sequences) are easy to compute directly.

Question: Longest Common Subsequence #

If we want to find the longest common subsequence (LCS) between two strings, how can we adapt the logic underlying this edit distance example to find a dynamic programming solution?

  1. Based on “Small Change for Mujikhistan” from Doctor Ecco’s Cyberpuzzles by Dennis E. Shasha [return]