Divide and Conquer

— Alex Reinhart and Christopher Genovese

Suppose we want to write some code that can solve a particular problem – calculate some quantity, select data meeting certain requirements, whatever. The code to solve the problem is intimidating, involving big loops over lots of data or combinations or permutations or options.

Suppose, however, that:

  1. The problem would be very easy to solve if there were only one or two data points (or options, or whichever).
  2. If you had the solution to two or more problems, it would be very easy to combine them to make the solution of a bigger problem.

When a problem has these features, we can divide and conquer to solve it.

Under this strategy we solve a problem recursively. It has three main steps:

  1. Divide the problem into smaller subproblems of the same type.
  2. Conquer the subproblems:
    • Small enough subproblems can be solved directly. This is called the base case.
    • Otherwise, solve the subproblem recursively – using the original algorithm. This is called a recursive case.
  3. Combine the subproblem solutions into a solution for the original problem.

Introductory Example #

Problem: Given a monetary value amount in cents and a set of coin denominations coins, also in cents, generate all the distinct ways to make change for the former in terms of the latter.

Write this as a function change_for (or change-for or changeFor depending on your language conventions). Also write a function best_change_for, with the same arguments, that gives the way to make change with the smallest number of coins required.

The function change_for should return a list of vectors, where each vector contains the coins used to make change. The returned list should contain no repeats (accounting for order). The function best_change_for should return any single way of making change (a vector of denominations) of minimal size.

For example:

change_for(17, [1, 5, 10, 25])

# should return these results:
[[10, 5, 1, 1],
 [5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [10, 1, 1, 1, 1, 1, 1, 1],
 [5, 5, 1, 1, 1, 1, 1, 1, 1],
 [5, 5, 5, 1, 1]]

Discussion Questions #

  • What are the subproblems here?
  • What subproblems are in the base case?
  • What is the recursive structure of this problem?
  • How are the subproblem results combined?
  • How can you ensure that there are no repeats in the generated list?
  • What is the interface for the function change-for?

Constructing Tests #

Generate some test ideas with a partner!

Pseudo-Code #

With a partner, sketch out the pseudocode for a recursive function that solves the problem.

Example: Mergesort #

Problem: Given an array of objects \(a_1, \dots, a_n\) from an ordered set (e.g., numbers), find a permutation of the array such that \(a_1 \leq a_2 \leq \dots \leq a_n\).

One divide-and-conquer approach to this problem is mergesort:

  • Divide the n-element array into two arrays of n/2 elements each.
  • Conquer by sorting the two subarrays recursively using mergesort.
  • Combine the sorted subarrays by merging into a single sorted array.

The base case here is when the subarrays have length 1.

Utility function merge #

Define a procedure merge(a, lo, mid, hi) that takes an array a and indices lo <= mid < hi into a where subarrays a[lo..mid] and a[mid+1..hi] are assumed in sorted order and merges the two subarrays in place to sort a[lo..hi].

How does the computation time of merge vary with n = hi - lo + 1?

On to mergeSort #

Assume you have merge. Write mergeSort(a) that sorts a using divide-and conquer.

Back to merge #

How would you implement merge? (Hint: Think about two piles of cards.)

Example: Multiplication #

Multiplying two complex numbers seems to require four multiplications: \[ (a + i b) (c + i d) = a c - bd + i (ad + bc), \] but we can do it in three \[ (a + i b) (c + i d) = a c - bd + i [(a + b)(c + d) - a c - b d]. \] This reduction of cost by ¾ might not be a big deal for a single multiplication, but when we apply this strategy recursively the ¾ gain is applied at each stage – to significant effect.

Now consider multipying two n-bit integers x and y, where n is a power of 2. Thinking in terms of their binary representations, we can write \[ \begin{aligned} x &= 2^{n/2} x_L + x_R \cr y &= 2^{n/2} y_L + y_R,\cr \end{aligned} \] where \(x_L\), \(x_R\), etc. are n/2-bit integers. For example, if \(x = 10110110_2\) then \(x_L = 1011_2\) and \(x_R = 0110_2\).

We can apply the multiplication trick above to reduce four multiplications to three: \[ x y = 2^n x_L y_L + x_R y_R + 2^{n/2} [(x_L + x_R)(y_L + y_R) - x_L y_L - x_R y_R]. \] The additions and multiplication by powers of 2 (shifts) are fast. The arbitrary multiplications are the costly operation, but we have reduced the problem to three multiplications of half the size. We can apply the same trick recursively to each of these three multiplications.

fast_multiply(x,y):
   n = max(bitwidth(x), bitwidth(y))
   if n <= NATIVE_BITS:
       return x*y

   x_L, x_R = leftmost ceiling(n/2), rightmost floor(n/2) bits of x
   y_L, y_R = leftmost ceiling(n/2), rightmost floor(n/2) bits of y

   a = fast_multiply(x_L, y_L)
   b = fast_multiply(x_R, y_R)
   c = fast_multiply(x_L + x_R, y_L + y_R)
   return (a << n) + b + (c - a - b) << n/2

Practical note #

On real machines, it would not make sense to recurse down to the single bit level. Rather, the base case would be whatever size supports fast native multiplications on the given machine (e.g., 16-, 32-, or 64-bit).

Improved computational cost #

We can estimate the number of operations \(T_n\) required for this recursive algorithm on an n-bit number: \[ T_n \le 3 T_{n/2} + O(n), \] which gives us \(T_n = O(n^{\log_2 3}) \approx O(n^{1.59})\). This is notably better than the \(O(n^2)\) performance we get if we use four multiplications.

Extending to matrix multiplication #

The same idea applies to matrix multiplication (called Strassen’s algorithm). The standard blockwise matrix multiplication gives eight multiplications of submatrices: \[ \begin{bmatrix} A & B \cr C & D \end{bmatrix} \times \begin{bmatrix} E & F \cr G & H \end{bmatrix} = \begin{bmatrix} AE + BG & AF + BH \cr CE + DG & CF + DH \end{bmatrix} \] We can reduce this to seven multiplications with a bit of rearrangement. \[ \begin{bmatrix} A & B \cr C & D \end{bmatrix} \times \begin{bmatrix} E & F \cr G & H \end{bmatrix} = \begin{bmatrix} M_5 + M_4 - M_2 + M_6 & M_1 + M_2 \cr M_3 + M_4 & M_1 + M_5 - M_3 - M_7 \end{bmatrix} \] where

\[\begin{aligned} M_1 &= A (F - H) & & M_5 = (A + D)(E + H) \cr M_2 &= (A + B) H & & M_6 = (B - D)(G + H) \cr M_3 &= (C + D) E & & M_7 = (A - C)(E + F) \cr M_4 &= D (G - E). \end{aligned} \]

Now recurse. This yields \(O(n^{\log_2 7})\) versus \(O(n^3)\).

Exercise: Selecting the Median #

Suppose you have a numeric vector with a very large dimension \(n\) (in the billions, say, or even more). You want to find the median of these values.

What’s the most direct way to do this? What are some issues we might face with that approach?

Let’s devise a divide-and-conquer method for this problem. To do this, we need to (at least):

  1. Identify appropriate subproblems
  2. Decide which subproblems to solve recursively and which to solve directly.
  3. Determine how to combine subproblem solutions to get solutions to larger problems.

Number 1 is the tricky part here. First, remember that the recursive structure means that we will apply the same algorithm to the smaller problems. Second, suppose you had a median value \(v\) already. What properties does it satisfy? What subproblems does this suggest?

Ideas?

We will develop pseudo-code for a function median(X) that computes a median of the vector X by divide-and-conquer.

How many “steps” does this algorithm take? How does this performance depend on our choice at each stage?

Example: Selecting General Order Statistics #

Given a list X, we want to pick the k-th smallest element in X. We could sort them and pick it, but we would like a linear-time algorithm.

\[ \gdef\size{\operatorname{size}} \gdef\selection{\operatorname{selection}} \]

selection(X,k):

  1. Pick a random “pivot” \(v\) from the list.
  2. Partition \(X\) into \(X_L\), \(X_v\), \(X_R\), values less than, equal to, and greater than \(v\).
  3. Let \(k’ = k - \size(X_L) - \size(X_v)\).
  4. Then return \[\selection(X,k) = \begin{cases} \selection(X_L, k) & \text{if } k \le \size(X_L) \cr v & \text{if } \size(X_L) < k \le \size(X_L) + \size(X_v)\cr \selection(X_R, k’) & \text{if } k > \size(X_L) + \size(X_v). \end{cases} \]

The computation of the three sublists can be done in linear time (and in place, without copying the lists). At each stage, we reduce the size of the list from \(\size(X)\) to \(\max(\size(X_L), \size(X_R))\).

If we pick \(v\) to split \(X\) in half, we get a linear time algorithm.

But that would involve picking the median, which is one of the order statistics we might be trying to compute. Ooops. Instead, to choose \(v\) at each stage, take a (small) random sample from the list and pick the median of the sample. With high probability, that choice of pivot will be a good one. This tends to work in practice.

Example: Fast Fourier Transform (FFT) #

The FFT is a critically important algorithm for computing a critically important mathematical transform.

The FFT is based on a divide-and-conquer algorithm for fast polynomial multiplication, and it has other recursive representations as well.

If \[\begin{aligned} A(x) &= a_0 + a_1 x + \cdots + a_d x^d \cr B(x) &= b_0 + b_1 x + \cdots + b_d x^d, \end{aligned} \] then \[ C(x) = A(x) B(x) = c_0 + c_1 x + \cdots + c_{2d} x^{2d}, \] where \(c_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0\). The coefficients of C are the convolutions of the coefficients of A and B.

Key ideas #

  • A d-degree polynomial is determined by its value at \(d+1\) distinct points.
  • This gives us two equivalent but distinct representations of a polynomial: the coefficients and the values at selected points.
  • It is slow to multiply polynomials in the coefficient representation, but fast to multiply them in the value representation.
  • Divide and conquer gives us a fast, recursive way to evaluate polynomials if we use special points called the roots of unity
  • Divide and conquer gives us a fast way to move between coefficient and value representation, almost for free.

Multiplying in the value representation #

  1. Pick \(n \ge 2d + 1\) distinct points \(x_1, \ldots, x_n\)
  2. Given polynomials (value representation) \(A(x_1), \ldots, A(x_n)\) and \(B(x_1), \ldots, B(x_n)\), form products \[ C(x_1), \ldots, C(x_n) = A(x_1)B(x_1), \ldots, A(x_n)B(x_n) \]
  3. Recover coefficient representation of C

Divide and conquer #

We can split A (and similarly B) as follows \[ A(x) = A_{\rm even}(x^2) + x A_{\rm odd}(x^2). \] Then to compute any pair of values \(\pm x\), we can share calculations – giving a speed up. \[\begin{aligned} A(x) &= A_\text{even}(x^2) + x A_\text{odd}(x^2) \cr A(-x) &= A_\text{even}(x^2) - x A_\text{odd}(x^2), \end{aligned}\] requiring two evaluations of a smaller polynomial to compute both two evaluations of A.

Using pairs \(\pm x_0, \ldots, \pm x_{n/2-1}\) seems to work fine at the first level. But the next set of points is \(x_0^2, \ldots, x_{n/2-1}^2\). How can those be positive-negative pairs?

The answer: use well chosen complex numbers – n-th roots of unity!

If \(\omega^n = 1\), \(\omega\) is an \(n\)th root of unity. The value \(\omega = e^{2\pi i/n}\) is called the principal \(n\)th root of unity because all the others are derived from it. The \(n\)th roots of unity satisfy two properties that are important for our purposes:

  • They come in positive-negative pairs: \(\omega^{n/2 + j} = -\omega^j\).
  • Squaring them produces (n/2)-nd roots of unity: \((\omega^2)^{n/2} = 1\).

So if we start with the right roots of unity, our divide and conquer recursion keeps its special properties throughout.

fft(A, w):
    A is a polynomial in the coefficient representation
    w is an nth root of unity

Do:

  1. If w == 1, return A(1)
  2. Write \(A(x) = A_{\rm even}(x^2) + x A_{\rm odd}(x^2)\)
  3. Call \(\texttt{fft}(A_{\rm even}, w^2)\) to evaluate A_even at even powers of w
  4. Call \(\texttt{fft}(A_{\rm odd}, w^2)\) to evaluate A_odd at even powers of w
  5. for j in 0..n-1: \(A(w^j) = A_{\rm even}(w^2j) + w^j A_{\rm odd}(w^2j)\)

Return \(A(w^0)\), …, \(A(w^{n-1})\)

Interpolation #

As stated above, the divide and conquer gives us a fast, recursive method to multiply value representations. But amazingly, it gives us a fast way to convert to the coefficient representation as well.

It turns out that when

values = fft(coefficients, w)

we also have

coefficients = fft(values, w^{-1})/n.

Polynomial multiplication thus looks like:

  1. Apply fft to convert from coefficient to value representation
  2. Multiply value representations
  3. Apply (inverse) fft to convert back to coefficient representation

Generalizing #

It is useful to express this in matrix-vector terms. For instance, a polynomial A can be written as \[ \begin{bmatrix} A(x_0) \cr A(x_1) \cr \vdots \cr A(x_{n-1}) \end{bmatrix} = \begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^{n-1} \cr 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \cr & & \vdots & & \cr 1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1} \cr \end{bmatrix} \begin{bmatrix} a_0 \cr a_1 \cr \vdots \cr a_{n-1} \end{bmatrix} \] with the matrix – call it \(M(x)\) – known as the Vandermonde matrix.

As long as \(x_0,\ldots,x_{n-1}\) are distinct, \(M(x)\) is invertible.

Evaluation is multiplication by \(M\); interpolation is multiplication by \(M^{-1}\).

The FFT is just multiplication by \(M(\omega)\) for an nth root of unity. The inverse FFT is obtained from the fact that \(M^{-1}(\omega) = M(\omega^{-1})/n\).

Our general FFT uses divide-and-conquer to make this multiplication fast, similar to what we did above with the polynomials:

fft(a, w):

     Input:  a = (a_0, ..., a_{n-1}) is an array where n is a power of 2
             w is a primitive nth root of unity
     Output: M_n(w) a

if w == 1:
    return a
(s_0, ..., s_{n/2 - 1}) = fft((a0, a_2, ..., a_{n-2}), w^2)
(u_0, ..., u_{n/2 - 1}) = fft((a1, a_3, ..., a_{n-1}), w^2)

for j from 0 to n/2 - 1:
    r_j = s_j + w^j u_j
    r_{j + n/2} = s_j - w^j u_j

return (r_0, ..., r_{n-1})