Functional Programming

— Christopher Genovese and Alex Reinhart

Plan for Today #

  • Core Concepts of Functional Programming
  • Higher-Order Functions
  • Activity

Paradigm Shift #

A programming paradigm is a framework of organizing concepts and abstractions that guide how we design and build software and how we think about solving problems.

Two familiar and commonly used paradigms are:

object-oriented programming
the class is the fundamental unit of abstraction: data and code are wrapped together, encapsulated, in objects that represent the nouns of our system.
imperative programming
a program is a sequence of commands and actions for the computer to perform that successively update program state. An imperative program describes how to perform a task.

These two paradigms often overlap in practice and are so familiar that alternatives can be hard to see.

Today, we will start a dive into related paradigms – functional programming, data-oriented programming, and declarative programming – that offer powerful benefits to our thinking and practices.

First Class Entities #

An entity is first class when it can be:

  • created at run-time,
  • passed as a parameter to a function,
  • returned from a function, or
  • assigned into a variable.

A first-class entity typically has built-in support within a language and its runtime environment.

For instance, (fixed-width) integers are first-class entities in all common languages.

  • Created: 101
  • Passed: max(11, -4)
  • Returned: return 7
  • Assigned: i = 17

Similarly with floating-point numbers, booleans, strings, and arrays. Many languages have more specialized first-class entities:

  • Arbitrary precision numbers (Clojure, Mathematica, Java)
  • Hash Tables/Associative Arrays/Tables (Python, Clojure, Perl, Ruby, Lua)
  • Raw Memory Locations (C, C++)
  • Classes/Objects (Python, Ruby, C++, Java, \(\ldots\))
  • Types (Idris)
  • Unevaluated Code (Common Lisp, Racket, Scheme, Clojure, Rust, R)
  • Functions (Python, R, Haskell, Clojure, OCaml, Rust, JavaScript, \(\ldots\))

The first-class entities in a language shape its central idioms and abstractions.
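For example, functions in Python meet all four criteria above (a small sketch; the names are illustrative):

def make_adder(n):
    def add(x):                  # created at run-time
        return x + n
    return add                   # returned from a function

plus2 = make_adder(2)            # assigned to a variable
list(map(plus2, [1, 2, 3]))      # passed as a parameter  #=> [3, 4, 5]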

Core Concepts of Functional Programming #

Composability is a fundamental aspiration in software design. We want the parts of our system to interoperate and combine as freely as possible. This offers power, expressiveness, reusability, and many other benefits.

In functional programming (FP), functions are the fundamental unit of abstraction providing simple, easily composed building blocks for solving problems.
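As a small illustration of this composability (a Python sketch; compose and clean are illustrative names, not library functions):

def compose(f, g):
    "Return the composition of f and g: x -> f(g(x))."
    return lambda x: f(g(x))

clean = compose(str.strip, str.lower)    # two tiny building blocks
clean("  Functional Programming  ")      #=> "functional programming"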

Central features of FP:

  • Functions are first-class entities
  • Functions are pure (whenever possible)
  • Data is immutable (whenever possible)
  • Programs have declarative structure (as much as possible)
  • Exploit laziness when appropriate
  • Referential transparency

Why use functional programming?

  • Functions are simple, easily composed abstractions
  • Easy to express many ideas with very little code
  • Easier to reason about – and test – a functional program
  • Avoids the significant complexity of mutating state
  • Effective Concurrency/Parallelism by avoiding mutable state
  • Focuses on data flow
  • Efficiently exploits recursive thinking
  • Fast development cycle
  • Encourages code reuse
  • Isolate side effects

Languages designed for FP: Clojure, Haskell, OCaml, Scala, Elm

Languages with great FP support: Rust, R, Common Lisp, Julia, Ruby, JavaScript

Languages with decent FP support: Python

Languages adding many FP features: C++, Java

First-Class Functions #

Having first-class functions means that functions are data too.

Some use cases:

  1. Parameterized strategies
  2. Generic operations - abstracting operations across a range of types
  3. Dynamically-defined operations based on parameters and data
  4. Combine multiple pieces into useful aggregate functions
apply(M, 2, function(x) { max(x[!is.na(x) & x != 999]) })  ## 2 = apply over columns

pairwise.distances(X, metric = function(x, y) { max(abs(x - y)) })
from scipy.integrate import quad
quad(lambda x: x*x, 0, 1)
// Abstracting Array Iteration
function forEach(array, itemAction) {
    for ( var i = 0; i < array.length; i++ ) {
        itemAction(array[i])
    }
}
forEach(["R", "SAS", "SPSS"], console.log);
forEach(["R", "SAS", "SPSS"], store);
forEach(["R", "SAS", "SPSS"], function(x) {myObject.add(x)});
import jax
import jax.numpy as jnp

def fwd_solver(f, z_init):
  "Fixed-point solver by forward iteration."
  z_prev, z = z_init, f(z_init)
  while jnp.linalg.norm(z_prev - z) > 1e-5:
    z_prev, z = z, f(z)
  return z

def newton_solver(f, z_init):
  "Newton-iteration fixed-point solver."
  f_root = lambda z: f(z) - z
  g = lambda z: z - jnp.linalg.solve(jax.jacobian(f_root)(z), f_root(z))
  return fwd_solver(g, z_init)
;; *Markov Chain Monte Carlo (MCMC)* is a simulation method
;; where we create a Markov chain whose /limiting distribution/
;; is a distribution from which we want to sample.

(defn mcmc-step
  "Make one step in MH chain, choosing random move."
  [state moves]
  (let [move (random-choice moves)]
    (metropolis-hastings-step (move state))))

(def chain
  (mcmc-sample initial-state
               :select (mcmc-select :burn-in 1000 :skip 5)
               :extract :theta1
               :moves [(univariate-move :theta1 random-walk 0.1)
                       (univariate-move :theta2 random-walk 0.4)
                       (univariate-move :theta3 random-walk 0.2)]))
(take 100 chain)

Pure Functions #

What does this code do?

x = [5]
process( x )
x[0] = x[0] + 1

We cannot say: the final value of x depends on whether process mutates its argument. Pure functions rule out exactly this kind of uncertainty.

A function is pure if it:

  • always returns the same value when you pass it the same arguments
  • has no observable side effects

What are “side effects”?

  • Changing global variables
  • Printing out results
  • Modifying a database
  • Editing a file
  • Generating a random number
  • …anything else that changes the rest of the world outside the function.

Examples:

pure <- function(x) {
    return( sin(x) )
}

global.state <- 10

not.pure <- function(x) {
    return( x + global.state )
}

also.not.pure <- function(x) {
    return( x + rnorm(1) )
}

another.not.pure <- function(x) {
    save_to_file(x, "storex.txt")
    return( x + rnorm(1) )
}

u <- 10
and.again <- function(x) {
    ...
    print(...)              # Input/Output
    z <- rnorm(n)           # Changing internal state
    my.list$foo <- mean(x)  # Mutating objects' state
    u <<- u + 1             # Changing out-of-scope values
    ...
}
def foo(a):
    a[0] = -1

    return sum(a)

x = [1, 2, 3]

sum_of_x = foo(x)

x  #=> [-1, 2, 3]

Benefits of pure functions

  1. deterministic and mathematically well-defined
  2. easy to reason about
  3. easy to test
  4. easy to change
  5. can be composed and reused
  6. can be run simultaneously in concurrent processes
  7. can be evaluated lazily
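For instance, testing a pure function needs no setup, mocks, or teardown; here is a minimal Python sketch (znormalize is an illustrative name):

def znormalize(xs):
    "Pure: returns a new list and never touches its input."
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - m) / s for x in xs]

assert znormalize([1.0, 2.0, 3.0])[1] == 0.0   # same input, same output, every time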

Calculations, Actions, Data #

Actions
results depend on when they are called, how many times they are called, or the context in which they are called
Calculations
operations whose results do not affect the world when they run, transforming inputs to outputs directly, independent of context or timing
Data
Facts about events, a record of something that happened, encodes meaning in structure

It is useful to practice distinguishing among these three categories in your code.

Prefer calculations (pure functions) where possible, and keep a clear delineation between calculations and actions.
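A sketch of the distinction in Python (illustrative names; the calculation is pure, the action touches the outside world):

def total_with_tax(prices, rate):      # Calculation: pure, context-free
    return sum(prices) * (1 + rate)

def report(prices, rate):              # Action: performs I/O at the program's edge
    print(f"Total: {total_with_tax(prices, rate):.2f}")

The list of prices itself is Data: a plain record of facts, carrying no behavior.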

Immutable Data #

This is a common pattern in imperative programming:

a = initial value
for index in IndexSet:
    a[index] = update based on index, a[index], ....
return a

Each step in the loop updates – or mutates – the state of the object.

The familiarity of this operation obscures the tremendous increase in complexity it produces. Mutability couples different parts of your code, and couples them across time; it also greatly complicates parallelism and concurrency.

Immutable (persistent) data structures do not change once created.

  • We can pass the data to anywhere (even simultaneously), knowing that it will maintain its meaning.
  • We can maintain the history of objects as they transform.
  • With pure functions and immutable data, many calculations can be left to when they are needed (laziness).

These data structures achieve this with good performance through clever structure sharing. (E.g., the pyrsistent library in Python, immer and immutable.js in JavaScript, Clojure's built-in collections, rstackdeque in R, \(\ldots\))
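A quick sketch with the pyrsistent library mentioned above (assuming it is installed; every update returns a new value and the original is unchanged):

from pyrsistent import v

nums = v(1, 2, 3)            # a persistent vector
bigger = nums.append(4)      # a new vector, sharing structure with nums
changed = nums.set(0, 99)    # likewise, an updated copy

nums     #=> pvector([1, 2, 3])  -- unchanged
bigger   #=> pvector([1, 2, 3, 4])
changed  #=> pvector([99, 2, 3])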

Favoring immutable data and pure functions makes values and transformations, rather than actions, the key ingredients of programs. So FP prefers expressions over statements:

(foo (if (pred x) x (transform x)) y)   ;; Clojure: the if is an expression with a value

# Python: statement style; y must be assigned before use
if pred(x):
    y = x
else:
    y = transform(x)
foo(y)
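Python's conditional expression does recover an expression form for this particular case:

foo(x if pred(x) else transform(x), y)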

Declarative Structure #

A declarative program tells us what the program produces, not how to produce it.

qsort []       = []
qsort l@(x:xs) = qsort small ++ mid ++ qsort large
  where
    small = [y | y <- xs, y < x]
    mid   = [y | y <- l,  y == x]
    large = [y | y <- xs, y > x]

Programming languages or packages often provide ways to thread data through multiple functions to clearly express the flow of operations:

tokenize <- function(line) {
    line %>%
        str_extract_all("([A-Za-z][-A-Za-z]+)") %>%
        unlist %>%
        sapply(tolower) %>%
        as.character
}

## much clearer than:
## as.character(sapply(unlist(str_extract_all(line, "([A-Z]...)")), tolower))
(defn tokenize [line]
  (->> line
       (re-seq #"[A-Za-z][-A-Za-z]+")
       (map clojure.string/lower-case)))
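A rough Python analogue (a sketch; Python has no built-in pipe operator, so we name the intermediate value):

import re

def tokenize(line):
    words = re.findall(r"[A-Za-z][-A-Za-z]+", line)
    return [w.lower() for w in words]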

Laziness (sometimes) #

Laziness refers to only computing a result when it is needed. This can be a powerful optimization, though it is not always desired. The opposite of lazy here is eager (or strict).

smallest = head (qsort list)   -- laziness makes this O(n), not O(n log n)
fibs1 = 0 : 1 : zipWith (+) fibs1 (tail fibs1)
fibs2 = 0 : 1 : [ a+b | a <- fibs2 | b <- tail fibs2 ]  -- needs GHC's ParallelListComp

take 10 fibs1   -- produces [0,1,1,2,3,5,8,13,21,34]
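Python generators give the same kind of laziness; a minimal sketch of the Fibonacci stream (fibs is an illustrative name):

from itertools import islice

def fibs():
    a, b = 0, 1
    while True:            # an infinite, lazily evaluated stream
        yield a
        a, b = b, a + b

list(islice(fibs(), 10))   #=> [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

The Clojure mcmc-sample below uses the same idea, returning a lazy sequence of MCMC states: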
(defn mcmc-sample
  "Generates an MCMC sample from an initial state.
   Returns a lazy sequence.

   Keyword arguments:

     :moves   -- a collection of moves, which are
                 functions from state to a candidate
     :select  -- a selector function which indicates
                 a boolean for each index and state
                 if that state should be kept in the
                 output
     :extract -- a function to extract features
                 (e.g., parameters) from the state
  "
  [initial-state & {:keys [moves select extract]
                    :or {select (constantly true),
                         extract identity}}]
  (letfn [(stepper [index state]
            (lazy-seq
             (if (select index state)
               (cons (extract state)
                     (stepper (inc index) (mcmc-step state moves)))
               (stepper (inc index) (mcmc-step state moves)))))]
    (stepper 0 initial-state)))

Closures #

Closures are functions with an environment – variables and data – attached. The environment is persistent, private, and hidden. This is a powerful approach for associating state with functions that you pass into other functions. (In fact, an entire OOP system could be built from closures.)

counter <- function(start=0, inc=1) {
    value <- start

    return(function() {
        current <- value
        value <<- value + inc
        return(current)
    })
}

cc <- counter(10, 2)
dd <- counter(10, 2)

cc() # 10
cc() # 12
cc() # 14...
value  # Error: object 'value' not found
dd() # 10

kernel_density <- function(data, bandwidth) {
    ## data: a numeric vector of observations
    return(function(x) {
        ## average a Gaussian kernel centered at each data point
        mean(dnorm((x - data) / bandwidth)) / bandwidth
    })
}

d <- kernel_density(data, 4)

d(3) #=> returns density at 3

Only the anonymous function returned by counter can access that internal state: it is private and unique to each instance.
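The same pattern works in Python, where nonlocal lets the inner function rebind the enclosed variable (a sketch mirroring the R counter above):

def counter(start=0, inc=1):
    value = start
    def step():
        nonlocal value           # rebind the enclosed, private state
        current = value
        value += inc
        return current
    return step

cc = counter(10, 2)
cc()  # 10
cc()  # 12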

Higher-Order Functions #

Higher-order functions are functions that take other functions as arguments (or return them as results).

Three commonly used higher-order functions are map, filter, and reduce.

Map #

The map operation takes a function and a collection and calls the function for each successive element of the collection, producing a new collection out of the results. For example, in R:

square <- function(x) x^2
Map(square, 1:10)   #=>  list(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

or in Python:

list(map(lambda x: x*x, range(1, 11)))  #=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

using an anonymous function declared with the lambda keyword.

map can usually do more than this. In most languages, it can take multiple sequences and a function that takes multiple arguments. It then calls the function with successive elements from each sequence as its arguments, producing a sequence of the results. For example, in R

Map(`-`, 1:10, 10:1)   #=>  list(-9, -7, -5, -3, -1, 1, 3, 5, 7, 9)

and in Python

list(map(lambda x, y: x - y, range(1, 11), range(10, 0, -1)))
   #=> [-9, -7, -5, -3, -1, 1, 3, 5, 7, 9]

We can use map to replace a common pattern we see over and over in code processing data:

transformed_data <- c()

for (ii in 1:length(data)) {
    transformed_data <- c(transformed_data, do_stuff_to(data[ii]))
}

## becomes

transformed_data <- Map(do_stuff_to, data)

The map operation expresses our meaning more clearly and concisely, and is more efficient to boot (since it does not repeatedly copy the transformed_data).

Filter #

The filter operation takes a predicate (a function returning a boolean) and a collection. It calls the predicate for each element, and returns a new collection containing only those elements for which the predicate returns true. For example, in R:

is_odd <- function(x) { (x %% 2 != 0) }

Filter(is_odd, 1:10) #=>  c(1, 3, 5, 7, 9)
Filter(is_odd, as.list(1:10)) #=>  list(1, 3, 5, 7, 9)

Notice that each result is the same type as the collection Filter was given. In Python,

list(filter(lambda x: x % 2 != 0, range(1, 11))) #=> [1, 3, 5, 7, 9]

We can always combine a map with a filter, applying a function to only those elements matching a predicate. The composability of these operations is a major advantage, and much easier to deal with than building complicated loops over data manually.
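For example, squaring only the odd numbers in Python:

list(map(lambda x: x * x, filter(lambda x: x % 2 != 0, range(1, 11))))
   #=> [1, 9, 25, 49, 81]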

Reduce #

The reduce operation, also sometimes called fold, is a much more general way to process the elements of a sequence. We could use it to build map, filter, or many other interesting operations.

reduce takes as arguments a function, an initial accumulator, and a sequence. The function takes two arguments, the accumulator and one element of the sequence, and returns an updated accumulator. reduce first calls the function with the initial accumulator and the first element of the sequence; the returned accumulator is passed to the function again with the second element, and so on. The final value of the accumulator is reduce's result.

For example, suppose we want to add up the numbers in a list, but keep separate sums of the odd numbers and the even numbers. In R:

parity_sum <- function(accum, element) {
    if ( element %% 2 == 0 ) {
        list(accum[[1]] + element, accum[[2]])
    } else {
        list(accum[[1]], accum[[2]] + element)
    }
}

Reduce(parity_sum, 1:10, list(0,0))  #=> list(30, 25)

In Python:

from functools import reduce

def parity_sum(acc, x):
    even, odd = acc
    if x % 2 == 0:
        return [even + x, odd]
    else:
        return [even, odd + x]

reduce(parity_sum, range(1, 11), [0, 0])  #=> [30, 25]
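As claimed above, map itself can be built from reduce; a sketch in Python (map_via_reduce is an illustrative name):

from functools import reduce

def map_via_reduce(f, xs):
    return reduce(lambda acc, x: acc + [f(x)], xs, [])

map_via_reduce(lambda x: x * x, range(1, 6))  #=> [1, 4, 9, 16, 25]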

Activity #

Appendix: A Brief Clojure Primer #

Why Clojure? #

  • A powerful and elegantly designed language with a simple core and extensive libraries.
  • Immutability, concurrency, rich core data types
  • Same language runs on the JVM and the browser
  • A significant data-science footprint
  • Cutting-edge ideas: spec, transducers, metadata, async, …
  • All the power of lisp: macros, REPL-driven development, single and multiple dispatch, destructuring, …
  • Functional design, testing, rapid development
  • Great community

Simple Syntax #

A Clojure form (expression) is either:

  1. A literal piece of data

    • Numbers 42, -1.23, 8/7 (rational), BigInteger, …
    • Boolean Values true and false
    • Null value nil
    • Strings "foo bar" and characters \c
    • Symbols 'foo or Keywords :bar
    • Regular expressions #"[A-Z][a-z]*"
    • Vectors [1 2 3]
    • Sets #{"dog" "cat" "wolverine"}
    • Maps {:name "Col. Mustard" :weapon "candlestick" :room "Drawing Room"}
    • Lists '(1 2 3)
    • Evaluated symbols: x, even?, upper-bound
  2. Function call (+ 1 2 3) (f "input", :option 1)

Whitespace (including ‘,’) and comments are ignored.

Everything is an expression, which has a value, e.g.: (if true 100 -100) has the value 100.

Functions are first-class objects, and some literal objects act like functions too.

Simple Operations #

Functions #

(defn f
  "Documentation here"
  [arguments here]
  (body arguments here))

(f "called" "this way")

(defn f2
  "This function has more than one arity"
  ([] "no arguments value")
  ([one-argument] (+ one-argument 10))
  ([two arguments] [two arguments])
  ([two plus & more] {:first two :second plus :rest more}))

(f2)                     ;=> "no arguments value"
(f2 32)                  ;=> 42
(f2 "these all" "work")  ;=> ["these all", "work"]
(f2 :a :b :c :d :e :f)   ;=> {:first :a, :second :b,
                         ;    :rest [:c :d :e :f]}

;; Anonymous function and a shortcut

(fn [x y] (+ x y))  ; these are equivalent
#(+ %1 %2)

(#(+ %1 %2) 10 6)   ;=> 16

;; Closures

(let [env 10]
  (defn g [x]
    (+ env x)))

(g 20)  ;=> 30

Special forms #

Binding values

(let [x 10,       ;x, y, and z are immutable
      y 20
      z 30]
  (+ x y z 4))    ;=> 64
; x, y, z are not bound out here

(let [simple-mutable (atom 0)] ; an atom is thread safe
  (swap! simple-mutable inc))  ;=> 1

(def answer 42)  ; globally bound vars
(def sums {[1 2 3] 6, [10 22] 32, [] 0})
(def v [1 1 3 5 8])
(def a-map {:a 1 :b 2 :c 3})
(def a-set #{:foo :bar :zap}) ; elements are unique

;; note any values can be keys for a map

Conditionals

(if true 1 0)  ;=> 1
(if false 1 0) ;=> 0
(if nil 1 0)   ;=> 0
;; only false and nil are falsy, all other values truthy
(if "" 1 0)    ;=> 1
(if 0 1 0)     ;=> 1
(if [] 1 0)    ;=> 1

;; Functions are values too
((if true + -) 4 2)  ;=> 6
((if false + -) 4 2) ;=> 2

(when true
  (println "Hello, world"))

Comparisons

(= 10 20)  ;=> false
(= [:a :b :c] [:a :b :c])  ;=> true
(< 10 20)  ;=> true
(<= 11 11) ;=> true
(or (= nil nil) (= 3 4))   ;=> true
(and (= nil nil) (= 3 4))  ;=> false
(and (= nil nil) (= 4 4))  ;=> true
(not false)                ;=> true

Loops

(loop [step 0]
  (println (str "This is step " step "."))
  (if (> step 9)
    (println "Done.")
    (recur (inc step))))

What does this print?

Accessing data #

(get sums [1 2 3])        ;=> 6
(sums [1 2 3])            ;=> 6  the map acts like a function
(get sums [2 2] :missing) ;=> :missing  (default value)

(get a-map :b)  ;=> 2
(a-map :b)      ;=> 2
(:b a-map)      ;=> 2  keywords also act like a function

(get v 2)      ;=> 3
(nth v 2)      ;=> 3
(nth v 10)     ;=> EXCEPTION
(nth v 10 42)  ;=> 42
(get v 10)     ;=> nil

Adding to collections #

(conj v 13) ;=> [1 1 3 5 8 13]
v           ;=> [1 1 3 5 8]  (v is immutable)
(conj a-set :ahhh) ;=> #{:foo :bar :zap :ahhh}
a-set              ;=> #{:foo :bar :zap} structure shared

(assoc a-map :d 4) ;=> {:a 1, :b 2, :c 3, :d 4}
a-map              ;=> {:a 1, :b 2, :c 3}  structure shared

Destructuring #

We can bind names to values within structures with destructuring:

a, b = f(12) # f returns a tuple (12, 13); a = 12, b = 13

(let [[x y] [1 2 3]]
  (vector x y))      ;=> [1 2]

(let [[x y & more] [1 2 3 4 5]]
  (vector x y more))  ;=> [1 2 '(3 4 5)]

(let [{:keys [a b c]} {:a 1 :b 2 :c 3}]
  [a b c])    ;=> [1 2 3]

(defn f [[x y] {:keys [a b c]}]
  [x y a b c])

(f [1 2] {:a 10 :b 20 :c 30}) ;=> [1 2 10 20 30]

(defn g [x y & more-args]
  (+ 4 x y (first more-args)))

(g 10 20 30 40 50 60)         ;=> 64
(apply g 10 [20 30 40 50 60]) ;=> 64

;; much more is possible

Namespaces and libraries #

All code is defined within a namespace that controls access to each symbol.

This lets library code be loaded without stepping on other code.

For example: all the code I’m executing now is in the user namespace. There are mechanisms for importing symbols from other namespaces, which is how you work with libraries.

There are also ways to access features of the host platform (e.g., the JVM or the javascript environment).

This is beyond our goals for today.

Appendix: Statistical Examples with Clojure #

Means and Variances #

We can compute simple means using the recurrence \[ \bar x_{n+1} = \bar x_n + K_n (x_{n+1} - \bar x_n), \] where \(K_n = 1/(n + 1)\). The value \(K_n\) is a “gain” parameter that applies to the “residual” at the next step.

(defn simple-update [[xbar n] x]
  (let [n' (+ n 1)
        K (/ n')]
    [(+ xbar (* K (- x xbar))), n']))

(def data (range 5))

(->> data
     (reduce simple-update [0.0 0])
     first)

This same gain idea works for weighted averages, using the same recurrence where \(K_n = w_{n+1}/\sum_{i=1}^{n+1} w_i\).

(defn weighted-update [[xbar wsum] [x w]]
  (let [wsum' (+ wsum w)
        K (/ w wsum')]
    [(+ xbar (* K (- x xbar))), wsum']))

(def data (range 5))
(def weights [1.0 2.0 3.0 4.0 5.0])

(->> (map vector data weights)
     (reduce weighted-update [0.0 0])
     first)

Alternatively, we could just keep track of the components:

(defn weighted-update-nodiv [[x-dot-w wsum] [x w]]
  [(+ x-dot-w (* x w)), (+ wsum w)])

(->> (map vector data weights)
     (reduce weighted-update-nodiv [0.0 0])
     (apply /))

but the gain form will come in handy soon.

We can do similar things with the variance:

(defn welford-variance
  "Updating function for Welford variance computation
  0-arity version gives initializer, 2-arity updater."
  ([] {:S 0.0 :xbar 0.0 :wsum 0.0})
  ([{:keys [S xbar wsum]} [x w]]
   (let [wsum' (+ wsum w)
         K (/ w wsum')
         xbar' (+ xbar (* K (- x xbar)))]
     {:S    (+ S (* w (- x xbar) (- x xbar')))
      :xbar xbar'
      :wsum wsum'})))

(let [extract-parts (juxt :S (comp dec :wsum))]
  (->> (map vector data (repeat 1))
       (reduce welford-variance (welford-variance))
       extract-parts
       (apply /)))

Least Squares #

Consider a basic homoskedastic regression model \[ y = X \beta + \sigma \epsilon, \] where \(X\) is \(n \times p\), \(\beta\) is \(p \times 1\), and \(\epsilon\) is mean 0, unit variance noise.

We can compute \(\hat\beta = (X^T X)^{-1} X^T y\) directly with various methods, but it is useful to think of this sequentially. With the help of the Woodbury formula for one step updates of an inverse, we have:
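In sequential form, the update for a new row \(x_{n+1}\) and response \(y_{n+1}\) looks like this (a sketch matching the code below, with \(D\) the innovation scale):

\[\begin{aligned} D &= \sigma^2 + x_{n+1} V_n x_{n+1}^T \cr K &= V_n x_{n+1}^T / D \cr \hat\beta_{n+1} &= \hat\beta_n + K (y_{n+1} - x_{n+1} \hat\beta_n) \cr V_{n+1} &= V_n - K K^T D. \end{aligned}\]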

(defn as-scalar [mat]
  (select mat 0 0))

(defn %*% [a b]
  (let [M (mmul a b)]
    (if (every? #(= % 1) (shape M)) (as-scalar M) M)))

(defn least-squares-update [sigma-squared p scale]
  (fn
    ([] {:beta-hat (new-matrix p 1)
         :V (mul (identity-matrix p) sigma-squared scale)})
    ([{:keys [beta-hat V]} [x y]]
     (let [VxT (mmul V (transpose x))
           D (+ sigma-squared (%*% x VxT))
           K (div VxT D)
           residual (- y (%*% x beta-hat))]
       {:beta-hat (add beta-hat (mul K residual))
        :V (sub V (mul (mmul K (transpose K)) D))}))))

(def lsdata
  "Sequential regression data, each element of which
  contains a row of X and the corresponding y."
  [[[[1.0 0.0 0.0 0.0]]    -2.28442]
   [[[1.0 1.0 1.0 1.0]]    -4.83168]
   [[[1.0 -1.0 1.0 -1.0]] -10.46010]
   [[[1.0 -2.0 4.0 -8.0]]   1.40488]
   [[[1.0 2.0 4.0 8.0]]   -40.80790]])

(def updater (least-squares-update 1.0 4 1000.0))

(->> lsdata
     (reduce updater (updater))
     :beta-hat)

This is regression analysis in a functional style, in a few simple lines.

The Kalman Filter #

Our sequential least squares actually solves another problem: finding the minimum variance predictor for a linear dynamical system driven by a stochastic process.

Consider state vectors \(x_t\) and observation vectors \(y_t\) that evolve as follows:

\[\begin{aligned} x_{t+1} &= A x_t + \delta_t \cr y_t &= C x_t + \epsilon_t, \end{aligned}\]

where \(\delta_t\) and \(\epsilon_t\) are each, say, iid Normal, mean 0 noise with fixed covariance. (These are called the state noise and measurement noise, respectively.)

If \(\hat x_{t+1}\) (and \(\hat y_{t+1}\)) is the best predictor of \(x_{t+1}\) (and \(y_{t+1}\)) given observations up to time \(t\), then

\[ \hat x_{t+1} = A \hat x_t + K_t (y_t - \hat y_t), \]

which is of the same form we just computed.

This gives us a functional implementation of the Kalman Filter.
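A compact sketch of the classical filtering recursion in Python with NumPy (illustrative names, not the Clojure code above; filtering a whole observation sequence is then just a reduce):

import numpy as np
from functools import reduce

def kalman_step(A, C, Q, R):
    "Return a pure updater: ((x_hat, P), y) -> (x_hat', P')."
    def step(state, y):
        x_hat, P = state
        x_pred = A @ x_hat                     # predict the state
        P_pred = A @ P @ A.T + Q
        S = C @ P_pred @ C.T + R               # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)    # the gain, as before
        x_new = x_pred + K @ (y - C @ x_pred)  # correct with the residual
        P_new = P_pred - K @ S @ K.T
        return (x_new, P_new)
    return step

# x0, P0 = initial state estimate and covariance; ys = observations
# x_hat, P = reduce(kalman_step(A, C, Q, R), ys, (x0, P0))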

Markov Chain Monte Carlo #

Markov Chain Monte Carlo (MCMC) is a simulation method where we create a Markov chain whose limiting distribution is a distribution from which we want to sample.

(defn mcmc-step
  "Make one step in MH chain, choosing random move."
  [state moves]
  (let [move (random-choice moves)]
    (metropolis-hastings-step (move state))))

(defn mcmc-sample
  "Generate an MCMC sample from an initial state.
   Returns a lazy sequence.

   Keyword arguments:

     :moves   -- a collection of moves, which are
                 functions from state to a candidate
     :select  -- a selector function which indicates
                 a boolean for each index and state
                 if that state should be kept in the
                 output
     :extract -- a function to extract features
                 (e.g., parameters) from the state
  "
  [initial-state & {:keys [moves select extract]
                    :or {select (constantly true),
                         extract identity}}]
  (letfn [(stepper [index state]
            (lazy-seq
             (if (select index state)
               (cons (extract state)
                     (stepper (inc index) (mcmc-step state moves)))
               (stepper (inc index) (mcmc-step state moves)))))]
    (stepper 0 initial-state)))

(def chain
  (mcmc-sample initial-state
               :select (mcmc-select :burn-in 1000 :skip 5)
               :extract :theta1
               :moves [(univariate-move :theta1 random-walk 0.1)
                       (univariate-move :theta2 random-walk 0.4)
                       (univariate-move :theta3 random-walk 0.2)]))

(take 100 chain)

The design of this chain is modular, and easily adaptable to a wide variety of models.