## Goals for Today #

- Describe the core concepts of functional programming (FP)
- Show you some of the mechanics of using FP
- Argue for the value of “immutability” and “pure functions”
- Give you a taste of what “thinking functionally” means
- Pointing you toward FP resources in R and other languages

What takes longer than one class: the “aha” moment.

Key point:

**Whatever language you use, using the ideas and methods of FP will make you a
better programmer.**

## Review and Contrast with OOP #

A fundamental part of developing software is **building effective
abstractions** to model and organize the functionality the code
provides.

Previously, we looked at object-oriented programming (OOP), which models
problems with a collection of inter-related *objects* that have designated
*behaviors* and which manage their own *internal state*.

The core features of OOP are:

- Encapsulation
- keeping things on a “need to know” basis
- Polymorphism
- interchangeability of objects based on interface
- Inheritance and Composition
- building new classes that can be used interchangeably with others
- Delegation of Responsibilities
- each class handles its own sphere of concerns

In OOP, **the class is the fundamental unit of abstraction**.

Why OOP?

- Modularity
- Separation of Concerns
- Ease of modification where interface is shared
- Ease of testing (maybe)
- Can be easy to reason about

Issues:

- Overhead, boilerplate, verbosity
- Emergent complexity – many classes, complex relationships
- In practice, not as modular or reusable as hoped
- Parallelism/Concurrency is very hard
- Fundamentally prioritizes nouns over verbs. Why?
- Can be hard to reason about

Today, we consider another programming paradigm called
**Functional programming (FP)** that emphasizes *verbs* over nouns.

Functional design focuses on building and *combining* the basic parts of a
computation, so simple abstract ideas can be *composed* to build many things.

Notes:

OOP was the clearly dominant paradigm of the 1990s and 2000s and remains entrenched, but FP has been enjoying a striking renaissance this decade.

FP and OOP are not in strictly in opposition; they can usefully coexist. But they do lead to different ways of thinking about computation.

For reference, there are many common programming paradigms. These include: procedural, object-oriented, functional, logic, event-driven, aspect oriented, and automata-based. Procedural and object-oriented are

*imperative*– giving a sequence of computational steps that mutate the state of the program – while functional and logic programming (and some database query languages) are*declarative*– expressing the intent of the program without necessarily specifying the control flow. (Aspect-oriented is a bit odd.)

## The Core Concepts of Functional Programming (FP) #

In *functional programming* (FP), **functions are the fundamental unit of abstraction**.

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.

– Alan Perlis

Functions return values, and functions *are* values – they can be passed to
other functions, manipulated, and stored. Functional programming thinks of
code as a way of transforming values to produce new values, not as a series of
commands:

Functional programming is a style of programming that emphasizes the evaluation of expressions rather than the execution of commands.

– comp.lang.functional FAQ

Why use functional programming?

- Functions are simple, easily composed abstractions
- Easy to express many ideas with very little code
- Easier to reason about (and thus check) a program
- Effective Concurrency and Parallelism
- Declarative structure; focus on data flow
- Avoid the complexity of mutating state
- Efficiently exploit recursive thinking

There are many programming languages designed from the beginning for functional programming, like Clojure, OCaml, Haskell, Scala, and Elm. Others use many functional programming ideas, including R, Lisp, Julia, Ruby, and JavaScript. (Python does so only begrudgingly.) Even old stodgy languages like C++ and Java have recently added functional programming features.

### First-Class Functions #

In functional programming, **functions are first-class entities**.

An entity is “first class” if it can be:

- created on demand,
- stored in a variable or data structure,
- passed as arguments to functions, and
- returned as values from functions.

For instance, integers are first-class entities:

- Create:
`14`

- Store:
`i = 7`

- Pass:
`max(7, 14)`

- Return:
`return(i + 2)`

Integers are “data.”

Having first-class functions means that *functions are data too*.

```
chain_twice <- function(f, x) {
return( f(f(x)) )
}
incr <- function(x) { return( x + 1 ) }
chain_twice(incr, 10) #=> 12
incr_by <- function(increment) {
return( function(x) { x + increment } )
}
incr_by(10) #=> function(x) { x + 10 }
incr_by(10)(10) #=> 20
i12 <- incr_by(12)
i17 <- incr_by(17)
i12(10) #=>22
i17(10) #=>27
chain_twice(incr_by(10), 10) #=> 30
apply(M, ACROSS_COLS,
function(x) { max(x[!is.na(x) && x != 999]) })
pairwise.distances(X, metric=function(x, y) { max(abs(x - y)) })
subtract.mean <- function(x)
{
xbar <- mean(x)
return( function(y) { return(y - xbar) } )
}
```

```
def call_twice(f, x):
return f(f(x))
call_twice(lambda x: x + 1, 10) #=> 12
```

```
(defn chain-twice
"Compose a function with itself"
[f]
(comp f f))
(def incr2 (chain-twice inc))
(incr2 10) ;=> 12
(defn incr-by [increment]
(fn [x] (+ x increment)))
(def incr-by-10 (incr-by 10))
(def incr-by-20 (chain-twice (incr-by 10)))
(incr-by-10 10) ;=> 20
(incr-by-20 10) ;=> 30
```

```
def call_twice(x, &f)
f.call(f.call(x))
end
call_twice(10) {|x| x + 1} #=> 12
```

These examples illustrate at least three use cases:

- Succinct ways to write and pass functions (like
`lambda`

) - Parameterized strategies: pass functions to other functions to tell them what to do
- Dynamically-defined operations: define a new function based on parameters and data

Notice that functions can be created on the fly *even without names*,
what are called **anonymous functions**:

```
integrate(function(x) { x*x }, 0, 1)
#=> 0.3333333 with absolute error < 3.7e-15
```

`integrate(lambda x: x*x, 0, 1)`

Another example, in JavaScript:

```
// Abstracting Array Iteration
function forEach(array, itemAction) {
for ( var i = 0; i < array.length; i++ ) {
itemAction(array[i])
}
}
forEach(["R", "SAS", "SPSS"], console.log);
forEach(["R", "SAS", "SPSS"], store);
forEach(["R", "SAS", "SPSS"], function(x) {myObject.add(x)});
```

### Pure Functions (where possible) #

A function is **pure** if it:

- always returns the same value when you pass it the same arguments
- has no
*observable*side effects

Question: what are side effects?

- Changing global variables
- Printing out results
- Modifying a database
- Editing a file
- …anything else that changes the rest of the world outside the function.

Pure functions are deterministic and mathematically well-defined. They are easy to test, to reason about, to change, and to compose.

Because pure functions do not modify any global state, they can be run simultaneously, allowing parallel or concurrent processing. Or they can be evaluated lazily, only calculating answers when needed.

Examples:

```
pure <- function(x) {
return( sin(x) )
}
global.state <- 10
not.pure <- function(x) {
return( x + global.state )
}
also.not.pure <- function(x) {
return( x + rnorm(1) )
}
another.not.pure <- function(x) {
save_to_file(x, "storex.txt")
return( x + rnorm(1) )
}
u <- 10
and.again <- function(x) {
...
print(...) # Input/Output
z <- rnorm(n) # Changing internal state
my.list$foo <- mean(x) # Mutating objects' state
u <<- u + 1 # Changing out-of-scope values
...
}
```

And equally subtle:

```
def foo(a):
a[0] = -1
```

When you see a call `foo(x)`

for an array `x`

, can you tell what happens?

### Immutable Data #

This is a common pattern in imperative programming:

```
a = initial value
for index in IndexSet:
a[index] = update based on index, a[index], ....
return a
```

Each step in the loop updates – or *mutates* – the state
of the object.

This is so familiar that we don’t really think about it, but it is hard to think about your code when all of your data can change between many different states.

Mutation introduces a greater dependence on time and order in operations. This makes it harder to parallelize or do lazy computation.

**Immutable data structures** do not change once created.

- We can pass the data to anywhere (even simultaneously), knowing that it will maintain its meaning.
- We can maintain the history of objects as they transform.
- With pure functions and immutable data, many calculations can be left to when they are needed (laziness).

FP favors *immutable data*, and some FP languages have no
(or severely limited) notions of assignment.

Immutability changes how you think about arranging your computation. We produce new values instead of mutating previous ones:

```
for ( i in 1:n ) {
a[i] <- a[i] + 1
}
```

versus

`Map(function(x) { x + 1 }, a)`

or

`(map inc a)`

Favoring immutable data and pure functions makes values and transformations, rather than actions, the key ingredient of programs. So FP prefers expressions over statements:

```
## Statements:
cleaned <- function(x) {
if (is_valid(x)) {
return(x)
} else {
return(cleanup(x))
}
}
## Expressions:
cleaned <- function(x) {
if (is_valid(x)) {
x
} else {
cleanup(x)
}
}
```

```
(defn cleaned [x]
(if (valid? x) x (cleanup x)))
```

### Closures #

Closures are functions with an environment – variables and data – attached. The environment is persistent, private, and hidden. This is a powerful approach for associating state with functions that you pass into other functions. (In fact, an entire OOP system could be built from closures.)

```
counter <- function(start=0, inc=1) {
value <- start
return(function() {
current <- value
value <<- value + inc
return(current)
})
}
cc <- counter(10, 2)
dd <- counter(10, 2)
cc() # 10
cc() # 12
cc() # 14...
value # Error: object 'value' not found
dd() # 10
```

```
(defn counter
([] (counter 0 1))
([start] (counter start 1))
([start incr]
(let [counter-value (atom start)]
(fn [] (swap! counter-value + incr)))))
(def counter1 (counter))
(def counter2 (counter 10 5))
(def counter3 (counter 2))
(counter1) ;=> 1
(counter1) ;=> 2
(counter1) ;=> 3
(counter2) ;=> 15
(counter2) ;=> 20
(counter2) ;=> 25
(counter3) ;=> 3
(counter3) ;=> 4
(counter3) ;=> 5
```

Only the anonymous function returned by `counter`

can access that
internal state: it is private and unique to each instance.

### Laziness (sometimes) #

**Lazy evaluation** (or laziness for short) means that expressions
are not evaluated *until their results are needed*. This is a feature of
certain programming languages, like Haskell, and is made easier by functional
programming.

```
fib = 0 : 1 : zipWith (+) fib (tail fib)
take 10 fib -- => [0,1,1,2,3,5,8,13,21,34]
```

```
(def fib (cons 0 (cons 1 (lazy-seq (map + fib (rest fib))))))
(take 10 fib) ;=> (0 1 1 2 3 5 8 13 21 34)
```

### Composition and Connection of Functions #

Programming languages or packages often provide ways to thread data through multiple functions to clearly express the flow of operations:

```
tokenize <- function(line) {
line %>%
str_extract_all("([A-Za-z][-A-Za-z]+)") %>%
unlist %>%
sapply(tolower) %>%
as.character
}
## much clearer than:
## as.character(sapply(tolower, unlist(str_extract_all("([A-Z]...)", line))))
```

```
(defn tokenize [line]
(->> line
(re-seq "([A-Za-z][-A-Za-z]+)")
(map lower-case)))
```

## Map, Reduce, and Filter #

Let’s look at a few key higher-order functions – functions that take functions as arguments – we can use for many things.

### Map #

The `map`

operation operation takes a function and a collection and calls the
function for each successive element of the collection, producing a new
collection out of the results. For example, in R:

```
square <- function(x) x^2
Map(square, 1:10) #=> list(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
```

or in Python:

`list(map(lambda x: x*x, range(1, 11))) #=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]`

using an *anonymous* function, declared with the `lambda`

keyword, that has no
name.

`map`

can usually do more than this. In most languages, it can take *multiple
sequences* and a function that takes multiple arguments. It then calls the
function with successive elements from each sequence as its arguments,
producing a sequence of the results. For example, in R

`Map(`-`, 1:10, 10:1) #=> list(-9, -7, -5, -3, -1, 1, 3, 5, 7, 9)`

and in Python

```
list(map(lambda x, y: x - y, range(1, 11), range(10, 0, -1)))
#=> [-9, -7, -5, -3, -1, 1, 3, 5, 7, 9]
```

We can use `map`

to replace a common pattern we see over and over in code
processing data:

```
transformed_data <- c()
for (ii in 1:length(data)) {
transformed_data <- c(transformed_data, do_stuff_to(data[ii]))
}
## becomes
transformed_data <- Map(do_stuff_to, data)
```

The `map`

operation expresses our meaning more clearly and concisely, and is
more efficient to boot (since it does not repeatedly copy the
`transformed_data`

).

### Filter #

The `filter`

operation takes a predicate (a function returning a boolean) and a
collection. It calls the predicate for each element, and returns a new
collection containing only those elements for which the predicate returns
true. For example, in R:

```
is_odd <- function(x) { (x %% 2 != 0) }
Filter(is_odd, 1:10) #=> c(1, 3, 5, 7, 9)
Filter(is_odd, as.list(1:10)) #=> list(1, 3, 5, 7, 9)
```

Notice that each result is the same type as the collection `Filter`

was given.
In Python,

`filter(lambda x: x % 2 != 0, range(1, 11)) #=> [1, 3, 5, 7, 9]`

We can always combine a `map`

with a `filter`

, applying a function to only those
elements matching a predicate. The composability of these operations is a
major advantage, and much easier to deal with than building complicated loops
over data manually.

### Reduce #

The `reduce`

operation, also sometimes called `fold`

, is a much more general way
to process the elements of a sequence. We could use it to build `map`

, `filter`

,
or many other interesting operations.

`reduce`

takes as arguments a function, an *accumulator*, and a sequence. The
function takes two arguments – the accumulator and one element of the
sequence – and returns an updated accumulator. The function is first passed
with the initial accumulator and the first element of the sequence, returning
a new accumulator which is passed to the function again with the second
element of the sequence, and so on. The final value of the accumulator is
returned by `reduce`

.

For example, suppose we want to add up the numbers in a list, but keep separate sums of the odd numbers and the even numbers. In R:

```
parity_sum <- function(accum, element) {
if ( element %% 2 == 0 ) {
list(accum[[1]] + element, accum[[2]])
} else {
list(accum[[1]], accum[[2]] + element)
}
}
Reduce(parity_sum, 1:10, list(0,0)) #=> list(30, 25)
```

In Python:

```
def parity_sum(acc, x):
even, odd = acc
if x % 2 == 0:
return [even + x, odd]
else:
return [even, odd + x]
reduce(parity_sum, range(1,11), [0,0]) #=> [30, 25]
```

## Activities #

Let’s do some activities to get used to the idea of using higher-order
functions like `map`

, `filter`

, and `reduce`

, and functions that return functions.

You can work through these activities (or some subset) in any order that interests you. Work in groups of two or three.

**However**, you may **not** use loops of any form: no `for`

or `while`

loops can appear
anywhere in your code. Anywhere you’d want to loop over a sequence, use `map`

,
`filter`

, or `reduce`

instead.

Write a few tests for each function. These activities are best done in a language with first-class functions, like R, Python, any Lisp variant, or Haskell. They are possible in C++ and Java, but not nearly as convenient or simple.

Part of this activity is also the `no-loops`

task in the problem bank – you can
clean up your solutions and submit them as an assignment! (You must submit
your own solution, not copy from your group.)

Tasks:

- Write a function that sums a list (or vector) of numbers.
- Write a function that takes a list/vector of integers and returns a list
containing
*only*those elements for which the*preceding*integer is negative. Write a function that takes a string and returns true if all square

`[ ]`

, round`( )`

, and curly`{ }`

delimiters are properly paired and legally nested, or returns false otherwise. Other characters should be ignored.For example,

`[(a)]{[b]}`

is legally nested, but`{a}([b)]`

is not.*Hint:*What data structure can you use to track the delimiters you’ve seen so far?Write a function

`roman`

that parses a Roman numeral string and returns the number it represents. You can assume that the input is well-formed, in upper case, and adheres to the “subtractive principle”. You need only handle positive integers up to MMMCMXCIX (3999), the largest number representable with ordinary letters.A reference on Roman numerals, including the “subtractive principle”: http://www.numericana.com/answer/roman.htm#valid

Be sure to test carefully – Roman numerals are tricky.

Write a function

`chain`

that takes as its argument a list of functions, and returns a new function that applies each function in turn to its argument.For example,

`import math positive_root = chain([abs, math.sqrt]) positive_root(-4) #=> 2.0`

- Write a function
`partial`

that takes a function and several arguments and returns a function that takes additional arguments and calls the original function with all the arguments in order. For example,

`foo <- function(x, y, z) { x + y + z } bar <- partial(foo, 2) bar(3, 4) #=> 2 + 3 + 4 = 9`

- Write a function
Write a function

`memoize`

that takes a function as an argument and returns a new function the*memoizes*inputs: for every input it receives, it calls the original function, gets its output, stores the input and output in a data structure, and then returns the output. If called again with the same input, it simply retrieves the output from the data structure and returns it,*without*having to call the function.This is a useful trick if you have a function that takes a lot of time to run, returns the same answer every time it’s called with the same input, and that you expect will be called many times with repeated inputs.

(This is much easier in Python than it is in R, for reasons related to the data structure you’ll use to store the input and output.)

Write a function

`count-repeats`

(or`count_repeats`

or`countRepeats`

as you like) and a function`run-length`

, that, respectively, takes a sequence of strings and returns the number of strings that are repeated at least twice in a row (at successive indices) and takes a sequence of strings and returns the maximum number of consecutive indices for which a string is repeated. So, for`["h" "h" "t" "h" "t" "t" "t"]`

,`count-repeats`

would return 2 and`run-length`

would return 3. Each of these functions can be implemented directly using a common third function that you write.Write a function

`mavg`

that takes a vector of numbers, a window size, and an optional vector of weights equal in length to the window size, and returns a vector of (backward) moving averages, using the weights in the window or the reciprocal of the window size if no weights are supplied. The ith element of the result is the weighted average (using the weights) of the data at indices i, i+1, …, i+window_size-1. Note that the result is shorter than the data vector by window size.Thus,

`[1, 2, 3, 4, 5, 6],`

`3`

, and`[1 2 1]`

would return the vector`[1/5 + 2*2/5 + 3/5, 2/5 + 2*3/5 + 4/5, ..., 4/5 + 2*5/5 + 6/5].`

Write a function

`reverse_map`

(or`reverse-map`

or`reverseMap`

as you like) that takes a hash table and returns the “reversed” hash table. The keys of the reversed hash table are the values of the original hash table, and the values of the reversed hash table are either the keys of the original, or when there are multiple keys with the same value, a list of such keys.Write a function

`nnk`

that takes a collection of p-dimensional vectors (as a data matrix, data frame, list of vectors as you prefer), a target point, and a positive integer`k`

, and returns the`k`

nearest neighbors among the data to the target point.

## Appendix: A Brief Clojure Primer #

### Why Clojure? #

- A powerful and elegantly designed language with a simple core and extensive libraries.
- Immutability, concurrency, rich core data types
- Same language runs on the JVM and the browser
- A significant
*data-science*footprint - Cutting-edge ideas: spec, transducers, metadata, async, …
- All the power of lisp: macros, REPL-driven development, single and multiple dispatch, destructuring, …
- Functional design, testing, rapid development
- Great community

### Simple Syntax #

A clojure form (expression) is either:

- A
*literal*piece of data - Numbers 42, -1.23, 8/7 (rational), BigInteger, …
- Boolean Values
`true`

and`false`

- Null value
`nil`

- Strings
`"foo bar"`

and characters`\c`

- Symbols
`'foo`

or Keywords`:bar`

- Regular expressions
`#"[A-Z][a-z]*"`

- Vectors
`[1 2 3]`

- Sets
`#{"dog" "cat" "wolverine"}`

- Maps
`{:name "Col. Mustard"`

`:weapon "candlestick"`

`:room "Drawing Room"}`

- Lists
`'(1 2 3)`

Evaluated symbols:

`x`

,`even?`

,`upper-bound`

Function call

`(+ 1 2 3)`

`(f "input", :option 1)`

Whitespace (including ‘,’) and comments are ignored.

Everything is an expression, which has a value, e.g.:

`(if true 100 -100)`

has the value 100.Functions are first class objects, and some literal objects act like functions too.

### Simple Operations #

#### Functions #

`(defn f "Documentation here" [arguments here] (body arguments here)) (f "called" "this way") (defn f2 "This function has more than one arity" ([] "no arguments value") ([one-argument] (+ one-argument 10)) ([two arguments] [two arguments]) ([two plus & more] {:first two :second plus :rest more})) (f2) ;=> "no arguments value" (f2 32) ;=> 42 (f2 "these all" "work") ;=> ["these all", "work"] (f2 :a :b :c :d :e :f) ;=> {:first :a, :second :b, ; :rest [:c :d :e :f]} ;; Anonymous function and a shortcut (fn [x y] (+ x y)) ; these are equivalent #(+ %1 %2) (#(+ %1 %2) 10 6) ;=> 16 ;; Closures (let [env 10] (defn g [x] (+ env x))) (g 20) ;=> 30`

#### Special forms #

Binding values

```
(let [x 10, ;x, y, and z are immutable
y 20
z 30]
(+ x y z 4)) ;=> 64
; x, y, z are not bound out here
(let [simple-mutable (atom 0)] ; an atom is thread safe
(swap! simple-mutable inc)) ;=> 1
(def answer 42) ; globally bound vars
(def sums {[1 2 3] 6, [10 22] 32, [] 0})
(def v [1 1 3 5 8])
(def a-map {:a 1 :b 2 :c 3})
(def a-set #{:foo :bar :zap}) ; elements are unique
;; note any values can be keys for a map
```

Conditionals

```
(if true 1 0) ;=> 1
(if false 1 0) ;=> 0
(if nil 1 0) ;=> 0
;; only false and nil are falsy, all other values truthy
(if "" 1 0) ;=> 1
(if 0 1 0) ;=> 1
(if [] 1 0) ;=> 1
;; Functions are values too
((if true + -) 4 2) ;=> 6
((if false + -) 4 2) ;=> 2
(when true
(println "Hello, world"))
```

Comparisons

```
(= 10 20) ;=> false
(= [:a :b :c] [:a :b :c]) ;=> true
(< 10 20) ;=> true
(<= 11 11) ;=> true
(or (= nil nil) (= 3 4)) ;=> true
(and (= nil nil) (= 3 4)) ;=> false
(and (= nil nil) (= 4 4)) ;=> true
(not false) ;=> true
```

Loops

```
(loop [step 0]
(println (str "This is step " step "."))
(if (> step 9)
(println "Done.")
(recur (inc step))))
```

What does this print?

#### Accessing data #

```
(get sums [1 2 3]) ;=> 6
(sums [1 2 3]) ;=> 6 the map acts like a function
(get sums [2 2] :missing) ;=> :missing (default value)
(get a-map :b) ;=> 2
(a-map :b) ;=> 2
(:b a-map) ;=> 2 keywords also act like a function
(get v 2) ;=> 3
(nth v 2) ;=> 3
(nth v 10) ;=> EXCEPTION
(nth v 10 42) ;=> 42
(get v 10) ;=> nil
```

#### Adding to collections #

```
(conj v 13) ;=> [1 1 3 5 8 13]
v ;=> [1 1 3 5 8] (v is immutable)
(conj a-set :ahhh) ;=> #{:foo :bar :zap :ahhh}
a-set ;=> #{:foo :bar :zap} structure shared
(assoc a-map :d 4) ;=> {:a 1, :b 2, :c 3, :d 4}
a-map ;=> {:a 1, :b 2, :c 3} structure shared
```

#### Destructuring #

We can bind names to values within structures
with **destructuring**:

a, b = f(12) # f returns a tuple (12, 13); a = 12, b = 13

```
(let [[x y] [1 2 3]]
(vector x y)) ;=> [1 2]
(let [[x y & more] [1 2 3 4 5]]
(vector x y more)) ;=> [1 2 '(3 4 5)]
(let [{:keys [a b c]} {:a 1 :b 2 :c 3}]
[a b c]) ;=> [1 2 3]
(defn f [[x y] {:keys [a b c]}]
[x y a b c])
(f [1 2] {:a 10 :b 20 :c 30}) ;=> [1 2 10 20 30]
(defn g [x y & more-args]
(+ 4 x y (first more-args)))
(g 10 20 30 40 50 60) ;=> 64
(apply g 10 [20 30 40 50 60]) ;=> 64
;; much more is possible
```

### Namespaces and libraries #

All code is defined within a **namespace**
that controls access to each symbol.

This lets library code be loaded without stepping on other code.

For example: all the code I’m executing now
is in the `user`

namespace. There are mechanisms
for importing symbols from other namespaces,
which is how you work with libraries.

There are also ways to access features of the host platform (e.g., the JVM or the javascript environment).

This is beyond our goals for today

## Appendix: Statistical Examples with Clojure #

### Means and Variances #

We can compute simple means using the recurrence \[ \bar x_{n+1} = \bar x_n + K_n (x_{n+1} - \bar x_n), \] where \(K_n = 1/(n + 1)\). The value \(K_n\) is a “gain” parameter that applies to the “residual” at the next step.

```
(defn simple-update [[xbar n] x]
(let [n' (+ n 1)
K (/ n')]
[(+ xbar (* K (- x xbar))), n']))
(def data (range 5))
(->> data
(reduce simple-update [0.0 0])
first)
```

This same gain idea works with weighted averages using the same recurrence with where \(K_n = w_{n+1}/\sum_{i=1}^{n+1} w_i\).

```
(defn weighted-update [[xbar wsum] [x w]]
(let [wsum' (+ wsum w)
K (/ w wsum')]
[(+ xbar (* K (- x xbar))), wsum']))
(def data (range 5))
(def weights [1.0 2.0 3.0 4.0 5.0])
(->> (map vector data weights)
(reduce weighted-update [0.0 0])
first)
```

Alternatively, we could just keep track of the components

```
(defn weighted-update-nodiv [[x-dot-w wsum] [x w]]
[(+ x-dot-w (* x w)), (+ wsum w)])
(->> (map vector data weights)
(reduce weighted-update-nodiv [0.0 0])
(apply /))
```

but the gain form will come in handy soon.

We can do similar things with variance

```
(defn welford-variance
"Updating function for Welford variance computation
0-arity version gives initializer, 2-arity updater."
([] {:S 0.0 :xbar 0.0 :wsum 0.0})
([{:keys [S xbar wsum]} [x w]]
(let [wsum' (+ wsum w)
K (/ w wsum')
xbar' (+ xbar (* K (- x xbar)))]
{:S (+ S (* w (- x xbar) (- x xbar')))
:xbar xbar'
:wsum wsum'})))
(let [extract-parts (juxt :S (comp dec :wsum))]
(->> (map vector data (repeat 1))
(reduce welford-variance (welford-variance))
extract-parts
(apply /)))
```

### Least Squares #

Consider a basic homoskedastic regression model \[ y = X \beta + \sigma^2 \epsilon, \] where \(X\) is \(n \times p\), \(\beta\) is \(p \times 1\), and \(\epsilon\) is mean 0, unit variance noise.

We can compute \(\hat\beta = (X^T X)^{-1} X^T y\)
directly with various methods, but it is useful
to think of this *sequentially*. With the help
of the Woodbury formula for one step updates of
an inverse, we have:

```
(defn as-scalar [mat]
(select mat 0 0))
(defn %*% [a b]
(let [M (mmul a b)]
(if (every? #(= % 1) (shape M)) (as-scalar M) M)))
(defn least-squares-update [sigma-squared p scale]
(fn
([] {:beta-hat (new-matrix p 1)
:V (mul (identity-matrix p) sigma-squared scale)})
([{:keys [beta-hat V]} [x y]]
(let [VxT (mmul V (transpose x))
D (+ sigma-squared (%*% x VxT))
K (div VxT D)
residual (- y (%*% x beta-hat))]
{:beta-hat (add beta-hat (mul K residual))
:V (sub V (mul (mmul K (transpose K)) D))}))))
(def lsdata
"Sequential regression data, each element of which
contains a row of X and the corresponding y."
[[[[1.0 0.0 0.0 0.0]] -2.28442]
[[[1.0 1.0 1.0 1.0]] -4.83168]
[[[1.0 -1.0 1.0 -1.0]] -10.46010]
[[[1.0 -2.0 4.0 -8.0]] 1.40488]
[[[1.0 2.0 4.0 8.0]] -40.80790]])
(def updater (least-squares-update 1.0 4 1000.0))
(->> lsdata
(reduce updater (updater))
:beta-hat)
```

This is regression analysis in a functional style, with a few simple lines

### The Kalman Filter #

Our sequential least squares actually solves another problem: finding the minimum variance predictor for a linear dynamical system driven by a stochastic process.

Consider state vectors \(x_t\) and observation vectors \(y_t\) that evolve as follows:

\[\begin{aligned} x_{t+1} &= A x_t + \delta_t \cr y_{t+1} &= C x_t + \epsilon_t, \end{aligned}\]

where \(\delta_t\) and \(\epsilon_t\) are each, say, iid
Normal, mean 0 noise with fixed covariance. (These are
called the *state noise* and *measurement noise,*
respectively.)

If \(\hat x_{t+1}\) (and \(\hat y_{t+1}\)) is the best predictor of \(x_{t+1}\) (and \(y_{t+1}\)) given observations up to time \(t\), then

\[ \hat x_{t+1} = A \hat x_t + K_t (y_t - \hat y_t), \]

which is of the same form we just computed.

This gives us a functional implementation of
the **Kalman Filter**.

### Markov Chain Monte Carlo #

**Markov Chain Monte Carlo (MCMC)** is a simulation method
where we create a Markov chain whose *limiting distribution*
is a distribution from which we want to sample.

```
(defn mcmc-step
"Make one step in MH chain, choosing random move."
[state moves]
(let [move (random-choice moves)]
(metropolis-hastings-step (move state))))
(defn mcmc-sample
"Generate an MCMC sample from an initial state.
Returns a lazy sequence.
Keyword arguments:
:move -- a collection of moves, which are
functions from state to a candidate
:select -- a selector function which indicates
a boolean for each index and state
if that state should be kept in the
output
:extract -- a function to extract features
(e.g., parameters) from the state
"
[initial-state & {:keys [moves select extract]
:or {select (constantly true),
extract identity}}]
(letfn [(stepper [index state]
(lazy-seq
(if (select index state)
(cons (extract state)
(stepper (inc index) (mcmc-step state moves)))
(stepper (inc index) (mcmc-step state moves)))))]
(stepper 0 initial-state)))
(def chain
(mcmc-sample initial-state
:select (mcmc-select :burn-in 1000 :skip 5)
:extract :theta1
:moves [(univariate-move :theta1 random-walk 0.1)
(univariate-move :theta2 random-walk 0.4)
(univariate-move :theta3 random-walk 0.2)]))
(take 100 chain)
```

The design of this chain is modular, and easily adaptable to a wide variety of models.