Functional Thinking (Week 3 Tuesday)

— Christopher Genovese and Alex Reinhart

Announcements #

  • Checking status
  • Office hours: Wed 3

Plan for Today #

  • Core Concepts of Functional Programming
  • Higher-Order Functions
  • A language for Types
  • Data-oriented Functional Programming: A Preview
  • Activity

Paradigm Shift #

A programming paradigm is a framework of organizing concepts and abstractions that guide how we design and build software and how we think about solving problems.

Two familiar and commonly used paradigms are:

object-oriented programming
the class is the fundamental unit of abstraction - data and code are wrapped together, encapsulated, on objects that represent the nouns of our system.
imperative programming
a program is a sequence of commands and actions for the computer to perform that successively update program state. An imperative program describes how to perform a task.

These two paradigms often overlap in practice and are so familiar that alternatives can be hard to see/

Today, we will start a dive into related paradigms – functional programming, data-oriented programming, and declarative programming – that offer powerful benefits to our thinking and practices.

First Class Entities #

An entity is first class when it can be:

  • created at run-time,
  • passed as a parameter to a function,
  • returned from a function, or
  • assigned into a variable.

A first class entity typically has built-in support within a language and runtime environment.

For instance, (fixed-width) integers are first-class entities in all common languages.

  • Created: 101
  • Passed: max(11, -4)
  • Returned: return 7
  • Assigned: i = 17

Similarly with floating-point numbers, booleans, strings, and arrays. Many languages have more specialized first-class entities:

  • Arbitrary precision numbers (Clojure, Mathematica, Java)
  • Hash Tables/Associative Arrays/Tables (Python, Clojure, Perl, Ruby, Lua)
  • Raw Memory Locations (C, C++)
  • Classes/Objects (Python, Ruby, C++, Java, \(\ldots\))
  • Types (Idris)
  • Unevaluated Code (Common Lisp, Racket, Scheme, Clojure, Rust, R)
  • Functions (Python, R, Haskell, Clojure, OCaml, Rust, JavaScript, \(\ldots\))

The first-class entities in a language shape a languages central idioms and abstractions.

Core Concepts of Functional Programming #

Composability is a fundamental aspiration in software design. We want the parts of our system to interoperate and combine as freely as possible. This offers power, expressiveness, reusability, and many other benefits.

In functional programming (FP), functions are the fundamental unit of abstraction providing simple, easily composed building blocks for solving problems.

Central features of FP:

  • Functions are first-class entities
  • Functions are pure (whenever possible)
  • Data is immutable (whenever possible)
  • Programs have declarative structure (as much as possible)
  • Exploit laziness when appropriate
  • Referential transparency

Why use functional programming?

  • Functions are simple, easily composed abstractions
  • Easy to express many ideas with very little code
  • Easier to reason about – and test – a functional program
  • Avoids the significant complexity of mutating state
  • Effective Concurrency/Parallelism by avoiding mutable state
  • Focuses on data flow
  • Efficiently exploits recursive thinking
  • Fast development cycle
  • Encourages code reuse

Languages designed for FP: Clojure, Haskell, OCaml, Scala, Elm

Languages with great FP support: Rust, R, Common Lisp, Julia, Ruby, JavaScript

Reluctant FP: Python

Stodgy languages adding FP features: C++, Java

First-Class Functions #

Having first-class functions means that functions are data too.

Some use cases:

  1. Parameterized strategies
  2. Generic operations - abstracting operations across a range of types
  3. Dynamically-defined operations based on parameters and data
  4. Combine multiple pieces into useful aggregate functions
apply(M, ACROSS_COLS, function(x) { max(x[!is.na(x) && x != 999]) })

pairwise.distances(X, metric = function(x, y) { max(abs(x - y)) })
integrate(lambda x: x*x, 0, 1)
// Abstracting Array Iteration
function forEach(array, itemAction) {
    for ( var i = 0; i < array.length; i++ ) {
        itemAction(array[i])
    }
}
forEach(["R", "SAS", "SPSS"], console.log);
forEach(["R", "SAS", "SPSS"], store);
forEach(["R", "SAS", "SPSS"], function(x) {myObject.add(x)});
def fwd_solver(f, z_init):
  "Fixed-point solver by forward iteration"
  z_prev, z = z_init, f(z_init)
  while np.linalg.norm(z_prev - z) > 1e-5:
    z_prev, z = z, f(z)
  return z

def newton_solver(f, z_init):
  "Newton iteration fixed-point solver"
  f_root = lambda z: f(z) - z
  g = lambda z: z - np.linalg.solve(jax.jacobian(f_root)(z), f_root(z))
  return fwd_solver(g, z_init)
;; *Markov Chain Monte Carlo (MCMC)* is a simulation method
;; where we create a Markov chain whose /limiting distribution/
;; is a distribution from which we want to sample.

(defn mcmc-step
  "Make one step in MH chain, choosing random move."
  [state moves]
  (let [move (random-choice moves)]
    (metropolis-hastings-step (move state))))

(def chain
  (mcmc-sample initial-state
               :select (mcmc-select :burn-in 1000 :skip 5)
               :extract :theta1
               :moves [(univariate-move :theta1 random-walk 0.1)
                       (univariate-move :theta2 random-walk 0.4)
                       (univariate-move :theta3 random-walk 0.2)]))
(take 100 chain)

Pure Functions #

What does this code do?

x = [5]
process( x )
x[0] = x[0] + 1

A function is pure if it:

  • always returns the same value when you pass it the same arguments
  • has no observable side effects

What are “side effects”?

  • Changing global variables
  • Printing out results
  • Modifying a database
  • Editing a file
  • Generating a random number
  • …anything else that changes the rest of the world outside the function.

Examples:

pure <- function(x) {
    return( sin(x) )
}

global.state <- 10

not.pure <- function(x) {
    return( x + global.state )
}

also.not.pure <- function(x) {
    return( x + rnorm(1) )
}

another.not.pure <- function(x) {
    save_to_file(x, "storex.txt")
    return( x + rnorm(1) )
}

u <- 10
and.again <- function(x) {
    ...
    print(...)              # Input/Output
    z <- rnorm(n)           # Changing internal state
    my.list$foo <- mean(x)  # Mutating objects' state
    u <<- u + 1             # Changing out-of-scope values
    ...
}
def foo(a):
    a[0] = -1

    return sum(a)

x = [1, 2, 3]

sum_of_x = foo(x)

x  #=> [-1, 2, 3]

Benefits of pure functions

  1. deterministic and mathematically well-defined
  2. easy to reason about
  3. easy to test
  4. easy to change
  5. can be composed and reused
  6. can be run simultaneously in concurrent processes
  7. can be evaluated lazily

Calculations, Actions, Data #

Actions
results depend on when they are called, how many times they are called, or context in which they are called
Calculations
operations results that do not affect the world when they run, processing inputs to output directly, independent of context or timing
Data
Facts about events, a record of something that happened, encodes meaning in structure

Useful to practice distinguishing among these three categories with your code.

Prefer calculations (pure functions) where possible and keep a clear delineation.

Immutable Data #

This is a common pattern in imperative programming:

a = initial value
for index in IndexSet:
    a[index] = update based on index, a[index], ....
return a

Each step in the loop updates – or mutates – the state of the object.

The familiarity of this operation obscures the tremendous increase in complexity it produces. Mutability couples different parts of your code and across time. Greatly complicates parallism and concurrency.

Immutable (persistent) data structures do not change once created.

  • We can pass the data to anywhere (even simultaneously), knowing that it will maintain its meaning.
  • We can maintain the history of objects as they transform.
  • With pure functions and immutable data, many calculations can be left to when they are needed (laziness).

These data structures achieve this with good performance by clever structure sharing. (Eg: pyrsistent library in Python, immer and immutable.js in JavaScript, built-in in Clojure, rstackdeque in R, \(\ldots\))

Favoring immutable data and pure functions makes values and transformations, rather than actions, the key ingredient of programs. So FP prefers expressions over statements

(foo (if (pred x) x (transform x)) y)
if pred(x):
    y = x
else:
    y = transform(x)
foo(y)

Declarative Structure #

A declarative program tells us what the program produces not how to produce it.

qsort []	= []
qsort l@(x:xs) = qsort small ++ mid ++ qsort large
  where
    small = [y | y <- xs, y < x]
    mid	  = [y | y <- l, y == x]
    large = [y | y <- xs, y > x]

Programming languages or packages often provide ways to thread data through multiple functions to clearly express the flow of operations:

tokenize <- function(line) {
    line %>%
        str_extract_all("([A-Za-z][-A-Za-z]+)") %>%
        unlist %>%
        sapply(tolower) %>%
        as.character
}

## much clearer than:
## as.character(sapply(tolower, unlist(str_extract_all("([A-Z]...)", line))))
(defn tokenize [line]
  (->> line
       (re-seq "([A-Za-z][-A-Za-z]+)")
       (map lower-case)))

Laziness (sometimes) #

Laziness refers to only computing a result when it is needed. This can be a powerful optimization, though it is not always desired. The opposite of lazy here is eager (or strict).

min = hd(qsort list)   -- Laziness makes this O(n) not O(n ln n)
fibs1 = 0 : 1 : zipWith (+) fibs1 (tail fibs1)
fibs2 = 0 : 1 : [ a+b | a <- fibs2 | b <- tail fibs2 ]

take 10 fibs1   -- prodcues [0,1,1,2,3,5,8,13,21,34]
(defn mcmc-sample
  "Generates an MCMC sample from an initial state.
   Returns a lazy sequence.

   Keyword arguments:

     :move    -- a collection of moves, which are
                 functions from state to a candidate
     :select  -- a selector function which indicates
                 a boolean for each index and state
                 if that state should be kept in the
                 output
     :extract -- a function to extract features
                 (e.g., parameters) from the state
  "
  [initial-state & {:keys [moves select extract]
                    :or {select (constantly true),
                         extract identity}}]
  (letfn [(stepper [index state]
            (lazy-seq
             (if (select index state)
               (cons (extract state)
                     (stepper (inc index) (mcmc-step state moves)))
               (stepper (inc index) (mcmc-step state moves)))))]
    (stepper 0 initial-state)))

Closures #

Closures are functions with an environment – variables and data – attached. The environment is persistent, private, and hidden. This is a powerful approach for associating state with functions that you pass into other functions. (In fact, an entire OOP system could be built from closures.)

counter <- function(start=0, inc=1) {
    value <- start

    return(function() {
        current <- value
        value <<- value + inc
        return(current)
    })
}

cc <- counter(10, 2)
dd <- counter(10, 2)

cc() # 10
cc() # 12
cc() # 14...
value  # Error: object 'value' not found
dd() # 10

kernel_density <- function(data, bandwidth) {
    return(function(x) {
        ## calculate the density here
        for (row in 1:nrow(data)) {
             ## calculate some stuff
        }
        return(density)
    })
}

d <- kernel_density(data, 4)

d(3) #=> returns density at 3

Only the anonymous function returned by counter can access that internal state: it is private and unique to each instance.

Higher-Order Functions #

Higher-order functions are functions that take other functions as arguments.

Three commonly used: map, filter, and reduce

Map #

The map operation takes a function and a collection and calls the function for each successive element of the collection, producing a new collection out of the results. For example, in R:

square <- function(x) x^2
Map(square, 1:10)   #=>  list(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

or in Python:

list(map(lambda x: x*x, range(1, 11)))  #=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

using an anonymous function, declared with the lambda keyword, that has no name.

map can usually do more than this. In most languages, it can take multiple sequences and a function that takes multiple arguments. It then calls the function with successive elements from each sequence as its arguments, producing a sequence of the results. For example, in R

Map(`-`, 1:10, 10:1)   #=>  list(-9, -7, -5, -3, -1, 1, 3, 5, 7, 9)

and in Python

list(map(lambda x, y: x - y, range(1, 11), range(10, 0, -1)))
   #=> [-9, -7, -5, -3, -1, 1, 3, 5, 7, 9]

We can use map to replace a common pattern we see over and over in code processing data:

transformed_data <- c()

for (ii in 1:length(data)) {
    transformed_data <- c(transformed_data, do_stuff_to(data[ii]))
}

## becomes

transformed_data <- Map(do_stuff_to, data)

The map operation expresses our meaning more clearly and concisely, and is more efficient to boot (since it does not repeatedly copy the transformed_data).

Filter #

The filter operation takes a predicate (a function returning a boolean) and a collection. It calls the predicate for each element, and returns a new collection containing only those elements for which the predicate returns true. For example, in R:

is_odd <- function(x) { (x %% 2 != 0) }

Filter(is_odd, 1:10) #=>  c(1, 3, 5, 7, 9)
Filter(is_odd, as.list(1:10)) #=>  list(1, 3, 5, 7, 9)

Notice that each result is the same type as the collection Filter was given. In Python,

filter(lambda x: x % 2 != 0, range(1, 11)) #=> [1, 3, 5, 7, 9]

We can always combine a map with a filter, applying a function to only those elements matching a predicate. The composability of these operations is a major advantage, and much easier to deal with than building complicated loops over data manually.

Reduce #

The reduce operation, also sometimes called fold, is a much more general way to process the elements of a sequence. We could use it to build map, filter, or many other interesting operations.

reduce takes as arguments a function, an accumulator, and a sequence. The function takes two arguments – the accumulator and one element of the sequence – and returns an updated accumulator. The function is first passed with the initial accumulator and the first element of the sequence, returning a new accumulator which is passed to the function again with the second element of the sequence, and so on. The final value of the accumulator is returned by reduce.

For example, suppose we want to add up the numbers in a list, but keep separate sums of the odd numbers and the even numbers. In R:

parity_sum <- function(accum, element) {
    if ( element %% 2 == 0 ) {
        list(accum[[1]] + element, accum[[2]])
    } else {
        list(accum[[1]], accum[[2]] + element)
    }
}

Reduce(parity_sum, 1:10, list(0,0))  #=> list(30, 25)

In Python:

def parity_sum(acc, x):
  even, odd = acc
  if x % 2 == 0:
      return [even + x, odd]
  else:
      return [even, odd + x]

reduce(parity_sum, range(1,11), [0,0])  #=> [30, 25]

Data-Oriented Functional Programming: A Preview #

An approach to design that makes the data entities in a system a central concern. This is an opinionated flavor of functional programming that suggests practices to reduce complexity, maximize reuse, and improve correctness.

Let’s start with a few core principles that we will dive into in the next few activities.

Principle #1: Separate Data and Code #

Design operations to operate on data that is separately described and stored. Keep state out of your functions.

Principle #2: Use Immutable Data When Possible #

Reduces complexity at very small performance penalty for general use. Transient mutability is fine in tight bottlenecks.

Principle #3: Represent Data with Generic Data Structures #

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.

– Alan Perlis

Principle #4: Describe Complex Data Models #

Type systems and schema can provide rich descriptions of our data models, offer validation, test generation, auto documentation, and many other benefits.

A Language for Types #

It helps to have a language for describing the shape of data. In some languages the shape is specified at compile time as types (via a type system), and in others described at compile and/or run time via schemas (e.g., via JSON Schema, Malli, etc.).

We will borrow the language of types to get a kind of pseudocode for describing the shape of data.

data Shape = Circle Float Float Float | Rectangle Float Float Float Float

data UnaryIntFn = Int -> Int
data BinaryIntFn = Int -> Int -> Int

data Expr a
  = Literal { intVal :: Int }
  | Ident   { name :: String  }
  | Index   { target :: a, idx :: a }
  | Unary   { op :: String, target :: a }
  | Binary  { lhs :: a, op :: String, rhs :: a }
  | Call    { func :: a, args :: [a] }
  | Paren   { subexpr :: a }

data List a = Nil | Cons a (List a)

data BTree a = Leaf a | Branch (BTree a) (BTree a)

FP Activity I (Homework init ) #

Today, we’ll consider a problem that we will tackle next time; this will be a homework exercise dominoes available to do.

Here is a motivating version: https://adventofcode.com/2017/day/24