Back to top

Unit Testing

— Christopher Genovese and Alex Reinhart

Complexity can have consequences!

Whether it is

the crash of the Mars Climate Orbiter (1998),
a failure of the national telephone network (1990),
a deadly medical device (1985, 2000),
a massive Northeastern blackout (2003),
the Heartbleed, Goto Fail, Shellshock exploits (2012–2014),
a 15-year-old fMRI analysis software bug that inflated significance levels (2015),

bugs will happen. It is hard to know whether a piece of software is actually doing what it is supposed to do. It is easy to write a thousand lines of research code, then discover that your results have been wrong for months.

Discipline, design, and careful thought are all helpful in producing working software. But even more important is effective testing, and that is the central topic for today.

A Common (Interactive) Workflow #

Write a function.
Try some reasonable values at the REPL to check that it works.
If there are problems, maybe insert some print statements, and modify the function.
Repeat until things seem fine.

(REPL: Read-Eval-Print Loop, the console at which you type commands and view results, such as you might use in RStudio or Jupyter Notebooks.)

This will leave you with a lot of bugs. Worse, you can’t keep track of what cases you tested, so when you return to edit your code, you can’t be sure your edits don’t break some of those cases.

For example, suppose you’re modeling some binomial data. One argument to your function is the number of successes in the trials. One day you decide it’s more convenient to use the number of failures, so you change the function and the functions that call it. But you’ve forgotten about some other functions that call it – and you may never notice that they’re now giving the wrong answers. If you do, it may take hours to track down the cause.

This sounds far-fetched, but variations on it happen all the time.

So how can we more systematically test our code for correctness?

One method is unit testing.

Unit Testing #

A “unit” is a vaguely defined concept that is intended to represent a small, well-defined piece of code. A unit is usually a function, method, class, module, or small group of related classes.

A test is simply some code that calls the unit with some inputs and checks that its answer matches an expected output.

Unit testing consists of writing tests that are

focused on a small, low-level piece of code (a unit)
typically written by the programmer with standard tools
fast to run (so can be run often, i.e. before every commit).

The benefits of unit testing are many, including

Exposing problems early
Making it easy to change (refactor) code without forgetting pieces or breaking things
Simplifying integration of components
Providing natural documentation of what the code should do
Driving the design of new code.

Research has shown that unit testing costs you in time written for tests, but dramatically reduces the number of bugs later found in the software. Just ask this former PhD student:

Personal experience has shown that nothing is quite as satisfying as writing a bunch of tests for some code, realizing you can rewrite the code in a much better way, and having it pass all the tests on the first try.

Nonetheless, many people seem to have learned testing from this book:

Components of a Unit Testing Framework #

A test is a collection of one or more assertions executed in sequence, within a self-contained environment. The test fails if any of those assertions fail.

Each test should focus on a single function or feature and should be well-named so that a failed test lets you know precisely where to look in your code for the problem:

test_that("Conway's rules are correct", {
    # conway_rules(num_neighbors, alive?)
    expect_true(conway_rules(3, FALSE))
    expect_false(conway_rules(4, FALSE))
    expect_true(conway_rules(2, TRUE))
    ...
})

(This uses the testthat package for R, available from CRAN.)

A test suite is a collection of related tests in a common context. A test suite can make provisions to prepare the environment for each test and clean up after (load a big dataset, connect to a database…). A test suite might also make available “test doubles” (think stunt double) that mimic or partially implement other features in the code (e.g., database access) to enable the unit to be tested as independently as possible.

Test suites are run and the results reported, particularly failures, in a easy to parse and economical style. For example, Python’s unittest can report like this:

$ python test/trees_test.py -v

test_crime_counts (__main__.DataTreeTest)
Ensure Ks are consistent with num_points. ... ok
test_indices_sorted (__main__.DataTreeTest)
Ensure all node indices are sorted in increasing order. ... ok
test_no_bbox_overlap (__main__.DataTreeTest)
Check that child bounding boxes do not overlap. ... ok
test_node_counts (__main__.DataTreeTest)
Ensure that each node's point count is accurate. ... ok
test_oversized_leaf (__main__.DataTreeTest)
Don't recurse infinitely on duplicate points. ... ok
test_split_parity (__main__.DataTreeTest)
Check that each tree level has the right split axis. ... ok
test_trange_contained (__main__.DataTreeTest)
Check that child tranges are contained in parent tranges. ... ok
test_no_bbox_overlap (__main__.QueryTreeTest)
Check that child bounding boxes do not overlap. ... ok
test_node_counts (__main__.QueryTreeTest)
Ensure that each node's point count is accurate. ... ok
test_oversized_leaf (__main__.QueryTreeTest)
Don't recurse infinitely on duplicate points. ... ok
test_split_parity (__main__.QueryTreeTest)
Check that each tree level has the right split axis. ... ok
test_trange_contained (__main__.QueryTreeTest)
Check that child tranges are contained in parent tranges. ... ok

----------------------------------------------------------------------
Ran 12 tests in 23.932s

OK

Test packages often make it easy to run an entire suite all at once. R’s testthat package provides a simple function to run all tests in a directory:

test_dir("tests/")

This prints out a summary of results. You could automate this with a simple script, say test-all.R, that runs all of your tests at once; if you use devtools for your R code, it can also do this for you. If you write an R package, there is a standard structure for setting up tests so you can run all the package’s tests at once, automatically.

Wait, What Do I Test? #

So what should be included in tests?

A core principle: tests should be passed by a correct function, but not by an incorrect function.

That seems obvious, but it’s deeper than it looks. You want your tests to stress your functions: try them in multiple ways to make them break. Test corner cases. After all, we could write this test:

test_that("Addition is commutative", {
    expect_equal(add(1, 3), add(3, 1))
})

which is a perfectly good test, but is also passed by these two functions:

add <- function(a, b) {
    return(4)
}

add <- function(a, b) {
    return(a * b)
}

So this test is easily passed by incorrect functions! We need to test more thoroughly.

Always try to test:

several specific inputs for which you know the correct answer
“edge” cases, like a list of size zero or size eleventy billion
special cases that the function must handle, but which you might forget about months from now
error cases that should throw an error instead of returning an invalid answer
previous bugs you’ve fixed, so those bugs never return.

Try to cover all branches of your function: that is, if your function has several different things it can do depending on the inputs, test inputs for each of these different things.

Test Exercises #

Scenario 1. Find the maximum sum of a subsequence #

Function name: max_sub_sum(arr)

Write a function that takes as input a vector of n numbers and returns the maximum sum found in any contiguous subvector of the input. (We take the sum of an empty subvector to be zero.)

For example, in the vector [1, -4, 4, 2, -2, 5], the maximum sum is 9, for the subvector [4, 2, -2, 5].

There’s a clever algorithm for doing this fast, with an interesting history related to our department. But that’s not important right now. How do we test it?

(If you want to implement it – there’s a homework problem for that! Try the max-sub-sum exercise.)

Test ideas?

Scenario 2. Create a half-space function for a given vector #

Function name: half_space_of(point)

Given a vector in Euclidean space, return a boolean function that tests whether a new point is in the positive half space of the original vector. (The vector defines a perpendicular plane through the origin which splits the space in two: the positive half space and the negative half space.) An example:

foo <- half_space_of(c(2, 2))

foo(c(1, 1)) == TRUE

Test ideas?

Scenario 3. What’s the closest pair of points? #

Function name: closest_pair(points)

Given a set of points in the 2D plane, find the pair of points which are the closest together, out of all possible pairs.

Later we will learn a good algorithm, using dynamic programming, to solve this without comparing all possible pairs. For now, let’s think of tests.

Question: are there properties of closest_pair that can be tested? Could we generate random data and test that these properties hold?

Ideas?

Tests in Practice #

Tests are commonly kept in separate source files from the rest of your code. In a long-running project, you may have a test/ folder containing test code for each piece of your project, plus any data files or other bits needed for the tests.

For example, in my thesis, I had:

$ ls test/

background_test.py  covariance_test.py  kde_test.py         spatial_test.py
bayes_test.nb       crimedata_test.py   likelihood_test.nb  test-data.h5
bayes_test.py       em_test.nb          likelihood_test.py  trees_test.py
bbox_test.py        em_test.py          __pycache__/
bg-test.h5          emtools_test.py     rect_bg_1.png
cd-dump.h5          geometry_test.py    regress-fit.h5

and I can run all the tests with a single command. You should choose a similar structure: separate files with tests, plus a script or shell command that runs all the tests (e.g. using testthat’s test_dir function or Python’s pytest module).

The goal is to make tests easy to run, so you run them often. Run your tests before you commit new code, after you make any interesting changes, and whenever you fix bugs (remember to add a test for the bug!). You should always be confident that your code is well tested.

Testing Your Homework #

We will expect you to write tests for all the code you submit in homework assignments. The new-homework script will create a test file automatically for you. Check out the Testing Tutorial below for details.

Test-Driven Development #

Unit testing can be the basis of a software design approach.

Test Driven Development (TDD) uses a short development cycle for each new feature or component:

Write tests that specify the component’s desired behavior. The tests will initially fail as the component does not yet exist.
Create the minimal implementation that passes the test.
Refactor the code to meet design standards, running the tests with each change to ensure correctness.

Why work this way?

Writing the tests may help you realize what arguments the function must take, what other data it needs, and what kinds of errors it needs to handle.
The tests define a specific plan for what the function must do.
You will catch bugs at the beginning instead of at the end (or never).
Testing is part of design, instead of a lame afterthought you dread doing.

Try it. We will expect to see tests with your homework anyway, so you might as well write the tests first!

More Types of Testing #

Regression Testing
- seeks to uncover new bugs introduced after changes
- often done by re-running old tests and comparing with passing results
- good for testing entire analyses to be sure nothing changes unexpectedly
Integration Testing
- test how system components fit together and interact
Acceptance Testing
- does the system meet its overall specifications?
- blackbox system tests

Top-down Testing

specifies test requirements in terms of constants and metaconstants
allows you to write functions in terms of other functions not yet written

Top-down test, where account-number is implemented in terms of digit-parcels and digit, which are not yet implemented:

(unfinished digit-parcels digit)

(fact "an account number is constructed from character parcels"
  (account-number ..parcel..) => "01"
  (provided
    (digit-parcels ..parcel..) => [..0-parcel.. ..1-parcel..]
    (digit ..0-parcel..) => 0
    (digit ..1-parcel..) => 1))

Others as well (functional, usability, …)

A Testing Tutorial #

So how do you go about writing tests? A few rules of thumb:

Keep tests in separate files from the code they test. This makes it easy to run them separately.
Give tests names. Testing frameworks usually let you give the test functions names or descriptions. test_1 doesn’t help you at all, but test_tree_insert makes it easy for you to remember what the test is for.
Make tests replicable. If a test involves random data, what do you do when the test fails? You need some way to know what random values it used so you can figure out why the test fails.
Use tests instead of the REPL. If you’re building a complicated function, write the tests in advance and use them to help you while you write the function. If you debug your code by calling the function on the REPL with various arguments, you’ll waste a lot of time compared to writing those calls into a test and running it every time you make changes.

Python #

In Python, we recommend using the pytest package, which makes testing very easy. (unittest is built in, but more complicated to use.) Simply install it by running

pip install -U pytest

Now suppose you’ve written some functions in foobar.py that you would like to test. The standard Python convention is to place tests in a separate file with a name starting with test_, so we would create test_foobar.py containing:

from foobar import foo, bar

def test_foo_values():
    assert foo(4) == 8
    assert foo(2.2) == 1.9

def test_bar_limits():
    assert bar(4, [1, 90], option=True) < 8

# If you want to test that bar() raises an exception when called with certain
# arguments, e.g. if bar() should raise an error when its argument is negative:
def test_bar_errors():
    with pytest.raises(ValueError):
        bar(-4, [2, 10])  # test passes if bar raises a ValueError

When you use new-homework to start an assignment, a test file is created for you automatically.

To run the tests, just run

pytest -v test_foobar.py

That’s it! You use ordinary Python assert statements and you can assert anything. pytest will automatically find the functions whose names begin with test, run them, and report the results. You do not need to write print statements to print out results; pytest will do that automatically.

For more detail, check out the pytest Getting Started guide, which covers things like asserting that a function raises a certain exception and links to the rest of the pytest documentation.

R #

In R, we recommend using testthat to write your unit tests. Install the package using install.packages as usual.

Now suppose you’ve written some functions in foobar.R that you would like to test. The standard R convention is to place tests in a separate file with a name starting with test_, so we would create test_foobar.R containing

library(testthat)

source("foobar.R")

test_that("foo values are correct", {
    expect_equal(foo(4), 8)
    expect_equal(foo(2.2), 1.9)
})

test_that("bar has correct limits", {
    expect_lt(bar(4, c(1, 90), option = TRUE), 8)
})

test_that("bar throws an error on bad inputs", {
    expect_error(bar(-4, c(1, 10))) # test passes if bar calls stop() or throws an error here
})

Notice that we use expect_equal and expect_lt. This way, when a test fails, testthat can print out a helpful error message explaining what the test was supposed to do. You do not need to write cat commands to print out results; testthat will print results if any tests fail.

When you use new-homework to start an assignment, a test file will be created automatically for you.

To run the tests, just run

Rscript test_foobar.R

For more detail, check out the testthat reference guide. You can test many things beyond equality, including checking that errors are raised, and there are tools to run all tests in a directory and report results with varying levels of detail.

Test Framework References #

For whatever language you use, there is likely already a unit testing framework which makes testing easy to do. No excuses!

R #

See the Testing chapter from Hadley Wickham’s R Packages book for examples using testthat.

testthat (recommended, friendly and easy)
RUnit (standard xUnit style)

Python #

pytest (more ergonomic, recommended)
unittest (built-in, nice)
Hypothesis (generative testing)

Java #

JUnit (the original)

Clojure #

clojure.test (built in)
midje (excellent!)
test.check (generative)

JavaScript #

Karma (cf. Protractor)
Buster.js
QUnit
Mocha
Jasmine (client side, BDD, no DOM)

Unit Testing

A Common (Interactive) Workflow #

Unit Testing #

Components of a Unit Testing Framework #

Wait, What Do I Test? #

Test Exercises #

Scenario 1. Find the maximum sum of a subsequence #

Scenario 2. Create a half-space function for a given vector #

Scenario 3. What’s the closest pair of points? #

Tests in Practice #

Testing Your Homework #

Test-Driven Development #

More Types of Testing #

A Testing Tutorial #

Python #

R #

Test Framework References #

R #

Python #

Java #

Clojure #

JavaScript #

C++ #

Haskell #

Behavior-Driven Development Frameworks #

Course Info

Sessions

Integration

Tools

Practices

Concepts

Algorithms

Methods

Databases

Tips