Assertions

— Alex Reinhart and Christopher Genovese

An assertion is a statement that a condition is true.

Assertions are intended to state conditions that are expected to always be true, and an assertion failure usually leads to program termination. If the assertion is true, nothing happens.

A couple of quick examples, first in R and then in C++:

library(assertthat)

foo <- function(x) {
    assert_that(x >= 0)
    # ... do stuff with x
}

#include <cassert>

int adjust(int base, int increment) {
    assert(base >= 0);
    assert(increment % 2 == 0);
    // ...
}

Assertions inside functions add overhead, since the condition must be checked every time the code runs. Hence assertions are sometimes disabled once development is complete and the code is in use.

(Example: In C++, simply defining a macro NDEBUG disables all assertions.)
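Python has a similar switch. As a minimal sketch, the function below mirrors the C++ adjust example; its return value is an assumption made only so the example runs. Running the interpreter with the -O option strips assert statements, and the built-in constant __debug__ reports whether they are active.

```python
def adjust(base, increment):
    # Internal sanity checks; Python skips these when run as `python -O`.
    assert base >= 0, "base must be nonnegative"
    assert increment % 2 == 0, "increment must be even"
    # Hypothetical body, not from the original example.
    return base + increment


def assertions_enabled():
    # __debug__ is True in a normal run and False under `python -O`.
    return __debug__
```

Because the checks can vanish this way, code should never rely on an assert for behavior it needs in production.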

What are assertions for?

Assertions are a debugging aid. They are used to help the programmer detect and fix bugs (conditions that should not occur) and verify that the underlying assumptions remain true.

For example, while working on a function to fit a complicated model, you may know that certain parameters must remain in a certain range if the model is fit correctly. If you add assertions, you will catch errors in your code before they give you nonsense results:

def fit(data, ...):

    for it in range(max_iterations):
        # iterative fitting code here
        ...

        # Plausibility check
        assert np.all(alpha >= 0), "negative alpha"
        assert np.all(theta >= 0), "negative theta"
        assert omega > 0, "Nonpositive omega"
        assert eta2 > 0, "Nonpositive eta2"
        assert sigma2 > 0, "Nonpositive sigma2"

    ...

In this same model fitting algorithm, you might know that the log-likelihood is guaranteed to increase at each iteration – if it does not, something is wrong with your code. This could be an assertion.
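Here is one way such a check might look. This is a toy sketch, not the model above: it assumes a two-component normal mixture with known means (0 and 3) and unit variances, and uses EM to estimate only the mixing weight. The names fit_weight, log_likelihood, and tol are hypothetical. EM should never decrease the log-likelihood, so a decrease beyond floating-point tolerance signals a bug.

```python
import math


def normal_pdf(x, mean):
    # Density of a normal distribution with unit variance.
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)


def log_likelihood(data, weight):
    return sum(
        math.log(weight * normal_pdf(x, 0.0) + (1 - weight) * normal_pdf(x, 3.0))
        for x in data
    )


def fit_weight(data, weight=0.5, max_iterations=50, tol=1e-9):
    previous = log_likelihood(data, weight)
    for it in range(max_iterations):
        # E step: probability that each point came from the first component.
        resp = [
            weight * normal_pdf(x, 0.0)
            / (weight * normal_pdf(x, 0.0) + (1 - weight) * normal_pdf(x, 3.0))
            for x in data
        ]
        # M step: the new weight is the average responsibility.
        weight = sum(resp) / len(resp)

        # EM guarantees this; a failure means the code above is wrong.
        current = log_likelihood(data, weight)
        assert current >= previous - tol, f"log-likelihood decreased at iteration {it}"
        previous = current
    return weight
```

The small tolerance keeps rounding error from triggering the assertion when the algorithm has essentially converged.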

In general, if you find yourself thinking something like “I need to calculate this quantity, but I’m pretty sure this variable will always be [something], so I can just do [something else] instead”, that’s an assertion that can be written into the code to catch if your assumption is wrong.
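As a small illustration of that thought pattern, suppose you are "pretty sure" a list is always sorted, so you use binary search instead of a linear scan. The function name contains is hypothetical; bisect is part of the Python standard library.

```python
import bisect


def contains(sorted_values, target):
    # Assumption made explicit: binary search is only valid on sorted input.
    assert all(a <= b for a, b in zip(sorted_values, sorted_values[1:])), \
        "values are not sorted"

    i = bisect.bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target
```

Note that this check scans the whole list, so it costs more than the search itself. That is one reason assertions like it are sometimes disabled once the code is trusted.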

Assertions are not a substitute for error handling. They are not, for instance, intended to detect and handle erroneous input or a bad program state that is not due to a bug in the program.

In fit, for example, we can’t recover from omega <= 0. It should be impossible, so if it happens, our code is wrong. There is nothing sensible to do to fix it.

(However, in languages meant to be used interactively, like R, they are often used to catch when users call functions in incorrect ways.)

For problems that can be recovered from – like a file that couldn’t be opened – we have errors and exceptions.
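The contrast might be sketched as follows. All function names here are hypothetical. A missing file is an expected failure that the caller can recover from, so it is handled as an exception; a negative variance can only come from a bug, so it is an assertion.

```python
def read_numbers(path):
    """Read one number per line from a text file."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def load_or_default(path, default):
    # Not a bug: the file may legitimately be absent, and we can carry on.
    try:
        return read_numbers(path)
    except FileNotFoundError:
        return list(default)


def variance(values):
    n = len(values)
    mean = sum(values) / n
    result = sum((v - mean) ** 2 for v in values) / n
    # Impossible if the code is right; there is nothing to recover to.
    assert result >= 0, "negative variance"
    return result
```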

Assertions provide both run-time checking (especially during development) and also a simple documentation of the underlying assumptions.

R users can use the assertthat package; Python has the assert statement built into the language, as do some other languages.

How are assertions different from tests?

Assertions seem similar to unit tests: a unit test asserts that a function does a certain thing, right?

While this is true, we use assertions for more than just testing.

A unit test provides a function certain inputs and makes sure the outputs are correct. Unit tests exist outside the function, calling it in different ways to check its overall behavior.

But we write assertions inside functions, verifying that certain conditions are always true.

A unit test verifies that a function is correct. An assertion verifies that the users of the function use it correctly, and that required properties are true every time it runs.
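To make the distinction concrete, here is a short sketch with a hypothetical normalize function. The assertions sit inside the function and run on every call; the unit test sits outside and checks a few chosen cases.

```python
def normalize(weights):
    """Scale nonnegative weights so that they sum to 1."""
    # Assertions: checked every time the function runs, for every caller.
    assert all(w >= 0 for w in weights), "weights must be nonnegative"
    total = sum(weights)
    assert total > 0, "weights must not all be zero"

    result = [w / total for w in weights]
    assert abs(sum(result) - 1.0) < 1e-9, "normalized weights do not sum to 1"
    return result


def test_normalize():
    # Unit test: calls the function from outside with known inputs.
    assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]
    assert normalize([5]) == [1.0]
```

The test only covers the inputs we thought to try, while the assertions also guard calls we never anticipated.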

There is another common idiom we often see assertions used for: checking the inputs and outputs of a function to make sure they meet a contract. We’ll get to that idea, Design by Contract, in a moment.

An assertion exercise

Open the documents repository on your computer and do a git pull.

Open the files Activities/errors-assertions/shift-the-mean.r and Activities/errors-assertions/naughty-user.r in RStudio or another editor.

Try running naughty-user.r. You’ll see that it hits an error in Case 1, because the user has tried to call mean_shift_trajectories with invalid arguments.

For each case:

  1. Try running the code. If it throws an error, examine the error message; if it doesn’t, ask if it should throw an error.
  2. Look at the code in shift-the-mean.r and figure out the cause of the problem.
  3. Determine if there is an assertion you can add that would catch this error sooner, and make its cause clearer to the naughty user. Add the assertion(s) and run the code again, checking that you get a more useful error. Use the assert_that function from the assertthat package to implement your assertions.

Repeat this for each case.

Design by Contract

Design by Contract is a design methodology in which programmers define precise and verifiable specifications for software components. The contract is useful for both debugging and documentation.

Design by Contract is based on the metaphor of a legal contract between two parties that defines the obligations embodied in a transaction (e.g., a function call).

The main ingredients of a contract (usually at the function or class level) are

preconditions: conditions that should be true just prior to execution
postconditions: conditions that should be true immediately after execution
invariants: conditions that should be true during execution
side effects: modified state or observable interaction with the outside world
errors: error conditions that can occur
returns: values, types, and meaning of what is returned
guarantees: performance (time or space), validity (ACID), etc.

The first three are the most commonly used. Some languages, like Eiffel and Racket, have sophisticated built-in contract systems:

(define/contract (foo x y)
  (-> positive? positive? positive?)
  (+ (* x x) (* y y)))

This defines foo to have a contract that its arguments are positive and it returns a positive number. If we violate the contract, Racket tells us what code is to blame – the calling code, in this case:

(foo -2 3)

foo: contract violation
  expected: positive?
  given: -2
  in: the 1st argument of
      (-> positive? positive? positive?)
  contract from: (function foo)
  blaming: main
   (assuming the contract is correct)
  at: bad-code.rkt:3.14

You can add contracts to Python with an extra module, such as PyContracts:

from contracts import contract

@contract(lines='list(str)',
          returns='dict(str: (int,>=1))')
def word_count(lines):
    result = {}

    for line in lines:
        for word in line.split():
            result[word] = result.get(word, 0) + 1

    return result

In some implementations, new types of contracts can be defined separately and reused:

from contracts import contract, new_contract

@new_contract
def even(x):
    # A contract checker only validates its argument.
    if x % 2 != 0:
        msg = 'The number %s is not even.' % x
        raise ValueError(msg)

@contract(x='int,even')
def foo(x):
    # do stuff
    ...

foo(2)  # satisfies the contract
foo(3)  # violates the contract

contracts.interface.ContractNotRespected: Breach for argument 'x' to foo().
The number 3 is not even.
checking: callable()   for value: Instance of int: 3
checking: even         for value: Instance of int: 3
checking: int,even     for value: Instance of int: 3
Variables bound in inner context:
- args: Instance of tuple: ()
- kwargs: Instance of dict: {}

Other languages take this to an extreme: SPARK (based on Ada) analyzes each function and tries to logically prove that it satisfies the specified contract, and reports an error if it cannot. This is done statically, without even running the code.

In languages without built-in contracts, like R, we can use assertions inside functions to check pre- and post-conditions when the code runs. These don’t give such elegant error messages, but serve the same purpose. Many R users use assertions primarily for pre-condition checks, and the assertthat package is designed for this.
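The same approach works in Python with plain assert statements. As a minimal sketch, the hypothetical function below restates the Racket contract for foo (positive arguments, positive result) as a pre- and post-condition check.

```python
def sum_of_squares(x, y):
    # Preconditions: the caller's obligations.
    assert x > 0, f"precondition violated: x must be positive, got {x}"
    assert y > 0, f"precondition violated: y must be positive, got {y}"

    result = x * x + y * y

    # Postcondition: this function's obligation to the caller.
    assert result > 0, "postcondition violated: result must be positive"
    return result
```

Unlike Racket's contract system, nothing here assigns blame automatically, so the assertion messages have to say which side broke the contract.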

A brief exercise

Suppose you have a function shortest_path(graph, start_node, end_node) that is intended to calculate the shortest path between two nodes in an undirected graph. Write a contract specifying the pre- and post-conditions the function must satisfy.

You can write informally, such as just saying “graph must be a graph object, and start_node must be a…”, instead of using a specific syntax.

Assume you have a range of useful functions like is_graph, is_node, graph_contains_node, and so on.

What conditions did you specify?