Announcements #
- HW2 (classification-tree-basic) is now online
- Office Hours: Thu 3-4, plus by appointment
- Questions?
Plan #
- Brief Commentary
- Comment on pausing folds
- Practices: Testing
- Sprint Activity: State Machines
Testing #
Motivation: Complexity can have consequences! #
Whether it is
- the crash of the Mars Climate Orbiter (1998),
- a failure of the national telephone network (1990),
- a deadly medical device (1985, 2000),
- a massive Northeastern blackout (2003),
- the Heartbleed, Goto Fail, Shellshock exploits (2012–2014),
- a 15-year-old fMRI analysis software bug that inflated significance levels (2015),
bugs will happen. It is hard to know whether a piece of software is actually doing what it is supposed to do. It is easy to write a thousand lines of research code, then discover that your results have been wrong for months.
Discipline, design, and careful thought are all helpful in producing working software. But even more important is effective testing, and that is the central topic for today.
Quick Terminology #
- test
- a function that runs other code in a library or application codebase and checks its results for correctness
- test suite
- a collection of tests on a related theme
- unit
- (loosely) a piece of code with a small, well-defined purpose and scope.
- assertion
- a claim about the state of a program or the result of a test
- unit test
- an automated test of a unit, usually checking the result with several different inputs or configurations
- property
- a logical invariant that a piece of code should satisfy
- property test
- a test of a property that generates many random inputs consistent with the property's assumptions and, upon failure, attempts to produce a near-minimal example that causes failure
- fixture
- a context or setting that needs to be set up before tests are run (and usually torn down afterwards); fixtures often create fakes/stubs/mocks that simulate entities our code would interact with in reality
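To make the fixture idea concrete, here is a minimal sketch using Python's built-in unittest module. Its setUp and tearDown methods build and dismantle the fixture around every test. The function count_lines and the file contents are hypothetical; pytest expresses the same idea with functions decorated by @pytest.fixture.

```python
import os
import shutil
import tempfile
import unittest


def count_lines(path):
    """Hypothetical unit under test: count the lines in a text file."""
    with open(path) as f:
        return sum(1 for _ in f)


class CountLinesTest(unittest.TestCase):
    def setUp(self):
        # Fixture setup: a fresh temporary directory holding a known data file
        self.tmpdir = tempfile.mkdtemp()
        self.path = os.path.join(self.tmpdir, "data.txt")
        with open(self.path, "w") as f:
            f.write("a\nb\nc\n")

    def tearDown(self):
        # Fixture teardown: remove everything setUp created
        shutil.rmtree(self.tmpdir)

    def test_counts_lines(self):
        self.assertEqual(count_lines(self.path), 3)

    def test_count_after_append(self):
        # Modifies the file; safe because setUp rebuilds it for every test
        with open(self.path, "a") as f:
            f.write("d\n")
        self.assertEqual(count_lines(self.path), 4)
```

Because setUp runs before each test, the second test's change to the file cannot leak into the first, whichever order they run in.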
Unit Testing #
A “unit” is a vaguely defined concept that is intended to represent a small, well-defined piece of code. A unit is usually a function, method, class, module, or small group of related classes.
A test is simply some code that calls the unit with some inputs and checks that its answer matches an expected output.
Unit testing consists of writing tests that are
- focused on a small, low-level piece of code (a unit)
- typically written by the programmer with standard tools
- fast to run (so they can be run often, e.g., before every commit).
The benefits of unit testing are many, including
- Exposing problems early
- Making it easy to change (refactor) code without forgetting pieces or breaking things
- Simplifying integration of components
- Providing natural documentation of what the code should do
- Driving the design of new code.
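A minimal sketch of unit tests in the pytest style: plain functions named test_*, each calling the unit with one input and asserting on the result. The median function is a hypothetical unit chosen for illustration. The error check uses try/except so the block runs without pytest installed; under pytest one would normally write it with pytest.raises.

```python
def median(xs):
    """Hypothetical unit under test: the median of a non-empty sequence of numbers."""
    if len(xs) == 0:
        raise ValueError("median of an empty sequence is undefined")
    s = sorted(xs)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even length: average the middle pair


# Each test_* function checks one behavior; pytest discovers and runs them automatically.
def test_median_odd_length():
    assert median([3, 1, 2]) == 2


def test_median_even_length_averages_middle_pair():
    assert median([4, 1, 3, 2]) == 2.5


def test_median_single_element():
    assert median([7]) == 7


def test_median_empty_input_raises():
    # Under pytest: `with pytest.raises(ValueError): median([])`
    try:
        median([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Each test is small and fast, and the test names double as documentation of what median is supposed to do.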
Property (aka Generative) Testing #
Property testing is a powerful technique for automatically testing logical invariants without having to create many small examples by hand. A property is a claim made about a specified set of quantities; we specify the set through generators that produce values of particular types and shapes. Generators may be combined to form new generators that are more specific to our needs.
Two examples show the idea.
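```clojure
(require '[clojure.test.check :refer [quick-check]]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :refer [for-all]])

;; Property: sorting an already-sorted vector changes nothing
(def sort-idempotent-prop
  (for-all [v (gen/vector gen/int)]
    (= (sort v) (sort (sort v)))))

(quick-check 100 sort-idempotent-prop)
;; => {:result true,
;;     :pass? true,
;;     :num-tests 100,
;;     :time-elapsed-ms 28,
;;     :seed 1528580707376}
```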
```python
from hypothesis import given
from hypothesis.strategies import text

# encode and decode are the functions under test, assumed to be defined elsewhere
@given(text())
def test_decode_inverts_encode(s):
    assert decode(encode(s)) == s
```
One of the most useful features of property/generative testing is shrinking: when a test fails, the framework searches for a (roughly) minimal, simple example that still fails, which makes the problem easier to identify.
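Here’s a simple example using a deliberately false property of sort: it claims that the first element of a sorted, non-empty vector is strictly less than the last, which fails whenever all the elements are equal (for instance, in a one-element vector). quick-check finds the failing input [-3] and shrinks it to [0]: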
```clojure
(def sorted-first-less-than-last-prop
  (for-all [v (gen/not-empty (gen/vector gen/int))]
    (let [s (sort v)]
      (< (first s) (last s)))))

(quick-check 100 sorted-first-less-than-last-prop)
;; => {:num-tests 5,
;;     :seed 1528580863556,
;;     :fail [[-3]],
;;     :failed-after-ms 1,
;;     :result false,
;;     :result-data nil,
;;     :failing-size 4,
;;     :pass? false,
;;     :shrunk
;;     {:total-nodes-visited 5,
;;      :depth 2,
;;      :pass? false,
;;      :result false,
;;      :result-data nil,
;;      :time-shrinking-ms 1,
;;      :smallest [[0]]}}
```
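To see what a property-testing library does behind the scenes, here is a toy sketch of the generate, check, and shrink loop in plain Python. All names (gen_nonempty_int_list, shrink_candidates, shrink, quick_check) are hypothetical and only loosely mirror test.check; real libraries such as test.check and Hypothesis use far more sophisticated generators and shrinking strategies. Shrinking here is greedy: repeatedly move to the first "simpler" input (one element removed, or one element moved toward zero) that still fails.

```python
import random


def gen_nonempty_int_list(rng):
    """Hypothetical generator: a non-empty list of up to 10 integers in [-100, 100]."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(1, 10))]


def shrink_candidates(xs):
    """Yield simpler lists: drop one element (staying non-empty) or move one element toward 0."""
    if len(xs) > 1:
        for i in range(len(xs)):
            yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        for smaller in (0, int(x / 2)):
            if abs(smaller) < abs(x):
                yield xs[:i] + [smaller] + xs[i + 1:]


def shrink(prop, xs):
    """Greedy shrinking: keep moving to the first simpler candidate that still fails."""
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(xs):
            if not prop(candidate):
                xs, improved = candidate, True
                break
    return xs


def quick_check(num_tests, prop, gen, seed=0):
    """Check prop on num_tests generated inputs; on failure, report and shrink the input."""
    rng = random.Random(seed)  # seeded, so any failure can be replayed exactly
    for i in range(num_tests):
        xs = gen(rng)
        if not prop(xs):
            return {"passed": False, "num_tests": i + 1, "seed": seed,
                    "fail": xs, "smallest": shrink(prop, xs)}
    return {"passed": True, "num_tests": num_tests, "seed": seed}


def sorted_first_less_than_last(xs):
    s = sorted(xs)
    return s[0] < s[-1]  # false whenever all elements are equal


def sort_idempotent(xs):
    return sorted(xs) == sorted(sorted(xs))
```

For example, quick_check(1000, sorted_first_less_than_last, gen_nonempty_int_list, seed=1) fails and shrinks to [0]: every failing input has all elements equal, so removing elements and moving the survivor toward zero always ends at the same smallest example that test.check reports above.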
Practicalities #
- Recommended: pytest and Hypothesis (generative testing) for Python, testthat for R. For example:

```r
library(testthat)
source("foobar.R")

test_that("foo values are correct", {
  expect_equal(foo(4), 8)
  expect_equal(foo(2.2), 1.9)
})

test_that("bar has correct limits", {
  expect_lt(bar(4, c(1, 90), option = TRUE), 8)
})

test_that("bar throws an error on bad inputs", {
  expect_error(bar(-4, c(1, 10)))  # test passes if bar calls stop() or throws an error here
})
```

```python
from __future__ import annotations

import pytest

# Excerpt: symbol, constant, either, KindError, k0, me1, etc. come from the
# library under test or from elided (# ...) parts of this file.
# ...

def test_kinds_factories():
    "Builtin kind factories"
    a = symbol('a')
    assert constant(1).values == {1}
    assert constant((2,)).values == {2}
    assert constant((2, 3)).values == {vec_tuple(2, 3)}
    assert either(0, 1).values == {0, 1}
    assert weights_of(either(0, 1, 2).weights) == pytest.approx([as_quantity('2/3'), as_quantity('1/3')])
    assert lmap(str, values_of(either(a, 2 * a, 2).weights)) == ['<a>', '<2 a>']
    # ...
    with pytest.raises(KindError):
        k0 >> me1
```
- Tests are commonly kept in separate source files from the rest of your code. In a long-running project, you may have a test/ folder containing test code for each piece of your project, plus any data files or other bits needed for the tests.
- All tests can now be run with a single command (e.g., using testthat’s test_dir function or Python’s pytest).
- Run tests often. It is common to set up a hook that runs your tests before each commit, and perhaps rejects the commit if the tests fail.
- Every time you check your code, such as at the REPL or with an example run, make a test out of it. Every time you encounter a bug or other failure, make a test out of it. Every example you put in your documentation can produce a test.
- There is a wide variety of built-in assertions in common testing libraries, including, for instance, assertions that a piece of code throws an error. There may also be third-party libraries that add more assertions and tools; these can be included as a “dev dependency” without affecting your users.
- When possible, it is valuable to write some tests before you implement a function. This can help you understand (and even document) what the function needs to do, including edge cases.
- Make tests replicable: if a test involves random data, what do you do when the test fails? You need some way to know which random values it used (for example, by recording the random seed) so you can reproduce the failure and figure out why it happened.
- Other types of testing can be relevant: integration testing, interaction testing, acceptance testing, top-down testing, and so on.
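For the replicability point, a minimal sketch of one common approach: derive all of a test's randomness from a single seed, take that seed from an environment variable when one is given, and put it in the failure message. The TEST_SEED variable name, get_test_seed, and the mean function are hypothetical choices for illustration. Property-testing libraries handle this for you (note the :seed in the quick-check output above).

```python
import os
import random


def get_test_seed():
    """Use TEST_SEED from the environment if set (to replay a failure); else draw a fresh seed."""
    env = os.environ.get("TEST_SEED")
    return int(env) if env is not None else random.randrange(2**32)


def mean(xs):
    """Hypothetical unit under test."""
    return sum(xs) / len(xs)


def test_mean_lies_between_min_and_max():
    seed = get_test_seed()
    rng = random.Random(seed)  # all randomness in this test flows from one recorded seed
    for _ in range(100):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(1, 50))]
        # The message reports the seed, so rerunning with TEST_SEED=<seed> replays these inputs
        assert min(xs) <= mean(xs) <= max(xs), f"TEST_SEED={seed} xs={xs}"
```

Integer inputs keep the sum exact, so the assertion cannot fail from floating-point rounding alone; a failure would point at mean itself.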
Rapid-Fire Activity: Name that Test #
Scenario: Find the maximum sum of a contiguous subsequence #
Function name: max_sub_sum(arr)
Write a function that takes as input a vector of n numbers and returns the maximum sum found in any contiguous subvector of the input. (We take the sum of an empty subvector to be zero.)
For example, in the vector [1, -4, 4, 2, -2, 5], the maximum sum is 9, for the subvector [4, 2, -2, 5].
There’s a clever algorithm for doing this fast, with an interesting history related to our department. But that’s not important right now. How do we test it?
(If you want to implement it – there’s a repository problem for that! Try the max-sub-sum exercise.)
Test ideas?
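One possible set of answers, as a hedged sketch: pair a few hand-picked cases from the specification (the worked example, the empty vector, all-negative and all-positive inputs) with randomized checks against a slow but obviously correct reference implementation. The names max_sub_sum_brute and check_max_sub_sum are hypothetical; pass your own max_sub_sum as impl. Besides the oracle comparison, the loop includes a metamorphic check (reversing the input cannot change the answer) and a simple property (the answer is at least zero and at least the largest element).

```python
import random


def max_sub_sum_brute(arr):
    """Reference implementation: try every contiguous subvector, O(n^2)."""
    best = 0  # the empty subvector has sum 0
    for i in range(len(arr)):
        total = 0
        for j in range(i, len(arr)):
            total += arr[j]  # total == sum(arr[i:j + 1])
            best = max(best, total)
    return best


def check_max_sub_sum(impl, trials=200, seed=0):
    """Raise AssertionError if impl disagrees with the specification on any check."""
    # Hand-picked cases from the specification and its edge cases
    assert impl([1, -4, 4, 2, -2, 5]) == 9   # the worked example
    assert impl([]) == 0                      # empty input
    assert impl([-3, -1, -2]) == 0            # all negative: the empty subvector wins
    assert impl([2, 3, 1]) == 6               # all positive: the whole vector wins
    # Randomized checks with a fixed, reported seed so failures can be replayed
    rng = random.Random(seed)
    for _ in range(trials):
        arr = [rng.randint(-10, 10) for _ in range(rng.randint(0, 12))]
        result = impl(arr)
        assert result == max_sub_sum_brute(arr), f"seed={seed} arr={arr}"  # oracle
        assert impl(arr[::-1]) == result, f"seed={seed} arr={arr}"         # metamorphic
        assert result >= max(arr + [0])                                    # property
    return True
```

A plausible but wrong implementation, such as summing all the positive entries, fails immediately on the worked example (it returns 12 instead of 9), which is exactly the kind of bug these tests are meant to catch before you trust a clever fast version.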
Sprint Activity: State Machines #
Specification #
The state of a system is a description of the system’s configuration at a particular moment. It typically captures the features of interest for a particular model or analysis, and in many cases, the future evolution of the system depends only on the current state.
A state machine is an abstract model of computation, a device that can be in one state at a time and transitions among states in response to external events, possibly producing effects or actions at a transition.
The task today is to write a small library for defining, representing, and “running” state machines.
There are four basic data entities/types in this system:
- States
- Transitions
- Actions
- Events
Your library should provide mechanisms to
- create a state machine,
- define states and events,
- specify the legal transitions among those states,
- associate actions with transitions,
- associate transitions with events, and
- given a state machine, “run” the machine by dispatching events.
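As one possible starting point (a sketch, not the intended solution), the mechanisms listed above can be covered by a single class. Design assumptions, all of them open to debate in the discussion below: states and events are any hashable values, the machine is deterministic (at most one transition per state and event), an action is a callable receiving (source, event, target) that runs before the state changes, and dispatching an event with no matching transition raises an error. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Hashable, Optional

# An action receives (source state, event, target state); its return value is ignored.
Action = Callable[[Hashable, Hashable, Hashable], None]


@dataclass
class StateMachine:
    """A deterministic machine: at most one transition per (state, event) pair."""
    initial: Hashable
    states: set = field(default_factory=set)
    # (source, event) -> (target, optional action)
    transitions: dict = field(default_factory=dict)

    def __post_init__(self):
        self.states.add(self.initial)
        self.current = self.initial

    def add_state(self, state):
        self.states.add(state)

    def add_transition(self, source, event, target, action: Optional[Action] = None):
        if (source, event) in self.transitions:
            raise ValueError(f"duplicate transition from {source!r} on {event!r}")
        self.states.update((source, target))
        self.transitions[(source, event)] = (target, action)

    def dispatch(self, event):
        """Fire the transition for event from the current state, running its action first."""
        try:
            target, action = self.transitions[(self.current, event)]
        except KeyError:
            raise ValueError(f"no transition from {self.current!r} on {event!r}") from None
        if action is not None:
            action(self.current, event, target)  # if this raises, the state is unchanged
        self.current = target
        return target

    def run(self, events):
        """Dispatch each event in order and return the final state."""
        for event in events:
            self.dispatch(event)
        return self.current


# Usage: a traffic light that logs each time it turns green
log = []
light = StateMachine(initial="red")
light.add_transition("red", "timer", "green", action=lambda s, e, t: log.append(f"{s}->{t}"))
light.add_transition("green", "timer", "yellow")
light.add_transition("yellow", "timer", "red")
light.run(["timer"] * 4)  # red -> green -> yellow -> red -> green
```

Running the action before updating the state is one answer to "Is raising an error ok in an action?": if the action raises, the machine simply stays in its source state.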
Examples:
- Traffic Light
- Vending Machine
- Pattern in a String
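For the pattern-in-a-string example, one possible sketch is a table-driven machine with an accepting state that means "the pattern has been seen." This hypothetical version hard-codes the transition table for the pattern "abb" and, in the spirit of the first half of today, tests the machine exhaustively on short strings against Python's substring operator.

```python
from itertools import product

# Hypothetical example: detect whether a string contains the pattern "abb".
# State k means the longest suffix of the input read so far that is also a
# prefix of the pattern has length k; state 3 accepts and absorbs all further input.
PATTERN = "abb"
ACCEPT = len(PATTERN)
DELTA = {
    (0, "a"): 1,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
}


def step(state, ch):
    if state == ACCEPT:
        return ACCEPT
    return DELTA.get((state, ch), 0)  # any unlisted transition returns to the start state


def contains_pattern(s):
    state = 0
    for ch in s:
        state = step(state, ch)
    return state == ACCEPT


# Exhaustive check on all strings over {a, b, c} up to length 6 against the substring test
for n in range(7):
    for chars in product("abc", repeat=n):
        s = "".join(chars)
        assert contains_pattern(s) == (PATTERN in s), s
```

Here the events are the characters of the string, and the accepting state is one concrete answer to "How might we capture the idea of a final/accepting state?"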
Design Discussion #
States #
- What data should be embodied in a state?
- How might we construct a state?
- Can there be more than one initial state?
- Should we handle product and sum states specially?
- Where should actions be stored?
- How might we capture the idea of a final/accepting state?
Transitions #
- What data does a transition need to hold?
- Guard conditions on transitions?
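- When can an associated action be applied in a transition (e.g., on leaving the source state, during the transition, or on entering the target state)? Must we select just one of these options?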
Actions #
- What are actions as data?
- What information should be passed to an action?
- Is raising an error ok in an action?
Events #
- What is the essential information in an event?
- What is the signature of event dispatch?
- Do events trigger immediately/synchronously? What if resulting actions take a while? Do we need a queue?
- Can we run more than one machine in parallel?
- How do we type events?
Operations #
- What operations do we need to support on these entities?
- What are the types of these operations?
- What does our main entry point look like?
Implementation #
Goal: Working implementation in an hour; pair up to share the load