1 of 35

An Experimentalist Approach to Software Testing

Jeff Soules and Brian Ward

Center for Computational Mathematics, Flatiron Institute

2 of 35

Introduction

3 of 35

Who are we and why should you listen to us?

Jeff Soules is a senior software engineer at the Flatiron Institute with over 20 years of experience across software development, support, design, and project management, who has released over a dozen production systems.
Brian Ward is a software engineer at the Flatiron Institute who works primarily on Stan, a probabilistic programming language used in fields from epidemiology to baseball analytics.

4 of 35

What are we talking about today?

Automated software tests

Regularly re-run (by machine)
Evaluated objectively (by machine)
Expected to work on all user deployments

→ ANYONE can run & interpret

Reproducible, deterministic results

We do not want a reproducibility crisis in scientific software

5 of 35

Test your oven with a thermometer, not a turkey.

6 of 35

“Experimentalist testing” means treating tests as . . .

controlled experiments that
each confirm one specific property or invariant of
the code artifact you are publishing

Test result must depend on your code – a small part of your code – and nothing but your code

7 of 35

Experimentalist testing can

Show correctness of our implementation
Localize sources of error
Increase confidence during future code changes
Document code expectations/design choices
Confirm numerical accuracy or convergence (maybe)

8 of 35

What you need to follow along today

https://github.com/jsoules/experimental-testing-exercises
If you’re using Python, we recommend installing pytest
If you want to use C++, we will provide links to the Compiler Explorer

9 of 35

Anatomy of a test

10 of 35

What makes a test?

To do a test, you must:

Set up the preconditions for the behavior you’re testing
Call the code whose behavior you’re testing
Compare the actual result with the expected result

Mnemonic is “arrange, act, assert”

…but the greatest of these is assert
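
The three steps can be made concrete with a small sketch. The function `running_mean` is a hypothetical stand-in for code under test, not something from the exercises repository; the test is written as a plain `test_*` function of the kind pytest collects.

```python
def running_mean(values):
    """Return the running mean of a sequence (hypothetical code under test)."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


def test_running_mean_of_constant_input_is_constant():
    # Arrange: set up a known input.
    values = [2.0, 2.0, 2.0]

    # Act: call the code whose behavior is being tested.
    result = running_mean(values)

    # Assert: compare the actual result with the expected result.
    assert result == [2.0, 2.0, 2.0]
```

Without the final line the test would still run, but it would confirm nothing, which is why the assert step carries the most weight.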

11 of 35

One test, one property

Every test should test a specific, explicit property of the code
It should pass if, and only if, that property holds

Yes, this will lead to lots of tests!

This is not a problem so long as you keep them all fast.
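
As an illustration, here is one possible way to split the checks for a single small function into one test per property. `clamp` is a hypothetical example chosen for brevity; each test name states the property it checks, so a failure points at exactly one broken expectation.

```python
def clamp(x, low, high):
    """Restrict x to the closed interval [low, high] (hypothetical code under test)."""
    return max(low, min(x, high))


def test_clamp_leaves_in_range_value_unchanged():
    assert clamp(5, 0, 10) == 5


def test_clamp_raises_low_value_to_lower_bound():
    assert clamp(-3, 0, 10) == 0


def test_clamp_lowers_high_value_to_upper_bound():
    assert clamp(42, 0, 10) == 10
```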

12 of 35

Controlled experiments

Success or failure should depend only on the property under test
Any other factors should be set to known values
Systems not under test should be omitted

by use of fakes/stubs/mocks, monkey-patching, etc.
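
A minimal sketch of setting an outside factor to a known value: the hypothetical function `elapsed_label` depends on the current time, so the test swaps in a fake clock. Passing the clock as a parameter is one assumed design; monkey-patching `time.time` would be another route to the same control.

```python
import time


def elapsed_label(start, now=time.time):
    """Describe how long ago `start` was (hypothetical code under test)."""
    seconds = now() - start
    if seconds < 60:
        return "just now"
    return f"{int(seconds // 60)} min ago"


def test_elapsed_label_reports_whole_minutes():
    def fake_now():
        return 1_150.0  # clock pinned to a known value, 150 s after start

    assert elapsed_label(1_000.0, now=fake_now) == "2 min ago"
```

With the real clock the result would change from run to run; with the fake, the outcome depends only on the formatting logic under test.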

13 of 35

Only test your code

A useful test must call the implementation
There’s little point to testing library code
Proofs of method correctness belong in papers, not project tests

Project tests may still be derived from theory, for example when an invariant should hold on every iteration
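
One way a theory-derived invariant might become a project test, sketched with bisection root finding as an assumed example. The hypothetical `bisect_root` records every bracket so the test can check that the sign change survives each step; it tests this implementation, not the method's convergence proof.

```python
def bisect_root(f, low, high, iterations=20):
    """Bisection that records each bracket (hypothetical code under test)."""
    brackets = [(low, high)]
    for _ in range(iterations):
        mid = (low + high) / 2
        if f(low) * f(mid) <= 0:
            high = mid
        else:
            low = mid
        brackets.append((low, high))
    return brackets


def test_bisection_keeps_root_bracketed_every_iteration():
    def f(x):
        return x * x - 2.0

    for low, high in bisect_root(f, 0.0, 2.0):
        # Theory says the sign change must survive every step.
        assert f(low) * f(high) <= 0
```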

14 of 35

Three varieties of bad tests

Those that…

Never fail (tautological, or fail to assert anything)
Never pass (or: fail if you breathe on them)
Might fail or pass on the same inputs

Also fishy: tests that take a huge amount of work to achieve control
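
A hedged sketch of two of these failure modes next to a controlled alternative. `noisy_estimate` is hypothetical, and the `bad_test_*` names are chosen so that pytest's default `test_*` discovery would skip them.

```python
import random


def noisy_estimate(true_value, rng):
    """Return true_value plus small Gaussian noise (hypothetical code under test)."""
    return true_value + rng.gauss(0.0, 0.1)


def bad_test_never_fails():
    # Tautological: compares the result with itself, so it cannot fail.
    result = noisy_estimate(1.0, random.Random())
    assert result == result


def bad_test_flaky():
    # Unseeded randomness plus a tight tolerance: same inputs, varying outcome.
    result = noisy_estimate(1.0, random.Random())
    assert abs(result - 1.0) < 0.1


def test_noise_is_reproducible_for_a_fixed_seed():
    # Controlled: the seed is pinned, so the outcome is deterministic.
    first = noisy_estimate(1.0, random.Random(1234))
    second = noisy_estimate(1.0, random.Random(1234))
    assert first == second
```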

15 of 35

Live code a test

Follow along at https://github.com/jsoules/experimental-testing-exercises

Folder ‘00’

16 of 35

Assumptions and invariants

17 of 35

What to assert?

Basic level: confirming expectations

Results of a (deterministic) computation
Correct logical branch was taken
Error states are handled appropriately
Does your code ‘do nothing’ correctly?

By making assumptions explicit, tests act as a kind of documentation
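
The basic-level checks might look like the following sketch, built around a hypothetical `scale_to_unit_max` function. It uses only plain asserts so it runs without any framework; with pytest installed, `pytest.raises(ValueError)` would be the more idiomatic way to write the error-state check.

```python
def scale_to_unit_max(values):
    """Divide values by their largest magnitude (hypothetical code under test)."""
    if not values:
        return []  # the 'do nothing' case
    peak = max(abs(v) for v in values)
    if peak == 0:
        raise ValueError("cannot scale an all-zero input")
    return [v / peak for v in values]


def test_scaling_gives_expected_values():
    # Result of a deterministic computation.
    assert scale_to_unit_max([1.0, -4.0, 2.0]) == [0.25, -1.0, 0.5]


def test_empty_input_does_nothing():
    assert scale_to_unit_max([]) == []


def test_all_zero_input_is_rejected():
    try:
        scale_to_unit_max([0.0, 0.0])
    except ValueError:
        return  # the expected error state was reached
    raise AssertionError("expected ValueError for an all-zero input")
```

Each test records one assumption about the function, which is the sense in which the suite doubles as documentation.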

18 of 35

What to assert?

Intermediate level: interactions with external systems

Did the file actually get written?
Was the database call correct?
Was an unexpected response from the system handled?

Requires experimental control (e.g. through fakes) and test isolation
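
For the "did the file actually get written?" question, one possible sketch isolates the test in a fresh temporary directory. `save_report` is a hypothetical function; the standard-library `tempfile` module is used here, and pytest's built-in `tmp_path` fixture would serve the same purpose.

```python
import tempfile
from pathlib import Path


def save_report(lines, path):
    """Write one line of text per entry (hypothetical code under test)."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def test_save_report_writes_expected_contents():
    # A fresh temporary directory isolates this test from any real files
    # and is removed automatically when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "report.txt"
        save_report(["a", "b"], target)
        assert target.read_text(encoding="utf-8") == "a\nb\n"
```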

19 of 35

What to assert?

Conceptual level: system invariants

Is energy conserved?
Are results properly normalized?
Do lists have the right length/shape, do loops run the right number of iterations, etc.?

Sometimes it makes sense to use unrealistic inputs to confirm they are still manipulated properly
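
A small sketch of invariant checks with deliberately unrealistic inputs, assuming a hypothetical `to_probabilities` function. The magnitudes are chosen to be implausible as real data; the normalization and length invariants should hold regardless.

```python
import math


def to_probabilities(counts):
    """Scale counts so they sum to 1 (hypothetical code under test)."""
    total = sum(counts)
    return [c / total for c in counts]


def test_probabilities_sum_to_one_for_extreme_counts():
    # Unrealistic magnitudes on purpose: the invariant should still hold.
    for counts in ([1e-300, 3e-300], [1e300, 1e300], [7, 0, 0]):
        assert math.isclose(sum(to_probabilities(counts)), 1.0)


def test_probabilities_preserve_length():
    assert len(to_probabilities([2, 3, 5])) == 3
```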

20 of 35

Exercises

https://github.com/jsoules/experimental-testing-exercises

Folders ‘01’ and ‘02’

21 of 35

Experimenting on a world you control

22 of 35

The envy of every experimentalist

You create the very world you experiment upon!
Reshape interfaces to something more convenient
Or keep the interface but update the implementation
Use testability as a guide to writing better code

23 of 35

Tests: beyond pass and fail

Code that’s difficult to test is often also difficult to…

Understand and reason about
Explain and document
Maintain & extend
Actually use

Listen to your pain.

24 of 35

Signs that things need to change

Lots of setup/fakes/mocks needed to achieve experimental control
Cleanup required after calling ordinary functions
Test needs considerable logic to match the right case
Difficult to state the expected result of an operation
Fakes make things work that shouldn’t

25 of 35

Responding to common issues

Control is hard → sign the units are too big/too ambitious
Frequent cleanup → isolate code with external effects
Test contains logic → separate conditional and operational code
Hard to describe results → code unit is too ambitious
Test passes with fakes only → poor interface documentation; the interface needs many more tests
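
The "isolate code with external effects" response could be sketched as follows. The names `mean_of_column` and `load_rows` are hypothetical: the computation is kept pure so it can be tested with in-memory data, while a thin wrapper is the only part that touches the file system.

```python
from pathlib import Path


def mean_of_column(rows, column):
    """Pure computation: no files, no cleanup, easy to control."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)


def load_rows(path):
    """Thin I/O shell: the only part that touches the file system."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.split(",") for line in text.splitlines() if line]


def test_mean_of_column_uses_requested_column():
    # No file is needed: the input is plain in-memory data.
    rows = [["1", "10"], ["3", "30"]]
    assert mean_of_column(rows, column=1) == 20.0
```

A version that read the file and computed the mean in one function would need a temporary file and cleanup for every test of the arithmetic.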

26 of 35

Exercises Part 2

https://github.com/jsoules/experimental-testing-exercises

Folders ‘03’ and ‘04’

27 of 35

Conclusion

28 of 35

Tests are experiments

They should be:

Controlled
Focused
Repeatable

29 of 35

You have an advantage

You usually control the system you’re experimenting on
Testability is highly correlated with other measures of code quality

30 of 35

Thank you

31 of 35

Appendix: Helpful tools

32 of 35

Tools to help with testing

Testing framework

Runs tests for you
Can help ensure test isolation
Lets you select subsets of tests to run
Reporting features, including coverage

33 of 35

Tools to help with testing

Assertion libraries

You’ve only tested what you’ve asserted
Good tools help you zero in on discrepancies

Expressive comparisons, error handlers, etc. let you make more precise assertions

Property-based testing tools (e.g. Hypothesis) can help turn invariants into assertions automatically
Gold/expect/cram testing can help ensure reliable output (but may be brittle)
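
To show what turning an invariant into many assertions can look like, here is a hand-rolled sketch that checks a round-trip property over generated inputs. It is a stand-in, not a dedicated tool: the run-length encoder is a hypothetical example, and the seeded loop only approximates the input generation and shrinking that libraries such as Hypothesis automate.

```python
import random


def rle_encode(text):
    """Run-length encode a string into (char, count) pairs (hypothetical)."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs):
    """Invert rle_encode."""
    return "".join(ch * count for ch, count in runs)


def test_decode_inverts_encode_on_generated_inputs():
    rng = random.Random(0)  # fixed seed keeps the test deterministic
    for _ in range(200):
        length = rng.randint(0, 12)
        text = "".join(rng.choice("ab") for _ in range(length))
        assert rle_decode(rle_encode(text)) == text
```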

34 of 35

Tools to help with testing

Mocking framework

Tests should focus on what you control

It’s a bummer if your tests fail because someone else’s web server is down

Mocking allows you to replace some pieces of code with special versions just for testing

And can also give you more insight, e.g. “this function called with xyz”

Often included in testing libraries alongside special asserts

Be careful not to go too far with this ability!
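
A minimal sketch of both uses of a mock, using `unittest.mock` from the Python standard library. `notify_on_failure` and its `send` parameter are hypothetical; the mock replaces whatever would really deliver the message and records how it was called.

```python
from unittest.mock import Mock


def notify_on_failure(result, send):
    """Call send(message) only when the run failed (hypothetical code under test)."""
    if not result["ok"]:
        send(f"run {result['id']} failed")


def test_failure_sends_one_message():
    send = Mock()  # stands in for a real email or chat client
    notify_on_failure({"ok": False, "id": 7}, send)
    send.assert_called_once_with("run 7 failed")


def test_success_sends_nothing():
    send = Mock()
    notify_on_failure({"ok": True, "id": 7}, send)
    send.assert_not_called()
```

Only the collaborator is mocked here; the logic under test runs unmodified, which is one way to avoid going too far.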

35 of 35

Tools to help with testing

Coverage reporting

Visual tool to see what definitely hasn’t been tested
Cannot say what you meaningfully asserted, only what you called
Not a progress bar (despite appearances)
Particularly useful for checking conditional branches
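
The gap between "called" and "asserted" can be illustrated with a short sketch. `sign` is a hypothetical function; the first check below would show every line as covered in a coverage report, yet it would keep passing if `sign` returned the wrong values.

```python
def sign(x):
    """Return -1 for negative x and 1 otherwise (hypothetical code under test)."""
    if x < 0:
        return -1
    return 1  # note: also returned for x == 0


def covers_everything_asserts_nothing():
    # Executes both branches, so line and branch coverage are complete,
    # but nothing is compared with an expected value.
    sign(-5)
    sign(5)


def test_sign_of_negative_number():
    assert sign(-5) == -1


def test_sign_of_positive_number():
    assert sign(5) == 1
```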
