1 of 35

An Experimentalist Approach to Software Testing

Jeff Soules and Brian Ward

Center for Computational Mathematics, Flatiron Institute

2 of 35

Introduction


3 of 35

Who are we and why should you listen to us?

  • Jeff Soules is a senior software engineer at the Flatiron Institute with over 20 years of experience across software development, support, design, and project management, who has released over a dozen production systems.
  • Brian Ward is a software engineer at the Flatiron Institute who works primarily on Stan, a probabilistic programming language used in fields from epidemiology to baseball analytics.


4 of 35

What are we talking about today?

  • Automated software tests
    • Regularly re-run (by machine)
    • Evaluated objectively (by machine)
    • Expected to work on all user deployments

→ ANYONE can run & interpret

  • Reproducible, deterministic results
    • We do not want a reproducibility crisis in scientific software


5 of 35

Test your oven with a thermometer, not a turkey.


6 of 35

“Experimentalist testing” means treating tests as . . .

  • controlled experiments that
  • each confirm one specific property or invariant of
  • the code artifact you are publishing

Test result must depend on your code – a small part of your code – and nothing but your code


7 of 35

Experimentalist testing can

  • Show correctness of our implementation
  • Localize sources of error
  • Increase confidence during future code changes
  • Document code expectations/design choices
  • Confirm numerical accuracy or convergence (maybe)


8 of 35

What you need to follow along today

  • https://github.com/jsoules/experimental-testing-exercises
  • If you’re using Python, we recommend installing pytest
  • If you want to use C++, we will provide links to the Compiler Explorer


9 of 35

Anatomy of a test


10 of 35

What makes a test?

To do a test, you must:

  • Set up the preconditions for the behavior you’re testing
  • Call the code whose behavior you’re testing
  • Compare the actual result with the expected result

The mnemonic is “arrange, act, assert”

…but the greatest of these is assert
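The three steps can be sketched as a single test. This is a minimal illustration, not code from the exercises: `mean` is a hypothetical stand-in for your own function, and the test is written as a plain function with a bare `assert` so that pytest would discover it.

```python
def mean(values):
    """Hypothetical code under test: arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def test_mean_of_known_values():
    # Arrange: set up the preconditions.
    values = [1.0, 2.0, 3.0, 6.0]

    # Act: call the code whose behavior is being tested.
    result = mean(values)

    # Assert: compare the actual result with the expected result.
    assert result == 3.0
```

Without the final line the test would still run the code, but it would confirm nothing about it.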


11 of 35

One test, one property

  • Every test should test a specific, explicit property of the code
  • It should pass if, and only if, that property holds

Yes, this will lead to lots of tests!

This is not a problem so long as you keep them all fast.
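As a sketch of what this looks like in practice, here are three small tests that each pin down one property of a hypothetical `clamp` function (the function and the test names are assumptions made for illustration).

```python
def clamp(x, low, high):
    """Hypothetical code under test: restrict x to the interval [low, high]."""
    return max(low, min(x, high))


def test_clamp_leaves_interior_values_unchanged():
    assert clamp(5, 0, 10) == 5


def test_clamp_raises_values_below_the_lower_bound():
    assert clamp(-3, 0, 10) == 0


def test_clamp_lowers_values_above_the_upper_bound():
    assert clamp(42, 0, 10) == 10
```

When one of these fails, its name alone says which property broke.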


12 of 35

Controlled experiments

  • Success or failure should depend only on the property under test
  • Any other factors should be set to known values
  • Systems not under test should be omitted
    • by use of fakes/stubs/mocks, monkey-patching, etc.
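One way to set an outside factor to a known value is to pass it in, so a test can substitute a fake. The sketch below assumes a hypothetical `make_timestamped_label` function whose only uncontrolled input is the system clock.

```python
import time


def make_timestamped_label(name, clock=time.time):
    """Hypothetical code under test: label a result with the current time."""
    return f"{name}-{int(clock())}"


def test_label_uses_the_supplied_clock():
    # The real clock is not under test, so replace it with a known value.
    def fake_clock():
        return 1700000000.9

    label = make_timestamped_label("run", clock=fake_clock)

    assert label == "run-1700000000"
```

With the real clock the expected string would change every second; with the fake, the result depends only on the formatting logic.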


13 of 35

Only test your code

  • A useful test must call the implementation
  • There’s little point to testing library code
  • Proofs of method correctness belong in papers, not project tests
    • Project tests may still be derived from theory, e.g. checking an invariant that should hold at every iteration


14 of 35

Three varieties of bad tests

Those that…

  • Never fail (tautological, or fail to assert anything)
  • Never pass (or: fail if you breathe on them)
  • Might fail or pass on the same inputs

Also fishy: tests that take a huge amount of work to achieve control
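To illustrate the first variety, the sketch below contrasts a tautological test with one possible repair. The `normalize` function is a hypothetical example, not taken from the exercises.

```python
def normalize(weights):
    """Hypothetical code under test: scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def test_normalize_tautological():
    # Bad: compares the code's output with itself, so it cannot fail
    # unless normalize raises an exception.
    weights = [1.0, 3.0]
    assert normalize(weights) == normalize(weights)


def test_normalize_known_result():
    # Better: the expected value is stated independently of the code.
    assert normalize([1.0, 3.0]) == [0.25, 0.75]
```

Both tests pass today, but only the second would notice if `normalize` started returning the wrong numbers.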


15 of 35

Live code a test

Follow along at https://github.com/jsoules/experimental-testing-exercises

Folder ‘00’


16 of 35

Assumptions and invariants


17 of 35

What to assert?

Basic level: confirming expectations

  • Results of a (deterministic) computation
  • Correct logical branch was taken
  • Error states are handled appropriately
  • Does your code ‘do nothing’ correctly?

By making assumptions explicit, tests act as a kind of documentation
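A sketch of these basic assertions, using a hypothetical `scale` function: one test for a computed result, one for an error state, and one for "doing nothing" on empty input. The error test uses plain `try`/`except` to stay dependency-free; with pytest installed, `pytest.raises(ValueError)` expresses the same check more compactly.

```python
def scale(values, factor):
    """Hypothetical code under test: multiply every value by factor."""
    if factor < 0:
        raise ValueError("factor must be non-negative")
    return [v * factor for v in values]


def test_scale_computes_expected_values():
    assert scale([1, 2, 3], 2) == [2, 4, 6]


def test_scale_rejects_negative_factor():
    # Error state: the documented exception must actually be raised.
    try:
        scale([1, 2, 3], -1)
    except ValueError:
        return
    assert False, "expected ValueError for a negative factor"


def test_scale_of_empty_input_is_empty():
    # "Doing nothing" correctly: no values in, no values out, no error.
    assert scale([], 2) == []
```

Read together, the test names document the assumptions: negative factors are rejected and empty input is allowed.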


18 of 35

What to assert?

Intermediate level: interactions with external systems

  • Did the file actually get written?
  • Was the database call correct?
  • Was an unexpected response from the system handled?

Requires experimental control (e.g. through fakes) and test isolation
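For the "did the file actually get written?" question, a minimal sketch might look like the following. `save_results` is a hypothetical function; the temporary directory provides the isolation, and pytest's `tmp_path` fixture serves the same purpose if you are using pytest.

```python
import json
import tempfile
from pathlib import Path


def save_results(results, path):
    """Hypothetical code under test: write results to path as JSON."""
    Path(path).write_text(json.dumps(results))


def test_save_results_writes_readable_json():
    # Isolation: each run gets a fresh directory that is removed afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "results.json"

        save_results({"energy": 1.5}, target)

        assert target.exists()
        assert json.loads(target.read_text()) == {"energy": 1.5}
```

The test checks the effect on the outside world (the file and its contents), not just that the call returned without error.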


19 of 35

What to assert?

Conceptual level: system invariants

  • Is energy conserved?
  • Are results properly normalized?
  • Do lists have the right length/shape, do loops run the right number of iterations, etc.?

Sometimes it makes sense to use unrealistic inputs to confirm they are still manipulated properly
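As an illustration of invariant tests, the sketch below checks normalization and output length for a hypothetical `softmax` function, including one deliberately unrealistic input. The function and the tolerance are assumptions chosen for the example.

```python
import math


def softmax(scores):
    """Hypothetical code under test: convert scores to probabilities."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def test_softmax_is_normalized():
    probs = softmax([0.1, 2.0, -3.0])
    assert math.isclose(sum(probs), 1.0, rel_tol=1e-12)


def test_softmax_preserves_length():
    assert len(softmax([0.0] * 7)) == 7


def test_softmax_survives_unrealistic_inputs():
    # Scores this large would overflow a naive exp(); the invariant must still hold.
    probs = softmax([1e6, 1e6 + 1.0])
    assert math.isclose(sum(probs), 1.0, rel_tol=1e-12)
```

None of these tests needs to know the "right answer" for a given input, only a property that every answer must satisfy.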


20 of 35

Exercises

https://github.com/jsoules/experimental-testing-exercises

Folders ‘01’ and ‘02’


21 of 35

Experimenting on a world you control


22 of 35

The envy of every experimentalist

  • You create the very world you experiment upon!
  • Reshape interfaces to something more convenient
  • Or keep the interface but update the implementation
  • Use testability as a guide to writing better code


23 of 35

Tests: beyond pass and fail

Code that’s difficult to test is often also difficult to…

  • Understand and reason about
  • Explain and document
  • Maintain & extend
  • Actually use

Listen to your pain.


24 of 35

Signs that things need to change

  • Lots of setup/fakes/mocks needed to achieve experimental control
  • Cleanup required after calling ordinary functions
  • Test needs considerable logic to match the right case
  • Difficult to state the expected result of an operation
  • Fakes make things work that shouldn’t


25 of 35

Responding to common issues

  • Control is hard → the units are too big or too ambitious
  • Frequent cleanup → isolate code with external effects
  • Test contains logic → separate conditional and operational code
  • Hard to describe results → code unit is too ambitious
  • Test passes only with fakes → interface is poorly documented; many more tests are needed
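One possible shape for the "isolate code with external effects" response is sketched below: the parsing logic is a pure function, and file access lives in a thin wrapper. Both function names are hypothetical.

```python
from pathlib import Path


def parse_temperatures(text):
    """Pure logic: turn one reading per line into floats, skipping blank lines."""
    return [float(line) for line in text.splitlines() if line.strip()]


def load_temperatures(path):
    """Thin wrapper with the external effect: read the file, then delegate."""
    return parse_temperatures(Path(path).read_text())


def test_parse_temperatures_skips_blank_lines():
    # No files, no cleanup: the logic is tested on an in-memory string.
    assert parse_temperatures("20.5\n\n21.0\n") == [20.5, 21.0]
```

Most tests can then target `parse_temperatures` directly, and only a few need to touch the file system through `load_temperatures`.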


26 of 35

Exercises Part 2

https://github.com/jsoules/experimental-testing-exercises

Folders ‘03’ and ‘04’


27 of 35

Conclusion


28 of 35

Tests are experiments

They should be:

  • Controlled
  • Focused
  • Repeatable


29 of 35

You have an advantage

  • You usually control the system you’re experimenting on
  • Testability is highly correlated with other measures of code quality


30 of 35

Thank you


31 of 35

Appendix: Helpful tools


32 of 35

Tools to help with testing

Testing framework

  • Runs tests for you
  • Can help ensure test isolation
  • Lets you select subsets of tests to run
  • Reporting features, including coverage


33 of 35

Tools to help with testing

Assertion libraries

  • You’ve only tested what you’ve asserted
  • Good tools help you zero in on discrepancies
    • Expressive comparisons, error handlers, etc. let you make more precise assertions
  • Property-based testing tools (such as Hypothesis) can help turn invariants into assertions automatically
  • Gold/expect/cram testing can help ensure reliable output (but may be brittle)
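A small sketch of a more precise assertion, using only the standard library: a tolerance-based comparison for a hypothetical `total_mass` function. Assertion helpers such as `pytest.approx` or `numpy.testing.assert_allclose` offer the same idea with richer failure messages.

```python
import math


def total_mass(parts):
    """Hypothetical code under test: sum the masses of all parts."""
    return sum(parts)


def test_total_mass_within_tolerance():
    result = total_mass([0.1, 0.2])

    # An exact `== 0.3` would fail here because of floating-point rounding,
    # so the assertion states the precision that is actually required.
    assert math.isclose(result, 0.3, rel_tol=1e-9)
```

Stating the tolerance explicitly makes the required accuracy part of the test, rather than an accident of rounding.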


34 of 35

Tools to help with testing

Mocking framework

  • Tests should focus on what you control
    • It’s a bummer if your tests fail because someone else’s web server is down
  • Mocking allows you to replace some pieces of code with special versions just for testing
    • And can also give you more insight, e.g. “this function called with xyz”
  • Often included in testing libraries alongside special asserts

Be careful not to go too far with this ability!
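A minimal sketch with Python's built-in `unittest.mock`: the test replaces a web request with a mock that returns a known response, then asks the mock how it was called. `report_status` and the URL are hypothetical.

```python
from unittest.mock import Mock


def report_status(fetch, url):
    """Hypothetical code under test: summarize the response from a server."""
    response = fetch(url)
    return "up" if response["status"] == 200 else "down"


def test_report_status_handles_server_error():
    # The real web server is replaced by a mock with a known response.
    fake_fetch = Mock(return_value={"status": 503})

    result = report_status(fake_fetch, "https://example.invalid/health")

    assert result == "down"
    # The mock also records how it was used: "this function called with xyz".
    fake_fetch.assert_called_once_with("https://example.invalid/health")
```

Only the network call is replaced here; the logic that interprets the response still runs for real, which keeps the test meaningful.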


35 of 35

Tools to help with testing

Coverage reporting

  • Visual tool to see what definitely hasn’t been tested
  • Cannot say what you meaningfully asserted, only what you called
  • Not a progress bar (despite appearances)
  • Particularly useful for checking conditional branches
