An Introduction to Property-Based Testing
Zac Hatfield-Dodds & Ryan Soklaski
The Plan
Property-Based Testing 101
What is testing, anyway?
“Testing is the art and science of running your code and then checking that it did the right thing.”
Cool things that aren’t testing: assertions, type-checkers, linters, code review, coffee or sleep...
A few kinds of tests:
These are properties!
Sorting is fully specified by just two properties:
partial specs are still very useful for finding bugs :-)
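For example, both properties fit in a single Hypothesis test - a minimal sketch, using the builtin sorted():

    from collections import Counter
    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_sort_is_ordered_and_a_permutation(xs):
        result = sorted(xs)
        # property 1: the output is in non-decreasing order
        assert all(a <= b for a, b in zip(result, result[1:]))
        # property 2: the output contains exactly the input elements
        assert Counter(result) == Counter(xs)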
Summary
Exercises!
Describing your Data
An overview of Hypothesis strategies
Scalar values
You name it, Hypothesis can generate it. Literally.
Collections
st.lists(elements, min_size=0, max_size=None, unique_by=None, unique=False)
st.tuples(...)
st.fixed_dictionaries()
Modifying strategies with .map() and .filter()
.map()
.filter()
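A quick sketch of both (illustrative strategies, not from the slides):

    from hypothesis import strategies as st

    even_integers = st.integers().map(lambda x: x * 2)               # transform each value
    short_names = st.text(min_size=1).filter(lambda s: len(s) < 10)  # reject unwanted values

Prefer .map() where you can: .filter() discards examples, which slows generation and can trigger health-check failures if too much is rejected.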
Special: just() and sampled_from()
just() “lifts” a value into a strategy that will only ever generate that value
e.g. timezones=just(UTC) - if you only want to vary other args, use just()
sampled_from() chooses an element of a sequence
e.g. join=sampled_from(["inner", "outer"])
works well with enums, including flag enums
i.e. sampled_from(Permissions) can generate R, W, X, R|W, R|X, R|W|X, …
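A small sketch (Permissions here is a made-up flag enum):

    import enum
    from hypothesis import strategies as st

    class Permissions(enum.Flag):
        R = enum.auto()
        W = enum.auto()
        X = enum.auto()

    join = st.sampled_from(["inner", "outer", "left", "right"])  # pick one of a sequence
    tz = st.just("UTC")                    # always generates exactly this value
    perms = st.sampled_from(Permissions)   # flag enums: may also draw combined members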
Special: one_of() and nothing()
one_of() takes the union of strategies, like adding sets
nothing() is like the empty set
Impossible to “subtract” strategies or take the intersection 😭
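For instance (illustrative strategies):

    from hypothesis import strategies as st

    ids = st.one_of(st.integers(min_value=0), st.uuids())  # union: either kind of id
    empty = st.nothing()                                    # the "empty set" strategy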
Special: builds()
Construct custom objects - you’ll use this a lot
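A minimal sketch, assuming a hypothetical Order dataclass:

    from dataclasses import dataclass
    from hypothesis import strategies as st

    @dataclass
    class Order:
        item: str
        quantity: int

    orders = st.builds(
        Order,
        item=st.sampled_from(["apple", "pear", "plum"]),
        quantity=st.integers(min_value=1, max_value=100),
    )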
Recursive data: recursive() or deferred() ?
Simple rules of thumb:
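Roughly: recursive() suits "leaves, plus a way of wrapping them" (like JSON), while deferred() suits mutually-recursive or forward-referenced definitions. A sketch of the JSON-ish case:

    from hypothesis import strategies as st

    json_values = st.recursive(
        st.none() | st.booleans() | st.floats(allow_nan=False) | st.text(),  # leaves
        lambda children: st.lists(children) | st.dictionaries(st.text(), children),
        max_leaves=20,
    )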
Inferred strategies - from_type()
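A minimal sketch with a made-up dataclass - from_type() infers field strategies from the annotations:

    from dataclasses import dataclass
    from hypothesis import given, strategies as st

    @dataclass
    class Point:
        x: int
        y: int

    @given(st.from_type(Point))
    def test_points_have_integer_coordinates(p):
        assert isinstance(p.x, int) and isinstance(p.y, int)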
Inferred strategies - other
Inferred strategies - design tips
Special: @composite, .flatmap(), and data()
Three ways to generate data with internal dependencies and no filters
(e.g. “a tuple of a list, and a valid index into the list”)
Flatmap works for simple cases: from_type(type).flatmap(from_type)
@composite is semantically equivalent, better UX for nontrivial things
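A sketch of the running example ("a list, and a valid index into it") with @composite:

    from hypothesis import strategies as st

    @st.composite
    def list_and_index(draw):
        xs = draw(st.lists(st.integers(), min_size=1))
        i = draw(st.integers(0, len(xs) - 1))
        return xs, i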
The ‘inner composite’ trick
Special: data()
data() allows you to draw from strategies inside your test
- like @composite plus awareness of the test-so-far
Upside: incredibly flexible and powerful; arbitrary state and dependencies
Downside: can be too flexible and powerful, complicated failure reports
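The same example again with data() - drawn values show up in failure reports under their labels:

    from hypothesis import given, strategies as st

    @given(st.data())
    def test_index_is_always_valid(data):
        xs = data.draw(st.lists(st.integers(), min_size=1), label="xs")
        i = data.draw(st.integers(0, len(xs) - 1), label="index")
        assert xs[i] in xs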
Summary: use data() if you need it
… but if @composite would also work, use the simpler tool instead.
Where to look for strategies
Exercises!
Aiming to teach a way of thinking
“Duct tape mindset”: if it’s not working yet, use more!
Break time!
The Plan
Common Test Tactics
Common properties you can test
this works shockingly well
(especially with assertions in your code)
The simplest tactic: a test with NO assert - just call your code.
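A minimal sketch (parse_config is a stand-in for your own code):

    from hypothesis import given, strategies as st

    def parse_config(text):  # stand-in for the real function under test
        return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

    @given(st.text())
    def test_parse_config_does_not_crash(text):
        parse_config(text)   # no assert: any unexpected exception is a bug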
Roundtrips
Every codebase has roundtrips:
They’re critical to our code, have complicated inputs and outputs, errors are common, and their logic bugs are prone to silent failure.
Property-test all your round-trips!
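For example, a serialise/deserialise pair (here the stdlib json module):

    import json
    from hypothesis import given, strategies as st

    @given(st.dictionaries(st.text(), st.integers()))
    def test_json_roundtrip(d):
        assert json.loads(json.dumps(d)) == d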
Equivalent functions
Exactly equivalent:
Sometimes equivalent:
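A sketch of the "exactly equivalent" case, using a trusted builtin as the oracle (fast_sum is a hypothetical implementation under test):

    from hypothesis import given, strategies as st

    def fast_sum(xs):   # hypothetical "optimised" implementation under test
        total = 0
        for x in xs:
            total += x
        return total

    @given(st.lists(st.integers()))
    def test_fast_sum_matches_builtin(xs):
        assert fast_sum(xs) == sum(xs)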
Validate the output
I always feel silly writing these checks, but sometimes they catch a bug
Best to write these assertions in your code, not tests
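For instance, a couple of in-code output checks (a sketch, with a made-up count_words function):

    def count_words(text):
        counts = {}
        for word in text.split():
            counts[word] = counts.get(word, 0) + 1
        # validate the output before returning it
        assert all(v > 0 for v in counts.values())
        assert sum(counts.values()) == len(text.split())
        return counts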
Idempotent, commutative, associative, etc
Thanks to Haskell, property-based testing is named for “algebraic properties”
More common for set-like than number-like operations, e.g.
blog.developer.atlassian.com/programming-with-algebra/
We've found them very useful in merging event streams
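Two quick sketches of algebraic properties:

    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_sorting_is_idempotent(xs):
        once = sorted(xs)
        assert sorted(once) == once

    @given(st.sets(st.integers()), st.sets(st.integers()))
    def test_set_union_is_commutative(a, b):
        assert a | b == b | a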
Model based / stateful testing
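A minimal sketch of Hypothesis's stateful testing, checking a list-based stack against a trivial model:

    from hypothesis import strategies as st
    from hypothesis.stateful import RuleBasedStateMachine, invariant, rule

    class StackMachine(RuleBasedStateMachine):
        def __init__(self):
            super().__init__()
            self.stack = []   # implementation under test (here, just a list)
            self.model = []   # trusted model of expected behaviour

        @rule(x=st.integers())
        def push(self, x):
            self.stack.append(x)
            self.model.append(x)

        @rule()
        def pop(self):
            if self.model:
                assert self.stack.pop() == self.model.pop()

        @invariant()
        def sizes_agree(self):
            assert len(self.stack) == len(self.model)

    TestStack = StackMachine.TestCase   # collected and run by pytest/unittest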
Metamorphic relations
i.e. between two related inputs
e.g. add no-ops → output is unchanged
or known change in input → known change in output
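A tiny sketch of the "add no-ops" pattern, using str.split() as the function under test:

    from hypothesis import given, strategies as st

    @given(st.text())
    def test_surrounding_whitespace_is_a_noop(s):
        # a known-irrelevant change to the input should not change the output
        assert ("  " + s + "  ").split() == s.split()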
assert, fuzz, roundtrip,
and then relax according to the 80/20 rule.
$ hypothesis write my.tests
an interactive live demo of the Ghostwriter.
Exercises!
Putting it into Practice
Beyond the principles
You’ve learned the principles. Now, some tips for the real world
and then our final exercises will be real-world bughunting :-)
Designing PBT suites
PBT is part of a more general test plan - not a panacea!
Custom strategies for your project
Better print()-debugging with note() and event()
note()
event()
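A quick sketch of both in one test:

    from hypothesis import event, given, note, strategies as st

    @given(st.lists(st.integers()))
    def test_sum_is_order_independent(xs):
        note(f"reversed: {list(reversed(xs))}")                 # printed only for failing examples
        event("empty input" if not xs else "non-empty input")   # tallied in --hypothesis-show-statistics
        assert sum(xs) == sum(reversed(xs))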
Runtime Statistics
[example statistics output here, showing an event() and target report]
Dealing with external randomness
Random number generators:
Best option: pass a random.Random() from the st.randoms() strategy
(non-PRNG randomness like thread timings is basically out of scope, sorry)
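For example, taking the PRNG as an argument and letting Hypothesis supply it:

    from hypothesis import given, strategies as st

    @given(rng=st.randoms())
    def test_shuffle_preserves_elements(rng):
        items = list(range(10))
        shuffled = list(items)
        rng.shuffle(shuffled)
        assert sorted(shuffled) == items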
Dealing with global PRNGs
If you can’t pass a Random() instance…
random_module() will vary the seeds of all known ‘global’ PRNGs
hypothesis.register_random() can add to the list
Consider requesting upstream integration via a plugin
(Zac is usually happy to write these)
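A sketch of both options (my_prng stands in for a module-level PRNG in your code):

    import random
    import hypothesis
    from hypothesis import given, strategies as st

    my_prng = random.Random()            # e.g. a global PRNG inside your library
    hypothesis.register_random(my_prng)  # now seeded and restored by Hypothesis

    @given(st.random_module())           # also reseeds the random module's global state per example
    def test_with_global_randomness(_):
        ...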
Settings
Profiles
As a decorator on a test function (quick and dirty)
From the pytest command-line (inc. profile selection)
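A sketch of both styles - named profiles plus the quick-and-dirty decorator (profile names here are just examples):

    from hypothesis import given, settings, strategies as st

    settings.register_profile("ci", max_examples=1000)
    settings.register_profile("dev", max_examples=10)
    settings.load_profile("dev")   # or select at run time: pytest --hypothesis-profile=ci

    @settings(max_examples=500)    # per-test override
    @given(st.integers())
    def test_with_more_examples(x):
        ...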
Settings - performance
Check --hypothesis-show-statistics to see timings, including proportion of time spent generating data vs executing your test function
Settings - determinism
Maybe you only want to know about new bugs in CI
And then run in nondeterministic mode in other tests.
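One way to do that is a derandomized CI profile (a sketch, reusing the profile mechanism above):

    from hypothesis import settings

    settings.register_profile("ci", derandomize=True)    # stable, reproducible runs
    settings.register_profile("dev", derandomize=False)  # keep exploring locally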
Reproducing failures
Temporary decorator, but great in CI when printing doesn’t work
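When a test fails, Hypothesis prints a @reproduce_failure decorator you can paste onto the test temporarily; @seed() is a lighter-weight option while debugging. A sketch (the version and blob below are placeholders - use the ones Hypothesis prints):

    from hypothesis import given, seed, strategies as st
    # from hypothesis import reproduce_failure
    # @reproduce_failure("<version>", b"<blob printed by Hypothesis>")  # placeholder values

    @seed(12345)   # pin the run while investigating a failure
    @given(st.lists(st.integers()))
    def test_under_investigation(xs):
        ...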
Sharing the database
You could share the directory-based DB, but much better to use our native tools:
(and it’s easy to implement a Hypothesis DB on any key-value store)
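A sketch of that idea - the ExampleDatabase interface is just save/fetch/delete over a multi-valued key-value store (here an in-memory dict):

    from collections import defaultdict
    from hypothesis.database import ExampleDatabase

    class InMemoryDatabase(ExampleDatabase):
        def __init__(self):
            super().__init__()
            self._data = defaultdict(set)

        def save(self, key, value):
            self._data[key].add(value)

        def fetch(self, key):
            yield from self._data[key]

        def delete(self, key, value):
            self._data[key].discard(value)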
target() -guided testing
Hypothesis is mostly “blackbox” - using heuristics and diversity-sampling.
This is better than random, but a directed search is better again.
hypothesis.target(score_to_maximise, label="for multi-objective optimisation")
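A minimal sketch - target() feeds a score back to the search (here just the input size, purely as an illustration):

    from hypothesis import given, target, strategies as st

    @given(st.lists(st.integers()))
    def test_guided_towards_big_inputs(xs):
        target(float(len(xs)), label="input length")   # steer the search towards larger lists
        assert len(set(xs)) <= len(xs)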
Coverage-guided fuzzing: Atheris
No targets? No problem - target “executed this line of code”!
Atheris is Google’s libfuzzer wrapper for Python
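The usual hookup is to hand a test's fuzz_one_input hook to Atheris (module and test names below are hypothetical):

    import sys
    import atheris
    from my_package import test_roundtrips   # hypothetical module containing @given tests

    # libFuzzer drives Hypothesis's one-input-at-a-time entry point
    atheris.Setup(sys.argv, test_roundtrips.test_json_roundtrip.hypothesis.fuzz_one_input)
    atheris.Fuzz()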
Coverage-guided fuzzing: HypoFuzz
HypoFuzz is Zac’s fuzzing engine for Hypothesis test suites
Better workflow integration, great database support*, etc.
*but not better than .fuzz_one_input 😇
Where to go for support
hypothesis.readthedocs.io/en/latest/support.html
We do not promise free support, but you can try:
If you have a support or training budget, email us!
Updating Hypothesis
We do continuous deployment - every PR is a new release
Update on the schedule that works for you
e.g. weekly, monthly, or to get a new feature or perf improvement
We take stability very seriously
...but you should still pin all your transitive dependencies
Exercises!
Q&A time
last chance before we wrap up
Thanks for coming!
Now go forth and test everything :-)