1 of 25

Differential Testing with Foundry

@annascarroll

2 of 25

What is differential testing?

Differential testing checks that two or more implementations of equivalent code behave identically.

  • Compare 2+ instances of equivalent code
    • A, B
  • Provide the same test inputs
    • x
  • Ensure all meaningful behavior is identical (not just outputs)
    • A(x) == B(x)

Note: inputs are usually assumed to be generated (e.g. fuzzed), but this is not required
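As a minimal sketch of the definition above, the Python below feeds the same inputs to two implementations and compares everything observable, including which error is raised, not only the return value. The integer square root functions and the helper names (`isqrt_reference`, `isqrt_newton`, `observe`, `differential_check`) are illustrative assumptions, not code from the talk.

```python
import math
import random


def isqrt_reference(n: int) -> int:
    """Implementation A: defer to the standard library."""
    return math.isqrt(n)  # raises ValueError for negative n


def isqrt_newton(n: int) -> int:
    """Implementation B: hand-written Newton iteration."""
    if n < 0:
        raise ValueError("isqrt() argument must be nonnegative")
    if n == 0:
        return 0
    x = 1 << ((n.bit_length() + 1) // 2)  # initial guess above sqrt(n)
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y


def observe(fn, x):
    """Capture all meaningful behavior: the result or the error type."""
    try:
        return ("ok", fn(x))
    except Exception as exc:
        return ("error", type(exc).__name__)


def differential_check(impl_a, impl_b, inputs):
    """Return every input x for which A(x) and B(x) behave differently."""
    return [x for x in inputs if observe(impl_a, x) != observe(impl_b, x)]


# Same inputs for both: a few explicit edge cases plus seeded random ones.
rng = random.Random(0)
inputs = [-1, 0, 1, 2, 3, 4, 2**256 - 1]
inputs += [rng.getrandbits(256) for _ in range(200)]

mismatches = differential_check(isqrt_reference, isqrt_newton, inputs)
print("mismatching inputs:", mismatches)
```

An empty list means A(x) == B(x) held for every input tried, errors included. It does not prove equivalence; it only builds confidence in proportion to how good the inputs are.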

4 of 25

Differential testing smart contracts

Candidate implementations do not need to be written in the same language, nor even compile to the same VM

  1. Test Solidity against Python
    • for EVM-agnostic code like math or Merkle tree libraries
  2. Test Solidity against Solidity
    • for rich, stateful, EVM-specific applications - emits, reverts, etc.
  3. Test Solidity against Huff, Vyper, Edge
    • for personal edification

5 of 25

with Foundry…

  • All the ergonomics of Foundry that Solidity devs are used to
  • Easy combination with other testing patterns
    • Local tests, Fork tests
    • Explicit inputs, Fuzzed inputs
  • The `ffi` cheatcode enables testing against non-EVM implementations
  • Can stick to native testing patterns for EVM-only implementations
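One way the non-EVM side of an `ffi` test could look is a small Python reference script that the Foundry test shells out to. This is a sketch under assumptions: the script name, the choice of integer square root as the function under test, and the convention of printing a single ABI-encoded `uint256` as a hex word are all illustrative. `ffi` has to be enabled in the Foundry configuration, and since it runs arbitrary commands it is best limited to code you trust.

```python
import math
import sys

UINT256_MAX = 2**256 - 1


def encode_uint256(value: int) -> str:
    """ABI-encode one uint256 as a 0x-prefixed, 32-byte big-endian hex word."""
    if not 0 <= value <= UINT256_MAX:
        raise ValueError("value does not fit in uint256")
    return "0x" + value.to_bytes(32, "big").hex()


def reference_sqrt(arg: str) -> str:
    """Parse a decimal argument and return the encoded reference result."""
    return encode_uint256(math.isqrt(int(arg)))


# Hypothetical invocation from a test: python3 sqrt_reference.py 16
# The script only acts when given exactly one decimal argument.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    # No trailing newline, so the caller receives only the hex word.
    sys.stdout.write(reference_sqrt(sys.argv[1]))
```

On the Solidity side, the test would pass the command and its argument to `vm.ffi`, decode the returned bytes as a `uint256`, and compare that value with the Solidity implementation's result for the same input.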

6 of 25

EVM-Specific Use Cases

  1. Gas Optimization
  2. Validating Upgrades

7 of 25

Gas Optimization

  • Write an unoptimized reference implementation in idiomatic Solidity
  • Transpile the reference into a highly optimized final version
  • Differential test the unoptimized reference vs. the optimized final version
  • “The Seaport Pattern”

8 of 25

Gas Optimization - Benefits

  • Developers can design & build the app more quickly using Solidity, then focus on optimization later
  • End users get a hyper-optimized contract on mainnet
  • Reference implementation becomes a valuable resource
    • Auditors can audit the reference and the final for higher confidence
    • Integrators & devs with less context on Solidity can understand the code via the reference
    • Teaching tool to learn Yul / Assembly by studying the reference & the final
  • Differential testing builds confidence that optimizations didn’t break functionality

9 of 25

Validating Upgrades

  • Write fork tests against production contracts pre-upgrade
  • Write fork tests against production contracts post-upgrade
  • Differential testing builds confidence that new code doesn’t interact badly with state on mainnet

10 of 25

Coding Patterns

  1. “Naive” pattern
  2. Seaport pattern
  3. My proposed pattern >:~)

11 of 25

Naive pattern

12 of 25

Naive pattern

  • Probably the first thing people picture when they think of differential testing
  • Gets clunky really fast
  • To make richer assertions (expectEmit, expectRevert, etc.), you need to repeat the same assertion for each contract
  • Lots of boilerplate

13 of 25

Seaport pattern

15 of 25

Seaport pattern: Benefits

  • A lot cleaner than the Naive pattern, but still less readable than normal Foundry tests
  • Under the hood, the workarounds are inscrutable to most devs
  • Inaccessible

16 of 25

Proposed pattern

17 of 25

Proposed pattern

18 of 25

Proposed pattern: Input Equality

  • Explicitly defined inputs are automatically equal
  • To make fuzz inputs equal… side quest into Foundry fuzzer

19 of 25

Foundry Fuzzer

  • The Foundry fuzzer is “smart” about producing inputs
  • It doesn’t just throw meaningless data at tests
  • It adds values from the test setup to a dictionary
    • Any values written to storage (SSTORE)
    • Any values pushed to the stack (PUSH opcodes)
  • Dictionary data serves as input for an algorithm which produces test inputs
  • More bugs and edge cases can be caught this way

  • Algorithm that produces inputs can be made deterministic by adding a salt
  • If input dictionary is the same && a salt is provided, test inputs are the same
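The determinism claim in the last two bullets can be illustrated with a toy generator. This is not Foundry's algorithm; the function name, the `dictionary_weight` parameter as used here, and the sample dictionary are assumptions made for the sketch. It only shows that when all randomness flows from one seed, the same seed plus the same dictionary yields the same inputs.

```python
import random

UINT256_BITS = 256


def generate_inputs(seed, dictionary, runs, dictionary_weight=0.4):
    """Toy input generator mixing dictionary values with random uint256s.

    NOT Foundry's algorithm; it only illustrates seed + dictionary determinism.
    """
    rng = random.Random(seed)  # all randomness flows from the seed
    inputs = []
    for _ in range(runs):
        if dictionary and rng.random() < dictionary_weight:
            inputs.append(rng.choice(dictionary))  # reuse an "interesting" value
        else:
            inputs.append(rng.getrandbits(UINT256_BITS))  # plain random value
    return inputs


# Hypothetical dictionary, e.g. constants seen in storage or PUSH opcodes.
dictionary = [0, 1, 10**18, 2**255]

first = generate_inputs(seed=1337, dictionary=dictionary, runs=8)
second = generate_inputs(seed=1337, dictionary=dictionary, runs=8)
print(first == second)  # True: same seed and same dictionary
```

In this sketch, changing the dictionary can change the generated sequence even when the seed stays fixed. Two implementations with different bytecode or storage could therefore be handed different inputs unless those dictionary sources are excluded.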

20 of 25

Proposed pattern: Input Equality

  • Explicit developer-defined inputs are automatically the same
  • To make fuzz inputs the same…
    • Provide a salt (the `seed` setting in the fuzz config)
    • Probably exclude values pushed to stack
    • Maybe exclude values from storage
    • Make sure FUZZ_RUNS is configured as high as you want it - inputs will be exactly the same every time you run the tests

[fuzz]
seed = "1337"
include_push_bytes = false

21 of 25

Proposed pattern: Drawbacks

  • Excluding dictionary values makes the fuzzer a bit “dumber”
    • Could surface fewer bugs
  • Including dictionary values means the inputs might no longer be the same between the test runs
    • i.e. the tests no longer conform to the strict definition of Differential testing
  • To mitigate, could run the test suite twice with different configurations to get the best of both worlds

22 of 25

Proposed pattern: Benefits

  • Very readable
  • Quick & easy to try, even with existing test suites
  • Pragmatic
  • Approachable for all levels
  • Could radically reduce the barrier-to-entry for trying Differential testing!

23 of 25

Acknowledgements

Kudos to emo.eth, 0age, et al. for work on Differential testing in Seaport

Thank u evalir for answering Qs about Foundry fuzzer :)

Thank u Prestwich, Jenny Pollack, aleph_v for being awesome sounding boards <3

25 of 25

Questions?

Twitter: @annascarroll