Rich coverage signal and consequences for scaling
Agenda
Fuzzing: generating a maximally diverse but limited set of test inputs
Coverage-guided fuzzing: coverage is the diversity metric
[Diagram: Fuzzing Engine (Mutator, Seed Corpus, Generated Corpus) sends Inputs to the Fuzz Target (S.U.T.) and receives Coverage]
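The engine, mutator, corpus, and coverage feedback named above can be sketched as a loop. This is a minimal illustration, not any real engine: `toy_target`, its branch labels, the `mutate` function, and the "add if it reaches something new" rule are all hypothetical stand-ins chosen for brevity.

```python
import random

def toy_target(data: bytes) -> frozenset:
    """Hypothetical S.U.T.: returns the set of branch labels the input reached."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add("FUZ")
    return frozenset(covered)

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Hypothetical mutator: overwrite one random byte, or append one."""
    buf = bytearray(data)
    if buf and rng.random() < 0.7:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)

def fuzz(seed_corpus, target, runs: int, seed: int = 0):
    """Coverage-guided loop; assumes a non-empty seed corpus."""
    rng = random.Random(seed)
    corpus = list(seed_corpus)
    seen = set()
    for inp in corpus:
        seen |= target(inp)
    for _ in range(runs):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = target(candidate)
        if not coverage <= seen:   # reached something new: keep the mutant
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen
```

Every input kept by this loop contributes at least one previously unseen branch label, so the corpus can only grow while new coverage is still reachable.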
Guided fuzzing is stronger than unguided (*), but…
Guided fuzzing is unguided most of the time
(*) empirically
Initial state: seed corpus
Early fuzzing: interesting mutants are added to corpus
Corpus grows
More fuzzing
Corpus grows
And grows … until it doesn’t
[Figure legend for these slides: input in corpus, input not in corpus, input newly added to corpus, recently added]
Problem: traditional coverage signals are sparse
Desired: dense coverage signal, but fuzzing still scales
Rich (dense) coverage => slower runs, larger corpus, more runs => more CPU, RAM, and Disk
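A rough illustration of the "larger corpus" part of this trade-off, under stated assumptions: the same inputs are scored with two hypothetical feature extractors, a sparse one (which byte classes appear) and a denser one (byte class plus a bucketed count). Neither is the signal of any particular engine; the point is only that a denser signal accepts more inputs.

```python
from collections import Counter

def byte_class(b: int) -> str:
    """Hypothetical stand-in for 'which branch ran'."""
    if 48 <= b <= 57:
        return "digit"
    if 65 <= b <= 90 or 97 <= b <= 122:
        return "alpha"
    return "other"

def sparse_features(data: bytes) -> set:
    """Sparse signal: one feature per byte class that appears at all."""
    return {(c,) for c in map(byte_class, data)}

def dense_features(data: bytes) -> set:
    """Denser signal: byte class plus the bit length of its count (a log-style bucket)."""
    counts = Counter(map(byte_class, data))
    return {(c, n.bit_length()) for c, n in counts.items()}

def build_corpus(inputs, extract) -> list:
    """Keep an input only if it contributes a feature not seen before."""
    seen, corpus = set(), []
    for inp in inputs:
        feats = extract(inp)
        if not feats <= seen:
            seen |= feats
            corpus.append(inp)
    return corpus

inputs = [b"a", b"ab", b"abcd", b"1", b"a1", b"abcdefgh", b"!"]
sparse_corpus = build_corpus(inputs, sparse_features)
dense_corpus = build_corpus(inputs, dense_features)
```

On these seven inputs the sparse signal keeps 3 of them and the denser signal keeps 6, so the denser signal leaves more inputs to store, mutate, and re-run.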
Case Study: SiliFuzz
SiliFuzz: detects CPU bugs and defects
[0] SiliFuzz: Fuzzing CPUs by proxy. arxiv.org/abs/2110.11519
[1] Cores that don't count. dl.acm.org/doi/10.1145/3458336.3465297 (Google 2021)
[2] Silent Data Corruptions at Scale. arxiv.org/abs/2102.11245 (Facebook 2021)
[3] Detecting silent data corruptions in the wild (<link>, Meta 2022)
SiliFuzz: fuzzing by proxy
Anecdotal evidence in support of a rich coverage signal
Centipede: a distributed fuzzing engine with support for rich coverage signal
Centipede’s goals
Centipede: Engine vs Runner
[Diagram labels: Centipede Engine, Mutator, Seed Corpus, Generated Corpus, Inputs, Coverage, Centipede Runner, Fuzz Target (S.U.T.)]
Features and Feature Domains
Feature domains supported currently
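One way to picture features and feature domains: a feature is an integer, and each domain owns a disjoint range of integers. The sketch below assumes a hypothetical layout (domain id in the high bits, a 32-bit index in the low bits) and placeholder domain names; it is not taken from Centipede's source and does not list the domains the slide refers to.

```python
DOMAIN_BITS = 32                      # assumption: 32-bit index per domain
DOMAIN_SIZE = 1 << DOMAIN_BITS

# Placeholder domain names, for illustration only.
DOMAINS = {"edge": 0, "cmp": 1}
DOMAIN_NAMES = {v: k for k, v in DOMAINS.items()}

def make_feature(domain: str, index: int) -> int:
    """Pack (domain, index) into one integer; the index wraps within the domain."""
    return DOMAINS[domain] * DOMAIN_SIZE + (index % DOMAIN_SIZE)

def domain_of(feature: int) -> str:
    """Recover the domain name from a packed feature."""
    return DOMAIN_NAMES[feature // DOMAIN_SIZE]

def index_of(feature: int) -> int:
    """Recover the within-domain index from a packed feature."""
    return feature % DOMAIN_SIZE
```

With this kind of encoding, the engine can treat all coverage uniformly as a set of integers, while still being able to report or weight each domain separately.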
Shards, state, distributed execution
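A possible shape for sharded state, sketched in memory. The assumptions here are loud ones: each worker appends only to its own shard and periodically reads the other workers' shards. The `Shard` and `Worker` classes are hypothetical, and nothing below is claimed to match Centipede's actual on-disk format or protocol.

```python
class Shard:
    """Append-only log of (input, features) records owned by one worker."""
    def __init__(self):
        self.records = []

class Worker:
    def __init__(self, index: int, shards: list):
        self.index = index
        self.shards = shards
        self.seen = set()                     # features this worker knows about
        self.corpus = []                      # inputs this worker will mutate
        self.read_pos = [0] * len(shards)     # how far each peer shard was read

    def add_local(self, inp: bytes, features) -> bool:
        """Record a locally found input if it has a new feature."""
        features = frozenset(features)
        if features <= self.seen:
            return False
        self.seen |= features
        self.corpus.append(inp)
        self.shards[self.index].records.append((inp, features))
        return True

    def sync(self) -> None:
        """Read new records from peer shards; never write to them."""
        for i, shard in enumerate(self.shards):
            if i == self.index:
                continue
            for inp, features in shard.records[self.read_pos[i]:]:
                if not features <= self.seen:
                    self.seen |= features
                    self.corpus.append(inp)
            self.read_pos[i] = len(shard.records)
```

Because each shard has a single writer in this sketch, workers need no locking between them; they only pay the cost of re-reading what peers appended since the last sync.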
Guided Mutation and Execution Metadata
<not covered here>
Corpus management
Corpus distillation (minimization)
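Distillation can be viewed as picking a smaller set of inputs that still covers every feature the full corpus covers. The sketch below uses a greedy set-cover heuristic with a shorter-input tie-break; that choice is an assumption made for illustration, not a description of how Centipede distills.

```python
def distill(corpus: dict) -> list:
    """corpus maps input bytes -> set of features.

    Returns a list of inputs whose features together equal the union of
    all features in the corpus (greedy, not guaranteed to be minimal).
    """
    uncovered = set().union(*corpus.values())
    kept = []
    while uncovered:
        # Prefer the input covering the most still-uncovered features,
        # breaking ties in favour of the shorter input.
        best = max(
            corpus,
            key=lambda inp: (len(corpus[inp] & uncovered), -len(inp)),
        )
        kept.append(best)
        uncovered -= corpus[best]
    return kept
```

For example, `distill({b"aaaa": {1, 2}, b"b": {2}, b"cc": {3}, b"d": {1, 2, 3}})` keeps only `b"d"`, since that single input already covers all three features.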
Centipede vs {libFuzzer, AFL, … }
Q&A