Hybrid Fuzzing
Luca Borzacchiello
$ whoami
Outline
Symbolic Execution
Symbolic Execution
Program analysis technique pioneered by J. C. King in 1976
Symbolic Execution
Key ideas:
Program analysis technique pioneered by J. C. King in 1976
Symbolic Execution - Example Straight-line
int32_t foo(int32_t a) {
a = a + 10;
a = a * 2;
return a;
}
Symbolic Execution - Example Straight-line
int32_t foo(int32_t a) {
a = a + 10;
a = a * 2;
return a;
}
Symbolic Execution - Example Straight-line
int32_t foo(int32_t a) {
a = a + 10;
a = a * 2;
return a;
}
Symbolic Execution - Example Straight-line
int32_t foo(int32_t a) {
a = a + 10;
a = a * 2;
return a;
}
Symbolic Execution - Example Straight-line
int32_t foo(int32_t a) {
a = a + 10;
a = a * 2;
return a;
}
Easy! No branches!
Symbolic Execution - Example Branch
What happens at branches?
Symbolic Execution - Example Branch
What happens at branches?
int32_t abs(int32_t a) {
if (a > 0)
return a;
return -a;
}
Symbolic Execution - Example Branch
What happens at branches?
int32_t abs(int32_t a) {
if (a > 0)
return a;
return -a;
}
Symbolic Execution - Example Branch
What happens at branches?
int32_t abs(int32_t a) {
if (a > 0)
return a;
return -a;
}
The execution state is split in two states that models either outcome of the branch
Symbolic Execution - Example Branch
What happens at branches?
int32_t abs(int32_t a) {
if (a > 0)
return a;
return -a;
}
The execution state is split in two states that models either outcome of the branch
Each state has a path constraint that defines its condition of validity
Symbolic Execution - Symbolic Formulas
Now What?
Symbolic Execution - Symbolic Formulas
We can use an SMT Solver to reason on the generated formulas: e.g., we can check whether the dividend of a division can be zero
Now What?
[...]
return a / b;
SMT Solver
Is satisfiable?
32-bit bit-vectors
Symbolic Execution - Symbolic Formulas
We can use an SMT Solver to reason on the generated formulas: e.g., we can check whether the dividend of a division can be zero
Now What?
[...]
return a / b;
SMT Solver
Is satisfiable?
yes!
32-bit bit-vectors
Symbolic Execution - Example
Symbolic Execution can find all the inputs that makes the assertion at line 8 fail
Symbolic Execution - Final Remarks
In general, pure static symbolic execution hardly scales on real-world programs:
Static symbolic executors:
Concolic Execution
Concolic Execution
Dynamic flavor of symbolic execution (concrete + symbolic)
Key idea:
Concolic Execution - Example
int32_t foo(int32_t a) {
a = a + 1;
if (a > 5)
if (a < 8)
a = a / 2;
return a;
}
Concolic Execution - Example
int32_t foo(int32_t a) {
a = a + 1;
if (a > 5)
if (a < 8)
a = a / 2;
return a;
}
concrete value (seed)
Concolic Execution - Example
int32_t foo(int32_t a) {
a = a + 1;
if (a > 5)
if (a < 8)
a = a / 2;
return a;
}
Concolic Execution - Example
int32_t foo(int32_t a) {
a = a + 1;
if (a > 5)
if (a < 8)
a = a / 2;
return a;
}
Concolic Execution - Example
int32_t foo(int32_t a) {
a = a + 1;
if (a > 5)
if (a < 8)
a = a / 2;
return a;
}
We can use the negated branch condition to generate a new input that covers the other outcome of the branch
Concolic Execution - Example
int32_t foo(int32_t a) {
a = a + 1;
if (a > 5)
if (a < 8)
a = a / 2;
return a;
}
Concolic Execution - Example
int32_t foo(int32_t a) {
a = a + 1;
if (a > 5)
if (a < 8)
a = a / 2;
return a;
}
Concolic Execution - Final Remarks
Pros wrt static symbolic execution
Cons wrt static symbolic execution
NOTE: path explosion is still an issue!
Hybrid Fuzzing
Fuzzing
Fuzzing
very effective in practice
Fuzzing
very effective in practice
Fuzzing
very effective in practice
Fuzzing
very effective in practice
Fuzzing
very effective in practice
Fuzzing
very effective in practice
Fuzzing
Still, it has some drawbacks:
Fuzzing
Still, it has some drawbacks:
Can we combine fuzzing and symbolic execution to take the best from the two worlds?
Fuzzing vs Symbolic Execution
Driller
First work (2016) on fuzzing improved with concolic execution
Key Ideas:
Issues:
QSYM
Hybrid fuzzer and concolic executor based on Intel PIN (2018)
Key Ideas:
QSYM - Parallel Setup
Input queue A
Fuzzer�[master]
Fuzzer�[slave]
QSYM
Input queue B
load/store
load/store
load
store
two-way synchronization
one-way synchronization
one-way synchronization
Input queue C
SymCC
Hybrid fuzzer and concolic executor based on LLVM (2020)
Key Ideas:
Fuzzolic
Fuzzolic
Fast concolic executor based on QEMU (JIT-ed instrumentation)
Tracer�(QEMU)
Solver�Frontend
Query Pool
Expression Pool
Shared Memories
Solver�Backend
QEMU - JIT Workflow
(a)�Next basic block to execute is in cache?
(b)�JIT compilation of the basic block, then�add it to cache
(c)�Execute basic block �from the cache
YES
NO
THEN
THEN
Note: during JIT compilation, QEMU translate native code to the intermediate language TCG
Fuzzolic - Instrumentation
QEMU with FUZZOLIC
TCG�code
TCG�code
Lifting from target code to TCG code
Live analysis and optimization
Symbolic�instrumentation
Target�binary�code
Lowering from TCG code to �host code
TCG�code
Host�binary�code
TCG�code
QEMU without FUZZOLIC
Edge legend:
QEMU with/without FUZZOLIC
Fuzzolic instruments the program in TCG
Fuzzolic - Instrumentation
Two instrumentation methods:
mov rbx, rax
x64
Fuzzolic - Instrumentation
Two instrumentation methods:
mov rbx, rax
x64
mov_i64 tcg_reg_rbx, tcg_reg_rax
tcg
Fuzzolic - Instrumentation
Two instrumentation methods:
mov rbx, rax
x64
mov_i64 tcg_reg_rbx, tcg_reg_rax
tcg
mov_i64 tcg_sym_reg_rbx, tcg_sym_reg_rax
mov_i64 tcg_reg_rbx, tcg_reg_rax
instrumented (inline)
Fuzzolic - Instrumentation
Two instrumentation methods:
mov rbx, rax
x64
mov_i64 tcg_reg_rbx, tcg_reg_rax
tcg
mov_i64 tcg_sym_reg_rbx, tcg_sym_reg_rax
mov_i64 tcg_reg_rbx, tcg_reg_rax
instrumented (inline)
call fuzzolic_mov_reg, tcg_sym_reg_rax, tcg_sym_reg_rbx
mov_i64 tcg_reg_rbx, tcg_reg_rax
instrumented (helper)
Fuzzolic - TCG Helpers
Unfortunately, while TCG is architecture-independent, it uses many helpers for particular instructions
Fuzzy-SAT - An Approximate Solver
Key Ideas:
Fuzzy-SAT - An Approximate Solver
Given a branch query and the seed, mutate the bytes of the seed trying to solve while keeping satisfiable.
Fuzzy-SAT Architecture
Fuzzy-SAT - An Approximate Solver
Given a branch query and the seed, mutate the bytes of the seed trying to solve while keeping satisfiable.
Fuzzy-SAT Architecture
Fuzzy-SAT - An Approximate Solver
Given a branch query and the seed, mutate the bytes of the seed trying to solve while keeping satisfiable.
Fuzzy-SAT Architecture
Fuzzy-SAT - Performance
Fuzzolic - Performance
DEMO
Thanks!
Questions?