6 of 63

Symbolic Execution

Key ideas:

Each input in the program is associated with a symbol
Each symbol represents a set of values
Instructions in the program generates expressions

Program analysis technique pioneered by J. C. King in 1976

7 of 63

Symbolic Execution - Example Straight-line

int32_t foo(int32_t a) {

a = a + 10;

a = a * 2;

return a;

}

8 of 63

Symbolic Execution - Example Straight-line

int32_t foo(int32_t a) {

a = a + 10;

a = a * 2;

return a;

}

9 of 63

Symbolic Execution - Example Straight-line

int32_t foo(int32_t a) {

a = a + 10;

a = a * 2;

return a;

}

10 of 63

Symbolic Execution - Example Straight-line

int32_t foo(int32_t a) {

a = a + 10;

a = a * 2;

return a;

}

11 of 63

Symbolic Execution - Example Straight-line

int32_t foo(int32_t a) {

a = a + 10;

a = a * 2;

return a;

}

Easy! No branches!

12 of 63

Symbolic Execution - Example Branch

What happens at branches?

13 of 63

Symbolic Execution - Example Branch

What happens at branches?

int32_t abs(int32_t a) {

if (a > 0)

return a;

return -a;

}

14 of 63

Symbolic Execution - Example Branch

What happens at branches?

int32_t abs(int32_t a) {

if (a > 0)

return a;

return -a;

}

15 of 63

Symbolic Execution - Example Branch

What happens at branches?

int32_t abs(int32_t a) {

if (a > 0)

return a;

return -a;

}

The execution state is split in two states that models either outcome of the branch

16 of 63

Symbolic Execution - Example Branch

What happens at branches?

int32_t abs(int32_t a) {

if (a > 0)

return a;

return -a;

}

The execution state is split in two states that models either outcome of the branch

Each state has a path constraint that defines its condition of validity

17 of 63

Symbolic Execution - Symbolic Formulas

Now What?

18 of 63

Symbolic Execution - Symbolic Formulas

We can use an SMT Solver to reason on the generated formulas: e.g., we can check whether the dividend of a division can be zero

Now What?

[...]

return a / b;

SMT Solver

Is satisfiable?

32-bit bit-vectors

19 of 63

Symbolic Execution - Symbolic Formulas

We can use an SMT Solver to reason on the generated formulas: e.g., we can check whether the dividend of a division can be zero

Now What?

[...]

return a / b;

SMT Solver

Is satisfiable?

yes!

32-bit bit-vectors

20 of 63

Symbolic Execution - Example

Symbolic Execution can find all the inputs that makes the assertion at line 8 fail

image credit

21 of 63

Symbolic Execution - Final Remarks

In general, pure static symbolic execution hardly scales on real-world programs:

Path explosion
Hard-to-solve constraints
Symbolic memory accesses
Emulation time
…

Static symbolic executors:

22 of 63

Concolic Execution

23 of 63

Concolic Execution

Dynamic flavor of symbolic execution (concrete + symbolic)

Key idea:

Choose a concrete path (i.e., an input for the program)
Run concretely the program
Build symbolic expressions alongside concrete values
Check conditions on the chosen path

24 of 63

Concolic Execution - Example

int32_t foo(int32_t a) {

a = a + 1;

if (a > 5)

if (a < 8)

a = a / 2;

return a;

}

25 of 63

Concolic Execution - Example

int32_t foo(int32_t a) {

a = a + 1;

if (a > 5)

if (a < 8)

a = a / 2;

return a;

}

concrete value (seed)

26 of 63

Concolic Execution - Example

int32_t foo(int32_t a) {

a = a + 1;

if (a > 5)

if (a < 8)

a = a / 2;

return a;

}

27 of 63

Concolic Execution - Example

int32_t foo(int32_t a) {

a = a + 1;

if (a > 5)

if (a < 8)

a = a / 2;

return a;

}

28 of 63

Concolic Execution - Example

int32_t foo(int32_t a) {

a = a + 1;

if (a > 5)

if (a < 8)

a = a / 2;

return a;

}

We can use the negated branch condition to generate a new input that covers the other outcome of the branch

29 of 63

Concolic Execution - Example

int32_t foo(int32_t a) {

a = a + 1;

if (a > 5)

if (a < 8)

a = a / 2;

return a;

}

30 of 63

Concolic Execution - Example

int32_t foo(int32_t a) {

a = a + 1;

if (a > 5)

if (a < 8)

a = a / 2;

return a;

}

31 of 63

Concolic Execution - Final Remarks

Pros wrt static symbolic execution

No need to call the solver at branches if we are not interested in generating a new input: less queries
The implementation can be very fast if we compile the instrumentation (e.g., using LLVM, PIN, QEMU)

Cons wrt static symbolic execution

To explore a branch that is not taken by the seed, you need to re-execute from the beginning

NOTE: path explosion is still an issue!

32 of 63

Hybrid Fuzzing

33 of 63

Fuzzing

State-of-the-art technique for automatic test-case generation and vulnerability detection
Based on random mutations of a pool of inputs with the goal of finding crashes in the target application

34 of 63

Fuzzing

State-of-the-art technique for automatic test-case generation and vulnerability detection
Based on random mutations of a pool of inputs with the goal of finding crashes in the target application

very effective in practice

35 of 63

Fuzzing

State-of-the-art technique for automatic test-case generation and vulnerability detection
Based on random mutations of a pool of inputs with the goal of finding crashes in the target application
Very effective in practice...

very effective in practice

36 of 63

Fuzzing

State-of-the-art technique for automatic test-case generation and vulnerability detection
Based on random mutations of a pool of inputs with the goal of finding crashes in the target application
Very effective in practice...

very effective in practice

37 of 63

Fuzzing

State-of-the-art technique for automatic test-case generation and vulnerability detection
Based on random mutations of a pool of inputs with the goal of finding crashes in the target application
Very effective in practice...

very effective in practice

38 of 63

Fuzzing

State-of-the-art technique for automatic test-case generation and vulnerability detection
Based on random mutations of a pool of inputs with the goal of finding crashes in the target application
Very effective in practice...

very effective in practice

39 of 63

Fuzzing

State-of-the-art technique for automatic test-case generation and vulnerability detection
Based on random mutations of a pool of inputs with the goal of finding crashes in the target application
Very effective in practice...

very effective in practice

40 of 63

Fuzzing

Still, it has some drawbacks:

Struggles in overcoming checks against magic numbers or checksums (road-blocks)
Typically needs a reasonable set of input seeds

41 of 63

Fuzzing

Still, it has some drawbacks:

Struggles in overcoming checks against magic numbers or checksums (road-blocks)
Typically needs a reasonable set of input seeds

Can we combine fuzzing and symbolic execution to take the best from the two worlds?

42 of 63

Fuzzing vs Symbolic Execution

43 of 63

Driller

First work (2016) on fuzzing improved with concolic execution

Key Ideas:

Launch the fuzzer (AFL) for some time
When the fuzzer get stuck (i.e., is unable to generate a new input), run the concolic executor and trying to generate a new input

Issues:

The concolic executor is slow (interpreted, and written in python)
The interaction between the fuzzer and the concolic executor is coarse grained

44 of 63

QSYM

Hybrid fuzzer and concolic executor based on Intel PIN (2018)

Key Ideas:

JIT the instrumentation at runtime using PIN (no interpretation)
The concolic executor runs in parallel with the fuzzer
Optimizations: linearization and optimistic solving

45 of 63

QSYM - Parallel Setup

Input queue A

Fuzzer�[master]

Fuzzer�[slave]

QSYM

Input queue B

load/store

load

store

two-way synchronization

one-way synchronization

Input queue C

46 of 63

SymCC

Hybrid fuzzer and concolic executor based on LLVM (2020)

Key Ideas:

Very fast instrumentation added at compilation time with an LLVM pass
Parallel setup (as in QSYM)
Function models to inject symbolic data
Requires source code and recompilation of all components (or misses symbolic propagation)

48 of 63

Fuzzolic

Fast concolic executor based on QEMU (JIT-ed instrumentation)

Developed independently and in parallel with SymQEMU (another concolic executor based on QEMU)
Same parallel setup of QSYM
Two solver backends:

Z3 SMT solver
FuzzySAT: approximate solver

Tracer�(QEMU)

Solver�Frontend

Query Pool

Expression Pool

Shared Memories

Solver�Backend

49 of 63

QEMU - JIT Workflow

(a)�Next basic block to execute is in cache?

(b)�JIT compilation of the basic block, then�add it to cache

(c)�Execute basic block �from the cache

YES

THEN

Note: during JIT compilation, QEMU translate native code to the intermediate language TCG

50 of 63

Fuzzolic - Instrumentation

QEMU with FUZZOLIC

TCG�code

Lifting from target code to TCG code

Live analysis and optimization

Symbolic�instrumentation

Target�binary�code

Lowering from TCG code to �host code

TCG�code

Host�binary�code

TCG�code

QEMU without FUZZOLIC

Edge legend:

QEMU with/without FUZZOLIC

Fuzzolic instruments the program in TCG

51 of 63

Fuzzolic - Instrumentation

Two instrumentation methods:

Inline instrumentation (simplified)

mov rbx, rax

x64

52 of 63

Fuzzolic - Instrumentation

Two instrumentation methods:

Inline instrumentation (simplified)

mov rbx, rax

x64

mov_i64 tcg_reg_rbx, tcg_reg_rax

tcg

53 of 63

Fuzzolic - Instrumentation

Two instrumentation methods:

Inline instrumentation (simplified)

mov rbx, rax

x64

mov_i64 tcg_reg_rbx, tcg_reg_rax

tcg

mov_i64 tcg_sym_reg_rbx, tcg_sym_reg_rax

mov_i64 tcg_reg_rbx, tcg_reg_rax

instrumented (inline)

54 of 63

Fuzzolic - Instrumentation

Two instrumentation methods:

Inline instrumentation (simplified)

Helper instrumentation (simplified)

mov rbx, rax

x64

mov_i64 tcg_reg_rbx, tcg_reg_rax

tcg

mov_i64 tcg_sym_reg_rbx, tcg_sym_reg_rax

mov_i64 tcg_reg_rbx, tcg_reg_rax

instrumented (inline)

call fuzzolic_mov_reg, tcg_sym_reg_rax, tcg_sym_reg_rbx

mov_i64 tcg_reg_rbx, tcg_reg_rax

instrumented (helper)

55 of 63

Fuzzolic - TCG Helpers

Unfortunately, while TCG is architecture-independent, it uses many helpers for particular instructions

x86 is full of those helpers (e.g., vector instructions)

Fuzzolic has hard-written models for some of those helpers, but many are missing (e.g., floating point instructions)

56 of 63

Fuzzy-SAT - An Approximate Solver

Key Ideas:

Given a branch query of a concolic executor ( ), the seed that has driven the concolic exploration satisfies by design
Fuzzing transformations has been proven effective in overcoming branch conditions
While SMT solvers offer a rich set of solving primitives, most concolic executors (e.g., QSYM) are built on top of a few but essential primitives

57 of 63

Fuzzy-SAT - An Approximate Solver

Given a branch query and the seed, mutate the bytes of the seed trying to solve while keeping satisfiable.

Fuzzy-SAT Architecture

58 of 63

Fuzzy-SAT - An Approximate Solver

Given a branch query and the seed, mutate the bytes of the seed trying to solve while keeping satisfiable.

Analysis: learn from the symbolic expressions added to

Fuzzy-SAT Architecture

59 of 63

Fuzzy-SAT - An Approximate Solver

Given a branch query and the seed, mutate the bytes of the seed trying to solve while keeping satisfiable.

Analysis: learn from the symbolic expressions added to
Reasoning: use the acquired knowledge to apply simple but fast transformations to the seed

Fuzzy-SAT Architecture

1 of 63

2 of 63

3 of 63

4 of 63

5 of 63

6 of 63

7 of 63

8 of 63

9 of 63

10 of 63

11 of 63

12 of 63

13 of 63

14 of 63

15 of 63

16 of 63

17 of 63

18 of 63

19 of 63

20 of 63

21 of 63

22 of 63

23 of 63

24 of 63

25 of 63

26 of 63

27 of 63

28 of 63

29 of 63

30 of 63

31 of 63

32 of 63

33 of 63

34 of 63

35 of 63

36 of 63

37 of 63

38 of 63

39 of 63

40 of 63

41 of 63

42 of 63

43 of 63

44 of 63

45 of 63

46 of 63

47 of 63

48 of 63

49 of 63

50 of 63

51 of 63

52 of 63

53 of 63

54 of 63

55 of 63

56 of 63

57 of 63

58 of 63

59 of 63

60 of 63

61 of 63

62 of 63

63 of 63