1 of 183

Symbolic Execution for �Bug Hunting in Binaries

Fabio Gritti

(@degrigis)

2 of 183

2

Many many papers from an heterogeneous set of research groups!

3 of 183

Background

3

4 of 183

Bug Hunting

4

  • Hunt for program states that are breaking the logic of the application and/or can be exploited to take control of the program itself.
  • Hunting for bugs became the gold rush of our ages
    • Different hunter → different motivations�

5 of 183

Symbolic Execution

5

1975

6 of 183

6

Are there any values for “a” and “b” for which the program breaks the assertion?

7 of 183

7

a=5, b=7

Are there any values for “a” and “b” for which the program breaks the assertion?

8 of 183

8

a=3, b=1

Are there any values for “a” and “b” for which the program breaks the assertion?

9 of 183

Symbolic Execution

9

symbolic variables

Are there any values for “a” and “b” for which the program breaks the assertion?

10 of 183

Symbolic Execution

10

symbolic variables

Are there any values for “a” and “b” for which the program breaks the assertion?

a=2, b=0

11 of 183

Program Binary

11

  • A binary is an artifact (i.e., a file) produced by a compiler after compiling a program’s source code.�
  • Strict definition: binary contains assembly instructions executed by the CPU�
  • Loose Definition: binary can contain the IR interpreted by a VM�
  • Often, any debugging info/high level metadata coming from source code is unavailable in this artifact

12 of 183

SE for Bug Hunting in Binaries

12

  • Use symbolic analysis over program binaries to hunt for as many bugs as we can!�
  • SE can be used for formal verification, but no false negative is tolerated
    • Verification is HARD and very tightly coupled with a strict specification�
  • Bug hunting → relax the soundness requirement (but we can still give some guarantees)

13 of 183

Motivation

13

14 of 183

Why Binaries?

14

    • Source language independent!
      • Many high level languages are compiled down to same ASM�
    • “What you fuzz is what you ship” [62]
      • Truth lies in the binary! �
    • Source code unavailable for specific domains
      • Malware Analysis
      • Firmware Analysis

15 of 183

Why Symbolic Analysis?

15

  • Speed up the process of understanding a program’s behaviors
    • Which code is triggered by which input? �
  • Can provide guarantees over some properties of the target program
    • There exists no input for which the program reaches a specific state�
  • Compensate limitations of black-box fuzzing
    • Blackbox fuzzing blind-spots

16 of 183

Why do we need this?

16

  • Annual cost estimate for inadequate infrastructure for software testing was estimated to be ~$60 billion*
  • The potential cost reduction from feasible infrastructure improvements was estimated to be ~$22.2 billion*�

*Source: NIST report [66]

17 of 183

SE 101

17

18 of 183

18

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Program

Sm�States Manager

σ

Symbolic store

19 of 183

19

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Sm�States Manager

σ

Symbolic store

20 of 183

20

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

add rax, rbx

Sm�States Manager

σ

Symbolic store

21 of 183

21

σ

Symbolic store

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Sm�States Manager

mem[x] = 0x5

mem[y] = sym_var1

mem[z] = 0xdeadbeef

22 of 183

22

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

a != 0 /\ b == 5

Sm�States Manager

σ

Symbolic store

23 of 183

23

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Sm�States Manager

σ

Symbolic store

a != 0 /\ b > 5

SAT:�a=5 b=6

24 of 183

24

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Sm�States Manager

σ

Symbolic store

a*x != 0 /\ f(a) > 5

?!?!?!??!

25 of 183

25

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Sm�States Manager

σ

Symbolic store

26 of 183

26

symbolic variables

2

*States numbers correspond to code line (for simplicity we use program’s source code here)

27 of 183

27

symbolic variables

2

3

8

*States numbers correspond to code line (for simplicity we use program’s source code here)

a==0

a!=0

4

28 of 183

28

symbolic variables

2

3

8

4

*States numbers correspond to code line (for simplicity we use program’s source code here)

5

8

6

8

a==0

a!=0

b!=0

b==0

29 of 183

29

symbolic variables

2

3

8

4

*States numbers correspond to code line (for simplicity we use program’s source code here)

5

8

6

8

a!=0

b!=0

b==0

😎

😎

a==0

30 of 183

30

symbolic variables

2

3

8

4

*States numbers correspond to code line (for simplicity we use program’s source code here)

5

8

6

8

a!=0

b!=0

b==0

a=2�b=0

😰

😎

😎

a==0

31 of 183

Research Areas

31

32 of 183

Research Challenges

32

Execution Performances

State Explosion

Imprecise Analysis

  • Memory exhaustion of the system
  • No meaningful analysis progresses
  • Too many false positives/negatives
  • Invalid/unusable results
  • Not enough coverage before timeout
  • No meaningful analysis progresses

Constraints Solving

  • Solving time timeout
  • Unfeasible constraints

33 of 183

33

σ

Symbolic store

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Sm�States Manager

34 of 183

34

σ

Symbolic store

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Store Management

States Management

Environment

Disassembly

Functions Boundaries

Program Processing

Sm�States Manager

Constraints Management

35 of 183

35

σ

Symbolic store

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Store Management

States Management

Environment

Program Processing

Sm�States Manager

Constraints Management

36 of 183

36

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Program Processing

Sm�States Manager

Program Processing

σ

Symbolic store

Program Processing

37 of 183

Program Processing

  • How the symbolic execution engine is supposed to process the instructions contained in the target binary?
  • Classic Approaches:
    • IR-Based Symbolic Execution
    • IR-Less Symbolic Execution

37

Program Processing

38 of 183

Binary Processing

38

IR-Based Symbolic Execution�(most popular)

add rax,rbx

mov rcx, rbx

dec rcx

sub rax,rbx

jnz label1

translate

SUM T0,T1

PUT …

DEC …

PUT …

ASM

IR

exe

39 of 183

Binary Processing

39

IR-Less Symbolic Execution

add rax,rbx

mov rcx, rbx

dec rcx

sub rax,rbx

jnz label1

exe

Symbolic �reasoning

Hook

exe

ASM

40 of 183

Binary Processing

40

IR-Based Symbolic Execution

IR-Less Symbolic Execution

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

41 of 183

Binary Processing

41

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

42 of 183

Binary Processing

42

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

Architecture agnostic

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

43 of 183

Binary Processing

43

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

Architecture agnostic

Easier queries to the solver[39]

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

44 of 183

Binary Processing

44

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

Architecture agnostic

Easier queries to the solver[39]

Less robust�(Unsupported native ins => Fatal error)

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

45 of 183

Binary Processing

45

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

Architecture agnostic

Easier queries to the solver[39]

Less robust�(Unsupported native ins => Fatal error)

Poor execution performance[39]

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

46 of 183

Binary Processing

46

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

Hard to implement(thousands of instructions)

Architecture agnostic

Easier queries to the solver[39]

Less robust�(Unsupported native ins => Fatal error)

Poor execution performance[39]

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

47 of 183

Binary Processing

47

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

Hard to implement(thousands of instructions)

Architecture agnostic

Architecture-dependent (not portable!)

Easier queries to the solver[39]

Less robust�(Unsupported native ins => Fatal error)

Poor execution performance[39]

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

48 of 183

Binary Processing

48

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

Hard to implement(thousands of instructions)

Architecture agnostic

Architecture-dependent (not portable!)

Easier queries to the solver[39]

Harder queries to the solver[39]

Less robust�(Unsupported native ins => Fatal error)

Poor execution performance[39]

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

49 of 183

Binary Processing

49

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

Hard to implement(thousands of instructions)

Architecture agnostic

Architecture-dependent (not portable!)

Easier queries to the solver[39]

Harder queries to the solver[39]

Less robust�(Unsupported native ins => Fatal error)

More Robust�(Unsupported ins => Just execute concretely)

Poor execution performance[39]

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

50 of 183

Binary Processing

50

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

Hard to implement(thousands of instructions)

Architecture agnostic

Architecture-dependent (not portable!)

Easier queries to the solver[39]

Harder queries to the solver[39]

Less robust�(Unsupported native ins => Fatal error)

More Robust�(Unsupported ins => Just execute concretely)

Poor execution performance[39]

Good execution performance[39]

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

51 of 183

Binary Processing

51

IR-Based Symbolic Execution

IR-Less Symbolic Execution

Easier to implement�(Small number of high-level instructions)

Hard to implement(thousands of instructions)

Architecture agnostic

Architecture-dependent (not portable!)

Easier queries to the solver[39]

Harder queries to the solver[39]

Less robust�(Unsupported native ins => Fatal error)

More Robust�(Unsupported ins => Just execute concretely)

Poor execution performance[39]

Good execution performance[39]

KLEE[7], S2E[10], angr[25]

QSYM[34], SAGE[62], Triton[65]

52 of 183

SymQEMU: Compilation-based symbolic execution �for binaries

  • Intuition: embed the symbolic executor code inside the application itself!�
  • instruments QEMU’s IR (TCG) during dynamic binary translation to embed SE operations. Compile the final instrumented IR to machine code and execute! �
  • Symbolic reasoning of operations is borrowed from a previous project: SymCC�

52

TAKEAWAYS

53 of 183

53

σ

Symbolic store

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Sm�States Manager

States Management

54 of 183

State Management

  • How to efficiently explore the program’s states in a given time budget?�
  • Issues:
    • Too many states to track (state/path explosion)
    • Wasting time analyzing useless path!
    • States have too complex path constraints�
  • Approaches:
    • Hybrid Execution (concrete+symbolic)
    • Program Summarization
    • Path Scheduling

54

States Management

55 of 183

Hybrid Execution

  • Mix concrete and symbolic inputs to support symbolic analysis and increase code coverage (and therefore, possibility to find new bugs)�
  • This SE variant has been heavily used in modern hybrid fuzzers
  • Approaches:
    • [2005] Concolic Execution (DSE)
    • [2014] Symcretic Symbolic Execution

55

State Management → Hybrid Execution

56 of 183

Hybrid Execution

  • Mix concrete and symbolic inputs to support symbolic analysis and increase code coverage (and therefore, possibility to find new bugs)�
  • This SE variant has been heavily used in modern hybrid fuzzers
  • Approaches:
    • [2005] Concolic Execution (DSE)
      • Execute program with symbolic and concrete inputs. Collect path constraints and use concrete values to help symb-exec to get unstuck.
    • [2014] Symcretic Symbolic Execution

56

State Management → Hybrid Execution

57 of 183

57

x=X0, y=Y0

1 |void test_me(int x, int y){

2 | z = (y*y) % 50;

3 | if(z == x){�4 | // ERROR

5 | }else{

6 | // SOMETHING

7 | }

8 |}

z = (y0y0) mod 50

S2

σ

σ = symbolic store

π = path constraints

x=1, y=2

Concolic Execution (DSE)

State Management → Hybrid Execution

58 of 183

58

x=X0, y=Y0

z = (y0y0) mod 50

S2

S3

σ

σ = symbolic store

π = path constraints

x=X0, y=Y0

1 |void test_me(int x, int y){

2 | z = (y*y) % 50;

3 | if(z == x){�4 | // ERROR

5 | }else{

6 | // SOMETHING

7 | }

8 |}

x=1, y=2

Concolic Execution (DSE)

State Management → Hybrid Execution

59 of 183

59

S2

S3

S6

S4

x0 == (y0y0) mod 50

x0 != (y0y0) mod 50

SAT?

π

π

σ = symbolic store

π = path constraints

x=X0, y=Y0

x=X0, y=Y0

1 |void test_me(int x, int y){

2 | z = (y*y) % 50;

3 | if(z == x){�4 | // ERROR

5 | }else{

6 | // SOMETHING

7 | }

8 |}

x=1, y=2

z = (y0y0) mod 50

σ

SAT?

😥

😥

Concolic Execution (DSE)

State Management → Hybrid Execution

60 of 183

60

S2

S3

S6

S4

1 == 2*2 mod 50

1 != 2*2 mod 50

FALSE

π

π

σ = symbolic store

π = path constraints

x=X0, y=Y0

x=X0, y=Y0

1 |void test_me(int x, int y){

2 | z = (y*y) % 50;

3 | if(z == x){�4 | // ERROR

5 | }else{

6 | // SOMETHING

7 | }

8 |}

x=1, y=2

z = (y0y0) mod 50

σ

TRUE

😎

😎

Concolic Execution (DSE)

State Management → Hybrid Execution

61 of 183

61

S2

S3

S6

S4

1 == 2*2 mod 50

1 != 2*2 mod 50

FALSE

π

π

σ = symbolic store

π = path constraints

x=X0, y=Y0

x=X0, y=Y0

1 |void test_me(int x, int y){

2 | z = (y*y) % 50;

3 | if(z == x){�4 | // ERROR

5 | }else{

6 | // SOMETHING

7 | }

8 |}

x=1, y=2

z = (y0y0) mod 50

σ

TRUE

😎

😎

Negate constraints, generate new inputs, cover new code!

Concolic Execution (DSE)

State Management → Hybrid Execution

62 of 183

62

S2

S3

S6

S4

1 == 2*2 mod 50

1 != 2*2 mod 50

FALSE

π

π

σ = symbolic store

π = path constraints

x=X0, y=Y0

x=X0, y=Y0

1 |void test_me(int x, int y){

2 | z = (y*y) % 50;

3 | if(z == x){�4 | // ERROR

5 | }else{

6 | // SOMETHING

7 | }

8 |}

x=1, y=2

z = (y0y0) mod 50

σ

TRUE

😎

😎

Negate constraints, generate new inputs, cover new code!

Concolic Execution (DSE)

State Management → Hybrid Execution

63 of 183

Hybrid Execution

  • Mix concrete and symbolic inputs to support symbolic analysis and increase code coverage (and therefore, possibility to find new bugs)�
  • This SE variant has been heavily used in modern hybrid fuzzers
  • Approaches:
    • [2005] Concolic Execution (DSE)
    • [2014] Symcretic Symbolic Execution
      • Use backward symbolic execution (BSE) and concrete execution to reason about specific target instruction (a.k.a. line reachability problem)

63

State Management → Hybrid Execution

64 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications

64

State Management → Program Summarization

s1

s2

s3

s5

s4

s6

s7

s8

s1

s2

s3

s4

s5

Red border weight represents “state expressiveness” �(i.e., its complexity in terms of constraints)

65 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
    • Loop summarization
    • State merging
    • Third-party libraries summarization

65

State Management → Program Summarization

66 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
      • Avoid paying the cost of re-executing the same functions over and over.
    • Loop summarization
    • States Merging
    • Third-party libraries summarization

66

State Management → Program Summarization

67 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
      • [2007] Compositional Dynamic Test Generation
        • Summarize functions with preconditions and postconditions
      • [2008] Demand-driven Compositional Test Generation
        • Summarize functions with pre- and post- targeting a specific path
    • Loop summarization
    • States Merging
    • Third-party libraries summarization

67

State Management → Program Summarization

68 of 183

68

1 |int is_positive(int x){

2 | if (x>0) return 1;

3 | return 0;

4 |}

State Management → Program Summarization

Compositional Dynamic Test Generation

69 of 183

69

is_positive:

(x > 0 ∧ ret = 1) ∨ (x ≤ 0 ∧ ret = 0)

1 |int is_positive(int x){

2 | if (x>0) return 1;

3 | return 0;

4 |}

pre-cond1

post-cond1

post-cond2

pre-cond2

State Management → Program Summarization

Compositional Dynamic Test Generation

70 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
    • Loop summarization
      • Avoid to pay the cost of re-executing the same loop every time a state enters it
    • States Merging
    • Third-party libraries summarization

70

State Management → Program Summarization

71 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
    • Loop summarization
      • [2009] Loop-Extended Symbolic Execution on Binary Programs
      • [2011] Automatic Partial Loop Summarization in Dynamic Test Generation
      • [2016] Proteus: Computing Disjunctive Loop Summary via Path Dependency Analysis
    • States Merging
    • Third-party libraries summarization

71

State Management → Program Summarization

72 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
    • Loop summarization
      • [2009] Loop-Extended Symbolic Execution on Binary Programs
        • Relate loop’s dependent vars with program inputs to increase amount of information we can extract from a loop
      • [2011] Automatic Partial Loop Summarization in Dynamic Test Generation
      • [2016] Proteus: Computing Disjunctive Loop Summary via Path Dependency Analysis
    • States Merging
    • Third-party libraries summarization

72

State Management → Program Summarization

73 of 183

73

State Management → Program Summarization

1 | int func(int x){

2 | counter = 0

3 | for(int i=0; i<x; i++){

4 | do_things();

5 | counter++

6 | if(counter == 1000){

7 | maybe_bug()

8 | }

9 | }

10 | }

symbolic input

Loop-Extended Symbolic Execution on Binary Programs

3

4

5

10

6

3

4

10

No “counter” symbolization

74 of 183

74

State Management → Program Summarization

1 | int func(int x){

2 | counter = 0

3 | for(int i=0; i<x; i++){

4 | do_things();

5 | counter++

6 | if(counter == 1000){

7 | maybe_bug()

8 | }

9 | }

10 | }

symbolic input

Loop-Extended Symbolic Execution on Binary Programs

3

4

5

10

6

3

4

10

7

“counter” symbolization

75 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
    • Loop summarization
      • [2009] Loop-Extended Symbolic Execution on Binary Programs
      • [2011] Automatic Partial Loop Summarization in Dynamic Test Generation
        • Capture loop’s side-effects and summarize with pre-conditions and post-conditions formulas over input (single path loop)
      • [2016] Proteus: Computing Disjunctive Loop Summary via Path Dependency Analysis
    • States Merging
    • Third-party libraries summarization

75

State Management → Program Summarization

76 of 183

76

1| int n:=∗;

2| int x:=∗;

3| int z:=∗;

4| while (x<n){

5| x++;

6| }

x0 < n ( x = n )

Single-path Loop

Cycle 1: 4-5

Cycle 2: 4-5

pre-cond

post-cond

State Management → Program Summarization

Automatic Partial Loop Summarization in Dynamic Test Generation

77 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
    • Loop summarization
      • [2009] Loop-Extended Symbolic Execution on Binary Programs
      • [2011] Automatic Partial Loop Summarization in Dynamic Test Generation
      • [2016] Proteus: Computing Disjunctive Loop Summary via Path Dependency Analysis
        • Summarize multi-paths loops with a disjunction of constraints
    • States Merging
    • Third-party libraries summarization

77

State Management → Program Summarization

78 of 183

78

1| int n:=∗;

2| int x:=∗;

3| int z:=∗;

4| while (x<n){

5| if(z>x) x++;

6| else z++;

7| }

(x0 ≥ n∧x = x0∧z = z0)(x0 < n ≤ z0∧x = n∧z = z0)(x0 < n∧z < n∧x = z0 = n)

Multi-path Loop

Cycle 1: 4-5

Cycle 2: 4-5

Cycle X: 4-6

pre-cond1

post-cond1

pre-cond2

post-cond2

pre-cond3

post-cond3

State Management → Program Summarization

Proteus: Computing Disjunctive Loop Summary via Path Dependency Analysis

79 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
    • Loop summarization
    • States Merging
      • Model state progression using path constraints rather than generating new states
    • Third-party libraries summarization

79

State Management → Program Summarization

80 of 183

80

| void abs(int n){

1 | int ret = 0;

2 | if (n < 0)

3 | ret = -n;

4 | else

5 | ret = n;

6 | return ret;

1

2

3

4

6

6

ret -n

ret n

No merging

State Management → Program Summarization

81 of 183

81

| void abs(int n){

1 | int ret = 0;

2 | if (n < 0)

3 | ret = -n;

4 | else

5 | ret = n;

6 | return ret;

1

2

3

4

6

States merging

ret ← ITE(n < 0, −n, n).

State Management → Program Summarization

82 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
    • Loop summarization
    • States Merging
      • [2014] Enhancing Symbolic Execution with Veritesting
        • Leverages SSE to find opportunities of state merging during DSE
      • [2012] Efficient State Merging in Symbolic Execution
      • [2018] Boost Symbolic Execution Using Dynamic State Merging and Forking
    • Third-party libraries summarization

82

State Management → Program Summarization

83 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
    • Loop summarization
    • States Merging
      • [2014] Enhancing Symbolic Execution with Veritesting
      • [2012] Efficient State Merging in Symbolic Execution
        • Optimize states merging opportunities to avoid performance loss.
      • [2018] Boost Symbolic Execution Using Dynamic State Merging and Forking
        • Optimization of “Efficient State Merging in Symbolic Execution”
    • Third-party libraries summarization

83

State Management → Program Summarization

84 of 183

84

| void f3(int a,int b){

1 | int flag = 0;

2 | If (a == b){

3 | flag = 1;

| }

4 | g3(a,b);

5 | If (flag)

6 | g1(flag+1);

7 | else

8 | g2(flag+2);

| }

Naive dynamic merging�(Efficient State Merging in Symbolic Execution)

1

4

5

flag ← ITE(a==b, 1, 0)

g3(a,b);

6

8

flag ←ITE(a==b, 1, 0) + 1

flag ←ITE(a==b, 1, 0) + 2

😱

😱

😎

flag ← 0

Assume parameters a,b are symbolic

State Management → Program Summarization

85 of 183

85

| void f3(int a,int b){

1 | int flag = 0;

2 | If (a == b){

3 | flag = 1;

| }

4 | g3(a,b);

5 | If (flag)

6 | g1(flag+1);

7 | else

8 | g2(flag+2);

| }

1

4

5

flag ← ITE(a==b, 1, 0)

g3(a,b);

6

8

flag ← 1

flag ← 0

| void f3(int a,int b){

1 | int flag = 0;

2 | If (a == b){

3 | flag = 1;

| }

4 | g3(a,b);

5 | If (flag)

6 | g1(flag+1);

7 | else

8 | g2(flag+2);

| }

Dynamic merging + active fork� (Boost Symbolic Execution Using Dynamic State Merging and Forking)

😎

😎

😎

Assume parameters a,b are symbolic

flag ← 0

State Management → Program Summarization

86 of 183

Program Summarization

  • Reduce the amount of generated states by using constraints or simplifications
  • Approaches:
    • Function summarization
    • Loop summarization
    • States Merging
    • Third-party libraries summarization
      • Use “models” to summarize side-effects of non-tracked functions on the symbolic states (We’ll see this later when discussing the execution Environment)

86

State Management → Program Summarization

87 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion�
  • Approaches:
    • Path Pruning
      • Stop the symbolic engine to explore specific paths when certain conditions arise
    • Path Prioritization
      • Prioritize exploration of specific paths according to some conditions (e.g., bug detected, calls to specific functions), or, following a specific order (BFS, DFS, etc…)�

87

State Management → Path Scheduling

88 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion�
  • Approaches:
    • Path Pruning
      • [2012] Pre-conditioned Symbolic Execution
      • [2015] Underconstrained Symbolic Execution
      • [2015] Post-conditioned Symbolic Execution
      • [2018] Dynamic Path Pruning in Symbolic Execution
    • Path Prioritization
      • [2008] Random Path Selection & Coverage Optimized Search
      • [2011] Directed Symbolic Execution
      • [2018] Chopped Symbolic Execution
      • [2021] SyML: Guiding Symbolic Execution Toward Vulnerable States Through Pattern Learning
      • [2021] Learning to Explore Paths for Symbolic Execution

88

State Management → Path Scheduling

89 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion�
  • Approaches:
    • Path Pruning
      • [2012] Pre-conditioned Symbolic Execution
        • Pre-constrain program’s input to promote exploration of exploitable paths, prune the rest.
      • [2015] Underconstrained Symbolic Execution
      • [2015] Post-conditioned Symbolic Execution
      • [2018] Dynamic Path Pruning in Symbolic Execution
    • Path Prioritization
      • [+]

89

State Management → Path Scheduling

90 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion�
  • Approaches:
    • Path Pruning
      • [2012] Pre-conditioned Symbolic Execution
      • [2015] Underconstrained Symbolic Execution
        • Start symbolic execution from an arbitrary function rather than entry point. Pay the cost of many symbolic vars in memory.
      • [2015] Post-conditioned Symbolic Execution
      • [2018] Dynamic Path Pruning in Symbolic Execution
    • Path Prioritization
      • [+]

90

State Management → Path Scheduling

91 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion�
  • Approaches:
    • Path Pruning
      • [2012] Pre-conditioned Symbolic Execution
      • [2015] Underconstrained Symbolic Execution
      • [2015] Post-conditioned Symbolic Execution
        • Avoid the analysis of common path suffixes to reduce the number of generated states
      • [2018] Dynamic Path Pruning in Symbolic Execution
    • Path Prioritization
      • [+]

91

State Management → Path Scheduling

92 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion�
  • Approaches:
    • Path Pruning
      • [2012] Pre-conditioned Symbolic Execution
      • [2015] Underconstrained Symbolic Execution
      • [2015] Post-conditioned Symbolic Execution
      • [2018] Dynamic Path Pruning in Symbolic Execution
        • Optimize the numbers of checks for SAT/UNSAT paths to speed up symbolic execution. CheckAll vs CheckNothing vs “CheckDynamically”
    • Path Prioritization
      • [+]

92

State Management → Path Scheduling

93 of 183

93

1

2

3

sat?

sat?

Σ�SMT�solver

4

CheckAll strategy

State Management → Path Scheduling

94 of 183

94

1

2

3

4

F

T

Σ�SMT�solver

CheckAll strategy

State Management → Path Scheduling

95 of 183

95

1

2

3

4

5

6

F

T

sat?

Σ�SMT�solver

sat?

CheckAll strategy

State Management → Path Scheduling

96 of 183

96

1

2

3

4

5

6

F

T

F

Σ�SMT�solver

T

CheckAll strategy

State Management → Path Scheduling

97 of 183

97

1

2

3

4

7

8

11

12

14

15

Σ�SMT�solver

5

6

9

10

13

sat?

sat?

sat?

CheckNothing strategy

State Management → Path Scheduling

98 of 183

98

1

2

3

4

7

8

11

12

14

15

Σ�SMT�solver

5

6

9

10

13

CheckNothing strategy

State Management → Path Scheduling

99 of 183

99

1

2

3

4

7

8

11

12

14

15

Σ�SMT�solver

5

6

9

10

13

Useless states!

CheckNothing strategy

State Management → Path Scheduling

100 of 183

100

1

2

3

4

7

8

11

12

14

15

Σ�SMT�solver

F

T

sat?

sat?

CheckDynamic strategy�(Dynamic Path Pruning in Symbolic Execution)

State Management → Path Scheduling

101 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion�
  • Approaches:
    • Path Pruning
      • [+]
    • Path Prioritization
      • [2008] Random Path Selection & Coverage Optimized Search
        • Pick next state to explore by walking the tree of already explored states from the root and randomly take branches until a leaf
        • Pick next state that covers unseen instructions.
      • [2011] Directed Symbolic Execution
      • [2018] Chopped Symbolic Execution
      • [2021] SyML: Guiding Symbolic Execution Toward Vulnerable States Through Pattern Learning
      • [2021] Learning to Explore Paths for Symbolic Execution

101

State Management → Path Scheduling

102 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion
  • Approaches:
    • Path Pruning
      • [+]
    • Path Prioritization
      • [2008] Random Path Selection & Coverage Optimized Search
      • [2011] Directed Symbolic Execution
        • Symbolic execution used to study how to reach a specific target line in a program
      • [2018] Chopped Symbolic Execution
      • [2021] SyML: Guiding Symbolic Execution Toward Vulnerable States Through Pattern Learning
      • [2021] Learning to Explore Paths for Symbolic Execution

102

State Management → Path Scheduling

103 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion
  • Approaches:
    • Path Pruning
      • [+]
    • Path Prioritization
      • [2008] Random Path Selection & Coverage Optimized Search
      • [2011] Directed Symbolic Execution
      • [2018] Chopped Symbolic Execution
        • Pre-define part of the program that should be skipped during the symbolic-execution, come back later if needed.
      • [2021] SyML: Guiding Symbolic Execution Toward Vulnerable States Through Pattern Learning
      • [2021] Learning to Explore Paths for Symbolic Execution

103

State Management → Path Scheduling

104 of 183

104

1

2

3

4

test(x)

Target:

  • test

Skip:

  • funcA
  • funcB

call funcA(&x)

Chopped symbolic execution

105 of 183

105

1

2

3

4

test(x)

Target:

  • test

Skip:

  • funcA
  • funcB

WritesTo:

  • funcA: x

4s

Snapshot state

call funcA(&x)

Chopped symbolic execution

Computed with

pointer analysis

106 of 183

106

1

2

3

4

test(x)

Target:

  • test

Skip:

  • funcA
  • funcB

WritesTo:

  • funcA: x

5

6

call funcA(&x)

4s

Snapshot state

Chopped symbolic execution

107 of 183

107

1

2

3

4

test(x)

Target:

  • test

Skip:

  • funcA
  • funcB

WritesTo:

  • funcA: x

5

6

x=x+6

call funcA(&x)

4s

Snapshot state

Chopped symbolic execution

108 of 183

108

1

2

3

4

call funcA(&x)

test(x)

Target:

  • test

Skip:

  • funcA
  • funcB

WritesTo:

  • funcA: x

5

6

x=x+6

7

8

*x++

4s

Snapshot state

Propagates writes

Recovery mode

Chopped symbolic execution

109 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion�
  • Approaches:
    • Path Pruning
      • [+]
    • Path Prioritization
      • [2008] Random Path Selection & Coverage Optimized Search
      • [2011] Directed Symbolic Execution
      • [2018] Chopped Symbolic Execution
      • [2021] SyML: Guiding Symbolic Execution Toward Vulnerable States Through Pattern Learning
        • Use ML to decide which paths’ are more promising to reach a vulnerability.
      • [2021] Learning to Explore Paths for Symbolic Execution

109

State Management → Path Scheduling

110 of 183

Path Scheduling

  • Manage paths exploration to reach more interesting program’s state and avoid state explosion�
  • Approaches:
    • Path Pruning
      • [+]
    • Path Prioritization
      • [2008] Random Path Selection & Coverage Optimized Search
      • [2011] Directed Symbolic Execution
      • [2018] Chopped Symbolic Execution
      • [2021] SyML: Guiding Symbolic Execution Toward Vulnerable States Through Pattern Learning
      • [2021] Learning to Explore Paths for Symbolic Execution
        • Attempting to generalize the “state searching” problem. Offline training to automatically derive searching strategies with a set of states’ features that prioritize certain goals.

110

State Management → Path Scheduling

111 of 183

111

σ

Symbolic store

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Sm�States Manager

Environment

112 of 183

Environment

  • How to handle code that interact with external environment or third party libraries?�
  • Issues:
    • Unmodeled interactions = add symbolic variable? stop execution? Execute everything symbolic?�
  • Approaches:
    • Abstract Models [7, 15, 25]
    • Concrete Delegation [10, 34, 40]

112

113 of 183

Environment

  • How to handle code that interact with external environment or third party libraries?�
  • Issues:
    • Unmodeled interactions = add symbolic variable? stop execution? Execute everything symbolic?�
  • Approaches:
    • Abstract Models
      • Summarize a call to external procedure with a specific function
        • Function level VS Syscall level
    • Concrete Delegation

113

114 of 183

Environment

  • How to handle code that interact with external environment or third party libraries?�
  • Issues:
    • Unmodeled interactions = add symbolic variable? stop execution? Execute everything symbolic?�
  • Approaches:
    • Abstract Models
    • Concrete Delegation
      • Execution of external functions is delegated to the real system outside of the symbolic executor

114

115 of 183

Environment

  • How to handle code that interact with external environment or third party libraries?�
  • Issues:
    • Unmodeled interactions = add symbolic variable? stop execution? Execute everything symbolic?�
  • Approaches:
    • Abstract Models
    • Concrete Delegation
      • [2011] Selective Symbolic Execution
      • [2012] Unleashing Mayhem on Binary Code
      • [2020] Interleaved Symbolic Execution

115

116 of 183

Environment

  • How to handle code that interact with external environment or third party libraries?�
  • Issues:
    • Unmodeled interactions = add symbolic variable? stop execution? Execute everything symbolic?�
  • Approaches:
    • Abstract Models
    • Concrete Delegation
      • [2011] Selective Symbolic Execution
        • Run entire software stack in emulator. Symbolically execute only a pre-selected part of the software stack, concretely execute the rest.
      • [2012] Unleashing Mayhem on Binary Code
      • [2020] Interleaved Symbolic Execution

116

117 of 183

Environment

  • How to handle code that interact with external environment or third party libraries?�
  • Issues:
    • Unmodeled interactions = add symbolic variable? stop execution? Execute everything symbolic?�
  • Approaches:
    • Abstract Models
    • Concrete Delegation
      • [2011] Selective Symbolic Execution
      • [2012] Unleashing Mayhem on Binary Code
        • Maintain a lightweight virtual machine for every state.
      • [2020] Interleaved Symbolic Execution

117

118 of 183

Environment

  • How to handle code that interact with external environment or third party libraries?�
  • Issues:
    • Unmodeled interactions = add symbolic variable? stop execution? Execute everything symbolic?�
  • Approaches:
    • Abstract Models
    • Concrete Delegation
      • [2011] Selective Symbolic Execution
      • [2012] Unleashing Mayhem on Binary Code
      • [2020] Interleaved Symbolic Execution
        • Manually interleave concrete and symbolic execution to reach “deeper” code in the program.

118

119 of 183

119

σ

Symbolic store

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Sm�States Manager

Constraints Management

120 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Issues:
    • Querying the SMT solver too often is a bottleneck
    • Some constraints CANNOT be solved�

120

Σ�SMT�solver

πPath Constraints

Constraints Management

121 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Approaches:
    • Constraints Reduction
    • Constraints Caching
    • Constraints Prediction
    • New Constraints Solving Techniques

121

122 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Approaches:
    • Constraints Reduction
      • Simplify the constraints with equivalents ones to speed up solving time
    • Constraints Caching
    • Constraints Prediction
    • New Constraints Solving Techniques

122

123 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Approaches:
    • Constraints Reduction
      • [2008] Expression Rewriting[7]
      • [2008] Constraint Set Simplification[7]
      • [2008] Implied Value Concretization[7]
      • [2008] Constraint Independence[7]
    • Constraints Caching
    • Constraints Prediction
    • New Constraints Solving Techniques

123

124 of 183

124

x+0 → x | x * 2n = x << n | 2*x-x=x

Expression Rewriting

x > 10 /\ x = 5 → True

Constraint Set Simplification

x + 1 = 10 → x = 9

Implied Value Concretization

{i < j, j < 20,k > 0}, i=20? → {i < j, j < 20}

Constraint Independence

Constraints Management → Constraints Reduction

125 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Approaches:
    • Constraints Reduction
    • Constraints Caching
      • [2008] Counter-Example Cache[7]
        • Cache constraints solutions and consider superset/subset when solving
      • [2012] Green: Reducing, reusing and recycling constraints in program analysis
    • Constraints Prediction
    • New Constraints Solving Techniques

125

126 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Approaches:
    • Constraints Reduction
    • Constraints Caching
      • [2008] Counter-Example Cache
      • [2012] Green: Reducing, reusing and recycling constraints in program analysis
        • Universal constraints caching technique. Solutions re-usable across target programs, analysis, and tools.
    • Constraints Prediction
    • New Constraints Solving Techniques

126

127 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Approaches:
    • Constraints Reduction
    • Constraints Caching
    • Constraints Prediction
      • [2020] Constraint Solving with Deep Learning for Symbolic Execution
        • Canonize and vectorize a set of constraints’ and their solutions (SAT vs UNSAT) to train a DNN. Use DNN oracle run-time to check for satisfiability.
      • [2021] Boosting symbolic execution via constraint solving time prediction
    • New Constraints Solving Techniques

127

128 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Approaches:
    • Constraints Reduction
    • Constraints Caching
    • Constraints Prediction
      • [2020] Constraint Solving with Deep Learning for Symbolic Execution
      • [2021] Boosting symbolic execution via constraint solving time prediction
        • Use ML to predict how long it is going to take to solve a specific constraints for a target solver. Stir the execution somewhere else to avoid blocking the analysis.
    • New Constraints Solving Techniques

128

129 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Approaches:
    • Constraints Reduction
    • Constraints Caching
    • Constraints Prediction
    • New Constraints Solving Techniques
      • [2014] Solving Complex Path Conditions through Heuristic Search on Induced Polytopes
        • Solution for solving linear mixed with non-linear constraints. Search solutions within a polytope defined by constraints.
      • [2019] Just Fuzz It: Solving Floating-Point Constraints using Coverage-Guided Fuzzing
      • [2019] Enhancing Symbolic Execution by Machine Learning Based Solver Selection

129

130 of 183

130

x = x * y /\ x > 2

Constraints Management → New Solving Technique

Solving Complex Path Conditions through Heuristic Search on Induced Polytopes

131 of 183

131

x = x * y /\ x > 2

Constraints Management → New Solving Technique

Solving Complex Path Conditions through Heuristic Search on Induced Polytopes

132 of 183

132

x = x * y /\ x > 2

Constraints Management → New Solving Technique

Solving Complex Path Conditions through Heuristic Search on Induced Polytopes

133 of 183

133

x = x * y /\ x > 2

1

2

3

4

Constraints Management → New Solving Technique

Tabu search

Solving Complex Path Conditions through Heuristic Search on Induced Polytopes

134 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Approaches:
    • Constraints Reduction
    • Constraints Caching
    • Constraints Prediction
    • New Constraints Solving Techniques
      • [2014] Solving Complex Path Conditions through Heuristic Search on Induced Polytopes
      • [2019] Just Fuzz It: Solving Floating-Point Constraints using Coverage-Guided Fuzzing
        • Replace classic SAT solving reasoning techniques with fuzzing techniques
      • [2019] Enhancing Symbolic Execution by Machine Learning Based Solver Selection

134

135 of 183

135

Constraints Management → New Solving Technique

Just Fuzz It: Solving Floating-Point Constraints using Coverage-Guided Fuzzing

136 of 183

Constraints Management

  • How to simplify/reduce queries to the SMT solver? How to efficiently solve constraints?�
  • Approaches:
    • Constraints Reduction
    • Constraints Caching
    • Constraints Prediction
    • New Constraints Solving Techniques
      • [2014] Solving Complex Path Conditions through Heuristic Search on Induced Polytopes
      • [2019] Just Fuzz It: Solving Floating-Point Constraints using Coverage-Guided Fuzzing
      • [2019] Enhancing Symbolic Execution by Machine Learning Based Solver Selection
        • Use ML to predict what is the best solver to handle a specific set of constraints and apply the decision run-time.

136

137 of 183

137

σ

Symbolic store

stmt

Next Ins. to eval

Σ�SMT�solver

π�Path Constraints

Target Binary

Sm�States Manager

Store Management

138 of 183

Store Management

  • How to organize memory and how to support symbolic memory operations? (read/writes) �
  • Issues:
    • Handling memory read/writes with symbolic addresses
    • Organize the memory layout to reflect the real program�
  • Classic approaches
    • Single Concretization
    • Forking model
    • Merge Model
    • Flat Model
  • New Approaches
    • [+]

138

σ

Symbolic store

Store Management

139 of 183

139

1

1| RBXX02| MOV RAX, [RBX]

rbx = x0

σ

σ = symbolic store

π = path constraints

Single Concretization

Store Management

140 of 183

140

1

1| RBXX02| MOV RAX, [RBX]

rbx = x0

σ

RBX ← solver.solve_one(rbx) = 0xdeadbeef

σ = symbolic store

π = path constraints

Single Concretization

Store Management

141 of 183

141

1

rbx = x0

σ

2

rbx = 0xdeadbeef

rax = MEM[0xdeadbeef]

σ

σ = symbolic store

π = path constraints

Single Concretization

Store Management

1| RBXX02| MOV RAX, [RBX]

142 of 183

142

1

1| RBXX02| MOV RAX, [RBX]

rbx = x0

σ

σ = symbolic store

π = path constraints

Forking Model

Store Management

143 of 183

143

1

1| RBXX02| MOV RAX, [RBX]

rbx = x0

σ

RBX ← solver.solve(rbx) = [0xdeadbeef,� 0x41414100,� 0x7fffffff,� 0x7ffff3ee]

σ = symbolic store

π = path constraints

Forking Model

Store Management

144 of 183

144

1

rbx = x0

σ

2a

rbx = 0xdeadbeef

rax = MEM[0xdeadbeef]

σ

σ = symbolic store

π = path constraints

2b

2c

2d

rbx = 0x7ffff3ee

rax = MEM[0x7ffff3ee]

σ

Forking Model

Store Management

1| RBXX02| MOV RAX, [RBX]

145 of 183

145

1

1| RBXX02| MOV RAX, [RBX]

rbx = x0

σ

σ = symbolic store

π = path constraints

Merge Model

Store Management

146 of 183

146

1

1| RBXX02| MOV RAX, [RBX]

rbx = x0

σ

RBX ← solver.solve(rbx) = [0xdeadbeef,� 0x41414100]

σ = symbolic store

π = path constraints

Merge Model

Store Management

147 of 183

147

1

rbx = x0

σ

σ = symbolic store

π = path constraints

2

rbx = x0

rax = ITE(x0 == 0xdeadbeef, MEM[0xdeadbeef], � ITE(x0== 0x41414100 sdadadadadadaMEM[0x41414100], ….)

σ

Merge Model

Store Management

1| RBXX02| MOV RAX, [RBX]

148 of 183

148

1

1| RBXX02| MOV RAX, [RBX]

rbx = x0

σ

σ = symbolic store

π = path constraints

Flat Model + SMT Array Theory

Store Management

149 of 183

149

1

2

1| RBXX02| MOV RAX, [RBX]

rbx = x0

σ

rbx = x0

rax = SELECT(MemArr,x0)

σ

σ = symbolic store

π = path constraints

MemArr

(Memory as array)

Store Management

Flat Model + SMT Array Theory

150 of 183

Store Management

  • How to organize memory and how to support symbolic memory operations? (read/writes)
  • New Approaches
      • [2012] Unleashing Mayhem on binary code
        • Partial merge memory model (i.e., concretize writes, keep read symbolic)
        • Many optimizations over symbolic read reasoning
          • Boundary refinements, ITE predicates organized as ISTs
      • [2017] MEMTHINK: Rethinking pointer reasoning in symbolic execution
      • [2019] A Segmented Memory Model for Symbolic Execution
      • [2020] Relocatable Addressing Model for Symbolic Execution

150

151 of 183

Store Management

  • How to organize memory and how to support symbolic memory operations? (read/writes)
  • New Approaches
      • [2012] Unleashing Mayhem on binary code
      • [2017] MEMTHINK: Rethinking pointer reasoning in symbolic execution
        • Flat memory model. Never concretize memory addresses, keep them symbolic and use ITE expressions.
      • [2019] A Segmented Memory Model for Symbolic Execution
      • [2020] Relocatable Addressing Model for Symbolic Execution

151

152 of 183

152

0x7fffffff

0x23

0xdeadbeef

0x41

Address

Value

MEMORY

x = load(α)

α is getting concretized to �some value, e.g., 0x7fffffff

153 of 183

153

0x7fffffff

0x23

0xdeadbeef

0x41

Address

Value

MEMORY

x = load(α)

α is getting concretized to �some value, e.g., 0x7fffffff

x = 23

154 of 183

154

0x7fffffff

0x23

0xdeadbeef

0x41

Address

Value

MEMORY

x = load(α)

α is getting concretized to �some values, e.g., 0x7fffffff, 0xdeadbeef

155 of 183

155

0x23

0xdeadbeef

0x41

Address

Value

MEMORY

x = load(α)

0x7fffffff

X = ITE( α == 0x7fffffff, 0x23,

ITE( α == 0xdeadbeef, 0x41, 0))

Always keep mapping between concrete memory addresses and their values

156 of 183

156

0x23

B

C

Address

Value

MEMORY

x = load(α)

MEMTHINK APPROACH

A

X = ITE( α == A, 0x23,

ITE( α == B, C, 0))

Always keep mapping between symbolic memory addresses and their values

157 of 183

Store Management

  • How to organize memory and how to support symbolic memory operations? (read/writes)
  • New Approaches
      • [2012] Unleashing Mayhem on binary code
      • [2017] MEMTHINK: Rethinking pointer reasoning in symbolic execution
      • [2019] A Segmented Memory Model for Symbolic Execution
        • Split memory into pre-computed segments, use Array Theory within the segments to handle symbolic accesses
      • [2020] Relocatable Addressing Model for Symbolic Execution

157

158 of 183

158

p0,p1,p2,p3,p4,p5

MEM

p0,p1,�p2,p3,p4�p5

seg0

seg1

seg2

0 forks, complex constraints over MEMORY

3 forks, simpler constraints over MEMORY

Symbolic ptr: X0

Symbolic ptr: X0

Concrete sols

Concrete sols

Not segmented

Segmented

SMT Array

Theory

SMT Array �Theory

SMT Array �Theory

SMT Array �Theory

Store Management → New Approaches

A Segmented Memory Model for Symbolic Execution

159 of 183

Store Management

  • How to organize memory and how to support symbolic memory operations? (read/writes)
  • New Approaches
      • [2012] Unleashing Mayhem on binary code
      • [2017] MEMTHINK: Rethinking pointer reasoning in symbolic execution
      • [2019] A Segmented Memory Model for Symbolic Execution
      • [2020] Relocatable Addressing Model for Symbolic Execution
        • Puts together the techniques of the previous two papers:
          • Keep memory addresses symbolic (but using Array Theory)
          • Dynamically Segmented Memory Model
            • (powered by segments relocation)

159

160 of 183

Conclusions

160

161 of 183

Conclusions

  • Current state of research really pushed the boundaries of the design of original symbolic executors, unfortunately, research is VERY fragmented.
    • angr, KLEE, S2E, Mayhem, McSema, PathFinder, SymQEMU, ….�
  • Quality of tools and techniques comparison is debatable, no real incentive for that [26]
    • Comparison with old versions, no fair comparisons…�
  • Conflicts between SE techniques in different research areas is poorly understood.
    • I believe this field desperately needs more measurements research!�

161

162 of 183

Conclusions

  • Fundamental ideas for a modern symbolic executor:
    • States Management:
      • Concolic Execution
      • Efficient state merging technique
      • Dynamic path pruning
      • Flexible search strategy
      • Summarization of functions and loops
      • Symbolic execution caching [12][43]

162

163 of 183

Conclusions

  • Fundamental ideas for a modern symbolic executor:
    • States Management:
      • Concolic Execution
      • Efficient state merging technique
      • Dynamic path pruning
      • Flexible search strategy
      • Summarization of functions and loops
      • Symbolic execution caching [12][43]
    • Binary Processing:
      • Hybrid binary processing (i.e., SymQemu approach)

163

164 of 183

Conclusions

  • Fundamental ideas for a modern symbolic executor:
    • States Management:
      • Concolic Execution
      • Efficient state merging technique
      • Dynamic path pruning
      • Flexible search strategy
      • Summarization of functions and loops
      • Symbolic execution caching [12][43]
    • Binary Processing:
      • Hybrid binary processing (i.e., SymQemu approach)
    • Constraints Solving:
      • Constraints reduction/caching
      • Supersolver for constraints solving

164

165 of 183

Conclusions

  • Fundamental ideas for a modern symbolic executor:
    • Environment:
      • Concrete delegation

165

166 of 183

Conclusions

  • Fundamental ideas for a modern symbolic executor:
    • Environment:
      • Concrete delegation
    • Store management:
      • New solutions based on Array Theory are promising, but need more benchmarks

166

167 of 183

Conclusions

  • Fundamental ideas for a modern symbolic executor:
    • Environment:
      • Concrete delegation
    • Store management:
      • New solutions based on Array Theory are promising, but need more benchmarks
    • Introspection capabilities:
      • Automatically identify pitfalls of SE
      • Clear understanding of how the analysis is being progressed and how constraints are generated
      • Profilers

167

168 of 183

Future Work

  • Standard benchmarking framework for a systematic comparison between techniques.

168

169 of 183

Future Work

  • Standard benchmarking framework for a systematic comparison between techniques.
  • RA that needs the most work:
    • Symbolic storage: SMT Array Theory is great, but how can we refine the balance between constraints complexity and performances?
    • Environment interactions: how to systematically synchronize the delegated concrete operations with the symbolic execution by balancing performances and precision?
    • Summarization techniques: how to improve the summarization techniques to model more complex loops and functions?�

169

170 of 183

Future Work

  • Standard benchmarking framework for a systematic comparison between techniques.
  • RA that needs the most work:
    • Symbolic storage: SMT Array Theory is great, but how can we refine the balance between constraints complexity and performances?
    • Environment interactions: how to systematically synchronize the delegated concrete operations with the symbolic execution by balancing performances and precision?
    • Summarization techniques: how to improve the summarization techniques to model more complex loops and functions?�
  • Introspection/measurements techniques must be improved to better understand our tools

170

171 of 183

Thanks!

171

172 of 183

References

[1] High coverage detection of input-related security faults

[2] CUTE: A Concolic Unit Testing Engine for C

[3] DART: directed automated random testing

[4] Compositional Dynamic Test Generation

[5] Demand-driven compositional symbolic execution

[6] EXE: Automatically Generating Inputs of Death

[7] KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs

[8] Loop-extended symbolic execution on binary programs

[9] All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution

[10] S2E: a platform for in-vivo multi-path analysis of software systems

[11] Automatic partial loop summarization in dynamic test generation

[12] Memoized symbolic execution

[13] Efficient State Merging in Symbolic Execution

[14] Probabilistic symbolic execution

[15] Unleashing Mayhem on Binary Code

[16] Green: reducing, reusing and recycling constraints in program analysis

[17] Targeted test input generation using symbolic-concrete backward execution

[18] Enhancing symbolic execution with veritesting

[19] Solving complex path conditions through heuristic search on induced polytopes

[20] Under-Constrained Symbolic Execution: Correctness Checking for Real Code

172

173 of 183

[21] Postconditioned Symbolic Execution

[22] DASE: Document-Assisted Symbolic Execution for Improving Automated Software Testing

[23] Compositional Symbolic Execution using Fine-Grained Summaries

[24] Proteus: Computing Disjunctive Loop Summary via Path Dependency Analysis

[25] SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis

[26] On the Techniques We Create, the Tools We Build, and Their Misalignments: a Study of KLEE

[27] StatSym: Vulnerable Path Discovery through Statistics-Guided Symbolic Execution

[28] Rethinking pointer reasoning in symbolic execution

[29] A Survey of Symbolic Execution Techniques

[30] Chopped symbolic execution

[31] Dynamic Path Pruning in Symbolic Execution

[32] Automatically generating search heuristics for concolic testing

[33] Boost Symbolic Execution Using Dynamic State Merging and Forking

[34] QSYM : A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing

[35] Concolic testing with adaptively changing search heuristics

[36] Just fuzz it: solving floating-point constraints using coverage-guided fuzzing

[37] Enhancing Symbolic Execution by Machine Learning Based Solver Selection

[38] A segmented memory model for symbolic execution

[39] Systematic Comparison of Symbolic Execution Systems: Intermediate Representation and its Generation

[40] SYMBION: Interleaving Symbolic with Concrete Execution

[41] Symbolic execution with SymCC: Don't interpret, compile!

[42] Making Symbolic Execution Promising by Learning Aggressive State-Pruning Strategy

[43] Running symbolic execution forever

[44] Constraint Solving with Deep Learning for Symbolic Execution

[45] Relocatable addressing model for symbolic execution

173

174 of 183

174

[46] Multiplex Symbolic Execution: Exploring Multiple Paths by Solving Once

[47] Boosting symbolic execution via constraint solving time prediction

[48] Fuzzy-Sat: Fuzzing Symbolic Expressions

[49] Address-Aware Query Caching for Symbolic Execution

[50] SymQEMU:Compilation-based symbolic execution for binaries

[51] TASE: Reducing Latency of Symbolic Execution with Transactional Memory

[52] Pending Constraints in Symbolic Execution for Better Exploration and Seeding

[53] WISE: Automated Test Generation for Worst-Case Complexity

[54] Neuro-Symbolic Execution: Augmenting Symbolic Execution with Neural Constraints

[55] Ferry: State-Aware Symbolic Execution for Exploring State-Dependent Program Paths

[56] Steering Symbolic Execution to Less Traveled Paths

[57] Learning to Explore Paths for Symbolic Execution

[58] AEG: Automatic Exploit Generation

[59] SE: SELECT—a formal system for testing and debugging programs by symbolic execution

[60] Generalized Symbolic Execution for Model Checking and Testing

[61] Test Input Generation with Java PathFinder

[62] Billions and Billions of Constraints: Whitebox Fuzz Testing in Production

[63] Automated Whitebox Fuzz Testing

[64] Memory models in symbolic execution: key ideas and new thoughts

[65] Triton: A dynamic binary analysis framework (https://triton.quarkslab.com/)

[66] The Economic Impacts of Inadequate Infrastructure for Software Testing

175 of 183

Extras

175

176 of 183

176

1

2

3

4

call func(x)

x >= 10

x < 10

test(x)

Selective symbolic execution

177 of 183

177

1

2

3

4

5

call func(x)

func(x=12)

6

x >= 10

x < 10

x < 10

test(x)

Selective symbolic execution

concrete domain

symbolic domain

178 of 183

178

1

2

3

4

5

call func(x)

func(x=12)

6

x < 10

x < 10

test(x)

7

x > 10 /\ �x = 12 (soft)

x >= 10

Selective symbolic execution

concrete domain

symbolic domain

179 of 183

179

1

2

3

4

call func(x)

x < 10

x >= 10

x < 10

test(x)

7

8

9

x != 12

x == 12

Selective symbolic execution

x > 10 /\ �x = 12 (soft)

😤

180 of 183

180

1

2

3

4

5

call func(x)

func(x=11)

6

x < 10

x >= 10

x < 10

test(x)

7

8

9

x != 12

x == 12

Selective symbolic execution

x > 10 /\ �x = 12 (soft)

concrete domain

symbolic domain

181 of 183

181

1

2

3

4

call func(x)

x < 10

x >= 10

x < 10

test(x)

7

8

9

x != 12

x == 12

Selective symbolic execution

x > 10 /\ �x = 12 (soft)

10

x > 10 /\ �x = 11 (soft)

11

12

x != 12

x == 12

5

func(x=11)

6

concrete domain

symbolic domain

182 of 183

182

Interleaved symbolic execution

183 of 183

183

mode = 0

while (true){

int curr_type = read_item_type()

switch(curr_type){

case 0: [...]

case 1: [...]

[...]

case 23: mode = 1

[...]

Case 83: {

if(mode == 1){

memcpy(...) // bug here

}

}

}

}

State-var

State-dependent program path

State Management → Path Scheduling

Ferry: State-Aware SE for Exploring State-Dependent Program Paths