Symbolic Execution for �Bug Hunting in Binaries
Fabio Gritti
(@degrigis)
2
Many many papers from an heterogeneous set of research groups!
Background
3
Bug Hunting
4
�
Symbolic Execution
5
1975
6
Are there any values for “a” and “b” for which the program breaks the assertion?
7
a=5, b=7
Are there any values for “a” and “b” for which the program breaks the assertion?
8
a=3, b=1
Are there any values for “a” and “b” for which the program breaks the assertion?
Symbolic Execution
9
symbolic variables
Are there any values for “a” and “b” for which the program breaks the assertion?
Symbolic Execution
10
symbolic variables
Are there any values for “a” and “b” for which the program breaks the assertion?
a=2, b=0
Program Binary
11
SE for Bug Hunting in Binaries
12
Motivation
13
Why Binaries?
14
Why Symbolic Analysis?
15
Why do we need this?
16
*Source: NIST report [66]
SE 101
17
18
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Program
Sm�States Manager
σ
Symbolic store
19
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Sm�States Manager
σ
Symbolic store
20
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
add rax, rbx
Sm�States Manager
σ
Symbolic store
21
σ
Symbolic store
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Sm�States Manager
mem[x] = 0x5
mem[y] = sym_var1
mem[z] = 0xdeadbeef
22
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
a != 0 /\ b == 5
Sm�States Manager
σ
Symbolic store
23
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Sm�States Manager
σ
Symbolic store
a != 0 /\ b > 5
SAT:�a=5 b=6
24
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Sm�States Manager
σ
Symbolic store
a*x != 0 /\ f(a) > 5
?!?!?!??!
25
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Sm�States Manager
σ
Symbolic store
26
symbolic variables
2
*States numbers correspond to code line (for simplicity we use program’s source code here)
27
symbolic variables
2
3
8
*States numbers correspond to code line (for simplicity we use program’s source code here)
a==0
a!=0
4
28
symbolic variables
2
3
8
4
*States numbers correspond to code line (for simplicity we use program’s source code here)
5
8
6
8
a==0
a!=0
b!=0
b==0
29
symbolic variables
2
3
8
4
*States numbers correspond to code line (for simplicity we use program’s source code here)
5
8
6
8
a!=0
b!=0
b==0
😎
😎
a==0
30
symbolic variables
2
3
8
4
*States numbers correspond to code line (for simplicity we use program’s source code here)
5
8
6
8
a!=0
b!=0
b==0
a=2�b=0
😰
😎
😎
a==0
Research Areas
31
Research Challenges
32
Execution Performances
State Explosion
Imprecise Analysis
Constraints Solving
33
σ
Symbolic store
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Sm�States Manager
34
σ
Symbolic store
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Store Management
States Management
Environment
Disassembly
Functions Boundaries
Program Processing
Sm�States Manager
Constraints Management
35
σ
Symbolic store
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Store Management
States Management
Environment
Program Processing
Sm�States Manager
Constraints Management
36
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Program Processing
Sm�States Manager
Program Processing
σ
Symbolic store
Program Processing
Program Processing
37
Program Processing
Binary Processing
38
IR-Based Symbolic Execution�(most popular)
add rax,rbx
mov rcx, rbx
dec rcx
sub rax,rbx
jnz label1
translate
SUM T0,T1
PUT …
DEC …
PUT …
ASM
IR
exe
Binary Processing
39
IR-Less Symbolic Execution
add rax,rbx
mov rcx, rbx
dec rcx
sub rax,rbx
jnz label1
exe
Symbolic �reasoning
Hook
exe
ASM
Binary Processing
40
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
| |
| |
| |
| |
| |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
41
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | |
| |
| |
| |
| |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
42
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | |
Architecture agnostic | |
| |
| |
| |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
43
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | |
Architecture agnostic | |
Easier queries to the solver[39] | |
| |
| |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
44
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | |
Architecture agnostic | |
Easier queries to the solver[39] | |
Less robust�(Unsupported native ins => Fatal error) | |
| |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
45
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | |
Architecture agnostic | |
Easier queries to the solver[39] | |
Less robust�(Unsupported native ins => Fatal error) | |
Poor execution performance[39] | |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
46
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | Hard to implement�(thousands of instructions) |
Architecture agnostic | |
Easier queries to the solver[39] | |
Less robust�(Unsupported native ins => Fatal error) | |
Poor execution performance[39] | |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
47
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | Hard to implement�(thousands of instructions) |
Architecture agnostic | Architecture-dependent (not portable!) |
Easier queries to the solver[39] | |
Less robust�(Unsupported native ins => Fatal error) | |
Poor execution performance[39] | |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
48
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | Hard to implement�(thousands of instructions) |
Architecture agnostic | Architecture-dependent (not portable!) |
Easier queries to the solver[39] | Harder queries to the solver[39] |
Less robust�(Unsupported native ins => Fatal error) | |
Poor execution performance[39] | |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
49
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | Hard to implement�(thousands of instructions) |
Architecture agnostic | Architecture-dependent (not portable!) |
Easier queries to the solver[39] | Harder queries to the solver[39] |
Less robust�(Unsupported native ins => Fatal error) | More Robust�(Unsupported ins => Just execute concretely) |
Poor execution performance[39] | |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
50
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | Hard to implement�(thousands of instructions) |
Architecture agnostic | Architecture-dependent (not portable!) |
Easier queries to the solver[39] | Harder queries to the solver[39] |
Less robust�(Unsupported native ins => Fatal error) | More Robust�(Unsupported ins => Just execute concretely) |
Poor execution performance[39] | Good execution performance[39] |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
Binary Processing
51
IR-Based Symbolic Execution | IR-Less Symbolic Execution |
Easier to implement�(Small number of high-level instructions) | Hard to implement�(thousands of instructions) |
Architecture agnostic | Architecture-dependent (not portable!) |
Easier queries to the solver[39] | Harder queries to the solver[39] |
Less robust�(Unsupported native ins => Fatal error) | More Robust�(Unsupported ins => Just execute concretely) |
Poor execution performance[39] | Good execution performance[39] |
KLEE[7], S2E[10], angr[25]
QSYM[34], SAGE[62], Triton[65]
SymQEMU: Compilation-based symbolic execution �for binaries
52
TAKEAWAYS
53
σ
Symbolic store
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Sm�States Manager
States Management
State Management
54
States Management
Hybrid Execution
�
55
State Management → Hybrid Execution
Hybrid Execution
�
56
State Management → Hybrid Execution
57
x=X0, y=Y0
1 |void test_me(int x, int y){
2 | z = (y*y) % 50;
3 | if(z == x){�4 | // ERROR
5 | }else{
6 | // SOMETHING
7 | }
8 |}
z = (y0y0) mod 50
S2
σ
σ = symbolic store
π = path constraints
x=1, y=2
Concolic Execution (DSE)
State Management → Hybrid Execution
58
x=X0, y=Y0
z = (y0y0) mod 50
S2
S3
σ
σ = symbolic store
π = path constraints
x=X0, y=Y0
1 |void test_me(int x, int y){
2 | z = (y*y) % 50;
3 | if(z == x){�4 | // ERROR
5 | }else{
6 | // SOMETHING
7 | }
8 |}
x=1, y=2
Concolic Execution (DSE)
State Management → Hybrid Execution
59
S2
S3
S6
S4
x0 == (y0y0) mod 50
x0 != (y0y0) mod 50
SAT?
π
π
σ = symbolic store
π = path constraints
x=X0, y=Y0
x=X0, y=Y0
1 |void test_me(int x, int y){
2 | z = (y*y) % 50;
3 | if(z == x){�4 | // ERROR
5 | }else{
6 | // SOMETHING
7 | }
8 |}
x=1, y=2
z = (y0y0) mod 50
σ
SAT?
😥
😥
Concolic Execution (DSE)
State Management → Hybrid Execution
60
S2
S3
S6
S4
1 == 2*2 mod 50
1 != 2*2 mod 50
FALSE
π
π
σ = symbolic store
π = path constraints
x=X0, y=Y0
x=X0, y=Y0
1 |void test_me(int x, int y){
2 | z = (y*y) % 50;
3 | if(z == x){�4 | // ERROR
5 | }else{
6 | // SOMETHING
7 | }
8 |}
x=1, y=2
z = (y0y0) mod 50
σ
TRUE
😎
😎
Concolic Execution (DSE)
State Management → Hybrid Execution
61
S2
S3
S6
S4
1 == 2*2 mod 50
1 != 2*2 mod 50
FALSE
π
π
σ = symbolic store
π = path constraints
x=X0, y=Y0
x=X0, y=Y0
1 |void test_me(int x, int y){
2 | z = (y*y) % 50;
3 | if(z == x){�4 | // ERROR
5 | }else{
6 | // SOMETHING
7 | }
8 |}
x=1, y=2
z = (y0y0) mod 50
σ
TRUE
😎
😎
Negate constraints, generate new inputs, cover new code!
Concolic Execution (DSE)
State Management → Hybrid Execution
62
S2
S3
S6
S4
1 == 2*2 mod 50
1 != 2*2 mod 50
FALSE
π
π
σ = symbolic store
π = path constraints
x=X0, y=Y0
x=X0, y=Y0
1 |void test_me(int x, int y){
2 | z = (y*y) % 50;
3 | if(z == x){�4 | // ERROR
5 | }else{
6 | // SOMETHING
7 | }
8 |}
x=1, y=2
z = (y0y0) mod 50
σ
TRUE
😎
😎
Negate constraints, generate new inputs, cover new code!
Concolic Execution (DSE)
State Management → Hybrid Execution
Hybrid Execution
�
63
State Management → Hybrid Execution
Program Summarization
64
State Management → Program Summarization
s1
s2
s3
s5
s4
s6
s7
s8
s1
s2
s3
s4
s5
Red border weight represents “state expressiveness” �(i.e., its complexity in terms of constraints)
Program Summarization
65
State Management → Program Summarization
Program Summarization
66
State Management → Program Summarization
Program Summarization
67
State Management → Program Summarization
68
1 |int is_positive(int x){
2 | if (x>0) return 1;
3 | return 0;
4 |}
State Management → Program Summarization
Compositional Dynamic Test Generation
69
is_positive:
(x > 0 ∧ ret = 1) ∨ (x ≤ 0 ∧ ret = 0)
1 |int is_positive(int x){
2 | if (x>0) return 1;
3 | return 0;
4 |}
pre-cond1
post-cond1
post-cond2
pre-cond2
State Management → Program Summarization
Compositional Dynamic Test Generation
Program Summarization
70
State Management → Program Summarization
Program Summarization
71
State Management → Program Summarization
Program Summarization
72
State Management → Program Summarization
73
State Management → Program Summarization
1 | int func(int x){
2 | counter = 0
3 | for(int i=0; i<x; i++){
4 | do_things();
5 | counter++
6 | if(counter == 1000){
7 | maybe_bug()
8 | }
9 | }
10 | }
symbolic input
Loop-Extended Symbolic Execution on Binary Programs
3
4
…
5
10
6
3
4
10
…
No “counter” symbolization
74
State Management → Program Summarization
1 | int func(int x){
2 | counter = 0
3 | for(int i=0; i<x; i++){
4 | do_things();
5 | counter++
6 | if(counter == 1000){
7 | maybe_bug()
8 | }
9 | }
10 | }
symbolic input
Loop-Extended Symbolic Execution on Binary Programs
3
4
…
5
10
6
3
4
10
…
7
“counter” symbolization
Program Summarization
75
State Management → Program Summarization
76
1| int n:=∗;
2| int x:=∗;
3| int z:=∗;
4| while (x<n){
5| x++;
6| }
x0 < n ∧ ( x = n )
Single-path Loop�
Cycle 1: 4-5
Cycle 2: 4-5
…
pre-cond
post-cond
State Management → Program Summarization
Automatic Partial Loop Summarization in Dynamic Test Generation
Program Summarization
77
State Management → Program Summarization
78
1| int n:=∗;
2| int x:=∗;
3| int z:=∗;
4| while (x<n){
5| if(z>x) x++;
6| else z++;
7| }
(x0 ≥ n∧x = x0∧z = z0)∨(x0 < n ≤ z0∧x = n∧z = z0)∨(x0 < n∧z < n∧x = z0 = n)
Multi-path Loop�
Cycle 1: 4-5
Cycle 2: 4-5
Cycle X: 4-6
…
pre-cond1
post-cond1
pre-cond2
post-cond2
pre-cond3
post-cond3
State Management → Program Summarization
Proteus: Computing Disjunctive Loop Summary via Path Dependency Analysis
Program Summarization
79
State Management → Program Summarization
80
| void abs(int n){
1 | int ret = 0;
2 | if (n < 0)
3 | ret = -n;
4 | else
5 | ret = n;
6 | return ret;
1
2
3
4
6
6
ret ← -n
ret ← n
No merging
State Management → Program Summarization
81
| void abs(int n){
1 | int ret = 0;
2 | if (n < 0)
3 | ret = -n;
4 | else
5 | ret = n;
6 | return ret;
1
2
3
4
6
States merging
ret ← ITE(n < 0, −n, n).
State Management → Program Summarization
Program Summarization
82
State Management → Program Summarization
Program Summarization
83
State Management → Program Summarization
84
| void f3(int a,int b){
1 | int flag = 0;
2 | If (a == b){
3 | flag = 1;
| }
4 | g3(a,b);
5 | If (flag)
6 | g1(flag+1);
7 | else
8 | g2(flag+2);
| }
Naive dynamic merging�(Efficient State Merging in Symbolic Execution)
1
4
5
flag ← ITE(a==b, 1, 0)
g3(a,b);
6
8
flag ←ITE(a==b, 1, 0) + 1
flag ←ITE(a==b, 1, 0) + 2
😱
😱
😎
flag ← 0
Assume parameters a,b are symbolic
State Management → Program Summarization
85
| void f3(int a,int b){
1 | int flag = 0;
2 | If (a == b){
3 | flag = 1;
| }
4 | g3(a,b);
5 | If (flag)
6 | g1(flag+1);
7 | else
8 | g2(flag+2);
| }
1
4
5
flag ← ITE(a==b, 1, 0)
g3(a,b);
6
8
flag ← 1
flag ← 0
| void f3(int a,int b){
1 | int flag = 0;
2 | If (a == b){
3 | flag = 1;
| }
4 | g3(a,b);
5 | If (flag)
6 | g1(flag+1);
7 | else
8 | g2(flag+2);
| }
Dynamic merging + active fork� (Boost Symbolic Execution Using Dynamic State Merging and Forking)
😎
😎
😎
Assume parameters a,b are symbolic
flag ← 0
State Management → Program Summarization
Program Summarization
86
State Management → Program Summarization
Path Scheduling
87
State Management → Path Scheduling
Path Scheduling
88
State Management → Path Scheduling
Path Scheduling
89
State Management → Path Scheduling
Path Scheduling
90
State Management → Path Scheduling
Path Scheduling
91
State Management → Path Scheduling
Path Scheduling
92
State Management → Path Scheduling
93
1
2
3
sat?
sat?
Σ�SMT�solver
4
CheckAll strategy
State Management → Path Scheduling
94
1
2
3
4
F
T
Σ�SMT�solver
CheckAll strategy
State Management → Path Scheduling
95
1
2
3
4
5
6
F
T
sat?
Σ�SMT�solver
sat?
CheckAll strategy
State Management → Path Scheduling
96
1
2
3
4
5
6
F
T
F
Σ�SMT�solver
T
CheckAll strategy
State Management → Path Scheduling
97
1
2
3
4
7
8
11
12
14
15
Σ�SMT�solver
5
6
9
10
13
sat?
sat?
sat?
CheckNothing strategy
State Management → Path Scheduling
98
1
2
3
4
7
8
11
12
14
15
Σ�SMT�solver
5
6
9
10
13
CheckNothing strategy
State Management → Path Scheduling
99
1
2
3
4
7
8
11
12
14
15
Σ�SMT�solver
5
6
9
10
13
Useless states!
CheckNothing strategy
State Management → Path Scheduling
100
1
2
3
4
7
8
11
12
14
15
Σ�SMT�solver
F
T
sat?
sat?
CheckDynamic strategy�(Dynamic Path Pruning in Symbolic Execution)
State Management → Path Scheduling
Path Scheduling
101
State Management → Path Scheduling
Path Scheduling
102
State Management → Path Scheduling
Path Scheduling
103
State Management → Path Scheduling
104
1
2
3
4
test(x)
Target:
Skip:
call funcA(&x)
Chopped symbolic execution
105
1
2
3
4
test(x)
Target:
Skip:
WritesTo:
4s
Snapshot state
call funcA(&x)
Chopped symbolic execution
Computed with
pointer analysis
106
1
2
3
4
test(x)
Target:
Skip:
WritesTo:
5
6
call funcA(&x)
4s
Snapshot state
Chopped symbolic execution
107
1
2
3
4
test(x)
Target:
Skip:
WritesTo:
5
6
x=x+6
call funcA(&x)
4s
Snapshot state
Chopped symbolic execution
108
1
2
3
4
call funcA(&x)
test(x)
Target:
Skip:
WritesTo:
5
6
x=x+6
7
8
*x++
4s
Snapshot state
Propagates writes
Recovery mode
Chopped symbolic execution
Path Scheduling
109
State Management → Path Scheduling
Path Scheduling
110
State Management → Path Scheduling
111
σ
Symbolic store
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Sm�States Manager
Environment
Environment
112
Environment
113
Environment
114
Environment
115
Environment
116
Environment
117
Environment
118
119
σ
Symbolic store
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Sm�States Manager
Constraints Management
Constraints Management
120
Σ�SMT�solver
π�Path Constraints
Constraints Management
Constraints Management
121
Constraints Management
122
Constraints Management
123
124
x+0 → x | x * 2n = x << n | 2*x-x=x
Expression Rewriting
x > 10 /\ x = 5 → True
Constraint Set Simplification
x + 1 = 10 → x = 9
Implied Value Concretization
{i < j, j < 20,k > 0}, i=20? → {i < j, j < 20}
Constraint Independence
Constraints Management → Constraints Reduction
Constraints Management
125
Constraints Management
126
Constraints Management
127
Constraints Management
128
Constraints Management
129
130
x = x * y /\ x > 2
Constraints Management → New Solving Technique
Solving Complex Path Conditions through Heuristic Search on Induced Polytopes
131
x = x * y /\ x > 2
Constraints Management → New Solving Technique
Solving Complex Path Conditions through Heuristic Search on Induced Polytopes
132
x = x * y /\ x > 2
Constraints Management → New Solving Technique
Solving Complex Path Conditions through Heuristic Search on Induced Polytopes
133
x = x * y /\ x > 2
1
2
3
4
Constraints Management → New Solving Technique
Tabu search
Solving Complex Path Conditions through Heuristic Search on Induced Polytopes
Constraints Management
134
135
Constraints Management → New Solving Technique
Just Fuzz It: Solving Floating-Point Constraints using Coverage-Guided Fuzzing
Constraints Management
136
137
σ
Symbolic store
stmt
Next Ins. to eval
Σ�SMT�solver
π�Path Constraints
Target Binary
Sm�States Manager
Store Management
Store Management
138
σ
Symbolic store
Store Management
139
1
1| RBX ← X0�2| MOV RAX, [RBX]�
rbx = x0
σ
σ = symbolic store
π = path constraints
Single Concretization
Store Management
140
1
1| RBX ← X0�2| MOV RAX, [RBX]�
rbx = x0
σ
RBX ← solver.solve_one(rbx) = 0xdeadbeef
σ = symbolic store
π = path constraints
Single Concretization
Store Management
141
1
rbx = x0
σ
2
rbx = 0xdeadbeef
rax = MEM[0xdeadbeef]
σ
σ = symbolic store
π = path constraints
Single Concretization
Store Management
1| RBX ← X0�2| MOV RAX, [RBX]�
142
1
1| RBX ← X0�2| MOV RAX, [RBX]�
rbx = x0
σ
σ = symbolic store
π = path constraints
Forking Model
Store Management
143
1
1| RBX ← X0�2| MOV RAX, [RBX]�
rbx = x0
σ
RBX ← solver.solve(rbx) = [0xdeadbeef,� 0x41414100,� 0x7fffffff,� 0x7ffff3ee]
σ = symbolic store
π = path constraints
Forking Model
Store Management
144
1
rbx = x0
σ
2a
rbx = 0xdeadbeef
rax = MEM[0xdeadbeef]
σ
σ = symbolic store
π = path constraints
2b
2c
2d
rbx = 0x7ffff3ee
rax = MEM[0x7ffff3ee]
σ
Forking Model
Store Management
1| RBX ← X0�2| MOV RAX, [RBX]�
145
1
1| RBX ← X0�2| MOV RAX, [RBX]�
rbx = x0
σ
σ = symbolic store
π = path constraints
Merge Model
Store Management
146
1
1| RBX ← X0�2| MOV RAX, [RBX]�
rbx = x0
σ
RBX ← solver.solve(rbx) = [0xdeadbeef,� 0x41414100]
σ = symbolic store
π = path constraints
Merge Model
Store Management
147
1
rbx = x0
σ
σ = symbolic store
π = path constraints
2
rbx = x0
rax = ITE(x0 == 0xdeadbeef, MEM[0xdeadbeef], � ITE(x0== 0x41414100 sdadadadadadaMEM[0x41414100], ….)
σ
Merge Model
Store Management
1| RBX ← X0�2| MOV RAX, [RBX]�
148
1
1| RBX ← X0�2| MOV RAX, [RBX]�
rbx = x0
σ
σ = symbolic store
π = path constraints
Flat Model + SMT Array Theory
Store Management
149
1
2
1| RBX ← X0�2| MOV RAX, [RBX]�
rbx = x0
σ
rbx = x0
rax = SELECT(MemArr,x0)
σ
σ = symbolic store
π = path constraints
MemArr
(Memory as array)
Store Management
Flat Model + SMT Array Theory
Store Management
150
Store Management
151
152
0x7fffffff
0x23
0xdeadbeef
0x41
Address
Value
MEMORY
x = load(α)
α is getting concretized to �some value, e.g., 0x7fffffff
153
0x7fffffff
0x23
0xdeadbeef
0x41
Address
Value
MEMORY
x = load(α)
α is getting concretized to �some value, e.g., 0x7fffffff
x = 23
154
0x7fffffff
0x23
0xdeadbeef
0x41
Address
Value
MEMORY
x = load(α)
α is getting concretized to �some values, e.g., 0x7fffffff, 0xdeadbeef
155
0x23
0xdeadbeef
0x41
Address
Value
MEMORY
x = load(α)
0x7fffffff
X = ITE( α == 0x7fffffff, 0x23,
ITE( α == 0xdeadbeef, 0x41, 0))
Always keep mapping between concrete memory addresses and their values
156
0x23
B
C
Address
Value
MEMORY
x = load(α)
MEMTHINK APPROACH
A
X = ITE( α == A, 0x23,
ITE( α == B, C, 0))
Always keep mapping between symbolic memory addresses and their values
Store Management
157
158
p0,p1,p2,p3,p4,p5
MEM
p0,p1,�p2,p3,p4�p5
seg0
seg1
seg2
0 forks, complex constraints over MEMORY
3 forks, simpler constraints over MEMORY
Symbolic ptr: X0
Symbolic ptr: X0
Concrete sols
Concrete sols
Not segmented
Segmented
SMT Array
Theory
SMT Array �Theory
SMT Array �Theory
SMT Array �Theory
Store Management → New Approaches
A Segmented Memory Model for Symbolic Execution
Store Management
159
Conclusions
160
Conclusions
161
Conclusions
162
Conclusions
163
Conclusions
164
Conclusions
165
Conclusions
166
Conclusions
167
Future Work
168
Future Work
169
Future Work
170
Thanks!
171
References
[1] High coverage detection of input-related security faults
[2] CUTE: A Concolic Unit Testing Engine for C
[3] DART: directed automated random testing
[4] Compositional Dynamic Test Generation
[5] Demand-driven compositional symbolic execution
[6] EXE: Automatically Generating Inputs of Death
[7] KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs
[8] Loop-extended symbolic execution on binary programs
[9] All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution
[10] S2E: a platform for in-vivo multi-path analysis of software systems
[11] Automatic partial loop summarization in dynamic test generation
[12] Memoized symbolic execution
[13] Efficient State Merging in Symbolic Execution
[14] Probabilistic symbolic execution
[15] Unleashing Mayhem on Binary Code
[16] Green: reducing, reusing and recycling constraints in program analysis
[17] Targeted test input generation using symbolic-concrete backward execution
[18] Enhancing symbolic execution with veritesting
[19] Solving complex path conditions through heuristic search on induced polytopes
[20] Under-Constrained Symbolic Execution: Correctness Checking for Real Code
172
[21] Postconditioned Symbolic Execution
[22] DASE: Document-Assisted Symbolic Execution for Improving Automated Software Testing
[23] Compositional Symbolic Execution using Fine-Grained Summaries
[24] Proteus: Computing Disjunctive Loop Summary via Path Dependency Analysis
[25] SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis
[26] On the Techniques We Create, the Tools We Build, and Their Misalignments: a Study of KLEE
[27] StatSym: Vulnerable Path Discovery through Statistics-Guided Symbolic Execution
[28] Rethinking pointer reasoning in symbolic execution
[29] A Survey of Symbolic Execution Techniques
[30] Chopped symbolic execution
[31] Dynamic Path Pruning in Symbolic Execution
[32] Automatically generating search heuristics for concolic testing
[33] Boost Symbolic Execution Using Dynamic State Merging and Forking
[34] QSYM : A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing
[35] Concolic testing with adaptively changing search heuristics
[36] Just fuzz it: solving floating-point constraints using coverage-guided fuzzing
[37] Enhancing Symbolic Execution by Machine Learning Based Solver Selection
[38] A segmented memory model for symbolic execution
[39] Systematic Comparison of Symbolic Execution Systems: Intermediate Representation and its Generation
[40] SYMBION: Interleaving Symbolic with Concrete Execution
[41] Symbolic execution with SymCC: Don't interpret, compile!
[42] Making Symbolic Execution Promising by Learning Aggressive State-Pruning Strategy
[43] Running symbolic execution forever
[44] Constraint Solving with Deep Learning for Symbolic Execution
[45] Relocatable addressing model for symbolic execution
173
174
[46] Multiplex Symbolic Execution: Exploring Multiple Paths by Solving Once
[47] Boosting symbolic execution via constraint solving time prediction
[48] Fuzzy-Sat: Fuzzing Symbolic Expressions
[49] Address-Aware Query Caching for Symbolic Execution
[50] SymQEMU:Compilation-based symbolic execution for binaries
[51] TASE: Reducing Latency of Symbolic Execution with Transactional Memory
[52] Pending Constraints in Symbolic Execution for Better Exploration and Seeding
[53] WISE: Automated Test Generation for Worst-Case Complexity
[54] Neuro-Symbolic Execution: Augmenting Symbolic Execution with Neural Constraints
[55] Ferry: State-Aware Symbolic Execution for Exploring State-Dependent Program Paths
[56] Steering Symbolic Execution to Less Traveled Paths
[57] Learning to Explore Paths for Symbolic Execution
[58] AEG: Automatic Exploit Generation
[59] SE: SELECT—a formal system for testing and debugging programs by symbolic execution
[60] Generalized Symbolic Execution for Model Checking and Testing
[61] Test Input Generation with Java PathFinder
[62] Billions and Billions of Constraints: Whitebox Fuzz Testing in Production
[63] Automated Whitebox Fuzz Testing
[64] Memory models in symbolic execution: key ideas and new thoughts
[65] Triton: A dynamic binary analysis framework (https://triton.quarkslab.com/)
[66] The Economic Impacts of Inadequate Infrastructure for Software Testing
Extras
175
176
1
2
3
4
call func(x)
x >= 10
x < 10
test(x)
Selective symbolic execution
177
1
2
3
4
5
call func(x)
func(x=12)
6
x >= 10
x < 10
x < 10
test(x)
Selective symbolic execution
concrete domain
symbolic domain
178
1
2
3
4
5
call func(x)
func(x=12)
6
x < 10
x < 10
test(x)
7
x > 10 /\ �x = 12 (soft)
x >= 10
Selective symbolic execution
concrete domain
symbolic domain
179
1
2
3
4
call func(x)
x < 10
x >= 10
x < 10
test(x)
7
8
9
x != 12
x == 12
Selective symbolic execution
x > 10 /\ �x = 12 (soft)
😤
180
1
2
3
4
5
call func(x)
func(x=11)
6
x < 10
x >= 10
x < 10
test(x)
7
8
9
x != 12
x == 12
Selective symbolic execution
x > 10 /\ �x = 12 (soft)
concrete domain
symbolic domain
181
1
2
3
4
call func(x)
x < 10
x >= 10
x < 10
test(x)
7
8
9
x != 12
x == 12
Selective symbolic execution
x > 10 /\ �x = 12 (soft)
10
x > 10 /\ �x = 11 (soft)
11
12
x != 12
x == 12
5
func(x=11)
6
concrete domain
symbolic domain
182
Interleaved symbolic execution
183
mode = 0
while (true){
int curr_type = read_item_type()
switch(curr_type){
case 0: [...]
case 1: [...]
[...]
case 23: mode = 1
[...]
Case 83: {
if(mode == 1){
memcpy(...) // bug here
}
}
}
}
State-var
State-dependent program path
State Management → Path Scheduling
Ferry: State-Aware SE for Exploring State-Dependent Program Paths