The (Long) Journey To A Multi-Architecture Disassembler
Nicolas Falliere, Joan Calvet, Cedric Lucas
PNF Software
REcon Montréal 2019
Who Are We?
You?
We are hiring
JEB Decompiler - PNF Software - 2019
PNF Software In Three Dates
JEB’s Native Decompilation Pipeline
Simplified View
Native Code File
→ Disassembler (focus of this presentation!)
→ Native Model (routines, data…)
→ IR Converters
→ Raw IR (low-level)
→ IR Optimizers
→ Final IR (higher-level, typed)
→ AST Builder
→ Raw AST
→ AST Optimizers
→ High Level Representation (e.g. pseudo-C)
JEB’s Disassembler (1): Informal Definitions
JEB’s Disassembler (2): Why Do We Need It?
JEB’s Disassembler (3): Development Context
This Presentation’s Intent
Outline
Disclaimer
This is a research talk (not a sales talk). It shows work in progress and is not intended to present the best-ever solution to disassembly.
Setting The Scene
secret.c
→ Visual Studio x86 Compiler (no optimizations, no inlining, no symbols)
→ secret.exe
→ “Ideal” Disassembler
→ EXPECTED OUTPUT SKETCH:
ROUTINE 1
…
CALL ROUTINE 2
…
ROUTINE 2
…
CALL ROUTINE 3
…
XOR 0x37373737,...
…
How To Get There?
secret.exe → ? → ROUTINE 1 (… CALL ROUTINE 2 …), ROUTINE 2 (… CALL ROUTINE 3 … XOR 0x37373737,... …)
First, we need some prerequisites...
Prerequisite 1: Executable File Parsers
secret.exe → PE Parser →
Architecture Information: x86, little-endian,...
Memory Mapping:
Section 1 at 0x40001000
Section 2 at 0x40003000
Section 3 at 0x40004000
Entry Point
Interlude: Executable File Parsers in JEB
Prerequisite 2: Instruction Disassemblers
Binary Blobs → Parsed Instructions (Mnemonic, Operand(s), Next Instruction(s))
X86: 55 → PUSH EBP; next: Fallthrough
ARM: → SUBNE R0, PC; next: Fallthrough
MIPS: 0F 00 40 10 → BEQZ $v0, 0x0F; next: Fallthrough, +0x0F
Interlude: Instruction Disassemblers in JEB
Back To Our Toy Example
secret.exe → ? → ROUTINE 1 (… CALL ROUTINE 2 …), ROUTINE 2 (… CALL ROUTINE 3 … XOR 0x37373737,... …)
Picking a First “Intuitive” Strategy
Walkthrough: the disassembler fills its data structures from the input memory mapping, starting at the entry point (EP).
1. Disassemble PUSH EBP, then MOV EBP, ESP, then CMP [EBP+8], 1.
2. JNZ 0x40103B. Conditional branch: end of block, continue analyzing fallthrough, store target for later analysis. Addresses To Analyze Later: Intra Routine 0x40103B.
3. Fast forward... CALL 0x4034D0. Routine call: continue analyzing fallthrough, store target for later analysis. Addresses To Analyze Later: Intra Routine 0x40103B, Other Routines 0x4034D0.
4. Fast forward... RET. RET instruction: end of block, pop next address to analyze. Addresses To Analyze Later: Other Routines 0x4034D0.
5. Fast forward... No more addresses to analyze: current CFG is finished. Analyze next routine (0x4034D0 moves from Other Routines to Intra Routine).
Fast forward...
End Result
Routine 1 - starts at 0x401010
PUSH EBP
MOV EBP, ESP
CMP [EBP+8], 1
JNZ 0x40103B

MOV EAX, 4
SHL EAX, 0
MOV ECX, [EBP+0Ch]
MOV EDX, [ECX+EAX]
PUSH EDX
CALL 0x4034D0
ADD ESP, 4
JMP LOC_401044

XOR EAX, EAX

POP EBP
RET

Routine 2 - starts at 0x4034D0
PUSH EBP
MOV EBP, ESP
MOV EAX, [EBP+8]
PUSH EAX
CALL 0x4034F5
ADD ESP, 4
XOR EAX, [0x414880]
POP EBP
RET
In the end, we produced the expected disassembly with a simple recursive algorithm!
The magic was in the instruction disassembler...
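A minimal sketch of the recursive strategy walked through above, assuming a toy instruction disassembler. The `TOY_CODE` table, its addresses and sizes, and the `disassemble` helper are hypothetical illustrations, not JEB's implementation. It only collects instruction addresses per routine and does not build basic blocks.

```python
from collections import deque

# Toy "instruction disassembler": address -> (text, size, kind, target).
# Addresses and sizes are made up for illustration.
TOY_CODE = {
    0x1000: ("PUSH EBP", 1, "seq", None),
    0x1001: ("CMP [EBP+8], 1", 4, "seq", None),
    0x1005: ("JNZ 0x100E", 2, "jcc", 0x100E),
    0x1007: ("CALL 0x2000", 5, "call", 0x2000),
    0x100C: ("JMP 0x1010", 2, "jmp", 0x1010),
    0x100E: ("XOR EAX, EAX", 2, "seq", None),
    0x1010: ("POP EBP", 1, "seq", None),
    0x1011: ("RET", 1, "ret", None),
    0x2000: ("PUSH EBP", 1, "seq", None),
    0x2001: ("RET", 1, "ret", None),
}


def disassemble(entry, decode):
    """Return {routine start: sorted instruction addresses}."""
    routines = {}
    other_routines = deque([entry])      # "Other Routines" queue
    while other_routines:
        start = other_routines.popleft()
        if start in routines:
            continue
        visited = set()
        intra = [start]                  # "Intra Routine" addresses
        while intra:
            addr = intra.pop()
            while addr not in visited:
                insn = decode(addr)
                if insn is None:         # nothing decodable here
                    break
                _text, size, kind, target = insn
                visited.add(addr)
                if kind == "ret":        # end of block
                    break
                if kind == "jmp":        # no fallthrough
                    intra.append(target)
                    break
                if kind == "jcc":        # store target, follow fallthrough
                    intra.append(target)
                elif kind == "call":     # assumes the call returns
                    other_routines.append(target)
                addr += size
        routines[start] = sorted(visited)
    return routines


routines = disassemble(0x1000, TOY_CODE.get)
```

Note that the `call` case bakes in the assumption that every call returns to its caller.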
So, it’s not that hard to disassemble whole executables?
How does this algorithm generalize to other Visual Studio executables? To other compilers? To other architectures?
2. Some Questionable Assumptions We Made
More Or Less Consciously
Assumption 1: CALL Always Returns To Caller
“Routine call: continue analyzing fallthrough, store target for later analysis”
Counter-Example: Non-Returning Calls (Exiting APIs - Visual Studio CRT)
Counter-Example: Non-Returning Calls (Infinitely Looping Routines - GCC 4.9)
Head-Scratching With Non-Returning Routines (1)
Head-Scratching With Non-Returning Routines (2)
The JEB Way
Assumption 2: Routine Control Flow Graphs Are Distinct
“No more addresses to analyze: current CFG is finished.”
Counter-Example: Routines Sharing Code in Visual Studio 2017 CRT
Head-Scratching On Shared Code
Let’s say we parse routine 1 first. Then we parse routine 2 and find a branch within an existing basic block!
Do we duplicate instructions into another basic block, or do we split the existing basic block?
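To make the "split" option concrete, here is a small sketch of splitting an existing basic block at a branch target. The `BasicBlock` class and `split_block` helper are hypothetical names for illustration. The sketch does not claim which option JEB chooses.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    start: int
    insns: list                      # (address, text) pairs, in address order
    successors: list = field(default_factory=list)


def split_block(block, addr):
    """Split `block` so a new block begins at `addr`; return the new tail block."""
    index = next((i for i, (a, _) in enumerate(block.insns) if a == addr), None)
    if index is None or index == 0:
        raise ValueError("addr is not an instruction strictly inside the block")
    # The tail inherits the original successors; the head now falls through to it.
    tail = BasicBlock(addr, block.insns[index:], block.successors)
    block.insns = block.insns[:index]
    block.successors = [addr]
    return tail


block = BasicBlock(0x10, [(0x10, "A"), (0x11, "B"), (0x13, "C")], [0x20])
tail = split_block(block, 0x11)
```

After the split, both routines can reference the shared tail block without duplicating its instructions.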
The JEB Way
The JEB Way: End Result
Assumption 3: Branch Instructions Immediately End Basic-Blocks
“Conditional branch: end of block, continue analyzing fallthrough, store target for later analysis”
Counter-Example: MIPS Branch Delay Slots
Conditional branch (if $v0 == $s5)
Branch delay slots (always executed)
Conditional branch (if $v0 == 0)
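A small sketch of how a block builder might account for delay slots, assuming (as the slide states) that the delay-slot instruction is always executed. The `build_block` helper and its `delay_slots` parameter are hypothetical, for illustration only.

```python
def build_block(insns, start, delay_slots=1):
    """insns: list of (text, is_branch). Return (block texts, next index)."""
    block = []
    i = start
    while i < len(insns):
        text, is_branch = insns[i]
        block.append(text)
        i += 1
        if is_branch:
            # The delay-slot instruction(s) still belong to this block:
            # they execute before the branch takes effect.
            for _ in range(delay_slots):
                if i < len(insns):
                    block.append(insns[i][0])
                    i += 1
            break
    return block, i


mips = [
    ("addiu $v0, $v0, 1", False),
    ("beqz $v0, 0x0F", True),
    ("nop", False),
    ("lw $a0, 0($sp)", False),
]
mips_block = build_block(mips, 0)
no_slot_block = build_block(mips, 0, delay_slots=0)
```

With `delay_slots=0` the same code behaves like the original assumption, where the branch itself ends the block.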
Head-Scratching With Delay Slots (1)
Head-Scratching With Delay Slots (2)
Head-Scratching With Delay Slots (3)
The JEB Way
Example of End Result (JEB’s CFG)
Assumption 4: The Instruction Set Remains The Same
(never said anything about that, but that was taken for granted, right?)
Counter-Example: ARM/Thumb Switch
ARM and Thumb are different instruction sets sharing the same encoding space
Both can be in the same executable...
Thumb ISA
ARM ISA
BLX: Branch with Link and eXchange instruction set
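A short sketch of the mode bookkeeping a disassembler needs here. It relies on two architectural facts: BLX with an immediate target always switches instruction set, and for BX/BLX with a register operand, bit 0 of the target value selects Thumb state. The function names are hypothetical.

```python
ARM, THUMB = "ARM", "Thumb"


def next_mode_blx_immediate(current_mode):
    """BLX <label> always switches to the other instruction set."""
    return THUMB if current_mode == ARM else ARM


def resolve_interworking_target(register_value):
    """BX/BLX <register>: bit 0 selects Thumb and is not part of the address."""
    mode = THUMB if register_value & 1 else ARM
    address = register_value & ~1
    return address, mode


thumb_target = resolve_interworking_target(0x8001)
arm_target = resolve_interworking_target(0x8000)
```

The register case is the hard one in practice, since the register value must first be known before the mode can be derived.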
Head-Scratching With Instruction Set Switching
The JEB Way
Assumption 5: Control Flow Can Always Be Followed
“Routine call: continue analyzing fallthrough address, store target for later analysis”
Counter-Example: Jumptables (VS2017 x86 Compact Switch)
Discovering control flow here means finding ECX’s possible values
How To Find Possible Values For Indirect Operands? (i.e. register or memory operands)
The JEB Way: Pattern Matching For Visual Studio x86 Jumptables
Specific to Visual Studio x86
Get jumptable address
Get jumptable size
Parse jumptable
Control flow improvement
Generic
(More on how we integrate such compiler-specific logic into the disassembler later)
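Once the jumptable address and size are known, the "parse jumptable" step can be sketched as follows. This assumes entries are 4-byte little-endian absolute addresses, which is typical for 32-bit x86 but is an assumption here. `parse_jumptable`, `memory` and `base` are hypothetical names.

```python
import struct


def parse_jumptable(memory, base, table_addr, count):
    """Read `count` 32-bit little-endian targets from the table at `table_addr`.

    `memory` holds the bytes of the mapped image starting at virtual address `base`.
    """
    offset = table_addr - base
    return list(struct.unpack_from("<%dI" % count, memory, offset))


# Toy image: 16 bytes of filler, then a 3-entry table at 0x401440.
image = bytes(0x10) + struct.pack("<3I", 0x401100, 0x401120, 0x401140)
targets = parse_jumptable(image, 0x401430, 0x401440, 3)
```

Each recovered target can then be queued as an intra-routine address to analyze.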
Computing control flow with pattern matching... really?
Syntactic solutions might be acceptable, though inelegant, when the target code is very common, because:
But obviously syntactic solutions cannot scale in the context of a multi-architecture disassembler...
Moar Jumptables Examples - Case 1: ARM GCC
Moar Jumptables Examples - Case 2: ARM GCC -fPIC
Moar Jumptables Examples - Case 3: ARM Thumb GCC
MIPS Position-Independent Code (System V ABI)
Entry Point → ?
Syntactic methods clearly not suitable here!
How To Find Possible Values For Indirect Operands?
Back To The Original Question (And Let’s Forget About Syntactic Solutions)
Interlude: JEB’s Intermediate Language
Definition
Low-level imperative language, made of expressions:
Example
Textual Representation:
s32:var1[0:8[ = s8:var2 + s8:var3
Object Representation:
IEAssign
  IESlice
    IEVar: s32:var1
    IERange: [0:8[
  IEOperation: +
    IEVar: s8:var2
    IEVar: s8:var3
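The object representation can be mirrored with a few Python classes. The class names come from the slide; the field names and the `__str__` methods are assumptions made for this sketch and do not reflect JEB's actual API.

```python
from dataclasses import dataclass


@dataclass
class IEVar:
    bitsize: int
    name: str

    def __str__(self):
        return f"s{self.bitsize}:{self.name}"


@dataclass
class IERange:
    begin: int
    end: int

    def __str__(self):
        return f"[{self.begin}:{self.end}["


@dataclass
class IESlice:
    source: object
    bits: IERange

    def __str__(self):
        return f"{self.source}{self.bits}"


@dataclass
class IEOperation:
    operator: str
    left: object
    right: object

    def __str__(self):
        return f"{self.left} {self.operator} {self.right}"


@dataclass
class IEAssign:
    destination: object
    source: object

    def __str__(self):
        return f"{self.destination} = {self.source}"


stmt = IEAssign(
    IESlice(IEVar(32, "var1"), IERange(0, 8)),
    IEOperation("+", IEVar(8, "var2"), IEVar(8, "var3")),
)
```

Printing `stmt` rebuilds the textual representation from the tree.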
Main Purpose
The IL’s primary purpose is to express the semantics of native instructions:
X86 Instruction:
xor eax, dword ds:[10000h]
Semantic Representation in JEB’s IL:
s32:_eax = (s32:_eax ^ 32<s16:_ds>[i32:10000h])
s1:_zf = (s32:_eax ? i1:0 : i1:1)
s1:_sf = s32:_eax[31:32[
s1:_pf = PARITY(s32:_eax[0:8[)
s1:_of = i1:0
s1:_cf = i1:0
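The same semantics can be written out in plain Python to see what each IL statement computes. This sketch assumes `PARITY` follows the x86 PF definition (set when the low byte has an even number of 1 bits). `xor32` is a hypothetical helper.

```python
def xor32(eax, mem_value):
    """Return (new eax, flags) for `xor eax, dword [mem]` on 32-bit values."""
    result = (eax ^ mem_value) & 0xFFFFFFFF
    flags = {
        "zf": 0 if result else 1,               # result ? 0 : 1
        "sf": (result >> 31) & 1,               # bit slice [31:32[
        "pf": 1 if bin(result & 0xFF).count("1") % 2 == 0 else 0,
        "of": 0,                                # always cleared by XOR
        "cf": 0,                                # always cleared by XOR
    }
    return result, flags


zero_case = xor32(0x12345678, 0x12345678)
sign_case = xor32(0x80000000, 0x00000001)
```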
Disassembler → IR Converters → IL Semantic Representation → IR Optimizers → Optimized IL (higher-level, typed) → AST Builder → AST Optimizers
Having access to native-to-IL converters allows us to implement the simulation at the IL level:
The JEB Way: Intermediate Language Simulation
s32:_esp = (s32:_esp - i32:00000004h)
32<s16:_ss>[s32:_esp] = s32:_ebp
s32:_ebp = s32:_esp
32<s16:_ds>[i32:004152A0h] = i32:00401000h
s1:_zf = ((32<s16:_ss>[(s32:_ebp + i32:00000008h)] - i32:00000002h) ? i1:0 : i1:1)
s1:_sf = (32<s16:_ss>[(s32:_ebp + i32:00000008h)] - i32:00000002h)[31:32[
s1:_pf = PARITY((32<s16:_ss>[(s32:_ebp + i32:00000008h)] - i32:00000002h)[0:8[)
s1:_cf = (32<s16:_ss>[(s32:_ebp + i32:00000008h)] <u i32:00000002h)
s1:_af = ((32<s16:_ss>[(s32:_ebp + i32:00000008h)] ^ i32:00000002h) ^
s32:_eip = (s1:_zf ? i32:00401033h : i32:0040104Dh)
2. Simulate IL routine to build the machine state at each instruction:
IEOperation evaluation (excerpt)
3. Report the values found during simulation to the disassembler
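A toy version of steps 2 and 3, assuming a drastically simplified IL made of nested tuples. The statement format, `evaluate` and `simulate` are hypothetical and far simpler than JEB's IL. Unknown values propagate as `None`, so only concrete values are ever reported.

```python
MASK32 = 0xFFFFFFFF
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def evaluate(expr, regs, mem):
    """Evaluate an expression; return None when the value is not concrete."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return regs.get(expr[1])
    if kind == "load":
        addr = evaluate(expr[1], regs, mem)
        return None if addr is None else mem.get(addr)
    left = evaluate(expr[1], regs, mem)
    right = evaluate(expr[2], regs, mem)
    if left is None or right is None:
        return None
    return OPS[kind](left, right) & MASK32


def simulate(statements, regs, mem):
    """Run (dest, expr) assignments; report concrete values written to 'pc'."""
    found = []
    for dest, expr in statements:
        value = evaluate(expr, regs, mem)
        regs[dest] = value
        if dest == "pc" and value is not None:
            found.append(value)
    return found


# Toy indirect branch: the target is loaded from a table, then jumped to.
program = [
    ("gp", ("const", 0x10008000)),
    ("t9", ("load", ("add", ("var", "gp"), ("const", 0x10)))),
    ("pc", ("var", "t9")),
]
resolved = simulate(program, {}, {0x10008010: 0x400500})
unresolved = simulate(program, {}, {})
```

When the table memory is unknown, nothing is reported, which matches the limitation that simulation does not always work.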
The JEB Way: MIPS Position-Independent Code (before IL simulation)
The JEB Way: MIPS Position-Independent Code (after IL simulation)
Interlude: Revisiting Syntactic Solutions With JEB IL
Example: x86/ARM Jumptables
Optimized JEB’s IL (x86):
...
if (32[(s32:_SP0 - i32:8h)] >u i32:9h)
goto 0027
...
s32:_eip = 32[((s32:_ecx * i32:4h) + i32:00401440h)]
Optimized JEB’s IL (ARM):
...
if (s32:_R3 >u i32:9h)
goto 0021
...
s32:_PC = 32[((s32:_R3 * i32:4h) + i32:00010714h)]
One pattern catches both implementations!
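To illustrate the claim, here is a sketch that matches both IL lines with one pattern. It works on the textual form with a regular expression purely for brevity; a real matcher would more plausibly walk the IL object tree. `match_jumptable` is a hypothetical helper.

```python
import re

# <pc register> = 32[((<index register> * 4) + <table address>)]
JUMPTABLE_RE = re.compile(
    r"s32:_\w+ = 32\[\(\(s32:_(\w+) \* i32:4h\) \+ i32:([0-9A-Fa-f]+)h\)\]"
)


def match_jumptable(il_line):
    """Return (index register, table address) or None if the line does not match."""
    m = JUMPTABLE_RE.search(il_line)
    if m is None:
        return None
    return m.group(1), int(m.group(2), 16)


x86_hit = match_jumptable("s32:_eip = 32[((s32:_ecx * i32:4h) + i32:00401440h)]")
arm_hit = match_jumptable("s32:_PC = 32[((s32:_R3 * i32:4h) + i32:00010714h)]")
miss = match_jumptable("s32:_ebp = s32:_esp")
```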
IL simulation only provides concrete, trustworthy values, and therefore does not always succeed.
So what if we cannot follow the control flow from main()? Do we have another way to find secret_algo()?
Distinguishing Code From Data
CODE or DATA?
push ebp
mov ebp, esp
(classic VS routine prologue)
int 3
int 3
...
(classic VS code padding)
pop ebp
ret
(classic VS routine epilogue)
likely CODE!
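A sketch of this heuristic as a byte scan. The encodings used are the common ones: `55` for `push ebp`, `8B EC` for `mov ebp, esp`, and `CC` for `int 3`. The `find_likely_routine_starts` helper and the rule "prologue directly preceded by padding" are simplifying assumptions.

```python
PROLOGUE = bytes([0x55, 0x8B, 0xEC])   # push ebp; mov ebp, esp
INT3 = 0xCC                            # int 3, used as code padding


def find_likely_routine_starts(data, base):
    """Return addresses where a prologue directly follows int 3 padding."""
    starts = []
    pos = data.find(PROLOGUE)
    while pos != -1:
        if pos > 0 and data[pos - 1] == INT3:
            starts.append(base + pos)
        pos = data.find(PROLOGUE, pos + 1)
    return starts


# pop ebp; ret; int 3; int 3; push ebp; mov ebp, esp; nop
gap = bytes([0x5D, 0xC3, 0xCC, 0xCC, 0x55, 0x8B, 0xEC, 0x90])
likely = find_likely_routine_starts(gap, 0x401000)
unlikely = find_likely_routine_starts(bytes([0x90, 0x55, 0x8B, 0xEC]), 0x401000)
```

Such a scan only yields candidates; the same bytes can also appear inside data.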
GCC for x86 (usually) does not mix code and data!
KNOWN CODE | CODE or DATA? | KNOWN CODE => likely CODE
KNOWN DATA | CODE or DATA? | KNOWN DATA => likely DATA
Addresses shown: 0x8048000, 0x8050000
The JEB Way: Compiler-Specific Heuristics
compiler is gcc or clang
and architecture is x86
and no obfuscations/malformations
and A is within code area
and bytes at A do not look like code padding
The JEB Way: Compiler-Specific Heuristics - How To Integrate Them Into a Generic Disassembler? (1)
The JEB Way: Compiler-Specific Heuristics - How To Integrate Them Into a Generic Disassembler? (2)
See INativeCodeAnalyzerExtension
The JEB Way: Compiler-Specific Heuristics - What If... Heuristics Are Wrong?
Assumption 6: All Code Matters
What’s Up With atoi()?
Counter-Example: Statically Linked Library Routines
... [large routine] ...
Pretty complex routine, but… it’s “just” atoi()!
Identifying Library Routines
The JEB Way: Signatures (Basic) Workflow
Files with named routines (Object Files, Executables With Symbols, JEB Projects, ...)
→ JEB Signatures Generator → generates → Signatures
→ JEB Disassembler loads Signatures and matches them against routines
Interlude: JEB Native Signatures (1)
Interlude: JEB Native Signatures (2)
Which Features To Identify Compiler Library Routines?
Feature: Routine Code Hash (1)
Feature: Routine Code Hash (2)
X86 object file snippet
call [address]
xor ecx, ecx
mov [address], eax
More complex normalization cases exist (e.g. ARM relocations can change BL mnemonic to BLX)
Normalization: abstract absolute addresses
(any constant actually)
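A sketch of a normalized routine hash, assuming normalization is applied to disassembly text and that SHA-256 is the hash. Both are choices made for this illustration; the slides do not say which representation or hash JEB uses. `normalize` and `routine_hash` are hypothetical helpers.

```python
import hashlib
import re

# Matches 0x-prefixed hex, h-suffixed hex, and plain decimal constants.
CONSTANT_RE = re.compile(r"\b(?:0x[0-9A-Fa-f]+|[0-9][0-9A-Fa-f]*h|\d+)\b")


def normalize(insn):
    """Lowercase the instruction and abstract every constant."""
    return CONSTANT_RE.sub("CONST", insn.strip().lower())


def routine_hash(insns):
    """Hash the normalized instruction sequence of a routine."""
    text = "\n".join(normalize(i) for i in insns)
    return hashlib.sha256(text.encode()).hexdigest()


# The same routine linked at two different addresses.
in_object_file = ["call [0x401000]", "xor ecx, ecx", "mov [0x414880], eax"]
in_executable = ["call [0x500000]", "xor ecx, ecx", "mov [0x600000], eax"]
same = routine_hash(in_object_file) == routine_hash(in_executable)
```

Abstracting every constant makes the hash stable across relocation, at the cost of more collisions between different routines.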
It Might Not Be Enough...
Same routine code hash, but different behaviors
=> another feature: name of called routines
It Might (Still) Not Be Enough...
Same routine code hash and callee routines, but different behaviors
=> another feature: constants
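Combining the three features can be sketched as a simple equality match. The `RoutineFeatures` class, the `match` helper, and the sample names and values are hypothetical. Real signatures likely tolerate partial matches, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutineFeatures:
    code_hash: str
    callee_names: frozenset
    constants: frozenset


def match(candidate, signatures):
    """Return the names of all signatures whose features equal the candidate's."""
    return [name for name, feats in signatures.items() if feats == candidate]


# Two library routines with the same code hash but different callees.
signatures = {
    "wrapper_a": RoutineFeatures("h1", frozenset({"malloc"}), frozenset()),
    "wrapper_b": RoutineFeatures("h1", frozenset({"free"}), frozenset()),
}
unknown = RoutineFeatures("h1", frozenset({"free"}), frozenset())
names = match(unknown, signatures)
```

If several names come back, the routines are indistinguishable with the current feature set.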
Signatures Generation Strategy (1)
Signatures Generation Strategy (2)
Pragmatic Approach
How To Deal With Indistinguishable Routines?
JEB Signatures Packages
Packages List Extract
3. Enough With Broken Assumptions, What’s The Point?
Let’s Sum Up
Pessimistic Realistic Conclusion
There is no such thing as a disassembler able to correctly disassemble all programs for all architectures/compilers
We cannot correctly disassemble all programs, but we might still be able to do “ok” on a subset of them.
What Can We Do?
The JEB Way
(Very Simplified) Disassembler Workflow
Initialization: Executable File Parsing, Compiler Identification
Strategy
Disassembler Engine: Recursive Disassembling, CFGs Building, Gaps Processing, IL Simulation, Signatures Matching
Gap Processors
Extensions
Errors Logger
Arrow labels: Instantiates, Queries, Modifies (twice), Report Errors
Final Notes
Conclusion
Thank you!