Explainable Neural Binary Analysis
Jane Adams and Michael Davinroy
Glossary
Neural Binary Analysis
Using machine learning to infer properties of binary executables (for example, function similarity) in a cybersecurity context, often for malware analysis or reverse engineering
Basic Block
A straight-line sequence of instructions with a single entry and a single exit: control flow enters only at the first instruction and leaves only at the last
Control Flow Graph (CFG)
A graph representation of binary code in which nodes are basic blocks and edges are control flow transfers between basic blocks (ACFG: Attributed CFG, a CFG whose nodes carry feature attributes)
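To make the glossary terms concrete, here is a minimal sketch of how basic blocks and an attributed CFG might be held in memory. The class names, the `n_instructions` feature, and the toy addresses and instructions are all illustrative assumptions, not the representation used by the model or the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    """One node of the CFG: a straight-line run of instructions."""
    address: int                     # start address of the block (hypothetical)
    instructions: list[str]          # disassembled instructions, in order
    features: dict[str, float] = field(default_factory=dict)  # ACFG attributes


@dataclass
class ControlFlowGraph:
    """Nodes are basic blocks; directed edges are control flow transfers."""
    blocks: dict[int, BasicBlock] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)

    def add_block(self, block: BasicBlock) -> None:
        self.blocks[block.address] = block

    def add_edge(self, src: int, dst: int) -> None:
        # Only allow edges between blocks that are already in the graph.
        if src not in self.blocks or dst not in self.blocks:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.add((src, dst))

    def successors(self, address: int) -> list[int]:
        return sorted(dst for src, dst in self.edges if src == address)


def build_example_cfg() -> ControlFlowGraph:
    """Toy if/else function: one branch, two arms, one shared exit block."""
    cfg = ControlFlowGraph()
    toy_blocks = {
        0x1000: ["cmp eax, 0", "je 0x1010"],
        0x1006: ["mov ebx, 1", "jmp 0x1020"],
        0x1010: ["mov ebx, 2"],
        0x1020: ["ret"],
    }
    for address, instructions in toy_blocks.items():
        # A single illustrative attribute; real ACFGs carry richer features.
        features = {"n_instructions": float(len(instructions))}
        cfg.add_block(BasicBlock(address, instructions, features))
    cfg.add_edge(0x1000, 0x1006)  # fall-through when the branch is not taken
    cfg.add_edge(0x1000, 0x1010)  # branch taken
    cfg.add_edge(0x1006, 0x1020)
    cfg.add_edge(0x1010, 0x1020)
    return cfg
```

Calling `build_example_cfg().successors(0x1000)` returns the two blocks that the conditional branch can reach, which is the edge information a graph model would consume alongside the node attributes.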
Motivation / Problem Statement
What is the problem you want to solve?
Develop a visualization dashboard for viewing the results of a graph neural network (GNN) model that assesses code similarity for cybersecurity purposes
Who has this problem?
Cybersecurity analysts / researchers, for communicating their research to outside stakeholders
Why is it relevant / interesting?
Evaluating model and data integrity; identifying and targeting security risks in enterprise systems
Background / Related Work
We are using the GMN-SNN model from DeepMind as a demo
We are using the binary function similarity dataset from Cisco-Talos
This could potentially be useful for other binary similarity models in that dataset, including: Asm2Vec, CodeCMR, Trex, Catalog1, FunctionSimSearch, GNN-S2V, SAFE, Zeek
The Datasets
[Figure: control flow graphs of Function A and Function B shown side by side; each node is a basic block. Instructions are shown on hover. Entry nodes are highlighted in red.]
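The display behavior described here (instructions on hover, entry nodes in red) can be sketched as a per-node styling step. This is a hypothetical helper, not the dashboard's actual code: the function name, the style keys, and the `lightgray` default color are assumptions, and the entry block is assumed to be known by its address.

```python
def node_styles(blocks: dict[int, list[str]], entry: int) -> dict[int, dict[str, str]]:
    """Map each basic block to display attributes for a graph view.

    blocks: block start address -> list of disassembled instructions
    entry:  address of the function's entry block (assumed to be known)
    """
    if entry not in blocks:
        raise KeyError("entry address is not one of the blocks")
    styles = {}
    for address, instructions in blocks.items():
        styles[address] = {
            "label": hex(address),
            # Text revealed when the user hovers over the node.
            "hover": "\n".join(instructions),
            # The entry node is highlighted in red; others get a neutral color.
            "color": "red" if address == entry else "lightgray",
        }
    return styles


# Usage with two toy blocks (addresses and instructions are made up).
example = node_styles(
    {0x1000: ["push rbp", "mov rbp, rsp"], 0x1010: ["ret"]},
    entry=0x1000,
)
```

The resulting dictionary is independent of any plotting library, so it could be handed to whichever graph renderer the dashboard uses.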
Results
Lessons Learned
Future Work
Provably impossible, actually!