Past and Present of Program Diffing
Joxean Koret
Introduction
What is program diffing?
What is BinDiffing?
What are we going to see in this talk:
BinDiffing
Let’s talk about the first...
BinDiff, the tool
Commercial (closed source) tool, originally written by Halvar in C++ with a GUI in Java. Rolf Rolles joined Sabre (later Zynamics) and rewrote it to allow per-block matching.
Rolf left, and the core was rewritten again by Halvar and Sören Meyer-Eppler around 2007.
The initial version of BinDiff (1.6?) had the following features for finding matches:
Zynamics times:
Google times (after Google bought Zynamics):
BinDiff 3.0 was published in 2009.
BinDiff 4.0 was published in 2011.
BinDiff 4.1 was published in 2014.
BinDiff 4.2 was published in 2016.
BinDiff 4.3 was published in 2017.
And BinDiff 5 was published in 2019.
Between BinDiff 4.0 and 5, there was little to no support.
Other tools
Since the first version of BinDiff was published, various other tools have appeared:
The quality of these alternative tools, all of them Open Source, varies greatly. However, with the sole exceptions of YaDiff and Diaphora, they all have one thing in common: they are (or seem to be) abandoned.
Diaphora, reviving binary diffing
The initial version of Diaphora was published in 2015, at SyScan. I wrote it out of despair with BinDiff after Google bought Zynamics, and with all the other dead or unmaintained Open Source alternatives.
I also wrote it because BinDiff lacked many of the features I wanted.
Let’s talk about the features that were the top of the top in 2015...
Binary Diffing Features in 2015
BinDiff 4.X had the following features:
And… that’s about it.
BinDiff 4.X turned out to be frustrating in my daily job:
Diaphora as of 2019
Diaphora added many new features not available in other public tools:
New tools since 2015
Since Diaphora was published, only one more Open Source tool has appeared: YaDiff, from the YaCo project (a collaborative reverse-engineering plugin for IDA).
YaDiff (the diffing part), however, only focuses on exporting and importing symbols between databases.
Extremely fast. Simplistic heuristics. A good and fast tool when it works.
It lacks support for exporting/importing enums, structs and anything related to the pseudo-code.
It seems to be maintained today and will probably be ported to Ghidra.
Program Diffing in Academia
There are various great academic papers about program diffing, although, with only a few exceptions, they come with no accompanying source code or binaries whatsoever.
The next slides show some of my favourite papers, from which I have extracted many ideas for Diaphora or Pigaios...
Academic Papers
Efficient Features for Function Matching Between Binary Executables
A new algorithm for Diaphora (КОКА, from Koret-Karamitas) was implemented based on the awesome ideas of Huku.
BinPro: A Tool for Binary Source Code Provenance.
No code whatsoever was ever released, but the paper caught my attention and served as the basis for Pigaios.
BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis
Unsupervised Features Extraction for Binary Similarity Using Graph Embedding Neural Networks
SAFE: Self-Attentive Function Embeddings for Binary Similarity
Debin: Predicting Debug Information in Stripped Binaries
DeClassifier: Class-Inheritance Inference Engine for Optimized C++ Binaries.
The possible future
The following is what I think will be the future of program diffing:
Thank you!
Any questions?