1 of 23

Efficient Binary-Level Coverage Analysis

M. Ammar Ben Khadra, Dominik Stoffel, and Wolfgang Kunz

2 of 23

Why you need to care?

2/23

3 of 23

Why you need to care?

  • Inspires confidence in software (up to a level)
  • Compliance in aerospace & automotive, e.g., DO-178C, ISO-26262
  • Coverage-guided fuzzing, e.g., AFL
  • Several other uses, e.g., test suite reduction

3/23

4 of 23

The current workflow

4/23

SRC

Compile

BIN

Run

Report

COV

SRC

SRC

coverage instrumentation prevents at least some compiler optimizations. This results in longer test run times, which leads to more timeouts

Ivanković et. al., “Code Coverage at Google”, ESEC/FSE’19

5 of 23

A simpler alternative

5/23

Compiler

GCC

Clang

ICC

Optimization

Debug

Release

LTO

Language

C

C++

Rust

BIN

Run

Report

COV

6 of 23

Why binary-level coverage?

6/23

a!=0?

b==0?

c!=0?

  • Detailed report also for tricky code, e.g., exception handling
  • Flexibility to instrument specific functions only

b+c

0

7 of 23

Introducing bcov

Static instrumentation for x86-64 ELF binaries

  1. Patch. Code segment for trampolines. Data segment for coverage data
  2. Run with our bcov-rt library
  3. Report basic block/instruction coverage

7/23

8 of 23

Basic mechanism

  • Insert detours to trampolines to track coverage
  • Single mov instruction updates coverage data

8/23

Original code

36b62: cmp eax,0x140

36b67: sete al

36b6a: jmp 36bce

Patched code

Trampoline

36b62: cmp eax,0x140 36b67: jmp 6002b8

6002b8: mov BYTE PTR [rip+0xadd88],1 6002bf: sete al

6002c2: jmp 36bce

9 of 23

Many techniques, one tool

9/23

Precision

Efficiency

Transparency

Sliced microexecution

Probe pruning

Optimized probe selection

Single-instruction coverage update

Jump table instrumentation

Semantic-preserving instruction rewriting

Non-return analysis

Detour hosting

10 of 23

Focus of the talk

  • Probe pruning. We bring Agrawal’s technique to binary-level instrumentation

H. Agrawal. “Dominators, super blocks, and program coverage”, POPL ’94

  • Sliced microexecution. Novel jump table analysis combining backward slicing and microexecution
  • Detour hosting. Host detour of a short BB (size < 5 bytes) in a large neighbouring BB. Try to choose the host optimally

10/23

11 of 23

Probe pruning

  • Recover CFG from binary and from it derive SB-DG
  • Leaf-node policy: covering D, E, and H achieves 100% coverage
  • Any-node policy: probing A or C enables precise coverage tracking

11/23

CFG

SB-DG

12 of 23

Sliced microexecution (1)

  • Does it depend on a constant base?
  • Slice should depend on a single variable, the jump table index.

12/23

9f6a1: lea r15,[rip+0xe69e4] ; set table base

. ; omitted instructions

9f6f0: movzx eax,r12b ; index is r12b

9f6f4: cmp r12b ,0x5b ; bound comparison

9f6f8: mov QWORD PTR [rsp+0x8],rax

. ; omitted instructions

9f707: ja 9f880 ; jump to default case

9f70d: mov rax,QWORD PTR [rsp+0x8]

9f712: movsxd rax,DWORD PTR [r15+rax*4]

9f716: add rax,r15

9f719: jmp rax ; jump to matching case

13 of 23

Sliced microexecution (2)

  • Does it have a bound condition?
  • Yes, condition @ 9f707, which is influenced by r12b

13/23

9f6a1: lea r15,[rip+0xe69e4] ; set table base

. ; omitted instructions

9f6f0: movzx eax,r12b ; index is r12b

9f6f4: cmp r12b ,0x5b ; bound comparison

9f6f8: mov QWORD PTR [rsp+0x8],rax

. ; omitted instructions

9f707: ja 9f880 ; jump to default case

9f70d: mov rax,QWORD PTR [rsp+0x8]

9f712: movsxd rax,DWORD PTR [r15+rax*4]

9f716: add rax,r15

9f719: jmp rax ; jump to matching case

14 of 23

Sliced microexecution (3)

  • What are the bound limits?
  • Microexecute the slice. Assign values to r12b in [0, 0x5b) and (0x5b, )

14/23

9f6a1: lea r15,[rip+0xe69e4] ; set table base

. ; omitted instructions

9f6f0: movzx eax,r12b ; index is r12b

9f6f4: cmp r12b ,0x5b ; bound comparison

9f6f8: mov QWORD PTR [rsp+0x8],rax

. ; omitted instructions

9f707: ja 9f880 ; jump to default case

9f70d: mov rax,QWORD PTR [rsp+0x8]

9f712: movsxd rax,DWORD PTR [r15+rax*4]

9f716: add rax,r15

9f719: jmp rax ; jump to matching case

15 of 23

Detour hosting

15/23

Original code

  • Call @ 0xad6800 is 3 bytes. Return address is 0xad6803
  • Inserted detour to relocated host @ 0xad67f3
  • Adjusted return address @ 0x1d31b40

ad67f3: jmp ad6803 ; host

ad67f5: nop [multi-byte] ; padding

ad6800: call QWORD PTR [rax+0x58]

Patched code

Trampoline

ad67f3: jmp 1d31afa ; jump to relocated host

ad67f8: call 1d31b39 ; hosted detour

ad67fd: nop DOWRD PTR [rax]

ad6800: jmp ad67f8 ; jump to hosted detour

1d31b39: mov BYTE PTR [rip+0x4f8d01],1 1d31b40: sub QWORD PTR [rsp], -6 1d31b45: jmp QWORD PTR [rax + 0x58]

16 of 23

Experimental setup

16/23

Binary

gas

perl

python

llc

ffmpeg

libmagickcore

libxerces

libopencv

Compiler

gcc-5.5

gcc-7.4

clang-5.0

clang-8.0

Optimization

Debug

Release

LTO

Total

95 binary

1.6 million function

17 of 23

Scalability and transparency

17/23

  • bcov can analyze and patch llc (4 million instructions) in 30 seconds
  • No test regressions after replacing original binaries with instrumented versions

18 of 23

Performance overhead

18/23

  • Replaced original binaries with ones instrumented with bcov
  • Average performance overhead is 14%.

19 of 23

Overhead compared to DBI tools

19/23

  • Pin and DynamoRIO (DR) introduced regressions
  • bcov offers better efficiency, usability, and transparency

20 of 23

Coverage report accuracy (1)

20/23

BIN

BIN

bcov

drcov

COV

COV

Non-determinism!

BIN

bcov + drcov

COV

  • TP = bcov /\ drcov
  • FP = bcov /\ !drcov
  • FN = !bcov /\ drcov
  • TN = N/A

COV

21 of 23

Coverage report accuracy (2)

21/23

  • Binaries compiled using gcc-7.4 in release build
  • Table describes the test suite runs. Dump size given in MB

Module

Process #

drcov size

bcov size

BB #

Inst. #

xerces

80

12.34

4.32

116,378

420,096

magick

58

7.71

2.90

125,521

521,107

llc

7,862

3481.97

4176.16

1,067,151

4,343,021

gas

1,235

71.94

38.56

60,511

220,447

ffmpeg

3,309

423.45

762.39

496,404

3,050,228

22 of 23

Coverage report accuracy (3)

22/23

  • True Positives (TP) are listed as average/maximum
  • Precision = TP / (TP + FP), Recall = TP / (TP + FN)

Module

TP BB

TP Inst.

Precision

Recall

xerces

9,523.2 / 21,927

40,651.2 / 92,144

99,98%

99.42%

magick

5,689.4 / 20,709

21,614.9 / 83,444

99,98%

99.94%

llc

4,5184.5 / 90,952

257,209.5 / 461,656

99,98%

99.68%

gas

2,916.4 / 5,015

11,045.8 / 19,578

99.93%

99.67%

ffmpeg

9,682.0 / 14,489

41,439.1 / 63,591

99.98%

99.94%

23 of 23

Try it out!

  • After 20 years of Dyninst, it is time to revisit the trade offs
  • We implemented bcov from scratch in more than 17 KLoC in C++
  • The result is outstanding efficiency, transparency, and accuracy
  • Reach out on github, ESECFSE subreddit, or directly per email

23/23