1 of 33

PointsTo

Static Use-After-Free Detector for C/C++

Presented by: Peter Goodman

2 of 33

PointsTo detects use-after-free bugs

  • Analyzes LLVM bitcode and finds use-after-free bugs in C/C++ code�
  • Scales to large programs
    • SpiderMonkey is Mozilla’s Firefox’s javascript engine
    • 700K Lines of Code
    • Starts producing reports on Mozilla Spidermonkey after ~2 days of staring hard at the bitcodes

2

3 of 33

Use-after-frees are bad!

  • They happen when you use some memory that you have already freed

delete x;�…�y = x->foo;

  • Cause corruption
  • Lead to strange control flow
    • Call virtual method on free’d object

3

what is this I don’t even

4 of 33

Example use-after-free

x = new X;�…�delete x;�…�y = new Y;�…�y->bar = 2;�x->foo = 1;

4

(1)

5 of 33

Example use-after-free

x = new X;�…�delete x;��y = new Y;�…�y->bar = 2;�x->foo = 0xBEEF;

5

x

(2)

6 of 33

Example use-after-free

x = new X;�…�delete x;�…�y = new Y;�…�y->bar = 2;�x->foo = 0xBEEF;

6

x

(3)

7 of 33

Example use-after-free

x = new X;�…�delete x;�…�y = new Y;�…�y->bar = 2;�x->foo = 0xBEEF;

7

x

y

x and y point to the same memory!

(4)

8 of 33

Example use-after-free

x = new X;�…�delete x;�…�y = new Y;�…�y->bar = 2;�x->foo = 0xBEEF;

8

x

y

2

(5)

9 of 33

Example use-after-free

x = new X;�…�delete x;�…�y = new Y;�…�y->bar = 2;�x->foo = 0xBEEF;

9

x

y

0xBEEF

Delicious

Corruption!

(6)

10 of 33

… but I don’t write buggy code

  • Smart pointers eliminate many bugs
    • std::unique_ptr, std::shared_ptr, etc

10

11 of 33

Is it possible that a use-after-free existed before you committed code?

11

12 of 33

How do you find use-after-frees?

  • Dynamically
    • Run an instrumented version of the program until something bad happens
    • Tooling: Valgrind memcheck, AddressSanitizer
    • Challenge: code coverage (exhaustive test cases)�
  • Statically
    • Exhaustively explore paths through the program
    • Look for a delete x followed by an x->y
    • Tooling: PointsTo :-P
    • Challenge: scaling, finding all paths (function pointers)

12

13 of 33

Static use-after-free detection is hard

  • Want to analyze all program paths
    • Bigger program = more paths of execution
    • Hard to scale�
  • How do you know all the program paths?
    • Need a way to figure out targets of function pointers, virtual method calls

13

14 of 33

Analyzing paths for use-after-frees

  • Example use-after-free

14

A *a = new A;

if (...) {

delete a;

}

a->x = 10;

(1)

15 of 33

Analyzing paths for use-after-frees

  • Convert frees into new definitions

15

A *a = new A;

if (...) {

delete a;

a = ERROR;

}

a->x = 10;

(2)

16 of 33

Analyzing paths for use-after-frees

  • Convert frees into new definitions

  • What about custom (de)allocators?
    • Use Python script to tell PointsTo about them!

16

A *a = new A;

if (...) {

delete a;

a = ERROR;

}

a->x = 10;

(2)

17 of 33

Analyzing paths for use-after-frees

  • Convert into SSA (one assign/var)

17

A *a0 = new A;

if (...) {

a1 = ERROR;

}

a2 = φ(a0,a1)

a2->x = 10;

(3)

18 of 33

Analyzing paths for use-after-frees

  • Convert SSA into a def-use graph

18

A *a0 = new A;

if (...) {

a1 = ERROR;

}

a2 = φ(a0,a1)

a2->x = 10;

(4)

a0

a1

a2

E

A

->x

19 of 33

Analyzing paths for use-after-frees

  • Find a path from an error def to a use

19

A *a0 = new A;

if (...) {

a1 = ERROR;

}

a2 = φ(a0,a1)

a2->x = 10;

(5)

a0

a1

a2

E

A

->x

20 of 33

How do we represent this graph?

  • Want to avoid pointer traversals
    • Need fast, space efficient representation

20

a0

a1

a2

E

A

->x

(1)

21 of 33

How do we represent this graph?

  • Start by labelling the nodes
    • We care about the binary representation of the labels

21

a0

a1

a2

E

A

->x

(2)

A 0 000

E 1 001

a0 2 010

a1 3 011

a2 4 100

>x 5 101

Labeled Nodes

22 of 33

How do we represent this graph?

  • Concat binary labels to form edges
    • We interpret each edge as boolean formula row

22

a0

a1

a2

E

A

->x

(3)

A 0 000

E 1 001

a0 2 010

a1 3 011

a2 4 100

>x 5 101

Labeled Nodes

Edges as a boolean formula

(a0,A) = 010 000 1

(a1,E) = 011 001 1

(a2,a0) = 100 010 1

(a2,a1) = 100 011 1

(>x,a2) = 101 100 1

= … 0

23 of 33

How do we represent this graph?

  • Represent boolean formula as a binary decision diagram

23

(4)

(not the same boolean formula)

24 of 33

What about function calls?

  • Want to follow paths across functions
    • Allocate in the caller
    • Free in the callee�
  • Need a function pointer oracle
    • Query when building SSA graph:� “What functions are pointed-to by� this function pointer?”
    • Fast to compute
    • Fast to query

24

void (*)(void)

25 of 33

Points-to analysis

  • Goal: where do pointers point?�
  • Path-insensitive analysis

25

int x;

int y;

int *p = &x;

int *q = p;

p = &y;

(1)

26 of 33

Points-to analysis

  • Goal: where do pointers point?�
  • Path-insensitive analysis

26

int x;

int y;

int *p = &x;

int *q = p;

p = &y;

p

x

y

(2)

27 of 33

Points-to analysis

  • Goal: where do pointers point?�
  • Path-insensitive analysis

27

int x;

int y;

int *p = &x;

int *q = p;

p = &y;

p

x

y

q

(3)

28 of 33

Points-to analysis

  • Goal: where do pointers point?�
  • Path-insensitive analysis

28

int x;

int y;

int *p = &x;

int *q = p;

p = &y;

p

x

y

q

(4)

29 of 33

Points-to analysis

  • Tells us where function pointers point
    • Also tells us where regular pointers point
    • Oblivious to things like control-flow, so can’t directly find use-after-frees�
  • Points-to analysis is easy-ish to scale
    • Graph is proportional to number of pointer assignments, not number of program paths
    • Easy to query: neighbors of a pointer are objects

29

(5)

30 of 33

And so the pieces come together

  • Turn program into a def-use graph
    • Control-flow problem becomes a data-flow problem�
  • Query the points-to graph to extend data-flow through function pointers
    • Improves scalability at the cost of some imprecision�
  • Search the def-use graph to find paths from frees (error def) to uses (deref)

30

31 of 33

Now I understand, how do I use it?

Show me a demo!

  • Compile program to LLVM bitcode
    • whole-program-llvm (on GitHub)�
  • Run PointsTo on your program
    • ./run.sh /path/to/program.bc�
  • Inspect the warnings
    • ls /path/to/program.*.path

31

32 of 33

Demo!

32

33 of 33

Any questions?

33