1 of 29

NEUZZ: Efficient Fuzzing with Neural Program Smoothing

Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, and Suman Jana

Columbia University

1

2 of 29

Fuzzing: a popular way to uncover bugs

2

[Liang et al. 2019]

3 of 29

Existing fuzzers

3

  • Coverage-guided fuzzer:
  • AFL (one of the most popular fuzzer)
  • Libfuzzer (integrated in LLVM mainline codebase)
  • Many afl variants(e.g., aflfast, fair-fuzz, )
  • Dynamic taint analysis based fuzzer: Vuzzer, angora
  • Hybrid fuzzer(fuzzer + concolic execution): QSYM, Savior
  • Learning-based fuzzer: Learn && Fuzz, Neuzz, MTFuzz

4 of 29

Evolutionary Fuzzing

Advantage: easy to implement

Disadvantage: inefficient

  • Random mutations are not effective
  • Often get stuck in long sequence of wasteful mutations

4

Mutation

Hard to find scalable and adaptive heuristics for guided mutation

Seed

Children

Grandchildren

5 of 29

A new approach to fuzzing-> ML

5

6 of 29

ML models learns knowledge from training data(seen, past ) and predicts on testing data(unseen, future).

6

7 of 29

Fuzzing is actually an repeating process which executes a tested program for million/billion times.

7

8 of 29

ML needs to define a loss function, then uses training data to optimize towards specific objective.

8

9 of 29

Fuzzing: An Optimization Problem

9

a program input

# of bugs found by input

generate K inputs from input space

Maximize

is discrete and hard to optimize

Find C(X) that can maximize total no. of bugs

10 of 29

Fuzzing: An Optimization Problem

10

: # of bugs

Input

Hard to find inputs like and

among flat plateaus

11 of 29

Fuzzing: An Optimization Problem

11

a program input

edge coverage of input

generate K inputs from input space

Maximize

Find C(X) that can maximize total number of edges

12 of 29

Fuzzing: An Optimization Problem

12

Input

: # of edges

13 of 29

Evolutionary optimization

13

Input

1

2

3

4

5

Random mutation is not efficient

: # of edges

14 of 29

Gradient-guided Optimization

14

Input

: # of edges

Smooth Approximation + Gradient-guided Mutation

: smooth approximation of

15 of 29

Gradient-guided Optimization

15

: smooth approximation of

Input

Smooth Approximation + Gradient-guided Mutation

1

2

3

4

5

16 of 29

Smooth Approximation

16

Problem:

How to smoothly approximate G(x)?

Neuzz Solution:

Use a NN to learn a smooth H(x)

Universal Approximation Theorem:

A NN can approximate any continuous function

17 of 29

Gradient-guided Mutation

17

Why gradient guidance?

Gradient indicates critical parts of input

What are critical parts of the input?

Critical parts of input affect program branches

How gradient-guided mutation works?

Focus mutations on the critical parts of the input

18 of 29

Main Idea behind Neuzz

18

Input

Branching Behaviors

Program

NN

Gradient-guided mutation

Smooth

Surrogate

Input

Branching

Behaviors

19 of 29

A Peek Into NN Model

19

20 of 29

Generalization to Unseen branches

Observations:

  • Real world program inputs have critical parts
  • Most of branches are affected by the critical parts

Neuzz Solution:

  • Identify critical parts based on observed branches
  • Perform more mutations on the critical part of inputs to explore unseen branches

20

21 of 29

Design of NEUZZ

21

22 of 29

Evaluation

  • 10 real world programs
  • Lava-M and DARPA CGC datasets
  • Comparison with RNN-based fuzzers
  • Performance of different model choices

22

23 of 29

Evaluations: Edge Coverage �NEUZZ vs. state-of-the-art fuzzers�

10 real world applications for 24 hours

23

NEUZZ achieves on average 3x more edge coverage than other fuzzers

24 of 29

Evaluations: Bug Finding �NEUZZ vs. state-of-the-art fuzzers�

24

NEUZZ finds the most number of bugs and all 5 bug types including two new CVEs

25 of 29

Evaluations: Lava-M and CGC

25

NEUZZ outperforms state-of-the-art fuzzers on LAVA-M and CGC

Lava-M dataset

DARPA CGC dataset

26 of 29

Evaluations: NEUZZ vs. RNN-based Fuzzer

26

NEUZZ achieves 6x more edge coverage and 20x less training time

27 of 29

Evaluations: Effect of Different NNs

27

NEUZZ achieves best performance with

NN+Incremetal learning

Edge coverage for 1M mutations

28 of 29

Key Takeaways of NEUZZ

  • Use NN gradients to identify the critical locations of program inputs
  • Focus mutations on the critical locations
  • Minimize runtime overhead by using simple feed-forward neural networks
  • Retrain the network incrementally to find new critical locations

28

29 of 29

Github Repo

NEUZZ is available at https://github.com/Dongdongshe/neuzz

29