1 of 38

Rise of the HaCRS

Augmenting Autonomous Cyber Reasoning Systems with Human Assistance

Yan Shoshitaishvili, Michael Weissbacher, Lukas Dresel, Christopher Salls, Ruoyu "Fish" Wang, Christopher Kruegel, Giovanni Vigna

7 of 38

Implications of "Cyber Autonomy"

Detections must be directly actionable by automation.

This mandates dynamic analysis for bug hunting.

Cyber Reasoning Systems run up against the dynamic coverage problem.

9 of 38

[Figure: Semantic Reasoning Capability plotted against Scalability]

13 of 38

if (input[0] == MAGIC_NUMBER) { ... }

if (strcmp(username, "backdoor_user") == 0) { ... }

if (x == y * 1337 - 50) { ... }
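The three conditions above appear to illustrate checks that blind random input generation rarely satisfies, but that fall out directly once the branch condition is solved as a constraint. A minimal Python sketch of that contrast, under stated assumptions: `check`, `random_search`, and `solve_for_x` are hypothetical names, the 32-bit input width is assumed, and the hand-derived solution only stands in for what a real symbolic execution engine would compute.

```python
import random


def check(x: int, y: int) -> bool:
    """Mirror of the slide's third condition: x == y * 1337 - 50."""
    return x == y * 1337 - 50


def random_search(trials: int, seed: int = 0) -> int:
    """Stand-in for blind fuzzing: draw random 32-bit pairs and count hits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.getrandbits(32)  # assumed input width, not from the slides
        y = rng.getrandbits(32)
        if check(x, y):
            hits += 1
    return hits


def solve_for_x(y: int) -> int:
    """Stand-in for constraint solving: derive x from the branch condition."""
    return y * 1337 - 50


if __name__ == "__main__":
    print("random hits:", random_search(10_000))
    y = 7
    x = solve_for_x(y)
    print("solved pair:", (x, y), "passes:", check(x, y))
```

Random pairs essentially never satisfy the equality, while the solved pair passes on the first try. Real fuzzers use coverage feedback rather than pure random draws, so this overstates the gap somewhat, but equality checks against computed values remain a well-known weak spot for them.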

18 of 38

if (expression_parsed) { ... }

if (game_won) { ... }

if (turing_test()) { ... }

...

19 of 38

[Figure: Semantic Reasoning Capability plotted against Scalability, with a "?" on the plot]

22 of 38

HaCRS

Autonomous

Non-autonomous

25 of 38

Rise of the HaCRS

26 of 38

HaCRS Interface

  • Program Description
  • Tasklet Directions

Example Interactions

1 2 3 4 5 6 7

PAPER> PAPER

TIE

ROCK> SCISSORS

YOU LOSE

Feedback

Score: 223/1225

MINIMUM GOAL MET!

Bonuses:

- 10 more functions

- Output "INVALID"

✔ Output "YOU WIN!!!"

✔ Output "EASTEREGG!!"

Terminal

PAPER> 0000

EASTER EGG!!!

PAPER> SCISSORS

YOU WIN!!!

Static Analysis

Suggestions

Educated Guesses:

  • ROCK
  • SCISSORS
  • LIZARD
  • SPOCK

Brute Force:

  • ~~~!@
  • 0000

SUBMIT

GIVE UP

31 of 38

Experimental Setup

85 programs from the DARPA Cyber Grand Challenge.

183 non-experts, 5 semi-experts.

$1,100.

Unassisted and assisted approaches, 8 hours each.

32 of 38

Usefulness of human assistance varies by semantic complexity.

Semantic Complexity   Expertise Required   Fuzzing   Drilling   HaCRS
High                  Low                  12        14         23
High                  High                 14        17         28
Low                   Low                  1         2          2
Low                   High                 1         3          3
Total                                      28        36         56
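The breakdown above can be checked with a few lines of arithmetic. The sketch below assumes the cells are crash counts (the slide does not label them, though the totals of 28, 36, and 56 match the Crashes figures reported elsewhere in the deck); `ROWS`, `column_totals`, and `gain_by_complexity` are hypothetical names introduced for illustration.

```python
# Rows copied from the slide: (semantic complexity, expertise required,
# Fuzzing, Drilling, HaCRS). Reading the cells as crash counts is an assumption.
ROWS = [
    ("High", "Low", 12, 14, 23),
    ("High", "High", 14, 17, 28),
    ("Low", "Low", 1, 2, 2),
    ("Low", "High", 1, 3, 3),
]


def column_totals(rows):
    """Sum the Fuzzing, Drilling, and HaCRS columns."""
    return tuple(sum(row[i] for row in rows) for i in (2, 3, 4))


def gain_by_complexity(rows):
    """HaCRS minus Drilling, grouped by the semantic complexity column."""
    gains = {}
    for complexity, _expertise, _fuzzing, drilling, hacrs in rows:
        gains[complexity] = gains.get(complexity, 0) + (hacrs - drilling)
    return gains


if __name__ == "__main__":
    print("totals:", column_totals(ROWS))
    print("gain over Drilling:", gain_by_complexity(ROWS))
```

Under that reading, the per-row counts sum to the stated totals, and all 20 additional results over Drilling fall in the high-semantic-complexity rows, which is consistent with the slide's headline.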

33 of 38

           Test Cases   Code Coverage (%)   Crashes
Fuzzing    361          42.87               28
Drilling   649          44.91               36
HaCRS      437          53.45               56

34 of 38

Test Cases Discovered

35 of 38

Code Coverage

36 of 38

Experimental Results

This is effective!

  • 8.54 percentage points more absolute coverage, a 19.01% relative improvement (median).
  • 20 new crashes (36 vs 56, a 55.56% improvement).
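The headline figures can be reproduced from the Drilling and HaCRS numbers reported in the deck. This is a minimal sketch: the helper names are hypothetical, and it assumes the improvements are measured against Drilling, which is what the "36 vs 56" comparison suggests.

```python
# Figures copied from the deck's results table.
COVERAGE = {"Fuzzing": 42.87, "Drilling": 44.91, "HaCRS": 53.45}  # percent
CRASHES = {"Fuzzing": 28, "Drilling": 36, "HaCRS": 56}


def absolute_gain(new, old):
    """Plain difference between the assisted and unassisted figures."""
    return new - old


def relative_gain_pct(new, old):
    """Difference expressed as a percentage of the unassisted figure."""
    return (new - old) / old * 100


cov_abs = absolute_gain(COVERAGE["HaCRS"], COVERAGE["Drilling"])
cov_rel = relative_gain_pct(COVERAGE["HaCRS"], COVERAGE["Drilling"])
crash_abs = absolute_gain(CRASHES["HaCRS"], CRASHES["Drilling"])
crash_rel = relative_gain_pct(CRASHES["HaCRS"], CRASHES["Drilling"])

if __name__ == "__main__":
    print(f"coverage: +{cov_abs:.2f} points, +{cov_rel:.2f}% relative")
    print(f"crashes:  +{crash_abs}, +{crash_rel:.2f}% relative")
```

The crash figures match the slide exactly. The relative coverage gain computed from the rounded table values comes out a hair above the slide's 19.01%, which may reflect truncation or unrounded medians in the original calculation.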

Unskilled users significantly improved CRS effectiveness.

Semi-experts did not do significantly better than non-experts (interface issue?).

37 of 38

Next steps

Incentive structures.

Expertise utilization.

Other applications:

  • Static detection triage
  • Test-case selection
  • Patch verification
  • Exploit assistance
  • High-level planning

38 of 38

Questions?

Yan Shoshitaishvili - yans@asu.edu

Michael Weissbacher - mw@ccs.neu.edu

Lukas Dresel - lukas.dresel@cs.ucsb.edu

Chris Salls - salls@cs.ucsb.edu

Ruoyu "Fish" Wang - fish@cs.ucsb.edu

Chris Kruegel - chris@cs.ucsb.edu

Giovanni Vigna - vigna@cs.ucsb.edu

Team @Shellphish - team@shellphish.net

This presentation: https://goo.gl/n43PX7

Project materials: https://hacrs.org

Join us on Slack: http://angr.io/invite.html