3 of 47

Dynamic tools: userspace

AddressSanitizer, ThreadSanitizer, MemorySanitizer, ...

C/C++/Go bug detectors
Detect wide range of bug types (use-after-frees and out-of-bounds, data races and deadlocks, uninitialized memory uses, undefined behaviors)
Built into Clang, GCC and Go compilers

libFuzzer

In-process coverage-guided fuzzing engine for C/C++

oss-fuzz

Continuous fuzzing service for open source software

4 of 47

Agenda

Introduction to dynamic analysis and fuzzing
Linux kernel overview
Kernel bug detectors: Sanitizers
Kernel fuzzing: syzkaller and syzbot

5 of 47

Introduction

6 of 47

Static and dynamic analysis

Static analysis "is the analysis of computer software that is performed without actually executing programs". Essentially reading the source code (or the disassembly) manually or processing it using some tools.

Dynamic analysis "is the testing and evaluation of an application during runtime". Essentially running the program and checking whether it crashes or inspecting it during runtime.

7 of 47

Dynamic analysis: why?

Context:

Huge codebases
Limited human resources

Advantages:

No false positives
Undeniable proofs
Works on any code (pointers, unions, arrays, indirect calls, etc.)

Downsides:

See next slide...

8 of 47

Dynamic analysis

To successfully perform dynamic analysis we need:

A way to detect bugs when they happen
Some bugs don't manifest in any obvious way (e.g. OOB read)
Examples of bug detectors: valgrind, ASan

A way to execute different parts of the program
The more code we cover, the more the chance to hit the code that contains a bug
Example approaches: run unit / regression / … tests (if they exist), fuzzing

(A way to run and debug the application)

9 of 47

Fuzzing

Idea: let's feed randomly generated inputs to our program

Dumb approach: randomly generate binary blobs

Smarter approaches:

Grammar-based fuzzing: generate structured inputs (assuming we know the structure of the inputs) e.g. based on grammar/templates
Coveraged-guided fuzzing: trace the execution of the program on each of the tested inputs and save the ones that trigger new paths within the program to corpus
Mutation-based fuzzing: generate new inputs by modifying (mutating) existing ones (assuming we have a corpus of inputs)

10 of 47

Grammar-based fuzzing

Fuzzer uses explicit input grammar to generate complex realistic inputs:

HTTP-message = Simple-Request | Simple-Response | Full-Request | Full-Response

Simple-Request = "GET" SP Request-URI CRLF

Full-Request = Request-Line *( General-Header | Request-Header | Entity-Header ) CRLF

[ Entity-Body ]

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Request-URI = "*" | absoluteURI | abs_path

absoluteURI = scheme ":" *( uchar | reserved )

HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT

http_URL = "http:" "//" host [ ":" port ] [ abs_path ]

...

11 of 47

Coverage-guided fuzzing

start with some initial corpus of inputs (potentially empty)

for (;;) {

choose an input form the corpus and mutate it

execute and collect code coverage

if (input gives new coverage)

add it to corpus

}

Advantages:

Turns exponential problem into linear (more or less)
Inputs are reproducers
Corpus is suitable for regression testing

12 of 47

Target:

Linux kernel

13 of 47

Linux kernel

20 million lines of code
Written in an unsafe language: C
Almost no unit/regression tests

"Regression testing"? What's that? If it compiles, it is good; if it boots up, it is perfect. - Linus Torvalds

A lot of undocumented interfaces

14 of 47

Kernel inputs

Kernel

Userspace

syscalls

network

...

USB

15 of 47

Linux kernel fuzzing

Goal: grammar-based coverage-guided fuzzing

Need a way to collect coverage
Need descriptions of all syscalls

Need bug detectors

16 of 47

Bug detectors:

Sanitizers

17 of 47

Kernel Sanitizers

KASAN (use-after-frees and out-of-bounds)

CONFIG_KASAN available upstream since 4.0

KTSAN (data-races and deadlocks)

prototype available at https://github.com/google/ktsan

KMSAN (uninitialized-memory-use)

prototype available at https://github.com/google/kmsan

KHWASAN (similar to KASAN, requires less memory)

prototype available at https://github.com/xairy/kasan-prototype/tree/khwasan

18 of 47

KASAN report

BUG: KASAN: use-after-free in sctp_id2assoc+0x3a3/0x3b0 net/sctp/socket.c:224

Read of size 8 at addr ffff8800385b6838 by task syz-executor2/5889

Call Trace:

sctp_id2assoc+0x3a3/0x3b0 net/sctp/socket.c:224

sctp_setsockopt_rtoinfo net/sctp/socket.c:2931 [inline]

sctp_setsockopt+0x4b7/0x60f0 net/sctp/socket.c:4018

sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2850

SYSC_setsockopt net/socket.c:1798 [inline]

SyS_setsockopt+0x270/0x3a0 net/socket.c:1777

entry_SYSCALL_64_fastpath+0x1f/0xbe

19 of 47

KASAN report

Allocated by task 5841:

kzalloc include/linux/slab.h:663 [inline]

sctp_association_new+0x114/0x2180 net/sctp/associola.c:309

...

SyS_connect+0x24/0x30 net/socket.c:1569

entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 5882:

kfree+0xe8/0x2b0 mm/slub.c:3882

sctp_association_destroy net/sctp/associola.c:435 [inline]

...

syscall_return_slowpath+0x3ba/0x410 arch/x86/entry/common.c:263

entry_SYSCALL_64_fastpath+0xbc/0xbe

20 of 47

Fuzzer:

syzkaller

21 of 47

Existing system call fuzzers

Trinity (and others) in essence:

while (true) syscall(rand(), rand(), rand());

Knows argument types, so more like:

while (true) syscall(rand(), rand_fd(), rand_addr());

Tend to find shallow bugs
Frequently no reproducers

22 of 47

syzkaller

Coverage-guided
Grammar-based
Unsupervised
Multi-

OS (Linux, *BSD, Fuchsia, …)
arch (x86-64, arm64, …)
machine (QEMU, Android phones, ...)

Can generate C reproducers for found bugs

23 of 47

syzkaller

As of now found over 2500 bugs (https://github.com/google/syzkaller/wiki/Found-Bugs + https://syzkaller.appspot.com/ + internal)
Numerous CVEs
At least 4 public local privilege escalation exploits (CVE-2017-7308, CVE-2017-6074, CVE-2017-2636, CVE-2017-1000112)

24 of 47

Coverage for the Linux kernel

Available upstream with CONFIG_KCOV
GCC pass that inserts a function call into every basic block
kernel debugfs extension that collects and exposes coverage per-thread

__fuzz_coverage(); // 1

if (...) {

__fuzz_coverage(); // 2

...

}

__fuzz_coverage(); // 3

if (...) {

...

}

25 of 47

Syscall descriptions

Declarative description of all syscalls:

open(file filename, flags flags[open_flags],

mode flags[open_mode]) fd

read(fd fd, buf buffer[out], count len[buf])

close(fd fd)

open_flags = O_RDONLY, O_WRONLY, O_RDWR, O_APPEND ...

26 of 47

Programs

The description allows to generate and mutate "programs" in the following form:

mmap(&(0x7f0000000000), (0x1000), 0x3, 0x32, -1, 0)

r0 = open(&(0x7f0000000000)="./file0", 0x3, 0x9)

read(r0, &(0x7f0000000000), 42)

close(r0)

27 of 47

Algorithm

Start with an empty corpus of programs

Generate a new program, or choose an existing program from corpus and mutate it

Run the program, collect coverage

If new code is covered, minimize the program and add to the corpus

Goto 1

28 of 47

External kernel inputs

Kernel

Userspace

syscalls

network

...

USB

TUN/TAP

Gadget API

29 of 47

Test machines

Currently supports:

QEMU
GCE (Google Compute Engine)
Android devices (with serial cable/Suzy-Q)
ODROID C2 boards
Standalone machines (no reboot)

The list is extensible, to support a new type need to:

Reboot or recreate the machine (to repair after a crash)
Copy and run binaries (think of scp/ssh)
Get console output (to grep for crashes)

31 of 47

syzkaller:

automation

32 of 47

Ideally...

FUZZER

Resources

PoCs

33 of 47

Reality

Operation of a typical kernel fuzzer:

Manually create a bunch of VMs
Manually copy and start the binary
Manually monitor console output
Manually deduplicate crashes
Manually localize and reproduce
Manually restart the crashed VM (or press power button!)

Works only if you want to find a handful of crashes. Otherwise - full time job.

34 of 47

syzkaller operation

syz-manager

Resources

PoCs

test machines

corpus/programs

console output

manages test machines
monitors console output
deduplicates crashes
localizes bugs
creates PoCs

*but may incur higher up-front setup cost

35 of 47

Further automation

syz-ci

Continuous kernel/syzkaller updates
Restarts syz-manager on new kernel/syzkaller

syz-hub

Input/reproducer exchange between several syz-manager's

syzbot

Bug aggregation
Reporting
Status tracking

37 of 47

Exploitation

The exploits that I wrote:

https://github.com/xairy/kernel-exploits

Some materials regarding Linux kernel exploitation:

https://github.com/xairy/linux-kernel-exploitation

38 of 47

Contribution

Contribution to syzkaller

Syscall descriptions
Fixing bugs reported by syzbot

Other fun stuff to do

Find some exploitable bug and write an exploit

39 of 47

Questions?

https://github.com/google/syzkaller

https://syzkaller.appspot.com/

Andrey Konovalov <andreyknvl@google.com>

syzkaller@googlegroups.com

41 of 47

Other Linux kernel fuzzers

https://github.com/oracle/kernel-fuzzing

https://github.com/nccgroup/TriforceLinuxSyscallFuzzer

http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/

https://github.com/schumilo/vUSBf

42 of 47

Process structure

43 of 47

Syscall descriptions:

discriminated syscalls

socket$inet_tcp(

domain const[AF_INET],

type const[SOCK_STREAM],

proto const[0]) sock_tcp

socket$packet(

domain const[AF_PACKET],

type flags[packet_socket_type],

proto const[ETH_P_ALL_BE]) sock_packet

44 of 47

Syscall descriptions:

struct layout

sockaddr_ll {

sll_family const[AF_PACKET, int16]

sll_protocol flags[packet_protocols, int16be]

sll_ifindex ifindex

sll_hatype const[ARPHRD_ETHER, int16]

sll_pkttype int8

sll_halen const[6, int8]

sll_addr mac_addr

pad array[const[0, int8], 2]

}

bind$packet(fd sock_packet, addr ptr[in, sockaddr_ll], addrlen len[addr])

45 of 47

Syscall descriptions:

resources

resource sock[fd]

socket(domain flags[socket_domain], type flags[socket_type],

proto int8) sock

bind(fd sock, addr ptr[in, sockaddr_storage], addrlen len[addr])

resource sock_packet[sock]

socket$packet(domain const[AF_PACKET], type flags[packet_socket_type], proto const[ETH_P_ALL_BE]) sock_packet

bind$packet(fd sock_packet, addr ptr[in, sockaddr_ll], addrlen len[addr])

46 of 47

External Stimulus

Systems calls and external stimulus in the same program:

listen(r0)

emit_ethernet(syn)

emit_ethernet(ack)

r1 = accept(r0)

emit_ethernet(data)

read(r1)

emit_ethernet(rst)

Work in progress; also applicable to USB, ...

47 of 47

Finding 0days in the Linux kernel

Setup syzkaller

Write syscall descriptions for some previously untested kernel interface

Specify syscalls required to interact with this interface in the configuration file

Run syzkaller and discover bugs

1 of 47

2 of 47

3 of 47

4 of 47

5 of 47

6 of 47

7 of 47

8 of 47

9 of 47

10 of 47

11 of 47

12 of 47

13 of 47

14 of 47

15 of 47

16 of 47

17 of 47

18 of 47

19 of 47

20 of 47

21 of 47

22 of 47

23 of 47

24 of 47

25 of 47

26 of 47

27 of 47

28 of 47

29 of 47

30 of 47

31 of 47

32 of 47

33 of 47

34 of 47

35 of 47

36 of 47

37 of 47

38 of 47

39 of 47

40 of 47

41 of 47

42 of 47

43 of 47

44 of 47

45 of 47

46 of 47

47 of 47