1 of 70

Firmware Bug Hunting

With Taint Analysis

My Internship Experience

2 of 70

About me

  • CTFs
  • Binary

++ Reverse Engineering

++ Binary Exploitation

++ Tooling

3 of 70

About Akash

He will be sharing next week :D

4 of 70

Tooling // GEF

5 of 70

Tooling // Instrumentation

void foo(...)

{

// ... do something

}

void goo(...)

{

// ... do something

}

int main()

{

foo(...);

goo(...);

}

  • Take a program

6 of 70

Tooling // Instrumentation

void foo(...)

{

// ... do something

}

void goo(...)

{

// ... do something

}

int main()

{

foo(...);

goo(...);

}

void foo(...)

{

printf("Calling foo");

// ... do something

}

void goo(...)

{

printf("Calling goo");

// ... do something

}

int main()

{

foo(...);

goo(...);

}

  • Take a program
  • Add some print statements

7 of 70

Tooling // Instrumentation

  • Take a program
  • Add some print statements

Do this on compiled programs/apps that we don’t have source code for.

frida-trace -i "recv*" -i "read*" twitter

recv: Auto-generated handler: …/recv.js

...

recvfrom: Auto-generated handler: …/recvfrom.js

Started tracing 21 functions. Press Ctrl+C to stop.

39 ms recv()

112 ms recvfrom()

128 ms recvfrom()

129 ms recvfrom()

...

8098 ms recvfrom(socket=70,

buffer=0x32cc018, length=65536,

flags=0x0,

address=0xb0420bd8, address_len=16)

8 of 70

Tooling // Instrumentation

Use cases:

  • Trace function calls
  • Get code coverage
  • Get instruction statistics

9 of 70

Tooling // Symbolic Execution

  • Take a program that checks its given input

int main()

{

int n = read_int();

int x = n * 2;

int y = x + 4;

int z = y >> 3;

if (z == 1234)

puts("Correct PIN!");

}

10 of 70

Tooling // Symbolic Execution

  • Take a program that checks its given input
  • Execute it, but in a special way

int main()

{

int n = read_int();

int x = n * 2;

int y = x + 4;

int z = y >> 3;

if (z == 1234)

puts("Correct PIN!");

}

11 of 70

Tooling // Symbolic Execution

  • Take a program that checks its given input
  • Execute it, but in a special way
  • Treating an input value as symbolic (i.e. not concrete like 1, 1337, "abc")

int main()

{

int n = read_int();

int x = n * 2;

int y = x + 4;

int z = y >> 3;

if (z == 1234)

puts("Correct PIN!");

}

n: BitVec(32)

12 of 70

Tooling // Symbolic Execution

int main()

{

int n = read_int();

int x = n * 2;

int y = x + 4;

int z = y >> 3;

if (z == 1234)

puts("Correct PIN!");

}

n: BitVec(32)

x: n * 2

  • Take a program that checks its given input
  • Execute it, but in a special way
  • Treating an input value as symbolic

13 of 70

Tooling // Symbolic Execution

int main()

{

int n = read_int();

int x = n * 2;

int y = x + 4;

int z = y >> 3;

if (z == 1234)

puts("Correct PIN!");

}

n: BitVec(32)

x: n * 2

y: x + 4 // n * 2 + 4

  • Take a program that checks its given input
  • Execute it, but in a special way
  • Treating an input value as symbolic

14 of 70

Tooling // Symbolic Execution

int main()

{

int n = read_int();

int x = n * 2;

int y = x + 4;

int z = y >> 3;

if (z == 1234)

puts("Correct PIN!");

}

n: BitVec(32)

x: n * 2

y: x + 4 // n * 2 + 4

z: y << 3 // (n * 2 + 4) << 3

  • Take a program that checks its given input
  • Execute it, but in a special way
  • Treating an input value as symbolic

15 of 70

Tooling // Symbolic Execution

int main()

{

int n = read_int();

int x = n * 2;

int y = x + 4;

int z = y >> 3;

if (z == 1234)

puts("Correct PIN!");

}

n: BitVec(32)

x: n * 2

y: x + 4 // n * 2 + 4

z: y << 3 // (n * 2 + 4) << 3

check> z == 1234 // (n * 2 + 4) << 3 == 1234

  • Take a program that checks its given input
  • Execute it, but in a special way
  • Treating an input value as symbolic
  • Extract comparisons made based on symbolic values

16 of 70

Tooling // Symbolic Execution

Python 3.9.0 (default, Feb 20 2021, 21:41:45)

>>> from z3 import *

>>> n = BitVec('n', 32)

>>> x = n * 2

>>> y = x + 4

>>> z = y >> 3

n: BitVec(32)

x: n * 2

y: x + 4 // n * 2 + 4

z: y << 3 // (n * 2 + 4) << 3

check> z == 1234 // (n * 2 + 4) << 3 == 1234

  • Take a program that checks its given input
  • Execute it, but in a special way
  • Treating an input value as symbolic
  • Extract comparisons made based on symbolic values
  • Use a SMT solver to solve it

17 of 70

Tooling // Symbolic Execution

Python 3.9.0 (default, Feb 20 2021, 21:41:45)

>>> from z3 import *

>>> n = BitVec('n', 32)

>>> x = n * 2

>>> y = x + 4

>>> z = y >> 3

>>> s = Solver()

>>> s.add(z == 1234)

>>> s.check()

sat

>>> s.model()

[n = 4936]

  • Take a program that checks its given input
  • Execute it, but in a special way
  • Treating an input value as symbolic
  • Extract comparisons made based on symbolic values
  • Use a SMT solver to solve it

18 of 70

Tooling // Symbolic Execution

Many such tools out there these days

(mostly as binary analysis frameworks)

  • angr (UCSB, Team Shellphish)
  • BAP (CMU, Cylab)
  • Triton (Quarkslab)

19 of 70

Firmware Bug Hunting

With Taint Analysis

My Internship Experience

20 of 70

Target // Router

  • Download firmware from vendor website

21 of 70

Target // Router

  • Download firmware from vendor website
  • Extract using binwalk

binwalk -eM PROLiNK_WN552K1_V1.0.25_210722.bin

Scan Time: 2021-09-08 06:57:22

Target File: /tmp/PROLiNK_WN552K1_V1.0.25_210722.bin

MD5 Checksum: 70333e3d706bc121bef2a88cab4c4ae7

Signatures: 391

...

22 of 70

Target // Router

  • Download firmware from vendor website
  • Extract using binwalk
  • Find programs in squashfs-root
  • e.g. httpd, prog.cgi
  • Open in decompiler

23 of 70

Bug // Command Injection

void ping()

{

system("ping google.com");

}

  • C function system for calling shell commands

./a.out

PING google.com (74.125.24.138): 56 data bytes

64 bytes from 74.125.24.138: icmp_seq=0 ttl=106 time=4.234 ms

64 bytes from 74.125.24.138: icmp_seq=1 ttl=106 time=10.934 ms

64 bytes from 74.125.24.138: icmp_seq=2 ttl=106 time=12.126 ms

64 bytes from 74.125.24.138: icmp_seq=3 ttl=106 time=4.567 ms

24 of 70

Bug // Command Injection

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

C functions:

  • system - call shell commands
  • sprintf - create string based on format

./a.out google.com

PING google.com (74.125.24.138): 56 data bytes

64 bytes from 74.125.24.138: icmp_seq=0 ttl=106 time=4.234 ms

64 bytes from 74.125.24.138: icmp_seq=1 ttl=106 time=10.934 ms

64 bytes from 74.125.24.138: icmp_seq=2 ttl=106 time=12.126 ms

64 bytes from 74.125.24.138: icmp_seq=3 ttl=106 time=4.567 ms

system("ping google.com");

25 of 70

Bug // Command Injection

  • Run 1 command on shell

ping google.com

PING google.com (74.125.24.138): 56 data bytes

64 bytes from 74.125.24.138: icmp_seq=0 ttl=106 time=4.234 ms

64 bytes from 74.125.24.138: icmp_seq=1 ttl=106 time=10.934 ms

64 bytes from 74.125.24.138: icmp_seq=2 ttl=106 time=12.126 ms

64 bytes from 74.125.24.138: icmp_seq=3 ttl=106 time=4.567 ms

26 of 70

Bug // Command Injection

  • Run 2 commands on shell on the same line

ping google.com; whoami

PING google.com (74.125.24.138): 56 data bytes

64 bytes from 74.125.24.138: icmp_seq=0 ttl=106 time=4.234 ms

64 bytes from 74.125.24.138: icmp_seq=1 ttl=106 time=10.934 ms

64 bytes from 74.125.24.138: icmp_seq=2 ttl=106 time=12.126 ms

64 bytes from 74.125.24.138: icmp_seq=3 ttl=106 time=4.567 ms

--- google.com ping statistics ---

4 packets transmitted, 4 packets received, 0.0% packet loss

round-trip min/avg/max/stddev = 4.267/6.151/11.203/2.925 ms

daniellimws

27 of 70

Bug // Command Injection

  • Run 2 commands on shell with through a program that calls system

./a.out "google.com; whoami"

PING google.com (74.125.24.138): 56 data bytes

64 bytes from 74.125.24.138: icmp_seq=0 ttl=106 time=4.234 ms

64 bytes from 74.125.24.138: icmp_seq=1 ttl=106 time=10.934 ms

64 bytes from 74.125.24.138: icmp_seq=2 ttl=106 time=12.126 ms

64 bytes from 74.125.24.138: icmp_seq=3 ttl=106 time=4.567 ms

--- google.com ping statistics ---

4 packets transmitted, 4 packets received, 0.0% packet loss

round-trip min/avg/max/stddev = 4.267/6.151/11.203/2.925 ms

daniellimws

system("ping google.com; whoami");

28 of 70

Bug // Command Injection

This bug is somewhat common in routers.

29 of 70

Bug Hunting // Manual

  • Use decompiler (Ghidra)
  • Search for functions that call system/popen/execve
  • Check if user input is passed to system/popen/execve

  • Slow
  • Troublesome
  • Might miss out some bugs
  • So many to check

30 of 70

Bug Hunting // Automated

  • Use taint analysis to find functions where user input is passed to system/popen/execve

  • Faster (hopefully)
  • Less to check (hopefully)
  • Don’t miss bugs (hopefully)

31 of 70

Taint Analysis

  • Define sources and sinks

  • target is the source
    • controlled by user - user input
  • system is the sink
    • want user input to reach here

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

32 of 70

Taint Analysis

  • Define sources and sinks
  • Check if sink is tainted by source

  • Mark source as tainted
  • Pass the taint if value is copied to another variable/buffer
  • Check if argument to sink holds the taint

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

33 of 70

Approach

  • Use angr for binary analysis

Considerations:

  • Supports ARM and MIPS
  • Has built-in symbolic execution support
  • Actively maintained
  • Python (not completely a good thing)

34 of 70

Approach // Simulation

  • Routers normally run on ARM or MIPS
  • Run program to track taints
  • Need to simulate the program instead of actually running it

35 of 70

Approach // Simulation

  • Define variables/buffers/arrays as symbolic values

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

target: BitVec(1024*8) // 1024 bytes

command: BitVec(256*8) // 1024 bytes

36 of 70

Approach // Simulation

  • Define variables/buffers/arrays as symbolic values
  • Put taint on user input

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

target: BitVec(1024*8) // 1024 bytes

command: BitVec(256*8) // 1024 bytes

target <- taint

37 of 70

Approach // Simulation

Go through the program

  • When we reach a copying function like sprintf
  • Pass the taint from the source string to the destination string

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

target: BitVec(1024*8) // 1024 bytes

command: BitVec(256*8) // 1024 bytes

target <- taint

command <- taint

38 of 70

Approach // Simulation

Go through the program

  • When we reach a sink
  • Check if the sink argument is tainted

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

target: BitVec(1024*8) // 1024 bytes

command: BitVec(256*8) // 1024 bytes

target <- taint

command <- taint

check> is command tainted?

39 of 70

Approach // Simulation

If there is if, which path to consider?

void some_handler(char* target)

{

if (...)

// do something

else

// do something else

}

40 of 70

Approach // Simulation

If there is if, which path to consider?

  • Consider all possible code paths
  • Using states
    • Storing information about memory, registers, program state

void some_handler(char* target)

{

...

if (...)

// do something

else

// do something else

}

if

else

41 of 70

Approach // Simulation

How about loops?

  • Still consider all possible branches
    • Either stay in loop, or leave the loop

void some_handler(char* target)

{

...

for (...)

{

// do something

}

}

stay

leave

42 of 70

Approach // Simulation

How about loops?

  • The state that stays needs to branch again

void some_handler(char* target)

{

...

for (...)

{

// do something

}

}

stayed

leave

stay

43 of 70

Approach // Simulation

How about loops?

  • The state that stays needs to branch again, and again

void some_handler(char* target)

{

...

for (...)

{

// do something

}

}

stayed

leave

stay

44 of 70

Approach // Simulation

How about loops?

  • The state that stays needs to branch again, and again, and again

void some_handler(char* target)

{

...

for (...)

{

// do something

}

}

stayed

leave

stay

45 of 70

Approach // Simulation

How about loops?

  • State explosion
  • More states than my brain cells

void some_handler(char* target)

{

...

for (...)

{

// do something

}

}

stayed

leave

stay

46 of 70

Approach // Simulation

How about loops?

  • State explosion
  • More states than my brain cells

However, angr does follow some strategies to prioritize certain states

void some_handler(char* target)

{

...

for (...)

{

// do something

}

}

stayed

leave

stay

47 of 70

Approach // Simulation

Implementation

  • Analyse each function individually
    • Do taint analysis on all states
  • Because of state explosion, it might take forever to analyse a function
    • Set a 2 min timeout

48 of 70

Approach // Simulation

Results

Tested on some MIPS router firmware

  • For functions that have command injection bug, analysis ends very quickly 👍
  • For functions that don't have the bug, run the full 2 mins until timeout 👎

49 of 70

Approach // Simulation

Results

Tested on some MIPS router firmware

  • For functions that have command injection bug, analysis ends very quickly 👍
  • For functions that don't have the bug, run the full 2 mins until timeout 👎

  • prog.cgi has 100+ functions
  • Most functions take 2 mins to finish analysis
  • 3 hours to analyse

50 of 70

Approach // Simulation

Results

Tested on some MIPS router firmware

  • For functions that have command injection bug, analysis ends very quickly 👍
  • For functions that don't have the bug, run the full 2 mins until timeout 👎

  • prog.cgi has 100+ functions
  • Most functions take 2 mins to finish analysis
  • 3 hours to analyse
  • Bonus: angr has an unknown memory leak issue, so after 2 hours, 20GB of RAM gone

51 of 70

Approach // Feel sad

5 stages of grief

  • Denial
  • Anger
  • Bargaining
  • Depression
  • Acceptance

52 of 70

Approach // More research

After doing some more research on the Internet

  • https://github.com/angr/angr/issues/159

53 of 70

Approach // More research

And found some very helpful resources

54 of 70

Approach // Reaching Definitions

def-use/use-def graph of a function

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

55 of 70

Approach // Reaching Definitions

def-use/use-def graph of a function

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

ping:target

56 of 70

Approach // Reaching Definitions

def-use/use-def graph of a function

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

sprintf:target

ping:target

sprintf:format

("ping %s")

57 of 70

Approach // Reaching Definitions

def-use/use-def graph of a function

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

sprintf:target

ping:target

sprintf:format

("ping %s")

sprintf:command

58 of 70

Approach // Reaching Definitions

def-use/use-def graph of a function

void ping(char* target)

{

char command[256];

sprintf(command, "ping %s", target);

system(command);

}

sprintf:target

ping:target

sprintf:format

("ping %s")

sprintf:command

system:command

59 of 70

Approach // Reaching Definitions

Thanks to heavy-lifting done by angr

A program analysis problem becomes a graph traversal problem

Source flows to sink

sprintf:target

ping:target

sprintf:format

("ping %s")

sprintf:command

system:command

60 of 70

Approach // Reaching Definitions

Implementation

  • Analyse each function individually
    • Do taint analysis on all states
    • Same as before
  • No more state explosion

61 of 70

Approach // Reaching Definitions

Results

Tested on same MIPS firmware as before

  • Analysis on all 100+ functions ended in ~2mins 👍
  • Just 1-2 false positives

62 of 70

Approach // Reaching Definitions

Results

Tested on DIR-878 prog.cgi (2019-8312 to CVE-2019-8319)

  • After some tweaking, managed to find all the bugs above

63 of 70

Approach // Reaching Definitions

Results

Tested on D-Link DIR-878 prog.cgi (2019-8312 to CVE-2019-8319)

  • After some tweaking, managed to find all the bugs above

Same firmware is used on D-Link DIR-1960

  • And also managed to find 4 more authenticated command injection bugs
  • With 150 false positives... Took a long time to triage all

64 of 70

Approach // Reaching Definitions

Results

Tested on PROLiNK PRC2402M

  • Found >10 unauthenticated command injection bugs, exposed to WAN
  • Filed for CVE-2021-35400 to CVE-2021-35409

65 of 70

Approach // Reaching Definitions

Results

Tested on D-Link DIR-X1560

  • Found 3 authenticated command injection bugs, exposed to WAN
  • Filing CVEs soon

66 of 70

Demo

67 of 70

Takeaway

  • Modules that were relevant:
    • CS2107 Introduction to Information Security
    • CS2040C Data Structures & Algorithms
    • CS2113T Software Engineering

68 of 70

Thanks!

Questions?

69 of 70

References/Further readings

70 of 70

Credits