1 of 45

Attacking Binary Programs

Exploitation III

Gavin Lo, Tillson Galloway

Adapted from our lecture series from GreyHat at Georgia Tech

2 of 45

Exploitation

Reconnaissance

Enumeration

Exploitation

Post-Exploitation

Clean-up

How do we find potential vulnerabilities?

Once we have found vulnerabilities, how do we exploit them?

3 of 45

What is Binary?

The “language” that a computer understands
All numbers are translated to 1’s and 0’s
All programs are translated to simple instructions made by 1’s and 0’s

4 of 45

Binary

0000

0001

0010

0011

0111

0

1

2

3

7

Binary (base 2) Decimal (base 10)

5 of 45

Negative numbers in binary

Naive implementation: use only the last bit

1001

1111

0000

1000

-1

-7

0

also 0?

Problem: how do we represent 0?

6 of 45

Two’s Complement

Use the last bit, but count differently

1001

1111

0000

1000

-7

-1

0

-8

Prevents “positive/negative 0”

Also allows you to add negative numbers for the correct result!

0000

Positive Numbers

1000

Negative Numbers

Warps Around!

7 of 45

Adding Two’s Complement Numbers

11111111 (-1) 11111111 (-1)

+11111111 (-1) +0000001 (+1)

_________ _________

111111110 10000000

Truncated to: Truncated to:

11111110 (-2) 0000000 (0)

8 of 45

Integer Overflow

Some programs parse two’s complement numbers without checking whether the number is too big, allowing you to give them negative numbers as input!

For a 32-bit number: E.G. (0000 0000 0000 0000 0000 0000 0000 0000), anything between 2^31 and 2^32-1 becomes a negative number because it changes the first bit!

9 of 45

Example

char inputString[16]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

10 of 45

Before the code executes

->

char inputString[16]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

00000000000000000000000000000

inputString

0000 (0₁₀)

deposit

0011 (3₁₀)

balance

11 of 45

Example

char inputString[16]; // create string of max length 16

-> gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

110101010101010000000000000000

inputString

0000 (0₁₀)

deposit

0011 (3₁₀)

balance

Input: “6”

12 of 45

Example

char inputString[16]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

-> int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

110101010101010000000000000000

inputString

0110 (6₁₀)

deposit

0011 (3₁₀)

balance

Input: “6”

13 of 45

Example

char inputString[16]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

-> balance = balance + deposit; // update balance with deposit

110101010101010000000000000000

inputString

0110 (6₁₀)

deposit

1101 (-3₁₀)

balance

balance = 0110 + 0011

0110

0011

1101

Negative!

balance = 3 + 6

14 of 45

Attacking Binary Programs

Everything running on a computer can be traced back to binary machine code

Exploiting services running on ports can lead to remote code execution
Exploiting Windows system services and Linux SETUID programs forms the basis for privilege escalation (gaining higher privileges than we already have, like Windows SYSTEM user or Linux root)

By learning to exploit binary files, we can potentially exploit any application

Even ones written in a high level language by exploiting the language itself (see shellshock variants CVE-2014-7186 and CVE-2014-7187, two out of bounds memory accesses in Bash)!

Similar to code injection at a binary level

We inject shellcode (foreign machine code) into the program, then redirect the execution flow of the program into our shellcode.

A few weeks ago we talked about attacking web applications. You can attack web applications with these techniques too, but instead of going for the web application code itself, you would be going for the web server’s code; instead of going for XSS exploits for specific sites, you would be finding exploits in the web browser itself. You can do this because everything on a computer can be traced back to binary machine code.
Some examples of a language being exploited was the high-profile shellshock vulnerability. While most of its variants were logic errors with function parsing (arguably also binary exploits, just not the type we are discussing today), there were two variants that were out of bounds memory accesses, listed here. Shellshock was high-profile because many applications made calls to the bash shell, as it was (and still is) a widely used shell in Linux.
CVE-2014-7186 is an out-of-bounds memory access due to redirecting multiple strings into a single command. CVE-2014-7187 is an off-by-one memory access error
Attacking programs is actually very similar to code injection, just at a lower level. We inject shellcode (foreign machine code) into the program, then redirect the execution flow of the program into our shellcode.

15 of 45

The Stack

“The Stack” is how computers organize memory.
A huge array of ones and zeros
Operations

Access data at index (to get or modify)
Push to top
Pop from top

Cannot change the size of data that is in the stack
Cannot add or remove variables from the middle of the stack
On Linux, the stack grows downward (so the top of the stack�is the lowest part in terms of memory addresses)

0011

1001

1111

PUSH!

16 of 45

Memory

Program instructions

Stack

17 of 45

Registers

Registers are temporary variables on a CPU
General purpose registers

Can be used for anything

Special registers

What is the current line of machine code being executed?
Where should this function go when it returns?

RIP: address of the next instruction to be executed
RBP: address of the bottom of the stack
RSP: address of the top of the stack

18 of 45

The Stack

“The Stack” is how most computers organize memory.

We can push or pop new variables (including arrays) to and from the top of the stack. On x86_64, we use the “push” and “pop” instructions
We can modify existing variables or arrays already on the stack
We cannot change the size of arrays already on the stack
We cannot add or remove variables from the middle of the stack
On Linux, the stack grows downward (so the top of the stack�is the lowest part in terms of memory addresses)

PUSH!

19 of 45

Stack Frames

The stack is not only for variables, but also for functions!
Each function call creates a new stack frame on the front of the stack

The Stack Frame contains information about the current function

Temporary variables
Where to jump to when the function returns

Variable

Saved RIP

Saved RBP

RSP (top of stack)

RIP (register instruction pointer): address of the next instruction to be executed
RBP (register base pointer): address of the bottom (or base) of the stack
RSP (register stack pointer): address of the top of the stack

20 of 45

Memory

Program instructions

Stack

Variable

Saved RIP

Saved RBP

main()

main() {

puts(func1()); <-

}

func1() {

func2();

return “World!”;

}

func2() {

puts(“Hello!”);

}

21 of 45

Memory

Program instructions

Stack

Variable

Saved RIP

Saved RBP

Variable

Saved RIP

Saved RBP

main()

func1()

main() {

puts(func1()); <-

}

func1() {

func2(); <-

return “World!”;

}

func2() {

puts(“Hello!”);

}

22 of 45

Memory

Program instructions

Stack

Variable

Saved RIP

Saved RBP

Variable

Saved RIP

Saved RBP

main()

func1()

func2() - Stack Overflow!

main() {

puts(func1()); <-

}

func1() {

func2();

return “World!”; <-

}

func2() {

puts(“Hello!”); <-

}

Variable

Saved RIP

Saved RBP

23 of 45

Memory

Program instructions

Stack

Variable

Saved RIP

Saved RBP

Variable

Saved RIP

Saved RBP

main()

func1()

main() {

puts(func1()); <-

}

func1() {

func2(); <-

return “World!”;

}

func2() {

puts(“Hello!”);

}

24 of 45

Memory

Program instructions

Stack

Variable

Saved RIP

Saved RBP

Variable

Saved RIP

Saved RBP

main()

func1()

func2()

main() {

puts(func1()); <-

}

func1() {

func2(); <-

return “World!”;

}

func2() {

puts(“Hello!”); <-

}

Variable

Saved RIP

Saved RBP

25 of 45

Memory

Program instructions

Stack

Variable

Saved RIP

Saved RBP

Variable

Saved RIP

Saved RBP

main()

func1()

main() {

puts(func1()); <-

}

func1() {

func2();

return “World!”; <-

}

func2() {

puts(“Hello!”);

}

26 of 45

Memory

Program instructions

Stack

“World”

Variable

Saved RIP

Saved RBP

main()

main() {

puts(func1()); <-

}

func1() {

func2();

return “World!”;

}

func2() {

puts(“Hello!”);

}

27 of 45

Memory

Program instructions

Stack

“World”

Variable

Saved RIP

Saved RBP

0xc0

0xc4

0xc8

0xcc

main() {

puts(func1()); <-

}

func1() {

func2();

return “World!”;

}

func2() {

puts(“Hello!”);

}

Memory addresses

0x00

0x04

0x08

0x0c

0x08

0x0c

0xad

0xae

0xaf

0xbe

0xbc

28 of 45

Registers

In x86_64 assembly, most of our operations must be done between registers, or a register and a memory address.
x86_64 has many registers, including but not limited to:

RIP: address of the next instruction to be executed
RSP: address of the top of the stack
RBP: address of the bottom of the stack
RAX: accumulator register
RBX: base register
RCX: counter register
RDX: data register
R9-R15: general purpose registers

To access a memory address, we use the syntax “ADD RAX,[RBX]” to add the value stored in the memory address *RBX to the value of RAX.

This is called a “register dereference”, and is similar to pointers in C

29 of 45

Stack Frames

The stack is not only for variables, but also for functions!
Each function call creates a new stack frame on the front of the stack

In x86_64, RSP (stack pointer) points to the top of the stack, while RBP (base pointer) points to the bottom of the stack.
Whenever we call a function, certain saved registers are pushed onto the stack to be stored in our new stack frame, including RIP (instruction pointer) and the previous RBP (base pointer).
This allows us to restore the registers to a similar state as what it was before calling the function, and minimizes unexpected behavior from calling functions.
These saved registers form the “stack frame”

Variable

Saved RIP

Saved RBP

RSP (top of stack)

30 of 45

Arrays

Arrays are simply stored on the stack as contiguous variables. When you index an element of an array, you add the element number times the number of bytes of each element to the base of the array.
Arrays in x86_64 Linux are indexed upward, not downward like the way the stack grows.

array[1]

array[0]

array[2]

Base of Array

31 of 45

No Bounds Checking

Most compile-to-machine-code languages (C, Assembly, some C++ operators) offer no bounds checking.

Bounds checking is checking upon every array access if you are asking for an index that is out of bounds. In languages with bounds checking (like Java and Python), this will throw an exception.
The lack of bounds checking can potentially improve performance drastically; consider the two following assembly programs that sum up an array stored in RBX with length in RCX into RAX:

XOR RAX,RAX; Zero out RAX�REP ADD RAX,[RBX+RCX*8]; Add the RCXth index of RBX into RAX, decrementing�

RCX each time until it reaches zero

32 of 45

No Bounds Checking (cont)

Bounds checked program:

; Assume we have the length of the array stored in the memory address [length]�XOR RAX,RAX; Zero out RAX�.LOOP:� LEA RDX,[RBX+RCX*8]; Get address of our next thing to add� CMP RDX,0; Bounds check. Make sure it’s between 0 and length.� JL .ERROR;� CMP RDX,[length];� JGE .ERROR;� ADD RAX,[RDX]; Add the value� DEC RCX; Decrement RCX� TEST RCX,RCX; Test if RCX is 0. If not, repeat the loop.� JNZ .LOOP;

33 of 45

Buffer Overflows

A direct consequence of lack of bounds checking
Another type of overflow error

Like integer overflows! Except instead of overwriting a bit, we’re writing the stuff after a buffer in memory!

A buffer is simply a region of memory allocated to store data
The simplest type of buffer overflow is a stack-based buffer overflow
We write over the ends of the buffer into other variables on the�stack (potentially including the stack frame)

Let’s say we have an array of 16 1-byte characters in a function, and we input�a string to it…

array[7-15]

array[0-7]

Saved RIP

Saved RBP

RSP (top of stack)

34 of 45

Example

-> char inputString[8]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

RIP: address of the next instruction to be executed

****************************

inputString

0110 (6₁₀)

deposit

1101 (-3₁₀)

balance

****************************

Saved RIP register

“AAAAAAAAAAA” (len = 12)

35 of 45

Example

char inputString[8]; // create string of max length 16

-> gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

RIP: address of the next instruction to be executed

1100110011001100110011001100

inputString

0110 (6₁₀)

deposit

1101 (-3₁₀)

balance

11001100110011000000********

Saved RIP register

“AAAAAAAAAAA” (len = 12)

36 of 45

Example

char inputString[8]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

-> int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

RIP: address of the next instruction to be executed

1100110011001100110011001100

inputString

0110 (6₁₀ + 0)

deposit

1101 (-3₁₀)

balance

11001100110011000000

Saved RIP register

“AAAAAAAAAAA” (len = 12)

37 of 45

Example

char inputString[8]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

-> balance = balance + deposit; // update balance with deposit

RIP: address of the next instruction to be executed

1100110011001100110011001100

inputString

0110 (6₁₀)

deposit

1101 (-3₁₀ + 0)

balance

11001100110011000000

Saved RIP register

“AAAAAAAAAAA” (len = 12)

38 of 45

Example

11001100110011000000 -> Could be anything, but probably outside bounds of program

RIP: address of the next instruction to be executed

11001100110011001100110011001100

inputString[8]

0110 (6₁₀)

deposit

1101 (-3₁₀)

balance

11001100110011000000

RIP register

“AAAAAAAAAAA” (len = 12)

“Segmentation fault!”

39 of 45

Example

1100

inputString[0]

0110

deposit

1101

balance

1100

inputString[1]

1100

inputString[2]

1100

inputString[3]

1100

inputString[4]

1100

inputString[5]

1100

inputString[6]

1100

inputString[7]

1100

RIP

1100

char inputString[8];

int deposit

int balance

1100

40 of 45

How to exploit a buffer overflow

To overflow a buffer, just send it too much data

What is your name?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault

Once we find a buffer that we can overflow, we can inject data into its surrounding memory (specifically, wherever the RIP variable is).

echo -e ‘AAAAAAAABBBBBBBB\x46\x11\x40\x00\x00\x00’ | ./myProgram
NOTE: The bytes go in reverse order! (this is related to your CPU’s endianness)
GDB is a helpful debugger to see what memory you changed, and therefore how many bytes you need to get to RIP (see appendix slide)

41 of 45

Demo: Buffer Overflows

42 of 45

Protection Mechanisms (Stack Carnies)

A simple way to protect against buffer overflows overwriting the cached RIP is to introduce a stack canary. This makes it so that before returning from any function, we check whether a “canary” has been modified. If it has, we know that an attacker has triggered a stack overflow, and the program will simply crash.

Data

RBP

CANARY

RIP

43 of 45

Protection Mechanisms (Stack Carnies)

A simple way to protect against buffer overflows overwriting the cached RIP is to introduce a stack canary. This makes it so that before returning from any function, we check whether a “canary” has been modified. If it has, we know that an attacker has triggered a stack overflow, and the program will simply crash.
As we can see, the attacker can no longer write to the cached�RBP or RIP!

Shellcode

Pointer

Canary Modified! Program Crashes!

44 of 45

Protection Mechanisms (ASLR and DEP)

There are other ways modern computers protect against exploits like this. These include ASLR (address space layout randomization) and DEP (data execution prevention).

ASLR randomizes the top bytes of all addresses in the application whenever it is run. To get around this, an attacker must take care to only overwrite the lower bytes, and the attacker can only access addresses within the same memory range.
DEP separates address ranges into “data” or “code”, which each have a different combination of (read/write/execute) permissions. Code typically cannot be written to, but can be executed, while data cannot be executed, but can be written to. To get around this, an attacker must use ROP - return oriented programming - which consists of stitching shellcode together from library calls or functions in the program.

45 of 45

Useful Commands & Resources

Disassemble (dump assembly code of) a binary: objdump -d [filename]
Additional resources on x86 Assembly registers/instructions: https://www.cs.yale.edu/flint/cs421/papers/x86-asm/asm.html
Use gdb [filename] to debug a program

Use Intel syntax: set disassembly-flavor intel
Step over instructions: set step-mode on
To see what’s going on, layout asm
GDB tutorial: https://www.cs.cmu.edu/~gilpin/tutorial/
First, run the program
step inside a function call
info frame gives you information about the current stack frame
Run over a function onto the next line
If you know where to stop already, set a breakpoint there. (e.g. break main)