1 of 45

Attacking Binary Programs

Exploitation III

Gavin Lo, Tillson Galloway

Adapted from our lecture series from GreyHat at Georgia Tech

2 of 45

Exploitation

Reconnaissance

Enumeration

Exploitation

Post-Exploitation

Clean-up

How do we find potential vulnerabilities?

Once we have found vulnerabilities, how do we exploit them?

3 of 45

What is Binary?

  • The “language” that a computer understands
  • All numbers are translated to 1’s and 0’s
  • All programs are translated to simple instructions made by 1’s and 0’s

4 of 45

Binary

0000

0001

0010

0011

0111

0

1

2

3

7

Binary (base 2) Decimal (base 10)

5 of 45

Negative numbers in binary

Naive implementation: use only the last bit

1001

1111

0000

1000

-1

-7

0

also 0?

Problem: how do we represent 0?

6 of 45

Two’s Complement

Use the last bit, but count differently

1001

1111

0000

1000

-7

-1

0

-8

Prevents “positive/negative 0”

Also allows you to add negative numbers for the correct result!

0000

Positive Numbers

1000

Negative Numbers

Warps Around!

7 of 45

Adding Two’s Complement Numbers

11111111 (-1) 11111111 (-1)

+11111111 (-1) +0000001 (+1)

_________ _________

111111110 10000000

Truncated to: Truncated to:

11111110 (-2) 0000000 (0)

8 of 45

Integer Overflow

Some programs parse two’s complement numbers without checking whether the number is too big, allowing you to give them negative numbers as input!

For a 32-bit number: E.G. (0000 0000 0000 0000 0000 0000 0000 0000), anything between 2^31 and 2^32-1 becomes a negative number because it changes the first bit!

9 of 45

Example

char inputString[16]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

10 of 45

Before the code executes

->

char inputString[16]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

00000000000000000000000000000

inputString

0000 (010)

deposit

0011 (310)

balance

11 of 45

Example

char inputString[16]; // create string of max length 16

-> gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

110101010101010000000000000000

inputString

0000 (010)

deposit

0011 (310)

balance

Input: “6”

12 of 45

Example

char inputString[16]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

-> int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

110101010101010000000000000000

inputString

0110 (610)

deposit

0011 (310)

balance

Input: “6”

13 of 45

Example

char inputString[16]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

-> balance = balance + deposit; // update balance with deposit

110101010101010000000000000000

inputString

0110 (610)

deposit

1101 (-310)

balance

balance = 0110 + 0011

0110

  • 0011

1101

Negative!

balance = 3 + 6

14 of 45

Attacking Binary Programs

  • Everything running on a computer can be traced back to binary machine code
    • Exploiting services running on ports can lead to remote code execution
    • Exploiting Windows system services and Linux SETUID programs forms the basis for privilege escalation (gaining higher privileges than we already have, like Windows SYSTEM user or Linux root)
  • By learning to exploit binary files, we can potentially exploit any application
    • Even ones written in a high level language by exploiting the language itself (see shellshock variants CVE-2014-7186 and CVE-2014-7187, two out of bounds memory accesses in Bash)!
  • Similar to code injection at a binary level
    • We inject shellcode (foreign machine code) into the program, then redirect the execution flow of the program into our shellcode.

15 of 45

The Stack

  • “The Stack” is how computers organize memory.
  • A huge array of ones and zeros
  • Operations
    • Access data at index (to get or modify)
    • Push to top
    • Pop from top

  • Cannot change the size of data that is in the stack
  • Cannot add or remove variables from the middle of the stack
  • On Linux, the stack grows downward (so the top of the stack�is the lowest part in terms of memory addresses)

0011

1001

1111

PUSH!

16 of 45

Memory

Program instructions

Stack

17 of 45

Registers

  • Registers are temporary variables on a CPU
  • General purpose registers
    • Can be used for anything
  • Special registers
    • What is the current line of machine code being executed?
    • Where should this function go when it returns?
  • RIP: address of the next instruction to be executed
  • RBP: address of the bottom of the stack
  • RSP: address of the top of the stack

18 of 45

The Stack

  • “The Stack” is how most computers organize memory.

  • We can push or pop new variables (including arrays) to and from the top of the stack. On x86_64, we use the “push” and “pop” instructions
  • We can modify existing variables or arrays already on the stack
  • We cannot change the size of arrays already on the stack
  • We cannot add or remove variables from the middle of the stack
  • On Linux, the stack grows downward (so the top of the stack�is the lowest part in terms of memory addresses)

PUSH!

19 of 45

Stack Frames

  • The stack is not only for variables, but also for functions!
  • Each function call creates a new stack frame on the front of the stack

  • The Stack Frame contains information about the current function
    • Temporary variables
    • Where to jump to when the function returns

Variable

Variable

Saved RIP

Saved RBP

RSP (top of stack)

  • RIP (register instruction pointer): address of the next instruction to be executed
  • RBP (register base pointer): address of the bottom (or base) of the stack
  • RSP (register stack pointer): address of the top of the stack

20 of 45

Memory

Program instructions

Stack

Variable

Variable

Saved RIP

Saved RBP

main()

main() {

puts(func1()); <-

}

func1() {

func2();

return “World!”;

}

func2() {

puts(“Hello!”);

}

21 of 45

Memory

Program instructions

Stack

Variable

Variable

Saved RIP

Saved RBP

Variable

Variable

Saved RIP

Saved RBP

main()

func1()

main() {

puts(func1()); <-

}

func1() {

func2(); <-

return “World!”;

}

func2() {

puts(“Hello!”);

}

22 of 45

Memory

Program instructions

Stack

Variable

Variable

Saved RIP

Saved RBP

Variable

Variable

Saved RIP

Saved RBP

main()

func1()

func2() - Stack Overflow!

main() {

puts(func1()); <-

}

func1() {

func2();

return “World!”; <-

}

func2() {

puts(“Hello!”); <-

}

Variable

Variable

Saved RIP

Saved RBP

23 of 45

Memory

Program instructions

Stack

Variable

Variable

Saved RIP

Saved RBP

Variable

Variable

Saved RIP

Saved RBP

main()

func1()

main() {

puts(func1()); <-

}

func1() {

func2(); <-

return “World!”;

}

func2() {

puts(“Hello!”);

}

24 of 45

Memory

Program instructions

Stack

Variable

Variable

Saved RIP

Saved RBP

Variable

Variable

Saved RIP

Saved RBP

main()

func1()

func2()

main() {

puts(func1()); <-

}

func1() {

func2(); <-

return “World!”;

}

func2() {

puts(“Hello!”); <-

}

Variable

Variable

Saved RIP

Saved RBP

25 of 45

Memory

Program instructions

Stack

Variable

Variable

Saved RIP

Saved RBP

Variable

Variable

Saved RIP

Saved RBP

main()

func1()

main() {

puts(func1()); <-

}

func1() {

func2();

return “World!”; <-

}

func2() {

puts(“Hello!”);

}

26 of 45

Memory

Program instructions

Stack

“World”

Variable

Saved RIP

Saved RBP

main()

main() {

puts(func1()); <-

}

func1() {

func2();

return “World!”;

}

func2() {

puts(“Hello!”);

}

27 of 45

Memory

Program instructions

Stack

“World”

Variable

Saved RIP

Saved RBP

0xc0

0xc4

0xc8

0xcc

main() {

puts(func1()); <-

}

func1() {

func2();

return “World!”;

}

func2() {

puts(“Hello!”);

}

Memory addresses

0x00

0x04

0x08

0x0c

0x08

0x0c

0xad

0xae

0xaf

0xbe

0xbc

28 of 45

Registers

  • In x86_64 assembly, most of our operations must be done between registers, or a register and a memory address.
  • x86_64 has many registers, including but not limited to:
    • RIP: address of the next instruction to be executed
    • RSP: address of the top of the stack
    • RBP: address of the bottom of the stack
    • RAX: accumulator register
    • RBX: base register
    • RCX: counter register
    • RDX: data register
    • R9-R15: general purpose registers
  • To access a memory address, we use the syntax “ADD RAX,[RBX]” to add the value stored in the memory address *RBX to the value of RAX.
    • This is called a “register dereference”, and is similar to pointers in C

29 of 45

Stack Frames

  • The stack is not only for variables, but also for functions!
  • Each function call creates a new stack frame on the front of the stack
    • In x86_64, RSP (stack pointer) points to the top of the stack, while RBP (base pointer) points to the bottom of the stack.
    • Whenever we call a function, certain saved registers are pushed onto the stack to be stored in our new stack frame, including RIP (instruction pointer) and the previous RBP (base pointer).
    • This allows us to restore the registers to a similar state as what it was before calling the function, and minimizes unexpected behavior from calling functions.
    • These saved registers form the “stack frame”

Variable

Variable

Saved RIP

Saved RBP

RSP (top of stack)

30 of 45

Arrays

  • Arrays are simply stored on the stack as contiguous variables. When you index an element of an array, you add the element number times the number of bytes of each element to the base of the array.
  • Arrays in x86_64 Linux are indexed upward, not downward like the way the stack grows.

array[1]

array[0]

array[2]

Base of Array

31 of 45

No Bounds Checking

  • Most compile-to-machine-code languages (C, Assembly, some C++ operators) offer no bounds checking.
    • Bounds checking is checking upon every array access if you are asking for an index that is out of bounds. In languages with bounds checking (like Java and Python), this will throw an exception.
    • The lack of bounds checking can potentially improve performance drastically; consider the two following assembly programs that sum up an array stored in RBX with length in RCX into RAX:
      • XOR RAX,RAX; Zero out RAX�REP ADD RAX,[RBX+RCX*8]; Add the RCXth index of RBX into RAX, decrementing�

RCX each time until it reaches zero

32 of 45

No Bounds Checking (cont)

Bounds checked program:

; Assume we have the length of the array stored in the memory address [length]�XOR RAX,RAX; Zero out RAX�.LOOP:� LEA RDX,[RBX+RCX*8]; Get address of our next thing to add� CMP RDX,0; Bounds check. Make sure it’s between 0 and length.� JL .ERROR;� CMP RDX,[length];� JGE .ERROR;� ADD RAX,[RDX]; Add the value� DEC RCX; Decrement RCX� TEST RCX,RCX; Test if RCX is 0. If not, repeat the loop.� JNZ .LOOP;

33 of 45

Buffer Overflows

  • A direct consequence of lack of bounds checking
  • Another type of overflow error
    • Like integer overflows! Except instead of overwriting a bit, we’re writing the stuff after a buffer in memory!
  • A buffer is simply a region of memory allocated to store data
  • The simplest type of buffer overflow is a stack-based buffer overflow
  • We write over the ends of the buffer into other variables on the�stack (potentially including the stack frame)
    • Let’s say we have an array of 16 1-byte characters in a function, and we input�a string to it…

array[7-15]

array[0-7]

Saved RIP

Saved RBP

RSP (top of stack)

34 of 45

Example

-> char inputString[8]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

RIP: address of the next instruction to be executed

****************************

inputString

0110 (610)

deposit

1101 (-310)

balance

****************************

Saved RIP register

“AAAAAAAAAAA” (len = 12)

35 of 45

Example

char inputString[8]; // create string of max length 16

-> gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

RIP: address of the next instruction to be executed

1100110011001100110011001100

inputString

0110 (610)

deposit

1101 (-310)

balance

11001100110011000000********

Saved RIP register

“AAAAAAAAAAA” (len = 12)

36 of 45

Example

char inputString[8]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

-> int deposit = atoi(inputString); // convert string to integer

balance = balance + deposit; // update balance with deposit

RIP: address of the next instruction to be executed

1100110011001100110011001100

inputString

0110 (610 + 0)

deposit

1101 (-310)

balance

11001100110011000000

Saved RIP register

“AAAAAAAAAAA” (len = 12)

37 of 45

Example

char inputString[8]; // create string of max length 16

gets(inputString); // ask for user input and store in inputString

int deposit = atoi(inputString); // convert string to integer

-> balance = balance + deposit; // update balance with deposit

RIP: address of the next instruction to be executed

1100110011001100110011001100

inputString

0110 (610)

deposit

1101 (-310 + 0)

balance

11001100110011000000

Saved RIP register

“AAAAAAAAAAA” (len = 12)

38 of 45

Example

11001100110011000000 -> Could be anything, but probably outside bounds of program

RIP: address of the next instruction to be executed

11001100110011001100110011001100

inputString[8]

0110 (610)

deposit

1101 (-310)

balance

11001100110011000000

RIP register

“AAAAAAAAAAA” (len = 12)

“Segmentation fault!”

39 of 45

Example

1100

inputString[0]

0110

deposit

1101

balance

1100

inputString[1]

1100

inputString[2]

1100

inputString[3]

1100

inputString[4]

1100

inputString[5]

1100

inputString[6]

1100

inputString[7]

1100

RIP

1100

1100

char inputString[8];

int deposit

int balance

1100

40 of 45

How to exploit a buffer overflow

  • To overflow a buffer, just send it too much data
    • What is your name?
    • AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    • Segmentation fault
  • Once we find a buffer that we can overflow, we can inject data into its surrounding memory (specifically, wherever the RIP variable is).
    • echo -e ‘AAAAAAAABBBBBBBB\x46\x11\x40\x00\x00\x00’ | ./myProgram
    • NOTE: The bytes go in reverse order! (this is related to your CPU’s endianness)
    • GDB is a helpful debugger to see what memory you changed, and therefore how many bytes you need to get to RIP (see appendix slide)

41 of 45

Demo: Buffer Overflows

42 of 45

Protection Mechanisms (Stack Carnies)

  • A simple way to protect against buffer overflows overwriting the cached RIP is to introduce a stack canary. This makes it so that before returning from any function, we check whether a “canary” has been modified. If it has, we know that an attacker has triggered a stack overflow, and the program will simply crash.

Data

Data

RBP

CANARY

RIP

43 of 45

Protection Mechanisms (Stack Carnies)

  • A simple way to protect against buffer overflows overwriting the cached RIP is to introduce a stack canary. This makes it so that before returning from any function, we check whether a “canary” has been modified. If it has, we know that an attacker has triggered a stack overflow, and the program will simply crash.
  • As we can see, the attacker can no longer write to the cached�RBP or RIP!

Shellcode

Shellcode

Shellcode

Shellcode

Pointer

Canary Modified! Program Crashes!

44 of 45

Protection Mechanisms (ASLR and DEP)

  • There are other ways modern computers protect against exploits like this. These include ASLR (address space layout randomization) and DEP (data execution prevention).
    • ASLR randomizes the top bytes of all addresses in the application whenever it is run. To get around this, an attacker must take care to only overwrite the lower bytes, and the attacker can only access addresses within the same memory range.
    • DEP separates address ranges into “data” or “code”, which each have a different combination of (read/write/execute) permissions. Code typically cannot be written to, but can be executed, while data cannot be executed, but can be written to. To get around this, an attacker must use ROP - return oriented programming - which consists of stitching shellcode together from library calls or functions in the program.

45 of 45

Useful Commands & Resources

  • Disassemble (dump assembly code of) a binary: objdump -d [filename]
  • Additional resources on x86 Assembly registers/instructions: https://www.cs.yale.edu/flint/cs421/papers/x86-asm/asm.html
  • Use gdb [filename] to debug a program
    • Use Intel syntax: set disassembly-flavor intel
    • Step over instructions: set step-mode on
    • To see what’s going on, layout asm
    • GDB tutorial: https://www.cs.cmu.edu/~gilpin/tutorial/
    • First, run the program
    • step inside a function call
    • info frame gives you information about the current stack frame
    • Run over a function onto the next line
    • If you know where to stop already, set a breakpoint there. (e.g. break main)