Implementing a New CPU Architecture for Ghidra

@guedou

BeeRump

Why?

Toshiba FlashAir W-03

See https://goo.gl/oijvdN

Toshiba MeP-c4

My MeP & FlashAir Tools

cea-sec/miasm - MeP architecture in miasm

assembly, disassembly, emulation

guedou/r2m2 - miasm plugin for radare2

graphical interface, emulation, tools

guedou/flashre - tools to reverse FlashAir cards

dump, telnet, fake updates...

Missing Tool: a Decompiler

aka output C instead of assembly

Open Source Decompilers

many available

reko, snowman, r2dec, radeco, retdec...

architecture dependent

each needs a description of MeP specifics

See https://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers#Decompilers

New Open-Source RE Tool

developed by the NSA

revealed in the Vault7 leak

released in March 2019

https://github.com/NationalSecurityAgency/ghidra

many features

disassembly, graphing, scripting, extensions, decompiling...

See https://ghidra-sre.org/

See https://github.com/xyzz/ghidra-mep

SLEIGH

Ghidra Processor Specification Language

SLEIGH?

language derived from SLED

Specification Language for Encoding and Decoding

architecture independent assembler & disassembler

eases defining instruction decoding & semantics

data-flow & decompilation analysis

semantics converted to Ghidra IR (aka P-CODE)

also a command-line tool

$GHIDRA_HOME/support/sleigh

Mandatory Processor Structure

guedou/ghidra-processor-mep

$ tree Ghidra/Processors/MEP_C4/
Ghidra/Processors/MEP_C4/
├── data
│   └── languages
│       ├── mep_c4.cspec
│       ├── mep_c4.ldefs
│       ├── mep_c4.pspec
│       ├── mep_c4.sla
│       └── mep_c4.slaspec
└── Module.manifest

2 directories, 6 files


Language Definitions - mep_c4.ldefs

<language_definitions>
  <language processor="Toshiba MeP-c4"
            endian="little"
            size="32"
            variant="default"
            version="0.1"
            slafile="mep_c4.sla"
            processorspec="mep_c4.pspec"
            id="MEP_C4:LE:32:default">
    <description>Toshiba MeP-c4, little endian</description>
    <compiler name="default" spec="mep_c4.cspec" id="default"/>
  </language>
</language_definitions>
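
The .ldefs file is what ties the other files together, so a typo in one of its attributes is easy to miss. As a rough sanity check, the following Python sketch lists the files a language definition points to. The helper name `referenced_files` and the inlined XML string are illustrative assumptions, not part of Ghidra.

```python
import xml.etree.ElementTree as ET

# Copy of the language definition shown above, inlined for the example.
LDEFS = """<language_definitions>
  <language processor="Toshiba MeP-c4" endian="little" size="32"
            variant="default" version="0.1" slafile="mep_c4.sla"
            processorspec="mep_c4.pspec" id="MEP_C4:LE:32:default">
    <description>Toshiba MeP-c4, little endian</description>
    <compiler name="default" spec="mep_c4.cspec" id="default"/>
  </language>
</language_definitions>"""


def referenced_files(ldefs_xml):
    """Return the file names that a .ldefs document refers to (hypothetical helper)."""
    root = ET.fromstring(ldefs_xml)
    files = []
    for language in root.iter("language"):
        files.append(language.get("slafile"))
        files.append(language.get("processorspec"))
        for compiler in language.iter("compiler"):
            files.append(compiler.get("spec"))
    return files


print(referenced_files(LDEFS))
```

Each returned name is expected to exist next to the .ldefs file in data/languages.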


Processor Specification - mep_c4.pspec

PC, symbols (Reset, NMI handlers...)

<processor_spec>
  <programcounter register="pc"/>
</processor_spec>


Compiler Specification - mep_c4.cspec

<compiler_spec>
  <global>
    <range space="ram"/>
  </global>
  <stackpointer register="sp" space="ram"/>
  <default_proto>
    <prototype extrapop="0" stackshift="0" name="__stdcall">
      <input>
        <pentry minsize="1" maxsize="4">
          <register name="r1"/>
        </pentry>
      </input>
      <output>
        <pentry minsize="1" maxsize="4">
          <register name="r0"/>
        </pentry>
      </output>
    </prototype>
  </default_proto>
</compiler_spec>


SLEIGH Specification File - mep_c4.slaspec

compiled to mep_c4.sla with sleigh

XML version of mep_c4.slaspec with P-CODE

five important concepts

space - RAM & register address spaces

register - names & aliases

token - instruction parts (bit fields)

variables - bindings of fields to registers

instruction - token composition & semantics

Example #1

MeP-c4 16-bit MOV

# MOV Rn,Rm - 0000_nnnn_mmmm_0000

define register offset=0 size=4 [ r0 r1 ];

define token instr(16)
    major = (12, 15)
    rn    = (8, 11)
    rm    = (4, 7)
    minor = (0, 3)
;

attach variables [ rn rm ] [ r0 r1 ];

:mov rn, rm is major=0b0000 & rn & rm & minor=0b0000 {
    rn = rm;
}
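
To make the token definition concrete, here is a minimal Python sketch of the same decoding step outside Ghidra: it cuts a 16-bit word into the four fields and checks the two constant ones. The function names `field` and `decode_mov` are hypothetical, and registers are printed as plain rN without aliases.

```python
def field(word, low, high):
    """Extract bits low..high (inclusive, bit 0 is the least significant)."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


def decode_mov(word):
    """Return 'mov rN, rM' if word matches 0000_nnnn_mmmm_0000, else None."""
    major = field(word, 12, 15)
    rn = field(word, 8, 11)
    rm = field(word, 4, 7)
    minor = field(word, 0, 3)
    # Same constraints as the SLEIGH pattern: major=0b0000 & minor=0b0000.
    if major != 0b0000 or minor != 0b0000:
        return None
    return f"mov r{rn}, r{rm}"


print(decode_mov(0x0120))  # fields: major=0, rn=1, rm=2, minor=0
print(decode_mov(0x012E))  # minor=0b1110, so this is not a MOV
```

The (low, high) pairs are the same bit ranges as in the `define token` statement, which is the point of the comparison.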

Example #2

MeP-c4 32-bit MOV (variant #1)

# MOV Rn,imm16 - 1100_nnnn_0000_0001 iiii_iiii_iiii_iiii

define token instr(16)
    major  = (12, 15)
    rn     = (8, 11)
    minor8 = (0, 7)
;

# MOV sign-extends its immediate (MOVU is the zero-extending form)
define token ext(16)
    imm16 = (0, 15) signed
;

:mov rn, imm16 is major=0b1100 & rn & minor8=0b00000001 ; imm16 {
    rn = imm16;
}

Example #3

MeP-c4 32-bit MOV (variant #2)

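# MOVU Rn[0-7],imm24 - 1101_0nnn_IIII_IIII iiii_iiii_iiii_iiii

# imm8 covers the 8 low bits (IIII_IIII) of the first halfword.
# Per the encoding above, bit 11 must be 0 and the register field is
# only 3 bits wide; a complete specification must constrain that too.
define token instr(16)
    major = (12, 15)
    rn    = (8, 11)
    imm8  = (0, 7)
;

define token ext(16)
    imm16 = (0, 15)
;

:movu rn, imm24 is major=0b1101 & rn & imm8 ; imm16 [ imm24 = imm8 | (imm16 << 8); ] {
    rn = imm24;
}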
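
The way the 24-bit immediate is split across the two 16-bit tokens can be checked with a few lines of Python. This is only a sketch: the function name `decode_movu24` is hypothetical, and it assumes that both halfwords are stored little endian with the first halfword at the lower address.

```python
import struct


def decode_movu24(data):
    """Decode 4 bytes as MOVU Rn[0-7],imm24; return (rn, imm24) or None."""
    # Assumption: two little-endian halfwords, first halfword first.
    first, second = struct.unpack("<HH", data[:4])
    major = (first >> 12) & 0xF
    bit11 = (first >> 11) & 0x1
    # 1101_0nnn_IIII_IIII: major is 0b1101 and bit 11 is 0.
    if major != 0b1101 or bit11 != 0:
        return None
    rn = (first >> 8) & 0x7        # 3-bit register number, r0 to r7
    imm8 = first & 0xFF            # low 8 bits of the immediate
    imm24 = imm8 | (second << 8)   # high 16 bits come from the second token
    return rn, imm24


rn, imm24 = decode_movu24(bytes.fromhex("56d23412"))
print(f"movu r{rn}, {imm24:#x}")
```

The line computing `imm24` mirrors the expression in the disassembly action section between square brackets.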

Example #4

MeP-c4 LW

# LW Rn,(Rm) - 0000_nnnn_mmmm_1110

:lw rn, "("^rm^")" is major=0b0000 & rn & rm & minor=0b1110 {
    rn = *[ram]:4 rm;
}
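
The semantic statement `rn = *[ram]:4 rm;` reads four bytes from the ram space at the address held in rm. A minimal Python model of that statement could look like this, assuming a flat `bytes` object as memory and a dictionary as register file (both are illustrative choices, as is the function name `lw`).

```python
def lw(regs, ram, rn, rm):
    """Model of 'rn = *[ram]:4 rm': load a 4-byte little-endian word."""
    address = regs[rm]
    # The language is declared little endian in mep_c4.ldefs.
    regs[rn] = int.from_bytes(ram[address:address + 4], "little")


regs = {"r0": 0, "r1": 4}
ram = bytes.fromhex("00000000" "78563412")
lw(regs, ram, "r0", "r1")
print(hex(regs["r0"]))
```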

Ghidra/Processors/MEP_C4/data/patterns/*.xml

eases detecting function prologues & epilogues

<patternlist>
  <pattern>
    <data> 0x1a 0x70 ....0000 0x6f </data>
    <!-- 1a70 LDC R0, LP
         f06f ADD SP, -4
    -->
    <funcstart validcode="function" thunk="true"/>
  </pattern>
</patternlist>
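
The `<data>` element mixes hexadecimal bytes with binary bytes in which a dot stands for a bit that may take any value. The Python sketch below shows one way to interpret such a string as (mask, value) pairs and to test bytes against it. It is an illustration of the idea, not Ghidra's matcher; the function names are hypothetical and only single-byte tokens are handled.

```python
def parse_pattern(text):
    """Turn '0x1a 0x70 ....0000 0x6f' into a list of (mask, value) byte pairs."""
    pairs = []
    for token in text.split():
        if token.startswith("0x"):
            # Hexadecimal byte: every bit is significant.
            pairs.append((0xFF, int(token, 16)))
        else:
            # Binary byte, most significant bit first; '.' is a wildcard.
            mask = value = 0
            for char in token:
                mask <<= 1
                value <<= 1
                if char != ".":
                    mask |= 1
                    value |= int(char)
            pairs.append((mask, value))
    return pairs


def matches(pairs, data):
    """Return True if data starts with bytes matching every (mask, value) pair."""
    if len(data) < len(pairs):
        return False
    return all((byte & mask) == value for (mask, value), byte in zip(pairs, data))


PROLOGUE = parse_pattern("0x1a 0x70 ....0000 0x6f")
print(matches(PROLOGUE, bytes.fromhex("1a70f06f")))  # bytes from the XML comment
print(matches(PROLOGUE, bytes.fromhex("1a70f16f")))  # low nibble of third byte is not 0
```

With the wildcard in the high nibble of the third byte, the same pattern covers several stack adjustments and not only the `ADD SP, -4` shown in the comment.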

Perspectives

PR for ghidra-mep

add missing instructions

implement headless unit tests

automatically generate mep.sla from miasm?

convert miasm expressions to P-CODE?

References

Questions?

Beers?
