Becoming a full-stack reverse-engineer
In only 3 short years
What this is
This is a roadmap to becoming capable of working at any level of abstraction in software. It won’t make you an expert reverse-engineer, breaker of things, kernel developer, compiler developer, graphics programmer, or anything else. It will give you the foundation to become any of those things, though.
Complete this and you’ll be in the top 1% of reverse-engineers, with an easy pathway to the top 1% of security professionals.
What this isn’t
This isn’t a guide with a bunch of things to go reverse-engineer or otherwise break. In fact, almost none of this is actually directly reverse-engineering. That’s because reversing is 99.9% understanding the layers of abstraction and how they stack on top of each other, and 0.1% actually reading code.
This is not a direct path. It meanders and it runs off on tangents that seem random and unrelated to reversing. This is by design, and I believe the roadmap is better for it.
Caveats
Prerequisites
There are none!
Okay, that’s kind of a lie. If you already know C, x86 assembly, how to write compilers, how to write emulators, how JITs work, how kernels are constructed, and more, then this goes from 3 years to 20 minutes.
You really don’t need to know anything going in. If you get stuck or there’s something totally unknown to you, it’s time to learn. Anyone can go through this; it will just take longer depending on how much you need to cram for each of the things I’m going to have you work on.
Year 1
Year 1: Read “Reversing”
Start by reading the book “Reversing” by Eldad Eilam. It’s not the perfect book, but it provides a great jumping-off point. What you’ll learn:
Year 1: Learn assembly
Learn assembly for at least one architecture; x86 is a great start. This isn’t because it’s simple, uniform and sane, or the best architecture; it’s none of those things. But it will teach you everything you need to know, it’s extremely widespread, and starting with some of the ugliest ASM will make you grateful for everything else.
Year 1: Reverse a game
Games provide a lot of great lessons to reverse-engineers and this is a good place to start out.
Year 1: Read “Compilers”
Read the Dragon book -- “Compilers” by Aho et al. This isn’t a great book, and there are probably better ones out there, but I personally haven’t read them and can’t recommend any directly. This book does provide a good base level to work from, at least. What you’ll learn:
Year 1: Write a source-to-source compiler
Also known as “transpilers”, possibly the worst term in tech circles, these are a good way to get your feet wet with compiler development. This will teach you more than reading Compilers ever did.
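To give a sense of scale, here is a minimal sketch of a source-to-source compiler that turns infix arithmetic into prefix S-expressions. It leans on Python’s built-in ast module for parsing, which is cheating for the purposes of this exercise; the names OPS, to_sexpr, and transpile are made up for illustration, and your own version should bring its own lexer and parser.

```python
import ast

# Map Python AST operator node types to the symbol used in the target language.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_sexpr(node):
    """Translate a parsed arithmetic expression into a prefix S-expression."""
    if isinstance(node, ast.Expression):
        return to_sexpr(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left, right = to_sexpr(node.left), to_sexpr(node.right)
        return f"({OPS[type(node.op)]} {left} {right})"
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return repr(node.value)
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")

def transpile(source):
    """Parse one infix expression and return its S-expression form."""
    return to_sexpr(ast.parse(source, mode="eval"))

print(transpile("1 + 2 * x"))
```

The whole thing is a tree walk: parse into a tree, then print the tree in the other language’s syntax. Everything interesting in a real transpiler is in the cases where the two languages don’t line up this neatly.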
Year 1: Write an assembler
Writing an assembler isn’t actually that valuable an exercise, but it’s an easy and short one that will eventually, one day, come to be useful to you. Take the few days/weeks and just do it.
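As a rough idea of what “easy and short” means here, the sketch below is a two-pass assembler for a made-up ISA where every instruction is two bytes (opcode, then operand). The mnemonics, opcode values, and the assemble function are all assumptions invented for this example; a real architecture has variable-length encodings and far more operand forms.

```python
# Hypothetical toy ISA: every instruction is two bytes, opcode then operand.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}

def assemble(source):
    """Two-pass assembler: collect label addresses, then emit bytes."""
    lines = []
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop comments and whitespace
        if line:
            lines.append(line)

    # Pass 1: record the byte address each label refers to.
    labels, address = {}, 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = address
        else:
            address += 2

    # Pass 2: encode instructions, resolving label operands.
    out = bytearray()
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, *operands = line.split()
        operand = operands[0] if operands else "0"
        value = labels[operand] if operand in labels else int(operand, 0)
        out += bytes([OPCODES[mnemonic.upper()], value & 0xFF])
    return bytes(out)

program = """
start:
    LOAD 5      ; acc = 5
    ADD 0x10
    JMP start
    HALT
"""
print(assemble(program).hex(" "))
```

The two passes exist so that a jump can refer to a label defined later in the file; that is the one real idea in a basic assembler.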
Year 2
Year 2: Write another compiler, targeting assembly
This time, you’re going to be compiling down to assembly. This is probably the single hardest thing on this entire list, and it’s how we’re going to start year 2, because it’s just that valuable.
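The smallest possible version of this is an expression compiler that treats the CPU like a stack machine. The sketch below emits x86-64 instructions in Intel syntax for integer arithmetic; it again borrows Python’s ast module for parsing, and BINOPS, emit, and compile_expr are hypothetical names. There is no register allocation, no calling convention, and no control flow, which is exactly where the hard part of the real project lives.

```python
import ast

# Instruction used for each supported operator (rax = left, rcx = right).
BINOPS = {
    ast.Add: "add rax, rcx",
    ast.Sub: "sub rax, rcx",
    ast.Mult: "imul rax, rcx",
}

def emit(node, out):
    """Append instructions that leave the value of `node` in rax."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        out.append(f"mov rax, {node.value}")
    elif isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        emit(node.left, out)
        out.append("push rax")        # save the left operand on the stack
        emit(node.right, out)
        out.append("mov rcx, rax")    # right operand -> rcx
        out.append("pop rax")         # left operand  -> rax
        out.append(BINOPS[type(node.op)])
    else:
        raise ValueError("unsupported expression")

def compile_expr(source):
    """Compile one integer expression to a list of assembly lines."""
    out = []
    emit(ast.parse(source, mode="eval").body, out)
    out.append("ret")
    return out

print("\n".join(compile_expr("1 + 2 * 3")))
```

The output is deliberately naive: every intermediate value takes a round trip through the stack. Comparing it to what a real compiler produces for the same expression is a good first lesson in why optimization passes exist.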
Year 2: Read “Reverse Compilation Techniques”
Understanding how decompilers really work is valuable in ways I can’t even explain. This sets you up to build decompilers and deobfuscators, reverse-engineer code, and generally break everything.
Year 2: Write a bytecode decompiler
Both Android’s Dalvik and .NET’s CIL are easy to understand and work with. They also closely mirror the languages which compile to them, unlike most, making them optimal places to begin your decompiler journey.
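The core trick for stack-based bytecode can be sketched in a few lines: run the program symbolically, pushing expression strings instead of values. The instruction set below is hypothetical and only loosely in the spirit of CIL (Dalvik is register-based, so the same idea applies to registers rather than a stack); BINARY and decompile are names invented for this example.

```python
# Hypothetical stack bytecode: operands are pushed, operators pop two values
# and push one, and "store" pops the top of the stack into a named variable.
BINARY = {"add": "+", "sub": "-", "mul": "*", "div": "/"}

def decompile(code):
    """Rebuild source statements by symbolically executing the operand stack."""
    stack, statements = [], []
    for op, *args in code:
        if op == "push":              # constant operand
            stack.append(str(args[0]))
        elif op == "load":            # variable operand
            stack.append(args[0])
        elif op in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {BINARY[op]} {right})")
        elif op == "store":
            statements.append(f"{args[0]} = {stack.pop()};")
        else:
            raise ValueError(f"unknown opcode: {op}")
    return statements

code = [("load", "a"), ("push", 2), ("mul",), ("load", "b"), ("add",), ("store", "c")]
print(decompile(code))
```

This handles straight-line code only. Recovering ifs and loops from branches is where a real bytecode decompiler spends most of its effort.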
Year 2: Write a machine code decompiler
Decompiling from machine code is drastically, drastically different from decompiling from bytecode. The starting languages -- C, C++, Rust, whatever -- are totally different from the target architecture in nearly every way. This means your decompiler needs to infer a lot more about what the program is trying to do. It also means you will learn a lot more in the process.
Year 2: Read the OSDev wiki
The OSDev wiki is the absolute best resource on the web for all things kernels and operating systems. Whether it’s details of target systems, how to write bootloaders, how basic graphics and interrupts work, or anything else -- it’s there.
Year 2: Write a toy kernel
This might be the thing that is most foreign to most people coming into this. It’s so different from everything else that you’ve done so far -- in this guide or in your career -- but you’ll learn a ton in the process. It’s also just fun.
Year 2: Read the OSDev wiki
Year 2: Rewrite your toy kernel
Now that you’ve done it once, do it again but better. You know you made some horrible mistakes the first time around, and that you made it completely non-portable. Time to fix that.
Year 2: Write a microkernel
Your first kernel was almost certainly a monolithic mess, most likely without any separate address spaces, and probably didn’t even contain processes. Time to go to the exact opposite end of the spectrum.
Year 3
Year 3: Write an interpreting emulator
Emulators for game systems allow you to really put everything you’ve learned so far into practice in a big way. Let’s start simple, though.
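“Simple” here means a fetch-decode-execute loop. The sketch below interprets a made-up 8-bit accumulator machine with two-byte instructions; the opcodes and the run function are assumptions for illustration, not any real console’s CPU.

```python
def run(memory, max_steps=10_000):
    """Interpret a program for a hypothetical 8-bit accumulator machine."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        opcode, operand = memory[pc], memory[pc + 1]   # fetch
        pc += 2
        if opcode == 0x01:      # LOAD imm: acc = imm
            acc = operand
        elif opcode == 0x02:    # ADD imm: add, wrapping at 8 bits
            acc = (acc + operand) & 0xFF
        elif opcode == 0x03:    # JNZ addr: jump if acc is non-zero
            if acc != 0:
                pc = operand
        elif opcode == 0xFF:    # HALT: stop and report the accumulator
            return acc
        else:
            raise ValueError(f"illegal opcode {opcode:#04x} at {pc - 2:#06x}")
    raise RuntimeError("step limit reached")

# LOAD 3; ADD 0xFF (i.e. subtract 1); JNZ back to the ADD; HALT
countdown = bytes([0x01, 3, 0x02, 0xFF, 0x03, 2, 0xFF, 0])
print(run(countdown))
```

A real system adds memory-mapped hardware, interrupts, and timing on top of this loop, but the loop itself never gets much more complicated than this.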
Year 3: Write a recompiling emulator
Now we can sprinkle in some compiler development. Recompilers are a fun and interesting topic, which astoundingly few people work on.
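The idea in miniature: instead of decoding each guest instruction every time it runs, translate a block once and then run the translation. In the sketch below the “host code” is Python source, standing in for the native machine code a real recompiler would emit. The toy ISA and the recompile function are hypothetical, and only straight-line blocks ending in HALT are handled.

```python
def recompile(memory, start=0):
    """Translate one straight-line block of a hypothetical toy ISA into a Python function."""
    lines = ["def block(acc):"]
    pc = start
    while True:
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2
        if opcode == 0x01:      # LOAD imm
            lines.append(f"    acc = {operand}")
        elif opcode == 0x02:    # ADD imm (wraps at 8 bits)
            lines.append(f"    acc = (acc + {operand}) & 0xFF")
        elif opcode == 0xFF:    # HALT ends the block
            lines.append("    return acc")
            break
        else:
            raise ValueError(f"cannot translate opcode {opcode:#04x}")
    # Compile the generated source once; the result can be called many times.
    namespace = {}
    exec(compile("\n".join(lines), "<recompiled>", "exec"), namespace)
    return namespace["block"]

block = recompile(bytes([0x02, 10, 0x02, 250, 0xFF, 0]))
print(block(0), block(5))
```

Decoding happens once, at translation time, and the generated function contains no dispatch at all. Caching translated blocks by guest address and handling jumps between them is the natural next step.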
Year 3: Write an emulator for a black box platform
There aren’t many platforms without extensive documentation at this point, but they do still exist. The original Xbox, for instance, isn’t well-documented. In fact, the older or newer something is, the less likely it is to be well-documented or have good existing emulators; most platforms from the late 80s to the mid 2000s are already done. One option you have is to pick up a kid’s toy like a VTech handheld and attempt to pull the code from that and emulate it; these are virtually untouched.