1 of 19

Decoding the EVM

A new debugging format for smart contracts

2 of 19

Summary

  • There are no good debuggers for EVM languages*
  • Building a debugger for an EVM language today is ridiculously expensive

… and it still won’t even be good

  • If compilers provided the right information, this problem would be solved
  • Figuring out what information to provide is hard
  • Solidity and the EF now sponsor the ethdebug/format working group
  • Our aim: publish a standard for compilers to use that would simplify creating/maintaining robust debuggers

*Remix/Tenderly: please hold your glares until the end of the talk 🙂

3 of 19

About me

  • Recently became Ethdebug Working Group Lead
  • Informally serving this effort since October 2022

(mostly meetings and light project management)

  • I was the lead of Truffle for the last 5+ years (requiescat in pace 🪦🫡)

Notably, I architected @truffle/debugger and @truffle/codec

  • Long history of loudly+annoyingly insisting we need a debugging data format (to anyone who would listen), until the Solidity team started insisting back 😁🙏

g. nick // gnidan.eth

4 of 19

Talk agenda

  1. How to make a debugger today and why it won’t be good
  2. What’s a debugging data format?
  3. Working group progress and approach
  4. Call for participation

5 of 19

How to make a debugger today

6 of 19

Solidity debugging today

  • Source maps relate EVM instructions to high-level source ranges
  • ASTs relate source ranges to syntactic and semantic components
  • Transaction trace reports EVM machine state at each instruction step
  • Internal debugger logic steps through the trace, looks up the source range, finds the AST node, then keeps track of a bunch of state on its own
  • Storage layouts list state variables and their types, but not the actual slot info
  • Using lots of state and overzealous guessing, a debugger can figure out what function it’s in, what variables are in scope, their values, etc.
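
To make the first bullet concrete: solc emits source maps in a compressed form, where entries are separated by `;`, fields by `:` (start, length, file index, jump type, modifier depth), and an empty or missing field repeats the value from the previous entry. A minimal sketch of expanding that form follows; the names `SourceRange` and `decompress_source_map` are made up for illustration and are not taken from any existing tool.

```python
from typing import NamedTuple


class SourceRange(NamedTuple):
    start: int           # byte offset into the source file
    length: int          # length of the range in bytes
    file_index: int      # index into the compiler's list of sources
    jump: str            # "i" (into a function), "o" (out of one), or "-"
    modifier_depth: int


def decompress_source_map(source_map: str) -> list[SourceRange]:
    """Expand a compressed source map into one entry per instruction."""
    entries: list[SourceRange] = []
    # Assumed defaults for fields that are never given explicitly.
    previous = ["0", "0", "0", "-", "0"]
    for raw_entry in source_map.split(";"):
        fields = raw_entry.split(":") if raw_entry else []
        current = list(previous)
        for position, value in enumerate(fields[:5]):
            if value != "":
                current[position] = value  # empty fields inherit the old value
        entries.append(
            SourceRange(
                int(current[0]),
                int(current[1]),
                int(current[2]),
                current[3],
                int(current[4]),
            )
        )
        previous = current
    return entries


ranges = decompress_source_map("1:2:1;:9;2:1:2;;")
```

Here `ranges` holds five entries, one per instruction, even though only three were spelled out in the input.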

7 of 19

What we can know vs. what we can’t

We need to understand Solidity internals for:

  • What function we’re in
  • The rest of the function call stack
  • List of variables in scope
  • Where local variables are on the stack
  • How memory is allocated
  • Storage slot assignments
  • Gas cost of complex operations
  • Visualizing state transitions
  • Decoding function parameters
  • Complex type information in events
  • Contract upgrade mechanisms
  • Inter-contract execution flow
  • Exception handling

With solc and debug_traceTransaction, we can be sure about:

  • Static list of state variables + their types
  • Raw EVM state at each instruction step
  • Corresponding source range
  • Corresponding AST node
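
Getting from "raw EVM state at each instruction step" to "corresponding source range" already takes some care: traces report a program counter (a byte offset), while source maps are indexed by instruction, and PUSH instructions carry immediate data that must be skipped. The sketch below assumes trace steps are dicts with a `pc` key, as in the `structLogs` returned by `debug_traceTransaction`, and assumes the source map has already been expanded into one tuple per instruction. Function names are illustrative only.

```python
def pc_to_instruction_index(bytecode: bytes) -> dict[int, int]:
    """Map each instruction's byte offset to its position in the instruction list."""
    mapping: dict[int, int] = {}
    pc = 0
    index = 0
    while pc < len(bytecode):
        mapping[pc] = index
        opcode = bytecode[pc]
        # PUSH1..PUSH32 (0x60..0x7f) are followed by 1..32 bytes of data.
        push_size = opcode - 0x5F if 0x60 <= opcode <= 0x7F else 0
        pc += 1 + push_size
        index += 1
    return mapping


def source_ranges_for_trace(struct_logs, bytecode, ranges):
    """Look up the source range for every step of a single-contract trace.

    Simplification: ignores call depth, so it is only valid while execution
    stays inside the contract that `bytecode` belongs to.
    """
    index_by_pc = pc_to_instruction_index(bytecode)
    return [ranges[index_by_pc[step["pc"]]] for step in struct_logs]


# PUSH1 0x80, PUSH1 0x40, MSTORE
code = bytes.fromhex("6080604052")
steps = [{"pc": 0}, {"pc": 2}, {"pc": 4}]
expanded = [(0, 5, 0), (6, 3, 0), (10, 2, 0)]  # made-up (start, length, file) tuples
print(source_ranges_for_trace(steps, code, expanded))
```

Everything in the first list on this slide (call stack, scopes, variable locations) has to be layered on top of this lookup by the debugger itself.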

8 of 19

Key problem here: debuggers must guess how compilers work

(and good luck if the compiler changes!)

9 of 19

Debugging data formats

(have been solving this problem for traditional computing since the 1980s)

10 of 19

Making sense of raw bytes

  • Debugging data captures programmer intent and maps it to low-level machine operations… allowing tools to translate machine state to source
  • Examples of debugging data include source mappings, storage layout descriptions, and ASTs
  • Compilers inherently track this as part of generating bytecode
  • If compilers output enough of this information, tools can examine machine state in terms of variables/functions/etc., not just PUSHes and JUMPs

11 of 19

Debugging data for smart contracts

Existing formats are sadly not suitable for smart contract languages:

  • These formats embed debug symbols inside the machine code output
  • Typically, they are only prepared for up to 64-bit word sizes
  • They lack blockchain context, like gas usage or the idea that the code itself has money

Solidity encodings get weird because of gas and how state is stored on-chain:

  • Closely-related pieces of data are often cryptographically non-sequential
  • Different data locations can have subtle restrictions on what’s allowable
  • The same type might be encoded entirely differently in a different data location
  • Encoding strategies can change fundamentally based on size of the data
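
One instance of the last bullet is how Solidity stores `string` and `bytes` values in storage: data of at most 31 bytes sits in the slot itself, left-aligned, with `length * 2` in the lowest-order byte; longer data leaves only `length * 2 + 1` in the slot and moves the contents to a hash-derived location. A rough sketch of telling the two apart from a raw 32-byte slot value follows. The function name and return shape are invented for illustration, and the hash lookup for long values is deliberately left out.

```python
def describe_string_slot(slot_value: bytes) -> dict:
    """Classify the main storage slot of a Solidity string/bytes variable."""
    if len(slot_value) != 32:
        raise ValueError("a storage slot is exactly 32 bytes")
    lowest_byte = slot_value[31]
    if lowest_byte & 1 == 0:
        # Short layout: data is in the slot, lowest byte holds length * 2.
        length = lowest_byte // 2
        return {"layout": "short", "length": length, "data": slot_value[:length]}
    # Long layout: slot holds length * 2 + 1; the data lives elsewhere.
    length = (int.from_bytes(slot_value, "big") - 1) // 2
    return {"layout": "long", "length": length, "data": None}


short_slot = b"hello".ljust(31, b"\x00") + bytes([5 * 2])
long_slot = (40 * 2 + 1).to_bytes(32, "big")
print(describe_string_slot(short_slot))
print(describe_string_slot(long_slot))
```

A debugger needs this kind of per-type, per-location knowledge for every value it displays, which is the knowledge a debugging data format would let the compiler supply.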

12 of 19

from “storing things in solidity for fun and confusion”

https://gist.github.com/gnidan/b1890c68c8e0825d4a699929ccb2018e

13 of 19

ethdebug/format working group

14 of 19

Working group to-date

  • Tasked by solc-tooling group initially at Devcon Bogotá (October 2022)
  • Biweekly meetings since then to establish direction, identify known challenges, and discuss initial prototyping
  • Primary contributors:

John Toman (Certora), Daniel Kirchner (Solidity), Kamil Śliwak (Solidity), Harry Altman (prev. Truffle), Amal Sudama (prev. Truffle), Marko Veniger (Tenderly)

  • Repo: https://github.com/ethdebug/format
  • Matrix.chat: https://matrix.to/#/#ethdebug:matrix.org

15 of 19

Approach (as currently understood)

  • Define schema to represent native and user-defined types (structs, etc.)
  • Define schema for annotating machine instructions, e.g. to indicate when variables enter/exit scope and where to find them
  • Gloss over non-essential encoding details with constructs like {"strategy": "solidity"} or {"strategy": "vyper"}
  • Afford bytecode deduplication via nondeterministic annotations
  • Optimize for minimizing data size, since compiler output is already very large
  • Avoid re-inventing formats for existing things like source maps
  • Solicit additional requirements from tooling and other languages
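
To illustrate the second bullet, here is a purely hypothetical sketch of how a debugger might consume per-instruction annotations that say when variables enter and exit scope. None of the class names, field names, or shapes below come from the ethdebug/format drafts; they are assumptions chosen only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class VariableAnnotation:
    # All fields are hypothetical, not the working group's schema.
    name: str
    type_name: str
    location: str  # e.g. "stack", "memory", "storage"
    offset: int    # e.g. stack position or slot number


@dataclass
class InstructionAnnotation:
    enters_scope: list[VariableAnnotation] = field(default_factory=list)
    exits_scope: list[str] = field(default_factory=list)


def variables_in_scope(
    annotations: dict[int, InstructionAnnotation],
    executed: list[int],
) -> dict[str, VariableAnnotation]:
    """Replay executed instruction indexes and return the live variables."""
    scope: dict[str, VariableAnnotation] = {}
    for index in executed:
        annotation = annotations.get(index)
        if annotation is None:
            continue
        for name in annotation.exits_scope:
            scope.pop(name, None)
        for variable in annotation.enters_scope:
            scope[variable.name] = variable
    return scope


annotations = {
    1: InstructionAnnotation(enters_scope=[VariableAnnotation("x", "uint256", "stack", 0)]),
    3: InstructionAnnotation(enters_scope=[VariableAnnotation("y", "bool", "stack", 1)]),
    5: InstructionAnnotation(exits_scope=["y"]),
}
print(sorted(variables_in_scope(annotations, [0, 1, 2, 3])))
print(sorted(variables_in_scope(annotations, [0, 1, 2, 3, 4, 5])))
```

With data like this coming from the compiler, the debugger no longer has to infer scopes from ASTs and guesswork.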

16 of 19

So exciting (and not at all tedious)!

17 of 19

Please help us do a good job

  • Join our workshop session later today
  • Watch the GitHub repo: https://github.com/ethdebug/format
  • Read our WIP docs: https://ethdebug.github.io/format/
  • Attend our biweekly calls, announced in Matrix.chat: https://matrix.to/#/#ethdebug:matrix.org
  • Stay tuned for initial formal schema drafts and other artifacts

18 of 19

Thank you 🙏

19 of 19

Thursday, Nov 16, 2023

İstanbul, Türkiye