2 of 19

Summary

There are no good debuggers for EVM languages*
Building a debugger for an EVM language today is ridiculously expensive

… and it still won’t even be good

Compilers providing the right information would solve this problem
Figuring out what information to provide is hard
Solidity and the EF now sponsor the ethdebug/format working group
Our aim: publish a standard for compilers to use that would simplify creating/maintaining robust debuggers

*Remix/Tenderly: please hold your glares until the end of the talk 🙂

3 of 19

About me

Recently became Ethdebug Working Group Lead
Informally serving this effort since October 2022

(mostly meetings and light project management)

I was the lead of Truffle for the last 5+ years (requiescat in pace 🪦🫡)

Notably, I architected @truffle/debugger and @truffle/codec

Long history of loudly+annoyingly insisting we need a debugging data format (to anyone who would listen), until the Solidity team started insisting back 😁🙏

g. nick // gnidan.eth

4 of 19

Talk agenda

How to make a debugger today and why it won’t be good
What’s a debugging data format?
Working group progress and approach
Call for participation

5 of 19

How to make a debugger today

6 of 19

Solidity debugging today

Source maps relate EVM instructions to high-level source ranges
ASTs relate source ranges to syntactic and semantic components
Transaction trace reports EVM machine state at each instruction step
Internal debugger logic steps through trace, looks up source range, finds AST, then keeps track of a bunch of stuff
Storage layouts list state variables, their types, and their base slots, but not where dynamically sized data actually lives
Using lots of state and overzealous guessing, a debugger can figure out what function it’s in, what variables are in scope, their values, etc.
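The source map lookup described above can be sketched in TypeScript as follows: expand solc's compressed `s:l:f:j:m` source map, then convert a trace step's program counter into an instruction index (source map entries are per instruction, not per byte, so PUSH data has to be skipped). This is a minimal illustration with hypothetical function names; it ignores the CBOR metadata solc appends to bytecode, which this loop would misread as instructions.

```typescript
// One decoded entry of a solc source map ("s:l:f:j:m").
interface SourceMapEntry {
  start: number; // byte offset of the source range
  length: number; // length of the source range in bytes
  fileIndex: number; // index of the source file, or -1 for none
  jump: string; // "i" = into a function, "o" = out of a function, "-" = regular
  modifierDepth: number;
}

// Expand a compressed source map: entries are separated by ";" and fields by ":";
// an empty or missing field repeats the value from the previous entry.
function decompressSourceMap(sourceMap: string): SourceMapEntry[] {
  const entries: SourceMapEntry[] = [];
  let prev: SourceMapEntry = { start: -1, length: -1, fileIndex: -1, jump: "-", modifierDepth: 0 };
  for (const raw of sourceMap.split(";")) {
    const fields = raw.split(":");
    const num = (i: number, fallback: number): number =>
      fields[i] ? parseInt(fields[i], 10) : fallback;
    const entry: SourceMapEntry = {
      start: num(0, prev.start),
      length: num(1, prev.length),
      fileIndex: num(2, prev.fileIndex),
      jump: fields[3] ? fields[3] : prev.jump,
      modifierDepth: num(4, prev.modifierDepth),
    };
    entries.push(entry);
    prev = entry;
  }
  return entries;
}

// Map each program counter that starts an instruction to its instruction index.
// PUSH1..PUSH32 (0x60..0x7f) carry 1..32 bytes of inline data that must be skipped.
function instructionIndexByPc(bytecodeHex: string): Map<number, number> {
  const hex = bytecodeHex.startsWith("0x") ? bytecodeHex.slice(2) : bytecodeHex;
  const byteLength = hex.length / 2;
  const result = new Map<number, number>();
  let index = 0;
  let pc = 0;
  while (pc < byteLength) {
    result.set(pc, index);
    const opcode = parseInt(hex.slice(pc * 2, pc * 2 + 2), 16);
    const pushDataLength = opcode >= 0x60 && opcode <= 0x7f ? opcode - 0x5f : 0;
    pc += 1 + pushDataLength;
    index += 1;
  }
  return result;
}

// Look up the source range for the instruction at `pc`, if any.
function sourceRangeAtPc(pc: number, bytecodeHex: string, sourceMap: string): SourceMapEntry | undefined {
  const index = instructionIndexByPc(bytecodeHex).get(pc);
  return index === undefined ? undefined : decompressSourceMap(sourceMap)[index];
}
```

For example, `sourceRangeAtPc(4, "0x6080604052", "0:10:0:-:0;;5:3")` yields the third entry (start 5, length 3), because PC 4 is the MSTORE that follows two PUSH1 instructions. Everything past this point (finding the AST node, tracking scope) is where the guessing starts.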

7 of 19

What we can know vs. what we can’t

We need to understand Solidity internals for:

What function we’re in
The rest of the function call stack
List of variables in scope
Where local variables are on the stack
How memory is allocated
Storage slot assignments
Gas cost of complex operations
Visualizing state transitions
Decoding function parameters
Complex type information in events
Contract upgrade mechanisms
Inter-contract execution flow
Exception handling
…

With solc and debug_traceTransaction, we can be sure about:

Static list of state variables + their types
Raw EVM state at each instruction step
Corresponding source range
Corresponding AST node
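Reconstructing "the rest of the function call stack" shows what that guessing looks like. A common heuristic, sketched here in TypeScript, treats instructions whose source map jump field is `"i"` as internal calls and `"o"` as returns. The `Step` shape and `guessCallStack` name are hypothetical, this is not @truffle/debugger's actual implementation, and it only covers internal calls; optimizer output and inlining are the kinds of cases where such a guess can go wrong.

```typescript
// Hypothetical, simplified trace step already joined with its source map entry.
interface Step {
  pc: number;
  jump: "i" | "o" | "-"; // jump field from the source map entry for this instruction
  sourceStart: number; // byte offset of this instruction's source range
}

// Guess the internal call stack at each step: a jump marked "i" is treated as a
// call (remember the call site), a jump marked "o" as a return (drop the last one).
// Nothing in the compiler output guarantees this guess is right.
function guessCallStack(steps: Step[]): number[][] {
  const callSites: number[] = [];
  const history: number[][] = [];
  for (const step of steps) {
    if (step.jump === "i") {
      callSites.push(step.sourceStart);
    } else if (step.jump === "o" && callSites.length > 0) {
      callSites.pop();
    }
    history.push(callSites.slice()); // snapshot of the guessed stack after this step
  }
  return history;
}
```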

8 of 19

Key problem here: debuggers must guess how compilers work

(and good luck if the compiler changes!)

9 of 19

Debugging data formats

(have been solving this problem for traditional computing since the 1980s)

10 of 19

Making sense of raw bytes

Debugging data captures programmer intent and maps it to low-level machine operations… allowing tools to translate machine state to source
Examples of debugging data include source mappings, storage layout descriptions, and ASTs
Compilers inherently track this as part of generating bytecode
If compilers output enough of this information, tools can examine machine state in terms of variables/functions/etc., not just PUSHes and JUMPs
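As a minimal illustration, suppose a compiler emitted a hypothetical annotation stating which instructions a local variable is live for and how deep it sits on the stack. Turning a raw stack snapshot into named values then becomes a lookup instead of a guess. The annotation shape below is invented for this sketch and is not part of any existing format; it also assumes the snapshot lists the stack bottom-to-top.

```typescript
// Hypothetical annotation: "between instructions fromIndex and toIndex (inclusive),
// variable `name` sits `depthFromTop` entries below the top of the stack".
interface StackVariableAnnotation {
  name: string;
  fromIndex: number;
  toIndex: number;
  depthFromTop: number; // 0 = top of stack
}

// Resolve the variables live at `instructionIndex` to their raw stack words.
// Assumes `stack` is listed bottom-to-top, so the top element is last.
function variablesAt(
  instructionIndex: number,
  stack: string[],
  annotations: StackVariableAnnotation[],
): Record<string, string> {
  const values: Record<string, string> = {};
  for (const a of annotations) {
    if (instructionIndex < a.fromIndex || instructionIndex > a.toIndex) continue;
    const position = stack.length - 1 - a.depthFromTop;
    if (position >= 0) values[a.name] = stack[position];
  }
  return values;
}
```

A single fixed depth is a simplification: as values are pushed and popped within a range, a variable's distance from the top of the stack changes, which is part of what makes designing real annotations hard.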

11 of 19

Debugging data for smart contracts

Existing formats are sadly not suitable for smart contract languages:

These formats embed debug symbols inside the machine code output
Typically, they only support word sizes up to 64 bits (the EVM uses 256-bit words)
They lack blockchain context, like gas usage or the idea that the code itself can hold money

Solidity encodings get weird because of gas and how state is stored on-chain:

Closely related pieces of data often live at non-sequential, hash-derived (keccak256) storage locations
Different data locations can have subtle restrictions on what’s allowable
The same type might be encoded entirely differently in a different data location
Encoding strategies can change fundamentally based on size of the data
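For instance, Solidity stores a `bytes` or `string` state variable in one of two ways depending on its length: up to 31 bytes live in the variable's own slot, with the lowest-order byte holding `length * 2`; longer values store `length * 2 + 1` in the slot and the data starting at `keccak256(slot)`. Small value types, meanwhile, are packed together into one storage slot starting from the lowest-order bytes, while the same `uint8` inside a memory array or struct takes a full 32-byte word. The TypeScript sketch below decodes a raw storage word under those rules; the function names are hypothetical, hashing is left out, and the long-form length is assumed to fit in a JavaScript number.

```typescript
// How a Solidity `bytes`/`string` state variable's own slot should be interpreted.
type BytesSlot =
  | { kind: "short"; length: number; dataHex: string } // data is stored in the slot itself
  | { kind: "long"; length: number }; // data starts at keccak256(slot); hashing not shown

// Normalize a storage word to exactly 64 hex digits (no 0x prefix).
function normalizeWord(wordHex: string): string {
  const clean = wordHex.startsWith("0x") ? wordHex.slice(2) : wordHex;
  return clean.length >= 64 ? clean.slice(-64) : "0".repeat(64 - clean.length) + clean;
}

// Decide between the short (<= 31 bytes) and long (>= 32 bytes) encodings.
function classifyBytesSlot(wordHex: string): BytesSlot {
  const hex = normalizeWord(wordHex);
  const lowestByte = parseInt(hex.slice(62), 16);
  if ((lowestByte & 1) === 0) {
    // Short form: lowest-order byte = length * 2; data is left-aligned.
    const length = lowestByte / 2;
    return { kind: "short", length, dataHex: "0x" + hex.slice(0, length * 2) };
  }
  // Long form: the whole word = length * 2 + 1 (assumed to fit in a JS number).
  return { kind: "long", length: (parseInt(hex, 16) - 1) / 2 };
}

// Read a packed value type of `size` bytes at byte `offset` within a slot.
// Offsets count from the lowest-order (rightmost) byte, where the first packed item sits.
function readPackedValue(wordHex: string, offset: number, size: number): string {
  const hex = normalizeWord(wordHex);
  const end = 64 - offset * 2;
  return "0x" + hex.slice(end - size * 2, end);
}
```

A debugger that gets any of these rules wrong, or meets a compiler version that changes them, silently shows the wrong value; that is the kind of knowledge a debugging data format would let compilers state explicitly.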

12 of 19

from “storing things in solidity for fun and confusion”

https://gist.github.com/gnidan/b1890c68c8e0825d4a699929ccb2018e

13 of 19

ethdebug/format working group

14 of 19

Working group to-date

Tasked initially by the solc-tooling group at Devcon Bogotá (October 2022)
Biweekly meetings since then to establish direction, identify known challenges, and discuss initial prototyping
Primary contributors:

John Toman (Certora), Daniel Kirchner (Solidity), Kamil Śliwak (Solidity), Harry Altman (prev. Truffle), Amal Sudama (prev. Truffle), Marko Veniger (Tenderly)

Repo: https://github.com/ethdebug/format
Matrix.chat: https://matrix.to/#/#ethdebug:matrix.org

15 of 19

Approach (as currently understood)

Define schema to represent native and user-defined types (structs, etc.)
Define schema for annotating machine instructions, e.g., to indicate when variables enter/exit scope and where to find them
Gloss over non-essential encoding details with constructs like {"strategy": "solidity"} or {"strategy": "vyper"}
Afford bytecode deduplication via nondeterministic annotations
Optimize for small data size, since compiler output is already very large
Avoid re-inventing formats for existing things like source maps
Solicit additional requirements from tooling and other languages
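To give a feel for the first three points, here is a hypothetical TypeScript rendering of a type schema and a scope annotation, with a `strategy` field standing in for language-specific encoding details. These shapes are invented for illustration and are not the ethdebug/format schema, which had not yet been drafted formally.

```typescript
// Hypothetical type descriptions; NOT the ethdebug/format schema.
type TypeDescription =
  | { kind: "uint"; bits: number }
  | { kind: "bytes"; strategy: "solidity" | "vyper" } // encoding details glossed by strategy
  | { kind: "struct"; name: string; members: { name: string; type: TypeDescription }[] };

// Hypothetical per-instruction annotation: which variables enter or leave scope here.
interface ScopeAnnotation {
  instructionIndex: number;
  enters?: { name: string; type: TypeDescription; stackDepth: number }[];
  exits?: string[];
}

// Render a type description as Solidity-like text.
function describeType(t: TypeDescription): string {
  if (t.kind === "uint") return `uint${t.bits}`;
  if (t.kind === "bytes") return `bytes (${t.strategy} encoding)`;
  const members = t.members.map((m) => `${describeType(m.type)} ${m.name};`).join(" ");
  return `struct ${t.name} { ${members} }`;
}

// Replay annotations in instruction order to list the variables in scope at `index`.
function variablesInScope(annotations: ScopeAnnotation[], index: number): string[] {
  const scope = new Set<string>();
  const ordered = annotations.slice().sort((a, b) => a.instructionIndex - b.instructionIndex);
  for (const a of ordered) {
    if (a.instructionIndex > index) break;
    for (const v of a.enters ?? []) scope.add(v.name);
    for (const name of a.exits ?? []) scope.delete(name);
  }
  return Array.from(scope);
}
```

Replaying by static instruction index is a simplification; because execution jumps around, a real debugger would apply such annotations to trace steps as they execute.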

16 of 19

So exciting (and not at all tedious)!

17 of 19

Please help us do a good job

Join our workshop session later today
Watch the GitHub repo: https://github.com/ethdebug/format
Read our WIP docs: https://ethdebug.github.io/format/
Attend our biweekly calls, announced in Matrix.chat: https://matrix.to/#/#ethdebug:matrix.org
Stay tuned for initial formal schema drafts and other artifacts

18 of 19

Thank you 🙏

19 of 19

Thursday, Nov 16, 2023

İstanbul, Türkiye