1 of 26

EVM: Technical walkthrough

2 of 26

Transaction and Gas

- Signature is not present.

- Three types of Tx: Legacy, AccessList (EIP-2930), and EIP-1559.

- TransactTo is either empty (contract creation) or a destination address (call).

- Gas is introduced to limit execution; GasPrice is used for prioritizing transactions (see EIP-1559).

3 of 26

Block

A block has additional fields, but they are not used in EVM execution: OmmerHash, ParentHash, State/Transaction/Receipt Root, Bloom, ExtraData, MixHash/Nonce.

BlockEnv and TxEnv can be seen as constant fields during EVM execution. Additional configuration can be found in CfgEnv, which contains ChainId and SpecId.

Beige paper: https://github.com/chronaeon/beigepaper/blob/master/beigepaper.pdf

4 of 26

Database interface

All Block/Transaction data are contained inside environment struct.
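
As a rough sketch of what a database interface for the EVM could look like: the trait below exposes only account info and storage reads, with an in-memory implementation for testing. All names (`Database`, `AccountInfo`, `InMemoryDb`) and the `u128` values standing in for U256 are assumptions made for illustration, not the actual interface from the slides.

```rust
use std::collections::HashMap;

/// 20-byte account address (stands in for H160).
type Address = [u8; 20];

/// Account data the EVM needs before it touches code or storage.
#[derive(Clone, Debug, Default, PartialEq)]
struct AccountInfo {
    balance: u128, // U256 in the spec; u128 keeps the sketch std-only
    nonce: u64,
    code: Vec<u8>,
}

/// Hypothetical read-only view of the state that the EVM pulls data from.
trait Database {
    fn basic(&self, address: Address) -> AccountInfo;
    fn storage(&self, address: Address, index: u128) -> u128;
}

/// In-memory database, handy for tests.
#[derive(Default)]
struct InMemoryDb {
    accounts: HashMap<Address, AccountInfo>,
    storage: HashMap<(Address, u128), u128>,
}

impl Database for InMemoryDb {
    fn basic(&self, address: Address) -> AccountInfo {
        // Unknown accounts read as empty accounts.
        self.accounts.get(&address).cloned().unwrap_or_default()
    }

    fn storage(&self, address: Address, index: u128) -> u128 {
        // Unset storage slots read as zero.
        self.storage.get(&(address, index)).copied().unwrap_or(0)
    }
}
```

Keeping the interface this small means the EVM can run against a real node database, a cache, or a test fixture without changes.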

5 of 26

EVM: Host and Interpreter

  • The EVM is a stack-based machine.
  • Transactions in a block are executed one by one.
  • A transaction transacts in one of two ways: Call or Create.
  • The EVM has two main parts: Host and Interpreter.
  • It needs to support upgrades in the form of hard forks.
  • Precompiles are separate smart contracts written in the native language.
  • Output of EVM execution is: Map<H160, Account>, Vec<Log>, ReturnStatus, GasUsed, OutputBytes

6 of 26

EVM Diagram

The Interpreter executes contracts and calls the Host for the information it needs, for example to call another contract.

If a revert (or an error) happens, the contract call stops and all of its changes are reverted; the parent caller continues its execution. SELFDESTRUCT also stops the contract call, but its changes are kept.

7 of 26

Interpreter

  • It contains the instructions and is responsible for the execution of smart contracts.
  • It has two stages. The first stage, Analysis, goes over the smart contract bytecode, checks the positions of JUMPDEST opcodes, and creates a JUMPDEST table; this is what all EVMs do. (Evmone, as an optimization, added an advanced analysis that, for example, precalculates gas per block and adds padding if the bytecode doesn't finish with STOP, so that it is safe to iterate without checking the length at every step.)
  • The second stage is Execution: one big loop that steps over the bytecode, extracts the OpCode, does a match (switch), and executes it depending on its type.
  • PUSH1 to PUSH32 are a special case: they allow data to be embedded inside the bytecode and pushed onto the Stack. All other OpCodes are just one byte in size.
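
The Analysis stage can be sketched as a single pass over the bytecode. This is a minimal illustration, not the code from the slides: the function name `analyze_jumpdests` and the `Vec<bool>` table are assumptions. The important detail is that PUSH immediates are skipped, so a `0x5b` byte inside push data is not treated as a JUMPDEST.

```rust
const JUMPDEST: u8 = 0x5b;
const PUSH1: u8 = 0x60;
const PUSH32: u8 = 0x7f;

/// Returns a table where `table[pc]` is true if `pc` is a valid jump target.
fn analyze_jumpdests(code: &[u8]) -> Vec<bool> {
    let mut table = vec![false; code.len()];
    let mut pc = 0;
    while pc < code.len() {
        let op = code[pc];
        if op == JUMPDEST {
            table[pc] = true;
        }
        // PUSHn carries n bytes of immediate data; skip them so that data
        // bytes are never interpreted as opcodes.
        let immediate = if (PUSH1..=PUSH32).contains(&op) {
            (op - PUSH1) as usize + 1
        } else {
            0
        };
        pc += 1 + immediate;
    }
    table
}
```

For the bytecode `[0x60, 0x5b, 0x5b, 0x00]` (PUSH1 0x5b, JUMPDEST, STOP) only position 2 is marked as a jump target.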

8 of 26

Interpreter contains:

  • Memory: a contiguous, unbounded chunk of memory. Reserving new parts of memory is paid for with gas. (In theory it has no limit, but in practice you would need a lot of ETH to pay for it.)
  • Stack: a stack of 256-bit items with a limit of 1024 items.
  • Gas calculation: spent gas is accumulated and checked against GasLimit before every instruction is executed. Gas per OpCode depends on its type and can be as simple as ADD (priced at 3 gas) or as involved as SSTORE (depends on multiple factors: is the new value zero, is it the same as the original, is it a cold or hot load). The Berlin hard fork introduced cold/hot account and storage loads.
  • Host: the Interpreter is called by the Host, but it contains a Host interface to get information that lives outside of the Interpreter, and it allows us to CALL another contract by calling the Host.
  • Program counter and the Contract that we are executing, together with its Analysis.
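
To make the pieces above concrete, here is a toy execution loop with a stack, a program counter, and gas accounting. It is only a sketch under loud assumptions: it handles just STOP, ADD and PUSH1, uses `u64` stack items instead of 256-bit ones, and the names `run` and `Halt` are made up for this example.

```rust
#[derive(Debug, PartialEq)]
enum Halt {
    Stop,
    OutOfGas,
    StackOverflow,
    StackUnderflow,
    InvalidOpcode,
}

const STACK_LIMIT: usize = 1024;

/// Runs `code` and returns the halt reason, the gas used and the final stack.
fn run(code: &[u8], gas_limit: u64) -> (Halt, u64, Vec<u64>) {
    let mut stack: Vec<u64> = Vec::new();
    let mut gas_used = 0u64;
    let mut pc = 0usize;
    loop {
        // Running off the end of the code behaves like STOP.
        let op = code.get(pc).copied().unwrap_or(0x00);
        let cost = match op {
            0x00 => 0,        // STOP
            0x01 | 0x60 => 3, // ADD and PUSH1 cost 3 gas each
            _ => return (Halt::InvalidOpcode, gas_used, stack),
        };
        // Gas is checked before the instruction is executed.
        if gas_used + cost > gas_limit {
            // An out-of-gas halt consumes the whole limit.
            return (Halt::OutOfGas, gas_limit, stack);
        }
        gas_used += cost;
        match op {
            0x00 => return (Halt::Stop, gas_used, stack),
            0x01 => {
                if stack.len() < 2 {
                    return (Halt::StackUnderflow, gas_used, stack);
                }
                let a = stack.pop().unwrap();
                let b = stack.pop().unwrap();
                stack.push(a.wrapping_add(b));
            }
            _ => {
                // PUSH1: the next byte is data, not an opcode.
                if stack.len() == STACK_LIMIT {
                    return (Halt::StackOverflow, gas_used, stack);
                }
                stack.push(code.get(pc + 1).copied().unwrap_or(0) as u64);
                pc += 1;
            }
        }
        pc += 1;
    }
}
```

Running `[0x60, 2, 0x60, 3, 0x01, 0x00]` (PUSH1 2, PUSH1 3, ADD, STOP) with enough gas uses 9 gas and leaves 5 on the stack.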

9 of 26

Interpreter machine in code

Just look and marvel at that rust code

10 of 26

OpCodes

Can be roughly separated into:

  • Arithmetic and logic opcodes (ADD, SUB, MUL, SDIV, GT, LT, AND, OR,...)
  • Stack related (POP, PUSH, DUP, SWAP,...)
  • Memory opcodes (MLOAD, MSTORE, MSTORE8, MSIZE)
  • Program counter related opcodes (JUMP, JUMPI, PC, JUMPDEST)
  • Storage opcodes (SLOAD, SSTORE)
  • Environment opcodes (CALLER, Transaction and Block info)
  • Halting opcodes (STOP, RETURN, REVERT, SELFDESTRUCT,...)
  • System opcodes (LOG, CALL, CREATE, CREATE2, STATICCALL, …) (next slides)

Full list here: https://github.com/wolflo/evm-opcodes and https://www.evm.codes/

11 of 26

CREATE And CREATE2

CREATE and CREATE2 are the OpCodes used to create a contract.

They deterministically derive the address where the bytecode is going to be stored. The bytecode is received as the return value of the Interpreter after the input (init) code is executed.

The only difference between them is how the address of the contract is derived:

  • CREATE address: Keccak256(rlp([caller, nonce])), last 20 bytes
  • CREATE2 address: Keccak256([0xff, caller, salt, code_hash]), last 20 bytes, where code_hash is the Keccak256 of the init code
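
A small sketch of the CREATE2 derivation: build the 85-byte preimage, hash it, and keep the last 20 bytes. The Rust standard library has no Keccak-256, so the hash function is passed in by the caller here; that parameter and the function names are assumptions for illustration only.

```rust
type Address = [u8; 20];

/// Builds the 85-byte CREATE2 preimage: 0xff ++ caller ++ salt ++ code_hash.
fn create2_preimage(caller: &Address, salt: &[u8; 32], code_hash: &[u8; 32]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(1 + 20 + 32 + 32);
    buf.push(0xff);
    buf.extend_from_slice(caller);
    buf.extend_from_slice(salt);
    buf.extend_from_slice(code_hash);
    buf
}

/// Derives the address as the last 20 bytes of `hash(preimage)`.
/// `hash` is supplied by the caller; on Ethereum it is Keccak-256.
fn create2_address(
    caller: &Address,
    salt: &[u8; 32],
    code_hash: &[u8; 32],
    hash: impl Fn(&[u8]) -> [u8; 32],
) -> Address {
    let digest = hash(&create2_preimage(caller, salt, code_hash));
    let mut address = [0u8; 20];
    address.copy_from_slice(&digest[12..]);
    address
}
```

Because nothing in the preimage depends on the caller's nonce, the resulting address can be computed before the contract is deployed.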

12 of 26

Call OpCodes

The variants of CALL differ in the call context they run with. The call context contains: Address, Caller, ApparentValue. (Address determines whose storage SLOAD and SSTORE operate on.)

  • CALL: Caller is the present context.address. Address and ApparentValue are from the stack.
  • DELEGATECALL: Address, Caller, and ApparentValue are all from the present context.
  • CALLCODE: Address and Caller are the present context.address. ApparentValue is from the stack.
  • STATICCALL: Same as CALL, but the call fails if SSTORE, LOG, SELFDESTRUCT, CREATE/CREATE2, or a CALL with a non-zero value is executed.

DELEGATECALL was a new opcode that was a bug fix for CALLCODE which did not preserve msg.sender and msg.value. If Alice invokes Bob who does DELEGATECALL to Charlie, the msg.sender in the DELEGATECALL is Alice (whereas if CALLCODE was used the msg.sender would be Bob).
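
The context rules can be written down as one small function. This is a sketch with assumed names (`CallContext`, `CallScheme`, `child_context`) and a `u128` value standing in for U256; `target` and `value` are the address and value taken from the stack.

```rust
type Address = [u8; 20];

#[derive(Clone, Copy, Debug, PartialEq)]
struct CallContext {
    address: Address,     // whose storage SLOAD/SSTORE operate on
    caller: Address,      // msg.sender
    apparent_value: u128, // msg.value (U256 in the spec)
}

#[derive(Clone, Copy)]
enum CallScheme {
    Call,
    CallCode,
    DelegateCall,
    StaticCall,
}

/// Builds the callee's context from the present context and the
/// target address / value taken from the stack.
fn child_context(
    scheme: CallScheme,
    current: &CallContext,
    target: Address,
    value: u128,
) -> CallContext {
    match scheme {
        CallScheme::Call => CallContext {
            address: target,
            caller: current.address,
            apparent_value: value,
        },
        // Like CALL, but no value can be sent.
        CallScheme::StaticCall => CallContext {
            address: target,
            caller: current.address,
            apparent_value: 0,
        },
        // Runs the target's code against the present contract's storage.
        CallScheme::CallCode => CallContext {
            address: current.address,
            caller: current.address,
            apparent_value: value,
        },
        // Everything is inherited from the present context.
        CallScheme::DelegateCall => *current,
    }
}
```

With Bob's context as `current` (caller Alice), a DelegateCall to Charlie keeps Alice as the caller, while CallCode makes Bob the caller.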

More info: https://ethereum.stackexchange.com/questions/3667/difference-between-call-callcode-and-delegatecall

13 of 26

Logs

Logs are a way to record that something happened while executing a smart contract. They give smart contract devs a nice way to notify users or machines about a specific event.

A log contains:

  • Contract Address (from the call context)
  • Topics: a list of 256-bit items. The number of items depends on the opcode, LOG0…LOG4. Items are popped from the stack.
  • Data: read from Memory and can be of arbitrary size (of course you pay for every byte of it :))
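
A sketch of how a LOGn opcode could assemble a log from the stack and memory. The names `Log` and `make_log` are assumptions, stack items are `u64` instead of 256-bit, and gas charging is left out. The pop order (offset, size, then the topics) follows the opcode definition.

```rust
#[derive(Debug, PartialEq)]
struct Log {
    address: [u8; 20],
    topics: Vec<u64>, // 256-bit items in the spec
    data: Vec<u8>,
}

/// Builds the log for LOG0..LOG4. The top of the stack is the end of the Vec.
/// Returns None if the stack is too short or the data range is not in memory.
/// (A real EVM would expand memory and charge gas instead of failing, and
/// would not leave the stack half-popped on failure.)
fn make_log(
    address: [u8; 20],
    n_topics: usize,
    stack: &mut Vec<u64>,
    memory: &[u8],
) -> Option<Log> {
    if n_topics > 4 || stack.len() < 2 + n_topics {
        return None;
    }
    let offset = stack.pop()? as usize;
    let size = stack.pop()? as usize;
    // The length check above guarantees these pops succeed.
    let topics: Vec<u64> = (0..n_topics).map(|_| stack.pop().unwrap()).collect();
    let end = offset.checked_add(size)?;
    let data = memory.get(offset..end)?.to_vec();
    Some(Log { address, topics, data })
}
```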

14 of 26

Gas

Every OpCode is priced in terms of gas. Every memory extension and every DB load or store has a base or dynamic gas calculation.

FeeSpend represents GasUsed*GasPrice, and it is what you pay to the miner when you execute a transaction.

EIP-1559 is an improvement that introduced BaseFee, which is taken from FeeSpend and burned (destroyed); the rest of the fee is transferred to the miner that created the block. GasPrice is then calculated as BaseFee+PriorityFee.

There was also a way to get a refund on gas: GasRefund decreases the gas used. It is used in SSTORE and SELFDESTRUCT. (The idea was okay but it was misused, and it is probably going to be removed in the future.)
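
As a worked example of the fee arithmetic: the sketch below splits the fee into the burned part and the miner's part, using GasPrice = BaseFee + PriorityFee. The struct and function names are assumptions, and the max-fee cap and gas refunds are ignored.

```rust
/// Fee breakdown in wei.
#[derive(Debug, PartialEq)]
struct FeeSplit {
    gas_price: u128,
    total: u128,    // FeeSpend = GasUsed * GasPrice
    burned: u128,   // GasUsed * BaseFee
    to_miner: u128, // GasUsed * PriorityFee
}

fn fee_split(gas_used: u64, base_fee: u128, priority_fee: u128) -> FeeSplit {
    let gas = gas_used as u128;
    let gas_price = base_fee + priority_fee;
    FeeSplit {
        gas_price,
        total: gas * gas_price,
        burned: gas * base_fee,
        to_miner: gas * priority_fee,
    }
}
```

For a plain transfer using 21,000 gas with BaseFee 100 and PriorityFee 2, the total is 2,142,000 wei, of which 2,100,000 is burned and 42,000 goes to the miner.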

15 of 26

Traces

Traces are a utility for debugging and profiling contract execution. A trace contains every step of execution with its opcode, used gas, memory, and stack.

It can be tied to Solidity compiler output to get a full view of what is happening.

For some use cases call traces are even more important; they show which contracts are called.

16 of 26

Inspector

This is an implementation detail, but to obtain traces we need some kind of hooks that allow us to inspect internal state at runtime.

Forge (an upcoming tool for Solidity devs) uses something similar with Sputnik to obtain traces and apply cheatcodes that help with debugging.

It mostly hooks into the Host and into every step inside the Interpreter.
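
A minimal sketch of such a hook: an `Inspector` trait with a `step` callback that a toy loop invokes before every opcode, and a `Tracer` that records each step. The trait shape and names are assumptions for illustration, and the loop only understands PUSH1 and ADD with `u64` stack items.

```rust
/// One recorded execution step.
#[derive(Debug)]
struct Step {
    pc: usize,
    opcode: u8,
    gas_used: u64,
    stack: Vec<u64>,
}

/// Hypothetical hook interface called by the interpreter loop.
trait Inspector {
    fn step(&mut self, pc: usize, opcode: u8, gas_used: u64, stack: &[u64]);
}

/// An inspector that simply stores every step it sees.
#[derive(Default)]
struct Tracer {
    steps: Vec<Step>,
}

impl Inspector for Tracer {
    fn step(&mut self, pc: usize, opcode: u8, gas_used: u64, stack: &[u64]) {
        self.steps.push(Step { pc, opcode, gas_used, stack: stack.to_vec() });
    }
}

/// Toy loop that calls the hook before each opcode is executed.
fn run_with_inspector(code: &[u8], inspector: &mut dyn Inspector) -> Vec<u64> {
    let mut stack = Vec::new();
    let (mut pc, mut gas_used) = (0usize, 0u64);
    while pc < code.len() {
        let op = code[pc];
        inspector.step(pc, op, gas_used, &stack);
        match op {
            0x60 => {
                // PUSH1: push the next byte and skip over it.
                stack.push(code.get(pc + 1).copied().unwrap_or(0) as u64);
                pc += 1;
                gas_used += 3;
            }
            0x01 if stack.len() >= 2 => {
                // ADD
                let a = stack.pop().unwrap();
                let b = stack.pop().unwrap();
                stack.push(a.wrapping_add(b));
                gas_used += 3;
            }
            _ => break, // STOP or anything this toy loop does not handle
        }
        pc += 1;
    }
    stack
}
```

Because the loop only sees `&mut dyn Inspector`, a tracer, a gas profiler, or a no-op inspector can be swapped in without touching the execution code.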

17 of 26

Interpreter code exploration

18 of 26

Host

  • It is the starting point of execution. It creates and calls the Interpreter (Machine).
  • As we already said, a transaction can do a CALL or a CREATE, so we have inner_call and inner_create functions for recursive calls from the Interpreter.
  • Additionally, the Host acts as the binding between the Interpreter and the data it needs from outside of the EVM (database, environment, SLOAD, SSTORE).
  • It handles contract calls and the call stack. It needs to be able to revert the changes that happened inside one contract call, including created logs. It also needs to handle the selfdestruct storage reset.
  • Reverts happen on OutOfGas, StackOverflow and StackUnderflow errors.
  • It decides whether a precompile contract needs to be called, which is the case when addresses 0x00..01 to 0x00..09 are called.
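
The precompile check in the last bullet can be sketched as a simple address test. The function name is an assumption; the address range 0x00..01 to 0x00..09 is the one given above.

```rust
/// Returns the precompile index (1..=9) if `address` is 0x00..01 to 0x00..09,
/// or None if the call should go to a regular contract.
fn precompile_index(address: &[u8; 20]) -> Option<u8> {
    let (prefix, last) = address.split_at(19);
    if prefix.iter().all(|b| *b == 0) && (1..=9).contains(&last[0]) {
        Some(last[0])
    } else {
        None
    }
}
```

The Host could run this check at the start of every call and only create an Interpreter when it returns None.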

19 of 26

Host contains:

  • Subroutine: call stack with changes of every call. (Next slide)
  • Precompiles: list of native hashes and curves. (A little bit later)
  • DB: fetching account info, code, and storage from database.
  • Environments: Transaction and Block information.
  • *Inspector: Implementation dependent part for hooking of evm execution, main usage is tracing

20 of 26

Subroutine (State and reverts)

It contains:

  • State: current state of accounts and storage.
  • Logs: logs emitted by the LOG0-LOG4 opcodes are stored here.
  • Depth: limits the call stack to 1024.
  • Changelog: list of changes that happened in the current changeset (contract call).
    • A checkpoint is created at every call, and it gets its own ID that is incremented over time. If a contract fails, its checkpoint is reverted together with every checkpoint that has a higher ID.
    • If a contract executes correctly, its changelog would usually be merged with the parent changelog, but we just leave it and continue using the current changelog without merging.
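
The checkpoint and revert idea can be sketched with a single flat changelog. This is a simplification under stated assumptions: checkpoints here are positions in the changelog rather than incrementing IDs, storage values are `u64`, logs are plain strings, and all names are hypothetical.

```rust
use std::collections::HashMap;

/// (account, storage slot)
type Key = ([u8; 20], u64);

#[derive(Default)]
struct Journal {
    state: HashMap<Key, u64>,
    logs: Vec<String>,
    /// For every write: the key and the value it had before (None = unset).
    changelog: Vec<(Key, Option<u64>)>,
}

#[derive(Clone, Copy)]
struct Checkpoint {
    changelog_len: usize,
    logs_len: usize,
}

impl Journal {
    fn sstore(&mut self, key: Key, value: u64) {
        let previous = self.state.insert(key, value);
        self.changelog.push((key, previous));
    }

    fn sload(&self, key: Key) -> u64 {
        self.state.get(&key).copied().unwrap_or(0)
    }

    fn log(&mut self, message: &str) {
        self.logs.push(message.to_string());
    }

    /// Taken at the start of every call.
    fn checkpoint(&self) -> Checkpoint {
        Checkpoint { changelog_len: self.changelog.len(), logs_len: self.logs.len() }
    }

    /// Undoes everything recorded after `cp`, including nested calls.
    fn revert(&mut self, cp: Checkpoint) {
        while self.changelog.len() > cp.changelog_len {
            let (key, previous) = self.changelog.pop().unwrap();
            match previous {
                Some(value) => {
                    self.state.insert(key, value);
                }
                None => {
                    self.state.remove(&key);
                }
            }
        }
        self.logs.truncate(cp.logs_len);
    }
}
```

A successful call needs no work at all: its entries simply stay in the changelog, which mirrors the "continue without merging" approach described above.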

21 of 26

Host Trait

22 of 26

| Precompile Name | Address | Type |
|---|---|---|
| Secp256k1::ecrecovery | 0x00…01 | Curve signature recovery |
| sha256 | 0x00…02 | Hash |
| ripemd160 | 0x00…03 | Hash |
| Identity | 0x00…04 | Utility |
| bigModExp | 0x00…05 | Math |
| Bn128::add | 0x00…06 | Curve |
| Bn128::mul | 0x00…07 | Curve |
| Bn128::pair | 0x00…08 | Curve |
| Blake2 | 0x00…09 | Hash |

More info: https://docs.klaytn.com/smart-contract/precompiled-contracts

23 of 26

Host code exploration

24 of 26

Hard Forks

  • Arrow Glacier: Dec-09-2021
    • EIP-4345 – delays the difficulty bomb until June 2022
  • London: Aug-05-2021
    • EIP-1559 – improves the transaction fee market
    • EIP-3198 – returns the BASEFEE from a block
    • EIP-3529 – reduces gas refunds for EVM operations
    • EIP-3541 – prevents deploying contracts starting with 0xEF
    • EIP-3554 – delays the Ice Age until December 2021
  • Berlin: Apr-15-2021
    • EIP-2565 – lowers ModExp gas cost
    • EIP-2718 – enables easier support for multiple transaction types
    • EIP-2929 – gas cost increases for state access opcodes
    • EIP-2930 – adds optional access lists
  • Muir Glacier: Jan-02-2020
    • EIP-2384 – delays the difficulty bomb for another 4,000,000 blocks, or ~611 days.

More on it here: https://ethereum.org/en/history/

25 of 26

Optimizations

Use u64 for gas calculations even though the spec uses U256: spending a U256 amount of gas is not something that is going to happen. For comparison, the current Ethereum block gas limit is 30M.

Use u64 for memory calculations as well; U256 does not make sense here. There is no hard limit on memory used, but you pay gas for every 32-byte word you use, which acts as a soft limit. Memory accesses are usually specified as offset+size, and memory is paid for up to the highest `offset+size` reached so far.

Ethereum uses big-endian encoding and all PUSH values are in big-endian format. This can be slow on most machines, which are little-endian and work natively on u64 items. So in the EVM a stack item is basically a U256 stored as [u64;4] (an array of four u64 numbers), and we always convert those values back and forth.
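
A sketch of the memory pricing in u64 arithmetic. It uses the standard memory cost formula, 3 gas per 32-byte word plus words²/512, and charges only for growth beyond the current size. Function names are assumptions; overflow is reported as `None`, which a caller could treat as out of gas.

```rust
/// Number of 32-byte words needed to cover `offset + size` bytes.
fn words_for(offset: u64, size: u64) -> Option<u64> {
    if size == 0 {
        return Some(0); // a zero-length access never expands memory
    }
    let end = offset.checked_add(size)?;
    Some(end / 32 + u64::from(end % 32 != 0))
}

/// Total gas for a memory of `words` words: 3 * w + w * w / 512.
fn memory_gas(words: u64) -> Option<u64> {
    words.checked_mul(3)?.checked_add(words.checked_mul(words)? / 512)
}

/// Gas to pay for an access at `offset..offset + size` when memory
/// currently holds `current_words` words.
fn expansion_gas(current_words: u64, offset: u64, size: u64) -> Option<u64> {
    let needed = words_for(offset, size)?.max(current_words);
    Some(memory_gas(needed)? - memory_gas(current_words)?)
}
```

The quadratic term is what makes the soft limit effective: 32 words cost 98 gas in total, while 1024 words already cost 5120.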
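
The back-and-forth conversion can be sketched as follows. The limb order (least significant u64 first) is an assumption of this example, as are the function names.

```rust
/// U256 as four u64 limbs, least significant limb first (assumed layout).
type U256 = [u64; 4];

/// Converts a 32-byte big-endian value (as found in PUSH data) into limbs.
fn from_be_bytes(bytes: &[u8; 32]) -> U256 {
    let mut limbs = [0u64; 4];
    for (i, chunk) in bytes.chunks_exact(8).enumerate() {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        // Chunk 0 holds the most significant 8 bytes, so it goes into limb 3.
        limbs[3 - i] = u64::from_be_bytes(word);
    }
    limbs
}

/// Converts limbs back into a 32-byte big-endian value.
fn to_be_bytes(value: &U256) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for (i, limb) in value.iter().rev().enumerate() {
        bytes[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_be_bytes());
    }
    bytes
}
```

With this layout, arithmetic can run on native u64 limbs, and the byte swap only happens at the boundaries where values enter or leave the stack.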

26 of 26

Q&A