1 of 89

Ethereum Basics

Joseph Bonneau

2017 Summer School on Cyber & Computer Security

Technion

2 of 89

Talks on Ethereum today

Morning:

- Basic overview

- hands-on demo

Afternoon:

- Security issues

- Proof-of-stake

3 of 89

The road to Ethereum

4 of 89

“Smart contracts” conceptualized by Szabo in 1994

A smart contract is a computerized transaction protocol that executes the terms of a contract. The general objectives are to satisfy common contractual conditions (such as payment terms, liens, confidentiality, and even enforcement), minimize exceptions both malicious and accidental, and minimize the need for trusted intermediaries. Related economic goals include lowering fraud loss, arbitrations and enforcement costs, and other transaction costs.

-Nick Szabo “The Idea of Smart Contracts”

5 of 89

A “dumb contract” example: pay for a hash pre-image

Alice will reveal to Bob a value x such that SHA-256(x) = 0x2a...

In exchange, Bob will pay US$10.

If Alice does not reveal by July 1, 2017, then she will pay a penalty of US$1 per day that she is late, up to US$100.

Signed:

6 of 89

Traditional contracts vs. smart contracts

Traditional

Smart

specification

Natural language, “legalese”

Code

assent

Signatures

Digital signatures

dispute resolution

Judges, arbitrators

Decentralized platform

nullification

By judges, politicians

????

punishment

Fines, imprisonment

Monetary, requires deposits

7 of 89

Recall: Bitcoin has a simple scripting language

"tx_out":[

{

"value":"10.12287097",

"scriptPubKey":"OP_DUP OP_HASH160 69e...3d42e OP_EQUALVERIFY OP_CHECKSIG"

},

...

]

8 of 89

“Spending” requires specifying half of a script

OP_DUP

OP_HASH160

<69e02e18...>

OP_EQUALVERIFY

OP_CHECKSIG

<30440220...>

<0467d2c9...>

scriptSig

scriptPubKey

TO VERIFY: Concatenated script must execute completely with no errors

9 of 89

Bitcoin script execution example

<sig> <pubKey> OP_DUP OP_HASH160 <pubKeyHash?> OP_EQUALVERIFY OP_CHECKSIG

<sig>

<pubKey>

<pubKey>

<pubKeyHash?>

<pubKeyHash>

true

10 of 89

Bitcoin scripting language (“Script”)

Design goals

  • Built for Bitcoin (inspired by Forth)
  • Simple, compact
  • Support for cryptography
  • Stack-based
  • No looping
    • Not Turing-complete
  • Time/memory usage bound by program size

image via Jessie St. Amand

I am not impressed

11 of 89

Some useful contracts can be done in Bitcoin

  • Proof-of-burn
  • MULTISIG/access control trees
  • Pay-for-hash-preimage
    • Multi-party lotteries
    • Atomic cross-chain currency exchange
  • Micropayment/payment channels
    • Greatly improved with OP_CHECKLOCKTIME

12 of 89

Extending Bitcoin functionality

13 of 89

Bitcoin script left developers wanting more

By adding a few opcodes to Bitcoin script, we could support:

  • Distributed naming (Namecoin)
  • Financial instruments (OpenBazaar, MasterCoin)
  • Prediction markets (Futurecoin)
  • And many more...

14 of 89

Namecoin was the first fork of Bitcoin

Goal: distributed naming, similar functionality to DNS

3 new opcodes:

  • NAME_NEW
  • NAME_FIRST_UPDATE
  • NAME_UPDATE

15 of 89

Namecoin introduces three new opcodes

NAME_NEW: H(r, “jbonneau”)

NAME_FIRST_UPDATE: r, “jbonneau”, {"ip" : "68.178.254.235"}

NAME_UPDATE: r, “jbonneau”, {"ip6" : "2001:4860:0:1001::68"}

12 block delay (frontrunning)

16 of 89

Namecoin introduces new global state

NAME_NEW y

NAME_FIRST_UPDATE jbonneau,r; 68...

NAME_UPDATE jbonneau, 2001:...

google → 172.217.18.110 [owner: Kg]

reddit → 151.101.65.140 [owner: Kr]

google → 172.217.18.110 [owner: Kg]

reddit → 151.101.65.140 [owner: Kr]

y → {pending} [owner: Kj]

google → 172.217.18.110 [owner: Kg]

reddit → 151.101.65.140 [owner: Kr]

jbonneau → 68.178.254.235 [owner: Kj]

google → 172.217.18.110 [owner: Kg]

reddit → 151.101.65.140 [owner: Kr]

jbonneau → 2001:... [owner: Kj]

17 of 89

Namecoin introduces new fees, incentives

18 of 89

Several requirements for new functionality

new addition

purpose

in Namecoin

in Futurecoin

Global state

Track app-specific data

name → value map

list of markets, bets in each market

Opcodes

Express updates to global state

NAME_NEW etc.

OPEN_MARKET etc.

Fees

Limit computation & edits to global state

Registration fees to limit squatting, maintenance fees

transaction fees per open market, exchange

19 of 89

Bitcoin left these three implicit

new addition

purpose

in Bitcoin

Global state

Track app-specific data

UTXO set

Opcodes

Express updates to global state

transactions

Fees

Limit computation & reads/writes to global state

not required

Bitcoin scripts only succeed/fail. No side effects on global state

Miners can produce blocks which are very costly to verify

This state is implicit

20 of 89

Replicated State Machines

21 of 89

Replicated state machines are the classic abstraction

  • Set of possible states S
  • Set of possible inputs I
  • Set of possible outputs O

  • Transition function f: S × IS × O

  • Start state sS (genesis block)

22 of 89

“Blockchain” is an ordered list of inputs w/consensus

consensus info

input

nonce=0x456...

A→B 17 signed(Alice)

consensus info

input

nonce=0x123...

B→C 11 signed(Bob)

consensus info

input

23 of 89

“State” is really just a compression of history

Inputs: Ø

Outputs: 25→Alice

Inputs: 1[0]

Outputs: 17→Bob, 8→Alice

SIGNED(Alice)

time

is this valid?

Inefficient: Scan blockchain to check for validity

Inputs: 2[0]

Outputs: 8→Carol, 7→Bob

SIGNED(Bob)

Inputs: 2[1]

Outputs: 6→David, 2→Alice

SIGNED(Alice)

1

2

3

4

Efficient: track UTXO set

{1[0]: 25, A}

{2[0]: 17, B; 2[1]: 8, A}

{2[1]: 8, A; 3[0]: 8, C, 3[1]: 7,B}

24 of 89

Blockchains may include explicit state commitments

consensus info

input

state commitment

nonce=0x456...

A→B 17 signed(Alice)

{A: 33, B:17}

consensus info

input

state commitment

nonce=0x123...

B→C 11 signed(Bob)

{A: 33, B:6, C: 11}

consensus info

input

state commitment

s = {A: 50}

25 of 89

Explicit state offers many advantages

  • Inconsistencies surface immediately
  • Light clients can quickly get current state
  • Can efficiently verify sequence between any two blocks

consensus info

input

state commitment

nonce=0x123...

B→C 11 signed(Bob)

{A: 33, B:6, C: 11}

26 of 89

Ethereum: A universal RSM

27 of 89

To get Turing-completeness:

  • Set of possible states S
  • Set of possible inputs I
  • Set of possible outputs O

  • Transition function f: S × IS × O

  • Start state s ∈ S (genesis block)

Include arbitrary programs

Interpret programs

28 of 89

Universality brings on classic OS problems

  • What state can a tx change?
    • memory protection
  • How many resources can a contract use?
    • resource contention

29 of 89

Ethereum in one slide

  • States S = a map from addresses to state

  • Inputs I (transactions)

  • Transition f:
    • validate signature
    • run to.code(from, data, value, startgas, gasprice)
  • Start state: ∅

address

code

storage

balance

nonce

from

sig, n

to

data

value

startgas

gasprice

may affect the state of any address

30 of 89

The Ethereum blockchain structure

prev

nonce

difficulty

miner

extra

height

time

state root

transaction root

receipt root

address

code

storage

balance

nonce

31 of 89

The Ethereum blockchain structure

prev

nonce

difficulty

miner

extra

height

time

state root

transaction root

receipt root

from

sig, n

to

data

value

startgas

gasprice

32 of 89

The Ethereum blockchain structure

prev

nonce

difficulty

miner

extra

height

time

state root

transaction root

receipt root

final state

gas used

log output

log bloom

33 of 89

Ethereum addresses can be accounts or contracts

address

code

storage

balance

nonce

account

contract

address

H(pub_key)

H(creator, nonce)

code

EVM code

storage

Merkle storage root

balance

ETH balance

nonce

#transaction sent

Volatile fields

Note: no UTXOs in Ethereum

34 of 89

Three* types of transaction in Ethereum

type

from

sig, n

to

data

value

startgas

gasprice

create

creator

sig, n

code

start_bal

53000

?

35 of 89

Three* types of transaction in Ethereum

type

from

sig, n

to

data

value

startgas

gasprice

create

creator

sig, n

code

start_bal

53000

?

send

sender

sig, n

receiver

transfer_val

30000

?

36 of 89

Three* types of transaction in Ethereum

type

from

sig, n

to

data

value

startgas

gasprice

create

creator

sig, n

code

start_bal

53000

?

send

sender

sig, n

receiver

transfer_val

30000

?

call

caller

sig, n

contract

f, args

transfer_val

?

?

37 of 89

Example:

NameCoin in Ethereum

38 of 89

39 of 89

Ethereum code written in Solidity, compiled to EVM

Ethereum VM Bytecode

Stack Language

LLL

Viper, Serpent

Solidity

Functional, macros,

looks like scheme

Types, invariants, looks like Javascript

Looks like python

Looks like Forth.

Defined in Yellowpaper

40 of 89

Ethereum

Virtual

Machine

41 of 89

EVM is stack-based, like BTC script

Features

  • 1024-depth stack
  • 32-byte words
  • Accelerated crypto
    • SHA-3
    • Big num multiply
    • GF-256 operations

PUSH1 0

CALLDATALOAD

SLOAD

NOT

PUSH1 9

JUMPI

STOP

JUMPDEST

PUSH1 32

CALLDATALOAD

PUSH1 0

CALLDATALOAD

SSTORE

42 of 89

EVM memory model offers a lot of space

Storage: {0,1}256→{0,1}256 map (persistent)

Memory: {0,1}256→{0,1}256 map (volatile between tx)

  • in other words, both can represent 2264 bits!
  • arranged in 256-bit words
  • all memory is zero-initialized

Storage in Ethereum is very expensive. Limiting memory use is critical

43 of 89

EVM provides basic API for I/O

Input:

  • tx info: sender, value, gas limit
  • resource use: gas remaining, memory used
  • block info: depth, timestamp, miner, hash

Output:

  • send messages (call other contracts and/or send money)
  • write to logs
  • self destruct

44 of 89

Subtleties to contract calls

  • Data: unlimited params/return values
    • Direct mapped to memory address + size

  • Exceptions: out of gas, bad jump, etc.
    • No state changes persisted
    • Control returns to caller

  • Call stack limit: 1024
    • Calls from 1024th frame will fail

45 of 89

Efficient map commitments

46 of 89

How can Ethereum commit to so much state?

Recall: storage is a {0,1}256→{0,1}256 map

Requirements:

  • 2n keys supported
  • k keys with a non-zero value
  • O(1) commitment size
  • O(lg k) update cost
  • O(lg k) proofs (even for zero values)

Key insight:

Optimize for efficient storage of zeroized values

47 of 89

Non-solution: use a binary tree, paths encode address

Alice

Bob

Carol

David

001

Alice

100

Bob

110

Carol

111

David

0

0

0

0

0

0

0

1

1

1

1

1

1

1

Problem:

Requires O(2n) computation

48 of 89

Optimization #1: pre-compute all sizes of empty tree

Empty[2]

Empty[1]

Alice

Bob

Empty[1]

Carol

David

001

Alice

100

Bob

110

Carol

111

David

0

0

0

0

0

0

1

1

1

1

1

1

This almost works!

Proofs are length O(lg n)

49 of 89

Optimization #2: shrink all single subtrees; store prefix

“01”, Alice

“0”, Bob

“”, Carol

“”, David

001

Alice

100

Bob

110

Carol

111

David

0

0

0

1

1

1

This works!

Proofs are length O(lg k)

“kv node”

“diverge node”

50 of 89

Ethereum’s “Merkle PATRICIA tree” has a few quirks

  • Used in two main places
    • overall state
    • per-contract storage
  • 16-ary
    • proofs ~4x longer
  • paths can be < 256 bits
    • “1”, “01” are distinct
    • Really 2257-1 addresses

51 of 89

Gas and transaction limits

52 of 89

Ethereum is like Ryanair: pay to board, then keep paying

53 of 89

Gas in Ethereum is a necessary evil

  • All miners must evaluate all transactions
    • limit computation cost
  • All miners must store all state
    • limit storage use
  • Short-cut the halting problem
    • Finite GAS_LIMIT ensures all programs halt

Observe:

Bitcoin also employs a (crude) means to pay for resources consumed

54 of 89

Every operation has a fixed gas cost

opcodes

gas cost

Basic operations

ADD, MUL, PUSH, JUMP

2-10

Storage read

SLOAD

200

Storage write

SSTORE

5000

Storage write (from zero)

SSTORE

20000

Storage zeroize

SSTORE

-10000

Contract call

CALL, CODECALL, etc.

700

Transaction overhead

n/a

21000

Contract creation

n/a

32000

Contract destruction

SELFDESTRUCT

-19000

55 of 89

Gas metering is complex

  1. Transactions specify START_GAS, GAS_PRICE
  2. If START_GAS ⨉ GAS_PRICE > caller.balance, halt
  3. Deduct START_GAS ⨉ GAS_PRICE from caller.balance
  4. Set GAS = START_GAS
  5. Run code, deducting from GAS
  6. For negative values, add to GAS_REFUND
    1. GAS only decreases
  7. After termination, add GAS + GAS_REFUND to caller.balance

56 of 89

Out-of-gas exceptions are bad news

  • State reverts to previous value
    • Except that START_GAS * GAS_PRICE is still deducted

57 of 89

Callers can choose how much gas to send

A:

function a():

assert(msg.gas == 100);

x = B.b.gas(10)()

return x + “ World!”

B:

function b():

assert msg.gas == 10

y = C.c.gas(5)()

assert(y == 0);

// out of gas

return “Hello”

C:

function c():

assert(msg.gas == 5);

while (true) {

Loop

}

return “Bonjour”

Out of gas

“Hello”

100

10

5

“Hello World!”

58 of 89

Gas today is typically 5⨉10-9 ether = 9⨉10-7 USD

opcodes

gas cost

USD cost [Aug ‘17]

Basic operations

ADD, MUL, PUSH, JUMP

2-10

~10-5

Storage read

SLOAD

200

0.0003

Storage write

SSTORE

5000

0.008

Storage write (from zero)

SSTORE

20000

0.032

Storage zeroize

SSTORE

-10000

-0.016

Contract call

CALL, CODECALL, etc.

700

0.0011

Transaction overhead

n/a

21000

0.033

Contract creation

n/a

32000

0.051

Contract destruction

SELFDESTRUCT

-19000

-0.031

59 of 89

Economics of gas are similar to transaction fees

  • Miners choose transactions based on GAS_PRICE

  • In theory, they should not care which opcodes are used
    • In practice, some “overpriced” opcodes may be preferred

  • Maximum gas limit per block
    • Miners can slowly raise it, each block votes

60 of 89

Solidity

61 of 89

Solidity should look familiar

  • Syntax looks like C++, JavaScript etc.

  • Contracts look like classes/objects
    • Can mark functions internal

  • Static typing
    • Most types can be cast e.g. bool(x)

62 of 89

Solidity types

  • bool, uint8, uint16, ... uint256, int8, ... int256
  • address
  • string
  • byte[]
  • mapping(keyType ==> valueType)

63 of 89

Clever implementation of maps in Solidity

mapping(string => uint256) balances;

  • every item requires at least one 256-bit word
  • balances[“Andrew”] is 0 if “Andrew” doesn’t exist or if “Andrew” has 0 balance
  • to delete a key, set balances[“Andrew’] = 0
  • Cannot delete an entire map!

Alice

15

Bob

15

Joe

100

0

2256

H(“balances”|”Bob”)

H(“balances”|”Joe”)

H(“balances”|”Alice”)

15

15

100

64 of 89

Polite contracts call throw on errors

uint8 numCandidates;

uint32 votingFee;

mapping(address => bool) hasVoted;

mapping(uint8 => uint32) numVotes;

/// Cast a vote for a designated candidate

function castVote(uint8 candidate) {

if (msg.value < votingFee)

return;

if (hasVoted[msg.sender])

throw;

hasVoted[msg.sender] = true;

numVotes[candidate] += 1;

}

Throw ensures no effects persisted except gas consumption

65 of 89

Solidity gotchas (many more tomorrow)

  • Member variables public by default
    • Setters, getters automatically provided
  • Functions must be marked payable to accept funds
  • Member variables go to storage by default
    • Method variables go to memory
  • Fallback function()
    • Called if no function specified (e.g. send)
    • Called if non-existent function called
  • msg.sender vs. tx.origin

66 of 89

Solidity and EVM may outgrow Ethereum itself

- Enterprise Ethereum Alliance, still in infancy (Announced Feb 28)

-Goal: support EVM, Solidity and tools for private blockchains

- maintain compatibility with Ethereum network

67 of 89

Don’t like Solidity? Write your own language!

Ethereum VM Bytecode

Stack Language

Lower-Level Language

Viper

Solidity

Typed, looks like JS

Untyped, looks like python

Looks like Forth.

Defined in Yellowpaper

Your work here

Typed, looks like Go

Bamboo

Looks like Norse poetry

68 of 89

Ethereum project

69 of 89

Ethereum is “run” by the Ethereum Foundation

Compatible “alt-clients” exist (e.g. Parity, Consensys)

70 of 89

Ethereum blockchain is different than Bitcoins

Ethereum

Bitcoin

Target time between blocks

14.5 seconds

10 minutes

Proof of work

Equihash

SHA-2562

Stale block rewards

Uncle rewards

none

71 of 89

Hard Forks are planned in Ethereum

release

date

Frontier

July 2015

Homestead

March 2016

DAO hard fork

July 2016

Metropolis

2017?

Serenity

??

72 of 89

Ethereum has been chasing Bitcoin

Source: coinmarketcap.com

73 of 89

Price has been a wild ride recently

74 of 89

Research challenges

75 of 89

Ethereum makes data public by default

  • Proposals:
    • Project Alchemy-exchange Eth for Zcash
    • SNARKs for token-issuing contracts
      • Acceleration within EVM?
    • Hawk: The blockchain model of cryptography and privacy-preserving smart contracts [Khosba et al. 2016]

76 of 89

Verifying consistency of Ethereum implementations

  • at least 7 EVM implementations
    • C++, Go, Haskell, Java, Python, Ruby, Rust

  • Inconsistency can be exploited to cause a hard fork!

77 of 89

Verifying correctness of Ethereum contracts

Can you spot the bug?

78 of 89

Scaling is limited as nodes verify all contracts

  • Can’t always determine which state a tx will change

  • Goal is to support sharding
    • Most nodes track only a random subset of contracts
    • Super nodes process cross-shard communication
    • Details get complicated... great research topic!

https://github.com/ethereum/wiki/wiki/Sharding-FAQ

79 of 89

Proof-of-stake to replace proof-of-work?

https://medium.com/@VitalikButerin/safety-under-dynamic-validator-sets-ef0c3bbdf9f6

80 of 89

Explore more!

81 of 89

Explore the blockchain: https://etherscan.io

82 of 89

State of the Dapps: https://dapps.ethercasts.com/

83 of 89

More this week!

  • Tuesday: Practical programming exercise!

  • Wednesday: Security issues in smart contract programming

84 of 89

Verifying consistency of Ethereum implementations

  • There are at least 7 EVM implementations
    • C++, Go, Haskell, Java, Python, Ruby, Rust

  • Any EVM inconsistency can be exploited to cause a hard fork!

85 of 89

Bitcoin script instructions

256 opcodes total (15 disabled, 75 reserved)

  • Arithmetic
  • If/then
  • Logic/data handling
  • Crypto!
    • Hashes
    • Signature verification
    • Multi-signature verification

86 of 89

Side note: Namecoin got the incentives badly wrong

% of all Namecoin registrations

#Names claimed

An empirical study of Namecoin and lessons for decentralized namespace design

Harry Kalodner, Miles Carlsten, Paul Ellenbogen, Joseph Bonneau and Arvind Narayanan. WEIS 2015

87 of 89

Three levels of contract call in Ethereum

Original message:

Results of a call to C:

CALL

CALLCODE

DELEGATE CALL

msg.sender

A

B

value

x’≤ B.balance

data

(as specified)

startgas

s’ ≤ gas remaining

storage updated

C

B

from

sig, n

to

data

value

startgas

gasprice

A

sig

B

d

x

S

g

88 of 89

Modifiers ease repetitive safety checks

address public owner;

uint public electionEnd;

modifier onlyBy(address _account){

require(msg.sender == _account);

_;

}

modifier onlyAfter(uint _block) {

require(block.blocknumber >= _block);

_;

}

function endElection()

onlyBy(owner) onlyAfter(electionEnd){

89 of 89

Built-in support for calling other contracts

- Contract member variables if public, automatically defines a “getter”

- Modifiers payable, constant, returns(), also modifiers can be user defined

- Macros / Internal Functions internal modifier -> does not require a “message”

- Type conversions int(x), uint256(x), bool(x)

- Structs, arrays, mappings, memory vs storage

array: int[2] x; hashmap mapping ( int[2] => int );

- Throwing exceptions throw; // exceptions contain no data

- Units (currency: “eth”, “wei”, etc.) 3 * (2 eth)

  • a.send(x) sends x to address a
    • returns 0 if this fails due to call stack

  • foo.call.value(3).gas(20764)( bytes4(sha3("bar()")));
    • also callcode, delegatecall
    • default is 0 value, all available gas

  • new constructor deploys a new contract
    • Careful, it’s expensive!

Remember:

Smart contracts code is fixed forever.

Calls required to update functionality