1 of 126

Correct-by-construction Casper:�� Binary consensusSharding

ETHcc

March 8th, 2018

Vlad Zamfir :P

2 of 126

Outline

  • Part 1
    • Introduction to consensus
    • Correct-by-construction protocol design
    • CBC Casper safety proof
  • Part 2
    • Specification of binary, integer, and blockchain protocols
    • Safety oracles
    • Liveness
    • Subjective fault tolerance thresholds
    • Performance and Overhead
    • Sharding!
  • Part 3
    • Conclusion and future work
    • How to get involved!

3 of 126

Part 1: Introduction

4 of 126

Introduction to Consensus Pt. 1

Consensus protocols allow nodes on a distributed network to make consistent decisions given mutually exclusive possibilities.

For example, if we are “coming to consensus on a bit”, then the goal of a consensus protocol is to have nodes all output “0” or all output “1.”

5 of 126

Introduction to Consensus Pt. 2

Instead of a boring 0/1, we can also decide on values for:

  1. Integers!
  2. Blockchains!!
  3. Sharded blockchains!!!

6 of 126

Introduction to Consensus Pt. 3

We want our consensus protocol to have safety and liveness.

Safety: if (protocol-following) nodes do decide on values,� then they will decide on values that do not conflict.

Liveness: any (protocol-following) node will eventually make a decision� that accepts or rejects any given value.

7 of 126

Introduction to Consensus Pt. 4

Consensus is tricky because messages can be delayed, reordered or lost. We describe consensus protocols in terms of the adverse conditions their safety and liveness can survive.

A consensus protocol is called “asynchronously safe” if it has consensus safety regardless of timing assumptions.��It is called “Byzantine fault tolerant” for t faults if it survives t nodes behaving arbitrarily rather than following the protocol.

8 of 126

Introduction to Consensus Pt. 5

FLP impossibility showed that it’s impossible for a deterministic protocol to be guaranteed to be safe and live in asynchrony, in the context of any faults.

We will present a family of “correct-by-construction” deterministic, asynchronously safe, Byzantine fault tolerant consensus protocols!

They are not necessarily live in asynchrony (FLP), but they are live in a partially synchronous network.

9 of 126

Correct-by-construction protocol design

10 of 126

“Traditional” approach to protocol design

1) Formally specify the protocol

2) Define properties that the protocol must satisfy (in the case of consensus, something related to safety and liveness)

3) Prove that the protocol has the given properties

11 of 126

The “Correct-by-Construction” Way

1) Formally but only partially specify the protocol

2) Define properties and proofs that protocol must satisfy

3) Derive more of the protocol in a way that is proven to satisfy them

12 of 126

The goal is to make proving the protocol’s correctness almost trivial.

13 of 126

Traditional: Protocol first, analysis second�CBC: Analysis first, protocol second

14 of 126

CBC Casper Safety Proof

15 of 126

CBC Casper Safety Proof�Part 1

16 of 126

CBC Casper Safety Prerequisites

Before we start our safety proof, we do need to define a few things:

The protocol has:

  • Protocol states, Σ
  • Protocol state transitions, T

They describe how individual �(protocol-following) nodes can evolve

17 of 126

CBC Casper Safety Prerequisites

Before we start our safety proof, we do need to define a few things:

We write σ → σ’ to denote the existence of a �protocol state transition from σ to σ’��We insist that we have the properties:

  • σ → σ for all protocol states σ (identity)
  • If σ → σ’ and σ’ → σ’’ then σ → σ’’ (transitivity)

18 of 126

CBC Casper Safety Proof Pt. 1

Before we start our safety proof, we do need to define a few things:

The protocol has:

A function called “the estimator”, E, mapping �protocol states Σ to possible values of the consensus, C

E : Σ → C

19 of 126

CBC Casper Safety Proof Pt. 1

Before we start our safety proof, we do need to define a few things:

In the binary consensus,�the estimator maps protocol states to {0, 1}.

In the blockchain consensus it is the “fork choice rule”, �mapping protocol states to blockchains.

20 of 126

CBC Casper Safety Proof Pt. 1

Before we start our safety proof, we do need to define a few things:

The protocol has:

A type system for reasoning about consensus values.

In the “types-as-formulas” tradition, we take �types to be terms in a logic for describing consensus values.

21 of 126

CBC Casper Safety Proof Pt. 1

Before we start our safety proof, we do need to define a few things:

For a proposition p (a proposition about values of the consensus),�we will write c ⇒ p for consensus value c, if c satisfies p

So we can write E(σ) ⇒ p, �when the estimator on state σ entails proposition p

22 of 126

CBC Casper Safety Proof Pt. 1

Before we start our safety proof, we do need to define a few things:

Estimate safety:

A proposition p is safe at protocol state σ, �if it holds for the estimator for all future protocol states σ’.

S(p, s) ⇔ for all σ’. σ→σ’ ⇒ ( E(σ’) ⇒ p )

23 of 126

CBC Casper Safety Proof Pt. 1

Before we start our safety proof, we do need to define a few things:

Estimate safety:

A proposition p is safe at protocol state σ, �if it holds for the estimator for all future protocol states σ’.

S(p, s) ⇔ for all σ’. σ→σ’ ⇒ ( E(σ’) ⇒ p )

24 of 126

CBC Casper Safety Proof Pt. 1

Lemma 1: σ → σ’ and S(p, σ) ⇒ S(p, σ’) (forward safety)

Lemma 2: σ → σ’ and S(p, σ’) ⇒ ¬S(¬p,σ) (backwards consistency)

2

2

25 of 126

CBC Casper Safety Proof Pt. 1

Theorem (distributed consistency, aka consensus safety)

σ1→σ’, σ2 → σ’ then S(p, σ1) ⇒ ¬S(¬p, σ2)

From S(p, σ1) and lemma 1, we have S(p, σ’)�From S(p, σ’) and lemma 2, we have ¬S(¬p, σ2)

Therefore S(p, σ1) ⇒ ¬S(¬p, σ2)

26 of 126

CBC Casper Safety Proof Pt. 1

This theorem can be interpreted as consensus safety:

We have S(p, σ1) ⇒ ¬S(¬p, σ2), which is equivalent to�⇔ ¬S(p, σ1) ∨ ¬S(¬p, σ2)�⇔ ¬(S(p, σ1) ∧ S(¬p, σ2))��Which says it’s impossible for �p to be safe at σ1 and ¬p to be safe at σ2�(for any σ1 and σ2 with a common future state)

27 of 126

CBC Casper Safety Proof Pt. 1

This theorem can be interpreted as consensus safety:

Estimate safety is a local property:�An estimate is safe if holds for all future protocol states� (from this current protocol state).

Consensus safety is a distributed property: �Protocol-following nodes make mutually-consistent decisions.

28 of 126

CBC Casper Safety Proof Pt. 1

This theorem can be interpreted as consensus safety:

Estimate safety has the “distributed consistency” property �for any collection states with a common future, �S(p,σ) ⇒ ¬S(¬p, σ’)�⇔ ¬(S(p,σ1) ∧ S(¬p, σ2))��As long as nodes have common future protocol states,�Then we can take advantage of our theorem for �decisions that “correspond only to safe properties”.

29 of 126

CBC Casper Safety Proof Pt. 1

This theorem can be interpreted as consensus safety:

By assuming that a protocol-following node at state σ only makes �decisions that correspond only to safe properties at state σ.

I.e. a node at state σ is allowed to make a decision on:

  • a proposition p if S(p, σ)
  • a value c if c ⇒ p implies S(p, σ)

30 of 126

CBC Casper Safety Proof Pt. 1

This theorem can be interpreted as consensus safety:

So if protocol-following nodes:

  • make decisions at state σ that correspond � only to properties that have estimate safety at σ
  • have a common future state

Then we have the property that our “distributed consistency” �property is equivalent to consensus safety.��

31 of 126

CBC Casper Safety Proof Pt. 1

This theorem can be interpreted as consensus safety:

The cbc safety proof (part 1) therefore shows:

Nodes who make decisions on “safe estimates” and �have common future protocol states have consensus safety

32 of 126

CBC Casper Safety Proof�Part 2

33 of 126

CBC Casper Safety Proof Pt. 2

The first part of the safety proof assumes that nodes have a common future protocol state.

The second part will guarantee that protocol-following nodes have common future states (so long as there are not too many Byzantine faults).

34 of 126

Non-triviality and Consensus failure

But first, we will observe that Byzantine faults can cause consensus failure in any non-trivial consensus protocol.

35 of 126

Non-triviality and Consensus failure

We will say that the protocol is non-trivial if there exists any protocol state that isn’t safe on p or on ¬p, there exists a transition to a state σ1 and to a state σ2 such that p is safe at σ1 and ¬p is safe at σ2.

This means that a node at state σ1 and a node at state σ2 do not have a common future protocol state. But they must be able to each get to either of these states.

36 of 126

Non-triviality and Consensus failure

Nodes therefore cannot independently follow any state transitions available to them, but must instead transition through the protocol as a function of interactions with other nodes.

37 of 126

Non-triviality and Consensus failure

We are going to assume that:

  • Some nodes are “named” as consensus-forming nodes
  • Protocol following nodes only experience state transitions when receiving signed messages from these named nodes.

38 of 126

Non-triviality and Consensus failure

Equivocation faults are fundamentally enough to cause consensus failure.

“Equivocation” is behaviour exhibited by a named node that can’t be produced by a single execution of the protocol, but can be produced by more than one protocol execution.

39 of 126

Non-triviality and Consensus failure

Equivocation faults are fundamentally enough to cause consensus failure.

If protocol-following nodes are to make consistent decisions, they must be able to synchronize based on signed messages from the consensus forming nodes.

�However, if each of these nodes are equivocating, they might be running completely independent versions of the protocol for each of our nodes.

40 of 126

Non-triviality and Consensus failure

Equivocation faults are fundamentally enough to cause consensus failure.

If we might be running through protocol states independently, then why should we be able to maintain consensus safety?

�We might be able to have consensus safety if only some proper subset of consensus forming nodes are equivocating.�

41 of 126

Non-triviality and Consensus failure

Equivocation faults are fundamentally enough to cause consensus failure.��The question for us, then, is how do we make sure that nodes have a common future protocol state as long as there is less than some number of equivocations?

Well first, let’s be sure that we can count equivocations!

42 of 126

CBC Casper Safety Proof Pt. 2

We now insist that messages attest that the sender was at a particular protocol state at the time of sending:

�Check if there exists a single protocol execution through each�the protocol states observed in the sender’s messages.

43 of 126

CBC Casper Safety Proof Pt. 2

From a set of messages from many senders, we can detect equivocation from each sender, and check how many of them equivocated

If the validators have weight, then we can calculate the �total weight of equivocating validators

44 of 126

CBC Casper Safety Proof Pt. 2

Tying the knot:�

Let protocol states be sets of messages that exhibit�not more than some number (weight) of Byzantine faults

45 of 126

CBC Casper Safety Proof Pt. 2

Tying the knot:

�Receiving (or making) a message that says “I’m at this state” �represents a protocol state transition

�Receiving a message m at protocol state σ means moving �to state σ + m, if it doesn’t exhibit more than t faults.�

46 of 126

47 of 126

CBC Casper Safety Proof Pt. 2

Tying the knot:��Any message is an equivocation w.r.t. some other message � (that may actually never exist).��So receiving m at σ means no longer having a �common future protocol state with any set of messages σ’ �such that σ + σ’ doesn't exhibit more than t faults, �but σ + σ’ + m does

48 of 126

CBC Casper Safety Proof Pt. 2

Tying the knot:�

Two nodes have a common future state if there is not�more than t Byzantine faults in their union!

49 of 126

CBC Casper Safety Proof Pt. 2

Tying the knot:

σ1 and σ2 have a common future σ1 + σ2 ... if and only if σ1 + σ2 exhibits not more than t Byzantine faults

So: not more than t Byzantine faults �⇒ nodes only visit states with common futures

50 of 126

CBC Casper Safety Proof Pt. 2

Tying the knot:

So:

nodes exhibit not more than t Byzantine faults �⇒ nodes only visit states with common futures�⇒ consensus safety

51 of 126

CBC Casper Safety Proof Pt. 2

Tying the knot:

And:

consensus failure �⇒ nodes do not have a common future�⇒ nodes are at states whose union exhibit � more than t faults

Which is what was wanted!

52 of 126

Part 2: Specifications

53 of 126

Casper the Friendly Binary Consensus

For each of these protocols:

We will define:

  • protocol messages
    • So protocol states and their transitions are defined � (sets of messages with no more than t faults)
  • an estimator
    • So that we can relate protocol states to values of the consensus

�Because that’s all we need to satisfy the cbc framework

54 of 126

Casper the Friendly Binary Consensus

55 of 126

Casper the Friendly Binary Consensus

Protocol messages:�The set of protocol messages M is parametric in a set of validator names, V, which are the names of consensus forming nodes.

Protocol messages in FBC have three parts:

  • estimate (0 or 1)
  • sender (a validator name)
  • justification (a protocol state)
    • (i.e. a set of protocol messages).

56 of 126

Casper the Friendly Binary Consensus

Definitions:

Dependency: The dependencies of a message m are all of the messages in the justification of that message m (or in their justifications, or in their justifications).

Latest message: The latest message from a validator v in some protocol state σ is the message m with the properties:

  • m has v as a sender and
  • m is not in the dependency of any other m’ from v in σ.

57 of 126

Casper the Friendly Binary Consensus

Binary estimator:

Before we define the estimator for our binary consensus protocol,�let us define the score of an estimate e in some protocol state σ:

�The score for estimate e in protocol state σ is the sum of the weights of validators whose latest messages have an estimate e.

58 of 126

Casper the Friendly Binary Consensus

Binary estimator:

Now, we say that, for any protocol state, the estimator returns the estimate with the highest score.

For example, if more validators (by weight) have latest messages with the estimate “1”, then the estimator will return “1.”

59 of 126

Casper the Friendly Binary Consensus

Binary estimator:

If there is a tie, the estimator returns:

  • An exception, or
  • The proposition “0 or 1”, or
  • A non-deterministic choice of 0 or 1
  • Always returns 0, or
  • Always returns 1

Any of these are fine!

60 of 126

61 of 126

62 of 126

63 of 126

64 of 126

Casper the Friendly Integer Consensus (with median)

65 of 126

Casper the Friendly Integer Consensus

Protocol messages:

Just as binary, protocol messages have three parts:

  • an estimate (an integer),
  • a sender (a validator name), and
  • a justification (a protocol state)
    • (which itself is a set of protocol messages.)

66 of 126

Casper the Friendly Integer Consensus

Integer Estimator:

Instead of choosing the estimate with the highest score, as in the binary consensus, we choose the median of the integers given in latest message…

�...the median is weighted by the validators weights.

67 of 126

68 of 126

69 of 126

Casper the Friendly GHOST

70 of 126

Casper the Friendly Ghost

Protocol messages:

Blockchain protocol messages also have 3 parts:

  • an estimate (a block),
  • a sender (a validator name), and
  • a justification (a protocol state)
    • (which itself is a set of protocol messages)

You can think of the estimate as the previous block pointer!

71 of 126

Casper the Friendly Ghost

GHOST estimator:

Also only based on latest messages

72 of 126

Casper the Friendly Ghost

GHOST estimator:

Also only based on latest messages

73 of 126

74 of 126

75 of 126

76 of 126

Safety Oracle

77 of 126

A Simple Safety Oracle: “Clique Oracle”

The safety proof ensures that nodes that make decisions on safe estimates make consensus-safe decisions.

But we haven’t said how we might determine whether a proposition is safe at a given protocol state!

78 of 126

A Simple Safety Oracle: “Clique Oracle”

While we don’t need to detect safety to enjoy consensus safety on implicit decisions,

if we ever want to report or use our decisions we still want to be able to detect safe estimates.

We call our safety detectors “safety oracles”.

79 of 126

A Simple Safety Oracle: “Clique Oracle”

An ideal safety oracle OIdeal returns “True” on a given estimate p and state s�⇔ an ideal adversary fails to change a node’s estimate

But the ideal adversary can be computationally quite demanding, so we will describe one oracle that can be used to detect safe estimates for some protocols.

An oracle is safe if it has the property that O(p, σ) => OIdeal(p, σ)

80 of 126

A Simple Safety Oracle: “Clique Oracle”

The “clique oracle” will work for protocols with the property that: ��If a majority of latest messages from non-equivocating validators in a protocol state σ agree with an estimate e, then the estimator also entails e: E(σ) => e

81 of 126

A Simple Safety Oracle: “Clique Oracle”

So we can define “agreement” between estimates 𝒆 and 𝒆’ for each protocol:

  • Binary / Integer: 𝒆 = 𝒆’ (both are the same bit/number)
  • Blockchain: 𝒆↓𝒆’ (one block is “in the blockchain” of the other)

82 of 126

A Simple Safety Oracle: “Clique Oracle”

Given some protocol state σ, an “𝒆-clique” in σ is a set of validators where:

  • Each of the validators’ latest message agrees with estimate 𝒆.
  • Each can see each other agree with 𝒆, in their latest messages.
  • Each can’t see new latest messages from each that disagree with 𝒆.

83 of 126

A Simple Safety Oracle: “Clique Oracle”

These validators in an 𝒆-clique cannot “convince” each other to disagree with 𝒆.

If they have more than half of the weight, then neither can other validators…

(who have less than half of the weight)

...at least assuming that there aren’t any equivocating nodes in the 𝒆-clique.

84 of 126

A Simple Safety Oracle: “Clique Oracle”

If:

  • There is an e-clique in σ of non-Equivocating nodes with total weight We
  • The total weight of all validators is Wt

Then:

  • 𝒆 is safe in σ for a protocol with fault tolerance t < 2·We – Wt

85 of 126

A Simple Safety Oracle: “Clique Oracle”

The estimator will agree with the largest majority of latest messages.�So at least enough equivocation is required to create a bigger weight majority than�available in the clique: We.

�The minority has Wt - We weight

So we need at least f faults, by weight, such that Wt - We + f > We

⇔ f > 2We - Wt

86 of 126

Subjective Fault Tolerance Thresholds

87 of 126

Subjective Fault Tolerance Thresholds

One node can run protocol states as sets of messages exhibiting less than t1 faults, while another runs a protocol with the same messages but only when they exhibit less than t2 faults

Two nodes have a common future state (guaranteeing our consensus safety)�if the union of their views has less than min(t1,t2) faults

88 of 126

Liveness

89 of 126

Liveness

Consensus protocols have “liveness” if nodes are guaranteed to eventually make a decision]

FLP Impossibility states that it’s impossible to be both safe and live in an asynchronous network. (For deterministic protocols without cryptography)

All of our discussion so far has been without any timing assumptions (hence asynchronous safety)

90 of 126

Liveness

Indeed, we have said nothing about when nodes should make and send messages to each other

Instead, we’ve been speaking about arbitrary additions to arbitrary sets of protocol messages

This means that we can’t give a liveness proof (even in synchrony/partial synchrony) for the current protocol specification

91 of 126

Liveness

However, as we have seen in observations, there are many protocol states with safety.

And nodes can coordinate timeouts in order to produce a DAG of messages corresponding to these protocol states, in a synchronous network (and eventually in a partially synchronous network).

This means that we could have liveness in synchrony/partial synchrony, if we additionally recommended a message passing DAG, like “a blockchain”.

92 of 126

Overhead and Performance

93 of 126

94 of 126

95 of 126

96 of 126

97 of 126

98 of 126

99 of 126

100 of 126

Casper the Friendly Sharded Blockchain

�(with a hierarchical binary sharding and �fork-choice enforced atomicity of cross-shard blocks)

101 of 126

Casper the Friendly Sharded Blockchain

The safety proof says nothing about which protocol states nodes actually are required to eventually visit.

It therefore will allow for nodes to never see all blocks from all shards, while still providing the distributed consistency/consensus safety property!

102 of 126

Casper the Friendly Sharded Blockchain

So we are going to follow the same process as normal,

but instead of a bit, an integer, or a blockchain;

this time with a sharded data structure

103 of 126

Sharded Block Structure

Protocol messages:

Very similar to the blockchain, but:

  • There’s a chain for every shard
  • They can share “merge blocks”

104 of 126

Sharded Block Structure

Protocol messages:

Merge blocks:

  • Have previous block pointers to, and
  • Can be the previous block of

blocks on the merged shards

105 of 126

Sharded Block Structure

Protocol messages:

106 of 126

Sharded fork choice rule

The fork choice rule returns a blockchain for every shard

107 of 126

Fork-choice rule enforced atomicity of merge blocks

The fork choice rule enforced atomicity of merge blocks:

The merge block is in the fork choice of all of the shards being merged�Or it’s in the fork choice of none.

108 of 126

Fork-choice enforced atomicity of merge block safety

109 of 126

Fork-choice enforced �plausible atomicity of safe merge blocks

This is the weakest atomicity definition in this model:

110 of 126

Binary hierarchical sharded fork choice rule

We’re only going to allow some shards to have merge blocks�

Namely the ones that are connected by an edge, here:

111 of 126

Binary hierarchical sharded fork choice rule

Shards that are lower in the hierarchy follow their parent

I.e. the fork choice of the child is a function of the fork choice of the parent��But not visa versa

This allows nodes to stay on a single branch of the tree

112 of 126

Binary hierarchical sharded fork choice rule

Shards that are lower in the hierarchy follow their parent

They start their fork choice at the most recent merge block in the fork choice of their parent

113 of 126

Binary hierarchical sharded fork choice rule

Shards that are lower in the hierarchy follow their parent

The fork choice ignores or orphans any merge blocks that were orphaned by its parent’s fork choice

114 of 126

Binary hierarchical sharded fork choice rule

Shards that are lower in the hierarchy follow their parent

The parent can orphan blocks from the child shard

115 of 126

Binary hierarchical sharded fork choice rule

Shards that are lower in the hierarchy follow their parent

But the parent must only include merge blocks that are later than the previous merge blocks, for the child shard

116 of 126

Binary hierarchical sharded fork choice rule

So we are able to define a fork choice rule that guarantees the atomicity of cross-shard blocks, in this binary hierarchical sharding!

117 of 126

Binary hierarchical sharded fork choice rule

We will assign validators uniformly across every branch of the tree, so each only needs to do the fork choice rule for N shards out of 2N - 1

118 of 126

Part 3: Conclusion and Future work

119 of 126

Future Work

Research on the cbc-process:

  • Further analysis of the cbc protocol state structure and its derivation
  • Liveness Strategies/Measures/Proofs
  • Validator set rotation for all CBC Casper protocols
    • For now only for the friendly GHOST
  • New and more efficient safety oracles
    • Integer with average rather than median?
  • Estimators that return propositions rather than values

120 of 126

Future Work

Research on sharding:

  • Validator separation of duties
    • with bounded overhead
  • Satisfying weaker merge block atomicity definitions
  • Dynamic shard structure
    • Load balancing
    • Temporarily isolated shards
  • Heterogenous sharding

121 of 126

Future Work

  • Education
  • Implementation work
    • Prototype → testnet → production

122 of 126

Conclusion

The correct-by-construction framework for generating consensus protocols is pretty cool…

We were able to define lots of consensus protocols (including sharding!) without doing almost any extra analysis on a case-by-case basis!

123 of 126

Conclusion

We can explore lots of different liveness strategies, overhead/performance tradeoffs,

All without thinking about the maintenance of our Byzantine fault tolerant asynchronous safety.

124 of 126

Conclusion

This flexibility is useful for iterating fast, and progress has been accelerating!

The framework has been laid out for you to contribute!

Don’t be shy :)

125 of 126

Check out the docs, the repo, and help us!

Read the papers:

Contribute to the codebase: tinyurl.com/cbc-casper-code

Join the gitter and make yourself known: tinyurl.com/cbc-casper-chat

�Ask me about how you can contribute to cbc-casper research!

126 of 126

Thank you for listening�Thank you ETHcc! <3