Raft and Distributed Consensus
Michael
1
A Stateful Service
2
3
client
x:4
x←4
4
client
x:4
ok
High Availability
5
6
S
client 1
M
S
client 2
Challenges
7
Leader Who?
8
S
client 1
M
S
client 2
✘
Brain Split
9
S
client 1
M
S
client 2
Raft
10
11
2
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
1
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
client 2
3
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
client 1
Three Parts of Raft
12
Leader Election
13
Server States
14
Follower
Candidate
Leader
win election
no heartbeat
start/recover
discover higher term
randomized timeout
discover current leader or new term
passive but expects a leader
request_vote RPC
append_entries RPC
Terms/Quorum(Majority Vote)
15
Raft divide time into Terms
Terms
16
Each Term starts with an election
Terms
17
If election timeout, it is fine, just wait for a new one to start.
Election Algorithm
18
current_term += 1
vote for self
ask for vote from all peers
Election Outcome 1
19
Simple Election
20
F
F
F
5 ms
T=0
150 ms
T=0
50 ms
T=0
Simple Election
21
F
C
F
T=1
150 ms
T=0
50 ms
T=0
Simple Election
22
F
C
F
T=1
32 ms
T=1
172 ms
T=1
ok
ok
Simple Election
23
F
L
F
T=1
121 ms
T=1
188 ms
T=1
Election Outcome 2
24
Learn About the Winner
25
F
F
F
5 ms
T=0
4 ms
T=0
231 ms
T=0
Learn About the Winner
26
F
C
C
112 ms
T=1
341 ms
T=1
226 ms
T=0
vote for the first request
Learn About the Winner
27
F
L
C
112 ms
T=1
341 ms
T=1
183 ms
T=1
no, I have voted for other node
ok
Learn About the Winner
28
F
L
C
112 ms
T=1
341 ms
T=1
183 ms
T=1
Learn About the Winner
29
F
L
F
T=1
177 ms
T=1
214 ms
T=1
Election Outcome 3
30
Split Vote
31
F
F
F
5 ms
T=0
4 ms
T=0
7 ms
T=0
Split Vote
32
C
C
C
217 ms
T=1
251 ms
T=1
152 ms
T=1
Split Vote
33
C
C
C
52 ms
T=1
242 ms
T=1
177 ms
T=1
Split Vote
34
C
C
C
T=2
190 ms
T=1
125 ms
T=1
Split Vote
35
F
L
F
T=2
153 ms
T=2
225 ms
T=2
ok
ok
Election Correctness
36
Log Replication
37
Log Entry
38
3
x←4
4
index
term
cmd
Log Structure
39
server 1:
server 2:
server 3:
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
3
x←3
8
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
2
y←x
5
leader
Replicate Log with AppendEntris Consistency Check
40
Replicate Log
41
server 1:
server 2:
server 3:
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
3
x←3
8
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
2
y←x
5
✓match
leader
Replicate Log
42
server 1:
server 2:
server 3:
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
3
x←3
8
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
2
y←x
5
✓match
3
x←3
8
leader
Replicate Log
43
3
x←3
8
server 1:
server 2:
server 3:
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
3
x←3
8
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
2
y←x
5
✘mismatch
leader
Replicate Log
44
3
x←3
8
server 1:
server 2:
server 3:
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
3
x←3
8
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
2
y←x
5
✘mismatch
leader
Replicate Log
45
3
x←3
8
server 1:
server 2:
server 3:
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
3
x←3
8
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
2
y←x
5
✘mismatch
leader
Replicate Log
46
3
x←3
8
server 1:
server 2:
server 3:
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
3
x←3
8
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
2
y←x
5
✓match
leader
Replicate Log
47
3
x←3
8
server 1:
server 2:
server 3:
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
3
x←3
8
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
✓match
3
x←3
8
3
y←7
5
3
x←0
6
3
y←9
7
leader
Committed Entries
48
Committed Entries
49
server 1:
server 2:
server 3:
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
3
x←3
8
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
2
y←x
5
leader
Committed Entries
50
server 1:
server 2:
server 3:
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
3
x←3
8
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
3
y←7
5
3
x←0
6
3
y←9
7
1
x←2
1
1
y←3
2
1
x←4
3
2
z←x
4
2
y←x
5
leader
Safety
51
Election Restriction
52
53
54
Reject
One Edge Case
55
Fig 8
56
because S1 is at T4, it replicates the past term entry in its log at T2. It has not replicated the current term T4 entry successfully yet.
Remember committed(recompuated and start from 0 whenever a new leader is elected) needs two requirements:
Leader Completeness Property
57
Fig 9
58
Client Interaction
59
60
F
client 1
L
F
client 2
x←4
z←x
61
2
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
1
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
client
3
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
z←x
62
2
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
1
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
client
3
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
z←x
z←x
63
2
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
1
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
client
3
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
z←x
z←x
64
2
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
1
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
client
3
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
z←x
z←x
z←x
z←x
65
2
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
1
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
client
3
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
z←x
z←x
z←x
z←x
66
2
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
1
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
client
3
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
z←x
z←x
z←x
ok, 4
67
2
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
1
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
client
3
x←2
y←3
x←4
Log
Consensus
Module
App State
Machine
z←x
z←x
z←x
Run Away Example
68
Runaway Example
69
F
L
F
F
F
Case 1
70
Runaway Example: Case 1
71
F
F
F
L
F
Runaway Example: Case 1
72
F
L
F
L
F
Runaway Example: Case 1
73
F
L
F
L
F
who has a higher term ?
Case 2
74
Runaway Example: case 2
75
F
L
F
F
F
Runaway Example: case 2
76
F
L
F
C
C
Term 82
Term 3
Runaway Example: case 2
77
F
L
F
C
C
Term 82
Term 82
Runaway Example: case 2
78
F
F
F
C
C
Term 82
Term 82
vote for me, please
Runaway Example: case 2
79
F
F
F
C
C
Term 82
Term 82
reject, because log
Runaway Example: case 2
80
C
F
F
C
C
Term 82
Term 83
Runaway Example: case 2
81
L
F
F
F
F
Term 83
Term 83
ok
ok
ok
ok
Reference
82