ReCraft: Self-Contained Split, Merge, and Membership Change of Raft Protocol
Kezhi Xiong1 Soonwon Moon2 Joshua H. Kang1 Bryant Curto1
Jieung Kim3 Ji-Yong Shin1
1Northeastern University 2Seoul National University 3Yonsei University
The 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, June 26, 2025
Reconfiguration
Crucial for liveness and performance of a distributed system
Reconfiguration for distributed systems
Consensus-based services (e.g., ZooKeeper, etcd)
[Figure: KV Store, AI/ML, and Data Analytics systems, each labeled with a Config]
Reconfiguration of consensus-based systems?
[Figure labels: consensus-based services; configuration manager; consensus-based system; configuration manager; consensus-based system; …]
Reconfiguration of consensus-based systems?
Consensus decides new configuration
New configuration decides how consensus works
[Figure: state machine replication log of a consensus-based system, containing an Initial Config entry, Update entries, and a New Config entry]
ReCraft: ❸ Self-Contained Split, Merge, and ❷ Membership Change of ❶ Raft Protocol
❷ Why new membership change?
❸ Why new split and merge protocols?
Protocol | Membership Change | Splitting/Merging |
Raft | Non-stopping | N/A |
Multi-Raft (TiKV, CockroachDB) | N/A | External cluster manager; stopping |
ReCraft | Non-stopping; fault-tolerant & performant; easier to implement | Self-contained; non/minimal-stopping |
❶ Why Raft? Most popular consensus protocol
ReCraft
ReCraft and Raft assumptions and conditions
Q is usually a majority, but can be larger during reconfiguration
Raft Revisited
Total # of nodes = 2f+1 = 3
Quorum size = f+1 = 2
Max # of failures = f = 1
[Figure: three nodes A, B, and C, each starting with entry α (labeled 0). Entries β (labeled 1) and γ (labeled 2) appear as the cluster alternates between leader election and replication.]
Leader Election: Req Vote Term #1, Req Vote Term #2
(Leader must have an up-to-date log)
Replication
Committed due to Quorum Overlap:
all possible majority sets contain "β"
{A, B} {A, C} {B, C} {A, B, C}
New leader carries β
All decisions require an agreement of a quorum
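The quorum-overlap argument can be checked by brute force. The following is a minimal sketch, assuming majority quorums over the three-node cluster {A, B, C} and assuming that β is stored on A and B; the helper name `majority_quorums` is illustrative and not from the talk.

```python
from itertools import combinations

def majority_quorums(nodes):
    """Return every subset of `nodes` that contains at least a majority."""
    nodes = sorted(nodes)
    q = len(nodes) // 2 + 1  # f + 1 out of 2f + 1 nodes
    return [set(c) for k in range(q, len(nodes) + 1)
            for c in combinations(nodes, k)]

cluster = {"A", "B", "C"}
holders_of_beta = {"A", "B"}  # assumption: beta reached a majority

quorums = majority_quorums(cluster)
# Every possible quorum intersects the set of nodes that hold beta,
# so any later election quorum includes a node that carries beta.
every_quorum_sees_beta = all(q & holders_of_beta for q in quorums)
```

With three nodes this enumerates the four sets listed on the slide, and each of them contains A or B.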
Raft Membership Change
Old Config {A, B, C}
New Config (NC) {A,B,C,D,E}
Raft
Joint Config (JC) {A,B,C} {A,B,C,D,E}
[Figure: logs of nodes A to E. The JC entry is replicated first, followed by the NC entry.]
1. activates TWO quorums (JC): checks for Q-old and checks for Q-new
Checks for quorum Q-old: 2 of {A,B,C}
Checks for quorum Q-new: 3 of {A,B,C,D,E}
2. adjusts to Q-new only (NC)
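The two-quorum check under the joint configuration can be sketched as follows. This is a minimal illustration, assuming majority quorums for both Q-old and Q-new; the function names are hypothetical and do not come from any Raft implementation.

```python
def majority(members):
    """Majority quorum size for a set of members."""
    return len(members) // 2 + 1

def joint_quorum_reached(acks, old, new):
    """Under JC, a decision needs a majority of old AND a majority of new."""
    return (len(acks & old) >= majority(old)
            and len(acks & new) >= majority(new))

old = {"A", "B", "C"}
new = {"A", "B", "C", "D", "E"}

ok = joint_quorum_reached({"A", "B", "C"}, old, new)          # 3 of old, 3 of new
no_old = joint_quorum_reached({"A", "D", "E"}, old, new)      # only 1 of old
no_new = joint_quorum_reached({"A", "B"}, old, new)           # only 2 of new
```

Once NC takes effect, only the second condition (Q-new) would remain.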
Raft vs ReCraft Membership Change (see Paper for details)
Old Config {A, B, C}
New Config (NC) {A,B,C,D,E}
Raft
1. activates TWO quorums (JC): checks for Q-old and checks for Q-new
2. adjusts to Q-new only (NC)
ReCraft
Interim Config (N+) {A,B,C,D,E}
1. activates ONE Q-new+ quorum (N+)
Checks for quorum Q-new+: 4 of {A,B,C,D,E}
❶ Minimum overlapping quorum size with Q-old
❷ Naturally, Q-new+ ≥ Q-new
2. adjusts to Q-new (conditional) (NC)
[Figure: logs of nodes A to E under ReCraft. The N+ entry is replicated first, followed by the NC entry.]
When Q-new+ = Q-new,
ReCraft can omit one consensus step
(e.g., adding 2 nodes to 2-node cluster)
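The Q-new+ sizes quoted on the slides can be reproduced by brute force. The sketch below is an illustration only, not the paper's definition: it assumes that nodes are only added, that Q-old and Q-new are majorities, and that Q-new+ is the smallest size, no smaller than Q-new, for which every group of that many new-configuration nodes intersects every Q-old quorum. The name `q_new_plus` is hypothetical.

```python
from itertools import combinations

def majority(n):
    """Majority quorum size for n nodes."""
    return n // 2 + 1

def q_new_plus(old, new):
    """Smallest quorum size over `new` whose every instance overlaps
    every majority quorum of `old` (assumes old is a subset of new)."""
    old_quorums = [set(c) for c in combinations(sorted(old), majority(len(old)))]
    for size in range(majority(len(new)), len(new) + 1):
        candidates = (set(c) for c in combinations(sorted(new), size))
        if all(cand & oq for cand in candidates for oq in old_quorums):
            return size
    return None  # no size works under these assumptions

five = q_new_plus({"A", "B", "C"}, {"A", "B", "C", "D", "E"})
four = q_new_plus({"A", "B"}, {"A", "B", "C", "D"})
```

Under these assumptions the first call yields 4 (larger than Q-new = 3, so a second step adjusts the quorum), and the second yields 3 (equal to Q-new, so the second step can be omitted).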
ReCraft
ReCraft Split Based on Quorum Overlap
[Figure: a five-node cluster {A, B, C, D, E} is split in two steps, EnterJoint (EJ) followed by LeaveJoint (LJ).]
ReCraft Split
[Figure: a five-node cluster {A, B, C, D, E} covering α – ω splits into {A, B, C} and {D, E}, covering α – μ and ν – ω. Each node's log receives an EJ entry and then an LJ entry.]
Conf-old (Epoch: 1): Election: Q-old; Commit: Q-old
EnterJoint (EJ), when received: Election: Q-joint; Commit: Q-old
LeaveJoint (LJ), when received: Election: Q-joint; Commit: Q-new-1 or Q-new-2 (per subcluster)
LeaveJoint (LJ), when committed: Election: Q-new-1 or Q-new-2; Commit: Q-new-1 or Q-new-2 (per subcluster)
Split Done (Epoch: 2 in both subclusters)
Allows non-stopping updates to new config
Commit with smaller # of messages, since Q-joint > Q-old
Epoch: prefix of term #; incremented when Split/Merge succeeds
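The stage-dependent quorum rules for a split can be written down as a small lookup. This is a sketch under stated assumptions: the slides do not define Q-joint, so it is assumed here to mean a majority inside every subcluster, and Q-old, Q-new-1, and Q-new-2 are assumed to be plain majorities. Stage names and function names are hypothetical.

```python
def has_majority(acks, members):
    """True if `acks` contains a majority of `members`."""
    return len(acks & members) >= len(members) // 2 + 1

def has_joint(acks, subclusters):
    # Assumption: Q-joint = a majority inside every subcluster.
    return all(has_majority(acks, sub) for sub in subclusters)

def can_elect(stage, acks, old, subclusters, mine):
    """Election rule for a node whose own subcluster is `mine`."""
    if stage == "conf-old":
        return has_majority(acks, old)
    if stage in ("ej-received", "lj-received"):
        return has_joint(acks, subclusters)
    if stage == "lj-committed":
        return has_majority(acks, mine)
    raise ValueError(stage)

def can_commit(stage, acks, old, mine):
    """Commit rule: Q-old until LJ is received, then the subcluster quorum."""
    if stage in ("conf-old", "ej-received"):
        return has_majority(acks, old)
    if stage in ("lj-received", "lj-committed"):
        return has_majority(acks, mine)
    raise ValueError(stage)

old = set("ABCDE")
sub1, sub2 = set("ABC"), set("DE")
```

For example, after EJ is received, acknowledgements from {A, B, C} are enough to commit (Q-old) but not to elect a leader, because no majority of {D, E} is included.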
What can go wrong?
[Figure: starting from Conf-old at Epoch: 1 (Election: Q-old; Commit: Q-old), A, B, and C hold EJ and LJ and reach Epoch: 2, while D and E lag behind at Epoch: 1. After catching up, D and E also hold EJ and LJ and reach Epoch: 2. The range α – ω becomes α – μ and ν – ω.]
EnterJoint, when received: Election: Q-joint; Commit: Q-old
LeaveJoint, when received: Election: Q-joint; Commit: Q-new-1
LeaveJoint, when committed: Election: Q-new-1; Commit: Q-new-1
D and E can get stuck
If epoch is larger, pull data up to LJ
Epoch: updates only when Split or Merge is done; clearly marks configuration change
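The recovery rule "if epoch is larger, pull data up to LJ" can be sketched as below. This is a simplified illustration, not the protocol's actual message flow: the `Node` class, the `catch_up` function, and the choice to adopt the peer's epoch right after pulling are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    epoch: int
    log: list = field(default_factory=list)

def catch_up(lagging, peer):
    """If `peer` is in a larger epoch, copy its log up to and including LJ.

    Entries after LJ belong to the peer's own subcluster and are not copied.
    Returns True if the lagging node was updated.
    """
    if peer.epoch <= lagging.epoch or "LJ" not in peer.log:
        return False
    cut = peer.log.index("LJ") + 1
    lagging.log = list(peer.log[:cut])
    lagging.epoch = peer.epoch  # simplification: split is treated as done
    return True

a = Node("A", epoch=2, log=["α", "EJ", "LJ", "β"])
d = Node("D", epoch=1, log=["α"])
updated = catch_up(d, a)
```

Here D ends up with the log prefix through LJ and moves to the larger epoch, while the later entry β, written by the other subcluster, is left out.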
ReCraft Merge: Lock-based-2PC over Consensus
[Figure: clusters {A, B, C} (Epoch: 3, α – μ) and {D, E, F} (Epoch: 5, ν – ω) merge into one six-node cluster (Epoch: 6, α – ω). Message labels: TX, C, OK.]
1. Merge Prep (TX): 2PC prepare
2. Merge Commit (C): 2PC commit/abort
3. Snapshot Exchange & Merge: data exchange. Only blocking operation in ReCraft
What can go wrong?
[Figure: clusters {A, B, C} (Epoch: 3, α – μ) and {D, E, F} (Epoch: 5, ν – ω) each log a Merge Prep (TX). Both receive a NO answer, log Merge Abort (A), and keep their epochs and ranges.]
2PC prepare: Merge Prep (TX)
2PC commit/abort: Merge Abort (A)
The main cause of a "NO" answer is an ongoing transaction, our precondition #1
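The prepare and commit/abort flow of the merge can be sketched as a lock-based two-phase commit between two clusters. This is a toy model under stated assumptions: each `Cluster` object stands in for a whole Raft group, each `log.append` stands in for a consensus-replicated entry, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    name: str
    ongoing_tx: bool = False  # acts as the lock for merge transactions
    log: list = field(default_factory=list)

    def prepare(self):
        """Handle Merge Prep: answer NO if a transaction is already ongoing."""
        if self.ongoing_tx:
            return "NO"
        self.ongoing_tx = True
        self.log.append("TX")
        return "OK"

    def finish(self, outcome):
        """Record Merge Commit (C) or Merge Abort (A) and release the lock."""
        self.log.append(outcome)
        self.ongoing_tx = False

def run_merge(coordinator, participant):
    """Return "C" if both sides prepared successfully, otherwise "A"."""
    if coordinator.prepare() != "OK":
        return "A"
    vote = participant.prepare()
    outcome = "C" if vote == "OK" else "A"
    coordinator.finish(outcome)
    if vote == "OK":
        participant.finish(outcome)
    return outcome

x, y = Cluster("ABC"), Cluster("DEF")
committed = run_merge(x, y)

z, busy = Cluster("ABC"), Cluster("DEF", ongoing_tx=True)
aborted = run_merge(z, busy)
```

In the second run the participant already has a transaction in progress, so it answers NO and the coordinator logs an abort, which mirrors the failure case on the slide.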
Evaluation
Adopted Multi-Raft baseline
[Figure: in the Multi-Raft baseline, a Cluster Manager drives the Split Procedures (α – ω into α – μ and ν – ω) and the Merge Procedures (α – μ and ν – ω into α – ω) through steps labeled STOP, REMOVE & RESET, COPY DATA, RESET, ADD MEMBERS, and RUN.]
Split
[Figure: Duration of Performance Dip for Split, ReCraft vs. Multi-Raft; 1 x 6-node cluster to 2 x 3-node clusters; lower is better]
ReCraft does not need data transfer: always constant time
Merge
[Figure: Duration of Performance Dip for Merge, ReCraft vs. Multi-Raft; 2 x 3-node clusters to 1 x 6-node cluster; lower is better]
ReCraft blocks minimally and transfers data in parallel
Multi-Raft serially sends data through the cluster manager
Fault tolerance
Minimum # of node failures to completely stop the split/merge
Operation | ReCraft: Phase 1 | ReCraft: Phase 2 | ReCraft: Phase 3 | Multi-Raft: Standalone Cluster Manager | Multi-Raft: Replicated Cluster Manager |
Split | f_old + 1 | N (f_sub + 1) | - | 1 | f_cm + 1 |
Merge | f_sub + 1 | f_sub + 1 | f_sub + 1 | 1 | f_cm + 1 |
In previous experiments
Operation | ReCraft: Phase 1 | ReCraft: Phase 2 | ReCraft: Phase 3 | Multi-Raft: Standalone Cluster Manager | Multi-Raft: Replicated Cluster Manager |
Split | Fail 6-node cluster = 3 | Fail two 3-node clusters = 4 | - | Fail standalone CM = 1 | Fail one 3-node cluster = 2 |
Merge | Fail one 3-node cluster = 2 | Fail one 3-node cluster = 2 | Fail one 3-node cluster = 2 | Fail standalone CM = 1 | Fail one 3-node cluster = 2 |
Replicated Cluster Manager: triple replication with Raft
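The numbers for the experiments follow from the formulas in the table. The short calculation below is a sketch, assuming that f for an n-node majority-quorum cluster is the largest number of failures that still leaves a majority alive; the variable names are illustrative.

```python
def max_failures(n):
    """Largest f such that a majority of n nodes survives f failures."""
    return (n - 1) // 2

f_old = max_failures(6)  # the 6-node cluster before the split
f_sub = max_failures(3)  # each 3-node subcluster
f_cm = max_failures(3)   # cluster manager, triple-replicated with Raft
n_sub = 2                # N: number of subclusters

split_phase1 = f_old + 1            # stop the 6-node cluster
split_phase2 = n_sub * (f_sub + 1)  # stop both 3-node subclusters
merge_any_phase = f_sub + 1         # stop one 3-node cluster
replicated_cm = f_cm + 1            # stop the replicated cluster manager
```

These evaluate to 3, 4, 2, and 2, matching the per-cell annotations for the experiment setup.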
Conclusion
Thank you! Q & A
More in the paper
Ji-Yong Shin (j.shin@northeastern.edu; https://www.jiyongshin.info)
Extra: ReCraft Membership change vs Raft
Raft vs ReCraft Membership Change (see Paper for details)
Old Config {A, B}
New Config (NC) {A, B, C, D}
Checks for quorum Q-old: 2 of {A,B}
Raft
Joint Config (JC) {A, B} {A, B, C, D}
1. Activates Joint (JC): checks for Q-old and checks for Q-new
Checks for quorum Q-new: 3 of {A,B,C,D}
2. Deactivates Joint Mode (NC)
ReCraft
1. Activates Q-new+ (N+): checks for Q-new+
❶ Min quorum size that overlaps with Q-old
❷ In general, Q-new+ ≥ Q-new
Q-new+ = 3 out of {A, B, C, D}
2. Adjusts quorum to Q-new, if necessary (NC)
Because Q-new+ = Q-new, second step is omitted