Lab 4 Parts 1 & 2
CSE 452 Spring 2026
Announcements
Lab 3 Wrap up
Lab 4 Overview
Goal: Build a “linearizable, sharded key-value store with multi-key updates and dynamic load balancing, similar in functionality to Amazon's DynamoDB or Google's Spanner”.
Dynamic load balancing (with a shard controller) is Part 1
Sharding is Part 2 (and extending Part 1)
Multi-key updates (transactions and two-phase commit) is Part 3
Now, how do we do all of that?
Lab 4 Part 1 - ShardMaster
Sharding: what is it?
Terminology
Lab 4
Sharding/partitioning
Peer 0
Peer 1
Peer 2
Paxos Group 0
Peer 0
Peer 1
Peer 2
Paxos Group 1
Peer 0
Peer 1
Peer 2
Paxos Group 2
Put(A, 0)
Get(B)
Append(C, 123)
Shards vs Groups
Peer 0
Peer 1
Peer 2
Paxos Group 0
Peer 0
Peer 1
Peer 2
Paxos Group 1
Peer 0
Peer 1
Peer 2
Paxos Group 2
Shard 1
(a-d)
Shard 2
(e-h)
Shard 3
(i-l)
Shard 4
(m-p)
Shard 5
(q-t)
Shard 6
(u-x)
Shard 7
(y-z)
Each group of servers is in charge of shardNum(s)
ShardMaster
ShardMaster continued
ShardMaster Application
Join
Leave
Move
Query
Join Example
Shard | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Group | null | null | null | null | null | null | null | null | null | null |
Configuration: -1 [should return Error if queried]
Join(1)
Worksheet part 1
Join Example
Shard | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Group | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Configuration: 0
Join(2)
Join Example
Shard | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Group | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
Configuration: 1
Join(5)
Join Example
Shard | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Group | 1 | 1 | 1 | 5 | 5 | 2 | 2 | 2 | 2 | 5 |
Configuration: 2
Some series of transitions occur...
Leave Example
Shard | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Group | 1 | 1 | 5 | 2 | 2 | 7 | 7 | 4 | 6 | 6 |
Configuration: 9
Leave(1)
Leave Example
Shard | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Group | 5 | 4 | 5 | 2 | 2 | 7 | 7 | 4 | 6 | 6 |
Configuration: 10
Some tips
Lab 4 Part 2 - Replication for Sharded K/V
Read the spec!
Part 2 in some bullet points
As a note, just because you pass the part 1 tests, it doesn’t mean that the ShardMaster is 100% correct. ��100% code coverage is hard.
Overview: System
Paxos Replica Group 1
ShardStoreServer
ShardStoreServer
ShardStoreServer
Paxos Replica Group 2
ShardStoreServer
ShardStoreServer
ShardStoreServer
Paxos Replica Group 3
ShardStoreServer
ShardStoreServer
ShardStoreServer
ShardMaster
Client
Client
Client
PaxosRequest: Please give me the configuration
Note: Query is a Command
ShardStoreRequest: Paxos Replica Group 2, please append(“cat”, “concatenate files and print to stdOut”)
This is running PaxosServers with a ShardMaster app
1A
1B
2: “Client requests” -> shard’s group of servers
3: Paxos within group
4: Servers will execute eventually after consensus + reply
Overview: ShardMaster
Paxos Group Responsible for ShardMaster
P
P
P
P
P
ShardMaster is an Application that is running within a PaxosServer:
new PaxosServer(address, shardMasters.clone(), new ShardMaster(numShards));
Overview: Paxos Replica Group
Paxos Replica Group
ShardStoreServer
ShardStoreServer
ShardStoreServer
P
P
P
A zoomed in version of a Paxos Replica Group, where P stands for Paxos Subnode. Lines between ShardStoreServer and its P subnode indicate message passing between the two.
Message Passing between ShardStoreServer and PaxosSubnode
ShardStoreServer
PaxosSubnode
PaxosRequest
PaxosDecision
When executing, pass up PaxosDecision using handleMessage(decision, supernodeAddress) instead of doing app.execute. [Still in order.]
decision = what to process next on ShardStoreServer, e.g. reconfig or put or something
(Almost) everything should be Paxos replicated, so wrap stuff in a PaxosRequest and call handleMessage(request, paxosSubnode)
If leader, propose to other Paxos nodes
Gets decision or learns it’s chosen
Overview: Paxos Replica Group’s Subnode
public void init() {
// Setup Paxos
paxosAddress = Address.subAddress(address(), PAXOS_ADDRESS_ID);
Address[] paxosAddresses = new Address[group.length];
for (int i = 0; i < paxosAddresses.length; i++) {
paxosAddresses[i] = Address.subAddress(group[i], PAXOS_ADDRESS_ID);
}
PaxosServer paxosServer =
new PaxosServer(paxosAddress, paxosAddresses, address());
addSubNode(paxosServer);
paxosServer.init();
...
}
Init for ShardStoreServer
Code is also in spec
This won’t work until you add the constructor, see next slide
Constructing the Paxos Sub Node
public PaxosServer(Address address, Address[] servers, Address parentAddress) {
super(address); // 'address' is the address of this node
this.servers = servers;
this.parentAddress = parentAddress;� // 'parentAddress' is the address of the 'parent' ShardStoreServer
// Again, just call handleMessage(decision, this.parentAddress);
// Note: There is no app.
}
Changes for PaxosServer
ShardStoreClient
Messages
Suggestion for implementation
handleShardStoreRequest(Request m) {
// Do some validation (e.g. is the request being sent to the correct group, etc...)
// If valid, call process (coming in a slide or two) with replicated as false
}
Suggestion for implementation (continued)
Suggestion for implementation (continued)
private void process(Command command, boolean replicated) {
if (command instanceof ShardMove) {
processShardMove((ShardMove) command, replicated);
} else if (command instanceof ShardMoveAck) {
processShardMoveAck((ShardMoveAck) command, replicated);
} else if (command instanceof NewConfig) {
processNewConfig((NewConfig) command, replicated);
} else if (command instanceof AMOCommand) {
processAMOCommand(command, replicated);
}
// Add cases for Lab 4 Part 3
else {
LOG.severe("Got unknown command: " + command);
}
}
private void processShardMove(ShardMove m, boolean replicated) {
// some checks to see if this is valid command
if (!replicated) {
// propose via paxos. process will call this method again
// with replicated=true from handlePaxosDecision
// Propose m to Paxos subnode (Some wrapping may be required)
return;
}
// the message was replicated can now act upon it
// do stuff here
}
private void handleShardMoveMessage(ShardMoveMessage m, Address sender) {
process(m.command(), false /* It’s not replicated yet… */ );
}
private void handlePaxosDecision(PaxosDecision p, Address sender) {
process(p.command(), true /*It’s replicated (from Paxos) */ );
}
Suggestion for implementation (continued)
processXX(Message m, Address a, boolean replicated) {
// ...
// What if this command has been executed?
// What if this is not an active Config?
// ...
if (!replicated) {
paxosPropose(m.command());
return;
}
// Yay, it’s replicated! Now we can act upon the message, here...
}
Idea: For a given message, you will go into the processXX method twice (once before the message is replicated (in which case you will propose it via Paxos and then return) and the second time after the message has already been replicated via Paxos
Reconfiguration and More
Overview: System
Paxos Replica Group 1
ShardStoreServer
ShardStoreServer
ShardStoreServer
Paxos Replica Group 2
ShardStoreServer
ShardStoreServer
ShardStoreServer
Paxos Replica Group 3
ShardStoreServer
ShardStoreServer
ShardStoreServer
ShardMaster
Client
Client
Client
PaxosRequest: Please give me the configuration
Note: Query is a Command
PaxosRequest: Please give me the most recent configuration
Note: Query is a Command
configNum 5
G1: shard 1, 2, 3
G2: shard 4, 5, 6
configNum 6
G1: shard 1, 2
G2: shard 4, 5
G3: shard 3, 6
Reconfiguration
What does a configuration need to move?
Be careful about when guaranteeing at-most-once semantics for key-value operations. When a server sends shards to another, the server needs to send AMOApplication state as well. Think about how the receiver of the shards should send its AMOApplication state.
Why Can’t This Happen?
Config 10
Group 1: shards 1,2,3
Group 2: shards 4,5,6
Group 3: shards 7,8,9,10
Config 11
Group 1: shards 2,3,4 (sent shard 1, received shard 4)
Group 2: shards 5,6 (sent shard 4)
Group 3: shards 8,9,10 (sent shard 7)
Group 4: shards 1,7 (received shards 1 and 7)
There is an example of what can go wrong on the slides
Servers should not query with (-1)
Instead, make sure your servers query with (config num + 1)
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 1
(0, [1,2,3,4], [], {}, false)
Group 2
(0, [], [], {}, false)
ShardMasters
Note that messages need to be replicated for the state to be consistent among the servers in the groups.
(config num, shards owned, shards needed, shards to move, inReconfig)
Note: Shards created in first config
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 1
(0, [1,2,3,4], [], {}, false)
Group 2
(0, [], [], {}, false)
ShardMasters
(config num, shards owned, shards needed, shards to move, inReconfig)
Query(1)
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 1
(0, [1,2,3,4], [], {}, false)
Group 2
(0, [], [], {}, false)
ShardMasters
NewConfig
(Group 1, 1, {1:[1,2], 2:[3,4]})
(config num, shards owned, shards needed, shards to move, inReconfig)
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 1
(1, [1,2], [], {2:{3,4}}, true)
Group 2
(0, [], [], {}, false)
ShardMasters
(config num, shards owned, shards needed, shards to move, inReconfig)
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 1
(1, [1,2], [], {2:{3,4}}, true)
Group 2
(0, [], [], {}, false)
ShardMasters
(config num, shards owned, shards needed, shards to move, inReconfig)
ShardMove(Group 2, 1, {3,4})
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 1
(1, [1,2], [], {2:{3,4}}, true)
Group 2
(0, [], [], {}, false)
ShardMasters
(config num, shards owned, shards needed, shards to move, inReconfig)
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 1
(1, [1,2], [], {2:{3,4}}, true)
Group 2
(0, [], [], {}, false)
ShardMasters
(config num, shards owned, shards needed, shards to move, inReconfig)
Query(1)
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 2
(0, [], [], {}, false)
ShardMasters
NewConfig
(Group 2, 1, {1:[1,2], 2:[3,4]})
(config num, shards owned, shards needed, shards to move, inReconfig)
Group 1
(1, [1,2], [], {2:{3,4}}, true)
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 2
(1, [], [3,4], {}, true)
(config num, shards owned, shards needed, shards to move, inReconfig)
ShardMasters
Group 1
(1, [1,2], [], {2:{3,4}}, true)
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 2
(1, [], [3,4], {}, true)
(config num, shards owned, shards needed, shards to move, inReconfig)
ShardMasters
Group 1
(1, [1,2], [], {2:{3,4}}, true)
ShardMove(Group 2, 1, {3,4})
onShardMoveTimer(1, {2:{3,4}})
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 2
(1, [3, 4], [], {}, false)
(config num, shards owned, shards needed, shards to move, inReconfig)
ShardMasters
Group 1
(1, [1,2], [], {2:{3,4}}, true)
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 2
(1, [3, 4], [], {}, false)
(config num, shards owned, shards needed, shards to move, inReconfig)
ShardMasters
Group 1
(1, [1,2], [], {2:{3,4}}, true)
ShardMoveAck(Group 1, 1, Group 2)
Reconfiguration
Might be (slightly) different if you’re doing transactions (part 3)!
Group 2
(1, [3, 4], [], {}, false)
(config num, shards owned, shards needed, shards to move, inReconfig)
ShardMasters
Group 1
(1, [1,2], [], {}, false)
Reconfig Deadlock-ish Problem
ShardMasters
Group 1
(1, [1,2], [], {}, false)
Group 2
(1, [3,4], [], {}, false)
(config num, shards owned, shards needed, shards to move, inReconfig)
Query(-1)
Reconfig Deadlock-ish Problem
ShardMasters
Group 1
(1, [1,2], [], {}, false)
Group 2
(1, [3,4], [], {}, false)
(config num, shards owned, shards needed, shards to move, inReconfig)
NewConfig
(Group 1, 2,
{1:[1,2,3], 2:[4]})
Reconfig Deadlock-ish Problem
ShardMasters
Group 1
(2, [1,2], [3], {}, true)
Group 2
(1, [3,4], [], {}, false)
(config num, shards owned, shards needed, shards to move, inReconfig)
Reconfig Deadlock-ish Problem
ShardMasters
Group 1
(2, [1,2], [3], {}, true)
Group 2
(1, [3,4], [], {}, false)
(config num, shards owned, shards needed, shards to move, inReconfig)
Query(-1)
Reconfig Deadlock-ish Problem
ShardMasters
Group 1
(2, [1,2], [3], {}, true)
Group 2
(1, [3,4], [], {}, false)
(config num, shards owned, shards needed, shards to move, inReconfig)
NewConfig
(Group 2, 3,
{1:[1,2], 2:[3,4]})
Reconfig Deadlock-ish Problem
ShardMasters
Group 1
(2, [1,2], [3], {}, true)
Group 2
(3, [3,4], [], {}, false)
(config num, shards owned, shards needed, shards to move, inReconfig)
Stuck in reconfig because Group 2 won’t send it shard 3 now :(
Reconfig Deadlock-ish Problem
ShardMasters
Group 1
(2, [1,2], [3], {}, true)
Group 2
(3, [3,4], [], {}, false)
(config num, shards owned, shards needed, shards to move, inReconfig)
Solution: servers should query with (config num + 1) rather than (-1)
Thus, Clients and Servers should only accept increasing configuration numbers
Clients want the most up-to-date state: Query(-1)
Servers want the state for the next configuration that they can transition to, if there is one
Reconfiguration: A Potential Workflow
Maybe: Storing Heard Requests during a Reconfig
... | NewConfig | Command C | ShardMove | |
Command C
Gets rejected because it’s in a reconfig
Enters reconfig
Gets rejected because it’s already in the log*
Finished reconfig
Consider the case when
Maybe try to store replicated Commands you receive while you’re doing a reconfig and iterating over them when you’re done with the reconfig. (Make sure they get processed in the same order.)
*=would be reproposed after garbage collection
Sending messages between groups
Misc. Tips
Some spec clarifications
“You may have implemented optimizations in lab 3 by making assumptions which were valid but do not hold for this lab. In particular, you should be very cautious about dropping proposals when Paxos is running as a sub-node. As a sub-node, Paxos should be oblivious to AMOApplication logic and should be able to decide same command for different slots. Some de-duplication at the PaxosServer level is possible, but it must be done carefully.”
A note on read-only commands
Remember to remove the apps from the shards when you move them
Why? Consider when you move a shard and don’t remove the shard, then in a future configuration, you’re supposed to get the shard back. Your code might indicate that it doesn’t need to be transferred because you already have it.
Using the visualizer
See the bottom of the lab 4 spec for more details
./run-tests.py -l 4 -d NUM_GROUPS NUM_SERVERS_PER_GROUP NUM_SHARDMASTERS NUM_CLIENTS CLIENT_WORKLOAD [... CLIENT_N_WORKLOAD] [CONFIG_WORKLOAD]
Additional notes [not mentioned in spec]