Section 9: Lab 4 Part 3

CSE 452 Winter 2022

Distributed Systems in a nutshell

Questions about Lab 4?

Goal of Lab 4 Part 3

  • Support transactions over multiple keys, potentially spread across different replica groups
    • Transactions can be:
      • Series of reads
      • Series of writes
      • Swaps - new one :)
  • Use Two-Phase Locking to acquire locks during a transaction.
  • Why can’t we just use Paxos? Because each group is responsible for a different set of shards, and we want to support transactions that span shards/replica groups.

Two-Phase Commit

  1. Allows all participants to arrive at the same conclusion.
  2. Pick a coordinator (i.e., the leader of the transaction)
  3. Coordinator sends out prepares to all replica groups responsible for a key in the transaction (the participants)
  4. Participants check if any key involved in the transaction is locked
    1. If any key is locked, reply with an abort (wound wait)
    2. Otherwise, reply PrepareOK and lock the keys to prevent reads and writes on them
  5. If the coordinator received PrepareOKs from all groups, send out commits
  6. On commit, participants perform the reads/writes and reply back to the coordinator
  7. Coordinator replies back to the client once all CommitOKs are received and the results are combined (see the message sketch below)
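The steps above imply a small set of message types. A minimal sketch, assuming hypothetical names and fields (txnId, attemptNum, etc.) rather than any required interface:

import java.util.Map;
import java.util.Set;

// Hypothetical 2PC message shapes implied by the steps above; every name
// and field here is an illustrative assumption, not part of the lab API.
class TwoPhaseCommitMessages {
    record Prepare(int txnId, Set<String> keys, int configNum, int attemptNum) {}
    record PrepareOK(int txnId, int attemptNum, Map<String, String> readValues) {} // values for reads/swaps
    record Abort(int txnId, int attemptNum) {}
    record AbortOK(int txnId, int attemptNum) {}
    record Commit(int txnId, int attemptNum) {}
    record CommitOK(int txnId, Map<String, String> results) {}
}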

Transactions

  • Transaction interface
    • Set<String> readSet()
    • Set<String> writeSet()
    • Set<String> keySet() // union of readSet and writeSet
  • Implementations of Transaction
    • MultiGet
      • Set<String> keys
      • Read one or more keys across replica groups
    • MultiPut
      • Map<String, String> values
      • Write one or more keys across replica groups
    • Swap
      • String key1, key2
      • Given two keys, swap their values
  • Your goal: implement support for these 3 types of Transactions (one possible shape is sketched below)
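The interface above maps naturally onto the three implementations. A minimal sketch, assuming key1 != key2 for Swap and using records purely for brevity (not the staff solution):

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// The Transaction interface from the slide, with one plausible way each
// implementation could derive its read/write sets (a sketch, not a spec).
interface Transaction {
    Set<String> readSet();
    Set<String> writeSet();
    default Set<String> keySet() {              // union of readSet and writeSet
        Set<String> keys = new HashSet<>(readSet());
        keys.addAll(writeSet());
        return keys;
    }
}

record MultiGet(Set<String> keys) implements Transaction {
    public Set<String> readSet()  { return keys; }       // reads every key
    public Set<String> writeSet() { return Set.of(); }   // writes nothing
}

record MultiPut(Map<String, String> values) implements Transaction {
    public Set<String> readSet()  { return Set.of(); }
    public Set<String> writeSet() { return values.keySet(); }
}

record Swap(String key1, String key2) implements Transaction {
    public Set<String> readSet()  { return Set.of(key1, key2); }  // reads both keys
    public Set<String> writeSet() { return Set.of(key1, key2); }  // and writes both
}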

Transactions: The Normal / Happy Path from Coordinator’s perspective

  • Suggestion: client talks to coordinator, coordinator talks to participants
  • Suggestion: the transaction coordinator is the replica group with the largest group ID in the transaction (see the sketch after this list)
  • Client sends the Transaction to the coordinator
  • Coordinator replicates Transaction through its Paxos subnodes
    • In general, incoming messages to replica groups should be replicated (just like in part 2).
    • Use the same process(Command c, boolean replicated) design
  • The coordinator will run two-phase commit to execute the transaction
    • Prepare
      • Coordinator sends out prepares to all replica groups involved in the transaction
      • If the coordinator receives PrepareOKs from every replica group (not a majority anymore, this ain’t Paxos), it can proceed to the commit stage
    • Commit
      • Coordinator sends out commit message to all replica groups involved in transaction
      • Participants can unlock keys after receiving commit
      • Coordinator responds to client after all CommitOKs received from participants
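Picking the coordinator is simple once you know which groups a transaction touches. A runnable sketch of the suggestion above, where keyToGroup stands in for the shard-to-group lookup in the current config (an assumption for illustration):

import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: the coordinator is the replica group with the largest group ID
// among the groups owning the transaction's keys.
class CoordinatorChoice {
    static int coordinatorFor(Set<String> keys, Map<String, Integer> keyToGroup) {
        Set<Integer> groups = new HashSet<>();
        for (String key : keys) {
            groups.add(keyToGroup.get(key));    // group owning this key's shard
        }
        return Collections.max(groups);         // largest group ID coordinates
    }

    public static void main(String[] args) {
        // Mirrors the MultiGet example later in this section: b -> group 2, d -> group 3.
        Map<String, Integer> keyToGroup = Map.of("b", 2, "d", 3);
        System.out.println(coordinatorFor(Set.of("b", "d"), keyToGroup));  // 3
    }
}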

Transactions: The Normal / Happy Path from Participant’s perspective

  • Upon receiving a prepare:
    • Validation: check that config nums match
    • Participants check whether each key in the transaction is locked
      • If locked, tell the coordinator to abort
        • On abort, the coordinator should send aborts to the other participants and wait for the client to retry
      • If not locked, lock the keys and send PrepareOK (with read/swap data)
  • Once coordinator receives PrepareOK from every replica group (NOT just majority anymore, this still isn’t Paxos), the coordinator will send commit messages to participants
  • Upon receiving commit:
    • Replicate in Paxos
    • Verify that commit is for the correct transaction, config num matches, and attempt num matches
    • If it is a write commit
      • Perform the write and update application state
    • Update the client address -> highest sequence number seen mapping
    • Might want some state to store transaction results somewhere
    • Unlock the keys (a minimal lock-table sketch follows this list)
    • Respond with CommitOK
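For the lock checks above, participants need some notion of a lock table. A minimal, runnable sketch (class and method names are assumptions; in the lab, each participant would only lock the keys it owns, and a real handler would also verify config/attempt numbers):

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Minimal model of a participant's lock table: on a prepare, lock every
// key in the transaction or none of them; unlock on commit/abort.
class LockTable {
    private final Map<String, Integer> locks = new HashMap<>();  // key -> txnId holding it

    // Returns true (and locks all keys) iff no key is held by another txn;
    // on false, the participant would reply with an Abort.
    boolean tryLockAll(Set<String> keys, int txnId) {
        for (String key : keys) {
            Integer holder = locks.get(key);
            if (holder != null && holder != txnId) {
                return false;                    // conflict: abort this attempt
            }
        }
        for (String key : keys) {
            locks.put(key, txnId);               // no conflicts: lock and PrepareOK
        }
        return true;
    }

    // Called once a valid Commit or Abort for txnId arrives.
    void unlockAll(Set<String> keys, int txnId) {
        for (String key : keys) {
            locks.remove(key, txnId);            // removes only if held by this txn
        }
    }
}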

Example: MultiGet across replica groups

[Diagram series: R1, R2, and R3 are replica groups; each is a set of ShardStoreServers, each with a Paxos subnode. R1 holds keys={a}, R2 holds keys={b,c}, and R3 holds keys={d,e}. A client issues MultiGet(keys={b,d}).]

  1. The client sends the request to the coordinator. The coordinator is max({replica group IDs involved in this transaction}); here R3 (for key=d) and R2 (for key=b) are involved, and max(3, 2) = 3, so replica group 3 is the coordinator.
  2. The coordinator sends Prepare(command=MultiGet(keys={b,d})) messages to all participants. Each participant locks the keys involved in the transaction.
  3. Each replica group responds with PrepareOK, including the information on what it’s responding to and possibly some values, depending on the command being performed (read or swap).
  4. The coordinator sends Commit messages to all participants.
  5. Participants reply CommitOK({b: ...}) and CommitOK({d: ...}).
  6. The coordinator replies to the client with Reply({b: ..., d: ...}).

Tips

  • Groups can be in two roles: participant and coordinator. You may need to handle sending messages to your own group (e.g., if a given group is both a participant and the coordinator).
    • Similar to how Paxos nodes can be both acceptors and proposers.
  • Keep track of ongoing (active) transactions at each node (each transaction could keep track of the set of keys it currently has locked)
  • What happens if the coordinator receives an abort?
    • It may be the case that the coordinator’s config is out of sync
      • Participants may reject a prepare from the coordinator (e.g., if a key is already locked)
    • If the coordinator learns of a new config while receiving prepare responses, it should finish any outstanding transactions, send out aborts for the other transactions if needed, and then process the config change. The client will then retry and start a new transaction.
    • This means each transaction should have an attempt/retry number associated with it. If a transaction needs to be retried, the coordinator reaches out to all participants again under a new attempt number, so participants can differentiate attempts of the same transaction from a given coordinator. You also need to deal with attempts from different configs (a bookkeeping sketch follows this list).
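One way to organize this bookkeeping, as a rough sketch (all names are assumptions): track active transactions by transaction ID, remembering the current attempt and config so stale responses can be dropped.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Minimal sketch of active-transaction bookkeeping at the coordinator.
class ActiveTxns {
    record Entry(int attemptNum, int configNum, Set<String> lockedKeys) {}

    private final Map<Integer, Entry> active = new HashMap<>();  // txnId -> state

    void start(int txnId, int attemptNum, int configNum, Set<String> keys) {
        active.put(txnId, new Entry(attemptNum, configNum, keys));
    }

    // A PrepareOK/Abort/CommitOK only counts if it matches the current
    // attempt and config; anything older is from a stale attempt.
    boolean isCurrent(int txnId, int attemptNum, int configNum) {
        Entry e = active.get(txnId);
        return e != null && e.attemptNum() == attemptNum && e.configNum() == configNum;
    }

    void finish(int txnId) {
        active.remove(txnId);   // transaction committed or fully aborted
    }
}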

Deadlock or cyclic waiting: the problem

  • Recall: deadlock across threads/processes (332, 451)
    • Thread/Process A holds resource X, waiting for resource Y
    • Thread/Process B holds resource Y, waiting for resource X
  • This can happen in 2PC.

[Diagram: G1, G2, G3, and G4 are replica groups holding keys={k1}, keys={k2}, keys={k3}, and keys={k4} respectively. Two transactions arrive concurrently: Write(k4, k2, k1), coordinated by G4, and Write(k3, k2, k1), coordinated by G3. One transaction locks k2 while the other locks k1, so each is waiting for a key the other holds.]

Deadlock solution: abort (the 2PC way)

  • Release the locks and try again!
    • Aborts are sent from G1 to G4, from G2 to G3
    • G3 tells G1 and G4 tells G2 to unlock and not perform the operation
    • Try to acquire the locks again
  • Pros:
    • Feasible because it’s easy to cancel a prepared transaction
      • Rollback unnecessary because KVStore not modified
    • If nothing is locked, efficient: can send all prepares at same time
  • Cons:
    • Progress is not guaranteed; retries may collide (unlikely but possible)
    • Need to know that all groups hear the Abort
      • A group may have received a Prepare without the coordinator hearing its PrepareOK

Deadlock solution: lock ordering (the 332 way)

  • Avoid cyclic waiting altogether
    • Prepare groups in decreasing order of group ID (see the ordering sketch after this list).
    • The coordinator that has prepared the lowest group ID will eventually commit/abort*
  • Pros:
    • Non-coordinator participants never abort
      • The coordinator can defer coordination if it can’t lock a required key
      • Each attempt is uniquely identifiable by the coordinator’s configNum
        • Can avoid a separate attempt/retry number – less bookkeeping
    • Assuming the config stabilizes, progress will be made
      • *“stabilizes” means everyone hears about the config and all shard moves complete
  • Cons:
    • In the happy path, need several more RTTs (cannot send at same time).
      • Efficient enough for lab 4.3, but in practice maybe avoid it.
      • Sending commits/aborts can be done all at once.
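A runnable sketch of the ordering itself; actually chaining the prepares (send one, wait for its PrepareOK, then send the next) happens in your message handlers:

import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: prepare participants one at a time, highest group ID first,
// so every transaction acquires locks in the same global order.
class OrderedPrepare {
    static List<Integer> prepareOrder(Set<Integer> participantGroups) {
        return participantGroups.stream()
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(prepareOrder(Set.of(1, 2, 4)));  // [4, 2, 1]
    }
}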

Aborts

  • Make sure that the aborts you receive are valid (right transaction, config, and attempt number; a validation sketch follows this list)
    • Valid aborts from the coordinator should release the locks
  • It is safe to send aborts as long as you haven’t sent a PrepareOK for the current attempt
  • The coordinator should accept AbortOKs to end the transaction attempt
  • Cases when aborts can happen:
    • The coordinator sends a prepare for a transaction, but a key is locked on a participant
      • Replicate the abort on the Paxos subnodes, send back an Abort to the coordinator
      • The coordinator should end up trying again with a new round of prepares
    • A participant has a higher config number than the coordinator
      • What should happen when the participant is in the middle of a reconfiguration?
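The validity check is mechanical. A minimal sketch (record and method names are assumptions):

// Sketch: an abort only takes effect if it names the current transaction,
// config, and attempt; stale aborts from earlier attempts are ignored.
class AbortValidation {
    record AbortMsg(int txnId, int configNum, int attemptNum) {}

    static boolean isValid(AbortMsg m, int curTxn, int curConfig, int curAttempt) {
        return m.txnId() == curTxn
            && m.configNum() == curConfig
            && m.attemptNum() == curAttempt;
    }
}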

Potential Workflow for part 3

Tackling the entire two-phase commit process at once can be difficult, so here is one suggested order for getting things done:

  1. Transactions that are all on one shard (no 2PC)
  2. Transactions that span shards within one group (no 2PC). [Should pass test 1 for part 3]
  3. 2-phase commit to everyone (involved), coordinator logic, transaction attempts:
    1. Sending Prepares and locking keys, PrepareOks
    2. Commits and CommitOks
    3. Sending Aborts for different configuration, already locked keys, processing AbortOks
    4. Retry transaction if necessary/possible
    5. Adding logic for interaction with reconfigurations

Note: Sending to everyone (involved) would require getting AbortOks from everyone (involved) if an abort happens

Note: 3. is the bulk of the work because you need these messages, plus timers to retry as necessary (for Prepares, Commits, and Aborts) [you can delay the timers until the unreliable tests]

Potential Workflow for part 3 [continued]

  4. MultiGet and MultiPut through 2PC
  5. Swap through 2PC (slightly more complicated; see the sketch below)
    1. Swap values are aggregated on prepares and written on commit
  6. Sending prepares (grabbing locks) in order to avoid live-locking
    • Ordering for who to contact next
    • Sending out Aborts to the groups you already contacted in case of an abort, and receiving AbortOKs from them
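To make the Swap flow concrete, a small runnable sketch of the aggregate-then-write idea (the key names and coordinator-side maps are assumptions):

import java.util.HashMap;
import java.util.Map;

// Sketch: participants return their current values in PrepareOKs; on
// commit, the coordinator tells each group to write the *other* value.
class SwapAggregation {
    public static void main(String[] args) {
        String key1 = "x", key2 = "y";

        // Values gathered from the PrepareOKs of the groups owning each key.
        Map<String, String> prepared = new HashMap<>();
        prepared.put(key1, "1");
        prepared.put(key2, "2");

        // What each group is told to write in the Commit message.
        Map<String, String> writes = Map.of(key1, prepared.get(key2),
                                            key2, prepared.get(key1));
        System.out.println(writes);  // {x=2, y=1} (map order may vary)
    }
}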