Section 9: Lab 4 Part 3

CSE 452 Winter 2022

Distributed Systems in a nutshell

Questions about Lab 4?

Goal of Lab 4 Part 3

  • Support transactions over multiple keys, potentially spread across different replica groups
    • Transactions can be:
      • Series of reads
      • Series of writes
      • Swaps - new one :)
  • Use Two-Phase Locking to acquire locks during a transaction.
  • Why can’t we just use Paxos? Because each group is responsible for a different set of shards, and we want to support transactions that span shards/replica groups.

Two-Phase Commit

  1. Allows all participants to arrive at the same conclusion.
  2. Pick a coordinator (i.e., the leader of the transaction)
  3. Coordinator sends out prepares to all replica groups responsible for a key in the transaction (the participants)
  4. Participants check if any key involved in the transaction is locked
    1. If any key is locked, reply with an abort (wound wait)
    2. Otherwise, reply PrepareOK and lock the keys to prevent reads and writes on them
  5. If the coordinator received PrepareOKs from all groups, send out commits
  6. On commit, participants perform the reads/writes and reply back to the coordinator
  7. Coordinator replies back to the client once all CommitOKs are received and the results are combined (see the message sketch below)
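The steps above imply a small set of message types. A minimal sketch, assuming hypothetical names and fields (txnId, attemptNum, etc.) rather than any required interface:

import java.util.Map;
import java.util.Set;

// Hypothetical 2PC message shapes implied by the steps above; every name
// and field here is an illustrative assumption, not part of the lab API.
class TwoPhaseCommitMessages {
    record Prepare(int txnId, Set<String> keys, int configNum, int attemptNum) {}
    record PrepareOK(int txnId, int attemptNum, Map<String, String> readValues) {} // values for reads/swaps
    record Abort(int txnId, int attemptNum) {}
    record AbortOK(int txnId, int attemptNum) {}
    record Commit(int txnId, int attemptNum) {}
    record CommitOK(int txnId, Map<String, String> results) {}
}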

Transactions

  • Transaction interface
    • Set<String> readSet()
    • Set<String> writeSet()
    • Set<String> keySet() // union of readSet and writeSet
  • Implementations of Transaction
    • MultiGet
      • Set<String> keys
      • Read one or more keys across replica groups
    • MultiPut
      • Map<String, String> values
      • Write one or more keys across replica groups
    • Swap
      • String key1, key2
      • Given two keys, swap their values
  • Your goal: implement support for these 3 types of Transactions (one possible shape is sketched below)
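The interface above maps naturally onto the three implementations. A minimal sketch, assuming key1 != key2 for Swap and using records purely for brevity (not the staff solution):

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// The Transaction interface from the slide, with one plausible way each
// implementation could derive its read/write sets (a sketch, not a spec).
interface Transaction {
    Set<String> readSet();
    Set<String> writeSet();
    default Set<String> keySet() {              // union of readSet and writeSet
        Set<String> keys = new HashSet<>(readSet());
        keys.addAll(writeSet());
        return keys;
    }
}

record MultiGet(Set<String> keys) implements Transaction {
    public Set<String> readSet()  { return keys; }       // reads every key
    public Set<String> writeSet() { return Set.of(); }   // writes nothing
}

record MultiPut(Map<String, String> values) implements Transaction {
    public Set<String> readSet()  { return Set.of(); }
    public Set<String> writeSet() { return values.keySet(); }
}

record Swap(String key1, String key2) implements Transaction {
    public Set<String> readSet()  { return Set.of(key1, key2); }  // reads both keys
    public Set<String> writeSet() { return Set.of(key1, key2); }  // and writes both
}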

Transactions: The Normal / Happy Path from Coordinator’s perspective

  • Suggestion: client talks to coordinator, coordinator talks to participants
  • Suggestion: the transaction coordinator is the replica group with the largest group ID in the transaction (see the sketch after this list)
  • Client sends the Transaction to the coordinator
  • Coordinator replicates Transaction through its Paxos subnodes
    • In general, incoming messages to replica groups should be replicated (just like in part 2).
    • Use the same process(Command c, boolean replicated) design
  • The coordinator will run two-phase commit to execute the transaction
    • Prepare
      • Coordinator sends out prepares to all replica groups involved in the transaction
      • If the coordinator receives PrepareOKs from every replica group (not a majority anymore, this ain’t Paxos), it can proceed to the commit stage
    • Commit
      • Coordinator sends out commit message to all replica groups involved in transaction
      • Participants can unlock keys after receiving commit
      • Coordinator responds to client after all CommitOKs received from participants
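Picking the coordinator is simple once you know which groups a transaction touches. A runnable sketch of the suggestion above, where keyToGroup stands in for the shard-to-group lookup in the current config (an assumption for illustration):

import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: the coordinator is the replica group with the largest group ID
// among the groups owning the transaction's keys.
class CoordinatorChoice {
    static int coordinatorFor(Set<String> keys, Map<String, Integer> keyToGroup) {
        Set<Integer> groups = new HashSet<>();
        for (String key : keys) {
            groups.add(keyToGroup.get(key));    // group owning this key's shard
        }
        return Collections.max(groups);         // largest group ID coordinates
    }

    public static void main(String[] args) {
        // Mirrors the MultiGet example later in this section: b -> group 2, d -> group 3.
        Map<String, Integer> keyToGroup = Map.of("b", 2, "d", 3);
        System.out.println(coordinatorFor(Set.of("b", "d"), keyToGroup));  // 3
    }
}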

Transactions: The Normal / Happy Path from Participant’s perspective

  • Upon receiving a prepare:
    • Validation: check that config nums match
    • Participants check whether each key in the transaction is locked
      • If locked, tell the coordinator to abort
        • On abort, the coordinator should send aborts to the other participants and wait for the client to retry
      • If not locked, lock the keys and send PrepareOK (with read/swap data)
  • Once coordinator receives PrepareOK from every replica group (NOT just majority anymore, this still isn’t Paxos), the coordinator will send commit messages to participants
  • Upon receiving commit:
    • Replicate in Paxos
    • Verify that commit is for the correct transaction, config num matches, and attempt num matches
    • If it is a write commit
      • Perform the write and update application state
    • Update the client address -> highest sequence number seen mapping
    • Might want some state to store transaction results somewhere
    • Unlock the keys (a minimal lock-table sketch follows this list)
    • Respond with CommitOK
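For the lock checks above, participants need some notion of a lock table. A minimal, runnable sketch (class and method names are assumptions; in the lab, each participant would only lock the keys it owns, and a real handler would also verify config/attempt numbers):

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Minimal model of a participant's lock table: on a prepare, lock every
// key in the transaction or none of them; unlock on commit/abort.
class LockTable {
    private final Map<String, Integer> locks = new HashMap<>();  // key -> txnId holding it

    // Returns true (and locks all keys) iff no key is held by another txn;
    // on false, the participant would reply with an Abort.
    boolean tryLockAll(Set<String> keys, int txnId) {
        for (String key : keys) {
            Integer holder = locks.get(key);
            if (holder != null && holder != txnId) {
                return false;                    // conflict: abort this attempt
            }
        }
        for (String key : keys) {
            locks.put(key, txnId);               // no conflicts: lock and PrepareOK
        }
        return true;
    }

    // Called once a valid Commit or Abort for txnId arrives.
    void unlockAll(Set<String> keys, int txnId) {
        for (String key : keys) {
            locks.remove(key, txnId);            // removes only if held by this txn
        }
    }
}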

Example: MultiGet across replica groups

[Diagram series: R1, R2, and R3 are replica groups; each is a set of ShardStoreServers, each with a Paxos subnode. R1 holds keys={a}, R2 holds keys={b,c}, and R3 holds keys={d,e}. A client issues MultiGet(keys={b,d}).]

  1. The client sends the request to the coordinator. The coordinator is max({replica group IDs involved in this transaction}); here R3 (for key=d) and R2 (for key=b) are involved, and max(3, 2) = 3, so replica group 3 is the coordinator.
  2. The coordinator sends Prepare(command=MultiGet(keys={b,d})) messages to all participants. Each participant locks the keys involved in the transaction.
  3. Each replica group responds with PrepareOK, including the information on what it’s responding to and possibly some values, depending on the command being performed (read or swap).
  4. The coordinator sends Commit messages to all participants.
  5. Participants reply CommitOK({b: ...}) and CommitOK({d: ...}).
  6. The coordinator replies to the client with Reply({b: ..., d: ...}).

Tips

  • Groups can be in two roles: participant and coordinator. You may need to handle sending messages to your own group (e.g., if a given group is both a participant and the coordinator).
    • Similar to how Paxos nodes can be both acceptors and proposers.
  • Keep track of ongoing (active) transactions at each node (each transaction could keep track of the set of keys it currently has locked)
  • What happens if the coordinator receives an abort?
    • It may be the case that the coordinator’s config is out of sync
      • Participants may reject a prepare from the coordinator (e.g., if a key is already locked)
    • If the coordinator learns of a new config while receiving prepare responses, it should finish any outstanding transactions, send out aborts for the other transactions if needed, and then process the config change. The client will then retry and start a new transaction.
    • This means each transaction should have an attempt/retry number associated with it. If a transaction needs to be retried, the coordinator reaches out to all participants again under a new attempt number, so participants can differentiate attempts of the same transaction from a given coordinator. You also need to deal with attempts from different configs (a bookkeeping sketch follows this list).
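One way to organize this bookkeeping, as a rough sketch (all names are assumptions): track active transactions by transaction ID, remembering the current attempt and config so stale responses can be dropped.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Minimal sketch of active-transaction bookkeeping at the coordinator.
class ActiveTxns {
    record Entry(int attemptNum, int configNum, Set<String> lockedKeys) {}

    private final Map<Integer, Entry> active = new HashMap<>();  // txnId -> state

    void start(int txnId, int attemptNum, int configNum, Set<String> keys) {
        active.put(txnId, new Entry(attemptNum, configNum, keys));
    }

    // A PrepareOK/Abort/CommitOK only counts if it matches the current
    // attempt and config; anything older is from a stale attempt.
    boolean isCurrent(int txnId, int attemptNum, int configNum) {
        Entry e = active.get(txnId);
        return e != null && e.attemptNum() == attemptNum && e.configNum() == configNum;
    }

    void finish(int txnId) {
        active.remove(txnId);   // transaction committed or fully aborted
    }
}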

Deadlock or cyclic waiting: the problem

  • Recall: deadlock across threads/processes (332, 451)
    • Thread/Process A holds resource X, waiting for resource Y
    • Thread/Process B holds resource Y, waiting for resource X
  • This can happen in 2PC.

[Diagram: G1, G2, G3, and G4 are replica groups holding keys={k1}, keys={k2}, keys={k3}, and keys={k4} respectively. Two transactions arrive concurrently: Write(k4, k2, k1), coordinated by G4, and Write(k3, k2, k1), coordinated by G3. One transaction locks k2 while the other locks k1, so each is waiting for a key the other holds.]

Deadlock solution: abort (the 2PC way)

  • Release the locks and try again!
    • Aborts are sent from G1 to G4, from G2 to G3
    • G3 tells G1 and G4 tells G2 to unlock and not perform the operation
    • Try to acquire the locks again
  • Pros:
    • Feasible because it’s easy to cancel a prepared transaction
      • Rollback unnecessary because KVStore not modified
    • If nothing is locked, efficient: can send all prepares at same time
  • Cons:
    • Progress is not guaranteed; retries may collide (unlikely but possible)
    • Need to know that all groups hear the Abort
      • A group may have received a Prepare without the coordinator hearing its PrepareOK

Deadlock solution: lock ordering (the 332 way)

  • Avoid cyclic waiting altogether
    • Prepare groups in decreasing order of group ID (see the ordering sketch after this list).
    • The coordinator that has prepared the lowest group ID will eventually commit/abort*
  • Pros:
    • Non-coordinator participants never abort
      • The coordinator can defer coordination if it can’t lock a required key
      • Each attempt is uniquely identifiable by the coordinator’s configNum
        • Can avoid a separate attempt/retry number – less bookkeeping
    • Assuming the config stabilizes, progress will be made
      • *“stabilizes” means everyone hears about the config and all shard moves complete
  • Cons:
    • In the happy path, need several more RTTs (cannot send at same time).
      • Efficient enough for lab 4.3, but in practice maybe avoid it.
      • Sending commits/aborts can be done all at once.
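A runnable sketch of the ordering itself; actually chaining the prepares (send one, wait for its PrepareOK, then send the next) happens in your message handlers:

import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: prepare participants one at a time, highest group ID first,
// so every transaction acquires locks in the same global order.
class OrderedPrepare {
    static List<Integer> prepareOrder(Set<Integer> participantGroups) {
        return participantGroups.stream()
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(prepareOrder(Set.of(1, 2, 4)));  // [4, 2, 1]
    }
}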

Aborts

  • Make sure that the aborts you receive are valid (right transaction, config, and attempt number; a validation sketch follows this list)
    • Valid aborts from the coordinator should release the locks
  • It is safe to send aborts as long as you haven’t sent a PrepareOK for the current attempt
  • The coordinator should accept AbortOKs to end the transaction attempt
  • Cases when aborts can happen:
    • The coordinator sends a prepare for a transaction, but a key is locked on a participant
      • Replicate the abort on the Paxos subnodes, send back an Abort to the coordinator
      • The coordinator should end up trying again with a new round of prepares
    • A participant has a higher config number than the coordinator
      • What should happen when the participant is in the middle of a reconfiguration?
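The validity check is mechanical. A minimal sketch (record and method names are assumptions):

// Sketch: an abort only takes effect if it names the current transaction,
// config, and attempt; stale aborts from earlier attempts are ignored.
class AbortValidation {
    record AbortMsg(int txnId, int configNum, int attemptNum) {}

    static boolean isValid(AbortMsg m, int curTxn, int curConfig, int curAttempt) {
        return m.txnId() == curTxn
            && m.configNum() == curConfig
            && m.attemptNum() == curAttempt;
    }
}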

Potential Workflow for part 3

Tackling the entire two-phase commit process at once can be difficult, so here is one suggested order for getting things done:

  1. Transactions that are all on one shard (no 2PC)
  2. Transactions that span shards within one group (no 2PC). [Should pass test 1 for part 3]
  3. 2-phase commit to everyone (involved), coordinator logic, transaction attempts:
    1. Sending Prepares and locking keys, PrepareOks
    2. Commits and CommitOks
    3. Sending Aborts for different configuration, already locked keys, processing AbortOks
    4. Retry transaction if necessary/possible
    5. Adding logic for interaction with reconfigurations

Note: Sending to everyone (involved) would require getting AbortOks from everyone (involved) if an abort happens

Note: 3. is the bulk of the work because you need these messages, plus timers to retry as necessary (for Prepares, Commits, and Aborts) [you can delay the timers until the unreliable tests]

Potential Workflow for part 3 [continued]

  4. MultiGet and MultiPut through 2PC
  5. Swap through 2PC (slightly more complicated; see the sketch below)
    1. Swap values are aggregated on prepares and written on commit
  6. Sending prepares (grabbing locks) in order to avoid live-locking
    • Ordering for who to contact next
    • Sending out Aborts to the groups you already contacted in case of an abort, and receiving AbortOKs from them
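To make the Swap flow concrete, a small runnable sketch of the aggregate-then-write idea (the key names and coordinator-side maps are assumptions):

import java.util.HashMap;
import java.util.Map;

// Sketch: participants return their current values in PrepareOKs; on
// commit, the coordinator tells each group to write the *other* value.
class SwapAggregation {
    public static void main(String[] args) {
        String key1 = "x", key2 = "y";

        // Values gathered from the PrepareOKs of the groups owning each key.
        Map<String, String> prepared = new HashMap<>();
        prepared.put(key1, "1");
        prepared.put(key2, "2");

        // What each group is told to write in the Commit message.
        Map<String, String> writes = Map.of(key1, prepared.get(key2),
                                            key2, prepared.get(key1));
        System.out.println(writes);  // {x=2, y=1} (map order may vary)
    }
}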