1 of 47

Primary-Backup Protocol

CSE 452 Spring 2023

2 of 47

Roadmap

5 rules of Primary-Backup

  • Examples, What-Ifs, Hints/Tips

Design Docs

  • Why, How, Example!

3 of 47

General Flow

Client

Primary

Backup

View Server

Pool of Idle Servers

What messages do the arrows represent?

4 of 47

The 5 Sacred Scrolls of Primary-Backup

5 of 47

Rules:

  • Primary in view i+1 must have been backup or primary in view i
  • Primary must wait for backup to accept/execute each op before doing op and replying to client
  • Backup must accept forwarded requests only if view is correct
  • Non-primary must reject client requests
  • Every operation must be before or after state transfer (Problem with processing requests during a state transfer)

6 of 47

Rule 1: What if the new Primary was not in the previous view?

Client

Server A

{foo: bar}

Server B

{foo: bar}

Server C

{}

Server D

{}

Append(foo->bar)

Append(foo->bar)

AppendRes(bar)

AppendRes(bar)

View 2

Primary: A

Backup: B

7 of 47

Rule 1: What if the new Primary was not in the previous view?

Client

Server A

{foo: bar}

Server B

{foo: bar}

Server C

{}

Server D

{}

View 3

Primary: C

Backup: D

Get(foo)

KeyNotFound()

Get(foo)

KeyNotFound()

8 of 47

Rule 1: Solution + Takeaway

The new primary MUST have been in the previous view

Some Possible? Transitions

  • View i (Primary: A, Backup: B) -> View i+1 (Primary: A, Backup: NULL)
  • View i (Primary: A, Backup: B) -> View i+1 (Primary: B, Backup: NULL)
  • View i (Primary: A, Backup: B) -> View i+1 (Primary: B, Backup: A)
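Rule 1 can be phrased as a check on a proposed view transition. The following is a minimal sketch, not the lab's API: the `View` type and `is_valid_successor` are hypothetical names, and the startup case (a first view with no predecessor) is assumed to be handled elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class View:
    """Hypothetical view record: a number plus the two roles."""
    view_num: int
    primary: Optional[str]
    backup: Optional[str]


def is_valid_successor(prev: View, nxt: View) -> bool:
    """Rule 1: the primary of view i+1 was primary or backup in view i."""
    if nxt.view_num != prev.view_num + 1:
        return False
    if nxt.primary is None:
        return False
    return nxt.primary in (prev.primary, prev.backup)
```

With View i = (A, B), the successors (A, NULL), (B, NULL), and (B, A) pass this check, while (C, D) fails because C held no role in View i.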

12 of 47

Rules:

  • Primary in view i+1 must have been backup or primary in view i
  • Primary must wait for backup to accept/execute each op before doing op and replying to client
  • Backup must accept forwarded requests only if view is correct
  • Non-primary must reject client requests
  • Every operation must be before or after state transfer (Problem with processing requests during a state transfer)

13 of 47

Rule 2: What if we did not forward requests before executing them?

Client

Server A

{foo: bar}

Server B

{}

Server C

{}

Append(foo->bar)

Append(foo->bar)

AppendOk(bar)

View 2

Primary: A

Backup: B

14 of 47

Rule 2: What if we did not forward requests before executing them?

Client

Server A

{foo: bar}

Server B

{}

Server C

{}

Get(foo)

View 3

Primary: B

Backup: C

KeyNotFound

Get(foo)

KeyNotFound

15 of 47

Rule 2: Solution

Primary must wait for backup to accept/execute each op before doing op and replying to client

  • There could be optimizations you could make here, but…why not just keep it simple :)

16 of 47

Rule 2: How does this fix anything?!?

Client

Server A

{}

Server B

{}

Server C

{}

Append(foo->bar)

Append(foo->bar)

AppendOk(bar) DOES NOT EXIST!!

View 2

Primary: A

Backup: B

17 of 47

Rule 2: Another (interesting) issue

Server C

{}

View 2

Primary: A

Backup: B

Client

Server A

{foo: bar}

Server B

{foo: bar}

Append(foo->bar)

Append(foo->bar)

AppendRes(bar)

AppendRes(bar)

View 2

Primary: A

Backup: B

View 2

Primary: A

Backup: B

View 2

Primary: A

Backup: B

18 of 47

Rule 2: What if we did not forward requests before executing them?

Client

Server A

{foo: barbaz}

Server B

{foo: bar}

Server C

{foo: bar}

Append(foo->baz)

View 3

Primary: B

Backup: C

AppendRes(barbaz)

Suppose both the client and Server A do not get the new View…

Append(foo->baz)

View 2

Primary: A

Backup: B

View 2

Primary: A

Backup: B

View 3

Primary: B

Backup: C

19 of 47

Rule 2: What if we did not forward requests before executing them?

Client

Server A

{foo: barbaz}

Server B

{foo: bar}

Server C

{foo: bar}

View 3

Primary: B

Backup: C

When the client eventually gets the new view…

Get(foo)

GetRes(bar)

GetRes(bar)

Get(foo)

View 3

Primary: B

Backup: C

View 3

Primary: B

Backup: C

20 of 47

Rule 2: How does this fix anything?!?

Client

Server A

{foo: bar}

Server B

{foo: bar}

Server C

{foo: bar}

View 3

Primary: B

Backup: C

Suppose both the client and Server A do not get the new View…

Client never gets a response :)

Append(foo->baz)

Append(foo->baz)

21 of 47

Rule 2: Takeaway

Forwarding!

  • Only execute a command on the Primary after receiving a response from the Backup during forwarding
  • The backup is at most one command ahead of the primary at all times

Hint: You may want to use a timer to ensure forwarded messages get processed
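The forwarding takeaway can be sketched as a primary that holds each operation until the backup acknowledges it. This is an illustrative sketch only: the `Primary` class, the tuple message shapes, and the `outbox` list standing in for the network are all assumptions, and it handles a single outstanding Append at a time.

```python
class Primary:
    """Hypothetical primary: forwards first, executes only after the ack."""

    def __init__(self):
        self.store = {}      # stand-in for the application state
        self.pending = None  # (client, key, value) awaiting the backup's ack
        self.outbox = []     # messages this node would send

    def on_client_append(self, client, key, value):
        if self.pending is not None:
            return  # one forwarded op at a time; the client will retry
        self.pending = (client, key, value)
        self.outbox.append(("backup", "Forward", key, value))

    def on_forward_ack(self, key, value):
        if self.pending is None or self.pending[1:] != (key, value):
            return  # stale or duplicate ack
        client, key, value = self.pending
        self.pending = None
        # Only now does the primary execute the op and reply.
        self.store[key] = self.store.get(key, "") + value
        self.outbox.append((client, "AppendRes", self.store[key]))

    def on_forward_timer(self):
        # Resend the forwarded op if the backup has not acked yet.
        if self.pending is not None:
            _, key, value = self.pending
            self.outbox.append(("backup", "Forward", key, value))
```

Because the primary executes only after the ack, a reply to the client implies the backup has already accepted the operation.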

22 of 47

Rules:

  • Primary in view i+1 must have been backup or primary in view i
  • Primary must wait for backup to accept/execute each op before doing op and replying to client
  • Backup must accept forwarded requests only if view is correct
  • Non-primary must reject client requests
  • Every operation must be before or after state transfer (Problem with processing requests during a state transfer)

23 of 47

Rule 3: Problem & Solution

View 2

Primary: A

Backup: B

View 3

Primary: B

Backup: C

View 4

Primary: C

Backup: D

View 5

Primary: D

Backup: B

Forwarded

Hint: Have the view number on all messages

(Be careful of state-transfer messages)
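The hint above can be sketched as a backup that compares the view number carried on a forwarded message with its own. The `Backup` class and message fields here are hypothetical, and state-transfer messages are deliberately left out since they need separate care.

```python
class Backup:
    """Hypothetical backup that tags its state with the view it believes in."""

    def __init__(self, view_num):
        self.view_num = view_num
        self.store = {}

    def on_forward(self, msg_view_num, key, value):
        # Rule 3: accept a forwarded request only if the view matches.
        if msg_view_num != self.view_num:
            return None  # reject: the sender is in a different view
        self.store[key] = self.store.get(key, "") + value
        return ("ForwardAck", self.view_num, key, value)
```

In the scenario on this slide, a message forwarded to B in View 2 that arrives while B is the backup in View 5 carries view number 2 and is rejected.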

24 of 47

Rules:

  • Primary in view i+1 must have been backup or primary in view i
  • Primary must wait for backup to accept/execute each op before doing op and replying to client
  • Backup must accept forwarded requests only if view is correct
  • Non-primary must reject client requests
  • Every operation must be before or after state transfer (Problem with processing requests during a state transfer)

25 of 47

Rule 4: What if a non-primary responds to a client?

Server C

{}

View 4

Primary: B

Backup: null

Client

Server A

{}

Server B

{}

Put(a->bar)

View 2

Primary: A

Backup: B

View 4

Primary: B

Backup: null

View 4

Primary: B

Backup: null

View 3

Primary: B

Backup: C

26 of 47

Rule 4: What if a non-primary responds to a client?

Server C

{}

View 4

Primary: B

Backup: null

Client

Server A

{a:bar}

Server B

{}

Put(a->bar)

View 2

Primary: A

Backup: B

View 4

Primary: B

Backup: null

View 4

Primary: B

Backup: null

View 3

Primary: B

Backup: C

27 of 47

Rule 4: What if a non-primary responds to a client?

Server C

{}

View 4

Primary: B

Backup: null

Client

Server A

{a:bar}

Server B

{}

PutOk()

View 2

Primary: A

Backup: B

View 4

Primary: B

Backup: null

View 4

Primary: B

Backup: null

View 3

Primary: B

Backup: C

28 of 47

Rule 4: What if a non-primary responds to a client?

Server C

{}

View 4

Primary: B

Backup: null

Client

Server A

{a:bar}

Server B

{}

View 4

Primary: B

Backup: null

View 4

Primary: B

Backup: null

View 4

Primary: B

Backup: null

View 3

Primary: B

Backup: C

29 of 47

Rule 4: What if a non-primary responds to a client?

Server C

{}

View 4

Primary: B

Backup: null

Client

Server A

{a:bar}

Server B

{}

View 4

Primary: B

Backup: null

View 4

Primary: B

Backup: null

View 4

Primary: B

Backup: null

View 3

Primary: B

Backup: C

Get(a)

30 of 47

Rule 4: What if a non-primary responds to a client?

Server C

{}

View 4

Primary: B

Backup: null

Client

Server A

{a:bar}

Server B

{}

View 4

Primary: B

Backup: null

View 4

Primary: B

Backup: null

View 4

Primary: B

Backup: null

View 3

Primary: B

Backup: C

KeyNotFound()

31 of 47

Rule 4: Takeaway

  • A client is not guaranteed to be talking to the primary
    • Could be on an outdated view
  • Servers should only process/respond to client if they are the primary
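A minimal sketch of the role check, assuming each server records who the primary is in its current view. The `Server` class and the `"KeyNotFound"` return value are illustrative stand-ins. Note that a server holding a stale view may still believe it is the primary; this check alone does not cover that case.

```python
class Server:
    """Hypothetical server that knows its name and its current view's primary."""

    def __init__(self, name, view_num, primary):
        self.name = name
        self.view_num = view_num
        self.primary = primary
        self.store = {}

    def on_client_get(self, key):
        # Rule 4: a non-primary must not process or answer client requests.
        if self.name != self.primary:
            return None
        return self.store.get(key, "KeyNotFound")
```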

32 of 47

Rules:

  • Primary in view i+1 must have been backup or primary in view i
  • Primary must wait for backup to accept/execute each op before doing op and replying to client
  • Backup must accept forwarded requests only if view is correct
  • Non-primary must reject client requests
  • Every operation must be before or after state transfer

33 of 47

Recap: State Transfer

  • Why state transfer?
    • To bring new backup up to date
  • When do we do a state transfer?
    • When the primary receives a new view with a new backup, it needs to do a state transfer
    • V1: {S1, S2}, V2: {S1, S3}
  • What do we transfer?
    • Application (In lab 2, that’s your entire AMOApplication) + Metadata (Current view…etc)

34 of 47

Recap: State Transfer (Subtle details)

  • Pings from primary during state transfer should reflect old view.
    • Primary should still ping to view server so view server knows primary is alive.
    • The primary only moves to new view once state transfer is complete.
  • Backup can receive duplicated or late state transfer messages.
    • Need to ensure that state on backup is only overwritten once per view change
    • What happens if the state transfer ack gets dropped?
      • Resend state transfer message, but backup should not overwrite its state if it has already completed the state transfer
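The "overwrite once per view change" detail can be sketched by having the backup remember the view number of the last transfer it installed. The `BackupNode` class, the `transferred_view` field, and the message shapes are assumptions made for illustration.

```python
class BackupNode:
    """Hypothetical backup that installs transferred state at most once per view."""

    def __init__(self):
        self.transferred_view = 0  # view number of the last installed transfer
        self.store = {}

    def on_state_transfer(self, msg_view_num, snapshot):
        if msg_view_num < self.transferred_view:
            return None  # late message from an older view: ignore
        if msg_view_num > self.transferred_view:
            # First transfer for this view: overwrite local state.
            self.store = dict(snapshot)
            self.transferred_view = msg_view_num
        # Duplicate for the current view: keep state, but ack again
        # in case the earlier ack was dropped.
        return ("StateTransferAck", msg_view_num)
```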

35 of 47

Rule 5: What if we processed requests during state transfer?

Primary

Backup

StateTransfer((“a” -> “foo”))

“a” -> “foo”

Client 1

Put(“a”, “bar”)

Put(“a”, “bar”)

Put(“a”, “bar”)

“a” -> “foo”

“a” -> “bar”

“a” -> “bar”

StateTransferAck

36 of 47

Rule 5: Takeaway

  • State transfer will overwrite the state of the new backup so don’t process requests during a state transfer, otherwise the results could be overwritten.
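One way to keep every operation strictly before or after the transfer is for the primary to refuse client requests while a transfer is outstanding. This is a sketch under assumptions: `TransferringPrimary` and its method names are hypothetical, forwarding to the backup is omitted for brevity, and dropped requests are assumed to be retried by the client.

```python
class TransferringPrimary:
    """Hypothetical primary that pauses client requests during state transfer."""

    def __init__(self, store):
        self.store = dict(store)
        self.transfer_in_progress = False

    def start_state_transfer(self):
        self.transfer_in_progress = True
        # Send a copy so later changes cannot leak into the snapshot.
        return ("StateTransfer", dict(self.store))

    def on_state_transfer_ack(self):
        self.transfer_in_progress = False

    def on_client_put(self, key, value):
        if self.transfer_in_progress:
            return None  # drop the request; the client will retransmit
        self.store[key] = value
        return ("PutOk",)
```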

37 of 47

Questions So Far?

38 of 47

Design Doc Tips

39 of 47

Why Design Docs?

  • Overwhelmingly positive feedback last quarter
  • Treat the design doc as the challenging part of the labs
  • Our distributed system is a state machine with 2 possible actions
    • A server receives a message
    • A server fires a timer
  • Design doc defines what we do for each of those actions
  • A comprehensive design doc will in theory catch all of our edge cases!

40 of 47

Things to Design

  • Preface & Conclusion help us set up the problem
    • What are the cases we need to handle? What can we ignore?
    • Fault model for this class is defined for us
  • Protocol defines how we achieve our goals
  • Correctness/Liveness Analysis helps convince us our design actually works

41 of 47

Good Design Doc Practices

  • You should be able to hand another student/TA your design doc and they should be convinced your design works
  • Keep our designs application & language agnostic
  • Be concise and use bullet points!

42 of 47

Example Protocol for At-Least-Once RPC (Lab 1 Part 2)

  • Kinds of nodes:
    • There are two kinds of nodes: clients and servers.
    • There can be any number of clients and any number of servers.
  • State at each kind of node:
    • Client:
      • Sequence number
        • What is it? Integer, sequence # of our current request
        • Starts at 0, increases by 1 when client sends a command
      • Current Request
        • What is it? Request, the current request we are working on
        • Starts as null, gets set when client sends a command
      • Last Reply
        • What is it? Reply, the reply to the current request we are working on
        • Starts as null, gets set to the reply when we hear from the server. Reset to null when we send a new command
    • Server:
      • No evolving state (Stores only an application)

43 of 47

Example Protocol for At-Least-Once RPC (Lab 1 Part 2)

  • Messages:
    • Request Message
      • Source: Clients
      • Destination: Servers
      • Contents
        • Command to be executed
        • A sequence number
      • When is it sent?
        • Whenever a client wants to invoke a command
        • Client sets its current request to this message, its last reply to null, and then sends this message to the server
        • Client sets the client timer to resend the message
      • What happens at the destination when it is received?
        • The server passes the command to the server’s application and executes it
        • The server takes the result from the application, wraps it in a Reply message and sends it to the client that sent the Request message

44 of 47

Example Protocol for At-Least-Once RPC (Lab 1 Part 2)

  • Messages:
    • Reply message
      • Source: Servers
      • Destination: Clients
      • Contents:
        • The reply from the application
        • A sequence number (integer)
      • When is it sent?
        • When a server receives a request message, it executes the command in the request and then responds to the client with a Reply message
      • What happens at the destination when it is received?
        • The client checks if the Reply message corresponds to the Request it is currently working on.
        • If it matches, the client sets its Last Reply field to this reply

45 of 47

Example Protocol for At-Least-Once RPC (Lab 1 Part 2)

  • Timers:
    • RequestRetransmit
      • Set by clients
      • Contents: a sequence number (integer)
      • Set whenever a client sends a new RPC
      • What happens when it fires?
        • The client checks if the timer's sequence number is the same as the sequence number of the current request on the client. If it isn’t, ignore the timer.
        • Otherwise, the client retransmits the current request, and resets the timer
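The example protocol above (client state, Request/Reply messages, and the RequestRetransmit timer) can be sketched as a small state machine. The class and method names are hypothetical, and the `outbox` and `timers` lists stand in for the network and the timer framework. One assumption goes beyond the slides: the timer handler also stops retransmitting once a reply has arrived.

```python
class Client:
    """Hypothetical at-least-once client with the three fields from the slides."""

    def __init__(self):
        self.seq_num = 0             # starts at 0, +1 per command sent
        self.current_request = None  # (command, seq_num) or None
        self.last_reply = None       # reply to the current request, or None
        self.outbox = []             # requests sent to the server
        self.timers = []             # sequence numbers of timers that were set

    def send_command(self, command):
        self.seq_num += 1
        self.current_request = (command, self.seq_num)
        self.last_reply = None
        self.outbox.append(self.current_request)
        self.timers.append(self.seq_num)

    def on_reply(self, result, seq_num):
        # Accept the reply only if it matches the request in progress.
        if self.current_request is not None and seq_num == self.current_request[1]:
            self.last_reply = result

    def on_retransmit_timer(self, seq_num):
        if self.current_request is None or seq_num != self.current_request[1]:
            return  # timer belongs to an older request: ignore it
        if self.last_reply is not None:
            return  # assumption: stop once the reply has arrived
        self.outbox.append(self.current_request)
        self.timers.append(seq_num)


def server_handle(app, request):
    """Stateless server: run the command on the application and echo the seq number."""
    command, seq_num = request
    return (app(command), seq_num)
```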

46 of 47

Where to Start

  • Messages
    • Request / Reply
    • “Copy”/Forward message from primary to backup
    • State transfer
    • Acks?
    • Others?
  • Timer handlers
    • Check timer handlers carefully: do not call set() again once the response to the message the timer was set for has been received
  • State to keep on PBClient/PBServer
    • AMOApplication (only server)
    • Sequence number (on client)
    • Current View

47 of 47

Analysis: Processing Multiple Requests Simultaneously

Primary

Backup

Put(“a”, “foo”)

Put(“a”, “foo”)

“a” -> “foo”

Client 1

Put(“a”, “foo”)

Client 2

Put(“a”, “bar”)

Put(“a”, “bar”)

Put(“a”, “bar”)

“a” -> “bar”

“a” -> “bar”

“a” -> “foo”
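One reading of the diagram above is that, when two requests are in flight at once, the primary and the backup can apply them in different orders and end with different values. The sketch below only illustrates that ordering effect; the `apply_puts` helper and the two orderings are assumptions, not a trace from the lab.

```python
def apply_puts(puts):
    """Apply a sequence of (key, value) Puts to an empty store, in order."""
    store = {}
    for key, value in puts:
        store[key] = value
    return store


# Assumed interleaving: the primary sees foo then bar, the backup the reverse.
primary_order = [("a", "foo"), ("a", "bar")]
backup_order = list(reversed(primary_order))

primary_state = apply_puts(primary_order)
backup_state = apply_puts(backup_order)
diverged = primary_state != backup_state
```

Here the primary ends with "a" -> "bar" while the backup ends with "a" -> "foo", so the two replicas disagree.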