Published using Google Docs
DSA Fall 2024 Homework 3
Updated automatically every 5 minutes

Homework/Project 3 - Replicated Key-Value Store with Paxos                                   

CSCI-4510/6510 – Fall 2024  

Assigned: October 24, 2024

Project due: Friday, November 15, 2024 at 11:59pm in Submitty and Gradescope

Teams formed in Submitty by: November 8, 2024

You may work in groups of one or two for the coding portion.

The problem set is to be done individually.

1. Overview

In Homework 2, you implemented a replicated key-value store using the replicated log and dictionary algorithm by Wuu and Bernstein. This replication algorithm has the benefits of simplicity and low message overhead, but because the logs were not closely synchronized, it was complicated to identify and reconcile conflicting updates. Thus, you needed to implement mechanisms at the application level to determine which operations should be executed and in what order.

For this homework, you will implement the key-value store using the Synod algorithm to maintain a replicated log of events. The log replicas will be maintained in a way so that it is not possible to execute conflicting updates. The log will store insert and delete events, and to simplify things, the log will store append events (rather than converting an append command into a delete followed by an insert).

The distributed system consists of a set of sites. Each site is identified by a site ID, a short alphanumeric string. Each site stores a copy of the key-value store (full replication). Each site acts as a Proposer, accepting commands from standard in and issuing proposals for log entries. Each site also acts as an Acceptor, helping to determine which value is assigned to each log slot, and each site acts as a Learner, storing a copy of the replicated log. When an event is added to a site’s log, the site should also update its copy of the key-value store.

As in the previous homeworks, each key is a string of at most 32 alphanumeric characters. Each value is a non-empty list of strings, where each string has at most 32 alphanumeric characters.

2. Implementation Details

When a user enters a command at a site, the site’s Proposer should execute the following steps:

  1. If there are any holes in the site’s log, execute the Synod algorithm for those log slots to learn the missing values. Update the copy of the key-value store with any missing information, if necessary.
  2. Check whether the command can be executed based on the local state of the updated repository.  If the command can be executed, then execute the Synod algorithm as the Proposer to propose the new event for its first empty log slot.
  3. When a Learner learns about a new log entry, it updates its copy of the key-value store accordingly.

In the case of timeouts when waiting for “promise” or “ack” messages, the Proposer should retry the proposal a maximum of two times (three tries total) for the same log slot. The Proposer should increment its proposal number in between each try, as specified in the Synod algorithm. Even if there is no timeout, due to concurrent proposals, the site’s proposal may not be committed to the log. If the site’s proposal is not added to the log, the application should notify the user according to the UI specification (see Section 3). The Proposer should not try to propose the operation for any other log slot.

Note 1: You will need to use multiple threads or asynchronous IO for proposals. A Proposer should not try to contact the Acceptors one at a time, but rather it should send messages to all Acceptors at once and wait for responses only from a majority.

Note 2: Your site should be able to act as a Proposer, an Acceptor, and a Learner simultaneously.

Learning Committed Values: You can use either of the two strategies discussed in class to inform the Learners of committed values. If you use a “Distinguished Learner”, the site that is the Proposer for the committed value can also be the “Distinguished Learner”.

Stable Storage: The log should be stored in stable storage. Your application should also store the Acceptor state variables in stable storage, as specified in the Synod Algorithm. Do not store any other information in stable storage.

Site Failures and Recovery: Your application should follow the Synod algorithm, meaning it should tolerate message loss, site failures, and recovery. The application should be able to create new log entries (under the liveness conditions) so long as a majority of sites are active.

A site may miss commit messages for some log entries due to failure. When the site recovers, it should try to learn the values for these missed log entries. You must design and implement a recovery algorithm to do this. Your algorithm should ensure the safety and liveness guarantees described in the Paxos paper.

After a site recovers and learns the missing log entries, it should recreate its in-memory data structure for the key-value store by “replaying” the log, i.e., applying the operation in each log entry in order.

Note 3: A site may also miss log entries that are committed when it is active, since messages may be lost. When a site learns a committed value for log slot 5, for example, it should fill in any holes (missing entries) in log slots 1 - 4 by running an instance of the Synod algorithm for each missing entry. It should issue at most three proposals to fill any hole. If any hole cannot be filled, the site should inform the user the operation cannot be completed.

Paxos Optimization: You will not implement the full Paxos algorithm with leader election, but you will implement a smaller optimization. When a site “wins” a log slot k, meaning the site’s proposed value is the one that is committed to log slot k, this site can skip the “prepare-promise” phase for its first proposal for log entry k+1 (if it has a proposal for this log slot). In essence, this site’s Proposer has implicit promises from all Acceptors for its first proposal for log entry k+1. This first proposal should have proposal number 0.

3. Key-Value Store UI

Each site should accept user input from standard in. Your program should support the following commands:

(a) To execute a put operation, the user enters:

put <key> <value>

for example:

put hello world

If the event for this command is added to the site’s log, the site should print out

put completed

If the operation cannot be completed, for example, because the key already exists in the key-value store, or if the command is not added to the log, the site should print out:

unable to execute put

(b) To execute a get operation, the user enters:

get <key>

For example,

get hello

If the site’s local copy of the key-value store contains the key, the site should print out:

get <key> returned <value>

For example, for a one-element list:

get hello returned world

And for a two element list:

get hello returned one, two

If the site does not have the key in its key-value store, it should print out:

get <key> returned null

(c) A user can append a value to the list corresponding to a key by entering the following command:

append <key> <value>

For example, if the key hello already stores a single value world, after the command

append hello goodbye

the value associated with the key hello should be [world, goodbye]. If there is no value for the associated key, the append operation should be replaced with a put operation.

Once the site completes the append operation, it should print out:

append completed

If the command is not added to the site’s log, the site should print out:

unable to execute append

(d) To remove an entry from the key-value store, the user enters

remove <key>

For example,

remove hello

If the operation is executed successfully, the site should print out:

remove completed

If the command cannot be completed because the key-value store does not contain the key or because the command is not added to the site’s log, the site should print out:

unable to execute remove

(e) To allow us to introspect the contents of the key-value store, when the command  kv is entered, the site should print out the contents of its local copy of the key-value store, using the same format as for Homeworks 1 and 2.

(f) The site should support the following command to display the contents of the log:

log

In response to this command, the system should list the log contents, sorted in ascending order of log slot. For example, if the commands applied, in order, are

put lucky charms

put froot loops

append lucky cereal

remove froot

The output of the log command should be:

insert lucky [charms]

insert froot [loops]

append lucky [cereal]

delete froot [loops]

4. Messaging Requirements

To verify that your program is sending and receiving the correct messages, your site should print which messages are sent and received to stderr. This output will be manually graded. There is no required format, but a suggested format is something like:

sending prepare(100) to all sites

received promise(99, ‘insert lucky [charms]’) from <site_id>

Please do not include extraneous debug information in your stderr output.

5. Implementation Details

5. Submission and Grading

The problem set is worth 25% of the grade, and the coding portion is worth 75% of the grade. Solutions to the problem set must be submitted in Gradescope.

To submit your homework source code, you must follow the submission steps in Submitty. This will check out the project files from your git repository and copy them to Submitty for grading. The project grades will be based on both public and hidden autograding tests.

Note that to run the hidden tests, I need to regrade all project submissions, so the public tests will be rerun. Because of this, you should make sure that your code gives repeatable results.

6. Late Policy

Submissions will be accepted up to 24 hours late for a 20% grade penalty.