Advanced Computer Systems Syllabus

Spring 2019

This course will cover seminal recent research papers across topics in distributed computer systems, with a focus on managing big data. Topics may include communication paradigms, process management, naming, synchronization, consistency and replication, fault tolerance, storage architectures, high-performance file systems, data provenance, and next-generation storage devices and architectures, including those at Google, Yahoo, and Amazon.  Throughout the course, we will discuss the tradeoffs made between performance, reliability, scalability, robustness, and security.  

Instructor

Prof. Avani Wildani (Dr. Will)

http://www.mathcs.emory.edu/~avani

Email: avani@mathcs.emory.edu

Office: MSC W412

Office Hours: By appointment

Textbook

“Principles of Computer Systems Design: An Introduction” - Saltzer et al.

Occasional exercises will be assigned from this book, and it’s a good first resource.

Grading                                                

Summary Writing Instructions                                        

                                                                   

A major component of this course will be the in-class discussion of research papers in operating systems. Typically, you will need to read one paper per class; the reading list is available online, and all of the papers are linked from it. Read each paper carefully and, before the class meeting in which it will be discussed, prepare a short (1-2 paragraph) summary along with at least three questions or insightful comments about the material. The summary consists of brief answers to the first five questions below; your comments or questions form the sixth item:

  1. What is the problem the authors are trying to solve?
  2. Why is it interesting, relevant, and/or important?
  3. What other approaches or solutions existed at the time that this work was done?
  4. What was wrong with the other approaches or solutions?
  5. What is the authors' approach or solution, and how does it compare to earlier approaches or solutions?
  6. Three or more comments/questions about the paper.                                                

Class Project

                                        

Students in the class must complete a research project in the general area of operating systems. Both a paper describing the project and a poster presentation will be required. The project should be the result of experimental research (strongly preferred) or a strong survey of prior art in a focused area.

Your project should take approximately 60–80 hours over the course of the semester, including time to read background material, build and run your experiments, and write up your results.

If you want to work with someone else in the class on your project, you may do so with prior approval (i.e., please see me before doing this). If you work with a partner, the expectations for the scope of your project will be adjusted accordingly.

ALL PAPERS MUST BE IN LATEX

There will be checkpoints during the semester to keep you on schedule to complete your project. Checkpoints:

1/29: Project Proposal Due

2/12: Annotated Bibliography Due

3/5: Research Plan Due

4/8: Preliminary Graphs Due

5/8: Paper Due

Note: this is the LAST DAY of the term.  I am literally giving you all of the time I can, so there can be no extensions.

What is a Proposal?

In your project proposal, you will put together a few sentences (no more than a paragraph) that define a problem you find interesting and propose a project that addresses some part of that problem.  This could include:

What is a Bibliography?

For your annotated bibliography, I want to see a list of sources, each with a sentence or two about what the paper/book/thesis/article contains that is relevant to your work. Since your paper will be in LaTeX, I recommend making a BibTeX bibliography. If you do, it’s perfectly fine to turn in the raw .bib file with your annotations in a @comment{} block below each entry.

For a 1-person project, I expect ~30 sources, but this number will vary based on topic area.

What is a Research Plan?

This will be a *detailed plan* for implementing your system.  At this point, you should have an outline of the paper, with your introduction and background sections written.  This is, in effect, your “Methods” section.  It should include the data you intend to gather, the specific experiments you intend to run, the graphs you expect these experiments to produce, and possible extensions if these experiments do not go as planned.  This should be at least 2 pages.

What are Preliminary Graphs?

These are a first stab at solving the problem, likely with several simplifying assumptions.  This is a good way to verify that your approach is sound before spending onerous amounts of computation and programming time.  

What goes on the poster?

Your poster should be an academic research poster that presents your problem statement, core ideas (e.g., an architecture diagram, a single set of equations, or a *very* succinctly written bulleted list), relevant graphs that support your hypothesis, and a few bullet points to help viewers interpret your graphs. There will be an in-class poster session on the last day of class that will be open to the entire department, so make certain you can discuss your ideas with clever non-experts!

What goes in the paper?

You will have read many research papers by this point in the term, and that is precisely what you will now be writing.  An academic research paper typically has the following sections:

Course Schedule

| Date | Lecture | Readings | Presenter |
| --- | --- | --- | --- |
| 1/15 | Intro | Ch. 1 | Avani |
| 1/17 | Systems Background | Ch. 1; The UNIX Time Sharing System | Avani (slides) |
| 1/22 | Naming in Systems | Ch. 2.2, 2.3; Blockstack: A Global Naming and Storage System Secured by Blockchains | Zexin |
| 1/24 | Modularity in Networks (Client/Server model) | Ch. 4.1, 4.2; END-TO-END ARGUMENTS IN SYSTEM DESIGN | Yue |
| 1/29 | SNOW DAY | | |
| 1/31 | Modularity in memory and Virtualization | Ch. 5.1; Exokernel: An Operating System Architecture for Application-Level Resource Management | Xiaoyuan |
| 2/5 | Modularity EVERYWHERE (2 paper day!); PROJECT PROPOSALS DUE | Ch. 5.3; LegoOS; Efficient virtual memory for big memory servers | Si; Jianqiao |
| 2/7 | Virtual Machines / Hypervisors (2 paper day!) | Ch. 5.2, 5.8; Xen; The Design and Implementation of Hyperupcalls | Avani; Yue |
| 2/12 | Containerization | SOCK: Rapid Task Provisioning with Serverless-Optimized Containers; Cntr: Lightweight OS Containers | Si; Xiaoyuan |
| 2/14 | Control Planes (2 paper day!) | Arrakis: The Operating System Is the Control Plane; Mirador: An Active Control Plane for Datacenter Storage | Jianqiao; Zexin |
| 2/19 | Break Day! | | |
| 2/21 | Scheduling, Performance; ANNOTATED BIBLIOGRAPHY DUE | Ch. 6.3, 6.1; Pliant: Leveraging Approximation to Improve Datacenter Resource Efficiency | Ziwei |
| 2/28 | SDNs (Guest Speaker: TBA) | Ch. 7 overview, 7.1; OpenFlow: Enabling Innovation in Campus Networks | |
| 3/5 | Networking | Ch. 7.2-7.5 (skim 3-5); Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network | Yuchun |
| 3/7 | Cloud Computing | MapReduce: Simplified Data Processing on Large Clusters; Large-scale cluster management at Google with Borg | Yue; Yuxin |
| 3/12, 3/14 | SPRING BREAK | Work on your projects! | |
| 3/19 | Blockchains | Bitcoin: A Peer-to-Peer Electronic Cash System | |
| 3/21 | Atomicity / Consistency / Concurrency (The Paxos Lecture) | Ch. 9.1, 9.2, 9.4; Paxos Made Simple; In Search of an Understandable Consensus Algorithm; The Part-Time Parliament (Original Paxos paper: optional) | Ziwei; Avani |
| 3/26 | Blockchains | Bitcoin: A Peer-to-Peer Electronic Cash System | Zexin |
| 3/28 | Data Management and File Systems | The Freeze-Frame File System (optional); Dynamo: Amazon’s Highly Available Key-value Store | Jianqiao |
| 4/2 | Caching; RESEARCH PLAN DUE | RobinHood: Tail Latency Aware Caching -- Dynamic Reallocation from Cache-Rich to Cache-Poor; ARC: A Self-Tuning, Low Overhead Replacement Cache | Yuxin; Xiaoyuan |
| 4/4 | Data in the Cloud | Ceph; DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed Caching | Yuchun; Yuxin |
| 4/9 | DHTs, Object Storage, Tracing | Finding a needle in Haystack: Facebook’s photo storage; Canopy: An End-to-End Performance Tracing And Analysis System | Ziwei; Xiaoyuan |
| 4/11 | Fault Tolerance | A case for redundant arrays of inexpensive disks (RAID); Cluster storage systems gotta have HeART: improving storage efficiency by exploiting disk-reliability heterogeneity | Yuchun; Jianqiao |
| 4/16 | Distributed Systems | Tango: distributed data structures over a shared log; Spanner: Google’s Globally-Distributed Database | Yuxin; Yue |
| 4/18 | Workload Analysis | On the diversity of cluster workloads and its impact on research results | Si |
| 4/23 | Systems for AI | Ray: A Distributed Framework for Emerging AI Applications | Avani |
| 4/25 | LAST DAY!! Poster Presentations; FINAL PAPER DUE | | |