CS525 UIUC SP21: Reading List
Compiled by Prof. Indranil Gupta. http://indy.cs.illinois.edu/
Main Course Website: https://courses.engr.illinois.edu/cs525/sp2021
SP21 Schedule: https://go.cs.illinois.edu/CS525SP21Schedule (see it for slides on presentation/discussion)
Current Document: https://go.cs.illinois.edu/CS525SP21ReadingList
There are a total of 22 sessions. The presentation schedule for these can be found at the main CS525 SP21 Schedule Page.
For each session below, there are:
1) Two “Main Papers” (these are the presentation papers), and
2) Some “Additional Readings” papers.
These papers have been carefully selected, and the schedule is not negotiable. Presenters must present the two “Main Papers” (they may also cover the other papers listed in the session). Other students must review any one of the two “Main Papers.”
Finally, more papers on topics that we could not cover in the limited number of weeks in SP21 are available at the SP18 CS525 website.
Table of Contents
***** Indy’s Lectures [1/26-2/11]
***** Federated ML [2/16]
***** Elastic ML [2/18]
***** IoT + Distributed Systems [2/23]
***** IoT Failure Handling [2/25]
***** Server Less and Achieve More [3/2]
***** Disaggregate it All [3/4]
***** In the Blink of an Eye [3/9]
***** Planet-Scale Systems [3/11]
***** The Edge [3/16]
***** Edge-Cloud/Hybrid [3/18]
***** ML Scheduling and Frameworks [3/23]
***** ML Parameter Management [3/25]
***** Time is of the Essence [3/30]
***** Rethink. All. The. Assumptions. (Twist in the Classical Tale) [4/1]
***** Verification Approaches [4/6]
***** Video Killed the Radio Star [4/8]
***** Consensus, and Consistency [4/15]
***** Industry [4/20]
***** Grounding It (Measurement Studies) [4/22]
***** Transactions [4/27]
***** Miscellaneous [4/29]
***** Wrap Up [5/4]
***** Indy’s Lectures [1/26-2/11]
1/26 Introduction
1/28 Before, There Were Clouds
- Historical reflections: The rise, fall, and resurrection of software as a service, Martin Campbell-Kelly, CACM, May 2009.
- MapReduce: Simplified Data Processing on Large Clusters, J. Dean et al, OSDI 2004 (Google)
- Grid: a new infrastructure for 21st century science, I. Foster, Physics Today, 2002.
- A view of cloud computing, Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, Matei Zaharia, CACM, vol. 53, no. 4, Apr 2010.
- Larry Ellison's Rant on Cloud Computing (Youtube video)
2/2 P2P Systems
- The Gnutella protocol specification v 0.4
- Chord: a scalable peer-to-peer lookup service for Internet applications, I. Stoica et al, SIGCOMM 2001
- Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems, A. Rowstron et al, Middleware 2001.
- Kelips: Building an Efficient and Stable P2P DHT through Increased Memory and Background Overhead, I. Gupta, K. Birman, P. Linga, A. Demers, R. van Renesse, IPTPS 2003 (Springer).
2/4 Key-Value Stores
2/9 Distributed Algorithms, Sensor Networks
2/11 Orientation To Upcoming Topics
(No papers, Indy will present an orientation of upcoming topics at a very high level)
***** Federated ML [2/16]
Main Papers
- Towards Federated Learning at Scale: System Design, Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloé Kiddon, Jakub Konečný, Stefano Mazzocchi, Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander, MLSys 2019.
- Throughput-Optimal Topology Design for Cross-Silo Federated Learning, Othmane Marfoq, Chuan Xu, Giovanni Neglia, Richard Vidal, NeurIPS 2020. [arxiv version]
Additional Readings
- PLink: Discovering and Exploiting Locality for Accelerated Distributed Training on the public Cloud, Liang Luo, Peter West, Jacob Nelson, Arvind Krishnamurthy, Luis Ceze, MLSys 2020.
- Federated Optimization in Heterogeneous Networks, Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith, MLSys 2020.
***** Elastic ML [2/18]
Main Papers
- Serving DNNs like Clockwork: Performance Predictability from the Bottom Up, Arpan Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace, OSDI 2020.
- Resource Elasticity in Distributed Deep Learning, Andrew Or, Haoyu Zhang, Michael Freedman, MLSys 2020.
Additional Readings
- HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees, Hanyu Zhao, Zhenhua Han, Zhi Yang, Quanlu Zhang, Fan Yang, Lidong Zhou, Mao Yang, Francis C.M. Lau, Yuqi Wang, Yifan Xiong, Bin Wang, OSDI 2020.
- KungFu: Making Training in Distributed Machine Learning Adaptive, Luo Mai, Guo Li, Marcel Wagenlander, Konstantinos Fertakis, Andrei-Octavian Brabete, Peter Pietzuch, OSDI 2020.
***** IoT + Distributed Systems [2/23]
Main Papers
Additional Readings
- Medley: A Novel Distributed Failure Detector for IoT Networks, Rui Yang, Shichu Zhu, Yifei Li, Indranil Gupta, Proc. ACM/IFIP Middleware, 2019.
- Fault-tolerant consensus in directed graphs and convex-hull consensus, Lewis Tseng, PhD Thesis, UIUC, 2018.
***** IoT Failure Handling [2/25]
Main Papers
- Rivulet: a fault-tolerant platform for smart-home applications, M. S. Ardekani, R. P. Singh, N. Agrawal, D. B. Terry, R. O. Suminto, Middleware 2017.
- IoTRepair: Systematically addressing device faults in commodity IoT, Michael Norris, Berkay Celik, Prasanna Venkatesh, Shulin Zhao, Patrick McDaniel, Anand Sivasubramaniam, Gang Tan, IoTDI 2020. [arxiv version]
Additional Readings
- Transactuations: where transactions meet the physical world, A. Sengupta, T. Leesatapornwongsa, M. S. Ardekani, C. A. Stuardo, Usenix ATC 2019.
- Home, SafeHome: Smart Home Reliability with Visibility and Atomicity, Shegufta Bakht Ahsan, Rui Yang, Shadi A. Noghabi, Indranil Gupta, (To Appear), Eurosys 2021. [arxiv version]
***** Server Less and Achieve More [3/2]
Main Papers
- A Fault-Tolerance Shim for Serverless Computing, Vikram Sreekanti, Chenggang Wu, Saurav Chhatrapati, Joseph Gonzalez, Joseph M. Hellerstein (UC Berkeley), Jose M. Faleiro, Eurosys 2020.
- Fault-tolerant and transactional stateful serverless workflows, Haoran Zhang, Adney Cardoza, Peter Baile Chen, Sebastian Angel, Vincent Liu, OSDI 2020.
Additional Readings
- FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices, Haoran Qiu, Subho S. Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer, OSDI 2020.
- SEUSS: Skip Redundant Paths to Make Serverless Fast, James Cadden, Thomas Unger, Yara Awad, Han Dong, Orran Krieger, Jonathan Appavoo, Eurosys 2020.
- Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo, Le Xu, Shivaram Venkataraman, Indranil Gupta, Luo Mai, Rahul Potharaju, NSDI 2021.
***** Disaggregate it All [3/4]
Main Papers
- Capuchin: Tensor-based GPU Memory Management for Deep Learning, Xuan Peng, Xuanhua Shi, Hulin Dai, Hai Jin, Weiliang Ma, Qian Xiong, Fan Yang, Xuehai Qian, ASPLOS 2020.
- Building An Elastic Query Engine on Disaggregated Storage, Midhul Vuppalapati, Justin Miron, Rachit Agarwal, Dan Truong, Ashish Motivala, Thierry Cruanes, NSDI 2020.
Additional Readings
- Can Far Memory Improve Job Throughput? Emmanuel Amaro, Christopher Branner-Augmon, Zhihong Luo, Amy Ousterhout, Marcos K. Aguilera, Aurojit Panda, Sylvia Ratnasamy, Scott Shenker, Eurosys 2020.
- Accessible Near-Storage Computing with FPGAs, Robert Schmid, Max Plauth, Lukas Wenzel, Felix Eberhardt, Andreas Polze, Eurosys 2020.
- Hailstorm: Disaggregated Compute and Storage for Distributed LSM-based Databases, Laurent Bindschaedler, Ashvin Goel, Willy Zwaenepoel, ASPLOS 2020.
***** In the Blink of an Eye [3/9]
Main Papers
- Microsecond Consensus for Microsecond Applications, Marcos K. Aguilera, Naama Ben-David, Rachid Guerraoui, Virendra J. Marathe, Athanasios Xygkis, Igor Zablotchi, OSDI 2020.
- HovercRaft: Achieving Scalability and Fault-tolerance for microsecond-scale Datacenter Services, Marios Kogias, Edouard Bugnion, Eurosys 2020.
Additional Readings
***** Planet-Scale Systems [3/11]
Main Papers
- State-machine replication for planet-scale systems, Vitor Enes, Carlos Baquero, Tuanir França Rezende, Alexey Gotsman, Matthieu Perrin, Pierre Sutra, Eurosys 2020. [Paper link on author’s homepage]
- Sol: Fast Distributed Computation Over Slow Networks, Fan Lai, Jie You, Xiangfeng Zhu, Harsha V. Madhyastha, Mosharaf Chowdhury, NSDI 2020.
Additional Readings
***** The Edge [3/16]
Main Papers
***** Edge-Cloud/Hybrid [3/18]
Main Papers
- MCUNet: Tiny Deep Learning on IoT Devices, Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han, NeurIPS 2020. [Conference paper link]
- CLIO: Enabling automatic compilation of deep learning pipelines across IoT and Cloud, Jin Huang, Colin Samplawski, Deepak Ganesan, Benjamin Marlin, Heesung Kwon, Mobicom 2020. [Author paper link]
Additional Readings
***** ML Scheduling and Frameworks [3/23]
Main Papers
- Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads, Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, Matei Zaharia, OSDI 2020.
- Ray: A Distributed Framework for Emerging AI Applications, Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, Ion Stoica, OSDI 2018.
Additional Readings
- Efficient Algorithms for Device Placement of DNN Graph Operators, Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino, NeurIPS 2020.
- AntMan: Dynamic Scaling on GPU Clusters for Deep Learning, Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, Yangqing Jia, OSDI 2020.
- Themis: Fair and Efficient GPU Cluster Scheduling, Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, Shuchi Chawla, NSDI 2020.
- Online Scheduling of Heterogeneous Distributed Machine Learning Jobs, Qin Zhang, Ruiting Zhou, Chuan Wu, Lei Jiao, Zongpeng Li, Mobihoc 2020.
- When Edge Meets Learning: Adaptive Control for Resource-Constrained Distributed Machine Learning, Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, Kevin Chan, INFOCOM 2018.
- Online Dispatching and Scheduling of Jobs with Heterogeneous Utilities in Edge Computing, Chi Zhang, Haisheng Tan, Haoqiang Huang, Zhenhua Han, Shaofeng H.-C. Jiang, Nikolaos Freris, XiangYang Li, Mobihoc 2020.
- A Generic Communication Scheduler for Distributed DNN Training Acceleration, Yanghua Peng, Yibo Zhu, Yangrui Chen, Yixin Bao, Bairen Yi, Chang Lan, Chuan Wu, Chuanxiong Guo, SOSP 2019.
- Baechi: Fast Device Placement of Machine Learning Graphs, Beomyeol Jeon, Linda Cai, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta, SoCC 2020.
***** ML Parameter Management [3/25]
Main Papers
- Blink: Fast and Generic Collectives for Distributed ML, Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Nikhil Devanur, Jorgen Thelin, Ion Stoica, MLSys 2020.
- Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training, Qinyi Luo, Jiaao He, Youwei Zhuo, Xuehai Qian, ASPLOS 2020.
Additional Readings
- BlueConnect: Decomposing All-Reduce for Deep Learning on Heterogeneous Network Hierarchy, Minsik Cho, Ulrich Finkler, David Kung, MLSys 2019.
- Priority-based Parameter Propagation for Distributed DNN Training, Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, Gennady Pekhimenko, MLSys 2019.
- A System for Massively Parallel Hyperparameter Tuning, Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Jonathan Ben-tzur, Moritz Hardt, Benjamin Recht, Ameet Talwalkar, MLSys 2020.
- 3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning, Hyeontaek Lim, David G Andersen, Michael Kaminsky, MLSys 2019.
***** Time is of the Essence [3/30]
Main Papers
- Sundial: Fault-tolerant Clock Synchronization for Datacenters, Yuliang Li, Gautam Kumar, Hema Hariharan, Hassan Wassel, Peter Hochschild, Dave Platt, Simon Sabato, Minlan Yu, Nandita Dukkipati, Prashant Chandra, Amin Vahdat, OSDI 2020.
- Reliable Timekeeping for Intermittent Computing, Jasper de Winkel, Carlo Delle Donne, Kasım Sinan Yıldırım, Przemysław Pawełczak, Josiah Hester, ASPLOS 2020.
Additional Readings
- Spanner: Google's Globally-Distributed Database, J. C. Corbett, J. Dean, et al, OSDI 2012
- Time-sensitive Intermittent Computing Meets Legacy Software, Vito Kortbeek, Kasım Sinan Yıldırım, Abu Bakar, Jacob Sorber, Josiah Hester, Przemysław Pawełczak, ASPLOS 2020.
- Time Awareness in Deep Learning-Based Multimodal Fusion Across Smartphone Platforms, Sandeep Singh Sandha, Joseph Noor, Fatima M. Anwar, Mani Srivastava, IoTDI 2020.
- AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning, Hao Zhang, Yuan Li, Zhijie Deng, Xiaodan Liang, Lawrence Carin, Eric Xing, NeurIPS 2020.
- No Time for Asynchrony, Marcos Aguilera, Michael Walfish, Usenix HotOS, 2009.
***** Rethink. All. The. Assumptions. (Twist in the Classical Tale) [4/1]
Main Papers
Additional Readings
- Partial synchrony based on set timeliness, Marcos K. Aguilera, Carole Delporte-Gallet, Hugues Fauconnier, Sam Toueg, ACM PODC 2009.
- Baechi: Fast Device Placement of Machine Learning Graphs, Beomyeol Jeon, Linda Cai, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta, SoCC 2020.
- Lineage Stash: Fault Tolerance Off the Critical Path, Stephanie Wang, John Liagouris, Robert Nishihara, Philipp Moritz, Ujval Misra, Alexey Tumanov, Ion Stoica, SOSP 2019.
- Aegean: Replication Beyond the Client-Server Model, Remzi Can Aksoy, Manos Kapritsos, SOSP 2019.
- Toward a Generic Fault Tolerance Technique for Partial Network Partitioning, Mohammed Alfatafta, Basil Alkhatib, Ahmed Alquraan, Samer Al-Kiswany, OSDI 2020.
***** Verification Approaches [4/6]
Main Papers
- Effective Concurrency Testing for Distributed Systems, Xinhao Yuan, Junfeng Yang, ASPLOS 2020.
- Storage Systems are Distributed Systems (So Verify Them That Way!), Travis Hance, Andrea Lattuada, Chris Hawblitzel, Jon Howell, Rob Johnson, Bryan Parno, OSDI 2020.
Additional Readings
- IronFleet: Proving Practical Distributed Systems Correct, Chris Hawblitzel, Jon Howell, Manos Kapritsos, Jacob R. Lorch, Bryan Parno, Michael L. Roberts, Srinath Setty, Brian Zill, SOSP 2015
- Kaizen: Building a Verified and Performant Blockchain System, F. Kalim, K. Palmskog, J. Mehar, A. Murali, M. Parthasarathy, I. Gupta, FMCAD 2019.
- Quantitative Analysis of Consistency in NoSQL Key-value Stores, S. Liu, J. Ganhotra, M. R. Rahman, S. Nguyen, I. Gupta, J. Meseguer. LITES/QEST vol. 4, no. 1, 2017.
- Building Scalable and Flexible Cluster Managers Using Declarative Programming, Lalith Suresh, Joao Loff, Faria Kalim, Sangeetha Abdu Jyothi, Nina Narodytska, Leonid Ryzhyk, Sahan Gamage, Brian Oki, Pranshu Jain, Michael Gasch, OSDI 2020.
- Implementing Declarative Overlays, B. T. Loo et al, SOSP 2005
***** Video Killed the Radio Star [4/8]
Main Papers
- Approximate query service on autonomous IoT cameras, Mengwei Xu, Xiwen Zhang, Yunxin Liu, Gang Huang, Xuanzhe Liu, Felix Xiaozhu Lin, MobiSys 2020. [arxiv version]
- Scaling Video Analytics on Constrained Edge Nodes, Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David G Andersen, Michael Kaminsky, Subramanya R. Dulloor, MLSys 2019.
Additional Readings
- Nexus: A GPU Cluster Engine for Accelerating DNN-Based Video Analysis, Haichen Shen, Lequn Chen, Yuchen Jin, Liangyu Zhao, Bingyu Kong, Matthai Philipose, Arvind Krishnamurthy, Ravi Sundaram, SOSP 2019.
- MVStylizer: An Efficient Edge-Assisted Video Photorealistic Style Transfer System for Mobile Phones, Ang Li, Chunpeng Wu, Yiran Chen, Mobihoc 2020.
- CSI: Inferring Mobile ABR Video Adaptation Behavior under HTTPS and QUIC, Shichang Xu, Subhabrata Sen, Z. Morley Mao, Eurosys 2020.
***** Consensus, and Consistency [4/15]
Main Papers
- Virtual Consensus in Delos, Mahesh Balakrishnan, Jason Flinn, Chen Shen, Mihir Dharamshi, Ahmed Jafri, Xiao Shi, Santosh Ghosh, Hazem Hassan, Aaryaman Sagar, Rhed Shi, Jingming Liu, Filip Gruszczynski, Xianan Zhang, Huy Hoang, Ahmed Yossef, Francois Richard, Yee Jiun Song, OSDI 2020.
- Toward a Generic Fault Tolerance Technique for Partial Network Partitioning, Mohammed Alfatafta, Basil Alkhatib, Ahmed Alquraan, Samer Al-Kiswany, OSDI 2020.
Additional Readings
- EPaxos Revisited, Sarah Tollman, Seo Jin Park, John Ousterhout, NSDI 2021.
- Fault-Tolerant Replication with Pull-Based Consensus in MongoDB, Siyuan Zhou, Shuai Mu, NSDI 2021.
- Tolerating Slowdowns in Replicated State Machines using Copilots, Khiem Ngo, Siddhartha Sen, Wyatt Lloyd, OSDI 2020.
- Scalog: Seamless Reconfiguration and Total Order in a Scalable Shared Log, Cong Ding, David Chu, Evan Zhao, Xiang Li, Lorenzo Alvisi, Robbert van Renesse, NSDI 2020.
- CRaft: An Erasure-coding-supported Version of Raft for Reducing Storage Cost and Network Cost, Zizhong Wang, Tongliang Li, Haixia Wang, Airan Shao, Yunren Bai, Shangming Cai, Zihan Xu, Dongsheng Wang, FAST 2020.
- The SNOW Theorem and Latency-Optimal Read-Only Transactions, H. Lu et al, OSDI 2016.
- Conflict-free Replicated Data Types (CRDTs), N. Preguiça, C. Baquero, M. Shapiro, arXiv 2018.
- Strong and Efficient Consistency with Consistency-Aware Durability, Aishwarya Ganesan, Ramnatthan Alagappan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, FAST 2020.
- Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol, Antonios Katsarakis, Vasilis Gavrielatos, M.R. Siavash Katebzadeh, Arpit Joshi, Aleksandar Dragojević, Boris Grot, Vijay Nagarajan, ASPLOS 2020.
***** Industry [4/20]
Main Papers
- Autopilot: workload autoscaling at Google, Krzysztof Rzadca, Pawel Findeisen, Jacek Swiderski, Przemyslaw Zych, Przemyslaw Broniek, Jarek Kusmierek, Pawel Nowak, Beata Strack, Piotr Witusowski, Steven Hand, John Wilkes, Eurosys 2020. [direct link]
- Millions of Tiny Databases, Marc Brooker, Tao Chen, Fan Ping, NSDI 2020.
Additional Readings
- Borg: the Next Generation, Muhammad Tirmazi, Adam Barker, Nan Deng, Md Ehtesam Haque, Zhijing Gene Qin, Steven Hand, Mor Harchol-Balter, John Wilkes, Eurosys 2020.
- Gandalf: An Intelligent, End-To-End Analytics Service for Safe Deployment in Large-Scale Cloud Infrastructure, Ze Li, Qian Cheng, Ken Hsieh, Yingnong Dang, Peng Huang, Pankaj Singh, Xinsheng Yang, Qingwei Lin, Youjiang Wu, Sebastien Levy, Murali Chintalapati, NSDI 2020.
- Twine: A Unified Cluster Management System for Shared Infrastructure, Chunqiang Tang, Kenny Yu, Kaushik Veeraraghavan, Jonathan Kaldor, Scott Michelson, Thawan Kooburat, Aravind Anbudurai, Matthew Clark, Kabir Gogia, Long Cheng, Ben Christensen, Alex Gartrell, Maxim Khutornenko, Sachin Kulkarni, Marcin Pawlowski, Tuomas Pelkonen, Andre Rodrigues, Rounak Tibrewal, Vaishnavi Venkatesan, Peter Zhang, OSDI 2020.
- Akamai DNS: Providing Authoritative Answers to the World’s Queries, Kyle Schomp, Onkar Bhardwaj, Eymen Kurdoglu, Mashooq Muhaimen, Ramesh K. Sitaraman, SIGCOMM 2020.
- POLARDB Meets Computational Storage: Efficiently Support Analytical Workloads in Cloud-Native Relational Database, Wei Cao, Yang Liu, Zhushi Cheng, Ning Zheng, Wei Li, Wenjie Wu, Linqiang Ouyang, Peng Wang, Yijing Wang, Ray Kuan, Zhenjun Liu, Feng Zhu, Tong Zhang, FAST 2020.
***** Grounding It (Measurement Studies) [4/22]
Main Papers
Additional Readings
- On the Use of ML for Blackbox System Performance Prediction, Silvery Fu, Saurabh Gupta, Radhika Mittal, Sylvia Ratnasamy, NSDI 2021.
- Uncovering Access, Reuse, and Sharing Characteristics of I/O-Intensive Files on Large-Scale Production HPC Systems, Tirthak Patel, Suren Byna, Glenn K. Lockwood, Nicholas J. Wright, Philip Carns, Robert Ross, Devesh Tiwari, FAST 2020.
- Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook, Zhichao Cao, Siying Dong, Sagar Vemuri, David H.C. Du, FAST 2020.
***** Transactions [4/27]
Main Papers
- KVell+: Snapshot Isolation without Snapshots, Baptiste Lepers, Oana Balmau, Karan Gupta, Willy Zwaenepoel, OSDI 2020.
- Meerkat: Multicore-Scalable Replicated Transactions Following the Zero-Coordination Principle, Adriana Szekeres, Michael Whittaker, Naveen Kr. Sharma, Jialin Li, Arvind Krishnamurthy, Dan R. K. Ports, Irene Zhang, Eurosys 2020.
Additional Readings
***** Miscellaneous [4/29]
Main Papers
Additional Readings
- #byzantine The Byzantine Generals problem, L. Lamport et al, TOPLAS 1982
- #byzantine Practical Byzantine Fault-Tolerance, Castro et al, OSDI 1999.
- #byzantine Zyzzyva: Speculative Byzantine Fault Tolerance, Ramakrishna Kotla et al, SOSP 2007
- #blockchains Blockene: A High-throughput Blockchain Over Mobile Devices, Sambhav Satija, Apurv Mehra, Sudheesh Singanamalla, Karan Grover, Muthian Sivathanu, Nishanth Chandran, Divya Gupta, Satya Lokam, OSDI 2020.
- #blockchains Fast and Secure Global Payments with Stellar, Marta Lokhava, Giuliano Losa (Galois), David Mazières (Stanford), Graydon Hoare, Nicolas Barry, Eliezer Gafni (UCLA), Jonathan Jove, Rafał Malinowski, Jed McCaleb, SOSP 2019.
- #hardware Assise: Performance and Availability via Client-local NVM in a Distributed File System, Thomas E. Anderson, Marco Canini, Jongyul Kim, Dejan Kostic, Youngjin Kwon, Simon Peter, Waleed Reda, Henry N. Schuh, Emmett Witchel, OSDI 2020.
- #compilers #ml Transferable Graph Optimizers for ML Compilers, Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter Ma, Qiumin Xu, Hanxiao Liu, Mangpo Phitchaya Phothilimthana, Shen Wang, Anna Goldie, Azalia Mirhoseini, James Laudon, NeurIPS 2020.
***** Wrap Up [5/4]
(No reviews required for the following papers.)
Additional Readings
- R. Hoffmann, "Why buy that theory?", 2003
- R. P. Feynman, "Metaplast Corp."