CS6222/CS4803:
Machine Learning Systems
Fall 2025
Class Hours: MW 11:00-12:15ET
Classroom: Ford EnvSci&Tech (ES&T) L1255
Professor: Alexey Tumanov
Email: atumanov@gatech.edu
Web: faculty.cc.gatech.edu/~atumanov
Office Hours: MW 12:15-12:45 (or by appt)
Head GTA: Dhruv Garg
Email: dgarg39@gatech.edu
Office Hours: Fri 11-12, KACB3300 Lounge
Teaching Assistants:
Sukrit Kumar [email], Office Hours: Tue 11-12, KACB3300 Lounge
Anirudha Agrawal [email], Office Hours: Thu 3:30-4:30, KACB3300 Lounge
Angelina Zhou [email], Office Hours: Wed 2-3pm, KACB3300 Lounge
The recent resurgence, popularity, and efficacy of Machine Learning Systems are fueled by progress in both Machine Learning algorithms and the hardware and software systems that support them. This relationship is evident in the ability to train increasingly complex models at scale on growing datasets with ever-improving time-to-convergence and time-to-accuracy. New software systems have also contributed to the modularity and simplification of model development, neural architecture discovery, and model fine-tuning by providing practical abstractions. Popular SysML software includes open source frameworks such as PyTorch, TensorFlow, Clipper, and Ray. New hardware platforms specialized for Machine Learning include new generations of GPUs as well as hardware accelerators, such as Google’s TPU and Intel’s Nervana Neural Network Processor (NNP). Focusing on the fundamentals, this course looks at the latest trends at the intersection of three disciplines of Computer Science: Machine Learning, Software Systems, and Computer Architecture, and at how their co-design enables the next generation of Machine Learning Systems.
The list of topics covered in this class includes:
The course format is a mixture of lecture material presented by the instructor and assigned paper analysis presented by the students. The course is heavily project-based by design (either labs or research): to become a SysML researcher and/or practitioner, there is no substitute for hands-on work. Undergraduate students will take the labs option; graduate students will be required to take the research project option. Research project students are strongly encouraged to form teams with diverse expertise, so that each group covers both the Systems and the ML background needed to execute its project (e.g., by including both ML and Systems students in each group).
Note: this course satisfies the following curricular course requirements for both MS and PhD students:
Upon completion of this course a student would be able to:
For undergraduate students, required prerequisites are as follows: (CS2200 or ECE3058) and (CS3210 or CS4210).
For graduate students, an equivalent is expected, but not strictly enforced.
For all students, you need to have a strong background in at least one of {Systems, ML}:
Lecture material competency will be examined through a set of in-class, proctored, timed, individual quizzes throughout the semester as well as a midterm exam.
Your final grade will be assigned as a letter grade according to the following scale:
A 90-100%
B 80-89%
C 70-79%
D 60-69%
F 0-59%
The following graded assignments will contribute to the student’s final grade:
Class participation:
Analytical Paper Presentations:
In-class quizzes on lectures + papers
Midterm Exam
Research Project XOR Labs: 55%
Late assignments will be assessed a penalty of 10 percentage points per day for up to 7 days. After 7 days, a grade of zero is assigned for the late assignment.
For paper review submissions (if graded), the lowest 20% of paper review scores will be dropped.
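The two policies above amount to simple arithmetic. The sketch below is one possible reading, not the official grading code. It assumes scores on a 0-100 scale, that the penalty is 10 percentage points per day, that a partial day late counts as a full day, and that the number of dropped paper reviews is rounded down. The function and constant names are hypothetical.

```python
import math

LATE_PENALTY_PER_DAY = 10.0  # percentage points per day (assumed reading of the policy)
MAX_LATE_DAYS = 7            # beyond this many days late, the score is zero
DROP_PERCENT = 20            # percent of lowest paper review scores dropped


def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Apply the late penalty to a score on a 0-100 scale."""
    if hours_late <= 0:
        return raw_score
    # Assumption: any partial day late counts as a full day.
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_LATE_DAYS:
        return 0.0
    # Never let the penalty push a score below zero.
    return max(0.0, raw_score - LATE_PENALTY_PER_DAY * days_late)


def mean_after_dropping_lowest(scores: list[float]) -> float:
    """Average paper review scores after dropping the lowest 20%."""
    # Assumption: the number of dropped scores is rounded down.
    n_drop = len(scores) * DROP_PERCENT // 100
    kept = sorted(scores)[n_drop:]
    return sum(kept) / len(kept) if kept else 0.0
```

Under these assumptions, an assignment submitted 30 hours late counts as two days late and loses 20 points, and a student with five paper reviews has one score dropped. Check with the teaching staff for how partial days and rounding are actually handled.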
For graduate students, the research project is the major contributor to the student’s grade in this course (see grading policy). Students are expected to work in teams, develop a research idea in the scope of the SysML research area covered by this class, implement the system prototype, develop experimental methodology, carry out experiments, and communicate the results of their research to the class. The research project will have multiple graded components, including:
For undergraduate students, there will be four autograded labs, each with a clear specification and expected autogradable outcomes. In contrast to the open-ended research project, labs will be independent of each other and limited in scope, with an expected effort of two to four weeks per lab. Students are strongly encouraged to start each lab as soon as it is released, because labs are expected to take the full amount of time assigned. Starting late on labs is a common failure mode and should be avoided.
Hand grading is error-prone and mistakes are possible. We allow students to request regrades to ensure that they receive the proper grade for the work they have turned in.
We will accept regrade requests for hand-graded assignments in this class subject to the following regrade policy:
This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TAs, and me. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com.
Find our class signup link at: https://piazza.com/gatech/fall2025/cs8803smr
A student interested in registering in audit or pass/fail mode must always approach the course instructor to determine the minimum passing requirements. For this class, the minimum passing requirement includes full participation in the course project (including proposal, mid-point presentation, and final project deliverables) XOR labs. That’s 55% of the course grade. Therefore, audit and pass/fail are highly discouraged, because the amount of work involved will be similar to taking the course for a letter grade. Furthermore, students not taking this class for a letter grade will still be encouraged to present papers and participate in the class discussion. In the (unlikely) situation where registered students run out of paper presentation slots, the non-letter grade students may be asked to volunteer their presentation slot. In general, the priority for access to resources (including presentation slots and compute) will be letter grade, pass/fail, audit, in that order.
All students must uphold academic integrity and follow the Georgia Tech Honor Code. Cheating will not be tolerated. Examples of behaviors that violate the Georgia Tech Honor Code (Section 3) include but are not limited to:
Due to the highly dynamic situation (e.g., global phenomena outside of the instructor’s control, conference travel, etc.), the syllabus and course schedule may be subject to change. It is the responsibility of students to check Canvas, GradeScope, Piazza, email messages, and course announcements (through course Canvas or Piazza) to stay up-to-date with any course logistics changes. We will make every effort to communicate changes via these mechanisms. The course is held IN RESIDENCE by default, until and unless announced otherwise on course Piazza or Canvas. Virtual or hybrid options or accommodations cannot be guaranteed.
Week | Date | Topic | Paper 1 | Paper 2 |
1 | 18-Aug | Class introduction & overview | ||
1 | 20-Aug | Class topic overview: Tour de SysML 2025 | ||
1 | 22-Aug | Registration deadline | ||
2 | 25-Aug | DL Frameworks war: PyTorch vs TensorFlow | ||
2 | 27-Aug | GPU 101 | ||
3 | 1-Sep | LABOR DAY (no class) | ||
3 | 3-Sep | DL Frameworks Gen 2.0 | ||
4 | 8-Sep | Training Large Models | ||
4 | 10-Sep | Breaking Scaling Boundaries | ||
5 | 15-Sep | Building Automated Distributed Training Systems | ||
5 | 17-Sep | Prediction Serving: Abstractions, Composition | ||
6 | 22-Sep | Prediction Serving Gen 2.0: Model Zoo | ||
6 | 24-Sep | Managing Resources for DL Training | Gandiva | MLaaS in the Wild |
7 | 29-Sep | Optimizing Resource Allocation | ||
7 | 1-Oct | Serving LLMs: Foundations | Orca | Inside vLLM |
8 | 6-Oct | FALL BREAK (no class) | ||
8 | 8-Oct | Serving LLMs: Low latency | Sarathi-Serve | DistServe |
9 | 13-Oct | DeepSeek V3: Case-study on architecture-system codesign | DeepSeek V3 Sec. 1-3 | DeepSeek V3 Inference Sys Overview |
9 | 15-Oct | Midterm Exam | ||
10 | 20-Oct | Mid-point project presentations | ||
10 | 22-Oct | Mid-point project presentations | ||
11 | 27-Oct | Systems Support for Long Context in LLMs | Medha | Cartridges |
11 | 29-Oct | Hardware Aware Algorithm Design | ||
12 | 3-Nov | Caching for LLM Inference Systems | Akasha (TBD) | Strata |
12 | 5-Nov | Model Compression and Quantization | ||
13 | 10-Nov | Neural Architecture Search: Deployment-aware | ||
13 | 12-Nov | Federated Learning and FL NAS | ||
14 | 17-Nov | Guest lecture: Retrospective + Prospective on SysML | ||
14 | 19-Nov | Final project presentations | ||
15 | 24-Nov | Final project presentations | ||
15 | 26-Nov | STUDENT RECESS (no class) | ||
16 | 1-Dec | Final project presentations | ||
Note: quiz dates marked in yellow.