CS6222/CS4803:

Machine Learning Systems

Fall 2025

Class Hours: MW 11:00-12:15 ET

Classroom: Ford EnvSci&Tech (ES&T) L1255

Web: piazza/canvas

Professor: Alexey Tumanov

Email: atumanov@gatech.edu

Web: faculty.cc.gatech.edu/~atumanov

Office hours: MW 12:15-12:45 (or by appt)

Head GTA: Dhruv Garg

Email: dgarg39@gatech.edu

Office Hours: Fri 11-12, KACB3300 Lounge

Teaching Assistants

Sukrit Kumar [email]

Office Hours: Tue 11-12, KACB3300 Lounge

Anirudha Agrawal [email]

Office Hours: Thu 3:30-4:30, KACB3300 Lounge

Angelina Zhou [email]

Office Hours: Wed 2-3pm, KACB3300 Lounge

Course Description

The recent resurgence, popularity, and efficacy of Machine Learning Systems are fueled by progress in both Machine Learning algorithms and the hardware and software systems that support them. Examples of this relationship can be found in training increasingly complex models at scale on growing datasets, with ever-improving time-to-convergence and time-to-accuracy. New software systems have also contributed to the modularity and simplification of model development, neural architecture discovery, and model fine-tuning by providing practical abstractions. Popular SysML software includes open-source frameworks such as PyTorch, TensorFlow, Clipper, and Ray. New hardware platforms specialized for Machine Learning include new generations of GPUs as well as hardware accelerators, such as Google’s TPU and Intel’s Nervana Neural Network Processor (NNP). Focusing on the fundamentals, this course examines the latest trends at the intersection of three disciplines of Computer Science: Machine Learning, Software Systems, and Computer Architecture, and shows how their co-design enables the next generation of Machine Learning Systems.

The list of topics covered in this class includes:

  • ML lifecycle management
  • ML model serving/inference
  • frameworks for ML training and inference
  • latency-aware neural architecture search (NAS)
  • Federated Learning (FL)
  • weight-shared Deep Neural Network (DNN) training and inference
  • ML model hyperparameter optimization
  • model compression and quantization (static and dynamic)
  • resource management & scheduling for ML workloads
  • Large Language Models (LLMs):
      • foundations of Large Language Models (LLMs)
      • latency- and throughput-maximizing mechanisms and policies
      • advanced LLM inference scheduling: 4D parallelism

The course format is a mixture of lecture material presented by the instructor and assigned paper analyses presented by the students. The course is heavily project-based by design (either labs or research): to learn to be a SysML researcher and/or practitioner, there is no substitute for hands-on work. Undergraduate students will take the labs option; graduate students will be required to take the research project option. Research project students are strongly encouraged to form teams that diversify their expertise, providing full coverage of both the Systems and ML background needed to execute their projects (e.g., by including both ML and Systems students in each group).

Curricular Requirement Satisfaction

Note: this course satisfies the following curricular requirements for both MS and PhD students:

  • MSCS Specialization in ML [source] elective course
  • MSCS Specialization in Systems [source] elective course
  • SCS PhD program Systems qualifier: one of the core (area) courses
  • ECE PhD CSS TIG Coursework Qualifier requirement [handbook]

Learning Objectives

Upon completion of this course, a student will be able to:

  • Use state-of-the-art frameworks to train simple ML models, with the ability to implement their own custom training and tuning algorithms.
  • Use state-of-the-art ML model serving frameworks and develop their own, including implementing their own scheduling algorithms/policies for model serving.
  • Develop inference-serving auto-scaling mechanisms and policies that automatically and transparently adapt to variable ingest workloads.
  • Develop and understand state-of-the-art Federated Learning (FL) algorithms.
  • Develop and understand state-of-the-art Neural Architecture Search (NAS) algorithms.
  • Use state-of-the-art systems for serving Large Language Models (LLMs), and understand and apply different degrees of parallelism for distributed LLM inference.
  • Develop a holistic understanding of the ML lifecycle as a multi-stage pipeline.
  • Use state-of-the-art LLM simulators and modify them for different models, different replica-level scheduling policies, and different LLM inference techniques, including, but not limited to, prefill/decode disaggregation and speculative decoding.

Prerequisites/Requirements

For undergraduate students, required prerequisites are as follows: (CS2200 or ECE3058) and (CS3210 or CS4210).

For graduate students, an equivalent is expected, but not strictly enforced.

For all students, you need to have a strong background in at least one of {Systems, ML}:

  • Basic system-building skills are expected (at the level of CS2200 or ECE3058)
  • Knowledge of Python is required
  • Knowledge of C/C++ is strongly preferred, but not required
  • Basic familiarity with Machine Learning training and/or inference is expected
  • A crash course on Deep Neural Networks (DNNs) is strongly recommended, but not required
  • The ability to work with a medium-sized code base (1,000-10,000 lines of code) is strongly recommended (e.g., the CS3210 labs)

Examination

Lecture material competency will be examined through a set of in-class, proctored, timed, individual quizzes throughout the semester as well as a midterm exam.

Grading Scale

Your final grade will be assigned as a letter grade according to the following scale:

A    90-100%

B    80-89%

C    70-79%

D    60-69%

F    0-59%
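The scale above can be sketched as a small Python helper (an illustration only, not official grading code; it assumes a score exactly at a threshold earns the higher letter, e.g., 90.0 is an A):

```python
def letter_grade(score: float) -> str:
    """Map a final percentage score to a letter grade per the scale above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

print(letter_grade(84.5))  # B
```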

Grading Policy

The following graded assignments will contribute to the student’s final grade:

Class participation:

  • 5% -- Attendance / class participation / info card

Analytical Paper Presentations:

  • 10% -- Paper presentations (submitted to Canvas)

In-class quizzes on lectures + papers:

  • 15% -- administered in class via Canvas quiz assignments throughout the semester

Midterm Exam:

  • 15% -- administered in class as a proctored exam

Research Project XOR Labs: 55%

  • Research Project: 55% (graduate students)
      • 5% project proposal
      • 10% mid-point presentation
      • 10% final project presentation
      • 5% final project poster/video/demo
      • 5% team project check-in
      • 20% final project report
  • Labs: 55% (undergraduate students)
      • Lab 1: 10%
      • Lab 2: 15%
      • Lab 3: 15%
      • Lab 4: 15%
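Under the weights above, the final grade is a weighted average of the component scores. A minimal Python sketch for the graduate (research project) option (the dictionary keys and example scores are hypothetical, not official course code):

```python
# Component weights for the graduate (research project) option, taken from
# the grading policy above.
WEIGHTS = {
    "participation": 0.05,
    "paper_presentations": 0.10,
    "quizzes": 0.15,
    "midterm": 0.15,
    "research_project": 0.55,
}

def final_grade(scores: dict) -> float:
    """Weighted average of component scores, each a percentage in [0, 100]."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must total 100%
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical example: strong project work dominates the final grade.
example = {
    "participation": 100,
    "paper_presentations": 90,
    "quizzes": 85,
    "midterm": 80,
    "research_project": 92,
}
print(round(final_grade(example), 2))  # 89.35
```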

Late Penalty

A late penalty on assignments will be assessed as a 10-percentage-point reduction per day, for up to 7 days. After 7 days, a grade of zero is assigned for the late assignment.
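The policy above can be illustrated with a small sketch, assuming the reduction is 10 percentage points per full day late (the helper name and rounding of partial days are illustrative, not official):

```python
def late_score(raw_score: float, days_late: int) -> float:
    """Apply the late policy sketched above: 10 percentage points off per
    day late, and a grade of zero after 7 days."""
    if days_late <= 0:
        return raw_score
    if days_late > 7:
        return 0.0
    return max(0.0, raw_score - 10 * days_late)

print(late_score(95, 2))  # 75 -- a 95% assignment submitted 2 days late
```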

For paper review submissions (if graded), the lowest 20% of paper review scores will be dropped.

Research Project versus Labs

For graduate students, the research project is the major contributor to the student’s grade in this course (see the grading policy). Students are expected to work in teams, develop a research idea within the scope of the SysML research areas covered by this class, implement a system prototype, develop an experimental methodology, carry out experiments, and communicate the results of their research to the class. The research project will have multiple graded components, including:

  • Research project proposal -- an initial proposal for the research project; team composition, a falsifiable hypothesis statement, and an experimental methodology are expected
  • Mid-point progress presentation -- project progress presentation in class
  • Final project report, which includes artifact evaluation
  • Research project poster/demo/video
  • Final project presentation

For undergraduate students, there will be four autograded labs, each with a clear specification and autogradeable expected outcomes. In contrast to the open-ended research project, the labs will be independent of each other, limited in scope, and expected to take two to four weeks each. Students are strongly encouraged to start lab assignments as soon as they are released; each lab is expected to take the full time allotted. Starting late on labs is a common failure mode and should be avoided.

Regrade Policy

Hand grading is error-prone, and mistakes are possible. We allow students to request regrades to ensure that they receive the proper grade for the work they've turned in.

We will accept regrade requests for hand-graded assignments in this class subject to the following regrade policy:

  • The goal of a regrade is to ensure you have the correct grade for the entire assignment, not just to return points. We will regrade your entire assignment if a regrade is requested. There are concrete pedagogical reasons for this.
  • To request a regrade, please submit a regrade request to the teaching staff on Piazza (a private Piazza post visible to Instructors only). The request should include the following necessary components:
      • What question(s) do you believe we made a mistake on?
      • What mistake do you believe we made?
      • Why do you believe your answer is correct for this question?
  • Regrade requests must be submitted within 2 weeks of an assignment being returned. We will not regrade assignments after that window closes.

Communication

This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TAs, and me. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com.

Find our class signup link at: https://piazza.com/gatech/fall2025/cs8803smr

Audit and Pass/Fail Policy

A student interested in registering in audit or pass/fail mode must first approach the course instructor to determine the minimum passing requirements. For this class, the minimum passing requirement includes full participation in the course project (including proposal, mid-point presentation, and final project deliverables) XOR the labs; that is 55% of the course grade. Audit and pass/fail are therefore highly discouraged, because the amount of work involved is similar to taking the course for a letter grade. Furthermore, students not taking this class for a letter grade will still be encouraged to present papers and participate in class discussion. In the (unlikely) situation where registered students run out of paper presentation slots, non-letter-grade students may be asked to volunteer their presentation slots. In general, priority for access to resources (including presentation slots and compute) will be: letter grade, pass/fail, audit, in that order.

Academic Honor Code

All students must follow academic integrity principles and the Georgia Tech Honor Code. Cheating will not be tolerated. Examples of behaviors that violate the Georgia Tech Honor Code (Section 3) include, but are not limited to:

  • Unauthorized collaboration -- this includes copying paper reviews and having a student from a different project group make significant/tangible contributions to the project you are claiming credit for
  • Plagiarism -- submission of material that is significantly identical to that created or published by another person, without adequate credit
  • False claims of performance -- false or exaggerated claims of experimental evaluation in a project report that cannot be reproduced with the submitted code
  • Use of ChatGPT or other generative AI technology without attribution to the source

Subject to Change Statement

Due to the highly dynamic situation (e.g., global phenomena outside of the instructor’s control, conference travel, etc.), the syllabus and course schedule may be subject to change. It is the responsibility of students to check Canvas, Gradescope, Piazza, email messages, and course announcements (through the course Canvas or Piazza) to stay up to date with any course logistics changes. We will make every effort to communicate changes via these mechanisms. The course is held IN RESIDENCE by default, until and unless announced otherwise on the course Piazza or Canvas. Virtual or hybrid options or accommodations cannot be guaranteed.


Tentative Schedule

| Week | Date   | Topic                                                   | Paper 1                     | Paper 2                                    |
|------|--------|---------------------------------------------------------|-----------------------------|--------------------------------------------|
| 1    | 18-Aug | Class introduction & overview                           |                             |                                            |
| 1    | 20-Aug | Class topic overview: Tour de SysML 2025                | Hidden Technical Debt in ML | Berkeley View of Systems Challenges for AI |
| 1    | 22-Aug | Registration deadline                                   |                             |                                            |
| 2    | 25-Aug | DL Frameworks War: PyTorch vs. TensorFlow               | TensorFlow                  | PyTorch                                    |
| 2    | 27-Aug | GPU 101                                                 | CUDA background             |                                            |
| 3    | 1-Sep  | LABOR DAY (no class)                                    |                             |                                            |
| 3    | 3-Sep  | DL Frameworks Gen 2.0                                   | Triton                      | PyTorch 2                                  |
| 4    | 8-Sep  | Training Large Models                                   | Scaling Laws                | PyTorch Distributed                        |
| 4    | 10-Sep | Breaking Scaling Boundaries                             | ZeRO                        | Megatron-LM                                |
| 5    | 15-Sep | Building Automated Distributed Training Systems         | Alpa                        | Varuna                                     |
| 5    | 17-Sep | Prediction Serving: Abstractions, Composition           | Clipper                     | InferLine                                  |
| 6    | 22-Sep | Prediction Serving Gen 2.0: Model Zoo                   | INFaaS                      | SuperServe                                 |
| 6    | 24-Sep | Managing Resources for DL Training                      | Gandiva                     | MLaaS in the Wild                          |
| 7    | 29-Sep | Optimizing Resource Allocation                          | TetriSched                  | Gavel                                      |
| 7    | 1-Oct  | Serving LLMs: Foundations                               | Orca                        | Inside vLLM                                |
| 8    | 6-Oct  | FALL BREAK (no class)                                   |                             |                                            |
| 8    | 8-Oct  | Serving LLMs: Low Latency                               | Sarathi-Serve               | DistServe                                  |
| 9    | 13-Oct | DeepSeek V3: Case Study on Architecture-System Codesign | DeepSeek V3, Sec. 1-3       | DeepSeek V3 Inference System Overview      |
| 9    | 15-Oct | Midterm Exam                                            |                             |                                            |
| 10   | 20-Oct | Mid-point project presentations                         |                             |                                            |
| 10   | 22-Oct | Mid-point project presentations                         |                             |                                            |
| 11   | 27-Oct | Systems Support for Long Context in LLMs                | Medha                       | Cartridges                                 |
| 11   | 29-Oct | Hardware-Aware Algorithm Design                         | FlashAttention              | Fast Inference via Speculative Decoding    |
| 12   | 3-Nov  | Caching for LLM Inference Systems                       | Akasha (TBD)                | Strata                                     |
| 12   | 5-Nov  | Model Compression and Quantization                      | Lottery Ticket              | SqueezeLLM                                 |
| 13   | 10-Nov | Neural Architecture Search: Deployment-Aware            | DεpS@eccv24                 | BigNAS@eccv20                              |
| 13   | 12-Nov | Federated Learning and FL NAS                           | FLAME                       | SuperFedNAS@eccv24                         |
| 14   | 17-Nov | Guest lecture: Retrospective + Prospective on SysML     |                             |                                            |
| 14   | 19-Nov | Final project presentations                             |                             |                                            |
| 15   | 24-Nov | Final project presentations                             |                             |                                            |
| 15   | 26-Nov | STUDENT RECESS (no class)                               |                             |                                            |
| 16   | 1-Dec  | Final project presentations                             |                             |                                            |

Note: quiz dates marked in yellow.