CS6222/CS4803:

Machine Learning Systems

Fall 2025

Class Hours: MW 11:00-12:15 ET

Classroom: Ford EnvSci&Tech (ES&T) L1255

Web: piazza/canvas

Professor: Alexey Tumanov

Email: atumanov@gatech.edu

Web: faculty.cc.gatech.edu/~atumanov

Office hours: MW 12:15-12:45 (or by appt)

Head GTA: Dhruv Garg

Email: dgarg39@gatech.edu

Office Hours: Fri 11-12, KACB3300 Lounge

Teaching Assistants

Sukrit Kumar [email]

Office Hours: Tue 11-12, KACB3300 Lounge

Anirudha Agrawal [email]

Office Hours: Thu 3:30-4:30, KACB3300 Lounge

Angelina Zhou [email]

Office Hours: Wed 2-3pm, KACB3300 Lounge

Course Description

The recent resurgence, popularity, and efficacy of Machine Learning Systems are fueled by progress in both Machine Learning algorithms and the hardware and software systems that support them. Examples of this relationship can be found in training increasingly complex models at scale on growing datasets, with ever-improving time-to-convergence and time-to-accuracy. New software systems have also contributed to the modularity and simplification of model development, neural architecture discovery, and model fine-tuning by providing practical abstractions. Popular SysML software includes open-source frameworks such as PyTorch, TensorFlow, Clipper, and Ray. New hardware platforms specialized for Machine Learning include new generations of GPUs as well as hardware accelerators, such as Google’s TPU and Intel’s Nervana Neural Network Processor (NNP). Focusing on the fundamentals, this course examines the latest trends at the intersection of three disciplines of Computer Science: Machine Learning, Software Systems, and Computer Architecture, and shows how their co-design enables the next generation of Machine Learning Systems.

The list of topics covered in this class includes:

  • ML lifecycle management
  • ML model serving/inference
  • frameworks for ML training and inference
  • latency-aware neural architecture search (NAS)
  • Federated Learning (FL)
  • weight-shared Deep Neural Network (DNN) training and inference
  • ML model hyperparameter optimization
  • model compression and quantization (static and dynamic)
  • resource management & scheduling for ML workloads
  • Large Language Models (LLMs):
      • foundations of Large Language Models (LLMs)
      • latency- and throughput-maximizing mechanisms and policies
      • advanced LLM inference scheduling: 4D parallelism

The course format is a mixture of lecture material presented by the instructor and assigned paper analyses presented by the students. The course is heavily project-based by design (either labs or research): to learn to be a SysML researcher and/or practitioner, there is no substitute for hands-on work. Undergraduate students will take the labs option; graduate students will be required to take the research project option. Research project students are strongly encouraged to form teams that diversify their expertise, providing full coverage of both the Systems and ML background needed to execute their projects (e.g., by including both ML and Systems students in each group).

Curricular Requirement Satisfaction

Note: this course satisfies the following curricular requirements for both MS and PhD students:

  • MSCS Specialization in ML [source] elective course
  • MSCS Specialization in Systems [source] elective course
  • SCS PhD program Systems qualifier: one of the core (area) courses
  • ECE PhD CSS TIG Coursework Qualifier requirement [handbook]

Learning Objectives

Upon completion of this course, a student will be able to:

  • Use state-of-the-art frameworks to train simple ML models, with the ability to implement their own custom training and tuning algorithms.
  • Use state-of-the-art ML model serving frameworks and develop their own, including implementing their own scheduling algorithms/policies for model serving.
  • Develop inference-serving auto-scaling mechanisms and policies that automatically and transparently adapt to variable ingest workloads.
  • Develop and understand state-of-the-art Federated Learning (FL) algorithms.
  • Develop and understand state-of-the-art Neural Architecture Search (NAS) algorithms.
  • Use state-of-the-art systems for serving Large Language Models (LLMs), and understand and apply different degrees of parallelism for distributed LLM inference.
  • Develop a holistic understanding of the ML lifecycle as a multi-stage pipeline.
  • Use state-of-the-art LLM simulators and modify them for different models, different replica-level scheduling policies, and different LLM inference techniques, including, but not limited to, prefill/decode disaggregation and speculative decoding.

Prerequisites/Requirements

For undergraduate students, required prerequisites are as follows: (CS2200 or ECE3058) and (CS3210 or CS4210).

For graduate students, an equivalent is expected, but not strictly enforced.

For all students, you need to have a strong background in at least one of {Systems, ML}:

  • Basic system-building skills are expected (at the level of CS2200 or ECE3058)
  • Knowledge of Python is required
  • Knowledge of C/C++ is strongly preferred, but not required
  • Basic familiarity with Machine Learning training and/or inference is expected
  • A crash course on Deep Neural Networks (DNNs) is strongly recommended, but not required
  • The ability to work with a medium-sized code base (1,000-10,000 lines of code) is strongly recommended (e.g., the CS3210 labs)

Examination

Lecture material competency will be examined through a set of in-class, proctored, timed, individual quizzes throughout the semester as well as a midterm exam.

Grading Scale

Your final grade will be assigned as a letter grade according to the following scale:

A    90-100%

B    80-89%

C    70-79%

D    60-69%

F    0-59%
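The scale above can be sketched as a small Python helper (an illustration only, not official grading code; it assumes a score exactly at a threshold earns the higher letter, e.g., 90.0 is an A):

```python
def letter_grade(score: float) -> str:
    """Map a final percentage score to a letter grade per the scale above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

print(letter_grade(84.5))  # B
```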

Grading Policy

The following graded assignments will contribute to the student’s final grade:

Class participation:

  • 5% -- Attendance / class participation / info card

Analytical Paper Presentations:

  • 10% -- Paper presentations (submitted to Canvas)

In-class quizzes on lectures + papers:

  • 15% -- administered in class via Canvas quiz assignments throughout the semester

Midterm Exam:

  • 15% -- administered in class as a proctored exam

Research Project XOR Labs: 55%

  • Research Project: 55% (graduate students)
      • 5% project proposal
      • 10% mid-point presentation
      • 10% final project presentation
      • 5% final project poster/video/demo
      • 5% team project check-in
      • 20% final project report
  • Labs: 55% (undergraduate students)
      • Lab 1: 10%
      • Lab 2: 15%
      • Lab 3: 15%
      • Lab 4: 15%
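Under the weights above, the final grade is a weighted average of the component scores. A minimal Python sketch for the graduate (research project) option (the dictionary keys and example scores are hypothetical, not official course code):

```python
# Component weights for the graduate (research project) option, taken from
# the grading policy above.
WEIGHTS = {
    "participation": 0.05,
    "paper_presentations": 0.10,
    "quizzes": 0.15,
    "midterm": 0.15,
    "research_project": 0.55,
}

def final_grade(scores: dict) -> float:
    """Weighted average of component scores, each a percentage in [0, 100]."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must total 100%
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical example: strong project work dominates the final grade.
example = {
    "participation": 100,
    "paper_presentations": 90,
    "quizzes": 85,
    "midterm": 80,
    "research_project": 92,
}
print(round(final_grade(example), 2))  # 89.35
```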

Late Penalty

A late penalty on assignments will be assessed as a 10-percentage-point reduction per day, for up to 7 days. After 7 days, a grade of zero is assigned for the late assignment.
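The policy above can be illustrated with a small sketch, assuming the reduction is 10 percentage points per full day late (the helper name and rounding of partial days are illustrative, not official):

```python
def late_score(raw_score: float, days_late: int) -> float:
    """Apply the late policy sketched above: 10 percentage points off per
    day late, and a grade of zero after 7 days."""
    if days_late <= 0:
        return raw_score
    if days_late > 7:
        return 0.0
    return max(0.0, raw_score - 10 * days_late)

print(late_score(95, 2))  # 75 -- a 95% assignment submitted 2 days late
```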

For paper review submissions (if graded), the lowest 20% of paper review scores will be dropped.

Research Project versus Labs

For graduate students, the research project is the major contributor to the student’s grade in this course (see the grading policy). Students are expected to work in teams, develop a research idea within the scope of the SysML research areas covered by this class, implement a system prototype, develop an experimental methodology, carry out experiments, and communicate the results of their research to the class. The research project will have multiple graded components, including:

  • Research project proposal -- an initial proposal for the research project; team composition, a falsifiable hypothesis statement, and an experimental methodology are expected
  • Mid-point progress presentation -- project progress presentation in class
  • Final project report, which includes artifact evaluation
  • Research project poster/demo/video
  • Final project presentation

For undergraduate students, there will be four autograded labs, each with a clear specification and autogradeable expected outcomes. In contrast to the open-ended research project, the labs will be independent of each other, limited in scope, and expected to take two to four weeks each. Students are strongly encouraged to start lab assignments as soon as they are released; each lab is expected to take the full time allotted. Starting late on labs is a common failure mode and should be avoided.

Regrade Policy

Hand grading is error-prone, and mistakes are possible. We allow students to request regrades to ensure that they receive the proper grade for the work they've turned in.

We will accept regrade requests for hand-graded assignments in this class subject to the following regrade policy:

  • The goal of a regrade is to ensure you have the correct grade for the entire assignment, not just to return points. We will regrade your entire assignment if a regrade is requested. There are concrete pedagogical reasons for this.
  • To request a regrade, please submit a regrade request to the teaching staff on Piazza (a private Piazza post visible to Instructors only). The request should include the following necessary components:
      • What question(s) do you believe we made a mistake on?
      • What mistake do you believe we made?
      • Why do you believe your answer is correct for this question?
  • Regrade requests must be submitted within 2 weeks of an assignment being returned. We will not regrade assignments after that window closes.

Communication

This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TAs, and me. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com.

Find our class signup link at: https://piazza.com/gatech/fall2025/cs8803smr

Audit and Pass/Fail Policy

A student interested in registering in audit or pass/fail mode must first approach the course instructor to determine the minimum passing requirements. For this class, the minimum passing requirement includes full participation in the course project (including proposal, mid-point presentation, and final project deliverables) XOR the labs; that is 55% of the course grade. Audit and pass/fail are therefore highly discouraged, because the amount of work involved is similar to taking the course for a letter grade. Furthermore, students not taking this class for a letter grade will still be encouraged to present papers and participate in class discussion. In the (unlikely) situation where registered students run out of paper presentation slots, non-letter-grade students may be asked to volunteer their presentation slots. In general, priority for access to resources (including presentation slots and compute) will be: letter grade, pass/fail, audit, in that order.

Academic Honor Code

All students must follow academic integrity principles and the Georgia Tech Honor Code. Cheating will not be tolerated. Examples of behaviors that violate the Georgia Tech Honor Code (Section 3) include, but are not limited to:

  • Unauthorized collaboration -- this includes copying paper reviews and having a student from a different project group make significant/tangible contributions to the project you are claiming credit for
  • Plagiarism -- submission of material that is significantly identical to that created or published by another person, without adequate credit
  • False claims of performance -- false or exaggerated claims of experimental evaluation in a project report that cannot be reproduced with the submitted code
  • Use of ChatGPT or other generative AI technology without attribution to the source

Subject to Change Statement

Due to the highly dynamic situation (e.g., global phenomena outside of the instructor’s control, conference travel, etc.), the syllabus and course schedule may be subject to change. It is the responsibility of students to check Canvas, Gradescope, Piazza, email messages, and course announcements (through the course Canvas or Piazza) to stay up to date with any course logistics changes. We will make every effort to communicate changes via these mechanisms. The course is held IN RESIDENCE by default, until and unless announced otherwise on the course Piazza or Canvas. Virtual or hybrid options or accommodations cannot be guaranteed.


Tentative Schedule

| Week | Date   | Topic                                                   | Paper 1                     | Paper 2                                    |
|------|--------|---------------------------------------------------------|-----------------------------|--------------------------------------------|
| 1    | 18-Aug | Class introduction & overview                           |                             |                                            |
| 1    | 20-Aug | Class topic overview: Tour de SysML 2025                | Hidden Technical Debt in ML | Berkeley View of Systems Challenges for AI |
| 1    | 22-Aug | Registration deadline                                   |                             |                                            |
| 2    | 25-Aug | DL Frameworks War: PyTorch vs. TensorFlow               | TensorFlow                  | PyTorch                                    |
| 2    | 27-Aug | GPU 101                                                 | CUDA background             |                                            |
| 3    | 1-Sep  | LABOR DAY (no class)                                    |                             |                                            |
| 3    | 3-Sep  | DL Frameworks Gen 2.0                                   | Triton                      | PyTorch 2                                  |
| 4    | 8-Sep  | Training Large Models                                   | Scaling Laws                | PyTorch Distributed                        |
| 4    | 10-Sep | Breaking Scaling Boundaries                             | ZeRO                        | Megatron-LM                                |
| 5    | 15-Sep | Building Automated Distributed Training Systems         | Alpa                        | Varuna                                     |
| 5    | 17-Sep | Prediction Serving: Abstractions, Composition           | Clipper                     | InferLine                                  |
| 6    | 22-Sep | Prediction Serving Gen 2.0: Model Zoo                   | INFaaS                      | SuperServe                                 |
| 6    | 24-Sep | Managing Resources for DL Training                      | Gandiva                     | MLaaS in the Wild                          |
| 7    | 29-Sep | Optimizing Resource Allocation                          | TetriSched                  | Gavel                                      |
| 7    | 1-Oct  | Serving LLMs: Foundations                               | Orca                        | Inside vLLM                                |
| 8    | 6-Oct  | FALL BREAK (no class)                                   |                             |                                            |
| 8    | 8-Oct  | Serving LLMs: Low Latency                               | Sarathi-Serve               | DistServe                                  |
| 9    | 13-Oct | DeepSeek V3: Case Study on Architecture-System Codesign | DeepSeek V3, Sec. 1-3       | DeepSeek V3 Inference System Overview      |
| 9    | 15-Oct | Midterm Exam                                            |                             |                                            |
| 10   | 20-Oct | Mid-point project presentations                         |                             |                                            |
| 10   | 22-Oct | Mid-point project presentations                         |                             |                                            |
| 11   | 27-Oct | Systems Support for Long Context in LLMs                | Medha                       | Cartridges                                 |
| 11   | 29-Oct | Hardware-Aware Algorithm Design                         | FlashAttention              | Fast Inference via Speculative Decoding    |
| 12   | 3-Nov  | Caching for LLM Inference Systems                       | Akasha (TBD)                | Strata                                     |
| 12   | 5-Nov  | Model Compression and Quantization                      | Lottery Ticket              | SqueezeLLM                                 |
| 13   | 10-Nov | Neural Architecture Search: Deployment-Aware            | DεpS@eccv24                 | BigNAS@eccv20                              |
| 13   | 12-Nov | Federated Learning and FL NAS                           | FLAME                       | SuperFedNAS@eccv24                         |
| 14   | 17-Nov | Guest lecture: Retrospective + Prospective on SysML     |                             |                                            |
| 14   | 19-Nov | Final project presentations                             |                             |                                            |
| 15   | 24-Nov | Final project presentations                             |                             |                                            |
| 15   | 26-Nov | STUDENT RECESS (no class)                               |                             |                                            |
| 16   | 1-Dec  | Final project presentations                             |                             |                                            |

Note: quiz dates marked in yellow.