Algorithms and Machine Learning for Analyzing Mutations in Cancer: Fall 2017

Instructor: Max Leiserson

Meeting time: Tu/Th, 11am-12:15pm in CSI 3118

Office hours: Tu/Th, 1-2pm in BSB 3113, or by appointment. If the building is locked, call me at (301) 405-7395 to be let in.

Course website:

Course description

This course will present computational challenges and methods for analyzing mutations in cancer. Somatic mutations that occur during the lifetime of an individual are a major cause of cancer. Researchers now have access to large, public datasets of DNA sequencing data from thousands of tumors, and these datasets have brought many computational challenges to the forefront of cancer biology.

This course will focus on three of these challenges:

  1. Characterizing the mutational processes acting on a cancer genome. Mutations in DNA sequence are caused by different mutational processes, from mutagens like UV radiation and tobacco smoke to mistakes in the process of DNA replication. Despite decades of cancer research, only a handful of such processes are understood. We will frame uncovering these mutational processes as a computational problem, and study recent machine learning and optimization methods for characterizing the signatures of these processes.
  2. Identifying driver mutations in cancer. A key challenge in cancer research is identifying the mutations driving cancer, which is complicated by the multitude of random, passenger mutations present in most tumors. We will study the computational challenges of identifying driver mutations, including methods that use machine learning, combinatorial optimization, and/or applied statistics.
  3. Identifying tumor vulnerabilities through genetic interactions. The same mutations that cause cancer can also make the tumor vulnerable to targeted treatments. One powerful insight to expand the pool of potential targets is mapping the genetic interactions within cancer cells. We will study the computational challenges of inferring genetic interactions, both from cancer data and in model organisms, and learn about supervised learning and dimensionality reduction methods that attempt to solve this problem.

The goal of the course is to provide an overview of these cancer research questions and computational methods. The format of the course will be a mix of lectures that introduce key biological concepts and student presentations of review and research papers.


Prerequisites

Undergraduate-level computer science (e.g. algorithms or machine learning) and probability/statistics (e.g. random variables, distributions) are prerequisites. Students will be expected to be able to write and execute code. There are no biology prerequisites. Exceptions can be made with permission from the instructor.

Course Objectives and Learning Outcomes


Learning outcomes

Course schedule and assignments

The course will be a mixture of lectures and paper presentations. See this Google Spreadsheet for the topics, schedule, readings, and other assignments.


There is no specific textbook for this class. Readings will be distributed as PDFs by the instructor, including readings from textbooks, research papers, and reviews.




Grading

Your grade will be computed based on the following four factors. Each factor is explained further below.

  1. Participation (30% total)
     1. In-class participation, scribing, and online quizzes (25%)
     2. Office hours visit (5%)
  2. In-class presentations (10% total)
  3. Project I: Data analysis project (20% total)
     1. Checkpoint 1: example data (5%)
     2. Checkpoint 2: real data (5%)
     3. Write-up and figure/table (10%)
  4. Project II: Research project (40% total)
     1. Proposal (12% total)
        1. Report (6%)
        2. Presentation (6%)
     2. Final (28% total)
        1. Report (14%)
        2. Presentation (14%)
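As an illustration of how the percentages in the grading breakdown above combine, here is a minimal Python sketch of a weighted sum. The component names, the 0-100 score scale, and the choice to count a missing component as 0 are assumptions made for this example; this is not an official grade calculator.

```python
# Weights copied from the grading breakdown (fractions of the final grade).
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "participation_scribing_quizzes": 0.25,
    "office_hours_visit": 0.05,
    "in_class_presentations": 0.10,
    "project1_checkpoint1": 0.05,
    "project1_checkpoint2": 0.05,
    "project1_writeup": 0.10,
    "project2_proposal_report": 0.06,
    "project2_proposal_presentation": 0.06,
    "project2_final_report": 0.14,
    "project2_final_presentation": 0.14,
}


def final_grade(scores):
    """Return the weighted sum of component scores.

    `scores` maps component names to scores on a 0-100 scale.
    Components missing from `scores` are counted as 0 (an assumption).
    """
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical student: full marks everywhere except the final report.
    example = {name: 100.0 for name in WEIGHTS}
    example["project2_final_report"] = 50.0
    print(round(final_grade(example), 2))
```

The weights sum to 1, so a student with full marks on every component receives 100.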


Participation

Students will be expected to participate in class, both during lectures and during paper presentations by their classmates. Students are also expected to participate in approximately weekly (ungraded) online quizzes, which will be used to gauge understanding and collect reviews of research papers we read in class. In addition, during classes where research papers are presented, one student will be appointed as the “scribe” and will take notes and produce an “annotated” version of the paper (e.g. with Kami). This participation will constitute 25% of your grade.

In addition, each student is expected to visit office hours at least once, which is worth 5% of your grade. Office hours are an opportunity to seek help, discuss project ideas and/or research interests, or discuss other topics of interest to the student.

In-class presentations

Students will be responsible for presenting multiple papers and/or concepts throughout the semester. These presentations will include slides and/or handouts for the class. These presentations will constitute 10% of your grade.

Data analysis project

Students will use computational methods to analyze real biological datasets by reproducing experiments from seminal papers read in class. Students will write a short report and create a short slide presentation summarizing their findings. Multiple students will reproduce experiments from the same paper, and these results will be discussed in class.

Research project

Students will propose and conduct a research project over the course of the semester. Students will work with the instructor to choose a project that is within the scope of the course and matches their interests. The instructor will distribute a list of project ideas, but students can also define their own project. Students may work on these projects in small groups, but will be graded individually.

About halfway through the semester, students will turn in a short, approximately 2-page proposal, and give a 5-minute presentation on their proposal in class. The proposal report and presentation will each constitute 6% of your grade (12% total). Similarly, near the end of the semester, students will turn in a final report and make a final 10-minute presentation, which will each constitute 14% of your grade (28% total).

Absence and late policy

Unexcused absences may be counted against your participation grade in the class. Any student who needs to be excused from a single lecture due to a medically necessitated absence shall inform the instructor of the absence prior to the class.

Projects may be turned in up to two days after the due date for partial credit; 10% of the maximum grade will be subtracted for each late day. Late responses to online quizzes will not be accepted, and failure to participate may be counted against your participation grade in the class.
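The late penalty for projects can be worked through with a short Python sketch. It assumes that lateness is counted in whole days and that a project more than two days late receives no credit; the policy above does not state either rule explicitly, and the function name is hypothetical.

```python
def late_adjusted_score(raw_score, max_score, days_late):
    """Apply the project late penalty: 10% of the maximum grade per late day.

    Assumptions for this sketch (not stated in the policy):
    - `days_late` is a whole number of days;
    - submissions more than two days late receive no credit.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 2:
        return 0.0
    penalty = 0.10 * max_score * days_late
    # Never return a negative score.
    return max(0.0, raw_score - penalty)


if __name__ == "__main__":
    # A project scored 90 out of 100, turned in at different times.
    for days in range(4):
        print(days, late_adjusted_score(90.0, 100.0, days))
```

Note that the penalty is taken from the maximum grade, not from the score earned, so a project worth 100 points loses 10 points per late day regardless of its raw score.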

Academic integrity

Students will be expected to uphold principles of academic integrity, described in the University’s handbook: In addition, following the guidelines in the handbook, students will be asked to sign a pledge on each project stating that “I pledge on my honor that I have not given or received any unauthorized assistance on this examination (or assignment).” More information on the Code of Academic Integrity is available at


Any student eligible for and requesting reasonable academic accommodations due to a disability is requested to provide, to the instructor in office hours, a letter of accommodation from the Office of Disability Support Services (DSS) within the first TWO weeks of the semester.