Machine Learning, Fall 2019

Time/Location: Tue/Thu 2:00-3:15 pm in room LSE B01

Course Number: CS 542

Instructor: Kate Saenko (saenko@bu.edu) office hours: T/Th 3:30-5pm in MCS-200

Teaching Fellows:

Ben Usman (usmn@bu.edu) office hours: Wed 11:15-12:15pm and 14:25-15:25pm in EMA 302

Vasili Ramanishka (vram@bu.edu) office hours: Tue 5:00-7:00pm in EMA 302

Graders: Runqi Tian (rqtian@bu.edu), Ximeng Sun (sunxm@bu.edu), Andrea Burns (aburns4@bu.edu), Chi Zhang (czhang1@bu.edu)

Please use email only for personal questions (e.g., grading); use Piazza for all other questions:

Piazza: discussion forum and problem sets: https://piazza.com/bu/fall2019/cs542/home

SCHEDULE*

Date | Topic | Details | Assignments |

THE BASICS | |||

Tue Sep 3 | Introduction | what is machine learning? types of learning; features; hypothesis; cost function; course information | |

Wed lab | LAB1: Probability and Math Review | ||

Thu Sep 5 | Preliminaries | review of expected mathematical skills for the course; Useful reference on matrix calculus; also see http://www.matrixcalculus.org/ | ps0 (public) ps0 (piazza) (math prerequisites) |

Tue Sep 10 | Supervised Learning I: Regression | regression, linear hypothesis, SSD cost; gradient descent; normal equations; maximum likelihood; Reading: Bishop 1.2-1.2.4,3.1-3.1.1 | |

Wed lab | LAB2: Machine Learning Overview | ||

Thu Sep 12 | Supervised Learning II: Classification | classification; sigmoid function; logistic regression. Reading: 4.3.1-4.3.2; 4.3.4 | ps0 due submit solution (11:59pm) ps1 out |

Tue Sep 17 | Supervised Learning III: Regularization | more logistic regression, regularization; bias-variance Reading: Bishop 3.1, 3.2 | |

Wed lab | LAB3: Linear Regression | ||

Thu Sep 19 | Unsupervised Learning I: Clustering | clustering, k-means, Gaussian mixtures. Reading: Bishop 9.1-9.2 | ps1 due (11:59pm) ps2 out |

Tue Sep 24 | Unsupervised Learning II: PCA | dimensionality reduction, PCA. Reading: Bishop 12.1 | |

Wed lab | LAB4: Logistic Regression | ||

NEURAL NETWORKS | |||

Thu Sep 26 | Neural Networks I: Feed-forward Nets | artificial neuron, MLP, sigmoid units; neuroscience inspiration; output vs hidden layers; linear vs nonlinear networks; feed-forward neural networks; Reading: Bishop Ch 5.1-5.3 | ps2 due (11:59pm) ps3 out |

Tue Oct 1 | Neural Networks II: Learning | Learning via gradient descent; computation graphs, backpropagation algorithm. Reading: Bishop Ch 5.1-5.3 | |

Wed lab | LAB5: Gaussian Mixture Models | ||

Thu Oct 3 | Neural Networks III: Convolutional Nets | Convolutional networks. Reading: Bishop Ch 5.5; Goodfellow et al Ch. 9. (optional) | ps3 due (11:59pm) ps4 out |

Tue Oct 8 | Neural Networks IV: Recurrent Nets | recurrent networks; training strategies | |

Wed lab | LAB6: Backprop | ||

Thu Oct 10 | Computing cluster/Tensorflow Intro (guest lecture by Katia Oleinik) | Intro to SCC and Tensorflow; please bring laptops to class to follow along with the lecture. Also see Software/Hardware below. | ps4 due (11:59pm) |

Tue Oct 15 | NO CLASS | ||

Wed lab | LAB7: Midterm Review | ||

Thu Oct 17 | Midterm (closed book, no electronics; you may bring a single 8"x11" sheet of paper with typed or handwritten notes on both sides) | covers everything up to and including Neural Networks IV; expect questions on material covered in lectures, problem sets, LABs, and assigned reading | Midterm Practice Problems Solutions |

ADVANCED TOPICS | |||

Tue Oct 22 | Probabilistic Models I: LDA | generalized linear models; generative vs discriminative models; linear discriminant analysis (LDA); Reading: Bishop Ch 4.2 | |

Wed lab | LAB8: Midterm Review | ||

Thu Oct 24 | Probabilistic Models II: Bayesian Methods | priors over parameters; Bayesian linear regression; Reading: Bishop Ch 2.3 | ps5 out |

Tue Oct 29 | Support Vector Machines I (guest lecture by Prof. Sarah Bargal) | maximum margin methods; support vector machines; primal vs dual SVM formulation; Hinge loss vs. cross-entropy loss; Reading: Bishop Ch 7.1.1-7.1.2 | |

Wed lab | LAB9: LDA, SVM, Kernels | ||

Thu Oct 31 | NO CLASS | | ps5 due (11:59pm) ps6 out |

Tue Nov 5 | Support Vector Machines II | non-separable data; slack variables;kernels; multiclass SVM; Reading: Bishop Ch 6.1-6.2, Ch 7.1.3 | |

Wed lab | LAB10: | ||

Thu Nov 7 | Reinforcement Learning I | reinforcement learning; Markov Decision Process (MDP); policies, value functions, Q-learning | |

Tue Nov 12 | Reinforcement Learning II | Q-learning cont’d; deep Q-learning (DQN) | ps6 due (11:59pm) ps7 out |

Wed lab | LAB11: CANCELLED | ||

Thu Nov 14 | Unsupervised Learning III: Anomaly Detection | Anomaly detection methods: density estimation, reconstruction based method, One Class SVM; evaluating anomaly detection | |

Tue Nov 19 | Unsupervised Learning IV: Generative Adversarial Networks (GANs) | Implicit generative models; adversarial methods; Generative Adversarial Nets (GANs); Reading: Goodfellow et al. NIPS 2014 | ps7 due (11:59pm) Challenge starts |

Wed lab | LAB12: RL | Challenge ||

Thu Nov 21 | Unsupervised Learning V: Semi-supervised Learning | Semi-supervised learning (SSL); self-training; co-training; clustering methods, SSSVM | Deadline to register for challenge (11:59pm) |

APPLICATIONS | |||

Tue Nov 26 | Practical Advice for Applying ML | Machine learning system design; feature engineering; feature pre-processing; learning with large datasets; SGD and mini-batch GD | Deadline to register for challenge (11:59pm) |

Wed lab | NO CLASS- THANKSGIVING | ||

Thu Nov 28 | NO CLASS- THANKSGIVING | ||

Tue Dec 3 | Applications II: Language and Vision | Image and video captioning, visual question answering, other V&L understanding problems | Submit something to leaderboard! |

Wed lab | LAB13: Challenge Discussion | ||

Thu Dec 5 | Applications III: Machine Learning Ethics | Ethics in ML; population bias in machine learning, fairness, transparency, accountability; de-biasing image captioning models | |

Tue Dec 10 | Final Review | submit a course evaluation at | Challenge ends Tue Dec 10 12:00pm (NOON) Code due Wed Dec 11 12:00pm |

Tue Dec 17 | Final exam 3:00PM - 5:00PM in LSE B01 (closed book, no electronics; you may bring a single 8"x11" sheet of paper with typed or handwritten notes on both sides) | expect questions on material covered in the entire course in lectures, problem sets, LABs, and assigned reading | Additional practice problems Solutions |

*schedule is tentative and is subject to change.

SYLLABUS

This course is an introduction to modern machine learning concepts, techniques, and algorithms. Topics include regression, classification, unsupervised and supervised learning, kernels, support vector machines, feature selection, clustering, sequence models, Bayesian methods, and more. Weekly labs and problem sets emphasize putting theory into practice: students develop a thorough mathematical understanding of the machine learning methods and implement them in code to apply them to data sets.

Course Pre-requisites

This is an introductory graduate course (open to upper-level undergraduates) and requires the following:

- Linear algebra (CAS CS 132 or MA 242 or equivalent)
- Multivariate calculus (e.g., CAS MA 225), including partial derivatives
- Probability (CAS CS 237 or MA 381 or MA 581 or equivalent)
- Comfort with programming in Python (e.g., CAS CS 111 and 112, or equivalent)

In addition, either Foundations of Data Science (CS391 E1) or Intro to Optimization (CAS CS 507) is highly recommended as a precursor to this course.

Textbooks

The required textbook for the course is

- Bishop, C. M. Pattern Recognition and Machine Learning.

Other recommended supplemental textbooks on general machine learning:

- Duda, R.O., Hart, P.E. and Stork, D.G. Pattern Classification. Wiley-Interscience. 2nd Edition. 2001.
- Theodoridis, S. and Koutroumbas, K. Pattern Recognition. Edition 4. Academic Press, 2008.
- Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence. 2003.
- Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning. Springer. 2001.
- Koller, D. and Friedman, N. Probabilistic Graphical Models. MIT Press. 2009.
- Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning.

Recommended background reading on matrix calculus:

- Reference reading on matrix calculus and linear algebra can be found here
- Matrix derivatives cheat sheet

Alternative Machine Learning Courses

- Introduction to Machine Learning (ENG EC 414) is more suitable for undergraduates
- Intro to Optimization (CAS CS 507) is highly recommended as a precursor to this course
- https://www.coursera.org/learn/machine-learning/ Andrew Ng’s basic intro to Machine Learning course on Coursera, can be taken as a precursor to this course

Deliverables/Graded Work

The main graded work for the course consists of the midterm, the final, the problem sets, and the class challenge. There will be eight self-graded weekly problem sets, each consisting of written and programming problems, which are meant to prepare students for the two exams. (Note: there are no team projects in this course.) The course grade consists of the following:

- Problem Sets 40%
- Class Challenge 10%
- Midterm 25%
- Final 25%
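As an informal illustration (not an official grading tool), the weighting above can be sketched in Python; the component names and the assumption that each component is scored out of 100 are ours, not part of the syllabus:

```python
# Hypothetical sketch of the course grade weighting listed above.
# Component scores are assumed to be percentages (0-100); the weights
# come from the breakdown in the syllabus.

WEIGHTS = {
    "problem_sets": 0.40,
    "challenge": 0.10,
    "midterm": 0.25,
    "final": 0.25,
}

def course_grade(scores):
    """Weighted average of component scores (each 0-100)."""
    return sum(WEIGHTS[name] * score for name, score in scores.items())

# Example: strong problem sets, average exams.
print(course_grade({
    "problem_sets": 95,
    "challenge": 90,
    "midterm": 80,
    "final": 85,
}))  # -> 88.25
```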

Class Challenge

The class challenge is a Kaggle-style individual competition that gives each student a chance to apply their knowledge to a real-world problem, developing a solution from start to finish. More information about the challenge will be released later on in the term.

Piazza

We will be using Piazza for class discussion and for posting problem sets. The system is designed to get you help quickly and efficiently from classmates, the teaching fellows, and the instructor. Rather than emailing questions to the teaching staff, we encourage you to post your questions on Piazza. If you have any problems or feedback for the developers of the platform, email team@piazza.com.

Software/Hardware

Programming assignments will be developed in the Python programming language, using Jupyter (IPython) notebooks. If you do not already have a CS account and would like one, please visit http://www.bu.edu/cs/account.

For the challenge, students will have access to the SCC computing cluster, which has GPU and CPU nodes available. See the notes from Intro to SCC and Tensorflow for further information on using the SCC. At the end of the notes there are a number of links, including cheat sheets, documentation on how to submit jobs, and an examples webpage. Note: the notes include directions on how to open a Jupyter notebook using port forwarding; however, it is now much easier to do this using "SCI OnDemand", which you can access by typing "scc5.bu.edu" in the browser. Also: you should not worry about SUs for this SCC project, but be careful not to use up too much space (you can check how much space is used with the pquota command).
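For reference, port forwarding to a remote Jupyter notebook generally looks like the following sketch; the username, hostname, and port here are placeholders, so follow the linked SCC notes (or use OnDemand) for the actual procedure:

```shell
# On your laptop: forward local port 8888 to port 8888 on the remote
# login node (username and hostname are placeholders).
ssh -L 8888:localhost:8888 yourlogin@scc1.bu.edu

# On the cluster: start a notebook server without opening a browser.
jupyter notebook --no-browser --port=8888

# Then open http://localhost:8888 in your laptop's browser and paste
# the token printed by the jupyter command.
```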

Another resource you may want to use for some problem sets or the challenge is Google’s Colab, see https://colab.research.google.com/notebooks/welcome.ipynb

COURSE POLICIES

Late Policy

Late work will incur the following penalties:

- 20% off per day, up to 2 days

The lowest of all the problem set grades will be dropped. Exceptions to these policies can only be made in cases of significant medical or family emergencies, and should be documented.
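As a hypothetical illustration (not an official calculator), the penalty above can be sketched in Python; the assumption that work more than 2 days late receives no credit is our reading of "up to 2 days":

```python
# Hypothetical sketch of the late policy: 20% off per day late, and
# (our assumption) no credit for work more than 2 days late.

def late_score(raw_score, days_late):
    """Apply the 20%-per-day late penalty (max 2 days) to a raw score."""
    if days_late <= 0:
        return raw_score
    if days_late > 2:
        return 0.0  # assumed: not accepted after 2 days
    return raw_score * (1 - 0.20 * days_late)

print(late_score(100, 0))  # -> 100
print(late_score(100, 1))  # -> 80.0
print(late_score(100, 2))  # -> 60.0
```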

Academic Honesty Policy

The instructors take academic honesty very seriously. Cheating, plagiarism, and other misconduct will be subject to grading penalties, up to failing the course. Students enrolled in the course are responsible for familiarizing themselves with the detailed BU policy, available here. In particular, plagiarism is defined as follows and applies to all written materials and software, including material found online. Collaboration on homework is allowed but must be acknowledged; you should always come up with your own solution rather than copying, which is defined as plagiarism:

Plagiarism: Representing the work of another as one's own. Plagiarism includes but is not limited to the following: copying the answers of another student on an examination; copying or restating the work or ideas of another person or persons in any oral or written work (printed or electronic) without citing the appropriate source; and collaborating with someone else in an academic endeavor without acknowledging his or her contribution. Plagiarism can consist of acts of commission (appropriating the words or ideas of another) or acts of omission (failing to acknowledge/document/credit the source or creator of words or ideas). It also includes colluding with someone else in an academic endeavor without acknowledging his or her contribution, and using audio or video footage that comes from another source (including work done by another student) without permission and acknowledgement of that source.

Prohibited behaviors include:

- copying all or part of someone else's work, even if you subsequently modify it; this includes cases in which someone tells you what you should write for your solution
- viewing all or part of someone else's work
- showing all or part of your work to another student
- consulting solutions from past semesters, or those found online or in books
- posting your work or our solutions where others can view it (e.g., online).

Incidents of academic misconduct will be reported to the Academic Conduct Committee (ACC). The ACC may suspend/expel students found guilty of misconduct. At a minimum, students who engage in misconduct will have their final grade reduced by one letter grade (e.g., from a B to a C).

Religious Observance

Students are permitted to be absent from class, including classes involving examinations, labs, excursions, and other special events, for purposes of religious observance. In-class, take-home and lab assignments, and other work shall be made up in consultation with the student’s instructors. More details on BU’s religious observance policy are available here.