EECS 545: Machine Learning
University of Michigan, Winter 2013
Classroom: DOW 1005
Time: MW 10:30am-12:00pm
Instructor: Honglak Lee
Instructor office hours: Tuesdays 2pm-4pm, 3773 BBB
GSI: Kihyuk Sohn
GSI office hours: Monday 3pm-4pm, Friday 2pm-3pm, 1637 BBB
Contact: For all questions, please use Piazza (registration required).
NOTE: This is a tentative syllabus and is subject to change.
The goal of machine learning is to develop computer algorithms that can learn from data or past experience to make accurate predictions on new, unseen data. Over the past few decades, machine learning has become a powerful tool in artificial intelligence and data mining, and it has made major impacts in many real-world applications.
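To make this learn-then-predict idea concrete, below is a minimal sketch in Python with NumPy (not part of the course materials) that fits an ordinary least-squares linear regression model, the first supervised learning topic on the schedule, to training data and evaluates it on held-out data. The toy data and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy data: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(seed=0)
x_train = rng.uniform(0.0, 10.0, size=50)
y_train = 2.0 * x_train + 1.0 + rng.normal(scale=0.5, size=50)
x_test = rng.uniform(0.0, 10.0, size=20)   # the "new, unseen data"
y_test = 2.0 * x_test + 1.0 + rng.normal(scale=0.5, size=20)

# Learn from data: fit weights (slope and bias) by least squares on the
# training set, using a design matrix with a column of ones for the bias.
X_train = np.column_stack([x_train, np.ones_like(x_train)])
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Predict on the held-out test set and measure generalization error.
X_test = np.column_stack([x_test, np.ones_like(x_test)])
y_pred = X_test @ w
print("test mean squared error:", np.mean((y_pred - y_test) ** 2))
```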
This course will give a graduate-level introduction to machine learning, covering its foundations, the mathematical derivation and implementation of the algorithms, and their applications. Topics include supervised learning, unsupervised learning, learning theory, graphical models, and reinforcement learning. The course will also cover recent research topics such as sparsity and feature selection, Bayesian techniques, and deep learning. In addition to the mathematical foundations, it will emphasize practical applications of machine learning to artificial intelligence and data mining, such as computer vision, speech recognition, text processing, bioinformatics, and robot perception and control. The course will require an open-ended research project.
* NOTE: Please see the instructor if you do not satisfy the above requirements. In particular, if you have not taken courses in at least two of linear algebra, multivariate calculus, and probability, it is strongly recommended that you complete them before taking this course.
There will be four or five (approximately biweekly) problem sets to strengthen your understanding of the fundamental concepts, mathematical formulations, algorithms, and applications. The problem sets will also include programming assignments in which you implement algorithms covered in class.
This course offers an opportunity to get involved in open-ended machine learning research. Students are encouraged to develop new theory or algorithms in machine learning, apply existing algorithms to new problems, or apply machine learning to their own research problems. Please talk to the instructor before deciding on a project topic. Students will be required to complete a project proposal, a progress report, a poster presentation, and a final report.
Check the resource page in CTools for more detailed information.
Homework: 30%
Midterm: 30%
Project: 40% (progress report 10%; final project 30%)
* Up to 2% extra credit may be awarded for active class participation.
No | Date | Day | Lecture | Topics | Readings and useful Links | Handouts and due dates
1 | 1/9 | Wed | Introduction and Overview | Introduction | Bishop: Ch 2.1, Appendix B |
2 | 1/14 | Mon | Supervised Learning: regression | Linear regression | Bishop: Ch 3.1; Stanford CS229 note: www.stanford.edu/class/cs229/notes/cs229-notes1.pdf | HW1 out |
3 | 1/16 | Wed | Supervised Learning: regression (no in-person class; to be replaced with an online or makeup lecture) | Regularized linear regression; Locally weighted linear regression; Kernel regression; K-nearest neighbor | Bishop: Ch 3.2, 1.1, 2.5; Stanford CS229 note: www.stanford.edu/class/cs229/notes/cs229-notes1.pdf |
| 1/21 | Mon | No class | No class - MLK Day | |
4 | 1/23 | Wed | Supervised Learning: classification | Logistic regression; Generalized linear models; Linear discriminant analysis | Bishop: Ch 4.1, 4.3; Stanford CS229 note: www.stanford.edu/class/cs229/notes/cs229-notes1.pdf |
5 | 1/28 | Mon | Supervised Learning: classification | Perceptron; Gaussian discriminant analysis; Naive Bayes | Bishop: Ch 4.2; Stanford CS229 note: www.stanford.edu/class/cs229/notes/cs229-notes2.pdf | HW1 due, HW2 out |
6 | 1/30 | Wed | Kernel methods | Kernel methods; kernel regression | Bishop: Ch 6.1-6.3 |
7 | 2/4 | Mon | Kernel methods | Support vector machines | Bishop: Ch 7.1 |
8 | 2/6 | Wed | Kernel methods | Support vector machines; convex optimization overview | Bishop: Ch 7.1; Stephen Boyd's lecture notes (available in resources) | Project proposal due
9 | 2/11 | Mon | Kernel methods | Multivariate Gaussian distribution; Bayesian linear regression; Gaussian Processes | Bishop: Ch 2.3, 3.3, 6.4 | HW2 due, HW3 out |
10 | 2/13 | Wed | Kernel methods | Gaussian Processes | Bishop: Ch 6.4 |
11 | 2/18 | Mon | Regularization and Model Selection | Regularization and Model Selection; Advice on using ML algorithms | |
12 | 2/20 | Wed | Feature selection | Advice on using ML algorithms; Feature Selection | http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf |
13 | 2/25 | Mon | Graphical models | Bayesian Networks | Bishop: Ch 8.1, 8.2 | HW3 due, HW4 out |
14 | 2/27 | Wed | Graphical models | Markov Networks | Bishop: Ch 8.3 |
| 3/4 | Mon | No class | No class - winter break | |
| 3/6 | Wed | No class | No class - winter break | |
15 | 3/11 | Mon | Graphical models | Inference in graphical models | Bishop: Ch 8.4 | Project progress report due |
16 | 3/13 | Wed | Graphical models | Inference in graphical models | Bishop: Ch 9; See also Bishop Ch 2 for basics of maximum likelihood for binary/multinomial/Gaussian variables |
17 | 3/18 | Mon | Graphical models | Learning in graphical models; EM | Bishop: Ch 9; See also Bishop Ch 2 for basics of maximum likelihood for binary/multinomial/Gaussian variables |
18 | 3/20 | Wed | Unsupervised learning | Abstract view of EM; Unsupervised Learning – PCA | Bishop: Ch 9 | HW4 due |
19 | 3/25 | Mon | Midterm exam review | | |
20 | 3/27 | Wed | Advanced Unsupervised learning | Nonlinear latent variable models; Deep Learning | Bishop: Ch 12.4 |
21 | 4/1 | Mon | Deep Learning | Neural network; Autoencoders; Restricted Boltzmann machines; Deep belief networks | Bengio's survey paper www.iro.umontreal.ca/~bengioy/papers/ftml.pdf | Midterm exam |
22 | 4/3 | Wed | Unsupervised Learning | Hidden Markov models (HMMs) | Bishop: Ch 13.1, 13.2 |
23 | 4/8 | Mon | Reinforcement learning | RL introduction | Sutton and Barto: Ch 1-3 |
24 | 4/10 | Wed | Reinforcement learning | Learning optimal policies: Dynamic Programming, Monte Carlo; TD learning | Sutton and Barto: Ch 4, 5, 6 |
25 | 4/15 | Mon | Learning Theory | Learning theory overview; VC dimension; Generalization Bound | http://cs229.stanford.edu/notes/cs229-notes4.pdf |
26 | 4/17 | Wed | Ensemble Methods | Boosting | Bishop: Ch 14.3 |
| 4/25 | Thu | Final project presentation (time and place: TBD) | | |
| 4/29 | Mon | | | | Final project report due (11:59pm)
* NOTE: Attendance is optional.
No | Date | Review session | Topics | Readings and useful Links | Handouts
1 | TBD | Linear Algebra review | Overview of linear algebra, matrix operations and calculus; MATLAB tutorial | Stanford CS229 linear algebra review: http://cs229.stanford.edu/section/cs229-linalg.pdf |
2 | TBD | Probability review | Overview of probability | Stanford CS229 probability review: http://cs229.stanford.edu/section/cs229-prob.pdf |