EECS 445 Fall 2013: Syllabus

EECS 445: Introduction to Machine Learning

University of Michigan, Fall 2013

Q. What’s the difference between EECS 445 and EECS 545?

Q. What’s the difference between EECS 445 and EECS 492?

Q. How much will the workload be?

Topics to be covered (tentative)

Lectures: MW 1:30pm-3pm, 1303 EECS

Discussion: F 11am-12pm, 1012 FXB

Instructor: Honglak Lee

Instructor office hours: MW 3-4pm (after class), 3773 BBB

GSI: Sami Abu-El-Haija

GSI office hours: Tuesday 12-1PM, and Thursday 3:30-4:30PM, both at 1637 BBB

Contact: For all questions, please use Piazza (registration required).

NOTE: This is a tentative syllabus and is subject to change.

The course is a programming-focused introduction to machine learning. Increasingly, extracting value from data is an important contributor to the global economy across a range of industries. The field of machine learning provides the theoretical underpinnings for data analysis and, more broadly, for modern artificial intelligence approaches to building artificial agents that interact with data; it has had a major impact on many real-world applications.

The course will emphasize understanding the foundational algorithms and “tricks of the trade” through implementation and basic theoretical analysis. On the implementation side, the emphasis will be on practical applications of machine learning to computer vision, data mining, speech recognition, text processing, bioinformatics, and robot perception and control. Real data sets will be used whenever feasible to encourage understanding of practical issues. The course will also provide an opportunity for an open-ended research project. On the theoretical side, the course will give an undergraduate-level introduction to the foundations of machine learning, including regression, classification, kernel methods, regularization, neural networks, graphical models, and unsupervised learning.

Textbook

There is no required textbook, but we recommend Chris Bishop’s book as a reference. For more detail and more recent coverage of specific methods, we recommend Kevin Murphy’s book.

Other relevant (and advanced) books

- Chris Bishop, "Pattern Recognition and Machine Learning," Springer, 2007.
- Kevin Murphy, "Machine Learning: A Probabilistic Perspective," MIT Press, 2012.
- Hastie, Tibshirani, and Friedman, "The Elements of Statistical Learning," Springer, 2010. (available online)
- David Barber, "Bayesian Reasoning and Machine Learning," Cambridge University Press, 2012. (available online)
- Sutton and Barto, "Reinforcement Learning: An Introduction," MIT Press, 1998. (available online)
- Boyd and Vandenberghe, "Convex Optimization," Cambridge University Press, 2004. (available online)
- MacKay, "Information Theory, Inference, and Learning Algorithms," Cambridge University Press, 2003. (available online)

Prerequisites

- EECS 281
- Recommended background: linear algebra (equivalent to MATH 217 or MATH 417), multivariate calculus, and probability (equivalent to EECS 401)

* NOTE: Please see the instructor if you do not satisfy the above math requirements. In particular, if you haven't taken at least two of linear algebra, multivariate calculus, and probability, we recommend finishing them before taking this course. We will have some review lectures and discussion sessions to refresh your knowledge of these topics.

Discussion sections

Discussion sections are designed to facilitate understanding of specific topics through problem solving, practical hands-on tutorials on implementing learning algorithms, and interactive Q&A. Discussion sections are recorded! Recordings will be available here about 2 hours after the section ends.

Linear Algebra and Matrix Calculus Notes: http://cs229.stanford.edu/section/cs229-linalg.pdf

Problem sets

There will be five or six problem sets, in which you will implement the algorithms covered in class and strengthen your understanding of the fundamental concepts, mathematical formulations, and algorithms. A substantial portion of each problem set will consist of programming assignments.

You are given 7 late days for the semester, with at most 3 late days per assignment. While your pool of 7 lasts, late days carry no penalty; once it is exhausted, each late assignment receives a 20% grade deduction. Regardless, no assignment will be accepted more than 3 days after the deadline, as we intend to make solutions available 3 days after the due date.
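To make the arithmetic of this policy concrete, here is a small sketch of how the rules combine (a hypothetical helper, not official grading code; `late_penalty` and its signature are made up for illustration):

```python
def late_penalty(days_late, free_days_remaining):
    """Grade multiplier under the late-day policy described above:
    at most 3 late days per assignment; free while the semester pool
    of 7 late days lasts; a 20% deduction once the pool is exhausted."""
    if days_late > 3:
        return None  # not accepted: solutions come out 3 days after the deadline
    if days_late <= free_days_remaining:
        return 1.0   # covered by the remaining free late days
    return 0.8       # 20% deduction beyond the 7 free late days

late_penalty(2, free_days_remaining=5)   # → 1.0 (free late days still available)
late_penalty(2, free_days_remaining=0)   # → 0.8 (pool exhausted)
late_penalty(4, free_days_remaining=7)   # → None (never accepted past 3 days late)
```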

Project

This course offers an opportunity to get involved in open-ended research in machine learning. Students are encouraged to develop new machine learning algorithms, or to apply existing algorithms to new problems. Please talk to the instructor before deciding on a project topic. Students will be required to complete a project proposal, a progress report, a poster presentation, and a final report.

Check the resource page on CTools for more detailed information.

(Note: Students can choose either to take a final exam or to submit a final project report. If they choose the final project, instructor approval is required, around the time the progress report is submitted.)

Grading

Homework: 35%

Midterm: 30%

Project (option 1): 35% (progress report 10%; final project 25%)

Final exam (option 2): 35%

* Up to 2% extra credit may be awarded for active class participation.

Important dates

- Project proposal due: 10/10, 11:55pm
- Project progress report due: 11/8, 11:55pm
- Midterm exam: 11/21, 5-8pm, EECS 1500
- Final exam (option 2): 12/13, 4-6pm, EECS 1012
- Final project poster presentation (option 1): 12/16, 4-6pm, Tishman Hall @ Beyster Building
- Final project report due (option 1): 12/20, 11:55pm

Q. What’s the difference between EECS 445 and EECS 545?

EECS 445 is more implementation- and application-oriented; it covers fewer topics, but in more implementation detail. It is geared more towards undergraduates and graduate students outside EECS.

In contrast, EECS 545 will cover more theoretical aspects and advanced techniques.

NOTE:

If you are an undergraduate student, we recommend taking EECS 445. If you want to take EECS 545, please consult the corresponding instructor.

If you are a CSE graduate student, then EECS 445 will not satisfy your breadth/depth requirements.

Q. What’s the difference between EECS 445 and EECS 492?

EECS 445 will focus on the details of machine learning algorithms (such as logistic regression, support vector machines, and neural networks), so you will learn more about the techniques and implementation of these algorithms. There will also be an opportunity to get involved in projects that deal with real-world problems. If you are interested in data mining, data analysis/prediction, or pattern recognition, this course will be a better fit.

EECS 492 will cover broader topics in artificial intelligence, such as search, logic, knowledge representation and reasoning, and planning. There will be some overlap, since EECS 492 will also devote a few lectures to machine learning.

Q. How much will the workload be?

This will be a 4-credit course. Reasonable math skills (linear algebra, probability, etc.) are strongly recommended; for students without this background, the course load may be heavier.

* indicates advanced topics

- Introduction
- Regression

- Linear regression
- Gradient descent and stochastic gradient
- Newton method
- Probabilistic interpretation of linear regression: Maximum likelihood

- Classification

- k-nearest neighbors (kNN)
- Naive Bayes
- Linear discriminant analysis / Gaussian discriminant analysis
- Logistic regression
- Generalized linear models, softmax regression

- Kernel methods

- Kernel density estimation, kernel regression
- Support vector machines
- Convex optimization*
- Gaussian processes*

- Regularization

- L2 regularization
- Bias-Variance tradeoff
- Overfitting
- Cross validation, model selection
- L1 regularization, sparsity and feature selection*

- Advice for developing machine learning algorithms
- Neural networks

- Perceptron
- MLP and back-propagation

- Deep learning*
- Unsupervised learning

- Clustering: K-means
- Gaussian mixtures: Expectation maximization
- Dimensionality reduction: Principal Component Analysis (PCA)
- Independent Component Analysis (ICA) and sparse coding

- Ensemble learning

- Bagging
- Boosting

- Graphical models*
- Hidden Markov Models (HMM)*
- Reinforcement Learning*
- Learning Theory*
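As a taste of the first algorithmic topics above, here is a minimal illustrative sketch (not course-provided code; the function name and toy data are made up) of batch gradient descent for least-squares linear regression:

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for least-squares linear regression.

    Minimizes J(w) = (1/2m) * ||X w - y||^2 by repeatedly stepping
    against the gradient X^T (X w - y) / m.
    """
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / m  # gradient of the mean squared error
        w -= lr * grad
    return w

# Toy data for y = 1 + 2x; a column of ones supplies the intercept term.
X = np.column_stack([np.ones(50), np.linspace(0, 1, 50)])
y = X @ np.array([1.0, 2.0])
w = fit_linear_regression(X, y)  # approaches [1.0, 2.0]
```

Comparing the result against the closed-form least-squares solution, e.g. `np.linalg.lstsq(X, y, rcond=None)[0]`, is a useful sanity check and previews the matrix-calculus derivation covered early in the course.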

*Note: There are no classes on 10/14 (study break) and 11/27 (Thanksgiving break).

| Lecture | Date | Day of week | Unit | Topics | Chapters | Homework/Project deadlines |
|---|---|---|---|---|---|---|
| 1 | 9/4 | Wed | Introduction and Overview | Introduction. What is machine learning? Examples of machine learning problems. | Bishop: Ch 2.1, Appendix B | |
| 2 | 9/9 | Mon | Supervised Learning: regression | Linear Regression [Sum of Squared Error, Least Squares, Gradient Descent] | Bishop: Ch 3.1; Murphy: 7.1-7.3; Stanford CS229 note: www.stanford.edu/class/cs229/notes/cs229-notes1.pdf | |
| 3 | 9/11 | Wed | Supervised Learning: regression | Linear Regression [Matrix calculus; Closed-form solution]; Regularized Linear Regression | Bishop: Ch 3.2, 1.1, 2.5; Murphy: 7.3, 7.5; Stanford CS229 note: www.stanford.edu/class/cs229/notes/cs229-notes1.pdf | |
| 4 | 9/16 | Mon | Supervised Learning: regression | Regularized Linear Regression; Probability review; Maximum Likelihood interpretation of Linear Regression; Locally weighted linear regression | Bishop: Ch 3.2, 1.1, 2.5; Murphy: 7.3, 7.5, 14.7.5; Stanford CS229 note: www.stanford.edu/class/cs229/notes/cs229-notes1.pdf | 9/17: HW1 out |
| 5 | 9/18 | Wed | Supervised Learning: classification | Classification - Discriminative Models: Logistic Regression, Newton's method, K-Nearest Neighbors | Bishop: Ch 4.1, 4.3; Murphy: 8.1-8.3, 1.4.1-1.4.3; Stanford CS229 note: www.stanford.edu/class/cs229/notes/cs229-notes1.pdf | |
| 6 | 9/23 | Mon | Supervised Learning: classification | Classification - Probabilistic Generative Models [Gaussian Discriminant Analysis] | Bishop: Ch 4.2; Murphy: 4.2; Stanford CS229 note: www.stanford.edu/class/cs229/notes/cs229-notes2.pdf | 9/24: HW1 due; HW2 out |
| 7 | 9/25 | Wed | Supervised Learning: classification | Classification - Probabilistic Generative Models [Naive Bayes]; Discriminant Functions [Fisher's linear discriminant, Perceptron learning algorithm] | Bishop: Ch 4.2; Murphy: 3.5, 8.6, 8.5.4; Stanford CS229 note: www.stanford.edu/class/cs229/notes/cs229-notes2.pdf | |
| 8 | 9/30 | Mon | Kernel methods | Kernel methods: Definition, constructing kernels | Bishop: Ch 6.1-6.3; Murphy: 14.1-14.2, 14.4 | |
| 9 | 10/2 | Wed | Kernel methods | Kernel methods: Kernelized linear regression; Kernel regression | Bishop: Ch 6.1-6.3; Murphy: 14.7 | |
| 10 | 10/7 | Mon | Kernel methods | Support Vector Machine (SVM) | Bishop: Ch 7.1; Murphy: 14.5 | 10/8: HW2 due; HW3 out |
| 11 | 10/9 | Wed | Regularization and Model Selection | Regularization and Model Selection; Cross validation | http://cs229.stanford.edu/notes/cs229-notes5.pdf; Murphy: 1.4.8 | 10/10: Project proposal due |
| 12 | 10/16 | Wed | Advice on using ML algorithms | Advice on using ML algorithms | | |
| 13 | 10/21 | Mon | Neural Networks | Neural Networks: Representation | Bishop: Ch 5; Murphy: 16.5 | |
| 14 | 10/23 | Wed | Neural Networks | Neural Networks: Learning with backpropagation; Deep autoencoders; Convolutional neural networks | Bishop: Ch 5; Murphy: 16.5; Bengio's survey paper: www.iro.umontreal.ca/~bengioy/papers/ftml.pdf | 10/22: HW3 due; HW4 out |
| 15 | 10/28 | Mon | Unsupervised learning | Clustering: K-means | Bishop: Ch 9; Murphy: 11.2 | |
| 16 | 10/30 | Wed | Unsupervised learning | Clustering: Gaussian mixtures - expectation maximization | Bishop: Ch 9; Murphy: 11.3, 11.4 | |
| 17 | 11/4 | Mon | Unsupervised learning | Dimensionality reduction: PCA | Bishop: Ch 12.4; Murphy: 12.2 | |
| 18 | 11/6 | Wed | Unsupervised learning | Sparse coding and sparse autoencoder | www.iro.umontreal.ca/~bengioy/papers/ftml.pdf; Murphy: 13.8, 28 | 11/8: Project progress report due |
| 19 | 11/11 | Mon | Ensemble Methods | Bagging, Boosting | Bishop: Ch 14.3 | 11/12: HW4 due |
| 20 | 11/13 | Wed | Reinforcement learning | RL overview | Sutton and Barto: Ch 1-3; http://cs229.stanford.edu/notes/cs229-notes12.pdf | |
| 21 | 11/18 | Mon | Midterm review | | | |
| 22 | 11/20 | Wed | Kernel methods* | Gaussian process regression* (Multivariate Gaussian distribution) | Bishop: Ch 2.3, 3.3, 6.4 | 11/21, 5-8pm: Midterm (EECS 1500) |
| 23 | 11/25 | Mon | Sequence modeling | Hidden Markov Models | Bishop: Ch 13.1, 13.2 | 11/22: HW5 out |
| 24 | 12/2 | Mon | Learning Theory | Learning theory overview; VC dimension; Generalization bound | | |
| 25 | 12/4 | Wed | Application of ML (guest lecture) | | | |
| 26 | 12/9 | Mon | Application of ML (guest lecture) | | | 12/6: HW5 due |
| 27 | 12/11 | Wed | Feature selection | | | |
| | 12/13 | Fri | Final exam (option 2) | | | |
| | 12/16 | Mon | Final poster presentation (option 1) | | | 12/20: Final report due (option 1) |