STAD68: Advanced Machine Learning and Data Mining
Prof Daniel M. Roy
firstname.lastname@example.org (please include “STAD68” in your email’s subject line or body)
Office hours: Mondays 3–4pm in IC 462, or by appointment. Changes will be announced.
Lectures: Mondays 7–10pm in IC 320.
There are twelve lectures, running from January 5 to March 30, with no lecture on February 16.
Statistical aspects of supervised learning: regression; regularization methods; parametric and nonparametric classification methods, including Gaussian processes for regression and support vector machines for classification; model averaging; model selection; and mixture models for unsupervised learning. More advanced topics will include Bayesian networks and graphical models.
STAC58 and STAC67.
Each student’s grade in the course will be based on:
The following is a tentative outline of the material we will cover:
Students with diverse learning styles and needs are welcome in this course. Please feel free to approach me or Accessibility Services so we can assist you in achieving academic success in this course. If you have not registered with the Accessibility Services and have a disability, please visit the Accessibility Services website at http://www.accessibility.utoronto.ca for information on how to register.
Each assignment is to be done by each student individually. You may discuss it in general terms with other students, but the work you hand in should be your own. In particular, you should not leave any discussion with someone else with any written notes (either paper or electronic). You may not use any resources/aids other than the book and Wikipedia. If you are not certain whether a resource is allowed, email the instructor.
Assignments are due by 6:50pm on the date marked. Late assignments will be penalized 10% of the available marks per 24 hours up to a maximum of 72 hours. Beyond this, no extensions will be granted on homework assignments, except in the case of an official Student Medical Certificate or a written (not emailed) request submitted at least one week before the due date and approved by the instructor. Please plan ahead.
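As a rough illustration of how the late penalty adds up, here is a minimal Python sketch. It rests on two assumptions that the policy does not spell out: every started 24-hour period counts in full (1 hour late costs the same as 24 hours late), and work submitted more than 72 hours late earns no marks. The function names are hypothetical; the policy text remains the authoritative rule.

```python
import math
from typing import Optional

PENALTY_PERCENT_PER_DAY = 10  # 10% of the *available* marks per 24 hours late
MAX_HOURS_LATE = 72           # late work is accepted for at most 72 hours


def late_penalty_percent(hours_late: float) -> Optional[int]:
    """Percentage of available marks deducted, or None if too late to accept.

    Assumption: each started 24-hour period counts as a full period.
    """
    if hours_late <= 0:
        return 0
    if hours_late > MAX_HOURS_LATE:
        return None
    return PENALTY_PERCENT_PER_DAY * math.ceil(hours_late / 24)


def late_mark(raw_mark: float, available_marks: float, hours_late: float) -> float:
    """Mark after the late penalty, never below zero.

    Assumption: work more than 72 hours late receives 0.
    """
    pct = late_penalty_percent(hours_late)
    if pct is None:
        return 0.0
    # The deduction is a share of the marks available, not of the marks earned.
    return max(0.0, raw_mark - available_marks * pct / 100)


if __name__ == "__main__":
    # 40/50 handed in 30 hours late: two started periods, so 20% of 50 = 10 marks off.
    print(late_mark(40, 50, 30))  # 30.0
```

Because the deduction is a share of the available marks, a weak submission can lose most or all of its marks once it is a few days late.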
The required textbook is the 4th printing of
Kevin P. Murphy (2012), Machine Learning: A Probabilistic Perspective, MIT Press.
I have requested that a copy be put on 3-hour reserve in the library and that the bookstore order copies for purchase.
The printing number can be determined from the copyright page: look for a sequence 10 9 8 7 6 … k. The number k, the last and smallest number in the sequence, is the printing. You might be able to find the book cheaper by googling around, but be mindful of the printing number. If you get an earlier printing, you’ll have to make (extensive) use of the online errata at:
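The rule for reading the printing number can be written as a short Python sketch. The function name `printing_number` is hypothetical, and the input is assumed to be just the line of numbers copied from the copyright page. Taking the minimum gives the same answer as taking the last number for a descending sequence. It should also cope with keys printed in a different order, although the copyright page itself is what you should go by.

```python
REQUIRED_PRINTING = 4  # the course requires the 4th printing


def printing_number(printers_key: str) -> int:
    """Return the printing number from a line such as '10 9 8 7 6 5 4'.

    The smallest number remaining in the sequence is the printing.
    """
    numbers = [int(token) for token in printers_key.split()]
    if not numbers:
        raise ValueError("no numbers found in the printer's key")
    return min(numbers)


if __name__ == "__main__":
    k = printing_number("10 9 8 7 6 5 4")
    print(k, k >= REQUIRED_PRINTING)  # 4 True
```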
Last I checked, there were 250 typos and ~10 significant errors in the first printing. The book should be available from the campus bookstore by the second week.
There are many other excellent resources for learning about machine learning. The following list contains just a few:
Programming assignments can be completed in any language you like, although the assignment handouts and supporting files will generally only be provided in one language (usually R, MATLAB, or Python). To be an effective machine learner/data scientist, you must know how to program and manipulate data! Being able to work efficiently with all of the key machine learning languages is necessary in industry and in applied graduate work.
The R language (http://www.R-project.org) is free software. All public lab machines have R installed. (See
for an introduction. There are many other resources online.) If you have access to a machine where you can install software, you might consider RStudio (http://www.rstudio.com/products/rstudio/), an integrated development environment (IDE) that provides many useful features.
The MATLAB language is proprietary, but Murphy provides many MATLAB examples in his textbook and online at
MATLAB 2013 is available in the public Windows labs, although you will need a version with the Statistics Toolbox installed to run the examples from Murphy’s book.
Other core machine learning languages include Python and C/C++. Some up-and-coming languages that have gained a lot of interest within machine learning include Scala and Julia.
I will use a mixture of Blackboard and the course website
to post material.