STA4516

Topics in Probabilistic Programming

Instructor

Prof Daniel M. Roy

daniel.roy@utoronto.ca (please include “STA4516” in your email’s subject line or body)

Office hours: SS 6026C, by appointment.

Time and Location

Date/Time: Fall 2015, Tuesdays, 2-5pm from Oct. 27 through Dec. 8

Room: SS 2116 (Sidney Smith Hall)

Overview

Probabilistic programming is an emerging area of machine learning, statistics, and computer science, and can be characterized as the systematic study of algorithmic processes that describe and transform uncertainty. Probabilistic programming languages give users the ability to describe complex probabilistic models with code, rather than formulas, while universal inference engines automate the task of implementing inference algorithms for models described by probabilistic code. This class will discuss the key ideas behind probabilistic programming languages and systems, as well as give students a glimpse at the theoretical foundations of probabilistic programming itself.
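To make the split between model and inference engine concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from any system or reading in this course. The names coin_model and posterior_mean are hypothetical, and likelihood weighting is used as one simple stand-in for a generic inference engine. The model is ordinary code that draws an unknown coin bias from a uniform prior and scores the observed flips. The estimator knows nothing about coins and works with any model function of the same shape.

```python
import math
import random


def coin_model(rng, flips):
    """Hypothetical model written as code: draw a bias, then score the data."""
    bias = rng.random()  # prior: bias ~ Uniform[0, 1)
    log_weight = 0.0
    for heads in flips:
        p = bias if heads else 1.0 - bias
        # Guard against log(0) in the (very unlikely) event that bias == 0.
        log_weight += math.log(p) if p > 0.0 else float("-inf")
    return bias, log_weight


def posterior_mean(model, data, num_samples=20000, seed=0):
    """Generic likelihood-weighting estimate of the posterior mean.

    Assumes `model(rng, data)` returns a (value, log_weight) pair.
    """
    rng = random.Random(seed)
    total_weight = 0.0
    weighted_sum = 0.0
    for _ in range(num_samples):
        value, log_weight = model(rng, data)
        weight = math.exp(log_weight)
        total_weight += weight
        weighted_sum += weight * value
    return weighted_sum / total_weight


# Made-up data for illustration: 6 heads and 2 tails.
observed = [True, True, True, False, True, True, False, True]
estimate = posterior_mean(coin_model, observed)
print(f"Estimated posterior mean of the bias: {estimate:.3f}")
```

With a uniform prior and 6 heads out of 8 flips, the exact posterior is Beta(7, 3), whose mean is 0.7, so the printed estimate should land close to that value. Real probabilistic programming systems replace this estimator with far more capable inference algorithms, but the division of labour is the same.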

Pre-requisites

Mathematical maturity and some background in probability/measure theory, functional programming, and analysis recommended.

Structure of the course

The course will be structured around weekly readings and student presentations of papers.  Each week there will be a lecture on a new subject.

No class on Nov 10.

See Reading (below) for papers associated with each week’s topic.

Grading

For students taking the class for credit, their grade will be based on:

Students auditing will be expected to participate actively (by asking and answering questions, and doing weekly readings) and even scribe, if necessary. Students auditing who would like to present a paper are welcome to do so, time permitting. Everyone is welcome to submit a research project.

Class participation

Students should remain attentive, ask questions, and answer questions during class.  Participation also involves reading the papers being presented that week and coming with questions.  By Week 3, students should have completed the online tutorials (ProbMods and DIPPL).  Class participation also involves scribing.

What is scribing?

Scribing requires that the student take detailed notes during lecture, and produce a LaTeX version of these notes.  The notes should be free from typos, and should allow a reader to follow the logic of the presentation.  These will be distributed to the students.  A LaTeX template for scribing is available from the course website and must be used.

Presentations

Students are expected to present papers.  Presentations can be prepared collaboratively in groups of up to 2 students, although two people are expected to cover the material in greater depth and sophistication.  (Groups of 3 must receive my prior approval.)  A list of scheduled presentations will be available on the course website.  Papers other than those marked with yellow asterisks (*) require prior approval.  Students are encouraged to find papers in areas of interest to them.

Presentation structure and notes

Presentations can be either slide presentations or chalk talks. Presentations should last 30 minutes. Up to fifteen minutes of questions will push this to 45 minutes in total. (Rehearse your presentation to check the timing. A rough guide is no more than one slide per minute.) In the case of a slide presentation, PDF slides will be placed on the website after the class. In the case of a chalk talk, the presenter is responsible for producing notes for their own talk. These can be hand-written notes if they are clearly written and suitable for publication on the website; otherwise they should be LaTeX'd.  Notes should be submitted by the Friday following the presentation.

Project Report

Projects are meant to be short and low-risk, but potentially the start to something more interesting.  The deliverables are:

  1. One-page summaries of projects, due by email by Wednesday, November 11 (note there is no class that week due to a study break).  Summaries allow me to give early advice.
  2. Project reports, due December 8, by email, by midnight, in PDF format, with proper citations and bibliographies, free from grammatical and spelling errors.  I take plagiarism extremely seriously. (Any code produced should be submitted alongside the PDF, e.g., in a tarball or zip file.)

Project reports should ideally be produced in LaTeX, and can use any LaTeX style, but I am looking for what would amount to ~4 pages' worth of material had the paper been written in the NIPS style.

The prototypical probabilistic programming project is to find a statistical model in the literature, and implement (or approximate) it in one (or ideally two different) probabilistic programming system(s). The goal would be to reproduce a key plot/figure in a research paper that used the model on some data.  A good project along these lines would carefully document the challenges that arose in using each probabilistic programming system, including any approximations that had to be made in order to get useful results.  An excellent project might then use the flexibility of the probabilistic programming ecosystem to propose a slightly improved model and get some preliminary results.  Another contribution that would make an excellent project would be to show how a family of related statistical models can be represented by a single probabilistic programming library. There are many other ideas that would make an excellent project.

Side note: many probabilistic programming systems are alpha-level software and may contain bugs.  Developers may not be able to respond to you quickly enough to meet deadlines.  Other systems are more mature.  I recommend starting with a more mature system and, once you have enough results to write up a research paper, moving on to a less mature one if you wish.  Alternatively, start early.  Note also that some systems cannot handle large data sets because the resulting algorithm is too slow or too memory intensive.  Start small, and build up.

Another prototypical research project is to perform a literature review of several related papers.  A good project along these lines would identify strengths and weaknesses of the papers, and propose (actually) interesting avenues for further exploration.  It’s hard to summarize what an excellent project would be. I’d probably want to learn something in the course of reading the review.

For those who are theoretically inclined, I welcome your own ideas, although I would recommend that you come to me early so that we can discuss them.

Policy on Late Work

Research projects are due by email on the last day of class (11:59pm Toronto time, Dec 8).  Every day of delay thereafter results in a 33% deduction.  Presentations cancelled later than Friday noon will be counted as missed.  Missed presentations can be made up for 75% credit if there are slots available.  Extenuating circumstances will be handled on a case-by-case basis.

Course Webpage

Blackboard will be used to manage the course list and grades, but the course information and links will be available at http://danroy.org/teaching/2015/STA4516/

Accessibility

Students with diverse learning styles and needs are welcome in this course. Please feel free to approach me or Accessibility Services so we can assist you in achieving academic success in this course. If you have not registered with the Accessibility Services and have a disability, please visit the Accessibility Services website at http://www.accessibility.utoronto.ca for information on how to register.

Reading

There is no required course text (indeed, a suitable text doesn’t even exist yet).  Students are responsible for reading the papers being presented each week.  Papers marked with * are approved for presenting.  Students may suggest other articles (listed here, or otherwise), but my prior approval is required.

Online Tutorials and Books

Seminal Papers

PhD Theses

Algorithms

Domain Specific Languages and Applications

Combining Logic and Probability in AI

Theory (Computable Analysis and Probability Theory)

Theory (Programming Languages)

Theory (Probability)

Additional reading

A larger (but still far from complete) collection of research articles on probabilistic languages and probabilistic programming can be found here:
http://probabilistic-programming.org/research/ 

Relevant work in statistics, probability, and other fields

Selection of Probabilistic Programming Languages and Systems

Many more can be found listed at http://probabilistic-programming.org