1 of 14

Reinforcement Learning (CS 828z)

Class Overview & Syllabus

Hal Daumé III

30 August 2016

2 of 14

Topics Covered

  • Background
    • Bandits
    • Markov decision processes & dynamic programming
    • TD-learning and Q-learning
    • Linear models and (deep) neural networks
  • TD/Q learning
  • Policy gradient
  • Offline evaluation / learning
  • Policy iterations
  • Learning from demonstrations
  • RL in humans
  • Miscellaneous...

3 of 14

Prerequisite Knowledge -- must come easily

  • Basic machine learning concepts (CIML 1-4, UML 1)
  • Basic calculus and linear algebra
    • Compute (by hand) gradients of multivariate functions
    • Conceptualize dot products and matrix multiplications as projections
    • Solve multivariate equations using, e.g., matrix inversion
    • Use techniques of Lagrange multipliers for constrained optimization problems
    • Understand and be able to use convexity
    • see math4ml: http://hal3.name/courses/2013S_ML/math4ml.pdf
  • Basic probability and statistics
    • Use chain rule, marginalization rule and Bayes' rule
    • Make use of conditional independence, and understand "explaining away"
    • Understand: random variables, expectations and variance
    • Describe the difference between PDFs and CDFs
    • Compute maximum likelihood solutions for Bernoulli and Gaussian distributions
  • Basic artificial intelligence:
    • E.g., what I cover in UGAI: http://hal3.name/courses/2012S_AI/
    • We will very very VERY quickly review basic RL (MDPs, Q learning, TD learning)
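
As a quick self-check for the maximum likelihood item in the probability list above, here is a minimal sketch of the closed-form estimates. The function names and the sample data are illustrative assumptions, not course material.

```python
def bernoulli_mle(xs):
    """MLE of the Bernoulli parameter p: the fraction of ones in the sample."""
    return sum(xs) / len(xs)


def gaussian_mle(xs):
    """MLE of a Gaussian's mean and variance.

    The variance estimate divides by n (not n - 1), which is what
    maximizing the likelihood gives; it is biased for small samples.
    """
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var


# Hypothetical data, chosen only for illustration.
p_hat = bernoulli_mle([1, 0, 1, 1])
mu_hat, var_hat = gaussian_mle([1.0, 2.0, 3.0])
```

If deriving these by hand (set the derivative of the log-likelihood to zero and solve) does not come easily, review the prerequisite material first.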

4 of 14

Course structure

  • "Intro" first eight class periods
    • Hal lectures
    • Students scribe
  • "Rest" last 21 class periods
    • student pairs present papers
    • and scribe those papers
  • Throughout class: ask questions on Piazza
  • Requirements (25 students):
    • Intro scribing worth 1 point
    • Paper presentation worth 2 points
    • You must earn 4 points to get full credit (so 2 papers, or 1 paper and 2 scribes; you may not scribe more than two of the intro class periods)
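
The point arithmetic above can be checked with a short sketch. The constants come from the slide; the function name and the `max_papers` bound are assumptions made only for this illustration.

```python
SCRIBE_POINTS = 1  # intro scribing is worth 1 point
PAPER_POINTS = 2   # a paper presentation is worth 2 points
REQUIRED = 4       # points needed for full credit
MAX_SCRIBES = 2    # at most two intro scribes per student


def exact_combinations(max_papers=2):
    """Return (scribes, papers) pairs that total exactly REQUIRED points."""
    return [
        (s, p)
        for s in range(MAX_SCRIBES + 1)
        for p in range(max_papers + 1)
        if s * SCRIBE_POINTS + p * PAPER_POINTS == REQUIRED
    ]


print(exact_combinations())  # prints [(0, 2), (2, 1)]
```

The two pairs match the options listed on the slide: two papers, or two scribes plus one paper.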

5 of 14

Scribing logistics

  • Within two days of class, you must post a summary of the class on Piazza
  • You are also responsible for answering questions posted there (if you don't know the answer, talk to me)
  • You are also responsible for helping people with the initial RL programming project

6 of 14

Paper presentation logistics

  • At least 48 hours before corresponding class period, you must post a summary to Piazza
  • At least 24 hours before corresponding class period, others must post at least two questions on that thread
  • At most 48 hours after corresponding class period, you should answer those questions on Piazza (even if you answered them in class)
  • You should assume other members of the class have read (at least skimmed) the paper and at least understand the high level
  • Your presentation should focus on more difficult aspects, and connect the paper to what we've seen before
  • I strongly prefer handouts to slides, but do what you're most comfortable with

7 of 14

Course Project

  • Your choice of topic (I'll provide some ideas)
  • May be done in groups of 3-4 students
    • Write-ups must include the division of labor
    • Same grade for all team members unless there's a large discrepancy in labor
  • Three items to hand in:
    • Proposal (28 October), includes meeting with Hal
    • Progress Report (18 November)
    • Final Report (8 December)
  • Goal: convince me that you learned something and can put that knowledge to use!

8 of 14

Grading

  • 40% course project
  • 30% paper presentations/scribing
  • 20% asking reading questions
  • 10% intro RL programming project (individual)

9 of 14

Class Rules

  • Nothing may be handed in late without:
    • prior arrangements (at least one week in advance), or
    • a doctor's note (or equivalent), modulo university regulations
  • You are encouraged to ask questions in class or on Piazza
  • Add/drop deadlines
  • Cheating: see academic integrity policy
  • ADA/DSS:
    • Letter of accommodation to Hal within the first two weeks of class
    • Please also see ADA/DSS policy

10 of 14

Policy on class behavior

The open exchange of ideas, the freedom of thought and expression, and respectful scientific debate are central to the aims and goals of this course. These require a community and an environment that recognizes the inherent worth of every person and group, that fosters dignity, understanding, and mutual respect, and that embraces diversity. Harassment and hostile behavior are unwelcome in any part of this course. This includes speech or behavior that intimidates, creates discomfort, or interferes with a person’s participation or opportunity for participation in the course.

We aim for this course to be an environment where harassment in any form does not happen, including but not limited to: harassment based on race, gender, religion, age, color, national origin, ancestry, disability, sexual orientation, or gender identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking, harassing photography or recording, inappropriate physical contact, and unwelcome sexual attention. Please contact an instructor or CS staff member if you have questions or if you feel you are the victim of harassment (or otherwise witness harassment of others).

Edited from the NAACL Policy: http://naacl.org/policies/anti-harassment.html

11 of 14

Academic Integrity Policy

Any assignment or exam that is handed in must be your own work (unless otherwise stated). However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you copy someone else's solution, you are cheating. If you let someone else copy your solution, you are cheating (this includes posting solutions online in a public place). If someone dictates a solution to you, you are cheating.

Everything you hand in must be in your own words, and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. We strongly encourage students to help one another understand the material presented in class, in the book, and general issues relevant to the assignments. When taking an exam, you must work independently. Any collaboration during an exam will be considered cheating. Any student who is caught cheating will be given an F in the course and referred to the University Office of Student Conduct. Please don't take that chance - if you're having trouble understanding the material, please let us know and we will be more than happy to help.

12 of 14

ADA/DSS Policy

Any student eligible for and requesting reasonable academic accommodations due to a disability is requested to provide, to the instructor in office hours, a letter of accommodation from the Office of Disability Support Services (DSS) within the first two weeks of the semester. You may reach them at 301-314-7682 or by visiting the 4th floor of Susquehanna Hall.

The CS department does not consider requests for retroactive accommodation to be reasonable. In the same vein, we do not consider it reasonable to ask an instructor to create an alternate assignment of substance. The spirit of our accommodation is to help DSS-advised students find creative ways to meet the high standards we set for all our students.

13 of 14

Minutiae (but important minutiae!)

  • Office hours
    • Hal: TBA (AVW 3227)
    • Or schedule an appointment
  • Announcements will be posted to Piazza
    • You are expected to read them regularly
    • I won't necessarily repeat announcements in class
  • Before asking a logistical question, check syllabus

14 of 14

Some content....