Syllabus: Machine Learning for Language Understanding (S'22, LING-UA 52, MLLU, Bowman)

Machine Learning for Language Understanding

LING-UA 52 - DS-UA 203

Sam Bowman

Assistant Professor of Linguistics, Data Science, and Computer Science

Spring 2022

Building computational models that can understand human language has long been a goal for researchers in computational linguistics and in the area of artificial intelligence called natural language processing. Many of the biggest successes in research toward this goal have relied on machine learning: a family of methods that allow computers to learn to reproduce some human behavior by example, rather than by explicit programming. This course covers widely used machine learning methods for language understanding, with a special focus on methods based on artificial neural networks, and culminates in a substantial final project in which students write an original research paper in AI or computational linguistics.

If you take this class, you’ll be exposed only to a fraction of the many approaches that researchers have used to teach language to computers. However, you’ll get training and practice with all the research skills that you’ll need to explore the field further on your own. This includes not only the skills to design and build computational models, but also to design responsible experiments to test those models, to write and present your results, and to read and evaluate results from the scientific literature.

Basic Logistics

Format

Lecture: Available on Brightspace, released Monday afternoon/evening

Lab Session: Tuesdays 3:30–4:45p, BOBS LL138

Optional Live Discussion: Wednesdays 2–3p, GCASL 383

The main lecture content will take the form of a prerecorded video each week, in the flipped classroom style. The normal class time will be used for semi-structured discussions of course content and readings. The lab session, led by the TAs, will introduce concrete skills and tools that you'll use in your assignments and project.

All lab sessions and live discussion sessions will be recorded, but to participate actively, you’ll have to attend in person. Participating in live discussion within these sessions is an easy way to earn the required participation credit, but you can also earn this credit through online discussion, and you are not required to attend any specific live sessions.

Prerequisites

Note on alternative courses: This course covers largely the same material as the graduate-level DS-GA 1012. Other than that, it should overlap only very slightly with other NYU courses on computational linguistics and natural language processing. Students interested in a survey of methods in language technology should consider enrolling in both this course and the topics course Natural Language Processing, which is occasionally offered through Computer Science. These two courses may be taken in either order.

Registration

You may register through either course code (LING-UA or DS-UA); both codes correspond to exactly the same course. If you see open seats under either code but are blocked from registering for any reason, contact the relevant department directly. This course is open to students in any NYU school or program. Only department administrators can edit student enrollment, and you do not need instructor permission if you’re confident that you meet the prerequisites. For registration questions, please contact the departments directly rather than me.

Auditing

Auditors from any school or department are welcome, subject to the prerequisites above. Auditors may not submit assignments, and may only participate in project teams with permission of the instructor. If you’d like to audit, make a private post on Campuswire during the first week of class introducing yourself as an auditor and giving your email address. The TAs can then add you to Brightspace.

Course Site

Forum (Campuswire)

New NYU Classes (brightspace.nyu.edu)

Instructional Staff and Office Hours

Note: Office hour times may change some weeks. Watch the forum for announcements.

Sam Bowman (Instructor)

Eugene Choi (TA)

Arka Talukdar (TA)

Requirements

Schedule

For a preview of some topics, see 2021’s slides.

Each week lists the lab topic (with the initials of the TA leading it: AT for Arka Talukdar, EC for Eugene Choi), the lecture topic, the live session format, what’s due, and reference readings.

1/24
Lab: Python Refresher (AT)
Lecture: Introduction
Live: Discussion
Due: Campuswire sign-up
Reference: E1

1/31
Lab: Basics of Supervised Learning (EC)
Lecture: Text Classification Crash Course
Live: Discussion
Due: Homework Opt-Out
Reference: J&M3–5; G2

2/7
Lab: Hands-On Supervised Learning (EC)
Lecture: Deep Learning Crash Course
Live: Discussion
Due: HW1: Classification (AT)
Reference: J&M6–7,9

2/14
Lab: Deep Learning (AT)
Lecture: Pretraining & Transfer Learning
Live: Discussion
Due: HW2: Neural Networks (EC)
Reference: Ruder, J&M11

2/21
Lab: Transfer Learning (EC)
Lecture: Working with Pretrained Models (w/ Jason Phang)
Live: Discussion
Due: Reading Quiz: BERT
Reference: Brown, Bommasani

2/28
Lab: NYU HPC Cluster (EC)
Lecture: Aligning and Using Large Language Models
Live: Quiz Review & Discussion
Due: Mini-Proposals
Reference: See Slides

3/7
Lab: Prompting (AT)
Lecture: Designing Experiments
Live: Discussion
Due: HW3: Transformers (AT)
Reference: See Slides

3/14
Spring Break

3/21
Lab: Research Mini-Talks (AT)
Lecture: Syntax & Parsing
Live: Discussion
Due: Reading Quiz: TruthfulQA, Teams
Reference: J&M12–14; B5–7

3/28
Lab: Working with Parsers (AT)
Lecture: Formal Semantics
Live: Discussion
Due: Full Proposals
Reference: J&M15; C&C

4/4
Lab: Formal Semantics (EC)
Lecture: Writing & Publishing
Live: Discussion
Reference: See Slides

4/11
Lab: Intermediate LaTeX (AT)
Lecture: Model Analysis
Live: Quiz Recap and Discussion
Due: Reading Quiz: Checklist
Reference: Rogers

4/18
Lab: Model Analysis (EC)
Lecture: Applications & Ethics
Live: Discussion
Due: Partial Draft
Reference: Resources Here

4/25
Lab: Making Figures (AT)
Lecture: Data Collection and Crowdsourcing (w/ Alicia Parrish)
Live: Quiz Recap and Discussion
Due: Reading Quiz: Disagreements
Reference: Resources Here

5/2
Lab: Mechanical Turk (EC)
Lecture: Dataset Analysis
Live: Discussion
Reference: See Slides

5/9
Project Presentations (Friday 2–3:50p)
Due: Final Paper (Friday 6p)

Readings

Paper readings: You’ll have to read many NLP research papers for your project, but we’ll read only four together as a class. These readings are examples of good recent research, and we’ll read them with special attention to how they’re designed and written. For each paper we read together, you’ll need to be ready to take a brief quiz. After the quiz, you’ll have a chance to discuss the paper, both as a technical idea and as a piece of writing.

Reference readings: This course doesn’t follow any one book, but the readings noted in the ‘Reference’ column above should cover most of the technical content discussed in class, and offer greater depth and additional references on many points. These readings are often long and dense, and you are not required to read them before lecture. However, you will likely have to refer to them at least occasionally when completing the assignments and the project. Abbreviations above correspond to:

  1. J&M: Dan Jurafsky and Jim Martin’s Speech and Language Processing (3rd ed., available free online, not yet out in print)
  2. G: Yoav Goldberg’s Neural Network Methods for Natural Language Processing (1st ed., available free online through the NYU network)
  3. B: Emily Bender’s Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax (1st ed., available free online through the NYU network)
  4. E: Jacob Eisenstein’s Natural Language Processing (draft, available free online)
  5. C&C: Liz Coppock and Lucas Champollion’s Invitation to Formal Semantics Boot Camp (draft, available free online)

We’re using these books because we think they’re useful, and I encourage you to read them in full.

Paper

The final paper should answer a concrete empirical question involving machine learning for language understanding and should include a literature review that shows that this question hasn’t already been answered decisively by others. A paper could evaluate which of two kinds of model works best on some dataset, evaluate an existing type of model on a new real-world language understanding problem, or offer a quantitative analysis of some existing model using new or existing data.

The paper should describe the task, models, experiments, and conclusions, and should be no more than 4 pages long, with up to two optional pages of appendices and unlimited additional pages for references. The main body of the paper should be complete on its own; use appendices only to provide additional clarifying details. The paper must be prepared in LaTeX and must closely follow ACL style. It must also include a link to a public GitHub repository containing any code used in your experiments.

In addition to the paper itself, we will grade four milestones:

Here are some examples of papers from past years that earned ‘A’ grades.

Contribution Statements

The partial draft and the final paper must state (in one or two sentences) the contributions of each team member. This doesn’t count toward the page limit.

Course Policies

Collaboration

Projects should be completed in groups of two or three, which should be formed by the time the full proposal is submitted. We strongly encourage you to discuss the readings with your peers before the reading quizzes open, but while the quizzes are open, you may not discuss the readings or quiz questions with anyone. You may discuss the homework assignments with anyone at any time, but you must ultimately complete and write up the assigned work yourself.

Projects Submitted to Multiple Classes

You may submit a project to both this class and another class—we encourage it!—but if you do this, your project must be ambitious and thorough enough to justify the amount of credit you’ll be getting for it. If your course project overlaps with a project that will be submitted to another class, you must inform both instructors by the partial draft deadline, and get confirmation from the instructors of both classes that the combined project is substantial enough.

Due Dates

Please upload completed assignments to NYU Brightspace by 1:30 PM on Wednesdays.

Late Work

We will deduct 20 percentage points from a grade for each day (full or partial) that the work is late. Exceptions are available only in case of documented emergencies.
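The late-work arithmetic can be sketched as follows. This is a minimal, hypothetical helper (`late_grade` is not part of any course tooling), assuming grades on a 0–100 scale, lateness measured in hours past the deadline, and a floor of zero, which the policy itself doesn't specify.

```python
import math


def late_grade(raw_score: float, hours_late: float, penalty_per_day: float = 20.0) -> float:
    """Hypothetical helper: apply the per-day late penalty to a 0-100 grade."""
    if hours_late <= 0:
        # Submitted on time: no penalty.
        return raw_score
    # Any partial day counts as a full day, so round lateness up to whole days.
    days_late = math.ceil(hours_late / 24)
    # Assumption: the penalized grade never drops below zero.
    return max(0.0, raw_score - penalty_per_day * days_late)


# Example: a 90 submitted 25 hours late counts as 2 days late.
print(late_grade(90, 25))  # prints 50.0
```

Because partial days round up, work submitted one hour late loses the same 20 points as work submitted a full day late.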

Extra Credit

Some lab sessions may include opportunities to earn extra credit, of no more than one point each toward the total grade. This will not be offered as part of the first lab session, or any lab session that is advertised as an optional review session. You will have to attend live to be eligible. We never offer extra credit of any kind on request.

Instructor Contact

Use public forum posts or come to office hours if you have questions about the course. Only use email or private forum messages for issues that are highly sensitive.

Plagiarism

If you use any text or figures at all from an outside source in your submitted work, you must do two things: (i) Use quotation marks, block quotes, or an explicit note to make it clear exactly what part of your paper comes from this source. (ii) Clearly cite this source. Yes, this includes descriptions or depictions of any baseline models and data you use.

You may reuse code by others for the final paper, as long as you make this clear in your paper. However, all the code you submit for individual assignments must be your own.

Violations of this policy will result in a zero grade for the submitted work and a referral to the university for further investigation.

Written submissions will use TurnItIn by default to automatically flag possible plagiarism cases for our review. You may opt out if you wish by contacting us.

Participating Productively

I expect you to be civil, to be respectful of your classmates, and to consider the impacts of your decisions on others, both in your research work and in your interactions with your classmates. I'll try to do the same.

Applicable University Policies

Academic Integrity

Work you submit should be your own. Please consult the CAS academic integrity policy for more information: https://cas.nyu.edu/content/nyu-as/cas/academic-integrity.html. Penalties for violations of academic integrity may include failure of the course, suspension from the University, or even expulsion.

Religious Observance

As a nonsectarian, inclusive institution, NYU policy permits members of any religious group to absent themselves from classes without penalty when required for compliance with their religious obligations. The policy and principles to be followed by students and faculty are described in the University Calendar Policy on Religious Holidays (http://www.nyu.edu/about/policies-guidelines-compliance/policies-and-guidelines/university-calendar-policy-on-religious-holidays.html).

Accessibility Accommodations

Students requesting academic accommodations are advised to reach out to the Moses Center for Student Accessibility as early as possible in the semester for assistance.

NYU’s Henry and Lucy Moses Center for Student Accessibility

212-998-4980

https://www.nyu.edu/csd 

mosescsd@nyu.edu