Syllabus: Natural Language Understanding and Computational Semantics (S'22, DS-GA 1012, NLU, Bowman)

Natural Language Understanding and Computational Semantics

DS-GA 1012 - LING-GA 1012

Sam Bowman

Assistant Professor of Linguistics, Data Science, and Computer Science

Spring 2022

Since at least the proposal of the Turing test, building computational systems that can communicate with humans using natural language has been a central goal for what we now think of as AI research. Understanding real, naturally occurring human language is the key to reaching this goal. This course surveys recent successes in language understanding, but it is focused primarily on preparing students to do original research in this area, culminating in a substantial final project that should meet the standards of published work in this field.

The course is centered on text rather than speech, but within that, it will touch on the full range of applicable techniques for language understanding, including formal logics, statistical methods, distributional methods, and deep learning, and will bring in ideas from formal linguistics where they can be readily used in practice. We’ll discuss concrete tasks like question answering as well as higher-level issues like how to effectively represent language meaning.

Basic Logistics

Format

Lecture: Available on New NYU Classes, released every Monday afternoon

Live Discussion: Wednesdays 10–11:40a, Cantor Film Center (36 East 8th Street), Rm. 101

Live Lab Session: Thursdays 11:15a–12:05p, 19 University Place, Room 102

The main lecture content will take the form of a prerecorded video each week, in the flipped classroom style. The normal class time will be used for semi-structured discussions of course content and readings. The lab session, led by the TAs, will introduce concrete skills and tools that you'll use in your assignments and project.

All lab sessions and live discussion sessions will be recorded, but to participate actively, you’ll have to attend in person. Participating in live discussion within these sessions is an easy way to earn the required participation credit, but you can also earn this credit through online discussion, and you are not required to attend any specific live sessions.

Prerequisites

Recommended: NYU DS-GA 1011, or recent coursework elsewhere covering natural language processing with deep learning.

Required: Prior experience with Python programming (including experience writing >100-line programs with common libraries from scratch), basic familiarity with linear algebra.

Note on alternative courses: This course covers largely the same material as the advanced undergraduate course LING-UA 52. Other than that, it should overlap only very slightly with other NYU courses on computational linguistics and natural language processing.

Registration

This course is offered under two different course codes, but both of them correspond to the same actual course, with the same meeting times and assignments. The DS-GA course code is restricted to Data Science students until mid-January, when Data Science opens up enrollment for all of its courses. The LING-GA course code is open to everyone, but has fewer seats.

Faculty do not have the power to enroll students or add students to waiting lists, and we do not have the ability to create Albert permission codes. For registration issues, please email the departments directly rather than me. Students who meet the prerequisites but are blocked from enrolling on Albert should email one or both departments directly to enroll or join a waiting list. If you are working on a PhD in a closely related area and you’re having trouble enrolling, ask your advisor to contact Data Science directly for help; they may be able to guarantee you a seat if you ask early. This course is open to students in any NYU school or program.

Auditing

Auditors from any school or department are welcome, subject to the prerequisites above. Auditors may not submit assignments, and may only participate in project teams with permission of the instructor. If you’d like to audit, make a private post on Campuswire during the first week of class introducing yourself as an auditor and giving your email address. The TAs can then add you to Brightspace.

Course Site

Forum (Campuswire)

New NYU Classes (brightspace.nyu.edu)

Instructional Staff and Office Hours

Note: Office hour times may change some weeks. Watch the forum for announcements.

Sam Bowman (Instructor)

Richard Yuanzhe Pang (TA)

Ying Wang (TA)

Mark Yilun Kuang (Grader)

Requirements

Schedule

For a preview of some topics, see 2021’s slides.

Wk. | Lecture Topic | Due | Lab Topic (w/ instructor) | Reference
1/24 | Introduction; Live: Discussion | Campuswire sign-up | Basics of Supervised Learning (RP) | E1
1/31 | Text Classification Crash Course; Live: Discussion | Homework Opt-Out | Hands-On Supervised Learning (YW) | J&M3–5; G2
2/7 | Deep Learning Crash Course; Live: Discussion | HW1: Classification (YW) | Deep Learning (RP) | J&M6–7,9
2/14 | Pretraining & Transfer Learning; Live: Discussion | HW2: Neural Networks (RP) | Transfer Learning (YW) | Ruder, J&M11
2/21 | Working with Pretrained Models (w/ Jason Phang); Live: Discussion | Reading Quiz: BERT | NYU HPC Cluster (RP) | Brown, Bommasani
2/28 | Aligning and Using Large Language Models; Live: Quiz Recap and Discussion | Mini-Proposals | Prompting (YW) | See Slides
3/7 | Designing Experiments; Live: Discussion | HW3: Transformers (MK) | Research Mini-Talks (RP) | See Slides
3/14 | Spring Break | | |
3/21 | Syntax & Parsing; Live: Discussion | Reading Quiz: TruthfulQA, Teams | Working with Parsers (YW) | J&M12–14; B5–7
3/28 | Formal Semantics; Live: Discussion | Full Proposals | Formal Semantics (YW) | J&M15; C&C
4/4 | Writing & Publishing; Live: Discussion | | Intermediate LaTeX (RP) | See Slides
4/11 | Model Analysis; Live: Discussion | Reading Quiz: Checklist | Model Analysis (YW) | Rogers
4/18 | Applications & Ethics; Live: Discussion | Partial Draft | Making Figures (RP) | Resources Here
4/25 | Data Collection and Crowdsourcing (w/ Alicia Parrish); Live: Discussion | Reading Quiz: Disagreements | Mechanical Turk (YW) | Resources Here
5/2 | Dataset Analysis; Live: Discussion | | Extra Office Hours (RP) | See Slides
5/9 | Project Presentations (Friday 8–9:50a) | Final Paper (Friday noon) | |

Readings

Paper readings: You’ll have to read lots of NLP research papers for your project, but we’ll read only four together as a class. The readings are examples of good recent research, and we’ll be reading them with special attention to how they’re designed and written. For each paper we read together, you’ll have to be ready to take a brief quiz on the paper. After the quiz, you’ll have a chance to discuss the paper, both as a technical idea and as a piece of writing.

Reference readings: This course doesn’t follow any one book, but the readings noted in the ‘Reference’ column above should cover most of the technical content discussed in class, and offer greater depth and additional references on many points. These readings are often long and dense, and you are not required to read them before lecture. However, you will likely have to refer to them at least occasionally when completing the assignments and the project. Abbreviations above correspond to:

  1. J&M: Dan Jurafsky and Jim Martin’s Speech and Language Processing (3rd ed., available free online, not yet out in print)
  2. G: Yoav Goldberg’s Neural Network Methods for Natural Language Processing (1st ed., available free online through the NYU network)
  3. B: Emily Bender’s Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax (1st ed., available free online through the NYU network)
  4. E: Jacob Eisenstein’s Natural Language Processing (draft, available free online)
  5. C&C: Liz Coppock and Lucas Champollion’s Invitation to Formal Semantics Boot Camp (draft, available free online)

We’re using these books because we think they’re useful, and I encourage you to read them in full.

Paper

The final paper should answer a concrete empirical question involving machine learning for language understanding and should include a literature review that shows that this question hasn’t already been answered decisively by others. A paper could evaluate which of two kinds of model works best on some dataset, evaluate an existing type of model on a new real-world language understanding problem, or offer a quantitative analysis of some existing model using new or existing data.

The paper should include a description of the task, models, experiments, and conclusions, and should be no more than 4 pages long, with two optional pages for appendices and unlimited additional pages for references. The main body of the paper should be complete: use appendices only to provide additional clarifying details. The paper should be prepared in LaTeX and must follow ACL style closely. It must also include a link to a public GitHub repository containing any code used in your experiments.

In addition to the paper itself, we will grade four milestones:

Here are some examples of papers from past years that earned ‘A’ grades.

Contribution Statements

The partial draft and the final paper must state (in one or two sentences) the contributions of each team member. This doesn’t count toward the page limit.

Course Policies

Collaboration

Projects should be completed in groups of three or four, which should be formed by the time the full proposal is submitted. We strongly encourage you to discuss the readings with your peers before the reading quizzes open, but while the quizzes are open, you may not discuss the readings or quiz questions with anyone. You may discuss the homework assignments with anyone at any time, but you must ultimately complete and write up the assigned work yourself.

Projects Submitted to Multiple Classes

You may submit a project to both this class and another class—we encourage it!—but if you do this, your project must be ambitious and thorough enough to justify the amount of credit you’ll be getting for it. If your course project overlaps with a project that will be submitted to another class, you must inform both instructors by the partial draft deadline, and get confirmation from the instructors of both classes that the combined project is substantial enough.

Due Dates

Please upload completed assignments to NYU Brightspace by 9:30 AM on Wednesdays.

Late Work

We will take 20 percentage points from a grade for each day (full or partial) that the work is late. Exceptions are available only in case of documented emergencies.
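As a rough illustration of how this penalty adds up, here is a minimal sketch in Python. The function name `late_penalty_points` is hypothetical, and capping the deduction at 100 points is an assumption that the policy above does not state. Treat it as a way to check your own arithmetic, not as the official grading procedure.

```python
import math
from datetime import datetime


def late_penalty_points(due: datetime, submitted: datetime, per_day: int = 20) -> int:
    """Return the percentage points deducted for a late submission.

    Any started day counts as a full day, matching "full or partial".
    Assumption (not in the policy text): the deduction is capped at 100.
    """
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0  # on time or early: no penalty
    days_late = math.ceil(seconds_late / 86400)  # round partial days up
    return min(100, per_day * days_late)


# Example: an assignment due Wednesday 2/9 at 9:30 AM, submitted one minute late.
due = datetime(2022, 2, 9, 9, 30)
print(late_penalty_points(due, datetime(2022, 2, 9, 9, 31)))
```

Under these assumptions, a submission that is even one minute late loses 20 points, and one that is just over 24 hours late loses 40.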

Extra Credit

Some lab sessions may include opportunities to earn extra credit, of no more than one point each toward the total grade. This will not be offered as part of the first lab session, or any lab session that is advertised as an optional review session. You will have to attend live to be eligible. We never offer extra credit of any kind on request.

Instructor Contact

Use public forum posts or come to office hours if you have questions about the course. Only use email or private forum messages for issues that are highly sensitive.

Plagiarism

If you use any text or figures at all from an outside source in your submitted work, you must do two things: (i) Use quotation marks, block quotes, or an explicit note to make it clear exactly what part of your paper comes from this source. (ii) Clearly cite this source. Yes, this includes descriptions or depictions of any baseline models and data you use.

You may reuse code by others for the final paper, as long as you make this clear in your paper. However, all the code you submit for individual assignments must be your own.

Violations of this policy will result in a zero grade for the submitted work and a referral to the university for further investigation.

Written submissions will use TurnItIn by default to automatically flag possible plagiarism cases for our review. You may opt out if you wish by contacting us.

Participating Productively

I expect you to be civil, to be respectful of your classmates, and to consider the impacts of your decisions on others, both in your research work and in your interactions with your classmates. I'll try to do the same.

Applicable University Policies

Academic Integrity

Work you submit should be your own. Please consult the GSAS academic integrity policy for more information: http://gsas.nyu.edu/content/nyu-as/gsas/about-gsas/policies-and-procedures/gsas-statement-on-academic-integrity.html. Penalties for violations of academic integrity may include failure of the course, suspension from the University, or even expulsion.

Religious Observance

As a nonsectarian, inclusive institution, NYU permits members of any religious group to absent themselves from classes without penalty when required for compliance with their religious obligations. The policy and principles to be followed by students and faculty may be found here: The University Calendar Policy on Religious Holidays (http://www.nyu.edu/about/policies-guidelines-compliance/policies-and-guidelines/university-calendar-policy-on-religious-holidays.html)

Disability Disclosure Statement

Academic accommodations are available to any student with a chronic, psychological, visual, mobility, or learning disability, or who is deaf or hard of hearing. Students should register with the Moses Center for Students with Disabilities at 212-998-4980.

NYU's Henry and Lucy Moses Center for Students with Disabilities

726 Broadway, 2nd Floor

New York, NY 10003-6675

Telephone: 212-998-4980

Voice/TTY Fax: 212-995-4114

Web site: http://www.nyu.edu/csd