Natural Language Understanding and Computational Semantics
DS-GA 1012 - LING-GA 1012
Assistant Professor of Linguistics, Data Science, and Computer Science
Spring 2022
Since at least the proposal of the Turing test, building computational systems that can communicate with humans using natural language has been a central goal for what we now think of as AI research. Understanding real, naturally occurring human language is the key to reaching this goal. This course surveys recent successes in language understanding, but it is focused primarily on preparing students to do original research in this area, culminating with a substantial final project that should meet the standards of published work in this field.
The course is centered on text rather than speech, but within that, it will touch on the full range of applicable techniques for language understanding, including formal logics, statistical methods, distributional methods, and deep learning, and will bring in ideas from formal linguistics where they can be readily used in practice. We’ll discuss concrete tasks like question answering as well as higher-level issues like how to effectively represent language meaning.
Lecture: Available on New NYU Classes, released every Monday afternoon
Live Discussion: Wednesdays 10–11:40a, Cantor Film Center (36 East 8th Street), Rm. 101
Live Lab Session: Thursdays 11:15a–12:05p, 19 University Place, Room 102
The main lecture content will take the form of a prerecorded video each week, in the flipped classroom style. The normal class time will be used for semi-structured discussions of course content and readings. The lab session, led by the TAs, will introduce concrete skills and tools that you'll use in your assignments and project.
All lab sessions and live discussion sessions will be recorded, but to participate actively, you’ll have to attend in person. Participating in live discussion within these sessions is an easy way to earn the required participation credit, but you can also earn this credit through online discussion, and you are not required to attend any specific live sessions.
Recommended: NYU DS-GA 1011, or recent coursework elsewhere covering natural language processing with deep learning.
Required: Prior experience with Python programming (including experience writing programs of more than 100 lines from scratch using common libraries) and basic familiarity with linear algebra.
Note on alternative courses: This course covers largely the same material as the advanced undergraduate course LING-UA 52. Other than that, it should overlap only very slightly with other NYU courses on computational linguistics and natural language processing.
This course is offered under two different course codes, but both of them correspond to the same actual course, with the same meeting times and assignments. The DS-GA course code is restricted to Data Science students until mid-January, when Data Science opens up enrollment for all of its courses. The LING-GA course code is open to everyone, but has fewer seats.
Faculty do not have the power to enroll students or add students to waiting lists, and we do not have the ability to create Albert permission codes. Please email departments directly for registration issues, rather than me. Students who meet the prerequisites but are blocked from enrolling on Albert should email one or both departments directly to enroll or join a waiting list. If you are working on a PhD in a closely related area and you’re having trouble enrolling, ask your advisor to contact Data Science directly for help—they may be able to guarantee you a seat if you ask early. This course is open to students in any NYU school or program.
Auditors from any school or department are welcome, subject to the prerequisites above. Auditors may not submit assignments, and may only participate in project teams with permission of the instructor. If you’d like to audit, make a private post on Campuswire during the first week of class introducing yourself as an auditor and giving your email address. The TAs can then add you to Brightspace.
Forum (Campuswire)
New NYU Classes (brightspace.nyu.edu)
Note: Office hour times may change some weeks. Watch the forum for announcements.
Sam Bowman (Instructor)
Richard Yuanzhe Pang (TA)
Ying Wang (TA)
Mark Yilun Kuang (Grader)
For a preview of some topics, see 2021’s slides.
Wk. | Lecture Topic | Due | Lab Topic (w/ instructor) | Reference |
1/24 | Introduction | Campuswire sign-up | Basics of Supervised Learning (RP) | E1 |
1/31 | Text Classification Crash Course (Live: Discussion) | Homework Opt-Out | Hands-On Supervised Learning (YW) | J&M3–5; G2 |
2/7 | Deep Learning Crash Course | HW1: Classification (YW) | Deep Learning (RP) | J&M6–7,9 |
2/14 | Pretraining & Transfer Learning (Live: Discussion) | HW2: Neural Networks (RP) | Transfer Learning (YW) | Ruder, J&M11 |
2/21 | Working with Pretrained Models (w/ Jason Phang) | Reading Quiz: BERT | NYU HPC Cluster (RP) | |
2/28 | Aligning and Using Large Language Models (Live: Quiz Recap and Discussion) | Mini-Proposals | Prompting (YW) | See Slides |
3/7 | Designing Experiments | HW3: Transformers (MK) | Research Mini-Talks (RP) | See Slides |
3/14 | Spring Break | | | |
3/21 | Syntax & Parsing | Reading Quiz: TruthfulQA; Teams | Working with Parsers (YW) | J&M12–14; B5–7 |
3/28 | Formal Semantics | Full Proposals | Formal Semantics (YW) | J&M15; C&C |
4/4 | Writing & Publishing | | Intermediate LaTeX (RP) | See Slides |
4/11 | Model Analysis | Reading Quiz: | Model Analysis (YW) | |
4/18 | Applications & Ethics | Partial Draft | Making Figures (RP) | |
4/25 | Data Collection and Crowdsourcing (w/ Alicia Parrish) | Reading Quiz: | Mechanical Turk (YW) | |
5/2 | Dataset Analysis | | Extra Office Hours (RP) | See Slides |
5/9 | Project Presentations (Friday 8–9:50a) | Final Paper (Friday noon) | | |
Paper readings: You’ll have to read many NLP research papers for your project, but we’ll read only four together as a class. These readings are examples of good recent research, and we’ll read them with special attention to how they’re designed and written. For each paper we read together, you’ll need to be ready to take a brief quiz on it. After the quiz, you’ll have a chance to discuss the paper, both as a technical idea and as a piece of writing.
Reference readings: This course doesn’t follow any one book, but the readings noted in the ‘Reference’ column above should cover most of the technical content discussed in class, and offer greater depth and additional references on many points. These readings are often long and dense, and you are not required to read them before lecture. However, you will likely have to refer to them at least occasionally when completing the assignments and the project. Abbreviations above correspond to:
We’re using these books because we think they’re useful, and I encourage you to read them in full.
The final paper should answer a concrete empirical question involving machine learning for language understanding and should include a literature review that shows that this question hasn’t already been answered decisively by others. A paper could evaluate which of two kinds of model works better on some dataset, evaluate an existing type of model on a new real-world language understanding problem, or offer a quantitative analysis of some existing model using new or existing data.
The paper should describe the task, models, experiments, and conclusions, and should be no more than four pages long, plus up to two optional pages of appendices and unlimited additional pages for references. The main body of the paper should be complete on its own: use appendices only to provide additional clarifying details. The paper must be prepared in LaTeX and must follow the ACL style closely. It must also include a link to a public GitHub repository containing any code used in your experiments.
In addition to the paper itself, we will grade four milestones:
Here are some examples of papers from past years that earned ‘A’ grades.
The partial draft and the final paper must state (in one or two sentences) the contributions of each team member. This doesn’t count toward the page limit.
Projects should be completed in groups of three or four, which should be formed by the time the full proposal is submitted. We strongly encourage you to discuss the readings with your peers before the reading quizzes open, but while the quizzes are open, you may not discuss the readings or quiz questions with anyone. You may discuss the homework assignments with anyone at any time, but you must ultimately complete and write up the assigned work yourself.
You may submit a project to both this class and another class—we encourage it!—but if you do this, your project must be ambitious and thorough enough to justify the amount of credit you’ll be getting for it. If your course project overlaps with a project that will be submitted to another class, you must inform both instructors by the partial draft deadline, and get confirmation from the instructors of both classes that the combined project is substantial enough.
Please upload completed assignments to NYU Brightspace by 9:30 AM on Wednesdays.
We will take 20 percentage points from a grade for each day (full or partial) that the work is late. Exceptions are available only in case of documented emergencies.
Some lab sessions may include opportunities to earn extra credit, of no more than one point each toward the total grade. This will not be offered as part of the first lab session, or any lab session that is advertised as an optional review session. You will have to attend live to be eligible. We never offer extra credit of any kind on request.
Use public forum posts or come to office hours if you have questions about the course. Only use email or private forum messages for issues that are highly sensitive.
If you use any text or figures at all from an outside source in your submitted work, you must do two things: (i) Use quotation marks, block quotes, or an explicit note to make it clear exactly what part of your paper comes from this source. (ii) Clearly cite this source. Yes, this includes descriptions or depictions of any baseline models and data you use.
You may reuse code written by others for the final project, as long as you make this clear in your paper. However, all code you submit for individual assignments must be your own.
Violations of this policy will result in a zero grade for the submitted work and a referral to the university for further investigation.
Written submissions will be checked with Turnitin by default to automatically flag possible plagiarism cases for our review. You may opt out by contacting us.
I expect you to be civil, to be respectful of your classmates, and to consider the impacts of your decisions on others, both in your research work and in your interactions with your classmates. I'll try to do the same.
Work you submit should be your own. Please consult the GSAS academic integrity policy for more information: http://gsas.nyu.edu/content/nyu-as/gsas/about-gsas/policies-and-procedures/gsas-statement-on-academic-integrity.html. Penalties for violations of academic integrity may include failure of the course, suspension from the University, or even expulsion.
As a nonsectarian, inclusive institution, NYU policy permits members of any religious group to absent themselves from classes without penalty when required for compliance with their religious obligations. The policy and principles to be followed by students and faculty may be found here: The University Calendar Policy on Religious Holidays (http://www.nyu.edu/about/policies-guidelines-compliance/policies-and-guidelines/university-calendar-policy-on-religious-holidays.html)
Academic accommodations are available to any student with a chronic, psychological, visual, mobility, or learning disability, or who is deaf or hard of hearing. Students should register with the Moses Center for Students with Disabilities at 212-998-4980.
NYU's Henry and Lucy Moses Center for Students with Disabilities
726 Broadway, 2nd Floor
New York, NY 10003-6675
Telephone: 212-998-4980
Voice/TTY Fax: 212-995-4114
Web site: http://www.nyu.edu/csd