CSE392: Syllabus for Special Topic in CS: Natural Language Processing

CS Topic: Natural Language Processing

Stony Brook University
CSE392-01 - Spring 2019

Tuesdays and Thursdays, 1:00 - 2:20 pm
NCS 109 or (Old) CS 2311 (see Piazza)

Instructor:         H. Andrew Schwartz

      office hours:  Tu 5:30-6:30p, We 2-3

      office:        NCS 255

      email:        has@cs.stonybrook.edu*

     

* Piazza is the primary place to ask questions. Please email only if the question is personal or you are sure no one else in the class will be interested in the answer (even then, you can send a private post on Piazza).


As humans, we process language quite effortlessly, but why do our devices seem to misunderstand us so much? Getting computers to understand natural language is one of the grand challenges of AI, and its pursuit has produced methods that power some of the key technologies of the modern digital world, such as Web search, translation, and personal assistants (e.g., Siri, Alexa). This course will introduce the algorithms and statistical techniques used for natural language processing, covering syntax (identifying structure), semantics (uncovering meaning), and applications (e.g., sentiment analysis, machine translation, and human language analysis). Students will be introduced to techniques in modern machine learning that power state-of-the-art NLP: deep learning (recurrent neural networks, transformers) as well as discriminative learning (ridge regression, support vector machines). The course has a substantial project component, giving students first-hand experience developing language processing software for useful real-world problems.

Course Materials:

Speech and Language Processing (3rd ed. draft), by Dan Jurafsky and James H. Martin (cited as SLP in the schedule).

Other research papers and tutorials will be listed in the schedule.

Coursework:

35% Exams (3)

15% Team project (1)

30% Individual assignments (3)

20% NLP in the World Presentations (2)
(subject to change with advance notice)

Grading Scale:

F: [0, 50), D: [50, 66), C-: [66, 70), C: [70, 76), C+: [76, 78), B-: [78, 80), B: [80, 86), B+: [86, 88), A-: [88, 90), A: [90, 100]

The scale above is intended to be fixed, assuming the course mean is approximately 85 to 90. However, should the coursework produce a much lower mean, a curve may be applied to raise letter grades. The purpose of publishing a scale is so that students always know where they stand in the course.
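The following Python sketch (Python being the course's default language) shows one way the coursework weights and the published scale could combine into a letter grade. It is illustrative only: the names `WEIGHTS`, `SCALE`, `weighted_total`, and `letter_grade` are hypothetical, each category score is assumed to already be an average on a 0-100 scale, the [50, 66) band is labeled D as an assumption about the intended letter, and any curve is not modeled.

```python
# Hypothetical sketch: combine category averages (0-100) with the syllabus
# weights, then map the total onto the published letter scale.
WEIGHTS = {
    "exams": 0.35,
    "team_project": 0.15,
    "assignments": 0.30,
    "presentations": 0.20,
}

# Lower bound of each letter band, highest first; each band is [low, next_low).
# The D label for [50, 66) is an assumption.
SCALE = [
    (90, "A"),
    (88, "A-"),
    (86, "B+"),
    (80, "B"),
    (78, "B-"),
    (76, "C+"),
    (70, "C"),
    (66, "C-"),
    (50, "D"),
    (0, "F"),
]


def weighted_total(scores):
    """Weighted sum of per-category averages using the syllabus weights."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total):
    """Return the first letter whose lower bound the total reaches."""
    for low, letter in SCALE:
        if total >= low:
            return letter
    return "F"  # totals below 0 (should not occur)


if __name__ == "__main__":
    example = {"exams": 82, "team_project": 95, "assignments": 88, "presentations": 90}
    total = weighted_total(example)
    print(round(total, 2), letter_grade(total))
```

With these example inputs, the total is 0.35·82 + 0.15·95 + 0.30·88 + 0.20·90 = 87.35, which falls in the B+ band [86, 88). Listing bands from highest to lowest lets a single pass return the first band the score reaches.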

Exams. Exams will take place in class. No calculators or other materials are permitted on one’s desk unless otherwise specified. Exams will last approximately 70 minutes and include questions in which students demonstrate familiarity with the material through (1) problem solving, (2) short-answer essays, (3) coding specific algorithms, and (4) true/false statements. Material covered on the exams may include anything from class or the readings. Lecture slides are intended as an aid for the material covered in class; they are not a complete replacement for good note-taking in class or for doing the readings.

Individual Assignments. The three individual assignments involve programming and let students apply concepts and algorithms learned in lecture.

Team Project. The final project for the course is a team project. Teams of 3 must pick a project related to natural language processing; there will be a sign-up. Each project must use 4 distinct concepts from the course, and must also use logistic regression at least once and RNNs at least once. The project has four deliverables: (1) a brief proposal document (as assignment 3), (2) analysis code, (3) a report, and (4) a final presentation. Sign up here.

NLP in the World Presentations. Each student must give two 5-minute presentations on natural language processing being used in the real world. Each topic should be linked to a news article, blog post, or announcement from a source on the Web. Part of each student’s grade will be based on participation in these sessions. Sign up here.

Policies

Late Assignments. Assignments will be accepted up to 48 hours late. A 10% penalty will be assessed if an assignment is less than 24 hours late, and a 25% penalty if it is between 24 and 48 hours late. Any assignment submitted more than 48 hours after the deadline will earn a 0.
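A minimal Python sketch of the late policy follows. The helper `late_score` is hypothetical, and it rests on two assumptions: the penalty is a percentage of the earned score (not of the maximum possible score), and a submission exactly 24 hours late falls in the 25% band, since the 10% band covers only "less than 24 hours."

```python
def late_score(raw_score, hours_late):
    """Apply the late policy to a raw score (hypothetical helper).

    Assumes the penalty is a percentage of the earned score.
    """
    if hours_late <= 0:
        return raw_score          # on time: no penalty
    if hours_late < 24:
        return raw_score * 0.90   # under 24 hours late: 10% penalty
    if hours_late <= 48:
        return raw_score * 0.75   # 24 to 48 hours late: 25% penalty
    return 0.0                    # more than 48 hours late: no credit


if __name__ == "__main__":
    for hours in (0, 5, 24, 47.5, 49):
        print(hours, late_score(80, hours))
```

Checking the thresholds from the most lenient to the strictest keeps each band's boundary explicit in a single comparison.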

Required Programming Language: We will use Python 3.5+ as the default language during class. Acceptable libraries will be listed for each assignment.

Academic Honesty.

Copying work: Students are welcome and encouraged to converse about assignment problems and concepts. However, sharing answers via any form of communication, or copying portions of answers from websites or other media, is strictly prohibited. You are responsible both for not looking at another student’s answers or code and for making sure your own answers and code are not accessible to other students.

Plagiarism: Plagiarism is defined as presenting someone else’s writing or work without attribution, as if it were your own. Copying any work, such as code, is plagiarism. With regard to writing, information learned from books, websites, research papers, or any other source should either be (a) written in one’s own words and include a citation, or (b) quoted and include a citation. Although option (b) is not plagiarism, excessive use of quotations will result in a lower grade, as it demonstrates less critical thinking and goes against the purpose of the assignment. Cornell University has a wonderful webpage that further defines plagiarism and includes exercises for determining whether something is plagiarism: http://plagiarism.arts.cornell.edu . Consequences: At a minimum, all students involved in copying work, plagiarism, cheating, or other scholarly misconduct will receive a 0 for the assignment and be reported to the Academic Judiciary, which may result in further consequences.
        
Academic Integrity. Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty is required to report any suspected instances of academic dishonesty to the Academic Judiciary. Faculty in the Health Sciences Center (School of Health Technology & Management, Nursing, Social Welfare, Dental Medicine) and School of Medicine are required to follow their school-specific procedures. For more comprehensive information on academic integrity, including categories of academic dishonesty please refer to the academic judiciary website at http://www.stonybrook.edu/commcms/academic_integrity/index.html

Accessibility.

If you have a physical, psychological, medical, or learning disability that may impact your coursework, please contact the Student Accessibility Support Center, ECC (Educational Communications Center) Building, Room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential.

Critical Incidents.

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of University Community Standards any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn. Faculty in the HSC Schools and the School of Medicine are required to follow their school-specific procedures. Further information about most academic matters can be found in the Undergraduate Bulletin, the Undergraduate Class Schedule, and the Faculty-Employee Handbook.


Schedule and Topics

Week

Topics

Reading Assignment

Assignments, Exams

I. Syntax

1/28

Introduction to NLP; Regular Expressions

SLP 2.1, 2.4; examples in TimeFlies

2/4

Logistic Regression; POS Tagging

SLP 8.1 - 8.3; 5.1;

2/11

Logistic Regression: Supervised Classification

Elkan2014;
SLP 5.1;

A1 Released

2/18

Language Modeling and auto-complete

SLP 3.1 - 3.4

A1 Due

2/25

Language Modeling; Exam 1

Exam 1 (Thu, 2/28)

II. Semantics

3/4

TensorFlow and Recurrent Neural Networks

TS Paper, SLP 9.1 - 9.4;

NLP in the World Presentations Start

3/11

RNNs Continued;
Semantics Avalanche: Dependency Parsing, Word Sense Disambiguation, Semantic Role Labeling/Verb Predicates.

C.1-C.5
SLP 13.1 - 13.4

SLP 18.1-18.3; 18.6

3/18

Spring Recess: No Classes

SLP 11.1, 11.3

A2 Released

3/25

Vector Models; Dimensionality Reduction

SLP 6.1-6.5;

4/1

RNN-based (neural) Language Models

SLP 7.1,7.5;

Team Signup

4/8

LMs contd.; Exam 2

Exam 2 (Thu, 4/11)

III. Applications

4/15

Human Centered NLP

HovySpruit, Lynn_etAl, Belmont

4/22

Differential Language Analysis, MT    

SLP 19.5,19.7,19.8
Kern_etAl

Team Project Proposals Due

4/29

BERT Transformers; Speech Recognition;

Devlin_etAl; Sileo

NLP in the World Presentations End

5/6

Team Project Presentations

Team Project Due

5/14 - 5/22

Final Exam during scheduled exam period:
5/21: 2:15pm - 5:00pm

Exam 3

This schedule is subject to change with advance notice.