1 of 50

CSCI-SHU 376: Natural Language Processing

Hua Shen

2026-01-20

Spring 2026

Lecture 1: Introduction

2 of 50

Welcome to the NLP course 👏!

Hello from Your Instructor!

Hua Shen

Assistant Professor of Computer Science

huashen@nyu.edu | huashen218

Research: Human-AI Alignment

Website: https://hua-shen.org/

3 of 50

Outline

  1. What is Natural Language Processing (NLP)?
  2. Course Overview and Logistics
  3. Discussion: Getting to Know You 🧡

4 of 50

Outline

  • What is Natural Language Processing (NLP)?
  • Course Overview and Logistics
  • Discussion: Getting to Know You 🧡

5 of 50

What is Natural Language Processing?

6 of 50

What is NLP?

  • Natural Language Processing:
    • Building programs that automatically analyse, understand, and generate human language in text
    • An important branch of Artificial Intelligence
  • NLP is an interdisciplinary field
    • Applications in healthcare, law, finance, etc.

7 of 50

What is NLP?

8 of 50

What is NLP?

  • Playing the board game Diplomacy with human players!

9 of 50

What is NLP?

10 of 50

NLP Landscape & History

  • 1950s–1980s: Rule-based NLP
    • Hand-crafted rules, grammars, lexicons
    • Tasks: parsing, MT, QA; dialogue (ELIZA)
    • Brittle, non-scalable
  • 1990s–2000s: Statistical NLP
    • Data-driven, probabilistic models
    • N-grams, HMMs, PCFGs, MaxEnt
    • Tasks: POS tagging, NER, SMT
    • Enabled by large corpora (Penn Treebank)
  • 2000s–2013: Feature-Based ML NLP
    • SVMs, CRFs, heavy feature engineering
    • Sequence labeling, parsing, IE
    • Limited transfer, costly features

11 of 50

NLP Landscape & History

  • 2013–2017: Neural NLP
    • Word embeddings (Word2Vec, GloVe)
    • RNNs, LSTMs, attention
    • Seq2Seq models
  • 2018–2020: Transformers & Pretraining
    • Self-attention, pretrain–finetune
    • BERT, GPT, T5
    • Benchmark leaps (GLUE)
  • 2021–Now: Foundation Models / LLMs
    • Scale, in-context learning, RLHF
    • Zero-/few-shot generalization
    • New challenges: bias, hallucination, cost, safety

12 of 50

NLP Tasks & Directions

  • Task Landscape
    • Syntax → Semantics → Discourse
    • Tagging, parsing, NER, QA, MT, summarization, dialogue
  • Resources
    • Data (Wikipedia, Common Crawl)
    • Compute (GPUs/TPUs)
    • Tooling (NLTK, spaCy, HuggingFace)
  • Open Directions
    • Multimodal NLP
    • Low-resource languages
    • Explainability & Controllability
    • Human–AI collaboration
    • Safety and Alignment

EMNLP 2025 Reference: https://2025.emnlp.org/calls/main_conference_papers/

13 of 50

Keeping Up with State-of-the-Art (SOTA) NLP

Three top *CL conferences

  • ACL
  • EMNLP
  • NAACL

14 of 50

NLP History 1: Statistical and Feature-Based NLP

https://medium.com/@antoine.louis/a-brief-history-of-natural-language-processing-part-1-ffbcb937ebce

Rule-based NLP

15 of 50

Rule-Based NLP

  • Rule-based systems require careful hand-programming
  • Work only in limited domains
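
To make "careful programming" and "limited domains" concrete, here is a minimal sketch of an ELIZA-style responder. The rules, the `respond` function, and the fallback reply are hypothetical illustrations, not ELIZA's actual script: each rule is a hand-written regular expression paired with a response template, and any input no rule anticipates falls through to a generic reply.

```python
import re

# Hypothetical hand-written rules in the spirit of ELIZA: (pattern, response template).
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]
FALLBACK = "Please tell me more."


def respond(utterance: str) -> str:
    """Return the reply of the first rule whose pattern matches the utterance."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Copy the captured words from the user's input into the template.
            return template.format(*match.groups())
    return FALLBACK  # no rule covers this input


print(respond("I need a vacation."))  # Why do you need a vacation?
print(respond("I'm tired"))           # Please tell me more.
```

The second call shows the brittleness: "I'm tired" means the same as "I am tired", but no rule was written for the contraction, so the system falls back. Covering such variants means writing ever more rules by hand.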

16 of 50

Statistical NLP

17 of 50

Statistical NLP

  • Uses machine learning: probabilistic models estimated from data
  • Example: Statistical Machine Translation (SMT)
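
A minimal sketch of the statistical idea, using a bigram language model (the topic of the first quiz): probabilities are estimated from counts in a corpus instead of being written as rules. The function names and the three-sentence corpus are illustrative assumptions; the sketch uses whitespace tokenization and no smoothing, so any unseen bigram gets probability 0.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Estimate P(w_i | w_{i-1}) by maximum likelihood: count(prev, w) / count(prev)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])  # every token that precedes another one
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


def sentence_prob(prob, sentence):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p *= prob(prev, word)
    return p


CORPUS = ["I like NLP", "I like cats", "you like NLP"]  # toy corpus
prob = train_bigram_lm(CORPUS)
print(prob("like", "nlp"))                # 2/3: "like" is followed by "nlp" in 2 of 3 cases
print(sentence_prob(prob, "I like NLP"))  # (2/3) * 1 * (2/3) * 1 = 4/9
```

A real system would add smoothing so that a single unseen word pair does not zero out an entire sentence.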

18 of 50

NLP History 2: NLP with Deep Learning

https://medium.com/@antoine.louis/a-brief-history-of-natural-language-processing-part-1-ffbcb937ebce

19 of 50

NLP with Deep Learning

  • Significant progress across NLP tasks
  • Enabled by abundant compute resources and large corpora
  • Little feature engineering: representations are learned from data
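
With little feature engineering, a word is represented by a learned dense vector, and similarity between words becomes geometry. The sketch below measures cosine similarity between word vectors. The three 3-dimensional vectors are made-up toy values chosen for illustration, not trained embeddings; real Word2Vec or GloVe vectors are learned from large corpora and have hundreds of dimensions.

```python
import math

# Made-up toy vectors for illustration only (NOT real trained embeddings).
TOY_EMBEDDINGS = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); 1 means same direction, 0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


king, queen, apple = (TOY_EMBEDDINGS[w] for w in ("king", "queen", "apple"))
print(round(cosine_similarity(king, queen), 2))  # high: vectors point the same way
print(round(cosine_similarity(king, apple), 2))  # lower: vectors point elsewhere
```

No one wrote a feature saying "king and queen are related"; in a trained model that relationship emerges from the vectors themselves.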

20 of 50

NLP with Deep Learning: Neural Machine Translation

21 of 50

NLP with LLMs

  • Pre-trained on large corpora
  • Adapted to tasks via fine-tuning or prompting
  • One model for many tasks
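
"One model for many tasks" via prompting can be sketched as follows: the task is specified entirely in the input text, through an instruction plus a few labeled demonstrations (few-shot, in-context learning). The `build_few_shot_prompt` helper and its "Input:/Output:" layout are hypothetical choices, not a format any particular model requires; the resulting string would be sent to an LLM, which is not shown here.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, labeled demonstrations, then the new input."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model is expected to continue from here
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("The lectures were great", "positive"), ("The homework was confusing", "negative")],
    "I learned a lot",
)
print(prompt)
```

Switching to another task, say translation, only means changing the instruction and the demonstrations; the model itself stays the same.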

22 of 50

State-of-the-art LLMs

23 of 50

Language Models Hallucinate

24 of 50

Hallucination is an open problem

  • Bing Chat: retrieval augmentation (grounding answers in retrieved web search results)
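
A minimal sketch of retrieval augmentation, under loud simplifying assumptions: the "retriever" just ranks documents by word overlap with the question (a toy stand-in for a real search engine or dense retriever), and the documents are short sentences restating this course's logistics. The helper names `retrieve` and `build_grounded_prompt` are hypothetical. The idea is to put retrieved evidence into the prompt so the model can answer from it rather than from memory alone.

```python
import re

# Toy document collection (sentences restating course logistics).
DOCS = [
    "Lectures meet on Tuesday and Thursday in N401.",
    "Office hours are on Friday, 2:30-4:00pm.",
    "The midterm exam is worth 25% of the grade.",
]


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=1):
    """Rank documents by the number of words shared with the query (toy retriever)."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query, documents, k=1):
    """Prepend retrieved context and ask the model to answer only from it."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below. If the answer is not there, say you don't know.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


print(build_grounded_prompt("When are office hours?", DOCS))
```

Retrieval narrows what the model has to "know", but it does not eliminate hallucination: the model can still misread or ignore the context, which is one reason the problem remains open.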

25 of 50

NLP Subfields

  • Machine Translation
  • Question Answering
  • Information Extraction
  • Text Summarization
  • Syntactic Parsing
  • Semantic Parsing
  • …

26 of 50

Why Is NLP (Still) Hard?

  • Ambiguity: lexical, syntactic, semantic
  • Pragmatics: the listener has to infer the intended meaning
  • …

27 of 50

Lexical Ambiguity
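
Lexical ambiguity means one word form carries several senses, e.g., "bank" (financial institution vs. riverside). A classic baseline for choosing the sense is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most content words with the surrounding context. In this sketch the two glosses and the stopword list are hand-written, hypothetical stand-ins, not entries from WordNet or any real lexicon.

```python
import re

# Hand-written toy sense inventory for "bank" (hypothetical glosses).
SENSES = {
    "bank/finance": "an institution where people deposit money and take out loans",
    "bank/river": "the sloping land along the edge of a river or stream",
}
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "where", "along",
             "on", "at", "i", "my", "we", "in"}


def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses=SENSES):
    """Return the sense whose gloss overlaps most with the context's content words."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))


print(simplified_lesk("I went to the bank to deposit money"))  # bank/finance
print(simplified_lesk("We sat on the bank of the river"))      # bank/river
```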

28 of 50

Syntactic Ambiguity
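
Syntactic ambiguity means one sentence admits several parse trees. As an illustration, with our own choice of sentence and grammar, consider "I saw the man with the telescope": the prepositional phrase can attach to the verb (I used the telescope) or to the noun (the man has the telescope). The sketch counts parse trees with the CKY algorithm over a small hypothetical grammar in Chomsky Normal Form, where `chart[i][j][A]` holds the number of distinct trees rooted in category A that span `words[i:j]`.

```python
from collections import defaultdict

# Toy grammar in Chomsky Normal Form (hypothetical, for illustration only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": {"NP"}, "saw": {"V"}, "the": {"Det"},
    "man": {"N"}, "telescope": {"N"}, "with": {"P"},
}


def count_parses(sentence, start="S"):
    """CKY chart where chart[i][j][A] = number of trees for words[i:j] rooted in A."""
    words = sentence.lower().split()
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, ()):
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):          # build longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    count = chart[i][k][left] * chart[k][j][right]
                    if count:
                        chart[i][j][parent] += count
    return chart[0][n][start]


print(count_parses("I saw the man with the telescope"))  # 2
print(count_parses("I saw the man"))                      # 1
```

The two trees come from the two PP-attachment rules; a parser has to pick one, which requires knowledge beyond the grammar itself.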

29 of 50

Semantic Ambiguity

30 of 50

Pragmatics

  • Language conveys (often implicit) information about a user’s preferences

31 of 50

Outline

  • What is Natural Language Processing (NLP)?
  • Course Overview and Logistics
  • Discussion: Getting to Know You 🧡

32 of 50

Logistics

  • Lectures: Tuesday/Thursday 11:15am – 12:30pm, N401
  • Office Hours: Friday 2:30-4:00pm (emailing me beforehand is preferred)
  • Course Site: Brightspace
    • Contains all information about the course (dates, content, readings, etc.)
    • Announcements will be posted on Brightspace as well
    • Slides will be available before each class

33 of 50

Course Topics

34 of 50

Course Goal

  • Understand the basics of NLP (standard frameworks, algorithms, etc.)
  • Know recent advances (e.g., LLMs) and NLP applications
  • Gain hands-on experience through assignments and a final project

35 of 50

Course Structure

Grading breakdown

      • Attendance and Quizzes: 15%
      • Homework Assignments: 15%
      • Mid-term Exam: 25%
      • [Project] Proposal: 5%
      • [Project] Mid-term Presentation: 10%
      • [Project] Final Report and Presentation: 30%
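
The weights above can be checked and applied with a few lines of arithmetic. This sketch assumes each component is scored on a 0-100 scale and treats "Attendance and Quizzes" as one combined score; it does not model the midterm curve, and the example scores are hypothetical.

```python
WEIGHTS = {
    "Attendance and Quizzes": 0.15,
    "Homework Assignments": 0.15,
    "Mid-term Exam": 0.25,
    "[Project] Proposal": 0.05,
    "[Project] Mid-term Presentation": 0.10,
    "[Project] Final Report and Presentation": 0.30,
}

# The weights cover the whole grade, and the project parts add up to 45%.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
project_share = sum(w for name, w in WEIGHTS.items() if name.startswith("[Project]"))
assert abs(project_share - 0.45) < 1e-9


def course_grade(scores):
    """Weighted average of component scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example_scores = {  # hypothetical student
    "Attendance and Quizzes": 90,
    "Homework Assignments": 80,
    "Mid-term Exam": 70,
    "[Project] Proposal": 100,
    "[Project] Mid-term Presentation": 90,
    "[Project] Final Report and Presentation": 85,
}
print(course_grade(example_scores))  # 13.5 + 12 + 17.5 + 5 + 9 + 25.5 = 82.5
```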

36 of 50

Course Structure

Quizzes (15%)

    • There will be 7 quizzes; each covers the most recent topic discussed in class (e.g., the first quiz will be about N-gram LMs).
    • Each quiz consists of one free-answer question (10-15 minutes) on the key concepts of the corresponding topic.
    • Quizzes are closed-book.
    • Your lowest quiz score will be dropped.
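
Dropping the lowest score can be sketched as below. The slide only says the lowest score is removed; averaging the remaining quizzes with equal weight, and the 0-10 scale of the example scores, are assumptions.

```python
def quiz_average(scores):
    """Average of quiz scores after dropping the single lowest one (equal weights assumed)."""
    if len(scores) < 2:
        raise ValueError("need at least two quiz scores to drop the lowest")
    kept = sorted(scores)[1:]  # discard the minimum
    return sum(kept) / len(kept)


# Hypothetical scores for the 7 quizzes, each out of 10.
print(quiz_average([8, 10, 6, 9, 10, 7, 10]))  # drops the 6: 54 / 6 = 9.0
```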

37 of 50

Course Structure

Assignments (15%)

1: LMs, text classification, word embeddings (5%)

2: sequence modelling (5%)

3: Transformer (5%)

    • Each assignment has programming (in Python) and written components; you usually have 2 weeks to complete it
    • You will have 48 free late hours for all assignments

38 of 50

Course Structure

Midterm Exam (25%)

    • Question types: true/false, multiple-choice, and free-answer
    • You may bring one double-sided A4 sheet
    • Grades will be curved

39 of 50

Course Structure

Final Project (45%)

Work on a specific problem (e.g., agents) with provided datasets and baselines, aiming for a research paper.

    • A team project, usually with at most 3 people per team
    • Meet with Hua in the second week to discuss your project ideas
    • [Project] Proposal (5%): your ideas and a related paper
    • [Project] Mid-term Presentation (10%): relevant papers, and your progress / challenges
    • [Project] Final Presentation (10%): your final project
    • [Project] Final Report (20%): a conference-style report

40 of 50

Final Project Team Registration

41 of 50

2026: Human + AI in Class

Human-AI collaboration is encouraged! Leverage AI as you need in this course.

42 of 50

2026: AI + Coding

  • Code editors + LLMs

43 of 50

2026: AI + Coding

  • An operating system written by Cursor only!
  • Do we still need software engineers?

44 of 50

2026: AI + Research

Van Noorden, R., & Perkel, J. M. (2023). AI and science: what 1,600 researchers think. Nature, 621(7980), 672-675.

45 of 50

Course Books

Textbooks

  • Speech and Language Processing (Jurafsky & Martin, 3rd edition draft): the most popular textbook in NLP
  • The book is frequently updated

Link: https://web.stanford.edu/~jurafsky/slp3/

46 of 50

Prerequisite

  • We assume you have learned (or at least know a bit about) the following:
    • Python
    • Calculus, probability, and statistics
    • Supervised and unsupervised learning
    • Basics of Neural Networks

47 of 50

Computing Resources: Generative AI Tools and Services at NYU Shanghai

  • Commercial (Collect data? Yes, personal use only)
    • OpenAI.com (account needed)
    • Microsoft Bing (with built-in ChatGPT-5 functionality)
    • Public AI tools (ChatGPT, Claude, Bard): direct platform access; strictly prohibited for any NYU-related information
  • Institutional Licence @NYU IT (Collect data? No, NYU-wide license)
    • Google Gemini & NotebookLM: available through NYU IT with your NetID
  • Private, by request (Collect data? No)
    • Private Generative AI Pilot (OpenAI ChatGPT): submit a project proposal via NYU IT; approval takes 3-5 days

48 of 50

Outline

  • What is Natural Language Processing (NLP)?
  • Course Overview and Logistics
  • Discussion: Getting to Know You 🧡

49 of 50

Love to know more about you!

What are your experiences with, and expectations for, this course?

50 of 50

Student Introductions

  • Your name, major, and year
  • Why did you choose this course? What do you know about NLP?
  • Are there specific topics you find interesting?