
Natural Language Generation with Deep Learning

 Course codes: CS260

Instructor: Yue Dong

Assistant Professor

UCR

yuedongcs@gmail.com

Term: Winter 2023

When: MWF 10:00-10:50 am

Where: Skye Hall 170

Office hours: MRB 4135, MF 11:00 am to 12:00 pm

TA: Uday Singh Saini usain001@ucr.edu

Course Description:

Natural language generation (NLG), a branch of natural language processing (NLP) aimed at producing human-like text, underlies many successful applications of artificial intelligence, such as machine translation, paraphrasing, and question answering systems. This impressive performance is largely attributable to the development of deep learning, where we have witnessed a paradigm shift from supervised learning (training a model from scratch on labeled data), to transfer learning (self-supervised pre-training followed by task-specific fine-tuning), to recent prompt-based learning, which reformulates the task so that the data fits the pre-trained model without any fine-tuning. In this course, we will survey the history of natural language generation in the era of deep learning, as well as recent advances in building, analyzing, and using deep learning models for downstream NLG applications.
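To make this paradigm shift concrete, here is a minimal, illustrative sketch using the Hugging Face transformers library (the t5-small checkpoint and the translation prompt are assumptions chosen for illustration, not course material). Under transfer learning, the pre-trained weights below would be fine-tuned on labeled task data before generation; under prompt-based learning, the task is stated directly in the input and the unchanged model generates an answer.

# Illustrative sketch: load a pre-trained encoder-decoder model (T5 is used purely as an example).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Prompt-based use: the task is expressed in the input text and the model
# generates without any gradient updates (fine-tuning would instead update
# these weights on labeled task data).
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))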

This is a seminar-style course in which the class as a whole will work together to run the course. In the first few lectures, I will provide an overview of NLG and highlight the challenges the field is facing. For the remaining classes, students are required to present one paper and one code demo in teams of 1-2.

Expected Outcomes:

By the end of the course, you should be able to meaningfully contribute to cutting-edge research in natural language processing.

Useful Resources:

PyTorch Tutorial: https://www.youtube.com/playlist?list=PLqnslRFeH2UrcDBWF5mfPGpqQDSta6VK4

Hugging Face crash course: https://huggingface.co/course/chapter0/1?fw=pt

Papers with Code: https://paperswithcode.com/
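For those new to these libraries, the quick-start sketch below, in the spirit of the Hugging Face course linked above, shows single-call text generation with the pipeline API (the gpt2 checkpoint and the prompt are illustrative assumptions, not course requirements):

# Minimal text-generation example with the transformers pipeline API.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Natural language generation is", max_new_tokens=20)
print(result[0]["generated_text"])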

Grading:

Prerequisites:

Students must have experience with machine learning, deep learning, and the basics of modern natural language processing. Additionally, this course requires proficiency in Python and preferably PyTorch.  This seminar course is intended to provide students with the foundations they need to conduct NLP-related research with deep learning in the future. Consider taking CS 173 (Introduction to Natural Language Processing) instead for a general and systematic introduction to NLP.

To benefit from this class, you should be able to read a recent conference paper on machine learning or natural language processing and have a decent understanding of the basic ideas and concepts proposed (not necessarily a complete understanding of every detail).  

Schedule (subject to change):

Presentation Schedule, Slides, and Code (CS260 Winter 2023; requires UCR R'Mail)

Date

Topic/Paper

Additional Readings

Mon, 1/9

Intro. Slides

Class introduction, background, logistics

Wed, 1/11

Slides

Sequence Modeling Fundamentals (Yue)

Fri, 1/13

Slides

Sequence Modeling with Deep Learning (Yue)

Mon, 1/16

Holiday

Wed, 1/18

Supervised learning with RNNs

Sequence to sequence learning with neural networks (2014) (1)

Paper slides

Implementation

Fri, 1/20 (add/drop deadline)

Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (2)

Mon, 1/23

Attention

Neural Machine Translation by Jointly Learning to Align and Translate (3)

Implementation of Bahdanau attention

Wed, 1/25

Effective Approaches to Attention-based Neural Machine Translation (4)

Luong attention

Fri, 1/27

Self-attention models

Attention Is All You Need (5)

Mon, 1/30

Transformer-based encoders

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (6)

Self-Attention with Relative Position Representations

The Annotated Transformer

Wed, 2/1

RoBERTa: A Robustly Optimized BERT Pretraining Approach (7)

Fri, 2/3

A Primer in BERTology: What We Know About How BERT Works (8)

BERT101

Family of BERT comparison

Mon, 2/6

Transformer-based encoder-decoder models

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (9)

Wed, 2/8

T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (10)

Fri, 2/10

Transformer-based decoders

GPT: Improving Language Understanding by Generative Pre-Training (11)

GPT-2: Language Models are Unsupervised Multitask Learners (12)

T5 implementation

Mon, 2/13

GPT-3: Language Models are Few-Shot Learners (13)

Wed, 2/15

Prompt-based learning

T0: Multitask Prompted Training Enables Zero-Shot Task Generalization (14)

Fri, 2/17

FLAN: Finetuned Language Models Are Zero-Shot Learners (15)

Mon, 2/20

Holiday (no class)

Mon, 2/20, due 11:59pm PST

Project proposal due


Video submission folder

Slide submission folder 

(restricted to the class)

A 5-minute video with the introduction, datasets/tasks, literature review, and your novel methods (refer to Q4 for more details)

Wed, 2/22

Project proposals I

Literature review + Proposal

Fri, 2/24

Project proposals II

Literature review + Proposal

Mon, 2/27

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models (16)

Wed, 3/1

Special Guest Lecture

Annika Speer

How to effectively deliver public speeches as a scientist? 

Fri, 3/3

DPR: Dense Passage Retrieval for Open-Domain Question Answering (17)

Latent Retrieval for Weakly Supervised Open Domain Question Answering (18)

Mon, 3/6

Retrieval-based models

REALM: Retrieval-Augmented Language Model Pre-Training (19)

RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (20)

ATLAS (Meta)

RETRO (DeepMind)

Wed, 3/8

Multimodal models

🦩 Flamingo: a Visual Language Model for Few-Shot Learning (21)

Fri, 3/10

Efficient learning/inference

Efficiently Scaling Transformer Inference (22)

Mon, 3/13

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (23)

Wed, 3/15

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (24)

Fri, 3/17

Limitations and beyond

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models (25)

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 (26)

Fri, 3/24 11:59pm PST

Final project due

Submission folder (restricted to the class)

UCR final period: Sat. Mar. 18; Mon.-Fri. Mar. 20-24, 2023

Mon, 3/4

Grades are available on R’Web

We will submit the grades on Mar. 28th

Textbooks:

Books on deep learning for NLP:

  1. Goldberg 2017: Neural Network Methods for Natural Language Processing 
  2. Jurafsky and Martin 2019: Speech and Language Processing
  3. Eisenstein 2019: Introduction to Natural Language Processing (draft copy here)

FAQs:

Q1: What is the project format?

A: The objective of this project is to carry out original and independent research on an NLP-related topic. Please use the LaTeX template to produce a final report that is at most eight pages long, plus references. Your report should include an abstract and an introduction that clarify the problem you are trying to solve and the contributions you have made. The paper should also include sections on related work, methodology, experimental setup and results, and conclusions.

Q2: Is there a particular format to write the summary?

A: The paper summary should follow the ACL rolling review format and contain the following three sections: summary of the paper, summary of the strengths, and summary of the weaknesses. More details on how to write each section can be found here: https://aclrollingreview.org/reviewform

Q3: How long is the paper and code presentation?

A: We expect 20 minutes for each presentation, followed by a 5-minute Q&A session.

Q4: What is the format for the project proposal?

A:

  1. Submit a 5-minute video presentation (you can use Zoom for recording) to the Video submission folder and your slides to the Slide submission folder by Mon, 2/20, 11:59pm PST.
  2. Browse through your peers' videos before class on Wed, 2/22.
  3. Ask and answer questions about the projects on Wed, 2/22 and Fri, 2/24.

In the proposal slides, please lay out the following sections:

  1. Introduction and motivation
  2. Task/dataset (example input & output, dataset statistics)
  3. Literature review of existing work:
     a. One slide on the base models we covered in class (e.g., T5 or BART for generation tasks, or BERT for classification tasks)
     b. At least one recent state-of-the-art model on the task (e.g., from papers in 2022 or 2023)
  4. Your novel idea: explain how it differs from existing work

Q5: How do I come up with project ideas?

A: I suggest following these steps:

1. Find a dataset and task at:

2. Look for the latest SOTA work and its code at NLP or ML conferences (e.g.,):

3. Reproduce the SOTA paper and be critical; think about its limitations and potential improvements. You are welcome to discuss your ideas with me.

Q6: How to access the cluster for experiments?

A: Please follow the instructions here [CS 260 cluster link] [bolt access slides] [bolt access docs]. If you have additional questions, please ask Uday for help.

Q7: What are the grading rubrics for the project proposal?

  1. (20%) Introduction and motivation
  2. (20%) Task/dataset (example input & output, dataset statistics)
  3. (40%) Literature review of existing work:
     a. One slide on the base models we covered in class (e.g., T5 or BART for generation tasks, or BERT for classification tasks)
     b. At least one recent state-of-the-art model on the task (e.g., from papers in 2022 or 2023)
  4. (20%) Your novel idea: explain how it differs from existing work.

Notes on item 4: As we discussed in class, there are many ideas and techniques you could propose. It is not necessary to come up with a novel idea that is guaranteed to work; most of our ideas will not end up working better than existing ones. The key is to be critical when analyzing current work (step 3 in Q5) and to apply the knowledge you have gained in class to test new ideas. For example, in class we discussed the next sentence prediction (NSP) objective in BERT and questioned why the authors chose it as a training objective. Later work, RoBERTa, also questioned this design choice and found that removing NSP matches or slightly improves downstream performance. This is a good example of being critical when reading a paper and not taking other authors' design choices for granted; a small illustrative sketch of the NSP objective follows below.
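As a concrete reference for the NSP discussion above, here is a minimal, illustrative sketch of how BERT's next sentence prediction head scores a sentence pair using the Hugging Face transformers library (the bert-base-uncased checkpoint and the example sentences are assumptions chosen for illustration):

# Scores whether sentence B plausibly follows sentence A with BERT's NSP head.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

encoding = tokenizer("The class meets on Monday.",
                     "It covers natural language generation.",
                     return_tensors="pt")
with torch.no_grad():
    logits = model(**encoding).logits
# Index 0 scores "B follows A"; index 1 scores "B is a random sentence".
print(torch.softmax(logits, dim=-1))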

Academic Integrity:

The University of California, Riverside values academic integrity. All students must learn the meaning and consequences of cheating, plagiarism, and other academic offenses under the university's Academic Integrity Policies and Procedures: http://conduct.ucr.edu/policies/academic-integrity-policies-and-procedures/.

Inclusivity:

As the instructor of this course, I strive to provide an inclusive learning environment. If you experience barriers to learning in this course, please contact me or the UCR Student Disability Resource Center.