CSCI-SHU 376: Natural Language Processing
Hua Shen
2026-03-05
Spring 2026
Lecture 10: LLM Pretraining and Instruction Finetuning
Recap: Pre-training and Fine-Tuning
Recap: ELMo
Recap: BERT
Today’s Plan
Pretraining Data
Pretraining Data Factors
Pretraining Data Quantities
Pretraining Data: Common Crawl
Pretraining Data: Extraction
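To make the extraction step concrete, below is a minimal sketch of pulling visible text out of raw HTML, assuming pages have already been fetched from a Common Crawl dump. Production pipelines rely on dedicated extractors (e.g., trafilatura or resiliparse); this toy version with Python's standard-library html.parser only illustrates the idea.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>/<style>/<noscript> content."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

print(extract_text("<html><script>x=1</script><p>Hello <b>world</b></p></html>"))
# -> Hello
#    world
```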
Pretraining Data: Filtering
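A hedged sketch of heuristic quality filtering in the spirit of the Gopher/C4 rules (document length, mean word length, alphabetic ratio, stopword presence). The thresholds and stopword list below are illustrative placeholders, not the published values.

```python
STOPWORDS = {"the", "be", "to", "of", "and", "that", "have", "with"}

def passes_quality_filters(doc: str) -> bool:
    """Gopher-style heuristic filters; all thresholds are illustrative."""
    words = doc.split()
    if not (50 <= len(words) <= 100_000):       # too short / too long
        return False
    mean_len = sum(len(w) for w in words) / len(words)
    if not (3 <= mean_len <= 10):               # gibberish or code-like text
        return False
    alpha = sum(any(c.isalpha() for c in w) for w in words)
    if alpha / len(words) < 0.8:                # mostly symbols / numbers
        return False
    if sum(w.lower() in STOPWORDS for w in words) < 2:  # no natural-language signal
        return False
    return True
```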
Pretraining Data: Deduplication
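Near-duplicate detection is commonly done with MinHash over word shingles: documents whose signatures agree in many slots have high estimated Jaccard similarity, and all but one copy can be dropped. A self-contained sketch (the shingle size and number of hash functions are illustrative choices):

```python
import hashlib
import random

def shingles(text: str, n: int = 5):
    """Set of n-word shingles; short texts yield a single shingle."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def minhash_signature(text: str, num_hashes: int = 64, seed: int = 0):
    """One min-hash per salted hash function; similar docs share many slots."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return tuple(
        min(int.from_bytes(hashlib.md5(f"{salt}:{s}".encode()).digest()[:8], "big")
            for s in shingles(text))
        for salt in salts
    )

def est_jaccard(sig_a, sig_b) -> float:
    """Fraction of matching slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```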
Pretraining Data Mixture
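Pretraining corpora are typically mixtures of sources sampled with tuned weights. The weights below are hypothetical, purely to show the mechanics; real recipes (e.g., The Pile, LLaMA) tune them empirically.

```python
import random

# Hypothetical mixture weights, for illustration only.
MIXTURE = {"web": 0.67, "code": 0.15, "books": 0.10, "wiki": 0.08}

def sample_source(rng: random.Random) -> str:
    """Draw a data source with probability proportional to its weight."""
    sources, weights = zip(*MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(42)
print([sample_source(rng) for _ in range(8)])
# Each training example is drawn from a source with probability its weight.
```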
Pretraining and Compute
Scaling laws
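As a worked example, the Chinchilla-style parametric loss L(N, D) = E + A/N^α + B/D^β can be minimized under a fixed compute budget C ≈ 6ND. The constants are the fitted values reported by Hoffmann et al. (2022); the grid search below is an illustrative sketch, not how the paper computes its optimum.

```python
# Fitted constants from Hoffmann et al. (2022):
# L(N, D) = E + A / N**alpha + B / D**beta, with compute C ~ 6 * N * D.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N: float, D: float) -> float:
    """Predicted pretraining loss for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C: float, num_points: int = 600):
    """Grid-search the N minimizing loss subject to C = 6 * N * D."""
    candidates = (10 ** (7 + 6 * i / num_points) for i in range(num_points))
    _, N = min((loss(N, C / (6 * N)), N) for N in candidates)
    return N, C / (6 * N)

N_opt, D_opt = compute_optimal(C=5.76e23)   # roughly Chinchilla's budget
print(f"N* ~ {N_opt:.2e} params, D* ~ {D_opt:.2e} tokens")
```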
Today’s Plan
Language modeling != assisting users
Fine-tuning to the rescue!
Recap: Pre-training and Fine-Tuning
Scaling up finetuning
Instruction finetuning
Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., ... & Wei, J. (2024). Scaling instruction-finetuned language models. Journal of Machine Learning Research, 25(70), 1-53.
Instruction finetuning
Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., & Le, Q. V. (2021). Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652.
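Mechanically, instruction finetuning turns (instruction, response) pairs into ordinary next-token training examples, usually masking the prompt tokens out of the loss. A minimal sketch, assuming a hypothetical tokenizer interface and a made-up prompt template:

```python
IGNORE_INDEX = -100  # common convention: positions with this label skip the loss

def build_example(instruction: str, response: str, tokenizer):
    """Tokenize an instruction/response pair, masking prompt tokens from the loss.

    `tokenizer` is assumed to map text -> list[int] (hypothetical interface);
    the "### Instruction/Response" template is illustrative, not canonical.
    """
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_ids = tokenizer(prompt)
    response_ids = tokenizer(response)
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids  # loss on response only
    return input_ids, labels

# Toy tokenizer: one "token" per whitespace-separated piece, hashed to an id.
def toy_tok(text):
    return [hash(w) % 50_000 for w in text.split()]

ids, labels = build_example("Summarize: cats are great.", "Cats are great.", toy_tok)
```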
Instruction finetuning (Flan-T5)
MMLU: a new benchmark for multitask LMs
MMLU: Examples
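MMLU questions are four-way multiple choice; one common evaluation recipe formats the question plus lettered options and picks the answer letter the model scores highest. A sketch, where `score_logprob` is a hypothetical stand-in for whatever LM scoring API is available:

```python
CHOICES = "ABCD"

def format_mmlu(question: str, options: list[str]) -> str:
    """Render a question in the standard lettered multiple-choice layout."""
    lines = [question]
    lines += [f"{c}. {o}" for c, o in zip(CHOICES, options)]
    lines.append("Answer:")
    return "\n".join(lines)

def predict(question: str, options: list[str], score_logprob) -> str:
    """Pick the choice letter the model assigns the highest log-probability.

    `score_logprob(prompt, continuation) -> float` is a hypothetical interface.
    """
    prompt = format_mmlu(question, options)
    return max(CHOICES[:len(options)],
               key=lambda c: score_logprob(prompt, f" {c}"))
```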
MMLU: Rapid Progress
Instruction finetuning results
Chat Tuning
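Chat tuning additionally wraps multi-turn conversations in a role-tagged template before training. The special tokens below are made up for illustration; real models (ChatML, Llama chat, etc.) each define their own.

```python
def render_chat(messages: list[dict]) -> str:
    """Serialize a conversation into one training string.

    Role markers here are invented for illustration only.
    """
    out = []
    for m in messages:
        out.append(f"<|{m['role']}|>\n{m['content']}\n<|end|>")
    out.append("<|assistant|>\n")  # generation starts here at inference time
    return "\n".join(out)

print(render_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is instruction finetuning?"},
]))
```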
Knowledge Distillation
Supervised Distillation
Token-level Knowledge Distillation
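A minimal PyTorch sketch of the token-level KD objective: the student matches the teacher's softened per-token distribution via KL divergence (Hinton et al., 2015). Shapes and the temperature value are illustrative.

```python
import torch
import torch.nn.functional as F

def token_kd_loss(student_logits, teacher_logits, T: float = 2.0):
    """Token-level KD: match the teacher's per-token distribution.

    Logits are (batch, seq_len, vocab); T softens both distributions, and
    the T**2 factor keeps gradient scale roughly constant across temperatures.
    """
    vocab = student_logits.size(-1)
    s_logp = F.log_softmax(student_logits / T, dim=-1).reshape(-1, vocab)
    t_prob = F.softmax(teacher_logits / T, dim=-1).reshape(-1, vocab)
    return F.kl_div(s_logp, t_prob, reduction="batchmean") * T**2

student_logits = torch.randn(2, 8, 100, requires_grad=True)
teacher_logits = torch.randn(2, 8, 100)
token_kd_loss(student_logits, teacher_logits).backward()
```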
Sequence-level Knowledge Distillation
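By contrast, sequence-level KD (Kim & Rush, 2016) trains the student with plain cross-entropy on whole sequences the teacher generates, rather than matching per-token distributions. A sketch of the data-building step, with `teacher_generate` as a hypothetical stand-in for decoding from the teacher:

```python
def sequence_kd_dataset(teacher_generate, prompts: list[str]):
    """Build sequence-level KD data: pair each prompt with the teacher's output.

    `teacher_generate(prompt) -> str` is a hypothetical stand-in for beam
    search or sampling from the teacher model.
    """
    return [(p, teacher_generate(p)) for p in prompts]

# The resulting (prompt, teacher_output) pairs are then used exactly like
# ordinary supervised finetuning data for the student.
```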