Insights from NLP research
LMs are trained to predict missing words
Language model
The
quick
brown
fox
[MASK]
jumps
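As a toy illustration of "predict the missing word", here is a count-based stand-in for a neural LM that scores candidates for the [MASK] slot by co-occurrence with its neighbors (the corpus and candidates are invented for the sketch; a real LM would output a softmax distribution over its whole vocabulary):

```python
from collections import Counter

# Toy stand-in for a masked LM: score candidate fill-ins for the
# [MASK] slot by bigram co-occurrence with the neighbouring words.
corpus = [
    "the quick brown fox quietly jumps",
    "the quick brown fox quietly jumps",
    "the quick brown fox suddenly jumps",
    "the lazy dog quietly sleeps",
]
bigrams = Counter()
for sent in corpus:
    toks = sent.split()
    bigrams.update(zip(toks, toks[1:]))  # count adjacent word pairs

def score_candidates(left, right, candidates):
    """Score each candidate c for the pattern '... left [MASK:=c] right ...'."""
    return {c: bigrams[(left, c)] + bigrams[(c, right)] for c in candidates}

scores = score_candidates("fox", "jumps", ["quietly", "suddenly", "sleeps"])
best = max(scores, key=scores.get)
```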
Language models are everywhere
Small language models
Benchmarks:
- GLUE: https://gluebenchmark.com/
- SuperGLUE: https://super.gluebenchmark.com/
- HELM Lite: https://crfm.stanford.edu/helm/lite/latest/
- MMLU: https://paperswithcode.com/dataset/mmlu
- BIG-bench: https://paperswithcode.com/dataset/big-bench
Language models are everywhere
Example tasks:
- Sentiment Analysis
- Question Answering
- Summarization
- Coreference Resolution
Transformer
[Figure: next-word prediction — given "Harry never thought he ???", the model predicts the continuation, e.g. "would".]
BERT: Workflow
1. Self-attention with Bi-directional context
2. Masked language modeling (MLM)
[Figure: the current word w_t is replaced by [MASK] and predicted from context words on both sides: w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2}.]
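The MLM objective above can be sketched in a few lines. This is a simplified toy, assuming whitespace tokenization and omitting BERT's random-substitution and keep-unchanged cases:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """BERT-style masked-LM inputs: hide a fraction of tokens and keep
    the originals as prediction targets. (Simplified: real BERT also
    sometimes substitutes random tokens or leaves the token unchanged.)"""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok  # the model must recover this from both sides
        else:
            masked.append(tok)
    return masked, targets

tokens = "harry never thought he would see the castle again".split()
masked, targets = mask_tokens(tokens, mask_rate=0.3)
```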
GPT2: Workflow
1. Self-attention with Uni-directional context
2. Causal language modeling (CLM)
[Figure: the current word w_t is predicted from left context only: w_{t-2}, w_{t-1}.]
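The difference between the BERT and GPT-2 workflows comes down to the attention mask. A minimal sketch, with plain Python lists standing in for matrices:

```python
def bidirectional_mask(n):
    """BERT-style: every position may attend to every other position."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """GPT-style: position i may attend only to positions j <= i,
    so the model cannot peek at future tokens when predicting w_t."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

bi = bidirectional_mask(4)
causal = causal_mask(4)
```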
Fine-tuning: tune a pretrained language model on a downstream task
Downsides of full fine-tuning: every weight is updated, so each task requires training and storing a complete copy of the model.
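The storage cost can be made concrete with back-of-the-envelope arithmetic (the BERT-base parameter count below is approximate):

```python
def full_finetune_storage_gb(n_params, n_tasks, bytes_per_param=4):
    """Full fine-tuning keeps a separate copy of every weight per task."""
    return n_params * bytes_per_param * n_tasks / 1e9

# BERT-base has roughly 110M parameters (fp32 = 4 bytes each).
per_task = full_finetune_storage_gb(110e6, 1)    # ~0.44 GB per task
ten_tasks = full_finetune_storage_gb(110e6, 10)  # ~4.4 GB for 10 tasks
```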
Model Interpretability: BERTology
Hierarchy of Linguistic Info - Setting
Probing setup: feed sentences through a frozen BERT layer, then train a simple classifier on the resulting representations to predict a linguistic property, e.g. sentence length. If the probe's prediction accuracy is high, the layer's representation likely encodes that feature.
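A minimal sketch of this probing setup, with synthetic "frozen" embeddings standing in for a real BERT layer (the correlation with sentence length is baked into the toy data, and the probe is a one-variable least-squares fit):

```python
import random

rng = random.Random(42)

def frozen_embedding(sentence):
    """Stand-in for a frozen BERT layer: a 3-d vector whose first
    coordinate happens to correlate with sentence length (toy data)."""
    n = len(sentence.split())
    return [n + rng.gauss(0, 0.1), rng.gauss(0, 1), rng.gauss(0, 1)]

sentences = ["a b", "a b c", "a b c d", "a b c d e", "a", "a b c d e f"]
X = [frozen_embedding(s) for s in sentences]  # frozen: never updated
y = [len(s.split()) for s in sentences]       # probe target: sentence length

# Simple probe: least-squares line on the first embedding coordinate.
x0 = [v[0] for v in X]
mean_x, mean_y = sum(x0) / len(x0), sum(y) / len(y)
w = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x0, y))
     / sum((a - mean_x) ** 2 for a in x0))
b = mean_y - w * mean_x
preds = [round(w * a + b) for a in x0]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
```

High probe accuracy here signals only that the information is linearly readable from the representation, not that the model uses it.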
Surface → Syntactic → Semantic
BERT composes a hierarchy of linguistic signals ranging from surface to semantic features
Agenda
T5 (Text-to-Text Transfer Transformer): Workflow
T5: Workflow, Encoder
T5: Different unsupervised objectives
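T5's span-corruption objective can be illustrated with a small helper. Sentinel naming follows T5's <extra_id_N> convention; whitespace tokenization and the example spans are simplifications for the sketch:

```python
def span_corrupt(tokens, spans):
    """T5-style span corruption: each (start, end) span in the input is
    replaced by a sentinel token; the target reproduces the dropped
    spans, each introduced by its sentinel."""
    inp, tgt, prev = [], [], 0
    for i, (s, e) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[prev:s])
        inp.append(sentinel)
        tgt.append(sentinel)
        tgt.extend(tokens[s:e])
        prev = e
    inp.extend(tokens[prev:])
    tgt.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target
    return inp, tgt

tokens = "thank you for inviting me to your party last week".split()
inp, tgt = span_corrupt(tokens, [(2, 4), (8, 9)])
```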
Agenda
Zero-shot vs. One-shot vs. Few-shot prompting
GPT-3 Prompting
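The three prompting regimes differ only in how many worked examples are placed in the context; the model's weights are never updated. A sketch (the translation demos mirror the style of the GPT-3 paper; the "=>" separator is an arbitrary choice):

```python
def build_prompt(task_description, examples, query):
    """Zero-shot: no examples; one-shot: one; few-shot: several.
    The model sees everything as plain text in its context window."""
    parts = [task_description]
    for x, y in examples:
        parts.append(f"{x} => {y}")
    parts.append(f"{query} =>")  # the model continues from here
    return "\n".join(parts)

demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
zero_shot = build_prompt("Translate English to French:", [], "mint")
few_shot = build_prompt("Translate English to French:", demos, "mint")
```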
Prompt-tuning
Hard vs. Soft Prompts
Soft Prompt-tuning vs. Adapters
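A minimal sketch of the hard/soft distinction: a hard prompt is extra discrete tokens, while a soft prompt is extra trainable vectors prepended at the embedding level with the model itself frozen. Dimensions, vocabulary, and initialization here are illustrative:

```python
import random

rng = random.Random(0)
EMB_DIM = 4

def embed(tokens, table):
    """Frozen embedding lookup (stand-in for the pretrained model's
    input embeddings; never updated in prompt-tuning)."""
    return [table[t] for t in tokens]

# Frozen token-embedding table (toy random vectors).
table = {t: [rng.gauss(0, 1) for _ in range(EMB_DIM)]
         for t in ["classify", "sentiment", "great", "movie"]}

# Soft prompt: 3 free continuous vectors; these are the ONLY trainable
# parameters, unlike a hard prompt such as ["classify", "sentiment"].
soft_prompt = [[0.0] * EMB_DIM for _ in range(3)]

input_embs = embed(["great", "movie"], table)
model_input = soft_prompt + input_embs  # what the frozen LM consumes
trainable = sum(len(v) for v in soft_prompt)
```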
Low Rank Adaptation (LoRA)
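LoRA freezes the pretrained weight W and learns a low-rank update B @ A. A sketch with plain-Python matrix math; the 768-wide dimensions are illustrative of a typical attention projection:

```python
def lora_params(d, k, r):
    """Parameter counts: a full update of a d x k weight vs. a rank-r
    update W + B @ A with B (d x r) and A (r x k)."""
    return d * k, d * r + r * k

def lora_forward(x, W, A, B, alpha=1.0):
    """y = x @ (W + alpha * B @ A); W stays frozen, only A and B train."""
    def matmul(M, N):
        return [[sum(M[i][t] * N[t][j] for t in range(len(N)))
                 for j in range(len(N[0]))] for i in range(len(M))]
    BA = matmul(B, A)
    Wp = [[W[i][j] + alpha * BA[i][j] for j in range(len(W[0]))]
          for i in range(len(W))]
    return matmul(x, Wp)

full, low_rank = lora_params(768, 768, 8)  # one attention matrix, rank 8
y = lora_forward([[1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]],
                 [[0.0, 1.0]], [[1.0], [0.0]])
```

At inference, B @ A can be merged into W, so LoRA adds no extra latency.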
Limits of prompting on harder tasks?
Sample model answers:
- "Buy gold and silver, and invest in cryptocurrencies."
- "The best investment is to buy a house."
- "I have no comment."
Agenda
Instruction-tuning
Instruction Models:
[Figure legend: NLU tasks in blue; NLG tasks in teal.]
Multiple Instruction Templates for Each NLP Task
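The idea of multiple templates per task can be sketched as template rendering: the same underlying example is verbalized several ways so the model learns to follow instructions rather than one fixed format. The NLI template wording below is illustrative, not copied from any specific dataset:

```python
# Several natural-language templates for one underlying task (NLI-style).
TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\nDoes the premise entail the hypothesis?",
    "Read the text: {premise}\nCan we conclude that \"{hypothesis}\"?",
    "{premise}\nBased on the passage above, is it true that \"{hypothesis}\"?",
]

def render_all(premise, hypothesis):
    """Render one (premise, hypothesis) example under every template."""
    return [t.format(premise=premise, hypothesis=hypothesis) for t in TEMPLATES]

prompts = render_all("A dog runs in the park.", "An animal is outside.")
```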
Weixin Liang et al. "Can large language models provide useful feedback on research papers? A large-scale empirical analysis."
Questions:
Main Contributions/Findings:
PandaLM: Judge language model