Aligning open language models
Nathan Lambert || Allen Institute for AI || @natolambert
Stanford CS25: Transformers United V4
A heavily abbreviated history of language models (LMs)
Aligning open language models | Lambert: 2
A heavily abbreviated history of LMs
1948: Claude Shannon models English (Shannon 1948)
1948-2017:
2017: the transformer is born (Vaswani et al. 2017)
2018: GPT-1, ELMo, and BERT released (Radford et al. 2018, Devlin et al. 2018)
2019: GPT-2 and scaling laws (and releases) (Radford et al. 2019, Kaplan et al. 2020)
2020: GPT-3 surprising capabilities, many harms
2021: Stochastic parrots
2022: ChatGPT
Can ChatGPT exist without RLHF?
RLHF seems to be necessary, but not sufficient
RLHF is relied upon elsewhere
RLHF is a key factor in many popular models, both on and off the record, including ChatGPT, Bard/Gemini, Claude, Llama 2, and more.
Anthropic’s Claude
Bai, Y. et al. “Constitutional AI: Harmlessness from AI Feedback.” 2023.
Meta’s Llama 2
“Meanwhile reinforcement learning, known for its instability, seemed a somewhat shadowy field for those in the NLP research community. However, reinforcement learning proved highly effective, particularly given its cost and time effectiveness.”
- Touvron, H. et al. “Llama 2: Open Foundation and Fine-Tuned Chat Models.” 2023
This lecture’s atlas
Follow along at hf.co/collections/natolambert
Collection
QR code
Aligning open language models: Chapters
0: kickstart
1: instruction tuning blooms
2: evals & expectations
3: RLHF works!
4: expansion
Collection
QR code
Base models
Aligned / fine-tuned / preference trained models
Some definitions for “alignment” of models
Chapter 0: The race to reproduce ChatGPT
Chapter 1: The first open instruct models
First open instruction tuned models
Alpaca
13 Mar. 2023
Key idea: Instruction fine-tuning (IFT)
<|system|>
You’re a helpful agent
<|end|>
<|user|>
{query}
<|end|>
<|assistant|>{Answer goes here}
System prompt
Special tokens
Starting point: a base language model.
Continue training the transformer on question: answer pairs.
Stack Overflow: What makes a transformer a transformer?, nbro 2021
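As a sketch, wrapping one question: answer pair in the template above might look like this (`format_chat` is a hypothetical helper for illustration; real training code uses each model's own chat template):

```python
def format_chat(query: str, answer: str,
                system: str = "You're a helpful agent") -> str:
    """Wrap one (query, answer) pair in the special tokens shown above."""
    return (
        f"<|system|>\n{system}\n<|end|>\n"
        f"<|user|>\n{query}\n<|end|>\n"
        f"<|assistant|>{answer}"
    )

print(format_chat("What is attention?", "A weighted average over tokens."))
```

The special tokens mark role boundaries so the model learns where the user turn ends and the assistant turn begins.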
Key idea: Self-instruct / synthetic data
Start: N high-quality (often human) prompts
Ask a strong LM: Create a modified version of these instructions.
Generate completions with another (or same) strong LM.
End: easily 10x more (synthetic) training data!
(synthetic data = text generated by another LLM)
Taori et al. 2023.
Self-instruct: Wang et al. 2022
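The steps above can be sketched as a loop; `generate` here is a hypothetical stand-in for a call to a strong LM, so only the control flow is real:

```python
import random

def self_instruct(seed_prompts, generate, rounds=3, per_round=4):
    """Sketch of the self-instruct loop: mutate seed instructions with a
    strong LM, generate completions, and grow the prompt pool each round."""
    pool = list(seed_prompts)
    dataset = []
    for _ in range(rounds):
        for _ in range(per_round):
            seed = random.choice(pool)
            new_prompt = generate(
                "Create a modified version of this instruction: " + seed)
            completion = generate(new_prompt)
            pool.append(new_prompt)  # new instructions seed later rounds
            dataset.append({"prompt": new_prompt, "completion": completion})
    return dataset

# With a dummy generator, 1 seed prompt grows into 12 synthetic pairs:
fake_lm = lambda text: "LM output for: " + text[:30]
print(len(self_instruct(["Write a haiku about GPUs"], fake_lm)))  # 12
```

Because new prompts go back into the pool, diversity compounds across rounds rather than all variants deriving from the original seeds.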
Key resource: ShareGPT data
Source: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered
Why weight differences?
LLaMA access form (and license): https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform
Original LLaMA weights + Δ = new chat model
Release only the Δ (the weight difference)
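A minimal sketch of the delta-weight scheme, with toy numpy tensors standing in for real checkpoints:

```python
import numpy as np

# Toy stand-ins for one weight tensor from each checkpoint.
original = {"layer.w": np.array([[1.0, 2.0], [3.0, 4.0]])}  # LLaMA weights
chat     = {"layer.w": np.array([[1.5, 1.5], [3.0, 5.0]])}  # fine-tuned chat model

# Publish only the difference, not the license-restricted base weights.
delta = {k: chat[k] - original[k] for k in chat}

# Anyone who obtained LLaMA access reconstructs the chat model exactly.
recovered = {k: original[k] + delta[k] for k in delta}
print(np.allclose(recovered["layer.w"], chat["layer.w"]))  # True
```

This is why early fine-tunes like Vicuna shipped as "delta" repositories: the derivative weights could be shared without redistributing LLaMA itself.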
First open instruction tuned models
Alpaca, 13 Mar. 2023 (MT Bench 13B: 4.53)
Vicuna (lmsys/vicuna-7b-delta-v0), 30 Mar. 2023 (MT Bench 7B: 6.69)
Koala, 3 Apr. 2023 (MT Bench 13B: 6.08)
Dolly, 12 Apr. 2023 (MT Bench 12B: 3.28)
OpenAssistant: The first open, human instruction dataset
“In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations (OASST1), a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 fully annotated conversation trees. The corpus is a product of a worldwide crowd-sourcing effort involving over 13,500 volunteers.”
April 15th 2023
StableVicuna: The first RLHF model
28 April 2023
Trained with proximal policy optimization (PPO) on popular datasets
Standard formulation. Ahead of its time!
QLoRA & Guanaco
LoRA: Low Rank Adaptation
Popular tool for fine-tuning models with lower memory consumption.
QLoRA: LoRA + quantized base model (plus paging and double quantization)
Further reduces the memory consumption of fine-tuning while (mostly) maintaining performance.
Note, this is different: Guanaco - Generative Universal Assistant for Natural-language Adaptive Context-aware Omnilingual outputs: https://guanaco-model.github.io/
Paper: https://arxiv.org/abs/2305.14314
Dataset: https://huggingface.co/datasets/timdettmers/openassistant-guanaco
Model: https://huggingface.co/timdettmers/guanaco-65b
Original thread: https://twitter.com/Tim_Dettmers/status/1661379354507476994?lang=en
Image credit: Tim Dettmers
Approximate VRAM requirements.
Source: https://github.com/hiyouga/LLaMA-Factory#hardware-requirement
QLoRA & Guanaco
Guanaco (33B MT Bench 6.88)
First models trained with QLoRA plus a quality-filtered Open Assistant dataset.
Both the dataset and QLoRA method are still regularly used.
State-of-the-art open model at release.
Image credit: Tim Dettmers
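The memory saving comes from the LoRA parameterization itself (Hu et al. 2021): the frozen weight W is augmented with a low-rank update scaled by α/r, so only the small A and B matrices are trained. A numpy sketch of the math (the base model would additionally be quantized in QLoRA):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16              # hidden size, LoRA rank, scaling factor

W = rng.normal(size=(d, d))          # frozen base weight (quantized in QLoRA)
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

W_eff = W + (alpha / r) * (B @ A)    # effective weight used in the forward pass

# Zero-init B means training starts exactly at the base model...
print(np.allclose(W_eff, W))         # True
# ...and trainable parameters drop from d*d to 2*d*r:
print(d * d, 2 * d * r)              # 4096 1024
```

With r much smaller than d, optimizer state and gradients shrink proportionally, which is where the reduced VRAM requirement comes from.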
Chapter 2: Setting expectations & evals.
Do LoRA methods work with RL?
Llama 2 chat backlash
Should chat models be “safe”?
Röttger et al. 2023
“Uncensored” models
One of the first models named this way (April 2023): cognitivecomputations/WizardLM-7B-Uncensored
Example models here: https://huggingface.co/models?other=uncensored
Transition period: Ultrachat, OpenChat, XwinLM, OpenHermes, and more fine-tunes
A series of strong models trained with instruction tuning and/or RLHF, but none markedly shifted the narrative.
Note, 17 April 2024: WizardLM is not currently officially available on Hugging Face while under artifact review at Microsoft.
Establishing evaluation
The four most popular aligned model evaluations of the past year were created within 2 months of each other!
ChatBotArena
Side by side preference collection of two different models.
Pros:
Cons:
AlpacaEval
LLM-as-a-judge mirroring preference collection phase:
AlpacaEval
Strengths:
Shortcomings (similar to MT Bench):
Aside: AlpacaEval 2
MT Bench
LLM-as-a-judge: ask an LLM (GPT-4 or Claude) to rate a model response:
Zheng, Lianmin, et al. "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena." arXiv preprint arXiv:2306.05685 (2023).
Shortcomings: hard to use as sole focus during training
Open LLM Leaderboard
Started as an engineering tool for automatically evaluating competitive models. Turned into a product with an entire team.
Establishing evaluation
How easy to use are these evaluations?
Chapter 3: Getting RLHF to work
Review: RLHF objective
Optimize “reward” inspired by human preferences.
Constrain the model to not trust the reward too much (preferences are hard to model).
π: LLM policy
πθ: base LLM
x: prompt
y: completion
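Written out with the symbols defined on this slide (r is the learned reward model, β weights the KL penalty), a standard formulation of the objective is:

```latex
\max_{\pi} \;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}
  \big[ r(x, y) \big]
\;-\;
\beta \, \mathbb{D}_{\mathrm{KL}}
  \big[ \pi(y \mid x) \,\|\, \pi_{\theta}(y \mid x) \big]
```

The first term pulls the policy toward high-reward completions; the KL term keeps it close to the base LLM so it cannot over-exploit a flawed reward model.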
Primary questions:
Review: Preference (reward) modeling
Can we just use supervised learning on scores?
Bradley-Terry model: estimate probability that a given pairwise preference is true
Score from
optimal reward model
Chosen completion
Rejected completion
Prompt
Key idea:
Probability ∝ reward
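The Bradley-Terry probability is just a sigmoid of the reward difference; a minimal sketch:

```python
import math

def bt_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry: P(chosen preferred over rejected)
    = sigmoid(r_chosen - r_rejected)."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

print(bt_probability(1.0, 1.0))            # 0.5: equal rewards, coin flip
print(round(bt_probability(3.0, 1.0), 3))  # 0.881: higher reward wins more often
```

Training the reward model is then supervised learning: maximize the log of this probability over a dataset of human-labeled (chosen, rejected) pairs.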
What if we just use gradient ascent on this equation?
The answer, with some math, is:
Direct Preference Optimization (DPO)
Released on May 29th 2023
(4+ months before models we’re discussing)
Rafailov, Sharma, Mitchell et al. 2023
DPO core facts
The first 2 points mean we’ll see more DPO models than anything else and learn its limits!
Example code.
Rafailov, Sharma, Mitchell et al. 2023
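A per-example sketch of the resulting DPO loss, -log σ(β · [policy/reference log-ratio on the chosen completion minus the same log-ratio on the rejected one]); the log-probabilities below are made-up numbers for illustration:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss (Rafailov et al. 2023):
    -log sigmoid(beta * (log-ratio on chosen - log-ratio on rejected))."""
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# At initialization (policy == reference) the loss is exactly log 2:
print(round(dpo_loss(-5.0, -9.0, -5.0, -9.0), 4))  # 0.6931
# Once the policy favors the chosen answer more than the reference, loss drops:
print(dpo_loss(-4.0, -10.0, -5.0, -9.0))
```

No reward model and no sampling are needed: both log-probabilities come from ordinary forward passes over a fixed preference dataset, which is why DPO is so much simpler to run than PPO.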
DPO vs RL (PPO, REINFORCE, …)
DPO and PPO are very different optimizers.
It is learning directly from preferences vs. using RL update rules.
It is also not really online vs offline RL, but that is more muddled.
More discussion: https://twitter.com/srush_nlp/status/1729896568956895370, https://www.interconnects.ai/p/the-dpo-debate, https://www.youtube.com/watch?v=YJMCSVLRUNs
Credit: Tom Goldstein (https://twitter.com/tomgoldsteincs)
RLHF phase: Zephyr β
UltraFeedback: https://arxiv.org/abs/2310.01377
RLHF phase: Tulu 2
RLHF phase: SteerLM & Starling
Still plenty of models showing that PPO (and RL methods) outperforms DPO!
Chapter 4: Modern ecosystem
Diversity of models and more players
Examples:
Llama 3
More about scaling than alignment. TBD if they solved the Llama 2 refusals problem.
More here: www.interconnects.ai/p/llama-3-and-scaling-open-llms
Current directions
Open vs. closed aligned models
Figure credit: Maxime Labonne. Tulu 2 70B.
Current directions
I cover these topics regularly on my blog www.interconnects.ai
Where open alignment is happening
Thank you! Questions?
Contact: nathan at natolambert dot com
Socials: @natolambert
Writing: interconnects.ai
Thanks to many teammates at HuggingFace and AI2 for supporting this journey!