Aligning open language models
Nathan Lambert || Allen Institute for AI || @natolambert
Stanford CS25: Transformers United V4
A heavily abbreviated history of language models (LMs)
Aligning open language models | Lambert: 2
A heavily abbreviated history of LMs
1948: Claude Shannon models English (Shannon 1948)
1948-2017:
2017: the transformer is born (Vaswani et al. 2017)
2018: GPT-1, ELMo, and BERT released (Radford et al. 2018, Devlin et al. 2018)
2019: GPT-2 and scaling laws (and releases) (Radford et al. 2019, Kaplan et al. 2020)
2020: GPT-3 surprising capabilities, many harms
2021: Stochastic parrots
2022: ChatGPT
Can ChatGPT exist without RLHF?
RLHF seems to be necessary, but not sufficient
RLHF is relied upon elsewhere
RLHF is a key factor in many popular models, both on and off the record, including ChatGPT, Bard/Gemini, Claude, Llama 2, and more.
Anthropic’s Claude
Bai, Y. et al. “Constitutional AI: Harmlessness from AI Feedback.” 2023.
Meta’s Llama 2
“Meanwhile reinforcement learning, known for its instability, seemed a somewhat shadowy field for those in the NLP research community. However, reinforcement learning proved highly effective, particularly given its cost and time effectiveness.”
- Touvron, H. et al. “Llama 2: Open Foundation and Fine-Tuned Chat Models.” 2023
This lecture’s atlas
Follow along at hf.co/collections/natolambert
Collection
QR code
Aligning open language models: Chapters
0: kickstart
1: instruction tuning blooms
2: evals & expectations
3: RLHF works!
4: expansion
Collection
QR code
Base models
Aligned / fine-tuned / preference trained models
Some definitions for “alignment” of models
Chapter 0: The race to reproduce ChatGPT
Chapter 1: The first open instruct models
First open instruction tuned models
Alpaca
13 Mar. 2023
Key idea: Instruction fine-tuning (IFT)
<|system|>
You’re a helpful agent
<|end|>
<|user|>
{query}
<|end|>
<|assistant|>{Answer goes here}
System prompt
Special tokens
Starting point: a base language model.
Continue training the transformer on question: answer pairs.
Stack Overflow: What makes a transformer a transformer?, nbro 2021
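As a sketch, wrapping one question: answer pair in the template above might look like this (`format_chat` is a hypothetical helper for illustration; real training code uses each model's own chat template):

```python
def format_chat(query: str, answer: str,
                system: str = "You're a helpful agent") -> str:
    """Wrap one (query, answer) pair in the special tokens shown above."""
    return (
        f"<|system|>\n{system}\n<|end|>\n"
        f"<|user|>\n{query}\n<|end|>\n"
        f"<|assistant|>{answer}"
    )

print(format_chat("What is attention?", "A weighted average over tokens."))
```

The special tokens mark role boundaries so the model learns where the user turn ends and the assistant turn begins.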
Key idea: Self-instruct / synthetic data
Start: N high-quality (often human) prompts
Ask a strong LM: Create a modified version of these instructions.
Generate completions with another (or same) strong LM.
End: easily 10x more (synthetic) training data!
(synthetic data = text generated by another LLM)
Taori et al. 2023.
Self-instruct: Wang et al. 2022
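The steps above can be sketched as a loop; `generate` here is a hypothetical stand-in for a call to a strong LM, so only the control flow is real:

```python
import random

def self_instruct(seed_prompts, generate, rounds=3, per_round=4):
    """Sketch of the self-instruct loop: mutate seed instructions with a
    strong LM, generate completions, and grow the prompt pool each round."""
    pool = list(seed_prompts)
    dataset = []
    for _ in range(rounds):
        for _ in range(per_round):
            seed = random.choice(pool)
            new_prompt = generate(
                "Create a modified version of this instruction: " + seed)
            completion = generate(new_prompt)
            pool.append(new_prompt)  # new instructions seed later rounds
            dataset.append({"prompt": new_prompt, "completion": completion})
    return dataset

# With a dummy generator, 1 seed prompt grows into 12 synthetic pairs:
fake_lm = lambda text: "LM output for: " + text[:30]
print(len(self_instruct(["Write a haiku about GPUs"], fake_lm)))  # 12
```

Because new prompts go back into the pool, diversity compounds across rounds rather than all variants deriving from the original seeds.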
Key resource: ShareGPT data
Source: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered
Why weight differences?
LLaMA access form (and license): https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform
Original LLaMA weights + Δ = new chat model
Release only the Δ (the weight difference)
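A minimal sketch of the delta-weight scheme, with toy numpy tensors standing in for real checkpoints:

```python
import numpy as np

# Toy stand-ins for one weight tensor from each checkpoint.
original = {"layer.w": np.array([[1.0, 2.0], [3.0, 4.0]])}  # LLaMA weights
chat     = {"layer.w": np.array([[1.5, 1.5], [3.0, 5.0]])}  # fine-tuned chat model

# Publish only the difference, not the license-restricted base weights.
delta = {k: chat[k] - original[k] for k in chat}

# Anyone who obtained LLaMA access reconstructs the chat model exactly.
recovered = {k: original[k] + delta[k] for k in delta}
print(np.allclose(recovered["layer.w"], chat["layer.w"]))  # True
```

This is why early fine-tunes like Vicuna shipped as "delta" repositories: the derivative weights could be shared without redistributing LLaMA itself.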
First open instruction tuned models
Alpaca, 13 Mar. 2023 (MT Bench 13B: 4.53)
Vicuna (lmsys/vicuna-7b-delta-v0), 30 Mar. 2023 (MT Bench 7B: 6.69)
Koala, 3 Apr. 2023 (MT Bench 13B: 6.08)
Dolly, 12 Apr. 2023 (MT Bench 12B: 3.28)
OpenAssistant: The first open, human instruction dataset
“In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations (OASST1), a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 fully annotated conversation trees. The corpus is a product of a worldwide crowd-sourcing effort involving over 13,500 volunteers.”
April 15th 2023
StableVicuna: The first RLHF model
28 April 2023
Trained with proximal policy optimization (PPO) on popular datasets
Standard formulation. Ahead of its time!
QLoRA & Guanaco
LoRA: Low Rank Adaptation
Popular tool for fine-tuning models with lower memory consumption.
QLoRA: LoRA + quantized base model (plus paging and double quantization)
Further reduces the memory consumption of fine-tuning while (mostly) maintaining performance.
Note, this is different: Guanaco - Generative Universal Assistant for Natural-language Adaptive Context-aware Omnilingual outputs: https://guanaco-model.github.io/
Paper: https://arxiv.org/abs/2305.14314
Dataset: https://huggingface.co/datasets/timdettmers/openassistant-guanaco
Model: https://huggingface.co/timdettmers/guanaco-65b
Original thread: https://twitter.com/Tim_Dettmers/status/1661379354507476994?lang=en
Image credit: Tim Dettmers
Approximate VRAM requirements.
Source: https://github.com/hiyouga/LLaMA-Factory#hardware-requirement
QLoRA & Guanaco
Guanaco (33B MT Bench 6.88)
First models trained with QLoRA plus a quality-filtered Open Assistant dataset.
Both the dataset and QLoRA method are still regularly used.
State-of-the-art open model at release.
Image credit: Tim Dettmers
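The memory saving comes from the LoRA parameterization itself (Hu et al. 2021): the frozen weight W is augmented with a low-rank update scaled by α/r, so only the small A and B matrices are trained. A numpy sketch of the math (the base model would additionally be quantized in QLoRA):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16              # hidden size, LoRA rank, scaling factor

W = rng.normal(size=(d, d))          # frozen base weight (quantized in QLoRA)
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

W_eff = W + (alpha / r) * (B @ A)    # effective weight used in the forward pass

# Zero-init B means training starts exactly at the base model...
print(np.allclose(W_eff, W))         # True
# ...and trainable parameters drop from d*d to 2*d*r:
print(d * d, 2 * d * r)              # 4096 1024
```

With r much smaller than d, optimizer state and gradients shrink proportionally, which is where the reduced VRAM requirement comes from.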
Chapter 2: Setting expectations & evals.
Do LoRA methods work with RL?
Llama 2 chat backlash
Should chat models be “safe”?
Röttger et al. 2023
“Uncensored” models
One of the first models named this way (April 2023): cognitivecomputations/WizardLM-7B-Uncensored
Example models here: https://huggingface.co/models?other=uncensored
Transition period: Ultrachat, OpenChat, XwinLM, OpenHermes, and more fine-tunes
A series of strong models trained with instruction tuning and/or RLHF, but none markedly shifted the narrative.
Note, 17 April 2024: WizardLM is not currently officially available on Hugging Face while under artifact review at Microsoft.
Establishing evaluation
The four most popular aligned model evaluations of the past year were created within 2 months of each other!
ChatBotArena
Side by side preference collection of two different models.
Pros:
Cons:
AlpacaEval
LLM-as-a-judge mirroring preference collection phase:
AlpacaEval
Strengths:
Shortcomings (similar to MT Bench):
Aside: AlpacaEval 2
MT Bench
LLM-as-a-judge: ask an LLM (GPT-4 or Claude) to rate a model response:
Zheng, Lianmin, et al. "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena." arXiv preprint arXiv:2306.05685 (2023).
Shortcomings: hard to use as sole focus during training
Open LLM Leaderboard
Started as an engineering tool for automatically evaluating competitive models. Turned into a product with an entire team.
Establishing evaluation
How easy to use are these evaluations?
Chapter 3: Getting RLHF to work
Review: RLHF objective
Optimize “reward” inspired by human preferences.
Constrain the model to not trust the reward too much (preferences are hard to model).
π: LLM policy
πθ: base LLM
x: prompt
y: completion
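Written out with the symbols defined on this slide (r is the learned reward model, β weights the KL penalty), a standard formulation of the objective is:

```latex
\max_{\pi} \;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}
  \big[ r(x, y) \big]
\;-\;
\beta \, \mathbb{D}_{\mathrm{KL}}
  \big[ \pi(y \mid x) \,\|\, \pi_{\theta}(y \mid x) \big]
```

The first term pulls the policy toward high-reward completions; the KL term keeps it close to the base LLM so it cannot over-exploit a flawed reward model.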
Primary questions:
Review: Preference (reward) modeling
Can we just use supervised learning on scores?
Bradley-Terry model: estimate probability that a given pairwise preference is true
Score from
optimal reward model
Chosen completion
Rejected completion
Prompt
Key idea:
Probability ∝ reward
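The Bradley-Terry probability is just a sigmoid of the reward difference; a minimal sketch:

```python
import math

def bt_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry: P(chosen preferred over rejected)
    = sigmoid(r_chosen - r_rejected)."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

print(bt_probability(1.0, 1.0))            # 0.5: equal rewards, coin flip
print(round(bt_probability(3.0, 1.0), 3))  # 0.881: higher reward wins more often
```

Training the reward model is then supervised learning: maximize the log of this probability over a dataset of human-labeled (chosen, rejected) pairs.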
What if we just use gradient ascent on this equation?
The answer, with some math, is:
Direct Preference Optimization (DPO)
Released on May 29th 2023
(4+ months before models we’re discussing)
Rafailov, Sharma, Mitchell et al. 2023
DPO core facts
The first 2 points mean we’ll see more DPO models than anything else and learn its limits!
Example code.
Rafailov, Sharma, Mitchell et al. 2023
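A per-example sketch of the resulting DPO loss, -log σ(β · [policy/reference log-ratio on the chosen completion minus the same log-ratio on the rejected one]); the log-probabilities below are made-up numbers for illustration:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss (Rafailov et al. 2023):
    -log sigmoid(beta * (log-ratio on chosen - log-ratio on rejected))."""
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# At initialization (policy == reference) the loss is exactly log 2:
print(round(dpo_loss(-5.0, -9.0, -5.0, -9.0), 4))  # 0.6931
# Once the policy favors the chosen answer more than the reference, loss drops:
print(dpo_loss(-4.0, -10.0, -5.0, -9.0))
```

No reward model and no sampling are needed: both log-probabilities come from ordinary forward passes over a fixed preference dataset, which is why DPO is so much simpler to run than PPO.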
DPO vs RL (PPO, REINFORCE, …)
DPO and PPO are very different optimizers.
It is learning directly from preferences vs. using RL update rules.
It is also not really online vs offline RL, but that is more muddled.
More discussion: https://twitter.com/srush_nlp/status/1729896568956895370, https://www.interconnects.ai/p/the-dpo-debate, https://www.youtube.com/watch?v=YJMCSVLRUNs
Credit: Tom Goldstein (https://twitter.com/tomgoldsteincs)
RLHF phase: Zephyr β
UltraFeedback: https://arxiv.org/abs/2310.01377
RLHF phase: Tulu 2
RLHF phase: SteerLM & Starling
Still plenty of models showing that PPO (and RL methods) outperforms DPO!
Chapter 4: Modern ecosystem
Diversity of models and more players
Examples:
Llama 3
More about scaling than alignment. TBD if they solved the Llama 2 refusals problem.
More here: www.interconnects.ai/p/llama-3-and-scaling-open-llms
Current directions
Open vs. closed aligned models
Figure credit: Maxime Labonne. Tulu 2 70B.
Current directions
I cover these topics regularly on my blog www.interconnects.ai
Where open alignment is happening
Thank you! Questions?
Contact: nathan at natolambert dot com
Socials: @natolambert
Writing: interconnects.ai
Thanks to many teammates at HuggingFace and AI2 for supporting this journey!