1 of 77

Aligning open language models

Nathan Lambert || Allen Institute for AI || @natolambert

Stanford CS25: Transformers United V4

2 of 77

A heavily abbreviated history of language models (LMs)


3 of 77

A heavily abbreviated history of LMs

1948: Claude Shannon models English (Shannon 1948)

1948-2017:

2017: the transformer is born (Vaswani et al. 2017)

2018: GPT-1, ELMo, and BERT released (Radford et al. 2018; Devlin et al. 2018)

2019: GPT-2 and scaling laws (and releases) (Radford et al. 2019; Kaplan et al. 2020)

2020: GPT-3 — surprising capabilities (and many harms)

2021: Stochastic parrots

2022: ChatGPT

12 of 77

Can ChatGPT exist without RLHF?

RLHF seems to be necessary, but not sufficient


13 of 77

RLHF is relied upon elsewhere

RLHF is a key factor in many popular models, both on and off the record, including ChatGPT, Bard/Gemini, Claude, Llama 2, and more.

Anthropic’s Claude:
Bai, Y. et al. “Constitutional AI: Harmlessness from AI Feedback.” 2023.

Meta’s Llama 2:
“Meanwhile reinforcement learning, known for its instability, seemed a somewhat shadowy field for those in the NLP research community. However, reinforcement learning proved highly effective, particularly given its cost and time effectiveness.”
- Touvron, H. et al. “Llama 2: Open Foundation and Fine-Tuned Chat Models.” 2023.

16 of 77

This lecture’s atlas

Follow along at hf.co/collections/natolambert

  • Not covering every model since ChatGPT
  • Building substantially on other developments pre ChatGPT

18 of 77

Aligning open language models: Chapters

0: kickstart

1: instruction tuning blooms

2: evals & expectations

3: RLHF works!

4: expansion

24 of 77

Base models

Follow along at hf.co/collections/natolambert


25 of 77

Aligned / fine-tuned / preference trained models

Follow along at hf.co/collections/natolambert


26 of 77

Some definitions for “alignment” of models

  • Instruction fine-tuning (IFT): Training a model to follow user instructions (usually via autoregressive LM loss)
  • Supervised fine-tuning (SFT): Training a model to learn task-specific capabilities (usually via autoregressive LM loss)
  • Alignment: General notion of training a model to mirror user desires, with any loss function
  • Reinforcement learning from human feedback (RLHF): Specific technical tool for training ML models from human data
  • Preference fine-tuning: Using labeled preference data to fine-tune an LM (either with RL, DPO, or another loss function)

27 of 77

Chapter 0: The race to reproduce ChatGPT

  • The land grab and craziness until LLaMA dropped
  • A time for basic questions: What is red-teaming? What makes a dialogue agent useful? What tools can we use?

28 of 77

Chapter 1: The first open instruct models

29 of 77

First open instruction tuned models


Alpaca

13 Mar. 2023

  • 52k self-instruct style data distilled from text-davinci-003
  • Model weight diff. to LLaMA 7B

https://crfm.stanford.edu/2023/03/13/alpaca.html

30 of 77

Key idea: Instruction fine-tuning (IFT)

  • Adapt base model to specific style of input
  • Ability to include system prompts, multi-turn dialogues, and other chat templates


<|system|>
You’re a helpful agent
<|end|>
<|user|>
{query}
<|end|>
<|assistant|>{Answer goes here}

(The system prompt and the turn boundaries are marked with special tokens.)
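A minimal sketch of applying this template in code (the helper and message contents are illustrative, assuming a single-turn chat with the special tokens above):

def apply_chat_template(system: str, user_query: str) -> str:
    # Illustrative only: wrap a single-turn chat in the special tokens above.
    return (
        f"<|system|>\n{system}\n<|end|>\n"
        f"<|user|>\n{user_query}\n<|end|>\n"
        f"<|assistant|>"  # the model's answer is generated from here
    )

prompt = apply_chat_template("You're a helpful agent", "What is RLHF?")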

31 of 77

Key idea: Instruction fine-tuning (IFT)

Starting point: a base language model.

Continue training the transformer on pairs of question: answer.

Stack Overflow: “What makes a transformer a transformer?”, nbro, 2021
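A minimal sketch of what this continued training looks like, assuming a Hugging Face causal LM and the common choice of computing the loss over the full sequence ("gpt2" is only a stand-in base model):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Instruction fine-tuning step: standard autoregressive LM loss on an
# (instruction, response) pair.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

pair = {"question": "What is the capital of France?", "answer": "Paris."}
text = f"Question: {pair['question']}\nAnswer: {pair['answer']}"
batch = tok(text, return_tensors="pt")

# labels = input_ids -> next-token prediction loss over the pair
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()  # one gradient step (optimizer omitted)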

32 of 77

Key idea: Self-instruct / synthetic data

Start: N high-quality (often human) prompts

Ask a strong LM: Create a modified version of these instructions.

Generate completions with another (or same) strong LM.

End: easily 10x more (synthetic) training data!

(synthetic data = text generated by another LLM)

Taori et al. 2023.

Self-instruct: Wang et al. 2022
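A rough sketch of this loop (the generate helper is a stand-in for a call to a strong LM such as text-davinci-003; all names and prompts are illustrative):

def generate(prompt: str) -> str:
    # Stand-in for a strong-LM call (e.g., an API request); returns a stub string here.
    return f"[LM output for: {prompt[:40]}...]"

seed_prompts = ["Write a haiku about autumn.", "Explain TCP vs. UDP."]
synthetic_data = []
for seed in seed_prompts:
    # 1) ask a strong LM to modify/expand the seed instruction
    new_instruction = generate(f"Create a modified version of this instruction:\n{seed}")
    # 2) generate a completion for the new instruction (same or another LM)
    completion = generate(new_instruction)
    synthetic_data.append({"instruction": new_instruction, "output": completion})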

33 of 77

First open instruction tuned models

Vicuna (lmsys/vicuna-7b-delta-v0)

30 Mar. 2023

  • Fine-tunes ChatGPT data from ShareGPT
  • LLaMA 7B and 13B diff’s
  • Introduces LLM-as-a-judge

https://lmsys.org/blog/2023-03-30-vicuna/

34 of 77

Key resource: ShareGPT data

  • Source: Data from a tool for users to share their ChatGPT conversations
  • Question: Legal grey area; most of these datasets are unlicensed / collected without consent.
  • Use: Extensive use in the last 18 months, starting to be replaced by carefully collected counterparts:
    • LMSYS-Chat-1M: cleaned conversations from ChatBotArena.
    • WildChat: free ChatGPT usage in exchange for data.

Source: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

35 of 77

First open instruction tuned models

Koala

3 Apr. 2023

  • Diverse dataset (Alpaca, Anthropic HH, ShareGPT, WebGPT…)
  • Human evaluation
  • LLaMA 7B diff.

https://bair.berkeley.edu/blog/2023/04/03/koala/

36 of 77

Why weight differences?

  • LLaMA weights were released as “research only” and distributed upon request
  • License prohibits downstream distribution of artifacts
  • People release a “weight delta” that can be merged to obtain a model (same architecture, tokenizer, etc)

Original LLaMA weights + Δ (the part that is released) = new chat model
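A minimal sketch of applying such a delta (paths are placeholders; assumes the two checkpoints share architecture, tokenizer, and tensor shapes — real release scripts also handle sharding and tokenizer changes):

import torch

base = torch.load("llama-7b/pytorch_model.bin", map_location="cpu")
delta = torch.load("chat-model-delta/pytorch_model.bin", map_location="cpu")

# element-wise add the released delta onto the original LLaMA weights
merged = {name: base[name] + delta[name] for name in delta}
torch.save(merged, "chat-model/pytorch_model.bin")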

37 of 77

First open instruction tuned models

Dolly

12 Apr. 2023

  • 15k human-written data
  • Trained on Pythia 12B

https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm

38 of 77

First open instruction tuned models: MT Bench at release

Alpaca 13B: 4.53 | Vicuna 7B: 6.69 | Koala 13B: 6.08 | Dolly 12B: 3.28

39 of 77

OpenAssistant: The first open, human instruction dataset

“In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations (OASST1), a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 fully annotated conversation trees. The corpus is a product of a worldwide crowd-sourcing effort involving over 13,500 volunteers.”

April 15th 2023

  • Used extensively in future models.
  • Still the only human dataset of this size to be released.
  • OpenAssistant and others trained popular models with it.
  • (Released fine-tuned models too!)

40 of 77

StableVicuna: The first open RLHF model

28 April 2023

Trained with proximal policy optimization (PPO) on popular datasets:

  • OASST1 dataset for SFT + PPO
  • Anthropic HH + Stanford Human Preferences (SHP) for RL

Standard formulation. Ahead of its time!

41 of 77

QLoRA & Guanaco

LoRA: Low-Rank Adaptation

Popular tool for fine-tuning models with lower memory consumption.

QLoRA: LoRA + quantized base model (plus paged optimizers and double quantization)

Further reduces the memory consumption of fine-tuning while (mostly) maintaining performance.

Image credit: Tim Dettmers
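A minimal sketch of the LoRA idea (this is not the QLoRA implementation, which additionally quantizes the frozen base weights to 4-bit; hyperparameters are illustrative):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update B @ A."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)           # frozen pretrained weight
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero-init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        # y = x W^T + scaling * x A^T B^T   (only A and B receive gradients)
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)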

43 of 77

QLoRA & Guanaco

Guanaco (33B MT Bench 6.88)

First models trained with QLoRA plus quality filtered Open Assistant dataset.

Both the dataset and QLoRA method are still regularly used.

State-of-the-art open model at release.

Note, this is a different Guanaco: Generative Universal Assistant for Natural-language Adaptive Context-aware Omnilingual outputs, https://guanaco-model.github.io/

Image credit: Tim Dettmers

44 of 77

Chapter 2: Setting expectations & evals.


45 of 77

Do LoRA methods work with RL?

  • Big exploration in late May / summer 2023
  • Few models that “splashed” were trained this way
  • Likely not a fundamental limitation, but a tricky hyperparameter space

46 of 77

Llama 2 chat backlash

Should chat models be “safe?”


Röttger et al. 2023

47 of 77

“Uncensored” models

  • Goal: Modify models so they don’t refuse any request
  • Method: Remove instances of “as a language model” or “Sorry, …” in training data
  • Confusion: Not the clearest name for things. The models were never explicitly censored to begin with.
  • Prefer the name filtered or unbiased.

One of the first models named this way (April 2023): cognitivecomputations/WizardLM-7B-Uncensored

Example models here: https://huggingface.co/models?other=uncensored
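The filtering itself is simple string matching; a rough sketch of the idea (phrases and field names are illustrative):

# Drop training examples whose responses contain canned refusal phrases.
REFUSAL_MARKERS = ["as a language model", "as an ai", "sorry, "]

def keep_example(example: dict) -> bool:
    response = example["output"].lower()
    return not any(marker in response for marker in REFUSAL_MARKERS)

dataset = [
    {"instruction": "Tell me a joke.", "output": "Why did the chicken cross the road?"},
    {"instruction": "Do X.", "output": "Sorry, as a language model I cannot do that."},
]
filtered = [ex for ex in dataset if keep_example(ex)]  # keeps only the first example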


48 of 77

Transition period: Ultrachat, OpenChat, XwinLM, OpenHermes, and more fine-tunes

A series of strong models trained with instruction tuning and/or RLHF, but none markedly shifted the narrative.

  • Apr. 2023: WizardLM v0.1 trained with EvolInstruct (synthetic data generation); other strong RL math/code models were mostly ignored by the community. MT Bench 13B: 6.35
  • Jun. 2023: UltraLM 13B trained on the new UltraChat dataset
  • Jun. 2023: OpenChat 13B trained on filtered ShareGPT data
  • Sep. 2023: XwinLM 7B, strong model “trained with RLHF,” but no details or paper; XwinLM 70B, first model to beat GPT-4 on AlpacaEval
  • Oct. 2023: Teknium/OpenHermes on Mistral 7B, strong synthetic data filtering + a better base model

Note (17 April 2024): WizardLM is not currently officially available on Hugging Face, due to artifact review at Microsoft.

49 of 77

Establishing evaluation

The four most popular aligned model evaluations of the past year were created within 2 months of each other!

  • May 3, 2023: ChatBotArena
  • June 8, 2023: AlpacaEval
  • June 22, 2023: MT Bench
  • July 2023: Open LLM Leaderboard

50 of 77

ChatBotArena

Side by side preference collection of two different models.

Pros:

  • At-scale, blind LLM community comparisons.
  • Ranks top closed and open models.

Cons:

  • No control over (or knowledge of) the prompt or user distribution.
  • Hard tool to base engineering decisions on!
  • Only the best models get in.

51 of 77

AlpacaEval

LLM-as-a-judge mirroring the preference collection phase:

  • Show the candidate model’s response versus a baseline model’s completion and ask which is better
  • Prompts sourced from the validation splits of common instruction datasets (Self-Instruct, Open Assistant, Vicuna, Koala, and Anthropic HH)

52 of 77

AlpacaEval

Strengths:

  • More samples create smaller error bars than MT Bench
  • Single-turn is a little easier to use

Shortcomings (similar to MT Bench):

  • Win rate is based on comparison to an outdated model (Davinci003)
  • No categories or clear interpretation of the total result
  • Potential length bias
  • Saturation of scores

53 of 77

Aside: AlpacaEval 2

  • Compares to GPT-4 rather than Davinci003 (an InstructGPT variant)
  • Potentially too challenging to trust the results
  • The linear length-correlation penalty is a decent correction, but not a long-term solution

54 of 77

MT Bench

LLM-as-a-judge: ask an LLM (GPT-4/Claude) to rate a model response:

  • Two turns (response & follow-up)
  • 8 categories (writing, role-play, reasoning, math, coding, extraction, STEM, humanities)
  • Rate one model at a time on a 0-10 scale to mitigate positional bias

Shortcomings: hard to use as the sole focus during training

  • Variance in scoring up to ~0.5 points (via generation temperature and model API variation), so big deltas are needed for signal
  • Only 80 prompts in the eval. set
  • Scoring saturated at the top end (GPT-4: 8.99)

Zheng, Lianmin, et al. “Judging LLM-as-a-judge with MT-Bench and Chatbot Arena.” arXiv preprint arXiv:2306.05685 (2023).

56 of 77

Open LLM Leaderboard

Started as an engineering tool for automatically evaluating competitive models. Turned into a product with an entire team.

  • Evaluates almost any model on the Hub on core LLM tasks.
  • Good for discovering models
  • Bad for LLM developers to fixate on
  • RLHF has not been shown to deeply improve these metrics, though this started to get better in 2024

57 of 77

Establishing evaluation

How easy are these evaluations to use?

  • ChatBotArena: Hard to use as a training signal (slow feedback), yet most reliable
  • AlpacaEval: Slightly expensive as a training tool for academics (~$5 per model eval), decent correlation
  • MT Bench: Cheap training tool (~$0.50 per model eval), decent correlation
  • Open LLM Leaderboard: Not super useful for studying alignment

58 of 77

Chapter 3: Getting RLHF to work


59 of 77

Review: RLHF objective

Optimize a “reward” inspired by human preferences, while constraining the model to not trust the reward too much (preferences are hard to model):

    max_π  E_{x∼D, y∼π(·|x)} [ r(x, y) ]  −  β · D_KL( π(·|x) ‖ π_θ(·|x) )

π: LLM policy
π_θ: base LLM
x: prompt
y: completion

Primary questions:

  • How to implement the reward: r(x, y)
  • How to optimize the reward

62 of 77

Review: Preference (reward) modeling

Can we just use supervised learning on scores?

  • Assigning a scalar reward of how good a response is did not work
  • Pairwise preferences are easy to collect and worked!

Bradley-Terry model: estimate the probability that a given pairwise preference is true,

    P(y_chosen ≻ y_rejected | x) = exp( r*(x, y_chosen) ) / ( exp( r*(x, y_chosen) ) + exp( r*(x, y_rejected) ) )

where x is the prompt, y_chosen / y_rejected are the chosen and rejected completions, and r* is the score from the optimal reward model.

Key idea: preference probability ↔ reward
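Training a reward model against this objective reduces to a logistic loss on score differences; a minimal sketch (assuming a model that already maps a prompt-completion pair to a scalar score):

import torch
import torch.nn.functional as F

def reward_model_loss(chosen_scores, rejected_scores):
    # chosen_scores / rejected_scores: [batch] tensors of r(x, y) for the
    # chosen and rejected completions of the same prompts.
    # Maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

loss = reward_model_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.9]))  # toy scores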

63 of 77

What if we just use gradient ascent on this equation?

The answer, with some math, is:

Direct Preference Optimization (DPO)

Released on May 29th, 2023 (4+ months before the models we’re discussing)

Rafailov, Sharma, Mitchell et al. 2023

65 of 77

DPO core facts

  • Extremely simple to implement
  • Scales nicely with existing distributed training libraries
  • Trains an implicit reward function (can still be used as a reward model, see RewardBench)

The first two points mean we’ll see more DPO models than anything else and learn its limits!

Example code.
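A minimal sketch of the DPO loss (assuming summed per-completion log-probs from the policy and a frozen reference model; not the authors’ reference implementation):

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: beta * (log-prob ratio of policy vs. reference)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the margin between chosen and rejected implicit rewards
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()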

Rafailov, Sharma, Mitchell et al. 2023

66 of 77

DPO vs RL (PPO, REINFORCE, …)

DPO and PPO are very different optimizers.

DPO learns directly from preferences, versus using RL update rules.

It is also not really a matter of online vs. offline RL, but that distinction is more muddled.

More discussion: https://twitter.com/srush_nlp/status/1729896568956895370, https://www.interconnects.ai/p/the-dpo-debate, https://www.youtube.com/watch?v=YJMCSVLRUNs

Credit Tom Goldstein

https://twitter.com/tomgoldsteincs

67 of 77

RLHF phase: Zephyr β

  • First model to make a splash with DPO!
  • Fine-tune of Mistral 7B with UltraFeedback dataset.
  • Discovered the weirdly low learning rates that are now standard (~5E-7)
  • MT Bench 7.34


68 of 77

RLHF phase: Tulu 2

  • First model to scale DPO to 70 billion parameters!
  • Strongly validated the Zephyr results.
  • Started the DPO vs. PPO debate for real.
  • MT Bench 70B: 7.89


69 of 77

RLHF phase: SteerLM & Starling

Still plenty of models showing that PPO (and other RL methods) outperform DPO!

  • SteerLM: Attribute conditioned fine-tuning
  • Starling: Introduced new preference dataset, Nectar, and k-wise reward model loss function (i.e. moving beyond pairwise preferences)
    • MT Bench 7B: 8.09 (beat every model except GPT-4 at the time)


70 of 77

Chapter 4: Modern ecosystem


71 of 77

Diversity of models and more players

Examples:

  • Genstruct from NousResearch: model for rephrasing any text into instructions
  • OLMo-Instruct from AI2: truly open-source models.
  • More players such as Databricks DBRX and Cohere’s Command R+ (first open model to pass GPT-4 on ChatBotArena)
  • Research models such as Microsoft Rho (reward model weighted pretraining)
  • Multilingual fine-tuning with Aya
  • More MoE models (JetMoE, Qwen MoE, ...)
  • State-space models such as Jamba


72 of 77

Llama 3

More about scaling than alignment. TBD if they solved the Llama 2 refusals problem.

More here: www.interconnects.ai/p/llama-3-and-scaling-open-llms


73 of 77

Current directions


74 of 77

Open vs. closed aligned models

Image credit: Maxime Labonne (chart annotation: Tulu 2 70B)

75 of 77

Current directions

  • Data! Data! Data! We are severely limited in experimentation by having too few preference datasets (Anthropic HH, UltraFeedback, and Nectar are the main three).
  • Continuing to improve DPO: tons of papers iterating on the method (ORPO, cDPO, IPO, BCO, KTO, DNO, sDPO, etc)
  • More model sizes: Most alignment research happened at 7 or 13B parameter scale. Expand up and down!
  • Specific evaluations: How do we get more specific evaluations than ChatBotArena?
  • Personalization: A large motivation behind local models, still a young area academically

I cover these topics regularly on my blog www.interconnects.ai

76 of 77

Where open alignment is happening

  • AI2 (self bias): Tulu models, OLMo-Adapt, dataset releases
  • HuggingFaceH4: Quick releases on new base models, recipes for new techniques (e.g. ORPO / CAI), other tools
  • Berkeley-Nest/Nexusflow: Nectar dataset / Starling models
  • NousResearch: Hermes fine-tuned models, datasets, and more
  • OpenBMB: Preference datasets, reward models, and more
  • Argilla: Open preference datasets and resulting models
  • Some HuggingFace users
    • Maxime Labonne: Model merging & other fine-tunes
    • Jon Durbin: More model merges & other fine-tunes


I cover these topics regularly on my blog www.interconnects.ai

77 of 77

Thank you! Questions?

Contact: nathan at natolambert dot com

Socials: @natolambert

Writing: interconnects.ai

Thanks to many teammates at HuggingFace and AI2 for supporting this journey!
