1 of 32

Niels Rogge

December 2023

Training and deploying open-source LLMs

2 of 32

Overview

  1. The rise of open LLMs
  2. Training LLMs
  3. Deploying LLMs
  4. Why open-source?
  5. Exciting developments

3 of 32

The rise of open LLMs

  • February 2023:
    • LLaMA
  • March:
    • Alpaca, Vicuna
  • April:
    • Koala
  • May:
    • StarCoder, StarChat, MPT-7B, Guanaco
  • June:
    • Falcon, MPT-30B, Phi-1
  • July:
    • Llama 2
  • September:
    • Falcon 180B, Mistral-7B
  • November:
    • Yi-34B, Zephyr-7B
  • December:
    • Mixtral-8x7B, Phi-2

4 of 32

The rise of open LLMs

5 of 32

The rise of open LLMs

Chatbot Arena by LMSys

7 of 32

The rise of open LLMs

Chatbot Arena by LMSys

Mixtral already on par with GPT-3.5, better than Gemini Pro

8 of 32

The rise of open LLMs

9 of 32

Training LLMs

Karpathy, 2023

10 of 32

Training LLMs

  1. Pre-training

  • predicting the next token
  • typically done by large organizations (OpenAI, Meta, Microsoft)
  • across clusters of GPUs
    • GPT-4: 25,000 GPUs for 100 days
    • LLaMa-2 70B: 6,000 GPUs for 12 days
  • costs millions of $$$

=> yields a “base model”
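
At its core, pre-training is next-token prediction: given the tokens so far, the model is trained to assign high probability to the actual next token. A toy counts-based sketch of the objective (pure Python; the corpus is made up, and a real LLM uses a neural network rather than bigram counts):

```python
import math
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count bigram occurrences: how often does `nxt` follow `prev`?
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_token_probs(prev):
    """Probability distribution over the next token, given the previous one."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

# Training loss = negative log-likelihood of the actual next token
probs = next_token_probs("the")
loss = -math.log(probs["cat"])   # "cat" follows "the" 2 times out of 3
print(round(probs["cat"], 3))    # 0.667
print(round(loss, 3))            # 0.405
```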

11 of 32

Training LLMs

2. Supervised fine-tuning (SFT)

  • turn the model into a chatbot
  • 1–100k (input, output) pairs
  • one or more GPUs
    • runpod.io
    • vast.ai
    • Lambda Labs
    • … or your favorite cloud
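
Those (input, output) pairs are typically rendered into a single training string via a chat template before SFT. A minimal sketch (the `<|user|>`/`<|assistant|>` markers below are illustrative, not any specific model's template):

```python
# One SFT example: an (input, output) pair
example = {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris.",
}

def render(example):
    """Render an (input, output) pair into one training string.

    The special markers are placeholders; real models define their
    own chat template (applied via the tokenizer).
    """
    return (
        f"<|user|>\n{example['input']}\n"
        f"<|assistant|>\n{example['output']}"
    )

text = render(example)
print(text)
```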

12 of 32

Training LLMs

2. Supervised fine-tuning (SFT)

  • recommended: TRL library

13 of 32

Training LLMs

2. Supervised fine-tuning (SFT)

  • recommended: TRL library
    • integrates with PEFT (QLoRA) and Unsloth
    • makes it possible to fine-tune huge LLMs on consumer hardware
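
The reason QLoRA fits on consumer hardware: LoRA swaps the full d × d weight update for two low-rank factors of rank r, and quantization shrinks the frozen base weights to 4 bits. The parameter arithmetic, as a sketch (d and r are typical but illustrative values):

```python
d = 4096   # hidden size of one weight matrix in a ~7B model (illustrative)
r = 16     # LoRA rank (illustrative)

full_update = d * d            # trainable params for a full fine-tune of one matrix
lora_update = d * r + r * d    # LoRA: two low-rank factors A (d x r) and B (r x d)

print(full_update)             # 16777216
print(lora_update)             # 131072
print(round(100 * lora_update / full_update, 2), "% of full fine-tuning")  # 0.78 %
```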

14 of 32

Training LLMs

3. Human preference training

  • make the chatbot
    • friendly
    • harmless
    • helpful
  • 1–100k (chosen, rejected) pairs
  • one or more GPUs
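
A preference dataset pairs each prompt with a chosen and a rejected completion. A minimal sketch of the record shape plus a sanity check (the field names follow a common convention, e.g. TRL-style datasets, but the example itself is made up):

```python
def is_valid_preference(ex):
    """A preference example pairs one prompt with a chosen and a rejected reply."""
    return {"prompt", "chosen", "rejected"} <= ex.keys() and ex["chosen"] != ex["rejected"]

example = {
    "prompt": "Explain photosynthesis to a child.",
    "chosen": "Plants use sunlight to turn air and water into food.",
    "rejected": "Photosynthesis is the light-driven synthesis of C6H12O6.",
}
print(is_valid_preference(example))  # True
```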

16 of 32

Training LLMs

3. Human preference training

  • recommended: TRL library
    • includes PPO, DPO

17 of 32

Training LLMs

3. Human preference training

  • recommended: TRL library
    • DPO: trains on human preferences directly
      • no need for a separate reward model
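
DPO turns preference pairs into a simple logistic loss over the policy's and a frozen reference model's log-probabilities of the chosen vs. rejected answer, which is why no separate reward model is needed. A pure-Python sketch for a single pair (the log-probability values are made up):

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log sigmoid(beta * margin); small when the policy prefers the chosen answer
    return -math.log(1 / (1 + math.exp(-beta * margin)))

# Made-up log-probs: the policy already leans toward the chosen answer
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
                ref_chosen=-13.0, ref_rejected=-14.0, beta=0.1)
print(round(loss, 4))  # 0.5981
```

The larger the policy's preference margin over the reference model, the smaller the loss.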

18 of 32

Training LLMs

Hugging Face alignment handbook

  • includes recipes for SFT, DPO

19 of 32

Deploying LLMs

  • Serverless vs. dedicated compute

20 of 32

Deploying LLMs

  • Serverless solutions
    • Together.ai
    • AnyScale
    • Perplexity.ai

  • Charge per token
    • e.g. $0.0006/1K tokens for Mixtral-8x7B
    • >60% cheaper than GPT-3.5
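
At per-token prices, the comparison is simple arithmetic. A sketch using the Mixtral price above; the GPT-3.5 price is an assumption (on the order of $0.0015 per 1K input tokens at the time):

```python
mixtral_per_1k = 0.0006  # $/1K tokens, from the slide
gpt35_per_1k = 0.0015    # $/1K tokens, assumed for comparison

tokens = 10_000_000      # e.g. 10M tokens per month
mixtral_cost = tokens / 1000 * mixtral_per_1k
gpt35_cost = tokens / 1000 * gpt35_per_1k
savings = 1 - mixtral_per_1k / gpt35_per_1k

print(f"Mixtral: ${mixtral_cost:.2f}, GPT-3.5: ${gpt35_cost:.2f}")
print(f"Savings: {savings:.0%}")  # 60%
```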

24 of 32

Deploying LLMs

  • Dedicated compute
    • TGI (Text Generation Inference), vLLM
    • Inference Endpoints, Together.ai

  • Charge per time
    • e.g. $2/hour
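
Per-hour pricing has a simple break-even against per-token pricing: divide the hourly rate by the per-token price. A sketch using the example prices in these slides:

```python
hourly_rate = 2.0        # dedicated endpoint, $/hour (slide example)
per_1k_tokens = 0.0006   # serverless price, $/1K tokens (slide example)

# Throughput at which both options cost the same; above it, dedicated wins
break_even = hourly_rate / per_1k_tokens * 1000
print(f"{break_even:,.0f} tokens/hour")  # 3,333,333 tokens/hour
```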

26 of 32

Why open-source?

Advantages

  • No data being sent to another party (private)
  • Access to the model
  • Fine-tuning is possible
  • Runs at the edge (ggml, MLX)
  • Doesn't become lazy

Disadvantages

  • Performance may be subpar without any fine-tuning
  • Deployment costs (learning curve)

27 of 32

Why closed-source?

Advantages

  • Everything is handled for you (you only pay per token)
  • Performance

Disadvantages

  • Underlying model might change without you knowing it
  • Prompting may require updates
  • Dependency on another party (lock-in)
  • Data being sent to another party
  • Knowledge cut-off (April 2023)

28 of 32

Exciting developments

Expect LLMs to become smaller, more capable, and a lot faster to run

29 of 32

Exciting developments

Sit back and enjoy the race 🍿

30 of 32

Exciting developments

  • Phi-2: a small LLM trained on 1.4 trillion tokens
  • MLX: Mixtral running on a MacBook
  • “Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models”: RLAIF can outperform human-labeled data
  • MedPrompt: GPT-4 reaches 90% on MMLU through prompting alone
  • ???

31 of 32

Exciting developments

  • Much more to come…

32 of 32

Thanks for your attention!

PS: connect with me!

@NielsRogge