1 of 32

Niels Rogge

December 2023

Training and deploying open-source LLMs

2 of 32

Overview

  1. The rise of open LLMs
  2. Training LLMs
  3. Deploying LLMs
  4. Why open-source?
  5. Exciting developments

3 of 32

The rise of open LLMs

  • February 2023:
    • LLaMA
  • March:
    • Alpaca, Vicuna
  • April:
    • Koala
  • May:
    • StarCoder, StarChat, MPT-7B, Guanaco
  • June:
    • Falcon, MPT-30B, Phi-1
  • July:
    • Llama 2
  • September:
    • Falcon 180B, Mistral-7B
  • November:
    • Yi-34B, Zephyr-7B
  • December:
    • Mixtral-8x7B, Phi-2

4 of 32

The rise of open LLMs

5 of 32

The rise of open LLMs

Chatbot Arena by LMSys

7 of 32

The rise of open LLMs

Chatbot Arena by LMSys

Mixtral already on par with GPT-3.5, better than Gemini Pro

8 of 32

The rise of open LLMs

9 of 32

Training LLMs

Karpathy, 2023

10 of 32

Training LLMs

  1. Pre-training

  • predicting the next token
  • typically done by large organizations (OpenAI, Meta, Microsoft)
  • across clusters of GPUs
    • GPT-4: 25,000 GPUs for 100 days
    • LLaMa-2 70B: 6,000 GPUs for 12 days
  • costs millions of $$$

=> yields a “base model”
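
At its core, pre-training is next-token prediction: given the tokens so far, the model is trained to assign high probability to the actual next token. A toy counts-based sketch of the objective (pure Python; the corpus is made up, and a real LLM uses a neural network rather than bigram counts):

```python
import math
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count bigram occurrences: how often does `nxt` follow `prev`?
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_token_probs(prev):
    """Probability distribution over the next token, given the previous one."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

# Training loss = negative log-likelihood of the actual next token
probs = next_token_probs("the")
loss = -math.log(probs["cat"])   # "cat" follows "the" 2 times out of 3
print(round(probs["cat"], 3))    # 0.667
print(round(loss, 3))            # 0.405
```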

11 of 32

Training LLMs

2. Supervised fine-tuning (SFT)

  • turn the model into a chatbot
  • 1–100k (input, output) pairs
  • one or more GPUs
    • runpod.io
    • vast.ai
    • Lambda Labs
    • … or your favorite cloud
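
Those (input, output) pairs are typically rendered into a single training string via a chat template before SFT. A minimal sketch (the `<|user|>`/`<|assistant|>` markers below are illustrative, not any specific model's template):

```python
# One SFT example: an (input, output) pair
example = {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris.",
}

def render(example):
    """Render an (input, output) pair into one training string.

    The special markers are placeholders; real models define their
    own chat template (applied via the tokenizer).
    """
    return (
        f"<|user|>\n{example['input']}\n"
        f"<|assistant|>\n{example['output']}"
    )

text = render(example)
print(text)
```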

12 of 32

Training LLMs

2. Supervised fine-tuning (SFT)

  • recommended: TRL library

13 of 32

Training LLMs

2. Supervised fine-tuning (SFT)

  • recommended: TRL library
    • integrates with PEFT (QLoRA) and Unsloth
    • makes it possible to fine-tune huge LLMs on consumer hardware
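
The reason QLoRA fits on consumer hardware: LoRA swaps the full d × d weight update for two low-rank factors of rank r, and quantization shrinks the frozen base weights to 4 bits. The parameter arithmetic, as a sketch (d and r are typical but illustrative values):

```python
d = 4096   # hidden size of one weight matrix in a ~7B model (illustrative)
r = 16     # LoRA rank (illustrative)

full_update = d * d            # trainable params for a full fine-tune of one matrix
lora_update = d * r + r * d    # LoRA: two low-rank factors A (d x r) and B (r x d)

print(full_update)             # 16777216
print(lora_update)             # 131072
print(round(100 * lora_update / full_update, 2), "% of full fine-tuning")  # 0.78 %
```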

14 of 32

Training LLMs

3. Human preference training

  • make the chatbot
    • friendly
    • harmless
    • helpful
  • 1–100k (chosen, rejected) pairs
  • one or more GPUs
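
A preference dataset pairs each prompt with a chosen and a rejected completion. A minimal sketch of the record shape plus a sanity check (the field names follow a common convention, e.g. TRL-style datasets, but the example itself is made up):

```python
def is_valid_preference(ex):
    """A preference example pairs one prompt with a chosen and a rejected reply."""
    return {"prompt", "chosen", "rejected"} <= ex.keys() and ex["chosen"] != ex["rejected"]

example = {
    "prompt": "Explain photosynthesis to a child.",
    "chosen": "Plants use sunlight to turn air and water into food.",
    "rejected": "Photosynthesis is the light-driven synthesis of C6H12O6.",
}
print(is_valid_preference(example))  # True
```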

16 of 32

Training LLMs

3. Human preference training

  • recommended: TRL library
    • includes PPO, DPO

17 of 32

Training LLMs

3. Human preference training

  • recommended: TRL library
    • DPO: trains on human preferences directly
      • no need for a separate reward model
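
DPO turns preference pairs into a simple logistic loss over the policy's and a frozen reference model's log-probabilities of the chosen vs. rejected answer, which is why no separate reward model is needed. A pure-Python sketch for a single pair (the log-probability values are made up):

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log sigmoid(beta * margin); small when the policy prefers the chosen answer
    return -math.log(1 / (1 + math.exp(-beta * margin)))

# Made-up log-probs: the policy already leans toward the chosen answer
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
                ref_chosen=-13.0, ref_rejected=-14.0, beta=0.1)
print(round(loss, 4))  # 0.5981
```

The larger the policy's preference margin over the reference model, the smaller the loss.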

18 of 32

Training LLMs

Hugging Face alignment handbook

  • includes recipes for SFT, DPO

19 of 32

Deploying LLMs

  • Serverless vs. dedicated compute

20 of 32

Deploying LLMs

  • Serverless solutions
    • Together.ai
    • AnyScale
    • Perplexity.ai

  • Charge per token
    • e.g. $0.0006/1K tokens for Mixtral-8x7B
    • >60% cheaper than GPT-3.5
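
At per-token prices, the comparison is simple arithmetic. A sketch using the Mixtral price above; the GPT-3.5 price is an assumption (on the order of $0.0015 per 1K input tokens at the time):

```python
mixtral_per_1k = 0.0006  # $/1K tokens, from the slide
gpt35_per_1k = 0.0015    # $/1K tokens, assumed for comparison

tokens = 10_000_000      # e.g. 10M tokens per month
mixtral_cost = tokens / 1000 * mixtral_per_1k
gpt35_cost = tokens / 1000 * gpt35_per_1k
savings = 1 - mixtral_per_1k / gpt35_per_1k

print(f"Mixtral: ${mixtral_cost:.2f}, GPT-3.5: ${gpt35_cost:.2f}")
print(f"Savings: {savings:.0%}")  # 60%
```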

24 of 32

Deploying LLMs

  • Dedicated compute
    • TGI (Text Generation Inference), vLLM
    • Inference Endpoints, Together.ai

  • Charge per time
    • e.g. $2/hour
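
Per-hour pricing has a simple break-even against per-token pricing: divide the hourly rate by the per-token price. A sketch using the example prices in these slides:

```python
hourly_rate = 2.0        # dedicated endpoint, $/hour (slide example)
per_1k_tokens = 0.0006   # serverless price, $/1K tokens (slide example)

# Throughput at which both options cost the same; above it, dedicated wins
break_even = hourly_rate / per_1k_tokens * 1000
print(f"{break_even:,.0f} tokens/hour")  # 3,333,333 tokens/hour
```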

26 of 32

Why open-source?

Advantages

  • No data being sent to another party (private)
  • Access to the model
  • Fine-tuning is possible
  • Runs at the edge (ggml, MLX)
  • Doesn't become lazy

Disadvantages

  • Performance may be subpar without any fine-tuning
  • Deployment costs (learning curve)

27 of 32

Why closed-source?

Advantages

  • Everything is handled for you (you only pay per token)
  • Performance

Disadvantages

  • Underlying model might change without you knowing it
  • Prompting may require updates
  • Dependency on another party (lock-in)
  • Data being sent to another party
  • Knowledge cut-off (April 2023)

28 of 32

Exciting developments

Expect LLMs to become smaller, more capable, and a lot faster to run

29 of 32

Exciting developments

Sit back and enjoy the race 🍿

30 of 32

Exciting developments

  • Phi-2: a small LLM trained on 1.4 trillion tokens
  • MLX: Mixtral running on a MacBook
  • “Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models”: RLAIF can outperform human-labeled data
  • MedPrompt: GPT-4 reaches 90% on MMLU through prompting alone
  • ???

31 of 32

Exciting developments

  • Much more to come…

32 of 32

Thanks for your attention!

PS: connect with me!

@NielsRogge