CS 6120
Week 13, Topic 11.2: Practically Leveraging Large Language Models
Project Demo Day
Each team introduces their Dataset (RAG) or Application (PAL) they are using
Introduce your work (1 min each)
Each instructor / TA tries out projects (5 Q’s each)
Students try out each other's applications
Administrivia
Administrative - Homework 5
Large Language Models
Section 0: Administrivia
Section 1: In-Context Learning: Prompt Engineering
Section 2: Instruction Fine Tuning with RLHF
Section 3: Retrieval Augmented Generation (RAG)
Last Week: Pre-Training and Fine-Tuning
Last Week: Pre-Training and Fine-Tuning
Prompt Engineering: Full Specifications
In-Context Learning
In-Context Learning (ICL) - Zero-Shot Inference
In-Context Learning (ICL) - One-Shot Inference
GPT-2
Supply an example!
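A minimal sketch of the difference between the two prompt styles above, using an illustrative sentiment task and example text that are not from the lecture: zero-shot supplies only an instruction and the input, while one-shot prepends a single worked example.

# Zero-shot: instruction + input only.
zero_shot_prompt = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: The plot was thin but the acting saved it.\n"
    "Sentiment:"
)

# One-shot: the same instruction, plus one worked example before the input.
one_shot_prompt = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: I loved every minute of this film.\n"
    "Sentiment: positive\n"
    "Review: The plot was thin but the acting saved it.\n"
    "Sentiment:"
)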
Summary of ICL: Strongly Dependent on Model Size
Prompt Engineering: Useful Techniques
3 examples + guiding LLM
Text-Davinci-002 (Cusp of 3.0-3.5)
Apropos of nothing, consistently wrong
Text-Davinci-002 (Cusp of 3.0-3.5)
Chain of Thought Reasoning
Chain of Thought Reasoning: 1 or Few Shot Learning
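A sketch of a one-shot chain-of-thought prompt: the worked example spells out its intermediate reasoning before the answer, and the model is expected to imitate that pattern for the new question. The arithmetic problems are the standard illustrative ones, not taken from the slides.

cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
    "Q: The cafeteria had 23 apples. It used 20 to make lunch and bought 6 more. "
    "How many apples does it have?\n"
    "A:"  # the model should now produce its own reasoning chain before the answer
)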
ICL or No?
Note that:
When to Fine-Tune?
Provisioning: Peak vs Baseline
Large Language Models
Section 0: Administrivia
Section 1: In-Context Learning: Prompt Engineering
Section 2: Instruction Fine Tuning with RLHF
Section 3: Retrieval Augmented Generation (RAG)
Remedying examples from ICL
Instruction Following in LLMs
InstructGPT prioritizes following user instructions, making it more reliable and aligned with user intent compared to traditional LLMs.
Instruction Following in LLMs
Traditional vs Instruction-Aligned
InstructGPT Training (Focuses on Alignment):
Traditional LLM Training:
Section 2: Instruction Fine Tuning with Reinforcement Learning
Multi-Task Fine Tuning
FLAN T5
The SAMSum Dataset
Fine Tuning LLMs
Pre-Trained LLM
GB/TB of Unstructured Textual Data
GB of Labeled Textual Data for Task(s)
Pre-Trained LLM
prompt + completion
Prompt Template Libraries
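A sketch of what a prompt template does during instruction fine-tuning: it wraps each raw (dialogue, summary) record, e.g. from SAMSum, into a prompt + completion pair. The template wording and field names here are illustrative assumptions, not a specific library's API.

SUMMARIZE_TEMPLATE = "Summarize the following conversation.\n\n{dialogue}\n\nSummary: "

def to_instruction_example(record):
    # record is assumed to carry "dialogue" and "summary" fields, as SAMSum does.
    return {
        "prompt": SUMMARIZE_TEMPLATE.format(dialogue=record["dialogue"]),
        "completion": record["summary"],
    }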
Instruction Set Fine Tuning
Instruction Fine-Tuning → INSTRUCT Modeling
Section 2: Instruction Fine Tuning with Reinforcement Learning
Models Behaving Badly
Some Examples
General Alignment
Section 2: Instruction Fine Tuning with Reinforcement Learning
Using Human Feedback Provides Significant Improvements
Training an LLM
Building Alignment through RLHF
InstructGPT
1.3B
Anthropic
52B Parameters
Gopher
280B Parameters
Initial Pre-trained Model
Reinforcement Learning with Human Feedback
Reinforcement Learning
Agent
RL Policy (Model)
Environment
reward: r_t
action: a_t from action space
state: s_t from state space
Objective:
Win the game!
Reinforcement Learning
Agent
RL Policy = LLM
Environment
reward: r_t
action: a_t from token vocab
state: s_t from current context
Objective:
Generate aligned text!
Instruct LLM
Reward Model
question
answer
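A sketch of how generation maps onto the RL loop on this slide: the state is the current context, each action is the next token drawn from the vocabulary, and the reward model scores the finished (question, answer) pair. `sample_next_token` and `reward_model` are caller-supplied placeholders, not real APIs.

def rollout(prompt, sample_next_token, reward_model, max_tokens=64):
    """One episode: generate an answer token by token, then score it."""
    context = prompt                           # state s_t: the current context
    completion = ""
    for _ in range(max_tokens):
        token = sample_next_token(context)     # action a_t: one token from the vocab
        completion += token
        context += token
        if token == "<eos>":                   # assumed end-of-sequence marker
            break
    reward = reward_model(prompt, completion)  # reward model scores (question, answer)
    return completion, reward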
Models Becoming Mainstream
Prompt
Explain the moon landing to a 6 year old in a few sentences.
Completion
GPT-3
Explain the theory of gravity to a 6 year old.
Explain the theory of relativity to a 6 year old in a few sentences.
Explain the big bang theory to a 6 year old.
Explain evolution to a 6 year old.
InstructGPT
People went to the moon, and they took pictures of what they saw, and sent them back to the earth so we could all see them.
Our labelers prefer outputs from our 1.3B InstructGPT model over outputs from a 175B GPT-3 model, despite having more than 100x fewer parameters.
Section 2: Instruction Fine Tuning with Reinforcement Learning
RLHF - Data Preparation
RLHF - Data Labeling
Sample Instructions for RLHF
Prepare Labeled Data for Training
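A sketch of one common way to turn labeler output into reward-model training data: labelers rank several completions of the same prompt, and every ordered pair becomes a (chosen, rejected) example. The field names are assumptions for illustration.

from itertools import combinations

def ranking_to_pairs(prompt, ranked_completions):
    """ranked_completions is ordered best-first by a human labeler."""
    pairs = []
    for better, worse in combinations(ranked_completions, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs

# Three ranked completions yield three pairwise comparisons.
pairs = ranking_to_pairs(
    "Explain the moon landing to a 6 year old.",
    ["People went to the moon and took pictures...",  # ranked best
     "The moon landing happened in 1969.",
     "Gravity keeps us on the ground."])               # ranked worst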
Section 2: Instruction Fine Tuning with Reinforcement Learning
Sampling from Existing Prompt Data
The training dataset of prompt-generation pairs for the RM is built by sampling a set of prompts from a predefined dataset (Anthropic's data, generated primarily with a chat tool on Amazon Mechanical Turk, is available on the Hub; OpenAI used prompts submitted by users to the GPT API). The prompts are passed through the initial language model to generate new text.
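A sketch of that sampling step, assuming the Hugging Face transformers pipeline API and using GPT-2 as a stand-in for the initial language model; the prompt list is a placeholder for a real prompt dataset.

import random
from transformers import pipeline  # assumes the transformers library is installed

prompts = ["Explain the moon landing to a 6 year old.",
           "Write a short poem about autumn."]           # stand-in prompt dataset

generator = pipeline("text-generation", model="gpt2")    # stand-in initial LM

candidates = {}
for p in random.sample(prompts, k=2):
    outputs = generator(p, num_return_sequences=2, max_new_tokens=50, do_sample=True)
    candidates[p] = [o["generated_text"] for o in outputs]
# candidates now maps each prompt to completions for labelers to rank.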
General Process for RLHF
Reinforcement Learning
Agent
RL Policy = LLM
Environment
reward: r_t
action: a_t from token vocab
state: s_t from current context
Objective:
Generate aligned text!
Instruct LLM
Reward Model
question
answer
Reinforcement Learning
Agent
RL Policy = LLM
Environment
reward: r_t
action: a_t from token vocab
state: s_t from current context
Objective:
Generate aligned text!
Instruct LLM
Reward Model
question
answer
Training the Reward Model
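A sketch of the standard pairwise loss used to train the reward model on (chosen, rejected) comparisons: push the score of the preferred completion above the score of the rejected one. `reward_model` is a placeholder for any network mapping (prompt, completion) to a scalar tensor.

import torch.nn.functional as F

def reward_model_loss(reward_model, prompt, chosen, rejected):
    r_chosen = reward_model(prompt, chosen)      # score for the preferred answer
    r_rejected = reward_model(prompt, rejected)  # score for the rejected answer
    # -log sigmoid(r_chosen - r_rejected) is small when the chosen answer scores higher.
    return -F.logsigmoid(r_chosen - r_rejected).mean()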
Using the Reward Model
Reinforcement Learning
Agent
RL Policy = LLM
Environment
reward: r_t
action: a_t from token vocab
state: s_t from current context
Objective:
Generate aligned text!
Instruct LLM
Reward Model
question
answer
Reinforcement Learning
Agent
RL Policy = LLM
Environment
reward: r_t
action: a_t from token vocab
state: s_t from current context
Objective:
Generate aligned text!
Instruct LLM
Reward Model
question
answer
Interacting with the Reward Model
Prompt Dataset
Instruct LLM
Reward Model
“...a friendly animal”
0.24
iteration 1
“A dog is…”
Update LLM with an RL Algorithm
Prompt Dataset
Instruct LLM
RL Algorithm
Reward Model
“...a friendly animal”
0.24
iteration 1
“A dog is…”
prompt: “A dog is”
“...a friendly animal”
Update LLM with an RL Algorithm
prompt: “A dog is”
“...a friendly animal”
Prompt Dataset
RL Updated LLM
RL Algorithm
Reward Model
“...a friendly animal”
iteration 1
“A dog is…”
0.24
Continue to Optimize the LLM
prompt: “A dog is”
“...man’s best friend”
Prompt Dataset
RL Updated LLM
RL Algorithm
Reward Model
“...man’s best friend”
0.57
iteration 2
“A dog is…”
Continue to Optimize the LLM
prompt: “A dog is”
“...the best pet”
Prompt Dataset
RL Updated LLM
RL Algorithm
Reward Model
“...the best pet”
1.24
iteration 3
“A dog is…”
Continue to Optimize the LLM
prompt: “A dog is”
“...a canine”
Prompt Dataset
RL Updated LLM
RL Algorithm
Reward Model
“...a canine”
RLHF
3.28
iteration n
“A dog is…”
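A sketch of the loop these iteration slides walk through: sample a prompt, generate with the current policy, score with the reward model, and let the RL algorithm update the policy toward higher-reward completions. Every object here is a placeholder for a component named on the slides.

def rlhf_loop(prompt_dataset, policy_llm, reward_model, rl_update, iterations):
    for step in range(iterations):
        prompt = prompt_dataset.sample()           # e.g. "A dog is"
        completion = policy_llm.generate(prompt)   # e.g. "...a friendly animal"
        reward = reward_model(prompt, completion)  # e.g. 0.24, rising over iterations
        policy_llm = rl_update(policy_llm, prompt, completion, reward)
    return policy_llm                              # the human-aligned LLM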
Determining the Nature of the RL Algorithm
Prompt Dataset
Human Aligned LLM
Proximal Policy Optimization
Reward Model
“...a friendly animal”
RLHF
0.57
iteration n
“A dog is…”
prompt: “A dog is”
“...a friendly animal”
RLHF - General Concept
Proximal Policy Optimization (PPO)
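A sketch of the clipped surrogate objective at the core of PPO: the ratio between the new and old policy's token probabilities is clipped so one batch of reward signal cannot move the policy too far. The log-probability tensors and advantage estimates are assumed inputs, not defined on the slides.

import torch

def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(new_logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Pessimistic (elementwise minimum) objective, negated to give a loss to minimize.
    return -torch.min(unclipped, clipped).mean()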
General Process for RLHF
Avoiding Reward Hacking
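A sketch of the usual guard against reward hacking: keep a frozen copy of the original model and subtract a KL-style penalty whenever the updated policy drifts too far from it, so high reward-model scores cannot be bought with degenerate text. The per-token log-probability tensors are assumed inputs.

def penalized_reward(reward, policy_logprobs, reference_logprobs, beta=0.02):
    """reward: scalar from the reward model; logprobs: per-token tensors."""
    # Approximate KL divergence between the updated policy and the frozen reference model.
    kl = (policy_logprobs - reference_logprobs).sum()
    return reward - beta * kl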
Alternatives to RLHF
Aligning LLM Systems: Review
Large Language Models
Section 0: Administrivia
Section 1: In-Context Learning: Prompt Engineering
Section 2: Instruction Fine Tuning with RLHF
Section 3: Retrieval Augmented Generation (RAG)
Knowledge Cut-offs in LLMs
Challenges with LLMs
LLM Powered Applications - Retrieval Augmented Generation
Retrieval Augmented Generation - LLMs
Large Language Models do not store facts but rather probabilities over sequences of tokens; the information they hold is itself probabilistic. Indeed, we should use them as language models rather than as information models. We can think of them as effective implementations of auto-completers.
Retrieval Augmented Generation (RAG) Diagrammatically
Corpus of Data
Vector Store
Process to Vectors
User
Process to Vectors
Nearest Neighbors Lookup
Prompt Generation
Prompt Generation
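A sketch of the pipeline in this diagram, with `embed` standing in for whatever model processes text to vectors: the corpus chunks are embedded into a vector store, the user query is embedded the same way, a nearest-neighbors lookup selects context, and the prompt is generated from the query plus the retrieved chunks.

import numpy as np

def build_rag_prompt(query, chunks, embed, k=3):
    """chunks: list of text passages; embed: callable mapping text to a 1-D numpy vector."""
    store = np.stack([embed(c) for c in chunks])        # process corpus to vectors
    q = embed(query)                                    # process the query to a vector
    sims = store @ q / (np.linalg.norm(store, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]                         # nearest-neighbors lookup
    context = "\n\n".join(chunks[i] for i in top)       # prompt generation
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"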
RAG - Implemented via Amazon
Comparing Fine-Tuning with RAG
Example: Query RAG System and LLM Response - Search Legal Docs
Example: Query RAG System and LLM Response - Search Legal Docs
Retrieval Augmented Generation
Could You Use the LLM as a Vector Store?
Pros | Cons
A comprehensive, compact understanding of the information and documents at hand. | Added computational load on the LLM; retrieval speed would add to inference time; it is not entirely clear how to combine the prompt with the comprehensive data.
RAG External Sources
Chunking - Why?
Chunking - How Big?
Chunking - How Big?
Overlap - How Much?
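A sketch of fixed-size chunking with overlap, tying the last three slides together; the 500-character chunk size and 50-character overlap are illustrative defaults, not recommendations from the lecture.

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks; the overlap preserves context across boundaries."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]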
Data Preparation for RAG
Vector Database Search
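A sketch of the search step inside the vector database, assuming the FAISS library; the embedding dimension and random vectors are placeholders for real chunk embeddings.

import numpy as np
import faiss  # assumes the faiss library is installed

d = 384                                                      # illustrative embedding dimension
chunk_vectors = np.random.rand(1000, d).astype("float32")    # stand-in for stored chunk embeddings

index = faiss.IndexFlatIP(d)                                 # exact inner-product index
index.add(chunk_vectors)                                     # load the vector store

query_vector = np.random.rand(1, d).astype("float32")        # stand-in query embedding
scores, ids = index.search(query_vector, 5)                  # ids of the 5 nearest chunks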
Reviewing RAG Systems
Graveyard
Alternative embeddings before generation
Combining query with the context: what’s going on?
On the scale of LLMs
Understanding tokenization, chunking, etc.
What to use in the Vector Database?