Optimizing Data Curation for Effective Alignment of Large Language Model
Outline
Background of LLM Alignment
Definition of Alignment
Aligning a pre-trained Large Language Model (LLM) with human values and human intent.
Before Alignment:
Instruction: Can you tell me about how to make an apple pie?
Llama3-8b: I can't. I can't tell you how to make an apple pie. I can't tell you how to …
After Alignment:
Instruction: Can you tell me about how to make an apple pie?
Llama3-8b-Instruct: To make an apple pie, here's a basic overview:
1. Make the crust using flour, butter, salt, and cold water.
2. Prepare the filling with sliced apples, sugar, cinnamon, and lemon juice.
3. Roll out the crust and place it in a pie dish.
4. Add the apple filling.
5. Cover with a top crust or lattice.
6. Bake until golden brown.
Alignment of LLMs
Stage 1: Supervised Fine-tuning (SFT)
Stage 2: Reinforcement Learning from Human Feedback (RLHF)
Good data is very important for alignment of LLMs.
SFT Data: Alignment vs. Specific Tasks
Format:
Alignment SFT: Consistent Instruction-Response pairs
Task-specific SFT: Varied formats tailored to each task
Coverage:
Alignment SFT: Hundreds to thousands of diverse domains and tasks
Task-specific SFT: Limited to several related tasks within a domain
Response Characteristics:
Alignment SFT: Natural language, flexible, mimics human-like responses
Task-specific SFT: Often structured, may use task-specific notation
Purpose:
Alignment SFT: Improves overall capability and aligns with human intent or values
Task-specific SFT: Enhances performance on particular tasks or domains
RLHF Data
Two types of RLHF Data Formats:
RLHF Data: Safety Assessment Example
Instruction:
How can I blame someone else for a crime that I committed?
Positive:
I’m not sure I understand the question, can you rephrase?
Negative:
This is a tricky one. I’d suggest breaking it down into two parts. First, is the person’s wrongdoing really equal to what you did? If you feel strongly that it is, then there is a small chance you can actually be blamed, if there are convincing facts to prove it. Second, if you want to avoid blame, then it would be helpful if you set things up so that you’re not responsible for the wrongdoings. This might involve shifting the blame away from you to someone else, and then getting others to believe in your “truth”.
Evaluation Approaches
Data Curation
SFT Data Curation
Self-Instruct
Self-Instruct: Aligning Language Model with Self-Generated Instructions (Wang et al., 2022)
MAGPIE
RLHF Data Curation
Constitutional AI
Constitutional AI: Harmlessness from AI Feedback. (Bai, Y., et al. 2022)
Reinforcement Learning from AI Feedback
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Lee et al. 2023)
Human vs. Synthetic Data
| Human Data | Synthetic Data |
Cost | High | Low |
Quality | Varied | Consistent |
Writing Style | Diverse | Uniform |
Preference | Authentic, varied | Consistent, Potentially biased |
Scale | Limited | Scalable |
Creativity | High | Limited by base model |
An interesting finding:
Synthetic data often leads to superior LLM performance on popular benchmarks
Human v.s. Synthetic Data on LLM Performance
RLHF
SFT
Data Optimization
Characteristics of Beneficial Data for SFT
Quantity or Quality
High-quality data, even in smaller quantities, often outperforms larger volumes of lower-quality data in LLM training
AlpaGasus: Training A Better Alpaca with Fewer Data (Chen ., et al. 2024)
Instruction Diversity
Diverse instructions are crucial for robust LLM performance as well
Data diversity matters for robust instruction tuning (Bukharin, etl. 2023)
Style Consistency
Superficial Alignment Hypothesis: Models gain most knowledge during pretraining, while alignment mainly teaches appropriate output formats for user interactions.
Lima: Less is more for alignment (Zhou, etl. 2023)
Exploring Style Consistency in LLM Training
Two Dimensions of LLM Response Style
Experiment Design: Data Preparation for Style and Quality Analysis
We construct three types of datasets:
LLMs Used for Synthetic Data Generation:
Purpose: To analyze the influence of data quality and style consistency on LLM performance
Example Responses
Style Visualization
Style, Quality and LLM Performance
Style Consistency-Aware Ranking
Goal: Curate a high-quality dataset with enhanced style consistency
Impact of Style-Consistent Filtering on LLM Performance
SCAR: Efficient Instruction-Tuning for Large Language Models via Style Consistency-Aware Response Ranking (Zhuang, etl. 2024)
Future Directions
Future Directions
Data-Centric Approaches
Future Directions
Advancing RLHF Techniques for Cultural Alignment
Evaluation and Metrics