Artificial Intelligence
MSc course
Lab 11: Understanding and Applying Foundation Models
Outline
Foundation Models
Foundation Models
Bommasani et al. On the Opportunities and Risks of Foundation Models (2021)
Foundation Models
Hyperparameter cost problem
Hyperparameter tuning is a critical step in AI model training: it involves finding the best set of parameters that control the learning process.
Challenges
Common Approaches to address the problem
Guess and Pray: This approach relies on intuition and experience to choose hyperparameters. It is less systematic and can be hit-or-miss, but it requires fewer computational resources.
Exhaustive Search: Methods like grid search systematically explore a range of hyperparameter values. While thorough, this method is computationally expensive and impractical for high-dimensional hyperparameter spaces.
Scaling Laws: These involve using simple, predictive rules that guide the selection of hyperparameters based on the model's size and the dataset. This method tries to balance performance with computational feasibility, using empirical data and theoretical insights to inform choices. It's a more recent approach, gaining traction for its effectiveness in large-scale models.
Scaling Laws
Kaplan et al. Scaling Laws for Neural Language Models. (2020)
Scaling Laws
Data Scaling Laws
The data scaling law is a simple formula that maps dataset size (n) to error.
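A common empirical form of this law (following Kaplan et al., 2020) is a power law, error(n) ≈ a · n^(−b). The sketch below, with illustrative constants, shows how the exponent b can be recovered from measurements by a least-squares fit in log-log space:

```python
import math

# Data scaling law in power-law form: error(n) ≈ a * n**(-b).
# Taking logs gives a line, log(err) = log(a) - b*log(n),
# so b can be estimated with ordinary least squares in log space.

def fit_power_law(sizes, errors):
    """Fit error = a * n**(-b) by linear regression on (log n, log error)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - slope * mx)
    return a, -slope  # the fitted slope is -b

# Synthetic "measurements" generated from a = 2.0, b = 0.5:
sizes = [10 ** k for k in range(2, 7)]
errors = [2.0 * n ** -0.5 for n in sizes]
a, b = fit_power_law(sizes, errors)  # recovers a ≈ 2.0, b ≈ 0.5
```

On noiseless synthetic data the fit recovers the generating constants exactly; real loss curves also need the irreducible-error offset that the full scaling-law formulations include.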
What do we expect out of scaling laws
Data Scaling Laws - Empirical Observation
Data Scaling Laws - Toy Example
Model Scaling Laws - Parameters
Model Scaling Laws - Depth
Scaling Laws - Problems?
Levesque et al. The winograd schema challenge. (2012)
Scaling Laws - Problems?
Scaling Laws - Problems?
Scaling Laws - Problems?
Phase transitions are sudden, discontinuous jumps in performance.
Do we expect to see more phase transitions?
This is probably the big unknown in LM scaling!
Foundation Models Comparison
GPTs, Codex, DALL-E, CLIP
PaLM 2, Gopher, Chinchilla, Gemini
LLaMA, Alpaca
Jurassic
HELM
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models.
https://github.com/stanford-crfm/helm
https://crfm.stanford.edu/helm/latest/
Liang et al. Holistic Evaluation of Language Models. (2023)
Datasets
Neural networks are compressed/compiled versions of the training data. Therefore, the size of the dataset has to scale accordingly with the size of the model.
GPT-3 175B was trained on 300 billion tokens collected from a weighted combination of the following datasets:
Brown et al. Language Models are few-shot learners. (2020)
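Mixing corpora by fixed weights means small high-quality sources can be over-represented relative to their raw size. A minimal sketch of such weighted sampling, with mixture weights that are illustrative placeholders rather than the exact GPT-3 proportions:

```python
import random

# Weighted mixture sampling: each training example is drawn from one of the
# component corpora in proportion to a fixed mixture weight, independent of
# each corpus's raw size. Weights below are illustrative placeholders.
mixture = {
    "CommonCrawl": 0.60,
    "WebText2":    0.22,
    "Books1":      0.08,
    "Books2":      0.08,
    "Wikipedia":   0.03,
}

def sample_sources(k, rng=random):
    """Draw k corpus names according to the mixture weights."""
    names = list(mixture)
    weights = [mixture[n] for n in names]
    return rng.choices(names, weights=weights, k=k)

draws = sample_sources(10_000, random.Random(0))
# CommonCrawl dominates the draws; Wikipedia appears rarely despite
# being a high-quality source, unless its weight is raised.
```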
Datasets - The PILE
EleutherAI (a nonprofit organization committed to building open language models) released The Pile, a dataset for language modeling whose key idea is to draw on many smaller, high-quality sources (academic and professional).
Gao et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling (2020)
Training LLMs - Self Supervised
Self-supervision is a form of unsupervised learning where the data itself provides the supervision.
Self-supervised learning leverages the inherent structure of data to create pseudo-labels, allowing models to learn meaningful representations.
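For language models the pseudo-labels come for free: every position's target is simply the next token in the text. A minimal sketch of this labeling scheme:

```python
# Self-supervision for language modeling: the raw text supplies its own
# labels, since the target at each position is just the following token.

def next_token_pairs(tokens):
    """Turn a token sequence into (context, next-token) training pairs."""
    return [(tuple(tokens[: i + 1]), tokens[i + 1])
            for i in range(len(tokens) - 1)]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
pairs = next_token_pairs(tokens)
# e.g. (("the", "cat"), "sat") -- no human annotation required
```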
Chen et al. Big Self-Supervised Models are Strong Semi-Supervised Learners (2020)
Training LLMs - RLHF and RLAIF
Reinforcement Learning with Human Feedback (RLHF) involves humans providing additional signals or modifying reward structures to guide learning.
Humans demonstrate desired behaviors, which the RL agent then tries to imitate.
Griffith et al. Policy shaping: Integrating human feedback with reinforcement learning. (2013)
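The core move in RLHF is converting human preference judgments over pairs of outputs into a scalar reward signal. The toy sketch below uses simple win rates as the reward proxy; real pipelines fit a learned reward model (typically under a Bradley-Terry preference model) and then optimize the policy against it with RL:

```python
from collections import defaultdict

# Toy reward signal from pairwise human preferences: each response's
# reward is its win rate across the comparisons it appears in.

def rewards_from_preferences(comparisons):
    """comparisons: list of (winner, loser) pairs labeled by humans."""
    wins, seen = defaultdict(int), defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        seen[winner] += 1
        seen[loser] += 1
    return {resp: wins[resp] / seen[resp] for resp in seen}

prefs = [("polite", "rude"), ("polite", "evasive"), ("evasive", "rude")]
reward = rewards_from_preferences(prefs)
# "polite" wins every comparison it appears in; "rude" wins none.
```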
How to improve FMs without spending millions
Prompt Crafting or Engineering
Prompt crafting or engineering is a method that involves meticulously designing input prompts to enhance the performance of language models for specific tasks. It requires an in-depth understanding of the model and the task, focusing on the choice of words, structure, and context to guide the model towards generating accurate and relevant responses.
Fine tuning pre-trained models
Fine-tuning a pre-trained model involves adjusting it for a specific task by selecting an appropriate dataset, optimizing learning rates, and reducing the number of training iterations. This approach effectively utilizes the model's existing learned features, conserves computational resources, and can be customized to meet specific needs.
Task Specialization
Prompt Engineering
Prompt engineering can include tactics such as:
Zero-shot Learning
Definition: Zero-shot Learning refers to the scenario where the FM is asked to perform a task unseen during training. This capability is made possible by the broad pretraining, which exposes the model to diverse scenarios and tasks. For example, a model can identify animals not by directly learning from images of them, but by understanding and applying descriptive attributes about them.
Advantages
Limitations
Few Shot Learning
Definition: Few-shot Learning refers to the scenario where the model is expected to generalize from a limited number of examples. In the context of Foundation Models, it involves providing a few examples of a particular task to the model in the form of a prompt. This helps the model understand the desired output and generalize from the examples to perform the task on new inputs.
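In practice the "training" happens entirely inside the prompt: a handful of solved examples is prepended before the new input. A minimal prompt-builder sketch, with an illustrative sentiment task:

```python
# Few-shot prompting: solved examples are placed before the query so the
# model can infer the task format. Task and examples are illustrative.

def few_shot_prompt(examples, query, instruction="Classify the sentiment."):
    lines = [instruction]
    for text, label in examples:
        lines.append(f"Text: {text}\nSentiment: {label}")
    lines.append(f"Text: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = few_shot_prompt(
    [("I loved this film.", "positive"),
     ("A dull, tedious mess.", "negative")],
    "An absolute delight from start to finish.",
)
# The prompt ends with "Sentiment:", inviting the model to fill the label.
```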
Advantages
Limitations
Chain of Thought
Definition: Chain of Thought (CoT) is achieved by prompting the models to generate a series of intermediate steps that lead to the final answer of a multi-step problem. The technique improves results on reasoning tasks that require logical thinking and multiple steps to solve. CoT can compete with task-specific fine-tuned models on several tasks.
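Concretely, the exemplar in the prompt includes the intermediate reasoning, not just the final answer, which nudges the model to emit its own derivation before answering. A sketch with an illustrative worked example:

```python
# Chain-of-thought prompting: the exemplar answer shows the intermediate
# steps ("5 + 6 = 11"), not just the result. Exemplar is illustrative.

COT_EXEMPLAR = (
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. "
    "How many balls does he have?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11."
)

def cot_prompt(question):
    """Prepend the worked exemplar to a new multi-step question."""
    return f"{COT_EXEMPLAR}\n\nQ: {question}\nA:"

p = cot_prompt("A baker makes 4 trays of 6 muffins and sells 9. "
               "How many are left?")
```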
Advantages
Limitations
Chain of Thought
Wei et al. Chain-of-thought prompting elicits reasoning in large language models. (2022)
Self-Reflection
Definition: Self-Reflection is achieved by prompting the models to introspect and analyze their own outputs and reasoning process. The model leverages the underlying knowledge and understanding gained during the pretraining phase to provide insights about its own reasoning.
Advantages
Limitations
Self-Reflection
Lara and Deckers. Artificial intelligence as a socratic assistant for moral enhancement. (2020)
Retrieval-Augmented Generation (RAG)
Lewis et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. (2020)
Retrieval-Augmented Generation (RAG)
Chen et al. Re-imagen: Retrieval-augmented text-to-image generator (2022).
Training Adaptation
Linear Probing involves training a lightweight model (typically a linear classifier) on top of the frozen pretrained model to adapt it to the specific task.
Fine-Tuning involves continuing the training of the pretrained model on the specific task.
Mixed Approaches combine different methods like linear probing and fine-tuning to achieve better results.
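The three regimes differ only in which parameter groups receive gradient updates. A framework-agnostic sketch (in a real setting these flags would correspond to e.g. `requires_grad` in PyTorch; the layer names and the "mixed" policy below are illustrative):

```python
# Which parameter groups are trainable under each adaptation regime.

def adaptation_plan(layers, mode):
    """Return {layer: trainable?} for 'linear_probe', 'fine_tune', 'mixed'."""
    if mode == "linear_probe":       # freeze the backbone, train the head
        return {l: (l == "head") for l in layers}
    if mode == "fine_tune":          # every layer keeps receiving updates
        return {l: True for l in layers}
    if mode == "mixed":              # e.g. keep only the embeddings frozen
        return {l: (l != "embed") for l in layers}
    raise ValueError(f"unknown mode: {mode}")

layers = ["embed", "block1", "block2", "head"]
probe = adaptation_plan(layers, "linear_probe")
# Only "head" is trainable under linear probing.
```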
Training Adaptation
Using Tools with LLMs
Foundation models can learn how to use tools directly from input prompts. The large-scale training data often include examples of using tools, which the models generalize from.
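Operationally, tool use means the model emits a structured call inside its output, a controller extracts and executes it, and the result is spliced back into the text. A toy sketch; the `CALC[...]` syntax and the restricted-eval calculator are illustrative, not any particular system's API:

```python
import re

# Toy tool-use loop: find CALC[expr] markers emitted by the model,
# evaluate the arithmetic expression, and substitute the result.

TOOL_PATTERN = re.compile(r"CALC\[([0-9+\-*/(). ]+)\]")

def run_tools(model_output):
    """Replace each CALC[expr] with its evaluated result."""
    return TOOL_PATTERN.sub(
        # Restrict eval to pure arithmetic by removing builtins.
        lambda m: str(eval(m.group(1), {"__builtins__": {}})),
        model_output,
    )

text = "The total cost is CALC[3 * (10 + 2)] dollars."
resolved = run_tools(text)  # "The total cost is 36 dollars."
```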
Advantages
Coding
FMs can write code in various programming languages, understand syntax, solve algorithmic problems, and debug. For example, they can generate a Python or C++ function to solve a system of linear equations.
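For concreteness, here is the kind of routine such a prompt might produce: Gaussian elimination with partial pivoting in pure Python (in practice one would reach for `numpy.linalg.solve`):

```python
# Solve A x = b by Gaussian elimination with partial pivoting.

def solve_linear_system(A, b):
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        # Pivot on the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back-substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

# 2x + y = 3,  x + 3y = 5  ->  x = 0.8, y = 1.4
x = solve_linear_system([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```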
Advantage
Limitation
Searching the Net
Models like GPT-4 generate responses based on their training data. However, allowing these models to interact with the internet in real time could greatly improve their responses, keeping them current with the latest information and enabling fact-checking against live data. For instance, they could provide recent stock market trends, up-to-date news, or even the latest scientific research.
Advantages
Limitations
Vector-based Databases
Vector-based Databases
Vector-based Databases
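The core operation of a vector database, in miniature: store (id, embedding) pairs and answer nearest-neighbour queries by cosine similarity. Real systems use approximate indexes (HNSW, IVF, ...) to stay fast at scale; the documents and embeddings below are illustrative:

```python
import math

# Brute-force nearest-neighbour search over stored embeddings.

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def top_k(store, query, k=1):
    """store: dict of id -> embedding; return the k most similar ids."""
    return sorted(store, key=lambda i: cosine(store[i], query),
                  reverse=True)[:k]

store = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.2, 0.1],
    "doc_tax":  [0.0, 0.1, 0.9],
}
hits = top_k(store, [1.0, 0.0, 0.0], k=2)  # the two animal documents
```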
Inverse Scaling Phenomenon
Memory Trap
The Memory Trap refers to a tendency of LMs to default to replicating memorized text, often overruling specific instructions to generate novel or specific content. For example, given the famous quote, "Two things are infinite: the universe and human stupidity, but regarding the universe ...", a large LM is more likely to finish it as per the original quote, rather than generating a unique ending, despite being prompted to do so.
Logic Issues
LLMs often struggle with logical reasoning tasks, including the ability to accurately perform deductions. As models scale, this problem becomes more pronounced, a phenomenon known as inverse scaling.
As LLMs continue to be integrated into decision-making processes, understanding and mitigating their logical fallacies becomes crucial. Models need to be able to correctly interpret and apply logical reasoning to avoid incorrect or harmful outputs.
AlphaCode 2
Li et al. Competition-level code generation with AlphaCode (2022)
AlphaCode 2
FunSearch
Romera-Paredes et al. Mathematical discoveries from program search with large language models (2023)
MetaMorph
Gupta et al. MetaMorph: Learning universal controllers with transformers (2022)
Examples of Applications
Future Trends & Challenges