1 of 30

LLM SOTA

By Rajat, Kuldeep

Real World

AI Bootcamp

2 of 30

What are Foundation Models?

  • Foundation models (FMs) are large neural networks trained on large amounts of unlabelled data.
  • Unlike narrower traditional AI models, foundation models are “general-purpose” and can be “adapted” to many different tasks with little or no new task-specific training data.

3 of 30

Traditional ML vs Foundation Models

4 of 30

How to use FMs?

  • Prompting

  • Fine-tuning

5 of 30

Some Examples of FMs

  • BERT (Bidirectional Encoder Representations from Transformers), 2018
    • BERT was pre-trained with a bidirectional objective: masked language modeling (MLM)
    • The pre-trained BERT model produces contextual token-level embeddings that support language understanding across a range of NLP tasks, including sentiment analysis, text summarization, and semantic similarity.

  • GPT (Generative Pre-trained Transformer), 2018 onwards
    • GPT was trained on a next-word prediction objective
    • Outperformed previous-generation models on generative and language understanding tasks

  • CLIP (Contrastive Language-Image Pre-training), 2021
    • A multimodal vision and language model
    • Trained on 400 million image-text pairs with the goal of learning how to connect language and images
    • Used in image retrieval and image-text matching tasks
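
The two text objectives above differ mainly in how training examples are built from a sentence. A minimal sketch, assuming whitespace-split tokens and a simplified masking rule (real BERT pre-training uses subword tokens and a more involved replacement scheme); the function names are illustrative, not from any library:

```python
import random

MASK = "[MASK]"

def masked_lm_example(tokens, mask_prob=0.15, seed=0):
    """BERT-style: hide some tokens; the model predicts them using both sides."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)    # loss is computed only at masked positions
        else:
            inputs.append(tok)
            labels.append(None)   # position ignored by the loss
    return inputs, labels

def next_token_examples(tokens):
    """GPT-style: each prefix (left context only) predicts the next token."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

sentence = "foundation models adapt to many tasks".split()
print(masked_lm_example(sentence, mask_prob=0.3))
print(next_token_examples(sentence)[:2])
```

The masked variant sees context on both sides of each gap, while the next-token variant only ever sees what came before, which is what makes it a natural fit for generation.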

6 of 30

Scaling Laws

  • Computation: Enhanced hardware availability boosts training efficiency and model performance.

  • Data: Access to large datasets enables models to learn diverse and comprehensive patterns.

  • Transformer Model Architecture: Self-attention mechanisms improve the model's ability to capture complex relationships in data

7 of 30

Benefits of FM

  • Generalization
    • FMs can be adapted to perform a wide range of tasks

  • Enhanced performance
    • Often achieve state-of-the-art (SOTA) performance on many tasks by leveraging vast amounts of training data and learning complex patterns

  • Accelerated time to value:
    • 84% of organizations successfully take a gen AI use case from idea to production within six months

  • User Experience:
    • 80% of organizations report improved user satisfaction due to gen AI, which in turn boosts user engagement

8 of 30

How are FMs trained?

  • Pretraining
    • Trains the model on a massive amount of lower-quality data
  • Fine-tuning
    • Teaches the model to respond to instructions
  • Alignment
    • Helps the model produce output closer to human preferences

9 of 30

Emergent Abilities

  • Emergent abilities are unexpected skills or behaviors that arise when a model is scaled up and trained on a lot of data

  • Emergent abilities are not present in small models but can be observed in large models

10 of 30

Emergent Abilities – In-Context Learning (ICL)

  • Few-Shot Learning:
    • Few-shot learning is a machine learning approach where a model learns to perform a task from only a small number of examples; with ICL, those examples are supplied in the prompt

  • It leads to quick task adaptation:
    • FMs adapt quickly to new tasks, which makes them extremely useful in scenarios where data is scarce
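
As a rough illustration of few-shot in-context learning, the sketch below assembles a prompt from a handful of labelled examples. The `Input:` / `Output:` layout and the function name are assumptions for illustration; any consistent format can work, and no model weights are updated.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")   # left open for the model to complete
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great battery life.", "positive"), ("Stopped working in a week.", "negative")],
    "The screen is sharp and bright.",
)
print(prompt)
```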

11 of 30

Emergent Abilities – Reasoning and Planning

  • Reasoning is the ability to solve problems, make decisions, and draw logical conclusions from information.

  • Can LLMs truly think? Can they reason?
    • Chain-of-Thought Prompting
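
A minimal sketch of chain-of-thought prompting, assuming the common "Let's think step by step" cue and a completion that ends with "The answer is N". Both conventions and all function names here are illustrative assumptions, not a fixed API.

```python
import re

def direct_prompt(question):
    """Ask for the answer with no reasoning cue."""
    return f"Q: {question}\nA:"

def chain_of_thought_prompt(question):
    """Same question, but the trailing cue invites intermediate steps."""
    return f"Q: {question}\nA: Let's think step by step."

def extract_final_answer(completion):
    """Pull the number after 'The answer is' from a completion, if present."""
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None

print(chain_of_thought_prompt("A box holds 4 pens. How many pens are in 3 boxes?"))
# Hypothetical model completion, written by hand for this example:
print(extract_final_answer("Each box has 4 pens. 3 * 4 = 12. The answer is 12."))
```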

12 of 30

Text

13 of 30

What has changed in the last 5 years?

  • Increased Context Length
    • Context window refers to the length of text an AI model can process and respond to in a given instance.

    • Better ways to handle self-attention (KV cache, sliding-window attention)

    • Better positional encoding

  • Better Sampling
    • Test-time sampling - generate multiple outputs and select the best one

    • Constrained sampling - guides text generation so that the output satisfies given constraints
      • Structured Output

  • Alignment with Human Preference
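
The two sampling ideas above can be sketched in a few lines. This is a toy version under stated assumptions: `generate` and `score` are hypothetical callables standing in for a model and a quality scorer, and the constraint is a simple set of allowed tokens rather than a full grammar or schema.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def constrained_sample(logits, vocab, allowed, rng):
    """Constrained sampling: mask disallowed tokens, then sample as usual."""
    if not any(tok in allowed for tok in vocab):
        raise ValueError("no allowed token in vocabulary")
    masked = [l if tok in allowed else float("-inf") for l, tok in zip(logits, vocab)]
    return rng.choices(vocab, weights=softmax(masked), k=1)[0]

def best_of_n(generate, score, n):
    """Test-time sampling: draw n candidates and keep the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

rng = random.Random(0)
vocab = ["yes", "no", "maybe"]
# "maybe" has the highest logit but is never sampled because it is masked out.
print(constrained_sample([0.1, 0.2, 5.0], vocab, {"yes", "no"}, rng))
```

Structured output works on the same principle: at each step, only tokens that keep the output valid (for example, valid JSON) are left unmasked.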

14 of 30

What has it led to?

  • Better Contextual Understanding:
    • Analyzes context from user input to generate relevant responses

  • Better Pattern Recognition:
    • Learns from patterns in data to predict and generate data

  • Function Calling:
    • LLMs accessing real-world tools and APIs

  • Human-like conversation:
    • Able to act as an assistant
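
Function calling can be pictured as the model emitting a structured request that the application then executes. A minimal sketch, assuming the model returns JSON with `name` and `arguments` fields; the exact schema varies by provider, and `get_weather` is a hypothetical stand-in tool.

```python
import json

def get_weather(city):
    """Hypothetical tool; a real one would call a weather API."""
    return {"city": city, "forecast": "sunny"}

# Registry of tools the model is allowed to call.
TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(model_output):
    """Parse the model's JSON tool call and run the matching Python function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Hand-written example of what a model might emit:
print(dispatch_tool_call('{"name": "get_weather", "arguments": {"city": "Pune"}}'))
```

The tool result is typically sent back to the model so it can phrase the final answer for the user.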

15 of 30

Better Chat Quality (e.g., customer support bot)

16 of 30

Retrieval Augmented Generation (RAG)
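
A toy sketch of the RAG flow: retrieve the most relevant documents for a query, then place them in the prompt. The bag-of-words `embed` function is an assumption made to keep the example self-contained; a real system would use a learned embedding model and a vector index.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a learned encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_rag_prompt(query, documents, k=2):
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

docs = [
    "refunds are processed within 5 days",
    "our office is in Pune",
    "shipping takes 2 days",
]
print(build_rag_prompt("how long do refunds take", docs, k=1))
```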

17 of 30

Code Generation

  • Text-to-SQL
    • Limited generation space, less chance of errors
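
Because SQL has a strict grammar and a known schema, generated queries can be checked before they run. A minimal sketch using Python's built-in `sqlite3`: the query is prefixed with `EXPLAIN` so SQLite compiles it against an empty in-memory copy of the schema without executing it. The schema and queries are made-up examples.

```python
import sqlite3

def validate_sql(schema_ddl, generated_sql):
    """Return (True, None) if SQLite can compile the query, else (False, error)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)             # build empty tables
        conn.execute(f"EXPLAIN {generated_sql}")   # compile only, do not run
        return True, None
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()

schema = "CREATE TABLE orders (id INTEGER, amount REAL);"
print(validate_sql(schema, "SELECT SUM(amount) FROM orders"))
print(validate_sql(schema, "SELECT total FROM orders"))  # unknown column
```

A failed check can be fed back to the model as an error message for a retry.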

18 of 30

Reasoning and Planning – OpenAI “o1”

  • o1 models think before they answer, and can produce a long internal chain of thought before responding to the user.

19 of 30

Images and Videos

20 of 30

What has changed in the last 5 years?

  • Receptive Field
    • Receptive field refers to the region of the input image that a particular neuron is “looking at”
    • Local vs Global
    • Related to longer context window

  • Processing high resolution data
    • Computation, Model architecture
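
For a plain stack of convolution or pooling layers, the receptive field can be computed with the standard recurrence below. This sketch assumes no dilation; the layer lists are made-up examples.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    """
    rf, jump = 1, 1   # jump = distance in input pixels between adjacent units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Three 3x3 convolutions with stride 1: each unit sees a 7x7 input patch (local).
print(receptive_field([(3, 1), (3, 1), (3, 1)]))
# Striding grows the field faster: two 3x3 convolutions with stride 2.
print(receptive_field([(3, 2), (3, 2)]))
```

Convolutional stacks grow their receptive field layer by layer, whereas a global self-attention layer lets every position attend to the whole input at once, which is the local vs global contrast above.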

21 of 30

What has it led to?

  • Global context lets the model learn more nuanced features than local patterns alone

  • Enhanced segmentation has improved applications like
    • Medical imaging (e.g., tumor detection)
    • Autonomous driving (e.g., road and obstacle identification)
    • Remote sensing (e.g., land cover classification)

22 of 30

Multimodal

23 of 30

24 of 30

What are Multimodal models?

25 of 30

What has changed in the last 5 years?

  • Better Alignment between embeddings of different modalities

    • Contrastive Learning
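
A rough sketch of a CLIP-style symmetric contrastive loss in pure Python, assuming paired image and text embeddings where row k of each list belongs together. The temperature value and the tiny 2-D embeddings are illustrative assumptions; real training uses learned encoders and large batches.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def cross_entropy_row(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Pull matching image-text pairs together, push mismatched pairs apart."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    sims = [[dot(i, t) / temperature for t in txts] for i in imgs]
    # Image -> text: each image should pick out its own caption.
    i2t = sum(cross_entropy_row(sims[k], k) for k in range(n)) / n
    # Text -> image: each caption should pick out its own image.
    t2i = sum(cross_entropy_row([sims[r][k] for r in range(n)], k) for k in range(n)) / n
    return (i2t + t2i) / 2

aligned = [[1.0, 0.0], [0.0, 1.0]]
print(contrastive_loss(aligned, aligned))                   # near zero
print(contrastive_loss(aligned, [[0.0, 1.0], [1.0, 0.0]]))  # large
```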

26 of 30

What has it led to?

  • Document Understanding
    • Better OCR
    • Better Context understanding

  • Image / Video / Audio Editing

  • Diffusion Models Are Real-Time Game Engines (https://gamengen.github.io/)

27 of 30

What has it led to?

  • Image Generation (Text-to-Image)
    • Advances in GANs have led to more realistic image generation, especially with architectures like StyleGAN and BigGAN.

  • Video Generation (Text-to-Video)
    • Leverage a transformer architecture that operates on spacetime patches of video and image latent codes

28 of 30

Challenges with FMs

  • Computation scarcity
    • FMs require enormous computational resources to train and refine.

  • Data scarcity
    • Limited access to high-quality, diverse datasets can hinder model performance and generalization.

  • Bias
    • FMs can inherit and amplify biases present in training data, leading to unfair or inaccurate outcomes.

  • Consent
    • Copyright issues
    • Ethical issues

29 of 30

How to choose FM? (https://artificialanalysis.ai/)

  • Modality
    • Text / Image / Audio / Multimodal

  • Access
    • Open vs Closed

  • Latency
    • Real time vs Offline

  • Price

  • Context Window size

  • Quality
    • Benchmark

30 of 30

Thank You
