1 of 34

Retrieval Augmented Generation (RAG)

Parvez M Robin

Software Engineer

Siemens

2 of 34

$ WHOAMI

  • Software Engineer at Siemens
    • Focusing on Electronic Design Automation and Artificial Intelligence
  • Master's from Dalhousie University
    • Focusing on understanding software bugs using neural language models
  • Public Speaker
    • Focusing on software engineering and artificial intelligence
  • Hiker
    • Focusing on uphill
  • Biker
    • Focusing on downhill

3 of 34

LET THERE BE LLM

  • Gemini 1.5 knows nothing after November 2023
    • April 2023 for GPT-4 and Llama 2
  • You can memorize only so much
    • Parameters can only store a limited amount of knowledge
    • In 2020, the internet hit 64 zettabytes (trillion gigabytes) *
  • LLMs tend to prefer popularity over accuracy
  • LLMs are not domain experts

* https://healthit.com.au/how-big-is-the-internet-and-how-do-we-measure-it/

4 of 34

JUST AN EXAMPLE

5 of 34

6 of 34

You can think of the large language model as an over-enthusiastic new employee who refuses to stay informed with current events but will always answer every question with absolute confidence.

– Amazon Web Services Documentation

Source: https://aws.amazon.com/what-is/retrieval-augmented-generation

7 of 34

RAG TO THE RESCUE

LLM

  • Knows nothing new
  • Can memorize only so much
  • Tends to prefer popularity over accuracy
  • Is not a domain expert

RAG

  • Can pull in up-to-date information
  • Can retrieve external data as needed
  • Relies on specific, relevant sources for responses
  • Supports domain-specific expertise by pulling from specialized resources

8 of 34

HOW RAG WORKS

9 of 34

RETRIEVAL AUGMENTED GENERATION

  • A retrieval module responsible for retrieving the latest and/or private information
  • Narrows down to the most useful information
  • A generator module to generate a human-friendly response
  • Integrates multiple sources
  • Balances contextual and retrieved information (a minimal sketch of the two modules follows)
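To make the two modules concrete, here is a minimal, hedged sketch in Python. The keyword-overlap retriever and the stubbed generator are illustrative stand-ins invented for this example; a real system would use an embedding index and an actual LLM call.

```python
def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Retrieval module: score each document by word overlap with the query (illustrative only)."""
    query_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: len(query_words & set(doc.lower().split())), reverse=True)
    return ranked[:k]


def generate(query: str, documents: list[str]) -> str:
    """Generator module: a real system would call an LLM with the query and the documents."""
    context = "\n".join(f"- {d}" for d in documents)
    return f"Answer to '{query}', grounded in:\n{context}"


corpus = [
    "RAG retrieves external documents before generating an answer.",
    "LLM parameters can store only a limited amount of knowledge.",
    "Reranking narrows retrieval down to the most useful information.",
]
print(generate("How does RAG retrieve documents?", retrieve("How does RAG retrieve documents?", corpus)))
```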

10 of 34

RAG ARCHITECTURE

[Architecture diagram] The user query goes to the RAG system, which forwards it to the knowledge source; the retrieved documents come back, are reranked, and the prompt, the query, and the reranked documents are sent to the LLM, which produces the final response (a code sketch of this flow follows).
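The flow above can be sketched as plain Python; every function name here (search_knowledge_source, rerank, call_llm, rag_pipeline) is a placeholder invented for this example, not a real library API.

```python
def search_knowledge_source(query: str) -> list[str]:
    # Placeholder: would query a vector store or search index.
    return ["doc about RAG", "doc about LLM limits", "unrelated doc"]


def rerank(query: str, docs: list[str]) -> list[str]:
    # Placeholder: would use a cross-encoder or LLM-based reranker; here, just truncate.
    return docs[:2]


def call_llm(prompt: str) -> str:
    # Placeholder: would call a hosted LLM API.
    return f"(LLM response for a prompt of {len(prompt)} characters)"


def rag_pipeline(query: str) -> str:
    retrieved = search_knowledge_source(query)   # Query -> Knowledge Source -> Retrieved Documents
    reranked = rerank(query, retrieved)          # Retrieved Documents -> Reranked Documents
    prompt = (
        "Answer the question using only the documents below.\n"
        + "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(reranked))
        + f"\n\nQuestion: {query}"
    )                                            # Prompt + Query + Docs
    return call_llm(prompt)                      # LLM -> Final Response


print(rag_pipeline("Why do LLMs need retrieval?"))
```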

11 of 34

TYPES OF RAG

RAG SEQUENCE

  • Generates one response per retrieved document
  • Chooses the best final response from the candidates
  • Best when every retrieved document contains the full answer
  • E.g., Google Search

RAG TOKEN

  • Treats the retrieved documents as a series of tokens
  • Generates a single response using all of them
  • Best when each retrieved document has part of the full answer
  • E.g., creative writing, software debugging (see the code sketch below)

Reference: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Facebook AI Research (FAIR)
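A rough, illustrative contrast between the two strategies, simplified well beyond the FAIR paper's actual marginalization scheme; generate_answer and score_answer are stand-ins for model calls.

```python
def generate_answer(query: str, docs: list[str]) -> str:
    # Placeholder for an LLM call conditioned on the given documents.
    return f"answer to '{query}' using {len(docs)} doc(s)"


def score_answer(answer: str) -> float:
    # Placeholder for a likelihood or quality score from the model.
    return float(len(answer))


def rag_sequence(query: str, docs: list[str]) -> str:
    # One candidate response per retrieved document, then pick the best-scoring one.
    candidates = [generate_answer(query, [doc]) for doc in docs]
    return max(candidates, key=score_answer)


def rag_token(query: str, docs: list[str]) -> str:
    # All retrieved documents are fed together, producing a single response.
    return generate_answer(query, docs)


docs = ["doc A", "doc B", "doc C"]
print(rag_sequence("What is RAG?", docs))
print(rag_token("What is RAG?", docs))
```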

12 of 34

ADVANTAGES

Responsible AI

Precise source of truth

Improved accuracy and relevance

Support for privacy

Minimal hallucination

13 of 34

SOME BUZZWORDS

Token

Embedding

Context window

Chunk size

Prompt

Zero shot learning

Few shot learning

Hallucination

Agent
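A toy illustration of three of these terms (token, chunk size, context window) using whitespace tokens; real tokenizers split text differently, so the counts are only indicative.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    tokens = text.split()                         # token: the unit the model reads
    return [" ".join(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


document = "Retrieval augmented generation grounds an LLM in external documents " * 10
chunks = chunk_text(document, chunk_size=16)      # chunk size: tokens per retrieved piece
context_window = 128                              # context window: max tokens the model sees at once
fits = sum(len(c.split()) for c in chunks[:4]) <= context_window
print(len(chunks), "chunks;", "first four fit in the window" if fits else "too large for the window")
```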

14 of 34

COMPARING LLMS FOR RAG

A study by Galileo

15 of 34

EXPERIMENTAL SETUP

  • Short Context: less than 5k tokens (RAG on a few pages)
  • Medium Context: 5k to 25k tokens (RAG on a book's chapter)
  • Long Context: 40k to 100k tokens (RAG on a book)

16 of 34

OPEN SOURCE IS CLOSING THE GAP

While closed-source models still offer the best performance, open-source models like Llama and Qwen continue to improve

Source: https://www.rungalileo.io/ty/hallucinationindex

17 of 34

MEDIUM CONTEXT LENGTH IS THE KEY

Most of these models perform the best when provided with a context of 5k to 25k tokens

Source: https://www.rungalileo.io/ty/hallucinationindex

18 of 34

ANTHROPIC OUTPERFORMS OPENAI

Anthropic’s latest Claude 3.5 Sonnet and Claude 3 Opus consistently beat out GPT-4o and GPT-3.5.

Source: https://www.rungalileo.io/ty/hallucinationindex

19 of 34

LARGER IS NOT ALWAYS BETTER

In certain cases, smaller models outperformed larger models. Specifically, Gemini 1.5 Flash from Google performed unexpectedly well.

Source: https://www.rungalileo.io/ty/hallucinationindex

20 of 34

A COST-EFFECTIVE RAG

21 of 34

MAKE IT BETTER

22 of 34

HIERARCHICAL DOCUMENTS

  • Organize documents into a structured hierarchy
  • Use metadata and semantic relationships to route retrieval to the right part of the hierarchy (see the sketch below)
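A minimal sketch of the idea, under the assumption that each section carries a summary (parent) and chunks (children); the Section class and the word-overlap scorer are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    summary: str                                       # parent-level summary used for coarse routing
    chunks: list[str] = field(default_factory=list)    # child chunks searched after routing


def overlap(a: str, b: str) -> int:
    """Crude relevance signal: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def hierarchical_retrieve(query: str, sections: list[Section], k: int = 2) -> list[str]:
    # Match the query against section titles/summaries first, then search only that section's chunks.
    best = max(sections, key=lambda s: overlap(query, s.title + " " + s.summary))
    return sorted(best.chunks, key=lambda c: overlap(query, c), reverse=True)[:k]


sections = [
    Section("Installation", "How to install and configure the tool",
            ["Download the installer", "Add the binary to PATH"]),
    Section("Debugging", "How to find and fix a bug in the code",
            ["Enable verbose logging to see the bug", "Attach the debugger and step through the code"]),
]
print(hierarchical_retrieve("how to fix a bug", sections))
```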

23 of 34

RECURSIVE RETRIEVAL

  • Given the retrieved documents, ask the LLM whether there are any confusing topics
  • Search again for documents on the confusing topics
  • Repeat until the LLM is confident enough (see the sketch below)
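A hedged sketch of the loop; search and find_confusing_topics are placeholders for the retriever and the LLM, and the stopping rule is deliberately simplistic.

```python
def search(topic: str) -> list[str]:
    # Placeholder retriever call.
    return [f"document about {topic}"]


def find_confusing_topics(docs: list[str]) -> list[str]:
    # Placeholder: ask the LLM which terms in the documents it is still unsure about.
    # Here we pretend nothing is confusing once more than three documents are gathered.
    return [] if len(docs) > 3 else ["follow-up topic"]


def recursive_retrieve(query: str, max_rounds: int = 5) -> list[str]:
    docs = search(query)
    for _ in range(max_rounds):
        confusing = find_confusing_topics(docs)
        if not confusing:                 # the LLM is confident enough; stop
            break
        for topic in confusing:           # search again for the confusing topics
            docs += search(topic)
    return docs


print(recursive_retrieve("What causes this crash?"))
```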

24 of 34

MULTISTEP REASONING: CHAIN OF THOUGHT

  • Ask the LLM to break the task into multiple steps
  • Ask it to solve each step
  • Ask it to explain its reasoning in each step
  • Ask it to stitch everything together

  • Consider parallelizing steps, if possible (a sketch follows)

Source: ART: Automatic multi-step reasoning and tool-use for large language models, Microsoft Research, Allen Institute for AI, Meta AI
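A sketch of the multistep pattern described above; call_llm stands in for any chat-completion API, and the prompts are illustrative rather than the ART paper's exact formulations.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for an actual LLM API call.
    return f"(model output for: {prompt[:40]}...)"


def solve_with_steps(task: str) -> str:
    # 1. Ask the LLM to break the task into steps.
    plan = call_llm(f"Break this task into numbered steps:\n{task}")
    steps = [s for s in plan.splitlines() if s.strip()]

    # 2-3. Ask it to solve each step and explain its reasoning.
    solutions = [call_llm(f"Solve this step and explain your reasoning:\n{step}") for step in steps]

    # 4. Ask it to stitch everything together into one final answer.
    joined = "\n".join(solutions)
    return call_llm(f"Combine these partial results into one answer for '{task}':\n{joined}")


print(solve_with_steps("Diagnose why the nightly build fails"))
```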

25 of 34

MULTISTEP REASONING CONTD.: CHAIN OF NOTE

  • Generate sequential reading notes for each retrieved document
  • Thoroughly evaluate their relevance
  • Integrate this information to formulate the final answer (see the sketch below)

Source: Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models, Tencent AI Lab
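An illustrative sketch of the Chain-of-Note flow; the note-taking and integration prompts are invented for this example and are not taken from the Tencent paper.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for an actual LLM API call.
    return f"(model output for: {prompt[:40]}...)"


def chain_of_note(query: str, docs: list[str]) -> str:
    notes = []
    for doc in docs:
        # Sequential reading note for each retrieved document, including a relevance judgment.
        note = call_llm(
            f"Question: {query}\nDocument: {doc}\n"
            "Write a short note on what this document says and whether it is relevant."
        )
        notes.append(note)
    # Integrate the notes to formulate the final answer.
    return call_llm(f"Question: {query}\nNotes:\n" + "\n".join(notes) + "\nWrite the final answer.")


print(chain_of_note("Why did the service crash?", ["log excerpt 1", "design doc section"]))
```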

26 of 34

SELF-AWARE LLM

  1. Generate a response using internal knowledge and retrieved docs
  2. Analyze the response for potential errors, inconsistencies, or incompleteness
  3. While self-critique identifies issues:
  4. Refine the query
  5. Retrieve new docs
  6. Go to step 1 (see the loop sketched below)

Source: Self-RAG: Learning to Retrieve, Generate, and Critique, IBM Research AI, Allen Institute for AI
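A hedged sketch of the control flow in the list above; this is not the actual Self-RAG training procedure, just the generate-critique-refine loop written as plain Python with placeholder functions.

```python
def retrieve(query: str) -> list[str]:
    return [f"document about {query}"]                         # placeholder retriever


def generate(query: str, docs: list[str]) -> str:
    return f"answer to '{query}' using {len(docs)} doc(s)"     # placeholder generator


def critique(answer: str) -> str:
    # Placeholder: ask the model to flag errors, inconsistencies, or incompleteness.
    # Here we pretend the answer is acceptable once it cites at least three documents.
    return "" if "3 doc" in answer else "needs more evidence"


def refine(query: str, issue: str) -> str:
    return query + " (more detail)"                            # placeholder query refinement


def self_rag(query: str, max_rounds: int = 3) -> str:
    docs = retrieve(query)
    for _ in range(max_rounds):
        answer = generate(query, docs)    # step 1: generate
        issue = critique(answer)          # step 2: self-critique
        if not issue:                     # no issues found: done
            return answer
        query = refine(query, issue)      # step 4: refine the query
        docs += retrieve(query)           # step 5: retrieve new docs, then go back to step 1
    return answer


print(self_rag("Why did the test fail?"))
```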

27 of 34

THANK YOU

Questions?

28 of 34

A Chronology of Generations

Parvez M Robin

Software Engineer

Siemens
