Retrieval Augmented Generation (RAG)
Parvez M Robin
Software Engineer
Siemens
$ WHOAMI
LET THERE BE LLM
* https://healthit.com.au/how-big-is-the-internet-and-how-do-we-measure-it/
JUST AN EXAMPLE
You can think of the large language model as an over-enthusiastic new employee who refuses to stay informed with current events but will always answer every question with absolute confidence.
– Amazon Web Services Documentation
Source: https://aws.amazon.com/what-is/retrieval-augmented-generation
RAG TO THE RESCUE
LLM
RAG
HOW RAG WORKS
RETRIEVAL AUGMENTED GENERATION
RAG ARCHITECTURE
Query
RAG
Knowledge Source
LLM
Query
Retrieved Documents
Final Response
Retrieved Documents
Reranked Documents
Prompt + Query + Docs
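The flow above can be sketched end to end in a few lines of Python. This is a hypothetical, minimal illustration: retrieval uses toy word-overlap scoring in place of a real vector store, and the final LLM call is only indicated, not made.

```python
# Minimal RAG pipeline sketch (hypothetical).
# Retrieval here scores documents by word overlap with the query;
# a real system would use embeddings and a vector index.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents ranked by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble prompt + query + retrieved docs, as in the diagram."""
    context = "\n".join(f"- {d}" for d in docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

docs = [
    "RAG retrieves documents before generation.",
    "Transformers use attention mechanisms.",
    "Retrieval reduces hallucination in LLM answers.",
]
retrieved = retrieve("how does retrieval reduce hallucination", docs)
prompt = build_prompt("how does retrieval reduce hallucination", retrieved)
# `prompt` is what would be sent to the LLM for the final response.
```

A production version would swap the overlap scorer for an embedding model plus a reranker, but the data flow (query → retrieved documents → prompt → LLM) stays the same.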
TYPES OF RAG
RAG SEQUENCE
RAG TOKEN
Reference: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Facebook AI Research (FAIR)
ADVANTAGES
Responsible AI
Precise source of truth
Improved accuracy and relevance
Support for privacy
Minimal hallucination
SOME BUZZWORDS
Token
Embedding
Context window
Chunk size
Prompt
Zero shot learning
Few shot learning
Hallucination
Agent
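A few of these terms (tokens, chunk size, embeddings) can be made concrete with a toy sketch. Everything here is a stand-in: whitespace splitting replaces a real subword tokenizer, and a 26-dimensional letter-count vector replaces a learned embedding.

```python
# Toy illustration (hypothetical) of tokens, chunk size, and embeddings.

def tokenize(text: str) -> list[str]:
    # Real tokenizers split into subword units; whitespace is a stand-in.
    return text.split()

def chunk(tokens: list[str], chunk_size: int = 4) -> list[list[str]]:
    """Split a token list into fixed-size chunks for indexing."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def embed(text: str) -> list[int]:
    """Toy 26-dim letter-count vector standing in for a learned embedding."""
    vec = [0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec

tokens = tokenize("Retrieval augmented generation grounds answers in documents")
chunks = chunk(tokens, chunk_size=4)
vector = embed("retrieval")
```

Chunk size matters because each chunk must fit, together with the prompt and query, inside the model's context window.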
COMPARING LLMS FOR RAG
A study by Galileo
EXPERIMENTAL SETUP
Short Context: less than 5k tokens (RAG on a few pages)
Medium Context: 5k to 25k tokens (RAG on a book's chapter)
Long Context: 40k to 100k tokens (RAG on a book)
OPEN SOURCE IS CLOSING THE GAP
While closed-source models still offer the best performance, open-source models like Gemma, Llama, and Qwen continue to improve
Source: https://www.rungalileo.io/ty/hallucinationindex
MEDIUM CONTEXT LENGTH IS THE KEY
Most of these models perform best when provided with a context of 5k to 25k tokens
Source: https://www.rungalileo.io/ty/hallucinationindex
ANTHROPIC OUTPERFORMS OPENAI
Anthropic’s latest Claude 3.5 Sonnet and Claude 3 Opus consistently beat out GPT-4o and GPT-3.5.
Source: https://www.rungalileo.io/ty/hallucinationindex
LARGER IS NOT ALWAYS BETTER
In certain cases, smaller models outperformed larger models. Specifically, Gemini 1.5 Flash from Google performed unexpectedly well.
Source: https://www.rungalileo.io/ty/hallucinationindex
A COST-EFFECTIVE RAG
MAKE IT BETTER
HIERARCHICAL DOCUMENTS
RECURSIVE RETRIEVAL
MULTISTEP REASONING: CHAIN OF THOUGHT
Source: ART: Automatic multi-step reasoning and tool-use for large language models, Microsoft Research, Allen Institute for AI, Meta AI
MULTISTEP REASONING CONTD.: CHAIN OF NOTE
Source: Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models, Tencent AI Lab
SELF-AWARE LLM
Source: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, IBM Research AI, Allen Institute for AI
THANK YOU
Questions?
A Chronology of Generations
Parvez M Robin
Software Engineer
Siemens