Evaluating and Optimizing your RAG App
Jerry Liu, LlamaIndex co-founder/CEO
RAG

Context
Use cases: question-answering, text generation, summarization, planning
Data sources: APIs, raw files, SQL DBs, vector stores
How do we connect LLMs to all of this data?
Paradigms for inserting knowledge

Fine-tuning: baking knowledge into the weights of the network (via RLHF, Adam, SGD, etc.)

Example source text: "Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep..."
Paradigms for inserting knowledge

Retrieval augmentation: fix the model, put context into the prompt.

Example source text: "Before college the two main things I worked on, outside of school, were writing and programming. ..."

Input prompt:
Here is the context:
Before college the two main things…
Given the context, answer the following question: {query_str}
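As a rough sketch (not code from the deck), the prompt above maps onto llama_index's PromptTemplate; the example context and question strings are illustrative:

```python
from llama_index.prompts import PromptTemplate

# Retrieval-augmented prompt: retrieved context is placed ahead of the question
qa_prompt = PromptTemplate(
    "Here is the context:\n"
    "{context_str}\n"
    "Given the context, answer the following question: {query_str}\n"
)

# At query time, the retrieved chunk(s) fill {context_str}
prompt = qa_prompt.format(
    context_str="Before college the two main things I worked on ...",
    query_str="What did the author work on before college?",
)
```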
LlamaIndex: A data framework for LLM applications
Data Ingestion (LlamaHub 🦙)
Data Structures
Queries
RAG Stack

Current RAG Stack for building a QA System

[Diagram: Data Ingestion / Parsing (Doc → Chunks → Vector Database) and Data Querying (Vector Database → retrieved Chunks → LLM)]
5 Lines of Code in LlamaIndex!
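The five lines look roughly like this, assuming the pre-0.10 llama_index package layout and a local ./data folder of documents:

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()    # data ingestion / parsing
index = VectorStoreIndex.from_documents(documents)       # chunk, embed, store
query_engine = index.as_query_engine()                   # retrieval + synthesis
response = query_engine.query("What did the author do growing up?")
print(response)
```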
Current RAG Stack (Data Ingestion / Parsing)

[Diagram: Doc → Chunks → Vector Database]

Process: split each document into even chunks, embed each chunk with an embedding model, and store the embedded chunks in a vector database.
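A sketch of the ingestion step in isolation, using the (legacy) SimpleNodeParser for chunking; the chunk size and overlap are illustrative:

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser

# 1. Load raw documents (PDFs, text files, etc.)
documents = SimpleDirectoryReader("data").load_data()

# 2. Split each document into even chunks ("nodes")
parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)

# 3. Embed each chunk and store it in the (default in-memory) vector store
index = VectorStoreIndex(nodes)
```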
Current RAG Stack (Querying)

[Diagram: query → Vector Database → top-k Chunks → LLM]

Process:
Retrieval: given the query, fetch the most relevant chunks from the vector database.
Synthesis: pass the retrieved chunks to the LLM to synthesize a response.
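The two stages can also be driven separately; a sketch (the top-k value and query string are illustrative):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())

# Retrieval: embed the query and fetch the top-k most similar chunks
retriever = index.as_retriever(similarity_top_k=3)
retrieved_nodes = retriever.retrieve("What did the author do growing up?")

# Synthesis: hand the retrieved chunks to the LLM to produce the final answer
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What did the author do growing up?")
```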
Challenges with “Naive” RAG

Challenges with Naive RAG (Response Quality)
What do we do?
But before all this…
We need a way to measure performance
Evaluation

[Diagram: the two stages of the RAG pipeline to evaluate: Retrieval (Vector Database → Chunks) and Synthesis (Chunks → LLM)]
Evaluation in Isolation (Retrieval)
Synthetic Dataset Generation for Retrieval Evals
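A sketch of retrieval evals over a synthetic dataset, using generate_question_context_pairs and RetrieverEvaluator (imports assume a 2023-era llama_index release; the metric names, top-k, and LLM choice are illustrative):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser
from llama_index.llms import OpenAI
from llama_index.evaluation import generate_question_context_pairs, RetrieverEvaluator

nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents(
    SimpleDirectoryReader("data").load_data()
)
index = VectorStoreIndex(nodes)

# 1. Synthetic dataset: have an LLM write questions that each chunk can answer
qa_dataset = generate_question_context_pairs(
    nodes, llm=OpenAI(model="gpt-3.5-turbo"), num_questions_per_chunk=2
)

# 2. Evaluate the retriever in isolation with ranking metrics
evaluator = RetrieverEvaluator.from_metric_names(
    ["mrr", "hit_rate"], retriever=index.as_retriever(similarity_top_k=2)
)
query_id, query = next(iter(qa_dataset.queries.items()))
result = evaluator.evaluate(query=query, expected_ids=qa_dataset.relevant_docs[query_id])
print(result)
```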
Evaluation E2E
Synthetic Dataset Generation for E2E Evals
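A sketch of generating an end-to-end eval question set over the same documents (DatasetGenerator and its arguments are assumed from 2023-era llama_index releases):

```python
from llama_index import SimpleDirectoryReader
from llama_index.evaluation import DatasetGenerator

documents = SimpleDirectoryReader("data").load_data()

# Ask an LLM to generate questions over the documents; these become the E2E eval set
dataset_generator = DatasetGenerator.from_documents(documents)
eval_questions = dataset_generator.generate_questions_from_nodes(num=20)
```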
LLM-based Evaluation Modules
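A sketch of two LLM-based evaluation modules: FaithfulnessEvaluator (is the answer supported by the retrieved context?) and RelevancyEvaluator (does the answer address the question?). Using GPT-4 as the judge is an illustrative choice:

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI
from llama_index.evaluation import FaithfulnessEvaluator, RelevancyEvaluator

# Use a strong model as the judge
eval_ctx = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
faithfulness = FaithfulnessEvaluator(service_context=eval_ctx)
relevancy = RelevancyEvaluator(service_context=eval_ctx)

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())
query = "What did the author do growing up?"
response = index.as_query_engine().query(query)

# Faithfulness: checks the response against the retrieved source chunks (hallucination)
print(faithfulness.evaluate_response(response=response).passing)
# Relevancy: checks that the response (plus sources) actually answers the query
print(relevancy.evaluate_response(query=query, response=response).passing)
```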
Techniques for Better Performing RAG
Decouple Embeddings from Raw Text Chunks
Raw text chunks can bias your embedding representation with filler content (Max Rumpf, sid.ai)
Small-to-Big Retrieval

Solutions:
Sentence Window Retrieval (k=2) vs. Naive Retrieval (k=5): with naive retrieval, only one out of the 5 retrieved chunks is relevant (the “lost in the middle” problem).
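A sketch of sentence-window retrieval: embed single sentences, but give the LLM a window of surrounding sentences at synthesis time (window size, top-k, and import paths assume a 2023-era llama_index release):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor

# Embed individual sentences, but keep a window of neighboring sentences in metadata
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=2,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
nodes = node_parser.get_nodes_from_documents(SimpleDirectoryReader("data").load_data())
index = VectorStoreIndex(nodes)

# At query time, swap each retrieved sentence for its surrounding window
query_engine = index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[MetadataReplacementPostProcessor(target_metadata_key="window")],
)
response = query_engine.query("What did the author do growing up?")
```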
Embed References to Text Chunks

Solutions: embed references (e.g. smaller child chunks or summaries) that point back to the underlying text chunk; see the sketch below.
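One way to do this, sketched under the assumption that RecursiveRetriever's node_dict resolution is available in your llama_index version: embed small child chunks as IndexNodes whose index_id points back to the full parent chunk (chunk sizes and top-k are illustrative):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser
from llama_index.retrievers import RecursiveRetriever
from llama_index.schema import IndexNode

documents = SimpleDirectoryReader("data").load_data()

# Parent chunks: what we ultimately want the LLM to see
parent_nodes = SimpleNodeParser.from_defaults(chunk_size=1024).get_nodes_from_documents(documents)

# Child chunks: small pieces that get embedded, each referencing its parent via index_id
child_parser = SimpleNodeParser.from_defaults(chunk_size=128)
child_index_nodes = []
for parent in parent_nodes:
    for child in child_parser.get_nodes_from_documents([parent]):
        child_index_nodes.append(IndexNode(text=child.text, index_id=parent.node_id))

node_dict = {parent.node_id: parent for parent in parent_nodes}
vector_retriever = VectorStoreIndex(child_index_nodes).as_retriever(similarity_top_k=2)

# Retrieval hits a child embedding, then follows the reference back to the parent chunk
retriever = RecursiveRetriever(
    "vector", retriever_dict={"vector": vector_retriever}, node_dict=node_dict
)
retrieved = retriever.retrieve("What did the author do growing up?")
```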
Organize your data for more structured retrieval (Recursive Retrieval)

Summaries → documents

Organize your data for more structured retrieval (Metadata)

Example of metadata attached to a text chunk:
Text chunk: “We report the development of GPT-4, a large-scale, multimodal…”
Metadata: {“page_num”: 1, “org”: “OpenAI”}
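A minimal sketch of attaching that metadata to a document/chunk in llama_index:

```python
from llama_index import Document

# Metadata travels with the chunk; it can be embedded alongside the text
# and used for filtering at query time
doc = Document(
    text="We report the development of GPT-4, a large-scale, multimodal...",
    metadata={"page_num": 1, "org": "OpenAI"},
)
```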
Organize your data for more structured retrieval (Metadata Filters)

Question: “Can you tell me about Google’s R&D initiatives from 2020 to 2023?”

Without filters: a single collection of all 10-Q document chunks, queried with <query_embedding> for the top-4 results, returns:
2020 10Q chunk 4
2020 10Q chunk 7
2021 10Q chunk 4
2023 10Q chunk 4

No guarantee you’ll return the relevant document chunks!
Organize your data for more structured retrieval (Metadata Filters)

Question: “Can you tell me about Google’s R&D initiatives from 2020 to 2023?”

With metadata filters: the 10-Q chunks are tagged by year (2020, 2021, 2022, 2023), and each query combines <query_embedding> with metadata tags <metadata_tags>, returning:
2020 10Q chunk 4
2021 10Q chunk 4
2022 10Q chunk 4
2023 10Q chunk 4
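A sketch of applying a metadata filter at query time. The "year" key is illustrative and assumes chunks were tagged at ingestion, and that the underlying vector store supports metadata filters:

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("10q_filings").load_data())

# Restrict retrieval to chunks tagged year=2020 before running top-k similarity search
filters = MetadataFilters(filters=[ExactMatchFilter(key="year", value="2020")])
retriever = index.as_retriever(similarity_top_k=4, filters=filters)
nodes = retriever.retrieve("Tell me about Google's R&D initiatives in 2020")
```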
Organize your data for more structured retrieval (Recursive Retrieval)

Summaries → documents
Documents → Embedded Objects
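A sketch of the summaries → documents pattern: embed one summary IndexNode per document and route matches to a per-document query engine via RecursiveRetriever (the summary text is a placeholder; in practice it would be LLM-generated):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import RecursiveRetriever
from llama_index.schema import IndexNode

documents = SimpleDirectoryReader("10q_filings").load_data()

summary_nodes, query_engine_dict = [], {}
for i, doc in enumerate(documents):
    doc_id = f"doc-{i}"
    # One vector index (and query engine) per document
    query_engine_dict[doc_id] = VectorStoreIndex.from_documents([doc]).as_query_engine()
    # Top-level node holds the document summary and points at the per-document engine
    summary = f"Summary of 10-Q document {i}"  # placeholder; generate with an LLM in practice
    summary_nodes.append(IndexNode(text=summary, index_id=doc_id))

top_retriever = VectorStoreIndex(summary_nodes).as_retriever(similarity_top_k=1)
recursive_retriever = RecursiveRetriever(
    "root",
    retriever_dict={"root": top_retriever},
    query_engine_dict=query_engine_dict,
    verbose=True,
)
query_engine = RetrieverQueryEngine.from_args(recursive_retriever)
response = query_engine.query("Can you tell me about Google's R&D initiatives in 2021?")
```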
Production RAG Guide
Fine-Tuning
Fine-tuning
You can choose to fine-tune the embeddings or the LLM
Fine-tuning (Embeddings)
Generate a synthetic query dataset from raw text chunks using LLMs
NOTE: Similar process to generating an evaluation dataset!
Credits: Jo Bergum, vespa.ai
Fine-tuning (Embeddings)
Use this synthetic dataset to finetune an embedding model.
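A sketch of both steps, assuming the llama_index.finetuning module from 2023-era releases (the base model and output path are illustrative):

```python
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser
from llama_index.finetuning import generate_qa_embedding_pairs, SentenceTransformersFinetuneEngine

# 1. Generate synthetic (query, chunk) pairs from raw text chunks with an LLM
nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents(
    SimpleDirectoryReader("data").load_data()
)
train_dataset = generate_qa_embedding_pairs(nodes)

# 2. Fine-tune a local embedding model on the synthetic dataset
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en",
    model_output_path="finetuned_embeddings",
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()
```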
Fine-tuning a Black-box Adapter
Fine-tuning (LLMs)
Use OpenAI fine-tuning to distill GPT-4 outputs into gpt-3.5-turbo
Finetuning Abstractions in LlamaIndex
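A sketch of the distillation flow with these abstractions: log GPT-4 question/answer traces with OpenAIFineTuningHandler, then hand them to OpenAIFinetuneEngine to fine-tune gpt-3.5-turbo (file names, temperature, and the single example question are illustrative):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex, ServiceContext
from llama_index.callbacks import CallbackManager, OpenAIFineTuningHandler
from llama_index.finetuning import OpenAIFinetuneEngine
from llama_index.llms import OpenAI

# 1. Run queries through GPT-4 and record the prompt/completion pairs
handler = OpenAIFineTuningHandler()
gpt4_ctx = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-4", temperature=0.3),
    callback_manager=CallbackManager([handler]),
)
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("data").load_data(), service_context=gpt4_ctx
)
query_engine = index.as_query_engine()
for question in ["What did the author do growing up?"]:  # normally a generated question set
    query_engine.query(question)
handler.save_finetuning_events("finetuning_events.jsonl")

# 2. Distill: fine-tune gpt-3.5-turbo on the GPT-4 traces
finetune_engine = OpenAIFinetuneEngine("gpt-3.5-turbo", "finetuning_events.jsonl")
finetune_engine.finetune()
ft_llm = finetune_engine.get_finetuned_model(temperature=0.3)
```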