Beyond RAG: Building Advanced Context-Augmented LLM Applications
Jerry Liu, LlamaIndex co-founder/CEO
LlamaIndex: Context Augmentation for your LLM app
RAG
RAG has two main stages: data parsing & ingestion, and data querying.
Pipeline: Data → Data Parsing + Ingestion → Index → Retrieval → LLM + Prompts → Response
Naive RAG
A typical naive stack: PyPDF parsing → sentence splitting (chunk size 256) → Index → dense retrieval (top-k = 5) → LLM with a simple QA prompt → Response
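As a rough illustration, here is a minimal sketch of this naive stack in LlamaIndex (assuming a recent llama-index release, an OpenAI key in the environment, and a hypothetical "./data" directory of PDFs):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Data parsing + ingestion: load PDFs (pypdf under the hood) and split into 256-token chunks
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[SentenceSplitter(chunk_size=256)],
)

# Data querying: dense retrieval (top-k = 5) with a simple QA prompt
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What did the company report as revenue?")
print(response)
```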
Naive RAG is Limited
RAG Prototypes are Limited
Naive RAG approaches tend to work well for simple questions over a small set of documents.
Pain Points
There are certain questions we want to ask where naive RAG will fail.
Examples:
Can we do more?
In the naive setting, RAG is boring.
🚫 It’s just a glorified search system
🚫 There are many questions/tasks that naive RAG can’t give an answer to.
💡 Can we go beyond simple search/QA to building a general context-augmented research assistant?
Beyond RAG: Adding Layers of Agentic Reasoning
From RAG to Agents
Query → RAG → Response
⚠️ Single-shot
⚠️ No query understanding/planning
⚠️ No tool use
⚠️ No reflection, error correction
⚠️ No memory (stateless)
From RAG to Agents
✅ Multi-turn
✅ Query / task planning layer
✅ Tool interface for external environment
✅ Reflection
✅ Memory for personalization
Query → Agent (with access to Tools, including RAG pipelines) → Response
From Simple to Advanced Agents
Agent ingredients span a spectrum from simple (lower cost, lower latency) to advanced full agents (higher cost, higher latency):
Routing
One-Shot Query Planning
Tool Use
ReAct
Dynamic Planning + Execution
Conversation Memory
Routing
Simplest form of agentic reasoning.
Given a user query and a set of choices, output the subset of choices to route the query to.
Routing
Use Case: Joint QA and Summarization
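A hedged sketch of this use case with LlamaIndex's router abstractions (the `vector_index` and `summary_index` objects are assumed to be built already; names are illustrative):

```python
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

# Wrap a QA engine and a summarization engine as routable choices
qa_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for answering specific questions about the document.",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
    description="Useful for full-document summarization requests.",
)

# The selector LLM reads the query + tool descriptions and picks a route
router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[qa_tool, summary_tool],
)
print(router.query("Give me a summary of the whole document."))
```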
Conversation Memory
In addition to the current query, take conversation history into account as input to your RAG pipeline.
Conversation Memory
How to account for conversation history in a RAG pipeline?
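One common approach (a sketch, assuming an existing `index`): keep a rolling memory buffer and condense the chat history plus the new message into a standalone query before retrieval, e.g. via LlamaIndex's chat engine modes:

```python
from llama_index.core.memory import ChatMemoryBuffer

# Rolling buffer of the conversation; history + new question are condensed
# into a standalone query before retrieval runs
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=memory,
)

print(chat_engine.chat("What was Uber's revenue in 2021?"))
print(chat_engine.chat("How does that compare to the year before?"))  # uses history
```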
Query Planning
Break down query into parallelizable sub-queries.
Each sub-query can be executed against any set of RAG pipelines.
Example: "Compare revenue growth of Uber and Lyft in 2021" decomposes into "Describe revenue growth of Uber in 2021" (run against the Uber 10-K, retrieving the top-2 chunks) and "Describe revenue growth of Lyft in 2021" (run against the Lyft 10-K, retrieving the top-2 chunks).
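A hedged sketch of this decomposition using LlamaIndex's sub-question engine (`uber_engine` and `lyft_engine` are assumed query engines over the respective 10-K filings):

```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Each tool wraps a RAG pipeline over one document
tools = [
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(name="uber_10k", description="Uber 2021 10-K filing"),
    ),
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(name="lyft_10k", description="Lyft 2021 10-K filing"),
    ),
]

# The planner LLM breaks the question into sub-queries, runs them against the
# relevant tools, and synthesizes a final comparison
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
print(engine.query("Compare revenue growth of Uber and Lyft in 2021"))
```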
Tool Use
Use an LLM to call an API, inferring the parameters of that API.
In normal RAG you just pass the query through. But what if you used the LLM to infer all the parameters for the API interface?
This is a key capability in many QA use cases (auto-retrieval, text-to-SQL, and more).
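A minimal sketch of parameter inference with a function tool (the `get_weather` function and its signature are hypothetical; the FunctionTool/agent interfaces are LlamaIndex's, assuming the llama-index-agent-openai integration is installed):

```python
from llama_index.core.tools import FunctionTool
from llama_index.agent.openai import OpenAIAgent

def get_weather(city: str, unit: str = "celsius") -> str:
    """Return current weather for a city (stub for illustration)."""
    return f"It is 20 degrees {unit} in {city}."

# The LLM reads the tool schema and infers `city` / `unit` from the user query
weather_tool = FunctionTool.from_defaults(fn=get_weather)
agent = OpenAIAgent.from_tools([weather_tool], verbose=True)
print(agent.chat("What's the weather in Paris, in fahrenheit?"))
```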
Let’s put them together
Core Components of a Full Agent
Minimum necessary ingredients: query planning, tool use, and memory.
ReAct: Reasoning + Acting with LLMs
Source: https://react-lm.github.io/
ReAct: Reasoning + Acting with LLMs
Query Planning: Generate the next step given previous steps (chain-of-thought prompt).
Tool Use: Sequential tool calling.
Memory: Maintain simple buffer.
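In LlamaIndex terms, a hedged sketch of a ReAct agent wired up with query-engine tools like the ones from the earlier examples (tool names assumed; the OpenAI LLM integration is a separate package):

```python
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

# Chain-of-thought planning, sequential tool calls, and a simple chat-buffer
# memory are all handled inside the ReAct loop
agent = ReActAgent.from_tools(
    [qa_tool, summary_tool],
    llm=OpenAI(model="gpt-4"),
    verbose=True,
)
print(agent.chat("Summarize the document, then answer: who is the author?"))
```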
ReAct: Reasoning + Acting with LLMs
Can we make this even better?
LLMCompiler (Kim et al. 2023)
An agent compiler for parallel multi-function planning + execution.
LLMCompiler
Query Planning: Generate a DAG of steps. Replan if steps don’t reach the desired state.
Tool Use: Parallel function calling.
Memory: Maintain simple buffer.
Tree-based Planning
Tree of Thoughts (Yao et al. 2023)
Reasoning via Planning (Hao et al. 2023)
Language Agent Tree Search (Zhou et al. 2023)
Tree-based Planning
Query Planning in the face of uncertainty: Instead of planning out a fixed sequence of steps, sample a few different states.
Run Monte-Carlo Tree Search (MCTS) to balance exploration vs. exploitation.
Self-Reflection
Use feedback to improve agent execution and reduce errors
🗣️ Human feedback
🤖 LLM feedback
Use few-shot examples instead of retraining the model!
Reflexion: Language Agents with Verbal Reinforcement Learning, by Shinn et al. (2023)
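As a rough illustration of LLM feedback (not the Reflexion implementation itself), a generic critique-and-retry loop; the prompts and `max_rounds` value are arbitrary choices:

```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")

def answer_with_reflection(task: str, max_rounds: int = 3) -> str:
    """Generate an answer, critique it with the LLM, and retry using the critique."""
    answer = llm.complete(task).text
    for _ in range(max_rounds):
        # Ask the LLM to critique its own answer
        critique = llm.complete(
            f"Task: {task}\nAnswer: {answer}\n"
            "List any errors or omissions. Reply 'OK' if the answer is correct."
        ).text
        if critique.strip().upper().startswith("OK"):
            break
        # Retry, feeding the critique back in as verbal feedback
        answer = llm.complete(
            f"Task: {task}\nPrevious answer: {answer}\nCritique: {critique}\n"
            "Write an improved answer."
        ).text
    return answer
```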
Additional Requirements
LlamaIndex + W&B
Tracing and observability are essential developer tools for RAG/agent development.
We have first-class integrations with Weights and Biases.
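A hedged sketch of enabling the Weights & Biases callback handler (the project name is a placeholder; assumes the wandb callback integration package is installed):

```python
from llama_index.core import set_global_handler

# Route LlamaIndex traces (LLM calls, retrievals, agent steps) to a W&B run
set_global_handler("wandb", run_args={"project": "my-rag-app"})

# ...then build indexes, query engines, and agents as usual; traces appear in W&B
```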