CN408/SF323 AI Engineer
Lecture 4: Information Retrieval and Retrieval Augmented Generation
Nutchanon Yongsatianchot
News: nano-banana is out!
Retrieval Augmented Generation
Many graphics are from https://www.deeplearning.ai/courses/retrieval-augmented-generation-rag/
Motivation
Solution: Providing Additional Information
(Diagram: the Prompt combines the Instruction, the User's Query, and the Additional Information)
We will need to retrieve!!
Many applications involve specialized knowledge
Legal or Medical use cases
Company chatbot
The Big Picture: Retrieval Augmented Generation (RAG)
(Pipeline diagram: Documents → Index → Knowledge Base; User's query → Retrieve → Relevant Documents → Augment → Augmented Prompt → LLM → Generate → Answer)
The central question of RAG:
How can we get all the right information to the LLM?
Information Retrieval
Two broad ways of retrieving information
Sparse Search
Keyword Matching
Keyword Matching
'CN240': "data science course"}
Keyword Matching Demo
Metadata filtering
Uses rigid criteria to narrow down documents based on metadata like title, author, creation date, access privileges, and more.
Metadata Filtering In RAG
Sparse Search
Bag-of-words search - TF-IDF
Keyword Search
Bag of Words
Word order is ignored; only word presence and frequency matter
Bag of words
Sparse Vectors
Most words aren’t used. The bag of words is sparse, with few non-zero entries.
Bag of words view of document
Words appearing in each document
Bag of words search
Frequency Based vs. Term Frequency (TF) Based Scoring
Example Query: pizza oven
Document 1: "Homemade pizza in oven is better than frozen pizza"
Contains: pizza (2x), oven (1x)
Simple Scoring = 2 points; TF Scoring = 3 points
Document 2: "Wood-fired oven is a better oven than a stone oven for cooking pizza"
Contains: pizza (1x), oven (3x)
Simple Scoring = 2 points; TF Scoring = 4 points
Normalized TF Scoring
Longer documents may contain keywords many times simply because they are longer.
Solution: Normalize by document length
Score = (Number of keyword occurrences) / (Total words in document)
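A minimal sketch of the three scoring variants on the pizza/oven example above (pure Python, simplified whitespace tokenization):

# Simple, TF, and length-normalized TF scoring for the query "pizza oven"
doc1 = "Homemade pizza in oven is better than frozen pizza"
doc2 = "Wood-fired oven is a better oven than a stone oven for cooking pizza"
query = ["pizza", "oven"]

def scores(doc, query):
    words = doc.lower().split()
    simple = sum(1 for q in query if q in words)      # keyword present or not
    tf = sum(words.count(q) for q in query)           # raw occurrence counts
    normalized = tf / len(words)                       # divide by document length
    return simple, tf, normalized

print(scores(doc1, query))  # (2, 3, 3/9  ≈ 0.333)
print(scores(doc2, query))  # (2, 4, 4/13 ≈ 0.308)

After normalization, Document 1 edges out Document 2 even though Document 2 has more raw keyword hits, simply because Document 2 is longer.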
Term Frequency-Inverse Document Frequency (TF-IDF)
Basic TF scoring treats all words equally, whether they're common filler words or rare, meaningful terms.
Solution: Weight terms using “inverse document frequency” (IDF).
Score = TF(word, doc) × log(Total docs / Docs containing word)
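A minimal TF-IDF sketch using scikit-learn's TfidfVectorizer (a hedged example: scikit-learn uses a smoothed IDF variant, so the exact numbers differ slightly from the formula above, but the ranking behaviour is the same):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Homemade pizza in oven is better than frozen pizza",
    "Wood-fired oven is a better oven than a stone oven for cooking pizza",
    "Stone ovens retain heat for a long time",
]

vectorizer = TfidfVectorizer()                 # builds the sparse TF-IDF vectors
doc_vectors = vectorizer.fit_transform(docs)   # shape: (n_docs, vocab_size)
query_vector = vectorizer.transform(["pizza oven"])

# Rank documents by similarity to the query
similarities = cosine_similarity(query_vector, doc_vectors)[0]
for idx in similarities.argsort()[::-1]:
    print(round(similarities[idx], 3), docs[idx])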
TF vs. TF-IDF
TF-IDF
Documents with rare keywords score higher than documents with common words
Modern systems use a slightly refined version called BM25
Sparse Search
BM25
BM25 Scoring
BM25 (Best Matching 25) is so named because it was the 25th variant in a series of scoring functions proposed by its creators.
BM25 Tunable Parameters
k₁ - Term Frequency Saturation
b - Length Normalization
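A minimal sketch of the BM25 score for one document, to make k₁ and b concrete (simplified whitespace tokenization; real systems use a BM25 library or a search engine such as Elasticsearch):

import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one document against a query with BM25."""
    doc_words = doc.lower().split()
    corpus_words = [d.lower().split() for d in corpus]
    avgdl = sum(len(d) for d in corpus_words) / len(corpus_words)
    N = len(corpus)

    score = 0.0
    for term in query_terms:
        tf = doc_words.count(term)
        df = sum(1 for d in corpus_words if term in d)      # docs containing the term
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)     # smoothed IDF used by BM25
        # k1 caps how much repeated terms help; b controls length normalization
        denom = tf + k1 * (1 - b + b * len(doc_words) / avgdl)
        score += idf * tf * (k1 + 1) / denom
    return score

corpus = [
    "Homemade pizza in oven is better than frozen pizza",
    "Wood-fired oven is a better oven than a stone oven for cooking pizza",
]
for doc in corpus:
    print(round(bm25_score(["pizza", "oven"], doc, corpus), 3), doc)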
Sparse/Keyword Search Summary
(Sparse Vector → Score → Rank)
BM25 Demo
Semantic Search
Keyword search does not capture the meaning of words
Semantic Search vs. Keyword Search
Sparse vs. Dense Search
Vector Space
Similar words are closer in the vector space
Sentence Embedding Example
Measuring Vector Distance
Measuring Vector Distance
Measuring Vector Distance
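A small numpy sketch of the measures commonly used to compare embedding vectors (cosine similarity, dot product, and Euclidean distance):

import numpy as np

a = np.array([0.2, 0.7, 0.1])    # toy "embedding" vectors
b = np.array([0.25, 0.6, 0.2])

dot = np.dot(a, b)
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # angle-based, ignores magnitude
euclidean = np.linalg.norm(a - b)                         # straight-line distance

print(f"dot={dot:.3f}  cosine={cosine:.3f}  euclidean={euclidean:.3f}")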
Semantic Search Summary
Our Current RAG
(Same RAG pipeline diagram as above)
RAG with Vector Database
(Same pipeline diagram, with an embedding step added: the documents are embedded at indexing time and the query is embedded at retrieval time)
Embedding Choices
Embedding Search High Level
Vector Database
Embedding Search without Vector Database Demo
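A minimal sketch of what the demo covers: semantic search with nothing more than an embedding model and numpy (assuming the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint; any embedding API works the same way):

import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "CN240 is a data science course",
    "The cafeteria opens at 8 am",
    "Retrieval augmented generation grounds LLM answers in documents",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = model.encode(documents, normalize_embeddings=True)

def search(query, top_k=2):
    query_embedding = model.encode([query], normalize_embeddings=True)
    # With normalized vectors, the dot product equals cosine similarity
    similarities = doc_embeddings @ query_embedding[0]
    best = np.argsort(similarities)[::-1][:top_k]
    return [(float(similarities[i]), documents[i]) for i in best]

print(search("Which class teaches data science?"))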
Semantic Search with Vector Database
You technically don't need a vector database to do RAG
RAG != Semantic Search with Vector Database
Vector Database Choices
Pinecone Database
Pinecone Database - Setup
RAG with Pinecone Database - Coding
*Pinecone has its own proprietary Approximate Nearest Neighbour (ANN) search
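A hedged sketch of the indexing and query side of the coding demo (assuming the Pinecone Python client v3+ and sentence-transformers; the API key, index name, cloud region, and embedding dimension are placeholders, so check the current Pinecone docs for exact signatures):

from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")      # 384-dim embeddings
pc = Pinecone(api_key="YOUR_API_KEY")                # placeholder key

# One-time setup: create a serverless index matching the embedding dimension
pc.create_index(name="lecture-demo", dimension=384, metric="cosine",
                spec=ServerlessSpec(cloud="aws", region="us-east-1"))
index = pc.Index("lecture-demo")

# Index: embed each document, then upsert id / vector / metadata triples
documents = ["CN240 is a data science course", "The cafeteria opens at 8 am"]
vectors = [
    {"id": str(i), "values": model.encode(doc).tolist(), "metadata": {"text": doc}}
    for i, doc in enumerate(documents)
]
index.upsert(vectors=vectors)

# Retrieve: embed the query and let Pinecone's ANN search find the nearest neighbours
query_vector = model.encode("Which class teaches data science?").tolist()
results = index.query(vector=query_vector, top_k=2, include_metadata=True)
for match in results.matches:
    print(match.score, match.metadata["text"])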
Break
Improving RAG
RAG with Vector Database
(Recap: the RAG with vector database pipeline diagram)
Hybrid Search
Hybrid Search
(RAG with vector database pipeline diagram)
Hybrid Search
(Retriever diagram: a keyword search and a semantic search, each combined with a metadata filter)
Reciprocal Rank Fusion
1st = 1 point, 2nd = 0.5 points, 3rd ≈ 0.33 points, etc. (each result scores 1/rank)
Reciprocal Rank Fusion
RRF only cares about ranks, not scores
Beta: Weighting Semantic vs. Keyword
If exact keyword matching is important, set a lower beta
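A minimal sketch of reciprocal rank fusion with a beta weight between the semantic and keyword result lists. It follows the simple 1/rank scoring from the slide; classic RRF adds a constant to the rank, i.e. 1/(k + rank), but the idea is the same:

def weighted_rrf(semantic_results, keyword_results, beta=0.7):
    """Fuse two ranked lists of document ids; higher beta favours semantic search."""
    scores = {}
    for rank, doc_id in enumerate(semantic_results, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + beta * (1.0 / rank)
    for rank, doc_id in enumerate(keyword_results, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - beta) * (1.0 / rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_c", "doc_b"]   # ranked output of the dense retriever
keyword  = ["doc_b", "doc_a", "doc_d"]   # ranked output of BM25
print(weighted_rrf(semantic, keyword, beta=0.7))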
Hybrid Search
Reranking
Reranking
(RAG with vector database pipeline diagram)
Overview of Reranking
Reranker
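A minimal reranking sketch with a cross-encoder (assuming sentence-transformers and the ms-marco-MiniLM cross-encoder checkpoint; commercial rerankers such as Cohere Rerank expose a similar interface):

from sentence_transformers import CrossEncoder

query = "What are the health benefits of meditation?"
candidates = [   # e.g. the top documents returned by hybrid search
    "Meditation reduces stress hormones like cortisol.",
    "Mindfulness meditation originated in Buddhist practices.",
    "The cafeteria opens at 8 am.",
]

# A cross-encoder reads the query and document together, so it can judge relevance
# more precisely than comparing two independently computed embeddings.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])

ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
for score, doc in ranked:
    print(round(float(score), 3), doc)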
Hybrid Search + Reranker Demo
Chunking Strategies
Chunking
(RAG with vector database pipeline diagram)
Why Chunk Documents?
Indexing without chunking
The problems with this approach
Chunking your content
Chunking Considerations
Fixed Size Chunking
Fixed Size Chunking: Overlapping Chunking
Fixed Size Chunking
Chunk based on the number of tokens and (optionally) some overlap between chunks.
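A minimal fixed-size chunker with overlap (word-level for simplicity; production systems usually count model tokens, e.g. with tiktoken):

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of ~chunk_size words, overlapping by `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

document = "Retrieval augmented generation grounds LLM answers in documents. " * 100
for i, chunk in enumerate(chunk_text(document, chunk_size=200, overlap=50)):
    print(i, len(chunk.split()), "words")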
Sentence splitting
Recursive Chunking
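A hedged sketch of recursive chunking with LangChain's RecursiveCharacterTextSplitter, which tries the largest separator first (paragraphs, then lines, then sentences, then words) and only falls back to hard cuts when needed (the import path varies between LangChain versions; the file name is a placeholder):

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # max characters per chunk
    chunk_overlap=50,    # characters shared between neighbouring chunks
    separators=["\n\n", "\n", ". ", " ", ""],  # try the biggest boundary first
)

with open("lecture_notes.txt") as f:   # placeholder file name
    text = f.read()

chunks = splitter.split_text(text)
print(len(chunks), "chunks; first chunk:", chunks[0][:80])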
Chunking Strategy Demo
Specialized chunking
Specialized chunking
Semantic Chunking
Groups sentences together based on similar meanings rather than arbitrary character limits
Semantic Chunking
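A minimal sketch of the idea behind semantic chunking: embed each sentence and start a new chunk whenever the similarity to the previous sentence drops below a threshold (assuming sentence-transformers; libraries such as LlamaIndex ship a more refined version):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_chunks(sentences, threshold=0.5):
    embeddings = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        similarity = float(embeddings[i] @ embeddings[i - 1])
        if similarity < threshold:          # meaning shifted -> start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "Meditation reduces stress hormones like cortisol.",
    "It also improves attention and working memory.",
    "The Great Barrier Reef stretches along the coast of Queensland.",
]
print(semantic_chunks(sentences))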
Language Based Chunking
Contextual Retrieval
Contextual Retrieval
(RAG with vector database pipeline diagram)
Problem
Contextual Retrieval
original_chunk = "The company's revenue grew by 3% over the previous quarter."
contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."
Implementing Contextual Retrieval
<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
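A hedged sketch of how the prompt above can be applied to every chunk before indexing. Anthropic's write-up uses Claude with prompt caching; here an OpenAI chat completions call stands in, and the model name is a placeholder:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CONTEXT_PROMPT = """<document>
{whole_document}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{chunk_content}
</chunk>
Please give a short succinct context to situate this chunk within the overall document \
for the purposes of improving search retrieval of the chunk. \
Answer only with the succinct context and nothing else."""

def contextualize(chunk, whole_document, model="gpt-4o-mini"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": CONTEXT_PROMPT.format(whole_document=whole_document,
                                                    chunk_content=chunk)}],
    )
    context = response.choices[0].message.content.strip()
    # Prepend the generated context so both keyword and embedding search benefit
    return f"{context} {chunk}"

Each contextualized chunk is then embedded and indexed exactly as before.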
Choosing a Chunking Approach
Query Rewriting
This section is from LangChain: https://www.youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x
Query Rewriting
(RAG with vector database pipeline diagram)
Queries != documents
Query Rewriting
AI Engineering: Building Applications with Foundation Models
Query Rewriting
Use an LLM to rewrite the query before it’s submitted to the retriever.
Transform a question into multiple perspectives
Intuition: Improve search
Use this with parallelized retrieval
Multi-Query
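A minimal multi-query sketch: ask an LLM for several rewrites of the user's question, retrieve for each, and merge the results (OpenAI API; the model name and the `retrieve` helper are placeholders for whatever retriever the RAG system already has):

from openai import OpenAI

client = OpenAI()

def rewrite_query(question, n=3, model="gpt-4o-mini"):
    prompt = (f"Rewrite the following question in {n} different ways that might "
              f"match relevant documents. Return one rewrite per line.\n\n{question}")
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    rewrites = [line.strip()
                for line in response.choices[0].message.content.splitlines()
                if line.strip()]
    return [question] + rewrites

def multi_query_retrieve(question, retrieve, top_k=5):
    """`retrieve(query, top_k)` is the existing retriever; results are de-duplicated."""
    seen, merged = set(), []
    for query in rewrite_query(question):
        for doc in retrieve(query, top_k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged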
HyDE: Hypothetical Document Embeddings
HyDE
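A minimal HyDE sketch: generate a hypothetical answer first, then embed that answer instead of the raw question and search with it (OpenAI for generation and sentence-transformers for embeddings, both as placeholder choices):

from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def hyde_embedding(question, model="gpt-4o-mini"):
    # 1) Ask the LLM to invent a plausible answer passage. It may contain errors,
    #    but it usually "sounds like" the documents we want to retrieve.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"Write a short passage that answers: {question}"}],
    )
    hypothetical_doc = response.choices[0].message.content
    # 2) Embed the hypothetical document and use it as the search vector.
    return embedder.encode([hypothetical_doc], normalize_embeddings=True)[0]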
RAG Evaluations
RAG Evaluations
(RAG with vector database pipeline diagram)
RAG Evaluations
RAG systems have three core components:
Part 1: Context Relevance (C|Q)
Def: How well do the retrieved chunks address the question's information needs? Does your retriever find passages that contain information relevant to answering the user's question?
Bad
Question: "What are the health benefits of meditation?"
Context: "Meditation practices vary widely across different traditions. Mindfulness meditation, which originated in Buddhist practices, focuses on present-moment awareness, while transcendental meditation uses mantras to achieve deeper states of consciousness."
Reasoning: Despite being factually correct about meditation, this context does not discuss health benefits.
Good
Question: "What are the health benefits of meditation?"
Context: "Regular meditation has been shown to reduce stress hormones like cortisol. A 2018 study in the Journal of Cognitive Enhancement found meditation improves attention and working memory."
Reasoning: Strong relevance. The context directly addresses multiple health benefits with specific details.
Q: What source of information would you need to answer multiple-choice questions on financial knowledge?
Finance textbooks
Retrieval Quality Metrics
Common ingredients of most retriever quality metrics:
If you want to evaluate your retriever, you need to know the correct answers
The Question
The specific question being evaluated
Ranked Results
Documents returned in ranked order
Ground Truth
All documents labeled as relevant or irrelevant
Recall and Precision
Recall penalizes leaving out relevant documents
Precision penalizes returning irrelevant documents
Top k
Example
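A minimal sketch of precision@k and recall@k given ranked results and ground-truth relevance labels (document ids here are made up for illustration):

def precision_at_k(ranked_ids, relevant_ids, k):
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k                      # penalizes returning irrelevant documents

def recall_at_k(ranked_ids, relevant_ids, k):
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)      # penalizes leaving relevant documents out

ranked = ["d3", "d1", "d7", "d2", "d9"]          # retriever output, best first
relevant = {"d1", "d2", "d5"}                     # ground-truth relevant documents
print(precision_at_k(ranked, relevant, k=5))      # 2/5 = 0.4
print(recall_at_k(ranked, relevant, k=5))         # 2/3 ≈ 0.67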
Evaluate RAG using Synthetic Data and LLM-as-a-judge
(Diagram: sample from the Knowledge Base → an LLM generates Questions → an LLM judges the answers → Score)
Only use good questions
Evaluate RAG using Synthetic Data and LLM-as-a-judge
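A hedged sketch of the two LLM calls in this loop, generating a synthetic question from a sampled chunk and judging the RAG system's answer (OpenAI API; the model name, prompts, and 0-5 scale are placeholder choices):

from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"   # placeholder model name

def generate_question(chunk):
    """Turn a sampled knowledge-base chunk into a synthetic test question."""
    prompt = ("Write one question that can be answered using only the passage below. "
              f"Return just the question.\n\nPassage:\n{chunk}")
    response = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}])
    return response.choices[0].message.content.strip()

def judge_answer(question, answer, chunk):
    """Ask an LLM to grade the RAG answer against the source chunk (0-5)."""
    prompt = (f"Question: {question}\nSource passage: {chunk}\nAnswer: {answer}\n\n"
              "Rate from 0 to 5 how well the answer addresses the question using only "
              "the source passage. Reply with a single number.")
    response = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}])
    return int(response.choices[0].message.content.strip()[0])  # naive score parsing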
Part 2: Faithfulness/Groundedness (A|C)
Def: To what extent does the answer restrict itself only to claims that can be verified from the retrieved context?
Bad
Context: "The Great Barrier Reef is the world's largest coral reef system. It stretches for over 2,300 kilometers along the coast of Queensland, Australia."
Answer: "The Great Barrier Reef, the world's largest coral reef system, stretches for over 2,300 kilometers along Australia's eastern coast and is home to about 10% of the world's fish species."
Reasoning: The first part is supported, but the claim about "10% of the world's fish species" isn't in the provided context.
Good
Context: "The Great Barrier Reef is the world's largest coral reef system."
Answer: "The Great Barrier Reef is the largest coral reef system in the world."
Reasoning: Perfect faithfulness. The answer only states what's in the context.
Part 3: Answer Relevance (A|Q)
Def: How directly does the answer address the specific information need expressed in the question? This evaluates the end-to-end system performance.
Bad
Question: "How does compound interest work in investing?"
Answer: "Interest in investing can be simple or compound. Compound interest is more powerful than simple interest and is an important concept in finance."
Reasoning: Low relevance. The answer doesn't actually explain the mechanism of how it works.
Good
Question: "How does compound interest work in investing?"
Answer: "Compound interest works by adding the interest earned back to your principal investment, so that future interest is calculated on the new, larger amount."
Reasoning: High relevance. The answer directly explains the concept asked about.
Advanced RAG Relationships
Other considerations
HW 4
Project
Don’t forget about your project!
Extra
ColBERT: Similarity at the Token Level
Score = Sum of max similarity of each query embedding to any document embedding
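A minimal numpy sketch of ColBERT's MaxSim scoring: every query token embedding picks its best-matching document token embedding, and the per-token maxima are summed (the token embeddings below are random placeholders; real ColBERT uses a trained BERT-based encoder):

import numpy as np

def maxsim_score(query_embeddings, doc_embeddings):
    """query_embeddings: (n_query_tokens, d); doc_embeddings: (n_doc_tokens, d)."""
    # Cosine similarity of every query token against every document token
    q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    d = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    similarity = q @ d.T                      # (n_query_tokens, n_doc_tokens)
    return similarity.max(axis=1).sum()       # best match per query token, then sum

query_tokens = np.random.rand(4, 128)    # placeholder token embeddings
doc_tokens = np.random.rand(30, 128)
print(maxsim_score(query_tokens, doc_tokens))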
ColBERT
ColBERT