1 of 46

Navigating RAG for Social Science

Atita Arora

Solution Architect, Qdrant

2 of 46

About me

Started in 2008

- Computer Applications - Strategic Business Management

- Vector / Semantic Search

- Language analysis - Information retrieval 

Open source

Loves to travel, eat, cook

Mom of 2 boys

3 of 46

The hottest three-letter word in Gen AI right now... RAG!!!

4 of 46

Adoption of Generative AI

5 of 46

Structured vs. Unstructured Data Use Cases

6 of 46

Datasphere forecast


  • Catalog data
  • User behaviour & Interactions data
  • Multimedia data

The global datasphere will grow to 163 zettabytes by 2025, and about 80% of that will be unstructured

Challenges

  • Volume and Complexity
  • Data Quality
  • Integration
  • Privacy and Security

7 of 46

The Evolution of Information Retrieval

  • Pre 2000: Pattern search, Exact search; mostly driven by databases
  • 2000: Text Analysis using Synonyms, Stemming, Lemmatization
  • 2010: Natural Language Processing (Intent), using language processing techniques
  • 2011: Multi-word Synonyms; initial implementation, later developed into its advanced form
  • 2013: Personalization
  • 2015: Named Entity Recognition
  • 2017: Learning to Rank (LTR)
  • 2018: Semantic Search (Word Embeddings)
  • 2022: Closed Source LLM, Multilingual IR
  • 2023-Present: Open Source LLM, Multimodal / Multilingual IR

8 of 46

The Rise of AI Models

Models by average English MTEB score (y) vs speed (x) vs embedding size (circle size).

https://informationisbeautiful.net/visualizations/the-rise-of-generative-ai-large-language-models-llms-like-chatgpt/

9 of 46

Discovery of common language!

10 of 46

The magic of Embeddings!!

  • An object is known by the company it keeps
  • In our example, the word 'right' has a different meaning in each sentence
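
A quick way to see this in practice is to embed a few sentences containing the same word and compare their vectors. A minimal sketch, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model (any embedding model would do):

# Sentence embeddings capture the surrounding context, so the same word
# lands in different regions of the vector space depending on its company.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any embedding model works here

sentences = [
    "Turn right at the next junction.",       # 'right' as a direction
    "You have the right to remain silent.",   # 'right' as an entitlement
    "That answer is right.",                  # 'right' as 'correct'
]

embeddings = model.encode(sentences)
print(util.cos_sim(embeddings, embeddings))   # pairwise similarities stay well below 1.0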

11 of 46

Key discussion points:

  • WTH is RAG?
  • How do you build RAG?
  • Core challenges of RAG
  • Improvement Techniques
  • Evaluation-based RAG Optimization
  • Conclusions

12 of 46

Why do you / anyone need RAG?

👎

Question

Answer

13 of 46

Why do you / anyone need RAG?

👎

Question

Answer

14 of 46

15 of 46

Why do you / anyone need RAG?

👍

Question

Answer

Context

16 of 46

What is RAG?

R

A

G

Retrieval of relevant data / information for the user query

Augmentation of the retrieved relevant data / information into the LLM prompt context

Generation of the answer using the prompt augmented with the relevant context
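
Put together, the three letters map onto a few lines of glue code. This is a schematic sketch only; retrieve and llm_complete below are hypothetical placeholders for your vector search and LLM calls:

# Schematic RAG flow; `retrieve` and `llm_complete` are hypothetical placeholders.
def answer(query: str) -> str:
    # R - Retrieval: fetch the most relevant chunks for the user query
    chunks = retrieve(query, top_k=5)

    # A - Augmentation: add the retrieved chunks to the LLM prompt context
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

    # G - Generation: the LLM produces the answer from the augmented prompt
    return llm_complete(prompt)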

17 of 46

And how does it compare to Fine-tuning?

https://arxiv.org/pdf/2312.10997v1.pdf

18 of 46

Benefits of RAG?

Saves time

Multiple applications

Contextual

Up-to-date

Enhances Engagement

Multilingual*

Saves Cost

Works with custom data

19 of 46

How do you build RAG?

Document Processing

Embedding Generation and Ingestion

Embedding Storage

Query Search and Document Retrieval

Response Synthesis and Answer Generation
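
The ingestion half of this pipeline (document processing, embedding generation, and storage) fits in a short script. A minimal sketch, assuming the qdrant-client and sentence-transformers packages and an in-memory Qdrant instance; the chunk contents are placeholders:

# Ingestion sketch: embed document chunks and store them in Qdrant.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(":memory:")  # swap for QdrantClient(url=..., api_key=...) on a real cluster

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(
        size=model.get_sentence_embedding_dimension(),  # must match the embedding model
        distance=Distance.COSINE,
    ),
)

chunks = ["chunk one ...", "chunk two ..."]  # output of your document processing step
vectors = model.encode(chunks)

client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=i, vector=vector.tolist(), payload={"text": text})
        for i, (vector, text) in enumerate(zip(vectors, chunks))
    ],
)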

20 of 46

How do you build RAG?

  • Picking your model from : https://huggingface.co/spaces/mteb/leaderboard

21 of 46

22 of 46

23 of 46

24 of 46

Flavours of RAG - Naive RAG

Document Processing:

- Extract text from documents

- Split documents into appropriate chunks

Embedding Generation and Ingestion:

- Generate embeddings for document chunks

- Store embeddings in vector database

Query Processing:

- Embed user query

- Retrieve top-k relevant documents from the vector database

Response Generation:

- Enrich LLM prompt with retrieved documents

- Generate response using LLM
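
The query side of naive RAG is equally compact. A sketch that continues from the ingestion example above (same model and client); llm_complete is again a hypothetical placeholder for your LLM call:

# Naive RAG query path: embed the query, retrieve top-k chunks, enrich the prompt.
query = "How do I create a collection?"
query_vector = model.encode(query).tolist()

hits = client.search(collection_name="docs", query_vector=query_vector, limit=3)
context = "\n\n".join(hit.payload["text"] for hit in hits)

prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
answer = llm_complete(prompt)  # placeholder for whichever LLM you call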

25 of 46

Flavours of RAG - Advanced RAG

Query Treatment:

  • Routing
  • Rewriting
  • Expansion

Retrieval Response Treatment:

  • Rerank results
  • Fusion of multiple ranking algorithms (see the RRF sketch below)
  • Summarisation of retrieved results
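
Fusion of multiple ranking algorithms is commonly done with reciprocal rank fusion (RRF). A small, library-free sketch; the document IDs are placeholders:

# Reciprocal rank fusion: merge rankings from several retrievers into one list.
from collections import defaultdict

def rrf(rankings, k=60):
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # documents ranked high anywhere win
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. from exact / BM25 search
vector_hits = ["doc1", "doc5", "doc3"]   # e.g. from semantic search
print(rrf([keyword_hits, vector_hits]))  # fused ordering favours doc1 and doc3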

26 of 46

Flavours of RAG - Agentic / Self-Improving RAG

27 of 46

Enterprise-Ready, Massive-Scale Vector Search Technology for the Next AI Generation

  • Performance centric
  • Quick and easy to start
  • Resource optimization focused
  • All embeddings supported OOTB
  • Fully open-source project
  • Scalability oriented

Most loved open-source vector search database: 10K+ adopters worldwide, 7M+ downloads, 30K+ community members

28 of 46

Let’s build RAG

29 of 46

30 of 46

Challenges of RAG

31 of 46

How can these challenges affect our Applications?

Ref : https://theconversation.com/eat-a-rock-a-day-put-glue-on-your-pizza-how-googles-ai-is-losing-touch-with-reality-230953

32 of 46

33 of 46

RAG Improvement Techniques

  • Data Cleaning
  • Leverage Metadata
  • Advanced Data Extraction
  • Data Chunking (a minimal chunker is sketched after this list)
  • Embedding Model
  • Retrieval Window
  • Indexing Algorithms
  • Multi-Vector Indexing
  • Document Reranking
  • LLM Prompt Engineering
  • Prompts / Agents
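
Of these, data chunking is usually the first lever you touch. A minimal fixed-size chunker with overlap, word-based and purely illustrative; real pipelines often split on sentences, headings, or tokens, and the file path is a placeholder:

# Fixed-size chunking with overlap (word-based, illustrative only).
def chunk_text(text, chunk_size=200, overlap=40):
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        start += chunk_size - overlap  # step back by `overlap` words to preserve context
    return chunks

chunks = chunk_text(open("manual.txt").read())  # "manual.txt" is a placeholder path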

34 of 46

How do you evaluate RAG?

  • Document Processing Evaluation
  • Model Evaluation
  • Retrieval Evaluation
  • Prompt Evaluation
  • Response Evaluation
  • LLM Evaluation
  • Performance Evaluation

35 of 46

Evaluation is Paramount !!

Why should you evaluate?

  • Establish Trust (Reputation and Confidence)
  • Correlate outcomes with respect to your use case
  • Validate that your application avoids common pitfalls
  • Criterion to make go / no-go decisions
  • Roadmap for improvements
  • Compliance and Ethics

36 of 46

Landscape of RAG Evaluation

37 of 46

Evaluation Metrics

Precision & Relevance: Answer Relevance, Context Precision, Context Relevancy, Context Recall, Query Fulfillment

Faithfulness and Groundedness: Context Similarity, Faithfulness, Groundedness, Knowledge F1 Score, ROUGE

Hallucination Management: Hallucination, SelfCheckGPT-NLI, Summarization Accuracy

Correctness and Accuracy: Answer Correctness, Exact Match, F1 Score, Jaccard Similarity

Semantic and Syntactic Similarity: Answer Semantic Similarity, BERT Sentence Similarity, BERTScore, ROUGE, SacreBLEU

Coherence and Conciseness: Coherence, Conciseness, Completion Verbosity, Verbosity Ratio, No Gibberish

Context Utilization and Sufficiency / Summarization: Context Utilization, Context Sufficiency, Summarization Accuracy

Safety & Guardrails: Maliciousness, Harmfulness, Personal Information Detection, Prompt Injection, OpenAI Content Moderation, Safe for Work, No Sensitive Topics, Controversiality, Misogyny, Criminality, Insensitivity, Toxicity, Helpfulness

38 of 46

Domain-specific eval is essential for high-quality RAG apps

RAG quality is inherently use-case-dependent. It depends on the database and its contents.

  • Quantitative
  • Reliable
  • Explainable
  • Debuggable

39 of 46

How to pick relevant metrics?

Take the example of a RAG application built on documentation:

Quality of Answer
├── Answer Correctness
│   ├── Query Fulfillment
│   │   └── Completeness (SelfCheckGPT)
│   ├── Faithfulness and Groundedness
│   │   ├── Context Utilization
│   │   └── Derived from Document Chunks
│   │       ├── Context Sufficiency
│   │       └── Quality of Retrieved Chunks - Precision / Recall / nDCG
├── Helpfulness
├── Bias-Free
├── Non-Malicious
├── Privacy Compliance
│   └── No Personal Information Shared (PII)
├── Policy Compliance
├── Conciseness
│   └── Designated Number of Tokens (Cost)
└── Latency Requirements
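
The leaf node on retrieved-chunk quality (precision / recall / nDCG) is the easiest to compute directly from relevance judgements. A small sketch of nDCG@k for a single query; the relevance grades are placeholders you would obtain from human or LLM judges:

# nDCG@k: compare the graded relevance of returned chunks against the ideal ordering.
import math

def dcg(relevances):
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg_at_k(retrieved_relevances, k=5):
    ideal = dcg(sorted(retrieved_relevances, reverse=True)[:k])
    return dcg(retrieved_relevances[:k]) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 0, 1, 0]))  # relevance of chunks in the order they were returned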

40 of 46

Evaluation Code Walkthrough
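
A minimal version of such a walkthrough, assuming the ragas and datasets packages; the metric names and dataset columns below follow ragas conventions but should be checked against the installed version, and ragas calls an LLM judge (OpenAI by default) under the hood, so an API key is needed:

# Evaluation sketch with ragas; the sample data is a placeholder.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

eval_data = Dataset.from_dict({
    "question":     ["How do I create a collection in Qdrant?"],
    "answer":       ["Call create_collection with a collection name and vector parameters."],
    "contexts":     [["Collections are created via the create_collection API ..."]],
    "ground_truth": ["Use the create_collection API with a name and vector configuration."],
})

result = evaluate(
    eval_data,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # per-metric scores for the RAG pipeline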

41 of 46

Other Evaluation Metrics

  • Cost (Tokens)
  • Latency (Time)
  • Compliance
  • Continuous Integration with live traffic
  • Some ideas: https://athina.ai/
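
Cost and latency can be tracked with a few lines around each call. A sketch assuming the tiktoken tokenizer for rough token counting (your model's own tokenizer is authoritative); rag_pipeline is a hypothetical placeholder:

# Rough latency and token-cost tracking around a single RAG call.
import time
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # approximation; prefer your model's own tokenizer
question = "How do I create a collection?"

start = time.perf_counter()
answer = rag_pipeline(question)  # hypothetical placeholder for your RAG call
latency_s = time.perf_counter() - start

print(f"latency={latency_s:.2f}s "
      f"question_tokens={len(enc.encode(question))} "
      f"answer_tokens={len(enc.encode(answer))}")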

42 of 46

Further Experiments

  • Experiment with:
    • Different chunk size and chunk overlap settings
    • Embedding models
  • Experiment with retrieval techniques:
    • Tuning retrieval parameters from a vector search point of view
    • Hybrid Search RAG: exact matches + semantic similarity
    • Fusing retrieval algorithms with different techniques (RRF, etc.)
    • Using a reranker: Cohere, mixedbread, Jina, etc. (see the reranking sketch below)
  • Experiment with:
    • Different LLMs
    • Prompt tuning
    • Chaining LLMs
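
One of the cheaper experiments above is adding a reranker. A sketch using a cross-encoder from sentence-transformers; hosted rerankers from Cohere, mixedbread, or Jina follow the same pattern behind an API, and hits refers to the retrieval results from the earlier naive RAG sketch:

# Rerank retrieved chunks with a cross-encoder before they reach the prompt.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I create a collection?"
candidates = [hit.payload["text"] for hit in hits]  # top-k chunks from vector search

scores = reranker.predict([(query, text) for text in candidates])
reranked = [text for _, text in sorted(zip(scores, candidates), reverse=True)]
top_context = "\n\n".join(reranked[:3])  # keep only the best few chunks for the prompt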

43 of 46

To sum up

  • Your data is a crucial determinant of the complexity of RAG.
  • Domain understanding helps address challenges like document order, terminology, and chain of thought.
  • Avoid over-optimizing your first run; there's no substitute for evaluation-based improvements.
  • Regularly update your evaluation dataset to keep it aligned with the latest challenges in your LLM application.
  • Evaluate with a combination of carefully chosen metrics to effectively diagnose issues.
  • Ensure the scalability of your evaluation process to accommodate future expansions and refinements.
  • LLMs, combined with human evaluations, are among the most effective methods for assessing LLM-based applications.

44 of 46

Further Readings

45 of 46

References

46 of 46

Thank you !!

A free-forever 1GB cluster is included for trying it out. No credit card required.

Feel free to reach us at: info@qdrant.com