1 of 35

LlamaIndex: Basics to Production

Ravi Theja:

Data Scientist - Glance (InMobi)

Open Source Contributor at LlamaIndex.

2 of 35

Context

  • LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.

Use Cases (powered by LLMs)

  • Question-Answering
  • Text Generation
  • Summarization
  • Planning

3 of 35

Context

  • How do we best augment LLMs with our own private data?

Use Cases (powered by LLMs)

  • Question-Answering
  • Text Generation
  • Summarization
  • Planning

Private data sources: APIs, Raw Files, SQL DBs, Vector Stores. How do we connect them to the LLM?

4 of 35

Paradigms for inserting knowledge

Fine-tuning - baking knowledge into the weights of the network

Example training data, baked into the LLM’s weights (via RLHF, Adam, SGD, etc.):

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep...

5 of 35

Paradigms for inserting knowledge

Fine-tuning - baking knowledge into the weights of the network

Downsides:

  • Data preparation effort
  • Lack of transparency
  • Often doesn’t work well for injecting new factual knowledge
  • High upfront cost

6 of 35

Paradigms for inserting knowledge

In-context learning - Fix the model, put context into the prompt

The model stays fixed; the source document is retrieved and placed into the input prompt:

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep...

Input Prompt:

Here is the context:

Before college the two main things…

Given the context, answer the following question: {query_str}
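
To make the pattern concrete, here is a minimal, framework-agnostic sketch; the model name and the openai.ChatCompletion call are illustrative of the OpenAI API of this era, and the context/question strings are placeholders:

import openai  # assumes openai<1.0 and OPENAI_API_KEY set in the environment

context = "Before college the two main things I worked on were writing and programming..."
question = "What did the author work on before college?"

prompt = (
    "Here is the context:\n"
    f"{context}\n\n"
    "Given the context, answer the following question:\n"
    f"{question}"
)

# The model stays fixed; knowledge enters only through the prompt
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)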

7 of 35

Key challenges of in-context learning

  • How to retrieve the right context for the prompt?
  • How to deal with long context?
  • How to deal with source data that is potentially very large? (GBs, TBs)
  • How to trade off between:
    • Performance
    • Latency
    • Cost

8 of 35

LlamaIndex: A data framework for LLM applications

  • Data Management and Query Engine for your LLM application
  • Offers components across the data lifecycle: ingest, index, and query over data

  • Data Ingestion (LlamaHub 🦙): connect your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)
  • Data Structures: store and index your data for different use cases; integrate with different databases.
  • Retrieval and Query Interface: given an input prompt, retrieve relevant context and synthesize a knowledge-augmented output.

9 of 35

LlamaIndex

Knowledge-Intensive LLM Applications

LlamaIndex is a data framework for LLM app development, sitting between foundation models and knowledge-intensive applications across teams (Sales, Marketing, Recruiting, Dev, Legal, Finance).

Input: rich query description
Output: rich response with references, actions, etc.

10 of 35

Data Connectors: powered by LlamaHub 🦙

  • Easily ingest any kind of data, from anywhere
    • into unified document containers
  • Powered by community-driven hub
    • rapidly growing (100+ loaders and counting!)
  • Growing support for multimodal documents (e.g. with inline images)

<10 lines of code to ingest from Notion
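
As a hedged sketch of that pattern (the integration token and page ID are placeholders you supply):

from llama_index import download_loader

# Pull the Notion loader from LlamaHub at runtime
NotionPageReader = download_loader("NotionPageReader")
reader = NotionPageReader(integration_token="<NOTION_INTEGRATION_TOKEN>")
documents = reader.load_data(page_ids=["<PAGE_ID>"])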

11 of 35

Data Indices + Query Interface

  • Your source documents are stored in a data collection (in-memory, MongoDB).
  • Our data indices provide a view of your raw data (vectors, keyword lookups, summaries).
  • A retriever fetches relevant documents for your query.
  • A query engine manages retrieval and synthesis given the query (see the sketch below).
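
A minimal sketch of the retriever / query engine split, using the same API as the demo slides that follow:

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)

# Retriever: returns the relevant Nodes only
retriever = index.as_retriever()
nodes = retriever.retrieve("What did the author do growing up?")

# Query engine: manages retrieval + response synthesis over those Nodes
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")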

12 of 35

Vector Store Index

Raw documents are chunked into Nodes (Node1, Node2, Node3) and stored in a vector store; each Node is indexed with an embedding (Embedding1, Embedding2, Embedding3).

13 of 35

Vector Store Index

14 of 35

Response Synthesis
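
The response synthesizer combines the retrieved Nodes with the query to produce the final answer. A hedged sketch of picking a synthesis strategy via response_mode (mode names per the LlamaIndex API of this era; an existing index is assumed):

# "compact" stuffs as much retrieved text per LLM call as fits
query_engine = index.as_query_engine(response_mode="compact")
# "refine" iterates chunk by chunk, refining the answer at each step
query_engine = index.as_query_engine(response_mode="refine")
# "tree_summarize" builds a bottom-up summary tree (used on the next slides)
query_engine = index.as_query_engine(response_mode="tree_summarize")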

15 of 35

Demo Walkthrough

Let’s play around with LlamaHub + index + query!

Easily ingest data

16 of 35

Use Case: Semantic Search

Answer

The author grew up writing short stories, programming on an IBM 1401, and working on microcomputers. He wrote simple games, a program to predict how high his model rockets would fly, and a word processor. He studied philosophy in college, but switched to AI. He reverse-engineered SHRDLU for his undergraduate thesis and wrote a book about Lisp hacking. He visited the Carnegie Institute and realized he could make art that would last.

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query(
    "What did the author do growing up?"
)

17 of 35

Use Case: Summarization

Answer

  • The author began writing and programming before college, and studied philosophy in college before switching to AI.
  • He realized that AI, as practiced at the time, was a hoax and decided to focus on Lisp hacking instead.
  • He wrote a book about Lisp hacking and graduated with a PhD in computer science.
  • ….

from llama_index import GPTListIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = GPTListIndex.from_documents(documents)
query_engine = index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query(
    "Could you give a summary of this article in newline separated bullet points?"
)

18 of 35

Using Open Source Models - GPT4ALL

# Imports assumed for this snippet (LlamaIndex ~0.6 / LangChain APIs of this era; not shown on the original slide)
from langchain.llms import GPT4All
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import (GPTVectorStoreIndex, LLMPredictor, LangchainEmbedding,
                         PromptHelper, ServiceContext, SimpleDirectoryReader)
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser

local_llm_path = './ggml-gpt4all-j-v1.3-groovy.bin'
llm = GPT4All(model=local_llm_path, backend='gptj', streaming=True, n_ctx=512)
llm_predictor = LLMPredictor(llm=llm)

embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

# max_chunk_overlap must be smaller than max_input_size
# (the original slide's value of 1000 would exceed the 512-token input size)
prompt_helper = PromptHelper(max_input_size=512, num_output=256, max_chunk_overlap=20)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embed_model,
    prompt_helper=prompt_helper,
    node_parser=SimpleNodeParser(
        text_splitter=TokenTextSplitter(chunk_size=300, chunk_overlap=20)
    ),
)

documents = SimpleDirectoryReader('data').load_data()  # assumed; not defined on the slide
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

# streaming=True so query() returns a token stream, matching the variable name below
query_engine = index.as_query_engine(similarity_top_k=1, streaming=True, service_context=service_context)
response_stream = query_engine.query("What are the main climate risks to our Oceans?")

Colab Notebook

19 of 35

Use Case: Building a Unified Query Interface

Can use a “Router” abstraction to route to different query engines.

For instance, can do joint semantic search / summarization (see the sketch below).
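
A hedged sketch of the Router pattern, using the RouterQueryEngine / QueryEngineTool APIs of this LlamaIndex era; both indices are assumed to be built over the same documents as on the demo slides:

from llama_index import GPTListIndex, GPTVectorStoreIndex, SimpleDirectoryReader
from llama_index.query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector
from llama_index.tools import QueryEngineTool

documents = SimpleDirectoryReader('data').load_data()
vector_index = GPTVectorStoreIndex.from_documents(documents)
list_index = GPTListIndex.from_documents(documents)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for semantic search over specific facts in the documents.",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=list_index.as_query_engine(response_mode="tree_summarize"),
    description="Useful for summarizing the documents as a whole.",
)

# An LLM selector reads the tool descriptions and routes each query
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)
response = query_engine.query("Give me a summary of this article.")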

20 of 35

Use Case: Document Comparisons

Say you want to compare the 2021 10-K filings for Uber and Lyft

Question: “Compare and contrast the customer segments and geographies that grew the fastest.”

Generate a query plan over your document sources.
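
A hedged sketch with a sub-question query engine, which generates that query plan across the two filings (the directory names are placeholders):

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

uber_index = GPTVectorStoreIndex.from_documents(SimpleDirectoryReader('uber_10k').load_data())
lyft_index = GPTVectorStoreIndex.from_documents(SimpleDirectoryReader('lyft_10k').load_data())

query_engine_tools = [
    QueryEngineTool(
        query_engine=uber_index.as_query_engine(),
        metadata=ToolMetadata(name="uber_10k", description="Uber's 2021 10-K filing"),
    ),
    QueryEngineTool(
        query_engine=lyft_index.as_query_engine(),
        metadata=ToolMetadata(name="lyft_10k", description="Lyft's 2021 10-K filing"),
    ),
]

# Decomposes the comparison into per-document sub-questions, then synthesizes
query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
response = query_engine.query(
    "Compare and contrast the customer segments and geographies that grew the fastest."
)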

21 of 35

Use Case: Exploiting Temporal Relationships

Given a question, what if we would like to retrieve additional context in the past or the future?

Example question: “What did the author do after his time at Y Combinator?”

Requires looking at context in the future!
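
One way to do this in the LlamaIndex of this era is a prev/next node postprocessor, which walks the document's node sequence in the docstore; a hedged sketch:

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
from llama_index.indices.postprocessor import PrevNextNodePostprocessor

documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)

# mode="next" fetches nodes that come *after* each retrieved node,
# pulling in the "future" context this kind of question needs
postprocessor = PrevNextNodePostprocessor(docstore=index.docstore, num_nodes=3, mode="next")
query_engine = index.as_query_engine(node_postprocessors=[postprocessor])
response = query_engine.query("What did the author do after his time at Y Combinator?")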

22 of 35

Use Case: Recency Filtering / Outdated nodes

Imagine you have three timestamped versions of the same document.

If you ask a question over this data, you want to make sure the answer comes from the latest version (a sketch follows).
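
A hedged sketch with a fixed-recency postprocessor; the class and its date_key/top_k parameters follow the API of this era and each document is assumed to carry a "date" field in its metadata:

from llama_index import GPTVectorStoreIndex, ServiceContext
from llama_index.indices.postprocessor import FixedRecencyPostprocessor

service_context = ServiceContext.from_defaults()
# Keep only the most recent matching node, judged by the "date" metadata key
postprocessor = FixedRecencyPostprocessor(service_context=service_context, date_key="date", top_k=1)

index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)  # documents assumed
query_engine = index.as_query_engine(similarity_top_k=3, node_postprocessors=[postprocessor])
response = query_engine.query("How much does the product cost?")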

23 of 35

Use Case: Masking PII information

NER PII Node Postprocessor

My name is Ravi Theja and I live in Bangalore. My email address is ravi.theja@gmail.com and my phone number is +91 9550164716.

My name is [NAME] and I live in [PLACE]. My email address is [EMAIL] and my phone number is [CONTACT]

Masking personal information before sending it to the LLM helps protect user privacy (a sketch follows).
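
A hedged sketch: attach the NER-based PII postprocessor so retrieved text is masked before it reaches the LLM (class name per the LlamaIndex API of this era; the exact mask labels depend on the underlying NER model):

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
from llama_index.indices.postprocessor import NERPIINodePostprocessor

documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)

# Retrieved nodes pass through the postprocessor, which masks detected entities
query_engine = index.as_query_engine(node_postprocessors=[NERPIINodePostprocessor()])
response = query_engine.query("Where does the author live?")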

24 of 35

Llama Readers

  • Dhruv and team at MumbaiHacks

25 of 35

Use Case: Text-to-SQL (Structured Data)

from sqlalchemy import create_engine
from llama_index import GPTSQLStructStoreIndex, SQLDatabase

# Assumed setup (not on the original slide): a SQLAlchemy engine whose DB
# contains a city_stats table, and wiki_docs loaded from Wikipedia pages
engine = create_engine("sqlite:///:memory:")
sql_database = SQLDatabase(engine, include_tables=["city_stats"])

# NOTE: the table_name specified here is the table that you
# want to extract into from unstructured documents.
index = GPTSQLStructStoreIndex.from_documents(
    wiki_docs,
    sql_database=sql_database,
    table_name="city_stats",
)

# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine(mode="default")
response = query_engine.query("Which city has the highest population?")
print(response)

Generated SQL:

SELECT city_name, population FROM city_stats ORDER BY population DESC LIMIT 1

26 of 35

Use Case: Joint Text-to-SQL and Semantic Search

Query with SQL over structured data, and “join” it with unstructured context from a vector database!

Combine the expressivity of SQL with semantic understanding (see the sketch below).
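
A hedged sketch with the SQL-auto-vector query engine of this era; it assumes the sql_query_engine from the previous slide and a vector_index built over per-city wiki articles:

from llama_index.query_engine import SQLAutoVectorQueryEngine
from llama_index.tools import QueryEngineTool

sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,  # text-to-SQL engine over the city_stats table
    description="Translates natural language into SQL over a table of city populations.",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(similarity_top_k=2),  # semantic search over city articles
    description="Answers semantic questions about individual cities.",
)

# Routes to SQL, the vector store, or both, and joins the results
query_engine = SQLAutoVectorQueryEngine(sql_tool, vector_tool)
response = query_engine.query(
    "Tell me about the arts and culture of the city with the highest population."
)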

27 of 35

Evaluation:

  • What is the need for evaluation?
  • Question Generator (see the sketch after this list).
  • Evaluators.
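
A hedged sketch of question generation with the DatasetGenerator from llama_index.evaluation (API of this era), producing evaluation questions directly from your own documents:

from llama_index import SimpleDirectoryReader
from llama_index.evaluation import DatasetGenerator

documents = SimpleDirectoryReader('data').load_data()
data_generator = DatasetGenerator.from_documents(documents)
eval_questions = data_generator.generate_questions_from_nodes()
print(eval_questions[:5])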

28 of 35

Response Evaluator

Takes in the generated response together with its source context (the retrieved chunks) and evaluates whether the response is supported by the sources, flagging likely hallucination.
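
A hedged sketch (ResponseEvaluator per the llama_index.evaluation API of this era; query_engine is assumed from the earlier slides):

from llama_index.evaluation import ResponseEvaluator

evaluator = ResponseEvaluator()  # uses the default LLM; pass a service_context to override
response = query_engine.query("What did the author do growing up?")
print(evaluator.evaluate(response))  # "YES" if the response is supported by its sources, else "NO"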

29 of 35

Query Response Evaluator

Takes in the query, the response, and the source context, and evaluates whether the response actually answers the query and is supported by the sources.
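
The query-aware variant, as a hedged sketch under the same assumptions:

from llama_index.evaluation import QueryResponseEvaluator

evaluator = QueryResponseEvaluator()
query = "What did the author do growing up?"
response = query_engine.query(query)
print(evaluator.evaluate(query, response))  # "YES" only if the response also answers the query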

30 of 35

Source Context Evaluation

Takes in the query and each retrieved source individually, and evaluates whether each source chunk is actually relevant to answering the query.
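
A hedged sketch of per-source evaluation (evaluate_source_nodes returns one verdict per retrieved source, per the API of this era):

from llama_index.evaluation import QueryResponseEvaluator

evaluator = QueryResponseEvaluator()
query = "What did the author do growing up?"
response = query_engine.query(query)
print(evaluator.evaluate_source_nodes(query, response))  # e.g. ["YES", "NO", ...]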

31 of 35

Albus

32 of 35

Workflow:

33 of 35

LlamaIndex in production

  1. Guide to build full stack web app with LlamaIndex.
  2. Deploying and monitoring with Vellum.
  3. Build and Scale a Powerful Query Engine with LlamaIndex and Ray.

OpenAI Pydantic Program

Easily perform structured data extraction into a Pydantic object with our `OpenAIPydanticProgram` (sketch below).

Notebook Here
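
A hedged sketch of OpenAIPydanticProgram (the Album/Song schema and prompt follow the docs example of this era and are illustrative):

from typing import List
from pydantic import BaseModel
from llama_index.program import OpenAIPydanticProgram

class Song(BaseModel):
    title: str
    length_seconds: int

class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]

program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,
    prompt_template_str="Generate an example album, with an artist and a list of songs, inspired by {movie_name}.",
    verbose=True,
)
output = program(movie_name="The Shining")  # returns an Album instance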

34 of 35

Integrations with Ecosystem

  • Use as a standalone module over your data
  • Use as an ingest/query layer on top of your storage system (e.g. vector db)
  • Use data loaders from LlamaHub on their own (e.g. with LangChain, raw OpenAI calls)
  • Use retriever modules for information retrieval
  • Use for LLM experimentation/prototyping (integrate with Vellum, AimStack, etc.)

35 of 35

Thanks!

Check out our docs for more details