1 of 33

Chat with a custom data source using Gemini

Tarun Jain

Data Scientist at AI Planet

@TRJ_0751

Integrate LLMs with your custom dataset.

2 of 33

$whoami

Data Scientist @AIPlanet 🥑

AI with Tarun - YouTube🤖

Google Developer Expert in AI/ML 🦄

GSoC’24 @Red Hen Lab ⭐

I watch Anime and read Manga ⛩️

3 of 33

Let's talk about

Large Language Models

4 of 33

5 of 33

6 of 33

7 of 33

Zero Shot Prompting

Few Shot Prompting
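As an illustration, a minimal sketch of both prompting styles with the google-generativeai SDK; the model name, API key, and example prompts here are assumptions, not taken from the deck:

import google.generativeai as genai

genai.configure(api_key="AI...")  # illustrative key
model = genai.GenerativeModel(model_name="gemini-1.5-flash")

# Zero-shot: state the task directly, with no examples
zero_shot = "Classify the sentiment of this review as positive or negative: 'The battery dies in an hour.'"
print(model.generate_content(zero_shot).text)

# Few-shot: a handful of labelled examples precede the new input
few_shot = (
    "Review: 'Loved the camera.' Sentiment: positive\n"
    "Review: 'Screen cracked on day one.' Sentiment: negative\n"
    "Review: 'The battery dies in an hour.' Sentiment:"
)
print(model.generate_content(few_shot).text)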

8 of 33

Issues with Large Language Models

9 of 33

  1. Hallucinations
  2. Knowledge cut-off
  3. Lack of domain-specific, factual responses

10 of 33

Chat with your own data using Gemini 1.5 Pro and Flash

Retrieval Augmented Generation

11 of 33

12 of 33

13 of 33

External Data

14 of 33

External Data → Data Preprocessing (Splitting/Chunking)

15 of 33

External Data → Data Preprocessing (Splitting/Chunking) → Vector Embeddings

16 of 33

External Data → Data Preprocessing (Splitting/Chunking) → Vector Embeddings → Vector Database
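A minimal sketch of this indexing pipeline, assuming the google-generativeai embedding API and a plain in-memory list standing in for the vector database; the file name, chunk size, and embedding model are illustrative (BeyondLLM's source.fit handles all of this for you later):

import google.generativeai as genai

genai.configure(api_key="AI...")  # illustrative key

# External data: raw text from your source (here a hypothetical local file)
document = open("my_notes.txt").read()

# Data preprocessing: naive fixed-size chunking
chunk_size = 1024
chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

# Vector embeddings: one embedding per chunk
embeddings = [
    genai.embed_content(model="models/text-embedding-004",
                        content=chunk,
                        task_type="retrieval_document")["embedding"]
    for chunk in chunks
]

# "Vector database": a list of (chunk, embedding) pairs kept in memory
vector_store = list(zip(chunks, embeddings))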

17 of 33

Step 1: Retrieval

18 of 33

Step 2: Augment

19 of 33

Step 3: Generate
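A minimal sketch of the three steps over the in-memory store from the previous sketch, assuming cosine similarity for retrieval and gemini-1.5-flash for generation:

import numpy as np
import google.generativeai as genai

query = "what tool is the video mentioning?"

# Step 1 - Retrieval: embed the query and pick the most similar chunks
q_emb = genai.embed_content(model="models/text-embedding-004",
                            content=query,
                            task_type="retrieval_query")["embedding"]

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

top_chunks = sorted(vector_store, key=lambda pair: cosine(q_emb, pair[1]), reverse=True)[:4]

# Step 2 - Augment: stuff the retrieved chunks into the prompt as context
context = "\n\n".join(chunk for chunk, _ in top_chunks)
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

# Step 3 - Generate: Gemini answers grounded in the retrieved context
model = genai.GenerativeModel(model_name="gemini-1.5-flash")
print(model.generate_content(prompt).text)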

20 of 33

You can build this with 5 lines of code

21 of 33

from beyondllm import source, retrieve, generator
import os

# BeyondLLM uses Gemini embeddings and the Gemini LLM by default; get your API key to start
os.environ['GOOGLE_API_KEY'] = "AI******************8"

# Load your data using fit. Also define the chunk size
data = source.fit(
    "https://www.youtube.com/watch?v=oJJyTztI_6g",
    dtype="youtube",
    chunk_size=1024,
    chunk_overlap=0,
)

# Retriever => retrieves documents; use type for advanced RAG techniques
retriever = retrieve.auto_retriever(data, type="normal", top_k=4)  # embed_model defaults to Gemini

# Generate the AI response
query = "what tool is the video mentioning?"
pipeline = generator.Generate(question=query, retriever=retriever)  # llm defaults to Gemini
print(pipeline.call())

print(pipeline.get_rag_triad_evals())  # evaluate the LLM response
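For reference (not shown on the slide): get_rag_triad_evals() reports the standard RAG triad of scores for the generated answer, namely context relevancy, answer relevancy, and groundedness.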

25 of 33

!pip install google-generativeai

26 of 33

Multimodal

  • Gemini Pro Vision

import os
from beyondllm.llms import GeminiMultiModal

api = "..."   # Gemini API key
img = "..."   # path to the input image

os.environ['GOOGLE_API_KEY'] = api

# Multimodal LLM wrapper around Gemini Pro Vision
llm = GeminiMultiModal(model_name="gemini-pro-vision")

user_query = "which IPL franchise does he play for?"
print(llm.predict(prompt=user_query, image=img))

27 of 33

Function Calling

  • Gemini 1.5 Flash

import google.generativeai as genai

genai.configure(api_key=api)

# Plain Python functions the model can "call" (stub implementations)
def find_movies(description, location):
    return ["Barbie", "Oppenheimer"]

def find_theaters(location, movie):
    return ["Googleplex 16", "Android Theatre"]

functions = {
    "find_movies": find_movies,
    "find_theaters": find_theaters,
}

# Pass the functions as tools so Gemini can decide when to call them
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    tools=functions.values(),
)

query = "Which theaters in SF show the Oppenheimer movie?"
response = model.generate_content(query)
print(response)  # the response contains a function_call part naming the tool and its arguments
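The slide stops at printing the raw response; a hedged follow-up sketch of dispatching the returned function call to the matching Python function (field access follows the google-generativeai response structure):

# Follow-up sketch (assumption, not on the slide): run the function the model asked for
part = response.candidates[0].content.parts[0]
if part.function_call.name:                      # model chose to call a tool
    fn = functions[part.function_call.name]      # e.g. find_theaters
    args = dict(part.function_call.args)         # e.g. {"location": "SF", "movie": "Oppenheimer"}
    print(fn(**args))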

28 of 33

Prompting with Video

  • Gemini 1.5 Flash

import google.generativeai as genai

api = "...."          # Gemini API key
video_path = "..."    # local path to the video file

genai.configure(api_key=api)

# Upload the video via the File API so it can be referenced in a prompt
video_file = genai.upload_file(path=video_path)

model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")

prompt = "Describe this video."
response = model.generate_content([prompt, video_file])
print(response.text)
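One caveat the slide omits: uploaded videos are processed asynchronously, so the file may need to finish processing before prompting. A small sketch using the same SDK:

import time

# Poll until the uploaded video has finished server-side processing
while video_file.state.name == "PROCESSING":
    time.sleep(5)
    video_file = genai.get_file(video_file.name)

if video_file.state.name == "FAILED":
    raise RuntimeError("Video processing failed")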

29 of 33

To learn Advanced RAG

30 of 33

End of Slides

31 of 33

Thank You!

Tarun R Jain - LinkedIn

GDE in AI/ML

@TRJ_0751 - Twitter

32 of 33

Feedback

Please take a moment to share your feedback.

33 of 33

Code

Build a RAG app using the Gemini LLM and embeddings