Major Project | End-term Presentation | Vanshika Mishra
LLMs as Question Answering Chatbot
Sudhir Sharma
Contents
About Samagra
SamagraX is dedicated to building and shaping open-source population-scale products, platforms, and protocols. The focus is on creating Building Blocks (BBs) and Digital Public Goods (DPGs) that empower governments to leverage technology and data to transform the lives of millions.
Building Blocks (BBs), as defined by GovStack, are software code and applications that are interoperable, scalable, and reusable across various use cases and contexts. They provide essential digital services at scale, ensuring efficient and effective solutions for diverse needs.
Digital Public Goods (DPGs) include open-source software, open data, open AI models, open standards, and open content. These resources adhere to privacy laws and best practices, are designed to do no harm, and contribute to achieving the Sustainable Development Goals (SDGs).
SamagraX operates across both these categories by either building solutions from scratch or enhancing existing BBs and DPGs. The portfolio includes shaping over 10 BBs and 3 DPGs, which have been instrumental in delivering more than 20 products across various domains and states.
The mission of SamagraX is to empower governments with innovative technology and data solutions, driving large-scale positive impact and transforming lives.
Brief Introduction of the Project
BASIC OVERVIEW
FEATURES OF THE PROJECT - Ingestion
5. Testing index types in Milvus
6. Document ingestion (see the sketch below)
7. Sample dataset creation to generate question-answer pairs
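A minimal sketch of the indexing and ingestion steps, assuming a local Milvus instance, a 384-dimensional sentence-transformers encoder, and illustrative collection and index parameters (these specifics are assumptions, not the project's actual configuration):

    # Ingestion sketch: connect, define a collection, build an index, insert chunks.
    from pymilvus import (Collection, CollectionSchema, DataType, FieldSchema,
                          connections)
    from sentence_transformers import SentenceTransformer

    connections.connect(host="localhost", port="19530")  # assumed local Milvus

    fields = [
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=2048),
        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384),
    ]
    collection = Collection("qa_docs", CollectionSchema(fields, "QA document chunks"))

    # IVF_FLAT is one index type to test; nlist is a placeholder parameter.
    collection.create_index("embedding", {
        "index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128},
    })

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
    chunks = ["New Delhi is the capital of India ...", "..."]  # cleaned text chunks
    collection.insert([chunks, encoder.encode(chunks).tolist()])
    collection.flush()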
FEATURES OF THE PROJECT - Retrieval
1. Tokenize the user's question and clean the input.
2. Generate the query embedding.
3. Find the top-n matching documents in the document store.
4. Evaluate the most suitable model for short-answer and generative answering over the retrieved context.
5. Fine-tune the selected model on the current dataset.
6. Pass the top-n documents to the model and extract the responses.
7. Merge the responses and return them to the end user.
8. Return the supporting context to the end user along with the answers (see the sketch below).
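A hedged end-to-end sketch of these retrieval steps, reusing the collection and encoder from the ingestion sketch above; the top-n value, search parameters, and merging strategy (keep the best-scoring span, return all contexts) are illustrative assumptions:

    # Retrieval sketch: clean and embed the query, search Milvus, read answers.
    from transformers import pipeline

    qa = pipeline("question-answering",
                  model="distilbert-base-uncased-distilled-squad")

    def answer(question: str, top_n: int = 5):
        query_vec = encoder.encode([question.lower().strip()])  # steps 1-2
        collection.load()
        hits = collection.search(query_vec.tolist(), "embedding",
                                 {"metric_type": "L2", "params": {"nprobe": 16}},
                                 limit=top_n, output_fields=["text"])  # step 3
        contexts = [h.entity.get("text") for h in hits[0]]
        # Steps 6-7: run the reader over each context, keep the best-scoring span.
        candidates = [qa(question=question, context=c) for c in contexts]
        best = max(candidates, key=lambda a: a["score"])
        # Step 8: return the supporting context alongside the answer.
        return {"answer": best["answer"], "contexts": contexts}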
PROJECT DESIGN
WORKFLOW
Text Extraction, Tokenization, and Cleaning
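A minimal sketch of this stage, assuming plain-text source articles (a PDF parser would replace read_text for other formats); the chunk size and overlap are illustrative:

    # Extraction and cleaning sketch: read articles, normalize text, chunk it.
    import re
    from pathlib import Path

    def clean_text(raw: str) -> str:
        text = re.sub(r"\s+", " ", raw)            # collapse whitespace
        text = re.sub(r"[^\x20-\x7E]", " ", text)  # drop non-printable debris
        return text.strip()

    def chunk(text: str, size: int = 500, overlap: int = 50):
        # Overlapping windows keep answers from being split across chunk borders.
        step = size - overlap
        return [text[i:i + size] for i in range(0, len(text), step)]

    docs = [clean_text(p.read_text()) for p in Path("articles").glob("*.txt")]
    chunks = [c for d in docs for c in chunk(d)]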
Sample Dataset Creation
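The deck does not show the record schema, but SQuAD-style pairs fit the extractive reader selected later; a hypothetical sample entry about India might look like:

    # One hypothetical question-answer pair tied to its source context.
    sample = {
        "question": "What is the capital of India?",
        "context": "New Delhi is the capital of India and the seat of all three "
                   "branches of the Government of India.",
        "answers": {"text": ["New Delhi"], "answer_start": [0]},
    }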
Testing Milvus Indexes and Ingesting Documents
MILVUS QUERY RESULTS
The table demonstrates the results of similarity searches using Milvus, based on a dataset of questions and answers about India. The retrieved results include the query question, the answer, and the context from which the answer was derived.
TESTING WHICH LLM IS BEST SUITED?
The Testing Metrics are -
3. Avg ROUGE-1 Score: ROUGE scores indicate how much of the reference content is covered by the generated answers. Higher ROUGE scores suggest that the model is capturing relevant information from the source documents.
4. Accuracy: Similarity score between the predicted answer and the ground-truth answer.
5. Latency: The time taken by the model to generate an answer for a given query.
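A sketch of how metrics 3-5 could be computed for one model under test; the rouge-score and sentence-transformers packages, the embedding model, and the helper itself are assumptions rather than the deck's actual harness:

    # Evaluation sketch: ROUGE-1 coverage, embedding-similarity accuracy, latency.
    import time

    from rouge_score import rouge_scorer
    from sentence_transformers import SentenceTransformer, util

    scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
    sim_model = SentenceTransformer("all-MiniLM-L6-v2")

    def evaluate(qa, question, context, reference):
        start = time.perf_counter()
        pred = qa(question=question, context=context)["answer"]
        latency = time.perf_counter() - start                      # metric 5
        rouge1 = scorer.score(reference, pred)["rouge1"].fmeasure  # metric 3
        emb = sim_model.encode([pred, reference])
        accuracy = float(util.cos_sim(emb[0], emb[1]))             # metric 4
        return {"rouge1": rouge1, "accuracy": accuracy, "latency": latency}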
Final Result
First priority was given to accuracy, second to latency. On this basis, distilbert-base-uncased-distilled-squad was selected as the LLM, and we fine-tuned it to our use case using LangChain.
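The deck credits LangChain for this step; as a hedged illustration only, the sketch below shows one common way to fine-tune the selected checkpoint on SQuAD-style pairs with the Hugging Face Trainer (the file name, max length, and hyperparameters are placeholders, not the project's values):

    # Fine-tuning sketch: map QA pairs to token-level answer spans, then train.
    from datasets import load_dataset
    from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                              Trainer, TrainingArguments, default_data_collator)

    model_name = "distilbert-base-uncased-distilled-squad"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # SQuAD-style records: question, context, answers={"text": [...], "answer_start": [...]}
    data = load_dataset("json", data_files={"train": "qa_pairs.json"})["train"]

    def to_features(examples):
        enc = tokenizer(examples["question"], examples["context"],
                        truncation="only_second", max_length=384,
                        padding="max_length", return_offsets_mapping=True)
        starts, ends = [], []
        for i, offsets in enumerate(enc["offset_mapping"]):
            ans = examples["answers"][i]
            s_char = ans["answer_start"][0]
            e_char = s_char + len(ans["text"][0])
            ids = enc.sequence_ids(i)
            c0 = ids.index(1)                        # first context token
            c1 = len(ids) - 1 - ids[::-1].index(1)   # last context token
            s_tok = e_tok = 0                        # (0, 0) = answer truncated away
            if offsets[c0][0] <= s_char and offsets[c1][1] >= e_char:
                for t in range(c0, c1 + 1):
                    if offsets[t][0] <= s_char < offsets[t][1]:
                        s_tok = t
                    if offsets[t][0] < e_char <= offsets[t][1]:
                        e_tok = t
            starts.append(s_tok)
            ends.append(e_tok)
        enc["start_positions"], enc["end_positions"] = starts, ends
        enc.pop("offset_mapping")
        return enc

    train = data.map(to_features, batched=True, remove_columns=data.column_names)
    args = TrainingArguments(output_dir="qa-finetuned", num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=3e-5)
    Trainer(model=model, args=args, train_dataset=train,
            data_collator=default_data_collator).train()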
Getting User Query and Processing it
Step 1: Get the user query and pre-process it.
Step 2: Form the question embedding and pad it to the vector database's sequence length (a padding sketch follows below).
Step 3: Query the vector database to find the top-n documents.
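A minimal sketch of step 2's padding, assuming the collection expects fixed 384-dimensional vectors (the dimension and helper name are assumptions):

    import numpy as np

    def pad_to_dim(vec, dim: int = 384):
        # Pad (or truncate) a query embedding to the collection's vector dimension.
        out = np.zeros(dim, dtype=np.float32)
        n = min(len(vec), dim)
        out[:n] = vec[:n]
        return out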
Query Result
Future Scope:
a. Knowledge Base Expansion:
Continuously ingest new and diverse articles to expand the knowledge base. Incorporate various types of data such as multimedia content (videos, images) and structured data (tables, graphs) to provide a richer set of information.
b. Model Improvement:
Regularly update and fine-tune the LLMs with new datasets and user feedback to improve accuracy and contextual understanding. Experiment with cutting-edge LLM architectures and fine-tuning techniques to push the boundaries of performance.
c. Multilingual Support:
Extend the system’s capabilities to handle queries in multiple languages, making it accessible to a global audience.
d. User Personalization:
Develop mechanisms to personalize responses based on individual user preferences, query histories, and contextual factors.
e. User Feedback Loop:
Establish a feedback loop where users can rate the relevance and accuracy of the responses.
f. Advanced Query Handling:
Enhance the system’s ability to understand and process complex, multi-faceted questions.
Learnings
THANK YOU FOR THIS INCREDIBLE EXPERIENCE