RAG ASSISTANT
A Personal Assistant
SACHIN JADHAV
RUTHWIK DASYAM
ZAHIR MAHAMMAD
CONTENTS
PROJECT SCOPE
Rather than giving a general answer, a personalized assistant that
PROBLEM
Goal: Retrieval-Augmented Generation
Develop a multi-modal foundation model that can retrieve and understand data from documents stored on local systems, regardless of document type (image, text, chart, or table). The assistant should:
The solution should include a UI-based chatbot that runs locally, allowing users to interact via speech or text:
RELATED WORK
1. Traditional OCR-based approaches
Reference: Smith, R. (2007). An Overview of the Tesseract OCR Engine (ICDAR 2007)
2. Poppler Library
Reference: Poppler Development Team. (2021). Poppler: PDF rendering library. freedesktop.org
3. FAISS for textual content indexing
Reference: Johnson, J., Douze, M., & Jégou, H. (2019). Billion-scale similarity search with GPUs. IEEE Transactions on Big Data
4. Donut Hugging Face model
Reference: Kim, G., Hong, S., et al. (2022). OCR-free Document Understanding Transformer. In European Conference on Computer Vision (ECCV 2022)
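To make the role of vector indexing concrete, the sketch below shows the operation that a library such as FAISS accelerates: exact nearest-neighbor search over embedding vectors. It is a pure-Python stand-in, not the FAISS API; the class name `FlatIndex`, its methods, and the three-dimensional toy vectors are assumptions made for illustration only.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class FlatIndex:
    """Hypothetical brute-force index: stores every vector and scans all of them per query."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, vec):
        if len(vec) != self.dim:
            raise ValueError("dimension mismatch")
        self.vectors.append(normalize(vec))

    def search(self, query, k):
        """Return the k best (score, position) pairs, highest score first."""
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), i)
            for i, v in enumerate(self.vectors)
        ]
        # Sort by descending score; break ties by insertion order.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return scored[:k]


# Toy embeddings (made up); a real system would use a text embedding model.
index = FlatIndex(dim=3)
index.add([1.0, 0.0, 0.0])
index.add([0.0, 1.0, 0.0])
index.add([0.9, 0.1, 0.0])
hits = index.search([1.0, 0.05, 0.0], k=2)
print([position for _, position in hits])
```

A brute-force scan like this costs time proportional to the number of stored vectors per query, which is the motivation for using an optimized similarity-search library once the document collection grows.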
Text-Based RAG
APPROACH
Image-Based RAG
APPROACH
Reference:
Combined Approach
APPROACH
Text in doc
Image in doc
RESULTS
Input: a folder containing PDFs.
The chatbot extracts the image and text data from the PDFs and stores the resulting indices.
It then retrieves the relevant context based on the user query.
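The index-then-retrieve flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: PDF parsing is skipped, the chunks and their source labels are hypothetical, and a simple word-count cosine similarity stands in for the embedding model and vector index a real system would use. The helper names `tokenize`, `build_index`, and `retrieve` are also made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_index(chunks):
    """Turn (source, text) pairs into (source, text, word counts) entries."""
    return [(src, text, Counter(tokenize(text))) for src, text in chunks]


def retrieve(index, query, k=1):
    """Return the k chunks most similar to the query, best first."""
    q = Counter(tokenize(query))
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(src, text) for src, text, _ in ranked[:k]]


# Hypothetical chunks standing in for text extracted from a folder of PDFs.
chunks = [
    ("magazine.pdf#p3", "Quantum computing uses qubits to represent information."),
    ("report.pdf#p1", "Startups in the city raised more funding this year."),
    ("rankings.pdf#p2", "University rankings for undergraduate computer science programs."),
]
index = build_index(chunks)
top = retrieve(index, "What are qubits in quantum computing?", k=1)
print(top[0][0])
```

The retrieved chunk would then be passed to the language model as context alongside the user query, so the answer is grounded in the user's own documents.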
RESULTS
User query: What does the image represent in the quantum section of tech mag
RESULTS
User query: What percentage of women owned startups in the world does chicago have
RESULTS
User query: Which is in top 10 univ of Computer Science for undergrads
RESULTS AND FINDINGS
GUI for the Chatbot
RESULTS AND FINDINGS
Audio Input and Audio Output for the Chatbot
Thank you.