1 of 8

ArcticSearch

  • Saurav Joshi

Document Efficient Search vs Hallucination

2 of 8

Hallucinations Challenge in LLM

  • Hallucination is the scenario where the model generates information that is not grounded in accurate knowledge or data. It occurs when the model produces outputs that are fictional, speculative, or unsupported by established facts or sources.
  • For example, if you were to ask a language model about a historical event that never occurred, it could still generate a plausible response, even though the event was entirely invented. These made-up responses from an LLM are called hallucinations.
  • To reduce hallucination in LLMs, we now have more domain-specific models and techniques that keep hallucination to a minimum. Hence, I have chosen a specific topic - documentation of various codebases - since LLMs tend to answer inaccurately when the questions asked are very specific. Additionally, I leveraged technologies such as Pinecone and llama-index to carry out this research.

3 of 8

Documentation

4 of 8

Tech Stack

  1. Python - Implemented the entire codebase in Python, using data-manipulation libraries such as Pandas, NumPy, etc.
  2. Llama-index - Used the llama-index parser to parse all the HTML web pages into documents and nodes, and built the model (i.e., the GPT index) using llama-index.
  3. Pinecone - Created vector embeddings and stored them in the vector database for high performance and efficiency.
  4. OpenAI - Leveraged OpenAI models to build a baseline, and fine-tuned a specific model using a retriever-aware training strategy.
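The parse-embed-store-query flow behind this stack can be sketched as follows. This is an illustrative, self-contained toy: real runs would use the llama-index parser, OpenAI embeddings, and a Pinecone index, whereas the hash-based embedding and in-memory store here are stand-ins invented for the sketch.

```python
# Toy sketch of the pipeline: chunk documentation into "nodes", embed each
# node, upsert into a vector store, then answer queries by cosine similarity.
import hashlib
import math

def chunk_document(text, chunk_size=40):
    """Split raw documentation text into fixed-size word chunks ("nodes")."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def toy_embed(text, dim=64):
    """Deterministic stand-in for a real embedding model (e.g. OpenAI embeddings)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class ToyVectorStore:
    """In-memory substitute for a Pinecone index: upsert vectors, query by cosine."""
    def __init__(self):
        self.items = []  # (id, vector, chunk_text)

    def upsert(self, node_id, vector, text):
        self.items.append((node_id, vector, text))

    def query(self, vector, top_k=1):
        scored = [(sum(a * b for a, b in zip(vector, v)), text)
                  for _, v, text in self.items]
        scored.sort(reverse=True)
        return [text for _, text in scored[:top_k]]

store = ToyVectorStore()
doc = "By default llama_index uses the text-davinci-003 model from OpenAI."
for i, node in enumerate(chunk_document(doc)):
    store.upsert(f"node-{i}", toy_embed(node), node)
print(store.query(toy_embed("which OpenAI model does llama_index use by default"), top_k=1))
```

In the actual system, the retrieved node text is then placed into the LLM prompt so answers stay grounded in the documentation.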

5 of 8

Dataset Construction Strategy

  • I fetched 8 open-source and closed-source documentation sets - Pinecone, Faiss, llama-index, pandas, Sentence Transformers, etc.
  • Extracted documents, and eventually nodes, using the llama-index parser.
  • Due to time restrictions, I used human evaluation to test the models - the dataset totals 30 rows, 15 for training and 15 for testing the various models.
  • Example row - Question: "By default llama_index uses which OpenAI model? Use this documentation for reference: https://gpt-index.readthedocs.io/en/latest/" Answer: text-davinci-003
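The dataset construction above can be sketched as a small script. The single example row comes from this slide; the other 29 rows are placeholders standing in for the remaining hand-written examples, and the shuffling/seed choice is an assumption for illustration.

```python
# Sketch of the hand-built QA dataset: 30 (question, answer) rows,
# split evenly into 15 training and 15 testing rows.
import random

rows = [{
    "question": "By default llama_index uses which OpenAI model? "
                "Use this documentation for reference: "
                "https://gpt-index.readthedocs.io/en/latest/",
    "answer": "text-davinci-003",
}]
# Placeholder rows standing in for the other 29 hand-written examples.
rows += [{"question": f"placeholder question {i}", "answer": f"placeholder answer {i}"}
         for i in range(29)]

random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(rows)
train, test = rows[:15], rows[15:]
print(len(train), len(test))  # → 15 15
```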

6 of 8

OpenAI Models

  1. GPT-3.5
  2. GPT-4
  3. Zero-shot fine-tuned GPT-3
  4. Prompt-aware fine-tuned GPT-3
  5. Prompt-aware GPT-4
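The difference between the zero-shot and prompt-aware settings can be sketched as prompt construction: the prompt-aware variants paste retrieved documentation nodes into the context before the question. The template strings below are illustrative assumptions, not the exact prompts used in the experiments.

```python
# Contrast a zero-shot prompt with a retrieval-augmented ("prompt aware") one.
def zero_shot_prompt(question):
    """Bare question, relying entirely on the model's parametric knowledge."""
    return f"Answer the question.\n\nQuestion: {question}\nAnswer:"

def retrieval_aware_prompt(question, retrieved_nodes):
    """Ground the model in retrieved documentation nodes to reduce hallucination."""
    context = "\n---\n".join(retrieved_nodes)
    return (
        "Answer the question using only the documentation excerpts below. "
        "If the answer is not in the excerpts, say so.\n\n"
        f"Documentation:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

question = "By default llama_index uses which OpenAI model?"
nodes = ["llama_index defaults to OpenAI's text-davinci-003 completion model."]
print(zero_shot_prompt(question))
print(retrieval_aware_prompt(question, nodes))
```

Either prompt would then be sent to the chosen OpenAI model; only the prompt-aware variants see the retrieved nodes.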

7 of 8

Results

8 of 8

Future Work

  • Fine-tune the GPT-4 model.
  • Fine-tune the Meta Llama 7B model.
  • Generate the training and testing datasets automatically.
  • Develop an application that presents results to an end user.