LlamaIndex FAQ
This page serves as a lightweight, easy-to-update sibling of the main documentation at https://gpt-index.readthedocs.io/en/latest/
We will periodically pull the FAQs back into the main documentation page.
Q: Why are my responses being cut off or truncated?
A: OpenAI has a default max token output of 256 tokens. You can change this (and the equivalent setting for most third-party APIs) by following the guide in the docs: https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html#example-changing-the-number-of-output-tokens-for-openai-cohere-ai21
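For reference, a minimal sketch of that guide's approach, assuming the 0.5.x `ServiceContext` API (the model name, token count, and data directory are just example values):

```python
from langchain.llms import OpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, ServiceContext, SimpleDirectoryReader

# raise the output cap from the 256-token default to 512 tokens
llm_predictor = LLMPredictor(llm=OpenAI(model_name="text-davinci-003", max_tokens=512))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex(documents, service_context=service_context)
```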
Q: Can I use a custom or locally hosted LLM?
A: Yes! There is a guide to using any custom LLM, where a model from HuggingFace is loaded locally and used: https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html
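A rough sketch of that pattern (the wrapper class name, the gpt2 model, and the token count below are illustrative, not the guide's exact code): subclass LangChain's `LLM` base class around a locally loaded HuggingFace pipeline, then hand it to `LLMPredictor`.

```python
from typing import List, Optional
from langchain.llms.base import LLM
from transformers import pipeline
from llama_index import LLMPredictor

# load any local HuggingFace generation model (gpt2 used purely as an example)
hf_pipeline = pipeline("text-generation", model="gpt2")

class LocalHFLLM(LLM):
    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        generated = hf_pipeline(prompt, max_new_tokens=256)[0]["generated_text"]
        # the pipeline echoes the prompt, so strip it from the output
        return generated[len(prompt):]

llm_predictor = LLMPredictor(llm=LocalHFLLM())
```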
Q: Can I use custom embeddings?
A: Yes! There is a guide to using embeddings from HuggingFace here:
https://gpt-index.readthedocs.io/en/latest/how_to/customization/embeddings.html#custom-embeddings
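A minimal sketch of that approach, assuming the `LangchainEmbedding` wrapper from this era of the library (the specific sentence-transformers model is just an example):

```python
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext

# wrap a local sentence-transformers model so LlamaIndex uses it for embeddings
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
```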
Q: How is my data handled? Is it kept private?
A: By default, LlamaIndex uses OpenAI to calculate embeddings and to synthesize answers to queries. You can read about OpenAI's policies below:
https://openai.com/policies/privacy-policy
https://openai.com/policies/api-data-usage-policies
Q: Why are my responses worded strangely or lower quality than expected?
A: Are you using ChatGPT? The original internal prompts were optimized for OpenAI’s Davinci model. We’ve made some optimizations for ChatGPT-specific prompts, but ChatGPT is a pretty stubborn model to work with. You can try creating your own prompts as well.
You can read more about creating your own prompts here: https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html
For inspiration, ChatGPT-specific prompts are defined here: https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/chat_prompts.py
And the original default prompts are defined here: https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/default_prompts.py
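For example, a custom question-answering prompt might look like the following sketch, assuming the `QuestionAnswerPrompt` class from the docs linked above and an `index` built earlier (the template text and query are illustrative):

```python
from llama_index import QuestionAnswerPrompt

# the template must include the {context_str} and {query_str} variables
QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information, answer the question: {query_str}\n"
)
qa_prompt = QuestionAnswerPrompt(QA_PROMPT_TMPL)

response = index.query("What did the author do growing up?", text_qa_template=qa_prompt)
```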
Q: How can I improve the quality of responses?
A: The quality of responses usually comes down to the structure of the index. If you are using a single index, you may need to look into using a composable index. A list or keyword index over a set of sub list/vector indexes is a good start.
Furthermore, if you can do any logical splitting of your documents ahead of time (a sub-index for each topic, splitting documents according to sections, etc.), this will also help with response quality.
A guide to composable indexes is available here: https://gpt-index.readthedocs.io/en/latest/how_to/index_structs/composability.html
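As a rough sketch of composing indexes, assuming the 0.5.x `ComposableGraph` API described in that guide (the documents, topics, and summaries here are made up):

```python
from llama_index import GPTListIndex, GPTSimpleVectorIndex
from llama_index.indices.composability import ComposableGraph

# one vector index per topic
index_cats = GPTSimpleVectorIndex(cat_docs)
index_dogs = GPTSimpleVectorIndex(dog_docs)

# compose them under a list index; the summaries tell the outer
# index which sub-index is relevant to a given query
graph = ComposableGraph.from_indices(
    GPTListIndex,
    [index_cats, index_dogs],
    index_summaries=["Documents about cats.", "Documents about dogs."],
)
response = graph.query("What do cats eat?")
```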
Q: Why isn't the relevant context being used to answer my query?
A: If you are using a vector store index (e.g. `GPTSimpleVectorIndex`), only the top 1 document is retrieved as context by default, which might not contain the desired context for your query. Try using a higher `similarity_top_k` when querying.
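For example (the query text is just a placeholder):

```python
# retrieve the 3 most similar chunks instead of the default 1
response = index.query("What did the author do growing up?", similarity_top_k=3)
```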
Q: How do I create an empty index and add documents to it later?
A: Just construct the index as you normally would, but pass an empty array instead of actual documents, e.g. `index = GPTPineconeIndex([], pinecone_index=pinecone_index)`.
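You can then add documents one at a time with `insert`. A sketch, where `pinecone_index` is assumed to be your pre-created Pinecone index client:

```python
from llama_index import Document, GPTPineconeIndex

# start with no documents at all
index = GPTPineconeIndex([], pinecone_index=pinecone_index)

# insert documents as they become available
index.insert(Document("Some new text to index."))
```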
A: See https://github.com/jerryjliu/llama_index/issues/738
Q: Why are responses suddenly slow or failing?
A: Most likely due to OpenAI API availability. Switching to a different model might help. The #statuspage channel in the Discord also tracks any issues with OpenAI’s API.
Q: Why is my custom LLM or embedding model not used after loading a saved index?
A: You need to pass the same `llm_predictor` and `embed_model` that you used when initially creating the index. In version 0.5.0, this changed to using a `ServiceContext` object.
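A sketch of both variants, assuming the index was previously saved with `save_to_disk` (the file name is illustrative):

```python
from llama_index import GPTSimpleVectorIndex, ServiceContext

# pre-0.5.0: pass the predictor and embedding model directly
index = GPTSimpleVectorIndex.load_from_disk(
    "index.json", llm_predictor=llm_predictor, embed_model=embed_model
)

# 0.5.0+: bundle them into a ServiceContext
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, embed_model=embed_model
)
index = GPTSimpleVectorIndex.load_from_disk("index.json", service_context=service_context)
```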
Q: Which OpenAI models does LlamaIndex support?
A: LlamaIndex supports all models from OpenAI, including text-davinci-003 (the default), gpt-3.5-turbo (ChatGPT), and gpt-4!
To use gpt-4, see this example:
https://github.com/jerryjliu/llama_index/blob/main/examples/test_wiki/TestNYC-Tree-GPT4.ipynb
To use ChatGPT (gpt-3.5-turbo), the setup is the same idea; a sketch follows.
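A minimal sketch, assuming the LangChain `ChatOpenAI` wrapper used in examples of this era (the temperature is just an example setting):

```python
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext

# use gpt-3.5-turbo (ChatGPT) instead of the text-davinci-003 default;
# swap in "gpt-4" the same way if you have access
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
```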
The default internal prompts are defined here: https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/default_prompts.py
The ChatGPT-specific prompts are defined here: https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/chat_prompts.py
You can follow those to create your own and pass them in at query time:
```python
index.query(..., text_qa_template=my_template, refine_template=my_refine_template)
```
There is also a docs page here: https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html
Q: I'm running into token limit errors when building or querying an index. What can I do?
A: Try setting a lower `chunk_size_limit` in the `ServiceContext`.
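A sketch, assuming the 0.5.x `ServiceContext` API (the chunk size of 512 is just an example value):

```python
from llama_index import GPTSimpleVectorIndex, ServiceContext

# smaller chunks keep each LLM call well under the model's context window
service_context = ServiceContext.from_defaults(chunk_size_limit=512)
index = GPTSimpleVectorIndex(documents, service_context=service_context)
```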