LlamaIndex FAQ
This page serves as a lightweight, easy-to-update sibling of the main documentation at https://gpt-index.readthedocs.io/en/latest/
We will periodically pull the FAQs back into the main documentation page.
Q: Why are my responses being cut off or truncated?
A: OpenAI has a default max token output of 256 tokens. You can change this (and the equivalent setting for most third-party APIs) by following the guide in the docs: https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html#example-changing-the-number-of-output-tokens-for-openai-cohere-ai21
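For reference, a minimal sketch of that guide's approach, assuming the 0.5.x `ServiceContext` API (the model name, token count, and data directory are just example values):

```python
from langchain.llms import OpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, ServiceContext, SimpleDirectoryReader

# raise the output cap from the 256-token default to 512 tokens
llm_predictor = LLMPredictor(llm=OpenAI(model_name="text-davinci-003", max_tokens=512))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex(documents, service_context=service_context)
```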
Q: Can I use a custom or locally hosted LLM?
A: Yes! There is a guide to using any custom LLM, where a model from HuggingFace is loaded locally and used: https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html
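A rough sketch of that pattern (the wrapper class name, the gpt2 model, and the token count below are illustrative, not the guide's exact code): subclass LangChain's `LLM` base class around a locally loaded HuggingFace pipeline, then hand it to `LLMPredictor`.

```python
from typing import List, Optional
from langchain.llms.base import LLM
from transformers import pipeline
from llama_index import LLMPredictor

# load any local HuggingFace generation model (gpt2 used purely as an example)
hf_pipeline = pipeline("text-generation", model="gpt2")

class LocalHFLLM(LLM):
    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        generated = hf_pipeline(prompt, max_new_tokens=256)[0]["generated_text"]
        # the pipeline echoes the prompt, so strip it from the output
        return generated[len(prompt):]

llm_predictor = LLMPredictor(llm=LocalHFLLM())
```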
Q: Can I use custom embeddings?
A: Yes! There is a guide to using embeddings from HuggingFace here:
https://gpt-index.readthedocs.io/en/latest/how_to/customization/embeddings.html#custom-embeddings
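A minimal sketch of that approach, assuming the `LangchainEmbedding` wrapper from this era of the library (the specific sentence-transformers model is just an example):

```python
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext

# wrap a local sentence-transformers model so LlamaIndex uses it for embeddings
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
```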
Q: How is my data handled? Is it kept private?
A: By default, LlamaIndex uses OpenAI to calculate embeddings and to synthesize answers to queries. You can read about OpenAI's policies below:
https://openai.com/policies/privacy-policy
https://openai.com/policies/api-data-usage-policies
Q: Why are my responses worded strangely or lower quality than expected?
A: Are you using ChatGPT? The original internal prompts were optimized for OpenAI’s Davinci model. We’ve made some optimizations for ChatGPT-specific prompts, but ChatGPT is a pretty stubborn model to work with. You can try creating your own prompts as well.
You can read more about creating your own prompts here: https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html
For inspiration, ChatGPT-specific prompts are defined here: https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/chat_prompts.py
And the original default prompts are defined here: https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/default_prompts.py
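For example, a custom question-answering prompt might look like the following sketch, assuming the `QuestionAnswerPrompt` class from the docs linked above and an `index` built earlier (the template text and query are illustrative):

```python
from llama_index import QuestionAnswerPrompt

# the template must include the {context_str} and {query_str} variables
QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information, answer the question: {query_str}\n"
)
qa_prompt = QuestionAnswerPrompt(QA_PROMPT_TMPL)

response = index.query("What did the author do growing up?", text_qa_template=qa_prompt)
```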
Q: How can I improve the quality of responses?
A: The quality of responses usually comes down to the structure of the index. If you are using a single index, you may need to look into using a composable index. A list or keyword index over a set of sub list/vector indexes is a good start.
Furthermore, if you can do any logical splitting of your documents ahead of time (a sub-index for each topic, splitting documents according to sections, etc.), this will also help with response quality.
A guide to composable indexes is available here: https://gpt-index.readthedocs.io/en/latest/how_to/index_structs/composability.html
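As a rough sketch of composing indexes, assuming the 0.5.x `ComposableGraph` API described in that guide (the documents, topics, and summaries here are made up):

```python
from llama_index import GPTListIndex, GPTSimpleVectorIndex
from llama_index.indices.composability import ComposableGraph

# one vector index per topic
index_cats = GPTSimpleVectorIndex(cat_docs)
index_dogs = GPTSimpleVectorIndex(dog_docs)

# compose them under a list index; the summaries tell the outer
# index which sub-index is relevant to a given query
graph = ComposableGraph.from_indices(
    GPTListIndex,
    [index_cats, index_dogs],
    index_summaries=["Documents about cats.", "Documents about dogs."],
)
response = graph.query("What do cats eat?")
```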
Q: Why isn't the relevant context being used to answer my query?
A: If you are using a vector store index (e.g. `GPTSimpleVectorIndex`), only the top 1 document is retrieved as context by default, which might not contain the desired context for your query. Try using a higher `similarity_top_k` when querying.
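For example (the query text is just a placeholder):

```python
# retrieve the 3 most similar chunks instead of the default 1
response = index.query("What did the author do growing up?", similarity_top_k=3)
```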
Q: How do I create an empty index and add documents to it later?
A: Just construct the index as you normally would, but pass an empty array instead of actual documents, e.g. `index = GPTPineconeIndex([], pinecone_index=pinecone_index)`.
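You can then add documents one at a time with `insert`. A sketch, where `pinecone_index` is assumed to be your pre-created Pinecone index client:

```python
from llama_index import Document, GPTPineconeIndex

# start with no documents at all
index = GPTPineconeIndex([], pinecone_index=pinecone_index)

# insert documents as they become available
index.insert(Document("Some new text to index."))
```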
A: See https://github.com/jerryjliu/llama_index/issues/738
Q: Why are responses suddenly slow or failing?
A: Most likely due to OpenAI API availability. Switching to a different model might help. The #statuspage channel in the Discord also tracks any issues with OpenAI’s API.
Q: Why is my custom LLM or embedding model not used after loading a saved index?
A: You need to pass the same `llm_predictor` and `embed_model` that you used when initially creating the index. In version 0.5.0, this changed to using a `ServiceContext` object.
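A sketch of both variants, assuming the index was previously saved with `save_to_disk` (the file name is illustrative):

```python
from llama_index import GPTSimpleVectorIndex, ServiceContext

# pre-0.5.0: pass the predictor and embedding model directly
index = GPTSimpleVectorIndex.load_from_disk(
    "index.json", llm_predictor=llm_predictor, embed_model=embed_model
)

# 0.5.0+: bundle them into a ServiceContext
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, embed_model=embed_model
)
index = GPTSimpleVectorIndex.load_from_disk("index.json", service_context=service_context)
```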
Q: Which OpenAI models does LlamaIndex support?
A: LlamaIndex supports all models from OpenAI, including text-davinci-003 (the default), gpt-3.5-turbo (ChatGPT), and gpt-4!
To use gpt-4, see this example:
https://github.com/jerryjliu/llama_index/blob/main/examples/test_wiki/TestNYC-Tree-GPT4.ipynb
To use ChatGPT (gpt-3.5-turbo), the setup is the same idea; a sketch follows.
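A minimal sketch, assuming the LangChain `ChatOpenAI` wrapper used in examples of this era (the temperature is just an example setting):

```python
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext

# use gpt-3.5-turbo (ChatGPT) instead of the text-davinci-003 default;
# swap in "gpt-4" the same way if you have access
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
```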
The default internal prompts are defined here: https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/default_prompts.py
The ChatGPT-specific prompts are defined here: https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/chat_prompts.py
You can follow those to create your own and pass them in at query time:
```python
index.query(..., text_qa_template=my_template, refine_template=my_refine_template)
```
There is also a docs page here: https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html
Q: I'm running into token limit errors when building or querying an index. What can I do?
A: Try setting a lower `chunk_size_limit` in the `ServiceContext`.
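A sketch, assuming the 0.5.x `ServiceContext` API (the chunk size of 512 is just an example value):

```python
from llama_index import GPTSimpleVectorIndex, ServiceContext

# smaller chunks keep each LLM call well under the model's context window
service_context = ServiceContext.from_defaults(chunk_size_limit=512)
index = GPTSimpleVectorIndex(documents, service_context=service_context)
```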