
ChatEbola: A RAG-Based Chatbot For Ebola

  • Data collection: Ebola data were collected from reliable sources, including the World Health Organization (WHO) and the Centers for Disease Control and Prevention (CDC).

  • Data loading and chunking: The documents were loaded and split into smaller chunks.

  • Vector database: The chunks were embedded and stored in a vector database, allowing them to be searched by embedding similarity.

  • Context retrieval: Users' queries are encoded and sent to the vector store to retrieve relevant contexts.

  • Prompt engineering: A prompt combines the retrieved contexts with the user's query for answer generation.

  • Answer generation: A large language model (gemini-flash) is supplied with the contexts and queries to generate responses to questions.

  • Model evaluation: The performance of the overall RAG-based model is assessed on unseen datasets using Ragas (Retrieval-Augmented Generation Assessment), with human evaluation to follow.

  • Model deployment: Once performance is satisfactory, the model will be deployed as a chatbot using Streamlit for global access.
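
The chunking, retrieval, and prompting steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the bag-of-words `embed` function stands in for a neural encoder, `VectorStore` stands in for a real vector database, the chunk sizes and prompt wording are illustrative, and `generate_answer` takes any callable in place of the gemini-flash model.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding: term counts (assumption: stands in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database (hypothetical helper)."""

    def __init__(self):
        self.items = []  # list of (chunk, embedding) pairs

    def add(self, chunk):
        self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Combine retrieved contexts and the user's query (wording is illustrative)."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {query}\nAnswer:"
    )


def generate_answer(prompt, llm):
    """Call the language model; `llm` is any callable mapping prompt -> text."""
    return llm(prompt)


# Two short example documents paraphrased from this poster's introduction.
docs = [
    "Ebola virus disease spreads through contact with body fluids of infected people.",
    "Vaccines exist for some types of the virus.",
]
store = VectorStore()
for doc in docs:
    for chunk in chunk_text(doc):
        store.add(chunk)

query = "How does Ebola spread between people?"
contexts = store.search(query, k=1)
prompt = build_prompt(query, contexts)
print(prompt)
```

In a real deployment, the placeholder `llm` callable would wrap a call to the hosted model, and the toy embedding would be replaced by the encoder used to populate the vector database.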

Simeon Krah 1

Nigel Dolling 1

[1] Jacob, S. T., Crozier, I., Fischer, W. A., Hewlett, A., Kraft, C. S., Vega, M. A. D. L., ... & Kuhn, J. H. (2020). Ebola virus disease. Nature reviews Disease primers, 6(1), 13.

[2] Malvy, D., McElroy, A. K., de Clerck, H., Günther, S., & van Griensven, J. (2019). Ebola virus disease. The Lancet, 393(10174), 936-948.

[3] Baize, S., Pannetier, D., Oestereich, L., Rieger, T., Koivogui, L., Magassouba, N. F., ... & Günther, S. (2014). Emergence of Zaire Ebola virus disease in Guinea. New England Journal of Medicine, 371(15), 1418-1425.

[4] Raja, M., & Yuvaraajan, E. (2024, April). A RAG-based Medical Assistant Especially for Infectious Diseases. In 2024 International Conference on Inventive Computation Technologies (ICICT) (pp. 1128-1133). IEEE.

[5] Quidwai, M. A., & Lagana, A. (2024). A RAG Chatbot for Precision Medicine of Multiple Myeloma. medRxiv, 2024-03.

www.deeplearningindaba.com/2025/

Kigali, Rwanda

INTRODUCTION

Deep Learning Indaba 2025

DISCUSSION

The context recall, context precision, faithfulness, and answer correctness scores of 91%, 93%, 97%, and 80%, respectively, are in line with related work. The recall and precision scores indicate that the retriever returns most of the documents relevant to a query and that most of what it returns is relevant. The faithfulness and answer correctness scores indicate that the LLM relies largely on the retrieved contexts to generate its answers. Future work will aim to improve all metrics, especially the ability of the LLM to generate highly accurate and safe responses.

Once deployed, we hope the chatbot will provide timely, accurate, and trusted information about Ebola, complementing the global effort to address the disease. Ultimately, the chatbot should help educate the public, reduce the time needed to search for information, and combat misinformation about the Ebola vaccine, while drawing on its memory and conversational capabilities.

REFERENCES

CONCLUSION

Ebola virus disease (EVD) is a severe illness, usually characterised by haemorrhagic fever, caused by the Ebola virus. It is often fatal, with a case fatality rate ranging from 25% to 90%. People can become infected through contact with infected animals during butchering, cooking or eating. However, most cases stem from human-to-human transmission through contact with the body fluids or secretions of infected people.

Since it was first identified in 1976, Ebola has continued to threaten public health, especially in Central Africa, with 14 outbreaks as of 2017. Even with vaccines developed for some types of Ebola virus, cases are still being recorded, highlighting the need for additional ways of dealing with the disease.

Large language models, an application of natural language processing, have demonstrated important uses in public health. For instance, chatbots have been created to complement the efforts of healthcare professionals in tasks ranging from disease diagnosis to treatment. However, large language models have drawbacks, including hallucinations, knowledge cut-offs, and the additional cost of fine-tuning.

By utilizing Retrieval-Augmented Generation (RAG), this project seeks to provide a reliable and interactive platform for public health education about Ebola virus disease. The aim is to support early diagnosis, treatment, and prevention, and to address the misinformation and myths that drive vaccine hesitancy.

METHODS

1 Noguchi Memorial Institute for Medical Research

2 Department of Biomedical Engineering, University of Ghana

Samuel Kwofie 2

RESULTS

Upon evaluation with ragas, the model achieved:

  • Context recall score of 91%
  • Context precision score of 93%
  • Faithfulness score of 97%
  • Answer correctness score of 80%
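
To make the two retrieval metrics concrete, the sketch below computes a simplified, set-based analogue of context precision and context recall. It is an illustration only: Ragas derives its scores from LLM-based judgments and its own metric definitions, so these functions do not reproduce the figures above, and the chunk IDs are hypothetical.

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are relevant (simplified, set-based)."""
    if not retrieved:
        return 0.0
    return sum(1 for chunk in retrieved if chunk in relevant) / len(retrieved)


def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that were retrieved (simplified, set-based)."""
    if not relevant:
        return 0.0
    return sum(1 for chunk in relevant if chunk in retrieved) / len(relevant)


# Hypothetical chunk IDs for a single query.
retrieved = ["c1", "c2", "c3", "c4"]  # what the retriever returned
relevant = {"c1", "c2", "c5"}         # what an annotator marked as relevant

precision = context_precision(retrieved, relevant)  # 2 of 4 retrieved are relevant
recall = context_recall(retrieved, relevant)        # 2 of 3 relevant were retrieved
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision means little irrelevant text reaches the prompt, while high recall means the evidence needed for the answer is rarely missing.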

FUTURE WORK

  • Implement graph-based retrieval and response generation
  • Perform human expert evaluation
  • Deploy online as either a mobile or web app
  • Add multilingual capabilities