
Developing, Implementing, and Testing Reproducible AI Chatbots in Research and Educational Applications

Devan Alpesh Patel1,2, T. L. Swetnam2

1KEYS intern, 2BIO5 Institute, University of Arizona

Introduction

Materials and Methods

Acknowledgements

Discussion

Implementation

  • CyVerse is a powerful computational infrastructure for researchers & educators

  • AI models have grown in popularity, efficiency, and capability with the rise of GPTs and LLMs

  • Researchers want access to AI chatbots trained on their own datasets, which can yield faster, more accurate data-driven insights

  • Open Science best practices call for transparency and reproducibility through open-source software rather than commercial software such as OpenAI's products

  • Creating detailed documentation helps enhance understanding, collaboration, and reproducibility of scientific research
  • Accuracy is verified by citations and hyperlinks of all sources

  • Chatbots enable users to quickly receive accurate responses

  • Each Vector Store (knowledge base) can be automatically updated with current information

  • The assistant can also search the internet to gather information
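One way to keep a knowledge base current is to re-check the source pages on a schedule and re-upload only what changed. The sketch below is a minimal illustration under stated assumptions, not the poster's actual update script: the `pages` and `manifest` dictionaries, the function names, and the example URLs are all hypothetical, and the step that talks to the vector store is deliberately left out.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_update(pages: dict[str, str], manifest: dict[str, str]) -> dict[str, list[str]]:
    """Decide which pages need to be re-uploaded or removed.

    pages:    URL -> text fetched on this run (hypothetical crawler output)
    manifest: URL -> hash recorded after the previous run
    """
    current = {url: content_hash(text) for url, text in pages.items()}
    return {
        # New pages and pages whose text changed since the last run.
        "upload": sorted(u for u, h in current.items() if manifest.get(u) != h),
        # Pages that were recorded before but no longer exist.
        "delete": sorted(u for u in manifest if u not in current),
    }


# Hypothetical example: one page edited, one unchanged, one removed, one new.
previous = {
    "https://example.org/intro": content_hash("old intro"),
    "https://example.org/setup": content_hash("setup steps"),
    "https://example.org/retired": content_hash("retired page"),
}
fetched = {
    "https://example.org/intro": "new intro",
    "https://example.org/setup": "setup steps",
    "https://example.org/faq": "frequently asked questions",
}
plan = plan_update(fetched, previous)
```

A scheduled job (for example, a cron job on the virtual machine) could run a check like this and pass only the `upload` and `delete` lists to whatever client manages the vector store, which avoids re-sending unchanged files.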

  • Implementation Cost: Hosting a Virtual Machine

  • Running Cost: Proportional to amount and size of messages

  • Message (token) limits are necessary to prevent overspending ($)

  • OpenAI (ChatGPT) streamlines the development process by helping to write code and resolve errors quickly
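A token limit can be enforced with a small guard that tracks usage and refuses requests once a cap is reached. The following is a sketch only; the `TokenBudget` class, its method names, and the cap of 100,000 tokens are hypothetical and are not taken from the deployed chatbot.

```python
class TokenBudget:
    """Track token usage and refuse requests that would exceed a fixed cap."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def allows(self, requested: int) -> bool:
        """Return True if `requested` more tokens still fit under the cap."""
        return self.used + requested <= self.max_tokens

    def record(self, tokens: int) -> None:
        """Add `tokens` to the running total, or raise if the cap would be exceeded."""
        if not self.allows(tokens):
            raise RuntimeError("token budget exceeded")
        self.used += tokens


# Hypothetical daily cap; the real limit would depend on the available funding.
budget = TokenBudget(max_tokens=100_000)
budget.record(60_000)
can_send_more = budget.allows(50_000)  # False: 110,000 would exceed the cap
```

Checking `allows` before each request lets the chatbot return a friendly "limit reached" message instead of incurring further charges.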

The entirety of KEYS has been an amazing experience! I want to thank my wonderful PI and mentor, Dr. Tyson Swetnam, for all he’s taught me, for supporting me throughout the summer, and for dedicating his time to mentor me. I also want to thank Tony Edgin and Nirav Merchant from CyVerse, BIO5 Institute, Data Science Institute, and Institute for Computation and Data Enabled Insight. Finally, I am beyond thankful for all the KEYS staff and their hard work to make this internship possible.

Scan QR for references

Research Goal

Create a custom GPT-4o chatbot trained on any website. Implement my chatbot on the CyVerse Learning Materials websites.
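Before a website can feed a chatbot's knowledge base, its pages usually need to be reduced to plain text. The sketch below shows one possible approach using only Python's standard library; the `TextExtractor` class and `html_to_text` function are hypothetical names, and the poster does not state that this is how its data files were produced.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><head><style>p{}</style><script>var x = 1;</script></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
text = html_to_text(sample)
```

The resulting text could then be saved as a data file and added to the vector store.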

Figure 1. Process flow chart

Materials:

Cron Job

Virtual Machine

Vector Store

Data Files

User Website

See my KEYS documentation website for all the details: https://devan-p.github.io/KEYS2024

The chatbot is also active there!

Documentation

Figure 10. Desktop view of the custom chatbot (red) on a CyVerse Workshop website (https://foss.cyverse.org/).

Vector Store

Figure 2. HTML icon

Figure 3. CSS icon

Figure 4. JavaScript icon

Figure 5. Python icon

Figure 6. Chatbot icon

Figure 7. OpenAI icon

Figure 8. GitHub icon

Figure 9. MkDocs icon

Cost

  • Interacting with a custom LLM assistant can be expensive, depending on the model, the number of tokens, and the vector store size

  • GPT-4o: $5.00 / 1M input tokens; $15.00 / 1M output tokens

  • On average, a single message and response costs about $0.09

  • CyVerse is designing an open-source chatbot to replace OpenAI’s API
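The per-token prices listed above translate into a per-exchange cost with simple arithmetic. The sketch below uses the GPT-4o rates from this poster; the function name and the token counts in the example are hypothetical and chosen only for illustration, not measured from the deployed chatbot.

```python
# GPT-4o prices as listed on this poster (USD per 1 million tokens).
INPUT_USD_PER_MILLION = 5.00
OUTPUT_USD_PER_MILLION = 15.00


def exchange_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one message and its response."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical exchange: 15,000 input tokens (question plus retrieved
# context) and 1,000 output tokens.
cost = exchange_cost(15_000, 1_000)  # 0.075 + 0.015 = 0.09 USD
```

Because the retrieved context counts as input tokens, the input side can dominate the cost even when the user's question is short.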

Average Cost for 1 message and response: ~9.5¢

Figure 11. Mobile view of the chatbot button (red) on a CyVerse Workshop website (https://foss.cyverse.org/).

Figure 13. Process flow chart. Depicts updates to the assistant’s knowledge base.

Figure 12. Depiction of the Vector Store, which contains files represented by Word, Excel, Markdown, PowerPoint, webpage, PDF, MP4, and JPEG icons.