Async Batch Processing Solution for Cloud-LLMs
Yuhan Chen, Noah Robitshek, Sergio Rodriguez,
Andrew Sasamori, Rayan Syed, Bennett Taylor
In collaboration with Two Sigma
Data Transfer
Dataset
Data Batches
Data Partition
Kafka Topic
Data Transfers
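The pipeline named above (dataset → data batches → partition → Kafka topic → transfer) can be sketched in pure Python. This is a minimal sketch, not the team's implementation: the batch size, partition count, and JSON payload shape are illustrative assumptions, and the actual Kafka produce call is stubbed out as a list of `(partition, payload)` pairs.

```python
import hashlib
import json

def make_batches(records, batch_size=4):
    """Split a dataset into fixed-size batches for transfer."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

def partition_for(batch_id, num_partitions=3):
    """Map a batch id to a Kafka partition via a stable hash
    (same role as a Kafka partition key)."""
    digest = hashlib.md5(batch_id.encode()).hexdigest()
    return int(digest, 16) % num_partitions

def stage_transfers(records, batch_size=4, num_partitions=3):
    """Produce (partition, payload) pairs ready to send to a Kafka topic."""
    transfers = []
    for i, batch in enumerate(make_batches(records, batch_size)):
        batch_id = f"batch-{i}"
        payload = json.dumps({"id": batch_id, "rows": batch})
        transfers.append((partition_for(batch_id, num_partitions), payload))
    return transfers
```

A real producer would hand each pair to the Kafka client for the chosen topic; hashing on the batch id keeps all retries of one batch on the same partition.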
Data Formats and Partitioning
API Frameworks
API Hosting
System Design (API)
Users
API
WebSocket or HTTP Communication
Apache Backend
Kafka Queueing/PubSub
Flink and Redis Data Store
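The flow on this slide (user → API → queue → worker → data store) can be simulated end to end with in-memory stand-ins. This is a sketch of the message flow only: `InMemoryBroker` stands in for Kafka pub/sub, `ResultStore` for the Redis data store, and `process_next` for the Flink worker role; the names and payload format are assumptions for illustration.

```python
from collections import defaultdict, deque

class InMemoryBroker:
    """Stand-in for the Kafka pub/sub layer: topics backed by deques."""
    def __init__(self):
        self.topics = defaultdict(deque)

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic):
        return self.topics[topic].popleft() if self.topics[topic] else None

class ResultStore:
    """Stand-in for the Redis data store holding processed results."""
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

def handle_request(broker, request_id, payload):
    """API layer: accept a user request (over WebSocket or HTTP)
    and queue it for asynchronous processing."""
    broker.publish("llm-requests", {"id": request_id, "payload": payload})
    return request_id

def process_next(broker, store):
    """Worker (the Flink role here): consume one queued message
    and write its result to the data store."""
    msg = broker.consume("llm-requests")
    if msg is not None:
        store.set(msg["id"], f"processed:{msg['payload']}")
```

The key property this models is decoupling: the API returns as soon as the message is queued, and the user later fetches the result from the store by request id.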
Rate Limiting
Logic:
Potential Rate Limit Error Handling
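One common shape for the rate-limiting logic and error handling mentioned above is a token bucket on the client side plus exponential backoff when the provider still returns a rate-limit error. This is a generic sketch, not the project's actual logic: the capacity, refill rate, and `RateLimitError` (standing in for an HTTP 429) are illustrative assumptions.

```python
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from the LLM provider."""

class TokenBucket:
    """Token-bucket limiter: refills refill_rate tokens/sec up to capacity."""
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1):
        """Spend `cost` tokens if available; otherwise refuse the request."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

def call_with_backoff(request_fn, max_retries=3, base_delay=0.01, sleep=time.sleep):
    """Retry a request on RateLimitError, doubling the delay each attempt."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

The bucket throttles outbound traffic proactively; the backoff handles the residual 429s that get through anyway.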
LLM APIs
The choice of API model is based primarily on token usage
Data going into an API request should already be parsed into the format required by the parameters of the chosen LLM
LLM nodes:
Individual instances or endpoints of the language model that handle request processing. Each LLM node represents a separate API call to a different language model
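The fan-out across LLM nodes described above can be sketched with `asyncio`: each prompt in a batch becomes one concurrent API call, assigned round-robin to an available node. This is a sketch under stated assumptions: `call_llm_node` is a mock (the node names, the simulated latency, and the echoed response are not a real provider API).

```python
import asyncio

async def call_llm_node(node_url, prompt):
    """Stand-in for one LLM node: a single async API call to an endpoint.
    The sleep simulates network latency; the echo simulates a completion."""
    await asyncio.sleep(0.01)
    return {"node": node_url, "completion": f"echo:{prompt}"}

async def dispatch_batch(nodes, prompts):
    """Fan a batch of prompts out across LLM nodes, one request per prompt,
    round-robin over the available nodes, awaiting all results concurrently."""
    tasks = [
        call_llm_node(nodes[i % len(nodes)], prompt)
        for i, prompt in enumerate(prompts)
    ]
    return await asyncio.gather(*tasks)
```

Because `asyncio.gather` runs the calls concurrently, batch latency is bounded by the slowest single request rather than the sum of all requests.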
Future Plans
Burndown Chart
Appendix
System Design (TEMPLATE)
Users
API
Load Balancer
Batch Processing
LLM Nodes
Major Roadblocks