Evolving KServe: The Unified Model Inference Platform for Both Predictive and Generative AI
Yuan Tang
Senior Principal Software Engineer, Red Hat
Project Lead, KServe
#KubeCon #CloudNativeCon
Model inference platform
Supported runtimes
Orchestration
Hardware accelerators (GPUs, CPUs, etc.)
Cloud native integrations
Autoscaling
Networking
Generative
Predictive
GenAI integrations
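The platform capabilities above come together in KServe's InferenceService custom resource. A minimal sketch for a predictive model, assuming a scikit-learn model and an illustrative storage URI (the bucket path and names are placeholders, not from the slides):

```yaml
# Minimal predictive InferenceService (illustrative values).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn            # KServe selects a matching serving runtime
      storageUri: gs://example-bucket/models/iris   # placeholder path
```

Applying this manifest lets KServe pick a supported runtime, provision the serving pod (on CPUs or GPUs), and wire up networking and autoscaling via its cloud native integrations.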
KServe
CNCF incubating project (Sept 2025)
30+ adopters
19 maintainers
300+ contributors
Trusted by industry leaders
KServe is used in production by organizations across industries to provide reliable model inference at scale.
GenAI Building Blocks
LLM Metric-Based Autoscaling
Prompt Caching
Intelligent Routing, Traffic Management
GenAI Runtime
vLLM, TRT-LLM, llm-d
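These building blocks are exercised through the same InferenceService API. A hedged sketch of a generative deployment using KServe's Hugging Face runtime (which can use vLLM as a backend); the model ID, name, and GPU count are illustrative assumptions:

```yaml
# Generative InferenceService sketch (illustrative values).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-example
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface        # Hugging Face LLM runtime
      args:
        - --model_name=llm-example
        - --model_id=meta-llama/Meta-Llama-3-8B-Instruct  # assumed model
      resources:
        limits:
          nvidia.com/gpu: "1"    # assumed single-GPU deployment
```

On top of such a deployment, the platform layers LLM metric-based autoscaling, prompt caching, and intelligent routing to balance scale, cost, latency/throughput, and efficiency.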
Scale
Cost
Latency / Throughput
Efficiency
Optimized LLM Inference
A CNCF Sandbox project for distributed large language model inference that runs natively on Kubernetes.
Join Our Community!
https://github.com/kserve/kserve
Find Us This Week!
https://github.com/kserve/kserve