Yuan Tang @TerryTangYuan
Principal Software Engineer, Red Hat OpenShift AI | Project Lead, Argo & Kubeflow
Production-Ready AI Platform on Kubernetes
Agenda
Distributed Machine Learning Patterns
AI Landscape & Ecosystem
By Cloud Native AI WG
Chart: how much infrastructure is needed vs. how much the data scientist cares about it
Production Readiness - Scalability
Production Readiness - Reliability
Production Readiness - Observability
Production Readiness - Flexibility
Kubeflow: The ML Toolkit for Kubernetes
Cloud Native Production-ready AI Platform
2. Model Training
Diagram: distributed all-reduce model training with multiple workers and data partitions — each worker (Worker #1, #2, #3) consumes its own data partition.
Kubeflow Training Operator Architecture
Distributed training with TensorFlow
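A distributed TensorFlow job can be declared as a TFJob custom resource managed by the Kubeflow Training Operator, which launches and connects the worker replicas. A minimal sketch — the job name, image, and training script are hypothetical placeholders:

```yaml
# Minimal TFJob sketch: three workers, mirroring the all-reduce diagram.
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-distributed        # hypothetical name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 3                # one pod per data partition
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow   # TFJob's expected primary container name
              image: mnist-train:latest          # hypothetical training image
              command: ["python", "/opt/train.py"]
```

The operator injects the `TF_CONFIG` environment variable into each replica so the workers can discover each other.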
3. Model Tuning
Katib: Kubernetes-native AutoML in Kubeflow
Example
Experiment Budget
Search Space
Algorithm
Objective
Trial Template
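The components above map directly onto fields of a Katib Experiment custom resource. A minimal sketch, assuming a hypothetical training image and a single learning-rate parameter:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: random-search-example    # hypothetical name
spec:
  # Experiment budget
  maxTrialCount: 12
  parallelTrialCount: 3
  maxFailedTrialCount: 3
  # Objective
  objective:
    type: maximize
    objectiveMetricName: accuracy
  # Algorithm
  algorithm:
    algorithmName: random
  # Search space
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.1"
  # Trial template
  trialTemplate:
    primaryContainerName: training
    trialParameters:
      - name: learningRate
        description: learning rate for the trial
        reference: lr
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            restartPolicy: Never
            containers:
              - name: training
                image: train:latest     # hypothetical image
                command: ["python", "/opt/train.py", "--lr=${trialParameters.learningRate}"]
```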
4. Model Serving
KServe: a highly scalable, standard, cloud-agnostic model inference platform on Kubernetes
Single model serving
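Single model serving is declared with a KServe InferenceService custom resource. A minimal sketch — the model name, format, and storage URI here are placeholders:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris             # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn            # placeholder model format
      storageUri: s3://bucket/path/to/model   # placeholder storage location
```

KServe resolves the model format to a serving runtime, provisions the pod, and exposes a standard inference endpoint, scaling it (including to zero) with demand.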
Multi model serving: ModelMesh
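With ModelMesh installed, many models can share a pool of serving pods instead of each getting its own deployment. An InferenceService is routed to ModelMesh via an annotation; the model name and bucket path below are hypothetical:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model            # hypothetical name
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn            # placeholder model format
      storageUri: s3://bucket/path/to/model   # placeholder storage location
```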
LLMs
> curl -H "content-type: application/json" -H "Host: ${SERVICE_HOSTNAME}" -v http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/infer -d '{"id": "42", "inputs": [{"name": "input0", "shape": [-1], "datatype": "BYTES", "data": ["Where is Eiffel Tower?"]}]}'
{"text_output":"The Eiffel Tower is located in the 7th arrondissement of Paris, France. It stands on the Champ de Mars, a large public park next to the Seine River. The tower's exact address is:\n\n2 Rue du Champ de Mars, 75007 Paris, France.","model_name":"llama2"}
Problem: model initialization (fetching and loading the model weights) can take a long time, especially for large models.
Solution: the Modelcars feature in KServe packages the model in an OCI image, so startup benefits from container-image caching and distribution instead of downloading the model on every pod start.
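A minimal sketch of an InferenceService referencing an OCI-packaged model. This assumes the Modelcars feature has been enabled in the KServe storage-initializer configuration; the registry path and model format are hypothetical:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama2-oci               # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface        # placeholder model format
      # Model weights packaged as an OCI image (hypothetical registry path)
      storageUri: oci://registry.example.com/models/llama2:latest
```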
5. Workflow
Argo Workflows
The container-native workflow engine for Kubernetes
CRDs and Controllers
Interfaces
Example
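A minimal "hello world" Workflow custom resource, in the style of the Argo Workflows examples, showing the CRD the controller reconciles:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-     # controller appends a random suffix
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: busybox
        command: [echo]
        args: ["hello world"]
```

Each template step runs as a container in its own pod; `entrypoint` names the template where execution starts.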
Argo Events
Event-driven workflow automation
Diagram: GitHub events (commits/PRs/tags/etc.) are received by Argo Events, which triggers an ML pipeline (data ingestion, then model training) with Argo Workflows. A cache store (Argo/K8s/etc.) is consulted first: if the data has already been updated recently, the ingestion step is skipped; if it has NOT been updated recently, the pipeline refreshes it.
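This wiring can be sketched as an Argo Events Sensor that depends on a GitHub push event and creates a Workflow in response. The event-source name and the pipeline contents below are hypothetical placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: github-to-pipeline       # hypothetical name
spec:
  dependencies:
    - name: github-push
      eventSourceName: github    # hypothetical GitHub EventSource
      eventName: push
  triggers:
    - template:
        name: run-ml-pipeline
        k8s:
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: ml-pipeline-
              spec:
                entrypoint: main
                templates:
                  - name: main
                    container:
                      image: busybox
                      command: [echo]
                      args: ["run data ingestion and model training"]  # placeholder pipeline
```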
6. Iterations
Distributed Machine Learning Patterns