Yuan Tang (Akuity) and Andrey Velichkevich (Apple)
Managing Thousands of Automatic Machine Learning Experiments With Argo and Katib
Kubeflow Overview
What is Katib?
Katib Architecture
Challenges in HP Tuning
Argo Workflows
Memoization Cache
Step A
Cache Store
If the cache is outdated, re-run the step.
If the cache is still fresh (within a configurable time, e.g. under 10 seconds), retrieve the cache entry and use it.
Step B
Create the cache entry (if it does not exist yet).
The cache is saved as Kubernetes ConfigMaps (see the sketch below).
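A minimal sketch of how memoization can be declared on an Argo Workflows template (the cache key, ConfigMap name, and workflow parameter are illustrative): a memoize block specifies the cache key, the maxAge freshness window, and the ConfigMap used as the cache store.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: memoized-step-
spec:
  entrypoint: step-a
  arguments:
    parameters:
      - name: dataset-version
        value: "v1"
  templates:
    - name: step-a
      # Reuse the previous result if a fresh cache entry exists for this key.
      memoize:
        key: "step-a-{{workflow.parameters.dataset-version}}"  # illustrative cache key
        maxAge: "10s"              # re-run the step if the entry is older than this
        cache:
          configMap:
            name: step-a-cache     # entries are stored in this ConfigMap
      container:
        image: alpine:3.19
        command: [sh, -c]
        args: ["echo 'step A result' > /tmp/out.txt"]
      outputs:
        parameters:
          - name: result
            valueFrom:
              path: /tmp/out.txt
```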
Memoization Cache - Example
Cache (K8s ConfigMap):
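An illustrative sketch of such a cache ConfigMap (the name, key, and the exact JSON schema of the entry are assumptions and vary by Argo Workflows version): each key is a memoization key, and the value records the node that produced the result, its outputs, and when the entry was created.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: step-a-cache
data:
  # One entry per memoization key.
  step-a-v1: |
    {"nodeID": "memoized-step-abc123",
     "outputs": {"parameters": [{"name": "result", "value": "step A result"}]},
     "creationTimestamp": "2022-05-18T10:15:00Z"}
```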
Memoization in ML Workflows
Data ingestion
Model training
Cache store
The data has NOT been updated recently.
The data has already been updated recently.
Triggers an ML pipeline with Argo Workflows
Triggers a Katib Experiment
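A minimal sketch of the Katib Experiment such a trigger could create (the search space, objective metric, container image, and training command below are illustrative):

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: model-tuning
  namespace: kubeflow
spec:
  objective:
    type: maximize
    goal: 0.95
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: random
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.1"
  trialTemplate:
    primaryContainerName: training-container
    trialParameters:
      - name: learningRate
        description: Learning rate for the training job
        reference: lr
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            containers:
              - name: training-container
                image: docker.io/kubeflowkatib/mxnet-mnist:latest
                command:
                  - "python3"
                  - "/opt/mxnet-mnist/mnist.py"
                  - "--lr=${trialParameters.learningRate}"
            restartPolicy: Never
```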
Memoization Cache
Sequential steps:
Data ingestion step:
Distributed model training step:
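A sketch of an Argo Workflow wiring these two sequential steps together, with the data ingestion step memoized so it is skipped when a fresh cache entry exists (names, images, and the maxAge window are illustrative; a real distributed training step would typically submit a training job such as a PyTorchJob instead of running a plain container):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ml-pipeline-
spec:
  entrypoint: main
  arguments:
    parameters:
      - name: dataset-version
        value: "v1"
  templates:
    - name: main
      steps:
        # Sequential: data ingestion completes before model training starts.
        - - name: data-ingestion
            template: data-ingestion
        - - name: model-training
            template: model-training
    - name: data-ingestion
      # Skipped (result reused) when a fresh cache entry exists for this dataset.
      memoize:
        key: "ingest-{{workflow.parameters.dataset-version}}"
        maxAge: "1h"
        cache:
          configMap:
            name: ingestion-cache
      container:
        image: alpine:3.19
        command: [sh, -c]
        args: ["echo 'ingested dataset' > /tmp/dataset.txt"]
      outputs:
        parameters:
          - name: dataset
            valueFrom:
              path: /tmp/dataset.txt
    - name: model-training
      # Placeholder container; a real distributed training step would typically
      # create a training job (e.g. a PyTorchJob) via a resource template.
      container:
        image: alpine:3.19
        command: [sh, -c]
        args: ["echo 'training model'"]
```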
Multi-objective Optimization Pipeline
[Pipeline diagram] Two data ingestion steps feed three parallel model training steps:
- Logistic Regression (Accuracy: 68%)
- Neural Networks (AUC: 76%)
- Decision Trees (Loss: 90%)
A metrics collection step gathers the results and a hyperparameters suggestion step proposes new values.
Triggers a new ML pipeline run with Argo Workflows using the newly suggested hyperparameters.
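One way to track several metrics in Katib is the Experiment objective section sketched below: Katib optimizes the objectiveMetricName, and its metrics collector also records the additionalMetricNames reported by each trial (the metric names are illustrative, matching the diagram above).

```yaml
# Objective section of a Katib Experiment (metric names are illustrative).
objective:
  type: maximize
  objectiveMetricName: accuracy
  additionalMetricNames:
    - auc
    - loss
```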
Multi-objective Optimization Pipeline
DAG:
Data ingestion steps:
Model training steps:
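A sketch of this DAG as an Argo Workflow (task names and images are illustrative; the echo containers stand in for real ingestion and training code): the two data ingestion tasks run first, and each model training task depends on both of them.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: multi-objective-pipeline-
spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      dag:
        tasks:
          # Two data ingestion tasks run first.
          - name: ingest-train
            template: echo
            arguments:
              parameters: [{name: msg, value: "ingest training data"}]
          - name: ingest-eval
            template: echo
            arguments:
              parameters: [{name: msg, value: "ingest evaluation data"}]
          # Three model training tasks depend on both ingestion tasks.
          - name: logistic-regression
            dependencies: [ingest-train, ingest-eval]
            template: echo
            arguments:
              parameters: [{name: msg, value: "train logistic regression"}]
          - name: neural-network
            dependencies: [ingest-train, ingest-eval]
            template: echo
            arguments:
              parameters: [{name: msg, value: "train neural network"}]
          - name: decision-tree
            dependencies: [ingest-train, ingest-eval]
            template: echo
            arguments:
              parameters: [{name: msg, value: "train decision tree"}]
    - name: echo
      inputs:
        parameters:
          - name: msg
      container:
        image: alpine:3.19
        command: [sh, -c]
        args: ["echo {{inputs.parameters.msg}}"]
```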
Demo: Caching
Join the Argo Workflows and Katib Communities
Thank you!