Kubeflow at Spotify
How Kubeflow Pipelines fits into our Machine Learning ecosystem
Josh Baer
ML Platform Product Lead
Twitter: @j6aer
Music Streaming Service
Launched in 2008
230M Active Users
50M Tracks
79 Countries
Machine Learning Journey - Overview
How long does it take to build an ML prototype?
Most teams spend 1-3 “sprints” getting an initial prototype out
How many product teams will wait this long to get initial learnings?
How long does it take to go from prototype -> production-grade solution?
Over 30% of ML practitioners spend more than a quarter turning an idea into production software
Machine Learning Journey - Overview
01 Problem Definition
02 Model Prototyping: Develop, Train, Evaluate, Tweak
03 ML Productionization
04 Measurement, Experimentation and Tweaking
Problem Definition -> Prototype -> Productionize -> Measure
Step durations: 4w, 2w, 1w, 2w, 1w, 2w, 2w
Many iterations per phase
14 weeks to go from a defined problem to a production solution!
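The 14-week figure is the sum of the seven step durations shown on the timeline. A minimal check, taking the durations exactly as listed on the slide (the variable names are illustrative):

```python
# Step durations, in weeks, as listed on the timeline slide.
step_durations_weeks = [4, 2, 1, 2, 1, 2, 2]

total_weeks = sum(step_durations_weeks)
print(f"Total: {total_weeks} weeks")
```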
Other Challenges
Difficult to Collaborate
Keeping track of projects, artifacts and lineage was difficult
No common way of building workflows
Teams using N different frameworks in different ways. No shared learnings.
Slow feedback loops
Data analysis was separate from model training and model analysis. Each step is custom
Kubeflow + TFX at Spotify
Kubeflow Pipelines
TensorFlow Extended (TFX)
“Spotify” Kubeflow Setup
Test Cluster
Internal development cluster to test upgrades, run integration tests
Development Cluster
For running ad-hoc jobs, developing new workflows
Production Cluster
For regularly scheduled workloads
Higher availability SLA
Other Spotify Kubeflow Features
Caching
Quicker resumption of failed tasks
Central Metadata
Keep track of what’s being built and run Spotify-wide
Command Line Tooling
Allows for scheduling and execution of jobs via Luigi (Spotify's orchestration tool)
Shared-VPC Integration
Connect with other Spotify services
Common TFX Components
Easily run TFX-based pipelines
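One way to picture the caching feature listed above: each pipeline step is memoized on a fingerprint of its name and inputs, so a rerun after a failure skips the steps that already succeeded. The sketch below is a toy in-memory illustration of that idea only. `StepCache`, `run_step`, and the step functions are hypothetical names and do not reflect how Kubeflow Pipelines or Spotify's platform implement caching.

```python
import hashlib
import json


class StepCache:
    """Toy in-memory cache keyed by step name + inputs (hypothetical)."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # number of step bodies that ran to completion

    def _key(self, name, inputs):
        # Fingerprint the step by its name and its JSON-serializable inputs.
        payload = json.dumps({"step": name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run_step(self, name, inputs, fn):
        key = self._key(name, inputs)
        if key in self._results:
            return self._results[key]  # cache hit: skip the work
        result = fn(**inputs)          # may raise; failures are not cached
        self.executions += 1
        self._results[key] = result
        return result


cache = StepCache()
training_should_fail = {"value": True}  # simulates a transient failure


def preprocess(rows):
    return [r * 2 for r in rows]


def train(features):
    if training_should_fail["value"]:
        raise RuntimeError("simulated training failure")
    return sum(features)


def pipeline():
    features = cache.run_step("preprocess", {"rows": [1, 2, 3]}, preprocess)
    return cache.run_step("train", {"features": features}, train)


# First run fails in `train`; `preprocess` has already been cached.
try:
    pipeline()
except RuntimeError:
    pass

# Second run reuses the cached `preprocess` result and only runs `train`.
training_should_fail["value"] = False
model = pipeline()
print(model, cache.executions)
```

On the second run only the failed step executes, which is the "quicker resumption" the slide describes.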
Over 15,000 Kubeflow Pipeline Runs!
Machine Learning Journey - Updated
Problem Definition -> Prototype -> Productionize -> Measure
Step durations: 4w, 1w, 2d, 1d, 1d, 2d, 1d
Shorter iteration cycles =>
faster time to production =>
better ML in our products
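To put the original and updated timelines side by side, the sketch below totals both sets of step durations. The updated slide mixes weeks and days, so the code assumes a 5-working-day week. That conversion is an assumption made here for illustration, not something stated in the talk.

```python
WORKING_DAYS_PER_WEEK = 5  # assumption: not stated on the slides


def to_days(duration: str) -> int:
    """Convert a slide label such as '4w' or '2d' to working days."""
    value, unit = int(duration[:-1]), duration[-1]
    if unit == "w":
        return value * WORKING_DAYS_PER_WEEK
    if unit == "d":
        return value
    raise ValueError(f"unknown unit in {duration!r}")


# Durations as listed on the original and updated timeline slides.
before = ["4w", "2w", "1w", "2w", "1w", "2w", "2w"]
after = ["4w", "1w", "2d", "1d", "1d", "2d", "1d"]

before_days = sum(to_days(d) for d in before)
after_days = sum(to_days(d) for d in after)

print(f"before: {before_days} working days ({before_days / WORKING_DAYS_PER_WEEK:g} weeks)")
print(f"after:  {after_days} working days ({after_days / WORKING_DAYS_PER_WEEK:g} weeks)")
```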
Kubeflow Pipelines
Recent Progress
Mention Hack week last week:
Our Kubeflow Timeline
Aug 2018
Kubeflow Pipelines launches
Jan 2019
First teams trying out Kubeflow. Start focusing infra efforts
Aug 2019
“Spotify” Kubeflow Pipeline Platform launched in alpha.
Jan 2020
Launch beta - open it up to the entire Spotify community
We’re here
Our Vision for Kubeflow Pipelines
D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, and J.-F. Crespo. Hidden Technical Debt in Machine Learning Systems. In Advances in Neural Information Processing Systems (NIPS), 2015.
Kubeflow
Components
For building this community...
Want to hear more?
Check out Keshi and Ryan’s talk at KubeCon: “Building and Managing a Centralized Kubeflow Platform at Spotify”