1 of 46

Contributors:

thealamkin@google.com | @taylamkin

josh.bottum@canonical.com

david.aronchick@microsoft.com | @aronchick

carmine.rimi@canonical.com | @carminerimi

2019-03-12

Kubeflow

Contributors Summit

Product Manager Update

2 of 46

Agenda

  • Building the Kubeflow Ecosystem
  • Kubeflow User Survey
  • Collecting User Input for Releases
  • The Release Schedule

3 of 46

David Aronchick

david.aronchick@microsoft.com | @aronchick

Building the Kubeflow Ecosystem

4 of 46

Kubecon 2017

5 of 46

The Problem

  • Setting up an ML stack/pipeline is incredibly hard
  • Setting up a production ML stack/pipeline is even hardER
  • Setting up an ML stack/pipeline that ports between the 81% of enterprises that use multi-cloud* environments is EVEN HARDER.

* Note: For the purposes of today, “local” is a specific type of “multi-cloud”

6 of 46

Kubeflow Contributor Summit 2018

7 of 46

Kubeflow Contributor Summit 2018

8 of 46

Mission (2017)

9 of 46

Make it Easy for Everyone

to Develop, Deploy and Manage Portable, Distributed ML on Kubernetes

10 of 46

Mission (2018)

11 of 46

Make it Easy for Everyone

to Develop, Deploy and Manage Portable, Distributed ML on Kubernetes

12 of 46

Mission (2019)

13 of 46

Make it Easy for Everyone

to Develop, Deploy and Manage Portable, Distributed ML on Kubernetes

14 of 46

Kubeflow

15 of 46

Summary

  • Kubeflow = Cloud Native, multi-cloud solution for ML.
  • Kubeflow provides a platform for composable, portable and scalable ML pipelines.
  • If you have a Kubernetes-conformant cluster, you can run Kubeflow.

16 of 46

Cloud

Training

Experimentation

17 of 46

Critical User Journey Comparison

2017

  • Experiment with Jupyter
  • Distribute your training with TFJob
  • Serve your model with Seldon

2019

  • Set up locally with MiniKF
  • Access your cluster with Istio/Ingress
  • Ingest your data with Pachyderm
  • Transform your data with TF.T (TensorFlow Transform)
  • Analyze the data with TF.DV (TensorFlow Data Validation)
  • Experiment with Jupyter
  • Hyperparam sweep with Katib
  • Distribute your training with TFJob
  • Analyze your model with TF.MA (TensorFlow Model Analysis)
  • Serve your model with Seldon
  • Orchestrate everything with KF.Pipelines (Kubeflow Pipelines)
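
As a rough illustration of how much the journey grew, the two lists above can be written as ordered (step, tool) pairs and compared. This is only a sketch: the variable and function names are hypothetical, and the pairs restate the slide rather than any Kubeflow API.

```python
# Hypothetical restatement of the slide: each journey is an ordered list of
# (step, tool) pairs. Nothing here calls Kubeflow itself.
JOURNEY_2017 = [
    ("experiment", "Jupyter"),
    ("distribute training", "TFJob"),
    ("serve model", "Seldon"),
]

JOURNEY_2019 = [
    ("set up locally", "MiniKF"),
    ("access cluster", "Istio/Ingress"),
    ("ingest data", "Pachyderm"),
    ("transform data", "TF.T"),
    ("analyze data", "TF.DV"),
    ("experiment", "Jupyter"),
    ("hyperparameter sweep", "Katib"),
    ("distribute training", "TFJob"),
    ("analyze model", "TF.MA"),
    ("serve model", "Seldon"),
    ("orchestrate everything", "KF.Pipelines"),
]


def added_steps(old, new):
    """Return the steps in `new` that were not part of `old`, in order."""
    return [step for step in new if step not in old]


if __name__ == "__main__":
    for step, tool in added_steps(JOURNEY_2017, JOURNEY_2019):
        print(f"new in 2019: {step} with {tool}")
```

All three 2017 steps survive unchanged in the 2019 list; the rest are additions around them.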

18 of 46

Momentum!

  • ~4000 commits
  • ~200 community contributors
  • ~50 companies contributing

19 of 46

Community Contributions

[Charts: Google vs. non-Google share of contributions, for Kubernetes and for Kubeflow]

20 of 46

Community Contribution: Katib from NTT

  • Pluggable microservice architecture for HP tuning
    • Different optimization algorithms
    • Different frameworks
  • StudyJob (K8s CRD)
    • Hides complexity from user
    • No code needed to do HP tuning

21 of 46

Community Contribution: TensorRT from NVIDIA

  • Production datacenter inferencing server
  • Maximize real-time inference performance of GPUs
    • Multiple models per GPU per node
    • Supports heterogeneous GPUs & multi GPU nodes
  • Integrates with orchestration systems and auto scalers via latency and health metrics

22 of 46

Community Contribution: Argo from Intuit

  • Argo CRD for workflows
  • Argo CRD is the engine for Pipelines (more on that later)
  • Argo CD for GitOps

23 of 46

Community Contribution: Notebooks & Storage from Arrikto

  • Core Notebook Experience

  • 0.4: New JupyterHub-based UI
    • Multiple Persistent Volumes

  • 0.5: K8s-Native Notebooks UI

  • Pipelines: Support for local storage

  • MiniKF: All-in-one packaging for seamless local deployments

24 of 46

But We STILL Need Help!

25 of 46

Ok! But WHAT Do We Need…?

26 of 46

Kubeflow

User Survey Update

Contributors:

josh.bottum@canonical.com, thealamkin@google.com

2019-03-12

27 of 46

28 of 46

29 of 46

Hybrid

VMware or OpenStack

30 of 46

Iteration/Tracking Experiments

31 of 46

Simplified end-to-end workflows

Automatic Hyperparameter tuning

32 of 46

Roadmap Directions

  • Simplified 1st use
    • through landing page and dashboard
  • Simplified end-to-end workflows
    • with integrated build, train and deploy
  • Enterprise readiness
    • Authentication, Isolation, multi-user

  • Plus your input!

33 of 46

Collecting User Input for Upcoming Releases

Contributors:

josh.bottum@canonical.com, thealamkin@google.com

2019-03-12

34 of 46

Process to collect user input on Roadmap items (CUJs)

  • Goals
    • Develop a simplified process to collect user input on roadmap items
    • Build end-user energy & ownership of new features

  • Need to Balance
    • Which Personas
    • Consistent input vs. new views
    • Easy CUJ creation & Easy collection of user feedback

  • Proposed Process
    • http://bit.ly/2TsPYfg
    • Major releases may require extra data collection (Kubeflow 1.0)

35 of 46

CUJ Format

  • Kubeflow Release Manager PM will work to identify ~3 topics for data collection

  • Kubeflow Engineering Leads will develop CUJs and feature descriptions
    • Target - Week 3 of 12

  • Kubeflow Engineering Leads will develop 1-2 questions for the data-gathering sessions to answer (prior to the meeting)

  • Kubeflow Engineers will identify the preferred personas for the questions

36 of 46

CUJ Delivery

  • CUJs / Feature Pack Reviews
  • 1 to 3 Reviews in each format
    • in a Group format
    • in a 1:1 format
  • Reviews should be 20-30 minutes.

  • Kubeflow Outreach PM Responsibilities
    • Set-up meetings
      • Target: 50% of input from repeat end-users
    • Take notes
      • Validate input (before closing the call)
      • Store in Google Docs & post edited notes

37 of 46

Outreach & Local Events

Contributors:

josh.bottum@canonical.com, thealamkin@google.com

2019-03-12

38 of 46

Kubeflow Days

Goals

  • Increase Kubeflow adoption
  • Simplify on-site updates to top Communities
  • Develop local end-user presenters and use cases

Kubeflow Day LA (at SoCal Linux Expo - March 2019)

  • 11 vendor sessions
    • Google, Microsoft, Cisco, Arrikto, Red Hat, Canonical, MavenCodes
  • 350+ Paid Registrations, 120+ attendees
    • Ticketmaster, Disney, DreamWorks, Sony, Honey, and more
  • Attendee Feedback
    • Core contributor sessions received more feedback / appreciation

39 of 46

Carmine Rimi

carmine.rimi@canonical.com | @carminerimi

The Release Cycle

40 of 46

Critical User Journeys

  • CUJ Beginner Guide

Target Personas:

  • Data Scientist
  • ML Engineer

  • Data Engineer

  • Application Engineer
  • Platform Engineer

  • Infrastructure Engineer
  • Operations Engineer
  • Automation Engineer
  • Manager


41 of 46

CUJ Process(es)

  • Identify “Critical” Persona / Feature / Experience Gap
  • Write Draft CUJ
  • Circulate, Consolidate Feedback, Finalize
    • Feedback from Target Persona {before | during | after} (1+ times)
  • Create Label + Issues
  • Possible Roadmap Assignment
  • Finalize CUJ (add issue query)

Identify → Draft → “Final” → Label & Issues → Roadmap → Final
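
One way to picture the process on this slide is as a short, ordered sequence of stages that a CUJ moves through. The sketch below assumes the stages are strictly linear, which the slide suggests but does not state. The names `CujStage` and `next_stage` are hypothetical, and `REVIEWED` stands in for the slide's first, quoted “Final”.

```python
from enum import Enum


class CujStage(Enum):
    """Hypothetical stage names mirroring the slide's process labels."""
    IDENTIFY = 1          # identify the critical persona / feature / experience gap
    DRAFT = 2             # write the draft CUJ
    REVIEWED = 3          # circulate, consolidate feedback, finalize (the quoted "Final")
    LABEL_AND_ISSUES = 4  # create label + issues
    ROADMAP = 5           # possible roadmap assignment
    FINAL = 6             # finalize the CUJ (add issue query)


def next_stage(stage):
    """Return the stage that follows `stage`, or None once the CUJ is final."""
    stages = list(CujStage)  # Enum iteration preserves definition order
    index = stages.index(stage)
    return stages[index + 1] if index + 1 < len(stages) else None


if __name__ == "__main__":
    stage = CujStage.IDENTIFY
    while stage is not None:
        print(stage.name)
        stage = next_stage(stage)
```

Roadmap assignment is marked “possible” on the slide, so a real tracker would likely let a CUJ skip that stage. The linear version is kept here for brevity.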

42 of 46

Release Cycles

Phase 0

  • Themes, CUJs, Debt
  • Kubeflow Workstreams - Coordinators

Phase 1

  • Scope, Resources, Quality
  • (time already established)

Phase 2

  • Execution, Tracking
  • Releasing ..

Release N-1, Release N, Release N+1: 3+ months each

43 of 46

Integrating

44 of 46

Integrating

  • CI (Build, E2E Tests)
  • Deployment
  • Lifecycle
  • Pipeline
  • Docs
  • Certification? (Portability)
  • Code
  • Examples

45 of 46

Thank you

Carmine Rimi

carmine.rimi@canonical.com | @carminerimi

46 of 46

Kubeflow

Contributor Summit Product Manager Update

End