1 of 41

ML Problems: Formulation and Adoption

Sayak Paul

ML at 🤗

@RisingSayak

2 of 41

$whoami

  • ML at Hugging Face 🤗
  • Past: Carted, PyImageSearch, DataCamp, TCS Research
  • Open-source 🥑 (Keras, KerasCV, 🤗 Transformers, etc.)
  • Netflix nerd
  • Coordinates at sayak.dev

3 of 41

ML is fascinating!

“A 3D render of an astronaut walking in a green desert” (Stable Diffusion 2)

https://huggingface.co/spaces/stabilityai/stable-diffusion

4 of 41

What are we up to today?

  • Problem formulation in ML
    • Is my problem suitable to be solved with ML?
    • Yes:
      • How do we know?
      • Defining the fundamentals of an ML system
      • What metrics should I optimize?
  • ML adoption
    • Tooling for ML adoption at various stages
      • PoC
      • MVP and beyond

5 of 41

Disclaimer: The talk is focused on initiating an ML project and NOT on what comes after initiating one.

6 of 41

Problem Formulation in ML

Prompt: “Formulating problem statements in ML” (Stable Diffusion 2)

7 of 41

What is Supervised Machine Learning?

In 90 seconds,

summarise what you know in pairs. Go!

8 of 41

Terminology

Label is the true thing we are predicting: “y”

  • The y variable in basic linear regression

9 of 41

Terminology

Label is the true thing we are predicting: “y”

Features are input variables describing our data: “x1”

  • The x1, x2, …, xn variables in basic linear regression

10 of 41

Terminology

Label is the true thing we are predicting: “y”

Features are input variables describing our data: “x1”

Example is a particular instance of data, x

11 of 41

Terminology

Label is the true thing we are predicting: “y”

Features are input variables describing our data: “x1”

Example is a particular instance of data, x

Labeled example {features, label}: (x, y)

  • Used to train the model

12 of 41

Terminology

Label is the true thing we are predicting: “y”

Features are input variables describing our data: “x1”

Example is a particular instance of data, x

Labeled example {features, label}: (x, y)

Unlabeled example {features, ?}: (x, ?)

  • Used for making predictions on new data

13 of 41

Terminology

Label is the true thing we are predicting: “y”

Features are input variables describing our data: “x1”

Example is a particular instance of data, x

Labeled example {features, label}: (x, y)

Unlabeled example {features, ?}: (x, ?)

Model maps examples to predicted labels: y′

  • Defined by internal parameters, which are learned
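To make the terminology concrete, here is a minimal sketch (with invented numbers) that maps each term onto a tiny linear regression in NumPy:

```python
import numpy as np

# Labeled examples {features, label}: (x, y). Each row of X holds the
# features (x1, x2) of one example; y holds the corresponding labels.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y = np.array([4.0, 4.5, 7.5, 11.0])  # constructed as y = 2*x1 + x2

# Model: maps examples to predicted labels, defined by internal
# parameters (weights + bias) learned here via least squares.
X_b = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
params, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Unlabeled example {features, ?}: we know x, the model supplies y'.
x_new = np.array([5.0, 2.0, 1.0])  # features plus the bias term
y_pred = x_new @ params
print(round(float(y_pred), 2))  # 12.0, since 2*5 + 2 = 12
```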

14 of 41

Extensions

  • Do we have a precise understanding of the inputs and outputs of the ML system?
  • Do we know the decisions the ML system will help drive?
  • How do we measure success?
  • Can the problem be solved using heuristics?

15 of 41

ML in the wild

Deep learning algorithm does as well as dermatologists in identifying skin cancer

Label: __________

Feature: __________

Example: __________

Labeled Examples: ___________

Unlabeled Examples: ___________

Output: ___________

16 of 41

A Framework

  1. Frame the problem:

What will traffic be like tomorrow?

  2. Make a hypothesis:

Weather forecast could be informative.

  3. Collect the data:

Collect historical traffic and weather data.

  4. Test the hypothesis:

Test a model with the data.

  5. Analyze results:

Is this model better than existing systems?

  6. Reach a conclusion:

I should (not) use this model, because of X, Y, and Z.

  7. Refine and repeat:

Time of year could be a helpful signal.
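The steps above can be rehearsed end-to-end on synthetic data (every number below is invented): hypothesize that rainfall predicts traffic, fit a simple model, and only conclude after beating a naive baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothesis + data: synthetic "historical" records where traffic
# really does depend on rainfall, plus noise.
rain_mm = rng.uniform(0, 20, size=200)
traffic = 30 + 2.5 * rain_mm + rng.normal(0, 3, size=200)

train, test = slice(0, 150), slice(150, 200)

# Test the hypothesis with a simple linear model.
coef = np.polyfit(rain_mm[train], traffic[train], deg=1)
model_pred = np.polyval(coef, rain_mm[test])

# Analyze: is the model better than the existing "system"
# (here: always predicting the historical mean)?
baseline_pred = np.full(50, traffic[train].mean())
mae = lambda p: np.abs(p - traffic[test]).mean()
print(mae(model_pred) < mae(baseline_pred))  # True -> worth adopting
```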

17 of 41

ML Adoption

Prompt: “ML adoption in 2022 with Google technologies” (Stable Diffusion 2)

18 of 41

Stage I: PoC

19 of 41

Considerations

  • ML is worth giving a go.
  • You have enough signals to support this.
  • You have a framework to decide for moving forward with ML or not.

20 of 41

Data?

  • Start with data samples that closely represent the problem you want to solve with ML.
  • Data acquisition can still be very ad-hoc at this stage.
  • You may not need dedicated warehousing and feature stores for your data yet.
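Even when acquisition is ad-hoc, a few lines of plain Python can keep the sample representative of the sub-populations you care about (the record layout below is hypothetical):

```python
import random
from collections import defaultdict

def stratified_sample(examples, key, per_group, seed=42):
    """Pick up to `per_group` examples from each group so the sample
    covers every sub-population of the problem."""
    groups = defaultdict(list)
    for ex in examples:
        groups[key(ex)].append(ex)
    rng = random.Random(seed)
    sample = []
    for members in groups.values():
        rng.shuffle(members)
        sample.extend(members[:per_group])
    return sample

# Hypothetical skin-lesion records: (image_id, label).
records = [(i, "benign" if i % 4 else "malignant") for i in range(100)]
sample = stratified_sample(records, key=lambda r: r[1], per_group=10)
print(len(sample))  # 20: 10 benign + 10 malignant
```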

21 of 41

Start with existing ML models / APIs if possible

  • Saves you time
  • Saves you resources
  • Easy to incorporate
  • Already battle-tested

22 of 41

Focus on one ML problem at a time

  • Decouple the impact
  • Decouple the efforts
  • Decouple the technical complexities

23 of 41

Choose the right tooling for training

  • Easy to use (technical debt can be brutal)
  • Readable by the broader team
  • Minimal effort when scaling (from notebooks to prod)
  • Easily maintainable
  • Fits well with other things: serverless hosting, on-device platforms, etc.

24 of 41

Choose the right tooling for training

25 of 41

Why?

  • Keras is known for its API design (tf.keras).
  • Lets you write models and train them using an intuitive API.
  • Progressive disclosure of complexity:
    • Standard API for training, prediction, and evaluation.
    • But also possible to customize things arbitrarily.
  • First-class support for accelerators like TPUs.
  • Integrates well with XLA for accelerated computation.

26 of 41

Why?

  • [...]
  • Off-the-shelf support for TensorFlow Serving, TensorFlow Lite, TFX, TensorFlow Cloud, Vertex AI, etc.

27 of 41

Training considerations

  • If possible, start training on a small sample (one size doesn’t fit all)
  • If you have a trained model:
    • Carefully evaluate the impact of data leakage and other feature-related issues.
    • Evaluate the model under different sub-populations.
    • Study the predictions to determine if they can help you reach decisions.
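Slicing a metric by sub-population needs no special tooling; a sketch with invented groups and labels:

```python
from collections import defaultdict

def accuracy_by_group(rows):
    """rows: (group, y_true, y_pred) triples -> per-group accuracy."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in rows:
        totals[group] += 1
        hits[group] += int(y_true == y_pred)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical predictions sliced by a sensitive attribute: a model can
# look fine in aggregate while failing one sub-population.
rows = [
    ("fair_skin", 1, 1), ("fair_skin", 0, 0), ("fair_skin", 1, 1),
    ("dark_skin", 1, 0), ("dark_skin", 0, 0),
]
print(accuracy_by_group(rows))  # {'fair_skin': 1.0, 'dark_skin': 0.5}
```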

28 of 41

Interacting with the model for usage

  • Decide on a way to interact with the model for application usage.
  • Determine the best way to consume the model: batched, online, on-device, etc.

29 of 41

Model consumption

  • Batched:
    • Predictions are not required immediately.
    • Recommended tooling:
      • An Apache Beam pipeline run on Dataflow on schedules
      • BigQuery with scheduled queries
  • Online:
    • Data can leave the device?
      • Docker + Kubernetes + GKE (microservices)
        • Prefer gRPC over REST
        • Prefer Go over Python
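Stripped of any particular pipeline framework, batched consumption reduces to chunking examples and running the model once per chunk on a schedule; the `model` below is a stand-in, not a real serving API:

```python
def batched_predict(examples, model, batch_size=32):
    """Run `model` over `examples` chunk by chunk, the way a scheduled
    batch job (e.g. a Beam pipeline on Dataflow) would."""
    predictions = []
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        predictions.extend(model(batch))
    return predictions

# Stand-in "model": doubles each input.
model = lambda batch: [2 * x for x in batch]
preds = batched_predict(list(range(100)), model, batch_size=32)
print(len(preds), preds[:3])  # 100 [0, 2, 4]
```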

30 of 41

Model consumption

  • Online
    • Data cannot leave the (mobile) device
      • TensorFlow Lite
      • Firebase (for model management and communication with app)

31 of 41

Model consumption

  • Online
  • Other solutions for mobile ML
    • MLKit
    • MediaPipe

32 of 41

Stage II: MVP and Beyond

33 of 41

Scaling up

  • You probably did all the development through a notebook.
    • No shame! Everyone does it like that :)
  • Now, we need to graduate that to a bigger capacity.
  • What do we need?
    • Well-tested practices and processes.

34 of 41

TFX comes to the rescue

35 of 41

Why TFX though?

  • Designed to operate at arbitrary scales.
  • Enforces good practices for:
    • Maintainability
    • Repeatability
    • Reproducibility
    • Adaptability
  • Flexibility
    • Run on any compatible executor (Kubeflow, Apache Airflow, Vertex AI, etc.)
    • Create your own components easily without worrying about the scalability part

36 of 41

Why TFX though?

Need more reasons?

37 of 41

Scaling up

  • TFX may not be a solution for all ML applications.
  • Develop what’s best for your scenario to reliably and safely deploy models to prod:
    • Continuous integration
    • Continuous delivery
    • Continuous evaluation / retraining
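Continuous evaluation can begin as a simple gate: score the deployed model on freshly labeled data and trigger retraining when the metric degrades beyond a tolerance (all names and thresholds below are illustrative):

```python
def should_retrain(live_metric, reference_metric, tolerance=0.02):
    """Flag retraining when the live metric drops more than
    `tolerance` below the metric recorded at deployment time."""
    return live_metric < reference_metric - tolerance

# Metric logged at deployment vs. metric on fresh labeled data.
deployed_accuracy = 0.91
print(should_retrain(0.90, deployed_accuracy))  # False: within tolerance
print(should_retrain(0.85, deployed_accuracy))  # True: degraded, retrain
```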

38 of 41

In parallel, explore Vertex AI

Support for

  • Scalable model training (bring your own framework)
  • Scalable deployments (bring your own framework)
  • Model monitoring
  • … and much more!

39 of 41

Recommended readings

  • Designing Machine Learning Systems (Chip Huyen)
  • Machine Learning Design Patterns (Valliappa Lakshmanan, Sara Robinson, Michael Munn)
  • Google Cloud Architecture Guides: https://cloud.google.com/architecture/#/Technologies=AI_and_machine_learning
  • Full Stack Deep Learning (Charles Frye, Sergey Karayev, Josh Tobin)

40 of 41

Wrapping up

  • Assess if you have an ML-friendly problem statement
  • Don’t be afraid to launch without ML
  • Keep it simple
  • Keep it one at a time
  • Launch from notebooks to prod
  • Experiment rapidly and reliably

41 of 41

Thank you!

Sayak Paul (@RisingSayak)