1 of 7

Serving and Optimizing ML Workflows on Heterogeneous Infrastructures

Yao Lu, Microsoft Research

with Yongji Wu, Matthew Lentz, Danyang Zhuo (Duke University)

2 of 7

IoT/Hybrid cloud: data source & compute moving to edge

  • Examples / use cases
    • Intelligent traffic
    • Video surveillance
    • Autonomous driving
    • Wearable health
    • Personal assistant

  • Complex workflows
  • Tiered & heterogeneous infra.

Example: AI City Challenge (multi-camera object tracking)

3 of 7

IoT/Hybrid cloud: data source & compute moving to edge

  • Challenge 1: Model choices (which model variant to run for each step)

  • Challenge 2: Model placement (which device or tier runs each model)

  • We want a system that
    • Serves ML on heterogeneous infra.
    • Minimizes overall cost given:
      • Workflow & infra
      • Target accuracy & throughput

Example: Visual question answering (VQA)
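The optimization goal above (lowest cost that still meets target accuracy and throughput) can be made concrete with a minimal brute-force sketch. Everything in it is hypothetical: the `Option` record, the `brute_force_plan` helper and the profiled numbers are invented for illustration. The sketch also assumes a linear chain of steps, end-to-end accuracy equal to the product of per-step accuracies (which ignores the correlation between models that the talk's profiler considers), and throughput bounded by the slowest step.

```python
from dataclasses import dataclass
from itertools import product
from math import prod


@dataclass(frozen=True)
class Option:
    """Hypothetical way to run one workflow step: a model variant on a device."""
    model: str
    device: str
    accuracy: float    # profiled accuracy of this variant, 0..1
    throughput: float  # items/s this variant sustains on this device
    cost: float        # $/hour to run it on this device


def brute_force_plan(steps, target_acc, target_tput):
    """Try every combination of per-step options; return the cheapest feasible plan.

    Assumptions (illustrative only): accuracy multiplies across steps as if
    errors were independent, and the slowest step bounds end-to-end throughput.
    Returns (None, inf) when no plan meets both targets.
    """
    best, best_cost = None, float("inf")
    for plan in product(*steps):
        acc = prod(o.accuracy for o in plan)
        tput = min(o.throughput for o in plan)
        cost = sum(o.cost for o in plan)
        if acc >= target_acc and tput >= target_tput and cost < best_cost:
            best, best_cost = plan, cost
    return best, best_cost


# Two-step toy workflow (detection -> tracking) with made-up profiles.
detect = [
    Option("det-small", "edge-gpu", 0.90, 30.0, 0.5),
    Option("det-large", "cloud-gpu", 0.97, 60.0, 3.0),
]
track = [
    Option("trk-small", "edge-cpu", 0.92, 40.0, 0.2),
    Option("trk-large", "cloud-gpu", 0.98, 80.0, 2.5),
]
plan, cost = brute_force_plan([detect, track], target_acc=0.85, target_tput=25.0)
print([(o.model, o.device) for o in plan], cost)
# → [('det-small', 'edge-gpu'), ('trk-large', 'cloud-gpu')] 3.0
```

The cheapest plan pairs a small edge detector with a large cloud tracker, because the all-small plan misses the accuracy target. The number of plans multiplies with every added step, so exhaustive search like this quickly becomes impractical for real workflows.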

4 of 7

Key ideas & solutions

  • A cost-based optimizer
    • Profiling strategies that account for correlations between models

  • A flexible query processor based on Naiad & Timely Dataflow
    • Decides which model runs where
    • Leaves hardware-specific execution to virtualization & ML compilers

5 of 7

Key results

  • Compared approaches:
    • Different QOs on our query processor:
      • LB: Brute force
      • JB: Our solution
      • FF: First fit, BF: Best fit
    • End-to-end systems w/ GPUs:
      • PT: PyTorch
      • SP: Spark

  • Up to 5x cheaper than Spark
  • 36-58% cheaper than the second-best approach
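The FF and BF baselines share their names with the classic first-fit and best-fit bin-packing heuristics. Below is a sketch of those two heuristics in textbook form. It assumes each model has a single scalar resource demand (say, GPU memory) and each device a fixed capacity. The functions `first_fit` and `best_fit` are hypothetical, and the slide does not say how the baselines actually apply these heuristics (for example, whether they also choose model variants).

```python
def first_fit(demands, capacities):
    """Place each demand on the first device with enough remaining capacity.

    Returns one device index per demand, or None when no device can host it.
    """
    remaining = list(capacities)
    placement = []
    for d in demands:
        for i, free in enumerate(remaining):
            if free >= d:
                remaining[i] -= d
                placement.append(i)
                break
        else:
            placement.append(None)  # no device has room for this model
    return placement


def best_fit(demands, capacities):
    """Place each demand on the feasible device that leaves the least spare capacity."""
    remaining = list(capacities)
    placement = []
    for d in demands:
        feasible = [i for i, free in enumerate(remaining) if free >= d]
        if not feasible:
            placement.append(None)  # no device has room for this model
            continue
        i = min(feasible, key=lambda j: remaining[j] - d)  # tightest fit wins
        remaining[i] -= d
        placement.append(i)
    return placement


# Two models (demands 3 and 4) on two devices (capacities 8 and 4).
print(first_fit([3, 4], [8, 4]))  # → [0, 0]
print(best_fit([3, 4], [8, 4]))   # → [1, 0]
```

Both heuristics place models greedily, one at a time, and never reconsider earlier choices. This keeps them fast, but they ignore cost and accuracy trade-offs across the whole workflow.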

6 of 7

Key results

  • Has low overheads:
    • Query optimization takes milliseconds (brute force may take >1 h)

    • Runtime overhead: from a few % up to 20% over native PyTorch

  • Can adapt to infra changes
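A rough count shows why exhaustive search can run for hours. Assume, purely for illustration, that each of n workflow steps independently picks one of k model variants and one of d devices. Brute force must then evaluate every combination (the values of k, d and n below are made up):

```latex
|\mathcal{P}| = (k \cdot d)^{n},
\qquad
k = 5,\; d = 10,\; n = 8
\;\Rightarrow\;
|\mathcal{P}| = 50^{8} \approx 3.9 \times 10^{13}.
```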

7 of 7

  • Under review for VLDB 23

  • Will be open-sourced

  • Looking for collaborators