
Serving and optimizing ML workflows on heterogeneous infrastructures

Yao Lu, Microsoft Research

with Yongji Wu, Matthew Lentz, and Danyang Zhuo, Duke University


IoT/Hybrid cloud: data source & compute moving to edge

Examples / use cases

Intelligent traffic
Video surveillance
Autonomous driving
Wearable health
Personal assistant

Complex workflows
Tiered & heterogeneous infra.

Example: AI City challenge (multi-camera object tracking)


IoT/Hybrid cloud: data source & compute moving to edge

Challenge 1: Model choices

Challenge 2: Model placement

We want a system that

Serves ML workflows on heterogeneous infrastructure
Optimizes the overall costs given:

Workflow & infra
Target accuracy & throughput
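
To make the problem statement above concrete, here is a minimal sketch of the search space: pick one model variant per workflow stage (Challenge 1) and one device per stage (Challenge 2), then keep the cheapest plan that meets the accuracy and throughput targets. Everything in it is an illustrative assumption, not taken from the talk: the stage names, variants, devices, and numbers are made up, end-to-end accuracy is modeled as the product of stage accuracies (which ignores correlation between models), and cost is the hourly price of the devices in use. It enumerates every plan exhaustively, so it only shows what is being optimized, not how the system's optimizer works.

```python
from itertools import product

# Hypothetical model variants per stage: (name, accuracy, work units per item).
VARIANTS = {
    "detect": [("det-small", 0.80, 1.0), ("det-large", 0.95, 4.0)],
    "classify": [("cls-small", 0.85, 0.5), ("cls-large", 0.97, 2.0)],
}
# Hypothetical devices: name -> (work units per second, cost per hour).
DEVICES = {"edge-cpu": (10.0, 0.05), "cloud-gpu": (200.0, 1.50)}


def evaluate(plan):
    """Return (accuracy, throughput, cost) for a plan: stage -> (variant, device)."""
    accuracy = 1.0
    load = {}  # device -> total work units per item placed on it
    for (_, acc, work), dev in plan.values():
        accuracy *= acc  # simplifying assumption: stage errors are independent
        load[dev] = load.get(dev, 0.0) + work
    # The slowest device bounds the items per second the workflow can sustain.
    throughput = min(DEVICES[dev][0] / work for dev, work in load.items())
    cost = sum(DEVICES[dev][1] for dev in load)  # pay only for devices in use
    return accuracy, throughput, cost


def brute_force(target_accuracy, target_throughput):
    """Enumerate every (variant, device) choice per stage; return the cheapest feasible plan."""
    stages = list(VARIANTS)
    options = [list(product(VARIANTS[s], DEVICES)) for s in stages]
    best = None
    for combo in product(*options):
        plan = dict(zip(stages, combo))
        acc, tput, cost = evaluate(plan)
        if acc >= target_accuracy and tput >= target_throughput:
            if best is None or cost < best[0]:
                best = (cost, plan)
    return best  # None when no plan meets both targets


if __name__ == "__main__":
    print(brute_force(0.65, 5.0))   # loose targets
    print(brute_force(0.90, 20.0))  # strict targets
```

With these toy numbers, loose targets are met by small models on the edge device alone, while strict targets force large models onto the cloud GPU. The number of plans grows exponentially with the number of stages, which is why exhaustive search does not scale to real workflows.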

Example: Visual question answering (VQA)


Key ideas & solutions

A cost-based optimizer

Profiling strategies considering correlation between models

A flexible query processor based on Naiad & Timely Dataflow

Decides where to do what
How to run on specific hardware is offloaded to virtualization & ML compilers


Key results

Baselines:

LB: Brute force
JB: Our solution
FF: First fit, BF: Best fit

(Different query optimizers, all using our query processor)

PT: PyTorch
SP: Spark

(End-to-end systems w/ GPUs)
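
For readers unfamiliar with the first-fit and best-fit baselines listed above, the following is a minimal sketch of the textbook placement heuristics those names usually refer to. How the baselines were actually implemented for the evaluation is not stated in the slides; the function names, the single scalar "demand" per model, and the single scalar "capacity" per device are all assumptions made for illustration.

```python
def first_fit(demands, capacities):
    """Place each demand on the first device that still has enough capacity."""
    free = list(capacities)  # remaining capacity per device
    placement = []
    for demand in demands:
        idx = next((i for i, f in enumerate(free) if f >= demand), None)
        if idx is not None:
            free[idx] -= demand
        placement.append(idx)  # None means the demand could not be placed
    return placement


def best_fit(demands, capacities):
    """Place each demand on the feasible device that leaves the least slack."""
    free = list(capacities)
    placement = []
    for demand in demands:
        feasible = [i for i, f in enumerate(free) if f >= demand]
        idx = min(feasible, key=lambda i: free[i] - demand) if feasible else None
        if idx is not None:
            free[idx] -= demand
        placement.append(idx)
    return placement


if __name__ == "__main__":
    demands = [4, 3, 2]     # hypothetical resource demand of three models
    capacities = [8, 4, 3]  # hypothetical capacity of three devices
    print(first_fit(demands, capacities))  # [0, 0, 1]
    print(best_fit(demands, capacities))   # [1, 2, 0]
```

Both heuristics are greedy and look only at whether a model fits on a device. Neither accounts for device price or for which model variant to run, which is the gap a cost-based optimizer is meant to close.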

Up to 5x cheaper than Spark
36-58% cheaper than second best


Key results

Has low overheads:

Query optimization takes milliseconds (brute force may take more than an hour)

Runtime overhead: from a few percent up to 20% over native PyTorch

Can adapt to infra changes


Under review for VLDB 23

Will open-source

Looking for collaborators