1 of 18

EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge

Bufang Yang¹, Lixing He¹, Neiwen Ling¹, Zhenyu Yan¹, Guoliang Xing¹, Xian Shuai², Xiaozhe Ren², Xin Jiang²

¹The Chinese University of Hong Kong, ²Noah’s Ark Lab, Huawei

2 of 18

Embedded AI is Everywhere

Wearables

Robots

AR / VR

Smart Home

3 of 18

Key Features of Embedded AI Systems

  • Real-time Predictions

Latency

  • Limited Resources

Memory, Power

Run Specialized Small Models

Daily Activity Tracking: Walking, Running, Cycling

Robotic Sensing: Human, Objects

Limited Classes

Closed-set

Open-set

Embedded AI models are task-specific and work in a closed-set manner

4 of 18

Challenges for Embedded AI Systems

  • Unable to handle new classes

Daily Activity Tracking

Activities of Interest

  • Limited labeled data
  • Manual data annotation
  • On-device retraining

Time-consuming and impractical

Robotic Sensing

Performance Drop!

Objects in user’s home

5 of 18

The Era of Foundation Models (FMs)

Rule-based Data Processing

Statistical Models

Deep Neural Networks

Pre-training & Fine-tuning

Foundation Models

Foundation Models:

  • Pretrained on huge amounts of data
  • Adapt to various downstream tasks

On the Opportunities and Risks of Foundation Models, arXiv, 2022.

SayCan

LM-NAV

ImageBind and CLIP

ChatGPT and GPT4

6 of 18

Multi-modal FM is Ideal for Embedded Systems

  • Remarkable Generalization Capability

                 IN1K   P365   NYU-D  SUN-D  VGGS   ESC    LLVIP  Ego4D
Closed-set, SM   0.1    0.27   10.0   5.26   0.32   2.75   50.0   0.9
FM (ImageBind)   77.7   45.4   54.0   35.1   27.8   66.9   63.4   25.0

Huge Gap

Image, Audio, Depth, IMU, Text (one Encoder per modality)

A Unified Embedding Space

Multi-modal FM

Ideal for IoT Systems

User-1

User-2

User-3

Text Embedding

Open-set Recognition
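
A minimal sketch of how open-set recognition can work in a unified embedding space: a sensor embedding is compared against the text embeddings of the candidate class names, and the closest one wins. The 3-dimensional vectors, the class names, and the helper `open_set_predict` are all made up for illustration and do not come from ImageBind or CLIP; real FM embeddings have far more dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def open_set_predict(sensor_embedding, text_embeddings):
    """Return the class name whose text embedding is most similar (hypothetical helper)."""
    return max(text_embeddings,
               key=lambda name: cosine(sensor_embedding, text_embeddings[name]))

# Toy embeddings, invented for this example only.
text_embeddings = {
    "walking": [1.0, 0.0, 0.0],
    "running": [0.0, 1.0, 0.0],
}
print(open_set_predict([0.2, 0.9, 0.1], text_embeddings))

# Supporting a new class only requires adding its text embedding; no retraining.
text_embeddings["cycling"] = [0.0, 0.0, 1.0]
print(open_set_predict([0.1, 0.2, 0.95], text_embeddings))
```

Because classes are defined by text embeddings rather than by a fixed output layer, each user can supply a different set of class names against the same embedding space.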

7 of 18

System Limitations of FMs

  • Poor real-time performance under limited network bandwidth
  • Hard to deploy on the edge

Smartphone

Smartwatch

Jetson Nano

Models                           Params.   FLOPS    Latency on Nano
Small Models        MobileNet    3.5M      0.3B     36.8ms
                    ResNet18     11.7M     1.8B     30.5ms
Foundation Models   ImageBind    1172M     167.3B   N.A.
                    CLIP-L/14    407.8M    61.5B    N.A.

300-500 times larger

Up to 600 ms

Cloud-centric (e.g. Microsoft Azure)

How to leverage the rich knowledge of FMs on the edge?

FM systems

8 of 18

Putting FMs on the Edge?

  • Model Compression

-- Pruning, Quantization, Knowledge distillation

[DIME-FM, ICCV ’23]

[MLC-LLM, https://github.com/mlc-ai/mlc-llm]

  • Resources, performance drop

  • Dynamic Inference

-- Early-exit, model cascade

[PersEPhonEE, HotMobile ’21]

[Tabi, EuroSys ’23]

  • Heavyweight early-exit heads
  • Limited memory of edge devices

9 of 18

How about Edge-cloud Collaboration?

  • Edge-cloud Collaboration Systems

-- Split computing

[SPINN, MobiCom ’20]

[Neurosurgeon, ACM SIGARCH Computer Architecture News ’17]

-- Big-small model collaboration

[AppealNet, DAC ’21]

[Shoggoth, DAC ’23]

Cannot generalize to new classes

Highly sensitive to network fluctuations

10 of 18

Our Key Idea

User-1

User-2

User-3

User-N

Foundation Model

Customized Small Models

Cloud

Edge

Hard Samples

Periodically Update Customized Small Models

11 of 18

EdgeFM

  • Collaborative system between FMs and customized small models

Cloud-side FM

Knowledge Query

Small Model Customization

EdgeFM Inference Engine

Network Conditions

Uncertainty Quantification

Query FM / Run on Edge

Sensor Data

Customized Small Model

Update

12 of 18

Knowledge Query

Closed-set small NNs

Open-set FMs

Pre-defined Classes

FM Unified Feature Space

Mismatching

  • Heterogeneous Feature Space

Feature Projection Network

  • Content-aware Data Uploading

Upload all the collected data

Huge Transmission Overhead

Content-aware Data Uploading

informative
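
One way to picture content-aware data uploading is as a filter that forwards only the samples the small model is unsure about, rather than uploading everything. The slide does not say how "informative" is measured, so the entropy criterion, the `min_entropy` parameter, and the helper `select_for_upload` below are assumptions chosen purely for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_upload(batch, min_entropy):
    """Keep only samples whose small-model prediction is uncertain.

    `batch` is a list of (sample_id, class_probabilities) pairs.
    Entropy is an assumed stand-in for "informative".
    """
    return [sid for sid, probs in batch if entropy(probs) >= min_entropy]

batch = [
    ("a", [0.98, 0.01, 0.01]),  # confident prediction: little to learn from the FM
    ("b", [0.40, 0.35, 0.25]),  # uncertain prediction: likely informative
    ("c", [0.90, 0.05, 0.05]),
]
uploaded = select_for_upload(batch, min_entropy=0.8)
print(uploaded, f"({len(uploaded)}/{len(batch)} samples uploaded)")
```

Raising `min_entropy` trades transmission overhead against how much data reaches the cloud-side FM.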

13 of 18

Two-stage Model Customization

FM Sensor Encoder

Small model sensor embedding

Customized Small Model

First stage: Mean Square Error

Unlabeled Data

FM sensor embedding

Similarity Matching

Pseudo text embedding

Second stage: Contrastive Loss

Text Embedding Pool
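
The two stages can be sketched on toy vectors as follows. Stage one is the mean square error between the small model's sensor embedding and the FM's sensor embedding. Stage two picks a pseudo text embedding from the pool by similarity matching and applies a contrastive loss. The slide only names the losses, so the InfoNCE-style form, the `temperature` value, the helper names, and the 2-dimensional embeddings are assumptions for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def mse_loss(small_emb, fm_emb):
    """Stage 1: mean squared error between small-model and FM sensor embeddings."""
    return sum((a - b) ** 2 for a, b in zip(small_emb, fm_emb)) / len(fm_emb)

def pseudo_label(fm_emb, text_pool):
    """Similarity matching: index of the pool entry closest to the FM sensor embedding."""
    fm = normalize(fm_emb)
    sims = [dot(fm, normalize(t)) for t in text_pool]
    return sims.index(max(sims))

def contrastive_loss(small_emb, text_pool, target, temperature=0.1):
    """Stage 2 (assumed InfoNCE form): cross-entropy over similarities to the pool."""
    s = normalize(small_emb)
    logits = [dot(s, normalize(t)) / temperature for t in text_pool]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

# Toy data, invented for this example only.
text_pool = [[1.0, 0.0], [0.0, 1.0]]
fm_emb = [0.9, 0.1]
small_emb = [0.6, 0.4]

print("stage 1 MSE:", mse_loss(small_emb, fm_emb))
target = pseudo_label(fm_emb, text_pool)
print("pseudo label index:", target)
print("stage 2 loss:", contrastive_loss(small_emb, text_pool, target))
```

Neither stage needs human labels: the FM supplies both the regression target and the pseudo label.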

14 of 18

EdgeFM Inference Engine

Yes

No

On-device Prediction

FM Prediction

Prediction

Aggregate

Dynamic Model Switching

  • Content-aware Model Switching

Uncertainty score vs threshold

  • Dynamic Network Adaptation

[threshold, latency]

Threshold-searching Table

Network Bandwidth

Real-time Constraints

Bandwidth-aware threshold
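
The interplay between content-aware switching and network adaptation can be sketched as a table lookup followed by a comparison. Everything numeric below is hypothetical: the profiling table, the edge and FM compute times, the sample size, and the simple latency model are assumptions made for illustration, as is the rule of picking the lowest feasible threshold so that as many samples as possible reach the FM.

```python
def expected_latency_ms(cloud_fraction, bandwidth_mbps,
                        edge_ms=30.0, fm_ms=50.0, sample_kbits=800.0):
    """Mean per-sample latency: every sample runs on the edge model,
    and a fraction is additionally uploaded to the cloud-side FM."""
    upload_ms = sample_kbits / bandwidth_mbps  # kbit / (Mbit/s) = ms
    return edge_ms + cloud_fraction * (upload_ms + fm_ms)

def pick_threshold(table, bandwidth_mbps, latency_budget_ms):
    """Lowest threshold (most FM queries) whose expected latency fits the budget.
    Falls back to the highest threshold when nothing fits."""
    feasible = [thr for thr, frac in table
                if expected_latency_ms(frac, bandwidth_mbps) <= latency_budget_ms]
    return min(feasible) if feasible else max(thr for thr, _ in table)

def route(uncertainty, threshold):
    """Content-aware switching: uncertain samples go to the FM."""
    return "query_fm" if uncertainty > threshold else "run_on_edge"

# Hypothetical table: (threshold, fraction of samples with uncertainty above it).
TABLE = [(0.2, 0.8), (0.4, 0.5), (0.6, 0.2), (0.8, 0.05)]

for bw in (50.0, 5.0):
    thr = pick_threshold(TABLE, bw, latency_budget_ms=80.0)
    print(f"bandwidth {bw} Mbps: threshold {thr}, u=0.5 sample: {route(0.5, thr)}")
```

With these made-up numbers, the same sample is sent to the FM when bandwidth is high and handled on the edge when bandwidth drops, which is the behaviour a bandwidth-aware threshold is meant to produce.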

15 of 18

End-to-end Experiment

  • Experiment setup

Mobile robot cruising in an office room

  • Edge-cloud Implementation

2 NVIDIA GeForce RTX 3090 GPUs + NVIDIA Jetson Xavier

  • Evaluation Models
  • FMs: ImageBind, CLIP
  • Small Models: MobileNetV2, ResNet18, EfficientNet-B1

16 of 18

Evaluation

  • Adaptability to environment change
  • Adjusts the edge-cloud processing rate (edge: 80% 🡪 40%, cloud: 20% 🡪 60%)
  • Overall accuracy stays close to that of the original FM (~78%)

Move to a new environment

17 of 18

Evaluation

  • Baselines
  • PersEPhonEE (early-exit)
  • SPINN (edge-cloud model splitting)
  • Cloud-centric

  • Evaluation Datasets

Dataset   Tasks                      FMs
SC15      Activity Recognition       ImageBind/CLIP
UCF101    Activity Recognition       ImageBind/CLIP
SC40      Indoor Scene Recognition   ImageBind/CLIP
FLO102    Flower Recognition         ImageBind/CLIP
ESC50     Audio Recognition          ImageBind

  • Reduces end-to-end latency by 3.2x

18 of 18

Thanks!

  • EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge

  • Bufang Yang, Lixing He, Neiwen Ling, Zhenyu Yan, Guoliang Xing, Xian Shuai, Xiaozhe Ren and Xin Jiang

Visit CUHK AIoT Lab