1 of 18

EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge

Bufang Yang¹, Lixing He¹, Neiwen Ling¹, Zhenyu Yan¹,

Guoliang Xing¹, Xian Shuai², Xiaozhe Ren², Xin Jiang²

The Chinese University of Hong Kong¹,

Noah’s Ark Lab, Huawei²

2 of 18

Embedded AI is Everywhere

Wearables

Robots

AR / VR

Smart Home

3 of 18

Key Features of Embedded AI Systems

Real-time Predictions

Limited Resources

Memory, Power

Latency

Daily Activity Tracking

Robotic Sensing

Run Specialized Small Models

Walking

Running

Cycling

Human

Objects

Limited Classes

Closed-set

Open-set

Embedded AI models are task-specific and work in a closed-set manner

4 of 18

Challenges for Embedded AI Systems

Unable to handle new classes

Daily Activity Tracking

Activities of Interest

Limited labeled data

Manual data annotation
On-device retraining

Time-consuming and impractical

Robotic Sensing

Performance Drop!

Objects in user’s home

5 of 18

The era of Foundation Models (FMs)

Rule-based

Data Processing

Statistical Models

Deep Neural Networks

Pre-training & Fine-tuning

Foundation Models

Foundation Models:

Pretrained on huge amounts of data
Adapt to various downstream tasks

On the Opportunities and Risks of Foundation Models, arXiv, 2022.

SayCan

LM-NAV

ImageBind and CLIP

ChatGPT and GPT-4

6 of 18

Multi-modal FM is Ideal for Embedded Systems

Remarkable Generalization Capability

	IN1K	P365	NYU-D	SUN-D	VGGS	ESC	LLVIP	Ego4D
Closed-set, SM	0.1	0.27	10.0	5.26	0.32	2.75	50.0	0.9
FM (ImageBind)	77.7	45.4	54.0	35.1	27.8	66.9	63.4	25.0

Huge Gap

Image

Audio

Depth

IMU

Text

…

Encoder

A Unified Embedding Space

Multi-modal FM

Ideal for IoT Systems

User-1

User-2

User-3

Text Embedding

Open-set Recognition
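
The idea of open-set recognition in a unified embedding space can be sketched as follows: a sensor embedding is compared against a pool of text embeddings, and the closest class name wins. This is only an illustrative sketch. The function names, the use of cosine similarity, and the 3-dimensional toy vectors are assumptions, not values or code from EdgeFM or ImageBind.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def open_set_predict(sensor_embedding, text_embedding_pool):
    """Return the class whose text embedding is closest to the sensor embedding.

    text_embedding_pool maps class names to text embeddings (hypothetical layout).
    """
    scores = {
        name: cosine_similarity(sensor_embedding, emb)
        for name, emb in text_embedding_pool.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy embeddings, made up for illustration only.
pool = {"walking": [1.0, 0.0, 0.0], "running": [0.0, 1.0, 0.0]}
sample = [0.1, 0.9, 0.2]
label, _ = open_set_predict(sample, pool)

# Supporting a new class only requires adding its text embedding to the pool.
pool["cycling"] = [0.0, 0.0, 1.0]
```

Because classes are represented by text embeddings rather than fixed output neurons, extending the label set in this sketch needs no retraining of the classifier head.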

7 of 18

System Limitations of FMs

Poor real-time performance under limited network bandwidth

Hard to deploy on the edge

Smartphone

Smartwatch

Jetson Nano

Category	Model	Params.	FLOPs	Latency (Jetson Nano)
Small Models	MobileNet	3.5M	0.3B	36.8ms
Small Models	ResNet18	11.7M	1.8B	30.5ms
Foundation Models	ImageBind	1172M	167.3B	N.A.
Foundation Models	CLIP-L/14	407.8M	61.5B	N.A.

300-500 times larger

Up to 600 ms

Cloud-centric (e.g. Microsoft Azure)

How to leverage the rich knowledge of FMs on the edge?

FM systems

8 of 18

Putting FMs on the Edge?

Model Compression

-- Pruning, Quantization, Knowledge distillation

[DIME-FM, ICCV ’23]

[MLC-LLM, https://github.com/mlc-ai/mlc-llm]

Resources, performance drop

Dynamic Inference

-- Early-exit, model cascade

[PersEPhonEE, HotMobile ’21]

[Tabi, EuroSys ’23]

Heavyweight early-exit heads

Limited memory of edge devices

9 of 18

How about Edge-cloud Collaboration?

Edge-cloud Collaboration Systems

-- Split computing

[SPINN, MobiCom ’20]

[Neurosurgeon, ACM SIGARCH Computer Architecture News ’17]

-- Big-small model collaboration

[AppealNet, DAC ’21]

[Shoggoth, DAC ’23]

Cannot generalize to new classes

Highly sensitive to network fluctuations

10 of 18

Our Key Idea

User-1

User-2

User-3

User-N

Foundation Model

Customized Small Models

Cloud

Edge

Hard Samples

Periodically Update Customized Small Models

11 of 18

EdgeFM

Collaborative system between FMs and customized small models

Cloud-side FM

Knowledge Query

Small Model Customization

EdgeFM Inference Engine

Network Conditions

Uncertainty Quantification

Query FM / Run on Edge

Sensor Data

Customized Small Model

Update

12 of 18

Knowledge Query

Closed-set small NNs

Open-set FMs

Pre-defined Classes

FM Unified Feature Space

Mismatching

Heterogeneous Feature Space

Feature Projection Network

Content-aware Data Uploading

Upload all the collected data

Huge Transmission Overhead

Upload only informative data
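
One way to picture content-aware data uploading is a filter that keeps only the samples the small model is unsure about. The sketch below assumes that "informative" means a small gap between the top-1 and top-2 similarity scores. That criterion, the function names, and the toy scores are assumptions made for illustration and are not taken from the slides.

```python
def margin_uncertainty(similarities):
    """Uncertainty as 1 - (top1 - top2); higher means less certain.

    Expects at least two similarity scores.
    """
    top1, top2 = sorted(similarities, reverse=True)[:2]
    return 1.0 - (top1 - top2)

def select_for_upload(batch, threshold):
    """Return the ids of samples whose uncertainty exceeds the threshold.

    batch is a list of (sample_id, similarities) pairs (hypothetical layout).
    """
    return [sid for sid, sims in batch if margin_uncertainty(sims) > threshold]

# Toy similarity scores from a hypothetical small model.
batch = [
    ("a", [0.9, 0.1, 0.05]),   # confident: large top-1/top-2 gap
    ("b", [0.5, 0.45, 0.3]),   # ambiguous: small gap
    ("c", [0.7, 0.2, 0.1]),    # fairly confident
]
uploaded = select_for_upload(batch, threshold=0.6)
```

In this toy batch only the ambiguous sample is selected, which shows how such a filter could cut transmission compared with uploading everything.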

13 of 18

Two-stage Model Customization

FM Sensor Encoder

Small model sensor embedding

Customized Small Model

First stage:

Mean Square Error

Unlabeled

Data

FM sensor embedding

Similarity Matching

Pseudo text embedding

Second stage:

Contrastive Loss

Text Embedding Pool
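
The two loss terms named on this slide can be sketched in a few lines. The sketch reads the slide as pairing the first stage with a mean square error between the small model's embedding and the FM sensor embedding, and the second stage with a contrastive loss against a pseudo text embedding obtained by similarity matching. That pairing is an interpretation of the slide layout. The InfoNCE-style loss form, the temperature value, the function names, and the toy vectors are all assumptions.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mse_loss(small_emb, fm_emb):
    """Mean square error between small-model and FM sensor embeddings."""
    return sum((s - f) ** 2 for s, f in zip(small_emb, fm_emb)) / len(small_emb)

def pseudo_label(fm_emb, text_pool):
    """Similarity matching: index of the text embedding closest to the FM embedding."""
    sims = [dot(fm_emb, t) for t in text_pool]
    return sims.index(max(sims))

def contrastive_loss(small_emb, text_pool, positive_idx, temperature=0.1):
    """InfoNCE-style loss: negative log-softmax of the positive pair's similarity."""
    logits = [dot(small_emb, t) / temperature for t in text_pool]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[positive_idx]

# Toy 2-D embeddings, made up for illustration.
text_pool = [[1.0, 0.0], [0.0, 1.0]]
fm_emb = [0.9, 0.1]                    # FM embedding of an unlabeled sample
positive = pseudo_label(fm_emb, text_pool)

aligned = [0.8, 0.2]                   # small-model embedding close to the FM's
misaligned = [0.2, 0.8]                # small-model embedding far from the FM's
loss_aligned = contrastive_loss(aligned, text_pool, positive)
loss_misaligned = contrastive_loss(misaligned, text_pool, positive)
```

Both losses are lower for the aligned embedding, so minimizing them would pull the small model's embedding toward the FM's feature space without needing labeled data.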

14 of 18

EdgeFM Inference Engine

Yes

No

On-device Prediction

FM Prediction

Prediction

Aggregate

Dynamic Model Switching

Content-aware Model Switching

Uncertainty score vs threshold
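
Comparing an uncertainty score against a threshold can be sketched as a small routing function. The normalized softmax entropy used here is a stand-in uncertainty measure chosen for illustration. The slides do not specify the measure, and the function names and the 0.5 threshold are assumptions.

```python
import math

def normalized_entropy(scores):
    """Softmax entropy of the scores, scaled to [0, 1].

    Expects at least two scores; 0 means fully certain, 1 means uniform.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(scores))

def route(scores, threshold):
    """Return 'cloud' when the small model is too uncertain, otherwise 'edge'."""
    return "cloud" if normalized_entropy(scores) > threshold else "edge"

# Toy small-model scores for a confident and an ambiguous sample.
confident = [5.0, 0.0, 0.0]
ambiguous = [1.0, 1.0, 1.0]
decisions = [route(confident, 0.5), route(ambiguous, 0.5)]
```

With this rule, raising the threshold keeps more samples on the edge, and lowering it sends more of them to the FM.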

Dynamic Network Adaptation

[ threshold, latency ]

Threshold-searching Table

Network Bandwidth

Real-time Constraints

Bandwidth-aware threshold
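
A threshold-searching table of [threshold, latency] entries could be built and queried roughly as below. The latency model (edge time plus the uploaded fraction times transmission and cloud time), the profile of upload fractions per threshold, and every number are assumptions made for illustration. The rule of picking the lowest feasible threshold assumes that samples above the threshold are sent to the cloud.

```python
def expected_latency_ms(upload_fraction, bandwidth_mbps, payload_kb, edge_ms, cloud_ms):
    """Average per-sample latency under a simple, assumed cost model."""
    # payload_kb * 8 gives kilobits; 1 Mbps equals 1 kilobit per millisecond.
    transmit_ms = payload_kb * 8.0 / bandwidth_mbps
    return edge_ms + upload_fraction * (transmit_ms + cloud_ms)

def build_table(profile, bandwidth_mbps, payload_kb, edge_ms, cloud_ms):
    """Turn (threshold, upload_fraction) pairs into (threshold, latency) entries."""
    return [
        (thr, expected_latency_ms(frac, bandwidth_mbps, payload_kb, edge_ms, cloud_ms))
        for thr, frac in profile
    ]

def pick_threshold(table, deadline_ms):
    """Lowest threshold that meets the deadline (most FM queries within budget).

    Falls back to the highest threshold when no entry is feasible.
    """
    feasible = [thr for thr, latency in table if latency <= deadline_ms]
    return min(feasible) if feasible else max(thr for thr, _ in table)

# Hypothetical offline profile: a lower threshold uploads a larger fraction.
profile = [(0.2, 0.8), (0.5, 0.4), (0.8, 0.1)]

def threshold_for(bandwidth_mbps, deadline_ms=80.0):
    table = build_table(profile, bandwidth_mbps, payload_kb=100.0, edge_ms=30.0, cloud_ms=20.0)
    return pick_threshold(table, deadline_ms)

fast, medium, slow = threshold_for(100.0), threshold_for(10.0), threshold_for(2.0)
```

As bandwidth shrinks, the selected threshold rises in this sketch, so fewer samples are sent to the cloud and the real-time constraint still holds.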

15 of 18

End-to-end Experiment

2 NVIDIA GeForce RTX 3090 GPUs + NVIDIA Jetson Xavier

Experiment setup

Cruise in an office room

mobile robot

Edge-cloud Implementation

Evaluation Models

FMs: ImageBind, CLIP

Small Models: MobileNetV2, ResNet18, EfficientNet-B1

16 of 18

Evaluation

Adaptability to environment change

Adjust the edge-cloud processing rate (edge: 80% → 40%, cloud: 20% → 60%)
Overall accuracy is always close to the original FM (~78%)

Move to a new environment

17 of 18

Evaluation

Baselines

PersEPhonEE (early-exit)
SPINN (edge-cloud model splitting)
Cloud-centric

Dataset	Tasks	FMs
SC15	Activity Recognition	ImageBind/CLIP
UCF101	Activity Recognition	ImageBind/CLIP
SC40	Indoor Scene Recognition	ImageBind/CLIP
FLO102	Flower Recognition	ImageBind/CLIP
ESC50	Audio Recognition	ImageBind

Evaluation Datasets

Reduces end-to-end latency by 3.2x

18 of 18

Thanks!

EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge

Bufang Yang, Lixing He, Neiwen Ling, Zhenyu Yan, Guoliang Xing, Xian Shuai, Xiaozhe Ren and Xin Jiang

Visit CUHK AIoT Lab