EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge
1
Bufang Yang¹, Lixing He¹, Neiwen Ling¹, Zhenyu Yan¹,
Guoliang Xing¹, Xian Shuai², Xiaozhe Ren², Xin Jiang²
¹ The Chinese University of Hong Kong,
² Noah’s Ark Lab, Huawei
Embedded AI is Everywhere
2
Wearables
Robots
AR / VR
Smart Home
Key Features of Embedded AI Systems
3
Memory, Power
Latency
Daily Activity Tracking
Robotic Sensing
Run Specialized Small Models
Walking
Running
Cycling
Human
Objects
Limited Classes
Closed-set
Open-set
Embedded AI models are task-specific and work in a closed-set manner
Challenges for Embedded AI Systems
4
Daily Activity Tracking
Interested Activities
Time-consuming and impractical
Robotic Sensing
Performance Drop!
Objects in user’s home
The era of Foundation Models (FMs)
5
Rule-based
Data Processing
Statistical
Models
Deep Neural
Networks
Pre-training
& Fine-tuning
Foundation Models
Foundation Models:
On the Opportunities and Risks of Foundation Models, arXiv, 2022.
SayCan
LM-NAV
ImageBind and CLIP
ChatGPT and GPT4
Multi-modal FM is Ideal for Embedded Systems
6
Model | IN1K | P365 | NYU-D | SUN-D | VGGS | ESC | LLVIP | Ego4D |
Closed-set, SM | 0.1 | 0.27 | 10.0 | 5.26 | 0.32 | 2.75 | 50.0 | 0.9 |
FM (ImageBind) | 77.7 | 45.4 | 54.0 | 35.1 | 27.8 | 66.9 | 63.4 | 25.0 |
Huge Gap
Image
Audio
Depth
IMU
Text
…
…
Encoder
Encoder
Encoder
Encoder
Encoder
A Unified Embedding Space
Multi-modal FM
Ideal for IoT Systems
User-1
User-2
User-3
Text Embedding
Open-set Recognition
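To illustrate how a unified embedding space enables open-set recognition, the sketch below labels a sensor embedding by cosine similarity against a per-user pool of text embeddings. The vectors are toy values standing in for real FM outputs, and the names `cosine` and `open_set_predict` are hypothetical, not part of ImageBind or CLIP.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def open_set_predict(sensor_emb, text_pool):
    """Return the class name whose text embedding is closest to sensor_emb."""
    return max(text_pool, key=lambda name: cosine(sensor_emb, text_pool[name]))

# Toy 3-d embeddings standing in for FM outputs (assumed values).
text_pool = {
    "walking": [1.0, 0.0, 0.0],
    "running": [0.0, 1.0, 0.0],
}
sample = [0.1, 0.2, 0.9]
before = open_set_predict(sample, text_pool)

# A user registers a new class by adding its text embedding; nothing is retrained.
text_pool["cycling"] = [0.0, 0.0, 1.0]
after = open_set_predict(sample, text_pool)

print(before, after)  # running cycling
```

Each user only needs a different text pool, so the same encoders can serve different class sets.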
System Limitations of FMs
7
Smartphone
Smartwatch
Jetson Nano
Category | Model | Params. | FLOPs | Latency on Nano |
Small Models | MobileNet | 3.5M | 0.3B | 36.8ms |
Small Models | ResNet18 | 11.7M | 1.8B | 30.5ms |
Foundation Models | ImageBind | 1172M | 167.3B | N.A. |
Foundation Models | CLIP-L/14 | 407.8M | 61.5B | N.A. |
300-500 times larger
Up to 600 ms
Cloud-centric (e.g. Microsoft Azure)
How to leverage the rich knowledge of FMs on the edge?
FM systems
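The size gap can be checked directly from the parameter and FLOP counts listed on this slide. The short calculation below compares ImageBind with MobileNet; the helper name `size_ratio` is hypothetical, and only the figures from the table are used.

```python
# (parameters, FLOPs) taken from the table on this slide.
models = {
    "MobileNet": (3.5e6, 0.3e9),
    "ImageBind": (1172e6, 167.3e9),
}

def size_ratio(big, small):
    """Return (parameter ratio, FLOP ratio) of model `big` over model `small`."""
    big_params, big_flops = models[big]
    small_params, small_flops = models[small]
    return big_params / small_params, big_flops / small_flops

param_ratio, flop_ratio = size_ratio("ImageBind", "MobileNet")
print(f"params: {param_ratio:.0f}x, FLOPs: {flop_ratio:.0f}x")
```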
Putting FMs on the Edge?
8
[DIME-FM, ICCV ’23]
-- Pruning, Quantization, Knowledge distillation
[MLC-LLM, https://github.com/mlc-ai/mlc-llm]
[PersEPhonEE, HotMobile ’21]
-- Early-exit, model cascade
[Tabi, EuroSys ’23]
How about Edge-cloud Collaboration?
9
[SPINN, MobiCom ’20]
-- Split computing
[Neurosurgeon, ACM SIGARCH Computer Architecture News ’17]
-- Big-small model collaboration
[AppealNet, DAC ’21]
[Shoggoth, DAC ’23]
Cannot generalize to new classes
Highly sensitive to network fluctuations
Our Key Idea
10
User-1
User-2
User-3
User-N
Foundation Model
Customized Small Models
Cloud
Edge
Hard Samples
Periodically Update Customized Small Models
EdgeFM
11
Cloud-side FM
Knowledge Query
Small Model Customization
EdgeFM Inference Engine
Network Conditions
Uncertainty Quantification
Query FM / Run on Edge
Sensor Data
Customized Small Model
Update
Knowledge Query
12
Closed-set small NNs
Open-set FMs
Pre-defined Classes
FM Unified Feature Space
Mismatching
Feature Projection Network
Upload all the collected data
Huge Transmission Overhead
Content-aware Data Uploading
informative
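One way to picture content-aware uploading is a filter that sends only the samples the small model is unsure about. The sketch assumes informativeness is measured by the margin between the two best class similarities; that scoring rule, the `margin_threshold` value, and the function names are illustrative assumptions and are not taken from the slides.

```python
def top2_margin(similarities):
    """Gap between the best and second-best class similarity."""
    best, second = sorted(similarities, reverse=True)[:2]
    return best - second

def select_for_upload(batch, margin_threshold=0.1):
    """Return the ids of samples whose top-2 margin is below the threshold.

    batch maps a sample id to the small model's per-class similarities.
    A small margin means the model is unsure, so the sample is treated
    as informative (assumption).
    """
    return [sid for sid, sims in batch.items()
            if top2_margin(sims) < margin_threshold]

batch = {
    "s1": [0.90, 0.10, 0.05],  # confident: kept on the edge
    "s2": [0.41, 0.39, 0.20],  # ambiguous: uploaded
    "s3": [0.55, 0.50, 0.10],  # ambiguous: uploaded
}
uploaded = select_for_upload(batch)
print(uploaded)  # ['s2', 's3']
```

Only two of the three samples leave the device here, which is the kind of saving the slide contrasts with uploading all collected data.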
Two-stage Model Customization
13
FM Sensor Encoder
Small model sensor embedding
Customized Small Model
First stage:
Mean Square Error
Unlabeled
Data
FM sensor embedding
Similarity Matching
Pseudo text embedding
Second stage:
Contrastive Loss
Text Embedding Pool
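The two stages can be sketched with plain Python on toy vectors. Stage one follows the slide: a mean squared error pulls the small model's sensor embedding toward the FM sensor embedding on unlabeled data. Stage two picks a pseudo text embedding by similarity matching and applies a contrastive loss; the InfoNCE-style form and the `temperature` value used here are assumptions, as are all function names.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def mse_loss(student_emb, fm_emb):
    """Stage 1: mean squared error between small-model and FM embeddings."""
    return sum((s - t) ** 2 for s, t in zip(student_emb, fm_emb)) / len(fm_emb)

def pseudo_label(fm_emb, text_pool):
    """Similarity matching: index of the text embedding closest to the FM embedding."""
    fm = normalize(fm_emb)
    sims = [dot(fm, normalize(t)) for t in text_pool]
    return sims.index(max(sims))

def contrastive_loss(student_emb, text_pool, positive, temperature=0.1):
    """Stage 2 (assumed InfoNCE form): cross-entropy over scaled cosine similarities."""
    s = normalize(student_emb)
    logits = [dot(s, normalize(t)) / temperature for t in text_pool]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[positive]

# Toy 2-d embeddings (assumed values, not real model outputs).
text_pool = [[1.0, 0.0], [0.0, 1.0]]
fm_emb = [0.9, 0.1]
student_emb = [0.6, 0.4]

stage1 = mse_loss(student_emb, fm_emb)
positive = pseudo_label(fm_emb, text_pool)
stage2 = contrastive_loss(student_emb, text_pool, positive)
print(positive, stage1, stage2)
```

In a real training loop these losses would be minimized with respect to the small model's weights; the sketch only evaluates them once.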
EdgeFM Inference Engine
14
Yes
No
On-device Prediction
FM Prediction
Prediction
Aggregate
Dynamic Model Switching
Uncertainty score vs threshold
[ threshold, latency ]
Threshold-searching Table
Network Bandwidth
Real-time Constraints
Bandwidth-aware threshold
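The switching logic on this slide can be sketched as a table lookup followed by a comparison. The sketch assumes one list of `(threshold, latency)` pairs per profiled bandwidth level, uncertainty scores in [0, 1], and a rule that picks the lowest threshold meeting the latency budget; the table values, units, and function names are all hypothetical.

```python
def pick_threshold(table, bandwidth_mbps, latency_budget_ms):
    """Pick the lowest threshold whose profiled latency fits the budget.

    table maps a bandwidth level (Mbps) to a list of (threshold, latency_ms)
    pairs. A lower threshold sends more samples to the FM, so it is assumed
    to be more accurate but slower.
    """
    # Use the highest profiled bandwidth level that does not exceed the measurement.
    levels = [b for b in table if b <= bandwidth_mbps]
    level = max(levels) if levels else min(table)
    feasible = [thr for thr, lat in table[level] if lat <= latency_budget_ms]
    # If nothing fits, fall back to 1.0 so every sample stays on the device.
    return min(feasible) if feasible else 1.0

def route(uncertainty, threshold):
    """Dynamic model switching: query the FM only for uncertain samples."""
    return "cloud_fm" if uncertainty > threshold else "edge"

# Hypothetical threshold-searching table.
table = {
    5:  [(0.2, 180.0), (0.5, 90.0), (0.8, 40.0)],
    20: [(0.2, 70.0), (0.5, 45.0), (0.8, 30.0)],
}
thr = pick_threshold(table, bandwidth_mbps=8, latency_budget_ms=100)
print(thr, route(0.7, thr), route(0.3, thr))  # 0.5 cloud_fm edge
```

When the network degrades, the lookup returns a higher threshold, so fewer samples are sent to the cloud and latency stays within the real-time constraint.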
End-to-end Experiment
15
2 NVIDIA GeForce
RTX 3090 GPUs
NVIDIA
Jetson Xavier
+
Cruise in an office room
mobile robot
MobileNetV2, ResNet18, EfficientNet-B1
ImageBind, CLIP
Evaluation
16
Move to a new environment
Evaluation
17
Dataset | Tasks | FMs |
SC15 | Activity Recognition | ImageBind/CLIP |
UCF101 | Activity Recognition | ImageBind/CLIP |
SC40 | Indoor Scene Recognition | ImageBind/CLIP |
FLO102 | Flower Recognition | ImageBind/CLIP |
ESC50 | Audio Recognition | ImageBind |
Thanks!
18
Visit CUHK AIoT Lab