1 of 15

Resilient Edge-Cloud Autonomous Learning with Timely Inferences

Haider Abdelrahman, James Chang, Lakshya Gour, Tanushree Mehta, Shreya Venugopal

Advisor: Prof. Anand Sarwate

2 of 15

The Team

Yunhyuk Chang

Electrical & Computer Engineering, 2024

Haider Abdelrahman

Electrical & Computer Engineering, 2026

Shreya Venugopal

Computer Science, Grad Student, 2024

Lakshya Gour

Computer Science + Math, 2026

Tanushree Mehta

Electrical & Computer Engineering, 2026

3 of 15

The Problem

  • Real-time machine learning models are becoming more complex
  • Running them on less powerful (mobile) devices is difficult because these applications need low latency
  • Solution: MEC (Mobile-Edge Computing)


4 of 15

What is MEC (Mobile-Edge Computing)?

A network architecture that brings computation and storage capabilities closer to the end-users, reducing latency and improving real-time application performance.


5 of 15

Which part are we interested in?


  • Threshold: the confidence level at which the mobile device asks for help
  • Asking for help: when inference confidence < threshold, the mobile requests help from the edge
  • Average latency: the total time to perform the task

As you vary the threshold, how does the average latency change (over the dataset)?
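The offloading rule above can be sketched in a few lines; this is a minimal illustration, and `should_offload` is a hypothetical name, not a function from the project code.

```python
# Confidence-threshold offloading rule: the mobile device keeps its own
# prediction when it is confident, and asks the edge for help otherwise.

def should_offload(confidence: float, threshold: float) -> bool:
    """Request help from the edge when inference confidence < threshold."""
    return confidence < threshold

# With threshold 0.8, a 0.65-confidence inference is offloaded,
# while a 0.95-confidence inference is answered locally.
print(should_offload(0.65, 0.8))  # True
print(should_offload(0.95, 0.8))  # False
```

Sweeping `threshold` from 0 (never ask) to 1 (always ask) traces out the latency/accuracy trade-off studied over the dataset.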

6 of 15

Experimental Setup


7 of 15

Models and Datasets

  • Data:
    • CIFAR-10
    • 10 categories
    • Dataset size: 60,000
    • Test set size: 10,000
  • Models:
    • MobileNetV2
    • DenseNet


8 of 15

Findings

9 of 15

Baseline

Mobile Device: Small Neural Network (MobileNetV2)
Edge Device: Oracle (100% Accuracy)
Time Synchronization Protocol: PTP
Network Connection: Ethernet


10 of 15

CPU Restriction and Network Delay

Mobile Device: Small Neural Network (MobileNetV2)
Edge Device: Oracle (100% Accuracy)
Time Synchronization Protocol: PTP
Network Connection: Ethernet

CPU Limit: 1.2 GHz

Network: 8 ms delay ± 3 ms
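One common way to emulate a delay of this shape on Linux is `tc` with the `netem` qdisc; this is a sketch of such a setup, not necessarily the tool used in these experiments, and the interface name `eth0` is an assumption.

```shell
# Hypothetical example: add an emulated 8 ms +/- 3 ms delay to all
# traffic leaving eth0 (requires root; interface name is illustrative).
tc qdisc add dev eth0 root netem delay 8ms 3ms

# Remove the emulated delay when the experiment is done.
tc qdisc del dev eth0 root netem
```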


11 of 15

Queuing

Mobile Device: MobileNetV2 (85% Accuracy)
Edge Device: Oracle (100% Accuracy)
Time Synchronization Protocol: PTP
Network Connection: Ethernet

  • Queue at the Edge
  • The Mobile device continues inference on the next image (multithreading) while it waits for the Edge response
  • Latency range: 7–12 ms
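The non-blocking pipeline above can be sketched with a queue and a worker thread; all names and the fake confidences are illustrative stand-ins, not the project's actual models or code.

```python
# Sketch: the mobile loop keeps classifying images while a worker
# thread (standing in for the edge "oracle") drains the request queue.
import queue
import threading
import time

edge_queue: "queue.Queue" = queue.Queue()
edge_results = {}   # img_id -> label answered by the edge
local_results = {}  # img_id -> label answered locally

def edge_worker():
    # Stand-in for the edge device: slower, but always correct.
    while True:
        img_id = edge_queue.get()
        if img_id is None:
            break
        time.sleep(0.001)  # placeholder for network + edge latency
        edge_results[img_id] = "edge_label"
        edge_queue.task_done()

worker = threading.Thread(target=edge_worker)
worker.start()

for img_id in range(5):
    confidence = 0.5 if img_id % 2 else 0.9  # fake mobile confidences
    if confidence < 0.8:
        edge_queue.put(img_id)                  # low confidence: ask edge
    else:
        local_results[img_id] = "mobile_label"  # keep the local answer
    # The loop moves straight to the next image; it never blocks on the edge.

edge_queue.join()     # wait for outstanding edge requests
edge_queue.put(None)  # signal the worker to stop
worker.join()
print(sorted(local_results), sorted(edge_results))  # [0, 2, 4] [1, 3]
```

The key design point mirrors the slide: enqueueing a request and moving on decouples mobile throughput from edge round-trip time.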


12 of 15

Out of Distribution Analysis


  • The Mobile model recognizes fewer classes than the Edge model
  • Unknown images are routed to a catch-all "confused" class
  • Latency range: 2.5–5 ms
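The "confused" class routing can be sketched as follows; the class names, threshold value, and `classify_on_mobile` are hypothetical illustrations under the assumption that low-confidence predictions are treated as out-of-distribution.

```python
# The mobile model only knows a subset of the edge's classes, so any
# prediction below the confidence threshold is mapped to a catch-all
# "confused" class (and can then be deferred to the edge).
MOBILE_CLASSES = ["cat", "dog", "car"]  # illustrative subset of CIFAR-10
CONFUSED = "confused"

def classify_on_mobile(probs: dict, threshold: float = 0.8) -> str:
    label = max(probs, key=probs.get)   # most likely known class
    if probs[label] < threshold:
        return CONFUSED                 # uncertain / likely unknown input
    return label

print(classify_on_mobile({"cat": 0.95, "dog": 0.03, "car": 0.02}))  # cat
print(classify_on_mobile({"cat": 0.40, "dog": 0.35, "car": 0.25}))  # confused
```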

13 of 15

Conclusions

  • Implementing a confidence threshold gives:
    • Lower-latency inference than using only the Edge device
    • Higher-accuracy inference than using only the Mobile device

  • Emulating real-life conditions by restricting CPU speed and the network has a large impact on latency

  • Introducing parallelization (multithreading) during inference allows for lower latency and quicker predictions


14 of 15

Potential Next Steps

  • Continue to better emulate real-life scenarios
  • Further automate testing and data collection
  • Explore more complex problems
    • Split Computing
    • Early Exiting
    • Multiple Clients and Servers
    • Different Queuing Policies


15 of 15

Acknowledgements


Sponsor(s): nVerses Capital

Project head: Prof. Anand Sarwate

Special thanks: Prof. Waheed U. Bajwa, Ivan Seskar, Jenny Shane, Prof. Roy Yates, & all PhD students who helped!

This material is based upon work supported by the National Science Foundation under grant no. CNS-2148104 and is supported in part by funds from federal agency and industry partners as specified in the Resilient & Intelligent NextG Systems (RINGS) program.