1 of 29

CrossRoI: Cross-camera Region of Interest Optimization for Efficient Real Time Video Analytics at Scale

Hongpeng Guo*, Shuochao Yao†, Zhe Yang*, Qian Zhou*, Klara Nahrstedt*

2 of 29

Large Scale Video Analytics

Traffic cameras deployed in New York City.

3 of 29

Video Analytics Pipeline

Vehicle detection.

Pedestrian tracking.

Resource intensive: network-exhaustive and compute-intensive.

4 of 29

Existing Solutions

Frame Filtering:

Reducto [Sigcomm’20], Focus [OSDI’18];

Resolution Configuring:

DDS [Sigcomm’20], Vigil [Mobicom’15], DeepDecision [Infocom’18], ...

Cameras Scheduling:

Spatula [SEC’20], Caesar [Sensys’19], ...

Independent-camera solutions

No comprehensive coverage

5 of 29

CrossRoI Application Scenario

Focus on a fleet of closely located cameras.

Observation: Cross-camera content correlation & redundancy.

Queries that can be answered when each object is detected only once:

Vehicle detection;

Pedestrian localization;

Car counting.

6 of 29

CrossRoI Challenges & Solutions

Challenges

How to establish cross-camera data associations?

How to reduce data intensity using the cross-camera correlations?

How to best translate the reduced video data into network and computation savings?

Solutions

Re-identification & statistical filters.

RoI-masks generated from an optimization framework.

Specially designed video compressor & CNN kernel to optimize network & computation usage.

7 of 29

CrossRoI: Offline Phase Overview

CrossRoI Server

ReID & Filtering

Cross-camera Profiling

RoI Optimization

RoI Masks

9 of 29

CrossRoI: Online Phase Overview

CrossRoI Server

RoI optimized CNN Model

Query Answers

10 of 29

CrossRoI: ReID & Results Filtering

Cross-camera data association is based on the result of object re-identification (ReID).

However, ReID results may not be accurate.

Categorize every detection in C1 as positive or negative, based on whether a corresponding appearance exists in C2.

Positive: a corresponding appearance exists in C2.

Negative: no corresponding appearance exists in C2.

We study the pairwise relations between two cameras to learn where ReID algorithms make mistakes (e.g., C1 → C2).

13 of 29

CrossRoI: Raw ReID Results Filtering

Further categorize the detections as True or False based on correctness.

True Positive

True Negative

We study the pairwise relations between two cameras to learn where ReID algorithms make mistakes (e.g., C1 → C2).

False Positive: Different objects are falsely identified as the same object.

False Negative: The same object is not identified across cameras.
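The four outcome categories above can be written down as a small labeling function. This is a minimal sketch, not the authors' code: the function name `reid_outcome` and the convention that `None` means "no appearance in C2" are assumptions made for illustration, as are the sample object IDs.

```python
from collections import Counter
from typing import Optional

def reid_outcome(predicted: Optional[str], truth: Optional[str]) -> str:
    """Label one C1 detection.

    predicted: the C2 object ID that ReID matched it to, or None (assumed convention).
    truth:     the ground-truth C2 object ID, or None if the object is not in C2.
    """
    if predicted is not None:
        # ReID claims an appearance in C2: correct only if it is the right object.
        return "TP" if predicted == truth else "FP"
    # ReID claims no appearance in C2: correct only if there really is none.
    return "TN" if truth is None else "FN"

# Hypothetical (predicted, truth) pairs, for illustration only.
samples = [("car7", "car7"), ("car3", "car9"), (None, "car2"), (None, None)]
counts = Counter(reid_outcome(p, t) for p, t in samples)
```

Counting outcomes this way over a profiling window is what makes it possible to compare raw ReID results against ground truth.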

14 of 29

CrossRoI: Raw ReID Results Filtering

Profile on a 5-camera dataset and compare with ground truth.

Raw ReID results have significant errors.

Observations:

True Positive >> False Positive and True Negative >> False Negative

Intrinsic physical correlations between viewports.

15 of 29

CrossRoI: Raw ReID Results Filtering

Use statistical filters to rectify error-prone raw ReID results.

Remove False Positive

Remove False Negative
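The slides do not specify which statistical filters are used. As a rough illustration only, the sketch below swaps in a simple per-region majority vote: because correct labels greatly outnumber errors and viewports are physically correlated, a label that disagrees with the clear majority in its image region is treated as a likely ReID mistake. The function name `majority_filter`, the `region` key, and the `min_support` parameter are all assumptions, not part of CrossRoI.

```python
from collections import Counter, defaultdict

def majority_filter(detections, min_support=3):
    """Relabel each (region, label) pair to its region's strict-majority label.

    detections:  list of (region, label), label in {"positive", "negative"}.
    min_support: regions with fewer samples are left untouched (assumed knob).
    """
    votes = defaultdict(Counter)
    for region, label in detections:
        votes[region][label] += 1

    cleaned = []
    for region, label in detections:
        counts = votes[region]
        total = sum(counts.values())
        majority, n = counts.most_common(1)[0]
        # Only overrule a label when the region has enough samples
        # and one label holds a strict majority.
        if total >= min_support and n > total / 2:
            cleaned.append((region, majority))
        else:
            cleaned.append((region, label))
    return cleaned

# Hypothetical profiling data: region r1 overlaps C2, region r2 does not.
raw = [("r1", "positive")] * 4 + [("r1", "negative")] \
    + [("r2", "negative")] * 3 + [("r2", "positive")]
filtered = majority_filter(raw)
```

In this toy input the lone negative in `r1` is treated as a false negative and the lone positive in `r2` as a false positive.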

16 of 29

CrossRoI: Cross-camera data association

Divide each frame into tiles and associate objects with tiles.

Profile over time and generate a lookup table.

At timestamp T = t1:

C1: O1: {3, 4, 9, 10}; O2: {11}; O3: {9, 10, 15, 16}

C2: O4: {2, 8}; O5: {3, 9}; O3: {7, 8, 13, 14}
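Associating an object with tiles amounts to finding every tile its bounding box overlaps. The sketch below assumes 1-indexed, row-major tile IDs on a grid 6 tiles wide, which is inferred from the tile IDs in the example rather than stated in the slides. The pixel sizes and the name `bbox_to_tiles` are hypothetical.

```python
def bbox_to_tiles(bbox, tile_w, tile_h, n_cols):
    """Return the 1-indexed, row-major tile IDs overlapped by a bounding box.

    bbox is (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = bbox
    first_col, last_col = x1 // tile_w, (x2 - 1) // tile_w
    first_row, last_row = y1 // tile_h, (y2 - 1) // tile_h
    return {
        row * n_cols + col + 1
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    }

# Hypothetical 100x100-pixel tiles on a grid with 6 columns.
o1_tiles = bbox_to_tiles((250, 50, 390, 150), tile_w=100, tile_h=100, n_cols=6)
o2_tiles = bbox_to_tiles((450, 120, 490, 180), tile_w=100, tile_h=100, n_cols=6)
```

With these made-up boxes, `o1_tiles` and `o2_tiles` reproduce the tile sets listed for O1 and O2.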

17 of 29

CrossRoI: Cross-camera data association

Profile over time and generate a lookup table.

Timestamp	Objects detected	Appearance Regions
t1	O1, O2, O3, O4, O5.	O1: C1-{3, 4, 9, 10}; O2: C1-{11}; O3: C1-{9, 10, 15, 16}, C2-{7, 8, 13, 14}; O4: C2-{2, 8}; O5: C2-{3, 9}.
...	...	...
tn	...	...

The lookup table captures the cross-camera data associations.
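One possible in-memory shape for such a lookup table is a nested mapping from timestamp to object to camera to tile set. This is a sketch under that assumption; the record format and the name `build_lookup` are illustrative, and only the t1 row from the table is filled in.

```python
def build_lookup(records):
    """Build {timestamp: {object: {camera: frozenset_of_tiles}}} from flat records."""
    table = {}
    for timestamp, obj, camera, tiles in records:
        table.setdefault(timestamp, {}).setdefault(obj, {})[camera] = frozenset(tiles)
    return table

# The t1 row of the table, flattened into (timestamp, object, camera, tiles).
records = [
    ("t1", "O1", "C1", {3, 4, 9, 10}),
    ("t1", "O2", "C1", {11}),
    ("t1", "O3", "C1", {9, 10, 15, 16}),
    ("t1", "O3", "C2", {7, 8, 13, 14}),
    ("t1", "O4", "C2", {2, 8}),
    ("t1", "O5", "C2", {3, 9}),
]
lookup = build_lookup(records)

# Objects seen by more than one camera are the ones that carry redundancy.
shared = [obj for obj, cams in lookup["t1"].items() if len(cams) > 1]
```

Here only O3 appears in both cameras, so it is the only object whose tiles could be dropped from one of the two views.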

18 of 29

CrossRoI: RoI Masks Generation

Select the minimum number of tiles across cameras such that every object in the scene can be observed by at least one camera at every timestamp.

A combinatorial optimization problem.

At timestamp T = t1, the selected tiles are:

C1: {3, 4, 9, 10, 11, 15, 16}

C2: {2, 3, 8, 9}
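For the single-timestamp example, the optimization can be checked by exhaustive search: assign each object to one camera that sees it, take the union of the required tiles per camera, and keep the assignment with the fewest tiles. This brute-force sketch is exponential in the number of shared objects and is only meant for toy inputs. It is not the optimization framework used by CrossRoI, and it assumes an object counts as observed only when all of its tiles in the chosen camera are kept.

```python
from itertools import product

def min_tile_cover(appearances):
    """Exhaustively pick one camera per object to minimize the total tile count.

    appearances: {object: {camera: set_of_tiles}}.
    Returns ({camera: selected_tiles}, total_number_of_tiles).
    """
    objects = list(appearances)
    best, best_size = None, None
    # Each element of the product chooses one (camera, tiles) option per object.
    for choice in product(*(appearances[o].items() for o in objects)):
        selected = {}
        for camera, tiles in choice:
            selected.setdefault(camera, set()).update(tiles)
        size = sum(len(t) for t in selected.values())
        if best_size is None or size < best_size:
            best, best_size = selected, size
    return best, best_size

t1 = {
    "O1": {"C1": {3, 4, 9, 10}},
    "O2": {"C1": {11}},
    "O3": {"C1": {9, 10, 15, 16}, "C2": {7, 8, 13, 14}},
    "O4": {"C2": {2, 8}},
    "O5": {"C2": {3, 9}},
}
masks, n_tiles = min_tile_cover(t1)
```

Covering O3 from C1 reuses tiles 9 and 10, which O1 already needs, so it costs 11 tiles in total against 12 when O3 is covered from C2.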

19 of 29

CrossRoI: RoI Masks Generation

An RoI mask generation example over 5 cameras, using 1 min of synchronized video.

Format: camera_ID: (#selected_tiles / #overall_tiles)

C1: (439 / 1296)

C2: (814 / 1296)

C3: (303 / 1296)

C4: (496 / 1296)

C5: (448 / 768)

About 58% of the content is removed.
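The 58% figure follows directly from the per-camera tile counts, assuming "content" is measured as the number of tiles:

```python
# Per-camera tile counts taken from the example above.
selected = {"C1": 439, "C2": 814, "C3": 303, "C4": 496, "C5": 448}
overall = {"C1": 1296, "C2": 1296, "C3": 1296, "C4": 1296, "C5": 768}

kept = sum(selected.values())           # 2500 tiles kept
total = sum(overall.values())           # 5952 tiles in all five views
removed_fraction = 1 - kept / total     # about 0.58
```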

20 of 29

Evaluation

NVIDIA AI City Challenge Dataset.

5 synchronized videos of a traffic intersection, 3 min long, with more than 30K vehicle bounding-box detections.

1 min of video for offline profiling, 2 min of video for online evaluation.

21 of 29

Evaluation

Unique Vehicle Detection

Detect each vehicle at least once across all five cameras at every timestamp.

Compared to the baseline, CrossRoI reduces network overhead by 42% and end-to-end latency by 25% while keeping 99.9% detection accuracy.

22 of 29

Evaluation

Compared to Reducto [SIGCOMM’20]

23 of 29

Q&A

hg5@illinois.edu

24 of 29

Backup Slides

26 of 29

CrossRoI: System Overview

Figure legend: in-device data flow, offline video data flow, RoI masks data flow, online video streams.

CrossRoI operates in two phases:

An offline phase that profiles video clips to (a) learn cross-camera data associations and (b) generate optimized RoI masks.

An online phase that applies the RoI masks to reduce network usage for video streaming and computation workloads for CNN inference.

27 of 29

CrossRoI: Video Compression

Tile-based video streaming hurts video compression efficacy substantially.

Cause: tiling reduces macroblock-level matching in video compression codecs, e.g., H.264/H.265.

Video compression efficacy characterization: profile video sizes when the video is split into m × n tiles.

28 of 29

CrossRoI: Video Compression

Tile-based video streaming hurts video compression efficacy substantially.

Design a Tile Grouping Algorithm to merge small tiles into larger ones.

(a) Before Tile Grouping

(b) After Tile Grouping
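The slides do not describe the tile grouping algorithm itself. As one plausible illustration of the idea, the sketch below greedily merges selected tiles into larger rectangles by growing each rectangle first to the right and then downward. The function name `group_tiles`, the greedy strategy, and the 3 × 6 grid are assumptions, not the authors' algorithm.

```python
def group_tiles(mask):
    """Greedily merge selected tiles into rectangles.

    mask: 2D list of booleans, True where a tile is part of the RoI.
    Returns a list of (row, col, n_rows, n_cols) rectangles covering every True cell once.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    used = [[False] * n_cols for _ in range(n_rows)]
    groups = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or used[r][c]:
                continue
            # Grow the rectangle to the right along the current row.
            width = 1
            while c + width < n_cols and mask[r][c + width] and not used[r][c + width]:
                width += 1
            # Grow downward while the whole row segment is selected and still free.
            height = 1
            while r + height < n_rows and all(
                mask[r + height][cc] and not used[r + height][cc]
                for cc in range(c, c + width)
            ):
                height += 1
            for rr in range(r, r + height):
                for cc in range(c, c + width):
                    used[rr][cc] = True
            groups.append((r, c, height, width))
    return groups

# Tiles {3, 4, 9, 10, 11, 15, 16} on an assumed 3 x 6 grid (1-indexed, row-major).
T, F = True, False
c1_mask = [
    [F, F, T, T, F, F],
    [F, F, T, T, T, F],
    [F, F, T, T, F, F],
]
c1_groups = group_tiles(c1_mask)
```

In this toy mask, seven separate tiles collapse into two rectangles, so the encoder handles two larger regions instead of seven small ones.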

29 of 29

CrossRoI: RoI-based Object Detection

Based on SBNet [1], we designed an RoI-Yolo object detector that processes only the RoI mask regions of video frames and thus boosts CNN inference speed.

[1] Ren, Mengye, et al. "SBNet: Sparse Blocks Network for Fast Inference." CVPR, 2018.