1 of 29

CrossRoI: Cross-camera Region of Interest Optimization for Efficient Real Time Video Analytics at Scale

Hongpeng Guo*, Shuochao Yao, Zhe Yang*, Qian Zhou*, Klara Nahrstedt*


2 of 29

Large-Scale Video Analytics

Traffic cameras deployed in New York City.

3 of 29

Video Analytics Pipeline

Vehicle detection.

Pedestrian tracking.

Resource-intensive := network-intensive + compute-intensive

4 of 29

Existing Solutions

  • Frame filtering:

    • Reducto [SIGCOMM’20], Focus [OSDI’18];
  • Resolution configuration:

    • DDS [SIGCOMM’20], Vigil [MobiCom’15], DeepDecision [INFOCOM’18], ...
  • Camera scheduling:

    • Spatula [SEC’20], Caesar [SenSys’19], ...

Each camera is optimized independently.

No comprehensive coverage

5 of 29

CrossRoI Application Scenario

[Figure: five closely located cameras (1 to 5) and their views at timestamp t1]

Focus on a fleet of closely located cameras.

Observation: Cross-camera content correlation & redundancy.

Queries that can be answered by detecting each object only once:

  • Vehicle detection;

  • Pedestrian localization;

  • Car counting.

6 of 29

CrossRoI Challenges & Solutions

Challenges

  1. How to establish cross-camera data associations?

  2. How to reduce data intensity using the cross-camera correlations?

  3. How to best translate the reduced video data into network and computation savings?

Solutions

  1. Re-identification (ReID) & statistical filters.

  2. RoI masks generated from an optimization framework.

  3. A specially designed video compressor & CNN kernel to optimize network & computation usage.

7 of 29

CrossRoI: Offline Phase Overview

CrossRoI Server: ReID & Filtering → Cross-camera Profiling → RoI Optimization → RoI Masks

9 of 29

CrossRoI: Online Phase Overview

CrossRoI Server: RoI-optimized CNN model → Query answers

10 of 29

CrossRoI: ReID & Results Filtering

Cross-camera data association is based on the result of object re-identification (ReID).

However, ReID results may not be accurate.

We categorize all detections in C1 as positive or negative based on whether a corresponding appearance exists in C2:

  • Positive: a corresponding appearance exists in C2.

  • Negative: no corresponding appearance exists in C2.

We study the pairwise relations between two cameras (e.g., C1 and C2) to learn where ReID algorithms make mistakes.

[Figure: example detections in cameras C1 and C2]

13 of 29

CrossRoI: Raw ReID Results Filtering

We further label each detection True or False according to whether the ReID result is correct:

  • True Positive: the same object is correctly matched across cameras.

  • True Negative: an object with no appearance in C2 is correctly left unmatched.

  • False Positive: different objects are falsely identified as the same object.

  • False Negative: the same object is not identified across cameras.

We study the pairwise relations between two cameras (e.g., C1 and C2) to learn where ReID algorithms make mistakes.

[Figure: example detections in cameras C1 and C2]

14 of 29

CrossRoI: Raw ReID Results Filtering

We profile raw ReID results on a 5-camera dataset and compare them with the ground truth.

Raw ReID results have significant errors.

Observations:

  • True Positive >> False Positive & True Negative >> False Negative.

  • Camera viewports have intrinsic physical correlations.

15 of 29

CrossRoI: Raw ReID Results Filtering

We use statistical filters to rectify the error-prone raw ReID results:

  • Remove false positives.

  • Remove false negatives.
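The slides do not specify which statistical filters are used. As a minimal sketch under our own assumption, a per-tile majority vote exploits both observations (TP >> FP and TN >> FN): within each tile of C1, the minority ReID label is treated as an error. The function name and input layout are hypothetical.

```python
from collections import Counter, defaultdict


def filter_reid_labels(detections):
    """Relabel each C1 detection with the majority ReID label of its tile.

    detections: list of (tile_id, is_positive) pairs from raw ReID, where
    is_positive is True if ReID matched the detection to an object in C2.
    Hypothetical stand-in filter, not the paper's exact method.
    """
    votes = defaultdict(Counter)
    for tile, positive in detections:
        votes[tile][positive] += 1
    filtered = []
    for tile, _ in detections:
        counts = votes[tile]
        # TP >> FP and TN >> FN, so a minority label in its tile is treated as
        # an error: positives flip to negative (removes FPs) and negatives flip
        # to positive (removes FNs). Ties resolve to positive.
        filtered.append((tile, counts[True] >= counts[False]))
    return filtered
```

For example, a tile with two matched detections and one unmatched one relabels the unmatched detection as positive.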

16 of 29

CrossRoI: Cross-camera data association

Divide each frame into tiles and associate each object with the tiles it occupies.

Profile over time and generate a lookup table.

Example at timestamp T = t1 (tiles numbered 1, 2, 3, ..., 24 in each frame):

  • C1: O1: {3, 4, 9, 10}; O2: {11}; O3: {9, 10, 15, 16}

  • C2: O3: {7, 8, 13, 14}; O4: {2, 8}; O5: {3, 9}
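The tile association above can be computed from each detection's bounding box. A minimal sketch, assuming a uniform grid of 6 columns × 4 rows (consistent with the 24 tiles and row-major numbering in the example) and boxes given as (x1, y1, x2, y2) in pixels; the function name and parameters are hypothetical:

```python
import math


def bbox_to_tiles(bbox, frame_w, frame_h, cols=6, rows=4):
    """Return the 1-indexed, row-major tile ids overlapped by bbox = (x1, y1, x2, y2)."""
    tile_w = frame_w / cols
    tile_h = frame_h / rows
    x1, y1, x2, y2 = bbox
    # First and last column/row touched; an edge lying exactly on a tile
    # boundary does not pull in the neighbouring tile.
    c0 = max(0, int(x1 // tile_w))
    c1 = min(cols - 1, math.ceil(x2 / tile_w) - 1)
    r0 = max(0, int(y1 // tile_h))
    r1 = min(rows - 1, math.ceil(y2 / tile_h) - 1)
    return {r * cols + c + 1 for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}
```

With a 600 × 400 frame, a box from (250, 50) to (350, 150) maps to tiles {3, 4, 9, 10}, matching O1 in C1.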

17 of 29

CrossRoI: Cross-camera data association

Profile over time and generate a lookup table:

Timestamp | Objects detected | Appearance regions
t1 | O1, O2, O3, O4, O5 | O1: C1-{3, 4, 9, 10}; O2: C1-{11}; O3: C1-{9, 10, 15, 16}, C2-{7, 8, 13, 14}; O4: C2-{2, 8}; O5: C2-{3, 9}
... | ... | ...
tn | ... | ...

The lookup table captures the cross-camera data associations.
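The lookup table maps naturally onto a nested mapping. A minimal sketch, assuming each per-camera detection already carries a cross-camera object ID from the filtered ReID step; the record layout and function name are hypothetical:

```python
from collections import defaultdict


def build_lookup_table(records):
    """Group (timestamp, object_id, camera_id, tiles) records into a lookup table.

    Returns {timestamp: {object_id: {camera_id: set_of_tiles}}}.
    """
    table = defaultdict(lambda: defaultdict(dict))
    for t, obj, cam, tiles in records:
        table[t][obj].setdefault(cam, set()).update(tiles)
    # Convert to plain dicts so later lookups of missing keys raise KeyError.
    return {t: dict(objs) for t, objs in table.items()}
```

Row t1 of the table then becomes, for example, `table["t1"]["O3"] == {"C1": {9, 10, 15, 16}, "C2": {7, 8, 13, 14}}`.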

18 of 29

CrossRoI: RoI Masks Generation

Select the minimum number of tiles across all cameras such that every object in the scene is observed by at least one camera at every timestamp.

This is a combinatorial optimization problem.
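One way to write this problem down, as our own formulation (the slides do not give one), is a 0-1 integer program over the lookup table. Here $x_{c,k}$ indicates that tile $k$ of camera $c$ is selected, $A_{o,t}$ is the set of appearances of object $o$ at timestamp $t$, each appearance $a = (c, T_a)$ pairs a camera with the tiles it occupies, and $y_a$ indicates that appearance $a$ is fully covered:

```latex
\begin{aligned}
\min_{x,\,y} \quad & \sum_{c}\sum_{k} x_{c,k} \\
\text{s.t.} \quad & \sum_{a \in A_{o,t}} y_a \ge 1 && \forall (o, t), \\
& y_a \le x_{c,k} && \forall a = (c, T_a),\ \forall k \in T_a, \\
& x_{c,k},\ y_a \in \{0, 1\}.
\end{aligned}
```

The second constraint allows $y_a = 1$ only when every tile of appearance $a$ is selected; the first requires at least one fully covered appearance per object and timestamp.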

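Example at timestamp T = t1 (objects O1 to O5 as in the association example), the selected tiles are:

  • C1: {3, 4, 9, 10, 11, 15, 16}

  • C2: {2, 3, 8, 9}

O3, visible in both cameras, is covered by C1 alone.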
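A simple greedy heuristic gives feasible masks quickly. The sketch below is our own illustration, not necessarily the paper's solver: it takes a lookup table of the form {timestamp: {object_id: {camera_id: set(tiles)}}}, first selects tiles for objects seen by only one camera, then, for each remaining object, picks the camera whose appearance adds the fewest new tiles. The function name is hypothetical.

```python
def greedy_roi_masks(lookup):
    """Heuristically choose per-camera RoI tiles from a lookup table.

    lookup: {timestamp: {object_id: {camera_id: set_of_tiles}}}.
    Returns {camera_id: set_of_selected_tiles} such that, at every timestamp,
    every object has all of its tiles selected in at least one camera.
    """
    selected = {}
    # One requirement per (timestamp, object): its appearances across cameras.
    requirements = [apps for objs in lookup.values() for apps in objs.values()]
    # Objects seen by a single camera leave no choice, so handle them first.
    requirements.sort(key=len)
    for apps in requirements:
        # Pick the camera whose appearance adds the fewest not-yet-selected tiles.
        best = min(apps, key=lambda cam: len(apps[cam] - selected.get(cam, set())))
        selected.setdefault(best, set()).update(apps[best])
    return selected
```

On the t1 example, the single-camera objects fix C1 = {3, 4, 9, 10, 11} and C2 = {2, 3, 8, 9}; O3 then costs 2 new tiles in C1 versus 3 in C2, so it is assigned to C1, reproducing the masks above. Greedy choices are not guaranteed to be optimal in general.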

19 of 29

CrossRoI: RoI Masks Generation

An example of RoI mask generation over 5 cameras using 1 min of synchronized video.

Format: camera_ID: (#selected_tiles / #overall_tiles)

  • C1: (439 / 1296)

  • C2: (814 / 1296)

  • C3: (303 / 1296)

  • C4: (496 / 1296)

  • C5: (448 / 768)

About 58% of the content is removed.
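The 58% figure follows from the per-camera counts above, using tiles as the unit of content:

```latex
\frac{\text{selected}}{\text{overall}}
  = \frac{439 + 814 + 303 + 496 + 448}{4 \times 1296 + 768}
  = \frac{2500}{5952} \approx 0.42,
\qquad 1 - 0.42 \approx 58\%\ \text{removed}.
```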

20 of 29

Evaluation

  • NVIDIA AI City Challenge Dataset.

Five synchronized videos of a traffic intersection, each 3 min long, with more than 30K vehicle bounding-box detections.

1 min of video is used for offline profiling and 2 min for online evaluation.

21 of 29

Evaluation

  • Unique Vehicle Detection

Task: detect each vehicle at least once across all five cameras at every timestamp.

Compared to the baseline, CrossRoI reduces network overhead by 42% and end-to-end latency by 25% while maintaining 99.9% detection accuracy.
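The slides do not define exactly how detection accuracy is computed for this task. One plausible reading, shown as a hypothetical sketch, is the fraction of ground-truth (timestamp, vehicle) pairs detected by at least one camera; the function name and data layout are our assumptions:

```python
def unique_detection_accuracy(ground_truth, per_camera):
    """Fraction of ground-truth (timestamp, vehicle) pairs seen by any camera.

    ground_truth: {timestamp: set_of_vehicle_ids}
    per_camera:   {camera_id: {timestamp: set_of_detected_vehicle_ids}}
    """
    total = hits = 0
    for t, vehicles in ground_truth.items():
        # Union of detections from all cameras at this timestamp.
        seen = set().union(*(cam.get(t, set()) for cam in per_camera.values()))
        total += len(vehicles)
        hits += len(vehicles & seen)
    return hits / total if total else 1.0
```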

22 of 29

Evaluation

  • Compared to Reducto [SIGCOMM’20]

23 of 29

Q&A

hg5@illinois.edu

24 of 29

Backup Slides

26 of 29

CrossRoI: System Overview

[Figure legend: in-device data flow; offline video data flow; RoI-mask data flow; online video streams]

CrossRoI operates in two phases:

  • An offline phase that profiles video clips to (a) learn cross-camera data associations and (b) generate optimized RoI masks.

  • An online phase that applies the RoI masks to reduce network usage for video streaming and the computation workload of CNN inference.
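In the online phase, each camera only needs to send the content inside its RoI mask. A minimal NumPy sketch of the per-frame step, assuming a uniform cols × rows grid with 1-indexed, row-major tile ids (the function name and grid convention are our assumptions; encoding and streaming of the tiles are omitted):

```python
import numpy as np


def extract_roi_tiles(frame, selected_tiles, cols, rows):
    """Return {tile_id: pixel array} for the selected tiles of one frame."""
    h, w = frame.shape[:2]
    crops = {}
    for tile in sorted(selected_tiles):
        r, c = divmod(tile - 1, cols)
        # Integer tile boundaries; the last row/column absorbs any remainder.
        y0, y1 = r * h // rows, (r + 1) * h // rows
        x0, x1 = c * w // cols, (c + 1) * w // cols
        crops[tile] = frame[y0:y1, x0:x1]
    return crops
```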

27 of 29

CrossRoI: Video Compression

Tile-based video streaming hurts video compression efficacy substantially.

  • Splitting frames into tiles reduces macroblock-level matching in video codecs, e.g., H.264/H.265.

Video compression efficacy characterization: profile the total video size when the video is split into m × n tiles.
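One way such a characterization could be scripted, assuming ffmpeg with libx264 is available and reading m as columns and n as rows (both our assumptions), is to build one crop command per tile; running each command (e.g., with `subprocess.run`) and summing the output file sizes gives the total size for that m × n split. The function name is hypothetical.

```python
def tile_crop_commands(src, width, height, m, n):
    """Build ffmpeg argument lists that split `src` into m x n tiles.

    m: number of tile columns, n: number of tile rows (assumed convention).
    """
    tw, th = width // m, height // n
    cmds = []
    for row in range(n):
        for col in range(m):
            out = f"tile_{row}_{col}.mp4"
            # crop=out_w:out_h:x:y selects one tile; each tile is re-encoded separately.
            cmds.append(["ffmpeg", "-i", src, "-filter:v",
                          f"crop={tw}:{th}:{col * tw}:{row * th}",
                          "-c:v", "libx264", out])
    return cmds
```

libx264 with the default yuv420p output expects even tile dimensions, so m and n should be chosen accordingly.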

28 of 29

CrossRoI: Video Compression

Tile-based video streaming hurts video compression efficacy substantially.

We design a Tile Grouping Algorithm that merges small tiles into larger ones.

[Figure: (a) before tile grouping; (b) after tile grouping]
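The slides do not describe the Tile Grouping Algorithm itself. As one plausible illustration (a greedy rectangle merge of our own, not necessarily the authors' algorithm), selected grid cells can be merged in row-major order into axis-aligned rectangles, so fewer, larger tiles are encoded:

```python
def group_tiles(mask):
    """Greedily merge selected grid cells into axis-aligned rectangles.

    mask: list of rows of booleans, True = selected tile.
    Returns (row, col, n_rows, n_cols) rectangles covering exactly the selected cells.
    """
    rows, cols = len(mask), len(mask[0])
    used = [[False] * cols for _ in range(rows)]
    rects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or used[r][c]:
                continue
            # Extend right while cells are selected and not yet grouped.
            w = 1
            while c + w < cols and mask[r][c + w] and not used[r][c + w]:
                w += 1
            # Extend down while the whole column span of the next row qualifies.
            h = 1
            while r + h < rows and all(mask[r + h][c + k] and not used[r + h][c + k] for k in range(w)):
                h += 1
            for dr in range(h):
                for dc in range(w):
                    used[r + dr][c + dc] = True
            rects.append((r, c, h, w))
    return rects
```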

29 of 29

CrossRoI: RoI-based Object Detection

Based on SBNet [1], we designed RoI-Yolo, an object detector that processes only the RoI-mask regions of video frames and thus boosts CNN inference speed.

[1] Ren, Mengye, et al. "SBNet: Sparse Blocks Network for Fast Inference." CVPR, 2018.
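SBNet's core idea is to gather the active blocks of a feature map into a dense batch, run the convolution on that batch, and scatter the results back. The NumPy sketch below illustrates only that gather/scatter pattern. The function name, the block-size parameter, and the shape-preserving `fn` standing in for a convolution are our assumptions. Unlike SBNet, it uses non-overlapping blocks without the padding a real convolution's receptive field would need.

```python
import numpy as np


def sparse_block_apply(x, block_mask, block, fn):
    """Apply fn only to active blocks of feature map x with shape (H, W, C).

    block_mask: bool array (H // block, W // block); True = block inside the RoI.
    fn: shape-preserving operation on a batch (N, block, block, C).
    Inactive blocks are left as zeros in the output.
    """
    out = np.zeros_like(x)
    idx = np.argwhere(block_mask)
    if len(idx) == 0:
        return out
    # Gather: stack the active blocks into one dense batch.
    batch = np.stack([x[i * block:(i + 1) * block, j * block:(j + 1) * block] for i, j in idx])
    batch = fn(batch)
    # Scatter: write each processed block back to its original position.
    for (i, j), b in zip(idx, batch):
        out[i * block:(i + 1) * block, j * block:(j + 1) * block] = b
    return out
```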