CrossRoI: Cross-camera Region of Interest Optimization for Efficient Real Time Video Analytics at Scale
Hongpeng Guo*, Shuochao Yao†, Zhe Yang*, Qian Zhou*, Klara Nahrstedt*
Large Scale Video Analytics
Traffic cameras deployed in New York City.
Video Analytics Pipeline
Vehicle detection.
Pedestrian tracking.
Resource-intensive := network-exhaustive + compute-intensive
Existing Solutions
Independent cameras solution
No comprehensive coverage
CrossRoI Application Scenario
Focus on a fleet of closely located cameras.
Observation: Cross-camera content correlation & redundancy.
Queries can be answered as long as each object is detected at least once across the cameras.
CrossRoI Challenges & Solutions
Challenges
How to establish cross-camera data associations?
How to reduce data-intensity with the cross-camera correlations?
How to best translate the reduced video data into network & computation resource savings?
Solutions
Re-identification & statistical filters.
RoI-masks generated from an optimization framework.
Specially designed video compressor & CNN kernel to optimize network & computation usage.
CrossRoI: Offline Phase Overview
CrossRoI Server
ReID & Filtering
Cross-camera Profiling
RoI Optimization
RoI Masks
CrossRoI: Online Phase Overview
CrossRoI Server
RoI optimized CNN Model
Query Answers
CrossRoI: ReID & Results Filtering
Cross-camera data association is based on the result of object re-identification (ReID).
However, ReID results may not be accurate.
Categorize all detections in C1 as positive or negative based on whether there is a corresponding appearance in C2.
We study the pairwise relations between two cameras (e.g., C1 and C2) to learn where ReID algorithms make mistakes.
C1
C2
CrossRoI: Raw ReID Results Filtering
Further categorize the detections with True/False labels based on whether the ReID result is correct.
We study the pairwise relations between two cameras (e.g., C1 and C2) to learn where ReID algorithms make mistakes.
C1
C2
CrossRoI: Raw ReID Results Filtering
Profile on a 5-camera dataset and compare with ground truth.
Observation: raw ReID results have significant errors.
CrossRoI: Raw ReID Results Filtering
Use statistical filters to rectify error-prone raw ReID results.
Remove false positives
Remove false negatives
CrossRoI: Cross-camera data association
Divide each frame into tiles and associate objects with tiles.
Profile over time and generate a lookup table.
At timestamp T = t1 (each frame is divided into tiles numbered 1 to 24):
C1: O1: {3, 4, 9, 10}; O2: {11}; O3: {9, 10, 15, 16}
C2: O3: {7, 8, 13, 14}; O4: {2, 8}; O5: {3, 9}
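The object-to-tile association above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `bbox_to_tiles`, the pixel bounding-box format, the 600 x 400 frame size, and the 6 x 4 row-major grid (chosen only because it gives 24 tiles and is consistent with the tile sets shown) are all assumptions.

```python
def bbox_to_tiles(bbox, frame_w, frame_h, cols=6, rows=4):
    """Return the 1-based, row-major tile indices overlapped by a bounding box.

    bbox is (x_min, y_min, x_max, y_max) in pixels. The 6 x 4 grid is an
    assumption chosen to give a 24-tile frame.
    """
    x_min, y_min, x_max, y_max = bbox
    tile_w = frame_w / cols
    tile_h = frame_h / rows
    # Convert pixel coordinates to grid coordinates, clamped to the frame.
    first_col = max(0, int(x_min // tile_w))
    first_row = max(0, int(y_min // tile_h))
    # Subtract a tiny epsilon so a box ending exactly on a tile border
    # does not spill into the next tile.
    last_col = min(cols - 1, int((x_max - 1e-9) // tile_w))
    last_row = min(rows - 1, int((y_max - 1e-9) // tile_h))
    return {
        r * cols + c + 1
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    }


# Hypothetical box in a 600 x 400 frame (100 x 100 pixel tiles).
tiles_o1 = bbox_to_tiles((250, 50, 350, 150), frame_w=600, frame_h=400)
```

With these assumed dimensions, the hypothetical box spans columns 3 to 4 of the first two rows, which is the same tile set listed for O1 in C1.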
CrossRoI: Cross-camera data association
Profile over time and generate a lookup table.
Timestamp | Objects detected | Appearance Regions |
t1 | O1, O2, O3, O4, O5. | O1: C1-{3, 4, 9, 10}; O2: C1-{11}; O3: C1-{9, 10, 15, 16}, C2-{7, 8, 13, 14}; O4: C2-{2, 8}; O5: C2-{3, 9}. |
... | ... | ... |
tn | ... | ... |
The lookup table aggregates the cross-camera data associations.
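One possible in-memory layout for such a lookup table is a nested mapping from timestamp to object to camera to tile set. The sketch below is an assumption about the data structure, not the authors' format; `build_lookup_table` and the detection tuple layout are hypothetical names introduced for illustration. The data is the t1 row of the table.

```python
from collections import defaultdict


def build_lookup_table(detections):
    """Group detections into {timestamp: {object_id: {camera_id: tiles}}}.

    Each detection is a hypothetical (timestamp, camera_id, object_id, tiles)
    tuple, where object_id is the cross-camera identity after ReID filtering.
    """
    table = defaultdict(lambda: defaultdict(dict))
    for timestamp, camera_id, object_id, tiles in detections:
        table[timestamp][object_id][camera_id] = frozenset(tiles)
    return table


# The t1 row of the lookup table.
detections_t1 = [
    ("t1", "C1", "O1", {3, 4, 9, 10}),
    ("t1", "C1", "O2", {11}),
    ("t1", "C1", "O3", {9, 10, 15, 16}),
    ("t1", "C2", "O3", {7, 8, 13, 14}),
    ("t1", "C2", "O4", {2, 8}),
    ("t1", "C2", "O5", {3, 9}),
]
lookup = build_lookup_table(detections_t1)
```

Here only O3 has two entries, one per camera, which is exactly the redundancy the RoI optimization can exploit.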
CrossRoI: RoI Masks Generation
Select the minimum number of tiles across cameras such that every object in the field is observed by at least one camera at every timestamp.
This is a combinatorial optimization problem.
At timestamp T = t1 (tiles numbered 1 to 24), the selected tiles are:
C1: {3, 4, 9, 10, 11, 15, 16}
C2: {2, 3, 8, 9}
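The slides do not say how the optimization is solved, so the following is only a toy sketch of the problem statement: for every object appearance, choose one camera whose full appearance region is kept, and minimize the total number of kept tiles. It uses exhaustive search, which is exponential and only workable for tiny inputs; `select_tiles` and the nested `lookup` layout are assumptions made for illustration.

```python
from itertools import product


def select_tiles(lookup):
    """Exhaustively pick one covering camera per (timestamp, object).

    lookup maps timestamp -> object -> camera -> set of tiles. Returns the
    camera -> tiles mask with the fewest tiles overall. Exponential in the
    number of object appearances, so only suitable for toy inputs.
    """
    # One entry per object appearance: its list of (camera, tiles) options.
    options = [
        list(regions.items())
        for objects in lookup.values()
        for regions in objects.values()
    ]
    best_mask, best_size = None, float("inf")
    for choice in product(*options):
        mask = {}
        for camera, tiles in choice:
            mask.setdefault(camera, set()).update(tiles)
        size = sum(len(tiles) for tiles in mask.values())
        if size < best_size:
            best_mask, best_size = mask, size
    return best_mask


# Appearance regions at timestamp t1, taken from the example.
lookup_t1 = {
    "t1": {
        "O1": {"C1": {3, 4, 9, 10}},
        "O2": {"C1": {11}},
        "O3": {"C1": {9, 10, 15, 16}, "C2": {7, 8, 13, 14}},
        "O4": {"C2": {2, 8}},
        "O5": {"C2": {3, 9}},
    }
}
mask_t1 = select_tiles(lookup_t1)
```

On this input the only real choice is where to cover O3. Covering it from C1 adds two new tiles (15, 16), while covering it from C2 adds three (7, 13, 14), so the search keeps O3 in C1, which reproduces the C1 and C2 tile sets shown for t1.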
An RoI mask generation example over 5 cameras, using 1 min of synchronized video.
Format: camera_ID: (#selected_tiles / #overall_tiles)
C1: (439 / 1296)
C2: (814 / 1296)
C3: (303 / 1296)
C4: (496 / 1296)
C5: (448 / 768)
CrossRoI: RoI Masks Generation
About 58% of the content is removed.
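The 58% figure can be checked from the per-camera tile counts listed for this example. The short calculation below assumes that "content" is measured by counting tiles, with every tile weighted equally across cameras.

```python
# Per-camera tile counts from the RoI mask generation example.
selected = {"C1": 439, "C2": 814, "C3": 303, "C4": 496, "C5": 448}
overall = {"C1": 1296, "C2": 1296, "C3": 1296, "C4": 1296, "C5": 768}

kept_tiles = sum(selected.values())    # 2500
total_tiles = sum(overall.values())    # 5952
removed_fraction = 1 - kept_tiles / total_tiles
print(f"{removed_fraction:.1%} of tiles removed")
```

That is, 2500 of 5952 tiles are kept (about 42%), so about 58% are removed.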
Evaluation
5 synchronized videos of a traffic crossing, 3 min in length, with more than 30K vehicle bounding-box detections.
1 min of video for offline profiling, 2 min of video for online evaluation.
Evaluation
Detect each vehicle at least once across all five cameras at every timestamp.
Compared to the baseline, CrossRoI reduces network overhead by 42% and end-to-end latency by 25% while maintaining 99.9% detection accuracy.
Q&A
hg5@illinois.edu
Backup Slides
CrossRoI: System Overview
In-device data flow
Offline video data flow
RoI masks data flow
Online video streams
CrossRoI operates in two phases: an offline phase and an online phase.
CrossRoI: Video Compression
Tile-based video streaming hurts video compression efficacy substantially.
Video compression efficacy characterization. Profile video sizes when the video is split into m * n tiles.
Design a Tile Grouping Algorithm to merge small tiles into larger ones.
(a) Before Tile Grouping
(b) After Tile Grouping
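The slides do not describe the Tile Grouping Algorithm itself. As a rough illustration of what merging small tiles into larger ones can look like, here is a simple greedy sketch that first joins horizontally adjacent selected tiles into runs and then stacks runs with identical column spans in consecutive rows. The function `group_tiles` and the (row, col) tile representation are assumptions, and this is not necessarily the authors' algorithm.

```python
def group_tiles(selected, rows, cols):
    """Greedily merge selected (row, col) tiles into rectangles.

    Returns a sorted list of (top, left, bottom, right) rectangles with
    inclusive, 0-based tile coordinates.
    """
    # Step 1: split each row into maximal horizontal runs of selected tiles.
    runs = []  # (row, left, right)
    for r in range(rows):
        c = 0
        while c < cols:
            if (r, c) in selected:
                left = c
                while c + 1 < cols and (r, c + 1) in selected:
                    c += 1
                runs.append((r, left, c))
            c += 1
    # Step 2: stack runs with identical column spans in consecutive rows.
    rects = []
    open_rects = {}  # (left, right) -> [top, bottom]
    for r, left, right in runs:
        key = (left, right)
        if key in open_rects and open_rects[key][1] == r - 1:
            open_rects[key][1] = r  # extend the rectangle downward
        else:
            if key in open_rects:
                top, bottom = open_rects[key]
                rects.append((top, left, bottom, right))
            open_rects[key] = [r, r]
    for (left, right), (top, bottom) in open_rects.items():
        rects.append((top, left, bottom, right))
    return sorted(rects)


# Hypothetical mask: a 2 x 2 block of tiles plus one isolated tile.
selected = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 3)}
groups = group_tiles(selected, rows=4, cols=6)
```

In this hypothetical mask, five tiles collapse into two rectangles, so the encoder would handle two larger regions instead of five small ones.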
CrossRoI: RoI-based Object Detection
Based on SBNet [1], we designed RoI-Yolo, an object detector that processes only the RoI-mask regions of each video frame and thus boosts CNN inference speed.
[1] Ren, Mengye, et al. "SBNet: Sparse Blocks Network for Fast Inference." CVPR, 2018.
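The idea of restricting inference to RoI regions can be illustrated by the block-selection step alone: split the frame into fixed-size blocks and keep only those that overlap the RoI mask, so that later computation can skip the rest. The sketch below is plain Python for illustration; `active_blocks` is a hypothetical helper and does not use or reproduce the SBNet API.

```python
def active_blocks(mask, block_h, block_w):
    """Return (block_row, block_col) indices of blocks that overlap the RoI.

    mask is a 2D list of 0/1 values, where 1 marks a pixel inside the RoI.
    """
    height, width = len(mask), len(mask[0])
    blocks = []
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            rows = mask[top:top + block_h]
            # A block is active if any pixel inside it belongs to the RoI.
            if any(any(row[left:left + block_w]) for row in rows):
                blocks.append((top // block_h, left // block_w))
    return blocks


# Hypothetical 4 x 4 mask with RoI pixels in two opposite corners.
roi_mask = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
blocks = active_blocks(roi_mask, block_h=2, block_w=2)
```

Only two of the four 2 x 2 blocks are active here, so a detector that computes on active blocks alone would skip half of the frame.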