Mask-Net: A Hardware-efficient Object Detection Network with Masked Region Proposals
Hanqiu Chen*, Cong (Callie) Hao
Georgia Institute of Technology
* Work done during internship at Georgia Tech
Overview
Background & Motivation
Challenges for object detection on embedded systems with DNNs
Deep Neural Networks
Implementation on embedded systems
Challenges
Redundant computation: a large part of an image is background, and there is no need to spend computation on these regions.
Sample image from the DAC-SDC [1] dataset
Distribution of relative bounding box size in three different datasets
[1] Xiaowei Xu, Xinyi Zhang, Bei Yu, X. Sharon Hu, Christopher Rowen, Jingtong Hu, and Yiyu Shi. DAC-SDC low power object detection challenge for UAV applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
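To make the motivation concrete: the slides do not define "relative size", so the sketch below assumes it is the ratio of box area to image area. Under that assumption, the share of a single-object frame that is background follows directly. The frame size and box coordinates are hypothetical.

```python
def relative_size(box, image_w, image_h):
    """Box area divided by image area; box = (x1, y1, x2, y2) in pixels.

    Assumption: "relative size" is read as an area ratio.
    """
    x1, y1, x2, y2 = box
    box_area = max(0, x2 - x1) * max(0, y2 - y1)
    return box_area / (image_w * image_h)


def background_fraction(box, image_w, image_h):
    """Share of a single-object image that lies outside the bounding box."""
    return 1.0 - relative_size(box, image_w, image_h)


box = (300, 160, 364, 196)  # hypothetical 64 x 36 object in a 640 x 360 frame
print(f"{relative_size(box, 640, 360):.2%}")        # 1.00%
print(f"{background_fraction(box, 640, 360):.2%}")  # 99.00%
```

Even under this rough definition, a small object leaves almost the whole frame as background, which is the computation the masked region proposals aim to skip.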
Related Work: Region Proposal
Faster R-CNN: computationally expensive, as it needs deep convolutional layers to extract enough features
Mask R-CNN: non-rectangular regions, which are not beneficial for hardware acceleration
The Mask R-CNN [2] framework for instance segmentation and its extension
Faster R-CNN [1] is a single, unified network for object detection. The region proposal network (RPN) module serves as the ‘attention’ of this unified network.
[1] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28:91–99, 2015.
[2] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, 2017.
Related Work: Cascade
Four common cascade networks in object detection
Image credit: Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
The architecture of Mask-Net
Shared by the new branch and the backbone to extract preliminary features
Generate a mask marking the proposed regions
Compute only the proposed regions to generate the bounding box
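The saving from computing only the proposed regions can be illustrated with a toy single-channel convolution. This is a minimal sketch, not the Mask-Net kernel: `masked_conv2d` is a hypothetical helper, it assumes a 3x3 kernel with zero padding, and it counts k*k multiply-accumulates (MACs) for every output it computes, as a dense accelerator would also process padded taps.

```python
def masked_conv2d(feature, kernel, mask):
    """Single-channel 'same' convolution (cross-correlation, zero padding),
    evaluated only where mask[i][j] == 1.

    Returns (output, macs); masked-out outputs stay 0.0 and cost nothing.
    """
    h, w = len(feature), len(feature[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    macs = 0
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue  # background position: skip all work
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside
                        acc += feature[y][x] * kernel[di][dj]
            out[i][j] = acc
            macs += k * k  # assume every tap is issued, padded or not
    return out, macs


ones5 = [[1.0] * 5 for _ in range(5)]
ones3 = [[1.0] * 3 for _ in range(3)]
region = [[1 if 1 <= i <= 3 and 1 <= j <= 3 else 0 for j in range(5)] for i in range(5)]
full = [[1] * 5 for _ in range(5)]

_, masked_macs = masked_conv2d(ones5, ones3, region)
_, full_macs = masked_conv2d(ones5, ones3, full)
print(masked_macs, full_macs)  # 81 225
```

With a 5x5 input and a 3x3 proposed region, only 9 of the 25 outputs are computed, so the MAC count drops from 225 to 81.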
A case study: Mask-SkyNet
SkyNet[1] is a hardware-efficient object detection and tracking backbone.
We choose SkyNet as the base model for designing an FPGA accelerator.
Mask-SkyNet architecture
[1] Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, et al. SkyNet: A hardware-efficient method for object detection and tracking on embedded systems. Proceedings of Machine Learning and Systems, 2:216–229, 2020.
Promising features of Mask-Net
Algorithm Innovations
Mask generation process
The gate function
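The slides name a gate function but do not give its form. A minimal sketch, assuming the new branch emits one real-valued score per region and the gate is a hard threshold on the sigmoid of that score; `gate`, `generate_mask`, the score grid, and the 0.5 threshold are all hypothetical.

```python
import math


def gate(score, threshold=0.5):
    """Hypothetical gate: 1 if sigmoid(score) > threshold, else 0.

    sigmoid(s) > t is equivalent to s > log(t / (1 - t)), so the comparison is
    done in logit space to avoid overflow in math.exp for large |s|.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return 1 if score > math.log(threshold / (1.0 - threshold)) else 0


def generate_mask(score_map, threshold=0.5):
    """Apply the gate element-wise to a 2-D grid of region scores."""
    return [[gate(s, threshold) for s in row] for row in score_map]


scores = [
    [-3.0, -2.0, -1.5],
    [-0.5, 2.0, 1.0],
    [-2.5, 0.3, -4.0],
]
print(generate_mask(scores))  # [[0, 0, 0], [0, 1, 1], [0, 1, 0]]
```

A hard step like this has zero gradient almost everywhere, so training a branch that ends in such a gate needs a differentiable surrogate; which one Mask-Net uses is not stated on the slide.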
All-pass mechanism
Two-stage training process
Stage 1: train the new branch
Stage 2: apply the mask to the backbone and fine-tune the backbone
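The training stages can be written down as a small schedule. This is a hypothetical sketch that reads the slide labels as stage 1 = train the new branch, stage 2 = apply the mask and fine-tune the backbone; the module names (`shared_layers`, `new_branch`, `backbone`) and the choice to freeze every module other than the one being trained are assumptions.

```python
from dataclasses import dataclass

MODULES = ("shared_layers", "new_branch", "backbone")  # hypothetical names


@dataclass(frozen=True)
class TrainingStage:
    name: str
    trainable: frozenset  # modules whose weights are updated in this stage
    apply_mask: bool      # whether the backbone computes only masked regions


SCHEDULE = (
    TrainingStage("stage 1: train the new branch", frozenset({"new_branch"}), apply_mask=False),
    TrainingStage("stage 2: fine-tune the backbone", frozenset({"backbone"}), apply_mask=True),
)


def frozen_modules(stage):
    """Modules whose weights stay fixed during the given stage."""
    return [m for m in MODULES if m not in stage.trainable]


for stage in SCHEDULE:
    print(f"{stage.name}: frozen={frozen_modules(stage)}, mask applied={stage.apply_mask}")
```

In a framework such as PyTorch, the frozen list would map to setting `requires_grad = False` on those modules' parameters for the duration of the stage.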
Hardware Innovations
Non-rectangular Shape Mask:
0 0 0 0 0
0 0 0 1 0
0 1 1 1 0
0 0 1 0 0
0 0 0 0 0
Rectangular Shape Mask:
0 0 0 0 0
0 1 1 1 0
0 1 1 1 0
0 1 1 1 0
0 0 0 0 0
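The two 5x5 masks are consistent with the rectangular mask being the smallest axis-aligned rectangle that covers every 1 in the non-rectangular mask; a rectangle presumably lets the accelerator walk the region with regular loop bounds. A minimal sketch of that conversion, assuming this is how the rectangular mask is obtained (the slide shows only the before and after masks); `rectangularize` is a hypothetical name.

```python
def rectangularize(mask):
    """Replace a binary mask by the smallest axis-aligned rectangle covering all its 1s."""
    height, width = len(mask), len(mask[0])
    rows = [i for i in range(height) if any(mask[i])]
    cols = [j for j in range(width) if any(row[j] for row in mask)]
    out = [[0] * width for _ in range(height)]
    if not rows:
        return out  # empty mask: no region proposed
    for i in range(rows[0], rows[-1] + 1):
        for j in range(cols[0], cols[-1] + 1):
            out[i][j] = 1
    return out


non_rect = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in rectangularize(non_rect):
    print(*row)
```

Applied to the non-rectangular mask, it reproduces the rectangular mask: rows 1 to 3 and columns 1 to 3 set to 1.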
Channel shuffle in Mask-Net
Channel shuffle in ShuffleNet[1]
[1] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6848–6856, 2018.
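For reference, ShuffleNet's channel shuffle reshapes C channels into (groups, C / groups), transposes to (C / groups, groups), and flattens back, so each group's outputs are interleaved with the others'. The sketch below applies that permutation to a list of per-channel items; how Mask-Net adapts it is not detailed on the slide, and `channel_shuffle` is a hypothetical helper.

```python
def channel_shuffle(channels, groups):
    """ShuffleNet-style channel shuffle on a list of per-channel items.

    Equivalent to reshape (groups, C // groups) -> transpose -> flatten.
    """
    c = len(channels)
    if c % groups:
        raise ValueError("channel count must be divisible by groups")
    per_group = c // groups
    # Output position k * groups + g takes channel k of group g.
    return [channels[g * per_group + k] for k in range(per_group) for g in range(groups)]


print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
```

On a real feature map the same index permutation is applied along the channel dimension; it only reorders data, so in hardware it can be absorbed into how channels are addressed.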
Experiment Results
Mask quality analysis
IoU loss comparison
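The slide compares IoU losses without defining them. As a reference point, a sketch of box intersection over union (IoU) and the common IoU loss 1 - IoU; whether Mask-Net uses exactly this form is an assumption. Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """IoU loss in its common 1 - IoU form (assumed, not taken from the slides)."""
    return 1.0 - iou(pred, target)


print(iou((0, 0, 4, 4), (2, 2, 6, 6)))       # 4 / 28, about 0.143
print(iou_loss((0, 0, 4, 4), (0, 0, 4, 4)))  # 0.0
```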
C/RTL co-simulation results from Vitis
Software evaluation results (Table I)
Hardware evaluation results (Table II)
Resource utilization report (Table III)
73 of the 87 added DSPs come from the new branch
Design Space Exploration
Design Space Exploration Results
DSP exploration space when the extra DSP count is 1309
Relationship between the number of extra DSPs, inference time, and theoretical speedup
DSP distribution across the three parts for different total numbers of extra DSPs
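The slides do not give the latency model behind the exploration, so the sketch below assumes one: the three parts run sequentially and each part's time is its workload divided by its DSP count. Under that assumption, an exhaustive search over every split of the extra DSPs finds the fastest allocation and its theoretical speedup over the baseline. The workloads, baseline DSP counts, and helper names are hypothetical.

```python
def latency(work, dsps):
    """Hypothetical model: the parts run sequentially and each part takes
    workload / DSP count time units."""
    return sum(w / d for w, d in zip(work, dsps))


def explore(work, base, extra, step=1):
    """Exhaustively split `extra` DSPs across three parts.

    `base` holds each part's DSPs before the extra ones are added and must be
    positive. Returns (best_split, best_latency, speedup_over_base).
    """
    best_split, best_time = None, float("inf")
    for e0 in range(0, extra + 1, step):
        for e1 in range(0, extra - e0 + 1, step):
            split = (e0, e1, extra - e0 - e1)  # third part takes the remainder
            t = latency(work, [b + e for b, e in zip(base, split)])
            if t < best_time:
                best_split, best_time = split, t
    return best_split, best_time, latency(work, base) / best_time


split, t, speedup = explore(work=[100, 400, 900], base=[5, 10, 15], extra=30)
print(split, t, speedup)  # (5, 10, 15) 60.0 2.0
```

Under this model the best allocation gives each part a total DSP count proportional to the square root of its workload (10, 20, 30 for workloads 100, 400, 900). With the slides' budget of 1309 extra DSPs, the double loop visits under a million splits, which pure Python finishes in a few seconds; `step` coarsens the search if needed.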
Future Research Directions
Thank you!