1 of 14

Cone Detection by Faster R-CNN

2022.7.30

2 of 14

Introduction

  • Object Detection Background

Object detection is a field of computer vision that takes still images or video frames as input and passes them through various algorithms to identify, localize, and classify objects such as people and cars.

  • Common models for object detection

R-CNN, Fast R-CNN, Faster R-CNN, YOLO, etc.

  • Project Proposal

Train a Faster R-CNN model on a dataset of traffic cones to locate and classify cones within an image.

3 of 14

Cone Dataset

GitHub dataset: 123 annotated images

Data Augmentation: Albumentations API

Annotation files:

(1) First value: class label

(2) Second value: x-coordinate of the cone center

(3) Third value: y-coordinate of the cone center

(4) Fourth value: width of the cone

(5) Fifth value: height of the cone

80/20 split between training and validation sets.
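The five-value annotation format above matches the YOLO convention: normalized center coordinates and sizes, one object per line. A minimal parser, assuming values normalized to [0, 1] (the example line and image size are illustrative, not from the dataset):

```python
def parse_yolo_line(line, img_w, img_h):
    """Convert one 'label cx cy w h' annotation line (normalized
    YOLO format) into a class label and pixel corner coordinates."""
    label, cx, cy, w, h = line.split()
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    x_min, y_min = cx - w / 2, cy - h / 2
    return int(label), (x_min, y_min, x_min + w, y_min + h)

# Example: a cone centered in a 640x480 image, half its width and height.
label, box = parse_yolo_line("0 0.5 0.5 0.5 0.5", 640, 480)
# box == (160.0, 120.0, 480.0, 360.0)
```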

Augmentations applied:

Horizontal flip

Median Blur

Crop

Contrast
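In the project these augmentations are applied through the Albumentations API; as an illustration of what one of them does to the labels, here is a hand-rolled horizontal flip for YOLO-format boxes. Because the coordinates are normalized, only the x-center changes (the function name and toy image are this sketch's own, not Albumentations calls):

```python
def hflip_yolo(image_rows, boxes):
    """Horizontally flip an image (given as a list of pixel rows) and
    its YOLO-format boxes (label, cx, cy, w, h), all normalized."""
    flipped = [list(reversed(row)) for row in image_rows]
    # Mirroring maps a normalized x-center cx to 1 - cx; sizes are unchanged.
    new_boxes = [(label, 1.0 - cx, cy, w, h) for (label, cx, cy, w, h) in boxes]
    return flipped, new_boxes

img = [[1, 2, 3], [4, 5, 6]]
flipped, boxes = hflip_yolo(img, [(0, 0.25, 0.5, 0.2, 0.4)])
# flipped == [[3, 2, 1], [6, 5, 4]]; the box center moves from 0.25 to 0.75
```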

4 of 14

Four steps to implement Faster R-CNN:

- CNN layers: Feed the image into a pretrained VGG16 network to obtain the corresponding feature map.

- Region Proposal Network: Use the RPN to generate region proposals. It performs two tasks: classification and bounding box regression.

- ROI Pooling: Combine the feature map and region proposals to generate proposal feature maps.

- Classification: Use the proposal feature maps to classify each proposal.

RPN + Fast R-CNN = Faster R-CNN

5 of 14

Region Proposal Network

(1) Generate anchors for each pixel after the convolution layers.

(2) Use Softmax to classify each anchor as positive (likely contains an object) or negative (background), based on texture, shape, color, and similar features.

(3) Bounding Box Regression finds the translating and scaling parameters to better fit the anchor to the Ground Truth (actual boundary box for the object).

(4) Proposal Layer takes in the positive anchors along with the regression parameters to output the precise dimension of the proposed region.

Anchors: a set of 9 boundary boxes with fixed scales and aspect ratios
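The 9-anchor set per position can be sketched as follows, assuming the Faster R-CNN paper's three scales and three aspect ratios (the specific scale values are the paper's defaults, not necessarily this project's):

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return 9 (x_min, y_min, x_max, y_max) anchors centered at (cx, cy).
    For each scale s and ratio r, the box has area s*s and h/w == r."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)   # width shrinks as the ratio grows
            h = s * math.sqrt(r)   # so that w * h == s * s and h / w == r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(0, 0)
# 9 anchors; each keeps its scale's area, e.g. the first three all cover 128*128.
```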

6 of 14

Fast R-CNN

General steps:

(1) Use selective search to generate 1K~2K region proposals.

(2) Put the image into pretrained VGG16-Net to get the corresponding feature map. The generated region proposals are projected on the feature map to get feature vectors.

(3) Use ROI pooling layer to reshape them into a fixed size. From the ROI feature vector, we use softmax layer to predict the class of the proposed region and the offset values for the bounding box.
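The ROI pooling in step (3) can be sketched as a max-pool over a fixed output grid; a minimal NumPy version for a single-channel feature map (real implementations handle batches, channels, and spatial quantization more carefully):

```python
import numpy as np

def roi_pool(feature, roi, out_size=2):
    """Max-pool the region roi = (x0, y0, x1, y1) of a 2-D feature map
    into a fixed out_size x out_size grid, whatever the region's shape."""
    x0, y0, x1, y1 = roi
    region = feature[y0:y1, x0:x1]
    h, w = region.shape
    pooled = np.zeros((out_size, out_size), dtype=feature.dtype)
    ys = np.linspace(0, h, out_size + 1).astype(int)  # bin edges along y
    xs = np.linspace(0, w, out_size + 1).astype(int)  # bin edges along x
    for i in range(out_size):
        for j in range(out_size):
            pooled[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return pooled

feat = np.arange(16, dtype=float).reshape(4, 4)
print(roi_pool(feat, (0, 0, 4, 4)))  # -> [[ 5.  7.] [13. 15.]]
```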

Selective Search:

1. Generate an initial sub-segmentation, producing many small candidate regions.

2. Use greedy algorithm to recursively combine similar regions into larger ones.

3. Use the generated regions to produce the final candidate region proposals.
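The greedy merging in step 2 can be sketched with a toy similarity measure. Here regions are axis-aligned boxes and similarity measures how tightly a merged box wraps a pair; actual selective search merges pixel segments using color, texture, size, and fill similarities:

```python
def merge(a, b):
    """Bounding box enclosing boxes a and b, each (x0, y0, x1, y1)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def similarity(a, b):
    """Toy similarity: 1.0 when the merged box adds no empty area."""
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return (area(a) + area(b)) / area(merge(a, b))

def greedy_proposals(regions):
    """Recursively merge the most similar pair, collecting every
    intermediate box as a candidate region proposal."""
    regions = list(regions)
    proposals = list(regions)
    while len(regions) > 1:
        pairs = [(similarity(a, b), i, j)
                 for i, a in enumerate(regions)
                 for j, b in enumerate(regions) if i < j]
        _, i, j = max(pairs)
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged)
    return proposals

props = greedy_proposals([(0, 0, 2, 2), (2, 0, 4, 2), (0, 4, 1, 5)])
# The two adjacent squares merge first, into (0, 0, 4, 2).
```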

7 of 14

RPN Multi-task Loss

Classification Loss + Bbox Regression Loss:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) Σ_i p_i* · L_reg(t_i, t_i*)

Parameters of the equation:

(1) p_i denotes the probability that the i-th anchor is predicted to be the true label.

(2) p_i* is 1 when the sample is positive and 0 otherwise.

(3) t_i denotes the bounding box regression parameters predicted for the i-th anchor.

(4) t_i* denotes the regression parameters of the corresponding ground-truth box of the i-th anchor.

(5) N_cls denotes the number of all samples in a mini-batch, 256.

(6) N_reg denotes the number of anchor locations (not the number of anchors), about 2400.

8 of 14

RPN Multi-task Loss

Binary Cross Entropy:

L_cls(p_i, p_i*) = -[p_i* log(p_i) + (1 - p_i*) log(1 - p_i)]

Bbox Regression Loss:

L_reg(t_i, t_i*) = Σ_j smooth_L1(t_i^(j) - t_i*^(j)), where smooth_L1(x) = 0.5x² if |x| < 1, and |x| - 0.5 otherwise

(1) t_i denotes the bounding box regression parameters predicted for the i-th anchor.

(2) t_i* denotes the regression parameters of the corresponding ground-truth box of the i-th anchor.
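With the standard definitions from the Faster R-CNN paper, the two terms can be written out directly. A small sketch, where p_i is the predicted objectness probability, p_i* the 0/1 label, and smooth-L1 is applied element-wise to t_i - t_i* (λ = 10 and the sample values are the paper's defaults / illustrative):

```python
import math

def bce(p, p_star):
    """Binary cross-entropy for one anchor's objectness prediction."""
    return -(p_star * math.log(p) + (1 - p_star) * math.log(1 - p))

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear in the tails."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def rpn_loss(p, p_star, t, t_star, lam=10.0, n_cls=256, n_reg=2400):
    """Multi-task RPN loss: classification over all sampled anchors plus
    box regression over positive anchors only (gated by p_star)."""
    cls = sum(bce(pi, si) for pi, si in zip(p, p_star)) / n_cls
    reg = sum(si * sum(smooth_l1(a - b) for a, b in zip(ti, tsi))
              for si, ti, tsi in zip(p_star, t, t_star)) / n_reg
    return cls + lam * reg

# One positive and one negative anchor with toy predictions.
loss = rpn_loss([0.9, 0.2], [1, 0],
                [[0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
                [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
```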

9 of 14

Fast R-CNN Multi-task Loss

Classification Loss + Bbox Regression Loss:

L(p, u, t^u, v) = L_cls(p, u) + λ · [u ≥ 1] · L_loc(t^u, v)

Parameters of the equation:

(1) p denotes the softmax probability distribution predicted by the classifier.

(2) u is the ground-truth class label.

(3) t^u denotes the regression parameters predicted by the bounding box regressor for class u.

(4) v denotes the bounding box regression parameters of the ground truth.

(5) The classification loss L_cls(p, u) = -log(p_u) is also a cross-entropy loss.

(6) The bbox regression loss is the same smooth-L1 loss as in the RPN, applied only when u is a foreground class (u ≥ 1).

10 of 14

Advantages & Drawbacks

Advantages

  1. Produces high-accuracy results across different tested datasets (high compatibility).
  2. Faster prediction time and lower computing power requirements than earlier R-CNN variants.
  3. Identifies objects of varying scales and aspect ratios.

Drawbacks

  • RoI pooling quantizes the feature map, lowering resolution and localization accuracy.
  • The NMS layer can remove bounding boxes of heavily overlapping objects.
  • The numbers of positive and negative samples are limited by hyperparameters to keep them balanced.
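The NMS behavior behind the second drawback shows up in a minimal greedy implementation: a box heavily overlapped by a higher-scoring one is discarded even if it covers a distinct object (boxes, scores, and the 0.5 threshold here are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box overlapping it by more than thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

# Two near-duplicate detections plus one distinct one: the duplicate is suppressed.
kept = nms([(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)], [0.9, 0.8, 0.7])
# kept == [0, 2]
```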

11 of 14

Experiment results

Prediction results on original images:

  • All the cones are detected and the bounding boxes have the correct dimensions
  • Cars and pillars are also detected (false positives)
  • Average Recall: 0.594
  • Average Precision: 0.513
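Precision and recall are computed from IoU-matched detections: unmatched predictions count as false positives (like the cars and pillars above), unmatched ground truths as false negatives. A sketch of the counting (the 0.5 IoU threshold and toy boxes are illustrative, not the project's evaluation code):

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def precision_recall(preds, gts, thresh=0.5):
    """Match each prediction to an unused ground-truth box by IoU;
    precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    unused = list(gts)
    tp = 0
    for p in preds:
        hit = next((g for g in unused if iou(p, g) >= thresh), None)
        if hit is not None:
            unused.remove(hit)
            tp += 1
    fp = len(preds) - tp
    fn = len(unused)
    return tp / (tp + fp), tp / (tp + fn)

# One correct cone, one false positive (e.g. a pillar), one missed cone.
prec, rec = precision_recall([(0, 0, 10, 10), (50, 50, 60, 60)],
                             [(1, 1, 11, 11), (100, 100, 110, 110)])
# prec == 0.5, rec == 0.5
```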

12 of 14

Result Comparison with YOLOv3

YOLOv3 test on video

Analysis:

(1) From the table and video results, both the Faster R-CNN and YOLOv3 models are effective on the traffic cone dataset.

(2) Faster R-CNN in our project maintains a high recall rate while also keeping a relatively high precision.

(3) YOLOv3 sacrifices recall in order to ensure high precision.

(4) As a result, the Faster R-CNN model in our project performs better than YOLOv3 on this dataset.

13 of 14

Conclusion & Future Study

(1) In this project, we reproduced the Faster R-CNN model to detect traffic cones in images.

(2) The Faster R-CNN model introduced RPNs for efficient and accurate region proposal generation.

(3) By sharing convolutional features with the downstream detection network, the region proposal step is nearly cost-free.

(4) The learned RPN also improves region proposal quality and thus the overall object detection accuracy.

(5) We compared the results of the Faster R-CNN and YOLOv3 models and presented drawbacks to address in future work.

14 of 14

References

[1] Annis, J., Floyd, D., Fontes, S., & Navarrete, M. (n.d.). Parking Analysis via Image Processing. CSU Bakersfield, Bakersfield, CA.

[2] "What Is Image Augmentation." Albumentations Documentation, https://albumentations.ai/docs/introduction/image_augmentation/.

[3] Du, Lixuan, et al. "Overview of Two-Stage Object Detection Algorithms." ResearchGate, May 2020, https://www.researchgate.net/figure/Network-structure-diagram-of-Faster-R-CNN-Faster-R-CNN-is-mainly-divided-into-the_fig1_341871095/actions#reference. Accessed 15 July 2022.

[4] Freid, Justin. "Frontiers: Visual Identity That Begins to Transform Advertising." Translated by Fred, 10 Sept. 2018, https://zhuanlan.zhihu.com/p/44239428.

[5] Ren, S., He, K., Girshick, R., & Sun, J. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." NIPS, 2015.

[6] Uijlings, J. R. R., Van De Sande, K. E. A., Gevers, T., et al. "Selective Search for Object Recognition." International Journal of Computer Vision, vol. 104, no. 2, 2013, pp. 154–171.