1 of 16

Detection and Segmentation in CV

2 of 16

Semantic vs Instance vs Panoptic segmentation

3 of 16

Pros and cons

Semantic segmentation
Faster / easier than instance segmentation
Gives a “complete” explanation of the image (every pixel gets a label)
Suitable for “stuff” and “things”
Merges instances

Object detection
Faster / easier than instance segmentation
Distinguishes instances
Inaccurate for some classes
Incomplete
Suitable for “things”

Instance / Panoptic segmentation
Complete
Distinguishes instances
Accurate
Harder / slower

4 of 16

Detection. Intersection over Union (IoU).
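
A minimal sketch of the IoU computation, assuming axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates. The function name and the box format are illustrative choices, not taken from the slides.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format (assumed)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # intersection 1, union 7
```

A detection is usually counted as correct when its IoU with a ground-truth box exceeds a chosen threshold; 0.5 is a common choice.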

5 of 16

Detection. Average precision (AP).

https://medium.com/towards-data-science/what-is-average-precision-in-object-detection-localization-algorithms-and-how-to-calculate-it-3f330efe697b

6 of 16

Detection. Average precision (AP).

https://medium.com/towards-data-science/what-is-average-precision-in-object-detection-localization-algorithms-and-how-to-calculate-it-3f330efe697b
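
One way AP can be computed for a single class is sketched below. It assumes each detection has already been matched against the ground truth (for example by an IoU threshold) and is marked as a true or false positive. It uses all-point interpolation of the precision-recall curve; other variants, such as 11-point interpolation, give slightly different numbers. The function name and input format are hypothetical.

```python
def average_precision(detections, num_gt):
    """AP for one class.

    detections: list of (score, is_true_positive) pairs (assumed format).
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0 or not detections:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Interpolate: precision at recall r becomes the max precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Made-up example: three detections, two ground-truth boxes.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
```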

7 of 16

Detection. Mean average precision (mAP).

https://medium.com/towards-data-science/what-is-average-precision-in-object-detection-localization-algorithms-and-how-to-calculate-it-3f330efe697b
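
In its basic form, mAP is the mean of the per-class AP values. A minimal sketch, with made-up class names and AP numbers used only for illustration:

```python
def mean_average_precision(ap_per_class):
    """Mean of per-class AP values; ap_per_class maps class name -> AP."""
    if not ap_per_class:
        return 0.0
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical per-class AP values, not real results.
m_ap = mean_average_precision({"cat": 0.5, "dog": 1.0})
```

Some benchmarks (COCO, for example) additionally average over several IoU thresholds.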

8 of 16

Detection. Non-maximum suppression (NMS)

https://learnopencv.com/non-maximum-suppression-theory-and-implementation-in-pytorch/

NMS

Input: {(bbox_i, score_i)} for i = 1…N

Sort in descending order of score_i

for i = 1…N

If bbox_i was already skipped, continue

Take bbox_i

Skip all remaining boxes whose IoU with bbox_i > threshold
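
The steps above can be sketched in plain Python as follows. This is a minimal greedy version, assuming boxes in (x1, y1, x2, y2) format and a default threshold of 0.5; the function names and the default are illustrative, and library implementations are vectorized.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format (assumed)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedy NMS; returns indices of the kept boxes, highest score first."""
    # Visit boxes in descending order of score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


# Made-up example: the second box overlaps the first heavily and is suppressed.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Checking each candidate against the already kept boxes is equivalent to the "take, then skip overlapping boxes" loop, because a skipped box never suppresses anything itself.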

9 of 16

Detection. Ideas.

Sliding-window: use binary classification to classify every possible subwindow
Region proposal: pick a subset of prospective regions and score them with a binary classifier
Bounding box regression: predict the coordinates of the boxes as real-valued variables
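
To illustrate why the sliding-window idea is costly, here is a small sketch that enumerates candidate windows of one fixed size. The function name, the (x1, y1, x2, y2) box format, and the stride parameter are assumptions for illustration; a real detector would repeat this over many window sizes and aspect ratios and run the classifier on each window.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (x1, y1, x2, y2) windows of a fixed size that fit inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# Hypothetical 640x480 image, 64x64 window, stride 8.
num_windows = sum(1 for _ in sliding_windows(640, 480, 64, 64, 8))
```

Region proposals reduce this count by scoring only a subset of promising windows, and bounding box regression lets the model refine box coordinates instead of relying on the grid being fine enough.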

10 of 16

UNet

11 of 16

R-CNN

https://arxiv.org/pdf/1311.2524

Use an external box proposal method (e.g. selective search)
Fine-tune the ConvNet to score proposals

12 of 16

Fast R-CNN

https://arxiv.org/pdf/1504.08083

13 of 16

Faster R-CNN

https://arxiv.org/pdf/1506.01497

Key novelty: the proposals come from a learned Region Proposal Network (RPN), a “sparse sliding window search” over the shared feature map

14 of 16

Mask R-CNN

https://arxiv.org/pdf/1703.06870

Predicts a mask for each detected instance (instance segmentation)

15 of 16

Single-shot detector

https://arxiv.org/pdf/1512.02325

One-stage detection: unified model for proposals and classification
Anchor boxes at different scales
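
A minimal sketch of how anchor boxes at several scales and aspect ratios can be generated around one center point. It uses the width and height rule w = s * sqrt(ar), h = s / sqrt(ar), which keeps the area at s^2 for every aspect ratio. The function name, the normalized coordinates, and the example scales are illustrative assumptions.

```python
import math


def make_anchors(cx, cy, scales, aspect_ratios):
    """Return (x1, y1, x2, y2) anchors centered at (cx, cy)."""
    anchors = []
    for s in scales:
        for ar in aspect_ratios:
            # Same area s*s for every aspect ratio; ar is width / height.
            w = s * math.sqrt(ar)
            h = s / math.sqrt(ar)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Hypothetical setup: two scales, three aspect ratios, one center location.
anchors = make_anchors(0.5, 0.5, scales=[0.2, 0.4], aspect_ratios=[1.0, 2.0, 0.5])
```

In a one-stage detector, such a set is placed at every location of each feature map, and the network predicts class scores and box offsets for each anchor.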

16 of 16

RetinaNet

https://arxiv.org/pdf/1708.02002

Adds an encoder-decoder (feature pyramid): better for small objects
Focal Loss
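
A sketch of the focal loss for a single binary prediction, following the form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) from the RetinaNet paper, with the defaults gamma = 2 and alpha = 0.25 reported there. The function name and scalar interface are illustrative; a training implementation would operate on tensors of logits.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one prediction.

    p: predicted probability of the positive class, strictly between 0 and 1.
    y: ground-truth label, 1 (object) or 0 (background).
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident and correct: strongly down-weighted
hard = focal_loss(0.1, 1)  # confident and wrong: keeps most of its loss
```

With gamma = 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is why the loss helps when easy background anchors vastly outnumber objects.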