1 of 16

Detection and Segmentation in CV

2 of 16

Semantic vs Instance vs Panoptic segmentation

3 of 16

Pros and cons

  • Semantic segmentation
      • Faster / easier than instance segmentation
      • Allows a “complete” explanation: every pixel gets a label
      • Suitable for both “stuff” and “things”
      • Merges instances of the same class
  • Object detection
      • Faster / easier than instance segmentation
      • Distinguishes instances
      • Inaccurate for some classes
      • Incomplete: does not label every pixel
      • Suitable for “things” only

  • Instance / Panoptic segmentation
      • Complete
      • Distinguishes instances
      • Accurate
      • Harder / slower

4 of 16

Detection. Intersection over Union (IoU).
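A minimal sketch of IoU for two axis-aligned boxes. The (x1, y1, x2, y2) corner format and the function name `iou` are assumptions made for illustration; other box formats, such as (cx, cy, w, h), need converting first.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2."""
    # Corners of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes: define IoU as 0 instead of dividing by zero.
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold; 0.5 is a common choice, but the value depends on the benchmark.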

5 of 16

Detection. Average precision (AP).

https://medium.com/towards-data-science/what-is-average-precision-in-object-detection-localization-algorithms-and-how-to-calculate-it-3f330efe697b
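A sketch of how AP for one class could be computed from ranked detections. It assumes each detection has already been marked as a true or false positive (for example by IoU matching against ground truth) and that `num_ground_truth` is positive. It uses the all-point interpolation of the precision-recall curve; benchmarks differ on this detail (11-point and 101-point sampling are also in use), so treat it as one possible variant. The function name and argument names are illustrative.

```python
import numpy as np


def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class."""
    # Rank detections from most to least confident.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp

    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(fp)
    recall = cum_tp / num_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)

    # Sentinel points so the curve spans recall 0..1.
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([0.0], precision, [0.0]))

    # Interpolation: make precision non-increasing from left to right.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])

    # Sum rectangle areas at the points where recall changes.
    idx = np.where(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[idx + 1] - recall[idx]) * precision[idx + 1]))


# Three detections, two ground-truth objects; the second detection is a false positive.
ap = average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_ground_truth=2)
```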

7 of 16

Detection. Mean average precision (mAP).

https://medium.com/towards-data-science/what-is-average-precision-in-object-detection-localization-algorithms-and-how-to-calculate-it-3f330efe697b
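In its basic form, mAP is the mean of the per-class AP values. The sketch below assumes the per-class APs have already been computed; the class names and numbers are made up for illustration. Some benchmarks additionally average over several IoU thresholds, which this sketch does not do.

```python
def mean_average_precision(ap_per_class):
    """Mean of per-class AP values, given as a {class_name: ap} mapping."""
    if not ap_per_class:
        raise ValueError("need at least one class")
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical per-class AP values, not taken from any real benchmark.
example_map = mean_average_precision({"cat": 0.5, "dog": 1.0})
```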

8 of 16

Detection. Non-maximum suppression (NMS).

https://learnopencv.com/non-maximum-suppression-theory-and-implementation-in-pytorch/

NMS

Input: {(bbox_i, score_i)} for i = 1…N

Sort the boxes in descending order of score_i

for i = 1…N

    If bbox_i has already been suppressed, skip it

    Otherwise keep bbox_i

    Suppress every remaining (lower-scored) box whose IoU with bbox_i > threshold
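The procedure above can be sketched with NumPy as follows. The (x1, y1, x2, y2) box format, the function name `nms`, and the default threshold of 0.5 are assumptions for illustration; this is a plain greedy variant applied to one class, not a reproduction of any particular library's implementation.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. Returns indices of the kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

    # Candidate indices, sorted by descending score.
    order = np.argsort(-scores, kind="stable")
    keep = []
    while order.size > 0:
        i = order[0]          # best remaining box
        keep.append(int(i))
        rest = order[1:]

        # IoU between the kept box and all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        union = areas[i] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-12)

        # Drop boxes that overlap the kept box too much.
        order = rest[iou <= iou_threshold]
    return keep


kept = nms(
    boxes=[[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]],
    scores=[0.9, 0.8, 0.7],
)
```

In the usage example the second box overlaps the first with IoU 81/119, which is above the threshold, so it is suppressed; the third box does not overlap and is kept.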

9 of 16

Detection. Ideas.

  1. Sliding-window: use binary classification to classify every possible subwindow
  2. Region proposal: pick a subset of prospective regions and score them with a binary classifier
  3. Bounding box regression: predict the coordinates of the boxes as real-valued variables
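To make the first idea concrete, here is a small sketch that enumerates sliding windows of a single size over an image. The function and parameter names are illustrative. A real sliding-window detector would repeat this over many window sizes and aspect ratios and run a classifier on every window, which is what makes the approach expensive.

```python
def sliding_windows(image_h, image_w, win_h, win_w, stride):
    """Yield (x1, y1, x2, y2) for every window that fits inside the image."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# A 4x4 image with 2x2 windows and stride 2 gives four non-overlapping windows.
windows = list(sliding_windows(4, 4, 2, 2, stride=2))

# With stride 1 the same image already yields 3 * 3 = 9 windows of this one size.
dense_count = len(list(sliding_windows(4, 4, 2, 2, stride=1)))
```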

10 of 16

UNet

11 of 16

R-CNN

https://arxiv.org/pdf/1311.2524

  1. Use an external box proposal method
  2. Fine-tune a ConvNet to score (classify) each proposal

12 of 16

Fast R-CNN

https://arxiv.org/pdf/1504.08083

13 of 16

Faster R-CNN

https://arxiv.org/pdf/1506.01497

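Key novelty: the proposals come from a Region Proposal Network (RPN) that shares convolutional features with the detector and performs a “sparse sliding window search” over the feature map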

14 of 16

Mask R-CNN

https://arxiv.org/pdf/1703.06870

Adds a branch that predicts a binary mask for each detected instance (instance segmentation)

15 of 16

Single-shot detector

https://arxiv.org/pdf/1512.02325

  1. One-stage detection: a single unified model for proposals and classification
  2. Anchor boxes on different scales
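A rough sketch of what "anchor boxes on different scales" can look like: one set of anchors per feature-map resolution, with coarser maps carrying larger boxes. The function name, the square image and feature map, and the specific scales and aspect ratios are all assumptions for illustration and are not the exact configuration from the paper.

```python
import math


def make_anchors(feature_size, image_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Anchors (cx, cy, w, h) in pixels for a square feature map.

    `scale` is the anchor size as a fraction of the image side;
    each aspect ratio is width / height.
    """
    step = image_size / feature_size  # pixels covered by one feature-map cell
    anchors = []
    for row in range(feature_size):
        for col in range(feature_size):
            # Anchor centers sit at the centers of the feature-map cells.
            cx = (col + 0.5) * step
            cy = (row + 0.5) * step
            for ratio in aspect_ratios:
                w = scale * image_size * math.sqrt(ratio)
                h = scale * image_size / math.sqrt(ratio)
                anchors.append((cx, cy, w, h))
    return anchors


# Hypothetical two-level setup: a fine 4x4 map with small anchors
# and a coarse 2x2 map with large anchors.
all_anchors = make_anchors(4, 100, scale=0.2) + make_anchors(2, 100, scale=0.5)
```

The network then predicts, for every anchor, class scores and offsets relative to that anchor, so proposals and classification happen in a single pass.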

16 of 16

RetinaNet

https://arxiv.org/pdf/1708.02002

  1. Adds an encoder-decoder structure (a feature pyramid network, FPN), which helps with small objects
  2. Focal Loss
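A minimal NumPy sketch of the binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The defaults gamma = 2 and alpha = 0.25 follow the values reported in the paper; the function name and the clipping constant `eps` are assumptions added for illustration and numerical safety.

```python
import numpy as np


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-12):
    """Per-example binary focal loss.

    probs:   predicted probability of the positive class, in [0, 1]
    targets: 1 for positive examples, 0 for negative examples
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # Probability and class weight assigned to the true class.
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)

    # (1 - p_t)^gamma down-weights well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, eps, 1.0))


# An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1).
losses = focal_loss([0.9, 0.1], [1, 1])
```

With gamma = 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a quick sanity check for any implementation.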