1 of 19

Instance Segmentation with Mask R-CNN

Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick

Facebook AI Research (FAIR)

2 of 19

Image Segmentation

3 of 19

Instance Segmentation

4 of 19

Faster R-CNN

Before moving on to Mask R-CNN, it helps to discuss Faster R-CNN first.

It is a state-of-the-art object detection method.

Just a short intro!

5 of 19

Faster R-CNN

6 of 19

Faster R-CNN

Sample output of Faster R-CNN

7 of 19

RoIPool

8 of 19

Mask R-CNN

9 of 19

Mask R-CNN relations with Faster R-CNN

  • Mask R-CNN extends Faster R-CNN by adding a branch that predicts a segmentation mask on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding-box regression.
  • The mask branch is a small FCN applied to each RoI, predicting a segmentation mask in a pixel-to-pixel manner.
  • This adds only a small computational overhead.

10 of 19

Mask R-CNN

11 of 19

Mask R-CNN

12 of 19

Mask R-CNN

  • Mask R-CNN is a conceptually simple and flexible framework.

  • Faster R-CNN has two outputs for each candidate object: a class label and a bounding-box offset. Mask R-CNN adds a third branch that outputs the object mask.

  • The mask output requires a much finer, pixel-level spatial layout of each object, which distinguishes it from the class and bounding-box outputs.

  • To handle this, the architecture needs some improvements.
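The three per-RoI outputs listed above can be pictured with a small container. This is only an illustrative sketch: `RoIPrediction` and `select_mask` are hypothetical names, the background class is left out, and a single set of four box offsets per RoI is assumed for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoIPrediction:
    """Hypothetical container for the outputs of one candidate object (RoI)."""
    class_scores: List[float]        # one score per class (classification branch)
    box_offsets: List[float]         # 4 bounding-box regression offsets
    masks: List[List[List[float]]]   # one m x m mask per class (mask branch)


def select_mask(pred: RoIPrediction) -> List[List[float]]:
    """Return the mask that belongs to the highest-scoring class."""
    best_class = max(range(len(pred.class_scores)),
                     key=lambda i: pred.class_scores[i])
    return pred.masks[best_class]


# Toy example with 3 classes and 1 x 1 masks.
pred = RoIPrediction(
    class_scores=[0.1, 0.7, 0.2],
    box_offsets=[0.0, 0.0, 0.0, 0.0],
    masks=[[[0.0]], [[1.0]], [[0.5]]],
)
chosen = select_mask(pred)
```

The class and box outputs are short vectors, while the mask output keeps a spatial m x m layout. That spatial layout is the reason the mask branch needs well-aligned features.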

Why improvements over Faster R-CNN are needed

13 of 19

Mask R-CNN Architecture

1. Improvement over Faster R-CNN.

i) RoIPool vs RoIAlign

2. Improvement over other state-of-the-art segmentation models

i) Parallel computation

ii) Decoupled mask and classification

14 of 19

RoIAlign - The Saviour

15 of 19

Exploitation and improvement over Faster R-CNN

  • Segmentation requires pixel-to-pixel behaviour, so the RoI features must be precisely aligned with the feature map.
  • RoIPool quantizes coarsely when it maps an RoI onto the small feature map, which introduces misalignment between the RoI and the extracted features.
  • Such small shifts do not affect classification, which is robust to small translations.
  • But the misalignment has a negative effect on mask prediction, directly affecting the performance of Mask R-CNN.
  • To fix this misalignment, a quantization-free layer called RoIAlign is used.
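The difference between quantizing and not quantizing can be sketched in plain Python. This is a minimal illustration, not the reference implementation: the feature stride of 16, the toy feature map, the 2 x 2 sampling grid, averaging as the aggregation, and the helper names (`bilinear_sample`, `roi_align_bin`, `quantize`) are all assumptions made for the example.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at real-valued (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the sample point so its neighbours stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align_bin(feature, y0, x0, y1, x1, samples=2):
    """Average samples x samples regularly spaced bilinear samples inside one bin."""
    total = 0.0
    for i in range(samples):
        for j in range(samples):
            y = y0 + (i + 0.5) * (y1 - y0) / samples
            x = x0 + (j + 0.5) * (x1 - x0) / samples
            total += bilinear_sample(feature, y, x)
    return total / (samples * samples)


def quantize(v):
    """Round to the nearest integer cell, as a RoIPool-style layer would."""
    return int(math.floor(v + 0.5))


STRIDE = 16  # assumed ratio between image pixels and feature-map cells
# Toy 8 x 8 feature map whose value equals its column index.
feature = [[float(col) for col in range(8)] for _ in range(8)]

x_image = 100.0                               # RoI left edge in image pixels
x_exact = x_image / STRIDE                    # 6.25 cells, kept as-is by RoIAlign
x_quant = quantize(x_exact)                   # 6 cells after quantization
shift_pixels = (x_exact - x_quant) * STRIDE   # misalignment in image pixels

# One RoIAlign bin that starts exactly at the unquantized coordinate.
aligned = roi_align_bin(feature, 2.0, x_exact, 3.0, x_exact + 1.0)
```

In this toy case the quantized edge lands 4 image pixels away from the true edge, while the bin computed from the unquantized coordinates returns 6.75, the value of the ramp at the true bin centre.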

Issues with RoIPool layer and solutions

16 of 19

Improvement over other state-of-the-art segmentation models

  • Driven by the success of R-CNN methods, many earlier approaches to instance segmentation are based on segment proposals.
  • Models like DeepMask learn to propose segment candidates, which are then classified by Fast R-CNN. In these methods, segmentation precedes recognition.
  • Architectures of this type are slow and less accurate.
  • In Mask R-CNN, by contrast, mask prediction is done in parallel with class-label prediction, which is simpler and more flexible.

Parallel Computations

17 of 19

Mask R-CNN

18 of 19

Improvement over other state-of-the-art segmentation models

  • Lcls - classification loss
  • Lbox - bounding-box regression loss
  • Lmask - mask loss

Decouple mask & classification

RoI loss: L = Lcls + Lbox + Lmask

  • Earlier networks (FCNs for semantic segmentation) typically use a per-pixel softmax and a multinomial cross-entropy loss.
  • This couples mask prediction and classification, making the classes compete.
  • In Mask R-CNN, Lmask lets the network generate a mask for every class without competition among classes, and relies on the classification branch to predict the class label that selects the output mask.
  • Mask R-CNN uses a per-pixel sigmoid and a binary cross-entropy loss, thus decoupling mask and classification.
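The decoupling described above can be sketched as a loss that reads only the mask of the ground-truth class. This is a simplified illustration under stated assumptions: masks are nested Python lists, `mask_loss` and `binary_cross_entropy_with_logit` are hypothetical helper names, and the loss is averaged over the pixels of a single RoI.

```python
import math


def binary_cross_entropy_with_logit(z, t):
    """Numerically stable BCE for one pixel, given logit z and target t in {0, 1}."""
    return max(z, 0.0) - z * t + math.log(1.0 + math.exp(-abs(z)))


def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel sigmoid BCE, computed only on the ground-truth class's mask.

    mask_logits: one m x m grid of logits per class.
    gt_mask:     m x m grid of 0/1 targets.
    gt_class:    index of the ground-truth class for this RoI.
    """
    logits_k = mask_logits[gt_class]  # masks of all other classes are ignored
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits_k, gt_mask):
        for z, t in zip(row_logits, row_targets):
            total += binary_cross_entropy_with_logit(z, t)
            count += 1
    return total / count


gt = [[1, 0], [1, 0]]
logits_a = [[[0.0, 0.0], [0.0, 0.0]], [[5.0, -5.0], [5.0, -5.0]]]
# Same logits for class 1, very different logits for class 0.
logits_b = [[[9.0, 9.0], [-9.0, 9.0]], [[5.0, -5.0], [5.0, -5.0]]]

loss_a = mask_loss(logits_a, gt, gt_class=1)
loss_b = mask_loss(logits_b, gt, gt_class=1)
```

Here `loss_a` and `loss_b` are identical, because the logits of class 0 never enter the loss for an RoI whose ground-truth class is 1. With a per-pixel softmax across classes, changing the class 0 logits would change the loss, which is the competition the slide refers to.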

19 of 19

References