1 of 19

Instance Segmentation with Mask R-CNN

Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick

Facebook AI Research (FAIR)

2 of 19

Image Segmentation

3 of 19

Instance Segmentation

4 of 19

Faster R-CNN

Before moving on to Mask R-CNN, it helps to discuss Faster R-CNN first.

It is a state-of-the-art object detection method.

Just a short intro!

5 of 19

Faster R-CNN

6 of 19

Faster R-CNN

Sample Output of Faster R-CNN

7 of 19

RoIPool

8 of 19

Mask R-CNN

9 of 19

Mask R-CNN's relation to Faster R-CNN

Mask R-CNN extends Faster R-CNN by adding a branch that predicts a segmentation mask on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding-box regression.
The mask branch is a small FCN applied to each RoI, predicting a segmentation mask in a pixel-to-pixel manner.
This adds only a small computational overhead to Faster R-CNN.

10 of 19

Mask R-CNN

11 of 19

Mask R-CNN

12 of 19

Mask R-CNN

Mask R-CNN is a conceptually simple and flexible framework.

Faster R-CNN has two outputs for each candidate object: a class label and a bounding-box offset. Mask R-CNN adds a third branch that outputs the object mask.

The mask output differs from the class and box outputs: predicting a mask for each object requires pixel-level spatial information.

Handling this requires some improvements to the architecture.
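The three per-RoI outputs described above can be pictured with a small data structure. This is only an illustrative sketch: the `RoIOutput` class and its field names are hypothetical, the box output is simplified to a single set of four offsets, and the mask branch is assumed to produce one m x m mask per class.

```python
from dataclasses import dataclass
from typing import List

Mask = List[List[float]]  # one m x m grid of per-pixel foreground probabilities


@dataclass
class RoIOutput:
    """Hypothetical container for the outputs of one candidate object (RoI)."""
    class_scores: List[float]  # classification branch: one score per class
    box_offsets: List[float]   # box branch: (dx, dy, dw, dh), simplified
    masks: List[Mask]          # mask branch: one m x m mask per class

    def predicted_class(self) -> int:
        # The label comes from the classification branch alone.
        return max(range(len(self.class_scores)),
                   key=lambda k: self.class_scores[k])

    def predicted_mask(self) -> Mask:
        # The predicted label selects which of the per-class masks to output.
        return self.masks[self.predicted_class()]


# Toy example with 2 classes and 2 x 2 masks.
out = RoIOutput(
    class_scores=[0.1, 0.9],
    box_offsets=[0.1, -0.2, 0.05, 0.0],
    masks=[
        [[0.2, 0.1], [0.3, 0.2]],  # mask for class 0
        [[0.9, 0.8], [0.7, 0.1]],  # mask for class 1
    ],
)
print(out.predicted_class())  # 1
print(out.predicted_mask())   # [[0.9, 0.8], [0.7, 0.1]]
```

The class and box outputs are short vectors, while the mask output keeps a spatial layout, which is why it depends on well-aligned pixel-level features.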

Why Faster R-CNN needs improvement

13 of 19

Mask R-CNN Architecture

1. Improvement over Faster R-CNN.

i) RoIPool vs RoIAlign

2. Improvement over other state-of-the-art segmentation models

i) parallel computation

ii) decoupling mask and class prediction

14 of 19

RoIAlign - The Saviour

15 of 19

Exploitation and improvement over Faster R-CNN

Segmentation requires pixel-to-pixel behaviour, so the features extracted for each RoI must be precisely aligned with the feature map.
When mapping an RoI onto the small feature map, RoIPool performs a coarse quantization for feature extraction, which introduces misalignment between the RoI and the extracted features.
Such small shifts do not affect classification, which is robust to small translations.
But the misalignment has a negative effect on mask prediction, directly affecting the performance of Mask R-CNN.
To fix this misalignment, a quantization-free layer called RoIAlign is used.
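The effect of quantization can be seen in a small numeric sketch. It assumes a feature-map stride of 16 and a toy feature map whose values grow linearly with the column index. Only a single sample point is shown; an actual RoIAlign layer samples several points per output bin and aggregates them. The helper name `bilinear` is made up for this example.

```python
import math

STRIDE = 16  # assumed feature-map stride, for illustration only


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at continuous (y, x)."""
    h, w = len(feature), len(feature[0])
    y0 = min(max(math.floor(y), 0), h - 1)
    x0 = min(max(math.floor(x), 0), w - 1)
    y1 = min(y0 + 1, h - 1)
    x1 = min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


# Toy 2 x 8 feature map: the value at column c is 10 * c.
feature = [[10.0 * col for col in range(8)] for _ in range(2)]

x_img = 100.0            # an RoI coordinate in image pixels
x_feat = x_img / STRIDE  # 6.25 on the feature map

# RoIPool-style: round to the nearest feature-map cell.
quantized_col = round(x_feat)
roipool_value = feature[0][quantized_col]

# RoIAlign-style: keep the continuous coordinate and interpolate.
roialign_value = bilinear(feature, 0.0, x_feat)

# Size of the quantization shift, expressed in image pixels.
shift_px = abs(x_feat - quantized_col) * STRIDE

print(roipool_value)   # 60.0
print(roialign_value)  # 62.5
print(shift_px)        # 4.0
```

Rounding 6.25 to 6 moves the sample by a quarter of a feature cell, which is 4 pixels in the image. That is harmless for a class label but noticeable for a pixel-level mask.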

Issues with RoIPool layer and solutions

16 of 19

Improvement over other state-of-the-art segmentation models

Because of the popularity of R-CNN methods, many earlier approaches to instance segmentation are based on segment proposals.
Models such as DeepMask learn to propose segment candidates, which are then classified by Fast R-CNN. In these methods, segmentation precedes recognition.
This type of architecture is slow and less accurate.
In Mask R-CNN, by contrast, mask prediction is done in parallel with class-label prediction, which is simpler and more flexible.

Parallel Computations

17 of 19

Mask R-CNN

18 of 19

Improvement over other state-of-the-art segmentation models

L_cls - classification loss
L_box - bounding-box regression loss
L_mask - mask loss

Decouple mask & classification

RoI loss: L = L_cls + L_box + L_mask

Earlier networks, such as FCNs for semantic segmentation, use a per-pixel softmax and a multinomial cross-entropy loss.
This couples mask prediction and classification together and makes the classes compete.
In Mask R-CNN, by contrast, L_mask allows the network to generate a mask for every class without competition among classes, and relies on the classification branch to predict the class label that selects the output mask.
Mask R-CNN uses a per-pixel sigmoid and a binary cross-entropy loss, thus decoupling mask and class prediction.
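A minimal sketch of such a mask loss is shown below, assuming the mask branch outputs one m x m grid of logits per class and that the loss is the average binary cross-entropy over the ground-truth class's mask only. The function names and the toy numbers are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mask_loss(mask_logits, gt_class, gt_mask):
    """Average per-pixel binary cross-entropy on the ground-truth class's mask.

    mask_logits: K x m x m nested lists of raw scores (one mask per class).
    gt_class:    index of the ground-truth class for this RoI.
    gt_mask:     m x m nested lists of 0/1 labels.
    Masks of all other classes do not contribute to the loss.
    """
    logits = mask_logits[gt_class]
    total, count = 0.0, 0
    for row_logits, row_gt in zip(logits, gt_mask):
        for z, y in zip(row_logits, row_gt):
            p = sigmoid(z)  # per-pixel sigmoid, independent of other classes
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count


gt_mask = [[1, 0], [1, 0]]
class0_logits = [[2.0, -1.0], [0.5, -3.0]]

# Two predictions that differ only in the logits of class 1.
pred_a = [class0_logits, [[0.0, 0.0], [0.0, 0.0]]]
pred_b = [class0_logits, [[5.0, 5.0], [5.0, 5.0]]]

loss_a = mask_loss(pred_a, gt_class=0, gt_mask=gt_mask)
loss_b = mask_loss(pred_b, gt_class=0, gt_mask=gt_mask)
print(loss_a == loss_b)  # True: the other class's mask does not affect the loss
```

With a per-pixel softmax across classes, raising the class 1 logits would lower the class 0 probabilities and change the loss. With independent sigmoids the classes do not compete.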
