1 of 31

Fully Convolutional Network (FCN)

2 of 31

Deep Learning for Computer Vision: Review

Source: 6.S191 Intro. to Deep Learning at MIT

3 of 31

Convolutional Autoencoder

4 of 31

Convolutional Autoencoder

  • Motivation: how can an autoencoder be applied to images?
  • A convolutional autoencoder extends the basic autoencoder by replacing its fully connected layers with convolutional layers.
    • The encoder network uses convolutional layers.
    • The decoder network uses transposed convolutional layers.
      • A transposed 2-D convolution layer upsamples feature maps.
      • This layer is sometimes incorrectly known as a "deconvolution" or "deconv" layer.
      • This layer is the transpose of convolution and does not perform deconvolution.
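
A minimal pure-Python sketch of the points above, using 1-D signals for brevity. The helper names `conv1d` and `conv_transpose1d`, the kernel, and the input values are illustrative assumptions, not library functions. It suggests that a strided transposed convolution restores the length lost by a strided convolution without recovering the original values, so it is not a deconvolution. The final check confirms that it behaves as the transpose (adjoint) of the convolution.

```python
def conv1d(x, kernel, stride=1):
    """'Valid' 1-D convolution (cross-correlation, as used in deep learning)."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return [
        sum(x[i * stride + j] * kernel[j] for j in range(k))
        for i in range(out_len)
    ]


def conv_transpose1d(y, kernel, stride=1):
    """Transposed 1-D convolution: scatter each input value, scaled by the kernel."""
    k = len(kernel)
    out = [0.0] * ((len(y) - 1) * stride + k)
    for i, value in enumerate(y):
        for j in range(k):
            out[i * stride + j] += value * kernel[j]
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative input
kernel = [1.0, 0.0, -1.0]       # illustrative kernel

y = conv1d(x, kernel, stride=2)               # downsample: 5 values -> 2
x_up = conv_transpose1d(y, kernel, stride=2)  # upsample:   2 values -> 5

print(y)      # [-2.0, -2.0]
print(x_up)   # [-2.0, 0.0, 0.0, 0.0, 2.0]: same length as x, different values

# Transpose property: <conv(x), v> == <x, conv_transpose(v)> for any v.
v = [0.5, -1.0]
lhs = dot(conv1d(x, kernel, stride=2), v)
rhs = dot(x, conv_transpose1d(v, kernel, stride=2))
print(lhs, rhs)  # 1.0 1.0
```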

Figure labels: downsample, upsample

5 of 31

tf.keras.layers.Conv2D

  • Encoder
  • Padding

Figure: padding = 'VALID', strides = [1, 1, 1, 1] (the four-element form used by tf.nn.conv2d; the Keras layer takes strides = (1, 1))

6 of 31

tf.keras.layers.Conv2D

  • Encoder
  • Padding

Figure panels:
  padding = 'VALID', strides = [1, 1, 1, 1]
  padding = 'SAME', strides = [1, 1, 1, 1]

7 of 31

tf.keras.layers.Conv2D

  • Encoder
  • Stride

Figure panels:
  padding = 'SAME', strides = [1, 1, 1, 1]
  padding = 'SAME', strides = [1, 2, 2, 1]
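
The effect of padding and stride on the encoder's output size can be sketched with the size rules TensorFlow documents for 'VALID' and 'SAME' padding. The function name `conv_output_size`, the input size of 5, and the kernel size of 3 are assumptions chosen for illustration. The sketch covers one spatial dimension and assumes no dilation.

```python
import math


def conv_output_size(n, kernel, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    padding = padding.upper()
    if padding == "VALID":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((n - kernel + 1) / stride)
    if padding == "SAME":
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(n / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


# Illustrative 5x5 input with a 3x3 kernel (one dimension shown).
print(conv_output_size(5, 3, 1, "VALID"))  # 3
print(conv_output_size(5, 3, 1, "SAME"))   # 5
print(conv_output_size(5, 3, 2, "SAME"))   # 3
```

With 'SAME' padding and stride 1 the spatial size is preserved; raising the stride to 2 is what downsamples the feature map.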

8 of 31

tf.keras.layers.Conv2DTranspose

  • Decoder
  • Stride

Figure: padding = 'VALID', strides = (1, 1)

9 of 31

tf.keras.layers.Conv2DTranspose

  • Decoder
  • Stride

Figure: padding = 'VALID', strides = (2, 2)

10 of 31

tf.keras.layers.Conv2DTranspose

  • Decoder
  • Stride

Figure: padding = 'SAME', strides = (2, 2)
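
The three decoder settings above can be compared through their output sizes. This is a sketch of the size rule for a transposed convolution along one dimension, assuming the layer's default `output_padding` and no dilation. The function name `conv_transpose_output_size`, the input size of 3, and the kernel size of 3 are illustrative assumptions.

```python
def conv_transpose_output_size(n, kernel, stride, padding):
    """Output length of a transposed convolution along one spatial dimension."""
    padding = padding.upper()
    if padding == "VALID":
        # Every scattered kernel copy is kept in full.
        return n * stride + max(kernel - stride, 0)
    if padding == "SAME":
        # The output is cropped so that only the stride enlarges it.
        return n * stride
    raise ValueError("padding must be 'VALID' or 'SAME'")


# Illustrative 3x3 input with a 3x3 kernel (one dimension shown).
print(conv_transpose_output_size(3, 3, 1, "VALID"))  # 5
print(conv_transpose_output_size(3, 3, 2, "VALID"))  # 7
print(conv_transpose_output_size(3, 3, 2, "SAME"))   # 6
```

With 'SAME' padding and stride 2 the decoder exactly doubles the spatial size, which mirrors a stride-2 'SAME' convolution in the encoder when the input size is even.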

11 of 31

CAE Implementation

  • Fully convolutional
  • Note that no dense layer is used
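
One consequence of using no dense layer can be sketched by counting parameters. The helper names and the layer sizes below (3x3 kernel, 16 filters or units, 28x28 and 56x56 inputs) are assumptions for illustration, not the configuration used in these slides. A convolutional layer's parameter count does not depend on the image size, while a dense layer's count does.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a 2-D convolution; independent of image size."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(height, width, channels, units):
    """Weights plus biases of a dense layer applied to a flattened image."""
    return height * width * channels * units + units


print(conv2d_params(3, 3, 1, 16))   # 160, for any input height and width
print(dense_params(28, 28, 1, 16))  # 12560
print(dense_params(56, 56, 1, 16))  # 50192
```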

12 of 31

CAE Implementation

13 of 31

CAE Implementation

14 of 31

CAE Implementation

15 of 31

Reconstruction Result

16 of 31

Segmentation

  • Segmentation differs from classification: it predicts a class for every pixel of the input image rather than a single class for the whole image.
  • Semantic segmentation divides an image into regions that belong to different semantic categories, labeling and predicting objects at the pixel level.

Image from http://d2l.ai/

17 of 31

Segmentation

  • Classification needs to understand what is in the input (namely, the context).
  • Segmentation, however, must predict a class for each pixel, so it needs to recover not only what is in the input but also where.

Image from http://d2l.ai/

18 of 31

Semantic Segmentation: FCNs

  • FCN uses a convolutional neural network to transform image pixels to pixel categories.

  • The network is built entirely from convolutional layers, with downsampling and upsampling operations.

  • At each spatial position, the values along the output's channel dimension give the category prediction for the corresponding pixel.
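
The last point can be sketched as a per-pixel argmax over the channel dimension. The function name `predict_label_map` and the 2x2 score map with three classes are made-up illustrations, and the scores stand in for whatever the network's final layer would produce.

```python
def predict_label_map(scores):
    """scores[row][col] holds one score per class; return the best class per pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Illustrative output of shape (height=2, width=2, classes=3).
scores = [
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
]

print(predict_label_map(scores))  # [[1, 0], [2, 1]]
```

The label map has the same height and width as the score map, with one class index per pixel.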

Image from http://d2l.ai/

19 of 31

From CAE to FCN

20 of 31

From CAE to FCN

21 of 31

Skip Connection

  • A skip connection is a connection that bypasses at least one layer.

  • Here, it is often used to transfer local information by summing feature maps from the downsampling path with feature maps from the upsampling path.
    • Merging features from several resolution levels helps combine context information with spatial information.
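
A minimal single-channel sketch of the summation described above. The helper names `upsample_nearest` and `add_skip` and the feature-map values are hypothetical, and nearest-neighbor upsampling stands in for the learned transposed convolution a real decoder would likely use.

```python
def upsample_nearest(fmap, factor=2):
    """Repeat every value `factor` times along both spatial axes."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def add_skip(decoder_fmap, encoder_fmap):
    """Element-wise sum of two feature maps with identical shapes."""
    same_shape = len(decoder_fmap) == len(encoder_fmap) and all(
        len(d) == len(e) for d, e in zip(decoder_fmap, encoder_fmap)
    )
    if not same_shape:
        raise ValueError("feature maps must have the same shape")
    return [
        [d + e for d, e in zip(d_row, e_row)]
        for d_row, e_row in zip(decoder_fmap, encoder_fmap)
    ]


# Coarse 2x2 map from deep in the network (context, "what").
coarse = [[1.0, 2.0], [3.0, 4.0]]
# Fine 4x4 map from the downsampling path (spatial detail, "where").
fine = [
    [0.0, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]

merged = add_skip(upsample_nearest(coarse), fine)
print(merged[0])  # [1.0, 1.25, 2.0, 2.0]
print(merged[2])  # [3.0, 3.0, 4.0, 4.5]
```

The merged map keeps the coarse values everywhere and adds the fine-resolution detail that upsampling alone cannot recover.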

22 of 31

ResNet (Deep Residual Learning)

  • He, Kaiming, et al. “Deep residual learning for image recognition.” CVPR. 2016.
  • Plain net

23 of 31

ResNet (Deep Residual Learning)

  • He, Kaiming, et al. "Deep residual learning for image recognition." CVPR. 2016.
  • Residual net
  • Skip connection
    • A direct connection between two non-consecutive layers
    • Mitigates the vanishing-gradient problem

24 of 31

ResNet (Deep Residual Learning)

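  • A residual block outputs F(x) + x, where x is the block's input and F(x) is the residual learned by its layers.

  • If the identity mapping were optimal, it would be easy to drive the weights of F to 0.

  • If the optimal mapping is close to the identity, it is easier to learn small deviations from it.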
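
A small sketch of the two points above, with a single linear layer standing in for the residual branch F. The function names, the weights, and the input are hypothetical, and the ReLU that a real ResNet block applies after the addition is left out for clarity.

```python
def residual_branch(x, weights, bias):
    """F(x): one linear layer, a stand-in for the conv layers of a real block."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def residual_block(x, weights, bias):
    """Output F(x) + x: the skip connection adds the input back."""
    return [f + xi for f, xi in zip(residual_branch(x, weights, bias), x)]


x = [1.0, -2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

# All-zero weights make F(x) = 0, so the block is exactly the identity.
print(residual_block(x, zero_w, zero_b))   # [1.0, -2.0, 3.0]

# A small weight gives a small deviation from the identity.
small_w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]
print(residual_block(x, small_w, zero_b))  # [1.0, -2.0, 3.5]
```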

25 of 31

Residual Net

26 of 31

Fully Convolutional Networks (FCNs)

  • To obtain a segmentation map (output), segmentation networks usually have 2 parts
    • Downsampling path: capture semantic/contextual information
    • Upsampling path: recover spatial information
  • The downsampling path is used to extract and interpret the context (what), while the upsampling path is used to enable precise localization (where).

  • Furthermore, to fully recover the fine-grained spatial information lost in the pooling or downsampling layers, we often use skip connections.

  • The network works on input images of any size, because no stage requires a fixed number of units.
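
The size independence can be sketched by tracing one spatial dimension through a hypothetical network of three stride-2 'SAME' convolutions followed by three stride-2 'SAME' transposed convolutions. The function name `fcn_output_size` and the layer counts are assumptions, not the architecture shown in these slides.

```python
import math


def fcn_output_size(n, num_stages=3, stride=2):
    """Trace one spatial dimension through the down- and upsampling paths."""
    for _ in range(num_stages):   # downsampling path ('SAME' convolutions)
        n = math.ceil(n / stride)
    for _ in range(num_stages):   # upsampling path ('SAME' transposed convolutions)
        n = n * stride
    return n


for size in (64, 96, 160, 100):
    print(size, "->", fcn_output_size(size))
# 64 -> 64
# 96 -> 96
# 160 -> 160
# 100 -> 104
```

The same layers accept every input size. In this sketch the output matches the input exactly only when the size is divisible by 8 (the total downsampling factor); otherwise the output would need cropping or the input would need padding.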

27 of 31

Segmented (Labeled) Images

Figure panels: input, output

28 of 31

FCN Architecture

Figure labels: Fixed, maxp3, maxp4, fcn4, fcn3, fcn2, fcn1, Trained

29 of 31

FCN Architecture

Figure labels: Fixed, maxp3, maxp4, fcn4, fcn3, fcn2, fcn1, Trained

30 of 31

FCN Architecture

Figure labels: Fixed, maxp3, maxp4, fcn4, fcn3, fcn2, fcn1, Trained

31 of 31

Segmentation Result

Figure labels: maxp3, maxp4, input, segmentation output, overlapping