
Convolutional Autoencoder

Industrial AI Lab.

Prof. Seungchul Lee


Convolutional Autoencoder

  • Motivation: can an autoencoder be applied to images?
  • A convolutional autoencoder extends the basic structure of the simple autoencoder by replacing the fully connected layers with convolutional layers.
    • the encoder network uses convolutional layers (for downsampling)
    • the decoder network uses transposed convolutional layers (for upsampling)
      • A transposed 2-D convolution layer upsamples feature maps.
      • This layer is sometimes incorrectly called a "deconvolution" or "deconv" layer.
      • It is the transpose of convolution and does not perform deconvolution.


[Figure: the encoder downsamples the input; the decoder upsamples it back to the original size]

tf.keras.layers.Conv2D

  • Encoder
  • Padding and stride

[Figure: convolution examples comparing padding = 'VALID' vs padding = 'SAME', and strides of 1 vs 2]
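
A minimal sketch (shapes and the 28x28 grayscale input are assumptions, not the course notebook) of how padding and strides determine the output shape of tf.keras.layers.Conv2D:

    import tensorflow as tf

    x = tf.random.normal((1, 28, 28, 1))    # (batch, height, width, channels); 28x28 is an assumed input size

    # 'valid' padding: no zero-padding, so a 3x3 kernel shrinks each spatial dimension by 2
    conv_valid = tf.keras.layers.Conv2D(32, (3, 3), strides=(1, 1), padding='valid')
    print(conv_valid(x).shape)              # (1, 26, 26, 32)

    # 'same' padding: zero-padded so that stride 1 preserves the spatial size
    conv_same = tf.keras.layers.Conv2D(32, (3, 3), strides=(1, 1), padding='same')
    print(conv_same(x).shape)               # (1, 28, 28, 32)

    # stride 2 halves the spatial resolution (the downsampling used in the encoder)
    conv_stride = tf.keras.layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same')
    print(conv_stride(x).shape)             # (1, 14, 14, 32)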

tf.keras.layers.Conv2DTranspose

  • Decoder
  • Padding and stride

[Figure: transposed-convolution examples comparing padding = 'VALID' vs padding = 'SAME', and strides of (1,1) vs (2,2)]
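
Likewise, a small sketch (illustrative shapes only, assuming a 7x7 encoded feature map) of how strides and padding set the output size of tf.keras.layers.Conv2DTranspose; with padding='same' and stride 2, the layer exactly doubles the spatial resolution:

    import tensorflow as tf

    z = tf.random.normal((1, 7, 7, 64))     # a small encoded feature map (7x7 is an assumed size)

    # stride 1, 'valid': the output grows by (kernel - 1): 7 -> 9
    up1 = tf.keras.layers.Conv2DTranspose(32, (3, 3), strides=(1, 1), padding='valid')
    print(up1(z).shape)                     # (1, 9, 9, 32)

    # stride 2, 'valid': output = (input - 1) * stride + kernel = 6 * 2 + 3 = 15
    up2 = tf.keras.layers.Conv2DTranspose(32, (3, 3), strides=(2, 2), padding='valid')
    print(up2(z).shape)                     # (1, 15, 15, 32)

    # stride 2, 'same': output = input * stride, i.e. the resolution is exactly doubled
    up3 = tf.keras.layers.Conv2DTranspose(32, (3, 3), strides=(2, 2), padding='same')
    print(up3(z).shape)                     # (1, 14, 14, 32)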


CAE Implementation

  • Fully convolutional
  • Note that no dense layer is used
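
A minimal fully convolutional autoencoder sketch, assuming 28x28 grayscale (MNIST-like) inputs scaled to [0, 1]; layer sizes are illustrative and not necessarily the course's exact model:

    from tensorflow.keras import layers, models

    # Encoder: stride-2 convolutions downsample 28x28 -> 14x14 -> 7x7
    encoder = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same', activation='relu'),
        layers.Conv2D(64, (3, 3), strides=(2, 2), padding='same', activation='relu'),
    ])

    # Decoder: stride-2 transposed convolutions upsample 7x7 -> 14x14 -> 28x28
    decoder = models.Sequential([
        layers.Input(shape=(7, 7, 64)),
        layers.Conv2DTranspose(32, (3, 3), strides=(2, 2), padding='same', activation='relu'),
        layers.Conv2DTranspose(1, (3, 3), strides=(2, 2), padding='same', activation='sigmoid'),
    ])

    # No dense layer anywhere: the whole model is convolutional
    cae = models.Sequential([encoder, decoder])
    cae.compile(optimizer='adam', loss='mse')

    # train_x is a hypothetical array of training images; the autoencoder reconstructs its own input
    # cae.fit(train_x, train_x, epochs=10, batch_size=128)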


Reconstruction Result


Fully Convolutional Network (FCN)

Industrial AI Lab.

Prof. Seungchul Lee


Deep Learning for Computer Vision: Review


Source: 6.S191 Intro. to Deep Learning at MIT


Segmentation

  • Segmentation differs from classification: it requires predicting a class for each pixel of the input image, instead of a single class for the whole input.
  • The goal is to segment images into regions with different semantic categories; these semantic regions label and predict objects at the pixel level.
  • Classification needs to understand what is in the input (namely, the context).
  • However, in order to predict a class for each pixel, segmentation needs to recover not only what is in the input, but also where it is.

[Figure from http://d2l.ai/]


Semantic Segmentation: FCNs

  • An FCN uses a convolutional neural network to transform image pixels into pixel categories.
  • The network is built entirely from convolutional layers, with down-sampling and up-sampling operations.
  • At each position in the spatial dimensions, the channel dimension of the output holds the category prediction for the pixel at that location.
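
To make the last point concrete, a small sketch (the feature-map shape and number of classes are assumptions): a 1x1 convolution turns the channel dimension into per-pixel class scores, and the prediction for each pixel is the argmax over channels.

    import tensorflow as tf

    num_classes = 5                                  # hypothetical number of semantic categories
    features = tf.random.normal((1, 64, 64, 128))    # feature map from the convolutional layers

    # a 1x1 convolution maps the channel dimension to per-pixel class scores
    classifier = tf.keras.layers.Conv2D(num_classes, (1, 1), padding='same')
    logits = classifier(features)                    # (1, 64, 64, num_classes)

    # the predicted category at each spatial position is the argmax over channels
    pred = tf.argmax(logits, axis=-1)                # (1, 64, 64)
    print(logits.shape, pred.shape)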


[Figure from http://d2l.ai/]


From CAE to FCN

[Figure: the CAE architecture compared side by side with the FCN architecture]


Skip Connection

  • A skip connection is a connection that bypasses at least one layer.

  • Here, it is often used to transfer local information by summing feature maps from the downsampling path with feature maps from the upsampling path.
    • Merging features from various resolution levels helps combine context information with spatial information.
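
A minimal functional-API sketch of such a skip connection (all shapes and filter counts are illustrative assumptions): a feature map from the downsampling path is summed with the same-resolution feature map in the upsampling path.

    from tensorflow.keras import layers, Input, Model

    inputs = Input(shape=(64, 64, 3))

    # downsampling path
    d1 = layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same', activation='relu')(inputs)      # 32x32
    d2 = layers.Conv2D(64, (3, 3), strides=(2, 2), padding='same', activation='relu')(d1)          # 16x16

    # upsampling path
    u1 = layers.Conv2DTranspose(32, (3, 3), strides=(2, 2), padding='same', activation='relu')(d2) # 32x32

    # skip connection: sum the upsampled map with the encoder map of the same resolution
    s1 = layers.Add()([u1, d1])

    u2 = layers.Conv2DTranspose(16, (3, 3), strides=(2, 2), padding='same', activation='relu')(s1) # 64x64
    outputs = layers.Conv2D(1, (1, 1), padding='same')(u2)

    model = Model(inputs, outputs)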


ResNet (Deep Residual Learning)

  • He, Kaiming, et al. "Deep residual learning for image recognition." CVPR, 2016.
  • Plain net: the stacked layers are asked to learn the desired mapping H(x) directly.
  • Residual net: with a skip connection, the stacked layers learn the residual F(x) = H(x) - x, and the block outputs F(x) + x.
    • A skip connection is a direct connection between two non-consecutive layers.
    • It gives gradients a shortcut path, mitigating the vanishing-gradient problem.


  • If the identity mapping were optimal, it would be easy to push the residual weights toward zero.
  • If the optimal mapping is closer to identity, it is easier to learn the small fluctuations around the identity.
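
In code, a basic residual block can be sketched as follows (a minimal illustration, not the exact block from the paper): the stacked convolutions learn F(x), and the skip connection adds x back so the block outputs F(x) + x.

    from tensorflow.keras import layers

    def residual_block(x, filters):
        # residual branch F(x): two 3x3 convolutions
        f = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
        f = layers.Conv2D(filters, (3, 3), padding='same')(f)
        # identity skip connection: the block output is F(x) + x
        out = layers.Add()([f, x])
        return layers.Activation('relu')(out)

    # usage (assumes the incoming tensor already has `filters` channels so the shapes match)
    # x = residual_block(x, 64)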


Residual Net


Fully Convolutional Networks (FCNs)

  • To obtain a segmentation map (output), segmentation networks usually have two parts:
    • Downsampling path: captures semantic/contextual information
    • Upsampling path: recovers spatial information
  • The downsampling path is used to extract and interpret the context (what), while the upsampling path is used to enable precise localization (where).
  • Furthermore, to recover the fine-grained spatial information lost in the pooling or downsampling layers, skip connections are often used.
  • The network can work regardless of the original image size, since no stage requires a fixed number of units.
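
One plausible reading of the architecture in the figure below, sketched in Keras: a pretrained VGG16 backbone is kept fixed, its pool3/pool4 outputs (the maxp3/maxp4 feature maps) feed skip connections, and an FCN-8s-style head is trained on top. Input size, class count, and kernel sizes are assumptions, not the course's exact model.

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    num_classes = 2                                   # hypothetical number of categories

    # fixed (pretrained) backbone; pool3 / pool4 / pool5 feed the decoder
    vgg = tf.keras.applications.VGG16(include_top=False, input_shape=(224, 224, 3))
    vgg.trainable = False

    pool3 = vgg.get_layer('block3_pool').output       # 28x28
    pool4 = vgg.get_layer('block4_pool').output       # 14x14
    pool5 = vgg.get_layer('block5_pool').output       # 7x7

    # trained head: 1x1 convolutions give class scores at each resolution
    score5 = layers.Conv2D(num_classes, (1, 1))(pool5)
    score4 = layers.Conv2D(num_classes, (1, 1))(pool4)
    score3 = layers.Conv2D(num_classes, (1, 1))(pool3)

    # upsample the coarse predictions and merge them with the finer skip features
    up5 = layers.Conv2DTranspose(num_classes, (4, 4), strides=(2, 2), padding='same')(score5)       # 14x14
    fuse4 = layers.Add()([up5, score4])
    up4 = layers.Conv2DTranspose(num_classes, (4, 4), strides=(2, 2), padding='same')(fuse4)        # 28x28
    fuse3 = layers.Add()([up4, score3])

    # final x8 upsampling back to the input resolution gives per-pixel class scores
    outputs = layers.Conv2DTranspose(num_classes, (16, 16), strides=(8, 8), padding='same')(fuse3)  # 224x224
    fcn = Model(vgg.input, outputs)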


Segmented (Labeled) Images

[Figure: example input images and their segmented (labeled) outputs]

FCN Architecture

[Figure: FCN architecture; the pretrained convolutional backbone (fixed) supplies the maxp3 and maxp4 feature maps, and the fcn1-fcn4 layers in the upsampling path are trained]


Segmentation Result

[Figure: segmentation results; the input image, the segmentation output (using the maxp3 and maxp4 skip features), and the segmentation overlapped on the input]

Super-resolution and Deblurring

Prof. Seungchul Lee

Industrial AI Lab.


Image Restoration

  • Corruption in digital images arises during image acquisition (digitization) and transmission.
    • Imaging sensors can be affected by ambient conditions.
    • Interference can be added to an image during transmission.
  • Image restoration aims to recover the original image from the degraded one, using prior knowledge of the degradation process.


Inverse Problem

  • Inverse problems involve modeling the degradation and applying the inverse process in order to recover the original image from inadequate observations.
  • The observations contain incomplete information about the target parameters or data, due to physical limitations of the measurement devices.
  • Consequently, solutions to inverse problems are generally non-unique.
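
In symbols (notation added here for clarity, not taken from the slides), the observation y is a degraded version of the original image x,

    y = Hx + n

where H models the degradation (e.g. blurring or downsampling) and n is noise. Because H is typically not invertible, many different x are consistent with the same y; restoration therefore balances a data-fit term against a regularizer R(x) that encodes the prior knowledge:

    \hat{x} = \arg\min_{x} \; \| y - Hx \|^{2} + \lambda \, R(x)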


Image Super-resolution (SR)

  • Goal: restore a high-resolution (HR) image from a low-resolution (LR) image.
  • Numerous learning-based SR approaches have been proposed.


Lab: SR on Material Images


Build a FCN Model
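
A minimal sketch of what a fully convolutional SR model could look like, assuming grayscale inputs and a x2 upscaling done by a single transposed convolution; all layer sizes are assumptions, not the course's exact model.

    from tensorflow.keras import layers, models

    sr_model = models.Sequential([
        layers.Input(shape=(None, None, 1)),          # fully convolutional: any input size works
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        # learned x2 upsampling from the low-resolution grid to the high-resolution grid
        layers.Conv2DTranspose(32, (3, 3), strides=(2, 2), padding='same', activation='relu'),
        layers.Conv2D(1, (3, 3), padding='same'),     # high-resolution output
    ])
    sr_model.compile(optimizer='adam', loss='mse')

Because no dense layer fixes the spatial size, the same weights can be applied to material images of arbitrary resolution.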


Training
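
Continuing the sketch above: LR/HR training pairs are typically made by downsampling the HR images, and the network is fit with a pixel-wise MSE loss. The data here is a random stand-in; the lab uses its own material-image patches.

    import tensorflow as tf

    hr = tf.random.uniform((100, 64, 64, 1))                       # stand-in for HR training patches
    lr = tf.image.resize(hr, size=(32, 32), method='bicubic')      # synthesized LR inputs

    # epochs and batch size are illustrative
    sr_model.fit(lr, hr, epochs=50, batch_size=32)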


Result


Image Deblurring


Build a FCN Model
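
A sketch of one way the deblurring setup might look, assuming the degradation is a Gaussian blur and reusing the same all-convolutional style of model (this time with no resolution change between input and output). Names, blur strength, and layer sizes are assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from tensorflow.keras import layers, models

    # stand-in for the sharp training images; the lab uses its own material images
    sharp = np.random.rand(100, 64, 64, 1).astype('float32')

    # synthesize blurred inputs with a Gaussian kernel over the two spatial axes only
    blurred = np.stack([gaussian_filter(img, sigma=(2, 2, 0)) for img in sharp])

    # deblurring FCN: all convolutional, input and output have the same resolution
    deblur_model = models.Sequential([
        layers.Input(shape=(None, None, 1)),
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.Conv2D(1, (3, 3), padding='same'),
    ])
    deblur_model.compile(optimizer='adam', loss='mse')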


Training
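
Training then maps the blurred inputs back to the sharp targets with a pixel-wise MSE loss (continuing the sketch above).

    # epochs and batch size are illustrative
    deblur_model.fit(blurred, sharp, epochs=50, batch_size=32)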


Result
