
Convolutional Autoencoder

Industrial AI Lab.

Prof. Seungchul Lee


Convolutional Autoencoder

  • Motivation: can an autoencoder be applied to images?
  • A convolutional autoencoder extends the basic structure of the simple autoencoder by replacing the fully connected layers with convolutional layers.
    • the encoder network uses convolutional layers (for downsampling)
    • the decoder network uses transposed convolutional layers (for upsampling)
      • A transposed 2-D convolution layer upsamples feature maps.
      • This layer is sometimes incorrectly called a "deconvolution" or "deconv" layer.
      • It is the transpose of convolution and does not perform deconvolution.


[Figure: the encoder downsamples the input; the decoder upsamples it back to the original size]

tf.keras.layers.Conv2D

  • Encoder
  • Padding and stride

[Figure: convolution examples comparing padding = 'VALID' vs padding = 'SAME', and strides of 1 vs 2]
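
A minimal sketch (shapes and the 28x28 grayscale input are assumptions, not the course notebook) of how padding and strides determine the output shape of tf.keras.layers.Conv2D:

    import tensorflow as tf

    x = tf.random.normal((1, 28, 28, 1))    # (batch, height, width, channels); 28x28 is an assumed input size

    # 'valid' padding: no zero-padding, so a 3x3 kernel shrinks each spatial dimension by 2
    conv_valid = tf.keras.layers.Conv2D(32, (3, 3), strides=(1, 1), padding='valid')
    print(conv_valid(x).shape)              # (1, 26, 26, 32)

    # 'same' padding: zero-padded so that stride 1 preserves the spatial size
    conv_same = tf.keras.layers.Conv2D(32, (3, 3), strides=(1, 1), padding='same')
    print(conv_same(x).shape)               # (1, 28, 28, 32)

    # stride 2 halves the spatial resolution (the downsampling used in the encoder)
    conv_stride = tf.keras.layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same')
    print(conv_stride(x).shape)             # (1, 14, 14, 32)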

tf.keras.layers.Conv2DTranspose

  • Decoder
  • Padding and stride

[Figure: transposed-convolution examples comparing padding = 'VALID' vs padding = 'SAME', and strides of (1,1) vs (2,2)]
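
Likewise, a small sketch (illustrative shapes only, assuming a 7x7 encoded feature map) of how strides and padding set the output size of tf.keras.layers.Conv2DTranspose; with padding='same' and stride 2, the layer exactly doubles the spatial resolution:

    import tensorflow as tf

    z = tf.random.normal((1, 7, 7, 64))     # a small encoded feature map (7x7 is an assumed size)

    # stride 1, 'valid': the output grows by (kernel - 1): 7 -> 9
    up1 = tf.keras.layers.Conv2DTranspose(32, (3, 3), strides=(1, 1), padding='valid')
    print(up1(z).shape)                     # (1, 9, 9, 32)

    # stride 2, 'valid': output = (input - 1) * stride + kernel = 6 * 2 + 3 = 15
    up2 = tf.keras.layers.Conv2DTranspose(32, (3, 3), strides=(2, 2), padding='valid')
    print(up2(z).shape)                     # (1, 15, 15, 32)

    # stride 2, 'same': output = input * stride, i.e. the resolution is exactly doubled
    up3 = tf.keras.layers.Conv2DTranspose(32, (3, 3), strides=(2, 2), padding='same')
    print(up3(z).shape)                     # (1, 14, 14, 32)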


CAE Implementation

  • Fully convolutional
  • Note that no dense layer is used
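
A minimal fully convolutional autoencoder sketch, assuming 28x28 grayscale (MNIST-like) inputs scaled to [0, 1]; layer sizes are illustrative and not necessarily the course's exact model:

    from tensorflow.keras import layers, models

    # Encoder: stride-2 convolutions downsample 28x28 -> 14x14 -> 7x7
    encoder = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same', activation='relu'),
        layers.Conv2D(64, (3, 3), strides=(2, 2), padding='same', activation='relu'),
    ])

    # Decoder: stride-2 transposed convolutions upsample 7x7 -> 14x14 -> 28x28
    decoder = models.Sequential([
        layers.Input(shape=(7, 7, 64)),
        layers.Conv2DTranspose(32, (3, 3), strides=(2, 2), padding='same', activation='relu'),
        layers.Conv2DTranspose(1, (3, 3), strides=(2, 2), padding='same', activation='sigmoid'),
    ])

    # No dense layer anywhere: the whole model is convolutional
    cae = models.Sequential([encoder, decoder])
    cae.compile(optimizer='adam', loss='mse')

    # train_x is a hypothetical array of training images; the autoencoder reconstructs its own input
    # cae.fit(train_x, train_x, epochs=10, batch_size=128)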


Reconstruction Result


Fully Convolutional Network (FCN)

Industrial AI Lab.

Prof. Seungchul Lee


Deep Learning for Computer Vision: Review


Source: 6.S191 Intro. to Deep Learning at MIT


Segmentation

  • Segmentation differs from classification: it requires predicting a class for each pixel of the input image, instead of a single class for the whole input.
  • The goal is to segment images into regions with different semantic categories; these semantic regions label and predict objects at the pixel level.
  • Classification needs to understand what is in the input (namely, the context).
  • However, in order to predict a class for each pixel, segmentation needs to recover not only what is in the input, but also where it is.

[Figure from http://d2l.ai/]


Semantic Segmentation: FCNs

  • An FCN uses a convolutional neural network to transform image pixels into pixel categories.
  • The network is built entirely from convolutional layers, with down-sampling and up-sampling operations.
  • At each position in the spatial dimensions, the channel dimension of the output holds the category prediction for the pixel at that location.
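
To make the last point concrete, a small sketch (the feature-map shape and number of classes are assumptions): a 1x1 convolution turns the channel dimension into per-pixel class scores, and the prediction for each pixel is the argmax over channels.

    import tensorflow as tf

    num_classes = 5                                  # hypothetical number of semantic categories
    features = tf.random.normal((1, 64, 64, 128))    # feature map from the convolutional layers

    # a 1x1 convolution maps the channel dimension to per-pixel class scores
    classifier = tf.keras.layers.Conv2D(num_classes, (1, 1), padding='same')
    logits = classifier(features)                    # (1, 64, 64, num_classes)

    # the predicted category at each spatial position is the argmax over channels
    pred = tf.argmax(logits, axis=-1)                # (1, 64, 64)
    print(logits.shape, pred.shape)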


[Figure from http://d2l.ai/]


From CAE to FCN

[Figure: the CAE architecture compared side by side with the FCN architecture]


Skip Connection

  • A skip connection is a connection that bypasses at least one layer.

  • Here, it is often used to transfer local information by summing feature maps from the downsampling path with feature maps from the upsampling path.
    • Merging features from various resolution levels helps combine context information with spatial information.
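
A minimal functional-API sketch of such a skip connection (all shapes and filter counts are illustrative assumptions): a feature map from the downsampling path is summed with the same-resolution feature map in the upsampling path.

    from tensorflow.keras import layers, Input, Model

    inputs = Input(shape=(64, 64, 3))

    # downsampling path
    d1 = layers.Conv2D(32, (3, 3), strides=(2, 2), padding='same', activation='relu')(inputs)      # 32x32
    d2 = layers.Conv2D(64, (3, 3), strides=(2, 2), padding='same', activation='relu')(d1)          # 16x16

    # upsampling path
    u1 = layers.Conv2DTranspose(32, (3, 3), strides=(2, 2), padding='same', activation='relu')(d2) # 32x32

    # skip connection: sum the upsampled map with the encoder map of the same resolution
    s1 = layers.Add()([u1, d1])

    u2 = layers.Conv2DTranspose(16, (3, 3), strides=(2, 2), padding='same', activation='relu')(s1) # 64x64
    outputs = layers.Conv2D(1, (1, 1), padding='same')(u2)

    model = Model(inputs, outputs)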


ResNet (Deep Residual Learning)

  • He, Kaiming, et al. "Deep residual learning for image recognition." CVPR, 2016.
  • Plain net: the stacked layers are asked to learn the desired mapping H(x) directly.
  • Residual net: with a skip connection, the stacked layers learn the residual F(x) = H(x) - x, and the block outputs F(x) + x.
    • A skip connection is a direct connection between two non-consecutive layers.
    • It gives gradients a shortcut path, mitigating the vanishing-gradient problem.


  • If the identity mapping were optimal, it would be easy to push the residual weights toward zero.
  • If the optimal mapping is closer to identity, it is easier to learn the small fluctuations around the identity.
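
In code, a basic residual block can be sketched as follows (a minimal illustration, not the exact block from the paper): the stacked convolutions learn F(x), and the skip connection adds x back so the block outputs F(x) + x.

    from tensorflow.keras import layers

    def residual_block(x, filters):
        # residual branch F(x): two 3x3 convolutions
        f = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
        f = layers.Conv2D(filters, (3, 3), padding='same')(f)
        # identity skip connection: the block output is F(x) + x
        out = layers.Add()([f, x])
        return layers.Activation('relu')(out)

    # usage (assumes the incoming tensor already has `filters` channels so the shapes match)
    # x = residual_block(x, 64)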


Residual Net


Fully Convolutional Networks (FCNs)

  • To obtain a segmentation map (output), segmentation networks usually have two parts:
    • Downsampling path: captures semantic/contextual information
    • Upsampling path: recovers spatial information
  • The downsampling path is used to extract and interpret the context (what), while the upsampling path is used to enable precise localization (where).
  • Furthermore, to recover the fine-grained spatial information lost in the pooling or downsampling layers, skip connections are often used.
  • The network can work regardless of the original image size, since no stage requires a fixed number of units.
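
One plausible reading of the architecture in the figure below, sketched in Keras: a pretrained VGG16 backbone is kept fixed, its pool3/pool4 outputs (the maxp3/maxp4 feature maps) feed skip connections, and an FCN-8s-style head is trained on top. Input size, class count, and kernel sizes are assumptions, not the course's exact model.

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    num_classes = 2                                   # hypothetical number of categories

    # fixed (pretrained) backbone; pool3 / pool4 / pool5 feed the decoder
    vgg = tf.keras.applications.VGG16(include_top=False, input_shape=(224, 224, 3))
    vgg.trainable = False

    pool3 = vgg.get_layer('block3_pool').output       # 28x28
    pool4 = vgg.get_layer('block4_pool').output       # 14x14
    pool5 = vgg.get_layer('block5_pool').output       # 7x7

    # trained head: 1x1 convolutions give class scores at each resolution
    score5 = layers.Conv2D(num_classes, (1, 1))(pool5)
    score4 = layers.Conv2D(num_classes, (1, 1))(pool4)
    score3 = layers.Conv2D(num_classes, (1, 1))(pool3)

    # upsample the coarse predictions and merge them with the finer skip features
    up5 = layers.Conv2DTranspose(num_classes, (4, 4), strides=(2, 2), padding='same')(score5)       # 14x14
    fuse4 = layers.Add()([up5, score4])
    up4 = layers.Conv2DTranspose(num_classes, (4, 4), strides=(2, 2), padding='same')(fuse4)        # 28x28
    fuse3 = layers.Add()([up4, score3])

    # final x8 upsampling back to the input resolution gives per-pixel class scores
    outputs = layers.Conv2DTranspose(num_classes, (16, 16), strides=(8, 8), padding='same')(fuse3)  # 224x224
    fcn = Model(vgg.input, outputs)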


Segmented (Labeled) Images

[Figure: example input images and their segmented (labeled) outputs]

FCN Architecture

[Figure: FCN architecture; the pretrained convolutional backbone (fixed) supplies the maxp3 and maxp4 feature maps, and the fcn1-fcn4 layers in the upsampling path are trained]


Segmentation Result

[Figure: segmentation results; the input image, the segmentation output (using the maxp3 and maxp4 skip features), and the segmentation overlapped on the input]

Super-resolution and Deblurring

Prof. Seungchul Lee

Industrial AI Lab.


Image Restoration

  • Corruption in digital images arises during image acquisition (digitization) and transmission.
    • Imaging sensors can be affected by ambient conditions.
    • Interference can be added to an image during transmission.
  • Image restoration aims to recover the original image from the degraded one, using prior knowledge of the degradation process.


Inverse Problem

  • Inverse problems involve modeling the degradation and applying the inverse process in order to recover the original image from inadequate observations.
  • The observations contain incomplete information about the target parameters or data, due to physical limitations of the measurement devices.
  • Consequently, solutions to inverse problems are generally non-unique.
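
In symbols (notation added here for clarity, not taken from the slides), the observation y is a degraded version of the original image x,

    y = Hx + n

where H models the degradation (e.g. blurring or downsampling) and n is noise. Because H is typically not invertible, many different x are consistent with the same y; restoration therefore balances a data-fit term against a regularizer R(x) that encodes the prior knowledge:

    \hat{x} = \arg\min_{x} \; \| y - Hx \|^{2} + \lambda \, R(x)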


Image Super-resolution (SR)

  • Goal: restore a high-resolution (HR) image from a low-resolution (LR) image.
  • Numerous learning-based SR approaches have been proposed.


Lab: SR on Material Images


Build a FCN Model
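
A minimal sketch of what a fully convolutional SR model could look like, assuming grayscale inputs and a x2 upscaling done by a single transposed convolution; all layer sizes are assumptions, not the course's exact model.

    from tensorflow.keras import layers, models

    sr_model = models.Sequential([
        layers.Input(shape=(None, None, 1)),          # fully convolutional: any input size works
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        # learned x2 upsampling from the low-resolution grid to the high-resolution grid
        layers.Conv2DTranspose(32, (3, 3), strides=(2, 2), padding='same', activation='relu'),
        layers.Conv2D(1, (3, 3), padding='same'),     # high-resolution output
    ])
    sr_model.compile(optimizer='adam', loss='mse')

Because no dense layer fixes the spatial size, the same weights can be applied to material images of arbitrary resolution.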


Training
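
Continuing the sketch above: LR/HR training pairs are typically made by downsampling the HR images, and the network is fit with a pixel-wise MSE loss. The data here is a random stand-in; the lab uses its own material-image patches.

    import tensorflow as tf

    hr = tf.random.uniform((100, 64, 64, 1))                       # stand-in for HR training patches
    lr = tf.image.resize(hr, size=(32, 32), method='bicubic')      # synthesized LR inputs

    # epochs and batch size are illustrative
    sr_model.fit(lr, hr, epochs=50, batch_size=32)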


Result


Image Deblurring


Build a FCN Model
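
A sketch of one way the deblurring setup might look, assuming the degradation is a Gaussian blur and reusing the same all-convolutional style of model (this time with no resolution change between input and output). Names, blur strength, and layer sizes are assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from tensorflow.keras import layers, models

    # stand-in for the sharp training images; the lab uses its own material images
    sharp = np.random.rand(100, 64, 64, 1).astype('float32')

    # synthesize blurred inputs with a Gaussian kernel over the two spatial axes only
    blurred = np.stack([gaussian_filter(img, sigma=(2, 2, 0)) for img in sharp])

    # deblurring FCN: all convolutional, input and output have the same resolution
    deblur_model = models.Sequential([
        layers.Input(shape=(None, None, 1)),
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.Conv2D(1, (3, 3), padding='same'),
    ])
    deblur_model.compile(optimizer='adam', loss='mse')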


Training
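
Training then maps the blurred inputs back to the sharp targets with a pixel-wise MSE loss (continuing the sketch above).

    # epochs and batch size are illustrative
    deblur_model.fit(blurred, sharp, epochs=50, batch_size=32)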


Result
