1 of 97

Convolutional Neural Networks (CNN)

2 of 97

Convolution

2

3 of 97

Convolution

  • Convolution: the integral (or, for discrete signals, the sum) of the product of two signals after one of them is reversed and shifted
  • Cross-correlation is the same sliding operation without the reversal; the "convolutions" used in CNNs are actually cross-correlations (see the sketch below)
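
A minimal NumPy sketch of the difference; the tiny signal and kernel here are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy signal
k = np.array([1.0, 0.0, -1.0])       # toy kernel

print(np.convolve(x, k, mode="valid"))    # convolution: kernel reversed, then shifted -> [ 2.  2.]
print(np.correlate(x, k, mode="valid"))   # cross-correlation: kernel only shifted     -> [-2. -2.]
```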

3

4 of 97

1D Convolution

  • (actually cross-correlation)

4

Source: Dr. Francois Fleuret at EPFL

[Figure: input (1, 3, 2, 3, 0, -1, 1, 2, 2, 1) of length W = 10 and kernel (1, 3, 0, -1) of length w = 4; the first output value is 7, and the output length is L = W - w + 1.]

5 of 97

1D Convolution

  • (actually cross-correlation)

5

Source: Dr. Francois Fleuret at EPFL

[Figure: the kernel is shifted one step to the right along the same input; the first two output values are 7 and 9, and the output length is L = W - w + 1.]

6 of 97

1D Convolution

  • (actually cross-correlation)

6

Source: Dr. Francois Fleuret at EPFL

[Figure: the kernel is swept across the whole input, filling all L = W - w + 1 = 7 output values (the first three are 7, 9, 12).]
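
The figure's numbers can be checked with NumPy's cross-correlation (a minimal sketch, independent of any deep-learning framework):

```python
import numpy as np

x = np.array([1, 3, 2, 3, 0, -1, 1, 2, 2, 1], dtype=float)  # input, W = 10
k = np.array([1, 3, 0, -1], dtype=float)                    # kernel, w = 4

y = np.correlate(x, k, mode="valid")   # "valid" cross-correlation, as in the figure
print(len(y))    # L = W - w + 1 = 7
print(y[:3])     # [ 7.  9. 12.], matching the first output values in the figure
```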

7 of 97

Example: 1D Convolution

7

8 of 97

De-noising a Piecewise Smooth Signal


8

9 of 97

De-noising a Piecewise Smooth Signal

9

10 of 97

Edge Detection

10

11 of 97

Smoothing and Detection of Abrupt Changes

11

12 of 97

Images

12

13 of 97

Images Are Numbers

13

Source: 6.S191 Intro. to Deep Learning at MIT

14 of 97

Images

14

[Figure: the original color image, its R, G, and B channels, and the corresponding gray image.]

15 of 97

2D Convolution

15

16 of 97

Convolution on Image (= Convolution in 2D)

  • Filter (or Kernel)
    • Each output value is the element-wise product of the kernel with the image patch it covers, summed up
    • Modify or enhance an image by filtering
    • Filter images to emphasize certain features or remove others
    • Filtering includes smoothing, sharpening and edge enhancement (see the sketch below)
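
A minimal pure-NumPy sketch of this multiply-and-sum filtering; the toy 5×5 image and the 3×3 box-blur (smoothing) kernel are assumptions for illustration:

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image; each output value is the sum of an element-wise product."""
    H, W = image.shape
    h, w = kernel.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
box_blur = np.ones((3, 3)) / 9.0                   # averaging kernel = smoothing filter
print(correlate2d_valid(image, box_blur))          # 3x3 blurred output
```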

16

[Figure: an image convolved with a kernel produces the output.]

17 of 97

Convolution on Image (= Convolution in 2D)

17

18 of 97

Convolution on Image

18

[Figure: the kernel slides over the image to produce the output map.]

19 of 97

Convolution on Image

19

20 of 97

Gaussian Filter: Blurring

20

21 of 97

How to Find the Right Kernels

  • We have seen many hand-designed kernels, each producing a specific effect on an image

  • Let's take the opposite approach

  • Instead of designing the kernel by hand, we learn the kernel from data

  • A deep learning framework lets us learn the feature extractor itself from data (see the sketch below)
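
In a framework such as tf.keras the kernel is just a trainable weight tensor; a minimal sketch (the filter count, kernel size and 28×28 grayscale input are assumptions):

```python
import tensorflow as tf

# A convolution layer whose 3x3 kernels are learned from data rather than hand-designed.
layer = tf.keras.layers.Conv2D(filters=8, kernel_size=3, activation="relu")
_ = layer(tf.zeros((1, 28, 28, 1)))   # build the layer on a dummy grayscale input

for v in layer.trainable_variables:   # kernel of shape (3, 3, 1, 8) and bias of shape (8,)
    print(v.name, v.shape)
```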

21

22 of 97

Learning Visual Features

22

23 of 97

Convolutional Neural Networks (CNN)

  • Motivation
    • The bird occupies a local area and looks the same wherever it appears in an image. We should construct neural networks that exploit these properties (locality and translation invariance).

23

24 of 97

ANN Structure for Object Detection in Image

  • A fully connected structure does not seem to be the best choice
  • It makes no use of the fact that we are dealing with images

24

[Figure: an image of a bird fed to the network, whose target output label is "bird".]

25 of 97

Fully Connected Neural Network

  • Input
    • 2D image
    • Vector of pixel values

  • Fully connected
    • Each neuron in the hidden layer is connected to all neurons in the input layer
    • No spatial information
    • The spatial organization of the input is destroyed by flattening
    • And many, many parameters! (a rough count is sketched below)

  • How can we use spatial structure in the input to inform the architecture of the network?
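
To make the parameter count concrete, a rough comparison under assumed sizes (200×200×3 input, 1000 hidden units, 64 kernels of size 3×3; none of these numbers come from the slide):

```python
# Fully connected: flatten the image and connect every pixel to every hidden unit.
H, W, C = 200, 200, 3      # assumed input size
hidden = 1000              # assumed hidden-layer width
fc_params = H * W * C * hidden + hidden
print(fc_params)           # 120,001,000 weights for a single hidden layer

# Convolutional: 64 kernels of size 3x3x3, shared across all spatial positions.
conv_params = 3 * 3 * C * 64 + 64
print(conv_params)         # 1,792 weights, independent of the image size
```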

25

Source: 6.S191 Intro. to Deep Learning at MIT

26 of 97

Convolution Mask + Neural Network

26

27 of 97

Locality

  • Locality: objects tend to have a local spatial support
    • fully-connected layer → locally-connected layer

27

29 of 97

Deep Artificial Neural Networks

  • Universal function approximator
    • Simple nonlinear neurons
    • Linear connected networks

  • Hidden layers
    • Autonomous feature learning

29

[Figure: a deep network mapping the input to two outputs, Class 1 and Class 2.]

30 of 97

Convolutional Neural Networks

  • Structure
    • Weight sharing
    • Local connectivity
    • Typically have sparse interactions

  • Optimization
    • Smaller search space

30

[Figure: a convolutional network mapping the input to two outputs, Class 1 and Class 2.]

31 of 97

Multiple Filters (or Kernels)

31

32 of 97

Channels

  • Color image = tensor of shape (height, width, channels)
  • Convolutions are usually computed for each channel and then summed over channels (see the sketch below)
  • Kernel size aka receptive field (usually 1, 3, 5, 7, 11)
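
A minimal NumPy sketch of this per-channel multiply-and-sum, using the C × H × W / c × h × w notation of the following figures (the concrete sizes are assumptions):

```python
import numpy as np

C, H, W = 3, 6, 8                  # input: C channels of spatial size H x W
h, w = 3, 3                        # kernel spatial size; the kernel also has C channels
x = np.random.randn(C, H, W)
k = np.random.randn(C, h, w)

out = np.zeros((H - h + 1, W - w + 1))       # one kernel -> one output channel
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        out[i, j] = np.sum(x[:, i:i + h, j:j + w] * k)   # multiply and sum over all channels
print(out.shape)                             # (4, 6) = (H - h + 1, W - w + 1)
```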

32

Source: Dr. Francois Fleuret at EPFL

33 of 97

Multi-channel 2D Convolution

33

Source: Dr. Francois Fleuret at EPFL

[Figure: input tensor of size C × H × W.]

34 of 97

Multi-channel 2D Convolution

34

Source: Dr. Francois Fleuret at EPFL

[Figure: a kernel of size c × h × w positioned on the C × H × W input.]

44 of 97

Multi-channel 2D Convolution

44

[Figure: sweeping one kernel of size c × h × w over the C × H × W input produces an output of size 1 × (H - h + 1) × (W - w + 1).]

Source: Dr. Francois Fleuret at EPFL

45 of 97

Multi-channel and Multi-kernel 2D Convolution

45

Source: Dr. Francois Fleuret at EPFL

[Figure: D kernels of size c × h × w applied to the C × H × W input produce an output of size D × (H - h + 1) × (W - w + 1).]

46 of 97

Dealing with Shapes


46

Source: Dr. Francois Fleuret at EPFL

47 of 97

Multi-channel 2D Convolution

  • The kernel is not swept across channels, just across rows and columns.
  • Note that a convolution preserves the signal support structure.
    • A 1D signal is converted into a 1D signal, a 2D signal into a 2D signal, and neighboring parts of the input signal influence neighboring parts of the output signal.

  • We usually refer to one of the channels generated by a convolution layer as an activation map (see the shape check below).
  • The sub-area of an input map that influences a component of the output is called the receptive field of the latter.
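
A quick shape check in tf.keras, consistent with the earlier figures: D kernels give D activation maps of size (H - h + 1) × (W - w + 1) when no padding is used (the concrete sizes below are assumptions):

```python
import tensorflow as tf

H, W, C, D = 32, 32, 3, 16                  # assumed input size and number of kernels
x = tf.zeros((1, H, W, C))                  # tf.keras uses a channels-last layout
y = tf.keras.layers.Conv2D(filters=D, kernel_size=5, padding="valid")(x)
print(y.shape)                              # (1, 28, 28, 16) = (1, H-5+1, W-5+1, D)
```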

47

48 of 97

Padding and Stride

48

49 of 97

Strides

  • Strides: increment step size for the convolution operator
  • Reduces the size of the output map: with stride s and no padding, each output spatial dimension becomes floor((W - w) / s) + 1

49

Example with kernel size 3×3 and a stride of 2 (image in blue)

Source: https://github.com/vdumoulin/conv_arithmetic

50 of 97

Padding

  • Padding: artificially fill borders of image
  • Useful to keep spatial dimension constant across filters
  • Useful with strides and large receptive fields
  • Usually filled with 0s (zero padding); see the shape check below
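
A quick tf.keras check of how padding and stride change the output size (the 28×28 input and 3×3 kernel are assumptions); with padding p and stride s the general rule is floor((W - w + 2p) / s) + 1 per spatial dimension:

```python
import tensorflow as tf

x = tf.zeros((1, 28, 28, 1))                                           # assumed 28x28 input
valid = tf.keras.layers.Conv2D(8, 3, strides=2, padding="valid")(x)    # no padding, stride 2
same  = tf.keras.layers.Conv2D(8, 3, strides=1, padding="same")(x)     # zero-padded, stride 1
print(valid.shape)   # (1, 13, 13, 8): floor((28 - 3) / 2) + 1 = 13
print(same.shape)    # (1, 28, 28, 8): spatial size preserved
```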

50

Source: https://github.com/vdumoulin/conv_arithmetic

51 of 97

Padding and Stride

51

Source: Dr. Francois Fleuret at EPFL

[Figure: the kernel sweeps a zero-padded input with a given stride, producing the output.]

61 of 97

Nonlinear Activation Function

61

62 of 97

Pooling

62

63 of 97

Pooling

  • Compute a maximum value in a sliding window (max pooling)
  • Reduce spatial resolution for faster computation
  • Achieve invariance to local translation
  • Max pooling introduces invariances
    • Pooling size: 2×2
    • No parameters: just the max (or average) over each 2×2 block

63

64 of 97

Pooling

  • The most standard type of pooling is the max-pooling, which computes max values over non-overlapping blocks
  • For instance in 1D with a window of size 2

64

Source: Dr. Francois Fleuret at EPFL

[Figure: the input (1, 3, 2, 3, 0, -1, 1, 2, 2, 1) split into non-overlapping windows of size 2.]

65 of 97

Pooling

65

Source: Dr. Francois Fleuret at EPFL

[Figure: the first window (1, 3) is mapped to its maximum, 3.]

66 of 97

Pooling

66

Source: Dr. Francois Fleuret at EPFL

[Figure: max pooling with window size 2 maps the input (1, 3, 2, 3, 0, -1, 1, 2, 2, 1) to the output (3, 3, 0, 2, 2).]

67 of 97

Pooling

  • Such an operation aims at grouping several activations into a single, "more meaningful" one.

  • Average pooling computes the average value per block instead of the max value (see the sketch below)

67

Source: Dr. Francois Fleuret at EPFL

[Figure: the same example; max pooling with window size 2 gives (3, 3, 0, 2, 2).]
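
A minimal NumPy sketch reproducing the figure's example, for both max and average pooling with non-overlapping windows of size 2:

```python
import numpy as np

x = np.array([1, 3, 2, 3, 0, -1, 1, 2, 2, 1], dtype=float)
blocks = x.reshape(-1, 2)       # non-overlapping windows of size 2

print(blocks.max(axis=1))       # max pooling:     [3. 3. 0. 2. 2.]
print(blocks.mean(axis=1))      # average pooling: [2.  2.5 -0.5  1.5  1.5]
```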

68 of 97

Pooling: Invariance

  • Pooling provides invariance to any permutation inside one of the cells
  • More practically, it provides a pseudo-invariance to deformations that result in local translations

68

Source: Dr. Francois Fleuret at EPFL

70 of 97

Multi-channel Pooling

70

Source: Dr. Francois Fleuret at EPFL

[Figure: a C-channel input whose height and width are labelled s·h and r·w.]

84 of 97

Multi-channel Pooling

84

Source: Dr. Francois Fleuret at EPFL

[Figure: pooling each channel over non-overlapping h × w blocks yields an output of size C × s × r; the number of channels is unchanged.]

85 of 97

Inside the Convolution Layer Block

85

Conv blocks

86 of 97

Classic ConvNet Architecture

  • Input

  • Conv blocks
    • Convolution + activation (relu)
    • Convolution + activation (relu)
    • ...
    • Maxpooling

  • Output
    • Fully connected layers
    • Softmax (see the tf.keras sketch below)
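
A minimal tf.keras sketch of this architecture; the filter counts and the 28×28 grayscale input are assumptions (chosen to match the MNIST lab later in the deck):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),              # assumed grayscale input
    # Conv block: convolution + activation (relu), twice, then max pooling
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    # Output: fully connected layers + softmax
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```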

86

87 of 97

CNNs for Classification: Feature Learning

  • Learn features in input image through convolution
  • Introduce non-linearity through activation function (real-world data is non-linear!)
  • Reduce dimensionality and preserve spatial invariance with pooling

87

Source: 6.S191 Intro. to Deep Learning at MIT

88 of 97

CNNs for Classification: Class Probabilities

  • CONV and POOL layers output high-level features of input
  • Fully connected layer uses these features for classifying input image
  • Express output as probability of image belonging to a particular class

88

Source: 6.S191 Intro. to Deep Learning at MIT

89 of 97

CNNs: Training with Backpropagation

  • Learn weights for convolutional filters and fully connected layers
  • Backpropagation with a cross-entropy loss (one explicit gradient step is sketched below)
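
One training step written out with tf.GradientTape, to make the cross-entropy loss and the backpropagated gradients explicit (the small model and the random batch are illustrative assumptions):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

images = tf.random.uniform((8, 28, 28, 1))                   # dummy batch of "images"
labels = tf.random.uniform((8,), maxval=10, dtype=tf.int32)  # dummy class labels

with tf.GradientTape() as tape:
    probs = model(images, training=True)     # forward pass
    loss = loss_fn(labels, probs)            # cross-entropy loss
grads = tape.gradient(loss, model.trainable_variables)             # backpropagation
optimizer.apply_gradients(zip(grads, model.trainable_variables))   # weight update
```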

89

Source: 6.S191 Intro. to Deep Learning at MIT

90 of 97

CNN in TensorFlow

90

91 of 97

Lab: CNN with TensorFlow

  • MNIST example
  • To classify handwritten digits

91

92 of 97

CNN Structure

92

93 of 97

Loss and Optimizer

  • Loss
    • Classification: Cross entropy
    • Equivalent to applying logistic regression
  • Optimizer
    • GradientDescentOptimizer
    • AdamOptimizer: the most popular optimizer (tf.keras equivalents are sketched below)
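
The slide uses the TF1 optimizer names; a minimal sketch of the tf.keras equivalents and how they would be wired into a model's compile step (the commented line assumes a tf.keras model such as the one sketched earlier):

```python
import tensorflow as tf

sgd  = tf.keras.optimizers.SGD(learning_rate=0.01)      # GradientDescentOptimizer
adam = tf.keras.optimizers.Adam()                       # AdamOptimizer, a common default
loss = tf.keras.losses.SparseCategoricalCrossentropy()  # cross-entropy for classification

# model.compile(optimizer=adam, loss=loss, metrics=["accuracy"])
```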

93

94 of 97

Test or Evaluation

94

95 of 97

CNN for Steel Surface Defects

95

96 of 97

Steel Surface Defects

  • NEU steel surface defects example

96

97 of 97

CNN with TensorFlow

97