1 of 97

Convolutional Neural Networks (CNN)

2 of 97

Convolution

2

3 of 97

Convolution

  • Convolution: the integral (or, for discrete signals, the sum) of the product of two signals after one of them is reversed and shifted
  • Cross-correlation is the same sliding operation without the reversal; the "convolutions" used in CNNs are actually cross-correlations (see the sketch below)
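
A minimal NumPy sketch of the difference; the tiny signal and kernel here are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy signal
k = np.array([1.0, 0.0, -1.0])       # toy kernel

print(np.convolve(x, k, mode="valid"))    # convolution: kernel reversed, then shifted -> [ 2.  2.]
print(np.correlate(x, k, mode="valid"))   # cross-correlation: kernel only shifted     -> [-2. -2.]
```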

3

4 of 97

1D Convolution

  • (actually cross-correlation)

4

Source: Dr. Francois Fleuret at EPFL

[Figure: input (1, 3, 2, 3, 0, -1, 1, 2, 2, 1) of length W = 10 and kernel (1, 3, 0, -1) of length w = 4; the first output value is 7, and the output length is L = W - w + 1.]

5 of 97

1D Convolution

  • (actually cross-correlation)

5

Source: Dr. Francois Fleuret at EPFL

[Figure: the kernel is shifted one step to the right along the same input; the first two output values are 7 and 9, and the output length is L = W - w + 1.]

6 of 97

1D Convolution

  • (actually cross-correlation)

6

Source: Dr. Francois Fleuret at EPFL

[Figure: the kernel is swept across the whole input, filling all L = W - w + 1 = 7 output values (the first three are 7, 9, 12).]
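
The figure's numbers can be checked with NumPy's cross-correlation (a minimal sketch, independent of any deep-learning framework):

```python
import numpy as np

x = np.array([1, 3, 2, 3, 0, -1, 1, 2, 2, 1], dtype=float)  # input, W = 10
k = np.array([1, 3, 0, -1], dtype=float)                    # kernel, w = 4

y = np.correlate(x, k, mode="valid")   # "valid" cross-correlation, as in the figure
print(len(y))    # L = W - w + 1 = 7
print(y[:3])     # [ 7.  9. 12.], matching the first output values in the figure
```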

7 of 97

Example: 1D Convolution

7

8 of 97

De-noising a Piecewise Smooth Signal


8

9 of 97

De-noising a Piecewise Smooth Signal

9

10 of 97

Edge Detection

10

11 of 97

Smoothing and Detection of Abrupt Changes

11

12 of 97

Images

12

13 of 97

Images Are Numbers

13

Source: 6.S191 Intro. to Deep Learning at MIT

14 of 97

Images

14

[Figure: the original color image, its R, G, and B channels, and the corresponding gray image.]

15 of 97

2D Convolution

15

16 of 97

Convolution on Image (= Convolution in 2D)

  • Filter (or Kernel)
    • Each output value is the element-wise product of the kernel with the image patch it covers, summed up
    • Modify or enhance an image by filtering
    • Filter images to emphasize certain features or remove others
    • Filtering includes smoothing, sharpening and edge enhancement (see the sketch below)
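
A minimal pure-NumPy sketch of this multiply-and-sum filtering; the toy 5×5 image and the 3×3 box-blur (smoothing) kernel are assumptions for illustration:

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image; each output value is the sum of an element-wise product."""
    H, W = image.shape
    h, w = kernel.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
box_blur = np.ones((3, 3)) / 9.0                   # averaging kernel = smoothing filter
print(correlate2d_valid(image, box_blur))          # 3x3 blurred output
```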

16

[Figure: an image convolved with a kernel produces the output.]

17 of 97

Convolution on Image (= Convolution in 2D)

17

18 of 97

Convolution on Image

18

[Figure: the kernel slides over the image to produce the output map.]

19 of 97

Convolution on Image

19

20 of 97

Gaussian Filter: Blurring

20

21 of 97

How to Find the Right Kernels

  • We have seen many hand-designed kernels, each producing a specific effect on an image

  • Let's take the opposite approach

  • Instead of designing the kernel by hand, we learn the kernel from data

  • A deep learning framework lets us learn the feature extractor itself from data (see the sketch below)
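
In a framework such as tf.keras the kernel is just a trainable weight tensor; a minimal sketch (the filter count, kernel size and 28×28 grayscale input are assumptions):

```python
import tensorflow as tf

# A convolution layer whose 3x3 kernels are learned from data rather than hand-designed.
layer = tf.keras.layers.Conv2D(filters=8, kernel_size=3, activation="relu")
_ = layer(tf.zeros((1, 28, 28, 1)))   # build the layer on a dummy grayscale input

for v in layer.trainable_variables:   # kernel of shape (3, 3, 1, 8) and bias of shape (8,)
    print(v.name, v.shape)
```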

21

22 of 97

Learning Visual Features

22

23 of 97

Convolutional Neural Networks (CNN)

  • Motivation
    • The bird occupies a local area and looks the same wherever it appears in an image. We should construct neural networks that exploit these properties (locality and translation invariance).

23

24 of 97

ANN Structure for Object Detection in Image

  • A fully connected structure does not seem to be the best choice
  • It makes no use of the fact that we are dealing with images

24

[Figure: an image of a bird fed to the network, whose target output label is "bird".]

25 of 97

Fully Connected Neural Network

  • Input
    • 2D image
    • Vector of pixel values

  • Fully connected
    • Each neuron in the hidden layer is connected to all neurons in the input layer
    • No spatial information
    • The spatial organization of the input is destroyed by flattening
    • And many, many parameters! (a rough count is sketched below)

  • How can we use spatial structure in the input to inform the architecture of the network?
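
To make the parameter count concrete, a rough comparison under assumed sizes (200×200×3 input, 1000 hidden units, 64 kernels of size 3×3; none of these numbers come from the slide):

```python
# Fully connected: flatten the image and connect every pixel to every hidden unit.
H, W, C = 200, 200, 3      # assumed input size
hidden = 1000              # assumed hidden-layer width
fc_params = H * W * C * hidden + hidden
print(fc_params)           # 120,001,000 weights for a single hidden layer

# Convolutional: 64 kernels of size 3x3x3, shared across all spatial positions.
conv_params = 3 * 3 * C * 64 + 64
print(conv_params)         # 1,792 weights, independent of the image size
```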

25

Source: 6.S191 Intro. to Deep Learning at MIT

26 of 97

Convolution Mask + Neural Network

26

27 of 97

Locality

  • Locality: objects tend to have a local spatial support
    • fully-connected layer → locally-connected layer

27

29 of 97

Deep Artificial Neural Networks

  • Universal function approximator
    • Simple nonlinear neurons
    • Linear connected networks

  • Hidden layers
    • Autonomous feature learning

29

[Figure: a deep network mapping the input to two outputs, Class 1 and Class 2.]

30 of 97

Convolutional Neural Networks

  • Structure
    • Weight sharing
    • Local connectivity
    • Typically have sparse interactions

  • Optimization
    • Smaller search space

30

[Figure: a convolutional network mapping the input to two outputs, Class 1 and Class 2.]

31 of 97

Multiple Filters (or Kernels)

31

32 of 97

Channels

  • Color image = tensor of shape (height, width, channels)
  • Convolutions are usually computed for each channel and then summed over channels (see the sketch below)
  • Kernel size aka receptive field (usually 1, 3, 5, 7, 11)
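
A minimal NumPy sketch of this per-channel multiply-and-sum, using the C × H × W / c × h × w notation of the following figures (the concrete sizes are assumptions):

```python
import numpy as np

C, H, W = 3, 6, 8                  # input: C channels of spatial size H x W
h, w = 3, 3                        # kernel spatial size; the kernel also has C channels
x = np.random.randn(C, H, W)
k = np.random.randn(C, h, w)

out = np.zeros((H - h + 1, W - w + 1))       # one kernel -> one output channel
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        out[i, j] = np.sum(x[:, i:i + h, j:j + w] * k)   # multiply and sum over all channels
print(out.shape)                             # (4, 6) = (H - h + 1, W - w + 1)
```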

32

Source: Dr. Francois Fleuret at EPFL

33 of 97

Multi-channel 2D Convolution

33

Source: Dr. Francois Fleuret at EPFL

[Figure: input tensor of size C × H × W.]

34 of 97

Multi-channel 2D Convolution

34

Source: Dr. Francois Fleuret at EPFL

[Figure: a kernel of size c × h × w positioned on the C × H × W input.]

44 of 97

Multi-channel 2D Convolution

44

[Figure: sweeping one kernel of size c × h × w over the C × H × W input produces an output of size 1 × (H - h + 1) × (W - w + 1).]

Source: Dr. Francois Fleuret at EPFL

45 of 97

Multi-channel and Multi-kernel 2D Convolution

45

Source: Dr. Francois Fleuret at EPFL

[Figure: D kernels of size c × h × w applied to the C × H × W input produce an output of size D × (H - h + 1) × (W - w + 1).]

46 of 97

Dealing with Shapes


46

Source: Dr. Francois Fleuret at EPFL

47 of 97

Multi-channel 2D Convolution

  • The kernel is not swept across channels, just across rows and columns.
  • Note that a convolution preserves the signal support structure.
    • A 1D signal is converted into a 1D signal, a 2D signal into a 2D signal, and neighboring parts of the input signal influence neighboring parts of the output signal.

  • We usually refer to one of the channels generated by a convolution layer as an activation map (see the shape check below).
  • The sub-area of an input map that influences a component of the output is called the receptive field of the latter.
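
A quick shape check in tf.keras, consistent with the earlier figures: D kernels give D activation maps of size (H - h + 1) × (W - w + 1) when no padding is used (the concrete sizes below are assumptions):

```python
import tensorflow as tf

H, W, C, D = 32, 32, 3, 16                  # assumed input size and number of kernels
x = tf.zeros((1, H, W, C))                  # tf.keras uses a channels-last layout
y = tf.keras.layers.Conv2D(filters=D, kernel_size=5, padding="valid")(x)
print(y.shape)                              # (1, 28, 28, 16) = (1, H-5+1, W-5+1, D)
```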

47

48 of 97

Padding and Stride

48

49 of 97

Strides

  • Strides: increment step size for the convolution operator
  • Reduces the size of the output map: with stride s and no padding, each output spatial dimension becomes floor((W - w) / s) + 1

49

Example with kernel size 3×3 and a stride of 2 (image in blue)

Source: https://github.com/vdumoulin/conv_arithmetic

50 of 97

Padding

  • Padding: artificially fill borders of image
  • Useful to keep spatial dimension constant across filters
  • Useful with strides and large receptive fields
  • Usually filled with 0s (zero padding); see the shape check below
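
A quick tf.keras check of how padding and stride change the output size (the 28×28 input and 3×3 kernel are assumptions); with padding p and stride s the general rule is floor((W - w + 2p) / s) + 1 per spatial dimension:

```python
import tensorflow as tf

x = tf.zeros((1, 28, 28, 1))                                           # assumed 28x28 input
valid = tf.keras.layers.Conv2D(8, 3, strides=2, padding="valid")(x)    # no padding, stride 2
same  = tf.keras.layers.Conv2D(8, 3, strides=1, padding="same")(x)     # zero-padded, stride 1
print(valid.shape)   # (1, 13, 13, 8): floor((28 - 3) / 2) + 1 = 13
print(same.shape)    # (1, 28, 28, 8): spatial size preserved
```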

50

Source: https://github.com/vdumoulin/conv_arithmetic

51 of 97

Padding and Stride

51

Source: Dr. Francois Fleuret at EPFL

[Figure: the kernel sweeps a zero-padded input with a given stride, producing the output.]

61 of 97

Nonlinear Activation Function

61

62 of 97

Pooling

62

63 of 97

Pooling

  • Compute a maximum value in a sliding window (max pooling)
  • Reduce spatial resolution for faster computation
  • Achieve invariance to local translation
  • Max pooling introduces invariances
    • Pooling size: 2×2
    • No parameters: just the max (or average) over each 2×2 block

63

64 of 97

Pooling

  • The most standard type of pooling is the max-pooling, which computes max values over non-overlapping blocks
  • For instance in 1D with a window of size 2

64

Source: Dr. Francois Fleuret at EPFL

[Figure: the input (1, 3, 2, 3, 0, -1, 1, 2, 2, 1) split into non-overlapping windows of size 2.]

65 of 97

Pooling

65

Source: Dr. Francois Fleuret at EPFL

[Figure: the first window (1, 3) is mapped to its maximum, 3.]

66 of 97

Pooling

66

Source: Dr. Francois Fleuret at EPFL

[Figure: max pooling with window size 2 maps the input (1, 3, 2, 3, 0, -1, 1, 2, 2, 1) to the output (3, 3, 0, 2, 2).]

67 of 97

Pooling

  • Such an operation aims at grouping several activations into a single, "more meaningful" one.

  • Average pooling computes the average value per block instead of the max value (see the sketch below)

67

Source: Dr. Francois Fleuret at EPFL

[Figure: the same example; max pooling with window size 2 gives (3, 3, 0, 2, 2).]
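
A minimal NumPy sketch reproducing the figure's example, for both max and average pooling with non-overlapping windows of size 2:

```python
import numpy as np

x = np.array([1, 3, 2, 3, 0, -1, 1, 2, 2, 1], dtype=float)
blocks = x.reshape(-1, 2)       # non-overlapping windows of size 2

print(blocks.max(axis=1))       # max pooling:     [3. 3. 0. 2. 2.]
print(blocks.mean(axis=1))      # average pooling: [2.  2.5 -0.5  1.5  1.5]
```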

68 of 97

Pooling: Invariance

  • Pooling provides invariance to any permutation inside one of the cells
  • More practically, it provides a pseudo-invariance to deformations that result in local translations

68

Source: Dr. Francois Fleuret at EPFL

70 of 97

Multi-channel Pooling

70

Source: Dr. Francois Fleuret at EPFL

[Figure: a C-channel input whose height and width are labelled s·h and r·w.]

84 of 97

Multi-channel Pooling

84

Source: Dr. Francois Fleuret at EPFL

[Figure: pooling each channel over non-overlapping h × w blocks yields an output of size C × s × r; the number of channels is unchanged.]

85 of 97

Inside the Convolution Layer Block

85

Conv blocks

86 of 97

Classic ConvNet Architecture

  • Input

  • Conv blocks
    • Convolution + activation (relu)
    • Convolution + activation (relu)
    • ...
    • Maxpooling

  • Output
    • Fully connected layers
    • Softmax (see the tf.keras sketch below)
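
A minimal tf.keras sketch of this architecture; the filter counts and the 28×28 grayscale input are assumptions (chosen to match the MNIST lab later in the deck):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),              # assumed grayscale input
    # Conv block: convolution + activation (relu), twice, then max pooling
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    # Output: fully connected layers + softmax
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```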

86

87 of 97

CNNs for Classification: Feature Learning

  • Learn features in input image through convolution
  • Introduce non-linearity through activation function (real-world data is non-linear!)
  • Reduce dimensionality and preserve spatial invariance with pooling

87

Source: 6.S191 Intro. to Deep Learning at MIT

88 of 97

CNNs for Classification: Class Probabilities

  • CONV and POOL layers output high-level features of input
  • Fully connected layer uses these features for classifying input image
  • Express output as probability of image belonging to a particular class

88

Source: 6.S191 Intro. to Deep Learning at MIT

89 of 97

CNNs: Training with Backpropagation

  • Learn weights for convolutional filters and fully connected layers
  • Backpropagation with a cross-entropy loss (one explicit gradient step is sketched below)
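
One training step written out with tf.GradientTape, to make the cross-entropy loss and the backpropagated gradients explicit (the small model and the random batch are illustrative assumptions):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

images = tf.random.uniform((8, 28, 28, 1))                   # dummy batch of "images"
labels = tf.random.uniform((8,), maxval=10, dtype=tf.int32)  # dummy class labels

with tf.GradientTape() as tape:
    probs = model(images, training=True)     # forward pass
    loss = loss_fn(labels, probs)            # cross-entropy loss
grads = tape.gradient(loss, model.trainable_variables)             # backpropagation
optimizer.apply_gradients(zip(grads, model.trainable_variables))   # weight update
```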

89

Source: 6.S191 Intro. to Deep Learning at MIT

90 of 97

CNN in TensorFlow

90

91 of 97

Lab: CNN with TensorFlow

  • MNIST example
  • To classify handwritten digits

91

92 of 97

CNN Structure

92

93 of 97

Loss and Optimizer

  • Loss
    • Classification: Cross entropy
    • Equivalent to applying logistic regression
  • Optimizer
    • GradientDescentOptimizer
    • AdamOptimizer: the most popular optimizer (tf.keras equivalents are sketched below)
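
The slide uses the TF1 optimizer names; a minimal sketch of the tf.keras equivalents and how they would be wired into a model's compile step (the commented line assumes a tf.keras model such as the one sketched earlier):

```python
import tensorflow as tf

sgd  = tf.keras.optimizers.SGD(learning_rate=0.01)      # GradientDescentOptimizer
adam = tf.keras.optimizers.Adam()                       # AdamOptimizer, a common default
loss = tf.keras.losses.SparseCategoricalCrossentropy()  # cross-entropy for classification

# model.compile(optimizer=adam, loss=loss, metrics=["accuracy"])
```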

93

94 of 97

Test or Evaluation

94

95 of 97

CNN for Steel Surface Defects

95

96 of 97

Steel Surface Defects

  • NEU steel surface defects example

96

97 of 97

CNN with TensorFlow

97