ME 5990
Image Recognition: Convolution, Pooling
Partial slides by Gang Hua, Wormpex AI
Outline
AlexNet
[0.86, 0.14]
[0.14, 0.86]
Loss Function
Training Labels
Loss
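The loss slide compares predicted class probabilities such as [0.86, 0.14] with training labels. A common classification loss is cross-entropy. The sketch below assumes that loss and a hypothetical one-hot label [1, 0] (image belongs to class 0); neither choice is stated on the slide.

```python
import numpy as np

def cross_entropy(pred, label, eps=1e-12):
    """Cross-entropy between a predicted probability vector and a one-hot label."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0)  # avoid log(0)
    label = np.asarray(label, dtype=float)
    return float(-np.sum(label * np.log(pred)))

label = [1.0, 0.0]  # hypothetical training label (assumption): class 0
print(cross_entropy([0.86, 0.14], label))  # confident and correct: small loss
print(cross_entropy([0.14, 0.86], label))  # confident and wrong: large loss
```

Only the probability assigned to the true class contributes, so the loss falls as that probability approaches 1.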
Special effects: shape and motion capture
Source: S. Seitz
3D visualization: Microsoft Photosynth
Source: S. Seitz
Optical character recognition (OCR)
Automatic check processing
Source: S. Seitz
Biometrics
Fingerprint scanners on many new laptops and other devices
Face recognition systems now beginning to appear more widely
http://www.sensiblevision.com/
Source: S. Seitz
Biometrics
Source: S. Seitz
Mobile visual search
Lincoln, Microsoft Research
Face detection
Source: S. Seitz
Smile detection
Source: S. Seitz
Face annotation
Windows Live Photo Gallery
Automotive safety
Source: A. Shashua, S. Seitz
Vision for robotics, space exploration
Vision systems (JPL) used for several tasks
NASA's Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.
Source: S. Seitz
Outline
Image Format: Very Brief Idea
https://pursuit.unimelb.edu.au/articles/it-s-time-to-retire-lena-from-computer-science
https://www.geeksforgeeks.org/matlab-rgb-image-representation/
Image Processing: Convolution
Convolution: what we learned previously
Convolution
Source: F. Durand
Convolution
Demonstration
Demonstration
Annoying details
[Figure: filter g placed over image f, showing the three output-size modes: full, same, valid]
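The three modes differ only in how much of the result is kept. Below is a minimal NumPy sketch, assuming a 2D image f and kernel g; the function name `conv2d` is illustrative, not from the slides. It computes the full convolution (kernel flipped) and then crops it to the requested mode.

```python
import numpy as np

def conv2d(f, g, mode="full"):
    """2D convolution of image f with kernel g (kernel flipped).

    full  -> (H+kh-1) x (W+kw-1): every position where g overlaps f
    same  -> H x W, cropped from the center of the full result
    valid -> (H-kh+1) x (W-kw+1): g lies entirely inside f
    """
    f = np.asarray(f, dtype=float)
    g = np.flip(np.asarray(g, dtype=float))  # flip both axes for true convolution
    H, W = f.shape
    kh, kw = g.shape
    padded = np.pad(f, ((kh - 1, kh - 1), (kw - 1, kw - 1)))  # zero padding
    full = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(full.shape[0]):
        for j in range(full.shape[1]):
            full[i, j] = np.sum(padded[i:i + kh, j:j + kw] * g)
    if mode == "full":
        return full
    if mode == "same":
        top, left = (kh - 1) // 2, (kw - 1) // 2
        return full[top:top + H, left:left + W]
    if mode == "valid":
        return full[kh - 1:H, kw - 1:W]
    raise ValueError(f"unknown mode: {mode}")

f = np.arange(16, dtype=float).reshape(4, 4)
g = np.ones((3, 3))
for mode in ("full", "same", "valid"):
    print(mode, conv2d(f, g, mode).shape)  # (6, 6), (4, 4), (2, 2)
```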
Annoying Details
Stride
Padding
The ultimate equation
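The equation itself did not survive extraction. A standard form that combines filter size f, padding p, and stride s for an n x n input is floor((n + 2p - f) / s) + 1; the sketch assumes this form, which is consistent with the 227 to 55 step used later in the network-dimension example.

```python
def conv_output_size(n, f, p=0, s=1):
    """Spatial output size for an n x n input, f x f filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1  # integer division acts as the floor

print(conv_output_size(227, 11, p=0, s=4))  # 55
print(conv_output_size(6, 3))               # 4 ("valid", stride 1)
print(conv_output_size(6, 3, p=1))          # 6 ("same" for a 3 x 3 filter)
```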
Convolution Application
Convolution Application
Sharpening
original - smoothed (5x5) = detail
Let’s add it back: original + detail = sharpened
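The two sharpening steps can be sketched in NumPy as unsharp masking. The slide does not say which 5x5 smoothing filter is used, so this sketch assumes a box (mean) filter with replicated borders and adds a weight `alpha` (hypothetical; alpha = 1 reproduces original + detail exactly).

```python
import numpy as np

def box_blur(img, k=5):
    """Smooth with a k x k mean filter; borders are replicated (assumption)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def sharpen(img, alpha=1.0, k=5):
    """sharpened = original + alpha * (original - smoothed)."""
    img = np.asarray(img, dtype=float)
    detail = img - box_blur(img, k)  # detail = original - smoothed
    return img + alpha * detail      # add the detail back

step = np.zeros((7, 7))
step[:, 4:] = 1.0                    # vertical step edge
print(sharpen(step)[0])              # dips below 0 and overshoots 1 next to the edge
```

Flat regions have zero detail and are left unchanged; only intensity changes are amplified.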
Gaussian Filter
0.003 0.013 0.022 0.013 0.003
0.013 0.059 0.097 0.059 0.013
0.022 0.097 0.159 0.097 0.022
0.013 0.059 0.097 0.059 0.013
0.003 0.013 0.022 0.013 0.003
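The table matches a 2D Gaussian G(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2) sampled on a 5 x 5 integer grid with sigma = 1. The sketch below reproduces it; the `normalize` option is an addition, not part of the slide.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0, normalize=False):
    """Sample a 2D Gaussian on a size x size grid centered at 0."""
    r = size // 2
    x, y = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    k = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return k / k.sum() if normalize else k

print(np.round(gaussian_kernel(), 3))  # same values as the table above
```

The sampled weights sum to about 0.98 rather than 1, so filtering with them slightly darkens the image; `normalize=True` rescales the kernel to sum to exactly 1.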
Gaussian Blur
Box Blur
Gaussian Blur
Convolution for Gradient
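Image gradients can be estimated by convolving with difference kernels. The slide does not name a kernel; this sketch assumes the Sobel operator, a common choice, and applies it to a synthetic vertical step edge.

```python
import numpy as np

def conv2d_valid(f, g):
    """'Valid' 2D convolution (kernel flipped), stride 1, no padding."""
    g = np.flip(g)  # flip both axes
    kh, kw = g.shape
    out = np.zeros((f.shape[0] - kh + 1, f.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + kh, j:j + kw] * g)
    return out

# Sobel kernels (assumed): horizontal and vertical intensity change.
sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
sobel_y = sobel_x.T

img = np.zeros((5, 5))
img[:, 3:] = 1.0                  # intensity increases from left to right
gx = conv2d_valid(img, sobel_x)   # strong response where the step occurs
gy = conv2d_valid(img, sobel_y)   # zero: nothing changes vertically
magnitude = np.hypot(gx, gy)      # gradient magnitude, the input to edge detection
print(gx)
```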
Edge Detection
Original
Canny Edge Detection
About Convolution
The dress: what color is it?
https://www.youtube.com/watch?v=n9fwiNyDHLI
Outline
Tensor
Cauchy Stress Tensor
Image tensor with dimensions 255 x 255 x 3
Convolutions Over Volumes
2D convolution on volume
2D convolution on volume
Solution: the 3rd dimension (depth) of the filter must equal the 3rd dimension of the input
2D convolution on volume
2D convolution on volume
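For a volume, the filter spans the full depth of the input, so each position produces one number and the output is a single 2D map. A minimal NumPy sketch (hypothetical name `conv_volume`, valid mode, stride 1). Like most CNN libraries it does not flip the filter, i.e. it computes cross-correlation, which deep-learning texts conventionally still call convolution.

```python
import numpy as np

def conv_volume(x, w):
    """Slide one fh x fw x C filter over an H x W x C input (valid, stride 1)."""
    H, W, C = x.shape
    fh, fw, Cw = w.shape
    assert Cw == C, "filter depth must match input depth"
    out = np.zeros((H - fh + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sum over all fh * fw * C products: depth collapses to one value.
            out[i, j] = np.sum(x[i:i + fh, j:j + fw, :] * w)
    return out

x = np.ones((6, 6, 3))   # 6 x 6 RGB-like input
w = np.ones((3, 3, 3))   # 3 x 3 filter with depth 3
print(conv_volume(x, w).shape)  # (4, 4)
```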
Multiple Filters
…
…
Multiple Filters
https://cs231n.github.io/convolutional-networks/
Multiple Filters
Multiple Filters
Bias
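With K filters, each filter (plus its own scalar bias) produces one 2D map, and stacking the K maps gives an output volume of depth K. A minimal NumPy sketch; the weight layout K x f x f x C_in is an assumption made for readability.

```python
import numpy as np

def conv_layer(x, weights, bias):
    """x: H x W x C_in, weights: K x f x f x C_in, bias: length K -> output H' x W' x K."""
    H, W, C = x.shape
    K, f, _, Cw = weights.shape
    assert Cw == C, "filter depth must match input depth"
    out = np.zeros((H - f + 1, W - f + 1, K))
    for k in range(K):                      # one output channel per filter
        for i in range(H - f + 1):
            for j in range(W - f + 1):
                patch = x[i:i + f, j:j + f, :]
                out[i, j, k] = np.sum(patch * weights[k]) + bias[k]  # one bias per filter
    return out

x = np.ones((6, 6, 3))
weights = np.ones((2, 3, 3, 3))             # two 3 x 3 x 3 filters
bias = np.array([0.0, 1.0])
print(conv_layer(x, weights, bias).shape)   # (4, 4, 2)
```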
No. of parameters
No. of parameters
Solution: D
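The quiz options are not preserved, but the counting rule is standard: each filter has f x f x C_in weights plus one bias, and a layer has one filter per output channel. Note the count does not depend on the input's height or width. A short sketch (hypothetical helper name):

```python
def conv_params(f, c_in, n_filters, bias=True):
    """Parameters of a conv layer: (f*f*c_in weights + 1 bias) per filter."""
    return (f * f * c_in + (1 if bias else 0)) * n_filters

print(conv_params(11, 3, 96))  # 34944, e.g. an 11 x 11 x 3 x 96 layer
print(conv_params(3, 3, 10))   # 280: ten 3 x 3 x 3 filters
```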
Summary
Summary
Outline
Pooling Function
Max pooling
Pooling function
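Max pooling keeps the largest value in each window, channel by channel; it has no learnable parameters and leaves the depth unchanged. A minimal NumPy sketch for an H x W x C volume without padding:

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling over each channel of an H x W x C volume (no padding)."""
    H, W, C = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w, C))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j, :] = x[r:r + size, c:c + size, :].max(axis=(0, 1))
    return out

x = np.arange(16, dtype=float).reshape(4, 4, 1)
print(max_pool(x)[:, :, 0])  # [[ 5.  7.] [13. 15.]]
```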
Outline
Typical convolutional neural network
Typical network arrangement
Activation
…
Repeat 2d convolution and activation
…
Repeat pooling and activation
Activation
…
Repeat FC and activation
Flatten
Fully-connected
Activation
Softmax
Output: one-hot
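The last two stages turn the final layer's raw scores into probabilities (softmax) and then into a one-hot prediction. A NumPy sketch with hypothetical scores for three classes:

```python
import numpy as np

def softmax(z):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtracting the max avoids overflow, same result
    return e / e.sum()

def to_one_hot(p):
    """Encode the most likely class as a one-hot vector."""
    one_hot = np.zeros_like(p)
    one_hot[np.argmax(p)] = 1.0
    return one_hot

scores = np.array([2.0, 0.5, -1.0])  # hypothetical logits for 3 classes
probs = softmax(scores)
print(probs, to_one_hot(probs))
```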
Flatten Layer
Network Dimension
Input 227 x 227 x 3
h1: ? x ? x ?
conv1: 11 x 11 x 3 x 96
stride: 4
padding: 0
ReLU activation
h2: ? x ? x ?
conv2: 5 x 5 x ? x 96
stride: 4
padding: 2
ReLU activation
h3: ? x ? x ?
Max pooling 2 x 2
Stride: 2
ReLU activation
…
Network Dimension
Input 227 x 227 x 3
h1: ? x ? x ?
conv1: 11 x 11 x 3 x 96
stride: 4
padding: 0
ReLU activation
h2: ? x ? x ?
conv2: 5 x 5 x ? x 96
stride: 4
padding: 2
ReLU activation
h3: ? x ? x ?
Max pooling 2 x 2
Stride: 2
ReLU activation
h4: ? x ? x ?
Conv 3: 3 x 3 x ? x 48
padding: 0
Stride: 1
ReLU activation
h5: ? x ?
flatten
h6: ? x ?
FC: ? x 1024
Sigmoid
FC: ? x ?
Sigmoid
Output before softmax h7: ? x ?
Output in one-hot: ? x ?
Softmax
Network Dimension
Input 227 x 227 x 3
h1: 55 x 55 x 96
conv1: 11 x 11 x 3 x 96
stride: 4
padding: 0
ReLU activation
h2: ? x ? x ?
conv2: 5 x 5 x ? x 96
stride: 4
padding: 2
ReLU activation
h3: ? x ? x ?
Max pooling 2 x 2
Stride: 2
ReLU activation
…
Network Dimension
Input 227 x 227 x 3
h1: 55 x 55 x 96
conv1: 11 x 11 x 3 x 96
stride: 4
padding: 0
ReLU activation
h2: 14 x 14 x 96
conv2: 5 x 5 x ? x 96
stride: 4
padding: 2
ReLU activation
h3: ? x ? x ?
Max pooling 2 x 2
Stride: 2
ReLU activation
…
Network Dimension
h3: 7 x 7 x 96
h4: 5 x 5 x 48
Conv 3: 3 x 3 x 96 x 48
padding: 0
Stride: 1
ReLU activation
h5: ? x ?
flatten
h6: ? x ?
FC: ? x 1024
Sigmoid
FC: ? x ?
Sigmoid
Output before softmax h7: ? x ?
Output in one-hot: ? x ?
Softmax
Network Dimension
h3: 7 x 7 x 96
h4: 5 x 5 x 48
Conv 3: 3 x 3 x 96 x 48
padding: 0
Stride: 1
ReLU activation
h5: 1200 x 1
flatten
h6: 1024 x 1
FC1: ? x 1024
Sigmoid
FC2: ? x ?
Sigmoid
Output before softmax h7: ? x ?
Output in one-hot: ? x ?
Softmax
Network Dimension
h3: 7 x 7 x 96
h4: 5 x 5 x 48
Conv 3: 3 x 3 x 96 x 48
padding: 0
Stride: 1
ReLU activation
h5: 1200 x 1
flatten
h6: 1024 x 1
FC1: 1200 x 1024
Sigmoid
FC2: 1024 x 1000
Sigmoid
Output before softmax h7: 1000 x 1
Output in one-hot: 1000 x 1
Softmax
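The spatial sizes above follow from applying the output-size formula floor((n + 2p - f) / s) + 1 layer by layer. This sketch traces them, using a stride of 4 for conv2 (the value that produces h2 = 14 x 14):

```python
def conv_output_size(n, f, p=0, s=1):
    return (n + 2 * p - f) // s + 1

n = 227
h1 = conv_output_size(n, 11, p=0, s=4)   # conv1: 55 x 55 x 96
h2 = conv_output_size(h1, 5, p=2, s=4)   # conv2: 14 x 14 x 96
h3 = conv_output_size(h2, 2, p=0, s=2)   # 2 x 2 max pooling, stride 2: 7 x 7 x 96
h4 = conv_output_size(h3, 3, p=0, s=1)   # conv3: 5 x 5 x 48
h5 = h4 * h4 * 48                        # flatten: 1200 x 1
print(h1, h2, h3, h4, h5)                # 55 14 7 5 1200
```

FC1 then maps 1200 to 1024 values, and FC2 maps 1024 to the 1000 class scores that feed the softmax.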
Network Dimension
Function (layer) | # Parameters |
conv1 | 34944 |
conv2 | 230496 |
max pooling | 0 |
conv3 | 41520 |
flatten | 0 |
fc1 | 1229824 |
fc2 | 1025000 |
SoftMax | 0 |
Sum | 2561784 |
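Each row of the table is weights plus biases: (f x f x C_in + 1) x C_out for a convolution and (n_in + 1) x n_out for a fully-connected layer; pooling, flatten, and softmax have none. A short sketch that recomputes the rows and the total:

```python
def conv_params(f, c_in, c_out):
    return (f * f * c_in + 1) * c_out   # weights + one bias per filter

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out           # weight matrix + one bias per output

params = {
    "conv1": conv_params(11, 3, 96),
    "conv2": conv_params(5, 96, 96),
    "max pooling": 0,
    "conv3": conv_params(3, 96, 48),
    "flatten": 0,
    "fc1": fc_params(1200, 1024),
    "fc2": fc_params(1024, 1000),
    "softmax": 0,
}
for name, count in params.items():
    print(f"{name:12s} {count:>9d}")
print("sum", sum(params.values()))      # sum 2561784
```

The two fully-connected layers hold about 88% of all parameters, which is typical of this layout.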
Outline
LeNet
https://www.researchgate.net/figure/The-LeNet-5-Architecture-a-convolutional-neural-network_fig4_321586653
AlexNet Discussion
https://medium.com/analytics-vidhya/concept-of-alexnet-convolutional-neural-network-6e73b4f9ee30
PyTorch Discussion: AlexNet Definition
PyTorch Demonstration
VGG
VGG-19
Image from GeeksforGeeks.
ResNet
ResNet
ResNet
https://arxiv.org/pdf/1512.03385
PyTorch Implementation for Residual block
Tensor,
Residual Block Implementation in PyTorch
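The slide's own code is not preserved. The sketch below is one common way to write the basic residual block from the ResNet paper in PyTorch (two 3 x 3 convolutions with batch normalization, plus a shortcut); the class name and the 1 x 1 projection shortcut for mismatched shapes are choices made here, not taken from the slide. PyTorch tensors are N x C x H x W, unlike the H x W x C notation used earlier.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: out = ReLU(F(x) + shortcut(x))."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # Identity shortcut when shapes match; otherwise a 1x1 conv projects x.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

block = ResidualBlock(64, 128, stride=2)
y = block(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 128, 16, 16])
```

Because the block learns only the residual F(x) added to x, an identity mapping is easy to represent, which is what lets very deep stacks of such blocks train well.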
Depth-wise separable convolution
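A depth-wise separable convolution splits a standard convolution into a per-channel f x f (depth-wise) filter followed by a 1 x 1 (point-wise) convolution that mixes channels. The parameter savings can be checked with a small sketch (biases ignored; the 3 x 3, 96 to 48 channel sizes are borrowed from conv3 above for illustration):

```python
def standard_conv_params(f, c_in, c_out):
    return f * f * c_in * c_out   # one f x f x c_in filter per output channel

def depthwise_separable_params(f, c_in, c_out):
    depthwise = f * f * c_in      # one f x f filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution combines the channels
    return depthwise + pointwise

print(standard_conv_params(3, 96, 48))        # 41472
print(depthwise_separable_params(3, 96, 48))  # 5472
```

Here the separable version needs roughly 13% of the weights of the standard layer for the same input and output shapes.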
PNAS Net
https://sh-tsang.medium.com/reading-pnasnet-progressive-neural-architecture-search-image-classification-1beb1de06fe6
Summary
https://paperswithcode.com/sota/image-classification-on-imagenet