Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNN): Image Processing
Housing Price Prediction
[Figure: price plotted against size of house, fit by a single neuron]
[Figure: a small network mapping size, #bedrooms, zip code, and wealth to the output y, the price]
Supervised Learning

| Input (x) | Output (y) | Application |
| Home features | Price | Real Estate |
| Ad, user info | Click on ad? (0/1) | Online Advertising |
| Image | Object (1,…,1000) | Photo tagging |
| Audio | Text transcript | Speech recognition |
| English | Chinese | Machine translation |
| Image, Radar info | Position of other cars | Autonomous driving |
Neural Network examples
Standard NN
Recurrent NN
Convolutional NN
Structured Data

| Size | #bedrooms | … | Price (1000$s) |
| 2104 | 3 | … | 400 |
| 1600 | 3 | … | 330 |
| 2400 | 3 | … | 369 |
| ⋮ | ⋮ | | ⋮ |
| 3000 | 4 | … | 540 |

| User Age | Ad Id | … | Click |
| 41 | 93242 | … | 1 |
| 80 | 93287 | … | 0 |
| 18 | 87312 | … | 1 |
| ⋮ | ⋮ | | ⋮ |
| 27 | 71244 | … | 1 |
Unstructured Data
Image
Audio
Text: "Four score and seven years ago…"
Neural Networks
How do our brains work?
Artificial Neurons
Dendrites: Input
Cell body: Processor
Axon terminals (synapses): Link
Axon: Output
An artificial neuron is an imitation of a human neuron
Artificial Neurons
• Now, let us have a look at the model of an artificial neuron.
How do ANNs work?
[Diagram: inputs x1, x2, …, xm feed into a processing node ∑, which produces the output y]
∑ = x1 + x2 + … + xm = y
How do ANNs work?
Not all inputs are equal.
[Diagram: each input xi is first multiplied by a weight wi, then summed]
∑ = x1·w1 + x2·w2 + … + xm·wm = y
How do ANNs work?
The signal is not passed down to the next neuron verbatim.
[Diagram: the weighted sum vk = ∑ xi·wi is passed through a transfer function (activation function) f(vk) to produce the output y]
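The weighted-sum-plus-activation neuron above can be sketched in a few lines of Python. This is a minimal illustration: the function name `neuron` and the example numbers are our own stand-ins, not values from the slides.

```python
import numpy as np

def neuron(x, w, b=0.0, activation=np.tanh):
    """One artificial neuron: weighted sum of inputs passed through f."""
    v = np.dot(x, w) + b   # v_k = x1*w1 + x2*w2 + ... + xm*wm + b
    return activation(v)   # y = f(v_k)

# Example: 3 inputs with unequal weights
x = np.array([1.0, 0.5, -1.0])
w = np.array([0.2, 0.8, 0.1])
y = neuron(x, w)
```

With the identity as the activation, this reduces to the plain weighted sum from the previous slide.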
How do ANNs work?
Simple Neural Network
Neural Network with 2 inputs, 2 hidden nodes and one output
Activation Functions
Sigmoid
Tanh
ReLU (Rectified Linear Unit)
Threshold
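Each of these activation functions is a one-liner in NumPy. A minimal sketch; `threshold` here is the simple step function, our assumption of the usual definition.

```python
import numpy as np

def sigmoid(v):
    # Squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

def relu(v):
    # Passes positive values through, clips negatives to 0
    return np.maximum(0.0, v)

def tanh(v):
    # Squashes any input into (-1, 1)
    return np.tanh(v)

def threshold(v):
    # Step function: 1 if v >= 0, else 0
    return np.where(np.asarray(v) >= 0, 1.0, 0.0)
```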
Example
Goal is to find the function y in terms of x1 and x2.
Simple Neural Network
Neural Network with 2 inputs, 1 hidden node and one output
Training a Neural Network
Backpropagation
How to modify the weights, given the loss, so that the loss is minimized
Optimization – Gradient Descent
How to update the weights?
Gradient Descent intuition
[Figure: the loss function graphed against the weights]
We want to find the values of the weights such that the loss is minimized.
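Written out, the weight update that gradient descent performs at every step is the standard rule (with learning rate $\eta$):

```latex
w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial L}{\partial w}
```

Each weight moves a small step in the direction that decreases the loss $L$.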
Neural Network Training – Put it all together
Step 1: Feed Forward Step
For simplicity, assume initial weights are 0.5 and biases are 0.
Let’s take ReLU as the activation function.
Start with first sample from dataset
Step 2: Calculate Loss and Backpropagation
Let’s take MSE as the loss function.
Step 3: Update weights with gradient descent
Step 4: Repeat the process
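The four steps can be put together for a 2-input, 1-hidden-node network in plain Python. This is a minimal sketch: the sample values `x = [1.0, 2.0]`, `target = 1.0`, and the learning rate are our own stand-ins, not the numbers from the slides; only the initial weights (0.5), zero biases, ReLU, and MSE follow the worked example.

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

# Hypothetical training sample (not from the slides)
x = np.array([1.0, 2.0])
target = 1.0

# Initial weights 0.5, biases 0, as in the worked example
w_hidden = np.array([0.5, 0.5])   # 2 inputs -> 1 hidden node
w_out = 0.5                       # hidden node -> output
lr = 0.1
losses = []

for step in range(10):
    # Step 1: feed-forward
    h_in = np.dot(x, w_hidden)
    h = relu(h_in)
    y_pred = w_out * h

    # Step 2: loss (MSE for one sample) and backpropagation (chain rule)
    loss = (y_pred - target) ** 2
    losses.append(loss)
    d_y = 2 * (y_pred - target)                # dL/dy_pred
    d_w_out = d_y * h                          # dL/dw_out
    d_h = d_y * w_out                          # dL/dh
    d_h_in = d_h * (1.0 if h_in > 0 else 0.0)  # through the ReLU
    d_w_hidden = d_h_in * x                    # dL/dw_hidden

    # Step 3: update weights with gradient descent
    w_out -= lr * d_w_out
    w_hidden -= lr * d_w_hidden
    # Step 4: repeat
```

Running the loop, the loss shrinks every iteration, which is exactly what the four steps are designed to achieve.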
Deep neural networks
Anatomy of a deep neural network
Layers
Input data and targets (e.g., images labeled "cat" / "dog")
Loss function
Optimizer
Animation from: https://imgur.com/s25RsOr
Smaller Network: CNN
A CNN arranges its neurons in three dimensions (width, height, depth). Every layer of a CNN transforms the 3D input volume to a 3D output volume. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).
A regular 3-layer Neural Network.
Consider learning from an image: a "beak" detector can represent a small region with fewer parameters.
The same pattern appears in different places: the detectors can be compressed!
What about training a lot of such "small" detectors, where each detector must "move around"?
“upper-left beak” detector
“middle beak” detector
They can be compressed to the same parameters.
What are Convolutional Neural Networks (CNNs)?
A simple CNN structure
CONV: Convolutional kernel layer
RELU: Activation function
POOL: Dimension reduction layer
FC: Fully connected layer
Convolutional Layer
Convolutional Kernels
Kernel
Feature Maps
Feature Activation Map
What is a Convolution?
[Diagram: the input is convolved with a kernel to produce a Feature Activation Map]
Convolution
1 | 0 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 1 | 0 |
6 x 6 image
1 | -1 | -1 |
-1 | 1 | -1 |
-1 | -1 | 1 |
Filter 1
-1 | 1 | -1 |
-1 | 1 | -1 |
-1 | 1 | -1 |
Filter 2
……
These are the network parameters to be learned.
Each filter detects a small pattern (3 x 3).
Convolution
1 | 0 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 1 | 0 |
6 x 6 image
1 | -1 | -1 |
-1 | 1 | -1 |
-1 | -1 | 1 |
Filter 1
stride = 1: the dot product of Filter 1 with the top-left 3 × 3 patch gives 3; sliding the filter one pixel to the right gives −1.
Convolution
1 | 0 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 1 | 0 |
6 x 6 image
1 | -1 | -1 |
-1 | 1 | -1 |
-1 | -1 | 1 |
Filter 1
If stride = 2, the first two outputs are 3 and −3.
Convolution
1 | 0 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 1 | 0 |
6 x 6 image
1 | -1 | -1 |
-1 | 1 | -1 |
-1 | -1 | 1 |
Filter 1
stride = 1 produces the full 4 × 4 feature map:

3 | -1 | -3 | -1 |
-3 | 1 | 0 | -3 |
-3 | -3 | 0 | 1 |
3 | -2 | -2 | -1 |
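The sliding-window computation can be reproduced in NumPy. The helper `convolve2d` is our own minimal implementation (technically cross-correlation, which is what CNN layers actually compute); the image and Filter 1 are taken from the slides, and the resulting 4 × 4 map matches the values shown above.

```python
import numpy as np

image = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
])

filter1 = np.array([
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
])

def convolve2d(img, kernel, stride=1):
    """Valid convolution: slide the kernel over the image, dot product at each stop."""
    k = kernel.shape[0]
    out_size = (img.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size), dtype=int)
    for i in range(out_size):
        for j in range(out_size):
            patch = img[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * kernel)  # dot product
    return out

fmap = convolve2d(image, filter1, stride=1)
# fmap[0, 0] == 3 and fmap[0, 1] == -1, matching the slide
```

Setting `stride=2` instead gives a 2 × 2 output whose first row is 3, −3, matching the stride-2 slide.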
Convolution
1 | 0 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 1 | 0 |
6 x 6 image
Filter 1 output (stride = 1):

3 | -1 | -3 | -1 |
-3 | 1 | 0 | -3 |
-3 | -3 | 0 | 1 |
3 | -2 | -2 | -1 |

Filter 2:
-1 | 1 | -1 |
-1 | 1 | -1 |
-1 | 1 | -1 |

Filter 2 output (stride = 1):

-1 | -1 | -1 | -1 |
-1 | -1 | -2 | 1 |
-1 | -1 | -2 | 1 |
-1 | 0 | -4 | 3 |

Repeat this for each filter. The two 4 × 4 images form a 2 × 4 × 4 Feature Map.
1 | 0 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 1 | 0 |
image
convolution
-1 | 1 | -1 |
-1 | 1 | -1 |
-1 | 1 | -1 |
1 | -1 | -1 |
-1 | 1 | -1 |
-1 | -1 | 1 |
……
……
Convolution v.s. Fully Connected
Fully-connected
1 | 0 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 1 | 0 |
6 x 6 image
1 | -1 | -1 |
-1 | 1 | -1 |
-1 | -1 | 1 |
Filter 1
[Diagram: the 6 × 6 image is flattened into a 36-dimensional vector; the first convolution output, 3, connects only to inputs 1, 2, 3, 7, 8, 9, 13, 14, 15 of that vector]
Only connect to 9 inputs, not fully connected: fewer parameters!
1 | 0 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 1 | 0 | 0 |
1 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 1 | 0 |
1 | -1 | -1 |
-1 | 1 | -1 |
-1 | -1 | 1 |
Filter 1
[Diagram: the outputs 3 and −1 connect to inputs 1, 2, 3, 7, 8, 9, 13, 14, 15 and 2, 3, 4, 8, 9, 10, 14, 15, 16 of the flattened 6 × 6 image respectively, and the two neurons use the same 9 weights]
Shared weights: even fewer parameters!
A CNN compresses a fully connected network in two ways:
Reducing the number of connections (each output connects only to a small patch of the input)
Sharing weights across positions (every patch is scanned by the same filter)
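Counting parameters for the 6 × 6 example makes the two savings concrete. A back-of-the-envelope sketch:

```python
# Parameter counts for the 6x6-image example above
inputs = 6 * 6            # flattened image
outputs = 4 * 4           # one feature map from a 3x3 filter, stride 1

fully_connected = inputs * outputs   # every output sees every input
locally_connected = outputs * 9      # each output sees only a 3x3 patch
shared_weights = 9                   # one 3x3 filter shared by all positions

print(fully_connected, locally_connected, shared_weights)  # 576 144 9
```

From 576 weights down to 144 via local connectivity, and down to just 9 via weight sharing.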
Color image: RGB 3 channels
[Diagram: the input is now 6 × 6 × 3, one 6 × 6 matrix per R, G, B channel, and Filter 1 and Filter 2 each carry a matching 3 × 3 kernel per channel, i.e. each filter is 3 × 3 × 3]
ReLU Layer
Applying ReLU to Filter 1's feature map zeroes out the negative values:

3 | 0 | 0 | 0 |
0 | 1 | 0 | 0 |
0 | 0 | 0 | 1 |
3 | 0 | 0 | 0 |
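For the feature map from Filter 1, applying ReLU is a single NumPy call, and the result matches the values on the slide:

```python
import numpy as np

# Filter 1's feature map from the convolution slides
fmap = np.array([
    [ 3, -1, -3, -1],
    [-3,  1,  0, -3],
    [-3, -3,  0,  1],
    [ 3, -2, -2, -1],
])

relu_map = np.maximum(0, fmap)  # negative activations clipped to 0
```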
Pooling layer
Max Pooling

Filter 1:
1 | -1 | -1 |
-1 | 1 | -1 |
-1 | -1 | 1 |

Feature map:
3 | -1 | -3 | -1 |
-3 | 1 | 0 | -3 |
-3 | -3 | 0 | 1 |
3 | -2 | -2 | -1 |

Filter 2:
-1 | 1 | -1 |
-1 | 1 | -1 |
-1 | 1 | -1 |

Feature map:
-1 | -1 | -1 | -1 |
-1 | -1 | -2 | 1 |
-1 | -1 | -2 | 1 |
-1 | 0 | -4 | 3 |
Why Pooling
Subsampling: we can subsample the pixels to make the image smaller (a subsampled "bird" is still recognizably a bird), so fewer parameters are needed to characterize the image.
Key characteristics and functions of pooling layers
The whole CNN
Fully Connected Feedforward network
cat dog ……
Convolution
Max Pooling
Convolution
Max Pooling
Flattened
Can repeat many times
Max Pooling
[The 6 × 6 image] → Conv → Max Pooling
Taking the maximum over each 2 × 2 block turns each 4 × 4 feature map into a new, smaller 2 × 2 image:

Filter 1:
3 | 0 |
3 | 1 |

Filter 2:
-1 | 1 |
0 | 3 |
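Non-overlapping 2 × 2 max pooling can be sketched with a reshape trick in NumPy. The helper `max_pool` is our own; applied to Filter 1's feature map it yields the 2 × 2 result:

```python
import numpy as np

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size blocks."""
    h, w = fmap.shape
    # Split into blocks, then take the max within each block
    return fmap.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Filter 1's feature map from the convolution slides
fmap = np.array([
    [ 3, -1, -3, -1],
    [-3,  1,  0, -3],
    [-3, -3,  0,  1],
    [ 3, -2, -2, -1],
])

pooled = max_pool(fmap)  # 4x4 -> 2x2
```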
The whole CNN
Convolution → Max Pooling → Convolution → Max Pooling … (can repeat many times)
A new image: smaller than the original image, and the number of channels is the number of filters.
The whole CNN
Fully Connected Feedforward network
cat dog ……
Convolution
Max Pooling
Convolution
Max Pooling
Flattened
A new image
A new image
Flattening
The pooled 2 × 2 feature maps are flattened into a single vector and fed into a Fully Connected Feedforward network.
Fully Connected Layers
Hierarchical Feature Extraction:
Early layers retain most information (edge detectors).
Deeper layers move towards more abstract representations and encode high-level concepts.
Sparser representations: detect fewer (but more abstract) features.
Data Preparation
Basic CNN model definition
Model summary
Training
Evaluation
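The five steps above (data preparation, model definition, summary, training, evaluation) can be sketched end-to-end. This assumes the Keras API (`tensorflow.keras`) and uses small random stand-in data instead of a real dataset, so the reported accuracy is meaningless; the sketch only exercises the pipeline.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Data preparation: stand-in random "images" (28x28 grayscale, 10 classes)
x_train = np.random.rand(64, 28, 28, 1).astype("float32")
y_train = np.random.randint(0, 10, size=64)

# Basic CNN model definition: CONV -> RELU -> POOL repeated, then FC
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(8, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(16, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Model summary
model.summary()

# Training and evaluation (tiny run just to exercise the pipeline)
model.fit(x_train, y_train, epochs=1, batch_size=32, verbose=0)
loss, acc = model.evaluate(x_train, y_train, verbose=0)
```

With a real dataset (e.g., MNIST) the same structure applies; only the data-preparation step changes.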