1 of 72

Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNN): Image Processing

2 of 72

Housing Price Prediction

[Figure: house price plotted against size of house]

3 of 72

Housing Price Prediction

4 of 72

Housing Price Prediction

[Figure: a small neural network mapping the input features size, #bedrooms, zip code, and wealth to the output y (price)]

5 of 72

Supervised Learning

| Input (x)         | Output (y)             | Application         |
|-------------------|------------------------|---------------------|
| Ad, user info     | Click on ad? (0/1)     | Online Advertising  |
| Image             | Object (1,…,1000)      | Photo tagging       |
| Audio             | Text transcript        | Speech recognition  |
| Home features     | Price                  | Real Estate         |
| English           | Chinese                | Machine translation |
| Image, Radar info | Position of other cars | Autonomous driving  |

6 of 72

Neural Network examples

Standard NN

Recurrent NN

Convolutional NN

7 of 72

Supervised Learning

Structured Data

| Size | #bedrooms | Price (1000$s) |
|------|-----------|----------------|
| 2104 | 3         | 400            |
| 1600 | 3         | 330            |
| 2400 | 3         | 369            |
| 3000 | 4         | 540            |

| User Age | Ad Id | Click |
|----------|-------|-------|
| 41       | 93242 | 1     |
| 80       | 93287 | 0     |
| 18       | 87312 | 1     |
| 27       | 71244 | 1     |

Unstructured Data

Image

Text: “Four score and seven years ago…”

Audio
8 of 72

Neural Networks

  • Subset of machine learning models inspired by the workings of biological neurons
  • The “building blocks” of neural networks are the neurons.
    • In technical systems, we also refer to them as units or nodes.
  • Basically, each neuron
    • receives input from many other neurons,
    • changes its internal state (activation) based on the current input,
    • sends one output signal to many other neurons, possibly including its input neurons (recurrent network).

9 of 72

How do our brains work?

  • A processing element
    • Dendrites: Input
    • Cell body: Processor
    • Synapse: Link
    • Axon: Output
  • A neuron is connected to other neurons through about 10,000 synapses
  • A neuron receives input from other neurons; the inputs are combined
  • Once the combined input exceeds a critical level, the neuron discharges a spike: an electrical pulse that travels from the body, down the axon, to the next neuron(s)

10 of 72

Artificial Neurons

  • Inspired by biological neurons

  • Information is received from multiple inputs, processed, and a combined output is generated.

  • Does not mimic all the functionality of a biological neuron; it is a highly restricted model

Dendrites: Input

Cell body: Processor

Axon terminals (synapses): Link

Axon: Output

An artificial neuron is an imitation of a human neuron

11 of 72

Artificial Neurons

Now, let us have a look at the model of an artificial neuron.

12 of 72

How do ANNs work?

[Figure: a neuron with inputs x1, x2, …, xm, a processing (summation) unit, and output y]

Input → Processing → Output

y = ∑ xi = x1 + x2 + … + xm

13 of 72

How do ANNs work?

Not all inputs are equal

[Figure: each input xi is multiplied by a weight wi before the summation]

y = ∑ wi·xi = w1x1 + w2x2 + … + wmxm

14 of 72

How do ANNs work?

The signal is not passed down to the next neuron verbatim; it first goes through a transfer function (activation function).

[Figure: the weighted sum vk = ∑ wi·xi is passed through the activation function, giving output y = f(vk)]

15 of 72

How do ANNs work?

  • Inputs are multiplied with their weights

  • The weighted inputs are added together with the bias

  • The sum is passed through the activation function f
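Putting the three steps together, here is a minimal sketch of a single artificial neuron in Python (the input values, weights, and bias are illustrative, not from the slides):

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron."""
    # Steps 1-2: multiply inputs by weights, then add them together with the bias
    v = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step 3: pass the sum through an activation function f (sigmoid here)
    return 1.0 / (1.0 + math.exp(-v))

# Illustrative values
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # ≈ 0.525
```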

16 of 72

Simple Neural Network

Neural Network with 2 inputs, 2 hidden nodes and one output

17 of 72

Activation Functions

  • Activation functions keep the output of a neuron restricted to a certain limit

  • Their most important role is to introduce non-linearity into the neural network

[Figure: plots of Sigmoid and ReLU (Rectified Linear Unit)]
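For reference, the two functions shown above can be written directly (a minimal sketch):

```python
import math

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Passes positive values through unchanged, clips negative values to 0
    return max(0.0, x)
```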

18 of 72

Activation Functions

[Figure: plots of four common activation functions: Sigmoid, ReLU (Rectified Linear Unit), Threshold, Tanh]

19 of 72

Example

Goal: find the function y in terms of x1 and x2

  • We are given a dataset of (x1, x2, y) samples [shown as a table on the slide]

  • Let’s solve this using neural networks!

20 of 72

Simple Neural Network

Neural Network with 2 inputs, 1 hidden node and one output

21 of 72

Training a Neural Network

  • Loss function
    • The loss function tells us how well the model is doing; it is one way to evaluate our model

    • Mean Squared Error (regression problems)

    • Cross-Entropy Loss / Negative Log-Likelihood (classification problems)
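A minimal sketch of the two losses (plain Python, illustrative):

```python
import math

def mse(y_true, y_pred):
    # Mean Squared Error, for regression problems
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred):
    # Negative log-likelihood for two-class problems; y_pred holds probabilities
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)
```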

22 of 72

Backpropagation

How do we modify the weights, given the loss, so that the loss is minimized?

  • The loss function can be written in terms of the weights and biases

  • Calculate how much the loss changes when we tweak the weights (the learnable parameters)

  • This system of calculating partial derivatives by working backwards through the network is called backpropagation
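As a tiny illustration, take a single neuron y = w·x with loss L = (y − t)². The chain rule gives ∂L/∂w = 2(y − t)·x (the numbers below are illustrative):

```python
# Forward pass
x, t, w = 2.0, 5.0, 0.5
y = w * x                # prediction: 1.0
L = (y - t) ** 2         # loss: 16.0

# Backward pass: work backwards from the loss using the chain rule
dL_dy = 2 * (y - t)      # ∂L/∂y = -8.0
dy_dw = x                # ∂y/∂w = 2.0
dL_dw = dL_dy * dy_dw    # ∂L/∂w = -16.0
```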

23 of 72

Optimization – Gradient Descent

How to update the weights?

  • The optimization function tells us how to update the weights and biases to minimize the loss:

    w ← w − η · ∂L/∂w (and likewise for each bias)

  • η – the learning rate, which controls how fast we train. It is a hyperparameter.

  • A model hyperparameter is a configuration that is external to the model and whose value cannot be estimated from data.
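Continuing the single-neuron sketch from the previous slide, one gradient-descent step (with an illustrative η = 0.01) is:

```python
eta = 0.01               # learning rate η (a hyperparameter)
w = w - eta * dL_dw      # 0.5 - 0.01 * (-16.0) = 0.66: the loss moves downhill
```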

24 of 72

Gradient Descent intuition

[Figure: loss function graphed against a weight]

We want to find the value of the weights such that the loss is minimized

25 of 72

Neural Network Training – Put it all together

  1. Choose a sample data point from the dataset.

  2. Pass the input through the network and compute the loss.

  3. Calculate all the partial derivatives of the loss with respect to the weights and biases (e.g. ∂L/∂w1, ∂L/∂w2, etc.).

  4. Use the optimization equation to update each weight and bias.

  5. Go back to step 1.
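The five steps, sketched as a loop in Python; `forward` and `gradients` stand in for whatever the concrete network defines and are placeholders, not real library calls:

```python
def train(dataset, weights, eta=0.01, max_iters=100):
    for _ in range(max_iters):
        for x, y_true in dataset:                          # step 1: pick a sample
            y_pred = forward(x, weights)                   # step 2: forward pass
            grads = gradients(x, y_true, y_pred, weights)  # step 3: backprop
            for k in weights:                              # step 4: update
                weights[k] -= eta * grads[k]               # w ← w − η·∂L/∂w
    return weights                                         # step 5: loop again
```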

26 of 72

Step 1: Feed Forward Step

For simplicity, assume the initial weights are 0.5 and the biases are 0.

Let’s take ReLU as the activation function.

Start with the first sample from the dataset.

27 of 72

Step 2: Calculate Loss and Backpropagation

Let’s take MSE as the loss function

28 of 72

Step 3: Update weights with gradient descent
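The dataset itself was shown as a figure, so as an illustration assume a single sample x1 = 1, x2 = 2 with target t = 3, the 2-input, 1-hidden-node, 1-output network shown earlier, all weights 0.5, biases 0, and η = 0.1 (all assumed values). Steps 1–3 then work out as:

```python
x1, x2, t = 1.0, 2.0, 3.0             # assumed sample
w1 = w2 = w_out = 0.5                 # initial weights
b_h = b_out = 0.0                     # biases
eta = 0.1                             # assumed learning rate

# Step 1: feed-forward with ReLU
h_in = w1 * x1 + w2 * x2 + b_h        # 1.5
h = max(0.0, h_in)                    # ReLU -> 1.5
y = w_out * h + b_out                 # 0.75

# Step 2: MSE loss and backpropagation
L = (y - t) ** 2                      # 5.0625
dL_dy = 2 * (y - t)                   # -4.5
dL_dwout = dL_dy * h                  # -6.75
relu_grad = 1.0 if h_in > 0 else 0.0  # derivative of ReLU at h_in
dL_dw1 = dL_dy * w_out * relu_grad * x1   # -2.25
dL_dw2 = dL_dy * w_out * relu_grad * x2   # -4.5

# Step 3: gradient descent update
w_out -= eta * dL_dwout               # 0.5 -> 1.175
w1 -= eta * dL_dw1                    # 0.5 -> 0.725
w2 -= eta * dL_dw2                    # 0.5 -> 0.95
```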

29 of 72

Step 4: Repeat the process

  • Now we repeat the process with the other samples from the dataset and update the weights

  • We can stop when the loss reaches zero; at that point we have approximated the function y

  • Or, we can stop after a set maximum number of iterations (e.g. 100)

30 of 72

Deep neural networks

  • Neural network with more than 2 layers
  • DNNs are a specific type of ANN characterized by their depth, meaning they have many hidden layers.
  • Can model more complex functions

31 of 72

Anatomy of a deep neural network

  • Layers
  • Input data and targets
  • Loss function
  • Optimizer

32 of 72

Layers

  • Data processing modules
  • Many different kinds exist
    • densely connected
    • convolutional
    • recurrent
    • pooling, flattening, merging, normalization, etc.
  • Input: one or more tensors; output: one or more tensors
  • Usually have a state, encoded as weights
    • learned, initially random
  • When combined, they form a network or a model

33 of 72

Input data and targets

  • The network maps the input data X to predictions Y′
  • During training, the predictions Y′ are compared to true targets Y using the loss function

[Figure: images of a cat and a dog with their predicted labels]

34 of 72

Loss function

  • The quantity to be minimized (optimized) during training
    • the only thing the network cares about
    • there might also be other metrics you care about
  • Common tasks have “standard” loss functions:
    • mean squared error for regression
    • binary cross-entropy for two-class classification
    • categorical cross-entropy for multi-class classification
    • etc.
  • https://lossfunctions.tumblr.com/

35 of 72

Optimizer

  • How to update the weights based on the loss function
  • Learning rate (+scheduling)
  • Stochastic gradient descent, momentum, and their variants

Animation from: https://imgur.com/s25RsOr

36 of 72

Anatomy of a deep neural network

37 of 72

Smaller Network: CNN

  • We know it is good to learn a small model.
  • From this fully connected model, do we really need all the edges?
  • Can some of these be shared?

38 of 72

A CNN arranges its neurons in three dimensions (width, height, depth). Every layer of a CNN transforms the 3D input volume to a 3D output volume. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

A regular 3-layer Neural Network.

39 of 72

Consider learning an image:

  • Some patterns are much smaller than the whole image
    • e.g. a “beak” detector

  • A small region can be represented with fewer parameters

40 of 72

The same pattern appears in different places: the detectors can be compressed! What about training a lot of such “small” detectors, where each detector must “move around”?

An “upper-left beak” detector and a “middle beak” detector can be compressed to the same parameters.

41 of 72

What are Convolutional Neural Networks (CNNs)

  • CNNs are similar to MLPs in that they only feed signals forward (feedforward nets), but they have kinds of layers unique to CNNs
    • Convolutional layer: processes data in a small receptive field (i.e., a filter)
    • Pooling layer: performs spatial down-sampling by selecting the most important information from the feature maps produced by the convolutional layers
    • Dense (fully connected) layer: consolidates high-level features and makes the final decisions

42 of 72

A simple CNN structure

CONV: Convolutional kernel layer

RELU: Activation function

POOL: Dimension reduction layer

FC: Fully connected layer

43 of 72

Convolutional Layer

  • Convolutional layers apply a set of learnable filters to input data to extract meaningful features
    • Filters also known as kernels
  • Designed to detect patterns and hierarchies of features
  • Learnable Filters (Kernels)
    • these filters are adjusted through backpropagation to capture relevant features in the data
    • learn the most informative patterns

44 of 72

Convolutional Kernels

  • A convolutional kernel, often referred to as a filter or convolutional filter, is a small matrix used for feature extraction
  • Kernels are applied to the input data through convolution operations
  • Kernels serve as feature detectors
    • Each kernel specializes in detecting a specific pattern or feature within the input data
    • For example, one kernel might detect edges, while another could detect corners
  • Learnable Parameters
    • Kernels contain learnable parameters that are adjusted during training
    • These parameters determine the filter's behavior and help it adapt to detect relevant features in the data
  • Size
    • Kernels are typically square and have a small size, such as 3x3 or 5x5
    • The size of the kernel affects the receptive field


45 of 72

Feature Maps

  • Feature map
    • also known as an activation map or a convolutional feature map
    • A two-dimensional grid of values resulting from the application of a convolutional kernel to the input data
    • Each feature map corresponds to a specific kernel
  • It represents the presence or activation of particular features
  • A way to visualize which patterns or structures the network has detected
  • The number of feature maps in a layer is determined by the number of kernels or filters used

[Figure: feature activation map]

46 of 72

What is a Convolution?

  • A weighted moving sum

[Figure: a kernel sliding over the input, producing a feature activation map]
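A minimal sketch of this weighted moving sum in NumPy (following deep-learning convention, the kernel is not flipped):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """2D convolution: slide the kernel over the image, taking dot products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch currently under the kernel
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out
```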

47 of 72

Convolution

6 x 6 image:

1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1:

 1 -1 -1
-1  1 -1
-1 -1  1

Filter 2:

-1  1 -1
-1  1 -1
-1  1 -1

……

These are the network parameters to be learned.

Each filter detects a small pattern (3 x 3).

48 of 72

Convolution

With stride = 1, place Filter 1 on the top-left 3 x 3 patch of the 6 x 6 image and take the dot product: the result is 3. Slide the filter one pixel to the right and the dot product gives -1.

[Figure: Filter 1 positioned over the first two 3 x 3 patches of the image, producing 3 and -1]

49 of 72

Convolution

If stride = 2, the filter jumps two pixels at a time: the first dot product is again 3, and the next (two pixels to the right) is -3.

[Figure: Filter 1 positioned over patches two pixels apart, producing 3 and -3]

50 of 72

Convolution

Sliding Filter 1 over the whole 6 x 6 image with stride = 1 gives a 4 x 4 map:

 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
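This map can be checked with the `conv2d` sketch from the “What is a Convolution?” slide:

```python
import numpy as np  # conv2d is defined in the earlier sketch

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])
print(conv2d(image, filter1, stride=1))  # 4 x 4 map; top-left value is 3
```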

51 of 72

Convolution

Repeat this for each filter. With stride = 1, Filter 2

-1  1 -1
-1  1 -1
-1  1 -1

produces a second 4 x 4 map:

-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3

Together with Filter 1’s map, this gives the feature map: two 4 x 4 images, forming a 2 x 4 x 4 matrix.

52 of 72

Convolution v.s. Fully Connected

In a fully connected layer, the 6 x 6 image is flattened into a 36-dimensional vector and every output unit connects to all 36 inputs. Convolution with a 3 x 3 filter is the same computation with most of those connections removed.

[Figure: the 6 x 6 image flattened into a long input vector feeding a fully connected layer, compared with the same image convolved with Filters 1 and 2]

53 of 72

Viewed this way, the convolution’s first output (the value 3) is a neuron that connects to only 9 of the 36 inputs (pixels 1, 2, 3, 7, 8, 9, 13, 14, 15 of the flattened 6 x 6 image), not fully connected, with the 9 values of Filter 1 as its weights: fewer parameters!

54 of 72

The second output (the value -1) connects to a different 9 pixels (2, 3, 4, 8, 9, 10, 14, 15, 16 of the flattened 6 x 6 image) but uses the same 9 weights: shared weights. Fewer parameters, and with sharing, even fewer parameters.

55 of 72

A CNN compresses a fully connected network in two ways:

  • Reducing number of connections
  • Shared weights on the edges
  • Max pooling further reduces the complexity
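To make the saving concrete for this example (a back-of-the-envelope count, assuming one output unit per 3 x 3 patch position):

```python
fc_weights   = 36 * 16   # fully connected: 36 inputs x 16 outputs = 576 weights
conv_weights = 3 * 3     # one shared 3x3 filter covers all 16 outputs = 9 weights
print(fc_weights, conv_weights)  # 576 vs 9
```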

56 of 72

Color image: RGB 3 channels

[Figure: the 6 x 6 binary image repeated for each of the three RGB channels, with Filter 1 and Filter 2 each extended to three 3 x 3 slices]

For a colour image the input is a 3 x 6 x 6 volume, and each filter has a matching depth of 3 (a 3 x 3 x 3 kernel); the convolution takes the dot product across all three channels at once.

57 of 72

ReLU Layer

  • ReLU is an activation function
  • Activation functions are typically applied after each convolutional layer in CNNs
  • The main aim here is to replace all the negative values in the convolution output with zero
  • Activation functions introduce non-linearity,
    • allowing the network to model complex data patterns and relationships effectively

Applying ReLU to Filter 1’s feature map:

 3 -1 -3 -1          3 0 0 0
-3  1  0 -3    →     0 1 0 0
-3 -3  0  1          0 0 0 1
 3 -2 -2 -1          3 0 0 0

58 of 72

Pooling layer

  • Pooling layers
    • also known as pooling operations
    • a crucial component in CNNs
    • used for feature extraction and dimensionality reduction
  • These layers are inserted between convolutional layers to
    • downsample feature maps
    • reduce their spatial dimensions
    • retain important information
  • Pooling is performed in the following four steps (see the sketch after this list)
    • Pick a window size (usually 2 or 3)
    • Pick a stride (usually 2)
    • Walk your window across your filtered images
    • From each window, take the maximum value
  • Pooling helps reduce computational complexity, mitigate overfitting, and enable deeper CNNs
  • Max pooling is the most commonly used method
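A minimal sketch of max pooling following the four steps above (window 2, stride 2):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Walk a window across the map, keeping the maximum of each window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out
```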

59 of 72

Max Pooling

Applying 2 x 2 max pooling (stride 2) to the two 4 x 4 feature maps produced by Filter 1 and Filter 2:

Filter 1 map:            Filter 2 map:

 3 -1 -3 -1              -1 -1 -1 -1
-3  1  0 -3              -1 -1 -2  1
-3 -3  0  1              -1 -1 -2  1
 3 -2 -2 -1              -1  0 -4  3

Each 2 x 2 block is reduced to its maximum:

3 0                      -1 1
3 1                       0 3

60 of 72

Why Pooling

  • Subsampling pixels will not change the object

[Figure: a bird image and its subsampled version; both still show a bird]

We can subsample the pixels to make the image smaller: fewer parameters are needed to characterize the image.

61 of 72

Key characteristics and functions of pooling layers

  • Downsampling
    • Pooling layers reduce the spatial dimensions (width and height) of the feature maps
  • Translation Invariance
    • Pooling layers help achieve translation invariance in the learned features
  • Dimension Reduction
    • By reducing the spatial dimensions of feature maps, pooling layers reduce the number of parameters in the subsequent layers

62 of 72

The whole CNN

Input → Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times) → Flattened → Fully Connected Feedforward network → cat, dog, ……

63 of 72

Max Pooling

Starting from the 6 x 6 image, convolution followed by max pooling produces a new image, but smaller: one 2 x 2 channel per filter.

6 x 6 image → Conv → Max Pooling →

3 0        -1 1
3 1         0 3

64 of 72

The whole CNN

Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times)

The result is a new image, smaller than the original; the number of channels is the number of filters. Here: a 2 x 2 x 2 volume,

3 0        -1 1
3 1         0 3

65 of 72

The whole CNN

Convolution → Max Pooling (a new image) → Convolution → Max Pooling (a new image) → Flattened → Fully Connected Feedforward network → cat, dog, ……

66 of 72

Flattening

The pooled 2 x 2 feature maps,

3 0        -1 1
3 1         0 3

are flattened into a single vector (reading each map row by row),

3, 0, 3, 1, -1, 1, 0, 3

which feeds the Fully Connected Feedforward network.

67 of 72

Fully Connected Layers

  • CNNs primarily consist of convolutional and pooling layers for feature extraction
  • Fully connected layers are typically found at the end of the network
  • Feature aggregation
    • Aggregates and processes the features extracted by the preceding convolutional and pooling layers
  • In tasks like image classification, the fully connected layers make the final decisions
  • They map the learned features to the output classes or values, producing the network’s predictions
  • The final fully connected layer typically has as many neurons as there are output classes
  • The softmax activation function is often used in this layer to produce class probabilities (see the sketch below)
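A sketch of softmax, which turns the final layer’s raw scores into class probabilities:

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability, then normalize the exponentials
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))  # three probabilities summing to 1
```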

68 of 72

Hierarchical Feature Extraction

  • Early layers retain most information (edge detectors)
  • Deeper layers move towards more abstract representations
  • The deepest layers encode high-level concepts
  • Representations become sparser: fewer (more abstract) features are detected

69 of 72

Data Preparation
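The original slide showed code that was not preserved; a comparable data-preparation sketch in Keras, assuming the MNIST digits dataset as an example:

```python
import numpy as np
from tensorflow import keras

# Load a standard image dataset (28 x 28 grayscale digits)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale pixels to [0, 1] and add the channel dimension Conv2D expects
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
x_train = np.expand_dims(x_train, -1)   # shape: (60000, 28, 28, 1)
x_test = np.expand_dims(x_test, -1)
```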

70 of 72

Basic CNN model definition
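The slide’s code is likewise not preserved; a minimal model in the CONV → RELU → POOL → FC pattern described earlier might look like this:

```python
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, (3, 3), activation="relu"),  # CONV + RELU
    keras.layers.MaxPooling2D((2, 2)),                   # POOL
    keras.layers.Conv2D(64, (3, 3), activation="relu"),  # repeat conv + pool
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),                              # flattening
    keras.layers.Dense(10, activation="softmax"),        # FC with softmax
])
```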

71 of 72

Model summary
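In Keras the layer-by-layer overview comes from a single call:

```python
model.summary()  # prints each layer with its output shape and parameter count
```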

72 of 72

Training

Evaluation
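A hedged sketch of the training and evaluation steps for the model above:

```python
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer class labels
              metrics=["accuracy"])

# Training
model.fit(x_train, y_train, batch_size=128, epochs=5, validation_split=0.1)

# Evaluation
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"test accuracy: {test_acc:.3f}")
```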