Artificial Neural Network
Dinesh K. Vishwakarma, Ph.D.
PROFESSOR, DEPARTMENT OF INFORMATION TECHNOLOGY
DELHI TECHNOLOGICAL UNIVERSITY, DELHI.
Webpage: http://www.dtu.ac.in/Web/Departments/InformationTechnology/faculty/dkvishwakarma.php
Introduction
Human Brain Processing
[Diagram: a biological neuron. Dendrites: input; cell body: processor; synapse: link; axon: output]
The human brain is made up of billions of simple processing units – neurons.
Neuron
[Diagram: an artificial neuron. Each input is multiplied by a weight, the weighted inputs are summed together with a bias, and the sum is passed through an activation function to produce the output.]
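A minimal sketch of this computation in Python, assuming a sigmoid activation (the diagram does not fix a particular activation function, and the example values are illustrative):

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation (one common choice)

print(neuron([1.0, 0.5], [0.4, -0.2], 0.1))  # a single neuron's output
```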
Activation Functions
[Figure: common activation functions and how an activation function shapes a neuron's output]
Neural Network
How do we train?
[Figure: a fully connected network with 3 inputs, 4 hidden neurons, and 2 output neurons; labels mark the weights and activation functions]
4 + 2 = 6 neurons (not counting inputs)
[3 × 4] + [4 × 2] = 20 weights
4 + 2 = 6 biases
26 learnable parameters in total; a counting sketch follows below.
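A quick way to check such counts, sketched in Python (the helper name is illustrative):

```python
def count_parameters(layer_sizes):
    """Weights + biases for a fully connected net, e.g. sizes [3, 4, 2]."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])  # one bias per non-input neuron
    return weights + biases

print(count_parameters([3, 4, 2]))  # (3*4 + 4*2) + (4 + 2) = 20 + 6 = 26
```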
Training Perceptron
Backpropagation
Backpropagation Algorithm
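For reference, the standard backpropagation update rules for a sigmoid network (written here in Mitchell's commonly used notation, with learning rate η, targets t, outputs o, and x_ji the i-th input to unit j; this is an assumption about the notation, not a transcription of the slide):

```latex
\delta_k = o_k (1 - o_k)(t_k - o_k)               % error term, output unit k
\delta_h = o_h (1 - o_h) \sum_k w_{kh}\,\delta_k  % error term, hidden unit h
\Delta w_{ji} = \eta\,\delta_j\,x_{ji}            % update for weight from i to j
```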
Hidden Layer Representation
Target function: the identity mapping f(x) = x over the eight one-hot input vectors below (an 8-3-8 network: 8 inputs, 3 hidden units, 8 outputs). Can this be learned? Yes.
Input    →  Hidden Values  →  Output
10000000 →  .89 .04 .08    →  10000000
01000000 →  .15 .99 .99    →  01000000
00100000 →  .01 .97 .27    →  00100000
00010000 →  .99 .97 .71    →  00010000
00001000 →  .03 .05 .02    →  00001000
00000100 →  .01 .11 .88    →  00000100
00000010 →  .80 .01 .98    →  00000010
00000001 →  .60 .94 .01    →  00000001
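A small check on the hidden values from the table above: rounding each hidden activation to 0 or 1 shows the network has invented a distinct 3-bit binary code for the eight inputs.

```python
# Hidden-unit activations for the eight inputs, copied from the table.
hidden = [
    (.89, .04, .08), (.15, .99, .99), (.01, .97, .27), (.99, .97, .71),
    (.03, .05, .02), (.01, .11, .88), (.80, .01, .98), (.60, .94, .01),
]
codes = ["".join(str(round(h)) for h in triple) for triple in hidden]
print(codes)                  # ['100', '011', '010', '111', '000', '001', '101', '110']
print(len(set(codes)) == 8)   # True: all eight codes are distinct
```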
Example 1 of NN
[Figure: a single neuron with three inputs, weights W1, W2, W3, and activation f(x)]
With inputs (2.7, -8.6, 0.002) and weights (-0.06, -2.5, 1.4), the weighted sum is
x = (-0.06 × 2.7) + (-2.5 × -8.6) + (1.4 × 0.002) = 21.34
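A one-line check of this weighted sum:

```python
x = (-0.06 * 2.7) + (-2.5 * -8.6) + (1.4 * 0.002)
print(round(x, 2))  # 21.34
```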
Example 1 of NN…

A dataset:

Fields          Class
1.4  2.7  1.9   0
3.8  3.4  3.2   0
6.4  2.8  1.7   1
4.1  0.1  0.2   0
etc.

Training the neural network:
1. Initialise with random weights.
2. Present a training pattern, e.g. (1.4, 2.7, 1.9).
3. Feed it through to get the output: 0.8.
4. Compare with the target output, 0: error = 0.8.
5. Adjust the weights based on the error.
6. Present the next pattern, e.g. (6.4, 2.8, 1.7); feed it through: output 0.9, target 1, error = -0.1; adjust the weights again.

And so on: repeat this thousands, maybe millions of times, each time taking a random training instance and making slight weight adjustments. The weight-adjustment algorithms are designed to make changes that reduce the error; a minimal training-loop sketch follows below.
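A minimal sketch of the whole loop in Python, assuming a single sigmoid neuron trained by gradient descent on squared error (the slides do not specify the architecture, the learning rate, or the update rule; all are illustrative):

```python
import math
import random

data = [  # (x1, x2, x3, target class) rows from the dataset above
    (1.4, 2.7, 1.9, 0),
    (3.8, 3.4, 3.2, 0),
    (6.4, 2.8, 1.7, 1),
    (4.1, 0.1, 0.2, 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 1. Initialise with random weights (and a bias).
w = [random.uniform(-1, 1) for _ in range(3)]
b = random.uniform(-1, 1)
lr = 0.1  # learning rate (illustrative)

for step in range(100_000):                       # repeat many times...
    x1, x2, x3, target = random.choice(data)      # ...on a random instance
    out = sigmoid(w[0]*x1 + w[1]*x2 + w[2]*x3 + b)  # feed it through
    error = out - target                          # compare with target
    # Adjust weights: gradient of squared error through the sigmoid.
    grad = error * out * (1 - out)
    for i, xi in enumerate((x1, x2, x3)):
        w[i] -= lr * grad * xi
    b -= lr * grad

for x1, x2, x3, target in data:                   # target vs. learned output
    print(target, round(sigmoid(w[0]*x1 + w[1]*x2 + w[2]*x3 + b), 2))
```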
Example of Digit Recognition
[Figure: a 16 × 16 image of the digit "2" fed into the machine]
Each of the 16 × 16 = 256 pixels is one input: ink → 1, no ink → 0. The network has ten outputs y1, ..., y10, one per class ("is 1", "is 2", ..., "is 0"). For this image the outputs might be y1 = 0.1, y2 = 0.7, y3 = 0.2, ..., so the network decides: the image is "2".
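Reading off the prediction from the ten outputs is just an argmax; a tiny sketch, using the illustrative activation values above (the unshown outputs are padded with zeros):

```python
# Output activations y1..y10 for the classes "1".."9", "0".
outputs = [0.1, 0.7, 0.2] + [0.0] * 7
classes = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "0"]
prediction = classes[max(range(len(outputs)), key=lambda i: outputs[i])]
print(f'The image is "{prediction}"')  # The image is "2"
```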
Example of Neural Network
Sigmoid Function
σ(z) = 1 / (1 + e^(-z))
[Figure: two sigmoid neurons sharing the inputs (1, -1)]
First neuron: weights (1, -2), bias 1: z = 1×1 + (-1)×(-2) + 1 = 4, so σ(4) = 0.98.
Second neuron: weights (-1, 1), bias 0: z = 1×(-1) + (-1)×1 + 0 = -2, so σ(-2) = 0.12.
Composing layers: feeding the layer-1 activations (0.98, 0.12) through a second layer of sigmoid neurons gives (0.86, 0.11), and a third layer gives the final output (0.62, 0.83). With these weights and biases the network maps the input (1, -1) to the output (0.62, 0.83).
[Figure: the full two-input, three-layer network with a weight on every edge and a bias at every neuron]
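A minimal sketch of one such sigmoid layer in Python, using the first layer's weights and biases from the example (the helper names are illustrative; deeper layers apply the same function to the previous layer's activations):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected sigmoid layer; weights[j] holds neuron j's weights."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [1, -1]
a1 = layer(x, weights=[[1, -2], [-1, 1]], biases=[1, 0])
print([round(a, 2) for a in a1])  # [0.98, 0.12], matching the example
```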
Evaluating the same network on a different input, (0, 0): layer 1 gives σ(1) = 0.73 and σ(0) = 0.5, layer 2 gives (0.72, 0.12), and the output is (0.51, 0.85). A network with fixed structure and fixed parameters is thus a function, here with f(1, -1) = (0.62, 0.83) and f(0, 0) = (0.51, 0.85); different parameters define a different function.
The first layer written as a matrix operation σ(Wx + b):
σ( [1 -2; -1 1] [1; -1] + [1; 0] ) = σ( [4; -2] ) = [0.98; 0.12]
Stacking L layers, with weight matrices W¹, ..., W^L and bias vectors b¹, ..., b^L:
a¹ = σ(W¹ x + b¹)
a² = σ(W² a¹ + b²)
...
y = σ(W^L a^(L-1) + b^L)
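The same chain sketched with NumPy (only the first layer's parameters are taken from the example; further (W, b) pairs can simply be appended to the lists):

```python
import numpy as np

def forward(x, weights, biases):
    """y = sigma(W^L ... sigma(W^2 sigma(W^1 x + b^1) + b^2) ... + b^L)."""
    a = x
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))  # one layer: sigma(W a + b)
    return a

W1 = np.array([[1.0, -2.0], [-1.0, 1.0]])
b1 = np.array([1.0, 0.0])
print(forward(np.array([1.0, -1.0]), [W1], [b1]).round(2))  # [0.98 0.12]
```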
Neural Network
The whole network is therefore one function from the input vector x to the output vector y = (y1, ..., yM):
y = f(x) = σ( W^L ... σ( W² σ( W¹ x + b¹ ) + b² ) ... + b^L )
Since this is just a chain of matrix operations, parallel computing techniques (e.g. GPUs) can be used to speed it up.
Softmax
Ordinary layer: in general the output of the network can be any value, which may not be easy to interpret.
A softmax output layer converts raw scores z_i into y_i = e^(z_i) / Σ_j e^(z_j), so the outputs are positive and sum to 1. For scores z = (3, 1, -3): e^3 ≈ 20, e^1 ≈ 2.7, e^(-3) ≈ 0.05, giving outputs 20/22.75 ≈ 0.88, 2.7/22.75 ≈ 0.12, and 0.05/22.75 ≈ 0.
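A direct sketch of the computation, matching the numbers above:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]   # exponentiate each score
    total = sum(exps)
    return [e / total for e in exps]       # normalise to sum to 1

print([round(p, 2) for p in softmax([3, 1, -3])])  # [0.88, 0.12, 0.0]
```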
Network Parameters
[Figure: the 16 × 16 = 256-pixel image (ink → 1, no ink → 0) feeding a network whose softmax outputs are y1, ..., y10 ("is 1", "is 2", ..., "is 0")]
Set the network parameters such that: given an image of "1" as input, y1 has the maximum value; given an image of "2", y2 has the maximum value; and so on for every class.
Visual Information Processing
Enabling Factor of DL
Hierarchical Learning
[Pipeline: input → low-level features → mid-level features → high-level features → trainable classifier → output]
Inspired by visual information processing, hierarchical representation learning was developed, also known as "Deep Learning". The term was first used in 1986 by Rina Dechter, and the field has undergone a revolution since 2012.
Deep Neural Network
[Figure: input layer → hidden layers (Layer 1, Layer 2, ..., Layer L) → output layer producing y1, y2, ..., yM; every node is a neuron]
"Deep" means many hidden layers.
Why Deep Network?
Word error rate (WER) on conversational speech, with deeper networks on the left and wider single-hidden-layer networks on the right:

Layer × Size | Word Error Rate (%) | Layer × Size | Word Error Rate (%)
1 × 2k | 24.2 | |
2 × 2k | 20.4 | |
3 × 2k | 18.4 | |
4 × 2k | 17.8 | |
5 × 2k | 17.2 | 1 × 3772 | 22.5
7 × 2k | 17.1 | 1 × 4634 | 22.6
 | | 1 × 16k | 22.1
Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech. 2011.
Not surprising: more parameters, better performance.
Why Deep Network?
Any continuous function f can be realized by a network with one hidden layer, given enough hidden neurons (the universal approximation theorem); see the sketch below.
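A minimal illustration of the statement, assuming random tanh hidden units and a least-squares fit of the output weights (an illustrative construction, not the one used in the theorem's proof):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)[:, None]   # points at which we approximate f
target = np.sin(2 * x).ravel()         # f: any continuous function will do

H = 100                                # "enough hidden neurons"
W, b = rng.normal(size=(1, H)) * 3, rng.normal(size=H)
hidden = np.tanh(x @ W + b)            # a single hidden layer

out_w, *_ = np.linalg.lstsq(hidden, target, rcond=None)  # fit output weights
approx = hidden @ out_w
print(float(np.max(np.abs(approx - target))))  # max error; small with enough units
```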
Why “Deep” neural network not “Fat” neural network?
Fat + Short vs. Thin + Tall
[Figure: a shallow, wide ("fat + short") network next to a deep, narrow ("thin + tall") network]
Given the same number of parameters, which one is better?
Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech. 2011.

Layer × Size | Word Error Rate (%) | Layer × Size | Word Error Rate (%)
1 × 2k | 24.2 | |
2 × 2k | 20.4 | |
3 × 2k | 18.4 | |
4 × 2k | 17.8 | |
5 × 2k | 17.2 | 1 × 3772 | 22.5
7 × 2k | 17.1 | 1 × 4634 | 22.6
 | | 1 × 16k | 22.1

The single-hidden-layer networks on the right are sized so that each row has a comparable number of parameters, yet the thin + tall networks win clearly: e.g. 5 × 2k reaches 17.2% WER versus 22.5% for 1 × 3772.
Training multi-layer NNs (DNN)
[Figure sequence: a multi-layer network trained one layer at a time]
One classic strategy is greedy layer-wise training: train the first layer first, then the next layer, then the next, and so on, until finally the last layer is trained; a sketch follows below.
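A minimal sketch of this idea, assuming each layer is pretrained as a small sigmoid autoencoder before the next layer is stacked on its codes (the slides do not prescribe a per-layer objective; layer sizes, data, and hyperparameters are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(x, n_hidden, lr=1.0, steps=2000):
    """Train one sigmoid layer as an autoencoder on x; keep only the encoder."""
    n_in = x.shape[1]
    W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_in)); b2 = np.zeros(n_in)
    for _ in range(steps):
        h = sigmoid(x @ W1 + b1)             # encode
        r = sigmoid(h @ W2 + b2)             # decode (reconstruct x)
        d_r = (r - x) * r * (1 - r)          # backprop through decoder output
        d_h = (d_r @ W2.T) * h * (1 - h)     # backprop into encoder
        W2 -= lr * (h.T @ d_r) / len(x); b2 -= lr * d_r.mean(axis=0)
        W1 -= lr * (x.T @ d_h) / len(x); b1 -= lr * d_h.mean(axis=0)
    return W1, b1

# Greedy layer-wise: train the first layer, then the next on its codes, ...
x = rng.random((100, 8)).round()             # toy binary data (illustrative)
stack, inp = [], x
for n_hidden in (6, 4, 3):                   # layer sizes are illustrative
    W, b = pretrain_layer(inp, n_hidden)
    stack.append((W, b))
    inp = sigmoid(inp @ W + b)               # this layer's codes feed the next
print([W.shape for W, _ in stack])           # [(8, 6), (6, 4), (4, 3)]
```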
When to use Deep Learning?
"The fuel of deep learning is big data." (Andrew Ng)
[Plot: performance vs. amount of data; deep learning keeps improving as the data grows, while traditional machine learning plateaus]
Limitations of Deep Learning
Thank you! dinesh@dtu.ac.in
Problems on Neural Networks

Problem 1
Solutions

Problem 2
Solutions

Problem 3
Solutions