1 of 62

Artificial Neural Network

Dinesh K. Vishwakarma, Ph.D.

PROFESSOR, DEPARTMENT OF INFORMATION TECHNOLOGY

DELHI TECHNOLOGICAL UNIVERSITY, DELHI.

Webpage: http://www.dtu.ac.in/Web/Departments/InformationTechnology/faculty/dkvishwakarma.php

2 of 62

Introduction

  • Artificial neural networks (ANNs) provide a practical method for learning
      • real-valued functions
      • discrete-valued functions
      • vector-valued functions
  • Robust to errors in training data
  • Successfully applied to such problems as
      • interpreting visual scenes
      • speech recognition
      • learning robot control strategies


3 of 62

Introduction…

  • ANN learning is well suited to problems in which the training data is noisy and complex (e.g., inputs from cameras or microphones)
  • Can also be used for problems with symbolic representations
  • Most appropriate for problems where
    • Instances have many attribute-value pairs
    • Target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes
    • Training examples may contain errors
    • Long training times are acceptable
    • Fast evaluation of the learned target function may be required
    • The ability for humans to understand the learned target function is not important.


4 of 62

Human Brain Processing


[Diagram: biological neuron, input to output. Dendrites: input; Cell body: processor; Synapse: link; Axon: output]

The human brain is made up of billions of simple processing units – neurons.

5 of 62

Neuron


[Diagram: artificial neuron. Inputs x1 … xn are multiplied by weights w1 … wn, summed together with a bias b, and passed through an activation function f, giving output y = f(Σi wi xi + b)]

6 of 62

Neuron…


  • Artificial neurons are based on biological neurons.
  • Each neuron in the network receives one or more inputs.
  • An activation function is applied to the weighted sum of the inputs; the result determines the output of the neuron – the activation level.

[Figures: common activation functions, and how an activation function works]
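A minimal sketch of a single neuron in Python; the sigmoid activation and all numeric values here are illustrative, not taken from the slides:

    import numpy as np

    def sigmoid(z):
        # Logistic activation: squashes any real z into (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    def neuron(x, w, b):
        # One artificial neuron: activation applied to the weighted sum plus bias.
        return sigmoid(np.dot(w, x) + b)

    x = np.array([0.5, -1.0, 2.0])   # hypothetical inputs
    w = np.array([0.8, 0.2, -0.4])   # hypothetical weights
    print(neuron(x, w, b=0.1))       # a value in (0, 1): the activation level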

7 of 62

Neural Network


How do we train?

[Diagram: a fully connected network with 3 inputs, one hidden layer of 4 neurons, and 2 outputs; the edges carry the weights and each neuron applies an activation function]

4 + 2 = 6 neurons (not counting inputs)

[3 x 4] + [4 x 2] = 20 weights

4 + 2 = 6 biases

26 learnable parameters
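The same count works for any fully connected layout; a quick sketch:

    def count_params(layer_sizes):
        # Weights connect consecutive layers; one bias per non-input neuron.
        weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
        biases = sum(layer_sizes[1:])
        return weights + biases

    print(count_params([3, 4, 2]))   # 20 weights + 6 biases = 26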


8 of 62

Training Perceptron

  • Learning involves choosing values for the weights.
  • The perceptron is trained as follows:
    • First, inputs are given random weights (usually between –0.5 and 0.5).
    • An item of training data is presented. If the perceptron misclassifies it, each weight is modified by the update rule w_i ← w_i + a(t − o)x_i
      • where t is the target output for the training example, o is the output generated by the perceptron, and a is the learning rate, between 0 and 1 (usually small, such as 0.1).
  • Cycle through the training examples until all are classified correctly; a sketch of this loop follows below.
    • Each cycle is known as an epoch.
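A minimal sketch of that loop; the AND dataset and the trainable bias term are illustrative additions:

    import numpy as np

    rng = np.random.default_rng(0)

    def train_perceptron(X, t, a=0.1, max_epochs=100):
        # Perceptron rule: w_i <- w_i + a * (t - o) * x_i, cycling until all correct.
        w = rng.uniform(-0.5, 0.5, X.shape[1])   # random initial weights
        b = rng.uniform(-0.5, 0.5)               # bias, updated the same way
        for _ in range(max_epochs):              # each pass is one epoch
            mistakes = 0
            for x, target in zip(X, t):
                o = 1 if np.dot(w, x) + b > 0 else 0   # threshold output
                if o != target:
                    w += a * (target - o) * x
                    b += a * (target - o)
                    mistakes += 1
            if mistakes == 0:                    # every example classified correctly
                break
        return w, b

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # toy data: logical AND
    t = np.array([0, 0, 0, 1])
    print(train_perceptron(X, t))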


9 of 62

Backpropagation

  • Multilayer neural networks learn in the same way as perceptrons.
  • However, there are many more weights, and it is important to assign credit (or blame) correctly when changing weights.
  • The training error E sums the squared errors over all of the network output units k and all training examples d: E(w) = ½ Σ_d Σ_k (t_kd − o_kd)²


10 of 62

Backpropagation Algorithm

  • Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units.
  • Initialize all network weights to small random numbers.
  • Until the termination condition is met, Do
    • For each <x,t> in the training examples, Do
      • Propagate the input forward through the network: input the instance x and compute the output o_u of every unit u in the network.
      • Propagate the errors backward through the network:
        • For each network output unit k, calculate its error term δ_k ← o_k(1 − o_k)(t_k − o_k)
        • For each hidden unit h, calculate its error term δ_h ← o_h(1 − o_h) Σ_{k ∈ outputs} w_kh δ_k
        • Update each network weight: w_ji ← w_ji + η δ_j x_ji

(These error terms assume sigmoid units; a compact implementation sketch follows.)
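A compact numpy sketch of the algorithm for one hidden layer of sigmoid units; the layer sizes, the single training pair, and the learning rate η = 0.5 are illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, t, W_h, W_o, eta=0.5):
        # Forward pass: compute the output o_u of every unit.
        o_h = sigmoid(W_h @ x)
        o_k = sigmoid(W_o @ o_h)
        # Backward pass: error terms for output units, then hidden units.
        delta_k = o_k * (1 - o_k) * (t - o_k)
        delta_h = o_h * (1 - o_h) * (W_o.T @ delta_k)
        # Weight updates: w_ji <- w_ji + eta * delta_j * x_ji.
        W_o += eta * np.outer(delta_k, o_h)
        W_h += eta * np.outer(delta_h, x)
        return W_h, W_o

    W_h = rng.uniform(-0.05, 0.05, (4, 3))    # 3 inputs, 4 hidden units
    W_o = rng.uniform(-0.05, 0.05, (2, 4))    # 2 output units
    x, t = np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0])
    for _ in range(1000):                     # "until termination condition is met"
        W_h, W_o = backprop_step(x, t, W_h, W_o)
    print(sigmoid(W_o @ sigmoid(W_h @ x)))    # approaches t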


11 of 62

Hidden Layer representation

Can this be learned?

Target Function: f(x) = x, the identity over the eight one-hot inputs listed on the next slide; an 8 × 3 × 8 network must re-create each 8-bit input at its output through only three hidden units.

12 of 62

Yes


Input       Hidden Values      Output
10000000    .89  .04  .08      10000000
01000000    .15  .99  .99      01000000
00100000    .01  .97  .27      00100000
00010000    .99  .97  .71      00010000
00001000    .03  .05  .02      00001000
00000100    .01  .11  .88      00000100
00000010    .80  .01  .98      00000010
00000001    .60  .94  .01      00000001
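Rounding each hidden value to 0 or 1, an observation easily checked in Python, shows that the three hidden units have learned a distinct 3-bit code for each of the eight inputs:

    hidden = [
        (.89, .04, .08), (.15, .99, .99), (.01, .97, .27), (.99, .97, .71),
        (.03, .05, .02), (.01, .11, .88), (.80, .01, .98), (.60, .94, .01),
    ]
    codes = ["".join(str(round(h)) for h in row) for row in hidden]
    print(codes)                  # ['100', '011', '010', '111', '000', '001', '101', '110']
    print(len(set(codes)) == 8)   # True: all eight codes are distinct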

13 of 62

Example 1 of NN


[Diagram: a single neuron computing f(x) from three inputs via weights w1, w2, w3 with values 1.4, −2.5, and −0.06]

14 of 62

Example 1 of NN…


[Diagram: the same neuron with inputs 2.7, −8.6, and 0.002 applied]

The weighted sum is x = (−0.06 × 2.7) + (−2.5 × −8.6) + (1.4 × 0.002) = 21.34, which then passes through the activation function f.
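A one-line check of that weighted sum:

    import numpy as np

    w = np.array([-0.06, -2.5, 1.4])   # weights, ordered as in the sum above
    x = np.array([2.7, -8.6, 0.002])   # inputs
    print(np.dot(w, x))                # 21.3408, i.e. 21.34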

15 of 62

Example 1 of NN…

A dataset

Fields           Class
1.4  2.7  1.9    0
3.8  3.4  3.2    0
6.4  2.8  1.7    1
4.1  0.1  0.2    0
etc …

16 of 62

Example 1 of NN…

Training the neural network: each row's three fields are presented to a network (three inputs, one output), and the class is the target output.

17 of 62

Example 1 of NN…

Initialise with random weights.

18 of 62

Example 1 of NN…

Present a training pattern: inputs (1.4, 2.7, 1.9).

19 of 62

Example 1 of NN…

Feed it through to get the output: 0.8.

20 of 62

Example 1 of NN…

Compare with the target output: the target is 0 but the output is 0.8, so the error is 0.8.

21 of 62

Example 1 of NN…

Adjust the weights based on this error.

22 of 62

Example 1 of NN…

Present another training pattern: inputs (6.4, 2.8, 1.7).

23 of 62

Example 1 of NN…

Feed it through to get the output: 0.9.

24 of 62

Example 1 of NN…

Compare with the target output: the target is 1 but the output is 0.9, so the error is −0.1.

25 of 62

Example 1 of NN…

Adjust the weights based on this error.

26 of 62

Example 1 of NN…

And so on….

Repeat this thousands, maybe millions of times, each time taking a random training instance and making slight weight adjustments.

Algorithms for weight adjustment are designed to make changes that will reduce the error.

27 of 62

Example of Digit Recognition

[Diagram: a handwritten "2" is scanned as a 16 × 16 = 256-pixel image (ink → 1, no ink → 0); the 256 values feed a network with ten outputs y1, …, y10, where yi is the confidence that the image is the digit i (y10 standing for "0"). Here y1 = 0.1, y2 = 0.7, y10 = 0.2, so the machine reads the image as "2".]

28 of 62

Example of Neural Network

The neurons use the sigmoid function σ(z) = 1 / (1 + e^(−z)).

[Diagram] With input (1, −1): the first neuron has weights (1, −2) and bias 1, so σ(1·1 + (−1)(−2) + 1) = σ(4) = 0.98; the second has weights (−1, 1) and bias 0, so σ(1·(−1) + (−1)·1 + 0) = σ(−2) = 0.12.

29 of 62

Example of Neural Network

[Diagram] Continuing through the remaining layers with input (1, −1):

Layer 1: 0.98 and 0.12, as above.
Layer 2: σ(0.98·2 + 0.12·(−1) + 0) = 0.86 and σ(0.98·(−2) + 0.12·(−1) + 0) = 0.11.
Layer 3 (output): σ(0.86·3 + 0.11·(−1) − 2) = 0.62 and σ(0.86·(−1) + 0.11·4 + 2) = 0.83.

30 of 62

Example of Neural Network

[Diagram] The same network on input (0, 0):

Layer 1: σ(0·1 + 0·(−2) + 1) = σ(1) = 0.73 and σ(0·(−1) + 0·1 + 0) = σ(0) = 0.5.
Layer 2: σ(0.73·2 + 0.5·(−1) + 0) = 0.72 and σ(0.73·(−2) + 0.5·(−1) + 0) = 0.12.
Layer 3 (output): σ(0.72·3 + 0.12·(−1) − 2) = 0.51 and σ(0.72·(−1) + 0.12·4 + 2) = 0.85.

A network with a given set of parameters defines a function; different parameters define a different function. Here f(1, −1) = (0.62, 0.83) and f(0, 0) = (0.51, 0.85).
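A small numpy check of both forward passes, with the weights and biases read off the diagrams above:

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # One (weights, biases) pair per layer; each row is one neuron.
    params = [
        (np.array([[1, -2], [-1, 1]]), np.array([1, 0])),
        (np.array([[2, -1], [-2, -1]]), np.array([0, 0])),
        (np.array([[3, -1], [-1, 4]]), np.array([-2, 2])),
    ]

    def f(x):
        for W, b in params:
            x = sigmoid(W @ x + b)
        return x

    print(f(np.array([1, -1])).round(2))   # [0.62 0.83]
    print(f(np.array([0, 0])).round(2))    # [0.51 0.85]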

31 of 62

Example of Neural Network

In matrix form, the first layer's computation is

σ( W1 x + b1 ) = σ( [1 −2; −1 1] [1; −1] + [1; 0] ) = σ( [4; −2] ) = [0.98; 0.12]

32 of 62

Example of Neural Network

With weight matrices W1, …, WL and bias vectors b1, …, bL, the layer outputs are

a1 = σ(W1 x + b1)
a2 = σ(W2 a1 + b2)
…
y = σ(WL aL−1 + bL)

where y = (y1, y2, …, yM) is the network output.

33 of 62

Neural Network

Composing the layers, the whole network computes one nested function:

y = f(x) = σ( WL · … σ( W2 · σ( W1 x + b1 ) + b2 ) … + bL )

Every step is a matrix operation, so parallel computing techniques (notably GPUs) can be used to speed it up.
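Since every layer is one matrix product, a whole batch of inputs can be pushed through together, which is exactly the kind of work GPUs accelerate; a sketch with illustrative sizes:

    import numpy as np

    rng = np.random.default_rng(2)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def forward(X, params):
        # y = sigma(W_L ... sigma(W_2 sigma(W_1 x + b_1) + b_2) ... + b_L), batched.
        A = X                            # each row of X is one input vector
        for W, b in params:
            A = sigmoid(A @ W.T + b)     # one matrix product per layer
        return A

    sizes = [256, 100, 10]               # illustrative 256 -> 100 -> 10 network
    params = [(rng.normal(0, 0.1, (m, n)), np.zeros(m))
              for n, m in zip(sizes, sizes[1:])]
    print(forward(rng.random((32, 256)), params).shape)   # (32, 10): 32 inputs at once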

34 of 62

Softmax

  • Softmax layer as the output layer

Ordinary layer: in general, the outputs of the network can be any values, which may not be easy to interpret.

35 of 62

Softmax

  • Softmax layer as the output layer

A softmax layer exponentiates each input and normalises, so the outputs are positive and sum to 1:

y_i = e^{z_i} / Σ_j e^{z_j}

For example, inputs (3, 1, −3) give e^3 = 20, e^1 = 2.7, e^−3 ≈ 0.05, which normalise to 0.88, 0.12, and approximately 0.
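The same computation as a small Python function; subtracting the maximum before exponentiating is a standard numerical-stability trick, not something shown on the slide:

    import numpy as np

    def softmax(z):
        # Exponentiate and normalise: outputs are positive and sum to 1.
        e = np.exp(z - z.max())
        return e / e.sum()

    print(softmax(np.array([3.0, 1.0, -3.0])).round(2))   # [0.88 0.12 0.  ]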

36 of 62

Network Parameters

With a 16 × 16 = 256-pixel input (ink → 1, no ink → 0) and a softmax output layer over y1 … y10:

Set the network parameters such that
  • when the input is an image of "1", y1 has the maximum value;
  • when the input is an image of "2", y2 has the maximum value; and so on.

(For the example image of a "2", the outputs were y1 = 0.1, y2 = 0.7, …, y10 = 0.2.)

37 of 62

Visual Information Processing

  • Visual information processing in our brain is multi-layered.


38 of 62

Enabling Factors of DL

  • Training of deep networks was made computationally feasible by:
      • Faster CPUs
      • The move to parallel CPU architectures
      • The advent of GPU computing
  • Neural networks are often represented as matrices of weight vectors.
  • GPUs are optimized for very fast matrix multiplication.
  • 2008 – Nvidia’s CUDA library for GPU computing is released.


39 of 62

Hierarchical Learning


[Diagram: trainable pipeline from low-level features → mid-level features → high-level features → trainable classifier → output]

Inspired by visual information processing, a hierarchical way of learning representations was developed, also known as “Deep Learning”.

The term was first used in 1986 by Rina Dechter; the field has undergone a revolution since 2012.

40 of 62

Deep Neural Network

[Diagram: input layer x → hidden Layer 1 → Layer 2 → … → Layer L → output layer producing y1, y2, …, yM; each node is a neuron]

“Deep” means many hidden layers.

41 of 62

Why Deep Network?

Deep (layers × size)   WER (%)      Shallow (layers × size)   WER (%)
1 × 2k                 24.2
2 × 2k                 20.4
3 × 2k                 18.4
4 × 2k                 17.8
5 × 2k                 17.2         1 × 3772                  22.5
7 × 2k                 17.1         1 × 4634                  22.6
                                    1 × 16k                   22.1

Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech, 2011.

Not surprising: more parameters give better performance.

42 of 62

Why Deep Network?

  • Universality Theorem


Any continuous function f can be realized by a network with one hidden layer (given enough hidden neurons).

So why a “deep” neural network rather than a “fat” one?
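Before answering, a quick numerical illustration of the universality claim; the target function sin(2x) and the random-feature fitting scheme are my assumptions, not the slides’:

    import numpy as np

    rng = np.random.default_rng(3)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    x = np.linspace(-3, 3, 200)[:, None]
    f = np.sin(2 * x).ravel()                # a continuous target function

    for n_hidden in [2, 10, 100]:
        # One hidden layer: random sigmoid features, output weights by least squares.
        W, b = rng.normal(0, 2, (1, n_hidden)), rng.normal(0, 2, n_hidden)
        H = sigmoid(x @ W + b)
        v, *_ = np.linalg.lstsq(H, f, rcond=None)
        print(n_hidden, np.abs(H @ v - f).max())   # error typically shrinks with width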

43 of 62


Fat + Short vs. Thin + Tall

[Diagram: a shallow, wide network and a deep, narrow network with the same number of parameters]

Which one is better?

44 of 62

Fat + Short vs. Thin + Tall


Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech. 2011.

The table above answers the question: at comparable parameter budgets, the thin, tall networks achieve lower word error rates than the fat, short ones. For example, 5 × 2k reaches 17.2% WER while the shallow 1 × 3772 remains at 22.5%.

45 of 62

Training multi-layer NNs (DNN)


46 of 62

Training multi-layer NNs

Train this layer first

47 of 62

Training multi-layer NNs


Train this layer first

then this layer

48 of 62

Training multi-layer NNs


Train this layer first

then this layer

then this layer

49 of 62

Training multi-layer NNs


Train this layer first

then this layer

then this layer

then this layer

50 of 62

Training multi-layer NNs


Train this layer first

then this layer

then this layer

then this layer

finally this layer
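A minimal sketch of this greedy layer-wise scheme; the slides do not name the per-layer objective, so this assumes the common stacked-autoencoder variant with illustrative sizes:

    import numpy as np

    rng = np.random.default_rng(4)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def train_autoencoder(X, n_hidden, lr=0.1, epochs=200):
        # Train one layer to reconstruct its own input; keep only the encoder.
        n = X.shape[1]
        W_enc = rng.normal(0, 0.1, (n, n_hidden))
        W_dec = rng.normal(0, 0.1, (n_hidden, n))
        for _ in range(epochs):
            H = sigmoid(X @ W_enc)               # encode
            err = H @ W_dec - X                  # linear reconstruction error
            dW_dec = H.T @ err / len(X)          # gradients of mean squared error
            dH = err @ W_dec.T * H * (1 - H)
            dW_enc = X.T @ dH / len(X)
            W_dec -= lr * dW_dec
            W_enc -= lr * dW_enc
        return W_enc

    X = rng.random((100, 8))                     # illustrative unlabeled data
    weights, H = [], X
    for n_hidden in [6, 4, 3]:                   # train this layer first, then the next...
        W = train_autoencoder(H, n_hidden)
        weights.append(W)
        H = sigmoid(H @ W)                       # frozen codes feed the next layer
    # A final supervised layer plus backpropagation fine-tuning would follow.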

51 of 62

When to use Deep Learning?

  • Data size is large
  • High-end infrastructure is available
  • Domain understanding is lacking, so features are hard to hand-design
  • The problem is complex, e.g. image classification or speech recognition


“The fuel of deep learning is big data.” – Andrew Ng

[Plot: performance vs. amount of data; deep learning keeps improving with more data, while traditional machine learning levels off]

52 of 62

Limitations of Deep Learning

  • Very slow to train
  • Models are very complex, with a lot of parameters to optimize:
    • Initialization of weights
    • Layer-wise training algorithm
    • Neural architecture
      • Number of layers
      • Size of layers
      • Type – regular, pooling, max pooling, softmax
    • Fine-tuning of weights using backpropagation


53 of 62

Thank you! dinesh@dtu.ac.in


54 of 62

Problems on Neural Networks


55 of 62

Problem 1

  • Consider an artificial neuron that has three input nodes x = (x1, x2, x3), each receiving only binary signals (either 0 or 1). How many different input patterns can this node receive? What if the node had four inputs? Five? Can you give a formula that computes the number of binary input patterns for a given number of inputs?


56 of 62

Solutions

  • A node with n binary inputs can receive 2^n different input patterns: 2^3 = 8 for three inputs, 2^4 = 16 for four, and 2^5 = 32 for five.


57 of 62

Problem 2

  • Consider an artificial neuron with three inputs whose corresponding weights are (2, −4, 1) and whose activation function is the unit step. Determine the output for the following input values.


58 of 62

Solutions

  • For each input vector, compute the weighted sum 2x1 − 4x2 + x3 and apply the unit step (assuming a threshold of 0): the output is 1 if the sum is ≥ 0, and 0 otherwise.


59 of 62

Problem 3

  • Consider a feed-forward network with two inputs, two hidden nodes (3 and 4), and two output nodes (5 and 6), with given weights and input values. Calculate the outputs of the network.


60 of 62

Solutions

  • To find the output of the network, it is first necessary to calculate the weighted sums of hidden nodes 3 and 4.

  • Then find the outputs of the hidden nodes by applying the activation function.
  • Use the outputs of the hidden nodes, y3 and y4, as the input values to the output layer (nodes 5 and 6), and find the weighted sums of output nodes 5 and 6.

  • Finally, compute the outputs of nodes 5 and 6 using the same activation function.

