1 of 44

Neural Networks and Fuzzy Systems

The Multilayer Perceptron

Rizoan Toufiq

Assistant Professor

Department of Computer Science & Engineering

Rajshahi University of Engineering & Technology

2 of 44

Altering The Perceptron Model

  • The Problem:
    • The perceptrons in the second layer do not know which of the real inputs were on or off;
    • it is therefore impossible to strengthen the correct parts of the network.
    • This is the credit assignment problem: the network is unable to determine which of the input weights should be increased and which should not.

3 of 44

Altering The Perceptron Model

  • The Problem:
    • How are we to overcome the problem of being unable to solve linearly inseparable problems with our perceptron?

4 of 44

Altering The Perceptron Model

  • The Solution:
    • A couple of possibilities for the new thresholding function
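The candidate functions on this slide appear only as figures in the original. Two smooth alternatives to the hard step commonly proposed at this point are the logistic sigmoid and the hyperbolic tangent (a sketch, not necessarily the exact pair shown on the slide):

```latex
f(net) = \frac{1}{1 + e^{-k\,net}}, \qquad f(net) = \tanh(net)
```

Both are continuously differentiable, which is what the new learning rule will require.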

5 of 44

The New Model

6 of 44

The New Learning Rule

  • The “generalised delta rule”, or the “backpropagation rule”, was
    • suggested in 1986 by Rumelhart, McClelland, and Williams.
  • Parker had published similar results in 1982.
  • Werbos had described essentially the same work as early as 1974.

7 of 44

The New Learning Rule

  • The “generalised delta rule”, or the “backpropagation rule”:
    • Define an error function that represents the difference between the network’s current output and the correct output that we want it to produce.
    • We want to continually reduce the value of this error function by
      • adjusting the weights on the links between the units. The generalised delta rule does this by calculating the value of the error function for that particular input, and then back-propagating (hence the name!) the error from one layer to the previous one.

8 of 44

The New Learning Rule

The Mathematics

    • Ep → the error function for pattern p,
    • tpj → the target output for pattern p on node j,
    • Opj → the actual output for pattern p on node j,
    • wij → the weight from node i to node j.

Hidden Layer to output layer
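The equations on this slide are not reproduced in the extracted text. Using the notation above, the standard output-layer relations in Beale and Jackson's treatment can be reconstructed as a sketch (with net_pj = Σᵢ wij·Opi an assumed definition of a unit's net input, and η the gain term):

```latex
E_p = \tfrac{1}{2} \sum_j \left( t_{pj} - O_{pj} \right)^2, \qquad
\Delta_p w_{ij} = \eta\, \delta_{pj}\, O_{pi}
```

That is, the weight change on each link is proportional to the delta of the receiving node and the output of the sending node.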

13 of 44

The New Learning Rule

The Mathematics

Hidden Layer to output layer

We now need to know what δpj is for each of the units; if we know this, then we can decrease E.

Using (4.6) and the chain rule,
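The derivation itself is missing from the extracted text. A reconstruction of the standard chain-rule step, consistent with the notation above (a sketch, with f_j the unit's activation function):

```latex
\delta_{pj} \;=\; -\frac{\partial E_p}{\partial net_{pj}}
\;=\; -\frac{\partial E_p}{\partial O_{pj}} \cdot \frac{\partial O_{pj}}{\partial net_{pj}}
\;=\; \left( t_{pj} - O_{pj} \right) f_j'(net_{pj})
```

This gives the delta for an output unit directly in terms of the target, the actual output, and the derivative of the activation function.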


15 of 44

The New Learning Rule

The Mathematics

Hidden Layer to output layer

16 of 44

The New Learning Rule

The Mathematics

Input layer to Hidden Layer

17 of 44

The New Learning Rule

The Mathematics

Input layer to Hidden Layer

Equations (4.12) and (4.15) together define how we can train our multilayer networks.
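The hidden-unit equation itself is not in the extracted text. The standard form of (4.15), in which a hidden unit's error is the weighted sum of the deltas of the units it feeds into, can be sketched as:

```latex
\delta_{pj} \;=\; f_j'(net_{pj}) \sum_k \delta_{pk}\, w_{jk}
```

Since a hidden unit has no target output of its own, its delta is obtained by propagating the output-layer deltas backwards through the weights, which is what gives the rule its name.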

18 of 44

The New Learning Rule

Advantage of Sigmoid Function
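The advantage referred to is that the sigmoid's derivative can be computed from its output alone. For the logistic function with gain k:

```latex
f(net) = \frac{1}{1 + e^{-k\,net}}, \qquad
f'(net) = k\, f(net)\,\bigl(1 - f(net)\bigr)
```

So for a unit with output Opj = f(net_pj), the derivative is simply k·Opj·(1 − Opj), requiring no re-evaluation of the exponential during the backward pass.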

19 of 44

The Multilayer Perceptron Algorithm

20 of 44

The Multilayer Perceptron Algorithm

21 of 44

The Multilayer Perceptron Algorithm

22 of 44
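The algorithm slides above are figure-only in the extracted text. The following is a minimal runnable sketch of the multilayer perceptron training loop (my own variable names, not from the slides): a 2-2-1 network trained on the XOR patterns with batch updates, assuming the logistic sigmoid with gain k = 1.

```python
import numpy as np

rng = np.random.default_rng(42)

# XOR training patterns and targets
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One hidden layer of two units, one output unit; small random weights
W1 = rng.normal(0.0, 0.5, size=(2, 2))   # input -> hidden weights
b1 = np.zeros(2)
W2 = rng.normal(0.0, 0.5, size=(2, 1))   # hidden -> output weights
b2 = np.zeros(1)
eta = 0.5                                # gain (learning rate)

def forward(X):
    H = sigmoid(X @ W1 + b1)             # hidden-unit outputs
    O = sigmoid(H @ W2 + b2)             # output-unit outputs
    return H, O

def sse(O):
    return 0.5 * np.sum((T - O) ** 2)    # summed squared error, all patterns

_, O = forward(X)
initial_error = sse(O)

for epoch in range(5000):
    H, O = forward(X)
    # Output-layer deltas: (t - o) * f'(net), with f'(net) = o * (1 - o)
    delta_o = (T - O) * O * (1.0 - O)
    # Hidden-layer deltas: back-propagate output deltas through W2
    delta_h = H * (1.0 - H) * (delta_o @ W2.T)
    # Weight changes proportional to delta and the sending unit's output
    W2 += eta * H.T @ delta_o
    b2 += eta * delta_o.sum(axis=0)
    W1 += eta * X.T @ delta_h
    b1 += eta * delta_h.sum(axis=0)

_, O = forward(X)
final_error = sse(O)
print(initial_error, final_error)
```

Repeated presentations of the whole training set are needed; as the slides note later, convergence to the correct mapping is not guaranteed for every random initialisation, but the error should decrease.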

The XOR problem Revisited

The hidden unit acts as a feature detector.

23 of 44

The XOR problem Revisited

24 of 44

The XOR problem Revisited

One of the more interesting cases is when there is no direct connection from the input to the output.

26 of 44

The XOR problem Revisited

The learning rule is not guaranteed to converge, however, and it is possible for the network to fall into a situation in which it is unable to learn the correct output.

The network shown in figure 4.7 will correctly respond to the input patterns 00 and 10, but fails to produce the correct output for the patterns 01 or 11.

This local minimum occurs infrequently, about 1% of the time in the XOR problem.

27 of 44

Visualising Network Behavior

28 of 44

Visualising Network Behavior

29 of 44

Visualising Network Behavior

30 of 44

Multilayer perceptrons as classifiers

31 of 44

Multilayer perceptrons as classifiers

32 of 44

Multilayer perceptrons as classifiers

33 of 44

Multilayer perceptrons as classifiers

if we add another layer of perceptrons.

34 of 44

Multilayer perceptrons as classifiers

We never need more than three layers in a network, a statement that is referred to as the Kolmogorov theorem.

37 of 44

Multilayer perceptrons as classifiers

38 of 44

Generalization

One of the major features of neural networks is their ability to generalise, that is, to successfully classify patterns that have not been previously presented.

Neural networks are good at interpolation, but not so good at extrapolation.

39 of 44

Fault Tolerance

40 of 44

Learning Difficulties

Local minima

Underfitting

Overfitting

Divergence

Often there is only a slight “lip” to cross before reaching an actual deeper minimum.

There are alternative approaches to minimizing these occurrences, which are outlined below.

    • Lowering the gain term
    • Addition of internal nodes
    • Momentum term
    • Addition of noise
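As an illustrative sketch of the momentum-term remedy (the variable names and the quadratic test function are my own, not from the slides), each weight change is augmented with a fraction α of the previous change, which smooths the descent and helps carry the search over small lips in the error surface:

```python
# Gradient descent with a momentum term on a simple quadratic, f(w) = w**2.
# eta is the gain term; alpha is the (hypothetical) momentum coefficient.
eta, alpha = 0.05, 0.9

w = 5.0              # starting weight
delta_w = 0.0        # previous weight change

for step in range(50):
    grad = 2.0 * w                            # df/dw for f(w) = w**2
    delta_w = -eta * grad + alpha * delta_w   # momentum-augmented update
    w += delta_w

print(abs(w))
```

With α = 0 this reduces to plain gradient descent; with α near 1 the update accumulates a "velocity" along persistent gradient directions.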

41 of 44

Learning Difficulties

Other Learning Problems

The method of gradient descent is intrinsically slow to converge in a complex error landscape.

One solution: add a momentum term.

Another solution: take account of second-order effects in the gradient descent algorithm.

42 of 44

Summary


  • Multilayer perceptron - layers of perceptron-like units.
  • Feedforward, supervised learning.
  • Uses a continuously differentiable thresholding function (usually sigmoid).
  • Back-propagation algorithm (generalised delta rule) trains network by passing errors back down the net.
  • Three layers of active units can represent any pattern classification.
  • Net develops internal representations of the input’s structure.
  • Repeated presentations of training data required for learning.
  • Described by energy landscape.
  • Learning process will not always converge.
  • Variety of approaches to overcome learning difficulties.

43 of 44

Read Task


B1: Neural Computing: An Introduction, R. Beale and T. Jackson, Adam Hilger (IOP Publishing Ltd), 1990.

Chapter 4: The Multilayer Perceptron

44 of 44

Questions?
