Neural Networks: A Brief History

Deniz Yuret, Feb 27, 2016

Real vs Artificial Neurons

A biological neuron

A computational neuron

Summary

  • Real neurons are extremely complex structures with multiple functions, multiple chemical pathways…
  • We think (but do not know for sure) that they primarily function by sending each other trains of electrical impulses.
  • An artificial neuron is a simplified model that represents the strengths of the impulses and connections with simple numbers (see the sketch below).
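
A minimal sketch of the idea in the last bullet, in Python (the weights, bias, and choice of sigmoid below are illustrative assumptions, not from the slides):

```python
import numpy as np

def artificial_neuron(x, w, b):
    """One artificial neuron: a weighted sum of the inputs plus a bias,
    squashed by a nonlinearity (sigmoid here)."""
    z = np.dot(w, x) + b             # connection strengths are just numbers
    return 1.0 / (1.0 + np.exp(-z))  # output stands in for impulse strength

# Example: three inputs with arbitrary connection strengths
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.3])
print(artificial_neuron(x, w, b=0.1))  # a single number between 0 and 1
```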

Perceptrons

Perceptrons (Rosenblatt, 1958)

Perceptron algorithm

Perceptron algorithm: example run

Perceptron convergence theorem

Perceptrons, the book (Minsky and Papert, 1969)

Summary

  • Rosenblatt called a simplified model of a single neuron a “perceptron”.
  • He figured out an algorithm to train a perceptron and showed it can learn to recognize many patterns (a sketch of the update rule follows this list).
  • Minsky and Papert showed that there are limits to what concepts a single perceptron can represent.
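
A minimal sketch of the perceptron learning rule in Python (the toy data, epoch count, and zero initialization are assumptions for illustration):

```python
import numpy as np

def train_perceptron(X, y, epochs=10):
    """Rosenblatt's update: whenever an example is misclassified,
    move the weight vector toward (or away from) that example."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):               # labels yi are +1 or -1
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified or on the boundary
                w += yi * xi
                b += yi
    return w, b

# A linearly separable toy problem (logical AND)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # [-1. -1. -1.  1.]
```

The convergence theorem mentioned above guarantees that this loop stops updating once a separating boundary is found, whenever one exists.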

Multilayer perceptrons

PDP (Rumelhart and McClelland, 1986)

Networks of perceptrons

Differentiable activation functions

Backpropagation algorithm

y = Wx + b  “model prediction”

J = |y - y’|²  “objective function” (y’ is the target output)
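
A minimal sketch of one backpropagation step for exactly this model, in Python (the sizes, inputs, and learning rate are made-up numbers):

```python
import numpy as np

# Forward pass: y = Wx + b, J = |y - y'|^2
x  = np.array([1.0, 2.0])        # input
yp = np.array([0.0, 1.0, 0.0])   # target y'
W  = np.random.randn(3, 2) * 0.1
b  = np.zeros(3)

y = W @ x + b                    # model prediction
J = np.sum((y - yp) ** 2)        # objective function

# Backward pass: apply the chain rule from J back to the parameters
dJ_dy = 2 * (y - yp)             # dJ/dy
dJ_dW = np.outer(dJ_dy, x)       # dJ/dW = (dJ/dy) x^T
dJ_db = dJ_dy                    # dy/db is the identity
dJ_dx = W.T @ dJ_dy              # gradient handed to an earlier layer, if any

# One gradient-descent step
lr = 0.1
W -= lr * dJ_dW
b -= lr * dJ_db
```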

Summary

  • People knew that networks of artificial neurons can “represent” concepts that a single perceptron cannot (XOR is the classic example, sketched below), but did not know how to “train” them.
  • In the 1980s they figured out how to train them using the backpropagation algorithm.
  • They also showed that a neural network with a large enough hidden layer is a “universal function approximator”, i.e. it can approximate any continuous function arbitrarily well.
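
A minimal sketch of both points in Python: a two-layer network with a differentiable (sigmoid) activation, trained by backpropagation to compute XOR (the layer sizes, learning rate, and step count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])          # XOR targets

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)  # hidden layer
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)  # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    h   = sigmoid(X @ W1 + b1)                  # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = 2 * (out - y) * out * (1 - out)     # chain rule on squared error
    d_h   = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)  # gradient descent
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(out.round(2).ravel())  # approaches [0, 1, 1, 0]
```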

Convolutional Neural Networks

The human visual system

Receptive fields of ganglion cells in the retina (Kuffler, 1953)

Receptive fields of cells from the cat visual cortex (Hubel and Wiesel, 1959)

A fully connected artificial neural network

From http://cs231n.github.io/convolutional-networks

Convolutional neural networks have sparse connectivity (LeCun, 1998)

From http://deeplearning.net/tutorial/lenet.html

Convolutional neural networks share weights

From http://deeplearning.net/tutorial/lenet.html

Activations of an example ConvNet

From http://cs231n.github.io/convolutional-networks

Receptive fields learnt by an example ConvNet

From http://cs231n.github.io/convolutional-networks

Summary

  • Multilayer perceptrons connect each unit in one layer to all the units in the previous layer.
  • Neurons in the early visual cortex have spatially limited receptive fields, and groups of them compute similar functions.
  • Convolutional neural networks connect each unit to a small patch of units in the previous layer and have groups of units that compute the same function (see the sketch below).
  • They do well on visual processing tasks.
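
A minimal sketch of sparse connectivity and weight sharing in Python (the edge-detector kernel is a stand-in for a learned receptive field):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """One convolutional feature map: each output unit sees only a small
    patch of the input (sparse connectivity), and every patch is filtered
    with the same weights (weight sharing)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # local receptive field
            out[i, j] = np.sum(patch * kernel)  # same kernel everywhere
    return out

# A 6x6 image that is dark on the left, bright on the right
image = np.zeros((6, 6)); image[:, 3:] = 1.0
kernel = np.array([[-1., 1.], [-1., 1.]])       # vertical-edge detector
print(conv2d_valid(image, kernel))              # responds only at the edge
```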

Recurrent neural networks

Recurrent connections

From: https://www.willamette.edu/~gorr/classes/cs449/rnn1.html

Processing sequences

From: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Example: machine translation

From: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

Example: generating image descriptions

From: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

Training RNNs

From: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

Demo: writing Shakespeare

From: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Summary

  • Feedforward nets (MLP, CNN) have fixed-size inputs and outputs and execute a fixed number of computational steps.
  • RNNs can operate over sequences of arbitrary length, updating their internal state and thus modifying the operations they perform at every step, like a computer program (see the sketch below).
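
A minimal sketch of this in Python: one set of recurrent weights applied at every position of a sequence, whatever its length (the sizes and random inputs are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Wxh = rng.normal(0, 0.1, (3, 4))  # input -> hidden
Whh = rng.normal(0, 0.1, (4, 4))  # hidden -> hidden (the recurrent connection)
b   = np.zeros(4)

def rnn(xs):
    h = np.zeros(4)                         # initial internal state
    for x in xs:                            # any sequence length works
        h = np.tanh(x @ Wxh + h @ Whh + b)  # state update at each step
    return h                                # final state summarizes the sequence

print(rnn(rng.normal(size=(5, 3))))  # a 5-step sequence
print(rnn(rng.normal(size=(9, 3))))  # a 9-step sequence, same weights
```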

Conclusion

We are trying to build computer models with three features:

  • Representation: a model should be able to represent a solution to your problem.
  • Learning: a model should be trainable to find the solution from examples.
  • Efficiency: the architecture should allow a compact representation to make learning feasible.