1 of 43

Building simple ANN from scratch

MNIST

2 of 43

From scratch?

Starting with TensorFlow or PyTorch is easy, but you can't really tell what's going on inside unless you read the source code or official documentation. Building a simple network from scratch may help you understand ANNs better.

  • What we are going to do: build a simple ANN for MNIST classification with NumPy

3 of 43

Recap slides


4 of 43

A perceptron - neuron


Rosenblatt, F. (1958). "The perceptron: A probabilistic model for information storage and organization in the brain." Psychological Review, 65(6), 386–408.
5 of 43

A perceptron – activation function


6 of 43

Single layer perceptron

[Figure: single-layer perceptron; input nodes fully connected to output nodes]

7 of 43

A single layer can be seen as matrix multiplication too

https://www.jeremyjordan.me/intro-to-neural-networks/
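That matmul view can be sketched in NumPy; the layer sizes and numbers below are made up purely for illustration:

```python
import numpy as np

# Hypothetical layer: 3 inputs, 2 output neurons.
# Each row of W holds one output neuron's weights,
# so the whole layer collapses into a single matmul.
W = np.array([[0.5, -1.0, 2.0],
              [1.5,  0.0, 0.5]])   # shape (2, 3)
b = np.array([0.1, -0.2])          # one bias per output neuron
x = np.array([1.0, 2.0, 3.0])      # input vector

y = W @ x + b                      # -> array([4.6, 2.8])
```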

 

8 of 43

Neural network is transformation

  • The weight matrix transforms the input vector into the output vector
  • It moves the points (the inputs) into a new space (a transformation)

9 of 43

Linearity


10 of 43

Non-linearity using activation functions


https://junstar92.tistory.com/122
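As a sketch of why this helps: an activation such as the sigmoid (one common choice, used here only as an example) bends the otherwise linear map, so stacked layers stop collapsing into a single matrix:

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into (0, 1); applied element-wise
    # after the linear step W @ x + b, it makes the layer non-linear
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
a = sigmoid(z)          # values strictly between 0 and 1
```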

11 of 43

Multi layer perceptron

  • Stacked layers of perceptrons with one or more hidden layers
  • No cycles: the output of a layer cannot go back to earlier layers
  • Also called deep learning: it has more layers than just input and output

https://ml-cheatsheet.readthedocs.io/en/latest/forwardpropagation.html

12 of 43

Forward propagation

  • A deep learning model with two hidden layers

[Figure: network with layers labeled input, hidden 1, hidden 2, output; orange denotes -1, blue denotes 1; the output is -1 or 1]
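The pass the figure walks through can be written as plain NumPy; the layer sizes and the tanh activation here are assumptions for illustration (tanh keeps values in (-1, 1), matching the -1/1 output):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 inputs, two hidden layers of 4, one output.
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((4, 4)), np.zeros(4)
W3, b3 = rng.standard_normal((1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])

h1 = np.tanh(W1 @ x + b1)   # input    -> hidden 1
h2 = np.tanh(W2 @ h1 + b2)  # hidden 1 -> hidden 2
y  = np.tanh(W3 @ h2 + b3)  # hidden 2 -> output, in (-1, 1)
```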

13 of 43

Forward propagation (input – hidden1)

  • A deep learning model with two hidden layers

[Figure: the input → hidden 1 step; layers labeled input, hidden 1, hidden 2, output; the output is -1 or 1]

14 of 43

Forward propagation (hidden1 – hidden2)

  • A deep learning model with two hidden layers

[Figure: the hidden 1 → hidden 2 step; layers labeled input, hidden 1, hidden 2, output; the output is -1 or 1]

15 of 43

Forward propagation (hidden2 – output)

  • A deep learning model with two hidden layers

[Figure: the hidden 2 → output step; layers labeled input, hidden 1, hidden 2, output; the output is -1 or 1]

16 of 43

Loss
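The slides do not fix a particular loss function; mean squared error is one common choice and serves as a minimal example (the function name is ours):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # mean of the squared differences between prediction and target
    return np.mean((y_pred - y_true) ** 2)

loss = mse_loss(np.array([0.5]), np.array([1.0]))   # -> 0.25
```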

17 of 43

Gradient descent

[Figure: gradient descent; a loss curve with the axis labeled Loss]

18 of 43

Gradient descent

[Figure: stepping downhill along a loss curve; axis labeled Loss]
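A one-dimensional sketch of the idea, with a made-up loss L(w) = (w - 3)² whose minimum sits at w = 3:

```python
# Gradient descent on L(w) = (w - 3)**2, so dL/dw = 2 * (w - 3).
w = 0.0     # arbitrary starting point
lr = 0.1    # learning rate

for _ in range(100):
    grad = 2 * (w - 3)   # slope of the loss at the current w
    w = w - lr * grad    # step downhill along the gradient

# w has converged very close to the minimum at 3
```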

19 of 43

Optimizing

[Figure: network output with the current weights; loss = 0.25]

20 of 43

Optimizing

[Figure: network output after updating the weights; the output is closer to 1]

21 of 43

Backward propagation

  • BUT, how can we optimize the other weights?
  • We cannot observe the loss directly at the hidden layers...

We use the chain rule!

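A minimal sketch of the chain rule at work, on a made-up 2–1–1 network with sigmoid activations and a squared-error loss (all numbers are invented; the real network's shapes differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([1.0, 0.5])             # input
W1 = np.array([[0.2, -0.4]]); b1 = np.array([0.1])
W2 = np.array([[0.7]]);       b2 = np.array([-0.3])
t  = 1.0                              # target

# forward pass
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; y  = sigmoid(z2)
L  = 0.5 * (y - t) ** 2

# backward pass: each line multiplies on one more local derivative
dy  = y - t                  # dL/dy
dz2 = dy * y * (1 - y)       # dL/dz2 (sigmoid derivative)
dW2 = np.outer(dz2, a1)      # dL/dW2
da1 = W2.T @ dz2             # gradient pushed back to the hidden layer
dz1 = da1 * a1 * (1 - a1)
dW1 = np.outer(dz1, x)       # dL/dW1, reached only via the chain rule
```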

22 of 43

Backward propagation


23 of 43

Parameter update

This can be done for all weights!

When every weight has been renewed, we say one update is done.

  • ※ Stochastic gradient descent
    • Update all parameters once using a single data point

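One update step, written out: each weight takes a small step against its gradient (the numbers here are arbitrary; alpha is the learning rate):

```python
import numpy as np

alpha = 0.1                           # learning rate
W  = np.array([[1.0, 2.0]])           # current weights
dW = np.array([[0.5, -0.5]])          # gradient from one data point

W = W - alpha * dW                    # W := W - alpha * dW
# -> array([[0.95, 2.05]])
```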

24 of 43

How can we make a deep learning model?

25 of 43

Training / Testing

[Figure: the dataset is divided into a Training set and a Testing set]

26 of 43

From scratch?

Starting with TensorFlow or PyTorch is easy, but you can't really tell what's going on inside unless you read the source code or official documentation. Building a simple network from scratch may help you understand ANNs better.

  • What we are going to do: build a simple ANN for MNIST classification with NumPy

27 of 43

About MNIST

  • Modified National Institute of Standards and Technology
  • MNIST is a dataset of handwritten digits
  • It is commonly used for training and testing machine learning algorithms for image classification
  • The dataset consists of 60,000 training images and 10,000 test images
  • The images are 28x28 pixels in size and are grayscale
  • The digits are between 0 and 9
  • MNIST is widely used in the field of computer vision and is often used as a benchmark for testing the performance of new image classification algorithms.

28 of 43

About MNIST

  • The images are 28x28 pixels in size and are grayscale
  • The digits are between 0 and 9

29 of 43

30 of 43

Network structure

  • Image size is 28 x 28 (784) 🡪 Input layer of size 784
  • Simple ANN 🡪 single hidden layer, then output layer
  • Should look something like this…

31 of 43

32 of 43

Network structure

 

[Figure: values flowing through layers of size 784 (input), 10 (hidden), 10 (output)]

33 of 43

What do we need?

  • Read the data
  • Define the network
  • Feed data to the network (forward prop)
  • Update the parameters (backward prop)
  • Repeat for a few cycles
  • Check the result

34 of 43

Read data

  • Including shuffling and train/test set separation
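A sketch of that step; a small random array stands in for the real MNIST file (in practice, 60,000 rows of 784 pixel values plus labels would be loaded instead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data shaped like MNIST: pixel values 0..255, labels 0..9.
images = rng.integers(0, 256, size=(1000, 784))
labels = rng.integers(0, 10, size=1000)

X = images / 255.0                 # scale pixels into [0, 1]
idx = rng.permutation(len(X))      # shuffle
X, y = X[idx], labels[idx]

X_train, y_train = X[:800], y[:800]   # train/test separation
X_test,  y_test  = X[800:], y[800:]
```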

35 of 43

36 of 43

Network structure

 

 

[Figure: values flowing through layers of size 784 (input), 10 (hidden), 10 (output)]

37 of 43

Network structure

np.random.rand returns values in [0, 1)
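Since rand samples from [0, 1), subtracting 0.5 is a simple way to center the initial weights around zero; the 784/10/10 sizes follow the network structure above:

```python
import numpy as np

# np.random.rand draws uniformly from [0, 1);
# subtracting 0.5 shifts the weights into [-0.5, 0.5)
W1 = np.random.rand(10, 784) - 0.5   # input (784) -> hidden (10)
b1 = np.random.rand(10) - 0.5
W2 = np.random.rand(10, 10) - 0.5    # hidden (10) -> output (10)
b2 = np.random.rand(10) - 0.5
```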

38 of 43

Network structure

39 of 43

Network structure

Met in Bioinformatics…

Backward

W2 := W2 - 𝛼 dW2

b2 := b2 - 𝛼 db2

W1 := W1 - 𝛼 dW1

b1 := b1 - 𝛼 db1

40 of 43

One-hot encoding
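A common NumPy recipe for this (a sketch; the helper name is ours):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # each label becomes a length-10 vector: all zeros except
    # a single 1 at the position of the digit
    out = np.zeros((labels.size, num_classes))
    out[np.arange(labels.size), labels] = 1
    return out

Y = one_hot(np.array([3, 0, 9]))
```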

41 of 43

Combined!
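A minimal end-to-end sketch of how the pieces fit together, using random stand-in data instead of real MNIST (the provided network's exact code may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_hot(labels, num_classes=10):
    out = np.zeros((labels.size, num_classes))
    out[np.arange(labels.size), labels] = 1
    return out

# stand-in "images" and labels
X = rng.random((200, 784))
y = rng.integers(0, 10, 200)
Y = one_hot(y)

# 784 -> 10 -> 10 network, weights centered around zero
W1 = rng.random((784, 10)) - 0.5; b1 = np.zeros(10)
W2 = rng.random((10, 10)) - 0.5;  b2 = np.zeros(10)

alpha = 0.5
losses = []
for _ in range(50):
    # forward propagation
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    losses.append(np.mean((A2 - Y) ** 2))

    # backward propagation (chain rule)
    dZ2 = (A2 - Y) * A2 * (1 - A2)
    dW2 = A1.T @ dZ2 / len(X); db2 = dZ2.mean(axis=0)
    dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
    dW1 = X.T @ dZ1 / len(X);  db1 = dZ1.mean(axis=0)

    # parameter update
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2
```

On real MNIST the same loop applies unchanged; only the data-loading step differs.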

42 of 43

To do

  • Play with the provided network
  • Modify the network structure and hyperparameters to get a better result!
  • Add more layers?
  • Another activation function?
  • More iterations?
  • Set a hypothesis, run an experiment to check it, and share with others

43 of 43

Brought to you by..