1 of 43

Building simple ANN from scratch

MNIST

2 of 43

From scratch?

Starting with TensorFlow or PyTorch is easy, but you can't really tell what's going on inside unless you read the source code or official documentation. Building a simple network from scratch may help you understand ANNs better.

  • What we are going to do: build a simple ANN for MNIST classification with NumPy

3 of 43

Recap slides


4 of 43

A perceptron - neuron


Rosenblatt, F. (1958). "The perceptron: A probabilistic model for information storage and organization in the brain." Psychological Review, 65(6), 386–408.
5 of 43

A perceptron – activation function


6 of 43

Single layer perceptron

[Figure: single-layer perceptron; input nodes fully connected to output nodes]

7 of 43

A single layer can be seen as matrix multiplication too

https://www.jeremyjordan.me/intro-to-neural-networks/
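That matmul view can be sketched in NumPy; the layer sizes and numbers below are made up purely for illustration:

```python
import numpy as np

# Hypothetical layer: 3 inputs, 2 output neurons.
# Each row of W holds one output neuron's weights,
# so the whole layer collapses into a single matmul.
W = np.array([[0.5, -1.0, 2.0],
              [1.5,  0.0, 0.5]])   # shape (2, 3)
b = np.array([0.1, -0.2])          # one bias per output neuron
x = np.array([1.0, 2.0, 3.0])      # input vector

y = W @ x + b                      # -> array([4.6, 2.8])
```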

 

8 of 43

Neural network is transformation

  • The weight matrix transforms the input vector into the output vector
  • It moves the points (the inputs) into a new space (a transformation)

9 of 43

Linearity


10 of 43

Non-linearity using activation functions


https://junstar92.tistory.com/122
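As a sketch of why this helps: an activation such as the sigmoid (one common choice, used here only as an example) bends the otherwise linear map, so stacked layers stop collapsing into a single matrix:

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into (0, 1); applied element-wise
    # after the linear step W @ x + b, it makes the layer non-linear
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
a = sigmoid(z)          # values strictly between 0 and 1
```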

11 of 43

Multi layer perceptron

  • Stacked layers of perceptrons with one or more hidden layers
  • No cycles: the output of a layer cannot go back to earlier layers
  • Also called deep learning: it has more layers than just input and output

https://ml-cheatsheet.readthedocs.io/en/latest/forwardpropagation.html

12 of 43

Forward propagation

  • A deep learning model with two hidden layers

[Figure: network with layers labeled input, hidden 1, hidden 2, output; orange denotes -1, blue denotes 1; the output is -1 or 1]
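The pass the figure walks through can be written as plain NumPy; the layer sizes and the tanh activation here are assumptions for illustration (tanh keeps values in (-1, 1), matching the -1/1 output):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 inputs, two hidden layers of 4, one output.
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((4, 4)), np.zeros(4)
W3, b3 = rng.standard_normal((1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])

h1 = np.tanh(W1 @ x + b1)   # input    -> hidden 1
h2 = np.tanh(W2 @ h1 + b2)  # hidden 1 -> hidden 2
y  = np.tanh(W3 @ h2 + b3)  # hidden 2 -> output, in (-1, 1)
```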

13 of 43

Forward propagation (input – hidden1)

  • A deep learning model with two hidden layers

[Figure: the input → hidden 1 step; layers labeled input, hidden 1, hidden 2, output; the output is -1 or 1]

14 of 43

Forward propagation (hidden1 – hidden2)

  • A deep learning model with two hidden layers

[Figure: the hidden 1 → hidden 2 step; layers labeled input, hidden 1, hidden 2, output; the output is -1 or 1]

15 of 43

Forward propagation (hidden2 – output)

  • A deep learning model with two hidden layers

[Figure: the hidden 2 → output step; layers labeled input, hidden 1, hidden 2, output; the output is -1 or 1]

16 of 43

Loss
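The slides do not fix a particular loss function; mean squared error is one common choice and serves as a minimal example (the function name is ours):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # mean of the squared differences between prediction and target
    return np.mean((y_pred - y_true) ** 2)

loss = mse_loss(np.array([0.5]), np.array([1.0]))   # -> 0.25
```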

17 of 43

Gradient descent

[Figure: gradient descent; a loss curve with the axis labeled Loss]

18 of 43

Gradient descent

[Figure: stepping downhill along a loss curve; axis labeled Loss]
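A one-dimensional sketch of the idea, with a made-up loss L(w) = (w - 3)² whose minimum sits at w = 3:

```python
# Gradient descent on L(w) = (w - 3)**2, so dL/dw = 2 * (w - 3).
w = 0.0     # arbitrary starting point
lr = 0.1    # learning rate

for _ in range(100):
    grad = 2 * (w - 3)   # slope of the loss at the current w
    w = w - lr * grad    # step downhill along the gradient

# w has converged very close to the minimum at 3
```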

19 of 43

Optimizing

[Figure: network output with the current weights; loss = 0.25]

20 of 43

Optimizing

[Figure: network output after updating the weights; the output is closer to 1]

21 of 43

Backward propagation

  • BUT, how can we optimize the other weights?
  • We cannot observe the loss directly at the hidden layers...

We use the chain rule!

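A minimal sketch of the chain rule at work, on a made-up 2–1–1 network with sigmoid activations and a squared-error loss (all numbers are invented; the real network's shapes differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([1.0, 0.5])             # input
W1 = np.array([[0.2, -0.4]]); b1 = np.array([0.1])
W2 = np.array([[0.7]]);       b2 = np.array([-0.3])
t  = 1.0                              # target

# forward pass
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; y  = sigmoid(z2)
L  = 0.5 * (y - t) ** 2

# backward pass: each line multiplies on one more local derivative
dy  = y - t                  # dL/dy
dz2 = dy * y * (1 - y)       # dL/dz2 (sigmoid derivative)
dW2 = np.outer(dz2, a1)      # dL/dW2
da1 = W2.T @ dz2             # gradient pushed back to the hidden layer
dz1 = da1 * a1 * (1 - a1)
dW1 = np.outer(dz1, x)       # dL/dW1, reached only via the chain rule
```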

22 of 43

Backward propagation


23 of 43

Parameter update

This can be done for all weights!

When every weight has been renewed, we say one update is done.

  • ※ Stochastic gradient descent
    • Update all parameters once using a single data point

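One update step, written out: each weight takes a small step against its gradient (the numbers here are arbitrary; alpha is the learning rate):

```python
import numpy as np

alpha = 0.1                           # learning rate
W  = np.array([[1.0, 2.0]])           # current weights
dW = np.array([[0.5, -0.5]])          # gradient from one data point

W = W - alpha * dW                    # W := W - alpha * dW
# -> array([[0.95, 2.05]])
```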

24 of 43

How can we make a deep learning model?

25 of 43

Training / Testing

[Figure: the dataset is divided into a Training set and a Testing set]

26 of 43

From scratch?

Starting with TensorFlow or PyTorch is easy, but you can't really tell what's going on inside unless you read the source code or official documentation. Building a simple network from scratch may help you understand ANNs better.

  • What we are going to do: build a simple ANN for MNIST classification with NumPy

27 of 43

About MNIST

  • Modified National Institute of Standards and Technology
  • MNIST is a dataset of handwritten digits
  • It is commonly used for training and testing machine learning algorithms for image classification
  • The dataset consists of 60,000 training images and 10,000 test images
  • The images are 28x28 pixels in size and are grayscale
  • The digits are between 0 and 9
  • MNIST is widely used in the field of computer vision and is often used as a benchmark for testing the performance of new image classification algorithms.

28 of 43

About MNIST

  • The images are 28x28 pixels in size and are grayscale
  • The digits are between 0 and 9

29 of 43

30 of 43

Network structure

  • Image size is 28 x 28 (784) 🡪 Input layer of size 784
  • Simple ANN 🡪 single hidden layer, then output layer
  • Should look something like this…

31 of 43

32 of 43

Network structure

 

[Figure: values flowing through layers of size 784 (input), 10 (hidden), 10 (output)]

33 of 43

What do we need?

  • Read the data
  • Define the network
  • Feed data to the network (forward prop)
  • Update the parameters (backward prop)
  • Repeat for a few cycles
  • Check the result

34 of 43

Read data

  • Including shuffling and train/test set separation
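A sketch of that step; a small random array stands in for the real MNIST file (in practice, 60,000 rows of 784 pixel values plus labels would be loaded instead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data shaped like MNIST: pixel values 0..255, labels 0..9.
images = rng.integers(0, 256, size=(1000, 784))
labels = rng.integers(0, 10, size=1000)

X = images / 255.0                 # scale pixels into [0, 1]
idx = rng.permutation(len(X))      # shuffle
X, y = X[idx], labels[idx]

X_train, y_train = X[:800], y[:800]   # train/test separation
X_test,  y_test  = X[800:], y[800:]
```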

35 of 43

36 of 43

Network structure

 

 

[Figure: values flowing through layers of size 784 (input), 10 (hidden), 10 (output)]

37 of 43

Network structure

np.random.rand returns values in [0, 1)
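Since rand samples from [0, 1), subtracting 0.5 is a simple way to center the initial weights around zero; the 784/10/10 sizes follow the network structure above:

```python
import numpy as np

# np.random.rand draws uniformly from [0, 1);
# subtracting 0.5 shifts the weights into [-0.5, 0.5)
W1 = np.random.rand(10, 784) - 0.5   # input (784) -> hidden (10)
b1 = np.random.rand(10) - 0.5
W2 = np.random.rand(10, 10) - 0.5    # hidden (10) -> output (10)
b2 = np.random.rand(10) - 0.5
```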

38 of 43

Network structure

39 of 43

Network structure

Met in Bioinformatics…

Backward

W2 := W2 - 𝛼 dW2

b2 := b2 - 𝛼 db2

W1 := W1 - 𝛼 dW1

b1 := b1 - 𝛼 db1

40 of 43

One-hot encoding
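A common NumPy recipe for this (a sketch; the helper name is ours):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # each label becomes a length-10 vector: all zeros except
    # a single 1 at the position of the digit
    out = np.zeros((labels.size, num_classes))
    out[np.arange(labels.size), labels] = 1
    return out

Y = one_hot(np.array([3, 0, 9]))
```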

41 of 43

Combined!
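A minimal end-to-end sketch of how the pieces fit together, using random stand-in data instead of real MNIST (the provided network's exact code may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_hot(labels, num_classes=10):
    out = np.zeros((labels.size, num_classes))
    out[np.arange(labels.size), labels] = 1
    return out

# stand-in "images" and labels
X = rng.random((200, 784))
y = rng.integers(0, 10, 200)
Y = one_hot(y)

# 784 -> 10 -> 10 network, weights centered around zero
W1 = rng.random((784, 10)) - 0.5; b1 = np.zeros(10)
W2 = rng.random((10, 10)) - 0.5;  b2 = np.zeros(10)

alpha = 0.5
losses = []
for _ in range(50):
    # forward propagation
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    losses.append(np.mean((A2 - Y) ** 2))

    # backward propagation (chain rule)
    dZ2 = (A2 - Y) * A2 * (1 - A2)
    dW2 = A1.T @ dZ2 / len(X); db2 = dZ2.mean(axis=0)
    dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
    dW1 = X.T @ dZ1 / len(X);  db1 = dZ1.mean(axis=0)

    # parameter update
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2
```

On real MNIST the same loop applies unchanged; only the data-loading step differs.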

42 of 43

To do

  • Play with the provided network
  • Modify the network structure and hyperparameters to get a better result!
  • Add more layers?
  • Another activation function?
  • More iterations?
  • Set a hypothesis, run an experiment to check it, and share with others

43 of 43

Brought to you by..