1 of 23

Neural Networks in Rust

Yoray Herzberg

2 of 23

$ whoami

  • Yoray Herzberg
  • CS Student @ The Open University
  • Interested in Cybersecurity and AI
    • CTFs
    • Bug Bounty
    • Work on projects
  • Loves Rust

Twitter: @sag0li

My blog: vaktibabat.github.io

3 of 23

The Goal

  • Making a digit classifier: given an image of a digit, how likely is it to belong to each class (0-9)?
  • e.g.:

[0.03, 0.05, 0.04, 0.02, 0.08, 0.03, 0.06, 0.6, 0.04, 0.05]

4 of 23

The MNIST Dataset

  • 70,000 greyscale images of handwritten digits, along with their labels

  • Each digit is 28x28 pixels

5 of 23

Why Neural Networks?

  • Widely used in practice today in applications such as CV and NLP
  • Wanted to learn more about the inner workings of Neural Nets

6 of 23

Neural Networks

  • Composed of layers of neurons
  • Consecutive layers are connected with weights and biases
  • The output of layer l+1 is a^(l+1) = σ(W^(l+1) a^(l) + b^(l+1)), where σ is the activation function (a minimal sketch follows below)
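
A minimal sketch of this forward rule in Rust, assuming the ndarray crate and ReLU as the activation (the struct and field names are my own illustration, not the talk's actual code):

```rust
use ndarray::{Array1, Array2};

/// One fully-connected layer: the weights and biases connecting
/// layer l to layer l+1.
struct DenseLayer {
    weights: Array2<f64>, // shape (n_out, n_in)
    biases: Array1<f64>,  // shape (n_out,)
}

impl DenseLayer {
    /// a^(l+1) = sigma(W a^(l) + b), with ReLU as sigma.
    fn forward(&self, input: &Array1<f64>) -> Array1<f64> {
        let z = self.weights.dot(input) + &self.biases;
        z.mapv(|x| x.max(0.0)) // ReLU applied element-wise
    }
}
```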

7 of 23

Forward Pass

8 of 23

Forward Pass

9 of 23

Training Neural Networks

  • Loss functions
  • For multiclass classification, we use cross-entropy loss
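
For a one-hot label y and predicted probabilities ŷ, the cross-entropy loss is L = −Σ_i y_i log(ŷ_i). A minimal Rust sketch, assuming ndarray (the function name is my own illustration):

```rust
use ndarray::Array1;

/// Cross-entropy loss between a one-hot label and predicted class probabilities.
fn cross_entropy(label: &Array1<f64>, probs: &Array1<f64>) -> f64 {
    // L = -sum_i y_i * ln(p_i); a small epsilon guards against ln(0).
    -label
        .iter()
        .zip(probs.iter())
        .map(|(y, p)| y * (p + 1e-12).ln())
        .sum::<f64>()
}
```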

10 of 23

Training Neural Networks

  • To minimize the loss function, we use Gradient Descent
  • The gradient points in the direction of steepest ascent; the direction opposite the gradient is therefore the direction of steepest descent
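
Concretely, the update rule is θ ← θ − η ∇L(θ), where η is the learning rate. A minimal sketch of one gradient-descent step in Rust, assuming ndarray (names are illustrative):

```rust
use ndarray::Array2;

/// One gradient-descent step: move the weights a small step against the
/// gradient of the loss.
fn gd_step(weights: &mut Array2<f64>, grad: &Array2<f64>, learning_rate: f64) {
    // w <- w - eta * dL/dw
    weights.scaled_add(-learning_rate, grad);
}
```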

11 of 23

Backwards Pass

12 of 23

Backpropagation

  • To compute the gradients WRT weights and biases, we propagate the loss backwards through the network using the chain rule

13 of 23

Deriving The Gradients

  • Start by deriving the gradient of the loss function WRT the output(s) of the network
  • In the case of cross-entropy loss (with a softmax output layer), it turns out that ∂L/∂z = ŷ − y, i.e. the predicted probabilities minus the one-hot label
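
A minimal Rust sketch of this result, assuming ndarray (the function name is my own; it computes a numerically stable softmax and then the gradient WRT the logits):

```rust
use ndarray::Array1;

/// Softmax followed by the cross-entropy gradient WRT the logits:
/// dL/dz = softmax(z) - y for a one-hot label y.
fn output_gradient(logits: &Array1<f64>, label: &Array1<f64>) -> Array1<f64> {
    // Numerically stable softmax: subtract the max logit before exponentiating.
    let max = logits.fold(f64::NEG_INFINITY, |m, &x| m.max(x));
    let exp = logits.mapv(|x| (x - max).exp());
    let probs = &exp / exp.sum();
    probs - label
}
```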

14 of 23

Deriving The Gradients

  • The chain rule: if f depends on x through an intermediate variable q, then ∂f/∂x = (∂f/∂q) · (∂q/∂x)

  • For example, if f(x, y, z) = (x + y) * z, then with q = x + y we get ∂f/∂z = q = x + y, and ∂f/∂x = ∂f/∂y = (∂f/∂q) · 1 = z
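
A tiny worked check of that example in Rust (my own illustration, not from the slides):

```rust
/// Gradients of f(x, y, z) = (x + y) * z, derived with the chain rule
/// through the intermediate q = x + y.
fn grad_f(x: f64, y: f64, z: f64) -> (f64, f64, f64) {
    let q = x + y;
    // df/dq = z and dq/dx = dq/dy = 1, so df/dx = df/dy = z; df/dz = q.
    (z, z, q)
}

fn main() {
    // At (x, y, z) = (1.0, 2.0, 3.0): f = 9, and all three gradients equal 3.
    let (dx, dy, dz) = grad_f(1.0, 2.0, 3.0);
    println!("df/dx = {dx}, df/dy = {dy}, df/dz = {dz}");
}
```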

15 of 23

Deriving The Gradients

  • To use it in our case, observe that the loss is a composition of the layers' functions, so the gradient WRT a layer's parameters factors as ∂L/∂W^(l) = (∂L/∂a^(l)) · (∂a^(l)/∂W^(l)), where a^(l) is that layer's output

16 of 23

Deriving The Gradients

  • ReLU layer: ReLU(x) = max(0, x), applied element-wise

  • The derivative: ReLU'(x) = 1 if x > 0, and 0 otherwise
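
A minimal Rust sketch of the ReLU forward and backward passes, assuming ndarray (function names are my own illustration):

```rust
use ndarray::Array1;

/// ReLU forward: max(0, x) applied element-wise.
fn relu_forward(x: &Array1<f64>) -> Array1<f64> {
    x.mapv(|v| v.max(0.0))
}

/// ReLU backward: the upstream gradient passes through only where the
/// input was positive (derivative 1 for x > 0, 0 otherwise).
fn relu_backward(x: &Array1<f64>, upstream: &Array1<f64>) -> Array1<f64> {
    let mask = x.mapv(|v| if v > 0.0 { 1.0 } else { 0.0 });
    upstream * &mask
}
```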

17 of 23

Deriving The Gradients

  • Linear layer: z = Wx + b; writing δ = ∂L/∂z, the backprop equations are ∂L/∂W = δ xᵀ, ∂L/∂b = δ, and ∂L/∂x = Wᵀ δ
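
A minimal Rust sketch of the linear layer's backward pass, assuming ndarray (the function name and shapes are my own illustration):

```rust
use ndarray::{Array1, Array2, Axis};

/// Backward pass of a linear layer z = W x + b.
/// Given delta = dL/dz, returns (dL/dW, dL/db, dL/dx).
fn linear_backward(
    weights: &Array2<f64>, // W, shape (n_out, n_in)
    input: &Array1<f64>,   // x, shape (n_in,)
    delta: &Array1<f64>,   // dL/dz, shape (n_out,)
) -> (Array2<f64>, Array1<f64>, Array1<f64>) {
    // dL/dW = delta x^T (outer product), dL/db = delta, dL/dx = W^T delta.
    let grad_w = delta
        .clone()
        .insert_axis(Axis(1))
        .dot(&input.clone().insert_axis(Axis(0)));
    let grad_b = delta.clone();
    let grad_x = weights.t().dot(delta);
    (grad_w, grad_b, grad_x)
}
```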

18 of 23

Deriving The Gradients

19 of 23

Deriving The Gradients

  1. Start by computing the gradient of the loss WRT the output of the network
  2. Then compute the gradients of the loss WRT the layer before the last using the backprop equations
  3. Repeat until you get to the input layer (a full sketch of one such step follows below)
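
Putting the steps together, here is a compact sketch of one backprop step for a two-layer network (linear, ReLU, linear, softmax) trained with cross-entropy. This is my own illustration assuming the ndarray crate; the names are not taken from the talk's code:

```rust
use ndarray::{Array1, Array2, Axis};

/// One training step for a two-layer net: linear -> ReLU -> linear -> softmax.
fn train_step(
    w1: &mut Array2<f64>, b1: &mut Array1<f64>,
    w2: &mut Array2<f64>, b2: &mut Array1<f64>,
    x: &Array1<f64>, label: &Array1<f64>, lr: f64,
) {
    // Forward pass.
    let z1 = w1.dot(x) + &*b1;
    let a1 = z1.mapv(|v| v.max(0.0)); // ReLU
    let z2 = w2.dot(&a1) + &*b2;
    let max = z2.fold(f64::NEG_INFINITY, |m, &v| m.max(v));
    let exp = z2.mapv(|v| (v - max).exp());
    let probs = &exp / exp.sum(); // softmax

    // Backward pass: softmax + cross-entropy gives dL/dz2 = probs - y.
    let delta2 = &probs - label;
    let grad_w2 = delta2.clone().insert_axis(Axis(1))
        .dot(&a1.clone().insert_axis(Axis(0)));
    // Propagate through the first linear layer and the ReLU mask.
    let delta1 = w2.t().dot(&delta2) * z1.mapv(|v| if v > 0.0 { 1.0 } else { 0.0 });
    let grad_w1 = delta1.clone().insert_axis(Axis(1))
        .dot(&x.clone().insert_axis(Axis(0)));

    // Gradient-descent updates (dL/db is just the corresponding delta).
    w1.scaled_add(-lr, &grad_w1);
    b1.scaled_add(-lr, &delta1);
    w2.scaled_add(-lr, &grad_w2);
    b2.scaled_add(-lr, &delta2);
}
```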

20 of 23

Backwards Pass

21 of 23

Demo Time!

22 of 23

Any Questions?

Twitter: @sag0li

GitHub: vaktibabat

My blog: vaktibabat.github.io

Email: yoray.herzberg@gmail.com

23 of 23

Thanks for listening!!!