1 of 23

Neural Networks in Rust

Yoray Herzberg

2 of 23

$ whoami

  • Yoray Herzberg
  • CS Student @ The Open University
  • Interested in Cybersecurity and AI
    • CTFs
    • Bug Bounty
    • Work on projects
  • Loves Rust

Twitter: @sag0li

My blog: vaktibabat.github.io

3 of 23

The Goal

  • Making a digit classifier: given an image of a digit, how likely is it to belong to each class (0-9)?
  • e.g.:

[0.03, 0.05, 0.04, 0.02, 0.08, 0.03, 0.06, 0.6, 0.04, 0.05]

4 of 23

The MNIST Dataset

  • 70,000 greyscale images of handwritten digits, along with their labels

  • Each digit is 28x28 pixels

5 of 23

Why Neural Networks?

  • Widely used in practice today in applications such as CV and NLP
  • Wanted to learn more about the inner workings of Neural Nets

6 of 23

Neural Networks

  • Composed of layers of neurons
  • Consecutive layers are connected with weights and biases
  • The output of layer l+1 is a^(l+1) = σ(W^(l+1) a^(l) + b^(l+1)), where σ is the activation function (a minimal sketch follows below)
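
A minimal sketch of this forward rule in Rust, assuming the ndarray crate and ReLU as the activation (the struct and field names are my own illustration, not the talk's actual code):

```rust
use ndarray::{Array1, Array2};

/// One fully-connected layer: the weights and biases connecting
/// layer l to layer l+1.
struct DenseLayer {
    weights: Array2<f64>, // shape (n_out, n_in)
    biases: Array1<f64>,  // shape (n_out,)
}

impl DenseLayer {
    /// a^(l+1) = sigma(W a^(l) + b), with ReLU as sigma.
    fn forward(&self, input: &Array1<f64>) -> Array1<f64> {
        let z = self.weights.dot(input) + &self.biases;
        z.mapv(|x| x.max(0.0)) // ReLU applied element-wise
    }
}
```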

7 of 23

Forward Pass

8 of 23

Forward Pass

9 of 23

Training Neural Networks

  • Loss functions
  • For multiclass classification, we use cross-entropy loss
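
For a one-hot label y and predicted probabilities ŷ, the cross-entropy loss is L = −Σ_i y_i log(ŷ_i). A minimal Rust sketch, assuming ndarray (the function name is my own illustration):

```rust
use ndarray::Array1;

/// Cross-entropy loss between a one-hot label and predicted class probabilities.
fn cross_entropy(label: &Array1<f64>, probs: &Array1<f64>) -> f64 {
    // L = -sum_i y_i * ln(p_i); a small epsilon guards against ln(0).
    -label
        .iter()
        .zip(probs.iter())
        .map(|(y, p)| y * (p + 1e-12).ln())
        .sum::<f64>()
}
```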

10 of 23

Training Neural Networks

  • To minimize the loss function, we use Gradient Descent
  • The gradient points in the direction of steepest ascent; the direction opposite the gradient is therefore the direction of steepest descent
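
Concretely, the update rule is θ ← θ − η ∇L(θ), where η is the learning rate. A minimal sketch of one gradient-descent step in Rust, assuming ndarray (names are illustrative):

```rust
use ndarray::Array2;

/// One gradient-descent step: move the weights a small step against the
/// gradient of the loss.
fn gd_step(weights: &mut Array2<f64>, grad: &Array2<f64>, learning_rate: f64) {
    // w <- w - eta * dL/dw
    weights.scaled_add(-learning_rate, grad);
}
```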

11 of 23

Backwards Pass

12 of 23

Backpropagation

  • To compute the gradients WRT weights and biases, we propagate the loss backwards through the network using the chain rule

13 of 23

Deriving The Gradients

  • Start by deriving the gradient of the loss function WRT the output(s) of the network
  • In the case of cross-entropy loss (with a softmax output layer), it turns out that ∂L/∂z = ŷ − y, i.e. the predicted probabilities minus the one-hot label
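
A minimal Rust sketch of this result, assuming ndarray (the function name is my own; it computes a numerically stable softmax and then the gradient WRT the logits):

```rust
use ndarray::Array1;

/// Softmax followed by the cross-entropy gradient WRT the logits:
/// dL/dz = softmax(z) - y for a one-hot label y.
fn output_gradient(logits: &Array1<f64>, label: &Array1<f64>) -> Array1<f64> {
    // Numerically stable softmax: subtract the max logit before exponentiating.
    let max = logits.fold(f64::NEG_INFINITY, |m, &x| m.max(x));
    let exp = logits.mapv(|x| (x - max).exp());
    let probs = &exp / exp.sum();
    probs - label
}
```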

14 of 23

Deriving The Gradients

  • The chain rule: if f depends on x through an intermediate variable q, then ∂f/∂x = (∂f/∂q) · (∂q/∂x)

  • For example, if f(x, y, z) = (x + y) * z, then with q = x + y we get ∂f/∂z = q = x + y, and ∂f/∂x = ∂f/∂y = (∂f/∂q) · 1 = z
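
A tiny worked check of that example in Rust (my own illustration, not from the slides):

```rust
/// Gradients of f(x, y, z) = (x + y) * z, derived with the chain rule
/// through the intermediate q = x + y.
fn grad_f(x: f64, y: f64, z: f64) -> (f64, f64, f64) {
    let q = x + y;
    // df/dq = z and dq/dx = dq/dy = 1, so df/dx = df/dy = z; df/dz = q.
    (z, z, q)
}

fn main() {
    // At (x, y, z) = (1.0, 2.0, 3.0): f = 9, and all three gradients equal 3.
    let (dx, dy, dz) = grad_f(1.0, 2.0, 3.0);
    println!("df/dx = {dx}, df/dy = {dy}, df/dz = {dz}");
}
```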

15 of 23

Deriving The Gradients

  • To use it in our case, observe that the loss is a composition of the layers' functions, so the gradient WRT a layer's parameters factors as ∂L/∂W^(l) = (∂L/∂a^(l)) · (∂a^(l)/∂W^(l)), where a^(l) is that layer's output

16 of 23

Deriving The Gradients

  • ReLU layer: ReLU(x) = max(0, x), applied element-wise

  • The derivative: ReLU'(x) = 1 if x > 0, and 0 otherwise
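
A minimal Rust sketch of the ReLU forward and backward passes, assuming ndarray (function names are my own illustration):

```rust
use ndarray::Array1;

/// ReLU forward: max(0, x) applied element-wise.
fn relu_forward(x: &Array1<f64>) -> Array1<f64> {
    x.mapv(|v| v.max(0.0))
}

/// ReLU backward: the upstream gradient passes through only where the
/// input was positive (derivative 1 for x > 0, 0 otherwise).
fn relu_backward(x: &Array1<f64>, upstream: &Array1<f64>) -> Array1<f64> {
    let mask = x.mapv(|v| if v > 0.0 { 1.0 } else { 0.0 });
    upstream * &mask
}
```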

17 of 23

Deriving The Gradients

  • Linear layer: z = Wx + b; writing δ = ∂L/∂z, the backprop equations are ∂L/∂W = δ xᵀ, ∂L/∂b = δ, and ∂L/∂x = Wᵀ δ
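
A minimal Rust sketch of the linear layer's backward pass, assuming ndarray (the function name and shapes are my own illustration):

```rust
use ndarray::{Array1, Array2, Axis};

/// Backward pass of a linear layer z = W x + b.
/// Given delta = dL/dz, returns (dL/dW, dL/db, dL/dx).
fn linear_backward(
    weights: &Array2<f64>, // W, shape (n_out, n_in)
    input: &Array1<f64>,   // x, shape (n_in,)
    delta: &Array1<f64>,   // dL/dz, shape (n_out,)
) -> (Array2<f64>, Array1<f64>, Array1<f64>) {
    // dL/dW = delta x^T (outer product), dL/db = delta, dL/dx = W^T delta.
    let grad_w = delta
        .clone()
        .insert_axis(Axis(1))
        .dot(&input.clone().insert_axis(Axis(0)));
    let grad_b = delta.clone();
    let grad_x = weights.t().dot(delta);
    (grad_w, grad_b, grad_x)
}
```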

18 of 23

Deriving The Gradients

19 of 23

Deriving The Gradients

  1. Start by computing the gradient of the loss WRT the output of the network
  2. Then compute the gradients of the loss WRT the layer before the last using the backprop equations
  3. Repeat until you get to the input layer (a full sketch of one such step follows below)
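
Putting the steps together, here is a compact sketch of one backprop step for a two-layer network (linear, ReLU, linear, softmax) trained with cross-entropy. This is my own illustration assuming the ndarray crate; the names are not taken from the talk's code:

```rust
use ndarray::{Array1, Array2, Axis};

/// One training step for a two-layer net: linear -> ReLU -> linear -> softmax.
fn train_step(
    w1: &mut Array2<f64>, b1: &mut Array1<f64>,
    w2: &mut Array2<f64>, b2: &mut Array1<f64>,
    x: &Array1<f64>, label: &Array1<f64>, lr: f64,
) {
    // Forward pass.
    let z1 = w1.dot(x) + &*b1;
    let a1 = z1.mapv(|v| v.max(0.0)); // ReLU
    let z2 = w2.dot(&a1) + &*b2;
    let max = z2.fold(f64::NEG_INFINITY, |m, &v| m.max(v));
    let exp = z2.mapv(|v| (v - max).exp());
    let probs = &exp / exp.sum(); // softmax

    // Backward pass: softmax + cross-entropy gives dL/dz2 = probs - y.
    let delta2 = &probs - label;
    let grad_w2 = delta2.clone().insert_axis(Axis(1))
        .dot(&a1.clone().insert_axis(Axis(0)));
    // Propagate through the first linear layer and the ReLU mask.
    let delta1 = w2.t().dot(&delta2) * z1.mapv(|v| if v > 0.0 { 1.0 } else { 0.0 });
    let grad_w1 = delta1.clone().insert_axis(Axis(1))
        .dot(&x.clone().insert_axis(Axis(0)));

    // Gradient-descent updates (dL/db is just the corresponding delta).
    w1.scaled_add(-lr, &grad_w1);
    b1.scaled_add(-lr, &delta1);
    w2.scaled_add(-lr, &grad_w2);
    b2.scaled_add(-lr, &delta2);
}
```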

20 of 23

Backwards Pass

21 of 23

Demo Time!

22 of 23

Any Questions?

Twitter: @sag0li

GitHub: vaktibabat

My blog: vaktibabat.github.io

Email: yoray.herzberg@gmail.com

23 of 23

Thanks for listening!!!