Meme of the Day
Lab 01 was “due”
How was it??
I think a Lab 00 with “workflow” tips could have been useful
Lab 02 is available but based on feedback I can/will change things
Goal is low floor/high ceiling
CSE 10124: Recurrent Neural Networks
Writing MNIST “Equations”
X = <1, 784>                      (one flattened 28×28 image)
ŷ = Xwᵀ + b
<1, 10> = <1, 784> · <784, 10> + <1, 10>
y = <1, 10>                       (one-hot label over the digit classes 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
y = 5  →  <0, 0, 0, 0, 0, 1, 0, 0, 0, 0>
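The shape bookkeeping above can be sketched in a few lines (NumPy here; the weight and bias values are random placeholders, not trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# One flattened 28x28 MNIST image: shape <1, 784>.
X = rng.random((1, 784))

# Weights and bias (random stand-ins; normally learned).
w = rng.random((10, 784))   # so w.T has shape <784, 10>
b = rng.random((1, 10))

# y_hat = X w^T + b  →  <1, 10> = <1, 784> · <784, 10> + <1, 10>
y_hat = X @ w.T + b
print(y_hat.shape)  # (1, 10)

# One-hot target for the label 5.
y = np.zeros((1, 10))
y[0, 5] = 1.0
```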
Non-linearity
Not all data can be separated with a straight line!
Activation Functions
We can add non-linearity to our network by wrapping each linear transformation in a nonlinear function called an activation function.
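As a sketch of the idea (NumPy, with random untrained weights): without the activation, two stacked linear layers collapse into a single linear map, so wrapping the hidden layer in a nonlinearity like ReLU is what gives the network its extra expressive power.

```python
import numpy as np

def relu(z):
    # ReLU activation: elementwise max(0, z).
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.random((1, 784))
w1, b1 = rng.random((784, 64)), np.zeros(64)
w2, b2 = rng.random((64, 10)), np.zeros(10)

# Linear → activation → linear.
h = relu(x @ w1 + b1)
y_hat = h @ w2 + b2
print(y_hat.shape)  # (1, 10)
```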
nanochat - MLP
Convolutional Neural Networks
FFN vs. FFN + Dropout vs. CNN
Image vs. Text Data
Whether an image of a 5 came before the 0 has no impact on how we predict the 0.
However, for text, what came previously matters a LOT!
Tokens:  "I"  "like"  "cute"  "kittens"  "and"
Recurrent Neural Networks
X = "I like cute kittens and"
[xt0, xt1, …, xtt] = tokenizer(X)

Step 1 (h0 is the initial hidden state):
  ex0 = embeddings[xt0]
  h1 = tanh(h0·wh + ex0·wx + bh)
  y1 = softmax(by + h1·wy)
     = <0.18, 0.14, 0.13, 0.12, 0.10, 0.09, 0.08, 0.07, 0.05, 0.04>
  yt1 = 0  →  tokenizer.decode(0) = "like"

Step 2:
  ex1 = embeddings[xt1]
  h2 = tanh(h1·wh + ex1·wx + bh)
  y2 = softmax(by + h2·wy)
     = <0.11, 0.16, 0.13, 0.21, 0.10, 0.08, 0.07, 0.06, 0.05, 0.03>
  yt2 = 4  →  tokenizer.decode(4) = "cute"

…

Step t+1:
  ext = embeddings[xtt]
  ht = tanh(ht-1·wh + ext·wx + bh)
  yt+1 = softmax(by + ht·wy)
       = <0.07, 0.15, 0.14, 0.10, 0.09, 0.08, 0.25, 0.05, 0.04, 0.03>
  ytt+1 = 6  →  tokenizer.decode(6) = "puppies"
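A minimal sketch of this unrolled loop (NumPy; the vocabulary, tokenizer, and weights below are toy stand-ins of my own, not the course's trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and tokenizer (hypothetical; a real one would be far larger).
vocab = ["like", "I", "kittens", "cute", "and", "puppies"]
token_ids = {w: i for i, w in enumerate(vocab)}

def tokenizer(text):
    return [token_ids[w] for w in text.split()]

def decode(i):
    return vocab[i]

V, E, H = len(vocab), 8, 16          # vocab, embedding, hidden sizes
embeddings = rng.normal(size=(V, E))
wx = rng.normal(size=(E, H))         # input-to-hidden weights
wh = rng.normal(size=(H, H))         # hidden-to-hidden weights
wy = rng.normal(size=(H, V))         # hidden-to-output weights
bh, by = np.zeros(H), np.zeros(V)

def softmax(z):
    z = z - z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

X = "I like cute kittens and"
h = np.zeros(H)                      # h0: initial hidden state
for t in tokenizer(X):
    ex = embeddings[t]                      # ex_t = embeddings[xt_t]
    h = np.tanh(h @ wh + ex @ wx + bh)      # h_t
    y = softmax(by + h @ wy)                # distribution over next token

# Most probable next token after "and" (weights are random, so this
# prediction is arbitrary here — training is what makes it meaningful).
print(decode(int(y.argmax())))
```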