1 of 11

Meme of the Day

2 of 11

Lab 01 was “due”

How was it??

I think a Lab 00 with “workflow” tips could have been useful

Lab 02 is available but based on feedback I can/will change things

Goal is low floor/high ceiling

3 of 11

CSE 10124: Recurrent Neural Networks

4 of 11

Writing MNIST “Equations”

X = <1, 784>  (one flattened 28×28 image)

ŷ = softmax(Xwᵀ + b)

Shapes: <1, 10> = <1, 784> ᐧ <784, 10> + <1, 10>

y = <1, 10>, one-hot encoded over the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9:

<0, 0, 0, 0, 0, 1, 0, 0, 0, 0> = 5
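The shape bookkeeping above can be checked directly. A minimal NumPy sketch (random values are placeholders; only the shapes matter here):

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(1, 784))    # one flattened 28x28 image
w = rng.normal(size=(10, 784))   # one weight row per digit class
b = np.zeros((1, 10))

logits = X @ w.T + b             # <1, 10> = <1, 784> . <784, 10> + <1, 10>
z = logits - logits.max()        # numerically stable softmax
y_hat = np.exp(z) / np.exp(z).sum()

print(logits.shape)              # (1, 10)
print(y_hat.sum())               # probabilities sum to 1
```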

5 of 11

Non-linearity

Not all data can be separated with a straight line!
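XOR is the classic example of such data (my illustration; XOR is not named on the slide). The best possible linear fit predicts 0.5 for every point, so no straight line tells the two classes apart:

```python
import numpy as np

# XOR: labels 0 and 1 sit on opposite corners of the unit square.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# Best linear fit y ~ Xw + b via least squares.
A = np.hstack([X, np.ones((4, 1))])           # append a bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef

print(pred)   # every prediction is 0.5: the line can't separate the classes
```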

6 of 11

Activation Functions

We can add non-linearity to our network by wrapping our linear transformations in non-linear functions called activation functions.
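Why the wrapping matters: two stacked linear layers with no activation in between collapse into a single linear layer, while an activation like tanh breaks that collapse. A NumPy sketch with made-up shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))
w1 = rng.normal(size=(4, 8))
w2 = rng.normal(size=(8, 3))

linear_stack = (x @ w1) @ w2       # two linear layers, no activation...
collapsed = x @ (w1 @ w2)          # ...equal one linear layer
print(np.allclose(linear_stack, collapsed))   # True

with_tanh = np.tanh(x @ w1) @ w2   # tanh between the layers breaks the collapse
print(np.allclose(with_tanh, collapsed))      # False
```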

7 of 11

nanochat - MLP

8 of 11

Convolutional Neural Networks

9 of 11

FFN vs. FFN + Dropout vs. CNN
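The dropout piece of this comparison can be sketched in a few lines. This is inverted dropout (my sketch, NumPy; the drop probability and shapes are made up): at train time each hidden unit is zeroed with probability p and the survivors are scaled by 1/(1-p), so the expected activation is unchanged and inference uses the activations as-is.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(1, 8))          # hidden activations
p = 0.5                              # drop probability

mask = (rng.random(h.shape) >= p) / (1 - p)   # entries are 0 or 1/(1-p)
h_train = h * mask                   # training: random units silenced, rest scaled
h_infer = h                          # inference: no masking, no rescaling
print(h_train)
```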

10 of 11

Image vs. Text Data

Whether an image of a 5 came before the 0 has no impact on how we predict the 0.

However, for text, what came previously matters a LOT

“I” → “like” → “cute” → “kittens” → “and” → ?

11 of 11

Recurrent Neural Networks

X = “I like cute kittens and”

[xt0, xt1, …, xtt] = tokenizer(X)

Starting from an initial hidden state h0, each step looks up the token’s embedding, updates the hidden state, and predicts the next token.

Step 1:

ex0 = embeddings[xt0]

h1 = tanh(h0wh + ex0wx + bh)

y1 = softmax(by + h1wy) = <0.18, 0.14, 0.13, 0.12, 0.10, 0.09, 0.08, 0.07, 0.05, 0.04>

yt1 = 0 → tokenizer.decode(0) → “like”

Step 2:

ex1 = embeddings[xt1]

h2 = tanh(h1wh + ex1wx + bh)

y2 = softmax(by + h2wy) = <0.11, 0.16, 0.13, 0.21, 0.10, 0.08, 0.07, 0.06, 0.05, 0.03>

yt2 = 4 → tokenizer.decode(4) → “cute”

Step t:

ext = embeddings[xtt]

ht = tanh(ht-1wh + extwx + bh)

yt+1 = softmax(by + htwy) = <0.07, 0.15, 0.14, 0.10, 0.09, 0.08, 0.25, 0.05, 0.04, 0.03>

ytt+1 = 6 → tokenizer.decode(6) → “puppies”

“...”
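The recurrence on this slide can be unrolled in a short NumPy sketch. The vocabulary, token ids, sizes, and weights below are stand-ins; the update mirrors ht = tanh(ht-1wh + extwx + bh) and yt+1 = softmax(by + htwy):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, hidden_dim = 10, 6, 6

embeddings = rng.normal(size=(vocab_size, embed_dim))
wx = rng.normal(size=(embed_dim, hidden_dim))
wh = rng.normal(size=(hidden_dim, hidden_dim))
wy = rng.normal(size=(hidden_dim, vocab_size))
bh = np.zeros(hidden_dim)
by = np.zeros(vocab_size)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

tokens = [8, 1, 4, 7, 5]        # stand-in for tokenizer("I like cute kittens and")
h = np.zeros(hidden_dim)        # h0: initial hidden state
for token in tokens:
    ex = embeddings[token]      # ext = embeddings[xt]
    h = np.tanh(h @ wh + ex @ wx + bh)   # ht depends on ht-1: order matters
    y = softmax(by + h @ wy)    # distribution over the next token

print(y.shape, y.sum())         # (10,) and ~1.0: a next-token distribution
```

Note how the same weights (wx, wh, wy) are reused at every step; only the hidden state h carries information forward through the sequence.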