Meme of the Day
Lab 01 was “due”
How was it??
I think a Lab 00 with “workflow” tips could have been useful
Lab 02 is available but based on feedback I can/will change things
Goal is low floor/high ceiling
CSE 10124: Recurrent Neural Networks
Writing MNIST “Equations”
X = <1, 784>                      (one flattened 28×28 image)
ŷ = Xwᵀ + b
<1, 10> = <1, 784> · <784, 10> + <1, 10>
y = <1, 10>                       (one-hot label over the digit classes 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
y = 5  →  <0, 0, 0, 0, 0, 1, 0, 0, 0, 0>
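The shape bookkeeping above can be sketched in a few lines (NumPy here; the weight and bias values are random placeholders, not trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# One flattened 28x28 MNIST image: shape <1, 784>.
X = rng.random((1, 784))

# Weights and bias (random stand-ins; normally learned).
w = rng.random((10, 784))   # so w.T has shape <784, 10>
b = rng.random((1, 10))

# y_hat = X w^T + b  →  <1, 10> = <1, 784> · <784, 10> + <1, 10>
y_hat = X @ w.T + b
print(y_hat.shape)  # (1, 10)

# One-hot target for the label 5.
y = np.zeros((1, 10))
y[0, 5] = 1.0
```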
Non-linearity
Not all data can be separated with a straight line!
Activation Functions
We can add non-linearity to our network by wrapping each linear transformation in a nonlinear function called an activation function.
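As a sketch of the idea (NumPy, with random untrained weights): without the activation, two stacked linear layers collapse into a single linear map, so wrapping the hidden layer in a nonlinearity like ReLU is what gives the network its extra expressive power.

```python
import numpy as np

def relu(z):
    # ReLU activation: elementwise max(0, z).
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.random((1, 784))
w1, b1 = rng.random((784, 64)), np.zeros(64)
w2, b2 = rng.random((64, 10)), np.zeros(10)

# Linear → activation → linear.
h = relu(x @ w1 + b1)
y_hat = h @ w2 + b2
print(y_hat.shape)  # (1, 10)
```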
nanochat - MLP
Convolutional Neural Networks
FFN vs. FFN + Dropout vs. CNN
Image vs. Text Data
Whether an image of a 5 came before the 0 has no impact on how we predict the 0.
However, for text, what came previously matters a LOT!
Tokens:  "I"  "like"  "cute"  "kittens"  "and"
Recurrent Neural Networks
X = "I like cute kittens and"
[xt0, xt1, …, xtt] = tokenizer(X)

Step 1 (h0 is the initial hidden state):
  ex0 = embeddings[xt0]
  h1 = tanh(h0·wh + ex0·wx + bh)
  y1 = softmax(by + h1·wy)
     = <0.18, 0.14, 0.13, 0.12, 0.10, 0.09, 0.08, 0.07, 0.05, 0.04>
  yt1 = 0  →  tokenizer.decode(0) = "like"

Step 2:
  ex1 = embeddings[xt1]
  h2 = tanh(h1·wh + ex1·wx + bh)
  y2 = softmax(by + h2·wy)
     = <0.11, 0.16, 0.13, 0.21, 0.10, 0.08, 0.07, 0.06, 0.05, 0.03>
  yt2 = 4  →  tokenizer.decode(4) = "cute"

…

Step t+1:
  ext = embeddings[xtt]
  ht = tanh(ht-1·wh + ext·wx + bh)
  yt+1 = softmax(by + ht·wy)
       = <0.07, 0.15, 0.14, 0.10, 0.09, 0.08, 0.25, 0.05, 0.04, 0.03>
  ytt+1 = 6  →  tokenizer.decode(6) = "puppies"
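A minimal sketch of this unrolled loop (NumPy; the vocabulary, tokenizer, and weights below are toy stand-ins of my own, not the course's trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and tokenizer (hypothetical; a real one would be far larger).
vocab = ["like", "I", "kittens", "cute", "and", "puppies"]
token_ids = {w: i for i, w in enumerate(vocab)}

def tokenizer(text):
    return [token_ids[w] for w in text.split()]

def decode(i):
    return vocab[i]

V, E, H = len(vocab), 8, 16          # vocab, embedding, hidden sizes
embeddings = rng.normal(size=(V, E))
wx = rng.normal(size=(E, H))         # input-to-hidden weights
wh = rng.normal(size=(H, H))         # hidden-to-hidden weights
wy = rng.normal(size=(H, V))         # hidden-to-output weights
bh, by = np.zeros(H), np.zeros(V)

def softmax(z):
    z = z - z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

X = "I like cute kittens and"
h = np.zeros(H)                      # h0: initial hidden state
for t in tokenizer(X):
    ex = embeddings[t]                      # ex_t = embeddings[xt_t]
    h = np.tanh(h @ wh + ex @ wx + bh)      # h_t
    y = softmax(by + h @ wy)                # distribution over next token

# Most probable next token after "and" (weights are random, so this
# prediction is arbitrary here — training is what makes it meaningful).
print(decode(int(y.argmax())))
```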