Lecture 3: Recurrent Neural Networks – Part 1
Sookyung Kim
sookim@ewha.ac.kr
Types of Neural Networks
One-to-one: vanilla neural net, e.g. image classification
Many-to-one: action recognition (frames → class)
Many-to-many: frame classification (frames → classes)
One-to-many: image captioning (image → words)
Many-to-many: video captioning (video → words)
Recurrent Neural Networks
RNN Basics
RNNs have an internal state that is updated as a sequence is processed.
[Figure: input x → RNN → output y]
RNN Basics
Expanded view of an RNN:
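[Figure: the RNN unrolled over time. The same RNN block is applied at steps 1, 2, 3, …, T: at step t it takes input xt and the previous hidden state ht-1 (starting from h0) and produces output yt.]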
Vanilla RNN
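[Figure: a single vanilla RNN step. The cell fW takes input xt and previous hidden state ht-1 and produces the new hidden state ht and output yt, using weights Wxh (input to hidden), Whh (hidden to hidden), and Why (hidden to output).]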
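The single step in the figure can be sketched in NumPy. This is a minimal sketch, not the lecture's own code: it assumes the common tanh form of fW without bias terms, and the function name rnn_step and all sizes are illustrative.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One vanilla RNN step (assumed tanh form): returns (h_t, y_t)."""
    # The new hidden state mixes the current input with the previous state.
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)
    # The output is a linear readout of the new hidden state.
    y_t = W_hy @ h_t
    return h_t, y_t

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 8, 4, 3  # illustrative sizes
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
W_hy = rng.normal(size=(output_dim, hidden_dim)) * 0.1

x_t = rng.normal(size=input_dim)
h_prev = np.zeros(hidden_dim)  # h0 is assumed to be all zeros
h_t, y_t = rnn_step(x_t, h_prev, W_xh, W_hh, W_hy)
print(h_t.shape, y_t.shape)
```

The weight shapes follow from the names: Wxh maps input to hidden, Whh maps hidden to hidden, Why maps hidden to output.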
Vanilla RNN
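[Figure: vanilla RNN unrolled over three time steps. Each step applies fW to input xt and hidden state ht-1, giving ht and yt (x1 to x3, h0 to h3, y1 to y3); the weights Wxh, Whh, and Why are the same at every step.]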
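Unrolling amounts to a loop that reuses one set of weights at every step. The sketch below assumes the tanh update without biases; rnn_forward and the sizes are hypothetical names and values chosen for illustration.

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, W_hy):
    """Run a vanilla RNN over xs of shape (T, input_dim)."""
    h = h0
    hs, ys = [], []
    for x_t in xs:
        # The same W_xh, W_hh, W_hy are used at every time step.
        h = np.tanh(W_xh @ x_t + W_hh @ h)
        hs.append(h)
        ys.append(W_hy @ h)
    return np.stack(hs), np.stack(ys)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 3, 8, 4, 2  # illustrative sizes
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
W_hy = rng.normal(size=(output_dim, hidden_dim)) * 0.1

xs = rng.normal(size=(T, input_dim))
hs, ys = rnn_forward(xs, np.zeros(hidden_dim), W_xh, W_hh, W_hy)
print(hs.shape, ys.shape)  # (3, 4) (3, 2)
```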
Vanilla RNN
[Figure: vanilla RNN unrolled over three time steps with an output at every step: inputs x1 to x3, hidden states h0 to h3, outputs y1 to y3.]
Vanilla RNN
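[Figure: vanilla RNN unrolled over inputs x1, x2, x3, … with hidden states h0, h1, h2, …, hT; a single output yT is produced from the final hidden state hT.]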
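A many-to-one setup like the one in the figure reads the output only from the final hidden state. A minimal sketch, assuming a tanh update and a linear readout; the name rnn_many_to_one, the per-frame features, and the class count are made up for illustration.

```python
import numpy as np

def rnn_many_to_one(xs, h0, W_xh, W_hh, W_hy):
    """Consume the whole sequence, then emit one output from h_T."""
    h = h0
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h)
    return W_hy @ h  # y_T depends on the sequence only through h_T

rng = np.random.default_rng(0)
T, input_dim, hidden_dim, num_classes = 5, 8, 4, 3  # illustrative sizes
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
W_hy = rng.normal(size=(num_classes, hidden_dim)) * 0.1

frames = rng.normal(size=(T, input_dim))  # stand-in for per-frame features
scores = rnn_many_to_one(frames, np.zeros(hidden_dim), W_xh, W_hh, W_hy)
pred = int(np.argmax(scores))  # predicted class index
print(scores.shape, pred)
```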
Vanilla RNN
[Figure: vanilla RNN unrolled over three time steps with a single input x1, hidden states h0 to h3, and outputs y1, y2, y3 (one-to-many).]
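One common way to realize one-to-many generation is to feed each output back in as the next input. The sketch below assumes that feedback scheme and assumes outputs and inputs share one dimensionality so the same Wxh can be reused; rnn_one_to_many is a hypothetical name.

```python
import numpy as np

def rnn_one_to_many(x1, h0, W_xh, W_hh, W_hy, num_steps):
    """Generate num_steps outputs from a single input x1."""
    h, x_t, ys = h0, x1, []
    for _ in range(num_steps):
        h = np.tanh(W_xh @ x_t + W_hh @ h)
        y_t = W_hy @ h
        ys.append(y_t)
        x_t = y_t  # assumption: the output becomes the next input
    return np.stack(ys)

rng = np.random.default_rng(0)
dim, hidden_dim = 6, 4  # illustrative sizes; inputs and outputs share dim
W_xh = rng.normal(size=(hidden_dim, dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
W_hy = rng.normal(size=(dim, hidden_dim)) * 0.1

x1 = rng.normal(size=dim)  # stand-in for an image feature vector
ys = rnn_one_to_many(x1, np.zeros(hidden_dim), W_xh, W_hh, W_hy, num_steps=3)
print(ys.shape)  # (3, 6)
```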
Vanilla RNN
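[Figure: two chained vanilla RNNs. The first reads inputs x1, x2, x3 with hidden states h0 to h3; the second starts from [] with hidden states s0 to s3 and produces outputs y1, y2, y3.]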
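The two-RNN layout can be sketched as an encoder followed by a decoder. Several details here are assumptions rather than something the slide states: the decoder state s0 is set to the encoder's final state h3, the start symbol [] is a zero vector, and each decoder output is fed back as the next decoder input. The names encode, decode, and the U_* weights are hypothetical.

```python
import numpy as np

def encode(xs, h0, W_xh, W_hh):
    """Encoder RNN: summarize the input sequence in its final hidden state."""
    h = h0
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h)
    return h

def decode(s0, start, U_ys, U_ss, U_sy, num_steps):
    """Decoder RNN: generate outputs, feeding each one back as input."""
    s, y_prev, ys = s0, start, []
    for _ in range(num_steps):
        s = np.tanh(U_ys @ y_prev + U_ss @ s)
        y_prev = U_sy @ s
        ys.append(y_prev)
    return np.stack(ys)

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 8, 4, 5  # illustrative sizes
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
U_ys = rng.normal(size=(hidden_dim, output_dim)) * 0.1
U_ss = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
U_sy = rng.normal(size=(output_dim, hidden_dim)) * 0.1

xs = rng.normal(size=(3, input_dim))
h3 = encode(xs, np.zeros(hidden_dim), W_xh, W_hh)
start = np.zeros(output_dim)  # assumed encoding of the start symbol []
ys = decode(h3, start, U_ys, U_ss, U_sy, num_steps=3)  # s0 = h3 (assumption)
print(h3.shape, ys.shape)  # (4,) (3, 5)
```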
TensorFlow API: Vanilla RNN
>> import numpy as np
>> import tensorflow as tf
>> input_shape = [32, 10, 8]  # (batch_size, seq_len, dim)
>> inputs = np.random.random(input_shape).astype(np.float32)
>> simple_rnn = tf.keras.layers.SimpleRNN(
       4,  # units: dimensionality of the hidden state
       return_sequences=True, return_state=True)
>> output_seq, final_state = simple_rnn(inputs)
>> print(output_seq.shape)   # result: (32, 10, 4)
>> print(final_state.shape)  # result: (32, 4)
tf.keras.layers.SimpleRNN(
    units,
    activation='tanh',
    use_bias=True,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros',
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)
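To see what return_sequences and return_state select, the layer's default computation can be imitated in NumPy. This is only a sketch: simple_rnn_numpy is a hypothetical helper, not part of TensorFlow, and the weights are random stand-ins rather than Keras initializers. It assumes the default tanh activation, a bias, and a zero initial state.

```python
import numpy as np

def simple_rnn_numpy(inputs, kernel, recurrent_kernel, bias,
                     return_sequences=False, return_state=False):
    """Tanh RNN over inputs of shape (batch_size, seq_len, dim)."""
    batch_size, seq_len, _ = inputs.shape
    units = recurrent_kernel.shape[0]
    h = np.zeros((batch_size, units), dtype=inputs.dtype)  # zero initial state
    outputs = []
    for t in range(seq_len):
        # The per-step output of this layer is the hidden state itself.
        h = np.tanh(inputs[:, t, :] @ kernel + h @ recurrent_kernel + bias)
        outputs.append(h)
    output = np.stack(outputs, axis=1) if return_sequences else h
    return (output, h) if return_state else output

rng = np.random.default_rng(0)
batch_size, seq_len, dim, units = 32, 10, 8, 4
inputs = rng.random((batch_size, seq_len, dim)).astype(np.float32)
kernel = rng.normal(size=(dim, units)).astype(np.float32)
recurrent_kernel = rng.normal(size=(units, units)).astype(np.float32)
bias = np.zeros(units, dtype=np.float32)

output_seq, final_state = simple_rnn_numpy(
    inputs, kernel, recurrent_kernel, bias,
    return_sequences=True, return_state=True)
print(output_seq.shape)   # (32, 10, 4)
print(final_state.shape)  # (32, 4)

# Trainable parameters: dim*units + units*units + units
num_params = kernel.size + recurrent_kernel.size + bias.size
print(num_params)  # 52
```

With return_sequences=True, the final state is the last time step of the output sequence, so output_seq[:, -1, :] and final_state hold the same values.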
RNN Trade-offs
Multi-layer RNN
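[Figure: a three-layer RNN unrolled over three time steps. Hidden states are labeled by layer and time step: layer 1 (h10 to h13) reads inputs x1, x2, x3; layer 2 (h20 to h23) reads the hidden states of layer 1; layer 3 (h30 to h33) reads the hidden states of layer 2 and produces outputs y1, y2, y3.]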
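Stacking layers means each layer's hidden-state sequence becomes the input sequence of the layer above. A minimal sketch, assuming tanh updates, zero initial states, and a linear readout from the top layer; stacked_rnn_forward and the sizes are illustrative.

```python
import numpy as np

def stacked_rnn_forward(xs, layers, W_hy):
    """layers: list of (W_xh, W_hh) pairs, bottom layer first."""
    seq = xs
    for W_xh, W_hh in layers:
        h = np.zeros(W_hh.shape[0])  # zero initial state for each layer
        hs = []
        for inp in seq:
            h = np.tanh(W_xh @ inp + W_hh @ h)
            hs.append(h)
        seq = np.stack(hs)  # this layer's states feed the next layer
    return seq @ W_hy.T  # outputs are read from the top layer

rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim, num_layers = 3, 8, 4, 2, 3
layers = []
for layer in range(num_layers):
    in_dim = input_dim if layer == 0 else hidden_dim
    layers.append((rng.normal(size=(hidden_dim, in_dim)) * 0.1,
                   rng.normal(size=(hidden_dim, hidden_dim)) * 0.1))
W_hy = rng.normal(size=(output_dim, hidden_dim)) * 0.1

xs = rng.normal(size=(T, input_dim))
ys = stacked_rnn_forward(xs, layers, W_hy)
print(ys.shape)  # (3, 2)
```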
So, where can we use RNNs?
Image Captioning
Visual Question Answering (VQA)
Visual Dialog (Conversation about an Image)
Vision-and-Language Navigation
Towards Modeling Longer Dependencies