CS458 Natural Language Processing
Lecture 11
RNNs
Krishnendu Ghosh
Department of Computer Science & Engineering
Indian Institute of Information Technology Dharwad
Simple Recurrent Networks (RNNs or Elman Nets)
The Need for Sequences
RNN: Feedback Loop
RNN Architecture
RNN
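The core of the architecture above can be sketched as a single update rule, h_t = tanh(W x_t + U h_{t−1} + b), applied with the same weights at every time step. A minimal sketch with NumPy, where the dimensions and random initialization are illustrative choices:

```python
import numpy as np

# One Elman-RNN step: h_t = tanh(W x_t + U h_{t-1} + b).
# Dimensions d_in and d_h are illustrative, not from the lecture.
d_in, d_h = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d_h, d_in))   # input-to-hidden weights
U = rng.normal(scale=0.1, size=(d_h, d_h))    # hidden-to-hidden (recurrent) weights
b = np.zeros(d_h)

def rnn_step(x_t, h_prev):
    """Compute the new hidden state from the input and the prior state."""
    return np.tanh(W @ x_t + U @ h_prev + b)

h = np.zeros(d_h)                         # h_0: initial hidden state
for x_t in rng.normal(size=(5, d_in)):    # a length-5 input sequence
    h = rnn_step(x_t, h)                  # same weights reused at every step
```

Note that the loop runs for as many steps as the input has elements, which is exactly why RNNs handle variable-length sequences.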
Advantages
Processing Variable Lengths
Issues
Vanishing/Exploding Gradient Problem
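The vanishing/exploding problem can be seen numerically: backpropagating through T steps multiplies the gradient by the recurrent Jacobian T times. A toy demonstration, simplifying the Jacobian to a scaled identity matrix (the scales 0.5 and 1.5 are illustrative, not from the lecture):

```python
import numpy as np

# Repeated multiplication by the recurrent Jacobian shrinks or blows up
# the gradient. Here the Jacobian is simplified to scale * I.
def grad_norm_after(T, scale):
    U = scale * np.eye(8)        # stand-in for the recurrent Jacobian
    g = np.ones(8)               # gradient arriving at the last time step
    for _ in range(T):
        g = U.T @ g              # one step of backprop through time
    return np.linalg.norm(g)

vanished = grad_norm_after(20, 0.5)   # shrinks like 0.5**20
exploded = grad_norm_after(20, 1.5)   # grows like 1.5**20
```

With a spectral norm below 1 the gradient all but disappears after 20 steps; above 1 it grows by several orders of magnitude.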
Training Recurrent Networks (RNNs)
Training in simple RNNs
Just like feedforward training:
Weights that need to be updated:
Training in simple RNNs: unrolling in time
Unlike feedforward networks:
1. To compute the loss for the output at time t, we need the hidden layer from time t − 1.
2. The hidden layer at time t influences both the output at time t and the hidden layer at time t + 1 (and hence the output and loss at t + 1).
So, to measure the error accruing to ht, we must sum the error from the output at time t and the error propagated back from ht+1.
Unrolling in time (2)
We unroll a recurrent network into a feedforward computational graph, eliminating the recurrence
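Unrolling can be sketched as applying the same step function T times, producing one "layer" per time step with all layers sharing the weights W, U, and b. A minimal sketch (dimensions are illustrative):

```python
import numpy as np

# Unrolling: the recurrent update applied T times is a feedforward
# graph with one layer per time step, all sharing W, U, b.
rng = np.random.default_rng(0)
d_in, d_h, T = 4, 3, 5
W = rng.normal(scale=0.1, size=(d_h, d_in))
U = rng.normal(scale=0.1, size=(d_h, d_h))
b = np.zeros(d_h)

def unroll(xs):
    hs = [np.zeros(d_h)]                           # h_0
    for x_t in xs:                                 # one "layer" per step
        hs.append(np.tanh(W @ x_t + U @ hs[-1] + b))
    return hs                                      # h_0 .. h_T

hs = unroll(rng.normal(size=(T, d_in)))
```

Keeping every intermediate hidden state in `hs` is what makes backpropagation through time possible: the gradient at each step needs the states computed on the forward pass.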
RNNs as Language Models
Reminder: Language Modeling
The size of the conditioning context for different LMs
The n-gram LM:
Context size is the n − 1 prior words we condition on.
The feedforward LM:
Context is the window size.
The RNN LM:
No fixed context size; the hidden state ht−1 represents the entire prior history
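The RNN LM idea can be sketched directly: the hidden state summarizes the whole history, and a softmax over the vocabulary gives P(wt | w<t). All sizes and weights below are toy stand-ins, not a trained model:

```python
import numpy as np

# RNN LM step: the hidden state encodes the entire history; a softmax
# over the vocabulary gives the next-word distribution. Toy sizes.
V, d = 10, 6                              # vocab size, hidden size
rng = np.random.default_rng(2)
E = rng.normal(scale=0.1, size=(V, d))    # word embeddings
W = rng.normal(scale=0.1, size=(d, d))
U = rng.normal(scale=0.1, size=(d, d))
Wo = rng.normal(scale=0.1, size=(V, d))   # hidden-to-vocab output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.zeros(d)
for w in [3, 1, 4]:               # token ids of the history so far
    h = np.tanh(W @ E[w] + U @ h)
p_next = softmax(Wo @ h)          # P(next word | entire history)
```

Unlike an n-gram or feedforward LM, nothing here truncates the history: every past token has passed through the recurrent update into `h`.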
Training RNN LM
RNNs for Sequences
RNNs for Sequence Labeling
Assign a label to each element of a sequence
Part-of-speech tagging
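For sequence labeling, the network emits one label distribution per input position, e.g. a POS tag per word. A toy sketch (vocabulary, tag set size, and weights are illustrative):

```python
import numpy as np

# Sequence labeling: one tag distribution per position; the argmax at
# each step is the predicted tag. All sizes are toy.
rng = np.random.default_rng(3)
V, d, n_tags = 12, 6, 4                   # vocab, hidden, number of tags
E = rng.normal(scale=0.1, size=(V, d))
W = rng.normal(scale=0.1, size=(d, d))
U = rng.normal(scale=0.1, size=(d, d))
Wt = rng.normal(scale=0.1, size=(n_tags, d))  # hidden-to-tag weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h, tags = np.zeros(d), []
for w in [5, 2, 7, 0]:                    # token ids of the sentence
    h = np.tanh(W @ E[w] + U @ h)
    tags.append(int(np.argmax(softmax(Wt @ h))))  # one tag per word
```

The output sequence has exactly one tag per input token, which is the defining property of sequence labeling.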
RNNs for Sequence Classification
Text classification
Instead of taking the last hidden state, we can use a pooling function over all the output states, such as mean pooling
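Mean pooling can be sketched as averaging all the hidden states before the classification layer; sizes and weights below are toy:

```python
import numpy as np

# Sequence classification with mean pooling: average all hidden states
# instead of using only the last one. Toy sizes and random weights.
rng = np.random.default_rng(4)
d_in, d_h, n_classes = 4, 3, 2
W = rng.normal(scale=0.1, size=(d_h, d_in))
U = rng.normal(scale=0.1, size=(d_h, d_h))
Wc = rng.normal(scale=0.1, size=(n_classes, d_h))  # classifier weights

h, states = np.zeros(d_h), []
for x_t in rng.normal(size=(6, d_in)):
    h = np.tanh(W @ x_t + U @ h)
    states.append(h)                   # keep every hidden state

pooled = np.mean(states, axis=0)       # mean pooling over all states
logits = Wc @ pooled
pred = int(np.argmax(logits))          # predicted class for the sequence
```

Pooling over all states lets every position contribute to the decision, rather than forcing the last state to carry the whole sequence.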
Autoregressive Generation
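Autoregressive generation can be sketched as a loop: sample a word from the model's distribution, feed it back in as the next input, and repeat until an end token or a length limit. The weights, the end-token id, and the start token below are toy choices:

```python
import numpy as np

# Autoregressive generation: sample, feed the sample back in, repeat.
# Model weights, END id, and start token are illustrative.
rng = np.random.default_rng(5)
V, d, END, MAX_LEN = 8, 6, 0, 10
E = rng.normal(scale=0.1, size=(V, d))
W = rng.normal(scale=0.1, size=(d, d))
U = rng.normal(scale=0.1, size=(d, d))
Wo = rng.normal(scale=0.1, size=(V, d))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h, w, out = np.zeros(d), 1, []            # start from token id 1
for _ in range(MAX_LEN):
    h = np.tanh(W @ E[w] + U @ h)
    w = int(rng.choice(V, p=softmax(Wo @ h)))   # sample the next word
    if w == END:                          # stop at the end token
        break
    out.append(w)
```

The key point is that the model's own output becomes its next input, so generation conditions on everything produced so far.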
Stacked RNNs
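Stacking can be sketched as running a second RNN whose input sequence is the hidden-state sequence of the first; the layer sizes below are illustrative:

```python
import numpy as np

# 2-layer stacked RNN: the hidden states of layer 1 are the input
# sequence for layer 2. Dimensions are illustrative.
rng = np.random.default_rng(6)
d_in, d1, d2 = 4, 3, 3

def make_layer(n_in, n_h):
    return (rng.normal(scale=0.1, size=(n_h, n_in)),
            rng.normal(scale=0.1, size=(n_h, n_h)))

def run_layer(xs, W, U):
    h, out = np.zeros(U.shape[0]), []
    for x in xs:
        h = np.tanh(W @ x + U @ h)
        out.append(h)
    return out                            # one state per input position

xs = list(rng.normal(size=(5, d_in)))
layer1 = run_layer(xs, *make_layer(d_in, d1))
layer2 = run_layer(layer1, *make_layer(d1, d2))   # stacked on layer 1
```

Each layer produces a full sequence of states, so any number of layers can be composed the same way.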
Bidirectional RNNs
Bidirectional RNNs for Classification
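A bidirectional classifier can be sketched by running one RNN left to right and a second, independent RNN right to left, then concatenating the two final states as the sequence representation; all sizes and weights below are toy:

```python
import numpy as np

# Bidirectional RNN for classification: run forward and backward
# passes with separate weights, concatenate the final states. Toy sizes.
rng = np.random.default_rng(7)
d_in, d_h, n_classes = 4, 3, 2

def params():
    return (rng.normal(scale=0.1, size=(d_h, d_in)),
            rng.normal(scale=0.1, size=(d_h, d_h)))

def run(xs, W, U):
    h = np.zeros(d_h)
    for x in xs:
        h = np.tanh(W @ x + U @ h)
    return h                              # final state only

xs = list(rng.normal(size=(5, d_in)))
h_fwd = run(xs, *params())                # left-to-right pass
h_bwd = run(xs[::-1], *params())          # right-to-left pass
rep = np.concatenate([h_fwd, h_bwd])      # 2 * d_h representation
Wc = rng.normal(scale=0.1, size=(n_classes, 2 * d_h))
pred = int(np.argmax(Wc @ rep))
```

The concatenated representation sees the sequence from both ends, so the classifier is not biased toward information near the final position.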
Thank You