Recurrent networks, sequences, CTC, attention.
KNN - Convolutional Neural Networks
Michal Hradiš - Brno University of Technology
Sequence processing
Data types: text, sound, image, video, documents
Tasks: classification, element classification (segmentation), sequence generation
Example of a generated output: "The probability is 6 %."
How to work with sequences
Sequence processing
Recurrent layers - start with a single fully connected layer
Maps an input feature vector x to an output activation vector: f(x) = sigmoid(Wx + b)
Sequence processing - communication
Recurrent
Convolution
Attention
Graph networks
Recurrent layers
Diagram: the layer is unrolled over the input vector sequence; each step takes an input state vector and produces an output state vector, so a start state vector is carried from step to step until a final state vector remains, and the per-step outputs form the output vector sequence.
Vanilla RNN
Christopher Olah: Understanding LSTM Networks. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Fully connected layer: h = tanh(W x + b)
RNN: h_t = tanh(W_h h_{t-1} + W_x x_t + b)
RNN: equivalently, h_t = tanh(W [h_{t-1}, x_t] + b), with the previous state concatenated to the input.
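A minimal sketch of such a recurrent cell, assuming PyTorch (class name and sizes are illustrative):

import torch
import torch.nn as nn

class VanillaRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # a single fully connected layer applied to [h_{t-1}, x_t]
        self.linear = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x_t, h_prev):
        # h_t = tanh(W [h_{t-1}, x_t] + b)
        return torch.tanh(self.linear(torch.cat([h_prev, x_t], dim=-1)))

cell = VanillaRNNCell(input_size=16, hidden_size=32)
x = torch.randn(10, 4, 16)        # (time, batch, features): input vector sequence
h = torch.zeros(4, 32)            # start state vector
outputs = []
for x_t in x:                     # one step per sequence element
    h = cell(x_t, h)
    outputs.append(h)
outputs = torch.stack(outputs)    # output vector sequence; h is the final state vector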
Recurrent layers - training
Diagram: the network is unrolled over time steps t1 … t8 up to tend, with a loss computed at every time step.
The final objective function is the sum of all per-step losses.
The unrolled computation graph is valid: directed and acyclic.
Gradient backpropagation is "standard".
The optimization algorithm is "standard".
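A minimal training sketch for one unrolled sequence, assuming PyTorch; the per-step losses are summed into the final objective and backpropagated as usual:

import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=16, hidden_size=32)    # built-in vanilla RNN cell
head = nn.Linear(32, 5)                             # per-step classifier
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(cell.parameters()) + list(head.parameters()))

x = torch.randn(10, 4, 16)                  # (time, batch, features)
targets = torch.randint(0, 5, (10, 4))      # one label per time step

h = torch.zeros(4, 32)
loss = 0.0
for t in range(x.size(0)):                  # unrolling builds a directed acyclic graph
    h = cell(x[t], h)
    loss = loss + criterion(head(h), targets[t])    # one loss per time step

optimizer.zero_grad()
loss.backward()                             # "standard" backpropagation through time
optimizer.step()                            # "standard" optimizer step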
Recurrent layers - long sequences and gradients
Diagram: the loss at a late time step (t8, tend) must propagate its gradient back through every earlier step of the unrolled sequence.
GRU - Gated Recurrent Unit
How to get better long distance gradients?
We can use a "bypass" principle similar to the one in residual networks.
Christopher Olah: Understanding LSTM Networks. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
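A minimal hand-written sketch of the GRU cell, assuming PyTorch; the update gate z interpolates between the previous state and the candidate state, which provides the bypass for gradients:

import torch
import torch.nn as nn

class GRUCellSketch(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.z = nn.Linear(input_size + hidden_size, hidden_size)   # update gate
        self.r = nn.Linear(input_size + hidden_size, hidden_size)   # reset gate
        self.h = nn.Linear(input_size + hidden_size, hidden_size)   # candidate state

    def forward(self, x_t, h_prev):
        xh = torch.cat([x_t, h_prev], dim=-1)
        z = torch.sigmoid(self.z(xh))
        r = torch.sigmoid(self.r(xh))
        h_cand = torch.tanh(self.h(torch.cat([x_t, r * h_prev], dim=-1)))
        return (1 - z) * h_prev + z * h_cand    # bypass: part of h_prev passes through unchanged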
LSTM - Long Short Term Memory
Christopher Olah: Understanding LSTM Networks. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
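A short usage sketch of an LSTM layer, assuming PyTorch; sizes are illustrative:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=1)
x = torch.randn(10, 4, 16)                   # (time, batch, features)
outputs, (h_n, c_n) = lstm(x)                # per-step outputs, final hidden and cell state
print(outputs.shape, h_n.shape, c_n.shape)   # (10, 4, 32), (1, 4, 32), (1, 4, 32)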
Bidirectional recurrent layer
Deep Dive into Bidirectional LSTM
https://www.i2tutorials.com/technology/deep-dive-into-bidirectional-lstm/
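A sketch of the bidirectional variant, assuming PyTorch: one LSTM reads the sequence left to right, another right to left, and their outputs are concatenated at every position:

import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=16, hidden_size=32, bidirectional=True)
x = torch.randn(10, 4, 16)
outputs, _ = bilstm(x)
print(outputs.shape)    # (10, 4, 64): forward and backward states concatenated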
1D convolution (temporal convolution)
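A minimal sketch of a temporal convolution, assuming PyTorch; Conv1d expects (batch, channels, time):

import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
x = torch.randn(4, 16, 100)    # 4 sequences, 16 features, 100 time steps
y = conv(x)                    # (4, 32, 100): each output position sees a 3-step window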
Attention?
Diagram: each word of "John went home through snow very deep." is embedded via a look-up table (LUT) and each position is tagged (Subject, Verb, Object, …).
Attention?
Diagram: the same token embeddings are multiplied by per-position attention weights (e.g. 0.1, 0, 0.2, 0, 0, 0.3, 0.4, 0) and summed into a single context vector.
Attention
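A minimal sketch of this mechanism as scaled dot-product attention, assuming PyTorch; each query scores every position, the scores are normalized into weights, and the value vectors are multiplied by the weights and summed:

import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q: (batch, n_queries, d), k and v: (batch, n_positions, d)
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5   # similarity to each position
    weights = F.softmax(scores, dim=-1)                    # e.g. 0.1, 0, 0.2, ..., sums to 1
    return weights @ v                                     # weighted sum of value vectors

q = torch.randn(1, 1, 16)         # one query
k = v = torch.randn(1, 8, 16)     # eight token embeddings
context = attention(q, k, v)      # (1, 1, 16)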
Transformer - positional encoding
ALiBi
Press et al.: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, 2022.
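A sketch of the ALiBi idea, assuming PyTorch; the slopes and the non-causal |i - j| distance below are illustrative simplifications of the scheme in Press et al.:

import torch

def alibi_bias(seq_len, num_heads):
    slopes = torch.tensor([2.0 ** -(i + 1) for i in range(num_heads)])   # one slope per head
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).abs()    # |i - j|
    return -slopes[:, None, None] * distance          # (heads, seq, seq), added to attention scores

bias = alibi_bias(seq_len=6, num_heads=4)
# per head h, before the softmax: scores = q @ k.T / sqrt(d) + bias[h]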
Rotary position encoding
Jianlin Su et al.: RoFormer: Enhanced Transformer with Rotary Position Embedding
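A minimal sketch of rotary position embedding, assuming PyTorch; pairs of feature dimensions are rotated by a position-dependent angle so the attention dot product depends on relative position (the pairing below is one common variant):

import torch

def rope(x, base=10000.0):
    # x: (seq_len, dim) with dim even
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float) / half)      # per-pair frequency
    angles = torch.arange(seq_len, dtype=torch.float)[:, None] * freqs   # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 16)
q_rot = rope(q)    # applied to queries and keys before the attention dot product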
Mix convolution and recurrent layers
1D conv.
1D pooling
Fully connected
Recurrent layer
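A minimal sketch of such a mixed network, assuming PyTorch; layer sizes are illustrative:

import torch
import torch.nn as nn

class ConvRecurrentNet(nn.Module):
    def __init__(self, in_features=16, num_classes=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_features, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),                              # 1D pooling halves the sequence length
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.rnn = nn.LSTM(64, 64, batch_first=True)      # recurrent layer
        self.fc = nn.Linear(64, num_classes)              # fully connected output per position

    def forward(self, x):                                 # x: (batch, time, features)
        y = self.conv(x.transpose(1, 2))                  # Conv1d wants (batch, channels, time)
        y, _ = self.rnn(y.transpose(1, 2))
        return self.fc(y)                                 # (batch, time/2, num_classes)

out = ConvRecurrentNet()(torch.randn(4, 100, 16))         # (4, 50, 5)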
Mix convolution and recurrent layers
Layer types: 1D pointwise convolution, attention, fully connected.
Diagram: stacked CONV 24, CONV 24, CONV 48, CONV 48, CONV 96, CONV 96 blocks interleaved with three 2x POOL layers, followed by an LSTM.
OCR - text line transcription
Diagram: convolutional features (CONV 256) are followed by 1D convolutions and a SOFTMAX that outputs character probabilities at each horizontal position, e.g. P("n" | image, position) = 0.97, P("c" | image, position) = 0.94.
OCR - text line transcription with CTC
Diagram: the per-position character probabilities (classes A-H plus the blank symbol #) produce an alignment such as "#######fe#aa###dd#...", which CTC collapses into the target transcription.
CTC - Connectionist temporal classification
Loss function
Labels: a sequence of class ids
Net output: a sequence of class probability vectors
Idea: sum the probabilities of all alignments (with blanks and repeated symbols) that collapse to the label sequence; the sum can be computed efficiently with dynamic programming.
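A minimal usage sketch of a CTC loss, assuming PyTorch's nn.CTCLoss with class 0 reserved for the blank symbol; shapes and lengths are illustrative:

import torch
import torch.nn as nn

T, N, C = 50, 4, 29    # time steps, batch size, classes (28 characters + blank)
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=-1)   # net output
targets = torch.randint(1, C, (N, 10))                  # labels: sequences of class ids (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()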
Sequence Regression
Diagram: each word of the review "I know the product is the best." is embedded via a look-up table (LUT) and the network regresses a score, e.g. 4.2/5.
Sequence Classification
Diagram: each word of "John went home through snow very deep." is embedded via a look-up table (LUT) and the network predicts a class for the whole sequence, e.g. English language.
Sequence Classification (better)
Diagram: the token embeddings (LUT) are combined by a SUM over the sequence and the pooled vector is classified, e.g. English language vs. Czech language.
Layer types: 1D convolution, 1D pooling, fully connected, recurrent layer.
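A minimal sketch of this scheme, assuming PyTorch: embed every token (the LUT), sum the embeddings over the sequence, and classify the pooled vector:

import torch
import torch.nn as nn

class BagOfEmbeddingsClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, num_classes=2):
        super().__init__()
        self.lut = nn.Embedding(vocab_size, embed_dim)    # look-up table of word vectors
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                         # (batch, seq_len)
        pooled = self.lut(token_ids).sum(dim=1)           # SUM over the sequence
        return self.fc(pooled)                            # class scores, e.g. English vs. Czech

logits = BagOfEmbeddingsClassifier()(torch.randint(0, 10000, (4, 8)))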
Word tagging / text transcription / sequence segmentation
Diagram: each word of "John went home through snow very deep." is embedded (LUT) and every position gets its own tag, such as Subject, Verb, Object, ….
Reading Comprehension / conditioned sequence segmentation
Diagram: the question "When?" is passed through a question encoder into a question embedding; each word of "He killed him in July of 1985." is embedded (LUT) and, conditioned on the question, labeled out / out / out / out / start / in / end / out, marking the answer span "July of 1985".
Learn long distance dependencies?
Diagram: the same per-word tagging setup as before, "John went home through snow very deep." embedded via LUTs and tagged Subject, Verb, Object, ….
BERT - Bidirectional Encoder Representations from Transformers
Trained to fill in masked words in a sentence.
Trained to estimate text continuity: are two sentences sequential in the text corpus? (Transformer encoder, output: sequential / random.)
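A short usage sketch of a pretrained BERT filling in a masked word, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint:

from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("John went home through very deep [MASK]."))
# each candidate word for [MASK] is returned with its predicted probability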
Multi-modal
Huang et al.: LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking