[Description] Long Short-Term Memory (LSTM): an RNN variant that mitigates the vanishing-gradient problem through gating mechanisms.
LSTM (Long Short-Term Memory)

═══ RNN Problem: Vanishing Gradient ═══

• Gradients shrink as they are backpropagated through long sequences (vanishing gradient)
• As a result, a plain RNN struggles to retain information from the distant past
• Solution: LSTM's cell state (long-term memory)

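To make the vanishing-gradient bullets concrete, here is a minimal sketch in plain Python. It assumes a scalar RNN h_t = tanh(w·h_{t-1}) with no input, and it compares that against only the direct cell-state path of an LSTM with a constant forget gate (the dependence of the gates on h is ignored). The function names and the chosen values of w, h0, and the forget gate are illustrative assumptions, not part of the original sheet.

```python
import math


def rnn_gradient_through_time(w, steps, h0=0.5):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh^2(w * h_prev))
        grad *= w * (1.0 - h * h)
    return abs(grad)


def cell_state_gradient(forget_gate, steps):
    """Return d c_T / d c_0 along the direct path c_t = f * c_{t-1} + ... (constant f)."""
    return forget_gate ** steps


if __name__ == "__main__":
    for steps in (5, 20, 50):
        print(steps,
              rnn_gradient_through_time(0.9, steps),
              cell_state_gradient(0.99, steps))
```

In this toy setting the RNN gradient is a product of factors smaller than 1, so it decays geometrically with sequence length, while the cell-state path decays only as fast as the forget gate allows (not at all when f_t = 1).
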
═══ LSTM Architecture ═══
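
Forget Gate (f_t): decides what to discard from the previous cell state
    f_t = σ(W_f·[h_{t-1}, x_t] + b_f)
Input Gate (i_t): decides how much of the new information to store
    i_t = σ(W_i·[h_{t-1}, x_t] + b_i)
Candidate (c̃_t): new candidate information
    c̃_t = tanh(W_c·[h_{t-1}, x_t] + b_c)
Output Gate (o_t): decides what to output
    o_t = σ(W_o·[h_{t-1}, x_t] + b_o)

σ is the logistic sigmoid; [h_{t-1}, x_t] is the concatenation of the previous hidden state and the current input.
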
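The four gate formulas can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reference implementation: the `params` dictionary layout, the helper `affine`, and the function name `lstm_gates` are hypothetical, and vectors are represented as Python lists.

```python
import math


def sigmoid(z):
    """Logistic sigmoid σ(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W·v + b for a matrix W (list of rows), vector v, and bias vector b."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]


def lstm_gates(params, h_prev, x_t):
    """Return (f_t, i_t, c_tilde, o_t) for one time step.

    params is assumed to hold W_f, W_i, W_c, W_o (hidden x (hidden + input))
    and b_f, b_i, b_c, b_o (length hidden).
    """
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]
    c_tilde = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]
    return f_t, i_t, c_tilde, o_t
```

All four quantities read the same concatenated vector; they differ only in their weights and in the squashing function (σ for the gates, tanh for the candidate).
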
═══ State Updates ═══

Cell State (c_t): long-term memory
    c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
Hidden State (h_t): short-term memory / output
    h_t = o_t ⊙ tanh(c_t)

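A minimal sketch of the two update equations, with ⊙ written as an element-wise product over Python lists. The function name `lstm_state_update` and its argument order are assumptions made for illustration.

```python
import math


def lstm_state_update(f_t, i_t, c_tilde, o_t, c_prev):
    """Apply c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t and h_t = o_t ⊙ tanh(c_t)."""
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return c_t, h_t
```

Two limiting cases show what the gates do: with f_t = 1 and i_t = 0 the old cell state is copied through unchanged, and with f_t = 0 and i_t = 1 it is replaced entirely by the candidate.
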
═══ LSTM Cell Diagram ═══

c_{t-1} ───────────────→ ×f_t ──→ + ──→ c_t
                                  ↑
                              i_t × c̃_t

h_{t-1} ──→ [concat] ──→ Gates
x_t ──↗

tanh(c_t) × o_t = h_t

Input: 'this movie is good' (y_true = 1)
Embedding dim: 4, Hidden dim: 3
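
The example setup above can be run end to end with a short script. This is a sketch under several assumptions that are not in the original sheet: the embeddings and weights are random values from a fixed seed, the initial states h_0 and c_0 are zero, and the prediction comes from a hypothetical sigmoid output layer (`w_out`, `b_out`) scored with binary cross-entropy against y_true = 1. The printed numbers are therefore illustrative only.

```python
import math
import random

EMBED_DIM, HIDDEN_DIM = 4, 3
rng = random.Random(0)  # fixed seed; all weights below are made-up values


def rand_vector(n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


def rand_matrix(rows, cols):
    return [rand_vector(cols) for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W·v + b for a matrix W (list of rows), vector v, and bias vector b."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]


tokens = "this movie is good".split()
y_true = 1
embeddings = {tok: rand_vector(EMBED_DIM) for tok in tokens}  # hypothetical embeddings

# One weight matrix and bias per gate: f (forget), i (input), c (candidate), o (output).
params = {}
for gate in "fico":
    params["W_" + gate] = rand_matrix(HIDDEN_DIM, HIDDEN_DIM + EMBED_DIM)
    params["b_" + gate] = [0.0] * HIDDEN_DIM

# Assumed output layer (not part of the LSTM cell itself).
w_out, b_out = rand_vector(HIDDEN_DIM), 0.0


def lstm_step(x_t, h_prev, c_prev):
    """One LSTM time step; returns (h_t, c_t)."""
    z = h_prev + x_t  # [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]
    c_tilde = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


h = [0.0] * HIDDEN_DIM
c = [0.0] * HIDDEN_DIM
for tok in tokens:
    h, c = lstm_step(embeddings[tok], h, c)
    print(tok, [round(v, 4) for v in h])

y_pred = sigmoid(sum(w * h_i for w, h_i in zip(w_out, h)) + b_out)
loss = -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))
print("y_pred:", y_pred, "loss:", loss)
```

With hidden dim 3 and embedding dim 4, each gate's weight matrix has shape 3 × 7, because it multiplies the concatenation [h_{t-1}, x_t].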