Long short-term memory
Neural Networks that remember
Victor ADASCALITEI
Neural Networks recap
What we know so far?
Convolutional Neural Networks (CNNs) · Generative Adversarial Networks (GANs)
[Diagram: a CNN classifies an image ("Cat?" → Yes / No); a GAN turns noise into a generated image]
More generally:
Data → Neural Network → Result
or
Neural Network(Data) = Result
f(x) = y
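The slide's view of a trained network as a plain function f(x) = y can be sketched with a single neuron (a minimal sketch; the weights below are arbitrary illustrative values, not trained):

```python
import math

def neural_network(x, w, b):
    # A single sigmoid neuron: the whole network is just a function f(x) = y.
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Data in, result out -- Neural Network(Data) = Result.
y = neural_network(2.0, w=1.5, b=-1.0)
```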
Neural Networks recap
They are universal function approximators
(Given enough data, they can approximate any function)
True for images and classifiers,
but what about text?
… and speech?
… and music?
What about time/context-dependent data?
Neural Networks
Input in relation with the output
- Input
- Network
- Output
One to One: Recognition, GANs
One to Many: Image Captioning
Many to One: Sentiment Analysis
Many to Many: Movie Analysis
Many to Many*: Translations
Recurrent Neural Networks
Input in relation with the output
Recurrent Neural Networks handle the sequence patterns above: One to Many, Many to One, Many to Many, and Many to Many*.
Recurrent Neural Networks
Input in relation with the output
[Diagram: a network with a feedback loop, equal (=) to the same network unrolled across time steps]
Recurrent Neural Networks
Explained
h_t = tanh(W_hh · h_{t-1} + W_xh · x_t)
(x_t: input at step t, h_t: hidden state, W_hh: hidden-to-hidden weights, W_xh: input-to-hidden weights)
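The standard vanilla RNN update, h_t = tanh(W_hh · h_{t-1} + W_xh · x_t), can be sketched in a few lines of NumPy (sizes and random weights below are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
Whh = rng.standard_normal((hidden, hidden)) * 0.1  # hidden-to-hidden weights
Wxh = rng.standard_normal((hidden, inputs)) * 0.1  # input-to-hidden weights

def rnn_step(h_prev, x_t):
    # The same two weight matrices are reused at every time step.
    return np.tanh(Whh @ h_prev + Wxh @ x_t)

h = np.zeros(hidden)
for x_t in rng.standard_normal((5, inputs)):  # a length-5 input sequence
    h = rnn_step(h, x_t)  # h carries context forward across time steps
```

The loop is what makes the network recurrent: the output of one step is fed back in as part of the next step's input.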
Recurrent Neural Networks
The problem
Vanishing gradient: backpropagating through many time steps multiplies the gradient by the same recurrent weights over and over, so it shrinks exponentially and long-range dependencies become very hard to learn.
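The shrinking effect can be seen directly: repeatedly multiplying a gradient by the (transposed) recurrent weight matrix, as backpropagation through time does, drives its norm toward zero when the weights are small (a simplified sketch that omits the tanh-derivative factors, which only shrink the gradient further):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 4
Whh = rng.standard_normal((hidden, hidden)) * 0.05  # small recurrent weights

grad = np.ones(hidden)
norms = []
for _ in range(20):
    # Each backward step through time multiplies the gradient by Whh^T.
    grad = Whh.T @ grad
    norms.append(np.linalg.norm(grad))
# norms decays roughly geometrically -- the vanishing gradient.
```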
Long short-term memory cell
[Diagram: the LSTM cell, with gates controlling what is written to, kept in, and read from the cell state [1]]
Code example
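A minimal NumPy sketch of one LSTM forward step, following the standard gated formulation (sizes and random weights are illustrative; biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inputs = 4, 3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, each acting on the concatenation [h_{t-1}; x_t].
Wf, Wi, Wo, Wc = (rng.standard_normal((hidden, hidden + inputs)) * 0.1
                  for _ in range(4))

def lstm_step(h_prev, c_prev, x_t):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z)                    # forget gate: what to erase from the cell state
    i = sigmoid(Wi @ z)                    # input gate: what new information to store
    o = sigmoid(Wo @ z)                    # output gate: what to expose as the hidden state
    c = f * c_prev + i * np.tanh(Wc @ z)   # cell state: the "long-term memory"
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.standard_normal((5, inputs)):  # a length-5 input sequence
    h, c = lstm_step(h, c, x_t)
```

Because the cell state c is updated additively (gated by f and i) rather than squashed through the recurrent weights at every step, gradients can flow across many time steps without vanishing as quickly as in a vanilla RNN.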
Sources
[1] “LSTM: A Search Space Odyssey” -- https://arxiv.org/pdf/1503.04069.pdf
[2] “RNN Escapades” -- London ML meetup 09/2015 Andrej Karpathy https://docs.google.com/presentation/d/1qs2IuSdZvbNfzw217kH5-1Z9DjG0Ng6fJiabaLNQVaY