1 of 22

Ali Eslami

Nicolas Heess

Theophane Weber

Yuval Tassa

Koray Kavukcuoglu

Geoffrey Hinton

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models and Recurrent Neural Networks

2 of 22

[Figure: Model. A cause z ("blue brick") generates an image x.]

3 of 22

[Figure: Model. A single cause z generates an image x, shown for one object ("blue brick") and for a whole scene ("pile of bricks").]

Not sufficient for:

    • grasping
    • counting
    • transfer
    • generalisation

4 of 22

[Figure: Model. An image x explained by one cause z ("pile of bricks"), versus by separate causes z1 ("blue brick") and z2 ("red brick").]

5 of 22

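[Figure: Model. The image x ("pile of bricks") is explained by per-object causes z1 and z2. Each z_i is split into z_i^what ("blue brick", "red brick") and z_i^where ("above", "below"). z_i^what produces an object patch y_i^att, z_i^where places it to give y_i, and y1 and y2 together form x.]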
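As a rough illustration of the compositional model on this slide, the sketch below renders a scene from per-object latents: each z_what is decoded into a small patch y_att, z_where places that patch on an empty canvas to give y_i, and the y_i are summed into the image x. The lookup-table decoder, the integer (row, col) encoding of z_where, the pixel values and all names are assumptions made for illustration; in the talk these are learned networks and continuous variables.

```python
from dataclasses import dataclass
from typing import Dict, List

Image = List[List[float]]


@dataclass
class ObjectLatent:
    what: str  # stand-in for z_what (a learned vector in the talk)
    row: int   # stand-in for z_where: top row of the patch (hypothetical encoding)
    col: int   # stand-in for z_where: left column of the patch (hypothetical encoding)


# Hypothetical "decoder": a lookup table from z_what to a 2x2 patch y_att.
PATCHES: Dict[str, Image] = {
    "blue brick": [[0.2, 0.2], [0.2, 0.2]],
    "red brick": [[0.9, 0.9], [0.9, 0.9]],
}


def decode(what: str) -> Image:
    """Map z_what to an object patch y_att."""
    return PATCHES[what]


def place(patch: Image, row: int, col: int, height: int, width: int) -> Image:
    """Write the patch onto an empty canvas at (row, col), giving y_i."""
    canvas = [[0.0] * width for _ in range(height)]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            r, c = row + i, col + j
            if 0 <= r < height and 0 <= c < width:
                canvas[r][c] = value
    return canvas


def render(objects: List[ObjectLatent], height: int = 4, width: int = 4) -> Image:
    """Sum the per-object canvases y_i into the image x."""
    x = [[0.0] * width for _ in range(height)]
    for obj in objects:
        y_att = decode(obj.what)
        y = place(y_att, obj.row, obj.col, height, width)
        x = [[a + b for a, b in zip(x_row, y_row)] for x_row, y_row in zip(x, y)]
    return x


# "blue brick above, red brick below"
scene = [ObjectLatent("blue brick", 0, 1), ObjectLatent("red brick", 2, 1)]
image = render(scene)
for image_row in image:
    print(image_row)
```

Because the scene is a list of objects, adding or removing a brick changes only the list, not the decoder.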

6 of 22

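[Figure: Model and Inference Network. Model: x with a single z, and x with z1, z2, z3. Inference Network: x is encoded to z, or recurrently through hidden states h1, h2, h3 to z1, z2, z3; in both cases a decoder produces the reconstruction y.]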

7 of 22

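[Figure: Model and Inference Network. Inference Network: the recurrent network from the previous slide (x, hidden states h1, h2, h3, latents z1, z2, z3, decoder, reconstruction y), next to a version in which each z_i is split into z_i^pres, z_i^what and z_i^where. Model: x with z1, z2, z3, and the structured model in which z_i^what produces y_i^att, z_i^where places it to give y_i, and the y_i form x.]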

8 of 22

[Figure: Model and Inference Network, repeated from slide 7.]

Focus on representation, not reconstruction.

Output is a set: order? count?

9 of 22

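[Figure: one step i of the inference network. From the image x, the hidden state h_i yields z_i^pres and z_i^where. z_i^where selects an attention window x_i^att, a VAE maps x_i^att to z_i^what and to a reconstructed patch y_i^att, and the patch is placed to give y_i, which contributes to the reconstruction y. Ellipses indicate the step repeating.]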
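The recurrent, variable-length loop in this figure can be sketched as follows: at each step the network decides whether another object is present (z_pres), where it is (z_where), crops an attention window (x_att) and encodes it (z_what), stopping as soon as z_pres is 0. Everything inside the loop is a hand-written stand-in, not the learned networks from the talk: `propose_where` replaces the recurrent network, the set `explained` plays the role of the hidden state h_i, and the mean intensity replaces the VAE encoding.

```python
from typing import Dict, List, Optional, Set, Tuple

Image = List[List[float]]


def propose_where(x: Image, explained: Set[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Hypothetical stand-in for the recurrent network:
    return the first non-zero pixel that no earlier step has explained."""
    for r, row in enumerate(x):
        for c, value in enumerate(row):
            if value > 0.0 and (r, c) not in explained:
                return (r, c)
    return None


def infer(x: Image, max_steps: int = 3, patch: int = 2) -> List[Dict]:
    """Return a variable-length list of per-object latents for image x."""
    latents: List[Dict] = []
    explained: Set[Tuple[int, int]] = set()  # plays the role of the hidden state h_i
    for _ in range(max_steps):
        where = propose_where(x, explained)
        z_pres = 1 if where is not None else 0
        if z_pres == 0:
            break  # no further object: stop early
        r0, c0 = where
        # Attend: crop a patch x patch window x_att at z_where (zero outside the image).
        x_att = [
            [x[r][c] if r < len(x) and c < len(x[0]) else 0.0
             for c in range(c0, c0 + patch)]
            for r in range(r0, r0 + patch)
        ]
        # Encode: toy z_what is the mean intensity of the window.
        z_what = sum(map(sum, x_att)) / (patch * patch)
        latents.append({"z_pres": z_pres, "z_where": (r0, c0), "z_what": z_what})
        explained.update(
            (r, c) for r in range(r0, r0 + patch) for c in range(c0, c0 + patch)
        )
    return latents


x = [
    [0.0, 0.2, 0.2, 0.0],
    [0.0, 0.2, 0.2, 0.0],
    [0.0, 0.9, 0.9, 0.0],
    [0.0, 0.9, 0.9, 0.0],
]
latents = infer(x)
print(len(latents), [z["z_where"] for z in latents])
```

The number of returned latents is itself an inferred quantity: the loop allows up to `max_steps` objects but stops when nothing is left to explain.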

10 of 22

Key ideas

1. Build in structure, get out meaning

2. Inference networks that are

    • recurrent
    • variable-length
    • attentive

3. End-to-end learning through

    • discrete and continuous variables
    • inference and model networks
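Learning through both discrete and continuous variables needs a gradient estimator for each kind. The slide does not say which estimators are used, so the sketch below only shows one common pairing on toy objectives: a reparameterised (pathwise) estimate for a Gaussian variable and a score-function (likelihood-ratio) estimate for a Bernoulli variable standing in for something like z_pres. The objectives, parameter values and function names are all assumptions chosen so that the true gradient is known.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N = 20000               # number of Monte Carlo samples

# Continuous latent: z ~ Normal(mu, sigma), written as z = mu + sigma * eps.
# Toy objective f(z) = z**2, so d/dmu E[f(z)] = 2 * mu exactly.
mu, sigma = 1.0, 1.0


def reparam_grad_mu() -> float:
    """One-sample pathwise estimate of d/dmu E[z**2]."""
    eps = rng.gauss(0.0, 1.0)
    z = mu + sigma * eps
    return 2.0 * z  # df/dz * dz/dmu, with dz/dmu = 1


# Discrete latent: b ~ Bernoulli(p).
# Toy objective g(1) = 3, g(0) = 1, so d/dp E[g(b)] = g(1) - g(0) = 2 exactly.
p = 0.5


def g(b: int) -> float:
    return 3.0 if b == 1 else 1.0


def score_grad_p() -> float:
    """One-sample score-function estimate of d/dp E[g(b)]."""
    b = 1 if rng.random() < p else 0
    dlogq_dp = 1.0 / p if b == 1 else -1.0 / (1.0 - p)
    return g(b) * dlogq_dp


grad_mu = sum(reparam_grad_mu() for _ in range(N)) / N
grad_p = sum(score_grad_p() for _ in range(N)) / N
print(grad_mu, grad_p)  # both should be close to the true value 2.0
```

The score-function estimate is typically noisier than the pathwise one, which is why discrete variables usually need extra variance reduction in practice.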

[Figure: Inference Network and Model with z_i^pres, z_i^what and z_i^where, repeated from slide 7.]

11 of 22

Demo reel

12 of 22

Omniglot

13 of 22

Representational power

[Figure: the tasks "Sum?" and "Increasing order?" on digit images; visible labels: 6, 9, no, yes.]

14 of 22

Additional structure

[Figure: x and z.]

Learned: z is a distributed vector that correlates with "blue brick".

15 of 22

Additional structure

[Figure: x and z, shown twice.]

Learned: z is a distributed vector that correlates with "blue brick".

Specified: z is class=brick, colour=blue, position=P, rotation=R.
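The contrast between a learned and a specified latent can be sketched as two data representations. The field names follow the slide (class, colour, position, rotation); the concrete types, the example values and the helper `is_blue_brick` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Learned: an unlabelled vector whose meaning is only implicit (values are made up).
z_learned: List[float] = [0.13, -1.2, 0.7]


@dataclass(frozen=True)
class SpecifiedLatent:
    """Specified: every field has a fixed, human-readable meaning."""
    cls: str                              # class=brick
    colour: str                           # colour=blue
    position: Tuple[float, float, float]  # position=P (hypothetical 3D coordinates)
    rotation: float                       # rotation=R (hypothetical angle in degrees)


z_specified = SpecifiedLatent("brick", "blue", (0.0, 0.0, 0.1), 30.0)


def is_blue_brick(z: SpecifiedLatent) -> bool:
    """With specified structure, a query is a direct field lookup."""
    return z.cls == "brick" and z.colour == "blue"


print(is_blue_brick(z_specified))
```

Answering the same question from `z_learned` would require a separately trained read-out, since no single component is defined to mean "blue" or "brick".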

16 of 22

17 of 22

18 of 22

Additional structure

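[Figure: Inference Network (x, hidden states h1, h2, h3, latents z1, z2, z3, decoder, reconstruction y) and Model (x with z1, z2, z3). Label: "specified".]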

19 of 22

Inverse graphics

20 of 22

21 of 22

Policy learning

Table-top

MNIST

22 of 22

Attend, Infer, Repeat

Fast Scene Understanding with Generative Models

http://arxiv.org/abs/1603.08575

https://youtu.be/4tc84kKdpY4