1 of 30

DISENTANGLED REPRESENTATION LEARNING

Gennaro Gala

“In a certain light, all of science is one big unsupervised learning problem in which we search for the most disentangled representation of the world around us”

2 of 30

ABOUT ME

EDUCATION

  • PhD student in Computer Science, 2021-present

TU/e, the Netherlands

  • MSc in Computer Science, 2018-2020

University of Bari Aldo Moro, Italy

  • Bachelor in Computer Science, 2015-2018

University of Bari Aldo Moro, Italy


INTERESTS

  • Disentanglement
  • Probabilistic Circuits (github.com/deeprob-org/deeprob-kit)
  • Generative Models
  • Machine Learning
  • Neural-Symbolic Integration

3 of 30

OUTLINE


Disentangled Representation Learning (DRL)

From AE to VAE

Unsupervised DRL methods

Weakly-Supervised DRL

Class-Content-Style Disentanglement

Image-to-Image translation

4 of 30

REPRESENTATION LEARNING

Representation Learning is learning representations of the data that make it easier to extract useful information when building classifiers or other predictors.

(Y. Bengio et al., 2013)

Manifold Assumption: The data lie approximately on a manifold of much lower dimension than the input space
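
The manifold assumption is easy to illustrate with a toy numpy sketch (an illustration with made-up dimensions, not material from the talk): data driven by two latent factors but embedded in ten dimensions has only two directions of significant variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 points governed by only 2 latent factors...
latents = rng.normal(size=(1000, 2))
# ...linearly embedded in a 10-dimensional input space.
embedding = rng.normal(size=(2, 10))
data = latents @ embedding

# The singular values of the centered data reveal the intrinsic
# dimensionality: only 2 are significantly non-zero.
s = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
intrinsic_dim = int(np.sum(s > s[0] * 1e-6))
print(intrinsic_dim)  # 2
```

Real images are of course not a linear subspace, but the same idea holds: far fewer degrees of freedom than pixels.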

5 of 30

REPRESENTATION LEARNING - ISSUES

[Figure: a neural-network classifier mapping input images to the labels CAT, DOG, FROG, ZEBRA]

Due to their highly entangled nature, the representations learned by neural networks are far from ideal: they are difficult to interpret and to reuse (without fine-tuning).

What makes a good representation?

6 of 30

DISENTANGLED REPRESENTATION LEARNING (DRL)

  • A representation is disentangled when each underlying generative factor of the data is captured by a separate latent dimension

[Figure: independent generative factors — Color, L-R, Pose — each varied in isolation]

7 of 30

HOW CAN WE LEARN REPRESENTATIONS IN AN UNSUPERVISED WAY?


8 of 30

AUTOENCODERS (AEs)

[Figure: an autoencoder — an RGB image is compressed through a low-dimensional information bottleneck and decoded back into an RGB image]
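
The bottleneck idea fits in a few lines of numpy (a sketch with toy data, not code from the talk): a linear encoder compresses the input, a linear decoder reconstructs it, and both are trained by gradient descent on the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8 dimensions that really live in 3.
Z = rng.normal(size=(200, 3))
X = Z @ rng.normal(size=(3, 8))

# Linear autoencoder with a 3-dimensional information bottleneck.
W_enc = 0.1 * rng.normal(size=(8, 3))   # encoder: 8 -> 3
W_dec = 0.1 * rng.normal(size=(3, 8))   # decoder: 3 -> 8

mse0 = float(np.mean((X @ W_enc @ W_dec - X) ** 2))  # error before training

lr = 0.01
for _ in range(3000):
    code = X @ W_enc                    # encode through the bottleneck
    X_hat = code @ W_dec                # decode back to input space
    err = X_hat - X                     # reconstruction residual
    grad_dec = code.T @ err / len(X)    # gradient of the MSE w.r.t. W_dec
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

With a bottleneck matching the intrinsic dimensionality, the reconstruction error goes to (nearly) zero; real autoencoders replace the linear maps with deep networks.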

9 of 30

AEs 2D LATENT SPACE - MNIST


Note x and y axis values!

Disentangled Representations - How to do Interpretable Compression with Neural Models, Yordan Hristov.

10 of 30

VARIATIONAL AUTOENCODERS (VAEs)

[Figure: a variational autoencoder — the encoder maps an RGB image to a distribution over latent codes, a sampled code is decoded back into an RGB image; training minimizes the negative ELBO]

Kingma, D. P. & Welling, M. (2014). Auto-Encoding Variational Bayes
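
The negative ELBO can be written down concretely. This numpy sketch (illustrative shapes and values of my choosing) uses a squared-error reconstruction term and the closed-form KL divergence between a diagonal-Gaussian encoder q(z|x) = N(mu, diag(exp(logvar))) and the standard normal prior.

```python
import numpy as np

def negative_elbo(x, x_hat, mu, logvar):
    """Negative ELBO = reconstruction error + KL(q(z|x) || N(0, I)).

    Uses a squared-error reconstruction term and the closed-form KL
    for a diagonal-Gaussian approximate posterior.
    """
    recon = np.sum((x - x_hat) ** 2, axis=-1)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)
    return np.mean(recon + kl)

# When the posterior equals the prior (mu=0, logvar=0) and the
# reconstruction is perfect, the loss is exactly zero.
x = np.ones((4, 8))
loss = negative_elbo(x, x, np.zeros((4, 2)), np.zeros((4, 2)))
print(loss)  # 0.0
```

The KL term is what pushes the latent codes toward the prior, which is why the VAE latent space on the next slide looks so much tidier than the plain autoencoder's.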

 

11 of 30

VAEs 2D LATENT SPACE - MNIST


Disentangled Representations - How to do Interpretable Compression with Neural Models, Yordan Hristov.

Note x and y axis values!

12 of 30

VAEs 2D LATENT SPACE - MNIST


robz.github.io/mnist-vae

13 of 30

DSPRITES DATASET


 

github.com/deepmind/dsprites-dataset
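
dSprites images are generated from the Cartesian product of six ground-truth factors; per the dataset repository, their sizes are color (1), shape (3), scale (6), orientation (40), position-x (32), and position-y (32). A small sketch (the helper `factors_to_index` is mine, not part of the dataset's API) of how a factor configuration maps to an image index:

```python
import numpy as np

# Factor sizes as documented in the dSprites repository:
# color, shape, scale, orientation, posX, posY.
factor_sizes = np.array([1, 3, 6, 40, 32, 32])

# Row-major strides: how far the flat index moves when a factor increments.
strides = np.concatenate([np.cumprod(factor_sizes[::-1])[::-1][1:], [1]])

def factors_to_index(factors):
    """Map a factor configuration to its index in the image array."""
    return int(np.dot(factors, strides))

total_images = int(np.prod(factor_sizes))
print(total_images)                           # 737280
print(factors_to_index([0, 0, 0, 0, 0, 1]))  # 1 (next posY)
```

Because every combination of factors appears exactly once, dSprites is ideal for probing whether a model recovers the true factors.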

14 of 30

HOW CAN WE CHECK IF A REPRESENTATION IS DISENTANGLED?

  1. Grab an image
  2. Encode it
  3. Traverse each latent dimension independently (keeping the others fixed)
  4. Decode each modified latent code
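
The four-step recipe above can be sketched with stub networks (the `encode`/`decode` functions here are hypothetical stand-ins for a trained VAE, with made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a trained VAE's encoder and decoder.
def encode(image):
    """Image -> latent mean (here: 5 dimensions)."""
    return rng.normal(size=5)

W = rng.normal(size=(5, 64))
def decode(z):
    """Latent code -> image (here: 8x8)."""
    return (z @ W).reshape(8, 8)

def latent_traversal(image, values=np.linspace(-3, 3, 7)):
    z = encode(image)                  # steps 1-2: grab an image, encode
    rows = []
    for dim in range(z.size):          # step 3: traverse each dimension
        row = []
        for v in values:
            z_mod = z.copy()
            z_mod[dim] = v             # vary one latent, keep the rest fixed
            row.append(decode(z_mod))  # step 4: decode
        rows.append(row)
    return np.array(rows)              # shape: (latent_dim, n_values, H, W)

grid = latent_traversal(np.zeros((8, 8)))
print(grid.shape)  # (5, 7, 8, 8)
```

Laying the resulting grid out as an image, with one row (or column) per latent, gives exactly the traversal plots on the following slides.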

15 of 30

LATENT TRAVERSAL IN VAEs


Latent traversal: each column corresponds to the traversal of a single latent variable while keeping the others fixed

[Figure: latent traversal grid]

Latent traversals are meaningless!

You don’t say?

16 of 30

[Figure: the same VAE pipeline — RGB image in, RGB image out, trained by minimizing the negative ELBO]

17 of 30


Latent traversal: each column corresponds to the traversal of a single latent variable while keeping the others fixed

[Figure: latent traversal grid — columns correspond to Y-pos, X-pos, Scale, Rotation, Rotation]

Why does it (partially) work?

18 of 30

Unsupervised DRL Models

  • β-VAE, FactorVAE, β-TCVAE, and related regularized VAEs

github.com/YannDubs/disentangling-vae

Common theme: Regularize more, regularize better!

Factor-VAE trained on CelebA
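
The "regularize more" theme is easiest to see in the β-VAE objective, which simply multiplies the KL term of the ELBO by a factor β > 1 (a sketch with made-up values, not the talk's code):

```python
import numpy as np

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Reconstruction + beta * KL: beta > 1 trades reconstruction
    quality for a more heavily regularized (and, hopefully, more
    disentangled) latent space."""
    recon = np.sum((x - x_hat) ** 2, axis=-1)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)
    return np.mean(recon + beta * kl)

x = np.ones((4, 8))
mu, logvar = np.full((4, 2), 0.5), np.zeros((4, 2))

# Increasing beta increases the penalty on posteriors far from the prior.
l1 = beta_vae_loss(x, x, mu, logvar, beta=1.0)   # plain VAE
l4 = beta_vae_loss(x, x, mu, logvar, beta=4.0)   # beta-VAE
print(l1 < l4)  # True
```

FactorVAE and β-TCVAE refine the same idea, penalizing the total correlation between latent dimensions instead of the whole KL term.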

19 of 30

Impossibility of unsupervised DRL (without biases)

  • [Locatello et al., 2019] proved that unsupervised learning of disentangled representations from i.i.d. observations is theoretically impossible without inductive biases on both the models and the data.


  • No empirical evidence that the considered models can be used to reliably learn disentangled representations
  • Random seeds and hyper-parameters seem to matter more than the model choice

[Figure: DCI Disentanglement scores of six model classes on the dSprites dataset]

20 of 30

OUTLINE


Disentangled Representation Learning (DRL)

From AE to VAE

Unsupervised DRL methods

Weakly-Supervised DRL

Class-Content-Style Disentanglement

Image-to-Image translation

21 of 30

WEAKLY SUPERVISED DRL – From i.i.d. to non-i.i.d.

  • Instead of i.i.d. observations, we get pairs of observations sharing a subset of their generative factors

[Figure: paired observations described by the factors Color, L-R, Pose]

22 of 30

CLASS-CONTENT-STYLE DISENTANGLEMENT


  • CLASS: information shared by all images sharing the same class

  • CONTENT: spatial information that is unchanged if an image is transferred between classes (spatial structure of the image)

  • STYLE: texture information that is unchanged if an image is transferred between classes (rendering of the structure)

- A Style-Based Generator Architecture for GANs, 2018.

- thispersondoesnotexist.com

[Figure: style vs. content in StyleGAN-generated faces]

A dataset-dependent problem!

23 of 30

CYCLE-CONSISTENT VAE

23

 

 

 

 

 

 

 

 

 

 

 

 

 

Weak supervision: labels are not used!

24 of 30

CYCLE-CONSISTENT VAE – AVOIDING SHORTCUT PROBLEM

[Figure: the reverse cycle, in which one latent code is randomly sampled]

25 of 30

CYCLE-CONSISTENT VAE - SWAPPING

[Figure: swapping grids — combining the class code of one image with the style/content code of another]

Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders

26 of 30

IMAGE-TO-IMAGE TRANSLATION

  • From pairs of images to a pair of datasets


Image translation is the task of mapping images between different domains: Given an input image in a source domain (e.g., dogs), we aim to generate an analogous image in a target domain (e.g., cats).

The basic assumption is that images from the different domains share a common content space but differ in style.

27 of 30

IMAGE-TO-IMAGE TRANSLATION - MUNIT


[Figure: MUNIT training — within-domain reconstruction (input Dog/Cat images are encoded into content and style codes and decoded back into reconstructed images) and cross-domain translation (content codes are swapped across domains, style codes are drawn from the prior, and a GAN loss is applied to the translated images)]
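
The translation mechanism can be sketched with hypothetical stub networks (the `encode`/`decode` functions and all dimensions here are stand-ins of my choosing, not MUNIT's actual architecture): each image is decomposed into a domain-invariant content code and a domain-specific style code, and translation recombines the content of one domain with a style drawn for the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for MUNIT's per-domain encoders/decoders.
def encode(image):
    """Split an image into a (content, style) pair of codes."""
    flat = image.ravel()
    return flat[:8].copy(), flat[8:12].copy()   # content (8), style (4)

W = rng.normal(size=(12, 64))
def decode(content, style):
    """Combine a content code and a style code into an image."""
    return (np.concatenate([content, style]) @ W).reshape(8, 8)

dog = rng.normal(size=(8, 8))

# Within-domain reconstruction: encode, then decode the same codes.
c_dog, s_dog = encode(dog)
dog_recon = decode(c_dog, s_dog)

# Cross-domain translation: keep the dog's content,
# draw a cat-domain style code from the prior N(0, I).
s_cat = rng.normal(size=4)
dog_as_cat = decode(c_dog, s_cat)
print(dog_as_cat.shape)  # (8, 8)
```

In the real model a GAN loss makes the translated image look like a genuine cat, and latent reconstruction losses tie the codes back to the encoders.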

28 of 30

IMAGE-TO-IMAGE TRANSLATION - APPLICATIONS


 

Multimodal Unsupervised Image-to-Image Translation, X. Huang et al.

29 of 30

CONCLUSIONS

  • Disentanglement improves interpretability
  • Disentangled representations are useful for many downstream tasks
  • Learning such representations in a fully unsupervised fashion can be troublesome in the real world
  • Weak supervision comes with theoretical guarantees and is available in several practical scenarios
  • Disentangling a few high-level concepts is easier than disentangling all factors of variation


“In a certain light, all of science is one big unsupervised learning problem in which we search for the most disentangled representation of the world around us”

30 of 30


DISENTANGLED REPRESENTATION LEARNING

Gennaro Gala

THANK YOU FOR LISTENING!

ANY QUESTIONS?