1 of 30

DISENTANGLED REPRESENTATION LEARNING

Gennaro Gala

“In a certain light, all of science is one big unsupervised learning problem in which we search for the most disentangled representation of the world around us”

2 of 30

ABOUT ME

EDUCATION

  • PhD student in Computer Science, 2021-present

TU/e, the Netherlands

  • MSc in Computer Science, 2018-2020

University of Bari Aldo Moro, Italy

  • Bachelor in Computer Science, 2015-2018

University of Bari Aldo Moro, Italy


INTERESTS

  • Disentanglement
  • Probabilistic Circuits (github.com/deeprob-org/deeprob-kit)
  • Generative Models
  • Machine Learning
  • Neural-Symbolic Integration

3 of 30

OUTLINE


Disentangled Representation Learning (DRL)

From AE to VAE

Unsupervised DRL methods

Weakly-Supervised DRL

Class-Content-Style Disentanglement

Image-to-Image translation

4 of 30

REPRESENTATION LEARNING

Representation Learning is learning representations of the data that make it easier to extract useful information when building classifiers or other predictors.

(Y. Bengio et al., 2013)

Manifold Assumption: The data lie approximately on a manifold of much lower dimension than the input space
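
The manifold assumption is easy to illustrate with a toy numpy sketch (an illustration with made-up dimensions, not material from the talk): data driven by two latent factors but embedded in ten dimensions has only two directions of significant variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 points governed by only 2 latent factors...
latents = rng.normal(size=(1000, 2))
# ...linearly embedded in a 10-dimensional input space.
embedding = rng.normal(size=(2, 10))
data = latents @ embedding

# The singular values of the centered data reveal the intrinsic
# dimensionality: only 2 are significantly non-zero.
s = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
intrinsic_dim = int(np.sum(s > s[0] * 1e-6))
print(intrinsic_dim)  # 2
```

Real images are of course not a linear subspace, but the same idea holds: far fewer degrees of freedom than pixels.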

5 of 30

REPRESENTATION LEARNING - ISSUES

[Figure: a neural-network classifier mapping input images to the labels CAT, DOG, FROG, ZEBRA]

Due to their highly entangled nature, the representations learned by neural networks are far from ideal: they are difficult to interpret and to reuse (without fine-tuning).

What makes a good representation?

6 of 30

DISENTANGLED REPRESENTATION LEARNING (DRL)

  • A representation is disentangled when each underlying generative factor of the data is captured by a separate latent dimension

[Figure: independent generative factors — Color, L-R, Pose — each varied in isolation]

7 of 30

HOW CAN WE LEARN REPRESENTATIONS IN AN UNSUPERVISED WAY?


8 of 30

AUTOENCODERS (AEs)

[Figure: an autoencoder — an RGB image is compressed through a low-dimensional information bottleneck and decoded back into an RGB image]
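
The bottleneck idea fits in a few lines of numpy (a sketch with toy data, not code from the talk): a linear encoder compresses the input, a linear decoder reconstructs it, and both are trained by gradient descent on the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8 dimensions that really live in 3.
Z = rng.normal(size=(200, 3))
X = Z @ rng.normal(size=(3, 8))

# Linear autoencoder with a 3-dimensional information bottleneck.
W_enc = 0.1 * rng.normal(size=(8, 3))   # encoder: 8 -> 3
W_dec = 0.1 * rng.normal(size=(3, 8))   # decoder: 3 -> 8

mse0 = float(np.mean((X @ W_enc @ W_dec - X) ** 2))  # error before training

lr = 0.01
for _ in range(3000):
    code = X @ W_enc                    # encode through the bottleneck
    X_hat = code @ W_dec                # decode back to input space
    err = X_hat - X                     # reconstruction residual
    grad_dec = code.T @ err / len(X)    # gradient of the MSE w.r.t. W_dec
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

With a bottleneck matching the intrinsic dimensionality, the reconstruction error goes to (nearly) zero; real autoencoders replace the linear maps with deep networks.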

9 of 30

AEs 2D LATENT SPACE - MNIST


Note x and y axis values!

Disentangled Representations - How to do Interpretable Compression with Neural Models, Yordan Hristov.

10 of 30

VARIATIONAL AUTOENCODERS (VAEs)

[Figure: a variational autoencoder — the encoder maps an RGB image to a distribution over latent codes, a sampled code is decoded back into an RGB image; training minimizes the negative ELBO]

Kingma, D. P. & Welling, M. (2014). Auto-Encoding Variational Bayes
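
The negative ELBO can be written down concretely. This numpy sketch (illustrative shapes and values of my choosing) uses a squared-error reconstruction term and the closed-form KL divergence between a diagonal-Gaussian encoder q(z|x) = N(mu, diag(exp(logvar))) and the standard normal prior.

```python
import numpy as np

def negative_elbo(x, x_hat, mu, logvar):
    """Negative ELBO = reconstruction error + KL(q(z|x) || N(0, I)).

    Uses a squared-error reconstruction term and the closed-form KL
    for a diagonal-Gaussian approximate posterior.
    """
    recon = np.sum((x - x_hat) ** 2, axis=-1)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)
    return np.mean(recon + kl)

# When the posterior equals the prior (mu=0, logvar=0) and the
# reconstruction is perfect, the loss is exactly zero.
x = np.ones((4, 8))
loss = negative_elbo(x, x, np.zeros((4, 2)), np.zeros((4, 2)))
print(loss)  # 0.0
```

The KL term is what pushes the latent codes toward the prior, which is why the VAE latent space on the next slide looks so much tidier than the plain autoencoder's.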

 

11 of 30

VAEs 2D LATENT SPACE - MNIST


Disentangled Representations - How to do Interpretable Compression with Neural Models, Yordan Hristov.

Note x and y axis values!

12 of 30

VAEs 2D LATENT SPACE - MNIST


robz.github.io/mnist-vae

13 of 30

DSPRITES DATASET


 

github.com/deepmind/dsprites-dataset
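
dSprites images are generated from the Cartesian product of six ground-truth factors; per the dataset repository, their sizes are color (1), shape (3), scale (6), orientation (40), position-x (32), and position-y (32). A small sketch (the helper `factors_to_index` is mine, not part of the dataset's API) of how a factor configuration maps to an image index:

```python
import numpy as np

# Factor sizes as documented in the dSprites repository:
# color, shape, scale, orientation, posX, posY.
factor_sizes = np.array([1, 3, 6, 40, 32, 32])

# Row-major strides: how far the flat index moves when a factor increments.
strides = np.concatenate([np.cumprod(factor_sizes[::-1])[::-1][1:], [1]])

def factors_to_index(factors):
    """Map a factor configuration to its index in the image array."""
    return int(np.dot(factors, strides))

total_images = int(np.prod(factor_sizes))
print(total_images)                           # 737280
print(factors_to_index([0, 0, 0, 0, 0, 1]))  # 1 (next posY)
```

Because every combination of factors appears exactly once, dSprites is ideal for probing whether a model recovers the true factors.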

14 of 30

HOW CAN WE CHECK IF A REPRESENTATION IS DISENTANGLED?

  1. Grab an image
  2. Encode it
  3. Traverse each latent dimension independently (keeping the others fixed)
  4. Decode each modified latent code
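
The four-step recipe above can be sketched with stub networks (the `encode`/`decode` functions here are hypothetical stand-ins for a trained VAE, with made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a trained VAE's encoder and decoder.
def encode(image):
    """Image -> latent mean (here: 5 dimensions)."""
    return rng.normal(size=5)

W = rng.normal(size=(5, 64))
def decode(z):
    """Latent code -> image (here: 8x8)."""
    return (z @ W).reshape(8, 8)

def latent_traversal(image, values=np.linspace(-3, 3, 7)):
    z = encode(image)                  # steps 1-2: grab an image, encode
    rows = []
    for dim in range(z.size):          # step 3: traverse each dimension
        row = []
        for v in values:
            z_mod = z.copy()
            z_mod[dim] = v             # vary one latent, keep the rest fixed
            row.append(decode(z_mod))  # step 4: decode
        rows.append(row)
    return np.array(rows)              # shape: (latent_dim, n_values, H, W)

grid = latent_traversal(np.zeros((8, 8)))
print(grid.shape)  # (5, 7, 8, 8)
```

Laying the resulting grid out as an image, with one row (or column) per latent, gives exactly the traversal plots on the following slides.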

15 of 30

LATENT TRAVERSAL IN VAEs


Latent traversal: each column corresponds to the traversal of a single latent variable while keeping the others fixed

[Figure: latent traversal grid]

Latent traversals are meaningless!

You don’t say?

16 of 30

[Figure: the same VAE pipeline — RGB image in, RGB image out, trained by minimizing the negative ELBO]

17 of 30


Latent traversal: each column corresponds to the traversal of a single latent variable while keeping the others fixed

[Figure: latent traversal grid — columns correspond to Y-pos, X-pos, Scale, Rotation, Rotation]

Why does it (partially) work?

18 of 30

Unsupervised DRL Models

  • β-VAE, FactorVAE, β-TCVAE, and related regularized VAEs

github.com/YannDubs/disentangling-vae

Common theme: Regularize more, regularize better!

Factor-VAE trained on CelebA
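
The "regularize more" theme is easiest to see in the β-VAE objective, which simply multiplies the KL term of the ELBO by a factor β > 1 (a sketch with made-up values, not the talk's code):

```python
import numpy as np

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Reconstruction + beta * KL: beta > 1 trades reconstruction
    quality for a more heavily regularized (and, hopefully, more
    disentangled) latent space."""
    recon = np.sum((x - x_hat) ** 2, axis=-1)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)
    return np.mean(recon + beta * kl)

x = np.ones((4, 8))
mu, logvar = np.full((4, 2), 0.5), np.zeros((4, 2))

# Increasing beta increases the penalty on posteriors far from the prior.
l1 = beta_vae_loss(x, x, mu, logvar, beta=1.0)   # plain VAE
l4 = beta_vae_loss(x, x, mu, logvar, beta=4.0)   # beta-VAE
print(l1 < l4)  # True
```

FactorVAE and β-TCVAE refine the same idea, penalizing the total correlation between latent dimensions instead of the whole KL term.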

19 of 30

Impossibility of unsupervised DRL (without biases)

  • [Locatello et al., 2019] proved that unsupervised learning of disentangled representations from i.i.d. observations is theoretically impossible without inductive biases on both the models and the data.


  • No empirical evidence that the considered models can be used to reliably learn disentangled representations
  • Random seeds and hyper-parameters seem to matter more than the model choice

[Figure: DCI Disentanglement scores of six model classes on the dSprites dataset]

20 of 30

OUTLINE


Disentangled Representation Learning (DRL)

From AE to VAE

Unsupervised DRL methods

Weakly-Supervised DRL

Class-Content-Style Disentanglement

Image-to-Image translation

21 of 30

WEAKLY SUPERVISED DRL – From i.i.d. to non-i.i.d.

  • Instead of i.i.d. observations, we get pairs of observations sharing a subset of their generative factors

[Figure: paired observations described by the factors Color, L-R, Pose]

22 of 30

CLASS-CONTENT-STYLE DISENTANGLEMENT


  • CLASS: information shared by all images sharing the same class

  • CONTENT: spatial information that is unchanged if an image is transferred between classes (spatial structure of the image)

  • STYLE: texture information that is unchanged if an image is transferred between classes (rendering of the structure)

- A Style-Based Generator Architecture for GANs, 2018.

- thispersondoesnotexist.com

[Figure: style vs. content in StyleGAN-generated faces]

A dataset-dependent problem!

23 of 30

CYCLE-CONSISTENT VAE

23

 

 

 

 

 

 

 

 

 

 

 

 

 

Weak supervision: labels are not used!

24 of 30

CYCLE-CONSISTENT VAE – AVOIDING SHORTCUT PROBLEM

[Figure: the reverse cycle, in which one latent code is randomly sampled]

25 of 30

CYCLE-CONSISTENT VAE - SWAPPING

[Figure: swapping grids — combining the class code of one image with the style/content code of another]

Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders

26 of 30

IMAGE-TO-IMAGE TRANSLATION

  • From pairs of images to a pair of datasets


Image translation is the task of mapping images between different domains: Given an input image in a source domain (e.g., dogs), we aim to generate an analogous image in a target domain (e.g., cats).

The basic assumption is that images from the different domains share a common content space but differ in style.

27 of 30

IMAGE-TO-IMAGE TRANSLATION - MUNIT


[Figure: MUNIT training — within-domain reconstruction (input Dog/Cat images are encoded into content and style codes and decoded back into reconstructed images) and cross-domain translation (content codes are swapped across domains, style codes are drawn from the prior, and a GAN loss is applied to the translated images)]
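
The translation mechanism can be sketched with hypothetical stub networks (the `encode`/`decode` functions and all dimensions here are stand-ins of my choosing, not MUNIT's actual architecture): each image is decomposed into a domain-invariant content code and a domain-specific style code, and translation recombines the content of one domain with a style drawn for the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for MUNIT's per-domain encoders/decoders.
def encode(image):
    """Split an image into a (content, style) pair of codes."""
    flat = image.ravel()
    return flat[:8].copy(), flat[8:12].copy()   # content (8), style (4)

W = rng.normal(size=(12, 64))
def decode(content, style):
    """Combine a content code and a style code into an image."""
    return (np.concatenate([content, style]) @ W).reshape(8, 8)

dog = rng.normal(size=(8, 8))

# Within-domain reconstruction: encode, then decode the same codes.
c_dog, s_dog = encode(dog)
dog_recon = decode(c_dog, s_dog)

# Cross-domain translation: keep the dog's content,
# draw a cat-domain style code from the prior N(0, I).
s_cat = rng.normal(size=4)
dog_as_cat = decode(c_dog, s_cat)
print(dog_as_cat.shape)  # (8, 8)
```

In the real model a GAN loss makes the translated image look like a genuine cat, and latent reconstruction losses tie the codes back to the encoders.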

28 of 30

IMAGE-TO-IMAGE TRANSLATION - APPLICATIONS


 

Multimodal Unsupervised Image-to-Image Translation, X. Huang et al.

29 of 30

CONCLUSIONS

  • Disentanglement improves interpretability
  • Disentangled representations are useful for many downstream tasks
  • Learning such representations in a fully unsupervised fashion can be troublesome in the real world
  • Weak supervision comes with theoretical guarantees and is available in several practical scenarios
  • Disentangling a few high-level concepts is easier than disentangling all factors of variation


“In a certain light, all of science is one big unsupervised learning problem in which we search for the most disentangled representation of the world around us”

30 of 30


DISENTANGLED REPRESENTATION LEARNING

Gennaro Gala

THANK YOU FOR LISTENING!

ANY QUESTIONS?