1 of 51

Draw as you can tell: Controlled image synthesis and edit using TL-GAN

Shaobo Guan

Fellow of artificial intelligence, Insight Data Science

Female, smile

Male, smile

Male, non-smile

2 of 51

Describe vs. Draw

classification vs. generation

Images

classification

Female

Smile

Male

Smile

Labels

Male

Non-smile

3 of 51

Describe vs. Draw

classification vs. generation

Images

generation

classification

Female

Smile

Male

Smile

Labels

Male

Non-smile

4 of 51

Describe vs. Draw

classification vs. generation

Images

generation

classification

Female

Smile

Male

Smile

Labels

Male

Non-smile

Smart synthesis and edit of images

5 of 51

In the ideal case, this is what we want.

6 of 51

GAN: generative adversarial network

State-of-the-art model that produces photo-realistic images from random noise

Synthetic

image

Synthetic images of pgGAN, Karras et al., Nvidia, 2018

7 of 51

GAN: generative adversarial network

Replace noise with interpretable labels

Make it transparent, then control the content synthesis

Synthetic

image

Synthetic images of pgGAN, Karras et al., Nvidia, 2018

Interpretable labels

8 of 51

TL-GAN: Transparent Latent space GAN

Generator network of GAN: from noise to image

9 of 51

TL-GAN: Transparent Latent space GAN

Feature extractor network: from image to label

Labeled real

images

10 of 51

TL-GAN: Transparent Latent space GAN

Make noise transparent by coupling generator with feature extractor

gender

age

expression

hair

Transparent

latent space

11 of 51

TL-GAN: Transparent Latent space GAN

Use the transparent latent space to control image synthesis/edit

gender

age

expression

hair

Transparent

latent space

Controlled image synthesis/edit

12 of 51

TL-GAN: controlled generation

Male

Female

Axis: gender

13 of 51

TL-GAN: controlled generation

Smile

Non-smile

Axis: smile

14 of 51

TL-GAN: real-time interactive control

15 of 51

Make him Young

16 of 51

Make him Young

17 of 51

Make him Young

18 of 51

Make him Young

19 of 51

More Beard

20 of 51

More Beard

21 of 51

Recede his Hairline

22 of 51

Recede his Hairline, again

23 of 51

Smile

24 of 51

Smile

25 of 51

Smile

26 of 51

Make him HER

27 of 51

Make him HER

28 of 51

Make him HER

29 of 51

Make him HER

30 of 51

More bangs

31 of 51

Wavy hair

32 of 51

Wavy hair

33 of 51

TL-GAN: Draw as you can tell

As long as you can classify, you can control the generation

gender

age

expression

hair

Transparent

latent space

34 of 51

Under the hood: efficiency (1 hour)

Simple and efficient workflow, no need to retrain GAN

Well-trained GAN (2 weeks of Nvidia's time)

Transfer learning with MobileNet on low-resolution labeled images (1 h)

Pre-generate 20,000 images (8 h)

Regression (5 min)

35 of 51

TL-GAN: Conclusion and future directions

GitHub repo and interactive online demo, have fun!

https://github.com/SummitKwan/transparent_latent_gan

Medium blog post (Generating custom photo-realistic faces using AI)

Medium blog post received 58k+ views; GitHub repo received 600+ stars and 50+ forks

  • Edit your own photos: encoding network
  • Beyond faces: fashion design, augmenting rare business data

Novel method, state-of-the-art results in controlled image synthesis/edit

36 of 51

Shaobo Guan

PhD in Neuroscience, MSc in computer science

37 of 51

Shaobo Guan

PhD in Neuroscience, MSc in computer science

38 of 51

The following slides are back-ups

39 of 51

Online Interactive Demo: link

40 of 51

Under the hood: disentangle features

Use linear algebra to decorrelate feature vectors

beard

no beard

41 of 51

Under the hood: disentangle features

Use linear algebra to decorrelate feature vectors

male

female

beard

no beard

42 of 51

Under the hood: disentangle features

Use linear algebra to decorrelate feature vectors

male

female

beard

no beard

no beard

beard

43 of 51

Before disentanglement

44 of 51

After disentanglement

(make all other features orthogonal to gender and age)
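Making one feature axis orthogonal to others can be sketched as a projection step. This is a minimal NumPy sketch under stated assumptions, not the repository's code: the helper name `orthogonalize_against` and the toy 4-dimensional axes are hypothetical, chosen only for illustration.

```python
import numpy as np


def orthogonalize_against(axis, references):
    """Remove from `axis` its components along each reference direction.

    Hypothetical helper: the references are orthonormalized with a QR
    decomposition, then `axis` is projected onto their orthogonal
    complement and rescaled to unit length.
    """
    ref = np.asarray(references, dtype=float).T   # shape (dim, n_refs)
    q, _ = np.linalg.qr(ref)                      # orthonormal basis of the references
    axis = np.asarray(axis, dtype=float)
    residual = axis - q @ (q.T @ axis)            # subtract the projection
    norm = np.linalg.norm(residual)
    if norm < 1e-12:
        raise ValueError("axis lies in the span of the reference axes")
    return residual / norm


# Toy example: a "beard" axis that is correlated with the "gender" axis.
gender = np.array([1.0, 0.0, 0.0, 0.0])
age = np.array([0.0, 1.0, 0.0, 0.0])
beard = np.array([0.6, 0.1, 0.7, 0.0])

beard_clean = orthogonalize_against(beard, [gender, age])
```

After this step, moving a latent vector along `beard_clean` leaves its projections onto the gender and age axes unchanged, which is the intent of the disentangling shown on the slide.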

45 of 51

To bridge latent vector with features

Notation: z: latent vector; x: image; y: feature label; G: GAN generator; F: feature extractor

Labeled dataset: paired data (x_real, y_real)

Link we want to build: between z and y

Potential approach 1: compute the latent vector for images in the labeled dataset

x_real -> G^-1 -> z_encode, paired with y_real

Potential approach 2: label the features of synthetic images manually

z -> G -> x_gen -> (label the synthetic images) -> y

Approach of TL-GAN: use a separately trained feature extractor network F to produce feature labels

z -> G -> x_gen -> F -> y_predict

F is trained on labeled data (x_real, y_real)
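The TL-GAN approach on this slide, pairing each latent vector with labels predicted on its generated image, can be sketched as follows. This is only an illustration: `toy_generator` and `toy_feature_extractor` are hypothetical stand-ins (fixed random linear maps) for the pretrained generator G and the feature extractor F, and the dimensions are kept tiny on purpose.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8      # kept small for the sketch
IMAGE_PIXELS = 16   # stand-in for a full-resolution image
N_FEATURES = 3      # e.g. gender, age, smile

# Hypothetical stand-ins: fixed random linear maps, NOT real networks.
_G_WEIGHTS = rng.normal(size=(LATENT_DIM, IMAGE_PIXELS))
_F_WEIGHTS = rng.normal(size=(IMAGE_PIXELS, N_FEATURES))


def toy_generator(z):
    """Placeholder for G: latent vectors -> 'images'."""
    return np.tanh(z @ _G_WEIGHTS)


def toy_feature_extractor(x):
    """Placeholder for F: 'images' -> feature scores."""
    return x @ _F_WEIGHTS


def build_paired_data(n_samples):
    """Sample z, synthesize x_gen = G(z), and label it with y_predict = F(x_gen)."""
    z = rng.normal(size=(n_samples, LATENT_DIM))
    x_gen = toy_generator(z)
    y_predict = toy_feature_extractor(x_gen)
    return z, y_predict


z, y_predict = build_paired_data(100)
```

The resulting `(z, y_predict)` pairs are the link between latent vectors and feature labels that neither potential approach 1 nor 2 provides cheaply.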

46 of 51

Notation: z: latent vector; x: image; y: feature label; G: GAN generator; F: feature extractor

Labeled dataset: paired data (x_real, y_real)

Link we want to build: between z and y

Potential approach 1: compute the latent vector for images in the labeled dataset

x_real -> G^-1 -> z_encode, paired with y_real

Potential approach 2: label the features of synthetic images manually

z -> G -> x_gen -> (label the synthetic images) -> y

Approach of TL-GAN: use a separately trained feature extractor network F to produce feature labels

z -> G -> x_gen -> F -> y_predict

F is trained on labeled data (x_real, y_real)

Example y_predict: male, not young, non-smile, ...

47 of 51

TL-GAN: Transparent Latent space GAN

Make noise transparent by coupling generator with feature extractor

gender

age

expression

hair

GAN generator network

CNN model

GLM regression

Random latent vector

xgen

z

ypredict

Uncovered feature axes

in the latent space

TL-GAN

model architecture
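The regression step in this architecture can be sketched on synthetic data. Assumptions are stated loudly here: ordinary least squares is swapped in for the "GLM regression" named on the slide, the paired data are simulated rather than produced by a real generator and CNN, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, latent_dim, n_features = 2000, 16, 3

# Synthetic stand-in for the paired data (z, y_predict).
true_axes = rng.normal(size=(latent_dim, n_features))
z = rng.normal(size=(n_samples, latent_dim))
y_predict = z @ true_axes + 0.01 * rng.normal(size=(n_samples, n_features))

# Ordinary least squares with an intercept column: y ~= [z, 1] @ coef
design = np.hstack([z, np.ones((n_samples, 1))])
coef, *_ = np.linalg.lstsq(design, y_predict, rcond=None)

# Drop the intercept row; each column is the latent direction for one feature.
feature_axes = coef[:-1]
feature_axes = feature_axes / np.linalg.norm(feature_axes, axis=0, keepdims=True)
```

Each column of `feature_axes` is a unit vector in latent space, which corresponds to the "uncovered feature axes" box in the diagram.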

48 of 51

TL-GAN motivation

gender

age

expression

hair

Random vector in latent space

Feature axes in latent space

512 real numbers

1024×1024 px

image

Male

Female

move latent vector

along gender axis

Make latent space transparent
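Moving a latent vector along a feature axis amounts to adding a scaled unit vector. A minimal sketch, assuming a 512-dimensional latent space as on this slide; the function name `move_along_axis` is hypothetical, and the axis here is random, whereas a real one would come from the regression step.

```python
import numpy as np


def move_along_axis(z, axis, step):
    """Return a copy of latent vector z shifted by `step` along a unit feature axis."""
    axis = np.asarray(axis, dtype=float)
    unit = axis / np.linalg.norm(axis)
    return np.asarray(z, dtype=float) + step * unit


rng = np.random.default_rng(0)
z = rng.normal(size=512)             # 512 real numbers, as on the slide
gender_axis = rng.normal(size=512)   # hypothetical axis, random for illustration only

z_shifted = move_along_axis(z, gender_axis, step=2.0)
```

Feeding `z_shifted` instead of `z` to the generator would then produce the edited image; the sign and size of `step` control the direction and strength of the change.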

49 of 51

random

noise

vector

custom features:

male,

smile

glasses

...

Random generation of high quality images

Controlled image generation according to custom features

50 of 51

Discriminative vs generative models

Images

generative

discriminative

Female

Smile

...

Male

Smile

...

Feature labels

Male

Non-smile

...

51 of 51

GAN: generative adversarial network

State-of-the-art model to generate photo-realistic images

figure from Rohith Gandhi, Medium