1 of 51

Draw as you can tell: Controlled image synthesis and edit using TL-GAN

Shaobo Guan

Fellow of artificial intelligence, Insight Data Science

Female, smile

Male, smile

Male, non-smile

4 of 51

Describe vs. Draw

classification vs. generation

Images to labels: classification

Labels to images: generation

Labels: Female, smile / Male, smile / Male, non-smile

Smart synthesis and edit of images

5 of 51

In the ideal case, this is what we want.

6 of 51

GAN: generative adversarial network

State-of-the-art model that produces photo-realistic images from random noise

Synthetic image

Synthetic images of pgGAN, Karras et al., Nvidia, 2018

7 of 51

GAN: generative adversarial network

Replace noise with interpretable labels

Make it transparent, then control the content synthesis

Synthetic image

Synthetic images of pgGAN, Karras et al., Nvidia, 2018

Interpretable labels

8 of 51

TL-GAN: Transparent Latent space GAN

Generator network of GAN: from noise to image

9 of 51

TL-GAN: Transparent Latent space GAN

Feature extractor network: from image to label

Labeled real images

10 of 51

TL-GAN: Transparent Latent space GAN

Make noise transparent by coupling generator with feature extractor

Transparent latent space: gender, age, expression, hair

11 of 51

TL-GAN: Transparent Latent space GAN

Use the transparent latent space to control image synthesis/edit

Transparent latent space: gender, age, expression, hair

Controlled image synthesis/edit

12 of 51

TL-GAN: controlled generation

Male

Female

Axis: gender

13 of 51

TL-GAN: controlled generation

Wavy hair

33 of 51

TL-GAN: Draw as you can tell

As long as you can classify, you can control the generation

Transparent latent space: gender, age, expression, hair

34 of 51

Under the hood: efficiency (1 hour)

Simple and efficient workflow, no need to retrain GAN

Well-trained GAN (2 weeks of Nvidia's time)

Transfer learning with MobileNet on low-resolution labeled images (1 h)

Pre-generate 20,000 images (8 h)

Regression (5 min)
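
The regression step can be illustrated with a minimal sketch. It assumes the pre-generated latent vectors and the feature extractor's predictions are already available as arrays; here both are simulated with random data, and ordinary least squares stands in for whatever regression model the project actually uses. All names and sizes (`z`, `y_predict`, `true_axes`, `feature_axes`, 2,000 samples) are illustrative assumptions, not taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumption): 2,000 samples, 512-dim latent, 4 features.
n_images, latent_dim, n_features = 2000, 512, 4

# Stand-in for the latent vectors used to pre-generate images.
z = rng.standard_normal((n_images, latent_dim))

# Stand-in for the feature extractor's predictions on those images.
# Simulated here from hidden "true" directions plus noise, so that
# the recovery can be checked; real predictions would come from a CNN.
true_axes = rng.standard_normal((latent_dim, n_features))
y_predict = z @ true_axes + 0.1 * rng.standard_normal((n_images, n_features))

# Least-squares fit y_predict ~ z @ coef.
# Each column of coef is one feature direction in latent space.
coef, *_ = np.linalg.lstsq(z, y_predict, rcond=None)

# Normalize each direction to unit length so step sizes are comparable.
feature_axes = coef / np.linalg.norm(coef, axis=0, keepdims=True)

# Cosine similarity between recovered and simulated directions.
true_unit = true_axes / np.linalg.norm(true_axes, axis=0, keepdims=True)
cosine = np.sum(feature_axes * true_unit, axis=0)
```

Because the fit is a single linear solve over data that already exists, it is cheap compared with training the GAN or generating the images, which is consistent with the timings listed on this slide.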

35 of 51

TL-GAN: Conclusion and future directions

GitHub repo and interactive online demo, have fun!

https://github.com/SummitKwan/transparent_latent_gan

Medium blog post (Generating custom photo-realistic faces using AI)

Medium blog post received 58k+ views; GitHub repo received 600+ stars and 50+ forks

Edit your own photos: encoding network
Beyond faces: fashion design, augmenting rare business data

Novel method, state-of-the-art results in controlled image synthesis/edit

36 of 51

Shaobo Guan

PhD in Neuroscience, MSc in computer science

38 of 51

no beard

beard

43 of 51

Before disentanglement

44 of 51

After disentanglement

(make all other features orthogonal to gender and age)
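
One way to make a feature axis orthogonal to the gender and age axes is to project out its components along them. The sketch below is an assumption about how that could be done, not the project's confirmed implementation; the function name `orthogonalize` and the random example axes are hypothetical.

```python
import numpy as np

def orthogonalize(axis, against):
    """Return `axis` with its components along `against` removed, unit length."""
    # Orthonormal basis for the span of the directions to project out.
    basis, _ = np.linalg.qr(np.stack(against, axis=1))
    # Subtract the projection onto that span.
    residual = axis - basis @ (basis.T @ axis)
    return residual / np.linalg.norm(residual)

rng = np.random.default_rng(0)
latent_dim = 512

# Hypothetical axes; real ones would come from the regression step.
gender = rng.standard_normal(latent_dim)
age = rng.standard_normal(latent_dim)
# A "beard" axis built to be correlated with gender, for illustration.
beard = rng.standard_normal(latent_dim) + 0.8 * gender

beard_clean = orthogonalize(beard, [gender, age])

# Cosine with gender before and dot product after the projection.
cos_before = beard @ gender / (np.linalg.norm(beard) * np.linalg.norm(gender))
dot_after = beard_clean @ gender
```

Moving a latent vector along `beard_clean` then leaves its projection on the gender and age axes unchanged, which is the effect the "after" slide describes.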

45 of 51

To bridge latent vector with features

G: GAN generator

z: latent vector

x: image

y: feature label

Labeled dataset: paired data (x_real, y_real)

GAN generator: z → G → x_gen

Link we want to build: between z and y

Potential approach 1: Computing the latent vector for images in the labeled dataset

x_real → G^-1 → z_encode, giving paired data with y_real

Potential approach 2: Label the features of synthetic images manually

z → G → x_gen → label the synthetic images → y

Approach of TL-GAN: Use a separately trained feature extractor network F to produce feature labels

z → G → x_gen → F → y_predict

F is trained on labelled data (x_real, y_real)
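
The pairing of z with y_predict can be sketched end to end with toy stand-ins. In the code below, `G` and `F` are hypothetical placeholder functions with fixed random weights, used only to show the data flow; in the slides, G is a pre-trained GAN generator and F is a network trained on labelled real images. The array sizes are also assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, image_pixels, n_labels = 16, 64, 3  # toy sizes (assumption)

# Hypothetical stand-in for the GAN generator G: z -> x_gen.
W_g = rng.standard_normal((latent_dim, image_pixels))

def G(z):
    return np.tanh(z @ W_g)

# Hypothetical stand-in for the feature extractor F: x -> y_predict.
# A real F would be trained on (x_real, y_real); this one is fixed.
W_f = rng.standard_normal((image_pixels, n_labels))

def F(x):
    return 1.0 / (1.0 + np.exp(-(x @ W_f)))

# Sample latent vectors, synthesize images, then label them with F.
z = rng.standard_normal((1000, latent_dim))
x_gen = G(z)
y_predict = F(x_gen)

# (z, y_predict) is the paired data that links latent space to features.
paired = list(zip(z, y_predict))
```

The point of the design is that neither G nor F has to be inverted or retrained: the pairs are produced by running both networks forward.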

46 of 51

Example feature labels: male, not young, non-smile, ...

47 of 51

TL-GAN: Transparent Latent space GAN

Make noise transparent by coupling generator with feature extractor

TL-GAN model architecture

Random latent vector z → GAN generator network → x_gen → CNN model → y_predict

GLM regression between z and y_predict

Uncovered feature axes in the latent space: gender, age, expression, hair

48 of 51

TL-GAN motivation

Random vector in latent space (512 real numbers) → image (1024×1024 px)

Feature axes in latent space: gender, age, expression, hair

Move latent vector along gender axis: Male to Female

Make latent space transparent
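
Moving a latent vector along a feature axis amounts to adding a scaled unit direction. The sketch below assumes a 512-dimensional latent vector as stated on this slide; the helper `move_along_axis`, the random `gender_axis`, and the step range are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def move_along_axis(z, axis, step):
    """Shift latent vector z by `step` units along the direction `axis`."""
    unit = axis / np.linalg.norm(axis)
    return z + step * unit

rng = np.random.default_rng(0)
z = rng.standard_normal(512)            # random latent vector
gender_axis = rng.standard_normal(512)  # hypothetical; would come from regression

# Seven evenly spaced steps from -3 to +3 along the axis.
steps = np.linspace(-3.0, 3.0, 7)
z_path = np.stack([move_along_axis(z, gender_axis, s) for s in steps])

# Projection of each shifted vector onto the unit gender direction.
unit = gender_axis / np.linalg.norm(gender_axis)
projections = z_path @ unit
```

Feeding each row of `z_path` to the generator would give a sequence of images that differ mainly along the chosen feature, assuming the axis captures that feature well.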

49 of 51

Random noise vector: random generation of high quality images

Custom features (male, smile, glasses, ...): controlled image generation according to custom features

50 of 51

Discriminative vs generative models

Images to feature labels: discriminative

Feature labels to images: generative

Feature labels: Female, smile, ... / Male, smile, ... / Male, non-smile, ...

51 of 51

GAN: generative adversarial network

State-of-the-art model to generate photo-realistic images

figure from Rohith Gandhi, Medium