Draw as you can tell: Controlled image synthesis and edit using TL-GAN
Shaobo Guan
Fellow of artificial intelligence, Insight Data Science
Female, smile
Male, smile
Male, non-smile
Describe vs. Draw
classification vs. generation
Images
classification
Female
Smile
Male
Smile
Labels
Male
Non-smile
Describe vs. Draw
classification vs. generation
Images
generation
classification
Female
Smile
Male
Smile
Labels
Male
Non-smile
Smart synthesis and edit of images
In the ideal case, this is what we want.
GAN: generative adversarial network
State-of-the-art model that produces photo-realistic images from random noise
Synthetic image
Synthetic images of pgGAN, Karras et al., Nvidia, 2018
GAN: generative adversarial network
Replace noise with interpretable labels
Make it transparent, then control the content synthesis
Synthetic image
Synthetic images of pgGAN, Karras et al., Nvidia, 2018
Interpretable labels
✗
TL-GAN: Transparent Latent space GAN
Generator network of GAN: from noise to image
TL-GAN: Transparent Latent space GAN
Feature extractor network: from image to label
Labeled real images
TL-GAN: Transparent Latent space GAN
Make noise transparent by coupling generator with feature extractor
gender
age
expression
hair
✗
Transparent latent space
TL-GAN: Transparent Latent space GAN
Use the transparent latent space to control image synthesis/edit
gender
age
expression
hair
✗
Transparent latent space
Controlled image synthesis/edit
TL-GAN: controlled generation
Male
Female
Axis: gender
TL-GAN: controlled generation
Smile
Non smile
Axis: smile
TL-GAN: real-time interactive control
Make him young
More beard
Recede his hairline
Recede his hairline, again
Smile
Make him HER
More bangs
Wavy hair
TL-GAN: Draw as you can tell
As long as you can classify, you can control the generation
gender
age
expression
hair
✗
Transparent latent space
Under the hood: efficiency (1 hour)
Simple and efficient workflow, no need to retrain GAN
Well-trained GAN (2 weeks of Nvidia’s time)
Transfer learning with MobileNet on low-resolution labeled images (1 h)
Pre-generate 20,000 images (8 h)
Regression (5 min)
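The regression step above can be sketched as follows. This is a minimal illustration, not the repository's code: it uses ordinary least squares in NumPy as a stand-in for the regression, and the latent vectors and labels are synthetic toy data (a 16-dimensional latent space and two made-up features) in place of pre-generated images labeled by the feature extractor. The function name `fit_feature_axes` is hypothetical.

```python
import numpy as np

def fit_feature_axes(z, y):
    """Least-squares fit of y ~ z @ w + b.

    z: (n_samples, latent_dim) latent vectors
    y: (n_samples, n_features) predicted feature labels
    Returns unit-length feature axes, shape (n_features, latent_dim).
    """
    # Centering both sides absorbs the intercept b.
    z_centered = z - z.mean(axis=0)
    y_centered = y - y.mean(axis=0)
    w, *_ = np.linalg.lstsq(z_centered, y_centered, rcond=None)
    axes = w.T  # one row per feature
    return axes / np.linalg.norm(axes, axis=1, keepdims=True)

# Toy stand-in data (assumption): labels depend linearly on z plus noise.
rng = np.random.default_rng(0)
latent_dim, n_samples = 16, 2000
z = rng.standard_normal((n_samples, latent_dim))
true_axes = rng.standard_normal((2, latent_dim))
true_axes /= np.linalg.norm(true_axes, axis=1, keepdims=True)
y = z @ true_axes.T + 0.1 * rng.standard_normal((n_samples, 2))

axes = fit_feature_axes(z, y)
# Cosine similarity between recovered and true directions, one per feature.
cosines = np.sum(axes * true_axes, axis=1)
```

Each row of `axes` is a direction in latent space along which the corresponding label changes fastest under the linear fit.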
TL-GAN: Conclusion and future directions
GitHub repo and interactive online demo, have fun!
https://github.com/SummitKwan/transparent_latent_gan
Medium blog post (Generating custom photo-realistic faces using AI)
Medium blog post received 58k+ views; GitHub repo received 600+ stars and 50+ forks
Novel method, state-of-the-art results in controlled image synthesis/edit
Shaobo Guan
PhD in Neuroscience, MSc in Computer Science
The following slides are back-ups
Online Interactive Demo: link
Under the hood: disentangle features
Use linear algebra to decorrelate feature vectors
male
female
beard
no beard
no beard
beard
Before disentangling
After disentangling
(make all other features orthogonal to gender and age)
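One way to make a feature axis orthogonal to gender and age is to project out its components along those two directions. The sketch below assumes this projection approach; the slides do not spell out the exact procedure. The vectors are random toy data in an 8-dimensional space, and the function name `orthogonalize` is hypothetical.

```python
import numpy as np

def orthogonalize(axis, against):
    """Remove from `axis` its components along every vector in `against`,
    then rescale the result to unit length."""
    # Columns of q form an orthonormal basis of span(against).
    q, _ = np.linalg.qr(np.asarray(against, dtype=float).T)
    residual = axis - q @ (q.T @ axis)
    return residual / np.linalg.norm(residual)

# Toy feature axes (assumption): beard is deliberately entangled with gender.
rng = np.random.default_rng(1)
gender, age, beard = rng.standard_normal((3, 8))
beard = beard + 0.8 * gender

beard_clean = orthogonalize(beard, [gender, age])
```

Moving along `beard_clean` leaves the linear gender and age scores unchanged, because its dot product with both axes is zero up to rounding.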
To bridge latent vector with features
Notation: G: GAN generator; z: latent vector; x: image; y: feature label
Link we want to build: between z and y
Potential approach 1 (✗): Computing the latent vector for images in the labeled dataset
labeled dataset: paired data (x_real, y_real); z_encode = G^-1(x_real)
Potential approach 2 (✗): Label the features of synthetic images manually
z → G → x_gen; label the synthetic images to get y
Approach of TL-GAN: Use a separately trained feature extractor network F to produce feature labels
z → G → x_gen → F → y_predict
F is trained on labelled data (x_real, y_real)
male
not young
non-smile
...
TL-GAN: Transparent Latent space GAN
Make noise transparent by coupling generator with feature extractor
gender
age
expression
hair
Random latent vector z → GAN generator network → x_gen → CNN model → y_predict
GLM regression from z to y_predict → Uncovered feature axes in the latent space
TL-GAN model architecture
TL-GAN motivation
Make latent space transparent
Random generation of high quality images: random noise vector (512 real numbers) → 1024×1024 px image
Controlled image generation according to custom features: custom features (male, smile, glasses, ...) → image
Random vector in latent space; feature axes in latent space: gender, age, expression, hair
Move latent vector along gender axis: Male ↔ Female
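Moving a latent vector along a feature axis amounts to adding a scaled copy of that axis to the vector. The sketch below only illustrates this arithmetic: the 512-dimensional size comes from the slide, while `gender_axis` is a random placeholder for an axis that would in practice come from the regression step, and `move_along_axis` is a hypothetical helper name. No generator is called here.

```python
import numpy as np

def move_along_axis(z, axis, step):
    """Shift latent vector z by `step` units along the direction of `axis`."""
    unit = axis / np.linalg.norm(axis)
    return z + step * unit

rng = np.random.default_rng(2)
z = rng.standard_normal(512)            # random latent vector, 512 real numbers
gender_axis = rng.standard_normal(512)  # placeholder axis (assumption)

# Sweep from one end of the axis to the other, passing through the original z.
steps = np.linspace(-3.0, 3.0, 7)
z_sweep = np.stack([move_along_axis(z, gender_axis, s) for s in steps])
```

Each row of `z_sweep` would then be passed to the generator to render one frame of the edit.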
Discriminative vs generative models
Images
generative
discriminative
Female
Smile
...
Male
Smile
...
Feature labels
Male
Non-smile
...
GAN: generative adversarial network
State-of-the-art model to generate photo-realistic images
figure from Rohith Gandhi, Medium