Page 1 of 34

Sketch-Guided Text-to-Image Generation

Final Report - Jul 27, by Elliott Wu

Mentor: Hyungjoo Cho

Advisors: Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang

Page 2 of 34

sText2Image

male, long face, smile with mouth closed, double eyelids, five o'clock shadow…

Page 3 of 34

sText2Image

male, long face, smile with mouth closed, double eyelids, five o'clock shadow…

Page 4 of 34

sText2Image

male, long face, smile with mouth closed, double eyelids, five o'clock shadow…

TEXT

SKETCH

IMAGE

Page 5 of 34

Text2Image

  • Generative Adversarial Text to Image Synthesis (Reed et al., ICML 2016)
  • StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (Zhang et al., arXiv)

* retrieved from StackGAN

Page 6 of 34

Sketch?

* Jun-Yan Zhu et al., Generative Visual Manipulation on the Natural Image Manifold, ECCV 2016

Page 7 of 34

Sketch?

* collected from volunteers

Page 8 of 34

Sketch?

?

?

?

Page 9 of 34

Sketch?

Page 10 of 34

Joint Representation

male, long face, smile with mouth closed, double eyelids, five o'clock shadow…

TEXT

SKETCH

IMAGE

male, long face, smile with mouth closed, double eyelids, five o'clock shadow…

Joint Space

TEXT

SKETCH

IMAGE

Page 11 of 34

Network Architecture - Training

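[Figure: training architecture]

Generator: noise z (100-d) → linear; text/attribute vector t (18-d), replicate; feature-map sizes 4x8, 8x16, 16x32, 32x64; channel widths 512, 256, 128, 64; output G(z, t)

Discriminator: input y (real, or fake/wrong); feature-map sizes 32x64, 16x32, 8x16, 4x8; channel widths 64, 128, 256, 512; t (18-d) replicated to 4x8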
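The "replicate" step in the diagram can be read as tiling the 18-d vector t over the 4x8 spatial grid and stacking it onto the feature channels, as in Reed et al. The snippet below is only a minimal NumPy sketch of that reading. The function name `replicate_and_concat`, the channel-first layout, and the 512-channel feature map are assumptions made for illustration, not details confirmed by the slides.

```python
import numpy as np

def replicate_and_concat(features, t):
    """Tile a conditioning vector over the spatial grid and append it as channels.

    features: array of shape (C, H, W), e.g. a 4x8 feature map.
    t:        array of shape (T,), e.g. the 18-d attribute vector.
    Returns an array of shape (C + T, H, W).
    """
    _, h, w = features.shape
    # Every spatial location receives the same copy of t.
    tiled = np.broadcast_to(t[:, None, None], (t.shape[0], h, w))
    return np.concatenate([features, tiled], axis=0)

# Hypothetical shapes: 512 channels at 4x8 (assumed), 18-d condition.
features = np.zeros((512, 4, 8))
t = np.arange(18, dtype=float)
joint = replicate_and_concat(features, t)
print(joint.shape)
```

After the concatenation, every 4x8 location carries both the image features and the full condition, so later layers can judge whether the image and the text agree.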

Page 12 of 34

Network Architecture - Testing

[Figure: test-time pipeline]

Input: text, sketch

Generator: G(z, t) from noise z and text t

Losses: L_contextual, L_perceptual (Discriminator); backprop to z

Output: generated image
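One way to read this slide is that, at test time, the trained networks stay fixed and only z is updated by backpropagating L_contextual and L_perceptual. The code below is a toy sketch of that idea, not the project's implementation. The linear "generator", the logistic "discriminator", the masked squared-error form of the contextual loss, the weight `lam`, and all function names are assumptions chosen so that the example runs with NumPy alone.

```python
import numpy as np

def make_toy_models(z_dim=100, t_dim=18, out_dim=64, seed=0):
    """Random linear stand-ins for a trained generator and discriminator."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(out_dim, z_dim))  # generator weights on z
    V = rng.normal(scale=0.1, size=(out_dim, t_dim))  # generator weights on t
    w_d = rng.normal(scale=0.1, size=out_dim)         # discriminator weights
    return W, V, w_d

def total_loss_and_grad(models, z, t, sketch, mask, lam):
    """Return L_contextual + lam * L_perceptual and its gradient w.r.t. z."""
    W, V, w_d = models
    x = W @ z + V @ t                          # toy G(z, t)
    diff = mask * (x - sketch)                 # compare only the sketch region
    l_contextual = np.sum(diff ** 2)
    score = w_d @ x                            # toy discriminator logit
    l_perceptual = np.logaddexp(0.0, -score)   # equals -log(sigmoid(score))
    grad_x = 2.0 * mask * diff - lam * w_d / (1.0 + np.exp(score))
    return l_contextual + lam * l_perceptual, W.T @ grad_x

def fit_latent(models, t, sketch, mask, lam=0.01, lr=0.05, steps=200, seed=1):
    """Gradient descent on z with the networks held fixed."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=models[0].shape[1])
    history = []
    for _ in range(steps):
        loss, grad = total_loss_and_grad(models, z, t, sketch, mask, lam)
        history.append(loss)
        z = z - lr * grad
    return z, history

models = make_toy_models()
t = np.zeros(18)
t[[0, 9]] = 1.0                                    # two active attributes
sketch = np.random.default_rng(2).normal(size=64)  # stand-in for the input sketch
mask = np.zeros(64)
mask[:32] = 1.0                                    # first half plays the sketch region
z, history = fit_latent(models, t, sketch, mask)
print(history[0], history[-1])
```

In a real system the gradient would come from automatic differentiation through the trained networks. It is written by hand here only to keep the sketch free of dependencies.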

Page 13 of 34

Data Preparation - Image

Face (CelebA): 202k images

Bird (CUB): 11k images

Flower (Oxford): 8k images

Page 14 of 34

Data Preparation - Image

40 attributes:

1 : "5_o_Clock_Shadow"

2 : "Big_Lips"

3 : "Big_Nose"

4 : "Chubby"

5 : "Double_Chin"

6 : "Eyeglasses"

7 : "Goatee"

8 : "Heavy_Makeup"

9 : "High_Cheekbones"

10 : "Male"

11 : "Mouth_Slightly_Open"

12 : "Mustache"

...
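As an illustration of how a list of attribute names becomes the attribute vector fed to the networks, here is a minimal sketch. It uses only the twelve names visible on this slide, and the elided ones are left out rather than guessed. The 0/1 encoding and the helper name `encode_attributes` are assumptions.

```python
# The twelve attribute names shown on the slide, in slide order.
ATTRIBUTES = [
    "5_o_Clock_Shadow", "Big_Lips", "Big_Nose", "Chubby",
    "Double_Chin", "Eyeglasses", "Goatee", "Heavy_Makeup",
    "High_Cheekbones", "Male", "Mouth_Slightly_Open", "Mustache",
]

def encode_attributes(active, vocabulary=ATTRIBUTES):
    """Return a 0/1 vector with a 1.0 at the position of each active attribute."""
    unknown = set(active) - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [1.0 if name in active else 0.0 for name in vocabulary]

vec = encode_attributes(["Male", "Mustache"])
print(vec)
```

Raising an error on unknown names keeps a typo in a prompt from silently producing an all-zero condition.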

For both the bird and flower datasets, 10 captions per image are provided and embedded with char-CNN-RNN (Reed et al., CVPR 2016):

attribute vector OR

text embedding

Face (CelebA)

Bird (CUB)

Flower (Oxford)

Page 15 of 34

Data Preparation - Synthesized Sketch

Edge detection:

  • XDoG (Winnemöller et al., Computers & Graphics 2012)
  • Photoshop photocopy effect
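For readers unfamiliar with XDoG, the operator sharpens an image with a difference of Gaussians and then applies a soft threshold. The following is a minimal NumPy sketch of that formulation. The parameter defaults (`sigma`, `k`, `p`, `eps`, `phi`) and the helper names are assumptions, not the settings used to build the sketches in this project.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (NumPy only)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Blur along rows, then along columns; "valid" restores the input size.
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def xdog(img, sigma=1.0, k=1.6, p=20.0, eps=0.5, phi=10.0):
    """XDoG on a grayscale image in [0, 1]; returns values in [0, 1]."""
    g_fine = gaussian_blur(img, sigma)
    g_coarse = gaussian_blur(img, k * sigma)
    sharpened = (1.0 + p) * g_fine - p * g_coarse
    # Soft threshold: white at or above eps, smooth ramp toward black below it.
    return np.where(sharpened >= eps, 1.0, 1.0 + np.tanh(phi * (sharpened - eps)))

# Toy input: dark left half, bright right half.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
edges = xdog(img)
print(edges.shape, edges.min(), edges.max())
```

Larger `p` strengthens edges and larger `phi` makes the threshold closer to a hard black/white cut.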

Simplification (synthesized sketches):

  • Sketch simplification (Simo-Serra & Iizuka, SIGGRAPH 2016)

Image

Edge

Simplified

Page 16 of 34

Data Preparation - Freehand Sketch

* collected from volunteers

Page 17 of 34

Experiments - Face

male, long face, smile with mouth closed, double eyelids, five o'clock shadow…

ATTRIBUTES

SKETCH

IMAGE

Page 18 of 34

Experiments - Failures

Page 19 of 34

Experiments - Failures

Page 20 of 34

Page 21 of 34

Experiments - Finally…

Page 22 of 34

Experiments - Face

1. Attributes Match Sketch

2. Attributes Mismatch Sketch

3. Freehand Sketch

Page 23 of 34

Experiments - Match (Mustache)

Page 24 of 34

Experiments - Match (Eyeglasses)

Page 25 of 34

Experiments - Match (Lipstick)

Page 26 of 34

Experiments - Mismatch

Female, Heavy_Makeup, Wearing_Lipstick

Female, Heavy_Makeup, Smiling, Wearing_Lipstick

Page 27 of 34

Experiments - Mismatch

Male, Chubby, Double_Chin, High_Cheekbones, Mouth_Open

Male

Page 28 of 34

Experiments - Mismatch

Female, High_Cheekbones, Smiling, Wearing_Lipstick, No_Eyeglasses

Female, Heavy_Makeup, High_Cheekbones, Pointy_Nose, Smiling, Wearing_Lipstick, No_Eyeglasses

Page 29 of 34

Page 30 of 34

Experiments - Freehand

Page 31 of 34

Experiments - Freehand

Page 32 of 34

Experiments - Failure Cases (Eyeglasses)

Page 33 of 34

Timeline

  • Before Mar: Ideation
  • Mar: Submitted to ICCV on Sketch-to-Image
  • Jul: Extension on Sketch-Guided Text-to-Image
  • Aug: Run experiments on bird and flower datasets
  • Sept - Oct: Refine results and paper write-up
  • Nov: Submit to CVPR

Page 34 of 34

THANK YOU!

Shangzhe (Elliott) Wu

Email: swuai@ust.hk

GitHub: elliottwu