
ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection

Shiwei Jin 1, Zhen Wang 2, Lei Wang 2, Ning Bi 2, Truong Nguyen 1

1 ECE Dept. UC San Diego, 2 Qualcomm Technologies, Inc.

TUE-PM-135

1/13



Motivation

[1] Y. Ganin, D. Kononenko, D. Sungatullina, and V. Lempitsky, “DeepWarp: Photorealistic image resynthesis for gaze manipulation,” in European Conference on Computer Vision, pp. 311–326, Springer, 2016.

[2] Y. Yu and J.-M. Odobez, “Unsupervised representation learning for gaze estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7314–7324, 2020.

[3] S. Park, S. D. Mello, P. Molchanov, U. Iqbal, O. Hilliges, and J. Kautz, “Few-shot adaptive gaze estimation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9368–9377, 2019.

[4] Y. Zheng, S. Park, X. Zhang, S. De Mello, and O. Hilliges, “Self-learning transformations for improving gaze and head redirection,” Advances in Neural Information Processing Systems, vol. 33, pp. 13127–13138, 2020.

[5] Z. Wu, D. Lischinski, and E. Shechtman, “StyleSpace analysis: Disentangled controls for StyleGAN image generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12863–12872, 2021.

| Task | Method | Category | Image | DoF | Condition |
|---|---|---|---|---|---|
| Gaze Redirection | DeepWarp [1] | Warping-based | Eye | 2 | Pitch & Yaw |
| Gaze Redirection | Yu et al. [2] | Warping-based | Eye | 2 | Pitch & Yaw |
| Gaze Redirection | FAZE [3] | Generator-based | Eye | 2 | Pitch & Yaw |
| Gaze Redirection | ST-ED [4] | Generator-based | Face (Restricted) | 2 | Pitch & Yaw |
| Face Editing | StyleSpace [5] | Generator-based | Face | 1 | No physical meaning |

3/13


Latent Vectors Editing in cGAN

[4] Y. Zheng, S. Park, X. Zhang, S. De Mello, and O. Hilliges, “Self-learning transformations for improving gaze and head redirection,” Advances in Neural Information Processing Systems, vol. 33, pp. 13127–13138, 2020.

[5] Y. Dalva, S. F. Altındiş, and A. Dundar, “VecGAN: Image-to-image translation with interpretable latent directions,” in European Conference on Computer Vision (ECCV), pp. 153–169, 2022.


| Method | Latent Vector Compression | Initial Condition Estimation | Interpretability* | Portability | Editability | Physical Meaning of Conditions |
|---|---|---|---|---|---|---|
| VecGAN [5] | No | Yes | No | Yes | No | No |
| ST-ED [4] | Yes | Yes | Yes | No | – | Yes |
| ReDirTrans | No | Yes | Yes | Yes | Yes | Yes |

* Transformation-equivariant mappings between the embedding space and image space.
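The “interpretability” property marked above refers to transformation-equivariant mappings: redirecting a condition in the embedding space corresponds to applying that condition’s rotation to the condition embedding. A minimal NumPy sketch of such a rotation-based edit for one 2-DoF condition (function names and shapes are illustrative, not the paper’s code):

```python
import numpy as np

def rot2d(theta: float) -> np.ndarray:
    """2-D rotation matrix for one condition angle (e.g. pitch or yaw)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def redirect(embedding: np.ndarray, src: float, tgt: float) -> np.ndarray:
    """Transformation-equivariant edit: undo the estimated source
    condition's rotation, then apply the target condition's rotation."""
    return rot2d(tgt) @ rot2d(src).T @ embedding
```

When the source and target conditions coincide, the two rotations cancel and the embedding is unchanged, which is exactly the equivariance the footnote describes.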

5/13




ReDirTrans-GAN

  • ReDirTrans works with GAN inversion

[6] O. Tov, Y. Alaluf, Y. Nitzan, O. Patashnik, and D. Cohen-Or, “Designing an encoder for stylegan image manipulation,” ACM Transactions on Graphics (TOG), vol. 40, no. 4, pp. 1–14, 2021.

[7] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of stylegan,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 8110–8119, 2020.
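The pipeline on this slide composes three stages: a fixed e4e encoder [6] inverts the input image into StyleGAN2 [7] latents, the trainable ReDirTrans module translates those latents toward the target gaze and head conditions, and the fixed generator decodes the result. A schematic sketch (all module interfaces are hypothetical stand-ins, not the released code):

```python
def redirtrans_gan(image, target_gaze, target_head,
                   e4e_encoder, redirtrans, stylegan2):
    """Sketch of ReDirTrans-GAN: only `redirtrans` is trainable;
    the e4e encoder and StyleGAN2 generator stay fixed."""
    w_plus = e4e_encoder(image)                            # invert image to W+ latents
    w_edit = redirtrans(w_plus, target_gaze, target_head)  # latent-to-latent translation
    return stylegan2(w_edit)                               # decode the redirected face
```

Because the encoder and generator are frozen, only the compact translation module needs to be learned, which is what makes the approach portable across pretrained cGANs.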

[Figure: ReDirTrans-GAN pipeline, with the pretrained e4e encoder [6] and StyleGAN2 generator [7] fixed and only the ReDirTrans module trainable.]

6/13


Results

  • Quantitative Comparison

[8] K. Krafka, A. Khosla, P. Kellnhofer, H. Kannan, S. Bhandarkar, W. Matusik, and A. Torralba, “Eye tracking for everyone,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2176–2184, 2016.

[9] X. Zhang, Y. Sugano, M. Fritz, and A. Bulling, “It’s written all over your face: Full-face appearance-based gaze estimation,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 51–60, 2017.

Within-dataset: GazeCapture test subset [8]

| Method | Gaze Redir 🠗 | Head Redir 🠗 | Gaze Induce 🠗 | Head Induce 🠗 | LPIPS 🠗 |
|---|---|---|---|---|---|
| StarGAN | 4.602 | 3.989 | 0.755 | 3.067 | 0.257 |
| He et al. | 4.617 | 1.392 | 0.560 | 3.925 | 0.223 |
| VecGAN | 2.282 | 0.824 | 0.401 | 2.205 | 0.197 |
| ST-ED | 2.385 | 0.800 | 0.384 | 2.187 | 0.208 |
| ReDirTrans | 2.163 | 0.753 | 0.429 | 2.155 | 0.197 |

Cross-dataset: MPIIFaceGaze [9]

| Method | Gaze Redir 🠗 | Head Redir 🠗 | Gaze Induce 🠗 | Head Induce 🠗 | LPIPS 🠗 |
|---|---|---|---|---|---|
| StarGAN | 4.488 | 3.031 | 0.786 | 2.783 | 0.260 |
| He et al. | 5.092 | 1.372 | 0.684 | 3.411 | 0.241 |
| VecGAN | 2.670 | 1.242 | 0.391 | 1.941 | 0.207 |
| ST-ED | 2.380 | 1.085 | 0.371 | 1.782 | 0.212 |
| ReDirTrans | 2.380 | 0.985 | 0.391 | 1.782 | 0.202 |
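The redirection and induced errors reported here are angular errors in degrees between target and achieved gaze/head directions, as measured by an external estimator. A common way to compute the angle between two (pitch, yaw) directions (an assumption about the exact evaluation code, not taken from the paper) is:

```python
import numpy as np

def pitchyaw_to_vector(py):
    """Convert (pitch, yaw) in radians to a 3-D unit direction vector."""
    pitch, yaw = py
    return np.array([np.cos(pitch) * np.sin(yaw),
                     np.sin(pitch),
                     np.cos(pitch) * np.cos(yaw)])

def angular_error_deg(py_a, py_b):
    """Angle in degrees between two directions given as (pitch, yaw)."""
    a, b = pitchyaw_to_vector(py_a), pitchyaw_to_vector(py_b)
    cos = np.clip(a @ b, -1.0, 1.0)  # clip guards against rounding past ±1
    return np.degrees(np.arccos(cos))
```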

7/13


Results

  • Qualitative Comparison

[4] Y. Zheng, S. Park, X. Zhang, S. De Mello, and O. Hilliges, “Self-learning transformations for improving gaze and head redirection,” Advances in Neural Information Processing Systems, vol. 33, pp. 13127–13138, 2020.

[Figure: qualitative comparison of ST-ED [4] and ReDirTrans outputs against target images for Subjects 1–4.]

8/13


Results

  • Qualitative Comparison

9/13


Gaze Correction

  • Qualitative Comparison
    • CelebA-HQ [10]

  • Pipeline
    • Using the same image as both the input and the target sample

[10] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of gans for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.
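Feeding the same image as both input and target means appearance and head pose come from the image itself, and only the gaze condition is overridden (e.g. set to look at the camera). Schematically (module names and the frontal-gaze default are hypothetical):

```python
def correct_gaze(image, encoder, redirtrans, generator,
                 frontal_gaze=(0.0, 0.0)):
    """Gaze correction as self-redirection: the input serves as its own
    target sample, and only the gaze condition is replaced."""
    latents = encoder(image)
    corrected = redirtrans(latents, gaze=frontal_gaze)  # keep head pose, override gaze
    return generator(corrected)
```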

10/13


Gaze Correction

  • Quantitative Performance

[8] K. Krafka, A. Khosla, P. Kellnhofer, H. Kannan, S. Bhandarkar, W. Matusik, and A. Torralba, “Eye tracking for everyone,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2176–2184, 2016.

[10] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of gans for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.

Within-dataset: GazeCapture test subset [8]

| Method | Gaze Redir 🠗 | Head Redir 🠗 |  |  |  |  |
|---|---|---|---|---|---|---|
| GAN Inversion (e4e + StyleGAN2) | 11.302 | 4.130 | 0.334 | 0.377 | – | – |
| ReDirTrans-GAN | 2.505 | 1.020 | 0.353 | 0.388 | 0.117 | 0.128 |

Cross-dataset: CelebA-HQ [10]

| Method | Gaze Redir 🠗 | Head Redir 🠗 |  |  |  |  |
|---|---|---|---|---|---|---|
| GAN Inversion (e4e + StyleGAN2) | 4.448 | 2.586 | 0.211 | 0.286 | – | – |
| ReDirTrans-GAN | 3.157 | 2.257 | 0.228 | 0.314 | 0.087 | 0.099 |

 

11/13


Data Augmentation

  • Downstream gaze estimation task
    • 10,000 annotated real images
    • Pick Q% of the real images (Subset A) and synthesize the same number of redirected images (Subset B)
    • Raw: trained on Subset A; Aug: trained on Subsets A & B
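The subset protocol above can be sketched as follows, with the redirection-based synthesis left as a hypothetical callable:

```python
import random

def build_training_sets(real_images, q_percent, synthesize, seed=0):
    """Return (raw, augmented): raw is a Q% subset A of the real images;
    augmented is A plus an equal number of synthesized images B."""
    rng = random.Random(seed)
    k = int(len(real_images) * q_percent / 100)
    subset_a = rng.sample(real_images, k)
    subset_b = [synthesize(img) for img in subset_a]  # redirected copies with new labels
    return subset_a, subset_a + subset_b
```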

| Q% | GazeCapture Raw 🠗 | GazeCapture Aug 🠗 | MPIIFaceGaze Raw 🠗 | MPIIFaceGaze Aug 🠗 |
|---|---|---|---|---|
| 25 | 5.875 | 5.238 | 8.607 | 7.096 |
| 50 | 4.741 | 4.506 | 6.787 | 6.113 |
| 75 | 4.308 | 4.200 | 6.165 | 5.767 |

12/13


Thank You

Paper: https://arxiv.org/pdf/2305.11452.pdf

13/13