1 of 13

MAT: Mask-Aware Transformer for Large Hole Image Inpainting

Wenbo Li1, Zhe Lin2, Kun Zhou3, Lu Qi1, Yi Wang4, Jiaya Jia1

1The Chinese University of Hong Kong

2Adobe Inc.

3The Chinese University of Hong Kong (Shenzhen)

4Shanghai AI Laboratory

2 of 13

Image Inpainting

MAT: Mask-aware Transformer for (1) Large holes, (2) High-resolution, (3) Diverse outputs

3 of 13

Introduction

Contributions

    • A novel mask-aware transformer (MAT) architecture
      • Multi-head contextual attention
      • Adjusted transformer block
      • Style manipulation module
    • State-of-the-art performance on Places and CelebA-HQ

4 of 13

Framework

 

5 of 13

Multi-Head Contextual Attention

  • The proposed attention module aggregates information only from valid tokens, using a dynamic mask within shifted windows (see the sketch below):
  • Mask Initialization: the hole mask
  • Mask Update: after attention, all tokens in a window are marked valid if the window contained at least one valid token beforehand.

U: mask update

S: window shift
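Below is a minimal PyTorch sketch of the masked window attention and the mask-update rule U described above. The function name, tensor layout, and the fallback for windows that contain no valid token are my assumptions, not the authors' implementation.

```python
import torch

def masked_window_attention(q, k, v, valid, scale):
    """q, k, v: (num_windows, N, C); valid: (num_windows, N), 1 = valid token, 0 = hole."""
    attn = (q @ k.transpose(-2, -1)) * scale                  # (num_windows, N, N)
    # Dynamic mask: only valid tokens may be attended to.
    attn = attn.masked_fill(valid[:, None, :] == 0, float('-inf'))
    # Assumption: windows with no valid token fall back to uniform attention
    # so the softmax does not produce NaNs.
    no_valid = valid.sum(dim=-1) == 0                         # (num_windows,)
    attn[no_valid] = 0.0
    attn = attn.softmax(dim=-1)
    out = attn @ v                                            # (num_windows, N, C)

    # Mask update (U): a window becomes entirely valid after attention if it
    # contained at least one valid token; fully-hole windows stay invalid.
    new_valid = torch.where(no_valid[:, None], valid, torch.ones_like(valid))
    return out, new_valid
```

Between consecutive blocks, the windows are shifted (S) so that information can propagate across window borders.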

6 of 13

Adjusted Transformer Block

Compared to the conventional transformer block, we adjust the architecture as follows (a sketch is given after the reference below):

    • Remove the layer normalization (LN)
    • Employ fusion learning

The conventional transformer block [1]

Our transformer stage and block

[1] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." In ICLR, 2021.
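A hedged sketch of such an adjusted block is shown below, assuming that "fusion learning" means concatenating the block input with the attention (or MLP) output and merging the two with a linear layer; the class name and signature are illustrative rather than taken from the MAT codebase.

```python
import torch
import torch.nn as nn

class AdjustedBlock(nn.Module):
    """Illustrative transformer block without LayerNorm and with fusion in place of residuals."""
    def __init__(self, dim, attention, mlp_ratio=4):
        super().__init__()
        self.attn = attention                      # e.g. the mask-aware window attention above
        self.fuse1 = nn.Linear(2 * dim, dim)       # fusion instead of "x + attn(x)"
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )
        self.fuse2 = nn.Linear(2 * dim, dim)       # fusion instead of "x + mlp(x)"

    def forward(self, x, valid):
        a, valid = self.attn(x, valid)             # note: no LayerNorm before attention
        x = self.fuse1(torch.cat([x, a], dim=-1))
        m = self.mlp(x)                            # and none before the MLP
        x = self.fuse2(torch.cat([x, m], dim=-1))
        return x, valid
```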

7 of 13

Style Manipulation Module

We use a style code to manipulate the output by changing the weight normalization of the convolutions [1-3] (a sketch is given after the references below):

[1] Chen, Ting, et al. "On self modulation for generative adversarial networks." In ICLR, 2019.

[2] Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." In CVPR, 2019.

[3] Karras, Tero, et al. "Analyzing and improving the image quality of StyleGAN." In CVPR, 2020.
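The sketch below illustrates one common form of such weight modulation, following StyleGAN2 [3]: a per-sample style code rescales the convolution weights, which are then re-normalized (demodulated). The function name and details are illustrative and may differ from MAT's actual module.

```python
import torch
import torch.nn.functional as F

def modulated_conv2d(x, weight, style, eps=1e-8):
    """x: (B, Cin, H, W); weight: (Cout, Cin, k, k); style: (B, Cin)."""
    B, Cin, H, W = x.shape
    Cout, _, k, _ = weight.shape
    # Modulation: scale the input channels of the weights with the style code.
    w = weight[None] * style[:, None, :, None, None]          # (B, Cout, Cin, k, k)
    # Demodulation: normalize so each output feature keeps roughly unit variance.
    demod = torch.rsqrt(w.pow(2).sum(dim=[2, 3, 4]) + eps)    # (B, Cout)
    w = w * demod[:, :, None, None, None]
    # A grouped convolution applies a different filter bank per sample.
    x = x.reshape(1, B * Cin, H, W)
    w = w.reshape(B * Cout, Cin, k, k)
    out = F.conv2d(x, w, padding=k // 2, groups=B)
    return out.reshape(B, Cout, H, W)
```

Because the style code can be varied (e.g. by sampling noise), the same masked input can produce the diverse outputs highlighted on slide 2.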

 

8 of 13

 

 

SOTA performance on both small and large masks for Places and CelebA-HQ

9 of 13

 

When trained on the full Places dataset (8M images), our MAT achieves significant improvements.

10 of 13

 

11 of 13

 

12 of 13

 

13 of 13

Thanks for listening!

https://github.com/fenglinglwb/MAT