MAT: Mask-Aware Transformer for Large Hole Image Inpainting
Wenbo Li1, Zhe Lin2, Kun Zhou3, Lu Qi1, Yi Wang4, Jiaya Jia1
1The Chinese University of Hong Kong
2Adobe Inc.
3The Chinese University of Hong Kong (Shenzhen)
4Shanghai AI Laboratory
Image Inpainting
MAT: Mask-aware Transformer for (1) Large holes, (2) High-resolution, (3) Diverse outputs
Introduction
Contributions
Framework
Multi-Head Contextual Attention
U: mask updating (a token is marked valid once it aggregates information from at least one valid token)
S: window shifting (shifted windows let information propagate across window boundaries)
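The masked attention and the mask-updating rule above can be sketched as follows. This is a minimal, non-windowed simplification in PyTorch, not the released code: function names are illustrative, and with global attention the updated mask becomes all-valid in one step whenever any valid token exists (the paper applies the rule per shifted window).

```python
import torch

def masked_attention(q, k, v, mask, neg_value=-1e4):
    """Attention that aggregates only from valid (unmasked) tokens.

    q, k, v: (B, N, C) token features; mask: (B, N) with 1 = valid, 0 = hole.
    Keys at invalid positions get a large negative score so softmax ignores them.
    """
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale           # (B, N, N) scores
    attn = attn.masked_fill(mask[:, None, :] == 0, neg_value)
    attn = attn.softmax(dim=-1)
    return attn @ v                                    # (B, N, C)

def update_mask(mask):
    """Mark every token valid once it has attended to at least one valid token.

    With this global (non-windowed) sketch, a single valid token anywhere
    validates all queries in that sample after one attention step.
    """
    has_valid = (mask.sum(dim=1, keepdim=True) > 0).float()  # (B, 1)
    return torch.maximum(mask, has_valid.expand_as(mask))
```

In the windowed setting, the same rule is applied within each (shifted) window, so validity spreads outward from known regions block by block instead of in a single step.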
Adjusted Transformer Block
Compared with the conventional transformer block[1], we remove layer normalization and replace the residual shortcut with feature concatenation followed by a fully connected layer:
The conventional transformer block[1]
Our transformer stage and block
[1] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." In ICLR, 2021.
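A minimal sketch of such an adjusted block, assuming the two changes named above (no layer normalization; concatenation + FC fusion in place of the attention residual). The class and layer names are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

class AdjustedBlock(nn.Module):
    """Transformer block without LayerNorm, where the attention residual
    shortcut is replaced by concatenation + fully connected fusion."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # fusion layer: combines input and attention output instead of x + attn(x)
        self.fuse = nn.Linear(2 * dim, dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):            # x: (B, N, C)
        a, _ = self.attn(x, x, x)
        # no LayerNorm, no additive residual: concatenate and fuse
        x = self.fuse(torch.cat([x, a], dim=-1))
        return x + self.mlp(x)
```

The motivation in the paper is stability when training with large masks, where normalization statistics and residual paths dominated by invalid tokens can destabilize optimization.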
Style Manipulation Module
We use a style code to manipulate the output by modulating and demodulating the convolution weights[1-3]:
[1] Chen, Ting, et al. "On self modulation for generative adversarial networks." In ICLR, 2019.
[2] Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." In CVPR, 2019.
[3] Karras, Tero, et al. "Analyzing and improving the image quality of stylegan." In CVPR, 2020.
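The weight modulation and demodulation from [3] can be sketched as follows. The style code scales each input channel of the convolution weight, then per-sample demodulation rescales each output filter to unit norm (function name and shapes here are illustrative):

```python
import torch

def modulate_conv_weight(weight, style, demodulate=True, eps=1e-8):
    """StyleGAN2-style weight (de)modulation.

    weight: (out_ch, in_ch, kh, kw) shared conv weight.
    style:  (B, in_ch) per-sample scales predicted from the style code.
    Returns per-sample weights of shape (B, out_ch, in_ch, kh, kw).
    """
    # modulate: scale each input channel by the style code
    w = weight[None] * style[:, None, :, None, None]
    if demodulate:
        # demodulate: normalize each output filter to unit L2 norm per sample
        d = torch.rsqrt(w.pow(2).sum(dim=[2, 3, 4]) + eps)   # (B, out_ch)
        w = w * d[:, :, None, None, None]
    return w
```

In practice the per-sample weights are applied with a grouped convolution over the flattened batch, which is how StyleGAN2 implements this efficiently.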
SOTA performance on both small and large masks on Places and CelebA-HQ
When trained on the full Places dataset (8M images), MAT achieves further significant improvements.
Thanks for listening!
https://github.com/fenglinglwb/MAT