1 of 23

SUNet: Swin Transformer with UNet for Image Denoising

2 of 23

Outline

Introduction

Proposed Method

Experiments

Conclusion

3 of 23

Introduction

4 of 23

Introduction (1/2)

Image denoising is a challenging, ill-posed problem and a long-standing one. It is also an important low-level image processing task that can improve performance on high-level vision tasks.

[Figure: Denoising, with images labeled X and Y]

5 of 23

Introduction (2/2)

Contributions

We propose SUNet, a denoising architecture based on Swin-UNet, a model originally designed for image segmentation.

We propose a dual up-sample block that combines subpixel and bilinear up-sampling to prevent checkerboard artifacts.

To the best of our knowledge, our model is the first to incorporate Swin Transformer and UNet in denoising.

SUNet achieves competitive results on two common image denoising datasets.

6 of 23

Method

7 of 23

Method (1/7)

SUNet

8 of 23

Method (2/7)

SUNet

Shallow feature extraction module (3 × 3 convolution)

UNet feature extraction module

Reconstruction module (3 × 3 convolution)

9 of 23

Method (3/7)

UNet feature extraction module

Swin Transformer Layer (STL) and Swin Transformer Block (STB)

Patch Merging down-sampling

Dual up-sampling
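
To make the U-shaped structure concrete, the following is a minimal shape-tracing sketch. It assumes that each patch merging step halves the spatial size and doubles the channel count, and that each dual up-sampling step does the reverse. The slide does not state these factors, so they are borrowed from the usual Swin-UNet convention. The function name and the example sizes are hypothetical, and skip connections are not modelled.

```python
def trace_unet_shapes(height, width, channels, num_levels):
    """Return the (H, W, C) feature shape after every encoder and decoder stage.

    Assumption (not stated on the slide): patch merging maps
    (H, W, C) -> (H/2, W/2, 2C) and dual up-sampling maps it back.
    """
    shapes = [(height, width, channels)]
    h, w, c = height, width, channels
    # Encoder: one patch merging step per level.
    for _ in range(num_levels):
        if h % 2 or w % 2:
            raise ValueError("height and width must be even at every level")
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((h, w, c))
    # Decoder: one dual up-sampling step per level.
    for _ in range(num_levels):
        h, w, c = h * 2, w * 2, c // 2
        shapes.append((h, w, c))
    return shapes


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    for shape in trace_unet_shapes(256, 256, 96, 3):
        print(shape)
```

Under these assumptions the decoder returns to the input resolution, which is what allows a final reconstruction convolution to produce an image of the original size.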

10 of 23

Method (4/7)

STL and STB

We use Swin Transformer Blocks (STBs) as the feature extraction blocks for image denoising.

Because of the shifted-window design, the number of Swin Transformer Layers (STLs) in each block must be a multiple of two: regular-window and shifted-window attention layers come in pairs.
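
The even-count constraint can be illustrated with a small sketch of the per-layer shift schedule. It assumes the standard Swin Transformer convention in which even-indexed layers use regular windows (shift 0) and odd-indexed layers shift the windows by half the window size. The function name is hypothetical and this is not the authors' code.

```python
def stl_shift_schedule(num_layers, window_size):
    """Return the window shift used by each layer of one block.

    Assumption: layers alternate between regular windows (shift 0)
    and shifted windows (shift = window_size // 2), as in Swin Transformer.
    """
    if num_layers <= 0 or num_layers % 2 != 0:
        # An odd count would leave one regular-window layer without
        # its shifted-window partner.
        raise ValueError("number of layers per block must be a positive even number")
    return [0 if i % 2 == 0 else window_size // 2 for i in range(num_layers)]


if __name__ == "__main__":
    print(stl_shift_schedule(4, 8))
```

With an odd layer count, the last regular-window layer would have no shifted counterpart, so information could not cross its window boundaries within that pair.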

11 of 23

Method (5/7)

Patch Merging down-sampling

Concatenate the input features of each group of 2 × 2 neighboring patches, then apply a linear layer to obtain the specified number of output channels.

This operation is equivalent to PixelUnshuffle followed by a 1 × 1 convolution.
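
The patch merging step can be sketched in plain Python on nested lists. This is only an illustration under stated assumptions: features are stored as H × W × C lists, the linear layer is a bias-free weight matrix of shape 4C × C_out, and any normalization before the linear layer is omitted. The function name and layout are hypothetical.

```python
def patch_merging(feat, weight):
    """Down-sample an H x W x C feature map by a factor of 2.

    feat:   nested lists of shape H x W x C (H and W must be even)
    weight: nested lists of shape (4*C) x C_out, the bias-free linear layer
    """
    height, width = len(feat), len(feat[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    columns = list(zip(*weight))  # one tuple of 4*C weights per output channel
    out = []
    for i in range(0, height, 2):
        row = []
        for j in range(0, width, 2):
            # Concatenate the channels of the 2 x 2 neighbouring patches.
            merged = feat[i][j] + feat[i + 1][j] + feat[i][j + 1] + feat[i + 1][j + 1]
            # Linear layer: project 4*C channels to C_out channels.
            row.append([sum(m * w for m, w in zip(merged, col)) for col in columns])
        out.append(row)
    return out


if __name__ == "__main__":
    feat = [[[1], [2]], [[3], [4]]]               # 2 x 2 map with 1 channel
    weight = [[1, 0], [0, 1], [1, 0], [0, 1]]     # 4 inputs -> 2 outputs
    print(patch_merging(feat, weight))
```

Because the same weight matrix is applied at every output position, the projection behaves like a 1 × 1 convolution over the concatenated channels.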

12 of 23

Method (6/7)

Dual up-sampling

The original Swin-UNet up-samples with patch expanding, which is equivalent to transposed convolution. However, transposed convolution tends to produce blocking artifacts, which seriously degrade denoising performance.

We propose an up-sampling block called dual up-sampling, which combines bilinear and PixelShuffle up-sampling to prevent checkerboard artifacts.
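
A minimal sketch of the two up-sampling branches is given below. The slide does not say how the branches are fused, so this sketch simply concatenates them along the channel axis, and it leaves out any learned convolutions. All function names are hypothetical. Features are stored as C × H × W nested lists, and the bilinear branch uses half-pixel sample positions clamped at the borders, which is one common convention rather than a confirmed detail of SUNet.

```python
def pixel_shuffle(feat, r=2):
    """Rearrange C*r*r channels of size H x W into C channels of size rH x rW."""
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = feat[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


def _clamp(value, low, high):
    return max(low, min(value, high))


def bilinear_upsample(channel, scale=2):
    """Bilinearly up-sample one H x W channel by an integer factor."""
    h, w = len(channel), len(channel[0])
    out = []
    for y in range(h * scale):
        sy = _clamp((y + 0.5) / scale - 0.5, 0.0, float(h - 1))
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(w * scale):
            sx = _clamp((x + 0.5) / scale - 0.5, 0.0, float(w - 1))
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = channel[y0][x0] * (1 - fx) + channel[y0][x1] * fx
            bottom = channel[y1][x0] * (1 - fx) + channel[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def dual_upsample(feat):
    """Concatenate a PixelShuffle branch and a bilinear branch (both x2)."""
    shuffled = pixel_shuffle(feat, 2)                      # C/4 channels
    smooth = [bilinear_upsample(ch, 2) for ch in feat]     # C channels
    return shuffled + smooth


if __name__ == "__main__":
    feat = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]            # 4 channels, 1 x 1
    for channel in dual_upsample(feat):
        print(channel)
```

The PixelShuffle branch only rearranges existing values, while the bilinear branch interpolates between neighbours, so the two branches carry complementary information at the higher resolution.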

13 of 23

Method (7/7)

14 of 23

Experiments

15 of 23

Experiments (1/6)

16 of 23

Experiments (2/6)

Dataset

Testing datasets:

CBSD68, which has 68 color images with a resolution of 321 × 481.

Kodak24, which consists of 24 images of size 768 × 512.

17 of 23

Experiments (3/6)

Denoising result

18 of 23

Experiments (4/6)

Denoising visual result (CBSD68)

19 of 23

Experiments (5/6)

Denoising visual result (Kodak24)

20 of 23

Experiments (6/6)

Additional experiments on dual up-sampling

Test image: baby.png from the Set5 dataset

Method                        PSNR (dB)   SSIM
Gaussian Noise: 50            15.281      0.2751
Original Swin-UNet (jpeg)     27.818      0.8071
SUNet (Bilinear)              27.830      0.7971
SUNet (Subpixel)              28.739      0.8249
Original Swin-UNet (png)      28.891      0.8327
SUNet (Dual up-sample)        29.220      0.8430
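
For readers unfamiliar with the PSNR metric reported above, here is a minimal sketch of how it can be computed, together with a helper that adds Gaussian noise to a clean signal. The slides do not describe the exact evaluation protocol, so the 8-bit peak value of 255, the clipping to [0, 255], and the flat list-of-pixels layout are assumptions, and both function names are hypothetical. SSIM is not covered here.

```python
import math
import random


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def add_gaussian_noise(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise and clip to the assumed 8-bit range."""
    rng = random.Random(seed)
    return [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


if __name__ == "__main__":
    clean = [128.0] * 10000                  # flat gray signal, illustration only
    noisy = add_gaussian_noise(clean, sigma=50.0)
    print(round(psnr(clean, noisy), 3))
```

A higher PSNR means the output is closer to the clean reference in the mean-squared-error sense, which is how the rows of the table are compared.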

21 of 23

Conclusion

22 of 23

Conclusion (1/1)

Conclusion

We present the SUNet architecture, which is based on the Swin Transformer and achieves competitive results on denoising.

We propose the dual up-sample module to avoid checkerboard artifacts.

The Swin Transformer still has considerable potential to be explored.

In future work, we will attempt more complex restoration tasks, such as real-world noise and real-world motion blur.

23 of 23

Thanks for listening