1 of 23

SUNet: Swin Transformer with UNet for Image Denoising

 

 

2 of 23

Outline

  • Introduction

  • Proposed Method

  • Experiments

  • Conclusion

3 of 23

Introduction

  • Introduction

  • Proposed Method

  • Experiments

  • Conclusion

4 of 23

Introduction (1/2)

Image denoising is a challenging, long-standing ill-posed problem. Moreover, denoising is an important low-level image processing task that can improve performance in high-level vision tasks.

[Figure: denoising diagram with images labeled X and Y]

5 of 23

Introduction (2/2)

  • Contribution
    • We propose a denoising architecture, called SUNet, based on the Swin-UNet image segmentation model.

    • We propose a dual up-sample block that combines subpixel and bilinear up-sampling to prevent checkerboard artifacts.

    • To the best of our knowledge, our model is the first to incorporate Swin Transformer and UNet in denoising.

    • We demonstrate the competitive results of SUNet on two common image denoising datasets.

6 of 23

Method

  • Introduction

  • Proposed Method

  • Experiments

  • Conclusion

7 of 23

Method (1/7)

  • SUNet

8 of 23

Method (2/7)

  • SUNet

    • Shallow feature extraction module (3 x 3 convolution)

    • UNet feature extraction module

    • Reconstruction module (3 x 3 convolution)

9 of 23

Method (3/7)

  • UNet feature extraction module

    • Swin Transformer Layer (STL) and Swin Transformer Block (STB)

    • Patch Merging down-sampling

    • Dual up-sampling

10 of 23

Method (4/7)

  • STL and STB

    • We use Swin Transformer Blocks (STBs) as the feature extraction blocks for image denoising.

    • Because of the shifted-window design, the number of Swin Transformer Layers (STLs) in each block must be even.
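The even-count requirement can be illustrated with a small sketch. It assumes the usual Swin Transformer convention that layers alternate between regular windows (no shift) and windows shifted by half the window size, so layers come in pairs. The function name `shift_schedule` and the window size of 8 are illustrative assumptions, not values taken from these slides.

```python
def shift_schedule(num_layers, window_size):
    """Return the cyclic shift applied by each layer of one block (a sketch)."""
    if num_layers % 2 != 0:
        # Regular and shifted layers are paired, so an odd count
        # would leave one layer without its partner.
        raise ValueError("a block needs an even number of layers")
    # Even-indexed layers use regular windows (shift 0);
    # odd-indexed layers shift the window grid by half a window.
    return [0 if i % 2 == 0 else window_size // 2 for i in range(num_layers)]


print(shift_schedule(2, 8))  # [0, 4]
print(shift_schedule(4, 8))  # [0, 4, 0, 4]
```

With an odd layer count the function raises `ValueError`, which mirrors the constraint stated above.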

11 of 23

Method (5/7)

  • Patch Merging down-sampling

    • Concatenate the input features of each group of 2 × 2 neighboring patches, then apply a linear layer to obtain the specified number of output channels.

    • The operation is equivalent to Pixel Unshuffle followed by a convolution.
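The patch merging step described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the channels-last layout `(H, W, C)`, the random weight matrix, and the omission of any normalization layer are all assumptions made for brevity.

```python
import numpy as np


def patch_merging(x, weight):
    """x: (H, W, C) feature map; weight: (4*C, C_out) linear projection."""
    H, W, C = x.shape
    assert H % 2 == 0 and W % 2 == 0, "H and W must be even"
    # Gather the four members of every 2 x 2 neighborhood.
    x0 = x[0::2, 0::2, :]  # top-left
    x1 = x[1::2, 0::2, :]  # bottom-left
    x2 = x[0::2, 1::2, :]  # top-right
    x3 = x[1::2, 1::2, :]  # bottom-right
    # Concatenate along channels: (H/2, W/2, 4C).
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)
    # The linear layer maps 4C channels to the requested C_out.
    return merged @ weight


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))   # hypothetical input features
w = rng.standard_normal((12, 6))     # hypothetical projection, 4*3 -> 6
y = patch_merging(x, w)
print(y.shape)  # (4, 4, 6)
```

The spatial resolution halves while the channel count is set by the projection, which is the same effect as rearranging pixels into channels and then applying a 1 × 1 convolution.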

12 of 23

Method (6/7)

  • Dual up-sampling

    • The original Swin-UNet uses the patch expanding up-sampling method, which is equivalent to transposed convolution. However, transposed convolution often produces blocking artifacts, which seriously degrade denoising performance.

    • We propose an up-sampling block called Dual up-sampling, which combines the Bilinear and PixelShuffle methods to prevent checkerboard artifacts.
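A rough NumPy sketch of the two branches may help make the idea concrete. It only shows the parameter-free parts: a PixelShuffle rearrangement and a bilinear interpolation of the same input, stacked along the channel axis. Any learned convolutions that would adjust or fuse channels are left out, and the channels-first layout, the half-pixel sampling convention, and all function names are assumptions for illustration.

```python
import numpy as np


def pixel_shuffle(x, r):
    """(C*r*r, H, W) -> (C, H*r, W*r): move channel groups into space."""
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)


def bilinear_upsample(x, r):
    """(C, H, W) -> (C, H*r, W*r) using half-pixel sample centers."""
    c, h, w = x.shape
    # Source coordinate of each output pixel center, clamped to the image.
    ys = np.clip((np.arange(h * r) + 0.5) / r - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * r) + 0.5) / r - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    # Interpolate horizontally on the two neighboring rows, then vertically.
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bot = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy


def dual_upsample(x, r=2):
    """Stack the subpixel and bilinear branches on the channel axis."""
    sub = pixel_shuffle(x, r)      # (C, H*r, W*r)
    bil = bilinear_upsample(x, r)  # (C*r*r, H*r, W*r)
    return np.concatenate([sub, bil], axis=0)


x = np.random.default_rng(0).standard_normal((8, 4, 4))
print(dual_upsample(x).shape)  # (10, 8, 8)
```

Both branches double the spatial resolution, so their outputs can be concatenated directly; in a trained network a convolution would typically follow to mix them.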

13 of 23

Method (7/7)

 

14 of 23

Experiments

  • Introduction

  • Proposed Method

  • Experiments

  • Conclusion

15 of 23

Experiments (1/6)

 

16 of 23

Experiments (2/6)

  • Dataset
    • Testing datasets:

      • CBSD68, which has 68 color images with a resolution of 321 x 481.

      • Kodak24, which consists of 24 images with a size of 768 x 512.

17 of 23

Experiments (3/6)

  • Denoising result

18 of 23

Experiments (4/6)

  • Denoising visual result (CBSD68)

19 of 23

Experiments (5/6)

  • Denoising visual result (Kodak24)

20 of 23

Experiments (6/6)

  • Additional experiments about Dual up-sampling

[Figure: visual comparison on baby.png in Set 5 dataset]

Panels: Gaussian Noise: 50, Original, Swin-UNet (jpeg), SUNet (Bilinear), SUNet (Subpixel), Original, Swin-UNet (png), SUNet (Dual up-sample)

PSNR: 15.281, 27.818, 27.830, 28.739, 28.891, 29.220

SSIM: 0.2751, 0.8071, 0.7971, 0.8249, 0.8327, 0.8430
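For readers unfamiliar with the PSNR metric reported here, a minimal sketch of how it is commonly computed is given below. It assumes 8-bit images with a peak value of 255 and treats the noise level 50 as the standard deviation of additive Gaussian noise on that 0 to 255 scale; both are assumptions, and the random test image is synthetic, not baby.png.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


rng = np.random.default_rng(0)
# Synthetic stand-in for a clean image (hypothetical data).
clean = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
# Additive Gaussian noise with sigma = 50, clipped to the valid range.
noisy = np.clip(clean + rng.normal(0.0, 50.0, clean.shape), 0, 255)
print(round(psnr(clean, noisy), 2))
```

Higher PSNR means the estimate is closer to the reference, so a denoiser is judged by how far it raises PSNR above that of the noisy input.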

21 of 23

Conclusion

  • Introduction

  • Proposed Method

  • Experiments

  • Conclusion

22 of 23

Conclusion (1/1)

  • Conclusion
    • We present the SUNet architecture, which is based on the Swin Transformer and achieves competitive results on denoising.

    • We propose the dual up-sample module to avoid checkerboard artifacts.

    • The Swin Transformer still has potential that deserves further exploration.

    • In future work, we will attempt more complex restoration tasks, such as real-world noise and real-world motion blur.

23 of 23

Thanks for listening