1 of 37

Enhanced Deep Residual Networks for Single Image Super-Resolution

Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee

Computer Vision Lab.

Dept. of ECE, ASRI, Seoul National University

http://cv.snu.ac.kr

2 of 37

SISR (Single Image Super Resolution)

Goal: restore a high-resolution (HR) image from a single low-resolution (LR) image

Low-resolution

image

High-resolution

image

Super-Resolution

3 of 37

Lessons from Recent Studies

  • Skip connections
    • Global and local skip connections enable deep architectures & stable training
  • Upscaling methods
    • Post-upscaling with sub-pixel convolution is more efficient than pre-upscaling
    • However, these models are limited to single-scale SR
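The post-upscaling path mentioned above relies on sub-pixel convolution (pixel shuffle), which rearranges channels into spatial positions at the very end of the network. A minimal NumPy sketch of the rearrangement step (the convolution that produces the channels is omitted):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r).

    This is the channel-to-space step of sub-pixel convolution:
    each group of r*r channels fills an r x r block of output pixels.
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# 4 channels of a 2x2 feature map -> one 4x4 image (scale r=2)
feat = np.arange(16.0).reshape(4, 2, 2)
img = pixel_shuffle(feat, 2)
```

Because upscaling happens only in this last step, the whole body of the network runs at low resolution, which is what makes post-upscaling efficient.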

SRResNet (CVPR2017)

VDSR (CVPR2016)

4 of 37

EDSR

MDSR

5 of 37

4 Techniques for Better SR

Need Batch-Normalization?

Increasing model size

Better loss function

Geometric self-ensemble

EDSR

6 of 37

Need Batch-Normalization?

Empirical tests show that removing Batch-Normalization improves performance!

7 of 37

Need Batch-Normalization?

  • Unlike in classification problems, the input and output have similar distributions

  • In SR, normalizing intermediate features may not be desirable

  • Removing BN also saves ~40% of memory → the model can be enlarged
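A hedged sketch of the BN-free residual block described here, with 1×1 channel-mixing convolutions standing in for the real 3×3 ones to keep the example short:

```python
import numpy as np

def resblock_no_bn(x, w1, w2):
    """EDSR-style residual block: conv -> ReLU -> conv, identity skip.

    No batch-normalization layers: features are never re-normalized,
    so the input's range is preserved, and the memory BN would spend
    on per-layer statistics is saved instead.
    """
    h = np.maximum(0.0, np.einsum('oc,chw->ohw', w1, x))  # conv + ReLU
    h = np.einsum('oc,chw->ohw', w2, h)                   # conv, no BN
    return x + h                                          # identity skip

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))        # (channels, H, W)
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
y = resblock_no_bn(x, w1, w2)
```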

8 of 37

Increasing Model Size

  • Empirical tests show that increasing #features is better than increasing depth

  • Instability occurs when #features is increased to 256

Given limited memory, which design is better?

9 of 37

Increasing Model Size

  • Residual Scaling Layer
    • Increasing #features (up to 256) results in instability during training
    • A constant scaling layer after each residual path prevents such instability

Proposed in (Szegedy 2016), “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning”
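A minimal numeric sketch of the constant residual-scaling idea (a factor of 0.1 is the value used for the wide model):

```python
import numpy as np

def scaled_residual_add(x, residual, scale=0.1):
    # Constant scaling after the residual path: each block's update
    # stays small, so very wide (256-feature) models train stably.
    return x + scale * residual

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
res = 5.0 * rng.standard_normal(64)    # a large residual-branch output

plain = x + res                        # unscaled update
scaled = scaled_residual_add(x, res)   # scaled update
```

The scaling layer has no parameters; it simply multiplies the residual branch by a constant before the skip-connection addition.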

10 of 37

Loss Function: L1 vs L2

  • Is MSE (L2 loss) the best choice?
  • Comparison between different loss functions
    • EDSR baseline (16 res-blocks), scale ×2, tested on DIV2K images 791–800

MSE is not a good choice!
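For reference, the two losses being compared, with their gradients: L2 (MSE) penalizes errors quadratically and its gradient grows with the error, while L1's gradient has constant magnitude:

```python
import numpy as np

def l1_loss(pred, target):
    return np.abs(pred - target).mean()

def l2_loss(pred, target):           # MSE
    return ((pred - target) ** 2).mean()

def l1_grad(pred, target):
    # Constant-magnitude gradient, regardless of error size
    return np.sign(pred - target) / pred.size

def l2_grad(pred, target):
    # Gradient grows linearly with the error
    return 2.0 * (pred - target) / pred.size

pred = np.array([1.0, 3.0])
target = np.zeros(2)
```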

11 of 37

Geometric Self-Ensemble

  • Motivation
    • Model ensemble is nice, but expensive!
    • How can we achieve an ensemble effect while avoiding training new models?

  • Method
    • Transform the test image with flips and 90° rotations (8 variants)
    • Super-resolve all 8 and inverse-transform each output correspondingly
    • Average the 8 results

Proposed in (Timofte 2016), “Seven ways to improve example-based single-image super-resolution”
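The three steps above can be sketched directly (`model` is any single-image SR function; rotations and flips commute with upscaling, so the inverse transforms apply cleanly to the enlarged outputs):

```python
import numpy as np

def geometric_self_ensemble(model, lr):
    """Average the model over the 8 dihedral transforms of the input."""
    outs = []
    for flip in (False, True):
        img = np.fliplr(lr) if flip else lr
        for k in range(4):                 # 0/90/180/270-degree rotations
            y = model(np.rot90(img, k))
            y = np.rot90(y, -k)            # undo the rotation
            if flip:
                y = np.fliplr(y)           # undo the flip
            outs.append(y)
    return np.mean(outs, axis=0)

# Sanity check with a transform-equivariant "model": doubling pixel values
lr = np.arange(9.0).reshape(3, 3)
sr = geometric_self_ensemble(lambda x: 2.0 * x, lr)
```

This trades an 8× increase in test-time computation for an ensemble-like gain, without training any additional model.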

12 of 37

Geometric Self-Ensemble

13 of 37

EDSR Summary

  • Deeper & Wider: 32 ResBlocks and 256 channels
  • Global and local skip connections
  • Post-upscaling
  • No Batch-Normalization
  • Residual scaling
  • L1 loss function
  • Geometric self-ensemble (EDSR+)

14 of 37

EDSR

MDSR

15 of 37

Motivation

  • VDSR: Multi-scale SR in a single model
  • Multi-scale knowledge transfer

Efficient Multi-Scale Model

  • Designing MDSR
  • Single vs. Multi-scale learning
  • Train & Test method
  • EDSR vs. MDSR

MDSR

16 of 37

Motivation

SRCNN, VDSR: A single architecture regardless of upscaling factor

⇨ Multi-scale SR in a single model (VDSR)

FSRCNN, ESPCN, SRResNet: fast & efficient (late upsampling), but cannot handle multiple scales in a single model.

17 of 37

Motivation

FSRCNN, ESPCN, SRResNet

Different models for different scales?

  • Heavy training burden
  • Waste of parameters for similar tasks
  • Redundancy

18 of 37

Motivation

  • A pre-trained scale ×2 network greatly helps the training of scale ×3 and ×4 networks.

  • Super-resolution at multiple scales is a set of inter-related tasks!

Multi-scale knowledge transfer

19 of 37

Designing MDSR

How can we make EDSR (post-upscaling) handle multi-scale SR like VDSR?

Requirements

  1. Reduce the variance between the different scales → scale-specific pre-processing modules

  2. Share most parameters across scales → shared main branch

  3. Post-upscaling for efficiency → scale-specific up-samplers
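A structural sketch of this design, with stub functions standing in for the real modules (the names are illustrative, not the authors' code):

```python
import numpy as np

def mdsr_forward(x, scale, preproc, body, upsamplers):
    # Scale-specific pre-processing -> shared main branch ->
    # scale-specific up-sampler (post-upscaling).
    h = preproc[scale](x)       # reduces inter-scale variance
    h = body(h)                 # parameters shared by all scales
    return upsamplers[scale](h)

# Stubs: identity pre-processing/body, nearest-neighbour up-samplers
def nearest_up(r):
    return lambda x: np.kron(x, np.ones((r, r)))

preproc = {s: (lambda x: x) for s in (2, 3, 4)}
ups = {s: nearest_up(s) for s in (2, 3, 4)}

lr = np.zeros((8, 8))
sr = mdsr_forward(lr, 4, preproc, lambda h: h, ups)
```

Only the thin pre-processing modules and up-samplers are duplicated per scale; the deep main branch, which holds most of the parameters, is shared.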

20 of 37

Train and Test Method

  • Train
    • Only one of the 3 scale-specific branches is activated at each iteration
    • A mini-batch consists of single-scale patches

  • Test
    • Select one of the paths (①~③) according to the desired SR scale
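A minimal sketch of this batching rule (function names are hypothetical): each iteration draws one scale, so the whole mini-batch shares it and only that scale's branch receives gradients:

```python
import random

def train_iteration(sample_batch, forward, update, scales=(2, 3, 4)):
    # One MDSR training step: the mini-batch uses a single scale,
    # so exactly one pre-processing module and one up-sampler
    # (plus the shared main branch) are active this iteration.
    scale = random.choice(scales)
    lr_patches, hr_patches = sample_batch(scale)
    sr_patches = forward(lr_patches, scale)
    update(sr_patches, hr_patches)
    return scale

# Stub run: record which branch each iteration activates
random.seed(0)
seen = [train_iteration(lambda s: (s, s), lambda lr, s: lr, lambda a, b: None)
        for _ in range(30)]
```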

21 of 37

EDSR vs. MDSR

  • Performance: MDSR ≲ EDSR
  • # Parameters: MDSR << EDSR (almost ⅕! + MDSR can handle multiple scales in a single model)
  • Stability: MDSR << EDSR (we failed to increase #features even with residual scaling)

22 of 37

MDSR Summary

  • Very deep architecture: 80 ResBlocks
  • Most parameters are shared in main branch
  • Scale-specific pre-processing modules and up-samplers
  • Post-upscaling
  • No Batch-Normalization
  • L1 loss function
  • Geometric self-ensemble (MDSR+)

23 of 37

Results

24 of 37

Training Details

25 of 37

Quantitative Results

26 of 37

Qualitative Results

27 of 37

Qualitative Results

28 of 37

Qualitative Results

29 of 37

Qualitative Results

30 of 37

Qualitative Results

31 of 37

Unknown Track (Challenge)

32 of 37

Unknown Track (Challenge)

33 of 37

Extreme SR (up to x64)

1/64 Scale!

How about extreme cases?

34 of 37

Extreme SR (up to x64)

Bicubic

EDSR

NN

35 of 37

Extreme SR (up to x64)

Bicubic

EDSR

NN

36 of 37

Conclusion

  • State-of-the-art single image super-resolution system using a better ResNet structure

  • Techniques to build & train an extremely large model

  • A single network that deals with the multi-scale SR problem

37 of 37

Thank you!