1 of 37

Enhanced Deep Residual Networks for Single Image Super-Resolution

Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee

Computer Vision Lab.

Dept. of ECE, ASRI, Seoul National University

http://cv.snu.ac.kr

2 of 37

SISR (Single Image Super Resolution)

Goal: restore a high-resolution (HR) image from a single low-resolution (LR) image

Low-resolution

image

High-resolution

image

Super-Resolution

3 of 37

Lessons from Recent Studies

  • Skip connections
    • Global and local skip connections enable deep architectures & stable training
  • Upscaling methods
    • Post-upscaling with sub-pixel convolution is more efficient than pre-upscaling
    • However, these models are limited to single-scale SR
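The post-upscaling path mentioned above relies on sub-pixel convolution (pixel shuffle), which rearranges channels into spatial positions at the very end of the network. A minimal NumPy sketch of the rearrangement step (the convolution that produces the channels is omitted):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r).

    This is the channel-to-space step of sub-pixel convolution:
    each group of r*r channels fills an r x r block of output pixels.
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# 4 channels of a 2x2 feature map -> one 4x4 image (scale r=2)
feat = np.arange(16.0).reshape(4, 2, 2)
img = pixel_shuffle(feat, 2)
```

Because upscaling happens only in this last step, the whole body of the network runs at low resolution, which is what makes post-upscaling efficient.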

SRResNet (CVPR2017)

VDSR (CVPR2016)

4 of 37

EDSR

MDSR

5 of 37

4 Techniques for Better SR

Need Batch-Normalization?

Increasing model size

Better loss function

Geometric self-ensemble

EDSR

6 of 37

Need Batch-Normalization?

Empirical tests show that removing Batch-Normalization improves performance!

7 of 37

Need Batch-Normalization?

  • Unlike in classification problems, the input and output have similar distributions

  • In SR, normalizing intermediate features may not be desirable

  • Removing BN also saves ~40% of memory → the model can be enlarged
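A hedged sketch of the BN-free residual block described here, with 1×1 channel-mixing convolutions standing in for the real 3×3 ones to keep the example short:

```python
import numpy as np

def resblock_no_bn(x, w1, w2):
    """EDSR-style residual block: conv -> ReLU -> conv, identity skip.

    No batch-normalization layers: features are never re-normalized,
    so the input's range is preserved, and the memory BN would spend
    on per-layer statistics is saved instead.
    """
    h = np.maximum(0.0, np.einsum('oc,chw->ohw', w1, x))  # conv + ReLU
    h = np.einsum('oc,chw->ohw', w2, h)                   # conv, no BN
    return x + h                                          # identity skip

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))        # (channels, H, W)
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
y = resblock_no_bn(x, w1, w2)
```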

8 of 37

Increasing Model Size

  • Empirical tests show that increasing #features is better than increasing depth

  • Instability occurs when #features is increased to 256

Given limited memory, which design is better?

9 of 37

Increasing Model Size

  • Residual Scaling Layer
    • Increasing #features (up to 256) results in instability during training
    • A constant scaling layer after each residual path prevents such instability

Proposed in (Szegedy 2016), “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning”
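A minimal numeric sketch of the constant residual-scaling idea (a factor of 0.1 is the value used for the wide model):

```python
import numpy as np

def scaled_residual_add(x, residual, scale=0.1):
    # Constant scaling after the residual path: each block's update
    # stays small, so very wide (256-feature) models train stably.
    return x + scale * residual

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
res = 5.0 * rng.standard_normal(64)    # a large residual-branch output

plain = x + res                        # unscaled update
scaled = scaled_residual_add(x, res)   # scaled update
```

The scaling layer has no parameters; it simply multiplies the residual branch by a constant before the skip-connection addition.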

10 of 37

Loss Function: L1 vs L2

  • Is MSE (L2 loss) the best choice?
  • Comparison between different loss functions
    • EDSR baseline (16 res-blocks), scale ×2, tested on DIV2K images 791–800

MSE is not a good choice!
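For reference, the two losses being compared, with their gradients: L2 (MSE) penalizes errors quadratically and its gradient grows with the error, while L1's gradient has constant magnitude:

```python
import numpy as np

def l1_loss(pred, target):
    return np.abs(pred - target).mean()

def l2_loss(pred, target):           # MSE
    return ((pred - target) ** 2).mean()

def l1_grad(pred, target):
    # Constant-magnitude gradient, regardless of error size
    return np.sign(pred - target) / pred.size

def l2_grad(pred, target):
    # Gradient grows linearly with the error
    return 2.0 * (pred - target) / pred.size

pred = np.array([1.0, 3.0])
target = np.zeros(2)
```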

11 of 37

Geometric Self-Ensemble

  • Motivation
    • Model ensemble is nice, but expensive!
    • How can we achieve an ensemble effect while avoiding training new models?

  • Method
    • Transform the test image with flips and 90° rotations (8 variants)
    • Super-resolve all 8 and inverse-transform each output correspondingly
    • Average the 8 results

Proposed in (Timofte 2016), “Seven ways to improve example-based single-image super-resolution”
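The three steps above can be sketched directly (`model` is any single-image SR function; rotations and flips commute with upscaling, so the inverse transforms apply cleanly to the enlarged outputs):

```python
import numpy as np

def geometric_self_ensemble(model, lr):
    """Average the model over the 8 dihedral transforms of the input."""
    outs = []
    for flip in (False, True):
        img = np.fliplr(lr) if flip else lr
        for k in range(4):                 # 0/90/180/270-degree rotations
            y = model(np.rot90(img, k))
            y = np.rot90(y, -k)            # undo the rotation
            if flip:
                y = np.fliplr(y)           # undo the flip
            outs.append(y)
    return np.mean(outs, axis=0)

# Sanity check with a transform-equivariant "model": doubling pixel values
lr = np.arange(9.0).reshape(3, 3)
sr = geometric_self_ensemble(lambda x: 2.0 * x, lr)
```

This trades an 8× increase in test-time computation for an ensemble-like gain, without training any additional model.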

12 of 37

Geometric Self-Ensemble

13 of 37

EDSR Summary

  • Deeper & Wider: 32 ResBlocks and 256 channels
  • Global and local skip connections
  • Post-upscaling
  • No Batch-Normalization
  • Residual scaling
  • L1 loss function
  • Geometric self-ensemble (EDSR+)

14 of 37

EDSR

MDSR

15 of 37

Motivation

  • VDSR: Multi-scale SR in a single model
  • Multi-scale knowledge transfer

Efficient Multi-Scale Model

  • Designing MDSR
  • Single vs. Multi-scale learning
  • Train & Test method
  • EDSR vs. MDSR

MDSR

16 of 37

Motivation

SRCNN, VDSR: A single architecture regardless of upscaling factor

⇨ Multi-scale SR in a single model (VDSR)

FSRCNN, ESPCN, SRResNet: fast & efficient (late upsampling), but cannot handle multiple scales in a single model.

17 of 37

Motivation

FSRCNN, ESPCN, SRResNet

Different models for different scales?

  • Heavy training burden
  • Waste of parameters for similar tasks
  • Redundancy

18 of 37

Motivation

  • A pre-trained scale ×2 network greatly helps the training of scale ×3 and ×4 networks.

  • Super-resolution at multiple scales is a set of inter-related tasks!

Multi-scale knowledge transfer

19 of 37

Designing MDSR

How can we make EDSR (post-upscaling) handle multi-scale SR like VDSR?

Requirements

  1. Reduce the variance between the different scales → scale-specific pre-processing modules

  2. Share most parameters across scales → shared main branch

  3. Post-upscaling for efficiency → scale-specific up-samplers
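A structural sketch of this design, with stub functions standing in for the real modules (the names are illustrative, not the authors' code):

```python
import numpy as np

def mdsr_forward(x, scale, preproc, body, upsamplers):
    # Scale-specific pre-processing -> shared main branch ->
    # scale-specific up-sampler (post-upscaling).
    h = preproc[scale](x)       # reduces inter-scale variance
    h = body(h)                 # parameters shared by all scales
    return upsamplers[scale](h)

# Stubs: identity pre-processing/body, nearest-neighbour up-samplers
def nearest_up(r):
    return lambda x: np.kron(x, np.ones((r, r)))

preproc = {s: (lambda x: x) for s in (2, 3, 4)}
ups = {s: nearest_up(s) for s in (2, 3, 4)}

lr = np.zeros((8, 8))
sr = mdsr_forward(lr, 4, preproc, lambda h: h, ups)
```

Only the thin pre-processing modules and up-samplers are duplicated per scale; the deep main branch, which holds most of the parameters, is shared.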

20 of 37

Train and Test Method

  • Train
    • Only one of the 3 scale-specific branches is activated at each iteration
    • A mini-batch consists of single-scale patches

  • Test
    • Select one of the paths (①~③) according to the desired SR scale
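A minimal sketch of this batching rule (function names are hypothetical): each iteration draws one scale, so the whole mini-batch shares it and only that scale's branch receives gradients:

```python
import random

def train_iteration(sample_batch, forward, update, scales=(2, 3, 4)):
    # One MDSR training step: the mini-batch uses a single scale,
    # so exactly one pre-processing module and one up-sampler
    # (plus the shared main branch) are active this iteration.
    scale = random.choice(scales)
    lr_patches, hr_patches = sample_batch(scale)
    sr_patches = forward(lr_patches, scale)
    update(sr_patches, hr_patches)
    return scale

# Stub run: record which branch each iteration activates
random.seed(0)
seen = [train_iteration(lambda s: (s, s), lambda lr, s: lr, lambda a, b: None)
        for _ in range(30)]
```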

21 of 37

EDSR vs. MDSR

  • Performance: MDSR ≲ EDSR
  • # Parameters: MDSR << EDSR (almost ⅕! + MDSR can handle multiple scales in a single model)
  • Stability: MDSR << EDSR (we failed to increase #features even with residual scaling)

22 of 37

MDSR Summary

  • Very deep architecture: 80 ResBlocks
  • Most parameters are shared in main branch
  • Scale-specific pre-processing modules and up-samplers
  • Post-upscaling
  • No Batch-Normalization
  • L1 loss function
  • Geometric self-ensemble (MDSR+)

23 of 37

Results

24 of 37

Training Details

25 of 37

Quantitative Results

26 of 37

Qualitative Results

27 of 37

Qualitative Results

28 of 37

Qualitative Results

29 of 37

Qualitative Results

30 of 37

Qualitative Results

31 of 37

Unknown Track (Challenge)

32 of 37

Unknown Track (Challenge)

33 of 37

Extreme SR (up to x64)

1/64 Scale!

How about extreme cases?

34 of 37

Extreme SR (up to x64)

Bicubic

EDSR

NN

35 of 37

Extreme SR (up to x64)

Bicubic

EDSR

NN

36 of 37

Conclusion

  • State-of-the-art single image super-resolution system using a better ResNet structure

  • Techniques to build & train an extremely large model

  • A single network that deals with the multi-scale SR problem

37 of 37

Thank you!