1 of 15

Large Kernel Distillation Network for Efficient Single Image Super-Resolution

Chengxing Xie1∗ Xiaoming Zhang1∗ Linze Li1 Haiteng Meng1

Tianlin Zhang2 Tianrui Li1 Xiaole Zhao1†

1 Southwest Jiaotong University, China  2 National Space Science Center, Chinese Academy of Sciences, China

2 of 15

Super-Resolution

[Figure: a ground-truth (GT) image is downsampled to a low-resolution (LR) image; super-resolution upsamples LR back to an SR estimate. The SR problem is ill-posed.]
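As a rough illustration of this setting, the sketch below (PyTorch and bicubic degradation are my assumptions, not necessarily the degradation used here) synthesizes an LR input from a GT image; since downsampling discards most pixels, many HR images map to the same LR image, which is why SR is ill-posed.

```python
import torch
import torch.nn.functional as F

def make_lr(gt: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Synthesize an LR image from a GT image by bicubic downsampling.

    gt: (N, C, H, W) tensor in [0, 1]; H and W are assumed divisible by `scale`.
    """
    return F.interpolate(gt, scale_factor=1.0 / scale,
                         mode="bicubic", align_corners=False)

# A 4x downsample keeps only 1/16 of the pixels, so many different GT images
# produce the same LR image -- recovering SR from LR alone is ill-posed.
gt = torch.rand(1, 3, 256, 256)
lr = make_lr(gt, scale=4)   # shape: (1, 3, 64, 64)
```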

3 of 15

Motivation

[1] Guo, M. H., Lu, C. Z., Liu, Z. N., Cheng, M. M., & Hu, S. M. (2022). Visual attention network. arXiv preprint arXiv:2202.09741.

Large-Kernel Decomposition
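A minimal sketch of the large-kernel decomposition proposed in VAN [1]: a 21×21 convolution is approximated by a 5×5 depth-wise conv, a 7×7 depth-wise conv with dilation 3, and a 1×1 point-wise conv, whose output serves as an attention map. Module and variable names are my own.

```python
import torch
import torch.nn as nn

class LKA(nn.Module):
    """Large Kernel Attention: a 21x21 receptive field from three cheap convs."""
    def __init__(self, channels: int):
        super().__init__()
        # 5x5 depth-wise conv (local context)
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # 7x7 depth-wise dilated conv, dilation 3 (long-range context)
        self.dw_d = nn.Conv2d(channels, channels, 7, padding=9,
                              dilation=3, groups=channels)
        # 1x1 point-wise conv (channel mixing)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pw(self.dw_d(self.dw(x)))
        return x * attn  # reweight the input features with the attention map

x = torch.rand(1, 48, 64, 64)
print(LKA(48)(x).shape)  # torch.Size([1, 48, 64, 64])
```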

4 of 15

Motivation

[2] Ding, X., Zhang, X., Han, J., & Ding, G. (2021). Diverse branch block: Building a convolution as an inception-like unit. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10886-10895).

Structural Re-parameterization
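A minimal sketch of the idea behind structural re-parameterization [2], using a single 3×3 + 1×1 branch pair as a stand-in for the full Diverse Branch Block: the multi-branch block used during training is algebraically folded into one convolution for inference, so the extra capacity costs nothing at test time.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv3 = nn.Conv2d(16, 16, 3, padding=1)   # training-time 3x3 branch
conv1 = nn.Conv2d(16, 16, 1)              # training-time parallel 1x1 branch

def merge(conv3, conv1):
    """Fold the parallel 1x1 branch into the 3x3 conv for inference."""
    fused = nn.Conv2d(16, 16, 3, padding=1)
    with torch.no_grad():
        # pad the 1x1 kernel to 3x3 (its value lands at the centre) and add weights
        fused.weight.copy_(conv3.weight + F.pad(conv1.weight, [1, 1, 1, 1]))
        fused.bias.copy_(conv3.bias + conv1.bias)
    return fused

x = torch.rand(1, 16, 32, 32)
y_train = conv3(x) + conv1(x)          # two-branch output used during training
y_infer = merge(conv3, conv1)(x)       # single-conv output used at inference
print(torch.allclose(y_train, y_infer, atol=1e-5))  # True: same function, fewer ops
```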

5 of 15

Architecture

Shallow Feature Extraction

Deep Feature Extraction

Image Reconstruction
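A minimal sketch of this three-stage layout (channel count, block count, and the plain conv blocks are placeholders, not LKDN's actual blocks): shallow feature extraction with one conv, a residual deep-feature stage, and pixel-shuffle reconstruction.

```python
import torch
import torch.nn as nn

class TinySR(nn.Module):
    """Shallow features -> deep features -> pixel-shuffle reconstruction."""
    def __init__(self, channels=48, num_blocks=8, scale=4):
        super().__init__()
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)   # shallow feature extraction
        self.deep = nn.Sequential(*[                          # deep feature extraction
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.GELU())
            for _ in range(num_blocks)])
        self.reconstruct = nn.Sequential(                     # image reconstruction
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, lr):
        feat = self.shallow(lr)
        feat = feat + self.deep(feat)   # global residual over the deep stage
        return self.reconstruct(feat)

print(TinySR()(torch.rand(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 256, 256])
```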

6 of 15

Architecture

[Block diagrams of the two attention modules: ESA (Conv, Down-Sample, Conv Groups, Up-Sample, Sigmoid) and CCA (Contrast pooling, Conv Groups, Sigmoid).]
Enhanced spatial attention (ESA)

Contrast-aware channel attention (CCA)

[3] Hui, Z., Gao, X., Yang, Y., & Wang, X. (2019). Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia (pp. 2024-2032).

[4] Liu, J., Zhang, W., Tang, Y., Tang, J., & Wu, G. (2020). Residual feature aggregation network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2359-2368).
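As a rough sketch of the contrast-aware channel attention from IMDN [3] (the reduction ratio and names are my placeholders): each channel is summarized by its contrast, i.e. spatial standard deviation plus mean, instead of plain average pooling, before the usual squeeze-and-excitation style gating.

```python
import torch
import torch.nn as nn

def contrast_pool(x: torch.Tensor) -> torch.Tensor:
    """Per-channel contrast: spatial standard deviation plus mean."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.var(dim=(2, 3), keepdim=True, unbiased=False).sqrt()
    return std + mean

class CCA(nn.Module):
    """Contrast-aware channel attention in the spirit of IMDN [3]."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())

    def forward(self, x):
        return x * self.body(contrast_pool(x))   # channel-wise reweighting

x = torch.rand(1, 48, 32, 32)
print(CCA(48)(x).shape)  # torch.Size([1, 48, 32, 32])
```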

7 of 15

Architecture

(a) LKDB: Large Kernel Distillation Block; (b) BSConv: Blueprint Separable Convolution; (c) LKA: Large Kernel Attention; (d) RBSB: Re-parameterized Blueprint Shallow Block.
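Of these blocks, BSConv has a particularly simple form: a 1×1 point-wise conv followed by a k×k depth-wise conv, i.e. the reverse order of a standard depth-wise separable conv. A sketch under my naming assumptions:

```python
import torch
import torch.nn as nn

class BSConv(nn.Module):
    """Blueprint Separable Convolution: 1x1 point-wise conv, then k x k depth-wise conv."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.pw = nn.Conv2d(in_ch, out_ch, 1)                       # channel mixing first
        self.dw = nn.Conv2d(out_ch, out_ch, kernel_size,
                            padding=kernel_size // 2, groups=out_ch) # per-channel spatial filter

    def forward(self, x):
        return self.dw(self.pw(x))

x = torch.rand(1, 48, 32, 32)
print(BSConv(48, 48)(x).shape)  # torch.Size([1, 48, 32, 32])
```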

8 of 15

Optimizer

Heavy-ball acceleration (HBA)

Nesterov Accelerated Gradient (NAG)

  • By estimating the "ahead-of-time" gradient, Adan can pre-perceive the geometric information around the current point (see the sketch after this list).

  • Adan can adapt to larger learning rates and batch sizes, leading to faster convergence.
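A toy sketch (quadratic objective and hyper-parameters chosen by me, not from the paper) contrasting the heavy-ball update with Nesterov's look-ahead gradient, the "ahead-of-time" ingredient that Adan [5] estimates without evaluating the gradient at an extra point.

```python
# Toy 1-D comparison of heavy-ball vs. Nesterov momentum on f(x) = x^2.
# The only difference: NAG evaluates the gradient at the look-ahead point
# x + mu * v, the "ahead-of-time gradient" that Adan approximates.
def grad(x):            # f(x) = x^2  ->  f'(x) = 2x
    return 2.0 * x

def heavy_ball(x, v, lr=0.1, mu=0.9):
    v = mu * v - lr * grad(x)            # gradient at the current point
    return x + v, v

def nesterov(x, v, lr=0.1, mu=0.9):
    v = mu * v - lr * grad(x + mu * v)   # gradient at the look-ahead point
    return x + v, v

x_hb, v_hb, x_nag, v_nag = 5.0, 0.0, 5.0, 0.0
for _ in range(50):
    x_hb, v_hb = heavy_ball(x_hb, v_hb)
    x_nag, v_nag = nesterov(x_nag, v_nag)
print(abs(x_hb), abs(x_nag))   # NAG ends closer to the minimum at 0
```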

[Training curves: Adam vs. Adan.]

[5] Xie, X., Zhou, P., Li, H., Lin, Z., & Yan, S. (2022). Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models. arXiv preprint arXiv:2208.06677.

9 of 15

Ablation Studies

  • Removing ESA and CCA leads to a large drop in performance.

  • Adding LKA can significantly improve performance with a small number of additional parameters.

  • Adjusting the number of network channels can slightly increase performance.

Table 1. Ablation study on large kernel attention.

10 of 15

Ablation Studies

Table 2. PSNR / SSIM comparison of different basic blocks in the feature distillation connections of LKDN-S.

  • Removing the shortcut and using structural re-parameterization can both increase performance.

11 of 15

Table 3. PSNR / SSIM comparison of applying Adam and Adan optimizers.

Ablation Studies

  • The Adan optimizer has a faster convergence rate and a shorter training time.

  • The Adan optimizer is less likely to fall into local optima than the Adam optimizer.

12 of 15

Performance

Table 5. Comparison with state-of-the-art methods; our training dataset is DF2K (2650 images).

13 of 15

Performance

Comparison of model performance and complexity on Urban100 for ×4 SR.

14 of 15

Performance

  • Clearer lines

  • Fewer artifacts

15 of 15

Thank you for your attention.

Q&A