1 of 15

Large Kernel Distillation Network for Efficient Single Image Super-Resolution

Chengxing Xie1∗ Xiaoming Zhang1∗ Linze Li1 Haiteng Meng1

Tianlin Zhang2 Tianrui Li1 Xiaole Zhao1†

1 Southwest Jiaotong University, China  2 National Space Science Center, Chinese Academy of Sciences, China

2 of 15

Super-Resolution

[Figure: a ground-truth (GT) image is downsampled to a low-resolution (LR) image; super-resolution upsamples LR back to an SR estimate. The SR problem is ill-posed.]
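As a rough illustration of this setting, the sketch below (PyTorch and bicubic degradation are my assumptions, not necessarily the degradation used here) synthesizes an LR input from a GT image; since downsampling discards most pixels, many HR images map to the same LR image, which is why SR is ill-posed.

```python
import torch
import torch.nn.functional as F

def make_lr(gt: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Synthesize an LR image from a GT image by bicubic downsampling.

    gt: (N, C, H, W) tensor in [0, 1]; H and W are assumed divisible by `scale`.
    """
    return F.interpolate(gt, scale_factor=1.0 / scale,
                         mode="bicubic", align_corners=False)

# A 4x downsample keeps only 1/16 of the pixels, so many different GT images
# produce the same LR image -- recovering SR from LR alone is ill-posed.
gt = torch.rand(1, 3, 256, 256)
lr = make_lr(gt, scale=4)   # shape: (1, 3, 64, 64)
```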

3 of 15

Motivation

[1] Guo, M. H., Lu, C. Z., Liu, Z. N., Cheng, M. M., & Hu, S. M. (2022). Visual attention network. arXiv preprint arXiv:2202.09741.

Large-Kernel Decomposition
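A minimal sketch of the large-kernel decomposition proposed in VAN [1]: a 21×21 convolution is approximated by a 5×5 depth-wise conv, a 7×7 depth-wise conv with dilation 3, and a 1×1 point-wise conv, whose output serves as an attention map. Module and variable names are my own.

```python
import torch
import torch.nn as nn

class LKA(nn.Module):
    """Large Kernel Attention: a 21x21 receptive field from three cheap convs."""
    def __init__(self, channels: int):
        super().__init__()
        # 5x5 depth-wise conv (local context)
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # 7x7 depth-wise dilated conv, dilation 3 (long-range context)
        self.dw_d = nn.Conv2d(channels, channels, 7, padding=9,
                              dilation=3, groups=channels)
        # 1x1 point-wise conv (channel mixing)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pw(self.dw_d(self.dw(x)))
        return x * attn  # reweight the input features with the attention map

x = torch.rand(1, 48, 64, 64)
print(LKA(48)(x).shape)  # torch.Size([1, 48, 64, 64])
```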

4 of 15

Motivation

[2] Ding, X., Zhang, X., Han, J., & Ding, G. (2021). Diverse branch block: Building a convolution as an inception-like unit. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10886-10895).

Structural Re-parameterization
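A minimal sketch of the idea behind structural re-parameterization [2], using a single 3×3 + 1×1 branch pair as a stand-in for the full Diverse Branch Block: the multi-branch block used during training is algebraically folded into one convolution for inference, so the extra capacity costs nothing at test time.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv3 = nn.Conv2d(16, 16, 3, padding=1)   # training-time 3x3 branch
conv1 = nn.Conv2d(16, 16, 1)              # training-time parallel 1x1 branch

def merge(conv3, conv1):
    """Fold the parallel 1x1 branch into the 3x3 conv for inference."""
    fused = nn.Conv2d(16, 16, 3, padding=1)
    with torch.no_grad():
        # pad the 1x1 kernel to 3x3 (its value lands at the centre) and add weights
        fused.weight.copy_(conv3.weight + F.pad(conv1.weight, [1, 1, 1, 1]))
        fused.bias.copy_(conv3.bias + conv1.bias)
    return fused

x = torch.rand(1, 16, 32, 32)
y_train = conv3(x) + conv1(x)          # two-branch output used during training
y_infer = merge(conv3, conv1)(x)       # single-conv output used at inference
print(torch.allclose(y_train, y_infer, atol=1e-5))  # True: same function, fewer ops
```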

5 of 15

Architecture

Shallow Feature Extraction

Deep Feature Extraction

Image Reconstruction
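A minimal sketch of this three-stage layout (channel count, block count, and the plain conv blocks are placeholders, not LKDN's actual blocks): shallow feature extraction with one conv, a residual deep-feature stage, and pixel-shuffle reconstruction.

```python
import torch
import torch.nn as nn

class TinySR(nn.Module):
    """Shallow features -> deep features -> pixel-shuffle reconstruction."""
    def __init__(self, channels=48, num_blocks=8, scale=4):
        super().__init__()
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)   # shallow feature extraction
        self.deep = nn.Sequential(*[                          # deep feature extraction
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.GELU())
            for _ in range(num_blocks)])
        self.reconstruct = nn.Sequential(                     # image reconstruction
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, lr):
        feat = self.shallow(lr)
        feat = feat + self.deep(feat)   # global residual over the deep stage
        return self.reconstruct(feat)

print(TinySR()(torch.rand(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 256, 256])
```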

6 of 15

Architecture

[Block diagrams of the two attention modules: ESA (Conv, Down-Sample, Conv Groups, Up-Sample, Sigmoid) and CCA (Contrast pooling, Conv Groups, Sigmoid).]
Enhanced spatial attention (ESA)

Contrast-aware channel attention (CCA)

[3] Hui, Z., Gao, X., Yang, Y., & Wang, X. (2019). Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia (pp. 2024-2032).

[4] Liu, J., Zhang, W., Tang, Y., Tang, J., & Wu, G. (2020). Residual feature aggregation network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2359-2368).
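As a rough sketch of the contrast-aware channel attention from IMDN [3] (the reduction ratio and names are my placeholders): each channel is summarized by its contrast, i.e. spatial standard deviation plus mean, instead of plain average pooling, before the usual squeeze-and-excitation style gating.

```python
import torch
import torch.nn as nn

def contrast_pool(x: torch.Tensor) -> torch.Tensor:
    """Per-channel contrast: spatial standard deviation plus mean."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.var(dim=(2, 3), keepdim=True, unbiased=False).sqrt()
    return std + mean

class CCA(nn.Module):
    """Contrast-aware channel attention in the spirit of IMDN [3]."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())

    def forward(self, x):
        return x * self.body(contrast_pool(x))   # channel-wise reweighting

x = torch.rand(1, 48, 32, 32)
print(CCA(48)(x).shape)  # torch.Size([1, 48, 32, 32])
```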

7 of 15

Architecture

(a) LKDB: Large Kernel Distillation Block; (b) BSConv: Blueprint Separable Convolution; (c) LKA: Large Kernel Attention; (d) RBSB: Re-parameterized Blueprint Shallow Block.
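Of these blocks, BSConv has a particularly simple form: a 1×1 point-wise conv followed by a k×k depth-wise conv, i.e. the reverse order of a standard depth-wise separable conv. A sketch under my naming assumptions:

```python
import torch
import torch.nn as nn

class BSConv(nn.Module):
    """Blueprint Separable Convolution: 1x1 point-wise conv, then k x k depth-wise conv."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.pw = nn.Conv2d(in_ch, out_ch, 1)                       # channel mixing first
        self.dw = nn.Conv2d(out_ch, out_ch, kernel_size,
                            padding=kernel_size // 2, groups=out_ch) # per-channel spatial filter

    def forward(self, x):
        return self.dw(self.pw(x))

x = torch.rand(1, 48, 32, 32)
print(BSConv(48, 48)(x).shape)  # torch.Size([1, 48, 32, 32])
```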

8 of 15

Optimizer

Heavy-ball acceleration (HBA)

Nesterov Accelerated Gradient (NAG)

  • By estimating the "ahead-of-time" gradient, Adan can pre-perceive the geometric information around the current point (see the sketch after this list).

  • Adan can adapt to larger learning rates and batch sizes, leading to faster convergence.
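A toy sketch (quadratic objective and hyper-parameters chosen by me, not from the paper) contrasting the heavy-ball update with Nesterov's look-ahead gradient, the "ahead-of-time" ingredient that Adan [5] estimates without evaluating the gradient at an extra point.

```python
# Toy 1-D comparison of heavy-ball vs. Nesterov momentum on f(x) = x^2.
# The only difference: NAG evaluates the gradient at the look-ahead point
# x + mu * v, the "ahead-of-time gradient" that Adan approximates.
def grad(x):            # f(x) = x^2  ->  f'(x) = 2x
    return 2.0 * x

def heavy_ball(x, v, lr=0.1, mu=0.9):
    v = mu * v - lr * grad(x)            # gradient at the current point
    return x + v, v

def nesterov(x, v, lr=0.1, mu=0.9):
    v = mu * v - lr * grad(x + mu * v)   # gradient at the look-ahead point
    return x + v, v

x_hb, v_hb, x_nag, v_nag = 5.0, 0.0, 5.0, 0.0
for _ in range(50):
    x_hb, v_hb = heavy_ball(x_hb, v_hb)
    x_nag, v_nag = nesterov(x_nag, v_nag)
print(abs(x_hb), abs(x_nag))   # NAG ends closer to the minimum at 0
```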

[Training curves: Adam vs. Adan.]

[5] Xie, X., Zhou, P., Li, H., Lin, Z., & Yan, S. (2022). Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models. arXiv preprint arXiv:2208.06677.

9 of 15

Ablation Studies

  • Removing ESA and CCA leads to a large drop in performance.

  • Adding LKA can significantly improve performance with a small number of additional parameters.

  • Adjusting the number of network channels can slightly increase performance.

Table 1. Ablation study on large kernel attention.

10 of 15

Ablation Studies

Table 2. PSNR / SSIM comparison of different basic blocks in the feature distillation connections of LKDN-S.

  • Removing the shortcut and using structural re-parameterization can both increase performance.

11 of 15

Table 3. PSNR / SSIM comparison of applying Adam and Adan optimizers.

Ablation Studies

  • The Adan optimizer has a faster convergence rate and a shorter training time.

  • The Adan optimizer is less likely to fall into local optima than the Adam optimizer.

12 of 15

Performance

Table 5. Comparison with state-of-the-art methods; our training dataset is DF2K (2650 images).

13 of 15

Performance

Comparison of model performance and complexity on Urban100 for ×4 SR.

14 of 15

Performance

  • Clearer lines

  • Fewer artifacts

15 of 15

Thank you for your attention.

Q&A