1 of 20

CrowdSR: Enabling High-Quality Video Ingest in Crowdsourced Livecast via Super-Resolution

Zhenxiao Luo, Zelong Wang, Jinyu Chen, Miao Hu, Yipeng Zhou, Tom Z. J. Fu, Di Wu

2 of 20

CrowdSR: Enabling High-Quality Video Ingest in Crowdsourced Livecast via Super-Resolution

Zhenxiao Luo, Zelong Wang, Jinyu Chen,

Miao Hu, Yipeng Zhou, Tom Z. J. Fu, Di Wu

3 of 20

Introduction

  • Crowdsourced Livecast
    • increasingly attractive
    • younger generations
  • Twitch TV
    • 93 billion minutes per month
    • 3.18 million viewers per month
    • 9.71 million per month

4 of 20

Introduction

  • Livecast System
    • broadcaster
    • ingest server
    • content distribution network
    • viewer

5 of 20

Introduction

  • Neural-enhanced techniques
    • NAS
    • LiveNAS
  • Existing problem
    • existing work seldom considers collaboration among broadcasters

6 of 20

System Design

  • Challenge
    • different upstream bandwidth
    • different device capabilities
  • Motivation
    • deliver 1080p video when the uploaded video is 540p
    • address the lack of high-resolution video samples for SR model training

7 of 20

System Design

  • Framework Overview
    • Patch Selector
    • Online Trainer
    • SR Processor

8 of 20

System Design

  • Patch Selector
    • Reference patch selector
    • Training patch selector

9 of 20

System Design

  • Reference patch selector
    • divide a frame from target video into patches
    • calculate the mean squared error (MSE) against the patches of the previous frame
    • choose the top k patches as reference patches
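The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: frames are plain nested lists of grayscale values to keep it dependency-free, the helper names (`split_into_patches`, `patch_mse`, `select_reference_patches`) are hypothetical, and it assumes that "top k" means the k patches with the largest MSE, which the slide does not state.

```python
from typing import Dict, List, Tuple

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def split_into_patches(frame: Frame, size: int) -> Dict[Tuple[int, int], Frame]:
    """Map (patch_row, patch_col) to non-overlapping size x size pixel blocks."""
    patches = {}
    for top in range(0, len(frame) - size + 1, size):
        for left in range(0, len(frame[0]) - size + 1, size):
            block = [row[left:left + size] for row in frame[top:top + size]]
            patches[(top // size, left // size)] = block
    return patches


def patch_mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two equally sized patches."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def select_reference_patches(current: Frame, previous: Frame,
                             size: int, k: int) -> List[Tuple[int, int]]:
    """Return coordinates of the k patches with the largest MSE (assumed ranking)."""
    cur = split_into_patches(current, size)
    prev = split_into_patches(previous, size)
    ranked = sorted(cur, key=lambda pos: patch_mse(cur[pos], prev[pos]), reverse=True)
    return ranked[:k]


# Toy 4x4 frames: a large change in the top-right patch, a small one bottom-left.
previous_frame = [[0] * 4 for _ in range(4)]
current_frame = [row[:] for row in previous_frame]
current_frame[0][2] = 10
current_frame[3][0] = 4
reference_positions = select_reference_patches(current_frame, previous_frame, size=2, k=2)
```

If the intended ranking is instead the smallest MSE, dropping `reverse=True` is the only change needed.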

10 of 20

System Design

  • Training patch selector
    • divide a frame from similar videos into patches
    • calculate the pHash similarity between reference patches and candidate patches
    • choose the top m patches as training samples
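To make the matching step concrete, here is a simplified pHash-style sketch. It is an assumption-laden illustration rather than the implementation used in CrowdSR: it skips the resize step of a full perceptual hash, computes a naive DCT on small square patches, compares hashes by Hamming distance, and ranks each candidate by its distance to the closest reference patch. All function names are hypothetical.

```python
import math
from statistics import median
from typing import List

Patch = List[List[float]]  # small square grayscale patch


def low_freq_dct(patch: Patch, low: int) -> List[float]:
    """Naive 2D DCT-II, restricted to the low x low lowest-frequency coefficients.

    Coefficients are rounded so floating-point noise in flat directions
    cannot flip hash bits.
    """
    n = len(patch)
    cos = [[math.cos(math.pi * (2 * i + 1) * f / (2 * n)) for i in range(n)]
           for f in range(low)]
    return [
        round(sum(patch[x][y] * cos[u][x] * cos[v][y]
                  for x in range(n) for y in range(n)), 6)
        for u in range(low) for v in range(low)
    ]


def phash_bits(patch: Patch, low: int = 4) -> List[int]:
    """One bit per low-frequency coefficient: is it above the median?"""
    coeffs = low_freq_dct(patch, low)[1:]  # drop the DC term (overall brightness)
    mid = median(coeffs)
    return [int(c > mid) for c in coeffs]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))


def select_training_patches(references: List[Patch], candidates: List[Patch],
                            m: int) -> List[int]:
    """Return indices of the m candidates closest to any reference patch."""
    ref_hashes = [phash_bits(r) for r in references]

    def distance(i: int) -> int:
        h = phash_bits(candidates[i])
        return min(hamming(h, r) for r in ref_hashes)

    return sorted(range(len(candidates)), key=distance)[:m]


# Toy 8x8 patches: a horizontal ramp, a vertical ramp, and a brighter horizontal ramp.
ramp_h = [[7 - y for y in range(8)] for _ in range(8)]
ramp_v = [[7 - x] * 8 for x in range(8)]
brighter = [[value + 5 for value in row] for row in ramp_h]
chosen = select_training_patches([ramp_h], [ramp_v, brighter], m=1)
```

Because the DC term is dropped, the brighter copy of the reference hashes identically to it, so it is preferred over the differently oriented ramp.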

11 of 20

System Design

  • Online Trainer
    • load a pre-trained general model
    • fine-tune it using patches from the patch selector
    • update the model periodically

12 of 20

System Design

  • SR Processor
    • EDSR
    • enhance video quality
    • deliver high-resolution frame

13 of 20

Implementation

  • Main process
    • Receive video
    • aiortc, aiohttp
  • Online Trainer Process
    • Dataset update
    • Model training
    • PyTorch
  • SR process
    • Model update
    • Video quality enhancement
    • PyTorch, OpenCV
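One way the "Model update" step in the SR process could work is sketched below: before enhancing each frame, the SR loop checks a queue for newer models without blocking, so inference never waits on training. This is an assumption about the handoff, not a description of the actual CrowdSR code; `newest_model`, `sr_loop`, and the `enhance` callback are hypothetical names, and a standard-library `queue.Queue` stands in for whatever inter-process channel the system uses.

```python
import queue
from typing import Any, Callable, Iterable, List


def newest_model(updates: "queue.Queue", current: Any) -> Any:
    """Drain pending updates without blocking and return the most recent model."""
    while True:
        try:
            current = updates.get_nowait()
        except queue.Empty:
            return current


def sr_loop(frames: Iterable[Any], updates: "queue.Queue", model: Any,
            enhance: Callable[[Any, Any], Any]) -> List[Any]:
    """Enhance each frame with whichever model is newest when the frame arrives."""
    enhanced = []
    for frame in frames:
        model = newest_model(updates, model)
        enhanced.append(enhance(model, frame))
    return enhanced


# Toy run: two model versions were published before the first frame arrived.
pending = queue.Queue()
pending.put("model-v1")
pending.put("model-v2")
result = sr_loop(["frame-0", "frame-1"], pending, "model-v0",
                 enhance=lambda model, frame: (model, frame))
```

With separate processes, `multiprocessing.Queue` offers the same `get_nowait` call and also raises `queue.Empty`, so the draining logic would carry over unchanged.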

14 of 20

Evaluation

  • Datasets
    • Douyu and Bilibili
    • 1080p, 30fps
  • Metrics
    • Peak Signal-to-Noise Ratio (PSNR)
    • Structural Similarity Index (SSIM)
  • Baselines
    • Bicubic interpolation (BI)
    • General SR
    • Specialized SR

15 of 20

Evaluation

  • Server Specification
    • Intel Xeon Silver 4210R
    • 2 * NVIDIA RTX 3090
  • Training Parameter
    • Learning rate, 0.001
    • Batch size, 64
    • Epoch number, 100
    • Dataset size, 64000
    • Optimizer, Adam
    • Loss function, L1 loss

16 of 20

Evaluation

  • Average PSNR
    • Baseline (BI) is 29.48 dB
    • 0.42-1.09 dB improvement
  • Average SSIM
    • Baseline (BI) is 0.881
    • 0.006-0.014 improvement

17 of 20

Evaluation

  • PSNR change over time
    • PSNR sampled every second
    • CrowdSR outperforms BI and GeneralSR
    • SpecializedSR serves as the upper bound

18 of 20

Evaluation

  • Inference Latency
    • 28ms on average
    • 96% of frames take less than 33 ms
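The 33 ms threshold presumably corresponds to the per-frame time budget of the 30 fps evaluation videos (1000 ms / 30 ≈ 33.3 ms). A small sketch of that arithmetic, with hypothetical function names and made-up sample latencies:

```python
from typing import Sequence


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, for real-time playback."""
    return 1000.0 / fps


def share_within_budget(latencies_ms: Sequence[float], fps: float) -> float:
    """Fraction of frames whose inference latency is below the frame budget."""
    budget = frame_budget_ms(fps)
    return sum(latency < budget for latency in latencies_ms) / len(latencies_ms)


# Illustrative latencies only, not measurements from the evaluation.
sample_latencies = [28.0, 30.0, 35.0, 25.0]
sample_share = share_within_budget(sample_latencies, fps=30)
```

Frames that miss the budget would have to be buffered or dropped to keep up with the stream, which is why the share below the threshold matters alongside the average.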

19 of 20

Evaluation

  • GPU usage
    • most of the computation happens in the training step
    • the inference step is far less computationally intensive
    • takes about 4300 MB and 3800 MB of GPU memory

20 of 20

Conclusion

  • CrowdSR is a novel video ingest framework
    • Utilizes super-resolution techniques
    • Utilizes similar broadcasters' videos as training samples
    • Utilizes online training to optimize performance
  • CrowdSR is an effective video ingest framework
    • Achieves real-time video quality enhancement
    • Improves PSNR by 0.42-1.09 dB
    • Improves SSIM by 0.006-0.014