1 of 13

FRR-Net: A Real-Time Blind Face Restoration and Relighting Network

Authors: Samira Pouyanfar, Sunando Sengupta, Mahmoud Mohammadi, Ebey Abraham, Brett Bloomquist, Lukas Dauterman, Anjali Parikh, Steve Lim, and Eric Sommerlade

Workshop: NTIRE 2023, the New Trends in Image Restoration and Enhancement workshop and associated challenges

2 of 13

Motivations

    • Enhancing the quality of faces in videos and images significantly improves the user experience in different applications:
      • Video conferencing
      • Mobile apps
    • Several conditions can degrade the image quality of the facial region, including:
      • Light/exposure (e.g., dark rooms, windows, lamps, etc.)
      • Camera focus blur and movement
      • Distance from the camera
      • Screen illumination on the face
    • Efficient models are needed for real-time applications

3 of 13

FRR-Net contributions:

  • A computationally efficient model to handle various distortions (low-light, blur, noise, illumination, etc.). This model includes:
    • A new synthetic degradation model to handle both face relighting and restoration
    • An extension of model compression, with a carefully designed network that balances depth and width for better quality and speed
    • A distortion-guided classifier that predicts the degradation type and uses that class information as a prior in the autoencoder
    • A face segmentation mask and Dice loss so the model focuses only on the face region (a loss sketch follows this list)
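As a rough illustration of the face-focused loss term, here is a minimal soft Dice loss over a predicted face mask, written in PyTorch; the tensor shapes and the epsilon smoothing term are illustrative assumptions, not the exact formulation used by FRR-Net:

import torch

def dice_loss(pred_mask: torch.Tensor, gt_mask: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss between a predicted face mask and a ground-truth mask.

    pred_mask, gt_mask: tensors of shape (B, 1, H, W) with values in [0, 1].
    """
    # Flatten the spatial dimensions per sample.
    pred = pred_mask.flatten(start_dim=1)
    gt = gt_mask.flatten(start_dim=1)
    intersection = (pred * gt).sum(dim=1)
    union = pred.sum(dim=1) + gt.sum(dim=1)
    dice = (2.0 * intersection + eps) / (union + eps)
    # Loss decreases as the predicted mask overlaps the ground-truth face region.
    return 1.0 - dice.mean()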

[Figure] Example input images captured under varying conditions using a custom webcam and the corresponding enhanced versions obtained by FRR-Net; panels include (e) Low Light + Blur/Noise

4 of 13

Degradation Model

FRR-Net recovers face quality under a variety of conditions. The synthetic degradation pipeline combines (a sketch follows this list):

    • Resize ↓ (bilinear, bicubic)
    • Exposure ξ (different lights/exposures)
    • Chromatic aberration
    • Noise N (Gaussian)
    • Blur k (Gaussian)
    • JPEG compression
    • Color jitter η
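A minimal sketch of how such a synthetic degradation pipeline can be assembled with NumPy, OpenCV, and Pillow; the operation order, parameter ranges, and the helper name degrade are illustrative assumptions rather than the paper's exact recipe:

import io

import cv2
import numpy as np
from PIL import Image

def degrade(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random mix of the listed degradations to an RGB uint8 image."""
    h, w = img.shape[:2]
    out = img.astype(np.float32) / 255.0

    # Exposure / low light: a random global gain darkens or brightens the frame.
    out = np.clip(out * rng.uniform(0.2, 1.2), 0.0, 1.0)

    # Color jitter: small per-channel gains mimic color casts.
    out = np.clip(out * rng.uniform(0.9, 1.1, size=(1, 1, 3)), 0.0, 1.0)

    # Chromatic aberration: shift one color channel by a pixel or two.
    out[:, :, 0] = np.roll(out[:, :, 0], int(rng.integers(0, 3)), axis=1)

    # Gaussian blur with a random sigma.
    out = cv2.GaussianBlur(out, (0, 0), sigmaX=rng.uniform(0.5, 3.0))

    # Downscale then upscale (bilinear or bicubic) to lose detail.
    scale = rng.uniform(0.25, 1.0)
    interp = int(rng.choice([cv2.INTER_LINEAR, cv2.INTER_CUBIC]))
    small = cv2.resize(out, (max(1, int(w * scale)), max(1, int(h * scale))), interpolation=interp)
    out = cv2.resize(small, (w, h), interpolation=interp)

    # Additive Gaussian noise.
    out = np.clip(out + rng.normal(0.0, rng.uniform(0.0, 0.05), out.shape), 0.0, 1.0)

    # JPEG compression artifacts at a random quality.
    buf = io.BytesIO()
    Image.fromarray((out * 255).astype(np.uint8)).save(buf, format="JPEG", quality=int(rng.integers(30, 95)))
    buf.seek(0)
    return np.asarray(Image.open(buf))

In training, a clean target face would be passed through a random mix of such operations to form the degraded input, paired with the untouched target shown below.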

[Figure] Synthetic input/target pairs for each degradation type: Low Light, Blurry, Noise, Illumination

5 of 13

Model Framework

    • Encoder: extracts rich local features via dense residual connections
    • Decoder (generator): generates the enhanced face and the face mask
    • Distortion Classifier: predicts the types of degradation present in the input image
    • Face segmentation and generation: a facial segmentation branch keeps the model focused on enhancing the facial region while predicting the face boundary
    • Model compression module: adjusts the input/output channels of the convolution layers as well as the number of residual layers in each dense block (a minimal sketch of this layout follows)
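A minimal PyTorch-style sketch of this layout; the class names, channel widths, block counts, and the way the classifier output is injected as a prior are illustrative assumptions, not the published FRR-Net configuration:

import torch
import torch.nn as nn

class DenseResidualBlock(nn.Module):
    """A small dense block whose depth (number of conv layers) is configurable."""
    def __init__(self, channels: int, layers: int):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels * (i + 1), channels, 3, padding=1) for i in range(layers)
        )
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        return x + feats[-1]  # residual connection

class FRRNetSketch(nn.Module):
    def __init__(self, base_channels=32, block_depth=4, num_blocks=4, num_distortions=5):
        super().__init__()
        c = base_channels  # width knob adjusted by the compression module
        self.encoder = nn.Sequential(
            nn.Conv2d(3, c, 3, padding=1),
            *[DenseResidualBlock(c, block_depth) for _ in range(num_blocks)],  # depth knob
        )
        # Distortion classifier: predicts degradation types from encoder features.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c, num_distortions)
        )
        # The predicted distortion distribution is mapped to a feature prior for the decoder.
        self.prior = nn.Linear(num_distortions, c)
        self.decoder = nn.Sequential(
            DenseResidualBlock(c, block_depth),
            nn.Conv2d(c, 4, 3, padding=1),  # 3 channels: enhanced face, 1 channel: face mask
        )

    def forward(self, x):
        feats = self.encoder(x)
        logits = self.classifier(feats)
        prior = self.prior(logits.softmax(dim=1)).unsqueeze(-1).unsqueeze(-1)
        out = self.decoder(feats + prior)
        face, mask = out[:, :3], torch.sigmoid(out[:, 3:])
        return face, mask, logits

Here base_channels and block_depth/num_blocks act as the width and depth knobs; shrinking them is the trade-off explored in the width-depth inference-time comparison later in the slides.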

[Figure] Model framework diagram, including the model compression module

6 of 13

Experiments

[Table] Comparison results on the StyleGAN validation data

[Table] Inference time and computational cost comparison

[Table] Comparison results on the CelebA validation data

  • FRR-Net provides a good balance of accuracy and latency, making it suitable for real-time image/video face restoration/relighting across CPU, GPU, and NPU hardware

[Table] Inference time on NPU for various width-depth versions

7 of 13

[Figure] Qualitative comparison columns: Input, Ground Truth, FRR-Net (ours), GFPGAN (CVPR 2021), VQFR (ECCV 2022), PANINI (AAAI 2022)

FRR-Net improves both low-light and noise/blur cases.

Our approach performs comparably to or better than other SOTA approaches.

8 of 13

FRR-Net Outputs on Real Samples

FRR-Net can remove the unnatural color cast on users' faces caused by screen content or other light sources.

[Figure] Four input/output pairs from real webcam captures

9 of 13

Video Demo: Low Light, Far from the Camera

[Video] Side-by-side comparison: Input vs. FRR-Net

10 of 13

Video Demo: Synthetic Noise + Light + Blur

[Video] Side-by-side comparison: Input vs. FRR-Net

11 of 13

Video Demo: All Distortions Combined

[Video] Side-by-side comparison: Input vs. FRR-Net

12 of 13

Limitations and Future Work

  • Extreme distortions remain challenging
  • Temporal consistency across video frames
  • Noise removal for the body and background regions


13 of 13

  • Q&A
