1 of 26

Sensor-independent illumination estimation for DNN models

Mahmoud Afifi¹ and Michael S. Brown¹,²

¹York University  ²Samsung AI Center - Toronto

2 of 26

Onboard camera processing

When we capture a photograph, a number of steps are applied onboard the camera to produce the final sRGB output image.

[Figure: camera imaging pipeline from raw-RGB to sRGB; figure from Karaimer and Brown, ECCV 2016.]

3 of 26

A key routine in the pipeline is white balance

When we capture a photograph, a number of steps are applied onboard the camera to produce the final sRGB output image.

[Figure: camera pipeline from raw-RGB to sRGB.]

4 of 26

White balance is called computational colour constancy in computer vision

White balance is applied to remove the colour cast caused by the scene illumination.
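White balance is typically applied as a diagonal (von Kries-style) correction: each channel of the raw-RGB image is divided by the corresponding component of the estimated illuminant. The numpy sketch below is a minimal illustration, assuming a linear raw-RGB image normalised to [0, 1]; it is not any camera's actual implementation.

```python
import numpy as np

def white_balance(raw_img, illuminant):
    """Diagonal (von Kries-style) white-balance correction.

    raw_img:    HxWx3 linear raw-RGB image, values in [0, 1].
    illuminant: length-3 RGB estimate of the scene illuminant colour.
    """
    gains = np.asarray(illuminant, dtype=np.float64)
    gains = gains / gains[1]               # normalise so the green channel is untouched
    corrected = raw_img / gains            # divide each channel by its illuminant component
    return np.clip(corrected, 0.0, 1.0)    # keep the result in the valid range
```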

5 of 26

Illumination estimation

[Figure: a raw-RGB image and the corresponding "white-balanced" raw-RGB image.]

6 of 26

Illumination estimation

[Figure: raw-RGB image → illumination estimation algorithm → "white-balanced" raw-RGB image.]

7 of 26

White balance is applied to raw-RGB

When we capture a photograph, a number of steps are applied onboard the camera to produce the final sRGB output image.

[Figure: camera pipeline from raw-RGB to sRGB; white balance is applied in the raw-RGB stage.]

8 of 26

Illumination estimation methods

Two main approaches to illumination estimation

Sensor-independent approaches rely on image colour statistics:
  • Grey-world [1980] (a minimal sketch follows this list)
  • White-patch [1986]
  • Shades-of-grey [2004]
  • Grey-edge [2007]
  • PCA [2010]

Sensor-dependent approaches are learning-based, trained on data:
  • Gamut-based [1990]
  • Bayesian methods [2008]
  • Bias-correction [2013]
  • Decision trees [2014]
  • DNNs [2015 - foreseeable future]
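As a concrete example of the statistics-based family above, here is a minimal numpy sketch of Grey-world: it assumes the average scene reflectance is achromatic, so the per-channel means of the image reveal the illuminant colour.

```python
import numpy as np

def grey_world(raw_img):
    """Grey-world illuminant estimate.

    raw_img: HxWx3 linear raw-RGB image.
    Returns a unit-norm RGB vector pointing in the direction of the illuminant.
    """
    ell = raw_img.reshape(-1, 3).mean(axis=0)   # per-channel means
    return ell / np.linalg.norm(ell)            # only the direction (chromaticity) matters
```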

9 of 26

DNN-based approaches

DNN models are trained per sensor.

[Figure: a raw-RGB image from sensor X is white balanced by a DNN model trained for sensor X; a raw-RGB image from sensor Y requires a separate DNN model trained for sensor Y.]

 

10 of 26

DNN-based approaches

Different sensors have different spectral sensitivities, so the same scene yields different raw-RGB values on each camera; this is why DNN models are trained per sensor.

[Figure: R, G, B spectral sensitivity curves (sensitivity vs. wavelength) for two sensors, alongside corresponding Canon 600D and Nikon D5200 raw-RGB images; images from the NUS-8 illumination estimation dataset.]

11 of 26

Our proposed method

Our sensor-independent framework handles raw-RGB images from any sensor (sensor X, sensor Y, ...) with a single model. It consists of (1) a sensor-mapping network and (2) an illumination estimation network.

[Figure: raw-RGB images from sensor X and sensor Y both pass through the single sensor-independent framework to produce white-balanced images.]

12 of 26

Overall framework and network architecture

 

 

[Figure: overall framework. An RGB-uv histogram of the input raw-RGB image feeds the sensor mapping network (conv/ReLU 128 5×5, stride 2 → conv/ReLU 256 3×3, stride 2 → conv/ReLU 512 2×2, stride 1 → fc 9), which outputs the image mapping matrix M used to map the input image from its original raw space to the learned working space. An RGB-uv histogram of the mapped image feeds the illuminant estimation network (same conv stack → fc 3). The estimate is then mapped back to the original raw space with M⁻¹ to give the final estimated illuminant and the white-balanced image.]
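The following PyTorch sketch approximates the diagram above. The layer sizes follow the slide; everything else (the histogram size, recomputing the mapped image's histogram inside the forward pass through a `hist_fn` callable, and using `LazyLinear` to avoid hard-coding the flattened feature size) is an assumption rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn

def hist_trunk(out_features):
    """Conv/ReLU stack from the slide: 128 5x5 (stride 2) -> 256 3x3 (stride 2)
    -> 512 2x2 (stride 1), followed by a fully connected layer."""
    return nn.Sequential(
        nn.Conv2d(3, 128, kernel_size=5, stride=2), nn.ReLU(),
        nn.Conv2d(128, 256, kernel_size=3, stride=2), nn.ReLU(),
        nn.Conv2d(256, 512, kernel_size=2, stride=1), nn.ReLU(),
        nn.Flatten(),
        nn.LazyLinear(out_features),  # fc 9 for the mapping matrix, fc 3 for the illuminant
    )

class SensorIndependentFramework(nn.Module):
    """Sensor mapping network + illuminant estimation network."""

    def __init__(self):
        super().__init__()
        self.mapping_net = hist_trunk(out_features=9)  # predicts the 3x3 matrix M
        self.illum_net = hist_trunk(out_features=3)    # predicts the working-space illuminant

    def forward(self, hist_raw, pixels, hist_fn):
        """hist_raw: Nx3xHxW RGB-uv histograms of the input raw-RGB images.
        pixels:     Nx3xP flattened raw-RGB pixel values.
        hist_fn:    callable that builds Nx3xHxW RGB-uv histograms from Nx3xP pixels.
        """
        M = self.mapping_net(hist_raw).view(-1, 3, 3)        # image-specific mapping matrix
        mapped = torch.bmm(M, pixels)                        # raw space -> learned working space
        ell_work = self.illum_net(hist_fn(mapped))           # illuminant in the working space
        ell = torch.bmm(torch.inverse(M),                    # map the estimate back with M^-1
                        ell_work.unsqueeze(-1)).squeeze(-1)
        return ell / ell.norm(dim=1, keepdim=True)           # unit-norm illuminant per image
```

The white-balanced image then follows by applying the diagonal correction from the earlier sketch with this estimate.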

 

 

 

 

13 of 26

Starting point: RGB–uv histogram block

We build on the RGB-uv histogram feature from [1,2] by adding two learnable parameters to control the contribution of each color channel in the generated histogram and the smoothness of histogram bins.

[1] J. Barron, Convolutional Color Constancy, ICCV, 2015.

[2] M. Afifi et al., When Color Constancy Goes Wrong, CVPR, 2019.

[Figure: an input raw-RGB image and its generated RGB-uv histogram.]
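For reference, here is a simplified numpy sketch of the RGB-uv histogram feature. It uses hard binning and a fixed `channel_scale` argument as stand-ins for the learnable bin-smoothness and per-channel contribution parameters described above, and the default bin count is an arbitrary choice here, so read it as an approximation of the block rather than the exact feature.

```python
import numpy as np

def rgb_uv_hist(raw_img, nbins=61, bound=3.0, channel_scale=(1.0, 1.0, 1.0)):
    """Simplified RGB-uv histogram (after [1, 2] above).

    Each colour channel gets its own log-chrominance (u, v) plane; every pixel
    votes in each plane with a weight proportional to its intensity.
    Returns an nbins x nbins x 3 array, L2-normalised.
    """
    eps = 1e-6
    rgb = raw_img.reshape(-1, 3).astype(np.float64) + eps
    intensity = np.sqrt((rgb ** 2).sum(axis=1))          # per-pixel brightness weight
    edges = np.linspace(-bound, bound, nbins + 1)        # shared u/v bin edges

    hist = np.zeros((nbins, nbins, 3))
    for c in range(3):
        c1, c2 = (c + 1) % 3, (c + 2) % 3
        u = np.log(rgb[:, c] / rgb[:, c1])               # log-chrominance coordinates
        v = np.log(rgb[:, c] / rgb[:, c2])
        layer, _, _ = np.histogram2d(u, v, bins=[edges, edges], weights=intensity)
        hist[:, :, c] = channel_scale[c] * layer         # stand-in for the learnable scale
    return hist / (np.linalg.norm(hist) + eps)           # normalise the whole feature
```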

14 of 26

Sensor space mapping network

[Figure: the input image's RGB-uv histogram feeds conv/ReLU 128 5×5 (stride 2) → conv/ReLU 256 3×3 (stride 2) → conv/ReLU 512 2×2 (stride 1) → fc 9, i.e., the nine entries of the 3×3 mapping matrix M.]
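A short numpy sketch of how the nine fc outputs could be turned into the per-pixel colour mapping shown above; any normalisation of M used in the actual implementation is omitted.

```python
import numpy as np

def map_to_working_space(raw_img, m_vec):
    """Reshape the fc-9 output into a 3x3 matrix and apply it to every pixel.

    raw_img: HxWx3 raw-RGB image in the original sensor space.
    m_vec:   the nine values predicted by the sensor mapping network.
    """
    M = np.asarray(m_vec, dtype=np.float64).reshape(3, 3)
    h, w, _ = raw_img.shape
    mapped = raw_img.reshape(-1, 3) @ M.T     # per-pixel linear mapping to the working space
    return mapped.reshape(h, w, 3), M
```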

 

 

15 of 26

Illumination estimation network

[Figure: the mapped raw-RGB image's RGB-uv histogram feeds conv/ReLU 128 5×5 (stride 2) → conv/ReLU 256 3×3 (stride 2) → conv/ReLU 512 2×2 (stride 1) → fc 3, i.e., the estimated illuminant in the working space.]

16 of 26

Training and loss function

  • Training minimises the angular error between the estimated illuminant and the ground-truth (GT) illuminant.

[Figure: estimated vs. ground-truth (GT) illuminant R, G, B values.]
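A minimal Python version of the angular error used as the loss (and as the evaluation metric on the results slides); the training code itself would use a differentiable tensor implementation, so this is only the formula.

```python
import numpy as np

def angular_error_deg(estimated, ground_truth):
    """Angle (in degrees) between the estimated and ground-truth illuminant RGB vectors."""
    est = np.asarray(estimated, dtype=np.float64)
    gt = np.asarray(ground_truth, dtype=np.float64)
    cos = np.dot(est, gt) / (np.linalg.norm(est) * np.linalg.norm(gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards against round-off
```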

17 of 26

Training and loss function

 

 

 

[Figure: training diagram. The sensor mapping network (RGB-uv histogram of the input image → conv/ReLU stack → fc 9) maps the input image from its original raw space to the learned working space; the illuminant estimation network (RGB-uv histogram of the mapped image → conv/ReLU stack → fc 3) estimates the illuminant there; the estimate is mapped back to the original raw space with M⁻¹ and compared against the ground truth. Angular-error loss = arccos( (ℓ̂ · ℓ) / (‖ℓ̂‖ ‖ℓ‖) ), where ℓ̂ is the estimated illuminant and ℓ is the ground truth.]

 

 

18 of 26

Training and loss function

 

 

 

[Figure: the same training diagram as the previous slide; the angular-error loss is computed between the final estimated illuminant (mapped back to the original raw space with M⁻¹) and the ground-truth illuminant.]

 

 

19 of 26

Test-time example #1

[Figure: a testing raw-RGB image (Fujifilm XM1) → RGB-uv histogram block → sensor mapping net → mapping matrix M → image mapped to the "working space" → RGB-uv histogram block → illuminant estimation net → estimated illuminant in the working space → projected back to the original raw-RGB space → white-balanced raw-RGB image.]

20 of 26

Test-time example #2

[Figure: the same test-time flow for a Canon 1Ds Mk III raw-RGB image: RGB-uv histogram block → sensor mapping network → mapping matrix M → "working space" → RGB-uv histogram block → illuminant estimation net → estimated illuminant in the working space → projected back to the original raw-RGB space → white-balanced raw-RGB image.]

 

21 of 26

Experimental results

  • All camera models in:
    • NUS 8-Camera dataset (8 camera models)
    • Gehler-Shi dataset (2 camera models)
    • Cube/Cube+ dataset (1 camera model)

[Figure: example images from the NUS-8, Gehler-Shi, and Cube/Cube+ datasets.]

22 of 26

Results

Angular errors (degrees; lower is better):

Methods                            NUS 8-Camera                       Gehler-Shi
                                   Mean   Median  Best 25%  Worst 25%  Mean   Median  Best 25%  Worst 25%
Avg. sensor-independent methods    4.26   3.25    0.99      9.43       5.10   4.03    1.91      10.77
Avg. sensor-dependent methods      2.40   1.64    0.50      5.75       2.62   1.75    0.50      5.95
Ours                               2.05   1.50    0.52      4.48       2.77   1.93    0.55      6.53

Sensor-independent methods:
- J. Van De Weijer et al., Edge-based color constancy, TIP, 2007
- S. Bianco and C. Cusano, Quasi-unsupervised color constancy, CVPR, 2019
- Y. Qian et al., On finding gray pixels, CVPR, 2019

Sensor-dependent methods:
- W. Shi et al., Deep specialized network for illuminant estimation, ECCV, 2016
- J. T. Barron and Y.-T. Tsai, Fast Fourier color constancy, CVPR, 2017
- Y. Hu et al., FC4: Fully convolutional color constancy with confidence-weighted pooling, CVPR, 2017

23 of 26

Results

Angular errors (degrees; lower is better):

Methods                            Cube                               Cube+
                                   Mean   Median  Best 25%  Worst 25%  Mean   Median  Best 25%  Worst 25%
Avg. sensor-independent methods    3.57   2.47    0.64      8.30       4.98   3.32    0.82      11.77
Avg. sensor-dependent methods      1.54   0.92    0.26      3.85       2.04   1.02    0.25      5.58
Ours                               1.98   1.36    0.40      4.64       2.14   1.44    0.44      5.06

Sensor-independent methods:
- J. Van De Weijer et al., Edge-based color constancy, TIP, 2007
- S. Bianco and C. Cusano, Quasi-unsupervised color constancy, CVPR, 2019
- Y. Qian et al., On finding gray pixels, CVPR, 2019

Sensor-dependent methods:
- W. Shi et al., Deep specialized network for illuminant estimation, ECCV, 2016
- J. T. Barron and Y.-T. Tsai, Fast Fourier color constancy, CVPR, 2017
- Y. Hu et al., FC4: Fully convolutional color constancy with confidence-weighted pooling, CVPR, 2017

24 of 26

Results

[Figure: qualitative results for Samsung NX, Canon 600D, and Canon 5D images (angular errors 1.29°, 0.43°, and 0.98°), showing the input raw-RGB images, mapped images, corrected images, and ground truth; images from the NUS 8-Camera and Gehler-Shi datasets.]

25 of 26

Summary

  • Presented a deep-learning framework for sensor-independent illumination estimation.

  • Our method learns an image-specific mapping that transforms raw-RGB images from different cameras into a canonical "working space".

  • Our framework is on par with sensor-dependent DNNs but requires only a single DNN.

26 of 26

Thank you

 

 


Questions?