1 of 39

Inferring Distributions Over Depth

from a Single Image

Gengshan Yang, Peiyun Hu and Deva Ramanan

Carnegie Mellon University

2 of 39

Why monocular images?

Costs, payload, power consumption, ...
Also, safety: sensor redundancy / fusion

LiDAR depth image

3 of 39

Why estimate depth distributions?

[1] Li, Zhengqi, and Noah Snavely. "MegaDepth: Learning single-view depth prediction from internet photos." CVPR 2018.

[2] Fu, Huan, et al. "Deep ordinal regression network for monocular depth estimation." CVPR. 2018.

[3] Godard, Clément, et al. "Digging into self-supervised monocular depth estimation." ICCV. 2019.

MegaDepth [1]

DORN [2]

MonoDepth2 [3]

4 of 39

Why estimate depth distributions?

self-awareness of possible failure

input image

depth estimation

(yellow->near)

5 of 39

Why estimate depth distributions?

graceful degradation in functionality

6 of 39

Our Probabilistic Solution

Discretize depth into K intervals in log space;
Predict occupancy probability for each depth interval.

[Figure: depth intervals at d = 80 m, 40 m, 20 m, 10 m]
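
The log-space discretization can be sketched as follows. This is a minimal illustration, not the authors' code: the range of 10 m to 80 m and K = 4 are read off the figure labels, treating each label as an interval center is an assumption, and the helper names are hypothetical.

```python
import numpy as np

def log_depth_centers(d_min=10.0, d_max=80.0, k=4):
    """K interval centers spaced uniformly in log-depth (assumed layout)."""
    return np.geomspace(d_min, d_max, k)

def nearest_interval(depth, centers):
    """Index of the interval whose center is closest to `depth` in log space."""
    log_dist = np.abs(np.log(np.asarray(depth)[..., None]) - np.log(centers))
    return np.argmin(log_dist, axis=-1)

centers = log_depth_centers()            # approximately 10, 20, 40, 80 metres
idx = nearest_interval(41.0, centers)    # index of the 40 m interval
```

Uniform spacing in log-depth means each interval spans the same ratio of depths, so nearby structure gets finer metric resolution than distant structure.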

7 of 39

Our Probabilistic Solution

Report back occupancy probabilities on a (H,W,K) voxel grid.

[Figure: depth intervals at d = 80 m, 40 m, 20 m, 10 m]

8 of 39

Our Probabilistic Solution

At train time, construct K binary labels, e.g. {0,1,0,0}, from the ground-truth depth.
Train with binary cross entropy loss.

[Figure: depth intervals at d = 80 m, 40 m, 20 m, 10 m; ground-truth depth d* = 41 m]
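
A minimal sketch of the hard-label construction and the loss, assuming the label is set to 1 for the interval nearest to the ground truth in log-depth (the slide does not state the assignment rule). The function names are hypothetical; the interval order follows the figure (80, 40, 20, 10 m).

```python
import numpy as np

def binary_labels(gt_depth, centers):
    """(H, W) ground-truth depth -> (H, W, K) labels with a single 1 per pixel."""
    log_dist = np.abs(np.log(gt_depth[..., None]) - np.log(centers))
    idx = np.argmin(log_dist, axis=-1)       # nearest interval in log-depth
    return np.eye(len(centers))[idx]

def binary_cross_entropy(logits, targets):
    """Mean per-interval binary cross-entropy on raw logits (stable form)."""
    per_elem = (np.maximum(logits, 0.0) - logits * targets
                + np.log1p(np.exp(-np.abs(logits))))
    return per_elem.mean()

centers = np.array([80.0, 40.0, 20.0, 10.0])
labels = binary_labels(np.array([[41.0]]), centers)   # one pixel, d* = 41 m
loss = binary_cross_entropy(np.zeros_like(labels), labels)
```

Each interval is scored independently, which is what later allows more than one interval to receive a high occupancy probability.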

9 of 39

Our Probabilistic Solution

At train time, construct K binary labels or a soft target, e.g. {.6, .9, .6, .1}, from the ground-truth depth.
Train with binary cross entropy loss.

[Figure: depth intervals at d = 80 m, 40 m, 20 m, 10 m; ground-truth depth d* = 41 m]
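
The slide does not say how the soft target is computed. One plausible construction, shown here purely as an assumption, is a Gaussian bump in log2-depth centered on the ground truth; the kernel, its width `sigma`, and the function name are hypothetical, and the values it produces are not meant to reproduce the numbers on the slide.

```python
import numpy as np

def soft_targets(gt_depth, centers, sigma=1.0):
    """(...,) ground-truth depth -> (..., K) soft targets in (0, 1].

    Assumed kernel: exp(-0.5 * (log2 distance / sigma)^2).
    """
    dist = np.log2(gt_depth[..., None]) - np.log2(centers)
    return np.exp(-0.5 * (dist / sigma) ** 2)

centers = np.array([80.0, 40.0, 20.0, 10.0])
targets = soft_targets(np.array([41.0]), centers)   # peaks at the 40 m interval
```

Soft targets of this kind can be used with the same binary cross-entropy loss, since each entry is still a value between 0 and 1.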

10 of 39

Our Probabilistic Solution

At test time, normalize the occupancy scores into a distribution over depth.

no sampling needed

[Figure: depth intervals at d = 80 m, 40 m, 20 m, 10 m]
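
The test-time normalization can be sketched like this. Dividing the sigmoid occupancies by their sum is one straightforward reading of "normalize"; the log-space expected depth used as a point estimate is an assumption, and all names are hypothetical.

```python
import numpy as np

def depth_distribution(logits):
    """(H, W, K) occupancy logits -> (H, W, K) distribution summing to 1 over K."""
    occ = 1.0 / (1.0 + np.exp(-logits))           # per-interval occupancy probability
    return occ / occ.sum(axis=-1, keepdims=True)

def expected_depth(prob, centers):
    """Point estimate: mean of log-depth under `prob`, mapped back to metres."""
    return np.exp((prob * np.log(centers)).sum(axis=-1))

def entropy(prob, eps=1e-12):
    """Per-pixel uncertainty of the depth distribution, in nats."""
    return -(prob * np.log(prob + eps)).sum(axis=-1)

centers = np.array([80.0, 40.0, 20.0, 10.0])
prob = depth_distribution(np.zeros((2, 3, 4)))    # uninformative scores
depth, unc = expected_depth(prob, centers), entropy(prob)
```

Everything here is a single deterministic pass over the (H, W, K) grid, which is why no sampling is required to obtain the distribution or its entropy.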

11 of 39

Network Architecture

A lightweight architecture estimating distributions over depth given a single image.

14 of 39

Experiments

15 of 39

Baselines: Unimodal Gaussian

Prediction

normal distribution

Loss

negative log-likelihood

[Figure: depth intervals at d = 80 m, 40 m, 20 m, 10 m]

Kendall, Alex, and Yarin Gal. "What uncertainties do we need in bayesian deep learning for computer vision?." NeurIPS. 2017.
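
For reference, the negative log-likelihood of a unimodal Gaussian baseline can be written as below. Predicting the log-variance rather than the variance is a common parameterization and is assumed here; the function name is hypothetical.

```python
import numpy as np

def gaussian_nll(mu, log_var, target):
    """Per-pixel negative log-likelihood of `target` under N(mu, exp(log_var))."""
    return 0.5 * (log_var
                  + (target - mu) ** 2 / np.exp(log_var)
                  + np.log(2.0 * np.pi))

# A confident wrong prediction is penalized more than an uncertain one.
confident = gaussian_nll(mu=20.0, log_var=0.0, target=40.0)
uncertain = gaussian_nll(mu=20.0, log_var=5.0, target=40.0)
```

A single mean and variance per pixel cannot place mass at two separate depths, which is the limitation the multimodal comparison targets.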

16 of 39

Baselines: Unimodal Gaussian

Prediction

normal distribution

Loss

negative log-likelihood

Monte Carlo dropout

[Figure: depth intervals at d = 80 m, 40 m, 20 m, 10 m]

Kendall, Alex, and Yarin Gal. "What uncertainties do we need in bayesian deep learning for computer vision?." NeurIPS. 2017.
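
Monte Carlo dropout estimates uncertainty by running several stochastic forward passes and looking at their spread. The sketch below uses a toy one-layer stand-in for the network; `toy_forward`, its weight, and the dropout rate are hypothetical and only serve to make the example runnable.

```python
import numpy as np

def toy_forward(x, rng, w=2.0, p_drop=0.5):
    """Hypothetical one-layer 'network' with inverted dropout on its input."""
    mask = rng.random(x.shape) >= p_drop
    return w * x * mask / (1.0 - p_drop)

def mc_dropout_predict(forward, x, n_samples=20, seed=0):
    """Run `forward` n_samples times with dropout active; return mean and variance."""
    rng = np.random.default_rng(seed)
    samples = np.stack([forward(x, rng) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.var(axis=0)

mean, var = mc_dropout_predict(toy_forward, np.ones(1000))
```

The cost scales linearly with the number of samples, since every sample is a full forward pass.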

17 of 39

Baselines: Softmax

Prediction

multi-class distribution

Loss

multi-class cross-entropy

[Figure: depth intervals at d = 80 m, 40 m, 20 m, 10 m]

Cao, Yuanzhouhan, Zifeng Wu, and Chunhua Shen. "Estimating depth from monocular images as classification using deep fully convolutional residual networks." IEEE Transactions on Circuits and Systems for Video Technology. 2017.
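
A minimal sketch of the softmax baseline's loss, with each pixel flattened into a row of K logits and one integer class per row. The function name and the flattened layout are assumptions made for the example.

```python
import numpy as np

def softmax_cross_entropy(logits, class_idx):
    """Mean multi-class cross-entropy; logits (N, K), class_idx (N,) integers."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_prob[np.arange(len(class_idx)), class_idx].mean()

loss = softmax_cross_entropy(np.zeros((2, 4)), np.array([1, 3]))
```

Unlike the per-interval binary formulation, the softmax couples the K scores: raising the probability of one interval necessarily lowers the others.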

18 of 39

Qualitative: Unimodal vs. Multimodal

Returns multiple modes for ambiguous cases

19 of 39

Qualitative: Unimodal vs. Multimodal

20 of 39

Evaluating Depth Distribution

ROC curve

Sort predictions by confidence;
Compute average error over a fraction of most confident predictions.

A good depth distribution has

an uncertainty measure (entropy) that is well correlated with the actual error;
accurate depth estimates.

Hu, Xiaoyan, and Philippos Mordohai. "A quantitative evaluation of confidence measures for stereo vision." TPAMI, 2012.

23 of 39

Evaluating Depth Distribution

ROC curve

Sort predictions by confidence;
Compute average error over a fraction of most confident predictions.

Quantitative metric

Area under curve (AUC)

A good depth distribution has

an uncertainty measure (entropy) that is well correlated with the actual error;
accurate depth estimates.

Hu, Xiaoyan, and Philippos Mordohai. "A quantitative evaluation of confidence measures for stereo vision." TPAMI, 2012.
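
The evaluation described above can be sketched as follows: sort pixels from most to least confident, average the error over growing fractions, and integrate the resulting curve. The function names, the number of evaluation points, and the use of entropy as the confidence score are choices made for this example.

```python
import numpy as np

def error_vs_density(error, uncertainty, n_points=20):
    """Average error over the most confident fraction of pixels, per fraction."""
    order = np.argsort(uncertainty.ravel())           # most confident first
    sorted_err = error.ravel()[order]
    cum_mean = np.cumsum(sorted_err) / np.arange(1, sorted_err.size + 1)
    density = np.linspace(1.0 / n_points, 1.0, n_points)
    counts = np.maximum(1, np.round(density * sorted_err.size).astype(int))
    return density, cum_mean[counts - 1]

def area_under_curve(density, avg_error):
    """Trapezoidal area under the error-versus-density curve (lower is better)."""
    return float(np.sum(0.5 * (avg_error[1:] + avg_error[:-1]) * np.diff(density)))

err = np.linspace(0.0, 1.0, 100)
auc_good = area_under_curve(*error_vs_density(err, err))    # uncertainty tracks error
auc_bad = area_under_curve(*error_vs_density(err, -err))    # uncertainty is inverted
```

Both curves end at the same value (the mean error over all pixels), so the area rewards uncertainty estimates that rank errors correctly as well as low overall error.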

24 of 39

Sorted by error

26 of 39

Evaluating Depth Distribution

[1] Kendall, Alex, and Yarin Gal. "What uncertainties do we need in bayesian deep learning for computer vision?." NeurIPS. 2017.

[2] Cao, Yuanzhouhan, Zifeng Wu, and Chunhua Shen. "Estimating depth from monocular images as classification using deep fully convolutional residual networks." IEEE Transactions on Circuits and Systems for Video Technology. 2017.

Better than Gaussian [1] and Softmax [2]
No need for Monte Carlo sampling

27 of 39

Standard Depth Estimation

[1] Kendall, Alex, and Yarin Gal. "What uncertainties do we need in bayesian deep learning for computer vision?." NeurIPS. 2017.

[3] Fu, Huan, et al. "Deep ordinal regression network for monocular depth estimation." CVPR. 2018.

Better than SOTA with the same backbone.

28 of 39

Application: Dense Monocular Mapping

Task

Given camera pose and monocular depth estimation, build a dense voxel map.

Challenges

Depth discontinuities

Streak-like artifacts
Increased memory overhead

Our solution

Only consider the reliable depth estimates

Bârsan, Ioan Andrei, et al. "Robust dense mapping for large-scale dynamic environments." ICRA, 2018.

29 of 39

Depth estimation (blue-> far away)

image coordinate

Assuming we know the camera intrinsics and extrinsics,

depth

30 of 39

Depth estimation (blue-> far away)

Uncertainty estimation (blue-> small uncertainty)

depth

image coordinate

Assuming we know the camera intrinsics and extrinsics,

entropy

we further remove unreliable points with high entropy

31 of 39

Depth estimation (blue-> far away)

Uncertainty estimation (blue-> small uncertainty)

depth

image coordinate

Assuming we know the camera intrinsics and extrinsics,

entropy

we further remove unreliable points with high entropy

Octomap
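
A minimal sketch of the two steps on this slide: lifting pixels to 3D with known intrinsics and extrinsics, then dropping high-entropy points before they reach the map. It assumes a pinhole camera, depth measured along the optical axis, and a 4x4 camera-to-world pose; the entropy threshold and function names are hypothetical, and the insertion into the voxel map itself is not shown.

```python
import numpy as np

def backproject(depth, intrinsics, cam_to_world):
    """Lift an (H, W) depth map to (H*W, 3) world points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(intrinsics).T          # camera-frame rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]

def keep_reliable(points, entropy, max_entropy):
    """Keep only points whose per-pixel entropy is at most `max_entropy`."""
    return points[entropy.ravel() <= max_entropy]

intrinsics = np.array([[100.0, 0.0, 1.0], [0.0, 100.0, 1.0], [0.0, 0.0, 1.0]])
points = backproject(np.full((3, 3), 2.0), intrinsics, np.eye(4))
entropy = np.zeros((3, 3)); entropy[0, 0] = 1.3      # one unreliable pixel
reliable = keep_reliable(points, entropy, max_entropy=1.0)
```

Filtering before insertion means uncertain pixels, such as those at depth discontinuities, never allocate voxels in the first place.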

32 of 39

Application: Online Monocular Mapping

34 of 39

188 MB memory

LiDAR

35 of 39

243 MB memory

Ours

36 of 39

182 MB memory

Ours-uncertainty

37 of 39

LiDAR

Ours

Ours-uncertainty

38 of 39

Summary

An approach to estimate distributions over depth
Binary classification for each voxel on a log-polar grid

produces reliable depth uncertainty, with multiple modes
much faster than sampling-based approaches
