1 of 39

Inferring Distributions Over Depth from a Single Image

Gengshan Yang, Peiyun Hu and Deva Ramanan

Carnegie Mellon University

2 of 39

Why monocular images?

  • Costs, payload, power consumption, ...
  • Also, safety: sensor redundancy / fusion

LiDAR depth image

3 of 39

Why estimate depth distributions?

[1] Li, Zhengqi, and Noah Snavely. "MegaDepth: Learning single-view depth prediction from internet photos." CVPR, 2018.

[2] Fu, Huan, et al. "Deep ordinal regression network for monocular depth estimation." CVPR, 2018.

[3] Godard, Clément, et al. "Digging into self-supervised monocular depth estimation." ICCV, 2019.

MegaDepth [1]

DORN [2]

MonoDepth2 [3]

4 of 39

Why estimate depth distributions?

  • self-awareness of possible failure

input image

depth estimation (yellow -> near)

5 of 39

Why estimate depth distributions?

  • graceful degradation in functionality

6 of 39

Our Probabilistic Solution

  • Discretize depth into K intervals in log space;
  • Predict occupancy probability for each depth interval.
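
A minimal sketch of the log-space discretization, assuming NumPy. The function name `log_depth_bins` and the 5 m to 80 m range are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def log_depth_bins(d_min, d_max, num_bins):
    """Split [d_min, d_max] into num_bins intervals of equal width in log-depth.

    Returns the num_bins + 1 interval edges and the num_bins interval centers.
    """
    edges = np.geomspace(d_min, d_max, num_bins + 1)
    # The geometric mean of two edges is the midpoint of the interval in log space.
    centers = np.sqrt(edges[:-1] * edges[1:])
    return edges, centers

# Hypothetical range: 5 m to 80 m with K = 4 intervals.
edges, centers = log_depth_bins(d_min=5.0, d_max=80.0, num_bins=4)
```

With this range each interval spans one doubling of depth, so near intervals are short in metres and far intervals are long.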

d=80m

d=40m

d=20m

d=10m

7 of 39

Our Probabilistic Solution

  • Report occupancy probabilities on an (H, W, K) voxel grid.

d=80m

d=40m

d=20m

d=10m

8 of 39

Our Probabilistic Solution

  • At train time, construct K binary labels, e.g. {0,1,0,0}, from the ground-truth depth.
  • Train with binary cross entropy loss.
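
The label construction and loss above can be sketched as follows, assuming NumPy. Treating the four depth values on the slide as interval centers and picking the nearest one in log-depth is an assumption; the helper names are hypothetical.

```python
import numpy as np

def hard_labels(d_true, centers):
    """K binary labels: 1 for the interval nearest to d_true in log-depth, else 0."""
    nearest = np.argmin(np.abs(np.log(centers) - np.log(d_true)))
    labels = np.zeros(len(centers))
    labels[nearest] = 1.0
    return labels

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross entropy over K independent per-interval predictions."""
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))

centers = np.array([80.0, 40.0, 20.0, 10.0])   # depth values shown on the slide
labels = hard_labels(41.0, centers)             # array([0., 1., 0., 0.])
predicted = np.array([0.1, 0.8, 0.2, 0.05])     # made-up occupancy probabilities
loss = binary_cross_entropy(predicted, labels)
```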

d=80m

d=40m

d=20m

d=10m

d*=41m

9 of 39

Our Probabilistic Solution

  • At train time, construct K binary labels, or a soft target, e.g. {.6, .9, .6, .1}, from the ground-truth depth.
  • Train with binary cross entropy loss.
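
One way to build such a soft target is sketched below, assuming NumPy. The slides do not say how the soft values are computed; the Gaussian-shaped kernel in log2-depth and the `sigma` value are assumptions, so the numbers will not exactly match the ones on the slide.

```python
import numpy as np

def soft_targets(d_true, centers, sigma=1.0):
    """Soft per-interval targets in (0, 1], peaking at the interval nearest d_true.

    Distances are measured in log2-depth, so sigma is expressed in octaves
    (an assumed parameterization).
    """
    dist = np.log2(centers) - np.log2(d_true)
    return np.exp(-0.5 * (dist / sigma) ** 2)

centers = np.array([80.0, 40.0, 20.0, 10.0])   # depth values shown on the slide
targets = soft_targets(41.0, centers)
```

Because binary cross entropy accepts any target in [0, 1], the same loss can be used unchanged with these soft targets.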

d=80m

d=40m

d=20m

d=10m

d*=41m

10 of 39

Our Probabilistic Solution

  • At test time, normalize the occupancy scores into a distribution over depth.
    • No sampling is needed.
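
A minimal sketch of the test-time normalization, assuming NumPy and that the K scores sit on the last axis. The entropy readout is the uncertainty measure named later in the deck; the function names are hypothetical.

```python
import numpy as np

def depth_distribution(occupancy, eps=1e-12):
    """Normalize per-interval occupancy scores (last axis = K) so they sum to 1."""
    occupancy = np.asarray(occupancy, dtype=float)
    return occupancy / (occupancy.sum(axis=-1, keepdims=True) + eps)

def entropy(dist, eps=1e-12):
    """Shannon entropy of each distribution along the last axis."""
    return -(dist * np.log(dist + eps)).sum(axis=-1)

centers = np.array([80.0, 40.0, 20.0, 10.0])   # depth values shown on the slide
occupancy = np.array([0.6, 0.9, 0.6, 0.1])     # example scores for one pixel
p = depth_distribution(occupancy)
most_likely_depth = centers[np.argmax(p)]
uncertainty = entropy(p)
```

The same functions apply unchanged to a full (H, W, K) grid, since they only operate along the last axis.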

d=80m

d=40m

d=20m

d=10m

11 of 39

Network Architecture

A lightweight architecture estimating distributions over depth given a single image.

12 of 39

Network Architecture

A lightweight architecture estimating distributions over depth given a single image.

13 of 39

Network Architecture

A lightweight architecture estimating distributions over depth given a single image.

14 of 39

Experiments

15 of 39

Baselines: Unimodal Gaussian

  • Prediction
    • normal distribution
  • Loss
    • negative log-likelihood
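
For reference, the Gaussian negative log-likelihood can be sketched as below, assuming NumPy. Having the network predict the log-variance rather than the variance is a common parameterization and an assumption here, not something stated on the slide.

```python
import numpy as np

def gaussian_nll(mu, log_var, target):
    """Negative log-likelihood of target under N(mu, exp(log_var)), per element."""
    return 0.5 * (np.log(2.0 * np.pi) + log_var + (target - mu) ** 2 / np.exp(log_var))

# Hypothetical prediction for one pixel: mean 40 m, variance 4 m^2, true depth 41 m.
nll = gaussian_nll(mu=40.0, log_var=np.log(4.0), target=41.0)
```

A single mean and variance can describe only one mode, which is the limitation the multimodal comparison later in the deck illustrates.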

d=80m

d=40m

d=20m

d=10m

Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" NeurIPS, 2017.

16 of 39

Baselines: Unimodal Gaussian

  • Prediction
    • normal distribution
  • Loss
    • negative log-likelihood
  • Monte Carlo dropout

d=80m

d=40m

d=20m

d=10m

Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" NeurIPS, 2017.

17 of 39

Baselines: Softmax

  • Prediction
    • multi-class distribution
  • Loss
    • multi-class cross-entropy
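
A minimal sketch of the softmax baseline's loss for a single pixel, assuming NumPy. The function names and the example logits are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax along the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    return z - np.log(np.sum(np.exp(z), axis=-1, keepdims=True))

def multiclass_cross_entropy(logits, class_index):
    """Cross entropy for one pixel: logits has shape (K,), class_index is an int."""
    return float(-log_softmax(np.asarray(logits, dtype=float))[class_index])

logits = np.array([0.5, 2.0, 0.3, -1.0])   # made-up scores for K = 4 depth classes
loss = multiclass_cross_entropy(logits, class_index=1)
```

Softmax makes the K scores compete and sum to one during training, whereas the per-interval binary formulation scores each interval independently and normalizes only at test time.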

d=80m

d=40m

d=20m

d=10m

Cao, Yuanzhouhan, Zifeng Wu, and Chunhua Shen. "Estimating depth from monocular images as classification using deep fully convolutional residual networks." IEEE Transactions on Circuits and Systems for Video Technology. 2017.

18 of 39

Qualitative: Unimodal vs. Multimodal

  • Returns multiple modes for ambiguous cases

19 of 39

Qualitative: Unimodal vs. Multimodal

20 of 39

Evaluating Depth Distribution

  • ROC curve
    • Sort predictions by confidence;
    • Compute average error over a fraction of most confident predictions.
  • A good depth distribution has
    • an uncertainty measure (entropy) that correlates well with the actual error;
    • accurate depth estimates.

Hu, Xiaoyan, and Philippos Mordohai. "A quantitative evaluation of confidence measures for stereo vision." TPAMI, 2012.

21 of 39

Evaluating Depth Distribution

  • ROC curve
    • Sort predictions by confidence;
    • Compute average error over a fraction of most confident predictions.
  • A good depth distribution has
    • an uncertainty measure (entropy) that correlates well with the actual error;
    • accurate depth estimates.

Hu, Xiaoyan, and Philippos Mordohai. "A quantitative evaluation of confidence measures for stereo vision." TPAMI, 2012.

22 of 39

Evaluating Depth Distribution

  • ROC curve
    • Sort predictions by confidence;
    • Compute average error over a fraction of most confident predictions.
  • A good depth distribution has
    • an uncertainty measure (entropy) that correlates well with the actual error;
    • accurate depth estimates.

Hu, Xiaoyan, and Philippos Mordohai. "A quantitative evaluation of confidence measures for stereo vision." TPAMI, 2012.

23 of 39

Evaluating Depth Distribution

  • ROC curve
    • Sort predictions by confidence;
    • Compute average error over a fraction of most confident predictions.
  • Quantitative metric
    • Area under curve (AUC)
  • A good depth distribution has
    • an uncertainty measure (entropy) that correlates well with the actual error;
    • accurate depth estimates.
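
The evaluation procedure above can be sketched as follows, assuming NumPy and one error value and one uncertainty value (for example, entropy) per pixel. The function names and the number of curve points are hypothetical choices.

```python
import numpy as np

def confidence_curve(errors, uncertainty, num_points=20):
    """Mean error over the most confident fraction of predictions.

    Predictions are sorted from lowest to highest uncertainty; the curve value
    at fraction f is the average error of the first f * N predictions.
    """
    order = np.argsort(uncertainty)                       # most confident first
    sorted_err = np.asarray(errors, dtype=float)[order]
    running_mean = np.cumsum(sorted_err) / np.arange(1, len(sorted_err) + 1)
    fractions = np.linspace(1.0 / num_points, 1.0, num_points)
    counts = np.maximum(1, np.round(fractions * len(sorted_err)).astype(int))
    return fractions, running_mean[counts - 1]

def area_under_curve(fractions, mean_err):
    """Trapezoidal area under the curve; lower is better."""
    return float(np.sum(0.5 * (mean_err[1:] + mean_err[:-1]) * np.diff(fractions)))

errors = np.array([0.1, 0.2, 0.3, 0.4])
good_uncertainty = np.array([0.1, 0.2, 0.3, 0.4])   # ranks pixels the same way as the error
bad_uncertainty = good_uncertainty[::-1]            # ranks pixels in the opposite order
auc_good = area_under_curve(*confidence_curve(errors, good_uncertainty, num_points=4))
auc_bad = area_under_curve(*confidence_curve(errors, bad_uncertainty, num_points=4))
```

Both curves end at the same value, the mean error over all pixels, so the area differs only through how well the uncertainty ranks the errors.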

Hu, Xiaoyan, and Philippos Mordohai. "A quantitative evaluation of confidence measures for stereo vision." TPAMI, 2012.

24 of 39

Sorted by error

25 of 39

26 of 39

Evaluating Depth Distribution

[1] Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" NeurIPS, 2017.

[2] Cao, Yuanzhouhan, Zifeng Wu, and Chunhua Shen. "Estimating depth from monocular images as classification using deep fully convolutional residual networks." IEEE Transactions on Circuits and Systems for Video Technology, 2017.

  • Better than Gaussian [1] and Softmax [2]
  • No Monte Carlo sampling needed

27 of 39

Standard Depth Estimation

[1] Kendall, Alex, and Yarin Gal. "What uncertainties do we need in Bayesian deep learning for computer vision?" NeurIPS, 2017.

[2] Cao, Yuanzhouhan, Zifeng Wu, and Chunhua Shen. "Estimating depth from monocular images as classification using deep fully convolutional residual networks." IEEE Transactions on Circuits and Systems for Video Technology, 2017.

[3] Fu, Huan, et al. "Deep ordinal regression network for monocular depth estimation." CVPR, 2018.

  • Better than the state of the art when using the same backbone.

28 of 39

Application: Dense Monocular Mapping

  • Task
    • Given camera poses and monocular depth estimates, build a dense voxel map.
  • Challenges
    • Depth discontinuities
      • Streak-like artifacts
      • Increased memory overhead
  • Our solution
    • Consider only the reliable depth estimates

Bârsan, Ioan Andrei, et al. "Robust dense mapping for large-scale dynamic environments." ICRA, 2018.

29 of 39

Depth estimation (blue -> far away)

Assuming we know the camera intrinsics and extrinsics,

[Figure labels: image coordinate, depth]

30 of 39

Depth estimation (blue -> far away)

Uncertainty estimation (blue -> small uncertainty)

Assuming we know the camera intrinsics and extrinsics, we further remove unreliable points with high entropy.

[Figure labels: image coordinate, depth, entropy]
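
Lifting pixels to 3D points and dropping the high-entropy ones can be sketched as follows, assuming NumPy. The pinhole camera model, depth measured along the optical axis, the camera-to-world matrix convention and the entropy threshold are all assumptions; the function names are hypothetical.

```python
import numpy as np

def backproject(depth, intrinsics, cam_to_world=None):
    """Lift an (H, W) depth map to (H*W, 3) points using a pinhole model.

    intrinsics is the 3x3 camera matrix; cam_to_world is an optional 4x4 pose.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))       # pixel coordinates
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pixels @ np.linalg.inv(intrinsics).T          # rays with unit z
    points_cam = rays * depth.reshape(-1, 1)
    if cam_to_world is None:
        return points_cam
    homogeneous = np.concatenate([points_cam, np.ones((len(points_cam), 1))], axis=1)
    return (homogeneous @ cam_to_world.T)[:, :3]

def filter_by_entropy(points, entropy_map, max_entropy):
    """Keep only the points whose per-pixel entropy is at most max_entropy."""
    keep = entropy_map.reshape(-1) <= max_entropy
    return points[keep]

depth = np.array([[1.0, 2.0]])             # toy 1x2 depth map
entropy_map = np.array([[0.1, 1.5]])       # toy per-pixel entropy
points = backproject(depth, intrinsics=np.eye(3))
reliable = filter_by_entropy(points, entropy_map, max_entropy=1.0)
```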

31 of 39

Depth estimation (blue -> far away)

Uncertainty estimation (blue -> small uncertainty)

Assuming we know the camera intrinsics and extrinsics, we further remove unreliable points with high entropy.

[Figure labels: image coordinate, depth, entropy, Octomap]

32 of 39

Application: Online Monocular Mapping

33 of 39

34 of 39

188 MB memory

LiDAR

35 of 39

243 MB memory

Ours

36 of 39

182 MB memory

Ours-uncertainty

37 of 39

LiDAR

Ours

Ours-uncertainty

38 of 39

Summary

  • An approach for estimating distributions over depth
  • Binary classification for each voxel on a log-polar grid
    • produces reliable depth uncertainty, with multiple modes
    • much faster than sampling-based approaches

39 of 39

Thanks!
