1 of 55

Binocular Stereo

CSC606: Intro Computer Vision

Single image stereogram, https://en.wikipedia.org/wiki/Autostereogram

What is this?

These slides are taken from Cornell University

2 of 55

From last time: 3D modeling from a photograph

video by Antonio Criminisi

3 of 55

3D modeling from a photograph

Flagellation. Piero della Francesca. c. 1453.

4 of 55

5 of 55

Vanishing points and projection matrix

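Write the projection matrix in terms of its columns, $\Pi = [\pi_1\ \pi_2\ \pi_3\ \pi_4]$. Then

$$\pi_1 = \Pi\,[1\ 0\ 0\ 0]^T = v_X \quad \text{(X vanishing point)}$$

similarly, $\pi_2 = v_Y$, $\pi_3 = v_Z$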

Not So Fast! We only know v’s up to a scale factor

Can fully specify by providing 3 reference points with known coordinates
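As a quick numerical check of this idea, the sketch below projects the homogeneous point at infinity along each world axis with a 3×4 projection matrix P (NumPy assumed; `vanishing_points` is a hypothetical helper name, and the matrix values are made up for illustration). Each result is just a column of P, dehomogenized, so rescaling a column leaves the image vanishing point unchanged. That is exactly why the v’s are only known up to a scale factor.

```python
import numpy as np

def vanishing_points(P):
    """Image vanishing points of the world X, Y, Z axes for a 3x4 projection matrix P.

    Assumes every vanishing point is finite (third homogeneous coordinate != 0).
    """
    vps = []
    for axis in range(3):
        direction = np.zeros(4)
        direction[axis] = 1.0        # homogeneous point at infinity along this world axis
        v = P @ direction            # this is simply column `axis` of P
        vps.append(v[:2] / v[2])     # dehomogenize to pixel coordinates
    return np.array(vps)

# Illustrative (made-up) projection matrix
P = np.array([[700.0, 10.0, 300.0, 50.0],
              [20.0, 650.0, 200.0, 30.0],
              [0.5, 0.4, 1.0, 10.0]])
print(vanishing_points(P))

# Scaling a column does not move its vanishing point: the scale is unobservable
P_scaled = P.copy()
P_scaled[:, 0] *= 3.0
print(vanishing_points(P_scaled))
```

Fixing the three unknown column scales (and the fourth column) is what the known reference points are for.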

6 of 55

Calibration using a reference object

Place a known object in the scene

identify correspondence between image and scene
compute mapping from scene to image

Issues

must know geometry very accurately
must know 3D -> 2D correspondence

7 of 55

AR codes

ArUco

8 of 55

Estimating the projection matrix

Place a known object in the scene

identify correspondence between image and scene
compute mapping from scene to image
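One standard way to compute the scene-to-image mapping is the direct linear transform (DLT): each 3D↔2D correspondence gives two linear equations in the 12 entries of P, and the null vector of the stacked system (found with an SVD) gives P up to scale. This is a minimal NumPy sketch assuming at least 6 correspondences in general position; `estimate_projection_matrix` is a hypothetical name, and the synthetic camera below is made up. With real, noisy measurements you would normally normalize the coordinates first and refine the result nonlinearly.

```python
import numpy as np

def estimate_projection_matrix(X_world, x_img):
    """DLT estimate of a 3x4 projection matrix (unit Frobenius norm, sign arbitrary).

    X_world: (n, 3) scene points, x_img: (n, 2) pixel coordinates, n >= 6.
    """
    n = X_world.shape[0]
    A = np.zeros((2 * n, 12))
    for i in range(n):
        Xh = np.append(X_world[i], 1.0)     # homogeneous scene point
        u, v = x_img[i]
        # u = (p1 . X) / (p3 . X)  ->  p1 . X - u * p3 . X = 0
        A[2 * i, 0:4] = Xh
        A[2 * i, 8:12] = -u * Xh
        # v = (p2 . X) / (p3 . X)  ->  p2 . X - v * p3 . X = 0
        A[2 * i + 1, 4:8] = Xh
        A[2 * i + 1, 8:12] = -v * Xh
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)                # right singular vector of the smallest singular value
    return P / np.linalg.norm(P)

# Synthetic check with a made-up camera and random points in front of it
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.array([[0.1], [-0.2], [0.5]])])
P_true = K @ Rt
X = rng.uniform(-1.0, 1.0, (20, 3)) + np.array([0.0, 0.0, 5.0])
proj = (P_true @ np.hstack([X, np.ones((20, 1))]).T).T
x = proj[:, :2] / proj[:, 2:]
P_est = estimate_projection_matrix(X, x)
```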

9 of 55

Alternative: multi-plane calibration

Images courtesy Jean-Yves Bouguet

Advantage

Only requires a plane
Don’t have to know positions/orientations
Good code available online! (including in OpenCV)

Matlab version by Jean-Yves Bouguet: http://www.vision.caltech.edu/bouguetj/calib_doc/index.html
Amy Tabb’s camera calibration software: https://github.com/amy-tabb/basic-camera-calibration

10 of 55

Single-image depth prediction using deep learning

Figure: image → deep learning → depth map

Li and Snavely. Megadepth: Learning single-view depth prediction from internet photos. CVPR 2018.

11 of 55

MiDaS depth prediction

https://github.com/intel-isl/MiDaS

https://gradio.app/g/AK391/MiDaS

Ranftl et al. Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer.

12 of 55

Single-image depth prediction

Picture credit: Magritte, The Treachery of Images, and the Berkeley Computer Vision Group

Miangoleh*, Dille*, Mai, Paris, and Aksoy. Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging. CVPR 2021.

13 of 55

Deep geometry prediction

More on this topic later!

14 of 55

Questions?

15 of 55

“Mark Twain at Pool Table”, no date, UCR Museum of Photography

16 of 55

https://giphy.com/gifs/wigglegram-706pNfSKyaDug

17 of 55

Stereo Vision as Localizing Points in 3D

An object point will project to some point in our image
That image point corresponds to a ray in the world
One ray only tells us the point lies somewhere along it; two rays (one per eye) intersect at a single point, so localizing points in 3D requires 2 eyes
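A minimal sketch of localizing a point from two rays, assuming the camera centers and ray directions are already known in a common world frame (NumPy; `triangulate_midpoint` is a hypothetical name). With noise the two rays rarely meet exactly, so this version returns the midpoint of the shortest segment between them, which is one common choice among several triangulation methods.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.

    Assumes the rays are not parallel (otherwise s, t are not unique).
    """
    # Least squares for s, t in  s*d1 - t*d2 ~= c2 - c1
    A = np.stack([d1, -d2], axis=1)           # 3 x 2 system matrix
    b = c2 - c1
    (s, t), *_ = np.linalg.lstsq(A, b, rcond=None)
    p1 = c1 + s * d1                          # closest point on ray 1
    p2 = c2 + t * d2                          # closest point on ray 2
    return (p1 + p2) / 2.0

X = np.array([1.0, 2.0, 10.0])                # scene point
c1 = np.zeros(3)                              # "left eye" center
c2 = np.array([1.0, 0.0, 0.0])                # "right eye" center
print(triangulate_midpoint(c1, X - c1, c2, X - c2))
```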

18 of 55

Stereo

Given two images from different viewpoints

How can we compute the depth of each point in the image?
Based on how much each pixel moves between the two images

19 of 55

Epipolar geometry

Figure: epipolar lines; pixel (x₁, y₁) in the first image corresponds to pixel (x₂, y₁) in the second

x₂ − x₁ = the disparity of pixel (x₁, y₁)

Two images captured by a camera translating purely horizontally

(rectified stereo pair)

20 of 55

Disparity = inverse depth

http://stereo.nypl.org/view/41729

(Or, hold a finger in front of your face and wink each eye in succession.)

21 of 55

Your basic stereo matching algorithm

Match Pixels in Conjugate Epipolar Lines

Assume brightness constancy
This is a challenging problem
Hundreds of approaches

A good survey and evaluation: http://www.middlebury.edu/stereo/

22 of 55

Your basic stereo matching algorithm

For each epipolar line

For each pixel in the left image

compare with every pixel on same epipolar line in right image

pick pixel with minimum match cost

Improvement: match windows
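The loop above, written out for the pixel-wise version, might look like the sketch below. Assumptions that are not from the slides: a rectified grayscale pair stored as NumPy arrays, absolute intensity difference as the match cost, a search limited to d = 0..max_disp, and the convention that left pixel (x, y) is compared with right pixel (x + d, y); for a typical left/right rig you may need x − d instead. `match_scanlines` is a hypothetical name.

```python
import numpy as np

def match_scanlines(left, right, max_disp):
    """Pixel-wise matching along rows of a rectified pair; returns an integer disparity map."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):                      # each epipolar line (a row, since the pair is rectified)
        for x in range(w):                  # each pixel in the left image
            best_cost, best_d = np.inf, 0
            for d in range(max_disp + 1):   # candidate pixels on the same row of the right image
                xr = x + d
                if xr >= w:
                    break
                cost = abs(float(left[y, x]) - float(right[y, xr]))
                if cost < best_cost:        # keep the pixel with minimum match cost
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

rng = np.random.default_rng(0)
left = rng.random((6, 30))
right = np.roll(left, 3, axis=1)            # synthetic pair: true disparity is 3 everywhere
disp = match_scanlines(left, right, max_disp=5)
```

On random texture a single pixel is distinctive enough; on real images single-pixel costs are ambiguous, which motivates matching windows instead.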

23 of 55

Stereo matching based on SSD

Figure: SSD cost plotted against disparity d; the minimum, at d_min, is the best matching disparity
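The SSD-versus-disparity curve can be reproduced for a single pixel with a sketch like this (NumPy assumed; `ssd_curve` and its parameters are hypothetical). It compares the (2·half+1)² window around I(x, y) with the window around J(x + d, y) for each candidate d and returns the whole cost curve plus its minimizer d_min. The window around (x, y) is assumed to lie inside I.

```python
import numpy as np

def ssd_curve(I, J, x, y, half, max_disp):
    """SSD cost for d = 0..max_disp between windows I(x, y) and J(x + d, y)."""
    I = np.asarray(I, dtype=float)
    J = np.asarray(J, dtype=float)
    patch = I[y - half:y + half + 1, x - half:x + half + 1]
    costs = np.full(max_disp + 1, np.inf)   # inf marks disparities whose window leaves J
    for d in range(max_disp + 1):
        xs = x + d
        if xs + half >= J.shape[1]:
            break
        cand = J[y - half:y + half + 1, xs - half:xs + half + 1]
        costs[d] = np.sum((patch - cand) ** 2)
    d_min = int(np.argmin(costs))           # best matching disparity
    return costs, d_min

rng = np.random.default_rng(0)
I = rng.random((20, 40))
J = np.roll(I, 3, axis=1)                   # synthetic pair with true disparity 3
costs, d_min = ssd_curve(I, J, x=10, y=10, half=2, max_disp=8)
```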

24 of 55

Window size

Smaller window: + more detail, − more noise

Larger window: + less noise, − less detail

Figure: Effect of window size (W = 3 vs. W = 20)

Better results with adaptive window

T. Kanade and M. Okutomi, A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment, ICRA 1991.
D. Scharstein and R. Szeliski, Stereo Matching with Nonlinear Diffusion, IJCV, July 1998.

25 of 55

Stereo results

Data from University of Tsukuba
Similar results on other images without ground truth

Figure panels: scene; ground truth

26 of 55

Results with window search

Figure panels: window-based matching (best window size); ground truth

27 of 55

Better methods exist...

Figure panels: graph cuts-based method; ground truth

Boykov et al., Fast Approximate Energy Minimization via Graph Cuts, International Conference on Computer Vision 1999.

For the latest and greatest: http://www.middlebury.edu/stereo/

28 of 55

Stereo as energy minimization

What defines a good stereo correspondence?

Match quality

Want each pixel to find a good match in the other image

Smoothness

If two pixels are adjacent, they should (usually) move about the same amount

29 of 55

Stereo as energy minimization

Find disparity map d that minimizes an energy function

Simple pixel / window matching:

$$E(d) = \sum_{(x,y) \in I} C(x, y, d(x,y))$$

where $C(x, y, d(x,y))$ = SSD distance between windows $I(x, y)$ and $J(x + d(x,y), y)$

30 of 55

Stereo as energy minimization

Figure: C(x, y, d), the disparity space image (DSI), shown for scanline y = 141 (horizontal axis x, vertical axis d)

31 of 55

Stereo as energy minimization

Figure: DSI for scanline y = 141 (axes x and d)

Simple pixel / window matching: choose the minimum of each column in the DSI independently:

$$d(x, y) = \arg\min_{d'} C(x, y, d')$$
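Building the full disparity space image and taking the per-column minimum can be vectorized. The sketch below (NumPy only; `box_sum`, `window_dsi` and `winner_take_all` are hypothetical names) fills C(x, y, d) with windowed SSD using an integral-image box sum, marks disparities whose window would reach past the right edge of J as infinitely costly, and then picks the minimum over d independently at every pixel. The J(x + d, y) convention matches the energy on the previous slide; the zero padding at image borders is a choice made here, not something the slides specify.

```python
import numpy as np

def box_sum(a, half):
    """Sum of `a` over a (2*half+1)^2 window centred on each pixel (zero padding)."""
    k = 2 * half + 1
    p = np.pad(a, half, mode="constant")
    s = np.pad(p.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))   # integral image
    return s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]

def window_dsi(I, J, max_disp, half):
    """Cost volume C[y, x, d] = windowed SSD between I(x, y) and J(x + d, y)."""
    I = np.asarray(I, dtype=float)
    J = np.asarray(J, dtype=float)
    h, w = I.shape
    assert max_disp < w, "disparity range must be smaller than the image width"
    C = np.empty((h, w, max_disp + 1))
    for d in range(max_disp + 1):
        diff2 = np.zeros((h, w))
        diff2[:, :w - d] = (I[:, :w - d] - J[:, d:]) ** 2   # per-pixel squared difference
        C[:, :, d] = box_sum(diff2, half)
        C[:, max(w - d - half, 0):, d] = np.inf   # window would include pixels with no match in J
    return C

def winner_take_all(C):
    """Independent minimum over d for every (x, y): one column of the DSI at a time."""
    return np.argmin(C, axis=2)

rng = np.random.default_rng(0)
I = rng.random((30, 50))
J = np.roll(I, 4, axis=1)                   # synthetic pair with true disparity 4
C = window_dsi(I, J, max_disp=8, half=2)
disp = winner_take_all(C)
```

Pixels near the right border, where every candidate window is invalid, simply fall back to disparity 0 in this sketch.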

32 of 55

Greedy selection of best match

33 of 55

Stereo as energy minimization

Better objective function

$$E(d) = E_d(d) + \lambda E_s(d)$$

match cost $E_d$: want each pixel to find a good match in the other image

smoothness cost $E_s$: adjacent pixels should (usually) move about the same amount

34 of 55

Stereo as energy minimization

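match cost: $E_d(d) = \sum_{(x,y) \in I} C(x, y, d(x,y))$

smoothness cost: $E_s(d) = \sum_{(p,q) \in \mathcal{E}} V(d_p, d_q)$

$\mathcal{E}$: set of neighboring pixels (4-connected or 8-connected neighborhood)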

35 of 55

Smoothness cost

“Potts model”: $V(d_p, d_q) = 0$ if $d_p = d_q$, and $1$ otherwise

L₁ distance: $V(d_p, d_q) = |d_p - d_q|$

How do we choose V?
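To make the energy concrete, the sketch below evaluates E(d) = E_d(d) + λ E_s(d) for a given integer disparity map, with the data term read from a precomputed cost volume and the smoothness term summed over 4-connected neighbor pairs. The two choices of V shown are the Potts model and the L₁ distance. The function names and the C[y, x, d] array layout are assumptions of this sketch (NumPy).

```python
import numpy as np

def potts(dp, dq):
    """Potts model: 0 if the disparities agree, 1 otherwise."""
    return (dp != dq).astype(float)

def l1(dp, dq):
    """L1 distance between neighboring disparities."""
    return np.abs(dp - dq).astype(float)

def stereo_energy(C, d, lam, V=l1):
    """E(d) = sum_p C[y, x, d(x, y)] + lam * sum over 4-connected pairs V(d_p, d_q)."""
    h, w = d.shape
    ys, xs = np.mgrid[0:h, 0:w]
    data = C[ys, xs, d].sum()                                  # match cost E_d
    smooth = (V(d[:, 1:], d[:, :-1]).sum()                     # horizontal neighbors
              + V(d[1:, :], d[:-1, :]).sum())                  # vertical neighbors
    return float(data + lam * smooth)

d = np.array([[0, 0, 1],
              [0, 2, 1]])
C_zero = np.zeros((2, 3, 4))
print(stereo_energy(C_zero, d, 0.5), stereo_energy(C_zero, d, 0.5, V=potts))
```

The L₁ cost grows with the size of a disparity jump, while the Potts cost charges every jump the same, so Potts is more forgiving of sharp depth discontinuities at object boundaries.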

36 of 55

Smoothness cost

If λ = ∞, only the smoothness term matters
The optimal solution is then a surface of constant depth/disparity (a fronto-parallel surface)

In practice, want to balance data term with smoothness term

37 of 55

Dynamic programming

Can minimize this independently per scanline using dynamic programming (DP)

38 of 55

Dynamic programming

Finds a “smooth”, low-cost path through the DSI from left to right
Visiting a node incurs its data cost; switching disparities from one column to the next also incurs a (smoothness) cost

Figure: DSI for scanline y = 141 (axes x and d)
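A sketch of the per-scanline dynamic program (a Viterbi-style shortest path through one DSI slice), assuming the L₁ smoothness cost between horizontally adjacent pixels, so the scanline energy is Σ_x C(x, d_x) + λ Σ_x |d_x − d_{x−1}|. `cost` is a hypothetical (width × number of disparities) array holding C(x, y, d) for one fixed y, and `scanline_dp` is a hypothetical name (NumPy). Runtime is O(w·D²) for w pixels and D disparities.

```python
import numpy as np

def scanline_dp(cost, lam):
    """Minimize sum_x cost[x, d_x] + lam * sum_x |d_x - d_{x-1}| along one scanline."""
    w, D = cost.shape
    disps = np.arange(D)
    # trans[p, c]: smoothness cost of going from disparity p (column x-1) to c (column x)
    trans = lam * np.abs(disps[:, None] - disps[None, :])
    total = np.empty((w, D))                 # total[x, c]: cheapest path ending at node (x, c)
    back = np.zeros((w, D), dtype=int)       # best predecessor of each node
    total[0] = cost[0]
    for x in range(1, w):
        cand = total[x - 1][:, None] + trans     # indexed (previous, current)
        back[x] = np.argmin(cand, axis=0)
        total[x] = cost[x] + cand.min(axis=0)    # data cost of the visited node + best way in
    d = np.empty(w, dtype=int)
    d[-1] = int(np.argmin(total[-1]))
    for x in range(w - 1, 0, -1):                # backtrack the optimal path
        d[x - 1] = back[x, d[x]]
    return d, float(total[-1, d[-1]])

rng = np.random.default_rng(1)
cost = rng.random((5, 3))
d, energy = scanline_dp(cost, lam=0.3)
```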

39 of 55

Dynamic Programming

40 of 55

Dynamic programming

Can we apply this trick in 2D as well?

No: the shortest path trick only works to find a 1D path

Slide credit: D. Huttenlocher

41 of 55

Stereo as a minimization problem

The 2D problem has many local minima

Gradient descent doesn’t work well

And a large search space

An n × m image with k disparities has $k^{nm}$ possible solutions
Finding the global minimum is NP-hard in general

Good approximations exist (e.g., graph cuts algorithms)
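To get a feel for how large this search space is, plug in illustrative numbers (hypothetical, not from the slides): a 640 × 480 image with k = 64 disparity levels.

$$
k^{nm} = 64^{640 \times 480} = 2^{6 \times 307\,200} = 2^{1\,843\,200} \approx 10^{554\,858}
$$

That count has more than half a million decimal digits, so exhaustive search is hopeless and approximate minimizers are needed.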

42 of 55

Questions?

43 of 55

Depth from disparity

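Figure: two cameras with centers C and C′ and focal length f, separated by a baseline; scene point X at depth z projects to x in one image and x′ in the other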
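By similar triangles in this figure, depth and disparity are related by z = f·B / d, where B is the baseline, f the focal length in pixels, and d the disparity magnitude in pixels (a rectified pair is assumed). A minimal NumPy sketch that applies this per pixel; `depth_from_disparity` is a hypothetical name, and mapping zero or negative disparity to infinite depth is a choice made here.

```python
import numpy as np

def depth_from_disparity(disparity, f, baseline):
    """z = f * B / d per pixel; d and f in pixels, z in the units of the baseline."""
    d = np.asarray(disparity, dtype=float)
    z = np.full_like(d, np.inf)              # no valid disparity -> treat as infinitely far
    valid = d > 0
    z[valid] = f * baseline / d[valid]
    return z

# Example: f = 700 px, baseline = 0.1 m
print(depth_from_disparity([10, 20, 0], f=700.0, baseline=0.1))
```

Doubling the disparity halves the depth, which is the sense in which disparity is inverse depth; it also means a one-pixel disparity error costs far more depth accuracy for distant points than for near ones.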

44 of 55

Stereo reconstruction pipeline

Steps

Calibrate cameras
Rectify images
Compute disparity
Estimate depth

What will cause errors?

Camera calibration errors
Poor image resolution
Occlusions
Violations of brightness constancy (specular reflections)
Large motions
Low-contrast image regions

45 of 55

Variants of stereo

46 of 55

Real-time stereo

Used for robot navigation (and other tasks)

Several real-time stereo techniques have been developed (most based on simple discrete search)

Nomad robot searches for meteorites in Antarctica

47 of 55

Active stereo with structured light

Project “structured” light patterns onto the object

simplifies the correspondence problem
basis for active depth sensors, such as Kinect and iPhone X (using IR)

Figure panels: camera 1, projector, camera 2; camera 1, projector (Li Zhang’s one-shot stereo)

48 of 55

Active stereo with structured light

https://ios.gadgethacks.com/news/watch-iphone-xs-30k-ir-dots-scan-your-face-0180944/

49 of 55

Laser scanning

Optical triangulation

Project a single stripe of laser light
Scan it across the surface of the object
This is a very precise version of structured light scanning
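Optical triangulation reduces to intersecting a camera ray with the known plane of laser light. A minimal sketch, assuming a calibrated camera with intrinsic matrix K placed at the origin of its own frame and a laser plane n·X = c already expressed in that frame (NumPy; all names and numbers are hypothetical). Sweeping the stripe and repeating this for every lit pixel yields the scanned surface.

```python
import numpy as np

def triangulate_laser_point(pixel, K, plane_n, plane_c):
    """3D point (camera frame) where the ray through `pixel` meets the plane n . X = c."""
    u, v = pixel
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))   # back-projected ray direction
    denom = float(np.dot(plane_n, ray))
    if abs(denom) < 1e-12:
        raise ValueError("camera ray is parallel to the laser plane")
    t = plane_c / denom                               # ray parameter at the plane
    return t * ray

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
plane_n = np.array([1.0, 0.0, 0.5])                   # made-up laser plane: x + 0.5 z = 1.2
point = triangulate_laser_point((370.0, 215.0), K, plane_n, 1.2)
```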

Digital Michelangelo Project

http://graphics.stanford.edu/projects/mich/

50 of 55

Laser scanned models

The Digital Michelangelo Project, Levoy et al.

51 of 55

Laser scanned models

The Digital Michelangelo Project, Levoy et al.

52 of 55

Laser scanned models

The Digital Michelangelo Project, Levoy et al.

53 of 55

Laser scanned models

The Digital Michelangelo Project, Levoy et al.

54 of 55

3D Photography on your Desk

http://www.vision.caltech.edu/bouguetj/ICCV98/

55 of 55

Questions?