1 of 55

Binocular Stereo

CSC606: Intro Computer Vision

What is this?

These slides are taken from Cornell University.

2 of 55

From last time: 3D modeling from a photograph

video by Antonio Criminisi

3 of 55

3D modeling from a photograph

Flagellation. Piero della Francesca. c. 1453.

4 of 55

Related problem: camera calibration

  • Goal: estimate the camera parameters
    • Version 1: solve for 3x4 projection matrix

    • Version 2: solve for camera parameters separately
      • intrinsics (focal length, principal point, pixel size)
      • extrinsics (rotation angles, translation)
      • radial distortion
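As a rough illustration of how the separate parameters combine into the 3x4 matrix, here is a minimal sketch. It assumes square pixels, zero skew and no radial distortion. It composes P = K [R | t] and projects a world point. The helper names (`matmul`, `projection_matrix`, `project`) are hypothetical and not from the slides.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def projection_matrix(f, cx, cy, R, t):
    """Compose P = K [R | t]; assumes square pixels, no skew, no radial distortion."""
    K = [[f, 0.0, cx],
         [0.0, f, cy],
         [0.0, 0.0, 1.0]]               # intrinsics: focal length, principal point
    Rt = [R[i] + [t[i]] for i in range(3)]  # 3x4 extrinsics [R | t]
    return matmul(K, Rt)

def project(P, X):
    """Project world point X = (X, Y, Z) to pixel coordinates (u, v)."""
    Xh = [[X[0]], [X[1]], [X[2]], [1.0]]    # homogeneous 4x1 column
    x = matmul(P, Xh)
    return (x[0][0] / x[2][0], x[1][0] / x[2][0])

# Example: identity rotation, world origin 5 units in front of the camera.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]
P = projection_matrix(500.0, 320.0, 240.0, R, t)
print(project(P, (1.0, 0.5, 0.0)))  # (420.0, 290.0)
```

Version 1 of calibration estimates the 12 entries of `P` directly. Version 2 recovers `f`, `cx`, `cy`, `R`, `t` (plus distortion) individually.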

5 of 55

Vanishing points and projection matrix

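Write the projection matrix by columns, $\Pi = [\,\pi_1\ \pi_2\ \pi_3\ \pi_4\,]$:

$\pi_1 = \Pi\,[1\ 0\ 0\ 0]^T = \mathbf{v}_x$ (X vanishing point)

similarly, $\pi_2 = \mathbf{v}_y$, $\pi_3 = \mathbf{v}_z$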

Not So Fast! We only know v’s up to a scale factor

    • Can fully specify by providing 3 reference points with known coordinates
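The column/vanishing-point relation and its scale ambiguity can be checked numerically. The sketch below uses an arbitrary, made-up 3x4 matrix (illustrative values only). It shows that the first column is the image of the X direction at infinity, and that rescaling the column leaves the vanishing point unchanged.

```python
def apply(P, Xh):
    """Multiply a 3x4 matrix (list of rows) by a homogeneous 4-vector."""
    return [sum(p * x for p, x in zip(row, Xh)) for row in P]

def to_pixel(v):
    """Convert a homogeneous image point (x, y, w) to pixels (x/w, y/w)."""
    return (v[0] / v[2], v[1] / v[2])

# Arbitrary example projection matrix (illustrative values only).
P = [[400.0, 0.0, 300.0, 100.0],
     [0.0, 400.0, 200.0, 50.0],
     [0.5, 0.3, 1.0, 10.0]]

# pi_1 = P [1 0 0 0]^T is the first column: the image of the X direction at infinity.
pi1 = apply(P, [1.0, 0.0, 0.0, 0.0])
vx = to_pixel(pi1)

# A point far along the world X axis projects very close to v_x.
far = to_pixel(apply(P, [1e6, 0.0, 0.0, 1.0]))

# Any nonzero rescaling of the column gives the same vanishing point,
# so a measured v_x only fixes pi_1 up to scale.
vx_scaled = to_pixel([3.0 * c for c in pi1])
print(vx, far, vx_scaled)
```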

6 of 55

Calibration using a reference object

  • Place a known object in the scene
    • identify correspondence between image and scene
    • compute mapping from scene to image

Issues

    • must know geometry very accurately
    • must know 3D -> 2D correspondence

7 of 55

AR codes

ArUco

8 of 55

Estimating the projection matrix

  • Place a known object in the scene
    • identify correspondence between image and scene
    • compute mapping from scene to image
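One way to "compute mapping from scene to image" is a linear least-squares estimate of the projection matrix from known 3D points and their image positions. This is a minimal sketch, not the slides' method. It fixes the bottom-right entry of P to 1, writes two linear equations per correspondence, and solves the normal equations by Gaussian elimination. It needs at least 6 non-coplanar points. Practical implementations usually normalize coordinates and use an SVD instead, because normal equations square the condition number. All function names are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]      # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def estimate_projection(points3d, points2d):
    """Least-squares 3x4 P from >= 6 correspondences, with P[2][3] fixed to 1."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(points3d, points2d):
        # u * (p31 X + p32 Y + p33 Z + 1) = p11 X + p12 Y + p13 Z + p14, same for v
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z]); b.append(u)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z]); b.append(v)
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(11)] for i in range(11)]
    Atb = [sum(r[i] * bi for r, bi in zip(A, b)) for i in range(11)]
    p = solve(AtA, Atb) + [1.0]
    return [p[0:4], p[4:8], p[8:12]]

def project(P, X):
    h = [sum(p * x for p, x in zip(row, list(X) + [1.0])) for row in P]
    return (h[0] / h[2], h[1] / h[2])

# Synthetic check: project a known "calibration object" and recover P.
P_true = [[50.0, 2.0, 10.0, 5.0],
          [1.0, 50.0, 8.0, 3.0],
          [0.1, 0.2, 0.05, 1.0]]
world = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0),
         (1, 0, 1), (0, 1, 1), (1, 1, 1), (2, -1, 0.5), (-1, 2, 1.5)]
image = [project(P_true, X) for X in world]
P_est = estimate_projection(world, image)
```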

9 of 55

Alternative: multi-plane calibration

Images courtesy Jean-Yves Bouguet

Advantage

10 of 55

Single-image depth prediction using deep learning

Image

Depth map

Deep learning

Li and Snavely. MegaDepth: Learning single-view depth prediction from internet photos. CVPR 2018.

11 of 55

MiDaS depth prediction

Ranftl et al. Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer.

12 of 55

Single-image depth prediction

Picture credit: Magritte, The Treachery of Images, and the Berkeley Computer Vision Group

Miangoleh*, Dille*, Mai, Paris, and Aksoy. Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging. CVPR 2021.

13 of 55

Deep geometry prediction

  • More on this topic later!

14 of 55

Questions?

15 of 55

“Mark Twain at Pool Table”, no date, UCR Museum of Photography

16 of 55

17 of 55

Stereo Vision as Localizing Points in 3D

  • An object point will project to some point in our image
  • That image point corresponds to a ray in the world
  • Two rays intersect at a single point, so if we want to localize points in 3D we need 2 eyes
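With real measurements the two back-projected rays rarely meet exactly. A common workaround, assumed here rather than taken from the slides, is midpoint triangulation: find the closest points on the two rays and return their midpoint. In this minimal sketch, ray centers and directions are plain 3-vectors, and `triangulate_midpoint` is a hypothetical name.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be localized")
    s = (b * e - c * d) / denom        # parameter of closest point on ray 1
    t = (a * e - b * d) / denom        # parameter of closest point on ray 2
    p1 = [p + s * q for p, q in zip(c1, d1)]
    p2 = [p + t * q for p, q in zip(c2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]

# Two "eyes" one unit apart, both looking at the point (0.5, 0.2, 4.0).
X = triangulate_midpoint([0.0, 0.0, 0.0], [0.5, 0.2, 4.0],
                         [1.0, 0.0, 0.0], [-0.5, 0.2, 4.0])
```

With a single eye (one ray), `denom` has no partner ray to work with and depth along the ray stays unknown. That is the point of the slide.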

18 of 55

Stereo

  • Given two images from different viewpoints
    • How can we compute the depth of each point in the image?
    • Based on how much each pixel moves between the two images

19 of 55

Epipolar geometry

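Figure: rectified stereo pair with horizontal epipolar lines; pixel (x1, y1) in the first image matches pixel (x2, y1) in the second.

x2 - x1 = the disparity of pixel (x1, y1)

Two images captured by a purely horizontally translating camera (rectified stereo pair)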

20 of 55

Disparity ∝ inverse depth

(Or, hold a finger in front of your face and wink each eye in succession.)

21 of 55

Your basic stereo matching algorithm

  • Match pixels in conjugate epipolar lines
    • Assume brightness constancy
    • This is a challenging problem
    • Hundreds of approaches
      • A good survey and evaluation: http://www.middlebury.edu/stereo/

22 of 55

Your basic stereo matching algorithm

  • For each epipolar line
    • For each pixel in the left image
      • compare with every pixel on the same epipolar line in the right image
      • pick the pixel with minimum match cost

Improvement: match windows
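The per-pixel loop above can be sketched for a single scanline of a rectified pair. This is a minimal sketch under a few assumptions. Match cost is the absolute intensity difference, which relies on brightness constancy. Disparity follows the x2 - x1 convention, so a left pixel at x is compared with right pixels at x + d. The search is capped at a hypothetical `max_disp`.

```python
def match_scanline(left, right, max_disp):
    """Pixel-wise matching along one epipolar line (no windows, no smoothness)."""
    disparities = []
    for x, intensity in enumerate(left):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            xr = x + d                          # candidate pixel on the same line
            if xr >= len(right):
                break
            cost = abs(intensity - right[xr])   # brightness constancy
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities

left = [10, 50, 90, 30, 70, 20, 60, 40]
right = [0, 0, 10, 50, 90, 30, 70, 20]          # left shifted by 2 pixels
print(match_scanline(left, right, 3))
```

The last two pixels have no true match inside the image, so they pick whatever happens to be cheapest. Single-pixel costs are fragile in the same way on real images, which motivates matching windows.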

23 of 55

Stereo matching based on SSD

Figure: SSD match cost plotted against disparity d; its minimum, d_min, is the best matching disparity.
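A minimal sketch of window-based SSD matching, assuming grayscale images stored as lists of rows, a square window of radius `r`, and the same x + d convention as the SSD term J(x + d(x,y), y). For each pixel it evaluates the SSD curve over d and keeps d_min. Border pixels whose window does not fit are left as `None`. Function names are illustrative.

```python
def ssd(I, J, x, y, d, r):
    """SSD between the (2r+1)x(2r+1) window of I at (x, y) and of J at (x + d, y)."""
    total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            diff = I[y + dy][x + dx] - J[y + dy][x + d + dx]
            total += diff * diff
    return total

def window_stereo(I, J, max_disp, r):
    h, w = len(I), len(I[0])
    disp = [[None] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            # SSD as a function of d, only for windows that fit inside J
            costs = {d: ssd(I, J, x, y, d, r)
                     for d in range(max_disp + 1) if x + d + r < w}
            if costs:
                disp[y][x] = min(costs, key=costs.get)   # d_min
    return disp

# Synthetic textured image and a copy shifted right by one pixel.
I = [[(7 * x * x + 13 * y + 3 * x * y) % 31 for x in range(12)] for y in range(7)]
J = [[I[y][x - 1] if x >= 1 else 0 for x in range(12)] for y in range(7)]
D = window_stereo(I, J, max_disp=3, r=1)
```

Here `r` trades detail against noise. A larger window averages over more pixels but blurs disparity edges.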

24 of 55

Window size

    • Smaller window
      • more detail
      • more noise

    • Larger window
      • less noise
      • less detail

Effect of window size (figure panels: W = 3, W = 20)

Better results with adaptive window

25 of 55

Stereo results

    • Data from University of Tsukuba
    • Similar results on other images without ground truth

(Figure panels: Scene, Ground truth)

26 of 55

Results with window search

(Figure panels: Window-based matching (best window size); Ground truth)

27 of 55

Better methods exist...

(Figure panels: Graph cuts-based method; Ground truth)

Boykov et al., Fast Approximate Energy Minimization via Graph Cuts, International Conference on Computer Vision, 1999.

For the latest and greatest: http://www.middlebury.edu/stereo/

28 of 55

Stereo as energy minimization

  • What defines a good stereo correspondence?
    1. Match quality
      • Want each pixel to find a good match in the other image
    2. Smoothness
      • If two pixels are adjacent, they should (usually) move about the same amount

29 of 55

Stereo as energy minimization

  • Find disparity map d that minimizes an energy function $E(d)$

  • Simple pixel / window matching

$E(d) = \sum_{(x,y) \in I} C(x, y, d(x,y))$

where $C(x, y, d(x,y))$ = SSD distance between windows I(x, y) and J(x + d(x,y), y)

30 of 55

Stereo as energy minimization

Figure: C(x, y, d), the disparity space image (DSI), shown for scanline y = 141 (axes x and d)

31 of 55

Stereo as energy minimization

Figure: DSI for scanline y = 141 (axes x and d)

Simple pixel / window matching: choose the minimum of each column in the DSI independently:

$d(x, y) = \arg\min_{d'} C(x, y, d')$

32 of 55

Greedy selection of best match

33 of 55

Stereo as energy minimization

  • Better objective function

$E(d) = E_d(d) + \lambda E_s(d)$

match cost $E_d(d)$: want each pixel to find a good match in the other image

smoothness cost $E_s(d)$: adjacent pixels should (usually) move about the same amount

34 of 55

Stereo as energy minimization

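match cost: $E_d(d) = \sum_{(x,y) \in I} C(x, y, d(x,y))$

smoothness cost: $E_s(d) = \sum_{(p,q) \in \mathcal{E}} V(d_p, d_q)$

$\mathcal{E}$: set of neighboring pixels (4-connected or 8-connected neighborhood)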

35 of 55

Smoothness cost

“Potts model”: $V(d_p, d_q) = 0$ if $d_p = d_q$, and $1$ otherwise

L1 distance: $V(d_p, d_q) = |d_p - d_q|$

How do we choose V?
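To make the objective concrete, the sketch below evaluates $E(d) = E_d(d) + \lambda E_s(d)$ for a given disparity map with either choice of V. It assumes a 4-connected neighborhood with each neighboring pair counted once, and a cost volume stored as `C[y][x][k]` (the matching cost of disparity k at pixel (x, y)). It only scores a labeling; it does not minimize.

```python
def potts(dp, dq):
    return 0 if dp == dq else 1

def l1(dp, dq):
    return abs(dp - dq)

def energy(C, d, lam, V):
    """E(d) = sum_p C(p, d_p) + lam * sum over 4-connected pairs of V(d_p, d_q)."""
    h, w = len(d), len(d[0])
    data = sum(C[y][x][d[y][x]] for y in range(h) for x in range(w))
    smooth = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                smooth += V(d[y][x], d[y][x + 1])   # right neighbor
            if y + 1 < h:
                smooth += V(d[y][x], d[y + 1][x])   # lower neighbor
    return data + lam * smooth

# 2x2 image, 3 disparities, cost equal to the disparity value (toy data).
C = [[[0, 1, 2], [0, 1, 2]], [[0, 1, 2], [0, 1, 2]]]
d = [[0, 2], [0, 0]]
print(energy(C, d, 0.5, potts), energy(C, d, 0.5, l1))  # 3.0 4.0
```

The Potts model charges the same penalty for any disparity jump. L1 charges more for larger jumps, so it favors gradual slants but over-penalizes true depth discontinuities.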

36 of 55

Smoothness cost

  • If λ = infinity, then we only consider smoothness
  • Optimal solution is a surface of constant depth/disparity
    • Fronto-parallel surface

  • In practice, want to balance data term with smoothness term

37 of 55

Dynamic programming

  • Can minimize this independently per scanline using dynamic programming (DP)

38 of 55

Dynamic programming

  • Finds a “smooth”, low-cost path through the DSI from left to right
  • Visiting a node incurs its data cost; switching disparities from one column to the next also incurs a (smoothness) cost

(Figure: DSI for scanline y = 141, axes x and d)
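A minimal sketch of the per-scanline dynamic program, assuming the DSI for one scanline is given as `C[x][d]` and smoothness is λ·V between consecutive pixels. Each node stores the cheapest path cost reaching it plus a back pointer. The optimal path is recovered by backtracking from the cheapest node in the last column. It runs in O(n k²) for n pixels and k disparities.

```python
def l1(p, q):
    return abs(p - q)

def scanline_dp(C, lam, V):
    """Minimize sum_x C[x][d_x] + lam * sum_x V(d_x, d_{x+1}) along one scanline."""
    n, k = len(C), len(C[0])
    cost = [C[0][:]]           # cost[x][d]: cheapest path ending at node (x, d)
    back = [[0] * k]           # back[x][d]: best disparity at column x - 1
    for x in range(1, n):
        row, ptr = [], []
        for d in range(k):
            prev = min(range(k), key=lambda p: cost[x - 1][p] + lam * V(p, d))
            row.append(C[x][d] + cost[x - 1][prev] + lam * V(prev, d))
            ptr.append(prev)
        cost.append(row)
        back.append(ptr)
    d = min(range(k), key=lambda j: cost[-1][j])   # cheapest final node
    path = [d]
    for x in range(n - 1, 0, -1):
        d = back[x][d]
        path.append(d)
    return path[::-1]

# Toy DSI: true disparity 1 everywhere, but pixel 2 slightly prefers 0.
C = [[5, 0, 5], [5, 0, 5], [0, 1, 5], [5, 0, 5], [5, 0, 5]]
print(scanline_dp(C, 0.0, l1))   # lam = 0 reduces to per-column minima
print(scanline_dp(C, 1.0, l1))   # smoothness removes the outlier
```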

39 of 55

Dynamic Programming

40 of 55

Dynamic programming

  • Can we apply this trick in 2D as well?

  • No: the shortest path trick only works to find a 1D path

Slide credit: D. Huttenlocher

41 of 55

Stereo as a minimization problem

  • The 2D problem has many local minima
    • Gradient descent doesn’t work well

  • And a large search space
    • n x m image w/ k disparities has $k^{nm}$ possible solutions
    • Finding the global minimum is NP-hard in general

  • Good approximations exist (e.g., graph cuts algorithms)
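To see why exhaustive search is hopeless, the sketch below brute-forces the energy on a tiny image. It uses a Potts smoothness term, a 4-connected neighborhood and toy costs. It also counts the labelings it visits, which equals $k^{nm}$. Even a 2x3 image with 3 disparities already needs 729 evaluations, and the count grows exponentially with the number of pixels.

```python
import itertools

def brute_force_min(C, lam, n, m, k):
    """Exhaustively minimize E(d) over all k**(n*m) maps (Potts, 4-connected)."""
    best, best_d, count = float("inf"), None, 0
    for labels in itertools.product(range(k), repeat=n * m):
        count += 1
        d = [labels[r * m:(r + 1) * m] for r in range(n)]   # n rows of m pixels
        data = sum(C[y][x][d[y][x]] for y in range(n) for x in range(m))
        smooth = sum(d[y][x] != d[y][x + 1] for y in range(n) for x in range(m - 1))
        smooth += sum(d[y][x] != d[y + 1][x] for y in range(n - 1) for x in range(m))
        E = data + lam * smooth
        if E < best:
            best, best_d = E, d
    return best, best_d, count

# Toy costs: disparity 1 is the best match at every pixel.
n, m, k = 2, 3, 3
C = [[[abs(j - 1) for j in range(k)] for _ in range(m)] for _ in range(n)]
best, best_d, count = brute_force_min(C, 1.0, n, m, k)
```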

42 of 55

Questions?

43 of 55

Depth from disparity

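Figure: scene point X at depth z, viewed by cameras with centers C and C’ (focal length f) separated by the baseline B; X projects to x and x’.

$\text{disparity} = x - x' = \dfrac{B \cdot f}{z}$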
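Rearranging the relation gives $z = B f / \text{disparity}$. The minimal sketch below assumes the focal length and disparity are both in pixels, the baseline is in meters, and disparity is taken as a positive magnitude. Zero disparity is treated as a point at infinity. The function name is hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """z = B * f / disparity for a rectified pair; returns depth in baseline units."""
    if disparity_px == 0:
        return float("inf")      # no parallax: point at infinity
    return focal_px * baseline_m / disparity_px

# f = 800 px, B = 0.25 m: doubling the disparity halves the depth.
print(depth_from_disparity(40, 800.0, 0.25))   # 5.0
print(depth_from_disparity(80, 800.0, 0.25))   # 2.5
```

Because depth is inversely proportional to disparity, a one-pixel disparity error costs much more depth accuracy for distant points (small disparities) than for near ones.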

44 of 55

Stereo reconstruction pipeline

  • Steps
    • Calibrate cameras
    • Rectify images
    • Compute disparity
    • Estimate depth

What will cause errors?

    • Camera calibration errors
    • Poor image resolution
    • Occlusions
    • Violations of brightness constancy (specular reflections)
    • Large motions
    • Low-contrast image regions

45 of 55

Variants of stereo

46 of 55

Real-time stereo

  • Used for robot navigation (and other tasks)
    • Several real-time stereo techniques have been developed (most based on simple discrete search)

Nomad robot searches for meteorites in Antarctica

47 of 55

Active stereo with structured light

  • Project “structured” light patterns onto the object
    • simplifies the correspondence problem
    • basis for active depth sensors, such as Kinect and iPhone X (using IR)

(Figure panels: two cameras plus a projector; one camera plus a projector)

Li Zhang’s one-shot stereo

48 of 55

Active stereo with structured light

49 of 55

Laser scanning

  • Optical triangulation
    • Project a single stripe of laser light
    • Scan it across the surface of the object
    • This is a very precise version of structured light scanning
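The triangulation step can be sketched as a ray/plane intersection. This assumes the laser stripe sweeps out a plane whose equation n · X = c is already known in camera coordinates, for example from calibration. It also assumes a pinhole camera at the origin looking along +Z. Each lit pixel is back-projected to a ray, and the 3D point is where that ray meets the light plane. All names and numbers are illustrative.

```python
def backproject(u, v, f, cx, cy):
    """Direction of the camera ray through pixel (u, v); camera at origin, +Z forward."""
    return ((u - cx) / f, (v - cy) / f, 1.0)

def intersect_laser_plane(ray, normal, offset):
    """Intersect the ray X = t * ray with the light plane normal . X = offset."""
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-12:
        return None                  # ray parallel to the light plane
    t = offset / denom
    return tuple(t * r for r in ray)

# A lit pixel at (420, 240) and a vertical light plane X = 0.4 (toy calibration).
ray = backproject(420.0, 240.0, 500.0, 320.0, 240.0)
point = intersect_laser_plane(ray, (1.0, 0.0, 0.0), 0.4)
print(point)
```

Scanning the stripe across the object repeats this for every lit pixel in every frame, each with the plane's pose at that time. This turns a hard correspondence search into a simple geometric intersection.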

Digital Michelangelo Project

http://graphics.stanford.edu/projects/mich/

50 of 55

Laser scanned models

The Digital Michelangelo Project, Levoy et al.

51 of 55

Laser scanned models

The Digital Michelangelo Project, Levoy et al.

52 of 55

Laser scanned models

The Digital Michelangelo Project, Levoy et al.

53 of 55

Laser scanned models

The Digital Michelangelo Project, Levoy et al.

54 of 55

3D Photography on your Desk

55 of 55

Questions?