Binocular Stereo
CSC606: Intro Computer Vision
Single image stereogram, https://en.wikipedia.org/wiki/Autostereogram
What is this?
These slides are taken from Cornell University
From last time: 3D modeling from a photograph
video by Antonio Criminisi
3D modeling from a photograph
Flagellation. Piero della Francesca. c1453.
Related problem: camera calibration
Vanishing points and projection matrix
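$\pi_1 = \Pi\,[1\ 0\ 0\ 0]^T = v_X$ (X vanishing point)
similarly, $\pi_2 = v_Y$, $\pi_3 = v_Z$
(here $\Pi = [\pi_1\ \pi_2\ \pi_3\ \pi_4]$ is the projection matrix and $\pi_i$ its $i$-th column)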
Not So Fast! We only know v’s up to a scale factor
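A minimal numeric sketch of the two points above, assuming a made-up camera (the intrinsics `K`, rotation `R`, and translation `t` are hypothetical values, not from the slides). It checks that projecting the X direction at infinity returns the first column of the projection matrix, and that rescaling that column leaves the vanishing point in the image unchanged, which is why the vanishing points alone fix the columns only up to scale.

```python
import numpy as np

# Hypothetical camera parameters, chosen only for illustration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])  # rotation about the Y axis
t = np.array([[0.5], [-0.2], [4.0]])
P = K @ np.hstack([R, t])  # 3x4 projection matrix


def dehomogenize(p):
    """Convert a homogeneous image point (3-vector) to pixel coordinates."""
    return p[:2] / p[2]


# The point at infinity in the X direction projects to the first column of P.
x_direction = np.array([1.0, 0.0, 0.0, 0.0])
v_x = P @ x_direction

# Rescaling the first column changes the matrix but not the vanishing point,
# so observing v_x in the image cannot reveal the scale of that column.
P_scaled = P.copy()
P_scaled[:, 0] *= 2.5
same_point = np.allclose(dehomogenize(P[:, 0]), dehomogenize(P_scaled[:, 0]))
different_matrix = not np.allclose(P, P_scaled)
```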
Calibration using a reference object
Issues
AR codes
ArUco
Estimating the projection matrix
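One common way to estimate a projection matrix from known 3D reference points and their image positions is the direct linear transform (DLT). The slides do not say which method they use, so the sketch below is an assumption; the function name `estimate_projection_matrix` and the synthetic camera are hypothetical. Each correspondence contributes two linear equations in the 12 entries of the matrix, and the solution (up to scale) is the right singular vector with the smallest singular value.

```python
import numpy as np


def estimate_projection_matrix(points_3d, points_2d):
    """DLT sketch: solve A p = 0 for the 12 entries of P, up to scale."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        Xh = [X, Y, Z, 1.0]
        # u = (p1 . Xh) / (p3 . Xh)  ->  p1 . Xh - u * (p3 . Xh) = 0
        rows.append([*Xh, 0.0, 0.0, 0.0, 0.0, *[-u * c for c in Xh]])
        # v = (p2 . Xh) / (p3 . Xh)  ->  p2 . Xh - v * (p3 . Xh) = 0
        rows.append([0.0, 0.0, 0.0, 0.0, *Xh, *[-v * c for c in Xh]])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)


def project(P, points_3d):
    """Project Nx3 points with a 3x4 matrix and return Nx2 pixel coordinates."""
    homogeneous = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    projected = homogeneous @ P.T
    return projected[:, :2] / projected[:, 2:3]


# Synthetic, noise-free test data from a hypothetical camera.
rng = np.random.default_rng(0)
P_true = np.array([[800.0, 0.0, 320.0, 1600.0],
                   [0.0, 800.0, 240.0, 1200.0],
                   [0.0, 0.0, 1.0, 5.0]])
points_3d = rng.uniform(-1.0, 1.0, size=(8, 3))
points_2d = project(P_true, points_3d)

P_est = estimate_projection_matrix(points_3d, points_2d)
max_reprojection_error = np.abs(project(P_est, points_3d) - points_2d).max()
```

With real, noisy measurements one would typically normalize the coordinates first and use more than the minimum of six points; the sketch skips both.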
Alternative: multi-plane calibration
Images courtesy Jean-Yves Bouguet
Advantage
Single-image depth prediction using deep learning
Image
Depth map
Deep learning
Li and Snavely. Megadepth: Learning single-view depth prediction from internet photos. CVPR 2018.
MiDaS depth prediction
Ranftl et al. Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer.
Single-image depth prediction
Picture credit: Magritte, The Treachery of Images, and the Berkeley Computer Vision Group
Miangoleh*, Dille*, Mai, Paris, and Aksoy.
Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging.
CVPR 2021.
Deep geometry prediction
Questions?
“Mark Twain at Pool Table”, no date, UCR Museum of Photography
Stereo Vision as Localizing Points in 3D
Stereo
Epipolar geometry
epipolar lines
(x1, y1)
(x2, y1)
x2 - x1 = the disparity of pixel (x1, y1)
Two images captured by a purely horizontal translating camera
(rectified stereo pair)
Disparity = inverse depth
(Or, hold a finger in front of your face and wink each eye in succession.)
Your basic stereo matching algorithm
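For each epipolar line
  For each pixel in the left image
    compare with every pixel on the same epipolar line in the right image
    pick the pixel with the minimum match cost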
Improvement: match windows
Stereo matching based on SSD
[Plot: SSD as a function of disparity d; the best matching disparity dmin is where the SSD is smallest]
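The SSD-versus-disparity curve can be sketched for a single pixel as follows. This is an illustration under stated assumptions: the function name `ssd_curve` is hypothetical, disparities are searched over 0..max_disparity, and the match for left pixel (x, y) is taken at (x + d, y) in the right image, following the x2 - x1 convention used earlier. The caller is responsible for keeping all windows inside the image.

```python
import numpy as np


def ssd_curve(left, right, x, y, half_window, max_disparity):
    """SSD between the window around left(x, y) and right(x + d, y), d = 0..max_disparity."""
    h = half_window
    reference = left[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    costs = []
    for d in range(max_disparity + 1):
        candidate = right[y - h:y + h + 1, x + d - h:x + d + h + 1].astype(float)
        costs.append(np.sum((reference - candidate) ** 2))
    return np.array(costs)


# Synthetic rectified pair: the right image is the left image shifted by 5 pixels,
# so right(x + 5, y) == left(x, y) away from the wrap-around border.
rng = np.random.default_rng(0)
left = rng.random((40, 80))
true_disparity = 5
right = np.roll(left, true_disparity, axis=1)

costs = ssd_curve(left, right, x=30, y=20, half_window=3, max_disparity=10)
d_min = int(np.argmin(costs))  # best matching disparity for this pixel
```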
Effect of window size
Smaller window (W = 3): more detail, but more noise
Larger window (W = 20): smoother disparity maps, but less detail
Better results with adaptive window
Stereo results
Ground truth
Scene
Results with window search
Window-based matching
(best window size)
Ground truth
Better methods exist...
Graph cuts-based method
Boykov et al., Fast Approximate Energy Minimization via Graph Cuts,
International Conference on Computer Vision 1999.
Ground truth
For the latest and greatest: http://www.middlebury.edu/stereo/
Stereo as energy minimization
$C(x, y, d(x, y))$ = SSD distance between windows $I(x, y)$ and $J(x + d(x, y), y)$
Stereo as energy minimization
$C(x, y, d)$: the disparity space image (DSI), shown for scanline y = 141 (axes: x and d)
Stereo as energy minimization
[Figure: DSI for scanline y = 141, axes x and d]
Simple pixel / window matching: choose the minimum of each column in the DSI independently:
$d(x, y) = \arg\min_{d'} C(x, y, d')$
Greedy selection of best match
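A small sketch of the greedy selection described above, for one scanline. It assumes a per-pixel squared difference as the match cost (a 1x1 window, for brevity) and the same sign convention as before, left(x, y) matching right(x + d, y); the function name `disparity_space_image` is hypothetical. Entries that would fall outside the right image are set to infinity so they are never selected.

```python
import numpy as np


def disparity_space_image(left, right, y, max_disparity):
    """Return C[d, x]: squared difference between left(x, y) and right(x + d, y)."""
    width = left.shape[1]
    dsi = np.full((max_disparity + 1, width), np.inf)
    for d in range(max_disparity + 1):
        valid = width - d  # columns x for which x + d is still inside the image
        diff = left[y, :valid].astype(float) - right[y, d:].astype(float)
        dsi[d, :valid] = diff ** 2
    return dsi


# Synthetic pair with a constant shift of 4 pixels.
rng = np.random.default_rng(1)
left = rng.random((10, 60))
right = np.roll(left, 4, axis=1)

dsi = disparity_space_image(left, right, y=5, max_disparity=8)

# Winner-take-all: pick the minimum of each DSI column independently.
greedy_disparity = np.argmin(dsi, axis=0)
```

Because every column is decided on its own, nothing here encourages neighboring pixels to agree, which is what the smoothness term is meant to address.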
Stereo as energy minimization
$E(d) = E_d(d) + \lambda E_s(d)$
match cost $E_d(d)$: want each pixel to find a good match in the other image
smoothness cost $E_s(d)$: adjacent pixels should (usually) move about the same amount
Stereo as energy minimization
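match cost: $E_d(d) = \sum_{(x, y) \in I} C(x, y, d(x, y))$
smoothness cost: $E_s(d) = \sum_{(p, q) \in \mathcal{E}} V(d_p, d_q)$
$\mathcal{E}$: set of neighboring pixels (4-connected or 8-connected neighborhood)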
Smoothness cost
“Potts model”: $V(d_p, d_q) = 0$ if $d_p = d_q$, and $1$ otherwise
L1 distance: $V(d_p, d_q) = |d_p - d_q|$
How do we choose V?
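To make the choice of V concrete, here is a sketch that evaluates both smoothness costs on a toy disparity map with a 4-connected neighborhood. The function names and the weight `lam` are hypothetical, and the cost volume is all zeros so that only the smoothness term contributes.

```python
import numpy as np


def potts(dp, dq):
    """Potts model: 0 where the two disparities agree, 1 where they differ."""
    return (dp != dq).astype(float)


def l1(dp, dq):
    """L1 distance between neighboring disparities."""
    return np.abs(dp - dq).astype(float)


def smoothness_cost(disparity, V):
    """Sum V over horizontally and vertically adjacent pixel pairs (4-connected)."""
    horizontal = V(disparity[:, :-1], disparity[:, 1:])
    vertical = V(disparity[:-1, :], disparity[1:, :])
    return float(horizontal.sum() + vertical.sum())


def energy(cost_volume, disparity, V, lam):
    """Match cost summed over pixels plus lam times the smoothness cost."""
    rows, cols = np.indices(disparity.shape)
    match = float(cost_volume[rows, cols, disparity].sum())
    return match + lam * smoothness_cost(disparity, V)


# Toy disparity map with one depth edge: disparity jumps from 2 to 7 in each row.
disparity = np.array([[2, 2, 2, 7],
                      [2, 2, 2, 7],
                      [2, 2, 2, 7]])

potts_cost = smoothness_cost(disparity, potts)  # counts the 3 label changes
l1_cost = smoothness_cost(disparity, l1)        # 3 jumps of size 5

cost_volume = np.zeros((3, 4, 8))               # hypothetical C(y, x, d), all zero
total = energy(cost_volume, disparity, potts, lam=2.0)
```

The Potts cost charges the same amount for any jump, so a sharp depth edge is no more expensive than a small step; the L1 cost grows with the size of the jump.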
Smoothness cost
Dynamic programming
[Figure: DSI for scanline y = 141, axes x and d]
Dynamic Programming
Slide credit: D. Huttenlocher
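A minimal sketch of dynamic programming on a single scanline, assuming an L1 smoothness cost between horizontally adjacent pixels and a hypothetical weight `lam`; the slides do not spell out the exact formulation, so treat this as one possible instance. It finds the exact minimum-cost path through the DSI with a Viterbi-style forward pass and backtracking.

```python
import numpy as np


def scanline_dp(dsi, lam):
    """Minimize sum_x C(x, d_x) + lam * sum_x |d_x - d_(x-1)| over one scanline.

    dsi has shape (num_disparities, width).
    """
    num_d, width = dsi.shape
    labels = np.arange(num_d)
    pairwise = lam * np.abs(labels[:, None] - labels[None, :])  # [previous, current]

    best = dsi[:, 0].astype(float).copy()       # best cost of paths ending at column 0
    backptr = np.zeros((width, num_d), dtype=int)
    for x in range(1, width):
        total = best[:, None] + pairwise        # total[previous, current]
        backptr[x] = np.argmin(total, axis=0)   # best previous label per current label
        best = total[backptr[x], labels] + dsi[:, x]

    # Backtrack from the cheapest final label.
    d = np.zeros(width, dtype=int)
    d[-1] = int(np.argmin(best))
    for x in range(width - 1, 0, -1):
        d[x - 1] = backptr[x, d[x]]
    return d


# Toy DSI: disparity 1 is correct everywhere, but noise at column 3 makes
# disparity 2 look slightly cheaper there.
dsi = np.ones((3, 6))
dsi[1, :] = 0.0
dsi[1, 3] = 0.6
dsi[2, 3] = 0.5

greedy = np.argmin(dsi, axis=0)       # independent per-column choice
smooth = scanline_dp(dsi, lam=1.0)    # DP with smoothness
```

In this toy case the greedy choice jumps to disparity 2 at the noisy column, while the DP solution stays at 1 because the two jumps would cost more than the small gain in match cost. Each scanline is still solved independently, so consistency between rows is not enforced.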
Stereo as a minimization problem
Questions?
Depth from disparity
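[Figure: two cameras with centers C and C’, separated by the baseline and sharing focal length f, observe a scene point X at depth z; its image coordinates are x and x’]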
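For a rectified pair, the standard similar-triangles relation is disparity magnitude = f * baseline / z, so depth is z = f * baseline / disparity. The formula is not shown in this extract, so the sketch below states it as an assumption; the function name, the focal length in pixels, and the baseline in meters are hypothetical values. Zero disparity is mapped to infinite depth.

```python
import numpy as np


def depth_from_disparity(disparity, focal_length_px, baseline):
    """Convert disparities (pixels) to depths in the units of the baseline."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)  # zero disparity -> point at infinity
    positive = disparity > 0
    depth[positive] = focal_length_px * baseline / disparity[positive]
    return depth


# Hypothetical rig: 800-pixel focal length, 0.1 m baseline.
depths = depth_from_disparity([64.0, 32.0, 16.0, 0.0],
                              focal_length_px=800.0, baseline=0.1)
```

Halving the disparity doubles the depth, which is why a one-pixel matching error matters far more for distant points than for nearby ones.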
Stereo reconstruction pipeline
What will cause errors?
Variants of stereo
Real-time stereo
Nomad robot searches for meteorites in Antarctica
Active stereo with structured light
[Figure labels: camera 1, camera 2, projector; camera 1, projector]
Li Zhang’s one-shot stereo
Active stereo with structured light
Laser scanning
Laser scanned models
The Digital Michelangelo Project, Levoy et al.
3D Photography on your Desk
Questions?