1 of 37

Creating a Disparity Map of Two Images from a Smartphone Camera

Final Project Presentation

Bryan Heryanto

Spring 2023

2 of 37

Final Presentation

Disparity

depth

“The disparity is the distance between a pixel and its horizontal match in the other image.” Akhavan et al. (2013, 2014)

To calculate disparity, we need a rectified image pair

Problem: not every smartphone has a stereo camera

This research focuses on translating a single (mono) camera between two shots
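
For a calibrated, rectified pair, disparity and depth are inversely related: Z = f * B / d, with f the focal length in pixels and B the baseline. The sketch below only illustrates that relation. The focal length and baseline values are hypothetical, and the uncalibrated pipeline in this project does not recover metric depth directly.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Convert disparity (pixels) to depth (metres) for a rectified pair.

    Pixels with non-positive disparity are treated as invalid and set to inf.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical numbers: 1000 px focal length, 5 cm camera translation.
disp = np.array([[50.0, 25.0], [10.0, 0.0]])
depth = depth_from_disparity(disp, focal_px=1000.0, baseline_m=0.05)
print(depth)
```

Larger disparities map to closer points, which is why a very small camera translation leaves little disparity range to work with.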

3 of 37

Overview and a Few Things Used in the Project

Feature matching – ORB (Oriented FAST and Rotated BRIEF)

Calculate the fundamental matrix – 8-point algorithm with RANSAC (implemented myself) and OpenCV's 7-point algorithm with RANSAC

Stereo rectification – Hartley (my implementation and OpenCV)

Stereo matching – SSD (implemented myself), SAD (OpenCV BFMatch and SGBM), HITNet
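
As an illustration of the fundamental-matrix step, here is a minimal sketch of the normalized 8-point algorithm in NumPy. It is not the implementation used in this project: the function names are hypothetical, the RANSAC loop is left out, and the matches are synthetic and noise-free.

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: zero mean, average distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    scale = np.sqrt(2.0) / mean_dist
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def eight_point(pts1, pts2):
    """Estimate F with x2^T F x1 = 0 from N >= 8 matches (N x 2 arrays)."""
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)
    # Each match contributes one row of the linear system A f = 0.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalization and fix the arbitrary scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)

# Synthetic check: project random 3D points into two hypothetical cameras.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, 20),
                     rng.uniform(-1, 1, 20),
                     rng.uniform(4, 8, 20)])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.05), np.sin(0.05)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([-0.3, 0.02, 0.05])

def project(points, rot, trans):
    x = (K @ (rot @ points.T + trans[:, None])).T
    return x[:, :2] / x[:, 2:]

pts1 = project(X, np.eye(3), np.zeros(3))
pts2 = project(X, R, t)
F = eight_point(pts1, pts2)

h1 = np.column_stack([pts1, np.ones(len(pts1))])
h2 = np.column_stack([pts2, np.ones(len(pts2))])
residuals = np.einsum("ni,ij,nj->n", h2, F, h1)  # epipolar constraint per match
print(np.abs(residuals).max())
```

With real ORB matches the same function would typically be called inside a RANSAC loop on random 8-match subsets, keeping the F with the most inliers.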

4 of 37

Added and Deprecated Plan

Take two pictures of the objects using any camera

Feature matching: either manually or with a SIFT-based library (changed to ORB)

Calculate the fundamental and/or essential matrix: using the 8-point algorithm (calibrate the camera to find the essential matrix)

Stereo rectification: using the Hartley & Zisserman method or the Trucco & Verri method

Find disparity: using a matching transformation with several costs (changed to SSD, SAD, SGBM, and HITNet (data-driven approach))

5 of 37

About the Implementation of SSD

OpenCV uses a SAD implementation that makes good use of dynamic programming, so its computation is optimized

My implementation runs in around 5 minutes, whereas OpenCV runs in under 30 seconds

My result on the Middlebury dataset:

Solution: switch to StereoBM

OpenCV implementation
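
For reference, a minimal sketch of SSD block matching on a rectified grayscale pair. This is not the implementation timed above. It assumes the left pixel (y, x) matches the right pixel (y, x - d), and it uses an integral image for the window sums, which is one common way to avoid recomputing each block from scratch. The function names are hypothetical.

```python
import numpy as np

def box_sum(img, radius):
    """Sum of img over a (2*radius+1)^2 window, using an integral image."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return (integral[k:, k:] - integral[:-k, k:]
            - integral[k:, :-k] + integral[:-k, :-k])

def ssd_disparity(left, right, max_disp, radius=2):
    """Winner-takes-all disparity from sum-of-squared-differences costs."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        # Compare left columns d..w-1 with right columns 0..w-d-1.
        sq_diff = (left[:, d:] - right[:, :w - d]) ** 2
        cost[d, :, d:] = box_sum(sq_diff, radius)
    return cost.argmin(axis=0)

# Synthetic pair with a known disparity of 3 pixels.
rng = np.random.default_rng(0)
right = rng.integers(0, 256, size=(40, 60)).astype(np.uint8)
left = np.roll(right, 3, axis=1)
disp = ssd_disparity(left, right, max_disp=8)
print(np.unique(disp[5:-5, 10:-5]))
```

Swapping the squared difference for an absolute difference turns the same loop into SAD matching.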

6 of 37

SGBM (Semi-Global Block Matching)

H. Hirschmuller, "Stereo Processing by Semiglobal Matching and Mutual Information," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328-341, Feb. 2008, doi: 10.1109/TPAMI.2007.1166.

7 of 37

Before SGBM (Energy Minimization)

The pixelwise cost and the smoothness constraints are expressed by defining the energy E(D) that depends on the disparity image D

The first term is the sum of all pixel matching costs

The second term adds a constant penalty P1 for all pixels q in the neighborhood N_p of p for which the disparity changes a little bit (that is, 1 pixel)

The third term adds a larger constant penalty P2, for all larger disparity changes

Problem: NP-complete
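
Written out, the energy that these three terms describe (following Hirschmüller, 2008) is shown below. Here T[·] is 1 when its argument is true and 0 otherwise, and C(p, D_p) is the pixelwise matching cost of pixel p at disparity D_p.

```latex
E(D) = \sum_{p} \Big( C(p, D_p)
     + \sum_{q \in N_p} P_1 \, T\big[\, |D_p - D_q| = 1 \,\big]
     + \sum_{q \in N_p} P_2 \, T\big[\, |D_p - D_q| > 1 \,\big] \Big)
```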

8 of 37

Before SGBM (Scanline Approach)

Using DP, it can be solved in polynomial time, but it easily suffers from streaking

Cause of the problem: very strong constraints in one direction, along image rows, are combined with no constraints or much weaker ones in the other direction, along image columns.

9 of 37

SGBM

Each cell is processed once per path, that is, once for each direction

Idea: aggregate the 1D matching costs from all directions equally
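
The aggregation idea can be sketched in NumPy as follows. This is a simplified illustration, not OpenCV's implementation: it uses only 4 axis-aligned paths instead of 8 or 16, and the penalty values P1 and P2 and the function names are assumptions chosen for the example.

```python
import numpy as np

def aggregate_left_to_right(cost, p1, p2):
    """Aggregate a cost volume (H x W x D) along the left-to-right path."""
    h, w, d = cost.shape
    agg = np.empty_like(cost, dtype=np.float64)
    agg[:, 0] = cost[:, 0]
    for x in range(1, w):
        prev = agg[:, x - 1]                         # H x D costs of the previous pixel
        prev_min = prev.min(axis=1, keepdims=True)   # H x 1
        # Candidates: same disparity, disparity +/- 1 (penalty P1), any jump (P2).
        minus = np.full_like(prev, np.inf)
        minus[:, 1:] = prev[:, :-1] + p1
        plus = np.full_like(prev, np.inf)
        plus[:, :-1] = prev[:, 1:] + p1
        best = np.minimum(np.minimum(prev, minus),
                          np.minimum(plus, prev_min + p2))
        # Subtracting prev_min keeps the values from growing along the path.
        agg[:, x] = cost[:, x] + best - prev_min
    return agg

def sgm_disparity(cost, p1=1.0, p2=8.0):
    """Sum the path costs from 4 directions and pick the best disparity."""
    total = aggregate_left_to_right(cost, p1, p2)
    total = total + aggregate_left_to_right(cost[:, ::-1], p1, p2)[:, ::-1]
    vert = cost.transpose(1, 0, 2)  # reuse the row routine along columns
    total = total + aggregate_left_to_right(vert, p1, p2).transpose(1, 0, 2)
    total = total + aggregate_left_to_right(
        vert[:, ::-1], p1, p2)[:, ::-1].transpose(1, 0, 2)
    return total.argmin(axis=2)

# Toy cost volume: true disparity 2 everywhere, one pixel with a misleading cost.
cost = np.full((7, 7, 6), 10.0)
cost[:, :, 2] = 0.0
cost[3, 3, 2] = 1.0
cost[3, 3, 5] = 0.0
raw = cost.argmin(axis=2)
smooth = sgm_disparity(cost)
print(raw[3, 3], smooth[3, 3])
```

The pixelwise winner at the noisy pixel is the wrong disparity, while the aggregated result agrees with its neighbors because every path pays P2 for the jump.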

10 of 37

SGBM vs. OpenCV (from the OpenCV docs)

By default, the algorithm is single-pass, which means that you consider only 5 directions instead of 8. Set mode=StereoSGBM::MODE_HH in createStereoSGBM to run the full variant of the algorithm but beware that it may consume a lot of memory.
The algorithm matches blocks, not individual pixels. Though, setting blockSize=1 reduces the blocks to single pixels.
Mutual information cost function is not implemented. Instead, a simpler Birchfield-Tomasi sub-pixel metric is used. Though, the color images are supported as well.
Some pre- and post- processing steps from K. Konolige algorithm StereoBM are included, for example: pre-filtering (StereoBM::PREFILTER_XSOBEL type) and post-filtering (uniqueness check, quadratic interpolation and speckle filtering).

11 of 37

HITNet: Hierarchical Iterative Tile Refinement Network for Real-time Stereo Matching (2021)

12 of 37

HITNet (Feature Extraction)

U-Net-like architecture (encoder and decoder with skip connections)
Obtain two multi-scale feature-map representations, ε^L and ε^R, from the upsampling blocks

13 of 37

HITNet (Initialization)

The goal is to get tile hypotheses
A 4 × 4 convolution is applied to each extracted feature map e_i, using 4 × 4 strides for ε^L and 4 × 1 strides for ε^R

matching cost:

Initial Disparity:

Tile Hypotheses:

Feature Descriptor:
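
The equations for these four labels did not survive as text. As a hedged reconstruction following Tankovich et al. (2021), with notation that may differ from the original slide and should be checked against the paper, they can be written as below. Here \tilde{e} denotes the tile features produced by the strided convolution, l is the resolution level, D is the maximum disparity, and \mathcal{D} is a small learned network with weights \theta.

```latex
% Matching cost (L1 distance between left and right tile features)
\varrho(l, x, y, d) = \big\| \tilde{e}^{L}_{l,x,y} - \tilde{e}^{R}_{l,4x-d,y} \big\|_{1}

% Initial disparity (lowest-cost candidate)
d^{\mathrm{init}}_{l,x,y} = \operatorname*{arg\,min}_{d \in [0, D]} \varrho(l, x, y, d)

% Tile hypothesis (disparity, zero slants, descriptor)
h^{\mathrm{init}} = \big[\, d^{\mathrm{init}},\ 0,\ 0,\ p^{\mathrm{init}} \,\big]

% Feature descriptor
p^{\mathrm{init}}_{l,x,y} = \mathcal{D}\big( \varrho(d^{\mathrm{init}}_{l,x,y}),\ \tilde{e}^{L}_{l,x,y};\ \theta_{\mathcal{D}_l} \big)
```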

14 of 37

HITNet (Propagation) – Warping

The warping step computes matching costs between the feature maps e_i^L and e_i^R at every feature resolution associated with the tiles.

15 of 37

HITNet (Propagation) – Tile Update Prediction

Augment the tile hypothesis with the matching costs φ from the warping step in order to refine the hypothesis

16 of 37

Test Image

17 of 37

1st Pipeline

ORB (Oriented FAST and Rotated BRIEF) – 8-point algorithm with RANSAC (implemented myself) – Hartley stereo rectification (implemented myself) – BFMatching

18 of 37

What is Wrong?

19 of 37

2nd Pipeline (Feature Matching Fix)

ORB (Oriented FAST and Rotated BRIEF) with FLANN matching – 8-point algorithm with RANSAC (implemented myself) – Hartley stereo rectification (implemented myself) – BFMatching

attempt 1

attempt 2

20 of 37

2nd Pipeline Result

21 of 37

3rd Pipeline

ORB (Oriented FAST and Rotated BRIEF) – 8-point algorithm with RANSAC (implemented myself) – Hartley stereo rectification (implemented myself) – SGBM matching

22 of 37

4th Pipeline

ORB (Oriented FAST and Rotated BRIEF) – 8-point algorithm with RANSAC (implemented myself) – Hartley stereo rectification (implemented by OpenCV) – SGBM matching

No big effect, because it is the same algorithm

23 of 37

5th Pipeline: Moving on to OpenCV

ORB (Oriented FAST and Rotated BRIEF) – 7-point algorithm with RANSAC (implemented with OpenCV) – Hartley stereo rectification (implemented by OpenCV) – SGBM matching

24 of 37

6th Pipeline

ORB (Oriented FAST and Rotated BRIEF) – 8-point algorithm with RANSAC (implemented myself) – Hartley stereo rectification (implemented by OpenCV) – HITNet

25 of 37

7th Pipeline

ORB (Oriented FAST and Rotated BRIEF) – 7-point algorithm with RANSAC (implemented with OpenCV) – Hartley stereo rectification (implemented by OpenCV) – HITNet

26 of 37

Trying to Project into the Left Camera

27 of 37

Test on Other Images: 1st

28 of 37

Result of Test on Other Images: 1st

29 of 37

Images 3 and 4

30 of 37

Result Images 3 and 4

Low number of point correspondences

Low number of point correspondences and too large a baseline

31 of 37

Result Image 5

32 of 37

Result Image 5

Small baseline: the camera moved only a little, and the object is almost fronto-parallel to the camera because it is planar

33 of 37

Fun Things With Disparity

34 of 37

Fun Things With Disparity

35 of 37

Thank You

(Q&A)

36 of 37

The Conclusion

My method of finding the disparity map is still not robust enough, so there are many things left to try:

Calibration (removing projective ambiguity)
Stronger feature matching (SuperGlue)
A different rectification method, e.g., Loop & Zhang
Data-driven rectification, or a single-stage method for rectification and stereo matching

37 of 37

References and Fair Use

https://vision.middlebury.edu/stereo/data/scenes2005/
Hirschmuller, Heiko. "Stereo processing by semiglobal matching and mutual information." IEEE Transactions on Pattern Analysis and Machine Intelligence 30.2 (2008): 328-341.
Tankovich, Vladimir, et al. "HITNet: Hierarchical iterative tile refinement network for real-time stereo matching." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
http://saurabhg.web.illinois.edu/teaching/ece549/sp2020/
https://en.wikipedia.org/wiki/Semi-global_matching
https://docs.opencv.org/
