JavaScript isn't enabled in your browser, so this file can't be opened. Enable and reload.

1 of 456

Remaining course logistics

Final project reports are due on Tuesday 12/14 at 23:59 ET.

- Gradescope submission site available.

Keep an eye out for a Piazza announcement on returning borrowed equipment.

2 of 456

Class evaluation*s* – please take them!

CMU’s Faculty Course Evaluations (FCE): https://cmu.smartevals.com/

TA evaluation: https://www.ugrad.cs.cmu.edu/ta/F21/feedback/

15-463/663/862 end-of-semester survey: https://docs.google.com/forms/d/e/1FAIpQLSc6eXw_tbcxIF7-fm882V_7g80Q-l_34oObplTMCqOUgMcEkw/viewform

Please take all three of them, super helpful for developing future offerings of the class.

Thanks in advance!

3 of 456

Today’s judges

Matthew O’Toole

Jun-Yan Zhu

4 of 456

Computational Periscopy with an Ordinary Camera

BY: JOSH ABRAMS

5 of 456

Imagine…

You are standing near a corner with a wall.

6 of 456

Imagine…

You are standing near a corner with a wall.
There’s something really cool on the other side�that you want to take a picture of.

???

7 of 456

Imagine…

You are standing near a corner with a wall.
There’s something really cool on the other side�that you want to take a picture of.
The thing on the other side will kill you if you walk�around the corner.

8 of 456

Imagine…

You are standing near a corner with a wall.
There’s something really cool on the other side�that you want to take a picture of.
The thing on the other side will kill you if you walk�around the corner.
There’s something on the other side with known shape�(but potentially unknown position).

9 of 456

Imagine…

You are standing near a corner with a wall.
There’s something really cool on the other side�that you want to take a picture of.
The thing on the other side will kill you if you walk�around the corner.
There’s something on the other side with known shape�(but potentially unknown position).

10 of 456

Sounds awfully contrived don’t’cha think?

Well… yes
But potentially less contrived and a step forward in some ways
Maybe there’s something you could throw around the corner

11 of 456

Alright, so how does this work?

12 of 456

Alright, so how does this work?

[1] Charles Saunders, John Murray-Bruce, and Vivek K Goyal. Computational periscopy with an ordinary digital camera. Nature, 565(7740):472–475, 2019.

13 of 456

Alright, so how does this work?

1. Make assumptions to simplify the modified rendering equation

[1] Charles Saunders, John Murray-Bruce, and Vivek K Goyal. Computational periscopy with an ordinary digital camera. Nature, 565(7740):472–475, 2019.

14 of 456

Alright, so how does this work?

2. Estimate the occluder position and use this to estimate the lightfield matrix

[1] Charles Saunders, John Murray-Bruce, and Vivek K Goyal. Computational periscopy with an ordinary digital camera. Nature, 565(7740):472–475, 2019.

15 of 456

Alright, so how does this work?

3. Solve least squares problem with total variance regularization

[1] Charles Saunders, John Murray-Bruce, and Vivek K Goyal. Computational periscopy with an ordinary digital camera. Nature, 565(7740):472–475, 2019.

16 of 456

Why the occluder?

Improves conditioning for inversion of the light transport matrix. Without it, the columns are rather smooth and similar since every point on the wall receives light from every point in the image. This results in a not-very-well-defined inverse, as shown at left.

[1] Charles Saunders, John Murray-Bruce, and Vivek K Goyal. Computational periscopy with an ordinary digital camera. Nature, 565(7740):472–475, 2019.

17 of 456

Did it work for them?

[1] Charles Saunders, John Murray-Bruce, and Vivek K Goyal. Computational periscopy with an ordinary digital camera. Nature, 565(7740):472–475, 2019.

18 of 456

Did it work for you?

No :( — Will rant if there's time

19 of 456

Thanks! Any Questions?

20 of 456

Citations

All professional-looking equations and figures come from:
[1] Saunders, C., Murray-Bruce, J., & Goyal, V. K. (2019). Computational periscopy with an ordinary digital camera. Nature, 565(7740), 472–475. https://doi.org/10.1038/s41586-018-0868-6

21 of 456

Acoustic lenses

(Deblurring weird lenses)

Hossein Baktash

22 of 456

Optics (EM waves)

Acoustics (mechanical waves)

23 of 456

Acoustics + Optics ??

24 of 456

Refraction

Depends on the material

And that’s how lenses work

25 of 456

What do acoustic waves have to do with this?

Pressure in the material changes (spatially and temporally)

This changes refractive index at every point

Too hard!

Let’s use this one just for water

Where c is a constant and p is pressure

26 of 456

Change in refraction

How the change happens:�
c has been empirically found for water and it is very small! ��
e.g. to make a change of 0.1 in refractive index of water, we need��
Remember glass (~1.5) vs air (~1.0)

27 of 456

So, it's hard to affect light with mechanical waves

28 of 456

But it's possible!

Ignore the wires!

Continuous refraction:

Cylindrical transducer

Just apply voltage to the surface

Pressure profile

29 of 456

(transducer)

US OFF

US ON

30 of 456

Can we do imaging with this?

Yes. Here are the PSFs of our lens

~1mm

Spatially varying PSF
Almost Gaussian
Deconvolution is very doable
Can do sparse sampling of PSFs and treat them as locally invariant PSFs

Pressure pattern

31 of 456

Results

Simulated data

Experimental data

32 of 456

More interesting patterns

Mode 0 pattern:

Mode 2:

33 of 456

Mode 2

Why? Because it can be extended to travelling wave (not a closed cylinder)

Beam pattern in Mode 2

3 PSFs

PSFs are to large to deconvolve!

34 of 456

Ignore the outer parts

An image under mode 2

Top left pinhole

Center pinhole

Top center pinhole

Pulsing light in sync with ultrasound gives u this:

35 of 456

The inner part takes line integrals only

Project all pixels on this line

Orange plot is kinda being copied in every row
Take an average over rows
This also helps with noise

36 of 456

Radon transform

Project to 1D
Rotate the image a little
Project to 1D
Rotate
Project
…

37 of 456

In simulation

38 of 456

Reconstruction

39 of 456

Improvement with a 1D deconvolution

1D PSFs

40 of 456

True image Radon and inverted

Sharpened mode 2 reconstructed

41 of 456

More improvement with more sets of images

42 of 456

Captured images

The setup:

Only 8 rotations for now (actually 4)

1D deconvolve

43 of 456

After 1D deconvolving

44 of 456

A sparse target image

More rotations:

45 of 456

Will it work?

Yes :D
Needs a better setup, to make more rotations possible

46 of 456

Thank You!

47 of 456

Yi-Chun Chen

Building

Diffuser Camera for Photography

48 of 456

What is DiffuserCam?

Lensless imaging system
Replace lens with a diffuser
Advantages:
Lightweight
Create it at home!
Possibility of 3D imaging

https://waller-lab.github.io/DiffuserCam/tutorial/building_guide.pdf

49 of 456

My DiffuserCam

Raspberry Pi

Pi camera with lens removed

Sensor!

Lens!

50 of 456

My DiffuserCam

Scotch Tape as the diffuser

Black paper to construct aperture

51 of 456

Why could it work?

https://waller-lab.github.io/DiffuserCam/tutorial/building_guide.pdf

𝒗: a 2D array of light intensity values (scene), � the sum of many point sources of varying intensity and position

b: a 2D array of pixel values on the sensor

𝒉: PSF

52 of 456

Why could it work?

https://waller-lab.github.io/DiffuserCam/tutorial/building_guide.pdf

Solving a deconvolution problem.

Cropping function makes it not invertible.

Use alternating direction method of multipliers (ADMM) to optimize the reconstruction.

53 of 456

Process

https://waller-lab.github.io/DiffuserCam/tutorial/building_guide.pdf

54 of 456

Calibration and images

PSF - Out of focus

PSF - (Almost) in focus

Sensor reading of an object

55 of 456

Results

56 of 456

Results

57 of 456

Results

58 of 456

Results

59 of 456

Results

60 of 456

What’s more

3D Imaging

https://waller-lab.github.io/DiffuserCam/

61 of 456

Acknowledgement

Ching-Yi Lin (ECE)

Wu-Chou Kuo (III)

Joon Jang

Prof. Yannis

62 of 456

Convolution Color Constancy for Mobile Device Photography

Yutong Dai

63 of 456

Background

Color Constancy

— What color is the light illuminating the scene?

— ? —>

[2]

[2]

64 of 456

Goals

Requirements for Color Constancy

— speed of processing, mitigation of uncertainty, � temporal cohesion for frames, account for low input resolution

Objective

— generalizability of introduced algorithm to photograph sets other than the ones utilized and curated from DSLR cameras

[1]

65 of 456

Algorithm: Convolutional Color Constancy (CCC)

Discriminative Learning

�

— Optimizing:

[2]

66 of 456

Algorithm: CCC

Efficient Filtering

[2]

naive histogram depiction

67 of 456

Algorithm: CCC

Generalization

— create more potential inputs with the same properties and histograms

68 of 456

Sample Results

— two sources of lighting

— only room lighting

69 of 456

Citations

Nikola Banic, Karlo Koscevic, Marko Subasic, and Sven Loncaric. The past and the present of the color checker dataset misuse. CoRR, abs/1903.04473, 2019
Jonathan T. Barron. Convolutional color constancy. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015

70 of 456

Thank you!

71 of 456

Dual Photography

Ravi Dudhagra

72 of 456

Background

Light Source

Camera

Scene

Prime Image

Helmholtz Reciprocity

fr(ωi → ωo) = fr(ωo → ωi)

(BRDF is the same going from A→B as from B→A)

Light

73 of 456

Background

(virtual)

Light Source

(virtual)

Camera

Scene

Dual Image

Dual Photography:

technique to interchange the lights and cameras in a scene

Helmholtz Reciprocity

fr(ωi → ωo) = fr(ωo → ωi)

(BRDF is the same going from A→B as from B→A)

74 of 456

Implementation

Camera

Scene

Projector

p

n

q

m

i

j

pixel i (projector-space)

maps to pixel j (camera space)

Call this mapping T (mn x pq)

75 of 456

Implementation

Camera

Scene

Projector

p

n

q

m

c = T p

p is some pattern we project (pq x 1)

c is the resulting camera image (mn x 1)

76 of 456

Implementation

Camera

Scene

Projector

p

n

q

m

p’’ = T^T c’’

c’’ is some pattern we project

(from the POV of the camera)

p’’ is the resulting dual image

(from the POV of the projector)

77 of 456

Acquiring T

(naive approach: capture an image for each projector pixel)

TWO PROBLEMS

Too many projector pixels!

1920*1080 = 2,073,600 images

T is VERY large

(1920*1080)*(1920*1080)*[8 bytes per num] = 34,398,535,680,000 (~34.4 TB)

78 of 456

Acquiring T

Use blocks

Split projector into blocks
Take image for each block
Identify blocks that didn’t interfere with each other
Subdivide the blocks
Run non-interfering blocks in parallel

p

q

1

2

3

4

Efficiently

79 of 456

Acquiring T

Use blocks

Split projector into blocks
Take image for each block
Identify blocks that didn’t interfere with each other
Subdivide the blocks
Run non-interfering blocks in parallel

p

q

1

2

3

4

1

2

3

4

Efficiently

80 of 456

Acquiring T

Use blocks

Split projector into blocks
Take image for each block
Identify blocks that didn’t interfere with each other
Subdivide the blocks
Run non-interfering blocks in parallel

p

q

1

2

3

4

1

2

3

4

Efficiently

81 of 456

Acquiring T

Use blocks

Split projector into blocks
Take image for each block
Identify blocks that didn’t interfere with each other
Subdivide the blocks
Run non-interfering blocks in parallel

p

q

5

1

2

3

4

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

Efficiently

82 of 456

Acquiring T

Use blocks

Split projector into blocks
Take image for each block
Identify blocks that didn’t interfere with each other
Subdivide the blocks
Run non-interfering blocks in parallel

p

q

5

1

2

3

4

6

5

6

7

8

7

8

5

6

9

10

7

8

11

12

Efficiently

83 of 456

Setup

Projector

Webcam

Scene

84 of 456

Results

Prime image

Dual image

Capture sequence

85 of 456

Results

	# Subdivision levels	# Blocks	File size of T	Raw size of T matrix
Coffee Mug	16	943218	66MB	3.8TB
Box/cards	15	576833	37.7MB	3.8TB
Glass/spoon	15	159825	11.2MB	3.8TB

Camera: (960 x 540)

Projector: (1280 x 720)

86 of 456

Applications

Capturing lightfields

Multiple light sources cannot be used in parallel
Solution: use multiple cameras instead, dual-photography to convert

87 of 456

thanks!

Any questions?

See the code @

https://github.com/rdudhagra/dual-photography

Read the paper @

https://graphics.stanford.edu/papers/dual_photography/

?

88 of 456

A Comparison of Various Edge-Aware Filtering Techniques

By Gilbert Fan

89 of 456

Bilateral Filter

Simple weighted average where weights are determined by similarity (sig_r) and distance (sig_s).

90 of 456

Domain Transform

Break image into horizontal and vertical 1D signals (C + 1 dimensions).

Domain transform into 1D, must be isometric.

Apply a 1D filter to the transformed data repeatedly for set number of iterations.

91 of 456

Guided Filter

“Similar” to joint bilateral filter but instead of computing weights from guide image, the output is a linear transformation of the guide image with coefficients determined by the input image, calculated window by window.

OutputWindow = a * GuideWindow + b

Where a, b are determined by the input image window

High variance around pixel => a = 1, b = 0

Flat patch => a = 0, b = average of window

92 of 456

Laplacian Filter

Constructs a laplacian pyramid pixel by pixel from remappings of the input image using values of its gaussian pyramid, collapsing them to form the final image.

93 of 456

Bilateral

94 of 456

Domain

95 of 456

Guided

96 of 456

Laplacian

97 of 456

Original

98 of 456

Bilateral

99 of 456

Domain

100 of 456

Guided

101 of 456

Laplacian

102 of 456

Original

103 of 456

Bilateral

104 of 456

Domain

105 of 456

Domain

106 of 456

Guided

107 of 456

Laplacian

108 of 456

Citations

CHEN, J., PARIS, S., AND DURAND, F. 2007. Real-time edge-aware image processing with the bilateral grid. ACM Transactions on Graphics (Proc. SIGGRAPH) 26, 3.

Eduardo S. L. Gastal and Manuel M. Oliveira. "Domain Transform for Edge-Aware Image and Video Processing". ACM Transactions on Graphics. Volume 30 (2011), Number 4, Proceedings of SIGGRAPH 2011, Article 69.

He, K., Sun, J., & Tang, X. (2013). Guided Image Filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6), 1397–1409. https://doi.org/10.1109/tpami.2012.213

Paris, S., Hasinoff, S. W., & Kautz, J. (2015). Local laplacian filters. Communications of the ACM, 58(3), 81–91. https://doi.org/10.1145/2723694

109 of 456

2D Light Direction Estimation Analysis Using Different Color Spaces

Mary Hatfalvi

110 of 456

2D Light Direction Estimation

Input Image

Contours traced around image mask

111 of 456

2D Light Source Estimation Algorithm [1]

L = Light Direction

N = 2D Normal Vectors (found from 2 connecting contour points)

I = Intensity Value of the Light interpolated at the location of the normal vector

This least squares equation can be rewritten as a pseudo inverse solution

Least Squares Solution for Light Direction [1]

M Matrix Definition [1]

[1] Kee, Eric & Farid, Hany. (2010). Exposing digital forgeries from 3-D lighting environments. 10.1109/WIFS.2010.5711437.

112 of 456

Color Spaces

Grayscale

XYZ

113 of 456

Color Spaces

LAB

YCbCr

114 of 456

Color Spaces

Illumination Invariant - (IIv) [1]

SUV [2]

[2] Satya P. Mallick, Todd E. Zickler, David J. Kriegman, and Peter N. Belhumeur, "Beyond Lambert: Reconstructing Specular Surfaces Using Color." in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2005.

[1] Hamilton Y. Chong, Steven J. Gortler, and Todd Zickler. 2008. A perception-based color space for illumination-invariant image processing. In ACM SIGGRAPH 2008 papers (SIGGRAPH '08)DOI:https://doi.org/10.1145/1399504.1360660

115 of 456

Dataset used for Testing

Helmet Left [1]

Plant Left [1]

[1] “Light Stage Data Gallery.” Light Stage Data Gallery, https://vgl.ict.usc.edu/Data/LightStage/.

116 of 456

Helmet and Plant Error Results

Plant Left	Grayscale	XYZ	LAB	YCbCr	SUV	IIV
Mean Angle Error	0.7646	0.7188	0.7778	0.7677	0.5044	0.6431
Mean Euclidean Distance Error	0.4409	0.3995	0.455	0.445	0.1786	0.3064

Helmet Left	Grayscale	XYZ	LAB	YCbCr	SUV	IIV
Mean Angle Error	0.3939	0.6117	0.3701	0.3974	0.473	0.4593
Mean Euclidean Distance Error	0.1437	0.3434	0.1239	0.146	0.2227	0.1913

117 of 456

Taken Images

3 Light Sources

Example Image with Color Checker and Chrome Sphere for calculating 2D Light Direction [1]

[1] Ying Xiong (2021). PSBox (https://www.mathworks.com/matlabcentral/fileexchange/45250-psbox), MATLAB Central File Exchange. Retrieved December 8, 2021.

118 of 456

Interesting Images Results (Rubix Cube)

Input Image

GT 2D Light Estimation

XYZ - Y Channel

SUV Light Direction Estimation Results

SUV - Specularity Invariant Combination

XYZ light Direction Estimation Results

119 of 456

Thank you

120 of 456

Depth Estimation using the Symmetric Point Spread Function Model

Dual Pixel Defocus Disparity:

Anne He

Anne He

121 of 456

Motivation

Getting the depth map is very useful - refocusing gives nice stylistic effects, especially on lower end hardware
How can we get an accurate depth map without needing to collect a ton of data?

Other methods either require multiple images or very constrained scenarios
Learning-based approaches require large amounts of ground truth training data which is hard to acquire

Back to depth from defocus

??

??

Garg et al. Learning Single Camera Depth Estimation using Dual Pixels.

122 of 456

123 of 456

Smartphone DP data in the wild

124 of 456

Smaller aperture leads to smaller disparity, but still noticeable to the human eye

125 of 456

Hypothesis

In an ideal DP camera, the circle of confusion is evenly split between the two views and sum to a full circle
In real life, light may leak into the opposite side
How do we prove this property still generally holds?

126 of 456

Experiment

Capturing the PSF

The PSF models what a point light source looks like under different depths and camera configurations
Can also estimate PSF using image of grid of disks with known radius and spacing (the sharp image)

Consider a constant depth image patch G and the known sharp image F
Let H be the PSF that produces G when convolved with F

127 of 456

Are kernels symmetrical?

Estimate kernel for each view
Cross-correlate two sides

(mostly)Yes!

128 of 456

How is this useful for depth estimation

Radius of blur kernel is in relation to depth!

Thin lens model: distance between sensor and lens creates circular blur

Symmetry allows us to simplify parameters and solve an optimization problem for H

129 of 456

There’s more!

At a region of constant depth, can parameterize H by just the radius

Typically something like a Gaussian kernel
Proposed translating disk kernel is more realistic and accurate

130 of 456

Results

131 of 456

Results

132 of 456

Results

133 of 456

Results

134 of 456

Discussion

Advantages

Impressive that the method works on smartphone data
Looks reasonably good
Images taken rather arbitrarily, did not have to capture any training data (calibration experiment earlier was just to prove the symmetry property)

Disadvantages

As with other defocus methods, struggles with untextured surfaces
Sliding window optimization approach is O(number of constant depth windows), which is pretty slow
Can be sped up using a CNN

135 of 456

References

[1] Abhijith Punnappurath, Abdullah Abuolaim, Mahmoud Afifi, and Michael S. Brown. Modeling defocus-disparity in dual-pixel sensors. In IEEE International Conference on Computational Photography (ICCP), 2020.

[2] Shumian Xin, Neal Wadhwa, Tianfan Xue, Jonathan T. Barron, Pratul P. Srinivasan, Jiawen Chen, Ioannis Gkioulekas, and Rahul Garg. Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image. In ICCV 2021. https://imaging.cs.cmu.edu/dual_pixels/

[3] F. Mannan and M. S. Langer, “Blur calibration for depth from defocus,” in Conference on Computer and Robot Vision (CRV), 2016, pp. 281–288.

[4] T. Xian and M. Subbarao, “Depth-from-defocus: Blur equalization technique,” in SPIE: Society of Photo-Optical Instrumentation Engineers, 2006

136 of 456

Thank you!

Questions?

137 of 456

PatchMatch and

Content-Aware Image Editing

By: Alan Hsu

(15-463 Fall 2021)

138 of 456

PatchMatch

Algorithm

139 of 456

What is PatchMatch?

PatchMatch is an algorithm to produce a dense per-pixel correspondence (Nearest Neighbor Field, or NNF) between two images

Patch offset

140 of 456

How is Patch Match more efficient?

1. Dimensionality of Offset Space (much smaller)

2. Natural Structure of Images

(adjacent pixels likely have a similar offset)

3. Law of Large Numbers

(over multiple random offsets it is likely to find a good offset)

141 of 456

How does PatchMatch work?

142 of 456

Iter 0

Iter 0.25

Iter 0.5

Iter 0.75

Iter 1

Iter 5

Example of Convergence

The top image is reconstructed using the pixels from the bottom image

143 of 456

PatchMatch

Applications

144 of 456

Can you spot the difference?

145 of 456

How about this one?

146 of 456

Guided Image Completion

We compute the NNF from the hole to the rest of the image!

147 of 456

More Structural Image Editing Results

148 of 456

Per-Pixel Style Transfer

149 of 456

Oil Painting!

150 of 456

More Painting Results (Monet)

151 of 456

Citations and Acknowledgements:

1.) Prof. Ioannis Gkioulekas for the all the help he provided!

2.) PatchMatch paper: https://gfx.cs.princeton.edu/pubs/Barnes_2009_PAR/patchmatch.pdf

3.) SIFT Flow: https://people.csail.mit.edu/celiu/SIFTflow/

4.) Style Transfer: https://arxiv.org/pdf/1508.06576.pdf

5.) Vincent van Gogh and Claude Monet for their paintings

152 of 456

Thanks!

153 of 456

DiffuserCam for 3D Printing Applications

Joon Jang, jiwoong@andrew.cmu.edu

154 of 456

DiffuserCam

DiffuserCam is a compact and relatively simple computational camera for single-shot 3D imaging.

Due to the single-shot nature and affordability of components to build it, I’m exploring whether it’d be usable in 3D printing contexts.

155 of 456

How it Works

156 of 456

Theory: PSF Differences

Lateral changes leads to changes in placement in the PSF on the sensor.

Similarly, Vertical changes leads to changes in placement in the PSF on the sensor.

157 of 456

How It’s Made

158 of 456

How It’s Made

159 of 456

Calibration

160 of 456

Preliminary Results

3D File

3D Print

Reconstructed Print

161 of 456

Preliminary Results

3D File

3D Print

Reconstructed Print

162 of 456

Applying Computational Photography techniques to the WhiskSight Sensor

Teresa Kent

163 of 456

WhiskSight Sensor

Kent, Teresa A., et al. "WhiskSight: A Reconfigurable, Vision-Based, Optical Whisker Sensing Array for Simultaneous Contact, Airflow, and Inertia Stimulus Detection." IEEE Robotics and Automation Letters 6.2 (2021): 3357-3364.

164 of 456

What the Camera Sees

Kent, Teresa A., et al. "WhiskSight: A Reconfigurable, Vision-Based, Optical Whisker Sensing Array for Simultaneous Contact, Airflow, and Inertia Stimulus Detection." IEEE Robotics and Automation Letters 6.2 (2021): 3357-3364.

165 of 456

Current Application

Kent, Teresa A., et al. "WhiskSight: A Reconfigurable, Vision-Based, Optical Whisker Sensing Array for Simultaneous Contact, Airflow, and Inertia Stimulus Detection." IEEE Robotics and Automation Letters 6.2 (2021): 3357-3364.

166 of 456

What Improvements can be made

Handling situations which break the tracker

Tradeoffs between array size and accuracy

Novel Applications

167 of 456

Understanding the Camera Space/Overlap

Image Overlap at 50 mm

Two Camera’s View of the 3d world

168 of 456

Updated Sensor

Camera

169 of 456

Visual Representation of the Overlapping Space

170 of 456

Image Detection Through the Elastomer

171 of 456

Use the sharpness to distinguish the objects in front of and behind the elastomer

Luminance

Blurred Image

High Detail Image

=

-

172 of 456

Tested Different Thresholds for Sharpness and Sigma Values

Threshold

173 of 456

Separated Image

174 of 456

Separated Image

175 of 456

What’s Next

Create 3d Map of the external scene
Deblur the external scene

Quantize the improvement in localization of the whisker using two cameras.

Huixuan Tang, Scott Cohen, Brian Price, Stephen Schiller and Kiriakos N. Kutulakos, Depth from defocus in the wild. Proc. IEEE Computer Vision and Pattern Recognition Conference, 2017.

Heide, F., Rouf, M., Hullin, M. B., Labitzke, B., Heidrich, W., & Kolb, A. (2013). High-quality computational imaging through simple lenses. ACM Transactions on Graphics (TOG), 32(5), 1-14.

176 of 456

Thank You

177 of 456

All-in-focus Image Estimation from Dual Pixel Images

Akash Sharma Tarasha Khurana

Expected all-in-focus image

178 of 456

What are DP images and why we care about all-in-focus?

Some devices with Dual Pixel sensors

a) Traditional Sensor b) Dual-Pixel Sensor

Each pixel of the Dual-Pixel sensor is divided into two parts, left and right. It captures images much like a stereo camera with a small baseline.

All-in-focus images are used extensively in robotics applications. Existing robotics algorithms cannot run on defocus-blurred images.

179 of 456

Can we recover an all-in-focus image from a DP image?

Dual Pixel Images are two-sample light fields and each pixel integrates light from half the main lens aperture.

Idea: Use disparity of out-of-focus points as a guide to defocus the image. Problem is still under-constrained.

180 of 456

How has this been done before?

Model the dual pixel (left, right) images as an observation rendered from an underlying Multiplane Image.

181 of 456

How has this been done before?

Model the dual pixel (left, right) images as an observation rendered from an underlying Multiplane Image.

182 of 456

How has this been done before?

Model the dual pixel (left, right) images as an observation rendered from an underlying Multiplane Image.

183 of 456

How has this been done before?

Model the dual pixel (left, right) images as an observation rendered from an underlying Multiplane Image.

184 of 456

What are some limitations that can be addressed?

MPIs are good at rendering images fast but cannot represent scenes with continuous depths.
We can rather render defocus image observations from continuous representations such as Neural Radiance Fields.

Model defocus image using rendering from continuous volumetric representation.

Discrete depth representations can model only coarse relative ordering of pixels

185 of 456

End of Story? Why is this difficult?

With MPI you can blur image planes using 2D convolution with lens kernels.
With continuous representation, it is unclear how lens aberrations and spatially varying blur of lenses can be similarly convolved or modeled

Trace hourglass like cones through volume and weight regions continuously.

Easy to model blurring

186 of 456

What is a workaround? Thin-lens rendering

Bypass blurring altogether by sampling the lens aperture as an n x m grid i.e., render a lightfield.
For a NeRF¹, this means rendering an image at each of these n x m locations on the lens.
Then average left and right half of the lens aperture to develop a rendering model for dual pixel images

¹ We are skipping a lot of volumetric rendering details for NeRF.

Lens aperture

Image plane

Rays from aperture plane

187 of 456

So, does this work? Kind of!

Implementing this gives us a lightfield image where we have a 8 x 8 subaperture view of every pixel on the image.

We can render the left and right DP images from these by averaging the 8 x 4 halves of all the pixels.

8

8

188 of 456

So, does this work? Kind of!

Although rendered dual pixel images look good (PSNR: 43.13) …

Predicted Images

Left DP Image

Right DP Image

189 of 456

So, does this work? Kind of!

Although rendered dual pixel images look good (PSNR: 43.13) …

Target Images

Left DP Image

Right DP Image

190 of 456

So, does this work? Kind of!

The individual pinhole images (or all-in-focus images) look jagged and we need a way to enforce pixel-wise smoothness.

191 of 456

How can we enforce smoothness?

Through a regularization loss between neighbouring pixels, like the Smooth L1 loss.

Recall that default pixel sampling in NeRF is random.

192 of 456

How can we enforce smoothness?

Through a regularization loss between neighbouring pixels, like the Smooth L1 loss.

Recall that default pixel sampling in NeRF is random.

We modify to randomly sample only half the pixels and then add their x- and y- neighbours (chosen with equal probability).

193 of 456

Does enforcing smoothness help? Not really

All-in-focus images become excessively smooth (almost stretched) …

*Note that we did not have enough GPU cycles for hyperparameter tuning for any of the experiments

194 of 456

Does enforcing smoothness help? Not really

and PSNR drops on the dual pixel images from 43.13 to 32.6.

Predicted Images

Left DP Image

Right DP Image

195 of 456

Does enforcing smoothness help? Not really

and PSNR drops on the dual pixel images from 43.13 to 32.6.

Target Images

Left DP Image

Right DP Image

196 of 456

What is really going wrong?

What can still be done?

Even with a forward model defined, the problem is under-constrained.
Careful regularization with losses is required to obtain good results.

Brute force approach, really really slow
Improve aperture sampling by reusing ray samples
Use an implicit network to model light field directly

197 of 456

References

[1] Shumian Xin, Neal Wadhwa, Tianfan Xue, Jonathan T. Barron, Pratul P. Srinivasan, Jiawen Chen, Ioannis Gkioulekas, and Rahul Garg. Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image, ICCV 2021.

[2] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R. and Ng, R., 2020, August. Nerf: Representing scenes as neural radiance fields for view synthesis. In European conference on computer vision (pp. 405-421). Springer, Cham.

[3] Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R. and Srinivasan, P.P., 2021. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. arXiv preprint arXiv:2103.13415.

[4] Sitzmann, V., Rezchikov, S., Freeman, W.T., Tenenbaum, J.B. and Durand, F., 2021. Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering. arXiv preprint arXiv:2106.02634.

198 of 456

3D Video�from�single view video

Nupur Kumari, Vivek Roy

199 of 456

Existing work: 3D shot photography

200 of 456

Method flowchart

Create a Layered Depth Image (LDI) using depth map

Find discontinuities in LDI by depth threshold

Create context region for in-painting background�(lower depth part)

201 of 456

Example Context and Discontinuity map

Note that each discontinuous edge is handled independently with its own context region

202 of 456

Failure cases

If the context regions extends to wrong regions, in-painting output is not ideal and leads to bleeding.

203 of 456

Similar example, where the road has white patches due to white color in shirt, though its not there in original frame.

Thus the method is quite sensitive to predicted depth values

Failure cases

204 of 456

Use segmentation to remove context from different layers of LDI. (We use detectron2 Mask RCNN)

Simple approach to prevent context bleeding

Before

After

205 of 456

Simple approach to prevent context bleeding

Before

After

206 of 456

Doesn’t solve the issue always. We can use context of other frames in video for in-painting

Before

After

207 of 456

Extended to video frame by frame

208 of 456

Depth estimation from video

Used CVD^[*] with updated flow prediction network for improved consistency based depth estimation.

Given camera parameters for each frame map depth frame to frame:

[*] Xuan Luo, Jia-Bin Huang, Richard Szeliski, Kevin Matzen, and Johannes Kopf. Consistent video depth estimation. ACM TOG (Proc. SIGGRAPH), 39(4), 2020.

209 of 456

Depth estimation from video

Flow prediction by RAFT

Use predicted flow to warp the

frame to frame and then find the 3D coordinate of warped image in coordinate frame.

Measure the similarity between the reprojections using flow and camera.

Teed, Zachary, and Jia Deng. "Raft: Recurrent all-pairs field transforms for optical flow." European conference on computer vision. Springer, Cham, 2020..

210 of 456

CVD

211 of 456

CVD + RAFT

212 of 456

Using more context from video

Note that each discontinuous edge is handled independently with its own context region

213 of 456

Using more context from video

Combined context

214 of 456

Using more context from video

Combined context

215 of 456

Using more context from video

216 of 456

217 of 456

Failure Case

218 of 456

Temporally consistent Nerf Representation

219 of 456

Nerf video representation

Current training uses flow to train a time consistent NeRF model

Li, Zhengqi, et al. "Neural scene flow fields for space-time view synthesis of dynamic scenes." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021..

220 of 456

Disentangling objects in Nerf representation

Use different nerf model for each object and fuse them together during rendering.

Promote predicted opacity to match the masks for object branch.

Li, Zhengqi, et al. "Neural scene flow fields for space-time view synthesis of dynamic scenes." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021..

221 of 456

Scene Flow from Light Field Gradients

Joelle Lim

222 of 456

Scene Flow: what is?

Given:

How did the scene move?

time t

t + 1

t + k

223 of 456

how…

224 of 456

Ray flow equation

225 of 456

Ray flow equation

Light field gradients

226 of 456

Ray flow equation

Light field gradients

Scene Flow Components

227 of 456

Light ray parameterization

Ray Flow due to scene motion

parallel shift of ray

228 of 456

229 of 456

=

Assuming ray brightness remains constant,...

230 of 456

First order taylor expansion

231 of 456

Ray flow equation

First order taylor expansion

Substitute

232 of 456

Ray flow equation

underconstrained

233 of 456

Ray flow equation

underconstrained

have to impose additional constraints!!!

234 of 456

Ray flow equation

Optical flow equation

235 of 456

Ray flow equation

Optical flow equation

236 of 456

Ray flow equation

Optical flow equation

Local:

Lucas Kanade

Global:

Horn Schunck

237 of 456

Lucas Kanade (Local)

Solve for V,

where

assumption: scene motion is constant in neighbourhood

238 of 456

Horn Schunck (Global)

Optimization problem to minimize

Scene motion variation

Minimize:

error term

smoothness term

239 of 456

some results

240 of 456

Lucas Kanade

241 of 456

my results

their chad result

Lucas Kanade

Horn-Schunck

Global + Local

242 of 456

Other results from paper

with enhancements ++�

Local + Global methods
Pyramid
Graduated Non Convexity iterations
Occlusions terms

243 of 456

Thank you!

Citations:

Ma, S., Smith, B. M., & Gupta, M. (2018). 3D scene flow from 4d light field gradients. Computer Vision – ECCV 2018, 681–698. https://doi.org/10.1007/978-3-030-01237-3_41

244 of 456

Reconstructing Refractive Objects

via Light-Path Triangulation

Emma Liu (emmaliu)

245 of 456

Why is reconstruction of transparent objects hard?

Because they don’t have a local appearance - we have to rely on observations of the scene behind the object.

Well that’s difficult. What can we do about it?

Apply knowledge of the scene and properties of refraction to light-path triangulation!

246 of 456

Refractive Light-Path Triangulation

Assuming that light intersects with the surface of an object at most twice (enters and exits), and with light propagation information of different views of the scene, solve a minimization problem finitely constraining the surface properties of the object.

247 of 456

Light Path Scene Model

Reference camera �vs. with validation cameras

For a pixel q, if our view of 3D world p is obscured by a transparent object, it takes a piece-linear light path and refracts through the object.

Assuming that the light path intersects with the surface exactly 2x, then we can solve for the remainders:

b: 1st point of intersection w/ object
l^b: ray from p to b
l^m: ray connecting l^b, l^f (refracts incident light in the object’s interior)
f: 2nd point of intersection w/ object

With position f and its surface normal n^f,

we can compute the depth of the object at that point.

248 of 456

Developing the Correspondence Map

For each pixel q, determine the first ray in the light path that indirectly projects through it: the first ray that intersects the object, l^b = L(q).

Key: Make use of two backdrop plane locations to determine two backdrop positions via stripe projection, then map pixel to the ray defined both.

249 of 456

Reconstruction via Minimizing Light-Path Consistency Error

Objective: For each light path, determine the surfel pair (f,n^f) that minimizes the reconstruction error: refracted ray l^m meets l^f and l^b at both ends as much as possible.

For each estimate surfel, compute l^mby tracing backwards and applying Snell’s Law with the index of refraction*.

Then the reconstruction error per camera is the shortest distance between the two rays:

* But wait, aren’t we trying to determine the IOR?? More on that later…

250 of 456

Triangulation Procedure: Depth Sampling

By refraction laws, the normals n^b and n^f can be computed based on position b, f.

Perform a brute-force search to determine the candidate locations of b and f:

Choose finite ranges along rays l^b, l^f.
Sample n points evenly along this range
For each sample pair, compute the normal n^f. Evaluate its error.

Depth computed with optimal surfel, camera pos:

251 of 456

Determining the Index of Refraction (IOR)

With >=4 cameras, the index of refraction can be computed. Repeat triangulation/optimal surfel-finding procedure for n pixels for m candidate IORs.

For reference, the IOR of K9 crystal glass (my Bohemian crystal) is ~1.509.

252 of 456

Results

Depth (above), normal maps for rotated views (below)

253 of 456

Thanks

Citations:

K. N. Kutulakos and E. Steger, “A theory of refractive and specular 3D shape by light-path triangulation,” International Journal of Computer Vision, 11-Jul-2007. [Online]. Available: https://link.springer.com/article/10.1007/s11263-007-0049-9.

*Steger, E. (2006). “Reconstructing transparent objects by refractive light-path triangulation”, Master’s thesis, Department of Computer Science, University of Toronto. http://www.cs.toronto.edu/ ~esteger/thesis.pdf.

*Figures and high-level algorithm explanation provided by this paper.

CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon,and infographics & images by Freepik

254 of 456

Multi-modal 3D reconstruction

Robert Marcus (rbm@)

255 of 456

Background

Quality and price are typically proportional with LIDAR/TOF sensors
Useful implementations may be cost prohibitive for lower-end applications
Sol 3D scanner*

$800
1-0.1mm resolution

Microsoft Kinect**

$200 (new, 2014)
Single digit mm resolution

*https://scandimension.com/products/sol-3d-scanner-included-tent-3d-scanner-usb-cables-cover-and-scan-target

**https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3304120/

256 of 456

Background - Cont

TOF/LIDAR sensors have other shortcomings

Size/bulkiness
Results can be corrupted (i.e., noisy results depending on lighting)

Image based alternatives for 3d reconstruction

Photometric stereo (Assgn5)
Shape from shading
Etc.

Data based alternatives for 3d reconstruction

Monocular depth estimation

Lot of cool looking monocular stuff here: https://paperswithcode.com/task/depth-estimation

257 of 456

Background - Cont

Ideal sensor package would have all three properties
What if we tried ???

258 of 456

Background - Cont

Ideal sensor package would have all three properties
What if we tried

259 of 456

GCPNs - Big picture

Use aligned coarse* depth map to constrain shape (normals) from polarization of scene
Integrating over the constrained normals will yield a superior result to either method on their own

Kadambi et al had >10x resolution boost**

*1-5mm resolution

**5mm kinect resolution -> .3mm combined resolution

260 of 456

GCPNs - How

Very complicated. TLDR:
Normals from polarization by Fresnel equations
Crutch of the problem comes down to solving several separate systems

I.e., solve for Z to get improved normals
And more (physics based integration)

Significant additional work done to reduce ambiguities and distortions by physics based integration (see far right result vs raw fusion)

Image credit: https://web.media.mit.edu/~achoo/polar3D/polarized3D_poster.pdf

261 of 456

GCPNs - Results

Physics based poisson integration of fused data yielded superior results

Allows correcting refractive distortion and distortion of zenith angle
At a cost though…

Image 1 credit https://web.media.mit.edu/~achoo/polar3D/polarized3D_poster.pdf

Image 2 credit https://web.media.mit.edu/~achoo/polar3D/kadambi_new_england_vision2015.pdf

Table credit https://web.media.mit.edu/~achoo/polar3D/camready/supplement_iccv.pdf

262 of 456

GCPN - My results

Work in progress
Using additional (simpler) papers to guide process
Namely

Efﬁciently Combining Positions and Normals for Precise 3D Geometry (Nehab et al)

For sensor fusion

Depth from a polarisation + RGB stereo pair (Zhu et al)

For integration of the fused data

263 of 456

Credits

Special thanks to Prof. Yannis for helping me read through the core paper and suggest additional literature
Polarized 3D: High-Quality Depth Sensing with Polarization Cues

Achuta Kadambi and Vage Taamazyan and Boxin Shi and Ramesh Raskar

Efﬁciently Combining Positions and Normals for Precise 3D Geometry

Diego Nehab and Szymon Rusinkiewicz and James Davis and Ravi Ramamoorthi

Depth from a polarisation + RGB stereo pair

Dizhong Zhu and William A. P. Smith

Thank you for listening

264 of 456

Synthetic Depth-of-Field with a Mobile Phone

Arpita Nag

265 of 456

Motivation and Background

A shallow depth of field and resulting bokeh effect is an aesthetic quality in photos

Difficult to produce on typical smartphone cameras with small apertures, and DSLR’s are pretty expensive on their own.

DP data can serve as a 2D lightfield, or a stereo system with a tiny baseline -- maybe use this disparity to mimic bokeh!

266 of 456

Pipeline

Part 1 - Obtain a Defocus Map

Figure: Original paper which used 2D tile-matching to estimate flow in temporal sequence.

First, we must estimate a high resolution, sub-pixel disparity map.

We first use tile matching of one view against the other, and heuristics to produce a per-pixel flow field (to the other view) and per-pixel confidence.

Because image content is optically blurred based on distance from the focal plane, there is a shift, or disparity, between the two views that depends on depth and on the shape of the blur kernel. This system is normally used for auto- focus, where it is sometimes called phase-detection auto-focus.

We just look at -3 to 3 offset for 8x8 tiles and take the argmin, then use the two neighboring offsets to fit a “subpixel” disparity

We have a couple heuristics -- namely horizontal gradient, presence of a second minimum (though that didnt rly do much), and assuming that at least one neighboring tile has same tile-offset

Then we do a sort of nearest neighbor interpolation for the pixels where we take the sub-pixel shift at the bounding ones and take the argmin of this and then the next step

267 of 456

Pipeline (Cont)

Part 1.2 - Calibrate the Image

Sadly, smartphone cameras don’t totally obey the thin-lens approximation.

Objects at the same depth but different spatial locations can have different disparities due to lens aberrations and sensor defects.

268 of 456

Pipeline (Cont)

Part 1.3 - Final Depth / Defocus Map

Use bilateral filtering methods and confidence map to produce a smoothed out, in-painted disparity map.

SIDE NOTE: The original paper also uses trained convolutional neural networks to semantically segment out “people” if they appear in any pics and mask out the region to constant disparity.

269 of 456

Pipeline (End)

Part 2 - Rendering the Blur

Precompute a blur radius based on the difference between disparity at location “x” and the target in-focus disparity.

With optimizations for time, perform a “scatter” operation on the image (similar to convolution)

Once “blurring” is done for some disparity ranges, use alpha compositing to render final image

270 of 456

Results (Pt 1)

Disclaimer: Calibration not done yet, and rendering is still done with discrete, quantized “disparity bands”.

271 of 456

Results (Pt 2)

272 of 456

More Results and Challenges

Current rendering implementation is still painfully slow

Solution? Blur pixels in the gradient domain or perform blur on downsampled image

Initial data is still too noisy, even after having captured a burst of images
Blurred result is unrealistically smooth, so we need to synthetically add noise as in the paper

273 of 456

References

Robert Anderson, David Gallup, Jonathan T. Barron, Janne Kontkanen, Noah Snavely, Carlos Hernández, Sameer Agarwal, and Steven M. Seitz. 2016. Jump: virtual reality video. ACM Trans. Graph. 35, 6, Article 198 (November 2016), 13 pages. DOI:https://doi.org/10.1145/2980179.2980257

Barron J.T., Poole B. (2016) The Fast Bilateral Solver. In: Leibe B., Matas J., Sebe N., Welling M. (eds) Computer Vision – ECCV 2016. ECCV 2016. Lecture Notes in Computer Science, vol 9907. Springer, Cham. https://doi.org/10.1007/978-3-319-46487-9_38

Neal Wadhwa, Rahul Garg, David E. Jacobs, Bryan E. Feldman, Nori Kanazawa, Robert Carroll, Yair Movshovitz-Attias, Jonathan T. Barron, Yael Pritch, and Marc Levoy. 2018. Synthetic depth-of-field with a single-camera mobile phone. ACM Trans. Graph. 37, 4, Article 64 (August 2018), 13 pages. DOI:https://doi.org/10.1145/3197517.3201329

274 of 456

Thank you!

Any questions?

275 of 456

Capturing Lightfields

Gaurav Parmar

276 of 456

Task Goal

277 of 456

Hardware for Capturing the Lightfields

278 of 456

Example lightfields

279 of 456

Refocusing the Image

280 of 456

Aspects of this approach

(+) the lighfields captured are great

(-) very expensive equipment is needed

- Can I capture something like this?

281 of 456

Unstructured Lightfields

Can we capture the lighfield by moving the camera around?
Aligned later using a template matching method

282 of 456

Challenges in this Procedure

The motion of the camera needs to be carefully planned

283 of 456

Solution - filtering out frames

Track the position of each frame in the video

284 of 456

Initial Result Captured

285 of 456

Focus on the stone

286 of 456

Focus on the Text

287 of 456

Issues in the results

The results captured still do not look great�
Interface to close the loop

288 of 456

In progress tasks

289 of 456

Thank You!

290 of 456

Structured Light 3D Scanning

Sarah Pethani

291 of 456

Structured Light Scanning

Triangulation:

Replace a camera with a projector
Solve correspondence problem by searching the pattern in the camera image

292 of 456

Time Multiplexing: Temporal Binary Codes

Time Multiplexing: create a codeword by projecting many patterns on scene

First projected pattern is most often the most significant bit
Follow coarse-to-fine paradigm
This process is called “decoding”

Temporal binary codes: display sequence of white and black bars

White bar = 1, black bar = 0

293 of 456

294 of 456

295 of 456

Problems with Basic Structured Light Scanning

Incident illumination is large problem for surface reconstruction

Short-range effects: subsurface scattering and defocus
Long-range effects: interreflections

296 of 456

How to fix?

For short range effects, want to use ONLY low-frequency patterns
For long range effects, want to use ONLY high-frequency patterns

Decompose low-frequency pattern into multiple high-frequency patterns, and project these instead

When performing xor on the binary codes resulting from these patterns, we result in binary code for low-frequency pattern
Allows us to pick between low-frequency and high-frequency in different scenarios, without projecting a different set of patterns

297 of 456

Deciding between low-freq and high-freq

Project 2 low-freq patterns, and 2 high-freq patterns
In case of short-range effects:

The low-freq patterns will agree with one another and be correct

In case of long-range effects:

The high-freq patterns will agree with one another and be correct

Intuitively, just perform a consistency check to decide which depth is correct

298 of 456

299 of 456

300 of 456

Conventional Gray

Max-Min SW Gray

XOR 02

XOR 04

Combined

301 of 456

Thanks!

302 of 456

Yingsi Qin

Andrew Maimone, Andreas Georgiou, and Joel S. Kollin. 2017. Holographic near-eye displays for virtual and augmented reality. ACM Trans. Graph. 36, 4, Article 85 (July 2017), 16 pages.

303 of 456

304 of 456

305 of 456

306 of 456

307 of 456

308 of 456

309 of 456

310 of 456

311 of 456

312 of 456

313 of 456

314 of 456

315 of 456

316 of 456

317 of 456

318 of 456

319 of 456

320 of 456

321 of 456

Complete 3D Object Reconstruction from Surround Structured Light Scanning

Nathan Riopelle

322 of 456

Project Goal (As Proposed)

Create a 3D reconstruction of an object using a single camera under the presence of structured light. While traditional approaches for capturing a 360-degree scan of an object require a turntable and multiple images, this project will accomplish the same using a pair of planar mirrors and an orthographic projector constructed with a Fresnel lens.

323 of 456

Project Goal (As Proposed)

Orthographic projector and planar mirrors generate four “virtual” cameras

1

2

3

4

Based off of 2007 3DIMPVT paper

324 of 456

Calibration is Immensely Complex

Intrinsic camera calibration

Intrinsic projector calibration

Projector / Fresnel lens focal point alignment

Mapping of scan lines to planes in 3D

Estimation of position/pose of mirrors relative to camera

325 of 456

First Attempt at Surround Scanning

326 of 456

Revised Project Goal

Create a single-view 3D reconstruction of an object using a single camera under the presence of structured light. As a stretch goal, attempt surround scanning using a pair of planar mirrors and an orthographic projector constructed with a Fresnel lens.

327 of 456

Single-View Scanning Setup

328 of 456

Intrinsic Camera + Projector Calibration

Calibrated by projecting structured light Gray codes onto a 9x6 checkerboard

Zoomed in

Zoomed out

Projector

Camera

329 of 456

3D Scan Targets

Larger,

Simple geometry

Many inter-occlusions,

Detailed

Large depth variation in small area

330 of 456

Projected Area Mask

331 of 456

Projector Row + Column Decodings

332 of 456

3D Reconstruction Results

333 of 456

3D Reconstruction Results

334 of 456

3D Reconstruction Results

335 of 456

References

1] Gupta, M., Agrawal, A., Veeraraghavan, A., & Narasimhan, S. G. (2013). A Practical Approach to 3D Scanning in the Presence of Interreflections, Subsurface Scattering and Defocus. International Journal of Computer Vision, 102(1–3), 33–55. https://doi.org/10.1007/s11263-012-0554-3

[2] Gupta, M., Agrawal, A., Veeraraghavan, A., & Narasimhan, S. G. (2011). Structured light 3D scanning in the presence of global illumination. CVPR 2011, 713–720. https://doi.org/10.1109/CVPR.2011.5995321

[3] Lanman, D., Crispell, D., & Taubin, G. (2007). Surround Structured Lighting for Full Object Scanning. Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007), 107–116. https://doi.org/10.1109/3DIM.2007.57

[4] Lanman, D., & Taubin, G. (2009). Build your own 3D scanner: 3D photography for beginners. ACM SIGGRAPH 2009 Courses on - SIGGRAPH ’09, 1–94. https://doi.org/10.1145/1667239.1667247

[5] Moreno, D., & Taubin, G. (2012). Simple, Accurate, and Robust Projector-Camera Calibration. 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, 464–471. https://doi.org/10.1109/3DIMPVT.2012.77

[6] Patent Images. (n.d.-a). Retrieved October 25, 2021, from https://pdfaiw.uspto.gov/.aiw?PageNum=0&docid=20170206660&IDKey=&HomeUrl=%2F

[7] Salvi, J., Pagès, J., & Batlle, J. (2004). Pattern codification strategies in structured light systems. Pattern Recognition, 37(4), 827–849. https://doi.org/10.1016/j.patcog.2003.10.002

336 of 456

IMU-Aided Deblurring with Smartphone

Jason Xu, Sanjay Salem

337 of 456

Premise

338 of 456

Choices to Increase Exposure

Increase ISO

High amount of noise

Decrease aperture

Impossible on smartphones and smaller cameras

Decrease shutter speed

Increased blur (unless you have very steady hands)

339 of 456

Naïve Method - Blind Deconvolution

Guess blur kernel using various heuristics
Deconvolve image using regularized least squares optimization
Lots of assumptions and wide range of plausible solutions

340 of 456

What If…?

We could do better?

341 of 456

Improved method:

IMU Aided Deconvolution

Joshi, Neel et al. Image Deblurring using Inertial Measurement Sensors.

342 of 456

High Level Approach

Use IMUs to record camera movement
Compute blur kernel from rotation and translation data
Use blur kernel from motion to deconvolve image

343 of 456

Our Improvements

Using smartphone instead of complex capturing rig
Incorporating magnetometer data to increase DoF

344 of 456

Data Collection

Android app logging gyroscope, accelerometer and magnetometer data, synchronized to start of exposure
Exposure time of 240ms

Huai, Jianzhu et al. Mobile AR Sensor Logger.

345 of 456

Calculation of Camera Motion

Integrate linear acceleration twice to get translation data
Use Madgwick filter with gyro, accelerometer and magnetometer data to get attitude (rotation data)

Better method than original paper

346 of 456

Forming the Blur Kernel

347 of 456

Deconvolution

Regularized Least-Squares Optimization

348 of 456

Example Output

349 of 456

Example Output

350 of 456

Example Output

351 of 456

Advantages

Takes the guesswork out of forming the blur kernel
Less computationally intensive - fewer unknowns
Better quality

352 of 456

References

Joshi, Neel, et al. “Image Deblurring Using Inertial Measurement Sensors.” ACM Transactions on Graphics, vol. 29, no. 4, ACM, 2010, pp. 1–9, https://doi.org/10.1145/1778765.1778767.

Most formulas and images come from this paper as well.

Huai, Jianzhu, et al. The Mobile AR Sensor Logger for Android and iOS Devices. 2019. https://github.com/OSUPCVLab/mobile-ar-sensor-logger.

Logging app on Android device

353 of 456

Thank you!

354 of 456

Structured Light Blocks-World

Keshav Sangam

355 of 456

Traditional Structured Light Scanning

Triangulation:

Replace a camera with a projector
Solve correspondence problem by searching the pattern in the camera image

Why is this inefficient?

Correspondence problems traditionally take lots of compute power
Going from a 3D point cloud to a 2D planar representation wastes memory, and the planes found are often inaccurate
More accuracy often necessitates multiple images being captured

356 of 456

Blocks-World Structured Light Scanning

357 of 456

Planes from Known Correspondences

Given a known image feature corresponding to a known pattern feature, we can use geometry to find the plane parameters such as the surface normal and shortest distance D from from the camera origin to the plane.

358 of 456

Planes from Unknown Correspondences

359 of 456

Pattern Design

360 of 456

Did it work?

Not yet; many different problems.

361 of 456

Multi-flash Photobooth

Flash based computational imaging filters

Vivian Shen, vhshen

362 of 456

What can we do by controlling capture conditions with flash?

363 of 456

Hardware setup

364 of 456

Final Hardware

365 of 456

Problem: difficult to distinguish depth discontinuities and edges

366 of 456

Canny Edge Detector

367 of 456

Canny Edge Detector

Multi-flash Edge Detection

368 of 456

Multi-flash Edge Detection

Four images: up, down, left, right flashes

each flash image:

normalize by total grayscale mean

divide by max grayscale isolate shadows

sobel filter on shadow images

hysteresis filter on silhouettes

369 of 456

Detecting Silhouettes / Simple Edge Map

370 of 456

371 of 456

Applications

372 of 456

Shadow Removal

Simple composite of all non-shadowed regions from every image (taking the max RGB value from each image)

373 of 456

Stylized Edge Rendering

Using the confidence map of the edge detections (composited from all flash images) to stylize the image

374 of 456

De-emphasized texturing

Distinguish foreground from background using depth discontinuities

Create a mask using the edge pixels and the texture values near the edge pixels on the “foreground” side

Apply gradient of textures based on euclidean distance

375 of 456

Problem: segmenting foreground and background

376 of 456

Flash Matting

377 of 456

Not much change in background because it is distant

378 of 456

Not much change in background because it is distant

379 of 456

Base segmented foreground image

380 of 456

Flash Matting

Foreground

381 of 456

Bayesian Flash Matting

Generate trimap and apply Bayesian

382 of 456

Joint Bayesian Flash Matting

383 of 456

Problem: shading details lost in certain lighting conditions

384 of 456

Detail Enhancement (not implemented yet)

Based on the shadowing introduced from the flash at different angles, different features are highlighted (and shadowed)

385 of 456

Multiscale Decomposition (per image)

386 of 456

Synthesis

387 of 456

Examples

388 of 456

Examples

389 of 456

Bibliography

Raskar, Ramesh, Jingyi Yu, and Adrian Ilie. "A non-photorealistic camera: Detecting silhouettes with multi-flash." ACM SIGGRAPH 2003 Technical Sketch (2003).
Raskar, Ramesh, et al. "Non-photorealistic camera: depth edge detection and stylized rendering using multi-flash imaging." ACM transactions on graphics (TOG) 23.3 (2004): 679-688.
Sun, Jian, et al. "Flash matting." ACM SIGGRAPH 2006 Papers. 2006. 772-778.
Fattal, Raanan, Maneesh Agrawala, and Szymon Rusinkiewicz. "Multiscale shape and detail enhancement from multi-light image collections." ACM Trans. Graph. 26.3 (2007): 51.

390 of 456

Event-Based Video Frame Interpolation

An explanation and implementation of TimeLens[1]

Gustavo Silvera

CMU 15-463 Fall 21

391 of 456

Two cameras

Video Camera

(+) Captures full RGB images (720p, 1080p, etc.)
(-) Low temporal resolution (24hz, 30hz, 60hz, etc.)

Event Camera

Captures “events” asynchronously
Acts similarly to human retina [1]
(+) Very high temporal resolution (microseconds)
(+) Very high DR & contrast sensitivity
(-) Typically lower spatial resolution (~240p)
(-) $$$

392 of 456

Two cameras

Video Footage:

- still frames

Event Data:

- dense motion

Image Credit: By TimoStoffregen - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=97727937

393 of 456

Combining the two

What if we leverage the high temporal resolution of the event cameras with the high spatial resolution of the video camera to generate a video with the best of both?

Low FPS video + High FPS events => High FPS video

Image Credit: TimeLens Presentation: S. Tulyakov, D. Gehrig, S. Georgoulis, J. Erbach, M. Gehrig, Y. Li, D. Scaramuzza, CVPR 2021

394 of 456

How this works (1/2)

Image Credit: TimeLens Presentation: S. Tulyakov, D. Gehrig, S. Georgoulis, J. Erbach, M. Gehrig, Y. Li, D. Scaramuzza, CVPR 2021

What do we have to work with?

395 of 456

How this works (2/2)

[1] Time Lens: Event-based Video Frame Interpolation. S. Tulyakov, D. Gehrig, S. Georgoulis, J. Erbach, M. Gehrig, Y. Li, D. Scaramuzza, CVPR 2021

A warping operation is done on the boundary frames according to the event sequence optical flow
The warping is refined by computing residual flow (and warping again)
Interpolation via synthesis is performed (directly fuses keyframe information and event sequences)
Optimally combine all three above approaches using an attention-based averaging method

These modules are designed to marry the advantages of synthesis VFI and warping VFI

Synthesis: (+) lighting robustness, (+) sudden changes, (-) texture distortion, (-) noise robustness
Warping: (-) lighting robustness, (+) non-linear motion, (+) noise robustness

(Read more in Section 2 of their paper[1])

How they perform VFI (Video Frame Interpolation)

396 of 456

Example Output

Video (30hz) Events

VFI Video (210hz)

Note that this data was provided by the authors of the paper, but these outputs were generated by my implementation of TimeLens.

For other cool demos like this, see their webpage

(http://rpg.ifi.uzh.ch/TimeLens.html)

397 of 456

My data capture

1920x1080@60hz

320x240@5000hz

My setup

Their setup

398 of 456

Unforeseen challenges

No synchrony between (my) cameras!

Need to synchronize time (for a common timeline between events and frames)
Need to synchronize space (both cameras see the same thing)

Synchrony Point

399 of 456

My results (1/3)

Slowed down (15fps) Slowed down (60fps)

400 of 456

My results (2/3)

Slowed down (30fps) Slowed down (60fps)

401 of 456

My results (3/3)

Slowed down (30fps) Slowed down (60fps)

402 of 456

Thank you!

Questions?

References:

ETH Zurich Robotics & Perception Group: https://rpg.ifi.uzh.ch

Time Lens: Event-based Video Frame Interpolation: S. Tulyakov*, D. Gehrig*, S. Georgoulis, J. Erbach, M. Gehrig, Y. Li, D. Scaramuzza, IEEE Conference on Computer Vision and Pattern Recognition, 2021.

403 of 456

Dual Photography

Rachel Tang

404 of 456

Dual Photography

Use Helmholtz Reciprocity to produce a virtual image from perspective of the projector without changing setup of scene

405 of 456

Ways of Capturing Image Dataset

Brute Force Algorithm: Illuminate one pixel on the projector at a time

Fixed Scanning: Illuminate several pixels at set interval

406 of 456

Ways of Capturing Image Dataset

Adaptive Multiplexed Illumination: recursively calculate which frame a pixel should be illuminated in based on conflicts

Bernoulli Binary Patterns: illuminate random pixels based on uniform distribution

407 of 456

Computation of the Matrix T

Captured matrices C (m*n, k) and P (p*q, k) → want to calculate T (m*n, p*q)

C = T P
C^T = P^TT^T

This gives us m*n least squares problems to solve to get matrix T!

In the Ax = b least squares problem,

A → P^T
b → c_i (one row of the matrix C)
x → t_i (one row of the light transport matrix T)

408 of 456

Intermediate Results + Challenges

With fixed scanning (debugging images):

Challenges:

Takes a long time to run the algorithms
Requires a large amount of memory

409 of 456

References

Sen, Pradeep, and Soheil Darabi. “Compressive Dual Photography.” Computer Graphics Forum, vol. 28, no. 2, 2009, pp. 609–618., https://doi.org/10.1111/j.1467-8659.2009.01401.x.

Sen, Pradeep, et al. “Dual Photography.” ACM SIGGRAPH 2005 Papers on - SIGGRAPH '05, 2005, https://doi.org/10.1145/1186822.1073257.

410 of 456

Depth Estimation with Two Taps �on Your Phone

Chih-Wei Wu | 15663, Fall 2021

411 of 456

Introduction

Depth from Focus (DfF)

412 of 456

Approach

Ref: Suwajanakorn, et al. "Depth from focus with your mobile phone." CVPR, 2015.

Focal stack

alignment

Focus measure

Depth prediction

Depth refinement

413 of 456

Challenge

Focal stack

alignment

Focus measure

Depth prediction

Depth refinement

Challenge 1:

Camera shake, motion

Challenge 2:

Noisy depth estimation

No-texture region

414 of 456

Dealing with Camera Shake

Ref 1: Surh, et al. "Noise robust depth from focus using a ring difference filter." CVPR, 2017.

Ref 2: Farnebäck, Gunnar. "Two-frame motion estimation based on polynomial expansion." SCIA, 2003.

Ref 3: Teed, et al. "Raft: Recurrent all-pairs field transforms for optical flow." ECCV, 2020.

Homography¹

Convention Optical Flow²

Deep

Optical Flow³

1st improvement

Input stack

415 of 456

Dealing with Camera Shake

Homography¹

Convention Optical Flow²

Deep

Optical Flow³

Objects are sheared

Object edges are distorted

No obvious shear or distortion!

Ref 1: Surh, et al. "Noise robust depth from focus using a ring difference filter." CVPR, 2017.

Ref 2: Farnebäck, Gunnar. "Two-frame motion estimation based on polynomial expansion." SCIA, 2003.

Ref 3: Teed, et al. "Raft: Recurrent all-pairs field transforms for optical flow." ECCV, 2020.

416 of 456

Dealing with Noisy Depth Estimation

Handcraft focus measure is not robust enough
Idea: Train CNN to learn to estimate depth

CNN can learn powerful feature as focus measure
CNN can learn object shape prior to populate depth for no-texture region

Ref: Hazirbas, et al. "Deep depth from focus." ACCV, 2018.

2nd improvement

417 of 456

Dealing with Noisy Depth Estimation

Problem with previous work

No feature exchange between stack images
Asking the network to predict actual depth is ill-posed

Non-learnable

1x1 conv

Depth

(L2 loss)

Focal

Stack

Learnable

1D conv

Focal

Stack

1D pool

Per-pixel classification of infocus image

(Cross-entropy loss)

Proposed (To be done)

DfF = 1 row of AFI

418 of 456

Comparing Depth Estimation Method

Ref: All-in-focus image

LoG + Gaussian Blur

Deep network

419 of 456

Last Secret Weapon

Focal stack

alignment

Focus measure

Depth prediction

Depth refinement

Formulate as MRF multi-label problem,

use graphcut to minimize energy function:

Unary term

Inverse of pixel sharpness

Pairwise term

Inverse of depth smoothness

Ref: AIF image

(LoG + Gaussian)

420 of 456

Result

Ref: All-in-focus image

LoG + Gaussian Blur

Deep network

LoG + Gaussian Blur

+ Depth refinement

421 of 456

Result

Ref: All-in-focus image

LoG + Gaussian Blur

Deep network

LoG + Gaussian Blur

+ Depth refinement

422 of 456

More Result

Ref: All-in-focus image

LoG + Gaussian Blur

Deep network

LoG + Gaussian Blur

+ Depth refinement

423 of 456

Extreme Case: Super Large Camera Motion

Ref: All-in-focus image

LoG + Gaussian Blur

Deep network

LoG + Gaussian Blur

+ Depth refinement

424 of 456

Thank you!

425 of 456

3D Scanning

(But not with a Stick)

Daniel Zeng

426 of 456

Assignment 6 - 3D Scanning with a Stick

Simple setup:

427 of 456

Assignment 6 - 3D Scanning with a Stick

Wave the stick:

428 of 456

Assignment 6 - 3D Scanning with a Stick

Frog!

429 of 456

Problems with Stick:

It’s not very good
Manually move the stick
Lots of frames for processing

430 of 456

Better method: Structured lighting

Use a projector with a specific coding pattern in order to capture the plane

Before it was using the shadow from the stick to calculate the plane

Coding pattern matters a lot!

431 of 456

Discrete Codes:

Binary

Gray

432 of 456

Example output:

Statue in white light

Reconstruction with Gray code

433 of 456

Continuous Codes:

Ramp

Triangle

Sinusoid

Hamiltonian

434 of 456

Continuous Codes:

Curve length is the geometric representation of the code

Longer curve = noise more spread out = better code for 3D scanning

435 of 456

Results

Have not gotten any yet
Here are some comparisons from the paper:

436 of 456

Acknowledgement

Mohit Gupta, Nikhil Nakhate; Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 87-102, https://openaccess.thecvf.com/content_ECCV_2018/papers/Mohit_Gupta_A_Geometric_Perspective_ECCV_2018_paper.pdf

Hamiltonian Codes

Build Your Own 3D Scanner course: http://mesh.brown.edu/byo3d/source.html

3D Scanning with structured lighting

437 of 456

Thank you!

438 of 456

Image Colorization Using Optimization

Joyce Zhang

439 of 456

Background

Colorization typically requires:

Image segmentation + tracking segments over image sequences
Lots of user input/intervention

Levin’s paper “Colorization using Optimization” describe a new interactive colorization technique that doesn’t precise manual segmentation.

(Levin, 2004)

440 of 456

Algorithm

Operates on the basic principle that neighboring pixels should have similar colors if their intensities are similar (thus making it edge aware)
Working in YUV color space

Y → intensity
U, V → chrominance (encode color)

compute the weight matrix W that depends on the similarities in the neighbor intensities for a pixel from the original gray-scale image.
Optimization problem → solve large, sparse system of linear equations; use the user imputed colors as constraints

441 of 456

RGB

YUV

Weight calculated based neighbors values

Set up Linear System:

Compute weight matrix W of size (imgsize * imgsize)

Setup b based on visual cues provided by the user

Solve linear system of sparse matrix

442 of 456

Results

Not bad!

443 of 456

Idea: Can we use this on non-photorealistic images?

Test: kind of works, but not very well.

444 of 456

Input image filtered with GradientShop, algorithm starts to show strong artifacts.

445 of 456

Discussion

Seems to be very particular about the user input, not sure if it’s a problem with my code, or the algorithm itself
Could be implemented iteratively with Jacobian, which speeds it up
Potentially can be used as an assistive tool in creative practices. Modern image/video coloration seems to adopt ML-based algorithms instead of user-input-based coloration.

446 of 456

Next Steps

Try to implement the “Colorization filter” from GradientShop, which seems to provide a better algorithm.

447 of 456

Papers referenced

Bhat, Pravin & Zitnick, C. & Cohen, Michael & Curless, Brian. (2010). GradientShop: A gradient-domain optimization framework for image and video filtering. ACM Trans. Graph.. 29.

Levin, A., Lischinski, D., & Weiss, Y. (2004). Colorization using optimization. In ACM SIGGRAPH 2004 Papers (pp. 689-694).

448 of 456

Image Stylization

Robin Zheng

449 of 456

Image stylization

(Results from Image Style Transfer Using Convolutional Neural Network by Gatys)

450 of 456

VGG-19

451 of 456

Content Representation

orginal content

conv1_1

452 of 456

Style representation

orginal style

conv1_1

453 of 456

Style Transfer

454 of 456

Some Result (for now)

455 of 456

Reference

Gatys, etc. Image Style Transfer Using Convolutional Neural Networks， https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf

456 of 456

Thank you!