Remaining course logistics
- Gradescope submission site available.
Class evaluations – please take them!
Today’s judges
Matthew O’Toole
Jun-Yan Zhu
Computational Periscopy with an Ordinary Camera
BY: JOSH ABRAMS
Imagine…
???
Sounds awfully contrived don’t’cha think?
Alright, so how does this work?
1. Make assumptions to simplify the modified rendering equation
2. Estimate the occluder position and use this to estimate the lightfield matrix
3. Solve a least squares problem with total variation regularization
[1] Charles Saunders, John Murray-Bruce, and Vivek K Goyal. Computational periscopy with an ordinary digital camera. Nature, 565(7740):472–475, 2019.
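Step 3 above is a regularized inverse problem. A minimal sketch (not the authors' code), assuming a precomputed light-transport matrix A that maps the hidden scene f to the observed photograph y, minimizing ||A f - y||^2 + lam * TV(f) with a smoothed total-variation penalty and plain gradient descent; the step size and lam are placeholders:

```python
import numpy as np

def reconstruct_scene(A, y, img_shape, lam=1e-2, n_iters=500, step=1e-3, eps=1e-6):
    """Recover the hidden scene f from y = A f + noise by minimizing
    ||A f - y||^2 + lam * TV(f), with a smoothed (Charbonnier) TV penalty."""
    f = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = 2.0 * A.T @ (A @ f - y)                        # data-fidelity gradient
        F = f.reshape(img_shape)
        dx = np.diff(F, axis=1, append=F[:, -1:])             # forward differences
        dy = np.diff(F, axis=0, append=F[-1:, :])
        mag = np.sqrt(dx**2 + dy**2 + eps)
        div_x = np.diff(dx / mag, axis=1, prepend=(dx / mag)[:, :1])
        div_y = np.diff(dy / mag, axis=0, prepend=(dy / mag)[:, :1])
        grad += lam * (-(div_x + div_y)).ravel()              # TV gradient = -div(grad/|grad|)
        f = np.clip(f - step * grad, 0, None)                 # scene radiance is non-negative
    return f.reshape(img_shape)
```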
Why the occluder?
Did it work for them?
Did it work for you?
No :( — Will rant if there's time
Thanks! Any Questions?
Citations
Acoustic lenses
(Deblurring weird lenses)
Hossein Baktash
Optics (EM waves)
Acoustics (mechanical waves)
Acoustics + Optics ??
Refraction
Depends on the material
What do acoustic waves have to do with this?
Pressure in the material changes (spatially and temporally)
This changes refractive index at every point
Too hard!
Let’s use this one just for water
Where c is a constant and p is pressure
Change in refraction
So, it's hard to affect light with mechanical waves
But it's possible!
Ignore the wires!
Continuous refraction:
Pressure profile
(transducer)
US OFF
US ON
Can we do imaging with this?
~1mm
Pressure pattern
Results
Simulated data
Experimental data
More interesting patterns
Mode 0 pattern:
Mode 2:
Mode 2
Beam pattern in Mode 2
3 PSFs
PSFs are too large to deconvolve!
Ignore the outer parts
An image under mode 2
Top left pinhole
Center pinhole
Top center pinhole
Pulsing light in sync with the ultrasound gives us this:
The inner part takes line integrals only
Project all pixels on this line
Radon transform
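Since the inner part of the beam measures line integrals, the capture can be viewed as a sparse-angle Radon transform. A minimal sketch with scikit-image, using a made-up sparse target and 8 projection angles to mirror the 8-rotation setup described later:

```python
import numpy as np
from skimage.transform import radon, iradon

# Hypothetical sparse target, similar in spirit to the pinhole targets above.
target = np.zeros((128, 128))
target[40, 40] = target[64, 64] = target[90, 30] = 1.0

angles = np.linspace(0.0, 180.0, 8, endpoint=False)  # one angle per rotation
sinogram = radon(target, theta=angles)                # forward model: line integrals
recon = iradon(sinogram, theta=angles)                # filtered back-projection
print(sinogram.shape, recon.shape)
```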
In simulation
Reconstruction
Improvement with a 1D deconvolution
1D PSFs
True image Radon and inverted
Sharpened mode 2 reconstructed
More improvement with more sets of images
Captured images
The setup:
Only 8 rotations for now (actually 4)
1D deconvolve
After 1D deconvolving
A sparse target image
More rotations:
Will it work?
Thank You!
Yi-Chun Chen
Building
Diffuser Camera for Photography
What is DiffuserCam?
My DiffuserCam
Raspberry Pi
Pi camera with lens removed
Sensor!
Lens!
My DiffuserCam
Scotch Tape as the diffuser
Black paper to construct aperture
Why could it work?
𝒗: a 2D array of light intensity values (the scene), i.e., the sum of many point sources of varying intensity and position
b: a 2D array of pixel values on the sensor
𝒉: PSF
Why could it work?
Solving a deconvolution problem.
The cropping function makes it non-invertible.
Use alternating direction method of multipliers (ADMM) to optimize the reconstruction.
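The actual reconstruction uses ADMM to handle the crop. As a much simpler illustration of the convolutional forward model only, here is a Tikhonov-regularized FFT deconvolution for a grayscale image that ignores the crop (all names and the value of lam are placeholders):

```python
import numpy as np

def forward(v, h):
    """Forward model b = h * v (2D convolution via FFT), ignoring the sensor crop."""
    return np.real(np.fft.ifft2(np.fft.fft2(h, v.shape) * np.fft.fft2(v)))

def tikhonov_deconv(b, h, lam=1e-3):
    """Closed-form regularized inverse: argmin_v ||h * v - b||^2 + lam ||v||^2."""
    H = np.fft.fft2(h, b.shape)
    V = np.conj(H) * np.fft.fft2(b) / (np.abs(H) ** 2 + lam)
    return np.real(np.fft.ifft2(V))
```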
Calibration and images
PSF - Out of focus
PSF - (Almost) in focus
Sensor reading of an object
Results
What’s more
Acknowledgement
Ching-Yi Lin (ECE)
Wu-Chou Kuo (III)
Joon Jang
Prof. Yannis
Convolution Color Constancy for Mobile Device Photography
Yutong Dai
Background
Color Constancy
— What color is the light illuminating the scene?
— ? —>
[2]
[2]
Goals
Requirements for Color Constancy
— speed of processing, mitigation of uncertainty, temporal cohesion for frames, account for low input resolution
Objective
— generalizability of the introduced algorithm to photograph sets other than the ones it utilized, which were curated from DSLR cameras
[1]
Algorithm: Convolutional Color Constancy (CCC)
Discriminative Learning
— Optimizing:
[2]
Algorithm: CCC
Efficient Filtering
[2]
naive histogram depiction
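For reference, a minimal sketch of the histogram that CCC filters over, assuming the log-chrominance parameterization u = log(g/r), v = log(g/b) from Barron's CCC paper; bin count and range are placeholders:

```python
import numpy as np

def log_chroma_histogram(rgb, n_bins=64, lo=-2.0, hi=2.0):
    """Normalized 2D histogram of log-chroma (u, v) = (log(g/r), log(g/b))."""
    r, g, b = rgb[..., 0].ravel(), rgb[..., 1].ravel(), rgb[..., 2].ravel()
    valid = (r > 0) & (g > 0) & (b > 0)
    u = np.log(g[valid] / r[valid])
    v = np.log(g[valid] / b[valid])
    hist, _, _ = np.histogram2d(u, v, bins=n_bins, range=[[lo, hi], [lo, hi]])
    return hist / hist.sum()
```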
Algorithm: CCC
Generalization
— create more potential inputs with the same properties and histograms
Sample Results
— two sources of lighting
— only room lighting
Citations
Thank you!
Dual Photography
Ravi Dudhagra
Background
Light Source
Camera
Scene
Prime Image
Helmholtz Reciprocity
fr(ωi → ωo) = fr(ωo → ωi)
(BRDF is the same going from A→B as from B→A)
Light
Background
(virtual)
Light Source
(virtual)
Camera
Scene
Dual Image
Dual Photography:
technique to interchange the lights and cameras in a scene
Helmholtz Reciprocity
fr(ωi → ωo) = fr(ωo → ωi)
(BRDF is the same going from A→B as from B→A)
Implementation
Camera
Scene
Projector
[Figure: projector pixel grid (p × q) and camera pixel grid (m × n), with projector pixel i and camera pixel j marked]
pixel i (projector-space)
maps to pixel j (camera space)
Call this mapping T (mn x pq)
Implementation
c = T p
p is some pattern we project (pq x 1)
c is the resulting camera image (mn x 1)
Implementation
p’’ = Tᵀ c’’
c’’ is some pattern we project
(from the POV of the camera)
p’’ is the resulting dual image
(from the POV of the projector)
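A toy sketch of how the prime and dual images fall out of T once it is known (the transport matrix here is a random sparse stand-in, not a real capture):

```python
import numpy as np
from scipy.sparse import random as sparse_random

p, q, m, n = 32, 32, 24, 24                                  # toy projector / camera sizes
T = sparse_random(m * n, p * q, density=0.01, format="csr")  # stand-in transport matrix

# Prime image: project a pattern p_vec, observe c = T p_vec on the camera.
p_vec = np.random.rand(p * q)
c = (T @ p_vec).reshape(m, n)

# Dual image: "illuminate" from the camera's point of view and render
# what the projector would see, p'' = T^T c''.
c_dual = np.ones(m * n)                                      # uniform virtual illumination
dual_image = (T.T @ c_dual).reshape(p, q)
```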
Acquiring T
(naive approach: capture an image for each projector pixel)
TWO PROBLEMS
Acquiring T Efficiently
Use blocks: hierarchically subdivide the projector image (p × q) into numbered blocks and refine the subdivision over successive captures.
[Figure: projector and camera grids, with the projector image progressively subdivided into numbered blocks]
Setup
Projector
Webcam
Scene
Results
Prime image
Dual image
Capture sequence
Results
| Scene | # Subdivision levels | # Blocks | File size of T | Raw size of T matrix |
| Coffee mug | 16 | 943,218 | 66 MB | 3.8 TB |
| Box/cards | 15 | 576,833 | 37.7 MB | 3.8 TB |
| Glass/spoon | 15 | 159,825 | 11.2 MB | 3.8 TB |
Camera: (960 x 540)
Projector: (1280 x 720)
Applications
Capturing lightfields
thanks!
Any questions?
?
A Comparison of Various Edge-Aware Filtering Techniques
By Gilbert Fan
Bilateral Filter
Simple weighted average where weights are determined by similarity (sig_r) and distance (sig_s).
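A naive sketch of that weighted average (float grayscale input; sig_s and sig_r as named above; a real implementation would use the bilateral grid or another acceleration):

```python
import numpy as np

def bilateral_filter(img, sig_s=3.0, sig_r=0.1, radius=6):
    """O(N * w^2) bilateral filter: weights = spatial Gaussian * range Gaussian."""
    H, W = img.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sig_s**2))
    out = np.zeros((H, W))
    pad = np.pad(img, radius, mode="edge")
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            w = spatial * np.exp(-((patch - img[i, j]) ** 2) / (2 * sig_r**2))
            out[i, j] = np.sum(w * patch) / np.sum(w)
    return out
```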
Domain Transform
Break the image into horizontal and vertical 1D signals (C + 1 dimensions).
Domain-transform each into 1D; the transform must be isometric.
Apply a 1D filter to the transformed data repeatedly for a set number of iterations.
Guided Filter
“Similar” to the joint bilateral filter, but instead of computing weights from the guide image, the output is a linear transformation of the guide image, with coefficients determined by the input image and calculated window by window.
OutputWindow = a * GuideWindow + b
Where a, b are determined by the input image window
High variance around pixel => a = 1, b = 0
Flat patch => a = 0, b = average of window
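A grayscale sketch of that window-by-window fit (box filters via SciPy; r and eps are placeholders):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, r=8, eps=1e-3):
    """Output = mean(a) * I + mean(b), where a, b fit the input p from guide I per window."""
    mean = lambda x: uniform_filter(x, 2 * r + 1)
    mean_I, mean_p = mean(I), mean(p)
    var_I = mean(I * I) - mean_I * mean_I
    cov_Ip = mean(I * p) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)      # ~1 where the guide has high variance (edges)
    b = mean_p - a * mean_I         # ~window average on flat patches
    return mean(a) * I + mean(b)
```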
Laplacian Filter
Constructs a Laplacian pyramid pixel by pixel from remappings of the input image using values of its Gaussian pyramid, then collapses it to form the final image.
Bilateral
Domain
Guided
Laplacian
Original
Bilateral
Domain
Guided
Laplacian
Original
Bilateral
Domain
Domain
Guided
Laplacian
Citations
CHEN, J., PARIS, S., AND DURAND, F. 2007. Real-time edge-aware image processing with the bilateral grid. ACM Transactions on Graphics (Proc. SIGGRAPH) 26, 3.
Eduardo S. L. Gastal and Manuel M. Oliveira. "Domain Transform for Edge-Aware Image and Video Processing". ACM Transactions on Graphics. Volume 30 (2011), Number 4, Proceedings of SIGGRAPH 2011, Article 69.
He, K., Sun, J., & Tang, X. (2013). Guided Image Filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6), 1397–1409. https://doi.org/10.1109/tpami.2012.213
Paris, S., Hasinoff, S. W., & Kautz, J. (2015). Local laplacian filters. Communications of the ACM, 58(3), 81–91. https://doi.org/10.1145/2723694
2D Light Direction Estimation Analysis Using Different Color Spaces
Mary Hatfalvi
2D Light Direction Estimation
Input Image
Contours traced around image mask
2D Light Source Estimation Algorithm [1]
L = Light Direction
N = 2D Normal Vectors (found from 2 connecting contour points)
I = Intensity Value of the Light interpolated at the location of the normal vector
This least squares equation can be rewritten as a pseudo inverse solution
Least Squares Solution for Light Direction [1]
M Matrix Definition [1]
[1] Kee, Eric & Farid, Hany. (2010). Exposing digital forgeries from 3-D lighting environments. 10.1109/WIFS.2010.5711437.
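A minimal sketch of that pseudo-inverse solve, assuming the Lambertian-plus-ambient model used in [1] (so M stacks [n_x, n_y, 1] per contour point); array names are placeholders:

```python
import numpy as np

def estimate_light_direction(normals, intensities):
    """normals: (K, 2) 2D contour normals; intensities: (K,) sampled intensities.
    Model: I_k = n_k . L + A (ambient); solve min ||M v - I||^2 via least squares."""
    M = np.hstack([normals, np.ones((normals.shape[0], 1))])
    v, *_ = np.linalg.lstsq(M, intensities, rcond=None)
    L, ambient = v[:2], v[2]
    return L / np.linalg.norm(L), ambient
```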
Color Spaces
Grayscale
XYZ
Color Spaces
LAB
YCbCr
Color Spaces
Illumination Invariant - (IIv) [1]
SUV [2]
[2] Satya P. Mallick, Todd E. Zickler, David J. Kriegman, and Peter N. Belhumeur, "Beyond Lambert: Reconstructing Specular Surfaces Using Color." in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2005.
[1] Hamilton Y. Chong, Steven J. Gortler, and Todd Zickler. 2008. A perception-based color space for illumination-invariant image processing. In ACM SIGGRAPH 2008 Papers (SIGGRAPH '08). DOI: https://doi.org/10.1145/1399504.1360660
Dataset used for Testing
Helmet Left [1]
Plant Left [1]
[1] “Light Stage Data Gallery.” Light Stage Data Gallery, https://vgl.ict.usc.edu/Data/LightStage/.
Helmet and Plant Error Results
Plant Left | Grayscale | XYZ | LAB | YCbCr | SUV | IIV |
Mean Angle Error | 0.7646 | 0.7188 | 0.7778 | 0.7677 | 0.5044 | 0.6431 |
Mean Euclidean Distance Error | 0.4409 | 0.3995 | 0.455 | 0.445 | 0.1786 | 0.3064 |
Helmet Left | Grayscale | XYZ | LAB | YCbCr | SUV | IIV |
Mean Angle Error | 0.3939 | 0.6117 | 0.3701 | 0.3974 | 0.473 | 0.4593 |
Mean Euclidean Distance Error | 0.1437 | 0.3434 | 0.1239 | 0.146 | 0.2227 | 0.1913 |
Taken Images
3 Light Sources
Example Image with Color Checker and Chrome Sphere for calculating 2D Light Direction [1]
[1] Ying Xiong (2021). PSBox (https://www.mathworks.com/matlabcentral/fileexchange/45250-psbox), MATLAB Central File Exchange. Retrieved December 8, 2021.
Interesting Image Results (Rubik's Cube)
Input Image
GT 2D Light Estimation
XYZ - Y Channel
SUV Light Direction Estimation Results
SUV - Specularity Invariant Combination
XYZ light Direction Estimation Results
Thank you
Depth Estimation using the Symmetric Point Spread Function Model
Dual Pixel Defocus Disparity:
Anne He
Anne He
Motivation
??
??
Garg et al. Learning Single Camera Depth Estimation using Dual Pixels.
Smartphone DP data in the wild
Smaller aperture leads to smaller disparity, but still noticeable to the human eye
Hypothesis
Experiment
Are kernels symmetrical?
(Mostly) yes!
How is this useful for depth estimation?
There’s more!
Results
Discussion
References
[1] Abhijith Punnappurath, Abdullah Abuolaim, Mahmoud Afifi, and Michael S. Brown. Modeling defocus-disparity in dual-pixel sensors. In IEEE International Conference on Computational Photography (ICCP), 2020.
[2] Shumian Xin, Neal Wadhwa, Tianfan Xue, Jonathan T. Barron, Pratul P. Srinivasan, Jiawen Chen, Ioannis Gkioulekas, and Rahul Garg. Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image. In ICCV 2021. https://imaging.cs.cmu.edu/dual_pixels/
[3] F. Mannan and M. S. Langer, “Blur calibration for depth from defocus,” in Conference on Computer and Robot Vision (CRV), 2016, pp. 281–288.
[4] T. Xian and M. Subbarao, “Depth-from-defocus: Blur equalization technique,” in SPIE: Society of Photo-Optical Instrumentation Engineers, 2006
Thank you!
Questions?
PatchMatch and
Content-Aware Image Editing
By: Alan Hsu
(15-463 Fall 2021)
PatchMatch
Algorithm
What is PatchMatch?
PatchMatch is an algorithm to produce a dense per-pixel correspondence (Nearest Neighbor Field, or NNF) between two images
Patch offset
How is PatchMatch more efficient?
1. Dimensionality of Offset Space (much smaller)
2. Natural Structure of Images
(adjacent pixels likely have a similar offset)
3. Law of Large Numbers
(over multiple random offsets it is likely to find a good offset)
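A minimal single-scale sketch of the loop iterated in the next slides (grayscale, SSD patch distance, absolute match coordinates instead of offsets; the real algorithm adds multi-scale processing and many optimizations):

```python
import numpy as np

def patch_dist(A, B, ay, ax, by, bx, p):
    """SSD between the p x p patches at (ay, ax) in A and (by, bx) in B."""
    return np.sum((A[ay:ay + p, ax:ax + p] - B[by:by + p, bx:bx + p]) ** 2)

def patchmatch(A, B, p=7, n_iters=5):
    """Random init, then alternating propagation and random search on the NNF."""
    A, B = A.astype(float), B.astype(float)
    Ah, Aw = A.shape[0] - p, A.shape[1] - p
    Bh, Bw = B.shape[0] - p, B.shape[1] - p
    nnf = np.stack([np.random.randint(0, Bh, (Ah, Aw)),
                    np.random.randint(0, Bw, (Ah, Aw))], axis=-1)
    cost = np.array([[patch_dist(A, B, y, x, *nnf[y, x], p)
                      for x in range(Aw)] for y in range(Ah)])

    def try_candidate(y, x, by, bx):
        if 0 <= by < Bh and 0 <= bx < Bw:
            d = patch_dist(A, B, y, x, by, bx, p)
            if d < cost[y, x]:
                nnf[y, x], cost[y, x] = (by, bx), d

    for it in range(n_iters):
        step = 1 if it % 2 == 0 else -1                     # alternate scan direction
        ys = range(Ah) if step == 1 else range(Ah - 1, -1, -1)
        xs = range(Aw) if step == 1 else range(Aw - 1, -1, -1)
        for y in ys:
            for x in xs:
                # Propagation: adjacent pixels likely share a similar offset.
                if 0 <= y - step < Ah:
                    try_candidate(y, x, nnf[y - step, x][0] + step, nnf[y - step, x][1])
                if 0 <= x - step < Aw:
                    try_candidate(y, x, nnf[y, x - step][0], nnf[y, x - step][1] + step)
                # Random search over a shrinking window around the current best.
                r = max(Bh, Bw)
                while r >= 1:
                    try_candidate(y, x,
                                  nnf[y, x][0] + np.random.randint(-r, r + 1),
                                  nnf[y, x][1] + np.random.randint(-r, r + 1))
                    r //= 2
    return nnf
```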
How does PatchMatch work?
Iter 0
Iter 0.25
Iter 0.5
Iter 0.75
Iter 1
Iter 5
Example of Convergence
The top image is reconstructed using the pixels from the bottom image
PatchMatch
Applications
Can you spot the difference?
How about this one?
Guided Image Completion
We compute the NNF from the hole to the rest of the image!
More Structural Image Editing Results
Per-Pixel Style Transfer
Oil Painting!
More Painting Results (Monet)
Citations and Acknowledgements:
1.) Prof. Ioannis Gkioulekas for all the help he provided!
2.) PatchMatch paper: https://gfx.cs.princeton.edu/pubs/Barnes_2009_PAR/patchmatch.pdf
3.) SIFT Flow: https://people.csail.mit.edu/celiu/SIFTflow/
4.) Style Transfer: https://arxiv.org/pdf/1508.06576.pdf
5.) Vincent van Gogh and Claude Monet for their paintings
Thanks!
DiffuserCam for 3D Printing Applications
Joon Jang, jiwoong@andrew.cmu.edu
DiffuserCam
DiffuserCam is a compact and relatively simple computational camera for single-shot 3D imaging.
Due to the single-shot nature and affordability of components to build it, I’m exploring whether it’d be usable in 3D printing contexts.
How it Works
Theory: PSF Differences
Lateral changes lead to changes in the placement of the PSF on the sensor.
Similarly, vertical changes lead to changes in the placement of the PSF on the sensor.
How It’s Made
How It’s Made
Calibration
Preliminary Results
3D File
3D Print
Reconstructed Print
Preliminary Results
3D File
3D Print
Reconstructed Print
Applying Computational Photography techniques to the WhiskSight Sensor
Teresa Kent
WhiskSight Sensor
Kent, Teresa A., et al. "WhiskSight: A Reconfigurable, Vision-Based, Optical Whisker Sensing Array for Simultaneous Contact, Airflow, and Inertia Stimulus Detection." IEEE Robotics and Automation Letters 6.2 (2021): 3357-3364.
What the Camera Sees
Current Application
What Improvements can be made
Handling situations which break the tracker
Tradeoffs between array size and accuracy
Novel Applications
Understanding the Camera Space/Overlap
Image Overlap at 50 mm
Two Cameras’ View of the 3D World
Updated Sensor
Camera
Visual Representation of the Overlapping Space
Image Detection Through the Elastomer
Use the sharpness to distinguish the objects in front of and behind the elastomer
High Detail Image = Luminance - Blurred Image
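A minimal sketch of that difference-and-threshold test (sigma and threshold values are placeholders):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sharpness_mask(rgb, sigma=3.0, threshold=0.05):
    """high_detail = luminance - blurred luminance; threshold its magnitude
    to flag sharp (in-focus) regions vs. regions blurred by the elastomer."""
    lum = rgb[..., :3] @ np.array([0.299, 0.587, 0.114])
    high_detail = lum - gaussian_filter(lum, sigma)
    return np.abs(high_detail) > threshold
```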
Tested Different Thresholds for Sharpness and Sigma Values
Threshold
Separated Image
Separated Image
What’s Next
Huixuan Tang, Scott Cohen, Brian Price, Stephen Schiller and Kiriakos N. Kutulakos, Depth from defocus in the wild. Proc. IEEE Computer Vision and Pattern Recognition Conference, 2017.
Heide, F., Rouf, M., Hullin, M. B., Labitzke, B., Heidrich, W., & Kolb, A. (2013). High-quality computational imaging through simple lenses. ACM Transactions on Graphics (TOG), 32(5), 1-14.
Thank You
All-in-focus Image Estimation from Dual Pixel Images
Akash Sharma Tarasha Khurana
Expected all-in-focus image
What are DP images and why do we care about all-in-focus?
Some devices with Dual Pixel sensors
a) Traditional Sensor b) Dual-Pixel Sensor
Each pixel of the Dual-Pixel sensor is divided into two parts, left and right. It captures images much like a stereo camera with a small baseline.
All-in-focus images are used extensively in robotics applications. Existing robotics algorithms cannot run on defocus-blurred images.
Can we recover an all-in-focus image from a DP image?
Dual Pixel Images are two-sample light fields and each pixel integrates light from half the main lens aperture.
Idea: Use the disparity of out-of-focus points as a cue to the defocus in the image. The problem is still under-constrained.
How has this been done before?
Model the dual pixel (left, right) images as an observation rendered from an underlying Multiplane Image.
What are some limitations that can be addressed?
Model defocus image using rendering from continuous volumetric representation.
Discrete depth representations can model only coarse relative ordering of pixels
End of Story? Why is this difficult?
Trace hourglass-like cones through the volume and weight regions continuously.
Easy to model blurring
What is a workaround? Thin-lens rendering
1 We are skipping a lot of volumetric rendering details for NeRF.
Lens aperture
Image plane
Rays from aperture plane
So, does this work? Kind of!
Implementing this gives us a lightfield image with an 8 x 8 subaperture view for every pixel of the image.
We can render the left and right DP images from these by averaging the 8 x 4 halves of all the pixels.
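A toy sketch of that averaging step, with the rendered light field stored as an (H, W, 8, 8, 3) array (random values stand in for the NeRF output):

```python
import numpy as np

H, W = 64, 64
lightfield = np.random.rand(H, W, 8, 8, 3)   # 8 x 8 sub-aperture samples per pixel

# Average the two 8 x 4 halves of the aperture to synthesize the DP pair.
left_dp  = lightfield[:, :, :, :4, :].mean(axis=(2, 3))
right_dp = lightfield[:, :, :, 4:, :].mean(axis=(2, 3))
```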
So, does this work? Kind of!
Although rendered dual pixel images look good (PSNR: 43.13) …
Predicted Images
Left DP Image
Right DP Image
Target Images
Left DP Image
Right DP Image
So, does this work? Kind of!
The individual pinhole images (or all-in-focus images) look jagged and we need a way to enforce pixel-wise smoothness.
How can we enforce smoothness?
Through a regularization loss between neighbouring pixels, like the Smooth L1 loss.
Recall that default pixel sampling in NeRF is random.
We modify it to randomly sample only half the pixels and then add each one's x- or y-neighbour (chosen with equal probability).
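A small sketch of that sampler and regularizer (NumPy stand-ins; the real code operates on NeRF ray batches):

```python
import numpy as np

def smooth_l1(a, b, beta=1.0):
    """Smooth L1 (Huber-like) penalty between neighbouring predictions a and b."""
    d = np.abs(a - b)
    return np.where(d < beta, 0.5 * d**2 / beta, d - 0.5 * beta).mean()

def sample_pixels_with_neighbours(H, W, n, rng=np.random.default_rng()):
    """Sample n//2 random pixels, then append each one's x- or y-neighbour
    (chosen with equal probability), as in the modified sampler above."""
    ys = rng.integers(0, H - 1, n // 2)
    xs = rng.integers(0, W - 1, n // 2)
    is_y = rng.integers(0, 2, n // 2)    # 1: take the y-neighbour, 0: the x-neighbour
    return np.concatenate([ys, ys + is_y]), np.concatenate([xs, xs + (1 - is_y)])
```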
Does enforcing smoothness help? Not really
All-in-focus images become excessively smooth (almost stretched) …
*Note that we did not have enough GPU cycles for hyperparameter tuning for any of the experiments
Does enforcing smoothness help? Not really
and PSNR drops on the dual pixel images from 43.13 to 32.6.
Predicted Images
Left DP Image
Right DP Image
Target Images
Left DP Image
Right DP Image
What is really going wrong?
What can still be done?
References
[1] Shumian Xin, Neal Wadhwa, Tianfan Xue, Jonathan T. Barron, Pratul P. Srinivasan, Jiawen Chen, Ioannis Gkioulekas, and Rahul Garg. Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image, ICCV 2021.
[2] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R. and Ng, R., 2020, August. Nerf: Representing scenes as neural radiance fields for view synthesis. In European conference on computer vision (pp. 405-421). Springer, Cham.
[3] Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R. and Srinivasan, P.P., 2021. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. arXiv preprint arXiv:2103.13415.
[4] Sitzmann, V., Rezchikov, S., Freeman, W.T., Tenenbaum, J.B. and Durand, F., 2021. Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering. arXiv preprint arXiv:2106.02634.
3D Video from Single-View Video
Nupur Kumari, Vivek Roy
Existing work: 3D shot photography
Method flowchart
Create a Layered Depth Image (LDI) using depth map
Find discontinuities in LDI by depth threshold
Create context region for in-painting background (lower depth part)
Example Context and Discontinuity map
Note that each discontinuous edge is handled independently with its own context region
Failure cases
If the context region extends into the wrong regions, the in-painting output is not ideal and leads to bleeding.
A similar example, where the road has white patches due to the white color of the shirt, even though they are not present in the original frame.
Thus the method is quite sensitive to the predicted depth values.
Failure cases
Use segmentation to remove context from different layers of LDI. (We use detectron2 Mask RCNN)
Simple approach to prevent context bleeding
Before
After
Simple approach to prevent context bleeding
Before
After
This doesn't always solve the issue. We can use context from other frames in the video for in-painting.
Before
After
Extended to video frame by frame
Depth estimation from video
Used CVD[*] with updated flow prediction network for improved consistency based depth estimation.
Given camera parameters for each frame, map depth from frame to frame:
[*] Xuan Luo, Jia-Bin Huang, Richard Szeliski, Kevin Matzen, and Johannes Kopf. Consistent video depth estimation. ACM TOG (Proc. SIGGRAPH), 39(4), 2020.
Depth estimation from video
Flow prediction by RAFT
Use the predicted flow to warp one frame to another, and then find the 3D coordinates of the warped image in the corresponding coordinate frame.
Measure the similarity between the reprojections using flow and camera.
Teed, Zachary, and Jia Deng. "Raft: Recurrent all-pairs field transforms for optical flow." European conference on computer vision. Springer, Cham, 2020.
CVD
CVD + RAFT
Using more context from video
Note that each discontinuous edge is handled independently with its own context region
Using more context from video
Combined context
Using more context from video
Combined context
Using more context from video
Failure Case
Temporally consistent Nerf Representation
Nerf video representation
Current training uses flow to train a time-consistent NeRF model
Li, Zhengqi, et al. "Neural scene flow fields for space-time view synthesis of dynamic scenes." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
Disentangling objects in Nerf representation
Use a different NeRF model for each object and fuse them together during rendering.
Encourage the predicted opacity to match the object masks for the object branch.
Scene Flow from Light Field Gradients
Joelle Lim
Scene Flow: what is?
Given:
How did the scene move?
time t
t + 1
t + k
how…
Ray flow equation
Light field gradients
Scene Flow Components
Light ray parameterization
Ray Flow due to scene motion
parallel shift of ray
Assuming ray brightness remains constant,...
First-order Taylor expansion
Ray flow equation
First-order Taylor expansion
Substitute
Ray flow equation
underconstrained
have to impose additional constraints!!!
Ray flow equation
Optical flow equation
Local:
Lucas Kanade
Global:
Horn Schunck
Lucas Kanade (Local)
Solve for V,
where
assumption: scene motion is constant in neighbourhood
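A sketch of that local solve, assuming the ray flow equation has the form Lx·Vx + Ly·Vy + (u·Lx + v·Ly)·Vz + Lt = 0 (my reading of the cited paper): stack one equation per ray in the window and solve the 3-unknown least-squares system.

```python
import numpy as np

def local_ray_flow(Lx, Ly, Lt, u, v):
    """Lucas-Kanade-style solve for V = (Vx, Vy, Vz) over one local window,
    assuming constant scene motion for all rays in the window."""
    A = np.stack([Lx.ravel(), Ly.ravel(), (u * Lx + v * Ly).ravel()], axis=1)
    b = -Lt.ravel()
    V, *_ = np.linalg.lstsq(A, b, rcond=None)
    return V
```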
Horn Schunck (Global)
Optimization problem to minimize
Scene motion variation
Minimize:
error term
smoothness term
some results
Lucas Kanade
my results
their chad result
Lucas Kanade
Horn-Schunck
Global + Local
Other results from paper
with enhancements ++
Thank you!
Citations:
Ma, S., Smith, B. M., & Gupta, M. (2018). 3D scene flow from 4d light field gradients. Computer Vision – ECCV 2018, 681–698. https://doi.org/10.1007/978-3-030-01237-3_41
Reconstructing Refractive Objects
via Light-Path Triangulation
Emma Liu (emmaliu)
Why is reconstruction of transparent objects hard?
Because they don’t have a local appearance - we have to rely on observations of the scene behind the object.
Well that’s difficult. What can we do about it?
Apply knowledge of the scene and properties of refraction to light-path triangulation!
Refractive Light-Path Triangulation
Assuming that light intersects the surface of an object at most twice (it enters and exits), and given light propagation information from different views of the scene, solve a minimization problem that finitely constrains the surface properties of the object.
Light Path Scene Model
Reference camera vs. with validation cameras
For a pixel q, if our view of the 3D world point p is obscured by a transparent object, the light takes a piecewise-linear path and refracts through the object.
Assuming that the light path intersects the surface exactly twice, we can solve for the remaining unknowns:
With position f and its surface normal nf,
we can compute the depth of the object at that point.
Developing the Correspondence Map
For each pixel q, determine the first ray in the light path that indirectly projects through it: the first ray that intersects the object, lb = L(q).
Key: Make use of two backdrop plane locations to determine two backdrop positions via stripe projection, then map each pixel to the ray defined by both.
Reconstruction via Minimizing Light-Path Consistency Error
Objective: For each light path, determine the surfel pair (f, nf) that minimizes the reconstruction error: the refracted ray lm should meet lf and lb at both ends as closely as possible.
For each estimated surfel, compute lm by tracing backwards and applying Snell’s Law with the index of refraction*.
Then the reconstruction error per camera is the shortest distance between the two rays:
* But wait, aren’t we trying to determine the IOR?? More on that later…
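A small helper for the Snell's-law step used when tracing lm backwards (vector form; assumes unit vectors, with the normal facing the incoming ray, and eta = n1/n2):

```python
import numpy as np

def refract(d, n, eta):
    """Refract unit direction d through a surface with unit normal n.
    Returns None on total internal reflection."""
    cos_i = -np.dot(n, d)
    sin2_t = eta**2 * (1.0 - cos_i**2)
    if sin2_t > 1.0:
        return None
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n
```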
Triangulation Procedure: Depth Sampling
By refraction laws, the normals nb and nf can be computed based on position b, f.
Perform a brute-force search to determine the candidate locations of b and f:
Depth computed with optimal surfel, camera pos:
Determining the Index of Refraction (IOR)
With >= 4 cameras, the index of refraction can be computed: repeat the triangulation / optimal-surfel-finding procedure for n pixels for each of m candidate IORs.
For reference, the IOR of K9 crystal glass (my Bohemian crystal) is ~1.509.
Results
Depth (above), normal maps for rotated views (below)
Thanks
Citations:
K. N. Kutulakos and E. Steger, “A theory of refractive and specular 3D shape by light-path triangulation,” International Journal of Computer Vision, 11-Jul-2007. [Online]. Available: https://link.springer.com/article/10.1007/s11263-007-0049-9.
*Steger, E. (2006). “Reconstructing transparent objects by refractive light-path triangulation”, Master’s thesis, Department of Computer Science, University of Toronto. http://www.cs.toronto.edu/~esteger/thesis.pdf.
*Figures and high-level algorithm explanation provided by this paper.
Multi-modal 3D reconstruction
Robert Marcus (rbm@)
Background
**https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3304120/
Background - Cont
Background - Cont
Background - Cont
GCPNs - Big picture
*1-5 mm resolution
**5 mm Kinect resolution -> 0.3 mm combined resolution
GCPNs - How
Image credit: https://web.media.mit.edu/~achoo/polar3D/polarized3D_poster.pdf
GCPNs - Results
Image 1 credit https://web.media.mit.edu/~achoo/polar3D/polarized3D_poster.pdf
Image 2 credit https://web.media.mit.edu/~achoo/polar3D/kadambi_new_england_vision2015.pdf
Table credit https://web.media.mit.edu/~achoo/polar3D/camready/supplement_iccv.pdf
GCPN - My results
Credits
Synthetic Depth-of-Field with a Mobile Phone
Arpita Nag
Motivation and Background
Pipeline
Part 1 - Obtain a Defocus Map
Figure: Original paper which used 2D tile-matching to estimate flow in temporal sequence.
First, we must estimate a high resolution, sub-pixel disparity map.
We first use tile matching of one view against the other, and heuristics to produce a per-pixel flow field (to the other view) and per-pixel confidence.
Pipeline (Cont)
Part 1.2 - Calibrate the Image
Sadly, smartphone cameras don’t totally obey the thin-lens approximation.
Pipeline (Cont)
Part 1.3 - Final Depth / Defocus Map
Pipeline (End)
Part 2 - Rendering the Blur
Results (Pt 1)
Disclaimer: Calibration not done yet, and rendering is still done with discrete, quantized “disparity bands”.
Results (Pt 2)
More Results and Challenges
References
Thank you!
Any questions?
Capturing Lightfields
Gaurav Parmar
Task Goal
Hardware for Capturing the Lightfields
Example lightfields
Refocusing the Image
Aspects of this approach
(+) the lightfields captured are great
(-) very expensive equipment is needed
- Can I capture something like this?
Unstructured Lightfields
Challenges in this Procedure
Solution - filtering out frames
Initial Result Captured
Focus on the stone
Focus on the Text
Issues in the results
In progress tasks
Thank You!
Structured Light 3D Scanning
Sarah Pethani
Structured Light Scanning
Triangulation:
Time Multiplexing: Temporal Binary Codes
Problems with Basic Structured Light Scanning
How to fix?
Deciding between low-freq and high-freq
Conventional Gray
Max-Min SW Gray
XOR 02
XOR 04
Combined
Thanks!
Yingsi Qin
Andrew Maimone, Andreas Georgiou, and Joel S. Kollin. 2017. Holographic near-eye displays for virtual and augmented reality. ACM Trans. Graph. 36, 4, Article 85 (July 2017), 16 pages.
Complete 3D Object Reconstruction from Surround Structured Light Scanning
Nathan Riopelle
Project Goal (As Proposed)
Create a 3D reconstruction of an object using a single camera under the presence of structured light. While traditional approaches for capturing a 360-degree scan of an object require a turntable and multiple images, this project will accomplish the same using a pair of planar mirrors and an orthographic projector constructed with a Fresnel lens.
Project Goal (As Proposed)
Orthographic projector and planar mirrors generate four “virtual” cameras
1
2
3
4
Based on the 2007 3DIM paper [3]
Calibration is Immensely Complex
Intrinsic camera calibration
Intrinsic projector calibration
Projector / Fresnel lens focal point alignment
Mapping of scan lines to planes in 3D
Estimation of position/pose of mirrors relative to camera
First Attempt at Surround Scanning
Revised Project Goal
Create a single-view 3D reconstruction of an object using a single camera under the presence of structured light. As a stretch goal, attempt surround scanning using a pair of planar mirrors and an orthographic projector constructed with a Fresnel lens.
Single-View Scanning Setup
Intrinsic Camera + Projector Calibration
Calibrated by projecting structured light Gray codes onto a 9x6 checkerboard
Zoomed in
Zoomed out
Projector
Camera
3D Scan Targets
Larger,
Simple geometry
Many inter-occlusions,
Detailed
Large depth variation in small area
Projected Area Mask
Projector Row + Column Decodings
3D Reconstruction Results
3D Reconstruction Results
3D Reconstruction Results
References
[1] Gupta, M., Agrawal, A., Veeraraghavan, A., & Narasimhan, S. G. (2013). A Practical Approach to 3D Scanning in the Presence of Interreflections, Subsurface Scattering and Defocus. International Journal of Computer Vision, 102(1–3), 33–55. https://doi.org/10.1007/s11263-012-0554-3
[2] Gupta, M., Agrawal, A., Veeraraghavan, A., & Narasimhan, S. G. (2011). Structured light 3D scanning in the presence of global illumination. CVPR 2011, 713–720. https://doi.org/10.1109/CVPR.2011.5995321
[3] Lanman, D., Crispell, D., & Taubin, G. (2007). Surround Structured Lighting for Full Object Scanning. Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007), 107–116. https://doi.org/10.1109/3DIM.2007.57
[4] Lanman, D., & Taubin, G. (2009). Build your own 3D scanner: 3D photography for beginners. ACM SIGGRAPH 2009 Courses on - SIGGRAPH ’09, 1–94. https://doi.org/10.1145/1667239.1667247
[5] Moreno, D., & Taubin, G. (2012). Simple, Accurate, and Robust Projector-Camera Calibration. 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, 464–471. https://doi.org/10.1109/3DIMPVT.2012.77
[6] Patent Images. (n.d.-a). Retrieved October 25, 2021, from https://pdfaiw.uspto.gov/.aiw?PageNum=0&docid=20170206660&IDKey=&HomeUrl=%2F
[7] Salvi, J., Pagès, J., & Batlle, J. (2004). Pattern codification strategies in structured light systems. Pattern Recognition, 37(4), 827–849. https://doi.org/10.1016/j.patcog.2003.10.002
IMU-Aided Deblurring with Smartphone
Jason Xu, Sanjay Salem
Premise
Choices to Increase Exposure
High amount of noise
Impossible on smartphones and smaller cameras
Increased blur (unless you have very steady hands)
Naïve Method - Blind Deconvolution
What If…?
We could do better?
Improved method:
IMU Aided Deconvolution
Joshi, Neel et al. Image Deblurring using Inertial Measurement Sensors.
High Level Approach
Our Improvements
Data Collection
Huai, Jianzhu et al. Mobile AR Sensor Logger.
Calculation of Camera Motion
Forming the Blur Kernel
Deconvolution
Regularized Least-Squares Optimization
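A minimal frequency-domain sketch of a regularized least-squares deconvolution, assuming the IMU-derived blur kernel is already formed; this is a plain Tikhonov solve, not the exact prior used by Joshi et al.:

```python
import numpy as np

def regularized_deconv(blurred, kernel, lam=1e-2):
    """argmin_x ||k * x - b||^2 + lam ||x||^2, per colour channel (float image)."""
    K = np.fft.fft2(kernel, blurred.shape[:2])
    out = np.zeros(blurred.shape)
    for c in range(blurred.shape[2]):
        B = np.fft.fft2(blurred[..., c])
        out[..., c] = np.real(np.fft.ifft2(np.conj(K) * B / (np.abs(K) ** 2 + lam)))
    return out
```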
Example Output
Example Output
Example Output
Advantages
References
Thank you!
Structured Light Blocks-World
Keshav Sangam
Traditional Structured Light Scanning
Triangulation:
Why is this inefficient?
Blocks-World Structured Light Scanning
Planes from Known Correspondences
Given a known image feature corresponding to a known pattern feature, we can use geometry to find the plane parameters such as the surface normal and shortest distance D from the camera origin to the plane.
Planes from Unknown Correspondences
Pattern Design
Did it work?
Not yet; many different problems.
Multi-flash Photobooth
Flash based computational imaging filters
Vivian Shen, vhshen
What can we do by controlling capture conditions with flash?
Hardware setup
Final Hardware
Problem: difficult to distinguish depth discontinuities and edges
Canny Edge Detector
Canny Edge Detector
Multi-flash Edge Detection
Multi-flash Edge Detection
Four images: up, down, left, right flashes
For each flash image (sketched below):
normalize by the total grayscale mean
divide by the max grayscale composite to isolate shadows
Sobel filter on the shadow images
hysteresis filter on the silhouettes
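A rough, simplified sketch of those steps (float grayscale inputs; the thresholds are placeholders):

```python
import numpy as np
from scipy.ndimage import sobel
from skimage.filters import apply_hysteresis_threshold

def multiflash_edges(flash_imgs, low=0.05, high=0.15):
    """flash_imgs: four grayscale float images (up/down/left/right flash)."""
    imgs = [im / im.mean() for im in flash_imgs]      # normalize by grayscale mean
    max_img = np.maximum.reduce(imgs)                 # max composite across flashes
    edge_conf = np.zeros_like(max_img)
    for im in imgs:
        ratio = im / (max_img + 1e-8)                 # shadows -> low ratio values
        grad = np.hypot(sobel(ratio, axis=0), sobel(ratio, axis=1))
        edge_conf = np.maximum(edge_conf, grad)       # confidence of a depth edge
    return apply_hysteresis_threshold(edge_conf, low, high)
```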
Detecting Silhouettes / Simple Edge Map
Applications
Shadow Removal
Simple composite of all non-shadowed regions from every image (taking the max RGB value from each image)
Stylized Edge Rendering
Using the confidence map of the edge detections (composited from all flash images) to stylize the image
De-emphasized texturing
Distinguish foreground from background using depth discontinuities
Create a mask using the edge pixels and the texture values near the edge pixels on the “foreground” side
Apply gradient of textures based on euclidean distance
Problem: segmenting foreground and background
Flash Matting
Not much change in background because it is distant
Base segmented foreground image
Flash Matting
Foreground
Bayesian Flash Matting
Generate trimap and apply Bayesian
Joint Bayesian Flash Matting
Problem: shading details lost in certain lighting conditions
Detail Enhancement (not implemented yet)
Based on the shadowing introduced from the flash at different angles, different features are highlighted (and shadowed)
Multiscale Decomposition (per image)
Synthesis
Examples
Examples
Bibliography
Event-Based Video Frame Interpolation
An explanation and implementation of TimeLens[1]
Gustavo Silvera
CMU 15-463 Fall 21
Two cameras
Video Camera
Event Camera
Two cameras
Video Footage:
- still frames
Event Data:
- dense motion
Image Credit: By TimoStoffregen - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=97727937
Combining the two
What if we leverage the high temporal resolution of the event cameras with the high spatial resolution of the video camera to generate a video with the best of both?
Low FPS video + High FPS events => High FPS video
Image Credit: TimeLens Presentation: S. Tulyakov, D. Gehrig, S. Georgoulis, J. Erbach, M. Gehrig, Y. Li, D. Scaramuzza, CVPR 2021
How this works (1/2)
What do we have to work with?
How this works (2/2)
[1] Time Lens: Event-based Video Frame Interpolation. S. Tulyakov, D. Gehrig, S. Georgoulis, J. Erbach, M. Gehrig, Y. Li, D. Scaramuzza, CVPR 2021
These modules are designed to marry the advantages of synthesis VFI and warping VFI
(Read more in Section 2 of their paper[1])
How they perform VFI (Video Frame Interpolation)
Example Output
Video (30hz) Events
VFI Video (210hz)
Note that this data was provided by the authors of the paper, but these outputs were generated by my implementation of TimeLens.
For other cool demos like this, see their webpage
My data capture
1920x1080@60hz
320x240@5000hz
My setup
Their setup
Unforeseen challenges
No synchrony between (my) cameras!
Synchrony Point
My results (1/3)
Slowed down (15fps) Slowed down (60fps)
My results (2/3)
Slowed down (30fps) Slowed down (60fps)
My results (3/3)
Slowed down (30fps) Slowed down (60fps)
Thank you!
Questions?
References:
ETH Zurich Robotics & Perception Group: https://rpg.ifi.uzh.ch
Dual Photography
Rachel Tang
Dual Photography
Use Helmholtz Reciprocity to produce a virtual image from perspective of the projector without changing setup of scene
Ways of Capturing Image Dataset
Brute Force Algorithm: Illuminate one pixel on the projector at a time
Fixed Scanning: Illuminate several pixels at set interval
Ways of Capturing Image Dataset
Adaptive Multiplexed Illumination: recursively calculate which frame a pixel should be illuminated in based on conflicts
Bernoulli Binary Patterns: illuminate random pixels based on uniform distribution
Computation of the Matrix T
Captured matrices C (m*n, k) and P (p*q, k) → want to calculate T (m*n, p*q)
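One straightforward way to do this (a plain least-squares sketch, not the compressive approach of Sen & Darabi):

```python
import numpy as np

def estimate_T(C, P):
    """C: (m*n, k) camera images, P: (p*q, k) projected patterns, with C = T P.
    Least-squares estimate of the transport matrix T (m*n, p*q)."""
    return C @ np.linalg.pinv(P)
```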
Intermediate Results + Challenges
With fixed scanning (debugging images):
Challenges:
References
Sen, Pradeep, and Soheil Darabi. “Compressive Dual Photography.” Computer Graphics Forum, vol. 28, no. 2, 2009, pp. 609–618., https://doi.org/10.1111/j.1467-8659.2009.01401.x.
Sen, Pradeep, et al. “Dual Photography.” ACM SIGGRAPH 2005 Papers on - SIGGRAPH '05, 2005, https://doi.org/10.1145/1186822.1073257.
Depth Estimation with Two Taps on Your Phone
Chih-Wei Wu | 15663, Fall 2021
Introduction
Depth from Focus (DfF)
Approach
Ref: Suwajanakorn, et al. "Depth from focus with your mobile phone." CVPR, 2015.
Focal stack alignment → Focus measure → Depth prediction → Depth refinement
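A sketch of the focus-measure and winner-take-all depth steps, matching the "LoG + Gaussian Blur" variant shown in the results (sigma values are placeholders):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace, gaussian_filter

def focus_measure_depth(stack, log_sigma=2.0, blur_sigma=3.0):
    """stack: list of aligned grayscale focal-stack slices.
    Focus measure = |LoG| of each slice, Gaussian-smoothed; depth = argmax."""
    measures = np.stack([gaussian_filter(np.abs(gaussian_laplace(im, log_sigma)),
                                         blur_sigma) for im in stack])
    depth_idx = measures.argmax(axis=0)     # winner-take-all depth prediction
    return measures, depth_idx
```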
Challenge
Focal stack alignment → Focus measure → Depth prediction → Depth refinement
Challenge 1:
Camera shake, motion
Challenge 2:
Noisy depth estimation
No-texture region
Dealing with Camera Shake
Ref 1: Surh, et al. "Noise robust depth from focus using a ring difference filter." CVPR, 2017.
Ref 2: Farnebäck, Gunnar. "Two-frame motion estimation based on polynomial expansion." SCIA, 2003.
Ref 3: Teed, et al. "Raft: Recurrent all-pairs field transforms for optical flow." ECCV, 2020.
Homography [1]
Conventional Optical Flow [2]
Deep Optical Flow [3]
1st improvement
Input stack
Dealing with Camera Shake
Homography [1]
Conventional Optical Flow [2]
Deep Optical Flow [3]
Objects are sheared
Object edges are distorted
No obvious shear or distortion!
Dealing with Noisy Depth Estimation
Ref: Hazirbas, et al. "Deep depth from focus." ACCV, 2018.
2nd improvement
Dealing with Noisy Depth Estimation
Non-learnable
1x1 conv
Depth
(L2 loss)
Focal Stack
Learnable
1D conv
Focal Stack
1D pool
Per-pixel classification of the in-focus image
(Cross-entropy loss)
Proposed (To be done)
DfF = 1 row of AFI
Comparing Depth Estimation Methods
Ref: All-in-focus image
LoG + Gaussian Blur
Deep network
Last Secret Weapon
Focal stack alignment → Focus measure → Depth prediction → Depth refinement
Formulate as an MRF multi-label problem,
use graph cuts to minimize the energy function:
Unary term
Inverse of pixel sharpness
Pairwise term
Inverse of depth smoothness
Ref: AIF image
(LoG + Gaussian)
Result
Ref: All-in-focus image
LoG + Gaussian Blur
Deep network
LoG + Gaussian Blur
+ Depth refinement
Result
Ref: All-in-focus image
LoG + Gaussian Blur
Deep network
LoG + Gaussian Blur
+ Depth refinement
More Result
Ref: All-in-focus image
LoG + Gaussian Blur
Deep network
LoG + Gaussian Blur
+ Depth refinement
Extreme Case: Super Large Camera Motion
Ref: All-in-focus image
LoG + Gaussian Blur
Deep network
LoG + Gaussian Blur
+ Depth refinement
Thank you!
3D Scanning
(But not with a Stick)
Daniel Zeng
Assignment 6 - 3D Scanning with a Stick
Problems with Stick:
Better method: Structured lighting
Discrete Codes:
Binary
Gray
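A small sketch of generating binary-reflected Gray-code stripe patterns for the projector columns (row patterns are analogous; bit count is a placeholder):

```python
import numpy as np

def gray_code_patterns(width, n_bits=10):
    """Returns an (n_bits, width) array of 0/1 column-stripe patterns."""
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)                                   # binary -> Gray code
    bits = (gray[None, :] >> np.arange(n_bits)[:, None]) & 1    # one row per bit
    return bits[::-1]                                           # MSB pattern first
```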
Example output:
Statue in white light
Reconstruction with Gray code
Continuous Codes:
Ramp
Triangle
Sinusoid
Hamiltonian
Continuous Codes:
Results
Acknowledgement
Thank you!
Image Colorization Using Optimization
Joyce Zhang
Background
Colorization typically requires:
Levin’s paper “Colorization Using Optimization” describes a new interactive colorization technique that doesn’t require precise manual segmentation.
(Levin, 2004)
Algorithm
RGB
YUV
Weights calculated based on neighbouring luminance values
Set up the linear system (sketched below):
Compute the weight matrix W of size (imgsize × imgsize)
Set up b based on the visual cues provided by the user
Solve the sparse linear system
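A minimal sketch of that system for one chroma channel, using 4-neighbour weights from luminance similarity (Levin et al. use a slightly different affinity and larger neighbourhoods; names and sigma are placeholders):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def colorize_channel(Y, scribble_vals, scribble_mask, sigma=0.05):
    """Y: (H, W) luminance; scribble_vals/mask: user chroma cues. Solves W c = b."""
    H, W_ = Y.shape
    N = H * W_
    idx = np.arange(N).reshape(H, W_)
    rows, cols, vals = [], [], []
    b = np.zeros(N)
    for y in range(H):
        for x in range(W_):
            r = idx[y, x]
            if scribble_mask[y, x]:                       # pin the user-scribbled pixels
                rows.append(r); cols.append(r); vals.append(1.0)
                b[r] = scribble_vals[y, x]
                continue
            nbrs = [(y + dy, x + dx) for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= y + dy < H and 0 <= x + dx < W_]
            w = np.array([np.exp(-(Y[y, x] - Y[ny, nx]) ** 2 / (2 * sigma ** 2))
                          for ny, nx in nbrs])
            w /= w.sum()
            rows.append(r); cols.append(r); vals.append(1.0)  # c_r - sum_i w_i c_i = 0
            for (ny, nx), wi in zip(nbrs, w):
                rows.append(r); cols.append(idx[ny, nx]); vals.append(-wi)
    A = sp.csr_matrix((vals, (rows, cols)), shape=(N, N))
    return spsolve(A, b).reshape(H, W_)
```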
Results
Not bad!
Idea: Can we use this on non-photorealistic images?
Test: kind of works, but not very well.
When the input image is filtered with GradientShop, the algorithm starts to show strong artifacts.
Discussion
Next Steps
Try to implement the “Colorization filter” from GradientShop, which seems to provide a better algorithm.
Papers referenced
Bhat, Pravin & Zitnick, C. & Cohen, Michael & Curless, Brian. (2010). GradientShop: A gradient-domain optimization framework for image and video filtering. ACM Trans. Graph.. 29.
Levin, A., Lischinski, D., & Weiss, Y. (2004). Colorization using optimization. In ACM SIGGRAPH 2004 Papers (pp. 689-694).
Image Stylization
Robin Zheng
Image stylization
(Results from Image Style Transfer Using Convolutional Neural Networks by Gatys et al.)
VGG-19
Content Representation
original content
conv1_1
Style representation
original style
conv1_1
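In Gatys et al., the style representation at each layer is the Gram matrix of the feature maps. A minimal sketch (the normalization is a choice):

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) activations from one VGG-19 layer (e.g. conv1_1)."""
    C, H, W = features.shape
    F = features.reshape(C, H * W)
    return F @ F.T / (C * H * W)     # channel-wise correlations of the layer
```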
Style Transfer
Some Results (for now)
Reference
Gatys et al. Image Style Transfer Using Convolutional Neural Networks. https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf
Thank you!