1 of 481

Remaining course logistics


  • Optional homework assignment 7 is due on 12/18 at 23:59 ET.

- Gradescope submission site available.

  • Final project reports are due on Sunday 12/20 at 23:59 ET.

- Gradescope submission site available.

  • One upcoming computational photography talk:

- Tuesday 12/22, 1 – 2 pm, Grace Kuo (lensless cameras, diffuserCam, AR/VR displays).

  • Returning borrowed equipment:

https://docs.google.com/spreadsheets/d/1CVg7nUbI701pvZFPX3BR0uzKB76Y6PEl3tXmq1UF4AU/edit#gid=0

2 of 481

Class evaluations – please take them!


  • CMU’s Faculty Course Evaluations (FCE): https://cmu.smartevals.com/

  • Please take all three of them; they are super helpful for developing future offerings of the class.

  • Thanks in advance!

3 of 481

Using incident lightfield for simulating GelSight

Arpit Agarwal

4 of 481

Objective

Capture spatially and directionally varying illumination and use this lighting to generate images with synthetic objects

5 of 481

Key Principle

  1. Measure scene radiance and global illumination by placing a light probe
  2. Use raycasting to find normals of surface at each pixel and interpolate colors

Debevec, Paul. "Rendering synthetic objects into real scenes: Bridging traditional and image-based graphics with global illumination and high dynamic range photography." ACM SIGGRAPH 2008 classes. 2008. 1-10.

Capture

Render

6 of 481

Spatially varying incident lightfield

  1. The previous method only works if the object is very small compared to the captured image
  2. Capture images with multiple probe locations

Unger, Jonas, et al. Capturing and rendering with incident light fields. UNIVERSITY OF SOUTHERN CALIFORNIA MARINA DEL REY CA INST FOR CREATIVE TECHNOLOGIES, 2003.

7 of 481

Motivation

  1. GelSight has a small form factor
  2. The camera cannot be masked

2 cm × 2 cm

8 of 481

Capture Method

GelSight

9 of 481

Capture Method

GelSight

10 of 481

Capture Method

GelSight

11 of 481

Rendering Method

Diffuse material

Precomputed

12 of 481

Rendering Method

Diffuse material

Scaled to [0,1]

13 of 481

Simulation setup

  1. Setup scene in Mitsuba 2
  2. Probe locations along an 8×9 grid with 0.5 mm spacing

14 of 481

Simulation Results

15 of 481

GelSight data collection

  1. Collected low dynamic range raw images at 5 exposure settings
  2. Merged the LDR images into HDR using the MergeDebevec function
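A minimal OpenCV sketch (not the presenter's code) of this kind of merge; the file names and exposure times below are placeholders.

```python
import cv2
import numpy as np

paths = ["exp_0.png", "exp_1.png", "exp_2.png", "exp_3.png", "exp_4.png"]   # placeholder files
times = np.array([1/500, 1/125, 1/30, 1/8, 1/2], dtype=np.float32)          # placeholder exposures

images = [cv2.imread(p) for p in paths]                          # 8-bit LDR frames
response = cv2.createCalibrateDebevec().process(images, times)   # estimated camera response
hdr = cv2.createMergeDebevec().process(images, times, response)  # float32 radiance map
cv2.imwrite("merged.hdr", hdr)
```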

16 of 481

GelSight Results

17 of 481

Summary

  1. Increasing the number of probe locations might result in better images
  2. Implement the method for general materials
  3. The method could be used to place virtual light sources for global illumination

18 of 481


Deep High Dynamic Range Imaging of Dynamic Scenes

Presenter: Uma Arunachalam

Andrew ID: uarunach

19 of 481

Brief Overview

  • Merge LDR images that are not necessarily aligned, of scenes that are not necessarily static, using deep learning (DL) to produce an HDR image as output
  • Motivation:
    • Reduce the setup requirements of the capture process
      • Can be applied to any scene captured with a handheld camera, instead of requiring tripods and gphoto for capture
    • Reduce the processing time required for merging
      • Time is spent only on aligning images and on the inference time of the DL network.
  • Mimic and extend the baseline reference [1]


20 of 481

Approach

3 misaligned images

21 of 481

Optical Flow

3 misaligned images

22 of 481

Ce Liu’s Classical Method

  • Exploits the brightness-constancy assumption, and uses iteratively reweighted least squares to minimize the pixel-intensity difference between the reference image and the flow-guided (‘estimated’) pixels of the other image
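As a reminder of the formulation (a schematic sketch, not copied from [2]): brightness constancy says the reference image I1 and the flow-warped second image I2 should agree, and a robust penalty ψ on both the data and smoothness terms is what makes iteratively reweighted least squares applicable.

```latex
\min_{\mathbf u}\;
\sum_{\mathbf x} \psi\!\big(\,|I_2(\mathbf x + \mathbf u(\mathbf x)) - I_1(\mathbf x)|^2\big)
\;+\; \lambda \sum_{\mathbf x} \psi\!\big(\,\|\nabla \mathbf u(\mathbf x)\|^2\big)
```

Each IRLS iteration linearizes the warp, solves a weighted least-squares system for an update to u, and then recomputes the weights from ψ′.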

23 of 481

Approach

3 misaligned images

24 of 481

Learning based LDR merging

3 misaligned images

25 of 481

Network Architecture

Direct

Weight estimator

Fully Differentiable Architecture

26 of 481

Experimental Setup

  • Default optical flow parameters, as suggested in [2]
  • For training the Direct and WE architectures:
    • Used online available datasets plus a few custom captures
    • Xavier uniform initialization
    • Adam optimizer, learning rate = 0.0001
    • Trained for ~100 epochs, logging every 20 iterations
  • Trained each method for about 3 hours before reaching convergence on an NVIDIA Tesla V100
  • Torch MSE loss, with reduction set to ‘sum’
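A minimal PyTorch sketch of this training configuration (Xavier init, Adam at lr = 1e-4, MSE loss with reduction 'sum', logging every 20 iterations); the stand-in network and random tensors are assumptions, not the presenter's Direct/WE architectures or data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in network: 3 aligned LDR frames (9 channels) -> HDR estimate (3 channels).
model = nn.Sequential(nn.Conv2d(9, 32, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(32, 3, 3, padding=1))

def xavier_init(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
model.apply(xavier_init)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss(reduction="sum")

dummy = TensorDataset(torch.rand(8, 9, 64, 64), torch.rand(8, 3, 64, 64))  # placeholder data
train_loader = DataLoader(dummy, batch_size=2)

for epoch in range(100):
    for it, (ldr_stack, hdr_target) in enumerate(train_loader):
        optimizer.zero_grad()
        loss = criterion(model(ldr_stack), hdr_target)
        loss.backward()
        optimizer.step()
        if it % 20 == 0:
            print(f"epoch {epoch} iter {it} loss {loss.item():.4f}")
```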

27 of 481

Results: Error metric

Direct

WE

28 of 481

Results: Optical Flow

29 of 481

Results: Deep merging vs Tent

30 of 481

Results: WE vs Direct

31 of 481

Results

| Metric              | Tent merging | Deep merging |
|---------------------|--------------|--------------|
| Mean square error   | 0.0594       | 0.0037       |
| PSNR                | 12.27        | 24.31        |
| Run time (merging)* | 9.58 s       | 5.88 s       |

* not profiled, averaged over 2 runs

32 of 481

Main References

33 of 481

Thank you!

Questions?

34 of 481

Real-time Cartoonization

Ricky Bao

35 of 481

What is NPR and cartoonization?

  • Non-photorealistic rendering favors artistic styles instead of faithful recreation of a scene
    • Paintings and drawings
  • Cartoonization is a form of NPR
    • Enhanced lines
    • Textures and colors smoothed out
  • Use cases of cartoonization
    • Entertainment
    • Simplification of scenes

36 of 481

Example of cartoonization

37 of 481

How is cartoonization performed?

  • Line extraction
    • Canny
    • Gradient
    • Difference of Gaussian
  • Color reduction
    • Bilateral filter
    • Anisotropic diffusion filter
  • Image enhancement
    • Combine line extraction and color reduction
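A minimal OpenCV sketch of this pipeline (line extraction + color reduction + combination); the file name and parameter values are illustrative assumptions, not the presenter's implementation.

```python
import cv2

img = cv2.imread("input.jpg")   # placeholder path

# Color reduction: repeated bilateral filtering smooths textures but keeps edges.
smooth = img.copy()
for _ in range(5):
    smooth = cv2.bilateralFilter(smooth, 9, 75, 75)

# Line extraction: Canny edges on the grayscale image.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)

# Image enhancement: overlay dark lines on the color-reduced image.
cartoon = smooth.copy()
cartoon[edges > 0] = 0
cv2.imwrite("cartoon.jpg", cartoon)
```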

38 of 481

Real-time cartoonization

| Scene   | Average (ms) | Min (ms) | Max (ms) |
|---------|--------------|----------|----------|
| Roof    | 56.2         | 50.3     | 62.7     |
| Android | 54.9         | 47.2     | 60.1     |
| Pool    | 58.0         | 50.6     | 66.9     |
| Globe   | 53.2         | 46.3     | 58.2     |
| Sunset  | 54.7         | 49.8     | 61.0     |

39 of 481

Resources

  • R. Raskar, K.-H. Tan, R. Feris, J. Yu, and M. Turk, “Non-photorealistic camera: Depth edge detection and stylized rendering using multi-flash imaging,” ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 679-688, 2004, ISSN: 0730-0301.
  • A. B. Patankar, P. A. Kubde, and A. Karia, “Image cartoonization methods,” IEEE, 2016, pp. 1-7, ISBN: 9781509032914.

40 of 481

Multi-Flash Imaging for Depth Edge Detection and Photometric Stereo

Matthew Baron (mcbaron)

41 of 481

Setup

42 of 481

Depth Edge Detection

43 of 481

Photometric Stereo

44 of 481

References

[1] T. Papadhimitri and P. Favaro, “A New Perspective on Uncalibrated Photometric Stereo,” in 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, Jun. 2013, pp. 1474–1481, doi: 10.1109/CVPR.2013.194.

[2] R. Raskar, K.-H. Tan, R. Feris, J. Yu, and M. Turk, “Non-Photorealistic Camera: Depth Edge Detection and Stylized Rendering Using Multi-Flash Imaging,” p. 12.

[3] Y. Taguchi, “Rainbow Flash Camera: Depth Edge Extraction Using Complementary Colors,” p. 16.

[4] L. Wu, A. Ganesh, B. Shi, Y. Matsushita, Y. Wang, and Y. Ma, “Robust Photometric Stereo via Low-Rank Matrix Completion and Recovery,” in Computer Vision – ACCV 2010, vol. 6494, R. Kimmel, R. Klette, and A. Sugimoto, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 703–717.

[5] Y. Quéau, F. Lauze, and J.-D. Durou, “Solving Uncalibrated Photometric Stereo Using Total Variation,” J Math Imaging Vis, vol. 52, no. 1, pp. 87–107, May 2015, doi: 10.1007/s10851-014-0512-5.

45 of 481

Tracking Microscopic Motion Using Speckle Imaging

Yash Belhe

46 of 481

47 of 481

Setup

Optically Rough Surface

Laser

Bare Camera Sensor

Approx Co-located with Laser

48 of 481

Example with horizontal motion (with stutters)

49 of 481

Tracking Results

50 of 481

Learning to predict Depth and Autofocus using Focal and Aperture stack

Akankshya Kar and Anand Bhoraskar

{akankshk,abhorask}@andrew.cmu.edu

51 of 481

Contents

  • Problem Statement
  • Related Work
  • Dataset
  • Proposed Method
  • Ground Truth generation for Autofocus
  • Depth Map Generation
  • Autofocus
  • Future Work

52 of 481

Problem Statement

Low Light Image

Bright Autofocus Image

Depth Map

AFI Image

53 of 481

Related Work

Learning to Autofocus1

Focal Stack

CNN

F

Auto-Focused Image

  • Uses soft ordinal regression4 to predict F from 49 indices in the focal stack
  • Captures its own images using 5 Pixel phones at high resolution
  • Uses MobileNetV2 to make prediction fast enough for on-device use

F

54 of 481

Datasets

DDFF7

  • 720 images (12 scenes)
  • Resolution 383 × 552
  • Missing area in depth
  • Very low baseline
  • Lenslet size = 9
  • Available online

Flowers Dataset8

  • 3343 images
  • Resolution 376 x 541
  • No depth data
  • Limited domain (1)
  • Synthetic data
  • Lenslet size = 14

Learning to Autofocus dataset1

  • 387,000 patches of 128x128
  • Resolution 1512 × 2016
  • 50 Varied Scenes (indoor and outdoor)
  • Not released yet

55 of 481

Proposed Method

CNN

(Dark to light)

AFI

CNN

All in focus

Depth

Refocusing

CNN

F, A

Selected Patch

GT generation

Project scope

56 of 481

Ground Truth generation for Autofocus

f_min

f_max

Subset Focal stack

F (median depth)

Merging to get partial focused image

SSIM

(Grid search)

A, F(median)

Selected Patch

Selected subset using focus to disparity

AFI

Partial Focused

57 of 481

Results

GT generation

# of patches: 19,311

Patch size: 64 × 64

Stride: 100

| Path                  | X_begin | Y_begin | Aperture size | Focus distance |
|-----------------------|---------|---------|---------------|----------------|
| cafeteria/LF_0001.npy | 32      | 96      | 7             | 3              |
| cafeteria/LF_0001.npy | 32      | 128     | 7             | 3              |
| cafeteria/LF_0001.npy | 32      | 160     | 7             | 3              |
| cafeteria/LF_0001.npy | 32      | 192     | 7             | 3              |
| cafeteria/LF_0001.npy | 64      | 32      | 7             | 3              |
| cafeteria/LF_0001.npy | 64      | 64      | 7             | 3              |
| cafeteria/LF_0001.npy | 64      | 96      | 7             | 3              |
| cafeteria/LF_0001.npy | 64      | 128     | 7             | 3              |
| cafeteria/LF_0001.npy | 64      | 160     | 7             | 2              |
| cafeteria/LF_0001.npy | 64      | 192     | 7             | 2              |
| cafeteria/LF_0001.npy | 96      | 32      | 7             | 2              |
| cafeteria/LF_0001.npy | 96      | 64      | 7             | 2              |
| cafeteria/LF_0001.npy | 96      | 96      | 7             | 2              |
| cafeteria/LF_0001.npy | 96      | 128     | 7             | 2              |

Selected Patch in Full image

Depth Patch and cropped out patch

58 of 481

Depth Map Generation

  • The AFI is generated and given to an encoder-decoder style architecture; the input has 80 channels (10 focal × 8 aperture)
  • The encoder is MobileNetV2 and the decoder is an FCN with upsampling; the loss is the Virtual Normal Loss

AFI

Depth Map

CNN

59 of 481

Virtual Normal Loss2

  • Enforces a high-order geometric constraint in 3D space for the depth prediction task.
  • Samples N groups of 3 points in the point cloud and fits a plane to each group
  • DDFF is an indoor dataset, hence VNL was chosen

60 of 481

Results

Depth from confocal stack

  • Current encoder-decoder network with MobileNet-V2 backbone and VNL loss
  • So far we have trained for ~4000 iterations
  • Reasons for slow training:
    • Small dataset
    • Limited compute resources
    • Large size of the confocal stack (8 times the light field image)
    • Pre-trained model not used for initialisation

Ground Truth

Predicted

61 of 481

Auto-Focus

  • The AFI is generated and the selected patch is given to a CNN
  • The selected patch is provided as a mask and as 2 extra channels following the CoordConv5 method; the loss is the Soft Ordinal Loss4

Refocusing

CNN

F, A

Selected Patch

62 of 481

Ordinal Regression Loss3,4

  • Ordinal regression solves tasks with a relative ordering by penalizing predictions that are further from the true label more heavily
  • Soft ordinal regression replaces the hard one-hot ground-truth vector with a softmax-like weighting scheme

63 of 481

Future Work

  • The DDFF dataset has many limitations; we would shift to the Flowers dataset as it is bigger and exhibits more blur due to its larger aperture size
  • The Flowers dataset does not have depth, but we can generate the same aperture-and-focus dataset using a focus measure to get the same mapping between focus and disparity
  • Fix the bug that always returns the maximum aperture size (could be a DDFF-related problem)
  • Complete the training and ablation experiments
  • Work on the low-light aspect by using RetinaNet to generate low-light images from the light field and use them for training

64 of 481

References

  1. Herrmann, Charles, et al. "Learning to Autofocus." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
  2. Yin, Wei, et al. "Enforcing geometric constraints of virtual normal for depth prediction." Proceedings of the IEEE International Conference on Computer Vision. 2019.
  3. Diaz, Raul, and Amit Marathe. "Soft labels for ordinal regression." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
  4. Fu, Huan, et al. "Deep ordinal regression network for monocular depth estimation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
  5. Liu, Rosanne, et al. "An intriguing failing of convolutional neural networks and the coordconv solution." Advances in neural information processing systems 31 (2018): 9605-9616.
  6. Tsai, Yu-Ju, et al. "Attention-based view selection networks for light-field disparity estimation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 07. 2020.
  7. Hazirbas, Caner, et al. "Deep depth from focus." Asian Conference on Computer Vision. Springer, Cham, 2018.
  8. Srinivasan, Pratul P., et al. "Learning to synthesize a 4D RGBD light field from a single image." Proceedings of the IEEE International Conference on Computer Vision. 2017.

65 of 481

Fast Separation of Direct/Global Images In A Pocket

Presenter: Zili Chai (zilic)

66 of 481

Hardware Setup

Sony IMX219

RPi 4B

DLPDLCR2000EVM

67 of 481

Camera Calibration

10-bit Bayer data, BGGR ordering

AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD AABBCCDD (four 10-bit pixels packed into five bytes)

Linear response: pixel value ∝ exposure

ISO = 100

Dark Frame

Mean = 63.90 (on 100 dark images)

Daylight result (1 ms and 100 ms exposures)

Projector also needs time!

68 of 481

Global/Direct Images

Global Illumination

scattering, interreflection, shadows…

Separation In Real World

69 of 481

Global/Direct Images

Separation in Real World

70 of 481

Results

Checkerboard shift

Panels: max and min images, and the recovered direct and global components
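For reference, the separation rule from Nayar et al. [2006], under the usual simplification that each scene point sees the high-frequency pattern "on" for roughly half of the shifted illuminations:

```latex
L_{\text{direct}} \;=\; L_{\max} - L_{\min},
\qquad
L_{\text{global}} \;=\; 2\, L_{\min}
```

where L_max and L_min are the per-pixel maximum and minimum over the captured pattern shifts.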

71 of 481

Results

More Patterns

Random pattern with phase shift: 0.5 + 0.5·sin(φ), with φ offsets of 2π/7, 4π/7, 6π/7, 8π/7, 10π/7, 12π/7

direct

global

72 of 481

Results

More Patterns

gray code

73 of 481

References

Sony, IMX219 product brief version 1.0, (Jan 2017).

Texas Instruments, TI DLP® LightCrafter™ Display 2000 EVM User's Guide, (Oct 2017).

Pagnutti, Mary A., et al. "Laying the foundation to use Raspberry Pi 3 V2 camera module imagery for scientific and engineering purposes." Journal of Electronic Imaging 26.1 (2017): 013014.

Nayar, Shree K., et al. "Fast separation of direct and global components of a scene using high frequency illumination." ACM SIGGRAPH 2006 Papers. 2006. 935-944.

74 of 481

Q&A

75 of 481

Kirchhoff Migration for NLOS Imaging

Dorian Chan

76 of 481

NLOS Imaging

Figure from:

David B. Lindell, Gordon Wetzstein, and Matthew O’Toole. 2019. Wave-based non-line-of-sight Imaging using fast f−k migration. ACM Trans. Graph. 38, 4, 116.

77 of 481

Traditional Approach: Backprojection

78 of 481

F-K Migration: a Wave-Based Theory

Fourier Transform

Resample and Filter

Inverse Fourier Transform

Supposedly more robust to BRDF variations!

79 of 481

Kirchhoff Migration: another idea from seismology

  • Can be implemented as a backprojection
  • Theoretically equivalent to F-K migration - derived from scalar theory

80 of 481

Second Order Kirchhoff Migration

  • Attempt to model propagation from light source to hidden scene
  • Looks a lot like the LCT

81 of 481

82 of 481

83 of 481

Relating F-K migration to volumetric albedo models

  • Derive F-K from a volumetric albedo model (inspired by radar literature)
  • Derive a volumetric albedo model using a slightly modified F-K derivation
  • Main difference is a ramp filter

84 of 481

Questions?

85 of 481

Colorization through Optimization

Shruti Chidambaram | 15-463, Fall 2020

86 of 481

Colorization: Background

  • Colorization typically requires:
    • Image segmentation + tracking segments over image sequences
    • Lots of user input/intervention
  • Generally an ill-posed problem
  • Contemporary techniques: deep learning to predict likely colors of pixels

87 of 481

Acknowledgement

My project reimplements the algorithm presented in Colorization using Optimization

(Anat Levin, Dani Lischinski, and Yair Weiss, 2004)

88 of 481

Algorithm

  • Neighboring pixels should have similar colors if their intensities are similar
  • Working in YUV color space
    • Y → intensity
    • U, V → chrominance (encode color)
  • Minimize difference between color at a pixel and the weighted average of colors at neighboring pixels, subject to constraints of the user-specified colors
  • Optimization problem → solve large, sparse system of linear equations
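For reference, the objective from Levin et al. [2004] that this linear system encodes (sketched here; the exact weight normalization is omitted): for each chrominance channel U (and likewise V),

```latex
J(U) \;=\; \sum_{\mathbf r} \Big( U(\mathbf r) \;-\; \sum_{\mathbf s \in N(\mathbf r)} w_{\mathbf r \mathbf s}\, U(\mathbf s) \Big)^{2},
\qquad
w_{\mathbf r \mathbf s} \;\propto\; \exp\!\Big( -\tfrac{(Y(\mathbf r) - Y(\mathbf s))^{2}}{2\sigma_{\mathbf r}^{2}} \Big)
```

minimized subject to the user-scribbled chrominance constraints; setting the gradient to zero gives the large, sparse linear system mentioned above.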

89 of 481

Results

MARKED-UP GRAYSCALE

COLORIZED

GROUND TRUTH

90 of 481

Results: Experimenting with Varied User Input

MARKED-UP GRAYSCALE

COLORIZED

GROUND TRUTH

91 of 481

Experimenting with a Deep Learning Approach

Autocolorization results from demos.algorithmia.com/colorize-photos

paper: Colorful Image Colorization (Zhang, Isola, Efros; 2016)

OUTPUT

INPUT

GROUND

TRUTH

92 of 481

Non-Realistic Rendering App

Han Deng (handeng)

Heshan Liu (heshanl)

93 of 481

Motivation

Camera apps nowadays offer many different filters, but most of them just adjust the colors of the image. We would like to make something different and slightly more involved.

Original

Blending

Sketch

Watercolor

94 of 481

Goal

  • Implement multiple image stylization effects on a mobile device
  • Effects
    • Pencil Sketch
    • Watercolor Painting
    • Image Blending
  • Android App
    • Take a photo or select one from the album.
    • Then apply the chosen filter to the taken/selected photo.

95 of 481

Environment Setup

  • Mobile Device: Android

  • Package
    • Original Bitmap Factory provided by Android
    • RenderScript: framework for running computationally intensive tasks

96 of 481

Algorithms: Pencil Sketch

  • Two implementations
    • Grayscale image divided by its Gaussian-blurred result
    • Bilateral filtering with a shade factor, converted to a grayscale output
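A minimal OpenCV sketch of the first implementation (grayscale divided by its Gaussian-blurred version); the app itself uses Android's Bitmap/RenderScript pipeline, so this is only a desktop illustration with placeholder parameters.

```python
import cv2

img = cv2.imread("photo.jpg")                      # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (21, 21), 0)
sketch = cv2.divide(gray, blur, scale=256)         # bright where gray ≈ blur, dark at edges
cv2.imwrite("sketch.jpg", sketch)
```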

Input Image

Output 1

Output 2

97 of 481

Algorithms: WaterColor

Input Image

Output Image

98 of 481

Algorithms: Image Blending - Poisson Blending

  • Goal: seamlessly clone the source image into the target image

  • Mix gradients of source and target image in target region:

Pérez, P., Gangnet, M., & Blake, A. (2003). Poisson image editing. In ACM SIGGRAPH 2003 Papers (pp. 313-318).

99 of 481

Algorithm and app Demo: Image blending

100 of 481

Possible Improvement

  • More complicated filters could be added, like an oil-painting style
  • The camera UI could be more elegant

101 of 481

Thank you!

102 of 481

Uncertainty in Radiometric Calibration

15663 Course Project

Advait Gadhikar

103 of 481

Goal

  • Quantify the uncertainty in the inverse tone-mapping function
  • Estimate a distribution to quantify the uncertainty in linearizing tonemapped images
  • Use this uncertainty model to combine an HDR stack

104 of 481

Motivation

  • Linearizing images is an important preprocessing step in many Computer Vision Applications
  • Tone mapping for narrow gamut displays is a lossy compression
  • Standard methods to linearize JPEG images use a deterministic mapping
  • Such a mapping doesn’t consider information lost while converting to JPEG
  • A Probabilistic Model for inversion (JPEG to RAW) is more informative
  • Gives better performance on downstream tasks like HDR imaging

105 of 481

Standard Tone Map vs Learnt Tone Map

f is a polynomial and v is a linear transform, with g being a correction factor

If tone mapping is deterministic, Debevec and Malik’s method assumes this mapping

106 of 481

RMSE for Rendering Function (Raw to JPEG)

  • RMSE on a stack of images collected under multiple exposures
    • 16 images, with 1-stop exposure steps
  • RMSE: 16.995 (For 8-bit JPEG images)

Sample Images in stack

107 of 481

Inverse Distribution for Linearization

  • The inverse distribution is assumed to be a multivariate Gaussian

  • Here J is the forward rendering model
  • The integral is approximated over a grid of pixel values
    • A grid of 10 × 10 × 10 points was used for RGB values in [0,1]³

108 of 481

HDR with Uncertainty

  • Combining exposure stack to get HDR Image with Uncertainty

Estimated HDR image with uncertainty

109 of 481

Thank You

Questions!?

110 of 481

References

  • Chakrabarti, Ayan, et al. "Modeling radiometric uncertainty for vision with tone-mapped color images." IEEE Transactions on Pattern Analysis and Machine Intelligence 36.11 (2014): 2185-2198.
  • Paul E. Debevec and Jitendra Malik. 1997. Recovering high dynamic range radiance maps from photographs. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques (SIGGRAPH '97). ACM Press/Addison-Wesley Publishing Co., USA, 369–378. DOI:https://doi.org/10.1145/258734.258884

111 of 481

One-Shot to Vertigo: Novel View Synthesis using Light Field Cameras

Rohan Rao

(rgrao@andrew.cmu.edu)

112 of 481

Introduction

  • Extension of The “Vertigo Effect” on Your Smartphone: Dolly Zoom via Single Shot View Synthesis [1], a CVPR 2020 paper on single-shot view synthesis. Also implements the US patent by Lytro/Google, “Generating dolly zoom effect using light field image data” [2].
  • Creating a “dolly-zoom” or “vertigo” effect, popularized by Alfred Hitchcock’s 1958 movie, Vertigo.
  • Keep the subject in focus while moving the camera and zooming simultaneously, creating a tunnelling/falling effect. Also widely used for aerial cinematography shots.

113 of 481

Single-camera Pipeline: Pre-processing

  • Process RAW lightfield images (calibration, demosaicking, etc) using the PlenPy library [3].
  • Obtain depth image from lightfield using Structure Tensors [5] to estimate disparity [3].
  • Used the INRIA Lytro dataset [4] for testing calibration and depthmap results.

(from [1])

114 of 481

Single-camera Pipeline: Digital Zoom

  • Can obtain closed-form expression for digital zoom using focal length and camera intrinsics.
  • Implemented clipped zooming using OpenCV and efficient resizing.

(from [1])

115 of 481

Single-camera Pipeline: View Synthesis

  • Can obtain closed-form expression for single-shot view synthesis and warp image and depth map using z-buffering and forward warping.
  • Implemented with a z-buffer and looping over all pixels (more efficient in C++ though)

(from [1])

116 of 481

Single-camera Pipeline: Image/Depth fusion

  • Can fuse the images by creating image/depth occlusion masks from the previous step (regions where warping did not update the value)

(from [1])

117 of 481

Single-camera Pipeline: Depth Hole Filling

  • Simplistic method - just takes the max of neighboring values to replace missing depth values

(from [1])

118 of 481

Single-camera Pipeline: Image Inpainting

  • Extension: convert continuous depth values to discrete ones using K-Means clustering
  • Inpainting using masks and nearest valid pixel in the row

(from [1])

119 of 481

Single-camera Pipeline: Shallow Depth of Field (SDoF)

  • Extension: smoother blending of depth segments using finite bounding values, then scaling
  • Blur depth segments with different sigma values to keep subject focused

(from [1])

120 of 481

Intermediate steps towards final results

Post digital zoom (I1)

Without digital zoom (I2)


121 of 481

Final result

122 of 481

Extensions of prior work

Ideas for future work

  • Improved depth map estimate obtained from a light field instead of monocular depth estimation from a single RGB image.
  • Algorithm for using continuous-valued depth maps through K-Means based discretization.
  • Algorithm for Shallow Depth of Field (SDoF) and smoother merging of multiple image segments to prevent artifacts at the edges.
  • Improve the inpainting approaches using more sophisticated search for valid pixels (currently searches same row) or using deep-learning for inpainting (experimented but didn’t show results here)
  • Obtain more realistic depth values with calibration, since this will give better results
  • Improve the view synthesis from light fields using Multiplane Images (learning view synthesis using MPIs), this can be used to create more novel views from a single image/video sequence

123 of 481

Thanks for listening! Questions?

References

  1. Liang et al., The “Vertigo Effect” on Your Smartphone: Dolly Zoom via Single Shot View Synthesis, CVPRW 2020.
  2. US Patent US8971625B2, Generating dolly zoom effect using light field image data, Lytro Inc.
  3. Schambach, Maximilian and Puente León, Fernando, Microlens array grid estimation, light field decoding, and calibration, IEEE Transactions on Computational Imaging, 2020.
  4. M. Le Pendu, X. Jiang, C. Guillemot, Light Field inpainting propagation via low rank matrix completion, IEEE Trans. on Image Processing, vol. 27, No. 4, pp. 1981-1993, Jan. 2018.
  5. Sven Wanner and Bastian Goldlueck, Globally Consistent Depth Labeling of 4D Light Fields

124 of 481

Epipolar scanning for shape-from-silhouette in scattering medium

Shirsendu S Halder

shirsenh

125 of 481

Shape from silhouette

126 of 481

Shape from silhouette

Collimated illumination

Orthographic camera

Object

127 of 481

Imaging through scattering medium

We want the transmissive paths (ballistic photons), as they sharpen the object image and also enhance contrast.

128 of 481

Imaging through scattering medium

Rows of an orthographic projector are lit.

Camera captures ballistic paths from each row.

Combination of position cues, angle cues, and probing

Orthographic projector

Telecentric camera

129 of 481

Imaging through scattering medium

Rows of an orthographic projector are lit sequentially.

Camera captures ballistic paths from each row.

Orthographic projector

Telecentric camera

130 of 481

Why telecentricity?

  • Low perspective distortion
  • Zero angular field-of-view: minimum parallax error
  • Larger usable depth of field than conventional lenses due to symmetrical blurring

Image credit: https://www.edmundoptics.com/knowledge-center/application-notes/imaging/advantages-of-telecentricity/

131 of 481

Setup

132 of 481

Setup

Syncing electronics

133 of 481

Setup

134 of 481

Laser line

135 of 481

Object used for scanning

136 of 481

Scanning (in air)

Global

137 of 481

Scanning (in air)

Epipolar

138 of 481

Scanning (in water)

Global

139 of 481

Scanning (in water)

Epipolar

140 of 481

Scanning (milk + water - low concentration)

Global

141 of 481

Scanning (milk + water - low concentration)

Epipolar

142 of 481

Scanning (milk + water - medium concentration)

Global

143 of 481

Scanning (milk + water - medium concentration)

Epipolar

144 of 481

Scanning (milk + water - high concentration)

Global

145 of 481

Scanning (milk + water - high concentration)

Epipolar

146 of 481

Thank you

Questions?

147 of 481

HDRnet and Artistic Style Transfer for HDR replication

Brandon Hung

148 of 481

HDRNet

149 of 481

Artistic Style Transfer

150 of 481

HDRnet for Tonemapping

Input Output Learned

151 of 481

Style Transfer to Preserve Content

  • Idea: map the bright-exposure image onto the low-exposure image
  • Bright exposure = “style”, low-exposure = “content”

152 of 481

Combining the two?

  • Style transfer maps colors onto an image
  • HDRnet tonemaps color
  • Idea: tonemap the image using HDRnet, then transfer that tonemap onto the original image
    • Hopefully preserves the tonemapped colors + original image content

153 of 481

Combining the two?

154 of 481

Unstructured Lightfields

Akshath Jain and Deepayan Patra

155 of 481

Inspiration

Assignment 4 - Planar Unstructured Lightfields


156 of 481

Technical Background

Goal: Generating arbitrary viewpoint rendering given unstructured, non-planar, input

Proposed Solution: Interpolate on a triangular mesh of input viewpoints with appropriate depth information to generate new image


157 of 481

Steps

  1. Generate Input Poses
  2. Triangulate Viewpoints
  3. Rendering


158 of 481

Generating Poses

COLMAP estimates camera poses and a 3D reconstruction of the scene using structure from motion. This is broken down into:

  • Feature detection and extraction
  • Feature matching and geometric verification
  • Structure and motion reconstruction


159 of 481

Triangulating Viewpoints

From the new projected camera origins, we develop a Delaunay Triangulation of the viewpoints.

Each point within the input space has exactly three neighboring viewpoints to interpolate.


160 of 481

Rendering

Generate each novel viewpoint’s image as:

  1. Backproject novel pixel onto mesh to determine intersecting face
  2. Project novel pixel onto scene
  3. Backproject scene pixel onto mesh face vertices
  4. Assign novel pixel color using Barycentric interpolation
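A minimal sketch of step 4 (barycentric interpolation of the colours seen by the three neighbouring viewpoints); the points and colours below are dummy values, not from the actual renderer.

```python
import numpy as np

def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of 2D point p with respect to triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    w1 = (d11 * d20 - d01 * d21) / denom
    w2 = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - w1 - w2, w1, w2])

w = barycentric_weights(np.array([0.3, 0.4]), np.array([0.0, 0.0]),
                        np.array([1.0, 0.0]), np.array([0.0, 1.0]))
view_colors = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=float)
pixel = w @ view_colors          # blended colour for the novel-view pixel
```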


161 of 481

Results

162 of 481

Unstructured Images

163 of 481

Output

164 of 481

Credits

Our implementation was inspired by the Unstructured Light Fields paper by Davis et al. We also relied on work done by Mildenhall et al. in LLFF to get camera poses from COLMAP and to identify ideal new sample viewpoints. Our full list of sources is as follows:

Davis et al. “Unstructured Light Fields.” Eurographics 2012.

Mildenhall et al. “Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines.” SIGGRAPH 2019.

Levoy, Marc. “Light Fields and Computational Imaging.” IEEE 2006.

Buehler et al. “Unstructured Lumigraph Rendering.” SIGGRAPH 2001.

Isaksen et al. “Dynamically Reparameterized Light Fields.” SIGGRAPH 2000.

Schönberger, Johannes Lutz and Frahm, Jan-Michael. “Structure-from-Motion Revisited.” CVPR 2016.

Pollefeys, Marc. “Visual 3D Modeling from Images” 2002.


165 of 481

15663: Computational Photography

Fall 2020

Using Spatio-Temporal Radiance Variations to Look Around Corners

Varun Jain

varunjai@andrew.cmu.edu

166 of 481

Idea[1]

Figure 1a: A and B represent two people hidden from the camera’s view by a wall.

167 of 481

Idea[1]


Figure 2: (top) the hidden scene with 2 actors, (bottom) the penumbra as seen by the naked eye, as seen by the camera, and the ground-truth trajectories over time.

168 of 481

Contributions

  • Re-implementation of the existing code-base in Python

    • Benchmark the implementation by comparing results on the given video sequences

  • Explore the efficacy of features extracted by deep methods

169 of 481

Results [1 person]

Fig: Observed video

Fig: 1-D angular projections of the hidden scene

170 of 481

Results [2 people]

Fig: Observed video

Fig: 1-D angular projections of the hidden scene

171 of 481

Comparative Results

| Method for Background Subtraction | Error wrt Official Code (MSE) |
|-----------------------------------|-------------------------------|
| First Frame                       | 7.41                          |
| Mean Frame                        | 2.94                          |
| Windowed Average Frame            | 20.16                         |

172 of 481

Limitations

  • 1D reconstruction

  • Temporal motion required

  • Assumes uniform floor albedo

173 of 481

Contributions

  • Re-implementation of the existing code-base in Python

    • Benchmark the implementation by comparing results on the given video sequences

  • Explore the efficacy of features extracted by deep methods

174 of 481

Dataset and Methodology

Figure 3: (left) generated dataset, and, (right) model architecture.

175 of 481

References

  1. Bouman, Katherine L., Vickie Ye, Adam B. Yedidia, Frédo Durand, Gregory W. Wornell, Antonio Torralba, and William T. Freeman. "Turning corners into cameras: Principles and methods." In Proceedings of the IEEE International Conference on Computer Vision, pp. 2270-2278. 2017.

  2. Seidel, Sheila W., Yanting Ma, John Murray-Bruce, Charles Saunders, William T. Freeman, C. Yu Christopher, and Vivek K. Goyal. "Corner occluder computational periscopy: Estimating a hidden scene from a single photograph." In 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1-9. IEEE, 2019.

  3. Tancik, Matthew, Guy Satat, and Ramesh Raskar. "Flash photography for data-driven hidden scene recovery." arXiv preprint arXiv:1810.11710 (2018).

  4. Saunders, Charles, John Murray-Bruce, and Vivek K. Goyal. "Computational periscopy with an ordinary digital camera." Nature 565, no. 7740 (2019): 472-475.


176 of 481

Evaluating Confocal Stereo

Kyle Jannak-Huang

177 of 481

Relative Exitance Estimation

Aperture f3.5

Focus dist ~ 2cm

Aperture f7.1

Focus dist ~ 2cm

178 of 481

Relative Exitance Estimation

Aperture f3.5

Focus dist ~ 2cm

Aperture f3.5

Focus dist ~ 90cm

179 of 481

Image Alignment

180 of 481

Depthmap from Aperture-Focal Images

  • Diagonal line artifacts related to small differences in corner detection during image alignment
  • White/black artifacts on leaves that moved due to air flow during capture
  • Poor depth estimation on rest of plant due to incorrect depth estimates

181 of 481

Depthmap from Aperture-Focal Images

  • Diagonal line artifacts related to small differences in corner detection during image alignment
  • White/black artifacts on leaves that moved due to air flow during capture
  • Poor depth estimation on rest of plant due to incorrect depth estimates

182 of 481

Depthmap from Aperture-Focal Images

183 of 481

Fuzz dataset images

184 of 481

Denoised result

185 of 481

Discussion

  • Benefits from using a custom feature detector that will find features in the center regardless of blur
  • Improvements in capture - optical table, diffuse plane, better lighting
  • Optimizer for image alignment needs tuning
  • Could benefit from segmenting out the background

186 of 481

Gradient-Domain Path Tracing

Zhi Jing (Zoltan), Ran Zhang (Ryan)

187 of 481

188 of 481

189 of 481

Gradient-Domain Path Tracing

  • Perform standard Monte Carlo rendering to obtain primal image
  • Sample gradients
  • Reconstruct image from primal and gradients

190 of 481

Gradient-Domain Path Tracing

  • Perform standard Monte Carlo rendering to obtain primal image
  • Sample gradients
  • Reconstruct image from primal and gradients

191 of 481

Standard Monte Carlo rendering

192 of 481

Gradient-Domain Path Tracing

  • Perform standard Monte Carlo rendering to obtain primal image
  • Sample gradients
  • Reconstruct image from primal and gradients

193 of 481

Render result - primal image

194 of 481

Sample gradients

195 of 481

The most naive approach

196 of 481

Gradient image - naive approach

dx

dy

197 of 481

Reconstruction result - naive approach

primal

final

198 of 481

Reconnection

199 of 481

Gradient image - with reconnection

dx

dy

200 of 481

Gradient image - naive approach

dx

dy

201 of 481

Reconstruction result - with reconnection

primal

final

202 of 481

Paper’s approach - add MIS

dx

dy

203 of 481

Paper’s approach - add MIS

primal

final

204 of 481

Gradient-Domain Path Tracing

  • Perform standard Monte Carlo rendering to obtain primal image
  • Sample gradients
  • Reconstruct image from primal and gradients

205 of 481

Image Reconstruction

Find an image that best fits the estimated primal and gradients

Primal

Gx

Gy

206 of 481

Poisson Reconstruction

L2-norm reconstruction

L1-norm reconstruction

Gradient term

Primal term

207 of 481

Poisson Reconstruction

A x = b

Solve the least squares minimization problem using Conjugate Gradient Descent
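A minimal sketch of this L2 (Poisson) reconstruction step as a sparse least-squares solve with conjugate gradient; the image size, α weight, and random buffers below are illustrative stand-ins for the actual primal and gradient estimates.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

H, W = 64, 64
N = H * W
alpha = 0.2                                   # weight on the primal term

def grad_op(n, stride):
    """Forward-difference operator along one axis as a sparse matrix."""
    rows, cols, vals = [], [], []
    for i in range(n):
        j = i + stride
        if j < n and (stride != 1 or (i % W) + 1 < W):   # skip right-border wrap for Dx
            rows += [i, i]; cols += [i, j]; vals += [-1.0, 1.0]
    return sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

Dx, Dy = grad_op(N, 1), grad_op(N, W)
primal = np.random.rand(N)                    # stand-in Monte Carlo estimates
gx, gy = np.random.rand(N), np.random.rand(N)

# Stack primal and gradient constraints into A x = b, solve normal equations with CG.
A = sp.vstack([alpha * sp.identity(N), Dx, Dy])
b = np.concatenate([alpha * primal, gx, gy])
x, info = cg(A.T @ A, A.T @ b)
result = x.reshape(H, W)
```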

208 of 481

Naïve Implementation using CGD

209 of 481

Naïve Implementation using CGD

L2 norm

L1 norm

210 of 481

Re-implement and Experiment (L1 norm vs. L2 norm)

L2 Norm

L1 Norm

211 of 481

Experiment (L1 norm vs. L2 norm)

L2 Norm

L1 Norm

Unbiased but has some artifacts

Biased but has fewer artifacts

212 of 481

Experiment (different α values)

α = 0.2

α = 1.0

α = 10

213 of 481

Thank you!

214 of 481

References

Kettunen, Markus, et al. "Gradient-domain path tracing." ACM Transactions on Graphics (TOG) 34.4 (2015): 1-13.

Lehtinen, Jaakko, et al. "Gradient-domain metropolis light transport." ACM Transactions on Graphics (TOG) 32.4 (2013): 1-12.

Manzi, Marco, et al. "Gradient-Domain Bidirectional Path Tracing." EGSR (EI&I). 2015.

Pérez, Patrick, Michel Gangnet, and Andrew Blake. "Poisson image editing." ACM SIGGRAPH 2003 Papers. 2003. 313-318.

Manzi, Marco, Delio Vicini, and Matthias Zwicker. "Regularizing Image Reconstruction for Gradient‐Domain Rendering with Feature Patches." Computer graphics forum. Vol. 35. No. 2. 2016.

Ji, Hao, and Yaohang Li. "Block conjugate gradient algorithms for least squares problems." Journal of Computational and Applied Mathematics 317 (2017): 203-217.

215 of 481

View Synthesis using Neural Radiance Fields

Rohan Joshi

216 of 481

Problem Definition:

Given: Images of a scene from known camera poses,

Task: Render new views of the scene

217 of 481

Method: Scene Representation as Neural Radiance Fields

  • Use a fully connected network instead of a voxel grid (advantages: simplicity and efficient storage)
  • Calculate by querying points along rays

218 of 481

Algorithm for Volume Rendering

Forward Pass

Step 1: Generate Camera Rays

Step 2: Sample points along each ray (affects the resolution of the output image)

Step 3: Get RGB and σ (Volume Density at the point) value

219 of 481

Algorithm for Volume Rendering

Error Calculation and Backpropagation

Step 1: Composite the pixel color along the ray to get the RGB image

Step 2: Minimise the MSE against the known images

Step 3: Backprop ...

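For reference, the compositing used in the NeRF paper, with per-sample weights w_i determined by the densities σ_i:

```latex
\hat{C}(\mathbf r) = \sum_{i=1}^{N} w_i\,\mathbf c_i,
\qquad
w_i = T_i\big(1 - e^{-\sigma_i \delta_i}\big),
\qquad
T_i = \exp\!\Big(-\sum_{j<i}\sigma_j\,\delta_j\Big)
```

where δ_i is the spacing between adjacent samples; the loss is then the MSE over rays with known pixel colors, and every operation is differentiable for backpropagation.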

220 of 481

Results: Performed on Synthetic Data

  • Training Images:
    • 100 images taken from different views
    • Camera Poses provided in the dataset

Training Set

221 of 481

Result: Rendered Views

222 of 481

References

  • DeepStereo: Learning to Predict New Views from the World's Imagery, John Flynn et al., arXiv:1506.06825
  • Nerf: https://arxiv.org/pdf/2003.08934.pdf
  • https://www.matthewtancik.com/nerf

223 of 481

Thank you

224 of 481

Color-Filtered Aperture For Image Depth & Segmentation

Leron Julian

225 of 481

Depth Estimation Techniques

Stereo

Depth From Disparity Using Color Filter (This Project)

Depth From Defocus

226 of 481

Constructing Depth Using Color Misalignment

  • Red, green, and blue color apertures placed in a specific arrangement
  • A scene point farther than the focused depth produces:
    • Right-Shift in R
    • Up-Shift in G
    • Left-Shift in B
  • Color shifts come from geometric shifts.

227 of 481

How? (Red & Green Example)

Background

Camera Sensor

Object In Focus

Color Aperture

228 of 481

Sample Images

229 of 481

Depth Estimation

  • Let d be the hypothesized disparity between the RGB Channels:
  • Need to measure the quality of alignment between the shifted channels:

  • Colors in natural images form elongated clusters in RGB space

230 of 481

Depth Estimation

  • Consider a set of pixels with hypothesized disparity d. Set can be represented as:

  • Goal is to find a disparity d that minimizes the color alignment measure

  • L gets small when the cluster is elongated, meaning the RGB components are correlated
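A minimal sketch of this color-alignment idea (not necessarily the exact normalization used by Bando et al.): after undoing the hypothesized shifts, score how elongated the local RGB cluster is via the eigenvalues of its covariance.

```python
import numpy as np

def color_alignment_measure(patch_r, patch_g, patch_b):
    samples = np.stack([patch_r.ravel(), patch_g.ravel(), patch_b.ravel()])
    cov = np.cov(samples)                    # 3x3 covariance of the RGB cluster
    eigvals = np.linalg.eigvalsh(cov)        # spread along the principal axes
    variances = np.diag(cov)                 # per-channel variances
    return eigvals.prod() / (variances.prod() + 1e-12)

# Small L: elongated (correlated) cluster -> channels well aligned at this d.
# Large L: isotropic cluster -> wrong disparity hypothesis.
```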

231 of 481

RGB Color Model

d = 1

d = 3

d = 5

  • As the disparity d increases, the cluster becomes more isotropic.

232 of 481

Depth Map

  • Can now solve for the disparity d by minimizing L over a predetermined range of disparities.
  • Can construct a depth map using local estimates (prone to error) or energy minimization graph-cuts:

Captured Images

Local Estimates

Estimate with Graph-Cuts

233 of 481

Trimap

Foreground

Unknown

Background

234 of 481

Matting

  • Matting Equation:

235 of 481

Matte Optimization Flow

  • Optimize the matte with various consistency methods based on foreground and background color
  • Obtain the optimal matte

236 of 481

Applications of Matting

  • Changing the background of images
  • Refocusing
  • Artificial Background Blur
  • Video Matting

237 of 481

References

  • Bando et al., “Extracting Depth and Matte using a Color-filtered Aperture”. SIGGRAPH 2008.

  • Levin et al., “A closed form solution to natural image matting”. CVPR 2006

238 of 481

Depth from Defocus in the Wild

Alice Lai, Adriana Martinez

239 of 481

Goal: Two-frame depth from defocus using tiny blur condition

Original Image

Image w/Blur

240 of 481

Idea: Combine local depth/flow estimation with spline-based scene understanding

241 of 481

Idea: Combine local depth/flow estimation with spline-based scene understanding

242 of 481

Local Depth/Flow Estimation (Bottom-up Likelihood)

Defocus Equalization Filters

Minimize sum-of-squared difference between two equalized images!

Ensures brightness constancy and equalizes the variances between the two images

Local Likelihood Q

243 of 481

Local Depth/Flow Estimation (Bottom-up Likelihood)

Local Prior Lqp

Local smoothness prior due to overlapping patches

Local Likelihood Q

Fit quadratic function to make it analytical for global optimization

244 of 481

Local Depth/Flow Estimation (Bottom-up Likelihood)

Qp is evaluated at every pixel q for some (d, v) estimate of the patch centered at pixel p

M × M array of values that encodes pairwise smoothness between pixels

Minimize the LDFD loss to optimize the (d, v) patch estimates using a Markov Random Field (MRF)

245 of 481

Idea: Combine local depth/flow estimation with spline-based scene understanding

246 of 481

Global DFD (Top-Down Likelihood)

Inputs:

Control Points, �Feature Map, �Patch Likelihoods

Update weight vectors

Update occlusion map

Update depth planes, 2D affine transformations, segment labels of control points

Update control points’ feature vectors

Output:

Pixel-based depth and flow

Patch-based depth and flow

Scene segmentation

Spline parameters

Control Point Cn: depth plane D; 2D affine transformation U, V

Pixel q (including control points):

- weight vector

- feature vector

- segment label

247 of 481

Global DFD (Top-Down Likelihood)

Control Points

Scene Segmentation

248 of 481

Results

Arbitrary Lqp smoothness depth map

Ground truth depth

Original input

Patch-based Lqp smoothness depth map

249 of 481

Results

Arbitrary Lqp smoothness depth map

Ground truth depth

Original input

Patch-based Lqp smoothness depth map

250 of 481

Acknowledgements

Dr. Ioannis Gkioulekas for the tireless support of our project through many OH sessions and Slack messages

Dr. Kiriakos Kutulakos for sharing his research group’s dataset and contact information of first authors

References:

Huixuan Tang, Scott Cohen, Brian Price, Stephen Schiller and Kiriakos N. Kutulakos, Depth from defocus in the wild. Proc. IEEE Computer Vision and Pattern Recognition Conference, 2017. https://www.dgp.toronto.edu/WildDFD/

251 of 481

Questions?

252 of 481

A Reconstruction Framework for Time Series Thermal Traces using Mixed Stereo Photography

Arjun Lakshmipathy

Computational Photography (15-862)

Fall 2020

253 of 481

Context

Larger Thrust: Automated Design of Custom, Low-Cost Dextrous Hands from Demonstration

254 of 481

Principal Reference

255 of 481

The Approach

  • Difficult to simulate, not so difficult to capture (with the right tools)
  • Thermography can detect residual heat traces imparted on objects through human grasping
  • With depth + different viewpoints, we can estimate contact texture on arbitrary geometries
  • Use multiple cameras and fuse
  • Approach: Manipulate physical object, rotate object and capture both thermal and depth images from multiple viewpoints, combine into contact map

256 of 481

The Problems

3 problems to solve:

  • Image alignment
  • Pose and location estimation of target object within the capture scene
  • Texture mapping

257 of 481

Materials and Setup

  • 1 Intel RealSense D415
  • 1 FLIR C5
  • 2 tripods + 1 clamp mount
  • ?? hand warmers
  • 1 dressed-up turntable
  • 1 calibration target
  • 1 house lab space

Turntable and second tripod borrowed from the CMU Motion Capture Lab

258 of 481

Process Step 1: Calibration

  • Thermal intrinsics
  • Stereo calibration
  • ~10 images apiece

259 of 481

Process Step 2: Capture

260 of 481

Process Step 3: Object Pose Estimation

Entirely in Depth CCS

  • Load 3D model, convert to point cloud
  • Segment scene to extract object
  • Solve R and T using the Iterative Closest Point method (ICP) [2]

261 of 481

Process Step 4*: Alignment

262 of 481

Process Step 5*: Texture Mapping via Color Map Optimization [3]

263 of 481

Concluding Remarks

Process: Calibration -> Capture -> Pose Estimation -> Alignment -> Texture Mapping

  • Works reasonably…in principle. Still some items to resolve for full operation

  • Next steps: work through bugs and nuances with each step, collect initial and final grasps for multiple objects from multiple subjects. Use as a basis for the next research thrust.

*Synthesized from dataset captures only / not my own yet

264 of 481

Thanks for listening! Questions?

265 of 481

References

[1] S. Brahmbhatt, C. Ham, C. C. Kemp, and J. Hays, “Contactdb: Analyzing and predicting grasp contact via thermal imaging,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8709–8719, 2019.

[2] PJ Besl and Neil D McKay. A method for registration of 3- d shapes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 14(2):239–256, 1992

[3] Qian-Yi Zhou and Vladlen Koltun. Color map optimization for 3d reconstruction with consumer depth cameras. ACM Transactions on Graphics (TOG), 33(4):155, 2014.

266 of 481

HDR Image Reconstruction using Hallucinated exposure stack

Shamit Lal (shamitl), Sajal Maheshwari (sajalm)

267 of 481

Problem

268 of 481

Traditional approach

  • Collect a stack of images at different exposures
  • If the images are non-linear, linearize them.
  • Merge the linear exposure stack.
  • If we want to display the image, we can also do a tonemapping operation

269 of 481

Deep learning based methods

270 of 481

What if ?

  • We can merge these ideas and :
    • Create a linearized and de-quantized exposure stack using a single input image.
    • Use this linearized stack and weights for each of the pixels and generate an output HDR image.
    • Instead of traditional merging method, let a deep network decide the weights for each of the pixels in the stack.

271 of 481

Our approach

  • Step 1 : Exposure stack generation
    • Images should be sharp.
    • The changes should be in brightness, not in color.
    • No multiple iterations
    • Possible approaches
      • Naive encoder-decoder
      • Mapping based model
      • Modified encoder-decoder

272 of 481

Naive encoder-decoder

  • A simple single-encoder, multi-decoder model, with different decoder heads supervised by different images of the exposure stack.
  • Results:
    • Blurry images
    • Images with artefacts like color inconsistency
    • Not great results, as the output images looked very similar to the original input.

273 of 481

Mapping based approach

274 of 481

Mapping based approach

  • The approach described in the previous slide uses multiple iterations. Therefore, although it is a good starting point, it cannot be used in our experiments.
  • We now explain our approach, which can directly give us a mapping of higher degree.
  • We output a 256-dimensional vector (for each channel). We make this vector monotonic and bound it to [0, 1]; then, for the intensity value at each pixel, we pick the output intensity to be the vector entry at the index of the input intensity.

Example mapping vector: [0, 0.2, 0.4, 0.7, …, 1]

275 of 481

Mapping based approach - Problem!

  • This operation is non-differentiable. How do we solve this?
  • During training, use an extra input of dimensions 256 × H × W, where each entry is a one-hot encoding of the input intensity. Multiply it element-wise with the output vector at each pixel and sum along the channel dimension (see the sketch below).
  • Additionally, all pixel-wise outputs should be identical, which we incorporate by adding a loss that pulls all pixel-wise embeddings toward the mean embedding.
  • Better and sharper results!
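A minimal PyTorch sketch of the differentiable lookup described above; the cumulative-softplus construction is an assumption about how monotonicity and the [0, 1] bound might be enforced, not necessarily the presenters' implementation.

```python
import torch
import torch.nn.functional as F

def differentiable_tone_lookup(curve_logits, img_onehot):
    """curve_logits: (B, 256) raw outputs for one channel.
    img_onehot: (B, 256, H, W) one-hot encoding of the 8-bit input intensities."""
    steps = F.softplus(curve_logits)                 # positive increments (assumed)
    curve = torch.cumsum(steps, dim=1)               # monotonically increasing
    curve = curve / curve[:, -1:].clamp(min=1e-8)    # bounded to [0, 1]
    # Differentiable "indexing": weight the one-hot planes by the curve and sum.
    return (curve[:, :, None, None] * img_onehot).sum(dim=1)

# Toy usage: a 4x4 single-channel image of 8-bit intensities.
img = torch.randint(0, 256, (1, 4, 4))
onehot = F.one_hot(img, num_classes=256).permute(0, 3, 1, 2).float()
out = differentiable_tone_lookup(torch.randn(1, 256), onehot)   # shape (1, 4, 4)
```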

276 of 481

Mapping based approach - Results

  • The results are much better now; however, we can see some color cast. This might be because we are creating the function for all channels. We are currently trying to run this method only on the luminance channel.

277 of 481

Modified encoder-decoder

  • Instead of using a naive encoder-decoder that tries to learn the output directly, learn the difference between the output and the input (a residual formulation that has been used for many problems!)

278 of 481

Modified encoder-decoder

  • The modified encoder-decoder ensures that the results are close to the input but also have the brightness of the output where supervision is provided.
  • The 1×1 convolution ensures that the output we get is sharp rather than blurry.

279 of 481

Merging HDR

  • Once we get the stacks, we can either merge the images to get the HDR image using traditional approaches or a learning based method, which can again consist of an encoder-decoder with shared weights for each of the outputs.
  • We have not yet run any tone-mapping model, so the generated HDR images look a bit dark. The images are decent when viewed in HDR viewers like openhdr (https://viewer.openhdr.org/). However, this is not yet working as expected, and we believe some training/tuning is still required.

280 of 481

Unsupervised methods ?

  • We have currently been providing supervision at every stage - what happens if we try to generate the exposure stack in an unsupervised fashion?
  • Currently running experiments for unsupervised methods

281 of 481

References

1. Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline, Yu-Lun Liu, Wei-Sheng Lai, Yu-Sheng Chen, Yi-Lung Kao, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang (https://arxiv.org/abs/2004.01179)

2. Deep Reverse Tone Mapping, Yuki Endo, Yoshihiro Kanamori, and Jun Mitani (http://www.cgg.cs.tsukuba.ac.jp/~endo/projects/DrTMO/)

3. HDR image reconstruction from a single exposure using deep CNNs, Gabriel Eilertsen, Joel Kronander, Gyorgy Denes, Rafał K. Mantiuk, Jonas Unger (https://arxiv.org/abs/1710.07480)

4. Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement, Chunle Guo, Chongyi Li, Jichang Guo, Chen Change Loy, Junhui Hou, Sam Kwong, Runmin Cong (https://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_Zero-Reference_Deep_Curve_Estimation_for_Low-Light_Image_Enhancement_CVPR_2020_paper.pdf)

5. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros (https://junyanz.github.io/CycleGAN/)

6. ExpandNet: A Deep Convolutional Neural Network for High Dynamic Range Expansion from Low Dynamic Range Content, Demetris Marnerides, Thomas Bashford-Rogers, Jonathan Hatchett, Kurt Debattista (https://arxiv.org/abs/1803.02266)

282 of 481

Shape Estimation

Qiqin Le, Qiao Zhang

283 of 481

Overview

  • Limitations:
    • Depth from defocus:
      • Affected by occlusions
      • Ambiguity
    • Depth from correspondence:
      • Correspondence errors
      • Matching ambiguity at repeating patterns and noisy regions
      • Affected by occlusions
    • Combining depth from defocus and correspondence
      • Require complicated algorithms / Camera modifications / Multiple image exposures

  • Improvement:
    • Use angular coherence to improve robustness.

  • Pipeline:
    • Combine defocus and correspondence metrics for dense depth estimation, then use the additional cue of shading to refine details in the shape.

284 of 481

Pipeline

Tao, Michael W., et al. "Shape estimation from shading, defocus, and correspondence using light-field angular coherence."

285 of 481

Image Source

http://lightfield.stanford.edu/lfs.html

Our images taken by the Lytro Illum camera

286 of 481

Defocus Depth

287 of 481

Defocus Depth

Depth Map & Confidence Map & Shape Estimation

288 of 481

Correspondence Depth

289 of 481

Correspondence Depth

Depth Map & Confidence Map

290 of 481

Combined Depth from Defocus and Correspondence

291 of 481

Combined Depth from Defocus and Correspondence

Depth Map & Confidence Map

292 of 481

Shading Depth

Albedo

293 of 481

Shading Depth

Spherical Harmonic

294 of 481

Defocus Depth

Depth Map & Confidence Map

295 of 481

Correspondence Depth

Depth Map & Confidence Map

296 of 481

Combined Depth from Defocus and Correspondence

Depth Map & Confidence Map

297 of 481

Defocus Depth

Depth Map & Confidence Map

298 of 481

Correspondence Depth

Depth Map & Confidence Map

299 of 481

Combined Depth from Defocus and Correspondence

Depth Map & Confidence Map

300 of 481

References

[1] Tao, Michael W., et al. "Shape estimation from shading, defocus, and correspondence using light-field angular coherence." IEEE transactions on pattern analysis and machine intelligence 39.3 (2016): 546-560.

[2] Tao, Michael W., et al. "Depth from combining defocus and correspondence using light-field cameras." Proceedings of the IEEE International Conference on Computer Vision. 2013.

301 of 481

Q & A

302 of 481

Normal Estimation for Transparent Objects

Amy Lee


303 of 481

Overview + Motivation

  • Naive photometric stereo methods do not work well for specular or transparent objects.
  • Improvement idea:
    • Use a reference chrome sphere to estimate the light direction for each image.
    • Estimate the normals at highlighted points on the object using the reference light direction.


304 of 481

Pipeline


305 of 481

Image Set


  • Video of static scene, moving spot light (dense set of images under variable illumination)
  • Object set next to chrome sphere
  • Assume orthographic camera

306 of 481

Image Segmentation


Image

Mask

Final

307 of 481

Initial Silhouette-based Surface Normals

Key Ideas:

  • Surface normals are parallel to the image plane at the silhouette of the object.
  • Interpolating the silhouette normals gives a decent approximation of the surface.
  • Caveat: Initial normals heavily influence quality of final result.


308 of 481

Initial Silhouette-based Surface Normals


309 of 481

Orientation Consistency w/ Reference Sphere


310 of 481

Normal Clustering

  • Can infer that highlights occur when intensity exceeds some threshold (e.g. >0.9). Highlight occurrences form clusters in time.


311 of 481

Normal Clustering

  • May have multiple clusters due to interreflections (not true direct highlights).
  • Find top 2 “salient” clusters (i.e. the 2 clusters with the most observations) corresponding to specular highlights for every pixel.
  • Record the mean normal corresponding to each cluster (computed using reference sphere).


312 of 481

Finding Optimal Normal Clusters

Graph-cut cost minimization using the max-flow algorithm.

  • At most 2 nodes per pixel (1 node per salient normal cluster).
  • Want to label nodes as being either true or false highlights. If true, use the reference normal.
  • Edges between adjacent pixels’ nodes.
  • Cost (energy): difference from the initial normal (E1) + difference between neighboring normals (E2)
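Schematically (a sketch of the energy described above, not the paper's exact weighting):

```latex
E(\{n_p\}) \;=\; \sum_{p} E_1\!\big(n_p,\; n_p^{\text{init}}\big)
\;+\; \lambda \sum_{(p,q)\,\in\,\mathcal N} E_2\!\big(n_p,\; n_q\big)
```

where n_p^init is the silhouette-interpolated initial normal, N is the set of adjacent pixel pairs, and the binary labeling (true vs. false highlight) is solved with max-flow/min-cut.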


313 of 481

Final Calibrated Normals


The quality of the results is heavily influenced by the initial normals

314 of 481

References

  1. Yeung, Sai-Kit & Wu, Tai-Pang & Tang, Chi-Keung & Chan, Tony & Osher, Stanley. (2015). Normal Estimation of a Transparent Object Using a Video. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 37. 890-897. 10.1109/TPAMI.2014.2346195.
  2. H.-S. Ng, T.-P. Wu, and C.-K. Tang, “Surface-from-gradients without discrete integrability enforcement: A Gaussian kernel approach,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 11, pp. 2085–2099, Nov. 2010.


315 of 481

Thank you for your time!


316 of 481

Event-based Object Tracking-by-Detection

Jessica Lee

317 of 481

Why Event-Based Vision?

Event-based vision measures logarithmic changes in brightness for each pixel in the sensor independently

In comparison to a typical camera, which captures via rolling shutter synchronously [1].

  • Little motion blur
  • High dynamic range

Polarity - moving in/out of a scene

Data point: (ts, x, y, p)

318 of 481

Prior Work on Event-Based Object Detection

Image Reconstruction [2]

Event Tensors:

  • Event Volumes [5]
  • Histograms
  • Time Surfaces
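A minimal sketch of discretizing an event stream (t, x, y, p) into a voxel-style event volume with B temporal bins, in the spirit of [5]; this version simply accumulates signed polarity into the nearest bin rather than using a bilinear temporal kernel, and the sensor size below is just an example.

```python
import numpy as np

def event_volume(events, H, W, B):
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3]
    vol = np.zeros((B, H, W), dtype=np.float32)
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)   # normalize timestamps to [0, 1]
    b = np.clip((t_norm * B).astype(int), 0, B - 1)          # temporal bin index per event
    np.add.at(vol, (b, y, x), np.where(p > 0, 1.0, -1.0))    # signed polarity accumulation
    return vol

# events: N x 4 array of (timestamp, x, y, polarity)
events = np.array([[0.00, 10, 20, 1], [0.01, 11, 20, 0], [0.02, 10, 21, 1]])
vol = event_volume(events, H=240, W=304, B=5)                # shape (5, 240, 304)
```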

RED [3] (SoTA)

backbone

heads

319 of 481

Our Approach - Tracking by Detection

CenterTrack [4]

Event Volumes [5]

Volume t

Volume t-1

Tracks t-1

Tracks t

Sizes t

Offsets t

Inputs

Outputs

320 of 481

Current Results & Future Work

Results:

  • Optimized Event Volume Data Loading - From 4 to 0.001 seconds
  • Currently, training is not converging
  • Labels are incredibly sparse for the event data - taken every 5 seconds
    • Doesn’t allow us to learn tracking, since we need adjacent labels

Future Work:

  • Balance positive and negative samples to 50/50
  • Trim videos to only use the event data that has labels
  • Sample from random videos to make a diverse batch

321 of 481

Citations

[1] E. Mueggler, B. Huber, and D. Scaramuzza, Event-based, 6-DOF Pose Tracking for High-Speed Maneuvers. IROS 2014

[2] H. Rebecq, R. Ranftl, V. Koltun, and D. Scaramuzza. High speed and high dynamic range video with an event camera. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019.

[3] E. Perot, P. Tournemire, D. Nitti, J.Masci, and A. Sironi. Learning to Detect Objects with a 1 Megapixel Event Camera. NeurIPS 2020.

[4] X. Zhou, V. Koltun, and P. Krähenbühl. Tracking Objects as Points. ECCV 2020.

[5] A. Zhu, L. Yuan, K. Chaney, and K. Daniilidis, Unsupervised event-based learning of optical flow, depth, and egomotion. CVPR 2019.

322 of 481

Distortion-Free Wide-Angle Portraits on Camera Phones

Ji Liu, Zhuoqian Yang

12/17/2020

323 of 481

Introduction and

Method Overview

324 of 481

Introduction

  1. Modern mobile phones are equipped with wide-angle cameras
  2. Wide-angle cameras introduce artifacts caused by perspective projection
  3. An algorithm is needed to perform correction on-device, immediately after the photo is taken

325 of 481

Introduction

Before correction After correction

326 of 481

Method Overview

327 of 481

Results on Images of Our Own

328 of 481

Naive Stereographic Result

input

naive stereographic

flow

329 of 481

Subject Mask Segmentation Result

input

face mask

330 of 481

Correction Result

Before correction

After correction

331 of 481

Method - Subject Mask Segmentation

  1. Use Detectron 2 to extract person mask
  2. Use Dlib to extract face bounding box
  3. Intersect the two to get face mask

332 of 481

Method - Stereographic Projection

Stereographic projection maps the 3D world onto a 2D plane with minimal conformal distortion, at the expense of bending long straight lines.

We enforce the stereographic projection locally on face regions to correct perspective distortion.

Given the camera focal length f, we compute the stereographic projection from the input using a radial mapping (see below):
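
The radial mapping itself is not reproduced on this slide. For reference, one standard way to relate a perspective-projected radius r_p to its stereographic counterpart r_u for a ray at incidence angle θ is the following; treat it as our assumption rather than the exact expression used in the project:

```latex
% Perspective projection:   r_p = f \tan\theta
% Stereographic projection: r_u = 2 f \tan(\theta / 2)
% Eliminating \theta gives the radial mapping applied to each pixel:
r_u \;=\; 2 f \, \tan\!\left( \tfrac{1}{2} \arctan \frac{r_p}{f} \right)
```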

333 of 481

Method - Mesh Placement

334 of 481

Comparing the image before and after naive stereographic projection, we observe strong artifacts around the face regions.

335 of 481

Method - Local Face Undistortion

We minimize the following energy function to determine an optimal mesh.

where the total energy E_t is the weighted sum of several terms, including a face objective term, a line-bending term, and a regularization term.

336 of 481

Method - Regularization Term

We regularize the mesh by encouraging smoothness between 4-way adjacent vertices using a regularization term.

337 of 481

Method - Line-Bending Term

On the boundary between the face and the background, straight lines may be distorted because the two regions follow different projections. We preserve straight lines by encouraging the output mesh to scale rather than twist, by adding a line-bending term. The line-bending term penalizes shearing of the grid and therefore preserves the edge structure of the background.

338 of 481

Method - Mesh Boundary Extension

Similar to gradient-domain image processing, a simple boundary condition forcing v_i = p_i on the mesh boundary would do the job. However, it creates strong distortions when faces are close to the image boundary.

Instead, we extend the mesh beyond the image border. This distributes the distortion to the padded vertices and reduces artifacts near the boundary of the output image.

339 of 481

Method - Similarity Constraint

Constrain the transformation around each facial area to be a similarity transformation that preserves scale

340 of 481

Extension of the Method to Other Object Categories

341 of 481

342 of 481

343 of 481

344 of 481

Thank you!

345 of 481

Mirror, Mirror On the Wall: Detecting Mirrors and Correcting Point Clouds

Sachit Mahajan

sachitma

345

346 of 481

Motivation

  • Mirrors are common in indoor scenes
  • To visual sensors, mirrors appear as doors or entranceways
  • For scene reconstruction, point clouds have to be corrected manually by specifying the mirror boundaries
  • Robots performing exploration cannot differentiate between doors and mirrors, and end up planning paths through mirrors or building incorrect maps

Matterport3D: Learning from RGB-D Data in Indoor Environments

346

347 of 481

Motivation

Captured with a phone (camera + LiDAR) using the ‘3D Scanner App’

347

348 of 481

Setup

Logitech c920

Velodyne

VLP-16

AprilTAG

IMU

348

349 of 481

Methodology

349

350 of 481

Calibrate Extrinsics and Locate Mirror/Entrances

  • Preprocess the point cloud: median filtering + bilateral filtering
  • Use depth discontinuities to detect possible mirrors

350

351 of 481

Classify mirrors and Remove Points

Two different methods

  1. Use self-reflection (the mirrored fiducial marker) to identify the mirror plane
    • Detect the 6-DoF pose of the AprilTag; this gives the location of one point on the mirror plane and its orientation
    • Novel method: divide the image into a grid and use region growing to identify mirror planes. Heuristics used to classify a grid cell as mirror:
      1. Part of the AprilTag lies in the cell
      2. Depth is greater than the mirror plane and there are strong depth discontinuities
      3. Cell normals differ from the mirror plane normal / high normal variance (sum of dot products)
      4. Image intensities have high variance (mirror boundaries)
    • Requires very accurate calibration
  2. Segment mirrors in images by modeling semantic and low-level color/texture discontinuities:
    • MirrorNet, from Yang, Mei, Xu, Wei, Yin, and Lau, “Where Is My Mirror?”
    • Works only when there is a single mirror in the scene, and fails very often when there is no mirror

351

352 of 481

Classify mirrors and Remove Points

Mask Obtained From Region Growing

Mask Obtained From MirrorNet

352

353 of 481

Results

353

354 of 481

Future Work

  • Correct the errors in region growing
  • Region growing should use a coarse-to-fine approach so that mirror boundaries can be estimated more accurately
  • Instead of removing points, they can be corrected: since we know the sensor pose and the mirror plane, we can find the 3D points they actually correspond to and reproject them
  • Once a mirror is found and estimated, it should be tracked through the scene even after the AprilTag is no longer visible

354

355 of 481

Image Deblurring Using Inertial Measurement Sensors

Kevin O’Brien

kobrien

356 of 481

Problem Motivation/Description

Blurry images stink! She gets it --------------------->

Deconvolution is difficult

-Optimization with no closed-form solution

-Different optimization strategies perform well for some types of scenes, poorly for others

-Generally want to enforce sparse gradients in the image to preserve sharp edges

357 of 481

Blind Deconvolution

Simultaneously solve for the sharp image I and the blur kernel K

-This is an even harder optimization problem, solving for two things at once

-Advances in blind deconvolution come from introducing additional constraints

-sharp edge prediction, color statistics

MAIN IDEA

When we’re dealing with camera shake, use the camera’s motion to help inform the blind deconvolution problem

358 of 481

Blur as a Homography

For small camera movements during exposure, the blur can be parametrized as a sequence of small planar warps (homographies); a minimal sketch follows.
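
A sketch of that parametrization, assuming a planar scene at depth d with unit normal n, camera intrinsics K, and an in-exposure pose (R, t); this uses the standard plane-induced homography H = K (R + t nᵀ / d) K⁻¹, and all names here are ours:

```python
import numpy as np

def planar_homography(K, R, t, n, d):
    """Homography induced by camera motion (R, t) for a scene plane
    with unit normal n at distance d, under intrinsics K.
    Standard model: H = K (R + t n^T / d) K^{-1}."""
    H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)
    return H / H[2, 2]  # normalize so the bottom-right entry is 1

# Toy example: 1 mm sideways translation, tiny rotation about y, plane 2 m away.
K = np.array([[1500.0, 0, 640], [0, 1500.0, 360], [0, 0, 1]])
theta = np.deg2rad(0.1)
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
H = planar_homography(K, R, t=np.array([0.001, 0, 0]),
                      n=np.array([0, 0, 1.0]), d=2.0)
print(H)
```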

359 of 481

Sensor Data Processing

Accelerometer readings give linear acceleration (m/s^2)

Gyroscope readings give rotational velocity (rad/s)

Procedure:

  1. Integrate rotational velocity to get angular position, convert to rotation matrix
  2. Rotate accelerometer readings into starting reference frame
  3. Integrate accelerometer readings to get relative translations
  4. Use recovered rotations/translations for homography calculation
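
A minimal sketch of this integration, assuming gravity-compensated accelerometer samples, a fixed sample interval dt, and simple Euler integration (the function and variable names are ours):

```python
import numpy as np

def integrate_imu(gyro, accel, dt):
    """Integrate gyroscope (rad/s) and accelerometer (m/s^2) samples into
    per-sample rotations and translations in the starting camera frame.
    Assumes gravity has already been subtracted from `accel`."""
    R = np.eye(3)
    v = np.zeros(3)          # velocity in the starting frame
    p = np.zeros(3)          # position in the starting frame
    rotations, translations = [], []
    for i in range(len(gyro)):
        # 1. Integrate rotational velocity -> incremental rotation (Rodrigues).
        w = gyro[i] * dt
        angle = np.linalg.norm(w)
        if angle > 1e-12:
            k = w / angle
            Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
            dR = np.eye(3) + np.sin(angle) * Kx + (1 - np.cos(angle)) * Kx @ Kx
        else:
            dR = np.eye(3)
        R = R @ dR
        # 2. Rotate the accelerometer reading into the starting reference frame.
        a_world = R @ accel[i]
        # 3. Integrate twice to get the relative translation.
        v = v + a_world * dt
        p = p + v * dt
        # 4. Store the pose used for this time slice's homography.
        rotations.append(R.copy())
        translations.append(p.copy())
    return rotations, translations
```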

360 of 481

Implementation Status

ADMM deblurring optimization with known blur kernel

361 of 481

Implementation Status

Deblurring using ground truth camera positions

362 of 481

Implementation Status

Use raw sensor values next

363 of 481

Thank you!

364 of 481

Light field 3D reconstruction

Wenxuan Ou (Owen)

365 of 481

Abstract

  • Objective: Obtain high resolution depth map from structured light field
  • Primary reference:
    • Kim, Zimmer. “Scene Reconstruction from High Spatio-Angular Resolution Light Fields.” ACM transactions on graphics 32.4 (2013): 1–12. Web.
  • Other materials:
    • Lin, Yeh. “Depth Estimation for Lytro Images by Adaptive Window Matching on EPI.” Journal of imaging 3.2 (2017): 17–. Web.
  • Motivation: Extend the algorithm to a structured light field (Lytro image) and achieve 3D reconstruction in a single shot.

366 of 481

Algorithm

  • Depth estimation from the epipolar plane image (EPI)
    • Depth relates to disparity as Z = f · b / d (focal length f, view baseline b, disparity d)
    • Calculate a depth score for each candidate disparity value and select the best estimate (see the sketch below)
  • Refine the depth estimate using confidence masking and radiance/color
    • Compute depth only at confident locations (thresholding the score)

Reference: Kim et al
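
A minimal sketch of the disparity scoring on a single EPI column, using negative variance across views as the consistency score; the paper uses a more elaborate radiance-based score and confidence measure, so treat the details here as our simplification:

```python
import numpy as np

def best_disparity(epi, u, disparities):
    """Score candidate disparities for column `u` of the central view of an
    EPI (shape: n_views x width, optionally x 3 for color) by sampling along
    the corresponding line across views and measuring color consistency.
    Depth then follows from Z = f * baseline / disparity."""
    n_views, width = epi.shape[:2]
    center = n_views // 2
    scores = []
    for d in disparities:
        # Column positions of this scene point in every view under disparity d.
        cols = u + d * (np.arange(n_views) - center)
        cols = np.clip(np.round(cols).astype(int), 0, width - 1)
        samples = epi[np.arange(n_views), cols]
        scores.append(-np.var(samples, axis=0).mean())  # higher = more consistent
    scores = np.array(scores)
    return disparities[np.argmax(scores)], scores.max()
```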

367 of 481

Algorithm

  • Obtain EPI from Lytro image
    • Does not need rectification

Reference: Lin et al

368 of 481

Results

Original image

EPI

Disparity

Confidence

369 of 481

Results

Original image

My depth map

AFI depth map

370 of 481

Discussion and conclusion

  • The Lytro image does not have large enough disparity to generate a reliable depth estimate. A larger camera translation is needed to capture fine details.
  • Depth from the EPI is strongly influenced by object texture. Homogeneous regions lead to ambiguous depth scores.
  • Future extension:
    • The homogeneous-region problem could potentially be addressed with an additional fine-to-coarse refinement step.

371 of 481

Thank you !

372 of 481

3D Scanning with Structured Light

George Ralph

Hello to everyone passing through!

Good luck on your presentations ♥

373 of 481

Background

  • Structured light reconstruction depends on accurate pixel correspondences

  • Correspondence errors are caused by:
    • Global illumination
    • Defocus

374 of 481

Reducing Global Illumination Effects

375 of 481

Pattern Binarization

Two-Image Binary Codes

One-Image XOR Gray Codes
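
A minimal sketch of both binarization rules, assuming float images in [0, 1]; the one-image rule shown here (thresholding against per-pixel fully-lit / fully-dark references) is our assumption about the exact variant used:

```python
import numpy as np

def binarize_two_image(img_pattern, img_inverse):
    """Two-image binarization: a pixel is 1 where the projected pattern is
    brighter than its inverse.  Comparing the pair per pixel avoids a global
    threshold and is more robust to global illumination."""
    return (img_pattern > img_inverse).astype(np.uint8)

def binarize_one_image(img_pattern, img_white, img_black):
    """One-image variant (an assumption about the exact rule used here):
    threshold each pattern image against the per-pixel midpoint of fully-lit
    and fully-dark reference captures."""
    return (img_pattern > 0.5 * (img_white + img_black)).astype(np.uint8)
```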

376 of 481

Other Filtering Techniques

Median Filtering of Correspondences

Point Cloud Filtering

377 of 481

Examples

378 of 481

Examples

379 of 481

Examples

380 of 481

Thank You!

Feel free to ask questions.

381 of 481

Adaptive SPAD Imaging with Depth Priors

Po Ryan

382 of 481

Background

  • High temporal resolution
  • Only records first arriving photon
  • Performs fine under no ambient light
  • Pile-up effect under ambient light
    • The Coates estimate is used for pile-up compensation (sketched below)
    • Earlier bins carry more information
    • Earlier bins have better estimates

SPAD Sensor

[Gupta et al. “Photon-Flooded Single-Photon 3D Cameras” CVPR 2019]
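
A minimal sketch of the Coates-style pile-up correction as we understand it, assuming a first-photon histogram over time bins and a known number of laser cycles (the names are ours):

```python
import numpy as np

def coates_estimate(hist, n_cycles):
    """Pile-up compensation (Coates-style estimate).
    hist[i]  = number of cycles whose first detected photon fell in bin i.
    n_cycles = total number of laser cycles.
    Returns the estimated per-bin detection probability, conditioned on no
    earlier detection.  Note how the denominator shrinks for later bins,
    which is why earlier bins give better-conditioned estimates."""
    hist = np.asarray(hist, dtype=float)
    prior = np.concatenate(([0.0], np.cumsum(hist)[:-1]))   # counts in earlier bins
    denom = np.maximum(n_cycles - prior, 1.0)
    p = hist / denom
    # Under a Poisson arrival model, per-bin flux can then be taken as -ln(1 - p).
    return p
```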

383 of 481

Method: depth estimate → establish a depth prior → adjust the gate and active time of the SPAD

384 of 481

Stereo Pairs

Depth Estimate

Adaptive SPAD

Conventional SPAD

  • Poor estimates for occluded pixels
  • Certainty of estimate unknown

Stereo Depth Prior

385 of 481

[Xia et al. “Generating and Exploiting Probabilistic Monocular Depth Estimates” CVPR 2020]

CNN

Single RGB Image

Depth Estimate

Uncertainty

Adaptive SPAD

Conventional SPAD

NN-Based Probabilistic Single RGB Depth

  • Estimates are probabilistic
  • Shifting of gate and active time can be determined through uncertainty

386 of 481

Application to Async. Acquisition

[Gupta et al. “Asynchronous Single-Photon 3D Imaging”]

  • Asynchronous Acquisition
    • Uniform shifting of gate
    • Every bin is near beginning of transient for at least one shift
  • The depth prior can be utilized to determine the shifting pattern, instead of relying on uniform shifting schemes
  • Depth-prior-based coding schemes yield at least a 20% increase in accuracy

387 of 481

Integration with NN-Based Pipelines

[Siddiqui et al. “An Extensible Multi-Sensor Fusion Framework for 3D Imaging” CVPR 2020]

  • Stand-alone adaptive SPAD may not give the best final measurements, but improves raw SPAD measurements
  • Better raw measurements means that other multi-sensor fusion methods would also perform better

388 of 481

SPAD Only Adaptation?

  • Per-bin probability of error can be estimated
    • Cramér–Rao bound for the variance of the Coates estimate
    • Chernoff bound on the probability of error for each bin
  • The per-bin error probability is quite uniformly distributed, making adaptive gating very ill-posed

[Pediredla et al. “Signal Processing Based Pile-up Compensation for Gated Single-Photon Avalanche Diodes”]

389 of 481

Thanks!

390 of 481

Computational Photography Project

CMU-15663

Tejas Zodage , Harsh Sharma

391 of 481

Handheld Photometric Stereo

Motivation

To develop a simple device for dense 3D reconstruction of a static scene.

Problem definition

A set of images taken by a camera with a point light source rigidly attached to it.

Add photometric constraints to multi-view stereo to get a dense depth map.

391

392 of 481

Problem definition

Camera

Point light source rigidly attached to the camera

Input: m images

.

.

.

Output: depth image

algorithm

393 of 481

Procedure

  1. Use images from multiple views to get a per-pixel depth map estimate, using plane sweep stereo
  2. Get estimates for normals, albedos and ambient lighting using RANSAC near-light photometric stereo.
  3. Refine the depth, normal and albedo estimates by formulating an optimization problem with the photometric and depth smoothness costs.

393

394 of 481

Initial estimates

  1. Plane sweep stereo for a per-pixel depth estimate (see the sketch below):
    1. Choose a reference frame and warp the other images onto candidate depth planes.
    2. Each pixel is assigned to the plane where its NCC score is maximum.
  2. RANSAC-based near-light photometric stereo:
    • Normals and albedos

Depth planes
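
A minimal sketch of the plane sweep, reduced to a rectified two-view, fronto-parallel-plane case so the warp is a horizontal shift; the project warps with full homographies across many views, so treat this purely as an illustration of the NCC scoring:

```python
import numpy as np
from scipy.signal import fftconvolve

def plane_sweep_depth(ref, other, f, baseline, depths, win=7):
    """Simplified plane sweep for a rectified grayscale pair: for each candidate
    depth, shift `other` by the implied disparity, compute windowed zero-mean
    NCC against `ref`, and keep the best-scoring depth at every pixel."""
    k = np.ones((win, win)) / win**2
    mean = lambda x: fftconvolve(x, k, mode="same")   # local window average
    best_score = np.full(ref.shape, -np.inf)
    best_depth = np.zeros(ref.shape)
    for d in depths:
        disparity = int(round(f * baseline / d))
        shifted = np.roll(other, disparity, axis=1)   # crude stand-in for warping
        a, b = ref - mean(ref), shifted - mean(shifted)
        ncc = mean(a * b) / np.sqrt(mean(a**2) * mean(b**2) + 1e-8)
        better = ncc > best_score
        best_score[better], best_depth[better] = ncc[better], d
    return best_depth
```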

395 of 481

Optimization

    • Photoconsistency cost – accounts for varying lighting, since the light source is attached to the moving camera.
    • Surface normal cost – prefers depth estimates that are consistent with the surface normal estimates.
    • Smoothness cost – penalizes large discontinuities in the depth map

Model the problem as a graph-cut optimization and solve for all constraints jointly

395

396 of 481

Can’t we solve it like HW5?

Conventional photometric stereo: directional light source, orthographic camera

Near-light photometric stereo: point light source, perspective camera

No!

397 of 481

Data generation

Blender + Mitsuba

397

  1. Generate scenes using Blender (easy for visualization)
  2. Render scenes using Mitsuba (photorealistic rendering)

398 of 481

Results

Plane sweep stereo - depth map

398

399 of 481

Challenges

  1. Data generation:
    1. No readily available dataset
  2. Graph-cut implementation (NP-hard problem):
    • Algorithm suggested by the authors: alpha-beta swap.
    • Optimization takes hours on images larger than 100 × 100 px

400 of 481

Future work

  • Data generation -
    • Real world data.
  • Graph-cut alternatives -
    • Look for better solutions to solve energy minimization on graphs.
    • Deep learning based methods?

401 of 481

References

402 of 481

Image Morphing

Jiazheng Sun

403 of 481

Sample effect

404 of 481

Step 1 - set up

  • Load texture (image)
  • Coordinate system conversion: world to canvas (Cartesian)
    • Move (0, 0) from the top left to the center; flip the direction of y
      • x_canvas = x / world_size * canvas_size - (canvas_size / 2)
      • y_canvas = (canvas_size / 2) - y / world_size * canvas_size
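
A minimal sketch of that conversion as a helper function, directly following the two formulas above (the function name is ours):

```python
def world_to_canvas(x, y, world_size, canvas_size):
    """Map world coordinates (origin at top-left, y pointing down) to canvas
    coordinates (origin at the center, y pointing up), per the formulas above."""
    cx = x / world_size * canvas_size - canvas_size / 2
    cy = canvas_size / 2 - y / world_size * canvas_size
    return cx, cy

# Example: the world center maps to the canvas origin.
print(world_to_canvas(50, 50, world_size=100, canvas_size=800))  # (0.0, 0.0)
```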

405 of 481

Step 2 - build control points

  • Define basic anchor points
  • Define control points and map them to anchor points using Barycentric coordinates

406 of 481

Step 3 - perturb, interpolate & blend

  • Perturb the control points (random or along a preset trajectory)
  • Interpolate with curves; blending guarantees C2 continuity at the intersections

407 of 481

Curves & Blending

  • Neville interpolation:

  • Blending function:

Citation: Yuksel, Cem. 2020. A Class of C2 Interpolating Splines. ACM Transactions on Graphics, 39(5), 2020.
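
The interpolation and blending formulas are not reproduced on this slide. As a reference point, here is a minimal sketch of Neville's algorithm for evaluating the polynomial interpolating a few control points; the C2 blending function from the cited paper is omitted:

```python
import numpy as np

def neville(ts, ps, t):
    """Neville's algorithm: evaluate at parameter t the polynomial that
    interpolates the control points ps[i] at parameter values ts[i].
    ps can be 2D points; the recursion works per coordinate."""
    ps = [np.asarray(p, dtype=float) for p in ps]
    n = len(ts)
    for level in range(1, n):
        for i in range(n - level):
            w = (t - ts[i]) / (ts[i + level] - ts[i])
            ps[i] = (1 - w) * ps[i] + w * ps[i + 1]
    return ps[0]

# Interpolating three points placed at t = 0, 1, 2 and evaluating midway:
print(neville([0, 1, 2], [(0, 0), (1, 1), (2, 0)], 0.5))  # [0.5  0.75]
```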

408 of 481

Step 4 - texture mapping

409 of 481

Additional functionalities

  • Register key frames, and interpolate between frames to get customized animation.
  • Customize anchor points.
  • Directly select and drag control points.
  • Precompute a trajectory for each control point, so they move automatically.

410 of 481

Precompute trajectory demo

411 of 481

3D Scene Flow Using Lightfield Images

Kevin Wang

412 of 481

Lightfield Image Structure

  • 4-dimensional L(x, y, u, v):
    • x and y are the camera coordinates
    • u and v are the pixel coordinates
  • Given two lightfield images, we want to find the 3D scene flow (motion in X, Y, and Z)

413 of 481

Relating Ray Flow to Optical Flow

  • Optical flow equation: I_x v_x + I_y v_y + I_t = 0
  • Ray flow equation: L_x v_x + L_y v_y + L_z v_z + L_t = 0
    • L_z is computed from the pinhole camera position, the partial derivatives with respect to the X and Y positions, and a fixed depth constant

414 of 481

Lucas-Kanade

  • Because of the form of the ray flow equation, and its similarity to the traditional optical flow equation, we can use existing flow methods to estimate 3D motion.
  • Lucas-Kanade finds local motion (assumes that rays in the same window share the same motion); see the sketch below.
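
A minimal sketch of the Lucas-Kanade-style local solve for the ray flow equation, stacking one equation per ray in a window and solving the 3-unknown least-squares system (names are ours):

```python
import numpy as np

def lk_ray_flow(Lx, Ly, Lz, Lt):
    """Local (Lucas-Kanade-style) solve of the ray flow equation
    Lx*vx + Ly*vy + Lz*vz + Lt = 0 over one window: each ray contributes
    one row of the overdetermined linear system A v = -Lt.
    Inputs are arrays of light-field derivatives sampled inside the window."""
    A = np.stack([Lx.ravel(), Ly.ravel(), Lz.ravel()], axis=1)
    b = -Lt.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (vx, vy, vz): 3D scene motion for this window
```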

415 of 481

Horn-Schunck

  • Assumes that the flow field is “smooth” (a global constraint)

  • The minimum can be found using the Euler–Lagrange equations

416 of 481

Example Results

417 of 481

418 of 481

A Computational Approach for Obstruction- Free Photography

Wesley Wang

(wesleyw)

419 of 481

Overview:

  • Take photos through occluding elements like windows and fences
  • Remove those occlusive elements from the photos
  • Capture video (or image sequence) while moving the camera
  • Background scene must be far enough that it remains static

420 of 481

Overview:

  • Occlusive Foreground Objects

421 of 481

Overview:

  • Window Reflections

422 of 481

Implementation:

  • Treat the image as a composition of a background layer and an obstruction layer (see the formulation below)

  • I_O is the occlusion (foreground) layer
  • I_B is the background layer
  • A is the alpha mask indicating where the occlusion exists in the image
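
The composition equation itself is not shown on this slide; a common formulation consistent with the terms above (our assumption, not necessarily the project's exact model) is:

```latex
% I: captured image, I_O: obstruction layer, I_B: background layer,
% A: alpha mask, \odot: per-pixel multiplication.
I \;=\; I_O \;+\; A \odot I_B
```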

423 of 481

Implementation:

  • Solve for those image components using info from other frames
  • Also solve for the warping W(V) of layers between frames

  • Solve optimization problem

424 of 481

Results:

  • Motion estimation between frames after processing + edge detection

425 of 481

Questions?

426 of 481

Confocal Stereo

Hiroshi Wu

427 of 481

Confocal Stereo

428 of 481

In assignment 4...

  1. Simulated a focus–aperture stack based on an image from a plenoptic camera
  2. Constructed an AFI (aperture–focus image) for each pixel
  3. Used direct variance evaluation to determine the depth d at each pixel (see the sketch below)
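
A minimal sketch of step 3, assuming the AFI for one pixel is stored as an (apertures × focus settings) array and each focus column maps to a known depth (names are ours):

```python
import numpy as np

def depth_from_afi(afi, focus_depths):
    """Direct variance evaluation on an aperture-focus image (AFI).
    afi has shape (n_apertures, n_focus): at the correct focus setting the
    pixel looks the same through every aperture, so pick the focus column
    with the smallest variance across the aperture axis."""
    variance = afi.var(axis=0)            # one value per focus setting
    return focus_depths[np.argmin(variance)]
```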

429 of 481

The whole process (as detailed in paper)

  1. Aperture-focal stack image acquisition using a DSLR
  2. Relative exitance estimation (radiometric calibration)
  3. Image alignment (lens warp calibration)
  4. AFI construction
  5. Depth estimate by AFI model fitting

Although the core idea is the same, the whole process is more involved

430 of 481

  1. Aperture-focal stack acquisition
  • Depth from blur: bigger bokeh = better depth resolution
  • To maximize bokeh:
    • Used a lens with an f/2.8 maximum aperture
    • Zoomed to the maximum focal length of that lens: 70 mm
  • Took photos of the same scene at 80 focus settings (0.38 m – 3.4 m) and 16 apertures (f/2.8 – f/16)
  • Used gphoto2 to automate the process; one set of data took 1 hour to capture

Example: f=50, a=2.8

431 of 481

2. Relative exitance estimation

  • Since we compare pixel values across the AF stack, we need to ensure that the same value means the same thing in every image
  • E.g. vignetting varies with aperture
  • We photograph a uniform white paper at the same focus and aperture settings and measure the exitance at each pixel and each setting, so we can compensate for brightness differences

432 of 481

3. Image alignment

  • Changing focus distance actually changes the scene due to lens mechanics:

  • Modelled as a magnification + quadratic distortion

Smaller FOV

433 of 481

3. Image alignment

  • Use focal stack of a scene with distinct features to obtain the warp relationships between different focus, and use nonlinear optimization to solve for the parameters

434 of 481

4. AFI

  • Example from paper:

  • Example from my data:

435 of 481

5. AFI model fitting

  • There is a distinct pattern to AFIs
  • It is due to the different amounts of blurring: configurations with similar blur should have similar colors
  • We derive the blur diameter of each aperture–focus configuration from the thin-lens model, given the real-life distance of the scene point
  • Quantizing the theoretical blur gives us these bands; we seek to minimize the variance within the bands

436 of 481

6. Result from piano dataset

437 of 481

Image smoothing via L0 Gradient Minimization

Ye Wu

438 of 481

Objective

  • Globally maintain, and possibly enhance, the most prominent set of edges by increasing the steepness of transitions while not affecting the overall acutance.
  • A strategy that confines the discrete number of intensity changes among neighboring pixels, which links mathematically to the L0 norm for sparsity pursuit (see the objective below).
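
For reference, the standard L0 gradient minimization objective has the form below, where I is the input image, S the smoothed output, λ the smoothing weight, and C(S) counts the pixels with a non-zero gradient:

```latex
\min_{S} \; \sum_{p} \bigl( S_p - I_p \bigr)^2 \;+\; \lambda \, C(S),
\qquad
C(S) \;=\; \#\bigl\{\, p \;:\; \lvert \partial_x S_p \rvert + \lvert \partial_y S_p \rvert \neq 0 \,\bigr\}
```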

439 of 481

Algorithm

440 of 481

Pipeline

Start → gradient x, gradient y → mix 1

441 of 481

Pipeline

Mix 1 with threshold → new gradient x, new gradient y → mix 2 → end

442 of 481

Results

k=2.0

k=3.0

l=0.1

l=0.01

l=0.001

l=0.0001

l=0.00001

443 of 481

Results

k=2.0

k=4.0

k=6.0

k=8.0

k=10.0

l=0.1

l=0.03

l=0.00001

444 of 481

Comparison

L0 minimization: 3.05 s

Bilateral filtering: 64.13 s

445 of 481

Application

Image abstraction

446 of 481

Application

Clip-art compression artifact removal

447 of 481

Application

Combination of BLF and L0 method

Origin

L0

BLF

BLF+L0

448 of 481

Main References

449 of 481

Thank you!

Q&A

450 of 481

Fast Reflection Removal using Hierarchical Bilateral Grids

Zhichao Yin

(zhichaoy@)

451 of 481

Reflection removal

451

Photo taken through glass

Ground-truth background/transmissive layer

452 of 481

High-res reflection removal

  • Q: can existing approaches work well for high-res input images?

  • Challenges:
    • slow runtime
    • memory consumption
    • degraded quality (CNNs are not scale equivariant)

452

453 of 481

High-res challenges – runtime & memory

453

CPU runtime & memory vs. input resolution on real-world test set (208 images)

  1. Zhang et al. "Single Image Reflection Separation with Perceptual Losses." CVPR, 2018
  2. Niklaus et al. ”Learned Dual-view Reflection Removal.” In Submission, 2020

454 of 481

High-res challenges – degraded quality

454

Input

Niklaus et al.1

Ground-truth

Result upsampled from prediction with 512-res input

  1. Niklaus et al. ”Learned Dual-view Reflection Removal.” In Submission, 2020

455 of 481

High-res challenges – degraded quality

455

Input

Niklaus et al.1

Ground-truth

Result upsampled from prediction with 1024-res input

  1. Niklaus et al. ”Learned Dual-view Reflection Removal.” In Submission, 2020

456 of 481

High-res challenges – degraded quality

456

Input

Niklaus et al.1

Ground-truth

Result upsampled from prediction with 2048-res input

  1. Niklaus et al. ”Learned Dual-view Reflection Removal.” In Submission, 2020

457 of 481

HDRNet1 works well for tone mapping

457

  1. Gharbi et al. ”Deep Bilateral Learning for Real-Time Image Enhancement.” SIGGRAPH, 2017

Input

Output

458 of 481

But HDRNet1 cannot work for dereflection

458

  1. Gharbi et al. ”Deep Bilateral Learning for Real-Time Image Enhancement.” SIGGRAPH, 2017

Addition of reflection parts makes the transformation highly non-linear in color space.

Input

Output

459 of 481

Our method – overview

459

460 of 481

High-res�Comparison

Input, Resolution: 2048 x 1536

GT, Resolution: 2048 x 1536

461 of 481

High-res�Comparison

Input, Resolution: 2048 x 1536

Niklaus et al.

  1. Niklaus et al. ”Learned Dual-view Reflection Removal.” In Submission, 2020

462 of 481

High-res�Comparison

Input, Resolution: 2048 x 1536

Ours

463 of 481

High-res�Comparison

Input, Resolution: 2048 x 1536

Zhang et al.

  1. Zhang et al. "Single Image Reflection Separation with Perceptual Losses." CVPR, 2018

464 of 481

High-res�Comparison

Input, Resolution: 2048 x 1536

Ours

465 of 481

High-res�Comparison

Input, Resolution: 2048 x 1536

466 of 481

High-res�Comparison

GT, Resolution: 2048 x 1536

467 of 481

High-res�Comparison

Niklaus et al.

  1. Niklaus et al. ”Learned Dual-view Reflection Removal.” In Submission, 2020

468 of 481

High-res�Comparison

Ours

469 of 481

High-res�Comparison

Zhang et al.

  1. Zhang et al. "Single Image Reflection Separation with Perceptual Losses." CVPR, 2018

470 of 481

High-res�Comparison

Ours

471 of 481

Quantitative Evaluation

471

LPIPS ↓

SSIM ↑

PSNR ↑

  1. Zhang et al. "Single Image Reflection Separation with Perceptual Losses." CVPR, 2018
  2. Niklaus et al. ”Learned Dual-view Reflection Removal.” In Submission, 2020

472 of 481

Runtime Comparison

472

CPU

GPU

Relative Improvement

  1. Zhang et al. "Single Image Reflection Separation with Perceptual Losses." CVPR, 2018
  2. Niklaus et al. ”Learned Dual-view Reflection Removal.” In Submission, 2020

473 of 481

Lensless Imaging with DiffuserCam

Emily Zheng

474 of 481

Exploring Lensless Imaging

  • Cameras with lenses are most commonly used to capture images
  • Lensless imaging systems
    • Can be more lightweight
    • Can be constructed easily without the precision of a normal lens using everyday materials
  • DiffuserCam: replace lens with a diffuser (Scotch tape)
    • From DiffuserCam Paper

475 of 481

DiffuserCam Setup + Calibration

Setup

  • Raspberry Pi PiCamera with the lens removed (only the sensor left)
  • Place a diffuser at the focal length of the PiCamera (double-sided Scotch tape)
  • Create an aperture using black paper / tape

Calibration

  • Take a picture of a point source through a pinhole to get the PSF of the diffuser

476 of 481

Example PSF and Image Capture

477 of 481

Reconstruction with Deconvolution

  • Alternating direction method of multipliers (ADMM)
  • Optimization problem
    • Assume the captured image is the convolution of the original scene with the blur of the diffuser
    • The blur from the diffuser is known from the PSF
    • Solve for the original scene image using ADMM
  • Can reconstruct the original scene from a single diffused image (see the sketch below)
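
Not the ADMM solver itself, but a minimal sanity-check sketch under the same convolutional model: a Tikhonov-regularized inverse filter in the frequency domain, assuming the PSF is centered and has the same shape as the measurement (names are ours):

```python
import numpy as np

def naive_deconvolve(measurement, psf, eps=1e-3):
    """Regularized inverse filter for measurement = scene * psf (circular
    convolution).  `eps` damps noise amplification at frequencies where the
    PSF spectrum is small; useful as a quick check on the calibrated PSF."""
    M = np.fft.rfft2(measurement)
    H = np.fft.rfft2(np.fft.ifftshift(psf), s=measurement.shape)
    S = np.conj(H) * M / (np.abs(H) ** 2 + eps)
    scene = np.fft.irfft2(S, s=measurement.shape)
    return np.clip(scene, 0, None)   # keep the reconstruction non-negative
```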

478 of 481

Results

479 of 481

Results

480 of 481

Remaining course logistics

480

  • Optional homework assignment 7 is due on 12/18 at 23:59 ET.

- Gradescope submission site available.

  • Final project reports are due on Sunday 12/20 at 23:59 ET.

- Gradescope submission site available.

  • One upcoming computational photography talk:

- Tuesday 12/22, 1 – 2 pm, Grace Kuo (lensless cameras, diffuserCam, AR/VR displays).

  • Returning borrowed equipment:

https://docs.google.com/spreadsheets/d/1CVg7nUbI701pvZFPX3BR0uzKB76Y6PEl3tXmq1UF4AU/edit#gid=0

481 of 481

Class evaluation*s* – please take them!

481

  • CMU’s Faculty Course Evaluations (FCE): https://cmu.smartevals.com/

  • Please take all three of them, super helpful for developing future offerings of the class.

  • Thanks in advance!