1 of 165

Fusing Physical and Virtual Worlds into

Interactive Mixed Reality

Ruofei Du | Google, San Francisco | me@duruofei.com

Virtual | Mobile Immersive Computing by Prof. Bo Han

2 of 165

Self Intro

www.duruofei.com

3 of 165

Self Intro

Ruofei Du (杜若飞)

4 of 165

Self Intro

Ruofei Du (杜若飞)

User Interaction

Depth Map

Meshes

Multiview / 360 Videos

Geollery, CHI '19, Web3D '19, VR '19

Social Street View, Web3D '16

Best Paper Award

VideoFields

Web3D '16

SketchyScene

SIGGRAPH Asia '19, ECCV '18

Montage4D

I3D '18

JCGT '19

DepthLab UIST '20

Kernel Foveated Rendering, I3D '18, VR '20, TVCG '20

CollaboVR ISMAR '20

LogRectilinear

IEEE VR '21

TVCG Honorable Mention

5 of 165

DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality

Ruofei Du, Eric Turner, Maksym Dzitsiuk, Luca Prasso, Ivo Duarte,

Jason Dourgarian, Joao Afonso, Jose Pascoal, Josh Gladstone, Nuno Cruces,

Shahram Izadi, Adarsh Kowdle, Konstantine Tsotsos, David Kim

Google | ACM UIST 2020

6 of 165

Introduction

Mobile Augmented Reality

7 of 165

Introduction

Google's ARCore

8 of 165

Introduction

Google's ARCore

9 of 165

Introduction

Mobile Augmented Reality

10 of 165

Introduction

Motivation

Is direct placement and rendering of 3D objects sufficient for realistic AR experiences?

11 of 165

Introduction

Depth Lab

Not always!

12 of 165

Introduction

Depth Lab

Virtual content looks like it’s “pasted on the screen” rather than “in the world”!

13 of 165

Introduction

Motivation

14 of 165

Introduction

Motivation

15 of 165

Introduction

Depth Lab

How can we bring these advanced features to mobile AR experiences without relying on dedicated sensors or computationally expensive surface reconstruction?

16 of 165

Introduction

Depth Map

17 of 165

Introduction

Depth Lab

Google

Pixel 2, Pixel 2 XL, Pixel 3, Pixel 3 XL, Pixel 3a, Pixel 3a XL, Pixel 4, Pixel 4 XL

Huawei

Honor 10, Honor V20, Mate 20 Lite, Mate 20, Mate 20 X, Nova 3, Nova 4, P20, P30, P30 Pro

LG

G8X ThinQ, V35 ThinQ, V50S ThinQ, V60 ThinQ 5G

OnePlus

OnePlus 6, OnePlus 6T, OnePlus 7, OnePlus 7 Pro, OnePlus 7 Pro 5G, OnePlus 7T, OnePlus 7T Pro

Oppo

Reno Ace

Samsung

Galaxy A80, Galaxy Note8, Galaxy Note9, Galaxy Note10, Galaxy Note10 5G, Galaxy Note10+, Galaxy Note10+ 5G, Galaxy S8, Galaxy S8+, Galaxy S9, Galaxy S9+, Galaxy S10e, Galaxy S10, Galaxy S10+, Galaxy S10 5G, Galaxy S20, Galaxy S20+ 5G, Galaxy S20 Ultra 5G

Sony

Xperia XZ2, Xperia XZ2 Compact, Xperia XZ2 Premium, Xperia XZ3

Xiaomi

Pocophone F1

18 of 165

Introduction

Depth Lab

Is there more to realism than occlusion?

19 of 165

Introduction

Depth Lab

Surface interaction?

20 of 165

Introduction

Depth Lab

Realistic Physics?

21 of 165

Introduction

Depth Lab

Path Planning?

22 of 165

23 of 165

Introduction

Depth Lab

24 of 165

Related Work

Valentin et al.

25 of 165

Introduction

Depth Lab

26 of 165

Introduction

Depth Lab

27 of 165

Introduction

Depth Generation

28 of 165

Introduction

Depth Lab

29 of 165

Related Work

Valentin et al.

30 of 165

Introduction

Depth Lab

31 of 165

Introduction

Depth Lab

Up to 8 meters, with the best results within 0.5 m to 5 m

32 of 165

Motivation

Gap from raw depth to applications

33 of 165

Introduction

Depth Lab

ARCore

Depth API

DepthLab

Mobile AR developers

34 of 165

Design Process

3 Brainstorming Sessions

3 brainstorming sessions

18 participants

39 aggregated ideas

35 of 165

Design Process

3 Brainstorming Sessions

36 of 165

System

Architecture overview

37 of 165

Data Structure

Depth Array

2D array (160×120 or larger) of 16-bit integers
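A minimal sketch of consuming such a buffer, assuming each 16-bit value encodes depth in millimeters (the helper name and layout below are illustrative, not the actual API):

```python
# Hypothetical sketch: reading metric depth from a row-major 16-bit
# depth buffer, assuming values are millimeters.
DEPTH_WIDTH, DEPTH_HEIGHT = 160, 120

def depth_meters(depth_array, u, v):
    """Look up depth at normalized screen coordinates (u, v) in [0, 1]."""
    x = min(int(u * DEPTH_WIDTH), DEPTH_WIDTH - 1)
    y = min(int(v * DEPTH_HEIGHT), DEPTH_HEIGHT - 1)
    depth_mm = depth_array[y * DEPTH_WIDTH + x]  # row-major flat buffer
    return depth_mm / 1000.0  # millimeters -> meters

# Usage: a flat buffer filled with 1500 mm everywhere.
buffer = [1500] * (DEPTH_WIDTH * DEPTH_HEIGHT)
print(depth_meters(buffer, 0.5, 0.5))  # 1.5
```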

38 of 165

Data Structure

Depth Mesh
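One common way to turn a regular depth grid into a mesh is to connect each 2×2 cell of vertices into two triangles; a hedged sketch of the index generation (`grid_triangles` is a hypothetical helper, not DepthLab's implementation):

```python
# Illustrative depth-mesh triangulation: two triangles per grid cell,
# with vertices indexed row-major across the depth grid.
def grid_triangles(width, height):
    """Triangle index list for a width x height vertex grid."""
    tris = []
    for y in range(height - 1):
        for x in range(width - 1):
            i = y * width + x
            tris.append((i, i + 1, i + width))              # upper-left triangle
            tris.append((i + 1, i + width + 1, i + width))  # lower-right triangle
    return tris

# Usage: a 3x3 vertex grid has 2x2 cells, hence 8 triangles.
print(len(grid_triangles(3, 3)))  # 8
```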

39 of 165

Data Structure

Depth Texture

40 of 165

System

Architecture

41 of 165

Localized Depth

Coordinate System Conversion
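Converting a screen point plus its depth into a camera-space 3D point typically uses the pinhole camera model; a minimal sketch under that assumption (the function and parameter names are illustrative, not DepthLab's API):

```python
def screen_to_camera(x_px, y_px, depth_m, fx, fy, cx, cy):
    """Unproject a pixel with metric depth into a camera-space 3D point
    using a pinhole model (fx, fy: focal lengths; cx, cy: principal point)."""
    X = (x_px - cx) * depth_m / fx
    Y = (y_px - cy) * depth_m / fy
    return (X, Y, depth_m)

# Usage: the principal-point pixel unprojects straight down the optical axis.
print(screen_to_camera(320, 240, 2.0, fx=500, fy=500, cx=320, cy=240))
# (0.0, 0.0, 2.0)
```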

42 of 165

Localized Depth

Normal Estimation
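A surface normal can be estimated from neighboring unprojected depth samples; a simplified sketch that crosses the two tangent vectors (a real implementation would average several neighbors for robustness):

```python
def estimate_normal(p, px, py):
    """Estimate a unit surface normal from a 3D point p and its
    unprojected +x and +y neighbors via a cross product of tangents."""
    tx = tuple(a - b for a, b in zip(px, p))  # tangent toward +x neighbor
    ty = tuple(a - b for a, b in zip(py, p))  # tangent toward +y neighbor
    n = (tx[1] * ty[2] - tx[2] * ty[1],
         tx[2] * ty[0] - tx[0] * ty[2],
         tx[0] * ty[1] - tx[1] * ty[0])
    length = sum(c * c for c in n) ** 0.5
    return tuple(c / length for c in n)

# Usage: a fronto-parallel patch yields a normal along the z-axis.
print(estimate_normal((0, 0, 2), (1, 0, 2), (0, 1, 2)))  # (0.0, 0.0, 1.0)
```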

43 of 165

Localized Depth

Normal Estimation

44 of 165

Localized Depth

Normal Estimation

45 of 165

Localized Depth

Avatar Path Planning

46 of 165

Localized Depth

Rain and Snow

47 of 165

Surface Depth

Use Cases

48 of 165

Surface Depth

Physics collider

Physics with depth mesh.

49 of 165

Surface Depth

Texture decals

Texture decals with depth mesh.

50 of 165

Surface Depth

3D Photo

Projection mapping with depth mesh.

51 of 165

Dense Depth

Depth Texture - Antialiasing

52 of 165

Dense Depth

Real-time relighting

[Figure: relighting geometry, with angle θ between surface normal N and light direction L]
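The geometry of the slide (surface normal N, light direction L, angle θ between them) drives the diffuse term a depth-based relighting pass can use; a minimal Lambertian sketch, not the shipped shader:

```python
def lambert(normal, light_dir):
    """Diffuse intensity from a unit normal N and unit light direction L:
    max(0, N·L) = cos(theta), clamped so back-facing light contributes 0."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, dot)

# Usage: light head-on gives full intensity; light at 90 degrees gives none.
print(lambert((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))  # 1.0
print(lambert((0.0, 0.0, 1.0), (1.0, 0.0, 0.0)))  # 0.0
```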

53 of 165

Dense Depth

Why doesn't a normal map work?

54 of 165

Dense Depth

Real-time relighting

55 of 165

Dense Depth

Real-time relighting

56 of 165

Dense Depth

Real-time relighting

go/realtime-relighting, go/relit

57 of 165

Dense Depth

Wide-aperture effect

58 of 165

Dense Depth

Occlusion-based rendering

59 of 165

Experiments

DepthLab minimum viable application

60 of 165

Experiments

General Profiling of MVP

61 of 165

Experiments

Relighting

62 of 165

Experiments

Aperture effects

63 of 165

Discussion

Deployment with partners

64 of 165

Discussion

Deployment with partners

65 of 165

Discussion

Deployment with partners

66 of 165

Limitations

Design space of dynamic depth

Dynamic depth? Could systems like HoloDesk, HyperDepth, Digits, or Holoportation come to mobile AR?

67 of 165

Envision

Design space of dynamic depth

68 of 165

GitHub

Please feel free to fork!

69 of 165

Play Store

Try it yourself!

70 of 165

DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality

Ruofei Du, Eric Turner, Maksym Dzitsiuk, Luca Prasso, Ivo Duarte,

Jason Dourgarian, Joao Afonso, Jose Pascoal, Josh Gladstone, Nuno Cruces,

Shahram Izadi, Adarsh Kowdle, Konstantine Tsotsos, David Kim

Google | ACM UIST 2020

71 of 165

Thank you!

DepthLab | UIST 2020

72 of 165

Demo

DepthLab | UIST 2020

73 of 165

Project Geollery.com: Reconstructing a Live Mirrored World With Geotagged Social Media

Ruofei Du, David Li, and Amitabh Varshney

{ruofei, dli7319, varshney}@umiacs.umd.edu | www.Geollery.com | Web3D 2019, Los Angeles, USA

UMIACS | The Augmentarium: Virtual and Augmented Reality Lab at the University of Maryland | Department of Computer Science, University of Maryland, College Park

74 of 165

Geollery.com

v2: a major leap


75 of 165

System Overview

Geollery Workflow


76 of 165

System Overview

2D Map Data


77 of 165

System Overview

2D Map Data


78 of 165

System Overview

+Avatar +Trees +Clouds


79 of 165

System Overview

+Avatar +Trees +Clouds +Night


80 of 165

System Overview

Street View Panoramas


81 of 165

System Overview

Street View Panoramas


82 of 165

System Overview

Street View Panoramas


83 of 165

System Overview

Geollery Workflow


All data we used is publicly and widely available on the Internet.

84 of 165

Rendering Pipeline

Close-view Rendering


85 of 165

Rendering Pipeline

Initial spherical geometries


86 of 165

Rendering Pipeline

Depth correction


87 of 165

Rendering Pipeline

Intersection removal


88 of 165

Rendering Pipeline

Texturing individual geometry


89 of 165

Rendering Pipeline

Texturing with alpha blending


90 of 165

Rendering Pipeline

Rendering result in the fine detail


91 of 165

Rendering Pipeline

Rendering result in the fine detail


92 of 165

Rendering Pipeline

Rendering result in the fine detail


93 of 165

Rendering Pipeline

Experimental Features


94 of 165

A Log-Rectilinear Transformation for Foveated 360-Degree Video Streaming

David Li, Ruofei Du, Adharsh Babu, Camelia Brumar, and Amitabh Varshney
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2021.

95 of 165

Introduction

VR headset & video streaming


  • 360° cameras and VR headsets are increasing in resolution.
  • Video streaming is quickly growing in popularity.

96 of 165

Introduction

VR + eye tracking


  • Commercial VR headsets are getting eye-tracking capabilities.

HTC Vive Pro Eye

Varjo VR-3

Fove

97 of 165

Introduction

360 videos


  • 360° cameras capture the scene in every direction, with a full 360-degree spherical field of regard.
  • These videos are typically stored in the equirectangular projection, parameterized by spherical coordinates (𝜃, 𝜑).

360° Field of Regard

Scene

Captured 360 Video
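The equirectangular parameterization above can be sketched as a mapping from (𝜃, 𝜑) to pixel coordinates; this uses one common convention (exact ranges and orientation vary by implementation):

```python
import math

def equirect_to_pixel(theta, phi, width, height):
    """Map spherical coordinates (theta in [-pi, pi], phi in [0, pi])
    to pixel coordinates in an equirectangular frame."""
    x = (theta + math.pi) / (2 * math.pi) * width  # longitude -> column
    y = phi / math.pi * height                     # colatitude -> row
    return (x, y)

# Usage: the forward direction (theta=0, phi=pi/2) lands at frame center.
print(equirect_to_pixel(0.0, math.pi / 2, 3840, 1920))  # (1920.0, 960.0)
```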

98 of 165

Introduction

360 videos


  • When viewed in a VR headset, 360° videos cover the entire field-of-view for more immersive experiences.
  • However, transmitting the full field-of-regard either has worse perceived quality or requires far more bandwidth than for conventional videos.

Captured 360 Video

Projection to Field of View

99 of 165

Introduction

360 videos


  • Existing work in 360° streaming focuses on viewport-dependent streaming, using tiling to transmit only visible regions based on the user's head rotation.

Tiling Illustration

Image from (Liu et al. with Prof. Bo, 2017)

100 of 165

Introduction

Foveated rendering


  • Foveated rendering renders the fovea region of the viewport at a high-resolution and the peripheral region at a lower resolution.
  • Kernel Foveated Rendering (Meng et al., PACMCGIT 2018) uses a log-polar transformation to render foveated images in real-time.

Image Credit: Tobii

Log-polar Transformation,

Image from (Meng et al., 2018)

101 of 165

Introduction

Log-Polar Foveated Streaming


  • Applying log-polar subsampling to videos results in flickering and aliasing artifacts in the foveated video.

102 of 165

Research Questions


  • Can foveation techniques from rendering be used to optimize 360 video streaming?
  • How can we reduce foveation artifacts by leveraging the full original video frame?

103 of 165

Log-Polar Foveated Streaming

  • Artifacts are caused by subsampling of the original video frame.

Original Frame

Subsampled Pixel

104 of 165

Log-Polar Foveated Streaming

  • Artifacts are caused by subsampling of the original video frame.

Original Frame

Subsampled Pixel

105 of 165

Log-Polar Foveated Streaming

  • Subsampled pixels should represent an average over an entire region of the original video frame.
  • Computationally, this would take O(region size) time to compute for each sample.

106 of 165

Summed-Area Tables

  • One way to compute averages quickly is using summed-area tables, also known as integral images.
  • Sampling a summed area table only takes O(1) time.
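The technique can be sketched in a few lines: build the table once in O(pixels), after which any rectangular average costs O(1) regardless of region size:

```python
def summed_area_table(img, w, h):
    """Build a summed-area table: sat[y][x] = sum of img over [0..x] x [0..y]."""
    sat = [[0] * w for _ in range(h)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            sat[y][x] = row + (sat[y - 1][x] if y > 0 else 0)
    return sat

def region_mean(sat, x0, y0, x1, y1):
    """Average over the inclusive rectangle [x0..x1] x [y0..y1] in O(1)."""
    total = sat[y1][x1]
    if x0 > 0:
        total -= sat[y1][x0 - 1]
    if y0 > 0:
        total -= sat[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += sat[y0 - 1][x0 - 1]
    return total / ((x1 - x0 + 1) * (y1 - y0 + 1))

# Usage: average of the bottom-right 2x2 block of a 3x3 image.
img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sat = summed_area_table(img, 3, 3)
print(region_mean(sat, 1, 1, 2, 2))  # 7.0
```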

107 of 165

Log-Rectilinear Transformation

  • Apply an exponential drop-off along the x-axis and y-axis independently.
  • Rectangular regions allow the use of summed-area tables for subsampling.
  • A one-to-one mapping near the focus region preserves the resolution of the original frame.
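To illustrate the idea in one dimension (a simplified sketch, not the paper's exact formulation): distances within the fovea map one-to-one, while beyond it the source distance grows exponentially, compressing the periphery.

```python
import math

# Illustrative 1D mapping; `fovea` and `alpha` are hypothetical parameters.
def buffer_to_source(d_buf, fovea, alpha):
    """Map a buffer-space distance from the gaze to a source-frame distance."""
    if d_buf <= fovea:
        return d_buf  # one-to-one near the focus preserves full resolution
    return fovea * math.exp(alpha * (d_buf - fovea))

# Usage: inside the fovea nothing changes; outside, pixels cover more source.
print(buffer_to_source(10, fovea=32, alpha=0.05))       # 10
print(buffer_to_source(52, fovea=32, alpha=0.05) > 32)  # True
```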

108 of 165

Foveated Streaming

Video Streaming Server:
  • Decoding the 360° video (FFmpeg)
  • GPU-driven summed-area table generation (OpenCL)
  • Computing the log-rectilinear buffer (OpenCL)
  • Encoding the log-rectilinear video stream (FFmpeg)

Client:
  • Updating the foveal position (sent to the server over a socket)
  • Decoding the log-rectilinear video stream (FFmpeg)
  • Transforming into a full-resolution video frame (OpenCL)

The video streaming request, the encoded stream, and foveal updates travel between server and client over sockets.

109 of 165

Qualitative Results

  • Shown with gaze at the center of the viewport

110 of 165

Quantitative Results

We perform quantitative evaluations comparing the log-rectilinear transformation and the log-polar transformation in 360° video streaming.

  • Performance overhead of summed-area tables.
  • Full-frame quality.
  • Bandwidth usage.

111 of 165

Quantitative Results

  • Pairing the log-rectilinear transformation with summed-area table filtering yields lower flickering while also reducing bandwidth usage and achieving high weighted-to-spherical signal-to-noise ratio (WS-PSNR) results.

112 of 165

Quantitative Results

  • Pairing the log-rectilinear transformation with summed-area table filtering yields lower flickering while also reducing bandwidth usage and achieving high weighted-to-spherical signal-to-noise ratio (WS-PSNR) results.

113 of 165

Conclusion

  • We present a log-rectilinear transformation which utilizes foveation, summed-area tables, and standard video codecs for foveated 360° video streaming.

Foveation

Summed-Area Tables

Standard Video Codecs

Foveated 360° Video Streaming

114 of 165

Zhenyi He*, Ruofei Du†, Ken Perlin*

*Future Reality Lab, New York University | †Google LLC

CollaboVR: A Reconfigurable Framework for

Creative Collaboration in Virtual Reality

115 of 165

116 of 165

117 of 165

118 of 165

The best layout and interaction mode?

119 of 165

Research Questions:

  • Design: What if we could bring sketching to real-time collaboration in VR?
  • Design + Evaluation: If we can convert raw sketches into interactive animations, will it improve the performance of remote collaboration?
  • Evaluation: Are there best user arrangements or input modes for different use cases, or is it more a question of personal preference?

120 of 165

CollaboVR: A Reconfigurable Framework for

Creative Collaboration in Virtual Reality

121 of 165

122 of 165

123 of 165

124 of 165

CollaboVR

Chalktalk (Cloud App)

Audio Communication

Layout Reconfiguration

125 of 165

Layout Reconfiguration

User Arrangements

(1) side-by-side

(2) face-to-face

(3) hybrid

Input Modes

(1) direct

(2) projection

126 of 165

Layout Reconfiguration

User Arrangements

(1) side-by-side

[Figure: side-by-side arrangement, with users 1 and 2 sharing interactive boards inside the tracking range of user 1]

127 of 165

Layout Reconfiguration

User Arrangements

(1) side-by-side

(2) face-to-face

[Figure: face-to-face arrangement, showing user 2 as observed by user 1, with left (LH) and right (RH) hands labeled]

128 of 165

Layout Reconfiguration

User Arrangements

(1) side-by-side

(2) face-to-face

[Figure: side-by-side and face-to-face arrangements with users 1 through 4]

129 of 165

Layout Reconfiguration

User Arrangements

(1) side-by-side

(2) face-to-face

(3) hybrid

[Figure: hybrid arrangement, with a teacher facing users 2, 3, and 4]

130 of 165

Layout Reconfiguration

Input Modes

(1) direct

(2) projection

131 of 165

Layout Reconfiguration

Input Modes

(1) direct

(2) projection

132 of 165

Layout Reconfiguration

Input Modes

(1) direct

(2) projection

133 of 165

C1: Integrated Layout

C2: Mirrored Layout

C3: Projective Layout

134 of 165

C1: Integrated Layout

C2: Mirrored Layout

C3: Projective Layout

135 of 165

C1: Integrated Layout

C2: Mirrored Layout

C3: Projective Layout

136 of 165

C1: Integrated Layout

C2: Mirrored Layout

C3: Projective Layout

137 of 165

Evaluation

Overview of subjective feedback on CollaboVR

138 of 165

Evaluation

139 of 165

Evaluation

140 of 165

Takeaways

  1. Developing CollaboVR, a reconfigurable end-to-end collaboration system.
  2. Designing custom configurations for real-time user arrangements and input modes.
  3. Quantitative and qualitative evaluation of CollaboVR.
  4. Open-sourcing our software at https://github.com/snowymo/CollaboVR.

141 of 165

more live demos...

142 of 165

143 of 165

144 of 165

145 of 165

146 of 165

Zhenyi He*, Ruofei Du†, Ken Perlin*

*Future Reality Lab, New York University | †Google LLC

CollaboVR: A Reconfigurable Framework for

Creative Collaboration in Virtual Reality

147 of 165

Future Directions

The Ultimate XR Platform


148 of 165

Future Directions

Fuses Past Events


149 of 165

Future Directions

With the present


150 of 165

Future Directions

And look into the future


151 of 165

Future Directions

Change the way we communicate in 3D and consume information


152 of 165

Future Directions

Consume information throughout the world


153 of 165

Fusing Physical and Virtual Worlds into

Interactive Mixed Reality

Ruofei Du | Google, San Francisco | me@duruofei.com

Virtual | Mobile Immersive Computing by Prof. Bo Han | March 19, 2021

154 of 165

Introduction

Depth Map


155 of 165

Introduction

Depth Lab

156 of 165

Thank you!

www.duruofei.com

157 of 165

Introduction

Depth Lab

Occlusion is a critical component for AR realism!

Correct occlusion helps ground content in reality, and makes virtual objects feel as if they are actually in your space.

158 of 165

Introduction

Motivation

159 of 165

Depth Mesh

Generation

160 of 165

Localized Depth

Avatar Path Planning

161 of 165

Dense Depth

Depth Texture

162 of 165

Introduction

Depth Map

163 of 165

Taxonomy

Depth Usage

164 of 165

Introduction

Depth Map

165 of 165

Introduction

Depth Map