Interactive Graphics for a
Universally Accessible Metaverse
Self Intro
www.duruofei.com
Self Intro
Ruofei Du (杜若飞)
Human-Computer Interaction
Geollery CHI '19, Web3D '19, VR '19
Social Street View Web3D '16
Best Paper Award
VideoFields
Web3D '16
SketchyScene
TOG (SIGGRAPH Asia) '19, ECCV '18
Montage4D
I3D '18
JCGT '19
DepthLab UIST '20
13K Installs
Kernel Foveated Rendering I3D '18, VR '20, TVCG '20
CollaboVR ISMAR '20
LogRectilinear
IEEE VR '21 (TVCG)
TVCG Honorable Mention
GazeChat
UIST '21
Computer Graphics
MDIF ICCV '21
HumanGPS CVPR '21
HandSight
ECCVW '14
TACCESS '15
Ad hoc UI
CHI EA '22
ProtoSound
CHI '22
PRIF ECCV '22
Computer
Vision
Self Intro
Ruofei Du (杜若飞)
Interaction and Communication
Geollery CHI '19, Web3D '19, VR '19
Social Street View Web3D '16
Best Paper Award
VideoFields
Web3D '16
SketchyScene
TOG (SIGGRAPH Asia) '19, ECCV '18
Montage4D
I3D '18
JCGT '19
DepthLab UIST '20
13K Installs
Kernel Foveated Rendering I3D '18, VR '20, TVCG '20
CollaboVR ISMAR '20
LogRectilinear
IEEE VR '21 (TVCG)
TVCG Honorable Mention
GazeChat
UIST '21
Digital World
Digital Human
HumanGPS CVPR '21
HandSight
ECCVW '14
TACCESS '15
ProtoSound
CHI '22
Ad hoc UI
CHI EA '22
OmniSyn
IEEE VR '22
SlurpAR DIS '22
Interactive Graphics for a
Universally Accessible Metaverse
Metaverse
Neal Stephenson, 1992.
Future of Internet?
Internet of Things?
Virtual Reality?
Augmented Reality?
Decentralization?
Blockchain + NFT?
Mirrored World?
Digital Twin?
VR OS?
Web 3.0?
The Future of Internet
Internet of Things
Virtual Reality
Augmented Reality
Decentralization
Blockchain
NFT
Mirrored World
Metaverse
Digital Twin
VR OS
Web 3.0
Extended Reality (XR)
Accessibility
Avatars
Co-presence
Economics
Gaming
Wearable
AI
Privacy
Security
Vision
Neural
Metaverse
The metaverse envisions a persistent digital world where people are fully connected as virtual representations.
As a teenager, my dream was to live in a metaverse...
However, today I wish for the metaverse to be only a tool that makes information more useful and accessible, and helps people live a better physical life.
Interactive Graphics for a Universally Accessible Metaverse
Chapter One · Mirrored World & Real-time Rendering
Chapter Two · Computational Interaction: Algorithm & Systems
Chapter Three · Digital Human & Augmented Communication
Interactive Graphics for a Universally Accessible Metaverse
Chapter One · Mirrored World & Real-time Rendering
Geollery CHI '19, Web3D '19, VRW '19
Social Street View Web3D '16
Best Paper Award
Kernel Foveated Rendering I3D '18, VR '20, TVCG '20
LogRectilinear, OmniSyn
IEEE VR '21 (TVCG), VRW '22
TVCG Honorable Mention
Project Geollery.com & Social Street View: Reconstructing a Live Mirrored World With Geotagged Social Media
Ruofei Du†, David Li†, and Amitabh Varshney
{ruofei, dli7319, varshney}@umiacs.umd.edu | www.Geollery.com | ACM CHI 2019 & Web3D 2016 Best Paper Award & Web3D 2019
UMIACS
THE AUGMENTARIUM
VIRTUAL AND AUGMENTED REALITY LAB
AT THE UNIVERSITY OF MARYLAND
COMPUTER SCIENCE
UNIVERSITY OF MARYLAND, COLLEGE PARK
Introduction
Social Media
20
image courtesy: plannedparenthood.org
Introduction
Social Media + Topics
21
image courtesy: huffingtonpost.com
Motivation
Social Media + XR
22
Motivation
Social Media + XR
23
image courtesy:
instagram.com,
facebook.com,
twitter.com
Motivation
2D layout
24
image courtesy:
pinterest.com
Motivation
Immersive Mixed Reality?
25
image courtesy:
viralized.com
Motivation
Pros and cons of the classic
26
Motivation
Pros and cons of the classic
27
Related Work
Social Street View, Du and Varshney
Web3D 2016 Best Paper Award
28
Technical Challenges?
Related Work
Social Street View, Du and Varshney
Web3D 2016 Best Paper Award
30
Related Work
Social Street View, Du and Varshney
Web3D 2016 Best Paper Award
31
Related Work
3D Visual Popularity
Bulbul and Dahyot, 2017
32
Related Work
Virtual Oulu, Kukka et al.
CSCW 2017
33
Related Work
Immersive Trip Reports
Brejcha et al. UIST 2018
34
Related Work
High Fidelity, Inc.
35
Related Work
Facebook Spaces, 2017
36
What's Next?
Research Question 1/3
37
What might a social media platform look like in mixed reality?
What's Next?
Research Question 2/3
38
What if we could allow social media sharing in a live mirrored world?
What's Next?
Research Question 3/3
39
What use cases could benefit from a social media platform in XR?
Geollery.com
A Mixed-Reality Social Media Platform
40
Geollery.com
A Mixed-Reality Social Media Platform
41
42
1
Conception, architecture & implementation of
Geollery
A mixed reality system that can depict geotagged social media and online avatars with 3D textured buildings.
43
2
Extending the design space of
3D Social Media Platform
Progressive streaming, aggregation approaches, virtual representation of social media, co-presence with virtual avatars, and collaboration modes.
44
3
Conducting a user study of
Geollery vs. Social Street View
by discussing their benefits, limitations, and potential impacts on future 3D social media platforms.
System Overview
Geollery Workflow
45
System Overview
Geollery Workflow
46
Geollery.com
v2: a major leap
47
System Overview
Geollery Workflow
48
System Overview
2D Map Data
49
System Overview
2D Map Data
50
System Overview
+Avatar +Trees +Clouds
51
System Overview
+Avatar +Trees +Clouds +Night
52
System Overview
Street View Panoramas
53
System Overview
Street View Panoramas
54
System Overview
Street View Panoramas
55
System Overview
Geollery Workflow
56
All data we used is publicly and widely available on the Internet.
Rendering Pipeline
Close-view Rendering
57
Rendering Pipeline
Initial spherical geometries
58
Rendering Pipeline
Depth correction
59
Rendering Pipeline
Intersection removal
60
Rendering Pipeline
Texturing individual geometry
61
Rendering Pipeline
Texturing with alpha blending
62
Rendering Pipeline
Rendering result in the fine detail
63
Rendering Pipeline
Rendering result in the fine detail
64
Rendering Pipeline
Rendering result in the fine detail
65
User Study
Social Street View vs. Geollery
66
User Study
Quantitative Evaluation
67
User Study
Quantitative Evaluation
68
69
I would like to use it for the food in different restaurants. I am always hesitating between different restaurants. It will be very easy to see all restaurants with street views. In Yelp, I can only see one restaurant at a time.
P6 / F
70
[I will use it for] exploring new places. If I am going on vacation somewhere, I could immerse myself into the location. If there are avatars around that area, I could ask questions.
P1 / M
71
I think it (Geollery) will be useful for families. I just taught my grandpa how to use FaceTime last week and it would be great if I could teleport to their house and meet with them, then we could chat and share photos with our avatars.
P2 / F
72
If there is a way to unify the interaction between them, there will be more realistic buildings [and] you could have more roof structures. Terrains will be interesting to add on.
P18 / M
Rendering Pipeline
Experimental Features
73
Landing Impact
Demos at ACM CHI 2019
74
Landing Impact
Demos at ACM CHI 2019
75
Landing Impact
Demos at ACM CHI 2019
76
Instant Panoramic Texture Mapping with Semantic Object Matching for Large-Scale Urban Scene Reproduction
TVCG 2021, Jinwoo Park, Ik-beom Jeon, Sung-eui Yoon, and Woontack Woo
A more applicable method for constructing walk-through experiences in urban streets was employed by Geollery [16], which adopted an efficient transformation of a dense spherical mesh to construct a local proxy geometry based on the depth maps from Google Street View.
Freeman et al. ACM PHCI 2022
He et al. ISMAR 2020
Park et al. Virtual Reality 2022
Yeom et al. IEEE VR 2021
What's Next?
OmniSyn: Intermediate View Synthesis Between Wide-Baseline Panoramas
David Li, Yinda Zhang, Christian Häne, Danhang Tang, Amitabh Varshney, and Ruofei Du, VR 2022
How can we further accelerate the real-time rendering procedure?
85
Kernel Foveated Rendering
Xiaoxu Meng, Ruofei Du, Matthias Zwicker and Amitabh Varshney
Augmentarium | UMIACS
University of Maryland, College Park
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2018
86
Original Frame
Buffer
Screen
Sample Map
Introduction
Related Work
Our Approach
User Study
Experiments
Conclusion
87
Kernel Log-polar Mapping
Eye-Dominance-Guided Foveated Rendering
Xiaoxu Meng, Ruofei Du, and Amitabh Varshney
IEEE Transactions on Visualization and Computer Graphics (TVCG)
more foveation for the non-dominant eye
3D-Kernel Foveated Rendering for Light Fields
Xiaoxu Meng, Ruofei Du, Joseph JaJa, and Amitabh Varshney
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2020
UMIACS
A Log-Rectilinear Transformation for Foveated 360-Degree Video Streaming
David Li†, Ruofei Du‡, Adharsh Babu†, Camelia Brumar†, Amitabh Varshney†
† University of Maryland, College Park ‡ Google Research
UMIACS
TVCG Honorable Mentions Award
Sandwiched Image Compression
Wrapping Neural Networks Around a Standard Codec
Increasing the Resolution and Dynamic Range of Standard Codecs
Onur Guleryuz, Philip Chou, Hugues Hoppe, Danhang Tang, Ruofei Du, Philip Davidson, and Sean Fanello
2021 IEEE International Conference on Image Processing (ICIP) & 2022 Picture Coding Symposium (PCS)
Multiresolution Deep Implicit Functions for 3D Shape Representation
Zhang Chen, Yinda Zhang, Kyle Genova, Thomas Funkhouser, Sean Fanello, Sofien Bouaziz, Christian Häne, Ruofei Du, Cem Keskin, and Danhang Tang
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Interactive Graphics for a Universally Accessible Metaverse
Chapter Two · Computational Interaction: Algorithm & Systems
Interactive Graphics for a Universally Accessible Metaverse
Chapter Two · Computational Interaction: Algorithm & Systems
Ad hoc UI
CHI EA '22
DepthLab
UIST '20
13K installs & deployed in TikTok, Snap, TeamViewer, etc.
SlurpAR DIS '22
RetroSphere
IMWUT '22
DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality
Ruofei Du, Eric Turner, Maksym Dzitsiuk, Luca Prasso, Ivo Duarte,
Jason Dourgarian, Joao Afonso, Jose Pascoal, Josh Gladstone, Nuno Cruces,
Shahram Izadi, Adarsh Kowdle, Konstantine Tsotsos, David Kim
Google | ACM UIST 2020
Introduction
Mobile Augmented Reality
Introduction
Google's ARCore
Introduction
Motivation
Is the current generation of object placement sufficient for realistic AR experiences?
Introduction
Depth Lab
Not always!
Virtual content looks like it’s “pasted on the screen” rather than “in the world”!
Introduction
Motivation
How can we bring these advanced features to mobile AR experiences WITHOUT relying on dedicated sensors or computationally expensive surface reconstruction?
Introduction
Depth Map
Introduction
Depth Lab
Google | Pixel 2, Pixel 2 XL, Pixel 3, Pixel 3 XL, Pixel 3a, Pixel 3a XL, Pixel 4, Pixel 4 XL
Huawei | Honor 10, Honor V20, Mate 20 Lite, Mate 20, Mate 20 X, Nova 3, Nova 4, P20, P30, P30 Pro
LG | G8X ThinQ, V35 ThinQ, V50S ThinQ, V60 ThinQ 5G
OnePlus | OnePlus 6, OnePlus 6T, OnePlus 7, OnePlus 7 Pro, OnePlus 7 Pro 5G, OnePlus 7T, OnePlus 7T Pro
Oppo | Reno Ace
Samsung | Galaxy A80, Galaxy Note8, Galaxy Note9, Galaxy Note10, Galaxy Note10 5G, Galaxy Note10+, Galaxy Note10+ 5G, Galaxy S8, Galaxy S8+, Galaxy S9, Galaxy S9+, Galaxy S10e, Galaxy S10, Galaxy S10+, Galaxy S10 5G, Galaxy S20, Galaxy S20+ 5G, Galaxy S20 Ultra 5G
Sony | Xperia XZ2, Xperia XZ2 Compact, Xperia XZ2 Premium, Xperia XZ3
Xiaomi | Pocophone F1
Introduction
Depth Lab
Is there more to realism than occlusion?
Introduction
Depth Lab
Surface interaction?
Introduction
Depth Lab
Realistic Physics?
Introduction
Depth Lab
Path Planning?
Introduction
Depth Lab
Related Work
Valentin et al.
Depth Maps
Depth from Motion
Depth From a Single Camera
Best Practices
Depth From a Single Camera
Use depth-certified ARCore devices
Minimal movement in the scene
Encourage users to move the device
Depth from 0 to 8 meters
Best accuracy 0.5 to 5 meters
Enhancing Depth
Optimized to give you the best depth
Depth from Motion is fused with state-of-the-art Machine Learning
Depth leverages specialized hardware like a Time-of-Flight sensor when available
Introduction
Depth Lab
Introduction
Depth Lab
Introduction
Depth Generation
Introduction
Depth Lab
Related Work
Valentin et al.
Introduction
Depth Lab
Introduction
Depth Lab
Depth up to 8 meters, with the best accuracy within 0.5 to 5 meters
Motivation
Gap from raw depth to applications
Introduction
Depth Lab
ARCore
Depth API
DepthLab
Mobile AR developers
Design Process
3 Brainstorming Sessions
3 brainstorming sessions
18 participants
39 aggregated ideas
Design Process
3 Brainstorming Sessions
System
Architecture overview
Data Structure
Depth Array
2D array (160x120 and above) of 16-bit integers
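The depth array stores distance in millimeters as 16-bit integers. A minimal sketch of converting and sampling such a buffer (the helper names are illustrative, not the ARCore API):

```python
import numpy as np

def depth_meters(depth_mm: np.ndarray) -> np.ndarray:
    """Convert a 16-bit millimeter depth array to float meters."""
    return depth_mm.astype(np.float32) / 1000.0

def sample_depth(depth_m: np.ndarray, u: float, v: float) -> float:
    """Nearest-neighbor sample at normalized screen coords (u, v) in [0, 1]."""
    h, w = depth_m.shape
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return float(depth_m[y, x])

# toy 160x120 buffer filled with 1500 mm
buf = np.full((120, 160), 1500, dtype=np.uint16)
print(sample_depth(depth_meters(buf), 0.5, 0.5))  # → 1.5
```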
Data Structure
Depth Mesh
Data Structure
Depth Texture
System
Architecture
Localized Depth
Coordinate System Conversion
Localized Depth
Normal Estimation
Localized Depth
Normal Estimation
Localized Depth
Normal Estimation
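Normal estimation can be sketched by unprojecting neighboring depth samples to 3D points and taking the cross product of their finite differences; this single-sample version is a simplification (DepthLab averages over a neighborhood for robustness):

```python
import numpy as np

def estimate_normal(points: np.ndarray, x: int, y: int) -> np.ndarray:
    """Estimate a surface normal at (x, y) from an HxWx3 array of
    unprojected 3D points, using central differences and a cross product."""
    dpdx = points[y, x + 1] - points[y, x - 1]
    dpdy = points[y + 1, x] - points[y - 1, x]
    n = np.cross(dpdx, dpdy)
    return n / np.linalg.norm(n)

# toy example: a flat plane at z = 2 should yield a normal along +z
h, w = 8, 8
ys, xs = np.mgrid[0:h, 0:w]
pts = np.dstack([xs, ys, np.full((h, w), 2.0)]).astype(np.float32)
print(estimate_normal(pts, 4, 4))  # → [0. 0. 1.]
```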
Localized Depth
Avatar Path Planning
Localized Depth
Rain and Snow
Surface Depth
Use Cases
Surface Depth
Physics collider
Physics with depth mesh.
Surface Depth
Texture decals
Texture decals with depth mesh.
Surface Depth
3D Photo
Projection mapping with depth mesh.
Dense Depth
Depth Texture - Antialiasing
Dense Depth
Real-time relighting
(diagram: surface normal N, light direction L, and the angle θ between them)
Dense Depth
Why doesn't a normal map work?
Dense Depth
Real-time relighting
Dense Depth
Real-time relighting
Dense Depth
Real-time relighting
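At its core, relighting weighs each pixel by the angle between its estimated surface normal and the light direction. A minimal Lambertian sketch (the production shader is more involved than this clamped cosine):

```python
import numpy as np

def lambert(normal, light_dir):
    """Diffuse relighting weight: clamped cosine between the surface
    normal and the normalized direction toward the light."""
    l = np.asarray(light_dir, dtype=np.float64)
    l = l / np.linalg.norm(l)
    return max(float(np.dot(normal, l)), 0.0)

print(lambert([0, 0, 1], [0, 0, 2]))   # light head-on → 1.0
print(lambert([0, 0, 1], [0, 0, -1]))  # light behind the surface → 0.0
```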
Dense Depth
Wide-aperture effect
Dense Depth
Occlusion-based rendering
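Occlusion-based rendering compares each virtual fragment's depth against the real depth map and hides fragments that fall behind real geometry. A sketch with a soft transition (the softness value here is an illustrative choice):

```python
import numpy as np

def occlusion_alpha(virtual_depth, real_depth, softness=0.1):
    """Per-pixel visibility of a virtual fragment: 1 when it is in front
    of the real surface, 0 when behind, with a soft linear transition."""
    # positive when the real surface is farther away than the virtual fragment
    diff = real_depth - virtual_depth
    return np.clip(diff / softness + 0.5, 0.0, 1.0)

print(occlusion_alpha(1.0, 2.0))  # virtual object well in front → 1.0
print(occlusion_alpha(2.0, 1.0))  # virtual object behind the wall → 0.0
```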
Experiments
DepthLab minimum viable application
Experiments
General Profiling of MVP
Experiments
Relighting
Experiments
Aperture effects
Impact
Deployment with partners
Impact
Deployment with partners
Impact
Deployment with partners
AR Realism
In TikTok
AR Realism
Built into Lens Studio for Snapchat Lenses
Kevaid
Saving Chelon
Quixotical | The Seed: World of Anthrotopia
Snap | Dancing Hotdog
Camera Image
3D Point Cloud
Provides a more detailed representation of the geometry of the objects in the scene.
Raw Depth API
New depth capabilities
Camera Image
Raw Depth Image
Depth Image
Confidence Image
New depth capabilities
Raw Depth API
Provides a more detailed representation of the geometry of the objects in the scene.
Try it yourself!
TeamViewer LifeAR App
ARCore Depth Lab App
Depth Hit Test
New depth capabilities
ARCore Depth Lab App
Depth API Codelab
Raw Depth API Codelab
Limitations
Design space of dynamic depth
Dynamic Depth? HoloDesk, HyperDepth, Digits, Holoportation for mobile AR?
Envision
Design space of dynamic depth
GitHub
Please feel free to fork!
Play Store
Try it yourself!
Impact
Significant Media Coverage
Impact
Significant Media Coverage
More Links
Significant Media Coverage
WebXR + ARCore Depth: https://storage.googleapis.com/chromium-webxr-test/r991081/proposals/index.html
Hugging Face Depth: https://huggingface.co/spaces/Detomo/Depth-Estimation
ARCore Depth Lab Play Store App: https://play.google.com/store/apps/details?id=com.google.ar.unity.arcore_depth_lab
DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality
Ruofei Du, Eric Turner, Maksym Dzitsiuk, Luca Prasso, Ivo Duarte,
Jason Dourgarian, Joao Afonso, Jose Pascoal, Josh Gladstone, Nuno Cruces,
Shahram Izadi, Adarsh Kowdle, Konstantine Tsotsos, David Kim
Google | ACM UIST 2020
Thank you!
DepthLab | UIST 2020
Demo
DepthLab | UIST 2020
Ad hoc UI: On-the-fly Transformation of Everyday Objects
into Tangible 6DOF Interfaces for AR
Ruofei Du, Alex Olwal, Mathieu Le Goc, Shengzhi Wu, Danhang Tang,
Yinda Zhang, Jun Zhang, David Joseph Tan, Federico Tombari, David Kim
Google | CHI 2022 Interactivity / In submission to UIST 2022
Applications
“Slurp” Revisited: Using Software Reconstruction to Reflect on Spatial Interactivity and Locative Media
Shengzhi Wu, Daragh Byrne, Ruofei Du, and Molly Steenson
ACM DIS 2022
RetroSphere: Self-Contained Passive 3D Controller Tracking for Augmented Reality
Ananta Narayanan Balaji, Clayton Kimber, David Li, Shengzhi Wu, Ruofei Du, David Kim
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2022
Interactive Graphics for a Universally Accessible Metaverse
Chapter Three · Digital Human & Augmented Communication
HumanGPS CVPR '21
Montage4D
I3D '18
JCGT '19
GazeChat & CollaboVR UIST '21 & ISMAR '20
Neural Head Avatar
CVPR '22 (in submission)
196
ACM Trans. Graph., Vol. 40, No. 4, Article 1. SIGGRAPH 2021
GazeChat
Enhancing Virtual Conferences With
Gaze-Aware 3D Photos
Zhenyi He†, Keru Wang†, Brandon Yushan Feng‡, Ruofei Du⸸, Ken Perlin†
† New York University ‡ University of Maryland, College Park ⸸ Google
Introduction
VR headset & video streaming
219
Related Work
Gaze-2 (2003)
220
Related Work
MultiView (2005)
221
Related Work
MMSpace (2016)
222
Our Work
GazeChat (UIST 2021)
223
Gaze Awareness
Definition
224
Gaze awareness is defined here as knowing what someone is looking at.
Gaze Awareness
Definition
225
gaze correction
gaze redirection
raw input image
GazeChat
Gaze Correction
Definition
226
Gaze Redirection
Definition
227
eye contact
who is looking at whom
Pipeline
System
228
Eye Tracking
WebGazer.js
229
Neural Rendering
Eye movement
230
Neural Rendering
Eye movement
231
3D Photo Rendering
3D photos
232
3D Photo Rendering
3D photos
233
Layouts
UI
234
Networking
WebRTC
235
Zhenyi He* Ruofei Du† Ken Perlin*
*Future Reality Lab, New York University †Google LLC
CollaboVR: A Reconfigurable Framework for
Creative Collaboration in Virtual Reality
ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users
ACM CHI 2022 · Dhruv Jain, Khoa Nguyen, Steven Goodman, Rachel Grossman-Kahn, Hung Ngo, Aditya Kusupati, Ruofei Du, Alex Olwal, Leah Findlater, and Jon Froehlich
SketchyScene: Richly-Annotated Scene Sketches
Changqing Zou, Qian Yu, Ruofei Du, Haoran Mo, Yi-Zhe Song, Tao Xiang, Chengying Gao, Baoquan Chen, and Hao Zhang (ECCV 2018)
Language-based Colorization of Scene Sketches
Changqing Zou, Haoran Mo, Chengying Gao, Ruofei Du, and Hongbo Fu (ACM Transactions on Graphics, SIGGRAPH Asia 2019)
Future Directions
The Ultimate XR Platform
244
Wearable Subtitles
Augmenting Spoken Communication with
Lightweight Eyewear for All-day Captioning
Future Directions
The Ultimate XR Platform
246
Future Directions
Fuses Past Events
247
Future Directions
With the present
248
Future Directions
And look into the future
249
Future Directions
Change the way we communicate in 3D and consume the information
250
Future Directions
Consume the information throughout the world
251
Interactive Graphics for a
Universally Accessible Metaverse
255
Kernel Foveated Rendering
Xiaoxu Meng, Ruofei Du, Matthias Zwicker and Amitabh Varshney
Augmentarium | UMIACS
University of Maryland, College Park
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2018
256
257
258
* Data from SIGGRAPH Asia 2016; prediction by Michael Abrash, October 2016
Application | Resolution | Frame rate | MPixels / sec |
Desktop game | 1920 x 1080 x 1 | 60 | 124 |
2018 VR (HTC Vive PRO) | 1440 x 1600 x 2 | 90 | 414 |
2020 VR * | 4000 x 4000 x 2 | 90 | 2,880 |
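The throughput column above is simply width × height × views × frame rate:

```python
def mpixels_per_sec(width, height, views, fps):
    """Shaded megapixels per second for a display configuration."""
    return width * height * views * fps / 1e6

print(int(mpixels_per_sec(1920, 1080, 1, 60)))  # desktop game → 124
print(int(mpixels_per_sec(1440, 1600, 2, 90)))  # HTC Vive Pro → 414
print(int(mpixels_per_sec(4000, 4000, 2, 90)))  # 2020 VR prediction → 2880
```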
259
260
fovea: the center of the retina, corresponding to the center of the visual field
261
foveal region: where the human eye detects significant detail
peripheral region: where the human eye detects little high-fidelity detail
262
263
Percentage of foveal pixels: 96%, 27%, 4%
* Data from SIGGRAPH 2017, by Anjul Patney, August 2017
264
265
Foveated Rendering
266
267
Related Work
268
Full Resolution
Multi-Pass Foveated Rendering [Guenter et al. 2012]
269
(figure: the Coarse Pixel Shading pipeline: input primitives, rasterizer, early Z, coarse-pixel-size evaluation, coarse quad generation, and shading)
Coarse Pixel Shading (CPS) [Vaidyanathan et al. 2014]
270
CPS with TAA & Contrast Preservation [Patney et al. 2016]
271
Can we change the resolution gradually?
272
Perceptual Foveated Rendering [Stengel et al. 2016]
273
Is there a foveated rendering approach without the expensive pixel interpolation?
274
Log-polar mapping [Araujo and Dias 1996]
Log-polar Mapping
Log-polar Mapping for 2D Image [Antonelli et al. 2015]
282
283
Our Approach
284
Kernel Log-polar Mapping
range: [0,1]
285
Log-polar Mapping
286
Kernel Log-polar Mapping
Kernel Foveated Rendering
287
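A sketch of the idea: classic log-polar mapping compresses the screen radius logarithmically, and the kernel version applies an extra kernel function to redistribute buffer samples toward the fovea. The kernel K(x) = x^α below is an illustrative choice, not the exact parameterization in the paper:

```python
import math

def screen_to_buffer(x, y, fovea, screen_radius, buf_w, buf_h, alpha=4.0):
    """Kernel log-polar mapping (sketch): map a screen pixel to reduced
    log-polar buffer coordinates. alpha > 1 devotes more buffer area to
    the fovea; alpha = 1 reduces to the classic log-polar mapping."""
    dx, dy = x - fovea[0], y - fovea[1]
    r = math.hypot(dx, dy)
    theta = math.atan2(dy, dx) % (2 * math.pi)
    # classic log-polar radial coordinate in [0, 1]
    u = math.log1p(r) / math.log1p(screen_radius)
    # inverse of the kernel K(x) = x**alpha redistributes samples toward the fovea
    u = u ** (1.0 / alpha)
    return u * buf_w, theta / (2 * math.pi) * buf_h

# pixels near the fovea still land deep into the buffer despite a small radius
u_near, _ = screen_to_buffer(960 + 10, 540, (960, 540), 1100, 640, 480)
u_far, _ = screen_to_buffer(960 + 1000, 540, (960, 540), 1100, 640, 480)
print(u_near < u_far)  # → True
```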
288
Kernel log-polar Mapping
291
Original Frame
Buffer
Screen
Sample Map
Eye-Dominance-Guided Foveated Rendering
Xiaoxu Meng, Ruofei Du, and Amitabh Varshney
IEEE Transactions on Visualization and Computer Graphics (TVCG)
304
305
Ocular Dominance: the tendency to prefer scene perception from one eye over the other.
Advantage of the Dominant Eye Over the Non-dominant Eye
306
307
Application | Resolution | Frame rate | MPixels / sec |
Desktop game | 1920 x 1080 x 1 | 60 | 124 |
2018 VR (HTC Vive PRO) | 1440 x 1600 x 2 | 90 | 414 |
2020 VR (Varjo) | 1920 x 1080 x 2 + 1440 x 1600 x 2 | 90 | 788 |
Foveated Rendering
308
Foveated Rendering
309
Percentage of foveal pixels: 96%, 27%, 4%
* Data from SIGGRAPH 2017, by Anjul Patney, August 2017
Can we do better?
More foveation for the non-dominant eye.
A Log-Rectilinear Transformation for Foveated 360-Degree Video Streaming
David Li†, Ruofei Du‡, Adharsh Babu†, Camelia Brumar†, Amitabh Varshney†
† University of Maryland, College Park ‡ Google
UMIACS
TVCG Honorable Mentions Award
Introduction
VR headset & video streaming
316
Introduction
VR + eye tracking
317
HTC Vive Eye
Varjo VR-3
Fove
Introduction
360 videos
318
360° Field of Regard
Scene
Captured 360 Video
Introduction
360 videos
319
Captured 360 Video
Projection to Field of View
Introduction
360 videos
320
Tiling Illustration
Image from Liu et al., 2017
Introduction
Foveated rendering
321
Image Credit: Tobii
Log-polar Transformation,
Image from (Meng et al., 2018)
Introduction
Log-Polar Foveated Streaming
322
323
1
Research Question
Can foveation techniques from rendering be used to optimize 360 video streaming?
324
2
Research Question
How can we reduce foveation artifacts by leveraging the full original video frame?
Log-Polar Foveated Streaming
Original Frame
Subsampled Pixel
Log-Polar Foveated Streaming
Summed-Area Tables
Log-Rectilinear Transformation
Foveated Streaming
Decoding 360° Video
GPU-driven Summed-Area Table Generation
Computing the Log-Rectilinear Buffer
Encoding the Log-Rectilinear Video Stream
Updating the Foveal Position
Decoding the
Log-Rectilinear Video Stream
Transforming into a Full-resolution Video Frame
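The summed-area table is what makes per-pixel area averaging cheap in the pipeline above: after one prefix-sum pass, the mean over any axis-aligned rectangle takes four lookups. A sketch:

```python
import numpy as np

def summed_area_table(img: np.ndarray) -> np.ndarray:
    """SAT with a zero guard row/column so queries need no edge cases."""
    sat = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    sat[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return sat

def box_mean(sat, y0, x0, y1, x1):
    """Mean of img[y0:y1, x0:x1] in O(1) via four SAT lookups."""
    total = sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]
    return total / ((y1 - y0) * (x1 - x0))

img = np.arange(16, dtype=np.float64).reshape(4, 4)
sat = summed_area_table(img)
print(box_mean(sat, 0, 0, 2, 2))  # mean of [[0, 1], [4, 5]] → 2.5
```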
(architecture: a video streaming server and a client communicate over sockets; both sides use FFmpeg for encoding/decoding and OpenCL for the transformation steps)
Qualitative Results
Quantitative Results
We perform quantitative evaluations comparing the log-rectilinear transformation and the log-polar transformation in 360° video streaming.
Quantitative Results
Quantitative Results
Conclusion
Foveation
Summed-Area Tables
Standard Video Codecs
Foveated 360° Video Streaming
Zhenyi He* Ruofei Du† Ken Perlin*
*Future Reality Lab, New York University †Google LLC
CollaboVR: A Reconfigurable Framework for
Creative Collaboration in Virtual Reality
Research Questions:
What are the best layouts and interaction modes?
CollaboVR
Chalktalk (Cloud App)
Audio Communication
Layout Reconfiguration
Layout Reconfiguration
User Arrangements
(1) side-by-side
(2) face-to-face
(3) hybrid
Input Modes
(1) direct
(2) projection
Layout Reconfiguration
User Arrangements
(1) side-by-side
(figure: side-by-side arrangement, with interactive boards inside the tracking range of user 1)
Layout Reconfiguration
User Arrangements
(1) side-by-side
(2) face-to-face
(figure: face-to-face arrangement, showing user 2 as observed by user 1, with left and right hands mirrored)
Layout Reconfiguration
User Arrangements
(1) side-by-side
(2) face-to-face
(figure: side-by-side and face-to-face arrangements with users 1 through 4)
Layout Reconfiguration
User Arrangements
(1) side-by-side
(2) face-to-face
(3) hybrid
(figure: hybrid arrangement with a teacher and users 2 through 4)
Layout Reconfiguration
Input Modes
(1) direct
(2) projection
Layout Reconfiguration
Input Modes
(1) direct
(2) projection
Layout Reconfiguration
Input Modes
(1) direct
(2) projection
C1: Integrated Layout
C2: Mirrored Layout
C3: Projective Layout
Evaluation
Overview of subjective feedback on CollaboVR
Takeaways
more live demos...
Zhenyi He* Ruofei Du† Ken Perlin*
*Future Reality Lab, New York University †Google LLC
CollaboVR: A Reconfigurable Framework for
Creative Collaboration in Virtual Reality
Fusing Physical and Virtual Worlds into
An Interactive Metaverse
Introduction
Depth Map
Introduction
Depth Lab
Thank you!
www.duruofei.com
Introduction
Depth Lab
Occlusion is a critical component for AR realism!
Correct occlusion helps ground content in reality, and makes virtual objects feel as if they are actually in your space.
Introduction
Motivation
Depth Mesh
Generation
Localized Depth
Avatar Path Planning
Dense Depth
Depth Texture
Introduction
Depth Map
Taxonomy
Depth Usage
OmniSyn: Synthesizing 360 Videos with Wide-baseline Panoramas
David Li, Yinda Zhang, Christian Häne, Danhang Tang, Amitabh Varshney, Ruofei Du
382
Problem
383
(figure: OmniSyn performs 360° wide-baseline view synthesis between panoramas with a baseline of ≥ 5 meters)
Related Work: Monocular Neural Image-Based Rendering with Continuous View Control (ICCV 2019)
384
Related Work: SynSin (CVPR 2020)
385
Research Goal
386
Method
387
CoordConv (a channel of constant rows increasing from 0 to 1)
Spherical Cost Volume
Pipeline
388
(pipeline: panoramas 0 and 1 pass through a shared depth predictor; the predicted depths are mesh-rendered into the target pose with [R0|t0] and [R1|t1], producing RGB + visibility maps that a fusion network merges into the target panorama)
Stereo Depth with Cost Volume
389
Stereo 360 Depth with Cost Volume
390
Stereo 360 Depth with Cost Volume
391
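The cost-volume idea can be illustrated in one dimension: sweep over candidate disparities, compare the shifted reference signal against the target, and take the argmin. A toy sketch (the real stereo 360 matching sweeps over inverse depths on the sphere rather than 1D shifts):

```python
import numpy as np

def cost_volume_1d(left: np.ndarray, right: np.ndarray, max_disp: int):
    """Toy cost volume: absolute difference between the left signal and
    the right signal shifted by each candidate disparity."""
    costs = np.stack([
        np.abs(left - np.roll(right, d)) for d in range(max_disp)
    ])  # shape: (max_disp, width)
    return costs

left = np.array([0., 0., 9., 0., 0., 0.])
right = np.roll(left, -2)            # the right view sees the feature shifted by 2
costs = cost_volume_1d(left, right, 4)
disparity = costs.sum(axis=1).argmin()
print(disparity)  # → 2
```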
Mesh Rendering
392
Mesh Rendering
393
(figure: point-cloud vs. mesh rendering at 1 m to 4 m; OmniSyn (Mesh) vs. OmniSyn (Point Cloud) vs. GT visibility)
CoordConv and Circular CNN
394
CoordConv (a channel of constant rows increasing from 0 to 1)
Circular CNN
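CoordConv concatenates normalized coordinate channels onto the feature map so convolutions can reason about absolute position. A sketch adding a single row-coordinate channel:

```python
import numpy as np

def add_coord_channel(feat: np.ndarray) -> np.ndarray:
    """Append a normalized row-coordinate channel (0 at the top row,
    1 at the bottom) to an HxWxC feature map, as in CoordConv."""
    h, w, _ = feat.shape
    rows = np.linspace(0.0, 1.0, h)[:, None].repeat(w, axis=1)
    return np.concatenate([feat, rows[..., None]], axis=-1)

feat = np.zeros((11, 8, 3))
out = add_coord_channel(feat)
print(out.shape)     # → (11, 8, 4)
print(out[5, 0, 3])  # middle row → 0.5
```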
Experiments
395
Results
396
Generalization to Real Street View Panoramas
397
(figure: views synthesized at 0 m, 4.6 m, 9.0 m, and 10.1 m compared against ground truth)
Limitations
398
Input 0
Input 1
Synthesized
Fusion network does not generalize well to unseen colors.
Depth prediction struggles with tall buildings.
Triangle removal may eliminate thin structures.
Conclusion
399