1 of 450

2 of 450

Interactive Graphics for a

Universally Accessible Metaverse

Ruofei Du

Senior Research Scientist

Google Labs, San Francisco

www.ruofeidu.com

Twitter: @DuRuofei

me@duruofei.com

3 of 450

Self Intro

www.duruofei.com

4 of 450

Self Intro

Ruofei Du (杜若飞)

5 of 450

Self Intro

Ruofei Du (杜若飞)

6 of 450

Self Intro

Ruofei Du (杜若飞)

Human-Computer Interaction

Geollery CHI '19, Web3D '19, VR '19

Social Street View Web3D '16 Best Paper Award

VideoFields

Web3D '16

SketchyScene

TOG (SIGGRAPH Asia) '19, ECCV '18

Montage4D

I3D '18

JCGT '19

DepthLab UIST '20

13K Installs

Kernel Foveated Rendering I3D '18, VR '20, TVCG '20

CollaboVR ISMAR '20

LogRectilinear

IEEE VR '21 (TVCG)

TVCG Honorable Mention

GazeChat

UIST '21

Computer Graphics

MDIF ICCV '21

HumanGPS CVPR '21

HandSight

ECCVW '14

TACCESS '15

Ad hoc UI

CHI EA '22

ProtoSound

CHI ‘22

PRIF ECCV '22

Computer

Vision

Rapsai

CHI ‘23

Visual Captions

ThingShare

CHI ‘23

7 of 450

Self Intro

Ruofei Du (杜若飞)

Interaction and Communication

Geollery CHI '19, Web3D '19, VR '19

Social Street View Web3D '16

Best Paper Award

VideoFields

Web3D '16

SketchyScene

TOG (SIGGRAPH Asia) '19, ECCV '18

Montage4D

I3D '18

JCGT '19

DepthLab UIST '20

13K Installs

Kernel Foveated Rendering I3D '18, VR '20, TVCG '20

CollaboVR ISMAR '20

LogRectilinear

IEEE VR '21 (TVCG)

TVCG Honorable Mention

GazeChat

UIST '21

Digital World

Digital Human

HumanGPS CVPR '21

HandSight

ECCVW '14

TACCESS '15

ProtoSound

CHI ‘22

Ad hoc UI

CHI EA '22

OmniSyn

IEEE VR '22

SlurpAR DIS '22

Visual Captions

ThingShare

CHI ‘23

Rapsai

CHI ‘23

8 of 450

Interactive Graphics for a

Universally Accessible Metaverse

Ruofei Du

Senior Research Scientist

Google Labs, San Francisco

www.ruofeidu.com

me@duruofei.com

9 of 450

Metaverse

10 of 450

How is the Metaverse defined by

academia and industry?

11 of 450

Neal Stephenson, 1992.

12 of 450

Metaverse

13 of 450

14 of 450

15 of 450

16 of 450

Metaverse

Future of Internet?

Internet of Things?

Virtual Reality?

Augmented Reality?

Decentralization?

Blockchain + NFT?

Mirrored World?

Digital Twin?

VR OS?

Web 3.0?

17 of 450

The Future of Internet

Internet of Things

Virtual Reality

Augmented Reality

Decentralization

Blockchain

NFT

Mirrored World

Metaverse

Digital Twin

VR OS

Web 3.0

Extended Reality (XR)

Accessibility

Avatars

Co-presence

Economics

Gaming

Wearable

AI

Privacy

Security

Vision

Neural

18 of 450

How do I define Metaverse?

19 of 450

More importantly, what research

directions shall we devote to Metaverse?

20 of 450

Metaverse

The Metaverse envisions a persistent digital world where people are fully connected as virtual representations.

As a teenager, my dream was to live in a metaverse...

However, today I wish the metaverse to be only a tool that makes information more useful and accessible, and helps people live better physical lives.

21 of 450

Interactive Graphics for a Universally Accessible Metaverse

Chapter One · Mirrored World & Real-time Rendering

Chapter Two · Computational Interaction: Algorithm & Systems

Chapter Three · Digital Human & Augmented Communication

22 of 450

Interactive Graphics for a Universally Accessible Metaverse

Chapter One · Mirrored World & Real-time Rendering

Geollery CHI '19, Web3D '19, VRW '19

Social Street View Web3D '16

Best Paper Award

Kernel Foveated Rendering I3D '18, VR '20, TVCG '20

LogRectilinear, OmniSyn

IEEE VR '21 (TVCG), VRW ‘22

TVCG Honorable Mention

23 of 450

How about dating back a little?

24 of 450

25 of 450

Project Geollery.com & Social Street View: Reconstructing a Live Mirrored World With Geotagged Social Media

Ruofei Du, David Li, and Amitabh Varshney

{ruofei, dli7319, varshney}@umiacs.umd.edu | www.Geollery.com | ACM CHI 2019 & Web3D 2016 (Best Paper Award) & Web3D 2019

UMIACS

THE AUGMENTARIUM

VIRTUAL AND AUGMENTED REALITY LAB

AT THE UNIVERSITY OF MARYLAND

COMPUTER SCIENCE

UNIVERSITY OF MARYLAND, COLLEGE PARK

26 of 450

Introduction

Social Media

26

image courtesy: plannedparenthood.org

27 of 450

Introduction

Social Media + Topics

27

image courtesy: huffingtonpost.com

28 of 450

Motivation

Social Media + XR

28

29 of 450

Motivation

Social Media + XR

29

image courtesy:

instagram.com,

facebook.com,

twitter.com

30 of 450

Motivation

2D layout

30

image courtesy:

pinterest.com

31 of 450

Motivation

Immersive Mixed Reality?

31

image courtesy:

viralized.com

32 of 450

Motivation

Pros and cons of the classic

32

33 of 450

Motivation

Pros and cons of the classic

33

34 of 450

Related Work

Social Street View, Du and Varshney

Web3D 2016 Best Paper Award

34

35 of 450

Technical Challenges?

36 of 450

Related Work

Social Street View, Du and Varshney

Web3D 2016 Best Paper Award

36

37 of 450

Related Work

Social Street View, Du and Varshney

Web3D 2016 Best Paper Award

37

38 of 450

Related Work

3D Visual Popularity

Bulbul and Dahyot, 2017

38

39 of 450

Related Work

Virtual Oulu, Kukka et al.

CSCW 2017

39

40 of 450

Related Work

Immersive Trip Reports

Brejcha et al. UIST 2018

40

41 of 450

Related Work

High Fidelity, Inc.

41

42 of 450

Related Work

Facebook Spaces, 2017

42

43 of 450

What's Next?

Research Question 1/3

43

What may a social media platform look like in mixed reality?

44 of 450

What's Next?

Research Question 2/3

44

What if we could allow social media sharing in a live mirrored world?

45 of 450

What's Next?

Research Question 3/3

45

What use cases could benefit from a social media platform in XR?

46 of 450

Geollery.com

A Mixed-Reality Social Media Platform

46

47 of 450

Geollery.com

A Mixed-Reality Social Media Platform

47

48 of 450

48

1

Conception, architecting & implementation

Geollery

A mixed reality system that can depict geotagged social media and online avatars with 3D textured buildings.

49 of 450

49

2

Extending the design space of

3D Social Media Platform

Progressive streaming, aggregation approaches, virtual representation of social media, co-presence with virtual avatars, and collaboration modes.

50 of 450

50

3

Conducting a user study of

Geollery vs. Social Street View

by discussing their benefits, limitations, and potential impacts on future 3D social media platforms.

51 of 450

System Overview

Geollery Workflow

51

52 of 450

System Overview

Geollery Workflow

52

53 of 450

Geollery.com

v2: a major leap

53

54 of 450

System Overview

Geollery Workflow

54

55 of 450

System Overview

2D Map Data

55

56 of 450

System Overview

2D Map Data

56

57 of 450

System Overview

+Avatar +Trees +Clouds

57

58 of 450

System Overview

+Avatar +Trees +Clouds +Night

58

59 of 450

System Overview

Street View Panoramas

59

60 of 450

System Overview

Street View Panoramas

60

61 of 450

System Overview

Street View Panoramas

61

62 of 450

System Overview

Geollery Workflow

62

All data we used is publicly and widely available on the Internet.

63 of 450

Rendering Pipeline

Close-view Rendering

63

64 of 450

Rendering Pipeline

Initial spherical geometries

64

65 of 450

Rendering Pipeline

Depth correction

65
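
As a rough sketch of this depth-correction step (a hypothetical helper written for illustration, not the exact Geollery implementation), a street view panorama's depth map can be used to displace a dense sphere so each vertex sits at its estimated distance from the camera:

```python
import numpy as np

def depth_corrected_sphere(depth, rings=64, segments=128):
    """Displace a unit sphere by an equirectangular depth map (H x W, meters).

    Each vertex direction is sampled from the depth map and scaled by the
    corresponding depth, yielding a local proxy geometry for close-up views.
    """
    h, w = depth.shape
    thetas = np.linspace(0.0, np.pi, rings)        # polar angle (top to bottom)
    phis = np.linspace(-np.pi, np.pi, segments)    # azimuth (around the viewer)
    vertices = []
    for theta in thetas:
        for phi in phis:
            direction = np.array([
                np.sin(theta) * np.cos(phi),
                np.cos(theta),
                np.sin(theta) * np.sin(phi),
            ])
            # Nearest-neighbor lookup into the equirectangular depth map.
            row = min(int(theta / np.pi * (h - 1)), h - 1)
            col = min(int((phi + np.pi) / (2 * np.pi) * (w - 1)), w - 1)
            vertices.append(direction * depth[row, col])
    return np.asarray(vertices)
```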

66 of 450

Rendering Pipeline

Intersection removal

66

67 of 450

Rendering Pipeline

Texturing individual geometry

67

68 of 450

Rendering Pipeline

Texturing with alpha blending

68

69 of 450

Rendering Pipeline

Rendering result in the fine detail

69

70 of 450

Rendering Pipeline

Rendering result in the fine detail

70

71 of 450

Rendering Pipeline

Rendering result in the fine detail

71

72 of 450

User Study

Social Street View vs. Geollery

72

73 of 450

User Study

Quantitative Evaluation

73

74 of 450

User Study

Quantitative Evaluation

74

75 of 450

75

I would like to use it for the food in different restaurants. I am always hesitating of different restaurants. It will be very easy to see all restaurants with street views. In Yelp, I can only see one restaurant at a time.

P6 / F

76 of 450

76

[I will use it for] exploring new places. If I am going on vacation somewhere, I could immerse myself into the location. If there are avatars around that area, I could ask questions.

P1 / M

77 of 450

77

I think it (Geollery) will be useful for families. I just taught my grandpa how to use Facetime last week and it would [be] great if I could teleport to their house and meet with them, then we could chat and share photos with our avatars.

P2 / F

78 of 450

78

if there is a way to unify the interaction between them, there will be more realistic buildings [and] you could have more roof structures. Terrains will be interesting to add on.

P18 / M

79 of 450

Rendering Pipeline

Experimental Features

79

80 of 450

Landing Impact

Demos at ACM CHI 2019

80

81 of 450

Landing Impact

Demos at ACM CHI 2019

81

82 of 450

Landing Impact

Demos at ACM CHI 2019

82

83 of 450

Instant Panoramic Texture Mapping with Semantic Object Matching for Large-Scale Urban Scene Reproduction

TVCG 2021, Jinwoo Park, Ik-beom Jeon, Student Members, Sung-eui Yoon, and Woontack Woo

84 of 450

Instant Panoramic Texture Mapping with Semantic Object Matching for Large-Scale Urban Scene Reproduction

TVCG 2021, Jinwoo Park, Ik-beom Jeon, Student Members, Sung-eui Yoon, and Woontack Woo

A more applicable method for constructing walk-through experiences in urban streets was employed by Geollery [16], which adopted an efficient transformation of a dense spherical mesh to construct a local proxy geometry based on the depth maps from Google Street View

85 of 450

Freeman et al. ACM PHCI 2022

He et al. ISMAR 2020

Park et al. Virtual Reality 2022

Yeom et al. IEEE VR 2021

86 of 450

What's Next?

87 of 450

Video Fields: Fusing Multiple Surveillance Videos into a Dynamic Virtual Environment

88 of 450

image courtesy: university of maryland, college park

Introduction

Surveillance Videos

89 of 450

Architecture

Video Fields Flowchart

90 of 450

91 of 450

OmniSyn: Intermediate View Synthesis Between Wide-Baseline Panoramas

David Li, Yinda Zhang, Christian Häne, Danhang Tang, Amitabh Varshney, and Ruofei Du, VR 2022

92 of 450

OmniSyn: Intermediate View Synthesis Between Wide-Baseline Panoramas

David Li, Yinda Zhang, Christian Häne, Danhang Tang, Amitabh Varshney, and Ruofei Du, VR 2022

93 of 450

input 1

input 2

input 3

94 of 450

95 of 450

How can we further accelerate the real-time rendering procedure?

96 of 450

96

Kernel Foveated Rendering

Xiaoxu Meng, Ruofei Du, Matthias Zwicker and Amitabh Varshney

Augmentarium | UMIACS

University of Maryland, College Park

ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2018

97 of 450

97

Original Frame

Buffer

Screen

Sample Map

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

98 of 450

98

 

 

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

 

Kernel Log-polar Mapping

 

99 of 450

99

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

100 of 450

Can we further accelerate it?

101 of 450

Eye-Dominance-Guided Foveated Rendering

Xiaoxu Meng, Ruofei Du, and Amitabh Varshney

IEEE Transactions on Visualization and Computer Graphics (TVCG)

102 of 450

fovea

fovea

more foveation for the non-dominant eye

103 of 450

What if we apply it to 3D volume data formats?

104 of 450

3D-Kernel Foveated Rendering for Light Fields

Xiaoxu Meng, Ruofei Du, Joseph JaJa, and Amitabh Varshney

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2020

UMIACS

105 of 450

106 of 450

How about 360 video streaming?

107 of 450

A Log-Rectilinear Transformation for Foveated 360-Degree Video Streaming

David Li, Ruofei Du, Adharsh Babu, Camelia Brumar, Amitabh Varshney

University of Maryland, College Park | Google Research

UMIACS

TVCG Honorable Mentions Award

108 of 450

109 of 450

110 of 450

With recent advances in neural networks,

how can we further

compress existing graphics?

111 of 450

Sandwiched Image Compression

Wrapping Neural Networks Around a Standard Codec

Increasing the Resolution and Dynamic Range of Standard Codecs

Onur Guleryuz, Philip Chou, Hugues Hoppe, Danhang Tang, Ruofei Du, Philip Davidson, and Sean Fanello

2021 IEEE International Conference on Image Processing (ICIP) & 2022 Picture Coding Symposium (PCS) Best Paper Finalist

112 of 450

113 of 450

114 of 450

115 of 450

What about compressing 3D volumes?

116 of 450

What are levels of detail?

117 of 450

118 of 450

Multiresolution Deep Implicit Functions for 3D Shape Representation

Zhang Chen, Yinda Zhang, Kyle Genova, Thomas Funkhouser, Sean Fanello, Sofien Bouaziz, Christian Häne, Ruofei Du, Cem Keskin, and Danhang Tang

2021 IEEE/CVF International Conference on Computer Vision (ICCV)

119 of 450

120 of 450

121 of 450

Interactive Graphics for a Universally Accessible Metaverse

Chapter Two · Computational Interaction: Algorithm & Systems

Ad hoc UI

CHI EA ‘22

DepthLab

UIST '20

13K installs & deployed in TikTok, Snap, TeamViewer, etc.

SlurpAR DIS '22

Rapsai

CHI ‘23

122 of 450

DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality

Ruofei Du, Eric Turner, Maksym Dzitsiuk, Luca Prasso, Ivo Duarte,

Jason Dourgarian, Joao Afonso, Jose Pascoal, Josh Gladstone, Nuno Cruces,

Shahram Izadi, Adarsh Kowdle, Konstantine Tsotsos, David Kim

Google | ACM UIST 2020

123 of 450

Introduction

Mobile Augmented Reality

124 of 450

Introduction

Google's ARCore

125 of 450

Introduction

Google's ARCore

126 of 450

Introduction

Mobile Augmented Reality

127 of 450

Introduction

Motivation

Is the current generation of object placement sufficient for realistic AR experiences?

128 of 450

Introduction

Depth Lab

Not always!

129 of 450

Introduction

Depth Lab

Virtual content looks like it’s “pasted on the screen” rather than “in the world”!

130 of 450

Introduction

Motivation

131 of 450

132 of 450

Introduction

Depth Lab

How can we bring these advanced

features to mobile AR experiences WITHOUT relying on dedicated sensors or the need for computationally expensive surface reconstruction?

133 of 450

Introduction

Depth Map

134 of 450

Introduction

Depth Lab

Google

Pixel 2, Pixel 2 XL, Pixel 3, Pixel 3 XL, Pixel 3a, Pixel 3a XL, Pixel 4, Pixel 4 XL

Huawei

Honor 10, Honor V20, Mate 20 Lite, Mate 20, Mate 20 X, Nova 3, Nova 4, P20, P30, P30 Pro

LG

G8X ThinQ, V35 ThinQ, V50S ThinQ, V60 ThinQ 5G

OnePlus

OnePlus 6, OnePlus 6T, OnePlus 7, OnePlus 7 Pro, OnePlus 7 Pro 5G, OnePlus 7T, OnePlus 7T Pro

Oppo

Reno Ace

Samsung

Galaxy A80, Galaxy Note8, Galaxy Note9, Galaxy Note10, Galaxy Note10 5G, Galaxy Note10+, Galaxy Note10+ 5G, Galaxy S8, Galaxy S8+, Galaxy S9, Galaxy S9+, Galaxy S10e, Galaxy S10, Galaxy S10+, Galaxy S10 5G, Galaxy S20, Galaxy S20+ 5G, Galaxy S20 Ultra 5G

Sony

Xperia XZ2, Xperia XZ2 Compact, Xperia XZ2 Premium, Xperia XZ3

Xiaomi

Pocophone F1

135 of 450

Introduction

Depth Lab

Is there more to realism than occlusion?

136 of 450

Introduction

Depth Lab

Surface interaction?

137 of 450

Introduction

Depth Lab

Realistic Physics?

138 of 450

Introduction

Depth Lab

Path Planning?

139 of 450

140 of 450

Introduction

Depth Lab

141 of 450

Related Work

Valentin et al.

142 of 450

Depth Maps

143 of 450

Depth from Motion

Depth From a Single Camera

144 of 450

Best Practices

Depth From a Single Camera

Use depth-certified ARCore devices

Minimal movement in the scene

Encourage users to move the device

Depth from 0 to 8 meters

Best accuracy 0.5 to 5 meters

145 of 450

Enhancing Depth

Optimized to give you the best depth

Depth from Motion is fused with state-of-the-art Machine Learning

Depth leverages specialized hardware like a Time-of-Flight sensor when available

146 of 450

Introduction

Depth Lab

147 of 450

Introduction

Depth Lab

148 of 450

Introduction

Depth Generation

149 of 450

Introduction

Depth Lab

150 of 450

Related Work

Valentin et al.

151 of 450

Introduction

Depth Lab

152 of 450

Introduction

Depth Lab

Up to 8 meters, with

the best within 0.5m to 5m

153 of 450

Motivation

Gap from raw depth to applications

154 of 450

Introduction

Depth Lab

ARCore

Depth API

DepthLab

Mobile AR developers

155 of 450

Design Process

3 Brainstorming Sessions

3 brainstorming sessions

18 participants

39 aggregated ideas

156 of 450

Design Process

3 Brainstorming Sessions

157 of 450

System

Architecture overview

158 of 450

Data Structure

Depth Array

2D array (160x120 and above) of 16-bit integers
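
As a minimal sketch of how this array is consumed (an illustrative helper only; the AR framework supplies the actual values), a 16-bit depth image in millimeters can be sampled at normalized screen coordinates and converted to meters:

```python
import numpy as np

def depth_in_meters(depth_mm: np.ndarray, u: float, v: float) -> float:
    """Sample a 16-bit depth array (e.g., 160x120, millimeters) at normalized
    screen coordinates (u, v) in [0, 1] and return the depth in meters."""
    h, w = depth_mm.shape
    x = min(int(u * (w - 1)), w - 1)
    y = min(int(v * (h - 1)), h - 1)
    return float(depth_mm[y, x]) / 1000.0

# Example: a synthetic 160x120 array filled with 1500 mm reads back as 1.5 m.
depth = np.full((120, 160), 1500, dtype=np.uint16)
print(depth_in_meters(depth, 0.5, 0.5))  # 1.5
```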

159 of 450

Data Structure

Depth Mesh

160 of 450

Data Structure

Depth Texture

161 of 450

System

Architecture

162 of 450

Localized Depth

Coordinate System Conversion
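
A minimal sketch of this conversion (pinhole model with placeholder intrinsics; the real system uses the AR framework's own camera parameters): a screen point and its depth are back-projected into camera space, then transformed into world space with the camera pose.

```python
import numpy as np

def screen_to_camera(x_px, y_px, depth_m, fx, fy, cx, cy):
    """Back-project pixel (x_px, y_px) with depth in meters into camera space,
    given pinhole intrinsics (fx, fy: focal lengths; cx, cy: principal point)."""
    x = (x_px - cx) / fx * depth_m
    y = (y_px - cy) / fy * depth_m
    return np.array([x, y, depth_m])

def camera_to_world(point_cam, camera_pose_4x4):
    """camera_pose_4x4 is a camera-to-world transform."""
    homogeneous = np.append(point_cam, 1.0)
    return (camera_pose_4x4 @ homogeneous)[:3]
```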

163 of 450

Localized Depth

Normal Estimation
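
One common way to estimate a normal from the depth map (a simplified sketch, not necessarily the exact DepthLab filter, which averages over a neighborhood) is to back-project a pixel and two of its neighbors and take the cross product of the resulting tangents:

```python
import numpy as np

def estimate_normal(depth_m, x, y, fx, fy, cx, cy):
    """Estimate a surface normal at pixel (x, y) from a depth map in meters.
    Assumes (x, y) is not on the last row or column of the map."""
    def backproject(px, py):
        d = depth_m[py, px]
        return np.array([(px - cx) / fx * d, (py - cy) / fy * d, d])

    tangent_x = backproject(x + 1, y) - backproject(x, y)
    tangent_y = backproject(x, y + 1) - backproject(x, y)
    normal = np.cross(tangent_y, tangent_x)
    length = np.linalg.norm(normal)
    return normal / length if length > 0 else np.array([0.0, 0.0, -1.0])
```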

164 of 450

Localized Depth

Normal Estimation

165 of 450

Localized Depth

Normal Estimation

166 of 450

Localized Depth

Avatar Path Planning

167 of 450

Localized Depth

Rain and Snow

168 of 450

Surface Depth

Use Cases

169 of 450

Surface Depth

Physics collider

Physics with depth mesh.

170 of 450

Surface Depth

Texture decals

Texture decals with depth mesh.

171 of 450

Surface Depth

3D Photo

Projection mapping with depth mesh.

172 of 450

Dense Depth

Depth Texture - Antialiasing

173 of 450

Dense Depth

Real-time relighting

[Diagram: angle θ between the surface normal N and the light direction L]
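
The shading term behind this depth-based relighting is essentially a Lambertian falloff; as a simplified sketch (the full effect also accounts for light distance and multiple lights), with estimated normal N and light direction L:

```latex
% Relit intensity of a camera pixel, with ambient term a in [0, 1]
% (a is an assumed parameter of this sketch, not a value from the paper):
I_{\text{relit}} = I_{\text{camera}} \cdot \big( a + (1 - a)\,\max(0,\ \mathbf{N}\cdot\mathbf{L}) \big),
\qquad \cos\theta = \mathbf{N}\cdot\mathbf{L}
```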

174 of 450

Dense Depth

Why does a normal map not work?

175 of 450

Dense Depth

Real-time relighting

176 of 450

Dense Depth

Real-time relighting

177 of 450

Dense Depth

Real-time relighting

go/realtime-relighting, go/relit

178 of 450

Dense Depth

Wide-aperture effect

179 of 450

Dense Depth

Occlusion-based rendering
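
Occlusion-based rendering reduces to a per-pixel depth comparison between the virtual fragment and the real-world depth map; a minimal numpy sketch (the soft transition band is an assumption of this example, used to avoid hard edges):

```python
import numpy as np

def occlusion_visibility(virtual_depth_m, real_depth_m, soft_band_m=0.05):
    """Per-pixel visibility of virtual content: 1 = fully visible,
    0 = fully hidden behind real geometry that is closer to the camera."""
    delta = real_depth_m - virtual_depth_m   # positive: virtual is in front
    return np.clip(delta / soft_band_m + 0.5, 0.0, 1.0)

# Composite: rendered = visibility[..., None] * virtual_rgb
#                       + (1 - visibility[..., None]) * camera_rgb
```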

180 of 450

181 of 450

Experiments

DepthLab minimum viable application

182 of 450

Experiments

General Profiling of MVP

183 of 450

Experiments

Relighting

184 of 450

Experiments

Aperture effects

185 of 450

Impact

Deployment with partners

186 of 450

Impact

Deployment with partners

187 of 450

Impact

Deployment with partners

188 of 450

AR Realism

In TikTok

189 of 450

AR Realism

Built into Lens Studio for Snapchat Lenses

Kevaid

Saving Chelon

Quixotical
The Seed: World of Anthrotopia

Snap
Dancing Hotdog

190 of 450

Camera Image

3D Point Cloud

Provides a more detailed representation of the geometry of the objects in the scene.

Raw Depth API

New depth capabilities

191 of 450

Camera Image

Raw Depth Image

Depth Image

Confidence Image

New depth capabilities

Raw Depth API

Provides a more detailed representation of the geometry of the objects in the scene.

192 of 450

Try it yourself!

TeamViewer LifeAR App

ARCore Depth Lab App

Depth Hit Test

New depth capabilities

193 of 450

ARCore Depth Lab App

Depth API Codelab

Raw Depth API Codelab

194 of 450

Limitations

Design space of dynamic depth

Dynamic Depth? HoloDesk, HyperDepth, Digits, Holoportation for mobile AR?

195 of 450

Envision

Design space of dynamic depth

196 of 450

GitHub

Please feel free to fork!

197 of 450

Play Store

Try it yourself!

198 of 450

Impact

Significant Media Coverage

199 of 450

Impact

Significant Media Coverage

200 of 450

More Links

Significant Media Coverage

201 of 450

DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality

Ruofei Du, Eric Turner, Maksym Dzitsiuk, Luca Prasso, Ivo Duarte,

Jason Dourgarian, Joao Afonso, Jose Pascoal, Josh Gladstone, Nuno Cruces,

Shahram Izadi, Adarsh Kowdle, Konstantine Tsotsos, David Kim

Google | ACM UIST 2020

202 of 450

Thank you!

DepthLab | UIST 2020

203 of 450

Demo

DepthLab | UIST 2020

204 of 450

DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality

Ruofei Du, Eric Turner, Maksym Dzitsiuk, Luca Prasso, Ivo Duarte,

Jason Dourgarian, Joao Afonso, Jose Pascoal, Josh Gladstone, Nuno Cruces,

Shahram Izadi, Adarsh Kowdle, Konstantine Tsotsos, David Kim

Google | ACM UIST 2020

205 of 450

After exploring interaction with the environment,

how shall we interact with everyday objects?

206 of 450

Ad hoc UI: On-the-fly Transformation of Everyday Objects

into Tangible 6DOF Interfaces for AR

Ruofei Du, Alex Olwal, Mathieu Le Goc, Shengzhi Wu, Danhang Tang,

Yinda Zhang, Jun Zhang, David Joseph Tan, Federico Tombari, David Kim

Google | CHI 2022 Interactivity

207 of 450

208 of 450

Applications

209 of 450

Can we learn from history to

interact with everyday objects?

210 of 450

“Slurp” Revisited: Using Software Reconstruction to Reflect on Spatial Interactivity and Locative Media

Shengzhi Wu, Daragh Byrne, Ruofei Du, and Molly Steenson

ACM DIS 2022

211 of 450

212 of 450

RetroSphere: Self-Contained Passive 3D Controller Tracking for Augmented Reality

Ananta Narayanan Balaji, Clayton Kimber, David Li, Shengzhi Wu, Ruofei Du, David Kim

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2022

213 of 450

Ananta Narayanan Balaji, Clayton Kimber, David Li, Shengzhi Wu, Ruofei Du, David Kim

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2022

214 of 450

215 of 450

With recent advances in on-device ML models,

how can we accelerate the prototyping efforts?

216 of 450

What if we can build applications as if building Legos?

217 of 450

Rapsai

Accelerating Machine Learning Prototyping of Multimedia Applications through Visual Programming

Ruofei Du, Na Li, Jing Jin, Michelle Carney, Scott Miles, Maria Kleiner, Xiuxiu Yuan, Yinda Zhang,

Anuva Kulkarni, Xingyu "Bruce" Liu, Ahmed Sabie, Sergio Orts-Escolano, Abhishek Kar,

Ping Yu, Ram Iyengar, Adarsh Kowdle, and Alex Olwal

218 of 450

219 of 450

Interactive Graphics for a Universally Accessible Metaverse

Chapter Three · Digital Human & Augmented Communication

HumanGPS CVPR ‘21

Montage4D

I3D '18

JCGT '19

GazeChat & CollaboVR UIST '21 & ISMAR '20

Visual Captions & ThingShare CHI '23

220 of 450

What is an Avatar?

221 of 450

222 of 450

Avatar

History & Definition

Avatar is a term used in Hinduism for a material manifestation of a deity: the "descent of a deity from heaven."

223 of 450

Avatar

History & Definition

In computing, an avatar is a graphical representation of a user or the user's character or persona.

224 of 450

Avatar

Taxonomy

What is the oldest avatar in computer history?

225 of 450

Avatar

History & Definition

226 of 450

Avatar

History & Definition

Guo, Kaiwen, Peter Lincoln, Philip Davidson, Jay Busch, Xueming Yu, Matt Whalen, Geoff Harvey et al. "The relightables: Volumetric performance capture of humans with realistic relighting." ACM Transactions on Graphics (ToG) 38, no. 6 (2019): 1-19.

227 of 450

Dating back to real-time digital humans / avatars…

228 of 450

228

229 of 450

230 of 450

231 of 450

232 of 450

233 of 450

234 of 450

235 of 450

236 of 450

237 of 450

238 of 450

239 of 450

240 of 450

241 of 450

242 of 450

What is the state-of-the-art since then?

243 of 450

Related Work

Fusing Multiple Dynamic Videos

244 of 450

ACM Trans. Graph., Vol. 40, No. 4, Article 1. SIGGRAPH 2021

245 of 450

Photorealistic Characters

The Relightables

Kaiwen Guo, Peter Lincoln, Philip Davidson, Jay Busch, Xueming Yu, Matt Whalen, Geoff Harvey, Sergio Orts-Escolano, Rohit Pandey, Jason Dourgarian, Danhang Tang, Anastasia Tkach, Adarsh Kowdle, Emily Cooper, Mingsong Dou, Sean Fanello, Graham Fyffe, Christoph Rhemann, Jonathan Taylor, Paul Debevec, and Shahram Izadi. 2019. The Relightables: Volumetric Performance Capture of Humans With Realistic Relighting. ACM Transactions on Graphics 38, 6 (2019). DOI: https://doi.org/10.1145/3355089.3356571

246 of 450

ACM Trans. Graph., Vol. 40, No. 4, Article 1. SIGGRAPH 2021

247 of 450

Photorealistic Characters

Rocketbox

Mar Gonzalez-Franco, Eyal Ofek, Ye Pan, Angus Antley, Anthony Steed, Bernhard Spanlang, Antonella Maselli, Domna Banakou, Nuria Pelechano, Sergio Orts Escolano, Veronica Orvahlo, Laura Trutoiu, Markus Wojcik, Maria V. Sanchez-Vives, Jeremy Bailenson, Mel Slater, and Jaron Lanier "The Rocketbox library and the utility of freely available rigged avatars." Frontiers in Virtual Reality DOI: 10.3389/frvir.2020.561558

248 of 450

Photorealistic Characters

From phone scan

Chen Cao, Tomas Simon, Jin Kyu Kim, Gabe Schwartz, Michael Zollhoefer, Shunsuke Saito, Stephen Lombardi, Shih-En Wei, Danielle Belko, Shoou-I Yu, Yaser Sheikh, and Jason Saragih. 2022. Authentic Volumetric Avatars From a Phone Scan. ACM Transactions on Graphics. DOI: https://doi.org/10.1145/3528223.3530143

249 of 450

How can we build dynamic dense correspondence

within the same subject and

among different subjects?

250 of 450

251 of 450

252 of 450

253 of 450

254 of 450

255 of 450

How can we leverage real-time Avatars today?

256 of 450

GazeChat

Enhancing Virtual Conferences With

Gaze-Aware 3D Photos

Zhenyi He, Keru Wang, Brandon Yushan Feng, Ruofei Du, Ken Perlin

New York University | University of Maryland, College Park | Google

257 of 450

258 of 450

Introduction

VR headset & video streaming

258

259 of 450

Related Work

Gaze-2 (2003)

259

260 of 450

Related Work

MultiView (2005)

260

261 of 450

Related Work

MMSpace (2016)

261

262 of 450

Our Work

GazeChat (UIST 2021)

262

263 of 450

Gaze Awareness

Definition

263

Gaze awareness is defined here as knowing what someone is looking at.

264 of 450

Gaze Awareness

Definition

264

gaze correction

gaze redirection

raw input image

GazeChat

265 of 450

Gaze Correction

Definition

265

266 of 450

Gaze Redirection

Definition

266

eye contact

who is looking at whom

267 of 450

Pipeline

System

267

268 of 450

Eye Tracking

WebGazer.js

268

269 of 450

Neural Rendering

Eye movement

269

270 of 450

Neural Rendering

Eye movement

270

271 of 450

3D Photo Rendering

3D photos

271

272 of 450

3D Photo Rendering

3D photos

272

273 of 450

Layouts

UI

273

274 of 450

Networking

WebRTC

274

275 of 450

How can we work in XR as stylized avatars?

276 of 450

Zhenyi He*, Ruofei Du, and Ken Perlin*

*Future Reality Lab, New York University | Google LLC

CollaboVR: A Reconfigurable Framework for

Creative Collaboration in Virtual Reality

277 of 450

278 of 450

How can we further augment communication,

in videoconferencing, AR, and XR in the future?

279 of 450

Visual Captions

Augmenting Verbal Communication

With On-the-Fly Visuals

Xingyu "Bruce" Liu, Vladimir Kirilyuk, Xiuxiu Yuan, Peggy Chi,

Xiang "Anthony" Chen, Alex Olwal, and Ruofei Du

280 of 450

281 of 450

ThingShare

Ad-Hoc Digital Copies of Physical Objects for Sharing Things in Video Meetings

Erzhen Hu, Jens Emil Grønbæk, Wen Ying, Ruofei Du, and Seongkook Heo

282 of 450

283 of 450

How can AI benefit a broader inclusive community?

284 of 450

ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users

ACM CHI 2022 · Dhruv Jain, Khoa Nguyen, Steven Goodman, Rachel Grossman-Kahn, Hung Ngo, Aditya Kusupati, Ruofei Du, Alex Olwal, Leah Findlater, and Jon Froehlich

285 of 450

286 of 450

How can AI + Metaverse improve our lives?

287 of 450

SketchyScene: Richly-Annotated Scene Sketches

Changqing Zou, Qian Yu, Ruofei Du, Haoran Mo, Yi-Zhe Song, Tao Xiang, Chengying Gao, Baoquan Chen, and Hao Zhang (ECCV 2018)

288 of 450

Language-based Colorization of Scene Sketches

Changqing Zou, Haoran Mo, Chengying Gao, Ruofei Du, and Hongbo Fu (ACM Transaction on Graphics, SIGGRAPH Asia 2019)

289 of 450

290 of 450

291 of 450

292 of 450

Future Directions

The Ultimate XR Platform

292

293 of 450

Wearable Subtitles

Augmenting Spoken Communication with

Lightweight Eyewear for All-day Captioning

294 of 450

Future Directions

The Ultimate XR Platform

294

295 of 450

Future Directions

Fuses Past Events

295

296 of 450

Future Directions

With the present

296

297 of 450

Future Directions

And look into the future

297

298 of 450

Future Directions

Change the way we communicate in 3D and consume the information

298

299 of 450

Future Directions

Consume the information throughout the world

299

300 of 450

Interactive Graphics for a

Universally Accessible Metaverse

Ruofei Du

Senior Research Scientist

Google Labs, San Francisco

www.ruofeidu.com

me@duruofei.com

Thank you!

301 of 450

Interactive Graphics for a

Universally Accessible Metaverse

Ruofei Du

Senior Research Scientist

Google Labs, San Francisco

www.ruofeidu.com

me@duruofei.com

302 of 450

Interactive Perception & Graphics for a

Universally Accessible Metaverse

Ruofei Du

Senior Research Scientist

Google Labs, San Francisco

www.ruofeidu.com

me@duruofei.com

303 of 450

Interactive Graphics for a

Universally Accessible Metaverse

Ruofei Du

Senior Research Scientist

Google Labs, San Francisco

www.ruofeidu.com

me@duruofei.com

304 of 450

Interactive Graphics for a

Universally Accessible Metaverse

Ruofei Du

Senior Research Scientist

Google Labs, San Francisco

www.ruofeidu.com

me@duruofei.com

305 of 450

306 of 450

306

Kernel Foveated Rendering

Xiaoxu Meng, Ruofei Du, Matthias Zwicker and Amitabh Varshney

Augmentarium | UMIACS

University of Maryland, College Park

ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2018

307 of 450

307

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

Application | Resolution | Frame rate | MPixels/sec
Desktop game | 1920 x 1080 x 1 | 60 | 124

308 of 450

308

Application | Resolution | Frame rate | MPixels/sec
Desktop game | 1920 x 1080 x 1 | 60 | 124
2018 VR (HTC Vive Pro) | 1440 x 1600 x 2 | 90 | 414

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

309 of 450

309

* Data from SIGGRAPH Asia 2016; prediction by Michael Abrash, October 2016

Application | Resolution | Frame rate | MPixels/sec
Desktop game | 1920 x 1080 x 1 | 60 | 124
2018 VR (HTC Vive Pro) | 1440 x 1600 x 2 | 90 | 414
2020 VR * | 4000 x 4000 x 2 | 90 | 2,880

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

310 of 450

310

  • Virtual reality is a challenging workload

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

311 of 450

311

  • Virtual reality is a challenging workload

  • Most VR pixels are peripheral

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

fovea:

the center of the retina

corresponds to the center of the visual field

312 of 450

312

  • Virtual reality is a challenging workload

  • Most VR pixels are peripheral

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

foveal region:

the human eye detects significant detail

peripheral region:

the human eye detects little high fidelity detail

313 of 450

313

  • Virtual reality is a challenging workload

  • Most VR pixels are peripheral

foveal

region

foveal region

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

foveal region:

the human eye detects significant detail

peripheral region:

the human eye detects little high fidelity detail

314 of 450

314

  • Virtual reality is a challenging workload

  • Most VR pixels are peripheral

[Chart: percentage of foveal pixels: 96%, 27%, 4%]

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

* Data from Siggraph 2017, by Anjul Patney, August 2017

315 of 450

315

316 of 450

316

Foveated Rendering

317 of 450

317

  • Virtual reality is a challenging workload

  • Most VR pixels are peripheral

  • Eye tracking technology available

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

318 of 450

318

Related Work

319 of 450

319

Full Resolution

 

 

Multi-Pass Foveated Rendering [Guenter et al. 2012]

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

320 of 450

320

[Pipeline diagram: Input primitives, Rasterizer, Early Z, Evaluate Coarse Pixel Size, Generate Coarse Quad, Shade]

Coarse Pixel Shading (CPS) [Vaidyanathan et al. 2014]

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

321 of 450

321

CPS with TAA & Contrast Preservation [Patney et al. 2016]

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

322 of 450

322

Can we change the resolution gradually?

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

323 of 450

323

Perceptual Foveated Rendering [Stengel et al. 2016]

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

324 of 450

324

Is there a foveated rendering approach

without

the expensive pixel interpolation?

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

325 of 450

325

 

 

 

 

Log-polar mapping [Araujo and Dias 1996]

 

 

Log-polar Mapping

 

 

 

 

 

 

 

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion
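
The mapping equations did not survive extraction; as a sketch of the standard log-polar formulation (my notation), a screen pixel (x, y) with the fovea at (x0, y0) maps to normalized buffer coordinates (u, v):

```latex
r = \sqrt{(x - x_0)^2 + (y - y_0)^2}, \qquad
\theta = \operatorname{atan2}(y - y_0,\ x - x_0)

u = \frac{\log r}{\log r_{\max}}, \qquad v = \frac{\theta}{2\pi}
% Pixels near the fovea occupy proportionally more of the reduced-resolution
% buffer than peripheral pixels, which is what makes the foveation cheap.
```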

326 of 450

326

Log-polar mapping [Araujo and Dias 1996]

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

 

 

Log-polar Mapping

327 of 450

327

Log-polar mapping [Araujo and Dias 1996]

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

 

 

Log-polar Mapping

328 of 450

328

Log-polar mapping [Araujo and Dias 1996]

 

 

 

Log-polar Mapping

 

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

329 of 450

329

Log-polar mapping [Araujo and Dias 1996]

 

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

 

 

 

Log-polar Mapping

330 of 450

330

Log-polar mapping [Araujo and Dias 1996]

 

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

 

 

 

Log-polar Mapping

331 of 450

331

Log-polar mapping [Araujo and Dias 1996]

 

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

 

 

 

Log-polar Mapping

332 of 450

332

Log-polar Mapping for 2D Image [Antonelli et al. 2015]

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

333 of 450

333

Log-polar Mapping for 2D Image

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

334 of 450

334

Our Approach

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

335 of 450

335

Kernel Log-polar Mapping

 

 

 

 

 

range: [0,1]

 

 

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion
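
The kernel equations were also lost in extraction; as a simplified sketch of the idea (my notation, not the paper's exact parameterization), kernel foveated rendering inserts a kernel K into the normalized log-radius, and the kernel's shape controls how buffer samples are distributed between the fovea and the periphery:

```latex
u = K\!\left(\frac{\log r}{\log r_{\max}}\right), \qquad
K : [0, 1] \to [0, 1],\ \ K(0) = 0,\ K(1) = 1,\ K \text{ monotonically increasing}
% One simple kernel assumed for this sketch is K(x) = x^{\sigma}, where the
% foveation parameter \sigma trades peripheral resolution for rendering speed.
```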

336 of 450

336

 

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

 

 

 

Log-polar Mapping

337 of 450

337

 

 

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

 

Kernel Log-polar Mapping

 

338 of 450

Kernel Foveated Rendering

338

 

339 of 450

339

Kernel log-polar Mapping

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

 

 

 

340 of 450

340

Kernel log-polar Mapping

 

 

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

341 of 450

341

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

342 of 450

342

 

Original Frame

Buffer

Screen

Sample Map

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

343 of 450

343

 

Original Frame

Buffer

Screen

Sample Map

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

344 of 450

344

 

Original Frame

Buffer

Screen

Sample Map

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

345 of 450

345

 

 

 

Fovea

Fovea

Fovea

346 of 450

346

 

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

347 of 450

347

 

Original Frame

Buffer

Screen

Sample Map

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

348 of 450

348

 

Original Frame

Buffer

Screen

Sample Map

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

349 of 450

349

 

Original Frame

Buffer

Screen

Sample Map

Introduction

Related Work

Our Approach

User Study

Experiments

Conclusion

350 of 450

350

 

 

 

Fovea

Fovea

Fovea

351 of 450

Eye-Dominance-Guided Foveated Rendering

Xiaoxu Meng, Ruofei Du, and Amitabh Varshney

IEEE Transactions on Visualization and Computer Graphics (TVCG)

352 of 450

353 of 450

354 of 450

355 of 450

355

356 of 450

356

Ocular Dominance: the tendency to prefer scene perception from one eye over the other.

357 of 450

Advantage of the Dominant Eye Over the Non-dominant Eye

  • better color-vision discrimination ability [Koctekin 2013]
  • shorter reaction time on visually triggered manual action [Chaumillon 2014]
  • better visual acuity, contrast sensitivity [Shneor 2006]

357

358 of 450

358

Application | Resolution | Frame rate | MPixels/sec
Desktop game | 1920 x 1080 x 1 | 60 | 124
2018 VR (HTC Vive Pro) | 1440 x 1600 x 2 | 90 | 414
2020 VR (Varjo) | 1920 x 1080 x 2 + 1440 x 1600 x 2 | 90 | 788

359 of 450

Foveated Rendering

  • VR requires enormous rendering budget
  • Most pixels are outside the fovea

359

360 of 450

Foveated Rendering

  • VR requires enormous rendering budget
  • Most pixels are outside the fovea

360

[Chart: percentage of foveal pixels: 96%, 27%, 4%]

* Data from Siggraph 2017, by Anjul Patney, August 2017

361 of 450

362 of 450

fovea

fovea

363 of 450

Can we do better?

364 of 450

fovea

fovea

non-dominant eye

365 of 450

fovea

fovea

more foveation for the non-dominant eye

366 of 450

A Log-Rectilinear Transformation for Foveated 360-Degree Video Streaming

David Li, Ruofei Du, Adharsh Babu, Camelia Brumar, Amitabh Varshney

University of Maryland, College Park | Google

UMIACS

TVCG Honorable Mentions Award

367 of 450

Introduction

VR headset & video streaming

367

  • 360 Cameras and VR headsets are increasing in resolution.
  • Video streaming is quickly increasing in popularity.

368 of 450

Introduction

VR + eye tracking

368

  • Commercial VR headsets are getting eye-tracking capabilities.

HTC Vive Eye​

Varjo VR-3

Fove

369 of 450

Introduction

360 videos

369

  • 360 cameras capture the scene in every direction with a full 360 degree spherical field of regard.
  • These videos are typically stored in the equirectangular projection parameterized by spherical coordinates (𝜃, 𝜑).

360° Field of Regard

Scene

Captured 360 Video
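
For reference, the equirectangular projection maps the spherical coordinates (θ, φ) linearly to pixel coordinates, and each pixel corresponds to a unit view direction (standard formulation, written in my own notation):

```latex
u = \frac{\varphi}{2\pi}\,W, \qquad v = \frac{\theta}{\pi}\,H,
\qquad \varphi \in [0, 2\pi),\ \ \theta \in [0, \pi]

\mathbf{d}(\theta, \varphi) = (\sin\theta\cos\varphi,\ \cos\theta,\ \sin\theta\sin\varphi)
```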

370 of 450

Introduction

360 videos

370

  • When viewed in a VR headset, 360° videos cover the entire field-of-view for more immersive experiences.
  • However, transmitting the full field-of-regard either has worse perceived quality or requires far more bandwidth than for conventional videos.

Captured 360 Video

Projection to Field of View

371 of 450

Introduction

360 videos

371

  • Existing work in 360° streaming focuses on viewport dependent streaming by using tiling to transmit only visible regions based on the user’s head rotation.

Tiling Illustration

Image from Liu et al., 2017

372 of 450

Introduction

Foveated rendering

372

  • Foveated rendering renders the fovea region of the viewport at a high-resolution and the peripheral region at a lower resolution.
  • Kernel Foveated Rendering (Meng et al., PACMCGIT 2018) uses a log-polar transformation to render foveated images in real-time.

Image Credit: Tobii

Log-polar Transformation,

Image from (Meng et al., 2018)

373 of 450

Introduction

Log-Polar Foveated Streaming

373

  • Applying log-polar subsampling to videos results in flickering and aliasing artifacts in the foveated video.

374 of 450

374

1

Research Question

Can foveation techniques from rendering be used to optimize 360 video streaming?

375 of 450

375

2

Research Question

How can we reduce foveation artifacts by leveraging the full original video frame?

376 of 450

Log-Polar Foveated Streaming

  • Artifacts are caused by subsampling of the original video frame.

Original Frame

Subsampled Pixel

377 of 450

Log-Polar Foveated Streaming

  • Artifacts are caused by subsampling of the original video frame.

Original Frame

Subsampled Pixel

378 of 450

Log-Polar Foveated Streaming

  • Subsampled pixels should represent an average over an entire region of the original video frame.
  • Computationally, this would take O(region size) time to compute for each sample.

379 of 450

Summed-Area Tables

  • One way to compute averages quickly is using summed-area tables, also known as integral images.
  • Sampling a summed area table only takes O(1) time.
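
A minimal numpy sketch of the idea (illustrative, not the paper's GPU implementation): build the table once with cumulative sums, then average any axis-aligned rectangle with at most four lookups.

```python
import numpy as np

def summed_area_table(image):
    """sat[i, j] = sum of image[:i+1, :j+1] (inclusive prefix sums)."""
    return np.cumsum(np.cumsum(image.astype(np.float64), axis=0), axis=1)

def box_average(sat, top, left, bottom, right):
    """Mean of image[top:bottom, left:right] in O(1) using the table."""
    total = sat[bottom - 1, right - 1]
    if top > 0:
        total -= sat[top - 1, right - 1]
    if left > 0:
        total -= sat[bottom - 1, left - 1]
    if top > 0 and left > 0:
        total += sat[top - 1, left - 1]
    return total / ((bottom - top) * (right - left))
```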

380 of 450

Log-Rectilinear Transformation

  • Apply exponential drop off along x-axis and y-axis independently.
  • Rectangular regions allow the use of summed area tables for subsampling.
  • A one-to-one mapping near the focus region preserves the resolution of the original frame.

381 of 450

Foveated Streaming

Decoding 360° Video

GPU-driven Summed-Area Table Generation

Computing the Log-Rectilinear Buffer

Encoding the Log-Rectilinear Video Stream

Updating the Foveal Position

Decoding the

Log-Rectilinear Video Stream

Transforming into a Full-resolution Video Frame

Video Streaming Server

Client

socket

socket

FFmpeg

OpenCL

OpenCL

FFmpeg

FFmpeg

OpenCL

Video Streaming Request

socket

382 of 450

Qualitative Results

  • Shown with gaze at the center of the viewport

383 of 450

Quantitative Results

We perform quantitative evaluations comparing the log-rectilinear transformation and the log-polar transformation in 360° video streaming.

  • Performance overhead of summed-area tables.
  • Full-frame quality.
  • Bandwidth usage.

384 of 450

Quantitative Results

  • Pairing the log-rectilinear transformation with summed area table filtering yields lower flickering while also reducing bandwidth usage and returning high weighted-to-spherical signal to noise ratio (WS-PSNR) results.

385 of 450

Quantitative Results

  • Pairing the log-rectilinear transformation with summed area table filtering yields lower flickering while also reducing bandwidth usage and returning high weighted-to-spherical signal to noise ratio (WS-PSNR) results.

386 of 450

Conclusion

  • We present a log-rectilinear transformation which utilizes foveation, summed-area tables, and standard video codecs for foveated 360° video streaming.

Foveation

Summed-Area Tables

Standard Video Codecs

Foveated 360° Video Streaming

387 of 450

Zhenyi He*, Ruofei Du, and Ken Perlin*

*Future Reality Lab, New York University | Google LLC

CollaboVR: A Reconfigurable Framework for

Creative Collaboration in Virtual Reality

388 of 450

389 of 450

390 of 450

391 of 450

The best layout and interaction mode?

392 of 450

Research Questions:

  • Design: What if we could bring sketching to real-time collaboration in VR?
  • Design + Evaluation: If we can convert raw sketches into interactive animations, will it improve the performance of remote collaboration?
  • Evaluation: Are there best user arrangements or input modes for different use cases, or is it more a question of personal preferences?

393 of 450

CollaboVR: A Reconfigurable Framework for

Creative Collaboration in Virtual Reality

394 of 450

395 of 450

396 of 450

397 of 450

CollaboVR

Chalktalk (Cloud App)

Audio Communication

Layout Reconfiguration

398 of 450

Layout Reconfiguration

User Arrangements

(1) side-by-side

(2) face-to-face

(3) hybrid

Input Modes

(1) direct

(2) projection

399 of 450

Layout Reconfiguration

User Arrangements

(1) side-by-side

(b)

user 1

Interactive boards

tracking range of user 1

user 1

user 2

400 of 450

Layout Reconfiguration

User Arrangements

(1) side-by-side

(2) face-to-face

(b)

(c)

user 1

user 2

(b)

user 2

observed by user 1

A

user 1

RH

LH

RH

LH

401 of 450

Layout Reconfiguration

User Arrangements

(1) side-by-side

(2) face-to-face

(d)

(c)

(b)

user 1

user 2

user 3

user 4

402 of 450

Layout Reconfiguration

User Arrangements

(1) side-by-side

(2) face-to-face

(3) hybrid

(d)

(c)

(b)

user 2

teacher

user 3

user 4

403 of 450

Layout Reconfiguration

Input Modes

(1) direct

(2) projection

404 of 450

Layout Reconfiguration

Input Modes

(1) direct

(2) projection

405 of 450

Layout Reconfiguration

Input Modes

(1) direct

(2) projection

406 of 450

C1: Integrated Layout

C2: Mirrored Layout

C3: Projective Layout

407 of 450

C1: Integrated Layout

C2: Mirrored Layout

C3: Projective Layout

408 of 450

C1: Integrated Layout

C2: Mirrored Layout

C3: Projective Layout

409 of 450

C1: Integrated Layout

C2: Mirrored Layout

C3: Projective Layout

410 of 450

Evaluation

Overview of subjective feedback on CollaboVR

411 of 450

Evaluation

412 of 450

Evaluation

413 of 450

Takeaways

  1. Developing CollaboVR, a reconfigurable end-to-end collaboration system.
  2. Designing custom configurations for real-time user arrangements and input modes.
  3. Quantitative and qualitative evaluation of CollaboVR.
  4. Open-sourcing our software at https://github.com/snowymo/CollaboVR.

414 of 450

more live demos...

415 of 450

416 of 450

417 of 450

418 of 450

419 of 450

Zhenyi He*, Ruofei Du, and Ken Perlin*

*Future Reality Lab, New York University | Google LLC

CollaboVR: A Reconfigurable Framework for

Creative Collaboration in Virtual Reality

420 of 450

Fusing Physical and Virtual Worlds into

An Interactive Metaverse

Ruofei Du

Senior Research Scientist

Google, San Francisco

www.ruofeidu.com

me@duruofei.com

421 of 450

Introduction

Depth Map

Introduction

Depth Map

422 of 450

Introduction

Depth Lab

423 of 450

Thank you!

www.duruofei.com

424 of 450

Introduction

Depth Lab

Occlusion is a critical component for AR realism!

Correct occlusion helps ground content in reality, and makes virtual objects feel as if they are actually in your space.

425 of 450

Introduction

Motivation

426 of 450

Depth Mesh

Generation

427 of 450

Localized Depth

Avatar Path Planning

428 of 450

Dense Depth

Depth Texture

429 of 450

Introduction

Depth Map

430 of 450

Taxonomy

Depth Usage

431 of 450

Introduction

Depth Map

432 of 450

Introduction

Depth Map

433 of 450

OmniSyn: Synthesizing 360 Videos with Wide-baseline Panoramas

David Li, Yinda Zhang, Christian Häne, Danhang Tang, Amitabh Varshney, Ruofei Du

433

434 of 450

Problem

  • Interpolate between 360° street view panoramas.

434

≥ 5 meters

baseline

OmniSyn

360° Wide-baseline

View Synthesis

435 of 450

Related Works - Monocular Neural Image Based Rendering with Continuous View Control (ICCV 2019)

  • Xu Chen, Jie Song, and Otmar Hilliges
  • Predict target depth from source RGB & target pose.
  • Do back-projection using the target depth.

435

436 of 450

Related Works - SynSin (CVPR 2020)

  • Olivia Wiles, Georgia Gkioxari, Richard Szeliski, Justin Johnson
  • End-to-end View Synthesis from a Single Image
  • Trained with differentiable point cloud rendering and without depth supervision.

436

437 of 450

Research Goal

  • What are the challenges associated with 360 street view synthesis?
  • How can we modify the traditional view synthesis pipeline to leverage two 360 panoramas that are meters apart?

437

438 of 450

Method

  • Tailor the traditional view synthesis pipeline to handle 360 panoramas.
    • Depth estimation with spherical sweep cost volumes (Sunghoon, 2016).
    • Mesh rendering for 360 panoramas.
    • Inpainting with CoordConv (Liu, 2018) and Circular CNNs (Schubert, 2019).

438

[Figure: CoordConv coordinate channel, rows of constant values 0, 0.1, ..., 1 appended as an extra input channel]

CoordConv

Spherical Cost Volume

439 of 450

Pipeline

  • Our pipeline directly operates on equirectangular panoramas:

439

Panorama 1

Panorama 0

Depth Prediction 0

Depth Prediction 1

RGB + Visibility

RGB + Visibility

Depth Predictor

Depth Predictor

[R0|t0]

Mesh Renderer

Fusion

Network

[R1|t1]

Mesh Renderer

Target Panorama

440 of 450

Stereo Depth with Cost Volume

  • Stereo depth estimation uses cost volumes with perspective images.
    • Cost volumes compute the image difference at various offsets to identify the correct disparity.
    • StereoNet computes cost volumes from deep features within a CNN.

440

-

=

-

=

441 of 450

Stereo 360 Depth with Cost Volume

  • For 360, we can perform a similar sweep testing different depths:
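
A rough numpy sketch of such a sweep (illustrative only; the actual network builds the cost volume from learned features rather than raw colors): for each candidate depth, reproject the second panorama into the first panorama's frame and record the per-pixel difference.

```python
import numpy as np

def erp_directions(h, w):
    """Unit view directions for every pixel of an equirectangular image."""
    theta = (np.arange(h) + 0.5) / h * np.pi                 # polar angle
    phi = (np.arange(w) + 0.5) / w * 2.0 * np.pi             # azimuth
    theta, phi = np.meshgrid(theta, phi, indexing="ij")
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.cos(theta),
                     np.sin(theta) * np.sin(phi)], axis=-1)

def spherical_sweep_cost(pano0, pano1, baseline_m, depths_m):
    """Cost volume over candidate depths; pano1's camera is assumed to sit
    baseline_m along +x from pano0's camera (an assumption of this sketch)."""
    h, w, _ = pano0.shape
    dirs = erp_directions(h, w)
    costs = []
    for d in depths_m:
        points = dirs * d - np.array([baseline_m, 0.0, 0.0])  # into pano1's frame
        r = np.linalg.norm(points, axis=-1)
        theta = np.arccos(np.clip(points[..., 1] / r, -1.0, 1.0))
        phi = np.arctan2(points[..., 2], points[..., 0]) % (2.0 * np.pi)
        rows = np.clip((theta / np.pi * h).astype(int), 0, h - 1)
        cols = np.clip((phi / (2.0 * np.pi) * w).astype(int), 0, w - 1)
        costs.append(np.abs(pano0 - pano1[rows, cols]).mean(axis=-1))
    return np.stack(costs)  # (num_depths, H, W); argmin over axis 0 gives depth
```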

441

442 of 450

Stereo 360 Depth with Cost Volume

  • For 360, we can perform a similar sweep testing different depths:

442

-

=

-

=

443 of 450

Mesh Rendering

  • For 360 images, using point clouds leads to sparse regions:
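
A small sketch of the mesh alternative assumed here (a hypothetical helper, not the exact OmniSyn renderer): lift panorama pixels to 3D with their predicted depth and connect neighboring samples into triangles, so reprojected gaps are covered by interpolated surfaces instead of staying sparse.

```python
import numpy as np

def panorama_depth_to_mesh(depth_m, step=4):
    """Vertices and triangle indices from an equirectangular depth map."""
    h, w = depth_m.shape
    ys, xs = np.arange(0, h, step), np.arange(0, w, step)
    theta = (ys + 0.5) / h * np.pi
    phi = (xs + 0.5) / w * 2.0 * np.pi
    theta, phi = np.meshgrid(theta, phi, indexing="ij")
    dirs = np.stack([np.sin(theta) * np.cos(phi),
                     np.cos(theta),
                     np.sin(theta) * np.sin(phi)], axis=-1)
    verts = dirs * depth_m[np.ix_(ys, xs)][..., None]

    rows, cols = len(ys), len(xs)
    faces = []
    for i in range(rows - 1):
        for j in range(cols - 1):           # two triangles per quad of samples
            a = i * cols + j
            b, c = a + 1, a + cols
            faces += [(a, b, c), (b, c + 1, c)]
    return verts.reshape(-1, 3), np.asarray(faces)
```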

443

444 of 450

Mesh Rendering

  • For 360 images, using point clouds leads to sparse regions:

444

Point Cloud Render

Mesh Render

3 m

2 m

1 m

4 m

OmniSyn (Mesh)

GT Visibility

OmniSyn (Point Cloud)

445 of 450

CoordConv and Circular CNN

  • CoordConv - append an additional channel to allow conv layers to know where it is in the image and hopefully adapt to the ERP distortion.
    • Distortion in the panorama means the images are not shift-invariant.
  • Circular CNN - modify the CNN padding so that the kernel wraps around the left and right edges of the ERP image.
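
A minimal numpy sketch of the circular-padding idea (the real model applies it inside its convolution layers): wrap the left and right edges of the ERP image so a kernel sees continuous content across the 0°/360° seam, while the top and bottom are padded normally.

```python
import numpy as np

def circular_pad_erp(image, pad):
    """Pad an (H, W, C) equirectangular image: wrap horizontally, edge-pad vertically."""
    wrapped = np.concatenate([image[:, -pad:], image, image[:, :pad]], axis=1)
    return np.pad(wrapped, ((pad, pad), (0, 0), (0, 0)), mode="edge")

# A 3x3 convolution applied after circular_pad_erp(image, 1) then behaves as if
# the panorama's left and right edges were adjacent, matching the 360 topology.
```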

445

[Figure: CoordConv coordinate channel, rows of constant values 0, 0.1, ..., 1 appended as an extra input channel]

CoordConv

Circular CNN

446 of 450

Experiments

  • Generated some synthetic street view images using CARLA
    • Static scenes without pedestrians or cars.
    • Allows us to obtain clean ground-truth depth information.
  • Trained our pipeline with explicit depth and RGB supervision.
  • Compare mesh and point-cloud rendering.
  • Test generalization to real street view images.

446

447 of 450

Results

  • We experiment on synthetic static street scenes from CARLA:

447

448 of 450

Generalization to Real Street View Panoramas

448

0 m GT

4.6 m GT

10.1 m GT

0 m GT

9.0 m GT

0 m GT

Synthesized

449 of 450

Limitations

449

Input 0

Input 1

Synthesized

Fusion network does not generalize well to unseen colors.

Depth prediction struggles with tall buildings.

Triangle removal may eliminate thin structures.

450 of 450

Conclusion

  • We identified some challenges with performing view synthesis on sets of synthetic 360 street view panoramas.
  • We augmented the view synthesis pipeline with components suitable for dual 360 panoramas.

450