Date | Speaker | Title | Abstract | Bio | Participants
2024-10-15 (Tue, 10:00) | Hongyang Li | Vista: The Evolution of Mobility: Self-driving Vehicles on the Horizon | In this talk, we will give a deep-dive walkthrough of recent end-to-end autonomous driving systems. Compared to conventional modular approaches, end-to-end systems benefit from global optimization and inherit advantages from large foundation models. Vision and language models (VLMs) generalize well to new scenarios and have shown some potential for emergent behavior when the data scale reaches the order of millions or billions. What are the key ingredients for applying foundation models to autonomous driving? How can data science be used to embrace the emergence of fully autonomous driving? Does simply adapting VLMs to driving scenarios with a massive data corpus improve driving tasks such as motion prediction or planning? Our observations show that a number of challenges lie ahead that need careful investigation. We will use recent work from the OpenDriveLab team, such as Vista, GenAD, and DriveLM, as examples. | Hongyang Li is an Assistant Professor at the University of Hong Kong. His research focuses on autonomous driving and embodied AI. He led the end-to-end autonomous driving project UniAD, which won the IEEE CVPR 2023 Best Paper Award. UniAD has had a large impact in both academia and industry, including the recent rollout to customers by Tesla in FSD V12. He proposed the bird’s-eye-view perception work BEVFormer, which was listed among the Top 100 AI Papers in 2022 and was explicitly recognized by Jensen Huang, CEO of NVIDIA, and Prof. Shashua, CEO of Mobileye, at public keynotes. He served as Area Chair for CVPR 2023 and 2024, NeurIPS 2023 (Notable AC) and 2024, ACM MM 2024, and ICLR 2025, and as a referee for Nature Communications. He will serve as Workshop Chair for CVPR 2026. He is the Working Group Chair for IEEE Standards under the Vehicular Technology Society and a Senior Member of IEEE. | 20
2024-10-22 (Tue, 18:00) | Boris Ivanovic | Revolutionizing Autonomous Vehicle Development with Foundation Models | Foundation models, trained on vast and diverse data encompassing a breadth of human experiences, are at the heart of the ongoing AI revolution influencing the way we create, problem solve, and work. These models, and the lessons learned from their construction, can also be applied to the way we develop a similarly transformative technology: autonomous vehicles (AVs). In this talk, we’ll highlight recent research efforts toward rethinking elements of an AV program both in the vehicle and in the data center, with an emphasis on (1) composing ingredients for universal and controllable end-to-end simulation, (2) novel end-to-end AV architectures that are built from the ground up to harness foundation models, and (3) enabling generalization to multiple geographies by leveraging video generation models and improving their consistency with techniques from neural rendering. | Boris is a Senior Research Scientist and Manager in NVIDIA's Autonomous Vehicle Research Group. His research interests include novel end-to-end AV architectures, sensor and traffic simulation, AI safety, and the thoughtful integration of foundation models in AV development. Prior to joining NVIDIA, he received his Ph.D. in Aeronautics and Astronautics under the supervision of Marco Pavone in 2021 and an M.S. in Computer Science in 2018, both from Stanford University. He received his B.A.Sc. in Engineering Science from the University of Toronto in 2016.
2024-10-29 (Tue, 10:00)
2024-11-05 (Tue, 10:00) | Sebastian Goldt | Learning higher-order data correlations with neural networks, efficiently | Neural networks excel at finding patterns in their data -- but which patterns do they extract, and how do they extract them efficiently? I will argue that the features learnt by deep neural networks are shaped by the higher-order correlations (HOCs) of their inputs. I will describe a mechanism by which neural networks can learn from the higher-order correlations of images -- a computationally hard task! -- efficiently by exploiting correlations in the latent space of the inputs. Finally, I will show how HOCs in a simple model of images are in turn shaped by the symmetries of the data. I will close by showing how to extend these results to transformers and natural language processing. | Sebastian Goldt is an Assistant Professor at the International School of Advanced Studies (SISSA) in Trieste, Italy. The goal of his group is to develop theories of learning in artificial and biological neural networks. Sebastian studied physics at Cambridge (UK) and did his PhD on the stochastic thermodynamics of learning with Udo Seifert in Stuttgart (Germany). Before coming to Trieste in 2020, he spent three years as a post-doc with Florent Krzakala and Lenka Zdeborová in Paris. He is one of the organisers of the conference series "Youth in high dimensions", and was awarded an ERC Starting Grant in 2024.
2024-11-12 (Tue, 10:00) | Stefan Baur | Self-Supervised 3D Scene Flow for LiDAR Object Detection Without Human Annotations | Human annotations for lidar data are expensive and do not scale well. At the same time, temporal consistency and motion are cheap and strong supervision signals present in sequences of lidar point clouds. In this talk, I will first describe how to obtain accurate self-supervised 3D scene flow from lidar point cloud sequences. Then, using the motion cues from the scene flow, I will demonstrate how we are able to train self-supervised lidar object detectors that generalize from moving to movable objects, using only temporal consistency (tracking) as supervision, but no human annotations. | I joined Mercedes-Benz R&D and the University of Tübingen Autonomous Vision Group (Andreas Geiger) in 2018 as a PhD student. First, I worked on realistic lidar simulation for autonomous driving. Later, I pivoted towards motion-based self-supervised learning for lidar point clouds. In 2022, I became a full-time ML engineer at Mercedes-Benz. At the moment, I am working on online HD map learning.
2024-11-19 (Tue, 17:30) | Ishan Khatri | Towards 4D Scene Understanding for Robotics | Much prior art has focused on 3D reconstruction from sensor data and its applications to robotics. In addition, many methods have added semantic knowledge to these reconstructions in some way to build accurate world representations that allow robots to operate in realistic domains. However, these prior works generally focus on static environments and fail to capture the dynamics of the real world. This talk will focus on extending these world representations to "4D". Building temporally consistent world representations that also include an understanding of motion is one of the biggest unsolved challenges for real-world robotics deployments. | Ishan is a researcher at Stack AV, an autonomous trucking company based in Pittsburgh, PA. He previously worked at Motional and Argo AI and completed his undergraduate studies at UMass Amherst, advised by Shlomo Zilberstein. His primary research interests are in 3D vision, 4D reconstruction and their applications to robotics.
2024-11-26 (Tue, 10:00) | Alexandre Boulch
2024-12-03 (Tue, 17:30) | Federico Tombari | 3D scene understanding with neural representations for Augmented Reality | Neural representations have shown tremendous progress and represent a promising tool for novel applications in the space of Augmented and Mixed Reality. In this talk I will give an overview of the use of neural representations for AR/XR applications with a focus on 3D scene understanding, and for common tasks such as novel view synthesis, 3D semantic segmentation and 3D asset generation. For each of these three tasks, I will first highlight some important practical limitations of current neural representations. I will then show solutions designed to overcome such limitations, which include mobile novel view synthesis at high framerate, open set 3D scene segmentation with radiance fields, and realistic 3D asset generation from text prompts. | Federico Tombari is Research Director at Google, where he leads an applied research team in computer vision and machine learning across North America and Europe. He is also a Lecturer (PrivatDozent) at the Technical University of Munich (TUM). He has 300+ peer-reviewed publications in CV/ML and applications to robotics, autonomous driving, healthcare and augmented reality. He got his PhD from the University of Bologna and his Venia Legendi (Habilitation) from the Technical University of Munich (TUM). In 2018-19 he was co-founder and managing director of a startup on 3D perception for AR and robotics, then acquired by Google. He regularly serves as Area Chair and Associate Editor for international conferences and journals (IJRR, RA-L, NeurIPS23/24, ECCV22/24, CVPR23/24/25, IROS20/21/22, ICRA20/22, 3DV19-25 among others). He was the recipient of two Google Faculty Research Awards, one Amazon Research Award, one Outstanding AC Award (ECCV24), and five Outstanding Reviewer Awards (3x CVPR, ICCV21, NeurIPS21), among others. He has been a research partner of private and academic institutions including Google, Toyota, BMW, Audi, Amazon, Stanford University, ETH and MIT.
2024-12-10 (Tue, 10:00) | Ozan Ozdenizci
2024-12-17 (Tue, 10:00) | Yuki M. Asano
2024-12-24 (Tue, 10:00) | Duygu Ataman? Ozan?
2024-12-31 (Tue, 10:00)
2025-01-07 (Tue, 10:00) | Evin Pinar Ornek
2025-01-14 (Tue, 10:00)
2025-01-21 (Tue, 10:00)