KCIS Distinguished Lecture by Andrew Zisserman on Learning from Sight and Sound
Date: 2nd January 2019, 11:30 am
Venue: Himalaya 105, IIIT-H Campus, Gachibowli.

The talk will describe self-supervised learning from videos with sound and will be divided into two parts. The first part will describe self-supervised learning from the visual stream alone (no sound), and show that it is possible to learn powerful embeddings for tasks such as facial attribute prediction and human action recognition. This requires defining a proxy loss, so that a deep network trained with this loss has to solve the task of interest. The second part will explore multi-modal self-supervised learning from video and audio. We investigate two proxy loss functions, synchronization and correspondence, to link the modalities. We show that, in addition to training networks to encode images and audio, we get for free a number of functionalities, including active speaker identification, audio-visual speech enhancement, and localizing objects by their sound.

Andrew Zisserman is one of the principal architects of modern computer vision. His work in the 1980s on surface reconstruction with discontinuities is widely cited. He is best known for his leading role during the 1990s in establishing the computational theory of multiple view reconstruction and the development of practical algorithms that are widely in use today. This culminated in the publication, in 2000, of his book with Richard Hartley, already regarded as a standard text. His laboratory in Oxford is internationally renowned, and its work is currently shedding new light on the problems of object detection and recognition.
