2014 Computer Vision Internships in the Willow Group
We are looking for strongly motivated candidates with an interest in computer vision and in applications of machine learning to computer vision problems. A good background in applied mathematics, strong programming skills and prior experience with Matlab are required. The internships can lead to a PhD in the Willow Group.
Proposed internship topics:
1. Semantic 3D reconstruction of architectural scenes
To apply, please send us your CV and come to visit us in the lab to discuss the topics.
Project supervisors: Mathieu Aubry <Mathieu.Aubry@ens.fr> Josef Sivic <Josef.Sivic@ens.fr>
Location: Willow Group, Département d'Informatique de l'École Normale Supérieure
Figure 1: Three sample reconstructions by Acute3D at three levels of detail: a church facade, a street and a city.
Motivation
Large-scale 3D modeling of cities [Aggarwal09] has a wide range of applications in urban planning, sustainable development, navigation, simulation, virtual environments, film post-production and the game industry, to name but a few examples [Musialski12, Vanegas10]. As a result, public authorities, museums and film studios, as well as industry, providers of car navigation systems and providers of location-based services, show great interest in this topic. For example, Google, Microsoft, and Apple are driving a huge demand for 3D reconstruction of very large-scale urban environments with competing 3D platforms such as Google Earth and Virtual Earth. However, building highly detailed models requires either laser scans or a very large number of photographs, and the resulting models feature only plain surfaces textured from the available images. In addition, the recovered geometry and texturing are often wrong where parts of the scene are hidden or discontinuous, for example behind occluding foreground objects such as trees, cars or lampposts, which are pervasive in urban scenes.
Goal
This internship will go beyond plain geometrical models and take a step towards producing semantic 3D models, i.e., models that are not bare surfaces but that identify architectural elements such as windows, walls, roofs, balconies, doors or even furniture inside the buildings. In particular, the goal is to discover and use semantic knowledge of architecture to reconstruct 3D models from a single image or a few images of an architectural site. Imagine, for example, reconstructing now-destroyed buildings from only a small number of historical photographs or even non-photographic depictions (such as paintings or sketches).
Project description
This internship is on the exciting boundary between 3D reconstruction and visual recognition, and will combine state-of-the-art techniques from both areas. In particular, the project will build on our recent work on learning mid-level image representations [Doersch12] and matching 3D CAD models to 2D imagery [Aubry13]. The project will proceed in the following four steps:
Potential impact
The potential impact of automatic large-scale semantic 3D modeling is immense, with applications in diagnosis and simulation for building renovation projects and, more generally, urban planning (e.g. solar cell deployment, thermal performance assessment, noise and pollution propagation), as well as quantitative analysis for architecture, archeology or history. Other applications are in virtual cities for navigation, gaming or film post-production, with a new level of realism such as object-specific rendering for specular surfaces (e.g. windows).
Requirements
We are looking for strongly motivated candidates with an interest in computer vision, 3D reconstruction, visual recognition and machine learning. The project requires a strong background in applied mathematics and excellent programming skills. If we find a mutual match, the project can lead to a PhD in the Willow Group.
References
[Aggarwal09] S. Agarwal et al. Building Rome in a Day, ICCV 2009.
[Aubry13] M. Aubry, B. Russell, and J. Sivic. Painting-to-3D Model Alignment Via Discriminative Visual Elements, to appear in ACM Transactions on Graphics (TOG), 2014.
[Bach07] F. Bach and Z. Harchaoui. DIFFRAC: a discriminative and flexible framework for clustering, Advances in Neural Information Processing Systems (NIPS) 20, 2007.
[Bao13] Y. Bao, M. Chandraker, Y. Lin, and S. Savarese, Dense Object Reconstruction Using Semantic Priors, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
http://cvgl.stanford.edu/papers/Bao_semantic_reconstruction_cvpr13.pdf
[Doersch12] C. Doersch, S. Singh, A. Gupta, J. Sivic, and A. Efros. What makes Paris look like Paris? ACM Transactions on Graphics (SIGGRAPH 2012).
[Fouhey13] D. F. Fouhey, A. Gupta, and M. Hebert. Data-Driven 3D Primitives for Single Image Understanding, ICCV 2013.
[Haene13] C. Haene, C. Zach, A. Cohen, R. Angst, and M. Pollefeys. Joint 3D Scene Reconstruction and Class Segmentation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[Martinovic13] A. Martinovic and L. Van Gool. Bayesian Grammar Learning for Inverse Procedural Modeling, CVPR 2013.
[Musialski12] P. Musialski et al. A Survey of Urban Reconstruction. Eurographics 2012-State of the Art Reports. 2012.
[Oquab13] M. Oquab, L. Bottou, I. Laptev, J. Sivic. Learning and transferring mid-level image representation using convolutional neural networks, http://hal.inria.fr/hal-00911179, 2013.
[Vanegas10] C.A. Vanegas et al. Modelling the appearance and behaviour of urban spaces. Computer Graphics Forum, 29(1):25-42, 2010.
Talk to the course instructors if you wish to learn about additional internship topics.