16-824: Learning-Based Methods in Vision
Spring 2012
"The aim of computer vision is to overfit to our visual world"
-- remark by Antonio Torralba (after his third beer)
Human vision is one of the most remarkable machines that ever existed. From sparse, noisy, hopelessly ambiguous local scene measurements our brain manages to create a coherent global visual experience. But how can this task, while seemingly effortless for humans, remain so excruciatingly difficult for a computer? Part of the answer is that humans rely on years of prior visual experience to make sense of the world, while computers have to start tabula rasa. Clearly, learning is needed to make progress on this severely underconstrained problem. However, attempts at direct application of machine learning tools to raw visual data have been largely unsuccessful.
The goal of this graduate seminar course is to gain a deeper understanding of the computer vision problem in order to better reason about ways data and learning could be used to tackle it. The central focus will be on representation of visual data, rather than on fancy learning techniques. We will be looking at all stages of visual processing, from low-level (color, texture, local patches) all the way to high-level (object recognition, general image understanding). We will pay particular attention to mid-level vision (grouping, segmentation, figure/ground, scene layout, image parsing) -- a crucial glue tying vision together that has been largely neglected. The course will have an emphasis on using large amounts of real data (images, video, textual annotations, other meta-data). We will also discuss the difficult question of what the right choice of training data is and how it can be acquired.
The course will consist of reading and presenting an eclectic mix of classic and recent papers on a range of topics. All students will be required to submit a written summary for each paper. Additionally, there will be two substantial class projects during the term.
Prerequisite: 16-720 or equivalent graduate Computer Vision course (No exceptions!)
Location and Time: NSH 3002, Tu Th 12:00-1:20 PM
Instructors: Alexei “Alyosha” Efros, associate professor, CMU
Leonid Sigal, research scientist, Disney Research Pittsburgh
Office Hours: by appointment
Leave your comments about papers on the Class Blog
The paper list contains papers that will be discussed in class.
Date | Presenter | Topic/Paper | Slides |
Jan. 17 | Alyosha | Introduction, Vision Perspective: Measurement vs. Perception, Administrative stuff, overview of the course | |
Jan 19 | Leon | Introduction, Learning Perspective | |
Jan 24 | Alyosha | Intro to Data Readings: 1. The Unreasonable Effectiveness of Data, A. Halevy, P. Norvig, and F. Pereira, IEEE Intelligent Systems, vol. 24, pp. 8-12, 2009. 2. Unbiased Look at Dataset Bias, Antonio Torralba, Alexei Efros, CVPR, 2011. | |
Jan. 26 | Alyosha | Theories of Visual Perception Reading: 1. Vision is getting easier every day, P. Cavanagh, 1995. Optional reading: Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century? K. Nakayama, 1998. | |
Feb 2 | Alyosha | Physiology of Vision Readings: 1. The Plenoptic Function and the Elements of Early Vision, E.H. Adelson and J.R. Bergen, 1991. 2. Wavelet-like receptive fields emerge from a network that learns sparse codes for natural images, B. Olshausen and D. Field, Nature, 1996. | |
Feb 4 | Alyosha | cont. | |
Feb 7 | Leon | Sparsity and Deep Learning Readings: 1. Non-Local Sparse Models for Image Restoration, J. Mairal, F. Bach, J. Ponce, G. Sapiro and A. Zisserman, ICCV, 2009. 2. Adaptive Deconvolutional Networks for Mid and High Level Feature Learning, Matt Zeiler, Graham Taylor and Rob Fergus, ICCV, 2011. | |
Feb. 9 | Alyosha | What should be done at the low level? When is object/scene recognition just texture recognition? Readings: 1. When is scene recognition just texture recognition?, L. W. Renninger and J. Malik, Vision Research, 2004. 2. Object Categorization by Learned Universal Visual Dictionary, J. Winn, A. Criminisi, and T. Minka, ICCV, 2005. | |
Feb. 14 | Alyosha, Samantha | cont. Reading: 1. Discriminant Learning of Local Image Descriptors, M. Brown, G. Hua and S. Winder, IEEE TPAMI 2010. | |
Feb. 16 | Alyosha | Mid-Level: when low-level is just too low Reading: 1. Learning a Classification Model for Segmentation, X. Ren and J. Malik, ICCV, 2003. | |
Feb. 21 | Narek, Yogeshwar and Sasikanth | Contours and Segmentations Readings: 1. Contour Detection and Hierarchical Image Segmentation, P. Arbelaez, M. Maire, C. Fowlkes and J. Malik, IEEE TPAMI, May, 2011. 2. Constrained Parametric Min-Cuts for Automatic Object Segmentation, J. Carreira and C. Sminchisescu, CVPR, 2010. | |
Feb 23 | Leon | Image Labeling Readings: 1. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation, J. Shotton, J. Winn, C. Rother, A. Criminisi, ECCV, 2006. 2. Semantic Texton Forests for Image Categorization and Segmentation, J. Shotton, M. Johnson, R. Cipolla, CVPR, 2008. | |
Feb 28 | Adam Stambler, Jonathon Smereka | Image Labeling, cont. Readings: 1. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, P. Krähenbühl and V. Koltun, NIPS, 2011. 2. Nonparametric Scene Parsing via Label Transfer, C. Liu, J. Yuen and A. Torralba, IEEE TPAMI, May, 2011. | |
Mar 1 | Alyosha | Object Detection: HOG Templates Readings: 1. Histograms of Oriented Gradients for Human Detection, N. Dalal and B. Triggs, CVPR, 2005. 2. Object Detection with Discriminatively Trained Part-Based Models, P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, IEEE TPAMI, Sept, 2009. | |
Mar 6 | Alyosha | Object Detection: Poselets and eSVM Readings: 1. Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, L. Bourdev, J. Malik, ICCV, 2009. 2. Ensemble of Exemplar-SVMs for Object Detection and Beyond, T. Malisiewicz, A. Gupta, A. Efros, ICCV, 2011. | |
Mar 8 | Leon, Tommy Liu | Transfer Learning Reading: 1. Adapting Visual Category Models to New Domains, K. Saenko, B. Kulis, M. Fritz and T. Darrell, ECCV, 2010. | |
Mar 13 | BREAK | ||
Mar 15 | BREAK | ||
Mar 20 | Tinghui Zhou | Transfer Learning for Object Detection and Categorization Readings: 1. Tabula Rasa: Model Transfer for Object Category Detection, Y. Aytar, A. Zisserman, ICCV, 2011. 2. Transfer Learning by Borrowing Examples for Multiclass Object Detection, J. Lim, R. Salakhutdinov and A. Torralba, NIPS, 2011. | |
Mar 22 | Ali Farhadi, Maturana | Attributes (Guest lecture by A. Farhadi) Readings: 1. Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR, 2009. 2. Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar, ICCV, 2009. | |
Mar 27 | Alesha, Yair Movshovitz-Attias | Anti-categories Readings: 1. Adaptively Learning the Crowd Kernel, O. Tamuz, C. Liu, S. Belongie, O. Shamir, A. Kalai, ICML, 2011. 2. Relative Attributes, D. Parikh, K. Grauman, ICCV, 2011. | |
Mar 29 | Alyosha, Carl Doersch | Object Discovery, Image Graphs, Visual Memex Readings: 1. VisualRank: Applying PageRank to Large-Scale Image Search, Y. Jing, S. Baluja, IEEE TPAMI, 2008. 2. Unsupervised Detection of Regions of Interest Using Iterative Link Analysis, G. Kim, E. Xing, A. Torralba, NIPS, 2009. 3. Image Webs: Computing and Exploiting Connectivity in Image Collections, K. Heath, N. Gelfand, M. Ovsjanikov, M. Aanjaneya, L. J. Guibas, CVPR, 2010. | |
Apr 3 | NO CLASS | ||
Apr 5 | Carl Doersch, Leon | cont. / Role of Context | |
Apr 10 | Yogeshwar and Sasikanth, Wen-Sheng Chu | cont. Readings: 1. Auto-context and its Application to High-level Tasks, Z. Tu, CVPR, 2008. / Readings: 1. Efficient Object Category Recognition Using Classemes, L. Torresani, M. Szummer, A. Fitzgibbon, ECCV, 2010. 2. Objects as Attributes for Scene Classification, L.-J. Li, H. Su, Y. Lim and L. Fei-Fei, 1st International Workshop on Parts and Attributes, 2010. | |
Apr 12 | Alyosha, Supreeth, Hatem | Large-scale Image Retrieval Readings: 1. Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas, CVPR, 2009. 2. Iterative Quantization: A Procrustean Approach to Learning Binary Codes, Y. Gong and S. Lazebnik, CVPR, 2011. / Readings: 1. Total Recall II: Query Expansion Revisited, O. Chum, A. Mikulik, M. Perdoch, and J. Matas, CVPR, 2011. 2. Learning Query-dependent Prefilters for Scalable Image Retrieval, L. Torresani, M. Szummer, A. Fitzgibbon, CVPR, 2009. | large_scale_image_retrieval.pdf |
Apr 17 | Leon, Zhou Yu | Semi-supervised Learning Intro Readings: 1. Self-paced Learning for Latent Variable Models, M. Pawan Kumar, B. Packer and D. Koller, NIPS, 2011. | |
Apr 19 | David Fouhey, Leon | Reading: 1. Measuring the objectness of image windows, B. Alexe, T. Deselaers, and V. Ferrari, IEEE TPAMI, 2012. / Pose Estimation Intro | |
Apr 24 | Stephen Siena, Greydon Foil | Readings: 1. Pictorial Structures Revisited: People Detection and Articulated Pose Estimation, M. Andriluka, S. Roth, B. Schiele, CVPR, 2009. 2. Learning effective human pose estimation from inaccurate annotation, S. Johnson and M. Everingham, CVPR, 2011. / Readings: 1. Twin Gaussian Processes for Structured Prediction, L. Bo and C. Sminchisescu, IJCV, 2010. | |
Apr 26 | Leon, Shoou-I Yu | Tracking Intro Readings: 1. Robust Visual Tracking using L1 Minimization, X. Mei and H. Ling, ICCV, 2009. | |
May 1 | Michalis Raptis, Zhen-zhong Lan | Action Recognition (Guest lecture by Michalis Raptis) Readings: 1. Evaluation of local spatio-temporal features for action recognition, H. Wang, M. M. Ullah, A. Kläser, I. Laptev and C. Schmid, BMVC, 2009. | |
May 3 | Alyosha | The Last Lecture | |
May 4 | 6-9pm | Synthesis project presentations |
This course has been inspired by courses offered by several of our colleagues. Here is a partial list: