16-824: Learning-Based Methods in Vision
"The aim of computer vision is to overfit to our visual world"
-- remark by Antonio Torralba (after his third beer)
Human vision is one of the most remarkable machines that ever existed. From sparse, noisy, hopelessly ambiguous local scene measurements our brain manages to create a coherent global visual experience. But how can this task, while seemingly effortless for humans, remain so excruciatingly difficult for a computer? Part of the answer is that humans rely on years of prior visual experience to make sense of the world, while computers have to start tabula rasa. Clearly, learning is needed to make progress on this severely underconstrained problem. However, attempts at direct application of machine learning tools to raw visual data have been largely unsuccessful.
The goal of this graduate seminar course is to gain a deeper understanding of the computer vision problem in order to better reason about ways data and learning could be used to tackle it. The central focus will be on representation of visual data, rather than on fancy learning techniques. We will be looking at all stages of visual processing, from low-level (color, texture, local patches) all the way to high-level (object recognition, general image understanding). We will pay particular attention to mid-level vision (grouping, segmentation, figure/ground, scene layout, image parsing) -- a crucial glue tying vision together that has been largely neglected. The course will have an emphasis on using large amounts of real data (images, video, textual annotations, other meta-data). We will also discuss the difficult issue of what the right choice of training data is and how it can be acquired.
The course will consist of reading and presenting an eclectic mix of classic and recent papers on a range of topics. All students will be required to submit a written summary for each paper. Additionally, there will be two substantial class projects during the term.
Prerequisite: 16-720 or equivalent graduate Computer Vision course (No exceptions!)
Location and Time: NSH 3002, Tu Th 12:00-1:20 PM
Instructors: Alexei “Alyosha” Efros, associate professor, CMU
Leonid Sigal, research scientist, Disney Research Pittsburgh
Office Hours: by appointment
Leave your comments about papers on the Class Blog
The paper list contains papers that will be discussed in class.
Introduction, Vision Perspective: Measurement vs. Perception, Administrative stuff, overview of the course
Introduction, Learning Perspective
Intro to Data
1. The Unreasonable Effectiveness of Data, A. Halevy, P. Norvig, and F. Pereira, IEEE Intelligent Systems, vol. 24, pp. 8-12, 2009.
2. Unbiased Look at Dataset Bias, Antonio Torralba, Alexei Efros, CVPR, 2011.
Theories of Visual Perception
1. Vision is getting easier every day, P. Cavanagh, 1995.
Optional reading: Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century? K. Nakayama, 1998.
Physiology of Vision
1. The Plenoptic Function and the Elements of Early Vision, E.H. Adelson and J.R. Bergen, 1991.
2. Wavelet-like receptive fields emerge from a network that learns sparse codes for natural images, B. Olshausen and D. Field, Nature, 1996.
Sparsity and Deep Learning
1. Non-Local Sparse Models for Image Restoration, J. Mairal, F. Bach, J. Ponce, G. Sapiro and A. Zisserman, ICCV, 2009.
2. Adaptive Deconvolutional Networks for Mid and High Level Feature Learning, Matt Zeiler, Graham Taylor and Rob Fergus, ICCV, 2011.
What should be done at the low level? When is object/scene recognition just texture recognition?
1. When is scene recognition just texture recognition?, L. W. Renninger and J. Malik, Vision Research, 2004.
2. Object Categorization by Learned Universal Visual Dictionary, J. Winn, A. Criminisi, and T. Minka, ICCV, 2005.
1. Discriminant Learning of Local Image Descriptors, M. Brown, G. Hua and S. Winder, IEEE TPAMI 2010.
Mid-Level: when low-level is just too low
1. Learning a Classification Model for Segmentation, X. Ren and J. Malik, ICCV, 2003.
Yogeshwar and Sasikanth
Contours and Segmentations
1. Contour Detection and Hierarchical Image Segmentation, P. Arbelaez, M. Maire, C. Fowlkes and J. Malik, IEEE TPAMI, May, 2011.
2. Constrained Parametric Min-Cuts for Automatic Object Segmentation, J. Carreira and C. Sminchisescu, CVPR, 2010.
1. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation, J. Shotton, J. Winn, C. Rother, A. Criminisi, ECCV, 2006.
2. Semantic Texton Forests for Image Categorization and Segmentation, J. Shotton, M. Johnson, R. Cipolla, CVPR, 2008.
Image Labeling, cont.
1. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, P. Krähenbühl and V. Koltun, NIPS, 2011.
2. Nonparametric Scene Parsing via Label Transfer, C. Liu, J. Yuen and A. Torralba, IEEE TPAMI, May, 2011.
Object Detection: HOG Templates
1. Histograms of Oriented Gradients for Human Detection, N. Dalal and B. Triggs, CVPR, 2005.
2. Object Detection with Discriminatively Trained Part-Based Models, P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, IEEE TPAMI, Sept, 2009.
Object Detection: Poselets and eSVM
1. Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, L. Bourdev, J. Malik, ICCV, 2009.
2. Ensemble of Exemplar-SVMs for Object Detection and Beyond, T. Malisiewicz, A. Gupta, A. Efros, ICCV, 2011.
Transfer Learning for Object Detection and Categorization
1. Adapting Visual Category Models to New Domains, K. Saenko, B. Kulis, M. Fritz and T. Darrell, ECCV, 2010.
2. Transfer Learning by Borrowing Examples for Multiclass Object Detection, J. Lim, R. Salakhutdinov and A. Torralba, NIPS, 2011.
Attributes (Guest lecture by A. Farhadi)
1. Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR, 2009.
2. Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar, ICCV, 2009.
1. Adaptively Learning the Crowd Kernel, O. Tamuz, C. Liu, S. Belongie, O. Shamir, A. Kalai, ICML, 2011.
2. Relative Attributes, D. Parikh, K. Grauman, ICCV, 2011.
Object Discovery, Image Graphs, Visual Memex
1. VisualRank: Applying PageRank to Large-Scale Image Search, Y. Jing, S. Baluja, IEEE TPAMI, 2008.
2. Unsupervised Detection of Regions of Interest Using Iterative Link Analysis, G. Kim, E. Xing, A. Torralba, NIPS, 2009.
3. Image Webs: Computing and Exploiting Connectivity in Image Collections, K. Heath, N. Gelfand, M. Ovsjanikov, M. Aanjaneya, L. J. Guibas, CVPR, 2010.
Role of Context
Yogeshwar and Sasikanth
1. Auto-context and its Application to High-level Tasks, Z. Tu, CVPR, 2008.
1. Efficient Object Category Recognition Using Classemes, L. Torresani, M. Szummer, A. Fitzgibbon, ECCV, 2010.
2. Objects as Attributes for Scene Classification, L.-J. Li, H. Su, Y. Lim and L. Fei-Fei, 1st International Workshop on Parts and Attributes, 2010.
Large-scale Image Retrieval
1. Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas, CVPR, 2009.
2. Iterative Quantization: A Procrustean Approach to Learning Binary Codes, Y. Gong and S. Lazebnik, CVPR, 2011.
1. Total Recall II: Query Expansion Revisited, O. Chum, A. Mikulik, M. Perdoch, and J. Matas, CVPR, 2011.
2. Learning Query-dependent Prefilters for Scalable Image Retrieval, L. Torresani, M. Szummer, A. Fitzgibbon, CVPR, 2009.
Semi-supervised Learning Intro
1. Self-paced Learning for Latent Variable Models, M. Pawan Kumar, B. Packer and D. Koller, NIPS, 2011.
1. Measuring the objectness of image windows, B. Alexe, T. Deselaers, and V. Ferrari, IEEE TPAMI, 2012.
Pose Estimation Intro
1. Pictorial Structures Revisited: People Detection and Articulated Pose Estimation, M. Andriluka, S. Roth, B. Schiele, CVPR, 2009.
2. Learning effective human pose estimation from inaccurate annotation, S. Johnson and M. Everingham, CVPR, 2011.
1. Twin Gaussian Processes for Structured Prediction, L. Bo and C. Sminchisescu, IJCV, 2010.
1. Robust Visual Tracking using L1 Minimization, X. Mei and H. Ling, ICCV, 2009.
Action Recognition (Guest lecture by Michalis Raptis)
1. Evaluation of local spatio-temporal features for action recognition, H. Wang, M. M. Ullah, A. Kläser, I. Laptev and C. Schmid, BMVC, 2009.
The Last Lecture
Synthesis project presentations
This course has been inspired by those offered by several of our colleagues. Here is a partial list: