Overview

Human vision is one of the most remarkable machines that ever existed. From sparse, noisy, hopelessly ambiguous local scene measurements our brain manages to create a coherent global visual experience. But how can this task, while seemingly effortless for humans, remain so excruciatingly difficult for a computer? Part of the answer is that humans rely on years of prior visual experience to make sense of the world, while computers have to start tabula rasa. Clearly, learning is needed to make progress on this severely underconstrained problem. However, attempts at direct application of machine learning tools to raw visual data have been largely unsuccessful.

The goal of this graduate seminar course is to gain a deeper understanding of the computer vision problem in order to better reason about ways data and learning could be used to tackle it. The central focus will be on representation of visual data, rather than on fancy learning techniques. We will be looking at all stages of visual processing, from low-level (color, texture, local patches) all the way to high-level (object recognition, general image understanding). We will pay particular attention to mid-level vision (grouping, segmentation, figure/ground, scene layout, image parsing) -- a crucial glue tying vision together that has been largely neglected. The course will have an emphasis on using large amounts of real data (images, video, textual annotations, other meta-data). We will also discuss the difficult issue of what the right choice of training data is and how it can be acquired.

The course will consist of reading and presenting an eclectic mix of classic and recent papers on a range of topics. All students will be required to submit a written summary for each paper. Additionally, there will be two substantial class projects during the term.

Prerequisite: 16-720 or equivalent graduate Computer Vision course (No exceptions!)

Location and Time: NSH 3002, Tu Th 12:00-1:20 PM

Instructors: Alexei “Alyosha” Efros, associate professor, CMU

Leonid Sigal, research scientist, Disney Research Pittsburgh

Office Hours: by appointment

Paper Discussion

Leave your comments about papers on the Class Blog

Paper List

The paper list contains papers that will be discussed in class.

Schedule

Date	Presenter	Topic/Paper	Slides
Jan. 17	Alyosha	Introduction, Vision Perspective: Measurement vs. Perception, Administrative stuff, overview of the course	Intro ppt
Jan 19	Leon	Introduction, Learning Perspective	intro_learning.pdf
Jan 24	Alyosha	Intro to Data Readings: 1. The Unreasonable Effectiveness of Data, A. Halevy, P. Norvig, and F. Pereira, IEEE Intelligent Systems, 24(2):8-12, 2009. 2. Unbiased Look at Dataset Bias, Antonio Torralba, Alexei Efros, CVPR, 2011.	data.ppt
Jan. 26	Alyosha	Theories of Visual Perception Reading: 1. Vision is getting easier every day, P. Cavanagh, 1995. Optional reading: Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century? K. Nakayama, 1998.	theories.ppt
Feb 2	Alyosha	Physiology of Vision Readings: 1. The Plenoptic Function and the Elements of Early Vision, E.H. Adelson and J.R. Bergen, 1991. 2. Wavelet-like receptive fields emerge from a network that learns sparse codes for natural images, B. Olshausen and D. Field, Nature, 1996.	physiology.ppt
Feb 4	Alyosha	cont.
Feb 7	Leon	Sparsity and Deep Learning Readings: 1. Non-Local Sparse Models for Image Restoration, J. Mairal, F. Bach, J. Ponce, G. Sapiro and A. Zisserman, ICCV, 2009. 2. Adaptive Deconvolutional Networks for Mid and High Level Feature Learning, Matt Zeiler, Graham Taylor and Rob Fergus, ICCV, 2011.	deep_learning.pdf
Feb. 9	Alyosha	What should be done at the low level? When is object/scene recognition just texture recognition? Readings: 1. When is scene recognition just texture recognition?, L. W. Renninger and J. Malik, Vision Research, 2004. 2. Object Categorization by Learned Universal Visual Dictionary, J. Winn, A. Criminisi, and T. Minka, ICCV, 2005.	lowlevel.ppt
Feb. 14	Alyosha Samantha	cont. Reading: 1. Discriminant Learning of Local Image Descriptors, M. Brown, G. Hua and S. Winder, IEEE TPAMI 2010.	descriptors.pptx
Feb. 16	Alyosha	Mid-Level: when low-level is just too low Reading: 1. Learning a Classification Model for Segmentation, X. Ren and J. Malik, ICCV, 2003.	midlevel.ppt
Feb. 21	Narek Yogeshwar and Sasikanth	Contours and Segmentations Readings: 1. Contour Detection and Hierarchical Image Segmentation, P. Arbelaez, M. Maire, C. Fowlkes and J. Malik, IEEE TPAMI, May, 2011. 2. Constrained Parametric Min-Cuts for Automatic Object Segmentation, J. Carreira and C. Sminchisescu, CVPR, 2010.	CPMC.pptx
Feb 23	Leon	Image Labeling Readings: 1. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation, J. Shotton, J. Winn, C. Rother, A. Criminisi, ECCV, 2006. 2. Semantic Texton Forests for Image Categorization and Segmentation, J. Shotton, M. Johnson, R. Cipolla, CVPR, 2008.	image_labeling.pdf
Feb 28	Adam Stambler Jonathon Smereka	Image Labeling, cont. Readings: 1. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, P. Krähenbühl and V. Koltun, NIPS, 2011. 2. Nonparametric Scene Parsing via Label Transfer, C. Liu, J. Yuen and A. Torralba, IEEE TPAMI, May, 2011.	efficentcrf.pdf
Mar 1	Alyosha	Object Detection: HOG Templates Readings: 1. Histograms of Oriented Gradients for Human Detection, Dalal & Triggs 2. Object Detection with Discriminatively Trained Part-Based Models, P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, IEEE TPAMI, Sept, 2009.	ObjectsParts.ppt
Mar 6	Alyosha	Object Detection: Poselets and eSVM Readings: 1. Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, L. Bourdev, J. Malik, ICCV, 2009. 2. Ensemble of Exemplar-SVMs for Object Detection and Beyond, T. Malisiewicz, A. Gupta, A. Efros, ICCV, 2011.
Mar 8	Leon Tommy Liu	Transfer Learning Reading: 1. Adapting Visual Category Models to New Domains, K. Saenko, B. Kulis, M. Fritz and T. Darrell, ECCV, 2010.	transfer_learning.pdf
Mar 13		BREAK
Mar 15		BREAK
Mar 20	Tinghui Zhou	Transfer Learning for Object Detection and Categorization Readings: 1. Tabula Rasa: Model Transfer for Object Category Detection, Y. Aytar, A. Zisserman, ICCV, 2011. 2. Transfer Learning by Borrowing Examples for Multiclass Object Detection, J. Lim, R. Salakhutdinov and A. Torralba, NIPS, 2011.	slides_tz-1.pdf
Mar 22	Ali Farhadi Maturana	Attributes (Guest lecture by A. Farhadi) Readings: 1. Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR, 2009. 2. Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar, ICCV, 2009.
Mar 27	Alesha Yair Movshovitz-Attias	Anti-categories Readings: 1. Adaptively Learning the Crowd Kernel, O. Tamuz, C. Liu, S. Belongie, O. Shamir, A. Kalai, ICML, 2011. 2. Relative Attributes, D. Parikh, K. Grauman, ICCV, 2011.	relative_attributes.pdf
Mar 29	Alyosha Carl Doersch	Object Discovery, Image Graphs, Visual Memex Readings: 1. VisualRank: Applying PageRank to Large-Scale Image Search, Y. Jing, S. Baluja, IEEE TPAMI, 2008. 2. Unsupervised Detection of Regions of Interest Using Iterative Link Analysis, G. Kim, E. Xing, A. Torralba, NIPS, 2009. 3. Image Webs: Computing and Exploiting Connectivity in Image Collections, K. Heath, N. Gelfand, M. Ovsjanikov, M. Aanjaneya, L. J. Guibas, CVPR, 2010.	memex_graphs.ppt image_graphs.pdf
Apr 3		NO CLASS
Apr 5	Carl Doersch Leon	cont. Role of Context
Apr 10	Yogeshwar and Sasikanth Wen-Sheng Chu	cont. Readings: 1. Auto-context and its Application to High-level Tasks, Z. Tu, CVPR, 2008. Readings: 1. Efficient Object Category Recognition Using Classemes, L. Torresani, M.Szummer, A. Fitzgibbon, ECCV, 2010. 2. Objects as Attributes for Scene Classification, L.-J. Li, H. Su, Y. Lim and L. Fei-Fei, 1st International Workshop on Parts and Attributes, 2010.	autocontext.pptx attributes_classemes.pptx
Apr 12	Alyosha Supreeth Hatem	Large-scale Image Retrieval Readings: 1. Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas, CVPR, 2009. 2. Iterative Quantization: A Procrustean Approach to Learning Binary Codes, Y. Gong and S. Lazebnik, CVPR, 2011. Readings: 1. Total Recall II: Query Expansion Revisited, O. Chum, A. Mikulik, M. Perdoch, and J. Matas, CVPR, 2011. 2. Learning Query-dependent Prefilters for Scalable Image Retrieval, L. Torresani, M. Szummer, A. Fitzgibbon, CVPR, 2009.	large_scale_image_retrieval.pdf
Apr 17	Leon Zhou Yu	Semi-supervised Learning Intro Readings: 1. Self-paced Learning for Latent Variable Models, M. Pawan Kumar, B. Packer and D. Koller, NIPS, 2011.	Self-pacedLearningLVMs.pptx
Apr 19	David Fouhey Leon	1. Measuring the objectness of image windows, B. Alexe, T. Deselaers, and V. Ferrari, IEEE TPAMI, 2012. Pose Estimation Intro	objectness.ppt
Apr 24	Stephen Siena Greydon Foil	Readings: 1. Pictorial Structures Revisited: People Detection and Articulated Pose Estimation, M. Andriluka, S. Roth, B. Schiele, CVPR, 2009. 2. Learning effective human pose estimation from inaccurate annotation, S. Johnson and M. Everingham, CVPR, 2011. Readings: 1. Twin Gaussian Processes for Structured Prediction, L. Bo and C. Sminchisescu, IJCV, 2010.
Apr 26	Leon Shoou-I Yu	Tracking Intro Readings: 1. Robust Visual Tracking using L1 Minimization, X. Mei and H. Ling, ICCV, 2009.
May 1	Michalis Raptis Zhen-zhong Lan	Action Recognition (Guest lecture by Michalis Raptis) Readings: 1. Evaluation of local spatio-temporal features for action recognition, H. Wang, M. M. Ullah, A. Kläser, I. Laptev and C. Schmid, BMVC, 2009.
May 3	Alyosha	The Last Lecture	conclusion.ppt
May 4	6-9pm	Synthesis project presentations

Similar Courses

This course was inspired by those offered by several of our colleagues. Here is a partial list:

Visual Recognition (Kristen Grauman, Texas-Austin, Fall 2011)
Grounding Object Recognition and Scene Understanding (Antonio Torralba, MIT, Fall 2011)
Visual Scene Understanding (Derek Hoiem, UIUC, Spring 2009)
Statistical Models for Visual Recognition (Deva Ramanan, UCI, Winter 2009)
Object Recognition and Scene Understanding (Antonio Torralba, MIT, Fall 2008)
Scene Understanding Seminar (Aude Oliva, MIT, Fall 2008)
Selected Topics in Vision & Learning (Serge Belongie, UCSD, Spring 2011)
Learning and Inference in Vision (Bill Freeman, MIT)
Cutting Edge of Computer Vision (Fei-Fei Li, Stanford)
Recognizing People, Objects, and Scenes (Jitendra Malik, Berkeley)
Recognition Problems in Computer Vision (Greg Mori, SFU)
Visual Recognition (Pietro Perona, Caltech)
Vision and Learning (Jianbo Shi, UPenn)

Tutorials, workshops, summer schools and seminars: