16-824: Learning-Based Methods in Vision

Spring 2012

"The aim of computer vision is to overfit to our visual world"

         -- remark by Antonio Torralba (after his third beer)

Overview

Human vision is one of the most remarkable machines that has ever existed. From sparse, noisy, hopelessly ambiguous local scene measurements, our brain manages to create a coherent global visual experience. But how can this task, while seemingly effortless for humans, remain so excruciatingly difficult for a computer? Part of the answer is that humans rely on years of prior visual experience to make sense of the world, while computers have to start tabula rasa. Clearly, learning is needed to make progress on this severely underconstrained problem. However, attempts at direct application of machine learning tools to raw visual data have been largely unsuccessful.

The goal of this graduate seminar course is to gain a deeper understanding of the computer vision problem in order to better reason about ways data and learning could be used to tackle it. The central focus will be on representation of visual data, rather than on fancy learning techniques. We will be looking at all stages of visual processing, from low-level (color, texture, local patches) all the way to high-level (object recognition, general image understanding). We will pay particular attention to mid-level vision (grouping, segmentation, figure/ground, scene layout, image parsing) -- a crucial glue tying vision together that has been largely neglected. The course will emphasize using large amounts of real data (images, video, textual annotations, other meta-data). We will also discuss the difficult issue of what the right choice of training data is and how it can be acquired.

The course will consist of reading and presenting an eclectic mix of classic and recent papers on a range of topics. All students will be required to submit a written summary for each paper. Additionally, there will be two substantial class projects during the term.

Prerequisite: 16-720 or equivalent graduate Computer Vision course (No exceptions!)

Location and Time: NSH 3002, Tu Th 12:00-1:20 PM

Instructors:   Alexei “Alyosha” Efros, associate professor, CMU

Leonid Sigal, research scientist, Disney Research Pittsburgh

Office Hours:  by appointment

Paper Discussion

Leave your comments about papers on the Class Blog

Paper List

The paper list contains papers that will be discussed in class.

Schedule

Date

Presenter

Topic/Paper

Slides

Jan. 17

Alyosha

Introduction, Vision Perspective: Measurement vs. Perception, Administrative stuff, overview of the course

Intro ppt

Jan 19

Leon

Introduction, Learning Perspective

intro_learning.pdf

Jan 24

Alyosha

Intro to Data

Readings:

1. The Unreasonable Effectiveness of Data, A. Halevy, P. Norvig, and F. Pereira, IEEE Intelligent Systems, vol. 24, pp. 8-12, 2009.

2. Unbiased Look at Dataset Bias, Antonio Torralba, Alexei Efros, CVPR, 2011.

data.ppt

Jan. 26

Alyosha

Theories of Visual Perception

Reading:

1. Vision is getting easier every day, P. Cavanagh, 1995. 

Optional reading: Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century? K. Nakayama, 1998.

theories.ppt

Feb 2

Alyosha

Physiology of Vision

Readings:

1. The Plenoptic Function and the Elements of Early Vision, E.H. Adelson and J.R. Bergen, 1991. 

2. Wavelet-like receptive fields emerge from a network that learns sparse codes for natural images, B. Olshausen and D. Field, Nature, 1996.

physiology.ppt

Feb 4

Alyosha

cont.

Feb 7

Leon

Sparsity and Deep Learning

Readings:

1. Non-Local Sparse Models for Image Restoration, J. Mairal, F. Bach, J. Ponce, G. Sapiro and A. Zisserman, ICCV, 2009.

2. Adaptive Deconvolutional Networks for Mid and High Level Feature Learning, Matt Zeiler, Graham Taylor and Rob Fergus, ICCV,  2011.

deep_learning.pdf

Feb. 9

Alyosha

What should be done at the low level? When is object/scene recognition just texture recognition?

Readings:

1. When is scene identification just texture recognition?, L. W. Renninger and J. Malik, Vision Research, 2004.

2. Object Categorization by Learned Universal Visual Dictionary, J. Winn, A. Criminisi, and T. Minka, ICCV, 2005.

lowlevel.ppt

Feb. 14

Alyosha

Samantha

cont.

Reading:

1. Discriminative Learning of Local Image Descriptors, M. Brown, G. Hua and S. Winder, IEEE TPAMI 2010.

descriptors.pptx

Feb. 16

Alyosha

Mid-Level: when low-level is just too low

Reading:

1. Learning a Classification Model for Segmentation, X. Ren and J. Malik, ICCV, 2003.

midlevel.ppt

Feb. 21

Narek

Yogeshwar and Sasikanth

Contours and Segmentations

Readings:

1. Contour Detection and Hierarchical Image Segmentation, P. Arbelaez, M. Maire, C. Fowlkes and J. Malik, IEEE TPAMI, May, 2011.

2. Constrained Parametric Min-Cuts for Automatic Object Segmentation, J. Carreira and C. Sminchisescu, CVPR, 2010.

CPMC.pptx

Feb 23

Leon

Image Labeling

Readings:

1. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation, J. Shotton, J. Winn, C. Rother, A. Criminisi, ECCV, 2006.

2. Semantic Texton Forests for Image Categorization and Segmentation, J. Shotton, M. Johnson, R. Cipolla, CVPR, 2008.

image_labeling.pdf

Feb 28

Adam Stambler

Jonathon Smereka

Image Labeling, cont.

Readings:

1. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, P. Krähenbühl and V. Koltun, NIPS, 2011.

2. Nonparametric Scene Parsing via Label Transfer, C. Liu, J. Yuen and A. Torralba, IEEE TPAMI, May, 2011.

efficentcrf.pdf

Mar 1

Alyosha

Object Detection: HOG Templates

Readings:

1. Histograms of Oriented Gradients for Human Detection, N. Dalal and B. Triggs, CVPR, 2005.

2. Object Detection with Discriminatively Trained Part-Based Models, P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, IEEE TPAMI, Sept, 2010.

ObjectsParts.ppt

Mar 6

Alyosha

Object Detection: Poselets and eSVM

1. Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, L. Bourdev, J. Malik, ICCV, 2009.

2. Ensemble of Exemplar-SVMs for Object Detection and Beyond, T. Malisiewicz, A. Gupta, A. Efros, ICCV, 2011.

Mar 8

Leon

Tommy Liu

Transfer Learning

Reading:

1. Adapting Visual Category Models to New Domains, K. Saenko, B. Kulis, M. Fritz and T. Darrell, ECCV, 2010.

transfer_learning.pdf

Mar 13

BREAK

Mar 15

BREAK

Mar 20

Tinghui Zhou

Transfer Learning for Object Detection and Categorization

Readings:

1. Tabula Rasa: Model Transfer for Object Category Detection, Y. Aytar, A. Zisserman, ICCV, 2011.

2. Transfer Learning by Borrowing Examples for Multiclass Object Detection, J. Lim, R. Salakhutdinov and A. Torralba, NIPS, 2011.

slides_tz-1.pdf

Mar 22

Ali Farhadi

Maturana

Attributes (Guest lecture by A. Farhadi)

Readings:

1. Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR, 2009.

2. Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar, ICCV, 2009.

Mar 27

Alyosha

Yair Movshovitz-Attias

Anti-categories

Readings:

1. Adaptively Learning the Crowd Kernel, O. Tamuz, C. Liu, S. Belongie, O. Shamir, A. Kalai, ICML, 2011.

2. Relative Attributes, D. Parikh, K. Grauman, ICCV, 2011.

relative_attributes.pdf

Mar 29

Alyosha

Carl Doersch

Object Discovery, Image Graphs, Visual Memex

Readings:

1. VisualRank: Applying PageRank to Large-Scale Image Search, Y. Jing, S. Baluja, IEEE TPAMI, 2008.

2. Unsupervised Detection of Regions of Interest Using Iterative Link Analysis, G. Kim, E. Xing, A. Torralba, NIPS, 2009.

3. Image Webs: Computing and Exploiting Connectivity in Image Collections, K. Heath, N. Gelfand, M. Ovsjanikov, M. Aanjaneya, L. J. Guibas, CVPR, 2010.

memex_graphs.ppt

image_graphs.pdf

Apr 3

NO CLASS

Apr 5

Carl Doersch

Leon

cont.

Role of Context

Apr 10

Yogeshwar and Sasikanth

Wen-Sheng Chu

cont.

Readings:

1. Auto-context and its Application to High-level Tasks, Z. Tu, CVPR, 2008.

Readings:

1. Efficient Object Category Recognition Using Classemes, L. Torresani, M. Szummer, A. Fitzgibbon, ECCV, 2010.

2. Objects as Attributes for Scene Classification, L.-J. Li, H. Su, Y. Lim and L. Fei-Fei, 1st International Workshop on Parts and Attributes, 2010.

autocontext.pptx

attributes_classemes.pptx

Apr 12

Alyosha

Supreeth

Hatem

Large-scale Image Retrieval

Readings:

1. Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas, CVPR, 2009.

2. Iterative Quantization: A Procrustean Approach to Learning Binary Codes, Y. Gong and S. Lazebnik, CVPR, 2011.

Readings:

1. Total Recall II: Query Expansion Revisited, O. Chum, A. Mikulik, M. Perdoch, and J. Matas, CVPR, 2011.

2. Learning Query-dependent Prefilters for Scalable Image Retrieval, L. Torresani, M. Szummer, A. Fitzgibbon, CVPR, 2009.

large_scale_image_retrieval.pdf

Apr 17

Leon

Zhou Yu

Semi-supervised Learning Intro

Readings:

1. Self-paced Learning for Latent Variable Models, M. Pawan Kumar, B. Packer and D. Koller, NIPS, 2010.

Self-pacedLearningLVMs.pptx

Apr 19

David Fouhey

Leon

1. Measuring the objectness of image windows, B. Alexe, T. Deselaers, and V. Ferrari, IEEE TPAMI, 2012.

Pose Estimation Intro

objectness.ppt

Apr 24

Stephen Siena

Greydon Foil

Readings:

1. Pictorial Structures Revisited: People Detection and Articulated Pose Estimation, M. Andriluka, S. Roth, B. Schiele, CVPR, 2009.

2. Learning effective human pose estimation from inaccurate annotation, S. Johnson and M. Everingham, CVPR, 2011.

Readings:

1. Twin Gaussian Processes for Structured Prediction, L. Bo and C. Sminchisescu, IJCV, 2010.

Apr 26

Leon

Shoou-I Yu

Tracking Intro 

Readings:

1. Robust Visual Tracking using L1 Minimization, X. Mei and H. Ling, ICCV, 2009.

May 1

Michalis Raptis

Zhen-zhong Lan

Action Recognition (Guest lecture by Michalis Raptis)

Readings:

1. Evaluation of local spatio-temporal features for action recognition, H. Wang, M. M. Ullah, A. Kläser, I. Laptev and C. Schmid, BMVC, 2009.

May 3

Alyosha

The Last Lecture

conclusion.ppt

May 4

6-9pm

Synthesis project presentations

Similar Courses

This course has been inspired by similar courses offered by several of our colleagues. Here is a partial list:

Tutorials, workshops, summer schools and seminars: