Recognition and Classification in Images and Video *



Meeting Times: Monday 9-12, Room 704

Instruction Hour: Wednesday 11:00-12:00, Room 410 (Jacobs)

Instructor: Dr. Rita Osadchy

e-mail: rita [at]cs [dot]
Office: Jacobs 410




  NEW: The final project submission is due on 12.10

  30.03 no class

  All announcements and guidelines will be distributed by email.

  Those who do not send their contact address on time will not be added to the contact list!!!

  You must email me at rita[at]cs[dot] by March 15, from your active address, with the subject "course 4780".


Course overview:


General: This is a graduate course in computer vision.   We will survey and discuss vision papers relating to object and activity recognition and scene understanding.  The goal of the course is to understand classical and modern approaches to some important problems, analyzing their strengths and weaknesses, and identifying interesting open questions.

Requirements: Students will be responsible for writing a paper review each week, participating in discussions, completing a programming project, and presenting one topic in class.

Note that presentations are due one week before your scheduled presentation slot.  This means you will need to read the papers, create slides, etc. one week before the date you signed up for, to leave time for improvement. You must also get my approval for your presentation.

More details on the requirements and grading breakdown are here


A.    Recognizing specific objects

Global features:

1.     Linear Subspaces

2.     Detection as a binary decision

Local features:

3.     Local features, matching for object instances

4.     Visual Vocabularies and Bag of Words


Region-based methods:

5.     Mid-Level Representations

B.    Beyond single objects (using additional information)

1.     Saliency

2.     Attributes

3.     Context


C.    Scalability problems

1.     Scaling with a large number of categories

2.     Large-scale search


D.    Action recognition in video and images

Schedule and papers:

Note:  * = required reading. 
Additional papers are provided for reference, and as a starting point for background reading for projects.
Paper presentations: Cover the starred papers.




Papers and links



Course intro 



Introduction to Object and Event Recognition




No class




Linear Subspaces

Global appearance models for object recognition, dimensionality reduction.









o    *Eigenfaces for Recognition, Turk and Pentland, 1991.  [pdf]

o    *P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using Class Specific Linear Projection, 1996 [pdf]

o    Face Database [here]

Additional Material

Shimon Ullman and Ronen Basri, Recognition by Linear Combinations of Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991 [pdf]

T.F. Cootes and C.J. Taylor, "Statistical models of appearance for medical image analysis and computer vision", Proc. SPIE Medical Imaging 2001. [pdf]



Magali Nadav



Local features and matching for object instances:

Invariant local features, instance recognition


o    *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]

o    *Selected pages from: Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. [pdf]  [Oxford code] [Read pp. 178-188, 216-220, 254-255]


o    Oxford group interest point software

o    Andrea Vedaldi's VLFeat code, including SIFT, MSER, hierarchical k-means.

o    INRIA LEAR team's software, including interest points, shape features

o    FLANN - Fast Library for Approximate Nearest Neighbors.  Marius Muja et al. 

o    Google Goggles

o    Kooaba

Additional Material

For more background on feature extraction: Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges

Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]

SURF: Speeded Up Robust Features, Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008.  [pdf] [code]

Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, M. Urban, and T. Pajdla, BMVC 2002.  [pdf]

A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid.  CVPR 2003 [pdf]

שון מן

שמעון פאץ



Patch-based Representations

Visual vocabularies, bag-of-words, and SPK for scene classification

o    *Visual Categorization with Bags of Keypoints, C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, ECCV International Workshop on Statistical Learning in Computer Vision, 2004.  [pdf]

o    *Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006.  [pdf] [code] [data]

Additional Material

Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.  [pdf]  [demo]

Pedestrian Detection in Crowded Scenes, Leibe, Seemann, and Schiele, CVPR 2005.  [pdf]

Sampling Strategies for Bag-of-Features Image Classification.  E. Nowak, F. Jurie, and B. Triggs.  ECCV 2006. [pdf]

Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]



Orr Zilberman and Ran Bakalo [pdf]


Detection as a binary decision

Sliding window detection, detection as a binary decision problem.



o    *Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.  [pdf]  [code] [PASCAL datasets]

o    *Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001.  [pdf]  [code]


o    LIBSVM library for support vector machines

o    PASCAL VOC Visual Object Classes Challenge

o    Face data

Additional Material

Beyond Sliding Windows: Object Localization by Efficient Subwindow Search.  C. Lampert, M. Blaschko, and T. Hofmann.  CVPR 2008.

A Discriminatively Trained, Multiscale, Deformable Part Model, P. Felzenszwalb, D. McAllester, and D. Ramanan.  CVPR 2008.  [pdf]  [code]

A Trainable System for Object Detection, C. Papageorgiou and T. Poggio, IJCV 2000.  [pdf]

Class-specific Hough Forests for Object Detection.  J. Gall and V. Lempitsky.  CVPR 2009.  [pdf] [slides] [code]

דוד, טל [pdf]


Importance and saliency

Among all items in the scene, which deserve attention (first)?  What makes images interesting or memorable?



Learning to Detect a Salient Object.  T. Liu et al. CVPR 2007.  [pdf]  [results]  [data]  [code]

What Makes an Image Memorable?  P. Isola, J. Xiao, A. Torralba, A. Oliva. CVPR 2011. [pdf] [web] [code/data]

What Do We Perceive in a Glance of a Real-World Scene?  L. Fei-Fei, A. Iyer, C. Koch, and P. Perona.  Journal of Vision, 2007.  [pdf]

What Makes a Patch Distinct? Ran Margolin, Ayellet Tal, Lihi Zelnik-Manor. [pdf], [code].

Surface Regions of Interest for Viewpoint Selection, George Leifman, Elizabeth Shtrom and Ayellet Tal. [pdf]

A Model of Saliency-based Visual Attention for Rapid Scene Analysis.  L. Itti, C. Koch, and E. Niebur.  PAMI 1998  [pdf]

Interesting Objects are Visually Salient.  L. Elazary and L. Itti.  Journal of Vision, 8(3):115, 2008.  [pdf]

What is an Object?  B. Alexe, T. Deselaers, and V. Ferrari.  CVPR 2010.  [pdf] [code]

A Principled Approach to Detecting Surprising Events in Video.  L. Itti and P. Baldi.  CVPR 2005  [pdf]

Key-Segments for Video Object Segmentation.  Y. J. Lee, J. Kim, and K. Grauman.  ICCV 2011  [pdf]


Describing objects with attributes

Visual properties, learning from natural language descriptions, intermediate representations

o    *Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR 2009. [pdf] [web and data]

o    *Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar. ICCV 2009 [pdf] [web] [lfw data] [pubfig]

Relative Attributes.  D. Parikh and K. Grauman.  ICCV 2011.  [pdf]  [code/data]

FaceTracer: A Search Engine for Large Collections of Images with Faces. N. Kumar, P. Belhumeur, and S. Nayar. ECCV 2008 [pdf] [code,data,demo]

Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009  [pdf] [web] [data]

A Joint Learning Framework for Attribute Models and Object Descriptions.  D. Mahajan, S. Sellamanickam, V. Nair.  ICCV 2011.  [pdf]

SUN Attribute Database: Discovering, Annotating, and Recognizing Scene Attributes.  G. Patterson and J. Hays.  CVPR 2012.  [pdf] [data]

Multi-Attribute Spaces: Calibration for Attribute Fusion and Similarity Search.  W. Scheirer, N. Kumar, P. Belhumeur, T. Boult.  CVPR 2012  [pdf]

Attribute-Centric Recognition for Cross-Category Generalization.  A. Farhadi, I. Endres, D. Hoiem.  CVPR 2010.  [pdf]

ירין דידי

גיא עזר [pdf]


Large-scale image/object search and mining:

Scalable retrieval algorithms, mining for visual themes, particularly for object instances

o    *Semi-supervised Hashing for Large-scale Search. J. Wang, S. Kumar, S.-F. Chang [pdf]

o    *Hierarchical Semantic Indexing for Large Scale Image Retrieval. Jia Deng, Alex Berg, Li Fei-Fei [pdf]



Kernelized Locality Sensitive Hashing for Scalable Image Search, by B. Kulis and K. Grauman, ICCV 2009 [pdf]  [code] [80M Tiny Images data]

Image Webs: Computing and Exploiting Connectivity in Image Collections.  K. Heath, N. Gelfand, M. Ovsjanikov, M. Aanjaneya, and L. Guibas.  CVPR 2010.  [pdf]

Discovering Favorite Views of Popular Places with Iconoid Shift.  T. Weyand and B. Leibe.  ICCV 2011.  [pdf] [Paris 500K dataset]

Total Recall II: Query Expansion Revisited.  O. Chum, A. Mikulik, M. Perdoch, and J. Matas.  CVPR 2011.  [pdf]

Geometric Min-Hashing: Finding a (Thick) Needle in a Haystack, O. Chum, M. Perdoch, and J. Matas.  CVPR 2009.  [pdf]

Learning Query-dependent Prefilters for Scalable Image Retrieval.  L. Torresani, M. Szummer, and A. Fitzgibbon.  CVPR 2009.  [pdf]

Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, C. Lampert, ICCV 2009.  [pdf]  

Object Retrieval with Large Vocabularies and Fast Spatial Matching.  J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, CVPR 2007.  [pdf] [approx k-means code]



Project Discussion




Dealing with many categories

Sharing features between classes, transfer, taxonomy, learning from few examples, exploiting class relationships

o    *Sharing Visual Features for Multiclass and Multiview Object Detection, A. Torralba, K. Murphy, W. Freeman, PAMI 2007. [pdf] [code]

o    *A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS, pp. 1097-1105 (2012) [pdf] [code (other versions are available)] [Slides on CNN]



Discriminative Learning of Relaxed Hierarchy for Large-scale Visual Recognition.  T. Gao and Daphne Koller ICCV 2011.  [pdf] [code]

Tabula Rasa: Model Transfer for Object Category Detection. Y. Aytar and A. Zisserman.  ICCV 2011. [pdf] [HoG code]

Comparative Object Similarity for Improved Recognition with Few or Zero Examples. G. Wang, D. Forsyth, and D. Hoiem. CVPR 2010. [pdf]

Constructing Category Hierarchies for Visual Recognition, M. Marszalek and C. Schmid.  ECCV 2008.  [pdf]  [web] [Caltech256]

Incremental Learning of Object Detectors Using a Visual Shape Alphabet.  Opelt, Pinz, and Zisserman, CVPR 2006.  [pdf]

איליה בלד


אלקס וטרונין



Activity recognition

Recognizing and localizing human actions in video or static images

o    *Learning Realistic Human Actions from Movies. I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld. CVPR 2008. [pdf] [data] [code]

o    *Action Recognition from a Distributed Representation of Pose and Appearance, S. Maji, L. Bourdev, J. Malik, CVPR 2011. [pdf] [code]



Detecting Actions, Poses, and Objects with Relational Phraselets.  C. Desai and D. Ramanan.  ECCV 2012.  [pdf] [data] [code]

Beyond Actions: Discriminative Models for Contextual Group Activities.  T. Lan, Y. Wang, W. Yang, and G. Mori.  NIPS 2010.  [pdf] [data]

Efficient Activity Detection with Max-Subgraph Search.  C.-Y. Chen and K. Grauman. CVPR 2012.  [pdf] [project page]  [code]

Action Bank: a High-Level Representation of Activity in Video.  S. Sadanand and J. Corso.  CVPR 2012 [pdf]  [code/data]

A Hough Transform-Based Voting Framework for Action Recognition.  A. Yao, J. Gall, L. Van Gool.  CVPR 2010.  [pdf] [code/data]

Actions in Context, M. Marszalek, I. Laptev, C. Schmid.  CVPR 2009.  [pdf] [web] [data]

Objects in Action: An Approach for Combining Action Understanding and Object Perception.   A. Gupta and L. Davis.  CVPR, 2007.  [pdf]  [data]

A Scalable Approach to Activity Recognition Based on Object Use. J. Wu, A. Osuntogun, T. Choudhury, M. Philipose, and J. Rehg.  ICCV 2007.  [pdf]

Recognizing Actions at a Distance.  A. Efros, G. Mori, J. Malik.  ICCV 2003.  [pdf] [web]

Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition.  A. Kovashka and K. Grauman.  CVPR 2010.  [pdf]

Temporal Causality for the Analysis of Visual Events.  K. Prabhakar, S. Oh, P. Wang, G. Abowd, and J. Rehg.  CVPR 2010.  [pdf] [Georgia Tech Computational Behavior Science project]

What's Going on?: Discovering Spatio-Temporal Dependencies in Dynamic Scenes.  D. Kuettel et al.  CVPR 2010.  [pdf]

Learning Actions From the Web.  N. Ikizler-Cinbis, R. Gokberk Cinbis, S. Sclaroff.  ICCV 2009.  [pdf]


Other useful links:


* This course is based on the UT-Austin course Special Topics in Computer Vision, by Kristen Grauman.