Recognition and Classification in Images and Video *

203.4780


Meeting Times: Monday 9-12, Room 704

Instruction Hour: Wednesday 11:00-12:00, Room 410 (Jacobs)

Instructor: Dr. Rita Osadchy

e-mail: rita [at]cs [dot]haifa.ac.il
Office: Jacobs 410

Announcements:

§ NEW: Final Project submission is due on 12.10

§ 30.03 no class

§ All announcements and guidelines will be distributed by email.

§ Those who do not send their contact address on time will not be added to the contact list!!!

§ You must send an email to me (rita[at]cs[dot]haifa.ac.il) by March 15 from your active address, with the subject "course 4780".

Course overview:

General: This is a graduate course in computer vision. We will survey and discuss vision papers relating to object and activity recognition and scene understanding. The goal of the course is to understand classical and modern approaches to some important problems, analyzing their strengths and weaknesses, and identifying interesting open questions.

Requirements: Students will be responsible for writing a paper review each week, participating in discussions, completing a programming project, and presenting one topic in class.

Note that presentations are due one week before the slot in which your presentation is scheduled. This means you will need to read the papers, create slides, etc. one week before the date you are signed up for, to leave time for improvement. Note also that you must get my approval for your presentation.

More details on the requirements and grading breakdown are here.

Syllabus:

A. Recognizing specific objects

Global features:

1. Linear Subspaces

2. Detection as a binary decision

Local features:

3. Local features, matching for object instances

4. Visual Vocabularies and Bag of Words

Region-based methods:

5. Mid-Level Representations

B. Beyond Single objects (using additional information)

1. Saliency

2. Attributes

3. Context

C. Scalability problems

1. Scaling to a large number of categories

2. Large-scale search

D. Action recognition in video and images

Schedule and papers:

Note: * = required reading.
Additional papers are provided for reference, and as a starting point for background reading for projects.
Paper presentations: Cover the starred papers.


Date

Topics

Papers and links

Presenters

16.3

Course intro

[slides]

23.3

Introduction to Object and Event Recognition

[slides]

30.3

No class

13.4

Linear Subspaces

Global appearance models for object recognition, dimensionality reduction.

o *Eigenfaces for Recognition, Turk and Pentland, 1991. [pdf]

o *P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using Class Specific Linear Projection, 1996 [pdf]

o Face Database [here]

Additional Material

Shimon Ullman and Ronen Basri, Recognition by Linear Combinations of Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991 [pdf]

T.F. Cootes and C.J. Taylor, "Statistical models of appearance for medical image analysis and computer vision", Proc. SPIE Medical Imaging 2001. [pdf]

Magali Nadav

[pdf]
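As a rough illustration of the linear-subspace idea in this session's readings, the sketch below computes a PCA basis from flattened images and recognizes a query by nearest neighbor in the reduced space. It is a minimal sketch on synthetic vectors, not the procedure from either paper: the function names (`fit_eigenfaces`, `project`, `nearest_neighbor`) and the toy data are assumptions made for this example, and it relies on NumPy being installed.

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """faces: (n_images, n_pixels) array. Returns the mean image and the top-k principal directions."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are orthonormal principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(faces, mean, components):
    """Coordinates of each image in the k-dimensional subspace."""
    return (faces - mean) @ components.T

def nearest_neighbor(query_coeffs, gallery_coeffs, gallery_labels):
    """Label of the gallery image closest to the query in subspace coordinates."""
    distances = np.linalg.norm(gallery_coeffs - query_coeffs, axis=1)
    return gallery_labels[int(np.argmin(distances))]

# Synthetic stand-in for face images: two "identities", two noisy samples each.
rng = np.random.default_rng(0)
base = rng.normal(size=(2, 64))
gallery = np.vstack([base[i] + 0.05 * rng.normal(size=64) for i in (0, 0, 1, 1)])
labels = [0, 0, 1, 1]

mean, comps = fit_eigenfaces(gallery, k=2)
gallery_coeffs = project(gallery, mean, comps)

query = base[1] + 0.05 * rng.normal(size=64)
pred = nearest_neighbor(project(query[None, :], mean, comps)[0], gallery_coeffs, labels)
print("predicted identity:", pred)
```

With real face data, each row of `faces` would be one image flattened to a vector, and `k` would be chosen by the variance retained.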

20.4

Local features and matching for object instances:

Invariant local features, instance recognition


o *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999. [pdf] [code] [other implementations of SIFT] [IJCV]

o *Selected pages from: Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk. Foundations and Trends in Computer Graphics and Vision, 2008. [pdf] [Oxford code] [Read pp. 178-188, 216-220, 254-255]

o Oxford group interest point software

o Andrea Vedaldi's VLFeat code, including SIFT, MSER, hierarchical k-means.

o INRIA LEAR team's software, including interest points, shape features

o FLANN - Fast Library for Approximate Nearest Neighbors. Marius Muja et al.

o Google Goggles

o Kooaba

Additional Material

For more background on feature extraction: Szeliski book: Sec 3.2 Linear filtering, 4.1 Points and patches, 4.2 Edges

Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]

SURF: Speeded Up Robust Features, Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008. [pdf] [code]

Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, J. Matas, O. Chum, M. Urban, and T. Pajdla, BMVC 2002. [pdf]

A Performance Evaluation of Local Descriptors. K. Mikolajczyk and C. Schmid. CVPR 2003 [pdf]

שון מן

שמעון פאץ

[pdf]

27.4

Patch-based Representations

Visual vocabularies, bag-of-words, and the spatial pyramid kernel (SPK) for scene classification

o *Visual Categorization with Bags of Keypoints, C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, ECCV International Workshop on Statistical Learning in Computer Vision, 2004. [pdf]

o *Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006. [pdf] [code] [data]

LIBPMK feature extraction code, includes dense sampling
LIBSVM library for support vector machines
PASCAL VOC Visual Object Classes Challenge

Additional Material

Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003. [pdf] [demo]

Pedestrian Detection in Crowded Scenes, Leibe, Seemann, and Schiele, CVPR 2005. [pdf]

Sampling Strategies for Bag-of-Features Image Classification. E. Nowak, F. Jurie, and B. Triggs. ECCV 2006. [pdf]

Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR 2006. [pdf]

Orr Zilberman and Ran Bakalo [pdf]
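To make the bag-of-words representation from this session concrete, the sketch below assigns each local descriptor to its nearest visual word and builds a normalized histogram. It is only an illustration under simplifying assumptions: the vocabulary is a fixed toy list (in practice it would typically be learned by clustering, e.g. k-means, over training descriptors), the descriptors are 2-D instead of SIFT-like vectors, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))

def bag_of_words(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy vocabulary of three "visual words" (assumed, not learned here).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# Toy descriptors extracted from one hypothetical image.
descriptors = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.1, 0.8)]

histogram = bag_of_words(descriptors, vocabulary)
print(histogram)
```

The resulting histogram discards all spatial layout, which is the limitation that spatial pyramid matching addresses by computing such histograms over a hierarchy of image sub-regions.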

4.5

Detection as a binary decision

Sliding window detection, detection as a binary decision problem.

o *Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005. [pdf] [code] [PASCAL datasets]

o *Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones, CVPR 2001. [pdf] [code]

o LIBSVM library for support vector machines

o PASCAL VOC Visual Object Classes Challenge

o Face data

Additional Material

Beyond Sliding Windows: Object Localization by Efficient Subwindow Search, C. Lampert, M. Blaschko, and T. Hofmann, CVPR 2008.

A Discriminatively Trained, Multiscale, Deformable Part Model, by P. Felzenszwalb, D. McAllester and D. Ramanan. CVPR 2008. [pdf] [code]

A Trainable System for Object Detection, C. Papageorgiou and T. Poggio, IJCV 2000. [pdf]

Class-specific Hough Forests for Object Detection. J. Gall and V. Lempitsky. CVPR 2009. [pdf] [slides] [code]

דוד, טל [pdf]
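The sliding-window formulation covered in this session can be sketched as follows: every window position is scored independently, and the detector makes a binary decision per window. This is a minimal sketch under stated assumptions. The scoring function here is a placeholder (mean intensity) standing in for a trained classifier such as an SVM over HOG features or a boosted cascade, and all names and the toy image are made up for the example.

```python
def sliding_windows(height, width, win, stride):
    """Yield the (top, left) corner of every win x win window that fits in the image."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left

def window_score(image, top, left, win):
    """Placeholder classifier score: mean intensity inside the window."""
    total = sum(image[r][c]
                for r in range(top, top + win)
                for c in range(left, left + win))
    return total / (win * win)

def detect(image, win, stride, threshold):
    """Binary decision per window: keep positions whose score exceeds the threshold."""
    height, width = len(image), len(image[0])
    return [(top, left)
            for top, left in sliding_windows(height, width, win, stride)
            if window_score(image, top, left, win) > threshold]

# Toy 6x6 image with a bright 2x2 "object" at rows 2-3, columns 2-3.
image = [[0] * 6 for _ in range(6)]
for r in range(2, 4):
    for c in range(2, 4):
        image[r][c] = 1

detections = detect(image, win=2, stride=2, threshold=0.5)
print(detections)
```

A practical detector would also scan over multiple scales and apply non-maximum suppression to overlapping detections, both of which are omitted here.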

11.5

Importance and saliency

Among all items in the scene, which deserve attention (first)? What makes images interesting or memorable?

*Understanding and Predicting Importance in Images. A. Berg et al. CVPR 2012. [pdf] [UIUC sentence dataset] [ImageClef dataset]

*Learning to predict where humans look. Tilke Judd, Krista Ehinger, Fredo Durand, Antonio Torralba. ICCV 2009 [pdf] [code and data]

Learning to Detect a Salient Object. T. Liu et al. CVPR 2007. [pdf] [results] [data] [code]

What Makes an Image Memorable? P. Isola, J. Xiao, A. Torralba, A. Oliva. CVPR 2011. [pdf] [web] [code/data]

What Do We Perceive in a Glance of a Real-World Scene? L. Fei-Fei, A. Iyer, C. Koch, and P. Perona. Journal of Vision, 2007. [pdf]

What Makes a Patch Distinct? Ran Margolin, Ayellet Tal, Lihi Zelnik-Manor. [pdf], [code].

Surface Regions of Interest for Viewpoint Selection, George Leifman, Elizabeth Shtrom and Ayellet Tal. [pdf]

A Model of Saliency-based Visual Attention for Rapid Scene Analysis. L. Itti, C. Koch, and E. Niebur. PAMI 1998 [pdf]

Interesting Objects are Visually Salient. L. Elazary and L. Itti. Journal of Vision, 8(3):1–15, 2008. [pdf]

What is an Object? B. Alexe, T. Deselaers, and V. Ferrari. CVPR 2010. [pdf] [code]

A Principled Approach to Detecting Surprising Events in Video. L. Itti and P. Baldi. CVPR 2005 [pdf]

Key-Segments for Video Object Segmentation. Y. J. Lee, J. Kim, and K. Grauman. ICCV 2011 [pdf]

18.5

Describing objects with attributes

Visual properties, learning from natural language descriptions, intermediate representations

o *Describing Objects by Their Attributes, A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, CVPR 2009. [pdf] [web and data]

o *Attribute and Simile Classifiers for Face Verification, N. Kumar, A. Berg, P. Belhumeur, S. Nayar. ICCV 2009 [pdf] [web] [lfw data] [pubfig data]

Relative Attributes. D. Parikh and K. Grauman. ICCV 2011. [pdf] [code/data]

FaceTracer: A Search Engine for Large Collections of Images with Faces. N. Kumar, P. Belhumeur, and S. Nayar. ECCV 2008 [pdf] [code,data,demo]

Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009 [pdf] [web] [data]

A Joint Learning Framework for Attribute Models and Object Descriptions. D. Mahajan, S. Sellamanickam, V. Nair. ICCV 2011. [pdf]

SUN Attribute Database: Discovering, Annotating, and Recognizing Scene Attributes. G. Patterson and J. Hays. CVPR 2012. [pdf] [data]

Multi-Attribute Spaces: Calibration for Attribute Fusion and Similarity Search. W. Scheirer, N. Kumar, P. Belhumeur, T. Boult. CVPR 2012 [pdf]

Attribute-Centric Recognition for Cross-Category Generalization. A. Farhadi, I. Endres, D. Hoiem. CVPR 2010. [pdf]

ירין דידי

גיא עזר [pdf]
