Post-doc position at INRIA (LEAR team)

The LEAR team at INRIA Grenoble is looking for a qualified postdoctoral researcher specializing in Computer Vision and Machine Learning to work on discovering relationships between actions and objects.

The position is offered at the “Rhône-Alpes” Research Unit of INRIA, located near Grenoble and Lyon. The Unit employs more than 600 people in 26 research teams and 10 support services.

Starting date: Summer 2009

Deadline for applications: June 2009.

Monthly salary after taxes: 1,983 € (medical insurance included)

Contact: Remi.Ronfard (at)


Recently, a number of image ranking approaches have been proposed that build upon visual-word similarity networks (e.g. [3,4]). These methods explore relationships between object categories by analyzing similarities between the extracted visual features. For video actions the relationships are more complex, because similarities can arise in the space of image features, in the space of motion features, and in the joint space of image and motion features. A method for discovering relationships in such networks would enable the recognition of objects, motions, and human-object interactions. The initial investigation can follow the lines of [3,4].
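As a rough illustration of the link-analysis idea behind VisualRank [4], the sketch below ranks the nodes of a similarity graph by PageRank-style power iteration. It assumes a precomputed, non-negative, symmetric similarity matrix (for instance, counts of shared visual words between images); the function name `visual_rank`, the damping value 0.85 and the toy matrix are illustrative choices, not details taken from [3] or [4].

```python
def visual_rank(similarity, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style ranking of the nodes of a similarity graph.

    similarity[i][j] is an assumed non-negative similarity between items
    i and j (hypothetical input, e.g. shared visual-word counts).
    """
    n = len(similarity)
    # Column-normalize so every column sums to 1 (column-stochastic matrix);
    # a column with no links is replaced by a uniform distribution.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    trans = [[similarity[i][j] / col_sums[j] if col_sums[j] > 0 else 1.0 / n
              for j in range(n)] for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # One damped random-walk step: follow a similarity link with
        # probability `damping`, otherwise jump to a uniformly random node.
        new = [damping * sum(trans[i][j] * rank[j] for j in range(n))
               + (1.0 - damping) / n for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


# Toy example: images 0-2 are mutually similar, image 3 is weakly linked.
S = [[0, 5, 4, 1],
     [5, 0, 6, 0],
     [4, 6, 0, 1],
     [1, 0, 1, 0]]
scores = visual_rank(S)
print([round(s, 3) for s in scores])  # image 3 receives the lowest score
```

The routine does not care how the similarities were computed, so in principle the same ranking could be run on a graph built from image features, motion features, or both; how to build such graphs for video is part of what the position would investigate.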

To achieve this goal, a good feature extraction method has to be developed. Existing spatio-temporal features describe the content of video subvolumes of simple shape. Intuitively, the procedure that discovers the shapes of such subregions should be guided by a general measure of how descriptive a subregion is. Unfortunately, straightforward extensions of common 2D subregion extraction methods to video [1] may not be appropriate. In addition, ways of obtaining good descriptors for the extracted subregions should be investigated, with special care taken to obtain view- and time-invariant spatio-temporal descriptors.

To investigate the relationships between actions and objects, the problem of analyzing human-object interactions must be addressed. A method for recognizing interactions from an egocentric camera would be of significant practical benefit. Ideally, the approach would discover atomic interactions from sequences of long-term activities. One possible way to implement this idea is to build on the interaction models of [2].

Skills and Profile

* PhD degree (preferably in Computer Vision or Machine Learning)

* Solid programming skills; the project involves programming in Matlab and C++

* Solid knowledge of mathematics (especially linear algebra and statistics)

* Creative and highly motivated

* Fluent in English, both written and spoken

* Prior experience in action recognition, video retrieval or object recognition is a plus


[1] A. Oikonomopoulos, I. Patras, and M. Pantic, Human action recognition with spatiotemporal salient points, IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, vol. 36, no. 3, pp. 710-719, 2006.

[2] H. Kjellström, J. Romero, D. Martinez Mercado, and D. Kragic, Simultaneous visual recognition of manipulation actions and manipulated objects, in ECCV (2), 2008, pp. 336-349.

[3] G. Kim, C. Faloutsos, and M. Hebert, Unsupervised modeling of object categories using link analysis techniques, in CVPR, 2008, pp. 1-8.

[4] Y. Jing and S. Baluja, VisualRank: Applying PageRank to large-scale image search, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1877-1890, 2008.