The goal of this challenge is to recognise objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem, in that a training set of labelled images will be provided. The four object classes that have been selected are:

  • motorbikes
  • bicycles
  • people
  • cars

There will be two main competitions:

  • For each of the four classes, predicting the presence or absence of an example of that class in the test image.
  • Predicting the bounding box and label of each object from the four target classes in the test image.
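
To make the difference between the two competitions concrete, the following is a minimal sketch of what each kind of prediction might look like for a single test image. The names `Detection`, `CLASSES`, and `presence_from_detections`, and the corner-coordinate box layout, are assumptions made for illustration only; they are not the challenge's submission format.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four target classes named in the challenge description.
CLASSES = ("motorbikes", "bicycles", "people", "cars")


@dataclass(frozen=True)
class Detection:
    """One predicted object for the second competition (hypothetical layout).

    The box is given as pixel corner coordinates; this is an assumption,
    not the challenge's actual format.
    """
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def presence_from_detections(detections: List[Detection]) -> Dict[str, bool]:
    """Collapse per-object predictions into a per-class presence/absence answer.

    This is the shape of output the first competition asks for: one
    yes/no decision per class for the whole image.
    """
    present = {name: False for name in CLASSES}
    for det in detections:
        if det.label not in present:
            raise ValueError(f"unknown class: {det.label}")
        present[det.label] = True
    return present


# Illustrative predictions for one image (made-up coordinates).
example = [
    Detection("cars", 10, 20, 110, 80),
    Detection("people", 50, 5, 70, 60),
]
presence = presence_from_detections(example)
```

The sketch shows that the first competition needs only one decision per class per image, whereas the second needs a located, labelled box for every object. A system entered only in the first competition need not produce boxes at all.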

Contestants may enter either (or both) of these competitions, and can choose to tackle any (or all) of the four object classes. The challenge allows for two approaches to each of the competitions:

  • Contestants may use systems built or trained using any methods or data, excluding the provided test sets.
  • Contestants may use only systems built or trained using the provided training data.

The intention in the first case is to establish what level of success can currently be achieved on these problems, and by what method; in the second case, the intention is to establish which method is most successful given a specified training set.