The goal of this challenge is to draw the attention of the Machine Learning community to the problem in which the input distribution, p(x), differs between training and test inputs. A number of regression and classification tasks are proposed in which the test inputs follow a different distribution from the training inputs. Training data (input-output pairs) are given, and the contestants are asked to predict the outputs associated with a set of validation and test inputs. Probabilistic predictions are strongly encouraged, though non-probabilistic “point” predictions are also accepted. The performance of the competing algorithms will be evaluated both with traditional losses that take into account only “point predictions” and with losses that assess the quality of the probabilistic predictions.
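
To make the distinction between the two kinds of losses concrete, the sketch below scores a toy regression predictor under covariate shift with both a point loss (mean squared error) and a probabilistic loss (negative log predictive density of a Gaussian predictive distribution). The data, the predictor, and the specific losses are illustrative assumptions, not the challenge's actual datasets or evaluation code.

```python
import numpy as np

# Toy illustration of scoring under covariate shift: a point loss (mean squared
# error) versus a probabilistic loss (negative log predictive density).
# The data, model, and losses are hypothetical, not the challenge's own.

rng = np.random.default_rng(0)

x_train = rng.normal(loc=0.0, scale=1.0, size=200)   # training inputs ~ p_train(x)
x_test = rng.normal(loc=2.0, scale=1.0, size=200)    # test inputs ~ p_test(x), shifted

def f(x):
    return np.sin(x)                                  # unknown target function

y_train = f(x_train) + rng.normal(scale=0.1, size=x_train.shape)
y_test = f(x_test) + rng.normal(scale=0.1, size=x_test.shape)

# A deliberately simple predictor fitted on the training region only
coeffs = np.polyfit(x_train, y_train, deg=1)
mean_pred = np.polyval(coeffs, x_test)                # point predictions
std_pred = np.full_like(mean_pred, 0.3)               # crude predictive uncertainty

point_loss = np.mean((y_test - mean_pred) ** 2)       # scores point predictions only
nlpd = np.mean(0.5 * np.log(2 * np.pi * std_pred ** 2)
               + 0.5 * ((y_test - mean_pred) / std_pred) ** 2)  # scores the whole predictive distribution

print(f"mean squared error:              {point_loss:.3f}")
print(f"negative log predictive density: {nlpd:.3f}")
```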

Stemmatology (a.k.a. stemmatics) studies the relations among different variants of a document that have been gradually built from an original by copying and modifying earlier versions. The aim of such a study is to reconstruct the family tree of the variants. We invite applications of established and, in particular, novel approaches, including but of course not restricted to hierarchical clustering, graphical modelling, link analysis, phylogenetics, string matching, etc. The objective of the challenge is to evaluate the performance of these various approaches. Several sets of variants of different texts are provided, and the participants should attempt to reconstruct the relationships among the variants in each dataset. This enables the comparison of methods usually applied in unsupervised scenarios.
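
As a concrete, if simplistic, illustration of one of the approaches listed above, the sketch below computes word-level edit distances between a handful of invented sentence variants and feeds them to agglomerative (hierarchical) clustering; the resulting tree is a crude stand-in for a stemma. The variants and the distance measure are hypothetical and far smaller than the challenge datasets.

```python
import itertools
import numpy as np
from scipy.cluster.hierarchy import linkage

# Miniature example: four variants of the same sentence, compared by word-level
# edit distance and grouped by average-linkage hierarchical clustering.

variants = {
    "A": "the quick brown fox jumps over the lazy dog",
    "B": "the quick brown fox jumped over the lazy dog",
    "C": "a quick brown fox jumped over a lazy dog",
    "D": "the quick brown fox jumps over the sleeping dog",
}

def edit_distance(a, b):
    """Word-level Levenshtein distance between two variants."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (wa != wb)))     # substitution
        prev = cur
    return prev[-1]

names = sorted(variants)
# Condensed pairwise distance vector, in the order expected by scipy's linkage()
dists = [edit_distance(variants[x], variants[y])
         for x, y in itertools.combinations(names, 2)]

tree = linkage(np.array(dists, dtype=float), method="average")
print(tree)   # each row merges two clusters; the dendrogram approximates a stemma
```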

Letter-to-phoneme conversion is a classic problem in machine learning (ML), as it is both hard (at least for languages like English and French) and important. For non-linguists, a ‘phoneme’ is an abstract unit corresponding to the equivalence class of physical sounds that ‘represent’ the same speech sound. That is, members of the equivalence class are perceived by a speaker of the language as the ‘same’ phoneme: the word ‘cat’ consists of three phonemes, two of which are shared with the word ‘bat’. A phoneme is defined by its role in distinguishing word pairs like ‘bat’ and ‘cat’. Thus, /b/ and /k/ are different phonemes. But the /b/ in ‘bat’ and the /b/ in ‘tab’ are the same phoneme, in spite of their different acoustic realisations, because the difference between them is never used (in English) to signal a difference between minimally distinctive word pairs. Although we intend to give most prominence to letter-to-phoneme conversion, the community is challenged to develop and submit innovative solutions to this and related problems.
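
A minimal baseline for the conversion task might treat each letter as a classification instance, with a small window of neighbouring letters as context, assuming a letter-to-phoneme alignment is already given. The sketch below illustrates this with a few invented aligned words and scikit-learn; the feature set, the alignment, and the tiny vocabulary are assumptions for illustration only, not the challenge data.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, hypothetical letter-to-phoneme setup. Real challenge data would supply
# many words; aligning letters to phonemes is itself non-trivial and is assumed
# here to be given (one phoneme per letter).
aligned_words = [
    ("cat", ["k", "ae", "t"]),
    ("bat", ["b", "ae", "t"]),
    ("cab", ["k", "ae", "b"]),
    ("tab", ["t", "ae", "b"]),
]

def letter_features(word, i, window=1):
    """Context features for the letter at position i."""
    feats = {"letter": word[i]}
    for k in range(1, window + 1):
        feats[f"prev{k}"] = word[i - k] if i - k >= 0 else "<s>"
        feats[f"next{k}"] = word[i + k] if i + k < len(word) else "</s>"
    return feats

X, y = [], []
for word, phonemes in aligned_words:
    for i, ph in enumerate(phonemes):
        X.append(letter_features(word, i))
        y.append(ph)

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Predict phonemes for an unseen letter string, letter by letter
word = "bab"
print([model.predict([letter_features(word, i)])[0] for i in range(len(word))])
```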

The Visual Object Classes Challenge has the following objectives:

  • To compile a standardised collection of object recognition databases
  • To provide standardised ground truth object annotations across all databases
  • To provide a common set of tools for accessing and managing the database annotations
  • To run a challenge evaluating performance on object class recognition
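
For the last objective, a common way to evaluate object class recognition is to rank images by a classifier's confidence that the class is present and compute average precision over that ranking. The sketch below shows a generic (non-interpolated) average precision computation on invented scores and labels; the challenge's official evaluation protocol is defined by the organisers and may differ.

```python
import numpy as np

# Generic ranked-retrieval evaluation for object class recognition: images are
# scored for the presence of a class and average precision is computed over the
# ranking. This is a simplified illustration, not the official VOC protocol.

def average_precision(scores, labels):
    """Non-interpolated average precision for binary labels (1 = class present)."""
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    hits = np.cumsum(labels)
    precision_at_k = hits / (np.arange(len(labels)) + 1)
    return float(np.sum(precision_at_k * labels) / max(labels.sum(), 1))

scores = [0.9, 0.8, 0.7, 0.6, 0.5]   # hypothetical classifier confidences
labels = [1, 0, 1, 1, 0]             # ground-truth class presence
print(f"AP = {average_precision(scores, labels):.3f}")
```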

Touch Clarity (www.touchclarity.com) provides real-time optimisation of websites. Touch Clarity chooses, from a number of options, the most popular content to display on a page. This decision is made by tracking how many visitors respond to each of the options by clicking on them. This is a direct commercial application of the multi-armed bandit problem: each item that might be shown corresponds to a separate arm, with its own response rate. As in the multi-armed bandit problem, there is a trade-off between exploration and exploitation: it is sometimes necessary to serve items other than the most popular in order to measure their response rates with sufficient precision to correctly identify which is the most popular. However, in this application there is a further complication: typically the response rates of the items vary over time, so continuous exploration is needed to track this variation as old knowledge becomes out of date. An extreme example is choosing which news story to serve as the main story on a news page: interest in one story will decrease over time while interest in another increases. In addition, the interest in several stories might vary in a similar, coherent way, for example a general increase in interest in sports stories at weekends, or in political stories near an election. So there are typically two types of variation to consider: variation where response rates move together, and variation where they move completely independently.
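
One simple way to handle such time variation is to combine an exploring policy with rate estimates that gradually forget old data. The sketch below implements an epsilon-greedy rule over exponentially discounted click counts and runs it against two items whose simulated response rates drift and cross; it addresses only the independent kind of variation, not coherent co-variation across items, and the policy, parameter values, and simulated rates are illustrative assumptions rather than a description of Touch Clarity's actual system.

```python
import random

# Epsilon-greedy policy with exponentially discounted click-rate estimates, so
# that old observations are gradually forgotten and the policy can track response
# rates that drift over time. All names and values here are illustrative.

EPSILON = 0.1       # fraction of traffic used for exploration
DISCOUNT = 0.995    # forgetting factor applied to old click data

class DiscountedBandit:
    def __init__(self, n_items):
        self.clicks = [0.0] * n_items
        self.shows = [0.0] * n_items

    def choose(self):
        if random.random() < EPSILON:
            return random.randrange(len(self.shows))          # explore
        rates = [c / s if s > 0 else float("inf")             # unseen items first
                 for c, s in zip(self.clicks, self.shows)]
        return max(range(len(rates)), key=rates.__getitem__)  # exploit

    def update(self, item, clicked):
        # Discount all counts so the estimates emphasise recent behaviour
        self.clicks = [c * DISCOUNT for c in self.clicks]
        self.shows = [s * DISCOUNT for s in self.shows]
        self.shows[item] += 1.0
        self.clicks[item] += 1.0 if clicked else 0.0

# Simulate two items whose true response rates cross over time
bandit = DiscountedBandit(n_items=2)
for t in range(5000):
    true_rates = [0.05 + 0.00001 * t, 0.10 - 0.00001 * t]  # drifting interest
    item = bandit.choose()
    bandit.update(item, random.random() < true_rates[item])
print("estimated rates:", [round(c / s, 3) for c, s in zip(bandit.clicks, bandit.shows)])
```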

The objective of the Challenge is to design a statistical machine learning algorithm that segments words into the smallest meaning-bearing units of language, morphemes. Ideally, these are basic vocabulary units suitable for different tasks, such as text understanding, machine translation, information retrieval, and statistical language modeling. The scientific goals are:

  • To learn of the phenomena underlying word construction in natural languages
  • To discover approaches suitable for a wide range of languages
  • To advance machine learning methodology
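
As a toy illustration of the segmentation task itself (not of the learning problem, since the morph inventory must normally be discovered from raw text), the sketch below segments words by dynamic programming over a hand-written morph lexicon with unigram costs, in the spirit of MDL- or unigram-model-based segmentation. The lexicon and its counts are invented for the example.

```python
import math

# Segment a word into morphs by dynamic programming over a hypothetical morph
# lexicon with unigram counts. Real challenge entries must learn such a lexicon
# from raw text; it is hard-coded here purely for illustration.

morph_counts = {"un": 50, "break": 40, "able": 60, "s": 200, "talk": 30, "ing": 120}
total = sum(morph_counts.values())
cost = {m: -math.log(c / total) for m, c in morph_counts.items()}

def segment(word):
    """Return the lowest-cost segmentation of `word` into known morphs, or None."""
    best = [(0.0, [])] + [(math.inf, None)] * len(word)
    for end in range(1, len(word) + 1):
        for start in range(end):
            morph = word[start:end]
            if morph in cost and best[start][1] is not None:
                cand = best[start][0] + cost[morph]
                if cand < best[end][0]:
                    best[end] = (cand, best[start][1] + [morph])
    return best[-1][1]

print(segment("unbreakable"))   # ['un', 'break', 'able']
print(segment("talking"))       # ['talk', 'ing']
```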

Textual Entailment Recognition has been proposed recently as a generic task that captures major semantic inference needs across many natural language processing applications, such as Question Answering (QA), Information Retrieval (IR), Information Extraction (IE), and (multi-)document summarisation. The task requires recognising, given two text fragments, whether the meaning of one text is entailed by (can be inferred from) the other. By introducing a second challenge we hope to keep the momentum going, and to further promote the formation of a research community around the applied entailment task. As in the previous challenge, the main task is judging whether a hypothesis (H) is entailed by a text (T). One of the main goals for the RTE-2 dataset is to provide more “realistic” text-hypothesis examples, based mostly on outputs of actual systems. We focus on the four application settings mentioned above: QA, IR, IE and multi-document summarisation. Each portion of the dataset includes typical T-H examples that correspond to success and failure cases of such applications. The examples represent different levels of entailment reasoning, such as lexical, syntactic, morphological and logical.
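
A common reference point in the RTE evaluations is a purely lexical baseline: predict entailment when a large enough fraction of the hypothesis words also appear in the text. The sketch below implements such a word-overlap baseline on two invented T-H pairs; the threshold and examples are assumptions, and the second pair deliberately shows why lexical overlap alone cannot capture the deeper levels of entailment reasoning mentioned above.

```python
import re

# Word-overlap baseline for textual entailment: predict YES when most hypothesis
# words also occur in the text. Threshold and example pairs are hypothetical.

def words(s):
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def entails(text, hypothesis, threshold=0.75):
    """Predict entailment from the fraction of hypothesis words covered by the text."""
    t, h = words(text), words(hypothesis)
    return len(h & t) / max(len(h), 1) >= threshold

pairs = [
    ("The company was acquired by Google in 2006.", "Google bought the company."),
    ("The company was acquired by Google in 2006.", "The company bought Google."),
]
for t, h in pairs:
    # Both pairs get the same overlap score, so the baseline wrongly accepts the
    # second one: lexical overlap ignores syntactic and logical structure.
    print(entails(t, h), "-", h)
```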

Recent years have seen a surge in research on text-processing applications that perform semantic-oriented inference about concrete text meanings and their relationships. Even though many applications face similar underlying semantic problems, these problems are usually addressed in an application-oriented manner. Consequently, it is difficult to compare, under a generic evaluation framework, semantic methods that were developed within different applications. The PASCAL Challenge introduces textual entailment as a common task and evaluation framework for Natural Language Processing, Information Retrieval and Machine Learning researchers, covering a broad range of semantic-oriented inferences needed for practical applications. The task is therefore suitable for evaluating and comparing semantic-oriented models in a generic manner. Eventually, work on textual entailment may promote the development of generic semantic “engines”, which will play a role analogous to that of generic syntactic analysers across multiple applications.

The objective of the challenge is to develop machine learning methods for structured data mining and to evaluate these methods on XML document mining tasks. The challenge is focused on classification and clustering of XML documents. Datasets coming from different XML collections and covering a variety of classification and clustering situations will be provided to the participants. One goal of this track is to build reference categorisation/clustering corpora of XML documents. The organisers are open to any suggestion concerning the construction of such corpora.
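
One simple structured representation that such methods might start from is the set of root-to-element tag paths in each document, which can then be handed to a standard classifier or clustering algorithm. The sketch below clusters three invented XML snippets this way with scikit-learn; the documents, the path-based features, and the choice of k-means are assumptions for illustration, not part of the challenge corpora.

```python
import xml.etree.ElementTree as ET
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Represent each XML document by its root-to-element tag paths, then cluster.
# The tiny documents below are invented purely for illustration.

docs = [
    "<article><title>t</title><abstract>a</abstract><body><sec>s</sec></body></article>",
    "<article><title>t</title><body><sec>s</sec><sec>s</sec></body></article>",
    "<product><name>n</name><price>9</price><reviews><review>r</review></reviews></product>",
]

def tag_paths(xml_string):
    """Flatten an XML document into space-separated root-to-element tag paths."""
    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        yield path
        for child in node:
            yield from walk(child, path)
    return " ".join(walk(ET.fromstring(xml_string), ""))

features = CountVectorizer(token_pattern=r"\S+").fit_transform(
    [tag_paths(d) for d in docs])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)   # the two article-like documents should share a cluster
```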

This project is dedicated to stimulating research and revealing the state of the art in “model selection” by organising a competition followed by a workshop. Model selection is a core problem in statistics, machine learning, and data mining. Given training data consisting of input-output pairs, a model is built to predict the output from the input, usually by fitting adjustable parameters. Many predictive models have been proposed for such tasks, including linear models, neural networks, trees, and kernel methods. Finding methods to optimally select the model that will perform best on new test data is the object of this project. The competition will help identify accurate methods of model assessment, which may include variants of the well-known cross-validation methods as well as novel techniques based on learning-theoretic performance bounds. Such methods are of great practical importance in pilot studies, for which it is essential to know precisely how well desired specifications are met.
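
The most familiar of these model assessment methods, k-fold cross-validation, can be sketched in a few lines: each candidate model is scored by its average held-out performance and the best-scoring one is selected. The dataset and candidate models below are arbitrary stand-ins chosen for illustration, not challenge data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Model selection by 5-fold cross-validation: score each candidate model on
# held-out folds and keep the one with the best average score.

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

candidates = {
    "linear model": LogisticRegression(max_iter=1000),
    "kernel method": SVC(kernel="rbf"),
    "tree": DecisionTreeClassifier(max_depth=5),
}

scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("selected model:", best)   # the model expected to perform best on new test data
```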