Gravitational lensing is the process whereby light from distant galaxies is bent by intervening mass in the Universe as it travels towards us. This bending causes the shapes of galaxies to appear distorted. By measuring the properties and statistics of this distortion we can constrain the properties of both dark matter and dark energy. For the vast majority of galaxies the effect of gravitational lensing is simply to apply a matrix distortion to the whole galaxy image: the shears g1 and g2 determine the amount of stretching along the coordinate axes and along the diagonals, respectively. Since galaxies are not intrinsically circular, we cannot tell whether any individual galaxy has been gravitationally lensed. We must statistically combine the measured shapes of many galaxies, marginalising over the (poorly known) intrinsic galaxy shape distribution, to extract information on dark matter and dark energy.
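
This distortion can be written as a 2×2 matrix acting on image coordinates. The sketch below is a minimal illustration, not challenge code; the matrix convention and variable names are our own assumptions:

```python
import numpy as np

def shear_matrix(g1, g2):
    """Shear distortion matrix (a common weak-lensing convention).

    g1 stretches/compresses along the coordinate axes,
    g2 along the diagonals.
    """
    return np.array([[1.0 - g1, -g2],
                     [-g2, 1.0 + g1]])

# A unit circle (an idealised round galaxy) is mapped to an ellipse.
theta = np.linspace(0.0, 2.0 * np.pi, 100)
circle = np.vstack([np.cos(theta), np.sin(theta)])
ellipse = shear_matrix(0.05, 0.0) @ circle
# For g1 = 0.05, g2 = 0 the ellipse has semi-axes 0.95 (x) and 1.05 (y).
```

A pure g2 shear would instead stretch the circle along the diagonal directions.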

The GREAT challenges focussed on this unresolved and crucial problem, which is of paramount importance for current and future cosmological observations. Solving this statistical inference problem would allow the cosmological community to answer some of the most important questions in physics, revealing the nature of dark energy with the highest possible precision. This could rule out Einstein’s cosmological constant as a candidate for the dark energy and inspire a new theory to replace Einstein’s gravity.

For the challenges a suite of several million images was provided for download from a server at UCL, with multiple mirrors at other institutions. Each image contains one galaxy or star (convolution kernel image) roughly in the center of the image. The images were labelled as star or galaxy and divided into sets. Each set contained a small number of star images from which the convolution kernel could be obtained for that set. Each galaxy image in a set had the same shear (and convolution kernel) applied. The GREAT participant then submitted a shear estimate for each set of images. A key problem is that, as in real life, no model describing the shapes of the stars or galaxies was provided. These must be inferred from the data simultaneously with measuring the shear, from noisy, incomplete and pixelised data.
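
As a toy illustration of why sets of sheared galaxies are useful (a minimal sketch, not the challenge's methodology), the simplest shear estimator averages the measured ellipticities within a set; the random intrinsic shapes cancel in the mean, leaving the common shear:

```python
import numpy as np

rng = np.random.default_rng(0)

true_g = np.array([0.03, -0.01])                    # shear shared by the set
intrinsic = 0.2 * rng.standard_normal((10000, 2))   # unknown intrinsic shapes
observed = intrinsic + true_g                       # weak-shear approximation

g_hat = observed.mean(axis=0)                       # per-set shear estimate
# statistical error per component is roughly 0.2 / sqrt(10000) = 0.002
```

Real measurements add noise, pixelisation and convolution on top of this, which is what makes the inference hard.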

The challenges have at least two key aspects that go beyond typical applications of machine learning. Firstly, the estimation is required to be extremely accurate, in contrast with more traditional estimation tasks. Secondly, the data sets are very large. Both of these features have made the challenges of great interest to current developments in machine learning.

A Brain-Computer Interface (BCI) is a novel augmentative communication system that translates human intentions – reflected by suitable brain signals – into a control signal for an output device such as a computer application or a neuroprosthesis. Developing a BCI system involves many fields of research, such as classification, signal processing, neurophysiology, measurement technology, psychology and control theory. In recent EEG-based BCI research the role of machine learning (a BCI approach pioneered by Fraunhofer FIRST at NIPS*01) has become increasingly important. In the literature, many machine learning and pattern classification algorithms have been reported to give impressive results when applied to BCI data in offline analyses. However, it is more difficult to evaluate their relative value for actual online use. Typically each publication uses a different data set, such that (given the high inter-subject variability with respect to BCI performance) a comparison between different methods is practically impossible. Furthermore, the offline evaluation of EEG classification methods holds many possible pitfalls that can lead to an overestimation of performance. BCI data competitions have been organized to provide objective formal evaluations of alternative methods and thereby to foster the development of improved BCI technology through an unbiased validation of a variety of data analysis techniques.

Five Brain-Computer Interface (BCI) competitions have been held, all to great success. The first BCI Competitions addressed basic problems of BCI research (most tasks posed the problem of classifying short-term windows of defined mental states), while later BCI Competitions addressed advanced problems with time-continuous feedback, classifiers that needed to be applied to sliding windows, and the integration of different measurement sources for generating BCI control signals. More than 200 submissions were received from more than 50 different labs; one overview article has appeared (IEEE Trans Neural Sys Rehab Eng, 14(2):153-159, 2006), as has a special volume of Lecture Notes in Computational Science in 2010. Furthermore, individual articles by the competition winners have appeared in different journals.

The challenge addressed the issue of active learning: a server can be accessed via the web and queries can be made. The goal for the participant is to obtain the best classifier using a fixed number of queries. The classifiers used are deterministic finite automata (DFAs), allowing participants to use Angluin’s L* algorithm as a starting point. Zulu is both a web-based platform simulating an Oracle in a DFA learning task and a competition. As a web platform, Zulu allows users to generate tasks, to interact with the Oracle in learning sessions and to record their results. It provides users with a baseline algorithm written in Java, or the elements needed to build from scratch a new learning algorithm capable of interacting with the server. In order to rank the contestants, a two-dimensional grid was used: one dimension concerns the size (in states) of the automata, and the other the size of the alphabet.
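
A minimal sketch of the interaction model follows; the Oracle class below is a toy stand-in for the Zulu server, and the target language (strings with an even number of 'a's) is our own example:

```python
class Oracle:
    """Toy membership oracle with a fixed query budget."""

    def __init__(self, budget):
        self.budget = budget

    def member(self, word):
        if self.budget <= 0:
            raise RuntimeError("query budget exhausted")
        self.budget -= 1
        # Target language: strings over {a, b} with an even number of 'a's.
        return word.count("a") % 2 == 0

oracle = Oracle(budget=100)
# An L*-style learner would fill an observation table with such queries;
# here we simply probe a few strings.
answers = {w: oracle.member(w) for w in ["", "a", "aa", "ab", "aab"]}
```

A learner competing in Zulu must decide which membership queries are most informative before the budget runs out.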

This challenge uses important marketing problems to benchmark classification methods in a setting typical of large-scale industrial applications. Three large databases made available by the French Telecom company, Orange, were used, each with tens of thousands of examples and variables. These data are unique in that they have both a large number of examples and a large number of variables, making the problem particularly challenging to many state-of-the-art machine learning algorithms. The problems used to illustrate this technical difficulty were the marketing problems of churn, appetency and up-selling. Churn is the propensity of customers to switch between service providers, appetency is the propensity of customers to buy a service, and up-selling is the success in selling additional goods or services to make a sale more profitable. The challenge participants were given customer records and their goal was to predict whether a customer would switch provider (churn), buy the main service (appetency) and buy additional extras (up-selling), hence solving three two-class classification problems simultaneously. Large prizes were donated by Orange (10,000 Euros) to encourage participation. Winners were designated for gold, silver and bronze prizes, sharing the total amount.
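
The structure of the task can be sketched as three independent binary classifiers trained on one shared feature matrix. The data below are synthetic and the plain least-squares scorer is purely illustrative; the actual Orange data and winning models were far richer:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 20))            # shared customer features

# Synthetic +/-1 labels for the three marketing targets.
labels = {}
for name in ["churn", "appetency", "upselling"]:
    w_true = rng.standard_normal(20)          # hidden per-task model
    labels[name] = np.sign(X @ w_true)

# One linear scorer per task, fit by least squares on the shared features.
models = {name: np.linalg.lstsq(X, y, rcond=None)[0]
          for name, y in labels.items()}
accuracy = {name: float((np.sign(X @ w) == labels[name]).mean())
            for name, w in models.items()}
```

The point of the shared-matrix setup is that feature engineering done once benefits all three predictions.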

Multiple simultaneous hypothesis testing is a central issue in many areas of information extraction:

  • rule extraction,
  • validation of gene influence,
  • validation of spatio-temporal pattern extraction (e.g. in brain imaging),
  • other forms of spatial or temporal data (e.g. spatial collocation rules),
  • other multiple hypothesis testing settings.

In all of the above frameworks, the goal is to extract patterns such that some quantity of interest is significantly greater than a given threshold.
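
A standard remedy in this setting is the Benjamini-Hochberg step-up procedure, which controls the false discovery rate among the patterns declared significant. The sketch below is a minimal implementation with made-up p-values:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of rejected (significant) hypotheses,
    controlling the false discovery rate at level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    # Compare the k-th smallest p-value to q * k / m.
    thresh = q * (np.arange(1, m + 1) / m)
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
rejected = benjamini_hochberg(pvals, q=0.05)
# Only the two smallest p-values survive the correction here.
```

A naive per-test threshold of 0.05 would instead have declared four of these six patterns significant.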

The objective of the challenges was to design a statistical machine learning algorithm that discovers the morphemes (smallest individually meaningful units of language) that comprise words. Ideally, these are basic vocabulary units suitable for different tasks, such as text understanding, machine translation, information retrieval, and statistical language modeling. The scientific goals are:

  • To learn about the phenomena underlying word construction in natural languages
  • To discover approaches suitable for a wide range of languages
  • To advance machine learning methodology

The Morpho Challenges ran successfully in 2005, 2007, 2008, 2009 and 2010. They aimed at advancing the field of machine learning by providing a concrete application challenge for both semi-supervised and unsupervised algorithms whose objective is to learn to provide morphological analyses for words. The algorithms were evaluated in information retrieval (IR) and statistical machine translation (SMT) tasks. Both tasks were evaluated using state-of-the-art evaluation systems and evaluation corpora to determine which algorithm performs best and whether it improves on the state of the art. The Morpho Challenges have evoked significant interest, both in terms of participation in the challenge and citations of the evaluation results.

This challenge addresses machine learning problems in which labeling data is expensive, but large amounts of unlabeled data are available at low cost. Examples include:

  • handwriting and speech recognition,
  • document classification (including Internet web pages),
  • vision tasks,
  • drug design using recombinant molecules or protein engineering.

Such problems might be tackled from different angles: learning from unlabeled data or active learning. In the former case, the algorithms must make do with the limited amount of labeled data and capitalize on the unlabeled data with semi-supervised learning methods; several past challenges have addressed this problem. In the latter case, the algorithms may place a limited number of queries to obtain new sample labels. The goal in that case is to optimize the queries, and the problem is referred to as active learning. In most past challenges we organized, we used the same datasets during the development period and during the test period. In this challenge we used two sets of datasets, one for development and one for the final test, drawn from embryology, cancer diagnosis, chemoinformatics, handwriting recognition, text ranking, ecology, and marketing.
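
The active-learning side of the problem can be sketched as an uncertainty-sampling loop: the learner spends its query budget on the unlabeled points closest to its current decision boundary. The data below are synthetic and the least-squares scorer is an illustrative stand-in for a real classifier:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((400, 2))
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)     # hidden labels (the "oracle")

labelled = list(range(10))                        # small initial labelled pool
budget = 30
for _ in range(budget):
    Xl = X[labelled]
    yl = 2 * y_true[labelled] - 1                 # +/-1 targets
    w = np.linalg.lstsq(Xl, yl, rcond=None)[0]    # stand-in linear classifier
    scores = np.abs(X @ w)                        # distance from boundary
    scores[labelled] = np.inf                     # never re-query a point
    labelled.append(int(np.argmin(scores)))       # query most uncertain point

w = np.linalg.lstsq(X[labelled], 2 * y_true[labelled] - 1, rcond=None)[0]
accuracy = float(((X @ w > 0).astype(int) == y_true).mean())
```

After 30 queries the model has seen only 40 of the 400 labels, yet the queried points concentrate where they are most informative.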

In this challenge we consider the problem of separating and recognising speech in the cluttered acoustic backgrounds that characterise everyday listening conditions. In 2005, Pascal sponsored a highly successful ‘Speech Separation Challenge,’ which addressed the problem of recognising overlapping speech in single and multiple microphone scenarios. Although the challenge attracted much interest and culminated in the publication of a dedicated special issue of Computer Speech and Language, the focus on overlapping speech encouraged special-case solutions that do not necessarily generalise to real application scenarios. Five years on, the second challenge in PASCAL2 built on this work by extending the problem in ways that better modelled the demands of real noise-robust speech processing systems. In particular we considered the problem of a ‘speech-driven home automation’ application that needs to recognise spoken commands within the ongoing complex mixture of background sounds found in a typical domestic environment. The task was to identify the target commands being spoken given the binaural mixtures. Data was supplied first as isolated utterances (as is traditional for speech recognition evaluations) and then, more realistically, as sequences of utterances mixed intermittently into extended background recording sessions.

The two challenges held during PASCAL2 built on the success of the Exploration vs Exploitation challenge run in PASCAL1. That challenge considered the standard bandit problem but with response rates changing over time. Despite its apparent simplicity, it inspired a range of very important developments, including the UCT (Upper Confidence Tree) algorithm and its successful application to computer Go in the award-winning MoGo system. The earlier challenge included a £1000 award to the winner. The later challenges built on it in two important respects. Firstly, they considered so-called multi-variate bandits, that is, bandits where each visitor/arm combination has associated features that may enable more accurate prediction of the response probability for that combination. Secondly, the data were drawn from a real-world dataset of advertisement (banner) placement on webpages, with the response corresponding to click-through by the user. The multi-variate bandit problem represents an important stepping stone towards more complex problems involving delayed feedback, such as reinforcement learning. It involves a single state, but the additional features take it significantly closer to standard supervised learning compared to the simple bandits considered in the first challenge. The ability to respond accurately and to bound performance for such systems is an important step towards a key component that can be integrated into cognitive systems, one of the major goals of the PASCAL network.
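
A multi-variate bandit of this kind can be sketched with a LinUCB-style policy: each arm keeps a ridge-regression estimate of its click rate as a function of the features, and the arm with the highest optimistic estimate is shown. The click probabilities and exploration constant below are illustrative assumptions, not the challenge's data or any entrant's method:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_arms, T = 5, 3, 3000
theta = rng.uniform(0.1, 0.9, size=(n_arms, d)) / d   # hidden click models

A = [np.eye(d) for _ in range(n_arms)]   # per-arm ridge Gram matrices
b = [np.zeros(d) for _ in range(n_arms)]
clicks = 0.0
for t in range(T):
    x = rng.uniform(0.0, 1.0, size=d)     # visitor/banner features
    ucb = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        mu = A_inv @ b[a]                       # ridge estimate of theta[a]
        bonus = 0.5 * np.sqrt(x @ A_inv @ x)    # exploration bonus
        ucb.append(float(x @ mu + bonus))
    a = int(np.argmax(ucb))                     # optimistic arm choice
    reward = float(rng.random() < x @ theta[a])  # Bernoulli click
    A[a] += np.outer(x, x)
    b[a] += reward * x
    clicks += reward

click_rate = clicks / T
```

The exploration bonus shrinks as an arm's Gram matrix grows, so the policy gradually shifts from exploring to exploiting the best arm for each feature vector.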

Probabilistic graphical models are a powerful tool for representing complex multivariate distributions. They have been used with considerable success in many fields, from machine vision and natural language processing to computational biology and channel coding. One of the key challenges in using such models in practice is that the inference problem is computationally intractable in many cases of interest. This has prompted much research on algorithms that approximate the inference task. Examples of such algorithms are loopy belief propagation, the mean field method and Gibbs sampling. Due to the wide use of graphical models it is of key importance to design algorithms that work well in practice. Empirical evaluation is key here, since one does not expect approximation algorithms to work well on every problem (given the theoretical intractability of inference).

The challenge was held as part of the Uncertainty in Artificial Intelligence (UAI) conference. The challenge involved several inference tasks (finding the MAP assignment, computing the probability of evidence, calculating marginals, and learning models using approximate inference). Participants provided inference algorithms and these were applied to models from the following domains: machine vision (e.g., segmentation and object detection), computational biology (e.g., protein design and pedigree analysis), constraint satisfaction, medical diagnosis and collaborative filtering, as well as some synthetic problems whose graph structure appears in real-world problems (e.g., 2D and 3D grids). Evaluating the state of the art in the field of approximate inference helps guide research in the field. It highlights which methods are particularly promising in which domains. Additionally, since running time was carefully evaluated, it indicates which methods can perform well on very large-scale data.