This project aims to stimulate research and reveal the state of the art in “model selection” by organizing a competition followed by a workshop. Model selection is a problem in statistics, machine learning, and data mining. Given training data consisting of input-output pairs, a model is built to predict the output from the input, usually by fitting adjustable parameters. Many predictive models have been proposed to perform such tasks, including linear models, neural networks, trees, and kernel methods. Finding methods to optimally select the models that will perform best on new test data is the object of this project. The competition will help identify accurate methods of model assessment, which may include variants of the well-known cross-validation methods and novel techniques based on learning-theoretic performance bounds. Such methods are of great practical importance in pilot studies, for which it is essential to know precisely how well desired specifications are met.

The Challenge

The aim of the challenge in performance prediction is to find methods to predict how accurately a given predictive model will perform on test data, on ALL five benchmark datasets. To facilitate entering results for all five datasets, all tasks are two-class classification problems. You can download the datasets from the table below:


Dataset | Size | Type | Features | Training Examples | Validation Examples | Test Examples
ADA | 0.6 MB | Dense | 48 | 4147 | 415 | 41471
GINA | 19.4 MB | Dense | 970 | 3153 | 315 | 31532
HIVA | 7.6 MB | Dense | 1617 | 3845 | 384 | 38449
NOVA | 2.3 MB | Sparse binary | 16969 | 1754 | 175 | 17537
SYLVA | 15.6 MB | Dense | 216 | 13086 | 1308 | 130858

At the start of the challenge, participants had access only to labeled training data and unlabeled validation and test data, and submissions were evaluated on validation data only. The validation labels have since been made available (one month before the end of the challenge). *** DOWNLOAD THE VALIDATION SET LABELS *** The final ranking will be based on test data results, to be revealed only when the challenge is over.

Dataset Formats

All the data sets are in the same format and include 5 files in ASCII format:

  • dataname.param – Parameters and statistics about the data
  • – Training set (a sparse or a regular matrix, patterns in lines, features in columns).
  • – Validation set.
  • – Test set.
  • dataname_train.labels – Labels (truth values of the classes) for training examples.

The matrix data formats used are (in all cases, each line represents a pattern):

  • dense matrices – a space delimited file with a new-line character at the end of each line.
  • sparse binary matrices – for each line of the matrix, a space delimited list of indices of the non-zero values. A new-line character at the end of each line.
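The two matrix formats above can be read with a few lines of code. The following is a minimal Python sketch, not the official sample code (which is in Matlab). The function names are hypothetical, and the default `index_base=1` is an assumption (Matlab-style indexing) that the format description does not confirm.

```python
def parse_dense(lines):
    """Parse a dense matrix: one pattern per line, space-delimited values."""
    return [[float(tok) for tok in line.split()] for line in lines if line.strip()]


def parse_sparse_binary(lines, n_features, index_base=1):
    """Parse a sparse binary matrix into dense 0/1 rows.

    Each line lists the indices of the non-zero features of one pattern.
    index_base=1 is an ASSUMPTION (Matlab-style indexing); pass 0 if the
    files turn out to use 0-based indices.
    """
    rows = []
    for line in lines:
        row = [0] * n_features
        for tok in line.split():
            col = int(tok) - index_base
            if not 0 <= col < n_features:
                raise ValueError(f"feature index {tok} out of range")
            row[col] = 1
        rows.append(row)  # an empty line is kept as an all-zero pattern
    return rows


if __name__ == "__main__":
    print(parse_dense(["0.5 1 0\n", "2 3.5 4\n"]))
    print(parse_sparse_binary(["1 4\n", "\n", "2\n"], n_features=4))
```

Both functions accept any iterable of lines, so an open file object can be passed directly. The number of features for each dataset can be taken from the table of datasets.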

If you are a Matlab user, you can download some sample code to read and check the data.

The Challenge Learning Object package (CLOP)

A Matlab(R) library of models to perform the tasks of the challenge is provided for your convenience. You are not required to use this package; you can write your own code.

Download CLOP

CLOP may be downloaded and used freely for the purposes of the challenge. Please make sure you read the license agreement and the disclaimer. CLOP is based on the Spider developed at the Max Planck Institute for Biological Cybernetics and integrates software from several sources (see the credits). Download CLOP now (beta version, 4.7 MB).

Installation requirements

CLOP runs with Matlab (Version 12 or greater) using either Linux or Windows.

Installation instructions

Unzip the archive and follow the instructions in the README file. Windows users will just have to run a script to set the Matlab path properly. Unix users will have to compile the LibSVM package if they want to use support vector machines. The Random Forest package is presently not supported under Unix.

Getting started

The sample code provided gives you an easy way of getting started. Consult the CLOP FAQ for further information.

Bugs and improvements

Please report bugs to Suggestions and code improvements are also welcome.

Bonus entries

We have canceled the option to make “bonus entries” using CLOP. This part of the challenge will be replaced by a post-challenge game to be announced.

Results File Formats

The results on each dataset should be formatted in ASCII files according to the following table. If you are a Matlab user, you may find some of the sample code routines useful for formatting the data. You can view an example of each format from the filename column. Optionally, you may submit your models in Matlab format.


Filename | Development | Challenge | Description | File Format
[dataname]_train.resu | Optional | Compulsory | Classifier outputs for training examples | +/-1 indicating class prediction.
[dataname]_valid.resu | Compulsory | Compulsory | Classifier outputs for validation examples | Same as [dataname]_train.resu.
[dataname]_test.resu | Optional | Compulsory | Classifier outputs for test examples | Same as [dataname]_train.resu.
[dataname]_train.conf | Optional+ | Optional+ | Classifier confidence for training examples | Non-negative real numbers indicating the confidence in the classification (large values indicating higher confidence). They do not need to be probabilities, and can be simply absolute values of discriminant values. Optionally they can be normalized between 0 and 1 to be interpreted as abs(P(y=1|x)-P(y=-1|x)).
[dataname]_valid.conf | Optional+ | Optional+ | Classifier confidence for validation examples | Same as [dataname]_train.conf.
[dataname]_test.conf | Optional+ | Optional+ | Classifier confidence for test examples | Same as [dataname]_train.conf.
[dataname].guess | Optional* | Compulsory* | Your prediction of the BER (Balanced Error Rate) that you will achieve on test data | A single number between 0 and 1.
[dataname]_model.mat | Optional | Optional | The trained CLOP model used to compute the submitted results | A Matlab learning object saved with the command save('[dataname]_model.mat', 'modelname').

+ If no confidence file is supplied, equal confidence will be assumed for each classification. If confidences are not between 0 and 1, they will be divided by their maximum value.
* If no guess file is supplied it will be assumed that the predicted BER is 1 (which is highly detrimental to your submission). Optionally, you may add a second number indicating the error bar on your guess.
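For those not using the Matlab sample code, the file contents described in the table can be produced along the following lines. This is a minimal Python sketch under stated assumptions: the helper names are hypothetical, and the layout (one value per line in the .resu and .conf files, and the optional error bar written after the guess on the same line) is an assumption that should be checked against the example archives.

```python
def result_files(dataname, split, predictions, confidences=None):
    """Return {filename: content} for one split ('train', 'valid' or 'test').

    ASSUMPTION: one value per line; compare with the example archives.
    """
    if any(p not in (-1, 1) for p in predictions):
        raise ValueError("predictions must be +1 or -1")
    files = {f"{dataname}_{split}.resu": "".join(f"{int(p)}\n" for p in predictions)}
    if confidences is not None:
        if len(confidences) != len(predictions):
            raise ValueError("need one confidence per prediction")
        if any(c < 0 for c in confidences):
            raise ValueError("confidences must be non-negative")
        files[f"{dataname}_{split}.conf"] = "".join(
            f"{float(c)!r}\n" for c in confidences
        )
    return files


def guess_file(dataname, predicted_ber, error_bar=None):
    """Return (filename, content) for the BER guess.

    ASSUMPTION: the optional error bar follows the guess on the same line.
    """
    if not 0 <= predicted_ber <= 1:
        raise ValueError("predicted BER must lie between 0 and 1")
    fields = [repr(float(predicted_ber))]
    if error_bar is not None:
        fields.append(repr(float(error_bar)))
    return f"{dataname}.guess", " ".join(fields) + "\n"


if __name__ == "__main__":
    print(result_files("ada", "valid", [1, -1, 1], [0.9, 0.2, 2.0]))
    print(guess_file("ada", 0.18))
```

Each returned string can then be written to a file of the given name in the directory that will be archived.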

Results Archive Format

Submitted files must be in either a .zip or .tar.gz archive format. You can download the example zip archive or the example tar.gz archive to help familiarise yourself with the archive structures and contents (the results were generated with the sample code). Submitted files must use exactly the same filenames as in the example archive. If you use tar.gz archives please do not include any leading directory names for the files. Use

zip results.zip *.resu *.conf *.guess *.mat

or

tar cvf results.tar *.resu *.conf *.guess *.mat; gzip results.tar

to create valid archives.
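On systems without the zip or tar command-line tools, an equivalent tar.gz archive can be built with a short script. The following Python sketch uses the standard `tarfile` module; the function name and the directory layout are hypothetical. Passing `arcname` stores each file under its bare filename, which avoids leading directory names.

```python
import glob
import os
import tarfile

RESULT_PATTERNS = ("*.resu", "*.conf", "*.guess", "*.mat")


def make_results_targz(results_dir, archive_path):
    """Pack all result files found in results_dir into a .tar.gz archive.

    Returns the list of member names written to the archive.
    """
    names = []
    with tarfile.open(archive_path, "w:gz") as tf:
        for pattern in RESULT_PATTERNS:
            for path in sorted(glob.glob(os.path.join(results_dir, pattern))):
                name = os.path.basename(path)
                tf.add(path, arcname=name)  # bare filename, no directory prefix
                names.append(name)
    return names
```

A call such as `make_results_targz("my_results", "results.tar.gz")` would archive every matching file in the (hypothetical) `my_results` directory.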

Challenge Submissions

If you want your method to be ranked in the overall table, you should include classification results on ALL the datasets for the five tasks, but this is mandatory only for final submissions.

The method of submission is via the form on the submissions page. Please limit yourself to 5 submissions per day maximum. If you encounter problems with submission, please contact the Challenge Webmaster.

Your last 5 valid submissions will count towards the final ranking (there are no more “bonus entries”). The deadline for submissions is March 1, 2006.


The results are evaluated according to the following performance measures. The validation set is used for ranking during the development period. The test set will be used for the final ranking.

Performance Measures

The results for a classifier can be represented in a confusion matrix, where a,b,c and d represent the number of examples falling into each possible outcome:


               | Prediction Class -1 | Prediction Class +1
Truth Class -1 | a                   | b
Truth Class +1 | c                   | d


Balanced Error Rate (BER)

The balanced error rate is the average of the errors on each class: BER = 0.5*(b/(a+b) + c/(c+d)). During the development period, the ranking is performed according to the validation BER.
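As an illustration of the BER formula, here is a minimal Python sketch that counts a, b, c and d from true and predicted labels and then applies the formula. The function names are hypothetical and this is not the organizers' scoring code.

```python
def confusion_counts(y_true, y_pred):
    """Return (a, b, c, d) as defined in the confusion matrix; labels are +1/-1."""
    a = b = c = d = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == -1 and pred == -1:
            a += 1
        elif truth == -1 and pred == 1:
            b += 1
        elif truth == 1 and pred == -1:
            c += 1
        elif truth == 1 and pred == 1:
            d += 1
        else:
            raise ValueError("labels and predictions must be +1 or -1")
    return a, b, c, d


def balanced_error_rate(y_true, y_pred):
    """BER = 0.5 * (b/(a+b) + c/(c+d))."""
    a, b, c, d = confusion_counts(y_true, y_pred)
    if a + b == 0 or c + d == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (b / (a + b) + c / (c + d))


if __name__ == "__main__":
    y_true = [-1, -1, -1, -1, 1, 1]
    y_pred = [-1, -1, -1, 1, 1, -1]
    print(confusion_counts(y_true, y_pred))     # (3, 1, 1, 1)
    print(balanced_error_rate(y_true, y_pred))  # 0.375
```

In this example 2 of 6 examples are misclassified, a plain error rate of about 0.33, but the BER is 0.375 because the error on the smaller class +1 (1 of 2) weighs as much as the error on class -1 (1 of 4).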

Area Under Curve (AUC)

The area under curve is defined as the area under the ROC curve. This area is equivalent to the area under the curve obtained by plotting a/(a+b) against d/(c+d) for each confidence value, starting at (0,1) and ending at (1,0). The area under this curve is calculated using the trapezoid method. When no confidence values are supplied for the classification, the curve is given by {(0,1),(d/(c+d),a/(a+b)),(1,0)} and AUC = 1 - BER.
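The trapezoid computation can be sketched as follows in Python. One assumption is made that the text does not spell out: each example is given a signed score equal to its +/-1 prediction multiplied by its confidence, and the curve is traced by thresholding that score. The function name is hypothetical.

```python
def auc_trapezoid(y_true, scores):
    """Area under the curve of a/(a+b) plotted against d/(c+d).

    ASSUMPTION: scores are signed (prediction * confidence); higher means
    more confidently class +1. Labels in y_true are +1/-1.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")

    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 1.0)]  # threshold above every score: all predicted -1
    d = b = 0              # class +1 and class -1 examples predicted +1 so far
    i = 0
    while i < len(ranked):
        j = i
        # Examples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                d += 1
            else:
                b += 1
            j += 1
        points.append((d / n_pos, 1 - b / n_neg))  # (d/(c+d), a/(a+b))
        i = j

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    return area


if __name__ == "__main__":
    y_true = [-1, -1, -1, -1, 1, 1]
    y_pred = [-1, -1, -1, 1, 1, -1]
    # With no confidences the scores are the predictions themselves.
    print(auc_trapezoid(y_true, y_pred))  # 0.625
```

With the predictions used as scores, the curve reduces to the three points given above; in this example the result 0.625 equals 1 - BER, the BER being 0.375.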

BER guess error

The BER guess error (deltaBER) is the absolute value of the difference between the BER you obtained on the test set (testBER) and the BER you predicted (predictedBER). The predicted BER is the value supplied in the .guess file.

deltaBER = abs(predictedBER - testBER)

Test score

The final ranking is based on the “test score” computed from the test set balanced error rate (testBER) and the “BER guess error” (deltaBER), both of which should be made as low as possible. The test score is computed according to the formula:

E = testBER + deltaBER * (1 - exp(-deltaBER/sigma))

where sigma is the standard error on testBER. See the FAQ for details.
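To see how the guess error enters the test score, the two formulas can be evaluated directly. The Python sketch below takes sigma as an input, since its computation is described only in the FAQ; the function names are hypothetical.

```python
import math


def ber_guess_error(predicted_ber, test_ber):
    """deltaBER = abs(predictedBER - testBER)."""
    return abs(predicted_ber - test_ber)


def challenge_score(test_ber, predicted_ber, sigma):
    """E = testBER + deltaBER * (1 - exp(-deltaBER/sigma)).

    sigma (the standard error on testBER) must be supplied by the caller.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    delta = ber_guess_error(predicted_ber, test_ber)
    return test_ber + delta * (1 - math.exp(-delta / sigma))


if __name__ == "__main__":
    sigma = 0.01  # illustrative value only
    for guess in (0.10, 0.105, 0.15, 1.0):
        print(guess, challenge_score(0.10, guess, sigma))
```

Guess errors that are small relative to sigma add little to the score, while large ones are added almost in full. With no guess file the predicted BER is taken to be 1, so a testBER of 0.10 yields a score close to 1.0.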