This challenge addresses a question of fundamental and practical interest in machine learning: the assessment of data representations produced by unsupervised learning procedures, for use in supervised learning tasks. It also addresses the evaluation of transfer learning methods capable of producing data representations useful across many similar supervised learning tasks, after training on supervised data from only one of them.

Classification problems are found in many application domains, including pattern recognition (classification of images or videos, speech recognition), medical diagnosis, marketing (customer categorization), and text categorization (spam filtering). The category identifiers are referred to as “labels”. Predictive models capable of classifying new instances (correctly predicting the labels) usually require “training” (parameter adjustment) using large amounts of labeled training data (pairs of example instances and associated labels). Unfortunately, only small amounts of labeled training data may be available due to the cost or burden of manually annotating data. Recent research has focused on making use of the vast amounts of unlabeled data available at low cost, including: space transformations, dimensionality reduction, hierarchical feature representations (“deep learning”), and kernel learning. However, these advances tend to be ignored by practitioners, who continue to use a handful of popular algorithms like PCA, ICA, k-means, and hierarchical clustering. The goal of this challenge is to perform an evaluation of unsupervised and transfer learning algorithms, free of inventor bias, to help identify and popularize algorithms that have advanced the state of the art.

Five datasets from various domains are made available. The participants should submit on-line, in a prescribed format, transformed data representations (or similarity/kernel matrices) for a validation set and a final evaluation set. The data representations (or similarity/kernel matrices) are evaluated by the organizers on supervised learning tasks unknown to the participants. The results on the validation set are displayed on the leaderboard to provide immediate feedback. The results on the final evaluation set will be revealed only at the end of the challenge. To emphasize the capability of the learning systems to develop useful abstractions, the supervised learning tasks used to evaluate them make use of very few labeled training examples, and the classifier used is a simple linear discriminant classifier (a minimal sketch of this kind of evaluation pipeline is given after the phase descriptions below). The challenge will proceed in two phases:

  • Phase 1 — Unsupervised learning: There exist a number of methods that produce new data representations (or kernels) from purely unlabeled data. Such unsupervised methods are sometimes used as preprocessing for supervised learning procedures. In the first phase of the challenge, no labels will be provided to the participants. The participants are requested to produce data representations (or similarity/kernel matrices) that will be evaluated by the organizers on supervised learning tasks (i.e., using labeled data not available to the participants).
  • Phase 2 — Transfer learning: In other practical settings, it is desirable to produce data representations that are re-usable from domain to domain. We want to examine the possibility that a representation developed with one set of labels can be used to learn a new, similar task more easily. For example, in the handwriting recognition domain, labeled handwritten digits would be available for training, and the evaluation task would then be the recognition of handwritten alphabetical letters. We call this setting “transfer learning”. In the second phase of the challenge, some labels will be provided to the participants for the same datasets used in the first phase. This will allow the participants to improve their data representations (or similarity/kernel matrices) using supervised tasks similar to (but different from) the task on which they will be tested.
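
To illustrate the evaluation setting described above, here is a minimal sketch, in Python with scikit-learn, of an unsupervised-then-supervised pipeline: an unsupervised transform (PCA is used here purely as a placeholder) is fitted on unlabeled data, and the resulting representation is then scored with a linear classifier trained on only a handful of labeled examples. The synthetic data, the choice of PCA, and the use of logistic regression as a stand-in for the organizers' simple linear discriminant classifier are all illustrative assumptions, not the actual scoring code of the challenge.

  import numpy as np
  from sklearn.decomposition import PCA
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import roc_auc_score

  rng = np.random.RandomState(0)

  # Unlabeled development data: p samples, n raw features (synthetic placeholder).
  X_unlabeled = rng.randn(1000, 50)

  # Phase 1: learn a data representation from unlabeled data only.
  transform = PCA(n_components=10).fit(X_unlabeled)

  # Evaluation data and labels (in the challenge, only the organizers see the labels).
  y_eval = np.tile([0, 1], 100)                  # 200 alternating class labels
  X_eval = rng.randn(200, 50) + y_eval[:, None]  # class-dependent mean shift

  # The organizers train a linear classifier on very few labeled examples
  # of the submitted representation and test it on the remaining ones.
  Z = transform.transform(X_eval)
  n_train = 10
  clf = LogisticRegression().fit(Z[:n_train], y_eval[:n_train])
  scores = clf.decision_function(Z[n_train:])
  print("AUC on held-out examples:", roc_auc_score(y_eval[n_train:], scores))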

Competition Rules

  • Goal of the challenge: Given a data matrix of samples represented as feature vectors (p samples in rows and n features in columns), produce another data matrix of dimension (p, n’) (the transformed representation, made of n’ new features) or a similarity/kernel matrix between samples, of dimension (p, p). The transformed representations (or similarity/kernel matrices) should provide good results on the supervised learning tasks used by the organizers to evaluate them (a minimal sketch of these two output formats is given after this rules list). The labels of the supervised learning tasks used for evaluation purposes will remain unknown to the participants in phases 1 and 2, but other labels will be made available for transfer learning in phase 2.
  • Prizes: The winners of each phase will be awarded prizes; see the Prizes page for details.
  • Dissemination: The challenge is part of the competition program of the IJCNN 2011 conference, San Jose, California, July 31 – August 5, 2011. We are organizing a special session and a competition workshop at IJCNN 2011 to discuss the results of the challenge. We are also organizing a workshop at ICML 2011, Bellevue, Washington, July 2, 2011. There are three publication opportunities: in JMLR W&CP, in the IEEE proceedings of IJCNN 2011, and in the ICML proceedings.
  • Schedule:
    Dec. 25, 2010 Start of the development period. Phase 0: Registration and submissions open. Rules, toy data, and sample code made available.
    Jan. 3, 2011 Start of phase 1: UNSUPERVISED LEARNING. Datasets made available. No labels available.
    Feb. 1, 2011 IJCNN 2011 papers due (optional).
    March 3, 2011 End of phase 1, at midnight (0 h Mar. 4, server time — time indicated on the Submit page).
    March 4, 2011 Start of phase 2: TRANSFER LEARNING. Training labels made available for transfer learning.
    April 1, 2011 IJCNN paper decision notification.
    April 15, 2011 End of the challenge at midnight (0 h April 16, server time — time indicated on the Submit page). Submissions closed. [Note: the grace period until April 20 has been canceled]
    April 22, 2011 All teams must turn in fact sheets (compulsory). The fact sheets will be used as abstracts for the workshops. Reviewers and participants are given access to provisional rankings and fact sheets.
    April 29, 2011 ICML 2011 papers due, to be published in JMLR W&CP (optional).
    May 1, 2011 Camera ready copies of IJCNN papers due.
    May 20, 2011 Release of the official ranking. Notification of abstract and paper acceptance.
    July 2, 2011 Workshop at ICML 2011, Bellevue, Washington state, USA. Confirmed.
    July 31 – Aug. 5, 2011 Special session and workshop at IJCNN 2011, San Jose, California, USA. Confirmed.
    Aug. 7, 2011 Reviews of JMLR W&CP papers sent back to authors.
    Sep. 30, 2011 Revised JMLR W&CP papers due.
  • Challenge protocol: (1) Development: From the outset of the challenge, all unlabeled development and evaluation data will be provided to the participants. All data will be preprocessed into a feature representation such that the patterns are not easily recognizable by humans, making it difficult to label data using human experts. During development the participants may make submissions of a feature-based representation (or a similarity/kernel matrix) for a subset of the evaluation data (called the validation set). They will receive on-line feedback on the quality of their representation (or similarity measure) through a number of scoring metrics. (2) Final evaluation: To participate in the final evaluation, the participants will have to (i) register as mutually exclusive teams; (ii) make one “final” correct submission of a feature-based representation (or similarity/kernel matrix) for the final evaluation data of all 5 datasets of the challenge; (iii) submit the answers to a questionnaire on their method (method fact sheet); and (iv) compete in one of the two phases or in both (it is not necessary to compete in both phases to earn prizes).
  • Baseline results: Results using baseline methods will be provided on the website of the challenge by the organizing team. Those results will be clearly marked as “ULref”. The most basic baseline result is obtained using the raw data. To qualify for prizes, the participants should exceed the performance obtained with the raw data on all the datasets of the challenge.
  • Eligibility of participation: Anybody complying with the rules of the challenge, with the exception of the organizers, is eligible to enter the challenge. To enter results and get on-line feedback, the participants must make themselves known to the organizers by registering and providing a valid email address so that the organizers can communicate with them. However, the participants may remain anonymous to the outside world. To participate in the final test rounds, the participants will have to register as teams. No participant will be allowed to enter as part of several teams. The team leaders will be responsible for ensuring that their team respects the rules of the challenge. There is no commitment to deliver code or data, or to publish methods, in order to participate in the development phase, but the team leaders will be requested to fill out fact sheets with basic information on their methods in order to claim prizes. When the challenge is over and the results are known, the teams who want to claim a prize will have to reveal their true identity to the outside world.
  • Anonymity: All entrants must identify themselves to the organizers. However, only your “Workbench id” will be displayed in result tables, and you may choose a pseudonym to hide your identity from the rest of the world. Your emails will remain confidential.
  • Data: Datasets from various domains and of varying difficulty are available for download from the Data page. No labels are made available during phase 1. Some labels will be made available for transfer learning at the beginning of phase 2. Reverse-engineering the datasets to gain information on the identity of the patterns in the original data is forbidden. If it is suspected that this rule was violated, the organizers reserve the right to organize post-challenge verifications with which the top-ranking participants will have to comply to earn prizes.
  • Submission method: The method of submission is via the form on the Submit page. To be ranked, submissions must comply with the Instructions. Robot submissions are permitted. If the system gets overloaded, the organizers reserve the right to limit the number of submissions per day per participant. We recommend not exceeding 5 submissions per day per participant. If you encounter problems with the submission process, please contact the Challenge Webmaster (see bottom of page).
  • Ranking: The method of scoring is posted on the Evaluation page. If the scoring method changes, the participants will be notified by email by the organizers.
    – During the development period (phases 1 and 2), the scores on the validation sets will be posted in the Leaderboard table. The participants are allowed to make multiple submissions on the validation sets.
    – The results on the final evaluation set will only be released after the challenge is over. The participants may make multiple submissions on the final evaluation sets, to avoid a last-minute rush. However, for each registered team, only ONE final evaluation set submission for each dataset of the challenge will be taken into account. These submissions will have to be grouped under the same “experiment” name. The team leader will designate which experiment should be taken into account for the final ranking. For each phase, the teams will be ranked on each individual dataset, and the winner will be determined by the best average rank over all datasets.
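
As referenced in the “Goal of the challenge” item above, the sketch below shows, with NumPy, the two accepted submission formats: a transformed data matrix of shape (p, n’) or a similarity/kernel matrix of shape (p, p), both derived from a raw data matrix of shape (p, n). The random linear projection and the RBF kernel used here are placeholder choices for illustration only; they are not prescribed or endorsed by the challenge.

  import numpy as np

  rng = np.random.RandomState(0)
  p, n, n_new = 500, 30, 5           # p samples, n raw features, n' new features
  X = rng.randn(p, n)                # raw data matrix of shape (p, n)

  # Option 1: a transformed representation of shape (p, n').
  # A random linear projection stands in for a learned transform.
  W = rng.randn(n, n_new)
  X_new = X @ W
  assert X_new.shape == (p, n_new)

  # Option 2: a similarity/kernel matrix between samples, of shape (p, p).
  # An RBF kernel stands in for a learned similarity measure.
  sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
  K = np.exp(-sq_dists / (2.0 * n))
  assert K.shape == (p, p)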