This challenge addressed a question of fundamental and practical interest in machine learning: the assessment of data representations produced by unsupervised and transfer learning procedures. By unsupervised learning we mean learning strictly from unlabelled data. In contrast, transfer learning is concerned with learning from labelled data drawn from tasks that are related to, but different from, the task at hand. For instance, the task to be performed may be recognizing alphabetical letters while the training task is recognizing digits. Several large datasets from various application domains were made available for the evaluation. The task of the participants was to produce a data representation on an evaluation dataset, given both a very large unlabelled development set and an unlabelled evaluation set.
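To make the setup concrete, the following is a minimal sketch of the participants' task: fit an unsupervised model on the unlabelled development data, then submit the transformed evaluation set as the data representation. PCA stands in here for any unsupervised method, and the array names and sizes are hypothetical; this is not the protocol of any particular challenge entry.

```python
# Sketch: learn a representation on unlabelled development data,
# apply it to the evaluation set. PCA is a stand-in method;
# dev_X and eval_X are hypothetical arrays.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
dev_X = rng.normal(size=(10000, 256))   # large unlabelled development set
eval_X = rng.normal(size=(2000, 256))   # unlabelled evaluation set

# Fit the representation on development data only...
model = PCA(n_components=50).fit(dev_X)

# ...then the transformed evaluation set is what gets submitted.
eval_repr = model.transform(eval_X)
print(eval_repr.shape)  # (2000, 50)
```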
For clarity of the scientific evaluation, the first phase of the challenge focused strictly on unsupervised learning. It was followed by a second phase on transfer learning, in which a few target values (labels) for tasks other than the evaluation task (the training tasks) were provided on development data (details in the attached PDF file).
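One way to picture the transfer phase, sketched below under stated assumptions: the labels come from training tasks rather than the evaluation task, and a supervised projection learned from them is transferred to the evaluation set. LDA stands in for whatever method a participant might use; all names and sizes are hypothetical.

```python
# Sketch of the transfer phase: labels for *training tasks* (not the
# evaluation task) shape the representation, which is then applied to
# the evaluation set. LDA is a stand-in; data here is synthetic.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
train_X = rng.normal(size=(500, 256))    # labelled development subset
train_y = rng.integers(0, 10, size=500)  # training-task labels (10 classes)
eval_X = rng.normal(size=(2000, 256))    # unlabelled evaluation set

# Learn a projection from the training-task labels, then transfer it
# to the evaluation data, whose own labels are never seen.
lda = LinearDiscriminantAnalysis(n_components=9).fit(train_X, train_y)
eval_repr = lda.transform(eval_X)
print(eval_repr.shape)  # (2000, 9)
```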
We used data from five different domains (handwriting recognition, object recognition from still images, action recognition in videos, text processing, sensor data) and required that participants make entries on all datasets to demonstrate the versatility of their methods. The datasets were selected according to several criteria: (1) medium difficulty, to provide good separation between results obtained by different approaches; (2) over 10,000 unlabelled examples; (3) over 10,000 labelled examples, with more than 10 classes and a minimum of 100 examples per class.
We believe this challenge has helped advance methodology for evaluating unsupervised learning algorithms and channel research effort toward the important new problem of transfer learning. Every year, dozens of papers on unsupervised space transformations, dimensionality reduction, and clustering are published. Yet practitioners tend to ignore them and continue using a handful of popular algorithms such as PCA, ICA, k-means, and hierarchical clustering. An evaluation free of inventor bias might help identify and popularize algorithms that have advanced the state of the art. Another aim of this challenge was to promote research on deep machine learning architectures, which use hierarchical feature representations.