We present a biological data-mining problem that poses a number of significant challenges. The available data (i) are of high dimensionality but extremely small sample size; (ii) come from different sources, which correspond to different biological levels; (iii) exhibit a high degree of feature dependencies and interactions within and between the different sources (some of the interactions between sources are known and available as background knowledge); and (iv) are incomplete.
These data were obtained from patients with obstructive nephropathy (ON), which is the most frequent nephropathy observed among newborns and children and the leading cause of end-stage renal disease, a condition usually treated by dialysis or transplantation. The goal is to construct diagnostic models that accurately connect the biological levels to the severity of the pathology. We particularly welcome data-mining approaches and learning methods that can exploit the available background information to address the formidable challenge of high dimensionality and small sample size in our setting, and so deliver better models.
A prize is envisaged for the top-performing approaches (2,500 EUR in total). The prize is sponsored by Rapid-I, the company that supports RapidMiner, probably the most popular open-source data-mining environment, and by the European Commission through the e-Lico EU project. Participants are expected to prepare a paper of at most 8 pages describing their approach. We plan to have a number of selected papers considered for publication in a special issue of a journal (to be announced soon).
Challenge web page: http://tunedit.org/challenge/ON
Started: Sep 15, 2010
Ends: Dec 19, 2010
– Alexandros Kalousis, University of Geneva, Switzerland
– Julie Klein, Inserm U858, Toulouse, France
– Joost Schanstra, Inserm U858, Toulouse, France
– Adam Woznica, University of Geneva, Switzerland