Goal
The goal is to select/design a classifier (and any pre-processing systems, including a feature extractor) that correctly classifies EEG data into one of two classes. The winner will be the submission that maximizes the area under the ROC curve.
Eligibility
Anyone who has an interest in machine learning and who has access to Matlab®.
Registration
Registration is not required. However, if you wish to receive important updates on the competition by email then please send a request to hildk@bme.ogi.edu.
Training Data
The training data may be downloaded from mlsp2010TrainingData.mat (44 MB). The training data, which is in Matlab® format, consist of EEG data collected while a subject viewed satellite images that were displayed in the center of an LCD monitor approximately 43 cm in front of them.
There are 64 channels of EEG data. The total number of samples is 176378. The sampling rate is 256 Hz. There are 75 blocks and 2775 total satellite images. Each block contains a total of 37 satellite images, each of which measures 500 x 500 pixels. All images within a block are displayed for 100 ms and each image is displayed as soon as the preceding image is finished. Each block is initiated by the subject after a rest period, the length of which was not specified in advance.
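The quantities above can be checked directly in Matlab® after loading the training file. The lines below are only a sketch; they assume that eegData is stored as channels x samples and that every nonzero entry of imageTrigger marks the onset of one satellite image.

load mlsp2010TrainingData.mat
size(eegData)          % expected: 64 x 176378 (channels x samples)
numel(t)               % expected: 176378 time samples at 256 Hz
nnz(imageTrigger)      % expected: 2775 satellite-image onsets (75 blocks x 37 images)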
The subject was instructed to fixate on the center of the images and to press the space bar whenever they detected an instance of a target, where the targets are surface-to-air missile (SAM) sites. The subject also needed to press the space bar to initiate a new block and to clear the feedback information that was displayed after each block.
We expect a particular neural signature, the P300, to occur whenever the subject detects an instance of a target in one of the satellite images. The P300 gets its name from the fact that detection of a (rare, pre-defined) target causes a deflection of the EEG signal approximately 300 ms after the visual stimulus is presented to the subject. The P300 is most prominent in the midline channels (e.g., Pz, Cz, Fz; see cap_64_layout_medium.jpg for the layout of the channels). In addition, there is a separate neural signature associated with the pressing of the space bar. For our data, this second neural signal occurs around 500-600 ms after the stimulus containing the target is presented.
The training data contain the variables eegData, t, imageTrigger, buttonTrigger, eegLabel, and eegCoord, which are referenced below.
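As an illustration of the P300 signature described above, the following sketch averages the Pz channel over a window following each target-image onset. It is only an example: it assumes that eegData is stored as channels x samples, that eegLabel is a cell array of channel names, and that imageTrigger equals 2 at the onset of each target image in the training data (see the Note under Submission).

load mlsp2010TrainingData.mat
fs  = 256;                                  % sampling rate (Hz)
win = 0:round(0.8*fs);                      % 0-800 ms post-stimulus window

pz = find(strcmpi(eegLabel, 'Pz'), 1);      % locate channel Pz by name
onsets = find(imageTrigger == 2);           % target-image onsets (training data)
onsets = onsets(onsets + win(end) <= size(eegData, 2));

epochs = zeros(numel(onsets), numel(win));
for k = 1:numel(onsets)
    epochs(k, :) = eegData(pz, onsets(k) + win);
end

plot(win/fs*1000, mean(epochs, 1));         % P300 expected near 300 ms
xlabel('Time after stimulus onset (ms)');
ylabel('Mean Pz amplitude');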
Test Data
The test data are not made available to the participants. Instead, participants will submit Matlab® code, which the competition chairs will test by passing the test data to the submitted code.
Like before, there are 64 channels of EEG data, the sampling rate is 256 Hz, there is no delay between images within the same block, the subject rests between blocks for as long as they wish, and the subject presses the space bar to signify detection of a target, to initiate a new block, and to clear the feedback information displayed after each block.
Unlike before, the test data consist of 890 blocks and 9891 satellite images, the total number of samples of EEG data is 1603334, every other image within a block is a mask image (mask images do not contain targets), the buttonTrigger variable is not available, and the imageTrigger variable takes only values of 0 or 1, where a 1 corresponds to the onset of each prospective target image (i.e., the satellite images) and 0 is used elsewhere. Another difference is that 4 different image durations are used in the test data. The image durations, which apply to both satellite and mask images, are (approximately) 50 ms, 100 ms, 150 ms, and 200 ms. All images within a given block have the same image duration and all blocks having a specified image duration are grouped together. Each block contains 22, 10, 7, and 5 prospective target images when the image duration is 50 ms, 100 ms, 150 ms, and 200 ms, respectively. Keep in mind that the time difference between successive prospective target images is twice the corresponding image duration due to the presence of the mask images. Hence, successive prospective target images within a block appear every (approximately) 100 ms, 200 ms, 300 ms, or 400 ms.
Submission
A successful submission consists of (1) the names of the team members (each person may belong to at most two teams and each team is allowed a single submission), (2) the name(s) of the host institutions of the researchers, (3) a 1-3 paragraph description of the approach used, and (4) Matlab® code, myFunction.m, which must conform to the requirements below.
The competition chairs will call the submitted code using,
>> [out] = myFunction(eegData,t,imageTrigger,eegLabel,eegCoord,thr);
where the first 5 input variables correspond to the test data, out is a (9891 x 1) vector of 0’s and 1’s (each 0 corresponds to a predicted non-target and each 1 corresponds to a predicted target), and thr is a user-defined threshold that biases the predicted class (0 ≤ thr ≤ 1). When thr = 0, the code should predict that all prospective target images are non-targets and, when thr = 1, the code should predict that all prospective target images are targets. The thr variable will be used to construct an ROC curve. Example code that demonstrates the use of thr can be downloaded from myFunction.m. The code should not write to any drive, must finish running in a reasonable time (on the order of minutes), and must consist only of regular, uncompiled Matlab® code (e.g., P-code, mex files, and compiled code are forbidden). Parameter values for the trained classifier can be hardcoded into myFunction.m. Alternatively, entrants can read parameter values into myFunction.m from a single user-supplied *.mat file.
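For illustration only, the sketch below shows one way to satisfy this interface; it is not a trained classifier and it is not the downloadable example mentioned above. It assumes that eegData is stored as channels x samples and that eegLabel is a cell array of channel names, scores each prospective target image by its mean Pz amplitude in a fixed post-stimulus window, and maps the scores strictly into (0,1) so that thr = 0 yields all 0’s and thr = 1 yields all 1’s.

function out = myFunction(eegData, t, imageTrigger, eegLabel, eegCoord, thr)
% Illustrative sketch only; t and eegCoord are not used here.
fs     = 256;                                % sampling rate (Hz)
onsets = find(imageTrigger == 1);            % prospective target-image onsets
win    = round(0.25*fs):round(0.40*fs);      % ~250-400 ms post-stimulus window

pz = find(strcmpi(eegLabel, 'Pz'), 1);       % locate channel Pz by name
if isempty(pz), pz = 1; end                  % fall back to channel 1

% Score each prospective target image by its mean Pz amplitude in the window.
z = zeros(numel(onsets), 1);
for k = 1:numel(onsets)
    idx  = onsets(k) + win;
    idx  = idx(idx <= size(eegData, 2));     % guard against the end of the recording
    z(k) = mean(eegData(pz, idx));
end

% Map scores strictly into (0,1): thr = 0 then predicts no targets and
% thr = 1 predicts all targets, as required.
s   = 1 ./ (1 + exp(-(z - mean(z)) ./ (std(z) + eps)));
out = double(s > 1 - thr);
end

Any monotone score thresholded in this manner satisfies the required endpoint behavior of thr; the particular channel, features, and window used here are placeholders for a trained classifier.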
Note
Prior to submitting, we recommend testing the code on the training data (after performing the following: “f=find(imageTrigger==2); imageTrigger(f)=1;”).
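For example, an ROC curve and its area can then be estimated on the training data by sweeping thr, as sketched below. The sketch assumes that each nonzero entry of imageTrigger marks one satellite-image onset, that the value 2 marks targets, and that the nonzero entries appear in the same temporal order as the predictions returned by myFunction.

load mlsp2010TrainingData.mat
isTarget = imageTrigger(imageTrigger > 0) == 2;      % true label per satellite image
f = find(imageTrigger == 2); imageTrigger(f) = 1;    % remapping described above

thrGrid = linspace(0, 1, 21);
tpr = zeros(size(thrGrid));
fpr = zeros(size(thrGrid));
for i = 1:numel(thrGrid)
    out    = myFunction(eegData, t, imageTrigger, eegLabel, eegCoord, thrGrid(i));
    tpr(i) = sum(out(:) == 1 &  isTarget(:)) / sum( isTarget(:));
    fpr(i) = sum(out(:) == 1 & ~isTarget(:)) / sum(~isTarget(:));
end
auc = abs(trapz(fpr, tpr));                          % area under the empirical ROC curve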
Deadline
Submissions must be emailed to hildk@bme.ogi.edu no later than April 8, 2010.
Publication
The MLSP 2010 proceedings will include a publication, written by the competition chairs, that describes the competition and compares the submitted methods. The competition chairs will invite up to three teams to submit a two-page summary that discusses the method they used in the competition (the mathematical notation must be consistent with the paper authored by the competition chairs). Pending approval, the invited summaries will be published in the conference proceedings. The process of selecting teams to invite to publish is distinct from the process of selecting the winner of the competition. Invitations to publish will be based on both (1) utilization of machine learning principles and (2) performance in terms of area under the ROC curve.
Awards
Up to two N900 high-performance mobile computers from Nokia will be awarded as prizes. The 2010 MLSP Competition Committee will distribute one award to each team that it selects, where the selection is based on (1) the performance of the submitted methods and (2) the requirement that at least one member of each selected team attend the 2010 MLSP Conference. Members of the 2010 MLSP Competition Committee (and everyone belonging to any of the labs of the 2010 MLSP Competition Committee) are not eligible for this award.
In addition to the Nokia mobile devices, the MLSP Competition Committee will award a maximum of 1500 Euros to the selected winning team(s). The monetary award(s), provided by the PASCAL2 Challenge Program, are intended to defray some of the expenses related to traveling to and attending the MLSP 2010 Conference. More details will be made available at a later date.
Resources
[1] K.B. Campbell, E. Courchesne, T.W. Picton, and K.C. Squires, “Evoked potential correlations of human information processing,” Biological Psychology, Vol. 8, pp. 45-68, 1979.
[2] T.W. Picton, “The P300 wave of the human event-related potential,” Journal of Clinical Neurophysiology, Vol. 9, No. 4, pp. 456-479, 1992.
[3] B.S. Oken, “Evoked Potentials in Clinical Medicine,” K.H. Chiappa, Ed., Lippincott-Raven, Philadelphia, PA, 1997.