ICMLA 2010 Speaker Clustering Challenge

June 27, 2010

PASCAL2

Call for Papers

ICMLA 2010 Speaker Clustering Challenge
Washington DC, USA, 12-14 Dec. 2010

http://www.icmla-conference.org/icmla10/CFP_Challenge1_files/CFP_Challenge1.html

OVERVIEW:
Learning methods for sequential data have received widespread attention in recent years. This kind of data arises in many interesting scenarios where the individual semantic units are no longer single vectors but collections of vectors. Examples of such scenarios include multimedia analysis (e.g., video understanding, speaker recognition) and bioinformatics (e.g., DNA or protein sequences). Sequences can have different lengths, so standard distance measures for vector spaces are not directly applicable.
Moreover, the information conveyed by the sequences is sometimes encoded not just in the individual vectors themselves, but also in the dynamics under which these vectors evolve over time. To capture such information, it is usual to employ dynamic models such as hidden Markov models or more general dynamic Bayesian networks. Distances between sequences can then be defined using the learned models.
However, there are many scenarios where the sequences can be accurately classified or clustered without attending to their dynamic characteristics. Examples include bag-of-words models for image analysis, speech-independent speaker verification, etc. In these cases the sequences can be viewed as sets of independent and identically distributed (i.i.d.) samples, and can thus be characterized in terms of their underlying probability density function (PDF). There are many ways of defining affinities or distances between PDFs, from the classic Kullback-Leibler or Bhattacharyya divergences (even in feature space) to the recently proposed Probability Product Kernels.
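As an illustration of the PDF-based view, the following minimal sketch fits one Gaussian to each sequence (treating its frames as i.i.d. samples) and computes the closed-form Bhattacharyya distance between two such Gaussians. The single-Gaussian model, the function names and the regularization constant `reg` are assumptions made for this example only; they are not prescribed by the challenge.

```python
import numpy as np

def fit_gaussian(seq, reg=1e-6):
    """Fit a full-covariance Gaussian to a (n_frames, dim) sequence.

    Frames are treated as i.i.d. samples, so temporal order is ignored.
    `reg` is a small ridge term (an assumption) that keeps the
    covariance invertible for short sequences.
    """
    seq = np.asarray(seq, dtype=float)
    mean = seq.mean(axis=0)
    cov = np.cov(seq, rowvar=False) + reg * np.eye(seq.shape[1])
    return mean, cov

def bhattacharyya_distance(g1, g2):
    """Closed-form Bhattacharyya distance between two Gaussians."""
    m1, c1 = g1
    m2, c2 = g2
    c = 0.5 * (c1 + c2)          # average covariance
    diff = m1 - m2
    # Mean term: (1/8) * diff^T * c^{-1} * diff
    term_mean = 0.125 * diff @ np.linalg.solve(c, diff)
    # Covariance term: (1/2) * ln( det(c) / sqrt(det(c1) * det(c2)) )
    logdet_c = np.linalg.slogdet(c)[1]
    logdet_1 = np.linalg.slogdet(c1)[1]
    logdet_2 = np.linalg.slogdet(c2)[1]
    term_cov = 0.5 * (logdet_c - 0.5 * (logdet_1 + logdet_2))
    return term_mean + term_cov

# Synthetic stand-ins for two utterances of different lengths.
rng = np.random.default_rng(0)
seq_a = rng.normal(size=(130, 4))
seq_b = rng.normal(loc=3.0, size=(90, 4))
d_ab = bhattacharyya_distance(fit_gaussian(seq_a), fit_gaussian(seq_b))
```

Because each sequence is reduced to a fixed-size model, sequences of different lengths become directly comparable. Richer density models (for example Gaussian mixtures) generally have no closed-form distance and would need approximations.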
In this challenge we propose to focus on unsupervised methods for sequential data, specifically the clustering of speech data. Clustering tries to find coherent (in some sense) disjoint groups within a dataset. It does not require any training examples, which makes it an important tool for exploratory data analysis. Furthermore, clustering algorithms can be easily extended into semi-supervised methods, which are very useful when the labelling process is costly.

CHALLENGE FORMAT:
This challenge proposes two different tasks:
* 2-class speaker clustering
* Multiclass speaker clustering
The first task is 2-class speaker clustering. For this task we provide 7 datasets, each comprising speech from two different speakers. Participants should identify two clusters within each dataset.
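One possible way to approach the 2-class task, given a matrix of pairwise distances between sequences, is a two-way spectral cut. The sketch below is only an illustration under stated assumptions: the exponential affinity, the median-based `scale` heuristic and the function name are choices made for this example, not part of the challenge definition.

```python
import numpy as np

def two_way_spectral_cut(dist, scale=None):
    """Split items into two clusters from a symmetric distance matrix.

    Uses the sign of the second eigenvector of the symmetric normalized
    graph Laplacian (a normalized-cut style relaxation).
    """
    dist = np.asarray(dist, dtype=float)
    if scale is None:
        # Heuristic (assumption): median of the nonzero distances.
        scale = np.median(dist[dist > 0])
    affinity = np.exp(-dist / scale)
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # L = I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(dist)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)       # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt   # map back to the generalized problem
    return (fiedler > 0).astype(int)

# Toy data: two well-separated groups of points standing in for sequences.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.5, size=(5, 3)),
                    rng.normal(10.0, 0.5, size=(5, 3))])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
labels = two_way_spectral_cut(dist)
```

The cluster indices themselves are arbitrary (0 and 1 may be swapped between runs on different data), so only the grouping is meaningful.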
The more advanced task is multiclass speaker clustering. This task is to be carried out on a single dataset, which is formed by sequences coming from an unknown number of speakers in the range. Participants should discover the number of speakers and perform an adequate clustering.
Both tasks are based on a speech database recorded using a PDA. It includes both male and female speakers. Each subject recorded 50 isolated words, and the mean length of each utterance is around 1.3 seconds. The original audio files were processed using the HTK software, yielding a standard parametrization consisting of 12 Mel-frequency cepstral coefficients (MFCCs), an energy term and their respective increments, giving a total of 26 parameters. These parameters were obtained every 10 ms with a 25 ms analysis window, yielding 26-dimensional sequences of around 130 samples. Any further pre-processing (normalization, filtering, …) is up to the participants.
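The figures above can be checked with a short sketch: a 1.3 s utterance analysed every 10 ms with a 25 ms window gives roughly 130 frames, and 13 static parameters (12 MFCCs plus energy) with their increments give 26 dimensions. The increment computation below uses a regression formula in the style of HTK delta coefficients; the window `width=2` and the edge-replication padding are assumptions for this example, since the exact HTK configuration is not stated here.

```python
import numpy as np

def num_frames(duration_ms, hop_ms=10, win_ms=25):
    """Number of full analysis windows that fit in the signal."""
    return (duration_ms - win_ms) // hop_ms + 1

def add_deltas(static, width=2):
    """Append regression-based increments to a (n_frames, n_static) array.

    delta_t = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), k = 1..width.
    Edges are handled by repeating the first and last frames (assumption).
    """
    static = np.asarray(static, dtype=float)
    n = static.shape[0]
    padded = np.pad(static, ((width, width), (0, 0)), mode="edge")
    numerator = np.zeros_like(static)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n]    # c[t + k]
        behind = padded[width - k : width - k + n]   # c[t - k]
        numerator += k * (ahead - behind)
    denominator = 2 * sum(k * k for k in range(1, width + 1))
    return np.hstack([static, numerator / denominator])

n = num_frames(1300)                         # about 130 frames for 1.3 s
static = np.arange(n, dtype=float)[:, None] * np.ones((1, 13))
features = add_deltas(static)                # shape: (n, 26)
```

On the linear ramp used here, the interior increments equal the slope of the ramp, which is a quick sanity check for the regression formula.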
Participants can submit their results for either task or for both. For details on how to format the results, please contact the organizers.

SUBMISSION AND EVALUATION:
Apart from the actual results, a short paper (4 pages) describing the proposed algorithms should be submitted through the main conference submission website. These papers will be reviewed mainly based on:
* Originality and technical soundness of the employed distance measures
* Coherence of the discovered clusters w.r.t. the speakers
* In the multiclass task, special attention will be paid to the steps toward the correct identification of the number of speakers
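The call does not specify how cluster coherence with respect to the speakers is measured. As one common possibility, the sketch below computes cluster purity: the fraction of sequences that belong to the majority speaker of their cluster. The choice of purity and the function name are assumptions for illustration, not the organizers' stated metric.

```python
from collections import Counter

def cluster_purity(cluster_ids, speaker_ids):
    """Fraction of items whose speaker is the majority speaker of their cluster."""
    if len(cluster_ids) != len(speaker_ids):
        raise ValueError("cluster_ids and speaker_ids must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one item is required")
    counts_by_cluster = {}
    for cluster, speaker in zip(cluster_ids, speaker_ids):
        counts_by_cluster.setdefault(cluster, Counter())[speaker] += 1
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(cluster_ids)

perfect = cluster_purity([0, 0, 1, 1], ["a", "a", "b", "b"])
mixed = cluster_purity([0, 0, 0, 1], ["a", "a", "b", "b"])
```

Purity never decreases when clusters are split further, so in a setting with an unknown number of speakers it would normally be read together with the number of clusters found.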

PUBLICATION:
Accepted papers will be published in the ICMLA’10 conference proceedings.

IMPORTANT DATES:
Paper Submission Deadline: July 15, 2010
Notification of acceptance: September 7, 2010
Camera-ready papers & Pre-registration: October 1, 2010

ICMLA 2010 Challenge Organizers:
* Darío García-García, University Carlos III Madrid, Spain (dggarcia(at)tsc.uc3m.es)
* Raúl Santos-Rodríguez, University Carlos III Madrid, Spain (rsrodriguez(at)tsc.uc3m.es)