Unsupervised Segmentation of Words into Morphemes Challenge

The objective of the Challenge is to design a statistical machine learning algorithm that segments words into the smallest meaning-bearing units of language, morphemes. Ideally, these are basic vocabulary units suitable for different tasks, such as text understanding, machine translation, information retrieval, and statistical language modeling.
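To make the task concrete, the following is a minimal sketch of the segmentation step alone. It is not a challenge entry or any organizer's method. It assumes a small, hand-written morph lexicon with made-up costs (the names `segment` and `TOY_COSTS` are hypothetical), whereas a challenge algorithm would have to learn such units from unannotated text. Given the lexicon, a Viterbi search picks the lowest-cost split of a word into known morphs.

```python
import math

def segment(word, morph_costs):
    """Split `word` into the lowest-cost sequence of known morphs (Viterbi search)."""
    n = len(word)
    best = [math.inf] * (n + 1)  # best[i]: lowest cost found for segmenting word[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last morph in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            cost = morph_costs.get(word[start:end])
            if cost is not None and best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    if math.isinf(best[n]):
        return [word]  # the lexicon cannot cover the word; keep it whole
    morphs = []
    i = n
    while i > 0:
        morphs.append(word[back[i]:i])
        i = back[i]
    return morphs[::-1]

# Hypothetical lexicon: the costs are invented stand-ins for negative log-probabilities.
TOY_COSTS = {
    "un": 2.0, "help": 3.0, "ful": 2.5, "ness": 2.5,
    "helpful": 6.5, "word": 3.0, "s": 1.5,
}

print(segment("unhelpfulness", TOY_COSTS))
print(segment("words", TOY_COSTS))
```

With these assumed costs, splitting "helpful" into "help" and "ful" is cheaper than keeping it as one unit, so the search prefers the finer split. A learned model would set such costs from data rather than by hand.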

The scientific goals are:

To learn about the phenomena underlying word construction in natural languages
To discover approaches suitable for a wide range of languages
To advance machine learning methodology

The results will be presented in a workshop arranged in connection with other PASCAL challenges on machine learning. Please read the rules and see the schedule. The datasets are available for download. Instructions on how to submit your camera-ready documents are given on the Workshop page. We are looking forward to an interesting competition!

Organizers

Mikko Kurimo, Mathias Creutz and Krista Lagus
Neural Networks Research Centre, Helsinki University of Technology

Program committee

Levent Arslan, Boğaziçi University
Samy Bengio, IDIAP
Tolga Ciloglu, Middle East Technical University
John Goldsmith, University of Chicago
Kadri Hacioglu, University of Colorado
Chun Yu Kit, City University of Hong Kong
Dietrich Klakow, Saarland University
Jan Nouza, Technical University of Liberec
Erkki Oja, Helsinki University of Technology
Richard Wicentowski, Swarthmore College
Murat Saraclar, Boğaziçi University

References

Mathias Creutz and Krista Lagus (2005). Unsupervised Morpheme Segmentation and Morphology Induction from Text Corpora Using Morfessor 1.0. Publications in Computer and Information Science, Report A81, Helsinki University of Technology, March.

Teemu Hirsimäki, Mathias Creutz, Vesa Siivola, Mikko Kurimo, Janne Pylkkönen, and Sami Virpioja (2005). Unlimited vocabulary speech recognition with morph language models applied to Finnish. Preprint accepted for publication in Computer Speech and Language.
