The objective of the Challenge is to design a statistical machine learning algorithm that segments words into the smallest meaning-bearing units of language, morphemes. Ideally, these are basic vocabulary units suitable for different tasks, such as text understanding, machine translation, information retrieval, and statistical language modeling.
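To make the expected input and output concrete, here is a minimal sketch of word segmentation under a unigram morph model, solved by dynamic programming. It is not the Challenge baseline or any participant's method. The morph inventory `MORPH_COUNTS`, its counts, and the function name `segment` are hypothetical and chosen only for illustration; a competing system would have to learn such units from unannotated text.

```python
import math

# Hypothetical morph counts (an assumption for illustration only);
# a real system would estimate these from an unannotated corpus.
MORPH_COUNTS = {"un": 40, "break": 25, "able": 30, "s": 80,
                "walk": 20, "ed": 60, "ing": 50}
TOTAL = sum(MORPH_COUNTS.values())


def segment(word):
    """Return the most probable split of `word` into known morphs.

    Each candidate morph is scored by its negative log relative frequency,
    and the lowest-cost sequence covering the whole word is kept.
    """
    n = len(word)
    # best[i] holds (cost, morph list) for the prefix word[:i];
    # a morph list of None means the prefix cannot be covered yet.
    best = [(0.0, [])] + [(math.inf, None)] * n
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in MORPH_COUNTS and best[start][1] is not None:
                cost = best[start][0] - math.log(MORPH_COUNTS[piece] / TOTAL)
                if cost < best[end][0]:
                    best[end] = (cost, best[start][1] + [piece])
    # Fall back to the unsegmented word when no full cover exists.
    return best[n][1] if best[n][1] is not None else [word]
```

Under these assumed counts, `segment("unbreakable")` returns `["un", "break", "able"]`, while a word with no covering morphs, such as `"cat"`, is returned unsegmented.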
The scientific goals are:
- To learn about the phenomena underlying word construction in natural languages
- To discover approaches suitable for a wide range of languages
- To advance machine learning methodology
The results will be presented in a workshop arranged in connection with other PASCAL challenges on machine learning. Please read the rules and see the schedule. The datasets are available for download. Instructions on how to submit your camera-ready documents are given on the Workshop page. We are looking forward to an interesting competition!
Organizers
- Mikko Kurimo, Mathias Creutz and Krista Lagus
- Neural Networks Research Centre, Helsinki University of Technology
Program committee
- Levent Arslan, Boğaziçi University
- Samy Bengio, IDIAP
- Tolga Çiloğlu, Middle East Technical University
- John Goldsmith, University of Chicago
- Kadri Hacioglu, University of Colorado
- Chun Yu Kit, City University of Hong Kong
- Dietrich Klakow, Saarland University
- Jan Nouza, Technical University of Liberec
- Erkki Oja, Helsinki University of Technology
- Richard Wicentowski, Swarthmore College
- Murat Saraclar, Boğaziçi University