Title: Monolingual Machine Translation (MONOMT 2012).
Date: Nov 1, 2012
Location: San Diego, United States
* Co-located with AMTA 2012 (The Tenth Biennial Conference of the
Association for Machine Translation in the Americas)
Website: http://computing.dcu.ie/~tokita/MONOMT/monomt.htm
DESCRIPTION
Due to the increasing demand for high-quality translation, monolingual
Machine Translation (MT) subtasks are frequently encountered: an MT task
is decomposed into several subtasks, some of which can be called
'monolingual'. Such monolingual MT subtasks include:
(1) MT for morphologically rich languages [Bojar, 08], aimed at dealing
with the morphological richness of the target language, as is the case
with the English-Czech (EN-CZ) language pair. The MT task is split into
two subtasks: first, English is ('bilingually') translated into
simplified Czech, and then the obtained morphologically normalized Czech
is ('monolingually') translated into morphologically rich Czech;
(2) system combination [Matusov et al., 05], where a source sentence is
first translated into the target language by several MT systems, and the
obtained translations are then combined to generate the final output in
the same language;
(3) statistical post-editing [Dugast et al., 07; Simard et al., 07],
where a source sentence is first translated into the target language by
a rule-based MT system, and the obtained output is then 'monolingually'
translated by an SMT system (see the sketch after this list);
(4) domain adaptation using transfer learning [Daume III, 07], where the
source side written in a 'source' domain (e.g., newswire) is converted
into the target side written in a 'target' domain (e.g., patents);
(5) transliteration between phonemes / alphabets [Knight and Graehl, 98];
(6) reordering between mismatched word orders (e.g., SVO and SOV)
[Katz-Brown et al., 11];
(7) the MERT process [Arun et al., 10];
(8) translation memory (TM) and MT integration [Ma et al., 11];
(9) paraphrasing for creating additional training data or for evaluation
purposes;
(10) error identification and voting with independent monolingual
crowdsourcing [Hu et al., 11].
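To make the decomposition concrete, here is a minimal sketch of the
two-stage statistical post-editing pipeline of item (3). The function
names are hypothetical placeholders rather than references to any
particular toolkit; in practice each stage would be backed by a real
rule-based MT system and a real SMT decoder, respectively.

# A minimal sketch of the two-stage pipeline in item (3): bilingual RBMT
# followed by a 'monolingual' SMT post-editor. Both function names are
# hypothetical placeholders standing in for real systems.

def rule_based_translate(source_sentence: str) -> str:
    """Bilingual stage: source language -> draft target-language translation."""
    raise NotImplementedError("stand-in for an external rule-based MT system")

def monolingual_post_edit(draft_translation: str) -> str:
    """Monolingual stage: target language -> improved target language."""
    raise NotImplementedError("stand-in for an SMT system used as a post-editor")

def translate(source_sentence: str) -> str:
    # The overall task is bilingual, but the second subtask is monolingual:
    # both its input and its output are in the target language.
    draft = rule_based_translate(source_sentence)
    return monolingual_post_edit(draft)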
A distinction can be drawn between bilingual MT tools (B-tools) and
monolingual MT tools (M-tools), both of which may be exploited for
monolingual MT. Consider, on the one hand, monolingual subtasks such as
MT for morphologically rich languages, statistical post-editing, or
transliteration, and, on the other hand, tasks such as system combination
or domain adaptation. The latter group is often approached with M-tools
such as monolingual word alignment [Matusov et al., 05; He et al., 08]
and the minimization of Bayes risk [Kumar and Byrne, 02] over the outputs
of the combined systems (a toy sketch of such selection appears below).
The former group, however, usually employs B-tools, such as GIZA++
[Och and Ney, 04] to extract bilingual phrases, followed by MAP decoding
over them. How M-tools and B-tools are used for monolingual MT is an
issue of particular interest for this workshop.
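As an illustration of one such M-tool, the following small, self-contained
sketch performs minimum Bayes risk selection over the outputs of several
systems. Uniform hypothesis weights and a toy unigram-F1 gain are
assumptions made purely for brevity; actual implementations typically use
BLEU- or TER-based losses and posterior scores from the combined systems.

# A toy sketch of minimum Bayes risk (MBR) selection over MT system outputs.
# Assumptions for brevity: every hypothesis is weighted uniformly and the
# gain is a simple unigram F1 rather than a BLEU/TER-based measure.

from collections import Counter

def unigram_f1(hyp: str, ref: str) -> float:
    """Toy gain function: unigram F1 between two whitespace-tokenized strings."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def mbr_select(hypotheses):
    """Return the hypothesis with the highest expected gain (lowest Bayes
    risk) against the whole hypothesis set, each weighted uniformly."""
    def expected_gain(h):
        return sum(unigram_f1(h, other) for other in hypotheses)
    return max(hypotheses, key=expected_gain)

# Example: three system outputs for the same source sentence.
outputs = [
    "the cat sat on the mat",
    "a cat sat on the mat",
    "the cat is sitting on a mat",
]
print(mbr_select(outputs))  # prints the output closest, on average, to the others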
This workshop is intended to provide an opportunity to discuss ideas
and share opinions on the applicability of M-tools and B-tools to
monolingual MT subtasks, and on their respective strengths and
weaknesses in specific settings. Furthermore, we wish to provide an
opportunity to demonstrate successful use cases of M-tools.
Possible questions to be addressed during the workshop include:
+ ways of applying M-tools to monolingual MT subtasks such as MT for
morphologically rich languages and statistical post-editing.
+ investigation of the suitability of B-tools or M-tools for
monolingual MT subtasks.
+ performance improvements of monolingual word alignment tools, since
these are necessary for specific monolingual subtasks, such as MT for
morphologically rich languages and statistical post-editing.
IMPORTANT DATES
Submission deadline: August 3, 2012
Notification to authors: August 31, 2012
Camera ready: September 7, 2012
Workshop: November 1, 2012
TOPICS OF INTEREST
Original papers are invited on different aspects of monolingual MT, such as:
MT for morphologically rich languages
system combination
statistical post-editing
domain adaptation
MERT process
MT for language pairs with mismatched word order (SVO and SOV, …)
MT-TM integration (i.e., MT systems whose prior knowledge includes
bilingual terminology and TMs)
transliteration
MT using textual entailment
MT using confidence estimation
paraphrasing
hybrid MT
…
Papers describing the mechanisms of MT tools that may be considered
'monolingual' are also encouraged. Some possible topics are listed
below:
MBR decoding, consensus decoding
monolingual word alignment (based on TER, METEOR, …)
language models constructed by learning the representation of data
data structure related matters
ranking algorithms
multitask learning (in the context of domain adaptation)
…
SUBMISSION
Authors are invited to submit long papers (up to 10 pages) and short
papers (2–4 pages). Long papers should describe unpublished,
substantial and completed research. Short papers should be position
papers, papers describing work in progress or short, focused
contributions. Papers will be accepted until August 3, 2012, in PDF
format via the submission system:
http://www.softconf.com/amta2012/MONOMT2012/
Submitted papers must follow the style and formatting guidelines
available from the AMTA main conference site (see below). As the
reviewing will be blind, the papers must not include the authors’
names and affiliations. Furthermore, self-references that reveal the
author’s identity, e.g., “We previously showed (Smith, 1991) …” must
be avoided. Instead, use citations such as “Smith previously showed
(Smith, 1991) …” Papers that do not conform to these requirements
will be rejected without review.
WORKSHOP CHAIRS
Tsuyoshi Okita (Dublin City University, Ireland)
Artem Sokolov (LIMSI, France)
Taro Watanabe (NICT, Japan)
PROGRAM COMMITTEE (Tentative)
Bogdan Babych (University of Leeds, UK)
Loic Barrault (LIUM, Universite du Maine, France)
Nicola Bertoldi (FBK, Italy)
Ergun Bicici (CNGL, Dublin City University, Ireland)
Ondrej Bojar (Charles University, Czech Republic)
Boxing Chen (NRC Institute for Information Technology, Canada)
Trevor Cohn (University of Sheffield, UK)
Marta Ruiz Costa-jussa (Barcelona Media, Spain)
Josep M. Crego (SYSTRAN, France)
John DeNero (Google, USA)
Jinhua Du (Xi’an University of Technology, China)
Kevin Duh (Nara Institute of Science and Technology, Japan)
Chris Dyer (CMU, USA)
Christian Federmann (DFKI, Germany)
Yvette Graham (Dublin City University, Ireland)
Barry Haddow (University of Edinburgh, UK)
Xiaodong He (Microsoft, USA)
Jagadeesh Jagarlamudi (University of Maryland, USA)
Jie Jiang (Applied Language Solutions, UK)
Philipp Koehn (University of Edinburgh, UK)
Shankar Kumar (Google, USA)
Alon Lavie (CMU, USA)
Yanjun Ma (Baidu, China)
Aurelien Max (LIMSI, University Paris Sud, France)
Maite Melero (Barcelona Media, Spain)
Philip Resnik (University of Maryland, USA)
Stefan Riezler (University of Heidelberg, Germany)
Lucia Specia (University of Sheffield, UK)
Marco Turchi (JRC, Italy)
Antal van den Bosch (Radboud University Nijmegen, Netherlands)
Xianchao Wu (Baidu, Japan)
Dekai Wu (HKUST, Hong Kong)
Francois Yvon (LIMSI, University Paris Sud, France)