Call: New Extended Deadline CFP - ML4HMT-12 Workshop at COLING 2012

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Wed Oct 17 11:30:21 UTC 2012

Date: Mon, 15 Oct 2012 16:20:39 +0200
From: Maite Melero <maite.melero at>
Message-ID: <16B44A00C6287E46A454C5F9A5914F6FE16646A7CE at>

-----Apologies for duplicate postings-----


“Second Workshop on Applying Machine Learning Techniques to Optimise the
Division of Labour in Hybrid MT (ML4HMT-12 WS and Shared Task)” at
COLING 2012

Mumbai (India), 9th December, 2012


The workshop and associated shared task are an effort to trigger a
systematic investigation into improving state-of-the-art hybrid machine
translation, making use of advanced machine-learning (ML)
methodologies. It follows the ML4HMT-11 workshop, which took place last
November in Barcelona. The first workshop also road-tested a shared task
(and associated data set) and laid the basis for a broader reach in

Regular Papers ML4HMT-12
We are soliciting original papers on hybrid MT, including (but not
limited to):

* use of machine learning methods in hybrid MT;
* system combination: parallel in multi-engine MT (MEMT) or sequential
  in statistical post-editing (SPMT);
* combining phrases and translation units from different types of MT;
* syntactic pre-/re-ordering;
* using richer linguistic information in phrase-based or in hierarchical
* learning resources (e.g., transfer rules, transduction grammars) for
  probabilistic rule-based MT.

Full papers should be anonymous and follow the COLING full paper format.
To submit contributions, please follow the instructions at the Workshop
management system submission website. The contributions will
undergo a double-blind review by members of the programme committee.

Shared Task ML4HMT-12

The main focus of the Shared Task is to address the question:

"Can Hybrid MT and System Combination techniques benefit from extra
information (linguistically motivated, decoding, runtime, confidence
scores, or other meta-data) from the systems involved?"

Participants are invited to build hybrid MT systems and/or system
combinations by using the output of several MT systems of different
types, as provided by the organisers.

While participants are encouraged to use machine learning techniques to
explore the additional meta-data information sources, systems based on
other general improvements in hybrid and combination-based MT are also
welcome to participate in the challenge.
For systems that exploit additional meta-data information, the challenge
is that this meta-data is highly heterogeneous and specific to each
individual system.

Data: The ML4HMT-12 Shared Task involves (ES-EN) and (ZH-EN) data sets,
in each case translating into EN.

* (ES-EN): Participants are given a bilingual development set aligned at
  the sentence level. Each "bilingual sentence" contains: 1) the source
  sentence, 2) the target (reference) sentence and 3) the corresponding
  multiple output translations from four systems, based on different MT
  approaches (Apertium, Ramirez-Sanchez, 2006; Lucy, Alonso and
  Thurmair, 2003; Moses, Koehn et al., 2007). The output has been
  annotated with system-internal meta-data information derived from the
  translation process of each of the systems.

* (ZH-EN): A corresponding data set for ZH-EN with output translations
  from three systems (Moses, ICT_Chiero, Mi et al., 2009; and Huajian
  RBMT) will be provided. (Participants are required to fill out a
  shared task evaluation agreement form and obtain the ZH-EN data from

Participants are challenged to build an MT mechanism that, where
possible, makes effective use of the system-specific MT meta-data output.
They can provide solutions based on open-source systems, or develop their
own mechanisms. The development set can be used for tuning the systems
during the development phase. Final submissions have to include
translation output on a test set, which will be made available one week
after training data release. Data will be provided to build
language/reordering models, possibly re-using existing resources from MT

Participants can also make use of additional tools (linguistic analysis,
confidence estimation, etc.) if their systems require them, but they
have to declare this explicitly upon submission, so that they are judged
as "unconstrained" systems. This will allow for a better comparison
between participating systems.
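
As an illustration only, the data layout described above (a source
sentence, a reference and several system outputs carrying meta-data)
could be modelled as below, together with a naive consensus baseline
that picks the output agreeing most with the others. This is a sketch
under stated assumptions, not the organisers' format or method: all
class, field and system names are hypothetical, the sentences are made
up, and word-set overlap stands in for a real similarity measure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SystemOutput:
    """One MT system's translation plus its system-internal meta-data."""
    system: str                      # hypothetical system label
    translation: str
    metadata: Dict[str, float] = field(default_factory=dict)


@dataclass
class BilingualSentence:
    """Source, reference and the candidate translations for one sentence."""
    source: str
    reference: str
    outputs: List[SystemOutput]


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lower-cased word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def select_by_consensus(sentence: BilingualSentence) -> SystemOutput:
    """Pick the output that agrees most with the other systems' outputs."""
    def agreement(candidate: SystemOutput) -> float:
        others = [o for o in sentence.outputs if o is not candidate]
        return sum(overlap(candidate.translation, o.translation) for o in others)
    return max(sentence.outputs, key=agreement)


# Toy, made-up example (not taken from the shared task data).
example = BilingualSentence(
    source="el gato negro duerme",
    reference="the black cat sleeps",
    outputs=[
        SystemOutput("system_A", "the black cat sleeps"),
        SystemOutput("system_B", "the black cat is sleeping"),
        SystemOutput("system_C", "a black cat sleeps"),
    ],
)
print(select_by_consensus(example).system)
```

A system exploiting the meta-data would replace the agreement score
with features read from each output's metadata dictionary.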

Shared task results should be submitted via email attachment. Please
compress your results as a .zip or .gz archive and send them to
cfedermann at Use "ML4HMT-12 Shared Task Submission" as mail
subject. Shared task results are due by October 28th.
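
For convenience, the compression step could be scripted as in the
following minimal sketch, which uses Python's standard zipfile module.
The function name and any file names passed to it are hypothetical; the
organisers do not prescribe a particular tool or naming scheme.

```python
import zipfile
from pathlib import Path


def pack_results(files, archive_path):
    """Write the given result files into a single compressed .zip archive."""
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in map(Path, files):
            # Store each file under its base name, without directory prefix.
            zf.write(f, arcname=f.name)
    return archive_path
```

The resulting archive can then be attached to the submission email.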

System output will be judged via peer-based human evaluation as well as
automatic evaluation. During the evaluation phase, participants will be
requested to rank system outputs of other participants through a
web-based interface (Appraise, Federmann 2010). Automatic metrics
include BLEU (Papineni et al., 2002), TER (Snover et al., 2006) and
METEOR (Lavie, 2005).
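
To give participants a rough idea of what the first of these metrics
measures, the sketch below computes a simplified corpus-level BLEU:
clipped n-gram precision up to 4-grams, combined by geometric mean and
multiplied by a brevity penalty. It assumes one reference per sentence,
whitespace tokenisation and no smoothing, so its scores will not
necessarily match those of the tools used in the official evaluation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU with one reference per sentence."""
    matched = [0] * max_n   # clipped n-gram matches per order
    total = [0] * max_n     # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # no smoothing: an order with zero matches gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Penalise hypotheses that are shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Calling corpus_bleu with a list of system outputs and the matching list
of references returns a score between 0 and 1.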

Results from the automatic evaluation of submitted shared task results
will be made available to participants on November 1st so that they
can be referred to in system description papers. As the manual
evaluation will take longer, its results will be presented and published
at the workshop.

Workshop Participation
If you are interested in our workshop and intend to participate, we
would appreciate it if you could let us know beforehand so that we can
better plan the workshop; to do so, send an
email to cfedermann at

Important Dates 2012
15th August: Shared task Training data release (updated ML4HMT corpus)
23rd August: Shared task Test data release
22nd October: Workshop full paper submission deadline
28th October: Shared task Translation results submission deadline
31st October: Workshop paper accept/reject notification
1st November: Shared task Evaluation results release
4th November: Shared Task system description paper submission
11th November: Shared Task system description paper accept/reject notification
18th November: Workshop and Shared task Camera ready paper due
9th December: ML4HMT-12 Workshop

Organisers
- Prof. Josef van Genabith, Dublin City University (DCU) and Centre for
  Next Generation Localisation (CNGL)
- Prof. Toni Badia, Universitat Pompeu Fabra and Barcelona Media (BM)
- Christian Federmann, German Research Center for Artificial
  Intelligence (DFKI), contact person: cfedermann at
- Dr. Maite Melero, Barcelona Media (BM)
- Dr. Marta R. Costa-jussà, Barcelona Media (BM)
- Dr. Tsuyoshi Okita, Dublin City University (DCU)

Program committee
- Eleftherios Avramidis (German Research Center for Artificial Intelligence, Germany)
- Prof. Sivaji Bandyopadhyay (Jadavpur University, India)
- Dr. Rafael Banchs (Institute for Infocomm Research - I2R, Singapore)
- Prof. Loïc Barrault (LIUM - University of Le Mans, France)
- Prof. Antal van den Bosch (Centre for Language Studies, Radboud University Nijmegen, Netherlands)
- Dr. Grzegorz Chrupala (Saarland University, Saarbrücken, Germany)
- Prof. Jinhua Du (Xi'an University of Technology (XAUT), China)
- Dr. Andreas Eisele (Directorate-General for Translation (DGT), Luxembourg)
- Dr. Cristina España-Bonet (Technical University of Catalonia, TALP, Barcelona)
- Dr. Declan Groves (Center for Next Generation Localisation, Dublin City University, Ireland)
- Prof. Jan Hajic (Institute of Formal and Applied Linguistics, Charles University in Prague)
- Prof. Timo Honkela (Aalto University, Finland)
- Dr. Patrick Lambert (LIUM - University of Le Mans, France)
- Prof. Qun Liu (Institute of Computing Technology, Chinese Academy of Sciences, China)
- Dr. Maite Melero (Barcelona Media Innovation Center, Spain)
- Dr. Tsuyoshi Okita (Dublin City University, Ireland)
- Prof. Pavel Pecina (Institute of Formal and Applied Linguistics, Charles University in Prague)
- Dr. Marta R. Costa-jussà (Barcelona Media Innovation Center, Spain)
- Dr. Felipe Sanchez Martinez (Escuela Politecnica Superior, Universidad de Alicante, Spain)
- Dr. Nicolas Stroppa (Google, Zurich, Switzerland)
- Prof. Hans Uszkoreit (German Research Center for Artificial Intelligence, Germany)
- Dr. David Vilar (German Research Center for Artificial Intelligence, Germany)

The ML4HMT workshop is supported by the META-NET T4ME project,
funded by the DG INFSO of the European
Commission through the Seventh Framework Programme, grant agreement no.:
