[Corpora-List] ACL 2007: Second Workshop on Statistical Machine Translation

Wed Feb 14 14:19:52 UTC 2007

ACL 2007: SECOND WORKSHOP ON
STATISTICAL MACHINE TRANSLATION

Saturday, June 23, 2007
http://www.statmt.org/wmt07/

Translating documents from one language into another by computer
is one of the oldest goals in computational linguistics. Now,
armed with vast amounts of translated text and powerful computers,
we are witnessing significant progress toward achieving that goal.

Statistical methods allow the analysis of parallel corpora and the
automatic construction of machine translation systems. For some
language pairs such as Chinese-English or Arabic-English,
statistical machine translation (SMT) systems built at research labs
currently outperform commercial systems.

This workshop focuses on statistical and hybrid methods for
machine translation and features a shared translation task. The
evaluation of machine translation systems is a growing field and
this workshop will also focus on determining the best methodology
for evaluating translation quality both with automatic metrics and
through subjective human evaluation.

This workshop builds on the success of the 2005 ACL Workshop
on Parallel Text and the 2006 NAACL Workshop on Statistical
Machine Translation.

Topics of interest include, but are not limited to:
   * word-based, phrase-based, syntax-based SMT
   * using comparable corpora for SMT
   * using morphological and POS information for SMT
   * integration of rule-based MT and statistical MT
   * decoding
   * error analysis
   * evaluation techniques for MT

SHARED TASK

In addition to soliciting research papers on the topics listed above,
the workshop will also feature a shared translation task. The
workshop organizers will provide common test sets for translation
between four language pairs in both directions:

   * English-German and German-English
   * English-French and French-English
   * English-Spanish and Spanish-English
   * English-Czech and Czech-English

Participants may submit translations for any or all of the language
directions. In addition to the common test sets, the workshop
organizers will provide optional training resources, including a
newly expanded release of the Europarl corpora, and additional
out-of-domain corpora.

All participants who submit entries will have their translations
evaluated. In addition to automatic scoring, we will also evaluate
translation performance by human judgment. To facilitate the
human evaluation, we will require participants in the shared task
to manually judge some of the submitted translations.

A more detailed description of the shared task (including
information about the test and training corpora, a freely
available MT system, and a number of other resources) is
available from http://www.statmt.org/wmt07/shared-task.html .
We also provide a baseline machine translation system, whose
performance matches the best systems from last year's shared
task.

SUBMISSION INFORMATION

Submissions will consist of regular full papers of max. 8 pages,
formatted following the ACL 2007 guidelines. Authors of regular
full papers will be required to indicate a track for their submission.
In addition, teams participating in the shared task will be invited
to submit short papers (max. 4 pages) describing their systems.
Both submission and review processes will be handled electronically.

We encourage individuals who are submitting research papers to
evaluate their approaches using the training resources provided by
this workshop, so that their experiments can be repeated by others
using these publicly available corpora.

Given the overlap of the paper submission time frame with that of
EMNLP 2007, we accept papers that are also submitted to the
EMNLP conference, but would like to know as soon as possible
after the notification if an accepted paper will be withdrawn.

IMPORTANT DATES

Regular paper submissions                April 2
(shared task) Results submissions        March 30
(shared task) Short paper submissions    April 6
Notification                             April 23
Camera-ready papers                      May 9

ORGANIZERS

Philipp Koehn (University of Edinburgh)
Christof Monz (University of London)
Cameron Shaw Fordyce (Center for the Evaluation of Language and
Communication Technologies)
Chris Callison-Burch (University of Edinburgh)

PROGRAM COMMITTEE

Lars Ahrenberg (Linköping University)
Francisco Casacuberta (University of Valencia)
Colin Cherry (University of Alberta)
Stephen Clark (Oxford University)
Brooke Cowan (Massachusetts Institute of Technology)
Mona Diab (Columbia University)
Chris Dyer (University of Maryland)
Andreas Eisele (University Saarbrücken)
Marcello Federico (ITC-IRST)
George Foster (Canada National Research Council)
Alex Fraser (ISI/University of Southern California)
Ulrich Germann (University of Toronto)
Rebecca Hwa (University of Pittsburgh)
Kevin Knight (ISI/University of Southern California)
Philippe Langlais (University of Montreal)
Alon Lavie (Carnegie Mellon University)
Lori Levin (Carnegie Mellon University)
Daniel Marcu (ISI/University of Southern California)
Bob Moore (Microsoft Research)
Miles Osborne (University of Edinburgh)
Michel Simard (Canada National Research Council)
Eiichiro Sumita (ATR Spoken Language Translation Research Laboratories)
Jörg Tiedemann (University of Groningen)
Christoph Tillmann (IBM Research)
Dan Tufiş (Romanian Academy)
Taro Watanabe (NTT)
Dekai Wu (HKUST)
Richard Zens (RWTH Aachen)

CONTACT
For questions, comments, etc. please send email to pkoehn at inf.ed.ac.uk