[Corpora-List] First Call for Participation: WMT 2015 Machine Translation Shared Tasks

Barry Haddow bhaddow at inf.ed.ac.uk
Mon Dec 22 21:17:56 UTC 2014


EMNLP 2015 TENTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION
Shared Tasks on news translation, automatic post-editing, quality 
estimation and metrics.
September 2015, in conjunction with EMNLP 2015 in Lisbon, Portugal

http://www.statmt.org/wmt15/

As part of the EMNLP WMT15 workshop, as in previous years, we will be 
organising a collection of shared tasks related to machine translation.  
We hope that both beginners and established research groups will 
participate. This year we are pleased to present the following tasks:

- Translation task
- Automatic Post-editing task (pilot)
- Quality estimation task
- Metrics task (including tunable metrics)

Further information, including task rationale, timetables and data can 
be found on the WMT15 website. Brief descriptions of each task are given 
below. Prospective participants are encouraged to register with the 
mailing list for further announcements 
(https://groups.google.com/forum/#!forum/wmt-tasks).

For all tasks, participants will also be invited to submit a short 
paper describing their system.

---------------------
Translation Task
---------------------
This will compare translation quality on five European language pairs 
(English-Czech, English-Finnish, English-French, English-German and 
English-Russian).
*New* for this year:
- Finnish appears as a "guest" language
- The English-French text will be drawn from informal news discussions. 
All other test sets will be from professionally written news articles.

We will provide extensive monolingual and parallel data sets for 
training, as well as development sets, all available for download from 
the task website. Translations will be evaluated using both automatic 
metrics and human evaluation. Participants will be expected to 
contribute to the human evaluation of the translations.

For this year's task we will be releasing the following new or updated 
corpora:
- An updated version of news-commentary
- A monolingual news crawl for 2014 in all the task languages
- Development sets for English-French and English-Finnish
Not all data sets are available on the website yet, but they will be 
uploaded as soon as they are ready.

The translation task test week will be April 20-27.

This task is supported by the EU projects MosesCore 
(http://www.mosescore.eu), QT21 and Cracker; the Russian test sets 
are provided by Yandex.

-----------------------------------------------------
Pilot task on Automatic Post-Editing
-----------------------------------------------------
This shared task will examine automatic methods for correcting 
errors produced by machine translation (MT) systems. Automatic 
Post-editing (APE) aims at improving MT output in black box scenarios, 
in which the MT system is used "as is" and cannot be modified.
From the application point of view, APE components would make it 
possible to:

* Cope with systematic errors of an MT system whose decoding process is 
not accessible
* Provide professional translators with improved MT output quality to 
reduce (human) post-editing effort

In this first edition of the task, the evaluation will focus on one 
language pair (English-Spanish), measuring systems' capability to reduce 
the distance (HTER) that separates an automatic translation from its 
human-revised version approved for publication. Training and test data 
are provided by Unbabel.
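
As a rough illustration of the evaluation measure (not the official 
scorer): HTER is a TER score computed against the human post-edited 
version of the MT output. True TER also counts block shifts of word 
sequences; the Python sketch below approximates it with plain 
word-level edit distance, and all names in it are illustrative.

# HTER-style score (approximation): word-level edit distance between
# the MT output and its human post-edited version, normalised by the
# length of the post-edited text. True HTER uses TER, which also
# allows block shifts; this sketch omits them.
def edit_distance(hyp, ref):
    # Word-level Levenshtein distance between two token lists.
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def hter_approx(mt_output, post_edited):
    # Edits needed to turn the MT output into its post-edited
    # version, normalised by post-edit length (lower is better).
    hyp, ref = mt_output.split(), post_edited.split()
    return edit_distance(hyp, ref) / max(len(ref), 1)

# e.g. hter_approx("the house green is big", "the green house is big")
# -> 0.4 (two word-level edits over five reference words)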

Important dates
Release of training data: January 31, 2015
Test set distributed: April 27, 2015
Submission deadline: May 15, 2015

------------------------
Quality Estimation
------------------------

This shared task will examine automatic *methods for estimating the 
quality of machine translation output at run-time*, without relying on 
reference translations. In this fourth edition of the shared task, in 
addition to *word-level* and *sentence-level* estimation, we will 
introduce *document-level* estimation. Our main *goals* are the following:

  * To investigate the effectiveness of quality labels and features for
    document-level prediction.
  * To explore differences between sentence-level and document-level
    prediction.
  * To analyse the effect of training data sizes and quality for
    sentence- and word-level prediction, particularly for negative (i.e.
    low translation quality) examples.

The WMT12-14 quality estimation shared tasks provided a set of baseline 
features, datasets, evaluation metrics, and oracle results. Building on 
the last three years' experience and focusing on English, Spanish and 
German, this year's shared task will reuse some of these resources, 
but provide additional training and test sets.
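
For those new to the task: sentence-level quality estimation is 
commonly framed as supervised regression from features of a 
source/translation pair to a quality label such as HTER or 
post-editing effort. Below is a minimal Python sketch along those 
lines, assuming scikit-learn and using made-up feature values; the 
official baseline features, data formats and labels are defined on 
the task website.

import numpy as np
from sklearn.svm import SVR

# Illustrative feature vectors, one per translated sentence, e.g.
# [source length, target length, length ratio, target LM score].
# These are placeholders, not the official baseline features.
X_train = np.array([[12, 14, 1.17, -35.2],
                    [ 7,  6, 0.86, -18.9],
                    [20, 25, 1.25, -61.4]])
y_train = np.array([0.15, 0.05, 0.42])  # e.g. HTER quality labels

model = SVR(kernel="rbf")   # regression from features to quality
model.fit(X_train, y_train)

# Predict a quality score for a new, unseen sentence pair.
print(model.predict(np.array([[10, 11, 1.10, -27.0]])))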

----------------
Metrics Task
----------------

The shared metrics task will examine automatic evaluation metrics for 
machine translation. We will provide you with all of the translations 
produced in the translation task, along with the reference human 
translations. You will return your automatic metric scores for each of 
the translations at the system level and/or the sentence level. We will 
calculate the system-level and sentence-level correlations of your 
scores with the WMT15 human judgements once the manual evaluation has 
been completed.
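
To make the system-level evaluation concrete: each metric produces one 
score per submitted system, and these are correlated against the 
corresponding human scores. Below is a minimal Python sketch assuming 
SciPy and Pearson's r; the exact correlation measures used by WMT15 
are specified on the task website, and all numbers here are made up.

from scipy.stats import pearsonr

# One score per submitted system, in the same order.
metric_scores = [27.3, 25.1, 30.2, 22.8, 28.9]  # a metric's scores
human_scores  = [0.41, 0.35, 0.52, 0.28, 0.47]  # human judgements

# Correlate the metric's system-level scores with the human scores.
r, p = pearsonr(metric_scores, human_scores)
print("system-level Pearson r = %.3f (p = %.3f)" % (r, p))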

In addition to this evaluation task, we will run a tunable metrics 
task, similar to the one we ran in 2010. The idea of this task is to 
evaluate which metrics give the best performance (according to human 
evaluation) when used to tune an SMT system. We will provide the 
system; you will then tune it using your metric and send us the 
resulting tuned weights.

Full details of the metrics tasks will be made available on the task 
website.


The important dates for metrics task participants are:

May 4, 2015 - System outputs distributed for metrics task
May 25, 2015 - Submission deadline for metrics task

-----

Barry Haddow
(on behalf of the organisers)
