Conf: Automatic Procedures in MT Evaluation - ELRA Workshop at MT Summit

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Wed Aug 29 20:21:34 UTC 2007

Date: Tue, 28 Aug 2007 14:12:51 +0200
From: ELDA <info at>
Message-ID: <46D41143.6080602 at>

  As you may know, ELRA is active in the field of evaluation. In
this context, ELRA announces a workshop:
ELRA Workshop at MT Summit XI, 2007
This workshop, during MT Summit XI, Copenhagen 2007 (Sept. 11),
focusses on the discussion of automatic evaluation procedures in MT:
BLEU / NIST, d-score, x-score, edit distance, and other such tools.
The questions to be discussed are:
· What do the scores really measure? Are they biased towards specific
  MT technologies? (validity)

· What kind of initial effort do they require (e.g.: pre-translating the
  test corpus)? (economy)

· What kind of implicit assumptions do they make?

· What kind of resources do they need (e.g.: third party grammars)?
  (economy, feasibility)

· What kind of diagnostic support can they give? (where to improve the

· What kind of evaluation criteria (related to the FEMTI framework) do
  they support (adequacy, fluency, ...)?
The objective of the workshop is to learn from recent evaluation
activities, to create a better understanding of the strengths and
limitations of the respective approaches, and to get closer to a
common methodology for MT output evaluation.

Draft programme
9.00 Welcome and introduction
9.20 The place of automatic evaluation metrics in external quality 
models for machine translation
Andrei Popescu-Belis, University of Geneva
10.00 Evaluating Evaluation --- Lessons from the WMT'07 Shared Task
Philipp Koehn, University of Edinburgh
10.30 Coffee break
11.00 Investigating Why BLEU Penalizes Non-Statistical Systems
Eduard Hovy, University of Southern California
11.30 Edit distance as an evaluation metric
Christopher Cieri, Linguistic Data Consortium (TBC)
12.00 Experience and conclusions from the CESTA evaluation project
Olivier Hamon, ELDA
12.30 Lunch
13.30 Automatic Evaluation in MT system production
Gregor Thurmair, Linguatec
14.00 Sensitivity of performance-based and proximity-based models for
MT evaluation
Bogdan Babych, Univ. Leeds
14.30 Automatic & human Evaluations of MT in the framework of a speech
to speech communication
Khalid Choukri, ELDA
15.00 Coffee break
15.30 Discussion and conclusions
17.00 Close
More information can be found on the MT Summit website:
Kindest regards,
ELRA evaluation committee
(B. Maegaard, Kh. Choukri, Gr. Thurmair)

Message distributed by the Langage Naturel list <LN at>
Information, subscription:
English version       :
Archives                 :

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:
