Info: Multilingual summary evaluation data available

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Fri Oct 1 20:03:26 UTC 2010

Date: Thu, 30 Sep 2010 22:37:45 +0200
From: Ralf Steinberger <ralf.steinberger at>
Message-id: <4CA4F519.8040904 at>

Dear Colleagues,

we are happy to inform you that the first multilingual summary
evaluation dataset, covering seven languages, is now available for
research and evaluation purposes. You can download the set from .

This dataset consists of a manually annotated collection of document
clusters of parallel texts in seven languages (Arabic, Czech, English,
French, German, Russian and Spanish). It can be used to evaluate
multi-document, or even single-document, summarisation software. The
data is particularly useful for comparing the performance of software
across languages.

Each of the four document clusters consists of five high-level
commentaries, selected from, on topics that can roughly be described
as malaria, the Israel-Palestine conflict, genetics, and science and
society.

The resource and its use are described in:

     Marco Turchi, Josef Steinberger, Mijail Kabadjov and Ralf
     Steinberger (2010). Using Parallel Corpora for Multilingual
     (Multi-document) Summarisation Evaluation. Springer Lecture Notes
     in Computer Science (LNCS), Volume 6360/2010,

We look forward to receiving any comments you may have.

Best regards,

Marco Turchi, Josef Steinberger, Mijail Kabadjov and Ralf Steinberger

* European Commission - Joint Research Centre (JRC)*
URL - Applications:
URL - The science behind them:
T.P. 267, Via Fermi 2749
21027 Ispra (VA), Italy

Message distributed via the Langage Naturel mailing list <LN at>
Information, subscription:
English version       : 
Archives                 :

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership  :
