[Corpora-List] Multilingual summary evaluation data available
Ralf Steinberger
ralf.steinberger at jrc.ec.europa.eu
Thu Sep 30 20:37:45 UTC 2010
Dear Colleagues,
we are happy to inform you that the first multilingual summary
evaluation data in seven languages is available for research and
evaluation purposes. You can download the set from
http://langtech.jrc.ec.europa.eu/JRC_Resources.html .
This dataset consists of a manually annotated collection of document
clusters of parallel texts in seven languages (Arabic, Czech, English,
French, German, Russian and Spanish) that can be used to evaluate
multi-document, or even single-document, summarisation software. The
data is particularly useful for comparing the performance of software
across languages.
The four document clusters each consist of five high-level commentaries
selected from http://www.project-syndicate.org/, on topics that can
roughly be described as malaria, the Israel-Palestine conflict,
genetics, and science and society.
The resource and its use are described in:
Marco Turchi, Josef Steinberger, Mijail Kabadjov and Ralf Steinberger (2010).
Using Parallel Corpora for Multilingual (Multi-document) Summarisation Evaluation.
Springer Lecture Notes in Computer Science (LNCS), Volume 6360/2010, 52-63.
We look forward to receiving any comments you may have,
Best Regards
Marco Turchi, Josef Steinberger, Mijail Kabadjov and Ralf Steinberger
European Commission - Joint Research Centre (JRC)
URL - Applications: http://emm.jrc.it/overview.html
URL - The science behind them: http://langtech.jrc.ec.europa.eu/
T.P. 267, Via Fermi 2749
21027 Ispra (VA), Italy
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora