[Corpora-List] Where can I find a copy of MSE corpus?
marco turchi
marco.turchi at gmail.com
Thu Mar 17 09:09:20 UTC 2011
Dear Aqil,
A few months ago, we released a multilingual summary evaluation corpus in
seven languages for research and evaluation purposes. You can download the
set from http://langtech.jrc.ec.europa.eu/JRC_Resources.html .
This dataset consists of a manually annotated collection of document
clusters of parallel texts in seven languages (Arabic, Czech, English,
French, German, Russian and Spanish) that can be used to evaluate
multi-document, or even single-document, summarisation software. The data is
particularly useful for comparing the performance of software across languages.
The four document clusters each consist of five high-level commentaries
selected from http://www.project-syndicate.org/, on topics that can roughly
be described as malaria, the Israeli-Palestinian conflict, genetics, and
science and society.
The resource and its use are described in:
Marco Turchi, Josef Steinberger, Mijail Kabadjov and Ralf Steinberger (2010).
Using Parallel Corpora for Multilingual (Multi-document) Summarisation
Evaluation.
Springer Lecture Notes in Computer Science (LNCS), Volume 6360/2010, 52-63.
Thanks a lot
Marco Turchi
On Thu, Mar 17, 2011 at 7:27 AM, Aqil Azmi <aazmi at yahoo.com> wrote:
> Hello everybody,
>
> Any idea where I can find a copy of the MSE (Multilingual Summarization
> Evaluation) corpus? Thank you very much.
>
> --Aqil
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>