MultiLing 2013 - Multilingual Multi-document Summarization
Call for Systems/Participation

August 9th, 2013
ACL 2013, Sofia, Bulgaria

MultiLing 2013 is a workshop, held within ACL 2013, which covers three
sub-domains of Natural Language Processing, focused on the multilingual
aspect of summarization: multi-document summarization, summarization
evaluation and data collection.

The MultiLing 2013 workshop builds upon the Text Analysis Conference
(TAC) MultiLing Pilot task of 2011, where systems were asked to generate
fluent, representative summaries (around 250 words) for each of a set of
predefined topics per language.

This year each topic is described by 10 source documents. The set of
documents is in one of the following languages: Arabic, Chinese, Czech,
English, French, Greek, Hebrew, Hindi, Romanian and Spanish.

Based on the challenges revealed in MultiLing 2011, this year we also
address the problems of multilingual summary evaluation and data
collection. We also conduct a pilot task for single document
summarization.

This call asks for systems to participate and compete in the MultiLing
2013 tracks.

MultiLing 2013 aims to encourage research on multilingual summarization
by providing a corpus in various languages, common evaluation procedures
and a forum for participants to present their results.

This call invites you to participate in the MultiLing 2013 Workshop.
The MultiLing 2013 organizers will provide suitable corpora for each
track. MultiLing 2013 has two main tracks and a pilot:

   1) Multilingual multi-document summarization
   2) Multilingual summary evaluation
   3) (Pilot) Multilingual single document summarization

Tracks and Tasks Detailed
Track 1: Multilingual multi-document summarization
The multilingual multi-document summarization track aims to evaluate the
application of (partially or fully) language-independent summarization
algorithms on a variety of languages. Each system participating in the
track will be asked to provide summaries for a range of different
languages, based on a news corpus. Participating systems will be
required to apply their methods to a minimum of two languages.
Evaluation will favor systems that apply their methods to more
languages.

The corpus used in the Multilingual multi-document summarization track
will be based on WikiNews texts. Source texts will be clean UTF-8 texts
(without any mark-up, images, etc.).

The task requires systems to generate a single, fluent, representative
summary from a set of documents describing an event sequence. The
language of the document set will be within a given range of languages
and all documents in a set share the same language. The output summary
should be of the same language as its source documents. The output
summary should be 250 words at most.
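
The 250-word limit above can be enforced with a final post-processing
step. The following is a minimal sketch, not an official tool of the
workshop: the function name truncate_to_word_limit is hypothetical, and
counting words by splitting on whitespace is an assumption. This call
does not state how words are counted, and whitespace splitting is not
appropriate for languages such as Chinese, where a proper word segmenter
would be needed.

```python
def truncate_to_word_limit(summary, max_words=250):
    """Return the summary cut down to at most max_words words.

    ASSUMPTION: a "word" is a whitespace-delimited token. The call does
    not define word counting, and this does not hold for Chinese.
    """
    words = summary.split()
    if len(words) <= max_words:
        # Already within the limit: return the text unchanged.
        return summary
    # Keep only the first max_words tokens, joined by single spaces.
    return " ".join(words[:max_words])


if __name__ == "__main__":
    draft = " ".join(["word"] * 300)  # hypothetical over-long summary
    final = truncate_to_word_limit(draft)
    print(len(final.split()))
```

Cutting at a word boundary can leave an unfinished sentence, so a system
would more likely stop adding sentences once the limit is reached.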

Track 2: Multilingual summary evaluation
This track aims to examine how well automated systems can evaluate
summaries from different languages. The task offers as input the
summaries generated from automatic systems and humans in the
Multilingual multi-document summarization task. The output of the
evaluating systems should be a grading of the summaries. Ideally, the
automatic evaluation should correlate maximally with human judgment. Human
judgments will be provided by the organizers, thanks to the co-operating

The corpus of the Multilingual summary evaluation will consist of gold
(human) summaries and the automatic system summaries output from the
Multilingual multi-document summarization task.
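
As an illustration of what "correlating with human judgment" can mean in
practice, the sketch below compares the grades of an automatic
evaluation system against human grades for the same summaries. It is
only an assumption about how such a comparison might be done: the call
does not name the correlation measure the organizers will use, and the
grades shown are made-up values, not MultiLing data. Pearson correlation
compares the raw scores, while Spearman correlation compares only the
rankings they induce.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # HYPOTHETICAL grades for five summaries (not real MultiLing data).
    human_grades = [4.0, 2.0, 5.0, 3.0, 1.0]
    metric_scores = [0.41, 0.22, 0.48, 0.35, 0.30]
    print("Pearson: ", round(pearson(human_grades, metric_scores), 3))
    print("Spearman:", round(spearman(human_grades, metric_scores), 3))
```

A value close to 1 means the automatic grading orders or scores the
summaries much as the human judges do.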

Pilot: Multilingual single document summarization
This pilot aims to measure the ability of automated systems to apply
single document summarization, in the context of Wikipedia texts.  Given
a single encyclopedic entry, possibly with several sections/subsections,
describing a specific subject, the systems will be requested to provide
a summary covering the main points of the entry (similarly to the lead
section of a Wikipedia page). The corpus will consist of (non-parallel)
documents in over 40 languages. Participating systems will be required
to apply their methods to a minimum of two languages.  Evaluation will
favor systems that apply their methods to more languages.

The pilot corpus will be based on selected texts from Wikipedia. Details
will follow on the MultiLing 2013 website.

Please check the roadmap page on the MultiLing website:

How to apply as a Participant:
Enter your information by *April 20th, 2013* at the following web form:

If you have problems or questions, please contact George Giannakopoulos
(ggianna @ directly.

Please feel free to forward this call.

