[Corpora-List] GREC Shared Tasks 2009 (NLG/Summarisation)

Anja Belz a.s.belz at itri.brighton.ac.uk
Fri Sep 5 14:02:22 UTC 2008


FIRST CALL FOR PARTICIPATION

GENERATION OF REFERENCES IN CONTEXT (GREC) TASKS 2009
-----------------------------------------------------

Part of Generation Challenges 2009, in conjunction with ENLG 2009.

Generation Challenges 2009 is being organised to provide a common
forum for a number of different NLG Shared Tasks (see
http://www.nltg.brighton.ac.uk/research/genchal09/).

As part of Generation Challenges 2009, we are organising two GREC
Shared Task Competitions.  The first is the GREC-MSR (Main Subject
References) Task, which uses the GREC-2.0 Corpus of 2,000 Wikipedia
introduction sections in which references to the main subject of the
Wikipedia article have been annotated.  The task is to develop a
system that can select, from a given list, an MSR that is appropriate
in the context.  The second is the GREC-NEG (Named Entity Generation)
Task, which uses the new GREC-People Corpus of 1,000 Wikipedia
introduction sections about people, in which singular and plural
references to all people mentioned in the text have been
annotated.  The task in GREC-NEG is to select appropriate referential
expressions for all mentions (singular and plural) of people.

Submissions to both tasks will be evaluated using a range of intrinsic
and extrinsic measures, some assessed automatically, some manually.
Submitted systems and evaluation results will be presented in a
special session at ENLG'09, and published in the ENLG'09 proceedings.


1. Background
--------------

There has been increasing interest recently among text summarisation
researchers in postprocessing techniques to improve the referential
clarity and coherence of extractive summaries, and among language
generation researchers in generating referential expressions in
context.  The GREC tasks are aimed at researchers in both of these
groups.  Their objective is the development of methods for generating
chains of referential expressions for discourse entities in the
context of a written discourse, as is needed, for example, when
postprocessing extractive summaries and repeatedly edited texts (such
as Wikipedia articles).


2. Data
--------

The GREC data resources consist of introduction sections collected
from Wikipedia articles in which three broad syntactic categories of
overt reference to named entities have been annotated: subject NPs,
object NPs and genitive subject-determiners (such as "Faraday's" in
"Faraday's law of induction"). The annotations include features
encoding basic syntactic and semantic information.

The GREC-2.0 corpus consists of 2,000 texts in five different domains
(cities, countries, rivers, people and mountains).  In this corpus,
only references to the single entity that is the main subject of a
Wikipedia article (e.g. "Michael Faraday") have been annotated.

The new GREC-People corpus consists of 1,000 texts in just one domain,
people. Here, all references to all people mentioned in a text have
been annotated.  GREC-People therefore includes explicit coreference
annotation for one or more coreference chains (whereas in GREC-2.0
texts there is always just one annotated coreference chain).

For GREC-2.0 and GREC-People, we have test sets of 200 and 100 texts,
respectively, in which the referential expressions were selected by
participants in an elicitation experiment.  Each text in these test
sets comes in three versions, and in each version the referential
expressions were manually selected by a single participant in the
experiment.


3. The GREC'09 Tasks
--------------------

The GREC-MSR Task has the same task definition as the GREC shared task
at REG'08.  Participating systems need to select, from a given set of
alternatives, the referential expression (RE) that is most appropriate
in the given context; this may involve, for example, ensuring that
pronouns can be resolved.  Systems will be evaluated both against the
REs in the corpus and against human-selected topline solutions for
this task.  Results and descriptions of participating systems from the
REG'08 run of this task can be found here:
http://www.aclweb.org/anthology-new/W/W08/#1100

The new GREC-NEG Task is an extension of GREC-MSR in that it requires
participating systems to select appropriate referential expressions
for all discourse entities of the same type (people in this round) as
the main subject of the article.
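To make the selection step concrete, here is a minimal Python sketch of a naive baseline for choosing one RE per reference slot from a list of candidates.  The classes `Candidate` and `Slot`, the `re_type` labels and the first-mention heuristic are all hypothetical illustrations; they do not reproduce the GREC corpus format or any participating system.

```python
from dataclasses import dataclass
from typing import List, Set


# Hypothetical in-memory representation of one reference slot; the actual
# GREC corpus files use their own annotation format, not reproduced here.
@dataclass
class Candidate:
    text: str
    re_type: str  # assumed labels, e.g. "name", "pronoun", "common", "empty"


@dataclass
class Slot:
    entity_id: str
    candidates: List[Candidate]


def choose_re(slot: Slot, seen: Set[str]) -> Candidate:
    """Pick a candidate RE for one slot with a simple first-mention heuristic."""
    names = [c for c in slot.candidates if c.re_type == "name"]
    pronouns = [c for c in slot.candidates if c.re_type == "pronoun"]
    if slot.entity_id not in seen:
        # First mention of this entity: use the longest (most informative) name.
        seen.add(slot.entity_id)
        if names:
            return max(names, key=lambda c: len(c.text))
    else:
        # Later mentions: prefer a pronoun, otherwise the shortest name.
        if pronouns:
            return pronouns[0]
        if names:
            return min(names, key=lambda c: len(c.text))
    # Fall back to whatever candidate is listed first.
    return slot.candidates[0]


def generate_chain(slots: List[Slot]) -> List[str]:
    """Select an RE for every slot, left to right through the text."""
    seen: Set[str] = set()
    return [choose_re(slot, seen).text for slot in slots]


slots = [
    Slot("faraday", [Candidate("Michael Faraday", "name"),
                     Candidate("Faraday", "name"),
                     Candidate("he", "pronoun")]),
    Slot("faraday", [Candidate("Michael Faraday", "name"),
                     Candidate("Faraday", "name"),
                     Candidate("he", "pronoun")]),
    Slot("faraday", [Candidate("Michael Faraday's", "name"),
                     Candidate("Faraday's", "name")]),
]
print(generate_chain(slots))  # ['Michael Faraday', 'he', "Faraday's"]
```

A heuristic like this ignores whether a pronoun can actually be resolved, which matters most in GREC-NEG, where several people may be mentioned in the same text.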


4. Evaluation
-------------

For both tasks, the data will be randomly divided into training,
development and test data.  Participants will compute evaluation
scores on the development set (using code provided by the organisers),
and the organisers will perform evaluations on the test data set.
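The proportions of the random split are not stated here.  As an illustration only, the following Python sketch shows a seeded, reproducible random partition of text IDs into training, development and test portions.  The 80/10/10 proportions, the function name `split_corpus` and the text IDs are assumptions chosen for the example.

```python
import random
from typing import Iterable, List, Tuple


def split_corpus(text_ids: Iterable[str], dev_frac: float = 0.1,
                 test_frac: float = 0.1, seed: int = 0
                 ) -> Tuple[List[str], List[str], List[str]]:
    """Randomly partition text IDs into disjoint train/dev/test lists."""
    ids = list(text_ids)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_frac)
    n_dev = round(len(ids) * dev_frac)
    test = ids[:n_test]
    dev = ids[n_test:n_test + n_dev]
    train = ids[n_test + n_dev:]
    return train, dev, test


# Hypothetical text IDs for a 1,000-text corpus such as GREC-People.
ids = [f"text{i:04d}" for i in range(1000)]
train, dev, test = split_corpus(ids)
print(len(train), len(dev), len(test))  # 800 100 100
```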

We will use a range of evaluation methods, both intrinsic and
extrinsic, some assessed automatically and some by human evaluators.
The intrinsic methods will include string-accuracy, feature-accuracy
and string-similarity measures, as well as human quality
assessments.  The extrinsic methods will include a
reading/comprehension experiment and a measure of how successfully
coreference resolvers process the outputs (for details about the
previous edition, see
http://www.aclweb.org/anthology/W/W08/W08-1127.pdf).

Full details of the evaluation methods for GREC'09 will be given in
the Participants' Pack that will be distributed to registered
participants.
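The official evaluation code comes with the Participants' Pack.  The Python sketch below is not that code; it only illustrates the general idea behind two of the intrinsic measures: string accuracy (the proportion of REs that exactly match a reference) and a string-similarity measure based on Levenshtein edit distance.  Normalising by the length of the longer string is an assumption of this sketch, and the example REs are made up.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def string_accuracy(system: List[str], reference: List[str]) -> float:
    """Proportion of system REs identical to the reference REs."""
    if len(system) != len(reference):
        raise ValueError("system and reference must align one-to-one")
    return sum(s == r for s, r in zip(system, reference)) / len(reference)


def mean_norm_edit_distance(system: List[str], reference: List[str]) -> float:
    """Mean edit distance, each normalised by the longer string's length."""
    if len(system) != len(reference):
        raise ValueError("system and reference must align one-to-one")
    total = 0.0
    for s, r in zip(system, reference):
        total += levenshtein(s, r) / (max(len(s), len(r)) or 1)
    return total / len(reference)


system = ["Michael Faraday", "he", "Faraday's"]
reference = ["Michael Faraday", "Faraday", "his"]
print(string_accuracy(system, reference))  # 0.3333333333333333
print(mean_norm_edit_distance(system, reference))
```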


5. Participation
----------------

Registration is now open at the GREC'09 homepage
(http://www.nltg.brighton.ac.uk/research/genchal09/grec).  Once
registered, participants in the GREC-MSR Task will receive the
complete training and development sets, evaluation software and
detailed documentation (collectively known as the Participants' Pack)
for this task.  Participants in GREC-NEG will first receive a sample
of the training and development data, to enable them to start building
systems; they will receive the complete Participants' Pack for
GREC-NEG by the end of September 2008.


6. Proceedings and Presentations
--------------------------------

The Generation Challenges 2009 meeting will be held as a special
session at ENLG 2009. The session will include overviews of all the
shared tasks, including the GREC'09 Tasks. The participating systems
will additionally be presented as papers in the ENLG'09 proceedings,
and as posters during the ENLG'09 poster session.

GREC'09 papers will not undergo a selection procedure with multiple
reviews, but the organisers reserve the right to reject material that
is not appropriate given the participation guidelines.


7. Important Dates
------------------

Sep 05, 2008      First Call for Participation in GREC'09 Tasks; GREC'09
                  sample data sets available
Jan 01-31, 2009   GREC'09 test data submission:
                  1. submit system report;
                  2. download test data;
                  3. submit outputs within 48 hours.
Jan 31, 2009      Final deadline for submission of GREC'09 test data outputs
Feb 01-28, 2009   GREC'09 evaluation period
Mar 30-31, 2009   Generation Challenges 2009 meeting at ENLG'09 (date to be
                  confirmed)


8. Organisation
---------------

Albert Gatt, Computing Science, University of Aberdeen, UK
Anja Belz, NLTG, University of Brighton, UK
Eric Kow, NLTG, University of Brighton, UK
Jette Viethen, Macquarie University, Australia

GREC'09 homepage: http://www.nltg.brighton.ac.uk/research/genchal09/grec
Generation Challenges homepage: http://www.nltg.brighton.ac.uk/research/genchal09
Generation Challenges email: nlg-stec at itri.brighton.ac.uk


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora