[Corpora-List] Surface Realisation Shared Task: Call for Pre-Registration and Sample Data Release

Fri Mar 11 14:41:16 UTC 2011

Generation Challenges 2011 Surface Realisation Shared Task
==========================================================

Call for Pre-Registration and Sample Data Release
-------------------------------------------------

We invite teams of researchers to pre-register now for the GenChal'11
Surface Realisation Shared Task (SR 2011) by filling in the
registration form on the SR Task website
(http://www.nltg.brighton.ac.uk/research/sr-task).

Once registered, teams will be given access to sample data for the SR
Task, so that they can familiarise themselves with the two
common-ground input representation formats we have developed and send
us comments and feedback by March 25, 2011.

The complete SR Task training and development data will be distributed
on March 31, and the deadline for submitting system outputs will be in
early August (exact date to be confirmed).

Below we provide a brief overview of the SR Task. For more information,
please visit the SR Task website:
http://www.nltg.brighton.ac.uk/research/sr-task

SR Task:

The task for participating teams is to develop systems that map (one
of) the common-ground input representations to surface word strings
(fully realised sentences), and to submit system outputs for the
inputs in the test data set.

Data:

The SR Task data is derived from the CoNLL-08 corpus, which itself
merges data from several other corpora (the WSJ Treebank, the BBN
Corpus, Propbank and Nombank).  We have processed and adapted this
data to make it suitable for generation tasks.

Evaluation:

Submitted system outputs will be evaluated using a variety of automatic
metrics and by human assessment against quality criteria.

Common-ground Input Representations:

1. Shallow: Each word and punctuation mark is represented as a node
in a syntactic dependency tree. Information at each node consists of
the word's lemma, a coarse-grained POS tag and, where appropriate,
number and tense features and sense tag IDs. Edges between nodes are
labelled with syntactic labels.

2. Deep: Graphs whose edges are labelled with semantic relations where
these are available, and with shallow (syntactic) relations otherwise.
Information at each node consists of the word's lemma and, where
appropriate, number and tense features and sense tag IDs.  No POS tags
are given in the deep representation.  Commas have been removed from
the deep representation, as have some function words.
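As a rough illustration only, the node information listed above could
be held in structures like the following. The class and field names
(ShallowNode, DeepNode, Edge, sense_id, etc.), the tags and the edge
labels in the example are hypothetical and do not reflect the actual
SR Task file format; the one difference the sketch encodes from the
description is that deep nodes carry no POS tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShallowNode:
    """One word or punctuation mark in the shallow dependency tree.
    Field names are hypothetical, chosen for illustration."""
    node_id: int
    lemma: str
    pos: str                        # coarse-grained POS tag
    number: Optional[str] = None    # number feature, where appropriate
    tense: Optional[str] = None     # tense feature, where appropriate
    sense_id: Optional[str] = None  # sense tag ID, where available


@dataclass
class DeepNode:
    """A node in the deep graph: same information as ShallowNode,
    except that no POS tag is given."""
    node_id: int
    lemma: str
    number: Optional[str] = None
    tense: Optional[str] = None
    sense_id: Optional[str] = None


@dataclass
class Edge:
    """A labelled relation from a head node to a dependent node."""
    head: int
    dependent: int
    label: str  # syntactic label (shallow) or semantic/shallow label (deep)


# Hypothetical shallow input for "The dogs barked ."; tags and labels
# are illustrative only.
nodes = [
    ShallowNode(1, "the", "DT"),
    ShallowNode(2, "dog", "NN", number="pl"),
    ShallowNode(3, "bark", "VB", tense="past"),
    ShallowNode(4, ".", "."),
]
edges = [Edge(2, 1, "NMOD"), Edge(3, 2, "SBJ"), Edge(3, 4, "P")]
```

A realiser would take such a tree (or graph) and produce the surface
string, here "The dogs barked .", inflecting lemmas from the number
and tense features.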

In both the shallow and the deep representations, relations are
arbitrarily ordered.  Each sentence has a single root.
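The two properties above can be made concrete with a small sketch,
assuming (hypothetically) that relations are given as (head,
dependent, label) triples rather than in the real SR Task format. It
finds the single root as the one node that is never a dependent, and
groups each head's dependents into a set, because input order carries
no information and choosing word order is left to the realiser.

```python
from collections import defaultdict


def find_root(node_ids, edges):
    """Return the single root: the one node that is never a dependent.

    `edges` is an iterable of (head, dependent, label) triples (an
    assumed layout for illustration). Raises ValueError if there is
    not exactly one root.
    """
    dependents = {dep for _, dep, _ in edges}
    roots = [n for n in node_ids if n not in dependents]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    return roots[0]


def children_by_head(edges):
    """Group relations by head. Dependents are stored as a set of
    (label, dependent) pairs, since the order of relations in the
    input is arbitrary and must not be relied on."""
    children = defaultdict(set)
    for head, dep, label in edges:
        children[head].add((label, dep))
    return children


# Hypothetical relations for "The dogs barked .", deliberately shuffled.
edges = [(3, 4, "P"), (2, 1, "NMOD"), (3, 2, "SBJ")]
root = find_root([1, 2, 3, 4], edges)
kids = children_by_head(edges)
```

Here root is node 3 ("bark"), and reordering the input relations
leaves the grouped result unchanged.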

Organising Team:

Anja Belz, NLTG, University of Brighton, UK
Josef van Genabith, CNGL, Dublin City University, Ireland
Deirdre Hogan, CNGL, Dublin City University, Ireland
Amanda Stent, AT&T Labs Research Inc., US
Mike White, Department of Linguistics, The Ohio State University, US

Additional members of Common-ground Input Representation Working Group:

Bernd Bohnet, IMS, University of Stuttgart, Germany
Johan Bos, Groningen University, Netherlands
Aoife Cahill, IMS, University of Stuttgart, Germany
Charles Callaway, University of Haifa, Israel
Pablo Gervas, Universidad Complutense de Madrid, Spain
Stephan Oepen, University of Oslo, Norway
Leo Wanner, Information and Communication Technologies, UPF, Barcelona,
Spain

SR Task contact email: nlg-stec at itri.brighton.ac.uk
SR Task website: http://www.nltg.brighton.ac.uk/research/sr-task

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora