[Corpora-List] New Shared Task in Surface Realisation from a Common-ground Input Representation

Anja Belz a.s.belz at itri.brighton.ac.uk
Mon Oct 18 11:13:47 UTC 2010


Call for Expressions of Interest

SHARED TASK IN SURFACE REALISATION FROM A COMMON-GROUND INPUT
REPRESENTATION

We seek input and participation in a proposed shared task on surface
realisation using resources developed over the Penn TreeBank,
including a track for automatic evaluation metrics.  If the task
descriptions below interest you, please contact us (see end of email for
contact details).


Background

In Natural Language Analysis (NLA), reuse of core utilities and tools has
become common, and researchers frequently use off-the-shelf parsers,
POS-taggers, named entity recognisers, coreference resolvers, and many
other tools.  NLG has not so far developed generic tools, and methods for
comparing them, to the same extent as NLA.  The NLG subfield that has
perhaps come closest to developing generic tools is surface realisation.
Wide-coverage surface realisers such as PENMAN/NIGEL, FUF/SURGE and
REALPRO were intended to be more or less off-the-shelf, plug-and-play
modules.  In practice, however, they tended to require a significant amount
of work to adapt and integrate, e.g. because they required highly specific
inputs with up to several hundred features that needed to be set.

With the advent of statistical techniques in NLG, surface realisers
appeared for which it was far simpler to supply inputs, as information not
provided in the inputs could be added on the basis of likelihood. The
current generation of surface realisers tend to be statistical and use
reversible, treebank-based, automatically extracted grammars for both
parsing and generation.  A significant subset of statistical realisation
work has produced results for regenerating the Penn Treebank (PTB): the
PTB's annotated resources are mapped to some form of meaning
representation, which then serves as input to the surface realiser, whose
task is to reproduce the original treebank sentence.

Although these research projects involve the same corpus, the reported
results cannot be directly compared, because each
realiser uses different input representation formalisms (to match the
grammar formalisms used in the realiser: HPSG, CCG, LFG, LTAG, etc.)
and inputs specify the word-string outputs to different degrees (some
inputs are more 'surfacey', others more semantic).  Evaluation results
typically report BLEU scores, and publications refer to each other and
(tentatively) compare BLEU scores, but no conclusions can be drawn
from these comparisons, because of the differences in inputs.
Additionally, meta-evaluations of MT metrics on realiser outputs have
suggested that these metrics correlate less well with human judgments
than in the case of MT outputs, perhaps because realiser outputs are
generally of higher quality and exhibit more subtle variation.  We are
therefore left in a situation where a vibrant new generation of
surface realisation research exists, but we do not have the facility
to compare these approaches with each other, or to the previous
generation of symbolic realisers.
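
As a concrete illustration of the kind of automatic comparison involved,
the sketch below scores a single (hypothetical) realiser output against the
original treebank sentence with sentence-level BLEU.  It is only a minimal
example, assuming NLTK and its bundled PTB sample; it is not the evaluation
procedure of the shared task, whose scripts and metrics are still being
finalised.

    # Minimal sketch: score one hypothetical realiser output against the
    # original treebank sentence with sentence-level BLEU.
    # Assumes: pip install nltk; then nltk.download('treebank') once.
    from nltk.corpus import treebank
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    # The human-authored treebank sentence is the reference.
    reference = treebank.sents()[0]

    # Stand-in for a system's regenerated sentence; a real realiser would
    # produce this from its input representation.
    candidate = list(reference)

    smooth = SmoothingFunction().method1
    score = sentence_bleu([reference], candidate, smoothing_function=smooth)
    print("BLEU: %.3f" % score)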


Shared Task Outline

We are currently developing a shared task in surface realisation (SR)
based on common inputs and annotated corpora of paired inputs and outputs
derived from various NLA resources that build on the Penn Treebank (for an
early outline see Belz, White, van Genabith, Hogan & Stent, 2010).
Inputs are provided in a common-ground representation formalism which
participants map to the types of input required by their system. These
inputs are automatically derived from the Penn Treebank and the various
layers of annotation (syntactic, semantic, discourse) that have been
developed for the texts in it.  The shared task is defined precisely and
outputs from participating systems (realisations) are evaluated by
automatic comparison against the human-authored text in the corpora as
well as by human assessors.

We have assembled a working group of SR researchers (see below) to tackle
the task of designing the common-ground input representation, with the aim
of ensuring a fair and balanced approach.  In the short term, an SR Shared
Task as outlined here will make existing and new approaches directly
comparable by evaluation on the benchmark data associated with the task.
Additionally, a metrics track will allow researchers working on
automatic evaluation metrics to submit the results of their metrics on
the realisation data.

In the long term, the common-ground input representation is likely to lead
to a standardised representation that can act as a link between surface
realisers and preceding modules, and may one day make it possible to use
alternative surface realisers as drop-in replacements for each other,
enabling developers to determine the best realiser for their purpose.
Moreover, the acquired human judgments of realiser outputs will form a
challenging data set for advancing research on automatic evaluation
metrics.
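
By way of illustration, once such judgments are available, a metric's
segment-level scores could be correlated with averaged human ratings along
the usual lines.  The sketch below assumes one metric score and one
averaged human rating per realiser output; the numbers are placeholders,
not real data, and the actual meta-evaluation protocol for the metrics
track is yet to be fixed.

    # Minimal sketch: segment-level correlation between an automatic metric
    # and human judgments of realiser outputs.  Placeholder data only.
    from scipy.stats import pearsonr, spearmanr

    metric_scores = [0.62, 0.48, 0.81, 0.35, 0.74]   # metric score per output
    human_ratings = [4.1, 3.2, 4.6, 2.8, 4.0]        # averaged rating per output

    r, p_r = pearsonr(metric_scores, human_ratings)
    rho, p_rho = spearmanr(metric_scores, human_ratings)
    print("Pearson r = %.2f (p=%.3f)" % (r, p_r))
    print("Spearman rho = %.2f (p=%.3f)" % (rho, p_rho))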


Working Group Developing the Common-Ground Input Representation

Anja Belz, NLTG, University of Brighton, UK (coordination)
Bernd Bohnet, IMS, University of Stuttgart, Germany
Charles Callaway, University of Haifa, Israel
Josef van Genabith, CNGL, Dublin City University, Ireland
Deirdre Hogan, CNGL, Dublin City University, Ireland
Stephan Oepen, University of Oslo, Norway
Amanda Stent, AT&T Labs Research Inc., US
Leo Wanner, Information and Communication Technologies, UPF, Barcelona, Spain
Mike White, Department of Linguistics, The Ohio State University, US


Prospective Timeline

February 2011: Training/development data and task documentation available
July 2011: Submission of test data outputs
August 2011: Submission of automatic metrics results on realiser data
September 2011: Results session at GenChal'11 at ENLG 2011


Expressions of Interest

Please let us know if you would like to participate in the Surface
Realisation Shared Task or the Automatic Metrics Track. We welcome any
feedback or suggestions you may have.

Note that we would particularly welcome work on surface realisation
from PTB-style inputs for languages other than English.  Related
ideas and/or results can be submitted to the Open Track at Generation
Challenges 2011 (to be announced shortly).


Shared Task Organisation:

Anja Belz, NLTG, University of Brighton, UK
Josef van Genabith, CNGL, Dublin City University, Ireland
Deirdre Hogan, CNGL, Dublin City University, Ireland
Amanda Stent, AT&T Labs Research Inc., US
Mike White, Department of Linguistics, The Ohio State University, US


Contact:

Anja Belz (A.S.Belz at brighton.ac.uk) and Mike White (mwhite at ling.osu.edu)

