15.121, Diss: Comp Ling: Karamanis: 'Entity Coherence...'

Fri Jan 16 17:04:02 UTC 2004

LINGUIST List:  Vol-15-121. Fri Jan 16 2004. ISSN: 1068-4875.

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Sheila Collberg, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Takako Matsui <tako at linguistlist.org>
==========================================================================
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
=================================Directory=================================

1)
Date:  Thu, 15 Jan 2004 13:59:27 -0500 (EST)
From:  N.Karamanis at sms.ed.ac.uk
Subject:  Entity Coherence for Descriptive Text Structuring

-------------------------------- Message 1 -------------------------------

Institution: University of Edinburgh
Program: School of Informatics
Dissertation Status: Completed
Degree Date: 2003

Author: Nikiforos Karamanis

Dissertation Title: Entity Coherence for Descriptive Text Structuring

Dissertation URL:
http://www.iccs.informatics.ed.ac.uk/~nikiforo/thesis-online/phdthesis.ps

Linguistic Field: Computational Linguistics
Dissertation Director 1: Chris Mellish
Dissertation Director 2: Jon Oberlander
Dissertation Director 3: Massimo Poesio

Dissertation Abstract:

Although entity coherence, i.e. the coherence that arises from certain
patterns of references to entities, is of attested importance for
characterising a descriptive text structure, whether and how current
formal models of entity coherence such as Centering Theory can be used
for the purposes of natural language generation remains unclear. This
thesis investigates this issue and explores which of the many
formulations of Centering best suits text structuring. To do so, we
treat text structuring as a search task in which different orderings
of propositions are evaluated according to scores assigned by a
metric.
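
The search framing described above can be sketched in a few lines of
Python. This is a minimal illustration, not the thesis's
implementation: the propositions, their entity sets, and the metric
`count_no_shared_entity` (which penalises adjacent propositions that
mention no common entity) are hypothetical stand-ins, and exhaustive
enumeration is used only because the example is tiny.

```python
from itertools import permutations

# Hypothetical propositions, each reduced to the set of entities it mentions.
PROPOSITIONS = {
    "p1": {"amphora", "museum"},
    "p2": {"amphora", "painter"},
    "p3": {"painter", "style"},
    "p4": {"style"},
}


def count_no_shared_entity(ordering, entities=PROPOSITIONS):
    """Illustrative metric: number of adjacent pairs of propositions
    that have no entity in common (lower is better)."""
    return sum(
        1
        for prev, curr in zip(ordering, ordering[1:])
        if not entities[prev] & entities[curr]
    )


def best_orderings(props, metric):
    """Score every ordering of `props` and return the minimum score
    together with all orderings that achieve it."""
    scored = [(metric(order), order) for order in permutations(props)]
    best = min(score for score, _ in scored)
    return best, [order for score, order in scored if score == best]


if __name__ == "__main__":
    score, orderings = best_orderings(sorted(PROPOSITIONS), count_no_shared_entity)
    print(score, orderings)
```

With more than a handful of propositions the number of orderings grows
factorially, so a practical system would presumably replace the
exhaustive loop with some form of heuristic or sampled search.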

The main question behind this study is how to choose, from among many
alternatives, the metric of entity coherence that serves as the only
guidance to the text structuring component of a system that produces
descriptions of objects. Different ways of defining metrics of entity
coherence using Centering's notions are discussed, and a general
corpus-based methodology is introduced to identify which of these
metrics are the most promising candidates for search-based text
structuring before the actual generation of the descriptive structure
takes place.
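
One way such a corpus-based comparison could be set up is sketched
below: for an ordering observed in a corpus, count how many
alternative orderings of the same propositions a metric scores as
better than, equal to, or worse than the observed one. A metric under
which few alternatives beat the observed ordering would then look more
promising. The data, the function names, and the pairwise metric are
illustrative assumptions, not the procedure or the metrics defined in
the thesis.

```python
from itertools import permutations


def no_shared_entity_count(ordering):
    """Hypothetical metric: number of adjacent entity sets with an
    empty intersection (lower is better)."""
    return sum(1 for a, b in zip(ordering, ordering[1:]) if not a & b)


def compare_with_observed(observed, metric):
    """Classify every alternative ordering of `observed` as scoring
    better than, equal to, or worse than the observed ordering."""
    target = metric(observed)
    identity = tuple(range(len(observed)))
    counts = {"better": 0, "equal": 0, "worse": 0}
    for indices in permutations(identity):
        if indices == identity:
            continue  # skip the observed ordering itself
        score = metric([observed[i] for i in indices])
        if score < target:
            counts["better"] += 1
        elif score == target:
            counts["equal"] += 1
        else:
            counts["worse"] += 1
    return counts


# Hypothetical observed ordering: each item is the set of entities
# mentioned by one proposition, in corpus order.
OBSERVED = [{"vase", "museum"}, {"vase", "potter"}, {"potter"}]

if __name__ == "__main__":
    print(compare_with_observed(OBSERVED, no_shared_entity_count))
```

Running the same comparison with several candidate metrics over many
observed orderings would give a rough basis for ranking the metrics
before any generation takes place.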

The performance of a large set of metrics is estimated empirically in
a series of computational experiments using two kinds of data: (i) a
reliably annotated corpus representing the genre of interest and (ii)
data derived from an existing natural language generation system and
ordered according to the instructions of a domain expert. A final
experiment supplements our main methodology by automatically
evaluating the best scoring orderings of some of the best performing
metrics in comparison to an upper bound defined by orderings produced
by multiple experts on additional application-specific data and a
lower bound defined by a random baseline.

The main findings are summarised as follows. In general, the simplest
metric of entity coherence constitutes a very robust baseline for both
datasets. However, when the metrics are modified according to an
additional constraint on entity coherence, the baseline is beaten in
domain (ii). The modification is supported by the subsidiary
evaluation, which shows all employed metrics to be superior to the
random baseline and helps identify the metric that is overall the
most suitable candidate (among those investigated) for search-based
descriptive text structuring in domain (ii).

This thesis provides substantial insight into the role of entity
coherence as a descriptive text structuring constraint. Viewing
Centering from an NLG perspective raises a series of interesting
challenges that the thesis identifies and attempts to investigate to a
certain extent. The general evaluation methodology and the results of
the empirical studies are useful for any subsequent attempt to
generate a descriptive text structure in the context of an application
that makes use of the notion of entity coherence as modelled by
Centering.
