[Corpora-List] Referring expressions: familiarity/accessibility
Klebanov Beata
beata at cs.huji.ac.il
Mon Sep 2 14:23:02 UTC 2002
Dear all,
As far as I know, the classification of referring expressions (REs)
according to the assumed familiarity/accessibility of the entity
being referred to usually looks something like this (from most to
least familiar):
pronouns > demonstratives(+NP) > partial names > short DEFs > long DEFs > full names > short INDEFs > long INDEFs.
Below, however, are some cases I have come across where the expression
is an RE, but it is not clear to me where it fits on the scale (all
examples are from the Wall Street Journal):
(1) comparatives:
    weaker results ("Digital Equipment's profit fell 32% in the latest
    quarter, prompting forecasts of weaker results ahead.")
    higher commissions and revenue ("The company said the improved
    performance from a year ago reflects higher commissions and revenue
    from marketing ....")
=> These assume that some benchmark results/revenue
were mentioned before (the 32% fall; the results from a year ago),
although the entities referred to by the expressions themselves
are new.
It seems to me that "weaker results" has a higher degree
of familiarity than "weak results", but just how much higher?
The anchoring in a previously mentioned entity reminds me of
bridging, which is usually associated with short DEFs.
(2) quantifiers:
another round of horror
any other major currency
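=> These seem to me somewhat similar to (1): "another" and "other"
likewise presuppose a previously mentioned round/currency as a benchmark.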
(3) things that are (possibly) assumed to be singular entities:
    genocide ("the reports of genocide taking place...")
    gold ("In the Commodity Exchange in New York, gold dropped $1.60
    to..."; "The dollar finished mixed, while gold declined.")
    literature ("The Nobel prize in literature")
=> I think these are all REs, since they can be referred to later:
the killing ... (genocide); it regained ... (gold); this
category is considered the most competitive ... (literature).
One possibility is to treat them as names: genocide
standing for "the phenomenon of ethnically motivated violence",
literature standing for "the category of competition in which works
of fiction by contemporary authors are presented", etc.
Another is to treat them as short DEFs, as if every mention were
a mention of a single, unique entity (akin to "the sun"), of which
possibly not all aspects are relevant ("literature" in the example
does not include "The Iliad", or articles on Computational
Linguistics).
I would appreciate any pointers to the relevant literature (!) and/or
comments on the examples. Do you know of any attempts at automatic
classification of REs?
Thank you,
Beata Klebanov
==============
PhD student, Computer Science Department
The Hebrew University of Jerusalem, Israel
email: beata at cs.huji.ac.il
www: http://www.cs.huji.ac.il/~beata
phone (office): 972 - 2 - 6585386