[Corpora-List] Referring expressions: familiarity/accessibility

Klebanov Beata beata at cs.huji.ac.il
Mon Sep 2 14:23:02 UTC 2002


Dear all,


As far as I know, the classification of referring expressions
according to the assumed familiarity/accessibility of the entity
being referred to usually looks something like:
pronouns > demonstratives (+NP) > partial names > short DEFs > long
DEFs > full names > short INDEFs > long INDEFs.
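
To make the ordering concrete, the scale can be written down as an
ordered type. This is only a minimal sketch in Python: the names REType
and more_familiar are illustrative labels of my own, not an established
annotation scheme, and the integer values encode nothing but position
on the scale.

```python
from enum import IntEnum


class REType(IntEnum):
    """Referring-expression types; a lower value means higher assumed familiarity."""
    PRONOUN = 0
    DEMONSTRATIVE = 1   # demonstratives (+NP)
    PARTIAL_NAME = 2
    SHORT_DEF = 3       # short definite description
    LONG_DEF = 4        # long definite description
    FULL_NAME = 5
    SHORT_INDEF = 6     # short indefinite description
    LONG_INDEF = 7      # long indefinite description


def more_familiar(a: REType, b: REType) -> bool:
    """Return True if type a sits to the left of type b on the scale."""
    return a < b


# Sorting a mixed set of types recovers the left-to-right scale order.
sample = [REType.FULL_NAME, REType.PRONOUN, REType.SHORT_DEF]
ordered = sorted(sample)
```

A type like this has no obvious member for an expression such as
"weaker results", which is exactly the difficulty with fitting every
RE onto the scale.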

However, below are some cases I came across where the expression
is an RE, but it is not quite clear to me where it fits on the scale (all
examples are from the Wall Street Journal):

(1) comparatives:
    weaker results (Digital Equipment's profit fell 32% in the latest
                    quarter, prompting forecasts of weaker results ahead.)
    higher commissions and revenue (The company said the improved
                                    performance from a year ago reflects
                                    higher commissions and revenue from
                                    marketing ....)

        => These assume that some benchmark results/revenue
           were mentioned before (the 32% fall; those of one year ago),
           although the entities referred to by the expressions
           themselves are new.
           It seems to me that "weaker results" has a higher degree
           of familiarity than "weak results", but just how much higher?
           The anchoring in a previously mentioned entity reminds me of
           bridging, which is usually associated with short DEFs.

(2) quantifiers:
    another round of horror
    any other major currency
        => These seem to me somewhat similar to (1).

(3) things that are (possibly) assumed to be singular entities:

    genocide (the reports of genocide taking place...)
    gold (In the Commodity Exchange in New York, gold dropped $1.60
          to...; The dollar finished mixed, while gold declined.)
    literature (The Nobel prize in literature)

        => I think these are all REs, since they can be referred to later:
           the killing ... (genocide); it regained ... (gold); this
           category is considered the most competitive ... (literature).
           One possibility is to treat them as names: genocide
           standing for "the phenomenon of violence on an ethnic basis",
           literature being "the category of competition where writings
           of fiction by contemporary authors are presented", etc.
           Another is to treat them as short DEFs, as if every mention
           were a mention of the one and only entity (akin to "the sun"),
           where possibly not all of its aspects are relevant ("literature"
           in the example does not include "The Iliad", or articles on
           Computational Linguistics).



I would appreciate any pointers to relevant literature (!), and/or
comments on the examples. Do you know of any attempts at automatic
classification of REs?


Thank you,


Beata Klebanov
==============
PhD student, Computer Science Department
The Hebrew University of Jerusalem, Israel
email: beata at cs.huji.ac.il
www: http://www.cs.huji.ac.il/~beata
phone (office): 972 - 2 - 6585386


