[Corpora-List] NP Coordinations

Preslav Nakov nakov at cs.berkeley.edu
Sun Mar 9 10:17:24 UTC 2008


Hi Ekaterina,

There are several problems related to coordination. Here are some examples:

1. Boundaries -- i.e. what is coordinated: words, constituents, clauses,
etc.

2. Interactions with other constituents, e.g. PPs:

 	[health and [quality of life]] vs. [[health and quality] of life]

3. Ellipsis: e.g. "car and truck production" means "car _production_ and
truck production", but there is no ellipsis in "president and chief
executive".

4. "or" meaning "and": e.g., "chronic diseases or disabilities"


The ellipsis problem is one a parser can solve. For example, the Penn
Treebank has flat NPs for the case of ellipsis:

   (NP car/NN and/CC truck/NN production/NN)

and NPs with internal structure when there is no ellipsis:

   (NP 
	(NP president/NN) 
	and/CC 
	(NP chief/NN executive/NN))


See the following paper for details:

Nakov, P., and Hearst, M. "Using the Web as an Implicit Training Set:
Application to Structural Ambiguity Resolution." In Proceedings of
HLT/EMNLP 2005, Vancouver, 2005.
http://acl.ldc.upenn.edu/H/H05/H05-1105.pdf


The dataset (428 examples) used in that paper is listed in the appendix of
my PhD thesis:

"Using the Web as an Implicit Training Set: Application to Noun Compound
Syntax and Semantics"
http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-173.html

Of course, you can also extract such structures from the Penn Treebank
directly if you need more data.
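
As a rough illustration of such an extraction, here is a minimal Python
sketch. It assumes the trees are available as standard Penn Treebank
bracketed strings, e.g. (NP (NN car) (CC and) (NN truck) (NN production)),
rather than the word/TAG shorthand used above. The function names (parse,
coordinated_nps, etc.) are made up for this example and are not taken from
the paper or from any toolkit. An NP with a CC child is reported as
"nested" if it also has an NP child, and as "flat" otherwise.

```python
import re


def parse(s):
    """Parse one bracketed tree string into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a word (leaf)
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # skip the closing ")"
        return node

    return read()


def base_label(node):
    """Return the node label without function tags (NP-SBJ -> NP); None for words."""
    if not isinstance(node, list) or not node:
        return None
    return node[0].split("-")[0].split("=")[0]


def leaves(node):
    """Return the words under a node, left to right."""
    if not isinstance(node, list):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def coordinated_nps(tree):
    """Yield (kind, text) for every NP that has a CC as an immediate child."""
    if not isinstance(tree, list):
        return
    children = tree[1:]
    if base_label(tree) == "NP" and any(base_label(c) == "CC" for c in children):
        has_np_child = any(base_label(c) == "NP" for c in children)
        yield ("nested" if has_np_child else "flat"), " ".join(leaves(tree))
    for child in children:
        yield from coordinated_nps(child)


if __name__ == "__main__":
    examples = [
        "(NP (NN car) (CC and) (NN truck) (NN production))",
        "(NP (NP (NN president)) (CC and) (NP (NN chief) (NN executive)))",
    ]
    for s in examples:
        for kind, text in coordinated_nps(parse(s)):
            print(kind, "|", text)
```

This only separates the two bracketing patterns; how reliably "flat"
corresponds to ellipsis in a given treebank section would still need to be
checked by hand.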

Preslav

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Ekaterina Buyko
Sent: Friday, March 07, 2008 10:44 AM
To: corpora at uib.no
Subject: [Corpora-List] NP Coordinations

Hi,

I am looking for data sets with annotated NP coordinations.

My aim is to evaluate my coordination disambiguation engine, and it would
be great to have more evaluation data, for example the data sets already
used for evaluations (e.g., Resnik 1999).

Is there any free data for this task?

Thank you

Ekaterina

-- 

Ekaterina Buyko
Jena University Language and Information Engineering (JULIE) Lab
Phone: +49-3641-944307
Fax:   +49-3641-944321
email: ekaterina.buyko at uni-jena.de
URL:   http://www.coling.uni-jena.de


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

