Corpora: Noun phrases categories
Francis Bond
bond at cslab.kecl.ntt.co.jp
Mon May 20 02:32:49 UTC 2002
G'day,
Fuchun> I am working on classifying noun phrases into several
Fuchun> categories, such as mass NPs and count NPs, and even dividing
Fuchun> each category further. The goal is to develop better language
Fuchun> models for noun phrases modeling. and If it works, we can
Fuchun> develop better language models for sentences and better NP
Fuchun> chunkers.
Fuchun> I am wondering are there any previous work done on this topic?
Fuchun> How many categories should we divide noun phrases into and are
Fuchun> there such labeled data?
There is a vast literature on this in linguistics. Two references I
found particularly interesting are:
@Book{AnnaW:1988,
author = "Anna Wierzbicka",
title = "The Semantics of Grammar",
publisher = "John Benjamins",
address = "Amsterdam",
year = 1988
}
@article{Allan:1980,
author = "Keith Allan",
title = "Nouns and Countability",
journal = "Language",
year = 1980,
volume = 56,
number = 3,
pages = "541--67"
}
From a computational point of view, I have been looking at
countability in the context of Japanese-to-English MT, and
suggest splitting countability into 5 types (with a couple of
sub-types): Fully Countable; Strongly Countable; Weakly Countable;
Uncountable; and Plural Only.
I discuss these in several papers and my dissertation:
@inproceedings{Bond:1994,
author = "Francis Bond and Kentaro Ogura and Satoru Ikehara",
title = "Countability and Number in {Japanese}-to-{English}
Machine Translation",
booktitle = coling-94,
year = "1994",
address = "Kyoto",
month = aug,
pages = "32--38",
note = "(\url{http://xxx.lanl.gov/abs/cmp-lg/9511001})",
organization = "The International Committee on Computational
Linguistics (ICCL)"
}
@Article{Bond:1998,
author = "Francis Bond and Kentaro Ogura",
title = "Reference in {Japanese}-to-{English} Machine
Translation",
journal = MT,
volume = 13,
number = "2--3",
year = 1998,
pages = "107--134"
}
@PhDThesis{Bond:2001,
author = "Francis Bond",
title = "Determiners and Number in {English} contrasted with
{Japanese} --- as exemplified in Machine
Translation",
school = "University of Queensland",
year = 2001,
address = "Brisbane, Australia"
}
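As a rough sketch (not taken from the papers above), the five top-level
types could be written down as a Python enumeration and used as the label
set for a classifier. The names `Countability` and `LABELS` are made up for
illustration, and the sub-types are left out because they are not spelled
out in this message.

```python
from enum import Enum


class Countability(Enum):
    """Top-level countability types named in this message (sub-types omitted)."""

    FULLY_COUNTABLE = "fully countable"
    STRONGLY_COUNTABLE = "strongly countable"
    WEAKLY_COUNTABLE = "weakly countable"
    UNCOUNTABLE = "uncountable"
    PLURAL_ONLY = "plural only"


# Label set a noun phrase classifier could predict over (hypothetical usage).
LABELS = [c.value for c in Countability]
```

Keeping the types in one enumeration makes it harder for labeled data and
classifier output to drift apart in spelling.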
Ann Copestake also talks a bit about countability in her dissertation
and other publications too numerous to mention:
@PhdThesis{Copestake:1992z,
author = "Ann Copestake",
title = "The Representation of Lexical Semantic Information",
school = "University of Sussex",
year = 1992,
address = "Brighton"
}
As far as I know there isn't any labeled data generally available, but
I would be happy to be proved wrong.
--
Francis Bond <www.kecl.ntt.co.jp/icl/mtg/members/bond/>
NTT Communication Science Laboratories | Machine Translation Research Group