Corpora: Noun phrases categories

Francis Bond bond at cslab.kecl.ntt.co.jp
Mon May 20 02:32:49 UTC 2002


G'day,

Fuchun> I am working on classifying noun phrases into several
Fuchun> categories, such as mass NPs and count NPs, and even dividing
Fuchun> each category further. The goal is to develop better language
Fuchun> models for noun phrases. If it works, we can develop better
Fuchun> language models for sentences and better NP chunkers.

Fuchun> I am wondering whether any previous work has been done on this
Fuchun> topic. How many categories should we divide noun phrases into,
Fuchun> and is there any such labeled data?

There is a vast literature on this in linguistics. Two references I
found particularly interesting are:

@Book{AnnaW:1988,
  author =	 "Anna Wierzbicka",
  title =	 "The Semantics of Grammar",
  publisher =	 "John Benjamins",
  address =	 "Amsterdam",
  year =	 1988
}

@article{Allan:1980,
  author =	 "Keith Allan",
  title =	 "Nouns and Countability",
  journal =	 "Language",
  year =	 1980,
  volume =	 56,
  number =	 3,
  pages =	 "541--567"
}

From a computational point of view, I have been looking at
countability in the context of Japanese-to-English MT, and suggest
splitting countability into 5 types (with a couple of sub-types):
Fully Countable; Strongly Countable; Weakly Countable; Uncountable;
and Plural Only.

I discuss these in several papers and my dissertation:


@inproceedings{Bond:1994,
  author =	 "Francis Bond and Kentaro Ogura and Satoru Ikehara",
  title =	 "Countability and Number in {Japanese}-to-{English}
                  Machine Translation",
  booktitle =	 coling-94,
  year =	 "1994",
  address =	 "Kyoto",
  **month =	 aug,
  pages =	 "32--38",
  note =	 "(\url{http://xxx.lanl.gov/abs/cmp-lg/9511001})",
  **organization ="The International Committee on Computational
                  Linguistics (ICCL)"
}
@Article{Bond:1998,
  author =	 "Francis Bond and Kentaro Ogura",
  title =	 "Reference in {Japanese}-to-{English} Machine
                  Translation",
  journal =	 MT,
  volume =	 13,
  number =	 "2--3",
  year =	 1998,
  pages =	 "107--134"
}
@PhDThesis{Bond:2001,
  author =	 "Francis Bond",
  title =	 "Determiners and Number in {English} contrasted with
                  {Japanese} --- as exemplified in Machine
                  Translation",
  school =	 "University of Queensland",
  year =	 2001,
  address =	 "Brisbane, Australia"
}
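
Note that Bond:1994 and Bond:1998 use the string macros coling-94 and
MT, which are not defined in this message. A minimal sketch of
definitions that would let the entries compile is given here. The
expansions are assumptions and may differ from those in the original
.bib file:

```bibtex
% Assumed expansions for the macros used in Bond:1994 and Bond:1998.
% These are illustrative guesses, not taken from the original .bib file.
@string{coling-94 = "15th International Conference on Computational
                  Linguistics: {COLING-94}"}
@string{MT = "Machine Translation"}
```

BibTeX needs the @string definitions to appear before the entries that
use them.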

Ann Copestake also talks a bit about countability in her dissertation
and other publications too numerous to mention:

@PhdThesis{Copestake:1992z,
  author =	 "Ann Copestake",
  title =	 "The Representation of Lexical Semantic Information",
  school =	 "University of Sussex",
  year =	 1992,
  address =	 "Brighton"
}

As far as I know there isn't any labeled data generally available, but
I would be happy to be proved wrong.

--
Francis Bond  <www.kecl.ntt.co.jp/icl/mtg/members/bond/>
NTT Communication Science Laboratories | Machine Translation Research Group


