Corpora: Sum: closed class word list

Diego Molla diego at ics.mq.edu.au
Thu Jun 6 02:34:27 UTC 2002


A few days ago I asked whether there is any list of closed class words
available. Thank you for all the responses I received; here is a
brief summary.

First of all, some respondents said that there is no clear definition
of what a closed class word is. For example, several people suggested
using a list of stop words.

My student is going to localise WordNet to the domain of software
documentation manuals, and one step in this process is the addition of
words from our corpus that are not defined in WordNet. Since WordNet
contains nouns, verbs, adjectives, and adverbs, he needs to find a way
to filter out those words that belong to other parts of speech.

So, for our application, closed classes are parts of speech other than
nouns, verbs, adjectives, and adverbs.
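
To make the filtering step concrete, here is a minimal sketch in Python.
It assumes we already have the corpus tokens, a set of words known to
WordNet, and a closed class word list. The function name and the tiny
word sets are made up for illustration; they are not our actual data.

```python
def candidate_new_words(corpus_words, lexicon, closed_class):
    """Return corpus words that are neither in the lexicon nor closed class.

    Words are compared in lower case and each candidate is reported once,
    in order of first appearance.
    """
    known = {w.lower() for w in lexicon}
    closed = {w.lower() for w in closed_class}
    seen = set()
    candidates = []
    for word in corpus_words:
        key = word.lower()
        if key in known or key in closed or key in seen:
            continue
        seen.add(key)
        candidates.append(key)
    return candidates


# Hypothetical toy data standing in for the corpus and for WordNet.
corpus_words = ["The", "installer", "unzips", "the", "archive", "and", "reboots"]
lexicon = {"installer", "archive"}   # stand-in for words defined in WordNet
closed_class = {"the", "and"}        # stand-in for a closed class word list

print(candidate_new_words(corpus_words, lexicon, closed_class))
```

What remains after the filter are the words that would be considered
for addition to the localised WordNet.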

A way to find these words is to take a list of words annotated with
their part of speech, and select those that are not nouns, verbs,
adjectives, or adverbs. Fuchung Peng did something like that, and he
sent me a list of words tagged as DT, CC, PRP, PRP$, TO, WDT, WP$, WRB,
or WP in the Brown corpus. Thank you for the list; I'll probably give it
to my student. Those who are interested in the list can contact me and
I'll send it to them by email.
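
For those who prefer to build such a list themselves, the selection by
tag could look roughly like the sketch below. It assumes the tagged
corpus is available as (word, tag) pairs; the tag set is the one named
above, while the function name and the sample pairs are hypothetical.
This is not necessarily how the list I received was produced.

```python
# Tags named in this message; treated here as the closed class tags.
CLOSED_TAGS = {"DT", "CC", "PRP", "PRP$", "TO", "WDT", "WP$", "WRB", "WP"}


def closed_class_words(tagged_words, closed_tags=CLOSED_TAGS):
    """Collect the distinct lower-cased words whose tag is a closed class tag."""
    return sorted({word.lower() for word, tag in tagged_words if tag in closed_tags})


# Hypothetical sample of (word, tag) pairs, not taken from a real corpus.
tagged_words = [
    ("The", "DT"), ("installer", "NN"), ("and", "CC"), ("it", "PRP"),
    ("runs", "VBZ"), ("which", "WDT"), ("to", "TO"), ("the", "DT"),
]

print(closed_class_words(tagged_words))
```

Note that a word ends up in the list as soon as one of its occurrences
carries a closed class tag, so ambiguous words may need a manual check.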

Best regards to all and again, thank you to all who replied to my message.

Diego

---------------------------------------------------------------------
Diego MOLLA ALIOD                                 diego at ics.mq.edu.au
Department of Computing               http://www.ics.mq.edu.au/~diego
Macquarie University


