Corpora: Sum: closed class word list
Diego Molla
diego at ics.mq.edu.au
Thu Jun 6 02:34:27 UTC 2002
A few days ago I asked whether any list of closed class words is
available. Thank you for all the responses that I received. Here is a
brief summary.
First of all, some respondents said that there is no clear definition
of what a closed class word is. For example, several people suggested
using a list of stop words instead.
My student is going to localise WordNet to the domain of software
documentation manuals, and one step in this process is the addition of
words from our corpus that are not defined in WordNet. Since WordNet
contains nouns, verbs, adjectives, and adverbs, he needs to find a way
to filter out those words that belong to other parts of speech.
So, for our application, closed classes are parts of speech other than
nouns, verbs, adjectives, and adverbs.
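
To make the intended use concrete, here is a minimal Python sketch of the
filtering step, under stated assumptions: the function name
candidate_words and all three word collections are made up for
illustration. In practice the WordNet lemmas and the closed class list
would be loaded from real resources, and some lemmatisation would
probably be needed before the lookup.

```python
def candidate_words(corpus_words, wordnet_lemmas, closed_class_words):
    """Return corpus words that are neither known to WordNet nor closed class.

    Comparison is case-insensitive; each candidate is reported once,
    in order of first appearance in the corpus.
    """
    known = {w.lower() for w in wordnet_lemmas}
    closed = {w.lower() for w in closed_class_words}
    seen = set()
    candidates = []
    for word in corpus_words:
        w = word.lower()
        if w in known or w in closed or w in seen:
            continue
        seen.add(w)
        candidates.append(w)
    return candidates


# Toy data, invented for this example only.
corpus_words = ["The", "daemon", "reads", "the", "config", "file",
                "and", "its", "symlink"]
wordnet_lemmas = {"daemon", "reads", "file"}   # stand-in for a real lookup
closed_class_words = {"the", "and", "its"}     # stand-in for the real list

print(candidate_words(corpus_words, wordnet_lemmas, closed_class_words))
```

With this toy input, the words left over as candidates for addition are
"config" and "symlink".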
One way to find these words is to take a list of words annotated with
their part of speech, and select those that are not nouns, verbs,
adjectives, or adverbs. Fuchung Peng did something like that, and he
sent me a list of words tagged as DT, CC, PRP, PRP$, TO, WDT, WP$, WRB,
or WP in the Brown corpus. Thank you for the list; I'll probably give it
to my student. Anyone interested in the list can contact me and
I'll send it by email.
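
The selection by tag can be sketched in a few lines of Python. This is
only an illustration and not the procedure that was actually used to
build the list: the tag set is the one named above, while the function
name closed_class_list and the tagged sample are invented for the
example. Real input would be the (word, tag) pairs of a tagged corpus.

```python
# Tags named in the message above.
CLOSED_TAGS = {"DT", "CC", "PRP", "PRP$", "TO", "WDT", "WP$", "WRB", "WP"}


def closed_class_list(tagged_tokens, tags=CLOSED_TAGS):
    """Return the sorted, lower-cased set of words whose tag is in `tags`."""
    return sorted({word.lower() for word, tag in tagged_tokens if tag in tags})


# Hypothetical tagged sample, invented for this example.
tagged = [
    ("The", "DT"), ("parser", "NN"), ("that", "WDT"), ("we", "PRP"),
    ("use", "VBP"), ("and", "CC"), ("its", "PRP$"), ("output", "NN"),
    ("to", "TO"), ("read", "VB"),
]

print(closed_class_list(tagged))
```

On this sample the result is the six words "and", "its", "that", "the",
"to", and "we". Matching on an explicit tag set, rather than excluding
the noun, verb, adjective, and adverb tags, means that any tag not
listed (numbers or punctuation, for instance) is left out of the list.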
Best regards to all and, again, thank you to everyone who replied to my message.
Diego
--
Diego MOLLA ALIOD diego at ics.mq.edu.au
Department of Computing http://www.ics.mq.edu.au/~diego
Macquarie University