Corpora: closed class word list
Douglas Rohde
dr at tedlab.mit.edu
Tue Jun 4 16:12:26 UTC 2002
Diego Molla wrote:
> By definition, a list of closed class words must be easy to compile,
> since new additions to the list would be rare.
>
> Oddly enough, I haven't found any such list on the Web. A student of
> mine needs to use a list of closed class words. Does anybody know of
> such a list?
Assuming you're interested in English, I have a list of closed class
words that I developed for working with a corpus of Usenet text. It has
about 150 words. As far as I can tell, the set of closed class words in
English is not completely well-defined. Some words (pronouns,
conjunctions, articles) are clearly closed class. But certain adverbs
and common verbs are debatable, as are, I think, digits. So
for what it's worth, here's my list. You'll notice that it includes things
like punctuation and bracketed tokens like <NUM> (which stands for a
number) that you may want to remove.
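
If you do want to strip those entries, something like the following
Python would work as a rough sketch. It assumes the list has been read
in as one entry per item and that placeholders always look like <NUM>
(angle brackets, no spaces inside). The name keep_words and the sample
entries are made up for illustration and are not taken from the attached
file. It also drops bare digits, which is a judgment call given that
their status is debatable.

```python
import re

# Assumption: bracketed placeholders have the form <NUM>, i.e. angle
# brackets around a run of non-space characters.
PLACEHOLDER = re.compile(r"^<[^<>\s]+>$")


def keep_words(entries):
    """Return lowercased entries that are neither placeholders nor
    pure punctuation/digits (hypothetical helper, not the original tool)."""
    kept = []
    for entry in entries:
        token = entry.strip()
        if not token or PLACEHOLDER.match(token):
            continue  # blank line or bracketed placeholder such as <NUM>
        if not any(ch.isalpha() for ch in token):
            continue  # no letters at all: punctuation or digits
        kept.append(token.lower())
    return kept


# Illustrative entries only; the real list may differ.
sample = ["the", "and", ",", "<NUM>", "n't", ".", "of", "'s", "7"]
print(keep_words(sample))
```

Clitics such as n't and 's survive because they contain letters; if you
would rather keep digits, loosen the second check to use isalnum()
instead of isalpha().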
Doug
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: closedClass
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20020604/822727ba/attachment-0001.ksh>