Corpora: closed class word list

Douglas Rohde dr at tedlab.mit.edu
Tue Jun 4 16:12:26 UTC 2002


Diego Molla wrote:

> By definition, a list of closed class words must be easy to compile,
> since new additions to the list would be rare.
>
> Oddly enough, I haven't found any such list on the Web. A student of
> mine needs to use a list of closed class words. Does anybody know of
> such a list?


Assuming you're interested in English, I have a list of closed class
words that I developed for working with a corpus of Usenet text.  It has
about 150 words.  As far as I can tell, the set of closed class words in
English is not entirely well defined.  Some categories (pronouns,
conjunctions, articles) are clearly closed class.  But certain adverbs
and common verbs are debatable, as are, I think, digits.  So for what
it's worth, here's my list.  You'll notice that it includes things like
punctuation and bracketed tokens such as <NUM> (which stands for a
number) that you may want to remove.
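
For anyone who does want to strip those entries, here is a minimal
sketch in Python.  It assumes the list has already been read into one
entry per string, which is only a guess about the attachment's layout.
The function name filter_closed_class and the sample entries are made
up for illustration and are not taken from the actual list.

```python
import string


def filter_closed_class(entries):
    """Drop blank entries, punctuation-only entries, and bracketed
    placeholders such as <NUM>; return the rest lowercased."""
    kept = []
    for entry in entries:
        word = entry.strip()
        if not word:
            continue  # blank line
        if word.startswith("<") and word.endswith(">"):
            continue  # bracketed placeholder like <NUM>
        if all(ch in string.punctuation for ch in word):
            continue  # punctuation-only entry such as "," or "--"
        kept.append(word.lower())
    return kept


# Hypothetical sample entries (assumed format: one entry per string).
sample = ["the", "and", ",", "<NUM>", "She", "--", ""]
print(filter_closed_class(sample))
```

Lowercasing is a choice made here for case-insensitive matching; drop
the .lower() call if case matters for your corpus.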

Doug


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: closedClass
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20020604/822727ba/attachment-0001.ksh>

