18.2686, Qs: English Type Frequencies by POS

Fri Sep 14 18:06:25 UTC 2007

LINGUIST List: Vol-18-2686. Fri Sep 14 2007. ISSN: 1068 - 4875.

Subject: 18.2686, Qs: English Type Frequencies by POS

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Dan Parker <dan at linguistlist.org>
================================================================  

===========================Directory==============================  

1)
Date: 12-Sep-2007
From: Richard Hudson < dick at ling.ucl.ac.uk >
Subject: English Type Frequencies by POS

-------------------------Message 1 ---------------------------------- 
Date: Fri, 14 Sep 2007 14:05:35
From: Richard Hudson [dick at ling.ucl.ac.uk]
Subject: English Type Frequencies by POS

Does anyone know where I can find the proportion of English lemmas that are
nouns? 

More precisely, I'm looking for figures for lemmas in some large dictionary
or corpus classified by word class (aka part of speech), and if possible
also by token frequency; so ideally I'd like a table which shows nouns (and
maybe other word classes) as a percentage of the lemmas in a given
frequency range. My assumption is that the percentage of nouns in rare
vocabulary is higher than in common vocabulary, but I'd like to know
whether this is true. 
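
To illustrate the kind of table described above, here is a minimal Python sketch. It assumes a lemma list is already available as (lemma, word class, token frequency) triples. The toy data, the band boundaries, and the function name pos_share_by_band are all hypothetical, chosen only for illustration; the counts are invented and do not come from any dictionary or corpus.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (lemma, word class, token frequency).
# These frequencies are invented for illustration only.
LEMMAS = [
    ("the", "det", 60000),
    ("be", "verb", 40000),
    ("say", "verb", 3000),
    ("time", "noun", 1800),
    ("good", "adj", 1200),
    ("house", "noun", 600),
    ("quickly", "adv", 150),
    ("gather", "verb", 80),
    ("lantern", "noun", 9),
    ("opaque", "adj", 7),
    ("quagmire", "noun", 3),
    ("thimble", "noun", 2),
]

# Assumed frequency bands: (label, lower bound inclusive, upper bound exclusive).
BANDS = [
    ("rare (<10)", 0, 10),
    ("mid (10-999)", 10, 1000),
    ("common (>=1000)", 1000, float("inf")),
]


def pos_share_by_band(lemmas, bands):
    """Return {band label: {word class: percentage of lemma types in that band}}."""
    counts = defaultdict(Counter)
    for _lemma, pos, freq in lemmas:
        # Each lemma counts once (type frequency), in the band its
        # token frequency falls into.
        for label, low, high in bands:
            if low <= freq < high:
                counts[label][pos] += 1
                break
    table = {}
    for label, _low, _high in bands:
        total = sum(counts[label].values())
        table[label] = (
            {pos: 100.0 * n / total for pos, n in counts[label].items()}
            if total
            else {}
        )
    return table


if __name__ == "__main__":
    for band, shares in pos_share_by_band(LEMMAS, BANDS).items():
        noun_share = shares.get("noun", 0.0)
        print(f"{band}: nouns are {noun_share:.1f}% of lemmas")
```

With real data, the triples would come from a lemmatised, POS-tagged frequency list, and the hypothesis would be supported if the noun percentage rises as the frequency band gets rarer.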

If I learn anything significant I'll summarise back to the list.

Dick Hudson  (dick at ling.ucl.ac.uk) 

Linguistic Field(s): Computational Linguistics
