20.413, Sum: Vocabulary Statistics

LINGUIST Network linguist at LINGUISTLIST.ORG
Mon Feb 9 20:47:12 UTC 2009


LINGUIST List: Vol-20-413. Mon Feb 09 2009. ISSN: 1068 - 4875.

Subject: 20.413, Sum: Vocabulary Statistics

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Randall Eggert, U of Utah  
       <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Dan Parker <dan at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 07-Feb-2009
From: Richard Hudson < dick at ling.ucl.ac.uk >
Subject: Vocabulary Statistics

 

	
-------------------------Message 1 ---------------------------------- 
Date: Mon, 09 Feb 2009 15:45:59
From: Richard Hudson [dick at ling.ucl.ac.uk]
Subject: Vocabulary Statistics



Query for this summary posted in LINGUIST Issue: 20.284
 

A few weeks ago I broadcast a double query about the statistics of English
vocabulary. My first question was about the number of morphemes compared
with the number of lemmas, but nobody offered an answer. 

My second question was more successful. This was about the proportion of
lemmas in each of the main word classes, and how this proportion varied
with token frequency; I was particularly keen to check a guess that the
proportion of nouns was greater among rare lemmas than among common ones. I
received data from Gwillim Law and Jasper Holmes. It turns out that my
guess was right. I've presented and summarised the data at
http://www.phon.ucl.ac.uk/home/dick/nouniness/nouniness.htm. If anyone has
comments or further data (including data on other languages), I should of
course be most interested to hear from them. 
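
For readers who want to run the same kind of tabulation on their own word
list, the following is a minimal sketch, not the method used for the data
above. It assumes a hypothetical list of (lemma, word class, token
frequency) records and hypothetical frequency bands; every value in it is
invented purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (lemma, word_class, token_frequency).
# All values are invented for illustration; they are not corpus data.
LEXICON = [
    ("the", "other", 60000),
    ("be", "verb", 40000),
    ("time", "noun", 1800),
    ("go", "verb", 1500),
    ("good", "adjective", 1200),
    ("house", "noun", 600),
    ("quickly", "adverb", 90),
    ("lantern", "noun", 8),
    ("gazebo", "noun", 2),
    ("abscond", "verb", 1),
]

# Hypothetical frequency bands as (label, minimum token frequency),
# listed from most to least frequent. The cut-offs are arbitrary.
BANDS = [("common", 1000), ("mid", 10), ("rare", 0)]


def band_of(frequency, bands):
    """Return the label of the first band whose minimum the frequency meets."""
    for label, minimum in bands:
        if frequency >= minimum:
            return label
    raise ValueError(f"no band covers frequency {frequency}")


def class_proportions_by_band(lexicon, bands):
    """Return {band: {word_class: share of that band's lemmas}}."""
    counts = defaultdict(Counter)
    for _lemma, word_class, frequency in lexicon:
        counts[band_of(frequency, bands)][word_class] += 1
    return {
        band: {cls: n / sum(by_class.values()) for cls, n in by_class.items()}
        for band, by_class in counts.items()
    }


proportions = class_proportions_by_band(LEXICON, BANDS)
for label, _minimum in BANDS:
    print(label, proportions.get(label, {}))
```

Each lemma is counted once per band, so the figures are proportions of
lemmas (types), not of tokens; token frequency is used only to decide which
band a lemma belongs to.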

Linguistic Field(s): Text/Corpus Linguistics






-----------------------------------------------------------
LINGUIST List: Vol-20-413	

	


