<html>

<body>

Hi Cyrus<br><br>

a) Is the list in any particular order?<br><br>

<blockquote type=cite class=cite cite="">

<font face="Courier New, Courier">Number of words: 5894564637<br>
WORD<x-tab>    </x-tab>COUNT<x-tab>   </x-tab>FREQPERMILLION<br>
BESTING<x-tab> </x-tab>712<x-tab>     </x-tab>0.120789242946086<br>
PRACTICABLY<x-tab>     </x-tab>98<x-tab>      </x-tab>0.0166254856863995<br>
BANTERERS<x-tab>       </x-tab>2<x-tab>       </x-tab>0.00033929562625305<br>
RECLOTHE<x-tab>        </x-tab>89<x-tab>      </x-tab>0.0150986553682607</font></blockquote><br>
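As far as I can tell, the FREQPERMILLION column is simply COUNT divided by the corpus size and multiplied by one million. A minimal Python sketch, assuming that formula (the helper name <tt>freq_per_million</tt> is my own and does not come from the list's documentation), reproduces the sample rows quoted above:<br><br>

```python
# Assumption: FREQPERMILLION = COUNT / corpus_size * 1,000,000.
# The formula is inferred from the sample rows, not taken from documentation.
CORPUS_SIZE = 5894564637  # "Number of words" from the list header


def freq_per_million(count, corpus_size=CORPUS_SIZE):
    """Return occurrences per million corpus words (hypothetical helper)."""
    return count / corpus_size * 1_000_000


# Counts copied from the sample rows quoted above.
SAMPLE = [("BESTING", 712), ("PRACTICABLY", 98), ("BANTERERS", 2), ("RECLOTHE", 89)]

for word, count in SAMPLE:
    print(word, count, freq_per_million(count))
```

Under the same assumption, a COUNT of 0 necessarily gives a FREQPERMILLION of 0.<br><br>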

b) Why do some items have a count (and therefore a frequency) of 0?<br><br>

<blockquote type=cite class=cite cite="">
<font face="Courier New, Courier">CYCLIZES<x-tab>        </x-tab>0<x-tab>       </x-tab>0</font></blockquote>
<br>
<blockquote type=cite class=cite cite="">
<font face="Courier New, Courier">PROCEEDERS<x-tab>      </x-tab>0<x-tab>       </x-tab>0</font></blockquote>
<br>
<blockquote type=cite class=cite cite="">
<font face="Courier New, Courier">DATEDLY<x-tab> </x-tab>0<x-tab>       </x-tab>0<br>
TUTOYERED<x-tab>       </x-tab>0<x-tab>       </x-tab>0</font></blockquote>

<br>

c) Does this mean that the list is not a corpus frequency list as such, but a
pre-existing wordlist with corpus frequencies attached?<br><br>

d) If so, where did the original list come from? Is it a list used for

psycholinguistic recognition<br>

of 'real words' and 'pseudo-words' or something like that?<br><br>

e) You mention 111,627 English words; this is another indication that the
list is neither the entire corpus frequency list<br>
nor the 'most frequent 111,627 types in the corpus' (as some have a
frequency of 0).<br><br>

f) If the corpus size is 5,894,564,637 tokens, the entire list cannot

contain only 111,627 types.<br>

The Bank of English corpus in 1993 contained 120,362,928 tokens, and

475,633 types;<br>

in 2000, it contained 418,449,873 tokens and 938,914 types. So surely a corpus
of 5,894,564,637 tokens must contain a much larger number of types?<br><br>
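To put a rough number on this, here is a small Python sketch. It assumes that vocabulary growth follows a power law of the form types = K * tokens^beta (often called Heaps' law), fits beta to the two Bank of English figures above, and extrapolates to the USENET corpus size. This is my own back-of-envelope illustration: the function names are hypothetical, and the two corpora differ in genre and tokenisation, so the result is only an order-of-magnitude guide.<br><br>

```python
import math

# (tokens, types) for the Bank of English, as cited above.
BOE_1993 = (120362928, 475633)
BOE_2000 = (418449873, 938914)
USENET_TOKENS = 5894564637  # corpus size from the announcement


def heaps_exponent(a, b):
    """Fit beta in types = K * tokens**beta through two (tokens, types) points."""
    (n1, v1), (n2, v2) = a, b
    return math.log(v2 / v1) / math.log(n2 / n1)


def extrapolate_types(tokens, ref, beta):
    """Scale the reference type count to a new corpus size (assumes the power law holds)."""
    n_ref, v_ref = ref
    return v_ref * (tokens / n_ref) ** beta


beta = heaps_exponent(BOE_1993, BOE_2000)
estimate = extrapolate_types(USENET_TOKENS, BOE_2000, beta)
print(f"beta = {beta:.3f}, estimated types = {estimate:,.0f}")
```

Whatever the exact figure, an estimate of this kind comes out far above 111,627 types.<br><br>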

Best<br>

Ramesh<br><br>

At 17:46 31/08/2006, you wrote:<br>

<blockquote type=cite class=cite cite="">Hi All, <br>

I thought that this might be of interest to the list. I have also

experimented with using a CC Attribution-NonCommercial-NoDerivs license

for this word frequency list. Please tell me if you think this is a good

or a bad idea.<br><br>

Thanks, <br>

Cyrus<br><br>

<br>

*******<br>

Announcement: Word frequencies for a large corpus of USENET text

released.<br>

*******<br>

The Westbury Lab at the University of Alberta does research on lexical
semantics and other areas of psycholinguistics. Recently, as part of a
research program investigating high-dimensional models of semantic
memory, they collected 5,894,564,637 words from 47,860 English language,
non-binary-file newsgroups from the
USENET between October 2005 and August 2006. This list of orthographic
frequencies for 111,627 English words will be
of use to anyone who has used older lists based on corpora from decades
past.<br>

The list is available for download (3.3 MB file) under a Creative
Commons 2.5 license at:<br>

<a href="http://www.psych.ualberta.ca/~westburylab/downloads/wlfreq.download.html" eudora="autourl">

http://www.psych.ualberta.ca/~westburylab/downloads/wlfreq.download.html</a>

<br>

  <br><br>

=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}<br>

Cyrus Shaoul<br>

<a href="http://www.psych.ualberta.ca/~westburylab/" eudora="autourl">

http://www.psych.ualberta.ca/~westburylab/</a><br>

University of Alberta<br>

780-492-5843<br>

=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}<br><br>

<br><br>

<br>

</blockquote>

<x-sigsep><p></x-sigsep>

Ramesh Krishnamurthy<br><br>

Lecturer in English Studies, School of Languages and Social Sciences,

Aston University, Birmingham B4 7ET, UK<br>

[Room NX08, North Wing of Main Building] ; Tel: +44 (0)121-204-3812 ;

Fax: +44 (0)121-204-3766<br>

<a href="http://www.aston.ac.uk/lss/staff/krishnamurthyr.jsp" eudora="autourl">
http://www.aston.ac.uk/lss/staff/krishnamurthyr.jsp</a><br><br>
Project Leader, ACORN (Aston Corpus Network):

<a href="http://corpus.aston.ac.uk/" eudora="autourl">

http://corpus.aston.ac.uk/</a></body>

</html>