[Corpora-List] Word frequencies for a large corpus of recent USENET text
Cyrus Shaoul
cyrus.shaoul at ualberta.ca
Thu Aug 31 16:46:26 UTC 2006
Hi All,
I thought that this might be of interest to the list. I have also experimented with using a CC Attribution-NonCommercial-NoDerivs license for this word frequency list. Please tell me if you think this is a good or a bad idea.
Thanks,
Cyrus
*******
Announcement: Word frequencies for a large corpus of USENET text released.
*******
The Westbury Lab at the University of Alberta does research on lexical
semantics and other areas of psycholinguistics. Recently, as part of a
research program investigating high-dimensional models of semantic memory,
they collected 5,894,564,637 words from 47,860 English-language,
non-binary-file newsgroups on USENET between October 2005 and August 2006.
This list of orthographic frequencies for 111,627 English words will be
of use to anyone who has used older lists based on corpora from decades
past.
The list is available for download (3.3 MB file) under a Creative
Commons Attribution-NonCommercial-NoDerivs 2.5 license at:
http://www.psych.ualberta.ca/~westburylab/downloads/wlfreq.download.html
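When comparing these counts with older lists built from smaller corpora, it usually helps to normalize raw counts to occurrences per million words. The following is only a minimal sketch of that arithmetic. It assumes the list provides a raw occurrence count for each word, which is not confirmed above; the function name and the example count are hypothetical. Only the corpus size comes from the announcement.

```python
# Corpus size as stated in the announcement.
CORPUS_TOKENS = 5_894_564_637


def per_million(raw_count, corpus_tokens=CORPUS_TOKENS):
    """Convert a raw occurrence count to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_tokens


# Hypothetical raw count, for illustration only (not taken from the list).
example_count = 1_234_567
print(f"{per_million(example_count):.2f} occurrences per million words")
```

Per-million values from corpora of different sizes can be placed side by side directly, which raw counts cannot.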
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
Cyrus Shaoul
http://www.psych.ualberta.ca/~westburylab/
University of Alberta
780-492-5843
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}