[Corpora-List] UPDATE: Corrected Word frequencies for a large corpus of recent USENET text, and full list of types. New query tool.

Cyrus Shaoul cyrus.shaoul at ualberta.ca
Tue Sep 5 06:45:39 UTC 2006


Adam Kilgarriff wrote:
> Just a comment about this kind of resource: wouldn't it be better to make it
> available as a searchable resource, allowing people to specify the searches
> they wanted and check up on anomalous frequencies, rather than distributing
> a frequency list which will inevitably raise many questions, for anyone
> planning to seriously use it, which they won't be able to answer (at least
> not without coming back to you, and their questions won't be your priority)
>
> Adam
>
>   
Good point, Adam. I have now made an interactive query tool available here:

http://www.psych.ualberta.ca/~westburylab/downloads/wlallfreq.download.html

It accepts only a single one-word query per submission, but I think that
should be sufficient for most quick searches.

Also, the type list was a little too big for most uses, so I trimmed it down to words that
appear more than 20 times in the corpus (equivalent to more than 0.003 occurrences per million words).
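
For anyone who wants to move between raw counts and per-million rates, here is a minimal Python sketch of the arithmetic behind that threshold. The function names are made up for illustration and are not part of the query tool or the distributed files. The total corpus size is not stated in this message, so the sketch only derives the size implied by the two figures above (roughly 6.7 billion tokens); treat that number as an inference, not an official count.

```python
# Hypothetical helpers; none of these names come from the Westbury Lab tools.

THRESHOLD_COUNT = 20           # raw-count cutoff quoted in the message
THRESHOLD_PER_MILLION = 0.003  # equivalent rate quoted in the message


def per_million(count, corpus_tokens):
    """Convert a raw count into occurrences per million tokens."""
    return count * 1_000_000 / corpus_tokens


def implied_corpus_size(count, rate_per_million):
    """Back out the corpus size implied by a count and its per-million rate."""
    return count * 1_000_000 / rate_per_million


def trim_type_list(freqs, min_count=THRESHOLD_COUNT):
    """Keep only the types that occur MORE than min_count times."""
    return {word: n for word, n in freqs.items() if n > min_count}


if __name__ == "__main__":
    # Assumption: the two quoted figures describe the same cutoff exactly.
    size = implied_corpus_size(THRESHOLD_COUNT, THRESHOLD_PER_MILLION)
    print(f"implied corpus size: about {size:,.0f} tokens")

    # Toy counts, invented purely for illustration.
    toy = {"the": 1000, "borderline": 20, "kept": 21}
    print(trim_type_list(toy))
```

Note that the cutoff is strict: a word seen exactly 20 times is dropped, matching "more than 20 times" above.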

Please send me any feedback you have. (I do like answering questions, but I am
busy with other work, as Adam surmised.)

Thanks, 

Cyrus


