[Corpora-List] Frequency lists (corrected)
Stefan Evert
stefan.evert at uos.de
Mon Feb 23 20:29:11 UTC 2009
> There is, of course, the Google language modeling data, based on over
> a trillion words worth of web pages:
>
> http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
In that context, I can't resist pointing out my signature ...
--
The wonders of Googleology (episode 1)
"from collectibles to cars"
84,700,000 -- Google
9,443,672 -- Google N-grams (Web 1T5)
1 -- ukWaC
[ stefan.evert at uos.de | http://purl.org/stefan.evert ]
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora