[Corpora-List] Frequency lists (corrected)

Stefan Evert stefan.evert at uos.de
Mon Feb 23 20:29:11 UTC 2009


> There is, of course, the Google language modeling data, based on over
> a trillion words worth of web pages:
>
>   http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html

In that context, I can't resist pointing out my signature ...

--
The wonders of Googleology (episode 1)

"from collectibles to cars"
	84,700,000 -- Google
	9,443,672 -- Google N-grams (Web 1T5)
	1 -- ukWaC

[ stefan.evert at uos.de | http://purl.org/stefan.evert ]

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora