[Corpora-List] The size of Internet in words
Mark Davies
Mark_Davies at byu.edu
Tue Jan 20 19:22:11 UTC 2004
Serge Sharoff wrote:
> Does anyone know the size of Internet in terms of words and
> relative to languages [and] the amount of texts for a
> given language?
Some possible starting points:
(Previous CORPORA discussion; Dec 2001)
http://helmer.aksis.uib.no/corpora/2001-4/0161.html
(Paper by Bill Fletcher; originally written 2001)
http://www.kwicfinder.com/FletcherCLLT2001.pdf
(Widely-quoted 2000 paper by Grefenstette and Nioche)
http://arxiv.org/ftp/cs/papers/0006/0006032.pdf
(Dec 2003; by language; but not by words)
http://www.caslon.com.au/metricsguide6.htm
(April 2003; by language; but not by words)
http://www.dlib.org/dlib/april03/lavoie/04lavoie.html
Mark Davies
=================================================
Mark Davies
Assoc. Prof., Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
http://davies-linguistics.byu.edu
** Corpus design and use // Web-database scripting **
** Historical linguistics // Functional-typological grammar **
** Spanish and Portuguese historical and dialectal syntax **
=================================================