[Corpora-List] The size of Internet in words

Mark Davies Mark_Davies at byu.edu
Tue Jan 20 19:22:11 UTC 2004


Serge Sharoff wrote:

> Does anyone know the size of Internet in terms of words and 
> relative to languages [and] the amount of texts for a 
> given language?

Some possible starting points:

(Previous CORPORA discussion; Dec 2001) 
http://helmer.aksis.uib.no/corpora/2001-4/0161.html

(Paper by Bill Fletcher; originally written 2001)
http://www.kwicfinder.com/FletcherCLLT2001.pdf

(Widely quoted 2000 paper by Grefenstette and Nioche)
http://arxiv.org/ftp/cs/papers/0006/0006032.pdf

(Dec 2003; figures by language, but not in words)
http://www.caslon.com.au/metricsguide6.htm

(April 2003; figures by language, but not in words)
http://www.dlib.org/dlib/april03/lavoie/04lavoie.html

Mark Davies

=================================================
Mark Davies
Assoc. Prof., Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
http://davies-linguistics.byu.edu

** Corpus design and use // Web-database scripting **
** Historical linguistics // Functional-typological grammar **
** Spanish and Portuguese historical and dialectal syntax **
================================================= 

