[Corpora-List] The size of Internet in words

Serge Sharoff s.sharoff at leeds.ac.uk
Tue Jan 20 16:22:42 UTC 2004

Does anyone know the size of the Internet in terms of words, broken down
by language?  Google shows the number of documents on its front page
(3,307,998,701 at the time of writing), and there is a comparative
analysis of the databases used by various search engines at:

Two things cannot be determined from these statistics: the number of words
of real text per page and the amount of text in a given language.

The first question is partly addressed by an older statistical survey:
Can we estimate that 6 terabytes per 800 million pages puts the average
page length at 7.5 KB, or about 1000 words (in English)?  If so, the size of
the modern Internet would be about 3 terawords, if it were English only. But can
we trust this, and what about the distribution over different languages?
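
For anyone who wants to check the arithmetic, here is a rough sketch of the
estimate in Python. The page counts and the 6 terabyte figure are the ones
quoted above; the 7.5 bytes per word is only the ratio implied by equating
7.5 KB with about 1000 words, not a measured value, and decimal units
(1 KB = 1000 bytes) are assumed. The names are made up for illustration.

```python
# Back-of-the-envelope estimate of the Web's size in words.
# Figures quoted in the message:
SURVEY_BYTES = 6 * 10**12      # 6 terabytes (decimal units assumed)
SURVEY_PAGES = 800 * 10**6     # 800 million pages in the older survey
GOOGLE_PAGES = 3_307_998_701   # documents reported on Google's front page

# Assumption: ratio implied by "7.5 KB, or about 1000 words".
# It lumps together word length, whitespace and any markup overhead.
BYTES_PER_WORD = 7.5


def average_page_bytes(total_bytes, total_pages):
    """Average page size in bytes."""
    return total_bytes / total_pages


def estimated_words(pages, page_bytes, bytes_per_word):
    """Total words if every page had the average size and was English."""
    return pages * (page_bytes / bytes_per_word)


page_bytes = average_page_bytes(SURVEY_BYTES, SURVEY_PAGES)
words_per_page = page_bytes / BYTES_PER_WORD
total_words = estimated_words(GOOGLE_PAGES, page_bytes, BYTES_PER_WORD)

print(f"average page size: {page_bytes / 1000:.1f} KB")
print(f"words per page:    {words_per_page:.0f}")
print(f"total words:       {total_words / 10**12:.1f} terawords")
```

The result is only as good as the bytes-per-word ratio: if a large share of
each page is markup rather than real text, the word count drops accordingly.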


Dr. Serge Sharoff
Centre for Translation Studies
School of Modern Languages and Cultures
University of Leeds
Leeds, LS2 9JT

tel: +44(0)113 343 7287
fax: +44(0)113 343 3287
WWW: http://www.comp.leeds.ac.uk/ssharoff/
