Corpora: Re: Lines of English on Internet

juliewg at nac.net
Mon Dec 3 16:42:39 UTC 2001


The question is a complex one. You need to take several things into
account.

1. Not all Internet content exists in HTML (which by itself would be
difficult to calculate). Pages of on-line full text books, articles,
and white papers still exist in the not-so-common places on the
Internet, such as Gopher and WAIS, and in various university FTP
sources. Are you attempting to count these as well?

2. There are probably millions of lines of text associated with Usenet
newsgroups. Are you trying to account for these as well? There are at
least 30,000 different newsgroups, many of which contain thousands of
lines in their threads.

3. You also need to attempt to calculate some sort of sliding
percentage, so that you can account for the constant growth of the
Internet. It looks like you have already tried to apply probability
theory to this. Does the amount of written content increase by 1% a
month? 5%? There doesn't seem to be any agreed-upon number to use to
calculate growth rate.
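
To show how much the choice of rate matters, here is a rough sketch of
the compounding arithmetic. The function names and the starting size of
100 units are purely illustrative assumptions, not measurements; only
the 1% and 5% monthly rates come from the question above.

```python
import math


def project_size(initial, monthly_rate, months):
    """Size after `months` of compound growth at a fixed monthly rate."""
    return initial * (1 + monthly_rate) ** months


def months_to_double(monthly_rate):
    """Number of months needed for the size to double at a fixed rate."""
    return math.log(2) / math.log(1 + monthly_rate)


if __name__ == "__main__":
    # Hypothetical starting size of 100 units; the two rates are the
    # ones floated in the text (1% and 5% a month).
    for rate in (0.01, 0.05):
        after_year = project_size(100, rate, 12)
        doubling = months_to_double(rate)
        print(f"{rate:.0%} a month: {after_year:.1f} after 12 months, "
              f"doubles in about {doubling:.0f} months")
```

At 1% a month the total grows by roughly 13% in a year, while at 5% a
month it grows by roughly 80%, so a count taken today goes stale at a
very different speed depending on which rate is closer to the truth.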

I am sure I am forgetting to factor in a lot of other variables, as
well. To be honest, I don't even know where to begin the data
collection, and I would have great concern that after I had found a way
to gather the data, it would already be antiquated.

Regards,
Julie Wang-Gempp

Hristo Tanev wrote:

> Dear Corpora List Members,
> Every week I see on this list many interesting
> questions and discussions. I think our email list is
> something very useful and interesting to read!
>
> I want to put here a question, which answer I couldn't
> find in Internet or in the literature I have.
> The question is: approximately how many  pages in
> English exist in Internet?
>
> A friend of mine told me something about the total
> number of pages in Internet (1 milliard). However I
> couldn't find some source, referring to this question.
>
> I tried to calculate the number of pages, using search
> engine and a formula from the probabilistic theory.
>
> The results I obtained were about 50-80 millions of
> pages in English.
>
> I don't know if these figures are wrong, but they seem
> to me too low. Does someone of you know approximately
> how many pages exist in Internet in English language?
> Thank you in advance!
>
> Hristo Tanev
> ITC,Irst


