[Corpora-List] Re: problems with Google counts
FIDELHOLTZ_DOOCHIN_JAMES_LAWRENCE
jfidel at siu.buap.mx
Thu Mar 17 02:25:50 UTC 2005
Hi, Corpora Guys,
Sorry, I don't remember who suggested simply repeating the word in a Google
query to get a supposedly more realistic count of pages containing it
(I had deleted all those messages after reading them). I tried this
yesterday on a couple of Spanish words (eficaz, eficiente). (By the way,
the results were apparently consonant with a student's search of the
100,000,000-word corpusdelespañol site.) Anyway, what repeating the word
apparently does is limit the results to those pages that contain the word at
least twice, in this case cutting the counts by roughly 10%.
If that is what is happening, this implies serious problems for relatively
rare words, which may not occur twice on very many pages at all. At any
rate, the proportional decrease in pages found was about the same for
both words. (We're talking here about roughly 1.5M
original hits.) If I'm missing the point of the suggestion, please
straighten me out.
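
To make the hypothesis concrete, here is a minimal sketch in Python of the two counts being compared: pages where the word occurs at least once, and pages where it occurs at least twice. It assumes my reading of the repeated-word query is right, which I have not confirmed. The function name page_counts and the toy pages are invented for illustration; they are not Google output or data from my searches.

```python
import re


def page_counts(pages, word):
    """Return (pages with >= 1 occurrence, pages with >= 2 occurrences)."""
    # Whole-word, case-insensitive match; re.escape guards against
    # regex metacharacters in the search word.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    at_least_once = 0
    at_least_twice = 0
    for text in pages:
        n = len(pattern.findall(text))
        if n >= 1:
            at_least_once += 1
        if n >= 2:
            at_least_twice += 1
    return at_least_once, at_least_twice


# Toy pages, invented purely for illustration.
pages = [
    "un método eficaz y un equipo eficaz",
    "una solución eficaz",
    "un proceso eficiente, muy eficiente",
    "nada relevante aquí",
]

once, twice = page_counts(pages, "eficaz")
print(once, twice)
```

If the repeated-word query really behaves like the second count, a word that seldom appears twice on the same page would lose far more than 10% of its hits.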
Jim
James L. Fidelholtz
Posgrado en Ciencias del Lenguaje, ICSyH
Benemérita Universidad Autónoma de Puebla MÉXICO