[Corpora-List] Query on the use of Google for corpus research

Tue May 31 22:56:13 UTC 2005

> But then again, why not go simply to UPenn and purchase some
> license for English Gigaword plus some additional tens of millions
> words corpora from LDC?

For example, because I'm also interested in 1 billion words of Italian,
German and Japanese?  Or because I think that the web can give us a more
varied picture of a language than a newswire corpus? But more generally,
because I think that, with all the linguistic data available out there on
the web (probably orders of magnitude more data than the whole LDC and
ELDA catalogues put together), it is a good idea to develop/gather/share
tools and procedures to get that data into "corpus format"...

Which, of course, does not mean that prefab corpora do not have their
function as well.

Regards,

Marco