[Corpora-List] On the use of Google ngrams
Marc Brysbaert
Marc.Brysbaert at UGent.be
Wed Feb 9 11:24:45 UTC 2011
Hi,
In connection with the question of what use validated corpora still have
now that we can all download and analyse millions of words from the
internet, you may be interested in the article we have just published on
the use of the Google Ngrams for psycholinguistic research on word
recognition:
http://www.frontiersin.org/language_sciences/abstract/9569
In a nutshell, we show that the Google word frequencies (i.e. ngrams
with n = 1) do not correlate well with the lexical decision times from
the Elexicon Project and other databases. Furthermore, the correlations
decrease when the frequencies are based on older books. At first sight,
the latter is good news, as it seems to show that the frequencies track
changes in word use over time. However, we also find that frequencies
from books published in 2005 or later are better predictors even of
experiments run in 1990, which suggests that part of the quality
difference is due to the types of books included in the Google project
over the years. So it is worth keeping in mind that apparent differences
in word use over time are partly influenced by the fact that the types
of books included in Google Books may not be constant over the years.
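The kind of comparison described above (word frequencies from different publication periods, each correlated with lexical decision times) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used in the article: it assumes the 1-gram counts have already been parsed into (word, year, match_count) tuples, that mean lexical decision times are available as a dictionary from word to milliseconds (e.g. taken from the Elexicon Project), and it uses log10(count + 1) as the frequency measure, which is a common choice but an assumption here. All function names are hypothetical.

```python
import math
from collections import defaultdict


def frequencies_for_period(rows, start_year, end_year=None):
    """Sum match counts per word over publication years start_year..end_year.

    `rows` is an iterable of (word, year, match_count) tuples; parsing the
    Google 1-gram files into this shape is assumed, not shown here.
    end_year=None means "no upper bound" (e.g. 2005 onward).
    """
    totals = defaultdict(int)
    for word, year, count in rows:
        if year >= start_year and (end_year is None or year <= end_year):
            totals[word] += count
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def frequency_rt_correlation(freqs, mean_rts):
    """Correlate log10(count + 1) with mean lexical decision times.

    Words in `mean_rts` that are missing from `freqs` get a count of 0.
    Higher frequency usually goes with faster responses, so r is expected
    to be negative; a larger |r| means a better predictor.
    """
    words = sorted(mean_rts)
    xs = [math.log10(freqs.get(w, 0) + 1) for w in words]
    ys = [mean_rts[w] for w in words]
    return pearson(xs, ys)
```

To compare periods, the same dictionary of decision times can be correlated once with frequencies_for_period(rows, 2005) and once with an older window such as frequencies_for_period(rows, 1960, 1990); these window boundaries are purely illustrative.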
Kind regards,
Marc Brysbaert
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora