[Corpora-List] Wonky ngrams
Mark Davies
Mark_Davies at byu.edu
Fri Jan 4 14:04:58 UTC 2013
In my interface to the Google Books n-grams (http://googlebooks.byu.edu/), the actual frequency data is displayed, and the numbers do make sense.
For example (from the 155-billion-word American English n-grams):
in spite (1990s): 289,536 tokens
http://googlebooks.byu.edu/?c=us&q=20263989
in spite of (1990s): 287,612 tokens
http://googlebooks.byu.edu/?c=us&q=20263992
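Why these numbers "make sense": every occurrence of the trigram "in spite of" also contains the bigram "in spite", so in raw token counts the bigram can never be rarer than the trigram. The following is a minimal sketch of that check, not code from the interface itself; the toy sentence and the helper name `ngram_counts` are illustrative assumptions, and only the two 1990s figures are taken from the message above.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-token sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy text (invented for illustration, not drawn from the n-gram data).
tokens = "in spite of the rain we left , in spite of it all , and in spite , she stayed".split()

bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

# Each "in spite of" token is also an "in spite" token, so this always holds.
assert bigrams[("in", "spite")] >= trigrams[("in", "spite", "of")]

# The 1990s raw counts quoted above satisfy the same relation.
in_spite = 289_536
in_spite_of = 287_612
assert in_spite >= in_spite_of

# Share of "in spite" tokens that are followed by "of".
share = in_spite_of / in_spite
```

On the quoted figures, `share` comes out a little above 0.99, i.e. nearly every "in spite" in that decade is followed by "of".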
Mark D.
============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Brett Reynolds [Brett.Reynolds at humber.ca]
Sent: Friday, January 04, 2013 5:04 AM
To: corpora at uib.no
Subject: [Corpora-List] Wonky ngrams
Can anyone explain why "in spite of" would have a higher frequency than "in spite" in the following graph from Google ngrams?
http://goo.gl/u7J3F
-------------------------------------
Brett Reynolds
English Language Centre
Humber Institute of Technology and Advanced Learning
Lakeshore Campus
Toronto, Ontario
Phone: 416-675-6622 ex. 3106
brett.reynolds at humber.ca