[Corpora-List] Wonky ngrams

Mark Davies Mark_Davies at byu.edu
Fri Jan 4 14:04:58 UTC 2013


In my interface to the Google Books n-grams (http://googlebooks.byu.edu/), the actual frequency data is displayed, and the numbers there do make sense: "in spite" is slightly more frequent than "in spite of", as you would expect.

For example (from the 155 billion word American English n-grams):

in spite (1990s): 289,536 tokens
http://googlebooks.byu.edu/?c=us&q=20263989

in spite of (1990s): 287,612 tokens
http://googlebooks.byu.edu/?c=us&q=20263992
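
The check behind "the numbers do make sense" can be sketched in a few lines. This is only an illustration, not part of the interface: the helper name is_consistent and the table layout are assumptions, and only the two counts quoted above are used. The idea is that every token of "in spite of" contains a token of "in spite", so the trigram should never be more frequent than the bigram it starts with.

```python
# 1990s token counts quoted above (American English n-grams).
counts = {
    ("in", "spite"): 289_536,
    ("in", "spite", "of"): 287_612,
}


def is_consistent(prefix, extension, table):
    """Return True if the longer n-gram is no more frequent than its prefix.

    Hypothetical helper for illustration: an (n+1)-gram can only occur
    where its n-gram prefix occurs, so its count should not be larger.
    """
    if extension[:len(prefix)] != prefix:
        raise ValueError("extension must start with prefix")
    return table[extension] <= table[prefix]


bigram = ("in", "spite")
trigram = ("in", "spite", "of")

consistent = is_consistent(bigram, trigram, counts)
# Tokens of "in spite" that are followed by something other than "of".
other_continuations = counts[bigram] - counts[trigram]

print(consistent, other_continuations)
```

With the counts above the check passes, and the difference is the number of "in spite" tokens not followed by "of". A graph in which "in spite of" sits above "in spite" would fail this check.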

Mark D.

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================




From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Brett Reynolds [Brett.Reynolds at humber.ca]
Sent: Friday, January 04, 2013 5:04 AM
To: corpora at uib.no
Subject: [Corpora-List] Wonky ngrams


Can anyone explain why "in spite of" would have a higher frequency than "in spite" in the following graph from Google ngrams?
http://goo.gl/u7J3F


-------------------------------------


Brett Reynolds
English Language Centre
Humber Institute of Technology and Advanced Learning
Lakeshore Campus
Toronto, Ontario
Phone: 416-675-6622 ex. 3106


brett.reynolds at humber.ca
