[Corpora-List] Diachronic frequency change

Mark Davies Mark_Davies at byu.edu
Mon May 14 17:37:24 UTC 2012


My question re. Google Books data is how you can even compare A vs B in the first place. After all, with the standard Google Books interface, the frequency charts are just "pictures". There are no actual numbers to plug into a spreadsheet. So how do you compare one "picture" to another, decade by decade?

With http://googlebooks.byu.edu/ (or the 400-million-word COHA: http://corpus.byu.edu/coha/), on the other hand, you do have access to the frequencies, decade by decade. For more info, see http://googlebooks.byu.edu/compare-googleBooks.asp.
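Once you have decade-by-decade raw counts, the usual first step is to normalize each count by the size of that decade's subcorpus, so that decades of different sizes can be compared directly. Here is a minimal sketch in Python. It assumes you have exported the raw hits for a string and the total word count for each decade. The numbers and the helper name `per_million` are hypothetical and only illustrate the arithmetic:

```python
# Hypothetical decade-by-decade figures (invented for illustration):
# raw hits for one string, and total words in each decade's subcorpus.
raw_hits = {1900: 1200, 1910: 1500, 1920: 1800}
decade_sizes = {1900: 20_000_000, 1910: 22_000_000, 1920: 25_000_000}


def per_million(hits: dict[int, int], sizes: dict[int, int]) -> dict[int, float]:
    """Normalize raw counts to occurrences per million words, decade by decade."""
    # Multiply before dividing so exact cases stay exact in floating point.
    return {decade: hits[decade] * 1_000_000 / sizes[decade] for decade in sorted(hits)}


if __name__ == "__main__":
    for decade, freq in per_million(raw_hits, decade_sizes).items():
        print(f"{decade}s: {freq:.2f} per million words")
```

Per-million figures (or percentages, as in the Ngram Viewer) are what you plot. Significance tests, however, generally need the underlying raw counts and corpus sizes.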

Mark D.

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Brett Reynolds [brettrey at gmail.com]
Sent: Friday, May 11, 2012 7:27 AM
To: Corpora List
Subject: [Corpora-List] Diachronic frequency change

The string "all of the", for example, demonstrates a dramatic increase in frequency as a percentage of the entire corpus leading up to about 1920, as can be seen in this Google Ngram graph:

http://tinyurl.com/c2mnoor

Since this is a percentage, it shows an increase relative to other words. If you wanted to test for significance, would it make sense to simply use this comparison (string vs. entire corpus), or would it make more sense to compare it to another similar string such as "many of the"? What statistical test would you use? Would it be best to compare the nadir and the peak, or to repeatedly compare consecutive years?
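Both comparisons can be framed as a 2x2 contingency table, so a single test can serve either one. In the first framing, the second row holds "all other words" (corpus size minus hits). In the second, it holds the counts for "many of the". One common choice in corpus linguistics is the log-likelihood (G²) statistic, and Pearson's chi-squared would be applied the same way. Here is a minimal sketch. It assumes raw counts, not percentages, are available for each period. The function name `g2_2x2` and all counts are hypothetical:

```python
import math


def g2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Log-likelihood ratio (G-squared) test for a 2x2 contingency table.

    Layout:            period 1   period 2
        target string      a          b
        comparison row     c          d

    The comparison row is either "all other words" (corpus size minus hits)
    or the counts of a similar string such as "many of the".
    Returns (G2, p), with p from the chi-squared distribution with 1 df.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    # Zero cells contribute 0 (the limit of O*ln(O/E) as O -> 0).
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    g2 = max(g2, 0.0)  # guard against tiny negative values from rounding
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(g2 / 2))
    return g2, p


if __name__ == "__main__":
    # Hypothetical counts, invented for illustration.
    size_nadir, size_peak = 10_000_000, 10_000_000
    all_nadir, all_peak = 500, 900      # "all of the"
    many_nadir, many_peak = 400, 420    # "many of the"

    # (1) String vs. the rest of the corpus, nadir period vs. peak period.
    print(g2_2x2(all_nadir, all_peak, size_nadir - all_nadir, size_peak - all_peak))

    # (2) "all of the" vs. "many of the": has their relative share changed?
    print(g2_2x2(all_nadir, all_peak, many_nadir, many_peak))
```

With corpora the size of Google Books, even very small differences tend to come out as highly significant. An effect size, such as the ratio of the two normalized frequencies, is therefore usually worth reporting alongside the p-value.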

I expect that the answers will be something like "that depends on your purpose." Currently, however, I don't really have a purpose. I'm just poking around, observing, and learning.
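On the nadir-vs.-peak question: comparing every pair of consecutive years means running many tests, which raises the chance that some comparison looks significant purely by chance. One simple and conservative remedy is a Bonferroni correction, which divides the significance threshold by the number of tests. Here is a sketch. It assumes raw yearly counts and yearly corpus sizes are available. The helper names `g2_2x2` and `consecutive_year_tests` and the counts in the example are hypothetical:

```python
import math


def g2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G-squared) statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    cells = [(a, (a + b) * (a + c) / n), (b, (a + b) * (b + d) / n),
             (c, (c + d) * (a + c) / n), (d, (c + d) * (b + d) / n)]
    g2 = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)
    return max(g2, 0.0)  # guard against tiny negative values from rounding


def consecutive_year_tests(hits: list[int], sizes: list[int], alpha: float = 0.05):
    """Compare each year with the next, using a Bonferroni-corrected threshold.

    Returns one (index, G2, p, significant) tuple per consecutive pair.
    """
    m = len(hits) - 1  # number of tests performed
    if m < 1:
        raise ValueError("need at least two years")
    threshold = alpha / m  # Bonferroni: divide alpha by the number of tests
    results = []
    for i in range(m):
        g2 = g2_2x2(hits[i], hits[i + 1], sizes[i] - hits[i], sizes[i + 1] - hits[i + 1])
        p = math.erfc(math.sqrt(g2 / 2))  # chi-squared survival function, 1 df
        results.append((i, g2, p, p < threshold))
    return results


if __name__ == "__main__":
    # Hypothetical yearly counts and corpus sizes, invented for illustration.
    for row in consecutive_year_tests([100, 100, 300], [1_000_000] * 3):
        print(row)
```

A single nadir-vs.-peak comparison avoids the multiple-testing problem. However, choosing the nadir and the peak after looking at the graph is itself a form of selection, so its p-value should be read with some caution.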

Best,
Brett

-----------------------
Brett Reynolds
English Language Centre
Humber College Institute of Technology and Advanced Learning
Toronto, Ontario, Canada
brett.reynolds at humber.ca





_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora