[Corpora-List] Diachronic frequency change

Chris Fournier chris.m.fournier at gmail.com
Mon May 14 18:12:00 UTC 2012


Just below the graph <http://books.google.com/ngrams> is a link to download
all of the Google Books Ngram data <http://books.google.com/ngrams/datasets> as
CSV files (including years).
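
If it helps, here is a minimal Python sketch for pulling one n-gram's per-year counts out of those files and turning them into a percentage. It assumes each line holds ngram, year, match_count, page_count, volume_count separated by tabs (the separator is a parameter, so check the README for the release you download). The per-year totals you divide by are assumed to come from the total-counts file distributed with the data. The function names are hypothetical.

```python
from collections import defaultdict


def yearly_counts(lines, target, sep="\t"):
    """Sum match_count per year for one n-gram.

    Assumed row layout (check the README for your release):
    ngram, year, match_count, page_count, volume_count
    """
    counts = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\r\n").split(sep)
        if len(fields) < 3 or fields[0] != target:
            continue  # skip malformed rows and other n-grams
        counts[int(fields[1])] += int(fields[2])
    return dict(counts)


def relative_frequency(counts, totals):
    """Per-year frequency as a percentage of a per-year total you supply.

    `totals` maps year -> total count for that year (assumed to be read
    from the total-counts file); years with no total are skipped.
    """
    return {y: 100.0 * c / totals[y] for y, c in counts.items() if totals.get(y)}


# Usage (the file name is a placeholder):
# with open("ngrams-part.csv", encoding="utf-8") as f:
#     counts = yearly_counts(f, "all of the")
```

The files are large, so reading them line by line like this keeps memory use down to one row at a time.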

On Mon, May 14, 2012 at 1:37 PM, Mark Davies <Mark_Davies at byu.edu> wrote:

> My question re. Google Books data is how you can even compare A vs B in
> the first place. After all, with the standard Google Books interface, the
> frequency charts are just "pictures". There are no actual numbers to plug
> into a spreadsheet. So how do you compare one "picture" to another, decade
> by decade?
>
> With http://googlebooks.byu.edu/ (or the 400 million word COHA:
> http://corpus.byu.edu/coha/), on the other hand, you do have access to the
> frequency, decade by decade. For more info, see
> http://googlebooks.byu.edu/compare-googleBooks.asp.
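
Once you have per-year counts (from the downloadable files, for instance), getting decade-by-decade numbers of this kind is a matter of binning and normalising. A minimal Python sketch, where the decade corpus sizes are an input you must supply and the function names are hypothetical:

```python
from collections import defaultdict


def by_decade(yearly):
    """Sum a {year: count} mapping into {decade_start: count}, e.g. 1923 -> 1920."""
    decades = defaultdict(int)
    for year, count in yearly.items():
        decades[year - year % 10] += count
    return dict(sorted(decades.items()))


def per_million(hits, totals):
    """Normalise decade hits by decade corpus size, in occurrences per million words.

    `totals` maps decade_start -> corpus size in words for that decade;
    decades without a size are skipped.
    """
    return {d: 1_000_000 * n / totals[d] for d, n in hits.items() if totals.get(d)}
```

The decade sizes can come from the same binning applied to per-year totals, i.e. per_million(by_decade(hits), by_decade(totals)), which gives one comparable number per decade for each string.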
>
> Mark D.
>
> ============================================
> Mark Davies
> Professor of Linguistics / Brigham Young University
> http://davies-linguistics.byu.edu/
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
>
> ________________________________________
> From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Brett
> Reynolds [brettrey at gmail.com]
> Sent: Friday, May 11, 2012 7:27 AM
> To: Corpora List
> Subject: [Corpora-List] Diachronic frequency change
>
> The string "all of the", for example, shows a dramatic increase in
> frequency, as a percentage of the entire corpus, in the years leading up to
> about 1920, as can be seen in this Google Ngram graph:
>
> http://tinyurl.com/c2mnoor
>
> Since this is a percentage, it shows an increase relative to other words.
> If you wanted to test for significance, would it make sense to simply use
> this comparison (string vs entire corpus) or would it make more sense to
> compare it to another similar string such as "many of the"? What
> statistical test would you use? Would it be best to compare the nadir and
> the peak, or to repeatedly compare consecutive years?
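
One common choice in corpus linguistics for this kind of question is the log-likelihood (G2) test on a 2x2 table of counts; chi-square or Fisher's exact test would be reasonable alternatives. Below is a minimal Python sketch covering the three set-ups asked about: the string against the rest of the corpus, the string against a comparable string such as "many of the", and consecutive years with a Bonferroni correction for the number of tests. The function names are hypothetical, and the counts are whatever you extract from the data. Two caveats: with corpora this large almost any difference will come out significant, so it is worth reporting an effect size (such as the ratio of the two relative frequencies) alongside p; and the test treats every token as independent, which does not hold for text, where occurrences cluster within books, so the p-values are likely to be optimistic.

```python
import math


def g2_2x2(table):
    """Log-likelihood ratio (G2) test for a 2x2 table of observed counts.

    table = [[a, b], [c, d]]: rows are the two categories being compared,
    columns are the two periods. Returns (g2, p), p from chi-square, 1 df.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a cell with 0 observations contributes 0
                expected = row_sums[i] * col_sums[j] / n
                g2 += observed * math.log(observed / expected)
    g2 = max(0.0, 2.0 * g2)  # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(g2 / 2.0))  # upper tail of chi-square with 1 df
    return g2, p


def string_vs_corpus(hits1, total1, hits2, total2):
    """Did the string's share of all tokens change between two periods?"""
    return g2_2x2([[hits1, hits2], [total1 - hits1, total2 - hits2]])


def string_vs_competitor(a1, a2, b1, b2):
    """Did string A's share of (A + B) change? B is a comparable string."""
    return g2_2x2([[a1, a2], [b1, b2]])


def consecutive_years(hits, totals, alpha=0.05):
    """Test each pair of adjacent years present in the data, Bonferroni-correcting alpha.

    Returns (year1, year2, g2, p, significant) tuples.
    """
    years = sorted(y for y in hits if totals.get(y))
    pairs = list(zip(years, years[1:]))
    threshold = alpha / max(len(pairs), 1)
    results = []
    for y1, y2 in pairs:
        g2, p = string_vs_corpus(hits[y1], totals[y1], hits[y2], totals[y2])
        results.append((y1, y2, g2, p, p < threshold))
    return results
```

For example, calling string_vs_competitor with the "all of the" counts at the nadir and peak years as a1, a2 and the "many of the" counts as b1, b2 asks whether "all of the" gained ground on that competitor specifically, rather than on the corpus as a whole.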
>
> I expect that the answers will be something like "that depends on your
> purpose." Currently, however, I don't really have a purpose. I'm just
> poking around, observing, and learning.
>
> Best,
> Brett
>
> -----------------------
> Brett Reynolds
> English Language Centre
> Humber College Institute of Technology and Advanced Learning
> Toronto, Ontario, Canada
> brett.reynolds at humber.ca
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora