<html dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style id="owaParaStyle">P {
MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px
}
</style>
</head>
<body fPStyle="1" ocsi="0">
<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 8pt;">
<p>>> Just below the graph is a link to download all of the Google Books Ngram data as CSV files (including years).<br>
</p>
<p>Yes, I know that one can download all of the n-grams data; that's what I've used for my version:
<a href="http://googlebooks.byu.edu/">http://googlebooks.byu.edu/</a></p>
<p> </p>
<p>But there are *billions* of rows of data, and it takes a fairly powerful machine to process these. For a typical end-user, who just wants the data for 5-10 strings, there's no way to get this data from the web interface.
</p>
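<p>For readers who do download the files, a minimal sketch of pulling out the yearly counts for a handful of strings might look like the following. It assumes each row is tab-separated, with the n-gram, the year, and the match count as the first three fields; the function name and the sample rows are made up for illustration. Reading line by line keeps memory use low, though a full pass over billions of rows will still take a long time.</p>

```python
import io
from collections import defaultdict


def extract_counts(lines, targets):
    """Stream tab-separated n-gram rows and keep yearly counts for the target strings only."""
    wanted = set(targets)
    counts = defaultdict(dict)  # ngram -> {year: match_count}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: ngram, year, match_count, then any further columns.
        if len(fields) < 3 or fields[0] not in wanted:
            continue
        ngram, year, match_count = fields[0], int(fields[1]), int(fields[2])
        counts[ngram][year] = counts[ngram].get(year, 0) + match_count
    return dict(counts)


# Tiny invented sample standing in for one downloaded file.
SAMPLE = io.StringIO(
    "all of the\t1900\t120\t100\t90\n"
    "all of the\t1901\t150\t130\t110\n"
    "many of the\t1900\t300\t250\t200\n"
    "none of the\t1900\t80\t70\t60\n"
)

result = extract_counts(SAMPLE, ["all of the", "many of the"])
print(result)
```

<p>With a real file, the same function could be passed an open file handle in place of the in-memory sample.</p>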
<p> </p>
<p>MD</p>
<div>
<p> </p>
<div style="FONT-FAMILY: Tahoma; FONT-SIZE: 13px">
<p>============================================<br>
Mark Davies<br>
Professor of Linguistics / Brigham Young University<br>
<a href="http://davies-linguistics.byu.edu/">http://davies-linguistics.byu.edu/</a></p>
<p>** Corpus design and use // Linguistic databases **<br>
** Historical linguistics // Language variation **<br>
** English, Spanish, and Portuguese **<br>
============================================<br>
</p>
</div>
</div>
<div style="FONT-FAMILY: Times New Roman; COLOR: #000000; FONT-SIZE: 16px">
<hr tabindex="-1">
<div style="DIRECTION: ltr" id="divRpF834853"><font color="#000000" size="2" face="Tahoma"><b>From:</b> Chris Fournier [chris.m.fournier@gmail.com]<br>
<b>Sent:</b> Monday, May 14, 2012 12:12 PM<br>
<b>To:</b> Mark Davies<br>
<b>Cc:</b> Brett Reynolds; Corpora List<br>
<b>Subject:</b> Re: [Corpora-List] Diachronic frequency change<br>
</font><br>
</div>
<div></div>
<div><a href="http://books.google.com/ngrams" target="_blank">Just below the graph</a> is a
<a href="http://books.google.com/ngrams/datasets" target="_blank">link to download all of the Google Books Ngram data</a> as CSV files (including years).<br>
<br>
<div class="gmail_quote">On Mon, May 14, 2012 at 1:37 PM, Mark Davies <span dir="ltr">
<<a href="mailto:Mark_Davies@byu.edu" target="_blank">Mark_Davies@byu.edu</a>></span> wrote:<br>
<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">
My question re. Google Books data is how you can even compare A vs B in the first place. After all, with the standard Google Books interface, the frequency charts are just "pictures". There are no actual numbers to plug into a spreadsheet. So how do you compare
one "picture" to another, decade by decade?<br>
<br>
With <a href="http://googlebooks.byu.edu/" target="_blank">http://googlebooks.byu.edu/</a> (or the 400 million word COHA:
<a href="http://corpus.byu.edu/coha/" target="_blank">http://corpus.byu.edu/coha/</a>), on the other hand, you do have access to the frequency, decade by decade. For more info, see
<a href="http://googlebooks.byu.edu/compare-googleBooks.asp" target="_blank">http://googlebooks.byu.edu/compare-googleBooks.asp</a>.<br>
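As a rough illustration of what "frequency, decade by decade" means in practice, the sketch below groups yearly counts into decades and normalizes them per million words. This is not how either interface is implemented; the function name and all numbers are hypothetical, and it assumes you already have yearly counts for the string and yearly corpus totals.<br>

```python
from collections import defaultdict


def per_million_by_decade(string_counts, total_counts):
    """Aggregate yearly counts into decades and return frequency per million words."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, n in string_counts.items():
        hits[year // 10 * 10] += n      # 1905 -> 1900, 1911 -> 1910
    for year, n in total_counts.items():
        totals[year // 10 * 10] += n
    return {
        decade: hits[decade] * 1_000_000 / totals[decade]
        for decade in sorted(totals)
        if totals[decade] > 0
    }


# Invented numbers, for illustration only.
string_counts = {1900: 120, 1905: 180, 1911: 600}
total_counts = {1900: 1_000_000, 1905: 2_000_000, 1911: 4_000_000}

by_decade = per_million_by_decade(string_counts, total_counts)
print(by_decade)
```

Numbers in this form can be pasted into a spreadsheet and compared across strings, which is not possible with a chart alone.<br>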
<br>
Mark D.<br>
<br>
============================================<br>
Mark Davies<br>
Professor of Linguistics / Brigham Young University<br>
<a href="http://davies-linguistics.byu.edu/" target="_blank">http://davies-linguistics.byu.edu/</a><br>
<br>
** Corpus design and use // Linguistic databases **<br>
** Historical linguistics // Language variation **<br>
** English, Spanish, and Portuguese **<br>
============================================<br>
<br>
________________________________________<br>
From: <a href="mailto:corpora-bounces@uib.no" target="_blank">corpora-bounces@uib.no</a> [<a href="mailto:corpora-bounces@uib.no" target="_blank">corpora-bounces@uib.no</a>] on behalf of Brett Reynolds [<a href="mailto:brettrey@gmail.com" target="_blank">brettrey@gmail.com</a>]<br>
Sent: Friday, May 11, 2012 7:27 AM<br>
To: Corpora List<br>
Subject: [Corpora-List] Diachronic frequency change<br>
<div class="HOEnZb">
<div class="h5"><br>
The string "all of the", for example, demonstrates a dramatic increase in frequency as a percentage of the entire corpus leading up to about 1920 as can be seen in this Google Ngram graph:<br>
<br>
<a href="http://tinyurl.com/c2mnoor" target="_blank">http://tinyurl.com/c2mnoor</a><br>
<br>
Since this is a percentage, it shows an increase relative to other words. If you wanted to test for significance, would it make sense to simply use this comparison (string vs entire corpus) or would it make more sense to compare it to another similar string
such as "many of the"? What statistical test would you use? Would it be best to compare the nadir and the peak, or to repeatedly compare consecutive years?<br>
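<br>
To make the first option (string vs entire corpus, nadir vs peak) concrete, here is a minimal sketch of one possible test, a log-likelihood (G2) statistic of the kind often used for comparing frequencies across corpora. It is only one candidate, not a recommendation; the function name and the counts are invented for illustration. Note that with corpora this large, even small differences will tend to come out as significant.<br>

```python
import math


def log_likelihood(count_1, count_2, total_1, total_2):
    """G2 for a string observed count_1 times in total_1 words and count_2 times in total_2 words."""
    combined = count_1 + count_2
    # Expected counts if the string were equally frequent in both periods.
    expected_1 = total_1 * combined / (total_1 + total_2)
    expected_2 = total_2 * combined / (total_1 + total_2)
    g2 = 0.0
    for observed, expected in ((count_1, expected_1), (count_2, expected_2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Invented counts: 100 hits per million words in one period, 300 in another.
g2 = log_likelihood(100, 300, 1_000_000, 1_000_000)

# 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.
print(g2, g2 > 3.84)
```

The same function could be applied to consecutive years or decades instead of nadir and peak, at the cost of running many tests.<br>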
<br>
I expect that the answers will be something like "that depends on your purpose." Currently, however, I don't really have a purpose. I'm just poking around, observing, and learning.<br>
<br>
Best,<br>
Brett<br>
<br>
-----------------------<br>
Brett Reynolds<br>
English Language Centre<br>
Humber College Institute of Technology and Advanced Learning<br>
Toronto, Ontario, Canada<br>
<a href="mailto:brett.reynolds@humber.ca" target="_blank">brett.reynolds@humber.ca</a><br>
<br>
<br>
<br>
<br>
<br>
_______________________________________________<br>
UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">
http://mailman.uib.no/options/corpora</a><br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</body>
</html>