<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
On 5/12/2011 11:15 AM, Mark Davies wrote:
<blockquote
cite="mid:4772975CE1FA44478B61F33DE0E3A633E132E33860@harrow.exch.ad.byu.edu"
type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">Is the corpus itself or part of it available for downloading? It would be more useful if we could process the raw text for our own purpose rather than accessing it from a web interface.
</pre>
</blockquote>
</blockquote>
<pre wrap="">
As mentioned previously, the underlying n-grams data is freely available from Google at <a class="moz-txt-link-freetext" href="http://ngrams.googlelabs.com/datasets">http://ngrams.googlelabs.com/datasets</a> (see <a class="moz-txt-link-freetext" href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</a> re. licensing).
</pre>
</blockquote>
<br>
When I try to use the BYU interface, I get "Session expired. <a
href="http://googlebooks.byu.edu/" target="_top">Click here</a> to
start new session."<br>
<br>
In theory, though, all the books are available for free from
<a class="moz-txt-link-freetext" href="http://books.google.com/">http://books.google.com/</a>. In the Google ngram interface at
<a class="moz-txt-link-freetext" href="http://ngrams.googlelabs.com/">http://ngrams.googlelabs.com/</a> there are links to date ranges. If
you click on one of those, you will see the Google Books search
results for that term within that date range. You can then click the
"Plain text" link in the upper right-hand corner to see the OCRed
text. Then you can appreciate how rough some of the OCR has been.<br>
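<br>
For anyone who wants to work with the downloadable n-gram files rather
than a web interface, here is a minimal Python sketch (not a tool from
Google or BYU) that totals the yearly match counts for one n-gram. It
assumes the tab-separated layout of the 2009 (version 1) files, i.e.
ngram, year, match_count, page_count, volume_count, and that the file
has already been unzipped. The helper name yearly_counts and the sample
lines are made up for illustration.<br>
<pre wrap="">
```python
import csv
import io

# Assumption: each line follows the 2009 (version 1) Google Books n-gram
# layout: ngram TAB year TAB match_count TAB page_count TAB volume_count.
# Other dataset versions use different columns, so check before reusing.


def yearly_counts(lines, target):
    """Sum match_count per year for one n-gram (hypothetical helper)."""
    totals = {}
    # QUOTE_NONE: n-grams can contain quote characters, so treat them literally.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 5:
            continue  # skip malformed or differently formatted lines
        ngram, year, match_count, _pages, _volumes = row
        if ngram == target:
            totals[int(year)] = totals.get(int(year), 0) + int(match_count)
    return totals


# Tiny invented sample in the assumed format, for illustration only.
sample = io.StringIO(
    "corpus\t1999\t12\t10\t8\n"
    "corpus\t2000\t15\t14\t9\n"
    "corpora\t2000\t4\t4\t3\n"
)
print(yearly_counts(sample, "corpus"))  # {1999: 12, 2000: 15}
```
</pre>
To run it on a real file, pass a handle opened in text mode instead of
the sample; lines that do not have exactly five columns are skipped
rather than raising an error.<br>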
<pre class="moz-signature" cols="72">--
-Angus B. Grieve-Smith
<a class="moz-txt-link-abbreviated" href="mailto:grvsmth@panix.com">grvsmth@panix.com</a>
</pre>
</body>
</html>