<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
Yes, I think the appeal is in the quick interface: all you have
to do is type in two words and you'll get a cute little graph. A
bunch of people are tweeting them up a storm, and now the developers
have even added a "Tweet" button:<br>
<br>
<a class="moz-txt-link-freetext" href="http://twitter.com/#!/search/ngram">http://twitter.com/#!/search/ngram</a><br>
<br>
But the corpus also has a lot of slips that can't be rectified
without a lot of cleanup. Look at this graph of "hitler" and
"stalin":<br>
<br>
<a class="moz-txt-link-freetext" href="http://ngrams.googlelabs.com/graph?content=hitler%2Cstalin&year_start=1850&year_end=2000&corpus=5&smoothing=3">http://ngrams.googlelabs.com/graph?content=hitler%2Cstalin&year_start=1850&year_end=2000&corpus=5&smoothing=3</a><br>
<br>
Now look at "Hitler" and "Stalin":<br>
<br>
<a class="moz-txt-link-freetext" href="http://ngrams.googlelabs.com/graph?content=Hitler%2C+Stalin&year_start=1850&year_end=2000&corpus=5&smoothing=3">http://ngrams.googlelabs.com/graph?content=Hitler%2C+Stalin&year_start=1850&year_end=2000&corpus=5&smoothing=3</a><br>
<br>
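If you want to line up the two casings yourself, here is a rough sketch of how those two links are put together. The base URL and the parameter names (content, year_start, year_end, corpus, smoothing) are simply copied from the links above; I'm assuming they behave the way the links suggest, not quoting any documented API, and the function names are made up for illustration:<br>
<pre>
```python
from urllib.parse import urlencode

# Base URL and parameter names are copied from the links in this message;
# treat them as observed values, not a documented API.
BASE = "http://ngrams.googlelabs.com/graph"


def ngram_url(words, year_start=1850, year_end=2000, corpus=5, smoothing=3):
    """Build a graph URL for a list of words, preserving their case."""
    params = [
        ("content", ",".join(words)),
        ("year_start", year_start),
        ("year_end", year_end),
        ("corpus", corpus),
        ("smoothing", smoothing),
    ]
    # urlencode percent-encodes the comma, as in the links above.
    return BASE + "?" + urlencode(params)


def case_variants(words):
    """Return the lower-case and the capitalized version of a word list."""
    return [[w.lower() for w in words], [w.capitalize() for w in words]]


# Print one URL per casing so the two graphs can be opened side by side.
for variant in case_variants(["hitler", "stalin"]):
    print(ngram_url(variant))
```
</pre>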
The queries are case-sensitive, which is no big deal, but what's
with all the lower-case "hitler"s from the nineteenth century?
"Beyond the reach of her <i>hitler</i> and withering sarcasm"?
"both in conjunction with his uncle, until the <em>hitler's</em>
retirement in 1819"?<br>
<br>
<a class="moz-txt-link-freetext" href="http://www.google.com/search?q=%22hitler%22&tbs=bks:1,cdr:1,cd_min:1850,cd_max:1853&lr=lang_en">http://www.google.com/search?q=%22hitler%22&tbs=bks:1,cdr:1,cd_min:1850,cd_max:1853&lr=lang_en</a><br>
<br>
Turns out most of them are OCR errors for "bitter" or "latter."
There are also at least two instances where the scanned images for a
twentieth-century book were tacked onto the end of a
nineteenth-century book, with the nineteenth-century metadata. I'm
surprised that there are so many errors for the decade 1850-1860,
though. Maybe the person in charge of OCR for that decade was a
slacker?<br>
<br>
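Finally, there's the "long s problem": the OCR reads the old long s ("&#383;") as an "f," so "mysterious" turns up as "myfterious" in the older books:<br>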
<br>
<a class="moz-txt-link-freetext" href="http://ngrams.googlelabs.com/graph?content=myfterious%2Cmysterious&year_start=1700&year_end=2000&corpus=0&smoothing=5">http://ngrams.googlelabs.com/graph?content=myfterious%2Cmysterious&year_start=1700&year_end=2000&corpus=0&smoothing=5</a><br>
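<br>
One plausible way to clean this up, sketched under some big assumptions: for each token that isn't a known word, try reading each "f" as an "s" and keep the first spelling that shows up in a word list. The vocabulary below is a made-up toy set and the function names are hypothetical; I have no idea how (or whether) the people behind the corpus handle this.<br>
<pre>
```python
from itertools import product


def long_s_candidates(token):
    """Yield every spelling obtained by reading some 'f' as 's'."""
    options = [("f", "s") if ch == "f" else (ch,) for ch in token]
    for combo in product(*options):
        yield "".join(combo)


def restore_long_s(token, vocabulary):
    """Return a known word that token may be a long-s misreading of.

    The token comes back unchanged if it is already a known word or if
    no candidate spelling is found in the vocabulary.
    """
    if token in vocabulary:
        return token
    for candidate in long_s_candidates(token):
        if candidate in vocabulary:
            return candidate
    return token


# Hypothetical toy vocabulary; a real one would come from a word list.
VOCAB = {"mysterious", "first", "suffer", "fish"}

print(restore_long_s("myfterious", VOCAB))  # mysterious
print(restore_long_s("firft", VOCAB))       # first
print(restore_long_s("fuffer", VOCAB))      # suffer
print(restore_long_s("fish", VOCAB))        # fish
```
</pre>
The number of candidates doubles with every "f" in the token, which is fine for single words but worth keeping in mind. It also does nothing for errors like "hitler" for "bitter," where the wrong reading is itself a real word.<br>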
<pre class="moz-signature" cols="72">--
-Angus B. Grieve-Smith
<a class="moz-txt-link-abbreviated" href="mailto:grvsmth@panix.com">grvsmth@panix.com</a>
</pre>
</body>
</html>