Ngram -- absolute numbers and false positives

Joel S. Berson Berson at ATT.NET
Tue Dec 24 19:30:05 UTC 2013


Is there any way in Ngram to get a count of the number of occurrences
of an ngram, rather than a percentage of the "corpus"?  A Google Books
search specifying a date range isn't helpful, since it doesn't give a
count, let alone deal with false positives.
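
One possible workaround, sketched under assumptions: if a total word
count for a given year's corpus can be found somewhere, the percentage
Ngram plots could be multiplied back into an approximate count.  The
function name and the figures below are hypothetical, made up purely
to illustrate the arithmetic; they are not real Ngram data.

```python
def estimated_count(percent, total_words):
    """Turn an Ngram-style percentage into an approximate occurrence count.

    percent     -- the plotted value as a percentage (0.00001 means 0.00001%)
    total_words -- assumed total number of words in that year's corpus
    """
    # A percentage is parts per hundred, so divide by 100 before scaling.
    return round(percent / 100.0 * total_words)


# Hypothetical figures, for illustration only.
print(estimated_count(0.00001, 50_000_000))
```

Any count recovered this way would still include the false positives,
so it could serve only as an upper bound on genuine uses.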

But Ngram also deals out false positives.  I am looking at "nigger"
once again, since a book I'm reading claims a Concord, Mass.,
slaveowner called his slave "an ugly nigger" circa 1774.  Too early,
I think.  Ngram (English corpus) shows initial use in 1789, then not
again until 1798 and 1800.  I'd like to know the absolute numbers.

The OED has a first use as a hostile term -- sense 1.b. -- in 1775
(citing an 1856 publication), and its next quotation is 1811
(although perhaps it left out some in-between).  But its corpus is
not the same as Ngram's, and the OED not only knows the intent of a
use but also eliminates false positives.

For the period 1770--1788, Ngram searches Google Books and returns 10
instances.  Of these I think only one is a true positive; one other
may be (it's preview only); and the remaining eight are optical
scanning errors or Monty Pythons (i.e., something completely different).

Joel

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org


