Ngram -- absolute numbers and false positives
Joel S. Berson
Berson at ATT.NET
Tue Dec 24 19:30:05 UTC 2013
Is there any way in Ngram to get a count of the number of occurrences
of an ngram, rather than a percentage of the "corpus"? A Google Books
search specifying a date range isn't helpful, since it doesn't give a
count, let alone deal with false positives.
But Ngram also deals out false positives. I am looking at "nigger"
once again, since a book I'm reading claims a Concord, Mass.,
slaveowner called his slave "an ugly nigger" circa 1774. Too early,
I think. Ngram (English corpus) shows initial use in 1789, then not
again until 1798 and 1800. I'd like to know the absolute numbers.
The OED has a first use as a hostile term -- sense 1.b. -- in 1775
(citing an 1856 publication), and its next quotation is 1811
(although perhaps it left out some in-between). But its corpus is
not the same as Ngram's, and the OED not only knows the intent of a
use but also eliminates false positives.
For the period 1770--1788, Ngram searches Google Books and returns 10
instances. Of these I think only one is a true positive; one other
may be (it's preview only); and the remaining eight are optical
scanning errors or Monty Pythons (i.e., something completely different).
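
As a rough tally of the 1770--1788 hits, here is a minimal Python
sketch of the arithmetic. The function name precision_bounds and its
parameters are illustrative assumptions of mine, not anything Ngram or
Google Books provides; the only data are the ten hits counted above.

```python
def precision_bounds(total, confirmed, uncertain):
    """Return (low, high) share of hits that are true positives.

    total     -- number of hits returned by the search
    confirmed -- hits verified as genuine uses
    uncertain -- hits that could not be checked (e.g. preview only)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if confirmed < 0 or uncertain < 0 or confirmed + uncertain > total:
        raise ValueError("confirmed and uncertain must fit within total")
    low = confirmed / total                 # uncertain hits all turn out false
    high = (confirmed + uncertain) / total  # uncertain hits all turn out true
    return low, high


# Hand count for 1770--1788: 10 hits, 1 confirmed, 1 preview-only.
low, high = precision_bounds(total=10, confirmed=1, uncertain=1)
print(f"true positives: between {low:.0%} and {high:.0%} of hits")
```

So at best one hit in five for this period is genuine, which is why a
raw count would still need checking by hand.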
Joel
------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org