[Ads-l] FW: NGram vs. the OED

ADSGarson O'Toole adsgarsonotoole at GMAIL.COM
Thu May 7 17:31:26 UTC 2015

The Ngram database was constructed using a subset of the Google Books
database. Some books used for citations in the OED are not in GB (I
assume). The Wikipedia article for "Google Ngram Viewer" asserts:

[Begin excerpt]
It was developed by Jon Orwant and Will Brockman and released in
mid-December 2010. . . .
Google populated the database from over 5 million books published up to 2008.
[End excerpt]

It is possible that the Ngram database has not been updated since
2010. If so, books digitized after 2010 would be absent.

OCR quality is sometimes poor for older works. Also, I still see
metadata errors with regularity.

Google Books does currently contain some instances of "Gentleman
Scholar" and "Gentleman-Scholar" before the 1843 date you mentioned.

The following instance is not hyphenated. The volume was digitized in
March 2011, so it may not be in the Ngram corpus.

Year: 1674
Title: Remains Concerning Britain: Their Languages, Names, Surnames,
Allusions, Anagramms, Armories, Moneys, Impresses, . . .
Author: William Camden
Publisher: Printed for, and sold by, Charles Harper at the Flower de
Luce over against St. Dunstan's Church, and . . . Fletstreet. London
Quote Page 467
Digitized: Mar 3, 2011


[Begin excerpt]
A Gentleman Scholar drawn from the University where he was well liked,
to the Court, for which in respect of his bashful modesty, he was not
fit; . . .
[End excerpt]

Below is a hyphenated instance from 1716 in Google Books. The book was
digitized in July 2007.

Year: 1716
Title: Athenae Britannicae, Or, A Critical History of the Oxford and
Cambridge Writers and Writings . . .
Author: Myles Davies
Publisher: Printed for the Author and by his Appointment only at the
Corner Little Queen Street Holbourn, London


[Begin excerpt]
Whether some of the higher Clergy us'd that Gentleman-Scholar with
unbecoming Imperiousness, or with a Treatment not suitable to his
unexceptionable Parts and Deserts, and he thereupon grew unredressable
and irreconcilable with the whole Order, or no, is uncertain; . . .
[End excerpt]


On Thu, May 7, 2015 at 1:21 PM, Shapiro, Fred <fred.shapiro at yale.edu> wrote:
> ---------------------- Information from the mail header -----------------------
> Sender:       American Dialect Society <ADS-L at LISTSERV.UGA.EDU>
> Poster:       "Shapiro, Fred" <fred.shapiro at YALE.EDU>
> Subject:      FW: NGram vs. the OED
> -------------------------------------------------------------------------------
>
> Isn't NGram based on the contents of Google Books, rather than on citations
> from the OED?  Or are you assuming that everything cited in the OED is also
> in Google Books?
>
> Fred Shapiro
>
> ________________________________________
> From: American Dialect Society [ADS-L at LISTSERV.UGA.EDU] on behalf of Joel Berson [berson at att.net]
> Sent: Thursday, May 07, 2015 12:38 PM
> Subject: NGram vs. the OED
>
> If the OED(2) has quotations for "gentleman-scholar" for 1586 and 1748 (I
> assume it will find more from later years), why does Google's NGram show no
> occurrences before 1843?
>
> ------------------------------------------------------------
> The American Dialect Society - http://www.americandialect.org