[Ads-l] NGram vs. the OED (UNCLASSIFIED)

Joel Berson berson at ATT.NET
Fri May 8 14:44:14 UTC 2015

Well, as Garson also found, Googling turns up a number of instances before 1800.  I see what appear to be 6 different quotations from before that date -- although one is from Sinclair Lewis.
Interestingly, GBooks seems to find "gentleman scholar" in 11 impressions of Chesterfield's Letters to His Son.  This is one of the two OED quotations.  (The OED has 1748, the year of the letter; GBooks has 1774 etc., the years of publication.)  But NGram apparently doesn't count it.
Ben wrote:  "The greatest focus has been on ensuring quality in the period from 1800 to 2000."  I can understand that -- it would take a great deal more sophistication to ensure quality before 1800.  For example, finding the OED's 1586 quotation, which contains "Gentlemen Schollers".  Or, as that quotation just reminded me, NGram is case sensitive!  (That seems to vitiate NGram's usefulness for books published before, perhaps, the 1870s.)  But NGram doesn't find Chesterfield's "Gentleman scholar" either.  And how does NGram handle inflection (e.g., plurals)?  And hyphens?
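
To make the case/plural/hyphen point concrete, here is a rough Python sketch of how many distinct strings a strictly exact-match, case-sensitive search would have to be asked about for a single two-word phrase.  The function name spelling_variants is made up for illustration, and nothing here reflects how NGram actually indexes or normalizes text; it only enumerates the surface forms.

```python
from itertools import product

def spelling_variants(phrase):
    """List surface forms that an exact, case-sensitive match treats as distinct.

    Hypothetical helper for illustration only; it does not query any service.
    """
    words = phrase.split()
    # Case: each word either all lower-case or with an initial capital.
    options = [{w.lower(), w.capitalize()} for w in words]
    # Number: naive plural of the last word only (just appends "s").
    options[-1] |= {w + "s" for w in options[-1]}
    forms = set()
    for combo in product(*options):
        # Hyphenation: words joined by a space or by a hyphen.
        for joiner in (" ", "-"):
            forms.add(joiner.join(combo))
    return sorted(forms)

variants = spelling_variants("gentleman scholar")
print(len(variants))
for v in variants:
    print(v)
```

Even this naive enumeration yields 16 forms for "gentleman scholar", and it still misses the OED's 1586 "Gentlemen Schollers", which needs an irregular plural and an older spelling.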

From: "Mullins, Bill CIV (US)" <william.d.mullins18.civ at mail.mil>
To: 'Joel Berson' <berson at att.net>
Sent: Thursday, May 7, 2015 12:41 PM
Subject: RE: NGram vs. the OED (UNCLASSIFIED)
Classification: UNCLASSIFIED
Caveats: NONE

Two possible reasons:

-- Because Google Books hasn't scanned the relevant sources
-- Because Google Books did scan them, but the OCR didn't recognize the phrases correctly

> -----Original Message-----
> From: American Dialect Society [mailto:ADS-L at LISTSERV.UGA.EDU] On
> Behalf Of Joel Berson
> Sent: Thursday, May 07, 2015 11:39 AM
> Subject: NGram vs. the OED
> If the OED(2) has quotations for "gentleman-scholar" for 1586 and 1748
> (I assume it will find more from later years), why does Google's NGram
> show no occurrences before 1843?

The American Dialect Society - http://www.americandialect.org
