[Ads-l] NGram vs. the OED (UNCLASSIFIED)

Joel Berson berson at ATT.NET
Fri May 8 14:44:14 UTC 2015

Well, as Garson also found, Googling turns up a number of instances before 1800.  I see what appear to be 6 different quotations from before then -- although one is from Sinclair Lewis.
Interestingly, GBooks seems to find "gentleman scholar" in 11 impressions of Chesterfield's Letters to His Son.  This is one of the two OED quotations.  (The OED has 1748, the year of the letter; GBooks has 1774 etc., the years of publication.)  But NGram apparently doesn't count it.
Ben wrote:  "The greatest focus has been on ensuring quality in the period from 1800 to 2000."  I can understand that -- it would take a great deal more sophistication to ensure quality before 1800.  For example:  Finding the OED's 1586 quotation, which contains "Gentlemen Schollers".  Or, as that quotation just reminded me, NGram is case sensitive!  (That seems to vitiate NGram's usefulness for books published before, perhaps, the 1870s.)  But NGram doesn't find Chesterfield's "Gentleman scholar" either.  And how does NGram handle inflection?  E.g., plurals.  And hyphens?
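
To make the case, inflection, and hyphen questions concrete, here is a minimal sketch of what a variant-tolerant search for this one phrase might look like when run over text you hold yourself.  It says nothing about how NGram works internally; the pattern, the function name find_variants, and the choice of variants (case, singular/plural, hyphen vs. space, the older spelling "Scholler") are all my own assumptions for illustration.

```python
import re

# Hypothetical pattern (an assumption, not NGram's behavior): matches
# "gentleman scholar" regardless of case, singular or plural on either word,
# hyphen or whitespace between the words, and the older spelling "scholler".
PATTERN = re.compile(r"\bgentlem[ae]n[\s-]+schol(?:ar|ler)s?\b", re.IGNORECASE)


def find_variants(text):
    """Return every variant of the phrase found in text, in order of appearance."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    samples = [
        "a Gentleman scholar",
        "Gentlemen Schollers",
        "the gentleman-scholar",
        "two gentleman scholars",
        "a gentle scholar",  # should not match
    ]
    for s in samples:
        print(repr(s), "->", find_variants(s))
```

A case-sensitive, exact-string lookup would treat each of the first four samples as a different phrase, which is the problem raised above.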

Two possible reasons:

-- Because Google Books hasn't scanned the relevant sources
-- Because Google Books did scan them, but the OCR didn't recognize the phrases correctly

