Grammar Girl on corpora

Dave Wilton dave at WILTON.NET
Sat Nov 5 19:04:43 UTC 2011


Ngram is extremely unreliable with older texts, due to the problems OCR has
with older fonts. Actually, I'd push the date up to the mid-nineteenth century.

In fact, given Google Books' severe problems with the reliability of its
metadata, I wouldn't rely on the public version at all for any serious
research.



-----Original Message-----
From: American Dialect Society [mailto:ADS-L at LISTSERV.UGA.EDU] On Behalf Of
Joel S. Berson
Sent: Saturday, November 05, 2011 12:44 PM
To: ADS-L at LISTSERV.UGA.EDU
Subject: Re: Grammar Girl on corpora

At 10/28/2011 04:50 PM, Neal Whitman wrote:
>Grammar Girl's current episode gives some good tips for newbies on
>how to use the Google Books and BYU corpora, with a nice shout-out
>to Victor Steinbok: http://t.co/crHKqip1

Why does Grammar Girl start her analysis at 1800?  She writes "the
Ngram Viewer lets you search those words, and it makes graphs of how
often your search terms appeared over time starting around 1800."
Ngram goes back earlier.

On the long s, Grammar Girl writes "It's never made enough of a
difference to matter in my searches".  It surely makes a difference
for many of the searches ADS-Lers do.  (And I'm disappointed she
didn't use as her example "suck".)

Joel

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org
