Finding the long s

Ben Zimmer bgzimmer at BABEL.LING.UPENN.EDU
Tue Oct 11 14:53:24 UTC 2011

Mark Liberman wrote about tracking "long s" in December, when the
Ngram Viewer was launched:

On Tue, Oct 11, 2011 at 10:48 AM, Joel S. Berson <Berson at> wrote:
> Ingenious use of Google Ngrams to trace the decline in use of the long s.
> Joel
> >From: Brian Ogilvie <bwogilvie at GMAIL.COM>
> >
> >The disappearance of the long S in English is one of those phenomena
> >that can be traced using Google Ngrams--unintentionally, because
> >Google's OCR appears to fairly reliably misidentify the 18th-century
> >long S as an F. I discovered this while fooling around with the site
> >and being stunned at how rare words like "papist" were before 1800;
> >then it hit me that I ought to be searching for "papift." Compare
> >these graphs:
> >
> ><>
> ><>
> ><>
> >
> >(I enclosed the URLs in angle brackets, which ought to make them
> >clickable even if they cross a line break in your email program, but
> >if not, copy and paste them into a browser.)
> >
> >If you choose any word with a non-terminal S that is not an English
> >word when the S is replaced with an F, you will almost certainly see a
> >similar graph, with the F-version (i.e. the long S) starting to
> >decline c. 1790, crossing the short S right around 1800, and almost
> >totally gone by 1815. Moreover, the upward slope on the S version is
> >nearly a mirror image of the downward slope on the F version. The best
> >words for testing the long S are common words, especially function
> >words--though not always. Looking for she/fhe, for instance, shows a
> >distinct upward trend, first of fhe and then of she. Again, though,
> >1800 is the point where the long S is overtaken by its rival:
> ><>
> >
> >Caveats: my completely unscientific examination of some of the
> >17th-century texts suggests that Google is better at identifying the
> >long S in them, possibly because the half-crossbar tends to be longer.
> >And of course majuscules, such as in titles, are not identified.
> >Still, the evidence seems quite strong, and it's consistent across a
> >range of words.
> >
> >If we want to know _why_ this happened, research such as Paul Nash's
> >article, which Jerry Morris summarized, is crucial.
> >
> >Brian
> >
> >P.S. The implications for anyone who does searches in early modern
> >full text databases are obvious....
> ------------------------------------------------------------
> The American Dialect Society -
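
The substitution described in the quoted message (swap each non-terminal
lowercase s for f, then keep only words whose f-version is not itself an
English word) can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function names and the tiny
lexicon are made up for the example, and the rule is deliberately
simplified. It ignores the caveats raised above, such as 17th-century
texts where the OCR does better.

```python
def ocr_long_s_variant(word: str) -> str:
    """Return the spelling OCR would likely produce if every long s were
    misread as f. Assumption: every lowercase s except a word-final one
    was printed as a long s; capital S is left alone."""
    if not word:
        return word
    body, last = word[:-1], word[-1]
    return body.replace("s", "f") + last


def is_good_test_word(word: str, lexicon: set[str]) -> bool:
    """True if the word has a non-terminal s and its f-variant is not a
    real word in the given lexicon (hypothetical word list)."""
    variant = ocr_long_s_variant(word)
    return variant != word and variant not in lexicon


# Toy lexicon, purely for illustration.
lexicon = {"papist", "she", "sight", "fight", "is"}

for w in ["papist", "she", "sight", "is"]:
    print(w, ocr_long_s_variant(w), is_good_test_word(w, lexicon))
```

Here "sight" is rejected because its variant "fight" is a real word, and
"is" is rejected because its only s is terminal. The same variant function
could be used to expand a query when searching early modern full-text
databases, by searching for both spellings.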

Ben Zimmer

More information about the Ads-l mailing list