Google's metadata mish-mash

Mark Mandel thnidu at GMAIL.COM
Wed Dec 22 01:20:30 UTC 2010

A friend of mine started a discussion on LiveJournal about Google
n-grams. I looked closely into the early data for "crisis" that she cited
and found that almost all of it was bogus; I commented on it in her blog
post, more or less as follows:


Two plateaus for "crisis" in works in English in 1630-1640: 9 books; at least
ten claimed occurrences; one genuine hit.

 1. Bogus.
*    Causae desperatae Gisb. Voetii ... adversus Spongiam ... D. Corn.
Iansenii ... crisis ostensa ...*
   In Latin. Text not available for view. The word "crisis" in the title
may be the only one, but it hints at many occurrences in the text... none of
which would be in English.

 2. Bogus.
    *Eminent literary and scientific men of Italy, Spain, and Portugal ...
    The cabinet cyclopaedia: Biography
    Volume 2 of Eminent Literary and Scientific Men of Italy, Spain, and
    Mary Wollstonecraft Shelley, James Montgomery
    Dionysius Lardner
2 hits... by Mary Shelley, in 1635? And the snippet mentions Napoleon's
defeat. The date on the title page is 1835, mis-OCRed as 1635.

 3. Bogus.
    *The mystery of selfedeceiving: or, a discourse and discovery of the
deceitfulnesse of mansheart*
    Daniel Dyke
OCR says "will v crisis Salomons"; image shows "will verifie Salomons".

 4. The real thing!
    *A compleat history of the present state of war in Africa, between the
Spaniards and Algerines: ...*
    J Morgan.

"States arrived at that Criſis cannot long subsist". A hit, a palpable hit,
even with an s-longa + i ligature.

 5. Bogus.
    *The Tsimshian: Their Arts and Music* - Page 7
    Voila Garfield - 1632 - Full view
    By the first decade of the nineteenth century maritime furs were scarce
and rivalry between the Russian American Company and the Hudson's Bay
Company for control of what is now western Canada and adjacent Alaska was
reaching a crisis. ...
    The date is the founding date of the printing house, from their emblem
on the title page. V*io*la Garfield is one of three authors.

 6. Bogus.
    *Assentos: Volume 2*
    Portuguese India. Conselho do Estado, Panduronga S. S. Pissurlencar,
Vithal Trimbak Gune - 1634
No content available, but the next "related book" is described as
*    Assentos do Conselho do Estado*
    Portuguese India. Conselho do Estado, Vithal Trimbak Gune, Arquivo
Histórico do Estado da India
    Historical Archives of Goa, 1972 - Foreign Language Study
So what we have here is vol. 2 of a historical archive of Portuguese India,
published starting in 1972, and very probably written in Portuguese.
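
Several of the bogus dates above could be caught mechanically. Here is a minimal sketch in Python, assuming only that a record's descriptive text is available as a plain string; the function name later_years and the year pattern are my own inventions for illustration, not anything Google Books is known to use. It flags a claimed publication year when the record itself mentions a later year, as with the 1972 in the "Assentos" entry.

```python
import re

# Four-digit years from 1500 through 2099 (an arbitrary range, chosen for this sketch).
YEAR = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")

def later_years(claimed_year, metadata_text):
    """Return the years mentioned in metadata_text that fall after claimed_year."""
    years = {int(y) for y in YEAR.findall(metadata_text)}
    return sorted(y for y in years if y > claimed_year)

# Descriptive line quoted in item 6 above; the claimed date was 1634.
record = "Historical Archives of Goa, 1972 - Foreign Language Study"
print(later_years(1634, record))
```

A non-empty result does not prove the date wrong (a reprint's record can mention later years), but it marks the record as worth a human look.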

1639 shows a very impressive peak, with "crisis" constituting 0.0100% of
words (1 in 10,000). Where do those hits appear?

 1.  Bogus.
    *The Whole Works of the Most Rev. James Ussher, D.D., vol 5.* The text
is in Latin, and the string "crisis", while real (though not English), is
part of a longer word after a hyphenated line break:
  ... hypo-
  crisis ...
"Hypocrisis", in the sense of "hypocrisy". Are the Muses sending us a
comment on Google n-grams' claims?

 2. Bogus.
    *Annotations upon the five books of Moses, the book of the Psalmes and
the song of songs,* Henry Ainsworth. An OCR error for italicized "Iericho".

 3. Bogus.
    *American Bibliography: 1639-1729*
*    Volume 1 of American Bibliography: A Chronological Dictionary of All
Books, Pamphlets, and Periodical Publications Printed in the United States
of America from the Genesis of Printing in 1639 Down to and Including the
Year 1820, Charles Evans*
    Charles Evans
    P. Smith, 1639
It looks as if Google Books took the starting year of the range in the short
title as the year of the book's publication, when the title of the full work
dates it no earlier than 1820.
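
The "hypo-" / "crisis" case in item 1 is easy to reproduce. The sketch below is in Python and rests on assumptions: the page text is a made-up stand-in for the Ussher page, and both tokenizers are hypothetical, since I do not know how Google actually splits OCR text into words. It only shows that counting words line by line yields a "crisis" that vanishes once the line-end hyphen is rejoined.

```python
import re

# Made-up stand-in for OCR text with a word hyphenated across a line break.
page = "... hypo-\ncrisis ..."

def naive_tokens(text):
    """Split into lowercase alphabetic runs, ignoring line-end hyphenation."""
    return re.findall(r"[a-z]+", text.lower())

def dehyphenated_tokens(text):
    """Rejoin words broken by a hyphen at the end of a line, then split."""
    joined = re.sub(r"-\s*\n\s*", "", text)
    return re.findall(r"[a-z]+", joined.lower())

print(naive_tokens(page).count("crisis"))         # 1: a phantom hit
print(dehyphenated_tokens(page).count("crisis"))  # 0: the word is "hypocrisis"
```

Rejoining every line-end hyphen is crude, since it would also fuse genuinely hyphenated compounds that happen to break at a line end, so a real pipeline would need something smarter.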

I give Google Ngrams a D-- here, saved from an F only by that single genuine
hit.

Mark A. Mandel

The American Dialect Society
