Anomalous and unreliable database behavior
george.thompson at NYU.EDU
Sat Sep 11 16:30:18 UTC 2010
Garson O'Toole: "All the major text databases that I have used are aggravatingly unreliable in my experience. Sometimes the behavior is inexplicable."
I find the newspaper databases to be particularly unreliable, not in the sense of turning up false matches -- that's to be taken for granted -- but in failing to find what is there to be found. I use ProQuest, EAN (which now seems to be called America's Historic Newspapers), and Gale's newspaper database, 19th Century US Newspapers.
As an example: "The African Theatre" that appears in my signature was connected with an open-air cabaret called "The African Grove", run by a man named William A. Brown for the benefit of black New Yorkers, in his back yard on Thomas street. It's known only from a newspaper story in midsummer of 1821. City directories and tax records showed that Brown lived in the house from about 1818, but several considerations lead me to suppose that he did not open this garden until 1821.
I searched EAN for "African Grove" and "William A. Brown" and found nothing.
Brown had two problems with his location on Thomas street -- one was that it was a residential street, and his neighbors objected to being kept awake by the talk, laughter and singing from the Grove. The other was that Thomas street, though only 2 blocks long then, had 3 or 4 whorehouses and low drinking dens. The neighbors started a campaign to get rid of them, and, inadvertently or not, Brown's place was swept away too. So I was interested in anything on Thomas street, lowlife or otherwise. When I searched for "Thomas street", I found:
ST VINCENT HOUSE. WILLIAM A. BROWN, the proprietor of the above establishment, No. 48 Thomas-street, hereby gives notice to the PEOPLE OF COLOR, of this city, that he will open his Garden on Saturday evening, June 9th, by the name of the NEW-YORK AFRICAN GROVE, at which time, and during the season, every refreshment peculiar to such places will be served in the best style.
W. A. B. particularly solicits the patronage of the People of Color, as he has been at considerable expense for their comfort and convenience. Je8 1w
New-York Gazette & General Advertiser, June 8, 1821, p. 2, col. 5
This appeared 3 times in the search results -- but the newspaper's note "Je8 1w" showed that it should have been printed 6 times. When I scanned the paper for the following days, I found the additional 3 printings.
So, by my calculation, there were 3 potential matches ("African Grove", "William A. Brown", and "Thomas street") in each of the 6 appearances of this notice, a total of 18; EAN found only 3 of them. Similar experiences give me the notion that these databases are likely to find only about 1/4 of what they should be finding.
Frequently, for instance, I will read a story in one of these databases and want to find related stories -- earlier or later stories, or other appearances of someone named in the story. I search for the name, and whatever may come up does not include the story I had started from.
George A. Thompson
Author of A Documentary History of "The African Theatre", Northwestern Univ. Pr., 1998, but nothing much lately.
----- Original Message -----
From: Garson O'Toole <adsgarsonotoole at gmail.com>
Date: Friday, September 10, 2010 12:00 pm
Subject: Anomalous and unreliable database behavior
To: ADS-L at LISTSERV.UGA.EDU
> Fred Shapiro wrote:
> > Excellent searching, Garson! When I searched Newspaperarchive this
> morning, I did not get this to come up, indeed I couldn't get any hits
> for "fish needs a bicycle" or "fish without a bicycle." What is it
> with that database?
> All the major text databases that I have used are aggravatingly
> unreliable in my experience. Sometimes the behavior is inexplicable.
> Here is an example with Google Books. The following link goes to a
> webpage for the book "Memoirs of an Amnesiac" by Oscar Levant within
> the Google Books archive. But the actual database slot is filled with
> an unrelated play "Children, Children" by Jack Horrigan:
> This will probably be corrected over time, and the link will go
> somewhere else, or the play "Children, Children" will be replaced with
> something else.
> I do not wish to be overly critical. The Google Books team is
> constructing a fantastic resource, and the search functionality is
> superior to most databases in my opinion. Also, I sometimes have very
> positive experiences with the responsiveness of the Google research
> team when I submit feedback. Recently, a document I inquired about was
> made available in "full view" instead of "snippet view".
> The American Dialect Society - http://www.americandialect.org