[Ads-l] "Access: Newspaper Archive"
ADSGarson O'Toole
adsgarsonotoole at GMAIL.COM
Thu Jan 24 18:09:27 UTC 2019
Thanks for your note, George. I once encountered an extreme form of
bad OCR text. There was a set of newspaper pages that were, nominally,
in the NewspaperArchive database, but the text (probably scanned from
microfilm) was severely degraded, which made the pages completely
unreadable by human or machine. The OCR text extracted from each page
was gobbledygook. Overall, the archive claimed to have some newspaper
pages when, in truth, the pages were absent.
I only saw these newspaper pages because they were adjacent to a page
that was partially readable, and I visited the partially readable page
during a search.
There is a danger that hardcopy or superior microfilm scans might be
thrown out because a database company inaccurately claims it already
has the pertinent pages in its database. Any company could use an
automated procedure to flag pages that generated low-quality OCR
results for manual review.
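
A minimal sketch of such a flagging procedure, assuming the OCR text for each page is available as a plain string. It scores a page by the share of its alphabetic tokens found in a word list and flags any page that falls below a cutoff. The function names, the 0.5 threshold, and the tiny vocabulary in the usage note are hypothetical illustrations, not anything a database company is known to use.

```python
import re


def recognized_ratio(text, vocabulary):
    """Return the fraction of alphabetic tokens in text found in vocabulary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        # A page with no tokens at all scores 0.0, so it is flagged as well.
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return hits / len(tokens)


def flag_pages(pages, vocabulary, threshold=0.5):
    """Return the ids of pages whose recognized-word ratio is below threshold.

    pages maps a page id to its OCR text. The threshold is an assumed
    value and would need tuning against pages of known quality.
    """
    return [
        page_id
        for page_id, text in pages.items()
        if recognized_ratio(text, vocabulary) < threshold
    ]
```

For example, with a toy vocabulary {"the", "cat", "sat", "on", "mat"}, a page reading "the cat sat on the mat" passes, while a page reading "aaaartlenler enppoMd rtgalB" is flagged. A real check would load a full dictionary and would also have to allow for proper names and period spellings. By George's hand count below (17 recognizable words out of 48, about 0.35), his sample would fall under a 0.5 cutoff.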
Separate issue: I regularly encounter misdated pages in the Newspapers.com database.
Garson
On Thu, Jan 24, 2019 at 12:39 PM George Thompson
<george.thompson at nyu.edu> wrote:
>
> Some of you folks are always eager to learn of files of digitized
> newspapers. "Access: Newspaper Archive" is a commercial database; I don't
> know how widely available it is, but I at least am blessed to have the use
> of it. But before you charge your academic library demanding that it
> subscribe too, let me offer a bit of caution.
>
> From "Access: Newspaper Archive", searching for the word "streetcar" in NYC
> newspapers:
> "New York Times
> <http://access.newspaperarchive.com/new-york-times/1867-04-08?tag=streetcar&rtserp=tags/streetcar?pc=20293&psi=67&pci=7&psb=dateasc>Monday,
> April 8, 1867, New York, New York
>
> for this Stale, haa lamed a BMaw'to treedmen. dapreeatfaig viol sat
> aaaartlenler enppoMd rtgalB; and mrgtag tham to have MB lor w Oouria lor
> Jen to the rectal *streetcar* Thia totter re- saBjeot whioh. la BOW
> Mcatviag atttn i.on aad exciting of our gljoilaqulB neigbbore. H o( rich o
> This seems to be typical of the readings this database offers. Surely this
> is a useless product? Although a "rectal streetcar" is an interesting
> concept.
> Of the 48 "words" in this extract, I count 17 which are actual English
> words, though some (totter, exciting, and of course rectal) are probably
> the result of a misreading by the OCR.
>
> GAT
>
>
> --
> George A. Thompson
> The Guy Who Still Looks Stuff Up in Books.
> Author of A Documentary History of "The African Theatre", Northwestern
> Univ. Pr., 1998.
>
> But when aroused at the Trump of Doom / Ye shall start, bold kings, from
> your lowly tomb. . .
> L. H. Sigourney, "Burial of Mazeen", Poems. Boston, 1827, p. 112
>
> The Trump of Doom -- also known as The Dunghill Toadstool. (Here's a
> picture of his great-grandfather.)
> http://www.parliament.uk/worksofart/artwork/james-gillray/an-excrescence---a-fungus-alias-a-toadstool-upon-a-dunghill/3851
>
> ------------------------------------------------------------
> The American Dialect Society - http://www.americandialect.org