[Ads-l] Databases for Historical U.S. Searching

Sat Jul 2 02:15:28 UTC 2022

Fred said:
>> If anyone knows of specific state projects that have a lot of material
>> not included in Chronicling America, I would love to hear about them.

Thanks to everyone who has commented on this thread.

Having to visit fifty separate state databases and deal with fifty
separate user interfaces would be a nightmare. Organizations of
historians, librarians, archivists, and linguists (including the
American Dialect Society) should be pushing to improve this
situation.

Archivists employed by U.S. states should be coordinating with one
another to create a standard format for scans and metadata derived
from books, magazines, journals, newspapers and other documents. These
scans should be aggregated into a single searchable database with a
high-quality user interface.

There should be ongoing research to create the best possible optical
character recognition (OCR) engine. The OCR engine should be
maintained as an open source piece of software available to all.
Periodically, the scans should be reprocessed with the latest and best
OCR engine, and a comprehensive index of all the text should be
constructed.

Is the Internet Archive doing this? Is HathiTrust doing this? Is
Chronicling America (Library of Congress) doing this?

Of course, it is easy to point out what should be done and to make
demands. But the current situation is aggravating because it is
extraordinarily wasteful.

Garson

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org