Improving HathiTrust and Google Books: Assigning dates to subsections of volumes

Garson O'Toole adsgarsonotoole at GMAIL.COM
Fri Mar 25 00:12:15 UTC 2011


Assigning a single date to a volume is an understandable methodology
for libraries with traditional paper books, but when books are scanned
and placed in a database this crude technique can be considerably
improved.

A volume containing multiple issues of a periodical is a composite
object, and in my opinion it should be assigned multiple dates. For
example, suppose a volume contains 12 monthly issues from one year.
The scans should be split into 12 subsets, and each subset should be
labeled with a date specifying the year and the month. The database
must be structured to handle this extra information appropriately.
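
To make the idea concrete, here is a rough sketch in Python of how a
volume might carry one date per issue instead of one date overall. The
names (Issue, Volume, issue_for_scan, split_evenly) and the example
title are invented for illustration; they do not describe how Google
Books or HathiTrust actually stores its data. The even split of 50
scans per issue is also just an assumption to keep the example short;
real issue boundaries would have to be marked by a person or detected
by software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Issue:
    """One periodical issue inside a bound volume (hypothetical record)."""
    year: int
    month: int
    first_scan: int  # index of the first scanned page, inclusive
    last_scan: int   # index of the last scanned page, inclusive


@dataclass
class Volume:
    """A scanned volume treated as a composite of dated issues."""
    title: str
    issues: List[Issue]

    def issue_for_scan(self, scan_index: int) -> Optional[Issue]:
        # Return the issue whose scan range contains the given page,
        # or None if the page falls outside every labeled issue.
        for issue in self.issues:
            if issue.first_scan <= scan_index <= issue.last_scan:
                return issue
        return None


def split_evenly(title: str, year: int, scans_per_issue: int) -> Volume:
    # Assumption for illustration only: 12 monthly issues of equal length.
    issues = [
        Issue(year, month,
              (month - 1) * scans_per_issue,
              month * scans_per_issue - 1)
        for month in range(1, 13)
    ]
    return Volume(title, issues)


volume = split_evenly("Example Monthly", 1895, 50)
```

With this structure, a lookup such as volume.issue_for_scan(125) gives
back the issue record for that page, including its year and month,
rather than a single date for the whole volume.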

This somewhat simple augmentation would improve the results generated
by the database. Whenever a match is found the exact issue of the
periodical containing the match could be specified. The granularity of
"Ngram type" data displays could be improved using dates with greater
precision from daily, weekly and monthly periodicals.
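
As a small illustration of the finer granularity, the sketch below
counts search matches per month instead of per volume. Everything in
it is hypothetical: the tuple layout (year, month, first_scan,
last_scan), the function names, and the sample numbers are made up for
the example and are not drawn from any real database.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical issue record: (year, month, first_scan, last_scan),
# with both scan indexes inclusive.
IssueSpan = Tuple[int, int, int, int]


def date_of_scan(spans: List[IssueSpan],
                 scan_index: int) -> Optional[Tuple[int, int]]:
    # Find the (year, month) of the issue containing this scanned page.
    for year, month, first, last in spans:
        if first <= scan_index <= last:
            return (year, month)
    return None


def monthly_counts(spans: List[IssueSpan],
                   matching_scans: Iterable[int]) -> Counter:
    # Tally matches by (year, month); pages outside any labeled issue
    # are skipped rather than guessed at.
    counts: Counter = Counter()
    for scan in matching_scans:
        date = date_of_scan(spans, scan)
        if date is not None:
            counts[date] += 1
    return counts


# Made-up sample: three monthly issues and five pages with matches.
spans = [(1895, 1, 0, 49), (1895, 2, 50, 99), (1895, 3, 100, 149)]
counts = monthly_counts(spans, [3, 60, 75, 120, 999])
```

The same tally keyed only by year would lump all of these matches
together; keyed by issue date, a display could show them month by
month.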

Of course, Google Books is plagued with inaccurate metadata, and
assigning the correct date to an entire volume is the most important
first step. HathiTrust metadata seems to be more accurate. If GB or
HathiTrust is able to obtain the funding and man/womanpower, then I
put forward this idea for consideration. The cost of physically
scanning the books is probably larger than the cost of assigning
metadata of this type.

One response to this message might be: Yeah, everybody working on
digital libraries knows this already. My response: Excellent. I hope
you are able to implement this idea in the future.

Garson

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org


