Newspaper Archive

Grant Barrett gbarrett at WORLDNEWYORK.ORG
Mon Jan 10 17:42:20 UTC 2005


On Jan 10, 2005, at 12:16, Mullins, Bill wrote:
> I just picked up a subscription to NewspaperArchive, and I've got a
> couple questions for those of you who are "power users".
> 1. Is there any way to put the search results into chronological order?

No.

> 2. Sometimes it will tell me how many results I have, sometimes it
> won't. Any ideas why this varies?

Because it is pathetic and lame. If it doesn't give you a count, just
assume the total is "more than I want to look through page by page" and
see if you can revise your search to produce fewer results.

> 3. Now that I've bought the full subscription at $99.95, it looks like
> I could have saved some money by buying it through Ancestry.com. Is
> their service the same? Same database, same functionality?

The one through Ancestry.com is even worse than NewspaperArchive, which
is itself a lesson in how not to put a full-text resource online.

Some tips:

1. It does seem to respect AND and AND NOT. So you can sometimes
search for things like "foo AND bar AND NOT fubar". However, since the
OCR sucks so badly, the results are still a crap shoot.

2. The OCR has recently been improved--but only in the searching. It
appears that they updated their search index but not the OCR text
that is embedded in the PDF files. This means that you may get results
for a term, but when you load the PDF document and search for the term
inside it, the term doesn't show up. The solution is to search the PDF
instead for the words that appear around your bolded search term in
the results.

3. It does seem to respect terms in quotes as phrases, but not always.

4. I get the best results using search terms of 15 characters or fewer,
including connectors: beyond that length, OCR errors seem all but
guaranteed to interfere with the results. This is unfortunate
because it rules out the long Boolean searches that would help
eliminate unwanted articles.

5. Double-check all bibliographic info against the PDF. It's better
than it was, but there are still too many errors in the info provided
with the search results. Sometimes it's necessary to page through all
the pages in a newspaper issue to figure out date, location, etc. In
rare cases, you have to trust the NewspaperArchive info because there
is none in the PDF, but this usually only occurs with very old
newspapers.
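
Tips 1 and 4 can be combined into a small helper for drafting queries
before pasting them into the search box. This is only a sketch: the
function names are made up for illustration, the 15-character figure
is the rule of thumb from tip 4 rather than a documented limit, and
nothing here talks to NewspaperArchive itself.

```python
# Rule of thumb from tip 4 (an observation, not a documented limit).
MAX_QUERY_LEN = 15


def build_query(required, excluded=()):
    """Join terms with AND / AND NOT, the connectors from tip 1."""
    query = " AND ".join(required)
    for term in excluded:
        query += " AND NOT " + term
    return query


def fits_limit(query, limit=MAX_QUERY_LEN):
    """True if the query, connectors included, is within the limit."""
    return len(query) <= limit


if __name__ == "__main__":
    for q in (build_query(["foo", "bar"]),
              build_query(["foo", "bar"], ["fubar"])):
        status = "ok" if fits_limit(q) else "too long, try fewer terms"
        print(f"{q!r}: {len(q)} characters, {status}")
```

With these inputs, "foo AND bar" comes in at 11 characters and passes,
while "foo AND bar AND NOT fubar" is 25 characters and gets flagged,
which is the trade-off tip 4 complains about.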

Grant Barrett
gbarrett at worldnewyork.org


