Amazon Full-Text Search of 120,000 Books

Grant Barrett gbarrett at WORLDNEWYORK.ORG
Thu Oct 23 17:32:09 UTC 2003


Rejoice in public corpora:

Amazon is making 120,000 of its books full-text searchable on its site.
That is, the contents of the books, not just the metadata. This has
been active for a while already, and it works pretty well. You do your
regular search, and in the results you'll see an excerpt from the
actual page on which the words or phrases were used. Then you can call
up a fuller quote from the work, or even an image of the very page.
Voilà, instant online citation resource. It also allows searching
within a specific book.

How it Works
http://www.amazon.com/exec/obidos/tg/browse/-/10197021/

Amazon Announcement
http://g-images.amazon.com/images/G/01//books/inside/jeff-letter-2.gif

The main weakness is that the search is not very advanced. For example,
it seems to respect quotes for a phrase search, listing matching
phrases first, but a) it ignores common words like articles and
prepositions, meaning "Missouri mule" and "Missouri on a mule" return
the same results, and b) it also returns non-phrase results, meaning
you still have an unrefined list of results. It returns identical
results for singulars and plurals, no matter which you look for:
"Missouri mule" and "Missouri mules" return the same results, which is
probably better than not matching both forms.
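
The behavior described above can be imitated with a small toy model.
This is only a sketch of what the results look like from the outside,
not Amazon's actual implementation, which is not public. The stop-word
list, the crude plural stripping, and the names normalize,
contains_phrase, and search are all assumptions made for illustration.

```python
import re

# Assumed stop-word list; the list Amazon really uses is unknown.
STOP_WORDS = {"a", "an", "the", "on", "in", "of", "at", "to", "by", "for", "with"}


def normalize(text):
    """Lowercase, drop stop words, and strip a trailing plural 's'."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    # Crude singularization (an assumption): "mules" -> "mule".
    # Short words and words ending in "ss" are left alone.
    return [
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in kept
    ]


def contains_phrase(terms, page_terms):
    """True if terms occur as a consecutive run inside page_terms."""
    n = len(terms)
    return n > 0 and any(
        page_terms[i:i + n] == terms for i in range(len(page_terms) - n + 1)
    )


def search(query, pages):
    """Return page titles: phrase matches first, then loose matches."""
    terms = normalize(query)
    phrase_hits, loose_hits = [], []
    for title, text in pages.items():
        page_terms = normalize(text)
        if contains_phrase(terms, page_terms):
            phrase_hits.append(title)
        elif terms and all(t in page_terms for t in terms):
            loose_hits.append(title)
    return phrase_hits + loose_hits


# Hypothetical sample pages, invented for this example.
pages = {
    "A": "He rode a Missouri mule into town.",
    "B": "The mule was sold in Missouri.",
    "C": "Nothing about Kansas horses.",
}

print(search('"Missouri mule"', pages))
print(search('"Missouri on a mule"', pages))
print(search('"Missouri mules"', pages))
```

All three queries reduce to the same two terms, so under these
assumptions they return the same list: the page with the phrase first,
then the page that merely contains both words.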

A minor weakness, one I would hope is amended, is that only 120,000
books are currently searchable. I suspect the dictionaries never will
be, but even so Amazon has a lot of catching up to do.

Another minor weakness is that the OCR, like most OCR, is imperfect.
There's garbled text here and there.

Still, a good resource.

Grant



More information about the Ads-l mailing list