[Corpora-List] Additions to amazon.com "Search Inside" feature

David Oakey d.j.oakey at bham.ac.uk
Thu Jun 16 10:06:26 UTC 2005


Apologies if I'm reporting something that everyone already knows
about except me, but Amazon.com's "Inside this book" feature now
provides - for all books in its "Search Inside" scheme - a concordance
(in the sense of a frequency list rather than KWIC citations), text
statistics, and statistically improbable phrases (SIPs). A SIP works a
bit like an n-gram version of a keyword in WordSmith Tools, with the
reference corpus being all the books in Amazon's "Search Inside" corpus.
If Amazon finds "a phrase that occurs a large number of times in a
particular book relative to all Search Inside books, that phrase is a
SIP in that book." On the shopping page for the book "Into the void with
Ace Frehley" (the notoriously spaced former guitarist in the rock band
KISS), for example, the SIP they list is "black nail polish". This is
impressive - and not at all improbable - if you know much about the
career of Ace Frehley. 
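
For anyone who wants to try the idea on their own texts, here is a rough
sketch of a SIP-like measure. Amazon does not say how it actually scores
phrases, so everything here is my assumption: the function names
(ngrams, improbable_phrases), the choice of trigrams, the min_count
cut-off and the add-one smoothing are all made up for illustration. It
simply ranks n-grams by how much more frequent they are in one book than
in a reference text.

```python
from collections import Counter
import re


def ngrams(text, n=3):
    """Return the lowercase word n-grams of a text as space-joined strings."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(book, reference, n=3, min_count=2, top=5):
    """Rank n-grams by relative frequency in `book` versus `reference`.

    Hypothetical scoring, not Amazon's: ratio of the phrase's rate in the
    book to its (smoothed) rate in the reference corpus.
    """
    book_counts = Counter(ngrams(book, n))
    ref_counts = Counter(ngrams(reference, n))
    book_total = sum(book_counts.values()) or 1
    ref_total = sum(ref_counts.values()) or 1

    scored = []
    for phrase, count in book_counts.items():
        if count < min_count:
            continue  # ignore phrases that are rare in the book itself
        book_rate = count / book_total
        # Add-one smoothing so phrases absent from the reference
        # do not cause a division by zero.
        ref_rate = (ref_counts[phrase] + 1) / (ref_total + 1)
        scored.append((book_rate / ref_rate, phrase))

    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:top]]
```

Calling improbable_phrases(book_text, all_books_text) returns the
highest-scoring phrases first. A log-likelihood keyness score, as used
for keywords in corpus tools, could be swapped in for the simple ratio.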

The concordance results are presented alphabetically, with more frequent
words shown in a larger font size. Text statistics include standard
readability indices (the Fog Index seems apt here) and they have a "fun
stats" section where they calculate words per dollar and words per ounce
(words per pound and words per kilo on amazon.co.uk). More information
on the Amazon site about the number of books in the scheme (yes, 120,000
books, 33 million pages etc., but that was nearly 2 years ago), their
subject areas, authorship details etc. would of course be useful. While
this is intended as a marketing feature (it "allows you to search
millions of pages to find exactly the book you want to buy"), I believe
it is of interest to Corpora List members in its own right.
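
The text statistics are also easy to approximate. The sketch below
computes the Gunning Fog Index, 0.4 * (words per sentence + percentage
of words with three or more syllables), and a "words per dollar"
figure. The syllable counter is a crude vowel-group heuristic of my own,
and the function names are hypothetical; I do not know how Amazon counts
syllables or sentences, so their figures will differ.

```python
import re


def count_syllables(word):
    """Crude heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fog_index(text):
    """Gunning Fog Index using the heuristic syllable count above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (words_per_sentence + percent_complex)


def words_per_dollar(text, price_dollars):
    """Number of words in the text divided by the price in dollars."""
    return len(re.findall(r"[A-Za-z']+", text)) / price_dollars
```

Both functions assume non-empty text with at least one sentence; the
standard Fog definition also excludes proper nouns and some suffixes
from the "complex" count, which this sketch does not attempt.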

Best wishes,

David Oakey 
------------------------------
Lecturer in English Language
English for International Students Unit
University of Birmingham, UK
phone: + 44 121 4145703
email: d.j.oakey at bham.ac.uk
http://www.eisu.bham.ac.uk/staff/oakeydavid.htm
------------------------------ 


