Corpora: Help please - downloading text from the Web
Andrew Harley
aharley at cup.cam.ac.uk
Mon Mar 27 08:53:49 UTC 2000
At 11:34 AM 23/03/2000 GMT, Geoff Wilkins wrote:
>
>Hi. Can anyone help me with the following:
>
>I'm looking for software - preferably freeware or shareware - to
>use to download text from Web sites, for use in a corpus.
For the Cambridge International Corpus, we have used the following two
products to download websites (after obtaining permission from the site
owner - an important point that shouldn't be disregarded):
> WEBWHACKER - http://www.bluesquirrel.com/whacker
> The original off-line browser!
>
> GRAB-A-SITE - http://www.bluesquirrel.com/grabasite
> An "Industrial Strength" off-line browser!
WebWhacker compresses the data, while Grab-a-Site delivers it as HTML files
organised in a directory structure. That is much easier for us to handle, so
we now use Grab-a-Site.
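
To illustrate why a directory tree of plain HTML files is convenient for corpus work, here is a minimal Python sketch that walks such a tree and pulls out the visible text. It is only an illustration under stated assumptions, not the procedure used for the Cambridge International Corpus: the names TextExtractor, extract_text and corpus_from_directory are hypothetical, the files are assumed to end in .htm or .html, and the UTF-8 decoding is a guess that may need changing for a given site.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the visible text of one HTML document as a single string.

    Text fragments are joined with single spaces, so a word split by
    inline markup (e.g. "wor<b>ld</b>") would come out as two tokens.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def corpus_from_directory(root, encoding="utf-8"):
    """Yield (path, text) for every .htm/.html file below root.

    The encoding is an assumption; undecodable bytes are replaced
    rather than raising an error.
    """
    for path in sorted(Path(root).rglob("*.htm*")):
        html = path.read_text(encoding=encoding, errors="replace")
        yield path, extract_text(html)
```

Calling corpus_from_directory on the top-level folder of a downloaded site (the folder name is whatever the download tool was told to use) gives one text string per page, which can then be written out or tokenised as needed.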
Andrew Harley
Systems Development Manager
English Language Teaching & Dictionaries
Cambridge University Press
Direct line: (01223)325880
Fax: (01223)325850
Try Cambridge International Dictionaries online (over one and a half
million searches since August 1999) at:
http://www.cup.cam.ac.uk/elt/dictionary
We have recently published the Cambridge Dictionary of American English
(book and CD-ROM combined for only $20.95): see http://www.cup.org/esl/cdae
for more details and to order online.