Corpora: Help please - downloading text from the Web
Geoff Wilkins
geoffw at cobuild.collins.co.uk
Thu Mar 23 11:34:28 UTC 2000
Hi. Can anyone help me with the following:
I'm looking for software - preferably freeware or shareware - for
downloading text from Web sites, to be used in a corpus.
The text will come from large sites, with many files, sub-directories
and internal links. At its most basic, the software would simply download
the HTML files from a site, following internal links from the home page.
I've tried various "bots" that do this, but have had problems with all
of them. So I'd welcome recommendations for software that others have
found reliable (and powerful and flexible) for this purpose.
And if anyone knows of packages that are more specifically aimed at the
task I'm undertaking, that would be even better.
Also useful would be software that maps out the structure of a site and
gives an idea of the size of its files.
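To make the requirement concrete, the behaviour described above (start at
the home page, follow internal links only, and note the size of each file)
could be sketched roughly as follows in Python. This is only an
illustration of the task, not a recommendation of any particular tool. The
names crawl, internal_links and fetch_bytes, and the max_pages limit, are
hypothetical. The sketch assumes every fetched file is HTML, and it does not
check robots.txt or pause between requests, which a real download of a large
site should do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, page_url):
    """Return absolute URLs of links that stay on the same host as page_url."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            links.append(absolute)
    return links


def fetch_bytes(url):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def crawl(start_url, fetch=fetch_bytes, max_pages=1000):
    """Breadth-first walk from start_url; return {url: size in bytes}."""
    sizes = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(sizes) < max_pages:
        url = queue.popleft()
        try:
            body = fetch(url)
        except OSError:
            # Network and HTTP errors from urlopen are OSError subclasses;
            # skip the page and carry on.
            continue
        sizes[url] = len(body)
        # Assumption: the page is HTML that can be read as UTF-8.
        text = body.decode("utf-8", errors="replace")
        for link in internal_links(text, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return sizes
```

Calling crawl("http://example.org/") (a placeholder address) would return a
mapping from each page reached to its size in bytes, which gives a rough map
of the site. Saving the pages for the corpus would mean writing each body to
disk at the point where its size is recorded.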
Geoff Wilkins