Corpora: Help please - downloading text from the Web

Dave Braze davebraze at uconn.cted.net
Mon Mar 27 01:07:29 UTC 2000


Knut Hofland wrote:

> On Thu, 23 Mar 2000, Geoff Wilkins wrote:
>
> > I'm looking for software - preferably freeware or shareware - to
> > use to download text from Web sites, for use in a corpus.
>
> I have used w3mir
> http://www.math.uio.no/~janl/w3mir/
> and
> SiteSnagger
> http://hotfiles.zdnet.com/cgi-bin/texis/swlib/hotfiles/info.html?fcode=000P7Z
> Both have shortcomings, but I have downloaded gigabytes of HTML-files
> with the programs.

There is also wget:

http://www.interlog.com/~tcharron/wgetwin.html

I've only used it a little, but it seems serviceable enough.
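
Whichever downloader is used, the saved pages are still HTML, so the markup has to be stripped before the text goes into a corpus. Below is a minimal sketch of that step in Python, using only the standard library. The names TextExtractor and html_to_text are made up for illustration and do not come from wget or the other tools mentioned. It assumes the pages are already on disk and have been decoded to strings, and it makes no attempt to remove navigation menus or other page furniture.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []      # text pieces in document order
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join pieces with a space so adjacent paragraphs do not run together,
    # then collapse every run of whitespace to a single space.
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    sample = "<html><body><p>Hello &amp; welcome</p><p>to the corpus.</p></body></html>"
    print(html_to_text(sample))
```

Joining the pieces with a space keeps words in neighbouring paragraphs apart, at the cost of splitting a word that has inline markup in the middle of it; for corpus work that seemed the lesser evil.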

-Dave


--
Dave Braze
Linguistics Department, U-1145
University of Connecticut
Storrs, CT 06269-1145 USA


