Corpora: Help please - downloading text from the Web
Dave Braze
davebraze at uconn.cted.net
Mon Mar 27 01:07:29 UTC 2000
Knut Hofland wrote:
> On Thu, 23 Mar 2000, Geoff Wilkins wrote:
>
> > I'm looking for software - preferably freeware or shareware - to
> > use to download text from Web sites, for use in a corpus.
>
> I have used w3mir
> http://www.math.uio.no/~janl/w3mir/
> and
> SiteSnagger
> http://hotfiles.zdnet.com/cgi-bin/texis/swlib/hotfiles/info.html?fcode=000P7Z
> Both have shortcomings, but I have downloaded gigabytes of HTML files
> with the programs.
There is also wget:
http://www.interlog.com/~tcharron/wgetwin.html
I've only used it a little, but it seems serviceable enough.
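
Whichever downloader you end up with, the pages still need their markup
stripped before they are much use in a corpus. Here is a rough sketch of
that step in Python, using only the standard library. The names
TextExtractor, html_to_text and fetch_text are made up for illustration
(they are not part of wget, w3mir or SiteSnagger), and the UTF-8 default
and the short list of block-level tags are assumptions you would want to
adjust for your own material.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping script and style contents."""

    SKIP = {"script", "style"}
    # Assumed subset of block-level tags; text around them gets a space.
    BLOCK = {"p", "br", "div", "li", "tr", "title",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, whitespace-normalised."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse every run of whitespace into a single space.
    return " ".join("".join(parser.parts).split())


def fetch_text(url, encoding="utf-8"):
    """Download one page and return its text (encoding is an assumption)."""
    with urlopen(url) as response:
        raw = response.read()
    return html_to_text(raw.decode(encoding, errors="replace"))
```

Calling html_to_text() on a file you have already saved avoids hitting the
server again; fetch_text() is only a convenience for a single page and does
no crawling, retrying or robots.txt checking.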
-Dave
--
Dave Braze
Linguistics Department, U-1145
University of Connecticut
Storrs, CT 06269-1145 USA