[Corpora-List] Automatic web file downloader

Siva Reddy gvs.iiit at gmail.com
Thu Sep 9 12:33:42 UTC 2010


Hi Chelo (and colleagues),

You can try BootCaT (http://bootcat.sslmit.unibo.it/). You need to give it
an initial list of seed terms.
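
As a rough illustration of what the seeds are for: the seed terms are
combined into small random tuples, and each tuple is sent to a search engine
as one query; the pages returned are then downloaded. Below is a minimal
Python sketch of the tuple-generation step only. The function name
seed_tuples, its parameters, and the example seed terms are illustrative
assumptions, not BootCaT's actual code or interface.

```python
import itertools
import random


def seed_tuples(seeds, tuple_size=3, n_queries=10, rng_seed=0):
    """Return up to n_queries distinct seed combinations as query strings.

    Hypothetical helper for illustration; not part of BootCaT.
    """
    # Deduplicate and sort so the result is reproducible for a given rng_seed.
    unique_seeds = sorted(set(seeds))
    # Enumerate every unordered combination of tuple_size seeds.
    combos = list(itertools.combinations(unique_seeds, tuple_size))
    # Shuffle with a private generator, then keep the first n_queries.
    random.Random(rng_seed).shuffle(combos)
    return [" ".join(combo) for combo in combos[:n_queries]]


# Example seed terms (made up for this sketch).
queries = seed_tuples(["corpus", "linguistics", "annotation", "parser", "treebank"])
for query in queries:
    print(query)
```

Each printed line is one query string that could be submitted to a search
engine by hand or through whatever search interface is available.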

We developed a tool called Corpus Factory for Sketch Engine, which can
download large corpora for any language. However, it is a licensed tool.
More details are in the LREC 2010 paper:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/79_Paper.pdf

Best,
Siva

On Thu, Sep 9, 2010 at 3:34 PM, Chelo Vargas <chelo.vargas at ua.es> wrote:

> Dear colleagues,
> I would like to know about software used to build up a corpus of texts by
> downloading web pages with the help of a search engine. I already know
> Webgetter (a utility in WST), the one in Sketch Engine, and the one in
> TERMINUS (http://melot.upf.edu/Terminus2009/index_es.html).
>
> Thank you very much for your help.
>
> Best wishes,
>
> ****************************
> PhD. Ms Chelo Vargas-Sierra
> University of Alicante (Spain)
> Dpto. de Filología Inglesa
> Apdo. 99
> 03080 Alicante
> Tlf. 96 590 3438
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
http://sivareddy.in
