[Corpora-List] Automatic web file downloader
Siva Reddy
gvs.iiit at gmail.com
Thu Sep 9 12:33:42 UTC 2010
Hi Chelo (and colleagues),
You can try BootCaT (http://bootcat.sslmit.unibo.it/). You need to provide
it with initial seed terms.
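As a rough illustration of what the seeds are for: BootCaT combines the seed terms into random tuples, sends each tuple to a search engine as a query, and then downloads the pages the queries return. The Python sketch below shows only the tuple-to-query step. It is not BootCaT's code. `make_seed_queries` is a hypothetical helper, and its defaults (3 terms per query, 10 queries) are illustrative assumptions rather than BootCaT's documented settings.

```python
import math
import random


def make_seed_queries(seeds, tuple_len=3, n_queries=10, rng_seed=None):
    """Build search queries from random combinations of seed terms.

    Hypothetical helper: tuple_len and n_queries are illustrative
    defaults, not necessarily BootCaT's own settings.
    """
    if tuple_len > len(seeds):
        raise ValueError("tuple_len cannot exceed the number of seeds")
    rng = random.Random(rng_seed)
    # Never ask for more distinct combinations than actually exist.
    target = min(n_queries, math.comb(len(seeds), tuple_len))
    combos = set()
    while len(combos) < target:
        # Sample distinct seeds and sort them, so the same combination
        # drawn in a different order counts only once.
        combos.add(tuple(sorted(rng.sample(list(seeds), tuple_len))))
    # One space-separated query string per combination, in stable order.
    return [" ".join(c) for c in sorted(combos)]


if __name__ == "__main__":
    seeds = ["corpus", "terminology", "translation", "glossary", "lexicon"]
    for query in make_seed_queries(seeds, rng_seed=1):
        print(query)
```

Each printed line would be one query for a search engine. Fetching the result URLs, cleaning the HTML, and removing duplicate pages are separate steps that this sketch leaves out.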
We developed a tool called Corpus Factory for the Sketch Engine, which can
download large corpora for any language, but it is a licensed tool. More
details are in our LREC 2010 paper:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/79_Paper.pdf
Best,
Siva
On Thu, Sep 9, 2010 at 3:34 PM, Chelo Vargas <chelo.vargas at ua.es> wrote:
> Dear colleagues,
> I would like to know about software for building a corpus of texts by
> downloading web pages with the help of a search engine. I already know
> Webgetter (a utility in WST), the one in Sketch Engine, and the one in
> TERMINUS (http://melot.upf.edu/Terminus2009/index_es.html).
>
> Thank you very much for your help.
>
> Best wishes,
>
> ****************************
> PhD. Ms Chelo Vargas-Sierra
> University of Alicante (Spain)
> Dpto. de Filología Inglesa
> Apdo. 99
> 03080 Alicante
> Tlf. 96 590 3438
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
--
http://sivareddy.in