[Corpora-List] How to download text from the web to build a corpus ?

Wladimir Sidorenko wlsidorenko at gmail.com
Thu Jun 21 09:46:48 UTC 2012


Hello Imene,

The utility `wget', which is available on most Unix-like OSes, might be
useful for you.
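
If a script is preferred over a command-line tool, the same idea (fetch a
page, keep only its text) can be sketched in Python with the standard
library alone. This is only a rough illustration, not a tested crawler:
the names `TextExtractor', `html_to_text' and `fetch', as well as the
User-Agent string, are made up for this example, and the sketch assumes
that the pages are plain HTML and that the site's terms allow automated
download. Some sites may reject requests that carry no User-Agent header,
which is why one is set here.

```python
import sys
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML string, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch(url, user_agent="corpus-builder-example/0.1"):
    """Download one page and decode it (UTF-8 if no charset is declared)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Each command-line argument is treated as a URL to download.
    for url in sys.argv[1:]:
        print(html_to_text(fetch(url)))
```

Saved under a name such as fetch_text.py (hypothetical), it could be run
as `python fetch_text.py URL > page.txt'. Menus and other navigation text
are not filtered out, so the output would still need cleaning before it
goes into a corpus.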

With kind regards,
Vladimir

2012/6/21 Imene Bensalem <bens.imene at gmail.com>:
> Dear all,
> I would like to build a corpus of Arabic text, and I would like to ask about
> tools you know for downloading text (or HTML pages) from the source websites.
> I tried to use WinHTTrack to download pages from Wikipedia, but it always shows
> me an error and did not download anything.
> Thank you
> Best regards
>
> Imene Bensalem
> Mentouri University, Constantine , Algeria
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



