[Corpora-List] How to download text from the web to build a corpus ?

Alexandre Trilla alex at atrilla.net
Thu Jun 21 10:44:12 UTC 2012


Perhaps, for more specific purposes, you could use an advanced
scraping service like Hubify.com.
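
For a handful of pages, a short script may also be enough. What follows is
only a minimal sketch using the Python standard library, not a tested
crawler: the names TextExtractor, extract_text and fetch_html, and the
User-Agent string, are made up for illustration. It fetches one page and
keeps the visible text, dropping markup, scripts and stylesheets.

```python
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_html(url):
    """Download one page and decode it (UTF-8 if no charset is declared)."""
    # Some sites reject requests without a User-Agent; this value is a placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "corpus-builder-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical usage, with `url` set to a page you are allowed to download:
# print(extract_text(fetch_html(url)))
```

Decoding with the declared charset (falling back to UTF-8) matters for
Arabic text, since a wrong guess would garble the corpus. For many pages
you would still want to pause between requests and check the site's terms.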

Alex


> Hello Imene,
>
> The utility `wget' which is available in most Unix-like OSes might be
> useful for you.
>
> With kind regards,
> Vladimir
>
> 2012/6/21 Imene Bensalem <bens.imene at gmail.com>:
>> Dear all,
>> I would like to build a corpus of Arabic text, and I would like to ask
>> which tools you know of for downloading text (or HTML pages) from the
>> source websites.
>> I tried to use WinHTTrack to download pages from Wikipedia, but it always
>> shows me an error and does not download anything.
>> Thank you
>> Best regards
>>
>> Imene Bensalem
>> Mentouri University, Constantine , Algeria
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>


-- 
_________________________________________________

 ALEXANDRE TRILLA
 B.Sc., M.Sc. in Electronics, Telecommunications
 Engineering and Information Technology

 Email: alex at atrilla.net
 Homepage: http://atrilla.net
_________________________________________________


