<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<font face="Verdana">wget is also available with Cygwin (for
Windows), but it is a command-line tool (I don't know whether you
feel comfortable with this kind of interface). If you are looking
for software with a GUI, you may prefer something like Teleport
Pro (it resembles WinHTTrack a lot).<br>
<br>
If what you need is to "crawl" Wikipedia pages, this page may
be of interest to you:<br>
<a href="http://en.wikipedia.org/wiki/Wikipedia:Database_download">http://en.wikipedia.org/wiki/Wikipedia:Database_download</a><br>
<br>
With kind regards,<br>
<br>
Frederic<br>
<br>
</font>
<div class="moz-cite-prefix">On 21/06/2012 12:44, Alexandre Trilla
wrote:<br>
</div>
<blockquote
cite="mid:4f1388c2061ec54147a182db33c3caae.squirrel@atrilla.net"
type="cite">
<pre wrap="">Perhaps for more specific purposes you could make use of an advanced
scraping service like Hubify.com
Alex
</pre>
<blockquote type="cite">
<pre wrap="">Hello Imene,
The utility `wget', which is available in most Unix-like OSes, might be
useful for you.
With kind regards,
Vladimir
2012/6/21 Imene Bensalem <a class="moz-txt-link-rfc2396E" href="mailto:bens.imene@gmail.com">&lt;bens.imene@gmail.com&gt;</a>:
</pre>
<blockquote type="cite">
<pre wrap="">Dear all,
I would like to build a corpus of Arabic text, and I would like to ask
you about tools you know for downloading text (or HTML pages) from the
source websites.
I tried to use WinHTTrack to download pages from Wikipedia, but it always
shows me an error and does not download anything.
Thank you
Best regards
Imene Bensalem
Mentouri University, Constantine, Algeria
_______________________________________________
UNSUBSCRIBE from this page: <a class="moz-txt-link-freetext" href="http://mailman.uib.no/options/corpora">http://mailman.uib.no/options/corpora</a>
Corpora mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Corpora@uib.no">Corpora@uib.no</a>
<a class="moz-txt-link-freetext" href="http://mailman.uib.no/listinfo/corpora">http://mailman.uib.no/listinfo/corpora</a>
</pre>
</blockquote>
</blockquote>
</blockquote>
<br>
<br>
</body>
</html>