<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Imene,<br>
<br>
If you are familiar with Python, I would suggest the Scrapy
project, as it lets you easily isolate the parts of the page that you are
interested in.<br>
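<br>
To illustrate what "isolating parts of the page" means, here is a
minimal sketch that keeps only the paragraph text of a page and drops
menus, headers and so on. It deliberately uses Python's standard
html.parser module as a stand-in, not Scrapy itself; Scrapy's selectors
express the same idea more compactly. The names ParagraphExtractor and
extract_paragraphs are made up for this example.<br>
<pre>
```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside paragraph elements and ignore everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_paragraph = False  # True while we are inside a paragraph
        self.chunks = []           # text pieces of the paragraph being read
        self.paragraphs = []       # finished paragraphs, in document order

    def _finish_paragraph(self):
        # Join the pieces and normalise runs of whitespace to single spaces.
        text = " ".join("".join(self.chunks).split())
        if text:
            self.paragraphs.append(text)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new paragraph implicitly closes one that was left open.
            self._finish_paragraph()
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._finish_paragraph()
            self.in_paragraph = False

    def handle_data(self, data):
        # Text inside inline elements (bold, links, ...) is kept as well.
        if self.in_paragraph:
            self.chunks.append(data)


def extract_paragraphs(html_text):
    """Return the list of paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.paragraphs
```
</pre>
Calling extract_paragraphs() on the HTML of a downloaded page returns a
list of strings, one per paragraph, which can then be written to the
corpus files.<br>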
<br>
Btw, I think Wikipedia offers its content for download as a
compressed archive. This way you avoid putting load on
their servers.<br>
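<br>
As a rough sketch of how such an archive could be read, assuming it is
a bzip2-compressed XML file in the MediaWiki export format (page
elements that contain a title and a revision with a text element): the
code below streams the file page by page, so the whole dump never has
to fit in memory. The function names and the file name in the usage
note are placeholders for this example, and the text it yields is still
wiki markup that needs further cleaning.<br>
<pre>
```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, so that '{ns}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(dump_path):
    """Yield (title, wikitext) pairs from a compressed dump, one page at a time."""
    with bz2.open(dump_path, "rb") as stream:
        title, text = None, None
        # "end" events fire once an element has been read completely.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text
            elif name == "page":
                yield title, text or ""
                title, text = None, None
                elem.clear()  # release the memory held by the finished page
```
</pre>
For example, iterating over iter_articles("arwiki-dump.xml.bz2"), where
the file name is hypothetical, gives one title and one block of wiki
markup per page.<br>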
<br>
best<br>
Lefteris<br>
<br>
On 21/06/12 11:25, Imene Bensalem wrote:<br>
</div>
<blockquote
cite="mid:CAJreDwEL7QPqe67Vw1GU4uVi3BV=XSyAyn2H_pMkyyOFMR3v9g@mail.gmail.com"
type="cite">Dear all,
<div>I would like to build a corpus of Arabic text, and I would like to ask you
about the tools you know for downloading text (or HTML pages) from the
source websites.</div>
<div>I tried to use WinHTTrack to download pages from Wikipedia, but
it always shows me an error and did not download anything.</div>
<div>Thank you</div>
<div>Best regards</div>
<div><br>
</div>
<div>Imene Bensalem</div>
<div>Mentouri University, Constantine, Algeria</div>
</blockquote>
<br>
<br>
<br>
<pre class="moz-signature" cols="72">--
MSc. Inf. Eleftherios Avramidis
DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
Tel. +49-30 238 95-1806
Fax. +49-30 238 95-1810
-------------------------------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
Geschaeftsfuehrung:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------------------------------------
</pre>
</body>
</html>