<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <font face="Verdana">wget is also available for Windows through Cygwin,
      but it is a command-line tool (I don't know whether you are
      comfortable with that kind of interface). If you are looking for
      software with a GUI, you may prefer something like Teleport Pro,
      which closely resembles WinHTTrack.<br>

      <br>

      If what you need is to "crawl" Wikipedia pages, this page may be
      of interest to you:<br>

      <a href="http://en.wikipedia.org/wiki/Wikipedia:Database_download">http://en.wikipedia.org/wiki/Wikipedia:Database_download</a><br>

      <br>

      With kind regards,<br>

      <br>

      Frederic<br>

      <br>

    </font>

    <div class="moz-cite-prefix">On 21/06/2012 12:44, Alexandre Trilla
      wrote:<br>

    </div>

    <blockquote

      cite="mid:4f1388c2061ec54147a182db33c3caae.squirrel@atrilla.net"

      type="cite">

      <pre wrap="">Perhaps for more specific purposes you could make use of an advanced

scraping service like Hubify.com

Alex

</pre>

      <blockquote type="cite">

        <pre wrap="">Hello Imene,

The `wget' utility, which is available on most Unix-like OSes, might be
useful for you.

With kind regards,

Vladimir

2012/6/21 Imene Bensalem <a class="moz-txt-link-rfc2396E" href="mailto:bens.imene@gmail.com">&lt;bens.imene@gmail.com&gt;</a>:

</pre>

        <blockquote type="cite">

          <pre wrap="">Dear all,

I would like to build a corpus of Arabic text, and I would like to ask
which tools you know of for downloading text (or HTML pages) from the
source websites.

I tried to use WinHTTrack to download pages from Wikipedia, but it always
shows me an error and does not download anything.

Thank you

Best regards

Imene Bensalem

Mentouri University, Constantine, Algeria

_______________________________________________

UNSUBSCRIBE from this page: <a class="moz-txt-link-freetext" href="http://mailman.uib.no/options/corpora">http://mailman.uib.no/options/corpora</a>

Corpora mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Corpora@uib.no">Corpora@uib.no</a>

<a class="moz-txt-link-freetext" href="http://mailman.uib.no/listinfo/corpora">http://mailman.uib.no/listinfo/corpora</a>

</pre>

        </blockquote>


      </blockquote>


    </blockquote>

    <br>

    <br>

  </body>

</html>