<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi,<br>
<br>
It depends on the type of PDF you want to convert.<br>
<br>
If your PDF contains actual text (an office document converted to PDF,
for instance), then pdftotext or Adobe Acrobat should do the conversion
properly.<br>
<div class="moz-cite-prefix">But if your PDF file is made of images
(scanned documents), which is very common for Arabic PDF files,
then you need OCR software that supports the Arabic language.<br>
For the latter case I would recommend ABBYY FineReader, which gives
good recognition results on Arabic.<br>
<br>
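As a rough illustration of the distinction above, here is a minimal Python sketch that checks whether a PDF's text layer looks like usable Arabic or whether OCR is the more likely route. It assumes Poppler's pdftotext is installed and on the PATH. The function names, the 0.5 threshold and the choice of Unicode ranges are illustrative assumptions, not part of any existing tool. A low score may also mean that the text layer is present but garbled, so treat the result as a hint only.<br>
<pre>
```python
import subprocess

# Unicode ranges treated as Arabic script in this sketch: the basic Arabic
# block plus Arabic Presentation Forms-A and -B, which PDF text extractors
# sometimes emit instead of the basic letters.
ARABIC_RANGES = (
    range(0x0600, 0x0700),
    range(0xFB50, 0xFE00),
    range(0xFE70, 0xFEFF),
)


def arabic_ratio(text):
    """Return the share of non-whitespace characters that are Arabic script."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    arabic = sum(1 for c in chars if any(ord(c) in r for r in ARABIC_RANGES))
    return arabic / len(chars)


def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return whatever text layer it finds."""
    # Passing "-" as the output file makes pdftotext write to standard output.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def suggest_route(text, min_ratio=0.5):
    """Guess whether the extracted text is usable or the PDF needs OCR."""
    # min_ratio is an arbitrary illustrative threshold, not a tuned value.
    return "text layer" if arabic_ratio(text) >= min_ratio else "OCR"
```
</pre>
For example, suggest_route(extract_text("article.pdf")), where article.pdf is a hypothetical file name, would return "OCR" for a scanned document with no text layer.<br>
<br>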
Hope it helps<br>
Djamel<br>
<br>
<div class="moz-signature">-- <br>
<font size="1">
<b>Djamel MOSTEFA</b><br>
Directeur technique / CTO<br>
42, rue de l'Université 69007 Lyon<br>
Tel: +33 (0) 4 78 58 32 35<br>
Mob: +33 (0) 6 04 42 19 66<br>
<a href="http://www.techlimed.com">www.techlimed.com</a></font></div>
<br>
<br>
On 11/09/2014 12:45, Eric Atwell wrote:<br>
</div>
<blockquote
cite="mid:alpine.LRH.2.11.1409111136340.32330@cslin-gps.csunix.comp.leeds.ac.uk"
type="cite">Can anyone recommend PDF-to-txt (or PDF-to-xml) tools
for Arabic?
<br>
I have had enquiries from several Arabic corpus linguistics
researchers,
<br>
example below from Anastasiya Andrusenko in Valencia
<br>
<br>
thanks - Eric Atwell, Leeds University
<br>
WWW: <a class="moz-txt-link-freetext" href="http://www.comp.leeds.ac.uk/eric">http://www.comp.leeds.ac.uk/eric</a>
<br>
<a class="moz-txt-link-freetext" href="http://www.comp.leeds.ac.uk/arabic">http://www.comp.leeds.ac.uk/arabic</a>
<br>
<br>
---------- Forwarded message ----------
<br>
Date: Thu, 11 Sep 2014 10:50:36 +0100
<br>
From: Anastasiya Andrusenko <a class="moz-txt-link-rfc2396E" href="mailto:anisika2002@gmail.com"><anisika2002@gmail.com></a>
<br>
To: Eric Atwell <a class="moz-txt-link-rfc2396E" href="mailto:E.S.Atwell@leeds.ac.uk"><E.S.Atwell@leeds.ac.uk></a>
<br>
Subject: Converting PDFs in Arabic to txt. for further corpus
analysis
<br>
<br>
<br>
Hi,
<br>
<br>
I saw your profile on the internet and thought maybe you could help me.
<br>
My name is Anastasiia Andrusenko. I am currently doing research on
metadiscourse features in Arabic research articles (analysis of an
Arabic corpus) at the Department of Applied Linguistics of the
Universitat Politècnica de València.
<br>
I have PDF files in Arabic and need them in .txt format. The problem
is that when I convert them with Adobe Acrobat Pro, the resulting .txt
files are not readable.
<br>
<br>
Could you please advise on a solution to this problem, or perhaps you
know of a tool for text analysis of Arabic.
<br>
Thank you in advance
<br>
<br>
Regards,
<br>
<br>
Anastasiia
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">
Corpora mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Corpora@uib.no">Corpora@uib.no</a>
<a class="moz-txt-link-freetext" href="http://mailman.uib.no/listinfo/corpora">http://mailman.uib.no/listinfo/corpora</a>
</pre>
</blockquote>
<br>
</body>
</html>