<div dir="ltr">Hi,<div><br></div><div>My own PDF-to-text converter, AntFileConverter, should also work, although I have not tested it on Arabic script. You can find it on my software page (just scroll to the middle of the list of tools):</div><div><br></div><div><a href="http://www.laurenceanthony.net/software.html" target="_blank">http://www.laurenceanthony.net/software.html</a><br></div><div><br></div><div><br></div><div>Laurence.</div><div><br></div><div class="gmail_extra"><br clear="all"><div>###############################################################<br>Laurence ANTHONY, Ph.D.<br>Professor<br>Center for English Language Education in Science and Engineering (CELESE)<br>Faculty of Science and Engineering<br>Waseda University<br>3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan<br>E-mail: <a href="mailto:anthony0122@gmail.com" target="_blank">anthony0122@gmail.com</a><br>WWW: <a href="http://www.antlab.sci.waseda.ac.jp/" target="_blank">http://www.antlab.sci.waseda.ac.jp/</a><br>###############################################################</div>
<br><div class="gmail_quote">On Fri, Sep 12, 2014 at 2:29 AM, Craig Pfeifer <span dir="ltr"><<a href="mailto:craig.pfeifer@gmail.com" target="_blank">craig.pfeifer@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Another option is the open-source Apache Tika project:<div><a href="https://tika.apache.org/" target="_blank">https://tika.apache.org/</a><br></div><div><br></div><div>It *should* handle Arabic properly, with the standard caveat that image-only (scanned) PDFs need OCR first.</div><div><br></div><div>Craig</div></div><div class="gmail_extra"><br clear="all"><div>______________<br><a href="mailto:craig.pfeifer@gmail.com" target="_blank">craig.pfeifer@gmail.com</a></div>
<br><div class="gmail_quote"><div><div>On Thu, Sep 11, 2014 at 6:45 AM, Eric Atwell <span dir="ltr"><<a href="mailto:E.S.Atwell@leeds.ac.uk" target="_blank">E.S.Atwell@leeds.ac.uk</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>Can anyone recommend PDF-to-txt (or PDF-to-xml) tools for Arabic?<br>
I have had enquiries from several Arabic corpus linguistics researchers,<br>
example below from Anastasiya Andrusenko in Valencia<br>
<br>
thanks - Eric Atwell, Leeds University<br>
WWW: <a href="http://www.comp.leeds.ac.uk/eric" target="_blank">http://www.comp.leeds.ac.uk/eric</a><br>
<a href="http://www.comp.leeds.ac.uk/arabic" target="_blank">http://www.comp.leeds.ac.uk/arabic</a><br>
<br>
---------- Forwarded message ----------<br>
Date: Thu, 11 Sep 2014 10:50:36 +0100<br>
From: Anastasiya Andrusenko <<a href="mailto:anisika2002@gmail.com" target="_blank">anisika2002@gmail.com</a>><br>
To: Eric Atwell <<a href="mailto:E.S.Atwell@leeds.ac.uk" target="_blank">E.S.Atwell@leeds.ac.uk</a>><br>
Subject: Converting PDFs in Arabic to txt. for further corpus analysis<br>
<br>
<br>
Hi,<br>
<br>
I saw your profile on the internet and thought maybe you could help me.<br>
My name is Anastasiia Andrusenko, and I am currently doing research on<br>
metadiscourse features in Arabic research articles (analysis of an Arabic corpus)<br>
at the Department of Applied Linguistics of the Universitat Politècnica de<br>
València.<br>
I have PDF files in Arabic, and I need them in .txt format. The problem<br>
is that when I convert them with Adobe Acrobat Professional, the .txt files are not<br>
readable.<br>
<br>
Could you please advise on a solution to this problem, or perhaps recommend a<br>
tool for text analysis of Arabic?<br>
Thank you in advance.<br>
<br>
Regards,<br>
<br>
Anastasiia<br>
<br></div></div><span>_______________________________________________<br>
UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
<br></span></blockquote></div><br></div>
<br></blockquote></div><br></div></div>