[Corpora-List] Converting PDFs in Arabic to txt/xml for further corpus analysis (fwd)

Xiao, Richard r.xiao at lancaster.ac.uk
Thu Sep 11 11:32:49 UTC 2014


Hi,

I have used ABBYY FineReader to convert Chinese PDF texts. The tool's website lists a wide range of supported languages, including Arabic: http://finereader.abbyy.com/recognition_languages/

A useful feature of this tool is that it can be configured to automatically ignore headers and footers when converting PDF pages into plain text files.
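
For text that has already been extracted page by page, a similar clean-up can be roughly approximated in a few lines of Python. This is only a sketch and not how FineReader itself works: the function name strip_repeated_margins, the min_share parameter, and the assumption that a header or footer is a line repeated verbatim at the top or bottom of most pages are all illustrative.

```python
from collections import Counter


def strip_repeated_margins(pages, min_share=0.6):
    """Drop first/last lines that recur verbatim on most pages.

    Assumption (hypothetical heuristic): a running header or footer is the
    first or last non-empty line of a page and repeats on at least
    `min_share` of all pages.
    """
    # Split each page into its non-empty lines.
    split_pages = [
        [line for line in page.splitlines() if line.strip()] for page in pages
    ]

    # Count how often each line appears as the first / last line of a page.
    first_lines = Counter(p[0].strip() for p in split_pages if p)
    last_lines = Counter(p[-1].strip() for p in split_pages if p)

    # Require at least two occurrences so a single page is never stripped.
    threshold = max(2, int(len(split_pages) * min_share))
    headers = {line for line, n in first_lines.items() if n >= threshold}
    footers = {line for line, n in last_lines.items() if n >= threshold}

    cleaned = []
    for lines in split_pages:
        if lines and lines[0].strip() in headers:
            lines = lines[1:]
        if lines and lines[-1].strip() in footers:
            lines = lines[:-1]
        cleaned.append("\n".join(lines))
    return cleaned
```

Footers that contain a changing page number will not match verbatim, so they would need a looser comparison than the exact match used here.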

Regards,

Richard
________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Eric Atwell [E.S.Atwell at leeds.ac.uk]
Sent: 11 September 2014 11:45 AM
To: CORPORA discussion forum
Cc: Mohamed Adaney; Ali el-Ameen Ahmed; Sameer Alrehaili; Anastasiya Andrusenko
Subject: [Corpora-List] Converting PDFs in Arabic to txt/xml for further corpus analysis (fwd)

Can anyone recommend PDF-to-txt (or PDF-to-xml) tools for Arabic?
I have had enquiries from several Arabic corpus linguistics researchers;
an example from Anastasiya Andrusenko in Valencia is below.

thanks - Eric Atwell, Leeds University
  WWW: http://www.comp.leeds.ac.uk/eric
       http://www.comp.leeds.ac.uk/arabic

---------- Forwarded message ----------
Date: Thu, 11 Sep 2014 10:50:36 +0100
From: Anastasiya Andrusenko <anisika2002 at gmail.com>
To: Eric Atwell <E.S.Atwell at leeds.ac.uk>
Subject: Converting PDFs in Arabic to txt. for further corpus analysis


Hi,

I saw your profile on the internet and thought maybe you could help me.
My name is Anastasiia Andrusenko. I am currently doing research on
metadiscourse features in Arabic research articles (analysis of an Arabic corpus)
at the Department of Applied Linguistics of the Universitat Politècnica de
València.
I have PDF files in Arabic and need them in .txt format. The problem
is that when I convert them with Adobe Acrobat Professional, the resulting .txt
files are not readable.

Could you please advise on a solution to this problem, or perhaps you know of a
tool for text analysis of Arabic?
Thank you in advance.

Regards,

Anastasiia



