[Corpora-List] Converting PDFs in Arabic to txt/xml for further corpus analysis (fwd)

Eric Atwell E.S.Atwell at leeds.ac.uk
Thu Sep 11 10:45:09 UTC 2014


Can anyone recommend PDF-to-txt (or PDF-to-xml) tools for Arabic?
I have had enquiries from several Arabic corpus linguistics researchers;
an example from Anastasiya Andrusenko in Valencia is below.

thanks - Eric Atwell, Leeds University
  WWW: http://www.comp.leeds.ac.uk/eric
       http://www.comp.leeds.ac.uk/arabic

---------- Forwarded message ----------
Date: Thu, 11 Sep 2014 10:50:36 +0100
From: Anastasiya Andrusenko <anisika2002 at gmail.com>
To: Eric Atwell <E.S.Atwell at leeds.ac.uk>
Subject: Converting PDFs in Arabic to txt. for further corpus analysis


Hi,

I saw your profile on the internet and thought maybe you could help me.
My name is Anastasiia Andrusenko, and I am currently doing research on
metadiscourse features in Arabic research articles (analysis of an Arabic
corpus) at the Department of Applied Linguistics of the Universitat
Politècnica de València.
I have PDF files in Arabic, and I need them in txt format. The problem
is that when I convert them with Adobe Acrobat Professional, the resulting
txt files are not readable.

Could you please advise a solution to this problem, or perhaps you know of
a tool for text analysis of Arabic?
Thank you in advance.

Regards,

Anastasiia