[Corpora-List] Converting PDFs in Arabic to txt/xml for further corpus analysis

Edward Jahn ejahn3141 at gmail.com
Fri Sep 12 12:11:28 UTC 2014


I have used ABBYY FineReader with great success for many languages,
including some with non-Latin scripts, although I have not tried it with
Arabic. I have tried some other OCR software products, and found this to
be the best.

The download link is
http://www.abbyy.com/?adw=google_hq_us_search_brand&gclid=CNa67dDQ28ACFbTm7Aod_UsATA

It needs to be trained on the individual language, which may take time.
There are also some tricks to using it that take some time to learn. But
once the software and the user have both been trained, I find it works well.

Ed Jahn
George Mason University
Virginia US