[Corpora-List] Converting PDFs in Arabic to txt/xml for further corpus analysis (fwd)

Stephen Lewis stephen.lewis at gmail.com
Fri Sep 12 14:54:01 UTC 2014


For text-based Arabic PDFs, I haven't found any conversion tool that
is as reliable as simply copying and pasting the text into something
like Microsoft Word.
For graphics-based Arabic PDFs I'd recommend FineReader, but you will
definitely need some post-editing. Tesseract was not very good.
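
A minimal sketch of one possible automated post-editing pass, in Python.
It assumes the pasted or OCR'd text contains typical extraction debris
(Arabic presentation forms, tatweel, stray directional marks); none of
this is specific to FineReader or Word, the function name is
hypothetical, and it will not fix actual recognition errors.

```python
import re
import unicodedata

# Invisible directional marks that often survive copy-paste or OCR export
# (LRM, RLM, embedding/override controls, isolate controls).
_DIRECTIONAL_MARKS = re.compile("[\u200e\u200f\u202a-\u202e\u2066-\u2069]")
_TATWEEL = "\u0640"  # elongation character; removing it is an assumption


def clean_arabic_text(text: str) -> str:
    """Hypothetical helper: character-level clean-up of extracted Arabic text."""
    # NFKC folds Arabic presentation forms (U+FB50-U+FDFF, U+FE70-U+FEFF)
    # back to the base letters that corpus tools expect.
    text = unicodedata.normalize("NFKC", text)
    text = _DIRECTIONAL_MARKS.sub("", text)
    text = text.replace(_TATWEEL, "")
    # Collapse runs of spaces and tabs but keep line breaks.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

Dropping tatweel and directional marks loses information, so skip those
steps if the corpus analysis needs the original typography.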

Stephen