[Corpora-List] Converting PDFs in Arabic to txt/xml for further corpus analysis

Angus Grieve-Smith grvsmth at panix.com
Fri Sep 12 14:34:29 UTC 2014


I've seen a lot of recommendations for ABBYY's FineReader, but I want
to point out that if all you want to do is convert PDFs to text, ABBYY's
PDF Transformer uses the exact same OCR engine and is half the price:

http://pdftransformer.abbyy.com/

I've used it for French with good results.

On 9/12/2014 8:11 AM, Edward Jahn wrote:
> I have used ABBYY FineReader with great success for many languages,
> including some with non-Latin scripts, although I have not tried it
> with Arabic. I have tried some other OCR software products, and found
> this to be the best.
>
> The download link is
> http://www.abbyy.com/?adw=google_hq_us_search_brand&gclid=CNa67dDQ28ACFbTm7Aod_UsATA
>
> It needs to be trained on the individual language, which may take
> time. And there are some tricks to using it that take some time to
> learn. But once the software and the user have both been trained, I
> find it works well.
>
> Ed Jahn
> George Mason University
> Virginia US
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora


-- 
				-Angus B. Grieve-Smith
				grvsmth at panix.com
