Arabic-L:GEN:Arabic OCR responses

Dilworth Parkinson dilworth_parkinson at BYU.EDU
Wed Nov 15 18:20:57 UTC 2006


------------------------------------------------------------------------
Arabic-L: Wed 15 Nov 2006
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
            unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject:Arabic OCR response
2) Subject:Arabic OCR response
3) Subject:Arabic OCR response

-------------------------Messages-----------------------------------
1)
Date: 15 Nov 2006
From: "Ben Huyck" <regexer at gmail.com>
Subject:Arabic OCR response

Dear Waheed,

As far as I am aware, there are really only two viable resources for  
Arabic OCR. They are Sakhr's Automatic Reader, and NovoDynamics'  
VERUS. They both perform well, although I personally have more  
experience with Sakhr.

[urls]
www.sakhr.com
www.novodynamics.com

I know Sakhr works with MS Word, and I am fairly certain that VERUS  
does as well. Both are reasonably fast (OCR tools usually are), and  
both are capable of batch processing.

As far as accuracy goes, you'll have to be the judge of how "clean"  
your documents are. Both tools will degrade as the quality of input  
images goes down, but one of NovoDynamics' main selling points is  
that it performs well on degraded images. I'm not aware of any  
independent evaluation of this claim.

Unfortunately, quirk-free software is not generally available in a  
field as new as Arabic OCR, as the tools have not developed a large  
enough user base to fully mature. You'll find that results vary with  
the type of data you process.

Of course, all of this information is nearly useless if you don't  
take steps during the scanning process to ensure that you are  
preparing the electronic images correctly for OCR. This ranges from  
the most obvious characteristics such as resolution (anything under  
200dpi is generally useless, 300 is optimal for most OCR tools), to  
seemingly insignificant settings such as the default contrast.
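
As a rough illustration of the resolution guideline above, here is a
minimal Python sketch that sorts a scan's resolution into three bands
before it is sent to an OCR tool. Only the 200 and 300 dpi figures come
from the text; the function name, the labels, and the handling of
200-299 dpi and of values above 300 dpi are assumptions made for
illustration.

```python
def classify_scan_dpi(dpi: int) -> str:
    """Classify a scan resolution against the rule of thumb in the text.

    Thresholds (200 and 300 dpi) are taken from the message; the labels
    and the middle/upper bands are illustrative assumptions.
    """
    if dpi <= 0:
        raise ValueError("dpi must be a positive number")
    if dpi < 200:
        # "anything under 200dpi is generally useless"
        return "too low"
    if dpi < 300:
        # Assumption: between the two stated figures, treat as usable
        # but below the recommended setting.
        return "usable"
    # Assumption: 300 dpi or more meets the recommended setting.
    return "recommended"


if __name__ == "__main__":
    for dpi in (150, 200, 300):
        print(dpi, classify_scan_dpi(dpi))
```

A check like this only covers resolution; settings such as contrast
still have to be judged by inspecting the scanned images themselves.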

I co-authored a paper on this very topic. If you're interested, you  
can get it at the following url.

http://www.mitre.org/work/tech_papers/tech_papers_05/05_0150/index.html

All the best,
Ben Huyck

--------------------------------------------------------------------------
2)
Date: 15 Nov 2006
From: Jan Hoogland <j.hoogland at let.ru.nl>
Subject:Arabic OCR response

Hi Waheed,
I heard that ReadIris, which comes with Agfa scanners and maybe with
other brands as well, performs well on Arabic. So there is no need for
expensive extra software, just the OCR programme that comes with the
scanner.
I myself haven't used OCR for ages. In the nineties there was Al Qari'
Al Ali, but you're closer to Sakhr than I am, so you could ask them
about its development since then.
Regards,
Jan

--------------------------------------------------------------------------
3)
Date: 15 Nov 2006
From: "al-Husein N. Madhany" <anm at post.harvard.edu>
Subject:Arabic OCR response

May I recommend IRIS's latest Arabic OCR product?  It costs about
$500.  I just tested it a little while ago, and it works reasonably
well.  But bear in mind that you still have to read through everything;
don't trust the program to transcribe a full text error-free for one
second!

A more costly alternative is Sakhr OCR, but it is harder to obtain.

al-Husein Madhany
anm at uchicago.edu

--------------------------------------------------------------------------
End of Arabic-L:  15 Nov 2006