Arabic-L:GEN:Arabic OCR responses
Dilworth Parkinson
dilworth_parkinson at BYU.EDU
Wed Nov 15 18:20:57 UTC 2006
------------------------------------------------------------------------
Arabic-L: Wed 15 Nov 2006
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
unsubscribe arabic-l ]
-------------------------Directory------------------------------------
1) Subject:Arabic OCR response
2) Subject:Arabic OCR response
3) Subject:Arabic OCR response
-------------------------Messages-----------------------------------
1)
Date: 15 Nov 2006
From: "Ben Huyck" <regexer at gmail.com>
Subject:Arabic OCR response
Dear Waheed,
As far as I am aware, there are really only two viable resources for
Arabic OCR. They are Sakhr's Automatic Reader, and NovoDynamics'
VERUS. They both perform well, although I personally have more
experience with Sakhr.
[urls]
www.sakhr.com
www.novodynamics.com
I know Sakhr works with MS Word, and I am fairly certain that VERUS
does as well. Both are reasonably fast (OCR tools usually are), and
both are capable of batch processing.
As far as accuracy goes, you'll have to be the judge of how "clean"
your documents are. Both tools will degrade as the quality of input
images goes down, but one of NovoDynamics' main selling points is
that it performs well on degraded images. I'm not aware of any
independent evaluation of this claim.
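One way to judge accuracy on your own documents is to hand-correct a
sample page and compare it with the raw OCR output. The sketch below
computes a character error rate from a plain edit distance. It is only
an illustration: the function names are made up for this example and
are not part of either product.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character from b
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the hand-corrected text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)
```

Running this over the same sample pages with each tool gives a rough
basis for comparison on your own material.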
Unfortunately, quirk-free software is not generally available in a
field as new as Arabic OCR, as the tools have not developed a large
enough user base to fully mature. You'll find that results vary with
the type of data you process.
Of course, all of this information is nearly useless if you don't
take steps during the scanning process to ensure that you are
preparing the electronic images correctly for OCR. This ranges from
the most obvious characteristics such as resolution (anything under
200dpi is generally useless, 300 is optimal for most OCR tools), to
seemingly insignificant settings such as the default contrast.
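As a rough illustration of the resolution rule of thumb, the sketch
below works out the effective dpi of a scan from its pixel width and
the physical page width, then checks it against the 200 and 300 dpi
figures. The function and constant names are assumptions made for this
example, not settings from any scanner or OCR tool.

```python
# Thresholds follow the rule of thumb in this message; the names are
# illustrative and do not come from any scanner or OCR product.
MIN_USABLE_DPI = 200
RECOMMENDED_DPI = 300


def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Dots per inch implied by an image width and the page's real width."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def classify_resolution(dpi: float) -> str:
    """Label a scan resolution as 'too low', 'usable' or 'recommended'."""
    if dpi < MIN_USABLE_DPI:
        return "too low"
    if dpi < RECOMMENDED_DPI:
        return "usable"
    return "recommended"
```

For example, a page 8.5 inches wide scanned to an image 2550 pixels
wide works out to 300 dpi, while the same page at 1275 pixels is only
150 dpi and should be rescanned.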
I co-authored a paper on this very topic. If you're interested, you
can get it at the following URL:
http://www.mitre.org/work/tech_papers/tech_papers_05/05_0150/index.html
All the best,
Ben Huyck
------------------------------------------------------------------------
2)
Date: 15 Nov 2006
From: Jan Hoogland <j.hoogland at let.ru.nl>
Subject:Arabic OCR response
Hi Waheed,
I heard that ReadIris, which comes with Agfa scanners and maybe with
other brands as well, performed well on Arabic. So there is no need
for expensive extra software, just the OCR programme that comes with
the scanner.
I myself haven't used OCR for ages. In the nineties there was Al Qari'
Al Ali, but you're closer to Sakhr than I am to ask them about its
developments since then.
Regards,
Jan
------------------------------------------------------------------------
3)
Date: 15 Nov 2006
From: "al-Husein N. Madhany" <anm at post.harvard.edu>
Subject:Arabic OCR response
May I recommend IRIS's latest Arabic OCR product? It costs about
$500. I just tested it a little while ago, and it works reasonably
well. But bear in mind that you still have to read through everything.
Don't trust the program to transcribe a full text error-free for one
second!
A more costly alternative is Sakhr OCR, but it is harder to obtain.
al-Husein Madhany
anm at uchicago.edu
------------------------------------------------------------------------
End of Arabic-L: 15 Nov 2006