OCR recommendation?

Yoshimasa Tsuji yamato at yt.cache.waseda.ac.jp
Fri May 29 01:15:38 UTC 1998


FineReader Professional (http://www.bitsoft.ru) is
the only choice.  I have taught it to replace jat' by "@e",
fita by "@f", etc., and am so satisfied with it that I have
not checked whether a new version has been released.

FineReader Standard and CuneiForm (try http://www.cognitive.ru) are
also excellent products (for Mac users, MacTiger, the Mac version
of CuneiForm, is the only choice). Their accuracy is almost
at the ceiling: words of three or more letters are read correctly
(two or three errors in a 300 KB text), while recognition of words
of two or fewer letters is rather poor (90 per cent for a beautiful
print). My guess is that the bare-bones recognition capability, that is,
without guessing from the spelling, is 95 per cent, which means about
fifteen errors per page. All of these OCR programs understand some forty
languages and can output spreadsheets directly in Excel format, etc.,
but FineReader Professional excels in many ways: trainability,
handwritten text, and the ability to create databases directly from
custom-formatted OCR forms such as "marked sheets".

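The arithmetic behind "95 per cent" and "fifteen errors per page" can be
checked with a small sketch. The figure of about 300 recognized units
(presumably words) per page is not stated in the message; it is an
assumption back-derived from the two numbers given.

```python
def expected_errors(units_per_page: int, accuracy: float) -> int:
    """Expected number of misread units on one page at a given accuracy."""
    return round(units_per_page * (1.0 - accuracy))


if __name__ == "__main__":
    # Assumption: roughly 300 units (words) on a page.
    print(expected_errors(300, 0.95))
```
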
You can download and try these programs from the sites I have cited.

Cheers,
Tsuji


P.S.
If we had a proper spelling checker, we could OCR a book of three
hundred pages in a day's work. Unfortunately, no such thing is
available. I have written a basic algorithm for a decent spelling
checker, but neither time, money, nor help has been available to me.
I wonder if you know of a student who might help me. (We don't have
specialists in formal grammar here, so asking students whether they
know of Zaliznjak is a waste of time. Besides, I teach economics and
my acquaintances are mostly maths people, which is why I can't get
grants for linguistic research.)
  Replace "po" by "no" in a Russian text and see whether your favorite
spelling checker says anything. A human reader sees it as a typo, but
does your software? If it does, let me know, please.
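
The test can be sketched in a few lines, assuming "po" and "no" stand
for the Cyrillic words по and но. Both are valid words, so a checker
that only looks words up in a dictionary has nothing to flag. The word
list, the bigram list, and the function names below are toy examples of
my own, not the algorithm mentioned above; the bigram check merely
illustrates that some use of context is needed.

```python
import re

# Toy word list; a real checker would load a full Russian dictionary.
LEXICON = {"я", "иду", "по", "но", "улице"}

# Toy list of word pairs assumed to be attested in correct text.
ATTESTED_BIGRAMS = {("я", "иду"), ("иду", "по"), ("по", "улице")}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase Cyrillic words."""
    return re.findall(r"[а-яё]+", text.lower())


def unknown_words(text: str) -> list[str]:
    """Words missing from the lexicon: all a lookup-only checker can flag."""
    return [w for w in tokenize(text) if w not in LEXICON]


def unattested_pairs(text: str) -> list[tuple[str, str]]:
    """Adjacent word pairs never seen in the toy bigram list."""
    words = tokenize(text)
    return [p for p in zip(words, words[1:]) if p not in ATTESTED_BIGRAMS]


if __name__ == "__main__":
    original = "Я иду по улице"
    corrupted = original.replace("по", "но")
    print(unknown_words(corrupted))     # lookup-only check
    print(unattested_pairs(corrupted))  # context check
```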


