OCR today?

Paul B. Gallagher paulbg at PBG-TRANSLATIONS.COM
Sat May 21 19:19:48 UTC 2005


Robin LaPasha wrote:

> Hello, folks. The question returns. I have a library colleague with a
> brand new digital production center and... somebody who wants to OCR
> some Russian (printed) text.
>
> The last time I checked (about 9-12 months ago) FineReader was still
> on top, but there were some fans of Readiris. (I gather it would be
> the choice for Mac-only people...) And, as usual, though you wouldn't
> buy it with Russian in mind, if your shop happened to have full
> current OmniPage, it was perhaps worth a try.
>
> It hasn't been discussed on this list since 2003 (except for a recent
> OCS/Old Greek thread), so...
>
> Does anyone have some direct experience with recent revs of these
> products? (Or other newcomers to the market?) If so, please tell us -
> but do also include:
>     - the specific rev and level of your product (so we can compare
> apples and apples),
>     - whether you're using a Mac or a PC version of the product, and
>     - whether you have also used the product to OCR other Slavic or
> non-Slavic languages.
>
> ...

I've been using FineReader Pro 7 (Windows) for the past year and I'm
very pleased with it. The dictionary is much stronger than in past
versions, and they've added PDF support -- you can read a PDF directly
without having to take snapshots of individual pages. If you start with
a good image, you can often breeze through page after page with only a
question or two per page. Crappy images will give any program trouble,
but I'm consistently amazed at what FR 7 can do even with images I
need a magnifying glass to read.

One weak spot that remains is the handling of tables, but that just
requires a little skill in using the program. If you tell the program to
read a page without any further guidance, it will generally recognize
and analyze tables as such, but too often it will not realize that text
it has split into adjacent cells actually belongs in one cell. So when I
see a complex table, I generally mark that block manually, merging cells
to achieve the correct structure, before asking it to recognize the
characters. This lets the dictionary do its job when you have things
like this:
	Price	Quan-	Exten-
		tity	sion
(one row, not two)

I generally don't ask the program to reproduce the page layout. Even
though it's quite capable of doing so, that's not what an OCR program is
for, so I just ask it to give me the right words, and I take it from
there.

Many of my texts include bits and pieces of English and other languages
(mostly company names and bibliographic citations), and occasionally I
use FR 7 for entirely English texts. It handles these well, and it has
specialized medical and legal dictionaries for English, which have
really come in handy.


--
War doesn't determine who's right, just who's left.
--
Paul B. Gallagher
pbg translations, inc.
"Russian Translations That Read Like Originals"
http://pbg-translations.com
