Tue Aug 5 01:50:37 UTC 1997

Subject: 8.1135, Sum: OCR Software

Date:  Mon, 4 Aug 1997
From:  dbeck at chass.utoronto.ca (David Beck)
Subject:  Sum: OCR software

A couple of weeks back I posted a query about OCR software for the Mac
that is trainable enough to be useful to a linguist scanning Latin or
IPA-based non-English texts. Thanks to

        Jakob Dempsey
        Sarah Rilling
        Michael Betsch
        Andrew Arefiev
        Marc Fryd
and     Daniel Loehr

for their responses.

In the Mac world, it appears that the front-runner in this area is the
widely available OmniPage programme from Caere Corporation
(http://www.caere.com for info). It is apparently trainable, although
one respondent expressed some doubts about being able to train it to
handle more than a single special font. I should also mention that the
first sales rep I talked to about OmniPage seemed to think that it
might have trouble with the combinations of letters and diacritics
typical of IPA-based alphabets. However, the publicity literature on
the Web site seems to imply that it can be trained to recognize
combinations of separate characters, and the last sales rep I talked
to seemed to think that there was no doubt that OmniPage could do the
job.

Jakob Dempsey also mentioned an "expensive Kurzweil product" for the
Mac, but I haven't heard anything further about this.

I also got two responses that mentioned Windows-based applications
that are highly trainable. One is OPTOPUS, made by a German company
called Makrolog in Wiesbaden, which is "exclusively trainable"--that
is, it needs to be trained from scratch and so can be configured to
any alphabet you like. The other is FineReader, from a Russian company
called Bit Software (www.bitsoft.ru); in addition to having a wide
range of set alphabets for languages using the Latin and Cyrillic
alphabets, they report having successfully trained it to recognize
Icelandic and Tibetan fonts.

David Beck
Department of Linguistics
Sixth Floor, Robarts Library
130 St. George St.
University of Toronto
Toronto, Ontario  M5S 3H1
e-mail: dbeck at chass.utoronto.ca
phone: (416) 978-4029
       (416) 923-2394 (home)
FAX:   (416) 971-2688

