8.1135, Sum: OCR Software

linguist at linguistlist.org
Tue Aug 5 01:50:37 UTC 1997


LINGUIST List:  Vol-8-1135. Mon Aug 4 1997. ISSN: 1068-4875.

Subject: 8.1135, Sum: OCR Software

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
                   Ann Dizdar <ann at linguistlist.org>
Assistant Editor:  Martin Jacobsen <marty at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/


Editor for this issue: Martin Jacobsen <marty at linguistlist.org>

=================================Directory=================================

1)
Date:  Mon, 4 Aug 1997 16:15:02 -0400
From:  dbeck at chass.utoronto.ca (David Beck)
Subject:  Sum: OCR software

-------------------------------- Message 1 -------------------------------

Date:  Mon, 4 Aug 1997 16:15:02 -0400
From:  dbeck at chass.utoronto.ca (David Beck)
Subject:  Sum: OCR software


A couple of weeks back I posted a query about OCR software for the Mac
that is trainable enough to be useful to a linguist scanning Latin or
IPA-based non-English texts. Thanks to

        Jakob Dempsey
        Sarah Rilling
        Michael Betsch
        Andrew Arefiev
        Marc Fryd
and     Daniel Loehr

for their responses.

In the Mac world, it appears that the front-runner in this area is the
widely-available OmniPage programme from Caere Corporation
(http://www.caere.com for info). It is apparently trainable, although
one respondent expressed some doubts about being able to train it to
handle more than a single special font. I should also mention that the
first sales rep I talked to about OmniPage seemed to think that it
might have trouble with the combinations of letters and diacritics
typical of IPA-based alphabets. However, the publicity literature on
the Web site seems to imply that it can be trained to recognize
combinations of separate characters, and the last sales rep I talked
to seemed to think that there was no doubt that OmniPage could do the
job.

Jakob Dempsey also mentioned an "expensive Kurzweil product" for the
Mac, but I haven't heard anything further about this.

I also got two responses that mentioned Windows-based applications
that are highly trainable. One is a German product called OPTOPUS,
made by Makrolog in Wiesbaden. It is "exclusively trainable"--that is,
it needs to be trained from scratch and so can be configured for any
alphabet you like. The other is FineReader, from a Russian company
called Bit Software (www.bitsoft.ru); in addition to offering a wide
range of preset alphabets for languages using both Latin and Cyrillic
scripts, they report having successfully trained it to recognize
Icelandic and Tibetan fonts.

David Beck

======================================================================
David Beck
Department of Linguistics
Sixth Floor, Robarts Library
130 St. George St.
University of Toronto
Toronto, Ontario  M5S 3H1
Canada
e-mail: dbeck at chass.utoronto.ca
phone: (416) 978-4029
       (416) 923-2394 (home)
FAX:   (416) 971-2688

---------------------------------------------------------------------------
LINGUIST List: Vol-8-1135


