8.1135, Sum: OCR Software
linguist at linguistlist.org
Tue Aug 5 01:50:37 UTC 1997
LINGUIST List: Vol-8-1135. Mon Aug 4 1997. ISSN: 1068-4875.
Subject: 8.1135, Sum: OCR Software
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>
Review Editor: Andrew Carnie <carnie at linguistlist.org>
Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
Ann Dizdar <ann at linguistlist.org>
Assistant Editor: Martin Jacobsen <marty at linguistlist.org>
Software development: John H. Remmers <remmers at emunix.emich.edu>
Zhiping Zheng <zzheng at online.emich.edu>
Home Page: http://linguistlist.org/
Editor for this issue: Martin Jacobsen <marty at linguistlist.org>
=================================Directory=================================
1)
Date: Mon, 4 Aug 1997 16:15:02 -0400
From: dbeck at chass.utoronto.ca (David Beck)
Subject: Sum: OCR software
-------------------------------- Message 1 -------------------------------
Date: Mon, 4 Aug 1997 16:15:02 -0400
From: dbeck at chass.utoronto.ca (David Beck)
Subject: Sum: OCR software
A couple of weeks back I posted a query about OCR software for the Mac
that is trainable enough to be useful to a linguist scanning Latin or
IPA-based non-English texts. Thanks to
Jakob Dempsey
Sarah Rilling
Michael Betsch
Andrew Arefiev
Marc Fryd
and Daniel Loehr
for their responses.
In the Mac world, it appears that the front-runner in this area is the
widely available OmniPage programme from Caere Corporation
(http://www.caere.com for info). It is apparently trainable, although
one respondent expressed some doubts about being able to train it to
handle more than a single special font. I should also mention that the
first sales rep I talked to about OmniPage seemed to think that it
might have trouble with the combinations of letters and diacritics
typical of IPA-based alphabets. However, the publicity literature on
the Web site seems to imply that it can be trained to recognize
combinations of separate characters, and the last sales rep I talked
to seemed to have no doubt that OmniPage could do the job.
Jakob Dempsey also mentioned an "expensive Kurzweil product" for the
Mac, but I haven't heard anything further about this.
I also got two responses that mentioned Windows-based applications
that are highly trainable. One is OPTOPUS, made by the German company
Makrolog in Wiesbaden, which is "exclusively trainable"--that is, it
needs to be trained from scratch and so can be configured to any
alphabet you like. The other is FineReader, from a Russian company
called Bit Software (www.bitsoft.ru); in addition to having a wide
range of set alphabets for languages using both Latin and Cyrillic,
they report having successfully trained it to recognize Icelandic and
Tibetan fonts.
David Beck
======================================================================
David Beck
Department of Linguistics
Sixth Floor, Robarts Library
130 St. George St.
University of Toronto
Toronto, Ontario M5S 3H1
Canada
e-mail: dbeck at chass.utoronto.ca
phone: (416) 978-4029
(416) 923-2394 (home)
FAX: (416) 971-2688
---------------------------------------------------------------------------
LINGUIST List: Vol-8-1135