[RNLD] OCR

Fri Dec 6 16:51:15 UTC 2019

Hi Nick,

As of 2017, there didn’t seem to be any off-the-shelf OCR tools for the general use case of the International Phonetic Alphabet:
https://linguistics.stackexchange.com/questions/19579/ocr-program-for-ipa?atw=1

Tesseract is trainable, though. There is a learning curve, and it would take some development effort to build a new trained data file (mul.traineddata?) that works for your needs.
https://github.com/tesseract-ocr/tessdata
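For anyone trying the Tesseract route, here is a minimal sketch of running the `tesseract` command line with a custom trained data file and then counting the target characters in the output. It assumes Tesseract is installed and on the PATH, and that a suitable trained data file already exists. The language code `mul` is the tentative name from the message above, not a published model, and the image and directory names are hypothetical placeholders. Note that superscript w in IPA is normally the modifier letter ʷ (U+02B7); if the printed dictionary instead uses a formatted plain "w", plain-text OCR output will lose the superscript.

```python
import shutil
import subprocess
from pathlib import Path

# Characters from the original question. Superscript w is assumed to be the
# IPA modifier letter U+02B7, not a plain "w" with superscript formatting.
TARGETS = {
    "\u014b": "LATIN SMALL LETTER ENG",
    "\u014a": "LATIN CAPITAL LETTER ENG",
    "\u02b7": "MODIFIER LETTER SMALL W",
}


def tesseract_command(image, out_base, lang="mul", tessdata_dir=None):
    """Build: tesseract IMAGE OUTBASE -l LANG [--tessdata-dir DIR].

    "mul" is a hypothetical language code for a custom trained data file.
    """
    cmd = ["tesseract", str(image), str(out_base), "-l", lang]
    if tessdata_dir is not None:
        # Point Tesseract at the folder holding the custom .traineddata file.
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def run_ocr(image, out_base, lang="mul", tessdata_dir=None):
    """Run Tesseract and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.run(tesseract_command(image, out_base, lang, tessdata_dir), check=True)
    # By default Tesseract writes plain text to OUTBASE.txt.
    return Path(f"{out_base}.txt").read_text(encoding="utf-8")


def count_targets(text):
    """Count each target character, to check whether the model recognises it."""
    return {ch: text.count(ch) for ch in TARGETS}


# Hypothetical usage, with a page image exported from the PDF:
#   text = run_ocr("page-001.png", "page-001", tessdata_dir="./tessdata")
#   print(count_targets(text))
```

Comparing these counts against a hand-checked page gives a quick sense of whether a trained file is actually picking up ŋ and ʷ before committing to a whole dictionary.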

There is a GUI called a9t9 that may be helpful as well, but it won’t work for IPA right out of the box.  It can make use of Tesseract trained data files.

Best,
Charles Riley
Catalog Librarian for African Languages
Yale University Library

From: Nick Thieberger <thien at unimelb.edu.au>
Sent: Thursday, December 5, 2019 3:16 PM
To: RNLD mailing list <r-n-l-d at lists.unimelb.edu.au>
Subject: [RNLD] OCR

Has anyone had experience of successful OCR of ŋ and superscript w? I have tried ABBYY and OmniPage with no success. This is to produce a new version of an existing print dictionary for which we have a PDF.

Thanks,

Nick