[RNLD] OCR
Riley, Charles
charles.riley at yale.edu
Fri Dec 6 16:51:15 UTC 2019
Hi Nick,
As of 2017, there didn’t seem to be any OCR program that handles the general use case of the International Phonetic Alphabet:
https://linguistics.stackexchange.com/questions/19579/ocr-program-for-ipa?atw=1
Tesseract is trainable, though. There is a learning curve, and it would take some development effort to build a new trained data file (mul.traineddata?) to get it functional for your needs.
https://github.com/tesseract-ocr/tessdata
There is a GUI called a9t9 that may be helpful as well, but it won’t work for IPA right out of the box. It can make use of Tesseract trained data files.
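As a rough illustration of what using a custom trained data file might look like, here is a minimal Python sketch. It assumes Tesseract is installed and that a trained file exists in a local tessdata folder; the language code "ipa" and every path here are hypothetical, since no such model ships with Tesseract. The `-l` and `--tessdata-dir` options are standard Tesseract command-line options. The second function just counts ŋ (U+014B) and superscript w (U+02B7) in the output, as a quick check that those characters survived.

```python
import subprocess
import unicodedata
from pathlib import Path

# Hypothetical language code for a custom-trained model (ipa.traineddata).
# Nothing by this name ships with Tesseract; it would have to be trained first.
CUSTOM_LANG = "ipa"


def build_tesseract_command(image, output_base, tessdata_dir, lang=CUSTOM_LANG):
    """Return the argument list for a Tesseract run that uses a custom model."""
    return [
        "tesseract",
        str(image),
        str(output_base),               # Tesseract writes <output_base>.txt
        "-l", lang,                     # language / model to load
        "--tessdata-dir", str(tessdata_dir),  # folder holding the .traineddata file
    ]


def count_target_characters(text, targets=("\u014b", "\u02b7")):
    """Count occurrences of each target character (default: ŋ and ʷ) in OCR output."""
    # NFC keeps both characters intact; compatibility forms (NFKC) would
    # turn the superscript w into a plain w, so they are avoided here.
    normalized = unicodedata.normalize("NFC", text)
    return {ch: normalized.count(ch) for ch in targets}


def run_ocr(image, output_base, tessdata_dir, lang=CUSTOM_LANG):
    """Run Tesseract on one page image and return the recognised text."""
    command = build_tesseract_command(image, output_base, tessdata_dir, lang)
    subprocess.run(command, check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `run_ocr("page.png", "page", "./tessdata")` and passing the result to `count_target_characters` would give a per-page tally to compare against the print original. The counts only show that the characters appear, not that they appear in the right places.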
Best,
Charles Riley
Catalog Librarian for African Languages
Yale University Library
From: Nick Thieberger <thien at unimelb.edu.au>
Sent: Thursday, December 5, 2019 3:16 PM
To: RNLD mailing list <r-n-l-d at lists.unimelb.edu.au>
Subject: [RNLD] OCR
Has anyone had experience of successful OCR of ŋ and superscript w? I have tried in ABBYY and OmniPage with no success. This is to produce a new version of an existing print dictionary for which we have a PDF.
Thanks,
Nick