[Corpora-List] Automatic IPA transcription

Christian Pietsch chr.pietsch at googlemail.com
Wed Jun 20 19:17:57 UTC 2012


Hi Sam,

I assume you have English text, not speech. Then what you need is a
grapheme-to-phoneme (G2P) converter. You will find them as components
of text-to-speech (TTS) systems. For English text, you could use
eSpeak or Festival, both of which are easily obtainable, e.g. as
Debian or Ubuntu Linux packages. Here is something I tried:

$ echo 'Will you pronounce this correctly?' | espeak -v en -x -q 
--> wIl ju: pr@n'aUns DIs k@r'Ektli

The output you can see here is what eSpeak calls “phoneme mnemonics”.
It looks close to X-SAMPA, an ASCII representation of IPA (eSpeak's
own documentation describes the scheme as based on Kirshenbaum's). For
a mapping table and code in several programming languages, including
Python, see Henrik Theiling's IPA site <http://www.theiling.de/ipa/>.
Using his cxs.py module and CXS.def lookup table, I get this result:
--> wɪl juː prənˈaʊns ðɪs kərˈɛktli

Looks OK to me.
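
Since you asked for something you can plug into a Python program, here
is a minimal sketch of the two steps above: call eSpeak with the same
flags, then map its ASCII mnemonics to IPA. Note that this is not
Henrik Theiling's cxs.py. The lookup table and the function names are
my own hypothetical ones, and the table covers only the symbols that
occur in the example sentence, so treat it as a starting point rather
than a complete converter.

```python
import subprocess

# Same flags as the shell command above: English voice (-v en), print
# phoneme mnemonics (-x), and do not actually speak (-q).
ESPEAK_CMD = ["espeak", "-v", "en", "-x", "-q"]

# Hypothetical, deliberately tiny table: it covers only the mnemonics
# seen in the example sentence, not eSpeak's full phoneme inventory.
MNEMONIC_TO_IPA = str.maketrans({
    "I": "ɪ",
    "U": "ʊ",
    "E": "ɛ",
    "D": "ð",
    "@": "ə",   # schwa
    ":": "ː",   # length mark
    "'": "ˈ",   # primary stress
})


def espeak_mnemonics(text):
    """Pipe `text` to eSpeak on stdin and return its phoneme mnemonics."""
    result = subprocess.run(
        ESPEAK_CMD, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def mnemonics_to_ipa(mnemonics):
    """Replace each known ASCII mnemonic with its IPA character.

    Characters that are not in the table are passed through unchanged.
    """
    return mnemonics.translate(MNEMONIC_TO_IPA)


def transcribe(text):
    """Text in, IPA out. Requires the espeak binary on your PATH."""
    return mnemonics_to_ipa(espeak_mnemonics(text))
```

With eSpeak installed, transcribe('Will you pronounce this correctly?')
should give something like the IPA line above, although the exact
mnemonics may differ between eSpeak versions. For real use you would
want to swap the small table for a complete one such as CXS.def.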

Instead of using parts of a full TTS system, you can also use
stand-alone G2P tools such as Sequitur G2P or Phonetisaurus, but you
might have to train them first.

Hope this helps,
Christian


On Tue, Jun 19, 2012 at 02:23:30PM -0400, Sam Raker wrote:
> I was wondering if anyone has found a good (OSX/*NIX-compatible)
> program for automatic transcription (of English) to IPA. There are a
> few websites that offer to do it, but I'd prefer something I could
> plug in to a python program, if possible.

-- 
  Christian Pietsch
  http://purl.org/net/pietsch
  Bielefeld University, Bielefeld, Germany
  University Library and CRC 882
