IPA converter
Herb Stahlke
hfwstahlke at GMAIL.COM
Fri Mar 13 02:19:41 UTC 2009
Back in the early 80s, Dennis Klatt of MIT adapted text-to-speech
software he had developed in the late 70s for commercial use in
Digital Equipment Corporation's DECTalk, drawing in part on a set of
about 625 letter-to-sound rules for English designed by Sharon
Hunnicutt of the Swedish Royal Institute of Technology. The DECTalk
engineers and programmers built in a programmer's back door to the
device that allowed them to check the accuracy of the text-to-phoneme
conversion; the back door produced an IPA phonemic representation of
the speech that the device would produce. The system was fairly
accurate, but it came with a built-in data file of 9000 words that
defied spelling rules, like "were," and there was a capability for
users to add words to this list, to take care of, for example, proper
nouns. If the system could not produce a phonemic representation, it
would read out the spelling of the word. This was the most effective
commercial text-to-IPA system of its time, although I'm sure better
ones have been developed since I stopped working in that area. Its
accuracy without the data set of irregular spellings was about 70%;
with the firmware word list it reached about 92% on newspaper text.
That figure would fall sharply if you had it read a phone book.
I don't know if any of the old DECTalk devices are still
around--they'd be about twenty-five years old now. Later versions,
which were reduced to a card you could insert in a motherboard slot,
no longer had the IPA back door.
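
The lookup order described above (word lists first, then letter-to-sound
rules, then spelling the word out) can be sketched roughly as follows. This
is a minimal illustration, not DECTalk's actual design: the names
EXCEPTIONS, USER_WORDS, RULES, add_user_word and transcribe are all
hypothetical, the handful of rules stands in for the roughly 625 real ones,
and checking the user list before the built-in list is an assumption.

```python
# Hypothetical miniature of a lexicon-plus-rules pipeline. The entries and
# rules below are illustrative stand-ins, not DECTalk's actual data.

EXCEPTIONS = {"were": "wɝ"}   # stand-in for the built-in irregular-word file
USER_WORDS = {}               # stand-in for the user-extensible word list

# Ordered letter-to-sound rules: multi-letter graphemes come first so that
# "sh" is tried before "s".
RULES = [
    ("sh", "ʃ"), ("ee", "i"),
    ("p", "p"), ("t", "t"), ("s", "s"), ("h", "h"),
    ("m", "m"), ("n", "n"), ("i", "ɪ"), ("e", "ɛ"), ("a", "æ"),
]


def add_user_word(word, phonemes):
    """Register a user-supplied pronunciation, e.g. for a proper noun."""
    USER_WORDS[word.lower()] = phonemes


def apply_rules(word):
    """Return a phoneme string, or None if some letter matches no rule."""
    out = []
    pos = 0
    while pos < len(word):
        for grapheme, phoneme in RULES:
            if word.startswith(grapheme, pos):
                out.append(phoneme)
                pos += len(grapheme)
                break
        else:
            # No rule matched at this position: give up on the whole word.
            return None
    return "".join(out)


def transcribe(word):
    """Return (source, text), where source names the stage that answered."""
    key = word.lower()
    if key in USER_WORDS:
        return ("user", USER_WORDS[key])
    if key in EXCEPTIONS:
        return ("lexicon", EXCEPTIONS[key])
    phonemes = apply_rules(key)
    if phonemes is not None:
        return ("rules", phonemes)
    # Fallback: spell the word out letter by letter.
    return ("spelled", " ".join(key))
```

With these toy rules, transcribe("sheep") comes back from the rule stage
as "ʃip", transcribe("were") is answered by the exception list, and a word
like "xyz" that no rule covers is spelled out.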
Herb
On Thu, Mar 12, 2009 at 7:53 PM, Chris Waigl <chris at lascribe.net> wrote:
>
> On 12 Mar 2009, at 04:45, Herb Stahlke wrote:
>>
>> Hopelessly unreliable. I tried it on a number of words and phrases.
>> Some it doesn't convert at all, some it converts in almost a random
>> way, like orthographic <h> replaced by [Q]. It didn't recognize the
>> SIL IPA 93 font in my font library, even though that's the font it
>> asks for. It couldn't transcribe "caught." It comes up with
>> unexplained symbols like a double >. It does, however, convert
>> "little" with a syllabic /l/.
>
>
> I've been putting together an embryonic IPA converter, which currently
> lives here: http://ipalizer.appspot.com/ .
>
> Right now, all it does is transform Merriam-Webster style phonetics
> into IPA. So in order to use it, you need to:
>
> * Access the MW page for a word (say: http://www.merriam-webster.com/dictionary/friday)
> * Copy the phonetic transcription into your clipboard
> * Paste it into the tool's text field and hit submit
>
> There are two major limitations:
>
> 1. It can't distinguish between [T] (as in "thief") and [D] (as in
> "this"). The reason is that whoever spec'ed the MW representation of
> phonetic characters had the extraordinarily bright idea of using the
> <u> element in the HTML markup to realize underlining, which
> distinguishes the two phonemes in their version of phonetic
> transcription. Markup-level underlining does not copy and paste.
> 2. This is not really at a publishable level of completion. Way
> pre-alpha. While I'd be delighted to get any feedback, please get in
> touch if you want to use it for pretty much anything beyond playing around.
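
The underlining problem in limitation 1 goes away if the converter reads
the HTML itself rather than pasted text, since the <u> element is then
still visible. A rough sketch using Python's standard html.parser module
follows. The class and function names are hypothetical, the input fragments
are made up rather than copied from MW's pages, and it assumes that the
underlined "th" is the voiced sound (as in "this") and the plain "th" the
voiceless one (as in "thief").

```python
from html.parser import HTMLParser


class ThParser(HTMLParser):
    """Collect text, mapping 'th' to a different IPA symbol inside <u>."""

    def __init__(self):
        super().__init__()
        self.underline_depth = 0   # > 0 while inside one or more <u> tags
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "u":
            self.underline_depth += 1

    def handle_endtag(self, tag):
        if tag == "u" and self.underline_depth > 0:
            self.underline_depth -= 1

    def handle_data(self, data):
        # Assumption: underlined "th" is voiced, plain "th" is voiceless.
        symbol = "ð" if self.underline_depth > 0 else "θ"
        self.parts.append(data.replace("th", symbol))


def th_to_ipa(fragment):
    """Convert only the 'th' digraphs in an HTML fragment; keep the rest."""
    parser = ThParser()
    parser.feed(fragment)
    parser.close()   # flush any text still buffered by the parser
    return "".join(parser.parts)
```

For example, th_to_ipa("<u>th</u>is") gives "ðis", while th_to_ipa("thēf")
gives "θēf". The remaining MW symbols would still need their own mapping.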
>
> The next thing I want to do is to replace the input field with a field
> asking for the word to transcribe, then retrieve the MW page myself,
> scrape out the phonetics, and then transpose those to IPA. Also, the
> same could be done for AHD4 (as per bartleby.com), but they use even
> more markup, which complicates matters.
>
> Chris Waigl
> who is still very unhappy with the state of phonetic transcription in
> English online dictionaries (the OED *still* uses small images for
> some characters! do they need someone helping them out with Unicode?),
> and amused about MW's choice of class names: <dd class="pron"><span
> class="pronchars">
>
> --
> Chris Waigl -- http://chryss.eu -- http://eggcorns.lascribe.net
> twitter: chrys -- friendfeed: chryss
>
> ------------------------------------------------------------
> The American Dialect Society - http://www.americandialect.org
>