Corpora: Phonemic Corpora

Bill Fisher william.fisher at nist.gov
Tue Nov 14 14:53:59 UTC 2000


VSWarren at aol.com wrote:

> Can anyone please suggest either a program to convert from orthographic to
> phonemic or alternatively a large corpora where phonemic transcriptions are
> given for such a large number of different words.

  You can download software that does a pretty good job
of converting text to segmental phonemes from the NIST
website: see http://www.nist.gov/speech/tools/index.htm.
But you should be aware that its output consists of
underlying phonemic forms, which are often realized differently
in actual speech; for instance, what usually surfaces as
a syllabic consonant is phonemicized as a sequence of
(zero-stressed) schwa plus consonant (as in "button").
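
  If you need something closer to surface forms, a small
post-processing pass can collapse schwa plus sonorant into a
syllabic consonant. What follows is only a rough sketch, not
part of the NIST software: the symbols ("ax" for the unstressed
schwa, "el"/"em"/"en" for the syllabic consonants) and the
function name are assumptions chosen for illustration, and the
rule ignores the phonetic context that conditions the process
in real speech.

```python
# Hypothetical symbol set: "ax" = zero-stressed schwa; "el", "em", "en" =
# syllabic consonants. Neither the symbols nor this rule come from the NIST tool.
SYLLABIC = {"l": "el", "m": "em", "n": "en"}


def surface_syllabics(phones, schwa="ax"):
    """Collapse each schwa + l/m/n pair into a single syllabic consonant."""
    out = []
    i = 0
    while i < len(phones):
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if phones[i] == schwa and nxt in SYLLABIC:
            out.append(SYLLABIC[nxt])  # merge the two segments into one
            i += 2
        else:
            out.append(phones[i])      # copy every other segment unchanged
            i += 1
    return out


# Underlying form of "button" as schwa plus consonant, then the surface form.
print(surface_syllabics(["b", "ah", "t", "ax", "n"]))
# → ['b', 'ah', 't', 'en']
```

  A serious version would restrict the rule by context (for
instance, by the preceding consonant), so treat this as a
starting point only.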

  A good free lexicon of English is available from CMU;
see http://www.speech.cs.cmu.edu/cgi-bin/pronounce.
In addition, the LDC offers a high-accuracy one, but
it's not free.  And Joe Picone & Co. of Mississippi State
are making available lexicons derived from their
re-transcription of the Switchboard corpus along with
phonetic transcriptions from ICSI; see
http://www.isip.msstate.edu/projects/switchboard/index.html.
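
  Once you have a lexicon file, looking words up is simple.
Here is a minimal sketch, assuming the file uses the layout of
the downloadable CMU dictionary: one entry per line, the word
followed by space-separated phones, comment lines starting
with ";;;", and alternate pronunciations marked with a number
in parentheses after the word. The function name and the sample
entries are illustrative, not copied from the actual file.

```python
import re
from collections import defaultdict


def parse_lexicon(lines):
    """Map each lowercased word to a list of pronunciations (phone lists)."""
    lexicon = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head, *phones = line.split()
        # Strip an alternate-pronunciation marker such as "(1)" from the word.
        word = re.sub(r"\(\d+\)$", "", head).lower()
        lexicon[word].append(phones)
    return dict(lexicon)


# Illustrative entries in the assumed format.
sample = [
    ";;; sample entries for demonstration only",
    "BUTTON  B AH1 T AH0 N",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]
lex = parse_lexicon(sample)
print(lex["button"])
print(len(lex["read"]))
```

  To use it on a real file, pass an open file object in place
of the sample list; check the file's encoding and comment
conventions first, since they may differ between releases.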

 - Bill F.

