Corpora: rewrite rules for speech

William M. Fisher william.fisher at nist.gov
Mon Oct 23 20:33:00 UTC 2000


Jim Magnuson wrote:

> Hi. I am trying to compute estimates of, e.g., diphone transitional
> probabilities in conversational speech. So far I have worked with the
> CallHome database from the LDC. What I'm working with are orthographic
> transcripts of telephone conversations. I've replaced all of the
> orthographic forms with phonemic citation forms. This gives me very
> different estimates of diphone probabilities than, e.g., written corpora
> or frequency-weighted dictionaries.
>
> However, citation forms are obviously not ideal. For my purposes, it is
> not worth investing in retranscribing the corpus phonetically. But I would
> like to improve my estimates by applying phonological rules to my corpus
> of phonemic citation forms. Could anyone point me towards a source of such
> rules for American English? I've started working on my own, but would
> rather not reinvent anything.
>
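
Jim's procedure has three steps: look up a citation form for each word, rewrite the phoneme string with phonological rules, then count diphones. It can be sketched in Python as below. The tiny lexicon, the ARPAbet-style symbols, and the single simplified flapping rule are illustrative assumptions. They are not taken from CallHome, Switchboard, or any published rule set. The transitional probability is estimated as P(b | a) = count(a b) / count(a followed by any phone).

```python
from collections import Counter

# Hypothetical toy lexicon (word -> citation-form phonemes, ARPAbet-style).
# Illustrative only; a real run would use a full pronouncing dictionary.
LEXICON = {
    "what": ["W", "AH", "T"],
    "a": ["AH"],
    "water": ["W", "AO", "T", "ER"],
    "is": ["IH", "Z"],
}

# ARPAbet vowel symbols (ER counts as a vowel here).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def to_phones(utterance, lexicon=LEXICON):
    """Concatenate citation forms for every word in an orthographic utterance."""
    phones = []
    for word in utterance.lower().split():
        phones.extend(lexicon[word])  # raises KeyError for out-of-vocabulary words
    return phones


def flap(phones):
    """Assumed, simplified flapping rule: T or D -> DX between two vowels.

    Real flapping also depends on stress and speaking rate; this sketch ignores both.
    It runs over the whole utterance, so it can fire across word boundaries.
    """
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if phones[i] in ("T", "D") and phones[i - 1] in VOWELS and phones[i + 1] in VOWELS:
            out[i] = "DX"
    return out


def transitional_probs(utterances):
    """Estimate P(b | a) = count(a b) / count(a followed by any phone)."""
    pair_counts = Counter()
    left_counts = Counter()
    for phones in utterances:
        for a, b in zip(phones, phones[1:]):
            pair_counts[(a, b)] += 1
            left_counts[a] += 1
    return {(a, b): n / left_counts[a] for (a, b), n in pair_counts.items()}


if __name__ == "__main__":
    corpus = ["what a", "water is"]
    citation = [to_phones(u) for u in corpus]
    surface = [flap(p) for p in citation]
    print(transitional_probs(citation)[("AH", "T")])  # 1.0
    print(transitional_probs(surface)[("AH", "DX")])  # 1.0
```

The rule runs over the concatenated utterance rather than word by word. This lets it catch cross-word contexts such as "what a", which a lookup in a frequency-weighted dictionary cannot. A fuller rule set would need rule ordering and stress marking, and this sketch models neither.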

  A couple of years ago Steve Greenberg and colleagues at ICSI phonetically
transcribed part of the Switchboard corpus, and Joe Picone at ISIP has
made the transcriptions available for download from:

     http://www.isip.msstate.edu/projects/switchboard/index.html

 - Bill F.


