Corpora: rewrite rules for speech

William M. Fisher william.fisher at
Mon Oct 23 20:33:00 UTC 2000

Jim Magnuson wrote:

> Hi. I am trying to compute estimates of, e.g., diphone transitional
> probabilities in conversational speech. So far I have worked with the
> CallHome database from the LDC. What I'm working with are orthographic
> transcripts of telephone conversations. I've replaced all of the
> orthographic forms with phonemic citation forms. This gives me very
> different estimates of diphone probabilities than, e.g., written corpora
> or frequency-weighted dictionaries.
> However, citation forms are obviously not ideal. For my purposes, it is
> not worth investing in retranscribing the corpus phonetically. But I would
> like to improve my estimates by applying phonological rules to my corpus
> of phonemic citation forms. Could anyone point me towards a source of such
> rules for American English? I've started working on my own, but would
> rather not reinvent anything.
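
To make the question concrete, here is a minimal Python sketch of the two steps described above: applying one rewrite rule to phonemic citation forms, then estimating diphone transitional probabilities from the result. Everything in it is an assumption made for illustration: the ARPAbet-style phone labels, the VOWELS set, the function names apply_flapping and diphone_probs, and the flapping rule itself, which is simplified to "t or d between two vowels becomes a flap (dx)" and ignores stress and word boundaries. It is not a rule set taken from any particular source.

```python
from collections import Counter, defaultdict

# Hypothetical, incomplete vowel inventory in ARPAbet-style labels (an assumption).
VOWELS = {"aa", "ae", "ah", "ao", "ax", "eh", "er", "ih", "iy", "uh", "uw"}


def apply_flapping(phones):
    """Rewrite t or d as a flap (dx) when it sits between two vowels.

    Simplified illustration only: stress and word boundaries are ignored.
    """
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in {"t", "d"}
                and phones[i - 1] in VOWELS
                and phones[i + 1] in VOWELS):
            out[i] = "dx"
    return out


def diphone_probs(utterances):
    """Estimate P(next phone | current phone) from a list of phone sequences."""
    pair_counts = defaultdict(Counter)
    for phones in utterances:
        # Count each adjacent pair of phones within an utterance.
        for first, second in zip(phones, phones[1:]):
            pair_counts[first][second] += 1
    probs = {}
    for first, followers in pair_counts.items():
        total = sum(followers.values())
        probs[first] = {second: n / total for second, n in followers.items()}
    return probs


# Toy citation forms (hypothetical data, not drawn from CallHome).
citation = [["b", "ah", "t", "er"], ["w", "ao", "t", "er"]]
rewritten = [apply_flapping(u) for u in citation]
before = diphone_probs(citation)
after = diphone_probs(rewritten)
```

On this toy data, the transition from "ah" to "t" in the citation forms is replaced by a transition from "ah" to "dx" after the rule applies, which is the kind of shift in diphone estimates the question is about. A real rule set would need many more rules and probably probabilistic application.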

  A couple of years ago Steve Greenberg and colleagues at ICSI did
phonetic transcriptions of a part of the Switchboard corpus, and Joe
Picone at ISIP has made them available for downloading from:

 - Bill F.
