Corpora: rewrite rules for speech
William M. Fisher
william.fisher at nist.gov
Mon Oct 23 20:33:00 UTC 2000
Jim Magnuson wrote:
> Hi. I am trying to compute estimates of, e.g., diphone transitional
> probabilities in conversational speech. So far I have worked with the
> CallHome database from the LDC. What I'm working with are orthographic
> transcripts of telephone conversations. I've replaced all of the
> orthographic forms with phonemic citation forms. This gives me very
> different estimates of diphone probabilities than, e.g., written corpora
> or frequency-weighted dictionaries.
>
> However, citation forms are obviously not ideal. For my purposes, it is
> not worth investing in retranscribing the corpus phonetically. But I would
> like to improve my estimates by applying phonological rules to my corpus
> of phonemic citation forms. Could anyone point me towards a source of such
> rules for American English? I've started working on my own, but would
> rather not reinvent anything.
>
A couple of years ago Steve Greenberg and colleagues at ICSI did
phonetic transcriptions of a part of the Switchboard corpus, and Joe
Picone at ISIP has made them available for downloading from:
http://www.isip.msstate.edu/projects/switchboard/index.html
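
For illustration only (this is not code from either poster, and not the
method used for the ICSI transcriptions), here is a minimal Python sketch
of the workflow described in the question: apply a rewrite rule to
phonemic citation forms, then estimate diphone transitional probabilities
from the result. The ARPAbet-style symbols, the function names, and the
flapping rule are all assumptions. The rule is deliberately oversimplified:
it ignores stress and word boundaries, which a real rule set would have to
handle.

```python
from collections import Counter

# Assumed ARPAbet-style vowel inventory, without stress marks (illustrative).
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw"}


def apply_flapping(phones):
    """Rewrite /t/ or /d/ as a flap [dx] when it sits between two vowels.

    Hypothetical, oversimplified rule: real flapping also depends on stress.
    """
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in ("t", "d")
                and phones[i - 1] in VOWELS
                and phones[i + 1] in VOWELS):
            out[i] = "dx"
    return out


def diphone_transition_probs(utterances):
    """Estimate P(next phone | current phone) from lists of phone symbols."""
    pair_counts = Counter()
    first_counts = Counter()
    for phones in utterances:
        # Count each adjacent pair and the phone that starts it.
        for a, b in zip(phones, phones[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


# Toy citation forms (made up for the example, not drawn from any corpus).
citation = [["b", "ah", "t", "er"], ["b", "ah", "d", "iy"]]
rewritten = [apply_flapping(u) for u in citation]
probs = diphone_transition_probs(rewritten)
```

Running diphone_transition_probs on the citation forms and again on the
rewritten forms shows how much a single rule shifts the estimates.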
- Bill F.