Corpora: rewrite rules for speech
Jim Magnuson
magnuson at ling.ling.rochester.edu
Mon Oct 23 17:21:17 UTC 2000
Hi. I am trying to compute estimates of, e.g., diphone transitional
probabilities in conversational speech. So far I have worked with the
CallHome database from the LDC. What I'm working with are orthographic
transcripts of telephone conversations. I've replaced all of the
orthographic forms with phonemic citation forms. This gives me very
different estimates of diphone probabilities than I get from, e.g.,
written corpora or frequency-weighted dictionaries.
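
For concreteness, here is a minimal sketch of one common way to estimate diphone transitional probabilities from a phonemically transcribed corpus: count each adjacent phoneme pair and divide by the number of times the first phoneme begins any pair. The function name, the ARPAbet-style symbols, and the toy utterances are illustrative assumptions, not the actual CallHome processing.

```python
from collections import Counter

def diphone_transitional_probs(utterances):
    """Estimate P(second | first) for every adjacent phoneme pair.

    `utterances` is an iterable of phoneme lists, one per utterance.
    Pairs are not counted across utterance boundaries.
    """
    pair_counts = Counter()
    first_counts = Counter()
    for phones in utterances:
        for a, b in zip(phones, phones[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Toy data (hypothetical): "the cat", "the dog" in ARPAbet-style symbols.
toy = [["DH", "AH", "K", "AE", "T"], ["DH", "AH", "D", "AO", "G"]]
probs = diphone_transitional_probs(toy)
```

Whether to count pairs across word boundaries within an utterance is a design choice; this sketch does, which seems the natural choice for conversational speech.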
However, citation forms are obviously not ideal. For my purposes, it is
not worth investing in retranscribing the corpus phonetically. But I would
like to improve my estimates by applying phonological rules to my corpus
of phonemic citation forms. Could anyone point me towards a source of such
rules for American English? I've started working on my own, but would
rather not reinvent anything.
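
As an illustration of the kind of rule application meant here, the following is a minimal sketch of context-sensitive rewrite rules applied to citation forms. It assumes ARPAbet symbols with stress digits on vowels and uses "DX" for the alveolar flap; the single flapping rule is a deliberately simplified example, not a vetted rule set for American English, and all names are hypothetical.

```python
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

def is_vowel(p):
    # Strip the stress digit (0, 1, 2) before checking; None means "no neighbor".
    return p is not None and p.rstrip("012") in VOWELS

def is_unstressed_vowel(p):
    return is_vowel(p) and p.endswith("0")

# Each rule: (target, replacement, left-context test, right-context test).
# Simplified flapping: /t/ or /d/ becomes a flap between a vowel and an
# unstressed vowel.
RULES = [
    ("T", "DX", is_vowel, is_unstressed_vowel),
    ("D", "DX", is_vowel, is_unstressed_vowel),
]

def apply_rules(phones, rules=RULES):
    """Apply the first matching rule at each position.

    Contexts are read from the input sequence, so rules apply
    simultaneously rather than feeding one another.
    """
    out = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        for target, repl, left_ok, right_ok in rules:
            if p == target and left_ok(left) and right_ok(right):
                p = repl
                break
        out.append(p)
    return out

flapped = apply_rules(["W", "AO1", "T", "ER0"])  # citation form of "water"
```

Running the rules over a whole utterance rather than word by word would let them apply across word boundaries, which matters for conversational speech.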
Thanks very much,
jim
********************************************************************
James Magnuson
Brain and Cognitive Sciences
Meliora Hall
University of Rochester
Rochester, NY 14627
phone: (716)275-0860
fax: (716)271-3043
email: magnuson at ling.rochester.edu
URL: http://www.bcs.rochester.edu/bcs/people/students/magnuson/magnuson.html