Corpora: rewrite rules for speech

Jim Magnuson magnuson at ling.rochester.edu
Mon Oct 23 17:21:17 UTC 2000


Hi. I am trying to compute estimates of, e.g., diphone transitional
probabilities in conversational speech. So far I have worked with the
CallHome database from the LDC. What I'm working with are orthographic
transcripts of telephone conversations. I've replaced all of the
orthographic forms with phonemic citation forms. This gives me very
different estimates of diphone probabilities than I get from, e.g.,
written corpora or frequency-weighted dictionaries.
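
To make the quantity concrete, here is a minimal sketch of one way to
estimate diphone transitional probabilities, P(next phone | current phone),
from utterances that have already been converted to phoneme lists. The
function name, the toy data, and the phone symbols are assumptions made up
for illustration; they are not taken from CallHome or from any particular
toolkit.

```python
from collections import Counter


def diphone_transitional_probs(utterances):
    """Estimate P(b | a) for each adjacent phone pair (a, b).

    `utterances` is an iterable of phone-symbol lists. Pairs are counted
    within an utterance only, never across utterance boundaries.
    """
    pair_counts = Counter()   # how often a is immediately followed by b
    first_counts = Counter()  # how often a occurs as the first member of a pair
    for phones in utterances:
        for a, b in zip(phones, phones[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {
        (a, b): n / first_counts[a]
        for (a, b), n in pair_counts.items()
    }


# Toy input, invented for illustration only.
toy = [["k", "ae", "t"], ["k", "ae", "b"], ["k", "ih", "t"]]
probs = diphone_transitional_probs(toy)
```

Here probs[("k", "ae")] is 2/3, because "k" is followed by "ae" in two of
its three occurrences. Weighting by word or utterance frequency, and
deciding whether to count across word boundaries, are choices this sketch
leaves open.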

However, citation forms are obviously not ideal. For my purposes, it is
not worth investing in retranscribing the corpus phonetically. But I would
like to improve my estimates by applying phonological rules to my corpus
of phonemic citation forms. Could anyone point me towards a source of such
rules for American English? I've started working on my own, but would
rather not reinvent anything.
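
As an illustration of the kind of rule meant here, the following is a
rough sketch of a context-sensitive rewrite rule applied to citation
forms. It assumes CMUdict-style symbols, where vowels carry a stress digit
(0 = unstressed), and uses a deliberately simplified version of American
English flapping: T or D becomes a flap (written DX here) between a vowel
and an unstressed vowel. The helper names and the rule formulation are
hypothetical, and the rule applies within a single word only.

```python
def is_vowel(phone):
    # Assumption: vowels are exactly the symbols ending in a stress digit.
    return phone[-1].isdigit()


def is_unstressed_vowel(phone):
    return phone.endswith("0")


def apply_rule(phones, targets, output, left_ok, right_ok):
    """Rewrite each target phone as `output` when both neighbours match.

    Contexts are checked against the original sequence, so one rewrite
    never feeds or blocks another within the same pass.
    """
    result = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in targets
                and left_ok(phones[i - 1])
                and right_ok(phones[i + 1])):
            result[i] = output
    return result


def flap(phones):
    # Simplified flapping: T/D -> DX / V __ V[unstressed]
    return apply_rule(phones, {"T", "D"}, "DX", is_vowel, is_unstressed_vowel)


butter = flap(["B", "AH1", "T", "ER0"])   # T is flapped
attack = flap(["AH0", "T", "AE1", "K"])   # unchanged: next vowel is stressed
```

A fuller rule set would also need rule ordering and contexts that span
word boundaries, which this sketch does not attempt.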

Thanks very much,

jim

********************************************************************
James Magnuson
Brain and Cognitive Sciences
Meliora Hall
University of Rochester
Rochester, NY  14627

phone:	(716)275-0860
fax:	(716)271-3043
email:	magnuson at ling.rochester.edu
URL:	http://www.bcs.rochester.edu/bcs/people/students/magnuson/magnuson.html


