Corpora: rewrite rules for speech

James L. Fidelholtz jfidel at siu.buap.mx
Tue Oct 24 23:40:04 UTC 2000


On Mon, 23 Oct 2000, Jim Magnuson wrote:

>Hi. I am trying to compute estimates of, e.g., diphone transitional
>probabilities in conversational speech. So far I have worked with the
>CallHome database from the LDC. What I'm working with are orthographic
>transcripts of telephone conversations. I've replaced all of the
>orthographic forms with phonemic citation forms. This gives me very
>different estimates of diphone probabilities than, e.g., written corpora
>or frequency-weighted dictionaries.
>
>However, citation forms are obviously not ideal. For my purposes, it is
>not worth investing in retranscribing the corpus phonetically. But I would
>like to improve my estimates by applying phonological rules to my corpus
>of phonemic citation forms. Could anyone point me towards a source of such
>rules for American English? I've started working on my own, but would
>rather not reinvent anything.
>
Jim:
	The Commodore 64 had a pretty decent program for converting
writing to speech (I think it was in C64 BASIC, which should make it
easy to read the rules off of, and to convert for your purposes).  I
can't get at it any more, and I don't remember the name, but it should
be traceable somewhere on the web.
	Another tack: there is a book edited by Philip A. Luelsdorff
(1987. _Orthography and phonology_. Amsterdam: John Benjamins) with
articles which should be some help, and not only for English.  Though old
and grotty, the following might also be of some help:

Hultzén, Lee S.; Joseph H. D. Allen Jr.; and Murray S. Miron. 1964.
_Tables of transitional frequencies of English phonemes_.  Urbana: U of
Illinois Press.

Even older and grottier, but maybe useful, is:

Dewey, Godfrey. 1923. _Relativ [sic] frequency of English speech
sounds_. Cambridge: Harvard U. Press.  -- I think this is still in
print.

Luelsdorff, in particular, has more (earlier) work of interest,
including, I believe, one book in the Mouton blue series.  Just look
around a good library a little.  Lots of work has definitely been done
on this.
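
	For concreteness, here is a minimal sketch of the kind of pipeline
the question describes: apply a rewrite rule to phonemic citation forms,
then estimate diphone transitional probabilities from the result.
Everything in it is an assumption made for illustration: the ARPAbet-style
symbols, the function names, the three-word toy corpus, and above all the
flapping rule, which is deliberately oversimplified (it ignores stress and
word boundaries) and is not taken from any of the sources above.

```python
from collections import Counter, defaultdict

# Hypothetical vowel inventory (ARPAbet-style, stress marks omitted).
VOWELS = {"AA", "AE", "AH", "AO", "EH", "ER", "IH", "IY", "OW", "UH", "UW"}


def flap(phones):
    """Toy rewrite rule: T or D between two vowels becomes a flap (DX).

    A real flapping rule would also check stress and word boundaries.
    """
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in {"T", "D"}
                and phones[i - 1] in VOWELS
                and phones[i + 1] in VOWELS):
            out[i] = "DX"
    return out


def transitional_probs(utterances):
    """Estimate P(next phone | current phone) from adjacent pairs."""
    pair_counts = defaultdict(Counter)
    for phones in utterances:
        for current, following in zip(phones, phones[1:]):
            pair_counts[current][following] += 1
    return {
        current: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
        for current, counts in pair_counts.items()
    }


# Made-up citation forms for "butter", "water", "stop".
citation = [
    ["B", "AH", "T", "ER"],
    ["W", "AO", "T", "ER"],
    ["S", "T", "AA", "P"],
]
surface = [flap(u) for u in citation]

citation_probs = transitional_probs(citation)
surface_probs = transitional_probs(surface)
```

Comparing citation_probs with surface_probs shows how even one rule
shifts the estimates: in this toy data the T-to-ER transition disappears
from the surface forms and is replaced by DX-to-ER.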
		Jim

--
James L. Fidelholtz			e-mail: jfidel at siu.buap.mx
Posgrado en Ciencias del Lenguaje	tel.: +(52-2)229-5500 x5705
Instituto de Ciencias Sociales y Humanidades	fax: +(01-2) 229-5681
Benemérita Universidad Autónoma de Puebla, MÉXICO


