[Corpora-List] Fwd: Re: Phoneme frequency information

FIDELIS HOLTZ fidelholtz at gmail.com
Tue Dec 15 15:29:33 UTC 2009


Hi, all,

Angus's point is well taken. Although I haven't done much research on this
particular point, there are clear indications (e.g., I once wrote an
assembly-language program for syllabifying written Spanish with virtually
100% coverage using only 310 bytes [no Ks or other prefixes!]) that for
Spanish the differences between the written language and the (automatically
derived) spoken language are less striking than in the case of English and
(I assume) German, so that a grapheme-to-phoneme conversion very well might
produce reasonably accurate statistics. That is, Spanish orthography,
despite the very real (though relatively minor) problems which it does have
and which I won't go into here, is relatively more 'phonemic' with respect
to the spoken language than (at least) English is. (This, I believe, would
be true for the majority of major dialects of Spanish, i.e., at least
Mexican Spanish and 'Academic' European Spanish, as well as other major
American dialects ('cultured' Colombian and Venezuelan, for example -- I'm
not so sure about 'Caribbean' varieties).)
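
To make the idea concrete, here is a rough Python sketch of rule-based
grapheme-to-phoneme conversion for written Spanish, followed by a relative
frequency count. It is not the assembly program mentioned above, and the
rule table, the phoneme symbols and the function names (to_phonemes,
phoneme_frequencies) are all my own assumptions for illustration. It assumes
a 'seseo' variety (c before e/i, and z, both map to /s/), treats every
written y as a consonant, and ignores stress and syllable structure, so its
output is only an estimate under those encoded norms.

```python
from collections import Counter

# Hypothetical, heavily simplified rules for written Spanish (seseo variety).
DIGRAPHS = {"ch": "tʃ", "ll": "ʝ", "rr": "r", "qu": "k"}
SINGLES = {
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
    "á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u", "ü": "u",
    "b": "b", "v": "b", "c": "k", "d": "d", "f": "f", "g": "g",
    "j": "x", "k": "k", "l": "l", "m": "m", "n": "n", "ñ": "ɲ",
    "p": "p", "r": "ɾ", "s": "s", "t": "t", "y": "ʝ", "z": "s",
}
FRONT = set("eiéí")  # vowels that soften a preceding c or g


def to_phonemes(word):
    """Convert one written word to a list of (assumed) phoneme symbols."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        pair = word[i:i + 2]
        nxt = word[i + 1:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        elif pair == "gu" and word[i + 2:i + 3] in FRONT:
            out.append("g")      # silent u in gue, gui
            i += 2
        elif ch == "h":
            i += 1               # written h is silent
        elif ch == "c" and nxt in FRONT:
            out.append("s")      # seseo: ce, ci pronounced with /s/
            i += 1
        elif ch == "g" and nxt in FRONT:
            out.append("x")      # ge, gi
            i += 1
        elif ch == "x":
            out.extend(["k", "s"])
            i += 1
        elif ch in SINGLES:
            out.append(SINGLES[ch])
            i += 1
        else:
            i += 1               # skip punctuation and unknown characters
    return out


def phoneme_frequencies(text):
    """Relative frequency of each phoneme symbol in a whitespace-split text."""
    counts = Counter()
    for word in text.split():
        counts.update(to_phonemes(word))
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}
```

Calling phoneme_frequencies on a corpus string gives a dictionary of
proportions. Changing the rule table (say, for a non-seseo variety) changes
the result, which is exactly the caveat Angus raises.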

Jim

On Tue, Dec 15, 2009 at 8:26 AM, Angus B. Grieve-Smith <grvsmth at panix.com> wrote:

>
>> Another solution might be to apply a grapheme-to-phoneme converter (as
>> used in text-to-speech synthesis systems) to your own corpus - written
>> sources or transcribed speech - and compute the phoneme frequencies from
>> this converted corpus.
>>
>   I would like to point out that this will not give you actual phoneme
> frequency, only an estimate of what the frequencies would be if every word
> were pronounced according to the standards encoded in the
> grapheme-to-phoneme converter.
>
> --
>                                -Angus B. Grieve-Smith
>                                grvsmth at panix.com
>
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
James L. Fidelholtz
Posgrado en Ciencias del Lenguaje
Instituto de Ciencias Sociales y Humanidades
Benemérita Universidad Autónoma de Puebla, MÉXICO