[Corpora-List] Phoneme frequency information
caren at brinckmann.de
Tue Dec 15 11:21:00 UTC 2009
Dear Thomas,
the lexical database CELEX
(http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96L14) contains
phonemic transcriptions and frequency information for each German entry (no
Spanish, I'm afraid), which can be used to compute the relative frequency of
each German phoneme. The frequency information in CELEX was computed from
written corpora (5.4 million tokens) and transcribed speech (600,000 tokens).
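
A minimal sketch of that computation, assuming the lexicon has already been read into (phoneme list, word frequency) pairs. The function name and the toy entries below are hypothetical and are not real CELEX data; the actual CELEX file layout and transcription format would need their own parsing step.

```python
from collections import Counter

def phoneme_relative_frequencies(entries):
    """Weight each phoneme by the frequency of the word it occurs in.

    entries: iterable of (phonemes, word_frequency) pairs, where phonemes
    is a list of phoneme symbols for one lexicon entry.
    """
    counts = Counter()
    for phonemes, word_freq in entries:
        for p in phonemes:
            counts[p] += word_freq
    total = sum(counts.values())
    # Return an empty dict rather than dividing by zero on empty input.
    return {p: c / total for p, c in counts.items()} if total else {}

# Hypothetical toy entries for illustration only (not taken from CELEX).
toy_entries = [
    (["d", "a", "s"], 6),
    (["d", "a"], 2),
    (["s", "o"], 2),
]
rel = phoneme_relative_frequencies(toy_entries)
for phoneme, freq in sorted(rel.items(), key=lambda kv: -kv[1]):
    print(f"{phoneme}\t{freq:.3f}")
```

Weighting by word frequency gives token-based phoneme frequencies; setting every frequency to 1 would give type-based frequencies over the lexicon instead.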
Another solution might be to apply a grapheme-to-phoneme converter (as used in
text-to-speech synthesis systems) to your own corpus - written sources or
transcribed speech - and compute the phoneme frequencies from this converted
corpus.
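
The converter-based approach could be sketched roughly as follows. Here `toy_g2p` is a hypothetical stand-in (a tiny lookup table) for whatever grapheme-to-phoneme converter is actually available; a real converter would handle unknown words and would likely have a different interface.

```python
from collections import Counter

def toy_g2p(word):
    # Hypothetical stand-in for a real grapheme-to-phoneme converter:
    # a tiny lookup table that returns no phonemes for unknown words.
    lexicon = {"das": ["d", "a", "s"], "da": ["d", "a"]}
    return lexicon.get(word.lower(), [])

def corpus_phoneme_frequencies(tokens, g2p):
    """Convert every corpus token to phonemes and count them."""
    counts = Counter()
    for token in tokens:
        counts.update(g2p(token))
    total = sum(counts.values())
    # Return an empty dict rather than dividing by zero on empty input.
    return {p: c / total for p, c in counts.items()} if total else {}

tokens = "Das da das".split()
freqs = corpus_phoneme_frequencies(tokens, toy_g2p)
print(freqs)
```

Because every running token is converted, frequent words contribute more to the counts without needing a separate frequency list.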
HTH
Caren.
--
Caren Brinckmann
Institut für Deutsche Sprache (IDS)
R5, 6-13
68161 Mannheim
Germany
Tel: +49-621-1581-219
Fax: +49-621-1581-200
On 15 December 2009 at 11:10, Thomas Schmidt <thomas.schmidt at uni-hamburg.de>
wrote:
> Dear list members,
>
> a colleague of mine is looking for frequency information of phonemes
> in German and Spanish, i.e. relative frequencies of each phoneme in a
> (reasonably large) corpus of those languages. Does anybody know if
> such frequency lists are out there somewhere? Any hints will be
> greatly appreciated.
>
> - Thomas
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora