[Corpora-List] Phoneme frequency information

caren at brinckmann.de caren at brinckmann.de
Tue Dec 15 11:21:00 UTC 2009


Dear Thomas,

The lexical database CELEX
(http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96L14) contains
phonemic transcriptions and frequency information for each German entry (no
Spanish, I'm afraid), which can be used to compute the relative frequency of
each German phoneme. The frequency information in CELEX was computed from
written corpora (5.4 million tokens) and transcribed speech (600,000 tokens).
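
A minimal sketch of that computation, assuming the lexicon has already been
read into (phoneme sequence, word frequency) pairs. The toy entries and the
function name below are made up for illustration; they do not reflect the
actual CELEX file format or its transcription alphabet.

```python
from collections import Counter

def phoneme_relative_frequencies(entries):
    """Token-weighted relative frequency of each phoneme.

    entries: iterable of (phonemes, word_frequency) pairs, where
    phonemes is a sequence of phoneme symbols for one lexical entry.
    """
    counts = Counter()
    for phonemes, freq in entries:
        for p in phonemes:
            # Each occurrence of a phoneme counts once per corpus token
            # of the word it appears in.
            counts[p] += freq
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}

# Hypothetical toy entries, NOT real CELEX data.
toy_lexicon = [
    (["d", "a", "s"], 6),
    (["m", "a", "n"], 2),
]
rel = phoneme_relative_frequencies(toy_lexicon)
```

Weighting by word frequency gives phoneme frequencies in running text;
setting every frequency to 1 would give frequencies over lexicon entries
instead.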

Another solution might be to apply a grapheme-to-phoneme converter (as used in
text-to-speech synthesis systems) to your own corpus - written sources or
transcribed speech - and compute the phoneme frequencies from this converted
corpus.
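
As a rough sketch of this second route: the converter below is only a
hypothetical lookup table standing in for a real grapheme-to-phoneme
component, and the tokens are invented; any callable that maps a word to a
list of phoneme symbols could be plugged in instead.

```python
from collections import Counter

# Hypothetical stand-in for a real grapheme-to-phoneme converter.
TOY_PRONUNCIATIONS = {
    "das": ["d", "a", "s"],
    "man": ["m", "a", "n"],
}

def toy_g2p(word):
    """Return a list of phoneme symbols, or [] for unknown words."""
    return TOY_PRONUNCIATIONS.get(word.lower(), [])

def corpus_phoneme_frequencies(tokens, g2p):
    """Convert every token with g2p and return relative phoneme frequencies."""
    counts = Counter()
    for token in tokens:
        counts.update(g2p(token))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}

corpus_rel = corpus_phoneme_frequencies("Das man das".split(), toy_g2p)
```

Note that this sketch silently skips words the converter cannot handle; with
a real corpus it would be worth counting those separately to see how much
material is left out.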

HTH
Caren.

-- 
Caren Brinckmann
Institut für Deutsche Sprache (IDS)
R5, 6-13
68161 Mannheim
Germany
Tel: +49-621-1581-219
Fax: +49-621-1581-200


Thomas Schmidt <thomas.schmidt at uni-hamburg.de> wrote on 15 December 2009 at
11:10:

> Dear list members,
> 
> a colleague of mine is looking for frequency information on phonemes
> in German and Spanish, i.e. relative frequencies of each phoneme in a
> (reasonably large) corpus of those languages. Does anybody know if
> such frequency lists are out there somewhere? Any hints will be
> greatly appreciated.
> 
> - Thomas

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


