[Lingtyp] Frequencies in language archives

Harald Hammarström harald at bombo.se
Tue Apr 7 12:37:05 UTC 2020

> == Graphemes ==
> - the most frequent grapheme in transcriptions is <a>
> - the next most frequent graphemes are <e>, <i>, <n>, but the order is
> different between archives.
> The fact that <a> is the most frequent grapheme is certainly plausible.
> But I am interested in explanations for the differences between <e>,
> <i>, <n>. Would we have expected these three, and which order would we
> have predicted?

These are grapheme fractions computed over 344 vocabulary items in 245
languages from different families (data appendix to Erben Johansson et al. 2020 in

a 0.16
i 0.08
u 0.05
k 0.05
t 0.05
n 0.04
o 0.04
e 0.04
m 0.04
h 0.03
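
For anyone wanting to compute comparable fractions on their own archive, here is a minimal sketch. It is not the procedure used for the figures above. It assumes that each Unicode code point counts as one grapheme (so digraphs and combining diacritics would need extra handling), and the function name `grapheme_fractions` and the toy word forms are hypothetical.

```python
from collections import Counter


def grapheme_fractions(words):
    """Return {grapheme: fraction} over all characters in the given word forms."""
    counts = Counter()
    for word in words:
        # Assumption: one code point = one grapheme; whitespace is skipped.
        counts.update(ch for ch in word if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: n / total for g, n in counts.items()}


# Hypothetical toy word forms, not taken from the data appendix.
toy_words = ["mama", "nina", "taka"]
fractions = grapheme_fractions(toy_words)

# Print graphemes from most to least frequent (ties broken alphabetically).
for g, frac in sorted(fractions.items(), key=lambda kv: (-kv[1], kv[0])):
    print(g, round(frac, 2))
```

Running the same function separately on each archive would give rankings that can be compared directly.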

So I'd say the numbers you got are not very surprising. The differences in
order, depending on how large they are, could be due to regional variation
across the archives, or even to chance alone.
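
One rough way to gauge the "chance alone" possibility is to resample the word forms of a single archive with replacement and see how often the order of the top graphemes changes. The sketch below is only an illustration under the same one-code-point-per-grapheme assumption. The names `top_order` and `bootstrap_orders`, the number of resamples, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def top_order(words, k=3):
    """Return the k most frequent graphemes, most frequent first."""
    counts = Counter(ch for w in words for ch in w if not ch.isspace())
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return tuple(g for g, _ in ranked[:k])


def bootstrap_orders(words, k=3, n_resamples=1000, seed=0):
    """Tally how often each top-k order appears across bootstrap resamples."""
    rng = random.Random(seed)
    tallies = Counter()
    for _ in range(n_resamples):
        # Resample word forms with replacement, keeping the sample size fixed.
        sample = rng.choices(words, k=len(words))
        tallies[top_order(sample, k)] += 1
    return tallies


# Hypothetical toy word forms, not real archive data.
toy_words = ["mama", "nina", "taka", "kini", "tana", "mita"]
tallies = bootstrap_orders(toy_words, k=3, n_resamples=200, seed=1)
for order, n in tallies.most_common(5):
    print(" > ".join(order), n)
```

If many different orders show up with similar tallies, the order within one archive is unstable, and differences between archives need not mean much.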

all the best, H

Erben Johansson, N., Anikin, A., Carling, G., & Holmer,
A. (2020). The typology of sound symbolism: Defining macro-concepts
via their semantic and phonetic features. Linguistic Typology (in