[Lingtyp] Frequency of front/back, high/mid vowels

Daniel Ross djross3 at gmail.com
Mon Sep 14 14:41:31 UTC 2020

Dear Sebastian,

The Spanish text-based statistics are misleading. The high frequency of /e/,
/a/ and /o/ is due to common inflectional suffixes (for gendered forms:
masculine o, feminine a and general e; and for verbs: first person o, and
other-persons a or e). If you want to consider "token" frequency, then the
most appropriate place to count might be the lexicon, looking at roots,
rather than tokens in texts, where you will find an overwhelming skew
from function words and common inflectional suffixes. (Of course you
might argue that there is some underlying reason why certain vowels are
more likely to appear in grammaticalized forms.) For comparison, you
would find that /i/ is much more common in Italian, simply because it is a
second-person singular verb suffix and masculine plural noun suffix, while
/u/ does not serve any similarly frequent role so it would be as infrequent
as in Spanish. But Spanish and Italian have broadly similar phonology. (One
further issue in Spanish is that <i> and <u> are also used for semi-vowels
in diphthongs, so you'd have to decide whether to count those or not.
Their frequencies would be lower if you exclude that context.)
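
To make the difference between counting in running text and counting in
roots concrete, here is a minimal Python sketch. The word lists are toy data
made up for illustration, the "roots" are hypothetical segmentations, and the
glide test is a crude orthographic heuristic (unaccented i/u next to another
vowel letter), not a phonological analysis.

```python
from collections import Counter

# Accented vowel letters are counted under their plain counterparts.
PLAIN = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u", "ü": "u"}
VOWEL_LETTERS = set("aeiou") | set(PLAIN)


def is_glide(word, k):
    """Crude orthographic test: unaccented i/u adjacent to a vowel letter."""
    if word[k] not in "iu":
        return False
    before = k > 0 and word[k - 1] in VOWEL_LETTERS
    after = k + 1 < len(word) and word[k + 1] in VOWEL_LETTERS
    return before or after


def vowel_counts(words, count_glides=True):
    """Count vowel letters in a list of words, optionally skipping glides."""
    counts = Counter()
    for word in words:
        word = word.lower()
        for k, ch in enumerate(word):
            if ch not in VOWEL_LETTERS:
                continue
            if not count_glides and is_glide(word, k):
                continue
            counts[PLAIN.get(ch, ch)] += 1
    return counts


# Toy running text (tokens) and a hypothetical list of roots (types).
text_tokens = ["la", "niña", "come", "y", "el", "niño", "bueno", "canta", "bien"]
roots = ["niñ", "com", "buen", "cant", "bien"]

print("text, glides counted: ", dict(vowel_counts(text_tokens)))
print("text, glides excluded:", dict(vowel_counts(text_tokens, count_glides=False)))
print("roots, glides counted:", dict(vowel_counts(roots)))
```

Even in this tiny sample the text count is inflated by suffixes and function
words relative to the root count, and <i>/<u> drop once the diphthong
context is excluded. A real study would need a phonemic transcription
rather than spelling.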


On Mon, Sep 14, 2020 at 3:14 AM Mark Donohue <mhdonohue at gmail.com> wrote:

> Checking my database of 6950 languages/varieties, I get the following
> figures (interpreting your question somewhat, keeping the categories
> [HIGH], [LOW], [FRONT], [BACK] distinct):
> Taking the vowel set to be limited to [ieɛæaɐɑɔou]:
> Front vowels (i, e, ɛ, æ): 15,127 occurrences
> Back vowels (ɑ, ɔ, o, u): 14,762 occurrences
> High vowels (i, u): 13,252 occurrences
> Mid vowels (e, ɛ, ɔ, o): 15,768 occurrences
> Low vowels (æ, a, ɐ, ɑ): 7,779 occurrences
> I, for one, don't find this very helpful.
> Sebastian's questions are more easily answered if we look at individual
> frequencies:
> (note: when there is no more explicit information, and no contrast between e
> and ɛ, or o and ɔ, they are counted as [ɛ, ɔ].)
> vowel   languages   % of 6,950
> i       6,766       97%
> e       1,846       27%
> ɛ       6,007       86%
> æ         508        7%
> a       6,688       96%
> ɐ          95        1%
> ɑ         361        5%
> ɔ       5,787       83%
> o       2,128       31%
> u       6,486       93%
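
As a quick arithmetic check on the figures above, a small Python sketch can
recompute the class totals and the percentages from the ten per-vowel
counts. The class membership follows the sets listed earlier in the quoted
message; the names `counts`, `classes`, `class_total` and `share` are only
illustrative. The front, back, high and mid sums match the quoted totals.
The low sum from the listed counts comes to 7,652 rather than the quoted
7,779, and the message does not say what accounts for the difference.

```python
LANGUAGES = 6950  # size of the database as stated in the message

# Number of languages/varieties with each vowel, copied from the table.
counts = {
    "i": 6766, "e": 1846, "ɛ": 6007, "æ": 508, "a": 6688,
    "ɐ": 95, "ɑ": 361, "ɔ": 5787, "o": 2128, "u": 6486,
}

# Class membership as given in the message.
classes = {
    "front": "ieɛæ",
    "back": "ɑɔou",
    "high": "iu",
    "mid": "eɛɔo",
    "low": "æaɐɑ",
}


def class_total(name):
    """Sum the per-vowel counts for one class."""
    return sum(counts[v] for v in classes[name])


def share(vowel):
    """Percentage of the database with this vowel, rounded to an integer."""
    return round(100 * counts[vowel] / LANGUAGES)


for name in classes:
    print(name, class_total(name))
for vowel in counts:
    print(vowel, f"{share(vowel)}%")
```
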
> And we really should do it separately for different types of vowel systems.
> For example, there are 2,037 languages in the database with 5 contrasts in
> quality (in short vowels).
> The most common is
> i ɛ a ɔ u: 1,785 lgs
> All balanced in terms of front/back, and high/mid.
> Of the remaining 252, we start to see asymmetries of the sort that
> Sebastian is asking about: looking at the languages that are missing just
> one of the vowels above, we have
> missing i: 2 lgs (add ə or ɨ)
> missing ɛ: 26 lgs (12 with ɨ, 5 with ə, 2 with æ, 2 with y, one with ɐ
> and one with ɯ)
> missing a: 15 lgs (9 with æ, 5 with ɑ, 1 with ɐ)
> missing ɔ: 62 lgs (4 with o, 23 with ɨ, 24 with ə, 5 with ɒ, 4 with ɤ, 3
> with æ, 3 with y, 3 with œ, 1 with ʌ and 1 with ø)
> missing u: 28 lgs (11 with ɯ, 11 with ɨ, 2 with ʉ, 1 each with ɤ, ɪ, ə
> and o)
> If we look at systems missing two of the i-ɛ-a-ɔ-u set, of which there are
> 76 languages, we find that the most common pattern involves missing ɔ and u:
> 30 languages (14 have o and ɨ, 6 have o and ɯ, 5 have o and ə, and a
> variety of minority patterns)
> If we look at 3-vowel systems, i-a-u is the most common pattern, but i-a-o
> is pretty frequent as well, and dominant in some parts of the world (see
> Ross and Donohue 2011).
> The point is that we need to look at these things in terms of systems. It's
> clear that losing/substituting a canonical back vowel is more common
> than a front vowel, and that losing/substituting a mid vowel is more common
> than a high vowel. "Losing/substituting" a/the low vowel pretty
> much always means the vowel is more explicitly front, or back, but still
> low, and so losing a low vowel from the system isn't really a thing that
> languages do. (We can note that there are 16 systems of three-vowel
> languages with no high vowels, but only 12 with no low vowels, generally
> with a schwa.)
> -Mark
> Ross, Bill, and Mark Donohue. 2011. The many origins of diversity and
> complexity in phonology. *Linguistic Typology* 15: 251-265.
> On Mon, 14 Sep 2020 at 18:18, Sebastian Nordhoff <
> sebastian.nordhoff at glottotopia.de> wrote:
>> Dear list members,
>> do we have any information about the cross-linguistic validity of the
>> following hypotheses?
>> 1) front vowels like /i/, /e/ are more frequent than back vowels like
>> /u/, /o/
>> 2) high vowels like /i/, /u/ are more frequent than mid vowels like
>> /e/, /o/.
>> 3) "corner vowels" /a/, /i/, /u/ are more frequent than anything else.
>> I am interested in information about types (phonemic inventory) as well
>> as tokens (counts in texts).
>> Best wishes and thank you for your time
>> Sebastian
>> _______________________________________________
>> Lingtyp mailing list
>> Lingtyp at listserv.linguistlist.org
>> http://listserv.linguistlist.org/mailman/listinfo/lingtyp