[Lingtyp] Frequency of front/back, high/mid vowels

Mark Donohue mhdonohue at gmail.com
Mon Sep 14 10:13:35 UTC 2020

Checking my database of 6950 languages/varieties, I get the following
figures (interpreting your question somewhat, keeping the categories
[HIGH], [LOW], [FRONT], [BACK] distinct):

Taking the vowel set to be limited to [ieɛæaɐɑɔou]:
Front vowels: 15,127 occurrences
(i, e, ɛ, æ)
Back vowels: 14,762 occurrences
(ɑ, ɔ, o, u)
High vowels: 13,252 occurrences
(i, u)
Mid vowels: 15,768 occurrences
(e, ɛ, ɔ, o)
Low vowels: 7,779 occurrences
(æ, a, ɐ, ɑ)

I, for one, don't find this very helpful.

Sebastian's questions are more easily answered if we look at individual
vowels (note: where there is no more explicit information, and no contrast
between e and ɛ, or o and ɔ, they are counted as [ɛ, ɔ]):
vowel  languages  share
i      6,766      97%
e      1,846      27%
ɛ      6,007      86%
æ        508       7%
a      6,688      96%
ɐ         95       1%
ɑ        361       5%
ɔ      5,787      83%
o      2,128      31%
u      6,486      93%
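
As a check on the arithmetic, here is a minimal Python sketch that rebuilds
the class totals and the percentages from the per-vowel counts. The counts
and the class memberships are copied from this message; the names COUNTS,
CLASSES, class_totals and shares are only illustrative, and this is not the
query I ran on the database.

```python
# Per-vowel language counts, copied from the table in this message.
COUNTS = {
    "i": 6766, "e": 1846, "ɛ": 6007, "æ": 508, "a": 6688,
    "ɐ": 95, "ɑ": 361, "ɔ": 5787, "o": 2128, "u": 6486,
}
TOTAL_LANGUAGES = 6950  # size of the database, as stated in the message

# Class memberships as listed in the message ([a] and [ɐ] are neither
# front nor back here).
CLASSES = {
    "front": ("i", "e", "ɛ", "æ"),
    "back": ("ɑ", "ɔ", "o", "u"),
    "high": ("i", "u"),
    "mid": ("e", "ɛ", "ɔ", "o"),
    "low": ("æ", "a", "ɐ", "ɑ"),
}


def class_totals(counts, classes):
    """Sum the per-vowel counts over each feature class."""
    return {
        name: sum(counts[vowel] for vowel in members)
        for name, members in classes.items()
    }


def shares(counts, total):
    """Percentage of languages with each vowel, rounded to a whole number."""
    return {vowel: round(100 * n / total) for vowel, n in counts.items()}


totals = class_totals(COUNTS, CLASSES)
percentages = shares(COUNTS, TOTAL_LANGUAGES)

for name, value in totals.items():
    print(f"{name:<6}{value:>7,}")
```

Summed this way, the front, back, high and mid figures reproduce the totals
given earlier; the low sum from these ten rows comes to 7,652.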

And we really should do it separately for different types of vowel systems.
For example, there are 2,037 languages in the database with 5 contrasts in
quality (in short vowels).
The most common system is

i ɛ a ɔ u 1785 lgs

All balanced in terms of front/back, and high/mid.

Of the remaining 252, we start to see asymmetries of the sort that
Sebastian is asking about: looking at the languages that are missing just
one of the vowels above, we have

missing i: 2 lgs (add ə or ɨ)
missing ɛ: 26 lgs (12 with ɨ, 5 with ə, 2 with æ, 2 with y, one with ɐ and
one with ɯ)
missing a: 15 lgs (9 with æ, 5 with ɑ, 1 with ɐ)
missing ɔ: 62 lgs (4 with o, 23 with ɨ, 24 with ə, 5 with ɒ, 4 with ɤ, 3
with æ, 3 with y, 3 with œ, 1 with ʌ and 1 with ø)
missing u: 28 lgs (11 with ɯ, 11 with ɨ, 2 with ʉ, 1 each with ɤ, ɪ, ə and o)

If we look at the systems missing two of the i-ɛ-a-ɔ-u set, of which there
are 76 languages, we find that the most common pattern involves missing ɔ
and u: 30 languages (14 have o and ɨ, 6 have o and ɯ, 5 have o and ə, and a
variety of minority patterns).
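
The comparison against the i-ɛ-a-ɔ-u set can be sketched in a few lines of
Python. This is only an illustration: the function name compare_to_canonical
and the inventories in SAMPLE are made up for the example and are not records
from the database.

```python
from collections import Counter

# The most common five-vowel system, used as the reference set.
CANONICAL = frozenset({"i", "ɛ", "a", "ɔ", "u"})


def compare_to_canonical(inventory):
    """Return (missing, substitutes) relative to the reference set.

    missing: reference vowels absent from the inventory.
    substitutes: vowels in the inventory that are not in the reference set.
    Both are sorted tuples so that they can serve as Counter keys.
    """
    vowels = frozenset(inventory)
    missing = tuple(sorted(CANONICAL - vowels))
    substitutes = tuple(sorted(vowels - CANONICAL))
    return missing, substitutes


# Hypothetical five-vowel inventories, for illustration only.
SAMPLE = [
    ("i", "ɛ", "a", "ɔ", "u"),  # the reference system itself
    ("i", "ɛ", "a", "o", "ɨ"),  # missing ɔ and u, with o and ɨ
    ("i", "ɛ", "a", "o", "ɨ"),
    ("i", "ɛ", "a", "ɔ", "ɯ"),  # missing u, with ɯ
]

# Tally how often each (missing, substitutes) pattern occurs.
pattern_counts = Counter(compare_to_canonical(inv) for inv in SAMPLE)

for (missing, substitutes), n in pattern_counts.most_common():
    print(f"missing {missing or '-'}, with {substitutes or '-'}: {n}")
```

Keying the tally on the pair (missing, substitutes) keeps "missing u, with ɯ"
apart from "missing u, with ɨ", which is the distinction the lists above
draw.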

If we look at 3-vowel systems, i-a-u is the most common pattern, but i-a-o
is pretty frequent as well, and dominant in some parts of the world (see
Ross and Donohue 2011).

The point is that we need to look at these things in terms of systems. It's
clear that losing/substituting a canonical back vowel is more common than
losing/substituting a front vowel, and that losing/substituting a mid vowel
is more common than losing/substituting a high vowel. "Losing/substituting"
a/the low vowel pretty much always means the vowel is more explicitly front,
or back, but still low, so losing a low vowel from the system isn't really a
thing that languages do. (We can note that, among three-vowel languages,
there are 16 systems with no high vowels, but only 12 with no low vowels,
generally with a schwa.)
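
The back/front and mid/high asymmetry can be read off the missing-one-vowel
counts given earlier for the five-vowel languages. A small Python sketch,
with illustrative names (MISSING_ONE, FEATURES, missing_by_feature) and the
counts copied from this message:

```python
# Five-vowel languages missing exactly one vowel of i-ɛ-a-ɔ-u,
# counts copied from the list earlier in this message.
MISSING_ONE = {"i": 2, "ɛ": 26, "a": 15, "ɔ": 62, "u": 28}

# Feature classes within the i-ɛ-a-ɔ-u set.
FEATURES = {
    "front": ("i", "ɛ"),
    "back": ("ɔ", "u"),
    "high": ("i", "u"),
    "mid": ("ɛ", "ɔ"),
    "low": ("a",),
}

# Number of languages missing a vowel of each class.
missing_by_feature = {
    name: sum(MISSING_ONE[vowel] for vowel in members)
    for name, members in FEATURES.items()
}

for name, n in missing_by_feature.items():
    print(f"{name:<6}{n:>4}")
```

This gives 90 languages missing a back vowel against 28 missing a front
vowel, and 88 missing a mid vowel against 30 missing a high vowel.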


Ross, Bill, and Mark Donohue. 2011. The many origins of diversity and
complexity in phonology. *Linguistic Typology* 15: 251-265.

> Dear list members,
> do we have any information about the cross-linguistic validity of the
> following hypotheses?
> 1) front vowels like /i/, /e/ are more frequent than back vowels like
> /u/, /o/
> 2) high vowels like /i/, /u/ are more frequent than mid vowels like
> /e/, /o/.
> 3) "corner vowels" /a/, /i/, /u/ are more frequent than anything else.
> I am interested in information about types (phonemic inventory) as well
> as tokens (counts in texts).
> Best wishes and thank you for your time
> Sebastian
