Atkinson on phoneme inventories in Science

Bill Croft wcroft at
Tue Apr 19 18:24:47 UTC 2011

Dear Funknetters,

     I was asked to repost this from Lingtyp to Funknet. It is mostly
to do with the Atkinson paper in Science that has been discussed here
on Funknet, but also with a paper by Dunn et al. that appeared in Nature
last week. There has been some discussion of both papers on Lingtyp,
which is referred to in this post. Although I think this post can be
followed as is, I encourage interested parties to look at the cited
Lingtyp posts for clarification.

Atkinson argues for the existence of two correlations in a global 
sample of phoneme inventories: a correlation between size of phoneme 
inventory and distance from Africa, and a correlation between size of 
phoneme inventory and size of the population of the speech community. 
Atkinson needs the latter, phoneme-population correlation to justify 
his founder-effect explanation for the former correlation. The 
phoneme-population correlation was also identified by Hay and Bauer 
(2007). (Hay and Bauer also tested Pericliev's [2004] data and found,
pace Pericliev, that the correlation is also strong in his sample 
[Hay and Bauer 2007:397].) Johanna Nichols reports in her post a 
tentative result from her sample: she reports that the global 
correlation is present, but a division of the sample into large areas 
shows that the correlation does not exist, or is even negative, in 
some of the areas. On this basis, Johanna writes, "If there is really 
a correlation between population size and phoneme inventory size (or 
anything else), it should hold within areas as well as worldwide." 
She concludes that the global phoneme-population correlation is an 
artifact of population sizes in Eurasia and Africa, and areality in 
Africa plus neighboring regions.

Interestingly, with Dunn et al., the shoe is on the other foot with 
respect to global correlations and correlations in subpopulations. 
Here it is Dunn et al. who argue against the global word-order 
correlations manifested in Greenbergian word order universals. Dunn 
et al. argue that correlations between various pairs of word orders
are supported in some language families but not in others. Hence
word-order correlations are lineage-specific (and culture-specific) 
rather than universal in the Greenbergian sense. Dunn et al. divide 
the global sample into phylogenetic subpopulations rather than areal 
subpopulations, but the point is the same. (There are two differences 
between Dunn et al.'s analysis and the Greenberg universals: the 
Greenberg universals are synchronic, while Dunn et al.'s data are a
sample of diachronic word-order changes; and the model that Dunn et
al. test is not the model implied by Greenbergian universals. While
these differences are important, as I argued in my post on their 
paper, I believe they aren't relevant to the point being made here.) 
And in the case of Dunn et al., Matthew Dryer argued in a post that 
the lineage-specific correlations are random effects and the globally 
identified Greenbergian word-order correlations are real.

I asked a couple of physicists with whom I collaborate about what to 
think of global correlations when those correlations are not found in 
most or all of the subpopulations that the data may be partitioned 
into (areal, phylogenetic, etc.). They both stated that a global 
correlation is statistically valid even if the same correlation does 
not exist in all the partitioned subpopulations. This situation may 
arise when negative correlations or noncorrelations in some 
subpopulations are more than compensated for by positive correlations 
in other subpopulations, so that the global effect is a positive 
correlation. (One of them further added that another possible reason 
is that the subpopulation samples may be too small to provide a 
significant correlation one way or the other.) When pressed further 
about why a global correlation would not lead to the same 
correlations in (large enough) subpopulations, the response was that, 
in the simplest case, X (e.g. phoneme inventory size) depends not
only on Y (e.g. population size) but also on a
factor Z that varies considerably from subpopulation to 
subpopulation; and that one would expect the same correlations in the 
subpopulations if and only if most of the observed variation in X is 
due to Y. In fact, this is not the case for the phoneme-population 
correlation: Atkinson shows that language family membership, which 
clearly varies by region, accounts for the greatest amount of 
variance for phoneme inventory size. But the other correlations still 
hold globally when combined with this factor (Atkinson, supplementary 
materials, pp. 5-6). So it appears that the global phoneme-population 
and word-order correlations are valid, that is, there is a factor (or 
factors) Y that needs to be accounted for; but there is apparently 
also a factor or factors Z that lead to areal- and/or 
phylogeny-specific differences in the linguistic patterns.
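
The physicists' "simplest case" can be made concrete with a small toy
calculation. The Python sketch below is purely illustrative: the three
groups (area_A, area_B, area_C) and every X and Y value are invented,
not phoneme-inventory or population data. It assumes X = 1.0 * Y + Z +
noise, where Z is a hypothetical offset that differs by area and happens
to rise with Y. It reports the global correlation, the correlation
within each area, and two slopes: one from pooling all points, and one
computed after subtracting each area's own means (a simple fixed-effects
estimator, chosen here as a stand-in, not a reconstruction of Atkinson's
own analysis).

```python
from math import sqrt

# Hypothetical toy data (invented for illustration only): three made-up areas.
# Each X value was generated as X = 1.0 * Y + Z_area + noise, so Y has a real
# slope of 1 on X, while Z_area is an area-specific offset that rises with Y.
DATA = {
    "area_A": {"Y": [1, 2, 3, 4], "X": [1, 2, 3, 4]},         # Z = 0, noise 0, 0, 0, 0
    "area_B": {"Y": [5, 6, 7, 8], "X": [9, 8, 9, 8]},         # Z = 2, noise +2, 0, 0, -2
    "area_C": {"Y": [9, 10, 11, 12], "X": [11, 14, 15, 18]},  # Z = 4, noise -2, 0, 0, +2
}


def centered_sums(ys, xs):
    """Return (sum of (y - mean_y)(x - mean_x), sum of (y - mean_y)**2)."""
    mean_y = sum(ys) / len(ys)
    mean_x = sum(xs) / len(xs)
    s_yx = sum((y - mean_y) * (x - mean_x) for y, x in zip(ys, xs))
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_yx, s_yy


def pearson(ys, xs):
    """Pearson correlation coefficient between two equal-length samples."""
    s_yx, s_yy = centered_sums(ys, xs)
    _, s_xx = centered_sums(xs, xs)
    return s_yx / sqrt(s_yy * s_xx)


def within_area_slope(data):
    """Slope of X on Y after subtracting each area's own means (a simple
    fixed-effects estimator): any constant per-area offset Z cancels out."""
    total_yx = total_yy = 0.0
    for area in data.values():
        s_yx, s_yy = centered_sums(area["Y"], area["X"])
        total_yx += s_yx
        total_yy += s_yy
    return total_yx / total_yy


all_y = [y for area in DATA.values() for y in area["Y"]]
all_x = [x for area in DATA.values() for x in area["X"]]

global_r = pearson(all_y, all_x)
per_area_r = {name: pearson(a["Y"], a["X"]) for name, a in DATA.items()}
pooled_yx, pooled_yy = centered_sums(all_y, all_x)
pooled_slope = pooled_yx / pooled_yy      # ignores areas, so Z inflates it here
fe_slope = within_area_slope(DATA)        # Z removed by area-wise centering

if __name__ == "__main__":
    print(f"global r = {global_r:.2f}")
    for name, r in per_area_r.items():
        print(f"{name}: r = {r:+.2f}")
    print(f"pooled slope (areas ignored) = {pooled_slope:.2f}")
    print(f"within-area slope (Z removed) = {fe_slope:.2f}")
```

Run as a script, this should print a global r of about 0.97 and
within-area correlations of +1.00, -0.45 and +0.98: one area shows a
negative correlation, yet the global correlation is strongly positive.
The pooled slope (about 1.45) overstates the effect of Y because Z
rises with Y across areas; subtracting each area's means removes Z and
brings the slope back to 1.00, the value built into the toy data (the
invented noise was constructed to cancel across areas).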

Of course, correlation is not causation, as we all know. We have to 
find an explanatory framework that allows us to say that when X 
correlates with Y (and Z), there is a causal connection between X and 
Y (and Z). One problem with the global phoneme-population correlation 
is that there is no satisfactory explanation for it: even the 
linguists who found the correlation have only a few suggestions that 
they do not consider to be strong enough to offer as an explanation. 
Conversely, there is no obvious explanation why word-order 
correlations might be lineage- or culture-specific. For example, no 
cultural reason easily comes to mind why Proto-Indo-Europeans and 
their descendants couple verb-object and adposition-noun order, but 
Proto-Uto-Aztecans and their descendants do not. Nor is there an 
obvious culture-specific nonlinguistic behavior that might be 
causally connected to word-order patterns in the way that spatial 
cognition has been shown to be connected to linguistic spatial frames 
of reference by Levinson and his colleagues.


Hay, Jennifer and Laurie Bauer. 2007. Phoneme inventory size and 
population size. Language 83.388-400.

Pericliev, Vladimir. 2004. There is no correlation between the size 
of a community speaking a language and the size of the phonological 
inventory of that language. Linguistic Typology 8.376-83.
