Atkinson on phoneme inventories in Science

Mark Donohue mark at DONOHUE.CC
Wed Apr 20 23:17:13 UTC 2011


For what it's worth: examining a much larger database (1300 languages)
that doesn't "type" consonant and vowel inventory sizes into a 1-5 or
1-3 scale (but rather details real numbers), you find a correlation
with language population size; but that correlates strongly with
travelling west.
Much more significant is community population size, and there we find
very little correlation with phoneme inventory size.
On the question of suprasegmentals, such as tone: WALS codes this on a
0-2 scale, whereas we know that there are languages with 10 contrastive
pitch contours per syllable, and also that 2 contrastive contours can
represent two contrasts per syllable or two contrasts per word, just to
start. It is therefore impossible to obtain a realistic idea of "total
contrasts possible" from WALS data.
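
A rough sketch of the coding problem, in Python. The function name, the
cut-offs of the 0-2 scale, and the four inventories are all assumptions
made up for illustration; none of them is taken from WALS.

```python
def coarse_tone_code(n_contours):
    """Collapse a count of contrastive pitch contours onto a 0-2 scale.

    Assumed cut-offs (hypothetical): 0 = no tone contrast,
    1 = a two-way contrast, 2 = anything larger.
    """
    if n_contours < 2:
        return 0
    if n_contours == 2:
        return 1
    return 2


# Hypothetical languages with their real number of contrastive contours.
contours = {"lang_a": 0, "lang_b": 2, "lang_c": 3, "lang_d": 10}

# The coarse code that a 0-2 scale would record for each of them.
codes = {name: coarse_tone_code(n) for name, n in contours.items()}
print(codes)
```

Under these assumed cut-offs a 3-contour and a 10-contour language
receive the same code, and the code carries no information about whether
the contrasts hold per syllable or per word.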

-Mark Donohue



On Wed, Apr 20, 2011 at 3:54 PM, Frans Plank
<frans.plank at uni-konstanz.de> wrote:
> I wonder whether this is a minor or a serious question:  Just HOW does an
> individual, or a speech (sub)community, "lose" or "gain" "a phoneme"?  Well,
> linguistically speaking, it's about featural contrasts, really, and their
> syntagmatic incidence.   Could the answer(s) conceivably matter?  Or can
> historical/developmental phonology really be done without phonology?  But
> these are night thoughts.
>
> Come to think of it, there have been few phonologists in this debate, so
> far.  They don't seem to be losing a night's sleep over the tale of the
> vanished African phonemes.
>
> Frans
>
>
> On Apr 20, 2011, at 10:39 PM, Matthew Dryer wrote:
>
>> There are three problems that I am aware of with the Atkinson paper, one
>> minor, two serious.  My primary concern is the hypothesis that there is a
>> positive correlation between population size and size of phoneme inventory.
>> Since his further "out-of-Africa" claims assume the former, it's not clear
>> that he has anything interesting to say about this if the former turns out
>> to be false.
>>
>> First, he claims that phoneme inventories are relatively small in North
>> America.  Maddieson's WALS data does not provide data on overall inventory
>> size, only on consonant inventory, vowel inventory, and complexity of tones.
>> Atkinson's way of deriving overall inventory size is apparently to treat
>> these three variables equally.  However, this is not what is normally
>> understood by inventory size.
>> Languages in North America tend to have large consonant inventories and
>> small vowel inventories, but large overall inventories since large consonant
>> inventory tends to lead to large overall inventory.  This is not as serious
>> a problem as the next two, because if there IS a correlation between
>> population size and Atkinson's metric, then that would be interesting.
>>
>> Second, in order to test any crosslinguistic hypothesis, one needs to have
>> a sample that is unbiased with respect to the phenomena being examined.  In
>> general, this is true for the WALS languages, as long as one controls for
>> genealogy and area.  But the WALS languages are a highly biased sample as
>> far as population size is concerned.  This is because there is a very strong
>> correlation between population size and availability of grammatical
>> descriptions, especially in Europe, Asia and Africa.  In fact, in
>> constructing the 200-language sample for WALS, we deliberately chose
>> languages with larger speaker populations in the sense that we deliberately
>> chose languages for which data was more readily available.  This bias itself
>> renders Atkinson's claims suspect, but his further claims about distance
>> from Africa are rendered further suspect since the population bias in the
>> sample is lower for languages further from Africa.
>>
>> Third, although Atkinson controlled for non-independence within families,
>> he did not control for non-independence within areas, and because he failed
>> to do so, his claims in terms of statistical significance are invalid.  One
>> of the goals of the WALS atlas is to show areal patterns.  One has only to
>> look at Maddieson's WALS map for size of consonant inventories to see that
>> there is a large area in northwest North America with large consonant
>> inventories (with languages from many different families) and a similar area
>> in southeast Asia (again with languages from many different families) and a
>> large area in northern South America with small consonant inventories (again
>> with languages from many different language families) and similarly for New
>> Guinea.  In other words, size of inventory can be an areal phenomenon.
>>
>> I have argued in many places that unless a correlation is found
>> independently in all parts of the world (in my method, the six continental
>> areas I use), then we cannot conclude that it is real.  Bill Croft says "I
>> asked a couple of physicists with whom I collaborate about what to think of
>> global correlations when those correlations are not found in most or all of
>> the subpopulations that the data may be partitioned into (areal,
>> phylogenetic, etc.). They both stated that a global correlation is
>> statistically valid even if the same correlation does not exist in all the
>> partitioned subpopulations."  But the big problem with the Atkinson paper
>> and others like it is precisely that nonlinguists who are experts on
>> statistics do not understand the peculiar nature of crosslinguistic data.
>> It is obviously the case in general that correlations over a domain can be
>> valid even if they are not found in all subdomains.  But I argued in my 1989
>> paper on large linguistic areas that the only way to determine whether there
>> is a global correlation is to see whether it is true in all areas of the world.
>> This is actually something of an overstatement; there are other ways to
>> control for areal factors.  But as a rule of thumb, if something is not
>> found in all areas or most areas, one should be very suspicious that the
>> apparent global correlation is simply an artifact of not controlling for
>> area.
>>
>> Bill's additional statement "One of them further added that another
>> possible reason is that the subpopulation samples may be too small to
>> provide a significant correlation one way or the other" also betrays a lack
>> of awareness on the part of these physicists of the problem presented by
>> areal phenomena.  With crosslinguistic typological data, it is to all
>> intents and purposes LOGICALLY impossible to test for statistical
>> significance WITHIN linguistic areas because there are such strong areal
>> patterns within large areas that one cannot find enough independent cases to
>> remotely approach statistical significance.  We cannot, for example,
>> determine whether there is a correlation between population size and phoneme
>> inventory size within Africa on the basis of, say, three languages.  But the
>> areal patterns within Africa are such that one cannot find more than three
>> or so languages that are genealogically and areally independent.  Linguists
>> should be very wary of seeking the advice of nonlinguists regarding
>> statistics.
>>
>> Matthew
>>
>> Bill Croft wrote:
>>>
>>> Atkinson argues for the existence of two correlations in a global sample
>>> of phoneme inventories: a correlation between size of phoneme inventory and
>>> distance from Africa, and a correlation between size of phoneme inventory
>>> and size of the population of the speech community. Atkinson needs the
>>> latter, phoneme-population correlation to justify his founder-effect
>>> explanation for the former correlation. The phoneme-population correlation
>>> was also identified by Hay and Bauer (2007). (Hay and Bauer also tested
>>> Pericliev's [2004] data and found, pace Pericliev, that the correlation is
>>> also strong in his sample [Hay and Bauer 2007:397].) Johanna Nichols reports
>>> in her post a tentative result from her sample: she reports that the global
>>> correlation is present, but a division of the sample into large areas shows
>>> that the correlation does not exist, or is even negative, in some of the
>>> areas. On this basis, Johanna writes, "If there is really a correlation
>>> between population size and phoneme inventory size (or anything else), it
>>> should hold within areas as well as worldwide." She concludes that the
>>> global phoneme-population correlation is an artifact of population sizes in
>>> Eurasia and Africa, and areality in Africa plus neighboring regions.
>>>
>>> Interestingly, with Dunn et al., the shoe is on the other foot with
>>> respect to global correlations and correlations in subpopulations. Here it
>>> is Dunn et al. who argue against the global word-order correlations
>>> manifested in Greenbergian word order universals. Dunn et al. argue that
>>> correlations between various pairs of word orders are supported in some
>>> language families but not others. Hence word-order correlations are
>>> lineage-specific (and culture-specific) rather than universal in the
>>> Greenbergian sense. Dunn et al. divide the global sample into phylogenetic
>>> subpopulations rather than areal subpopulations, but the point is the same.
>>> (There are two differences between Dunn et al.'s analysis and the Greenberg
>>> universals: the Greenberg universals are synchronic, while Dunn et al.'s data
>>> is a sample of diachronic word order changes; and the model that Dunn et al.
>>> test is not the model implied by Greenbergian universals. While these
>>> differences are important, as I argued in my post on their paper, I believe
>>> they aren't relevant to the point being made here.) And in the case of Dunn
>>> et al., Matthew Dryer argued in a post that the lineage-specific
>>> correlations are random effects and the globally identified Greenbergian
>>> word-order correlations are real.
>>>
>>> I asked a couple of physicists with whom I collaborate about what to
>>> think of global correlations when those correlations are not found in most
>>> or all of the subpopulations that the data may be partitioned into (areal,
>>> phylogenetic, etc.). They both stated that a global correlation is
>>> statistically valid even if the same correlation does not exist in all the
>>> partitioned subpopulations. This situation may arise when negative
>>> correlations or noncorrelations in some subpopulations are more than
>>> compensated for by positive correlations in other subpopulations, so that
>>> the global effect is a positive correlation. (One of them further added that
>>> another possible reason is that the subpopulation samples may be too small
>>> to provide a significant correlation one way or the other.) When pressed
>>> further about why a global correlation would not lead to the same
>>> correlations in (large enough) subpopulations, the response was that, in the
>>> simplest case, X is dependent not only on Y but also on a factor Z that
>>> varies considerably from subpopulation to subpopulation; and that one would
>>> expect the same correlations in the subpopulations if and only if most of
>>> the observed variation in X is due to Y. In fact, this is not the case for
>>> the phoneme-population correlation: Atkinson shows that language family
>>> membership, which clearly varies by region, accounts for the greatest amount
>>> of variance for phoneme inventory size. But the other correlations still
>>> hold globally when combined with this factor (Atkinson, supplementary
>>> materials, pp. 5-6). So it appears that the global phoneme-population and
>>> word-order correlations are valid, that is, there is a factor (or factors) Y
>>> that needs to be accounted for; but there is apparently also a factor or
>>> factors Z that lead to areal- and/or phylogeny-specific differences in the
>>> linguistic patterns.
>>>
>>> Of course, correlation is not causation, as we all know. We have to find
>>> an explanatory framework that allows us to say that when X correlates with Y
>>> (and Z), there is a causal connection between X and Y (and Z). One problem
>>> with the global phoneme-population correlation is that there is no
>>> satisfactory explanation for it: even the linguists who found the
>>> correlation have only a few suggestions that they do not consider to be
>>> strong enough to offer as an explanation. Conversely, there is no obvious
>>> explanation why word-order correlations might be lineage- or
>>> culture-specific. For example, no cultural reason easily comes to mind why
>>> Proto-Indo-Europeans and their descendants couple verb-object and
>>> adposition-noun order, but Proto-Uto-Aztecans and their descendants do not.
>>> Nor is there an obvious culture-specific nonlinguistic behavior that might
>>> be causally connected to word-order patterns in the way that spatial
>>> cognition has been shown to be connected to linguistic spatial frames of
>>> reference by Levinson and his colleagues.
>>>
>>> Bill
>>>
>>> Hay, Jennifer and Laurie Bauer. 2007. Phoneme inventory size and
>>> population size. Language 83.388-400.
>>> Pericliev, Vladimir. 2004. There is no correlation between the size of a
>>> community speaking a language and the size of the phonological inventory of
>>> that language. Linguistic Typology 8.376-83.
>
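
The disagreement in the quoted messages about global versus within-area
correlations can be made concrete with a small numerical sketch. It
shows only the arithmetic: a pooled correlation can be strongly positive
while the correlation inside every area is negative, once areas differ
in their baseline. All numbers, the area names, and the helper `pearson`
are made up for illustration; nothing here is WALS or Atkinson data, and
the sketch does not decide which reading of the real data is right.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: for each area, (log population, phoneme inventory size).
# Inside each area the inventory shrinks as population grows, but the areas
# themselves sit at different baselines (the "factor Z" of the discussion).
areas = {
    "area_1": ([1, 2, 3], [22, 20, 18]),
    "area_2": ([4, 5, 6], [32, 30, 28]),
    "area_3": ([7, 8, 9], [42, 40, 38]),
}

# Correlation computed separately inside each area.
within = {name: pearson(pop, inv) for name, (pop, inv) in areas.items()}

# Correlation computed over all languages pooled together.
pooled_pop = [p for pop, _ in areas.values() for p in pop]
pooled_inv = [i for _, inv in areas.values() for i in inv]
pooled = pearson(pooled_pop, pooled_inv)

print("within areas:", within)
print("pooled:", round(pooled, 3))
```

In this toy data the pooled coefficient is driven entirely by the
differences between areas, which is why the pooled figure alone cannot
show whether population size or areal grouping is doing the work.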



More information about the Lingtyp mailing list