Atkinson on phoneme inventories in Science

Matthew Dryer dryer at BUFFALO.EDU
Fri Apr 22 01:53:02 UTC 2011


As is often the case, there is more agreement than disagreement between 
Bill and me on the issues here, and I largely agree with what he says.

What I said about consulting nonlinguists who know a lot of statistics 
was specifically a reaction to Bill's report that one of the people he 
talked to suggested that a correlation might not have been found in 
some areas because, with a smaller number of languages, it fell short 
of statistical significance.  That raised a red flag for me since, for 
reasons I gave in my previous email, I don't think it is remotely 
possible ever to achieve statistical significance within areas: the 
languages are not sufficiently independent to make that possible.  In other 
words, I don't think this individual would have said that had he 
understood the nature of the linguistic data.  Bill seemed to be 
suggesting that I don't think typologists should learn statistics.  But 
I'm saying the opposite: we should learn statistics rather than 
depending on the opinions of experts.  Yes, we should consult experts, 
not for their blanket opinions but for their assistance in our learning 
relevant statistics.

<<I don't know if this is what Matthew intended, but this sounds very 
much as if (cross)linguistic data is somehow immune to the laws of 
statistics, and hence linguists need not concern themselves with 
developments in statistics (especially since such developments are the 
work of nonlinguists).>>
Actually, it's if anything the other way round.  The most important laws 
of statistics are those that dictate what properties data must have in 
order for particular tests to be applicable and my greatest concern 
about the practice of both linguists and nonlinguists applying 
statistics to typological data is that they don't abide by those laws, 
the linguists because they don't understand those restrictions and the 
nonlinguists because they don't realize the nature of the linguistic 
data.  For example, with apologies to Ian Maddieson, I don't believe 
that the data he applied the test to in his recent email meets the 
conditions for that test, so I don't think that he has provided any 
reason to think that there is a correlation between size of consonant 
inventory and distance from the equator, despite his p<.0001.
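
The concern about test conditions can be illustrated with a minimal 
Python sketch.  It is not the analysis discussed in this thread: the 
cluster counts, the variances, and all function names are hypothetical, 
and the p-value uses the standard Fisher z approximation, which assumes 
independent observations.  Two variables are simulated with no true 
relationship, but languages within a cluster (standing in for a family 
or area) share a common component, so they are not independent.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z approximation.

    This treats all n data points as independent, which is exactly
    the assumption that clustered data violate.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_clusters, per_cluster, shared_sd, rng):
    """x and y are generated independently: there is no true correlation.

    Each cluster contributes a shared offset (sd = shared_sd) to all of
    its members; shared_sd = 0 gives fully independent data points.
    """
    xs, ys = [], []
    for _ in range(n_clusters):
        cluster_x = rng.gauss(0, shared_sd)
        cluster_y = rng.gauss(0, shared_sd)  # independent of cluster_x
        for _ in range(per_cluster):
            xs.append(cluster_x + rng.gauss(0, 1))
            ys.append(cluster_y + rng.gauss(0, 1))
    return xs, ys


def false_positive_rate(n_clusters, per_cluster, shared_sd,
                        trials=500, seed=1):
    """Share of simulated samples declared 'significant' at p < .05."""
    rng = random.Random(seed)
    n = n_clusters * per_cluster
    hits = 0
    for _ in range(trials):
        xs, ys = simulate(n_clusters, per_cluster, shared_sd, rng)
        if approx_p_value(pearson_r(xs, ys), n) < 0.05:
            hits += 1
    return hits / trials


# Hypothetical setup: 10 clusters of 20 languages each.
independent_rate = false_positive_rate(10, 20, shared_sd=0.0)
clustered_rate = false_positive_rate(10, 20, shared_sd=2.0)
print("false positives, independent data:", independent_rate)
print("false positives, clustered data:  ", clustered_rate)
```

With independent data the test should reject at roughly its nominal 5% 
rate; with clustered data it rejects far more often even though no 
correlation exists, which is why a small p-value alone says little when 
the independence assumption fails.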

Is typological data peculiar?  It may well be the case that there is 
data in other domains that is similar and that statistics developed for 
those domains is applicable to typological data.  But it is peculiar 
enough that there is a lot of statistics out there that isn't 
applicable.  My view that it is peculiar is due, I admit, to my 
consulting an expert, who after extended discussion with me on the 
nature of typological data came to the conclusion that it was peculiar 
and helped me understand why it is peculiar.  Controlling for areal 
factors presents a serious challenge.
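
Why areal control matters can be shown with a small Python sketch on 
invented numbers.  The three areas and their values are purely 
hypothetical and stand for no real languages.  The data are built so 
that both variables rise from one area to the next while the relation 
inside every area runs the other way, the situation Bill's message 
below calls Simpson's Paradox.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: (x values, y values) for four languages per area.
areas = {
    "Area A": ([1, 2, 3, 4], [4, 3, 2, 1]),
    "Area B": ([5, 6, 7, 8], [8, 7, 6, 5]),
    "Area C": ([9, 10, 11, 12], [12, 11, 10, 9]),
}

# Correlation computed separately inside each area.
within_r = {name: pearson_r(xs, ys) for name, (xs, ys) in areas.items()}

# Correlation computed over all languages, ignoring area.
all_x = [x for xs, _ in areas.values() for x in xs]
all_y = [y for _, ys in areas.values() for y in ys]
pooled_r = pearson_r(all_x, all_y)

for name, r in within_r.items():
    print(name, "within-area r =", round(r, 2))
print("pooled r =", round(pooled_r, 2))
```

Here the pooled correlation is strongly positive (roughly 0.79) while 
the correlation inside each area is -1, so the pooled figure reflects 
only the differences between areas.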

I agree with Bill that it is a mistake to lump Dunn et al and Atkinson 
together in this discussion, since they are opposite situations.  My 
problem with Atkinson is that he has come to a conclusion that I think 
may be an artifact of his failing to do things right.  I advocate 
skepticism about the results of statistical work that claims the 
existence of patterns of correlations simply because I know of too many 
cases where there have been problems.  I have no problem at all with 
Dunn et al's statistics (except what I read in an earlier email of 
Bill's).  In fact Dunn et al are exhibiting the very skepticism I 
advocate (though I didn't really have skepticism of my own work in mind 
:).  Although I think their conclusions are mistaken, we need more work 
like Dunn et al's that challenges existing statistical claims.

Matthew

Bill Croft wrote:
> I will not discuss the content of Matthew's response here, as I have 
> discussed it with him privately. I agree with his main point, that 
> Atkinson should control for large-scale areal influence. I also agree 
> with Ian that once one opens the door to one geographical correlation 
> hypothesis, one needs to consider others as well.
> 
> However, I strongly disagree with the implications of Matthew's 
> polemical remarks:
> 
> "But the big problem with the Atkinson paper and others like it is 
> precisely that nonlinguists who are experts on statistics do not 
> understand the peculiar nature of crosslinguistic data...Linguists 
> should be very wary of seeking the advice of nonlinguists regarding 
> statistics."
> 
> I don't know if this is what Matthew intended, but this sounds very much 
> as if (cross)linguistic data is somehow immune to the laws of 
> statistics, and hence linguists need not concern themselves with 
> developments in statistics (especially since such developments are the 
> work of nonlinguists).
> Linguistic data is not "peculiar". Linguistic data, like other data from 
> human behavior and other complex systems, is the product of stochastic 
> processes influenced by a variety of factors that causally interact. Our 
> task is to identify the relevant factors and determine their influence, 
> if any. That can be done by a range of statistical methods and models, 
> which for example can deal with large-scale areal influence in the 
> phoneme inventory data if we think it should.
> Of course, identifying the relevant factors depends on the causal models 
> that we propose to account for the behavior. Atkinson has a causal 
> model, which leads him to bring in the factors that he does (and he 
> controls for quite a number of plausible confounding factors, though not 
> area, if you read his supplementary materials). The problem is, we 
> linguists do not believe in the causal model, so we don't think distance 
> from Africa, or even population size, should be the only additional 
> factors considered in the statistical analysis. But linguists don't all 
> agree on causal models of language behavior either. (Note that Dunn et 
> al. are a team of linguists as well as nonlinguists.) And sometimes we 
> have to look outside the box and consider other possibilities, as Hay 
> and Bauer (2007) did - even if they turn out to be artifacts.
> 
> I think that linguists should learn more about statistics. Many of the 
> posts about Atkinson's paper at the Language Log, the NY Times article, 
> and on Funknet do not recognize some basic statistical principles. Even 
> the detailed and carefully reasoned posts would benefit from more 
> detailed knowledge of statistics, I believe. I say that for myself as 
> well, of course. For instance, I have been told (via a psycholinguist) 
> that the puzzle I discussed, the possibility of different correlations 
> in a sample and in partitions of the sample, has a name in statistics, 
> Simpson's Paradox. I checked the indexes of the two statistics textbooks 
> I have by linguists (Woods et al. 1986 and Baayen 2008), and my wife's 
> university statistics textbook (Hays 1988); none of them listed 
> Simpson's Paradox. I'm afraid that for me or any linguist to learn more 
> about statistics means reading books written by nonlinguists, taking 
> courses from nonlinguists, and/or consulting with nonlinguists.
> 
> The response by linguists to Dunn et al. and Atkinson has been uniformly 
> negative. Many have also been arrogant, condescending and dismissive. 
> The attitude appears to be that any work on language by nonlinguists, 
> especially that using fancy statistics, is completely wrong. That is why 
> I have felt obliged to defend those aspects of both papers that I think 
> are positive, and to question some of the criticisms. This doesn't mean 
> that I endorse their results: I don't, in the case of Dunn et al., and I 
> am uncertain about Atkinson. But I think that the problems with Dunn et 
> al. and with Atkinson are quite different - linguistically and 
> statistically - and that it is worth linguists recognizing and 
> understanding these differences.
> 
> Bill
> 
> Baayen, R. Harald. 2008. Analyzing linguistic data: a practical 
> introduction to statistics using R. Cambridge: Cambridge University Press.
> Hays, William L. 1988. Statistics (4th ed.). New York: Holt, Rinehart 
> and Winston.
> Woods, Anthony, Paul Fletcher & Arthur Hughes. 1986. Statistics in 
> language studies. Cambridge: Cambridge University Press.



More information about the Lingtyp mailing list