Atkinson on phoneme inventories in Science
Matthew Dryer
dryer at BUFFALO.EDU
Fri Apr 22 01:53:02 UTC 2011
As is often the case, there is more agreement than disagreement between
Bill and me on the issues here. I am largely in agreement with what
Bill says here.
What I said about consulting nonlinguists who know a lot of statistics
was specifically a reaction to Bill's report that one of those he
talked to suggested that a correlation perhaps wasn't found in some
areas because, with a smaller number of languages, it fell short of
statistical significance. That raised a red flag for me since, for
reasons I gave in my previous email, I don't think it is remotely
possible ever to achieve statistical significance within areas: the
languages within an area are not sufficiently independent. In other
words, I don't think this individual would have said that had he
understood the nature of the linguistic data. Bill seemed to be
suggesting that I don't think typologists should learn statistics. But
I'm saying the opposite: we should learn statistics rather than
depending on the opinions of experts. Yes, we should consult experts,
not for their blanket opinions but for their assistance in our learning
relevant statistics.
<<I don't know if this is what Matthew intended, but this sounds very
much as if (cross)linguistic data is somehow immune to the laws of
statistics, and hence linguists need not concern themselves with
developments in statistics (especially since such developments are the
work of nonlinguists).>>
Actually, if anything, it's the other way round. The most important laws
of statistics are those that dictate what properties data must have in
order for particular tests to be applicable, and my greatest concern
about the practice of both linguists and nonlinguists applying
statistics to typological data is that they don't abide by those laws,
the linguists because they don't understand those restrictions and the
nonlinguists because they don't realize the nature of the linguistic
data. For example, with apologies to Ian Maddieson, I don't believe
that the data he applied the test to in his recent email meets the
conditions for that test, so I don't think that he has provided any
reason to think that there is a correlation between size of consonant
inventory and distance from the equator, despite his p<.0001.
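
The independence condition can be illustrated with a minimal simulation
sketch. Everything in it is an assumption chosen for illustration and
nothing is taken from Ian's data or his test: two made-up variables that
are unrelated to each other but each partly shared within an area, six
hypothetical areas of fifty languages, and a naive Pearson test (via the
Fisher z approximation) that treats every language as an independent
observation.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def naive_p_value(r, n):
    """Two-sided p-value from the Fisher z approximation.

    This is only valid if the n observations are independent, which is
    exactly the condition being probed here.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(area_sd, n_areas=6, langs_per_area=50,
                        trials=400, alpha=0.05, seed=1):
    """Share of trials in which the naive test reports significance.

    x and y are generated with NO relationship to each other. Each gets
    its own area-level component (standard deviation area_sd) shared by
    all languages in that area, plus language-level noise. All parameter
    values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = n_areas * langs_per_area
    hits = 0
    for _ in range(trials):
        xs, ys = [], []
        for _ in range(n_areas):
            area_x = rng.gauss(0, area_sd)  # shared within the area
            area_y = rng.gauss(0, area_sd)  # drawn independently of area_x
            for _ in range(langs_per_area):
                xs.append(area_x + rng.gauss(0, 1))
                ys.append(area_y + rng.gauss(0, 1))
        if naive_p_value(pearson_r(xs, ys), n) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("independent languages:", false_positive_rate(0.0))
    print("areally clustered:    ", false_positive_rate(1.0))
```

Under these assumptions, false_positive_rate(1.0) should report
"significant" correlations far more often than the nominal 5% that
false_positive_rate(0.0) stays close to, even though no real correlation
exists in either case.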
Is typological data peculiar? It may well be the case that there is
data in other domains that is similar and that statistics developed for
those domains is applicable to typological data. But it is peculiar
enough that there is a lot of statistics out there that isn't
applicable. My view that it is peculiar is due, I admit, to my
consulting an expert, who after extended discussion with me on the
nature of typological data came to the conclusion that it was peculiar
and helped me understand why it is peculiar. Controlling for areal
factors presents a serious challenge.
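
A toy example of the puzzle Bill describes in the message quoted below
(Simpson's Paradox, where a correlation in the whole sample differs from
the correlations in its partitions) shows one reason areal control is
hard. The numbers are invented for illustration and describe no real
languages; the area labels and the helper pearson_r are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented (x values, y values) for three hypothetical areas.
# Within each area y falls as x rises, but the area means rise together.
areas = {
    "area_A": ([1, 2, 3], [3, 2, 1]),
    "area_B": ([4, 5, 6], [6, 5, 4]),
    "area_C": ([7, 8, 9], [9, 8, 7]),
}

# Correlation computed separately inside each area.
within = {name: pearson_r(xs, ys) for name, (xs, ys) in areas.items()}

# Correlation computed on the pooled sample, ignoring area.
pooled_x = [x for xs, _ in areas.values() for x in xs]
pooled_y = [y for _, ys in areas.values() for y in ys]
pooled = pearson_r(pooled_x, pooled_y)

print("within areas:", within)
print("pooled:", pooled)
```

Within every area the correlation is perfectly negative, while the
pooled correlation is strongly positive (0.8), because the area means
rise together.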
I agree with Bill that it is a mistake to lump Dunn et al. and Atkinson
together in this discussion, since they are opposite situations. My
problem with Atkinson is that he has come to a conclusion that I think
may be an artifact of his failing to do things right. I advocate
skepticism about the results of statistical work that claims the
existence of patterns of correlations simply because I know of too many
cases where there have been problems. I have no problem at all with
Dunn et al.'s statistics (except what I read in an earlier email of
Bill's). In fact Dunn et al. are exhibiting the very skepticism I
advocate (though I didn't really have skepticism of my own work in mind
:). Although I think their conclusions are mistaken, we need more work
like Dunn et al.'s that challenges existing statistical claims.
Matthew
Bill Croft wrote:
> I will not discuss the content of Matthew's response here, as I have
> discussed it with him privately. I agree with his main point, that
> Atkinson should control for large-scale areal influence. I also agree
> with Ian that once one opens the door to one geographical correlation
> hypothesis, one needs to consider others as well.
>
> However, I strongly disagree with the implications of Matthew's
> polemical remarks:
>
> "But the big problem with the Atkinson paper and others like it is
> precisely that nonlinguists who are experts on statistics do not
> understand the peculiar nature of crosslinguistic data...Linguists
> should be very wary of seeking the advice of nonlinguists regarding
> statistics."
>
> I don't know if this is what Matthew intended, but this sounds very much
> as if (cross)linguistic data is somehow immune to the laws of
> statistics, and hence linguists need not concern themselves with
> developments in statistics (especially since such developments are the
> work of nonlinguists).
> Linguistic data is not "peculiar". Linguistic data, like other data from
> human behavior and other complex systems, is the product of stochastic
> processes influenced by a variety of factors that causally interact. Our
> task is to identify the relevant factors and determine their influence,
> if any. That can be done by a range of statistical methods and models,
> which for example can deal with large-scale areal influence in the
> phoneme inventory data if we think it should.
> Of course, identifying the relevant factors depends on the causal models
> that we propose to account for the behavior. Atkinson has a causal
> model, which leads him to bring in the factors that he does (and he
> controls for quite a number of plausible confounding factors, though not
> area, if you read his supplementary materials). The problem is, we
> linguists do not believe in the causal model, so we don't think distance
> from Africa, or even population size, should be the only additional
> factors considered in the statistical analysis. But linguists don't all
> agree on causal models of language behavior either. (Note that Dunn et
> al. are a team of linguists as well as nonlinguists.) And sometimes we
> have to look outside the box and consider other possibilities, as Hay
> and Bauer (2007) did - even if they turn out to be artifacts.
>
> I think that linguists should learn more about statistics. Many of the
> posts about Atkinson's paper at the Language Log, the NY Times article,
> and on Funknet do not recognize some basic statistical principles. Even
> the detailed and carefully reasoned posts would benefit from more
> detailed knowledge of statistics, I believe. I say that for myself as
> well, of course. For instance, I have been told (via a psycholinguist)
> that the puzzle I discussed, the possibility of different correlations
> in a sample and in partitions of the sample, has a name in statistics,
> Simpson's Paradox. I checked the indexes of the two statistics textbooks
> I have by linguists (Woods et al. 1986 and Baayen 2008), and my wife's
> university statistics textbook (Hays 1988); none of them listed
> Simpson's Paradox. I'm afraid that for me or any linguist to learn more
> about statistics means reading books written by nonlinguists, taking
> courses from nonlinguists, and/or consulting with nonlinguists.
>
> The response by linguists to Dunn et al. and Atkinson has been uniformly
> negative. Many have also been arrogant, condescending and dismissive.
> The attitude appears to be that any work on language by nonlinguists,
> especially that using fancy statistics, is completely wrong. That is why
> I have felt obliged to defend those aspects of both papers that I think
> are positive, and to question some of the criticisms. This doesn't mean
> that I endorse their results: I don't, in the case of Dunn et al., and I
> am uncertain about Atkinson. But I think that the problems with Dunn et
> al. and with Atkinson are quite different - linguistically and
> statistically - and that it is worth linguists recognizing and
> understanding these differences.
>
> Bill
>
> Baayen, R. Harald. 2008. /Analyzing linguistic data: a practical
> introduction to statistics using R/. Cambridge: Cambridge University Press.
>
> Hays, William L. 1988. /Statistics/ (4th ed.). New York: Holt, Rinehart
> and Winston.
>
> Woods, Anthony, Paul Fletcher & Arthur Hughes. 1986. /Statistics in
> language studies/. Cambridge: Cambridge University Press.
More information about the Lingtyp
mailing list