<!doctype html public "-//W3C//DTD W3 HTML//EN">
<html><head><style type="text/css"><!--
blockquote, dl, ul, ol, li { padding-top: 0 ; padding-bottom: 0 }
--></style><title>Re: Atkinson on phoneme inventories in
Science</title></head><body>
<div><font face="Times New Roman" color="#000000">I will not discuss
the content of Matthew's response here, as I have discussed it with
him privately. I agree with his main point, that Atkinson should
control for large-scale areal influence. I also agree with Ian that
once one opens the door to one geographical correlation hypothesis,
one needs to consider others as well.<br>
<br>
However, I strongly disagree with the implications of Matthew's
polemical remarks:<br>
<br>
"But the big problem with the Atkinson paper and others like it
is precisely that nonlinguists who are experts on statistics do not
understand the peculiar nature of crosslinguistic data...Linguists
should be very wary of seeking the advice of nonlinguists regarding
statistics."<br>
<br>
I don't know if this is what Matthew intended, but this sounds very
much as if (cross)linguistic data is somehow immune to the laws of
statistics, and hence linguists need not concern themselves with
developments in statistics (especially since such developments are the
work of nonlinguists).</font><br>
<font face="Times New Roman" color="#000000"></font></div>
<div><font face="Times New Roman" color="#000000">Linguistic data is
not "peculiar". Linguistic data, like other data from human
behavior and other complex systems, is the product of stochastic
processes influenced by a variety of factors that causally interact.
Our task is to identify the relevant factors and determine their
influence, if any. That can be done with a range of statistical methods
and models, which can, for example, account for large-scale areal
influence in the phoneme inventory data if we think they
should.</font><br>
<font face="Times New Roman" color="#000000"></font></div>
<div><font face="Times New Roman" color="#000000">Of course,
identifying the relevant factors depends on the causal models that we
propose to account for the behavior. Atkinson has a causal model,
which leads him to bring in the factors that he does (and he controls
for quite a number of plausible confounding factors, though not area,
if you read his supplementary materials). The problem is, we linguists
do not believe in the causal model, so we don't think distance from
Africa, or even population size, should be the only additional factors
considered in the statistical analysis. But linguists don't all agree
on causal models of language behavior either. (Note that Dunn et al.
are a team of linguists as well as nonlinguists.) And sometimes we
have to look outside the box and consider other possibilities, as Hay
and Bauer (2007) did - even if they turn out to be
artifacts.</font></div>
<div><font face="Times New Roman" color="#000000"><br></font></div>
<div><font face="Times New Roman" color="#000000">I think that
linguists should learn more about statistics. Many of the posts about
Atkinson's paper at the Language Log, the NY Times article, and on
Funknet do not recognize some basic statistical principles. Even the
detailed and carefully reasoned posts would benefit from a deeper
knowledge of statistics, I believe. I say that for myself as well, of
course. For instance, I have been told (via a psycholinguist) that the
puzzle I discussed, the possibility of different correlations in a
sample and in partitions of the sample, has a name in statistics:
Simpson's Paradox. I checked the indexes of the two statistics
textbooks I have by linguists (Woods et al. 1986 and Baayen 2008), and
my wife's university statistics textbook (Hays 1988); none of them
listed Simpson's Paradox. I'm afraid that for me or any linguist to
learn more about statistics means reading books written by
nonlinguists, taking courses from nonlinguists, and/or consulting with
nonlinguists.</font></div>
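<div><font face="Times New Roman" color="#000000"><br>
To make the puzzle concrete, here is a minimal sketch in Python of how a correlation can point one way within each partition of a sample and the other way in the pooled sample. The two "areas" and all of the numbers are invented for illustration; they are not taken from Atkinson's data or from any real language sample, and the variable names are my own assumptions.</font></div>
<pre>
```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: x is some predictor, y is some outcome,
# measured on four languages in each of two invented areas.
area_a_x = [1, 2, 3, 4]
area_a_y = [10, 12, 11, 13]
area_b_x = [6, 7, 8, 9]
area_b_y = [1, 3, 2, 4]

# Correlation within each partition of the sample.
r_area_a = pearson_r(area_a_x, area_a_y)
r_area_b = pearson_r(area_b_x, area_b_y)

# Correlation in the pooled sample, ignoring area.
r_pooled = pearson_r(area_a_x + area_b_x, area_a_y + area_b_y)

print(f"area A: r = {r_area_a:.2f}")
print(f"area B: r = {r_area_b:.2f}")
print(f"pooled: r = {r_pooled:.2f}")
```
</pre>
<div><font face="Times New Roman" color="#000000">With these invented numbers the correlation is positive within each area (0.80 in both) but negative in the pooled sample (about -0.81), because area B has higher x values and lower y values overall. Which of the two answers is the meaningful one depends on the causal model, not on the arithmetic.</font></div>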
<div><font face="Times New Roman" color="#000000"><br>
The response by linguists to Dunn et al. and Atkinson has been
uniformly negative. Many responses have also been arrogant,
condescending and dismissive. The attitude appears to be that any work
on language by nonlinguists, especially work that uses fancy
statistics, is completely
wrong. That is why I have felt obliged to defend those aspects of both
papers that I think are positive, and to question some of the
criticisms. This doesn't mean that I endorse their results: I don't,
in the case of Dunn et al., and I am uncertain about Atkinson. But I
think that the problems with Dunn et al. and with Atkinson are quite
different - linguistically and statistically - and that it is worth
linguists recognizing and understanding these
differences.</font></div>
<div><font face="Times New Roman" color="#000000"><br>
Bill<br>
<br>
</font><font face="Times" color="#000000">Baayen, R. Harald. 2008.
<i>Analyzing linguistic data: a practical introduction to statistics
using R</i>. Cambridge: Cambridge University Press.</font><br>
<font face="Times" color="#000000"></font></div>
<div><font face="Times" color="#000000">Hays, William L. 1988.
<i>Statistics</i> (4th ed.). New York: Holt, Rinehart and
Winston.</font></div>
<div><font face="Times" color="#000000"><br>
Woods, Anthony, Paul Fletcher &amp; Arthur Hughes. 1986.
<i>Statistics in language studies</i>. Cambridge: Cambridge University
Press.</font><br>
<font face="Times" color="#000000"></font></div>
</body>
</html>