<!doctype html public "-//W3C//DTD W3 HTML//EN">

<html><head><style type="text/css"><!--

blockquote, dl, ul, ol, li { padding-top: 0 ; padding-bottom: 0 }

 --></style><title>Re: Atkinson on phoneme inventories in

Science</title></head><body>

<div><font face="Times New Roman" color="#000000">I will not discuss

the content of Matthew's response here, as I have discussed it with

him privately. I agree with his main point, that Atkinson should

control for large-scale areal influence. I also agree with Ian that

once one opens the door to one geographical correlation hypothesis,

one needs to consider others as well.<br>

<br>

However, I strongly disagree with the implications of Matthew's

polemical remarks:<br>

<br>

"But the big problem with the Atkinson paper and others like it

is precisely that nonlinguists who are experts on statistics do not

understand the peculiar nature of crosslinguistic data...Linguists

should be very wary of seeking the advice of nonlinguists regarding

statistics."<br>

<br>

I don't know if this is what Matthew intended, but this sounds very

much as if (cross)linguistic data is somehow immune to the laws of

statistics, and hence linguists need not concern themselves with

developments in statistics (especially since such developments are the

work of nonlinguists).</font><br>

</div>

<div><font face="Times New Roman" color="#000000">Linguistic data is

not "peculiar". Linguistic data, like other data from human

behavior and other complex systems, is the product of stochastic

processes influenced by a variety of factors that causally interact.

Our task is to identify the relevant factors and determine their

influence, if any. That can be done by a range of statistical methods

and models, which for example can deal with large-scale areal

influence in the phoneme inventory data if we think it

should.</font><br>

</div>

<div><font face="Times New Roman" color="#000000">Of course,

identifying the relevant factors depends on the causal models that we

propose to account for the behavior. Atkinson has a causal model,

which leads him to bring in the factors that he does (and, as his
supplementary materials show, he controls for quite a number of
plausible confounding factors, though not area). The problem is, we linguists

do not believe in the causal model, so we don't think distance from

Africa, or even population size, should be the only additional factors

considered in the statistical analysis. But linguists don't all agree

on causal models of language behavior either. (Note that Dunn et al.

are a team of linguists as well as nonlinguists.) And sometimes we

have to look outside the box and consider other possibilities, as Hay

and Bauer (2007) did - even if they turn out to be

artifacts.</font></div>

<div><font face="Times New Roman" color="#000000"><br></font></div>

<div><font face="Times New Roman" color="#000000">I think that

linguists should learn more about statistics. Many of the posts about

Atkinson's paper on Language Log, on the NY Times article, and on

Funknet do not recognize some basic statistical principles. Even the

detailed and carefully reasoned posts would benefit from more detailed

knowledge of statistics, I believe. I say that for myself as well, of

course. For instance, I have been told (via a psycholinguist) that the

puzzle I discussed, the possibility of different correlations in a

sample and in partitions of the sample, has a name in statistics,

Simpson's Paradox. I checked the indexes of the two statistics

textbooks I have by linguists (Woods et al. 1986 and Baayen 2008), and

my wife's university statistics textbook (Hays 1988); none of them

listed Simpson's Paradox. I'm afraid that for me or any linguist to

learn more about statistics means reading books written by

nonlinguists, taking courses from nonlinguists, and/or consulting with

nonlinguists.</font></div>
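<div><font face="Times New Roman" color="#000000"><br>
To make the puzzle concrete, here is a minimal Python sketch of how a
correlation can have one sign within each partition of a sample and the
opposite sign in the pooled sample. The numbers are invented purely for
illustration (they are not phoneme inventory data), and the names
group_a, group_b and pearson are hypothetical.</font></div>
<pre>
```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented illustrative data: two hypothetical partitions of one sample.
# Within each partition, y rises with x.
group_a_x, group_a_y = [1, 2, 3, 4], [10, 11, 12, 13]
group_b_x, group_b_y = [6, 7, 8, 9], [1, 2, 3, 4]

r_a = pearson(group_a_x, group_a_y)
r_b = pearson(group_b_x, group_b_y)

# Pooling the partitions: group B has higher x but lower y overall,
# so the between-group difference dominates the pooled correlation.
r_pooled = pearson(group_a_x + group_b_x, group_a_y + group_b_y)

print("within group A:", round(r_a, 2))
print("within group B:", round(r_b, 2))
print("pooled sample: ", round(r_pooled, 2))
```
</pre>
<div><font face="Times New Roman" color="#000000">Both within-group
correlations are positive, while the pooled correlation is negative,
because the partitioning variable is itself related to both x and
y.</font></div>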

<div><font face="Times New Roman" color="#000000"><br>

The response by linguists to Dunn et al. and Atkinson has been

uniformly negative. Many responses have also been arrogant, condescending and
dismissive. The attitude appears to be that any work on language by
nonlinguists, especially work using fancy statistics, is completely

wrong. That is why I have felt obliged to defend those aspects of both

papers that I think are positive, and to question some of the

criticisms. This doesn't mean that I endorse their results: I don't,

in the case of Dunn et al., and I am uncertain about Atkinson. But I

think that the problems with Dunn et al. and with Atkinson are quite

different - linguistically and statistically - and that it is worth

linguists recognizing and understanding these

differences.</font></div>

<div><font face="Times New Roman" color="#000000"><br>

Bill<br>

<br>

</font><font face="Times" color="#000000">Baayen, R. Harald. 2008.<i>

Analyzing linguistic data: a practical introduction to statistics

using R.</i> Cambridge: Cambridge University Press.</font><br>

</div>

<div><font face="Times" color="#000000">Hays, William L. 1988.<i>

Statistics</i> (4th ed.). New York: Holt, Rinehart and

Winston.</font></div>

<div><font face="Times" color="#000000"><br>

Woods, Anthony, Paul Fletcher &amp; Arthur Hughes. 1986.<i> Statistics

in language studies</i>. Cambridge: Cambridge University

Press.</font><br>

</div>

</body>

</html>