[Lingtyp] Greenbergian word order universals: confirmed after all
PAKENDORF Brigitte
brigitte.pakendorf at cnrs.fr
Fri Nov 3 08:18:51 UTC 2023
I agree that one needs people who are knowledgeable in both linguistics and computational approaches to do this kind of research properly. However, it is also obvious that doing such studies as carefully as necessary to get reliable results takes time - hence the people who want to do careful and reliable studies cannot churn out papers at top speed. Others, who are less careful, throw unreliable data into the black box of the software, get out results that they can interpret in whatever way fits them best, and publish papers much more quickly - building up an impressive publication list that convinces search committees who do not understand all the flaws and weaknesses of these publications.
The careful researchers are penalized as 'too slow' and seemingly unproductive, because they cannot accumulate as many papers in the same time period - and hence they are the ones who end up without a permanent position, even though they are precisely the kind of researchers one would want in the field, to transmit their expertise, know-how and careful approach to future generations of academics. And this does not hold just for linguistics; I see it in molecular anthropology as well.
*******************************
Brigitte PAKENDORF (she/elle/sie/она)
Directrice de recherche / Senior scientist
Dynamique Du Langage
http://www.ddl.cnrs.fr/pakendorf
CNRS & Université Lumière Lyon 2
14 avenue Berthelot
69007 Lyon
FRANCE
-----Original Message-----
From: Lingtyp <lingtyp-bounces at listserv.linguistlist.org> On Behalf Of Johann-Mattis List
Sent: Friday, 03 November 2023 09:08
To: lingtyp at listserv.linguistlist.org
Subject: Re: [Lingtyp] Greenbergian word order universals: confirmed after all
Dear Randy,
As you mention the study by Zhang et al. and the reliability of the data - and since this is quite important here - I think it is worthwhile to point to the discrepancy between the data you have in the form of an etymological dictionary like STEDT and the data that actually goes into the computer.
Here, the study by Zhang et al. shows several huge problems that we had looked into at some point but never had time to follow up on. The most important problem is that STEDT was not made for these analyses: there is in fact no real 100-item Swadesh list, and the data are not in a state where you could check individual roots and how they evolved along the tree. All is lost in the numbers. Nobody knows whether the coding was flawed or done well, and we will never know, due to the problematic coding procedure followed in the study, overlooked by all reviewers.
So we can observe another, additional problem with using phylogenetic or typological databases: experts in one domain (etymology), like you, Randy, often do not have the opportunity or the expertise in the other domain (data coding), so a lot can be lost when one trusts a database without checking how the data were turned into numbers.
For the future, we need more people with expertise in both domains, similar to the field of evolutionary biology: people who understand the major processes of language change while at the same time understanding how the original data are turned into numerical representations in order to test results computationally.
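To make concrete what "turning data into numbers" means here: phylogenetic software typically sees not the etymological material itself but a binary presence/absence matrix derived from cognate judgments. The sketch below is illustrative only - the languages, concepts, and cognate-set labels are invented, and real pipelines (e.g. one built on STEDT) involve many more coding decisions than this hides - but it shows the kind of transformation at which information can silently be lost.

```python
# Minimal sketch: turning cognate judgments into the binary
# character matrix that phylogenetic software actually analyzes.
# All names and data below are invented for illustration.

# Each concept maps each language to a cognate-set label;
# None = no attested form (a frequent and consequential case).
cognates = {
    "eye":  {"LangA": "c1", "LangB": "c1", "LangC": "c2"},
    "fish": {"LangA": "c3", "LangB": None, "LangC": "c3"},
}

languages = ["LangA", "LangB", "LangC"]

def to_binary_matrix(cognates, languages):
    """One binary character per (concept, cognate set):
    '1' = the language has a reflex of that set, '0' = it does not,
    '?' = missing data (crucially distinct from absence)."""
    characters = []
    matrix = {lang: [] for lang in languages}
    for concept, judgments in sorted(cognates.items()):
        sets = sorted({s for s in judgments.values() if s is not None})
        for s in sets:
            characters.append((concept, s))
            for lang in languages:
                judged = judgments.get(lang)
                if judged is None:
                    matrix[lang].append("?")  # no data, not a zero
                else:
                    matrix[lang].append("1" if judged == s else "0")
    return characters, matrix

characters, matrix = to_binary_matrix(cognates, languages)
for lang in languages:
    print(lang, "".join(matrix[lang]))
# LangA 101
# LangB 10?
# LangC 011
```

Once the data is in this form, an expert can no longer see which roots underlie which columns, which is exactly the check that becomes impossible when the coding procedure is not documented.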
Best,
Mattis
Am 03.11.23 um 07:10 schrieb Randy LaPolla:
> Hi Martin and all,
> Over the years I have been asked by Nature to review a number of these
> papers that use a Bayesian-based algorithm (usually the same exact
> one)—there has been a fad of such papers, and my response is almost
> always the same: they have used a method (lexicostatistics) long ago
> discredited in linguistics, but sometimes come up with results quite
> similar to the results found by more empirical traditional studies. As
> their valid results are never new, the only thing worth mentioning is
> the methodology, as Jürgen pointed out. The methodology fails
> sometimes, though, and there are two crucial reasons why: the only
> things that vary among all these studies are which database they use
> and how they set the priors, both of which can greatly bias the outcome.
> The one such study I supported was by Zhang Menghan et al. in 2019, as
> it used a very reliable database (Matisoff’s Sino-Tibetan Etymological
> Dictionary and Thesaurus—developed over 30 years) and did not set any
> priors that would have biased the outcome. Most of the others use
> problematic datasets, and, as the old saying goes, 'garbage in, garbage out'.
>
> Randy
>
>> On Nov 2, 2023, at 22:22, Martin Haspelmath
>> <martin_haspelmath at eva.mpg.de> wrote:
>>
>>
>>
>> Dear all,
>>
>> Twelve years ago, for the first (and so far last) time, typology made
>> it into /Nature/, and /BBC Online/ reported at the time: “A
>> long-standing idea that human languages share universal features that
>> are dictated by human brain structure has been cast into doubt.”
>> (https://www.bbc.com/news/science-environment-13049700). Our journal
>> /Linguistic Typology/ took this as an opportunity to publish a
>> “Universals Debate” taking up 200 pages
>> (https://www.degruyter.com/document/doi/10.1515/lity.2011.023/html).
>> Younger LINGTYP readers may not remember all this, but a lot of stir
>> was caused at the time by the paper by Dunn et al. (2011), which
>> claimed that "systematic linkages of traits are likely to be the rare
>> exception rather than the rule. Linguistic diversity does not seem to
>> be tightly constrained by universal cognitive factors“
>> (https://www.nature.com/articles/nature09923). Their paper argued not
>> only against Chomskyan UG (universal grammar), but also against the
>> Greenbergian word order universals (Dryer 1992).
>>
>> In the meantime, however, it has become clear that those surprising
>> claims about word order universals are not supported – the sample
>> size (four language families) used in their paper was much too small.
>>
>> Much less prominently, Jäger & Wahle (2021) reexamined those claims
>> (using similar methods, but many more language families and all
>> relevant /WALS/ data), finding “statistical evidence for 13 word
>> order features, which largely confirm the findings of traditional
>> typological research”
>> (https://www.frontiersin.org/articles/10.3389/fpsyg.2021.682132/full).
>>
>> Similarly, Annemarie Verkerk and colleagues (including Russell Gray)
>> have recently reexamined a substantial number of claimed universals
>> on the basis of the much larger Grambank database and found that
>> especially Greenberg’s word order universals hold up quite well (see
>> Verkerk’s talk at the recent Grambank workshop at MPI-EVA:
>> https://www.eva.mpg.de/de/linguistic-and-cultural-evolution/events/2023-grambank-workshop/, available on YouTube: https://www.youtube.com/playlist?list=PLSqqgRcaL9yl8FNW_wb8tDIzz9R78t8Uk).
>>
>> So what went wrong in 2011? We are used to paying a lot of attention
>> to the “big journals” (/Nature, Science, PNAS, Cell/), but they often
>> focus on sensationalist claims, and they typically publish less
>> reliable results than average journals (see Brembs 2018:
>> https://www.frontiersin.org/articles/10.3389/fnhum.2018.00037/full).
>>
>> So maybe we should be extra skeptical when a paper is published in a
>> high-prestige journal. But another question that I have is: Why
>> didn’t the authors see that their 2011 results were unlikely to be
>> true, and that their sample size was much too small? Why didn't they
>> notice that most of the word order changes in their four families
>> were contact-induced? Were they so convinced that their new
>> mathematical method (adopted from computational biology) would yield
>> correct results that they neglected to pay sufficient attention to the data?
>> Would it have helped if they had submitted their paper to a
>> linguistics journal?
>>
>> Perhaps I’m too pessimistic (see also this blogpost:
>> https://dlc.hypotheses.org/2368), but in any event, I think that this
>> intriguing episode (and sobering experience) should be discussed
>> among typologists, and we should learn from it, in one way or another.
>> Advanced quantitative methods are now everywhere in science, and it
>> seems that they are often misapplied or misunderstood (see also this
>> recent blogpost by Richard McElreath:
>> https://elevanth.org/blog/2023/06/13/science-and-the-dumpster-fire/).
>>
>> Martin
>>
>> --
>> Martin Haspelmath
>> Max Planck Institute for Evolutionary Anthropology Deutscher Platz 6
>> D-04103 Leipzig
>> https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/
>> _______________________________________________
>> Lingtyp mailing list
>> Lingtyp at listserv.linguistlist.org
>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
>