[Lingtyp] Greenbergian word order universals: confirmed after all

Eitan Grossman eitan.grossman at mail.huji.ac.il
Fri Nov 3 13:27:48 UTC 2023


Hi all,

I wonder if there is a (loose) correlation between the size of the claims
and the potential for carelessness with data and methods. I could be wrong,
but it seems to me that much of the work done by descriptive linguists
tends to be seen as valuable for decades after publication, and to
contribute to a genuinely incremental increase of our knowledge, whereas
studies with big claims often seem to remain controversial, with no
agreed-upon way of judging their lasting empirical value.

I really debated whether to add some comments to those of Jürgen and
Brigitte -- mainly about bad editorial practices at linguistics journals,
not only at high-profile general-science ones, that lead to the inflated
publication of articles that turn out to be less than sound -- but I haven't
found a way to phrase it constructively. And I am not sure that Ian's
suggestion that articles be first published in linguistics journals is the
right way forward, simply because I have become increasingly pessimistic
about our own editorial practices.

Eitan




Eitan Grossman
Associate Professor, Department of Linguistics
Hebrew University of Jerusalem
Tel: +972 2 588 3809




On Fri, Nov 3, 2023 at 10:19 AM PAKENDORF Brigitte <
brigitte.pakendorf at cnrs.fr> wrote:

> I agree that one needs people who are knowledgeable in both linguistics
> and the computational approaches to do this kind of research properly.
> However, it's also obvious that doing such studies as carefully as would be
> necessary to get reliable results takes time - hence the people who want to
> do careful and reliable studies cannot churn out papers at top speed.
> Others who are less careful throw unreliable data into the black box of the
> software and get out some results that they can then interpret in whatever
> way fits them best and publish papers much more quickly - building up an
> impressive publication list that convinces the search committees who do not
> understand all the flaws and weaknesses of these publications. The careful
> researchers are penalized because they are 'too slow' and seemingly
> unproductive, since they cannot accumulate as many papers in the same
> time period - and hence they are the ones who end up without a
> permanent position, even though they are precisely the kind of
> researchers one would want in the field to transmit their expertise,
> know-how, and careful approach to future generations of academics. And this
> doesn't hold just for linguistics; I see it in molecular anthropology as
> well.
>
> *******************************
> Brigitte PAKENDORF (she/elle/sie/она)
> Directrice de recherche / Senior scientist
> Dynamique Du Langage
> http://www.ddl.cnrs.fr/pakendorf
> CNRS & Université Lumière Lyon 2
> 14 avenue Berthelot
> 69007 Lyon
> FRANCE
>
> -----Original Message-----
> From: Lingtyp <lingtyp-bounces at listserv.linguistlist.org> On Behalf Of
> Johann-Mattis List
> Sent: Friday, 03 November 2023 09:08
> To: lingtyp at listserv.linguistlist.org
> Subject: Re: [Lingtyp] Greenbergian word order universals: confirmed after
> all
>
> Dear Randy,
>
> As you mention the study by Zhang et al. and the reliability of the data,
> and since this is quite important here, I think it is worthwhile to point
> to the discrepancy between the data that you have in the form of an
> etymological dictionary like STEDT, and the data that go into the
> computer.
>
> Here, the study by Zhang et al. shows several serious problems that we had
> looked into at some point but had no time to follow up on. The most
> important problem is that STEDT was not made for these analyses:
> there is in fact no real 100-item Swadesh list, and the data are not in a
> state where one could check individual roots and how they evolved along
> the tree. Everything is lost in the numbers; nobody knows whether the
> coding was flawed or done well, and we never will, due to the
> problematic coding procedure followed in the study, which was overlooked
> by all reviewers.
>
> So we can observe a further problem with using phylogenetic
> databases or typological databases: experts in one domain (etymology), like
> you, Randy, often do not have the opportunity or the expertise in the other
> domain (data coding), so a lot can be lost when one trusts a database
> without checking how the data were turned into numbers.
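That "data into numbers" step typically means expanding cognate judgments into binary characters, one column per cognate set. A minimal sketch of the idea (the languages, the concept, and the cognate-set ids below are invented for illustration; real coding pipelines involve many more decisions):

```python
# Sketch of the standard "data -> numbers" step in lexical phylogenetics:
# cognate judgments for a concept are expanded into binary characters
# (one column per cognate set: does the language have a reflex or not?).

def binarize(cognates):
    """Turn {language: cognate_set_id} into {language: [0/1, ...]},
    with one column per cognate set, in sorted order of set ids."""
    sets = sorted(set(cognates.values()))
    return {lang: [1 if cid == s else 0 for s in sets]
            for lang, cid in cognates.items()}

# Concept "water": languages A and B share a cognate set, C does not.
matrix = binarize({"A": "water-1", "B": "water-1", "C": "water-2"})
print(matrix)  # {'A': [1, 0], 'B': [1, 0], 'C': [0, 1]}
```

Once the data are in this form, flaws in the underlying cognate judgments are no longer visible, which is exactly the checking problem described above.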
>
> For the future, we need more common expertise of people who know both
> domains, similar to the field of evolutionary biology, where people
> understand major processes of language change (for example) while at the
> same time being able to understand how original data is turned into
> numerical representations in order to test for results computationally.
>
> Best,
>
> Mattis
>
> On 03.11.23 at 07:10, Randy LaPolla wrote:
> > Hi Martin and all,
> > Over the years I have been asked by Nature to review a number of these
> > papers that use a Bayesian-based algorithm (usually the exact same
> > one; there has been a fad of such papers), and my response is almost
> > always the same: they use a method (lexicostatistics) long ago
> > discredited in linguistics, but sometimes come up with results quite
> > similar to those found by more empirical traditional studies. As
> > their valid results are never new, the only thing worth mentioning is
> > the methodology, as Jürgen pointed out. The methodology sometimes
> > fails, though, and there are two crucial reasons why: the only things
> > that vary among all these studies are which database they use and how
> > they set the priors, both of which can greatly bias the outcome.
> > The one such study I supported was by Zhang Menghan et al. in 2019, as
> > it used a very reliable database (Matisoff’s Sino-Tibetan Etymological
> > Dictionary and Thesaurus, developed over 30 years) and did not set any
> > priors that would have biased the outcome. Most of the others use
> > problematic datasets, and as the old saying goes: garbage in, garbage
> > out.
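The worry about priors can be made concrete with a toy conjugate model (a minimal sketch; the Beta-Binomial model and the numbers are purely illustrative and are not taken from any of the studies discussed): when data are sparse, the prior's pseudo-counts can dominate the estimate.

```python
# Minimal illustration of prior sensitivity in Bayesian estimation.
# Model: observe k "successes" in n trials; estimate the rate p with a
# Beta(a, b) prior. The posterior is Beta(a + k, b + n - k), so the
# posterior mean is (a + k) / (a + b + n). With little data, the
# prior's pseudo-counts (a, b) dominate.

def posterior_mean(k, n, a, b):
    """Posterior mean of p under a Beta(a, b) prior, given k/n observed."""
    return (a + k) / (a + b + n)

# The same sparse data (3 successes in 10 trials) ...
k, n = 3, 10

flat = posterior_mean(k, n, 1, 1)      # uninformative Beta(1, 1) prior
strong = posterior_mean(k, n, 50, 10)  # strong prior favoring high p

print(f"flat prior:   {flat:.3f}")     # close to the observed rate 0.3
print(f"strong prior: {strong:.3f}")   # pulled far toward the prior
```

The same mechanism applies, on a much larger scale, to priors on tree topologies, root ages, and rates of change in phylogenetic software.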
> >
> > Randy
> >
> >> On Nov 2, 2023, at 22:22, Martin Haspelmath
> >> <martin_haspelmath at eva.mpg.de> wrote:
> >>
> >>
> >> Dear all,
> >>
> >> Twelve years ago, for the first (and so far last) time, typology made
> >> it into /Nature/, and /BBC Online/ reported at the time: “A
> >> long-standing idea that human languages share universal features that
> >> are dictated by human brain structure has been cast into doubt.”
> >> (https://www.bbc.com/news/science-environment-13049700). Our journal
> >> /Linguistic Typology/ took this as an opportunity to publish a
> >> “Universals Debate” taking up 200 pages
> >> (https://www.degruyter.com/document/doi/10.1515/lity.2011.023/html).
> >> Younger LINGTYP readers may not remember all this, but a lot of stir
> >> was caused at the time by the paper by Dunn et al. (2011), which
> >> claimed that "systematic linkages of traits are likely to be the rare
> >> exception rather than the rule. Linguistic diversity does not seem to
> >> be tightly constrained by universal cognitive factors“
> >> (https://www.nature.com/articles/nature09923). Their paper argued not
> >> only against Chomskyan UG (universal grammar), but also against the
> >> Greenbergian word order universals (Dryer 1992).
> >>
> >> In the meantime, however, it has become clear that those surprising
> >> claims about word order universals are not supported – the sample
> >> size (four language families) used in their paper was much too small.
> >>
> >> Much less prominently, Jäger & Wahle (2021) reexamined those claims
> >> (using similar methods, but many more language families and all
> >> relevant /WALS/ data), finding “statistical evidence for 13 word
> >> order features, which largely confirm the findings of traditional
> >> typological research”
> >> (https://www.frontiersin.org/articles/10.3389/fpsyg.2021.682132/full).
> >>
> >> Similarly, Annemarie Verkerk and colleagues (including Russell Gray)
> >> have recently reexamined a substantial number of claimed universals
> >> on the basis of the much larger Grambank database and found that
> >> especially Greenberg’s word order universals hold up quite well (see
> >> Verkerk’s talk at the recent Grambank workshop at MPI-EVA:
> >> https://www.eva.mpg.de/de/linguistic-and-cultural-evolution/events/2023-grambank-workshop/,
> >> available on YouTube:
> >> https://www.youtube.com/playlist?list=PLSqqgRcaL9yl8FNW_wb8tDIzz9R78t8Uk).
> >>
> >> So what went wrong in 2011? We are used to paying a lot of attention
> >> to the “big journals” (/Nature, Science, PNAS, Cell/), but they often
> >> focus on sensationalist claims, and they typically publish less
> >> reliable results than average journals (see Brembs 2018:
> >> https://www.frontiersin.org/articles/10.3389/fnhum.2018.00037/full).
> >>
> >> So maybe we should be extra skeptical when a paper is published in a
> >> high-prestige journal. But another question that I have is: Why
> >> didn’t the authors see that their 2011 results were unlikely to be
> >> true, and that their sample size was much too small? Why didn't they
> >> notice that most of the word order changes in their four families
> >> were contact-induced? Were they so convinced that their new
> >> mathematical method (adopted from computational biology) would yield
> >> correct results that they neglected to pay sufficient attention to the
> data?
> >> Would it have helped if they had submitted their paper to a
> >> linguistics journal?
> >>
> >> Perhaps I’m too pessimistic (see also this blogpost:
> >> https://dlc.hypotheses.org/2368), but in any event, I think that this
> >> intriguing episode (and sobering experience) should be discussed
> >> among typologists, and we should learn from it, in one way or another.
> >> Advanced quantitative methods are now everywhere in science, and it
> >> seems that they are often misapplied or misunderstood (see also this
> >> recent blogpost by Richard McElreath:
> >> https://elevanth.org/blog/2023/06/13/science-and-the-dumpster-fire/).
> >>
> >> Martin
> >>
> >> --
> >> Martin Haspelmath
> >> Max Planck Institute for Evolutionary Anthropology
> >> Deutscher Platz 6
> >> D-04103 Leipzig
> >> https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/
> >> _______________________________________________
> >> Lingtyp mailing list
> >> Lingtyp at listserv.linguistlist.org
> >> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
> >
>