[Lingtyp] Greenbergian word order universals: confirmed after all

Sat Nov 4 18:46:58 UTC 2023

Dear Frans and dear Martin,

Perhaps we should distinguish replication of the empirical correlations
claimed to be universal across languages from replication of
the evolutionary models that attempt to explain them?

Replication of the latter, the computational phylogenetic models, would
involve not only re-sampling of the data but also
replication of the modeling process itself, as Gerhard mentions. The fact
that Gerhard was able to replicate the Dunn et al. 2011 findings
is telling: their advanced computational methods were not "misapplied or
misunderstood" as Martin darkly suggests happens so often. Rather, their
work appears to be a building block for subsequent theory.

Nowadays, as I mentioned earlier, the gold standard for computational
research is to make the computer code underlying the models publicly
available, as Jäger & Wahle 2021 do. So replication methods have become
more stringent, data sources have become richer, and theoretical models
have become more verifiable and more explanatory. Why should we be
pessimistic? If we are never wrong, we will never learn.

Joan

On Sat, Nov 4, 2023 at 3:19 AM Frans Plank <frans.plank at ling-phil.ox.ac.uk>
wrote:

> Dear Gerhard, actually, since you emphasise replicability, LT once also
> had a debate on replication:  Re-doing typology, LT 10, 67-128, 2006.
>  (What did it not have debates on …)  One contribution, especially relevant
> in the present context, was by Martin Haspelmath & Sven Siegmund,
> successfully replicating some of Greenberg’s word order universals through
> the time-honoured method of re-sampling.
>
> Frans
>
> On 4. Nov 2023, at 07:56, Gerhard Jäger <gerhard.jaeger at uni-tuebingen.de>
> wrote:
>
> Dear all,
>
> Let me add my two cents to this discussion.
>
> The fact that Dunn et al. 2011 sparked the discussion in the special issue
> of LT back then, and still inspires discussions and research today, proves
> quite convincingly, I believe, that it was an important contribution.
>
> Martin, you write, "Greenberg's (1963) claims were based on his knowledge
> of hundreds of languages (and on earlier research by W. Schmidt), not only
> on his very small and very skewed 30-language sample, but the fact that
> most of his universals stood the test of time (despite his very imperfect
> methods) seems to show that what mattered primarily was the novel
> (universalistic and quantitative) perspective that he took, not so much the
> methods."
> I am sure you are right, but what were these hundreds of languages, and
> what did Greenberg know about them? This touches on the issue of
> *replicability*. When I embarked on the research culminating in Jäger &
> Wahle (2021), I tried to replicate Dunn et al. I wrote my own code and
> used different phylogenies, and I got almost precisely the same results
> as the ones they reported.
>
> Let me emphasize - I only used the description of methods in Dunn et al.
> (2011), not the original code, but replication worked without problems.
> This is more than what can be said of a substantial portion of scientific
> papers. It demonstrates the robustness of the results. One may legitimately
> disagree on the conclusions the authors drew, but the reported results per
> se are rock solid.
>
> I want to conclude with an observation that a classical archaeologist
> once shared with me. According to him, when carbon dating started in the
> 1950s, the results were often in stark contrast to received archaeological
> insights. Still, those clueless physicists got their papers published in
> Nature, to the dismay of the archaeologists. Over time, the problems of the
> naive use of carbon dating were ironed out, and carbon dating is now an
> indispensable part of the archaeologist's toolkit.
>
> Best, Gerhard
>
>
>
> On 11/4/23 07:24, Martin Haspelmath wrote:
>
> Many thanks, Simon, for offering your author's perspective on this old
> paper!
>
> You are certainly right that "the scientific process" can sometimes be
> slow, and it may take a while for wrong conclusions to be corrected. We
> should of course not be discouraged from trying out novel methods just
> because they may initially yield strange results.
>
> However, I found your 2011 paper very confusing because it was published
> in such a prominent place, seemingly betraying very high confidence in the
> results. It was only recently that I heard from a good source that the
> authors now agree with the earlier criticism that the sample size was too
> small in 2011, and two of them are now coauthors of the new forthcoming
> paper (with MUCH more data) that Annemarie Verkerk reported on recently. So
> that's a good development.
>
> Maybe that new paper will come out in *Nature*, too, and then all will be
> well, as the 2011 paper will fall into oblivion. (But since the new paper
> mostly confirms older Greenbergian perspectives, *Nature* may not find
> the results novel enough...)
>
> But I'm still confused about the three points you make:
>
> First, your paper *didn't* show that Matthew Dryer's (1992) correlation
> method was "overly simplistic" – on the contrary, while some of the claims
> made in the 1970s were overly simplistic, Dryer showed that the
> Greenbergian correlations hold, but in a more limited way than had been
> thought (he rejected the head-dependent theory in favour of the
> branching-direction theory).
>
> Second, the "need to understand language systems in a diachronic manner"
> *didn't* need to be highlighted, because it was something that linguists
> had routinely done since the 19th century, and the idea was made very
> prominent again by Greenberg himself (e.g. Greenberg 1969; 1978; see also
> Croft's well-known work).
>
> Third, linguists knew that "different routes can be taken in different
> families at different times", so this was *not* a contribution of your
> paper either.
>
> I don't know about "rigorous" reviewing in high-profile journals (they
> certainly have a high rejection rate), but I think it is clear that their
> influence is not always justified by the validity of their research
> results. Publishing a paper in *Nature* can have a chilling effect on the
> rest of the discipline, in that other approaches or views may not be seen
> as legitimate anymore. (We saw the bad effects of the outsize political
> influence of high-profile journals in recent years, when some of them
> declared the lab-leak theory and certain alternative approaches to public
> health as wrong, quite prematurely, and with devastating effects on the
> public discussion.)
>
> So one of my reasons for raising the problems with the 2011 paper here is
> to point out that while the correlated-evolution method may have its
> advantages, it requires a huge amount of data (and very unrealistic
> worldwide trees, as is particularly clear with Jäger & Wahle's ASJP-based
> tree, but also with the Bouckaert et al. tree:
> https://osf.io/preprints/socarxiv/f8tr6/). For most other purposes in
> typology, the traditional sampling methods (discussed in our subfield since
> 1978) are all we have.
>
> In their target authors' response article in *Linguistic Typology* (2011,
> 509-534), Levinson et al. start out by noting the problem of spatial and
> genealogical autocorrelation (which used to be called "Galton's Problem",
> after the originator of eugenics), but they make it sound as if stratified
> sampling has a deep fundamental problem and that the correlated-evolution
> method is always better. But they seem to be exaggerating the problems with
> sampling, e.g. when they say:
>
> "Many typologists may assume that the dangers of covert phylogenetic
> dependence are remote. But given the apparent genetic bottlenecks at the
> beginning of the modern human diaspora out of Africa (Amos & Hoffman 2009),
> something close to language monogenesis seems a reasonable assumption,
> rendering Galton’s problem insurmountable."
>
> Admittedly, this conceivable problem had not occurred to typologists at
> the time of Greenberg and Dryer, and it became important in the discussion
> only after Maslova (2000) (it is also mentioned by Cysouw 2011 in LT, and
> by Jäger & Wahle 2021). This is why I wrote a blogpost a few years ago
> asking how realistic it is to suspect that current typological
> distributions might in part reflect Proto-World (
> https://dlc.hypotheses.org/2376). Now recently, Russell Gray told me that
> he never thought that Proto-World retention was an important problem for
> sampling.
>
> If so, then I'm even happier to recommend continuing with stratified
> sampling (among other methods, of course), as it is a much cheaper method –
> and still fully legitimate, despite the claims made by Levinson and
> colleagues. Greenberg's (1963) claims were based on his knowledge of
> hundreds of languages (and on earlier research by W. Schmidt), not only on
> his very small and very skewed 30-language sample, but the fact that most
> of his universals stood the test of time (despite his very imperfect
> methods) seems to show that what mattered primarily was the novel
> (universalistic and quantitative) perspective that he took, not so much the
> methods.
>
> Best wishes, and thanks again to all for the discussions,
>
> Martin
>
> P.S. The earlier LINGTYP thread can be seen here:
> https://listserv.linguistlist.org/pipermail/lingtyp/
>
> On 04.11.23 02:22, Simon Greenhill wrote:
>
> Colleagues, Martin, everyone else
>
> Thank you for sharing your perspectives on our 2011 paper. It's nice to see this still being discussed more than a decade later. However, I would like to express my concerns and disagreements with some of the points you've raised.
>
> I'm very proud of the Dunn et al. paper for a number of reasons. I'll name three.
>
> First, the paper showed that the overly simplistic correlation methods that had been used to make sweeping global claims were problematic. We need better tools to tackle these questions, and the tools we applied were one part of a better toolkit.
>
> Second, it highlighted the need to understand language systems in a diachronic manner. We cannot decouple language typology from language history; instead, we need to understand how these are entangled.
>
> Third, it emphasised the way that particular configurations of languages can be arrived at via different routes in different families at different times. This enables a much richer understanding of how these particular generalisations have arisen.
>
> Have Jäger and Wahle disproved any of that? No. Maybe these were not completely novel insights (Maslova’s work, which touches on a few of these issues too, has been mentioned, for example), but these ideas did appear to crystallise in this paper.
>
> While it's certainly important to revisit and reevaluate research findings to ensure accuracy, it is crucial to approach these discussions with an understanding of the scientific process. Scientific paradigms evolve over time, and different studies may yield varying results due to changes in methodologies, data sources, and sample sizes. This doesn't necessarily imply that the initial research was flawed or that the authors were neglectful. In particular, the tools, the data, and our understanding of how languages change are substantially further advanced than they were a decade ago (or, I know that *my* understanding of these things is more advanced now, at least). And these other papers that you mention -- and many other studies -- have built upon the work we did in 2011.
>
> Furthermore, I would like to caution against drawing overly broad conclusions about the quality of research published in high-prestige journals. The peer-review process in such journals is rigorous, and while they may occasionally feature sensationalist claims, this doesn't diminish the overall value they contribute to the scientific community. For the record, of the handful of papers I've had in these journals *all* have been reviewed by people I would infer to be linguists based on the comments and issues they raised. We did not send these papers to these journals to avoid linguistic reviewers but, frankly, I've had better reviews at these journals than at prominent linguistics journals (and by "better" I mean more rigorous, more thorough, and more critical).
>
> Finally, linguistic typology is an ongoing and evolving field trying to tackle very difficult problems. We need all the tools and approaches we can get to solve these problems across all the levels that languages operate on (from detailed language-internal analyses to high-level global analyses). Rather than looking back and gate-keeping what is ‘real’ typology published in ‘real’ linguistics journals, we should shift our focus forward. Typology can be a welcoming and diverse community that embraces a wide range of approaches, analyses, and styles. Let's look outward to foster connections with other fields and disciplines.
>
> After all, why shouldn't linguistic typology work be everywhere in science? It's certainly interesting enough.
>
> Simon
>
> Dr. Simon J. Greenhill
>
> Associate Professor
>
> Te Kura Mātauranga Koiora | School of Biological Sciences
> Te Whare Wānanga o Tāmaki Makaurau | University of Auckland
>
> Abteilung für Sprach- und Kulturevolution | Department of Linguistic and Cultural Evolution
> Max-Planck-Institut für Evolutionäre Anthropologie | Max Planck Institute for Evolutionary Anthropology
>
>
> --
> Martin Haspelmath
> Max Planck Institute for Evolutionary Anthropology
> Deutscher Platz 6
> D-04103 Leipzig
> https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/
>
>
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
>
>
> --
> Prof. Dr. Gerhard Jäger
> Universität Tübingen
> Seminar für Sprachwissenschaft
> Tel.: +49-7071-29-77302
> http://www.sfs.uni-tuebingen.de/~gjaeger/
>
>

-- 
Joan Bresnan
Stanford University
http://www.stanford.edu/~bresnan/