[Lingtyp] Reporting cross-linguistic frequencies
Martin Haspelmath
martin_haspelmath at eva.mpg.de
Fri Nov 21 06:59:39 UTC 2025
Thanks, Jürgen! I like the "wave vs. particle" analogy, because these
concrete expressions help us make sense of what seems to be going on
(given the experimental results).
In worldwide comparative linguistics, we also want to make sense of what
is going on, but it seems to me that we need analogies not only for
interpreting results, but also for understanding what we are aiming for.
For me, "removing areal and genealogical/phylogenetic bias" has the aim
of detecting universal tendencies that are caused by
universal/non-historical factors.
I would think that in the imagined concrete scenario of a sample of
isolated isolates (e.g. 100 languages that have long existed on isolated
islands, maybe of the Rapanui type), looking at these 100 isolates
should give the same results as looking at 100 sample languages from
larger families that have also been shaped by contact.
Are there reasons to doubt this? If not, then we can take the "isolated
isolates" scenario simply as a way of illustrating our goals in concrete
terms (somewhat like "wave" and "particle" serve as concrete
illustrations).
But maybe the imagined scenario (which is not an "assumption"!!) is
somehow problematic, because the goals of our enterprise are DIFFERENT.
In Bickel's (2007) paper (LiTy 11), which has been widely cited, the
idea seems to be that looking for "history-free" tendencies is somehow
an obsolete goal.
Some people have suggested that in identifying universal trends, one
MUST take into account genealogies, and isolates are problematic because
they are not part of any genealogy. This is because we should not look
primarily at languages, but at *transitions* (changes from one type to
another). If I understood Verkerk et al. (2025) correctly, they solved
the "isolates problem" by using an artificial world tree (where all
languages are somehow included; the very beautiful tree is used in the
press release
<https://www.mpg.de/25723124/1114-evan-enduring-patterns-in-the-world-s-languages-150495-x>).
Are Verkerk et al. pursuing a different goal? That is not really clear
to me.
I find the notion of an artificial world tree profoundly strange, much
stranger than the hypothetical scenario of 100 isolates on remote
islands. But maybe it is needed, because the goal of the enterprise is
somehow different (along Bickel's lines)? So I like the imagined
"isolated isolates" scenario also because it clarifies what I'm
interested in.
(And isn't Trudgill's idea that isolates are somehow "exotic" very
speculative? Shcherbakova et al. 2023 have not provided strong evidence
against the idea; they simply did not find evidence in favour of it.)
One last point: Yes, all isolates are survivors from some larger family,
but why is that relevant? Languages may have existed for half a million
years or longer, and we know almost nothing about that deep past. Most
of the currently existing families probably had more branches in earlier
times, and the protolanguages we reconstruct may or may not have been
isolates themselves. We cannot tell, but I don't see why we would need
to know.
Best,
Martin
On 21.11.25 07:07, Juergen Bohnemeyer via Lingtyp wrote:
> Dear all — Here’s a quick explanation of why the assumption of an
> “isolated isolate” is profoundly strange:
>
> Leaving aside sign languages, constructed languages, and artificial
> languages, nobody seems to entertain the possibility that languages
> have emerged spontaneously out of something that we wouldn’t consider
> a language itself over the last few thousands of years. In other
> words, the languages we consider isolates are without exception lone
> survivors; but they did descend from ancestors which are often lost
> and unknown, and these ancestors biased the offshoot's properties by
> dint of inheritance/transmission.
>
> The reason isolates are interesting from a sampling perspective is
> that they may represent entire genera or families without forcing us
> to pick a member. But being an isolate does not mean being free of
> phylogenetic bias. On the contrary: isolates of unknown descent are
> actually particularly problematic in the sense that they are shaped by
> biases that we have no way of identifying directly since the biasing
> ancestors have been lost to time.
>
> As to contact. Languages that are not in contact with other languages
> over long stretches of time are extremely rare and unusual. In fact,
> as I’m sure everyone here is aware, such languages have been plausibly
> argued to tend to evolve exotic properties as a result of their
> isolation (Lupyan & Dale 2010; Trudgill 2011), although this is
> controversial (Shcherbakova et al. 2023). In any case, I would
> certainly not want to make such languages the basis for causal
> inference in typology.
>
> But it gets a lot worse. The “isolated isolate” interpretation doesn’t
> just require us to think of a language that isn’t currently in contact
> with any other language. We would have to assume a language that has
> *never* come into contact with any other language at any point in its
> history (at least not long/intensively enough to change as a result of
> it). I’m seriously uncertain whether such a language has ever existed
> on this planet.
>
> Here’s an analogy from quantum mechanics: Schrödinger’s and
> Heisenberg’s equations are mathematical models that describe the
> experimentally observed behavior of elementary particles under various
> conditions. The particle and the wave interpretation are
> interpretations that we use to make sense of these mathematical
> models. We find these models useful because most of us don’t think in
> mathematical equations (not even theoretical physicists, it would
> seem). But if we push these interpretations beyond a certain point,
> they break down. To begin with, we can’t think of something
> simultaneously as a wave and as a particle.
>
> In the same way, we can mathematically describe the influence
> phylogeny and areality exert on the probability of a particular
> language having certain properties. The “isolated isolate”
> interpretation is just that - an interpretation of the statistical
> models; but, as I tried to show above, it runs into absurdities rather
> more quickly than the particle and wave interpretations in quantum
> mechanics.
>
> Best — Juergen
>
> G. Lupyan, R. Dale, Language structure is partly determined by social
> structure. /PLOS ONE/ 5, e8559 (2010).
>
> O. Shcherbakova, S. M. Michaelis, H. J. Haynie, et al., Societies of
> strangers do not speak less complex languages. /Science Advances/ 9,
> eadf7704 (2023).
>
> P. Trudgill, /Sociolinguistic Typology: Social Determinants of
> Linguistic Complexity/ (Oxford Univ. Press, 2011).
>
> Juergen Bohnemeyer (He/Him)
> Professor, Department of Linguistics
> University at Buffalo
>
> --
>
> *From: *Lingtyp <lingtyp-bounces at listserv.linguistlist.org> on behalf
> of Matías Guzmán Naranjo via Lingtyp <lingtyp at listserv.linguistlist.org>
> *Date: *Thursday, November 20, 2025 at 04:01
> *To: *lingtyp at listserv.linguistlist.org
> <lingtyp at listserv.linguistlist.org>
> *Subject: *Re: [Lingtyp] Reporting cross-linguistic frequencies
>
> I'll jump in with some thoughts.
>
>
> - Dryer's method and ours aim at doing basically the same thing:
> quantifying what's "left" after removing genetic and areal bias.
>
> - Whether you should call them proportions or adjusted frequencies...
> I'm not sure that it matters that much? As long as you understand how
> they were calculated...
>
> - How you want to interpret this "what's left" is debatable, I guess,
> but I don't think I agree with Jürgen. As far as I can tell it should be
> compatible with something along the lines of an "isolated isolate" as
> described by Martin. You can also see them as 'universal' preferences
> (more or less the same thing?).
>
> - "the probability of a random language having a certain property
> depends on (or is influenced by, or varies with, etc.) it being related
> to certain other languages, or being spoken (or signed) in a particular
> area" -> In our approach we assume that the probability of a language L
> having some feature value F depends on three things: 1) its relatedness
> to other languages, 2) its contact to other languages, 3) some universal
> preference for F. Kind of the point of what we do is that we try to
> estimate each of these factors. [We can add more factors and more
> structure, but that's the most basic model]
>
> - You can quantify the contribution of the phylogenetic component and
> the areal component(s) with our techniques, but this is a bit tricky
> because there is unavoidable overlap in the information each one
> contains. These measures also have a different meaning than the adjusted
> frequencies and can't replace them; they can only be used in addition.
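
The basic three-part model in the list above can be illustrated with a
minimal sketch. It assumes, purely for illustration, that the three
factors add up on the log-odds scale. The function names, the family
and area labels and all numbers are invented here and are not taken
from the work of anyone in this thread. Estimating the effects from
data is not shown.

```python
import math


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def feature_probability(universal: float, family: float, area: float) -> float:
    """P(language L has feature value F) under an additive log-odds model.

    universal: log-odds of the universal preference for F
    family:    log-odds shift due to relatedness (assumed additive)
    area:      log-odds shift due to contact (assumed additive)
    """
    return inv_logit(universal + family + area)


# Hypothetical effect sizes, made up for this illustration only.
universal_pref = 0.4
family_effects = {"FamA": 1.5, "FamB": -0.8}
area_effects = {"Area1": 0.6, "Area2": -0.3}

# A language from FamA spoken in Area1: all three components contribute.
p_observed = feature_probability(
    universal_pref, family_effects["FamA"], area_effects["Area1"]
)

# The same model with the family and area terms set to zero.
p_adjusted = feature_probability(universal_pref, 0.0, 0.0)

print(f"with family and area effects: {p_observed:.3f}")
print(f"universal preference only:    {p_adjusted:.3f}")
```

On one possible reading, the "adjusted frequency" corresponds to the
second value, i.e. the probability once the family and area terms are
set to zero, which is one way of spelling out the "isolated isolate"
scenario in concrete terms.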
>
>
> Matías
>
>
>
> On 20/11/25 at 9:36, Omri Amiraz via Lingtyp wrote:
> > Dear all,
> > I agree with Ian that, in addition to genealogical and areal biases,
> > the very question of what counts as a language versus a dialect is
> > partly subjective. This makes actual frequencies even more
> > problematic, since we would obtain different results depending on
> > whether we treat Wu Chinese as one language or as thirty separate
> > languages, as Ian pointed out.
> > Juergen wrote: "We can empirically assess the extent to which the
> > probability of a random language having a certain property depends on
> > (or is influenced by, or varies with, etc.) it being related to
> > certain other languages, or being spoken (or signed) in a particular
> > area."
> >
> > I wonder whether it might be useful to have a measure of the
> > genealogical and areal spread of a feature, essentially quantifying
> > how broadly it is distributed across families and regions in the
> > present-day world. Such a measure might be more straightforward to
> > interpret than an adjusted frequency/probability, since it is not
> > clear whether the described population is a hypothetical set of
> > isolated isolates or something else.
> >
> > Is anyone aware of an existing metric that captures genealogical or
> > areal spread in this way?
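
To make the question concrete, here is a deliberately naive sketch of
what such a spread measure could look like: the share of families (or
areas) in which the feature is attested at least once. This is a
hypothetical illustration, not an existing metric; the function name,
the record layout and the toy data are all invented.

```python
def spread(records, group_index):
    """Share of groups with at least one language that has the feature.

    records:     tuples of (family, area, has_feature)
    group_index: 0 to group by family, 1 to group by area
    """
    attested = {}
    for record in records:
        group = record[group_index]
        # A group counts as attested once any of its languages has the feature.
        attested[group] = attested.get(group, False) or record[2]
    return sum(attested.values()) / len(attested)


# Toy sample (invented): one large family has the feature throughout.
sample = [
    ("FamA", "Area1", True),
    ("FamA", "Area1", True),
    ("FamA", "Area2", True),
    ("FamB", "Area2", False),
    ("FamC", "Area3", True),
    ("FamD", "Area3", False),
]

raw_frequency = sum(r[2] for r in sample) / len(sample)
family_spread = spread(sample, 0)
area_spread = spread(sample, 1)

print(f"raw frequency: {raw_frequency:.2f}")
print(f"family spread: {family_spread:.2f}")
print(f"area spread:   {area_spread:.2f}")
```

Because each family is counted once, a measure of this kind does not
change when one language is split into several within the same family
(e.g. Wu Chinese as one language or as thirty), although it still
depends on how families and areas are delimited.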
> >
> > Best,
> > Omri
> >
> > _______________________________________________
> > Lingtyp mailing list
> > Lingtyp at listserv.linguistlist.org
> > https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
--
Martin Haspelmath
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
D-04103 Leipzig
https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/