<div>Dear Martin, dear all,</div><div> </div><div>I am starting to wonder whether statistical analysis of a language sample is at all a suitable method for "detecting universal tendencies that are caused by universal/non-historical factors" (Martin's formulation). Given that there is no consensus as to how to overcome genealogical and areal biases, or even whether those biases must be overcome at all, or what trying to overcome them actually gets us (apart from getting some of us high-profile publications with ever more complicated mathematical apparatus which others among us struggle to understand and cannot evaluate; not being in any way a "mathematically-gifted person", to borrow Stela's expression, I belong to the latter group), the whole enterprise does not appear to be very productive. What if the more appropriate method, at least where purported functional factors are concerned, is the one employed by John Hawkins, Natalia Levshina and some others, i.e. to combine experimental research on production / processing with a quantitative study of variation in corpora across a small number of sufficiently distinct languages? If we can show that certain well-defined factors are operative in language processing and result in skewed distributions in corpora that ultimately translate into tendencies of diachronic change, and moreover are able to corroborate these results by similarly skewed distributions of variables in reasonably designed cross-linguistic samples, then what else do we need? In any case, as has been stated many times, even if we find that in a certain language sample, however well-designed, a certain variable shows a clearly skewed distribution of, say, 80% vs. 20%, nothing follows from this in terms of "universal preferences" unless we are able to independently show that the more frequent value is in some way or other "preferred" in processing / production etc. 
I am sorry if the above is self-evident or naive.</div><div> </div><div>Best regards,</div><div> </div><div>Peter<br /> </div><div> </div><div>----------------</div><div>To: lingtyp@listserv.linguistlist.org (lingtyp@listserv.linguistlist.org);</div><div>Subject: [Lingtyp] Reporting cross-linguistic frequencies;</div><div>21.11.2025, 10:19, "Martin Haspelmath via Lingtyp" <lingtyp@listserv.linguistlist.org>:</div><blockquote><p>Thanks, Jürgen! I like the "wave vs. particle" analogy, because these concrete expressions help us make sense of what seems to be going on (given the experimental results).</p><p>In worldwide comparative linguistics, we also want to make sense of what is going on, but it seems to me that we need analogies not only for interpreting results, but also for understanding what we are aiming for. For me, "removing areal and genealogical/phylogenetic bias" has the aim of detecting universal tendencies that are caused by universal/non-historical factors.</p><p>I would think that in the imagined concrete scenario of a sample of isolated isolates (e.g. 100 languages that have long existed on isolated islands, maybe of the Rapanui type), looking at these 100 isolates should give the same results as looking at 100 sample languages from larger families that have been shaped also by contact.</p><p>Are there reasons to doubt this? If not, then we can take the "isolated isolates" scenario simply as a way of illustrating our goals in concrete terms (somewhat like "wave" and "particle" serve as concrete illustrations). </p><p>But maybe the imagined scenario (which is not an "assumption"!!) is somehow problematic, because the goals of our enterprise are DIFFERENT. 
In Bickel's (2007) paper (LiTy 11), which has been widely cited, the idea seems to be that looking for "history-free" tendencies is somehow an obsolete goal.</p><p>Some people have suggested that in identifying universal trends, one MUST take into account genealogies, and isolates are problematic because they are not part of any genealogy. This is because we should not look primarily at languages, but at *transitions* (changes from one type to another). If I understood Verkerk et al. (2025) correctly, they solved the "isolates problem" by using an artificial world tree (where all languages are somehow included; the very beautiful tree is used in <a href="https://www.mpg.de/25723124/1114-evan-enduring-patterns-in-the-world-s-languages-150495-x" rel="noopener noreferrer">the press release</a>). Are Verkerk et al. pursuing a different goal? That is not really clear to me.</p><p>I find the notion of an artificial world tree profoundly strange, much stranger than the hypothetical scenario of 100 isolates on remote islands. But maybe it is needed, because the goal of the enterprise is somehow different (along Bickel's lines)? So I like the imagined "isolated isolates" scenario also because it clarifies what I'm interested in.</p><p>(And isn't Trudgill's idea that isolates are somehow "exotic" very speculative? Shcherbakova et al. 2023 have not provided strong evidence against the idea, but they simply did not find evidence in favour of it.)</p><p>One last point: Yes, all isolates are survivors from some larger family, but why is that relevant? Languages may have existed for half a million years or longer, and we know almost nothing about that deep past. Most of the currently existing families probably had more branches in earlier times, and the protolanguages we reconstruct may or may not have been isolates themselves. 
We cannot tell, but I don't see why we would need to know.</p><p>Best,</p><p>Martin</p><p> </p><div>On 21.11.25 07:07, Juergen Bohnemeyer via Lingtyp wrote:</div><blockquote>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
Dear all — Here’s a quick explanation of why the assumption of
an “isolated isolate” is profoundly strange: </div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif">
<span style="font-size:12pt">Leaving aside sign languages,
constructed languages, and artificial languages, nobody seems
to entertain the possibility that languages have emerged
spontaneously out of something that we wouldn’t consider a
language itself over the last few thousand years. In other
words, the languages we consider isolates are without
exception lone survivors; but they did descend from ancestors
which are often
</span><span style="background-color:rgb( 255 , 255 , 255 );font-size:16px">lost
and unknown</span><span style="font-size:12pt">, and these
ancestors biased the offshoot's properties by dint of
inheritance/transmission.</span></div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
The reason isolates are interesting from a sampling perspective
is that they may represent entire genera or families without
forcing us to pick a member. But being an isolate does not mean
being free of phylogenetic bias. On the contrary: isolates of
unknown descent are actually particularly problematic in the
sense that they are shaped by biases that we have no way of
identifying directly since the biasing ancestors have been lost
to time.</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif">
<span style="font-size:12pt">As to contact. Languages that are
not in contact with other languages over long stretches of
time are extremely rare and unusual. In fact, as I’m sure
everyone here is aware, such languages have been plausibly
argued to tend to evolve exotic properties as a result of
their isolation (</span><span style="background-color:rgb( 255 , 255 , 255 );font-size:16px">Lupyan
& Dale 2010;
</span><span style="font-size:12pt">Trudgill 2011), although
this is controversial (Shcherbakova et al. 2023). In any case,
I would certainly not want to make such languages the basis
for causal inference in typology.</span></div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
But it gets a lot worse. The “isolated isolate” interpretation
doesn’t just require us to think of a language that isn’t
currently in contact with any other language. We would have to
assume a language that has
<b>never</b> come into contact with any other language at any
point in its history (at least not long/intensively enough to
change as a result of it). I’m seriously uncertain whether such
a language has ever existed on this planet. </div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
Here’s an analogy from quantum mechanics: Schrödinger’s and
Heisenberg’s equations are mathematical models that describe the
experimentally observed behavior of elementary particles under
various conditions. The particle and the wave interpretation are
interpretations that we use to make sense of these mathematical
models. We find these interpretations useful because most of us don’t
think in mathematical equations (not even theoretical
physicists, it would seem). But if we push these interpretations
beyond a certain point, they break down. To begin with, we can’t
think of something simultaneously as a wave and as a particle. </div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
In the same way, we can mathematically describe the influence
phylogeny and areality exert on the probability of a particular
language having certain properties. The “isolated isolate”
interpretation is just that - an interpretation of the
statistical models; but, as I tried to show above, it runs into
absurdities rather more quickly than the particle and wave
interpretations in quantum mechanics. </div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
Best — Juergen</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
G. Lupyan, R. Dale, Language structure is partly determined by
social structure. <i>PLOS ONE</i> 5, e8559 (2010).</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
O. Shcherbakova, S. M. Michaelis, H. J. Haynie, et al. Societies
of strangers do not speak less complex languages.
<i>Science Advances</i> 9, eadf7704 (2023).</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
P. Trudgill, <i>Sociolinguistic Typology: Social Determinants
of Linguistic Complexity
</i>(Oxford Univ. Press, 2011).</div>
<div dir="ltr" style="color:rgb( 0 , 0 , 0 );font-family:'aptos' , 'arial' , 'helvetica' , sans-serif;font-size:12pt">
<br />
</div>
<div id="a072bc9780e9b7b3b9a49a25572a5efdmail-editor-reference-message-container">
<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">
</div>
<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="border-color:rgb( 181 , 196 , 223 ) currentcolor currentcolor currentcolor;border-style:solid none none none;border-width:1pt medium medium medium;color:black;font-family:'aptos';font-size:12pt;padding:3pt 0in 0in 0in;text-align:left">
<b>From: </b>Lingtyp
<a class="1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" href="mailto:lingtyp-bounces@listserv.linguistlist.org"><lingtyp-bounces@listserv.linguistlist.org></a> on behalf of
Matías Guzmán Naranjo via Lingtyp
<a class="1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" href="mailto:lingtyp@listserv.linguistlist.org"><lingtyp@listserv.linguistlist.org></a><br />
<b>Date: </b>Thursday, November 20, 2025 at 04:01<br />
<b>To: </b><a class="324de92b3f6b2f5e993df2fdf11fa1c7moz-txt-link-abbreviated" href="mailto:lingtyp@listserv.linguistlist.org">lingtyp@listserv.linguistlist.org</a>
<a class="1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" href="mailto:lingtyp@listserv.linguistlist.org"><lingtyp@listserv.linguistlist.org></a><br />
<b>Subject: </b>Re: [Lingtyp] Reporting cross-linguistic
frequencies<br />
<br />
</div>
<div class="662abf391f8b56b11ba8165b7bd48bb4PlainText" style="font-size:11pt">I'll jump in
with some thoughts.<br />
<br />
<br />
- Dryer's method and ours aim at doing basically the same
thing:<br />
quantifying what's "left" after removing genetic and areal
bias.<br />
<br />
- Whether you should call them proportions or adjusted
frequencies...<br />
I'm not sure that it matters that much? As long as you
understand how<br />
they were calculated...<br />
<br />
- How you want to interpret this "what's left" is debatable, I
guess,<br />
but I don't think I agree with Jürgen. As far as I can tell it
should be<br />
compatible with something along the lines of an "isolated
isolate" as<br />
described by Martin. You can also see them as 'universal'
preferences<br />
(more or less the same thing?).<br />
<br />
- "the probability of a random language having a certain
property<br />
depends on (or is influenced by, or varies with, etc.) it
being related<br />
to certain other languages, or being spoken (or signed) in a
particular<br />
area" -> In our approach we assume that the probability of
a language L<br />
having some feature value F depends on three things: 1) its
relatedness<br />
to other languages, 2) its contact with other languages, 3) some
universal<br />
preference for F. Kind of the point of what we do is that we
try to<br />
estimate each of these factors. [We can add more factors and
more<br />
structure, but that's the most basic model]<br />
<br />
- You can quantify the contribution of the phylogenetic
component and<br />
the areal component(s) with our techniques, but this is a bit
tricky<br />
because there is unavoidable overlap in the information each
one<br />
contains. These measures also have a different meaning than
the adjusted<br />
frequency and can't be used as a replacement for it; you can
use them<br />
in addition to it.<br />
<br />
<br />
Matías<br />
<br />
<br />
<br />
El 20/11/25 a las 9:36, Omri Amiraz via Lingtyp escribió:<br />
> Dear all,<br />
> I agree with Ian that, in addition to genealogical and
areal biases,<br />
> the very question of what counts as a language versus a
dialect is<br />
> partly subjective. This makes actual frequencies even
more<br />
> problematic, since we would obtain different results
depending on<br />
> whether we treat Wu Chinese as one language or as thirty
separate<br />
> languages, as Ian pointed out.<br />
> Juergen wrote: "We can empirically assess the extent to
which the<br />
> probability of a random language having a certain
property depends on<br />
> (or is influenced by, or varies with, etc.) it being
related to<br />
> certain other languages, or being spoken (or signed) in
a particular<br />
> area."<br />
><br />
> I wonder whether it might be useful to have a measure of
the<br />
> genealogical and areal spread of a feature, essentially
quantifying<br />
> how broadly it is distributed across families and regions
in the<br />
> present-day world. Such a measure might be more
straightforward to<br />
> interpret than an adjusted frequency/probability, since
it is not<br />
> clear whether the described population is a hypothetical
set of<br />
> isolated isolates or something else.<br />
><br />
> Is anyone aware of an existing metric that captures
genealogical or<br />
> areal spread in this way?<br />
><br />
> Best,<br />
> Omri<br />
><br />
> _______________________________________________<br />
> Lingtyp mailing list<br />
> <a class="324de92b3f6b2f5e993df2fdf11fa1c7moz-txt-link-abbreviated" href="mailto:Lingtyp@listserv.linguistlist.org">Lingtyp@listserv.linguistlist.org</a><br />
> <a href="https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp">https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp</a><br />
</div>
</div>
<br />
</blockquote><pre>--
Martin Haspelmath
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
D-04103 Leipzig
<a href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/" rel="noopener noreferrer">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a></pre></blockquote><div> </div><div> </div><div>-- </div><div>Peter Arkadiev, PhD Habil.</div><div>https://peterarkadiev.github.io/</div><div> </div>