<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">Dear Juergen,<div>You are explaining to us how you do semantics on one language. Of course, linguists working on a language use primary data. I thought we were discussing typology. If you want to conduct a typological study, you then need to enter your results for each language in a database and work with this secondary data.</div><div>Best</div><div>Sy<br id="lineBreakAtBeginningOfMessage"><div><br><blockquote type="cite"><div>On 25 Nov 2025, at 17:47, Juergen Bohnemeyer <jb77@buffalo.edu> wrote:</div><br class="Apple-interchange-newline"><div>


<div>

<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">

Dear Sy, good grief, NO! I do not perform semantic analyses via translation. I would immediately flunk a student who submitted an assignment based on translation in my semantics courses. For crying out loud. </div>

<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">

Sorry. Few things get my blood pressure going like ignorance about how semantic research works.  How do you think a baby learns the meanings of the expressions of the languages she grows up with? Of course by observing the (extralinguistic and linguistic) contexts

 in which the expressions are used, formulating hypotheses on the basis of these observations, and then testing these hypotheses by producing new utterances. And that’s pretty much how semantic research works, with a few necessary refinements here and there.</div>

<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">

And, I don’t understand why we are still talking about whether corpus data is primary data. Maybe you missed it, but I conceded about two responses back that no, strictly speaking, corpus data isn’t primary data. But it can be

<b>used</b> by linguists like primary data, in the sense that it represents individual utterances, actions performed by speakers in time and space.</div>

<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">

Best — Juergen</div>

<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div id="ms-outlook-mobile-signature"><div style="margin: 0in; font-family: Calibri, sans-serif; font-size: 11pt;"><span style="font-family: Helvetica; font-size: 9pt;">Juergen Bohnemeyer (He/Him)<br>

Professor, Department of Linguistics<br>

University at Buffalo <br>

<br>

Office: 642 Baldy Hall, UB North Campus<br>

Mailing address: 609 Baldy Hall, Buffalo, NY 14260 <br>

Phone: (716) 645 0127 <br>

Fax: (716) 645 3825<br>

Email: </span><span style="font-family: Helvetica; font-size: 9pt; color: rgb(0, 120, 212);"><u><a href="mailto:jb77@buffalo.edu" title="mailto:jb77@buffalo.edu" data-outlook-id="4aaf664e-ecb1-4660-9e36-c56f9a3f0cc0" style="color: rgb(0, 120, 212); margin-top: 0px; margin-bottom: 0px;">jb77@buffalo.edu</a></u></span><span style="font-family: Helvetica; font-size: 9pt;"><br>

Web: </span><span style="font-family: Helvetica; font-size: 9pt; color: rgb(5, 99, 193);"><u><a href="http://www.acsu.buffalo.edu/~jb77/" title="http://www.acsu.buffalo.edu/~jb77/" data-outlook-id="66341f52-5157-4355-83ec-68a1119580d4" style="color: rgb(5, 99, 193); margin-top: 0px; margin-bottom: 0px;">http://www.acsu.buffalo.edu/~jb77/</a></u></span><span style="font-family: Helvetica; font-size: 9pt;"> <br>

<br>

</span><span style="">Office hours Tu/Th 3:30-4:30pm in 642 Baldy or via Zoom (Meeting ID 585 520 2411; Passcode Hoorheh) </span><span style="font-family: Helvetica; font-size: 9pt;"><br>

<br>

There’s A Crack In Everything - That’s How The Light Gets In <br>

(Leonard Cohen)  </span></div><div style="margin: 0in; font-family: Calibri, sans-serif; font-size: 11pt;">-- </div><p style="margin: 0in; font-family: Calibri, sans-serif; font-size: 11pt;"> </p>

</div>

<div id="mail-editor-reference-message-container">

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"></div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentcolor currentcolor; font-family: Aptos; font-size: 12pt;">

<b>From: </b>Sylvain Kahane <sylvain@kahane.fr><br>

<b>Date: </b>Tuesday, November 25, 2025 at 11:14<br>

<b>To: </b>Juergen Bohnemeyer <jb77@buffalo.edu><br>

<b>Cc: </b>Martin Haspelmath <martin_haspelmath@eva.mpg.de>, Linguistic Typology <lingtyp@listserv.linguistlist.org><br>

<b>Subject: </b>Re: [Lingtyp] Reporting cross-linguistic frequencies<br>

<br>

</div>

<div class="ms-outlook-mobile-reference-message skipProofing">Dear Juergen,</div>

<div class="ms-outlook-mobile-reference-message skipProofing">I don’t understand what you call primary data. Is it something other than the raw production of native speakers? To do semantics, you need someone to translate the data. If the data come with translations

 or IGTs, they are secondary data, no?</div>

<div class="ms-outlook-mobile-reference-message skipProofing">And to do phonetics, do you just use recordings? Don’t you need some secondary data aligned with the sound, such as a transcription or a translation?</div>

<div class="ms-outlook-mobile-reference-message skipProofing">Maybe some very gifted people can work directly with primary data from one or maybe two dozen languages, but that is not what typologists generally do.</div>

<div class="ms-outlook-mobile-reference-message skipProofing">Best</div>

<div class="ms-outlook-mobile-reference-message skipProofing">Sy</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><br>

</div>

<blockquote>

<div class="ms-outlook-mobile-reference-message skipProofing">On 25 Nov 2025, at 15:20, Juergen Bohnemeyer <jb77@buffalo.edu> wrote:</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><br>

</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><span style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">Dear Sy, what you mean when you say “</span><span style="background-color: rgb(255, 255, 255);">but no

 one in typology works with primary data” is that nobody in <b>syntactic</b> typology works with primary data. As I said before (and Bill misunderstood), most of semantic typology and a little bit of phonetic typology is in fact based on primary data (and

 I emphatically do <b>not</b> mean corpus data now). </span></div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><span style="background-color: rgb(255, 255, 255);"><br>

</span></div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><span style="font-size: 16px; background-color: rgb(255, 255, 255);">And a quick response to Martin’s “there is a trade-off that will not go away</span><span style="background-color: rgb(255, 255, 255);">”</span><span style="font-size: 16px; background-color: rgb(255, 255, 255);">:

 what I’m predicting is that this trade-off <b>will</b> in fact one day go away, although we’re still a ways away from that day. </span></div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-size: 16px;">

<span style="background-color: rgb(255, 255, 255);"><br>

</span></div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 16px;">

<span style="background-color: rgb(255, 255, 255);">Imagine a corpus large enough that all the information about the particular language that is currently available in the large typological databases could be inferred from it. And much more, and in a form that

 represents actual utterances produced by speakers with all their glorious variation rather than reducing the language to a single observation. Then multiply that corpus by a few thousand languages. </span></div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 16px;">

<span style="background-color: rgb(255, 255, 255);"><br>

</span></div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><span style="background-color: rgb(255, 255, 255);">Does that sound utopian? I believe we can get there in a decade easily, but my assumption is that it’ll actually take more like two.</span></div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><span style="background-color: rgb(255, 255, 255);"><br>

</span></div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><span style="font-size: 16px; background-color: rgb(255, 255, 255);">Best

</span><span style="background-color: rgb(255, 255, 255);">—</span><span style="font-size: 16px; background-color: rgb(255, 255, 255);"> Juergen</span></div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">

<br>

</div>


<div id="mail-editor-reference-message-container">

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentcolor currentcolor; font-family: Aptos; font-size: 12pt;">

<b>From: </b>Lingtyp <lingtyp-bounces@listserv.linguistlist.org> on behalf of Sylvain Kahane via Lingtyp <lingtyp@listserv.linguistlist.org><br>

<b>Date: </b>Tuesday, November 25, 2025 at 04:20<br>

<b>To: </b>Martin Haspelmath <martin_haspelmath@eva.mpg.de><br>

<b>Cc: </b>Linguistic Typology <lingtyp@listserv.linguistlist.org><br>

<b>Subject: </b>Re: [Lingtyp] Reporting cross-linguistic frequencies<br>

<br>

</div>

<div class="ms-outlook-mobile-reference-message skipProofing">Once again, I’m sorry, but no one in typology works with primary data. IGTs and UD treebanks are not primary data. If I give you primary data from a language you don’t know, you cannot do much with

 it. You need a specialist to interpret it for you. Developing a UD treebank requires a lot of work and you have to make many choices. You are constantly interpreting your primary data. (I know this because I co-developed treebanks for several languages and

 I have used most of the UD treebanks.) Of course, these annotated corpora are very valuable resources, especially because they are linked to primary data, but they are not primary data. And all the typological studies I know of that use the UD collection rely only

 on annotations, i.e., only on secondary data.</div>

<div class="ms-outlook-mobile-reference-message skipProofing">Similarly, a descriptive grammar can be linked to primary data. I am not sure that a treebank without metadata and without clear guidelines explaining how choices were made is a more reliable resource.

 Again, all typologists work on secondary data. We just need reliable, good-quality secondary data. And corpus-based data, i.e. secondary data linked to primary data, are more reliable, because they are falsifiable. And they allow you to do token-based typology

 with gradual statements.</div>

<div class="ms-outlook-mobile-reference-message skipProofing">Sy</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><br>

</div>

<blockquote>

<div class="ms-outlook-mobile-reference-message skipProofing">On 25 Nov 2025, at 07:17, Martin Haspelmath <martin_haspelmath@eva.mpg.de> wrote:</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><br>

</div><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Getting back to the issue of "cross-linguistic frequencies": </p><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Even though I don't engage in high-level statistics myself, I don't see how we could distinguish between (i) chance, (ii) inheritance, (iii) contact influence and (iv) the universal/non-historical residue if we didn't use statistics. </p><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Peter said:</p>

<div class="moz-cite-prefix" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

On 24.11.25 14:55, Peter Arkadiev wrote:</div>

<blockquote>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

when I wrote "the whole enterprise does not appear to be very productive" I rather meant the enterprise of trying to discover universal factors by means of a statistical analysis of language samples. </div>

</blockquote><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Maybe we can say that there have been no statistical breakthroughs over the last two decades, but "the whole enterprise" began in 1975 with Sherman's paper on language sampling, and it seems to me that since then, awareness of the problems in identifying universals

 quantitatively has gradually increased, and has been crucial in our understanding of the relationship between universal, areal and genealogical factors. Maybe what Peter meant was that the solution will not come from "statistics", but from better sampling,

 and I sympathize with this: </p>

<blockquote>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

I fully appreciate the efforts aimed at improving methods of both constructing samples and analysing them, since these methods allow us to test other types of hypotheses and generalisations.</div>

</blockquote><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

In any event, the issue of "primary" vs. "secondary" data (discussed by Bill Croft and Jürgen Bohnemeyer) is orthogonal to this, though truly worldwide data from a substantial number of languages is hardly available outside of secondary sources. If we want

 more fine-grained data (as in my 1997 book on "Indefinite pronouns", where I had to collect some "primary data"), we usually have to limit ourselves to fairly few languages (my sample of 40 languages was small and very skewed). Thus, there is a trade-off that

 will not go away – but all the approaches that were mentioned have been "productive", I feel.</p><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Martin</p><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

<span style="font-size: 13px;">Sherman, D. (1975). Stop and fricative systems: A discussion of paradigmatic gaps and the question of language sampling. In Working Papers on Language Universals 17, 1–31. Stanford University.</span></p><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

<br>

</p>

<blockquote>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

 </div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

----------------</div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

To: Martin Haspelmath (<a href="mailto:martin_haspelmath@eva.mpg.de" class="moz-txt-link-abbreviated" data-outlook-id="7d1e771b-e693-4a7a-9087-fe18c0a515ff">martin_haspelmath@eva.mpg.de</a>), Peter Arkadiev (<a href="mailto:peterarkadiev@yandex.ru" class="moz-txt-link-abbreviated" data-outlook-id="afb01705-6eb9-4725-876f-6ded1ec6e86d">peterarkadiev@yandex.ru</a>);</div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Cc: Linguistic Typology (<a href="mailto:lingtyp@listserv.linguistlist.org" class="moz-txt-link-abbreviated" data-outlook-id="c59a125a-4f0d-4a12-b3d2-5b58fa83119b">lingtyp@listserv.linguistlist.org</a>);</div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Subject: [Lingtyp] Reporting cross-linguistic frequencies;</div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

24.11.2025, 15:22, "Sylvain Kahane" <a href="mailto:sylvain@kahane.fr" class="moz-txt-link-rfc2396E" data-outlook-id="e374f0c8-ffbf-4452-8673-f724066e1158">

<sylvain@kahane.fr></a>:</div>

<blockquote>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Dear Peter and Martin,</div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

 </div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Just a quick note about primary and secondary data. By typology based on primary data, I assume you are referring to typology based on tokens (or what we called typometrics in one of our articles). Basing our assertions on corpora has some advantages: we can

 have quantitative statements using the frequency of our observations, and our results can also be more easily verified and possibly refuted if the data we are working with is freely available (such as the UD collection of syntactic databases). But I wouldn't

 say that we are working on primary data, because this data must be transcribed and annotated in order to be used. Even if you use an LLM on raw data, your LLM has been trained on secondary data. If you examine tags such as nsubj or ADJ in a UD database, you

 need to be very careful, because even if the annotators followed the universal annotation scheme, there are different possible interpretations of these concepts, especially in ergative or functionally inconsistent languages, or in languages whose lexeme categorization

 differs from that of Indo-European languages.</div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

 </div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Best</div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

 </div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Sylvain</div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

 </div>

<blockquote>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

On 24 Nov 2025, at 08:56, Martin Haspelmath via Lingtyp <<a href="mailto:lingtyp@listserv.linguistlist.org" class="moz-txt-link-freetext" rel="noopener noreferrer" data-outlook-id="59ee063c-3b20-4a73-baba-4b235980d099">lingtyp@listserv.linguistlist.org</a>>

 wrote:</div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; font-family: Helvetica; font-size: 12px;">

 </div><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px;">

<span style="font-family: Helvetica; font-size: 12px;">I agree with Peter that the corpus-based methods employed by Hawkins, Wälchli, Cysouw, Levshina and others have been very important, and also with Jürgen that "when confronting the causal inference problem

 in typology, we must consider every source of evidence that we can get our hands on."</span></p><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px;">

<span style="font-family: Helvetica; font-size: 12px;">But I don't agree with Peter that "the whole enterprise [of overcoming genealogical and areal biases] does not appear to be very productive", and I don't agree with Jürgen that we "must eventually move

 from secondary data typology to primary data typology".</span></p><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px;">

<span style="font-family: Helvetica; font-size: 12px;">I think that the enterprise of controlling for family and contact effects is absolutely necessary, because otherwise we cannot distinguish outcomes of universal/non-historical factors from outcomes of historical

 events. Peter recognizes this implicitly when he says that we should "combine experimental research ... with a quantitative study of variation in corpora across a small number of sufficiently distinct languages". That's precisely the point: Which languages

 are "sufficiently distinct"? And hasn't the search for empirical universals been *highly productive* over the last few decades? The recent paper by Verkerk et al. (2025) has found good evidence for most of the empirical universals that had been seriously discussed

 earlier, so the Greenbergian universals seem to be very robust findings compared to many other prestigious claims in linguistics.</span></p><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px;">

<span style="font-family: Helvetica; font-size: 12px;">And I think that there is no reason to abandon secondary-data typology just because we can also (increasingly) do primary-data typology. Typological comparison can be done at multiple scales and multiple

 levels of granularity, and it is not clear that we can dispense with any of these levels. For example, we want to do typology of phonological segments (along the lines of the Phoible.org database), or typology of word meanings (lexification typology, cf.

<a href="https://clics.clld.org/" class="moz-txt-link-freetext" rel="noopener noreferrer" originalsrc="https://clics.clld.org/" data-outlook-id="5382d766-0bca-4c45-a3c6-aa89434eb2a4" style="margin-top: 0px; margin-bottom: 0px;">

https://clics.clld.org/</a>), and for these, it seems that secondary data will not be easily replaced.</span></p><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px;">

<span style="font-family: Helvetica; font-size: 12px;">Best,</span></p><p class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px;">

<span style="font-family: Helvetica; font-size: 12px;">Martin</span></p>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

 </div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

On 21.11.25 16:04, Juergen Bohnemeyer wrote:</div>

<blockquote>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

Dear Peter — I’m a massive fan of corpus-based typology. More broadly, there is no question in my mind that we should, and must, eventually move from secondary data typology to primary data typology. Nobody seems to deny that secondary data typology is fraught

 with too many problematic idealizations: in particular, it reduces entire languages to single observations, and it suffers from incomparable decisions on what is treated as a language in different parts of the world. </div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

(The second problem is closely related to, but not entirely identical with, the countability problem Ian Joo mentions. The fact that

<i>language</i> is a count noun is a powerful illustration of how ordinary language can frame reality in ways that may impede scientific progress if it goes unchecked, as Whorf pointed out. However, actually counting languages is not the issue for regression-based

 modeling, since regression models don’t operate on counts. But the question whether what is treated as an observation (i.e., a language) is uniform across the sample is of course very much a concern for the validity of sampling-based and regression-based modeling

 alike.)</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

There is a broader answer to your question, though: as a matter of course, when confronting the causal inference problem in typology (i.e., when hunting for the causal forces that shape languages), we must consider every source of evidence that we can get our

 hands on.  Aside from corpus-based typology, this includes field-based psycholinguistics and the toolkit of evolutionary linguistics, including simulations and miniature artificial language experiments. </div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

Let me also suggest a distinction between methods that are primarily geared toward the discovery of typological distributions and the examination of their statistical properties, and methods that can be used to test hypotheses of causal inference (i.e., explanatory

 hypotheses). Experimental research such as what I just mentioned is useful primarily for testing explanatory hypotheses. Corpus-based research can have both functions. But if we want to use corpora to discover typological distributions, we’ll need very

 large parallax corpus databases, of the kind now being developed. </div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

Best — Juergen</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>


<div id="a072bc9780e9b7b3b9a49a25572a5efdmail-editor-reference-message-container">

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="text-align: left; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentcolor currentcolor; font-family: aptos; font-size: 12pt;">

<b>From: </b>Lingtyp <a href="mailto:lingtyp-bounces@listserv.linguistlist.org" class="1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" data-outlook-id="98aa2383-6a42-4d19-9c6d-47fbd5987b8b">

<lingtyp-bounces@listserv.linguistlist.org></a> on behalf of Peter Arkadiev via Lingtyp

<a href="mailto:lingtyp@listserv.linguistlist.org" class="1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" data-outlook-id="4162e167-c2c0-4f16-816f-3b0b9433cf99">

<lingtyp@listserv.linguistlist.org></a><br>

<b>Date: </b>Friday, November 21, 2025 at 05:59<br>

<b>To: </b>Martin Haspelmath <a href="mailto:martin_haspelmath@eva.mpg.de" class="1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" data-outlook-id="2f16f6bf-b6f2-46ef-84d9-c5f1a284d2c8">

<martin_haspelmath@eva.mpg.de></a>, Linguistic Typology <a href="mailto:lingtyp@listserv.linguistlist.org" class="1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" data-outlook-id="b276f979-9a2e-4ec8-9939-bbf817c239c1">

<lingtyp@listserv.linguistlist.org></a><br>

<b>Subject: </b>Re: [Lingtyp] Reporting cross-linguistic frequencies<br>

<br>

</div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

Dear Martin, dear all,</div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

 </div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

I am starting to wonder whether statistical analysis of a language sample is at all a suitable method for "detecting universal tendencies that are caused by universal/non-historical factors" (Martin's formulation). Given that there is no consensus as to how

 to overcome genealogical and areal biases and even whether those biases must be overcome at all and what trying to overcome them actually gets us (apart from getting some of us high-profile publications with ever more complicated mathematical apparatus which

 others among us struggle to understand and cannot evaluate; not being in any way a "mathematically-gifted person", to borrow Stela's expression, I belong to the latter group), the whole enterprise does not appear to be very productive. What if the more appropriate

 method, at least where purported functional factors are concerned, is the one employed by John Hawkins, Natalia Levshina and some others, i.e. to combine experimental research on production / processing with a quantitative study of variation in corpora

 across a small number of sufficiently distinct languages? If we can show that certain well-defined factors are operative in language processing and result in skewed distributions in corpora ultimately translatable into tendencies of diachronic change, and

 moreover are able to corroborate these results by similarly skewed distributions of variables in reasonably designed cross-linguistic samples, then what else do we need? In any case, as has been stated many times, even if we find that in a certain

 language sample, however well-designed, a certain variable shows a clearly skewed distribution of, say, 80% vs 20%, nothing follows from this in terms of "universal preferences" unless we are able to independently show that the more frequent value is in some

 or other way "preferred" in processing / production etc. I am sorry if the above is self-evident or naive.</div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

 </div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

Best regards,</div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

 </div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

Peter<br>

 </div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

 </div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

----------------</div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

To: <a href="mailto:lingtyp@listserv.linguistlist.org" class="324de92b3f6b2f5e993df2fdf11fa1c7moz-txt-link-abbreviated moz-txt-link-freetext" data-outlook-id="471470d8-7001-423a-b481-683f2bafdc9d">

lingtyp@listserv.linguistlist.org</a> (<a href="mailto:lingtyp@listserv.linguistlist.org" class="324de92b3f6b2f5e993df2fdf11fa1c7moz-txt-link-abbreviated moz-txt-link-freetext" data-outlook-id="bc7cee94-1f01-419c-a23c-de9ec023d98b">lingtyp@listserv.linguistlist.org</a>);</div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

Subject: [Lingtyp] Reporting cross-linguistic frequencies;</div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

21.11.2025, 10:19, "Martin Haspelmath via Lingtyp" <a href="mailto:lingtyp@listserv.linguistlist.org" class="1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" data-outlook-id="d9ddbaa3-2648-4043-8ab2-c0d86e6a4285">

<lingtyp@listserv.linguistlist.org></a>:</div>

<blockquote><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

Thanks, Jürgen! I like the "wave vs. particle" analogy, because these concrete expressions help us make sense of what seems to be going on (given the experimental results).</p><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

In worldwide comparative linguistics, we also want to make sense of what is going on, but it seems to me that we need analogies not only for interpreting results, but also for understanding what we are aiming for. For me, "removing areal and genealogical/phylogenetic

 bias" has the aim of detecting universal tendencies that are caused by universal/non-historical factors.</p><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

I would think that in the imagined concrete scenario of a sample of isolated isolates (e.g. 100 languages that have long existed on isolated islands, maybe of the Rapanui type), looking at these 100 isolates should give the same results as looking at 100 sample

 languages from larger families that have been shaped also by contact.</p><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

Are there reasons to doubt this? If not, then we can take the "isolated isolates" scenario simply as a way of illustrating our goals in concrete terms (somewhat like "wave" and "particle" serve as concrete illustrations). </p><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

But maybe the imagined scenario (which is not an "assumption"!!) is somehow problematic, because the goals of our enterprise are DIFFERENT. In Bickel's (2007) paper (LiTy 11), which has been widely cited, the idea seems to be that looking for "history-free"

 tendencies is somehow an obsolete goal.</p><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

Some people have suggested that in identifying universal trends, one MUST take into account genealogies, and isolates are problematic because they are not part of any genealogy. This is because we should not look primarily at languages, but at *transitions*

 (changes from one type to another). If I understood Verkerk et al. (2025) correctly, they solved the "isolates problem" by using an artificial world tree (where all languages are somehow included; the very beautiful tree is used in

<a href="https://www.mpg.de/25723124/1114-evan-enduring-patterns-in-the-world-s-languages-150495-x" rel="noopener noreferrer" originalsrc="https://www.mpg.de/25723124/1114-evan-enduring-patterns-in-the-world-s-languages-150495-x" data-outlook-id="2d301db4-0a35-4653-96ef-7e836d862a4b" style="margin-top: 0px; margin-bottom: 0px;">

the press release</a>). Are Verkerk et al. pursuing a different goal? That is not really clear to me.</p><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

I find the notion of an artificial world tree profoundly strange, much stranger than the hypothetical scenario of 100 isolates on remote islands. But maybe it is needed, because the goal of the enterprise is somehow different (along Bickel's lines)? So I like

 the imagined "isolated isolates" scenario also because it clarifies what I'm interested in.</p><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

(And isn't Trudgill's idea that isolates are somehow "exotic" very speculative? Shcherbakova et al. 2023 have not provided strong evidence against the idea, but they simply did not find evidence in favour of it.)</p><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

One last point: Yes, all isolates are survivors from some larger family, but why is that relevant? Languages may have existed for half a million years or longer, and we know almost nothing about that deep past. Most of the currently existing families probably

 had more branches in earlier times, and the protolanguages we reconstruct may or may not have been isolates themselves. We cannot tell, but I don't see why we would need to know.</p><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

Best,</p><p class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

Martin</p>

<div> </div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

On 21.11.25 07:07, Juergen Bohnemeyer via Lingtyp wrote:</div>

<blockquote>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

Dear all — Here’s a quick explanation of why the assumption of an “isolated isolate” is profoundly strange: </div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif;">

<span style="font-size: 12pt;">Leaving aside sign languages, constructed languages, and artificial languages, nobody seems to entertain the possibility that languages have emerged spontaneously out of something that we wouldn’t consider a language itself over

 the last few thousand years. In other words, the languages we consider isolates are without exception lone survivors; but they did descend from ancestors which are often

</span><span style="font-size: 16px; background-color: rgb(255, 255, 255);">lost and unknown</span><span style="font-size: 12pt;">, and these ancestors biased the offshoot's properties by dint of inheritance/transmission.</span></div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

The reason isolates are interesting from a sampling perspective is that they may represent entire genera or families without forcing us to pick a member. But being an isolate does not mean being free of phylogenetic bias. On the contrary: isolates of unknown

 descent are actually particularly problematic in the sense that they are shaped by biases that we have no way of identifying directly since the biasing ancestors have been lost to time.</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif;">

<span style="font-size: 12pt;">As to contact. Languages that are not in contact with other languages over long stretches of time are extremely rare and unusual. In fact, as I’m sure everyone here is aware, such languages have been plausibly argued to tend to

 evolve exotic properties as a result of their isolation (</span><span style="font-size: 16px; background-color: rgb(255, 255, 255);">Lupyan & Dale 2010;

</span><span style="font-size: 12pt;">Trudgill 2011), although this is controversial (Shcherbakova et al. 2023). In any case, I would certainly not want to make such languages the basis for causal inference in typology.</span></div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

But it gets a lot worse. The “isolated isolate” interpretation doesn’t just require us to think of a language that isn’t currently in contact with any other language. We would have to assume a language that has

<b>never</b> come into contact with any other language at any point in its history (at least not long/intensively enough to change as a result of it). I’m seriously uncertain whether such a language has ever existed on this planet. </div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

Here’s an analogy from quantum mechanics: Schrödinger’s and Heisenberg’s equations are mathematical models that describe the experimentally observed behavior of elementary particles under various conditions. The particle and the wave interpretation are interpretations

 that we use to make sense of these mathematical models. We find these interpretations useful because most of us don’t think in mathematical equations (not even theoretical physicists, it would seem). But if we push these interpretations beyond a certain point, they

 break down. To begin with, we can’t think of something simultaneously as a wave and as a particle. </div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

In the same way, we can mathematically describe the influence phylogeny and areality exert on the probability of a particular language having certain properties. The “isolated isolate” interpretation is just that - an interpretation of the statistical models;

 but, as I tried to show above, it runs into absurdities rather more quickly than the particle and wave interpretations in quantum mechanics. </div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

Best — Juergen</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

G. Lupyan, R. Dale, Language structure is partly determined by social structure. <i>PLOS ONE</i> 5, e8559 (2010).</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

O. Shcherbakova, S. M. Michaelis, H. J. Haynie, et al. Societies of strangers do not speak less complex languages.

<i>Science Advances </i>9, eadf7704 (2023).</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

P. Trudgill, <i>Sociolinguistic Typology: Social Determinants of Linguistic Complexity

</i>(Oxford Univ. Press, 2011).</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing" style="font-family: aptos, arial, helvetica, sans-serif; font-size: 12pt;">

<br>

</div>


<div id="c8768d4c0a34970a3586aba0ec3bb0a072bc9780e9b7b3b9a49a25572a5efdmail-editor-reference-message-container">

<div class="372af16aba6969554415dcab9b8e185b39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 380f653594ff96b7621c3aa3240673d728e41e5405915cc550e9daf9431b8d6skipProofing" style="text-align: left; padding: 3pt 0in 0in; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentcolor currentcolor; font-family: aptos; font-size: 12pt;">

<b>From: </b>Lingtyp <a href="mailto:lingtyp-bounces@listserv.linguistlist.org" class="e78fc4d19319e3dbcccf38048379940e1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" data-outlook-id="200c6e01-18bb-4bdd-8a6f-5c7b55e04ad9">

<lingtyp-bounces@listserv.linguistlist.org></a> on behalf of Matías Guzmán Naranjo via Lingtyp

<a href="mailto:lingtyp@listserv.linguistlist.org" class="e78fc4d19319e3dbcccf38048379940e1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" data-outlook-id="6791f909-9e4d-4722-9f9c-4532c86208ce">

<lingtyp@listserv.linguistlist.org></a><br>

<b>Date: </b>Thursday, November 20, 2025 at 04:01<br>

<b>To: </b><a href="mailto:lingtyp@listserv.linguistlist.org" class="c5add141b47d5d4e9646b356789522324de92b3f6b2f5e993df2fdf11fa1c7moz-txt-link-abbreviated 56221ecd4cd88a7e220fd42e552d23b7moz-txt-link-freetext moz-txt-link-freetext" data-outlook-id="22537e33-2f20-4b90-86e2-3762dfcb49cb">lingtyp@listserv.linguistlist.org</a> <a href="mailto:lingtyp@listserv.linguistlist.org" class="e78fc4d19319e3dbcccf38048379940e1f85fe41a9161661477b40489dd2f552moz-txt-link-rfc2396E" data-outlook-id="722f9994-de8c-43cf-8f63-66235a385f0a"><lingtyp@listserv.linguistlist.org></a><br>

<b>Subject: </b>Re: [Lingtyp] Reporting cross-linguistic frequencies<br>

<br>

</div>

<div class="36472014ab394f30c45e8fa241f08493662abf391f8b56b11ba8165b7bd48bb4PlainText" style="font-size: 11pt;">

I'll jump in with some thoughts.<br>

<br>

<br>

- Dryer's method and ours aim at doing basically the same thing:<br>

quantifying what's "left" after removing genetic and areal bias.<br>

<br>

- Whether you should call them proportions or adjusted frequencies...<br>

I'm not sure that it matters that much? As long as you understand how<br>

they were calculated...<br>

<br>

- How you want to interpret this "what's left" is debatable, I guess,<br>

but I don't think I agree with Jürgen. As far as I can tell it should be<br>

compatible with something along the lines of an "isolated isolate" as<br>

described by Martin. You can also see them as 'universal' preferences<br>

(more or less the same thing?).<br>

<br>

- "the probability of a random language having a certain property<br>

depends on (or is influenced by, or varies with, etc.) it being related<br>

to certain other languages, or being  spoken (or signed) in a particular<br>

area" -> In our approach we assume that the probability of a language L<br>

having some feature value F depends on three things: 1) its relatedness<br>

to other languages, 2) its contact with other languages, 3) some universal<br>

preference for F. Kind of the point of what we do is that we try to<br>

estimate each of these factors. [We can add more factors and more<br>

structure, but that's the most basic model]<br>

<br>

- You can quantify the contribution of the phylogenetic component and<br>

the areal component(s) with our techniques, but this is a bit tricky<br>

because there is unavoidable overlap in the information each one<br>

contains. These measures also have a different meaning than the adjusted<br>

frequencies and can't be used as a replacement for them; you can use them<br>

in addition to them.<br>

<br>
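A minimal sketch of the most basic three-factor model described above, in<br>
Python. It assumes that the three components simply add up on the log-odds<br>
scale; that additive form, the function names and all numbers below are<br>
illustrative assumptions, not the actual implementation.<br>
<br>
<pre>
```python
import math


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def feature_probability(universal, family, area):
    """Probability that a language L has feature value F.

    All three arguments are on the log-odds scale and are summed:
    universal preference + relatedness (family) effect + contact (area) effect.
    This additive form is an assumption made for illustration only.
    """
    return inv_logit(universal + family + area)


# Hypothetical values, chosen only to illustrate the idea.
universal = 0.4   # mild general preference for F
family = 1.5      # the language's family strongly favours F
area = -0.3       # its neighbours slightly disfavour F

# Probability for this particular language, with all three factors at work.
p_language = feature_probability(universal, family, area)

# Setting both group effects to zero leaves only the universal component.
p_adjusted = feature_probability(universal, 0.0, 0.0)

print(round(p_language, 3), round(p_adjusted, 3))
```
</pre>
In this toy setup the "adjusted" value is what is left once both group<br>
effects are set to zero, i.e. the universal component on its own.<br>
<br>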

<br>

Matías<br>

<br>

<br>

<br>

On 20/11/25 at 9:36, Omri Amiraz via Lingtyp wrote:<br>

> Dear all,<br>

> I agree with Ian that, in addition to genealogical and areal biases,<br>

> the very question of what counts as a language versus a dialect is<br>

> partly subjective. This makes actual frequencies even more<br>

> problematic, since we would obtain different results depending on<br>

> whether we treat Wu Chinese as one language or as thirty separate<br>

> languages, as Ian pointed out.<br>

> Juergen wrote: "We can empirically assess the extent to which the<br>

> probability of a random language having a certain property depends on<br>

> (or is influenced by, or varies with, etc.) it being related to<br>

> certain other languages, or being  spoken (or signed) in a particular<br>

> area."<br>

><br>

> I wonder whether it might be useful to have a measure of the<br>

> genealogical and areal spread of a feature, essentially quantifying<br>

> how broadly it is distributed across families and regions in the<br>

> present-day world. Such a measure might be more straightforward to<br>

> interpret than an adjusted frequency/probability, since it is not<br>

> clear whether the described population is a hypothetical set of<br>

> isolated isolates or something else.<br>

><br>

> Is anyone aware of an existing metric that captures genealogical or<br>

> areal spread in this way?<br>

><br>

> Best,<br>

> Omri<br>

><br>

> _______________________________________________<br>

> Lingtyp mailing list<br>

> <a href="mailto:Lingtyp@listserv.linguistlist.org" class="c5add141b47d5d4e9646b356789522324de92b3f6b2f5e993df2fdf11fa1c7moz-txt-link-abbreviated 56221ecd4cd88a7e220fd42e552d23b7moz-txt-link-freetext moz-txt-link-freetext" data-outlook-id="e31c3489-fcb3-4090-8708-c250ac7b74f5">

Lingtyp@listserv.linguistlist.org</a><br>

> <a href="https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp" originalsrc="https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp" data-outlook-id="42a19adb-7ad8-4f95-a412-1f392110e9d2">

https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp</a><br>

</div>

</div>

<div dir="ltr" class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

<br>

</div>


</blockquote>

<pre><div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">-- 

Martin Haspelmath

Max Planck Institute for Evolutionary Anthropology

Deutscher Platz 6

D-04103 Leipzig

<a href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/" class="56221ecd4cd88a7e220fd42e552d23b7moz-txt-link-freetext moz-txt-link-freetext" rel="noopener noreferrer" originalsrc="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/" data-outlook-id="0021846f-d825-44c5-9b51-8cd99bead3fd">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a></div></pre>


</blockquote>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

 </div>


<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

-- </div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

Peter Arkadiev, PhD Habil.</div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

<a href="https://peterarkadiev.github.io/" class="56221ecd4cd88a7e220fd42e552d23b7moz-txt-link-freetext moz-txt-link-freetext" originalsrc="https://peterarkadiev.github.io/" data-outlook-id="c543eda1-e55d-42f0-b993-1cd929f9d8b2">https://peterarkadiev.github.io/</a></div>

<div class="39e1103e8e8cbb86b1168768346b5522ms-outlook-mobile-reference-message 728e41e5405915cc550e9daf9431b8d6skipProofing">

 </div>

</div>

</blockquote>

<pre><div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-size: 12px;">-- 

Martin Haspelmath

Max Planck Institute for Evolutionary Anthropology

Deutscher Platz 6

D-04103 Leipzig

<a href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/" class="moz-txt-link-freetext" rel="noopener noreferrer" originalsrc="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/" data-outlook-id="dd62e6d7-bbe5-4418-aa94-ff893d78602f">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a></div></pre>


</blockquote>

</blockquote>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

 </div>


<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

-- </div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

Peter Arkadiev, PhD Habil.</div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

<a href="https://peterarkadiev.github.io/" class="moz-txt-link-freetext" originalsrc="https://peterarkadiev.github.io/" data-outlook-id="29fdebbf-7a9d-4c18-a891-bb298fd9ee93">https://peterarkadiev.github.io/</a></div>

<div class="ms-outlook-mobile-reference-message skipProofing" style="text-align: left; text-indent: 0px; font-family: Helvetica; font-size: 12px;">

 </div>

</blockquote>

<pre><div class="moz-signature" style="text-align: left; text-indent: 0px; font-size: 12px;">-- 

Martin Haspelmath

Max Planck Institute for Evolutionary Anthropology

Deutscher Platz 6

D-04103 Leipzig

<a href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/" class="moz-txt-link-freetext" originalsrc="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/" data-outlook-id="2b63747c-ae5e-4b3a-ac2d-6cad6f99cee9">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a></div></pre>

</blockquote>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><br>

</div>

</div>

</blockquote>

<div dir="ltr" class="ms-outlook-mobile-reference-message skipProofing"><br>

</div>

</div>

</div>

</div></blockquote></div><br></div></body></html>