<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<meta name="Generator" content="Microsoft Word 15 (filtered medium)">

<style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:DengXian;

        panose-1:2 1 6 0 3 1 1 1 1 1;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:Aptos;}

@font-face

        {font-family:"\@DengXian";

        panose-1:2 1 6 0 3 1 1 1 1 1;}

@font-face

        {font-family:Consolas;

        panose-1:2 11 6 9 2 2 4 3 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        font-size:12.0pt;

        font-family:"Aptos",sans-serif;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

pre

        {mso-style-priority:99;

        mso-style-link:"HTML - förformaterad Char";

        margin:0cm;

        font-size:10.0pt;

        font-family:"Courier New";}

span.HTML-frformateradChar

        {mso-style-name:"HTML - förformaterad Char";

        mso-style-priority:99;

        mso-style-link:"HTML - förformaterad";

        font-family:Consolas;}

span.E-postmall20

        {mso-style-type:personal-reply;

        font-family:"Aptos",sans-serif;

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;

        mso-ligatures:none;}

@page WordSection1

        {size:612.0pt 792.0pt;

        margin:72.0pt 72.0pt 72.0pt 72.0pt;}

div.WordSection1

        {page:WordSection1;}

--></style>

</head>

<body link="blue" vlink="purple" style="word-wrap:break-word">

<div class="WordSection1">

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt">I also agree that probabilities are preferable to frequencies, but I am skeptical of the idea that we should be looking for what languages would be like if they were all isolated isolates. One
 objection is that languages that now lack known relatives and/or neighbours have usually not always been that way. Another is that there may be properties of languages that we would like to generalize about but that depend on contact influence.
 For example, most Germanic languages have large numbers of loanwords from Greek, Latin, and Romance, and these loanwords have many phonological and morphological properties that make them stand out from the rest of the vocabulary. Such a situation could presumably
 not arise in a language without neighbours.<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt"><o:p> </o:p></span></p>

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt">The idea of a language without relatives or neighbours is somewhat reminiscent of “homo economicus” in economics or, closer to our concerns, Chomsky’s “</span><span style="font-size:11.0pt">ideal
 speaker-listener in a completely homogeneous speech-community</span><span lang="EN-US" style="font-size:11.0pt">”. Like those phantoms, the isolated isolate can at most serve as a useful temporary construct, but it can hardly be the final goal of our endeavour.<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt"><o:p> </o:p></span></p>

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt">Östen<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt"><o:p> </o:p></span></p>

<div>

<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">

<p class="MsoNormal"><b><span lang="SV" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span lang="SV" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"> Lingtyp &lt;lingtyp-bounces@listserv.linguistlist.org&gt;
<b>On behalf of </b>Martin Haspelmath via Lingtyp<br>
<b>Sent:</b> 18 November 2025 16:09<br>
<b>To:</b> lingtyp@listserv.linguistlist.org<br>
<b>Subject:</b> Re: [Lingtyp] Reporting cross-linguistic frequencies<o:p></o:p></span></p>

</div>

</div>

<p class="MsoNormal"><o:p> </o:p></p>

<p>I agree with Omri that it would be better to say things like "<i>When genealogical and areal biases are controlled for, the probability of a language being OV is 0.6</i>". Indeed, we are trying to find out what the quantitative distribution would be if each language were an isolate with no contact with other languages.<o:p></o:p></p>

<p><o:p> </o:p></p>

<p>Some authors have said that instead of striving for independence of sample languages, we should base our conclusions on inferred changes in larger families (this is also called the "phylogenetic approach" and is represented, for example, by
<a href="https://www.nature.com/articles/s41562-025-02325-z">the recent paper by Verkerk et al. 2025</a>). But these changes are rarely independent of each other (because related languages tend to stay in geographic proximity), so
<a href="https://dlc.hypotheses.org/2368">I'm not sure</a> that much is gained by this approach. (Moreover, it only works if one has a very large amount of data.)<o:p></o:p></p>

<p><o:p> </o:p></p>

<p>Be that as it may, it is clear that such probabilities can be estimated only with substantial uncertainty, so results that do not show a very strong skewing ("overwhelmingly greater than chance frequency", in Greenberg's terms) should be interpreted
 cautiously.<o:p></o:p></p>
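<p>To illustrate how large this uncertainty can be when there are few independent units, here is a minimal sketch of a percentile bootstrap that resamples language families with replacement. It assumes, hypothetically, that families are the independent units and it ignores areal dependence altogether; the five family-level OV proportions in <code>FAMILY_PROPS</code> and the function name <code>bootstrap_interval</code> are invented for illustration and do not come from any study. With only five families, the resulting 95% interval around the point estimate of 0.6 is wide.</p>
<pre>
```python
import random

# Hypothetical family-level OV proportions (one value per family),
# invented for illustration only; not real typological data.
FAMILY_PROPS = [0.0, 0.5, 1.0, 1.0, 0.5]


def bootstrap_interval(family_props, n_boot=10_000, level=0.95, seed=1):
    """Percentile bootstrap interval for the family-weighted mean.

    Families (not individual languages) are resampled with replacement,
    i.e. families are treated as the independent units. Areal dependence
    is not modelled at all.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    k = len(family_props)
    # Mean of each bootstrap resample of k families, sorted for percentiles.
    means = sorted(
        sum(rng.choices(family_props, k=k)) / k for _ in range(n_boot)
    )
    # Indices of the lower and upper percentiles in the sorted means.
    lo_idx = round((1 - level) / 2 * n_boot)
    hi_idx = round((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    point = sum(FAMILY_PROPS) / len(FAMILY_PROPS)
    lo, hi = bootstrap_interval(FAMILY_PROPS)
    print(f"P(OV) ~ {point:.2f}, 95% bootstrap interval [{lo:.2f}, {hi:.2f}]")
```
</pre>
<p>Resampling families rather than languages is the design choice that matters here: it keeps large families from masquerading as many independent observations, at the cost of having very few data points.</p>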

<p><o:p> </o:p></p>

<p>Best,<o:p></o:p></p>

<p>Martin<o:p></o:p></p>

<p><o:p> </o:p></p>


<div>

<p class="MsoNormal">On 18.11.25 10:23, Omri Amiraz via Lingtyp wrote:<o:p></o:p></p>

</div>

<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black">Dear Colleagues,<o:p></o:p></span></p>

</div>

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black">I would like to raise the question of how cross-linguistic frequencies of typological features ought to be reported. The issue has been discussed extensively, but I still find some aspects conceptually confusing,

 so I hope this discussion might be helpful for others as well.<o:p></o:p></span></p>

</div>

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black">To make this concrete, consider the order of object and verb (OV, VO, no dominant order). Suppose, for the sake of argument, that we have complete data for every language in Glottolog. This would give us the

<i>actual</i> proportion of languages that are OV vs. VO in the present-day world. The core problem, however, is that languages are not independent datapoints, so these actual frequencies also reflect genealogical and areal biases.<o:p></o:p></span></p>

</div>

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black">For that reason, it is common practice to report

<i>adjusted</i> frequencies instead, either through non-proportional stratified sampling (Dryer 2018) or through statistical bias controls (Becker & Guzmán Naranjo 2025). As far as I understand, both methods aim to estimate something like:

<i>If each language were independent (as if every language were an isolate and had no contact with its neighbors), what proportion would be OV vs. VO?</i> In other words, the population being described is not the set of existing languages but a hypothetical

 (and unrealistic) set of independent languages.<o:p></o:p></span></p>

</div>

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black">Now, suppose that the actual frequencies of OV and VO are equal, but the adjusted frequency of OV is higher. In that case, it feels counterintuitive to say that OV is more common cross-linguistically than VO. Perhaps
 it is clearer to speak in terms of probabilities rather than proportions, given that the population is hypothetical. For instance, we might say:
<i>“When genealogical and areal biases are controlled for, the probability of a language being OV is 0.6.”</i>
This means that the chance that a randomly sampled isolate with no contact with other languages would be OV is 0.6. By contrast, saying “60% of the world’s languages are OV” when referring to an adjusted frequency seems potentially misleading.<o:p></o:p></span></p>

</div>
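<p class="MsoNormal">A toy calculation can make this scenario concrete. The sketch below is a minimal illustration, assuming invented data and a deliberately crude control that gives each language family one vote by averaging within-family proportions. It is not Dryer's stratified sampling or the models of Becker and Guzmán Naranjo; the sample, the family labels A to E, and the function names are all hypothetical. With these made-up numbers, OV and VO are equally frequent among the 16 languages, but the family-weighted estimate of P(OV) is higher because the VO languages are concentrated in one large family.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical toy sample of 16 languages as (family, dominant order) pairs.
# The numbers are invented for illustration only; not real typological data.
SAMPLE = (
    [("A", "VO")] * 6
    + [("B", "OV"), ("B", "VO")]
    + [("C", "OV")] * 3
    + [("D", "OV")] * 3
    + [("E", "OV"), ("E", "VO")]
)


def raw_proportion(sample, value="OV"):
    """Share of all sampled languages with the given order (no bias control)."""
    return sum(order == value for _, order in sample) / len(sample)


def family_weighted_proportion(sample, value="OV"):
    """Give each family one vote by averaging the within-family proportions.

    A crude stand-in for genealogical control: it ignores areal effects
    and lower-level groupings such as genera.
    """
    by_family = defaultdict(list)
    for family, order in sample:
        by_family[family].append(order == value)
    family_props = [sum(v) / len(v) for v in by_family.values()]
    return sum(family_props) / len(family_props)


if __name__ == "__main__":
    print(f"raw P(OV) = {raw_proportion(SAMPLE):.2f}")
    print(f"family-weighted P(OV) = {family_weighted_proportion(SAMPLE):.2f}")
```
</pre>
<p class="MsoNormal">Running the script prints 0.50 for the raw proportion and 0.60 for the family-weighted estimate, which mirrors the situation described above: equal actual frequencies, but a higher adjusted probability of OV.</p>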

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black">I would appreciate hearing what others in the community think about how such statistics should ideally be reported.<o:p></o:p></span></p>

</div>

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black">Best regards,<br>

Omri<o:p></o:p></span></p>

</div>

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black"><o:p> </o:p></span></p>

</div>

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black">References:<o:p></o:p></span></p>

</div>

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black">Becker, Laura and Matías Guzmán Naranjo. 2025. Replication and methodological robustness in quantitative typology.
<i>Linguistic Typology</i>.<o:p></o:p></span></p>

</div>

<div style="margin-top:12.0pt;margin-bottom:12.0pt">

<p class="MsoNormal"><span style="color:black">Dryer, Matthew S. 2018. On the order of demonstrative, numeral, adjective, and noun.
<i>Language</i> 94(4). 798–833.<o:p></o:p></span></p>

</div>


</blockquote>

<pre>-- <o:p></o:p></pre>

<pre>Martin Haspelmath<o:p></o:p></pre>

<pre>Max Planck Institute for Evolutionary Anthropology<o:p></o:p></pre>

<pre>Deutscher Platz 6<o:p></o:p></pre>

<pre>D-04103 Leipzig<o:p></o:p></pre>

<pre><a href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a><o:p></o:p></pre>

</div>

</body>

</html>