<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>I agree with Omri that it would be better to say things like "<i>When

        genealogical and areal biases are controlled for, the

        probability of a language being OV is 0.6</i>". Indeed, we are

      trying to find out what the quantitative distribution would be if

      each language were an isolate with no contact with other languages.</p>

    <p><br>

    </p>

    <p>Some authors have said that instead of striving for independence

      of sample languages, we should base our conclusions on inferred

      changes in larger families (this is also called the "phylogenetic

      approach", and is represented, for example, by <a

        moz-do-not-send="true"

        href="https://www.nature.com/articles/s41562-025-02325-z">the

        recent paper by Verkerk et al. 2025</a>). But these changes are

      rarely independent of each other (because related languages tend

      to stay in geographic proximity), so <a moz-do-not-send="true"

        href="https://dlc.hypotheses.org/2368">I'm not sure</a> that

      much is gained by this approach. (Moreover, it only works if one

      has a very large amount of data.)</p>

    <p><br>

    </p>

    <p>Be that as it may, it is clear that such probabilities can be

      estimated only with substantial uncertainty, so that results

      that do not show a very strong skewing ("overwhelmingly greater

      than chance frequency", in Greenberg's terms) should be

      interpreted cautiously.</p>

    <p><br>

    </p>

    <p>Best,</p>

    <p>Martin</p>

    <p><br>

    </p>


    <div class="moz-cite-prefix">On 18.11.25 10:23, Omri Amiraz via

      Lingtyp wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:AMBPR09MB81827399627C73F280A75FB285D6A@AMBPR09MB8182.eurprd09.prod.outlook.com">


      <style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        Dear Colleagues,</div>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        I would like to raise the question of how cross-linguistic

        frequencies of typological features ought to be reported. The

        issue has been discussed extensively, but I still find some

        aspects conceptually confusing, so I hope this discussion might

        be helpful for others as well.</div>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        To make this concrete, consider the order of object and verb

        (OV, VO, no dominant order). Suppose, for the sake of argument,

        that we have complete data for every language in Glottolog. This

        would give us the

        <i>actual</i> proportion of languages that are OV vs. VO in the

        present-day world. The core problem, however, is that languages

        are not independent datapoints, so these actual frequencies also

        reflect genealogical and areal biases.</div>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        For that reason, it is common practice to report <i>adjusted</i> frequencies

        instead, either through non-proportional stratified sampling

        (Dryer 2018) or through statistical bias controls (Becker &

        Guzmán Naranjo 2025). As far as I understand, both methods aim

        to estimate something like: <i>If each language were

          independent (as if every language were an isolate and had no

          contact with its neighbors), what proportion would be OV vs.

          VO?</i> In other words, the population being described is not

        the set of existing languages but a hypothetical (and

        unrealistic) set of independent languages.</div>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        Now, suppose that the actual frequencies of OV and VO are equal,

        but the adjusted frequency of OV is higher. In that case, it

        feels counterintuitive to say that OV is more common

        cross-linguistically than VO. Perhaps it is clearer to speak in

        terms of probabilities rather than proportions, given that the

        population is hypothetical. For instance, we might say:

        <i>“When genealogical and areal biases are controlled for, the

          probability of a language being OV is 0.6”.

        </i>This means that the chance that a randomly sampled language

        isolate with no contact would be OV is 0.6. By contrast, saying

        “60% of the world’s languages are OV” when referring to an

        adjusted frequency seems potentially misleading.</div>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        I would appreciate hearing what others in the community think

        about how such statistics should ideally be reported.</div>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        Best regards,<br>

        Omri</div>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        <br>

      </div>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        References:</div>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        Becker, Laura and Matías Guzmán Naranjo. 2025. Replication and

        methodological robustness in quantitative typology.

        <i>Linguistic Typology</i>.</div>

      <div

style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"

        class="elementToProof">

        Dryer, Matthew S. 2018. On the order of demonstrative, numeral,

        adjective, and noun.

        <i>Language</i> 94(4). 798–833.</div>

      <br>


    </blockquote>

    <pre class="moz-signature" cols="72">-- 

Martin Haspelmath

Max Planck Institute for Evolutionary Anthropology

Deutscher Platz 6

D-04103 Leipzig

<a class="moz-txt-link-freetext" href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a></pre>

  </body>

</html>