<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>I agree with Omri that it would be better to say things like "<i>When
genealogical and areal biases are controlled for, the
probability of a language being OV is 0.6</i>". Indeed, we are
trying to find out what the quantitative distribution would be if
each language were an isolate with no contact to other languages.</p>
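<p>As a toy illustration of the difference between an actual and an
adjusted frequency, the sketch below compares the raw proportion of OV
languages with a proportion in which every family counts once. The
sample, the family names and both function names are invented for this
example. Equal weighting of families is only a crude stand-in for
genealogical control: it ignores areal bias and is not the procedure of
any of the papers cited in this thread.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical toy sample: (language, family, order). Not real data.
SAMPLE = [
    ("A1", "FamA", "VO"), ("A2", "FamA", "VO"), ("A3", "FamA", "VO"),
    ("A4", "FamA", "VO"), ("A5", "FamA", "VO"),
    ("B1", "FamB", "OV"), ("B2", "FamB", "OV"),
    ("C1", "FamC", "OV"), ("C2", "FamC", "OV"),
    ("D1", "FamD", "OV"),
]


def raw_proportion(sample, value="OV"):
    """Share of languages with the given value, counting every language once."""
    hits = sum(1 for _, _, order in sample if order == value)
    return hits / len(sample)


def family_weighted_proportion(sample, value="OV"):
    """Mean of the per-family shares, so each family counts once whatever its size."""
    by_family = defaultdict(list)
    for _, family, order in sample:
        by_family[family].append(order == value)
    shares = [sum(flags) / len(flags) for flags in by_family.values()]
    return sum(shares) / len(shares)


print(raw_proportion(SAMPLE))              # 5 of 10 languages are OV
print(family_weighted_proportion(SAMPLE))  # 3 of 4 families are entirely OV
```
</pre>
<p>In this invented sample, OV and VO are equally frequent among the
languages (0.5 each), but the family-weighted figure for OV is 0.75,
because the one large family happens to be VO.</p>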
<p><br>
</p>
<p>Some authors have said that instead of striving for independence
of sample languages, we should base our conclusions on inferred
changes in larger families (this is also called the "phylogenetic
approach", and is represented, for example, by <a
moz-do-not-send="true"
href="https://www.nature.com/articles/s41562-025-02325-z">the
recent paper by Verkerk et al. 2025</a>). But these changes are
rarely independent of each other (because related languages tend
to stay in geographic proximity), so <a moz-do-not-send="true"
href="https://dlc.hypotheses.org/2368">I'm not sure</a> that
much is gained by this approach. (Moreover, it only works if one
has a very large amount of data.)</p>
<p><br>
</p>
<p>Be that as it may, it is clear that such probabilities can be
estimated only with substantial uncertainties, so that results
which do not show a very strong skewing ("overwhelmingly greater
than chance frequency", in Greenberg's terms) should be
interpreted cautiously.</p>
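<p>One way to see how large such uncertainties can be is to resample
whole families and watch how much the estimate moves. The sketch below
does this with a percentile bootstrap. The per-family OV shares are
made-up numbers, the function name is hypothetical, and the bootstrap is
chosen here only for illustration; it is not a method proposed in this
thread and it does nothing about areal dependence.</p>
<pre>
```python
import random

# Hypothetical per-family OV shares (made-up numbers, one entry per family).
FAMILY_OV_SHARES = [0.0, 1.0, 1.0, 1.0, 0.5, 0.0, 1.0, 0.8]


def bootstrap_interval(shares, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean share, resampling families."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_resamples):
        # Draw as many families as in the sample, with replacement.
        resample = rng.choices(shares, k=len(shares))
        means.append(sum(resample) / len(resample))
    means.sort()
    lo_index = int((1 - level) / 2 * n_resamples)
    hi_index = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_index], means[hi_index]


point = sum(FAMILY_OV_SHARES) / len(FAMILY_OV_SHARES)
low, high = bootstrap_interval(FAMILY_OV_SHARES)
print(f"estimate {point:.2f}, 95% interval {low:.2f} to {high:.2f}")
```
</pre>
<p>With only a handful of families, the interval around the point
estimate is wide, which is the reason for caution with results that are
not strongly skewed.</p>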
<p><br>
</p>
<p>Best,</p>
<p>Martin</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 18.11.25 10:23, Omri Amiraz via
Lingtyp wrote:<br>
</div>
<blockquote type="cite"
cite="mid:AMBPR09MB81827399627C73F280A75FB285D6A@AMBPR09MB8182.eurprd09.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
Dear Colleagues,</div>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
I would like to raise the question of how cross-linguistic
frequencies of typological features ought to be reported. The
issue has been discussed extensively, but I still find some
aspects conceptually confusing, so I hope this discussion might
be helpful for others as well.</div>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
To make this concrete, consider the order of object and verb
(OV, VO, no dominant order). Suppose, for the sake of argument,
that we have complete data for every language in Glottolog. This
would give us the
<i>actual</i> proportion of languages that are OV vs. VO in the
present-day world. The core problem, however, is that languages
are not independent datapoints, so these actual frequencies also
reflect genealogical and areal biases.</div>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
For that reason, it is common practice to report <i>adjusted</i> frequencies
instead, either through non-proportional stratified sampling
(Dryer 2018) or through statistical bias controls (Becker &
Guzmán Naranjo 2025). As far as I understand, both methods aim
to estimate something like: <i>If each language were
independent (as if every language were an isolate and had no
contact with its neighbors), what proportion would be OV vs.
VO?</i> In other words, the population being described is not
the set of existing languages but a hypothetical (and
unrealistic) set of independent languages.</div>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
Now, suppose that the actual frequencies of OV and VO are equal,
but the adjusted frequency of OV is higher. In that case, it
feels counterintuitive to say that OV is more common
cross-linguistically than VO. Perhaps it is clearer to speak in
terms of probabilities rather than proportions, given that the
population is hypothetical. For instance, we might say:
<i>“When genealogical and areal biases are controlled for, the
probability of a language being OV is 0.6”.
</i>This means that the chance that a randomly sampled language
isolate with no contact would be OV is 0.6. By contrast, saying
“60% of the world’s languages are OV” when referring to an
adjusted frequency seems potentially misleading.</div>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
I would appreciate hearing what others in the community think
about how such statistics should ideally be reported.</div>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
Best regards,<br>
Omri</div>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
<br>
</div>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
References:</div>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
Becker, Laura and Matías Guzmán Naranjo. 2025. Replication and
methodological robustness in quantitative typology.
<i>Linguistic Typology</i>.</div>
<div
style="margin-top: 1em; margin-bottom: 1em; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"
class="elementToProof">
Dryer, Matthew S. 2018. On the order of demonstrative, numeral,
adjective, and noun.
<i>Language</i> 94(4), 798-833.</div>
<br>
<fieldset class="moz-mime-attachment-header"></fieldset>
<pre wrap="" class="moz-quote-pre">_______________________________________________
Lingtyp mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Lingtyp@listserv.linguistlist.org">Lingtyp@listserv.linguistlist.org</a>
<a class="moz-txt-link-freetext" href="https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp">https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp</a>
</pre>
</blockquote>
<pre class="moz-signature" cols="72">--
Martin Haspelmath
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
D-04103 Leipzig
<a class="moz-txt-link-freetext" href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a></pre>
</body>
</html>