[Lingtyp] Reporting cross-linguistic frequencies
Martin Haspelmath
martin_haspelmath at eva.mpg.de
Tue Nov 18 15:08:51 UTC 2025
I agree with Omri that it would be better to say things like "When
genealogical and areal biases are controlled for, the probability of a
language being OV is 0.6." Indeed, we are trying to find out what the
quantitative distribution would be if each language were an isolate with
no contact with other languages.
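
To make the difference between an actual and an adjusted frequency concrete, here is a toy sketch in Python. It assumes the crudest possible genealogical control: every family receives the same total weight, split evenly among its sampled languages, and areal effects are ignored entirely. The family names and values are hypothetical. This is not the procedure of Dryer (2018) or of Becker & Guzmán Naranjo (2025).

```python
from collections import Counter

# Hypothetical toy data (not real typological data): family -> value of each
# sampled language. Chosen so that raw OV and VO counts come out equal.
SAMPLE = {
    "Fam-A": ["VO"] * 5,      # one large VO family
    "Fam-B": ["OV", "OV"],
    "Fam-C": ["OV"],
    "Fam-D": ["OV"],
    "Fam-E": ["OV"],
    "Fam-F": ["OV", "VO"],    # a family with internal variation
}


def raw_proportion(sample, value):
    """Share of all sampled languages with `value`, each language counted once."""
    langs = [v for members in sample.values() for v in members]
    return Counter(langs)[value] / len(langs)


def family_weighted_proportion(sample, value):
    """Give every family a total weight of 1, split evenly among its members.

    A crude genealogical control only: contact between families is ignored.
    """
    total = 0.0
    for members in sample.values():
        total += members.count(value) / len(members)
    return total / len(sample)


print(raw_proportion(SAMPLE, "OV"))              # 0.5
print(family_weighted_proportion(SAMPLE, "OV"))  # 0.75
```

Here the raw counts of OV and VO are equal (0.5), while the family-weighted estimate for OV is 0.75. That figure describes an idealized population of independent languages, so it reads more naturally as a probability than as "75% of the world's languages are OV".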
Some authors have said that instead of striving for independence of
sample languages, we should base our conclusions on inferred changes in
larger families (this is also called the "phylogenetic approach" and is
represented, for example, by the recent paper by Verkerk et al. 2025,
<https://www.nature.com/articles/s41562-025-02325-z>). But these changes
are rarely independent of each other (because related languages tend to
stay in geographic proximity), so I'm not sure that much is gained by this
approach (see <https://dlc.hypotheses.org/2368>).
(Moreover, it only works if one has a very large amount of data.)
Be that as it may, it is clear that such probabilities can be estimated
only with substantial uncertainties, so that results which do not show a
very strong skewing ("overwhelmingly greater than chance frequency", in
Greenberg's terms) should be interpreted cautiously.
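
One simple way to see how large such uncertainties can be is a cluster bootstrap that resamples whole families rather than individual languages. The Python sketch below is an illustration only, not a recommended procedure. It assumes, optimistically, that families are independent of each other, which the point about geographic proximity calls into question. It uses hypothetical toy families and a plain percentile interval.

```python
import random

# Hypothetical toy data (not real typological data): family -> sampled values.
SAMPLE = {
    "Fam-A": ["VO"] * 5,
    "Fam-B": ["OV", "OV"],
    "Fam-C": ["OV"],
    "Fam-D": ["OV"],
    "Fam-E": ["OV"],
    "Fam-F": ["OV", "VO"],
}


def family_weighted_proportion(families, value):
    """Mean over families of the within-family share of `value`."""
    return sum(m.count(value) / len(m) for m in families) / len(families)


def family_bootstrap_interval(sample, value, n_boot=2000, alpha=0.05, seed=1):
    """Percentile interval from resampling whole families with replacement.

    Assumes families are independent units; areal dependence is ignored.
    """
    rng = random.Random(seed)
    families = list(sample.values())
    estimates = []
    for _ in range(n_boot):
        # Draw as many families as in the original sample, with replacement.
        resampled = [rng.choice(families) for _ in families]
        estimates.append(family_weighted_proportion(resampled, value))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


point = family_weighted_proportion(list(SAMPLE.values()), "OV")
lower, upper = family_bootstrap_interval(SAMPLE, "OV")
print(point, (lower, upper))
```

With only six hypothetical families, the interval around the point estimate of 0.75 comes out very wide. A result like this should be interpreted cautiously.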
Best,
Martin
On 18.11.25 10:23, Omri Amiraz via Lingtyp wrote:
> Dear Colleagues,
> I would like to raise the question of how cross-linguistic frequencies
> of typological features ought to be reported. The issue has been
> discussed extensively, but I still find some aspects conceptually
> confusing, so I hope this discussion might be helpful for others as well.
> To make this concrete, consider the order of object and verb (OV, VO,
> no dominant order). Suppose, for the sake of argument, that we have
> complete data for every language in Glottolog. This would give us the
> /actual/ proportion of languages that are OV vs. VO in the present-day
> world. The core problem, however, is that languages are not
> independent datapoints, so these actual frequencies also reflect
> genealogical and areal biases.
> For that reason, it is common practice to report
> /adjusted/ frequencies instead, either through non-proportional
> stratified sampling (Dryer 2018) or through statistical bias controls
> (Becker & Guzmán Naranjo 2025). As far as I understand, both methods
> aim to estimate something like: /If each language were independent (as
> if every language were an isolate and had no contact with its
> neighbors), what proportion would be OV vs. VO?/ In other words, the
> population being described is not the set of existing languages but a
> hypothetical (and unrealistic) set of independent languages.
> Now, suppose that the actual frequencies of OV and VO are equal, but
> the adjusted frequency of OV is higher. In that case, it feels
> counterintuitive to say that OV is more common cross-linguistically
> than VO. Perhaps it is clearer to speak in terms of probabilities
> rather than proportions, given that the population is hypothetical.
> For instance, we might say: /“When genealogical and areal biases are
> controlled for, the probability of a language being OV is 0.6.”/ This
> means that the chance that a randomly sampled language isolate with no
> contact would be OV is 0.6. By contrast, saying “60% of the world’s
> languages are OV” when referring to an adjusted frequency seems
> potentially misleading.
> I would appreciate hearing what others in the community think about
> how such statistics should ideally be reported.
> Best regards,
> Omri
>
> References:
> Becker, Laura and Matías Guzmán Naranjo. 2025. Replication and
> methodological robustness in quantitative typology. /Linguistic Typology/.
> Dryer, Matthew S. 2018. On the order of demonstrative, numeral,
> adjective, and noun. /Language/ 94(4), 798-833.
--
Martin Haspelmath
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
D-04103 Leipzig
https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/