[Lingtyp] Reporting cross-linguistic frequencies

Thu Nov 20 09:22:05 UTC 2025

Hi again,

> When Martin says "we need to look at multiple independent cases of ejectives in high-elevation locations", I assume he means "in high and low elevation locations". This would indeed be a way of testing the hypothesis about a connection between elevation and the presence of ejectives. But I cannot see that it says anything about the probability of a language having ejectives regardless of its history or about the probability of a contactless language having them.

There can also be physiological factors involved, as discussed in the attached paper (Moisik & Dedieu 2017, Anatomical biasing and clicks, Journal of Language Evolution, 37-51).

Randy

> On 20 Nov 2025, at 5:04 AM, Östen Dahl via Lingtyp <lingtyp at listserv.linguistlist.org> wrote:
>
> Martin Haspelmath said:
>
> "But nobody has suggested that the "isolated isolates" fiction should be a "goal of our endeavor". The goal is to estimate the probability of a feature appearing in a language regardless of its history."
>
> What I reacted to was Martin's statement that "we are trying to find out what the quantitative distribution would be if each language were an isolate with no contact to other languages". I am not sure that this is equivalent to "the probability of a feature appearing in a language regardless of its history". Such a probability would apply to any language, while "an isolate with no contact to other languages" is a language with very specific sociohistoric properties.
>
> When Martin says "we need to look at multiple independent cases of ejectives in high-elevation locations", I assume he means "in high and low elevation locations". This would indeed be a way of testing the hypothesis about a connection between elevation and the presence of ejectives. But I cannot see that it says anything about the probability of a language having ejectives regardless of its history or about the probability of a contactless language having them.
>
> - östen
>
> From: Lingtyp <lingtyp-bounces at listserv.linguistlist.org> On Behalf Of Martin Haspelmath via Lingtyp
> Sent: 19 November 2025 21:17
> To: lingtyp at listserv.linguistlist.org
> Subject: Re: [Lingtyp] Reporting cross-linguistic frequencies
>
> Sorry, I'm confused by two things that have been said.
>
> Michael Cysouw said:
>
> "Any frequency measured depends on many assumptions, including that the current linguistic situation in the world’s languages might very well be different from any situation in the (far) past or (far) future. Even something like estimates of the stable state from a dynamic model of typological transition probabilities (my favourite kind of numbers) is probably just a reflection of the forces influencing languages over the last few thousand years. That is very interesting (I think), but still just one aspect of human language."
>
> Isn't "the stable state from a dynamic model of typological transition probabilities" the same as what Omri talked about in terms of "a hypothetical (and unrealistic) set of independent languages"?
>
> And isn't this the most important indicator in Greenbergian universalist typology? (Of course, one may also be interested in non-universalist questions and use typological notions and results, but this wouldn't be Greenbergian.)
>
> And Östen Dahl said:
>
> "The idea of a language without relatives or neighbours is somewhat reminiscent of “homo economicus” in economics or closer to our concerns, Chomsky’s “ideal speaker-listener in a completely homogeneous speech-community”. Like those phantoms, the isolated isolate can at most serve as a useful temporary construct but can hardly be the final goal of our endeavour."
>
> But nobody has suggested that the "isolated isolates" fiction should be a "goal of our endeavor". The goal is to estimate the probability of a feature appearing in a language regardless of its history. For example, if we want to argue that high elevation makes ejectives more likely (see [Urban & Moran 2021](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0245522) for recent discussion), we need to look at multiple independent cases of ejectives in high-elevation locations. Pointing to a lot of ejectives in the Caucasus is not convincing because ejectives there could be an areal feature that is largely due to contact.
>
> So I don't see any problems with Omri's "isolated isolates" fiction.
>
> Martin
>
> On 19.11.25 16:36, Michael Cysouw via Lingtyp wrote:
>
>> Dear Omri,
>>
>> The real question is what you want to report. There is real value in all numbers that you propose, so simply report them all, explaining how you obtained them.
>>
>> I think that your question arose because of the misconception (in my opinion) that there is something like the “true” frequencies of a typological parameter. They do not exist. Any frequency measured depends on many assumptions, including that the current linguistic situation in the world’s languages might very well be different from any situation in the (far) past or (far) future. Even something like estimates of the stable state from a dynamic model of typological transition probabilities (my favourite kind of numbers) is probably just a reflection of the forces influencing languages over the last few thousand years. That is very interesting (I think), but still just one aspect of human language.
>>
>> best
>>
>> Michael
>>
>> ————————
>>
>> Prof. Dr. Michael Cysouw
>>
>> Forschungszentrum Deutscher Sprachatlas
>>
>> Philipps Universität Marburg
>>
>> Pilgrimstein 16
>>
>> D-35032 Marburg
>>
>> Office: +49-6421-28-22488
>>
>> Secretary: +49-6421-28-22483
>>
>> Email:
>> cysouw at uni-marburg.de
>>
>> Web:
>> www.deutscher-sprachatlas.de/mitarbeiter/cysouw/
>>
>> Web:
>> www.cysouw.de/home/
>>
>> ORCID:
>> orcid.org/0000-0003-3168-4946
>>
>> Standort Biegenstrasse, Gebäude B|05
>>
>> Pilgrimstein 16, Raum 106 (+1/0060)
>>
>> http://www.uni-marburg.de/kontakt/
>>
>> ————————
>>
>>> On 18. Nov 2025, at 10:23, Omri Amiraz via Lingtyp
>>> [<lingtyp at listserv.linguistlist.org>](mailto:lingtyp at listserv.linguistlist.org)
>>> wrote:
>>>
>>> Dear Colleagues,
>>>
>>> I would like to raise the question of how cross-linguistic frequencies of typological features ought to be reported. The issue has been discussed extensively, but I still find some aspects conceptually confusing, so I hope this discussion might be helpful for others as well.
>>>
>>> To make this concrete, consider the order of object and verb (OV, VO, no dominant order). Suppose, for the sake of argument, that we have complete data for every language in Glottolog. This would give us the actual proportion of languages that are OV vs. VO in the present-day world. The core problem, however, is that languages are not independent datapoints, so these actual frequencies also reflect genealogical and areal biases.
>>>
>>> For that reason, it is common practice to report adjusted frequencies instead, either through non-proportional stratified sampling (Dryer 2018) or through statistical bias controls (Becker & Guzmán Naranjo 2025). As far as I understand, both methods aim to estimate something like: If each language were independent (as if every language were an isolate and had no contact with its neighbors), what proportion would be OV vs. VO? In other words, the population being described is not the set of existing languages but a hypothetical (and unrealistic) set of independent languages.
>>>
>>> Now, suppose that the actual frequencies of OV and VO are equal, but the adjusted frequency of OV is higher. In that case, it feels counterintuitive to say that OV is more common cross-linguistically than VO. Perhaps it is clearer to speak in terms of probabilities rather than proportions, given that the population is hypothetical. For instance, we might say: “When genealogical and areal biases are controlled for, the probability of a language being OV is 0.6”. This means that the chance that a randomly sampled language isolate with no contact would be OV is 0.6. By contrast, saying “60% of the world’s languages are OV” when referring to an adjusted frequency seems potentially misleading.
>>>
>>> I would appreciate hearing what others in the community think about how such statistics should ideally be reported.
>>>
>>> Best regards,
>>>
>>> Omri
>>>
>>> References:
>>>
>>> Becker, Laura and Matías Guzmán Naranjo. 2025. Replication and methodological robustness in quantitative typology. Linguistic Typology.
>>>
>>> Dryer, Matthew S. 2018. On the order of demonstrative, numeral, adjective, and noun. Language 94(4), 798-833.
>>>
>>> _______________________________________________
>>>
>>> Lingtyp mailing list
>>>
>>> Lingtyp at listserv.linguistlist.org
>>>
>>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
>>
>
> --
>
> Martin Haspelmath
>
> Max Planck Institute for Evolutionary Anthropology
>
> Deutscher Platz 6
>
> D-04103 Leipzig
>
> https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Moisik-Dedieu 2017 Anatomical biasing and clicks.pdf
Type: application/pdf
Size: 3008304 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/lingtyp/attachments/20251120/6690e351/attachment-0001.pdf>