[Lingtyp] Reporting cross-linguistic frequencies

Thu Nov 20 12:37:58 UTC 2025

Dear colleagues,

As many of you know, I am very much in favour of human intuition and of considering alternative solutions. This is why I often wonder why discussions of this kind rarely ask how colleagues working in language documentation approach classification questions.

It might be illuminating to hear from experienced fieldworkers — for example, colleagues such as Keren Rice or Marianne Mithun, among many others — how they determine OV/VO order in practice: at what stage in documentation they feel confident assigning a language to one group or the other, and what kinds of evidence they rely on. Their approach may or may not align with the assumptions built into adjusted-frequency methods, but precisely for that reason, their insights could be highly valuable.

Best,

Stela

> On 20.11.2025, at 11:14, Volker Gast via Lingtyp <lingtyp at listserv.linguistlist.org> wrote:
> 
> 
> Dear all,
> 
> I think 'baseline probability' could be a good term for the probability of observing OV or VO order in the "isolated isolate".
> 
> But there's a conceptual problem when you look at it from the perspective of the individual. A language user learns the language from someone and receives a certain input. The assumption of an independent baseline probability would amount to assuming something like universal grammar that exists independently of the input. On this assumption it would make sense to assume that the probability of observing VO or OV in a user's output is a function of the baseline probability and the ecological conditions of language acquisition and development.
> 
> Or the baseline probability could correspond to the probability of observing VO or OV at the "initial stage" of a language. The observed proportions could then be treated as a function of the initial stage, transition probabilities and the ecological conditions of development. My problem with this approach is that I believe in a slow-evolution scenario, probably gesture-first, where grammatical categories did not exist at the beginning. So at that stage a language can't be either VO or OV.
> 
> In any case, from a conceptual point of view I think it is sounder to ask "What is the probability of observing VO vs. OV in a given individual's linguistic output?" than to ask "What is the probability of observing OV vs. VO in a given (grammatical description of a) language?". Perhaps you can regard grammars as operationalizations of (idealized) language users (though they are second-order observations, so to speak). But I agree with all those who have pointed out that you can't really count languages. You can count language users, and perhaps speech communities; but they, too, are hierarchically structured.
> 
> Best,
> Volker
> 
> ---
> Prof. V. Gast
> http://linktype.iaa.uni-jena.de/VG
> 
> On Thu, 20 Nov 2025, Matías Guzmán Naranjo via Lingtyp wrote:
> 
>> I'll jump in with some thoughts.
>> 
>> 
>> - Dryer's method and ours aim at doing basically the same thing: quantifying what's "left" after removing genetic and areal bias.
>> 
>> - Whether you should call them proportions or adjusted frequencies... I'm not sure that it matters that much? As long as you understand how they were calculated...
>> 
>> - How you want to interpret this "what's left" is debatable, I guess, but I don't think I agree with Jürgen. As far as I can tell it should be compatible with something along the lines of an "isolated isolate" as described by Martin. You can also see them as 'universal' preferences (more or less the same thing?).
>> 
>> - "the probability of a random language having a certain property depends on (or is influenced by, or varies with, etc.) it being related to certain other languages, or being spoken (or signed) in a particular area" -> In our approach we assume that the probability of a language L having some feature value F depends on three things: 1) its relatedness to other languages, 2) its contact with other languages, 3) some universal preference for F. Kind of the point of what we do is that we try to estimate each of these factors. [We can add more factors and more structure, but that's the most basic model]
>> 
>> - You can quantify the contribution of the phylogenetic component and the areal component(s) with our techniques, but this is a bit tricky because there is unavoidable overlap in the information each one contains. These measures also have a different meaning than the adjusted frequencies and can't be used as a replacement for them; you can use them in addition.
>> 
>> 
>> Matías
>> 
>> 
>> 
>> El 20/11/25 a las 9:36, Omri Amiraz via Lingtyp escribió:
>>> Dear all,
>>> I agree with Ian that, in addition to genealogical and areal biases, the
>>> very question of what counts as a language versus a dialect is partly
>>> subjective. This makes actual frequencies even more problematic, since we
>>> would obtain different results depending on whether we treat Wu Chinese as
>>> one language or as thirty separate languages, as Ian pointed out.
>>> Juergen wrote: "We can empirically assess the extent to which the
>>> probability of a random language having a certain property depends on (or
>>> is influenced by, or varies with, etc.) it being related to certain other
>>> languages, or being spoken (or signed) in a particular area."
>>> 
>>> I wonder whether it might be useful to have a measure of the genealogical
>>> and areal spread of a feature, essentially quantifying how broadly it is
>>> distributed across families and regions in the present-day world. Such a
>>> measure might be more straightforward to interpret than an adjusted
>>> frequency/probability, since it is not clear whether the described
>>> population is a hypothetical set of isolated isolates or something else.
>>> 
>>> Is anyone aware of an existing metric that captures genealogical or areal
>>> spread in this way?
>>> 
>>> Best,
>>> Omri
>>> 
>>> _______________________________________________
>>> Lingtyp mailing list
>>> Lingtyp at listserv.linguistlist.org
>>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
>> 
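
For readers following the quoted thread, the basic model Matías describes (the probability of feature value F depending on relatedness, contact, and a universal preference) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the additive log-odds form, the function names, and the six-language sample are hypothetical, and the family-averaged proportion is a deliberately crude stand-in, not Dryer's method and not the Bayesian models discussed in the thread. It only shows why a raw count and a bias-adjusted figure can diverge when one family dominates the sample.

```python
import math
from collections import defaultdict


def inv_logit(x: float) -> float:
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def feature_probability(universal: float, family: float, area: float) -> float:
    """P(language has F) under an ASSUMED additive log-odds model.

    universal: log-odds of F for an "isolated isolate" (baseline preference)
    family:    hypothetical offset contributed by genealogical relatedness
    area:      hypothetical offset contributed by contact
    """
    return inv_logit(universal + family + area)


# Hypothetical toy sample: (language, family, has_OV). Not real data.
SAMPLE = [
    ("a1", "A", True), ("a2", "A", True), ("a3", "A", True), ("a4", "A", True),
    ("b1", "B", False),
    ("c1", "C", False),
]


def raw_proportion(sample) -> float:
    """Share of languages with OV, counting every language once."""
    return sum(ov for _, _, ov in sample) / len(sample)


def family_averaged_proportion(sample) -> float:
    """Share of OV after giving each family equal weight (crude adjustment)."""
    by_family = defaultdict(list)
    for _, family, ov in sample:
        by_family[family].append(ov)
    per_family = [sum(values) / len(values) for values in by_family.values()]
    return sum(per_family) / len(per_family)


if __name__ == "__main__":
    print("raw proportion of OV:     ", round(raw_proportion(SAMPLE), 3))
    print("family-averaged proportion:", round(family_averaged_proportion(SAMPLE), 3))
    # With no family or areal offset, only the baseline preference remains.
    print("baseline only:            ", feature_probability(0.0, 0.0, 0.0))
    # A positive family offset raises the probability for members of that family.
    print("with family offset of 2:  ", round(feature_probability(0.0, 2.0, 0.0), 3))
```

In this toy sample the four languages of family A pull the raw proportion well above the family-averaged one, which is the kind of genealogical bias the adjusted figures in the thread are meant to remove. The sketch does not address the areal overlap or the language-versus-dialect counting problem raised above.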
