[Lingtyp] Reporting cross-linguistic frequencies
Volker Gast
volker.gast at uni-jena.de
Thu Nov 20 10:14:09 UTC 2025
Dear all,
I think 'baseline probability' could be a good term for the probability of
observing OV or VO order in the "isolated isolate".
But there's a conceptual problem when you look at it from the perspective
of the individual. A language user learns the language from someone and
receives a certain input. The assumption of an independent baseline
probability would amount to assuming something like universal grammar that
exists independently of the input. On this view, it would make sense
to model the probability of observing VO or OV in a user's output as
a function of the baseline probability and the ecological conditions of
language acquisition and development.
Or the baseline probability could correspond to the probability of
observing VO or OV at the "initial stage" of a language. The observed
proportions could then be treated as a function of the initial stage,
transition probabilities and the ecological conditions of development. My
problem with this approach is that I believe in a slow-evolution scenario,
probably gesture-first, where grammatical categories did not exist at the
beginning. So at that stage a language can't be either VO or OV.
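
The "initial stage plus transition probabilities" view can be made concrete
with a two-state Markov chain. What follows is only a minimal sketch, and it
rests on several assumptions that are not part of the argument above: word
order is treated as a binary OV/VO state, change happens in discrete steps,
the rates `a` and `b` are invented, and the ecological conditions are left out
entirely. It shows how the expected share of VO after n steps depends on the
initial stage, and that this dependence fades when both changes are possible.

```python
"""Two-state Markov chain over basic word order (OV vs. VO).

Hypothetical illustration only: the rates are invented, time is discrete,
and ecological conditions are not modelled.
"""


def step(p_vo: float, ov_to_vo: float, vo_to_ov: float) -> float:
    """P(VO) one step later, given P(VO) now and the two transition rates."""
    return p_vo * (1.0 - vo_to_ov) + (1.0 - p_vo) * ov_to_vo


def share_vo_after(p_vo_initial: float, n: int, ov_to_vo: float, vo_to_ov: float) -> float:
    """Expected P(VO) after n steps, starting from the initial-stage P(VO)."""
    p = p_vo_initial
    for _ in range(n):
        p = step(p, ov_to_vo, vo_to_ov)
    return p


def stationary_vo(ov_to_vo: float, vo_to_ov: float) -> float:
    """Long-run P(VO); does not depend on the initial stage if both rates > 0."""
    return ov_to_vo / (ov_to_vo + vo_to_ov)


a, b = 0.02, 0.01  # hypothetical per-step rates OV->VO and VO->OV
for start in (0.0, 1.0):  # initial stage: all OV vs. all VO
    print(start, [round(share_vo_after(start, n, a, b), 3) for n in (0, 10, 100, 1000)])
print("stationary P(VO):", round(stationary_vo(a, b), 3))
```

On this toy model the long-run share a/(a+b) depends only on the transition
rates, so the initial stage matters mainly at shallow time depths.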
In any case, from a conceptual point of view I think it is sounder to ask
"What is the probability of observing VO vs. OV in a given individual's
linguistic output?" than to ask "What is the probability of observing OV
vs. VO in a given (grammatical description of a) language?" Perhaps you
can regard grammars as operationalizations of (idealized) language users
(though they are, so to speak, second-order observations). But I agree with
all those who have pointed out that you can't really count languages. You
can count language users, and perhaps speech communities; but they, too,
are hierarchically structured.
Best,
Volker
---
Prof. V. Gast
http://linktype.iaa.uni-jena.de/VG
On Thu, 20 Nov 2025, Matías Guzmán Naranjo via Lingtyp wrote:
> I'll jump in with some thoughts.
>
>
> - Dryer's method and ours aim at doing basically the same thing: quantifying
> what's "left" after removing genetic and areal bias.
>
> - Whether you should call them proportions or adjusted frequencies... I'm not
> sure that it matters that much? As long as you understand how they were
> calculated...
>
> - How you want to interpret this "what's left" is debatable, I guess, but I
> don't think I agree with Jürgen. As far as I can tell it should be compatible
> with something along the lines of an "isolated isolate" as described by
> Martin. You can also see them as 'universal' preferences (more or less the
> same thing?).
>
> - "the probability of a random language having a certain property depends on
> (or is influenced by, or varies with, etc.) it being related to certain other
> languages, or being spoken (or signed) in a particular area" -> In our
> approach we assume that the probability of a language L having some feature
> value F depends on three things: 1) its relatedness to other languages, 2)
> its contact with other languages, 3) some universal preference for F. Kind of
> the point of what we do is that we try to estimate each of these factors. [We
> can add more factors and more structure, but that's the most basic model]
>
> - You can quantify the contribution of the phylogenetic component and the
> areal component(s) with our techniques, but this is a bit tricky because
> there is unavoidable overlap in the information each one contains. These
> measures also have a different meaning from the adjusted frequency and can't
> be used as a replacement for it; you can use them in addition to it.
>
>
> Matías
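
The three-factor description above can be written down as a logistic model.
This is a hypothetical illustration, not the implementation used by Matías and
colleagues: it assumes that relatedness, contact and a universal preference
combine additively on the logit scale, the names `alpha`, `family_effects` and
`area_effects` are made up, all effect sizes are invented, and it only shows the
generative side (nothing is estimated from data). Under these assumptions the
"adjusted" probability is inv_logit(alpha), i.e. the probability with the
relatedness and contact effects set to zero.

```python
import math
from statistics import mean


def inv_logit(x: float) -> float:
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_feature(alpha: float, family_eff: float, area_eff: float) -> float:
    """P(language has F): universal preference (alpha) plus relatedness
    (family_eff) plus contact (area_eff), added on the logit scale."""
    return inv_logit(alpha + family_eff + area_eff)


# Hypothetical sample: one (family, area) pair per language.
# Family "A" is large and strongly favours F; all numbers are invented.
languages = [("A", "X")] * 8 + [("B", "Y"), ("C", "Y"), ("D", "Z"), ("E", "Z")]
family_effects = {"A": 2.0, "B": -0.5, "C": 0.0, "D": 0.5, "E": -1.0}
area_effects = {"X": 0.5, "Y": 0.0, "Z": -0.5}
alpha = 0.0  # no universal preference either way

# Expected share of F-languages in the raw sample (dominated by family A).
expected_raw = mean(prob_feature(alpha, family_effects[f], area_effects[a])
                    for f, a in languages)

# Crude comparison: keep only the first language of each family.
first_per_family = {}
for fam, area in languages:
    first_per_family.setdefault(fam, area)
balanced = mean(prob_feature(alpha, family_effects[f], area_effects[a])
                for f, a in first_per_family.items())

# "Adjusted" probability in this sketch: relatedness and contact effects set to 0.
adjusted = inv_logit(alpha)

print(f"raw: {expected_raw:.3f}  one-per-family: {balanced:.3f}  adjusted: {adjusted:.3f}")
```

In real data the family and area effects are unknown and have to be estimated
jointly, and because related languages tend to be spoken near each other the
two components overlap, which is the difficulty mentioned in the last point
above. The one-per-family average is only a crude contrast, not Dryer's method.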
>
>
>
> On 20/11/25 at 9:36, Omri Amiraz via Lingtyp wrote:
>> Dear all,
>> I agree with Ian that, in addition to genealogical and areal biases, the
>> very question of what counts as a language versus a dialect is partly
>> subjective. This makes actual frequencies even more problematic, since we
>> would obtain different results depending on whether we treat Wu Chinese as
>> one language or as thirty separate languages, as Ian pointed out.
>> Juergen wrote: "We can empirically assess the extent to which the
>> probability of a random language having a certain property depends on (or
>> is influenced by, or varies with, etc.) it being related to certain other
>> languages, or being spoken (or signed) in a particular area."
>>
>> I wonder whether it might be useful to have a measure of the genealogical
>> and areal spread of a feature, essentially quantifying how broadly it is
>> distributed across families and regions in the present-day world. Such a
>> measure might be more straightforward to interpret than an adjusted
>> frequency/probability, since it is not clear whether the described
>> population is a hypothetical set of isolated isolates or something else.
>>
>> Is anyone aware of an existing metric that captures genealogical or areal
>> spread in this way?
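
One simple way to operationalize "genealogical and areal spread" is sketched
below. It is a hypothetical measure, not an established metric from the
literature: `spread` reports the share of families and of areas in which a
feature value is attested at least once, plus the normalized entropy of its
occurrences across all families in the sample (1 = evenly spread over all
families, 0 = confined to one family). The family and area labels and the
sample itself are invented placeholders.

```python
import math
from collections import Counter


def spread(records, value):
    """Hypothetical spread measures for one feature value.

    records: iterable of (family, area, feature_value) triples, one per language.
    """
    records = list(records)
    families = {f for f, _, _ in records}
    areas = {a for _, a, _ in records}
    hits = [(f, a) for f, a, v in records if v == value]
    if not hits:
        return {"family_share": 0.0, "area_share": 0.0, "family_evenness": 0.0}

    # Shannon entropy of the value's occurrences across families,
    # normalized by the maximum possible entropy for this sample.
    per_family = Counter(f for f, _ in hits)
    n = len(hits)
    entropy = sum((c / n) * math.log(n / c) for c in per_family.values())
    evenness = entropy / math.log(len(families)) if len(families) > 1 else 0.0

    return {
        "family_share": len(per_family) / len(families),
        "area_share": len({a for _, a in hits}) / len(areas),
        "family_evenness": evenness,
    }


# Invented toy sample with placeholder family (F1-F4) and area (A1-A3) labels.
sample = [
    ("F1", "A1", "VO"), ("F1", "A1", "VO"), ("F1", "A1", "OV"),
    ("F2", "A1", "OV"), ("F3", "A2", "VO"), ("F4", "A3", "VO"),
]
print(spread(sample, "VO"))
print(spread(sample, "OV"))
```

The family and area shares count each family or area once, however many
languages it contributes, so splitting one language into thirty within the
same family (as in the Wu Chinese example) leaves them unchanged; the evenness
score, which weights by the number of languages, would shift.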
>>
>> Best,
>> Omri
>>
>> _______________________________________________
>> Lingtyp mailing list
>> Lingtyp at listserv.linguistlist.org
>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp