[Lingtyp] Reporting cross-linguistic frequencies

Thu Nov 20 09:01:04 UTC 2025

I'll jump in with some thoughts.

- Dryer's method and ours aim at doing basically the same thing: 
quantifying what's "left" after removing genetic and areal bias.

- Whether you should call them proportions or adjusted frequencies... 
I'm not sure that it matters that much? As long as you understand how 
they were calculated...

- How you want to interpret this "what's left" is debatable, I guess, 
but I don't think I agree with Jürgen. As far as I can tell it should be 
compatible with something along the lines of an "isolated isolate" as 
described by Martin. You can also see them as 'universal' preferences 
(more or less the same thing?).

- "the probability of a random language having a certain property 
depends on (or is influenced by, or varies with, etc.) it being related 
to certain other languages, or being spoken (or signed) in a particular 
area" -> In our approach we assume that the probability of a language L 
having some feature value F depends on three things: 1) its relatedness 
to other languages, 2) its contact with other languages, 3) some universal 
preference for F. Kind of the point of what we do is that we try to 
estimate each of these factors. [We can add more factors and more 
structure, but that's the most basic model.]

- You can quantify the contribution of the phylogenetic component and 
the areal component(s) with our techniques, but this is a bit tricky 
because there is unavoidable overlap in the information each one 
contains. These measures also mean something different from the adjusted 
frequencies and can't replace them; you can report them in addition.
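
To make the three-factor idea and the "isolated isolate" reading a bit 
more concrete, here is a toy sketch. It is only an illustration, not the 
actual estimation procedure: the additive log-odds form, the function 
names, and all the numbers are assumptions made up for this example.

```python
import math

def sigmoid(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def prob_feature(universal: float, family: float, area: float) -> float:
    """P(language L has feature value F), assuming three additive
    log-odds terms: universal preference, family effect, area effect."""
    return sigmoid(universal + family + area)

# Hypothetical values on the log-odds scale (invented, not estimates).
universal = -0.5
family_effects = {"FamA": 2.0, "FamB": -1.0}
area_effects = {"Area1": 0.8, "Area2": 0.0}

# A language from FamA spoken in Area1.
p_lang = prob_feature(universal, family_effects["FamA"],
                      area_effects["Area1"])

# An "isolated isolate": no relatives and no contact, so both
# offsets are zero and only the universal preference is left.
p_isolate = prob_feature(universal, 0.0, 0.0)

print(round(p_lang, 3), round(p_isolate, 3))
```

In this toy setup the adjusted frequency corresponds to p_isolate, 
i.e. what remains once the family and area terms are set to zero.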

Matías

On 20/11/25 at 9:36, Omri Amiraz via Lingtyp wrote:
> Dear all,
> I agree with Ian that, in addition to genealogical and areal biases, 
> the very question of what counts as a language versus a dialect is 
> partly subjective. This makes actual frequencies even more 
> problematic, since we would obtain different results depending on 
> whether we treat Wu Chinese as one language or as thirty separate 
> languages, as Ian pointed out.
> Juergen wrote: "We can empirically assess the extent to which the 
> probability of a random language having a certain property depends on 
> (or is influenced by, or varies with, etc.) it being related to 
> certain other languages, or being spoken (or signed) in a particular 
> area."
>
> I wonder whether it might be useful to have a measure of the 
> genealogical and areal spread of a feature, essentially quantifying 
> how broadly it is distributed across families and regions in the 
> present-day world. Such a measure might be more straightforward to 
> interpret than an adjusted frequency/probability, since it is not 
> clear whether the described population is a hypothetical set of 
> isolated isolates or something else.
>
> Is anyone aware of an existing metric that captures genealogical or 
> areal spread in this way?
>
> Best,
> Omri
>
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp