[Lingtyp] Areal and phylogenetic researcher biases

Sun Sep 29 10:41:57 UTC 2024

Dear Jürgen and others,

I think this is one of the major methodological problems of linguistic 
typology (which, if I remember correctly, has been discussed on this 
list before). There's no single 'correct' way of analysing a language. 
Two linguists working on the same language will often provide very 
different analyses, and both may be right in their own ways. It starts 
with phonology, where you have a lot of degrees of freedom in, for 
instance, minimizing or maximizing phoneme inventories (e.g. by [not] 
introducing phonological domains and features operating on these 
domains), and it gets worse in morphology, specifically if there is 
distributed exponence and other complexities of this type. At the level 
of syntax the impact of the specific theoretical background can be seen, 
for instance, in publications using the UD corpora. These corpora were 
annotated with a specific version of dependency grammar, I think 
essentially for pragmatic reasons (dependency grammar was very popular 
among computational linguists for a while). The theoretical assumptions 
of the annotation model obviously have an impact on the results (just 
think of the very old discussion of what a 'subject' is, represented as 
the 'nsubj' relation in the UD annotations).

For many languages we only have one description, and the linguist 
describing it comes from a specific background or 'school' (and these 
schools are often associated with particular areas and particular 
phylogenetic groupings, introducing further biases of the type you 
mention). Again, the effects are visible at the level of phonology 
already. For example, the Papuan language Idi could be described as 
having just three vowels, or as having nine vowels (perhaps even more), 
depending on your assumptions about phonotactics etc. (There's a 
published analysis of that language, by D. Schokkin, N. Evans, C. Döhler 
and me, but the analysis really reflects some kind of compromise between 
the authors, and it leaves a few non-trivial questions open.)

The specific linguist and their school or background is a source of 
statistical non-independence. Even relying on exactly one description 
per language, and having the data coded by several researchers, often 
leads to low inter-annotator agreement in my experience.
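
To make the agreement point concrete, here is a minimal sketch in Python of Cohen's kappa, a standard chance-corrected agreement measure for two coders. The function name cohen_kappa and the two lists of codings are invented for illustration; they are not taken from any actual coding exercise.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders (Cohen's kappa)."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of items")
    n = len(codes_a)
    # Share of items on which the two coders chose the same value.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's own value frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one and the same value throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codings of eight languages for one binary feature.
coder_1 = ["tensed", "tenseless", "tensed", "tensed",
           "tenseless", "tensed", "tenseless", "tensed"]
coder_2 = ["tensed", "tensed", "tensed", "tenseless",
           "tenseless", "tensed", "tenseless", "tenseless"]

kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

In this made-up case the coders agree on five of eight languages, but half of that agreement is expected by chance, so kappa comes out much lower than the raw agreement rate.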

I think we need to be aware that typological data is behavioural data at 
three layers: (i) language is a behavioural activity, (ii) describing a 
language is a behavioural activity, and (iii) extracting information 
from descriptions is another behavioural activity. Variance occurs at 
all levels and is multiplied in the process from (i) to (iii).
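
One possible reading of "multiplied" can be sketched as follows, under strong simplifying assumptions: each layer independently passes a feature value on unchanged with some probability, and errors at different layers never cancel each other out. The function name and the three probabilities are invented for illustration only.

```python
from math import prod


def chance_unchanged(layer_reliabilities):
    """Probability that a value survives every layer unchanged,
    assuming the layers are independent (an assumption of this sketch)."""
    values = list(layer_reliabilities)
    for r in values:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must lie between 0 and 1")
    return prod(values)


# Hypothetical per-layer reliabilities, not estimates from any study.
layers = {
    "(i) speaker behaviour": 0.90,
    "(ii) description": 0.85,
    "(iii) coding from descriptions": 0.80,
}

overall = chance_unchanged(layers.values())
print(round(overall, 3))
```

Under these assumptions the overall figure is lower than that of the weakest single layer, which is the sense in which the uncertainty compounds from (i) to (iii).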

Approximately determining the amount of variance of that type would be a 
major project. For instance, we could have five undocumented 
(unstandardized) languages described by five linguists each, using data 
from five different speakers per language. Many will think that this 
would be a waste of resources, given the number of (varieties of) 
languages that still await description.

What follows from all this, in my view, is that we need to be careful in 
applying statistical analyses "blindly". Linguistics is not a natural 
science. Given the large amount of inherent variance in typological data 
we linguists should remain in the driver's seat and use quantitative 
typological evidence as an assistance system, being aware of its limits 
and possibilities, rather than take a back seat and let the autopilot drive.

Best,
Volker

Am 28.09.2024 um 20:17 schrieb Juergen Bohnemeyer via Lingtyp:
>
> Dear all – I’m wondering whether anybody has attempted to estimate the 
> size of the following putative effect on descriptive and typological 
> research:
>
> Suppose there is a particular phenomenon in Language L, the known 
> properties of which are equally compatible with an analysis in terms 
> of construction types (comparative concepts) A and B.
>
> Suppose furthermore that L belongs to a language family and/or 
> linguistic area such that A has much more commonly been invoked in 
> descriptions of languages of that family/area than B.
>
> Then to the extent that a researcher attempting to adjudicate between 
> A and B wrt. L (whether in a description of L, in a typological study, 
> or in coding for an evolving typological database) is aware of the 
> prevalence of A-coding/analyses for languages of the family/area in 
> question, that might make them more likely to code/analyze L as 
> exhibiting A as well.
>
> So for example, a researcher who assumes languages of the family/area 
> of L to be typically tenseless may be influenced by this assumption 
> and as a result become (however slightly) more likely to treat L as 
> tenseless as well. In contrast, if she assumes languages of the 
> family/area of L to be typically tensed, that might make her ever so 
> slightly more likely to analyze L also as tensed.
>
> It seems to me that this is a cognitive bias related to, and possibly 
> a case of, essentialism. (And just as in the case of (other forms of) 
> essentialism, the actual cognitive causes/mechanisms of the bias may 
> vary.)
>
> But regardless, my question is, again, has anybody tried to guesstimate 
> to what extent the results of current typological studies may be 
> warped by this kind of researcher bias? (Note that the bias may be 
> affecting both authors of descriptive work and typologists using 
> descriptive work as data, so there is a possible double-whammy effect.)
>
> Thanks! – Juergen
>
> Juergen Bohnemeyer (He/Him)
> Professor, Department of Linguistics
> University at Buffalo
>
> Office: 642 Baldy Hall, UB North Campus
> Mailing address: 609 Baldy Hall, Buffalo, NY 14260
> Phone: (716) 645 0127
> Fax: (716) 645 3825
> Email: jb77 at buffalo.edu
> Web: http://www.acsu.buffalo.edu/~jb77/
>
> Office hours Tu/Th 3:30-4:30pm in 642 Baldy or via Zoom (Meeting ID 
> 585 520 2411; Passcode Hoorheh)
>
> There’s A Crack In Everything - That’s How The Light Gets In
> (Leonard Cohen)
>
> -- 
>
>
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp