[Lingtyp] Areal and phylogenetic *researcher* biases
Volker Gast
volker.gast at uni-jena.de
Sun Sep 29 10:41:57 UTC 2024
Dear Jürgen and others,
I think this is one of the major methodological problems of linguistic
typology (which, if I remember correctly, has been discussed on this
list before). There's no single 'correct' way of analysing a language.
Two linguists working on the same language will often provide very
different analyses, and both may be right in their own ways. It starts
with phonology, where you have a lot of degrees of freedom in, for
instance, minimizing or maximizing phoneme inventories (e.g. by [not]
introducing phonological domains and features operating on these
domains), and it gets worse in morphology, specifically if there is
distributed exponence and other complexities of this type. At the level
of syntax the impact of the specific theoretical background can be seen,
for instance, in publications using the UD corpora. These corpora were
annotated with a specific version of dependency grammar, I think
essentially for pragmatic reasons (dependency grammar was very popular
among computational linguists for a while). The theoretical assumptions
of the annotation model obviously have an impact on the results (just
think of the very old discussion of what a 'subject' is, represented as
the 'nsubj' relation in the UD annotations).
For many languages we only have one description, and the linguist
describing it comes from a specific background or 'school' (and these
schools are often associated with particular areas and particular
phylogenetic groupings, introducing further biases of the type you
mention). Again, the effects are visible at the level of phonology
already. For example, the Papuan language Idi could be described as
having just three vowels, or as having nine vowels (perhaps even more),
depending on your assumptions about phonotactics etc. (There's a
published analysis of that language, by D. Schokkin, N. Evans, C. Döhler
and me, but the analysis really reflects some kind of compromise between
the authors, and it leaves a few non-trivial questions open.)
The specific linguist and their school or background is a source of
statistical non-independence. Even relying on exactly one description
per language, and having the data coded by several researchers, often
leads to low inter-annotator agreement in my experience.
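To make "inter-annotator agreement" concrete, here is a minimal sketch of Cohen's kappa for two coders who extract the same feature from the same set of descriptions. Kappa is one common chance-corrected agreement measure. It is not taken from this thread, and the feature values and both codings below are invented purely for illustration.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, derived from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one and the same label throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Invented codings of ten languages for a hypothetical feature "tense".
coder_1 = ["tensed", "tensed", "tenseless", "tensed", "tenseless",
           "tensed", "tenseless", "tenseless", "tensed", "tensed"]
coder_2 = ["tensed", "tenseless", "tenseless", "tensed", "tensed",
           "tensed", "tenseless", "tensed", "tensed", "tenseless"]

kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

In this made-up case the coders agree on six of ten languages, but because both use "tensed" for six items, about half of that agreement is expected by chance, so kappa is far lower than the raw 60% suggests.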
I think we need to be aware that typological data is behavioural data at
three layers: (i) language is a behavioural activity, (ii) describing a
language is a behavioural activity, and (iii) extracting information
from descriptions is another behavioural activity. Variance occurs at
all levels and is multiplied in the process from (i) to (iii).
Approximately determining the amount of variance of that type would be a
major project. For instance, we could have five undocumented
(unstandardized) languages described by five linguists each, using data
from five different speakers per language. Many will think that this
would be a waste of resources, given the number of (varieties of)
languages that still await description.
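What such a study could measure can be sketched with a toy simulation. Everything in it is an assumption made for illustration: the property is a single numeric score, every linguist works with the same five speakers of a language, and speaker and linguist effects are independent, additive Gaussian noise. The post does not commit to any such model, and the function and parameter names are hypothetical.

```python
import random
import statistics

def simulate_design(n_languages=5, n_linguists=5, n_speakers=5,
                    sd_speaker=1.0, sd_linguist=1.0, seed=0):
    """Simulate one numeric score per (language, linguist) description.

    Toy assumption: each linguist reports the mean over the language's
    speakers, shifted by that linguist's own analytical deviation.
    """
    rng = random.Random(seed)
    results = {}
    for lang in range(n_languages):
        true_value = rng.uniform(3, 9)  # invented "true" value of the language
        # Layer (i): speakers deviate from the language-level value.
        speakers = [true_value + rng.gauss(0, sd_speaker)
                    for _ in range(n_speakers)]
        speaker_mean = statistics.mean(speakers)
        # Layer (ii): each linguist's description adds its own deviation.
        results[lang] = [speaker_mean + rng.gauss(0, sd_linguist)
                         for _ in range(n_linguists)]
    return results

def between_linguist_sd(results):
    """Pooled within-language standard deviation across descriptions."""
    variances = [statistics.variance(descs) for descs in results.values()]
    return statistics.mean(variances) ** 0.5

results = simulate_design(sd_linguist=2.0)
print(round(between_linguist_sd(results), 2))
```

With real data the quantity of interest would be the estimate itself: how much descriptions of the same language differ from one another once the language is held constant. A third layer for coders extracting values from the descriptions could be added in the same way.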
What follows from all this, in my view, is that we need to be careful in
applying statistical analyses "blindly". Linguistics is not a natural
science. Given the large amount of inherent variance in typological data
we linguists should remain in the driver's seat and use quantitative
typological evidence as an assistance system, being aware of its limits
and possibilities, rather than take a back seat and let the autopilot drive.
Best,
Volker
Am 28.09.2024 um 20:17 schrieb Juergen Bohnemeyer via Lingtyp:
>
> Dear all – I’m wondering whether anybody has attempted to estimate the
> size of the following putative effect on descriptive and typological
> research:
>
> Suppose there is a particular phenomenon in Language L, the known
> properties of which are equally compatible with an analysis in terms
> of construction types (comparative concepts) A and B.
>
> Suppose furthermore that L belongs to a language family and/or
> linguistic area such that A has much more commonly been invoked in
> descriptions of languages of that family/area than B.
>
> Then to the extent that a researcher attempting to adjudicate between
> A and B wrt. L (whether in a description of L, in a typological study,
> or in coding for an evolving typological database) is aware of the
> prevalence of A-coding/analyses for languages of the family/area in
> question, that might make them more likely to code/analyze L as
> exhibiting A as well.
>
> So for example, a researcher who assumes languages of the family/area
> of L to be typically tenseless may be influenced by this assumption
> and as a result become (however slightly) more likely to treat L as
> tenseless as well. In contrast, if she assumes languages of the
> family/area of L to be typically tensed, that might make her ever so
> slightly more likely to analyze L also as tensed.
>
> It seems to me that this is a cognitive bias related to, and possibly
> a case of, essentialism. (And just as in the case of (other forms of)
> essentialism, the actual cognitive causes/mechanisms of the bias may
> vary.)
>
> But regardless, my question is, again, has anybody tried to guesstimate
> to what extent the results of current typological studies may be
> warped by this kind of researcher bias? (Note that the bias may be
> affecting both authors of descriptive work and typologists using
> descriptive work as data, so there is a possible double-whammy effect.)
>
> Thanks! – Juergen
>
> Juergen Bohnemeyer (He/Him)
> Professor, Department of Linguistics
> University at Buffalo
>
> Office: 642 Baldy Hall, UB North Campus
> Mailing address: 609 Baldy Hall, Buffalo, NY 14260
> Phone: (716) 645 0127
> Fax: (716) 645 3825
> Email: jb77 at buffalo.edu
> Web: http://www.acsu.buffalo.edu/~jb77/
>
> Office hours Tu/Th 3:30-4:30pm in 642 Baldy or via Zoom (Meeting ID
> 585 520 2411; Passcode Hoorheh)
>
> There’s A Crack In Everything - That’s How The Light Gets In
> (Leonard Cohen)
>