[Lingtyp] Areal and phylogenetic researcher biases

Mon Sep 30 21:04:59 UTC 2024

I have to disagree with the point that "the *classifications* should not
be different if the different linguists have access to the same
information".

In many Himalayan languages low tone is associated with breathy voice, and
voicing is (stochastically) predictable from tone.
The same language can then be analysed (and has been analysed) by linguists from
different descriptive backgrounds as having

a. ph vs. p consonant manners, with contrastive tone,
or
b. ph vs. p vs. b vs. bh consonant manners, with no contrastive tone.

Under a., the language is classified as having tone.
Under b., the language is not classified as having tone.
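
As a minimal sketch of the point, the toy example below runs both analyses over the same observations and then applies one classification rule. The observations, feature labels, function names, and the "at least two contrastive tones" rule are all hypothetical illustrations, not Tamang data and not any database's actual coding criterion.

```python
# Hypothetical observations: (onset, phonation, pitch register).
# In this toy data, breathy voice always co-occurs with low pitch.
OBSERVATIONS = [
    ("p", "modal", "high"),
    ("ph", "modal", "high"),
    ("p", "breathy", "low"),
    ("ph", "breathy", "low"),
]


def analysis_a(obs):
    """Analysis a.: two onset manners (ph vs. p), with contrastive tone."""
    onsets = {onset for onset, _, _ in obs}
    tones = {pitch for _, _, pitch in obs}
    return {"onsets": onsets, "tones": tones}


def analysis_b(obs):
    """Analysis b.: four onset manners (ph, p, b, bh), no contrastive tone."""
    # The breathy/low series is reanalysed as a voiced onset series.
    relabel = {("p", "breathy"): "b", ("ph", "breathy"): "bh"}
    onsets = {relabel.get((onset, phon), onset) for onset, phon, _ in obs}
    return {"onsets": onsets, "tones": set()}


def has_tone(analysis):
    """Assumed rule: tonal if the analysis posits two or more tones."""
    return len(analysis["tones"]) >= 2


print(has_tone(analysis_a(OBSERVATIONS)))
print(has_tone(analysis_b(OBSERVATIONS)))
```

The classification rule is identical in both calls; only the analysis it is applied to differs, and that is enough to change the result.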

I'm thinking of Tamang.

Mazaudon, Martine. 1973. Phonologie Tamang: Etude phonologique du dialecte
tamang de Risiangku (langue tibéto-birmane du Népal). Paris: Centre
National de la Recherche Scientifique, Société d’Études Linguistiques et
Anthropologiques de France.

Michaud, Alexis, and Martine Mazaudon. 2006. Pitch and voice quality
characteristics of the lexical word-tones of Tamang, as compared with level
tones (Naxi data) and pitch-plus-voice-quality tones (Vietnamese data).
*Proceedings of Speech Prosody 2006, Dresden*, 823-826. Available online at:
https://sprosig.org/sp2006/contents/papers/PS7-18_0137.pdf.

Poudel, Kedar Prasad. 2006. *Dhankute Tamang Grammar*. Munich: Lincom
Europa.

-Mark

On Mon, 30 Sept 2024 at 23:12, Martin Haspelmath via Lingtyp <
lingtyp at listserv.linguistlist.org> wrote:

> Of course, "areal/phylogenetic researcher bias (APRB)" exists, and during
> the Grambank coding, I often heard Hedvig Skirgård talk about it as a
> potential issue. (I don't remember if it was addressed in a specific way,
> though.)
>
> I don't know if it can be measured somehow (given the enormous diversity
> of researcher traditions, I'm a bit skeptical), but I think it can be
> mitigated if we are aware that the purpose of comparative concepts in
> typology is NOT to provide *analyses* – rather, it is to enable us to
> *classify* languages.
>
> Volker Gast rightly says: "Two linguists working on the same language
> will often provide very different analyses, and both may be right in their
> own ways."
>
> But while the *analyses* may well be different (because of the well-known
> non-uniqueness problem first highlighted by Yuen-Ren Chao in 1934:
> https://dlc.hypotheses.org/3381), the *classifications* should not be
> different if the different linguists have access to the same information.
>
> I wrote about this in the following blogpost, where I note that the
> "difficulties of classification" that typologists talk about are typically
> due to the unclarity of the comparative concepts, not necessarily to lack
> of data: https://dlc.hypotheses.org/2528.
>
> In practice, of course, different linguists do not have access to the same
> kinds of data, and subjectiveness cannot be excluded entirely. However, if
> we are careful to distinguish between analyses/descriptions (at the
> p-level) and classifications and cross-linguistic generalizations (at the
> g-level), some problems will go away.
>
> Best,
>
> Martin
> On 29.09.24 12:41, Volker Gast via Lingtyp wrote:
>
> Dear Jürgen and others,
>
> I think this is one of the major methodological problems of linguistic
> typology (which, if I remember correctly, has been discussed on this list
> before). There's no single 'correct' way of analysing a language. Two
> linguists working on the same language will often provide very different
> analyses, and both may be right in their own ways. It starts with
> phonology, where you have a lot of degrees of freedom in, for instance,
> minimizing or maximizing phoneme inventories (e.g. by [not] introducing
> phonological domains and features operating on these domains), and it gets
> worse in morphology, specifically if there is distributed exponence and
> other complexities of this type. At the level of syntax the impact of the
> specific theoretical background can be seen, for instance, in publications
> using the UD corpora. These corpora were annotated with a specific version
> of dependency grammar, I think essentially for pragmatic reasons
> (dependency grammar was very popular among computational linguists for a
> while). The theoretical assumptions of the annotation model obviously have
> an impact on the results (just think of the very old discussion of what a
> 'subject' is, represented as the 'nsubj' relation in the UD annotations).
>
> For many languages we only have one description, and the linguist
> describing it comes from a specific background or 'school' (and these
> schools are often associated with particular areas and particular
> phylogenetic groupings, introducing further biases of the type you
> mention). Again, the effects are visible at the level of phonology already.
> For example, the Papuan language Idi could be described as having just
> three vowels, or as having nine vowels (perhaps even more), depending on
> your assumptions about phonotactics etc. (There's a published analysis of
> that language, by D. Schokkin, N. Evans, C. Döhler and me, but the analysis
> really reflects some kind of compromise between the authors, and it leaves
> a few non-trivial questions open.)
>
> The specific linguist and their school or background is a source of
> statistical non-independence. Even relying on exactly one description per
> language, and having the data coded by several researchers, often leads to
> low inter-annotator agreement in my experience.
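
One common way to put a number on inter-annotator agreement is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is only an illustration: the two lists of codes are invented, the function name is hypothetical, and nothing in the thread says that this particular statistic was used.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who coded the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(codes_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one and the same category throughout; kappa is
        # undefined here, and returning 1.0 is an assumed convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented codes for six languages, from one description per language.
coder_1 = ["tone", "tone", "no tone", "no tone", "tone", "no tone"]
coder_2 = ["tone", "no tone", "no tone", "tone", "tone", "no tone"]
print(round(cohen_kappa(coder_1, coder_2), 3))  # 0.333
```

Here the coders agree on four of six languages, but half of that agreement is expected by chance, so kappa is only about 0.33.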
>
> I think we need to be aware that typological data is behavioural data at
> three layers: (i) language is a behavioural activity, (ii) describing a
> language is a behavioural activity, and (iii) extracting information from
> descriptions is another behavioural activity. Variance occurs at all levels
> and is multiplied in the process from (i) to (iii).
>
> Approximately determining the amount of variance of that type would be a
> major project. For instance, we could have five undocumented
> (unstandardized) languages described by five linguists each, using data
> from five different speakers per language. Many will think that this would
> be a waste of resources, given the number of (varieties of) languages that
> still await description.
>
> What follows from all this, in my view, is that we need to be careful in
> applying statistical analyses "blindly". Linguistics is not a natural
> science. Given the large amount of inherent variance in typological data we
> linguists should remain in the driver's seat and use quantitative
> typological evidence as an assistance system, being aware of its limits and
> possibilities, rather than take a back seat and let the autopilot drive.
>
> Best,
> Volker (Gast)
>
>
> Am 28.09.2024 um 20:17 schrieb Juergen Bohnemeyer via Lingtyp:
>
> Dear all – I’m wondering whether anybody has attempted to estimate the
> size of the following putative effect on descriptive and typological
> research:
>
>
>
> Suppose there is a particular phenomenon in Language L, the known
> properties of which are equally compatible with an analysis in terms of
> construction types (comparative concepts) A and B.
>
>
>
> Suppose furthermore that L belongs to a language family and/or linguistic
> area such that A has much more commonly been invoked in descriptions of
> languages of that family/area than B.
>
>
>
> Then to the extent that a researcher attempting to adjudicate between A
> and B wrt. L (whether in a description of L, in a typological study, or in
> coding for an evolving typological database) is aware of the prevalence of
> A-coding/analyses for languages of the family/area in question, that might
> make them more likely to code/analyze L as exhibiting A as well.
>
>
>
> So for example, a researcher who assumes languages of the family/area of L
> to be typically tenseless may be influenced by this assumption and as a
> result become (however slightly) more likely to treat L as tenseless as
> well. In contrast, if she assumes languages of the family/area of L to be
> typically tensed, that might make her ever so slightly more likely to
> analyze L also as tensed.
>
>
>
> It seems to me that this is a cognitive bias related to, and possibly a
> case of, essentialism. (And just as in the case of (other forms of)
> essentialism, the actual cognitive causes/mechanisms of the bias may vary.)
>
>
>
> But regardless, my question is, again, has anybody tried to guesstimate to
> what extent the results of current typological studies may be warped by
> this kind of researcher bias? (Note that the bias may be affecting both
> authors of descriptive work and typologists using descriptive work as data,
> so there is a possible double-whammy effect.)
>
>
>
> Thanks! – Juergen
>
>
>
>
>
> Juergen Bohnemeyer (He/Him)
> Professor, Department of Linguistics
> University at Buffalo
>
> --
> Martin Haspelmath
> Max Planck Institute for Evolutionary Anthropology
> Deutscher Platz 6
> D-04103 Leipzig
> https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/
>