[Lingtyp] Areal and phylogenetic *researcher* biases
Martin Haspelmath
martin_haspelmath at eva.mpg.de
Tue Oct 1 07:54:26 UTC 2024
Thanks, Mark, for bringing up this concrete example! Indeed, such
questions often arise in comparative work, both in phonology and
morphosyntax. But I think the answer is always the same: Comparison
cannot be in terms of structural CONTRASTS, but must ultimately be in
terms of (phonetic and conceptual-functional) SUBSTANCE.
I can recommend the following two articles. The first deals with
meanings and diachrony, though Bybee has also argued for phonetic
substance as key to understanding phonology. The second is more general.
Bybee, Joan L. 1988. Semantic substance vs. contrast in the development
of grammatical meaning. /Berkeley Linguistics Society/ 14. 247–264.
Boye, Kasper & Engberg-Pedersen, Elisabeth. 2016. Substance and
structure in linguistics. /Acta Linguistica Hafniensia/ 48(1). 5–6.
(doi:10.1080/03740463.2016.1202014)
Specifically for phonology, there is a 2018 book on typology edited by
Larry Hyman and Frans Plank
(https://www.degruyter.com/document/doi/10.1515/9783110451931/html),
which includes papers by Kiparsky and Maddieson that discuss the
conceptual foundations of phonological comparison.
Kiparsky says that "there are no non-analytic universals of language.
All universals are analytic, and their validity often turns on a set of
critical cases where different solutions can be and have been
entertained", though confusingly, he says "descriptive" instead of
"non-analytic". The issues are discussed further in my 2019 blogpost:
https://dlc.hypotheses.org/1817
(I admit, however, that I'm not sure what exactly this means for the
typology of "tone" and "obstruent (breathy) voicing". It may ultimately
mean that traditional typologies in terms of these notions need to be
revised quite thoroughly.)
Best,
Martin (Haspelmath)
On 30.09.24 23:04, Mark Donohue wrote:
> I have to disagree about the point that "the *classifications* should
> not be different if the different linguists have access to the same
> information".
>
> In many Himalayan languages low tone is associated with breathy voice,
> and voicing is (stochastically) predictable from tone.
> The one language can then be analysed (has been analysed) by linguists
> from different descriptive backgrounds as having
>
> a. ph vs. p consonant manners, with contrastive tone,
> or
> b. ph vs. p vs. b vs. bh consonant manners, with no contrastive tone.
>
> Under a., the language is classified as having tone.
> Under b., the language is not classified as having tone.
>
> I'm thinking of Tamang.
>
> Mazaudon, Martine. 1973. Phonologie Tamang: Etude phonologique du
> dialecte tamang de Risiangky (langue tibéto-birmane du Népal). Paris:
> Centre National de la Recherche Scientifique, Société d’Études
> Linguistiques et Anthropologiques de France.
>
> Michaud, Alexis, and Martine Mazaudon. 2006. Pitch and voice quality
> characteristics of the lexical word-tones of Tamang, as compared with
> level tones (Naxi data) and pitch-plus-voice-quality tones (Vietnamese
> data). /Proceedings of Speech Prosody 2006, Dresden/, 823-826.
> Available online at:
> https://sprosig.org/sp2006/contents/papers/PS7-18_0137.pdf.
>
> Poudel, Kedar Prasad. 2006. /Dhankute Tamang Grammar/. Munich: Lincom
> Europa.
>
>
> - Mark (Donohue)
>
>
> On Mon, 30 Sept 2024 at 23:12, Martin Haspelmath via Lingtyp
> <lingtyp at listserv.linguistlist.org> wrote:
>
> Of course, "areal/phylogenetic researcher bias (APRB)" exists, and
> during the Grambank coding, I often heard Hedvig Skirgård talk
> about it as a potential issue. (I don't remember if it was
> addressed in a specific way, though.)
>
> I don't know if it can be measured somehow (given the enormous
> diversity of researcher traditions, I'm a bit skeptical), but I
> think it can be mitigated if we are aware that the purpose of
> comparative concepts in typology is NOT to provide *analyses* –
> rather, it is to enable us to *classify* languages.
>
> Volker Gast rightly says: "Two linguists working on the same
> language will often provide very different analyses, and both may
> be right in their own ways."
>
> But while the *analyses* may well be different (because of the
> well-known non-uniqueness problem first highlighted by Yuen-Ren
> Chao in 1934: https://dlc.hypotheses.org/3381), the
> *classifications* should not be different if the different
> linguists have access to the same information.
>
> I wrote about this in the following blogpost, where I note that
> the "difficulties of classification" that typologists talk about
> are typically due to the unclarity of the comparative concepts,
> not necessarily to lack of data: https://dlc.hypotheses.org/2528.
>
> In practice, of course, different linguists do not have access to
> the same kinds of data, and subjectiveness cannot be excluded
> entirely. However, if we are careful to distinguish between
> analyses/descriptions (at the p-level) and classifications and
> cross-linguistic generalizations (at the g-level), some problems
> will go away.
>
> Best,
>
> Martin
>
> On 29.09.24 12:41, Volker Gast via Lingtyp wrote:
>>
>> Dear Jürgen and others,
>>
>> I think this is one of the major methodological problems of
>> linguistic typology (which, if I remember correctly, has been
>> discussed on this list before). There's no single 'correct' way
>> of analysing a language. Two linguists working on the same
>> language will often provide very different analyses, and both may
>> be right in their own ways. It starts with phonology, where you
>> have a lot of degrees of freedom in, for instance, minimizing or
>> maximizing phoneme inventories (e.g. by [not] introducing
>> phonological domains and features operating on these domains),
>> and it gets worse in morphology, specifically if there is
>> distributed exponence and other complexities of this type. At the
>> level of syntax the impact of the specific theoretical background
>> can be seen, for instance, in publications using the UD corpora.
>> These corpora were annotated with a specific version of
>> dependency grammar, I think essentially for pragmatic reasons
>> (dependency grammar was very popular among computational
>> linguists for a while). The theoretical assumptions of the
>> annotation model obviously have an impact on the results (just
>> think of the very old discussion of what a 'subject' is,
>> represented as the 'nsubj' relation in the UD annotations).
>>
>> For many languages we only have one description, and the linguist
>> describing it comes from a specific background or 'school' (and
>> these schools are often associated with particular areas and
>> particular phylogenetic groupings, introducing further biases of
>> the type you mention). Again, the effects are visible at the
>> level of phonology already. For example, the Papuan language Idi
>> could be described as having just three vowels, or as having nine
>> vowels (perhaps even more), depending on your assumptions about
>> phonotactics etc. (There's a published analysis of that language,
>> by D. Schokkin, N. Evans, C. Döhler and me, but the analysis
>> really reflects some kind of compromise between the authors, and
>> it leaves a few non-trivial questions open.)
>>
>> The specific linguist and their school or background is a source
>> of statistical non-independence. Even relying on exactly one
>> description per language, and having the data coded by several
>> researchers, often leads to low inter-annotator agreement in my
>> experience.
>>
>> I think we need to be aware that typological data is behavioural
>> data at three layers: (i) language is a behavioural activity,
>> (ii) describing a language is a behavioural activity, and (iii)
>> extracting information from descriptions is another behavioural
>> activity. Variance occurs at all levels and is multiplied in the
>> process from (i) to (iii).
>>
>> Approximately determining the amount of variance of that type
>> would be a major project. For instance, we could have five
>> undocumented (unstandardized) languages described by five
>> linguists each, using data from five different speakers per
>> language. Many will think that this would be a waste of
>> resources, given the number of (varieties of) languages that
>> still await description.
>>
>> What follows from all this, in my view, is that we need to be
>> careful in applying statistical analyses "blindly". Linguistics
>> is not a natural science. Given the large amount of inherent
>> variance in typological data we linguists should remain in the
>> driver's seat and use quantitative typological evidence as an
>> assistance system, being aware of its limits and possibilities,
>> rather than take a back seat and let the autopilot drive.
>>
>> Best,
>> Volker (Gast)
>>
>>
>> Am 28.09.2024 um 20:17 schrieb Juergen Bohnemeyer via Lingtyp:
>>>
>>> Dear all – I’m wondering whether anybody has attempted to
>>> estimate the size of the following putative effect on
>>> descriptive and typological research:
>>>
>>> Suppose there is a particular phenomenon in Language L, the
>>> known properties of which are equally compatible with an
>>> analysis in terms of construction types (comparative concepts) A
>>> and B.
>>>
>>> Suppose furthermore that L belongs to a language family and/or
>>> linguistic area such that A has much more commonly been invoked
>>> in descriptions of languages of that family/area than B.
>>>
>>> Then to the extent that a researcher attempting to adjudicate
>>> between A and B wrt. L (whether in a description of L, in a
>>> typological study, or in coding for an evolving typological
>>> database) is aware of the prevalence of A-coding/analyses for
>>> languages of the family/area in question, that might make them
>>> more likely to code/analyze L as exhibiting A as well.
>>>
>>> So for example, a researcher who assumes languages of the
>>> family/area of L to be typically tenseless may be influenced by
>>> this assumption and as a result become (however slightly) more
>>> likely to treat L as tenseless as well. In contrast, if she
>>> assumes languages of the family/area of L to be typically
>>> tensed, that might make her ever so slightly more likely to
>>> analyze L also as tensed.
>>>
>>> It seems to me that this is a cognitive bias related to, and
>>> possibly a case of, essentialism. (And just as in the case of
>>> (other forms of) essentialism, the actual cognitive
>>> causes/mechanisms of the bias may vary.)
>>>
>>> But regardless, my question is, again, has anybody tried to
>>> guesstimate to what extent the results of current typological
>>> studies may be warped by this kind of researcher bias? (Note
>>> that the bias may be affecting both authors of descriptive work
>>> and typologists using descriptive work as data, so there is a
>>> possible double-whammy effect.)
>>>
>>> Thanks! – Juergen
>>>
>>> Juergen Bohnemeyer (He/Him)
>>> Professor, Department of Linguistics
>>> University at Buffalo
>>>
> --
> Martin Haspelmath
> Max Planck Institute for Evolutionary Anthropology
> Deutscher Platz 6
> D-04103 Leipzig
> https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/
>