<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Of course, <span style="font-family:'CMU Serif'">"areal/phylogenetic
researcher bias (APRB)" exists, and during the Grambank coding,
I often heard Hedvig Skirgård talk about it as a potential
issue. (I don't remember if it was addressed in a specific way,
though.)</span></p>
<p><span style="font-family:'CMU Serif'">I don't know if
it can be measured somehow (given the enormous diversity of
researcher traditions, I'm a bit skeptical), but I think it can
be mitigated if we are aware that the purpose of comparative
concepts in typology is NOT to provide *analyses* – rather, it
is to enable us to *classify* languages.</span></p>
<p><span style="font-family:'CMU Serif'">Volker Gast
rightly says: "</span>Two linguists working on the same language
will often provide very different analyses, and both may be right
in their own ways."</p>
<p>But while the *analyses* may well be different (because of the
well-known non-uniqueness problem first highlighted by Yuen-Ren
Chao in 1934: <a class="moz-txt-link-freetext" href="https://dlc.hypotheses.org/3381">https://dlc.hypotheses.org/3381</a>), the
*classifications* should not be different if the different
linguists have access to the same information.</p>
<p>I wrote about this in the following blogpost, where I note that
the "difficulties of classification" that typologists talk about
are typically due to the unclarity of the comparative concepts,
not necessarily to lack of data: <a class="moz-txt-link-freetext" href="https://dlc.hypotheses.org/2528">https://dlc.hypotheses.org/2528</a>.</p>
<p>In practice, of course, different linguists do not have access to
the same kinds of data, and subjectivity cannot be excluded
entirely. However, if we are careful to distinguish between
analyses/descriptions (at the p-level) and classifications and
cross-linguistic generalizations (at the g-level), some problems
will go away.</p>
<p>Best,</p>
<p>Martin<br>
</p>
<div class="moz-cite-prefix">On 29.09.24 12:41, Volker Gast via
Lingtyp wrote:<br>
</div>
<blockquote type="cite"
cite="mid:7a4da6d8-2900-436d-bd64-364ca42704dc@uni-jena.de">
<p>Dear Jürgen and others,<br>
</p>
<p>I think this is one of the major methodological problems of
linguistic typology (which, if I remember correctly, has been
discussed on this list before). There's no single 'correct' way
of analysing a language. Two linguists working on the same
language will often provide very different analyses, and both
may be right in their own ways. It starts with phonology, where
you have a lot of degrees of freedom in, for instance,
minimizing or maximizing phoneme inventories (e.g. by [not]
introducing phonological domains and features operating on these
domains), and it gets worse in morphology, specifically if there
is distributed exponence and other complexities of this type. At
the level of syntax the impact of the specific theoretical
background can be seen, for instance, in publications using the
UD corpora. These corpora were annotated with a specific version
of dependency grammar, I think essentially for pragmatic reasons
(dependency grammar was very popular among computational
linguists for a while). The theoretical assumptions of the
annotation model obviously have an impact on the results (just
think of the very old discussion of what a 'subject' is,
represented as the 'nsubj' relation in the UD annotations).<br>
</p>
<p>For many languages we only have one description, and the
linguist describing it comes from a specific background or
'school' (and these schools are often associated with particular
areas and particular phylogenetic groupings, introducing further
biases of the type you mention). Again, the effects are visible
at the level of phonology already. For example, the Papuan
language Idi could be described as having just three vowels, or
as having nine vowels (perhaps even more), depending on your
assumptions about phonotactics etc. (There's a published
analysis of that language, by D. Schokkin, N. Evans, C. Döhler
and me, but the analysis really reflects some kind of compromise
between the authors, and it leaves a few non-trivial questions
open.)<br>
</p>
<p>The specific linguist and their school or background is a
source of statistical non-independence. Even relying on exactly
one description per language, and having the data coded by
several researchers, often leads to low inter-annotator
agreement in my experience.</p>
<p>I think we need to be aware that typological data is
behavioural data at three layers: (i) language is a behavioural
activity, (ii) describing a language is a behavioural activity,
and (iii) extracting information from descriptions is another
behavioural activity. Variance occurs at all levels and is
multiplied in the process from (i) to (iii).</p>
<p>Approximately determining the amount of variance of that type
would be a major project. For instance, we could have five
undocumented (unstandardized) languages described by five
linguists each, using data from five different speakers per
language. Many will think that this would be a waste of
resources, given the number of (varieties of) languages that
still await description.</p>
<p>What follows from all this, in my view, is that we need to be
careful in applying statistical analyses "blindly". Linguistics
is not a natural science. Given the large amount of inherent
variance in typological data we linguists should remain in the
driver's seat and use quantitative typological evidence as an
assistance system, being aware of its limits and possibilities,
rather than take a back seat and let the autopilot drive.</p>
<p>Best,<br>
Volker (Gast)<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">Am 28.09.2024 um 20:17 schrieb
Juergen Bohnemeyer via Lingtyp:<br>
</div>
<blockquote type="cite"
cite="mid:SJ0PR15MB4696A146401E096FD352F9D7DD742@SJ0PR15MB4696.namprd15.prod.outlook.com">
<style>@font-face
{font-family:Helvetica;
panose-1:0 0 0 0 0 0 0 0 0 0;}@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}@font-face
{font-family:"CMU Serif";
panose-1:2 0 6 3 0 0 0 0 0 0;}@font-face
{font-family:"Times New Roman \(Body CS\)";
panose-1:2 11 6 4 2 2 2 2 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Aptos",sans-serif;
mso-ligatures:standardcontextual;}span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"CMU Serif";
color:windowtext;
font-weight:normal;
font-style:normal;}.MsoChpDefault
{mso-style-type:export-only;
font-size:11.0pt;}div.WordSection1
{page:WordSection1;}</style>
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">Dear
all – I’m wondering whether anybody has attempted to
estimate the size of the following putative effect on
descriptive and typological research:</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">&nbsp;</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">Suppose
there is a particular phenomenon in Language L, the known
properties of which are equally compatible with an
analysis in terms of construction types (comparative
concepts) A and B.</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">&nbsp;</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">Suppose
furthermore that L belongs to a language family and/or
linguistic area such that A has much more commonly been
invoked in descriptions of languages of that family/area
than B.</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">&nbsp;</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">Then
to the extent that a researcher attempting to adjudicate
between A and B wrt. L (whether in a description of L, in
a typological study, or in coding for an evolving
typological database) is aware of the prevalence of
A-coding/analyses for languages of the family/area in
question, that might make them more likely to code/analyze
L as exhibiting A as well.</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">&nbsp;</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">So
for example, a researcher who assumes languages of the
family/area of L to be typically tenseless may be
influenced by this assumption and as a result become
(however slightly) more likely to treat L as tenseless as
well. In contrast, if she assumes languages of the
family/area of L to be typically tensed, that might make
her ever so slightly more likely to analyze L also as
tensed.</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">&nbsp;</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">It
seems to me that this is a cognitive bias related to, and
possibly a case of, essentialism. (And just as in the case
of (other forms of) essentialism, the actual cognitive
causes/mechanisms of the bias may vary.)</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">&nbsp;</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">But
regardless, my question is, again, has anybody tried to
guesstimate to what extent the results of current
typological studies may be warped by this kind of
researcher bias? (Note that the bias may be affecting both
authors of descriptive work and typologists using
descriptive work as data, so there is a possible
double-whammy effect.)</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">&nbsp;</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">Thanks!
– Juergen</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">&nbsp;</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'CMU Serif'">&nbsp;</span></p>
<div>
<div>
<p class="MsoNormal"><span
style="font-size:9.0pt;font-family:Helvetica;color:black;mso-ligatures:none">Juergen
Bohnemeyer (He/Him)<br>
Professor, Department of Linguistics<br>
University at Buffalo </span><span
style="white-space: pre-wrap">
</span></p>
</div>
</div>
</div>
</blockquote>
</blockquote>
<pre class="moz-signature" cols="72">--
Martin Haspelmath
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
D-04103 Leipzig
<a class="moz-txt-link-freetext" href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a></pre>
</body>
</html>