<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Of course, <span style="font-family:'CMU Serif'">"areal/phylogenetic

        researcher bias (APRB)" exists, and during the Grambank coding,

        I often heard Hedvig Skirgård talk about it as a potential

        issue. (I don't remember if it was addressed in a specific way,

        though.)</span></p>

    <p><span style="font-family:'CMU Serif'">I don't know if

        it can be measured somehow (given the enormous diversity of

        researcher traditions, I'm a bit skeptical), but I think it can

        be mitigated if we are aware that the purpose of comparative

        concepts in typology is NOT to provide *analyses* – rather, it

        is to enable us to *classify* languages.</span></p>

    <p><span style="font-family:'CMU Serif'">Volker Gast

        rightly says: "</span>Two linguists working on the same language

      will often provide very different analyses, and both may be right

      in their own ways."</p>

    <p>But while the *analyses* may well be different (because of the

      well-known non-uniqueness problem first highlighted by Yuen-Ren

      Chao in 1934: <a class="moz-txt-link-freetext" href="https://dlc.hypotheses.org/3381">https://dlc.hypotheses.org/3381</a>), the

      *classifications* should not be different if the different

      linguists have access to the same information.</p>

    <p>I wrote about this in the following blogpost, where I note that

      the "difficulties of classification" that typologists talk about

      are typically due to the unclarity of the comparative concepts,

      not necessarily to lack of data: <a class="moz-txt-link-freetext" href="https://dlc.hypotheses.org/2528">https://dlc.hypotheses.org/2528</a>.</p>

    <p>In practice, of course, different linguists do not have access to

      the same kinds of data, and subjectiveness cannot be excluded

      entirely. However, if we are careful to distinguish between

      analyses/descriptions (at the p-level) and classifications and

      cross-linguistic generalizations (at the g-level), some problems

      will go away.</p>

    <p>Best,</p>

    <p>Martin<br>

    </p>

    <div class="moz-cite-prefix">On 29.09.24 12:41, Volker Gast via

      Lingtyp wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:7a4da6d8-2900-436d-bd64-364ca42704dc@uni-jena.de">


      <p>Dear Jürgen and others,<br>

      </p>

      <p>I think this is one of the major methodological problems of

        linguistic typology (which, if I remember correctly, has been

        discussed on this list before). There's no single 'correct' way

        of analysing a language. Two linguists working on the same

        language will often provide very different analyses, and both

        may be right in their own ways. It starts with phonology, where

        you have a lot of degrees of freedom in, for instance,

        minimizing or maximizing phoneme inventories (e.g. by [not]

        introducing phonological domains and features operating on these

        domains), and it gets worse in morphology, specifically if there

        is distributed exponence and other complexities of this type. At

        the level of syntax the impact of the specific theoretical

        background can be seen, for instance, in publications using the

        UD corpora. These corpora were annotated with a specific version

        of dependency grammar, I think essentially for pragmatic reasons

        (dependency grammar was very popular among computational

        linguists for a while). The theoretical assumptions of the

        annotation model obviously have an impact on the results (just

        think of the very old discussion of what a 'subject' is,

        represented as the 'nsubj' relation in the UD annotations).<br>

      </p>

      <p>For many languages we only have one description, and the

        linguist describing it comes from a specific background or

        'school' (and these schools are often associated with particular

        areas and particular phylogenetic groupings, introducing further

        biases of the type you mention). Again, the effects are visible

        at the level of phonology already. For example, the Papuan

        language Idi could be described as having just three vowels, or

        as having nine vowels (perhaps even more), depending on your

        assumptions about phonotactics etc. (There's a published

        analysis of that language, by D. Schokkin, N. Evans, C. Döhler

        and me, but the analysis really reflects some kind of compromise

        between the authors, and it leaves a few non-trivial questions

        open.)<br>

      </p>

      <p>The specific linguist and their school or background is a

        source of statistical non-independence. Even relying on exactly

        one description per language, and having the data coded by

        several researchers, often leads to low inter-annotator

        agreement in my experience.</p>

      <p>I think we need to be aware that typological data is

        behavioural data at three layers: (i) language is a behavioural

        activity, (ii) describing a language is a behavioural activity,

        and (iii) extracting information from descriptions is another

        behavioural activity. Variance occurs at all levels and is

        multiplied in the process from (i) to (iii).</p>

      <p>Approximately determining the amount of variance of that type

        would be a major project. For instance, we could have five

        undocumented (unstandardized) languages described by five

        linguists each, using data from five different speakers per

        language. Many will think that this would be a waste of

        resources, given the number of (varieties of) languages that

        still await description.</p>

      <p>What follows from all this, in my view, is that we need to be

        careful in applying statistical analyses "blindly". Linguistics

        is not a natural science. Given the large amount of inherent

        variance in typological data we linguists should remain in the

        driver's seat and use quantitative typological evidence as an

        assistance system, being aware of its limits and possibilities,

        rather than take a back seat and let the autopilot drive.</p>

      <p>Best,<br>

        Volker (Gast)<br>

      </p>

      <p><br>

      </p>

      <div class="moz-cite-prefix">Am 28.09.2024 um 20:17 schrieb

        Juergen Bohnemeyer via Lingtyp:<br>

      </div>

      <blockquote type="cite"

cite="mid:SJ0PR15MB4696A146401E096FD352F9D7DD742@SJ0PR15MB4696.namprd15.prod.outlook.com">


        <style>@font-face

        {font-family:Helvetica;

        panose-1:0 0 0 0 0 0 0 0 0 0;}@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face

        {font-family:Aptos;

        panose-1:2 11 0 4 2 2 2 2 2 4;}@font-face

        {font-family:"CMU Serif";

        panose-1:2 0 6 3 0 0 0 0 0 0;}@font-face

        {font-family:"Times New Roman \(Body CS\)";

        panose-1:2 11 6 4 2 2 2 2 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        font-size:11.0pt;

        font-family:"Aptos",sans-serif;

        mso-ligatures:standardcontextual;}span.EmailStyle17

        {mso-style-type:personal-compose;

        font-family:"CMU Serif";

        color:windowtext;

        font-weight:normal;

        font-style:normal;}.MsoChpDefault

        {mso-style-type:export-only;

        font-size:11.0pt;}div.WordSection1

        {page:WordSection1;}</style>

        <div class="WordSection1">

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'">Dear

              all – I’m wondering whether anybody has attempted to

              estimate the size of the following putative effect on

              descriptive and typological research:<o:p></o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'"><o:p>&nbsp;</o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'">Suppose

              there is a particular phenomenon in Language L, the known

              properties of which are equally compatible with an

              analysis in terms of construction types (comparative

              concepts) A and B.<o:p></o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'"><o:p>&nbsp;</o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'">Suppose

              furthermore that L belongs to a language family and/or

              linguistic area such that A has much more commonly been

              invoked in descriptions of languages of that family/area

              than B.<o:p></o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'"><o:p>&nbsp;</o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'">Then

              to the extent that a researcher attempting to adjudicate

              between A and B with respect to L (whether in a description of L, in

              a typological study, or in coding for an evolving

              typological database) is aware of the prevalence of

              A-coding/analyses for languages of the family/area in

              question, that might make them more likely to code/analyze

              L as exhibiting A as well. <o:p></o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'"><o:p>&nbsp;</o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'">So

              for example, a researcher who assumes languages of the

              family/area of L to be typically tenseless may be

              influenced by this assumption and as a result become

              (however slightly) more likely to treat L as tenseless as

              well. In contrast, if she assumes languages of the

              family/area of L to be typically tensed, that might make

              her ever so slightly more likely to analyze L also as

              tensed. <o:p></o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'"><o:p>&nbsp;</o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'">It

              seems to me that this is a cognitive bias related to, and

              possibly a case of, essentialism. (And just as in the case

              of (other forms of) essentialism, the actual cognitive

              causes/mechanisms of the bias may vary.)<o:p></o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'"><o:p>&nbsp;</o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'">But

              regardless, my question is, again, has anybody tried to

              guesstimate to what extent the results of current

              typological studies may be warped by this kind of

              researcher bias? (Note that the bias may be affecting both

              authors of descriptive work and typologists using

              descriptive work as data, so there is a possible

              double-whammy effect.)<o:p></o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'"><o:p>&nbsp;</o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'">Thanks!

              – Juergen<o:p></o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'"><o:p>&nbsp;</o:p></span></p>

          <p class="MsoNormal"><span

              style="font-size:12.0pt;font-family:'CMU Serif'"><o:p>&nbsp;</o:p></span></p>

          <div>

            <div>

              <p class="MsoNormal"><span

style="font-size:9.0pt;font-family:Helvetica;color:black;mso-ligatures:none">Juergen

                  Bohnemeyer (He/Him)<br>

                  Professor, Department of Linguistics<br>

                  University at Buffalo </span><span

                style="white-space: pre-wrap">

</span></p>

            </div>

          </div>

        </div>

      </blockquote>

    </blockquote>

    <pre class="moz-signature" cols="72">-- 

Martin Haspelmath

Max Planck Institute for Evolutionary Anthropology

Deutscher Platz 6

D-04103 Leipzig

<a class="moz-txt-link-freetext" href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a></pre>

  </body>

</html>