<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Dear Jürgen and others,<br>

    </p>

    <p>I think this is one of the major methodological problems of

      linguistic typology (which, if I remember correctly, has been

      discussed on this list before). There's no single 'correct' way of

      analysing a language. Two linguists working on the same language

      will often provide very different analyses, and both may be right

      in their own ways. It starts with phonology, where you have a lot

      of degrees of freedom in, for instance, minimizing or maximizing

      phoneme inventories (e.g. by [not] introducing phonological

      domains and features operating on these domains), and it gets

      worse in morphology, specifically if there is distributed

      exponence and other complexities of this type. At the level of

      syntax the impact of the specific theoretical background can be

      seen, for instance, in publications using the Universal Dependencies (UD) corpora. These

      corpora were annotated with a specific version of dependency

      grammar, I think essentially for pragmatic reasons (dependency

      grammar was very popular among computational linguists for a

      while). The theoretical assumptions of the annotation model

      obviously have an impact on the results (just think of the very

      old discussion of what a 'subject' is, represented as the 'nsubj'

      relation in the UD annotations).<br>

    </p>

    <p>For many languages we only have one description, and the linguist

      describing it comes from a specific background or 'school' (and

      these schools are often associated with particular areas and

      particular phylogenetic groupings, introducing further biases of

      the type you mention). Again, the effects are visible at the level

      of phonology already. For example, the Papuan language Idi could

      be described as having just three vowels, or as having nine vowels

      (perhaps even more), depending on your assumptions about

      phonotactics etc. (There's a published analysis of that language,

      by D. Schokkin, N. Evans, C. Döhler and me, but the analysis

      really reflects some kind of compromise between the authors, and

      it leaves a few non-trivial questions open.)<br>

    </p>

    <p>The specific linguist and their school or background is a source

      of statistical non-independence. Even when we rely on exactly one

      description per language and have the data coded by several

      researchers, inter-annotator agreement is often low in my

      experience.</p>

    <p>I think we need to be aware that typological data is behavioural

      data at three layers: (i) language is a behavioural activity, (ii)

      describing a language is a behavioural activity, and (iii)

      extracting information from descriptions is another behavioural

      activity. Variance occurs at all levels and is multiplied in the

      process from (i) to (iii).</p>

    <p>Even approximately estimating the amount of variance of this type

      would be a major project. For instance, we could have five

      undocumented (unstandardized) languages described by five

      linguists each, using data from five different speakers per

      language. Many will think that this would be a waste of resources,

      given the number of languages (and varieties) that still await

      description.</p>

    <p>What follows from all this, in my view, is that we need to be

      careful in applying statistical analyses "blindly". Linguistics is

      not a natural science. Given the large amount of inherent variance

      in typological data, we linguists should remain in the driver's

      seat and use quantitative typological evidence as an assistance

      system, being aware of its limits and possibilities, rather than

      take a back seat and let the autopilot drive.</p>

    <p>Best,<br>

      Volker</p>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 28.09.2024 at 20:17, Juergen

      Bohnemeyer via Lingtyp wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:SJ0PR15MB4696A146401E096FD352F9D7DD742@SJ0PR15MB4696.namprd15.prod.outlook.com">


      <style>@font-face

        {font-family:Helvetica;

        panose-1:0 0 0 0 0 0 0 0 0 0;}@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face

        {font-family:Aptos;

        panose-1:2 11 0 4 2 2 2 2 2 4;}@font-face

        {font-family:"CMU Serif";

        panose-1:2 0 6 3 0 0 0 0 0 0;}@font-face

        {font-family:"Times New Roman \(Body CS\)";

        panose-1:2 11 6 4 2 2 2 2 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        font-size:11.0pt;

        font-family:"Aptos",sans-serif;

        mso-ligatures:standardcontextual;}span.EmailStyle17

        {mso-style-type:personal-compose;

        font-family:"CMU Serif";

        color:windowtext;

        font-weight:normal;

        font-style:normal;}.MsoChpDefault

        {mso-style-type:export-only;

        font-size:11.0pt;}div.WordSection1

        {page:WordSection1;}</style>

      <div class="WordSection1">

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;">Dear

            all – I’m wondering whether anybody has attempted to

            estimate the size of the following putative effect on

            descriptive and typological research:<o:p></o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;">Suppose

            there is a particular phenomenon in Language L, the known

            properties of which are equally compatible with an analysis

            in terms of construction types (comparative concepts) A and

            B.<o:p></o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;">Suppose

            furthermore that L belongs to a language family and/or

            linguistic area such that A has much more commonly been

            invoked in descriptions of languages of that family/area

            than B.<o:p></o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;">Then

            to the extent that a researcher attempting to adjudicate

            between A and B wrt. L (whether in a description of L, in a

            typological study, or in coding for an evolving typological

            database) is aware of the prevalence of A-coding/analyses

            for languages of the family/area in question, that might

            make them more likely to code/analyze L as exhibiting A as

            well.

            <o:p></o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;">So

            for example, a researcher who assumes languages of the

            family/area of L to be typically tenseless may be influenced

            by this assumption and as a result become (however slightly)

            more likely to treat L as tenseless as well. In contrast, if

            she assumes languages of the family/area of L to be

            typically tensed, that might make her ever so slightly more

            likely to analyze L also as tensed.

            <o:p></o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;">It

            seems to me that this is a cognitive bias related to, and

            possibly a case of, essentialism. (And just as in the case

            of (other forms of) essentialism, the actual cognitive

            causes/mechanisms of the bias may vary.)<o:p></o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;">But

            regardless, my question is, again, has anybody tried to

            guesstimate to what extent the results of current typological

            studies may be warped by this kind of researcher bias? (Note

            that the bias may be affecting both authors of descriptive

            work and typologists using descriptive work as data, so

            there is a possible double-whammy effect.)<o:p></o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;">Thanks!

            – Juergen<o:p></o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span

            style="font-size:12.0pt;font-family:&quot;CMU Serif&quot;"><o:p> </o:p></span></p>

        <div>

          <div>

            <p class="MsoNormal"><span

style="font-size:9.0pt;font-family:Helvetica;color:black;mso-ligatures:none">Juergen

                Bohnemeyer (He/Him)<br>

                Professor, Department of Linguistics<br>

                University at Buffalo <br>

                <br>

                Office: 642 Baldy Hall, UB North Campus<br>

                Mailing address: 609 Baldy Hall, Buffalo, NY 14260 <br>

                Phone: (716) 645 0127 <br>

                Fax: (716) 645 3825<br>

                Email: </span><span

style="font-family:&quot;Calibri&quot;,sans-serif;mso-ligatures:none"><a

                  href="mailto:jb77@buffalo.edu"

                  title="mailto:jb77@buffalo.edu" moz-do-not-send="true"><span

style="font-size:9.0pt;font-family:Helvetica;color:#0078D4">jb77@buffalo.edu</span></a></span><span

style="font-size:9.0pt;font-family:Helvetica;color:black;mso-ligatures:none"><br>

                Web: </span><span

style="font-family:&quot;Calibri&quot;,sans-serif;mso-ligatures:none"><a

                  href="http://www.acsu.buffalo.edu/~jb77/"

                  title="http://www.acsu.buffalo.edu/~jb77/"

                  moz-do-not-send="true"><span

style="font-size:9.0pt;font-family:Helvetica;color:#0563C1">http://www.acsu.buffalo.edu/~jb77/</span></a></span><span

style="font-size:9.0pt;font-family:Helvetica;color:black;mso-ligatures:none"> <br>

                <br>

              </span><span

style="font-family:&quot;Calibri&quot;,sans-serif;color:black;mso-ligatures:none">Office

                hours Tu/Th 3:30-4:30pm in 642 Baldy or via Zoom

                (Meeting ID 585 520 2411; Passcode Hoorheh) </span><span

style="font-size:9.0pt;font-family:Helvetica;color:black;mso-ligatures:none"><br>

                <br>

                There’s A Crack In Everything - That’s How The Light

                Gets In <br>

                (Leonard Cohen)  </span><span

style="font-family:&quot;Calibri&quot;,sans-serif;mso-ligatures:none"><o:p></o:p></span></p>

            <p class="MsoNormal"><span

style="font-family:&quot;Calibri&quot;,sans-serif;mso-ligatures:none">-- <o:p></o:p></span></p>

          </div>

        </div>

        <p class="MsoNormal"><span lang="DE"><o:p> </o:p></span></p>

      </div>

      <br>

      <fieldset class="moz-mime-attachment-header"></fieldset>

      <pre wrap="" class="moz-quote-pre">_______________________________________________

Lingtyp mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Lingtyp@listserv.linguistlist.org">Lingtyp@listserv.linguistlist.org</a>

<a class="moz-txt-link-freetext" href="https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp">https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp</a>

</pre>

    </blockquote>

  </body>

</html>