<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Dear Björn,</p>

    <p>Since you mentioned work on cross-linguistic inter-coder

      reliability as well (e.g. Himmelmann et al. 2018 on the

      universality of intonational phrases):</p>

    <p>I think it's important to have clear and simple definitions of

      annotation categories, so if you are interested, for example, in "<span

        lang="EN-US" style="color:black;mso-fareast-language:EN-US">the

        coding of clause-initial “particles” (are they just particles,

        operators of “analytical mood”, or complementizers?)", you need

        to have clear and simple definitions of <i>particle</i>, <i>mood</i>,

        and <i>complementizer</i> as comparative concepts. ("</span>The

      burden is on those who formulate the guidelines", as Christian

      Lehmann said.)</p>

    <p><span lang="EN-US" style="color:black;mso-fareast-language:EN-US">I

        think one can define <i>particle</i> as "a bound morph that is

        neither a root nor an affix nor a person form nor a linker", but

        this definition of course presupposes that one has a definition

        of "root", of "affix", and so on. These terms are not understood

        uniformly either, and <i>mood</i> is perhaps the worst of all

        traditional terms (even worse than "subordination", I think).</span></p>

    <p>Matters are quite different with materials from little-studied

      languages, i.e. with "<span style="font-size: 16px;">transcribing

        and annotating recordings", </span>as described by Jürgen

      Bohnemeyer. Language-particular descriptive categories are much

      easier to identify across texts than comparatively defined

      categories are to identify across languages.</p>

    <p>Best wishes for the New Year,</p>

    <p>Martin</p>

    <div class="moz-cite-prefix">On 03.01.26 12:54, Wiemer, Bjoern via

      Lingtyp wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:41f16f708cbc43f48c87e00fb0e7da5c@uni-mainz.de">

      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      <meta name="Generator"

        content="Microsoft Word 15 (filtered medium)">

      <style>@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face

        {font-family:Aptos;}@font-face

        {font-family:Times;

        panose-1:2 2 6 3 5 4 5 2 3 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        font-size:12.0pt;

        font-family:"Aptos",sans-serif;}a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:#0563C1;

        text-decoration:underline;}p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph

        {mso-style-priority:34;

        margin-top:0cm;

        margin-right:0cm;

        margin-bottom:0cm;

        margin-left:36.0pt;

        font-size:12.0pt;

        font-family:"Aptos",sans-serif;}p.bibliography, li.bibliography, div.bibliography

        {mso-style-name:bibliography;

        mso-margin-top-alt:auto;

        margin-right:0cm;

        mso-margin-bottom-alt:auto;

        margin-left:0cm;

        font-size:12.0pt;

        font-family:"Aptos",sans-serif;}span.E-MailFormatvorlage20

        {mso-style-type:personal-reply;

        color:black;}.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;

        mso-ligatures:none;}div.WordSection1

        {page:WordSection1;}ol

        {margin-bottom:0cm;}ul

        {margin-bottom:0cm;}</style>

      <div class="WordSection1">

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US">Dear All,<o:p></o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US">since this

            seems to be the first post on this list this year, I wish

            everybody a successful, more peaceful and decent year than

            the previous one.<o:p></o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US">I want to

            raise an issue which goes back to a discussion from October

            2023 on this list (see the thread below, in reverse

            chronological order). I’m interested to know whether anybody

            has a satisfying answer to the question of how to deal with

            semantic annotation, or the annotation of more complex (and

            less obvious) relations, in particular with the annotation

            of interclausal relations, both in terms of syntax and in

            semantic terms. Problems arise already with the

            coordination-subordination gradient, which ultimately is an

            outcome of a complex bunch of semantic criteria (like

            independence of illocutionary force, perspective from which

            referential expressions like tense or person deixis are

            interpreted; see also the factors that were analyzed

            meticulously, e.g., by Verstraete 2007). Other questions

            concern the coding of clause-initial “particles”: are they

            just particles, operators of “analytical mood”, or

            complementizers? (Notably, these things do not exclude one

            another, but they heavily depend on one’s theory, in

            particular one’s stance toward complementation and mood.)

            Another case in point is the annotation of the functions and

            properties of constructions in TAME-domains, especially if

            the annotation grid is more fine-grained than mainstream

            categorizing.<o:p></o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US">

            The problems which I have encountered (in pilot studies) are

            very similar to those discussed in October 2023 for

            seemingly even “simpler”, or more coarse-grained

            annotations. And they get considerably worse when we turn to data

            from diachronic corpora: even if being an informed native

            speaker is usually an asset, with diachronic data this asset

            is often useless, and native knowledge may even be a

            hindrance, since it leads analysts to project their habits

            and norms of contemporary usage onto earlier stages of the

            “same” language. (Similar points apply for closely related

            languages.) I entirely agree that annotators have to be

            trained, and grids of annotation to be tested, first of all

            because you have to exclude the (very likely) possibility

            that raters disagree just because some of the criteria are

            not clear to at least one of them (with the consequence that

            you cannot know whether disagreement or a low Kappa

            results from misunderstandings rather than reflecting

            properties of your object of study). I also agree that each

            criterion of a grid has to be sufficiently defined, and the

            annotation grid (or even its “history”) as such be

            documented in order to secure objective criteria for

            replicability and comparability (for cross-linguistic

            research, but also for diachronic studies based on a series

            of “synchronic cuts” of the given language).<o:p></o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US">On this

            background, I’d like to formulate the following questions:<o:p></o:p></span></p>

        <ol style="margin-top:0cm" start="1" type="1">

          <li class="MsoListParagraph"

            style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">

            <span lang="EN-US" style="mso-fareast-language:EN-US">Which

              arguments are there that (informed) native speakers are

              better annotators than linguistically well-trained

              students/linguists who are not native speakers of the

              respective language(s), but can be considered experts?<o:p></o:p></span></li>

          <li class="MsoListParagraph"

            style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">

            <span lang="EN-US" style="mso-fareast-language:EN-US">Conversely,

              which arguments are there that non-native speaker experts

              might be even better suited as annotators (for this or

              that kind of issue)?<o:p></o:p></span></li>

          <li class="MsoListParagraph"

            style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">

            <span lang="EN-US" style="mso-fareast-language:EN-US">Have

              assumptions about pluses and minuses of both kinds of

              annotators been tested in practice? That is, do we have

              empirical evidence for any such assumptions (or do we just

              rely on some sort of common sense, or on the personal

              experience of those who have done more complicated

              annotation work)?<o:p></o:p></span></li>

          <li class="MsoListParagraph"

            style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">

            <span lang="EN-US" style="mso-fareast-language:EN-US">How

              can pluses and minuses of both kinds of annotators be

              counterbalanced in a not too time (and money) consuming

              way?<o:p></o:p></span></li>

          <li class="MsoListParagraph"

            style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">

            <span lang="EN-US" style="mso-fareast-language:EN-US">What

              can we do with data from diachronic corpora if we have to

              admit that (informed) native speakers are of no use, and

              non-native experts are not acknowledged, either? Are we

              just doomed to refrain from any reliable and valid

              in-depth research based on annotations (and statistics)

              for diachronically earlier stages and for diachronic

              change?<o:p></o:p></span></li>

          <li class="MsoListParagraph"

            style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">

            <span lang="EN-US" style="mso-fareast-language:EN-US">In

              connection with this, has any cross-linguistic research

              that is interested in diachrony tried to incorporate

              insights from fields such as historical semantics and

              pragmatics into annotations? In typology, linguistic

              change has become increasingly prominent during the

              last 10-15 years (not only from a macro-perspective). I

              thus wonder whether typologists have tried to “borrow”

              methodology from fields that have possibly been better in

              interpreting diachronic data, and even quantify them (to

              some extent).<o:p></o:p></span></li>

        </ol>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US">I don’t want

            to be too pessimistic, but if we have no good answers as to

            who should be doing annotations – informed native speakers

            or non-native experts (or only those who are both native and

            experts)? – and how we might be able to test the validity of

            annotation grids (for comparisons across time and/or

            languages), there won’t be convincing arguments for how to deal

            with diachronic data (or data from less-studied languages

            for which there might be no native speakers available) in

            empirical studies that are to disclose more fine-grained

            distinctions and changes, also in order to quantify them. In

            particular, reviewers of project applications may always ask

            for a convincing methodology, and if no such research is

            funded, we’ll remain ignorant of many of the reasons for, and

            backgrounds of, language change.

            <o:p></o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US">I’d

            appreciate advice, in particular if it provides answers to

            any of the questions under 1-6 above.<o:p></o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US">Best,<o:p></o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US">Björn

            (Wiemer).<o:p></o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US"

            style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>

        <div>

          <div

style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">

            <p class="MsoNormal"><b><span

style="font-size:11.0pt;font-family:'Calibri',sans-serif">From:</span></b><span

style="font-size:11.0pt;font-family:'Calibri',sans-serif">

                Lingtyp

                <a class="moz-txt-link-rfc2396E" href="mailto:lingtyp-bounces@listserv.linguistlist.org">&lt;lingtyp-bounces@listserv.linguistlist.org&gt;</a>

                <b>On behalf of </b>William Croft<br>

                <b>Sent:</b> Monday, 16 </span><span lang="EN-US"

style="font-size:11.0pt;font-family:'Calibri',sans-serif">October

                2023 15:52<br>

                <b>To:</b> Volker Gast <a class="moz-txt-link-rfc2396E" href="mailto:volker.gast@uni-jena.de">&lt;volker.gast@uni-jena.de&gt;</a><br>

                <b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:LINGTYP@LISTSERV.LINGUISTLIST.ORG">LINGTYP@LISTSERV.LINGUISTLIST.ORG</a><br>

                <b>Subject:</b> Re: [Lingtyp] typology projects that use

                inter-rater reliability?<o:p></o:p></span></p>

          </div>

        </div>

        <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span lang="EN-US">An early

            cross-linguistic study with multiple annotators is this one:<o:p></o:p></span></p>

        <div>

          <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

        </div>

        <div>

          <p class="bibliography"

style="mso-margin-top-alt:0cm;margin-right:0cm;margin-bottom:0cm;margin-left:18.0pt;text-align:justify;text-indent:-18.0pt">

            <span

style="font-size:13.5pt;font-family:'Times',serif">Gundel,

              Jeannette K., Nancy Hedberg &amp; Ron Zacharski.

            </span><span lang="EN-US"

style="font-size:13.5pt;font-family:'Times',serif">1993.

              Cognitive status and the form of referring expressions in

              discourse. <i>Language</i> 69.274-307.<o:p></o:p></span></p>

        </div>

        <div>

          <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

        </div>

        <div>

          <p class="MsoNormal"><span lang="EN-US">It doesn’t have all

              the documentation that Volker suggests; our standards for

              providing documentation have risen.<o:p></o:p></span></p>

        </div>

        <div>

          <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

        </div>

        <div>

          <p class="MsoNormal"><span lang="EN-US">I have been involved

              in annotation projects in natural language processing,

              where the aim is to annotate corpora so that automated

              methods can “learn” the annotation categories from the

              “gold standard” (i.e. “expert”) annotation -- this is

              supervised learning in NLP. Recent efforts are aiming at

              developing a single annotation scheme for use across

              languages, such as Universal Dependencies (for syntactic

              annotation), Uniform Meaning Representation (for semantic

              annotation), and Unimorph (for morphological annotation).

              My experience is somewhat similar to Volker’s: even when

              the annotation scheme is very coarse-grained (from a

              theoretical linguist’s point of view), getting good enough

              interannotator agreement is hard, even when the annotators

              are the ones who designed the scheme, or are native

              speakers or have done fieldwork on the language. I would

              add to Volker’s comments that one has to be trained for

              annotation; but that training can introduce (mostly

              implicit) biases, at least in the eyes of proponents of a

              different theoretical approach -- something that is more

              apparent in a field such as linguistics where there are

              large differences in theoretical approaches.<o:p></o:p></span></p>

        </div>

        <div>

          <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

        </div>

        <div>

          <p class="MsoNormal"><span lang="EN-US">Bill<o:p></o:p></span></p>

          <div>

            <p class="MsoNormal"><span lang="EN-US"><br>

                <br>

                <o:p></o:p></span></p>

            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">

              <div>

                <p class="MsoNormal"><span lang="EN-US">On Oct 16, 2023,

                    at 6:02 AM, Volker Gast &lt;</span><a

                    href="mailto:volker.gast@uni-jena.de"

                    moz-do-not-send="true"><span lang="EN-US">volker.gast@uni-jena.de</span></a><span

                    lang="EN-US">&gt; wrote:<o:p></o:p></span></p>

              </div>

              <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

              <div>

                <div>

                  <p class="MsoNormal"><span lang="EN-US"><br>

                      Hey Adam (and others),<br>

                      <br>

                      I think you could phrase the question differently:

                      What typological studies have been carried out

                      with multiple annotators and careful documentation

                      of the annotation process, including precise

                      annotation guidelines, the training of the

                      annotators, publication of all the (individual)

                      annotations, calculation of inter-annotator

                      agreement etc.?<br>

                      <br>

                      I think there are very few. The reason is that the

                      process is very time-consuming, and "risky". I was

                      a member of a project co-directed with Vahram

                      Atayan (Heidelberg) where we carried out very

                      careful annotations dealing with what we call

                      'adverbials of immediate posteriority' (see the

                      references below). Even though we only dealt with

                      a few well-known European languages, it took us

                      quite some time to develop annotation guidelines

                      and train annotators. The inter-rater agreement

                      was surprisingly low even for categories that

                      appeared straightforward to us, e.g. agentivity of

                      a predicate; and we were dealing with well-known

                      languages (English, German, French, Spanish,

                      Italian). So the outcomes of this process were

                      very moderate in comparison with the work that

                      went into the annotations. (Note that the project

                      was primarily situated in the field of contrastive

                      linguistics and translation studies, not

                      linguistic typology, but the challenges are the

                      same).<br>

                      <br>

                      It's a dilemma: as a field, we often fail to meet

                      even the most basic methodological requirements

                      that are standardly made in other fields (most

                      notably psychology). I know of at least two

                      typological projects where inter-rater agreement

                      tests were run, but the results were so poor that

                      a decision was made to not pursue this any further

                      (meaning, the projects were continued, but without

                      inter-annotator agreement tests; that's what makes

                      annotation projects "risky": what do you do if you

                      never reach a satisfactory level of

                      inter-annotator agreement?). Most annotation

                      projects, including some of my own earlier work,

                      are based on what we euphemistically call 'expert

                      annotation', with 'expert' referring to ourselves,

                      the authors. Today I would minimally expect the

                      annotations to be done by someone who is not an

                      author, and I try to implement that requirement in

                      my role as a journal editor (Linguistics), but

                      it's hard. We do want to see more empirical work

                      published, and if the methodological standards are

                      too high, we will end up publishing nothing at all.<br>

                      <br>

                      I'd be very happy if there were community

                      standards for this, and I'd like to hear about any

                      initiatives implementing more rigorous

                      methodological standards in linguistic typology.

                      Honestly, I wouldn't know what to require. But it

                      seems clear to me that we cannot simply go on like

                      this, annotating our own data, which we

                      subsequently analyze, as it is well known that

                      annotation decisions are influenced by (mostly

                      implicit) biases.<br>

                      <br>

                      Best,<br>

                      Volker<br>

                      <br>

                      Gast, Volker & Vahram Atayan (2019).

                      'Adverbials of immediate posteriority in French

                      and German: A contrastive corpus study of tout de

                      suite, immédiatement, gleich and sofort'. In

                      Emonds, J., M. Janebová & L. Veselovská

                      (eds.): Language Use and Linguistic Structure.

                      Proceedings of the Olomouc Linguistics Colloquium

                      2018, 403-430. Olomouc Modern Language Series.

                    </span>Olomouc: Palacký University Olomouc.<br>

                    <br>

                    in German:<br>

                    <br>

                    Atayan, V., B. Fetzer, V. Gast, D. Möller, T.

                    Ronalter (2019). 'Ausdrucksformen der unmittelbaren

                    Nachzeitigkeit in Originalen und Übersetzungen: Eine

                    Pilotstudie zu den deutschen Adverbien gleich und

                    sofort und ihren Äquivalenten im Französischen,

                    Italienischen, Spanischen und Englischen'. In

                    Ahrens, B., S. Hansen-Schirra, M. Krein-Kühle, M.

                    Schreiber, U. Wienen (eds.): Translation --

                    Linguistik -- Semiotik, 11-82. Berlin: Frank &

                    Timme.<br>

                    <br>

                    Gast, V., V. Atayan, J. Biege, B. Fetzer, S.

                    Hettrich, A. Weber (2019). 'Unmittelbare

                    Nachzeitigkeit im Deutschen und Französischen: Eine

                    Studie auf Grundlage des OpenSubtitles-Korpus'.

                    <span lang="EN-US">In Konecny, C., C. Konzett, E.

                      Lavric, W. Pöckl (eds.): Comparatio delectat III.

                    </span>Akten der VIII. Internationalen Arbeitstagung

                    zum romanisch-deutschen und innerromanischen

                    Sprachvergleich, 223-249.

                    <span lang="EN-US">Frankfurt: Lang.<br>

                      <br>

                      <br>

                      ---<br>

                      Prof. V. Gast<br>

                    </span><a href="https://linktype.iaa.uni-jena.de/VG"

                      moz-do-not-send="true"><span lang="EN-US">https://linktype.iaa.uni-jena.de/VG</span></a><span

                      lang="EN-US"><br>

                      <br>

                      On Sat, 14 Oct 2023, Adam James Ross Tallman

                      wrote:<br>

                      <br>

                      <br>

                      <o:p></o:p></span></p>

                  <blockquote

                    style="margin-top:5.0pt;margin-bottom:5.0pt">

                    <p class="MsoNormal"><span lang="EN-US">Hello all,<br>

                        I am gathering a list of projects / citations /

                        papers that use or refer to inter-rater

                        reliability. So far I have:<br>

                        Himmelmann et al. 2018. On the universality of

                        intonational phrases: a cross-linguistic

                        interrater study. Phonology 35.<br>

                        Gast & Koptjevskaja-Tamm. 2022. Patterns of

                        persistence and diffusibility in the European

                        lexicon. Linguistic Typology (not explicitly the

                        topic of the paper, but interrater reliability

                        metrics are used)<br>

                        I understand people working with Grambank have

                        used it, but I don't know if there is a

                        publication on that.<br>

                        best,<br>

                        Adam<br>

                        --<br>

                        Adam J.R. Tallman<br>

                        Post-doctoral Researcher<br>

                        Friedrich Schiller Universität<br>

                        Department of English Studies<o:p></o:p></span></p>

                  </blockquote>

                  <p class="MsoNormal"><span lang="EN-US">_______________________________________________<br>

                      Lingtyp mailing list<br>

                    </span><a

                      href="mailto:Lingtyp@listserv.linguistlist.org"

                      moz-do-not-send="true"><span lang="EN-US">Lingtyp@listserv.linguistlist.org</span></a><span

                      lang="EN-US"><br>

                    </span><a

href="https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp"

                      moz-do-not-send="true"><span lang="EN-US">https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp</span></a><span

                      lang="EN-US"><o:p></o:p></span></p>

                </div>

              </div>

            </blockquote>

          </div>

          <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

        </div>

      </div>


    </blockquote>

    <pre class="moz-signature" cols="72">-- 

Martin Haspelmath

Max Planck Institute for Evolutionary Anthropology

Deutscher Platz 6

D-04103 Leipzig

<a class="moz-txt-link-freetext" href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a></pre>

  </body>

</html>