<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Dear Björn,</p>
<p>Since you mentioned works on cross-linguistic inter-coder
reliability as well (e.g. Himmelmann et al. 2018 on the
universality of intonational phrases):</p>
<p>I think it's important to have clear and simple definitions of
annotation categories. If, for example, you are interested in "the
coding of clause-initial “particles” (are they just particles,
operators of “analytical mood”, or complementizers?)", you need
clear and simple definitions of <i>particle</i>, <i>mood</i>,
and <i>complementizer</i> as comparative concepts. ("The
burden is on those who formulate the guidelines", as Christian
Lehmann said.)</p>
<p>I think one can define <i>particle</i> as "a bound morph that is
neither a root nor an affix nor a person form nor a linker", but
this definition of course presupposes that one has a definition
of "root", of "affix", and so on. These terms are not understood
uniformly either, and <i>mood</i> is perhaps the worst of all
traditional terms (even worse than "subordination", I think).</p>
<p>Matters are quite different with materials from little-studied
languages, i.e. with "transcribing and annotating recordings", as
described by Jürgen Bohnemeyer. Language-particular descriptive
categories are much easier to identify across texts than
comparatively defined categories are to identify across languages.</p>
<p>Best wishes for the New Year,</p>
<p>Martin</p>
<div class="moz-cite-prefix">On 03.01.26 12:54, Wiemer, Bjoern via
Lingtyp wrote:<br>
</div>
<blockquote type="cite"
cite="mid:41f16f708cbc43f48c87e00fb0e7da5c@uni-mainz.de">
<style>@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
{font-family:Aptos;}@font-face
{font-family:Times;
panose-1:2 2 6 3 5 4 5 2 3 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}p.bibliography, li.bibliography, div.bibliography
{mso-style-name:bibliography;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}span.E-MailFormatvorlage20
{mso-style-type:personal-reply;
color:black;}.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}div.WordSection1
{page:WordSection1;}ol
{margin-bottom:0cm;}ul
{margin-bottom:0cm;}</style>
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US">Dear All,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US">Since this
seems to be the first post on this list this year, I wish
everybody a successful year, more peaceful and decent than
the previous one.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US">I want to
raise an issue that goes back to a discussion on this list from
October 2023 (see the thread below, in reverse chronological
order). I’d like to know whether anybody has a satisfying
answer to the question of how to deal with semantic
annotation, or with the annotation of more complex (and less
obvious) relations, in particular interclausal relations,
both in syntactic and in semantic terms. Problems arise
already with the coordination-subordination gradient, which
is ultimately the outcome of a complex bundle of semantic
criteria (such as independence of illocutionary force, or
the perspective from which referential expressions such as
tense or person deixis are interpreted; see also the factors
analyzed meticulously by, e.g., Verstraete 2007). Other
questions concern the coding of clause-initial “particles”:
are they just particles, operators of “analytical mood”, or
complementizers? (Notably, these options are not mutually
exclusive, but the choice depends heavily on one’s theory,
in particular on one’s stance toward complementation and
mood.) Another case in point is the annotation of the
functions and properties of constructions in TAME domains,
especially if the annotation grid is more fine-grained than
mainstream categorization.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US">
The problems I have encountered (in pilot studies) are
very similar to those discussed in October 2023 for
seemingly “simpler”, or more coarse-grained, annotations.
And they become considerably worse when we turn to data
from diachronic corpora: even if being an informed native
speaker is usually an asset, with diachronic data this asset
is often useless, and native knowledge may even be a
hindrance, since it tempts the analyst to project the habits
and norms of contemporary usage onto earlier stages of the
“same” language. (Similar points apply to closely related
languages.) I entirely agree that annotators have to be
trained and annotation grids tested, first of all
because one has to exclude the (very likely) possibility
that raters disagree merely because some of the criteria are
unclear to at least one of them; otherwise one cannot tell
whether disagreement or a low kappa value results from
misunderstandings rather than reflecting properties of the
object of study. I also agree that each criterion of a grid
has to be sufficiently defined, and that the annotation grid
as such (or even its “history”) has to be documented in
order to secure objective criteria for replicability and
comparability (for cross-linguistic research, but also for
diachronic studies based on a series of “synchronic cuts” of
the given language).<o:p></o:p></span></p>
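<p class="MsoNormal"><span lang="EN-US" style="color:black;mso-fareast-language:EN-US">To illustrate what a low kappa value amounts to: Cohen's kappa, a common agreement measure for two annotators, corrects the observed share of matching labels (p_o) for the agreement expected by chance given each annotator's own label frequencies (p_e), as κ = (p_o − p_e) / (1 − p_e). The Python sketch below is a minimal illustration under that definition, not the procedure used by any project mentioned in this thread; the function name, the labels PART, COMP and MOOD, and both annotators' decisions are hypothetical.<o:p></o:p></span></p>
<pre>
```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both annotators must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement p_o: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: computed from each annotator's own label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codings of ten clause-initial items by two annotators.
annotator_1 = ["COMP", "COMP", "PART", "PART", "MOOD", "COMP", "PART", "MOOD", "COMP", "PART"]
annotator_2 = ["COMP", "PART", "PART", "PART", "MOOD", "COMP", "MOOD", "MOOD", "COMP", "COMP"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # p_o = 0.70, p_e = 0.34, so kappa = 0.36 / 0.66, printed as 0.545
```
</pre>
<p class="MsoNormal"><span lang="EN-US" style="color:black;mso-fareast-language:EN-US">Here the annotators agree on 7 of 10 items, yet kappa is only about 0.55, because part of that agreement would be expected by chance. A kappa near 0 means agreement no better than chance, and what counts as an acceptable value is a matter of convention. For more than two annotators, measures such as Fleiss' kappa or Krippendorff's alpha are typically used instead.<o:p></o:p></span></p>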
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US">Against this
background, I’d like to formulate the following questions:<o:p></o:p></span></p>
<ol style="margin-top:0cm" start="1" type="1">
<li class="MsoListParagraph"
style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">
<span lang="EN-US" style="mso-fareast-language:EN-US">What
arguments are there for assuming that (informed) native
speakers are better annotators than linguistically
well-trained students or linguists who are not native
speakers of the respective language(s) but can be considered
experts?<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">
<span lang="EN-US" style="mso-fareast-language:EN-US">Conversely,
what arguments are there that non-native experts
might be even better suited as annotators (for this or
that kind of issue)?<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">
<span lang="EN-US" style="mso-fareast-language:EN-US">Have
assumptions about pluses and minuses of both kinds of
annotators been tested in practice? That is, do we have
empirical evidence for any such assumptions (or do we just
rely on some sort of common sense, or on the personal
experience of those who have done more complicated
annotation work)?<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">
<span lang="EN-US" style="mso-fareast-language:EN-US">How
can the pluses and minuses of both kinds of annotators be
counterbalanced in a way that is not too time-consuming (or
costly)?<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">
<span lang="EN-US" style="mso-fareast-language:EN-US">What
can we do with data from diachronic corpora if we have to
admit that (informed) native speakers are of no use, and
non-native experts are not acknowledged either? Are we
simply doomed to refrain from any reliable and valid
in-depth research based on annotations (and statistics)
on diachronically earlier stages and on diachronic
change?<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="color:black;margin-left:0cm;mso-list:l0 level1 lfo3">
<span lang="EN-US" style="mso-fareast-language:EN-US">In
connection with this, has any cross-linguistic research
that is interested in diachrony tried to incorporate
insights from fields such as historical semantics and
pragmatics into annotations? In typology, linguistic
change has become increasingly prominent during the
last 10-15 years (not only from a macro-perspective). I
thus wonder whether typologists have tried to “borrow”
methodology from fields that have possibly been better at
interpreting diachronic data, and even at quantifying them
(to some extent).<o:p></o:p></span></li>
</ol>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US">I don’t want
to be too pessimistic, but if we have no good answers as to
who should be doing annotations (informed native speakers,
non-native experts, or only those who are both native
speakers and experts?) and how we might be able to test the
validity of annotation grids (for comparisons across time
and/or languages), there won’t be convincing arguments for
how to deal with diachronic data (or data from lesser-studied
languages for which there might be no native speakers
available) in empirical studies that aim to uncover more
fine-grained distinctions and changes, and to quantify them.
In particular, reviewers of project applications can always
ask for a convincing methodology, and if no such research is
funded, we’ll remain ignorant of many of the causes and
backgrounds of language change.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US">I’d
appreciate advice, in particular if it provides answers to
any of questions 1-6 above.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US">Best,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US">Björn
(Wiemer).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"
style="color:black;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div>
<div
style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">
Lingtyp
<a class="moz-txt-link-rfc2396E" href="mailto:lingtyp-bounces@listserv.linguistlist.org">&lt;lingtyp-bounces@listserv.linguistlist.org&gt;</a>
<b>On behalf of </b>William Croft<br>
<b>Sent:</b> Monday, 16 </span><span lang="EN-US"
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">October
2023 15:52<br>
<b>To:</b> Volker Gast <a class="moz-txt-link-rfc2396E" href="mailto:volker.gast@uni-jena.de">&lt;volker.gast@uni-jena.de&gt;</a><br>
<b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:LINGTYP@LISTSERV.LINGUISTLIST.ORG">LINGTYP@LISTSERV.LINGUISTLIST.ORG</a><br>
<b>Subject:</b> Re: [Lingtyp] typology projects that use
inter-rater reliability?<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">An early
cross-linguistic study with multiple annotators is this one:<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
</div>
<div>
<p class="bibliography"
style="mso-margin-top-alt:0cm;margin-right:0cm;margin-bottom:0cm;margin-left:18.0pt;text-align:justify;text-indent:-18.0pt">
<span
style="font-size:13.5pt;font-family:&quot;Times&quot;,serif">Gundel,
Jeannette K., Nancy Hedberg &amp; Ron Zacharski.
</span><span lang="EN-US"
style="font-size:13.5pt;font-family:&quot;Times&quot;,serif">1993.
Cognitive status and the form of referring expressions in
discourse. <i>Language</i> 69.274-307.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US">It doesn’t have all
the documentation that Volker suggests; our standards for
providing documentation have risen.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US">I have been involved
in annotation projects in natural language processing,
where the aim is to annotate corpora so that automated
methods can “learn” the annotation categories from the
“gold standard” (i.e. “expert”) annotation; this is
supervised learning in NLP. Recent efforts aim at
developing a single annotation scheme for use across
languages, such as Universal Dependencies (for syntactic
annotation), Uniform Meaning Representation (for semantic
annotation), and UniMorph (for morphological annotation).
My experience is somewhat similar to Volker’s: even when
the annotation scheme is very coarse-grained (from a
theoretical linguist’s point of view), getting good enough
interannotator agreement is hard, even when the annotators
are the ones who designed the scheme, or are native
speakers or have done fieldwork on the language. I would
add to Volker’s comments that one has to be trained for
annotation; but that training can introduce (mostly
implicit) biases, at least in the eyes of proponents of a
different theoretical approach, something that is more
apparent in a field such as linguistics, where there are
large differences in theoretical approaches.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US">Bill<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span lang="EN-US"><br>
<br>
<o:p></o:p></span></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span lang="EN-US">On Oct 16, 2023,
at 6:02 AM, Volker Gast <</span><a
href="mailto:volker.gast@uni-jena.de"
moz-do-not-send="true"><span lang="EN-US">volker.gast@uni-jena.de</span></a><span
lang="EN-US">> wrote:<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<div>
<div>
<p class="MsoNormal"><span lang="EN-US"><br>
Hey Adam (and others),<br>
<br>
I think you could phrase the question differently:
What typological studies have been carried out
with multiple annotators and careful documentation
of the annotation process, including precise
annotation guidelines, the training of the
annotators, publication of all the (individual)
annotations, calculation of inter-annotator
agreement etc.?<br>
<br>
I think there are very few. The reason is that the
process is very time-consuming, and "risky". I was
a member of a project, co-directed with Vahram
Atayan (Heidelberg), in which we carried out very
careful annotations dealing with what we call
'adverbials of immediate posteriority' (see the
references below). Even though we only dealt with
a few well-known European languages (English,
German, French, Spanish, Italian), it took us
quite some time to develop annotation guidelines
and train annotators. The inter-rater agreement
was surprisingly low even for categories that
appeared straightforward to us, e.g. the agentivity
of a predicate. So the outcomes of this process were
very modest in comparison with the work that
went into the annotations. (Note that the project
was primarily situated in the field of contrastive
linguistics and translation studies, not
linguistic typology, but the challenges are the
same.)<br>
<br>
It's a dilemma: as a field, we often fail to meet
even the most basic methodological requirements
that are standard in other fields (most
notably psychology). I know of at least two
typological projects where inter-rater agreement
tests were run, but the results were so poor that
a decision was made not to pursue this any further
(meaning, the projects were continued, but without
inter-annotator agreement tests; that's what makes
annotation projects "risky": what do you do if you
never reach a satisfactory level of
inter-annotator agreement?). Most annotation
projects, including some of my own earlier work,
are based on what we euphemistically call 'expert
annotation', with 'expert' referring to ourselves,
the authors. Today I would minimally expect the
annotations to be done by someone who is not an
author, and I try to implement that requirement in
my role as a journal editor (Linguistics), but
it's hard. We do want to see more empirical work
published, and if the methodological standards are
too high, we will end up publishing nothing at all.<br>
<br>
I'd be very happy if there were community
standards for this, and I'd like to hear about any
initiatives implementing more rigorous
methodological standards in linguistic typology.
Honestly, I wouldn't know what to require. But it
seems clear to me that we cannot simply go on like
this, annotating our own data, which we
subsequently analyze, as it is well known that
annotation decisions are influenced by (mostly
implicit) biases.<br>
<br>
Best,<br>
Volker<br>
<br>
Gast, Volker & Vahram Atayan (2019).
'Adverbials of immediate posteriority in French
and German: A contrastive corpus study of tout de
suite, immédiatement, gleich and sofort'. In
Emonds, J., M. Janebová & L. Veselovská
(eds.): Language Use and Linguistic Structure.
Proceedings of the Olomouc Linguistics Colloquium
2018, 403-430. Olomouc Modern Language Series.
</span>Olomouc: Palacký University Olomouc.<br>
<br>
in German:<br>
<br>
Atayan, V., B. Fetzer, V. Gast, D. Möller, T.
Ronalter (2019). 'Ausdrucksformen der unmittelbaren
Nachzeitigkeit in Originalen und Übersetzungen: Eine
Pilotstudie zu den deutschen Adverbien gleich und
sofort und ihren Äquivalenten im Französischen,
Italienischen, Spanischen und Englischen'. In
Ahrens, B., S. Hansen-Schirra, M. Krein-Kühle, M.
Schreiber, U. Wienen (eds.): Translation --
Linguistik -- Semiotik, 11-82. Berlin: Frank &
Timme.<br>
<br>
Gast, V., V. Atayan, J. Biege, B. Fetzer, S.
Hettrich, A. Weber (2019). 'Unmittelbare
Nachzeitigkeit im Deutschen und Französischen: Eine
Studie auf Grundlage des OpenSubtitles-Korpus'.
<span lang="EN-US">In Konecny, C., C. Konzett, E.
Lavric, W. Pöckl (eds.): Comparatio delectat III.
</span>Akten der VIII. Internationalen Arbeitstagung
zum romanisch-deutschen und innerromanischen
Sprachvergleich, 223-249.
<span lang="EN-US">Frankfurt: Lang.<br>
<br>
<br>
---<br>
Prof. V. Gast<br>
</span><a href="https://linktype.iaa.uni-jena.de/VG"
moz-do-not-send="true"><span lang="EN-US">https://linktype.iaa.uni-jena.de/VG</span></a><span
lang="EN-US"><br>
<br>
On Sat, 14 Oct 2023, Adam James Ross Tallman
wrote:<br>
<br>
<br>
<o:p></o:p></span></p>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span lang="EN-US">Hello all,<br>
I am gathering a list of projects / citations /
papers that use or refer to inter-rater
reliability. So far I have:<br>
Himmelmann et al. 2018. On the universality of
intonational phrases: a cross-linguistic
interrater study. Phonology 35.<br>
Gast & Koptjevskaja-Tamm. 2022. Patterns of
persistence and diffusibility in the European
lexicon. Linguistic Typology (not explicitly the
topic of the paper, but interrater reliability
metrics are used)<br>
I understand people working with Grambank have
used it, but I don't know if there is a
publication on that.<br>
best,<br>
Adam<br>
--<br>
Adam J.R. Tallman<br>
Post-doctoral Researcher<br>
Friedrich Schiller Universität<br>
Department of English Studies<o:p></o:p></span></p>
</blockquote>
<p class="MsoNormal"><span lang="EN-US">_______________________________________________<br>
Lingtyp mailing list<br>
</span><a
href="mailto:Lingtyp@listserv.linguistlist.org"
moz-do-not-send="true"><span lang="EN-US">Lingtyp@listserv.linguistlist.org</span></a><span
lang="EN-US"><br>
</span><a
href="https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp"
moz-do-not-send="true"><span lang="EN-US">https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp</span></a><span
lang="EN-US"><o:p></o:p></span></p>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
</div>
</div>
<br>
</blockquote>
<pre class="moz-signature" cols="72">--
Martin Haspelmath
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
D-04103 Leipzig
<a class="moz-txt-link-freetext" href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/">https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/</a></pre>
</body>
</html>