[Lingtyp] complex annotations and inter-rater reliability
Christian Lehmann
christian.lehmann at uni-erfurt.de
Sun Jan 4 11:48:12 UTC 2026
Dear Björn,
I should have added that I am well aware that a description such as I
postulated is not available for many languages. However, if this is so,
requiring good annotations despite the absence of a complete description
plus guidelines amounts to requiring that annotators do the work that
the linguist employing them has not done.
Best,
Christian
------------------------------------------------------------------------------------------------------
On 04.01.2026 at 12:42, Wiemer, Bjoern wrote:
>
> Dear Christian,
>
> thanks for your suggestions. As a first reaction, I’d like to point
> out two problems which you seem to skip over (or take for granted,
> though they cannot be).
>
> First, your “if we ignore these [semantic and pragmatic factors] for a
> moment”. This begs one big question. One reason is that you need
> to understand how fine-grained (or coarse) your grid (value set) for a
> given distinction can or should be.
>
> Second, you require a “complete linguistic description of the
> language”. This looks much like a postulate from strict structuralism
> that you have to know all elements and their relations to each other
> (“où tout se tient”) before you may determine the meaning of a
> particular item in an utterance. To my knowledge, this strict
> postulate has never been met in reality (and it probably cannot be
> fully met for any language). And how will you do this for historically
> earlier stages that are more often than not documented, let alone
> described in structural terms, rather fragmentarily?
>
> Brought to its logical end, if we have to base our work strictly on
> these principles, we are forced to say that we cannot do reliable
> research on diachronic change (at least of the kind in which semantic and
> pragmatic functions occupy center stage)…
>
> Best,
>
> Björn.
>
> *From:* Lingtyp <lingtyp-bounces at listserv.linguistlist.org> *On behalf
> of* Christian Lehmann via Lingtyp
> *Sent:* Sunday, 4 January 2026 12:13
> *To:* lingtyp at listserv.linguistlist.org
> *Subject:* Re: [Lingtyp] complex annotations and inter-rater reliability
>
> Dear Björn,
>
> I have never systematically studied the quality of the product of
> different annotators, so please consider me incompetent in this
> respect. However, a presupposition of any such study is obviously a
> definition of what a good/correct annotation is. Such a definition
> would be possible on certain conditions:
>
> 1. The utterance to be annotated has one linguistic (phonological,
> grammatical, semantic) structure. This implies that its meaning is
> known and there is no (licit) variation of annotations reflecting
> an ambiguity in the data.
> 2. There is a complete linguistic description of the language. Among
> other things, it comprises lists of all linguistic units, the
> regularities in their distribution and the set of constructions
> that they form.
> 3. On the basis of this description, annotation guidelines are
> formulated which provide a procedure by which the identity of a
> unit found in an utterance is to be determined.
> 4. The annotation grid stipulates a representation for every
> linguistic unit to be annotated.
>
> If all of this (unless I am forgetting anything) could be made formally
> explicit, then even an algorithm could produce a correct annotation.
> It cannot be made fully explicit because of semantic and pragmatic
> factors which cannot be systematized. Now if we ignore these for a
> moment, then a given annotation is either correct or incorrect, and the
> comparison of products of annotators boils down to an examination of
> whether their annotations are correct. Given this, it would seem to be
> of secondary importance whether an annotator is a native speaker or a
> linguist or what not; the only question is to what extent he or she
> obeys the guidelines.
>
> The moral of my argument is: the burden is principally on the
> shoulders of the person who formulates the guidelines. The annotator
> can do no better than these.
>
> --------------------------------------------------
>
> On 03.01.2026 at 12:54, Wiemer, Bjoern via Lingtyp wrote:
>
> Dear All,
>
> since this seems to be the first post on this list this year, I
> wish everybody a successful, more peaceful and decent year than
> the previous one.
>
> I want to raise an issue which goes back to a discussion from
> October 2023 on this list (see the thread below, in reverse
> chronological order). I’m interested to know whether anybody has a
> satisfying answer to the question of how to deal with semantic
> annotation, or the annotation of more complex (and less obvious)
> relations, in particular with the annotation of interclausal
> relations, both in terms of syntax and in semantic terms. Problems
> arise already with the coordination-subordination gradient, which
> ultimately is an outcome of a complex bunch of semantic criteria
> (like independence of illocutionary force, perspective from which
> referential expressions like tense or person deixis are
> interpreted; see also the factors that were analyzed meticulously,
> e.g., by Verstraete 2007). Other questions concern the coding of
> clause-initial “particles”: are they just particles, operators of
> “analytical mood”, or complementizers? (Notably, these things do
> not exclude one another, but they heavily depend on one’s theory,
> in particular one’s stance toward complementation and mood.)
> Another case in point is the annotation of the functions and
> properties of constructions in TAME-domains, especially if the
> annotation grid is more fine-grained than mainstream categorizing.
>
> The problems which I have encountered (in pilot studies) are very
> similar to those discussed in October 2023 for seemingly even
> “simpler”, or more coarse-grained annotations. And they are aggravated
> considerably when we turn to data from diachronic corpora: even if being
> an informed native speaker is usually an asset, with diachronic
> data this asset is often useless, and native knowledge may even be
> a hindrance since it leads analysts to project their habits and
> norms of contemporary usage onto earlier stages of the “same”
> language. (Similar points apply to closely related languages.) I
> entirely agree that annotators have to be trained, and grids of
> annotation to be tested, first of all because you have to exclude
> the (very likely) possibility that raters disagree just because
> some of the criteria are not clear to at least one of them (with
> the consequence that you cannot know whether disagreement or a low
> Kappa results from misunderstandings rather than reflecting
> properties of your object of study). I also agree that each
> criterion of a grid has to be sufficiently defined, and the
> annotation grid (or even its “history”) as such be documented in
> order to preserve objective criteria for replicability and
> comparability (for cross-linguistic research, but also for
> diachronic studies based on a series of “synchronic cuts” of the
> given language).
>
> Against this background, I’d like to formulate the following questions:
>
> 1. Which arguments are there that (informed) native speakers are
> better annotators than linguistically well-trained
> students/linguists who are not native speakers of the
> respective language(s), but can be considered experts?
> 2. Conversely, which arguments are there that non-native speaker
> experts might be even better suited as annotators (for this or
> that kind of issue)?
> 3. Have assumptions about pluses and minuses of both kinds of
> annotators been tested in practice? That is, do we have
> empirical evidence for any such assumptions (or do we just
> rely on some sort of common sense, or on the personal
> experience of those who have done more complicated annotation
> work)?
> 4. How can pluses and minuses of both kinds of annotators be
> counterbalanced in a not too time (and money) consuming way?
> 5. What can we do with data from diachronic corpora if we have to
> admit that (informed) native speakers are of no use, and
> non-native experts are not acknowledged, either? Are we just
> doomed to refrain from any reliable and valid in-depth
> research based on annotations (and statistics) for
> diachronically earlier stages and for diachronic change?
> 6. In connection with this, has any cross-linguistic research
> that is interested in diachrony tried to implement insights
> from fields such as historical semantics and pragmatics into
> annotations? In typology, linguistic change has become
> increasingly prominent during the last 10-15 years (not only
> from a macro-perspective). I thus wonder whether typologists
> have tried to “borrow” methodology from fields that have
> possibly been better at interpreting diachronic data, and even
> quantifying them (to some extent).
>
> I don’t want to be too pessimistic, but if we have no good answers
> as to who should be doing annotations – informed native speakers
> or non-native experts (or only those who are both native and
> experts)? – and how we might be able to test the validity of
> annotation grids (for comparisons across time and/or languages),
> there won’t be convincing arguments for how to deal with diachronic
> data (or data of lesser-studied languages for which there might be
> no native speakers available) in empirical studies that are to
> disclose more fine-grained distinctions and changes, also in order
> to quantify them. In particular, reviewers of project applications
> may always ask for a convincing methodology, and if no such
> research is funded we’ll remain ignorant of many of the reasons for
> and backgrounds of language change.
>
> I’d appreciate advice, in particular if it provides answers to any
> of the questions under 1-6 above.
>
> Best,
>
> Björn (Wiemer).
>
> --
>
> Prof. em. Dr. Christian Lehmann
> Rudolfstr. 4
> 99092 Erfurt
> Germany
> Tel.: +49/361/2113417
> Email: christianw_lehmann at arcor.de
> Web: https://www.christianlehmann.eu
>
--
Prof. em. Dr. Christian Lehmann
Rudolfstr. 4
99092 Erfurt
Germany
Tel.: +49/361/2113417
Email: christianw_lehmann at arcor.de
Web: https://www.christianlehmann.eu