[Lingtyp] typology projects that use inter-rater reliability?
Volker Gast
volker.gast at uni-jena.de
Mon Oct 16 12:02:32 UTC 2023
Hey Adam (and others),
I think you could phrase the question differently: What typological
studies have been carried out with multiple annotators and careful
documentation of the annotation process, including precise annotation
guidelines, the training of the annotators, publication of all the
(individual) annotations, calculation of inter-annotator agreement etc.?
I think there are very few. The reason is that the process is very
time-consuming, and "risky". I was a member of a project co-directed with
Vahram Atayan (Heidelberg) where we carried out very careful annotations
dealing with what we call 'adverbials of immediate posteriority' (see the
references below). Even though we only dealt with a few well-known
European languages, it took us quite some time to develop annotation
guidelines and train annotators. The inter-rater agreement was
surprisingly low even for categories that appeared straightforward to us,
e.g. agentivity of a predicate; and we were dealing with well-known
languages (English, German, French, Spanish, Italian). So the results were
rather modest relative to the work that went into the annotations. (Note
that the project was primarily situated in the field of contrastive
linguistics and translation studies, not linguistic typology, but the
challenges are the same.)
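For readers less familiar with agreement metrics, here is a minimal sketch of one common measure, Cohen's kappa for two annotators, which corrects raw agreement for the agreement expected by chance. The labels below are invented toy data, not annotations from our project, and the function name cohen_kappa is only illustrative; I am not claiming this is the metric or the code that any of the projects mentioned here used.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal proportions, summed over categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical category throughout;
        # kappa is undefined here, so we return 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): agentivity judgements for eight predicates.
annotator_1 = ["agentive", "agentive", "non-agentive", "agentive",
               "non-agentive", "non-agentive", "agentive", "non-agentive"]
annotator_2 = ["agentive", "non-agentive", "non-agentive", "agentive",
               "non-agentive", "agentive", "agentive", "non-agentive"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # 0.5
```

In this toy case the annotators agree on 6 of 8 items (0.75), but with two equally frequent categories half of that agreement is expected by chance, so kappa is only 0.5. That gap between raw agreement and chance-corrected agreement is one reason results can look disappointing.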
It's a dilemma: as a field, we often fail to meet even the most basic
methodological requirements that are standardly made in other fields (most
notably psychology). I know of at least two typological projects where
inter-rater agreement tests were run, but the results were so poor that a
decision was made to not pursue this any further (meaning, the projects
were continued, but without inter-annotator agreement tests; that's what
makes annotation projects "risky": what do you do if you never reach a
satisfactory level of inter-annotator agreement?). Most annotation
projects, including some of my own earlier work, are based on what we
euphemistically call 'expert annotation', with 'expert' referring to
ourselves, the authors. Today I would minimally expect the annotations to
be done by someone who is not an author, and I try to implement that
requirement in my role as a journal editor (Linguistics), but it's hard.
We do want to see more empirical work published, and if the methodological
standards are too high, we will end up publishing nothing at all.
I'd be very happy if there were community standards for this, and I'd like
to hear about any initiatives implementing more rigorous methodological
standards in linguistic typology. Honestly, I wouldn't know what to
require. But it seems clear to me that we cannot simply go on like this,
annotating our own data, which we subsequently analyze, as it is well
known that annotation decisions are influenced by (mostly implicit)
biases.
Best,
Volker
Gast, Volker & Vahram Atayan (2019). 'Adverbials of immediate posteriority
in French and German: A contrastive corpus study of tout de suite,
immédiatement, gleich and sofort'. In Emonds, J., M. Janebová & L.
Veselovská (eds.): Language Use and Linguistic Structure. Proceedings of
the Olomouc Linguistics Colloquium 2018, 403-430. Olomouc Modern Language
Series. Olomouc: Palacký University Olomouc.
In German:
Atayan, V., B. Fetzer, V. Gast, D. Möller, T. Ronalter (2019).
'Ausdrucksformen der unmittelbaren Nachzeitigkeit in Originalen und
Übersetzungen: Eine Pilotstudie zu den deutschen Adverbien gleich und
sofort und ihren Äquivalenten im Französischen, Italienischen, Spanischen
und Englischen'. In Ahrens, B., S. Hansen-Schirra, M. Krein-Kühle, M.
Schreiber, U. Wienen (eds.): Translation -- Linguistik -- Semiotik, 11-82.
Berlin: Frank & Timme.
Gast, V., V. Atayan, J. Biege, B. Fetzer, S. Hettrich, A. Weber (2019).
'Unmittelbare Nachzeitigkeit im Deutschen und Französischen: Eine Studie
auf Grundlage des OpenSubtitles-Korpus'. In Konecny, C., C. Konzett, E.
Lavric, W. Pöckl (eds.): Comparatio delectat III. Akten der VIII.
Internationalen Arbeitstagung zum romanisch-deutschen und innerromanischen
Sprachvergleich, 223-249. Frankfurt: Lang.
---
Prof. V. Gast
https://linktype.iaa.uni-jena.de/VG
On Sat, 14 Oct 2023, Adam James Ross Tallman wrote:
> Hello all,
>
> I am gathering a list of projects / citations / papers that use or refer to inter-rater reliability. So far I have:
>
> Himmelmann et al. On the universality of intonational phrases: a cross-linguistic interrater study. Phonology 35.
>
> Gast & Koptjevskaja-Tamm. 2022. Patterns of persistence and diffusibility in the European lexicon. Linguistic Typology (not explicitly the topic of the paper, but interrater reliability metrics are used)
>
> I understand people working with Grambank have used it, but I don't know if there is a publication on that.
>
> best,
>
> Adam
>
>
>
> --
> Adam J.R. Tallman
> Post-doctoral Researcher
> Friedrich Schiller Universität
> Department of English Studies
>
>