[Lingtyp] typology projects that use inter-rater reliability?

Mon Oct 16 12:02:32 UTC 2023

Hey Adam (and others),

I think you could phrase the question differently: What typological 
studies have been carried out with multiple annotators and careful 
documentation of the annotation process, including precise annotation 
guidelines, the training of the annotators, publication of all the 
(individual) annotations, calculation of inter-annotator agreement etc.?

I think there are very few. The reason is that the process is very 
time-consuming, and "risky". I was a member of a project co-directed with 
Vahram Atayan (Heidelberg) where we carried out very careful annotations 
dealing with what we call 'adverbials of immediate posteriority' (see the 
references below). Even though we only dealt with a few well-known 
European languages (English, German, French, Spanish, Italian), it took 
us quite some time to develop annotation guidelines and train annotators. 
The inter-rater agreement was surprisingly low even for categories that 
appeared straightforward to us, e.g. agentivity of a predicate. So the 
outcomes of 
this process were very moderate in comparison with the work that went into 
the annotations. (Note that the project was primarily situated in the 
field of contrastive linguistics and translation studies, not linguistic 
typology, but the challenges are the same).

It's a dilemma: as a field, we often fail to meet even the most basic 
methodological requirements that are standardly made in other fields (most 
notably psychology). I know of at least two typological projects where 
inter-rater agreement tests were run, but the results were so poor that a 
decision was made to not pursue this any further (meaning, the projects 
were continued, but without inter-annotator agreement tests; that's what 
makes annotation projects "risky": what do you do if you never reach a 
satisfactory level of inter-annotator agreement?). Most annotation 
projects, including some of my own earlier work, are based on what we 
euphemistically call 'expert annotation', with 'expert' referring to 
ourselves, the authors. Today I would minimally expect the annotations to 
be done by someone who is not an author, and I try to implement that 
requirement in my role as a journal editor (Linguistics), but it's hard. 
We do want to see more empirical work published, and if the methodological 
standards are too high, we will end up publishing nothing at all.

I'd be very happy if there were community standards for this, and I'd like 
to hear about any initiatives implementing more rigorous methodological 
standards in linguistic typology. Honestly, I wouldn't know what to 
require. But it seems clear to me that we cannot simply go on like this, 
annotating our own data, which we subsequently analyze, as it is well 
known that annotation decisions are influenced by (mostly implicit) 
biases.

Best,
Volker

Gast, Volker & Vahram Atayan (2019). 'Adverbials of immediate posteriority 
in French and German: A contrastive corpus study of tout de suite, 
immédiatement, gleich and sofort'. In Emonds, J., M. Janebová & L. 
Veselovská (eds.): Language Use and Linguistic Structure. Proceedings of 
the Olomouc Linguistics Colloquium 2018, 403-430. Olomouc Modern Language 
Series. Olomouc: Palacký University Olomouc.

in German:

Atayan, V., B. Fetzer, V. Gast, D. Möller, T. Ronalter (2019). 
'Ausdrucksformen der unmittelbaren Nachzeitigkeit in Originalen und 
Übersetzungen: Eine Pilotstudie zu den deutschen Adverbien gleich und 
sofort und ihren Äquivalenten im Französischen, Italienischen, Spanischen 
und Englischen'. In Ahrens, B., S. Hansen-Schirra, M. Krein-Kühle, M. 
Schreiber, U. Wienen (eds.): Translation -- Linguistik -- Semiotik, 11-82. 
Berlin: Frank & Timme.

Gast, V., V. Atayan, J. Biege, B. Fetzer, S. Hettrich, A. Weber (2019). 
'Unmittelbare Nachzeitigkeit im Deutschen und Französischen: Eine Studie 
auf Grundlage des OpenSubtitles-Korpus'. In Konecny, C., C. Konzett, E. 
Lavric, W. Pöckl (eds.): Comparatio delectat III. Akten der VIII. 
Internationalen Arbeitstagung zum romanisch-deutschen und innerromanischen 
Sprachvergleich, 223-249. Frankfurt: Lang.

---
Prof. V. Gast
https://linktype.iaa.uni-jena.de/VG

On Sat, 14 Oct 2023, Adam James Ross Tallman wrote:

> Hello all,
> 
> I am gathering a list of projects / citations / papers that use or refer to inter-rater reliability. So far I have:
> 
> Himmelmann et al. On the universality of intonational phrases: a cross-linguistic interrater study. Phonology 35.
> 
> Gast & Koptjevskaja-Tamm. 2022. Patterns of persistence and diffusibility in the European lexicon. Linguistic Typology (not explicitly the topic of the paper, but interrater reliability metrics are used)
> 
> I understand people working with Grambank have used it, but I don't know if there is a publication on that.
> 
> best,
> 
> Adam
> 
> 
> 
> --
> Adam J.R. Tallman
> Post-doctoral Researcher
> Friedrich Schiller Universität
> Department of English Studies
> 
>