Rate of agreement for transcriptions
Aurélie NARDY
aurelie.nardy at u-grenoble3.fr
Mon Dec 19 13:00:22 UTC 2005
Dear Colleagues,
Many thanks to all who responded to my request concerning the rate of
agreement for transcriptions.
Two main points emerge: percentage agreement and Cohen's kappa (a
statistical test for assessing the agreement between two or more observers
of the same phenomenon; for more information, see
http://kappa.chez-alice.fr/).
Below I give the initial query, the references I received, and then some of
the replies.
Dear Info-CHILDES Members,
I'm looking for references on the rate of agreement between transcribers
working on the same transcription.
Firstly, I would like to know how to compute a rate of agreement and,
secondly, which value indicates that a transcription produced by two
transcribers is reliable.
Many thanks
Aurélie
References:
Roberts, F. & Robinson, J. D. (2004). Interobserver agreement on first-stage
conversation analytic transcription. Human Communication Research, 30(3).
Yoon, T.-J., Chavarria, S., Cole, J. & Hasegawa-Johnson, M. (2004).
Intertranscriber reliability of prosodic labeling on telephone conversation
using ToBI. In INTERSPEECH-2004, 2729-2732.
Pye, C., Wilcox, K. A. & Siren, K. A. (1988). Refining transcriptions: The
significance of transcriber "errors". Journal of Child Language, 15(1), 17-37.
Gut, U. & Bayerl, P. S. (2004). Measuring the reliability of manual
annotations of speech corpora. Proceedings of Speech Prosody 2004, Nara,
565-568.
Shriberg, L. D. & Lof, G. L. (1991). Reliability studies in broad and narrow
phonetic transcription. Clinical Linguistics and Phonetics, 5, 225-279.
Kent, R. D. (1996). Hearing and believing: Some limits to the
auditory-perceptual assessment of speech and voice disorders. American
Journal of Speech-Language Pathology, 5(3), 7-23.
About Cohen's kappa (by Julian Lloyd):
The two main methods for assessing inter-transcriber reliability are
percentage agreement and Cohen's kappa. Regarding percentage agreement, the
type of study you are carrying out will obviously determine your level of
analysis (e.g., word-by-word, phoneme-by-phoneme, utterance segmentation,
etc.). You assess reliability for a sample of your data, say 20%. Taking
words as an example, you would calculate the number of times that the two
transcribers agree and disagree on words. Percentage agreement is then
calculated as follows:
PA = 100 x (number of agreements) / (number of agreements + number of
disagreements)
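As a concrete illustration, a minimal sketch in Python (the counts below are
invented for the example):

    def percentage_agreement(agreements, disagreements):
        # PA = 100 x agreements / (agreements + disagreements)
        return 100.0 * agreements / (agreements + disagreements)

    # e.g. two transcribers agree on 180 words and disagree on 20 -> 90.0 %
    print(percentage_agreement(180, 20))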
A limitation of percentage agreement is that it makes no correction for
chance agreement (i.e., the transcribers guessing). Cohen's (1960) kappa is
a reliability index that does correct for chance.
k = (Po - Pe) / (1 - Pe)
Po = proportion of observed agreements
Pe = proportion of agreements that would be expected by chance
You're looking for a result greater than 0.7.
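A minimal sketch of the same calculation, assuming each transcriber's
decisions have been coded as two parallel lists of labels (the labels below
are invented for illustration):

    from collections import Counter

    def cohens_kappa(labels1, labels2):
        n = len(labels1)
        # Po: proportion of items on which the two transcribers agree
        po = sum(a == b for a, b in zip(labels1, labels2)) / n
        # Pe: chance agreement from each transcriber's marginal label proportions
        c1, c2 = Counter(labels1), Counter(labels2)
        pe = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
        return (po - pe) / (1 - pe)

    t1 = ["noun", "verb", "noun", "det", "verb"]
    t2 = ["noun", "verb", "det", "det", "verb"]
    print(cohens_kappa(t1, t2))  # about 0.71, i.e. just above the 0.7 threshold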
About the methodology (by Diane Pesco):
CALCULATING RELIABILITY FOR WORD-BY-WORD AGREEMENT:
- Transcriber 2 transcribes a segment of pre-established length.
- Transcribers 1 and 2 are then compared. On the "original" transcript 1:
  - underline words that are discrepant (that is, a word is marked in
    transcriber 2's file but it is not the same word that transcriber 1
    transcribed);
  - circle words that transcriber 2 did not transcribe/omitted;
  - draw a circle to indicate words that transcriber 1 omitted AND pencil in
    the word (this way a single printout can be used to review the video and
    reach consensus as necessary).
- Count all the words in transcriber 1's printout plus all circles with
  pencilled-in words to obtain the total number of words; total at the bottom
  of each page to ensure accuracy in counting.
- Calculate disagreement (and then derive agreement) by dividing the number
  of discrepant words plus omissions (both those of transcriber 1 and 2) by
  the total number of words.
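A rough sketch of how those counts translate into a rate, assuming the
discrepant words and omissions have already been tallied (the numbers are
invented):

    def word_agreement(discrepant, omitted_by_t1, omitted_by_t2, total_words):
        # disagreement = (discrepant + omissions of both transcribers) / total words
        disagreement = (discrepant + omitted_by_t1 + omitted_by_t2) / total_words
        return 100.0 * (1 - disagreement)

    # e.g. 500 words in total, 12 discrepant, 3 + 5 omitted -> 96.0 % agreement
    print(word_agreement(12, 3, 5, 500))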
About the methodology (by Gisela Szagun):
I think different researchers have approached this problem differently. In
our research we started with a training of transcribers. First, transcribers
are introduced to the transcription rules (i.e. spelling of contractions,
etc.). We made our own rules for German. Then they do a transcript which is
checked by an experienced transcriber. Then all the transcribers (we had up
to 7) meet and discuss problems. Then they all do the same transcript, and
the transcriptions are compared and differences discussed. If things are
moderately okay after this training, we work in pairs of transcribers. Each
member of the pair has their transcript checked by the other member, who has
the transcript and listens to the tape. If the person checking hears
something different, they make a comment. You can also have both
transcribers transcribe 100 utterances independently. In our big study (more
than 400 2-hour recordings) we assessed agreement in this way on 7.3 % of
the speech samples. We simply calculated percentage agreement, i.e. the
number of utterances the transcribers agree on versus those they don't.
Agreement should be at least 90 %. We obtained between 96 % and 100 %. To my
knowledge there is no conventional standard for agreement, such as we have,
for instance, in statistical analyses of observer reliability.
Many thanks also to Elena Lieven, Ulrike Gut, Eve V. Clark, Joe Stemberger
and Christelle Dodane for their replies.
Kind regards.
Aurélie
Aurélie Nardy
Université Stendhal
Laboratoire Lidilem
BP 25, 38040 Grenoble cedex 9
Tel (office): 04 76 82 68 13