Rate of agreement for transcriptions
Aurélie NARDY
aurelie.nardy at u-grenoble3.fr
Mon Dec 19 13:00:22 UTC 2005
Dear Colleagues,
Many thanks to all who responded to my request concerning the rate of
agreement for transcriptions.
Two main points emerge: percentage agreement and Cohen's kappa (a
statistical test for assessing the agreement between two or more observers
of the same phenomenon; for more information, see
http://kappa.chez-alice.fr/).
Below I give the initial query, the references I received, and then some of
the replies.
Dear Info-CHILDES Members,
I'm looking for references on the rate of agreement between transcribers
working on the same transcription.
Firstly, I would like to know how to compute a rate of agreement and,
secondly, which value indicates that a transcription produced by two
transcribers is reliable.
Many thanks
Aurélie
References:
Roberts, F. & Robinson, J. D. (2004). Interobserver agreement on first-stage
conversation analytic transcription. Human Communication Research, 30(3).
Yoon, T.-J., Chavarria, S., Cole, J. & Hasegawa-Johnson, M. (2004).
Intertranscriber reliability of prosodic labeling on telephone conversation
using ToBI. In INTERSPEECH-2004, 2729-2732.
Pye, C., Wilcox, K. A. & Siren, K. A. (1988). Refining transcriptions: The
significance of transcriber "errors". Journal of Child Language, 15(1), 17-37.
Gut, U. & Bayerl, P. S. (2004). Measuring the reliability of manual
annotations of speech corpora. Proceedings of Speech Prosody 2004, Nara,
565-568.
Shriberg, L. D. & Lof, G. L. (1991). Reliability studies in broad and narrow
phonetic transcription. Clinical Linguistics and Phonetics, 5, 225-279.
Kent, R. D. (1996). Hearing and believing: Some limits to the
auditory-perceptual assessment of speech and voice disorders. American
Journal of Speech-Language Pathology, 5(3), 7-23.
About Cohen's kappa (by Julian Lloyd):
The two main methods for assessing inter-transcriber reliability are
percentage agreement and Cohen's kappa. Regarding percentage agreement, the
type of study you are carrying out will obviously determine your level of
analysis (e.g., word-by-word, phoneme-by-phoneme, utterance segmentation,
etc.). You assess reliability for a sample of your data, say 20%. Taking
words as an example, you would calculate the number of times that the two
transcribers agree and disagree on words. Percentage agreement is then
calculated as follows:
PA = 100 x (number of agreements) / (number of agreements + number of
disagreements)
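As a concrete illustration, a minimal sketch in Python (the counts below are
invented for the example):

    def percentage_agreement(agreements, disagreements):
        # PA = 100 x agreements / (agreements + disagreements)
        return 100.0 * agreements / (agreements + disagreements)

    # e.g. two transcribers agree on 180 words and disagree on 20 -> 90.0 %
    print(percentage_agreement(180, 20))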
A limitation of percentage agreement is that it makes no correction for
chance agreement (i.e., the transcribers guessing). Cohen's (1960) kappa is
a reliability index that does correct for chance.
k = (Po - Pe) / (1 - Pe)
Po = proportion of observed agreements
Pe = proportion of agreements that would be expected by chance
You're looking for a result greater than 0.7.
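A minimal sketch of the same calculation, assuming each transcriber's
decisions have been coded as two parallel lists of labels (the labels below
are invented for illustration):

    from collections import Counter

    def cohens_kappa(labels1, labels2):
        n = len(labels1)
        # Po: proportion of items on which the two transcribers agree
        po = sum(a == b for a, b in zip(labels1, labels2)) / n
        # Pe: chance agreement from each transcriber's marginal label proportions
        c1, c2 = Counter(labels1), Counter(labels2)
        pe = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
        return (po - pe) / (1 - pe)

    t1 = ["noun", "verb", "noun", "det", "verb"]
    t2 = ["noun", "verb", "det", "det", "verb"]
    print(cohens_kappa(t1, t2))  # about 0.71, i.e. just above the 0.7 threshold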
About the methodology (by Diane Pesco):
CALCULATING RELIABILITY FOR WORD-BY-WORD AGREEMENT:
- Transcriber 2 transcribes a segment of pre-established length.
- Transcribers 1 and 2 are then compared. On the "original" transcript 1:
  - underline words that are discrepant (that is, a word is marked in
    transcriber 2's file but it is not the same word that transcriber 1
    transcribed);
  - circle words that transcriber 2 did not transcribe/omitted;
  - draw a circle to indicate words that transcriber 1 omitted AND pencil in
    the word (this way a single printout can be used to review the video and
    reach consensus as necessary).
- Count all the words in transcriber 1's printout plus all circles with
  pencilled-in words to obtain the total number of words; total at the bottom
  of each page to ensure accuracy in counting.
- Calculate disagreement (and then derive agreement) by dividing the number
  of discrepant words plus omissions (both those of transcriber 1 and 2) by
  the total number of words.
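A rough sketch of how those counts translate into a rate, assuming the
discrepant words and omissions have already been tallied (the numbers are
invented):

    def word_agreement(discrepant, omitted_by_t1, omitted_by_t2, total_words):
        # disagreement = (discrepant + omissions of both transcribers) / total words
        disagreement = (discrepant + omitted_by_t1 + omitted_by_t2) / total_words
        return 100.0 * (1 - disagreement)

    # e.g. 500 words in total, 12 discrepant, 3 + 5 omitted -> 96.0 % agreement
    print(word_agreement(12, 3, 5, 500))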
About the methodology (by Gisela Szagun):
I think different researchers have approached this problem differently. In
our research we started with a training of transcribers. First, transcribers
are introduced to the transcription rules (i.e. spelling of contractions,
etc.). We made our own rules for German. Then they do a transcript which is
checked by an experienced transcriber. Then all the transcribers (we had up
to 7) meet and discuss problems. Then they all do the same transcript, and
the transcriptions are compared and differences discussed. If things are
moderately okay after this training, we work in pairs of transcribers. Each
member of the pair has their transcript checked by the other member, who has
the transcript and listens to the tape. If the person checking hears
something different, they make a comment. You can also have both
transcribers transcribe 100 utterances independently. In our big study (more
than 400 2-hour recordings) we assessed agreement in this way on 7.3 % of
the speech samples. We simply calculated percentage agreement, i.e. the
number of utterances the transcribers agree on versus those they don't.
Agreement should be at least 90 %. We obtained between 96 % and 100 %. To my
knowledge there is no conventional standard for agreement, such as we have,
for instance, in statistical analyses of observer reliability.
Many thanks also to Elena Lieven, Ulrike Gut, Eve V. Clark, Joe Stemberger
and Christelle Dodane for their replies.
Kind regards.
Aurélie
Aurélie Nardy
Université Stendhal
Laboratoire Lidilem
BP 25, 38040 Grenoble cedex 9
Tel (office): 04 76 82 68 13