<html>
<font face="Arial, Helvetica"><b> Dear Colleagues,<br>
Many thanks to all who responded to my request concerning the rate of
agreement for transcriptions. <br>
Two main points emerge: percentage agreement and Cohen's kappa (a
statistical measure for assessing the agreement between two or more
observers of the same phenomenon; for more information, see
</font><a href="http://kappa.chez-alice.fr/"><font face="Courier New, Courier" color="#0000FF"><u>http://kappa.chez-alice.fr/</u></font></a><font face="Courier New, Courier">).<br>
</font><font face="Arial, Helvetica">Below I give the initial query, the
references I received, and then some of the replies.<br>
<br>
</b></font>Dear Info-CHILDES Members,<br>
I'm looking for references on the rate of agreement between transcribers
of the same transcription.<br>
First, I would like to know how to compute a rate of agreement and,
second, which value indicates that a transcription produced by two
transcribers is reliable.<br>
Many thanks<br>
Aurélie<br>
<br>
<font face="@Arial Unicode MS"><u>References</u>: <br>
Roberts, F., &amp; Robinson, J. D. (2004). Interobserver agreement on
first-stage conversation analytic transcription. Human Communication
Research, 30(3).<br>
Yoon, T.-J., Chavarria, S., Cole, J., &amp; Hasegawa-Johnson, M. (2004).
Intertranscriber reliability of prosodic labeling on telephone
conversation using ToBI. In INTERSPEECH-2004, 2729-2732.<br>
Pye, C., Wilcox, K. A., &amp; Siren, K. A. (1988). Refining transcriptions:
The significance of transcriber "errors". Journal of Child Language,
15(1), 17-37.<br>
Gut, U., &amp; Bayerl, P. S. (2004). Measuring the reliability of manual
annotations of speech corpora. Proceedings of Speech Prosody 2004, Nara,
565-568.<br>
Shriberg, L. D., &amp; Lof, G. L. (1991). Reliability studies in broad and
narrow phonetic transcription. Clinical Linguistics and Phonetics, 5,
225-279.<br>
Kent, R. D. (1996). Hearing and believing: Some limits to the
auditory-perceptual assessment of speech and voice disorders. American
Journal of Speech-Language Pathology, 5(3), 7-23.<br>
<br>
<u>A bout Cohen's Kappa</u>:(by Julian Lloyd).<br>
The two main methods for assessing inter-transcriber reliability are
percentage agreement and Cohen's kappa. Regarding percentage agreement,
the type of study you are carrying out will obviously determine your
level of analysis (e.g., word-by-word, phoneme-by-phoneme, utterance
segmentation, etc.). You assess reliability for a sample of your data, say
20%. Taking words as an example, you would calculate the number of times
that the two transcribers agree and disagree on words. Percentage
agreement is then calculated as follows:<br>
PA = 100 x number of agreements / (number of agreements + number of
disagreements)<br>
A limitation of percentage agreements is that they do not make any
corrections for chance (i.e., the transcriber guessing). Cohen's (1960)
kappa is a reliability index that does correct for chance.<br>
k = (Po - Pe) / (1 - Pe)<br>
Po = proportion of observed agreements<br>
Pe = proportion of agreements that would be expected by chance<br>
You're looking for a result greater than 0.7.<br>
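For illustration, here is a minimal Python sketch of both measures; it
assumes the two transcriptions are already aligned item by item and
treats each word (or label) as a category, with made-up example data:<br>
<pre>
# Minimal sketch: percentage agreement and Cohen's kappa for two aligned
# sequences of labels (words, phonemes, prosodic tones, ...). Assumes the
# two transcriptions have already been aligned item by item.
from collections import Counter

def percentage_agreement(t1, t2):
    agreements = sum(a == b for a, b in zip(t1, t2))
    return 100 * agreements / len(t1)

def cohens_kappa(t1, t2):
    n = len(t1)
    po = sum(a == b for a, b in zip(t1, t2)) / n   # observed agreement
    c1, c2 = Counter(t1), Counter(t2)
    # chance agreement: sum over labels of the product of the two
    # transcribers' marginal proportions
    pe = sum((c1[lab] / n) * (c2[lab] / n) for lab in set(c1) | set(c2))
    return (po - pe) / (1 - pe)

# Hypothetical six-word sample with one disagreement
t1 = ["the", "cat", "sat", "on", "the", "mat"]
t2 = ["the", "cat", "sat", "in", "the", "mat"]
print(percentage_agreement(t1, t2))   # 83.3...
print(cohens_kappa(t1, t2))           # about 0.79
</pre>
In this hypothetical sample, one substitution out of six words gives
83.3 % agreement and a kappa of about 0.79, just above the 0.7 threshold
mentioned above.<br>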
<br>
<u>About the methodology</u>: (by Diane Pesco)<br>
CALCULATING RELIABILITY FOR WORD-WORD AGREEMENT:<br>
Transcriber 2 transcribes a segment of pre-established length.<br>
Transcribers 1 &amp; 2 are then compared, on the "original" transcript 1:<br>
underline words that are discrepant (that is, a word is marked in
transcriber 2's file but it is not the same word that transcriber 1
transcribed);<br>
circle words that transcriber 2 did not transcribe/omitted;<br>
draw a circle to indicate words that transcriber 1 omitted AND pencil in
the word (this way a single printout can be used to review the video and
reach consensus as necessary);<br>
count all the words in transcriber 1's printout plus all circles with
pencilled-in words to obtain the total number of words; total at the
bottom of each page to ensure accuracy in counting;<br>
calculate disagreement (then derive agreement) by dividing the number of
discrepant words plus the number of omissions (both those of transcriber
1 and 2) by the total number of words.<br>
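As a sketch of that final step in Python, with hypothetical counts:<br>
<pre>
# Sketch of the word-level calculation described above, using
# hypothetical counts.
discrepant = 12       # words transcribed differently by the two transcribers
omitted_by_t2 = 5     # words transcriber 2 did not transcribe
omitted_by_t1 = 3     # words transcriber 1 omitted (pencilled in)
total_words = 400     # words in transcript 1 + pencilled-in words

disagreement = (discrepant + omitted_by_t2 + omitted_by_t1) / total_words
agreement = 1 - disagreement
print(round(100 * agreement, 1))   # 95.0 (% agreement)
</pre>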
<br>
<u>About the methodology</u>: (by Gisela Szagun)<br>
I think different researchers have approached this problem differently.
In our research we started by training the transcribers. First,
transcribers are introduced to the transcription rules (e.g. the spelling
of contractions, etc.). We made our own rules for German. Then they do a
transcript which is checked by an experienced transcriber. Then all the
transcribers (we had up to 7) meet and discuss problems. Then they all do
the same transcript and transcriptions are compared and differences
discussed. If things are moderately okay after this training, we work in
pairs of transcribers. Each member of the pair has their transcript
checked by the other member who has the transcript and listens to the
tape. If the person checking hears something different, they make a
comment. You can also have both transcribers do 100 utterances
independently, actually transcribing them. In our big study (more than
400 2-hour recordings) we obtained agreement in this way on 7.3 % of the
speech samples. We simply calculated percentage agreement, i.e. the
number of utterances on which the transcriptions agree versus those on
which they do not. Agreement should be at least 90 %; we obtained between
96 % and 100 %. To my knowledge there is no conventional standard for
agreement, as there is, for instance, in statistical analyses of observer
reliability.<br>
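To take the 100-utterance check above as a worked example (hypothetical
figures): if the two independent versions match on 97 utterances,
utterance-level agreement is 97/100 = 97 %, which meets the 90 %
criterion.<br>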
<br>
Many thanks also to Elena Lieven, Ulrike Gut, Eve V. Clark, Joe
Stemberger and Christelle Dodane for their replies.<br>
<br>
Kind regards.<br>
Aurélie<br>
<br>
<br>
<br>
</font><font color="#800080"><b>Aurélie Nardy<br>
</b>Université Stendhal<br>
Laboratoire Lidilem <br>
BP 25, 38040 Grenoble cedex 9<br>
Tel (office): 04 76 82 68 13 <br>
<br>
</font></html>