<html>

<font face="Arial, Helvetica"><b> Dear Colleagues,<br>

Many thanks to all who responded to my request concerning the rate of

agreement for transcriptions. <br>

Two main points emerge: percentage agreement and Cohen's kappa (a

statistic for assessing the agreement between two or more

observers of the same phenomenon; for more information, see

</font><a href="http://kappa.chez-alice.fr/" eudora="autourl"><font face="Courier New, Courier" color="#0000FF"><u>http://kappa.chez-alice.fr/</u></font></a><font face="Courier New, Courier">)<br>

</font><font face="Arial, Helvetica">Below are the initial query, the

references I received, and some of the replies.<br>

<br>

</b></font>Dear Info-CHILDES Members,<br>

I'm looking for references on the rate of agreement between transcribers for

the same transcription.<br>

Firstly, I would like to know how to compute a rate of agreement and,

secondly, which rate value establishes the reliability of a given

transcription produced by two transcribers.<br>

Many thanks<br>

Aurélie<br>

<br>

<font face="@Arial Unicode MS"><u>References</u>: <br>

Roberts, F., & Robinson, J. D. (2004). Interobserver agreement on

first-stage conversation analytic transcription. Human Communication

Research, 30(3).<br>

Yoon, T.-J., Chavarria, S., Cole, J., & Hasegawa-Johnson, M. (2004).

Intertranscriber reliability of prosodic labeling on

telephone conversation using ToBI. In INTERSPEECH-2004,

2729-2732.<br>

Pye, C., Wilcox, K. A., & Siren, K. A. (1988). Refining transcriptions: The

significance of transcriber "errors." Journal of Child

Language, 15(1), 17-37.<br>

Gut, U., & Bayerl, P. S. (2004). Measuring the Reliability of Manual

Annotations of Speech Corpora. Proceedings of Speech Prosody 2004, Nara,

565-568.<br>

Shriberg, L. D., & Lof, G. L. (1991). Reliability studies in broad

and narrow phonetic transcription. Clinical Linguistics and Phonetics, 5,

225-279.<br>

Kent, R. D. (1996). Hearing and believing: some limits to the

auditory-perceptual assessment of speech and voice disorders. American

Journal of Speech-Language Pathology, 5(3), 7-23.<br>

<br>

<u>About Cohen's Kappa</u>: (by Julian Lloyd)<br>

The two main methods for assessing inter-transcriber reliability are

percentage agreement and Cohen's kappa. Regarding percentage agreement,

the type of study you are carrying out will obviously determine your

level of analysis (e.g., word-by-word, phoneme-by-phoneme, utterance

segmentation, etc.). You assess reliability for a sample of your data, say

20%. Taking words as an example, you would calculate the number of times

that the two transcribers agree and disagree on words. Percentage

agreement is then calculated as follows:<br>

PA = 100 x number of agreements / (number of agreements + number of

disagreements)<br>
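As a minimal sketch of this calculation, assuming the two transcripts have already been aligned word by word into lists of equal length (the function name and the sample words are hypothetical, and the formula is read as agreements divided by the sum of agreements and disagreements):<br>

```python
def percentage_agreement(words_a, words_b):
    """Percentage agreement between two word-aligned transcripts."""
    if len(words_a) != len(words_b):
        raise ValueError("transcripts must be aligned to the same length")
    if not words_a:
        raise ValueError("transcripts are empty")
    # An agreement is a position where both transcribers wrote the same word.
    agreements = sum(a == b for a, b in zip(words_a, words_b))
    disagreements = len(words_a) - agreements
    return 100 * agreements / (agreements + disagreements)


# Hypothetical sample: the transcribers differ on one word out of five.
transcriber_1 = ["the", "dog", "is", "big", "now"]
transcriber_2 = ["the", "dog", "was", "big", "now"]
print(percentage_agreement(transcriber_1, transcriber_2))  # 80.0
```

Omitted or inserted words would first have to be handled by the alignment step, which this sketch does not attempt.<br>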

A limitation of percentage agreements is that they do not make any

corrections for chance (i.e., the transcriber guessing). Cohen's (1960)

kappa is a reliability index that does correct for chance.<br>

k = (Po - Pe) / (1 - Pe)<br>

Po = proportion of observed agreements<br>

Pe = proportion of agreements that would be expected by chance<br>

You're looking for a result greater than 0.7.<br>
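A minimal sketch of the kappa formula above, assuming each transcribed item has been coded into a category by both transcribers (the function name and the N/V labels are hypothetical). Pe is estimated here in the usual way, from each transcriber's own category proportions:<br>

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two transcribers coding the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Po: proportion of items on which the two transcribers agree.
    po = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Pe: chance agreement, summed over categories from the product
    # of each transcriber's proportion for that category.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    pe = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if pe == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (po - pe) / (1 - pe)


# Hypothetical sample: 10 items, 8 agreements, so Po = 0.8 and Pe = 0.5.
coder_1 = ["N", "N", "N", "N", "N", "V", "V", "V", "V", "V"]
coder_2 = ["N", "N", "N", "N", "V", "V", "V", "V", "V", "N"]
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.6
```

In this sample the percentage agreement is 80%, but kappa is only 0.6, which falls below the 0.7 threshold mentioned above.<br>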

<br>

<u>About the methodology</u>: (by Diane Pesco)<br>

CALCULATING RELIABILITY FOR WORD-WORD AGREEMENT:<br>

Transcriber 2 transcribes segment of pre-established length<br>

Transcriber 1 & 2 comparison:<br>

On the "original" transcript 1:<br>

underline words that are discrepant (that is, a word is marked in transcriber 2's file but it is not the same word that transcriber 1 transcribed)<br>

circle words that transcriber 2 did not transcribe/omitted<br>

draw a circle to indicate words that transcriber 1 omitted AND pencil in word (this way single printout can be used to review video & reach consensus as necessary)<br>

count all the words in transcriber 1 printout + all circles with penciled words to obtain total # words<br>

total at bottom of each page to ensure accuracy in counting<br>

calculate disagreement (then derive agreement) by dividing # discrepant + # omissions (both those of transcriber 1 and 2) by total # words<br>
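The final counting step can be sketched as follows, assuming the marks on the printout have already been tallied by hand (the function name, parameter names and sample counts are hypothetical):<br>

```python
def word_word_agreement(t1_word_count, discrepant, omitted_by_t2, omitted_by_t1):
    """Return (agreement %, disagreement %) from hand-tallied counts.

    t1_word_count: words printed in transcriber 1's transcript
    discrepant:    underlined words (both transcribed, but differently)
    omitted_by_t2: circled words that transcriber 2 left out
    omitted_by_t1: circles with a penciled-in word that transcriber 1 left out
    """
    # Total words = transcriber 1's words + the words penciled in.
    total_words = t1_word_count + omitted_by_t1
    if total_words <= 0:
        raise ValueError("total word count must be positive")
    mismatches = discrepant + omitted_by_t2 + omitted_by_t1
    if mismatches > total_words:
        raise ValueError("more mismatches than words; check the tallies")
    disagreement = 100 * mismatches / total_words
    return 100 - disagreement, disagreement


# Hypothetical tallies: 190 printed words, 12 underlined, 8 circled,
# and 10 penciled in, giving 30 mismatches out of 200 words.
agreement, disagreement = word_word_agreement(190, 12, 8, 10)
print(agreement, disagreement)  # 85.0 15.0
```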

<br>

<u>About the methodology</u>: (by Gisela Szagun)<br>

I think different researchers have approached this problem differently.

In our research we started with a training of transcribers. First,

transcribers are introduced to the transcription rules (i.e. spelling

of contractions etc.). We made our own rules for German. Then they do a

transcript which is checked by an experienced transcriber. Then all the

transcribers (we had up to 7) meet and discuss problems. Then they all do

the same transcript and transcriptions are compared and differences

discussed. If things are moderately okay after this training, we work in

pairs of transcribers. Each member of the pair has their transcript

checked by the other member who has the transcript and listens to the

tape. If the person checking hears something different they make a

comment. You can also have both transcribers do 100 utterances

independently, actually transcribing them. In our big study (more than

400 2-hour recordings) we assessed agreement in this way on 7.3 % of the

speech samples. We simply calculated percentage agreement, i.e. from the

number of utterances that agree and the number that do not. Agreement should be

at least 90 %. We obtained between 96 % and 100 %. To my knowledge there is no

conventional standard for agreement, as we have, for instance, in

statistical analyses of observer reliability.<br>

<br>

Many thanks also to Elena Lieven, Ulrike Gut, Eve V. Clark, Joe

Stemberger and Christelle Dodane for their replies.<br>

<br>

Kind regards.<br>

Aurélie<br>

<br>

<br>

<br>

</font><font color="#800080"><b>Aurélie Nardy<br>

</b>Université Stendhal<br>

Laboratoire Lidilem <br>

BP 25, 38040 Grenoble cedex 9<br>

Tel (bureau) : 04 76 82 68 13 <br>

<br>

</font></html>