transcript validation

Sigal Uziel-Karl sigaluk at gmail.com
Wed Sep 5 11:04:35 UTC 2007


Dear Elizabeth,

My colleagues and I at Haifa University (Israel) have some experience
transcribing Spoken Palestinian Arabic, which has no written tradition and
has a variety of different dialects. To cope with the transcription task, we
have done the following: (1) we've developed a set of transcription
conventions to represent the Arabic sounds that cannot be represented by the
symbols used for English; (2) we have arbitrarily decided to use the forms
of the most widely used dialect (alternatively one could use the forms of
Modern Standard Arabic) as an anchor, so that on the transcription line each
word is transcribed as uttered, but the anchor word appears in square
brackets following it. For example, ?al [: qal] 'said'. This way both forms
are listed, but in the output of the FREQ command you get only the anchor
form (unless you look for the specific dialectal variation); (3) one of the
transcription headers lists the dialect which the speakers use, for
reference; (4) we run FREQ on the transcripts occasionally and compare the
output lists to make sure there is intra-transcriber consistency, and
inter-transcriber consistency at least in the anchor words.
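
To make points (2) and (4) concrete, here is a rough sketch in Python of the
idea behind the comparison. It is not the FREQ command itself, only an
illustration under simple assumptions: each transcription line is plain
text, an uttered form may be followed by its anchor in the form
"uttered [: anchor]", and words are separated by whitespace. The function
names are made up for this example.

```python
import re
from collections import Counter

# Matches an uttered form followed by its anchor, e.g. "?al [: qal]".
# Assumption: the anchor is written as "[: anchor]" right after the word.
REPLACEMENT = re.compile(r"(\S+)\s+\[:\s*([^\]]+?)\s*\]")


def anchor_tokens(line):
    """Return the words of one line, using the anchor form where one is given."""
    # Keep only the anchor from each "uttered [: anchor]" pair.
    normalized = REPLACEMENT.sub(lambda m: m.group(2), line)
    return normalized.split()


def anchor_frequencies(lines):
    """Count anchor-form words over all lines of one transcription."""
    counts = Counter()
    for line in lines:
        counts.update(anchor_tokens(line))
    return counts


def frequency_differences(lines_a, lines_b):
    """Return words whose anchor-form counts differ between two transcriptions."""
    freq_a = anchor_frequencies(lines_a)
    freq_b = anchor_frequencies(lines_b)
    return {
        word: (freq_a[word], freq_b[word])
        for word in sorted(set(freq_a) | set(freq_b))
        if freq_a[word] != freq_b[word]
    }
```

Run on two transcriptions of the same recording, an empty result means the
two word lists match at the level of anchor forms, even where the uttered
forms were written differently. Any word that does show up is a place where
the transcribers should compare notes.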

Best,
Sigal Uziel-Karl.

-----Original Message-----
From: info-childes at mail.talkbank.org [mailto:info-childes at mail.talkbank.org]
On Behalf Of Elizabeth Prado
Sent: Monday, September 03, 2007 11:16 AM
To: info-childes at mail.talkbank.org
Subject: transcript validation

I am working on transcribing children's speech on the Indonesian island of
Lombok, where the local language is Sasak. I'm working with 6 transcribers,
all of whom are native speakers of Sasak. 10% of every transcript is being
re-transcribed by another transcriber for validation and we're having
difficulty getting high levels of agreement. I think there are two possible
reasons for this: one is that Sasak is rarely written since all education
from elementary school to university is conducted in Indonesian (the
national language). The other is that there is significant dialect variation
across the island. We are trying to give recordings of children to
transcribers from the same dialect (same general area of the island) but
this is difficult since dialect variation can occur from village to village.

The main purpose of the transcriptions is to validate a parent-report
sentence complexity measure that we have developed to evaluate the language
development of children whose mothers received micronutrient supplements
during pregnancy.

I was wondering if anyone has transcribed any non-written languages and if
you have any advice about how to increase agreement between transcribers.
Even when we don't count spelling differences as differences between the
transcriptions, we're still getting agreement <80%. Any advice would be
appreciated!

-- 

*******************************************
Elizabeth Prado
Psychology Department
Fylde C Floor
Lancaster LA14YF
UK
Tel: 01524 592947
Website: http://www.psych.lancs.ac.uk/people/BethPrado.html


