Autoreply: "RE: transcript validation "

Petra Schulz P.Schulz at em.uni-frankfurt.de
Wed Sep 5 11:20:55 UTC 2007


------                                                           ------

This message was automatically generated by email software
The delivery of your message has not been affected.

------                                                           ------

I will be unavailable from September 5 to 9 because of a conference in
Barcelona. In urgent cases, please contact my secretary, Andrea Hegewald:
hegewald at em.uni-frankfurt.de

 Kind regards,
 Petra Schulz

I'm out of town and will not be reading my mail before September 9. Your
mail will be dealt with when I return. In urgent cases, please contact my
secretary Andrea Hegewald: hegewald at em.uni-frankfurt.de

Yours sincerely,
Petra Schulz

------ This is a copy of the message, including all the headers. ------

Received: from mx2.cluster.uni-frankfurt.de ([10.2.21.2])
	by thot.rz.uni-frankfurt.de with esmtp (Exim 4.52)
	id 1ISswJ-002fBE-35; Wed, 05 Sep 2007 13:20:55 +0200
Received: from mail.talkbank.org ([128.2.64.233])
	by mx2.cluster.uni-frankfurt.de with smtp (Exim 4.64)
	(envelope-from <info-childes at mail.talkbank.org>)
	id 1ISsvV-0006ER-5f; Wed, 05 Sep 2007 13:20:05 +0200
Received: from gmail.com by mail.talkbank.org with SMTP; Wed, 5 Sep 2007
 07:05:05 -0400
Date: Wed, 05 Sep 2007 14:04:35 +0300
From: Sigal Uziel-Karl <sigaluk at gmail.com>
Subject: RE: transcript validation
In-reply-to: <46DBC2BC.20602 at lancaster.ac.uk>
X-012-Sender: karl-y at 012.net.il
To: 'Elizabeth Prado' <e.prado at lancaster.ac.uk>,
 info-childes at mail.talkbank.org
Reply-to: sigal at alum.mit.edu
Message-id: <007f01c7efac$8b388920$1700000a at sigallenovo>
MIME-version: 1.0
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
X-Mailer: Microsoft Office Outlook 11
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Thread-index: AcfuWOpf6IvBcdjJTnSJbk8hG0ZMgwBRjfdA
Sender: <info-childes at mail.talkbank.org>
Precedence: List
List-Software: LetterRip Pro 4.05 (1404) by LetterRip Software, LLC.
List-Unsubscribe: <mailto:info-childes-off at mail.talkbank.org>
X-LR-SENT-TO: em.uni-frankfurt.de
X-MailScanner: Found to be clean
X-MailScanner-SpamCheck: not spam, SpamAssassin (nicht zwischen gespeichert,
	Wertung=0, benoetigt 4, autolearn=not spam)

Dear Elizabeth,

My colleagues and I at Haifa University (Israel) have some experience
transcribing Spoken Palestinian Arabic, which has no written tradition and
has a variety of different dialects. To cope with the transcription task, we
have done the following: (1) we've developed a set of transcription
conventions to represent the Arabic sounds that cannot be represented by the
symbols used for English; (2) we have arbitrarily decided to use the forms
of the most widely used dialect (alternatively one could use the forms of
Modern Standard Arabic) as an anchor, so that on the transcription line each
word is transcribed as uttered, but the anchor word appears in square
brackets following it. For example, ?al [: qal] 'said'. This way both forms
are listed, but in the output of the FREQ command you get only the anchor form
(unless you look for the specific dialectal variation); (3) one of the
transcription headers lists the dialect which the speakers use, for
reference; (4) we run FREQ on the transcripts occasionally and compare the
output lists to make sure there is intra-transcriber consistency, and
inter-transcriber consistency at least in the anchor words.
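
To make points (2) and (4) concrete, here is a rough Python sketch of the
idea. It is not the FREQ command itself and makes no claim about how FREQ
works internally; it only assumes that an anchor is written as
"uttered [: anchor]", as in the ?al [: qal] example above. The function names
and the tokens "aaa" and "bbb" are invented placeholders for illustration.

```python
import re
from collections import Counter

# Assumption: an anchor is written as "uttered [: anchor]", e.g. "?al [: qal]".
REPLACEMENT = re.compile(r"(\S+)\s+\[:\s*([^\]]+?)\s*\]")


def anchor_tokens(line):
    """Tokens of a line with each 'uttered [: anchor]' pair reduced to the anchor."""
    return REPLACEMENT.sub(lambda m: m.group(2), line).split()


def surface_tokens(line):
    """Tokens of a line as uttered, with the bracketed anchors dropped."""
    return REPLACEMENT.sub(lambda m: m.group(1), line).split()


def freq(lines, use_anchor=True):
    """Count word forms, by anchor form (default) or by uttered form."""
    pick = anchor_tokens if use_anchor else surface_tokens
    counts = Counter()
    for line in lines:
        counts.update(pick(line))
    return counts


def disagreements(lines_a, lines_b):
    """Anchor forms whose counts differ between two transcriptions."""
    a, b = freq(lines_a), freq(lines_b)
    return {w: (a[w], b[w]) for w in a.keys() | b.keys() if a[w] != b[w]}


# Two hypothetical transcriptions of the same stretch of speech.
transcriber_a = ["?al [: qal] aaa", "qal bbb"]
transcriber_b = ["qal aaa", "?al [: qal] bbb bbb"]

print(freq(transcriber_a))                      # counts by anchor form
print(freq(transcriber_a, use_anchor=False))    # counts by uttered form
print(disagreements(transcriber_a, transcriber_b))
```

Counting by anchor form means that dialectal variants of the same word do not
show up as disagreements, so the words that remain in the comparison are the
ones worth checking by hand.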

Best,
Sigal Uziel-Karl.

-----Original Message-----
From: info-childes at mail.talkbank.org [mailto:info-childes at mail.talkbank.org]
On Behalf Of Elizabeth Prado
Sent: Monday, September 03, 2007 11:16 AM
To: info-childes at mail.talkbank.org
Subject: transcript validation

I am working on transcribing children's speech on the Indonesian island of
Lombok, where the local language is Sasak. I'm working with 6 transcribers,
all of whom are native speakers of Sasak. 10% of every transcript is being
re-transcribed by another transcriber for validation and we're having
difficulty getting high levels of agreement. I think there are two possible
reasons for this: one is that Sasak is rarely written since all education
from elementary school to university is conducted in Indonesian (the
national language). The other is that there is significant dialect variation
across the island. We are trying to give recordings of children to
transcribers from the same dialect (same general area of the island) but
this is difficult since dialect variation can occur from village to village.

The main purpose of the transcriptions is to validate a parent-report
sentence complexity measure that we have developed to evaluate the language
development of children whose mothers received micronutrient supplements
during pregnancy.

I was wondering if anyone has transcribed any non-written languages and if
you have any advice about how to increase agreement between transcribers.
Even when we don't count spelling differences as differences between the
transcriptions, we're still getting agreement <80%. Any advice would be
appreciated!

-- 

*******************************************
Elizabeth Prado
Psychology Department
Fylde C Floor
Lancaster LA14YF
UK
Tel: 01524 592947
Website: http://www.psych.lancs.ac.uk/people/BethPrado.html


