Thai Frogs

Brian MacWhinney macw at cmu.edu
Tue Dec 26 08:31:53 UTC 2000


Dear Info-CHILDES,

  I am happy to announce the addition to the CHILDES database of a new set
of Frog Story narrative transcripts from Thai, donated by Jordan Zlatev of
Lund University and Peerapat Yangklang of Chulalongkorn University, Bangkok.
The corpus is in CHAT format and follows the additional guidelines for Frog
Story transcription formulated by Berman and Slobin. The readme for the corpus
follows. Some of the table formatting is a bit off because of the conversion
from FrameMaker. However, the original documentation for this corpus, along
with that for all the other corpora added in 2000, can be found in the
electronic version of the database manual on the childes.psy.cmu.edu server.
The data include both a romanized transcription and a transcript in Thai
orthography. However, to read the Thai orthography on Windows you need a
specific font called Cordia.

--Brian MacWhinney


Thai Corpus

Zlatev, Jordan
Lund University
jordan_zlatev at lucs.lu.se

Yangklang, Peerapat
Department of Linguistics
Chulalongkorn University
Bangkok, Thailand

This data was collected as part of the First Language Acquisition of Thai
project, funded by The Swedish Foundation for International Cooperation in
Research and Higher Education (STINT) and hosted by the Department of
Linguistics, Chulalongkorn University, Thailand, during 2000. Although our project focused
on the development of spatial expressions in Thai, we made a serious effort
to make the data as consistent and general as possible so that it could be
used for other studies as well. We would also like to thank everyone who
helped us carry out the collection and transcription of this data: Janich
Feangfu, Maneeya Sangjan, Mingmit Sriprasit, Soraya Osathanonda, Martha
Karrebaek Hentze and Katarina Lindblom.

The child data was collected in three Bangkok schools, and the adult data was
collected from students of Chulalongkorn University. The interviewer, always
a native Thai speaker, first showed the Frog Story book to the subject and
let the subject look through it alone for about 5 minutes. For the children,
the instructions were approximately as follows:
This story is about a boy, his dog, and a frog.
I'll let you take a look at the pictures of the story first.
Then, I will ask you to tell me the story, picture by picture.

The interviewer sometimes encouraged the child to proceed with the story.
These utterances of the interviewer have not been transcribed. Even though
we tried to keep the elicitation conditions as uniform as possible, there
were inevitable differences, since the data was collected by 5 different
interviewers. (The name of the interviewer appears first in the
@Transcriber list.)

Transcription
Each recorded narrative was transcribed in standard Thai orthography, in
almost all cases by the person who performed the interview. The Thai
transcription was then converted into a phonemic notation with the
semi-automatic Thai Transcription program developed at the Department of
Linguistics, Chulalongkorn University. The consonants are as follows:
Thai Consonants
                  labial  postdental  palatal  velar  glottal
+voice stop       b       d
-voice -asp       p       t           c        k      ?
-voice +asp       ph      th          ch       kh
spirants          f       s                           h
semivowel         w                   j
nasal             m       n                    N
lateral                   l
trill                     r

The vowels are as follows:
Thai Vowels
         Front  Central  Back
Close    i      U        u
Mid      e      q        o
Open     x      a        O


Tones were marked with a digit at the end of each syllable: Mid 0, Low 1,
Falling 2, High 3, Rising 4. Because of CHAT requirements, the ? for the
glottal stop was omitted. The presence of the glottal stop is nevertheless
derivable from the data, since Thai syllables cannot begin with a vowel or
end with a short vowel. Wherever a syllable seems to do so in the data,
there is an "invisible" glottal stop before the initial vowel or after the
final short vowel.
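To illustrate the notation, the following minimal sketch splits one romanized syllable into onset, vowel, coda and tone using the symbols from the two tables above, and then restores the "invisible" glottal stop. It rests on assumptions that the readme does not state. First, a long vowel is written as a doubled vowel letter (as in naa2 or thii2), so a single vowel letter marks a short vowel. Second, the possible final consonants are p, t, k, m, n, N, w and j. The function names parse_syllable and restore_glottal are hypothetical and not part of any CHILDES tool. Multi-syllabic words would have to be split into syllables first.

```python
import re

# Onsets come from the Thai Consonants table. The glottal '?' is omitted in
# the corpus, so a syllable may appear to begin with a vowel.
# Aspirated digraphs are listed first so that "ph" is not read as "p" + "h".
SYLLABLE = re.compile(
    r"^(?P<onset>ph|th|ch|kh|[bdptckfshwjmnNlr])?"
    r"(?P<nucleus>[iUueqoxaO]+)"   # vowel letters from the Thai Vowels table
    r"(?P<coda>[ptkmnNwj])?"       # ASSUMED set of final consonants
    r"(?P<length>:)?"              # extra-long vowel marker, e.g. maa:4
    r"(?P<tone>[0-4])$"            # tone digit: 0 mid ... 4 rising
)

TONES = {"0": "mid", "1": "low", "2": "falling", "3": "high", "4": "rising"}


def parse_syllable(syllable: str) -> dict:
    """Split one romanized syllable into its parts (hypothetical helper)."""
    match = SYLLABLE.match(syllable)
    if match is None:
        raise ValueError(f"not a well-formed syllable: {syllable!r}")
    parts = match.groupdict()
    parts["tone_name"] = TONES[parts["tone"]]
    return parts


def restore_glottal(syllable: str) -> str:
    """Re-insert the omitted glottal stop '?' where Thai phonotactics require it."""
    p = parse_syllable(syllable)
    onset = p["onset"] or "?"  # no Thai syllable begins with a vowel
    coda = p["coda"] or ""
    # ASSUMPTION: one vowel letter = short vowel. Syllables marked
    # extra-long with ':' are left untouched.
    if not coda and len(p["nucleus"]) == 1 and not p["length"]:
        coda = "?"
    return onset + p["nucleus"] + coda + (p["length"] or "") + p["tone"]
```

For example, restore_glottal("kO2") gives "kO?2". By contrast, lxxw3 is returned unchanged because it ends in a consonant.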

Segmentation
The transliteration was placed on the main tier, and the transcription in
Thai orthography, using the font Cordia UPC 14 (Win95:CordiaUPC:-19:222), was
placed on a dependent tier. Thai orthography does not place spaces between
words, and the computer program does not perform word segmentation, so word
segmentation had to be done manually. Compound expressions sometimes posed
problems. In deciding how to treat a multi-syllabic word, we used the
following criteria:
1.    One simple word IFF the two (or more) syllables do not have any clear
separate meaning (e.g. naa2taaN1 'window')
2.    One complex word ("+" between the syllables) IFF the syllables have
clear separate meanings, but the sum of the parts does not equal the whole
(e.g. phuu2+jaj1 'adult', dek1+chaaj0 'boy')
3.    Noun phrase (SPACE between the parts) IFF the parts have separate
meanings, BUT the parts combine systematically to give the meaning of the
whole (e.g. raN0 phUN2 'bee hive', maa4 noj4 'little dog')
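The three conventions can be read back mechanically from the main tier. The following sketch (classify_word is a hypothetical name) assumes that every syllable ends in its tone digit, so syllables can be found by splitting after each digit. It also recognizes the double ++ used for reduplication. A noun phrase (criterion 3) needs no special handling, because its parts are simply separate space-delimited words.

```python
import re

# A syllable: a run of characters other than digits, '+' or spaces,
# closed by a tone digit 0-4.
SYLLABLE = re.compile(r"[^0-9+\s]+[0-4]")


def classify_word(word: str) -> tuple[str, list[str]]:
    """Classify one space-delimited main-tier word by the segmentation criteria."""
    syllables = SYLLABLE.findall(word)
    if "++" in word:
        kind = "reduplication"        # e.g. luuk2++luuk2
    elif "+" in word:
        kind = "complex word"         # criterion 2: parts joined with '+'
    elif len(syllables) > 1:
        kind = "simple word"          # criterion 1: syllables written together
    else:
        kind = "monosyllabic word"
    return kind, syllables
```

For instance, classify_word("dek1+chaaj0") returns ("complex word", ["dek1", "chaaj0"]). Splitting "raN0 phUN2" on the space yields two monosyllabic words, which together form a noun phrase.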

CHAT Formatting
The rough phonemic transcription was then checked against the original tape
recordings and corrected. Deviations from standard pronunciation were
included using the CHAT convention of placing the standard form in square
brackets after the pronounced form, e.g. laN0 [: raN0]. We then listened
through the tape once more in order to mark all pauses, short (#) and long
(##), as well as extra-long vowels, e.g. maa:4. Repetitions and retracings
were marked using the CHAT conventions, i.e. the repeated material was
surrounded by <> and followed by [/], [//] or [///].
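To show how these codes relate to the words actually intended, the sketch below reduces a coded main-tier line to the standard forms of its words. It handles only the conventions just listed: [: ...] replacement, <...> followed by [/], [//] or [///], single-word retracing, # and ## pauses, and ':' lengthening. It is not a CLAN tool, the function name standard_words is hypothetical, and the input line in the usage note is constructed for illustration.

```python
import re

WORD = r"[^\s<>\[\]]+"  # a main-tier word: no spaces, angle or square brackets


def standard_words(line: str) -> list[str]:
    """Reduce a coded main-tier line to the standard forms of its words."""
    # 1. Replace each pronounced form by its standard form: laN0 [: raN0] -> raN0
    line = re.sub(WORD + r"\s*\[:\s*([^\]]+?)\s*\]", r"\1", line)
    # 2. Drop retraced material in angle brackets: <...> [/], [//] or [///]
    line = re.sub(r"<[^>]*>\s*\[/{1,3}\]", " ", line)
    # 3. Drop a single retraced word: maa4 [/] maa4 -> maa4
    line = re.sub(WORD + r"\s*\[/{1,3}\]", " ", line)
    # 4. Drop any other bracketed code, e.g. the clause marker [c]
    line = re.sub(r"\[[^\]]*\]", " ", line)
    words = []
    for token in line.split():
        if not re.search(r"[A-Za-z]", token):  # pauses (#, ##) and terminators
            continue
        words.append(token.replace(":", ""))   # maa:4 -> maa4
    return words
```

Applied to the constructed line "<maa4 noj4> [/] maa4 noj4 # laN0 [: raN0] phUN2 ## maa:4", it returns ["maa4", "noj4", "raN0", "phUN2", "maa4"]. The replacement is resolved first so that a replaced word inside a retraced group is still removed with its group.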

Following the CHAT convention, each main line was made to include only one
utterance, defined by a combination of phonetic and grammatical criteria.
Thus, a line/utterance ends when both of the following conditions are met:
1.    There is a short pause (#), a long pause (##), or a "vowel lengthening",
and
2.    this coincides with the end of a clause, marked as [c].
If only (1) is met, the pause is marked within the utterance/line. If only
(2) is met, [c] marks the end of the clause but not the end of the
utterance/line. However, we sometimes allow a line/utterance to end even if
there is a word between the pause and the clause boundary.
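The two-condition rule can be expressed as a small decision procedure. The following sketch covers only the basic rule, not the occasional exception described in the last sentence. It assumes the narrative is available as a flat token list in which #, ## and [c] stand between words. A lengthened vowel is detected by a ':' in the preceding word. The function name split_utterances and the placeholder tokens w1, w2, ... are hypothetical.

```python
PAUSES = {"#", "##"}


def split_utterances(tokens: list[str]) -> list[list[str]]:
    """Group a flat token stream into utterances (hypothetical helper).

    An utterance ends at a gap between two words when the gap contains a
    clause marker [c] AND either a pause (# or ##) or the preceding word
    has a lengthened vowel (marked with ':').
    """
    utterances: list[list[str]] = []
    current: list[str] = []
    gap: list[str] = []  # markers seen since the last word

    def close_gap(at_end: bool) -> None:
        nonlocal current
        lengthened = bool(current) and ":" in current[-1]
        pause = any(marker in PAUSES for marker in gap)
        boundary = "[c]" in gap and (pause or lengthened)
        current.extend(gap)  # markers stay on the line they follow
        gap.clear()
        if (boundary or at_end) and current:
            utterances.append(current)
            current = []

    for token in tokens:
        if token in PAUSES or token == "[c]":
            gap.append(token)
        else:
            if gap:
                close_gap(at_end=False)
            current.append(token)
    close_gap(at_end=True)
    return utterances
```

For example, the token list w1 w2 [c] # w3 [c] w4 # w5 w6: [c] w7 is split after "#" (clause end plus pause) and after "w6: [c]" (clause end plus lengthening). The [c] after w3 and the pause after w4 stay inside the second line because each meets only one condition.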

Because of the ubiquity of serial verb constructions in Thai, it was not
always easy to determine where a clause ends; for example, the criterion of
"one unitary predication" used by Berman and Slobin (1994) could not be
applied. The criteria for deciding that there is a clause boundary were the
following:
1.    Before a new explicit or implicit subject
2.    Before the complementizers thii2 and sUN2 ('that'), when there are
verbs both preceding and following these words
3.    Before a conjunction (lx3 'and', lxxw3 'and', lxxw3 kO2 'and then',
kO2 'then', thxx1 'but'), when there are verbs both preceding and following
these words
4.    After wa2 'that', when there are verbs both preceding and following it
5.    When a chain of verbs can be interrupted with any conjunction (e.g.
lxxw3)
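Criteria 2 to 4 depend only on particular words and on whether verbs occur on both sides, so they can be sketched as code. Criteria 1 and 5 require the transcriber's judgement and are not attempted. The sketch below assumes that the set of verbs in the narrative is supplied from outside (the readme does not describe any verb tagging). It reads "preceding and following" as "anywhere earlier and anywhere later in the word sequence", which is an interpretation. It treats lxxw3 kO2 as one conjunction. The function name mark_clauses and the placeholder verbs V1 and V2 are hypothetical.

```python
COMPLEMENTIZERS = {"thii2", "sUN2"}          # criterion 2: boundary before
CONJUNCTIONS = {"lx3", "lxxw3", "kO2", "thxx1"}  # criterion 3: boundary before
AFTER_MARKERS = {"wa2"}                       # criterion 4: boundary after


def mark_clauses(words: list[str], verbs: set[str]) -> list[str]:
    """Insert [c] according to criteria 2-4 (a sketch, not the authors' procedure)."""
    out: list[str] = []
    for i, word in enumerate(words):
        verb_before = any(w in verbs for w in words[:i])
        verb_after = any(w in verbs for w in words[i + 1:])
        both = verb_before and verb_after
        # "lxxw3 kO2" ('and then') counts as one conjunction: no second [c]
        continues_and_then = word == "kO2" and i > 0 and words[i - 1] == "lxxw3"
        if both and (word in COMPLEMENTIZERS or word in CONJUNCTIONS) and not continues_and_then:
            out.append("[c]")
        out.append(word)
        if both and word in AFTER_MARKERS:
            out.append("[c]")
    return out
```

With verbs = {"V1", "V2"}, mark_clauses(["V1", "lxxw3", "kO2", "V2"], verbs) gives ["V1", "[c]", "lxxw3", "kO2", "V2"]. With the same verbs, mark_clauses(["V1", "wa2", "V2"], verbs) places [c] after wa2.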

Finally, each of the 50 narratives was read through once again by at least
two different checkers, correcting any inconsistencies. Furthermore, a
listing of all the words in the corpus was produced using the CLAN command
freq +k *.cha +u +r6, and we went through this list word by word, making
sure that each word is transcribed consistently throughout the corpus. In
addition to the standard CHAT codes, we used the %tai dependent tier for the
Thai transcription and a double ++ to indicate reduplication, as in
"luuk2++luuk2".
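A similar consistency check can be sketched outside CLAN. The code below is not CLAN's freq and does not reproduce its options. It counts words on the main tiers only, so the %tai tier is skipped. Tab-indented continuation lines are joined to the main tier they continue. The standard form is kept where [: ...] was used, and retraced words are kept. Counting is case-sensitive because the romanization uses case contrastively (n/N, o/O, u/U). The extra spelling_variants step, which is an addition and not part of the procedure described above, groups words that differ only in '+' marks, such as a compound written once with and once without "+". All function names are hypothetical. The code assumes the main tiers are plain ASCII and replaces any undecodable bytes.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path

WORD = r"[^\s<>\[\]]+"


def main_tiers(lines):
    """Yield the text of each main tier (*SPK:), joining tab-indented continuations."""
    current = None
    for line in lines:
        if line.startswith("*"):
            if current is not None:
                yield current
            current = line.split(":", 1)[1]
        elif line.startswith("\t") and current is not None:
            current += " " + line
        else:  # a header (@) or dependent tier (%) ends the main tier
            if current is not None:
                yield current
            current = None
    if current is not None:
        yield current


def tier_words(text):
    """Words on one main tier; codes, pauses and terminators are dropped."""
    text = re.sub(WORD + r"\s*\[:\s*([^\]]+?)\s*\]", r"\1", text)  # laN0 [: raN0] -> raN0
    text = re.sub(r"\[[^\]]*\]", " ", text)                         # [/], [c], ...
    text = text.replace("<", " ").replace(">", " ")
    return [t for t in text.split() if re.search(r"[A-Za-z]", t)]


def count_words(lines):
    counts = Counter()
    for tier in main_tiers(lines):
        counts.update(tier_words(tier))
    return counts


def count_files(paths):
    """Merged, case-sensitive word counts over several .cha files."""
    counts = Counter()
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        counts += count_words(text.splitlines())
    return counts


def spelling_variants(counts):
    """Group words that differ only in '+' marks, e.g. dek1+chaaj0 vs dek1chaaj0."""
    groups = defaultdict(set)
    for word in counts:
        groups[word.replace("+", "")].add(word)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}
```

A call such as spelling_variants(count_files(sorted(Path(".").glob("*.cha")))) lists every group of competing spellings in the current directory for manual review.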

Files
The files are summarized in the following table. The subjects' names do not
appear in the transcripts.
Thai Frog Files
File  Age (y;m.d)  Sex  Date
3a    4;3.8        f    18-FEB-2000
3b    3;11.20      f    18-FEB-2000
3c    4;4.4        m    18-FEB-2000
3d    3;10.12      f    18-FEB-2000
3e    3;11.16      m    18-FEB-2000
3f    4;0.2        f    8-SEP-2000
3g    3;11.19      m    8-SEP-2000
3h    3;6.15       m    8-SEP-2000
3I    3;11.2       m    8-SEP-2000
3j    3;11.22      f    8-SEP-2000
5a    6;02.15      m    18-FEB-2000
5b    5;11.23      m    18-FEB-2000
5c    5;6.01       m    18-FEB-2000
5d    5;06.11      m    18-FEB-2000
5e    5;08.18      f    18-FEB-2000
5f    6;04.1       m    18-FEB-2000
5g    5;10.10      f    18-FEB-2000
5h    5;06.25      m    18-FEB-2000
5I    5;11.21      m    18-FEB-2000
5j    6;01.17      f    16-FEB-2000
9a    8;7.8        m    16-FEB-2000
9b    9;0.21       f    16-FEB-2000
9c    8;10.27      m    16-FEB-2000
9d    9;0.11       f    16-FEB-2000
9e    9;2.0        m    16-FEB-2000
9f    9;3.2        f    16-FEB-2000
9g    8;10.8       m    16-FEB-2000
9h    8;7.0        f    16-FEB-2000
9I    9;0.3        f    16-FEB-2000
9j    9;7.2        f    2-FEB-2000
11a   10;10.16     f    16-FEB-2000
11b   10;05.22     f    16-FEB-2000
11c   11;1.5       f    16-FEB-2000
11d   11;4.20      f    16-FEB-2000
11e   11;2.24      m    16-FEB-2000
11f   10;8.27      m    16-FEB-2000
11g   10;3.22      m    16-FEB-2000
11h   11;0.16      m    16-FEB-2000
11i   10;4.24      f    16-FEB-2000
11j   11;3.18      m    16-FEB-2000
20a   22;8.0       f    15-APR-2000
20b   35;06.0      f    15-APR-2000
20c   21;7.0       f    15-APR-2000
20d   22;11.0      f    15-APR-2000
20e   22;8.0       f    15-APR-2000
20f   22;11.30     m    15-APR-2000
20g   29;01.20     m    15-APR-2000
20h   20;07.11     m    15-APR-2000
20I   23;10.27     m    15-APR-2000
20j   19;06.06     m    15-APR-2000

            

If you publish any paper based on this data, please send an MS-Word or
PDF-formatted version of your paper as an attachment to
jordan_zlatev at lucs.lu.se. Users of this data should cite Zlatev, J. and
Yangklang, P. (2001) "Frog stories in Thai: Transcription and Analysis of 50
Thai narratives from 5 age groups" (forthcoming).


