Japanese corpus with audio data
Brian MacWhinney
macw at cmu.edu
Tue Dec 26 10:29:25 UTC 2000
Dear Info-CHILDES,
We have now added a third linked audio corpus to the data at
http://childes.psy.cmu.edu/audio/
This directory includes transcripts on the learning of Japanese from
Susanne Miyata's subject Tai. The transcripts are linked to audio files,
which are included in the directory.
This is now the third linked corpus, along with the Bernstein and
MacWhinney corpora. Thanks to Susanne for sending us this great resource
for the study of the acquisition of Japanese.
--Brian MacWhinney
Here is Susanne's readme file:
The TAI Corpus: Longitudinal Speech Data of a Japanese Boy aged 1;5.20 -
3;1.1 v.2000/7 by Susanne Miyata
********************************
** Please cite:
***************
** Miyata, Susanne (2000). The TAI Corpus: Longitudinal Speech Data of a
Japanese Boy aged 1;5.20 - 3;1.1. Bulletin of Shukutoku Junior College 39,
77-85.
********************************
********************************
Contact Address:
Dr. Susanne Miyata
Aichi Shukutoku University
23 Sakuragaoka Chikusa-ku
Nagoya, 464-8671 Japan
smiyata at asu.aasa.ac.jp
********************************
History
This data was collected between September 1993 and June 1995. After Ryo,
Nao, and Aki (Miyata, 1992, 1993, 1995), Tai was the fourth child observed
longitudinally. For Tai's observation I applied the same schedule used for
the other children: once a week for about one hour at his home, while he
played with his mother.
In the previous observations it had proved convenient for both mother and
observer to fix the weekday and time. In Tai's case, we decided to start at
about 10 o'clock in the morning. After a short period of preparation
(setting up the video, and the indispensable cup of coffee for the
observer), we would start recording at about 10:30.
The recordings were made in parallel on mini-discs (MD, audio recording) and
8mm video. This was done for two reasons. First, the sound quality of the MD
was considerably better than that of the video, while the video recording
contains information necessary for judging the utterances of the child. The
second reason is the rather low reliability of the equipment: out of 75 MD
recordings, 3 were not usable for various reasons (battery problems,
especially in the cold season, or tape damage). In such cases the additional
video recording can stand in for the audio recording.
For the recording, the video camera was placed on the TV set in the corner
of the 16-square-meter living room. With a fish-eye lens and a microphone
with a recording angle of 90 degrees, most of the sound and movement in this
space could be captured. Unlike Aki, Tai did not show any interest in the
equipment, and we could leave it unattended on the TV set. Although the
living room was open to a kitchen of the same size, it was defined as the
"play room" for the observational sessions, and the child soon accepted
this. As he got older, he would prepare his toys and the cushion (zabuton)
for the observer, and urge us to start the play session right away.
The observer would sit in the second corner, as passive as possible, in
order not to disturb the mother-child interaction. The setting was free
indoor play. The mother was instructed to 'make the child speak'. In order
to obtain as much free spontaneous speech from the child as possible, she
was told not to engage in 'too much' storytelling and singing. The
recording time was a little more than 40 minutes, and was cut down to 40
minutes in the transcriptions. After the recording we would sit down in the
kitchen and discuss the development of the child, his friendships, and his
health, as well as general issues of education.
Transcription
The sound data was computerized and sound-linked to CHAT files (MacWhinney,
2000). The transcription was done on the basis of the previously linked
sound stretches. The ease of access to the sound (it is possible to listen
to an utterance with just one mouse click) proved to be very convenient
during this process.
The transcription was done in Latin script (Hepburn system) following JCHAT
1.0 Hebon (Oshima-Takane & MacWhinney, 1995). Word separation follows
WAKACHI98 (Miyata & Naka, 1998). For unclear sound stretches I have used
UNIBET for Japanese (Terao, 1995).
Biographical Data
Tai was born on April 10th, 1992 in Nagoya, the firstborn child. His mother
was 28 years old at the time of his birth. Pregnancy and delivery were
normal. Tai's birth weight was 3330 g. His physical development was normal,
and he was healthy throughout the observation.
Tai was an active, curious, and sensitive child, with a long attention
span. He displayed a high sense of responsibility. His pronunciation was
very clear. At present (March, 2000) he is a healthy and alert first grader
with an excellent school record.
Other participants:
TMO Mother, called "Kakka", 29 years, housewife, former secretary at a
University in Nagoya. Educational level 15
TFA Father, called "Totto", 30 years, research engineer. Educational
level 15
SUU Investigator, called "Suuchan", friend of TMO
Pseudonyms
Tai's parents gave their kind consent for the publication of this data.
Although they consented to the use of their actual names, I have decided to
anonymize all last names (except my own) and other identifying information
throughout the corpus in order to preserve a certain amount of privacy.
Table of Contents
File No. File Name Age Minutes MLUm (based on all utterances)
1 T930930 1;5.20 40 1.514
2 T931007 1;5.27 40 1.591
3 T931014 1;6.4 30 1.288
4 T931021 1;6.11 40 1.440
5 T931029 1;6.19 40 1.788
6 T931103 1;6.24 40 1.924
7 T931111 1;7.1 40 1.477
8 T931118 1;7.8 40 1.635
9 T931125 1;7.15 40 1.820
10 T931223 1;8.13 40 1.691
11 T940107 1;8.28 40 2.105
12 T940113 1;9.3 40 2.329
13 T940120 1;9.10 40 2.331
14 T940127 1;9.17 40 2.180
15 T940204 1;9.25 40 2.223
16 T940210 1;10.0 40 2.235
17 T940217 1;10.7 40 2.313
18 T940224 1;10.14 40 2.233
19 T940303 1;10.20 40 2.348
20 T940311 1;11.1 40 2.467
21 T940324 1;11.14 40 2.739
22 T940330 1;11.20 40 2.529
23 T940407 1;11.28 40 3.306
24 T940414 2;0.4 40 2.519
25 T940421 2;0.11 40 2.471
26 T940428 2;0.18 40 2.689
27 T940505 2;0.25 40 2.929
28 T940512 2;1.2 40 3.042
29 T940519 2;1.9 40 3.004
30 T940526 2;1.16 40 3.248
31 T940602 2;1.23 40 3.737
32 T940609 2;1.30 40 3.368
33 T940616 2;2.6 40 3.485
34 T940623 2;2.13 40 3.178
35 T940630 2;2.20 40 3.016
36 T940707 2;2.27 40 3.609
37 T940714 2;3.4 40 3.413
38 T940721 2;3.11 40 2.831
39 T940728 2;3.18 40 3.288
40 T940804 2;3.25 40 2.998
41 T940813 2;4.3 40 3.102
42 T940825 2;4.15 40 2.934
43 T940831 2;4.21 40 3.158
44 T940909 2;4.30 40 3.425
45 T940916 2;5.6 40 2.916
46 T940922 2;5.12 40 3.325
47 T940929 2;5.19 40 3.564
48 T941006 2;5.26 40 3.134
49 T941013 2;6.3 40 3.486
50 T941020 2;6.10 40 3.688
51 T941028 2;6.18 40 4.036
52 T941103 2;6.24 40 3.182
53 T941110 2;7.0 40 3.252
54 T941117 2;7.7 40 3.563
55 T941123 2;7.13 40 3.516
56 T941201 2;7.21 40 4.006
57 T941208 2;7.28 40 3.932
58 T941215 2;8.5 40 4.486
59 T941222 2;8.11 40 4.040
60 T950112 2;9.2 40 4.175
61 T950119 2;9.9 40 4.779
62 T950127 2;9.17 40 3.806
63 T950202 2;9.23 40 3.133
64 T950209 2;9.30 40 4.261
65 T950216 2;10.6 40 3.286
66 T950223 2;10.13 40 4.085
67 T950302 2;10.20 40 3.663
68 T950310 2;11.0 40 4.059
69 T950324 2;11.14 40 5.003
70 T950330 2;11.20 40 5.672
71 T950413 3;0.3 40 4.227
72 T950504 3;0.24 40 5.058
73 T950511 3;1.1 40 4.923
74 T950518 3;1.8 40 4.133
75 T950608 3;1.29 40 3.787
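The file names in the table above appear to encode the recording date as
TYYMMDD (for example, T930930 for 30 September 1993), and the Age column
gives Tai's age in the CHAT years;months.days notation. As a minimal sketch
in Python, assuming that reading of the file names and the birth date given
under Biographical Data, the age can be recomputed as follows. The function
names are illustrative and not part of CLAN. The day count borrows the
length of the preceding calendar month, which is only one possible
convention, so individual table entries may differ by a day.

```python
import calendar
from datetime import date

# Birth date as stated in the Biographical Data section.
BIRTH = date(1992, 4, 10)


def session_date(file_name: str) -> date:
    """Parse a name such as 'T930930' (assumed layout: T + YYMMDD, 1900s)."""
    yy = int(file_name[1:3])
    mm = int(file_name[3:5])
    dd = int(file_name[5:7])
    return date(1900 + yy, mm, dd)


def chat_age(birth: date, day: date) -> str:
    """Return the age on `day` as 'years;months.days'."""
    years = day.year - birth.year
    months = day.month - birth.month
    days = day.day - birth.day
    if days < 0:
        # Borrow the number of days in the month preceding the session.
        months -= 1
        prev_month = day.month - 1 or 12
        prev_year = day.year if day.month > 1 else day.year - 1
        days += calendar.monthrange(prev_year, prev_month)[1]
    if months < 0:
        years -= 1
        months += 12
    return f"{years};{months}.{days}"


print(chat_age(BIRTH, session_date("T930930")))  # 1;5.20
```

Applied to file 1 this reproduces the age listed in the table; the same two
calls can be run over any other file name in the list.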
Warnings
a) Reliability was not checked.
b) Comments and descriptions concerning child activities are not yet
supplied. They will be added in a later version.
Acknowledgments
I gratefully acknowledge the support of this research by the Ministry of
Education, Science, Sports and Culture through the Grant-in-Aid for
Scientific Research on Priority Areas 10114104 entitled "Development of
Mind", and through the Grant-in-Aid for Scientific Research (Database) 184.
I would like to thank Brian MacWhinney (Carnegie Mellon University) for his
understanding and technical support during the various phases of
transcription, the members of the JCHAT Project for their encouragement and
support, and the numerous students who helped with the transcription,
especially Yumiko Naganawa and Naomi Hamasaki. My special thanks go to
Beverley Curran (Aichi Shukutoku University) for emotional support and
encouragement throughout this work. My warmest thanks, though, go to Tai and
his mother. Without their understanding and collaboration, this project
would not have been possible.
Literature
MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. 3rd
ed. Mahwah, N.J.: Lawrence Erlbaum Assoc.
Miyata, S. (1992). Wh-Questions of the Third Kind: The Strange Use of
Wa-Questions in Japanese Children. Bulletin of Aichi Shukutoku Junior
College No.31, 151-155.
Miyata, S. (1993). Japanische Kinderfragen: Zum Erwerb von Form - Inhalt -
Funktion von Frageausdruecken, Hamburg: OAG.
Miyata, S. (1995). The Aki Corpus. Longitudinal Speech Data of a Japanese
Boy aged 1.6-2.12. Bulletin of Aichi Shukutoku Junior College No.34, 183-191.
Miyata, S. (2000). Assigning MLU stages in Japanese. Journal of Educational
Systems and Technologies. The Audio Visual Center, Chukyo University Nagoya
Japan. Vol.9.
Miyata, S. & N. Naka. (1998). Wakachigaki Gaidorain WAKACHI98 v.1.1.
Educational Psychology Forum Report No. FR-98-003. The Japanese Association
of Educational Psychology.
Oshima-Takane, Y. & B. MacWhinney (eds.) (1995, 2nd ed. 1998). CHILDES
Manual for Japanese. Montreal: McGill University / Nagoya: Chukyo
University.
Sugiura, M., N. Naka, S. Miyata & Y. Oshima. (1997). Nihongo Shutoku Kenkyu
no tame no Joho Shisutemu CHILDES no Nihongoka. Gengo, 26, 3, 80-87.
Terao, Y. (1995). Nihongo no tame no UNIBET. In Oshima-Takane, Y. & B.
MacWhinney (eds.) (1995). CHILDES Manual for Japanese. Montreal: McGill
University. 97-100.