new Japanese corpus

Brian MacWhinney macw at cmu.edu
Wed Nov 17 00:14:02 UTC 1999


Dear Info-CHILDES,
  I am delighted to announce the availability of a new corpus of Japanese
data from Dr. Takeo Ishii of Kyoto Sangyo University.  A particularly interesting
aspect of this new "JUN" corpus is the availability of a set of digitized
movies linked to the corpus.  One of these is available on the web at
http://ccnic15.kyoto-su.ac.jp/~ishii/
and others are available from Dr. Ishii on request.  Our thanks to Takeo for
this ground-breaking contribution.  The data can be found on the server in
japanese.sit.

--Brian MacWhinney

   The readme file follows:

Takeo Ishii
Department of Foreign Languages
Kyoto Sangyo University
Motoyama, Kamigamo, Kita-ku, Kyoto
Japan 603-8555

Jun is the third child in the family, with a brother, Ken, and a sister, Yasuko.
The family lived in Kyoto City and moved to Kusatsu City, Shiga Pref., where
Jun was born. The family speaks the Kyoto dialect. Dialect and family words are
listed in the file dialect.cdc.

The Jun corpus is made public to child language researchers. Any researcher
interested in child language acquisition may use these data freely. More data
will be added in the future. Some warnings concerning the corpus are: (1)
Reliability was not checked, (2) The lengths of the observational sessions
differ, (3) Some of the movies, especially earlier ones, are not very clear
due to the weather, and (4) UNIBET symbols are used, especially for earlier
sessions when child utterances are unclear. The data include only the
utterances of participants with few situational descriptions, as it was very
complicated to describe the situations fully.

The database currently contains 61 files and each recording lasts about 15
minutes. The resulting movie files are between 300 and 590 megabytes in size.
The first 31 files cover the ages between 0;8 and 1;11 at a roughly bimonthly
frequency. The second set of 31 files covers the period from 3;5 to 3;8, but
each session lasts nearly one hour and is divided into about four 15-minute
recordings.

If you use this data or parts of it, please send one printed copy of your
article/publication to Takeo Ishii. Please cite Ishii, Takeo 1999, The JUN
Corpus, unpublished. Movies on CD-ROM are available upon request (only for
Macintosh now).  Each includes a movie file and a CHAT file. If you want copies
of movie files, please send blank CD-Rs together with a self-addressed return
envelope and postage stamps. About 630MB is recordable on one CD-R. Please be
sure to specify the file names you want.
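
For anyone working out how many blank CD-Rs to send, the arithmetic can be
sketched roughly as below. This is only an illustration, not part of the
corpus distribution: the function name cdrs_needed and the sample sizes are
made up for the example, and the 630 MB capacity and the 300 to 590 MB range
are simply the approximate figures quoted above. It uses a simple first-fit
decreasing packing, which gives a reasonable estimate but is not guaranteed
to be the minimum in every case.

```python
def cdrs_needed(file_sizes_mb, capacity_mb=630):
    """Estimate the number of CD-Rs needed for the requested movie files.

    Uses first-fit decreasing packing: place the largest files first, each
    onto the first disc that still has room. The 630 MB default is the
    approximate capacity quoted in the readme (an assumption, not exact).
    """
    free_space = []  # remaining free megabytes on each disc used so far
    for size in sorted(file_sizes_mb, reverse=True):
        if size > capacity_mb:
            raise ValueError(f"a {size} MB file does not fit on one disc")
        for i, free in enumerate(free_space):
            if size <= free:
                free_space[i] = free - size
                break
        else:
            # No existing disc has room, so start a new one.
            free_space.append(capacity_mb - size)
    return len(free_space)


# Hypothetical request: four movie files with made-up sizes in megabytes.
print(cdrs_needed([300, 590, 320, 300]))
```

Since most movie files are larger than half a disc, in practice this usually
comes to one CD-R per movie; only files near the 300 MB end of the range can
share a disc.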


