Noji corpus

Brian MacWhinney macwhinn at hku.hk
Sat Aug 4 10:04:25 UTC 2001


Dear Info-CHILDES,
  I am happy to announce the addition to CHILDES of a large corpus of
Japanese child language data that was collected beginning in 1948 by Junya
Noji from his son Sumihare.  The current corpus was prepared and contributed
by Norio Naka and Susanne Miyata with the permission of Noji. Thanks to all
of them for this contribution.

--Brian MacWhinney


***************************
*** Noji Corpus v.1.1 *****
***************************

Susanne Miyata
Faculty of Creativity and Culture
Aichi Shukutoku University
Sakuragaoka 23, Chikusa-ku
Nagoya, Japan 464-8671
smiyata at asu.aasa.ac.jp


***History***
The Noji Corpus contains diary data collected by the Japanese linguist and
dialectologist Junya Noji. He observed his first-born son Sumihare from
birth (March 9, 1948) until the age of 7, as he was growing up in
Hiroshima. The data is based on handwritten records collected virtually
daily (2243 days over 7 years), although the focus lies on the third year.
In the later years, fewer records were taken, resulting in a lower number
of utterances available per month. A detailed description of the
methodology can be found in the printed edition (Bunka Hyoron Shuppan).
The data contains approximately 40,000 utterances by Sumihare, and about
22,000 utterances by other family members (his mother and father and his
younger brother, Teruki) and other speakers such as the children from the
neighborhood (Seejikun and Keekochan). A comment is provided for each
utterance, establishing the context and interpreting the child's utterance.
The electronic version of this data was entered, checked against the
original, and converted to CHAT format by Norio Naka (Osaka Gakuin U.). The
final cleanup using CHECK was done by Susanne Miyata (Aichi Shukutoku U.).


***Format***
The print original uses katakana (phonetic syllable script) for the
utterances, and regular hiragana (syllabic) and kanji (Chinese characters)
for the comments, as well as a number of special symbols such as arrows
to indicate the speaker and the addressee. The electronic version is
transcribed in Hebon (the Hepburn romanization system) and separated into
words (wakachi; spoken utterances only). The format follows the Japanese
adaptation of CHAT, JCHAT 1.0 (Oshima-Takane & MacWhinney, eds., 1998).
When the data entry began in 1992, only ASCII was available within the
CHILDES system. But now, even though there is no longer any restriction
concerning the fonts, the use of Hebon (at least in the main line) has the
advantage of compatibility with programs such as MOR, and renders the data
accessible to a greater number of researchers by removing the barrier of
Japanese script.
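
Because the main lines are plain romanized text in CHAT layout, simple
scripts can process them directly. The following Python sketch counts
main-line utterances per speaker. It assumes the usual CHAT conventions
(main lines begin with "*", a speaker code, and a colon; dependent tiers
begin with "%"; headers begin with "@"). The function name
count_utterances and the sample lines are hypothetical and are not taken
from the corpus; the speaker code MOT is likewise an assumption.

```python
from collections import Counter


def count_utterances(chat_text):
    """Count main-line utterances per speaker code in CHAT-formatted text."""
    counts = Counter()
    for line in chat_text.splitlines():
        # Main lines look like "*SUM:<tab>utterance"; "@" headers, "%" tiers,
        # and tab-indented continuation lines are skipped.
        if line.startswith("*") and ":" in line:
            speaker = line[1:line.index(":")]
            counts[speaker] += 1
    return counts


# Invented sample for illustration only (not from the Noji corpus).
sample = "\n".join([
    "@Begin",
    "*SUM:\tfutachu .",
    "%com:\tillustrative comment tier",
    "*MOT:\tfutatsu ne .",
    "*SUM:\tun .",
    "@End",
])

counts = count_utterances(sample)
print(dict(counts))
```

For real analyses the CLAN programs distributed with CHILDES are the
better choice; this sketch only shows how little is needed to read the
main lines.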

***Warnings***
1)  The wakachi (word separation) format is not yet adjusted to the
      JMOR-compatible WAKACHI99 format.
2)  Words are transcribed as pronounced (e.g. 'futachu' for 'futatsu').
3)  Proper names are not capitalized.

When using this corpus, please cite:
Noji, Junya. (1973-77). Yooji no gengo seikatsu no jittai I-IV.
    Bunka Hyoron Shuppan.


***Table of Contents***
##########################################################
year  month  age    # of    # of utterances
                    files
                    (days)  SUM     others  all
##########################################################
1948  3      0;0    26      0       0       0
      4      0;1    29      0       0       0
      5      0;2    30      1       1       2
      6      0;3    30      6       2       8
      7      0;4    17      2       1       3
      8      0;5    17      0       0       0
      9      0;6    17      2       0       2
      10     0;7    19      0       0       0
      11     0;8    14      4       1       5
      12     0;9    16      18      2       20
1949  1      0;10   19      17      11      28
      2      0;11   27      52      27      79
##########################################################
                    261     102     45      147
##########################################################
      3      1;0    24      65      28      93
      4      1;1    20      67      31      98
      5      1;2    27      61      26      87
      6      1;3    29      110     43      153
      7      1;4    30      81      26      107
      8      1;5    31      342     97      439
      9      1;6    30      453     149     602
      10     1;7    31      436     178     614
      11     1;8    30      425     144     569
      12     1;9    31      414     125     539
1950  1      1;10   30      349     66      415
      2      1;11   28      820     146     966
##########################################################
                    341     3623    1059    4682
##########################################################
      3      2;0    31      800     137     937
      4      2;1    30      1892    571     2463
      5      2;2    31      3201    1050    4251
      6      2;3    30      1198    423     1621
      7      2;4    30      1280    557     1837
      8      2;5    31      1779    971     2750
      9      2;6    30      939     419     1358
      10     2;7    31      1317    524     1841
      11     2;8    30      1368    641     2009
      12     2;9    31      1312    727     2039
1951  1      2;10   31      991     719     1710
      2      2;11   28      771     518     1289
##########################################################
                    364     16848   7257    24105
##########################################################
      3      3;0    31      709     477     1186
      4      3;1    30      847     542     1389
      5      3;2    31      918     584     1502
      6      3;3    30      1071    792     1863
      7      3;4    31      1024    754     1778
      8      3;5    31      689     517     1206
      9      3;6    30      493     375     868
      10     3;7    31      1321    870     2191
      11     3;8    30      865     631     1496
      12     3;9    31      620     519     1139
1952  1      3;10   30      537     337     874
      2      3;11   29      497     375     872
##########################################################
                    365     9591    6773    16364
##########################################################
      3      4;0    31      576     435     1011
      4      4;1    30      523     344     867
      5      4;2    31      285     236     521
      6      4;3    30      365     206     571
      7      4;4    31      315     172     487
      8      4;5    30      242     140     382
      9      4;6    27      202     118     320
      10     4;7    31      249     169     418
      11     4;8    27      262     166     428
      12     4;9    30      612     392     1004
1953  1      4;10   29      476     348     824
      2      4;11   28      410     284     694
##########################################################
                    355     4517    3010    7527
##########################################################
      3      5;0    30      279     189     468
      4      5;1    30      366     262     628
      5      5;2    31      322     238     560
      6      5;3    29      286     186     472
      7      5;4    31      337     217     554
      8      5;5    31      362     296     658
      9      5;6    30      393     347     740
      10     5;7    26      163     161     324
      11     5;8    29      248     186     434
      12     5;9    28      343     313     656
1954  1      5;10   24      172     150     322
      2      5;11   25      167     162     329
##########################################################
                    344     3438    2707    6145
##########################################################
      3      6;0    16      97      64      161
      4      6;1    18      85      65      150
      5      6;2    18      105     77      182
      6      6;3    26      251     224     475
      7      6;4    29      359     297     656
      8      6;5    29      346     233     579
      9      6;6    25      111     115     226
      10     6;7    14      44      50      94
      11     6;8    8       23      29      52
      12     6;9    11      50      43      93
1955  1      6;10   15      47      50      97
      2      6;11   3       4       6       10
##########################################################
                    212     1522    1253    2775
##########################################################
total sum           2242    39641   22104   61745
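
The age column uses the CHAT years;months notation and advances with the
calendar month, so March 1948 (the birth month) is 0;0 and February 1955
is 6;11. A minimal Python sketch of that mapping follows. The function
name age_label is hypothetical, and ignoring the day of the month is an
assumption read off the table rather than a documented rule.

```python
BIRTH_YEAR, BIRTH_MONTH = 1948, 3  # Sumihare was born in March 1948


def age_label(year, month):
    """Return the years;months age label for a calendar month of the diary."""
    # Whole calendar months elapsed since the birth month (day ignored).
    months = (year - BIRTH_YEAR) * 12 + (month - BIRTH_MONTH)
    if months < 0:
        raise ValueError("date precedes the start of the diary")
    return f"{months // 12};{months % 12}"


print(age_label(1950, 5))
```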

*****end


