two new corpora

Brian MacWhinney macw at cmu.edu
Sun Jan 31 21:10:45 UTC 2010


Dear Info-CHILDES,
    It is my pleasure to announce the addition to CHILDES of two new corpora from children learning British English.  The first is the Lara corpus from Caroline Rowland.  This corpus consists of 120 hours of audio-recorded speech from one child between 1;9 and 3;3.  These files were analyzed with an earlier version of MOR, and we will eventually update the %mor line.
   The second corpus is the Thomas corpus from Jeannine Goh and Elena Lieven of the MPI Child Study Centre in Manchester.  This corpus is far and away the densest corpus yet available in CHILDES.  Thomas was recorded intensively throughout the period of 2;0 to 4;11, but particularly so during the period from 2;0 to 3;2.  The corpus now on the web does not include a %mor line, but that will be added later in the year.  By way of comparison, the classic three-child Brown corpus has a size of about 23 MB without the additional %mor and %gra lines, whereas the single-child Thomas corpus has a size of 123 MB.  Moreover, the Thomas corpus is fully linked to audio, and the linked transcripts can be played back directly over the web through the CHILDES browser.  Last names and addresses have been removed from the transcripts and audio to maintain anonymity.
   I would like to thank Caroline, Elena, Jeannine, and all of the others who worked on these projects for contributing two important data sets to CHILDES.  Details regarding both corpora can now be found in the database manual for UK English on the web.

-- Brian MacWhinney



