ESF Second Language Corpus
Brian MacWhinney
macwhinn at hku.hk
Sat Jun 16 08:43:33 UTC 2001
Dear Info-CHILDES,
I am happy to announce the release of an enormous corpus of data in CHAT
format on the acquisition of a second language by guest workers in Europe.
These data were collected in the ESF (European Science Foundation) Project
directed by Wolfgang Klein and Clive Perdue in the 1970s. These data were
converted to CHAT format over the last ten years by Marianne Starren, Daan
Broeder, and other researchers at the Max-Planck Institute in Nijmegen.
Moreover, about half of the files have been linked using sonic CHAT to
digitized audio. The transcript files alone amount to over 37 MB of data
and the digitized audio files occupy a very sizable stack of CD-ROMs. These
files will also be available soon in the audio area of the CHILDES server.
Eventually, we hope to convert them to MP3 for faster downloading.
The transcript files are now fully checked and can be downloaded from
http://childes.psy.cmu.edu/mac/biling/esf/ and
http://childes.psy.cmu.edu/win/biling/esf/
The files are packaged by combination of L2 and L1, as in GermSpan
for German L2 and Spanish L1.
The biographical sketches for the subjects are all packaged together in
bios.zip and bios.sit.
Although these files are currently on the CHILDES server, they are from
adult speakers and will eventually be integrated more properly with the
larger TalkBank Project, as well as the new work of the LIDES group
(http://childes.psy.cmu.edu/lides/).
Thanks to the many workers at the MPI and other institutions in France,
Sweden, the Netherlands, and the UK who have made possible this extremely
important contribution to CHILDES.
--Brian MacWhinney