York Corpus
Brian MacWhinney
macw at cmu.edu
Mon Oct 27 01:11:28 UTC 2003
Dear Info-CHILDES,
I am happy to announce the addition to CHILDES of a new corpus on the
acquisition of French as a native language. The York corpus has been
contributed by Bernadette Plunkett with assistance from Cecile De Cat. It
includes case studies across 18 months each of three children, one in
Belgium, one in France, and one in Canada. Following are the first sections
of the documentation for the corpus. The full documentation can be found in
the electronic version of the database manual for Romance. Many thanks to
Bernadette Plunkett for this contribution.
--Brian MacWhinney
This directory contains transcripts from a study of three children acquiring
French that were collected and compiled during a project entitled "The
Syntactic Acquisition of Wh-Questions in French: a cross-dialectal
comparison" run from the University of York (UK). The study was funded by
an Economic and Social Research Council grant to Bernadette Plunkett
(#R000221972). Data collection began in early 1997. The project involved
an 18-month study of three children, each one a speaker of a different
dialect of French. The children were taped fortnightly for approximately
half an hour in a familiar environment. The sessions were videotaped and
separately audiorecorded using Sony professional cassette recorders. The
three fieldworkers collecting the data were all native speakers of French.
Initial transcriptions were in most cases done by these investigators on
the basis of the audiotape, then checked against the video and coded by the
research assistant on the project, Cécile De Cat, a native speaker of Belgian
French. The names used for the target children in these corpora are all
pseudonyms. The data are in French, without English glosses. Comments are
in English. Researchers who require more information, and anyone using
data from the York corpus, are asked to contact Bernadette Plunkett by
email and to send her copies of any research papers using these data. The
conventions used in this corpus are under constant re-evaluation; users
with comments or anyone who notices inconsistent application of the
conventions listed below are also asked to contact her with details. The
corpus has recently been digitised and the digital sound stream has been
used to double check the consistency of certain aspects of transcription,
but since permission for public release of the audio corpus was not
originally sought from participants, only the transcripts have been donated.
The Belgium corpus contains 36 chat files, Liea001.cha-Liea036, which
correspond to the transcripts of the Belgian child (Léa, Liège) from 2;8.22
to 4;3.21. The Canada corpus contains 36 chat files, Mona001.cha-Mona037,
which correspond to the transcripts of the Canadian child (Max, Montréal)
from 1;9.19 to 3;2.23. The France corpus contains 35 chat files,
Para001.cha-Para035, which correspond to the transcripts of the French child
(Anne, Paris) from 1;10.12 to 3;5.4. Other children were also present during
some recording sessions. Only two of them have a significant presence,
however. They are Pol (born on 21-AUG-1992), who is Max's brother, and
Lore (born on 6-MAR-1995), who was at the same childminder's as Anne. The
sessions during which they were present are represented in a table in the
Canadian and French sections respectively, together with a calculation of
their age in those sessions.
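
For readers who want to reproduce such an age calculation, the following is
a minimal sketch, not the procedure used for the corpus tables. It assumes
ages are written as years;months.days, as in the ranges quoted above, and
that a partial month is counted by borrowing the length of the calendar month
preceding the session. The function name chat_age and the session dates in
the usage note are hypothetical; only the two birth dates come from the
documentation.

```python
from calendar import monthrange
from datetime import date


def chat_age(birth: date, session: date) -> str:
    """Format the age at `session` as years;months.days.

    Assumption: when the session day falls before the birth day, one month
    is borrowed and the leftover days are counted within the calendar month
    preceding the session. Other tools may use a different convention.
    """
    if session < birth:
        raise ValueError("session date precedes birth date")

    years = session.year - birth.year
    months = session.month - birth.month

    if session.day >= birth.day:
        days = session.day - birth.day
    else:
        # Borrow one month; clamp the birth day to the length of the
        # previous month so the day count can never go negative.
        months -= 1
        prev_month = session.month - 1 or 12
        prev_year = session.year if session.month > 1 else session.year - 1
        prev_len = monthrange(prev_year, prev_month)[1]
        days = session.day + prev_len - min(birth.day, prev_len)

    if months < 0:
        years -= 1
        months += 12

    return f"{years};{months}.{days}"


# Birth dates as given in the documentation; session dates are made up.
POL_BIRTH = date(1992, 8, 21)
LORE_BIRTH = date(1995, 3, 6)
```

With a hypothetical session on 5 March 1997, chat_age(POL_BIRTH,
date(1997, 3, 5)) gives "4;6.12" and chat_age(LORE_BIRTH, date(1997, 3, 5))
gives "1;11.27".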