York Corpus

Brian MacWhinney macw at cmu.edu
Mon Oct 27 01:11:28 UTC 2003


Dear Info-CHILDES,
  I am happy to announce the addition to CHILDES of a new corpus on the
acquisition of French as a native language.  The York corpus has been
contributed by Bernadette Plunkett with assistance from Cécile De Cat.  It
includes 18-month case studies of three children, one in
Belgium, one in France, and one in Canada.  Following are the first sections
of the documentation for the corpus.  The full documentation can be found in
the electronic version of the database manual for Romance.  Many thanks to
Bernadette Plunkett for this contribution.

--Brian MacWhinney

This directory contains transcripts from a study of three children acquiring
French that were collected and compiled during a project entitled "The
Syntactic Acquisition of Wh-Questions in French: a cross-dialectal
comparison" run from the University of York (UK).  (The study was funded by
an Economic and Social Research Council grant to Bernadette Plunkett,
#R000221972.)  Data collection began in early 1997.

The project involved
an 18-month study of three children, each one a speaker of a different
dialect of French.  The children were taped fortnightly for approximately
half an hour in a familiar environment.  The sessions were videotaped and
separately audiorecorded using Sony professional cassette recorders.  The
three fieldworkers collecting the data were all native speakers of French.
Initial transcriptions were in most cases done by these investigators on
the basis of the audiotape, then checked against the video and coded by the
research assistant on the project, Cécile De Cat, a native speaker of Belgian
French.  The names used for the target children in these corpora are all
pseudonyms.  The data are in French, without English glosses.  Comments are
in English.

Researchers who require more information as well as any
using data from the York corpus are asked to contact Bernadette Plunkett by
email and to send her copies of any research papers using these data.  The
conventions used in this corpus are under constant re-evaluation; users
with comments or anyone who notices inconsistent application of the
conventions listed below are also asked to contact her with details.  The
corpus has recently been digitised and the digital sound stream has been
used to double-check the consistency of certain aspects of transcription,
but since permission for public release of the audio corpus was not
originally sought from participants, only the transcripts have been donated.

The Belgium corpus contains 36 CHAT files, Liea001.cha-Liea036, which
correspond to the transcripts of the Belgian child (Léa, Liège) from 2;8.22
to 4;3.21.  The Canada corpus contains 36 CHAT files, Mona001.cha-Mona037,
which correspond to the transcripts of the Canadian child (Max, Montréal)
from 1;9.19 to 3;2.23.  The France corpus contains 35 CHAT files,
Para001.cha-Para035, which correspond to the transcripts of the French child
(Anne, Paris) from 1;10.12 to 3;5.4.  Other children were also present during
some recording sessions.  Only two of them have a significant presence,
however.  They are Pol (born on 21-AUG-1992), who is Max's brother, and
Lore (born on 6-MAR-1995), who was at the same childminder's as Anne.  The
sessions during which they were present are represented in a table in the
Canadian and French sections respectively, together with a calculation of
their age in those sessions.
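
The ages above are written as years;months.days (for example, 2;8.22).  As
a rough illustration only, the following Python sketch parses ages in that
form and estimates the period covered for each child.  The function names
are hypothetical, and the 30-day month used when borrowing days is a
simplifying assumption, not a convention stated in the corpus
documentation.

```python
import re

# Ages in the announcement are written as years;months.days, e.g. "2;8.22".
AGE_PATTERN = re.compile(r"^(\d+);(\d+)\.(\d+)$")


def parse_age(age):
    """Split an age such as "2;8.22" into (years, months, days)."""
    match = AGE_PATTERN.match(age)
    if match is None:
        raise ValueError(f"unrecognised age: {age!r}")
    years, months, days = (int(part) for part in match.groups())
    return years, months, days


def age_span(first, last):
    """Return (months, days) elapsed between two ages.

    Assumption: a month is treated as 30 days when days must be borrowed.
    """
    first_years, first_months, first_days = parse_age(first)
    last_years, last_months, last_days = parse_age(last)
    months = (last_years * 12 + last_months) - (first_years * 12 + first_months)
    days = last_days - first_days
    if days < 0:
        months -= 1
        days += 30
    return months, days


# First and last ages for each child, as given in the text above.
AGE_RANGES = {
    "Léa": ("2;8.22", "4;3.21"),
    "Max": ("1;9.19", "3;2.23"),
    "Anne": ("1;10.12", "3;5.4"),
}

if __name__ == "__main__":
    for child, (first, last) in AGE_RANGES.items():
        months, days = age_span(first, last)
        print(f"{child}: about {months} months and {days} days")
```

Under these assumptions each child's recordings span roughly 17 to 19
months, in line with the 18-month design described above.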


