new multilingual corpus

Brian MacWhinney macw at cmu.edu
Tue Sep 12 19:20:15 UTC 2000


Dear Info-CHILDES,

  I am happy to announce the addition to the CHILDES database of a new
corpus of transcripts documenting the parallel acquisition of Portuguese
and Swedish by three children, along with their later acquisition of
English.  These data come from Madalena Cruz-Ferreira and are stored as
mcf.sit and mcf.zip in the database.  The readme file follows:


Madalena Cruz-Ferreira
39 Chancery Lane
#01-01 Villa Chancery
309568 Singapore
mcf at pacific.net.sg

This corpus contains longitudinal and cross-sectional data from three
children, two girls and one boy, primary bilinguals in Portuguese and
Swedish, who acquired English as the language of schooling.


The children
Karin, Sofia and Mikael are siblings, from an upper middle-class family
background. The father is a native speaker of (Central Standard) Swedish
and the mother, who is also the researcher and a trained phonetician, is a
native speaker of (European) Portuguese.
Karin and Sofia were born in Sweden, in September 1986 and July 1988
respectively; Mikael was born in Portugal in October 1990. From birth, the
children have been exposed to Portuguese and Swedish according to the
one-person, one-language principle, to which the parents have adhered ever
since.
The parents are otherwise fluent in one another's language as well as in
English. In all exchanges between the children and Portuguese or Swedish
relatives and friends the one-person, one-language principle is easily
maintained. The children have been exposed to several accents of Swedish
and Portuguese, the latter including Brazilian Portuguese.
Because of the father's professional commitments, the family has moved to
several different countries since the children were born. A schematic
outline follows, to highlight the extent of the children's exposure to
different languages.
- July 1986 - two months before Karin's birth, the parents moved from
Denmark (Copenhagen) to the south of Sweden, where the family set up their
permanent home. From October 1987 to June 1988, Karin (1;1 to 1;9) attended
a local kindergarten, where she spent an average of 15 hours a week.
- September 1988 - seven weeks after Sofia's birth, the family moved to
Portugal. From September 1989 to June 1990, Karin (3;0 to 3;9) attended
daily kindergarten at the Swedish School, Lisbon area.
- November 1990 - three weeks after Mikael's birth, the family moved to
Austria, Vienna area. Karin (4;2 to 5;9) attended a local kindergarten from
November 1990 to June 1992, and Sofia (3;1 to 3;11) from September 1991 to
June 1992. Two months after the start of school, the girls' teachers
reported that they were quite comfortable communicating in German. German
is, however, not represented in this corpus.
- July 1992 - the mother and the children moved back to Portugal. From
August 1992 to May 1993, the father was posted in the USA and made short
weekend visits to Portugal roughly once a month, at irregular intervals.
From September 1992 to June 1993, Karin (6;0 to 6;9) attended grade 1, and
Sofia (4;1 to 4;11) attended kindergarten, at the Swedish School, Lisbon
area.
- August 1993 - the family moved to Hong Kong. From September 1993 to June
1994, Karin (7;0 to 7;9) attended grade 2 at a British school. During this
period, on the advice of her teachers and in view of her growing
proficiency in English, Sofia was moved up once in each term of the
academic year: from a Montessori kindergarten to reception/grade 1, and
then to grade 1 at the same British school. Apart from two months of
English tuition, two hours per week, for Karin and Sofia (Karin from 6;8,
Sofia from 4;10) once the move to Hong Kong had been confirmed, this move
marks the beginning of the children's regular contact with English. For
Mikael, English was also the language of his very first school: from
November 1993 to June 1994 he (3;1 to 3;8) attended the same Montessori
kindergarten as Sofia. At this English-language school, both Sofia and
Mikael had regular exposure to Cantonese through songs, counting and
nursery rhymes.
- August 1994 - the family moved to Singapore, where they have lived for
nearly 6 years at the time of writing. The children attend English-language
schools: Karin from 8;0 in grade 3, Sofia from 6;2 in grade 2 and Mikael
from 3;11 in nursery.

When in Europe, the family travelled to Sweden for the summer and to
Portugal for Christmas, or vice versa. In Asia, the family travels to both
countries for either the summer or Christmas. Before 1993, the children
also had irregular exposure to English, through exchanges between the
parents and foreign guests in the home, or at social gatherings involving
Swedish and Portuguese relatives or friends.

At the age of 6, each of the three children started attending a
once-weekly Swedish Supply School in the countries where the family has
lived, where they learn about the language and the country. The children
have never had any formal tuition of this type in Portuguese, although they
are thoroughly familiar with the culture of both Sweden and Portugal.
As for exposure to languages other than those mentioned above, Karin
(10;0) and Sofia (9;2) started curricular lessons in Mandarin at school
from grade 5. Both girls study Latin at school, and Sofia also studies
French. They are, of course, exposed to the local languages spoken in
Singapore, the main ones being Mandarin and other Chinese languages, Malay
and Tamil. They are also familiar with different accents of English,
including non-native accents.
Sofia, the latest of the three to start talking, was diagnosed at age 4
with a 40% hearing loss due to recurrent middle-ear infections, for which
she had been receiving regular medication since babyhood. She underwent
grommet and adenoid surgery twice, first in Portugal at 4;9 and later in
Singapore at 6;2, after which the problem was resolved. A noteworthy
consequence was that, up to the age of 10, her delivery was rather slurred
in both Portuguese and Swedish, whereas her delivery in English, which she
started learning with normal hearing, was faster and clearer from the very
start. Mikael had a lisp, which he corrected spontaneously at 5;9. The
children are otherwise healthy and their development is normal.

The children have always lived with both parents, and have always taken an
active part in the family's life. The mother is the main caregiver, having
stayed at home during the children's first years. The children are
therefore mostly exposed to Portuguese at home. In order to counterbalance
this asymmetry, compounded by the regular absences of the father due to
business travel, the parents chose to address one another mostly in Swedish
in the presence of the children. While consistently using the appropriate
language, Portuguese or Swedish, with each parent, the children initially
used Portuguese among themselves, except when recalling or discussing
events specifically related to Sweden, like skiing or the midsummer
celebration, for which they used Swedish. Since the start of their regular
schooling in English, they have gradually used more English among
themselves, and English is now almost exclusively the language of their
exchanges. None
of the children has ever felt self-conscious about using Portuguese or
Swedish with their parents in front of non-speakers of the languages,
including other children.


Data collection
Data have been collected since the birth of each child through audio
recordings, video recordings and diary notes made by the mother.
Audio and video tapes are reviewed soon after recording, and supplemented
by diary notes wherever clarification is needed. In addition, extensive
diary notes are used to record each child's progress, both linguistic and
in other developmental areas. Recordings are typically made whenever a new
linguistic trait appears in the children's speech, in the same way that
progress in other areas is noted down in the diaries, that is, not at
regular chronological intervals. The data in this corpus concern the
children's Portuguese, Swedish and, from 1993, their English. Most of the
data reflect spontaneous speech, except in cases where the child was
specifically asked to speak (or sing, or read) 'for the record', for
example, to say the colour or animal names in a picture book.
In the children's first months, typical recording sessions took place with
the child lying down safely and playing alone or interacting with one
parent or relative. Later, the tape recorder was placed inconspicuously and
switched on where the children were busy or being attended to. The
children were obviously aware of the camera during video recordings, but
its presence soon became an uninteresting detail of their routine.
Recordings encompass a broad spectrum of situations. Aside from the
recordings made to capture specific progress, which were usually made at
home, recordings include daily routines, play alone or with other
children, festive gatherings with family and friends, and outings. The data
therefore give a broad view of each child's full (socio)linguistic ability,
including getting acquainted with adults and children, voice modulation,
and strategies to attract the attention of distant hearers or to overcome
background noise. For the recordings of spontaneous interaction with
children outside the family, parental permission to use the data was duly
requested and obtained.
One possible shortcoming of the recorded data is that the mother was
regularly present during collection, except when the tape recorder was
left on with the children on their own. Other shortcomings of spontaneous
child speech collection are well known to researchers in this area,
ranging from the children's unwillingness to cooperate to disruptions from
siblings or equipment while one particular child is being recorded. The
detail included in the diaries therefore constitutes an invaluable
complementary resource.


Transcription and coding
Data were transcribed and coded by the researcher, who is competent in all
three languages. Transcription was done as soon as possible after
recording, and was rechecked during coding into CHAT format, which began in
January 2000. All files in the corpus include a %pho: tier and a %int:
tier. Both tiers are also used to transcribe adult utterances that show
characteristic features of child-directed speech or are otherwise
non-standard.

The %pho: tier.
- Font - IPAPhon. A narrow transcription is attempted, balanced against
readability. Babbled strings are transcribed in full, with problematic
sounds discussed in the %com: tier. The %mod: tier gives colloquial forms,
as spoken in the family.
- Symbols - adult speech, and child speech that can safely be recognised as
(renderings of a) target, are transcribed according to the conventions in
the Handbook of the International Phonetic Association (Cambridge
University Press, 1999) for each language. In transcriptions of babble or
otherwise unintelligible speech, the symbols used represent standard
International Phonetic Alphabet values. For example, the IPA [_] symbol
represents a vowel similar in quality to a mid central vowel found in both
Portuguese and Swedish. In target-like child forms, this vowel is
transcribed with [_] in Portuguese and with [_] in Swedish; in babble, only
the symbol [_] is used.
- Diphthongs - vowel sequences are taken as diphthongs if the second vowel
follows the tone initiated in the first. The glide segment of the diphthong
is transcribed with [j] or [w], which therefore represent vocoids. Hence,
e.g., [aw] represents one syllable, [au] represents two.
- Obstruents - voiced symbols that are marked devoiced, e.g., [__],
indicate voiceless lenis articulations.
- Syllables - for the purposes of stress assignment, intervocalic
consonant sequences are syllabified as onsets, according to the
phonotactics of the language involved in the case of adult and target-like
child forms. This is one of many possible choices, and does not imply
endorsing any one type of syllabification in child speech. Two adjacent
identical vowel symbols indicate that the child pronounced the vowel as two
syllables.

- Stress - pitch obtrusion usually makes it clear which syllable is being
stressed. Other cues to stress are duration and intensity at the syllabic
peak. Stress is marked with [_] before the affected syllable.
- Words - a space delimits what was interpreted as a word or a phrase
within the same tone group, in child or child-directed speech, even when
not corresponding to these constituents in target forms.

The %int: tier.
This tier transcribes uses of pitch, adapting the principles of nuclear
notation described in the CHAT Manual, and includes indications of voice
quality and paralinguistic features such as creak and tempo.
Adult speech and target-like child speech are transcribed by means of
abbreviated paired symbols. In simple falling, rising or level tones, the
first symbol denotes the high, mid or low pitch at which the tone starts,
and the second symbol denotes the type of pitch movement: falling, rising
or level. The one exception is the Portuguese extra-low fall (see below).
'High', 'mid' and 'low' are relative terms: a 'mid' pitch level denotes the
speaker's average tone range, as impressionistically detected in regular
contact with any speaker, 'high' and 'low' being defined accordingly in
relation to 'mid' for each speaker. In complex tones, the successive
symbols indicate the sequence of pitch movements. The conventions are as
follows:
Simple falls:
- LF - low-fall
- MF - mid-fall
- HF - high-fall
- eLF - extra-low fall, from low to below the speaker's usual low range.

Simple rises:
- LR - low-rise
- MR - mid-rise
- HR - high-rise

Level tones:
- LL - low-level
- ML - mid-level
- HL - high-level

Complex tones:
- RF - rise-fall
- FR - fall-rise
- RFR - rise-fall-rise
- FRF - fall-rise-fall

Complementary indication of where the pitch ends is added where relevant,
e.g., "HF to mid".


Other conventions are:
- preH - prehead: unstressed syllables before the first stressed syllable
in the utterance.
- H - head: begins on the first stressed syllable in the utterance and
stretches up to the nuclear syllable.

These symbols always follow symbols indicating pitch start or type, so that
confusion between the H denoting 'high' and the H denoting 'head' is
avoided. Examples of their use are:
- LpreH - low prehead
- MpreH - mid prehead
- HpreH - high prehead

- LH - low head
- MH - mid head
- HH - high head
- FH - falling head
- RFH - rising-falling head

Transcription of each tone group (tg) is given on successive lines of the
%int: tier. Prehead, head and tone are separated by + signs in the
transcription, e.g. (file ptgsw.K880500, lines 865 and 867-869):

*DAD:	vad heter //det för nåt # vad är //det för nåt # vad //heter det.
%int:	1tg, MH+ML;
	2tg, MH+LR;
	3tg, LH+MF.

	In babbled speech, no assumption is made concerning the existence of an
intonational nucleus. Transcription of babble concerns pitch height and
movement on each babbled syllable, according to similar conventions. The
main difference is that + signs here indicate syllable boundaries, e.g.
(file ptgsw.M901215, lines 77-81):

*MIK:	yyy.
_____	________________________
%int:	1tg, ML+long MF;
		2tg, LL;
		3tg, HL+short LF.
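The notation for adult and target-like speech is regular enough to be read by a script. The Python sketch below shows one way to split a tone-group line into prehead, head and nuclear tone. It assumes each line has the shape of the K880500 example ("1tg, MH+ML;"); the names parse_tone_group and ToneGroup, and the regular expression, are hypothetical and not part of CLAN. The sketch relies on the fact that prehead symbols end in "preH" and head symbols end in "H", while no tone symbol does. Babble lines, where + marks syllable boundaries and qualifiers such as "long" or "short" occur, are deliberately not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Nuclear tone symbols as listed in the readme's %int: conventions.
TONES = {
    "LF": "low-fall", "MF": "mid-fall", "HF": "high-fall", "eLF": "extra-low fall",
    "LR": "low-rise", "MR": "mid-rise", "HR": "high-rise",
    "LL": "low-level", "ML": "mid-level", "HL": "high-level",
    "RF": "rise-fall", "FR": "fall-rise",
    "RFR": "rise-fall-rise", "FRF": "fall-rise-fall",
}

# Hypothetical pattern for one tone-group line, e.g. "1tg, MH+ML;" or "3tg, LH+MF."
TG_LINE = re.compile(r"^\s*(\d+)tg,\s*(.+?)[;.]?\s*$")


@dataclass
class ToneGroup:
    number: int
    prehead: Optional[str]  # e.g. "LpreH"; None if absent
    head: Optional[str]     # e.g. "MH", "FH", "RFH"; None if absent
    tone: str               # nuclear tone symbol, e.g. "LR"
    ends_at: Optional[str]  # end pitch from notes such as "HF to mid"


def parse_tone_group(line: str) -> ToneGroup:
    """Split one adult/target-like %int: line into prehead, head and tone."""
    match = TG_LINE.match(line)
    if match is None:
        raise ValueError(f"not a tone-group line: {line!r}")
    # The last "+"-separated component is the nucleus; anything before it
    # is a prehead or a head.
    *before_nucleus, nucleus = match.group(2).split("+")
    prehead = head = None
    for part in (p.strip() for p in before_nucleus):
        if part.endswith("preH"):   # LpreH, MpreH, HpreH
            prehead = part
        elif part.endswith("H"):    # LH, MH, HH, FH, RFH
            head = part
        else:
            raise ValueError(f"unexpected component {part!r} in {line!r}")
    # Optional end-point note, e.g. "HF to mid" -> tone "HF", ends_at "mid".
    tone, _, ends_at = nucleus.strip().partition(" to ")
    if tone not in TONES:
        raise ValueError(f"unknown tone {tone!r} in {line!r}")
    return ToneGroup(int(match.group(1)), prehead, head, tone, ends_at or None)


if __name__ == "__main__":
    for line in ["1tg, MH+ML;", "\t2tg, MH+LR;", "\t3tg, LH+MF."]:
        tg = parse_tone_group(line)
        print(tg.number, tg.head, tg.tone, TONES[tg.tone])
```

Classifying components by suffix, rather than by position, keeps the sketch working whether or not a prehead or head was transcribed for a given tone group.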


Other conventions and symbols.
- Orthography - adult utterances, and children's utterances recognised as
(renderings of) target forms, are given in standard orthography. A form of
ad-hoc 'baby orthography' is also used for child connected speech that,
although replicating target utterances, distorts segments and prosody
beyond any readable use of CHAT conventions for truncated child utterances.
In these cases, standard orthography is given in the %gls: tier. It is
hoped that 'baby orthography' will be easily understood by native-speaker
users of the database. One example is in file ptgsw.SM910100, lines 24-25:

*SOF:	a/mã # k/lhi klh/kó?
%gls:	mamã, a Karin está na escola?

- Ptg, Sw, Eng - indicate quotation of data in Portuguese, Swedish and
English, respectively, in %com, %exp or %lan tiers. Notations of the type
PtgEng are used in the same way for multilingual mixes, with the first
symbol indicating host language (the language accepting an intrusion) and
the second guest language (the intruding language). In the %lan tier, a
single language symbol on its own indicates a probable rendition of a
target in that language.

- tg(s) - tone group(s)
- syll(s) - syllable(s)
- dipt(s) - diphthong(s)
- min(s) - minute(s)
- sec(s) - second(s)

- Other abbreviations, such as Det, VP, follow accepted standards.
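To illustrate the host/guest ordering of the language symbols, a tag such as PtgEng could be decoded as follows. This is a Python sketch only: it covers just the three symbols Ptg, Sw and Eng and at most one host-guest pair, and parse_language_tag is a hypothetical helper, not a CHAT or CLAN function.

```python
import re
from typing import Dict

# The three language symbols defined in the readme.
LANGUAGES = {"Ptg": "Portuguese", "Sw": "Swedish", "Eng": "English"}
SYMBOL = re.compile(r"Ptg|Sw|Eng")


def parse_language_tag(tag: str) -> Dict[str, str]:
    """Read 'Sw' as one language and 'PtgEng' as host (first) + guest (second)."""
    symbols = SYMBOL.findall(tag)
    # Reject empty tags and tags containing anything besides the known symbols.
    if not symbols or "".join(symbols) != tag:
        raise ValueError(f"unrecognised language tag: {tag!r}")
    if len(symbols) == 1:
        return {"language": LANGUAGES[symbols[0]]}
    if len(symbols) == 2 and symbols[0] != symbols[1]:
        host, guest = symbols
        return {"host": LANGUAGES[host], "guest": LANGUAGES[guest]}
    raise ValueError(f"expected one symbol or one host-guest pair: {tag!r}")


print(parse_language_tag("PtgEng"))  # {'host': 'Portuguese', 'guest': 'English'}
```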


The files
The files contain monolingual and/or mixed production by one or more of the
children. The filenames include a language prefix, the child(ren)'s
initial(s) and the date of recording, given as yymmdd. An indication of 00
for the day means that the exact day of recording is unknown. Files
containing all three languages are prefixed ptswen.

File ID	K age	S age	M age
K861020	0;1.18		
K861113	0;2.11		
K870117	0;4.15		
K870203	0;5.1		
K870319	0;6.17		
K870500	0;8.		
K870600	0;9.		
K870800	0;11.		
K880500	1;8.		
KS881200	2;3.	0;5	
KS890105	2;4.3	0;5.24	
KS890510	2;8.8	0;9.	
KSM901205	4;3.3	2;4.24	0;1.25
KSM910700	4;10.	3;0.	0;9
KSM920408	5;7.6	3;8.27	1;5.28
M901100			0;1.
M901215			0;2.5
M910108			0;2.28
M910500			0;7.
M910528			0;7.18
M910600			0;8.
M910800			0;10.
M910900			0;11.
M911125			1;0.15
S880815		0;1.4	
S880900		0;2.	
S881000		0;3.	
S881004		0;2.23	
S890100		0;6.	
S890300		0;8.	
SM910100		2;6.	0;3.
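Because the file IDs follow a fixed pattern, they can be decoded mechanically, for instance to sort files by child or date. The Python sketch below assumes names of the form prefix.INITIALSyymmdd (e.g. ptgsw.K880500), with the prefix optional so that bare IDs from the table also parse. It further assumes that every two-digit year falls in the 1900s, which holds for the files listed above but is an assumption. parse_file_id and FileInfo are illustrative names only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

CHILDREN = {"K": "Karin", "S": "Sofia", "M": "Mikael"}

# Optional lowercase language prefix plus dot, child initials, then yymmdd.
FILE_ID = re.compile(
    r"^(?:(?P<prefix>[a-z]+)\.)?(?P<kids>[KSM]+)(?P<yy>\d{2})(?P<mm>\d{2})(?P<dd>\d{2})$"
)


@dataclass
class FileInfo:
    prefix: Optional[str]  # language prefix, e.g. "ptgsw"; None for a bare ID
    children: List[str]
    year: int
    month: int
    day: Optional[int]     # None when the filename gives 00 (day unknown)


def parse_file_id(name: str) -> FileInfo:
    match = FILE_ID.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    day = int(match["dd"])
    return FileInfo(
        prefix=match["prefix"],
        children=[CHILDREN[c] for c in match["kids"]],
        year=1900 + int(match["yy"]),  # assumption: all recordings predate 2000
        month=int(match["mm"]),
        day=day or None,               # 00 means the exact day is unknown
    )


print(parse_file_id("ptgsw.K880500"))
```

Representing an unknown day as None, rather than 0, keeps "day unknown" distinct from any real date if the records are later converted to calendar dates.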




