36.1970, FYI: Child Language Corpus of Jordanian Arabic
The LINGUIST List
linguist at listserv.linguistlist.org
Thu Jun 26 00:05:02 UTC 2025
LINGUIST List: Vol-36-1970. Thu Jun 26 2025. ISSN: 1069 - 4875.
Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Editor for this issue: Joel Jenkins <joel at linguistlist.org>
================================================================
Date: 24-Jun-2025
From: Marwan Jarrah [m.jarrah at ju.edu.jo]
Subject: Child Language Corpus of Jordanian Arabic
https://sites.ju.edu.jo/en/Childcorpus/Home.aspx
Welcome to the Child Language Corpus of Jordanian Arabic (JA)—the
first large-scale, systematically compiled linguistic resource
dedicated to documenting the spoken language of typically developing
children in Jordan. This corpus represents a foundational step in
Arabic language acquisition research, offering a rich and
unprecedented dataset of natural child speech across regional, age,
and gender lines.
The corpus comprises approximately 500,000 words from more than 500
recorded interviews with children aged 2 years 6 months to 12 years.
These interactions capture a diverse spectrum of
everyday, spontaneous language use, reflecting the authentic voices of
Jordanian children across urban, rural, and Bedouin communities. The
corpus offers an inclusive and highly representative view of
vernacular Jordanian Arabic (JA) in real-life contexts.
Each interview was carefully transcribed to mirror exactly how
children pronounce words, preserving phonetic details and including
markers for pauses and disfluencies. This attention to detail ensures
that the corpus is not only a record of what children say but also how
they say it—a critical resource for research in phonology,
morphosyntax, discourse development, and beyond.
Many of the recorded sessions, especially those involving younger
children, feature interactions with parents or caregivers, providing a
naturalistic context for child-directed speech. These interactions
offer valuable insights into turn-taking, scaffolding, and
social-pragmatic development in early language acquisition.
Ethical standards were upheld throughout the project. Informed consent
was obtained from all participating families, and data was anonymized
to protect the privacy of the children and their families. The project
was reviewed and approved in accordance with institutional ethical
guidelines.
Key Features of the Corpus
- Age Range: Children aged 2 years 6 months (2;6) to 12 years, covering
  key stages of early and late language development
- Regional Diversity: Includes data from urban centers, rural areas, and
  Bedouin communities across Jordan
- Search and Filter Options: Users can filter data by region, age, and
  sex, enabling targeted investigations
- Phonetic Transcription: Utterances are transcribed exactly as
  pronounced, preserving critical phonological data
- Context-Rich Interactions: Many interviews include natural
  caregiver-child dialogues, ideal for pragmatic and discourse studies
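As a rough illustration of the kind of filtering by region, age, and sex described above, the following Python sketch works on a small in-memory list of session records. It is not the corpus's actual interface or export format, neither of which is specified in this announcement. The `Session` fields, the `parse_age` and `filter_sessions` helpers, and the sample records are all hypothetical. Ages are written in the years;months notation common in acquisition research (2;6 = 2 years 6 months).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One recorded interview; all field names here are hypothetical."""
    region: str        # e.g. "urban", "rural", or "bedouin"
    age_months: int    # child's age in months
    sex: str           # "F" or "M"
    transcript: str


def parse_age(age: str) -> int:
    """Convert 'years;months' notation (e.g. '2;6') to a number of months."""
    years, _, months = age.partition(";")
    return int(years) * 12 + int(months or 0)


def filter_sessions(sessions, region=None, min_age=None, max_age=None, sex=None):
    """Return the sessions that match every criterion that is not None."""
    lo = parse_age(min_age) if min_age else None
    hi = parse_age(max_age) if max_age else None
    return [
        s for s in sessions
        if (region is None or s.region == region)
        and (lo is None or s.age_months >= lo)
        and (hi is None or s.age_months <= hi)
        and (sex is None or s.sex == sex)
    ]


# Invented placeholder records, not corpus data.
sessions = [
    Session("urban", parse_age("2;6"), "F", "..."),
    Session("rural", parse_age("7;0"), "M", "..."),
    Session("bedouin", parse_age("12;0"), "F", "..."),
]

young_girls = filter_sessions(sessions, sex="F", max_age="5;0")
print(len(young_girls))  # 1
```

Storing ages in months keeps range comparisons simple and avoids the ambiguity of decimal forms such as "2.6".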
Why This Corpus Matters
This project is the first of its kind in the Arab world: a systematic,
large-scale corpus that centers the voices of children in their
everyday linguistic environments. Although Arabic is among the world's
most widely spoken languages, child language acquisition in Arabic
dialects has long been underrepresented in corpus-based research. This
gap has limited our understanding of the developmental pathways unique
to Arabic and of how children acquire the complex syntactic and
phonological features found in Arabic varieties.
The Child Language Corpus of Jordanian Arabic directly addresses this
gap. It offers a robust empirical foundation for testing hypotheses in
language development, morphosyntactic theory, dialect variation, and
pragmatic competence. It also opens the door to cross-linguistic
comparisons, shedding light on universal vs. language-specific
features of acquisition.
Researchers in fields as diverse as developmental linguistics,
language education, psycholinguistics, dialectology, speech-language
pathology, and natural language processing (NLP) will find this corpus
an invaluable tool. Moreover, the inclusion of vernacular, spoken
Arabic—rather than Modern Standard Arabic—reflects a more accurate
linguistic reality of children's day-to-day experiences and enhances
the ecological validity of the research.
Looking Ahead
This corpus is more than a collection of interviews—it is a platform
for collaboration, discovery, and innovation. As the database
continues to expand and evolve, we welcome scholars and educators to
explore, analyze, and build upon this resource. Together, we can
deepen our understanding of how children acquire language in context
and ensure that the linguistic experiences of Arabic-speaking children
are fully represented in global academic discourse.
We hope this resource contributes meaningfully to the development of
child language research in Arabic and inspires similar projects across
the region and beyond.
Linguistic Field(s): Anthropological Linguistics; Language Acquisition; Syntax
Subject Language(s): Arabic (ara)
Language Family(ies): Semitic
------------------------------------------------------------------------------