30.3330, FYI: New Resource for corpus-based typology: Multi-CAST
The LINGUIST List
linguist at listserv.linguistlist.org
Thu Sep 5 10:59:42 UTC 2019
LINGUIST List: Vol-30-3330. Thu Sep 05 2019. ISSN: 1069 - 4875.
Subject: 30.3330, FYI: New Resource for corpus-based typology: Multi-CAST
Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Please support the LL editors and operation with a donation at:
https://funddrive.linguistlist.org/donate/
Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================
Date: Thu, 05 Sep 2019 06:59:20
From: Geoffrey Haig [geoffrey.haig at uni-bamberg.de]
Subject: New Resource for corpus-based typology: Multi-CAST
We are delighted to announce the launch of "Multi-CAST" (the Multilingual
Corpus of Annotated Spoken Texts), now available at:
https://multicast.aspra.uni-bamberg.de/
Multi-CAST in a nutshell:
Multi-CAST is an online collection of annotated spoken language corpora from a
steadily expanding range of typologically diverse languages.
It features standardized annotations across multiple levels, targeting
morphosyntactic structure and reference.
Multi-CAST has been designed as a tool for quantitative, corpus-based
typology.
It is based on open-source software resources, and all data are fully
accessible under a Creative Commons licence.
- 11 corpora from diverse languages
- each corpus comprises at least 1,000 clauses, for a total of 20,000 clauses
(c. 85,000 words)
- 10 additional corpora in preparation
- multiple annotation layers for morphosyntax and referent tracking (including
zero anaphora) using unified annotation schemes (GRAID, RefIND)
- a companion R package facilitates quantitative cross-corpus analysis
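To give a flavour of the kind of quantitative cross-corpus comparison the collection is meant to support, here is a minimal Python sketch. It is not the companion R package and does not use its API. The input format (a flat list of (corpus, gloss) pairs), the simplified GRAID-style gloss shape form[.property]:function, the function names parse_graid and zero_rate_by_corpus, and the toy rows are all hypothetical, invented for illustration; they are not Multi-CAST data.

```python
from collections import defaultdict

def parse_graid(gloss: str) -> tuple[str, str]:
    """Split a simplified GRAID-style gloss 'form[.property]:function'
    into (form, function). Hypothetical simplification: real GRAID
    annotation uses a richer symbol inventory than this handles."""
    form_part, _, function = gloss.partition(":")
    form = form_part.split(".", 1)[0]  # drop any referent property such as '.h'
    return form, function

def zero_rate_by_corpus(rows, functions=("s", "a")):
    """Per corpus, the share of arguments with the given syntactic
    functions (default: S and A) whose form is zero ('0')."""
    counts = defaultdict(lambda: [0, 0])  # corpus -> [zero count, total count]
    for corpus, gloss in rows:
        form, function = parse_graid(gloss)
        if function in functions:
            counts[corpus][1] += 1
            if form == "0":
                counts[corpus][0] += 1
    return {c: zeros / total for c, (zeros, total) in counts.items()}

# Toy rows, invented for illustration only (not real corpus data).
toy_rows = [
    ("corpus_x", "0.h:s"), ("corpus_x", "np.h:a"),
    ("corpus_x", "pro.h:s"), ("corpus_x", "0.h:a"),
    ("corpus_y", "np.h:s"), ("corpus_y", "np:p"),
    ("corpus_y", "np.h:a"), ("corpus_y", "0.h:a"),
]

rates = zero_rate_by_corpus(toy_rows)
print(rates)
```

Because every corpus is annotated with the same scheme, one counting function can be applied across all of them unchanged; that uniformity is what makes such comparisons straightforward.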
For a comprehensive one-stop overview, see the following document:
https://multicast.aspra.uni-bamberg.de/data/docs/general/collection-overview/mc_collection-overview.pdf
Enjoy!
The Multi-CAST team
Geoffrey Haig, Stefan Schnell, Nils Schiborr
Linguistic Field(s): Text/Corpus Linguistics; Typology
------------------------------------------------------------------------------
*************************** LINGUIST List Support ***************************
The 2019 Fund Drive is under way! Please visit https://funddrive.linguistlist.org
to find out how to donate and check how your university, country or discipline
ranks in the fund drive challenges. Or go directly to the donation site:
https://iufoundation.fundly.com/the-linguist-list-2019
Let's make this a short fund drive!
Please feel free to share the link to our campaign:
https://funddrive.linguistlist.org/donate/