[Lingtyp] Resource for corpus-based typology: Multi-CAST (Multilingual Corpus of Annotated Spoken Texts)
Geoffrey Haig
geoffrey.haig at uni-bamberg.de
Sun Sep 1 05:56:06 UTC 2019
APOLOGIES FOR MULTIPLE POSTINGS

We are delighted to announce the launch of "Multi-CAST" (the
Multilingual Corpus of Annotated Spoken Texts), now available at:

https://multicast.aspra.uni-bamberg.de/

Multi-CAST in a nutshell:
Multi-CAST is an online collection of annotated spoken language corpora
from a steadily expanding range of typologically diverse languages.
It features standardized annotations across multiple levels, targeting
morphosyntactic structure and reference.
Multi-CAST has been designed as a tool for quantitative, corpus-based
typology.
It is based on open-source software resources, and all data are fully
accessible under a Creative Commons licence.
- 11 corpora from diverse languages
- each corpus comprises at least 1000 clauses, for a total of 20000
  clauses (c. 85000 words)
- 10 additional corpora in preparation
- multiple annotation layers for morphosyntax and referent tracking
  (including zero anaphora) using unified annotation schemes (GRAID, RefIND)
- a companion R package facilitates quantitative cross-corpus analysis
For a comprehensive one-stop overview, see the following document:
https://multicast.aspra.uni-bamberg.de/data/docs/general/collection-overview/mc_collection-overview.pdf
Enjoy!
The Multi-CAST team
Geoffrey Haig, Stefan Schnell, Nils Schiborr
--
Prof. Dr. Geoffrey Haig
Lehrstuhl Allgemeine Sprachwissenschaft
Institut fuer Orientalistik
Universität Bamberg
Schillerplatz 7
96047 Bamberg
Office: +49 951 863 2490
Admin: +49 951 863 2491; email: admin.aspra at uni-bamberg.de
https://www.uni-bamberg.de/aspra/team/prof-dr-geoffrey-haig/