30.3330, FYI: New Resource for corpus-based typology: Multi-CAST

The LINGUIST List linguist at listserv.linguistlist.org
Thu Sep 5 10:59:42 UTC 2019


LINGUIST List: Vol-30-3330. Thu Sep 05 2019. ISSN: 1069 - 4875.

Subject: 30.3330, FYI: New Resource for corpus-based typology: Multi-CAST

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Thu, 05 Sep 2019 06:59:20
From: Geoffrey Haig [geoffrey.haig at uni-bamberg.de]
Subject: New Resource for corpus-based typology: Multi-CAST

We are delighted to announce the launch of "Multi-CAST" (the Multilingual
Corpus of Annotated Spoken Texts), now available at:

https://multicast.aspra.uni-bamberg.de/

Multi-CAST in a nutshell:
Multi-CAST is an online collection of annotated spoken language corpora from a
steadily expanding range of typologically diverse languages.

It features standardized annotations across multiple levels, targeting
morphosyntactic structure and reference.
Multi-CAST has been designed as a tool for quantitative, corpus-based
typology.
It is based on open-source software resources, and all data are fully
accessible under a Creative Commons licence.

- 11 corpora from diverse languages
- each corpus comprises at least 1000 clauses, for a total of 20000 clauses
(c. 85000 words)
- 10 additional corpora in preparation
- multiple annotation layers for morphosyntax and referent tracking (including
zero anaphora) using unified annotation schemes (GRAID, RefIND)
- a companion R package facilitates quantitative cross-corpus analysis

For a comprehensive one-stop overview, see the following document:

https://multicast.aspra.uni-bamberg.de/data/docs/general/collection-overview/mc_collection-overview.pdf

Enjoy!

The Multi-CAST team

Geoffrey Haig, Stefan Schnell, Nils Schiborr
Linguistic Field(s): Text/Corpus Linguistics
                     Typology