29.1295, Disc: Request for Comment: Cross-Linguistic Data Formats

The LINGUIST List linguist at listserv.linguistlist.org
Fri Mar 23 17:08:39 UTC 2018


LINGUIST List: Vol-29-1295. Fri Mar 23 2018. ISSN: 1069-4875.

Subject: 29.1295, Disc: Request for Comment: Cross-Linguistic Data Formats

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================


Date: Fri, 23 Mar 2018 13:08:00
From: Harald Hammarström [harald at bombo.se]
Subject: Request for Comment: Cross-Linguistic Data Formats

 
RFC: Cross-Linguistic Data Formats (CLDF), version 1.0

As the result of discussions over several years, triggered in particular
by work presented at the two workshops of the "Language Comparison with
Linguistic Databases" series [1,2], we would like to request your comments
on version 1.0 of CLDF, a specification for Cross-Linguistic Data Formats
(see http://cldf.clld.org).

The specification proposes a standard format for
- wordlists, including cognate judgments and phonetic alignments,
- grammatical structure datasets, such as WALS features and other
  typological surveys.
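
For readers new to the wordlist format, here is a minimal Python sketch
(Python being the language of pycldf) of the idea that cognate judgments
can live in their own table and point back to word forms by ID. The column
names ID, Language_ID, Parameter_ID, Form, Form_ID and Cognateset_ID are
assumptions modelled on CLDF's default FormTable and CognateTable, the rows
are illustrative only, and phonetic alignments are left out; the
specification is the authority on the actual names.

```python
from collections import defaultdict

# Illustrative form-table rows: one row per word form.
# Column names are modelled on CLDF's FormTable defaults (assumption).
forms = [
    {"ID": "english-hand", "Language_ID": "english", "Parameter_ID": "hand", "Form": "hand"},
    {"ID": "german-hand", "Language_ID": "german", "Parameter_ID": "hand", "Form": "Hand"},
    {"ID": "french-hand", "Language_ID": "french", "Parameter_ID": "hand", "Form": "main"},
]

# Illustrative cognate judgments: each row links one form to one cognate set.
# Column names are modelled on CLDF's CognateTable defaults (assumption).
cognates = [
    {"ID": "1", "Form_ID": "english-hand", "Cognateset_ID": "hand-1"},
    {"ID": "2", "Form_ID": "german-hand", "Cognateset_ID": "hand-1"},
    {"ID": "3", "Form_ID": "french-hand", "Cognateset_ID": "hand-2"},
]


def cognate_sets(forms, cognates):
    """Group (language, form) pairs by cognate set, resolving Form_ID references."""
    by_id = {row["ID"]: row for row in forms}
    sets = defaultdict(list)
    for judgment in cognates:
        form = by_id[judgment["Form_ID"]]  # KeyError signals a dangling reference
        sets[judgment["Cognateset_ID"]].append((form["Language_ID"], form["Form"]))
    return dict(sets)


print(cognate_sets(forms, cognates))
# {'hand-1': [('english', 'hand'), ('german', 'Hand')], 'hand-2': [('french', 'main')]}
```

Keeping judgments separate from forms costs an ID lookup, as in
cognate_sets, but leaves the form table untouched when judgments are
revised.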

CLDF is built upon the W3C recommendation "Tabular Data and Metadata on
the Web" [3] and can be thought of as a domain-specific adaptation of it
for linguistics.
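
Because CLDF builds on that W3C recommendation, a dataset pairs plain CSV
files with a JSON metadata file describing them. The sketch below builds
such a description with Python's standard json module. The keys @context,
tables, url, tableSchema, columns, name, datatype and primaryKey come from
the W3C CSVW metadata vocabulary; the file name values.csv, the column set
and the dc:conformsTo URIs are assumptions modelled on CLDF conventions and
should be checked against the specification.

```python
import json

# Base URI for CLDF terms (assumption: modelled on CLDF 1.0 conventions).
CLDF_TERMS = "http://cldf.clld.org/v1.0/terms.rdf#"


def column(name, datatype="string"):
    """Return a CSVW column description; 'name' and 'datatype' are CSVW keys."""
    return {"name": name, "datatype": datatype}


metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "dc:conformsTo": CLDF_TERMS + "StructureDataset",  # assumed module URI
    "tables": [
        {
            "url": "values.csv",  # hypothetical file name
            "dc:conformsTo": CLDF_TERMS + "ValueTable",  # assumed component URI
            "tableSchema": {
                "columns": [column(n) for n in ("ID", "Language_ID", "Parameter_ID", "Value")],
                "primaryKey": "ID",
            },
        }
    ],
}

print(json.dumps(metadata, indent=2))
```

The CSV file itself stays untouched; all typing and linking information
sits in the JSON description, which is what lets generic CSV tools keep
working on the data.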

Extensibility is built into CLDF so that it can support evolving standards
for more complex types of linguistic data. As of version 1.0, modules for
simple dictionary data and parallel-text corpora are included for further
experimentation.

CLDF datasets can be read and written with the Python library pycldf
(https://pypi.python.org/pypi/pycldf), but also with off-the-shelf tools
such as spreadsheet software or programming environments such as R,
because CLDF data files are based on comma-separated values (CSV).
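
As an illustration of that last point, the following sketch reads a
structure-dataset value table with nothing but Python's built-in csv
module and tallies the values recorded for one feature. The column names
follow CLDF's ValueTable defaults as an assumption, and the language and
feature IDs are made up; an in-memory string stands in for a real
values.csv file.

```python
import csv
import io
from collections import Counter

# Stand-in for a CLDF values.csv file. Rows are made up for illustration;
# column names follow CLDF's ValueTable defaults (assumption).
VALUES_CSV = """ID,Language_ID,Parameter_ID,Value
1,lang1,word_order,SOV
2,lang2,word_order,SVO
3,lang3,word_order,SOV
"""


def value_counts(fileobj, parameter):
    """Count how often each value occurs for one parameter (feature)."""
    reader = csv.DictReader(fileobj)  # header row supplies the dict keys
    return Counter(row["Value"] for row in reader if row["Parameter_ID"] == parameter)


counts = value_counts(io.StringIO(VALUES_CSV), "word_order")
print(counts.most_common())  # [('SOV', 2), ('SVO', 1)]
```

With a real dataset, the same function would take a file opened as
open("values.csv", newline="", encoding="utf-8"), the newline=""
argument being what the csv module documentation recommends.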

The CLDF specification is available at
https://github.com/cldf/cldf/blob/master/README.md

Examples of CLDF datasets and how to access CLDF data are provided at
- https://github.com/cldf/cldf/tree/master/examples and
- https://github.com/cldf/cookbook

We welcome all comments, posted either as a reply to this announcement or
as issues at https://github.com/cldf/cldf/issues

[1] http://www.mpi.nl/events/language-comparison-with-linguistic-databases-reflex-and-typological-databases
[2] http://www.eva.mpg.de/linguistics/conferences/2014-ws-lanclid2/index.html
[3] https://www.w3.org/TR/tabular-data-model/



Linguistic Field(s): Computational Linguistics
                     Genetic Classification
                     Historical Linguistics
                     Typology
