28.2007, FYI: Languages of Indonesia - Data Deposit

The LINGUIST List linguist at listserv.linguistlist.org
Mon May 1 14:56:38 UTC 2017


LINGUIST List: Vol-28-2007. Mon May 01 2017. ISSN: 1069 - 4875.

Subject: 28.2007, FYI: Languages of Indonesia - Data Deposit

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

*****************    LINGUIST List Support    *****************
                       Fund Drive 2017
                   25 years of LINGUIST List!
Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Yue Chen <yue at linguistlist.org>
================================================================


Date: Mon, 01 May 2017 10:56:08
From: Bradley Taylor [brad6020 at yahoo.com]
Subject: Languages of Indonesia - Data Deposit

Dear Colleagues,

We are pleased to announce that the Jakarta Field Station of the Max Planck
Institute for Evolutionary Anthropology (MPI-EVA), along with its
collaborating projects, has just finalised the deposit of its corpora in The
Language Archive (TLA) at the Max Planck Institute for Psycholinguistics:

https://hdl.handle.net/1839/00-0000-0000-0021-10DE-A@view

The Jakarta Field Station was a major field project of the former Department
of Linguistics at MPI-EVA. Based in Jakarta, with field assistants working in
various locations across Indonesia, it operated between 1999 and 2015 with the
primary purpose of recording and documenting languages of the region. Together
with collaborating projects and scientists, it gathered over 2.3 million
transcribed utterances from primarily naturalistic language recordings. An
archive of the Field Station's website can be found here:
http://jakarta.shh.mpg.de

Most utterances are fully glossed into English and translated into either
English or Indonesian or both. All have session and speaker metadata and, in
the TLA, are in Toolbox format, with many in ELAN format as well. All data are
open-access, can be downloaded, and are free to use, with appropriate
citation.

Some rough tallies:

Transcribed sessions:  2,800
Text records (~utterances):  2.3 million
Words (tokens):  8.7 million
Recorded audio (WAV):  2,000 files, 1,100 hours
Recorded video (MPEG):  1,600 files, 1,150 hours

In addition to the above, CSV text files, one per entity type (texts,
sessions, speakers, etc.), can be downloaded here:
http://jakarta.shh.mpg.de/data.php
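
For readers who want to work with the per-entity CSV files, the following is
a minimal Python sketch of one way to count text records per session. It is
only an illustration: the column names (session_id, text_id, language, text)
and the inline sample rows are assumptions made for this example, not the
actual layout of the downloadable files, so check the real headers first.

```python
import csv
import io
from collections import Counter


def load_rows(handle):
    """Read an open CSV file object into a list of dicts keyed by header name."""
    return list(csv.DictReader(handle))


def records_per_session(sessions, texts, key="session_id"):
    """Count text records for each session, including sessions with none."""
    counts = Counter(row[key] for row in texts)
    return {s[key]: counts.get(s[key], 0) for s in sessions}


# Tiny inline stand-ins for the downloaded files.
# Column names and values are hypothetical, not taken from the real data.
SESSIONS_CSV = "session_id,language\nS1,LangA\nS2,LangB\n"
TEXTS_CSV = (
    "text_id,session_id,text\n"
    "1,S1,utterance one\n"
    "2,S1,utterance two\n"
    "3,S2,utterance three\n"
)

sessions = load_rows(io.StringIO(SESSIONS_CSV))
texts = load_rows(io.StringIO(TEXTS_CSV))
print(records_per_session(sessions, texts))
```

With real downloads, the io.StringIO objects would be replaced by files
opened with open(path, newline="", encoding="utf-8"), assuming the files are
UTF-8 encoded.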

---
Bradley Taylor
brad6020 at yahoo.com

David Gil
gil at shh.mpg.de

Linguistic Field(s): Language Documentation
                     Text/Corpus Linguistics

Language Family(ies): Austronesian



------------------------------------------------------------------------------
