[Lingtyp] Languages of Indonesia - Data Deposit

Bradley Taylor brad6020 at yahoo.com
Tue May 2 06:08:51 EDT 2017


Dear linguists,

We are pleased to announce that the Jakarta Field Station of the Max
Planck Institute for Evolutionary Anthropology (MPI-EVA), along with its
collaborating projects, has just finalised the deposit of its corpora in
The Language Archive (TLA) at the Max Planck Institute for
Psycholinguistics:

https://hdl.handle.net/1839/00-0000-0000-0021-10DE-A@view

The Jakarta Field Station was a major field project of the former
Department of Linguistics at MPI-EVA. Based in Jakarta, with field
assistants working in various locations across Indonesia, it operated
between 1999 and 2015 with the primary purpose of recording and
documenting languages of the region. Together with collaborating
projects and scientists, it gathered over 2.3 million transcribed
utterances from primarily naturalistic language recordings. An archive
of the Field Station's website can be found here: http://jakarta.shh.mpg.de

Most utterances are fully glossed in English and translated into
English, Indonesian, or both. All have session and speaker metadata;
in the TLA, all are in Toolbox format, and many are in ELAN format as
well. All data are open-access, downloadable, and free to use with
appropriate citation.
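
For anyone who wants to process the Toolbox files programmatically,
here is a minimal Python sketch. It assumes the usual Toolbox layout of
one backslash-prefixed field marker per line, with each record opened
by a record marker. The marker names used here (\ref, \tx, \ft) and the
two sample records are assumptions made up for illustration, not taken
from the deposited files, so check the actual markers before relying on
it.

```python
def parse_toolbox(text, record_marker="ref"):
    """Split Toolbox-style text into records.

    Each record is a list of (marker, value) pairs. The record marker
    name is an assumption; pass the one your files actually use.
    """
    records = []
    current = None  # fields before the first record marker are skipped
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            parts = line[1:].split(None, 1)
            if not parts:
                continue  # a lone backslash carries no marker
            marker = parts[0]
            value = parts[1] if len(parts) > 1 else ""
            if marker == record_marker:
                current = []
                records.append(current)
            if current is not None:
                current.append((marker, value))
        elif current:
            # A line without a marker continues the previous field.
            marker, value = current[-1]
            current[-1] = (marker, (value + " " + line).strip())
    return records


# Invented sample data, NOT taken from the corpus.
SAMPLE = "\n".join([
    r"\ref S1.001",
    r"\tx mau ke mana",
    r"\ft where are you going",
    "",
    r"\ref S1.002",
    r"\tx ke pasar",
    r"\ft to the market",
])

records = parse_toolbox(SAMPLE)
for record in records:
    print(dict(record))
```

Returning plain (marker, value) pairs keeps repeated markers within a
record intact; convert a record to a dict only when its markers are
known to be unique.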

Some rough tallies:

Transcribed sessions:  2,800
Text records (~utterances):  2.3 million
Words (tokens):  8.7 million
Recorded audio (WAV):  2,000 files, 1,100 hours
Recorded video (MPEG):  1,600 files, 1,150 hours

In addition to the above, CSV text files, one per entity type (texts,
sessions, speakers, etc.), can be downloaded here:
http://jakarta.shh.mpg.de/data.php
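
As a rough sketch of how such CSV files might be read once downloaded,
the following Python uses only the standard library. The column names
(session_id, language) and the sample rows are hypothetical, since the
actual column layout is not described here; inspect the header row of
each file first.

```python
import csv
import io
from collections import Counter


def read_rows(handle):
    """Read a CSV file object into a list of dicts keyed by header name."""
    return list(csv.DictReader(handle))


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row[column] for row in rows)


# Invented stand-in for a downloaded file; the columns are hypothetical.
sample_csv = io.StringIO(
    "session_id,language\n"
    "S1,Language A\n"
    "S2,Language A\n"
    "S3,Language B\n"
)

rows = read_rows(sample_csv)
counts = count_by(rows, "language")
print(counts)
```

For a real file, replace the StringIO object with something like
open(path, newline="", encoding="utf-8"); the UTF-8 encoding is an
assumption and may need adjusting.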

---
Bradley Taylor
brad6020 at yahoo.com

David Gil
gil at shh.mpg.de

