<div dir="ltr"><div class="gmail_quote">-------- Forwarded Message --------<br>
Subject: Languages of Indonesia - Data Deposit<br>
Date: Tue, 2 May 2017 19:24:31 +1000<br>
From: Bradley Taylor &lt;<a href="mailto:btaylor@texonsite.com.au">btaylor@texonsite.com.au</a>&gt;<br>
To: <a href="mailto:sealang-l@listserv.linguistlist.org">sealang-l@listserv.<wbr>linguistlist.org</a>,<br>
<a href="mailto:lingtyp@listserv.linguistlist.org">lingtyp@listserv.linguistlist.<wbr>org</a>, <a href="mailto:an-lang@anu.edu.au">an-lang@anu.edu.au</a><br>
<br>
<br>
<br>
Dear linguists,<br>
<br>
We are pleased to announce that the Jakarta Field Station of the Max<br>
Planck Institute for Evolutionary Anthropology (MPI-EVA), along with its<br>
collaborating projects, has just finalised the deposit of its corpora in<br>
The Language Archive (TLA) at the Max Planck Institute for<br>
Psycholinguistics:<br>
<br>
<a href="https://hdl.handle.net/1839/00-0000-0000-0021-10DE-A@view" rel="noreferrer" target="_blank">https://hdl.handle.net/1839/<wbr>00-0000-0000-0021-10DE-A@view</a><br>
<br>
The Jakarta Field Station was a major field project of the former<br>
Department of Linguistics at MPI-EVA. Based in Jakarta, with field<br>
assistants working in various locations across Indonesia, it operated<br>
between 1999 and 2015 with the primary purpose of recording and<br>
documenting languages of the region. Together with collaborating<br>
projects and scientists, it gathered over 2.3 million transcribed<br>
utterances from primarily naturalistic language recordings. An archive<br>
of the Field Station's website can be found here: <a href="http://jakarta.shh.mpg.de" rel="noreferrer" target="_blank">http://jakarta.shh.mpg.de</a><br>
<br>
Most utterances are fully glossed into English and translated into<br>
either English or Indonesian or both. All have session and speaker<br>
metadata and, in the TLA, are in Toolbox format, with many in ELAN<br>
format as well. All data are open-access, can be downloaded, and are<br>
free to use, with appropriate citation.<br>
<br>
Some rough tallies:<br>
<br>
Transcribed sessions: 2,800<br>
Text records (~utterances): 2.3 million<br>
Words (tokens): 8.7 million<br>
Recorded audio (WAV): 2,000 files, 1,100 hours<br>
Recorded video (MPEG): 1,600 files, 1,150 hours<br>
<br>
In addition to the above, CSV text files, one per entity type (texts,<br>
sessions, speakers, etc.), can be downloaded here:<br>
<a href="http://jakarta.shh.mpg.de/data.php" rel="noreferrer" target="_blank">http://jakarta.shh.mpg.de/<wbr>data.php</a><br>
<br>
---<br>
Bradley Taylor<br>
<a href="mailto:brad6020@yahoo.com">brad6020@yahoo.com</a><br>
<br>
David Gil<br>
<a href="mailto:gil@shh.mpg.de">gil@shh.mpg.de</a><br>
<br>
</div>
</div>