<div dir="ltr"><div class="gmail_quote">-------- Forwarded Message --------<br>

Subject:     Languages of Indonesia - Data Deposit<br>

Date:     Tue, 2 May 2017 19:24:31 +1000<br>

From:     Bradley Taylor &lt;<a href="mailto:btaylor@texonsite.com.au">btaylor@texonsite.com.au</a>&gt;<br>

To:     <a href="mailto:sealang-l@listserv.linguistlist.org">sealang-l@listserv.<wbr>linguistlist.org</a>,<br>

<a href="mailto:lingtyp@listserv.linguistlist.org">lingtyp@listserv.linguistlist.<wbr>org</a>, <a href="mailto:an-lang@anu.edu.au">an-lang@anu.edu.au</a><br>

<br>

<br>

<br>

Dear linguists,<br>

<br>

We are pleased to announce that the Jakarta Field Station of the Max<br>

Planck Institute for Evolutionary Anthropology (MPI-EVA), along with its<br>

collaborating projects, has just finalised the deposit of its corpora in<br>

The Language Archive (TLA) at the Max Planck Institute for<br>

Psycholinguistics:<br>

<br>

<a href="https://hdl.handle.net/1839/00-0000-0000-0021-10DE-A@view" rel="noreferrer" target="_blank">https://hdl.handle.net/1839/<wbr>00-0000-0000-0021-10DE-A@view</a><br>

<br>

The Jakarta Field Station was a major field project of the former<br>

Department of Linguistics at MPI-EVA. Based in Jakarta, with field<br>

assistants working in various locations across Indonesia, it operated<br>

between 1999 and 2015 with the primary purpose of recording and<br>

documenting languages of the region. Together with collaborating<br>

projects and scientists, it gathered over 2.3 million transcribed<br>

utterances from primarily naturalistic language recordings. An archive<br>

of the Field Station's website can be found here: <a href="http://jakarta.shh.mpg.de" rel="noreferrer" target="_blank">http://jakarta.shh.mpg.de</a><br>

<br>

Most utterances are fully glossed into English and translated into<br>

English, Indonesian, or both. All have session and speaker<br>

metadata and, in the TLA, are in Toolbox format, with many in ELAN<br>

format as well. All data are open-access, can be downloaded, and are<br>

free to use, with appropriate citation.<br>

<br>

Some rough tallies:<br>

<br>

Transcribed sessions:  2,800<br>

Text records (~utterances):  2.3 million<br>

Words (tokens):  8.7 million<br>

Recorded audio (WAV):  2,000 files, 1,100 hours<br>

Recorded video (MPEG):  1,600 files, 1,150 hours<br>

<br>

In addition to the above, CSV text files, one per entity type (texts,<br>

sessions, speakers, etc.), can be downloaded here:<br>

<a href="http://jakarta.shh.mpg.de/data.php" rel="noreferrer" target="_blank">http://jakarta.shh.mpg.de/<wbr>data.php</a><br>

<br>

---<br>

Bradley Taylor<br>

<a href="mailto:brad6020@yahoo.com">brad6020@yahoo.com</a><br>

<br>

David Gil<br>

<a href="mailto:gil@shh.mpg.de">gil@shh.mpg.de</a><br>

<br>

</div>

</div>