<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p><font face="Calibri">Dear colleagues,</font></p>
<p><font face="Calibri">The first versions of two digital corpora
developed within the INEL project
(<a class="moz-txt-link-freetext" href="https://inel.corpora.uni-hamburg.de/?page_id=920">https://inel.corpora.uni-hamburg.de/?page_id=920</a>), the Selkup
corpus and the Kamas corpus, have been published online.<br>
<br>
Texts are provided with interlinear glossing (with lexical
glosses in English and Russian) and with translations into English,
Russian and German. Some texts are also (partially) annotated
for syntactic functions, semantic roles, information status,
lexical borrowings and code-switching.<br>
<br>
The corpora are published in open access under the Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International Public
License (CC BY-NC-SA 4.0). See below for details on working with the
corpora.<br>
<br>
The corpora are primarily intended for typologically aware
corpus-based grammatical research, but they may also be of interest to
linguists working in other fields, as well as to specialists in
folklore, anthropology and history.<br>
<br>
<br>
1. INEL Selkup Corpus (v0.1)<br>
<a class="moz-txt-link-freetext" href="http://hdl.handle.net/11022/0000-0007-CAE5-3">http://hdl.handle.net/11022/0000-0007-CAE5-3</a><br>
<br>
Selkup is an endangered Samoyedic language (Uralic family),
which used to be spoken in many small settlements dispersed over
a large territory in Western Siberia.<br>
The INEL Selkup corpus is composed of texts from the archive of
Angelina Ivanovna Kuzmina (1924–2002), who in 1962–1977 gathered a
large amount of Selkup material in almost all regions inhabited by
the Selkup people. Most texts in the corpus originate from the
handwritten part of the archive, which she transferred to Hamburg in
2001; the rest come from her sound recordings, which were digitized
in 2001 and have been transcribed and translated within the INEL
project.<br>
The present version of the corpus comprises 78 texts (18 673
words), mostly representing Northern varieties of Selkup.<br>
<br>
<br>
2. INEL Kamas Corpus (v0.1)<br>
<a class="moz-txt-link-freetext" href="http://hdl.handle.net/11022/0000-0007-CAE6-2">http://hdl.handle.net/11022/0000-0007-CAE6-2</a><br>
<br>
Kamas belongs to the Samoyedic branch of the Uralic language
family. The language became extinct in the late 20th century,
with the death of its last known speaker, Klavdiya Plotnikova
(1895–1989). All surviving Kamas texts document Forest Kamas
varieties spoken in the settlement of Abalakovo, in present-day
Krasnoyarsk Krai, Southern Siberia.<br>
The INEL Kamas corpus is the first publicly available digital
resource with annotated Kamas texts. It consists of two parts:
folklore texts collected by Kai Donner in 1912–1914, and
transcribed audio recordings of Klavdiya Plotnikova made between
1964 and 1970 in Abalakovo, Tartu and Tallinn. Most of these
recordings were transcribed within the INEL project, including new
transcriptions of some tapes, fragments of which had been published
by Ago Künnap in 1976–1992.<br>
The present version of the corpus comprises 137 texts (48 293
words): 16 texts collected by Kai Donner and 121 texts from
the recordings of Klavdiya Plotnikova (ca. 10.5 hours).<br>
<br>
<br>
Working with the corpora<br>
<br>
The data in the corpora (annotated texts as well as the
corresponding metadata) are stored in the XML formats of the
freely available EXMARaLDA software suite (<a class="moz-txt-link-freetext" href="http://exmaralda.org/en/">http://exmaralda.org/en/</a>).<br>
<br>
User guides (in English) are available here:<br>
<a class="moz-txt-link-freetext" href="https://corpora.uni-hamburg.de/hzsk/en/islandora/object/file:selkup-0.1_INEL_Selkup_Corpus_0.1_User_Documentation/datastream/PDF/INEL_Selkup_Corpus.pdf">https://corpora.uni-hamburg.de/hzsk/en/islandora/object/file:selkup-0.1_INEL_Selkup_Corpus_0.1_User_Documentation/datastream/PDF/INEL_Selkup_Corpus.pdf</a><br>
<a class="moz-txt-link-freetext" href="https://corpora.uni-hamburg.de/hzsk/en/islandora/object/file:kamas-0.1_INEL_Kamas_Corpus_0.1_User_Documentation/datastream/PDF/INEL_Kamas_Corpus.pdf">https://corpora.uni-hamburg.de/hzsk/en/islandora/object/file:kamas-0.1_INEL_Kamas_Corpus_0.1_User_Documentation/datastream/PDF/INEL_Kamas_Corpus.pdf</a><br>
<br>
For browsing (and playback) of individual texts, use the «Sessions»
tab on the main corpus page. Each text can be viewed online in one of
three formats (e.g. Visualizations: Score) and downloaded as an EXB
file (the EXMARaLDA basic transcription format). The sources of the
texts, i.e. scanned pages (PDF) or sound files (WAV, MP3), can also
be viewed or downloaded.<br>
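</font></p>
<p><font face="Calibri">If you prefer to process downloaded EXB files with your own scripts, the following Python sketch may serve as a starting point. It is only an illustration under stated assumptions and is not part of the INEL or EXMARaLDA tools: the function name <code>read_tier</code> is hypothetical, the code assumes the usual EXMARaLDA basic transcription layout (<code>tier</code> elements containing <code>event</code> elements), and it assumes by default that word tokens are stored in tiers with the category <code>tx</code>. Please check the user guides for the actual tier names.</font></p>
<pre>
```python
import xml.etree.ElementTree as ET


def read_tier(source, category="tx"):
    """Collect (start, end, text) tuples from all tiers of one category.

    `source` is a path or a binary file object pointing to an EXB file.
    Assumption: standard EXMARaLDA basic transcription layout, where each
    tier element carries a category attribute and contains event elements
    whose start/end attributes refer to points of the common timeline.
    The default category "tx" is an assumption, not a confirmed INEL name.
    Events are returned in document order (tier by tier).
    """
    root = ET.parse(source).getroot()
    events = []
    for tier in root.iter("tier"):
        if tier.get("category") != category:
            continue  # skip tiers of other categories
        for event in tier.iter("event"):
            # Event text may be missing or carry a trailing space.
            text = (event.text or "").strip()
            events.append((event.get("start"), event.get("end"), text))
    return events
```
</pre>
<p><font face="Calibri">For example, <code>read_tier("text.exb", category="ge")</code> would return the events of the English gloss tiers, assuming these use the category <code>ge</code>.<br>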
<br>
To search across the whole corpus, download the complete archive of
corpus files and search it with EXAKT, the search tool of the
EXMARaLDA suite.<br>
In addition, an online search interface for both corpora, based on
the Tsakonian Corpus Platform (Tsakorpus, <a class="moz-txt-link-freetext" href="https://bitbucket.org/tsakorpus/">https://bitbucket.org/tsakorpus/</a>),
will be launched in the next few weeks. A test search over a
fragment of the Selkup corpus is already available at
<a class="moz-txt-link-freetext" href="https://inel.corpora.uni-hamburg.de/SelkupCorpus/search">https://inel.corpora.uni-hamburg.de/SelkupCorpus/search</a>.<br>
<br>
Please send your comments and suggestions to:
<a class="moz-txt-link-abbreviated" href="mailto:inel@uni-hamburg.de">inel@uni-hamburg.de</a>.<br>
<br>
Best regards,<br>
Alexandre Arkhipov</font><br>
</p>
</body>
</html>