[Ura-list] FYI: INEL Selkup and Kamas corpora

Alexandre Arkhipov alexandre.arkhipov at uni-hamburg.de
Mon Jun 24 12:38:39 EDT 2019


Dear colleagues,

We are glad to inform you that the previously announced INEL Selkup and
Kamas corpora are now accessible through a new search interface.
The starting pages for both corpora are as follows:
Selkup: https://inel.corpora.uni-hamburg.de/SelkupCorpus/search
Kamas: https://inel.corpora.uni-hamburg.de/KamasCorpus/search

The search is based on the Tsakonian Corpus Platform (Tsakorpus,
https://bitbucket.org/tsakorpus/). It allows searching by
transcription (the "Word" field, or "Lemma" for the default root shape),
translation, and/or grammatical glosses or categories, including
negative queries and multi-word queries (with or without distance
constraints). To get glossed output, change the "View" option from
"standard" to "glossed".
Grammatical tags are unordered; they are generated from glosses by
rules. It is usually more efficient to search for grammatical tags than
for specific glosses, unless you are sure of the exact gloss(es) and
their relative order in the word.
Pop-up windows in both "Grammar" and "Gloss" fields let you pick and 
choose specific grammatical tags/glosses from the corpus-specific list.

We hope you will enjoy using this search interface.
While the help pages are being gradually updated, please send your comments
and suggestions to: inel at uni-hamburg.de.

Best regards,
Alexandre Arkhipov


On 25/01/2019 18:33, Alexandre Arkhipov wrote:
>
> Dear colleagues,
>
> The first versions of two digital corpora developed as part of the 
> INEL project (https://inel.corpora.uni-hamburg.de/), Selkup and Kamas, 
> are published online.
>
> Texts are provided with interlinear glossing (with lexical glosses in
> English and Russian) and with translations into English, Russian and German.
> Some texts also have (partial) annotations for syntactic functions, 
> semantic roles and information status, lexical borrowings and 
> code-switching.
>
> The corpora are published in open access under Creative Commons 
> Attribution-NonCommercial-ShareAlike 4.0 International Public License 
> (CC BY-NC-SA 4.0). See below for details on using the corpora.
>
> The corpora are primarily intended for typologically aware 
> corpus-based grammatical research but may also be of interest to 
> linguists in other subfields as well as to specialists in folklore,
> anthropology and history.
>
>
> 1. INEL Selkup Corpus (v0.1)
> http://hdl.handle.net/11022/0000-0007-CAE5-3
>
> Selkup is an endangered Samoyedic language (Uralic family), which used 
> to be spoken in many small settlements dispersed over a large 
> territory in Western Siberia.
> The INEL Selkup corpus is composed of texts from the archive of
> Angelina Ivanovna Kuzmina (1924–2002), who gathered a large amount of
> material on Selkup in 1962–1977 in almost all regions where the Selkup
> people lived. Most texts in the corpus originate from the handwritten
> part of the archive that she transferred to Hamburg in 2001; the
> others come from her sound recordings digitized in 2001, which have
> been transcribed and translated within the INEL project.
> The present version of the corpus comprises 78 texts (18 673 words), 
> mostly representing Northern varieties of Selkup.
>
>
> 2. INEL Kamas Corpus (v0.1)
> http://hdl.handle.net/11022/0000-0007-CAE6-2
>
> Kamas belongs to the Samoyedic branch of the Uralic language family. 
> The language became extinct in the late 20th century with the death
> of its last known speaker, Klavdiya Plotnikova (1895–1989). All the
> surviving Kamas texts document Forest Kamas varieties spoken in the
> settlement of Abalakovo, in present-day Krasnoyarsk Krai in Southern
> Siberia.
> The INEL Kamas corpus is the first publicly available digital resource
> with annotated Kamas texts. It consists of two
> parts: folklore texts collected by Kai Donner in 1912–1914, and 
> transcribed audio recordings of Klavdiya Plotnikova made between 1964 
> and 1970 in Abalakovo, Tartu and Tallinn. Most of these recordings 
> were transcribed within the INEL project (including re-transcribing
> some tapes, fragments of which were published by Ago Künnap in 1976–1992).
> The present version of the corpus comprises 137 texts (48 293 words):
> 16 texts collected by Kai Donner and 121 texts from the
> recordings of Klavdiya Plotnikova (ca. 10.5 hours).
>
>
> Working with the corpora
>
> The data in the corpora (annotated texts as well as corresponding 
> metadata) are represented in XML formats of the freely distributed 
> EXMARaLDA suite (http://exmaralda.org/en/).
>
> User guides (in English) are available here:
> https://corpora.uni-hamburg.de/hzsk/en/islandora/object/file:selkup-0.1_INEL_Selkup_Corpus_0.1_User_Documentation/datastream/PDF/INEL_Selkup_Corpus.pdf
> https://corpora.uni-hamburg.de/hzsk/en/islandora/object/file:kamas-0.1_INEL_Kamas_Corpus_0.1_User_Documentation/datastream/PDF/INEL_Kamas_Corpus.pdf
>
> For browsing (and playback) of individual texts, use the «Sessions» tab on
> the main corpus page. Each text can be viewed in one of three online
> formats (e.g. Visualizations: Score) and downloaded in EXB (an
> EXMARaLDA format). The text sources, i.e. scanned pages (PDF) or
> sound files (WAV, MP3), can also be viewed and downloaded.
>
> For searching across the whole corpus, the complete archive of the 
> corpus files can be downloaded and searched with the EXAKT program of 
> the EXMARaLDA suite.
> Furthermore, in the next few weeks, an online search interface will
> become available for both corpora, based on the Tsakonian Corpus Platform
> (Tsakorpus, https://bitbucket.org/tsakorpus/). A test search across a 
> fragment of the Selkup corpus is currently available at 
> https://inel.corpora.uni-hamburg.de/SelkupCorpus/search.
>
> Please send your comments and suggestions to: inel at uni-hamburg.de.
>
> Best regards,
> Alexandre Arkhipov
>
> -- 
> Dr. Alexandre Arkhipov
> Universität Hamburg
> Institut für Finnougristik/Uralistik - Akademieprojekt INEL
> https://inel.corpora.uni-hamburg.de/
> Max-Brauer-Allee 60
> D-22761 Hamburg
> +49 40 42838 6890

-- 
Dr. Alexandre Arkhipov
Universität Hamburg
Institut für Finnougristik/Uralistik - Akademieprojekt INEL
https://inel.corpora.uni-hamburg.de/
Max-Brauer-Allee 60
D-22765 Hamburg
+49 40 42838 6890
