[Ura-list] FYI: INEL Selkup and Kamas corpora
Alexandre Arkhipov
alexandre.arkhipov at uni-hamburg.de
Mon Jun 24 16:38:39 UTC 2019
Dear colleagues,
We are glad to inform you that the previously announced INEL corpora of
the Selkup and Kamas languages are now accessible through a new search
interface.
The starting pages for both corpora are as follows:
Selkup: https://inel.corpora.uni-hamburg.de/SelkupCorpus/search
Kamas: https://inel.corpora.uni-hamburg.de/KamasCorpus/search
The search is based on the Tsakonian Corpus Platform (Tsakorpus,
https://bitbucket.org/tsakorpus/). It allows for searching based on
transcription ("Word" field, also "Lemma" for the default root shape),
translation, and/or grammatical glosses or categories, including
negative queries and multi-word queries (with or without distance
constraints). Change the "View" from "standard" to "glossed" in the
options to get the glossed output.
Grammatical tags are unordered; they are generated from glosses by
rules. It is usually more efficient to search for grammatical tags than
for specific glosses, unless you're sure of the exact gloss(es) and
their relative order in the word.
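The difference between the two kinds of query can be pictured with a
small sketch. This is only an illustration of unordered versus ordered
matching, not the way Tsakorpus implements its search; the function
names, the sample word and its tags and glosses are all hypothetical.

```python
def has_tags(word_tags, wanted):
    """Unordered match: every wanted tag is present, in any order."""
    return set(wanted) <= set(word_tags)


def has_gloss_sequence(word_glosses, wanted):
    """Ordered match: the wanted glosses occur side by side, in this exact order."""
    glosses, wanted = list(word_glosses), list(wanted)
    n = len(wanted)
    return any(glosses[i:i + n] == wanted
               for i in range(len(glosses) - n + 1))


# Hypothetical analysed word (not taken from either corpus).
word = {
    "glosses": ["go", "PST", "3SG"],   # order follows the morphemes in the word
    "tags": {"pst", "3", "sg"},        # unordered, derived from the glosses by rules
}

print(has_tags(word["tags"], ["sg", "pst"]))                  # order of tags is irrelevant
print(has_gloss_sequence(word["glosses"], ["PST", "3SG"]))    # glosses in the right order
print(has_gloss_sequence(word["glosses"], ["3SG", "PST"]))    # same glosses, wrong order
```

In this picture, a tag query succeeds whatever order the tags are given
in, whereas a gloss query fails as soon as the order is wrong.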
Pop-up windows in both "Grammar" and "Gloss" fields let you pick and
choose specific grammatical tags/glosses from the corpus-specific list.
We hope you will enjoy using this search interface.
The help pages will be updated gradually; in the meantime, please send
your comments and suggestions to: inel at uni-hamburg.de.
Best regards,
Alexandre Arkhipov
On 25/01/2019 18:33, Alexandre Arkhipov wrote:
>
> Dear colleagues,
>
> The first versions of two digital corpora developed as part of the
> INEL project (https://inel.corpora.uni-hamburg.de/), Selkup and Kamas,
> are published online.
>
> Texts are provided with interlinear glossing (with lexical glosses in
> English and Russian) and translations into English, Russian and German.
> Some texts also have (partial) annotations for syntactic functions,
> semantic roles and information status, lexical borrowings and
> code-switching.
>
> The corpora are published in open access under Creative Commons
> Attribution-NonCommercial-ShareAlike 4.0 International Public License
> (CC BY-NC-SA 4.0). See below for details on using the corpora.
>
> The corpora are primarily intended for typologically aware
> corpus-based grammatical research but may also be of interest to
> linguists of other branches as well as to specialists in folklore,
> anthropology and history.
>
>
> 1. INEL Selkup Corpus (v0.1)
> http://hdl.handle.net/11022/0000-0007-CAE5-3
>
> Selkup is an endangered Samoyedic language (Uralic family), which used
> to be spoken in many small settlements dispersed over a large
> territory in Western Siberia.
> The INEL Selkup corpus is composed of texts from the archive of
> Angelina Ivanovna Kuzmina (1924–2002), who gathered a large amount of
> material on Selkup in almost all regions where the Selkup people lived
> in 1962–1977. Most texts in the corpus originate from the handwritten
> part of the archive that she transferred to Hamburg in 2001; the
> others come from her sound recordings digitized in 2001, which have
> been transcribed and translated within the INEL project.
> The present version of the corpus comprises 78 texts (18 673 words),
> mostly representing Northern varieties of Selkup.
>
>
> 2. INEL Kamas Corpus (v0.1)
> http://hdl.handle.net/11022/0000-0007-CAE6-2
>
> Kamas belongs to the Samoyedic branch of the Uralic language family.
> The language became extinct by the late 20th century, with the death
> of its last known speaker, Klavdiya Plotnikova (1895–1989). All the
> surviving Kamas texts document Forest Kamas varieties spoken in the
> settlement of Abalakovo, in the present Krasnoyarsk Krai in Southern
> Siberia.
> The INEL Kamas corpus is the first publicly available digital resource
> with annotated Kamas texts. The INEL Kamas corpus consists of two
> parts: folklore texts collected by Kai Donner in 1912–1914, and
> transcribed audio recordings of Klavdiya Plotnikova made between 1964
> and 1970 in Abalakovo, Tartu and Tallinn. Most of these recordings
> were transcribed within the INEL project (including re-transcribing
> some tapes, fragments of which were published by Ago Künnap in 1976–1992).
> The present version of the corpus comprises 137 texts (48 293 words);
> this includes 16 texts collected by Kai Donner and 121 texts from the
> recordings of Klavdiya Plotnikova (ca. 10.5 hours).
>
>
> Working with the corpora
>
> The data in the corpora (annotated texts as well as corresponding
> metadata) are represented in XML formats of the freely distributed
> EXMARaLDA suite (http://exmaralda.org/en/).
>
> User guides (in English) are available here:
> https://corpora.uni-hamburg.de/hzsk/en/islandora/object/file:selkup-0.1_INEL_Selkup_Corpus_0.1_User_Documentation/datastream/PDF/INEL_Selkup_Corpus.pdf
> https://corpora.uni-hamburg.de/hzsk/en/islandora/object/file:kamas-0.1_INEL_Kamas_Corpus_0.1_User_Documentation/datastream/PDF/INEL_Kamas_Corpus.pdf
>
> For browsing (and playback) of individual texts, use the "Sessions" tab on
> the main corpus page. Each text can be viewed in one of three online
> formats (e.g. Visualizations: Score) and downloaded in EXB (an
> EXMARaLDA format). The sources of texts, i.e. scanned pages (PDF) or
> sound files (WAV, MP3), can also be viewed/downloaded.
>
> For searching across the whole corpus, the complete archive of the
> corpus files can be downloaded and searched with the EXAKT program of
> the EXMARaLDA suite.
> Furthermore, in the next few weeks, an online search interface will be
> open for both corpora, based on the Tsakonian Corpus Platform
> (Tsakorpus, https://bitbucket.org/tsakorpus/). A test search across a
> fragment of the Selkup corpus is currently available at
> https://inel.corpora.uni-hamburg.de/SelkupCorpus/search.
>
> Please send your comments and suggestions to: inel at uni-hamburg.de.
>
> Best regards,
> Alexandre Arkhipov
>
> --
> Dr. Alexandre Arkhipov
> Universität Hamburg
> Institut für Finnougristik/Uralistik - Akademieprojekt INEL
> https://inel.corpora.uni-hamburg.de/
> Max-Brauer-Allee 60
> D-22761 Hamburg
> +49 40 42838 6890
--
Dr. Alexandre Arkhipov
Universität Hamburg
Institut für Finnougristik/Uralistik - Akademieprojekt INEL
https://inel.corpora.uni-hamburg.de/
Max-Brauer-Allee 60
D-22765 Hamburg
+49 40 42838 6890