[Lingtyp] FYI: INEL Selkup corpus 1.0

Alexandre Arkhipov sarkipo at yandex.ru
Fri Jul 3 14:16:00 UTC 2020


Dear colleagues,

We are glad to announce the release of an updated version (now 1.0) of 
the Selkup corpus developed in the INEL project 
(https://inel.corpora.uni-hamburg.de/).

Brykina, Maria; Orlova, Svetlana; Wagner-Nagy, Beáta. 2020. INEL Selkup 
Corpus. Version 1.0. Publication date 2020-06-30. Archived in Hamburger 
Zentrum für Sprachkorpora.
http://hdl.handle.net/11022/0000-0007-E1D5-A
In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, 
Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern 
Eurasian languages.

- The present release contains 264 texts from 74 speakers, representing 
Northern, Central and Southern dialects of Selkup. They comprise 7887 
sentences and 42466 words in total.
- Many texts have been provided with (partial) annotations for syntactic 
functions and semantic roles.
- Corrections have been made to audio transcriptions, glossing and other 
annotations.

User documentation (in English) is available here:
https://corpora.uni-hamburg.de/hzsk/de/islandora/object/file:selkup-1.0_User_Documentation_for_INEL_Selkup_Corpus_1.0/datastream/PDF/INEL_Selkup_Corpus.pdf

The corpus can be searched online thanks to the Tsakorpus platform:
https://inel.corpora.uni-hamburg.de/SelkupCorpus/search

About the corpus

Selkup is an endangered Samoyedic language (Uralic family), which used 
to be spoken in many small settlements dispersed over a large territory 
in Western Siberia.
The INEL Selkup corpus is composed of texts from the archive of Angelina 
Ivanovna Kuzmina (1924–2002), who gathered a large amount of material on 
Selkup in almost all regions where the Selkup people lived in 1962–1977. 
Most texts in the corpus originate from the handwritten part of the 
archive that she transferred to Hamburg in 2001; the others come from 
her sound recordings digitized in 2001, which have been transcribed and 
translated within the INEL project.

The corpus is released under the CC BY-NC-SA 4.0 license. In addition to 
the online search option, the complete archive of the corpus files can be 
downloaded and searched with the EXAKT program of the EXMARaLDA suite.

To browse individual texts, use the «Sessions» tab on the main corpus 
page. Each text can be browsed in one of the online formats (e.g. 
Visualizations: Score) or downloaded as EXB (an EXMARaLDA format, 
convertible to ELAN). The sources of the texts, i.e. scanned pages (PDF) 
or sound files (WAV, MP3), can also be viewed or downloaded.
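
For those who prefer to process a downloaded EXB file with a script, here 
is a minimal sketch in Python. It assumes the usual EXMARaLDA basic 
transcription layout (tier elements containing event elements with start 
and end attributes). The sample text, the tier names "tx" and "ge" and the 
function name read_tiers are placeholders made up for illustration; they 
are not taken from the corpus files or from any EXMARaLDA tool.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the EXB layout (an assumption for
# illustration only; real corpus files contain many more tiers).
SAMPLE_EXB = """<basic-transcription>
  <basic-body>
    <common-timeline>
      <tli id="T0"/>
      <tli id="T1"/>
      <tli id="T2"/>
    </common-timeline>
    <tier id="tx" category="tx" type="t">
      <event start="T0" end="T1">word1 </event>
      <event start="T1" end="T2">word2</event>
    </tier>
    <tier id="ge" category="ge" type="a">
      <event start="T0" end="T1">gloss1</event>
      <event start="T1" end="T2">gloss2</event>
    </tier>
  </basic-body>
</basic-transcription>"""


def read_tiers(exb_xml):
    """Map each tier's category to its list of (start, end, text) events."""
    root = ET.fromstring(exb_xml)
    tiers = {}
    for tier in root.iter("tier"):
        events = [
            (ev.get("start"), ev.get("end"), (ev.text or "").strip())
            for ev in tier.iter("event")
        ]
        tiers[tier.get("category")] = events
    return tiers


tiers = read_tiers(SAMPLE_EXB)
for category, events in tiers.items():
    print(category, events)
```

For a file on disk, ET.parse(path).getroot() can replace ET.fromstring; 
the path would be wherever the downloaded EXB file was saved.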

Please feel free to send your comments and suggestions to: 
inel at uni-hamburg.de.

Best regards,
Alexandre Arkhipov
