[Lingtyp] FYI: INEL Selkup corpus 1.0
Alexandre Arkhipov
sarkipo at yandex.ru
Fri Jul 3 14:16:00 UTC 2020
Dear colleagues,
We are glad to announce the release of an updated version (now 1.0) of
the Selkup corpus developed in the INEL project
(https://inel.corpora.uni-hamburg.de/).
Brykina, Maria; Orlova, Svetlana; Wagner-Nagy, Beáta. 2020. INEL Selkup
Corpus. Version 1.0. Publication date 2020-06-30. Archived in Hamburger
Zentrum für Sprachkorpora.
http://hdl.handle.net/11022/0000-0007-E1D5-A
In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka,
Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern
Eurasian languages.
- The present release contains 264 texts from 74 speakers, representing
Northern, Central and Southern dialects of Selkup. Together they comprise
7887 sentences and 42466 words.
- Many texts have been provided with (partial) annotations for syntactic
functions and semantic roles.
- Corrections have been made to audio transcriptions, glossing and other
annotations.
User documentation (in English) is available here:
https://corpora.uni-hamburg.de/hzsk/de/islandora/object/file:selkup-1.0_User_Documentation_for_INEL_Selkup_Corpus_1.0/datastream/PDF/INEL_Selkup_Corpus.pdf
The corpus can be searched online thanks to the Tsakorpus platform:
https://inel.corpora.uni-hamburg.de/SelkupCorpus/search
About the corpus
Selkup is an endangered Samoyedic language (Uralic family), which used
to be spoken in many small settlements dispersed over a large territory
in Western Siberia.
The INEL Selkup corpus is composed of texts from the archive of Angelina
Ivanovna Kuzmina (1924–2002), who gathered a large amount of material on
Selkup in almost all regions where the Selkup people lived in 1962–1977.
Most texts in the corpus originate from the handwritten part of the
archive that she transferred to Hamburg in 2001; the others come from
her sound recordings digitized in 2001, which have been transcribed and
translated within the INEL project.
The corpus is released under the CC BY-NC-SA 4.0 license. In addition to the
online search option, the complete archive of the corpus files can be
downloaded and searched with the EXAKT program of the EXMARaLDA suite.
For browsing individual texts, use the «Sessions» tab on the main corpus
page. Each text can be browsed in one of the online formats (e.g.
Visualizations: Score) or downloaded as EXB (an EXMARaLDA format,
convertible to ELAN). The sources of the texts, i.e. scanned pages (PDF) or
sound files (WAV, MP3), can also be viewed or downloaded.
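
For those who prefer to process downloaded EXB files with a script, the
following is a minimal Python sketch. It assumes the usual EXMARaLDA basic
transcription layout, i.e. <tier> elements containing <event> elements
with start/end timeline references. The inline sample, the tier categories
"tx" and "fe", and the function name read_tiers are illustrative
assumptions, not taken from the corpus or its documentation.

```python
import xml.etree.ElementTree as ET

# Hand-made stand-in for an EXB file. The tier categories ("tx", "fe") and
# all contents are placeholders (assumptions), not data from the corpus.
SAMPLE_EXB = """<basic-transcription>
  <head/>
  <basic-body>
    <common-timeline>
      <tli id="T0"/>
      <tli id="T1"/>
      <tli id="T2"/>
    </common-timeline>
    <tier id="tx" speaker="SPK0" category="tx" type="t">
      <event start="T0" end="T1">word1 </event>
      <event start="T1" end="T2">word2</event>
    </tier>
    <tier id="fe" speaker="SPK0" category="fe" type="a">
      <event start="T0" end="T2">free translation</event>
    </tier>
  </basic-body>
</basic-transcription>
"""


def read_tiers(exb_xml):
    """Return {tier category: [(start, end, text), ...]} for every tier."""
    root = ET.fromstring(exb_xml)
    tiers = {}
    for tier in root.iter("tier"):
        # Each event spans two timeline items and carries its text content.
        events = [
            (ev.get("start"), ev.get("end"), (ev.text or "").strip())
            for ev in tier.iter("event")
        ]
        tiers[tier.get("category")] = events
    return tiers


tiers = read_tiers(SAMPLE_EXB)
for category, events in tiers.items():
    print(category, events)
```

For a file on disk, ET.parse(path).getroot() can replace the
ET.fromstring call. The tier names actually used in the corpus are
described in the user documentation.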
Please feel free to send your comments and suggestions to:
inel at uni-hamburg.de.
Best regards,
Alexandre Arkhipov