Corpora: Corpus of Serbian Language (CSL)

Sat Jan 19 13:22:59 UTC 2002

         Quantitative Description Of Serbian Language Structure

                     Corpus Of Serbian Language
                                 By
                            Djordje Kostic

                  ADDRESS: www.serbian-corpus.edu.yu

Please circulate to those interested


     The Corpus of Serbian Language (CSL) was compiled from a sample of 11
million words and spans the Serbian language from the 12th century to the
present day.

      Each word in the CSL is manually tagged for its grammatical status
(at the level of inflectional morphology), its number of graphemes and
syllables, and its phonological structure. The text is also tagged for the
beginning and end points of sentences and paragraphs. The tagging system
distinguishes about 2000 grammatical (inflected) forms.
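
      As a rough illustration of the kind of per-word record described
above, here is a minimal Python sketch. It is not the CSL's actual storage
format: the class name, field names, tag labels and the consonant/vowel
notation for phonological structure are all assumptions made for this
example, and the three sample words were annotated by hand for it.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Token:
    """One tagged word. All field names are hypothetical, not the CSL's own."""
    form: str                      # the word as it appears in the text
    tag: str                       # grammatical (inflected) form; label is illustrative
    n_graphemes: int               # number of graphemes, entered by hand
    n_syllables: int               # number of syllables, entered by hand
    cv_pattern: str                # assumed notation for phonological structure
    sentence_start: bool = False   # marks the first word of a sentence
    sentence_end: bool = False     # marks the last word of a sentence
    paragraph_start: bool = False  # marks the first word of a paragraph
    paragraph_end: bool = False    # marks the last word of a paragraph


def split_sentences(tokens: Iterable[Token]) -> List[List[Token]]:
    """Group a token stream into sentences using the end-of-sentence marks."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for tok in tokens:
        current.append(tok)
        if tok.sentence_end:
            sentences.append(current)
            current = []
    if current:  # keep any trailing words that lack an end mark
        sentences.append(current)
    return sentences


# "Kuća je velika." (The house is big.), annotated by hand for this sketch.
sample = [
    Token("Kuća", "noun.fem.nom.sg", 4, 2, "CVCV",
          sentence_start=True, paragraph_start=True),
    Token("je", "verb.pres.3.sg", 2, 1, "CV"),
    Token("velika", "adj.fem.nom.sg", 6, 3, "CVCVCV",
          sentence_end=True, paragraph_end=True),
]

sentences = split_sentences(sample)
total_syllables = sum(tok.n_syllables for tok in sentences[0])
```

      Storing the grapheme and syllable counts as plain fields, rather
than computing them from the spelling, mirrors the manual tagging
described above.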

      The CSL project was initiated and conducted in the late 1950s by
Prof. Đorđe Kostić at the Institute for Experimental Phonetics and Speech
Pathology in Belgrade. In 1996, through the joint efforts of the Institute
for Experimental Phonetics and Speech Pathology and the Laboratory for
Experimental Psychology, University of Belgrade, the project was
reactivated and the material was converted to electronic format.

      The CSL has been, and still is, financed entirely by the Institute for
Experimental Phonetics and Speech Pathology, Belgrade. The head of the
project is Prof. Aleksandar Kostić, director of the Laboratory for
Experimental Psychology, University of Belgrade. The chief editor of all
CSL publications is Dr. Mirjana Sovilj, who is the director of the
Institute for Experimental Phonetics and Speech Pathology.