34.2135, Software: CLASSLA web corpora of Croatian, Serbian and Slovenian
linguist at listserv.linguistlist.org
Thu Jul 6 05:05:02 UTC 2023
LINGUIST List: Vol-34-2135. Thu Jul 06 2023. ISSN: 1069 - 4875.
Subject: 34.2135, Software: CLASSLA web corpora of Croatian, Serbian and Slovenian
Moderators: Malgorzata E. Cavar, Francis Tyers (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Everett Green, Daniel Swanson, Maria Lucero Guillen Puon, Zackary Leech, Lynzie Coburn, Natasha Singh, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Please support the LL editors and operation with a donation at:
Editor for this issue: Everett Green <everett at linguistlist.org>
Date: 23-Jun-2023
From: Taja Kuzman [taja.kuzman at ijs.si]
Subject: CLASSLA web corpora of Croatian, Serbian and Slovenian
The CLASSLA Knowledge centre for South Slavic languages
(https://www.clarin.si/info/k-centre/) is delighted to announce the
release of the pilot versions (v0.1) of the CLASSLA web corpora for
Croatian (2.3 billion words), Serbian (2.4 billion words) and
Slovenian (1.9 billion words). They are available for querying via the
CLARIN.SI concordancers (https://www.clarin.si/ske/#open). The main
features of the newly released corpora, aside from their large size
and recency (crawled in 2022), are their automatic enrichment with genre
information
(https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier)
and their linguistic processing with the
improved CLASSLA-Stanza annotation pipeline
(https://pypi.org/project/classla/). The pilot versions of these
corpora are intended to gather valuable user feedback, while the
official release (v1.0) of the three existing corpora, along with web
corpora for Bosnian, Montenegrin, Macedonian, and Bulgarian, is
scheduled for later this year.
We warmly invite you to explore our corpora and to reach
out to us at helpdesk.classla at clarin.si with any ideas for
improvements. You are also invited to read our blog post on the use of
CLASSLA web corpora via the open CLARIN.SI concordancers: https://www.
If you are interested in South Slavic resources and technologies, we
also invite you to join the CLASSLA mailing list
(https://mailman.ijs.si/mailman/listinfo/classla) and to follow the
CLARIN.SI infrastructure on Twitter
Linguistic Field(s): Applied Linguistics
Computational Linguistics
Discourse Analysis
Language Acquisition
Text/Corpus Linguistics
Subject Language(s): Croatian (hrv)
Serbian (srp)
Slovenian (slv)
Language Family(ies): South Slavic
Please consider donating to the Linguist List https://give.myiu.org/iu-bloomington/I320011968.html
LINGUIST List is supported by the following publishers:
American Dialect Society/Duke University Press http://dukeupress.edu
Bloomsbury Publishing (formerly The Continuum International Publishing Group) http://www.bloomsbury.com/uk/
Brill http://www.brill.com
Cambridge Scholars Publishing http://www.cambridgescholars.com/
Cambridge University Press http://www.cambridge.org/linguistics
Cascadilla Press http://www.cascadilla.com/
De Gruyter Mouton https://cloud.newsletter.degruyter.com/mouton
Dictionary Society of North America http://dictionarysociety.com/
Edinburgh University Press www.edinburghuniversitypress.com
Equinox Publishing Ltd http://www.equinoxpub.com/
European Language Resources Association (ELRA) http://www.elra.info
Georgetown University Press http://www.press.georgetown.edu
John Benjamins http://www.benjamins.com/
Lincom GmbH https://lincom-shop.eu/
Linguistic Association of Finland http://www.ling.helsinki.fi/sky/
MIT Press http://mitpress.mit.edu/
Multilingual Matters http://www.multilingual-matters.com/
Narr Francke Attempto Verlag GmbH + Co. KG http://www.narr.de/
Netherlands Graduate School of Linguistics / Landelijke (LOT) http://www.lotpublications.nl/
Oxford University Press http://www.oup.com/us
SIL International Publications http://www.sil.org/resources/publications
Springer Nature http://www.springer.com
Wiley http://www.wiley.com