Corpora: Free Swedish lexical and corpora resources for research purposes

Yvonne Cederholm Yvonne.Cederholm at svenska.gu.se
Fri Apr 5 11:37:58 UTC 2002


Språkdata and Språkbanken (The Bank of Swedish), Department of Swedish,
University of Göteborg, have decided to release dictionary and corpus
resources for research and education purposes within Swedish
universities. The resources are available under certain conditions,
which are specified in the licence files attached to the resource files.
Foreign universities can apply to the acting director of The Bank of
Swedish, Yvonne Cederholm (lbadm at svenska.gu.se).

Dictionary: Svenska ord (LEXIN)
-------------------------------
A Swedish dictionary containing approximately 20 000 lexical units
(lexical categories: pronunciation, part-of-speech, inflexion,
definition, valency, and linguistic examples).

The dictionary is available in two formats:

- web version (access only for Swedish universities)
Address: http://spraakbanken.gu.se/lb/lexin/

- XML version for language technology purposes
Address: ftp://ftp.spraakbanken.gu.se/pub/reskit/LEXIN.zip

The SynTag Tree Bank
--------------------
A Swedish tree bank, containing 158 newspaper articles (about 100 000
running words) from the Press-65 corpus.
The corpus can only be used for research purposes and for higher
education. Instructions are required as the format doesn't follow modern
markup standards. Contact Jerker Järborg (Jerker.Jaerborg at svenska.gu.se)
for more information.
Address: ftp://ftp.spraakbanken.gu.se/pub/reskit/syntag.zip

The Swedish PAROLE corpus
-------------------------
A morphosyntactically annotated corpus comprising about 19 million running
words. The corpus can only be used for research purposes and for higher
education.
Address: ftp://ftp.spraakbanken.gu.se/pub/reskit/parole.zip

There is also a web version of the Swedish PAROLE corpus (unrestricted
access):
http://spraakbanken.gu.se/lb/parole/

(The Language Bank plans to release a new lemmatized and
morphosyntactically annotated corpus of about 100 million running words
at the end of this year. The annotation is based on the information in
the SAOL (The Swedish Academy Glossary).)



The board of the Language Bank of Swedish:
Yvonne Cederholm (acting director), Jerker Järborg, Torgny Rasmark, and
Karin Warmenius





--
__________________________________
Yvonne Cederholm
Acting Director of Språkbanken,

Department of Swedish
University of Göteborg
Box 200
SE 405 30 GÖTEBORG
tel.: +46 (0)31 - 773 52 25
fax: +46 (0)31 - 773 44 55


