Corpora: RE: Swedish taggers & parsers

Atro Voutilainen atro.voutilainen at
Fri Oct 6 10:59:12 UTC 2000

A few days ago I posted a query about references to tagging & parsing of Swedish.
Attached are the results. I wish to thank the following people for their help:

Daniel Ridings
Dimitrios Kokkinakis
Jakub Zavrel
Joakim Nivre
Jussi Karlgren
Nikolaj Lindberg

I would like to ask a second question: are there any available
collections of Swedish sentences that exemplify different grammatical
phenomena in Swedish? In return for pointers, I'll post a summary.

Atro Voutilainen

Atro Voutilainen                              mobile: +358 50 5437452
Conexor oy                                       fax: +358 9 37468502
Helsinki Science Park                     atro.voutilainen at
Koetilantie 3, 00710 Helsinki, Finland
-------------- next part --------------


Björn Gambäck. 1997. Processing Swedish Sentences: A Unification-Based
Grammar and some Applications. Doctor of Engineering Thesis, The Royal
Institute of Technology and Stockholm University, Dept. of Computer
and Systems Sciences, Stockholm, Sweden, June. Also available as SICS
Dissertation Series 21, Swedish Institute of Computer Science, Kista,

Kokkinakis D. and Johansson Kokkinakis S. (1999), A Cascaded
Finite-State Parser for Syntactic Analysis of Swedish, In Proceedings
of the 9th EACL (European Chapter of the Association for Computational
Linguistics), Bergen, Norway

                  morphology, tagging

Most papers and pointers are related to tagging (morphology, POS,
morphosyntactic functions).

Karlgren & Cutting paper on implementing an HMM tagger of Swedish,
Proc. Nodalida '93, Stockholm.

Martin Eineborg & Nikolaj Lindberg, 1999.  Improving Part of Speech
Disambiguation Rules by Adding Linguistic Knowledge. In Proceedings of
the Ninth International Workshop on Inductive Logic Programming
(ILP'99), Bled, Slovenia.

Brants & Samuelsson 1995. Tagging the Teleman Corpus.
Procs. Nodalida'95. Helsinki.

"SWETWOL: A Comprehensive Morphological Analyzer for Swedish". Nordic
Journal of Linguistics 15, 1992, 1-45.

Juhani Birn, Lingsoft, Inc., 1998. Swedish Constraint Grammar: A Short

AUTHOR = "Eineborg, Martin and Lindberg, Nikolaj",
TITLE = "Induction of {C}onstraint {G}rammar-Rules Using {P}rogol",
BOOKTITLE = "Proceedings of The Eighth International Conference on
Inductive Logic Programming ({ILP}'98)",
YEAR = "1998",
ADDRESS = "Madison, Wisconsin",
PAGES = "116--124",

AUTHOR = "Lindberg, Nikolaj and Eineborg, Martin",
TITLE = "Learning {C}onstraint {G}rammar-style disambiguation rules
using {I}nductive {L}ogic {P}rogramming",
BOOKTITLE = "Proceedings of COLING/ACL'98",
YEAR = "1998",
PAGES = "775--779",
ADDRESS = "Montreal, Canada",

AUTHOR = "Lindberg, Nikolaj and Eineborg, Martin",
TITLE = "Improving Part of Speech Disambiguation Rules by Adding
Linguistic Knowledge",
BOOKTITLE = "Proceedings of the Ninth International Workshop on
Inductive Logic Programming ({ILP}'99)",
PAGES = "186--197",
YEAR = 1999,
EDITOR = "D\v{z}eroski, Sa\v{s}o and Flach, Peter",
ADDRESS = "Bled, Slovenia"

Eineborg, M. and Lindberg, N. (2000). ILP in Part-of-Speech Tagging - An
Overview. In James Cussens and Saso Dzeroski, editors, Learning Language
in Logic, volume 1925 of LNAI. Springer, 2000.

AUTHOR = "Lager, Torbj{\"o}rn",
TITLE = "The $\mu$-{TBL} System: Logic Programming Tools for
Transformation-Based Learning",
BOOKTITLE = "Proceedings of CoNLL'99",
YEAR = "1999",
ADDRESS = "Bergen, Norway"

AUTHOR = "Carlberger, Johan and Kann, Viggo",
TITLE = "Implementing an Efficient Part-of-Speech Tagger",
YEAR = "1999",
NOTE = "To appear. Available at {\tt}"

AUTHOR="{R}idings, Daniel",
TITLE="{SUC} and the {B}rill tagger",
HOWPUBLISHED="{GU-ISS-98-1} (Research
Reports from the Department of Swedish, G{\"o}teborg University)"

torbjörn lager has done some work on learning CG-rules using an
error-driven transformation based learning approach. see

Nivre, J., Grönqvist, L., Gustafsson, M., Lager, T. & Sofkova,
S. (1996) Tagging Spoken Language Using Written Language
Statistics. In Proceedings of the 16th International Conference on
Computational Linguistics (COLING-96). Copenhagen: Center for Language

Nivre, J. (2000) Sparse Data and Smoothing in Statistical Part-of-Speech
Tagging. Journal of Quantitative Linguistics, 7(1), 1-17.

Nivre, J. & Grönqvist, L. (in press) Tagging a Corpus of Spoken Swedish.
To appear in International Journal of Corpus Linguistics.

On the ILK webpage in Tilburg, we have a demo on-line of a
Memory-based Swedish tagger. It's been trained on the SUC corpus...
The URL is: Under "Demonstrations" we have a demo of our tagger.

Kokkinakis D. and Johansson Kokkinakis S. (1997), A Robust and
Modularized Lemmatizer/Tagger for Swedish Based on Large Lexical
Resources, Research Reports from the Department of Swedish,
GU-ISS-97-1, Språkdata. Swedish tagger and light parser
