[Corpora-List] Annotated Corpora in Estonian
Ivelina Nikolova
iva at lml.bas.bg
Sat Feb 28 14:09:25 UTC 2009
Many thanks to everyone who sent references concerning annotated corpora in
Estonian!
Here is a short summary of the available resources:
----------------------------------------------------------------------
Multext-East - morphological annotation and names
The TRACTOR archive -
http://tractor.bham.ac.uk/tractor/catalogue.html#Estonian
http://www.cl.ut.ee/korpused/ contains some links to annotated corpora.
Part of a corpus (approximately 400,000 words) annotated with labels of
shallow syntactic functions (subjects, objects, adverbials, etc.). Attributes
have special tags that indicate the direction of their heads, but the
attribute and the head have not been linked. It is traditional Constraint
Grammar style mark-up.
http://www.cs.ut.ee/~kaili/Korpus/pindmine/
A small treebank (370 simple sentences) in Tiger XML.
See the last two links at http://www.cs.ut.ee/~kaili/Korpus/puud/
Eckhard Bick's semiautomatically generated experimental treebank:
http://corp.hum.sdu.dk/arborest.html
but there are a lot of errors in it.
The papers describing the parsing process can be found at:
http://www.cs.ut.ee/~kaili/papers/index.html
Heli Uibo. Syntactically annotated corpora of Estonian. In: The First Baltic
Conference "Human Language Technology – the Baltic Perspective", Riga,
Latvia, April 21-22, 2004, pp. 45-48.
Heli Uibo and Eckhard Bick. Treebank-based research and e-learning of
Estonian syntax. In: Proceedings of the Second Baltic Conference on Human
Language Technologies, Tallinn, April 4-5, 2005. Editors: M. Langemets, P.
Penjam, pp. 195-200.
Resources from the University of Tartu, listed on the Nordic Treebank
Network page:
http://w3.msi.vxu.se/~nivre/research/nt.html
Best regards,
Ivelina Nikolova
----- Original Message -----
From: <artanisz at mail.bg>
To: "Ivelina Nikolova" <iva at lml.bas.bg>
Sent: Friday, February 27, 2009 7:13 PM
Subject: Re: [Corpora-List] Annotated Corpora in Estonian
Hello Ivelina,
This may be useful to you: Nordic Treebank Network,
http://w3.msi.vxu.se/~nivre/research/nt.html , especially the resources of
the University of Tartu.
Regards,
Atanas Chanev, PhD
> Dear All,
>
> I am looking for an annotated corpus in Estonian in order to train a
> parser on it.
> Could you also give me any pointers to work on NE recognition for
> Estonian?
>
> Thanks in advance,
> Ivelina Nikolova