[Corpora-List] Annotated Corpora in Estonian

Ivelina Nikolova iva at lml.bas.bg
Sat Feb 28 14:09:25 UTC 2009


Many thanks to everyone who sent references concerning annotated corpora in 
Estonian!

Here is a short summary of the available resources:
----------------------------------------------------------------------

Multext-East - morphological annotation and names

The TRACTOR archive - 
http://tractor.bham.ac.uk/tractor/catalogue.html#Estonian

http://www.cl.ut.ee/korpused/ contains some links to annotated corpora.

Part of a corpus (approximately 400,000 words) annotated with labels of 
shallow syntactic functions (subjects, objects, adverbials, etc.). Attributes 
have special tags which indicate the direction of their heads, but the 
attribute and the head have not been linked. It is traditional Constraint 
Grammar style mark-up.
http://www.cs.ut.ee/~kaili/Korpus/pindmine/

A small treebank (370 simple sentences) in Tiger XML.
See the last two links at http://www.cs.ut.ee/~kaili/Korpus/puud/
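
For anyone who wants to load such a Tiger XML treebank for parser training, 
here is a minimal Python sketch. It assumes the usual TIGER-XML layout 
(<s>, <graph>, <terminals>/<t>, <nonterminals>/<nt> with <edge> children). 
The sample sentence, the attribute names "word" and "pos", and the edge 
labels are invented for illustration and are not taken from the Estonian 
treebank, so check them against the actual files.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the TIGER-XML layout. It is NOT from the Estonian
# treebank; the attribute names and labels are assumptions.
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Poiss" pos="S"/>
          <t id="s1_2" word="loeb" pos="V"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="S">
            <edge label="SUBJ" idref="s1_1"/>
            <edge label="PRED" idref="s1_2"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(xml_text):
    """Return one dict per <s> element with its tokens and labelled edges."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        # Map terminal ids to word forms.
        tokens = {t.get("id"): t.get("word") for t in s.iter("t")}
        edges = []
        for nt in s.iter("nt"):
            for edge in nt.findall("edge"):
                # (category of the parent node, edge label, id of the child)
                edges.append((nt.get("cat"), edge.get("label"), edge.get("idref")))
        sentences.append({"id": s.get("id"), "tokens": tokens, "edges": edges})
    return sentences


sentences = read_sentences(SAMPLE)
for sent in sentences:
    for cat, label, target in sent["edges"]:
        # Resolve terminal ids to word forms; nonterminal ids are printed as-is.
        print(cat, label, sent["tokens"].get(target, target))
```

For a real file, ET.parse(path).getroot() can replace ET.fromstring; the 
rest of the function stays the same.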

Eckhard Bick's semiautomatically generated experimental treebank:
http://corp.hum.sdu.dk/arborest.html
but there are a lot of errors in it.

The papers describing the parsing process can be found at:
http://www.cs.ut.ee/~kaili/papers/index.html

Heli Uibo. Syntactically annotated corpora of Estonian. In: The First Baltic 
Conference "Human Language Technology – the Baltic Perspective", Riga, 
Latvia, April 21-22, 2004, pp. 45-48.

Heli Uibo and Eckhard Bick. Treebank-based research and e-learning of 
Estonian syntax. In: Proceedings of the Second Baltic Conference on Human 
Language Technologies. Tallinn, April 4-5, 2005. Editors: M. Langemets, P. 
Penjam. Pp. 195-200.

Resources from the University of Tartu, listed by the Nordic Treebank 
Network:
http://w3.msi.vxu.se/~nivre/research/nt.html


Best regards,
Ivelina Nikolova






----- Original Message ----- 
From: <artanisz at mail.bg>
To: "Ivelina Nikolova" <iva at lml.bas.bg>
Sent: Friday, February 27, 2009 7:13 PM
Subject: Re: [Corpora-List] Annotated Corpora in Estonian


Hello Ivelina,

Maybe this will be useful to you: Nordic Treebank Network,
http://w3.msi.vxu.se/~nivre/research/nt.html , especially the resources of
the University of Tartu.

Regards,
Atanas Chanev, PhD



> Dear All,
>
> I am looking for an annotated corpus in Estonian in order to train a
> parser on it.
> May you give me any pointers to work on NE recognition for Estonian as 
> well?
>
> Thanks in advance,
> Ivelina Nikolova





_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


