[Corpora-List] Romanian corpus
Eckhard Bick
eckhard.bick at mail.dk
Mon Jul 2 13:02:17 UTC 2007
Hello,
I would like to announce the completion of a grammatically annotated
Romanian corpus at http://corp.hum.sdu.dk
The corpus covers the business language domain and has a size of 21.4
million words (27 million tokens). It was compiled by Arina Greavu
(arinagreavu at yahoo.com) from news text sources, and annotated with (a)
PoS and morphology using Dan Tufis' tagger
(http://www.infoiasi.ro/bin/view/Structure/tufis), as well as (b)
syntactic function and shallow dependency markers using a Constraint
Grammar system at VISL
(http://beta.visl.sdu.dk/constraint_grammar.html). Both text and
annotation can be searched without a password through a menu-based
interface. Search results are shown in concordance style, however, not as
running text or entire articles.
Best regards,
Eckhard Bick
--
Eckhard Bick,
cand.med., dr.phil.
University of Southern Denmark
e-mail: eckhard.bick at mail.dk
web: http://beta.visl.sdu.dk