[Corpora-List] Romanian corpus

Eckhard Bick eckhard.bick at mail.dk
Mon Jul 2 13:02:17 UTC 2007


Hello,

I would like to announce the completion of a grammatically annotated 
Romanian corpus at http://corp.hum.sdu.dk

The corpus covers the business language domain and has a size of 21.4 
million words (27 million tokens). It was compiled by Arina Greavu 
(arinagreavu at yahoo.com) from news text sources, and annotated with (a) 
PoS and morphology using Dan Tufis' tagger 
(http://www.infoiasi.ro/bin/view/Structure/tufis), as well as (b) 
syntactic function and shallow dependency markers using a Constraint 
Grammar system at VISL 
(http://beta.visl.sdu.dk/constraint_grammar.html). Both text and 
annotation can be searched password-free through a menu-based interface. 
Note, however, that search results are shown in concordance format, not 
as running text or entire articles.

Best regards,
Eckhard Bick


-- 
Eckhard Bick,
cand.med., dr.phil.
University of Southern Denmark
e-mail: eckhard.bick at mail.dk
web: http://beta.visl.sdu.dk
