[Corpora-List] Romanian language corpora
Vlad V. Gojol
gojol at rnc.ro
Thu Dec 6 07:50:00 UTC 2007
Dear Mr. Frumuselu,
We have a corpus of newspaper texts: 56 million words
with diacritics (Unicode .txt format), parsed with
GojolParser (two formats: dependency maps and trees)
with an accuracy comparable to that of manual annotation: approx. 0.1%
tagging error rate. We can also add its translation into
English, sentence by sentence, produced with GojolTranslator
(GoTra), which is intelligible enough (in any case better than
the output of the Systran translator for English-French, for instance).
The corpus is not downloadable, but it can be delivered on DVD/CD.
Regards,
Vlad Gojol
---------------------------
LANGOS S.R.L.
Phone: +40 21 778 5315
Fax: +40 21 413 7420
E-mail: gojol at rnc.ro
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora