[Corpora-List] tagger for Ukrainian

chris brew cbrew at acm.org
Tue Feb 8 13:32:39 UTC 2011


The work that Anna Feldman, Jiri Hana and I did is designed to produce a
reasonably good tagger with very little effort. I would be really pleased if
this approach turned out to be of value for Ukrainian. Our system's
probability model has two components: the first (the transition component)
says how one part of speech follows another, and the second (the emission
component) says how individual words are associated with parts of speech.
Anna's dissertation includes follow-on experiments motivated by the hope that
the emission component can be significantly improved using traditional
linguistic notions such as cognatehood. Our system could definitely benefit
from insights from traditional philology in this area, and Ukrainian might be
just the language where success can be demonstrated, given its long history
of contact with Russian.
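The two-component split described above is the classic hidden Markov model decomposition. Below is a minimal sketch in Python, assuming a bigram HMM decoded with the Viterbi algorithm. The tag set, the tiny Ukrainian lexicon, all probabilities, and the cognate fallback (picking the most similar word from a hypothetical tagged Russian list with the standard-library difflib) are invented for illustration. This is not the Feldman/Hana/Brew system itself.

```python
import math
from difflib import SequenceMatcher

# Everything below (tags, words, probabilities) is a toy, invented for illustration.
TAGS = ["NOUN", "VERB", "ADJ"]

# Transition component: P(tag | previous tag); "<s>" marks the start of a sentence.
TRANSITION = {
    "<s>": {"NOUN": 0.5, "VERB": 0.2, "ADJ": 0.3},
    "NOUN": {"NOUN": 0.2, "VERB": 0.6, "ADJ": 0.2},
    "VERB": {"NOUN": 0.5, "VERB": 0.1, "ADJ": 0.4},
    "ADJ": {"NOUN": 0.8, "VERB": 0.1, "ADJ": 0.1},
}

# Emission component: P(word | tag) for a tiny hypothetical Ukrainian lexicon.
EMISSION = {
    "NOUN": {"кіт": 0.6, "молоко": 0.4},
    "VERB": {"п'є": 1.0},
    "ADJ": {"білий": 1.0},
}
KNOWN = {word for probs in EMISSION.values() for word in probs}

# Hypothetical cognate resource: tagged Russian words consulted for unknown words.
COGNATE_TAGS = {"кот": "NOUN", "молоко": "NOUN", "пьёт": "VERB", "белый": "ADJ"}

SMOOTH = 1e-6        # floor for unseen (word, tag) pairs, avoids log(0)
COGNATE_PROB = 0.1   # arbitrary mass given to the tag borrowed from a cognate


def emission(word, tag):
    """P(word | tag); unknown words borrow the tag of the most similar cognate."""
    if word in KNOWN:
        return EMISSION[tag].get(word, SMOOTH)
    closest = max(COGNATE_TAGS, key=lambda w: SequenceMatcher(None, word, w).ratio())
    return COGNATE_PROB if COGNATE_TAGS[closest] == tag else SMOOTH


def viterbi(words):
    """Return the most probable tag sequence under the bigram HMM (log space)."""
    # best[tag] = (log-prob of the best path ending in tag, that path)
    best = {
        t: (math.log(TRANSITION["<s>"][t]) + math.log(emission(words[0], t)), [t])
        for t in TAGS
    }
    for word in words[1:]:
        step = {}
        for t in TAGS:
            # Best predecessor tag for t, combining path score and transition.
            prev = max(TAGS, key=lambda p: best[p][0] + math.log(TRANSITION[p][t]))
            score = (best[prev][0] + math.log(TRANSITION[prev][t])
                     + math.log(emission(word, t)))
            step[t] = (score, best[prev][1] + [t])
        best = step
    return max(best.values(), key=lambda entry: entry[0])[1]


print(viterbi(["білий", "кіт", "п'є", "молоко"]))  # ['ADJ', 'NOUN', 'VERB', 'NOUN']
print(emission("кота", "NOUN"))  # 0.1, via the closest cognate "кот"
```

The point of the split is that the emission side (here a crude string-similarity cognate guess) can be swapped for something linguistically better informed without touching the transition table or the decoder.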

Chris

2011/2/8 Natalia Kotsyba <gnatko at gmail.com>

> Thanks to all for the comments and advice, it is really motivating.
>
> >> By the way, if there are any volunteers on the list who
> >> would be willing to join the disambiguation part of the project, they
> >> would be most welcome.
> >
> > Is it intended to release the result under an open-source/free licence?
>
> Yes, the ultimate goal is a free web service with a somewhat abridged
> (for copyright reasons) but still workable dictionary.
> Meanwhile, given that there is interest in the resource, we are
> preparing a command-line version to be placed on SourceForge, which I
> hope to announce on the list by the end of this week.
>
> > If so, I know several people who may be interested and will pass the
> > details along to them. If you are interested in arguments for why this
> > would be a good idea, check out Ted Pedersen's paper here [1].
> >
> > What disambiguation framework are you using for the rules? Something
> > like Constraint Grammar?
>
> I am focusing on LanguageTool now, http://www.languagetool.org/,
> hoping eventually to involve people with a traditional education in
> Ukrainian philology, for whom it should be friendly enough to work
> further on disambiguation rules and other available features. If you
> have other suggestions, I would be glad to hear them.
>
> Regards,
> Natalia.
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
Chris Brew, Ohio State University