[Corpora-List] POS Tagger for German / Java
Niels Ott
niels at drni.de
Sun Jan 14 16:56:30 UTC 2007
Dear Michael,
Michael Sonntag wrote:
> 3. I also used qtag. But it only comes with a database (lexicon and
> matrix) that is too small for my task.
I used Qtag for some testing and I found that the quality of its output
depends on the training data. (I assume this is true for most taggers.)
In case you have a large tagged corpus, try training Qtag with it. If
you plan to use corpora/treebanks in TigerXML, I can provide an XSLT
style sheet to convert them into vertical training data for Qtag.
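For readers who would rather do the conversion in Java than in XSLT, the following is a minimal sketch of the same idea. It is not the style sheet mentioned above. It assumes that each TigerXML terminal is a `t` element carrying `word` and `pos` attributes, and that the training data should have one token per line with a tab between word and tag. The exact column layout Qtag expects should be checked against its documentation. The class and method names are made up for illustration.

```java
import java.io.Reader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns TigerXML terminals into "word<TAB>tag" lines.
final class TigerToVertical {

    static String toVertical(Reader tigerXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(tigerXml));

        StringBuilder out = new StringBuilder();
        // getElementsByTagName returns the elements in document order,
        // so the tokens keep their original sequence.
        NodeList terminals = doc.getElementsByTagName("t");
        for (int i = 0; i < terminals.getLength(); i++) {
            Element t = (Element) terminals.item(i);
            out.append(t.getAttribute("word"))
               .append('\t')                 // assumed column separator
               .append(t.getAttribute("pos"))
               .append('\n');
        }
        return out.toString();
    }
}
```

For large treebanks a streaming parser (SAX or StAX) would be kinder to memory than building a DOM; the DOM version is used here only because it is shorter.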
> So, is there any POS tagger out there that is easy to use and up
> for the task?
TreeTagger (TT) seems to be a well-regarded tagger. However, I found that
it has problems processing Unicode. Since you need it to work with your
Java program, your wrapper should make sure that it feeds TT ISO-8859-1
input only.
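A wrapper along these lines could enforce the encoding before anything reaches the tagger. This is only a sketch: it assumes the tagger reads one token per line from a stream (for example the standard input of an external process), and the class and method names are hypothetical. It rejects tokens that cannot be represented in ISO-8859-1 instead of silently replacing them, which is one possible policy among several.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: writes tokens to a tagger's input stream as ISO-8859-1.
final class Latin1Feeder {

    /** True if every character of the text can be encoded as ISO-8859-1. */
    static boolean isLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    /** Writes one token per line, encoded as ISO-8859-1. */
    static void writeTokens(List<String> tokens, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1);
        for (String token : tokens) {
            if (!isLatin1(token)) {
                // Fail loudly rather than send a '?' substitute to the tagger.
                throw new IllegalArgumentException(
                        "Not representable in ISO-8859-1: " + token);
            }
            writer.write(token);
            writer.write('\n');
        }
        writer.flush();
    }
}
```

In a real wrapper, `out` would typically be the stream returned by `Process.getOutputStream()` for the tagger process, and the tagger's output would have to be read back with the same charset.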
Regards,
Niels Ott
P.S.: You will get this message twice, as I forgot to include the
Corpora list in the recipient list.
--
Niels Ott - Computational Linguist (B.A.) - http://www.drni.de/niels/
"Paper or plastic?" "Not (not paper and not plastic)." (Augustus
DeMorgan in a grocery store ;-)