[Corpora-List] POS Tagger for German / Java

Niels Ott niels at drni.de
Sun Jan 14 16:56:30 UTC 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dear Michael,

Michael Sonntag wrote:
> 3. I also used Qtag, but it comes with only a small database (lexicon
> and matrix), too small for my task.

I used Qtag for some testing and I found that the quality of its output
depends on the training data. (I assume this is true for most taggers.)

If you have a large tagged corpus, try training Qtag on it. If you plan
to use corpora/treebanks in TigerXML, I can provide an XSLT style sheet
to convert them into vertical (one token per line) training data for Qtag.
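For illustration, here is a minimal Java sketch of such a conversion. It is an alternative to the XSLT style sheet, which is not reproduced here. It assumes the usual TigerXML layout, where each `<s>` element contains `<t>` terminals carrying `word` and `pos` attributes. It writes one token per line, with a tab between word and tag and a blank line after each sentence. Whether Qtag expects exactly this layout is an assumption, so check it against the training format of your Qtag version. The class name `TigerToVertical` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: TigerXML -> vertical "word<TAB>tag" training data.
final class TigerToVertical {

    static String convert(String tigerXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(tigerXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed TigerXML document", e);
        }

        StringBuilder out = new StringBuilder();
        // Sentences are <s> elements, visited in document order.
        NodeList sentences = doc.getElementsByTagName("s");
        for (int i = 0; i < sentences.getLength(); i++) {
            Element sentence = (Element) sentences.item(i);
            // Terminals <t> hold the token (word) and its tag (pos);
            // nonterminals are <nt> elements and are not matched here.
            NodeList terminals = sentence.getElementsByTagName("t");
            for (int j = 0; j < terminals.getLength(); j++) {
                Element t = (Element) terminals.item(j);
                out.append(t.getAttribute("word"))
                   .append('\t')
                   .append(t.getAttribute("pos"))
                   .append('\n');
            }
            // Assumed sentence delimiter: an empty line.
            out.append('\n');
        }
        return out.toString();
    }
}
```

DOM loads the whole corpus into memory. For a large treebank, a streaming (StAX) parser reading from a file would be the more economical choice.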

> So, is there any POS tagger out there that is easy to use and up
> for the task?

TreeTagger (TT) appears to be a well-regarded tagger. However, I found
that it has problems processing Unicode input. Since you need it to work
with your Java program, your wrapper should make sure that it passes
only ISO-8859-1 (Latin-1) encoded text to TT.
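A sketch of such a wrapper in Java could look like the following. The class name `Latin1Feeder` and the installation paths, including the German parameter file `german.par`, are hypothetical. It also assumes that the `tree-tagger` binary reads pre-tokenized input (one token per line) from standard input when no input file is given. `String.getBytes(StandardCharsets.ISO_8859_1)` replaces any character that Latin-1 cannot represent with `?`, so no multi-byte UTF-8 sequence ever reaches the tagger.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that only ever hands ISO-8859-1 bytes to TreeTagger.
final class Latin1Feeder {

    // One token per line, encoded as ISO-8859-1. Characters outside
    // Latin-1 (e.g. the euro sign or typographic quotes) become '?'.
    static byte[] encodeTokens(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            sb.append(token).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    }

    // Runs the tagger on the tokens and returns its output lines.
    // The paths below are assumptions; adjust them to your installation.
    static List<String> tag(List<String> tokens) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "/opt/treetagger/bin/tree-tagger", "-token",
                "/opt/treetagger/lib/german.par");
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();

        // Write input on a separate thread so a full output pipe
        // cannot deadlock the wrapper on large inputs.
        Thread writer = new Thread(() -> {
            try (OutputStream in = p.getOutputStream()) {
                in.write(encodeTokens(tokens));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        // The tagger's output is decoded as Latin-1 as well.
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                p.getInputStream(), StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        writer.join();
        p.waitFor();
        return lines;
    }
}
```

Replacing unmappable characters with `?` loses information. Mapping typographic quotes and dashes to their ASCII counterparts before encoding would preserve more of the input.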

Regards,

  Niels Ott

P.S.: You will get this message twice, as I forgot to add the Corpora
list to the recipient list.

- --
Niels Ott - Computational Linguist (B.A.) - http://www.drni.de/niels/
"Paper or plastic?" "Not (not paper and not plastic)."  (Augustus
DeMorgan in a grocery store ;-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iD8DBQFFqmC+bosnVosUgx0RApBTAKCEPNQoHhTvhiu/GW36DBYfV9sioACfV9wD
blU9XV55J1f4IbYUtT7pY4Y=
=BSbA
-----END PGP SIGNATURE-----