[Corpora-List] POS Tagger for German / Java

Niels Ott niels at drni.de
Sun Jan 14 16:56:30 UTC 2007

Dear Michael,

Michael Sonntag wrote:
> 3. I also used qtag. But it only comes with a database (lexicon and
> matrix) that is too small for my task.

I used Qtag for some testing and I found that the quality of its output
depends on the training data. (I assume this is true for most taggers.)

If you have a large tagged corpus, try training Qtag on it. If you plan
to use corpora/treebanks in TigerXML, I can provide an XSLT stylesheet
that converts them into vertical training data for Qtag.
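
To give an idea of what such a conversion does, here is a minimal Java
sketch (not the XSLT stylesheet itself). It relies on TigerXML storing
each token as a <t> element with "word" and "pos" attributes inside a
sentence element <s>. The output layout (one token per line, word and tag
separated by a tab, blank line between sentences) is an assumption on my
part; please check it against the Qtag documentation. The class and
method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts word/POS pairs from TigerXML terminals.
class TigerToVertical {

    // Returns one "word<TAB>pos" line per token, with a blank line after
    // each sentence. The exact layout Qtag expects is an assumption here.
    static String convert(String tigerXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(tigerXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse TigerXML", e);
        }
        StringBuilder out = new StringBuilder();
        NodeList sentences = doc.getElementsByTagName("s");
        for (int i = 0; i < sentences.getLength(); i++) {
            Element sentence = (Element) sentences.item(i);
            // Terminals are the <t> elements below this sentence.
            NodeList terminals = sentence.getElementsByTagName("t");
            for (int j = 0; j < terminals.getLength(); j++) {
                Element t = (Element) terminals.item(j);
                out.append(t.getAttribute("word"))
                   .append('\t')
                   .append(t.getAttribute("pos"))
                   .append('\n');
            }
            out.append('\n'); // sentence boundary
        }
        return out.toString();
    }
}
```

For a real treebank a streaming parser (SAX or StAX) would be kinder to
memory than DOM; the DOM version is only shorter to read.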

> So, is there any POS tagger out there that is easy to use and up
> for the task?

TreeTagger (TT) seems to be a renowned tagger. However, I found that it
has problems processing Unicode. Since you need it to work with your
Java program, your wrapper should make sure that it feeds TT nothing but
ISO-8859-1 (Latin-1) encoded input.
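
A wrapper could guard against this roughly as follows. This is only a
sketch using the standard java.nio.charset classes; the class and method
names are invented for illustration, and I am assuming that the tagger
reads one token per line. It refuses tokens that cannot be represented in
ISO-8859-1 instead of silently replacing them with '?'.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical guard that prepares tagger input as ISO-8859-1 bytes.
class Latin1Guard {

    // True if every character of s can be encoded in ISO-8859-1.
    static boolean isLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // Encodes the tokens one per line (assumed tagger input layout).
    // Throws if a token contains characters outside ISO-8859-1.
    static byte[] encodeTokens(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (!isLatin1(token)) {
                throw new IllegalArgumentException(
                        "Not representable in ISO-8859-1: " + token);
            }
            sb.append(token).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

The resulting bytes can then be written to the tagger's standard input
or to a temporary file. Whether to reject, drop, or transliterate
offending tokens is a design decision for your application; rejecting
is just the simplest choice to show here.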


  Niels Ott

P.S.: You will get this message twice, as I forgot to include the
Corpora list in the recipient list.

--
Niels Ott - Computational Linguist (B.A.) - http://www.drni.de/niels/
"Paper or plastic?" "Not (not paper and not plastic)."  (Augustus
DeMorgan in a grocery store ;-)