[Corpora-List] QTAG tag assignment problem

Tony Berber Sardinha tony4 at uol.com.br
Wed Jun 18 11:35:12 UTC 2003

Dear list members

I wonder if anyone could help me with a QTAG tagging problem (wrong tag
assignment). I'm using a Portuguese language model based on 500K words of tagged text.

For example, it tagged the Portuguese preposition 'de' as:

de_CJ

The correct output would be 'de_PRP'. There is not a single occurrence of
'de_CJ' in the training corpus.
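One way to double-check a claim like this is to count, for a given word form, every tag it carries in the training file. The following is a minimal Python sketch. It assumes the training corpus is plain text made of whitespace-separated word_TAG tokens, which is the format of the tagger output shown here. The helper `tags_for_word` and the sample sentence are hypothetical and are not part of QTAG.

```python
from collections import Counter

def tags_for_word(tagged_text: str, word: str) -> Counter:
    """Count the tags attached to `word` in a corpus of word_TAG tokens.

    Assumes whitespace-separated tokens with the tag after the LAST
    underscore (e.g. 'de_PRP'). Matching is case-sensitive.
    """
    counts = Counter()
    for token in tagged_text.split():
        form, sep, tag = token.rpartition("_")
        if sep and form == word:  # skip untagged tokens and other words
            counts[tag] += 1
    return counts

# Hypothetical sample in the assumed word_TAG format
sample = "a_ART casa_N de_PRP Maria_PROP e_CJ a_ART casa_N de_PRP João_PROP"
counts = tags_for_word(sample, "de")
print(counts)
print(counts["CJ"])  # a Counter returns 0 for tags never seen with 'de'
```

On the real corpus the file contents would be passed in, for example via `open(path, encoding="utf-8").read()`. Because the comparison is case-sensitive, sentence-initial 'De' is counted separately from 'de'.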

The possibilities given by the tagger (with the '-f ac' option) are:

de : CJ [28:0.0] IN [1:0.0] PRP [13834:0.0]

This shows PRP as by far the most likely tag, even though 13834 does not
match the training-corpus frequency (19886).
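The counts in the '-f ac' listing can be turned into relative frequencies to show how strongly the lexicon favours PRP for 'de'. This is a hedged sketch. It assumes the first number inside each bracket is the count stored for that tag, and it ignores the second number, whose meaning is not stated here. `lexical_probabilities` is a hypothetical helper, not a QTAG function.

```python
import re

def lexical_probabilities(line: str) -> dict:
    """Turn a listing like 'de : CJ [28:0.0] ...' into relative frequencies.

    Assumption: the first number in each bracket is a count; the value
    after the colon is ignored.
    """
    counts = {tag: int(n) for tag, n in re.findall(r"(\S+) \[(\d+):", line)}
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

probs = lexical_probabilities("de : CJ [28:0.0] IN [1:0.0] PRP [13834:0.0]")
# Print tags from most to least likely
for tag, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tag:4} {p:.4f}")
```

With these counts PRP receives about 0.998 of the lexical probability mass (13834 / 13863). In a probabilistic tagger that also weighs tag-sequence (contextual) probabilities, a lexical score this high can still be outvoted by the context, so the ratio alone does not predict the final tag.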

The tag most frequently assigned in error is N (noun).

The frequency of tags in the training corpus is:

 110614 N
  61415 PT
  60166 V
  45531 PRP
  44735 ART
  35132 CPR
  27563 PROP
  20933 ADJ
  19281 CJ
  17530 PRN
  16506 ADV
   5258 NUM
    665 DESC
     94 IN
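A table like this one can be regenerated directly from the training file to confirm the counts the language model was built from. The following is a minimal sketch. It again assumes whitespace-separated word_TAG tokens with the tag after the last underscore. The helper names and the sample sentence are illustrative only.

```python
from collections import Counter

def tag_frequencies(tagged_text: str) -> Counter:
    """Count every tag in a corpus of whitespace-separated word_TAG tokens."""
    counts = Counter()
    for token in tagged_text.split():
        _, sep, tag = token.rpartition("_")  # tag follows the last underscore
        if sep:  # skip tokens that carry no tag
            counts[tag] += 1
    return counts

def format_table(counts: Counter) -> str:
    """Right-aligned 'count TAG' lines, most frequent tag first."""
    if not counts:
        return ""
    width = len(str(max(counts.values())))
    return "\n".join(f"{n:>{width}} {tag}" for tag, n in counts.most_common())

# Hypothetical sample in the assumed word_TAG format
sample = "o_ART menino_N de_PRP Lisboa_PROP chegou_V ._PT"
print(format_table(tag_frequencies(sample)))
```

Taking the last underscore as the separator keeps multi-part forms such as a hypothetical 'de_facto_ADV' intact as the word 'de_facto' with tag ADV.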

Dr Tony Berber Sardinha
(Catholic University of Sao Paulo, Brazil)
tony4 at uol.com.br