[Corpora-List] QTAG tag assignment problem

Tony Berber Sardinha tony4 at uol.com.br
Wed Jun 18 11:35:12 UTC 2003


Dear list members

I wonder if anyone could help me with a QTAG tagging problem (wrong tag
assignment). I'm using a Portuguese language model based on 500K words of tagged
data.

For example, it tagged the Portuguese preposition 'de' as:

de_CJ

The correct output would be 'de_PRP'. There is not a single occurrence of
'de_CJ' in the training corpus.

The possibilities given by the tagger (with the '-f ac' option) are:

de : CJ [28:0.0] IN [1:0.0] PRP [13834:0.0]

This shows PRP as by far the most likely tag, even though its count of 13834 does
not match the frequency of 'de' in the training corpus (19886).
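Reading the candidate list as relative frequencies makes the point concrete. The
sketch below assumes that the first number inside each [count:score] pair is a raw
count. The post does not say what the second value (0.0 throughout) means, so the
sketch ignores it. This is an illustration only, not QTAG's own scoring.

```python
import re

# Candidate-tag line copied from the post (QTAG run with '-f ac').
LINE = "de : CJ [28:0.0] IN [1:0.0] PRP [13834:0.0]"


def lexical_distribution(line: str) -> tuple[str, dict[str, float]]:
    """Return the word and a relative-frequency estimate of P(tag | word).

    Assumption (not documented in the post): the first number in each
    [count:score] pair is a frequency count; the second number is ignored.
    """
    word, _, rest = line.partition(" : ")
    pairs = re.findall(r"(\S+) \[(\d+):[^\]]*\]", rest)
    counts = {tag: int(count) for tag, count in pairs}
    total = sum(counts.values())
    return word.strip(), {tag: c / total for tag, c in counts.items()}


word, probs = lexical_distribution(LINE)
# Print the tags from most to least likely.
for tag, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}_{tag}: {p:.4f}")
```

Under that assumption PRP receives about 99.8% of the word-level evidence. In a
typical probabilistic tagger this is only one factor; it is combined with the
probabilities of the surrounding tag sequence when the output tag is chosen.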

The tag most frequently assigned in error is N (noun).

The frequency of tags in the training corpus is:

 110614 N
  61415 PT
  60166 V
  45531 PRP
  44735 ART
  35132 CPR
  27563 PROP
  20933 ADJ
  19281 CJ
  17530 PRN
  16506 ADV
   5258 NUM
    665 DESC
     94 IN
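Converting the counts to proportions shows how large a share of the corpus N
accounts for. This is relevant because N is the tag most often assigned in error.
The figures are copied directly from the table above.

```python
# Tag frequencies copied from the training-corpus table in the post.
TAG_COUNTS = {
    "N": 110614, "PT": 61415, "V": 60166, "PRP": 45531, "ART": 44735,
    "CPR": 35132, "PROP": 27563, "ADJ": 20933, "CJ": 19281, "PRN": 17530,
    "ADV": 16506, "NUM": 5258, "DESC": 665, "IN": 94,
}

total = sum(TAG_COUNTS.values())
print(f"total tagged tokens: {total}")
# One row per tag: name, raw count, share of all tagged tokens.
for tag, count in sorted(TAG_COUNTS.items(), key=lambda kv: -kv[1]):
    print(f"{tag:>5} {count:>7} {count / total:6.2%}")
```

The counts sum to 465,423 tokens, and N alone accounts for roughly 24% of them.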

cheers
tony.
-------------------------------------
Dr Tony Berber Sardinha
LAEL, PUC/SP
(Catholic University of Sao Paulo, Brazil)
tony4 at uol.com.br
http://lael.pucsp.br/~tony
[New website]


