[Corpora-List] QTAG tag assignment problem
Tony Berber Sardinha
tony4 at uol.com.br
Wed Jun 18 11:35:12 UTC 2003
Dear list members,
I wonder if anyone could help me with a QTAG tagging problem (wrong tag
assignment). I'm using a Portuguese language model based on 500K words of tagged
data.
For example, it tagged the Portuguese preposition 'de' as:
de_CJ
The correct output would be 'de_PRP'. There is not a single occurrence of
'de_CJ' in the training corpus.
The possibilities given by the tagger (with the '-f ac' option) are:
de : CJ [28:0.0] IN [1:0.0] PRP [13834:0.0]
This shows PRP as by far the most likely tag, even though 13834 does not match
the training corpus frequency of 19886.
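To make the comparison concrete, here is a minimal Python sketch that parses a candidate line like the one above and picks the tag with the highest count. It assumes the first number in each bracket is a count (as the comparison with the corpus frequency suggests) and ignores the second number, whose meaning is not clear from the output. The function name parse_candidates is hypothetical and not part of QTAG.

```python
import re

def parse_candidates(line):
    """Parse a line such as 'de : CJ [28:0.0] IN [1:0.0] PRP [13834:0.0]'.

    Assumption: each candidate has the shape TAG [count:score]; only the
    count is used here.
    """
    word, _, rest = line.partition(" : ")
    pairs = re.findall(r"(\S+) \[(\d+):([\d.]+)\]", rest)
    return word.strip(), {tag: int(count) for tag, count, _ in pairs}

word, counts = parse_candidates("de : CJ [28:0.0] IN [1:0.0] PRP [13834:0.0]")

# Tag with the largest count, and its share of all counts for this word.
best = max(counts, key=counts.get)
share = counts[best] / sum(counts.values())
print(word, best, round(share, 4))
```

On these counts PRP accounts for well over 99% of the total, so the lexicon entry alone would not explain the 'de_CJ' output.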
The tag most frequently assigned in error is N (noun).
The frequency of tags in the training corpus is:
110614 N
61415 PT
60166 V
45531 PRP
44735 ART
35132 CPR
27563 PROP
20933 ADJ
19281 CJ
17530 PRN
16506 ADV
5258 NUM
665 DESC
94 IN
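For anyone wanting to reproduce this kind of check, the counts can be obtained with a short script along these lines. It is only a sketch: it assumes the training corpus is whitespace-separated word_TAG tokens (the same shape as 'de_PRP' above), lowercases the words, and uses a made-up sample sentence rather than the real data. The name count_pairs is hypothetical.

```python
from collections import Counter

def count_pairs(text):
    """Count (word, tag) pairs and tags in word_TAG formatted text."""
    pair_counts = Counter()
    tag_counts = Counter()
    for token in text.split():
        # Split on the last underscore so words containing '_' survive.
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # skip tokens that carry no tag
        pair_counts[(word.lower(), tag)] += 1
        tag_counts[tag] += 1
    return pair_counts, tag_counts

# Made-up sample, not taken from the training corpus.
sample = "a_ART casa_N de_PRP Maria_PROP e_CJ o_ART livro_N de_PRP Pedro_PROP"
pair_counts, tag_counts = count_pairs(sample)

print(pair_counts[("de", "PRP")])  # occurrences of 'de' tagged PRP
print(pair_counts[("de", "CJ")])   # occurrences of 'de' tagged CJ
for tag, freq in tag_counts.most_common():
    print(freq, tag)
```

Run over the full corpus, pair_counts[("de", "CJ")] should come out as zero if the pair really never occurs, and the tag listing can be compared with the table above.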
cheers
tony.
-------------------------------------
Dr Tony Berber Sardinha
LAEL, PUC/SP
(Catholic University of Sao Paulo, Brazil)
tony4 at uol.com.br
http://lael.pucsp.br/~tony
[New website]