Spanish tagger problem: failure to disambiguate determiners (la, los, las) from homophonous clitics
Hannah Forsythe
ani.forsythe at gmail.com
Fri Jan 27 20:25:31 UTC 2017
In using the Spanish tagger, I have come across the following consistent
error: the determiners la, los, and las are mislabeled as their
homophonous clitics. See the following example from the most recent version
of the Orea-Pine corpus.
Ex. 36 *FAT: coge Papá los señores ?
%mor: v|coge-3S&PRES=take n:prop|Papá pro:obj|él&m-PL=he n|señor&m-PL=sir ?
The correct label for "los" here should be: det:art|el&m&PL=the
While MOR correctly provides both labels, POST often chooses the wrong
label. I see many cases of determiners mislabeled as clitics, but not the
reverse.
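One way to locate candidate cases is to scan each %mor tier for a token tagged pro:obj that is immediately followed by a noun, as in the example above. The following is a minimal Python sketch of that heuristic, not part of CLAN; the function name find_suspect_clitics is hypothetical, and the sketch assumes that a clitic directly before a noun is almost always a mistagged determiner, so any hits still need manual checking.

```python
import re

def find_suspect_clitics(mor_line):
    """Return (index, clitic_token, next_token) for each pro:obj
    token that is directly followed by a noun token."""
    # Drop the tier label ("%mor:") and split the rest on whitespace.
    body = re.sub(r"^%mor:\s*", "", mor_line.strip())
    tokens = body.split()
    suspects = []
    for i in range(len(tokens) - 1):
        current, following = tokens[i], tokens[i + 1]
        # Assumption: nouns are tagged "n|..." or "n:...|..." (e.g. n:prop).
        if current.startswith("pro:obj|") and following.startswith(("n|", "n:")):
            suspects.append((i, current, following))
    return suspects

example = "%mor: v|coge-3S&PRES=take n:prop|Papá pro:obj|él&m-PL=he n|señor&m-PL=sir ?"
for index, clitic, noun in find_suspect_clitics(example):
    print(index, clitic, noun)
```

Run over every %mor line of a transcript, this gives a rough list of tokens to inspect; it does not correct the tags itself.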
My questions are:
1. Can this problem be overcome by modifying POST?
2. Does any corpus exist that does not have this error?