Spanish tagger problem: failure to disambiguate determiners (la, los, las) from homophonous clitics

Hannah Forsythe ani.forsythe at gmail.com
Fri Jan 27 20:25:31 UTC 2017


In using the Spanish tagger, I have come across a consistent error: the 
determiners la, los, and las are mislabeled as their homophonous clitics. 
See the following example from the most recent version of the Orea-Pine 
corpus.

Ex. 36    *FAT: coge Papá los señores ?
%mor: v|coge-3S&PRES=take n:prop|Papá pro:obj|él&m-PL=he n|señor&m-PL=sir ?

The correct label for "los" should be: det:art|el&m&PL=the

While MOR correctly provides both labels, POST often chooses the wrong 
label. I see many cases of determiners mislabeled as clitics, but not the 
reverse.
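
To get a rough sense of how widespread the error is, one could scan the %mor 
tier for object-clitic tags that are immediately followed by a common noun. 
The sketch below is only a heuristic, not part of MOR or POST: the function 
name suspect_clitics is hypothetical, and it assumes that a pro:obj| tag 
directly before an n| tag is likely a mistagged determiner, which will not 
hold in every utterance.

```python
def suspect_clitics(mor_line):
    """Return (index, clitic_tag, noun_tag) for each pro:obj tag
    that is directly followed by a common-noun tag.

    Heuristic assumption: an object clitic right before a noun is
    probably a determiner that was mislabeled.
    """
    tags = mor_line.split()
    # Drop the tier label if the whole line was passed in.
    if tags and tags[0] == "%mor:":
        tags = tags[1:]
    hits = []
    for i in range(len(tags) - 1):
        if tags[i].startswith("pro:obj|") and tags[i + 1].startswith("n|"):
            hits.append((i, tags[i], tags[i + 1]))
    return hits


example = ("%mor: v|coge-3S&PRES=take n:prop|Papá "
           "pro:obj|él&m-PL=he n|señor&m-PL=sir ?")
print(suspect_clitics(example))
```

On the example above this flags the pro:obj tag for "los" together with the 
noun that follows it. Any hits would still need to be checked by hand.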

My questions are:

   1. Can this problem be overcome by modifying POST?
   2. Does any corpus exist that does not have this error?
