Spanish tagger problem: failure to disambiguate determiners (la, los, las) from homophonous clitics

Brian MacWhinney macw at cmu.edu
Sat Jan 28 08:26:55 UTC 2017


Dear Hannah,

This is puzzling. There are no errors in the training corpus, and one would think that POST could learn the context needed to disambiguate these. It is true that clitics are one of the most difficult things for learners, but a program should be able to figure this out. Fortunately, it should be possible to fix these using POSTMORTEM. I will work on the rules for that. However, the other, better approach would be to generate more training files for Spanish. That has been a problem: I was hoping for some volunteers, but none have emerged so far. It is not that hard a job, but it would take a few days to make sure all the training materials were correct.

--Brian

From: <chibolts at googlegroups.com> on behalf of Hannah Forsythe <ani.forsythe at gmail.com>
Reply-To: "chibolts at googlegroups.com" <chibolts at googlegroups.com>
Date: Saturday, 28 January 2017 at 4:25 AM
To: chibolts <chibolts at googlegroups.com>
Subject: Spanish tagger problem: failure to disambiguate determiners (la, los, las) from homophonous clitics

In using the Spanish tagger, I have come across the following consistent error: determiners (la, los, las) are mislabeled as their homophonous clitics. See the following example from the most recent version of the Orea-Pine corpus.

Ex. 36    *FAT:  coge Papá los señores ?
%mor: v|coge-3S&PRES=take n:prop|Papá pro:obj|él&m-PL=he n|señor&m-PL=sir ?

The correct label should be: det:art|el&m&PL=the

While MOR correctly provides both labels, POST often chooses the wrong label. I see many cases of determiners mislabeled as clitics, but not the reverse.
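To gauge how widespread the mislabeling is, one could scan the %mor tiers for the pattern seen in the example: an object clitic tag immediately followed by a common noun. The Python sketch below is only an illustration of that idea. The function name `flag_suspect_clitics` is hypothetical, it is not part of CLAN, POST, or POSTMORTEM, and it assumes that a `pro:obj|` token directly before an `n|` token is usually a mis-tagged determiner, which will not hold in every case.

```python
import re

# Hypothetical helper, not part of CLAN: it only inspects the text of a %mor tier.
# Assumption: a pro:obj| token directly followed by an n| token is a likely
# determiner that was mis-tagged as a clitic.
CLITIC = re.compile(r"^pro:obj\|")
NOUN = re.compile(r"^n\|")  # common nouns only; n:prop| does not match


def flag_suspect_clitics(mor_line):
    """Return the indices of pro:obj tokens that are directly followed by a noun."""
    tokens = mor_line.replace("%mor:", "", 1).split()
    return [
        i
        for i in range(len(tokens) - 1)
        if CLITIC.match(tokens[i]) and NOUN.match(tokens[i + 1])
    ]


mor = "%mor: v|coge-3S&PRES=take n:prop|Papá pro:obj|él&m-PL=he n|señor&m-PL=sir ?"
print(flag_suspect_clitics(mor))  # [2]
```

The flagged positions are candidates for manual review, not confirmed errors.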

My questions are:

  1.  Can this problem be overcome by modifying POST?
  2.  Does any corpus exist that does not have this error?
--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
To post to this group, send email to chibolts at googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/a8cbb762-31b6-447c-ac51-6427c17d15e2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
