question about spanish MOR
Janet Bang
janet.bang at gmail.com
Fri Jan 17 18:44:51 UTC 2020
Hi Brian,
Thank you for your help! Yes, we have noticed that the Spanish taggers are
quite good at the breakdown. In our own review of a couple of transcripts
we also found about 95% accuracy for part of speech and lemma, and an even
higher percentage if you consider accuracy of the lemma alone, so we are
pretty happy with how well we can automate such a complicated step.
Our version of the SPA MOR was downloaded in 2018, so we will download the
newer version.
I am not familiar with how to incorporate the prepost rules. How would I
include them in the SPA MOR?
Janet
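
For readers unfamiliar with the notation, the two prepost rules quoted below
can be read as "if adjacent words carry exactly these sets of competing tags,
keep only the tags on the right of =>". The following is a minimal Python
sketch of that reading, not CLAN's actual implementation. The function names
parse_rule and apply_rule, the exact-set matching, and the cop and det tags in
the usage lines are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[List[FrozenSet[str]], List[str]]


def parse_rule(text: str) -> Rule:
    """Split 'a|*^b|* c|* => a|* c|*' into (pattern, result).

    Each pattern slot is the set of competing POS tags for one word
    (alternatives are joined with '^'); each result slot is one tag.
    Only the part before '|' is kept, since the rules use '*' for the stem.
    """
    left, right = text.split("=>")
    pattern = [
        frozenset(alt.split("|")[0] for alt in slot.split("^"))
        for slot in left.split()
    ]
    result = [slot.split("|")[0] for slot in right.split()]
    if len(pattern) != len(result):
        raise ValueError("both sides of a rule must cover the same words")
    return pattern, result


def apply_rule(words: List[List[str]], rule: Rule) -> List[List[str]]:
    """Return a copy of words with every match of the rule disambiguated.

    words holds one list of candidate tags per word. A window matches
    when each word's candidate set equals the pattern slot exactly
    (an assumption about how matching works).
    """
    pattern, result = rule
    out = [list(cands) for cands in words]
    width = len(pattern)
    for i in range(len(out) - width + 1):
        window = out[i:i + width]
        if all(frozenset(c) == p for c, p in zip(window, pattern)):
            for j, tag in enumerate(result):
                out[i + j] = [tag]
    return out


# Hypothetical candidate tags for "es una niña bonita".
rule = parse_rule("co:voc|*^n|* co:voc|*^adj|* => n|* adj|*")
sentence = [["cop"], ["det"], ["co:voc", "n"], ["co:voc", "adj"]]
resolved = apply_rule(sentence, rule)
```

Under these assumptions, the last two words end up tagged n and adj, while a
bare "bonita" with no preceding noun is left ambiguous for the tagger to
resolve.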
On Thu, Jan 16, 2020 at 6:16 PM Brian MacWhinney <macw at andrew.cmu.edu>
wrote:
> Dear Janet,
> Typically, automatic taggers get about 95% of the words right. The
> best for adult written English are up to 97%. For CHILDES English, we can
> hit about 97%. For Spanish, we are probably down around 95%, although
> sometimes it seems better.
>
> In the case of ratito, you must be using some much earlier version of SPA
> MOR. The current version doesn't have this problem. The case of bonita is
> different. I created a few test sentences and it seems that this problem
> arises when bonita is the last word in the sentence, as in "es una niña
> bonita". The problem is that the association of "co" with final position
> is so strong that it overrides the association of an adjective with the
> preceding noun. This can be corrected with a couple of prepost rules that
> take words that can be either co or adj and force them to be adj when they
> follow a noun. Here are the rules:
>
> # es una gata bonita
> n|* co:voc|*^adj|* => n|* adj|*
> # es una niña bonita
> co:voc|*^n|* co:voc|*^adj|* => n|* adj|*
>
> These work for these cases, but (1) you may have other cases I don't know
> about, and (2) prepost rules can also mess up other things. So, please
> give this a try.
>
> --Brian
>
> On Jan 16, 2020, at 8:47 PM, Janet Y Bang <jbang at stanford.edu> wrote:
>
> Hello,
>
> We are working with the Spanish MOR and noticed a few errors in the MOR
> line:
>
>
> 1. bonita often gets broken down into co:voc|bonita=pretty in
> cases where the word "bonita" is being used to modify a noun, e.g.,
> the utterance is "niña bonita".
> 2. In cases where "rato" or "ratito" is used to indicate a period of
> time, the breakdown is often n|rato&m-DIM=rat OR n|rato-m=rat.
>
> What would be the most efficient way to fix these errors? We have a
> lab-internal .cut file where we are adding new words to our internal mor
> dictionary that were not in the downloaded dictionary (so that we can keep
> track of differences between our lab lexical items and the downloaded
> dictionary), but we weren't sure how to override words that were already in
> the dictionary? Would it be best to fix these in the relevant cut files in
> the lex folder (i.e., adj.cut, n.cut, respectively)?
>
> Thank you,
> Janet
>
> --
> Janet Y. Bang, Ph.D.
> Postdoctoral Fellow
> Department of Psychology
> Stanford University
>
> jbang at stanford.edu
>
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to chibolts+unsubscribe at googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/chibolts/DM5PR02MB3275154B3A345DAA4BD9B5A2D7310%40DM5PR02MB3275.namprd02.prod.outlook.com
> <https://groups.google.com/d/msgid/chibolts/DM5PR02MB3275154B3A345DAA4BD9B5A2D7310%40DM5PR02MB3275.namprd02.prod.outlook.com?utm_medium=email&utm_source=footer>
> .
>