question about spanish MOR
Janet Bang
janet.bang at gmail.com
Fri Jan 17 20:33:04 UTC 2020
Got it. We will try it out and see if it will work for our
transcripts. Thank you!
On Fri, Jan 17, 2020 at 12:20 PM Brian MacWhinney <macw at andrew.cmu.edu>
wrote:
> Dear Janet,
>
> The use of prepost rules is described in the MOR manual. However, I
> wouldn’t recommend creating them unless you have a really clear idea about
> all the ways in which they could overgeneralize.
>
> -- Brian MacWhinney
> Teresa Heinz Professor of Cognitive Psychology,
> Computational Linguistics,
> and Modern Languages, CMU
>
> > On Jan 17, 2020, at 1:44 PM, Janet Bang <janet.bang at gmail.com> wrote:
> >
> > Hi Brian,
> >
> > Thank you for your help! Yes, we have noted that the Spanish taggers are
> > quite good in terms of the breakdown. In our own review of a couple of
> > transcripts, we also found about 95% accuracy for part of speech and
> > lemma, and an even higher percentage if you consider accuracy of the
> > lemma alone, so we are pretty happy with how we can automate such a
> > complicated step.
> >
> > Our version of the SPA MOR was downloaded in 2018, so we will download
> > the newer version.
> >
> > As for the prepost rules, I am not familiar with how to incorporate them.
> > How would I include them in the SPA MOR?
> >
> > Janet
> >
> > On Thu, Jan 16, 2020 at 6:16 PM Brian MacWhinney <macw at andrew.cmu.edu>
> wrote:
> > Dear Janet,
> > Typically, automatic taggers get about 95% of the words right. The
> > best for adult written English are up to 97%. For CHILDES English, we can
> > hit about 97%. For Spanish, we are probably down around 95%, although
> > sometimes it seems better.
> >
> > In the case of ratito, you must be using some much earlier version of
> > SPA MOR. The current version doesn't have this problem. The case of
> > bonita is different. I created a few test sentences, and it seems that
> > this problem arises when bonita is the last word in the sentence, as in
> > "es una niña bonita". The problem is that the association of "co" with
> > final position is so strong that it overrides the association of an
> > adjective with the preceding noun. This can be corrected with a couple of
> > prepost rules that take words that can be either co or adj and force them
> > to be adj when they follow a noun. Here are the rules:
> >
> > # es una gata bonita
> > n|* co:voc|*^adj|* => n|* adj|*
> > # es una niña bonita
> > co:voc|*^n|* co:voc|*^adj|* => n|* adj|*
> >
> > These work for these cases, but (1) you may have other cases I don't
> > know about, and (2) prepost rules can also mess up other things. So,
> > please give this a try.
> >
> > --Brian
> >
> >> On Jan 16, 2020, at 8:47 PM, Janet Y Bang <jbang at stanford.edu> wrote:
> >>
> >> Hello,
> >>
> >> We are working with the Spanish MOR and noticed a few errors in the MOR
> >> line:
> >>
> >> • bonita often gets broken down into co:voc|bonita=pretty in cases
> >> where the word "bonita" is being used to modify a noun, e.g., when the
> >> utterance is "niña bonita".
> >> • In cases where "rato" or "ratito" is used to indicate a period of
> >> time, the breakdown is often n|rato&m-DIM=rat OR n|rato-m=rat.
> >>
> >> What would be the most efficient way to fix these errors? We have a
> >> lab-internal .cut file where we add new words to our internal mor
> >> dictionary that were not in the downloaded dictionary (so that we can
> >> keep track of differences between our lab lexical items and the
> >> downloaded dictionary), but we weren't sure how to override words that
> >> are already in the dictionary. Would it be best to fix these in the
> >> relevant cut files in the lex folder (i.e., adj.cut and n.cut,
> >> respectively)?
> >>
> >> Thank you,
> >> Janet
> >>
> >> --
> >> Janet Y. Bang, Ph.D.
> >> Postdoctoral Fellow
> >> Department of Psychology
> >> Stanford University
> >>
> >> jbang at stanford.edu
> >>
> >>
> >>
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> Groups "chibolts" group.
> >> To unsubscribe from this group and stop receiving emails from it, send
> an email to chibolts+unsubscribe at googlegroups.com.
> >> To view this discussion on the web visit
> https://groups.google.com/d/msgid/chibolts/DM5PR02MB3275154B3A345DAA4BD9B5A2D7310%40DM5PR02MB3275.namprd02.prod.outlook.com
> .
>
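To make the effect of the two prepost rules quoted above concrete, here is a toy sketch in Python. It is not how CLAN implements prepost rules; it only mimics what the two patterns express: when a word that is ambiguous between co:voc and adj follows a word that is (or can be) a noun, keep the noun and adjective readings. The names `apply_prepost`, `pos_set`, and `RULES`, and the simplified analyses such as `adj|bonita`, are made up for illustration and are not actual MOR output.

```python
from typing import FrozenSet, List, Tuple

# Each word is a list of candidate analyses, e.g. ["co:voc|bonita", "adj|bonita"].
# In MOR output the alternatives for one word are joined with "^".
Analyses = List[str]
Rule = Tuple[Tuple[FrozenSet[str], FrozenSet[str]], Tuple[str, str]]

# Toy encoding of the two rules from the thread (assumed semantics):
#   n|* co:voc|*^adj|*          => n|* adj|*
#   co:voc|*^n|* co:voc|*^adj|* => n|* adj|*
RULES: List[Rule] = [
    ((frozenset({"n"}), frozenset({"co:voc", "adj"})), ("n", "adj")),
    ((frozenset({"co:voc", "n"}), frozenset({"co:voc", "adj"})), ("n", "adj")),
]


def pos_of(analysis: str) -> str:
    """Return the part-of-speech code before the first '|'."""
    return analysis.split("|", 1)[0]


def pos_set(word: Analyses) -> FrozenSet[str]:
    """Return the set of part-of-speech codes among a word's candidates."""
    return frozenset(pos_of(a) for a in word)


def apply_prepost(words: List[Analyses]) -> List[Analyses]:
    """Apply the toy rules to each adjacent pair of words, left to right."""
    out = [list(w) for w in words]
    for i in range(len(out) - 1):
        for (left, right), (keep_left, keep_right) in RULES:
            if pos_set(out[i]) == left and pos_set(out[i + 1]) == right:
                # Keep only the readings named on the rule's right-hand side.
                out[i] = [a for a in out[i] if pos_of(a) == keep_left]
                out[i + 1] = [a for a in out[i + 1] if pos_of(a) == keep_right]
                break
    return out


# "es una gata bonita": unambiguous noun followed by co:voc/adj
gata = [["cop|ser"], ["det|una"], ["n|gata"], ["co:voc|bonita", "adj|bonita"]]
# "es una niña bonita": co:voc/n followed by co:voc/adj
nina = [["cop|ser"], ["det|una"], ["co:voc|niña", "n|niña"],
        ["co:voc|bonita", "adj|bonita"]]

print(apply_prepost(gata)[-1])
print(apply_prepost(nina)[2:])
```

The sketch also shows why such rules can overgeneralize: any co:voc/adj word after a noun is forced to adj, whether or not that is the right reading in context.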