Bilingual glossing using MOR/POST
Brian MacWhinney
macw at cmu.edu
Mon Dec 27 20:49:39 UTC 2010
Dear Kevin,
The problem with mixed utterances is that they lack the syntactic context that POST needs
for disambiguation. One can imagine possible solutions, but they would require a large
programming effort. For now, I think the best solution is to pay close attention to the
directionality of coding in each mixed utterance. In the case you give here, it makes
more sense to call this an English sentence with a single Spanish word. In the case of
Cantonese-English mixed-language sentences, I often found a single Cantonese form inside
a sentence that was otherwise all English. More interestingly, these interposed forms were
often communicators or interjections such as "haiwo" or just "wo". I was able to handle
this by creating a file of interposed Cantonese forms inside the English MOR. You might
wish to take a look at this.
There is also a solution that lets you insert a part of speech by hand in advance. This is given on
page 26 of the manual. The example there is to use recordar@s$v:inf to show that this Spanish word is an infinitive.
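If there are many switched words, the hand-tagging could be partly scripted. What follows is only a rough sketch, not part of CLAN: the function name add_pos and the lookup table are hypothetical, and the only form taken from the manual example is recordar@s$v:inf. It appends a $ code to @s words that appear in a hand-built table and leaves every other word untouched.

```python
import re

# Hypothetical hand-built lookup: switched word -> part-of-speech code.
# Only "recordar" -> "v:inf" comes from the manual example quoted above.
POS_BY_WORD = {"recordar": "v:inf"}

# A word marked with @s that does not already carry a $ code
# or a further suffix on the marker.
SWITCHED = re.compile(r"(\S+?)@s(?![$:\w])")

def add_pos(main_line, pos_by_word=POS_BY_WORD):
    """Append $pos to each @s word found in the lookup; leave others as-is."""
    def repl(match):
        pos = pos_by_word.get(match.group(1).lower())
        return match.group(0) + ("$" + pos if pos else "")
    return SWITCHED.sub(repl, main_line)
```

Words missing from the table are returned unchanged, so they can still be found and coded by hand afterwards.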
The other alternative is simply to post-process all the L2| codes on the %mor line by hand.
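To make that hand pass easier, one could first list where the L2| codes occur. This is a minimal sketch under my own assumptions, not a CLAN command: find_l2_codes is a hypothetical helper, and it assumes that %mor tiers start with "%mor:" and that wrapped continuation lines start with a space or tab.

```python
def find_l2_codes(lines):
    """Return (line_number, word) pairs for each L2| token on a %mor tier."""
    hits = []
    in_mor = False
    for number, line in enumerate(lines, start=1):
        if line.startswith("%mor:"):
            in_mor = True
            body = line[len("%mor:"):]
        elif in_mor and line[:1] in (" ", "\t"):
            body = line  # wrapped continuation of the %mor tier
        else:
            in_mor = False
            continue
        for token in body.split():
            if token.startswith("L2|"):
                hits.append((number, token[3:]))
    return hits
```

Run over the lines of a transcript, this gives the line number and the word for each unglossed form, which can then be coded by hand.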
-- Brian
On Dec 27, 2010, at 10:23 AM, Kevin Donnelly wrote:
> Hi
>
> I've converted a Spanish/English bilingual file to use the current CLAN default
> for language marking, and I'm then going on to gloss it using the MOR/POST
> resources for the two languages.
>
> Words in utterances that are completely in Spanish are unmarked, utterances
> that are completely in English are marked with the precode [- eng], and
> English words in a Spanish utterance are marked with @s.
>
> However, after following the instructions on pp165-6 of the CLAN manual, I'm
> getting the following output (this is an excerpt from the full file):
> =====
> @UTF8
> @Begin
> @Languages: spa, eng
> @Participants: SOF Sofía Adult, KEV Kevin Adult
> *KEV: bueno y qué tú crees de [/] de aquí la cuadra lo que están haciendo ?
> %mor: co|bueno=good conj|y=and pro:int|qué=what pro:per|tú=you vpres|
> cree&PRES=believe prep|de=of adv|aquí=here det:art|el=the n|cuadra=ward
> pro:per:1|lo=him rel|que=that v:aux|esta&PRES=be vger|hace-PROG=do ?
> *SOF: +< pero that's@s illegal@s .
> %mor: conj|pero=but L2|that's L2|illegal .
> *SOF: esa rotonda es illegal@s .
> %mor: det:dem|ese=that ?|rotonda vpres|se-3S&PRES=be L2|illegal .
> *SOF: [- eng] from what I know .
> %mor: prep|from pro:wh|what pro|I v|know .
> @End
> =====
>
> As you can see, the first utterance in Spanish and the last one in English are
> fine, but in the two mixed utterances in the middle the English words are not
> glossed, and are just marked L2.
>
> The manual basically says to run your default language MOR grammar with a
> switch to leave out the utterances marked with a precode, and then to run the
> second language MOR grammar on the utterances marked with a precode. But is
> there a step missing - to deal with mixed utterances?
>
> --
> Pob hwyl / Best wishes
>
> Kevin Donnelly
> kevindonnelly.org.uk
>
> --
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To post to this group, send email to chibolts at googlegroups.com.
> To unsubscribe from this group, send email to chibolts+unsubscribe at googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/chibolts?hl=en.
>
>