excluding utterances with code-switching
Leonid Spektor
spektor at andrew.cmu.edu
Tue Nov 20 02:08:09 UTC 2018
Janet,
MLU and most other commands work on a word-by-word basis, so the -s"L2|*" or -s"*@s*" options will only exclude those words, not the whole utterance. KWAL and COMBO allow selection or exclusion of whole utterances based on the words in those utterances. If you have the "[- eng]" pre-code and @s symbols on words, and you want to exclude every utterance that has the "[- eng]" pre-code or at least one word with @s, then you need to use KWAL to extract the utterances you want first and then run MLU on the output of the KWAL command. These KWAL and MLU commands should do what you want:
kwal -s"[- eng]" -s*@s* +o@ID +o% -d +f filename(s).cha
mlu *.kwal.cex
If this doesn't do what you want, then please email me a sample of your data files, so that I can see how you have coded them, and give me more details on what you want to achieve.
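To illustrate the difference between word-level and utterance-level exclusion described above, here is a minimal Python sketch. It is not how CLAN is implemented: the function names are hypothetical, utterances are assumed to be plain strings, and length is counted in words rather than in morphemes on the %mor line as MLU does.

```python
# Illustration only; these helpers are hypothetical and are not part of CLAN.
# Assumptions: a code-switched word carries "@s", and an utterance-level
# language switch is marked with a leading "[- eng]" pre-code.

def is_switched_word(word):
    return "@s" in word

def word_level_filter(utterances):
    """Drop only the @s-marked words; every utterance is kept."""
    return [[w for w in u.split() if not is_switched_word(w)] for u in utterances]

def utterance_level_filter(utterances):
    """Drop any utterance with the pre-code or at least one @s-marked word."""
    kept = []
    for u in utterances:
        if u.startswith("[- eng]"):
            continue
        words = u.split()
        if any(is_switched_word(w) for w in words):
            continue
        kept.append(words)
    return kept

def mean_length(utterances):
    """Mean number of words per utterance (a stand-in for MLU in morphemes)."""
    if not utterances:
        return 0.0
    return sum(len(u) for u in utterances) / len(utterances)

utterances = [
    "ahorita tienes que comer",
    "no es time@s to@s sleep@s",
]

print(mean_length(word_level_filter(utterances)))       # 3.0
print(mean_length(utterance_level_filter(utterances)))  # 4.0
```

With word-level exclusion the mixed utterance still counts, shortened to "no es", which pulls the mean down. With utterance-level exclusion only the Spanish-only utterance contributes.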
Leonid.
> On Nov 19, 2018, at 18:52, Janet Bang <jbang at stanford.edu> wrote:
>
> Hello,
>
> We are working on bilingual transcriptions and had a question about code-switched utterances. Apologies if I've missed this in the manual.
>
> One of our goals is to obtain an MLU for Spanish-only utterances, excluding mixed utterances. For example:
> *MOT: ahorita tienes que comer.
> *MOT: no es time@s to@s sleep@s.
>
> We would like to obtain an MLU (on the %mor line) excluding the utterance with code-switching. We've tried the following command, but it includes both utterances and only excludes the English words, whereas we'd like the output to consider only the Spanish-only line.
> mlu -s"[- eng]" -s"L2|*"
>
> It seems like our options are:
> 1) go back to our transcripts and add a postcode for any code-switched utterances to use the +s switch with postcodes
> 2) use kwal to exclude utterances with the @s symbol similar to what is seen here <https://groups.google.com/forum/#!msg/chibolts/kdeVQEw7OZI/Siad8ni4SrEJ;context-place=searchin/chibolts/adding$20postcode%7Csort:date>
>
> I wanted to know if there was a way to use the switches to exclude utterances with the @s symbol, or automate a way to include a postcode in our transcripts for every utterance with the @s symbol?
>
> Thank you in advance,
> Janet
>
> --
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com <mailto:chibolts+unsubscribe at googlegroups.com>.
> To post to this group, send email to chibolts at googlegroups.com <mailto:chibolts at googlegroups.com>.
> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/MWHPR02MB32801F5C1CF989ABE63E162FD7D80%40MWHPR02MB3280.namprd02.prod.outlook.com <https://groups.google.com/d/msgid/chibolts/MWHPR02MB32801F5C1CF989ABE63E162FD7D80%40MWHPR02MB3280.namprd02.prod.outlook.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>.