language alternation search

Fri Mar 24 10:50:01 UTC 2017

Dear Leonid,

Thank you for the fast response. What Gladys would like to extract are 
*pairs* of sentences, one spoken in one language, the other in another. 
Imagine a sequence like this:

   1. English
   2. *English*
   3. *French*
   4. French
   5. *French*
   6. *English*

Gladys would like to extract sentences 2-3 (switch Eng->Fr), and 5-6 
(switch Fr->Eng).

Of course, this can be approximated by using kwal, extracting the [- spa] 
sentences with some context, and then looking through by hand to see if the 
context is also in Spanish (so not a switch) or in Qom (yes, it's a switch, 
and thus part of what we would like to extract). I wonder if there is an 
elegant solution for this in CLAN already.

If I were to do this in bash, I'd do something not very elegant like the 
following (imagining there is only the content of the transcription, one 
utterance per line, read from standard input; GNU sed and tr assumed):

sed -E '/\[- spa\]/!s/^/[- qom] /' |   # add [- qom] to all lines NOT marked with [- spa]
   tr '\n' '\001' |   # replace the line breaks by a one-byte placeholder (GNU tr works on bytes, so a multi-byte character like € would not work)
   sed -E 's/\x01(\[- (spa|qom)\])/ \1\x01\1/g' |   # copy each line's language marker to the end of the previous line
   tr '\001' '\n' |   # translate the placeholder back into line breaks
   grep -E -A 1 '\[- qom\].*\[- spa\]|\[- spa\].*\[- qom\]'   # extract each line whose marker differs from the next line's, plus that next line
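
A more direct alternative to the pipeline above might be to let awk remember the previous utterance and its language. The following is only a minimal sketch under the same assumptions: one utterance per line, Spanish utterances marked with [- spa], and everything else treated as Qom. The function name find_switches is made up for illustration and is not a CLAN command.

```shell
# Hypothetical helper (not part of CLAN): print every pair of adjacent
# utterances whose language differs.
# Assumptions: one utterance per line; Spanish utterances carry "[- spa]";
# every other utterance is treated as Qom.
find_switches() {
  awk '
    /^[@%]/ || /^[ \t]*$/ { next }            # skip CHAT headers, dependent tiers, blank lines
    {
      lang = ($0 ~ /\[- spa\]/) ? "spa" : "qom"
      if (seen && lang != prev_lang) {        # language changed since the previous utterance
        print prev_line
        print $0
        print "--"                            # separator between pairs
      }
      prev_line = $0; prev_lang = lang; seen = 1
    }
  ' "$@"
}
```

It could be called as find_switches transcript.cha (a placeholder file name) or fed from a pipe. A sentence that switches in and then straight back out is printed twice, once as the second member of one pair and once as the first member of the next, so both directions of the switch are kept.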

Does that make more sense? Thank you in advance,

Alex

On Thursday, March 23, 2017 at 8:27:02 PM UTC+1, Spektor, Leonid: CMU wrote:
>
> Alex,
>
>     I am not sure what you mean by "LANGUAGE SWITCH", but you can use the 
> +s"[- spa]" option to analyze only utterances with the "[- spa]" code and 
> the -s"[- spa]" option to analyze only utterances that do not have the 
> "[- spa]" code. If this doesn't help, then please email me more example 
> input data files and examples of the output that you want to get.
>
> Leonid.
>
>
> On 23-03-17 14:19, A Cristia wrote:
>
> Dear clan users,
>
> In a bilingual corpus, is there a way to search for pairs of sentences 
> where a language switch has occurred? A search for the tagged language will 
> only reveal switches from the minor to the major language, but we'd like to 
> extract both:
>
> *FAC:    ʔaqaixana . 
> *FAC:    ten qaica naxa qaicaʔ . 
> *FAC:    [- spa] vamos afuera . <---- LANGUAGE SWITCH FROM THE PREVIOUS SENTENCE TO THIS SENTENCE (major to minor -- can be found searching for [- spa])
> *FAC:    ñaq qaica ten  paʔatauec na . <---- LANGUAGE SWITCH FROM THE PREVIOUS SENTENCE TO THIS SENTENCE (minor to major -- can it be found?)
> *FAC:    ñaq qaica ten . 
>
>
>
> Thank you in advance,
>
> Gladys Ojea and Alex Cristia
>

-- 
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
To post to this group, send email to chibolts at googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/36b82d51-4b8b-457b-ae97-37a1a484a963%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.