<div dir="ltr">Dear Leonid,<div><br></div><div>Thank you for the fast response. What Gladys would like to extract are *pairs* of sentences, one spoken in one language and the next in the other. Imagine a sequence like this:</div><div><ol><li>English<br></li><li><b>English<br></b></li><li><b>French</b></li><li>French<br></li><li><b>French<br></b></li><li><b>English</b><br></li></ol></div><div>Gladys would like to extract sentences 2-3 (switch Eng->Fr) and 5-6 (switch Fr->Eng).</div><div><br></div><div>Of course, this can be approximated by using kwal to extract the [- spa] sentences with some context, and then checking by hand whether the context is also in Spanish (so not a switch) or in Qom (a switch, and thus part of what we would like to extract). I wonder whether CLAN already offers an elegant solution for this.</div><div><br></div><div>If I were to do this in bash, I'd do something not very elegant like this (assuming the input contains only the transcribed utterances, one per line):</div><div>sed -E '/\[- spa\]/!s/^/[- qom] /' |  # tag every line NOT marked [- spa] with [- qom]; the brackets are escaped so they match literally</div><div>sed -n '$!N; /\[- qom\].*\n.*\[- spa\]/p; /\[- spa\].*\n.*\[- qom\]/p; D'  # slide a two-line window over the tagged lines and print each adjacent pair whose tags differ<br><div><br></div><div>Does that make more sense? Thank you in advance,</div><div><br></div><div>Alex</div><div><br></div>On Thursday, March 23, 2017 at 8:27:02 PM UTC+1, Spektor, Leonid: CMU wrote:<blockquote class="gmail_quote" style="margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">
<div bgcolor="#FFFFFF" text="#000000">
<p>Alex,</p>
<p> I am not sure what you mean by "LANGUAGE SWITCH", but you
can use the +s"[- spa]" option to analyze only utterances with the
"[- spa]" code, and the -s"[- spa]" option to analyze only utterances
that do not have the "[- spa]" code. If this doesn't help, please
email me more examples of input data files and of the output that
you want to get.<br>
</p>
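<p>As a rough illustration of the difference between the two options, they can be approximated outside CLAN with grep. This is only a sketch under stated assumptions: with_code and without_code are hypothetical names, not CLAN commands, and each input line is treated as one complete utterance.</p>

```shell
# Hypothetical stand-ins (not CLAN commands) that mimic the two options.
# grep -F matches "[- spa]" as a literal string; without -F it would be read
# as a bracket expression matching any single one of "-", " ", "s", "p", "a".
with_code()    { grep -F -- '[- spa]'; }     # like +s"[- spa]": keep only coded utterances
without_code() { grep -v -F -- '[- spa]'; }  # like -s"[- spa]": keep only uncoded utterances

# Three utterances taken from the example in the original question.
sample() {
  printf '%s\n' \
    '*FAC: ten qaica naxa qaicaʔ .' \
    '*FAC: [- spa] vamos afuera .' \
    '*FAC: ñaq qaica ten paʔatauec na .'
}

sample | with_code      # prints only the "vamos afuera" line
sample | without_code   # prints the two Qom lines
```

<p>Each filter looks at one utterance at a time, so neither can tell whether the neighboring utterance was in the other language; that adjacency is exactly what identifying a switch requires.</p>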
<pre cols="72">Leonid.
</pre>
<div>On 23-03-17 14:19, A Cristia wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Dear CLAN users,<br>
<br>
In a bilingual corpus, is there a way to search for pairs of
sentences where a language switch has occurred? A search for the
tagged language will only reveal switches from the major to the
minor language, but we'd like to extract both directions:<br>
<br>
*FAC: ʔaqaixana . <br>
*FAC: ten qaica naxa qaicaʔ . <br>
*FAC: [- spa] vamos afuera . &lt;---- LANGUAGE SWITCH FROM
THE PREVIOUS SENTENCE TO THIS SENTENCE (major to minor -- can be
found searching for [- spa])<br>
*FAC: ñaq qaica ten paʔatauec na . &lt;---- LANGUAGE SWITCH
FROM THE PREVIOUS SENTENCE TO THIS SENTENCE (minor to major --
can it be found?)<br>
*FAC: ñaq qaica ten . <br>
<br>
<br>
<br>
Thank you in advance,<br>
<br>
Gladys Ojea and Alex Cristia<br>
<br>
<br>
</div>
-- <br>
You received this message because you are subscribed to the Google
Groups "chibolts" group.<br>
To unsubscribe from this group and stop receiving emails from it,
send an email to chibolts+u...@googlegroups.com.<br>
To post to this group, send email to chib...@googlegroups.com.<br>
To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/chibolts/b465a75f-66da-4a69-86c1-35cd9bc50ea8%40googlegroups.com?utm_medium=email&utm_source=footer" target="_blank" rel="nofollow">https://groups.google.com/d/msgid/chibolts/b465a75f-66da-4a69-86c1-35cd9bc50ea8%40googlegroups.com</a>.<br>
For more options, visit <a href="https://groups.google.com/d/optout" target="_blank" rel="nofollow">https://groups.google.com/d/optout</a>.<br>
</blockquote>
<br>
</div>
</blockquote></div></div>
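<p>The pair extraction described above (sentences 2-3 and 5-6 in the English/French sequence) can also be sketched as a single awk pass that remembers the previous line's language, instead of a multi-step sed pipeline. This is a minimal sketch, not a CLAN feature: it assumes the input holds only main-tier utterances, one per line, and treats every line without the [- spa] code as Qom; the function name find_switches is hypothetical.</p>

```shell
# Hypothetical helper: print the line numbers of every adjacent pair of
# utterances whose language differs. Assumptions: one utterance per line,
# and any line without the literal code "[- spa]" is in Qom.
find_switches() {
  awk '{
    # index() does a fixed-string search, so no regex escaping is needed
    lang = (index($0, "[- spa]") ? "spa" : "qom")
    if (NR > 1 && lang != prev)
      printf "%d-%d (%s -> %s)\n", NR - 1, NR, prev, lang
    prev = lang
  }'
}

# Demo on the five utterances from the original question.
printf '%s\n' \
  '*FAC: ʔaqaixana .' \
  '*FAC: ten qaica naxa qaicaʔ .' \
  '*FAC: [- spa] vamos afuera .' \
  '*FAC: ñaq qaica ten paʔatauec na .' \
  '*FAC: ñaq qaica ten .' | find_switches
```

<p>Testing with index() instead of a regular expression sidesteps the escaping problem entirely: as a regular expression, [- spa] is a bracket expression that matches any single one of the characters "-", " ", "s", "p" or "a", not the literal code. For the five-utterance demo, the sketch reports the pairs 2-3 (qom -> spa) and 3-4 (spa -> qom), i.e. both switch directions.</p>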
<p></p>
-- <br />
To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/chibolts/36b82d51-4b8b-457b-ae97-37a1a484a963%40googlegroups.com?utm_medium=email&utm_source=footer">https://groups.google.com/d/msgid/chibolts/36b82d51-4b8b-457b-ae97-37a1a484a963%40googlegroups.com</a>.<br />