<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Alex,</p>
<p>Here are two commands that should do what I think you want. They
are not very elegant, but then again nothing involving
regular-expression searches is. For a switch from English to French, try
this command:</p>
<p>combo +b2 -l +s"\**:^*s:eng^*^\**:^*s:fra" *.cha</p>
<p>And for a switch from French to English, try this command:</p>
<p>combo +b2 -l +s"\**:^*s:fra^*^\**:^*s:eng" *.cha</p>
<p>If this does not work well for you and Gladys, please email me
a sample of your data file directly, so that I can see all the tags and
how they are used in the file and suggest a more precise command. I
understand that this kind of search is very valuable for studying
bilingual data, so we may even add new features to CLAN to better
support searching for language switches.<br>
</p>
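<p>For comparison, here is a minimal plain-shell sketch (awk, not a CLAN feature) that reports switches in both directions in one pass. It assumes that each utterance sits on a single line starting with "*", that utterances in the minority language carry the precode held in MARK (default "[- spa]") and that every other utterance is in the base language; the function name find_switches and the MARK variable are hypothetical.</p>

```shell
# Hypothetical helper, not part of CLAN: print each pair of consecutive
# utterances whose language differs, in either direction.
# Assumptions: one utterance per line, starting with "*" (wrapped tiers
# are not handled); utterances in the minority language contain the
# precode in MARK (default "[- spa]"); all others are in the base language.
find_switches() {
  mark=${MARK:-"[- spa]"}
  awk -v mark="$mark" '
    FNR == 1 { prev = "" }   # start afresh in each file: no cross-file pairs
    /^\*/ {
      # index() is a plain substring search, so the brackets in the
      # precode are not read as a regular-expression character class
      lang = index($0, mark) ? "marked" : "base"
      if (prev != "") {
        if (lang != prevlang) {
          print prev
          print $0
          print "--"
        }
      }
      prev = $0
      prevlang = lang
    }
  ' "$@"
}
```

<p>For example, find_switches sample.cha prints each switching pair followed by a "--" line (sample.cha is a placeholder name). index() is used instead of a regular expression because, unescaped, "[- spa]" in a regular expression is a character class that matches single characters rather than the code itself.</p>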
<p><br>
</p>
<pre class="moz-signature" cols="72">
Leonid.
</pre>
<div class="moz-cite-prefix">On 24-03-17 06:50, A Cristia wrote:<br>
</div>
<blockquote
cite="mid:36b82d51-4b8b-457b-ae97-37a1a484a963@googlegroups.com"
type="cite">
<div dir="ltr">Dear Leonid,
<div><br>
</div>
<div>Thank you for the fast response. What Gladys would like to
extract are *pairs* of consecutive sentences, one spoken in one
language and the next in the other. Imagine a sequence like this:</div>
<div>
<ol>
<li>English<br>
</li>
<li><b>English<br>
</b></li>
<li><b>French</b></li>
<li>French<br>
</li>
<li><b>French<br>
</b></li>
<li><b>English</b><br>
</li>
</ol>
</div>
<div>Gladys would like to extract sentences 2-3 (switch
Eng->Fr), and 5-6 (switch Fr->Eng).</div>
<div><br>
</div>
<div>Of course, this can be approximated with kwal: extract the
[- spa] sentences with some context, then check by hand whether
the context is also in Spanish (not a switch) or in Qom (a switch,
and thus part of what we would like to extract). I wonder whether
CLAN already has an elegant solution for this.</div>
<div><br>
</div>
<div>If I were to do this in bash, I'd do something not very
elegant like this (assuming the file contains only the transcribed
utterances, one per line, with each Spanish line starting with
[- spa], and using GNU sed and grep):</div>
<div>sed '/\[- spa]/!s/^/[- qom] /' | # add [- qom] to all lines
NOT marked with [- spa]</div>
<div> tr '\n' '\001' | # replace the line breaks with a one-byte
placeholder (GNU tr works byte by byte, so a multibyte € is unsafe)</div>
<div> sed 's/\x01\(.\{7\}\)/\1\x01\1/g' | # copy the next line's
7-character language marker to the end of the current line<br>
<div> tr '\001' '\n' | # turn the placeholder back into line
breaks</div>
<div>grep -A 1 -e '\[- qom].*\[- spa]' -e '\[- spa].*\[- qom]' # finally,
keep each line whose marker differs from the next line's, plus that next line</div>
<div><br>
</div>
<div>Does that make more sense? Thank you in advance,</div>
<div><br>
</div>
<div>Alex</div>
<div><br>
</div>
On Thursday, March 23, 2017 at 8:27:02 PM UTC+1, Spektor,
Leonid: CMU wrote:
<blockquote class="gmail_quote" style="margin: 0;margin-left:
0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">
<div bgcolor="#FFFFFF" text="#000000">
<p>Alex,</p>
<p> I am not sure what you mean by "LANGUAGE
SWITCH", but you can use the +s"[- spa]" option to analyze
only utterances with the "[- spa]" code, and the -s"[- spa]"
option to analyze only utterances that do not have the
"[- spa]" code. If this doesn't help, please email me more
examples of your input data files and examples of the
output that you want to get.<br>
</p>
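<p>As a rough plain-text analogue of these two options (grep, not CLAN), the hypothetical helper spa_utterances below lists the utterance lines that do, or with -v do not, contain the "[- spa]" code. It assumes one utterance per line, starting with "*".</p>

```shell
# Hypothetical helper, a rough grep analogue of +s"[- spa]" / -s"[- spa]"
# (plain text filtering, not CLAN). Assumes one utterance per line,
# starting with "*"; headers (@) and dependent tiers (%) are ignored.
spa_utterances() {
  if [ "$1" = "-v" ]; then
    shift
    # utterance lines WITHOUT the code; -F makes "[- spa]" a literal string
    grep -h '^\*' "$@" | grep -vF '[- spa]'
  else
    # utterance lines WITH the code
    grep -h '^\*' "$@" | grep -F '[- spa]'
  fi
}
```

<p>For example, spa_utterances sample.cha lists the [- spa] utterances and spa_utterances -v sample.cha lists the rest (sample.cha is a placeholder name). The sketch only shows which lines such a filter would select; it does not run any analysis on them.</p>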
<pre cols="72">Leonid.
</pre>
<div>On 23-03-17 14:19, A Cristia wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Dear clan users,<br>
<br>
In a bilingual corpus, is there a way to search for
pairs of sentences where a language switch has
occurred? A search for the tagged language will only
reveal switches from the major to the minor language,
but we'd like to extract both:<br>
<br>
*FAC: ʔaqaixana . <br>
*FAC: ten qaica naxa qaicaʔ . <br>
*FAC: [- spa] vamos afuera . <---- LANGUAGE
SWITCH FROM THE PREVIOUS SENTENCE TO THIS SENTENCE
(major to minor -- can be found searching for [- spa])<br>
*FAC: ñaq qaica ten paʔatauec na . <----
LANGUAGE SWITCH FROM THE PREVIOUS SENTENCE TO THIS
SENTENCE (minor to major -- can it be found?)<br>
*FAC: ñaq qaica ten . <br>
<br>
<br>
<br>
Thank you in advance,<br>
<br>
Gladys Ojea and Alex Cristia<br>
<br>
<br>
</div>
-- <br>
You received this message because you are subscribed to
the Google Groups "chibolts" group.<br>
To unsubscribe from this group and stop receiving emails
from it, send an email to <a moz-do-not-send="true"
href="javascript:" target="_blank"
gdf-obfuscated-mailto="uaejvwuHAAAJ" rel="nofollow"
onmousedown="this.href='javascript:';return true;"
onclick="this.href='javascript:';return true;">chibolts+u...@<wbr>googlegroups.com</a>.<br>
To post to this group, send email to <a
moz-do-not-send="true" href="javascript:"
target="_blank" gdf-obfuscated-mailto="uaejvwuHAAAJ"
rel="nofollow"
onmousedown="this.href='javascript:';return true;"
onclick="this.href='javascript:';return true;">chib...@googlegroups.com</a>.<br>
To view this discussion on the web visit <a
moz-do-not-send="true"
href="https://groups.google.com/d/msgid/chibolts/b465a75f-66da-4a69-86c1-35cd9bc50ea8%40googlegroups.com?utm_medium=email&utm_source=footer"
target="_blank" rel="nofollow"
onmousedown="this.href='https://groups.google.com/d/msgid/chibolts/b465a75f-66da-4a69-86c1-35cd9bc50ea8%40googlegroups.com?utm_medium\x3demail\x26utm_source\x3dfooter';return
true;"
onclick="this.href='https://groups.google.com/d/msgid/chibolts/b465a75f-66da-4a69-86c1-35cd9bc50ea8%40googlegroups.com?utm_medium\x3demail\x26utm_source\x3dfooter';return
true;">https://groups.google.com/d/<wbr>msgid/chibolts/b465a75f-66da-<wbr>4a69-86c1-35cd9bc50ea8%<wbr>40googlegroups.com</a>.<br>
For more options, visit <a moz-do-not-send="true"
href="https://groups.google.com/d/optout"
target="_blank" rel="nofollow"
onmousedown="this.href='https://groups.google.com/d/optout';return
true;"
onclick="this.href='https://groups.google.com/d/optout';return
true;">https://groups.google.com/d/<wbr>optout</a>.<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
</body>
</html>