language alternation search
A Cristia
alecristia at gmail.com
Fri Mar 24 10:50:01 UTC 2017
Dear Leonid,
Thank you for the fast response. What Gladys would like to extract are *pairs*
of sentences, one spoken in one language, the other in another. Imagine a
sequence like this:
1. English
2. *English*
3. *French*
4. French
5. *French*
6. *English*
Gladys would like to extract sentences 2-3 (switch Eng->Fr), and 5-6
(switch Fr->Eng).
Of course, this can be approximated by using kwal, extracting the [- spa]
sentences with some context, and then looking through by hand to see if the
context is also in Spanish (so not a switch) or in Qom (yes, it's a switch,
and thus part of what we would like to extract). I wonder if there is an
elegant solution for this in CLAN already.
If I were to do this in bash, I'd do something not very elegant like
(imagining there is only the content of the transcription):
sed -E '/\[- spa\]/!s/^/[- qom] /' |       # add [- qom] to all lines NOT marked with [- spa]
tr '\n' '%' |                              # next replace the line breaks by a placeholder (a single-byte character assumed absent from the text)
sed -E 's/%(\[- [a-z]{3}\])/\1%\1/g' |     # duplicate the language marker on each side of the placeholder
tr '%' '\n' |                              # translate the placeholder back into line breaks
grep -E -A 1 '^\[- qom\].*\[- spa\]$|^\[- spa\].*\[- qom\]$'   # and finally extract sentences that have both language markers, plus the sentence that follows
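
The same idea can also be sketched as a single awk pass that remembers the previous line and prints a pair whenever the language changes. This is only a sketch under assumptions: one utterance per line, every line without "[- spa]" counted as Qom, and no header or dependent tiers in the input. The name find_switches is a hypothetical helper, not a CLAN command.

```shell
# Sketch only: print each pair of consecutive utterances whose language differs.
# Assumes one utterance per line and that any line without "[- spa]" is Qom.
# find_switches is a hypothetical helper name, not part of CLAN.
find_switches() {
  awk '
    {
      # Classify the current line by the presence of the [- spa] code.
      lang = ($0 ~ /\[- spa\]/) ? "spa" : "qom"
      # From the second line on, report a pair when the language changes.
      if (NR > 1 && lang != prev_lang) {
        print prev_lang "->" lang
        print prev_line
        print $0
        print "--"
      }
      prev_lang = lang
      prev_line = $0
    }
  ' "$@"
}
```

It reads from the files given as arguments, or from standard input if there are none. Because the match on "[- spa]" is not anchored to the start of the line, lines that still carry a speaker code such as *FAC: should be classified the same way.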
Does that make more sense? Thank you in advance,
Alex
On Thursday, March 23, 2017 at 8:27:02 PM UTC+1, Spektor, Leonid: CMU wrote:
>
> Alex,
>
> I am not sure what you mean by "LANGUAGE SWITCH", but you can use the
> +s"[- spa]" option to analyze only utterances with the "[- spa]" code, and the
> -s"[- spa]" option to analyze only utterances that do not have it. If
> this doesn't help, then please email me more examples of input data files
> and examples of the output that you want to get.
>
> Leonid.
>
>
> On 23-03-17 14:19, A Cristia wrote:
>
> Dear clan users,
>
> In a bilingual corpus, is there a way to search for pairs of sentences
> where a language switch has occurred? A search for the tagged language will
> only reveal switches from the minor to the major language, but we'd like to
> extract both:
>
> *FAC: ʔaqaixana .
> *FAC: ten qaica naxa qaicaʔ .
> *FAC: [- spa] vamos afuera . <---- LANGUAGE SWITCH FROM THE PREVIOUS SENTENCE TO THIS SENTENCE (major to minor -- can be found searching for [- spa])
> *FAC: ñaq qaica ten paʔatauec na . <---- LANGUAGE SWITCH FROM THE PREVIOUS SENTENCE TO THIS SENTENCE (minor to major -- can it be found?)
> *FAC: ñaq qaica ten .
>
>
>
> Thank you in advance,
>
> Gladys Ojea and Alex Cristia
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to chibolts+u... at googlegroups.com.
> To post to this group, send email to chib... at googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/chibolts/b465a75f-66da-4a69-86c1-35cd9bc50ea8%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>
>