language alternation search
alecristia at gmail.com
Fri Mar 24 06:50:01 EDT 2017
Thank you for the fast response. What Gladys would like to extract are *pairs*
of sentences, one spoken in one language and the other in the other language.
Imagine a sequence like this:
Gladys would like to extract sentences 2-3 (switch Eng->Fr), and 5-6
Of course, this can be approximated by using kwal, extracting the [- spa]
sentences with some context, and then looking through by hand to see if the
context is also in Spanish (so not a switch) or in Qom (yes, it's a switch,
and thus part of what we would like to extract). I wonder if there is an
elegant solution for this in CLAN already.
If I were to do this in bash, I'd do something not very elegant like the
following (assuming the input holds only the content of the transcription, one
utterance per line, with [- spa] at the start of each Spanish line; GNU sed):

sed -E '/\[- spa\]/!s/^/[- qom] /' |            # add [- qom] to all lines NOT marked with [- spa]
tr '\n' '\001' |                                # replace the line breaks with a one-byte placeholder
sed -E 's/\x01(\[- (qom|spa)\])/ \1\x01\1/g' |  # copy each line's marker onto the end of the previous line
tr '\001' '\n' |                                # turn the placeholder back into line breaks
grep -A 1 -E '\[- qom\].*\[- spa\]$|\[- spa\].*\[- qom\]$'  # lines whose marker differs from the next line's, plus that next line
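A possibly simpler route is a single awk pass that remembers the language of the previous utterance and prints both utterances whenever it changes, which catches Qom->Spanish and Spanish->Qom switches alike. This is only a sketch under my own assumptions, not anything CLAN provides: one utterance per line (tab-indented continuation lines are not handled), main-tier lines start with "*" (so %mor and @ lines are skipped), and every utterance without the [- spa] precode is Qom. The function name find_switches is made up for illustration.

```shell
# Hypothetical helper: print each pair of consecutive main-tier utterances
# whose language differs, followed by a "--" separator.
# Assumptions: one utterance per line; main-tier lines start with "*";
# Spanish utterances contain "[- spa]"; all other utterances are Qom.
find_switches() {
  awk '
    /^\*/ {                                             # main-tier (speaker) lines only
      lang = (index($0, "[- spa]") > 0) ? "spa" : "qom" # untagged lines assumed Qom
      if (seen && lang != prev_lang) {
        print prev_line                                 # utterance before the switch
        print $0                                        # utterance after the switch
        print "--"                                      # separator between pairs
      }
      prev_lang = lang
      prev_line = $0
      seen = 1
    }
  '
}
```

Usage would be something like: find_switches < transcript.cha (hypothetical file name). An utterance sitting between two switches, like "vamos afuera" in the example from our first message, is printed in both pairs.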
Does that make more sense? Thank you in advance,
On Thursday, March 23, 2017 at 8:27:02 PM UTC+1, Spektor, Leonid: CMU wrote:
> I am not sure what you mean by "LANGUAGE SWITCH", but you can use the
> +s"[- spa]" option to analyze only utterances with the "[- spa]" code and the
> -s"[- spa]" option to analyze only utterances that do not have the "[- spa]"
> code. If this doesn't help, then please email me more input data file
> examples and examples of the output that you want to get.
> On 23-03-17 14:19, A Cristia wrote:
> Dear CLAN users,
> In a bilingual corpus, is there a way to search for pairs of sentences
> where a language switch has occurred? A search for the tagged language will
> only reveal switches from the major to the minor language, but we'd like to
> extract both:
> *FAC: ʔaqaixana .
> *FAC: ten qaica naxa qaicaʔ .
> *FAC: [- spa] vamos afuera . <---- LANGUAGE SWITCH FROM THE PREVIOUS
> SENTENCE TO THIS SENTENCE (major to minor -- can be found by searching for [- spa])
> *FAC: ñaq qaica ten paʔatauec na . <---- LANGUAGE SWITCH FROM THE
> PREVIOUS SENTENCE TO THIS SENTENCE (minor to major -- can it be found?)
> *FAC: ñaq qaica ten .
> Thank you in advance,
> Gladys Ojea and Alex Cristia
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
To post to this group, send email to chibolts at googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/36b82d51-4b8b-457b-ae97-37a1a484a963%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.