combo using clause delimiters

Bruno Estigarribia brunilda at gmail.com
Wed Apr 23 12:22:12 UTC 2014


Thank you Leonid. Just to clarify: transforming this
*RAM:    Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c]
    ha(s)ta@2 que@2 un@2a@2 fatale@2 a#menda@1 hese@1 por@2
    liga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2
    que@2 a#heja@1 ichupe@1 [^c] .

into this

*RAM:    Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c] +.
*RAM:    ha(s)ta@2 que@2 un@2a@2 fatale@2 a#menda@1 hese@1 por@2
    liga-ite@4 , [^c] +.
*RAM:    nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2
    que@2 a#heja@1 ichupe@1 [^c] .
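
A minimal Python sketch of this kind of rewrite, done outside CLAN, might look like the following. It is not a CLAN feature: the function name split_clauses is made up for illustration, and it assumes that every clause, including the last one, is followed by [^c] and that the original terminator comes after the last [^c], as in the example above. Unlike the hand-edited version, it puts every clause on its own tier.

```python
CLAUSE_MARK = "[^c]"


def split_clauses(tier):
    """Return one main-tier string per clause of a [^c]-coded main tier.

    Hypothetical helper: every clause except the last is closed with "+.",
    and the last clause keeps whatever followed the final [^c] (normally
    the original utterance terminator).
    """
    speaker, _, body = tier.partition(":")
    body = " ".join(body.split())          # join continuation lines
    parts = body.split(CLAUSE_MARK)
    clauses = [p.strip() for p in parts[:-1]]
    tail = parts[-1].strip()               # text after the last [^c]
    if not clauses:                        # no clause marker: leave as is
        return [f"{speaker}:\t{body}"]
    out = []
    for i, clause in enumerate(clauses):
        end = tail if i == len(clauses) - 1 else "+."
        out.append(f"{speaker}:\t{clause} {CLAUSE_MARK} {end}".rstrip())
    return out


for line in split_clauses("*RAM:\tche@1 con@2 , [^c]\n\tnunca@2 [^c] ."):
    print(line)
```

Any dependent tiers would still need to be re-attached by hand, since nothing in the file says which part of a dependent tier belongs to which clause.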

On Wednesday, April 23, 2014 1:12:09 AM UTC-4, Spektor, Leonid: CMU wrote:
>
> Bruno,
>
> I am afraid I don't have good news for you. Both COMBO and KWAL are 
> designed to search one tier at a time. A tier is defined as the text that 
> starts at a tier code name (a line beginning with @, * or %) and ends just 
> before the next line that begins with one of those characters. For 
> convenience, CHAT was designed to have one utterance per tier. This means 
> that having multiple clauses or utterances on one tier, as you would by 
> replacing the clause delimiters with an utterance delimiter like "+.", 
> will not make COMBO output only the matching utterance or clause within a 
> speaker tier. Instead, COMBO will still output the whole tier. It might be 
> possible to change COMBO to break up tiers into their clauses or 
> utterances and output only the ones that match, but that would create a 
> serious problem if a user also wanted to output a corresponding dependent 
> tier: without additional coding on that dependent tier, there is no way to 
> tell which part of it belongs to which utterance or clause on the main 
> speaker tier.
>
> Leonid.
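
The tier definition above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not how CLAN is implemented, and the name read_tiers is hypothetical.

```python
def read_tiers(lines):
    """Group the lines of a CHAT file into tiers.

    A tier starts at a line whose first character is @, * or % and runs
    up to the next such line; any other non-blank line is treated as a
    continuation of the current tier.
    """
    tiers = []
    for line in lines:
        line = line.rstrip("\n")
        if line[:1] in ("@", "*", "%"):
            tiers.append(line)                  # a new tier begins here
        elif tiers and line.strip():
            tiers[-1] += " " + line.strip()     # continuation line
    return tiers


sample = ["@Begin", "*RAM:\tche@1 , [^c]", "\tcon@2 [^c] .", "%com:\tnote", "@End"]
for tier in read_tiers(sample):
    print(repr(tier))
```

Because the whole *RAM tier comes back as a single unit, a search that works tier by tier has nothing smaller than the tier to report.
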
>
>
>  
> On Apr 22, 2014, at 17:21, Bruno Estigarribia <brun... at gmail.com> wrote:
>
> Hello everyone,
>
> I have a code-switching transcript where we used [^c] as a clause 
> delimiter when a line (=utterance) consisted of more than one clause.
> We have also used @1 and @2 as word markers for each one of the two 
> languages. And we have used @4 to mark mixed words. An example line follows 
> (please ignore the morphological markings on the main tier for the 
> moment--I've discussed this in a different thread and we intend to replace 
> them with a proper MOR tier):
>
> *RAM:    Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c]
>     ha(s)ta@2 que@2 un@2a@2 fatale@2 a#menda@1 hese@1 por@2
>     liga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2
>     que@2 a#heja@1 ichupe@1 [^c] .
>
> I want to find and count all mixed CLAUSES (intraclausal switching, 
> excluding interclausal switching). The best I could come up with was this 
> command:
> combo +r5 +t* +s(*\@1^*^![\^c]^*^*\@2)+(*\@2^*^![\^c]^*^*\@1)+(*\@4) +f
>
> This retrieves and outputs all lines with any sort of mix, so for example 
> the line above would be output once. We want to output each matched CLAUSE, 
> so the line above would actually give 4 output matches, since all 4 
> clauses have some kind of mixing. (Note that this is not the same as 
> outputting each match, since we collapse all matches obtained within a 
> single clause; see the first clause in the example above.)
> I know that MLU has the +C option to work on clauses rather than 
> utterances, but it is limited to MLU.
> I assume I can transform all clauses into unique lines by using the 
> transcription break terminator +. and use COMBO the normal way. But is 
> there another (perhaps more elegant) solution?
> Thank you
> Bruno
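
One possible workaround outside CLAN is to count the mixed clauses directly with a short script. The Python sketch below rests on two assumptions taken from the description above: clauses are delimited by [^c], and a clause counts as mixed if it contains both an @1 word and an @2 word, or any @4 word. The function name count_mixed_clauses is made up for illustration.

```python
import re

# Language marker at the end of a word: @1, @2 or @4.
LANG_TAG = re.compile(r"@([124])\b")


def count_mixed_clauses(tier):
    """Count the clauses of one main tier that show intraclausal mixing."""
    count = 0
    for clause in tier.split("[^c]"):
        tags = set(LANG_TAG.findall(clause))
        if "4" in tags or {"1", "2"} <= tags:
            count += 1
    return count


example = (
    "*RAM:\tChe~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c] "
    "ha(s)ta@2 que@2 un@2a@2 fatale@2 a#menda@1 hese@1 por@2 "
    "liga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2 "
    "que@2 a#heja@1 ichupe@1 [^c] ."
)
print(count_mixed_clauses(example))  # 4, one per clause of the example tier
```

Several matches inside one clause are collapsed into a single count, which is the behaviour asked for above.
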
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to chibolts+u... at googlegroups.com.
> To post to this group, send email to chib... at googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/chibolts/7ed33e08-fd66-4ca7-9880-aba5e4dd935f%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>
>
