combo using clause delimiters

Wed Apr 23 12:25:19 UTC 2014

(Sorry, sent unfinished by mistake)
Thank you Leonid. Just to clarify: transforming this
*RAM:    Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c]
    ha(s)ta@2 que@2 un@2 día@2 fatale@2 a#menda@1 hese@1 por@2
    liga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2
    que@2 a#heja@1 ichupe@1 [^c] .

into this

*RAM:    Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c] +.
*RAM:    ha(s)ta@2 que@2 un@2 día@2 fatale@2 a#menda@1 hese@1 por@2
    liga-ite@4 , [^c] +.
*RAM:    nunca@2 má@2 a#de(s)cansá@4 , [^c] +.
*RAM:    ha(s)ta@2
    que@2 a#heja@1 ichupe@1 [^c] .

would work, right? Now each tier contains only one clause. But this causes 
other problems, in that now you cannot do measures on utterances anymore, 
correct? There is no way for any program to see a transcription break +. 
and recognize that that tier's content is in the same utterance as 
something that follows...
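
In case it helps to make the target concrete, here is a minimal sketch of the
clause-level count done outside CLAN, in Python. It is not a CLAN feature and
not a replacement for COMBO. It assumes the main tier has already been read
into one string (speaker code removed, continuation lines joined), that every
word carries one of the tags @1, @2 or @4, and that [^c] closes each clause.
The function names are made up for this illustration.

```python
import re

CLAUSE_DELIM = "[^c]"                      # clause delimiter used in this transcript
LANG_TAG = re.compile(r"@([124])(?!\w)")   # @1 / @2 = languages, @4 = mixed word


def split_clauses(utterance):
    """Split one main-tier string into clauses at each [^c] marker."""
    parts = [p.strip() for p in utterance.split(CLAUSE_DELIM)]
    # Keep only pieces that contain words; this drops the trailing terminator.
    return [p for p in parts if re.search(r"\w", p)]


def is_mixed(clause):
    """True if the clause has a mixed word (@4) or words from both languages."""
    tags = set(LANG_TAG.findall(clause))
    return "4" in tags or {"1", "2"} <= tags


def count_mixed_clauses(utterance):
    """Count clauses with intraclausal switching, one hit per clause."""
    return sum(is_mixed(c) for c in split_clauses(utterance))


example = (
    "Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c] "
    "ha(s)ta@2 que@2 un@2 día@2 fatale@2 a#menda@1 hese@1 por@2 "
    "liga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2 "
    "que@2 a#heja@1 ichupe@1 [^c] ."
)

print(count_mixed_clauses(example))  # 4
```

Because each clause is tested once, several matches inside the same clause
collapse into a single hit, and the utterance itself is never split, so
utterance-level measures are unaffected.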
Thanks
Bruno

On Tuesday, April 22, 2014 5:21:00 PM UTC-4, Bruno Estigarribia wrote:
>
> Hello everyone,
>
> I have a code-switching transcript where we used [^c] as a clause 
> delimiter when a line (=utterance) consisted of more than one clause.
> We have also used @1 and @2 as word markers for each one of the two 
> languages. And we have used @4 to mark mixed words. An example line follows 
> (please ignore the morphological markings on the main tier for the 
> moment--I've discussed this in a different thread and we intend to replace 
> them with a proper MOR tier):
>
> *RAM:    Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c]
>     ha(s)ta@2 que@2 un@2 día@2 fatale@2 a#menda@1 hese@1 por@2
>     liga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2
>     que@2 a#heja@1 ichupe@1 [^c] .
>
> I want to find and count all mixed CLAUSES (intraclausal switching, 
> excluding interclausal switching). The best I could come up with was this 
> command:
> combo +r5 +t* +s(*\@1^*^![\^c]^*^*\@2)+(*\@2^*^![\^c]^*^*\@1)+(*\@4) +f
>
> This retrieves every line with any sort of mix, so the line above would 
> be output once. We want to output each matched CLAUSE instead, so the line 
> above would actually give 4 matches, since all 4 clauses have some kind of 
> mixing. (Note that this is not the same as outputting each match, since we 
> collapse all matches obtained within a single clause; see the first clause 
> in the example above.)
> I know that MLU has the +C option to work on clauses rather than 
> utterances, but it is limited to MLU.
> I assume I can transform all clauses into unique lines by using the 
> transcription break terminator +. and use COMBO the normal way. But is 
> there another (perhaps more elegant) solution?
> Thank you
> Bruno
>
