combo using clause delimiters
Bruno Estigarribia
brunilda at gmail.com
Wed Apr 23 12:22:12 UTC 2014
Thank you, Leonid. Just to clarify: transforming this
*RAM: Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c]
ha(s)ta@2 que@2 un@2 día@2 fatale@2 a#menda@1 hese@1 por@2
liga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2
que@2 a#heja@1 ichupe@1 [^c] .
into this
*RAM: Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c] +.
*RAM: ha(s)ta@2 que@2 un@2 día@2 fatale@2 a#menda@1 hese@1 por@2
liga-ite@4 , [^c] +.
*RAM: nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2
que@2 a#heja@1 ichupe@1 [^c] .
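
The transformation above could be scripted instead of done by hand. What follows is only a minimal sketch in Python, not a CLAN feature: the function name split_tier_into_clauses is made up for illustration, and it assumes that every clause on the tier ends in [^c] and that whatever follows the last [^c] is the original utterance terminator. Unlike the hand-made example, it splits at every [^c], so the line above would become four lines. It only handles the main tier and does nothing about dependent tiers.

```python
CLAUSE_MARK = "[^c]"


def split_tier_into_clauses(tier: str) -> list[str]:
    """Turn one main tier into one line per [^c]-delimited clause.

    Assumption (not from CLAN): every clause ends in [^c], and the text
    after the last [^c] is the original utterance terminator.
    """
    speaker, sep, body = tier.partition(":")
    if not sep or not speaker.startswith("*"):
        raise ValueError("expected a main tier such as '*RAM:\\t...'")

    # Collapse wrapped continuation lines and repeated whitespace.
    body = " ".join(body.split())

    parts = body.split(CLAUSE_MARK)
    if len(parts) == 1:
        # No clause marker at all: return the tier unchanged (normalised).
        return [f"{speaker}:\t{body}"]

    clauses = [part.strip() for part in parts[:-1]]
    tail = parts[-1].strip()  # normally just the terminator, e.g. "."

    result = []
    for index, clause in enumerate(clauses):
        is_last = index == len(clauses) - 1
        # Non-final clauses get "+."; the final one keeps the original ending.
        terminator = (tail or ".") if is_last else "+."
        result.append(f"{speaker}:\t{clause} {CLAUSE_MARK} {terminator}")
    return result


if __name__ == "__main__":
    example = ("*RAM:\tnunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2\n"
               "\tque@2 a#heja@1 ichupe@1 [^c] .")
    for line in split_tier_into_clauses(example):
        print(line)
```

The sketch writes a tab after the speaker code, as CHAT files normally do; the resulting file would still need to be checked before running COMBO on it.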
On Wednesday, April 23, 2014 1:12:09 AM UTC-4, Spektor, Leonid: CMU wrote:
>
> Bruno,
>
> I am afraid I don't have good news for you. Both COMBO and KWAL are
> designed to search one tier at a time. A tier is defined as the text that
> starts at a tier code name, i.e. a line beginning with an @, * or %
> character, and ends just before the next line that begins with one of
> those characters. CHAT was designed, for convenience, to have one
> utterance per tier. This means that putting multiple clauses or multiple
> utterances on one tier, as you would get by replacing every clause
> delimiter with an utterance delimiter like "+.", will not force COMBO to
> output only the matching utterance or clause within a speaker tier.
> Instead, COMBO will still output the whole tier. It might be possible to
> change COMBO to break up tiers into multiple clauses or utterances and to
> output only the ones that match, but that would create a serious problem
> if a user also wanted to output a corresponding dependent tier: without
> additional coding on that dependent tier, there is no way to figure out
> which part of it belongs to which utterance or clause on the main speaker
> tier.
>
> Leonid.
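
To make the tier definition above concrete, here is a small Python sketch that groups the lines of a file into tiers in the way described: a tier begins at a line starting with @, * or % and runs until the next such line. This is an illustration only, not how COMBO or KWAL are implemented; the function name iter_tiers is hypothetical.

```python
from typing import Iterable, Iterator

TIER_START = ("@", "*", "%")


def iter_tiers(lines: Iterable[str]) -> Iterator[str]:
    """Yield each tier as one string, joining wrapped continuation lines."""
    current: list[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line[:1] in TIER_START:
            # A new tier code name closes the previous tier.
            if current:
                yield " ".join(current)
            current = [line.strip()]
        elif current and line.strip():
            # Continuation line: it belongs to the tier opened above.
            current.append(line.strip())
    if current:
        yield " ".join(current)


if __name__ == "__main__":
    sample = [
        "@Begin",
        "*RAM:\tche@1 con@2 , [^c]",
        "\tha(s)ta@2 que@2 [^c] .",
        "%com:\texample dependent tier",
        "@End",
    ]
    for tier in iter_tiers(sample):
        print(repr(tier))
```

Because the wrapped *RAM lines come back as a single tier, a search over that unit sees both clauses at once, which is why the whole tier is reported rather than a single clause.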
>
>
>
> On Apr 22, 2014, at 17:21, Bruno Estigarribia <brun... at gmail.com>
> wrote:
>
> Hello everyone,
>
> I have a code-switching transcript where we used [^c] as a clause
> delimiter when a line (=utterance) consisted of more than one clause.
> We have also used @1 and @2 as word markers for each one of the two
> languages. And we have used @4 to mark mixed words. An example line follows
> (please ignore the morphological markings on the main tier for the
> moment--I've discussed this in a different thread and we intend to replace
> them with a proper MOR tier):
>
> *RAM: Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c]
> ha(s)ta@2 que@2 un@2 día@2 fatale@2 a#menda@1 hese@1 por@2
> liga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2
> que@2 a#heja@1 ichupe@1 [^c] .
>
> I want to find and count all mixed CLAUSES (intraclausal switching,
> excluding interclausal switching). The best I could come up with was this
> command:
> combo +r5 +t* +s(*\@1^*^![\^c]^*^*\@2)+(*\@2^*^![\^c]^*^*\@1)+(*\@4) +f
>
> This retrieves and outputs all lines with any sort of mix, so for example
> the line above would be output once. We want to output each matched CLAUSE,
> so the line above would actually give 4 output matches, since all 4
> clauses have some kind of mixing. (Note that this is not the same as
> outputting each match, since we collapse all matches obtained within a
> single clause--see the first clause in the example above.)
> I know that MLU has the +C option to work on clauses rather than
> utterances, but it is limited to MLU.
> I assume I can transform all clauses into unique lines by using the
> transcription break terminator +. and use COMBO the normal way. But is
> there another (perhaps more elegant) solution?
> Thank you
> Bruno
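
The per-clause count described in this message can also be sketched outside CLAN. The Python below is an assumption-laden illustration, not a COMBO replacement: it treats a clause as mixed if it contains a word tagged @4, or words tagged both @1 and @2, and it counts each clause at most once however many matches it contains. The function names are made up for the example.

```python
import re

# A language tag is @1, @2 or @4 at the end of a word.
LANG_TAG = re.compile(r"@([124])(?=\s|$)")
CLAUSE_MARK = "[^c]"


def clause_is_mixed(clause: str) -> bool:
    """True if the clause has a mixed word (@4) or words of both languages."""
    tags = set(LANG_TAG.findall(clause))
    return "4" in tags or {"1", "2"} <= tags


def count_mixed_clauses(tier_body: str) -> int:
    """Count mixed clauses on one tier, one count per clause at most."""
    # Pieces without any tagged word (e.g. the final ".") are not clauses.
    clauses = [c for c in tier_body.split(CLAUSE_MARK) if LANG_TAG.search(c)]
    return sum(clause_is_mixed(c) for c in clauses)


if __name__ == "__main__":
    example = (
        "Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c] "
        "ha(s)ta@2 que@2 un@2 día@2 fatale@2 a#menda@1 hese@1 por@2 "
        "liga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2 "
        "que@2 a#heja@1 ichupe@1 [^c] ."
    )
    print(count_mixed_clauses(example))
```

A tier whose clauses are each in a single language (interclausal switching only) is counted as zero under this definition.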
>
> --
> You received this message because you are subscribed to the Google Groups
> "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to chibolts+u... at googlegroups.com.
> To post to this group, send email to chib... at googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/chibolts/7ed33e08-fd66-4ca7-9880-aba5e4dd935f%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>
>