combo using clause delimiters
Leonid Spektor
spektor at andrew.cmu.edu
Wed Apr 23 05:12:09 UTC 2014
Bruno,
I am afraid I don't have good news for you. Both COMBO and KWAL are designed to search one tier at a time. A tier is defined as the text that begins with a tier code name, marked by an @, * or % character at the beginning of a new line, and ends just before the next line that begins with one of those characters. For convenience, CHAT was designed to have one utterance per tier. This means that putting multiple clauses or multiple utterances on one tier, as you would by replacing every clause delimiter with an utterance delimiter like "+.", will not make COMBO output only the matching utterance or clause within a speaker tier. Instead, COMBO will still output the whole tier.

It might be possible to change COMBO to break up tiers into their clauses or utterances and to output only the ones that match. But that would create a serious problem whenever a user also wants to output the corresponding dependent tier: without additional coding on the dependent tier, there is no way to tell which part of it belongs to which clause or utterance on the main speaker tier.
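For illustration only, and not as part of CLAN: a minimal Python sketch of the tier definition above, together with one way the clause count could be done outside COMBO. It assumes the [^c] delimiter and the @1/@2/@4 word markers from the transcript quoted below. The names split_tiers and mixed_clauses, the @Begin/@End lines, and the %com line in the sample are hypothetical and exist only for this example. Dependent tiers are deliberately left alone, since nothing marks which part of a dependent tier belongs to which clause.

```python
import re

# A tier starts at a line beginning with @, * or % and runs until the
# next such line; any other line is treated as a continuation.
TIER_START = re.compile(r"^[@*%]")
CLAUSE_DELIM = "[^c]"                      # clause delimiter in this transcript
LANG_TAG = re.compile(r"@([124])(?!\d)")   # @1/@2 = languages, @4 = mixed word

SAMPLE = """@Begin
*RAM:\tChe~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c]
\tha(s)ta@2 que@2 un@2 día@2 fatale@2 a#menda@1 hese@1 por@2
\tliga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2
\tque@2 a#heja@1 ichupe@1 [^c] .
%com:\thypothetical dependent tier, added only for this example
@End
"""


def split_tiers(text):
    """Group the lines of a CHAT-style transcript into tiers."""
    tiers = []
    for line in text.splitlines():
        if not line.strip():
            continue                        # ignore blank lines
        if TIER_START.match(line) or not tiers:
            tiers.append(line.strip())      # a new tier begins here
        else:
            tiers[-1] += " " + line.strip() # continuation of the current tier
    return tiers


def mixed_clauses(main_tier):
    """Return the clauses of one main tier that mix languages internally."""
    _, _, body = main_tier.partition(":")   # drop the speaker code
    hits = []
    for clause in body.split(CLAUSE_DELIM):
        tags = set(LANG_TAG.findall(clause))
        # Mixed = contains a mixed word (@4) or words of both languages.
        if "4" in tags or {"1", "2"} <= tags:
            hits.append(clause.strip())
    return hits


for tier in split_tiers(SAMPLE):
    if tier.startswith("*"):                # main speaker tiers only
        print(tier.split(":")[0], len(mixed_clauses(tier)))
```

Run on the sample, this reports 4 mixed clauses for *RAM, one per clause, with several matches inside a single clause collapsed into one hit. The text after the final [^c] holds only the terminator and is not counted.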
Leonid.
On Apr 22, 2014, at 17:21 , Bruno Estigarribia <brunilda at gmail.com> wrote:
> Hello everyone,
>
> I have a code-switching transcript where we used [^c] as a clause delimiter when a line (=utterance) consisted of more than one clause.
> We have also used @1 and @2 as word markers for each one of the two languages. And we have used @4 to mark mixed words. An example line follows (please ignore the morphological markings on the main tier for the moment--I've discussed this in a different thread and we intend to replace them with a proper MOR tier):
>
> *RAM: Che~niko@1 che#felí(z)@4 con@2 mi@2 concubin-o@2 akue@1 , [^c]
> ha(s)ta@2 que@2 un@2 día@2 fatale@2 a#menda@1 hese@1 por@2
> liga-ite@4 , [^c] nunca@2 má@2 a#de(s)cansá@4 , [^c] ha(s)ta@2
> que@2 a#heja@1 ichupe@1 [^c] .
>
> I want to find and count all mixed CLAUSES (intraclausal switching, excluding interclausal switching). The best I could come up with was this command:
> combo +r5 +t* +s(*\@1^*^![\^c]^*^*\@2)+(*\@2^*^![\^c]^*^*\@1)+(*\@4) +f
>
> This retrieves and outputs all lines with any sort of mix, so the line above, for example, would be output once. We want to output each matched CLAUSE instead, so the line above would actually give 4 output matches, since all 4 clauses have some kind of mixing. (Note that this is not the same as outputting each match, since we collapse all matches found within a single clause; see the first clause in the example above.)
> I know that MLU has the +C option to work on clauses rather than utterances, but it is limited to MLU.
> I assume I can transform all clauses into unique lines by using the transcription break terminator +. and use COMBO the normal way. But is there another (perhaps more elegant) solution?
> Thank you
> Bruno
>
> --
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
> To post to this group, send email to chibolts at googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/7ed33e08-fd66-4ca7-9880-aba5e4dd935f%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.