combo hangs on negation sometimes

Leonid Spektor spektor at andrew.cmu.edu
Tue Aug 6 21:02:36 UTC 2013


Paul,

	COMBO was rewritten over a year ago to find all possible matches for search patterns that have multiple OR elements, represented by the "+" symbol. It also handles complex search patterns that mix negative and positive elements better than before. However, it was never tested on all-negative searches like +s'!*:wh|*^!*?' or "+s!xxx", because COMBO searches for any match of the search pattern anywhere within an utterance, and the two negative search patterns above will technically match virtually all utterances. For example, "+s!xxx" will match the utterance:

*CHI:		xxx .

because the utterance delimiter "." is not "xxx", so the negated pattern "!xxx" succeeds at that position and the match counts as a success. The only way the pattern "!xxx" will fail to match an utterance is if the utterance contains only "xxx" and nothing else, like this:

*CHI:		xxx
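
To make the two readings concrete, here is a toy Python sketch. It is not COMBO's actual code: the function names are made up for illustration, and an utterance is treated as whitespace-separated tokens with the delimiter counted as a token. The first function is the literal "match anywhere" reading described above; the second is the reading most people seem to expect from "!xxx".

```python
def token_matches(pattern, token):
    """Toy single-token matcher: a leading '!' negates a literal comparison."""
    if pattern.startswith("!"):
        return token != pattern[1:]
    return token == pattern


def matches_anywhere(pattern, utterance):
    """Literal reading: succeed if ANY token satisfies the pattern."""
    return any(token_matches(pattern, tok) for tok in utterance.split())


def excludes_word(word, utterance):
    """Expected reading of '!word': succeed only if NO token equals word."""
    return all(tok != word for tok in utterance.split())


if __name__ == "__main__":
    for utt in ["xxx .", "xxx", "tape it up ."]:
        print(repr(utt), matches_anywhere("!xxx", utt), excludes_word("xxx", utt))
```

Under the literal reading, "xxx ." matches because "." is a token that is not "xxx"; under the expected reading it is rejected because one token is "xxx".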


You've shown that this is not what people expect. I will try to change COMBO to be less literal, but it will take some time. In the meantime, please use KWAL as an alternative, as you noted you can.


Leonid.



On Aug 5, 2013, at 21:29, paul wrote:

> I'm trying to rerun some combo searches that ran successfully a year ago but
> haven't been used since then. I've observed identical behavior on Windows XP,
> Ubuntu 12.04, and Arch Linux with CLAN 05-Aug-2013 and the version before it.
> 
> It seems that combo is hanging when encountering the negation operator "!" in
> certain contexts. For example:
> 
>   combo +t'*CHI' +t%mor +t%xgra +s'!*:wh|*^!*?' +d1 ~/corpora/childes/Valian/01a.cha
> 
> is intended to filter out utterances containing wh questions, although it's
> unclear to me exactly how to parse that search string (I didn't write it).
> 
> The same thing happens on a simpler combo line like
> 
>   combo @ +t'*CHI' +t%mor +t%xgra +s'!xxx' +d1 ~/corpora/childes/Valian/01a.cha
> 
> though I realize this could be rewritten with kwal.
> 
> In both cases combo never gets past
> 
>   combo +t*CHI +t%mor +t%xgra +s!xxx +d1 /home/paul/corpora/childes/Valian/01a.cha
>   Mon Aug  5 20:43:10 2013
>   combo (05-Aug-2013) is conducting analyses on:
>     ONLY speaker main tiers matching: *CHI;
>       and those speakers' ONLY dependent tiers matching: %MOR; %XGRA;
>   ****************************************
>   From file <01a.cha>
> 
> After poking around a little with gdb and enabling the debug print statement in
> combo.cpp:findmatch I get
> 
>   combo +t*CHI +t%mor +t%xgra +s!xxx +d1 /home/paul/corpora/childes/Valian/01a.cha
>   Mon Aug  5 20:54:54 2013
>   combo (05-Aug-2013) is conducting analyses on:
>     ONLY speaker main tiers matching: *CHI;
>       and those speakers' ONLY dependent tiers matching: %MOR; %XGRA;
>   ****************************************
>   From file <01a.cha>
>   1; pat=xxx;wild=0;origmac->neg=1;txt=tape it up and two tape players .       %mor: v|tape pro|it adv:loc|up coord|and det:num|two n|tape n|play&dv-agt-pl   .  %xgra: 1|4|coord 2|1|obj 3|1|jct 4|0|root 5|6|quant 6|4|coord 7|6|jct  8|4|punct 
>   1; pat=xxx;wild=0;origmac->neg=1;txt=tape it up and two tape players .       %mor: v|tape pro|it adv:loc|up coord|and det:num|two n|tape n|play&dv-agt-pl   .  %xgra: 1|4|coord 2|1|obj 3|1|jct 4|0|root 5|6|quant 6|4|coord 7|6|jct  8|4|punct 
>   1; pat=xxx;wild=0;origmac->neg=1;txt=tape it up and two tape players .       %mor: v|tape pro|it adv:loc|up coord|and det:num|two n|tape n|play&dv-agt-pl   .  %xgra: 1|4|coord 2|1|obj 3|1|jct 4|0|root 5|6|quant 6|4|coord 7|6|jct  8|4|punct
>   ... and so on until killing the process.
> 
> It appears that at some point in the file it stops moving across word
> boundaries/consuming input tokens and gets stuck. Note that "tape it up and
> two tape players" is not the first utterance in the file.
> 
> Searches like +s'!xxx^yyy' and +s'xxx^!yyy' run to completion.
> 
> Anyway, I'm not sure if this is a bug or maybe an abuse of deprecated syntax or
> something, but any advice would be appreciated.
> 
> -- 
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
> To post to this group, send email to chibolts at googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/cbdacf62-dcd5-4286-982d-c7b8ee263bcd%40googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  


