Searching for combinations of words from a file

Leonid Spektor spektor at andrew.cmu.edu
Fri May 15 20:36:51 UTC 2009


Jonathan,

    combo does not fully implement regular expressions; other applications,
such as BBEdit, do a better job of that. But CLAN has two advantages: you can
restrict a search to the tiers of specific speakers, and you can send the
result of your search to other CLAN commands for further analysis. Of the
commands you asked about, only combo can search for adjacent function words
like these, so use combo rather than kwal.

The "*" operator is a wildcard: it matches any part of a word, so "u*"
matches any word beginning with "u". The "^" operator means that the word on
the left of "^" is immediately followed by the word on the right. The "+"
operator works like OR: a match for the item on either side of "+" counts.

The file that holds the list of items you are searching for should have a
".cut" extension, with one item per line. When you open it in the CLAN
editor, you should see "[TEXT]" on the black status line at the bottom of the
window (the line that shows line numbers). If you see "[CHAT]" instead, use
the "Mode->Chat mode" menu item to switch to TEXT mode.

You will need the latest version of CLAN before you try this. If I understand
correctly what you are trying to do, here are a few examples for you to try.

I have created the following "test.cha" file:

@Begin
@Languages:    en
@Participants:    MOT  Mother, CHI Target_Child
@ID:    en|change_me_later|MOT|||||Mother||
@ID:    en|change_me_later|CHI|||||Target_Child||
*CHI:    text  el uword word2 word3 la  huword .
*MOT:    word4 word5 .
*CHI:    word4 word6 .
*MOT:    la uword word7 el word8 huword .
*CHI:    word8 el huword word10.
@End

You can use either of the following commands; here file.cut lists the four
items el^hu*, el^u*, la^hu* and la^u*, one per line:

combo +sel^hu*+el^u*+la^hu*+la^u* test.cha
combo +s@file.cut test.cha

Both commands produce the following output:

(((el^hu*)+(el^u*)+(la^hu*)+(la^u*)))
combo +s@file.cut test.cha
Fri May 15 15:54:42 2009
combo (15-May-2009) is conducting analyses on:
  ALL speaker tiers
****************************************
From file <TEST.cha>
----------------------------------------
*** File "TEST.cha": line 6.
*CHI:    text  (1)el (1)uword word2 word3 (2)la  (2)huword .
----------------------------------------
*** File "TEST.cha": line 9.
*MOT:    (1)la (1)uword word7 el word8 huword .
----------------------------------------
*** File "TEST.cha": line 10.
*CHI:    word8 (1)el (1)huword word10 .
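To make the three operators concrete, here is a rough Python emulation of the
search above. It is only a sketch under stated assumptions, not how CLAN
implements combo: "^" is taken to mean "immediately followed by" (consistent
with line 9, where "el word8 huword" was not matched), "*" is handled by
Python's fnmatch module (which also treats "?" and "[...]" specially), matching
is case-sensitive, and the helper names (parse_command_line, parse_cut_file,
find_matches, search) are hypothetical. It also shows why the two commands are
equivalent: splitting the command-line string on "+" and reading the list file
one item per line give the same alternatives.

```python
from fnmatch import fnmatchcase

# Speaker tiers copied from the test.cha example (header lines omitted).
TIERS = [
    ("CHI", "text  el uword word2 word3 la  huword ."),
    ("MOT", "word4 word5 ."),
    ("CHI", "word4 word6 ."),
    ("MOT", "la uword word7 el word8 huword ."),
    ("CHI", "word8 el huword word10."),
]


def parse_command_line(expr):
    """Split a search string like 'el^hu*+el^u*' on '+' (OR) into alternatives."""
    return [alt for alt in expr.split("+") if alt]


def parse_cut_file(text):
    """Treat each non-blank line of a .cut-style list as one alternative."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def find_matches(words, alternatives):
    """Return sorted (start, end) word spans matched by any alternative.

    Assumption: an alternative is a '^'-separated sequence of patterns that
    must match consecutive words; '*' is a wildcard (via fnmatch, case-sensitive).
    """
    spans = set()
    for alt in alternatives:
        patterns = alt.split("^")
        n = len(patterns)
        for i in range(len(words) - n + 1):
            if all(fnmatchcase(words[i + k], p) for k, p in enumerate(patterns)):
                spans.add((i, i + n))
    return sorted(spans)


def search(tiers, alternatives):
    """Return (speaker, matched phrases) for every tier with at least one match."""
    results = []
    for speaker, text in tiers:
        words = text.split()
        spans = find_matches(words, alternatives)
        if spans:
            results.append((speaker, [" ".join(words[s:e]) for s, e in spans]))
    return results


if __name__ == "__main__":
    from_cmd = parse_command_line("el^hu*+el^u*+la^hu*+la^u*")
    from_cut = parse_cut_file("el^hu*\nel^u*\nla^hu*\nla^u*\n")
    print(from_cmd == from_cut)  # both forms give the same four alternatives
    for speaker, phrases in search(TIERS, from_cmd):
        print(speaker, phrases)
```

Running it prints True and then the same three tiers and word pairs that combo
marked with (1) and (2) above.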

To search only the child's tiers, add the "+t*CHI" option to the commands
above. You can also make the result of this search usable by other CLAN
commands by adding the "+d" option. For example:

combo +s@file.cut +d test.cha
Fri May 15 15:59:09 2009
combo (15-May-2009) is conducting analyses on:
  ALL speaker tiers
****************************************
From file <test.cha>
*CHI:    text  el uword word2 word3 la  huword .
*MOT:    la uword word7 el word8 huword .
*CHI:    word8 el huword word10 .

If you want only the items that match your search to appear in the output,
use the "+d3" option. For example:

combo +s@file.cut +d3 test.cha
Fri May 15 16:07:46 2009
combo (15-May-2009) is conducting analyses on:
  ALL speaker tiers
****************************************
From file <test.cha>
@Comment:    -----------------------------------
@Comment:    *** File "test.cha": line 6;
*CHI:          el uword             la  huword
@Comment:    -----------------------------------
@Comment:    *** File "test.cha": line 9;
*MOT:    la uword  
@Comment:    -----------------------------------
@Comment:    *** File "test.cha": line 10;
*CHI:          el huword
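A small Python sketch can mimic what "+d3" keeps and what "+t*CHI" filters.
These are assumptions, not CLAN's code: the speakers argument and the helper
names (matched_words, extract) are hypothetical, the operator rules assumed are
"^" = "immediately followed by" and "*" = wildcard (via fnmatch, case-sensitive),
and unlike real +d3 output the sketch drops the @Comment lines and the spacing
that keeps the matched words aligned with the original tier.

```python
from fnmatch import fnmatchcase

# Speaker tiers copied from the test.cha example (header lines omitted).
TIERS = [
    ("CHI", "text  el uword word2 word3 la  huword ."),
    ("MOT", "word4 word5 ."),
    ("CHI", "word4 word6 ."),
    ("MOT", "la uword word7 el word8 huword ."),
    ("CHI", "word8 el huword word10."),
]
ALTERNATIVES = ["el^hu*", "el^u*", "la^hu*", "la^u*"]


def matched_words(words, alternatives):
    """Return only the words covered by some match, kept in tier order."""
    keep = set()
    for alt in alternatives:
        patterns = alt.split("^")  # assumption: each pattern matches the next word
        n = len(patterns)
        for i in range(len(words) - n + 1):
            if all(fnmatchcase(words[i + k], p) for k, p in enumerate(patterns)):
                keep.update(range(i, i + n))
    return [words[i] for i in sorted(keep)]


def extract(tiers, alternatives, speakers=None):
    """Return (speaker, matched words) for each tier with a match.

    `speakers` is a hypothetical stand-in for +t: when given, tiers of
    other speakers are skipped.
    """
    results = []
    for speaker, text in tiers:
        if speakers is not None and speaker not in speakers:
            continue
        hits = matched_words(text.split(), alternatives)
        if hits:
            results.append((speaker, hits))
    return results


if __name__ == "__main__":
    for speaker, hits in extract(TIERS, ALTERNATIVES):
        print(f"*{speaker}:\t{' '.join(hits)}")
    print("--- child tiers only ---")
    for speaker, hits in extract(TIERS, ALTERNATIVES, speakers={"CHI"}):
        print(f"*{speaker}:\t{' '.join(hits)}")
```

With all speakers it keeps "el uword la huword", "la uword" and "el huword",
the same words as the +d3 output above; with speakers={"CHI"} only the two
child tiers remain.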

I am attaching "test.cha", "commands.txt" and three search-item list files to
this message for you to try.

I hope this helps,

Leonid.

On 15-05-09 03:24, "Jonathan" <jonathan.udoff at gmail.com> wrote:

> 
> I've read through the CLAN manual, and I'm still unclear how to
> reference a word list file to do a complex search. Or if there are any
> format requirements for the file.
> 
> I am trying to search for certain combinations of adjacent function
> words and clitics in Catalan. For example, looking for underuse of
> elision of articles with nouns, I need to find all instances of "el/la
> (h)V". So I created a word list of el^hu*, el^u*, la^hu*, la^u*, etc.
> The words are separated by carriage returns, and I saved the file as
> a .cha in the folder that lib is set to. But I either get errors or no
> matches when trying to run a search linking to that file.
> 
> 1) Is the .cha file I created correct? Is it legal to use the * and ^
> operators?
> 2) What is the proper format for calling that file using the +s
> switch? I've seen +sfilename.cha, +s@filename, +s@, +s"filename.cha"
> and none of these work, nor do their variants.
> 3) Should I be doing this search via the kwal or combo commands - is
> there a difference in this case?
> 4) For future reference, what is the proper syntax to refer to
> multiple files in a +s switch, eg, +s@file1^@file2^@file1 ?
> 
> Sorry I'm such a neophyte, but there don't seem to be many web-based
> resources for CLAN besides this forum and the manual!
> 


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "chibolts" group.
To post to this group, send email to chibolts at googlegroups.com
To unsubscribe from this group, send email to chibolts+unsubscribe at googlegroups.com
For more options, visit this group at http://groups.google.com/group/chibolts?hl=en
-~----------~----~----~----~------~----~------~--~---

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: commands.txt
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20090515/c6c165ab/attachment-0001.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.cha
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20090515/c6c165ab/attachment-0004.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: file2.cut
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20090515/c6c165ab/attachment-0005.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: file1.cut
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20090515/c6c165ab/attachment-0006.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: file.cut
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20090515/c6c165ab/attachment-0007.ksh>


More information about the Chibolts mailing list