CLAN and Excel

Leonid Spektor spektor at andrew.cmu.edu
Mon Mar 2 19:58:08 UTC 2009


Susanna,

    The search patterns that you use to find codes in your data are
overlapping. For example, pattern "%|S-%" will match all codes that pattern
"%|S-PRO:PP-%" will match, plus more. The pattern "%|%-PRO:PP-%" will match
all codes that both patterns "%|S-PRO:PP-%" and "%|O-PRO:PP-%" will match,
plus some more. This means that you have some very general patterns that
will override the more specific patterns. To deal with this you need to
prioritize the order of the search patterns. The more general patterns, such
as "%|S-%" and "%|O-%", need to be listed last and the more specific patterns,
such as "%|S-PRO:PP-%" and "%|O-PRO:PP-%", need to be listed first. This is
a basic rule in CLAN: all the processing is done from left to right and from
top to bottom. The new sequence of search patterns is:

%|S-DA:T-%
%|O-DA:T-%
%|S-PRO:PP-%
%|O-PRO:PP-%
%|%-DA:T-%
%|%-PRO:PP-%
%|S-%
%|O-%

I have changed the code.cut file to reflect this and I am attaching it to
this message.

The last thing to remember is that CLAN will only list patterns that it
can actually match to something in the data, so if some pattern doesn't
match anything in the data set, then there isn't going to be a column
created for it in the output.
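
As a rough illustration of the ordering rule and of why a pattern that
never matches gets no column, here is a minimal Python sketch. It is a
simplified first-match-wins model, not CLAN's actual matcher: it assumes
that '%' matches any run of characters, and the sample codes, the "$REF"
prefix, and the helper names to_regex and tally are all invented for the
example.

```python
import re
from collections import Counter


def to_regex(pattern):
    """Turn a search pattern into a regex, treating '%' as 'any run of characters' (assumption)."""
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$")


def tally(codes, patterns):
    """Count each code under the first pattern in the list that matches it."""
    compiled = [(p, to_regex(p)) for p in patterns]
    counts = Counter()
    for code in codes:
        for pattern, regex in compiled:
            if regex.match(code):
                counts[pattern] += 1
                break  # first match wins; later patterns never see this code
    return counts


# Hypothetical %cod items, invented for illustration only.
codes = ["$REF|S-PRO:PP-x", "$REF|O-PRO:PP-x", "$REF|S-DA:T-x", "$REF|S-NP-x"]

general_first = ["%|S-%", "%|O-%", "%|%-DA:T-%", "%|%-PRO:PP-%",
                 "%|S-DA:T-%", "%|O-DA:T-%", "%|S-PRO:PP-%", "%|O-PRO:PP-%"]
specific_first = ["%|S-DA:T-%", "%|O-DA:T-%", "%|S-PRO:PP-%", "%|O-PRO:PP-%",
                  "%|%-DA:T-%", "%|%-PRO:PP-%", "%|S-%", "%|O-%"]

print(dict(tally(codes, general_first)))
print(dict(tally(codes, specific_first)))
```

In this model, with the general patterns first, every code is counted
under "%|S-%" or "%|O-%" and the specific patterns never receive a count,
so they would have no column. With the specific patterns first, each code
is counted under the most specific pattern that matches it.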

Leonid.



On 02-03-09 08:06, "bartsch" <bartsch at zas.gwz-berlin.de> wrote:

> 
> Dear Leonid,
> 
> thank you again for your hints. Some interlinear answers below.
> 
> 
>> 
>> Susanna,
>> 
>>     Try this command:
>> 
>> freq +d2 +t@ID="*target_child*" -t* +t%cod +s"%|S-%" +s"%|O-%"
>> +s"%|%-DA:T-%" +s"%|%-PRO:PP-%" +s"%|S-DA:T-%" +s"%|O-DA:T-%"
>> +s"%|S-PRO:PP-%" +s"%|O-PRO:PP-%"
>> 
>> Notice I have replaced all the '*' characters with the '%' character. The
>> above example should be on one command line. If this is too much, then you
>> can use the file "codes.cut" that I am attaching to this email with this
>> command:
>> 
>> freq +d2 +t@ID="*target_child*" -t* +t%cod +s@codes.cut
>> 
>> If this doesn't help you, then please send me a sample of your data file
>> and a further description of what exactly didn't work for you.
> 
> Well, some things worked, others didn't. Regardless of whether I used the
> full long command or the codes.cut file, the output was the same, with the
> following two problems:
> 
> 1. There were no columns for two searched coding strings: "%|%-PRO:PP-%"
> and "%|O-PRO:PP-%"
> 
> 2. Frequencies in half the columns differed from those I obtained when
> checking them with a combo command.
> 
> I'll send a sample of my data directly to you, along with a message with
> more details.
> 
> 
>>     In my tests I did not see an extra column between 'situation' and the
>> first word of the concordance. Perhaps the @ID header tiers in your data
>> files have an extra element at the end. You can see the correct output by
>> looking at "sample.cha" and running commands:
>> 
>> freq +d2 +s"pro:%|%" +s"pro|%" sample.cha +t@ID="*mother*" -t* +t%mor
>> statfreq stat.out.cex +f +d
>> 
>> on it. The "sample.cha" file is located in the clan/lib/sample/ folder.
> 
> 
> Yes, curiously, I no longer had this extra column, although I didn't make
> any changes in the @ID header tiers...
> 
> Thank you in advance for further help.
> 
> Kindest regards,
> Susanna
> 
> 
> 
>> 
>> Leonid.
>> 
>> 
>> On 22-02-09 22:43, "Brian MacWhinney" <macw at cmu.edu> wrote:
>> 
>>> Susanna Bartsch <susanna_gabriel at web.de>
>>> Dear all,
>>> 
>>> I have some questions about importing CLAN outputs to Excel files.
>>> 
>>> With the help of the CLAN manual, I could learn how to use INSERT,
>>> FREQ and STATFREQ, and the switches +d2 and +d3, in order to import
>>> outputs on word frequencies and/or types, tokens, and type-token
>>> ratios to an Excel file. In section 8.22 STATFREQ it reads that it is
>>> also possible to produce code frequencies, but I don't seem to be able
>>> to find more details about it in the manual. So, at first, I
>>> tried a COMBO command like this:
>>> 
>>> combo +s"*|S-DA:T-*" +t%cod +d3 +f +t@ID="*target_child*" @
>>> (where S-DA:T is the code combination searched for: S for subject,
>>> DA:T for bare demonstrative, and S-DA:T together for bare
>>> demonstrative in subject position)
>>> 
>>> CLAN generated ins.cmb.cex files, but not a stat.out.cex file, as is
>>> the case with FREQ. It seems that the +d2 or +d3 switch works only
>>> with FREQ? So, I used the same command line above, but with the
>>> command FREQ, and then STATFREQ, and I can import the stat.out.sat.cex
>>> file to Excel, everything as expected, but, of course, what I have is
>>> a table with columns labeled 'types', 'tokens', and 'TTR'. Of course,
>>> in the process of converting the file into Excel and also later, I can
>>> make changes in the table such that it corresponds to my purposes, but
>>> I was wondering whether I'm missing some switch or something that
>>> tells CLAN to generate in the stat.out.sat.cex a column labeled with
>>> the searched string (in the case above, *|S-DA:T-*). Or should I
>>> perhaps use a command other than FREQ?
>>> 
>>> I was also wondering whether it is possible to have a stat.out.sat.cex
>>> file such that it summarises a series of search operations. For
>>> instance, if I want to make a pivot table with children's frequencies
>>> of several linguistic units, such as subject antecedents (*|S-*),
>>> object antecedents (*|O-*), bare demonstratives (*|*-DA:T-*), personal
>>> pronouns (*|*-PRO:PP-*), bare demonstratives in subject (*|S-DA:T-*)
>>> and object position (*|O-DA:T-*), personal pronouns in subject
>>> (*|S-PRO:PP-*) and object position (*|O-PRO:PP-*), etc.: Is there a
>>> procedure for getting a single output file including all of these
>>> frequencies, e.g.:
>>> 
>>> ...  speaker  ...  *|S-*  *|O-*  *|*-DA:T-*  *|*-PRO:PP-*  *|S-DA:T-*  *|O-DA:T-*  *|S-PRO:PP-*  *|O-PRO:PP-*  ...
>>> ...  001      ...  n      n      n           n             n           n           n             n
>>> ...  002      ...  n      n      n           n             n           n           n             n
>>> ...
>>> 
>>> Finally, when using the +d2 switch, the stat.out.sat.cex (and the
>>> Excel file) has a column between 'situation' and the first word of the
>>> concordance. It is the 10th column - what does this column mean?
>>> 
>>> I'd be most grateful for help in clarifying these questions and I
>>> apologise in advance for asking questions about basics the answers to
>>> which are perhaps easily found in the manual.
>>> 
>>> Best wishes,
>>> Susanna
>>> 
>>> 
>>> 
>>> 
>> 
> 
> 
> 
> 
> *****************************************************************
> Susanna Bartsch
> bartsch at zas.gwz-berlin.de
> http://www.zas.gwz-berlin.de/mitarb/homepage/bartsch
> Zentrum fuer Allgemeine Sprachwissenschaft (ZAS)
> Centre for General Linguistics
> Schuetzenstr. 18
> 10117 Berlin
> Germany
> Tel. +49 (0)30 20 192 503
> Fax  +49 (0)30 20 192 402
> *****************************************************************
> 
> 
> 
> 



-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: code.cut
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20090302/2d6879bf/attachment-0001.ksh>

