CLAN and Excel

Leonid Spektor spektor at andrew.cmu.edu
Mon Mar 2 22:59:42 UTC 2009


Susanna,

    The code that matches the "%|%-PRO:PP-%" pattern was already matched by
the "%|S-PRO:PP-%" pattern. CLAN counts only the first match and only once, no
matter how many other patterns could match the same token. This eliminates
duplicate counting of the same token. Please remember that freq counts the
frequency of occurrence of each token, so the same token should not be
counted more than once. This program is not designed to simply fill up a
spreadsheet table.
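
    To illustrate the first-match rule, here is a minimal Python sketch. It
is not CLAN's actual implementation: the tokens are invented for the example,
the helper names are hypothetical, and treating '%' as "any run of characters,
possibly empty" is an assumption made only for this illustration.

```python
import re
from collections import Counter


def to_regex(pattern):
    # Assumption: '%' stands for any run of characters (possibly empty).
    # Everything else in the pattern is matched literally.
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile(".*".join(parts))


def count_first_match(tokens, patterns):
    # Patterns are tried in the order given; a token is credited to the
    # first pattern that matches it and is never counted a second time.
    compiled = [(pattern, to_regex(pattern)) for pattern in patterns]
    counts = Counter()
    for token in tokens:
        for pattern, regex in compiled:
            if regex.fullmatch(token):
                counts[pattern] += 1
                break
    return counts


# Invented codes, for illustration only (not taken from any real transcript).
tokens = ["a|S-PRO:PP-b", "a|O-PRO:PP-b", "a|S-DA:T-b", "a|S-N-b", "a|X-PRO:PP-b"]

# General patterns listed first.
general_first = ["%|S-%", "%|O-%", "%|%-DA:T-%", "%|%-PRO:PP-%",
                 "%|S-DA:T-%", "%|O-DA:T-%", "%|S-PRO:PP-%", "%|O-PRO:PP-%"]

# Specific patterns listed first, general ones last.
specific_first = ["%|S-DA:T-%", "%|O-DA:T-%", "%|S-PRO:PP-%", "%|O-PRO:PP-%",
                  "%|%-DA:T-%", "%|%-PRO:PP-%", "%|S-%", "%|O-%"]

print(dict(count_first_match(tokens, general_first)))
print(dict(count_first_match(tokens, specific_first)))
```

With the general patterns first, "%|S-%" and "%|O-%" absorb the tokens that the
specific patterns would otherwise match, so those specific patterns never
receive a count. With the specific patterns first, each of the five tokens is
credited to the most specific pattern that matches it, and the totals still add
up to five because no token is counted twice.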

    I don't know why CLAN crashes on your computer now when it did not before.
Please try rebooting your computer. If that does not help, please tell me the
version of CLAN and the operating system you are using. It would also be
helpful if you could send me the data that crashes CLAN on your computer.

Please respond to my email directly until we have solved all the issues.

Leonid.


On 02-03-09 17:31, "bartsch" <bartsch at zas.gwz-berlin.de> wrote:

> 
> Dear Leonid,
> 
> Thank you. Interlinear answers below.
> 
> 
> On Mon, 02 Mar 2009 14:58:08 -0500, Leonid Spektor <spektor at andrew.cmu.edu>
> wrote:
>> Susanna,
>> 
>>     The search patterns that you use to find codes in your data are
>> overlapping. For example, pattern "%|S-%" will match all codes that
>> pattern "%|S-PRO:PP-%" will match, plus more. The pattern "%|%-PRO:PP-%"
>> will match all codes that both patterns "%|S-PRO:PP-%" and "%|O-PRO:PP-%"
>> will match, plus some more. This means that you have some very general
>> patterns that will override the more specific patterns. To deal with this
>> you need to prioritize the order of the search patterns. The more general
>> patterns, such as "%|S-%" and "%|O-%", need to be listed last and the more
>> specific patterns, such as "%|S-PRO:PP-%" and "%|O-PRO:PP-%", need to be
>> listed first. This is a basic rule in CLAN: all processing is done from
>> left to right and from top to bottom. The new sequence of search patterns
>> is:
>> 
>> %|S-DA:T-%
>> %|O-DA:T-%
>> %|S-PRO:PP-%
>> %|O-PRO:PP-%
>> %|%-DA:T-%
>> %|%-PRO:PP-%
>> %|S-%
>> %|O-%
>> 
>> I have changed the code.cut file to reflect this and I am attaching it to
>> this message.
> 
> I tried with the long command, as well as with the code.cut file - in both
> cases, CLAN freezes. I tried it several times, I restarted the computer and
> tried again and again - without success.
> 
> 
>> 
>> The last thing to remember is that CLAN will only list patterns that it
>> can actually match to something in the data, so if some pattern doesn't
>> match anything in the data set, then there isn't going to be a column
>> created for it in the output.
> 
> I see. In the case of coding I think it wouldn't be bad to get a column
> also for such pattern-data non-matching cases, though.
> But more important for the moment: As you can see from the Excel file I
> sent you, in one of the missing columns (|-PRO:PP-) one child did have 2
> tokens for the searched coding string (combo found them, and they are also
> in the transcript). However, the column was not generated... What might
> have happened?
> 
> Kind regards,
> Susanna
> 
>> On 02-03-09 08:06, "bartsch" <bartsch at zas.gwz-berlin.de> wrote:
>> 
>>> 
>>> Dear Leonid,
>>> 
>>> thank you again for your hints. Some interlinear answers below.
>>> 
>>> 
>>>> 
>>>> Susanna,
>>>> 
>>>>     Try this command:
>>>> 
>>>> freq +d2 +t@ID="*target_child*" -t* +t%cod +s"%|S-%" +s"%|O-%"
>>>> +s"%|%-DA:T-%" +s"%|%-PRO:PP-%" +s"%|S-DA:T-%" +s"%|O-DA:T-%"
>>>> +s"%|S-PRO:PP-%" +s"%|O-PRO:PP-%"
>>>> 
>>>> Notice I have replaced all the '*' characters with the '%' character. The
>>>> above example should be on one command line. If this is too much, then you
>>>> can use the file "codes.cut" that I am attaching to this email with this
>>>> command:
>>>> 
>>>> freq +d2 +t@ID="*target_child*" -t* +t%cod +s@codes.cut
>>>> 
>>>> If this doesn't help you, then please send me a sample of your data file
>>>> and a further description of what exactly didn't work for you.
>>> 
>>> Well, some things worked, others didn't. Regardless of using either the
>>> full long command or the codes.cut file, the output was the same, with
>>> the following 2 problems:
>>> 
>>> 1. There were no columns for two searched coding strings: "%|%-PRO:PP-%"
>>> and "%|O-PRO:PP-%"
>>> 
>>> 2. Frequencies in half the columns were different when checking them
>>> using a combo command.
>>> 
>>> I'll send a sample of my data directly to you, and a message with more
>>> details.
>>> 
>>> 
>>>>     In my tests I did not see an extra column between 'situation' and
>>>> the first word of the concordance. Perhaps the @ID header tiers in your
>>>> data files have an extra element at the end. You can see the correct
>>>> output by looking at "sample.cha" and running these commands on it:
>>>> 
>>>> freq +d2 +s"pro:%|%" +s"pro|%" sample.cha +t@ID="*mother*" -t* +t%mor
>>>> statfreq stat.out.cex +f +d
>>>> 
>>>> The "sample.cha" file is located in the clan/lib/sample/ folder.
>>> 
>>> 
>>> Yes, curiously, I no longer get this extra column, although I didn't make
>>> any changes in the @ID header tiers...
>>> 
>>> Thank you in advance for further help.
>>> 
>>> Kindest regards,
>>> Susanna
> 
> *****************************************************************
> Susanna Bartsch
> bartsch at zas.gwz-berlin.de
> http://www.zas.gwz-berlin.de/mitarb/homepage/bartsch
> Zentrum fuer Allgemeine Sprachwissenschaft (ZAS)
> Centre for General Linguistics
> Schuetzenstr. 18
> 10117 Berlin
> Germany
> Tel. +49 (0)30 20 192 503
> Fax  +49 (0)30 20 192 402
> *****************************************************************


