CLAN and Excel

bartsch bartsch at zas.gwz-berlin.de
Wed Mar 4 13:06:55 UTC 2009


Dear Leonid,

On Tue, 03 Mar 2009 10:31:22 -0500, Leonid Spektor <spektor at andrew.cmu.edu>
wrote:
>
> Susanna,
>
>     From what I understand, you want to find the frequency count for each
> pattern independently of all other patterns.

Yes, that's exactly what I need. I need the frequencies of 6 types of
referring expressions (definite, indefinite, and bare NPs; null, personal,
and demonstrative pronouns) and then things such as:
- grammatical role of the referring expression
- animacy status of the referent
- grammatical role of the antecedent
- referential distance between referring expression and antecedent
among others.

And then some combinations of frequencies of each type of referring
expression and the things just listed, say:
- personal pronoun with an antecedent in subject position
- personal pronoun with an antecedent in subject position for animate
referents
- personal pronoun with an antecedent in subject position for animate
referents and with 1 clause between antecedent and pronoun
etc.

So, I'm carrying out some tests. For that, I've made different codes files.
I've done some freq/statfreq searches and they work well, but there are two
problems:
- in one of the searches, one column wasn't generated, although the
searched coding string exists in the data and it isn't included in any of
the other searched coding strings;
- to check the frequencies, I've carried out combo searches, and sometimes
freq/statfreq and combo yielded different frequencies.

I'm checking whether there are problems in the coding of the transcripts
that might explain these problems. If not, I'll get back to you soon.
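
One way to narrow down such discrepancies is to compare the two sets of
counts cell by cell. The following is only a minimal Python sketch: the
function name, the (file, coding string) keys, and the sample numbers are
hypothetical and are not CLAN output. It assumes the freq/statfreq and combo
counts have already been copied into two dictionaries by hand.

```python
def find_mismatches(freq_counts, combo_counts):
    """Return (key, freq_value, combo_value) for every cell that differs.

    Both arguments map a (file, coding_string) pair to a count.
    A cell missing from one table is treated as a count of 0.
    """
    mismatches = []
    for key in sorted(set(freq_counts) | set(combo_counts)):
        f = freq_counts.get(key, 0)
        c = combo_counts.get(key, 0)
        if f != c:
            mismatches.append((key, f, c))
    return mismatches


# Hypothetical counts, for illustration only.
freq_counts = {
    ("child1.cha", "|S-PRO:PP-"): 3,
    ("child2.cha", "|S-PRO:PP-"): 1,
}
combo_counts = {
    ("child1.cha", "|S-PRO:PP-"): 3,
    ("child2.cha", "|S-PRO:PP-"): 1,
    ("child2.cha", "|-PRO:PP-"): 2,  # found by combo, no freq column
}

for key, f, c in find_mismatches(freq_counts, combo_counts):
    print(key, "freq:", f, "combo:", c)
```

Cells that appear in only one of the two tables show up with a 0 on the other
side, which makes a missing column easy to spot.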

Kindly,
Susanna



> If this is true, then you need to run
> freq and statfreq separately for each pattern and then use Excel to join
> the resulting column for each pattern into one table.
>
> Leonid.
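
The joining step described above could also be scripted instead of done by
hand in Excel. This is a minimal Python sketch under stated assumptions: the
per-pattern counts are assumed to have been collected into dictionaries
already, and the function names, file names, and numbers are hypothetical.
It does not read CLAN output files directly.

```python
import csv
import io


def join_columns(per_pattern):
    """Join one count column per pattern into a single table.

    per_pattern maps a search pattern to {file_name: count}.
    A file with no count for a pattern gets 0 in that column.
    """
    row_keys = sorted({name for col in per_pattern.values() for name in col})
    patterns = list(per_pattern)
    rows = [["file"] + patterns]
    for name in row_keys:
        rows.append([name] + [per_pattern[p].get(name, 0) for p in patterns])
    return rows


def to_csv(rows):
    """Render the table as CSV text, which Excel can open."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical results of two separate freq/statfreq runs.
per_pattern = {
    "%|S-PRO:PP-%": {"child1.cha": 3, "child2.cha": 1},
    "%|%-PRO:PP-%": {"child1.cha": 5},
}

rows = join_columns(per_pattern)
print(to_csv(rows))
```

Because each pattern is counted in its own run, a token can contribute to
several columns, and a pattern that matched nothing for a given file still
gets a cell (with 0) in the joined table.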
>
>
>
> On 03-03-09 05:56, "bartsch" <bartsch at zas.gwz-berlin.de> wrote:
>
>>
>> Dear Leonid,
>>
>> Thank you for your quick feedback. I have Windows Vista and I had the
>> 27-Aug-2008 CLAN version, and after reading your message I downloaded the
>> 11-Feb-2009 version. Now CLAN doesn't freeze any more, but the outputs I
>> get with freq and statfreq using the new sequence of search patterns are
>> exactly the same as those using the old sequence. Given the features of
>> freq as you kindly explained in your last message, I'll try to do the freq
>> searches differently and get back to you soon.
>>
>> Kindly,
>> Susanna
>>
>>
>>
>> On Mon, 02 Mar 2009 17:59:42 -0500, Leonid Spektor <spektor at andrew.cmu.edu>
>> wrote:
>>>
>>> Susanna,
>>>
>>>     The code that matches the "%|%-PRO:PP-%" pattern was matched by the
>>> "%|S-PRO:PP-%" pattern. CLAN counts only the first match and only once, no
>>> matter how many other patterns can match the same token. This eliminates
>>> duplicate counting of the same token. Please remember that freq counts
>>> the frequency of occurrence of each token, so the same token should not be
>>> counted more than once. This program is not designed to simply fill up a
>>> spreadsheet table.
>>>
>>>     I don't know why CLAN crashes now and not before on your computer.
>>> Please try to reboot your computer. Otherwise, please tell me the version
>>> of CLAN and the operating system you are using. It would also be helpful
>>> if you could send me the data that crashes CLAN on your computer.
>>>
>>> Please, respond to my email directly until we solve all the issues.
>>>
>>> Leonid.
>>>
>>>
>>> On 02-03-09 17:31, "bartsch" <bartsch at zas.gwz-berlin.de> wrote:
>>>
>>>>
>>>> Dear Leonid,
>>>>
>>>> Thank you. Interlinear answers below.
>>>>
>>>>
>>>> On Mon, 02 Mar 2009 14:58:08 -0500, Leonid Spektor <spektor at andrew.cmu.edu>
>>>> wrote:
>>>>> Susanna,
>>>>>
>>>>>     The search patterns that you use to find codes in your data are
>>>>> overlapping. For example, pattern "%|S-%" will match all codes that
>>>>> pattern "%|S-PRO:PP-%" will match, plus more. The pattern "%|%-PRO:PP-%"
>>>>> will match all codes that both patterns "%|S-PRO:PP-%" and "%|O-PRO:PP-%"
>>>>> will match, plus some more. This means that you have some very general
>>>>> patterns that will override the more specific patterns. To deal with this
>>>>> you need to prioritize the order of the search patterns. The more general
>>>>> patterns, such as "%|S-%" and "%|O-%", need to be listed last, and the
>>>>> more specific patterns, such as "%|S-PRO:PP-%" and "%|O-PRO:PP-%", need
>>>>> to be listed first. This is a basic rule in CLAN: all the processing is
>>>>> done from left to right and from top to bottom. The new sequence of
>>>>> search patterns is:
>>>>>
>>>>> %|S-DA:T-%
>>>>> %|O-DA:T-%
>>>>> %|S-PRO:PP-%
>>>>> %|O-PRO:PP-%
>>>>> %|%-DA:T-%
>>>>> %|%-PRO:PP-%
>>>>> %|S-%
>>>>> %|O-%
>>>>>
>>>>> I have changed the code.cut file to reflect this and I am attaching it to
>>>>> this message.
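
The effect of pattern order under a "first match wins" rule can be
illustrated with a small simulation. This is only a Python sketch, not CLAN
itself: it assumes that '%' simply stands for any run of characters, and the
sample codes are hypothetical, invented to fit the patterns quoted above.

```python
import re
from collections import Counter


def to_regex(pattern):
    # Assumption: '%' matches any run of characters, possibly empty.
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(parts), re.DOTALL)


def first_match_counts(patterns, codes):
    """Count each code once, under the first pattern in the list that matches."""
    compiled = [(p, to_regex(p)) for p in patterns]
    counts = Counter()
    for code in codes:
        for pattern, rx in compiled:
            if rx.fullmatch(code):
                counts[pattern] += 1
                break  # later patterns never see this code
    return counts


# Hypothetical codes, for illustration only.
codes = ["a|S-PRO:PP-x", "a|S-PRO:PP-y", "a|O-PRO:PP-x", "a|S-DA:T-x", "a|S-NP-x"]

general_first = ["%|S-%", "%|O-%", "%|%-DA:T-%", "%|%-PRO:PP-%",
                 "%|S-DA:T-%", "%|O-DA:T-%", "%|S-PRO:PP-%", "%|O-PRO:PP-%"]
specific_first = ["%|S-DA:T-%", "%|O-DA:T-%", "%|S-PRO:PP-%", "%|O-PRO:PP-%",
                  "%|%-DA:T-%", "%|%-PRO:PP-%", "%|S-%", "%|O-%"]

print(dict(first_match_counts(general_first, codes)))
print(dict(first_match_counts(specific_first, codes)))
```

With the general patterns first, everything is absorbed by "%|S-%" and
"%|O-%". With the specific patterns first, the specific columns are filled,
but in this sample "%|%-PRO:PP-%" still receives nothing, because every
PRO:PP code has already been claimed by the S or O variant.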
>>>>
>>>> I tried with the long command, as well as with the code.cut file - in both
>>>> cases, CLAN freezes. I tried it several times, I restarted the computer and
>>>> tried again and again - without success.
>>>>
>>>>
>>>>>
>>>>> The last thing to remember is that CLAN will only list patterns that it
>>>>> can actually match to something in the data, so if some pattern doesn't
>>>>> match anything in the data set, then no column is going to be
>>>>> created for it in the output.
>>>>
>>>> I see. In the case of coding I think it wouldn't be bad to get a column
>>>> also for such pattern-data non-matching cases, though.
>>>> But more important for the moment: As you can see from the Excel file I
>>>> sent you, in one of the missing columns (|-PRO:PP-) one child did have 2
>>>> tokens for the searched coding string (combo found them, and they are also
>>>> in the transcript). However, the column was not generated... What might
>>>> have happened?
>>>>
>>>> Kind regards,
>>>> Susanna
>>>>
>>>>> On 02-03-09 08:06, "bartsch" <bartsch at zas.gwz-berlin.de> wrote:
>>>>>
>>>>>>
>>>>>> Dear Leonid,
>>>>>>
>>>>>> thank you again for your hints. Some interlinear answers below.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Susanna,
>>>>>>>
>>>>>>>     Try this command:
>>>>>>>
>>>>>>> freq +d2 +t@ID="*target_child*" -t* +t%cod +s"%|S-%" +s"%|O-%"
>>>>>>> +s"%|%-DA:T-%" +s"%|%-PRO:PP-%" +s"%|S-DA:T-%" +s"%|O-DA:T-%"
>>>>>>> +s"%|S-PRO:PP-%" +s"%|O-PRO:PP-%"
>>>>>>>
>>>>>>> Notice I have replaced all the '*' characters with the '%' character. The
>>>>>>> above example should be on one command line. If this is too much, then you
>>>>>>> can use the file "codes.cut" that I am attaching to this email with this
>>>>>>> command:
>>>>>>>
>>>>>>> freq +d2 +t@ID="*target_child*" -t* +t%cod +s@codes.cut
>>>>>>>
>>>>>>> If this doesn't help you, then please send me a sample of your data file
>>>>>>> and further description of what exactly didn't work for you.
>>>>>>
>>>>>> Well, some things worked, others didn't. Regardless of using either the
>>>>>> full long command or the codes.cut file, the outputs were the same, with
>>>>>> the following 2 problems:
>>>>>>
>>>>>> 1. There were no columns for two searched coding strings: "%|%-PRO:PP-%"
>>>>>> and "%|O-PRO:PP-%"
>>>>>>
>>>>>> 2. Frequencies in half the columns were different when checking them using
>>>>>> a combo command.
>>>>>>
>>>>>> I'll send a sample of my data directly to you, and a message with more
>>>>>> details.
>>>>>>
>>>>>>
>>>>>>>     In my tests I did not see an extra column between 'situation' and the
>>>>>>> first word of the concordance. Perhaps the @ID header tiers in your data
>>>>>>> files have an extra element at the end. You can see the correct output by
>>>>>>> looking at "sample.cha" and running these commands on it:
>>>>>>>
>>>>>>> freq +d2 +s"pro:%|%" +s"pro|%" sample.cha +t@ID="*mother*" -t* +t%mor
>>>>>>> statfreq stat.out.cex +f +d
>>>>>>>
>>>>>>> The "sample.cha" file is located in the clan/lib/sample/ folder.
>>>>>>
>>>>>>
>>>>>> Yes, curious, I no longer had this extra column, although I didn't make
>>>>>> any changes in the @ID header tiers...
>>>>>>
>>>>>> Thank you in advance for further help.
>>>>>>
>>>>>> Kindest regards,
>>>>>> Susanna
>>>>
>>>> *****************************************************************
>>>> Susanna Bartsch
>>>> bartsch at zas.gwz-berlin.de
>>>> http://www.zas.gwz-berlin.de/mitarb/homepage/bartsch
>>>> Zentrum fuer Allgemeine Sprachwissenschaft (ZAS)
>>>> Centre for General Linguistics
>>>> Schuetzenstr. 18
>>>> 10117 Berlin
>>>> Germany
>>>> Tel. +49 (0)30 20 192 503
>>>> Fax  +49 (0)30 20 192 402
>>>> *****************************************************************



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "chibolts" group.
To post to this group, send email to chibolts at googlegroups.com
To unsubscribe from this group, send email to chibolts+unsubscribe at googlegroups.com
For more options, visit this group at http://groups.google.com/group/chibolts?hl=en
-~----------~----~----~----~------~----~------~--~---


