KWAL, MLU

Leonid Spektor spektor at andrew.cmu.edu
Mon Feb 16 21:33:33 UTC 2004


Emily,

    I have changed kwal to work a bit more like mlu and mlt.

    Kwal will not count tiers or utterances that consist of these items
only:

    0*
    &*
    +*
    -*
    #*

    Mlu will not count the following items as words/morphemes, and it will
not count tiers or utterances that consist of these items only:

    0*
    &*
    +*
    -*
    #*
    </?>
    </->
    <///>
    <//>
    </>
    <% bch>
    uh
    um
    :*
    $*

    Mlt will not count tiers or utterances that consist of these items only:

    0*
    &*
    +*
    -*
    #*
    </?>
    </->
    <///>
    <//>
    </>
    <% bch>
    $*
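
    To make the "consist of these items only" rule concrete, here is a
rough Python sketch. It is not CLAN's code. It assumes that a trailing "*"
in a list entry means "anything starting with this prefix", that every other
entry must match exactly, and that the utterance has already been split into
items with the terminator removed. The names matches, is_skipped and
count_mlu_words are made up for this illustration.

```python
# Illustrative sketch only; not how CLAN is actually implemented.
# The three lists are copied from the message above.
KWAL_ITEMS = ["0*", "&*", "+*", "-*", "#*"]
MLT_ITEMS = KWAL_ITEMS + ["</?>", "</->", "<///>", "<//>", "</>", "<% bch>", "$*"]
MLU_ITEMS = MLT_ITEMS + ["uh", "um", ":*"]


def matches(item, pattern):
    """Assumption: a trailing '*' means 'any item with this prefix';
    any other pattern must equal the item exactly."""
    if pattern.endswith("*"):
        return item.startswith(pattern[:-1])
    return item == pattern


def is_skipped(items, patterns):
    """True if every item of the utterance is on the exclusion list.
    Assumption: an utterance with no items is also treated as skipped."""
    return all(any(matches(item, p) for p in patterns) for item in items)


def count_mlu_words(items):
    """Count the items that are not on the mlu exclusion list."""
    return sum(1 for item in items if not any(matches(item, p) for p in MLU_ITEMS))


if __name__ == "__main__":
    utterance = ["uh", "&hm"]
    for name, patterns in [("kwal", KWAL_ITEMS), ("mlt", MLT_ITEMS), ("mlu", MLU_ITEMS)]:
        print(name, "skips the utterance:", is_skipped(utterance, patterns))
```

    Under these assumptions, an utterance made up only of "uh" and "&hm"
would be skipped by mlu but still counted by kwal and mlt, because "uh" is
only on the mlu list.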


    The following command will create an output file with 133 of the child's
utterances, plus whatever other utterances are found within this range:

    kwal -z133u +t*chi *.cha +d +o* +o@ +o% +f

    This command will create an output file with 133 of the child's turns,
plus whatever other utterances are found within this range:

    kwal -z133t +t*chi *.cha +d +o* +o@ +o% +f

    When you run "mlu +t*chi" or "mlt +t*chi" on the output of the above
commands, both mlu and mlt should count the corresponding number of tiers and
utterances. If they don't, then please e-mail me the data file and the command
lines that resulted in the wrong output.

    Hope this helps.

Leonid.

On 16-02-04 05:11, "Emily" <h0009780 at hkusua.hku.hk> wrote:

>> ===== Original Message From "Brian MacWhinney" <macw at cmu.edu> =====
>> Dear Emily,
>>  Sorry about the delay in adding further comment on the issue of
>> analyzing files with a certain number of turns.  The short answer is
>> that KWAL pulls out turns without exclusions, whereas MLU applies
>> additional exclusionary criteria.  So the results are usually different.
>> But, then, KWAL and MLU have different purposes.  It might be possible
>> to modify the programs to achieve some of your purposes.  However, right
>> now, I am not clear enough about what you are trying to do.  Unless
>> there is some particular reason to worry about MLU, why can't you just
>> use KWAL's method for pulling out a specified range of utterances?
>>
>> --Brian MacWhinney
>
> Dear Brian,
>
> I have counted the instances of code-mixing by the subject in each file
> because I think code-mixing can give me an idea of what the bilingual
> subject's languages are like. However, since the file length varies between
> transcripts, the code-mixing counts for the files are not comparable, so
> they cannot be used in statistical calculations. Therefore, I would like to
> standardize the length of all the files in terms of utterances by selecting
> the shortest file, which contains the smallest total number of utterances by
> the subject, as the baseline for shortening the other files, which are longer
> than that. Because of this, I need MLU to give me the total number of
> utterances and KWAL to pull out a specified range of utterances. However, as
> you told me, they generate different numbers of utterances.
>
> In fact, I would like to know how to generate an 'upper bound' in the CLAN program.
> Thank you for your attention.
>
> Best regards,
> Emily


