types of utterances

Javier lpxao at psychology.nottingham.ac.uk
Mon Oct 13 13:45:05 UTC 2003


Thanks for your comments.

UNIQ appeared to be ideal for my purpose until I discovered two main
problems with its use.

1st, it doesn't exclude text retraced with [//] (the preceding word or the
material between < >) [text excluded by KWAL and other commands by using +r6].
Moreover, it takes the final bullet codes ([%snd:] / [%mov:]) into account
(therefore, it won't detect repeated utterances in sound-linked data like
mine, because every bullet is different).

2nd, it doesn't allow including additional tiers (like %mor or %com) which I
would like to keep for later analyses.
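
As a rough illustration of the first problem, here is a minimal Python sketch
(outside CLAN) of a comparison key that ignores bullets and [//] retracings
before two utterances are compared. It rests on assumptions not stated above:
that bullets are wrapped in the control character \x15 (or shown as "•"), and
that the retraced material is either a < > group or the single word right
before [//]. The function name comparison_key and the file name "tape1" are
hypothetical.

```python
import re

# Assumption: a media bullet is wrapped in \x15 characters (or displayed as "•").
BULLET = re.compile(r"[\x15•][^\x15•]*[\x15•]")
# Assumption: retraced material is a <group> or one word directly before [//].
RETRACE = re.compile(r"(?:<[^<>]*>|\S+)\s*\[//\]")


def comparison_key(main_tier: str) -> str:
    """Return the utterance text with bullets and [//] retracings removed."""
    text = BULLET.sub(" ", main_tier)
    text = RETRACE.sub(" ", text)
    # Collapse tabs and runs of spaces so layout differences do not matter.
    return " ".join(text.split())


# Two made-up utterances that differ only in retraced material and bullet times.
first = '*CHI:\t<hola papá> [//] hola , Mamá . \x15%snd:"tape1"_1000_2500\x15'
second = '*CHI:\thola , Mamá . \x15%snd:"tape1"_9000_9800\x15'
same = comparison_key(first) == comparison_key(second)
```

With such a key, the two utterances above count as one type even though their
bullets differ.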

Julian Pine (just in the next room) recommended that I use COOCCUR with +n20
just after I sent my initial question. COOCCUR works quite well, since it
accepts the +r6 option and ignores final bullets; however, I cannot pull out
%mor tiers with COOCCUR, can I?


Is it possible to sort this out with UNIQ, LONGTIER or any other command not
yet included in the clan.pdf manual?
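
For comparison, the behaviour being asked for can be sketched outside CLAN in
a few lines of Python. This is only an illustration under assumptions, not a
CLAN feature: main tiers start with "*", dependent tiers with "%", and
continuation lines with a tab. The function name unique_utterances and the
%com contents are made up; the utterances are the ones from the original
question.

```python
def unique_utterances(lines, speaker="*CHI:", key=None):
    """Keep the first occurrence of each distinct utterance by `speaker`,
    together with the dependent tiers that follow it."""
    if key is None:
        # Default key: the main tier with whitespace collapsed.
        key = lambda text: " ".join(text.split())

    # Group every main tier with its continuation and dependent lines.
    blocks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("*"):
            blocks.append([line])
        elif blocks and line.startswith(("%", "\t")):
            blocks[-1].append(line)
        # Header lines (@...) and blank lines are ignored in this sketch.

    seen, kept = set(), []
    for block in blocks:
        if not block[0].startswith(speaker):
            continue
        # The main tier is the first line plus any tab-indented continuations.
        main = [block[0]]
        for extra in block[1:]:
            if not extra.startswith("\t"):
                break
            main.append(extra)
        k = key(" ".join(main))
        if k not in seen:
            seen.add(k)
            kept.append(block)
    return kept


sample = [
    "*CHI:\thola , Mamá .",
    "%com:\tfirst greeting",      # made-up dependent tier
    "*CHI:\t0¿ qué tal ?",
    "*CHI:\tadiós , Mamá .",
    "*CHI:\tbuenas noches , Mamá .",
    "*CHI:\thola , Mamá .",
    "%com:\trepeated greeting",   # made-up dependent tier
]
kept = unique_utterances(sample)
mama_tokens = sum("Mamá" in block[0] for block in kept)
```

On this sample, four distinct utterances remain, three of them containing
"Mamá", and the first "hola , Mamá ." keeps its %com tier.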


Thanks,
Javier Aguado Orea
School of Psychology
University of Nottingham



on 10/10/03 8:05 pm, Brian MacWhinney at macw at cmu.edu wrote:

> Dear Javier,
> You want to use UNIQ.  First, I would run LONGTIER on the files to remove
> carriage returns inside utterances.  Then, I would output all the CHI
> utterances to a file.  Then I would run UNIQ with the +d option.  This would
> remove all duplicates.  I would then further analyze from there.
> 
> --Brian MacWhinney
> 
> 
> On 10/10/03 11:34 AM, "Javier" <lpxao at psychology.nottingham.ac.uk> wrote:
> 
>> Hello,
>> 
>> Does anyone there know of a good CLAN command that would pull out types
>> of utterances?
>> 
>> Let me explain myself. I want to restrict the analysis to those utterances
>> that differ in at least one word/position. Therefore, I want CLAN to exclude
>> identical utterances.
>> 
>> For instance, let's imagine a corpus with the following set of utterances:
>> 
>> *CHI:   hola , Mamá .
>> *CHI:   0¿ qué tal ?
>> *CHI:   adiós , Mamá .
>> *CHI:   buenas noches , Mamá .
>> *CHI:   hola , Mamá .
>> 
>> If I use the following command:
>> 
>> FREQ +t*CHI +s"Mamá"
>> 
>> the output would count 4 tokens and 1 type.
>> 
>> However, I want CLAN to exclude one of these tokens, since the child said
>> "hola, Mamá" two times. Therefore, I would like to have 3 tokens instead of
>> 4.
>> 
>> A possible way to solve the problem would be to delete all the duplicates
>> from the corpus... but I don't know how to do it.
>> 
>> I don't have any problem with self-repetitions or repetitions of the
>> previous utterance (since they are all labelled). My problem arises when the
>> speaker produces the same sentence at two or more different points in time
>> and I want to keep just one of them.
>> 
>> 
>> Thanks for your help.
>> 
>> 
>> Javier Aguado Orea
>> School of Psychology
>> University of Nottingham
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 


