types of utterances

Brian MacWhinney macw at cmu.edu
Fri Oct 10 19:05:03 UTC 2003


Dear Javier,
  You want to use UNIQ.  First, I would run LONGTIER on the files to remove
carriage returns inside utterances.  Then, I would output all the CHI
utterances to a file.  Then I would run UNIQ with the +d option.  This would
remove all duplicates.  I would then further analyze from there.
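
For anyone who wants to see the logic spelled out outside CLAN, here is a minimal Python sketch of the same three steps: join continuation lines, pull out the CHI utterances, and drop exact duplicates. It is only an illustration, not how UNIQ or LONGTIER are implemented. The function names (chi_utterances, unique_in_order) are made up for this example, and it assumes that continuation lines begin with a tab or a space and that the terminator is separated from the last word by a space.

```python
def chi_utterances(chat_text, speaker="*CHI:"):
    """Return main-tier utterances for one speaker, with continuation lines joined."""
    utterances = []
    current = None  # text of the speaker's utterance being assembled, if any
    for line in chat_text.splitlines():
        if line.startswith(("*", "%", "@")):
            # A new tier or header starts: flush the utterance collected so far.
            if current is not None:
                utterances.append(current)
            current = line[len(speaker):] if line.startswith(speaker) else None
        elif current is not None and line[:1] in (" ", "\t"):
            # Continuation line of the utterance in progress (assumed convention).
            current += " " + line
    if current is not None:
        utterances.append(current)
    # Collapse whitespace so that spacing differences do not hide duplicates.
    return [" ".join(u.split()) for u in utterances]


def unique_in_order(items):
    """Drop repeated items, keeping the first occurrence of each."""
    return list(dict.fromkeys(items))


# The five example utterances from the original question.
sample = """*CHI:\thola , Mamá .
*CHI:\t0¿ qué tal ?
*CHI:\tadiós , Mamá .
*CHI:\tbuenas noches , Mamá .
*CHI:\thola , Mamá .
"""

all_utts = chi_utterances(sample)
unique_utts = unique_in_order(all_utts)

# Count tokens of the target word before and after removing duplicates.
tokens_before = sum(u.split().count("Mamá") for u in all_utts)
tokens_after = sum(u.split().count("Mamá") for u in unique_utts)
print(tokens_before, tokens_after)  # prints: 4 3
```

Duplicates are matched on the whole utterance string, so two utterances that differ in any word or position are both kept.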

--Brian MacWhinney


On 10/10/03 11:34 AM, "Javier" <lpxao at psychology.nottingham.ac.uk> wrote:

> Hello,
> 
> Does anyone know of a good CLAN command that would pull out types of
> utterances?
> 
> Let me explain myself. I want to restrict the analysis to those utterances
> that differ in at least one word/position. Therefore, I want CLAN to exclude
> identical utterances.
> 
> For instance, let's imagine a corpus with the following set of utterances:
> 
> *CHI:   hola , Mamá .
> *CHI:   0¿ qué tal ?
> *CHI:   adiós , Mamá .
> *CHI:   buenas noches , Mamá .
> *CHI:   hola , Mamá .
> 
> If I use the following command:
> 
> FREQ +t*CHI +s"Mamá"
> 
> the output would count 4 tokens and 1 type.
> 
> However, I want CLAN to exclude one of these tokens, since the child said
> "hola, Mamá" two times. Therefore, I would like to have 3 tokens instead of
> 4.
> 
> One possible way to solve the problem would be to delete all the duplicates
> from the corpus, but I don't know how to do it.
> 
> I don't have any problem with self-repetitions or repetitions of the
> previous utterance (since they are all labelled). My problem is when the
> speaker produces the same sentence at two or more different points in time
> and I want to keep just one of them.
> 
> 
> Thanks for your help.
> 
> 
> Javier Aguado Orea
> School of Psychology
> University of Nottingham
> 
> 