types of utterances
Brian MacWhinney
macw at cmu.edu
Fri Oct 10 19:05:03 UTC 2003
Dear Javier,
You want to use UNIQ. First, I would run LONGTIER on the files to remove
carriage returns inside utterances. Then, I would output all the CHI
utterances to a file. Then I would run UNIQ with the +d option. This would
remove all duplicates. I would then further analyze from there.
--Brian MacWhinney
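
For readers who want to check the result outside CLAN, here is a minimal Python sketch of the same idea: join continuation lines, keep only the *CHI utterances, and drop exact duplicates. It is not CLAN code and does not reproduce the actual behaviour of LONGTIER or UNIQ. The function names (unique_utterances, count_word) and the sample data are hypothetical, and the sketch assumes a space before the final period in every utterance.

```python
def unique_utterances(lines, speaker="*CHI:"):
    """Return one speaker's utterances with exact duplicates dropped,
    keeping the order of first occurrence (hypothetical helper, not CLAN)."""
    utterances = []
    current = None
    for line in lines:
        if line.startswith(("*", "%", "@")):
            # A new tier or header begins, so close the utterance in progress.
            if current is not None:
                utterances.append(current)
                current = None
            if line.startswith(speaker):
                current = line[len(speaker):]
        elif current is not None and line[:1] in (" ", "\t"):
            # Continuation line: an utterance wrapped over several lines.
            current += " " + line
    if current is not None:
        utterances.append(current)

    seen = set()
    unique = []
    for utt in utterances:
        key = " ".join(utt.split())  # normalise whitespace before comparing
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def count_word(utterances, word):
    """Count tokens equal to `word`, ignoring attached punctuation."""
    return sum(
        tok.strip(".,?!¿¡") == word
        for utt in utterances
        for tok in utt.split()
    )


# Sample data modelled on the example in the quoted message.
sample = [
    "*CHI:\thola , Mamá .",
    "*CHI:\t0¿ qué tal ?",
    "*CHI:\tadiós , Mamá .",
    "*CHI:\tbuenas noches , Mamá .",
    "*CHI:\thola , Mamá .",
]

types = unique_utterances(sample)
print(len(types))                 # 4 distinct utterances
print(count_word(types, "Mamá"))  # 3 tokens once the duplicate is removed
```

Comparison here is on the whole utterance after whitespace normalisation, so two utterances that differ in a single word or in word order are both kept.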
On 10/10/03 11:34 AM, "Javier" <lpxao at psychology.nottingham.ac.uk> wrote:
> Hello,
>
> Does anyone know of a good CLAN command that would pull out the types of
> utterances?
>
> Let me explain myself. I want to restrict the analysis to those utterances
> that differ in at least one word/position. Therefore, I want CLAN to exclude
> identical utterances.
>
> For instance, let's imagine a corpus with the following set of utterances:
>
> *CHI: hola , Mamá .
> *CHI: 0¿ qué tal ?
> *CHI: adiós , Mamá .
> *CHI: buenas noches , Mamá .
> *CHI: hola , Mamá .
>
> If I use the following command:
>
> FREQ +t*CHI +s"Mamá"
>
> the output would count 4 tokens and 1 type.
>
> However, I want CLAN to exclude one of these tokens, since the child said
> "hola, Mamá" two times. Therefore, I would like to have 3 tokens instead of
> 4.
>
> A possible way to solve the problem would be to delete all the duplicates
> from the corpus... but I don't know how to do it.
>
> I don't have any problem with self-repetitions or repetitions of the
> previous utterance (since they are all labelled). My problem is when the
> speaker produces the same sentence at two or more different points in time
> and I want to keep just one of them.
>
>
> Thanks for your help.
>
>
> Javier Aguado Orea
> School of Psychology
> University of Nottingham
>