[Corpora-List] Keyness across Texts (fwd)

Sat Aug 4 18:04:58 UTC 2007

---------- Forwarded message ----------
Date: Thu, 12 Jul 2007 19:40:02 +0100
From: Mike Scott <mike at lexically.net>
To: Przemek Kaszubski <przemka at amu.edu.pl>
Cc: corpora at uib.no
Subject: Re: [Corpora-List] Keyness across Texts

Just to comment that there seem to be two very different senses of
"cluster" here (I think). One sense, the one Przemek has been using, is
roughly the same as "n-gram". The other sense relates to (statistical)
cluster analysis, quite a different animal altogether, which, as I
understand it, is concerned with determining which items belong together
in some statistical sense, not which words frequently follow one another.
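To make the contrast concrete, here is a minimal Python sketch of the two
senses. The function names are hypothetical, not taken from WordSmith or any
other tool. The first counts recurring n-grams (contiguous word sequences).
The second stands in for cluster analysis: it groups texts whose
word-frequency profiles are similar, using cosine similarity and simple
single-linkage grouping above an assumed threshold. That is only one of many
possible cluster-analysis techniques, and it is not the method of the paper
mentioned below.

```python
from collections import Counter
from math import sqrt

def ngram_clusters(tokens, n=3, min_freq=2):
    """Sense 1: 'clusters' as recurring n-grams (words that follow one another)."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {g: c for g, c in grams.items() if c >= min_freq}

def cosine(a, b):
    """Cosine similarity between two word-frequency profiles (dicts)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def single_link_clusters(profiles, threshold=0.5):
    """Sense 2 (illustrative): group items whose profiles are similar.
    Two items share a group if a chain of pairs links them, each pair
    having similarity >= threshold (single linkage, via union-find)."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group's representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

tokens = "at the end of the day at the end of the week".split()
print(ngram_clusters(tokens))
# {('at', 'the', 'end'): 2, ('the', 'end', 'of'): 2, ('end', 'of', 'the'): 2}

profiles = {
    "text_a": Counter("the cat sat on the mat".split()),
    "text_b": Counter("the cat sat on a mat".split()),
    "text_c": Counter("stocks fell sharply today".split()),
}
print(single_link_clusters(profiles))
# [['text_a', 'text_b'], ['text_c']]
```

The first function says nothing about which texts resemble each other. The
second ignores word order entirely and only asks which items "belong
together".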

As I understand it, the method explained in the paper suggested by Eric
Ringger in this thread seems to depend on something like statistical
cluster analysis. Unfortunately, though, I have to admit I do *not* fully
understand it. It would be useful to be pointed to a simpler explanation
of the rationale, suitable for the statistically challenged.

Mike Scott
School of English
University of Liverpool

_______________________________________________
corpora mailing list
corpora at uib.no
http://mailman.uib.no/listinfo/corpora