[Corpora-List] Measuring relative collocational strength

Thu Oct 14 14:02:10 UTC 2010

Hi Yuval,

> I also found log-likelihood ratios to be useful. I would use second-order
> measures, meaning I would measure the LL between word A and each of its
> collocates, creating a vector containing these values; I'd do the same with
> word B; then compare the two vectors, simply with cosine or another known
> vector similarity function.

That looks like a good idea. Thanks!
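For the record, here is a minimal Python sketch of the second-order comparison described above. It assumes the corpus is already split into segments (e.g. sentences) of tokens and counts co-occurrence per segment rather than in a fixed token window. The function names (g2, ll_profile, cosine) and the toy data are hypothetical, and keeping only positively associated collocates is my own added assumption, not part of Yuval's suggestion.

```python
import math
from collections import Counter


def g2(o11, o12, o21, o22):
    """Dunning's log-likelihood ratio (G2) for a 2x2 contingency table."""
    n = o11 + o12 + o21 + o22
    row = (o11 + o12, o21 + o22)
    col = (o11 + o21, o12 + o22)
    cells = ((o11, o12), (o21, o22))
    total = 0.0
    for r in range(2):
        for c in range(2):
            o = cells[r][c]
            if o > 0:  # 0 * log(0) is taken as 0
                e = row[r] * col[c] / n
                total += o * math.log(o / e)
    return 2.0 * total


def ll_profile(word, segments):
    """Map each collocate of `word` to its G2 score.

    Co-occurrence is counted per segment (an assumption): a pair counts once
    per segment containing both words, so all four table cells stay >= 0.
    """
    sets = [set(s) for s in segments]
    n = len(sets)
    df = Counter(t for s in sets for t in s)  # number of segments containing each type
    joint = Counter(t for s in sets if word in s for t in s if t != word)
    f_word = df[word]
    profile = {}
    for coll, o11 in joint.items():
        o12 = f_word - o11          # word without collocate
        o21 = df[coll] - o11        # collocate without word
        o22 = n - o11 - o12 - o21   # neither
        expected = f_word * df[coll] / n
        if o11 > expected:  # keep attraction only (hypothetical filtering choice)
            profile[coll] = g2(o11, o12, o21, o22)
    return profile


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy example (invented data, for illustration only)
segments = [
    ["strong", "tea"],
    ["strong", "coffee"],
    ["cup", "of", "tea"],
    ["cup", "of", "coffee"],
    ["powerful", "engine"],
    ["powerful", "argument"],
]

tea = ll_profile("tea", segments)
coffee = ll_profile("coffee", segments)
engine = ll_profile("engine", segments)
print(round(cosine(tea, coffee), 3), round(cosine(tea, engine), 3))
```

Per-segment counting is used here because it keeps the 2x2 table consistent. With a plain token window, one word can fall inside several windows, which can push the "collocate without word" cell below zero. A window-based count is equally common, but it needs a more careful definition of the table margins.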

> Last note: I would suspect that a 1.5M token corpus might give you results
> that are so-so. If you can scale up to 100M and even an order of magnitude
> larger, I think you'd do better.

Unfortunately, that's out of the question. I know 1.5M is very little
by lexicographical standards, but my work is on corpus-aided discourse
analysis, and quantitative tests are only a first approximation to the
data. Manually analysing a few thousand concordances is enough work as
it is, and a million would be unthinkable.

A.

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora