[Corpora-List] Measuring relative collocational strength
Alon Lischinsky
alon.lischinsky at kultmed.umu.se
Thu Oct 14 14:02:10 UTC 2010
Hi Yuval,
> I also found log-likelihood ratios to be useful. I would use second order
> measures, meaning, I would measure the LL between word A and each of its
> collocates, creating a vector containing these values; I'd do the same with
> word B; then compare the two vectors, simply with cosine or other known
> vector similarity function.
That looks like a good idea. Thanks!
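For the record, here is how I understand the suggestion, as a minimal
Python sketch rather than anyone's actual implementation. It assumes the
corpus is already a flat list of tokens, uses Dunning's log-likelihood
ratio (G2) on a 2x2 table of "inside vs. outside the window around the
node word", and keeps only collocates that occur more often than
expected. The function names, the window span and the toy sentence are
all made up for illustration.

```python
import math
from collections import Counter

def log_likelihood(k11, k12, k21, k22):
    """Dunning's G2 statistic for a 2x2 contingency table."""
    total = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    g2 = 0.0
    for obs, row, col in ((k11, row1, col1), (k12, row1, col2),
                          (k21, row2, col1), (k22, row2, col2)):
        if obs > 0:  # a zero cell contributes nothing to the sum
            g2 += obs * math.log(obs * total / (row * col))
    return 2.0 * g2

def collocate_vector(tokens, node, span=4):
    """Map each collocate of `node` to its G2 score (hypothetical helper)."""
    n = len(tokens)
    # Positions within +/- span of any occurrence of the node word.
    # A set is used so overlapping windows are not counted twice.
    window = set()
    for i, tok in enumerate(tokens):
        if tok == node:
            for j in range(max(0, i - span), min(n, i + span + 1)):
                if tokens[j] != node:
                    window.add(j)
    freq = Counter(tokens)
    near = Counter(tokens[j] for j in window)
    vector = {}
    for word, k11 in near.items():
        k12 = len(window) - k11        # other words inside the window
        k21 = freq[word] - k11         # this word outside the window
        k22 = n - len(window) - k21    # everything else
        expected = len(window) * freq[word] / n
        if k11 > expected:             # assumption: keep attraction only
            vector[word] = log_likelihood(k11, k12, k21, k22)
    return vector

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy usage: compare the collocational profiles of two near-synonyms.
tokens = ("strong tea and strong coffee but "
          "powerful computer and powerful engine").split()
similarity = cosine(collocate_vector(tokens, "strong", span=1),
                    collocate_vector(tokens, "powerful", span=1))
```

On a real corpus one would presumably also want a minimum frequency
cut-off, since G2 scores based on one or two co-occurrences are not
worth much.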
> Last note: I would suspect that a 1.5M token corpus might give you results
> that are so-so. If you can scale up to 100M and even an order of magnitude
> larger, I think you'd do better.
Unfortunately, that's out of the question. I know 1.5M is very little
by lexicographical standards, but my work is on corpus-aided discourse
analysis, and quantitative tests are only a first approximation to the
data. Manually analysing a few thousand concordances is enough work as
it is, and a million would be unthinkable.
A.