[Corpora-List] Measuring relative collocational strength

Wed Oct 13 14:01:45 UTC 2010

Hi.

I am looking for help with a statistical measure that has probably
been described in the literature, but whose name I don't know. I
should point out that I'm relatively new to corpus studies, having a
background in qualitative discourse studies, and am still coming to
terms with some of the technical lexis.

Simply put, given two terms that are seemingly synonymous but differ
in absolute frequency (say, "potato" and "spud"), I want to find out
which lexical items differ significantly in how strongly they
collocate with each. I suppose I could simply look at the full list
of collocates for each term, ordered by t-score or MI, and spot the
differences. However, one of the terms is much rarer, and MI scores
are sensitive to absolute frequency, so I suspect this would produce
quite a few artifacts.
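One common way to frame this comparison is to build, for each
candidate collocate, a 2x2 contingency table (collocate present or
absent in the window, around term A vs around term B) and test it
with a log-likelihood (G2) statistic. Comparing relative rather than
raw frequencies keeps the rarer term from being swamped. The sketch
below is in Python (the logic translates directly to R); the function
names and all counts are hypothetical, and defining the table's
margins as "total tokens in all windows around each term" is one
assumption among several reasonable choices, not necessarily the
measure asked about.

```python
import math


def g2(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood ratio (G2) for the 2x2 table [[a, b], [c, d]].

    Rows: collocate present / absent in the window.
    Columns: node term A / node term B.
    """
    n = a + b + c + d
    total = 0.0
    # Each cell: (observed, row total, column total).
    for obs, row, col in (
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    ):
        expected = row * col / n
        if obs > 0:  # 0 * log(0) is taken as 0
            total += obs * math.log(obs / expected)
    return 2 * total


def compare_collocates(counts_a, counts_b, size_a, size_b):
    """Rank collocates by how differently they attach to term A vs term B.

    counts_a / counts_b: dict collocate -> co-occurrence count with A / B.
    size_a / size_b: total tokens in all windows around A / B (assumed margins).
    Returns (collocate, G2, preferred term) tuples, largest G2 first.
    """
    results = []
    for word in set(counts_a) | set(counts_b):
        a = counts_a.get(word, 0)
        b = counts_b.get(word, 0)
        score = g2(a, b, size_a - a, size_b - b)
        # Compare relative frequencies so the rarer term is not penalised.
        preferred = "A" if a / size_a > b / size_b else "B"
        results.append((word, score, preferred))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical counts, for illustration only (A = "potato", B = "spud").
potato = {"mashed": 120, "couch": 5, "boiled": 80}
spud = {"mashed": 4, "couch": 30}
ranked = compare_collocates(potato, spud, size_a=50_000, size_b=3_000)
for word, score, preferred in ranked:
    print(f"{word:10s} G2={score:7.2f} stronger with {preferred}")
```

Since G2 is approximately chi-square distributed with one degree of
freedom here, values above 3.84 correspond to p < 0.05 and above 6.63
to p < 0.01. With many collocates tested at once, a stricter threshold
is advisable.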

I don't know of any piece of software that can do this, so I would
appreciate any pointers, or even suggestions on how to go about it in
R or any other statistical package (my programming skills aren't
great, but I trust I could manage with a little guidance).
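Whatever test is used, the first step is counting, for each term,
which words occur within a fixed window around it. A minimal Python
sketch follows (the same loop is easy to write in R). The function
name `window_counts`, the +/-2 token span, the crude regex tokeniser
and the sample sentence are all hypothetical choices; note that these
windows also cross sentence boundaries, which is a simplification.

```python
import re
from collections import Counter


def window_counts(tokens, node, span=4):
    """Count words occurring within `span` tokens either side of `node`.

    Returns (Counter of collocates, total number of tokens in all windows).
    """
    counts = Counter()
    total = 0
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node itself
                counts[tokens[j]] += 1
                total += 1
    return counts, total


# Hypothetical text; a real study would use a tagged or lemmatised corpus.
text = "She mashed the potato. He ate a spud on the couch. The potato was boiled."
tokens = re.findall(r"[a-z']+", text.lower())
potato_counts, potato_total = window_counts(tokens, "potato", span=2)
spud_counts, spud_total = window_counts(tokens, "spud", span=2)
print(potato_counts.most_common(3), potato_total)
print(spud_counts.most_common(3), spud_total)
```

The per-collocate counts and window totals for each term are exactly
the inputs a 2x2 contingency test (chi-square or log-likelihood)
needs.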

Best,

Alon Lischinsky

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora