[Corpora-List] question as to MI and t score

Stefan Evert stefan.evert at uos.de
Thu Dec 15 16:57:12 UTC 2005


Dear Helene,

I suppose your rationale is that since MI and t-score measure two 
different aspects of collocations (MI not being sensitive to absolute 
frequency per se, while t-score is very sensitive in this respect), if 
both values are the same for "play - role" and "fight - battle", the 
"collocational strength" should be the same in all respects. Is this 
interpretation correct?
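To make the point about frequency sensitivity concrete, here is a minimal Python sketch. It assumes the usual definitions MI = log2(O/E) and t = (O - E)/sqrt(O), where O is the observed and E the expected co-occurrence frequency, as on www.collocations.de/AM. Scaling O and E by the same factor k leaves MI unchanged but multiplies the t-score by sqrt(k). The counts are hypothetical.

```python
import math

def mi(o, e):
    """Pointwise mutual information, assuming MI = log2(O/E)."""
    return math.log2(o / e)

def t_score(o, e):
    """t-score, assuming t = (O - E) / sqrt(O)."""
    return (o - e) / math.sqrt(o)

# Hypothetical counts: same O/E ratio, increasingly large samples.
o, e = 40.0, 10.0
for k in (1, 4, 16):
    # MI depends only on the ratio O/E, so it stays at 2.0;
    # t grows with sqrt(k), i.e. doubles each time k quadruples.
    print(k, mi(k * o, k * e), round(t_score(k * o, k * e), 3))
```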

However, if both scores are the same for the two collocations, this 
simply means that the observed frequencies and the expected frequencies 
of "play - role" and "fight - battle" are identical (unless both scores 
are zero; you can work this out fairly easily from the equations, e.g. 
those given on www.collocations.de/AM). While this doesn't indicate a 
difference in the degree of collocation, of course, it no more "proves" 
that the collocational strength is really identical than observing the 
same frequency for a phenomenon in two different corpora proves anything 
about that phenomenon in general: the observation may just as well be 
due to the vagaries of sampling, especially when the frequencies are 
very low.
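A minimal sketch of that derivation, assuming the same two definitions as above. Writing r = O/E = 2^MI, the t-score becomes sqrt(O) * (1 - 1/r), so a given pair of scores determines O, and then E = O/r. The exception is MI = 0, where O = E and t = 0 for every O. The function name recover_counts is hypothetical.

```python
import math

def recover_counts(mi_value, t_value):
    """Recover (O, E) from MI = log2(O/E) and t = (O - E) / sqrt(O).

    Hypothetical helper: with r = O/E, t = sqrt(O) * (1 - 1/r), so
    sqrt(O) = t / (1 - 1/r). Undefined when MI == 0 (r == 1).
    """
    ratio = 2.0 ** mi_value  # r = O / E
    if math.isclose(ratio, 1.0):
        raise ValueError("MI = 0: O and E cannot be recovered from the scores")
    sqrt_o = t_value / (1.0 - 1.0 / ratio)
    o = sqrt_o ** 2
    return o, o / ratio

# Hypothetical example: O = 40, E = 10 gives MI = 2.0 and t of about 4.74;
# feeding those scores back in recovers O and E (up to rounding).
mi_value = math.log2(40 / 10)
t_value = (40 - 10) / math.sqrt(40)
print(recover_counts(mi_value, t_value))
```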

What you can do is rule out a large difference between the 
collocational strengths of "play a role" and "fight a battle" with a 
certain degree of statistical confidence. Working out exactly which 
upper bounds on this difference can be assumed, and with how much 
confidence, is almost as difficult a mathematical problem as 
interpreting such differences is a linguistic one (what does it really 
mean if the difference in collocational strength is at most "1.7"??).

Best regards,
Stefan

>
>
> Imagine you have called up collocation listings for the node word 
> lemmas "play" and "fight". In both lists, the association with, for 
> example, the collocates "role" and "battle" has exactly the same MI 
> / t score. Can I assume that both collocations, i.e. "play a role" and 
> "fight a battle" have the same "collocational strength", or is that a 
> wrong assumption?
>
> Thanks,
> Helene


