[Corpora-List] question as to MI and t score

JFS jfs at di.fct.unl.pt
Thu Dec 15 14:24:28 UTC 2005


Ramesh Krishnamurthy wrote:

>
> Please see http://torvald.aksis.uib.no/corpora/1999-4/0146.html
>
> If I have understood correctly, the MI score tells you about the
> 'strength of association' (but if the corpus frequency figures for
> either item are very low, then you may not have much confidence in the
> association; e.g. in the extreme case, X and Y occur only once each in
> the corpus, but in that one occurrence they are adjacent to each other).
> The t-score takes into account the corpus frequency of the items, so it
> gives you a 'confidence rating' for the association...
>
> I suspect that the corpus frequencies for ['play' and 'role'] and
> ['fight' and 'battle'] would also have to be similar for you to make
> the claim that they have a similar overall collocational relationship...
>
> Hope this helps
> Ramesh
>
>
> At 16:14 14/12/2005, Helene Stengers wrote:
>
>> Dear list,
>>
>> Imagine you have called up collocation listings for the node word
>> lemmas "play" and "fight". In both lists, the association with, for
>> example, the collocates "role" and "battle" has exactly the same
>> MI / t score. Can I assume that both collocations, i.e. "play a role"
>> and "fight a battle", have the same "collocational strength", or is
>> that a wrong assumption?
>>
>> Thanks,
>> Helene 
>
> Ramesh Krishnamurthy
> Lecturer in English Studies
> School of Languages and Social Sciences
> Aston University, Birmingham B4 7ET, UK
> Tel: +44 (0)121-204-3812
> Fax: +44 (0)121-204-3766
> http://www.aston.ac.uk/lss/english/
>
Dear

The MI measure is not independent of the bigram frequency. This can be seen
when X and Y form a perfect co-occurrence bigram (X occurs only to the left
of Y, and Y occurs only to the right of X): in these cases MI gives higher
scores to bigrams of lower frequency.
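To make this concrete, here is a minimal Python sketch. It assumes the usual pointwise definition MI(X,Y) = log2(N * f(X,Y) / (f(X) * f(Y))), where N is the total number of bigrams in the corpus, and the common t-score approximation t = (f(X,Y) - f(X) * f(Y) / N) / sqrt(f(X,Y)). The corpus size N = 1,000,000 and the frequencies are hypothetical. For a perfect co-occurrence bigram, f(X) = f(Y) = f(X,Y) = n, so MI reduces to log2(N / n) and grows as n shrinks, while the t-score grows with n.

```python
import math

def mi(f_xy: int, f_x: int, f_y: int, n_total: int) -> float:
    """Pointwise mutual information (log base 2) from raw counts."""
    return math.log2(n_total * f_xy / (f_x * f_y))

def t_score(f_xy: int, f_x: int, f_y: int, n_total: int) -> float:
    """t-score: observed minus expected co-occurrence, over sqrt(observed)."""
    expected = f_x * f_y / n_total
    return (f_xy - expected) / math.sqrt(f_xy)

N = 1_000_000  # hypothetical corpus size (number of bigrams)

# Perfect co-occurrence: X only ever appears left of Y and Y only right of X,
# so f(X) = f(Y) = f(X,Y) = n.
for n in (1, 10, 100, 1000):
    print(f"n={n:5d}  MI={mi(n, n, n, N):6.2f}  t={t_score(n, n, n, N):6.2f}")
```

With these numbers, MI drops from about 19.9 at n = 1 to about 10.0 at n = 1000, while the t-score rises from about 1 to about 31.6: the rarest perfectly cohesive bigram gets the highest MI.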

Try scp(X,Y) = f(X,Y)² / (f(X) * f(Y)) (symmetric conditional probability).
It measures the cohesion between X and Y and is independent of the bigram
frequency.
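As a sketch, scp can be computed directly from raw counts. Since f(X,Y) / f(X) estimates P(Y|X) and f(X,Y) / f(Y) estimates P(X|Y), scp is their product, which is why the corpus size N cancels out. The counts below are hypothetical.

```python
def scp(f_xy: int, f_x: int, f_y: int) -> float:
    """Symmetric conditional probability: f(X,Y)^2 / (f(X) * f(Y)).

    Equal to P(Y|X) * P(X|Y) estimated from counts; N cancels out.
    """
    return f_xy ** 2 / (f_x * f_y)

# Perfect co-occurrence gives the same score whatever the bigram frequency n.
for n in (1, 10, 100, 1000):
    print(n, scp(n, n, n))  # 1.0 each time

# A weaker association: Y often occurs without X.
print(scp(50, 100, 400))  # 0.0625
```

Unlike MI in the previous example, a perfectly cohesive bigram scores 1.0 whether it occurs once or a thousand times.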

Or try cosine(X,Y) = f(X,Y) / sqrt(f(X) * f(Y)). It is also independent
of the bigram frequency.

Both measures give values between 0 and 1.
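A companion sketch for the cosine, using hypothetical counts. It also illustrates the 0-to-1 range: every occurrence of the bigram XY is also an occurrence of X and of Y, so f(X,Y) <= min(f(X), f(Y)), which bounds both measures. Note that scp is exactly the square of the cosine, so the two rank bigrams in the same order. The scp helper is redefined here so the block runs on its own.

```python
import math

def scp(f_xy: int, f_x: int, f_y: int) -> float:
    """Symmetric conditional probability: f(X,Y)^2 / (f(X) * f(Y))."""
    return f_xy ** 2 / (f_x * f_y)

def cosine(f_xy: int, f_x: int, f_y: int) -> float:
    """Cosine: f(X,Y) / sqrt(f(X) * f(Y))."""
    return f_xy / math.sqrt(f_x * f_y)

# Hypothetical (f(X,Y), f(X), f(Y)) triples; f(X,Y) never exceeds f(X) or f(Y).
pairs = [(1, 1, 1), (1000, 1000, 1000), (50, 100, 400), (3, 5000, 20)]
for f_xy, f_x, f_y in pairs:
    c = cosine(f_xy, f_x, f_y)
    s = scp(f_xy, f_x, f_y)
    print(f"cosine={c:.4f}  scp={s:.4f}  cosine^2={c * c:.4f}")
```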

Joaquim

-- 
Joaquim Ferreira da Silva      	| Tel: +351 21 294 8536
Professor Auxiliar		|      +351 21 291 8330 ext: 10732
Departamento de Informática	| Fax: +351 21 294 8541
Fac. de Ciências e Tecnologia	|jfs at di.fct.unl.pt
Universidade Nova de Lisboa	|http://terra.di.fct.unl.pt/~jfs/
2829-516 Caparica, PORTUGAL