[Corpora-List] Dice coefficient

JFS jfs at di.fct.unl.pt
Wed Apr 19 10:13:01 UTC 2006


Markus Saers wrote:

> Hello,
>
> My name is Markus Saers, and I am currently implementing an alignment 
> tool as part of a course in Java for NLP. When trying to implement the 
> Dice coefficient, I ran into some problems that I was hoping someone 
> could help me with.
>
> The only definition of the Dice coefficient that I have seen looks 
> like this:
>
> Dice = 2 * p(ws, wt) / ( p(ws) + p(wt) )
>
> Where p(ws, wt) is the probability of the source word co-occurring 
> with the target word, p(ws) is the probability of the source word and 
> p(wt) is the probability of the target word.
>
> Although it is stated as probabilities, some info that I gathered on 
> the net seems to suggest that frequency count is used instead, which 
> is problematic in word alignment since that would presuppose that 
> Ns=Nt (where Ns is the number of source words and Nt is the number of 
> target words).
>
> The second problem arises when probabilities ARE used. p(ws) and p(wt) 
> are easy to estimate, but how is p(ws, wt) estimated?
>
> Best regards
> Markus Saers
> PhD student, Uppsala University

Dear Markus

If I understand correctly, you may solve it like this:

You may use plain frequencies, since all the N spaces are equal, as the 
following approach requires. I suppose you have several pairs of 
(Source,Target) docs, where a ws in the Source doc of a pair has several 
candidates in the Target doc of the same pair. So:

Dice(ws,wt) = 2 * sum_(for all pairs Source.i,Targ.j)(freq(ws,Source.i) * 
freq(wt,Targ.j)) / ( sum_(for all Source.i)(freq(ws,Source.i)) + 
sum_(for all Targ.j)(freq(wt,Targ.j)) )

By freq(ws,Source.i) I mean the frequency of ws in Source i, and likewise 
freq(wt,Targ.j) is the frequency of wt in Target j.
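
A minimal Python sketch of the sum above, for illustration only. It assumes 
each (Source,Target) pair is already tokenized into word lists, that the 
co-occurrence term multiplies the two frequencies within each aligned pair, 
and that the second denominator term counts wt in the Target docs. The 
function name dice_over_pairs and the toy sentences are hypothetical.

```python
def dice_over_pairs(pairs, ws, wt):
    """Dice score for (ws, wt) over aligned (source_tokens, target_tokens) pairs.

    Assumption: co-occurrence is counted as freq(ws, Source.i) * freq(wt, Targ.i),
    summed over the aligned pairs.
    """
    joint = 0    # sum of freq(ws, Source.i) * freq(wt, Targ.i)
    total_s = 0  # sum of freq(ws, Source.i) over all Source docs
    total_t = 0  # sum of freq(wt, Targ.j) over all Target docs
    for source_tokens, target_tokens in pairs:
        f_s = source_tokens.count(ws)
        f_t = target_tokens.count(wt)
        joint += f_s * f_t
        total_s += f_s
        total_t += f_t
    denom = total_s + total_t
    # Return 0.0 when neither word occurs, to avoid dividing by zero.
    return 2 * joint / denom if denom else 0.0


# Toy data (invented for this example): three aligned sentence pairs.
pairs = [
    ("the house is red".split(), "a casa é vermelha".split()),
    ("the house is big".split(), "a casa é grande".split()),
    ("the car is red".split(), "o carro é vermelho".split()),
]
score_house_casa = dice_over_pairs(pairs, "house", "casa")
score_house_carro = dice_over_pairs(pairs, "house", "carro")
```

Note that with a product-based co-occurrence count the score is not bounded 
by 1 when a word occurs several times in the same doc (for example, three 
occurrences on each side of one pair give 2*9/(3+3) = 3).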


Tell me if it is clear.

We have tested this approach and it works, although we used SCP rather 
than Dice. I expect Dice will give you good results too. If you want, you 
can test SCP instead and compare the results. Please let us know how it 
goes.

dice(x,y)=2*f(x,y) / (f(x)+f(y))

scp(x,y)= f(x,y)^2 / (f(x)*f(y))
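
The two measures can be sketched directly from counts. If every probability 
is estimated as f/N with the same N, then N cancels in both formulas, so raw 
frequencies give the same values. The function names and the example counts 
below are illustrative assumptions, not taken from any tool.

```python
def dice(f_xy, f_x, f_y):
    """dice(x,y) = 2*f(x,y) / (f(x)+f(y)); 0.0 if neither word occurs."""
    denom = f_x + f_y
    return 2 * f_xy / denom if denom else 0.0


def scp(f_xy, f_x, f_y):
    """scp(x,y) = f(x,y)^2 / (f(x)*f(y)); 0.0 if either word never occurs."""
    denom = f_x * f_y
    return f_xy ** 2 / denom if denom else 0.0


# Invented counts: x occurs 10 times, y 16 times, together 8 times.
dice_score = dice(8, 10, 16)  # 16 / 26
scp_score = scp(8, 10, 16)    # 64 / 160
```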

Best regards.

Joaquim

-- 
Joaquim Ferreira da Silva      	| Tel: +351 21 294 8536
Professor Auxiliar		|      +351 21 291 8330 ext: 10732
Departamento de Informática	| Fax: +351 21 294 8541
Fac. de Ciências e Tecnologia	|jfs at di.fct.unl.pt
Universidade Nova de Lisboa	|http://terra.di.fct.unl.pt/~jfs/
2829-516 Caparica, PORTUGAL
 


