Hello, my name is Markus Saers, and I am currently implementing an alignment tool as part of a course in Java for NLP. While implementing the Dice coefficient, I ran into some problems that I was hoping someone could help me with.

The only definition of the Dice coefficient that I have seen is:

Dice = 2 * p(ws, wt) / (p(ws) + p(wt))

where p(ws, wt) is the probability of the source word co-occurring with the target word, p(ws) is the probability of the source word, and p(wt) is the probability of the target word.
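As a minimal sketch of the formula above, the following Java class computes Dice from three probabilities that the caller is assumed to have estimated already. The class name `Dice` and method name `dice` are hypothetical. Returning 0.0 when both marginal probabilities are zero is an assumed convention, not part of the definition.

```java
/**
 * Sketch of Dice = 2 * p(ws, wt) / (p(ws) + p(wt)).
 * Hypothetical names; the probabilities are assumed to be estimated elsewhere.
 */
final class Dice {

    private Dice() {}

    /** Returns 2 * pJoint / (pSource + pTarget), or 0.0 if both marginals are zero. */
    static double dice(double pJoint, double pSource, double pTarget) {
        double denominator = pSource + pTarget;
        if (denominator == 0.0) {
            // Assumed convention: neither word was observed, so report no association.
            return 0.0;
        }
        return 2.0 * pJoint / denominator;
    }

    public static void main(String[] args) {
        // Illustrative values only: p(ws, wt) = 0.1, p(ws) = 0.2, p(wt) = 0.2.
        System.out.println(dice(0.1, 0.2, 0.2));
    }
}
```

With these illustrative inputs the result is 2 * 0.1 / 0.4 = 0.5.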

Although the definition is stated in terms of probabilities, some information I have found online suggests that frequency counts are used instead. This is problematic in word alignment, since it would presuppose that Ns = Nt (where Ns is the number of source words and Nt is the number of target words).
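To make the concern concrete for the denominator, assume relative-frequency estimates p(ws) = C(ws)/Ns and p(wt) = C(wt)/Nt, where C(·) is a count (notation introduced here for illustration). Then:

```latex
p(w_s) + p(w_t) = \frac{C(w_s)}{N_s} + \frac{C(w_t)}{N_t}
\qquad\text{and, only if } N_s = N_t = N,\qquad
p(w_s) + p(w_t) = \frac{C(w_s) + C(w_t)}{N}
```

So replacing the probabilities in the denominator by raw counts amounts to a common rescaling only when Ns = Nt. This sketch covers the denominator alone; it says nothing about how the joint term p(ws, wt) is normalized.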

The second problem arises when probabilities ARE used. p(ws) and p(wt) are easy to estimate, but how is p(ws, wt) estimated?

Best regards
Markus Saers
PhD student, Uppsala University