[Corpora-List] Dice coefficient

Markus Saers masaers at gmail.com
Fri Apr 21 14:43:58 UTC 2006


Hello Scott,

OK, I see. So p(w) is read as "the probability of w occurring in a sentence"
rather than "the probability of w occurring in the corpus". Thank you very
much!

Best regards
Markus Saers



On 19/04/06, Piao, Songlin <s.piao at lancaster.ac.uk> wrote:
>
> Hi Markus,
>
> You must be working on word alignment, but I am not sure if you are using
> sentence aligned corpora.
>
> >that frequency count is used instead, which is problematic
> >in word alignment since that would presuppose that Ns=Nt
>
> If you are using sentence-aligned corpora, you can get the frequencies for
> ws and wt by counting the aligned sentence pairs in which each of them
> occurs. In this case, Ns=Nt=total_number_of_aligned_sentence_pairs. As to
> the co-occurrence frequency for (ws, wt), you can get it by counting the
> aligned sentence pairs in which both of them occur.
>
> If you are not using aligned corpora, you can substitute the aligned
> sentence pairs with certain corresponding text segments, such as paragraphs
> or sections.
>
> Hope this helps.
>
> Scott Piao
> --------------------
> Computing Department
> Lancaster University
> UK
>
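
For readers of the archive, the counting scheme Scott describes can be sketched roughly as follows. This is only a minimal illustration, assuming the standard Dice coefficient 2*f(ws,wt) / (f(ws) + f(wt)), with every frequency counted in aligned sentence pairs so that Ns=Nt. The function name dice_scores and the toy English-German data are made up for the example and are not from the thread.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Dice coefficient for every (ws, wt) that co-occurs in some aligned pair.

    sentence_pairs: iterable of (source_tokens, target_tokens) tuples.
    All frequencies are numbers of aligned sentence pairs, not token counts.
    """
    src_freq = Counter()   # f(ws): pairs whose source side contains ws
    tgt_freq = Counter()   # f(wt): pairs whose target side contains wt
    pair_freq = Counter()  # f(ws, wt): pairs containing both

    for src_tokens, tgt_tokens in sentence_pairs:
        # Use sets so a word repeated within one sentence is counted once.
        src_types = set(src_tokens)
        tgt_types = set(tgt_tokens)
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        pair_freq.update(product(src_types, tgt_types))

    return {
        (ws, wt): 2.0 * f / (src_freq[ws] + tgt_freq[wt])
        for (ws, wt), f in pair_freq.items()
    }


# Hypothetical toy corpus of three aligned sentence pairs.
pairs = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a house".split(), "ein haus".split()),
]
scores = dice_scores(pairs)
```

Dividing any of these counts by the total number of aligned pairs gives p(w) in the sense of "the probability of w occurring in a sentence". For non-aligned corpora, the same sketch should work with paragraphs or sections passed in place of sentence pairs.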