<div>Hello Scott,<br><br>OK, I see. So p(w) is read as "the probability of w occurring in a sentence" rather than "the probability of w occurring in the corpus". Thank you very much!</div>
<div> </div>
<div>Best regards</div>
<div>Markus Saers<br><br><br> </div>
<div><span class="gmail_quote">On 19/04/06, <b class="gmail_sendername">Piao, Songlin</b> &lt;<a href="mailto:s.piao@lancaster.ac.uk">s.piao@lancaster.ac.uk</a>&gt; wrote:</span>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid">Hi Markus,<br><br>You must be working on word alignment, but I am not sure whether you are using sentence-aligned corpora.
<br><br>>that frequency count is used instead, which is problematic<br>>in word alignment since that would presuppose that Ns=Nt<br><br>If you are using sentence-aligned corpora, you can get the frequencies for ws and wt by counting the aligned sentence pairs in which each of them occurs. In this case, Ns=Nt=total_number_of_aligned_sentence_pairs. As to the co-occurrence frequency for (ws, wt), you can get it by counting the aligned sentence pairs in which both of them occur.
<br><br>If you are not using aligned corpora, you can replace the aligned sentence pairs with corresponding text segments, such as paragraphs or sections.<br><br>Hope this helps.<br><br>Scott Piao<br>--------------------
<br>Computing Department<br>Lancaster University<br>UK<br></blockquote></div><br>
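<div>For anyone following the thread, the counting scheme Scott describes could be sketched roughly as below. This is only an illustration under stated assumptions: the function name <code>sentence_pair_counts</code> and the three toy sentence pairs are made up for the example and do not come from any particular toolkit or corpus. Each word is counted at most once per aligned sentence pair, so both sides share the same N.</div>

```python
from collections import Counter


def sentence_pair_counts(pairs):
    """Count sentence-level frequencies over aligned sentence pairs.

    `pairs` is an iterable of (source_tokens, target_tokens).
    Hypothetical helper written for illustration only.
    """
    n = 0               # Ns = Nt = total number of aligned sentence pairs
    src = Counter()     # pairs in which each source word ws occurs
    tgt = Counter()     # pairs in which each target word wt occurs
    joint = Counter()   # pairs in which both ws and wt occur
    for s_tokens, t_tokens in pairs:
        n += 1
        # Use sets so a word repeated inside one sentence counts once.
        s_types = set(s_tokens)
        t_types = set(t_tokens)
        src.update(s_types)
        tgt.update(t_types)
        joint.update((ws, wt) for ws in s_types for wt in t_types)
    return n, src, tgt, joint


# Toy data, invented for this example.
pairs = [
    ("the house is small".split(), "das Haus ist klein".split()),
    ("the house is big".split(), "das Haus ist gross".split()),
    ("a small book".split(), "ein kleines Buch".split()),
]

n, src, tgt, joint = sentence_pair_counts(pairs)
p_house = src["house"] / n                     # p(ws): share of pairs containing ws
p_haus = tgt["Haus"] / n                       # p(wt): share of pairs containing wt
p_house_haus = joint[("house", "Haus")] / n    # p(ws, wt): share containing both
```

<div>With counts like these, p(w) is the fraction of aligned pairs that contain w, which matches the "probability of w occurring in a sentence" reading. For corpora that are not sentence-aligned, the same function would take paragraph or section pairs instead.</div>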