Corpora: some language modeling questions

Mon Jul 30 13:54:58 UTC 2001

I have some questions about language modeling. In the class-based n-gram
models of Brown et al. (1990), the probability of word w_k given its history
w_1_(k-1) is defined as

Pr(w_k|w_1_(k-1)) = Pr(w_k|c_k)Pr(c_k|c_1_(k-1))

where w_1_(k-1) is the history of word w_k, i.e. w_1...w_(k-1);
c_k is the class that word w_k belongs to; and
c_1_(k-1) is the class history of word w_k, i.e. c_1...c_(k-1).
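The definition above can be made concrete with a toy sketch. It assumes a
class bigram (the class history c_1...c_(k-1) truncated to c_(k-1)), a
hand-picked three-class vocabulary, and made-up probabilities. The names
WORD_CLASS, class_ngram_prob and total_mass are hypothetical illustrations,
not anything taken from Brown et al.

```python
from fractions import Fraction

# Hypothetical toy model for illustration only.
# Assumption: the class history c_1..c_(k-1) is truncated to c_(k-1) (class bigram).

# Deterministic word -> class map: each word belongs to exactly one class.
WORD_CLASS = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "park": "NOUN",
    "runs": "VERB", "sleeps": "VERB",
}

# Pr(w | c): within each class, the word probabilities sum to 1.
P_WORD_GIVEN_CLASS = {
    "the": Fraction(2, 3), "a": Fraction(1, 3),
    "dog": Fraction(1, 2), "cat": Fraction(1, 4), "park": Fraction(1, 4),
    "runs": Fraction(3, 5), "sleeps": Fraction(2, 5),
}

# Pr(c_k | c_(k-1)): each row is a distribution over the next class.
P_CLASS_GIVEN_PREV = {
    "DET": {"DET": Fraction(0), "NOUN": Fraction(9, 10), "VERB": Fraction(1, 10)},
    "NOUN": {"DET": Fraction(1, 5), "NOUN": Fraction(1, 5), "VERB": Fraction(3, 5)},
    "VERB": {"DET": Fraction(1, 2), "NOUN": Fraction(1, 4), "VERB": Fraction(1, 4)},
}


def class_ngram_prob(word, prev_word):
    """Pr(w_k | w_(k-1)) = Pr(w_k | c_k) * Pr(c_k | c_(k-1))."""
    c_k = WORD_CLASS[word]          # c_k is looked up from w_k itself
    c_prev = WORD_CLASS[prev_word]  # class of the previous word
    return P_WORD_GIVEN_CLASS[word] * P_CLASS_GIVEN_PREV[c_prev][c_k]


def total_mass(prev_word):
    """Sum Pr(w | history) over every word w in the vocabulary."""
    return sum(class_ngram_prob(w, prev_word) for w in WORD_CLASS)


print(class_ngram_prob("dog", "the"))  # (1/2) * (9/10) -> 9/20
print(total_mass("the"))
```

Exact fractions are used so the printed sum is not blurred by floating-point
rounding. Because c_k is recomputed from each candidate w_k inside
class_ngram_prob, the loop in total_mass is a direct numerical test of the
normalization question below.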

Under this definition, the sum of Pr(w_k|w_1_(k-1)) over all w_k
is not equal to 1 but to Pr(c_k|c_1_(k-1)), isn't it?

Isn't it a necessary condition for a language model that
\sum_w Pr(w|history) = 1?

Maybe this is not a question for you, but it has puzzled me for a while.
Thanks in advance for any help.

Best regards

Fuchun

---------------------------------------------------------
 Fuchun Peng
 Computer Science Department, University of Waterloo
 Waterloo, Ontario, Canada, N2L 3G1
 1-519-888-4567 ext 3478
 f3peng at ai.uwaterloo.ca
 --------------------------------------------------------