Corpora: some language modeling questions

Daniel.Walker at lhsl.com
Mon Jul 30 20:13:34 UTC 2001


This is kind of ugly in ASCII with underscores but ...

From "Class-based n-gram models of natural language" (Brown et al., 1990):
P ( w_k | w_1 ... w_k-1) = P ( w_k | c_w_k ) P ( c_w_k | c_w_1 ... c_w_k-1)

I'm using c_w_k instead of c_k to more strongly indicate that this is the
class of the kth word. So, if you sum over all possible words w_k you get

sum for w_k  P ( w_k | w_1 ... w_k-1) =
    sum for w_k  P ( w_k | c_w_k ) P ( c_w_k | c_w_1 ... c_w_k-1)

not P(c_w_k | c_w_1 ... c_w_k-1), because the class c_w_k changes along with
w_k inside the sum. Grouping the words by class, each class c contributes
P(c | c_w_1 ... c_w_k-1) times the sum of P(w | c) over the words in c. That
inner sum is 1 and the class probabilities sum to 1, so the total will be 1
for your corpus if you are using maximum likelihood estimates. Models do not
necessarily have to sum to 1 over the words seen in some corpus. For example,
if you are discounting your estimates to leave some probability mass for
phenomena which you may not have seen in the corpus, the sum may be less
than 1. In this case the model should still sum to 1 over the event space,
but now you're trying to shoot for an event space that is larger than the
corpus. At least, that's my understanding. Good luck!
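
If it helps, here is a small Python sketch of that check. Everything in it is
made up for illustration: the toy corpus, the word-to-class map, and the
function names are my own assumptions, not anything from the Brown paper. It
uses a class bigram (one previous class as the history) with maximum
likelihood estimates, and then a crude fixed discount on the class bigram
counts that sets mass aside without redistributing it.

```python
from collections import Counter

# Hypothetical toy data: a tiny corpus and a hard word-to-class map
# (each word belongs to exactly one class).
corpus = "the cat sat the dog saw a cat the dog ran".split()
word_class = {"the": "DET", "a": "DET", "cat": "N", "dog": "N",
              "sat": "V", "saw": "V", "ran": "V"}
classes = [word_class[w] for w in corpus]

word_count = Counter(corpus)
class_count = Counter(classes)
class_bigram = Counter(zip(classes, classes[1:]))
# How often each class occurs as a history (every position but the last).
history_count = Counter(classes[:-1])


def p_word_given_class(w):
    """Maximum likelihood estimate of P(w | c_w)."""
    return word_count[w] / class_count[word_class[w]]


def p_class_given_prev(c, prev, discount=0.0):
    """Estimate of P(c | prev) from class bigram counts.

    With discount > 0, a fixed amount is subtracted from each seen
    bigram count and deliberately not given back to anything.
    """
    count = class_bigram[(prev, c)]
    return max(count - discount, 0.0) / history_count[prev]


def total_mass(prev, discount=0.0):
    """Sum over all words w of P(w | c_w) * P(c_w | prev)."""
    return sum(
        p_word_given_class(w)
        * p_class_given_prev(word_class[w], prev, discount)
        for w in word_class
    )


mle_totals = {prev: total_mass(prev) for prev in history_count}
discounted_totals = {prev: total_mass(prev, discount=0.5)
                     for prev in history_count}

print("MLE:       ", mle_totals)
print("Discounted:", discounted_totals)
```

With the maximum likelihood estimates every history should give a total of 1
(up to floating point rounding), and with the discount every total should
drop below 1. The missing mass is what you would hand to unseen events in a
real model.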

Daniel Walker




"F. Peng" <f3peng at logos.math.uwaterloo.ca>
Sent by: owner-corpora at lists.uib.no
07/30/2001 06:54 AM

To:      CORPORA at HD.UIB.NO
cc:
Subject: Corpora: some language modeling questions

I have some questions about language modeling. For the class-based n-gram
models (Brown et al.  1990), the probability of word w_k given its history
w_1_(k-1) is defined as

Pr(w_k|w_1_(k-1)) = Pr(w_k|c_k)Pr(c_k|c_1_(k-1))

where w_1_(k-1) is the history of word w_k: w_1...w_(k-1),
c_k is the class which word w_k is in,
c_1_(k-1) is the class history of word w_k: c_1...c_(k-1),

Under this definition, the sum of Pr(w_k|w_1_(k-1)) over all w_k
is not equal to 1; it's Pr(c_k|c_1_(k-1)), isn't it?

Isn't it a necessary condition for a language model to satisfy
\sum_w Pr(w|history) = 1?

Maybe it's not a question for you, but it has puzzled me for a while. Thanks
in advance for any help.

Best regards

Fuchun

---------------------------------------------------------
 Fuchun Peng
 Computer Science Department, University of Waterloo
 Waterloo, Ontario, Canada, N2L 3G1
 1-519-888-4567 ext 3478
 f3peng at ai.uwaterloo.ca
 --------------------------------------------------------


