[Corpora-List] Question about smoothing of

Wladimir Sidorenko wlsidorenko at gmail.com
Mon Jun 25 11:21:32 UTC 2012


Hi Coen,

I tried modified Kneser-Ney smoothing a few months ago (as part of
one of the programming assignments in the Stanford online NLP course).
Unfortunately, the formula I saw didn't tackle the zero problem at
all: it yielded zero as soon as one of the bigram parts had a zero
frequency. But maybe I saw a wrong formula. My solution was to replace
the zeros with some very small constant, say 0.000001 (you can play
around with its value). After that the results were more or less
acceptable - the model outperformed Katz back-off by about 1%. You
could also try combining Kneser-Ney with add-lambda smoothing, which I
suspect would give you even better results. (But I could be wrong.)
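
For illustration, here is a rough Python sketch of the kind of patch I
mean - interpolated bigram Kneser-Ney with a small epsilon floor. The
function names, the discount value and the bigram-only form are my own
choices for the example, not anything from the course materials:

    from collections import Counter

    D = 0.75     # absolute discount, a typical value
    EPS = 1e-6   # the small constant that replaces zeros

    def train(tokens):
        """Collect the counts interpolated Kneser-Ney needs."""
        bigrams = Counter(zip(tokens, tokens[1:]))
        history = Counter(tokens[:-1])                      # c(h)
        followers = Counter(w1 for (w1, _) in bigrams)      # N1+(h *)
        continuations = Counter(w2 for (_, w2) in bigrams)  # N1+(* w)
        return bigrams, history, followers, continuations

    def p_kn(w1, w2, bigrams, history, followers, continuations):
        """P(w2 | w1): interpolated Kneser-Ney with an epsilon floor."""
        if history[w1] == 0:
            return EPS  # unseen history: fall back to the floor constant
        p_cont = continuations[w2] / len(bigrams)  # continuation probability
        discounted = max(bigrams[(w1, w2)] - D, 0) / history[w1]
        gamma = D * followers[w1] / history[w1]    # mass left for lower order
        return max(discounted + gamma * p_cont, EPS)

    tokens = "the cat sat on the mat".split()
    counts = train(tokens)
    print(p_kn("the", "cat", *counts))  # seen bigram
    print(p_kn("dog", "cat", *counts))  # unseen history -> EPS

Note that the floor makes the model slightly improper (the conditional
probabilities no longer sum exactly to one), but for a quick comparison
it did the job.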

Kind regards,
Vladimir


2012/6/21 Coen Jonker <coen.j.jonker at gmail.com>:
> Dear readers of the corpora list,
>
>
> As part of the AI master's course on handwriting recognition I am working
> on the implementation of a statistical language model for 19th-century
> Dutch. I am running into a problem and hope you may be able to help. I
> have already spoken with prof. Ernst Wit and he suggested I contact you.
> I would be very grateful if you could help me along.
>
> The purpose of the statistical language model is to provide a
> knowledge-based estimate of the conditional probability of a word w given
> the history h (the previous words); call this probability P(w|h).
>
> Since the available corpus for this project is quite sparse, I want to
> apply statistical smoothing to the conditional probabilities. I have
> learned that a simple maximum likelihood estimate of P(w|h) will yield
> zero probabilities for word sequences that do not occur in the corpus,
> even though many grammatically correct sequences are missing from it.
> Furthermore, maximum likelihood overestimates the probabilities of the
> sequences that do occur.
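>
> To make this concrete: the maximum likelihood estimate is
>
>     P_ML(w|h) = c(h,w) / c(h),
>
> so any pair (h,w) that never occurs in the corpus has c(h,w) = 0 and
> hence gets probability zero.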
>
> There are many smoothing techniques available, but empirically a modified
> form of Kneser-Ney smoothing has proven very effective (I have attached a
> paper by Stanley Chen and Joshua Goodman explaining this). A quick
> introduction to the topic is at: http://www.youtube.com/watch?v=ody1ysUTD7o
>
> Kneser-Ney smoothing interpolates discounted trigram probabilities with
> lower-order bigram probabilities. The equations on page 12 (370 in the
> journal numbering) of the attached PDF are the ones I use. The problem I
> run into is that the denominator of the fraction, the count of the
> history h in the corpus, may be zero. This causes division errors, and it
> also makes the gamma term zero, which yields zero probabilities - and
> avoiding zero probabilities was one of the reasons to implement smoothing
> in the first place.
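>
> In the bigram case, for example, the interpolated estimate has the form
>
>     P_KN(w|h) = max(c(h,w) - D, 0) / c(h) + gamma(h) * P_cont(w),
>     where gamma(h) = D * N1+(h *) / c(h),
>
> so both the discounted term and the gamma term divide by c(h). When the
> history h never occurs in the corpus, c(h) = 0 and the estimate breaks
> down exactly as described above.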
>
> This problem has frustrated me for a few weeks now. After reading most of
> the available literature on the topic, I am afraid that my knowledge of
> language modeling or statistics is insufficient, or that I have
> misunderstood a fundamental part of the technique.
>
> Did I misunderstand anything? I sincerely hope you are able to point me in
> the direction of a solution.
>
> Sincerely,
>
> Coen Jonker



