[Corpora-List] A question About Chomsky normal form

Miles Osborne miles at inf.ed.ac.uk
Sun Sep 21 10:14:10 UTC 2003


you need to be clearer about how the inside-outside algorithm failed.  if you
mean that you experienced underflow problems, then a standard approach is to
work with log probabilities and renormalise when necessary.  if you mean that
the process image (memory use) grew too large, or that it takes too long to
converge, then you might be able to tie rules together (group them into
equivalence classes that share parameters).  or, you might be able to use a
monte carlo simulation (since the inside-outside algorithm computes expected
rule counts, sampling parses could approximate those expectations).  or, you
might be able to compute the expectation step only partially, or perhaps not
fully maximise at each round.  a paper describing such partial variants of EM
is:

A View of the EM Algorithm that Justifies Incremental, Sparse, and Other
Variants (Radford Neal and Geoffrey Hinton)

http://www.cs.toronto.edu/~radford/em.abstract.html
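
on the underflow point above, here is a minimal sketch of the inside pass done
entirely in log space.  it is an illustration only, not code from any
particular toolkit: the function names (`logsumexp`, `log_inside`) and the
dictionary layout of the grammar are assumptions made up for this example, and
the grammar is assumed to be in CNF already.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(values):
    """Return log(sum(exp(v) for v in values)) without leaving log space."""
    values = list(values)
    if not values:
        return NEG_INF
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    # Subtracting the maximum keeps every exponent <= 0, so exp() cannot overflow
    # and the largest term is represented exactly.
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_inside(words, lexical, binary, start="S"):
    """Log probability of `words` under a CNF PCFG (hypothetical data layout).

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    """
    n = len(words)
    # chart[(i, j)][A] = log P(A derives words[i:j])
    chart = defaultdict(dict)
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w and p > 0:
                chart[(i, i + 1)][a] = math.log(p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            terms = defaultdict(list)  # log-space summands per parent symbol
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for (a, b, c), p in binary.items():
                    if p > 0 and b in left and c in right:
                        terms[a].append(math.log(p) + left[b] + right[c])
            for a, vals in terms.items():
                chart[(i, j)][a] = logsumexp(vals)
    return chart[(0, n)].get(start, NEG_INF)
```

with the toy grammar S -> S S (0.5), S -> a (0.5), the string "a a a" has two
parses of probability 1/32 each, so `log_inside` should return log(1/16).  with
a small lexical probability such as 1e-5 and a sentence of 80 words, the plain
probability is far below what a double can hold, while the log value stays
finite.  the outside pass can be treated the same way.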

it strikes me that people ought to be more interested in scaling up our machine
learning / statistical inference methods.  why not take this chance to see how
far you can scale the inside-outside algorithm?  don't forget to tell us about
it ...


Miles





Quoting Heshaam Feili <hfaili at mehr.sharif.edu>:

> Dear Colleagues,
> I need a relatively large bracketed data set in CNF format to test
> algorithms like inside-outside (Lari and Young 1990). I chose the NEGRA
> data set and am trying to convert it to CNF format.
> After the conversion to CNF, a lot of non-terminals are created
> because of binarization ... so the algorithm (inside-outside) failed.
> What can I do in order to overcome this problem?
> Best
>
>
>
>


