[Corpora-List] looking for a topic models toolkit

Marco Baroni marco.baroni at unitn.it
Sun Feb 28 21:14:12 UTC 2010


Dear All,

I'd like to ask for advice on a toolkit for topic models (Latent 
Dirichlet Allocation and related models).

In particular, I'm looking for something that takes as input a 
word-by-document matrix (or similar data structures), and produces 
probability distributions for the words over the latent topics (my 
immediate goal is to measure word similarity).
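
To make the word-similarity goal concrete, here is a minimal sketch of the 
last step only, assuming a topic model has already been fitted. The counts, 
the names topic_counts, topic_distribution and word_similarity, the smoothing 
value alpha, and the choice of Jensen-Shannon divergence are all assumptions 
made for illustration; they do not come from any particular toolkit.

```python
import math

# Hypothetical toy output of a topic model: how often each word was
# assigned to each of 3 latent topics (numbers invented for illustration).
topic_counts = {
    "corpus":   [40, 2, 1],
    "treebank": [30, 1, 3],
    "neuron":   [1, 35, 2],
}

def topic_distribution(counts, alpha=0.1):
    """Turn smoothed per-topic counts into a distribution P(topic | word)."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so it lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def word_similarity(w1, w2):
    """Similarity in [0, 1]: 1 minus the divergence of the topic distributions."""
    p = topic_distribution(topic_counts[w1])
    q = topic_distribution(topic_counts[w2])
    return 1.0 - js_divergence(p, q)

if __name__ == "__main__":
    print(word_similarity("corpus", "treebank"))
    print(word_similarity("corpus", "neuron"))
```

With these toy counts, "corpus" and "treebank" load on the same topic and so 
score as more similar than "corpus" and "neuron". At the scale described 
here the distributions would come from the toolkit rather than a hand-built 
dictionary.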

I'd like to work with an input corpus with billions of words (millions 
of documents), so I'd need something that scales up well.

Finally, the more out-of-the-boxy it is, the better (in particular, if 
it came with reasonable default choices for the various parameters, that 
would be great).

I'd be grateful for any pointers.

Thanks in advance.

Regards,

Marco




-- 
Marco Baroni
Center for Mind/Brain Sciences (CIMeC)
University of Trento
http://clic.cimec.unitn.it/marco



