[Corpora-List] topic models toolkit: summary
Marco Baroni
marco.baroni at unitn.it
Sun Mar 7 10:41:14 UTC 2010
Hi everybody.
About a week ago, I asked the list for advice re tools for implementing
Topic Models.
Here is a summary of the responses I received, for future reference --
thanks to all those who replied!
***
Diarmuid Ó Séaghdha and Vu Hoang Cong Duy recommended Mallet's topic
modeling library (Diarmuid also reported, based on his own experiments,
that it scales up very well; this is what I'm going to try first):
http://mallet.cs.umass.edu/topics.php
***
Amaç Herdagdelen mentioned that the Apache Lucene Mahout project
includes an implementation of Latent Dirichlet Allocation:
http://lucene.apache.org/mahout/
***
Barbara Plank recommended Jonathan Chang's R LDA implementation (no
guarantee on scalability, but it should be nice at least for prototyping):
http://cran.r-project.org/web/packages/lda/
***
Gianluca Lebani mentioned David Blei's LDA implementation and links to
other toolkits:
http://www.cs.princeton.edu/~blei/lda-c/
***
Thu Le Dieu recommended GibbsLDA++ and its Java variant, and also
provided evidence of their scalability (the first of these pages links
to more toolkits):
http://gibbslda.sourceforge.net/
http://sourceforge.net/projects/jgibblda/
***
Thanks again to those who replied. I hope this short survey will also
be useful to somebody else.
Regards,
Marco
--
Marco Baroni
Center for Mind/Brain Sciences (CIMeC)
University of Trento
http://clic.cimec.unitn.it/marco