[Corpora-List] topic models toolkit: summary

Marco Baroni marco.baroni at unitn.it
Sun Mar 7 10:41:14 UTC 2010


Hi everybody.

About a week ago, I asked the list for advice re tools for implementing 
Topic Models.

Here is a summary of the responses I received, for future reference -- 
thanks to all those who replied!

***

Diarmuid Ó Séaghdha and Vu Hoang Cong Duy recommended Mallet's topic 
modeling library (Diarmuid also provided evidence from his experiments 
that it scales up very well, and this is what I'm going to try first):

http://mallet.cs.umass.edu/topics.php

***

Amaç Herdagdelen mentioned that the Apache Lucene Mahout project 
contains functionality for Latent Dirichlet Allocation:

http://lucene.apache.org/mahout/

***

Barbara Plank recommended Jonathan Chang's R LDA implementation (no 
guarantee on scalability, but should be nice at least for prototyping):

http://cran.r-project.org/web/packages/lda/

***

Gianluca Lebani mentioned David Blei's LDA implementation and links to 
other toolkits:

http://www.cs.princeton.edu/~blei/lda-c/

***

Thu Le Dieu recommended GibbsLDA++ and its Java variant, and also 
provided evidence on their scalability (links to more toolkits on the 
first of these pages):

http://gibbslda.sourceforge.net/
http://sourceforge.net/projects/jgibblda/

***

Thanks again to those who replied, and I hope this short survey will 
also be useful to somebody else.

Regards,

Marco


-- 
Marco Baroni
Center for Mind/Brain Sciences (CIMeC)
University of Trento
http://clic.cimec.unitn.it/marco

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


