[Corpora-List] looking for a topic models toolkit

Mon Mar 1 00:59:56 UTC 2010

Hi,

You can use MALLET at http://mallet.cs.umass.edu/.
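For the word-similarity goal in the quoted message below, here is a minimal sketch of the last step. It assumes a model has already been trained (with MALLET or any other LDA tool) and that, for each word, you have exported how many of its tokens were assigned to each topic. The function names, the smoothing constant `beta`, and the use of Jensen-Shannon divergence as the comparison measure are illustrative assumptions, not MALLET's API.

```python
import math
from typing import Dict, List


def topic_distribution(counts: List[int], beta: float = 0.01) -> List[float]:
    """Turn one word's per-topic assignment counts into p(topic | word).

    beta is an assumed smoothing constant (not a MALLET default); it keeps
    every probability strictly positive.
    """
    total = sum(counts) + beta * len(counts)
    return [(c + beta) / total for c in counts]


def kl(p: List[float], q: List[float]) -> float:
    """Kullback-Leibler divergence KL(p || q), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: List[float], q: List[float]) -> float:
    """Symmetric Jensen-Shannon divergence, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def most_similar(target: str, word_topic_counts: Dict[str, List[int]], n: int = 3) -> List[str]:
    """Return the n words whose topic distributions are closest to target's."""
    dists = {w: topic_distribution(c) for w, c in word_topic_counts.items()}
    p = dists[target]
    scored = [(js_divergence(p, q), w) for w, q in dists.items() if w != target]
    return [w for _, w in sorted(scored)[:n]]


# Hypothetical toy counts over 3 topics, for illustration only.
toy_counts = {
    "dog": [40, 2, 1],
    "cat": [35, 5, 0],
    "bank": [1, 3, 50],
    "money": [0, 6, 44],
}
print(most_similar("dog", toy_counts, n=1))  # ['cat']
```

Jensen-Shannon is used here because it is symmetric and stays finite even when one word never occurs in a topic; cosine similarity over the same vectors would be an equally reasonable choice.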

--
Cheers,
Vu

On Mon, Mar 1, 2010 at 5:14 AM, Marco Baroni <marco.baroni at unitn.it> wrote:

> Dear All,
>
> I'd like to ask for advice re a "Topic Models" (aka Latent Dirichlet
> Allocation, etc.) modeling toolkit.
>
> In particular, I'm looking for something that takes as input a
> word-by-document matrix (or similar data structures), and produces
> probability distributions for the words over the latent topics (my immediate
> goal is to measure word similarity).
>
> I'd like to work with an input corpus with billions of words (millions of
> documents), so I'd need something that scales up well.
>
> Finally, the more out-of-the-boxy it is, the better (in particular, if it
> came with reasonable default choices for the various parameters, that would
> be great).
>
> I'd be grateful for any pointers.
>
> Thanks in advance.
>
> Regards,
>
> Marco
>
> --
> Marco Baroni
> Center for Mind/Brain Sciences (CIMeC)
> University of Trento
> http://clic.cimec.unitn.it/marco
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>