[Corpora-List] token clustering tool

Hal Daume III hdaume at ISI.EDU
Tue May 11 13:58:19 UTC 2004


Also,

  http://www.isi.edu/~och/mkcls.html

works quite well.
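
For a rough sense of what "tokens that occur close to each other
relatively often" can mean in practice, here is a minimal sketch in Python
that just counts contiguous n-grams and keeps the repeated ones. This is
only an illustration: it is not the algorithm used by mkcls or by the
N-gram Statistics Package, and the function name, the parameters and the
toy corpus are made up for the example.

```python
from collections import Counter


def frequent_ngrams(tokens, n=2, min_count=2):
    """Return (ngram, count) pairs for n-grams seen at least min_count times.

    Hypothetical helper for illustration; plain frequency counting only,
    with no association measure or clustering step.
    """
    # Slide a window of n tokens over the corpus and count each window.
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # Keep only the n-grams that repeat, most frequent first.
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]


# Toy corpus (an assumption, not data from this thread).
corpus = "the cat sat on the mat and the cat sat on the rug".split()

for gram, count in frequent_ngrams(corpus, n=3):
    print(" ".join(gram), count)
```

Raw counts favour frequent function words, so a real tool would normally
rank candidates with an association score rather than frequency alone.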

On Tue, 11 May 2004, Tony Berber Sardinha wrote:

> Hi Murk
>
> (1) Simple chunker:
> -First, upload your corpus at http://lael.pucsp.br/corpora/enviar and obtain a
> password
> -Then go to http://lael.pucsp.br/corpora/ngrama/index.html, enter your password
> and cluster size, click on Fazer ("Do")
> -See results
> (2) N-gram Statistics Package v.0.5 (by Ted Pedersen and Satanjeev Banerjee)
> -First, upload your corpus at http://lael.pucsp.br/corpora/enviar and obtain a
> password
> -Go to http://lael.pucsp.br/corpora/nsp/index.html, enter your password and
> other options, click on Fazer ("Do")
> -See results
>
> If you're on Linux / Mac OS X / Unix / Cygwin I can send you a simple Unix shell
> script for that.
>
> cheers
> tony.
> -------------------------------------
> Dr Tony Berber Sardinha
> LAEL, PUC/SP
> (Catholic University of Sao Paulo, Brazil)
> tony4 at uol.com.br
> http://lael.pucsp.br/~tony
> [New website]
>
> ----- Original Message -----
> From: "Murk Wuite" <Murk at polderland.nl>
> To: <CORPORA at HD.UIB.NO>
> Sent: Tuesday, May 11, 2004 04:24
> Subject: [Corpora-List] token clustering tool
>
>
> Dear all,
>
> Does anyone know of a tool (or algorithm), preferably available freely
> for research purposes, that takes as its input a corpus only and
> produces as its output clusters of tokens that occur close to each other
> relatively often?
>
> Best wishes,
>
> Murk Wuite
> MA student at the Department of Language and Speech, Katholieke
> Universiteit Nijmegen, The Netherlands
>
>
>

--
 Hal Daume III                                   | hdaume at isi.edu
 "Arrest this man, he talks in maths."           | www.isi.edu/~hdaume


