[Corpora-List] token clustering tool

Tue May 11 13:32:41 UTC 2004

Hi Murk

(1) Simple chunker:
-First, upload your corpus at http://lael.pucsp.br/corpora/enviar and obtain a
password
-Then go to http://lael.pucsp.br/corpora/ngrama/index.html, enter your password
and cluster size, click on Fazer
-See results
(2) N-gram Statistics Package v.0.5 (by Ted Pedersen and Satanjeev Banerjee)
-First, upload your corpus at http://lael.pucsp.br/corpora/enviar and obtain a
password
-Go to http://lael.pucsp.br/corpora/nsp/index.html, enter your password and
other options, click on Fazer
-See results

If you're on Linux / Mac OS X / Unix / Cygwin I can send you a simple Unix shell
script that does the same job.
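
For illustration only (this is not the shell script mentioned above, nor the
code behind either web tool), here is a minimal Python sketch of what counting
token clusters of a fixed size roughly involves. The function name
count_ngrams, the lowercasing, and the simple \w+ tokenization are all
assumptions made for the example.

```python
import re
from collections import Counter


def count_ngrams(text, n=2):
    """Count contiguous token sequences (n-grams) of length n in text.

    Hypothetical helper for illustration: tokens are lowercased runs of
    word characters, which is a simplification of real tokenization.
    """
    tokens = re.findall(r"\w+", text.lower())
    # Zip n shifted copies of the token list to get every window of size n.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat slept"
    # Print the most frequent two-token clusters with their counts.
    for gram, freq in count_ngrams(sample, 2).most_common(3):
        print(freq, gram)
```

Raw frequency is only a starting point: a package such as NSP also ranks
n-grams with association measures, which this sketch leaves out.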

cheers
tony.
-------------------------------------
Dr Tony Berber Sardinha
LAEL, PUC/SP
(Catholic University of Sao Paulo, Brazil)
tony4 at uol.com.br
http://lael.pucsp.br/~tony
[New website]

----- Original Message -----
From: "Murk Wuite" <Murk at polderland.nl>
To: <CORPORA at HD.UIB.NO>
Sent: Tuesday, 11 May 2004 04:24
Subject: [Corpora-List] token clustering tool

Dear all,

Does anyone know of a tool (or algorithm), preferably available freely
for research purposes, that takes as its input a corpus only and
produces as its output clusters of tokens that occur close to each other
relatively often?

Best wishes,

Murk Wuite
MA student at the Department of Language and Speech, Katholieke
Universiteit Nijmegen, The Netherlands