[Corpora-List] token clustering tool

Jose Maria Gomez Hidalgo jmgomez at uem.es
Tue May 11 08:19:33 UTC 2004


At 09:24 11/05/2004, Murk Wuite wrote:
>Dear all,
>
>Does anyone know of a tool (or algorithm), preferably available freely
>for research purposes, that takes as its input a corpus only and
>produces as its output clusters of tokens that occur close to each other
>relatively often?

The document clustering toolkit CLUTO may fit your needs, perhaps with
some adaptation.
http://www-users.cs.umn.edu/~karypis/cluto/
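
If no existing tool fits, the behaviour described in the question can also be
sketched directly. The following is only a minimal illustration, not CLUTO's
method or API: the function name, the sliding-window size, the use of
pointwise mutual information (PMI) as the "relatively often" criterion, and
the thresholds are all assumptions chosen for the example.

```python
import math
from collections import Counter


def cooccurrence_clusters(tokens, window=3, min_pmi=1.0, min_count=2):
    """Group tokens that co-occur within a sliding window unusually often.

    Hypothetical helper for illustration only. `window` is the number of
    consecutive tokens treated as "close" (window=2 means adjacent tokens).
    """
    unigram = Counter(tokens)
    pair = Counter()
    for i, tok in enumerate(tokens):
        # Pair each token with the tokens that follow it inside the window.
        for other in tokens[i + 1 : i + window]:
            if other != tok:
                pair[frozenset((tok, other))] += 1

    total_tokens = len(tokens)
    total_pairs = sum(pair.values())

    # Union-find structure: tokens linked by a strong pair share a cluster.
    parent = {t: t for t in unigram}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for p, n in pair.items():
        if n < min_count:
            continue
        a, b = tuple(p)
        # PMI compares the observed pair frequency with the frequency
        # expected if the two tokens occurred independently.
        p_pair = n / total_pairs
        p_a = unigram[a] / total_tokens
        p_b = unigram[b] / total_tokens
        if math.log2(p_pair / (p_a * p_b)) >= min_pmi:
            parent[find(a)] = find(b)

    clusters = {}
    for t in unigram:
        clusters.setdefault(find(t), set()).add(t)
    # Singletons are not interesting as clusters, so drop them.
    return [c for c in clusters.values() if len(c) > 1]


if __name__ == "__main__":
    corpus = "new york a b c new york d e f new york g h i".split()
    print(cooccurrence_clusters(corpus))
```

On a real corpus the thresholds would need tuning, and a dedicated clustering
toolkit is likely to scale better than this single-link grouping.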


>Best wishes,
>
>Murk Wuite
>MA student at the Department of Language and Speech, Katholieke
>Universiteit Nijmegen, The Netherlands

Jose Maria Gomez Hidalgo
Departamento de Inteligencia Artificial
Universidad Europea de Madrid
28670 - Villaviciosa de Odon - MADRID
(+34) 912115670
jmgomez at uem.es
http://www.esi.uem.es/~jmgomez/

Spanish law guarantees privacy in electronic communications. This 
electronic transmission is strictly confidential and intended solely for 
the addressee. If you are not the intended addressee, you are kindly 
requested not to disclose nor to copy this transmission and to notify us as 
soon as possible.


