[Corpora-List] token clustering tool

Maarten Jansonius jansonius at lige.ucl.ac.be
Mon May 24 08:00:06 UTC 2004


At 10:19 11-5-2004, you wrote:
>At 09:24 11/05/2004, Murk Wuite wrote:
>>Dear all,
>>
>>Does anyone know of a tool (or algorithm), preferably available freely
>>for research purposes, that takes as its input a corpus only and
>>produces as its output clusters of tokens that occur close to each other
>>relatively often?
>
>The document clustering toolkit CLUTO may fit your needs, perhaps with
>some adaptation.
>http://www-users.cs.umn.edu/~karypis/cluto/

WordSmith Tools (not free) has a Cluster function that takes a corpus and 
outputs word clusters based on co-occurrence statistics. 
http://www.lexically.net/wordsmith/
Version 4, while still in beta, can be used freely for about a month. 
WordSmith can also be used with annotated corpora (it can ignore or use tags).

The freeware AntConc program has a similar function for outputting word 
clusters.
http://www.f.waseda.jp/anthony/
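
If you would rather script it yourself, here is a rough sketch in Python of 
the general idea, on the assumption that a "cluster" is a contiguous word 
sequence that recurs in the corpus. This is not how WordSmith or AntConc 
actually implement it; the function name word_clusters, the parameters n and 
min_freq, and the simple regex tokenizer are all my own assumptions for 
illustration.

```python
from collections import Counter
import re


def word_clusters(text, n=2, min_freq=2):
    """Count contiguous n-word sequences; keep those seen at least min_freq times.

    Hypothetical helper: a crude stand-in for a cluster/n-gram function,
    not the method used by any particular tool.
    """
    # Very simple tokenizer (an assumption): lowercase letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens over the token list.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(ngrams)
    # Most frequent first, dropping sequences below the threshold.
    return [(" ".join(g), c) for g, c in counts.most_common() if c >= min_freq]


sample = "the cat sat on the mat and the cat slept on the mat"
clusters = word_clusters(sample, n=2, min_freq=2)
for phrase, freq in clusters:
    print(freq, phrase)
```

Raw frequency is only the crudest measure. For tokens that are near each 
other but not adjacent, you would count pairs within a window instead and 
probably rank them with an association score rather than a plain count.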

And here's a further list of links to some similar programs: 
http://www.lboro.ac.uk/research/mmethods/research/software/stats.html

Hope this helps,
Maarten Jansonius


_______________________________
Maarten Jansonius
FLTR / GERM / LIGE
Université catholique de Louvain

Collège Erasme, C468
010 / 47.49.73
_______________________________


