[Corpora-List] token clustering tool
Maarten Jansonius
jansonius at lige.ucl.ac.be
Mon May 24 08:00:06 UTC 2004
At 10:19 11-5-2004, you wrote:
>At 09:24 11/05/2004, Murk Wuite wrote:
>>Dear all,
>>
>>Does anyone know of a tool (or algorithm), preferably available freely
>>for research purposes, that takes as its input a corpus only and
>>produces as its output clusters of tokens that occur close to each other
>>relatively often?
>
>It is possible that the document clustering toolkit CLUTO fits your
>needs, perhaps with some adaptation.
>http://www-users.cs.umn.edu/~karypis/cluto/
WordSmith Tools (not free) has a Cluster function which takes a corpus and
outputs word clusters based on co-occurrence statistics.
http://www.lexically.net/wordsmith/
Version 4, while still in beta, can be used freely for about a month.
WordSmith can also be used with annotated corpora (it can ignore or use tags).
The freeware AntConc program has a similar function for outputting word
clusters.
http://www.f.waseda.jp/anthony/
And here's a further list of links to some similar programs:
http://www.lboro.ac.uk/research/mmethods/research/software/stats.html
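For anyone who would rather script this than use a packaged tool, here is a
rough sketch of one simple reading of "word clusters": contiguous word
sequences that recur in the corpus. This is only my assumption about the
general idea, not the actual method used by WordSmith or AntConc; the function
names, the crude tokenizer, and the `size` and `min_freq` parameters are all
made up for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_clusters(text, size=2, min_freq=2):
    """Return contiguous sequences of `size` tokens that occur at least `min_freq` times.

    Hypothetical helper: it counts raw frequencies only and applies no
    association measure such as mutual information.
    """
    tokens = tokenize(text)
    # Slide a window of `size` tokens over the corpus and count each sequence.
    counts = Counter(
        tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)
    )
    # Keep only sequences that recur often enough, most frequent first.
    return [(" ".join(gram), n) for gram, n in counts.most_common() if n >= min_freq]


sample = "the cat sat on the mat and the cat sat on the sofa"
clusters = word_clusters(sample, size=3, min_freq=2)
for cluster, freq in clusters:
    print(freq, cluster)
```

On a real corpus you would read the text from files and probably want a better
tokenizer and a statistical threshold rather than a raw frequency cut-off.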
Hope this helps,
Maarten Jansonius
_______________________________
Maarten Jansonius
FLTR / GERM / LIGE
Université catholique de Louvain
Collège Erasme, C468
010 / 47.49.73
_______________________________