Hi,<br><br>I need software (even a raw piece of code) that can cluster words from a large untagged corpus into groups based on their distributional and morphological similarity.<br>One such tool is provided by Alexander Clark (<a href="http://www.cs.rhul.ac.uk/home/alexc/">http://www.cs.rhul.ac.uk/home/alexc/</a>), but his code works only with ASCII characters. I have used it before and it works pretty well.<br>


<br>I need something that works with Unicode-encoded text.<br>I can manage even if the software doesn't take morphological information into account.<br><br>Thanks!<br>Manaal Faruqui<br>IIT Kharagpur, India<br>