Hi,<br><br> I'm looking for a software package that I can use to generate a document similarity matrix for a small corpus of 50 documents, using several of the standard algorithms such as tf-idf, Okapi, language models, cosine, LSA, etc.<br>
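To show what the simplest of these combinations produces, here is a minimal pure-Python sketch of a tf-idf weighted, cosine-similarity document matrix. Note that tf-idf is a term-weighting scheme and cosine is the similarity measure applied to the weighted vectors. The sketch assumes simple lowercase regex tokenization and the basic tf × log(N/df) weighting, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `similarity_matrix`) are hypothetical. It is not a vetted implementation, and it does not cover Okapi, language models, or LSA.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Raw term frequency times idf = log(N / df); many other variants exist.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def similarity_matrix(docs):
    """Full N x N matrix of pairwise tf-idf cosine similarities."""
    vecs = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vecs] for a in vecs]


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock prices fell sharply",
]
for row in similarity_matrix(docs):
    print(" ".join(f"{x:.3f}" for x in row))
```

One caveat of this weighting: a term that occurs in every document gets idf = log(1) = 0, so a document made up only of such terms ends up with a zero vector and a similarity of 0.0 (even with itself). For the other models, established toolkits such as scikit-learn (`TfidfVectorizer`) and gensim (LSI/LSA) are more trustworthy starting points than a hand-rolled sketch.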

<br> Research code is fine; I just want a trusted implementation of these algorithms. My languages in order of preference are [Python, C, C++], [Java], [Perl], and beyond that I have no particular preference, though anything else is fine too :)<br>

<br> I want to correlate the resulting similarity scores with human ratings in a research setting.<br><br> Thank you very much!<br> Stephan.<br>