[Corpora-List] Document similarity tools?
Sebastian Nagel
wastl.nagel at googlemail.com
Sun Mar 3 21:29:08 UTC 2013
Hi Ivelina,
Have a look at
https://github.com/vilda/shash/
an implementation of Charikar's similarity hash (simhash).
You need to compile the source code yourself.
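
For readers who want the idea behind Charikar's similarity hash before
building anything, here is a minimal Python sketch. It is not the shash
code and does not mirror its API; the function names, the word-level
tokenization, the use of MD5 as the per-token hash, and the 64-bit width
are all assumptions made for illustration. Each token votes on every bit
of the fingerprint, weighted by its frequency, and similar documents end
up with fingerprints that differ in few bits.

```python
import hashlib
import re
from collections import Counter


def simhash(text: str, bits: int = 64) -> int:
    """Return a Charikar-style fingerprint of `text` (illustrative sketch).

    `bits` must be a multiple of 8 and at most 128, because the
    per-token hash is taken from the first bytes of an MD5 digest.
    """
    # Assumption: tokens are lowercase word runs, weighted by raw count.
    weights = Counter(re.findall(r"\w+", text.lower()))
    votes = [0] * bits
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # A set bit votes +weight, a clear bit votes -weight.
            votes[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def similarity(a: str, b: str, bits: int = 64) -> float:
    """Fraction of fingerprint bits that agree, in the range 0.0 to 1.0."""
    distance = hamming_distance(simhash(a, bits), simhash(b, bits))
    return 1.0 - distance / bits
```

Calling similarity(doc_a, doc_b) on two strings gives a rough score;
what counts as "near-duplicate" depends on the corpus, so any threshold
would have to be tuned on your own data.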
Sebastian
On 03/03/2013 12:09 PM, Kayla Jacobs wrote:
> Dear Ivelina,
>
> Try MALLET for document classification and topic modelling:
>
> http://mallet.cs.umass.edu/
>
> Fast, reliable, pretty easy to use, well-documented, and free.
>
> Good luck!
> Kayla Jacobs
>
>
>> Date: Sun, 03 Mar 2013 11:46:41 +0200
>> From: Ivelina Nikolova <iva at lml.bas.bg>
>> Subject: [Corpora-List] Document similarity tools?
>> To: corpora at uib.no
>>
>> Dear All,
>>
>> I was wondering whether there is a public library or toolbox including
>> various document similarity measures.
>>
>> Thanks,
>> Ivelina
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora