[Corpora-List] Document similarity tools?

Sebastian Nagel wastl.nagel at googlemail.com
Sun Mar 3 21:29:08 UTC 2013


Hi Ivelina,

have a look at
  https://github.com/vilda/shash/
which is an implementation of Charikar's similarity hash (simhash).
You will need to compile it from source.
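For readers unfamiliar with the technique, here is a minimal Python sketch of a Charikar-style similarity hash. It is not the shash code and does not reflect its interface. It assumes that the features are lowercase word tokens weighted by frequency, and that each token is mapped to 64 bits using the first 8 bytes of its MD5 digest. The function names (`simhash`, `hamming_distance`, `similarity`) are hypothetical. Each token casts a weighted vote on every bit position. The sign of each total becomes one bit of the fingerprint, so similar documents get fingerprints that differ in only a few bits.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # fingerprint width; _feature_hash supplies exactly 64 bits


def _feature_hash(token: str) -> int:
    # Stable 64-bit hash of a token: the first 8 bytes of its MD5 digest.
    # Python's built-in hash() is salted per process, so it is avoided here.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text: str) -> int:
    # Assumption: features are lowercase word tokens weighted by frequency;
    # shingles or tf-idf weights are common alternatives.
    weights = Counter(re.findall(r"\w+", text.lower()))
    votes = [0] * BITS
    for token, weight in weights.items():
        h = _feature_hash(token)
        for i in range(BITS):
            # A set bit adds the token's weight and a clear bit subtracts it.
            votes[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i in range(BITS):
        if votes[i] > 0:  # a positive total sets bit i of the fingerprint
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    # Number of bit positions in which two fingerprints differ.
    return bin(a ^ b).count("1")


def similarity(a: str, b: str) -> float:
    # Fraction of matching fingerprint bits (1.0 = identical fingerprints).
    return 1.0 - hamming_distance(simhash(a), simhash(b)) / BITS


if __name__ == "__main__":
    d1 = "the quick brown fox jumps over the lazy dog"
    d2 = "the quick brown fox jumped over the lazy dog"
    d3 = "stock markets fell sharply on monday"
    print(similarity(d1, d2), similarity(d1, d3))
```

Near-duplicates such as d1 and d2 should share most fingerprint bits. Unrelated texts such as d1 and d3 should agree on roughly half of them, which is what chance alone would produce. Because each document reduces to a single 64-bit integer, fingerprints can be stored and compared cheaply across large collections.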

Sebastian

On 03/03/2013 12:09 PM, Kayla Jacobs wrote:
> Dear Ivelina,
> 
> Try MALLET for document classification and topic modelling:
> 
> http://mallet.cs.umass.edu/
> 
> Fast, reliable, pretty easy to use, well-documented, and free.
> 
> Good luck!
> Kayla Jacobs
> 
> 
>> Date: Sun, 03 Mar 2013 11:46:41 +0200
>> From: Ivelina Nikolova <iva at lml.bas.bg>
>> Subject: [Corpora-List] Document similarity tools?
>> To: corpora at uib.no
>>
>> Dear All,
>>
>> I was wondering whether there is a public library or toolbox including
>> various document similarity measures.
>>
>> Thanks,
>> Ivelina
>>


_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


