[Corpora-List] near duplicate detection

Marco Baroni baroni at sslmit.unibo.it
Thu Jun 2 12:28:56 UTC 2005


Dear Linda,

There was a thread about near-duplicate detection on the list in late
December/early January; perhaps there is something there that is useful
for your problem.

In particular, Marc Kupietz made his tool for near-duplicate detection
available:

http://torvald.aksis.uib.no/corpora/2004-3/0374.html

We also have a tool that we hope to make available in a week or so (it
requires MySQL, and I'm not sure it would run on any platform other than
Linux...)

Best regards,

Marco


