[Corpora-List] near duplicate detection
Marco Baroni
baroni at sslmit.unibo.it
Thu Jun 2 12:28:56 UTC 2005
Dear Linda,
There was a thread about near-duplicate detection on the list in late
December/early January; perhaps there is something useful for your
problem there as well.
In particular, Marc Kupietz made his tool for near dup detection
available:
http://torvald.aksis.uib.no/corpora/2004-3/0374.html
We also have a tool that we hope to make available in a week or so
(it requires MySQL, and I'm not sure it would run on any platform
other than Linux...)
Best regards,
Marco