[Corpora-List] Q: How to identify duplicates in a large document collection
Marc Kupietz
kupietz at ids-mannheim.de
Wed Jan 12 13:12:21 UTC 2005
As promised, you can now download the part of our tool that calculates
n-gram-based similarities in text collections via anonymous FTP from:
ftp://ftp.ids-mannheim.de/kt/CSSCCb-4.0.tar.bz2
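For readers new to the idea, n-gram-based similarity between texts can be
sketched roughly as follows. This is only an illustration and not the
algorithm implemented in the tool above: the function names, the whitespace
tokenization, the choice of word trigrams (n = 3), the Jaccard coefficient
and the 0.8 threshold are all assumptions made for the example.

```python
from itertools import combinations


def word_ngrams(text, n=3):
    """Return the set of word n-grams in text (assumed: whitespace tokens)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(grams_a, grams_b):
    """Jaccard coefficient of two n-gram sets, in the range [0, 1]."""
    if not grams_a and not grams_b:
        return 0.0  # both texts shorter than n tokens: nothing to compare
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def ngram_similarity(a, b, n=3):
    """Similarity of two texts as the overlap of their n-gram sets."""
    return jaccard(word_ngrams(a, n), word_ngrams(b, n))


def near_duplicates(docs, threshold=0.8, n=3):
    """Yield (i, j, score) for each document pair scoring >= threshold."""
    grams = [word_ngrams(d, n) for d in docs]
    for i, j in combinations(range(len(docs)), 2):
        score = jaccard(grams[i], grams[j])
        if score >= threshold:
            yield i, j, score
```

Comparing every pair is quadratic in the number of documents, so a large
collection would probably need hashing or indexing of the n-grams on top of
this.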
Regards,
Marc
P.S.: Our network connection is only up about 80% of the time, and
currently only active-mode FTP is possible...
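If your client defaults to passive mode, a download in active mode might
look like the sketch below, using Python's standard ftplib. The helper
names, the default timeout and the output filename are assumptions made for
the example; active mode also requires that the server can open a
connection back to your machine, so it may fail behind a firewall or NAT.

```python
import ftplib
from urllib.parse import urlsplit


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory, filename)."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    return parts.hostname, directory or "/", filename


def fetch_active(url, dest=None, timeout=60):
    """Download url via anonymous FTP in active mode; return the local path."""
    host, directory, filename = split_ftp_url(url)
    dest = dest or filename  # assumed default: same name, current directory
    with ftplib.FTP(host, timeout=timeout) as ftp:
        ftp.login()          # no arguments means anonymous login
        ftp.set_pasv(False)  # switch from passive (the default) to active
        ftp.cwd(directory)
        with open(dest, "wb") as out:
            ftp.retrbinary(f"RETR {filename}", out.write)
    return dest


# Usage (not executed here, since it needs network access):
# fetch_active("ftp://ftp.ids-mannheim.de/kt/CSSCCb-4.0.tar.bz2")
```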
--
Marc Kupietz Tel. (+49) 621/1581-409
Institut für Deutsche Sprache, Dept. of Lexical Studies/Corpus Technology
PO Box 101621, 68016 Mannheim, Germany http://www.ids-mannheim.de/