[Corpora-List] Q: How to identify duplicates in a large document collection

Marc Kupietz kupietz at ids-mannheim.de
Wed Jan 12 13:12:21 UTC 2005


As promised, you can now download the part of our tool that calculates
n-gram-based similarities in text collections via anonymous FTP from:

ftp://ftp.ids-mannheim.de/kt/CSSCCb-4.0.tar.bz2

Regards,
 Marc

P.S.: Our network connection is only about 80% up, and currently only
active-mode FTP is possible...

-- 
Marc Kupietz                                      Tel. (+49) 621/1581-409
Institut für Deutsche Sprache, Dept. of Lexical Studies/Corpus Technology
PO Box 101621, 68016 Mannheim, Germany        http://www.ids-mannheim.de/


