Corpora: Plagiarism detection

Mon May 8 15:56:31 UTC 2000

> Does anyone know of any current plagiarism detection projects currently
> going on? I know of Malcolm Coulthard and Copycatch, but are there any other
> projects? Also, I would like to do some statistical work on plagiarised
> work, but does anyone know where I can find any data?

The following reference, and the references cited within it, might be helpful.

"Syntactic Clustering of the Web" by A. Z. Broder, S. C. Glassman, M. S.
Manasse, and G. Zweig, from Proc. of WWW6, available at
http://decweb.ethz.ch/WWW6/Technical/Paper205/Paper205.html

They use document fingerprinting to cluster syntactically similar documents.
The same technique has been used by Nevin Heintze to find similar documents
on the web; see http://www.cs.cmu.edu/afs/cs/user/nch/www/koala/main.html
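
To give a rough idea of how this kind of fingerprinting works, here is a minimal sketch in Python. It is my own simplified illustration, not code from either project: it breaks each document into overlapping runs of w words ("shingles") and scores two documents by the overlap of their shingle sets. The function names and the default w=4 are assumptions made for the example, and a real system would compare small hashed samples of the shingles rather than the full sets.

```python
def shingles(text, w=4):
    """Return the set of all contiguous w-word sequences in text."""
    tokens = text.lower().split()
    if len(tokens) < w:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(doc_a, doc_b, w=4):
    """Jaccard overlap of the two shingle sets, between 0.0 and 1.0."""
    set_a, set_b = shingles(doc_a, w), shingles(doc_b, w)
    union = set_a | set_b
    if not union:
        # Both documents are empty; report no measurable resemblance.
        return 0.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspect = "the quick brown fox leaps over the lazy dog"
    print(resemblance(original, suspect))
```

A score near 1.0 suggests near-duplicate text, while a score near 0.0 suggests little shared wording. Where to put the threshold for "suspicious" depends on the data.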

-Anoop