Corpora: Plagiarism detection

Tom Vanallemeersch Tom.Vanallemeersch at
Mon May 8 16:37:50 UTC 2000

Paul Clough wrote:
> Hi,
> Does anyone know of any current plagiarism detection projects currently
> going on? I know of Malcolm Coulthard and Copycatch, but are there any other
> projects? Also, I would like to do some statistical work on plagiarised
> work, but does anyone know where I can find any data? I am after plagiarism
> of natural language rather than software plagiarism. Any help would be very
> much appreciated.
> Thanks,
> Paul Clough.
> Postgraduate at The University of Sheffield,
> England.

A while ago, I wrote a program that detects strings shared by two
texts. It runs under Unix and takes two filenames as arguments. The
output is a list of the shared strings, ordered by length, with
information on their occurrences in each text. A string is only
listed if it appears in varying contexts (e.g. "with respect to"
is only listed if it is preceded or followed by different words in
the two texts). For very similar texts, a shared string may also be
a very large block of text.
If you think this would be useful, I can send you a copy of the program.
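
To give a rough idea of the approach, here is a minimal sketch in Python.
It is not the program itself, only an illustration under some assumptions:
matching is done on whitespace-separated words, length is counted in words,
and the longest strings come first. The function name shared_strings and
the min_words parameter are hypothetical names chosen for this example.

```python
from collections import defaultdict


def shared_strings(text_a, text_b, min_words=2):
    """Return word sequences shared by two texts, longest first.

    Each result is (string, start positions in A, start positions in B),
    with positions given as word offsets. A sequence is reported only for
    a pair of occurrences that cannot be extended to the left or right,
    i.e. whose neighbouring words differ (or a text boundary is reached).
    """
    a = text_a.split()
    b = text_b.split()
    found = defaultdict(lambda: (set(), set()))

    # prev[j] = length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] != b[j - 1]:
                continue
            # Extend the run that ended on the previous pair of words.
            cur[j] = prev[j - 1] + 1
            # The run stops here if a text ends or the next words differ.
            stops = i == len(a) or j == len(b) or a[i] != b[j]
            if stops and cur[j] >= min_words:
                n = cur[j]
                key = " ".join(a[i - n:i])
                found[key][0].add(i - n)
                found[key][1].add(j - n)
        prev = cur

    results = [(s, sorted(pa), sorted(pb)) for s, (pa, pb) in found.items()]
    # Longest (in words) first; alphabetical order breaks ties.
    results.sort(key=lambda r: (-len(r[0].split()), r[0]))
    return results
```

To mimic the two-filename interface, one would read both files and pass
their contents to shared_strings. The table-based comparison takes time
proportional to the product of the two text lengths, which is fine for
short documents; for large corpora a suffix-based index would probably be
a better choice.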



LANT nv/sa, Research Park Haasrode, Interleuvenlaan 21, B-3001 Leuven
mailto:Tom.Vanallemeersch at
Phone: ++32 16 405140
Fax: ++32 16 404961
