Corpora: Measuring Text Reuse

Bill Mann bill_mann at sil.org
Sat May 6 11:10:06 UTC 2000


     Paul and all:

     A great deal of thought has gone into finding clear boundaries and
     categories of reuse, under labels such as copyright, plagiarism,
     translation, and intellectual property.  It would be nice if that
     work could be used directly, or its concepts adapted, for corpus
     research.  It seems worth looking at.

     Bill Mann


______________________________ Reply Separator _________________________________
Subject: Corpora: Measuring Text Reuse
Author:  <p.clough at dcs.shef.ac.uk> at Internet
Date:    5/5/00 9:39 AM


Hi,

I am a postgraduate working on a project looking at how text from a British
news agency is being reused by various newspapers. I am interested in
whether anyone else is working on anything similar or knows any other
projects dealing with reuse. I am trying to work out how to define reuse,
since I am dealing not only with verbatim copying of text but also with
cases where the text is paraphrased. Does anyone have ideas or opinions on
what counts as reuse of text, or on what tools could be used to extract
reused material?

Thanks for any comments,

Paul Clough.
University of Sheffield.


