[Corpora-List] Corpora for text reuse and plagiarism detection

Linda Bawcom linda.bawcom at sbcglobal.net
Tue Mar 2 23:45:36 UTC 2010


Dear Adeel,

I am not familiar with METER or PAN, so I'm not quite sure if the following will be helpful, but I have used a free program called Wcopyfind 2.6, which you can access at the URL below. I used it to find text reuse in a small corpus I created of newspaper articles on one particular subject. It worked very quickly, and there are various settings to choose from (e.g. have it bridge 3 or more words that are not 100% matches). There is also a very user-friendly guide on the same web site that explains all the settings. I had the program scan some 73 articles (from plain text files), and the results took perhaps 5 seconds. The program tells you how many words are reused and what percentage of reuse there is in each comparison. You can also get a side-by-side view of the results for each pair. This was all very helpful because a few of the articles were around 95% text reuse, so if I had kept them in the corpus, they would have skewed the statistical results I was working on. But you do need to have all the articles or essays or whatever you want to scan (i.e. it does not check the Internet).
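
For anyone who wants to see the general idea behind this kind of pairwise check, here is a minimal Python sketch. It is not Wcopyfind's actual algorithm, and it does not do the imperfect-match bridging mentioned above. It simply assumes that "reuse" means sharing an exact run of n consecutive words (n=6 is an arbitrary choice here), and the function names are made up for illustration. Reading the plain text files into the docs dictionary is left out.

```python
import re
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_set(words, n):
    """Return the set of all n-word phrases that occur in a document."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reused_word_count(words, other_phrases, n):
    """Count the words covered by any n-word phrase also found in the other document."""
    covered = set()
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in other_phrases:
            covered.update(range(i, i + n))
    return len(covered)


def compare_pairs(docs, n=6):
    """Compare every pair of documents in docs (a dict of name -> text).

    Returns a list of tuples:
    (name_a, name_b, reused_words_a, percent_a, reused_words_b, percent_b)
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    phrases = {name: phrase_set(words, n) for name, words in tokens.items()}
    results = []
    for a, b in combinations(sorted(tokens), 2):
        count_a = reused_word_count(tokens[a], phrases[b], n)
        count_b = reused_word_count(tokens[b], phrases[a], n)
        pct_a = 100.0 * count_a / len(tokens[a]) if tokens[a] else 0.0
        pct_b = 100.0 * count_b / len(tokens[b]) if tokens[b] else 0.0
        results.append((a, b, count_a, pct_a, count_b, pct_b))
    return results


if __name__ == "__main__":
    # Toy stand-ins for the contents of plain text files.
    docs = {
        "a.txt": "The quick brown fox jumps over the lazy dog today",
        "b.txt": "Yesterday the quick brown fox jumps over the lazy cat",
        "c.txt": "Completely unrelated words appear in this other short article here",
    }
    for row in compare_pairs(docs):
        print("%s vs %s: %d words (%.0f%%) / %d words (%.0f%%)" % row)
```

Pairs with a very high percentage are the candidates to look at side by side and possibly drop from the corpus before running statistics.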

http://plagiarism.phys.virginia.edu/Wsoftware.html

Kindest regards,
Linda




________________________________
From: Muhammad Adeel <nawabadeel at gmail.com>
To: corpora at uib.no
Sent: Tue, March 2, 2010 4:46:43 PM
Subject: [Corpora-List] Corpora for text reuse and plagiarism detection

Hi Everyone, 

Does anyone know of corpora for text reuse and plagiarism detection apart from METER and the PAN 1st plagiarism detection competition corpora?

-- 
Regards
Adeel