[Corpora-List] Corpora for text reuse and plagiarism detection
Linda Bawcom
linda.bawcom at sbcglobal.net
Tue Mar 2 23:45:36 UTC 2010
Dear Adeel,
I am not familiar with METER or PAN, so I'm not sure whether the following will be helpful, but I have used a free program called WCopyfind 2.6, which you can access at the URL below. I used it to find text reuse in a small corpus I created of newspaper articles on one particular subject. It worked very quickly, and there are various settings to choose from (e.g. you can have it bridge 3 or more words that are not 100% matches). There is also a very user-friendly guide on the same web site that explains all the settings. I had the program scan some 73 articles (from plain text files), and it took perhaps 5 seconds to produce the results. The program tells you how many words match and what percentage of reuse there is in each comparison. You can also get a side-by-side view of the results for each pair. This was all very helpful because a few of the articles were around 95% text reuse, so if I had kept them in the corpus, they would have skewed the statistical results I was working on. Note, though, that you need to have on hand all the articles, essays, or other texts that you want to scan (the program does not check the Internet).
http://plagiarism.phys.virginia.edu/Wsoftware.html
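For anyone who wants to see the general idea behind this kind of pairwise check, here is a rough sketch in Python. It is not WCopyfind's actual algorithm; it simply assumes that reuse can be approximated by the share of word n-grams two texts have in common. The function names, the 6-word window, and the 90% threshold are all hypothetical choices for illustration.

```python
import re
from itertools import combinations


def word_ngrams(text, n=6):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_percentage(doc_a, doc_b, n=6):
    """Percentage of doc_a's distinct n-grams that also occur in doc_b."""
    grams_a = word_ngrams(doc_a, n)
    if not grams_a:
        return 0.0
    return 100.0 * len(grams_a & word_ngrams(doc_b, n)) / len(grams_a)


def flag_reuse(docs, threshold=90.0, n=6):
    """Compare every pair in docs (a name -> text mapping).

    Returns a list of (name_a, name_b, pct) for pairs whose reuse, taken
    as the larger of the two directions, is at least threshold percent.
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(docs.items()), 2):
        pct = max(reuse_percentage(text_a, text_b, n),
                  reuse_percentage(text_b, text_a, n))
        if pct >= threshold:
            flagged.append((name_a, name_b, pct))
    return flagged
```

Calling flag_reuse on a dictionary that maps file names to article texts would list the near-duplicate pairs, which could then be dropped from the corpus before running any statistics. Unlike the bridging setting mentioned above, this sketch only counts exact n-gram matches.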
Kindest regards,
Linda
________________________________
From: Muhammad Adeel <nawabadeel at gmail.com>
To: corpora at uib.no
Sent: Tue, March 2, 2010 4:46:43 PM
Subject: [Corpora-List] Corpora for text reuse and plagiarism detection
Hi Everyone,
Does anyone know of corpora for text reuse and plagiarism detection apart from METER and the PAN 1st plagiarism detection competition corpora?
--
Regards
Adeel
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora