<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Dear everyone,<div class="">I’ve heard that shuffling a corpus so that its original sentence order cannot be recovered counts as a transformation, and that this is enough to alleviate the risk of copyright infringement. </div><div class="">Can anyone confirm this?</div><div class=""><br class=""></div><div class="">Best and happy new year,</div><div class=""><br class=""></div><div class="">Djamé </div><div class=""><br class=""></div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On 6 Jan 2015, at 16:04, Mcenery, Tony &lt;<a href="mailto:a.mcenery@lancaster.ac.uk" class="">a.mcenery@lancaster.ac.uk</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); direction: ltr; font-family: Tahoma; font-size: 10pt;" class=""><font face="Tahoma, Geneva, sans-serif" class="">Thanks to all who have contributed to this thread - I have really enjoyed it. Khalid made a passing reference to the UK position - this has recently become quite permissive for non-commercial text mining research, but we have been debating back and forth in Lancaster exactly what this means for corpus linguists. Because English law is based on case law, we won't really know until some cases have been brought and we can see how the laws/regulations are to be interpreted; hence, I assume, Khalid's comment about the situation being unclear. 
Anyway, for those of you interested in the new exceptions to copyright in the UK, you can read all about it here:</font><div style="font-family: Tahoma, Geneva, sans-serif; font-size: 10pt;" class=""><br class=""></div><div class=""><font face="Tahoma, Geneva, sans-serif" class=""><a href="https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/375951/Education_and_Teaching.pdf" style="color: purple; text-decoration: underline;" class="">https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/375951/Education_and_Teaching.pdf</a></font><br class=""><div style="font-family: Tahoma, Geneva, sans-serif; font-size: 10pt;" class=""><br class=""><div style="font-family: Tahoma; font-size: 13px;" class=""> </div></div><div style="font-family: 'Times New Roman'; font-size: 16px;" class=""><hr tabindex="-1" class=""><div id="divRpF417969" style="direction: ltr;" class=""><font face="Tahoma" size="2" class=""><b class="">From:</b><span class="Apple-converted-space"> </span><a href="mailto:corpora-bounces@uib.no" style="color: purple; text-decoration: underline;" class="">corpora-bounces@uib.no</a><span class="Apple-converted-space"> </span>[<a href="mailto:corpora-bounces@uib.no" style="color: purple; text-decoration: underline;" class="">corpora-bounces@uib.no</a>] on behalf of Mark Davies [<a href="mailto:Mark_Davies@byu.edu" style="color: purple; text-decoration: underline;" class="">Mark_Davies@byu.edu</a>]<br class=""><b class="">Sent:</b><span class="Apple-converted-space"> </span>06 January 2015 13:36<br class=""><b class="">To:</b><span class="Apple-converted-space"> </span><a href="mailto:corpora@uib.no" style="color: purple; text-decoration: underline;" class="">corpora@uib.no</a><br class=""><b class="">Subject:</b><span class="Apple-converted-space"> </span>Re: [Corpora-List] Copyright question again<br class=""></font><br class=""></div><div class=""></div><div class=""><div style="margin-top: 0px; margin-bottom: 0px;" 
class="">Marc Brysbaert wrote:<br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class="">>> <span style="color: rgb(31, 73, 125); font-family: Calibri, sans-serif; font-size: 11pt;" class="">For what it is worth, in my experience word frequency lists and N-gram lists are not a problem. </span></div><div class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class=""><br class=""></span></div><div class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class="">I agree. I've distributed COCA/COHA word frequency (<a href="http://www.wordfrequency.info/" style="color: purple; text-decoration: underline;" class="">http://www.wordfrequency.info</a>) and n-grams (<a href="http://www.ngrams.info/" style="color: purple; text-decoration: underline;" class="">http://www.ngrams.info</a>) data for several years now, and I've never had any issues.</span></div><div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class="">>> <span style="color: rgb(31, 73, 125); font-family: Calibri, sans-serif; font-size: 15px; background-color: rgb(255, 255, 255);" class="">The big problem we are encountering is that currently there is no guidance about whether corpora can be shared. As a result, nearly all corpora assembled remain next to inaccessible, meaning that everyone has to collect their own corpus. 
This is a lot of needless work and also means that little cumulative work can be done.</span><br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class="">I've also been distributing "full-text" data from the 450-million-word COCA and the 1.9-billion-word GloWbE (<a href="http://corpus.byu.edu/glowbe" style="color: purple; text-decoration: underline;" class="">http://corpus.byu.edu/glowbe</a>) for a while now, and again I've had no problems to this point. There is a "twist", though, in how the full-text data has been slightly altered to avoid copyright problems:<br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class=""><a href="http://corpus.byu.edu/full-text/limitations.asp" target="_blank" style="color: purple; text-decoration: underline;" class="">http://corpus.byu.edu/full-text/limitations.asp</a><br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class="">Best,<br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class="">Mark D.<br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class=""></div><div id="Signature" class=""><div style="font-family: Tahoma; font-size: 13px;" class=""><div style="font-family: Tahoma; font-size: 13px;" class=""><div style="margin-top: 0px; margin-bottom: 0px;" class="">============================================<br class="">Mark Davies<br class="">Professor of Linguistics / Brigham Young University<br class=""><a tabindex="0" href="http://davies-linguistics.byu.edu/" target="_blank" style="color: purple; text-decoration: underline;" class="">http://davies-linguistics.byu.edu/</a></div><div style="margin-top: 0px; margin-bottom: 
0px;" class="">** Corpus design and use // Linguistic databases **<br class="">** Historical linguistics // Language variation **<br class="">** English, Spanish, and Portuguese **<br class="">============================================<br class=""></div></div></div></div></div></div></div></div><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); float: none; display: inline !important;" class="">_______________________________________________</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); float: none; display: inline !important;" class="">UNSUBSCRIBE from this page:<span class="Apple-converted-space"> </span></span><a href="http://mailman.uib.no/options/corpora" style="color: purple; text-decoration: underline; font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; 
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class="">http://mailman.uib.no/options/corpora</a><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); float: none; display: inline !important;" class="">Corpora mailing list</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class=""><a href="mailto:Corpora@uib.no" style="color: purple; text-decoration: underline; font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class="">Corpora@uib.no</a><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: 
normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class=""><a href="http://mailman.uib.no/listinfo/corpora" style="color: purple; text-decoration: underline; font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class="">http://mailman.uib.no/listinfo/corpora</a><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class=""></div></blockquote></div><br class=""></div></body></html>