<div dir="ltr"><div>Every country has its own laws. <br></div>Janne<br></div><div class="gmail_extra"><br><div class="gmail_quote">2015-01-06 16:56 GMT+01:00 Djamé Seddah <span dir="ltr"><<a href="mailto:djame.seddah@free.fr" target="_blank">djame.seddah@free.fr</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Dear everyone,<div>I’ve heard that shuffling a corpus so that its original sentence order cannot be recovered is enough to count as a transformation, thereby reducing the risk of copyright infringement.</div><div>Can anyone confirm this?</div><div><br></div><div>Best and happy new year,</div><div><br></div><div>Djamé </div><div><br></div><div><br><div><blockquote type="cite"><div>On 6 Jan 2015, at 16:04, Mcenery, Tony <<a href="mailto:a.mcenery@lancaster.ac.uk" target="_blank">a.mcenery@lancaster.ac.uk</a>> wrote:</div><br><div><div><div class="h5"><div style="font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);direction:ltr;font-family:Tahoma;font-size:10pt"><font face="Tahoma, Geneva, sans-serif">Thanks to all who have contributed to this thread; I have really enjoyed it. Khalid made a passing reference to the UK position, which has recently become quite permissive for non-commercial text-mining research, but we have been debating back and forth in Lancaster about exactly what this means for corpus linguists. Given the case-law nature of English law, we won't really know until some cases have been brought and we can see how the laws and regulations are interpreted. I assume this is why Khalid described the situation as unclear. 
Anyway, for those of you interested in the new UK exceptions to copyright, you can read all about them here:</font><div style="font-family:Tahoma,Geneva,sans-serif;font-size:10pt"><br></div><div><font face="Tahoma, Geneva, sans-serif"><a href="https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/375951/Education_and_Teaching.pdf" style="color:purple;text-decoration:underline" target="_blank">https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/375951/Education_and_Teaching.pdf</a></font><br><div style="font-family:Tahoma,Geneva,sans-serif;font-size:10pt"><br><div style="font-family:Tahoma;font-size:13px"> </div></div><div style="font-family:'Times New Roman';font-size:16px"><hr><div style="direction:ltr"><font face="Tahoma"><b>From:</b><span> </span><a href="mailto:corpora-bounces@uib.no" style="color:purple;text-decoration:underline" target="_blank">corpora-bounces@uib.no</a><span> </span>[<a href="mailto:corpora-bounces@uib.no" style="color:purple;text-decoration:underline" target="_blank">corpora-bounces@uib.no</a>] on behalf of Mark Davies [<a href="mailto:Mark_Davies@byu.edu" style="color:purple;text-decoration:underline" target="_blank">Mark_Davies@byu.edu</a>]<br><b>Sent:</b><span> </span>06 January 2015 13:36<br><b>To:</b><span> </span><a href="mailto:corpora@uib.no" style="color:purple;text-decoration:underline" target="_blank">corpora@uib.no</a><br><b>Subject:</b><span> </span>Re: [Corpora-List] Copyright question again<br></font><br></div><div></div><div><div style="margin-top:0px;margin-bottom:0px">Marc Brysbaert wrote:<br></div><div style="margin-top:0px;margin-bottom:0px"><br></div><div style="margin-top:0px;margin-bottom:0px">>> <span style="color:rgb(31,73,125);font-family:Calibri,sans-serif;font-size:11pt">For what it is worth, in my experience word-frequency lists and n-gram lists are not a problem. 
</span></div><div><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><br></span></div><div><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">I agree. I've distributed COCA/COHA word-frequency (<a href="http://www.wordfrequency.info/" style="color:purple;text-decoration:underline" target="_blank">http://www.wordfrequency.info</a>) and n-gram (<a href="http://www.ngrams.info/" style="color:purple;text-decoration:underline" target="_blank">http://www.ngrams.info</a>) data for several years now and have never had any issues.</span></div><div style="margin-top:0px;margin-bottom:0px"><br></div><div style="margin-top:0px;margin-bottom:0px">>> <span style="color:rgb(31,73,125);font-family:Calibri,sans-serif;font-size:15px;background-color:rgb(255,255,255)">The big problem we are encountering is that there is currently no guidance about whether corpora can be shared. As a result, nearly all the corpora that have been assembled remain all but inaccessible, meaning that everyone has to collect their own corpus. This is a lot of needless work and also means that little cumulative work can be done.</span><br></div><div style="margin-top:0px;margin-bottom:0px"><br></div><div style="margin-top:0px;margin-bottom:0px">I've also been distributing "full-text" data from the 450-million-word COCA and the 1.9-billion-word GloWbE (<a href="http://corpus.byu.edu/glowbe" style="color:purple;text-decoration:underline" target="_blank">http://corpus.byu.edu/glowbe</a>) for a while now, and again there have been no problems so far. 
There is a "twist", though: the full-text data has been slightly altered to avoid copyright problems:<br></div><div style="margin-top:0px;margin-bottom:0px"><br></div><div style="margin-top:0px;margin-bottom:0px"><a href="http://corpus.byu.edu/full-text/limitations.asp" style="color:purple;text-decoration:underline" target="_blank">http://corpus.byu.edu/full-text/limitations.asp</a><br></div><div style="margin-top:0px;margin-bottom:0px"><br></div><div style="margin-top:0px;margin-bottom:0px">Best,<br></div><div style="margin-top:0px;margin-bottom:0px"><br></div><div style="margin-top:0px;margin-bottom:0px">Mark D.<br></div><div style="margin-top:0px;margin-bottom:0px"><br></div><div><div style="font-family:Tahoma;font-size:13px"><div style="font-family:Tahoma;font-size:13px"><div style="margin-top:0px;margin-bottom:0px">============================================<br>Mark Davies<br>Professor of Linguistics / Brigham Young University<br><a href="http://davies-linguistics.byu.edu/" style="color:purple;text-decoration:underline" target="_blank">http://davies-linguistics.byu.edu/</a></div><div style="margin-top:0px;margin-bottom:0px">** Corpus design and use // Linguistic databases **<br>** Historical linguistics // Language variation **<br>** English, Spanish, and Portuguese **<br>============================================<br></div></div></div></div></div></div></div></div></div></div><span class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);float:none;display:inline!important">_______________________________________________</span><br 
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255)"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);float:none;display:inline!important">UNSUBSCRIBE from this page:<span> </span></span><a href="http://mailman.uib.no/options/corpora" style="color:purple;text-decoration:underline;font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255)" target="_blank">http://mailman.uib.no/options/corpora</a><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255)"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);float:none;display:inline!important">Corpora mailing list</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255)"><a href="mailto:Corpora@uib.no" 
style="color:purple;text-decoration:underline;font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255)" target="_blank">Corpora@uib.no</a><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255)"><a href="http://mailman.uib.no/listinfo/corpora" style="color:purple;text-decoration:underline;font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255)" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255)"></span></div></blockquote></div><br></div></div><br>
<br></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr">Janne Bondi Johannessen<br>Professor<div><a href="http://www.hf.uio.no/iln/english/about/organization/text-laboratory/" target="_blank">The Text Laboratory, ILN,  </a>&<br><a href="http://www.hf.uio.no/multiling/english/" target="_blank">Center for Multilingualism in Society across the Lifespan </a><br>University of Oslo<br>Tel: +47 22 85 68 14, mob.: +47 928 966 34<br></div></div></div>
</div>