[Corpora-List] Copyright question again

Djamé Seddah djame.seddah at free.fr
Tue Jan 6 15:56:48 UTC 2015


Dear everyone,
I’ve heard that shuffling a corpus at the sentence level, so that the original sentence order cannot be recovered, counts as a transformation and is enough to reduce the risk of copyright infringement.
Can anyone confirm this?
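
To make concrete what kind of shuffling is meant, here is a minimal sketch. It assumes the corpus is already split into sentences; the function name `shuffle_corpus` and the sample sentences are hypothetical, and the code says nothing about whether the result is legally sufficient.

```python
import random


def shuffle_corpus(sentences, seed=None):
    """Return a copy of `sentences` in random order.

    With seed=None the permutation comes from the operating system's
    entropy source, so it cannot be replayed to restore the original
    order. Passing a seed makes the shuffle reproducible, which also
    means anyone who knows the seed could invert it.
    """
    rng = random.SystemRandom() if seed is None else random.Random(seed)
    shuffled = list(sentences)  # copy, so the input list is left untouched
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical toy corpus, one sentence per item.
corpus = [
    "This is the first sentence.",
    "Here comes the second one.",
    "A third sentence follows.",
    "The fourth sentence ends the text.",
]

shuffled = shuffle_corpus(corpus)
for sentence in shuffled:
    print(sentence)
```

Each sentence is kept intact; only the document-level order is lost.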

Best and happy new year,

Djamé 


> On 6 Jan 2015, at 16:04, Mcenery, Tony <a.mcenery at lancaster.ac.uk> wrote:
> 
> Thanks to all who have contributed to this thread - I have really enjoyed it. Khalid made a passing reference to the UK position - this has recently become quite permissive for non-commercial text mining research, but we have been debating back and forth in Lancaster exactly what this means for corpus linguists. Due to the case-law nature of English Law we won't really know until some cases have been brought forward and we are able to see how the laws/regulations are to be interpreted, hence Khalid's comment about the situation being unclear, I assume. Anyway, for those of you interested in the new exceptions to copyright in the UK, you can read all about it here:
> 
> https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/375951/Education_and_Teaching.pdf
> 
>  
> From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Mark Davies [Mark_Davies at byu.edu]
> Sent: 06 January 2015 13:36
> To: corpora at uib.no
> Subject: Re: [Corpora-List] Copyright question again
> 
> Marc Brysbaert wrote:
> 
> >> For what it is worth, in my experience word frequency lists and N-gram lists are not a problem. 
> 
> I agree. I've distributed COCA/COHA word frequency (http://www.wordfrequency.info) and n-grams (http://www.ngrams.info) data for several years now, and I've never had any issues.
> 
> >> The big problem we are encountering is that currently there is no guidance about whether corpora can be shared. As a result, nearly all corpora assembled remain next to inaccessible, meaning that everyone has to collect their own corpus. This is a lot of needless work and also means that little cumulative work can be done.
> 
> I've also been distributing "full-text" data from the 450-million-word COCA and the 1.9-billion-word GloWbE (http://corpus.byu.edu/glowbe) for a while now, and again no problems to this point. There is a "twist", though, in terms of how the full-text data has been slightly altered to avoid copyright problems:
> 
> http://corpus.byu.edu/full-text/limitations.asp
> 
> Best,
> 
> Mark D.
> 
> ============================================
> Mark Davies
> Professor of Linguistics / Brigham Young University
> http://davies-linguistics.byu.edu/
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
