[Corpora-List] Copyright question again

Adam Kilgarriff adam.kilgarriff at sketchengine.co.uk
Tue Jan 6 19:30:58 UTC 2015


- but some countries are bullies.  As Darren Cook noted,

> Data on servers has to follow the rules of the country where the server
> is. But, from certain points of view, the data is owned by the
> corporation owning the server:

Guess which country takes the view that, if the company is its company,
then its law controls access to the server?  Yes, the U S of A - which just
happens to be the country where all the companies owning those servers
live.  We'll trample on your laws if they don't suit us.  (cf. Mr Assange,
Mr Snowden - when the stakes get high enough, it's only geopolitical enemies
of the USA (Russia, Venezuela) who refuse to be trampled.)

Coming from a small company based partly in a mid-Atlantic island and
partly in a small central-European state, collecting data from all over the
world and with customers from around sixty countries, I find it scarcely
worth asking 'what the law says', as it could be any of sixty legal systems
(another critical consideration being, as Khalid points out, that it's way
outside our territory to pay one, let alone sixty, sets of lawyers).

Adam


On 6 January 2015 at 18:46, Janne Bondi Johannessen <jannebj at iln.uio.no>
wrote:

> Every country has its own laws.
> Janne
>
> 2015-01-06 16:56 GMT+01:00 Djamé Seddah <djame.seddah at free.fr>:
>
>> Dear everyone,
>> I’ve heard that shuffling a corpus, so that its original sentence order
>> cannot be retrieved, is enough and counts as a transformation, thus
>> alleviating the risk of potential copyright infringement.
>> Can anyone confirm this?
>>
>> Best and happy new year,
>>
>> Djamé
>>
>>
>> On 6 Jan 2015, at 16:04, Mcenery, Tony <a.mcenery at lancaster.ac.uk>
>> wrote:
>>
>> Thanks to all who have contributed to this thread - I have really enjoyed
>> it. Khalid made a passing reference to the UK position - this has recently
>> become quite permissive for non-commercial text mining research, but we
>> have been debating back and forth in Lancaster exactly what this means for
>> corpus linguists. Due to the case-law nature of English law, we won't really
>> know until some cases have been brought forward and we are able to see how
>> the laws/regulations are to be interpreted, hence Khalid's comment about
>> the situation being unclear, I assume. Anyway, for those of you interested
>> in the new exceptions to copyright in the UK, you can read all about it
>> here:
>>
>>
>> https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/375951/Education_and_Teaching.pdf
>>
>>
>> ------------------------------
>> *From:* corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of
>> Mark Davies [Mark_Davies at byu.edu]
>> *Sent:* 06 January 2015 13:36
>> *To:* corpora at uib.no
>> *Subject:* Re: [Corpora-List] Copyright question again
>>
>> Marc Brysbaert wrote:
>>
>> >> For what it is worth, in my experience word frequency lists and
>> >> N-gram lists are not a problem.
>>
>> I agree. I've distributed COCA/COHA word frequency (
>> http://www.wordfrequency.info) and n-grams (http://www.ngrams.info) data
>> for several years now, and I've never had any issues.
>>
>> >> The big problem we are encountering is that currently there is no
>> >> guidance about whether corpora can be shared. As a result, nearly all
>> >> corpora assembled remain next to inaccessible, meaning that everyone has to
>> >> collect their own corpus. This is a lot of needless work and also means
>> >> that little cumulative work can be done.
>>
>> I've also been distributing "full-text" data from the 450 million word COCA
>> and the 1.9 billion word GloWbE (http://corpus.byu.edu/glowbe) for a
>> while now, and again no problems to this point. There is a "twist", though,
>> in terms of how the full-text data has been slightly altered to
>> avoid copyright problems:
>>
>> http://corpus.byu.edu/full-text/limitations.asp
>>
>> Best,
>>
>> Mark D.
>>
>> ============================================
>> Mark Davies
>> Professor of Linguistics / Brigham Young University
>> http://davies-linguistics.byu.edu/
>> ** Corpus design and use // Linguistic databases **
>> ** Historical linguistics // Language variation **
>> ** English, Spanish, and Portuguese **
>> ============================================
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>
>
> --
> Janne Bondi Johannessen
> Professor
> The Text Laboratory, ILN,
> <http://www.hf.uio.no/iln/english/about/organization/text-laboratory/>&
> Center for Multilingualism in Society across the Lifespan
> <http://www.hf.uio.no/multiling/english/>
> University of Oslo
> Tel: +47 22 85 68 14, mob.: +47 928 966 34
>


-- 
=============================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at sketchengine.co.uk
Director                                    Lexical Computing Ltd
<http://www.sketchengine.co.uk/>
Visiting Research Fellow                 University of Leeds
<http://leeds.ac.uk/>
*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk/>
 and      SKELL <http://skell.sketchengine.co.uk/>
=============================================