[Corpora-List] Copyright question again

Khurshid Ahmad kahmad at scss.tcd.ie
Tue Jan 6 18:37:02 UTC 2015


Dear All
The copyright question is a typically European question. Mark Davies 
runs an excellent show and has given new life to corpus linguistics, 
for both the American and British varieties of English. The carefully 
designed BNC comprises many texts published by the partners in the BNC 
consortium; Mark's ANC comes closest to a randomly sampled corpus. The 
ELRA/ELDA show has been grappling with the question of copyright almost 
since its inception.

As far as legal opinion is concerned, one can have all 57 varieties 
of legal opinion if one can afford and/or has access to lawyers. 
Please, can we all enjoy the wonderment of language as it manifests 
itself in corpora?

On 06-01-2015 15:04, Mcenery, Tony wrote:
> Thanks to all who have contributed to this thread - I have really
> enjoyed it. Khalid made a passing reference to the UK position - this
> has recently become quite permissive for non-commercial text mining
> research, but we have been debating back and forth in Lancaster
> exactly what this means for corpus linguists. Due to the case-law
> nature of English Law we won't really know until some cases have been
> brought forward and we are able to see how the laws/regulations are to
> be interpreted, hence Khalid's comment about the situation being
> unclear, I assume. Anyway, for those of you interested in the new
> exceptions to copyright in the UK, you can read all about it here:
>
> https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/375951/Education_and_Teaching.pdf
>
> -------------------------
>
> FROM: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of
> Mark Davies [Mark_Davies at byu.edu]
> SENT: 06 January 2015 13:36
> TO: corpora at uib.no
> SUBJECT: Re: [Corpora-List] Copyright question again
>
> Marc Brysbaert wrote:
>
>>> For what it is worth, in my experience word frequency lists and
>>> N-gram lists are not a problem.
>
> I agree. I've distributed COCA/COHA word frequency
> (http://www.wordfrequency.info) and n-grams (http://www.ngrams.info)
> data for several years now, and I've never had any issues.
>
>>> The big problem we are encountering is that currently there is no
>>> guidance about whether corpora can be shared. As a result, nearly all
>>> corpora assembled remain next to inaccessible, meaning that everyone
>>> has to collect their own corpus. This is a lot of needless work and
>>> also means that little cumulative work can be done.
>
> I've also been distributing "full-text" data from the 450-million-word
> COCA and the 1.9-billion-word GloWbE (http://corpus.byu.edu/glowbe)
> for a while now, and again there have been no problems so far. There is
> a "twist", though, in that the full-text data has been slightly
> altered to avoid copyright problems:
>
> http://corpus.byu.edu/full-text/limitations.asp [1]
>
> Best,
>
> Mark D.
>
> ============================================
>  Mark Davies
>  Professor of Linguistics / Brigham Young University
>  http://davies-linguistics.byu.edu/ [2]
>
> ** Corpus design and use // Linguistic databases **
>  ** Historical linguistics // Language variation **
>  ** English, Spanish, and Portuguese **
>  ============================================
>
>
> Links:
> ------
> [1] http://corpus.byu.edu/full-text/limitations.asp
> [2] http://davies-linguistics.byu.edu/

-- 
Best wishes

Khurshid Ahmad, PhD, FBCS, FTCD, CITP
Professor of Computer Science
School of Computer Science and Statistics
Trinity College
Dublin 2
IRELAND

Phone: 00353 1 896 8429 (Labs: 00 353 1 8968435)
Fax: 353 1 677 2204
Webpage: www.cs.tcd.ie/khurshid.ahmad

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

