[Corpora-List] Legal aspects of compiling corpora

Mark Sanderson m.sanderson at shef.ac.uk
Fri Jun 13 20:55:58 UTC 2003


Someone has just kindly pointed out that I was wrong to say that no single
organisation holds terabytes of data (apart from search engines).
Organisations like Lexis Nexis have such quantities.


At 15:47 13/06/03 -0400, William Mann wrote:
>Without making the problem more difficult, I want to point out that very
>similar problems arise in discourse linguistics, where the objects of study
>are connected texts, often necessarily whole texts.
>
>If a researcher wants to make claims about a whole text, for example about
>how coherence arises, it is often necessary to exhibit the whole text so
>that such claims are examinable.  And just as for Corpus Linguistics, the
>texts cannot be made examinable like sentences in a grammar paper, because
>bulk prohibits such large citations.
>
>There has been a lot of implicit reliance on "fair use," accompanied by
>circulation on the internet.  It would be hard for discourse linguistics to
>achieve open discussion of results and evidence without something like this.
>==================
>
>There is another locus of examination which might turn out to be very
>relevant.  I know about it, but not the details.  The Oxford Text Archive
>promotes the protection and circulation of extensive works. They put a lot
>of effort into these issues, including copyright legalities, not
>diminishing the rights of a contributor of a piece, and not creating
>unjustified claims of rights for the Archive itself.
>
>The result is a multipage License agreement that potential submitters agree
>to.
>
>They are at http://ota.ahds.ac.uk/ .
>
>I agree with Doug Cooper that we ought to take a stance.  But who is "we"?
>
>Perhaps one of the new departments of corpus science could take leadership
>on this.  It would give it an air of professionalism.
>
>Bill Mann
>
>----- Original Message -----
>From: "Doug Cooper" <doug at th.net>
>To: <corpora at hd.uib.no>
>Sent: Friday, June 13, 2003 2:22 PM
>Subject: Re: [Corpora-List] Legal aspects of compiling corpora
>
>
>| At 14:40 13/6/03 +0100, Mark Sanderson wrote:
>| >  I think the honest answer is that it is a question with no clear
>| >  answer.
>|
>| Not so clear.  The original query was whether a 100-
>| character citation of a text would be a copyright violation.
>| Is there a copyright law anywhere that does not grant
>| "fair use" rights to this sort of minimal citation in all but
>| pathological cases (eg. extremely short texts like song
>| lyrics, or perhaps many consecutive citations of a
>| single text)?
>|
>|   In any case, this question comes up periodically, and the
>| response is almost invariably something along the lines of
>| 'well, you'll probably get away with it.'
>|
>|   I am rather surprised that the corpus-using community has
>| not come out with a position statement -- not everybody has
>| to sign on to it, of course --  that articulates the point of view
>| that:
>|
>|    a) distributing minimal citations of copyrighted texts, and
>|    b) allowing public, indirect access to privately held collections
>|        of copyrighted texts for statistical purposes
>| are:
>|    a) a necessary part of corpus linguistics research, and
>|    b) believed by CL practitioners to be inherently protected
>|     as fair use, particularly in non-profit research contexts.
>|
>| and perhaps also gives a few examples of what might _not_
>| be considered professional conduct; eg. making full texts
>| available or easily reconstructed.
>|
>|   It seems to me that such a statement would be useful in:
>|
>|    a) helping to clarify that CL applications promote the
>|       'Progress of Science;' ie. are a genuine research use;
>|    b) helping individual researchers show that they are
>|       acting in good faith, in accordance with others in the
>|       profession.
>|
>|   Obviously, a bunch of us getting together and saying that
>| black is white won't make it so.  But to the extent that there
>| _is_ a possible gray area in the balance between copyright
>| and fair use, I think it is important to start to establish our side's
>| position as well.
>|
>|   Doug Cooper
>|


