[Corpora-List] Legal aspects of compiling corpora
Mark Sanderson
m.sanderson at shef.ac.uk
Fri Jun 13 20:55:58 UTC 2003
Someone has just kindly pointed out that I was wrong to say that no single
organisation holds terabytes of data (apart from search engines).
Organisations like Lexis Nexis have such quantities.
At 15:47 13/06/03 -0400, William Mann wrote:
>Without making the problem more difficult, I want to point out that very
>similar problems arise in discourse linguistics, where the objects of study
>are connected texts, often necessarily whole texts.
>
>If a researcher wants to make claims about a whole text, for example about
>how coherence arises, it is often necessary to exhibit the whole text so
>that such claims are examinable. And just as for Corpus Linguistics, the
>texts cannot be made examinable like sentences in a grammar paper, because
>bulk prohibits such large citations.
>
>There has been a lot of implicit reliance on "fair use," accompanied by
>circulation on the internet. It would be hard for discourse linguistics to
>achieve open discussion of results and evidence without something like this.
>==================
>
>There is another locus of examination which might turn out to be very
>relevant. I know about it, but not the details. The Oxford Text Archive
>promotes the protection and circulation of extensive works. They put a lot
>of effort into these issues, including copyright legalities, not
>diminishing the rights of a contributor of a piece, and not creating
>unjustified claims of rights for the Archive itself.
>
>The result is a multipage License agreement that potential submitters agree
>to.
>
>They are at http://ota.ahds.ac.uk/ .
>
>I agree with Doug Cooper that we ought to take a stance. But who is "we"?
>
>Perhaps one of the new departments of corpus science could take leadership
>on this. It would give it an air of professionalism.
>
>Bill Mann
>
>----- Original Message -----
>From: "Doug Cooper" <doug at th.net>
>To: <corpora at hd.uib.no>
>Sent: Friday, June 13, 2003 2:22 PM
>Subject: Re: [Corpora-List] Legal aspects of compiling corpora
>
>
>| At 14:40 13/6/03 +0100, Mark Sanderson wrote:
>| > I think the honest answer is that it is a question with no clear
>| > answer.
>|
>| Not so clear. The original query was whether a 100-
>| character citation of a text would be a copyright violation.
>| Is there a copyright law anywhere that does not grant
>| "fair use" rights to this sort of minimal citation in all but
>| pathological cases (e.g. extremely short texts like song
>| lyrics, or perhaps many consecutive citations of a
>| single text)?
>|
>| In any case, this question comes up periodically, and the
>| response is almost invariably something along the lines of
>| 'well, you'll probably get away with it.'
>|
>| I am rather surprised that the corpus-using community has
>| not come out with a position statement -- not everybody has
>| to sign on to it, of course -- that articulates the point of view
>| that:
>|
>| a) distributing minimal citations of copyrighted texts, and
>| b) allowing public, indirect access to privately held collections
>| of copyrighted texts for statistical purposes
>| are:
>| a) a necessary part of corpus linguistics research, and
>| b) believed by CL practitioners to be inherently protected
>| as fair use, particularly in non-profit research contexts.
>|
>| and perhaps also gives a few examples of what might _not_
>| be considered professional conduct; e.g. making full texts
>| available or easily reconstructed.
>|
>| It seems to me that such a statement would be useful in:
>|
>| a) helping to clarify that CL applications promote the
>| 'Progress of Science;' i.e. are a genuine research use;
>| b) helping individual researchers show that they are
>| acting in good faith, in accordance with others in the
>| profession.
>|
>| Obviously, a bunch of us getting together and saying that
>| black is white won't make it so. But to the extent that there
>| _is_ a possible gray area in the balance between copyright
>| and fair use, I think it is important to start to establish our side's
>| position as well.
>|
>| Doug Cooper
>|