[Corpora-List] RE: Legal aspects of compiling corpora

Raphael Salkie R.M.Salkie at bton.ac.uk
Fri Jun 13 16:17:39 UTC 2003


Let's separate the moral question from the legal question.

The moral question is, are you doing anything wrong if you include text from
someone else's web page in your corpus?

My answer: Presumably the relevant general moral principle is that you
should not deprive the original author of money which is rightfully theirs.
This must depend on whether you are using their text purely for non-commercial
research, or in order to make money yourself.

If you are using someone else's words purely for research, that is morally
right, in my opinion.  In fact, you are likely to be increasing their
income, because you are giving them free publicity by including their words
in your corpus.

Now consider the harder case where you use someone else's words in a corpus
to help you write a textbook which sells millions of copies. Some people
might argue that the original author is morally entitled to a share of your
money.  A counterargument would be that the original text was written to be
read, not to be included in a corpus and (for example) searched for frequent
collocations.  The textbook writer has used the original text as data, not
for its intellectual content, and it is the analysis of the data which gives
the text its commercial value.  Therefore the original author has no moral
right to any of the money from the textbook.

(Compare these two cases: (1) a textbook writer enhances her book by citing
a page from someone's research article which contains supporting arguments.
(2) a textbook writer uses that same page from the research article to
illustrate the use of connectives in academic texts.  In case (1) the
original author has a moral claim on some of the money generated by the
textbook.  In case (2) the original author does not have a moral claim, I
think -- the argument above about free publicity applies, instead. It would
be interesting to know what other list members think about this).

The argument that it is the analysis by the corpus scholar which creates the
commercial value of a text in a corpus can perhaps be taken further.
Suppose I take a printed book which is currently on sale and making money
for its author, scan it into electronic form, and use it in my corpus for
commercial purposes such as textbook writing.  This is probably the hardest
case, since both parties involved have made money out of the same text.
Perhaps even in this case the original author has no claim to a share of my
profits.  It could be argued, indeed, that the original author should feel
honoured that I am using their text in this way.

Using for corpus analysis someone else's data which is in the public domain
(free or for a price, it makes no difference to the moral question) is no
different from using any other experimental data.  You have a moral duty to the
person who supplied the data, and to your professional colleagues, to
acknowledge the source of the data, and sometimes you should anonymise the
data so as not to humiliate the person who supplied it; but I don't think
you owe them any money that you earn from using the data.

The legal question is different.  I concur with Adam Kilgarriff's earlier
statement that it depends on how rich your enemies are.  On the other hand,
if you can show that you have taken into account the best current thinking
about the moral question, that might strengthen your case before a court.

Any comments?

Raphael Salkie
School of Languages
University of Brighton, England



-----Original Message-----
From: delucca at nilc.icmc.usp.br [mailto:delucca at nilc.icmc.usp.br]
Sent: 13 June 2003 13:49
To: corpora at hd.uib.no
Subject: [Corpora-List] Legal aspects of compiling corpora


Dear Linguists and Lawyers,

I am troubled by the legal aspects of compiling corpora. I am unsure
whether it is illegal to store web pages (or parts of them) in a
database (see http://www.dictionarium.com/project.htm) that is not
available to the public, and to display their contents as short
collocations of fewer than 100 characters at a time in response to a search.

On the other hand, Internet search engines use cached (temporary?)
copies of sites and display a short excerpt of the web pages.

Is my procedure wrong? What is the legal difference? Do I need to ask
permission from each website to store its pages? If I mention the source
and the author, will I be protecting the copyrights?


I look forward to hearing from you.


Yours Sincerely,


J. L. De Lucca
