[Corpora-List] Legal aspects of compiling corpora

Larry Spitz spitz at docrec.com
Fri Jun 13 21:16:43 UTC 2003


I find this discussion very interesting and I want to add an extra twist
which probably is not of much interest to the Corpora list in general, but
unfortunately I do not know of a better forum.

Beyond the legal aspects of collecting text, there are the legal aspects of
collecting scanned images of documents. For those of us who are interested
in the analysis of document images, obtaining databases of images is quite
difficult, particularly generally available databases against which the
results of individual research can be compared.

Since the University of Washington and the University of Nevada, Las Vegas
have stopped publishing such databases, I do not know of anyone who is
currently doing so.

One of the real problems is getting copyright permission for document
images. To many authors, it is an incomprehensible concept. Does the holder
of a copyright on a text also, de facto, hold a copyright on an image of
that text? Since the goal is not to photocopy or otherwise reproduce that
image but to use it as a basis for research, does copyright law even apply?

In general, we in the document image analysis community are not
particularly interested in the document content as intellectual property,
though we are interested in being able to reproduce it (OCR), understand
its structure, or find it (IR).

To some of us, requiring copyright permission for document images does not
seem to be a rational (or, if you will, moral) requirement.

I would be interested in a discussion of how copyright law and practice
with respect to images fit in with text corpus collection.

Cheers,

Larry
--
      DocRec Ltd   http://www.docrec.com/
      phone: +64-3-545-2105 fax: +64-3-545-2106
34 Strathaven Place, Atawhai, Nelson 7001, New Zealand


