[Corpora-List] Legal aspects of compiling corpora

peetm peet.morris at comlab.ox.ac.uk
Tue Jun 17 14:22:42 UTC 2003


One possible reason Google has not been sued so far is that it is a
private company, i.e. not worth anything like as much as it would be if it
floated (in fact, if it's private, is it worth anything in 'real terms'?).

My own research (building specific corpora in real time using grid
computing) uses the web as its data source, so I will be allowing academics
access to entire texts.  I consulted lawyers at the Oxford Internet
Institute (www.oii.ox.ac.uk), and the bottom line was an opinion (isn't it
always?): that I wouldn't be sued.  'Couldn't' wasn't brought up, however.

peetm

www.clg.ox.ac.uk


-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On Behalf Of Mark Davies
Sent: 17 June 2003 16:55
To: corpora at hd.uib.no
Subject: RE: [Corpora-List] Legal aspects of compiling corpora

When I was compiling the 100 million word Corpus del Español
(www.corpusdelespanol.org), I consulted two professors from the US who are
experts on copyright law as applied to the Internet.  I explained to them
that in my corpus, at least, end users wouldn't have access to entire
paragraphs of text, much less an entire text itself.  Both agreed that it
would be quite unlikely that there would be any copyright problems.

What has me intrigued with search engines like Google, however, is their
"cached web page" functionality, in which they are in essence reproducing an
entire web page -- and all of the web pages of a given site (assuming no use
of robots.txt).  It seems that this is much more than the limited context
that I (and others) make available in our corpora, and yet there has been no
legal challenge.

On the other hand, both of the professors whom I consulted mentioned that
it's still a very murky issue with little or no clearly defined legal
precedent -- at least in the US.

Mark Davies

=================================================
Mark Davies
Assoc. Prof., Spanish Linguistics
Illinois State University
http://mdavies.for.ilstu.edu/

** Corpus design and use // Web-database scripting **
** Historical and dialectal Spanish and Portuguese syntax **
=================================================


