[Corpora-List] Legal aspects of compiling corpora

Tue Jun 17 13:50:16 UTC 2003

There is a lot of advice on copyright at the Arts and Humanities Data
Service website at http://ahds.ac.uk/copyrightfaq.htm.

My non-professional understanding is that the publication of a document on
the web does not make it legal to copy it. You may, however, infer that
someone who has made a document freely available to be read on the web will
not be unhappy if you copy it and use it for research, and that they are
unlikely to sue you. Copying is nevertheless still technically illegal.

However, as has been pointed out, Google caches webpages, and so indeed does
your browser, and this could be interpreted as a violation of copyright,
although I don't believe that this has been tested in law.
In the real world, corpus builders usually weigh up the

As Bill Mann pointed out, there are archives like ours, the Oxford Text
Archive, which hold resources that have been cleared for copyright for
research use. This means that you can be confident about downloading and
using a resource from a trusted archive like this. But this doesn't really
help you to take advantage of the mass of electronic text out there on the
web.

In the case of the BNC, the copyright holders of all the texts have
explicitly given permission for the use of the texts in research, so
licensed users of the BNC should have no concerns about copyright.

__
Martin Wynne
Head of the Oxford Text Archive

Oxford University Computing Services
13 Banbury Road
Oxford
UK - OX2 6NN
Tel: +44 1865 283299
Fax: +44 1865 273275
martin.wynne at ota.ahds.ac.uk