[Corpora-List] Legal aspects of compiling corpora

Mark Sanderson m.sanderson at sheffield.ac.uk
Fri Jun 13 13:40:29 UTC 2003

I think the honest answer is that it is a question with no clear answer.

I know that legal concerns have prevented US government-funded projects
such as TREC (http://trec.nist.gov) from building Web collections, and they
have instead had other organisations build and distribute such collections. I
also know that Web search engines have been ordered to remove image and
sound collections from their Web sites, but I don't think this has happened
with HTML. Maybe text is viewed as being generally less valuable than other
media types.

At 09:49 13/06/2003 -0300, delucca at nilc.icmc.usp.br wrote:

>Dear Linguists and Lawyers,
>I am troubled by the legal aspects of compiling corpora. I am unsure
>whether it is illegal to store web pages (or parts of them)
>in a database (see http://www.dictionarium.com/project.htm)
>that is not available to the public, and to display their contents as short
>collocations of fewer than 100 characters at a time through a search method.
>On the other hand, Internet search engines use cached (temporary?)
>copies of sites and display a short excerpt of the web pages.
>Is my procedure wrong? What is the legal difference? Do I need to ask
>permission from each website to store its pages? If I mention the source and
>the author, will I be respecting the copyright?
>I look forward to hearing from you.
>Yours Sincerely,
>J. L. De Lucca

Mark Sanderson, Room 303                   Tel: +44 (0) 114 22 22648
Department of Information Studies          Fax: +44 (0) 114 27 80300
University of Sheffield, Regent Court,     mailto:m.sanderson at shef.ac.uk
211 Portobello St., Sheffield, S1 4DP, UK  http://dis.shef.ac.uk/mark/
Good judgement comes from experience, experience comes from bad judgement