[Corpora-List] Legal aspects of compiling corpora

Torzec Nicolas ATER LSI nicolas.torzec at enssat.fr
Tue Jun 17 08:55:48 UTC 2003


Dear Linguists and Lawyers,
I have the same "problem" with a large (tagged) monitor corpus of
texts from French-language online forums:
- these messages are publicly available in the sense that everybody
can read and reuse them
- each newsgroup server stores and uses its own copies of them
- search engines use and exploit cached copies of them
- ...

So,
- Is it illegal to store these messages, in anonymized form, in a
database?
- Is it illegal to exploit this corpus for research purposes (i.e. to
carry out linguistic studies and to develop NLP tools using
corpus-based machine learning methods)?
- Is it illegal to illustrate scientific articles with examples from
this corpus?


Do I need to ask each author for permission to store and use their
messages? What if I cite the source and the author? What about
copyright?

Moreover,
- What if I want to make my corpus publicly available to researchers?
- What if NLP tools developed from this corpus are integrated into
commercial products?

Thank you in advance for your help.
References, pointers and suggestions are welcome, especially regarding
the legal situation in France.


Nicolas Torzec

--
Nicolas Torzec
PhD student in NLP
--


delucca at nilc.icmc.usp.br wrote:
>
> Dear Linguists and Lawyers,
>
> I am troubled by the legal aspects of compiling corpora. I am unsure
> whether it is illegal to store web pages (or parts of them) in a
> database (see http://www.dictionarium.com/project.htm) that is not
> available to the public, and to display its contents, via a search
> method, as short collocations of fewer than 100 characters at a time.
>
> On the other hand, Internet search engines use cached (temporary?)
> copies of websites and display short excerpts of the web pages.
>
> Is my procedure wrong? What is the legal difference? Do I need to ask
> each website for permission to store its pages? If I mention the source
> and the author, will I be respecting the copyright?
>
>
> I look forward to hearing from you.
>
> Yours Sincerely,
>
> J. L. De Lucca
>

--
Nicolas TORZEC

ENSSAT / Université de Rennes 1
6, rue de Kerampont
22300 Lannion

Email: nicolas.torzec at enssat.fr
Tel: 02.96.46.27.30
Fax: 02.96.37.01.99
Web : http://www.enssat.fr
--


