[Corpora-List] Copyright question again

Tue Jan 6 08:03:50 UTC 2015

Hello Damir,

You may want to read this news article from the New York Times and take
the necessary precautions. Let me point out that copyright issues are very
delicate and must be handled with care, especially when you are an
individual. Please take a look.

http://www.nytimes.com/2013/01/13/technology/aaron-swartz-internet-activist-dies-at-26.html?pagewanted=all&_r=0

Hope this helps

Regards
Emmanuel

-- 
Emmanuel Buabin
Lecturer, Department of Information Technology
Methodist University College Ghana
Box DC 940
Dansoman

personal: www.ebuabin.net

On Tue, Jan 6, 2015 at 5:00 AM, Damir Cavar <dcavar at me.com> wrote:

> Hi everybody,
>
> I know, this question has been addressed a lot, but, just to get an
> update on this issue and your expert opinion:
>
> If I am accessing the internet from the US, as I am right now, and I
> decide to generate N-gram-based language models by exploiting the web as
> a corpus and publish the word-lists and frequency profiles openly on my
> homepage, sell them even, change or manipulate them, and reuse them in
> various ways, would this be
>
> a. ok as fair-use for research only, excluding commercial use
> b. legal in general, independent of my research interests
> c. legal only in some countries (so, my models would be illegal in some
> others)
>
> What is the current status of the web as a corpus and extracted language
> models from the legal perspective in the US and globally?
>
> If I do the same now with open-access journals and extract frequency
> profiles of tokens for a certain research domain, would it be the same?
> What if I use Google Books? Or even some news website?
>
> Is the extraction of a language model, or perhaps a domain-specific
> frequency profile, a copyright infringement per se? The text cannot be
> reconstructed, and neither the content nor the author's style is
> visible, particularly if the corpus is large.
>
> Thanks!
>
> Damir
>
>
>
> --
> Damir Cavar
> Department of Linguistics
> Indiana University
>
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>