[Corpora-List] Copyright question again

liling tan alvations at gmail.com
Tue Jan 6 11:29:14 UTC 2015


Dear All,

I have some questions that I hope experienced researchers can help with:

What happens if the corpora are crawled and hosted in countries where
copyright laws don't exist?

And if models are then built on servers in those countries, and we merely
call an API to access the models and get their outputs, are the outputs
considered derived works? Assume that there is no way to reconstruct the
text from the outputs.

Googling "which country has no copyright laws?" returns "Eritrea,
Turkmenistan and San Marino". Is that really true?

If we crawl copyrighted data and use it without distributing it, wouldn't
that already violate copyright, since most sites come with the caveat that
local copies may not be stored anywhere?

*What do we do with browse-able yet non-download-able data?* (assuming that
the data would really advance the state of the art in some way or another)
Should we not use them at all? What a waste.

*What do we do with download-able but non-distributable data?* Is
redistributing a crawler that downloads the data illegal?

*What do we do with download-able but not derivable data?* Can we even
build a model from them?

Best Regards,
Liling