[Corpora-List] Copyright question again

Tue Jan 6 12:02:56 UTC 2015

> What happens if the corpus is crawled and hosted from countries where
> copyright laws don't exist?
> 
> And then models are built on a server in those countries and we are merely
> calling an API to access the models and get the outputs, are the outputs
> considered derivative works? Assuming that there is no way to reconstruct
> the text given the outputs.

Data on a server has to follow the laws of the country where the server
is located. But, from certain points of view, the data is owned by the
corporation that owns the server:
  http://www.bbc.co.uk/news/technology-27191500

(Also, from a couple of years before that,
http://www.forbes.com/sites/ciocentral/2012/01/02/can-european-firms-legally-use-u-s-clouds-to-store-data/
)

Going back to the original question, I'd be cautious about publishing
n-grams that occur only once in the corpus, as they would be the primary
targets for any legal claim. They are also the data of least interest to
researchers, so little is lost by leaving them out.
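
As a rough sketch of what that filtering could look like (Python; the
function names, the whitespace tokenisation and the min_count threshold of 2
are illustrative assumptions, not taken from any particular toolkit):

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield successive n-grams (as tuples) from a list of tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def count_ngrams(sentences, n):
    """Count n-grams over an iterable of sentences.

    Tokenisation here is a naive lowercase + whitespace split; a real
    corpus pipeline would use its own tokeniser.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


def drop_rare(counts, min_count=2):
    """Keep only n-grams seen at least min_count times.

    With the default of 2 this removes the single-occurrence n-grams.
    """
    return {gram: c for gram, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    sample = ["the cat sat", "the cat ran"]
    publishable = drop_rare(count_ngrams(sample, 2))
    for gram, c in sorted(publishable.items()):
        print(" ".join(gram), c)
```

A higher min_count is more conservative still; where to set it is a
judgement call rather than anything the law spells out.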

Darren

-- 
Darren Cook, Software Researcher/Developer
My new book: Data Push Apps with HTML5 SSE
Published by O'Reilly: (ask me for a discount code!)
  http://shop.oreilly.com/product/0636920030928.do
Also on Amazon and at all good booksellers!
