<div dir="ltr"><div class="gmail_extra">Dear All,</div><div class="gmail_extra"><br></div><div class="gmail_extra">I have some questions that I hope experienced researchers can help with:</div><div class="gmail_extra"><br></div><div class="gmail_extra">What happens if a corpus is crawled and hosted in a country where copyright laws don't exist?</div><div class="gmail_extra"><br></div><div class="gmail_extra">And if models are then built on servers in those countries, and we merely call an API to access the models and get their outputs, are the outputs considered derivative works? Assume that there is no way to reconstruct the text from the outputs.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Googling "which country has no copyright laws?" returns "Eritrea, Turkmenistan and San Marino". Is that really true?</div><div class="gmail_extra"><br></div><div class="gmail_extra">If we crawl copyrighted data and use it without distributing it, wouldn't that already violate copyright, since most sites come with the caveat that local copies may not be stored anywhere?</div><div class="gmail_extra"><br></div><div class="gmail_extra"><b>What do we do with browsable but non-downloadable data?</b> (assuming that the data would really advance the state of the art in some way or another) Should we not use it at all? What a waste.</div><div class="gmail_extra"><br></div><div class="gmail_extra"><b>What do we do with downloadable but non-distributable data? </b>Is redistributing a crawler that downloads the data illegal?</div><div class="gmail_extra"><br></div><div class="gmail_extra"><b>What do we do with downloadable data that does not permit derivative works?</b> Can we even build a model from it?</div><div class="gmail_extra"><br></div><div class="gmail_extra">Best Regards,</div><div class="gmail_extra">Liling</div></div>