<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Sad story. But it surely illustrates Adam's point. He took on
    some powerful people. Edward Snowden took on powerful agencies too.
    If you tweak a tiger's tail you might get pounced on.<br>
    <br>
    Damir was on about frequency profiles only, not source texts. I
    think he is safe enough until frequency profiles become very
    valuable resources.<br>
    <br>
    Mike<br>
    <br>
    <br>
    <div class="moz-cite-prefix">On 06/01/2015 08:03, Buabin, Emmanuel
      wrote:<br>
    </div>
    <blockquote
cite="mid:CACV+o_w9W0Qx7kFn-VMs2+pEM2DyBcBr+BX3Z=bWv5XUekgqZA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div>Hello Damir,<br>
            <br>
          </div>
          You may want to read this news article from the New York Times
          and take the necessary precautions. Let me point out that
          copyright issues are very delicate and must be handled with
          care, especially when you are an individual. Please take a
          look. <br>
          <br>
          <a moz-do-not-send="true"
href="http://www.nytimes.com/2013/01/13/technology/aaron-swartz-internet-activist-dies-at-26.html?pagewanted=all&amp;_r=0">http://www.nytimes.com/2013/01/13/technology/aaron-swartz-internet-activist-dies-at-26.html?pagewanted=all&amp;_r=0</a>
          <br>
          <br>
        </div>
        Hope this helps<br>
        <div><br>
          Regards<br>
          Emmanuel<br>
          <br>
          <br>
          -- <br>
          <div class="gmail_signature">
            <div dir="ltr">
              <div>Emmanuel Buabin</div>
              <div>Lecturer, Department of Information Technology</div>
              <div>Methodist University College Ghana</div>
              <div>Box DC 940</div>
              <div>Dansoman</div>
              <div> </div>
              <div>personal: <a moz-do-not-send="true"
                  href="http://www.ebuabin.net" target="_blank">www.ebuabin.net</a>
              </div>
            </div>
          </div>
          <br>
             <br>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">On Tue, Jan 6, 2015 at 5:00 AM,
              Damir Cavar <span dir="ltr">&lt;<a moz-do-not-send="true"
                  href="mailto:dcavar@me.com" target="_blank">dcavar@me.com</a>&gt;</span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">Hi everybody,<br>
                <br>
                I know, this question has been addressed a lot, but,
                just to get an<br>
                update on this issue and your expert opinion:<br>
                <br>
                If I am accessing the internet from the US, as I am
                right now, and I<br>
                decide to generate N-gram-based language models by
                exploiting the web as<br>
                a corpus and publish the word-lists and frequency
                profiles openly on my<br>
                homepage, sell them even, change or manipulate them, and
                reuse them in<br>
                various ways, would this be<br>
                <br>
                a. ok as fair-use for research only, excluding
                commercial use<br>
                b. legal in general, independent of my research
                interests<br>
                c. legal only in some countries (so, my models would be
                illegal in some<br>
                others)<br>
                <br>
                What is the current status of the web as a corpus and
                extracted language<br>
                models from the legal perspective in the US and
                globally?<br>
                <br>
                If I do the same now with open-access journals and
                extract frequency<br>
                profiles of tokens for a certain research domain, would
                it be the same?<br>
                If I use Google Books? Or even some news website?<br>
                <br>
                Is the extraction of a language model, maybe a
                domain-specific frequency<br>
                profile, a copyright infringement per se? The text cannot
                be<br>
                reconstructed, the content is not visible, and neither is
                the author's style,<br>
                particularly if the corpus is larger etc.<br>
                <br>
                Thanks!<br>
                <br>
                Damir<br>
                <br>
                <br>
                <br>
                --<br>
                Damir Cavar<br>
                Department of Linguistics<br>
                Indiana University<br>
                <br>
                <br>
                <br>
                _______________________________________________<br>
                UNSUBSCRIBE from this page: <a moz-do-not-send="true"
                  href="http://mailman.uib.no/options/corpora"
                  target="_blank">http://mailman.uib.no/options/corpora</a><br>
                Corpora mailing list<br>
                <a moz-do-not-send="true" href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
                <a moz-do-not-send="true"
                  href="http://mailman.uib.no/listinfo/corpora"
                  target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
                <br>
              </blockquote>
            </div>
            <br>
          </div>
        </div>
      </div>
      <br>
    </blockquote>
    <br>
    <div class="moz-signature">-- <br>
      <i>
        Mike Scott<br>
        *** <br>
        If you publish research which uses WordSmith, do let me know so
        I can include it at
        <a class="moz-txt-link-freetext" href="http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm">http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm</a>
        <br>
        ***<br>
        Aston University<br>
        and<br>
        Lexical Analysis Software Ltd<br>
        <a class="moz-txt-link-abbreviated" href="http://www.lexically.net">www.lexically.net</a><br>
      </i></div>
  </body>
</html>