[Corpora-List] Google Books, copyrights, and corpora

Thu Jun 15 03:56:32 UTC 2006

I'm not a legal expert, but if Google loses the case and is prevented
from displaying, for free, a snippet of a few sentences from a
copyrighted book, then I wonder about the consequences on things like
simply citing a book's paragraph: what's the difference between a
Google snippet and the paragraph I cite in my paper, in my book, or
on my web site, to refer truthfully to the original author's words?
Will I be expected to pay copyright fees for every citation?
Obviously the amount of data we are talking about is on a different
scale in the case of Google, but if the snippet view in itself is the
target rather than the "mass citation", then I don't really see a
difference between the two situations.

Best.

--
 Jean-Philippe Prost
  Centre for Language Technology
  Macquarie University ~ Sydney, Australia
and
  Laboratoire Parole et Langage (Speech & Language Lab.)
  Université de Provence ~ Aix-en-Provence, France
<http://www.ics.mq.edu.au/~jpprost/>
_______________________________________________

On 6/15/06, Mark Davies <Mark_Davies at byu.edu> wrote:
> Most of us are familiar with the Google Books initiative -- the project that will digitize tens of millions of books from several leading libraries (http://books.google.com/intl/en/googlebooks/about.html). Google scans these books and then makes them searchable for end users via the Web.
>
> For copyrighted works, the end users see only a "snippet" view -- similar to what we linguists would call an entry in a KWIC display. This is the line of text containing the word or phrase searched for, and maybe one line of text before and one after.
>
> Google claims that although the entire text is (indexed) on the server, the end user sees only very limited context, and there is therefore no violation of US Fair Use Law. See http://books.google.com/googlebooks/newsviews/legal.html for their legal claims and http://fairuse.stanford.edu/ for US Fair Use Law.
>
> In 2005 Google was sued by the Association of American Publishers, which claimed that the "snippet defense" is not adequate in this case (see http://publishers.org/press/releases.cfm?PressReleaseArticleID=292). The case is still in litigation.
>
> ---
>
> What are the implications of this for corpus creation and use? If Google wins, does it mean that we can include *ANY* texts in a corpus, as long as the end user only has access to short KWIC entries (especially if the search interface prevents them from "chaining" these together to re-create larger strings of text)? I guess I'm interested in this question right now, as I'm considering the legal implications of using a particular text collection (300+ million words) as part of a historical corpus of English.
>
> In the past, we've discussed copyright and we've discussed Google and we've discussed Google copyright issues (see several CORPORA posts in June 2003 relating to cached web pages). But this discussion was before Google announced the Google Books initiative, and before they announced the "snippet defense", which seems to have clear application to what we're doing (or could do) with corpora.
>
> Any comments?
>
> =================================================
> Mark Davies
> Assoc. Prof., Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> http://davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> =================================================