[Corpora-List] Google Books, copyrights, and corpora

Wed Jun 14 18:17:03 UTC 2006

The technical question is "Is it possible to reconstruct the full text
from concordance snippets?" The answer depends on how
snippets are selected. The answer will be "yes" if, for every token in
the full text, there is some query that would return that token, along
with enough context to allow the snippets to be sewn back
together. You would be about as certain that the text was right as you
are when you solve a cryptogram. While this is less than complete
mathematical certainty, it would probably convince a judge. The answer
might be "no" if there are enough tokens that Google can guarantee
will never appear in a snippet.
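
To make the "sewn back together" step concrete, here is a minimal sketch
in Python. It assumes snippets are plain whitespace-separated token
strings and that two snippets belong together when the end of one
matches the start of another by at least min_overlap tokens. The names
overlap, stitch and min_overlap are made up for illustration, and
nothing here reflects how Google actually selects or serves snippets.

```python
def overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no match of at least `min_overlap` tokens exists.
    """
    longest = min(len(left), len(right))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def stitch(snippets, min_overlap=2):
    """Greedily merge snippets that overlap by at least `min_overlap` tokens.

    Returns the list of merged pieces. One piece means every snippet was
    joined into a single text; several pieces mean there were gaps that
    no snippet bridged.
    """
    pieces = []
    for snippet in snippets:
        tokens = snippet.split()
        if tokens and tokens not in pieces:  # drop empty and duplicate snippets
            pieces.append(tokens)

    while True:
        best = None  # (overlap length, index of left piece, index of right piece)
        for i, left in enumerate(pieces):
            for j, right in enumerate(pieces):
                if i != j:
                    k = overlap(left, right, min_overlap)
                    if k and (best is None or k > best[0]):
                        best = (k, i, j)
        if best is None:
            break  # no remaining pair overlaps enough
        k, i, j = best
        merged = pieces[i] + pieces[j][k:]
        pieces = [p for n, p in enumerate(pieces) if n not in (i, j)]
        pieces.append(merged)

    return [" ".join(p) for p in pieces]


if __name__ == "__main__":
    hits = [
        "jumps over the lazy dog",
        "the quick brown fox",
        "brown fox jumps over",
    ]
    print(stitch(hits))
```

If every token is covered and the overlaps are long enough to be
unambiguous, a single piece comes back (the "yes" case). If some tokens
never appear in any snippet, the result stays in several disconnected
pieces (the "no" case). A greedy merge like this can also be misled by
repeated phrases, which is one reason the certainty is cryptogram-like
rather than mathematical.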

As for the legal question, even a decision in Google's favor might be
narrowly drawn, in which case we would be on legally dangerous ground
were we to assume that we can do something just because it seems to us
similar enough to what Google would (hypothetically) be allowed to
do. Lawyers have training which allows them to make intelligent
guesses about things like this, but even they have rather few firm
precedents to go on. My guess is that a lawyer would advise caution,
at least for now, simply because it is unclear what judges will
eventually decide, if and when such a case comes to court. That, however,
is just a guess.
