[Corpora-List] free tagged corpus

Adam Kilgarriff adam at lexmasterclass.com
Thu Nov 17 16:07:56 UTC 2005


At the risk of adding more complexity than anyone wants, here is another
option: the freedom to provide a web interface to a corpus.

If I provide a web interface to a corpus, I am doing something rather less
than redistributing the corpus.  I am giving my users another flavour of
"freedom 0", rather than "freedom 1".

I am also doing what Google and Yahoo do in relation to the corpus that is
the web.  (They neither pay data owners anything nor even ask permission.)

Adam


-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of David Graff
Sent: 17 November 2005 15:03
To: CORPORA at UIB.NO
Subject: Re: [Corpora-List] free tagged corpus 


martin.wynne at oucs.ox.ac.uk said:
> With corpora, a parallel classification may be possible:
>
>      * The freedom to access and analyse the corpus (freedom 0).
>      * The freedom to run your own tools on the corpus, and adapt it to
> your needs (freedom 1). Access to the full text of the corpus is a
> precondition for this.
>      * The freedom to redistribute copies so you can help your neighbor
> (freedom 2).
>      * The freedom to add texts or metadata or annotations, and release
> your improvements to the public, so that the whole community benefits
> (freedom 3). 

Regarding "freedom 3" (the last point), there can be one important
difference between corpora and software.  For many kinds of corpus
research, it's possible to circulate metadata and annotations in
"stand-off" form: instead of including the corpus data with the
annotations, you include indexing information (file name, document ID,
byte offset, etc.) that cites a reference release of the corpus data.
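
As a purely illustrative sketch (not part of the original message), a
stand-off annotation record might look something like the following in
Python.  The class name, field names, file name, and tag labels are all
hypothetical; the only idea taken from the text above is that each
annotation carries a file name, a document ID, and byte offsets into a
reference release, and never the corpus text itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffAnnotation:
    """One annotation that points into the corpus without copying it."""
    file_name: str  # file in the reference release (hypothetical name)
    doc_id: str     # document ID within that release
    start: int      # byte offset of the span, inclusive
    end: int        # byte offset of the span, exclusive
    label: str      # the annotation itself, e.g. a part-of-speech tag


def resolve(annotation, corpus_files):
    """Look up the annotated span; requires access to the full text."""
    data = corpus_files[annotation.file_name]
    return data[annotation.start:annotation.end].decode("utf-8")


# A stand-in for a locally held copy of the reference release.
corpus_files = {"doc001.txt": "The cat sat on the mat.".encode("utf-8")}

# What an annotator would circulate: offsets and labels only.
annotations = [
    StandoffAnnotation("doc001.txt", "DOC001", 4, 7, "NOUN"),
    StandoffAnnotation("doc001.txt", "DOC001", 8, 11, "VERB"),
]

for ann in annotations:
    print(ann.doc_id, ann.label, resolve(ann, corpus_files))
```

Someone holding only the list of annotations learns the offsets and
labels but not the words; the resolve step works only for a user who
also holds the corpus files.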

Obviously, the only people who can make use of stand-off annotations are
those who already have or can get "freedom 1" (access to full text) for the
given corpus.  (Or maybe there are ways to make these annotations work for
people who only have "freedom 0"?)

In any case, many researchers can contribute to the community in this way,
and many others can benefit, without risking property-rights infringements:
given that the annotations do not contain a replication of the corpus,
ownership of the annotations (and the choice of whether/how to distribute
them) resides with the annotation creator, and is not limited in any direct
way by the distribution constraints of the corpus.

	David Graff


