[Corpora-List] Licensing output of a GPL'd morphological analyser

Fri Jan 15 13:31:00 UTC 2010

Dear corpora users,
we have a formal problem understanding how the GPL applies to a
morphological analyser and its output. I'm sure someone has dealt with
a similar issue before (and this may be of interest to others as
well), so I'm asking for help here.

Let's assume a morphological analyser is released under the GPL. It
consists of an extensive lexicon (which is compiled into a transducer
for the binary form), the actual source code of the transducer, and
some interface code. The analyser reads plain text, tokenises it and
outputs a sequence of tokens with sets of tags attached (each word is
assigned its entry from the underlying lexicon).
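
To make the setup concrete, here is a minimal sketch of such a
pipeline. It is only an illustration under my own assumptions: the
lexicon is a plain dictionary rather than a compiled transducer, and
the names TOY_LEXICON, tokenise and analyse as well as the tags are
hypothetical, not taken from any real analyser.

```python
import re

# Hypothetical toy lexicon: surface form -> set of (lemma, tag) readings.
# A real analyser would compile a far larger lexicon into a transducer.
TOY_LEXICON = {
    "the": {("the", "DET")},
    "dogs": {("dog", "NOUN:pl")},
    "bark": {("bark", "NOUN:sg"), ("bark", "VERB:pres")},
}


def tokenise(text):
    """Split plain text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def analyse(text):
    """Attach to each token the readings stored for it in the lexicon.

    Unknown tokens get an empty list of readings.
    """
    result = []
    for token in tokenise(text):
        readings = TOY_LEXICON.get(token.lower(), set())
        result.append((token, sorted(readings)))
    return result


if __name__ == "__main__":
    for token, readings in analyse("The dogs bark."):
        print(token, readings)
```

Note that every reading in the output is copied verbatim from the
lexicon, which is exactly what the licensing question hinges on.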

The problem is: does the licence require that a corpus which is
obtained by running the analyser must be released under a similar
licence as well?

Why yes: source code is "the preferred form of the work for making
modifications to it" (www.gnu.org), so in the case of such an
analyser, the source code should include the lexicon as well. What the
analyser actually does is systematically dump parts of its lexicon
(and thus of its source code) and attach them to the output. So the
resulting corpus actually contains parts of the analyser's source
code.

Why no: this situation resembles using the GNU compiler. When
compiling some code, gcc puts some parts of its own components into
the resulting object file or binary. Yet nobody claims that any output
of gcc automatically becomes GPL'd.

Any ideas welcome.

Regards,
Adam Radziszewski
Wrocław University of Technology

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora