[Corpora-List] Licensing output of a GPL'd morphological analyser

Sat Jan 16 02:35:37 UTC 2010

I've always wondered about the limits of this. What if there were an
annotated corpus with a restrictive license, but the text were public
domain? A tool is trained on the corpus and provided under a
restrictive license. I then turn around and run it back over its
training data. In the limit case, it's a memory-based learner that
will achieve 100% accuracy on its training corpus. Surely this isn't
legal, but where might the boundary line be?
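
To make the limit case concrete, here is a minimal Python sketch of such
a memory-based tagger. The toy corpus, the tag names, and the function
names train and tag are all made up for illustration; this is not any
real tool. It memorises every training sentence together with its tags
and falls back to a per-word majority tag for unseen sentences, so
running it over its own training text reproduces the annotations
exactly (assuming no sentence appears twice with conflicting tags).

```python
from collections import Counter, defaultdict

def train(annotated):
    """Memorise each training sentence's tags, plus per-word tag counts."""
    exact = {}                      # token sequence -> list of tags
    by_word = defaultdict(Counter)  # word -> counts of tags seen with it
    for sentence in annotated:
        words = tuple(word for word, _ in sentence)
        exact[words] = [tag_ for _, tag_ in sentence]
        for word, tag_ in sentence:
            by_word[word][tag_] += 1
    return exact, by_word

def tag(model, words, default="UNK"):
    """Tag a sentence: exact recall if memorised, else per-word majority tag."""
    exact, by_word = model
    tags = exact.get(tuple(words))
    if tags is None:
        tags = [
            by_word[w].most_common(1)[0][0] if w in by_word else default
            for w in words
        ]
    return list(zip(words, tags))

# Hypothetical annotated corpus: public-domain text, restricted annotations.
corpus = [
    [("the", "DET"), ("can", "NOUN"), ("rusts", "VERB")],
    [("they", "PRON"), ("can", "AUX"), ("swim", "VERB")],
]

model = train(corpus)
# Run the tool back over the raw training text only.
reproduced = [tag(model, [word for word, _ in s]) for s in corpus]
```

Here reproduced is identical to corpus, which is exactly the situation I
am asking about: the output is the restricted annotation, recovered
through the tool.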

On Sat, Jan 16, 2010 at 11:03 AM, Linas Vepstas <linasvepstas at gmail.com> wrote:
> 2010/1/15 Francis Tyers <ftyers at prompsit.com>:
>> On Fri, 15 Jan 2010 at 14:42 +0000, Jimmy O'Regan wrote:
>>> 2010/1/15 Adam Radziszewski <kocikikut at gmail.com>:
>>> > Dear corpora users,
>>> > we've got a formal problem understanding how GPL licences apply
>>> > to a morphological analyser and its output. I'm sure someone has
>>> > dealt with a similar issue before (and this may be of interest to
>>> > others as well), so I'm asking for help here.
>>> >
>>> > Let's assume a morphological analyser is released under GPL. It
>>> > consists of an extensive lexicon (which in binary form is compiled to
>>> > a transducer) and the actual source code of the transducer and some
>>> > interface. The analyser reads plain text, tokenises it and outputs a
>>> > sequence of tokens with sets of tags attached (each word is assigned
>>> > its entry from the underlying lexicon).
>>> >
>>> > The problem is: does the licence require that a corpus which is
>>> > obtained by running the analyser must be released under a similar
>>> > licence as well?
>>> >
>>> > Why yes: source code is "the preferred form of the work for making
>>> > modifications to it [a work]" (www.gnu.org), thus in the case of such an
>>> > analyser, it should include the lexicon as well. What the analyser
>>> > actually does is to systematically dump parts of its lexicon (thus its
>>> > source code) and attach them to output. So the resulting corpus
>>> > actually contains parts of the source code of the analyser.
>>> >
>>> > Why no: this situation resembles using the GNU compiler. When
>>> > compiling some code, gcc outputs some parts of its components to
>>> > generate the resulting object/binary. Yet nobody claims that any
>>> > output of gcc automatically becomes GPL'd.
>>
>> Would the opposite be true? Taking a non-free morphological analyser,
>> running a corpus through it, and publishing the results as GPL?
>> Would that be "legal"?
>
> Well, IANAL, but ...
>
> if we take the previously mentioned
> http://www.gnu.org/licenses/gpl-faq.html#GPLOutput
> to heart, then the answer would be "yes, of course" -- that FAQ entry
> seems to imply that there's no legal way of licensing a tool (any tool)
> to restrict the use of the tool. (Err, unless the tool is a gun, an x-ray
> machine or something that's dangerous enough to require a govt. license
> to operate but even then there's little restriction on use once you have
> the license).
>
> (I'm assuming your initial corpus was GPL'ed to begin with)
>
> --linas
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
