Thank you for the suggestion. We will look into the LGPL-LR.<div><br></div><div>I would also like to clarify that we do explicitly state the LGPL license in a readme file and at the beginning of all data files in the corpus (in addition to our website and announcements), and the full GPL/LGPL text is included in the root directory of the distribution.</div>

<div><br></div><div>Anton</div><div><br><div class="gmail_quote">On Tue, Jan 11, 2011 at 10:24 AM, Karen Fort <span dir="ltr"><<a href="mailto:karen.fort@inist.fr">karen.fort@inist.fr</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Hi,<br>

<br>

If it's a corpus, then you should have a look at the LGPL-LR (for Linguistic Resource) licence.<br>

<br>

Also note that without an explicit mention of the license (for example, in a license.txt file), the most restrictive rights apply to the corpus.<br>

<br>

Hope this helps,<br>

<br>

Karen<br>

<br>

On 10/01/2011 at 18:31, Anton Karl Ingason wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div></div><div class="h5">

Numerous arguments exist for the benefits of free and open resources. In<br>

our corpus project, the Icelandic Parsed Historical Corpus (IcePaHC),<br>

one of our goals is to identify how we can make the most out of these<br>

benefits and compare our approach to the approaches that others have<br>

taken with their parsed corpora (the same issues will of course in many<br>

cases apply equivalently to other types of resources). Our goal is not<br>

to "win the competition of the most free parsed corpus", but rather to<br>

learn what steps one might take to maximize the benefits of such an<br>

approach, while doing our best to carry out these steps in the context<br>

of our project.<br>

<br>

Below is a list of steps we decided to pursue to this end.<br>

We would like to ask Corpora List:<br>

- Are there some other concrete steps that we should state explicitly in<br>

order to achieve our goal?<br>

- Do you disagree with some of the steps?<br>

- What is the situation for other parsed corpora with regard to the<br>

steps we list? In particular it would be useful to get a<br>

"yes/no/comment" for each item on the list for a particular corpus<br>

and/or a reference to a paper/website that can be cited for that<br>

information.<br>

<br>

The steps we have taken with IcePaHC:<br>

1) Raw data can be downloaded for local use (corpus not hidden<br>

behind a search interface)<br>

2) Comprehensive documentation freely available online<br>

3) Available without registration, user identification of some sort, or<br>

signing of contracts<br>

4) Development process of corpus relies only on free/open source<br>

software tools (for transparent replication of annotation process)<br>

5) Open development (annotation is carried out in an open online version<br>

control repository for transparency regarding the actual steps taken in<br>

the development and immediate access to work-in-progress)<br>

6) Regular scheduled releases of numbered versions during development as<br>

well as for more permanent milestone versions so that researchers can<br>

always produce replicable results on a recent version of the corpus<br>

7) Users can improve the corpus and release modified versions without<br>

special permission<br>

8) Free of cost to academia<br>

9) Free of cost to commercial users<br>

10) Corpus released under a standard free license of some sort for<br>

straightforward compatibility with other projects (GPL, LGPL, CC, etc.)<br>

<br>

The latest version of our corpus, IcePaHC, preview version 0.3, with<br>

262,000 words, is available for download as described in the announcement<br>

below.<br>

<br>

-----------<br>

<br>

Available: Icelandic Parsed Historical Corpus, V0.3<br>

<br>

We are pleased to announce that version 0.3 of the Icelandic Parsed<br>

Historical Corpus (IcePaHC) is now available for free download.<br>

<br>

The corpus is syntactically parsed, annotated for full phrase structure<br>

using an adaptation of the annotation scheme used by the Penn parsed<br>

corpora of historical English (<a href="http://www.ling.upenn.edu/hist-corpora/" target="_blank">http://www.ling.upenn.edu/hist-corpora/</a>)<br>

and other corpora in that tradition (see links from website). The corpus<br>

contains ca. 262,000 words from every century between the 12th and the<br>

19th centuries inclusive. Please note that this is about a quarter of<br>

the ultimate goal for the completed corpus, ca. 1 million words.<br>

<br>

The corpus is distributed as raw UTF-8 data in labeled bracketing format<br>

and it is therefore compatible with various existing programs, including<br>

CorpusSearch (<a href="http://corpussearch.sourceforge.net/" target="_blank">http://corpussearch.sourceforge.net/</a>).<br>

<br>

The corpus can be downloaded from:<br>

<a href="http://www.linguist.is/icelandic_treebank/Download" target="_blank">www.linguist.is/icelandic_treebank/Download</a><br></div></div>

<div class="im">

<br>

Further information on the annotation guidelines and project<br>

organization can be found on the project wiki:<br>

<a href="http://www.linguist.is/icelandic_treebank/" target="_blank">www.linguist.is/icelandic_treebank/</a><br></div>

<div><div></div><div class="h5">

<br>

We hope that this release will result in feedback that allows us to<br>

improve the resource for upcoming versions. Updates are released every<br>

three months - the upcoming 0.4 version will be released on April 4th<br>

2011. Between releases, development can be tracked at our open<br>

repository on GitHub (<a href="http://github.com/antonkarl/icecorpus" target="_blank">http://github.com/antonkarl/icecorpus</a>) but use of<br>

released versions is encouraged to ensure that results can be replicated.<br>

<br>

Texts included in Version 0.3:<br>

4439 words from The First Grammatical Treatise (entire text) (12th century)<br>

8179 words from Íslensk hómilíubok (Icelandic book of homilies) (12th<br>

century)<br>

3459 words from Egils saga (theta fragment) (13th century)<br>

22720 words from Sturlunga saga (13th century)<br>

23040 words from Finnboga saga ramma (1350)<br>

11486 words from Bandamanna saga (1450)<br>

23041 words from Vilhjálms saga Sjóðs (1450)<br>

8582 words from Erasmus saga (1525)<br>

20683 words from the New Testament's Gospel of John (1540)<br>

16421 words from the New Testament's Acts (1540)<br>

17127 words from Ólafur Egilsson's travelogue (1628)<br>

9760 words from Píslarsaga Jóns Magnússonar (1659)<br>

22905 words from Jón Indíafari's travelogue (1661)<br>

22099 words from Jón Steingrímsson's biography (1791)<br>

3269 words from Jónas Hallgrímsson's essay on the nature and origin of<br>

the earth (1835)<br>

17837 words from Piltur og stúlka (novel by Jón Thoroddsen) (1850)<br>

27192 words from Brynjólfur Sveinsson biskup (novel by Torfhildur Hólm)<br>

(1882)<br>

Total number of words: 262240<br>

<br>

<br>

Joel C. Wallenberg (<a href="mailto:joel.wallenberg@gmail.com" target="_blank">joel.wallenberg@gmail.com</a>)<br></div></div>

<div class="im">

Anton Karl Ingason (<a href="mailto:anton.karl.ingason@gmail.com" target="_blank">anton.karl.ingason@gmail.com</a>)<br></div>

Einar Freyr Sigurðsson (<a href="mailto:einarfs@gmail.com" target="_blank">einarfs@gmail.com</a>)<br>

Eiríkur Rögnvaldsson (<a href="mailto:eirikur@hi.is" target="_blank">eirikur@hi.is</a>)<div class="im"><br>

University of Iceland<br>

<br>

The project is funded by the following grants:<br>

<br>

Icelandic Research Fund (RANNÍS), grant no. 090662011, "Viable Language<br>

Technology beyond English – Icelandic as a test case".<br>

<br>

U.S. National Science Foundation (NSF) International Research Fellowship<br>

Program (IRFP), grant #OISE-0853114, "Evolution of Language Systems: a<br>

comparative study of grammatical change in Icelandic and English".<br>

<br>

</div></blockquote>

<br>

-- <br>

Karën FORT<br>

Ingénieure/Engineer et/and doctorante/PhD student<br>

INIST-CNRS / LIPN<br>

2, allée de Brabois<br>

54500 Vandoeuvre-lès-Nancy<br>

France<br>

Bureau/Office: H112<br>

+33 (0)3 83 50 46 36<br>

<br>

<a href="http://www-lipn.univ-paris13.fr/~fort/" target="_blank">http://www-lipn.univ-paris13.fr/~fort/</a><div><div></div><div class="h5"><br>

<br>

_______________________________________________<br>

Corpora mailing list<br>

<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>

<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br><a href="http://www.linguist.is">www.linguist.is</a><br>s: 846 2613 / tel: +354 846 2613<br><br>

</div>