Thank you for the suggestion. We will look into the LGPL-LR.<div><br></div><div>I would also like to clarify that we do explicitly state the LGPL license in a readme file and at the beginning of all data files in the corpus (in addition to our website and announcements), and the full GPL/LGPL text is included in the root directory of the distribution.</div>
<div><br></div><div>Anton</div><div><br><div class="gmail_quote">On Tue, Jan 11, 2011 at 10:24 AM, Karen Fort <span dir="ltr"><<a href="mailto:karen.fort@inist.fr">karen.fort@inist.fr</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi,<br>
<br>
If it's a corpus, then you should have a look at the LGPL-LR (for Linguistic Resource) licence.<br>
<br>
Also note that without an explicit statement of the license (for example, in a license.txt file), the most restrictive rights apply to the corpus.<br>
<br>
Hope this helps,<br>
<br>
Karen<br>
<br>
On 10/01/2011 18:31, Anton Karl Ingason wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div></div><div class="h5">
Numerous arguments exist for the benefits of free and open resources. In<br>
our corpus project, the Icelandic Parsed Historical Corpus (IcePaHC),<br>
one of our goals is to identify how we can make the most out of these<br>
benefits and compare our approach to the approaches that others have<br>
taken with their parsed corpora (the same issues will of course in many<br>
cases apply equivalently to other types of resources). Our goal is not<br>
to "win the competition of the most free parsed corpus", but rather to<br>
learn what steps one might take to maximize the benefits of such an<br>
approach, while doing our best to carry out these steps in the context<br>
of our project.<br>
<br>
Below is a list of steps we decided to pursue to this end.<br>
We would like to ask the Corpora List:<br>
- Are there some other concrete steps that we should state explicitly in<br>
order to achieve our goal?<br>
- Do you disagree with some of the steps?<br>
- What is the situation for other parsed corpora with regard to the<br>
steps we list? In particular it would be useful to get a<br>
"yes/no/comment" for each item on the list for a particular corpus<br>
and/or a reference to a paper/website that can be cited for that<br>
information.<br>
<br>
The steps we have taken with IcePaHC:<br>
1) Raw data can be downloaded for local use (corpus not hidden<br>
behind a search interface)<br>
2) Comprehensive documentation freely available online<br>
3) Available without registration, user identification of some sort, or<br>
signing of contracts<br>
4) Development process of corpus relies only on free/open source<br>
software tools (for transparent replication of annotation process)<br>
5) Open development (annotation is carried out in an open online version<br>
control repository for transparency regarding the actual steps taken in<br>
the development and immediate access to work-in-progress)<br>
6) Regularly scheduled releases of numbered versions during development, as<br>
well as more permanent milestone versions, so that researchers can<br>
always produce replicable results on a recent version of the corpus<br>
7) Users can improve the corpus and release modified versions without<br>
special permission<br>
8) Free of cost to academia<br>
9) Free of cost to commercial users<br>
10) Corpus released under a standard free license of some sort for<br>
straightforward compatibility with other projects (GPL, LGPL, CC, etc.)<br>
<br>
The latest version of our corpus, IcePaHC preview version 0.3, with<br>
262,000 words, is available for download as described in the announcement<br>
below.<br>
<br>
-----------<br>
<br>
Available: Icelandic Parsed Historical Corpus, V0.3<br>
<br>
We are pleased to announce that version 0.3 of the Icelandic Parsed<br>
Historical Corpus (IcePaHC) is now available for free download.<br>
<br>
The corpus is syntactically parsed, annotated for full phrase structure<br>
using an adaptation of the annotation scheme used by the Penn parsed<br>
corpora of historical English (<a href="http://www.ling.upenn.edu/hist-corpora/" target="_blank">http://www.ling.upenn.edu/hist-corpora/</a>)<br>
and other corpora in that tradition (see links from the website). The corpus<br>
contains ca. 262,000 words from every century from the 12th to the<br>
19th, inclusive. Please note that this is about a quarter of the<br>
ultimate goal for the completed corpus, ca. 1 million words.<br>
<br>
The corpus is distributed as raw UTF-8 data in labeled bracketing format,<br>
and it is therefore compatible with various existing programs, including<br>
CorpusSearch (<a href="http://corpussearch.sourceforge.net/" target="_blank">http://corpussearch.sourceforge.net/</a>).<br>
<br>
The corpus can be downloaded from:<br>
<a href="http://www.linguist.is/icelandic_treebank/Download" target="_blank">www.linguist.is/icelandic_treebank/Download</a><br></div></div>
<div class="im">
<br>
Further information on the annotation guidelines and project<br>
organization can be found on the project wiki:<br>
<a href="http://www.linguist.is/icelandic_treebank/" target="_blank">www.linguist.is/icelandic_treebank/</a><br></div>
<div><div></div><div class="h5">
<br>
We hope that this release will result in feedback that allows us to<br>
improve the resource for upcoming versions. Updates are released every<br>
three months; the upcoming version 0.4 will be released on April 4,<br>
2011. Between releases, development can be tracked in our open<br>
repository on GitHub (<a href="http://github.com/antonkarl/icecorpus" target="_blank">http://github.com/antonkarl/icecorpus</a>), but using<br>
released versions is encouraged to ensure that results can be replicated.<br>
<br>
Texts included in Version 0.3:<br>
4439 words from The First Grammatical Treatise (entire text) (12th century)<br>
8179 words from Íslensk hómilíubók (Icelandic book of homilies) (12th<br>
century)<br>
3459 words from Egils saga (theta fragment) (13th century)<br>
22720 words from Sturlunga saga (13th century)<br>
23040 words from Finnboga saga ramma (1350)<br>
11486 words from Bandamanna saga (1450)<br>
23041 words from Vilhjálms saga Sjóðs (1450)<br>
8582 words from Erasmus saga (1525)<br>
20683 words from the New Testament's Gospel of John (1540)<br>
16421 words from the New Testament's Acts (1540)<br>
17127 words from Ólafur Egilsson's travelogue (1628)<br>
9760 words from Píslarsaga Jóns Magnússonar (1659)<br>
22905 words from Jón Indíafari's travelogue (1661)<br>
22099 words from Jón Steingrímsson's biography (1791)<br>
3269 words from Jónas Hallgrímsson's essay on the nature and origin of<br>
the earth (1835)<br>
17837 words from Piltur og stúlka (novel by Jón Thoroddsen) (1850)<br>
27192 words from Brynjólfur Sveinsson biskup (novel by Torfhildur Hólm)<br>
(1882)<br>
Total number of words: 262240<br>
<br>
<br>
Joel C. Wallenberg (<a href="mailto:joel.wallenberg@gmail.com" target="_blank">joel.wallenberg@gmail.com</a>)<br></div></div>
<div class="im">
Anton Karl Ingason (<a href="mailto:anton.karl.ingason@gmail.com" target="_blank">anton.karl.ingason@gmail.com</a>)<br></div>
Einar Freyr Sigurðsson (<a href="mailto:einarfs@gmail.com" target="_blank">einarfs@gmail.com</a>)<br>
Eiríkur Rögnvaldsson (<a href="mailto:eirikur@hi.is" target="_blank">eirikur@hi.is</a>)<div class="im"><br>
University of Iceland<br>
<br>
The project is funded by the following grants:<br>
<br>
Icelandic Research Fund (RANNÍS), grant no. 090662011, "Viable Language<br>
Technology beyond English – Icelandic as a test case".<br>
<br>
U.S. National Science Foundation (NSF) International Research Fellowship<br>
Program (IRFP), grant #OISE-0853114, "Evolution of Language Systems: a<br>
comparative study of grammatical change in Icelandic and English".<br>
<br>
</div></blockquote>
<br>
-- <br>
Karën FORT<br>
Ingénieure/Engineer et/and doctorante/PhD student<br>
INIST-CNRS / LIPN<br>
2, allée de Brabois<br>
54500 Vandoeuvre-lès-Nancy<br>
France<br>
Bureau/Office: H112<br>
+33 (0)3 83 50 46 36<br>
<br>
<a href="http://www-lipn.univ-paris13.fr/~fort/" target="_blank">http://www-lipn.univ-paris13.fr/~fort/</a><div><div></div><div class="h5"><br>
<br>
_______________________________________________<br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><a href="http://www.linguist.is">www.linguist.is</a><br>tel: +354 846 2613<br><br>
</div>