[Corpora-List] Available: Icelandic Parsed Historical Corpus, V0.1

Anton Karl Ingason anton.karl.ingason at gmail.com
Fri Jul 2 18:51:35 UTC 2010


We are pleased to announce that a preview version of the Icelandic Parsed
Historical Corpus (IcePaHC) is now available for free download. The corpus
is syntactically parsed, annotated for full phrase structure using an
adaptation of the annotation scheme used by the Penn parsed corpora of
historical English (http://www.ling.upenn.edu/hist-corpora/) and other
corpora in that tradition (see links from website). The preview contains ca.
31,000 words from two periods, the 12th century and the 19th century. Please
note that this is a small portion of the ultimate goal for the completed
corpus, ca. 1 million words from the 12th-19th centuries.

The corpus is distributed as raw UTF-8 data in labeled bracketing format and
it is therefore compatible with various existing programs, including
CorpusSearch (http://corpussearch.sourceforge.net/).
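
As a rough illustration of what labeled bracketing looks like to a program,
the following minimal Python sketch reads one bracketed tree into nested
lists and collects its words. The sample tree and its tags are invented for
illustration and are not taken from the corpus; the function names are
hypothetical, and real files may carry extra wrapper brackets, IDs, or
comments that this sketch does not handle.

```python
def tokenize(text):
    """Split a bracketed string into parentheses, labels, and words."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1  # consume "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return items

    return node()


def leaves(tree):
    """Return the words of a tree, left to right.

    A terminal node is assumed to have the shape [tag, word].
    """
    if len(tree) == 2 and all(isinstance(item, str) for item in tree):
        return [tree[1]]
    words = []
    for child in tree:
        if isinstance(child, list):
            words.extend(leaves(child))
    return words


# Invented sample in the general Penn style; not an actual corpus sentence.
SAMPLE = "(IP-MAT (NP-SBJ (PRO-N hann)) (VBDI kom))"

tree = parse(tokenize(SAMPLE))
print(tree)
print(leaves(tree))
```

For actual queries over the corpus, a dedicated tool such as CorpusSearch is
likely the better choice; a sketch like this is only meant to show that the
format is simple to read with general-purpose code.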

The corpus can be downloaded from:
www.linguist.is/wiki/Download

Further information on the annotation guidelines and project organization
can be found on the project wiki:
www.linguist.is/wiki/

We hope that this early release will result in feedback that allows us to
improve the resource for upcoming versions. Updates will be released every
three months for the next 12 months, starting with version 0.2, which will
be released on October 1, 2010. Between releases, development can be
tracked in our open repository on GitHub
(http://github.com/antonkarl/icecorpus), but use of released versions is
encouraged to ensure that results can be replicated.

Joel Wallenberg (joel.wallenberg at gmail.com)
Anton Karl Ingason (anton.karl.ingason at gmail.com)
Einar Freyr Sigurðsson (einarfs at gmail.com)
Eiríkur Rögnvaldsson (eirikur at hi.is)
University of Iceland

The project is funded by the following grants:

Icelandic Research Fund (RANNÍS), grant no. 090662011, "Viable Language
Technology beyond English – Icelandic as a test case".

U.S. National Science Foundation (NSF) International Research Fellowship
Program (IRFP), grant #OISE-0853114, "Evolution of Language Systems: a
comparative study of grammatical change in Icelandic and English".