[Corpora-List] New Testament corpus

Mon Mar 22 14:38:18 UTC 2010

The PROIEL project has now completed a morphologically tagged and
syntactically parsed corpus of the Ancient Greek text of the Gospels,
based on Ulrik Sandborg-Petersen's electronic version of Tischendorf's
edition.

The corpus is available for browsing after (free) registration at
http://foni.uio.no:3000
There is currently no query interface for morphology and syntax, but
the raw data can be downloaded in various XML-based formats,
including TIGER-XML for querying in TIGERSearch.
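
As a rough illustration of working with the downloaded data outside
TIGERSearch, the sketch below reads a small TIGER-XML fragment with
Python's standard library. The fragment is hand-written and hypothetical:
it follows the general TIGER-XML layout (s, graph, terminals, t,
nonterminals, nt, edge), but the ids, part-of-speech codes and edge labels
are assumptions, and an actual PROIEL export may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment in the general TIGER-XML layout.
# The ids, "pos" codes and edge labels are illustrative assumptions,
# not values copied from a real PROIEL export.
SAMPLE = """<corpus id="sample">
  <body>
    <s id="s1">
      <graph root="s1_n1">
        <terminals>
          <t id="s1_t1" word="ἐν" pos="R-"/>
          <t id="s1_t2" word="ἀρχῇ" pos="Nb"/>
        </terminals>
        <nonterminals>
          <nt id="s1_n1" cat="PP">
            <edge idref="s1_t1" label="HD"/>
            <edge idref="s1_t2" label="OBL"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(xml_text):
    """Return one (sentence id, tokens, edges) tuple per <s> element."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        # Terminal nodes carry the word form and its annotation.
        tokens = [(t.get("id"), t.get("word"), t.get("pos")) for t in s.iter("t")]
        # Each nonterminal lists labelled edges pointing at its children.
        edges = [
            (nt.get("id"), edge.get("label"), edge.get("idref"))
            for nt in s.iter("nt")
            for edge in nt.findall("edge")
        ]
        sentences.append((s.get("id"), tokens, edges))
    return sentences


sentences = read_sentences(SAMPLE)
print(len(sentences), "sentence(s)")
for sent_id, tokens, edges in sentences:
    print(sent_id, [word for _, word, _ in tokens])
```

For a downloaded file, ET.parse(path).getroot() could replace
ET.fromstring; the rest of the sketch would stay the same, provided the
export uses the element names assumed here.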

The Greek Gospel text is part of the PROIEL corpus, which contains
several old Indo-European Bible translations (Latin, Gothic, Old Church
Slavic and Armenian) and some non-biblical texts. All the texts in the
corpus can be browsed, but external users can see only the morphological
and syntactic analyses that have been reviewed after annotation.
Currently that includes:

1. The complete Greek Gospels and scattered sentences from the rest of
the NT.
2. The whole Gospel of Mark + Luke 1-22 in Jerome's Latin translation,
as well as scattered sentences from the rest of the NT.
3. The whole of Mark and Luke + Matthew 5-19 from the Codex Marianus.
4. The extant parts of Mark, Luke and Matthew in Gothic.

The publicly available corpus currently contains 18688 sentences.

We are continuously cross-checking and reviewing new sentences, and new
exports are generated daily. We expect the Gospel texts to be
available in all four languages before the summer, but we need help to
complete the Armenian.

The corpus is made available under the Creative Commons
Attribution-Noncommercial-Share Alike 3.0 license.

More information about the PROIEL project is available at
www.hf.uio.no/ifikk/proiel

Dag Haug

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora