[Corpora-List] New Release of the TueBa-D/Z, a referentially and syntactically annotated German corpus

Kathrin Beck kathrin.beck at uni-tuebingen.de
Wed May 14 09:34:11 UTC 2008


The Division of Computational Linguistics at the Seminar fuer
Sprachwissenschaft of the University of Tuebingen (Germany) is happy
to announce the release of a referentially and syntactically annotated
German corpus:

* The Tuebingen Treebank of Written German (TueBa-D/Z) - fourth release

The TueBa-D/Z treebank is a manually annotated German newspaper corpus
based on data taken from daily issues of the newspaper 'die tageszeitung'.
It currently comprises approximately 36 000 sentences (ca. 640 000
words).

The syntactic annotation scheme of the TueBa-D/Z distinguishes four
levels of syntactic constituency: the lexical level, the phrasal
level, the level of topological fields, and the clausal level.
In addition to constituent structure, annotated trees contain edge
labels between nodes which encode grammatical functions.
Words are annotated with inflectional morphology at the lexical level.
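
As a rough illustration of the scheme described above, the following
Python sketch models a tree whose nodes belong to one of the four levels
and whose edges may carry a grammatical function. It is not the
treebank's own data model or API: the class `Node`, the helper functions,
and all category, function, and morphology labels in the example are
hypothetical placeholders chosen for readability, not values taken from
the TueBa-D/Z documentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four levels of constituency named in the announcement.
LEVELS = ("lexical", "phrasal", "topological", "clausal")


@dataclass
class Node:
    label: str                   # category or part-of-speech label (placeholder values)
    level: str                   # one of LEVELS
    edge: Optional[str] = None   # grammatical function on the edge to the parent
    word: Optional[str] = None   # surface form, set on lexical nodes only
    morph: Optional[str] = None  # inflectional morphology, lexical nodes only
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")


def words(node: Node) -> List[str]:
    """Return the words dominated by a node, in left-to-right order."""
    if node.word is not None:
        return [node.word]
    result: List[str] = []
    for child in node.children:
        result.extend(words(child))
    return result


def functions(node: Node) -> List[tuple]:
    """Collect (edge label, dominated words) for every labelled edge."""
    pairs = []
    if node.edge is not None:
        pairs.append((node.edge, " ".join(words(node))))
    for child in node.children:
        pairs.extend(functions(child))
    return pairs


def levels_used(node: Node) -> set:
    """Return the set of levels that occur in the tree below a node."""
    found = {node.level}
    for child in node.children:
        found |= levels_used(child)
    return found


# Invented two-word example sentence; every label is a placeholder.
tree = Node("CLAUSE", "clausal", children=[
    Node("FIELD-1", "topological", children=[
        Node("NP", "phrasal", edge="SUBJ", children=[
            Node("PRON", "lexical", word="Er", morph="3.sg.nom"),
        ]),
    ]),
    Node("FIELD-2", "topological", children=[
        Node("VP", "phrasal", edge="HEAD", children=[
            Node("VERB", "lexical", word="schläft", morph="3.sg.pres"),
        ]),
    ]),
])

print(words(tree))
print(functions(tree))
```

Storing the function label on the child node is only one possible design;
it keeps each edge label next to the constituent it describes.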

The treebank is available in three formats:
     * NEGRA export format
     * XML format
     * Penn Treebank format

Currently, about 36 000 sentences of the treebank (about 1 700
articles) have been enriched with anaphoric and coreference relations
referring to nominal and pronominal antecedents.
The linking relations are: coreferential (two NPs refer to the same
extralinguistic referent), anaphoric/cataphoric (a definite pronoun
refers to a contextual antecedent), and two further relations
(split-antecedent and instance). Expletive pronouns are marked as well.

The referential annotation is available in a unified representation of
syntactic and referential information, in the NEGRA Export and XML
formats.

What is new in the fourth release:

- about 9 000 additional sentences
- about 600 more articles with referential annotation
- cleaner versions of the trees published in the third release

The license for TueBa-D/Z is granted free of charge for scientific use.
For more information, please refer to:
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml

With best regards,

Erhard W. Hinrichs
Kathrin Beck
Yannick Versley
Holger Wunsch
Heike Zinsmeister



_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora