[Corpora-List] New Release of the TüBa-D/Z

Kathrin Beck kathrin.beck at uni-tuebingen.de
Mon Nov 30 17:18:23 UTC 2009


The Department of Linguistics at the University of Tuebingen, Germany
=====================================================================

is happy to announce a new release (version 5.9)

of the "Tübingen Treebank of Written German" (TüBa-D/Z)
=======================================================

The TueBa-D/Z treebank is a manually annotated German newspaper corpus 
based on data taken from daily issues of the newspaper 'die tageszeitung'. 
Apart from syntactic annotation, it also includes coreference annotation 
at the NP level.

It currently comprises:
     * 794,079 tokens
     * 45,200 sentences
     * 2,213 newspaper articles

The syntactic annotation scheme of the TueBa-D/Z distinguishes four 
levels of syntactic constituency:
     * the lexical level
     * the phrasal level
     * the level of topological fields
     * the clausal level

In addition to constituent structure, the annotated trees contain edge 
labels that encode grammatical functions.

All words are annotated with:
     * inflectional morphology at the lexical level
     * POS tags

All newspaper articles of the treebank have been enriched with anaphoric 
and coreference relations referring to nominal and pronominal 
antecedents. Linking relations include:
     * coreferential (two NPs refer to the same extralinguistic referent)
     * anaphoric/cataphoric (a definite pronoun refers to a contextual 
antecedent)
     * other relations (split antecedent, instance)

In addition, inherently reflexive pronouns and expletive pronouns are 
marked.

The treebank is available in three different formats:
     * NEGRA export format
     * XML format
     * Penn Treebank format

Joint syntactic and referential annotation is available in the Export 
and ExportXML formats.
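
For readers who want to inspect the export-format files directly, here is a minimal Python sketch of a reader. It assumes the whitespace-separated NEGRA export layout as we understand it: sentences delimited by `#BOS`/`#EOS` lines, terminal lines with word, POS tag, morphology, edge label and parent node number, nonterminal lines starting with `#` plus a node number, and parent `0` for the virtual root. Whether the distributed files carry an extra lemma column after the word (export format version 4) is an assumption to check against the file header; the hypothetical `lemma_column` flag covers that case. The names `read_export` and `ancestors` and the toy sentence are illustrative only and are not part of the TüBa-D/Z distribution.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class Token:
    word: str
    tag: str
    morph: str
    edge: str
    parent: int  # number of the dominating node; 0 = virtual root


@dataclass
class Node:
    cat: str
    edge: str
    parent: int


@dataclass
class Sentence:
    sent_id: str
    tokens: List[Token] = field(default_factory=list)
    nodes: Dict[int, Node] = field(default_factory=dict)


def read_export(lines: Iterable[str], lemma_column: bool = False) -> Iterator[Sentence]:
    """Yield sentences from export-format lines (column layout is an assumption)."""
    off = 1 if lemma_column else 0  # shift indices if a lemma column follows the word
    current = None
    for line in lines:
        if not line.strip() or line.startswith("%%"):
            continue  # skip blank lines and comment lines
        fields = line.split()  # columns may be separated by tabs or spaces
        if fields[0] == "#BOS":
            current = Sentence(sent_id=fields[1])
        elif fields[0] == "#EOS":
            if current is not None:
                yield current
            current = None
        elif current is None:
            continue  # header material outside any #BOS ... #EOS block
        elif fields[0].startswith("#") and fields[0][1:].isdigit():
            # nonterminal line: #<node number>, category, morph, edge, parent
            current.nodes[int(fields[0][1:])] = Node(
                cat=fields[1 + off], edge=fields[3 + off], parent=int(fields[4 + off])
            )
        else:
            # terminal line: word, tag, morph, edge, parent (extra columns ignored)
            current.tokens.append(
                Token(
                    word=fields[0],
                    tag=fields[1 + off],
                    morph=fields[2 + off],
                    edge=fields[3 + off],
                    parent=int(fields[4 + off]),
                )
            )


def ancestors(sentence: Sentence, token: Token) -> List[str]:
    """Categories of the nodes dominating a token, innermost first."""
    chain = []
    parent = token.parent
    while parent != 0:
        node = sentence.nodes[parent]
        chain.append(node.cat)
        parent = node.parent
    return chain


# Hypothetical toy sentence, not taken from the treebank.
SAMPLE = """\
#BOS 1
Das     ART    nsn   -    500
Haus    NN     nsn   HD   500
brennt  VVFIN  3sis  HD   501
.       $.     --    --   0
#500    NX     --    ON   502
#501    VXFIN  --    HD   503
#502    VF     --    -    504
#503    LK     --    -    504
#504    SIMPX  --    --   0
#EOS 1
"""

if __name__ == "__main__":
    for sent in read_export(SAMPLE.splitlines()):
        for tok in sent.tokens:
            print(tok.word, tok.tag, ancestors(sent, tok))
```

Run as a script, this prints each token with the categories above it, e.g. `Das ART ['NX', 'VF', 'SIMPX']`. The sketch keeps the flat parent-pointer representation of the export format rather than building tree objects, which is enough for simple counts and lookups.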

What is new in the fifth release:
     * about 9,000 additional sentences
     * about 500 more articles with referential annotation
     * cleaner versions of the trees published in the fourth release
     * the entire referential annotation has been checked and revised

The license for TueBa-D/Z is granted free of charge for academic use.
For more information, please refer to:
http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml


With best regards,

Prof. Dr. Erhard W. Hinrichs
Kathrin Beck
Heike Telljohann
Yannick Versley

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora