[Corpora-List] New Release of the TüBa-D/Z
Kathrin Beck
kathrin.beck at uni-tuebingen.de
Mon Nov 30 17:18:23 UTC 2009
The Department of Linguistics at the University of Tuebingen, Germany
=====================================================================
is happy to announce a new release (version 5.9)
of the "Tübingen Treebank of Written German" (TüBa-D/Z)
=======================================================
The TueBa-D/Z treebank is a manually annotated German newspaper corpus
based on data taken from the daily issues of 'die tageszeitung'.
Apart from syntactic annotation, it also includes coreference annotation
at the NP level.
It currently comprises:
* 794,079 tokens
* 45,200 sentences
* 2,213 newspaper articles
The syntactic annotation scheme of the TueBa-D/Z distinguishes four
levels of syntactic constituency:
* the lexical level
* the phrasal level
* the level of topological fields
* the clausal level
In addition to constituent structure, annotated trees carry edge labels
that encode grammatical functions.
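To make the four levels concrete, here is a minimal sketch that reads one bracketed (Penn-style) tree and sorts its nodes into lexical, phrasal, topological-field and clausal levels. The example sentence and all label sets (SIMPX for clauses; VF, LK, MF etc. for fields; NX, VXFIN, ADVX for phrases; STTS-style POS tags) are hypothetical and only loosely modeled on TüBa-D/Z conventions. The sketch also assumes that the bracketed version keeps the field nodes and omits edge labels; consult the treebank's stylebook for the actual inventory.

```python
import re

def tokenize(s):
    # Split a bracketed tree into "(", ")" and label/word tokens.
    return re.findall(r"\(|\)|[^\s()]+", s)

def parse(tokens, i=0):
    # Parse "(LABEL child ...)" starting at tokens[i]; return (node, next index).
    # A node is (label, children); a child is a nested node or a word string.
    assert tokens[i] == "("
    label, i = tokens[i + 1], i + 2
    children = []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = parse(tokens, i)
            children.append(child)
        else:
            children.append(tokens[i])  # a word form
            i += 1
    return (label, children), i + 1

# Hypothetical label sets, loosely modeled on TüBa-D/Z conventions.
CLAUSES = {"SIMPX", "R-SIMPX"}
FIELDS = {"VF", "LK", "MF", "VC", "NF", "C"}

def levels(node, out=None):
    # Pre-order walk assigning each node label to one of the four levels.
    out = {} if out is None else out
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        level = "lexical"  # preterminal: POS tag directly over a word
    elif label in CLAUSES:
        level = "clausal"
    elif label in FIELDS:
        level = "field"
    else:
        level = "phrasal"
    out.setdefault(level, []).append(label)
    for child in children:
        if isinstance(child, tuple):
            levels(child, out)
    return out

# Hypothetical sentence: "Peter schläft heute" ("Peter is sleeping today").
TREE = "(SIMPX (VF (NX (NE Peter))) (LK (VXFIN (VVFIN schläft))) (MF (ADVX (ADV heute))))"
tree, _ = parse(tokenize(TREE))
print(levels(tree))
# {'clausal': ['SIMPX'], 'field': ['VF', 'LK', 'MF'],
#  'phrasal': ['NX', 'VXFIN', 'ADVX'], 'lexical': ['NE', 'VVFIN', 'ADV']}
```

Classifying by position (preterminal vs. inner node) plus label sets keeps the sketch short; a real reader would take the label inventory from the annotation guidelines rather than hard-coding it.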
All words are annotated with:
* inflectional morphology at the lexical level
* POS tags
All newspaper articles of the treebank have been enriched with anaphoric
and coreference relations referring to nominal and pronominal
antecedents. Linking relations include:
* coreferential (two NPs refer to the same extralinguistic referent)
* anaphoric/cataphoric (a definite pronoun refers to a contextual
antecedent)
* other relations (split antecedent, instance)
In addition, inherently reflexive pronouns and expletive pronouns are
marked.
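Pairwise identity links of the kind described above can be grouped into coreference chains by taking their transitive closure. The following is a minimal union-find sketch under two assumptions: mention IDs such as "s1:NP1" are hypothetical, and only identity-type links (coreferential, anaphoric) are passed in, since split-antecedent and instance relations do not express identity and should not be merged this way.

```python
from collections import defaultdict

def coreference_chains(links):
    """Group pairwise identity links into chains (equivalence classes).

    links: iterable of (mention_a, mention_b) pairs. Mention IDs are
    hypothetical; only identity-type links should be passed in.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in links:
        union(a, b)

    groups = defaultdict(list)
    for mention in list(parent):
        groups[find(mention)].append(mention)
    return sorted(sorted(g) for g in groups.values())

links = [("s1:NP1", "s3:NP2"), ("s3:NP2", "s4:PRO1"), ("s2:NP1", "s5:NP3")]
print(coreference_chains(links))
# [['s1:NP1', 's3:NP2', 's4:PRO1'], ['s2:NP1', 's5:NP3']]
```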
The treebank is available in three formats:
* NEGRA export format
* XML format
* Penn Treebank format
Joint syntactic and referential annotation is available in the Export
and ExportXML formats.
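As a quick sanity check against figures such as the token and sentence counts above, a NEGRA export file can be scanned line by line. This is a minimal sketch, not an official reader. It assumes that sentences are delimited by "#BOS"/"#EOS" lines, that each terminal occupies one line starting with its word form, that nonterminal lines start with "#" plus a node number (e.g. "#500"), and that "%%" lines are comments. The sample data is made up, and the column layout should be checked against the format description shipped with the release.

```python
import re

# Nonterminal node lines start with "#" plus a node number, e.g. "#500".
NODE_LINE = re.compile(r"#\d+\s")

def count_export(lines):
    """Count sentences and terminal tokens in NEGRA-export-style lines (sketch)."""
    sentences = tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#BOS"):
            in_sentence = True
            sentences += 1
        elif line.startswith("#EOS"):
            in_sentence = False
        elif (in_sentence and line.strip()
              and not line.startswith("%%")
              and not NODE_LINE.match(line)):
            tokens += 1  # a terminal line: word form in the first column
    return sentences, tokens

# Made-up sentence; assumed columns: word, POS, morphology, edge, parent.
SAMPLE = """#FORMAT 3
#BOS 1
Peter\tNE\t--\tHD\t500
schläft\tVVFIN\t--\tHD\t501
#500\tNX\t--\tON\t0
#501\tVXFIN\t--\tHD\t0
#EOS 1
"""

print(count_export(SAMPLE.splitlines()))  # (1, 2)
```

For a real file, passing an open file handle (opened with the encoding stated in the release documentation) instead of `SAMPLE.splitlines()` works the same way, since the function only iterates over lines.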
What is new in the fifth release:
* about 9,000 additional sentences
* about 500 more articles with referential annotation
* cleaner versions of the trees published in the fourth release
* the entire referential annotation has been checked and revised
The license for TueBa-D/Z is granted free of charge for academic use.
For more information, please refer to:
http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml
Best regards,
Prof. Dr. Erhard W. Hinrichs
Kathrin Beck
Heike Telljohann
Yannick Versley
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora