[Corpora-List] 2 new treebanks at University of Tübingen

Sandra Kübler kuebler at sfs.uni-tuebingen.de
Thu Dec 18 16:16:16 UTC 2003


The Division of Computational Linguistics at the Seminar fuer
Sprachwissenschaft of the University of Tuebingen (Germany) is happy
to announce the release of two new German language resources:

1.  The Tuebingen Treebank of Written German (TueBa-D/Z)

The TueBa-D/Z treebank is a manually annotated German newspaper corpus
based on data taken from the daily issues of 'die tageszeitung'
(taz) from May 3rd to May 7th, 1999. The annotation
scheme distinguishes four levels of syntactic constituency: the
lexical level, the phrasal level, the level of topological fields, and
the clausal level.  In addition to constituent structure, annotated
trees contain edge labels between nodes, which encode grammatical
functions.
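
As a rough illustration of the scheme described above, the following
Python sketch models a tree whose nodes belong to one of the four
levels and whose edges may carry a grammatical function. It is only a
sketch under stated assumptions: the class, the helper functions, the
example sentence, and all label strings (CLAUSE, SUBJECT, and so on)
are hypothetical placeholders, not the treebank's actual tag set or
export format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# The four levels of syntactic constituency named in the announcement.
LEVELS = ("lexical", "phrasal", "field", "clausal")


@dataclass
class Node:
    label: str                   # node label (hypothetical placeholder names)
    level: str                   # one of LEVELS
    edge: Optional[str] = None   # edge label to the parent: grammatical function
    word: Optional[str] = None   # surface form, set on lexical nodes only
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")


def words(node: Node) -> List[str]:
    """Return the words covered by a node, in left-to-right order."""
    if node.word is not None:
        return [node.word]
    result: List[str] = []
    for child in node.children:
        result.extend(words(child))
    return result


def functions(node: Node) -> Iterator[Tuple[str, str]]:
    """Yield (edge label, covered text) for every labelled edge, top-down."""
    if node.edge is not None:
        yield node.edge, " ".join(words(node))
    for child in node.children:
        yield from functions(child)


def leaf(label: str, word: str) -> Node:
    """Build a lexical-level node for a single word."""
    return Node(label, "lexical", word=word)


# Invented example sentence "Peter liest Bücher"; all labels are assumptions.
tree = Node("CLAUSE", "clausal", children=[
    Node("INITIAL_FIELD", "field", children=[
        Node("NOUN_PHRASE", "phrasal", edge="SUBJECT",
             children=[leaf("NOUN", "Peter")]),
    ]),
    Node("LEFT_BRACKET", "field", children=[
        Node("VERB_PHRASE", "phrasal", edge="HEAD",
             children=[leaf("VERB", "liest")]),
    ]),
    Node("MIDDLE_FIELD", "field", children=[
        Node("NOUN_PHRASE", "phrasal", edge="OBJECT",
             children=[leaf("NOUN", "Bücher")]),
    ]),
])

print(list(functions(tree)))
```

Keeping the grammatical function on the edge rather than in the node
label means the same phrase type can appear with different functions
without multiplying the label set.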

The treebank currently comprises approximately 15 000 sentences
(ca. 260 000 words).

The license for TueBa-D/Z is granted free of charge for scientific
use.  For more information, please refer to:
http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml


2. The Tuebingen Partially Parsed Corpus of Written German (TuePP-D/Z)

TuePP-D/Z is a collection of articles from the taz newspaper which have
been automatically annotated with clause structure, topological
fields, and chunks, in addition to lower-level annotation including
parts of speech and morphological ambiguity classes. All texts are
processed automatically, starting from paragraph, sentence and token
segmentation. Tokens include information about some regular types of
named entities, including dates, telephone numbers, and number/unit
combinations.
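
To give an idea of what typing tokens as dates, telephone numbers, or
number/unit combinations could look like, here is a minimal Python
sketch based on regular expressions. The patterns, the type names, and
the function token_type are assumptions made for illustration only;
they do not reproduce the method or the categories actually used for
TuePP-D/Z.

```python
import re

# Hypothetical patterns for a few regular token types; real coverage
# would need far more variants than these.
PATTERNS = [
    ("date", re.compile(r"\d{1,2}\.\d{1,2}\.\d{2,4}")),
    ("number_unit", re.compile(r"\d+(?:,\d+)?\s?(?:km|kg|m|DM|%)")),
    ("telephone", re.compile(r"\(?0\d{2,5}\)?[ /-]\d{3,}")),
]


def token_type(token: str) -> str:
    """Return the first type whose pattern matches the whole token."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "other"


for example in ["7.5.1999", "200 km", "030/2590", "Zeitung"]:
    print(example, token_type(example))
```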

The TuePP-D/Z data are based on taz newspaper articles from September
2, 1986 up to May 7, 1999, and comprise more than 200 million word
tokens.

The license for TuePP-D/Z is granted at a nominal fee (covering cost
of DVD and postage) for scientific use.  For more information, please
refer to: http://www.sfs.uni-tuebingen.de/en_tuepp.shtml


********************************************************************

We invite you to visit our web site and browse the resources and tools
of the SfS:

http://www.sfs.uni-tuebingen.de/en_nf_asc_resources.shtml


