23.5334, FYI: New Release of the TüBa-D/Z German Treebank
linguist at linguistlist.org
Tue Dec 18 18:06:22 UTC 2012
LINGUIST List: Vol-23-5334. Tue Dec 18 2012. ISSN: 1069 - 4875.
Subject: 23.5334, FYI: New Release of the TüBa-D/Z German Treebank
Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Veronika Drake, U of Wisconsin Madison
Monica Macaulay, U of Wisconsin Madison
Rajiv Rao, U of Wisconsin Madison
Joseph Salmons, U of Wisconsin Madison
Anja Wanner, U of Wisconsin Madison
<reviews at linguistlist.org>
Homepage: http://linguistlist.org
Editor for this issue: Brent Miller <brent at linguistlist.org>
================================================================
Date: Tue, 18 Dec 2012 13:06:18
From: Kathrin Beck [kathrin.beck at uni-tuebingen.de]
Subject: New Release of the TüBa-D/Z German Treebank
The Department of Linguistics of the University of Tuebingen (Germany) is
happy to announce the new release of a referentially and syntactically
annotated German corpus:
* The Tuebingen Treebank of Written German (TüBa-D/Z) - 8th release
The TüBa-D/Z treebank is a manually annotated German newspaper corpus based
on data taken from daily issues of the newspaper 'die tageszeitung'. It currently
comprises approximately 75,000 sentences (ca. 1,300,000 words).
The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of
syntactic constituency: the lexical level, the phrasal level, the level of
topological fields, and the clausal level.
The treebank (about 3,200 newspaper articles) has been enriched with anaphoric
and coreference relations referring to nominal and pronominal antecedents.
Linking relations include coreferential (two NPs refer to the same
extralinguistic referent), anaphoric/cataphoric (a definite pronoun
refers to a contextual antecedent), and other relations (split-antecedent,
instance). Expletive pronouns are marked as well.
For selected discourse connectives, the instances occurring in the treebank
have been annotated with the discourse relation(s) conveyed by the connective
instance. Portions of the treebank have been sense-annotated for the
connectives 'nachdem' (298 instances), 'während' (531 instances), 'sobald' (28
instances), 'seitdem' (13 instances), 'als' (169 instances), 'aber' (161
instances), and 'bevor' (119 instances).
Another annotation layer contains structural information as well as implicit
discourse relations for a subcorpus of 41 annotated newspaper articles
(21,817 tokens) with 1,458 (explicit and implicit) discourse relations.
The annotation comprises information on
* inflectional morphology
* lemmas
* syntactic constituency
* grammatical functions
* (complex) named entities incl. semantic classification
* anaphora and coreference relations
* dependency relations (automatically created)
* chunk annotation (automatically created)
The treebank is available in 5 different formats:
* NEGRA export format
* XML format (TigerXML and exportXML)
* Penn Treebank format
* CoNLL format
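
As a rough illustration of working with the CoNLL format, the Python sketch
below groups tab-separated token lines into sentences. It assumes the common
ten-column CoNLL-X layout; the exact columns used in the TüBa-D/Z release are
not stated here and should be checked against the release documentation. The
function name read_conll and the sample sentence are invented for this
example and are not taken from the treebank.

```python
from typing import Dict, Iterable, Iterator, List

# Assumed column names (CoNLL-X shared-task layout). The actual TüBa-D/Z
# CoNLL files may use a different column set.
CONLL_X_FIELDS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_conll(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Sentences are separated by blank lines; each token line holds
    tab-separated columns.
    """
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        sentence.append(dict(zip(CONLL_X_FIELDS, columns)))
    # Emit the last sentence if the input does not end with a blank line.
    if sentence:
        yield sentence


# Invented sample rows (not from the treebank); "_" marks unfilled columns.
SAMPLE_ROWS = [
    ["1", "Das", "_", "_", "PDS", "_", "2", "_", "_", "_"],
    ["2", "ist", "_", "_", "VAFIN", "_", "0", "_", "_", "_"],
    ["3", "gut", "_", "_", "ADJD", "_", "2", "_", "_", "_"],
    ["4", ".", "_", "_", "$.", "_", "2", "_", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS) + "\n\n"

sentences = list(read_conll(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["id"], token["form"], token["postag"], token["head"])
```

To read a real file, the same function can be given an open file handle, for
example read_conll(open(path, encoding="utf-8")), where path is a placeholder
for the location of a CoNLL file from the distribution.
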
The license for TüBa-D/Z is granted free of charge for scientific use.
For more information, please refer to:
http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tuebadz.html
With best regards,
Erhard W. Hinrichs
Kathrin Beck
Heike Telljohann
Yannick Versley
---
Kathrin Beck
Project Coordinator D-SPIN & CLARIN-D
Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19/ 2.31
72074 Tübingen
Germany
Tel.: +49-7071-29-73970
Fax: +49-7071-29-5214
E-Mail: kbeck at sfs.uni-tuebingen.de,
kathrin.beck at uni-tuebingen.de
Linguistic Field(s): Computational Linguistics
Discourse Analysis
Morphology
Syntax
Text/Corpus Linguistics