26.3634, FYI: Release 10.0 of the TüBa-D/Z German Treebank

The LINGUIST List via LINGUIST linguist at listserv.linguistlist.org
Fri Aug 14 02:37:57 UTC 2015


LINGUIST List: Vol-26-3634. Thu Aug 13 2015. ISSN: 1069 - 4875.

Subject: 26.3634, FYI:  Release 10.0 of the TüBa-D/Z German Treebank

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org

Editor for this issue: Ashley Parker <ashley at linguistlist.org>
================================================================


Date: Thu, 13 Aug 2015 22:37:15
From: Marie Hinrichs [marie.hinrichs at uni-tuebingen.de]
Subject: Release 10.0 of the TüBa-D/Z German Treebank

The Department of Linguistics of the University of Tübingen (Germany) is pleased to announce Release 10.0 of the TüBa-D/Z, a referentially and syntactically annotated German corpus.

The TüBa-D/Z treebank is a manually annotated German newspaper corpus based on data taken from the daily issues of 'die tageszeitung.' It currently comprises 3,644 newspaper articles (95,595 sentences; 1,787,801 tokens).

The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of syntactic constituency (lexical, phrasal, clausal, topological fields) and contains the following annotation layers:

- inflectional morphology 
- lemmas 
- syntactic constituency 
- grammatical functions 
- (complex) named entities including semantic classification 
- anaphora and coreference relations 
- discourse connectives (explicit and implicit, partial coverage) 
- GermaNet word senses 
- dependency relations (automatically created) 
- chunk annotation (automatically created)

New in this release:

- An additional 200 articles (10,237 sentences; 217,885 tokens) have been annotated. 
- The annotation stylebook has been updated and can be found on the webpage.
- Also included (since minor Release 9.1) are 17,910 manual annotations of a selected set of lemmas (30 nouns, 79 verbs) with their corresponding senses in the German wordnet GermaNet, intended as a gold standard for word sense disambiguation.

The license for the TüBa-D/Z is granted free of charge for scientific use. For more information, please visit the website at:
http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html 

Best Regards,
Erhard W. Hinrichs
Heike Telljohann
Marie Hinrichs
--
Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19
72074 Tübingen
Germany
 
Linguistic Field(s): Computational Linguistics
                     Discourse Analysis
                     Morphology
                     Syntax
                     Text/Corpus Linguistics

Subject Language(s): German (deu)

Language Family(ies): Germanic


