26.3634, FYI: Release 10.0 of the TüBa-D/Z German Treebank
The LINGUIST List via LINGUIST
linguist at listserv.linguistlist.org
Fri Aug 14 02:37:57 UTC 2015
LINGUIST List: Vol-26-3634. Thu Aug 13 2015. ISSN: 1069 - 4875.
Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org
Editor for this issue: Ashley Parker <ashley at linguistlist.org>
================================================================
Date: Thu, 13 Aug 2015 22:37:15
From: Marie Hinrichs [marie.hinrichs at uni-tuebingen.de]
Subject: Release 10.0 of the TüBa-D/Z German Treebank
The Department of Linguistics of the University of Tübingen (Germany) is pleased to announce Release 10.0 of the TüBa-D/Z, a referentially and syntactically annotated German corpus.
The TüBa-D/Z treebank is a manually annotated German newspaper corpus based on data taken from the daily issues of 'die tageszeitung.' It currently comprises 3,644 newspaper articles (95,595 sentences; 1,787,801 tokens).
The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of syntactic constituency (lexical, phrasal, clausal, topological fields) and contains the following annotation layers:
- inflectional morphology
- lemmas
- syntactic constituency
- grammatical functions
- (complex) named entities including semantic classification
- anaphora and coreference relations
- discourse connectives (explicit and implicit, partial coverage)
- GermaNet word senses
- dependency relations (automatically created)
- chunk annotation (automatically created)
New in this release:
- An additional 200 articles (10,237 sentences; 217,885 tokens) have been annotated.
- STYLEBOOK: The annotation stylebook has been updated and can be found on the webpage.
- Also included (since minor Release 9.1) are 17,910 manual annotations of a selected set of lemmas (30 nouns, 79 verbs) with their corresponding senses in the German wordnet GermaNet. They are intended as a gold standard for word sense disambiguation.
The license for the TüBa-D/Z is granted free of charge for scientific use. For more information, please visit the website at:
http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html
Best Regards,
Erhard W. Hinrichs
Heike Telljohann
Marie Hinrichs
--
Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19
72074 Tübingen
Germany
Linguistic Field(s): Computational Linguistics
                     Discourse Analysis
                     Morphology
                     Syntax
                     Text/Corpus Linguistics
Subject Language(s): German (deu)
Language Family(ies): Germanic