29.2717, FYI: Release 11.0 of the TüBa-D/Z German Treebank
The LINGUIST List
linguist at listserv.linguistlist.org
Fri Jun 29 15:18:09 UTC 2018
LINGUIST List: Vol-29-2717. Fri Jun 29 2018. ISSN: 1069 - 4875.
Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté)
Homepage: http://linguistlist.org
Please support the LL editors and operation with a donation at:
http://funddrive.linguistlist.org/donate/
Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================
Date: Fri, 29 Jun 2018 11:18:00
From: Marie Hinrichs [marie.hinrichs at uni-tuebingen.de]
Subject: Release 11.0 of the TüBa-D/Z German Treebank
The Tübingen Treebank of Written German (TüBa-D/Z) - Final Release 11.0
The Department of Linguistics of the University of Tübingen (Germany) is
pleased to announce Release 11.0 of the TüBa-D/Z, a referentially and
syntactically annotated German corpus. In addition to the previously released
formats, this release also contains the treebank in an automatically converted
CoNLL-U format. This will be the FINAL release, although we hope to correct the
CoNLL-U trees manually if possible.
This final release is dedicated to the memory of Dr. Heike Telljohann. The
high quality of the treebank is owed largely to her commitment to the project,
her diligence, and her attention to detail over many years.
The TüBa-D/Z treebank is a manually annotated German newspaper corpus based on
data taken from the daily issues of 'die tageszeitung'. It currently comprises
3,816 newspaper articles (104,787 sentences; 1,959,474 tokens).
The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of
syntactic constituency (lexical, phrasal, clausal, topological fields) and
contains the following annotation layers:
- Inflectional morphology
- Lemmas
- Syntactic constituency
- Grammatical functions
- (Complex) named entities including semantic classification
- Anaphora and coreference relations
- Discourse connectives (explicit and implicit, partial coverage)
- GermaNet word senses
- Dependency relations (automatically created)
- Chunk annotation (automatically created)
New in this Release:
- An additional 172 articles (9,192 sentences; 171,673 tokens) have been
annotated.
- STYLEBOOK: The annotation stylebook has been updated and can be found on the
webpage.
- CoNLL-U format, automatically generated
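For readers unfamiliar with CoNLL-U, the following is a minimal Python sketch of how such a file can be read. It relies only on the general CoNLL-U conventions (ten tab-separated columns per token, comment lines starting with '#', a blank line between sentences). The function names and the sample sentence are made up for illustration; the sample is not taken from the TüBa-D/Z, and the release's actual file names and layout are not assumed here.

```python
from typing import Dict, Iterable, Iterator, List

# The ten standard CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata line; skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


def count_words(sentence: List[Dict[str, str]]) -> int:
    """Count syntactic words, skipping multiword ranges (1-2) and empty nodes (1.1)."""
    return sum(1 for tok in sentence if tok["id"].isdigit())


# Hypothetical sample sentence, invented for this example.
SAMPLE = (
    "# text = Das ist gut.\n"
    "1\tDas\tdas\tPRON\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tist\tsein\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\tgut\tgut\tADJ\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE.splitlines()):
        root = next(tok for tok in sent if tok["head"] == "0")
        print(count_words(sent), "words; root form:", root["form"])
```

To process a real file, the same function can be given an open file handle instead of the sample lines, since it accepts any iterable of strings.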
The license for the TüBa-D/Z is granted free of charge for scientific use. For
more information, please visit the website at:
http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html
Best regards,
Erhard W. Hinrichs
Marie Hinrichs
Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19
72074 Tübingen
Germany
Linguistic Field(s): Computational Linguistics
Discourse Analysis
Morphology
Syntax
Text/Corpus Linguistics
Subject Language(s): German (deu)