29.2717, FYI: Release 11.0 of the TüBa-D/Z German Treebank

The LINGUIST List linguist at listserv.linguistlist.org
Fri Jun 29 15:18:09 UTC 2018


LINGUIST List: Vol-29-2717. Fri Jun 29 2018. ISSN: 1069 - 4875.

Subject: 29.2717, FYI: Release 11.0 of the TüBa-D/Z German Treebank

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté)
Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================


Date: Fri, 29 Jun 2018 11:18:00
From: Marie Hinrichs [marie.hinrichs at uni-tuebingen.de]
Subject: Release 11.0 of the TüBa-D/Z German Treebank

 
The Tübingen Treebank of Written German (TüBa-D/Z) - Final Release 11.0 

The Department of Linguistics of the University of Tübingen (Germany) is
pleased to announce Release 11.0 of the TüBa-D/Z, a referentially and
syntactically annotated German corpus. In addition to the previously released
formats, this release also contains the treebank in an automatically converted
CoNLL-U format. This will be the FINAL release, although we would like to
manually correct the CoNLL-U trees if possible.

This final release is dedicated to and in memory of Dr. Heike Telljohann. The
high quality of the treebank is largely owed to her commitment to the project,
diligence, and attention to detail over many years.

The TüBa-D/Z treebank is a manually annotated German newspaper corpus based on
data taken from the daily issues of 'die tageszeitung'. It currently comprises
3,816 newspaper articles (104,787 sentences; 1,959,474 tokens).

The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of
syntactic constituency (lexical, phrasal, clausal, topological fields) and
contains the following annotation layers:

- Inflectional morphology 
- Lemmas 
- Syntactic constituency 
- Grammatical functions 
- (complex) named entities including semantic classification 
- Anaphora and coreference relations 
- Discourse connectives (explicit and implicit, partial coverage) 
- GermaNet word senses 
- Dependency relations (automatically created) 
- Chunk annotation (automatically created)

New in this Release:

- An additional 172 articles (9,192 sentences; 171,673 tokens) have been
annotated.  
- STYLEBOOK: The annotation stylebook has been updated and can be found on the
webpage.
- CoNLL-U format, automatically generated
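
For readers who want to check the sentence and token counts of a CoNLL-U
file, the following is a minimal sketch in Python. It relies only on the
general CoNLL-U conventions (blank lines separate sentences, lines starting
with '#' are comments, token lines are tab-separated with the ID in the first
column). The function name count_conllu and the two-sentence sample are made
up for illustration; the sample is not taken from the treebank, and the actual
file layout of the release is not assumed here.

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentence_count, word_count) for CoNLL-U formatted lines."""
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ...".
            continue
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "3.1").
            continue
        words += 1
        in_sentence = True
    if in_sentence:
        # The last sentence may not be followed by a blank line.
        sentences += 1
    return sentences, words


def row(idx: int, form: str, upos: str, head: int, deprel: str) -> str:
    """Build one 10-column CoNLL-U token line (hypothetical helper for the sample)."""
    return "\t".join(
        [str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
    )


# Hand-made sample, NOT taken from the TüBa-D/Z.
SAMPLE = "\n".join([
    "# text = Das ist gut .",
    row(1, "Das", "PRON", 3, "nsubj"),
    row(2, "ist", "AUX", 3, "cop"),
    row(3, "gut", "ADJ", 0, "root"),
    row(4, ".", "PUNCT", 3, "punct"),
    "",
    "# text = Wir lesen .",
    row(1, "Wir", "PRON", 2, "nsubj"),
    row(2, "lesen", "VERB", 0, "root"),
    row(3, ".", "PUNCT", 2, "punct"),
])

print(count_conllu(SAMPLE.splitlines()))  # (2, 7)
```

To run this on a file, one could pass an open file handle instead of the
sample, for example count_conllu(open(path, encoding="utf-8")), where path is
whatever location the CoNLL-U file has on the local system.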

The license for TüBa-D/Z is granted free of charge for scientific use. For
more information, please visit the website at:
http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html  

Best regards,

Erhard W. Hinrichs
Marie Hinrichs

Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19
72074 Tübingen
Germany
 



Linguistic Field(s): Computational Linguistics
                     Discourse Analysis
                     Morphology
                     Syntax
                     Text/Corpus Linguistics

Subject Language(s): German (deu)





 


