19.1565, FYI: New Release of the TueBa-D/Z German Treebank

LINGUIST Network linguist at LINGUISTLIST.ORG
Thu May 15 14:11:43 UTC 2008


LINGUIST List: Vol-19-1565. Thu May 15 2008. ISSN: 1068 - 4875.

Subject: 19.1565, FYI: New Release of the TueBa-D/Z German Treebank

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Ann Sawyer <sawyer at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 14-May-2008
From: Kathrin Beck < kathrin.beck at uni-tuebingen.de >
Subject: New Release of the TueBa-D/Z German Treebank

 

	
-------------------------Message 1 ---------------------------------- 
Date: Thu, 15 May 2008 10:08:36
From: Kathrin Beck [kathrin.beck at uni-tuebingen.de]
Subject: New Release of the TueBa-D/Z German Treebank

The Division of Computational Linguistics at the Seminar fuer Sprachwissenschaft
of the University of Tuebingen (Germany) is happy to announce the release of a
referentially and syntactically annotated German corpus:

* The Tuebingen Treebank of Written German (TueBa-D/Z) - fourth release

The TueBa-D/Z treebank is a manually annotated German newspaper corpus based on
data taken from daily issues of the newspaper 'die tageszeitung'. It currently
comprises approximately 36 000 sentences (ca. 640 000 words).

The syntactic annotation scheme of the TueBa-D/Z distinguishes four levels of
syntactic constituency: the lexical level, the phrasal level, the level of
topological fields, and the clausal level.  In addition to constituent
structure, annotated trees contain edge labels between nodes which encode
grammatical functions. Words are annotated with inflectional morphology at the
lexical level.
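
As a rough illustration of the scheme described above, the following Python
sketch models a tree with the four constituency levels, edge labels for
grammatical functions, and a morphology slot on lexical nodes. It is not the
treebank's own data model or file format: the class name Node, the helper
functions(), the category and edge labels, and the morphology strings are all
assumptions chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The four constituency levels named in the announcement.
LEVELS = ("lexical", "phrasal", "field", "clausal")


@dataclass
class Node:
    """Hypothetical tree node; labels used below are illustrative only."""
    label: str                    # category or part-of-speech tag
    level: str                    # one of LEVELS
    edge: Optional[str] = None    # grammatical function on the edge to the parent
    word: Optional[str] = None    # surface form, set only on lexical nodes
    morph: Optional[str] = None   # inflectional morphology, lexical nodes only
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")

    def words(self) -> List[str]:
        """Return the surface words covered by this node, left to right."""
        if self.word is not None:
            return [self.word]
        covered: List[str] = []
        for child in self.children:
            covered.extend(child.words())
        return covered


def functions(node: Node) -> List[Tuple[str, str]]:
    """Collect (edge label, text) for every phrasal node that has an edge label."""
    found: List[Tuple[str, str]] = []
    if node.level == "phrasal" and node.edge is not None:
        found.append((node.edge, " ".join(node.words())))
    for child in node.children:
        found.extend(functions(child))
    return found


# Toy sentence "Der Hund schlaeft"; every label here is an assumption.
tree = Node("SIMPX", "clausal", children=[
    Node("VF", "field", children=[
        Node("NX", "phrasal", edge="ON", children=[
            Node("ART", "lexical", word="Der", morph="nsm"),
            Node("NN", "lexical", edge="HD", word="Hund", morph="nsm"),
        ]),
    ]),
    Node("LK", "field", children=[
        Node("VXFIN", "phrasal", edge="HD", children=[
            Node("VVFIN", "lexical", edge="HD", word="schlaeft", morph="3sis"),
        ]),
    ]),
])

print(functions(tree))
```

Keeping the level as an explicit attribute, rather than inferring it from the
label, makes it easy to query one layer at a time, for example only the
topological fields or only the phrases that carry a grammatical function.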

The treebank is available in 3 different formats:
    * NEGRA export format
    * XML format
    * Penn Treebank format

Currently, about 36 000 sentences of the treebank (about 1 700 articles) have
been enriched with anaphoric and coreference relations referring to nominal and
pronominal antecedents. Linking relations include coreferential (two NPs refer
to the same extralinguistic referent), anaphoric/cataphoric (a definite pronoun
refers to a contextual antecedent), and other relations (split-antecedent,
instance). Expletive pronouns are marked as well.

The referential annotation is available in a unified representation of syntactic
and referential information, in the NEGRA Export and XML formats.

What is new in the fourth release:

- about 9 000 additional sentences
- about 600 more articles with referential annotation
- cleaner versions of the trees published in the third release

The license for TueBa-D/Z is granted free of charge for scientific use.
For more information, please refer to:
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml

With best regards,

Erhard W. Hinrichs
Kathrin Beck
Yannick Versley
Holger Wunsch
Heike Zinsmeister 



Linguistic Field(s): Computational Linguistics
                     Discourse Analysis
                     Syntax
                     Text/Corpus Linguistics

Subject Language(s): German, Standard (deu)






