17.2061, FYI: German Treebank - Now with Anaphora
    linguist at LINGUISTLIST.ORG 
       
    Fri Jul 14 19:44:04 UTC 2006
    
    
  
LINGUIST List: Vol-17-2061. Fri Jul 14 2006. ISSN: 1068 - 4875.
Subject: 17.2061, FYI: German Treebank - Now with Anaphora
Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews (reviews at linguistlist.org) 
        Laura Welcher, Rosetta Project / Long Now Foundation  
Homepage: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.
Editor for this issue: Kevin Burrows <kevin at linguistlist.org>
================================================================  
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
===========================Directory==============================  
1)
Date: 14-Jul-2006
From: Yannick Versley < versley at sfs.uni-tuebingen.de >
Subject: German Treebank - Now with Anaphora 
	
-------------------------Message 1 ---------------------------------- 
Date: Fri, 14 Jul 2006 15:40:30
From: Yannick Versley < versley at sfs.uni-tuebingen.de >
Subject: German Treebank - Now with Anaphora 
 
The Division of Computational Linguistics at the Seminar fuer 
Sprachwissenschaft of the University of Tuebingen (Germany) is happy to
announce the release of a referentially and syntactically annotated German corpus:
- The Tuebingen Treebank of Written German (TueBa-D/Z) - third release
The TueBa-D/Z treebank is a manually annotated German newspaper
corpus based on data taken from the daily issues of 'die tageszeitung'.
It currently comprises approximately 27 000 sentences (ca. 470 000 words).
The syntactic annotation scheme of the TueBa-D/Z distinguishes four levels
of syntactic constituency: the lexical level, the phrasal level,
the level of topological fields, and the clausal level.
In addition to constituent structure, annotated trees contain edge labels
between nodes which encode grammatical functions.
Words are annotated with inflectional morphology at the lexical level
(currently ca. 80% of the sentences are covered).
The treebank is available in 3 different formats:
   - NEGRA export format
   - XML format
   - Penn Treebank format
Currently, about 23 500 sentences of the treebank (about 1 100 articles) have 
been enriched with anaphoric and coreference relations referring to nominal 
and pronominal antecedents.
Linking relations include: coreferential (two NPs refer to the same 
extralinguistic referent), anaphoric/cataphoric (a definite pronoun refers to 
a contextual antecedent) and other relations (split-antecedent, instance) as 
well as marking of expletive pronouns.
The referential annotation is available either as a stand-alone version in 
the PALinkA format or in a unified representation of syntactic and 
referential information, in the NEGRA Export and XML formats.
What is new in the third release:
- about 5 000 additional sentences
- referential annotation
- cleaner versions of the trees published in the first/second release
The license for TueBa-D/Z is granted free of charge for scientific use.
For more information, please refer to:
http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml
With best regards,
Erhard W. Hinrichs
Sandra Kübler
Heike Zinsmeister
Karin Naumann
Holger Wunsch
Yannick Versley 
Linguistic Field(s): Computational Linguistics
                     Syntax
                     Text/Corpus Linguistics
 
-----------------------------------------------------------
LINGUIST List: Vol-17-2061	
	
    
    