20.4320, FYI: New Release of TüBa-D/Z, Version 5.0

linguist at LINGUISTLIST.ORG
Tue Dec 15 18:28:49 UTC 2009


LINGUIST List: Vol-20-4320. Tue Dec 15 2009. ISSN: 1068 - 4875.

Subject: 20.4320, FYI: New Release of TüBa-D/Z, Version 5.0

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Monica Macaulay, U of Wisconsin-Madison  
Eric Raimy, U of Wisconsin-Madison  
Joseph Salmons, U of Wisconsin-Madison  
Anja Wanner, U of Wisconsin-Madison  
       <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Danielle St. Jean <danielle at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 30-Nov-2009
From: Kathrin Beck < kathrin.beck at uni-tuebingen.de >
Subject: New Release of TüBa-D/Z, Version 5.0
 

	
-------------------------Message 1 ---------------------------------- 
Date: Tue, 15 Dec 2009 13:26:44
From: Kathrin Beck [kathrin.beck at uni-tuebingen.de]
Subject: New Release of TüBa-D/Z, Version 5.0

  


The Department of Linguistics at the University of Tübingen, Germany is
happy to announce a new release (version 5.0) of the 'Tübingen Treebank of
Written German' (TüBa-D/Z).

The TüBa-D/Z treebank is a manually annotated German newspaper corpus
based on data taken from daily issues of the newspaper 'die tageszeitung'.
In addition to syntactic annotation, it includes coreference annotation at
the NP level.

It currently comprises:
- 794,079 tokens
- 45,200 sentences
- 2,213 newspaper articles

The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels
of syntactic constituency:
- the lexical level
- the phrasal level
- the level of topological fields
- the clausal level

In addition to constituent structure, annotated trees contain edge labels
with grammatical functions.

All words are annotated with:
- inflectional morphology at the lexical level
- POS tags

All newspaper articles of the treebank have been enriched with anaphoric
and coreference relations referring to nominal and pronominal antecedents.
Linking relations include:
- coreferential (two NPs refer to the same extralinguistic referent)
- anaphoric/cataphoric (a definite pronoun refers to a contextual
  antecedent)
- other relations (split-antecedent, instance)

Inherent reflexive pronouns and expletive pronouns are marked as well.

The treebank is available in three formats:
- NEGRA export format
- XML format
- Penn Treebank format

Joint syntactic and referential annotation is available in the Export
and ExportXML formats.
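
As a rough illustration of how the annotation levels described above might
look in a bracketed, Penn Treebank style file, here is a minimal Python
sketch. The sample sentence, its node and edge labels (SIMPX, VF, LK, MF,
NX-ON and so on) and the helper names parse_bracketed and leaves are
assumptions made for this example; they are not taken from the release, and
the actual files may differ in detail.

```python
def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Children are either nested tuples or plain word strings.
    Assumes balanced brackets and no parentheses inside words.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]          # node label, e.g. a category plus edge label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word at the lexical level
                pos += 1
        pos += 1                     # consume the closing ')'
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(tree):
    """Return (POS tag, word) pairs in sentence order."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, str):
            pairs.append((label, child))
        else:
            pairs.extend(leaves(child))
    return pairs


# Invented sample: clause > topological fields > phrases > tagged words.
SAMPLE = (
    "(SIMPX (VF (NX-ON (PPER Sie))) "
    "(LK (VXFIN-HD (VVFIN liest))) "
    "(MF (NX-OA (ART die) (NN Zeitung))))"
)

tree = parse_bracketed(SAMPLE)
tagged = leaves(tree)
fields = [child[0] for child in tree[1]]
```

In this sketch, tagged holds the lexical level (POS tag and word pairs) and
fields holds the labels of the clause's immediate children, which here stand
in for the topological fields.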

What is new in the fifth release:
- about 9,000 additional sentences
- about 500 more articles with referential annotation
- cleaner versions of the trees published in the fourth release
- the entire referential annotation has been checked and revised

The license for TüBa-D/Z is granted free of charge for academic use. For
more information, please refer to:
http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml

With best regards,

Prof. Dr. Erhard W. Hinrichs
Kathrin Beck
Heike Telljohann
Yannick Versley 



Linguistic Field(s): Computational Linguistics
                     Discourse Analysis
                     Syntax
                     Text/Corpus Linguistics





 




-----------------------------------------------------------
LINGUIST List: Vol-20-4320	

	


