20.4320, FYI: New Release of TüBa-D/Z, Version 5.0
linguist at LINGUISTLIST.ORG
Tue Dec 15 18:28:49 UTC 2009
LINGUIST List: Vol-20-4320. Tue Dec 15 2009. ISSN: 1068 - 4875.
Subject: 20.4320, FYI: New Release of TüBa-D/Z, Version 5.0
Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Monica Macaulay, U of Wisconsin-Madison
Eric Raimy, U of Wisconsin-Madison
Joseph Salmons, U of Wisconsin-Madison
Anja Wanner, U of Wisconsin-Madison
<reviews at linguistlist.org>
Homepage: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University,
and donations from subscribers and publishers.
Editor for this issue: Danielle St. Jean <danielle at linguistlist.org>
================================================================
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
===========================Directory==============================
1)
Date: 30-Nov-2009
From: Kathrin Beck < kathrin.beck at uni-tuebingen.de >
Subject: New Release of TüBa-D/Z, Version 5.0
-------------------------Message 1 ----------------------------------
Date: Tue, 15 Dec 2009 13:26:44
From: Kathrin Beck [kathrin.beck at uni-tuebingen.de]
Subject: New Release of TüBa-D/Z, Version 5.0
The Department of Linguistics at the University of Tübingen, Germany is
happy to announce a new release (version 5.0) of the 'Tübingen Treebank of
Written German' (TüBa-D/Z).
The TüBa-D/Z treebank is a manually annotated German newspaper corpus
based on data taken from daily issues of the newspaper 'die tageszeitung'.
Apart from syntactic annotation, it also includes coreference annotation at
the NP level.
It currently comprises:
- 794,079 tokens
- 45,200 sentences
- 2,213 newspaper articles
The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels
of syntactic constituency:
- the lexical level
- the phrasal level
- the level of topological fields
- the clausal level
In addition to constituent structure, annotated trees contain edge labels
with grammatical functions.
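
As a rough illustration of how the four levels and the edge labels nest, the
sketch below parses a small bracketed tree in Python. The sample sentence,
the tree, and the "CATEGORY-FUNCTION" label spelling are assumptions made up
for this example; they are not taken from the treebank files, whose exact
notation may differ. The helper names (parse, words, functions) are likewise
hypothetical.

```python
import re

# Illustrative tree only (an assumption, not treebank data):
#   clausal level:      SIMPX
#   topological fields: VF, LK, MF
#   phrasal level:      NX, VXFIN (with assumed edge labels ON, HD, OA)
#   lexical level:      POS tags PPER, VVFIN, ART, NN
SAMPLE = (
    "(SIMPX (VF (NX-ON (PPER Sie))) "
    "(LK (VXFIN-HD (VVFIN liest))) "
    "(MF (NX-OA (ART die) (NN Zeitung))))"
)


def parse(text):
    """Parse one bracketed tree into nested dicts."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        # Split an assumed "CATEGORY-FUNCTION" label into its two parts.
        category, _, function = label.partition("-")
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word at the lexical level
                pos += 1
        pos += 1  # consume the closing bracket
        return {"cat": category, "func": function or None, "children": children}

    return node()


def words(tree):
    """Return the words of the tree in sentence order."""
    out = []
    for child in tree["children"]:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(words(child))
    return out


def functions(tree):
    """Return (category, function) pairs for nodes that carry an edge label."""
    out = [(tree["cat"], tree["func"])] if tree["func"] else []
    for child in tree["children"]:
        if not isinstance(child, str):
            out.extend(functions(child))
    return out


tree = parse(SAMPLE)
print(words(tree))
print(functions(tree))
```

Read from the outside in, the nesting goes clause, topological field, phrase,
word, which mirrors the four levels listed above.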
All words are annotated with:
- inflectional morphology at the lexical level
- POS tags
All newspaper articles of the treebank have been enriched with anaphoric
and coreference relations referring to nominal and pronominal antecedents.
Linking relations include:
- coreferential (two NPs refer to the same extralinguistic referent)
- anaphoric/cataphoric (a definite pronoun refers to a contextual
antecedent)
- other relations (split-antecedent, instance)
Inherent reflexive pronouns and expletive pronouns are marked as well.
The treebank is available in three different formats:
- NEGRA export format
- XML format
- Penn Treebank format
Joint syntactic and referential annotation is available in the Export
and ExportXML formats.
What is new in the fifth release:
- about 9,000 additional sentences
- about 500 more articles with referential annotation
- cleaner versions of the trees published in the fourth release
- the entire referential annotation has been checked and revised
The license for TüBa-D/Z is granted free of charge for academic use. For
more information, please refer to:
http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml
With best regards,
Prof. Dr. Erhard W. Hinrichs
Kathrin Beck
Heike Telljohann
Yannick Versley
Linguistic Field(s): Computational Linguistics
Discourse Analysis
Syntax
Text/Corpus Linguistics