[Lexicog] German corpus linguistics

Hayim Sheynin hsheynin19444 at YAHOO.COM
Thu May 15 15:30:11 UTC 2008


I think some members of the Lexicography list will be interested in the following announcement, published today on the Linguist List:


The Division of Computational Linguistics at the Seminar fuer
Sprachwissenschaft of the University of Tuebingen (Germany) is happy to
announce the release of a referentially and syntactically annotated German
corpus:

* The Tuebingen Treebank of Written German (TueBa-D/Z) - fourth release

The TueBa-D/Z treebank is a manually annotated German newspaper corpus based
on data taken from daily issues of the newspaper 'die tageszeitung'. It
currently comprises approximately 36 000 sentences (ca. 640 000 words).

The syntactic annotation scheme of the TueBa-D/Z distinguishes four levels of
syntactic constituency: the lexical level, the phrasal level, the level of
topological fields, and the clausal level. In addition to constituent
structure, annotated trees contain edge labels between nodes which encode
grammatical functions. Words are annotated with inflectional morphology at the
lexical level.

The treebank is available in 3 different formats:
    * NEGRA export format
    * XML format
    * Penn Treebank format

Currently, about 36 000 sentences of the treebank (about 1 700 articles) have
been enriched with anaphoric and coreference relations referring to nominal
and pronominal antecedents. Linking relations include: coreferential (two NPs
refer to the same extralinguistic referent), anaphoric/cataphoric (a definite
pronoun refers to a contextual antecedent) and other relations
(split-antecedent, instance) as well as marking of expletive pronouns.

The referential annotation is available in a unified representation of
syntactic and referential information, in the NEGRA Export and XML formats.

What is new in the fourth release:

- about 9 000 additional sentences
- about 600 more articles with referential annotation
- cleaner versions of the trees published in the third release

The license for TueBa-D/Z is granted free of charge for scientific use.
For more information, please refer to:
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml

With best regards,

Erhard W. Hinrichs
Kathrin Beck
Yannick Versley
Holger Wunsch
Heike Zinsmeister 


Hayim Sheynin


Dr. Hayim Y. Sheynin
       