[Lexicog] German corpus linguistics
Hayim Sheynin
hsheynin19444 at YAHOO.COM
Thu May 15 15:30:11 UTC 2008
I think it may interest some members of the Lexicography list that the following ad was published on LINGUIST List today:
The Division of Computational Linguistics at the Seminar fuer Sprachwissenschaft
of the University of Tuebingen (Germany) is happy to announce the release of a
referentially and syntactically annotated German corpus:
* The Tuebingen Treebank of Written German (TueBa-D/Z) - fourth release
The TueBa-D/Z treebank is a manually annotated German newspaper corpus based on
data taken from the daily issues of 'die tageszeitung'. It currently comprises
approximately 36 000 sentences (ca. 640 000 words).
The syntactic annotation scheme of the TueBa-D/Z distinguishes four levels of
syntactic constituency: the lexical level, the phrasal level, the level of
topological fields, and the clausal level. In addition to constituent
structure, annotated trees contain edge labels between nodes which encode
grammatical functions. Words are annotated with inflectional morphology at the
lexical level.
The treebank is available in 3 different formats:
* NEGRA export format
* XML format
* Penn Treebank format
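For readers who have not worked with the Penn Treebank format before: each sentence is stored as a bracketed string in which every node is written as "(LABEL child child ...)" and words appear as bare leaves. The sketch below is a minimal, dependency-free Python reader for such bracketed trees. It is an illustration under stated assumptions, not a tool distributed with TueBa-D/Z: the functions `parse_tree` and `leaves` are hypothetical names, the example sentence and its labels are invented, and the code assumes that literal parentheses inside words have been escaped (for example as -LRB- / -RRB-), as is customary in Penn-style files.

```python
from typing import List, Tuple, Union

# A node is (label, children); a leaf (word) is a plain string.
Node = Tuple[str, list]
Child = Union[str, Node]


def _tokenize(text: str) -> List[str]:
    # Split a bracketed string into "(", ")" and label/word tokens.
    # Assumes words contain no literal parentheses (escaped as -LRB-/-RRB-).
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_tree(text: str) -> Node:
    """Parse one Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = _tokenize(text)
    pos = 0

    def node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""
        # The outermost wrapper "( (S ...) )" has no label; otherwise read it.
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children: List[Child] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def leaves(tree: Node) -> List[str]:
    """Return the words of a tree in left-to-right order."""
    label, children = tree
    words: List[str] = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


if __name__ == "__main__":
    # Invented example sentence; the labels are illustrative only.
    example = "( (S (NP (ART Die) (NN Zeitung)) (VVFIN erscheint) (ADV täglich)) )"
    print(leaves(parse_tree(example)))
```

Running the script prints the word sequence of the invented sentence. For real work with the released files, the NEGRA export and XML formats carry the referential annotation as well, which the Penn Treebank version does not.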
Currently, about 36 000 sentences of the treebank (about 1 700 articles) have
been enriched with anaphoric and coreference relations referring to nominal and
pronominal antecedents. Linking relations include: coreferential (two NPs refer
to the same extralinguistic referent), anaphoric/cataphoric (a definite pronoun
refers to a contextual antecedent), and other relations (split-antecedent,
instance), as well as marking of expletive pronouns.
The referential annotation is available in a unified representation of syntactic
and referential information, in the NEGRA Export and XML formats.
What is new in the fourth release:
- about 9 000 additional sentences
- about 600 more articles with referential annotation
- cleaner versions of the trees published in the third release
The license for TueBa-D/Z is granted free of charge for scientific use.
For more information, please refer to:
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml
With best regards,
Erhard W. Hinrichs
Kathrin Beck
Yannick Versley
Holger Wunsch
Heike Zinsmeister
Hayim Sheynin
Dr. Hayim Y. Sheynin