[Corpora-List] Release 2.0 of SMULTRON

Torsten Marek marek at ifi.uzh.ch
Wed Dec 2 14:15:42 UTC 2009


Dear all,

The Parallel Treebank Group at the Institute of Computational
Linguistics at the University of Zürich is proud to announce the
availability of a new release of SMULTRON, an aligned parallel treebank.

SMULTRON (Stockholm MULtilingual TReebank) is a parallel treebank which 
contains around 1500 sentences in English, German and Swedish. The
sentences have been PoS-tagged and annotated with phrase structure
trees. The trees have been aligned across languages at the sentence,
phrase and word levels. Additionally, the German and Swedish monolingual
treebanks contain lemma information. The SMULTRON corpus is freely
available for research purposes; please see the registration page[0].

Changes in version 2.0:

* Updated Economy-DE/EN corpora
  - errors in syntactic structure fixed
  - added alignments for sentences that were previously
    impossible to annotate due to technical restrictions
  - fixed various other alignment errors

* New corpus: DVD Manual Text
  - English, German and Swedish (~500 sentences each)
  - alignments for DE-EN and EN-SV language pairs


For viewing and searching through the treebanks in this release of
SMULTRON, you should use the latest version (1.2) of the TreeAligner,
our tool for annotating, browsing and searching parallel treebanks[1].



With best regards,


Torsten


[0] http://www.cl.uzh.ch/kitt/smultron/
[1] http://www.cl.uzh.ch/kitt/treealigner/wiki/TreeAlignerDownload


-- 
.: Torsten Marek
.: University of Zurich
.: Institute of Computational Linguistics
.: http://www.cl.uzh.ch/en/tmarek.html



