Corpora: Portuguese treebank

Eckhard Bick lineb at hum.au.dk
Thu Jan 10 11:52:59 UTC 2002


Hello everybody

We would like to announce the conclusion of the first one-year phase of
the Portuguese treebank project, "Floresta Sintá(c)tica". In addition to
the project description, documentation, category definitions etc., a sampler
of ca. 1000 running-text sentences (European Portuguese) is available:

- for download and searching at
http://cgi.portugues.mct.pt/treebank/PaginaFloresta.html
- for graphical tree inspection/manipulation at http://visl.sdu.dk

The sampler is a manually revised part of a larger tree corpus (1
million words), which was automatically annotated with the Constraint
Grammar-based PALAVRAS parser and then converted into constituent trees.
This full version can also be searched.

The project is a joint venture of the VISL project (Southern Denmark
University) and the project "Computational Processing of Portuguese".
The Floresta team would welcome all kinds of feedback and suggestions,
sent to the list floresta at corpora.portugues.mct.pt


Best regards,

        Susana Afonso
        Eckhard Bick
        Renato Haber
        Raquel Marchi
        Diana Santos


--
Eckhard Bick,
cand.med., dr.phil.
SDU-Odense University, Denmark
e-mail: lineb at hum.au.dk
web: http://visl.sdu.dk


