Corpora: COMPARA: the Portuguese-English parallel corpus

Mon Jan 8 15:21:01 UTC 2001

We are pleased to announce that COMPARA, the Portuguese-English parallel
corpus, is now available at http://www.portugues.mct.pt/COMPARA/Welcome.html

COMPARA is open-ended, freely available on the Web, and designed both for
people who have never used corpora before and for experienced corpus users.
COMPARA's criteria for text alignment allow corpus users to investigate
translational discourse changes, showing when and where translators have
chosen to join, separate, delete, add or reorder sentences. Users can also
inspect translators' notes, and the corpus admits more than one translation
per source text.
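
For readers new to aligned corpora, the kinds of change listed above can be
pictured with a small Python sketch. It is purely illustrative and is not
part of COMPARA or DISPARA: the representation of an alignment unit as a
pair of sentence lists, the function name classify_unit, the labels and the
example sentences are all assumptions made for this example. Reordering is
not covered, since detecting it would need sentence positions as well.

```python
from collections import Counter


def classify_unit(source_sentences, target_sentences):
    """Label one alignment unit by how many sentences sit on each side.

    Hypothetical helper: the unit format and labels are assumptions,
    not COMPARA's actual encoding.
    """
    n_src, n_tgt = len(source_sentences), len(target_sentences)
    if n_src == 0 and n_tgt == 0:
        raise ValueError("an alignment unit needs at least one sentence")
    if n_tgt == 0:
        return "deleted"        # source sentence(s) with no translation
    if n_src == 0:
        return "added"          # translation with no source sentence
    if n_src == 1 and n_tgt == 1:
        return "one-to-one"
    if n_src > 1 and n_tgt == 1:
        return "joined"         # several source sentences, one target
    if n_src == 1 and n_tgt > 1:
        return "separated"      # one source sentence, several targets
    return "many-to-many"


# Invented example units (not taken from the corpus).
units = [
    (["Chovia."], ["It was raining."]),
    (["Ela saiu.", "Ele ficou."], ["She left and he stayed."]),
    (["Era tarde e todos dormiam."], ["It was late.", "Everyone slept."]),
    (["Enfim."], []),
    ([], ["He sighed."]),
]

counts = Counter(classify_unit(src, tgt) for src, tgt in units)
for label, n in sorted(counts.items()):
    print(label, n)
```

Counting labels in this way gives a rough profile of how often a translator
departed from sentence-by-sentence translation in a given text.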

Only six parallel fiction texts have been fully processed so far, but
permission has been obtained to incorporate many more.

COMPARA is encoded using the IMS Corpus Workbench system, developed
at the University of Stuttgart, and is distributed on the WWW via the
DISPARA interface, developed by the Computational Processing of Portuguese
project.

We would be grateful for any comments and suggestions regarding both COMPARA and
the DISPARA interface. We also welcome the collaboration of all those
interested in contributing to the COMPARA-DISPARA project.

Ana Frankenberg-Garcia & Diana Santos
compara at informatics.sintef.no