[Corpora-List] Announcement: MultiSemCor English/Italian parallel corpus

Luisa Bentivogli bentivo at itc.it
Fri Nov 5 18:17:06 UTC 2004


[Apologies to those of you who receive multiple copies of this
announcement]


We are pleased to announce the first release of the MultiSemCor corpus,
available for browsing and distribution at the web site:

http://multisemcor.itc.it

MultiSemCor is an English/Italian parallel corpus, developed at ITC-irst
by translating part of the SemCor corpus into Italian. English and
Italian texts have been automatically aligned at the word level and
SemCor semantic annotations have been transferred to Italian words. As a
result, MultiSemCor texts are semantically annotated with a shared
inventory of senses taken from the MultiWordNet lexical database
(http://multiwordnet.itc.it).

MultiSemCor currently consists of 116 English texts and their 116
Italian translations, for a total of about 500,000 tokens.

The parallel texts and their annotations can be freely consulted on the
Web through the MultiSemCor online interface, which serves as both a
bilingual semantic concordancer and a bitext browser. The MultiSemCor
and the MultiWordNet browsers are directly linked to each other.

Best regards,

The MultiSemCor Team

---
ITC-irst Centro per la Ricerca Scientifica e Tecnologica
Cognitive and Communication Technologies Division
Via Sommarive, 18  38050 Povo - Trento ITALY
http://tcc.itc.it/