Corpora: Corpora Q: Text length differences in parallel text

araceli.alonso at iula.upf.es
Mon Oct 15 13:47:45 UTC 2001


Dear Mr. Steinberger:

I am writing to you on behalf of Dr. Lluís de Yzaguirre of the Institute
for Applied Linguistics (Institut Universitari de Lingüística Aplicada)
at the University Pompeu Fabra, where we are working with parallel texts
in several languages (English, Spanish and Catalan).
At the moment we are developing a text alignment system. Most aligners
are based on statistics, and they usually run into problems when the
texts to be aligned are complex or not literally translated. Our system
also draws on corpus processing, that is, it does not rely on statistics
alone. If you are interested in the technique behind the system, you can
find more information at http://terminotica.upf.es/CREL/atenes.ps.
At the following address,
http://terminotica.upf.es/academic/ENES/Default.htm, you will also find
an example of aligned English-Spanish texts*. The texts are taken from
the book Capitalism, Socialism and Democracy by Joseph Alois Schumpeter
and its Spanish translation. The English text has 72,621 words and the
Spanish one has 93,858 words. This single sample is not representative,
but at the moment the system achieves 100% sentence alignment and 70%
lexical alignment.
The latest version of the tests we are running will be presented in
fifteen days at a Congress on Contrastive Linguistics in Santiago de
Compostela. If you are interested, we can send you the paper after the
congress.

If you need any more information, please do not hesitate to contact us.
Yours sincerely,

Araceli Alonso 
Institut Universitari de Lingüística Aplicada 

*Examples of aligned texts in other language pairs (English-Catalan and
Catalan-Spanish) are also available at the following address:
http://terminotica.upf.es/academic/


