[Corpora-List] starting a machine translation project

Felipe Sánchez Martínez fsanchez at dlsi.ua.es
Wed Sep 13 10:01:08 UTC 2006


Hi,

The Transducens Group at the University of Alicante, under the
supervision of Mikel L. Forcada, has been working on an open-source MT
engine for related languages (such as Portuguese and Spanish) called
Apertium (http://apertium.sourceforge.net). Right now we are working to
enhance the MT architecture so that it can deal with less closely
related language pairs such as Catalan<->English or Spanish<->English.

Apertium is a rule-based MT system, so it does not need parallel
corpora. Instead, it needs monolingual and bilingual dictionaries and
structural transfer rules.
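
To give a rough idea of how these three resources fit together, here is
a toy sketch in Python. It is only an illustration: the dictionary
layout, the function names (analyse, transfer, translate) and the single
reordering rule are assumptions made up for this example, and they are
not Apertium's actual data formats or API.

```python
# Toy illustration only; NOT Apertium's real data formats or API.
# All names below are hypothetical and chosen for this example.

# Source-language monolingual dictionary: surface form -> (lemma, part of speech)
SOURCE_DICT = {
    "la": ("el", "det"),
    "casa": ("casa", "n"),
    "blanca": ("blanco", "adj"),
}

# Bilingual dictionary: source lemma -> target lemma
BILINGUAL_DICT = {"el": "the", "casa": "house", "blanco": "white"}


def analyse(sentence):
    """Look up every word in the monolingual dictionary."""
    return [SOURCE_DICT[word] for word in sentence.lower().split()]


def transfer(units):
    """Apply one structural transfer rule: noun + adjective -> adjective + noun."""
    out = []
    i = 0
    while i < len(units):
        is_noun_adj = (
            i + 1 < len(units)
            and units[i][1] == "n"
            and units[i + 1][1] == "adj"
        )
        if is_noun_adj:
            out.extend([units[i + 1], units[i]])  # swap the pair
            i += 2
        else:
            out.append(units[i])
            i += 1
    return out


def translate(sentence):
    """Analyse, reorder, then replace each lemma via the bilingual dictionary."""
    units = transfer(analyse(sentence))
    return " ".join(BILINGUAL_DICT[lemma] for lemma, _ in units)


print(translate("la casa blanca"))
```

A real system also needs morphological generation on the target side and
part-of-speech disambiguation, both of which this sketch leaves out.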

Perhaps you could start with Apertium as it is: by the end of this
year a more powerful engine will be available, and it will be
compatible with the data you develop for the current version of
Apertium.

Please feel free to contact my thesis advisor (Mikel L. Forcada,
mlf at dlsi.ua.es); we are interested in Indonesian-to-English and
Indonesian-to-Malay.

Regards,

-- 
Felipe Sánchez Martínez
-------------------------------------------------------------------
Departamento de Lenguajes       E-mail: fsanchez at dlsi.ua.es
y Sistemas Informáticos       Homepage: www.dlsi.ua.es/~fsanchez
Universidad de Alicante            Fax: +34 965 90 93 26
E-03071 Alicante (Spain)         Phone: +34 965 90 34 00, ext: 2038



On Wed, 13-09-2006 at 15:26 +0700, Nano Surbakti wrote:
> Hi,
> 
> We want to start an English-Indonesian MT project. We found that
> there is an open-source MT toolkit, "Moses", at http://www.statmt.org
> 
> I don't know much about machine translation. From some articles I've
> been reading, it looks like the statistical translation method is
> rather easy yet produces reasonable results.
> 
> I have some newbie questions:
> - Our main purpose is to make an open-source English-to-Indonesian MT
> system. Can we use Moses for this purpose, or is Moses specific to
> Foreign-to-English translation only?
> - AFAIK, we have to provide a bilingual corpus for the statistical
> training. Some articles mention "phrase translation". Do we need to
> provide some kind of phrase table, or is it generated automatically
> by a special program?
> - If we can't use Moses, do you have some guidance for us, such as
> pointers to other open-source toolkits?
> - As a rough estimate, how many months is it going to take to develop
> an "early version" of an English-to-ForeignLanguage MT system?
> 
> 
> Regards,
> 
> --
> Nano Surbakti
> (sorry if you got double posting)


