[Corpora-List] starting a machine translation project

zhang min mzhang at i2r.a-star.edu.sg
Wed Sep 13 08:35:00 UTC 2006


Yes, you have to provide an English-to-Indonesian bilingual corpus to do SMT
training. As far as I know, no such parallel corpus exists yet, and
constructing one is quite a tough job.
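For illustration, a parallel corpus for this kind of training is commonly kept as two plain-text files in which line i of the English file is the translation of line i of the Indonesian file, and it is usually cleaned before training. Below is a minimal Python sketch of such a cleaning pass. It assumes whitespace-tokenized input. The function name `clean_parallel` and the thresholds (80 tokens, length ratio 9) are hypothetical choices and are not taken from any particular toolkit.

```python
from typing import Iterable, Iterator, List, Tuple


def clean_parallel(src_lines: Iterable[str], tgt_lines: Iterable[str],
                   max_tokens: int = 80, max_ratio: float = 9.0
                   ) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield tokenized sentence pairs from line-aligned bilingual text.

    Hypothetical helper: thresholds are illustrative, not toolkit defaults.
    """
    src_list, tgt_list = list(src_lines), list(tgt_lines)
    # Line i of one side must translate line i of the other side.
    if len(src_list) != len(tgt_list):
        raise ValueError("corpus sides have different line counts")
    for s, t in zip(src_list, tgt_list):
        s_tok, t_tok = s.split(), t.split()
        if not s_tok or not t_tok:
            continue  # drop pairs where either side is empty
        if len(s_tok) > max_tokens or len(t_tok) > max_tokens:
            continue  # drop overly long sentences
        ratio = max(len(s_tok), len(t_tok)) / min(len(s_tok), len(t_tok))
        if ratio > max_ratio:
            continue  # drop pairs with an implausible length ratio
        yield s_tok, t_tok


# Toy example with in-memory lines instead of files.
en = ["the red house\n", "\n", "good morning\n"]
id_lines = ["rumah merah\n", "selamat pagi\n", "selamat pagi\n"]
for en_tok, id_tok in clean_parallel(en, id_lines):
    print(en_tok, id_tok)
```

With real data the two arguments would be open file handles, e.g. for hypothetical files `corpus.en` and `corpus.id`; the empty English line in the toy example shows how a misaligned or blank pair is skipped.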

Does anyone know where we can get an English-to-Indonesian bilingual corpus?

Cheers,

Zhang Min

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Nano Surbakti
Sent: September 13, 2006 16:26
To: CORPORA at UIB.NO
Subject: [Corpora-List] starting a machine translation project

Hi,

We want to start an English-Indonesian MT project. We found an
open-source MT toolkit, "Moses", at http://www.statmt.org.

I don't know much about machine translation. From the articles I've
been reading, the statistical translation method seems relatively easy,
yet it produces reasonable results.

I have some newbie questions:
- Our main goal is to build an open-source English-to-Indonesian MT
system. Can we use Moses for this, or is Moses specific to
foreign-to-English translation only?
- AFAIK, we have to provide a bilingual corpus for the statistical
training. Some articles mention "phrase translation". Do we need to
provide some kind of phrase table, or is it generated automatically by a
special program?
- If we can't use Moses, could you give us some guidance, such as
pointers to other open-source toolkits?
- As a rough estimate, how many months would it take to develop an
"early version" of an English-to-ForeignLanguage MT system?
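On the phrase-table question: in phrase-based SMT the phrase table is normally learned automatically from the word-aligned parallel corpus rather than written by hand, and the Moses training pipeline performs this step. Below is a simplified Python sketch of the usual consistency-based phrase-pair extraction. It assumes a word alignment is already available. The function name `extract_phrases`, the `max_len` limit, and the toy alignment are illustrative assumptions, not Moses code.

```python
from typing import List, Set, Tuple

# Word alignment as (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def extract_phrases(src: List[str], tgt: List[str], alignment: Alignment,
                    max_len: int = 3) -> Set[Tuple[str, str]]:
    """Collect phrase pairs consistent with a word alignment.

    Simplified: unaligned target words at the phrase edges are not
    added as extra variants, and no translation probabilities are scored.
    """
    pairs: Set[Tuple[str, str]] = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no word inside the target span may be aligned
            # to a source word outside the source span.
            if all(s_start <= s <= s_end
                   for (s, t) in alignment if t_start <= t <= t_end):
                pairs.add((" ".join(src[s_start:s_end + 1]),
                           " ".join(tgt[t_start:t_end + 1])))
    return pairs


# Hand-made toy example: "red house" -> "rumah merah" (word order swaps).
example = extract_phrases(["red", "house"], ["rumah", "merah"], {(0, 1), (1, 0)})
print(sorted(example))
# [('house', 'rumah'), ('red', 'merah'), ('red house', 'rumah merah')]
```

In a real system the alignment would come from an automatic word aligner such as GIZA++, which the Moses training scripts run. The extracted pairs would then be counted across the whole corpus and scored to produce translation probabilities.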


Regards,

--
Nano Surbakti
(sorry if you received this twice)




