[Corpora-List] starting a machine translation project

Nano Surbakti nano.surbakti at gmail.com
Wed Sep 13 10:58:28 UTC 2006


Hi,

On 9/13/06, Philipp Koehn <pkoehn at inf.ed.ac.uk> wrote:
> While documentation is written with the assumption of foreign-English
> translation, you may use it for any language direction. We have built
> many MT systems with target languages other than English.
Great..!!

> Given the parallel corpus, about a day :)
> Practically there will be many issues in preparing the data
> in appropriate form etc. There may be spelling and font issues
> with Indonesian, you may not have the data in the required
> sentence-aligned format, getting familiar with the tools may take
> a while...
Based on your experience, is there a minimum number of words or
sentences that a corpus needs in order to produce a basic translation
service? If the purpose is everyday language use, would an
English-Indonesian Bible be enough as a corpus?

>
> Regards,
> Philipp Koehn

Thanks for helping,
--
Nano Surbakti


