[Corpora-List] starting a machine translation project

Kees Koster kees at cs.ru.nl
Wed Sep 13 09:08:24 UTC 2006


Dear Nano Surbakti,

Doing a natural language machine translation project is an ambitious
undertaking, especially when it involves the Indonesian language,
for which few resources exist. Your main problem will be to find
or construct good-quality bilingual resources; to achieve
reasonable quality using statistical techniques, large corpora are
needed.

As a possible alternative, I suggest that you use an existing
Dependency Grammar for English and combine it with a generator,
based on a translation component and a dependency grammar for
Indonesian (both to be written). Transfer can then take place
at the Dependency Triple level (which gives better quality
than the word or wordstring level). A public-domain grammar for
English can be found at www.cs.kun.nl/agfl.
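To make the idea of triple-level transfer concrete, here is a minimal
Python sketch. It assumes triples of the form (head, relation, modifier),
loosely modelled on dependency triples. The relation names SUBJ, OBJ and
ATTR, the tiny lexicon, and the functions transfer, noun_phrase and
generate are all hypothetical illustrations. They are not part of the
AGFL grammar or its tools.

```python
# Hypothetical toy sketch of transfer at the dependency-triple level.
# Triples are (head, relation, modifier); relation names and the
# bilingual lexicon below are illustrative assumptions, not AGFL output.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical English-to-Indonesian lexicon (lemma level).
LEXICON: Dict[str, str] = {
    "eat": "makan",
    "cat": "kucing",
    "fish": "ikan",
    "big": "besar",
}


def transfer(triples: List[Triple]) -> List[Triple]:
    """Translate head and modifier of each triple; keep the relation."""
    return [(LEXICON[h], rel, LEXICON[m]) for h, rel, m in triples]


def noun_phrase(noun: str, triples: List[Triple]) -> str:
    """Generate an Indonesian noun phrase: attributes follow the noun."""
    attrs = [m for h, rel, m in triples if h == noun and rel == "ATTR"]
    return " ".join([noun] + attrs)


def generate(triples: List[Triple]) -> str:
    """Generate a simple SVO clause from the SUBJ/OBJ triples of one verb."""
    verb = next(h for h, rel, _ in triples if rel == "SUBJ")
    subj = next(m for h, rel, m in triples if h == verb and rel == "SUBJ")
    obj = next(m for h, rel, m in triples if h == verb and rel == "OBJ")
    return " ".join([noun_phrase(subj, triples), verb, noun_phrase(obj, triples)])


# "The big cat eats fish", as a dependency parser might analyse it (assumed).
english = [("eat", "SUBJ", "cat"), ("eat", "OBJ", "fish"), ("cat", "ATTR", "big")]
print(generate(transfer(english)))  # kucing besar makan ikan
```

Because word order is decided by the Indonesian generator rather than
copied from the English string, the adjective correctly follows the
noun. A word-by-word substitution of "big cat eats fish" would give
"besar kucing makan ikan", with the adjective in the wrong place.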

If you are interested in taking this approach, I am willing to
elaborate on this idea and help you with it.

Friendly greetings,

  -- Cornelis H.A. Koster


