ELL: re: Machine translation for Akha

Thu Jul 15 07:17:36 UTC 1999

    Return-Path: <owner-endangered-languages-l at carmen.murdoch.edu.au>
    X-Authentication-Warning: carmen.murdoch.edu.au: majodomo set sender to
    owner-endangered-languages-l at carmen.murdoch.edu.au using -f
    X-Sender: jeff!elda.fr at 192.168.1.1
    Date: Thu, 15 Jul 1999 09:17:36 +0200
    To: endangered-languages-l at carmen.murdoch.edu.au
    From: Jeff ALLEN <jeff at elda.fr>
    Subject: ELL: re: Machine translation for Akha
    In-Reply-To: <l03130308b3b24843ebe1@[129.242.176.187]>
    Content-Type: text/plain; charset="us-ascii"
    Sender: owner-endangered-languages-l at carmen.murdoch.edu.au
    Precedence: bulk
    Reply-To: endangered-languages-l at carmen.murdoch.edu.au

    At 16:36 14/07/99 +0200, Trond Trosterud wrote:
    >>Does anyone know how much work it would take to make an auto translator
    >>that could take an Akha text and make it into English or English into
    >>Akha?
    >>
    >>Matthew McDaniel
    >
    >
    >Very, very much work.
    >
    >An auto translator is in any case one of the last outcomes of a long range
    >of products of such a work, all of which are important to akha (and
    >conversely for other lgs) as well. You need:
    >
    >A good grammar of akha, or, conversely, a good understanding of what is
    >going on in akha.
    >
    >A large, large corpus of akha texts
    >
    >Good dictionaries in both directions, and terminological work to give good
    >matches for terms in both lgs.
    >
    >The corpora and dictionaries must also be available electronically, and in
    >formats appropriate for further software
    >
>Then you need morphological and syntactic parsers for Akha (and for English,
>but they are available). The morphology part will probably not be too hard
>(see below), but it must be done. As for syntax I do not know, it depends
>upon whether the major functional categories are uniquely identified or
>whether you have ambiguous constructions depending upon context for parsing.
>

Trond, actually the morphological and syntactic parsers are only necessary for
transfer-based and knowledge-based machine translation systems. They are not
necessary if one develops example-based bidirectional machine translation
systems, as we did at the Center for Machine Translation at Carnegie Mellon
University. That work is explained in the following papers:

ALLEN, Jeffrey and Christopher HOGAN. 1998.  Expanding lexical coverage of
parallel corpora for the Example-Based Machine Translation approach. In
Proceedings of the First International Conference on Language Resources and
Evaluation, 28-30 May 1998, Granada, Spain. Vol. 2, pp. 747-754.

ESKENAZI, Maxine, HOGAN, Christopher, ALLEN, Jeffrey, and Robert FREDERKING.
1997. Issues in database creation: recording new populations, faster and
better labelling. In Proceedings of Eurospeech97. Vol. 4: 1699-1702.
Conference held in Rhodes, Greece, 22-25 September 1997.

ESKENAZI, Maxine, HOGAN, Christopher, ALLEN, Jeffrey, and Robert FREDERKING.
1998. Issues in database design: recording and processing speech from new
populations  (poster session). In Proceedings of the First International
Conference on Language Resources and Evaluation, 28-30 May 1998, Granada,
Spain. Vol. 2, pp. 1289-1293.

LENZO, Kevin, HOGAN, Christopher, and Jeffrey ALLEN. 1998.  Rapid-Deployment
Text-to-Speech in the DIPLOMAT System.  Poster presented at the International
Conference on Spoken Language Processing.  30 November - 4 December 1998,
Sydney, Australia.

as well as possibly other papers that are mentioned at
www.lti.cs.cmu.edu/Research/Diplomat/

There are lots of other references to Example-based MT system development.
The CRL institute at New Mexico State University is developing a similar
system called Boas or Expedition (I can't remember which is the EBMT system).

EBMT is simply a translation memory-based application, with parallel texts and
lexicons/lexica as the data. Such databases are quite easy to put together.
The problem is finding competent bilingual people to do it, and having someone
build the EBMT system.  It is much less complicated than the traditional MT
system, but then EBMT systems are not aimed at producing high quality
translated output texts.  They are for getting the gist of something from the
other language.  Good for reading web pages in other languages, basic
hand-held translators for tourists, etc.
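
To make the idea concrete, here is a minimal sketch in Python of a translation
memory lookup with a lexicon fallback. It is only an illustration under my own
assumptions, not the engine we built at CMU: the sample data, the names
PARALLEL_TEXTS, LEXICON, similarity and translate, and the 0.8 threshold are
all hypothetical, and a real EBMT system matches sub-sentence fragments rather
than whole sentences only.

```python
from difflib import SequenceMatcher

# Hypothetical toy data. A real system would load aligned sentence pairs
# and a bilingual lexicon compiled by bilingual speakers.
PARALLEL_TEXTS = [
    ("where is the market", "ou est le marche"),
    ("thank you very much", "merci beaucoup"),
]
LEXICON = {"where": "ou", "is": "est", "the": "le", "water": "eau"}


def similarity(a, b):
    """Word-level similarity between two sentences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def translate(sentence, threshold=0.8):
    """Return (translation, method) for a source-language sentence.

    The threshold is an assumed cut-off: at or above it, the stored
    translation of the closest example is reused as-is.
    """
    sentence = sentence.lower().strip()
    # Find the stored example whose source side is closest to the input.
    best_source, best_target = max(
        PARALLEL_TEXTS, key=lambda pair: similarity(sentence, pair[0])
    )
    if similarity(sentence, best_source) >= threshold:
        return best_target, "memory"
    # Otherwise fall back to word-by-word lexicon lookup.
    # Words missing from the lexicon pass through untranslated.
    words = [LEXICON.get(word, word) for word in sentence.split()]
    return " ".join(words), "lexicon"


if __name__ == "__main__":
    print(translate("Where is the market"))
    print(translate("where is the water"))
```

The second query falls back to the lexicon and gives a rough word-by-word
rendering, which is the gist-level output described above, not a polished
translation.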

With the Hansard French-English parallel corpus, we built a prototype
English<-->French system in a total of 48 hours (yes, 48 hours, including the
time the processor was cranking overnight). It was not high quality, but gave
understandable results on queries.
I supervised a translation lab where we built full speech-to-speech systems in
under one year.  I am currently advising a project where similar methods (very
inexpensive procedures) can be used to quickly put prototype systems together.

>As a next step, I would look for existing translation programs between
>English and lgs typologically similar to akha, and take as much advice as
>possible from them.

Actually, typology is not too much of a problem with EBMT. We created a Korean
<--> English system with the basic EBMT engine, and developed a special Noun
Phrase flipper to deal with the typological differences.  Much less work
than a transfer-based system.

>What you need now is a couple of computational linguists who are curious
>about akha and willing to try something out, and/or a commercial
>translation software company that may be interested in extending its
>coverage to akha as well, provided someone does the basic work for them
>(assuming that the commercial market is small).

I can find the computational linguists. What I would need are native speakers
of Akha who are bilingual Akha - English to do the database compilation work.

Best,

Jeff

=================================================
Jeff ALLEN - Technical Manager/Directeur Technique
European Language Resources Association (ELRA)  &
European Language resources - Distribution Agency (ELDA)
(Agence Europe'enne de Distribution des Ressources Linguistiques)
55, rue Brillat-Savarin
75013   Paris   FRANCE
Tel: (+33) 1.43.13.33.33 - Fax: (+33) 1.43.13.33.30
mailto:jeff at elda.fr
http://www.icp.grenet.fr/ELRA/home.html
----
Endangered-Languages-L Forum: endangered-languages-l at carmen.murdoch.edu.au
Web pages http://carmen.murdoch.edu.au/lists/endangered-languages-l/
Subscribe/unsubscribe and other commands: majordomo at carmen.murdoch.edu.au
----