ELL: update on the Transparent Language Systems offer

Jeff ALLEN jeff at elda.fr
Fri Apr 2 09:09:12 UTC 1999



At 15:01 01/04/99 -0500, you wrote:
>I went to the Transparent website and read a little more about TLS's offer
>to provide templates for producing endangered-language tutorials and was
>disappointed to discover that TLS wants $100,000 per language. This is the
>amount of money a group will have to come up with for the TLS templates
>alone. In addition, some monies will be necessary to fund for designing the
>lessons, gathering the texts, producing the recordings, etc. So it doesn't
>sound like the great deal it did at first.
>
>What do others think about this? I know that many on the list don't feel the
>technology would be very beneficial to the speech communities, anyway. But
>I'd be interested in hearing some opinions. Am I wrong to think that
>$100,000 is an awful lot of money?

Dave et al.,

Just a comment from a person who has worked in the field of machine
translation, translation technologies, and related fields for both research
and commercial products.

$100 000 sounds like a lot of money to you, but let me give you some figures
that are known and discussed at machine translation conferences and meetings.
I have been told by others who say they know that Caterpillar, one of the
first industrial corporations to fund the development of a machine translation
system for its entire technical publications dept (technical writing,
translation, parts books, etc) across a spectrum of about 10 types of manuals,
spent 20 million dollars on the development of the system and the myriad of
related applications. I, in fact, worked on that project for 2 years and would
certainly not dispute this cost figure.

In the panel discussion on Developing Systems for Neglected Languages at the
last conference of the Association for Machine Translation in the Americas
(Oct 98), a session that I personally know inspired this new Endangered
Language promotion through Transparent, the major MT companies (Systran,
Logos, Globalink, etc) all said that developing a system for a new language
pair would cost anywhere from 200 000 to 1 million dollars. (I am talking
about a rule-based knowledge MT system that maps rules from one system to
another and does not happen through magical means.) A few years ago I met
with a project manager in a major company (I won't say which one, but you
would all recognize it) to discuss the possibility of creating a machine
translation project in his company where I would be a project manager. When
we discussed the figures, he said that he could possibly spend 1 million
dollars, but not 20 million, not even 10. His discussions with system
developers, both academic and commercial, ended with him deciding not to
pursue the project.

The development of "Knowledge-based" and "Rule-based" systems is a very
time-consuming process. It requires expert programmers who are not cheap to
hire. I hired such programmers when I was the Translation Lab Supervisor and
Research Linguist at the Center for Machine Translation of Carnegie Mellon
University for 2 years. These rule-based systems usually require a minimum
of 2 to 5 years of constant work. From the last figures I have heard, Systran
has invested over 100 000 man-hours of work into their general system, and it
is one of the better ones (Note: I did not say perfect, I said "better").

So, for a rule-based MT system, 100 000 is not too expensive. Their margin of
profit is actually quite low. The templates have to be modified, and this can
be quite complicated for highly agglutinative languages or other languages
with other interesting linguistic issues. I worked on an Arabic translation
system, and it was very difficult to deal with the infixed morphemes. I have
also worked on Korean, and that required the development of some special
part-of-speech bracketers and syntactic flipping algorithms. It's not easy.

Now, if you want to create an "Example-based" MT system that is based on the
principles of Translation Memory (I have written several articles on the topic
and can send them by e-mail to anyone who is interested), this is a much less
expensive approach and can be done relatively easily with a small team of
native speakers. I supervised the Haitian Creole and Korean teams of an EBMT
project at Carnegie Mellon and am very aware of the costs to develop both text
and speech databases, in a very short period of time.

There are actually 2 or 3 university research centers that would be very
interested in helping develop EBMT systems for minority languages, for
significantly less money than the big companies are promoting. It still will
cost money because one must hire the native speakers, hire the developers,
etc. The universities usually try to get government grants for the
development of such projects, but this is getting much more difficult.
Various governments are beginning to reduce their investments in language
technologies, as compared to 15 - 20 years ago. I have seen major projects
abandoned because governments accepted proposals and then decided not to fund
them.

So in summary, 100 000 is not expensive for a "Rule-based" MT system based on
existing commercial products.

If any of you want contacts about the idea of developing an "Example-based"
MT system at possibly a lower cost, and most likely in the structure of an
academic research project, I can certainly give you those contact addresses.

If anyone wants copies of some of the papers I have written on the topic
(most of them are not too technical and are understandable for a general
readership), just ask and I will send you my bibliography, and then you can
ask for specific articles.

Best,

Jeff


=================================================
Jeff ALLEN - Directeur Technique
European Language Resources Association (ELRA)  &
European Language Resources Distribution Agency (ELDA)
(Agence Européenne de Distribution des Ressources Linguistiques)
55, rue Brillat-Savarin
75013   Paris   FRANCE
Tel: (+33) 1.43.13.33.33 - Fax: (+33) 1.43.13.33.30
mailto:jeff at elda.fr
http://www.icp.grenet.fr/ELRA/home.html
----
Endangered-Languages-L Forum: endangered-languages-l at carmen.murdoch.edu.au
Web pages http://carmen.murdoch.edu.au/lists/endangered-languages-l/
Subscribe/unsubscribe and other commands: majordomo at carmen.murdoch.edu.au
----



