ELL: Re: technologies for Endangered Languages

Jeff ALLEN jeff at elda.fr
Wed Feb 24 20:54:54 UTC 1999



At 14:53 24/02/99 +0000, you wrote:
>I was interested to read the proposal by Transparent Language, Inc. to
>adapt their own machine translation and CALL technologies for
>neglected/endangered/sparse-data/low-density languages, brought to our
>notice by Jeff Allen.

I should probably add to this thread that this announcement by
Transparent is surely a result of the panel discussion at the
conference of the Association for Machine Translation in the
Americas 98 (Philadelphia) that was entitled:  "The Forgotten
Majority: Neglected Languages".  All of the major Machine
Translation (MT) companies, and a few academic MT developers
interested in neglected languages, were involved in the panel.
It revealed much about the problem of investing in a technological
approach to language preservation. The ultimate question was:
Who is going to pay for it?  All of the MT companies (Transparent
wasn't represented in the panel) stated that it is simply too
expensive for the small budgets that neglected language
preservation people can invest.

Transparent's recent announcement is a rebuttal to that point
of view, at least from what I can tell.


>If the heritages which are to be preserved and transmitted are as rich as
>Transparent believes, aren't they a bit rash to believe that their
>technology will be adequate for any one of them?

These are language localization experts, not cultural experts. That
is where your expertise would be necessary.


>Secondly, where is the evidence that Transparent has anything to "save"
>languages, besides the magnificent title?
>> The Endangered Languages Preservation and Revitalization Project

I have several articles, written independently by people who do not
work together at all, here on my desk and on my hard drive that
argue that the hope for preserving endangered and neglected languages
is to computerize and internetize them.  It is the way of getting them
recognized.

Having worked 10 years on French Creole languages, certainly
considered as minority and neglected languages in most parts
of the world, and the last 4 of these years on Machine Translation
technologies, I have found it interesting to see how the
development of an English <--> Haitian Creole speech-to-speech
MT system has influenced the Haitian public with regard to
how they perceive their language.  Once you create an
English-Creole MT system, no one can argue any longer that Creole
is just a "patois" or "broken French", and not a real language.
If you can develop a translation system that can translate
from one language to another, both languages are on the same
footing.   This has profound repercussions on the social status
and future of the language, not to mention the possibilities
for improving literacy campaigns and other movements
that support the local languages.


>Having visited the web site (http://www.transparent.com/), I see that
>Transparent offer multimedia Computer-Aided Language Learning environments,
>machine translation, and perhaps some electronic books or the equivalent.
>"Cultural Partners" are expected to be able to pour any linguistic culture
>they know into this framework.  But this is unlikely
>>to give endangered
>>languages a new breath of life
>
>At best, it will show that smaller communities' languages can express the
>same kind of cultural content that has been found adequate, apparently, for
>the 18 languages already incarnated in this "Transparent" form.

Yes, maybe the cultural content, but the linguistic issues will be tough
to tackle.  Highly agglutinative languages and possibly isolating
languages will be hard.

>Most such smaller languages are community languages, or they are nothing:
>but it seems that there is no role for  communities of present-day speakers
>in the production of these materials.

Not true.  The only way that they can develop such systems is
through partnerships with the local communities.  The local
people groups must participate and must feel a sense
of ownership, or else it will lead to nothing.   What Transparent
seems to be doing is providing a means to form those
partnerships with those communities and projects that
have limited funding and yet want to promote their languages.

>Anyway, I for one would feel a lot easier about this initiative, if it were
>represented as a product which is now available (perhaps on concessionary
>terms, or with consultancy and support thrown in) to be explored by people
>in endangered language communities, rather than as some sort of mission or
>crusade by the producer company.

They can't develop the databases for each local language without
native language knowledge engineers (as we called the native
language translators at my last job).   We were able to effectively
build a database in 5-6 months for Croatian, Haitian Creole,
and Korean, that worked within an existing MT system.  The data
is the only factor that changed.  The approach, formatting,
and system were the same (except for a few extra things for
Korean like Unicode formatting and a Noun Phrase tagger
that we had to develop in addition).  We demonstrated rapid-
development MT and speech technologies for these languages
in very short periods of time compared with other systems.
I have copies of all of my research papers on the topic.
They are publicly available.


>In some sense, a company and a set of products is just a tool, no better
>and no worse than the purposes to which it is put; so maybe this can be the
>basis for something really good.

They are offering the tool.  They need to form the partnerships
for the compilation of the database as a language resource that
can be used within the tool.

Building those partnerships is a lot of work.  I know because I've
spent the last 2 years doing it for Korean and Haitian Creole projects.

>But corporate culture itself, based on the production of these tools and
>seeing them and trade in them as the essence of life, has not been
>inspiring, or much of a force for good, whenever it tries to be benign on
>its own terms.

I am not under the impression that this stems much from
corporate and industrial needs.  There have been a lot of presentations
on the topic of language engineering for minority and neglected
languages over the past couple of years, and much of it stems from
locally-raised needs (immigrant translations, language preservation
and education, literacy). Sure, some is for pure academic research
interests, and none of it, to my knowledge, is supported by
industry.

>The issue raised here is one I find central, placed as I am, paid to offer
>consulting on language technology, but trying also to do what I can for
>endangered languages.  Can there be a fruitful marriage of global computing
>and indigenous culture?

I think so, but it must be done through a well-organized and
well-thought-out collaboration between the tool providers, the
database compilers, and the locals.

>How do others see this tension?  Or am I perhaps
>seeing (at least in this case) potential conflict where it may never arise?

Yes, a very valid point raised indeed.   I am glad to hear it voiced.

Any other opinions?  I would certainly be interested.

Best,

Jeff
(a sociolinguist at heart, linguist and language professor to get
where I am, MT developer as a way to find a job since the
other avenues did not provide much, and language resource
promoter to use all of the above-mentioned skills for the
advancement of today's technologies -- have thoroughly
enjoyed the path I have taken).

=================================================
Jeff ALLEN - Directeur Technique
European Language Resources Association (ELRA)  &
European Language Resources Distribution Agency (ELDA)
(Agence Européenne de Distribution des Ressources Linguistiques)
55, rue Brillat-Savarin
75013   Paris   FRANCE
Tel: (+33) (0) 1.43.13.33.33 - Fax: (+33) (0) 1.43.13.33.30
mailto:jeff at elda.fr
http://www.icp.grenet.fr/ELRA/home.html
----
Endangered-Languages-L Forum: endangered-languages-l at carmen.murdoch.edu.au
Web pages http://carmen.murdoch.edu.au/lists/endangered-languages-l/
Subscribe/unsubscribe and other commands: majordomo at carmen.murdoch.edu.au
----


