Innovation: Language Weaver: fast in translation

Sat Oct 4 16:56:13 UTC 2008


(John Kehe/Staff)
Language Weaver: fast in translation

*How one firm quickly translates reams of data.*
By Gloria Goodale | Staff Writer for The Christian Science Monitor / October 1, 2008 edition
<http://features.csmonitor.com/innovation/2008/10/01/language-weaver-fast-in-translation/>

Reporter Gloria Goodale explains the history of Language Weaver.
 ------------------------------

Los Angeles

If you want to text message your Spanish-speaking neighbor, but don't know
how to say "Please turn down the radio" in that language, you could find a
quick translation online at any number of websites. But if you are, say, a
large semiconductor company with customers around the globe, you are in a
pickle if all your support data is written only in English.

Enter Language Weaver, a Los Angeles-based firm on the cutting edge of a
rapidly growing field known as machine translation (MT). The firm took one
chipmaker's extensive database and translated it overnight into Spanish, the
No. 1 tongue in demand by that company's customers. This task, says the
company's CEO Mark Tapling, would have taken weeks to accomplish not too
long ago. Instead, its software made short work of a gargantuan task.

The $100 million MT industry has the potential to grow more than fiftyfold,
some analysts estimate. "Language Weaver is a leader in this
field," says Don DePalma, chief research officer with Common Sense Advisory
Inc., who specializes in the somewhat arcane world of computerized
translation services.

This may seem like a yawn-producing competition among geeks, one that
transpires beyond the purview of most people's concerns. But in fact, say
industry watchers, making swift, high-volume, global communication possible
is quickly moving up the to-do list of those who conduct international
business deals. For instance, what happens to a nuclear power firm doing
business in remote parts of India with no ability to hand over documents in
the proper local dialect?

"The ability to translate lots of information quickly is becoming one of the
important concerns of a global economy," says Mark Przybocki, computer
technologist and MT team coordinator with the National Institute of
Standards and Technology, in Gaithersburg, Md. "Especially when you consider
the huge amounts of information accumulating on the Internet…. Effective
machine translation is becoming more important every day."

Just what constitutes "effective" MT is a source of lively debate among a
small but growing number of linguists, mathematicians, and computer
specialists who dominate the field. Since the 1980s, the MT field has
consisted of three approaches: rules-based, in which programmers enter up
to 20,000 grammatical rules to direct the translation; example-based, in
which discrete examples serve as guides; and statistical, in which "smart"
computer algorithms "learn" from previous translations and develop their own
guidelines.

The first two approaches were dominant until the turn of the century because
the statistical method required so much data from which to "learn," as well
as massive amounts of processing power to search and cull its protocols, and
enough memory to retain the information. But the statistical approach became
more viable as computing power began to accelerate and memory capacity grew
more affordable.

Language Weaver grew out of what Kevin Knight, one of the company's
cofounders, calls a "watershed workshop" in 1999. His team discovered that
the translation protocols developed for one language could move seamlessly
to another without having to start over from scratch with each new tongue.
The group's work enabled it to nab all-important research funds, and within
two years, the commercial venture began. Today, Mr. Knight sits in front of
his computer looking at a translation program for Chinese that is capable of
processing some 100 million directives.

This would not be cutting-edge technology, however, without some
disputes. Chief technology officer and cofounder Daniel Marcu has T-shirts
to prove it. One reads, "I lost the syntax bet"; another says, "I won." He
alternates them depending on how the arguments go. The shirts refer to a wager
between his team and a former colleague who now runs the free translation
service at Google. Mr. Marcu has maintained that the system will still need
grammatical rules no matter how much a statistical system is able to learn
from previous translations, while the other side believes that statistics
alone will provide all the necessary guidance.

Friendly wagers aside, Marcu says that in the end, it won't matter. "There
is so much information on the Internet … that these systems will absorb
grammatical rules without pausing to articulate them."

The biggest challenge MT may face is human expectation. "People think
machines should be able to act like the computer on the bridge of Star
Trek's Enterprise, or C-3PO. That would be nice," says Mr. DePalma, "but
while everyone would like that fabled Babel fish in the ear [the universal
translator from the sci-fi classic, 'The Hitchhiker's Guide to the Galaxy'],
we are still a ways off from that."

From the Christian Science Monitor, 10/4/08
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

Harold F. Schiffman

Professor Emeritus of
Dravidian Linguistics and Culture
Dept. of South Asia Studies
University of Pennsylvania
Philadelphia, PA 19104-6305

Phone:  (215) 898-7475
Fax:  (215) 573-2138

Email:  haroldfs at gmail.com
http://ccat.sas.upenn.edu/~haroldfs/
