[lg policy] Google’s Toolkit for Translators Helps Feed Its Machine

Harold Schiffman haroldfs at GMAIL.COM
Tue Mar 9 15:16:19 UTC 2010

Google’s Toolkit for Translators Helps Feed Its Machine

Te Taka Keegan, a university lecturer in New Zealand, is betting that
Google can help him preserve the Maori language of his ancestors.
Mr. Keegan uses a tool called the Google Translator Toolkit to upload
Maori translations of English texts to Google. Others can then use
those translations in their work, increasing the quantity and quality
of Maori translations that are available, and creating incentives for
children of Maori descent to learn the language.

“With this tool, we can actually uplift our language,” Mr. Keegan
said. “For us, it is about saving our language from extinction. We are
trying to help our culture survive.” The Google Translator Toolkit may
be good for the culture of the Maori people, an indigenous minority
group in New Zealand. It’s also good for Google.

Data from the toolkit helps Google beef up its machine translation
system, which I cover in an article in Tuesday’s Times.

Google’s machine translation system feeds on data, including the data
that Mr. Keegan and others feed into the toolkit. If enough people use
the service, Google will eventually have enough data to add Maori to
the list of languages that Google can translate automatically. Google
Translate, the company’s translation tool, now speaks 52 languages,
more than any of the major machine translation systems in use. In a
sign of Google’s ambitions, the company recently released the toolkit
in 345 languages, from Abkhazian to Zulu.

“The toolkit is a goldmine for sucking data,” said Alon Lavie, a
machine translation expert and associate research professor at the
Language Technologies Institute at Carnegie Mellon University. “Google
can use it to collect data for language pairs that there is very
little data on.”

For now, the amount of data Google is getting through the toolkit
pales in comparison with the massive amounts of text it can cull from
the Web and other sources, like official government documents or its
book scanning project, said Franz Och, a principal scientist at Google
who leads the company’s machine translation team. But he said that
will change over time.

“We hope the toolkit will be of significant usefulness at some point,”
he said. “The data we get from the toolkit is very nice and well
aligned,” he added, meaning that the side-by-side translations are
especially useful to Google’s machine-learning algorithms.

University researchers in Wales are also using the Translator Toolkit
to help increase the availability of text in the Welsh language, and
Google can use the data from those efforts to improve its automatic
translation into Welsh, one of the 52 languages its system can handle.



Harold F. Schiffman

Professor Emeritus of
Dravidian Linguistics and Culture
Dept. of South Asia Studies
University of Pennsylvania
Philadelphia, PA 19104-6305

Phone:  (215) 898-7475
Fax:  (215) 573-2138

Email:  haroldfs at gmail.com


This message came to you by way of the lgpolicy-list mailing list
lgpolicy-list at groups.sas.upenn.edu
To manage your subscription, unsubscribe, or arrange digest format: https://groups.sas.upenn.edu/mailman/listinfo/lgpolicy-list
