Cheeseburgery hamburgers and the problem of computerised translations
natalyekelly at yahoo.com
Sat Jan 31 15:40:54 UTC 2009
Google's statistical MT engine (http://translate.google.com/) is available in the following languages: Albanian, Arabic, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian and Vietnamese.
I will paste below a few recent Watchtower blog entries (independent industry commentary) that might be of interest on the topic of both rules-based and statistical MT. I would recommend clicking on the actual page URLs to see the related links, videos and images in case they do not display properly here. However, these just give a snapshot of the state of the market, and do not dive into the technical details of the machine translation engines. Those are often the subject of papers and presentations within the localization and computational linguistics conference circuits.
For some types of projects, MT can actually work well, especially for controlled language and technical content. The Pan American Health Organization has had great success using their MT engine for technical content. It is one of the best examples I have seen of domain-specific MT. More information: http://www.paho.org/English/AM/GSP/TR/Machine_Trans.htm
There are currently several language service providers (LSPs) whose
business model is centered around using free or nearly-free machine
translation with human post-editing. However, MT is also widely used for gisting. It is particularly helpful for scanning a large corpus to determine which areas might require TM plus post-editing or computer-assisted translation (CAT), which is performed by humans but made easier by translation memory, software tools that flag repeated text so that it only has to be translated once, terminology extraction and management tools that ensure consistent use of terminology, and so on.
Another growing trend is machine interpretation (total automation of spoken language interpretation), so I'll include one post on that topic below as well. Computer-assisted interpretation (CAI) is another growing trend, in which both end users and interpreters themselves are making greater use of software, handheld devices, and desktop applications to facilitate interpretation tasks.
I hope some of these blog posts will be useful to colleagues, although it is important to remember that, as blog entries, they provide just a snapshot of the current trends in the language services market. A great many books and journal articles exist on these topics that would lend greater insight to those interested in the current state of the research.
How Good Is Machine Translation? A Modest Test
Donald A. DePalma 30 October 2007
The Wall Street Journal
recently opined that “translation software is at last good enough to
help companies do business in other languages,” noting a hoary case
study from Ford and posturings from Google, Microsoft, and SDL — and
few real examples. But that’s fine. The Journal has just discovered MT,
perhaps looking for juicier stories to put on its Page 3 as Rupert Murdoch’s News Corp takes over as dowager queen of the print media.
That said, MT is definitely on the must-review list for many
companies and government agencies, but few are paying for it today. The
biggest use of automated translation is free online machine translation
(OLMT). How widespread? Last year Common Sense Advisory asked 2,430 consumers in non-Anglophone countries
whether they tried free OLMT — more than half said that they sometimes,
frequently, or always use machine translation to better understand
So, like it or not, information consumers will get what they want
translated by a quick hop over to their favorite free MT site. How good
will the translations be? Let’s consider some skeptical lines penned by
our resident Carioca as he read our entry about changes in the MT guard.
We decided to paste these immortal words into several free OLMT sites
to test Portuguese into English translations rather than come up with
the MT equivalent of “the quick brown fox jumped over the lazy
brown dog.” One of the systems used statistical machine translation
(SMT); the other three were rules-based (RBMT) systems (see the full
version of Automated Machine Technology for a discussion of the differences, pros, and cons).
“Escrevo estas linhas em português para testar o novo
método de tradução automática do Google. Eu não tenho grandes
expectativas com relação a este tipo de serviço. Confesso que sou
usuário freqüente para ter acesso a informações em idiomas que não
domino, como o alemão, chinês e japonês. O Google agora utiliza
tradução automática com base estatística. Quer dizer, quanto mais
traduções houver de uma frase, mais correta será a solução oferecida
(SMT): “I write these lines in Portuguese to test the new method of
automatic translation of Google. I do not have great expectations with
respect to this type of service. I confess that I am a frequent user to
access information in languages that no domain, such as German, Chinese
and Japanese. Google now uses machine translation based statistics.
That is, the more there translation of a phrase, more correctly is the
solution offered by the computer.”
PROMT (RBMT): “I write these lines in Portuguese to test the new method of
automatic translation of the Google. I have not big expectations
regarding this type of service. I confess that I am a frequent user to
have access to informations in languages that I do not dominate, like
German, Chinese and Japanese. The Google now uses automatic translation
with statistical base. It means, how much more translations will be of
a sentence, more correct will be the solution offered by the computer.”
SDL (RBMT): “I write these lines in Portuguese for quiz the new approach of
automatic translation of the Google. I do not have big expectations
regarding this kind of service. Confessed that I am user frequent for
have access the information in languages that do not dominate, as the
German, Chinese and Japanese. The Google now utilizes automatic
translation with statistical base. It want to say, specially
translations will have of a phrase, more correct will be the solution
offered by the computer.”
SYSTRAN (RBMT): “I write these lines in Portuguese to test the new method of
automatic translation of the Google. I do not have great expectations
with regard to this type of service. I confess that I am using frequent
to have access the information in languages that I do not dominate, as
the German, Chinese and Japanese. The Google now uses automatic
translation with base statistics. It wants to say, the more
translations will have of a phrase, more correct will be the solution
offered for the computer.”
Judge for yourself. While none of these are perfect translations and
one is definitely not at the quality level of the others, all 4 tell us
that Senhor Beninatto wasn’t writing a shopping list for “pound
pastrami, can kraut, six bagels.” For many web browsers, that ability
to determine the subject of a communication will be good enough,
allowing them to determine whether they want to invest more time in a
given piece of information. Obviously, in more complex domains and in
printed communications like owner’s manuals for a Porsche 911 GT3 RS
(Santa, are you listening?) or how to adjust the control rods for a
nuclear fission reactor, tuning and accuracy will be much more of an
Changing of the Guard in Machine Translation
Donald A. DePalma 30 October 2007
information will never be translated by humans from its source language
into even one other language, much less into many. Budgets, staffing,
and time will always make organizations shy away from translating even
a small fraction of the words they have on hand. Many companies and
government agencies will use some form of automated translation to
improve services to customers and constituencies. However, many
information consumers will avail themselves of free online machine
translation (OLMT) if they don’t find their language at a website.
Most of that free OLMT to date has been provided by SYSTRAN, a French software firm that grew up during the Cold War as the Free World faced off against the Moscow-led Warsaw Pact. In October new challenges arose from the new guard, including the Russians themselves.
Google reportedly replaced the languages that SYSTRAN translated
for it in favor of its in-house statistical machine translation (SMT)
engine. Google’s homegrown technology came into wide view when it won
the no-holds-barred NIST Machine Translation Evaluation
in 2005. Google’s MT is part of the GooglePlex — that is, not yet a
commercially available product, but, like its search appliance, MT
could become a Google product. Try it here.
SMT-based Language Weaver opened its second sales office in Europe.
After its initial success selling to certain U.S. government agencies,
Language Weaver made its 2006 European debut in bureaucrat-dense,
government-rich Brussels. Its latest digs are in Paris, hometown of
SYSTRAN — and presumably of some commercial buyers. Free use of
Language Weaver on the web is harder to find than Google or SYSTRAN.
Earlier this year the company announced that the social bookmarking site Kontrib
was using its technology, giving everyone a chance to see its output.
Expect Language Weaver to host its own OLMT site as part of its
marketing expansion.
St. Petersburg-based PROMT announced a significant uptick in the use of its free OLMT. This followed its September announcement of V7.8 with support for Windows Vista, while those fortunate enough to speak Russian already have access to Version 8.0 with its improved algorithms and usability. Try its free OLMT.
The bottom line: Most consumers will never buy desktop machine
translation software from LEC, PROMT, or SYSTRAN for their PCs, Macs,
or smartphones. However, they will have free MT available in the cloud
from Google, Language Weaver, LogoVista, Microsoft, PROMT, SYSTRAN, and through portals like Yahoo! BabelFish. How well do they work? Click here for a modest example.
Seeking an MT Market beyond Ad-Reading Eyeballs
Donald A. DePalma 25 September 2008
week, Language Weaver projected a US$67.5 billion market for digital
translation, enabled by advances in machine translation (MT). For the
last few years, we have released an annual estimate of the market for
outsourced translation, localization, and interpretation. For 2008,
human-delivered translation activities will total a hefty US$14.25
billion (see our “Ranking of Top 25 Translation Agencies”).
On the software side, we estimate that the MT software market falls
well short of US$100 million. Added together, there’s a lot of daylight
between our numbers and Language Weaver’s estimate. Where’s the
disconnect? Over the last week, we’ve spent a lot of time talking with
various people about the US$67.5 billion projection.
Let’s start off by deconstructing the 67 billion dollar number. That
is an estimate of the monetary value that Language Weaver thinks MT
suppliers “could” translate for corporations and governments; the
operative phrase in the company’s press release is “untapped markets”
where automated translation could increase the volume and lower the
cost of human translation, which stands at current market prices of 10-40 cents per word.
How good is Language Weaver’s sizing of the as yet unrealized
market? We think its number is way too low, especially as the amount of
stored content grows at record levels (see the figure below from our
report on “Automated Translation Technology”).
The untapped market potential is much higher, but the problem is
still getting buyers on board. Language Weaver will target customer
care, business intelligence, and user-generated content, three markets
where companies could benefit from moving content out of linguistic
silos. However, the organizations today that stand to gain the most
from MT are those driving advertisement-reading eyeballs to their sites.
The challenge that Language Weaver and rival developers face is getting
more people accustomed to the idea of paying for MT software or SaaS
solutions that will help them translate their content into other
languages. Three roadblocks stand in the way:
Free machine translation obscures the value.
There’s an enormous amount of content that’s translated every day
online using free online machine translation sites, but no one has
figured out how to directly monetize those interactions. We have long
contended that there’s far more text that consumers, businesses, and
governments might run through those engines if they could more easily
plug them into workflows, e-mail systems, mobile phones, and other
networked appliances. Combine a dollar figure for the unmonetized
activity that’s happening today at sites like Google Translate or
Yahoo!’s Babel Fish with the dollar value for things that should be
translated - and you’ve got some really big piles of zeroes. The
problem is that there are usually no positive integers to the left of
those zeroes. Bottom line: Too much of it is free.
Unpaid human translation appears to be a panacea. Another rival to MT is community or collaborative translation for both company- and user-generated content, such as we’re seeing at Facebook (social networking), Livemocha (language learning), and NetBeans
(Java software development). These communities can fill some of the
demand, but nowhere near all of it. That leaves a lot of information
forever locked in the language in which it was created.
An uneducated market expects too much or too little.
Potential buyers retain unrealistic (read “Star Trek” or “Hitchhiker’s
Guide”) expectations of what they will get out of machine translation.
Some ignore the quality issue
altogether, posting babble-fishy output and thinking they did a good
thing in providing any in-language content at all. Meanwhile, many
individual translators and too many translation agencies miss the
point; they think that MT threatens their livelihood rather than
viewing it as a productivity enhancer.
That said, the corporate and governmental sectors may be turning the
corner vis-à-vis MT acceptance, if not purchasing. A poll conducted by
the International Association for Machine Translation (IAMT) and
Association for Machine Translation in the Americas (AMTA) for SDL,
another provider of machine translation technology, found that 40
percent of the 385 surveyed individuals were “now” likely to use MT. Of
those roughly 150 receptive respondents, 62 percent said they would use
it for technical documentation, 49 percent for support and
knowledge-based content. That’s good news for the MT software sector,
but could be bad news if automated translation merely displaces the
work of traditional translation agencies rather than increasing the size
of the overall business.
--------------------------
Asia Online Aims to Meet Asian Content Demands with MT+
Donald A. DePalma 14 April 2008
the last dozen or so years we’ve heard ourselves incessantly reminding
everyone that the “www” in most URLs means “worldwide web,” while the
“e” in “e-commerce” all too often stands for English. Our research on e-GDP (online GDP) and the Availability Quotient
demonstrated that many companies still have a long journey before they
can meet the demands of the world’s markets for local-language content.
That gap is no more apparent than in Asia where the amount of
in-language content is dwarfed by the growing online population.
Just how dwarfed? Today, roughly 38% of internet users live in Asia,
but by 2012, that number will jump to half. However, local-language
content hasn’t kept pace. In 2007, non-Asian languages accounted for
roughly 86% of the content on the web. Most of the remaining 14% was
split among Japanese (6%), Chinese (6%), and Korean (1.5%). All other
Asian languages comprise less than 0.03% of the web’s content; for
example, Southeast Asian languages make up less than 10 million pages.
Given consumer preference for content in their own language, that huge gap between Asian content and total online population represents a huge opportunity.
That opportunity has not gone unnoticed. After getting an eyes-only,
tell-no-one pre-briefing in December, we recently spoke with Asia
Online CEO Dion Wiggins who called us to tell us that his portal had
just scored its first round of funding from JAIC, the Japanese venture capital firm behind Alibaba.com, among others. He also wanted to let us know that Kirti Vashee,
formerly VP of marketing at Language Weaver, had signed on as Asia
Online’s VP of sales for the Americas and Europe with the
responsibility for selling the commercial version of its MT engine.
Asia Online’s plans revolve around a proprietary machine translation
engine plus a strong support infrastructure of humans, content, and
partners. Four elements are key to this strategy:
New technology. Asia Online developed
high-performance statistical machine translation (SMT) software in
collaboration with University of Edinburgh professor Philipp Koehn.
Clean corpora. Asia Online contracts with
publishers, language service providers, and eventually corporations for
human-translated content to train its SMT engine. The company also
crowdsources the quality via a large community of students, and feeds
the validated content back into the system as training data.
Matrixed language learning. The SMT engine can
take translations of a novel into English, Japanese, and Thai and use
the permutation to train itself on English<>Thai,
English<>Japanese, and Japanese<>Thai. This capability is
especially important for languages that don’t have enough content to
feed a data-hungry statistical MT engine.
Real-time fixes. Its MT engine lets reviewers
observe translation decisions as they are being made, allowing them to
influence choices, make fixes in place, and propagate these
modifications to wherever that phrase or term is used.
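The "matrixed language learning" idea above can be illustrated with a small sketch. It is not Asia Online's implementation, which the post does not describe in detail. It only shows how one work translated into several languages yields a training corpus for every language pair. The function name pairwise_corpora and the sample sentences are hypothetical, and the sketch assumes the versions are already aligned segment by segment.

```python
from itertools import combinations


def pairwise_corpora(aligned):
    """Build a parallel corpus for every language pair.

    aligned maps a language code to a list of segments; segment i in one
    language is assumed to be the translation of segment i in the others.
    """
    lengths = {len(segments) for segments in aligned.values()}
    if len(lengths) > 1:
        raise ValueError("all language versions must have the same number of segments")
    corpora = {}
    # Every unordered pair of languages gets its own list of segment pairs.
    for lang_a, lang_b in combinations(sorted(aligned), 2):
        corpora[(lang_a, lang_b)] = list(zip(aligned[lang_a], aligned[lang_b]))
    return corpora


# Hypothetical three-language sample standing in for a translated novel.
novel = {
    "en": ["Hello.", "Thank you."],
    "ja": ["こんにちは。", "ありがとう。"],
    "th": ["สวัสดี", "ขอบคุณ"],
}
corpora = pairwise_corpora(novel)
print(sorted(corpora))
```

Three languages give three pairs, four would give six, and so on, which is why a single multi-language work is valuable for languages with little other training data.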
Asia Online is talking with LSPs interested in using its SMT engine
and has fielded corporate requests to use its software. We think that
its real value lies in its Google-esque plan to drive billions of eyeballs
seeking content in their own languages — and the advertising, special
offers, and the next-generation linguistic tools that are sure to
Google MT Puts Multilingual Information at More Fingertips
Donald A. DePalma 25 March 2008
As we predicted in our 2006 report on machine translation,
Google has opened its MT engine to general usage — but with no software
license or other fees. Acknowledging that automated translation right
now is all about eyeballs, Google made its newly documented AJAX Language API for Translation and Language Detection
beta release free to anyone who decides to call it. By the way, we
would have put “language detection” first in the API’s name, but Google
knows a bit more about SEO than we do.
As the name implies, you can use this application programming
interface to detect language blocks in a text and translate them.
Translation requests go to Google’s pretty good statistical MT engine (SMT). The API supports 29 language pairs
(13 languages in total), including the usual E-FIGS and CCJK plus
French<>German without involving English as the pivot language.
Translations come out exactly as Google generates them, with no option for
training the SMT engine on your particular lexicon. Nonetheless, Google
translations have proven to be very intelligible in the mash-ups that we have done or observed.
Google says that its language API is simple and easy to use — versus
an arcane call-level interface: It requires an input string to
translate, the names of the source and target languages, and a callback
function. We put that claim to the test with a short program that threw
increasingly large strings at the interface. We can attest that it is
easy to use for short strings. We did notice a couple of restrictions
in our sandbox (N.B. Common Sense Advisory Labs did not conduct
exhaustive tests on the API — rather, we ran tests until we got bored
with the permutations):
Strings. The API maxes out at around 1,200
characters per source string of plain text (figure on 100-120 words).
While that’s good for including Google’s MT in your average
application, it won’t help the average language service provider intent
on pre-translating big files.
Files and URLs. If you want to translate files, set them up as HTML pages hanging off a website and type the URL into Google’s website translator. That worked for web pages and shorter documents, but choked on the unexpurgated HTML version of “Business Without Borders” (a mere 122,000 words, give or take a couple hundred). We also tried translating the 19,000 words of Thomas Paine’s Common Sense pamphlet into Japanese and Russian. Google translates the first 5,300 words, but leaves the rest of the page in English.
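One way to work within a per-string limit like the one described above is to split longer text into chunks before sending each one for translation. The sketch below does that at sentence boundaries. It makes no call to Google's API. The 1,200-character figure is the approximate limit observed in the tests above, not a documented constant, and the names MAX_CHARS and chunk_text are hypothetical.

```python
import re

# Approximate per-string limit observed in the tests above; an assumption,
# not a value taken from any API documentation.
MAX_CHARS = 1200


def chunk_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit, so close the chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sample = "Common sense. " * 300
chunks = chunk_text(sample)
print(len(chunks), max(len(c) for c in chunks))
```

Splitting at sentence boundaries matters more for MT than for most text processing, because an engine that sees half a sentence has less context to translate from.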
Google’s AJAX Language API page promises future enhancements. We
expect longer strings, named files, and longer documents to be part of
future releases. What’s less likely in free Google MT are commercial
features such as lexical tuning by company, industry-specific
glossaries, or the feedback loop available since 2005 in Language Weaver (although Google does have a generalized “train the engine” function).
For information consumers and seekers of truth in languages other
than their own, these advances will be good news. Higher quality, free
machine translation utilities will lead to MT popping up in more and
more applications.
For translators who don’t own translation memory software, we think that Google remains a great candidate for offering a gmail-like translation environment, replete with MT.
Smart LSPs should seriously consider preprocessing small projects
through the Google engine and — depending on the output — decide
whether it is worth post-editing or fully translating the text. After
all, they really don’t have anything to lose and could increase the
productivity of their translators.
Competing MT engines will need to move fast to stay ahead of the
ad-funded portal. This API will make life difficult for the already
besieged smaller players trying to sell their wares in a market
monetized more by search and eyeballs than by software license revenue.
Companies like SpeakLike and Transclick (one of 391 World Economic Forum Technology Pioneers) will likely add the Google engine to their suites of MT engines. Meanwhile, we don’t expect companies like Asia Online,
Language Weaver, Microsoft, PROMT, SDL, SYSTRAN, and others with their
own MT engines and advancing research to sit on the callable MT
sidelines for long.
Earlier today we spoke with Dimitris Sabatakakis, CEO at SYSTRAN,
who said that “all MT providers should thank Google for the hype and
excitement it brings as MT is now perceived as a practical and usable
technology. This means there are more potential customers interested in
a MT product or solution. Google’s investment in MT is proof that MT is
a key technology for the emerging market and provides a solution to a
real need. It is forcing all providers to raise their respective bars.
If we stay static, we will collapse.”
-------------------------
Chevy “Nova”: Updating Bad Translation Apocrypha
Donald A. DePalma 6 February 2008
Not an hour goes by that we don’t receive an e-mail announcing a press
release from a vendor. What we find most interesting is when a company
issues a press release but fails to tell us (or anybody else) that it’s
out there. That happened back in May when SDL noted that “Spanish
leaves global marketers lost in translation.” Quoting the press
release, “According to SDL, the top five worst translation mistakes
made by companies looking to expand into the Spanish-speaking world”
were the usual hackneyed examples of bad translation. These included “I saw the Pope” (el Papa) translated as “I saw the potato” (la papa),
the “Got milk?” slogan rendered as “Are you lactating?” in Spanish, and
Parker introducing its non-leaking fountain pen in Spain with the
slogan “it won’t leak in your pocket and embarrass you,” with the
translator buddying up with a false friend (embarazar means pregnant, not embarrassed). At least they left out the old chestnut about the Chevy Nova (no va — get it?) in Latin America and the rumored over-medicated U.S. Latina who interpreted the “once a day” on her prescription as “11 times a day.”
What’s going on here? It’s all about search engine optimization. SDL
cited these examples plus economic figures for Latin American growth to
improve its SEO rankings for the Hispanic market.
The company’s CMO figured that becoming associated with these sometimes
apocryphal mistranslations was a good way to improve SDL’s search
engine rankings. Of course, we’re doing the same here by recycling
these oft-told tales of mistranslation.
But wait — there are some really good examples of bad translations
and cross-border mistakes out there. Here are a few of our favorites:
For our 2002 keynote at the SAE’s TopTec Multilingual Communication
for the Automotive Industry conference, we found candidates for “Bad
Product Name of the Year” among Japanese car makers selling in Latin
America: Mazda Laputa (interpreted by Spanish speakers as la puta),
Mitsubishi Pajero (slang for onanist), and Nissan Moco (snot). In that
speech we cited an auto show description of the Laputa that might not
be suitable for children — “Laputa ha mejorado su seguridad y ampliado
su interior… Cuerpo diseñado para resistir impactos frontales.” Check
that out at Yahoo! or Google free MT sites.
More recently, Car and Driver
magazine reviewed the translated claims of Chinese automakers at the
Detroit Auto Show. The brochure for the Liebao CS6 SUV claimed “Gene of
being Wild: VM engine brings you the long-awaited shock… only by
stepping on the accelerograph, the mph will come to the peak in a
second” and the BYD F3 sedan has “fuel efficiency stomach.”
Back to the subject of product names, we noticed a stand for a firm
selling “Hyper STD” at the tekom conference in Wiesbaden, Germany last
November (see photo above). Yuck! Most American buyers would steer
clear of products associated with Sexually Transmitted Diseases.
When we tried the WiFi at the tekom conference Hotel Klee am Park
in Wiesbaden, we read the English-language instructions that told us:
“General technical supposition is a reticulation-card. Please arrange
your reticulation-card to IP (automatic internet register).” Huh?
The classic post-Sputnik mistranslation of “wet sheep” for
“hydraulic rams” in a Soviet science journal is an under-used
example. That’s baaaad! Next time you think about referencing the Nova,
try this one instead.
A friend who was an interpreter at the United Nations told us about
a colleague who tried to amplify an emotionally-delivered idiomatic
expression, suggesting that “we need to grab the bull by something
other than the horns.” Ouch.
But bad translations aren’t always funny. They can have serious consequences:
Financial markets will shake. Back in May 2005 a reporter for the China News Service pieced together a story about how currency appreciation might affect the market.
The People’s Daily had it translated into English without the
subjunctive mood, stating that China decided to revalue its currency
1.26% a month for a year. Bloomberg’s spider in London picked up the
story and European equity markets rose on the news. While it was
quickly repudiated, the error did cause market tremors.
Armies can advance without consequence. In August
1968 U.S. Army transcribers reportedly wrote down a transmission from a
Soviet tank column as “my perexali most” rather than “my priexali v
Most.” What was heard (a routine bridge-crossing exercise by a tank
column) was not what happened (the arrival of Soviet tanks in Most, a
city in sovereign Czechoslovakia).
Countries might disappear. In October 2005 Iranian President Mahmoud Ahmadinejad
reportedly called for Israel to be wiped off the map, but apparently he
really “just” wanted to get rid of its government. True to form,
Ahmadinejad didn’t clarify his remarks after the mistranslation,
further complicating matters.
Companies will get into trouble. A senior executive at Yahoo! had to apologize for not giving U.S. Congressmen information about the company’s role in the imprisonment of a Chinese dissident, Shi Tao. According to Yahoo!, a bad translation by an employee of a 2004 order from the Chinese government caused the problem.
None of the mistakes after the “But wait” in this posting were machine translation miscues — they’re just bad translations by humans. Caveat lector!
JAJAH Advances Machine Interpretation
Renato Beninatto and Nataly Kelly 12 August 2008
Filed under (Interpretation, Translation & Localization, Translation Technologies, Language Industry)
When we first heard about JAJAH’s extremely simple process
for providing machine-based telephone interpretation, it sounded too
good to be true. The process consists of three easy steps: simply
dial a number from any phone, speak in English, and hand your phone to
the person who speaks Mandarin. The way it is described, the service
would seem to automate much of human interpreters’ work, and would be
particularly helpful for situations in which telephone interpreters are
used. As usual, if it sounds too good to be true, it probably is.
We tested the service, currently touted as a way to help travelers overcome language barriers in China, just in time for the Beijing Olympics.
We conducted several tests and found that the service seemed to work
quite well at some levels, in that it did correctly render some of our
words into the target languages. However, the voice recognition
component misunderstood some of our words, even when we conducted tests
with speakers of native and near-native English. To test the service in
Mandarin, we used voice-over samples recorded by professional talent,
and the results were a bit difficult to understand in English — then
again, we purposely used samples with brand names that we knew tend to
be problematic for machine translation tools. Now that we’ve aired our
complaints, let’s take a look at a few points on the bright side of
You get what you pay for — at least, in the early stages. The
service is free, so it should come as no surprise that it does not work
perfectly yet. In spite of the disjointed target language versions we
received in English and the fact that telephony provider JAJAH went
with another Babel theme, we do not believe that the localization world
will automatically relegate it to the role of industry laughingstock,
as happened with BabelFish.
Free machine-based telephone interpretation is a first. At Common Sense Advisory, we’ve been writing more in the past few months about the trend we are noticing toward computer-assisted interpretation (CAI)
and the future synergies between translation memory and what we refer
to as interpretation memory (IM) — pre-translated and pre-recorded
words and phrases that serve to partially automate the process of
interpretation. This additional focus in our research is intentional —
CAI has already been widely implemented for devices used by the
military, but this is one of the first instances we’re aware of that
offers such a service for free, on-demand, via telephone, and to the
general public. This type of service pushes CAI to a new level.
Savvy developers will want to take note. This
offering from JAJAH may not appear at first to represent a major
technological advancement, but it does prove to the world that machine
interpretation (MI) is possible, even if the quality is not yet up to
par. LSPs — especially telephone interpretation providers
— and technology companies that aim to stay ahead of the curve are
well-served to keep CAI and MI on their radar. We predict that more and
more of these services will begin to spring up soon.
Even for the traveler who is willing to hit the re-dial button a few
times and is able to accept an imperfect rendition, this service may be
of limited use. While it’s certainly not as costly as some of the
phone-based Chinese interpretation services that have recently been
profiled in the Wall Street Journal
and other media as services for travelers to the Olympics, it could
prove to be cost-prohibitive for a person dialing the number repeatedly
and trying to confirm the recording’s accuracy while sitting in a taxi
in Beijing with the meter running — especially if proper nouns, such as
the hotel name, are rendered incorrectly. That’s precisely what
happened in our example — take a look at the video below and judge for
yourself. In summary, we don’t see this service replacing the need for
phone-based interpreters anytime soon, but the general impact — and
possibilities — for the language services industry are definitely worth
Google Shakes Up the Translation Memory Scene
Nataly Kelly 8 August 2008
Filed under (Translation & Localization, Translation Technologies, Language Industry)
This week, there were rumblings about the forthcoming beta release of Google’s new translation management system (TMS), called Translation Center. If you’re familiar with Google Translate,
you might be thinking, “Big deal, this is just a low-tech, human
version of what they’re already doing.” If so, you would be wrong: This
is big news for the practice of translation. It seems that Google has
been stalking the sector.
We predicted in 2006 that Google would open up its statistical machine translation engine for general usage — and so it did, as we reported in March 2008. Last December, we published our first report on collaborative translation,
in which we explained how collaboration tools and open source concepts
could increase translation efficiency. We’ve written about the merits of crowdsourcing and how companies like Facebook, Google, and Sun Microsystems have pioneered work in this area.
Google seems to have been listening. In December of 2007, we suggested a gmail-like model
for translation memory and forecasted that a company from outside the
language industry with no interest in selling tools — such as Ask,
Google, or Yahoo! — might be well-served to make such an offer. Google
has apparently done just that. It claims that its new translation
management system (TMS) gives users the ability to request
translations, find translators, and upload documents for translation
into more than 40 languages. It also enables freelancers to create and
review content in their languages using free translation tools. Yes,
Why would Google take an interest in supporting human translation
activities? One big reason: It needs human support in order to build up
its translation memory, so that Google Translate can evolve from a “me
translate pretty one day” prototype to a reputable and reliable
language conversion machine. True, there are some large sources of free
translation memory out there already — such as the enormous database
offered by the European Parliament.
But, to truly enable mass quantities of information to be shared around
the globe, Google needs richer, vaster sources of TM than what’s
currently in the public domain. After all, the typical web user might
want to communicate now and then regarding things other than, say,
official EU declarations and proceedings.
Adding humans to the mix enables Google to gradually create a very
large storehouse of translated words and phrases — exactly what TAUS is
aiming for with its data sharing initiative and what Asia Online is doing with its human-enhanced statistical MT engine. In a nutshell, Google will unite its cloud with the crowd to get as many helping hands on the job as it can.
We’ll reserve our detailed comments on Google Translation Center
until we can actually try it out for ourselves and see how it fares
alongside other TMS programs — our in-depth report with translation management system scorecards
for translation management suppliers will be published soon — but the
big picture value of this news for the industry is clear. Even in its
beta form, Google Translate showed decent promise for the future of automating written language mediation — it is a well-built machine translation engine.
What separates Google from the rest of the MT field is that this
machine is backed up by a manufacturer with plenty of money, data
center power, disk space, and network infrastructure, not to mention
expertise in the assembly and productization of raw information
materials. But now, with the addition of humans, it has the opportunity
to become well-oiled in addition to having a sturdy construction. What
remains to be seen is if Google can find enough oil to maximize MT
performance. Thankfully, translation memory is a plentiful resource —
one that won’t require any drilling.
--- On Sat, 1/31/09, Don Osborn <dzo at bisharat.net> wrote:
From: Don Osborn <dzo at bisharat.net>
Subject: RE: Cheeseburgery hamburgers and the problem of computerised translations
To: lgpolicy-list at ccat.sas.upenn.edu
Date: Saturday, January 31, 2009, 9:47 AM
We all know MT (machine translation, aka computerized translation) is not
perfect so I don't think this piece was particularly informative.
The only news I see in it is that there is MT for Polish <-> English
(probably has been for a while but this is the first note I've made of
it). Given what must be necessary to develop MT, it does not surprise me if
a recently developed program churns out some cheeseburgery results (though I
wonder who put that word in the lexicon).
While on the topic, my favorite MT mistranslation was with an older version
of Systranet.com (results duplicable on Babelfish): "discussion on fonts"
in English became in Portuguese the equivalent of "quarrels in baptismal
basins." Such blatantly outrageous results, though, speak to me as a
non-specialist in the matter more of how the MT was set up than any inherent
problem with setting up MT. Discussion in English is not really a synonym
with its apparent cognates in Latin languages (at least French &
Portuguese); and how often do English speakers use "font" to describe what
in Portuguese they call pias baptismas? I've never heard of cheeseburgery
before but will surely find a way to use it in conversation sometime - just
not in MT.
The real news is how useful MT can be in sorting through the gist of things
in diverse languages, and how with new approaches the results are improving
significantly. I hope FT takes a look at that, and how the complex and
uneven progress in MT is changing the way we access and use multilingual
content and documents.
> -----Original Message-----
> From: owner-lgpolicy-list at ccat.sas.upenn.edu [mailto:owner-lgpolicy-
> list at ccat.sas.upenn.edu] On Behalf Of Harold Schiffman
> Sent: Tuesday, January 27, 2009 11:18 AM
> To: lp
> Subject: Cheeseburgery hamburgers and the problem of computerised translations
> Cheeseburgery hamburgers and the problem of computerised translations
> January 26, 2009 by Tony Barber
> This morning I found myself on a public platform in a Brussels hotel
> for my first ever European bloggers' conference. As a representative
> of an "establishment" news organisation, I was half-expecting to be
> roasted alive. But in the end both Mark Mardell of the BBC, my friend
> and fellow-guest, and I got through it safely enough. The most
> perceptive contribution, I thought, came from a Romanian blogger who
> made the point that the global blogosphere remains to a large extent
> divided by language. For example, you can blog all you like in
> Romanian, but most of the world won't have a clue what you're
> A moderator responded to this by saying, "Try using
> translation." As I drifted back to my office, I recalled that the last
> time I'd experimented with computers striving to change Italian into
> English or Dutch into Spanish, the results had been pretty hopeless.
> Perhaps things had improved over the last couple of years?
> Well, below are three examples of computerised translation - courtesy
> of Google Language Tools - from French, German and Polish into
> English. I am republishing the translations exactly as they came out,
> punctuation mistakes and all, after I hit the button.
> 1) This is from a news story in Le Monde about US and European policy
> in the Middle East. "Believing that the war in Gaza has imposed new
> priorities and the administration of the new American president,
> Barack Obama, might break with the unconditional support to Israel,
> French diplomacy is trying to print in Europe, a change of tone
> against the Hamas."
> As you can see, this translation starts off promisingly. In fact, it
> scarcely puts a foot wrong until it loses control and talks, weirdly,
> about printing changes of tone against the Hamas. Still, we sort of
> know what's going on here. 7 out of 10 for Monsieur L'Ordinateur.
> 2) Now here's a sentence from a story in Germany's Süddeutsche Zeitung
> about the US prison centre at Guantánamo and what Europe can do to
> help close it down. "The fate Released Guantanamo prisoners ensures
> fierce debates: Union politicians criticized the foreign ministers of
> Vorpreschen Stein Meier - and refer the responsibility for the inmates
> to the U.S."
> This is a pretty poor effort, Herr Computer. Particularly
> disappointing is the omission of the preposition "of" between "fate"
> and "released" (which also shouldn't have a capital R), and the
> baffling three words "Vorpreschen Stein Meier". But let's be
> there's a modest degree of sense here. 5.5 out of 10.
> 3) Lastly, here's a sentence from the Polish newspaper Gazeta Wyborcza
> on French leisure habits during the recession. "Economic crisis and
> changing lifestyles, the French seriously affect the profits of French
> cafes and restaurants. A sign of the collapse of the French culture of
> the restaurant is visible on the streets of Paris rash of
> quick-service bar, offering generally pogardzane a few years ago and
> cheeseburgery hamburgers."
> No, dear readers, you have not gone potty. That's what it says. And I
> am afraid, Pan Komputer, that it's utter gibberish. You get 2 out of
> 10 - and an hour's detention in the language lab.
> N.b.: Listing on the lgpolicy-list is merely intended as a service to
> its members
> and implies neither approval, confirmation nor agreement by the owner
> or sponsor of
> the list as to the veracity of a message's contents. Members who
> disagree with a
> message are encouraged to post a rebuttal. (H. Schiffman, Moderator)