Cheeseburgery hamburgers and the problem of computerised translations

Harold Schiffman haroldfs at
Sat Jan 31 16:06:49 UTC 2009

I'm replying primarily to Don Osborn on this because he raised the issue of how
relevant it is to language policy, or at least how relevant to this list. I
forwarded the "cheeseburgery" message because I felt that relying on MT for
translation is often an excuse for allowing a maximized use of English in
various contexts, and then assuming that MT will take care of the "problem".

Other responders have shown here that some of the MT systems do a pretty
good job, especially with "technical" language. But I had experience with the
opposite a few years ago, when I helped edit an issue of a journal on the topic
of the sociolinguistics of minority languages in France. All the articles were
in English, but it was my job to make them sound better in English, since they
had all been written by non-mother-tongue speakers of English. But one in
particular had been machine-translated from French, and I found it impossible
to make it sound like "good" English. It was beyond help. We finally went back
to square one and had it translated again, by a human being.

Maybe sociolinguistics is beyond the domain of "technical" writing, and maybe
that was the reason the MT failed.

But I do think MT is a "policy" issue and will continue to haunt us in anything
we write that involves metaphor, figurative language, poetry, and stuff like
that.


On Sat, Jan 31, 2009 at 10:54 AM, Al Haraka <alharaka at> wrote:
> Nataly,
> Thanks for the great response.  I was very into this in college and took the
> only classes available at my school on NLP.  This is a good review.  I will
> definitely read that article!
> Cheers,
> _AJS
> Nataly Kelly wrote:
>> Google's statistical MT engine is available
>> in the following languages: Albanian, Arabic, Bulgarian, Catalan, Chinese,
>> Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish,
>> French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Indonesian,
>> Italian, Japanese, Korean, Latvian, Lithuanian, Maltese, Norwegian, Polish,
>> Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish,
>> Thai, Turkish, Ukrainian and Vietnamese.
>> I will paste below a few recent Watchtower blog entries (independent
>> industry commentary) that might be of interest on the topic of both
>> rules-based and statistical MT. I would recommend clicking on the actual
>> page URLs to see the related links, videos and images in case they do not
>> display properly here. However, these just give a snapshot of the state of
>> the market, and do not dive into the technical details of the machine
>> translation engines. Those are often the subject of papers and presentations
>> within the localization and computational linguistics conference circuits.
>> For some types of projects, MT can actually work well, especially for
>> controlled language and technical content. The Pan American Health
>> Organization has had great success using their MT engine for technical
>> content. It is one of the best examples I have seen of domain-specific MT.
>> More information:
>> There are currently several language service providers (LSPs) whose
>> business model is centered around using free or nearly-free machine
>> translation with human post-editing. However, MT is also widely used for
>> gisting and is particularly helpful for scanning a large corpus to determine
>> which areas might require TM+post-editing or computer-assisted translation
>> (CAT) performed by humans but made easier through the use of translation
>> memory and software tools that aid with flagging repeated text so that it
>> only has to be translated once, terminology extraction and management tools
>> for ensuring consistent use of terminology, etc.
>> Another growing trend is machine interpretation (total automation of
>> spoken language interpretation), so I'll include one post on that topic
>> below as well.  Computer-assisted interpretation (CAI) is another growing
>> trend, in which both end users and interpreters themselves are making
>> greater use of software, handheld devices, and desktop applications to
>> facilitate interpretation tasks.
>> I hope some of these blog posts will be useful to colleagues, although it
>> is important to remember that, as blog entries, they provide just a snapshot
>> of the current trends in the language services market. A great many books
>> and journal articles exist on these topics that would lend greater insight
>> to those interested in the current state of the research.
>> Nataly Kelly
>> --------------------------
>> How Good Is Machine Translation? A Modest Test
>> Donald A. DePalma 30 October 2007
>> The Wall Street Journal
>> recently opined that "translation software is at last good enough to help
>> companies do business in other languages," noting a hoary case study from
>> Ford and posturings from Google, Microsoft, and SDL — and few real examples.
>> But that's fine. The Journal has just discovered MT, perhaps looking for
>> juicier stories to put on its Page 3 as Rupert Murdoch's News Corp takes
>> over as dowager queen of the print media.
>> That said, MT is definitely on the must-review list for many companies and
>> government agencies, but few are paying for it today. The biggest use of
>> automated translation is free online machine translation (OLMT). How
>> widespread? Last year Common Sense Advisory asked 2,430 consumers in
>> non-Anglophone countries
>> whether they tried free OLMT — more than half said that they sometimes,
>> frequently, or always use machine translation to better understand
>> English-language websites.
>> So, like it or not, information consumers will get what they want
>> translated by a quick hop over to their favorite free MT site. How good will
>> the translations be? Let's consider some skeptical lines penned by our
>> resident Carioca as he read our entry about changes in the MT guard. We
>> decided to paste these immortal words into several free OLMT sites to test
>> Portuguese into English translations rather than come up with the MT
>> equivalent of "the quick brown fox jumped over the lazy brown dog." One of
>> the systems used statistical machine translation (SMT), the other 3 were
>> rules-based (RBMT) systems (see the full version of Automated Machine
>> Technology for a discussion of the differences, pros, and cons).
>>    "Escrevo estas linhas em português para testar o novo método de
>>    tradução automática do Google. Eu não tenho grandes expectativas com
>>    relação a este tipo de serviço. Confesso que sou usuário freqüente
>>    para ter acesso a informações em idiomas que não domino, como o
>>    alemão, chinês e japonês. O Google agora utiliza tradução automática
>>    com base estatística. Quer dizer, quanto mais traduções houver de
>>    uma frase, mais correta será a solução oferecida pelo computador."
>>    * Google (SMT): "I
>>      write these lines in Portuguese to test the new method of
>>      automatic translation of Google. I do not have great expectations
>>      with respect to this type of service. I confess that I am a
>>      frequent user to access information in languages that no domain,
>>      such as German, Chinese and Japanese. Google now uses machine
>>      translation based statistics. That is, the more there translation
>>      of a phrase, more correctly is the solution offered by the computer."
>>    * PROMT (RBMT): "I write these lines in
>>      Portuguese to test the new method of automatic translation of the
>>      Google. I have not big expectations regarding this type of
>>      service. I confess that I am a frequent user to have access to
>>      informations in languages that I do not dominate, like German,
>>      Chinese and Japanese. The Google now uses automatic translation
>>      with statistical base. It means, how much more translations will
>>      be of a sentence, more correct will be the solution offered by the
>>      computer."
>>    * SDL (RBMT): "I write these lines
>>      in Portuguese for quiz the new approach of automatic translation
>>      of the Google. I do not have big expectations regarding this kind
>>      of service. Confessed that I am user frequent for have access the
>>      information in languages that do not dominate, as the German,
>>      Chinese and Japanese. The Google now utilizes automatic
>>      translation with statistical base. It want to say, specially
>>      translations will have of a phrase, more correct will be the
>>      solution offered by the computer."
>>    * SYSTRAN (RBMT): "I write these lines
>>      in Portuguese to test the new method of automatic translation of
>>      the Google. I do not have great expectations with regard to this
>>      type of service. I confess that I am using frequent to have access
>>      the information in languages that I do not dominate, as the
>>      German, Chinese and Japanese. The Google now uses automatic
>>      translation with base statistics. It wants to say, the more
>>      translations will have of a phrase, more correct will be the
>>      solution offered for the computer."
>> Judge for yourself. While none of these are perfect translations and one
>> is definitely not at the quality level of the others, all 4 tell us that
>> Senhor Beninatto wasn't writing a shopping list for "pound pastrami, can
>> kraut, six bagels." For many web browsers, that ability to determine the
>> subject of a communication will be good enough, allowing them to determine
>> whether they want to invest more time in a given piece of information.
>> Obviously, in more complex domains and in printed communications like
>> owner's manuals for a Porsche 911 GT3 RS (Santa, are you
>> listening?) or how to adjust the control rods for a nuclear fission reactor,
>> tuning and accuracy will be much more of an issue.
>> ----------------------------
>> Changing of the Guard in Machine Translation
>> Donald A. DePalma 30 October 2007
>> Most information will never be translated by humans from its source
>> language into even one other language, much less into many. Budgets,
>> staffing, and time will always make organizations shy away from translating
>> even a small fraction of the words they have on hand. Many companies and
>> government agencies will use some form of automated translation to improve
>> services to customers and constituencies. However, many information
>> consumers will avail themselves of free online machine translation (OLMT) if
>> they don't find their language at a website.
>> Most of that free OLMT to date has been provided by SYSTRAN, a French
>> software firm that grew up during the Cold War as the Free World faced off
>> against the Moscow-led Warsaw Pact. In October new challenges arose from the
>> new guard, including the Russians themselves.
>>    * Google reportedly replaced the languages that SYSTRAN translated
>>      for it in favor of its in-house statistical machine translation
>>      (SMT) engine. Google's homegrown technology came into wide view
>>      when it won the no-holds-barred NIST Machine Translation
>>      Evaluation
>>      in 2005. Google's MT is part of the GooglePlex — that is, not yet
>>      a commercially available product, but, like its search appliance,
>>      MT could become a Google product. Try it here.
>>    * SMT-based Language Weaver opened its second sales office in Europe.
>>      After its initial success selling to certain U.S. government
>>      agencies, Language Weaver made its 2006 European debut in
>>      bureaucrat-dense, government-rich Brussels. Its latest digs are in
>>      Paris, hometown of SYSTRAN — and presumably of some commercial
>>      buyers. Free use of Language Weaver on the web is harder to find
>>      than Google or SYSTRAN. Earlier this year the company announced
>>      that the social bookmarking site Kontrib was using its technology,
>>      giving everyone a chance to see its output. Expect Language Weaver
>>      to host its own OLMT site as part of its marketing expansion.
>>    * St. Petersburg-based PROMT announced a significant uptick in the
>>      use of its free OLMT. This followed its September announcement of
>>      V7.8 with support for Windows Vista, while those fortunate enough
>>      to speak Russian already have access to Version 8.0 with its
>>      improved algorithms and usability. Try its free OLMT.
>> The bottom line: Most consumers will never buy desktop machine translation
>> software from LEC, PROMT, or SYSTRAN for their PCs, Macs, or smartphones.
>> However, they will have free MT available in the cloud from Google, Language
>> Weaver, LogoVista, Microsoft, PROMT, SYSTRAN, and through portals like
>> Yahoo! BabelFish. How well do they work? Click here for a modest example.
>> ----------------------------
>> Seeking an MT Market beyond Ad-Reading Eyeballs
>> Donald A. DePalma 25 September 2008
>> Last week, Language Weaver projected a US$67.5 billion market for digital
>> translation, enabled by advances in machine translation (MT). For the last
>> few years, we have released an annual estimate of the market for outsourced
>> translation, localization, and interpretation. For 2008, human-delivered
>> translation activities will total a hefty US$14.25 billion (see our "Ranking
>> of Top 25 Translation Agencies").
>> On the software side, we estimate that the MT software market falls well
>> short of US$100 million. Added together, there's a lot of daylight between
>> our numbers and Language Weaver's estimate. Where's the disconnect? Over the
>> last week, we've spent a lot of time talking with various people about the
>> US$67.5 billion projection.
>> Let's start off by deconstructing the 67 billion dollar number. That is an
>> estimate of the monetary value that Language Weaver thinks MT suppliers
>> "could" translate for corporations and governments; the operative phrase in
>> the company's press release is "untapped markets" where automated
>> translation could increase the volume and lower the cost of human
>> translation, which stands at current market prices of 10-40 cents per word.
>> How good is Language Weaver's sizing of the as yet unrealized market? We
>> think its number is way too low, especially as the amount of stored content
>> grows at record levels (see the figure below from our report on "Automated
>> Translation Technology").
>> The untapped market potential is much higher, but the problem is still
>> getting buyers on board. Language Weaver will target customer care, business
>> intelligence, and user-generated content, three markets where companies
>> could benefit from moving content out of linguistic silos. However, the
>> organizations today that stand to gain the most from MT are those driving
>> advertisement-reading eyeballs to their sites. The challenge
>> that Language Weaver and rival developers face is getting more people
>> accustomed to the idea of paying for MT software or SaaS solutions that will
>> help them translate their content into other languages. Three roadblocks
>> stand in the way:
>>    * *Free machine translation obscures the value.*  There's an
>>      enormous amount of content that's translated every day online
>>      using free online machine translation sites, but no one has
>>      figured out how to directly monetize those interactions. We have
>>      long contended that there's far more text that consumers,
>>      businesses, and governments might run through those engines if
>>      they could more easily plug them into workflows, e-mail systems,
>>      mobile phones, and other networked appliances. Combine a dollar
>>      figure for the unmonetized activity that's happening today at
>>      sites like Google Translate or Yahoo!'s Babel Fish with the dollar
>>      value for things that should be translated - and you've got some
>>      really big piles of zeroes. The problem is that there are usually
>>      no positive integers to the left of those zeroes. Bottom line: Too
>>      much of it is free.
>>    * *Unpaid human translation appears to be a panacea.*  Another rival
>>      to MT is community or collaborative translation for both
>>      company- and user-generated content, such as we're seeing at
>>      Facebook (social networking), Livemocha (language learning), and
>>      NetBeans (Java software development). These communities can fill
>>      some of the demand, but nowhere near all of it. That leaves a lot
>>      of information forever locked in the language in which it was
>>      created.
>>    * *An uneducated market expects too much or too little.* Potential
>>      buyers retain unrealistic (read "Star Trek" or "Hitchhiker's
>>      Guide") expectations of what they will get out of machine
>>      translation. Some ignore the quality issue altogether, posting
>>      babble-fishy output and thinking they did a
>>      good thing in providing any in-language content at all. Meanwhile,
>>      many individual translators and too many translation agencies miss
>>      the point; they think that MT threatens their livelihood rather
>>      than viewing it as a productivity enhancer.
>> That said, the corporate and governmental sectors may be turning the
>> corner vis-à-vis MT acceptance, if not purchasing. A poll conducted by the
>> International Association for Machine Translation (IAMT) and Association for
>> Machine Translation in the Americas (AMTA) for SDL,
>> another provider of machine translation technology, found that 40 percent of
>> the 385 surveyed individuals were "now" likely to use MT. Of those roughly
>> 150 receptive respondents, 62 percent said they would use it for technical
>> documentation, 49 percent for support and knowledge-based content. That's
>> good news for the MT software sector, but could be bad news if automated
>> translation merely displaces the work of traditional translation agencies
>> rather than increasing the size of the overall business.
>> --------------------------
>> Asia Online Aims to Meet Asian Content Demands with MT+
>> Donald A. DePalma 14 April 2008
>> For the last dozen or so years we've heard ourselves incessantly reminding
>> everyone that the "www" in most URLs means "worldwide web," while the "e" in
>> "e-commerce" all too often stands for English. Our research on e-GDP
>> (online GDP) and the Availability Quotient
>> demonstrated that many companies still have a long journey before they can
>> meet the demands of the world's markets for local-language content. That gap
>> is no more apparent than in Asia where the amount of in-language content is
>> dwarfed by the growing online population.
>> Just how dwarfed? Today, roughly 38% of internet users live in Asia, but
>> by 2012, that number will jump to half. However, local-language content
>> hasn't kept pace. In 2007, non-Asian languages accounted for roughly 86% of
>> the content on the web. Most of the remaining 14% was split among Japanese
>> (6%), Chinese (6%), and Korean (1.5%). All other Asian languages comprise
>> less than 0.03% of the web's content; for example, Southeast Asian languages
>> make up less than 10 million pages. Given consumer preference for content in
>> their own language, that
>> huge gap between Asian content and total online population represents a huge
>> opportunity.
>> That opportunity has not gone unnoticed. After getting an eyes-only,
>> tell-no-one pre-briefing in December, we recently spoke with Asia Online CEO
>> Dion Wiggins who called us to tell us that his portal had just scored its
>> first round of funding from JAIC, the Japanese venture
>> capital behind
>> <>,
>> among others. He also wanted to let us know that Kirti Vashee, formerly VP of
>> marketing at Language Weaver, had signed on as Asia Online's VP of sales for
>> the Americas and Europe with the responsibility for selling the commercial
>> version of its MT engine.
>> Asia Online's plans revolve around a proprietary machine translation
>> engine plus a strong support infrastructure. Humans, content, and partners
>> are key to this strategy:
>>    * *New technology.* Asia Online developed high-performance
>>      statistical machine translation (SMT) software in collaboration
>>      with University of Edinburgh professor Philipp Koehn.
>>    * *Clean corpora.* Asia Online contracts with publishers, language
>>      service providers, and eventually corporations for
>>      human-translated content to train its SMT engine. The company also
>>      crowdsources the quality via a large community of students, and
>>      feeds the validated content back into the system as training data.
>>    * *Matrixed language learning.* The SMT engine can take translations
>>      of a novel into English, Japanese, and Thai and use the
>>      permutation to train itself on English<>Thai, English<>Japanese,
>>      and Japanese<>Thai. This capability is especially important for
>>      languages that don't have enough content to feed a data-hungry
>>      statistical MT engine.
>>    * *Real-time fixes.* Its MT engine lets reviewers observe
>>      translation decisions as they are being made, allowing them to
>>      influence choices, make fixes in place, and propagate these
>>      modifications to wherever that phrase or term is used.
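
A rough illustration of the "matrixed" idea in the list above, and not Asia
Online's actual method: if the same segments exist in several languages and
are aligned by position, every pair of languages yields its own parallel
corpus. The function names and the dict layout below are assumptions made up
for this sketch.

```python
from itertools import combinations


def language_pairs(languages):
    """Return every unordered pair of languages, in sorted order."""
    return list(combinations(sorted(languages), 2))


def pair_corpora(aligned):
    """Build one parallel corpus per language pair.

    `aligned` maps a language name to a list of segments; segment i in
    every list is assumed to be a translation of the same source text.
    """
    lengths = {len(segments) for segments in aligned.values()}
    if len(lengths) > 1:
        raise ValueError("all languages must have the same number of segments")
    return {
        (a, b): list(zip(aligned[a], aligned[b]))
        for a, b in language_pairs(aligned)
    }
```

Three languages give three pairs, four give six, and so on, which is why a
single multilingual text goes further than a set of separate bilingual ones.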
>> Asia Online is talking with LSPs interested in using its SMT engine and
>> has fielded corporate requests to use its software. We think that its real
>> value lies in its Google-esque plan to drive billions of eyeballs seeking
>> content in
>> their own languages — and the advertising, special offers, and the
>> next-generation linguistic tools that are sure to follow.
>> --------------------------
>> Google MT Puts Multilingual Information at More Fingertips
>> Donald A. DePalma 25 March 2008
>> As we predicted in our 2006 report on machine translation,
>> Google has opened its MT engine to general usage — but with no software
>> license or other fees. Acknowledging that automated translation right now is
>> all about eyeballs, Google made its newly documented AJAX Language API for
>> Translation and Language Detection
>> beta release free to anyone who decides to call it. By the way, we would
>> have put "language detection" first in the API's name, but Google knows a
>> bit more about SEO than we do.
>> As the name implies, you can use this application programming interface to
>> detect language blocks in a text and translate them. Translation requests go
>> to Google's pretty good statistical MT engine (SMT). The API supports 29
>> language pairs (13 languages in total), including the usual E-FIGS and CCJK
>> plus French<>German
>> without involving English as the pivot language. Translation services are
>> what Google generates without the option for training the SMT engine on your
>> particular lexicon. Nonetheless, Google translations have proven to be very
>> intelligible in the mash-ups that we have done
>> or observed.
>> Google says that its language API is simple and easy to use — versus an
>> arcane call-level interface: It requires an input string to translate, the
>> names of the source and target languages, and a callback function. We put
>> that claim to the test with a short program that threw increasingly larger
>> strings at the interface. We can attest that it is easy to use for short
>> strings. We did notice a couple of restrictions in our sandbox (N.B. Common
>> Sense Advisory Labs did not conduct exhaustive tests on the API — rather, we
>> ran tests until we got bored with the permutations):
>>    * *Strings.* The API maxes out at around 1,200 characters per source
>>      string of plain text (figure on 100-120 words). While that's good
>>      for including Google's MT in your average application, it won't
>>      help the average language service provider intent on
>>      pre-translating big files.
>>    * *Files and URLs.* If you want to translate files, set them up as
>>      HTML pages hanging off a website and type the URL into Google's
>>      website translator. That worked for
>>      web pages and shorter documents, but choked on the unexpurgated
>>      HTML version of "Business Without Borders" (a mere 122,000 words,
>>      give or take a couple hundred). We also tried translating the
>>      19,000 words of Thomas Paine's Common Sense
>>      pamphlet into Japanese and Russian. Google translates the first
>>      5,300 words, but leaves the rest of the page in English.
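
A minimal sketch of how a caller might work around the per-string cap
described above by cutting a long text into pieces before submitting each
one. The 1,200-character figure is the approximate limit from the informal
test quoted here, not a documented value, and split_for_mt is a hypothetical
helper; no call to Google's API is shown.

```python
def split_for_mt(text, limit=1200):
    """Split text into chunks of at most `limit` characters, breaking at spaces."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into hard slices.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow, so close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Breaking at spaces keeps words whole but can still cut a sentence in two,
which may hurt the translation of the pieces on either side; splitting at
sentence boundaries first would probably be the safer design.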
>> Google's AJAX Language API page promises future enhancements. We expect
>> longer strings, named files, and longer documents to be part of future
>> releases. What's less likely in free Google MT are commercial features such
>> as lexical tuning by company, industry-specific glossaries, or the feedback
>> loop available since 2005 in Language Weaver
>> (although Google does have a generalized "train the engine" function).
>>    * For information consumers and seekers of truth in languages other
>>      than their own, these advances will be good news. Higher quality,
>>      free machine translation utilities will lead to MT popping up in
>>      more and more applications.
>>    * For translators who don't own translation memory software, we
>>      think that Google remains a great candidate for offering a
>>      gmail-like translation environment, replete with MT.
>>    * Smart LSPs should seriously consider preprocessing small projects
>>      through the Google engine and — depending on the output — decide
>>      whether it is worth post-editing or fully translating the text.
>>      After all, they really don't have anything to lose and could
>>      increase the productivity of their translators.
>>    * Competing MT engines will need to move fast to stay ahead of the
>>      ad-funded portal. This API will make life difficult for the
>>      already besieged smaller players trying to sell their wares in a
>>      market monetized more by search and eyeballs than by software
>>      license revenue. Companies like SpeakLike and Transclick (one of
>>      391 World Economic Forum Technology Pioneers)
>>      will likely add the Google engine to their suites of MT engines.
>>      Meanwhile, we don't expect companies like Asia Online, Language
>>      Weaver, Microsoft, PROMT, SDL, SYSTRAN, and others with their own
>>      MT engines and advancing research to sit on the callable MT
>>      sidelines for long.
>> Earlier today we spoke with Dimitris Sabatakakis, CEO at SYSTRAN, who said
>> that "all MT providers should thank Google for the hype and excitement it
>> brings as MT is now perceived as a practical and usable technology. This
>> means there are more potential customers interested in a MT product or
>> solution. Google's investment in MT is proof that MT is a key technology for
>> the emerging market and provides a solution to a real need. It is forcing
>> all providers to raise their respective bars. If we stay static, we will
>> collapse."
>> -------------------------
>> Chevy "Nova": Updating Bad Translation Apocrypha
>> Donald A. DePalma 6 February 2008
>> Not an hour goes by that we don't receive an e-mail announcing a press
>> release from a vendor. What we find most interesting is when a company
>> issues a press release but fails to tell us (or anybody else) that it's out
>> there. That happened back in May when SDL noted that "Spanish leaves global
>> marketers lost in translation." Quoting the press release, "According to
>> SDL, the top five worst translation mistakes made by companies looking to
>> expand into the Spanish-speaking world" were the usual hackneyed examples of
>> bad translation. These included "I saw the Pope" (/el Papa/) translated as
>> "I saw the potato" (/la papa/), the "Got milk?" slogan rendered as "Are you
>> lactating?" in Spanish, and Parker introducing its non-leaking fountain pen
>> in Spain with the slogan "it won't leak in your pocket and embarrass you,"
>> with the translator buddying up with a false friend (/embarazar/ means
>> to make pregnant, not to embarrass). At least they left out the old chestnut about
>> the Chevy Nova (/no va/ — get it?) in Latin America and the rumored
>> over-medicated U.S. Latina who interpreted the "/once/ a day" on her
>> prescription as "11 times a day."
>> What's going on here? It's all about search engine optimization. SDL cited
>> these examples plus economic figures for Latin American growth to improve
>> its SEO rankings for the Hispanic market. The company's CMO figured that
>> becoming associated with these sometimes apocryphal mistranslations was a
>> good way to improve SDL's search engine rankings. Of course, we're doing the
>> same here by recycling these oft-told tales of mistranslation.
>> But wait — there are some really good examples of bad translations and
>> cross-border mistakes out there. Here are a few of our favorites:
>>    * For our 2002 keynote at the SAE's TopTec Multilingual
>>      Communication for the Automotive Industry conference, we found
>>      candidates for "Bad Product Name of the Year" among Japanese car
>>      makers selling in Latin America: Mazda Laputa (interpreted by
>>      Spanish speakers as /la puta/), Mitsubishi Pajero (slang for
>>      onanist), and Nissan Moco (snot). In that speech we cited an auto
>>      show description of the Laputa that might not be suitable for
>>      children — "Laputa ha mejorado su seguridad y ampliado su
>>      interior… Cuerpo diseñado para resistir impactos frontales." Check
>>      that out at Yahoo! or Google free MT sites.
>>    * More recently, Car and Driver
>>      magazine reviewed the translated claims of Chinese automakers at
>>      the Detroit Auto Show. The brochure for the Liebao CS6 SUV claimed
>>      "Gene of being Wild: VM engine brings you the long-awaited shock…
>>      only by stepping on the accelerograph, the mph will come to the
>>      peak in a second" and the BYD F3 sedan has "fuel efficiency stomach."
>>    * Back to the subject of product names, we noticed a stand for a
>>      firm selling "Hyper STD" at the tekom conference in Wiesbaden,
>>      Germany last November (see photo above). Yuck! Most American
>>      buyers would steer clear of products associated with Sexually
>>      Transmitted Diseases.
>>    * When we tried the WiFi at the tekom conference Hotel Klee am Park
>>      in Wiesbaden, we read the English-language instructions that told
>>      us: "General technical supposition is a reticulation-card. Please
>>      arrange your reticulation-card to IP (automatic internet
>>      register)." Huh?
>>    * The classic post-Sputnik mistranslation of "wet sheep" for
>>      "hydraulic rams" in a Soviet science journal is an under-used
>>      classic example. That's baaaad! Next time you think about
>>      referencing the Nova, try this one instead.
>>    * A friend who was an interpreter at the United Nations told us
>>      about a colleague who tried to amplify an emotionally-delivered
>>      idiomatic expression, suggesting that "we need to grab the bull by
>>      something other than the horns." Ouch.
>> But bad translations aren't always funny. They can have serious
>> consequences:
>>    * *Financial markets will shake.* Back in May 2005 a reporter for
>>      the China News Service pieced together a story about how currency
>>      appreciation might affect the market.
>>      The People's Daily had it translated into English without the
>>      subjunctive mood, stating that China decided to revalue its
>>      currency 1.26% a month for a year. Bloomberg's spider in London
>>      picked up the story and European equity markets rose on the news.
>>      While it was quickly repudiated, the error did cause market tremors.
>>    * *Armies can advance without consequence. *In August 1968 U.S. Army
>>      transcribers reportedly wrote down a transmission from a Soviet
>>      tank column as "my perexali most" rather than "my priexali v
>>      Most." What was heard (a routine bridge-crossing exercise by a
>>      tank column) was not what happened (the arrival of Soviet tanks in
>>      Most, a city in sovereign Czechoslovakia).
>>    * *Countries might disappear.* In October 2005 Iranian President
>>      Mahmoud Ahmadinejad
>>      reportedly called for Israel to be wiped off the map, but
>>      apparently he really "just" wanted to get rid of its government.
>>      True to form, Ahmadinejad didn't clarify his remarks after the
>>      mistranslation, further complicating matters.
>>    * *Companies will get into trouble.* A senior executive at Yahoo!
>>      had to apologize for not giving U.S. Congressmen information about
>>      the company's role in the imprisonment of a Chinese dissident, Shi
>>      Tao. According to Yahoo!, a bad translation by an employee of a
>>      2004 order from the Chinese government caused the problem.
>> None of the mistakes after the "But wait" in this posting were machine
>> translation miscues; they're just
>> bad translations by humans. Caveat lector!
>> --------------------------
>> JAJAH Advances Machine Interpretation
>> Renato Beninatto and Nataly Kelly 12 August 2008
>> Filed under (Interpretation, Translation & Localization,
>> Translation Technologies, Language Industry)
>> 2 pepper rating
>> When we first heard about JAJAH's extremely simple process for providing
>> machine-based telephone interpretation, it sounded too good to be true. The
>> process consists of three easy steps: simply dial a number from any phone,
>> speak in English, and hand your phone to the person who speaks Mandarin. The
>> way it is described, the service would seem to automate much of human
>> interpreters' work, and would be particularly helpful for situations in which
>> telephone interpreters are used. As usual, if it sounds too good to be true,
>> it probably is. We tested the service, currently touted as a way to help
>> travelers overcome language barriers in China, just in time for the Beijing
>> Olympics.
>> We conducted several tests and found that the service seemed to work quite
>> well at some levels, in that it did correctly render some of our words into
>> the target languages. However, the voice recognition component misunderstood
>> some of our words, even when we conducted tests with native and
>> near-native speakers of English. To test the service in Mandarin, we used voice-over
>> samples recorded by professional talent, and the results were a bit
>> difficult to understand in English — then again, we purposely used samples
>> with brand names that we knew tend to be problematic for machine translation
>> tools. Now that we've aired our complaints, let's take a look at a few
>> points on the bright side of this innovation:
>>    * *You get what you pay for — at least, in the early stages. *The
>>      service is free, so it should come as no surprise that it does not
>>      work perfectly yet. In spite of the disjointed target language
>>      versions we received in English and the fact that telephony
>>      provider JAJAH went with another Babel theme, we do not believe
>>      that the localization world will automatically relegate it to the
>>      role of industry laughingstock, as happened with BabelFish.
>>    * *Free machine-based telephone interpretation is a first. *At
>>      Common Sense Advisory, we've been writing more in the past few
>>      months about the trend we are noticing toward computer-assisted
>>      interpretation (CAI)
>>      and the future synergies between translation memory and what we
>>      refer to as interpretation memory (IM) — pre-translated and
>>      pre-recorded words and phrases that serve to partially automate
>>      the process of interpretation. This additional focus in our
>>      research is intentional — CAI has already been widely implemented
>>      for devices used by the military, but this is one of the first
>>      instances we're aware of that offers such a service for free,
>>      on-demand, via telephone, and to the general public. This type of
>>      service pushes CAI to a new level.
>>    * *Savvy developers will want to take note. *This offering from
>>      JAJAH may not appear at first to represent a major technological
>>      advancement, but it does prove to the world that machine
>>      interpretation (MI) is possible, even if the quality is not yet up
>>      to par. LSPs — especially telephone interpretation providers
>>      — and technology companies that aim to stay ahead of the curve are
>>      well-served to keep CAI and MI on their radar. We predict that
>>      more and more of these services will begin to spring up soon.
>> Even for the traveler who is willing to hit the re-dial button a few times
>> and is able to accept an imperfect rendition, this service may be of limited
>> use. While it's certainly not as costly as some of the phone-based Chinese
>> interpretation services that have recently been profiled in the Wall Street
>> Journal
>> and other media as services for travelers to the Olympics, it could prove to
>> be cost-prohibitive for a person dialing the number repeatedly and trying to
>> confirm the recording's accuracy while sitting in a taxi in Beijing with the
>> meter running — especially if proper nouns, such as the hotel name, are
>> rendered incorrectly. That's precisely what happened in our example — take a
>> look at the video below and judge for yourself. In summary, we don't see
>> this service replacing the need for phone-based interpreters anytime soon,
>> but the general impact — and possibilities — for the language services
>> industry are definitely worth noting.
>> ----------------------------------------------------------
>> Google Shakes Up the Translation Memory Scene
>> Nataly Kelly 8 August 2008
>> Filed under (Translation & Localization, Translation Technologies,
>> Language Industry)
>> This week, there were rumblings about the forthcoming beta release of
>> Google's new translation management system (TMS), called Translation Center.
>> If you're familiar with Google Translate,
>> you might be thinking, "Big deal, this is just a low-tech, human version of
>> what they're already doing." If so, you would be wrong: This is big news for
>> the practice of translation. It seems that Google has been stalking the
>> sector.
>> We predicted in 2006 that Google would open up its statistical machine
>> translation engine for general usage, and so it did, as we reported in March
>> 2008. Last December, we published our first report on collaborative
>> translation, in
>> which we explained how collaboration tools and open source concepts could
>> increase translation efficiency. We've written about the merits of
>> crowdsourcing
>> and how companies like Facebook, Google, and Sun Microsystems have pioneered
>> work in this area.
>> Google seems to have been listening. In December of 2007, we suggested a
>> Gmail-like model for
>> translation memory and forecasted that a company from outside the language
>> industry with no interest in selling tools — such as Ask, Google, or Yahoo!
>> — might be well-served to make such an offer. Google has apparently done
>> just that. It claims that its new translation management system (TMS) gives
>> users the ability to request translations, find translators, and upload
>> documents for translation into more than 40 languages. It also enables
>> freelancers to create and review content in their languages using free
>> translation tools. Yes, free.
>> Why would Google take an interest in supporting human translation
>> activities? One big reason: It needs human support in order to build up its
>> translation memory, so that Google Translate can evolve from a "me translate
>> pretty one day" prototype to a reputable and reliable language conversion
>> machine. True, there are some large sources of free translation memory out
>> there already — such as the enormous database offered by the European
>> Parliament.
>> But, to truly enable mass quantities of information to be shared around the
>> globe, Google needs richer, vaster sources of TM than what's currently in
>> the public domain. After all, the typical web user might want to communicate
>> now and then regarding things other than, say, official EU declarations and
>> proceedings.
>> Adding humans to the mix enables Google to gradually create a very large
>> storehouse of translated words and phrases — exactly what TAUS is aiming for
>> with its data sharing initiative and what Asia
>> Online is doing with its human-enhanced statistical MT engine. In a
>> nutshell, Google will unite its cloud with the crowd to get as many helping
>> hands on the job as it can.
>> We'll reserve our detailed comments on Google Translation Center until we
>> can actually try it out for ourselves and see how it fares alongside other
>> TMS programs — our in-depth report with translation management system
>> scorecards
>> for translation management suppliers will be published soon — but the big
>> picture value of this news for the industry is clear. Even in its beta form,
>> Google Translate showed decent promise
>> for the future of
>> automating written language mediation — it is a well-built machine
>> translation engine.
>> What separates Google from the rest of the MT field is that this machine
>> is backed up by a manufacturer with plenty of money, data center power, disk
>> space, and network infrastructure, not to mention expertise in the assembly
>> and productization of raw information materials. But now, with the addition
>> of humans, it has the opportunity to become well-oiled in addition to having
>> a sturdy construction. What remains to be seen is if Google can find enough
>> oil to maximize MT performance. Thankfully, translation memory is a
>> plentiful resource — one that won't require any drilling.
>> -----------------------------
>> --- On *Sat, 1/31/09, Don Osborn /<dzo at>/* wrote:
>>    From: Don Osborn <dzo at>
>>    Subject: RE: Cheeseburgery hamburgers and the problem of
>>    computerised translations
>>    To: lgpolicy-list at
>>    Date: Saturday, January 31, 2009, 9:47 AM
>>    We all know MT (machine translation, aka computerized translation) is not
>>    perfect so I don't think this piece was particularly informative.
>>    The only news I see in it is that there is MT for Polish <-> English
>>    (probably has been for a while but this is the first note I've made of
>>    it). Given what must be necessary to develop MT, it does not surprise me if
>>    a recently developed program churns out some cheeseburgery results (though I
>>    wonder who put that word in the lexicon).
>>    While on the topic, my favorite MT mistranslation was with an older version
>>    of (results duplicable on Babelfish): "discussion on fonts" in
>>    English became in Portuguese the equivalent of "quarrels in baptismal
>>    basins." Such blatantly outrageous results, though, speak to me as a
>>    non-specialist in the matter more of how the MT was set up than any inherent
>>    problem with setting up MT. Discussion in English is not really a synonym
>>    of its apparent cognates in Latin languages (at least French &
>>    Portuguese); and how often do English speakers use "font" to describe
>>    what in Portuguese they call pias baptismas? I've never heard of
>>    cheeseburgery before but will surely find a way to use it in conversation
>>    sometime - just not in MT.
>>    The real news is how useful MT can be in sorting through the gist of things
>>    in diverse languages, and how with new approaches the results are improving
>>    significantly. I hope FT takes a look at that, and how the complex and
>>    uneven progress in MT is changing the way we access and use multilingual
>>    content and documents.
>>    Don Osborn
>>    > -----Original Message-----
>>    > From: owner-lgpolicy-list at [mailto:owner-lgpolicy-
>>    > list at] On Behalf Of Harold Schiffman
>>    > Sent: Tuesday, January 27, 2009 11:18 AM
>>    > To: lp
>>    > Subject: Cheeseburgery hamburgers and the problem of computerised
>>    > translations
>>    >
>>    > Cheeseburgery hamburgers and the problem of computerised translations
>>    > January 26, 2009 by Tony Barber
>>    >
>>    > This morning I found myself on a public platform in a Brussels hotel
>>    > for my first ever European bloggers' conference. As a representative
>>    > of an "establishment" news organisation, I was half-expecting to be
>>    > roasted alive. But in the end both Mark Mardell of the BBC, my friend
>>    > and fellow-guest, and I got through it safely enough. The most
>>    > perceptive contribution, I thought, came from a Romanian blogger who
>>    > made the point that the global blogosphere remains to a large extent
>>    > divided by language. For example, you can blog all you like in
>>    > Romanian, but most of the world won't have a clue what you're saying.
>>    >
>>    > A moderator responded to this by saying, "Try using computer-generated
>>    > translation." As I drifted back to my office, I recalled that the last
>>    > time I'd experimented with computers striving to change Italian into
>>    > English or Dutch into Spanish, the results had been pretty hopeless.
>>    > Perhaps things had improved over the last couple of years?
>>    >
>>    > Well, below are three examples of computerised translation - courtesy
>>    > of Google Language Tools - from French, German and Polish into
>>    > English. I am republishing the translations exactly as they came out,
>>    > punctuation mistakes and all, after I hit the button.
>>    >
>>    > 1) This is from a news story in Le Monde about US and European policy
>>    > in the Middle East. "Believing that the war in Gaza has imposed new
>>    > priorities and the administration of the new American president,
>>    > Barack Obama, might break with the unconditional support to Israel,
>>    > French diplomacy is trying to print in Europe, a change of tone
>>    > against the Hamas."
>>    >
>>    > As you can see, this translation starts off promisingly. In fact, it
>>    > scarcely puts a foot wrong until it loses control and talks, weirdly,
>>    > about printing changes of tone against the Hamas. Still, we sort of
>>    > know what's going on here. 7 out of 10 for Monsieur L'Ordinateur.
>>    >
>>    > 2) Now here's a sentence from a story in Germany's Süddeutsche Zeitung
>>    > about the US prison centre at Guantánamo and what Europe can do to
>>    > help close it down. "The fate Released Guantanamo prisoners ensures
>>    > fierce debates: Union politicians criticized the foreign ministers of
>>    > Vorpreschen Stein Meier - and refer the responsibility for the inmates
>>    > to the U.S."
>>    >
>>    > This is a pretty poor effort, Herr Computer. Particularly
>>    > disappointing is the omission of the preposition "of" between "fate"
>>    > and "released" (which also shouldn't have a capital R), and the
>>    > baffling three words "Vorpreschen Stein Meier". But let's be fair,
>>    > there's a modest degree of sense here. 5.5 out of 10.
>>    >
>>    > 3) Lastly, here's a sentence from the Polish newspaper Gazeta Wyborcza
>>    > on French leisure habits during the recession. "Economic crisis and
>>    > changing lifestyles, the French seriously affect the profits of French
>>    > cafes and restaurants. A sign of the collapse of the French culture of
>>    > the restaurant is visible on the streets of Paris rash of
>>    > quick-service bar, offering generally pogardzane a few years ago and
>>    > cheeseburgery hamburgers."
>>    >
>>    > No, dear readers, you have not gone potty. That's what it says. And I
>>    > am afraid, Pan Komputer, that it's utter gibberish. You get 2 out of
>>    > 10 - and an hour's detention in the language lab.
>>    >
>>    > the-problem-of-computerised-translations/
>>    >
>>    > --
>>    > **************************************
>>    > N.b.: Listing on the lgpolicy-list is merely intended as a service to
>>    > its members and implies neither approval, confirmation nor agreement
>>    > by the owner or sponsor of the list as to the veracity of a message's
>>    > contents. Members who disagree with a message are encouraged to post a
>>    > rebuttal. (H. Schiffman, Moderator)
>>    > *******************************************
> --
> Alexander J. Stein
> Cell:  (201) 412-9479
> Email: alharaka at
> Skype: alexander.j.stein
> AIM:   elduderino6886


 Harold F. Schiffman

Professor Emeritus of
 Dravidian Linguistics and Culture
Dept. of South Asia Studies
University of Pennsylvania
Philadelphia, PA 19104-6305

Phone:  (215) 898-7475
Fax:  (215) 573-2138

Email:  haroldfs at

