Cheeseburgery hamburgers and the problem of computerised translations

Al Haraka alharaka at
Sat Jan 31 15:54:27 UTC 2009


Thanks for the great response.  I was very into this in college and took 
the only classes available at my school on NLP.  This is a good review. 
  I will definitely read that article!


Nataly Kelly wrote:
> Google's statistical MT engine is 
> available in the following languages: Albanian, Arabic, Bulgarian, 
> Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, 
> Filipino, Finnish, French, Galician, German, Greek, Hebrew, Hindi, 
> Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, 
> Maltese, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, 
> Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian and 
> Vietnamese.
> I will paste below a few recent Watchtower blog entries (independent 
> industry commentary) that might be of interest on the topic of both 
> rules-based and statistical MT. I would recommend clicking on the actual 
> page URLs to see the related links, videos and images in case they do 
> not display properly here. However, these just give a snapshot of the 
> state of the market, and do not dive into the technical details of the 
> machine translation engines. Those are often the subject of papers and 
> presentations within the localization and computational linguistics 
> conference circuits.
> For some types of projects, MT can actually work well, especially for 
> controlled language and technical content. The Pan American Health 
> Organization has had great success using their MT engine for technical 
> content. It is one of the best examples I have seen of domain-specific 
> MT.
> There are currently several language service providers (LSPs) whose 
> business model is centered around using free or nearly-free machine 
> translation with human post-editing. However, MT is also widely used for 
> gisting and is particularly helpful for scanning a large corpus to 
> determine which areas might require TM+post-editing or computer-assisted 
> translation (CAT) performed by humans but made easier through the use of 
> translation memory and software tools that aid with flagging repeated 
> text so that it only has to be translated once, terminology extraction 
> and management tools for ensuring consistent use of terminology, etc.
> Another growing trend is machine interpretation (total automation of 
> spoken language interpretation), so I'll include one post on that topic 
> below as well.  Computer-assisted interpretation (CAI) is another 
> growing trend, in which both end users and interpreters themselves are 
> making greater use of software, handheld devices, and desktop 
> applications to facilitate interpretation tasks.
> I hope some of these blog posts will be useful to colleagues, although 
> it is important to remember that, as blog entries, they provide just a 
> snapshot of the current trends in the language services market. A great 
> many books and journal articles exist on these topics that would lend 
> greater insight to those interested in the current state of the research.
> Nataly Kelly
> --------------------------
> How Good Is Machine Translation? A Modest Test 
> Donald A. DePalma 30 October 2007
> The Wall Street Journal 
> recently opined that “translation software is at last good enough to 
> help companies do business in other languages,” noting a hoary case 
> study from Ford and posturings from Google, Microsoft, and SDL — and few 
> real examples. But that’s fine. The Journal has just discovered MT, 
> perhaps looking for juicier stories to put on its Page 3 
> as Rupert Murdoch’s News 
> Corp takes over as dowager queen of the print 
> media.
> That said, MT is definitely on the must-review list for many companies 
> and government agencies, but few are paying for it today. The biggest 
> use of automated translation is free online machine translation (OLMT). 
> How widespread? Last year Common Sense Advisory asked 2,430 consumers in 
> non-Anglophone countries 
> whether they tried free OLMT — more than half said that they sometimes, 
> frequently, or always use machine translation to better understand 
> English-language websites.
> So, like it or not, information consumers will get what they want 
> translated by a quick hop over to their favorite free MT site. How good 
> will the translations be? Let’s consider some skeptical lines penned by 
> our resident Carioca as he read our entry about changes in the MT guard. 
> We decided to paste these immortal words into several free OLMT sites to 
> test Portuguese into English translations rather than come up with the MT 
> equivalent of “the quick brown fox jumped over the lazy brown dog.” 
> One of the systems used statistical machine translation (SMT), the other 
> 3 were rules-based (RBMT) systems (see the full version of Automated 
> Machine Technology 
> for a discussion of the differences, pros, and cons).
>     “Escrevo estas linhas em português para testar o novo método de
>     tradução automática do Google. Eu não tenho grandes expectativas com
>     relação a este tipo de serviço. Confesso que sou usuário freqüente
>     para ter acesso a informações em idiomas que não domino, como o
>     alemão, chinês e japonês. O Google agora utiliza tradução automática
>     com base estatística. Quer dizer, quanto mais traduções houver de
>     uma frase, mais correta será a solução oferecida pelo computador.”
>     * Google (SMT): “I
>       write these lines in Portuguese to test the new method of
>       automatic translation of Google. I do not have great expectations
>       with respect to this type of service. I confess that I am a
>       frequent user to access information in languages that no domain,
>       such as German, Chinese and Japanese. Google now uses machine
>       translation based statistics. That is, the more there translation
>       of a phrase, more correctly is the solution offered by the computer.”
>     * PROMT (RBMT): “I write these lines in
>       Portuguese to test the new method of automatic translation of the
>       Google. I have not big expectations regarding this type of
>       service. I confess that I am a frequent user to have access to
>       informations in languages that I do not dominate, like German,
>       Chinese and Japanese. The Google now uses automatic translation
>       with statistical base. It means, how much more translations will
>       be of a sentence, more correct will be the solution offered by the
>       computer.”
>     * SDL (RBMT): “I write these lines
>       in Portuguese for quiz the new approach of automatic translation
>       of the Google. I do not have big expectations regarding this kind
>       of service. Confessed that I am user frequent for have access the
>       information in languages that do not dominate, as the German,
>       Chinese and Japanese. The Google now utilizes automatic
>       translation with statistical base. It want to say, specially
>       translations will have of a phrase, more correct will be the
>       solution offered by the computer.”
>     * SYSTRAN (RBMT): “I write these lines
>       in Portuguese to test the new method of automatic translation of
>       the Google. I do not have great expectations with regard to this
>       type of service. I confess that I am using frequent to have access
>       the information in languages that I do not dominate, as the
>       German, Chinese and Japanese. The Google now uses automatic
>       translation with base statistics. It wants to say, the more
>       translations will have of a phrase, more correct will be the
>       solution offered for the computer.”
> Judge for yourself. While none of these are perfect translations and one 
> is definitely not at the quality level of the others, all 4 tell us that 
> Senhor Beninatto wasn’t writing a shopping list for “pound pastrami, can 
> kraut, six bagels.” For many web browsers, that ability to determine the 
> subject of a communication will be good enough, allowing them to 
> determine whether they want to invest more time in a given piece of 
> information. Obviously, in more complex domains and in printed 
> communications like owner’s manuals for a Porsche 911 GT3 RS 
> (Santa, are you 
> listening?) or how to adjust the control rods for a nuclear fission 
> reactor, tuning and accuracy will be much more of an issue.
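
The test paragraph above says that statistical MT gets more accurate the more translations it has seen of a phrase. A minimal sketch of that idea follows, assuming a toy phrase table built by counting; the phrase pairs are invented for illustration and the function names are hypothetical. Real SMT engines are far more elaborate than this.

```python
from collections import Counter, defaultdict

# Hypothetical aligned phrase pairs (Portuguese -> English), invented for illustration.
observed_pairs = [
    ("não domino", "I do not master"),
    ("não domino", "I do not master"),
    ("não domino", "no domain"),
    ("tradução automática", "machine translation"),
    ("tradução automática", "automatic translation"),
    ("tradução automática", "machine translation"),
]

def build_phrase_table(pairs):
    """Count how often each target phrase was seen for each source phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table

def best_translation(table, source):
    """Return the most frequently seen target phrase and its share of all sightings."""
    counts = table[source]
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())

phrase_table = build_phrase_table(observed_pairs)
best, share = best_translation(phrase_table, "não domino")
```

With more observed pairs, the counts for the common rendering grow and a stray rendering such as "no domain" loses out, which is the intuition the test paragraph describes.
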
> ----------------------------
> Changing of the Guard in Machine Translation 
> Donald A. DePalma 30 October 2007
> Most information will never be translated by humans from its source 
> language into even one other language, much less into many. Budgets, 
> staffing, and time will always make organizations shy away from 
> translating even a small fraction of the words they have on hand. Many 
> companies and government agencies will use some form of automated 
> translation to improve services to customers and constituencies. 
> However, many information consumers will avail themselves of free online 
> machine translation (OLMT) if they don’t find their language at a website.
> Most of that free OLMT to date has been provided by SYSTRAN, 
> a French software firm that grew up during the Cold War as the Free 
> World faced off against the Moscow-led Warsaw Pact. In October new 
> challenges arose from the new guard, including the Russians themselves.
>     * Google reportedly replaced the languages that SYSTRAN translated
>       for it in favor of its in-house statistical machine translation
>       (SMT) engine. Google’s homegrown technology came into wide view
>       when it won the no-holds-barred NIST Machine Translation
>       Evaluation
>       in 2005. Google’s MT is part of the GooglePlex — that is, not yet
>       a commercially available product, but, like its search appliance,
>       MT could become a Google product. Try it here.
>     * SMT-based Language Weaver opened its second sales office in Europe.
>       After its initial success selling to certain U.S. government
>       agencies, Language Weaver made its 2006 European debut in
>       bureaucrat-dense, government-rich Brussels. Its latest digs are in
>       Paris, hometown of SYSTRAN — and presumably of some commercial
>       buyers. Free use of Language Weaver on the web is harder to find
>       than Google or SYSTRAN. Earlier this year the company announced
>       that the social bookmarking
>       site Kontrib was using its technology,
>       giving everyone a chance to see its output. Expect Language Weaver
>       to host its own OLMT site as part of its marketing expansion.
>     * St. Petersburg-based PROMT announced a significant uptick in the
>       use of its free OLMT. This followed its September announcement of
>       V7.8 with support for Windows Vista, while those fortunate enough
>       to speak Russian already have access to Version 8.0 with its
>       improved algorithms and usability. Try its free OLMT.
> The bottom line: Most consumers will never buy desktop machine 
> translation software from LEC, PROMT, or SYSTRAN for their PCs, Macs, or 
> smartphones. However, they will have free MT available in the cloud from 
> Google, Language Weaver, LogoVista, Microsoft, PROMT, 
> SYSTRAN, and through portals like Yahoo! BabelFish. 
> How well do they work? Click here for a modest example.
> ----------------------------
> Seeking an MT Market beyond Ad-Reading Eyeballs 
> Donald A. DePalma 25 September 2008
> Last week, Language Weaver projected a US$67.5 billion market for 
> digital translation, enabled by advances in machine translation (MT). 
> For the last few years, we have released an annual estimate of the 
> market for outsourced translation, localization, and interpretation. For 
> 2008, human-delivered translation activities will total a hefty US$14.25 
> billion (see our “Ranking of Top 25 Translation Agencies”). 
> On the software side, we estimate that the MT software market falls 
> well short of US$100 million. Added together, there’s a lot of daylight 
> between our numbers and Language Weaver’s estimate. Where’s the 
> disconnect? Over the last week, we’ve spent a lot of time talking with 
> various people about the US$67.5 billion projection.
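
The "daylight" between the two estimates can be worked out directly from the figures quoted above. A small sketch follows; treating "well short of US$100 million" as an upper bound of US$0.10 billion is an assumption made for illustration.

```python
from decimal import Decimal

# Figures quoted in the post, in billions of US dollars.
human_translation_2008 = Decimal("14.25")
mt_software_ceiling = Decimal("0.10")  # assumption: upper bound for "well short of US$100 million"
language_weaver_projection = Decimal("67.5")

# Combined size of today's market, at most.
combined_upper_bound = human_translation_2008 + mt_software_ceiling

# How far the projection sits above that, in absolute and relative terms.
gap = language_weaver_projection - combined_upper_bound
ratio = language_weaver_projection / combined_upper_bound
```

Under that assumption the projection exceeds the combined current market by more than US$53 billion, or somewhat under five times its size.
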
> Let’s start off by deconstructing the 67 billion dollar number. That is 
> an estimate of the monetary value that Language Weaver thinks MT 
> suppliers “could” translate for corporations and governments; the 
> operative phrase in the company’s press release is “untapped markets” 
> where automated translation could increase the volume and lower the cost 
> of human translation, which stands at current market prices of 10-40 
> cents per word.
> How good is Language Weaver’s sizing of the as yet unrealized market? We 
> think its number is way too low, especially as the amount of stored 
> content grows at record levels (see the figure below from our report on 
> “Automated Translation Technology”).
> The untapped market potential is much higher, but the problem is still 
> getting buyers on board. Language Weaver will target customer care, 
> business intelligence, and user-generated content, three markets where 
> companies could benefit from moving content out of linguistic silos. 
> However, the organizations today that stand to gain the most from MT are 
> those driving advertisement-reading eyeballs to their sites. The challenge 
> that Language Weaver and rival developers face is getting more people 
> accustomed to the idea of paying for MT software or SaaS solutions that 
> will help them translate their content into other languages. Three 
> roadblocks stand in the way:
>     * *Free machine translation obscures the value.*  There’s an
>       enormous amount of content that’s translated every day online
>       using free online machine translation sites, but no one has
>       figured out how to directly monetize those interactions. We have
>       long contended that there’s far more text that consumers,
>       businesses, and governments might run through those engines if
>       they could more easily plug them into workflows, e-mail systems,
>       mobile phones, and other networked appliances. Combine a dollar
>       figure for the unmonetized activity that’s happening today at
>       sites like Google Translate or Yahoo!’s Babel Fish with the dollar
>       value for things that should be translated - and you’ve got some
>       really big piles of zeroes. The problem is that there are usually
>       no positive integers to the left of those zeroes. Bottom line: Too
>       much of it is free.
>     * *Unpaid human translation appears to be a panacea.*  Another rival
>       to MT is community or collaborative translation for both
>       company- and user-generated content, such as we’re seeing at
>       Facebook (social networking), Livemocha (language learning), and
>       NetBeans (Java software development). These communities can fill
>       some of the
>       demand, but nowhere near all of it. That leaves a lot of
>       information forever locked in the language in which it was created.
>     * *An uneducated market expects too much or too little.* Potential
>       buyers retain unrealistic (read “Star Trek” or “Hitchhiker’s
>       Guide”) expectations of what they will get out of machine
>       translation. Some ignore the quality issue
>       altogether, posting babble-fishy output and thinking they did a
>       good thing in providing any in-language content at all. Meanwhile,
>       many individual translators and too many translation agencies miss
>       the point; they think that MT threatens their livelihood rather
>       than viewing it as a productivity enhancer.
> That said, the corporate and governmental sectors may be turning the 
> corner vis-à-vis MT acceptance, if not purchasing. A poll conducted by 
> the International Association for Machine Translation (IAMT) and 
> Association for Machine Translation in the Americas (AMTA) for SDL, 
> another provider of machine translation technology, found that 40 
> percent of the 385 surveyed individuals were “now” likely to use MT. Of 
> those roughly 150 receptive respondents, 62 percent said they would use 
> it for technical documentation, 49 percent for support and 
> knowledge-based content. That’s good news for the MT software sector, 
> but could be bad news if automated translation merely displaces the work 
> of traditional translation agencies rather than increasing the size of the 
> overall business.
> --------------------------
> Asia Online Aims to Meet Asian Content Demands with MT+ 
> Donald A. DePalma 14 April 2008
> For the last dozen or so years we’ve heard ourselves incessantly 
> reminding everyone that the “www” in most URLs means “worldwide web,” 
> while the “e” in “e-commerce” all too often stands for English. Our 
> research on e-GDP 
> (online GDP) and the Availability Quotient 
> demonstrated that many companies still have a long journey before they 
> can meet the demands of the world’s markets for local-language content. 
> That gap is no more apparent than in Asia where the amount of 
> in-language content is dwarfed by the growing online population.
> Just how dwarfed? Today, roughly 38% of internet users live in Asia, but 
> by 2012, that number will jump to half. However, local-language content 
> hasn’t kept pace. In 2007, non-Asian languages accounted for roughly 86% 
> of the content on the web. Most of the remaining 14% was split among 
> Japanese (6%), Chinese (6%), and Korean (1.5%). All other Asian 
> languages comprise less than 0.03% of the web’s content; for example, 
> Southeast Asian languages make up less than 10 million pages. Given 
> consumer preference for content in their own language, 
> that huge gap between Asian content and total online population 
> represents a huge opportunity.
> That opportunity has not gone unnoticed. After getting an eyes-only, 
> tell-no-one pre-briefing in December, we recently spoke with Asia Online 
> CEO Dion Wiggins who called us to tell us that his portal had just 
> scored its first round of funding from JAIC, the Japanese 
> venture capital behind 
> <>, 
> among others. He also wanted to let us know that Kirti Vashee, formerly VP of 
> marketing at Language Weaver, had signed on as Asia Online’s VP of sales 
> for the Americas and Europe with the responsibility for selling the 
> commercial version of its MT engine.
> Asia Online’s plans revolve around a proprietary machine translation 
> engine; a strong support infrastructure of humans, content, and 
> partners is key to this strategy:
>     * *New technology.* Asia Online developed high-performance
>       statistical machine translation (SMT) software in collaboration
>       with University of Edinburgh professor Philipp Koehn.
>     * *Clean corpora.* Asia Online contracts with publishers, language
>       service providers, and eventually corporations for
>       human-translated content to train its SMT engine. The company also
>       crowdsources the quality via a large community of students, and
>       feeds the validated content back into the system as training data.
>     * *Matrixed language learning.* The SMT engine can take translations
>       of a novel into English, Japanese, and Thai and use the
>       permutation to train itself on English<>Thai, English<>Japanese,
>       and Japanese<>Thai. This capability is especially important for
>       languages that don’t have enough content to feed a data-hungry
>       statistical MT engine.
>     * *Real-time fixes.* Its MT engine lets reviewers observe
>       translation decisions as they are being made, allowing them to
>       influence choices, make fixes in place, and propagate these
>       modifications to wherever that phrase or term is used.
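
The "matrixed language learning" bullet above describes turning one text translated into three languages into training data for every language pair. A minimal sketch of that pairing step follows, assuming the versions are already aligned sentence for sentence; the sample sentences are romanized placeholders and the function name is hypothetical, not Asia Online's actual method.

```python
from itertools import permutations

# Hypothetical sentence-aligned versions of the same text (placeholder sentences).
aligned = {
    "en": ["Hello.", "Thank you."],
    "ja": ["Konnichiwa.", "Arigatou."],
    "th": ["Sawatdi.", "Khop khun."],
}

def directed_training_pairs(aligned_texts):
    """Build a sentence-pair list for every ordered (source, target) language pair."""
    lengths = {len(sentences) for sentences in aligned_texts.values()}
    if len(lengths) != 1:
        raise ValueError("all versions must be aligned sentence for sentence")
    return {
        (src, tgt): list(zip(aligned_texts[src], aligned_texts[tgt]))
        for src, tgt in permutations(aligned_texts, 2)
    }

pairs = directed_training_pairs(aligned)
```

Three languages yield six directed pairs, so a single trilingual text also feeds Japanese<>Thai, a combination for which parallel text is otherwise scarce.
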
> Asia Online is talking with LSPs interested in using its SMT engine and 
> has fielded corporate requests to use its software. We think that its 
> real value lies in its Google-esque plan to drive billions of eyeballs 
> seeking content in 
> their own languages — and the advertising, special offers, and the 
> next-generation linguistic tools that are sure to follow.
> --------------------------
> Google MT Puts Multilingual Information at More Fingertips 
> Donald A. DePalma 25 March 2008
> As we predicted in our 2006 report on machine translation, 
> Google has opened its MT engine to general usage — but with no software 
> license or other fees. Acknowledging that automated translation right 
> now is all about eyeballs, Google made its 
> newly documented AJAX Language API for Translation and Language 
> Detection beta 
> release free to anyone who decides to call it. By the way, we would have 
> put “language detection” first in the API’s name, but Google knows a bit 
> more about SEO than we do.
> As the name implies, you can use this application programming interface 
> to detect language blocks in a text and translate them. Translation 
> requests go to Google’s pretty good statistical MT engine (SMT). The API 
> supports 29 language pairs 
> (13 languages in total), including the usual E-FIGS and CCJK plus 
> French<>German without involving English as the pivot language. 
> Translation services are what Google generates without the option for 
> training the SMT engine on your particular lexicon. Nonetheless, Google 
> translations have proven to be very intelligible in the mash-ups 
> that we have 
> done or observed.
> Google says that its language API is simple and easy to use — versus an 
> arcane call-level interface: It requires an input string to translate, 
> the names of the source and target languages, and a callback function. 
> We put that claim to the test with a short program that threw 
> increasingly larger strings at the interface. We can attest that it is 
> easy to use for short strings. We did notice a couple of restrictions in 
> our sandbox (N.B. Common Sense Advisory Labs did not conduct exhaustive 
> tests on the API — rather, we ran tests until we got bored with the 
> permutations):
>     * *Strings.* The API maxes out at around 1,200 characters per source
>       string of plain text (figure on 100-120 words). While that’s good
>       for including Google’s MT in your average application, it won’t
>       help the average language service provider intent on
>       pre-translating big files.
>     * *Files and URLs.* If you want to translate files, set them up as
>       HTML pages hanging off a website and type the URL into Google’s
>       website translator. That worked for
>       web pages and shorter documents, but choked on the unexpurgated
>       HTML version of “Business Without Borders” (a mere 122,000 words,
>       give or take a couple hundred). We also tried translating the
>       19,000 words of Thomas Paine’s Common Sense
>       pamphlet into Japanese and Russian. Google translates the first
>       5,300 words, but leaves the rest of the page in English.
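
One way to live with the string limit described above is to split longer text into chunks before sending it for translation. The following is a minimal sketch, not Google's API: the 1,200-character limit is the approximate figure reported in the post, and `translate` is a hypothetical caller-supplied function taking a string plus source and target language names, mirroring the inputs the post says the interface requires.

```python
import re

MAX_CHARS = 1200  # approximate limit reported in the post; treat as an assumption

def chunk_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

def translate_long_text(text, source, target, translate, callback):
    """Send each chunk to the supplied translate function, then hand the joined result to callback."""
    translated = [translate(chunk, source, target) for chunk in chunk_text(text)]
    callback(" ".join(translated))
```

Splitting at sentence boundaries matters because an MT engine translating half a sentence has less context to work with; the hard split is only a fallback for sentences that exceed the limit on their own.
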
> Google’s AJAX Language API page promises future enhancements. We expect 
> longer strings, named files, and longer documents to be part of future 
> releases. What’s less likely in free Google MT are commercial features 
> such as lexical tuning by company, industry-specific glossaries, or the 
> feedback loop available since 2005 in Language Weaver 
> (although Google does have a generalized “train the engine” function).
>     * For information consumers and seekers of truth in languages other
>       than their own, these advances will be good news. Higher quality,
>       free machine translation utilities will lead to MT popping up in
>       more and more applications.
>     * For translators who don’t own translation memory software, we
>       think that Google remains a great candidate for offering a
>       gmail-like translation environment, replete with MT.
>     * Smart LSPs should seriously consider preprocessing small projects
>       through the Google engine and — depending on the output — decide
>       whether it is worth post-editing or fully translating the text.
>       After all, they really don’t have anything to lose and could
>       increase the productivity of their translators.
>     * Competing MT engines will need to move fast to stay ahead of the
>       ad-funded portal. This API will make life difficult for the
>       already besieged smaller players trying to sell their wares in a
>       market monetized more by search and eyeballs than by software
>       license revenue. Companies like SpeakLike and Transclick
>       (one of 391 World Economic Forum Technology Pioneers)
>       will likely add the Google engine to their suites of MT engines.
>       Meanwhile, we don’t expect companies like Asia Online, Language
>       Weaver, Microsoft, PROMT, SDL, SYSTRAN, and others with their own
>       MT engines and advancing research to sit on the callable MT
>       sidelines for long.
> Earlier today we spoke with Dimitris Sabatakakis, CEO at SYSTRAN, who 
> said that “all MT providers should thank Google for the hype and 
> excitement it brings as MT is now perceived as a practical and usable 
> technology. This means there are more potential customers interested in 
> a MT product or solution. Google’s investment in MT is proof that MT is 
> a key technology for the emerging market and provides a solution to a 
> real need. It is forcing all providers to raise their respective bars. 
> If we stay static, we will collapse.”
> -------------------------
> Chevy “Nova”: Updating Bad Translation Apocrypha 
> Donald A. DePalma 6 February 2008
> Not an hour goes by that we don’t receive an e-mail announcing a press 
> release from a vendor. What we find most interesting is when a company 
> issues a press release but fails to tell us (or anybody else) that it’s 
> out there. That happened back in May when SDL noted that “Spanish leaves 
> global marketers lost in translation.” Quoting the press release, 
> “According to SDL, the top five worst translation mistakes made by 
> companies looking to expand into the Spanish-speaking world” were the 
> usual hackneyed examples of bad translation. These included “I saw the 
> Pope” (/el Papa/) translated as “I saw the potato” (/la papa/), the “Got 
> milk?” slogan rendered as “Are you lactating?” in Spanish, and Parker 
> introducing its non-leaking fountain pen in Spain with the slogan “it 
> won’t leak in your pocket and embarrass you,” with the translator 
> buddying up with a false friend (/embarazar/ means to make pregnant, not 
> to embarrass). At least they left out the old chestnut about the Chevy 
> Nova (/no va/ — get it?) in Latin America and the rumored over-medicated 
> U.S. Latina who interpreted the “/once/ a day” on her prescription as 
> “11 times a day.”
> What’s going on here? It’s all about search engine optimization. SDL 
> cited these examples plus economic figures for Latin American growth to 
> improve its SEO rankings for the Hispanic market. The company’s CMO 
> figured that becoming associated with these sometimes apocryphal 
> mistranslations was a good way to improve SDL’s search engine rankings. 
> Of course, we’re doing the same here by recycling these oft-told tales 
> of mistranslation.
> But wait — there are some really good examples of bad translations and 
> cross-border mistakes out there. Here are a few of our favorites:
>     * For our 2002 keynote at the SAE’s TopTec Multilingual
>       Communication for the Automotive Industry conference, we found
>       candidates for “Bad Product Name of the Year” among Japanese car
>       makers selling in Latin America: Mazda Laputa (interpreted by
>       Spanish speakers as /la puta/), Mitsubishi Pajero (slang for
>       onanist), and Nissan Moco (snot). In that speech we cited an auto
>       show description of the Laputa that might not be suitable for
>       children — “Laputa ha mejorado su seguridad y ampliado su
>       interior… Cuerpo diseñado para resistir impactos frontales.” Check
>       that out at Yahoo! or
>       Google free MT sites.
>     * More recently, Car and Driver
>       magazine reviewed the translated claims of Chinese automakers at
>       the Detroit Auto Show. The brochure for the Liebao CS6 SUV claimed
>       “Gene of being Wild: VM engine brings you the long-awaited shock…
>       only by stepping on the accelerograph, the mph will come to the
>       peak in a second” and the BYD F3 sedan has “fuel efficiency stomach.”
>     * Back to the subject of product names, we noticed a stand for a
>       firm selling “Hyper STD” at the tekom conference in Wiesbaden,
>       Germany last November (see photo above). Yuck! Most American
>       buyers would steer clear of products associated with Sexually
>       Transmitted Diseases.
>     * When we tried the WiFi at the tekom conference Hotel Klee am Park
>       in Wiesbaden, we read the English-language instructions that told
>       us: “General technical supposition is a reticulation-card. Please
>       arrange your reticulation-card to IP (automatic internet
>       register).” Huh?
>     * The classic post-Sputnik mistranslation of “wet sheep” for
>       “hydraulic rams” in a Soviet science journal is an under-used
>       classic example. That’s baaaad! Next time you think about
>       referencing the Nova, try this one instead.
>     * A friend who was an interpreter at the United Nations told us
>       about a colleague who tried to amplify an emotionally-delivered
>       idiomatic expression, suggesting that “we need to grab the bull by
>       something other than the horns.” Ouch.
> But bad translations aren’t always funny. They can have serious 
> consequences:
>     * *Financial markets will shake. *Back in May 2005 a reporter for
>       the China News Service pieced together a story about how currency
>       appreciation might affect the market.
>       The People’s Daily had it translated into English without the
>       subjunctive mood, stating that China decided to revalue its
>       currency 1.26% a month for a year. Bloomberg’s spider in London
>       picked up the story and European equity markets rose on the news.
>       While it was quickly repudiated, the error did cause market tremors.
>     * *Armies can advance without consequence. *In August 1968 U.S. Army
>       transcribers reportedly wrote down a transmission from a Soviet
>       tank column as “my perexali most” rather than “my priexali v
>       Most.” What was heard (a routine bridge-crossing exercise by a
>       tank column) was not what happened (the arrival of Soviet tanks in
>       Most, a city in sovereign
>       Czechoslovakia).
>     * *Countries might disappear.* In October 2005 Iranian President
>       Mahmoud Ahmadinejad
>       reportedly called for Israel to be wiped off the map, but
>       apparently he really “just” wanted to get rid of its government.
>       True to form, Ahmadinejad didn’t clarify his remarks after the
>       mistranslation, further complicating matters.
>     * *Companies will get into trouble.* A senior executive at Yahoo!
>       had to apologize for not giving U.S. Congressmen information about
>       the company’s role in the imprisonment of a Chinese dissident, Shi
>       Tao. According to Yahoo!, a bad translation by an employee of a
>       2004 order from the Chinese government caused the problem.
> None of the mistakes after the “But wait” in this posting were machine 
> translation miscues; they’re just bad translations by humans. Caveat 
> lector!
> --------------------------
> JAJAH Advances Machine Interpretation 
> Renato Beninatto and Nataly Kelly 12 August 2008
> Filed under (Interpretation, Translation & Localization, 
> Translation Technologies, Language Industry)
> 2 pepper rating
> When we first heard about JAJAH’s extremely simple process for providing 
> machine-based telephone interpretation, it sounded too good to be true. 
> The process consists 
> of three easy steps — simply dial a number from any phone, speak in 
> English, and hand your phone to the person who speaks Mandarin. The way 
> it is described, the service would seem to automate much of human 
> interpreters’ work, and would be particularly helpful for situations in 
> which telephone interpreters are used. As usual, if it sounds too good to 
> be true, it probably is.
> We tested the service, currently touted as a way to help travelers 
> overcome language barriers in China, just in time for the Beijing 
> Olympics. 
> We conducted several tests and found that the service seemed to work 
> quite well at some levels, in that it did correctly render some of our 
> words into the target languages. However, the voice recognition 
> component misunderstood some of our words, even when we conducted tests 
> with native and near-native speakers of English. To test the service in 
> Mandarin, we used voice-over samples recorded by professional talent, 
> and the results were a bit difficult to understand in English — then 
> again, we purposely used samples with brand names that we knew tend to 
> be problematic for machine translation tools. Now that we’ve aired our 
> complaints, let’s take a look at a few points on the bright side of this 
> innovation:
>     * *You get what you pay for — at least, in the early stages. *The
>       service is free, so it should come as no surprise that it does not
>       work perfectly yet. In spite of the disjointed target language
>       versions we received in English and the fact that telephony
>       provider JAJAH went with another Babel theme, we do not believe
>       that the localization world will automatically relegate it to the
>       role of industry laughingstock, as happened with BabelFish.
>     * *Free machine-based telephone interpretation is a first. *At
>       Common Sense Advisory, we’ve been writing more in the past few
>       months about the trend we are noticing toward computer-assisted
>       interpretation (CAI)
>       and the future synergies between translation memory and what we
>       refer to as interpretation memory (IM) — pre-translated and
>       pre-recorded words and phrases that serve to partially automate
>       the process of interpretation. This additional focus in our
>       research is intentional — CAI has already been widely implemented
>       for devices used by the military, but this is one of the first
>       instances we’re aware of that offers such a service for free,
>       on-demand, via telephone, and to the general public. This type of
>       service pushes CAI to a new level.
>     * *Savvy developers will want to take note. *This offering from
>       JAJAH may not appear at first to represent a major technological
>       advancement, but it does prove to the world that machine
>       interpretation (MI) is possible, even if the quality is not yet up
>       to par. LSPs — especially telephone interpretation providers
>       — and technology companies that aim to stay ahead of the curve are
>       well-served to keep CAI and MI on their radar. We predict that
>       more and more of these services will begin to spring up soon.
> Even for the traveler who is willing to hit the re-dial button a few 
> times and is able to accept an imperfect rendition, this service may be 
> of limited use. While it’s certainly not as costly as some of the 
> phone-based Chinese interpretation services that have recently been 
> profiled in the Wall Street Journal 
> and other media as services for travelers to the Olympics, it could 
> prove to be cost-prohibitive for a person dialing the number repeatedly 
> and trying to confirm the recording’s accuracy while sitting in a taxi 
> in Beijing with the meter running — especially if proper nouns, such as 
> the hotel name, are rendered incorrectly. That’s precisely what happened 
> in our example — take a look at the video below and judge for yourself. 
> In summary, we don’t see this service replacing the need for phone-based 
> interpreters anytime soon, but the general impact — and possibilities — 
> for the language services industry are definitely worth noting.
> ----------------------------------------------------------
> Google Shakes Up the Translation Memory Scene 
> Nataly Kelly 8 August 2008
> Filed under (Translation & Localization, Translation Technologies, 
> Language Industry)
> This week, there were rumblings about the forthcoming beta release of 
> Google’s new translation management system (TMS), called Translation 
> Center. If you’re familiar with Google Translate, you might be thinking, “Big 
> deal, this is just a low-tech, human version of what they’re already 
> doing.” If so, you would be wrong: This is big news for the practice of 
> translation. It seems that Google has been stalking the sector.
> We predicted in 2006 
> that Google would open up its statistical machine translation engine for 
> general usage, and so it did, as we reported in March 2008. Last 
> December, we published our first report on collaborative translation, 
> in which we explained how collaboration tools and open source concepts 
> could increase translation efficiency. We’ve written about the merits of 
> crowdsourcing 
> and how companies like Facebook, Google, and Sun Microsystems have 
> pioneered work in this area.
> Google seems to have been listening. In December of 2007, we suggested a 
> Gmail-like model 
> for translation memory and forecasted that a company from outside the 
> language industry with no interest in selling tools — such as Ask, 
> Google, or Yahoo! — might be well-served to make such an offer. Google 
> has apparently done just that. It claims that its new translation 
> management system (TMS) gives users the ability to request translations, 
> find translators, and upload documents for translation into more than 40 
> languages. It also enables freelancers to create and review content in 
> their languages using free translation tools. Yes, free.
> Why would Google take an interest in supporting human translation 
> activities? One big reason: It needs human support in order to build up 
> its translation memory, so that Google Translate can evolve from a “me 
> translate pretty one day” prototype to a reputable and reliable language 
> conversion machine. True, there are some large sources of free 
> translation memory out there already — such as the enormous database 
> offered by the European Parliament. 
> But, to truly enable mass quantities of information to be shared around 
> the globe, Google needs richer, vaster sources of TM than what’s 
> currently in the public domain. After all, the typical web user might 
> want to communicate now and then regarding things other than, say, 
> official EU declarations and proceedings.
> Adding humans to the mix enables Google to gradually create a very large 
> storehouse of translated words and phrases — exactly what TAUS is aiming 
> for with its data sharing initiative and what 
> Asia Online is doing with its human-enhanced statistical MT engine. In a 
> nutshell, Google will unite its cloud with the crowd to get as many 
> helping hands on the job as it can.
> We’ll reserve our detailed comments on Google Translation Center until 
> we can actually try it out for ourselves and see how it fares alongside 
> other TMS programs — our in-depth report with translation management 
> system scorecards 
> for translation management suppliers will be published soon — but the 
> big picture value of this news for the industry is clear. Even in its 
> beta form, Google Translate showed decent promise 
> for the future 
> of automating written language mediation — it is a well-built machine 
> translation engine.
> What separates Google from the rest of the MT field is that this machine 
> is backed up by a manufacturer with plenty of money, data center power, 
> disk space, and network infrastructure, not to mention expertise in the 
> assembly and productization of raw information materials. But now, with 
> the addition of humans, it has the opportunity to become well-oiled in 
> addition to having a sturdy construction. What remains to be seen is if 
> Google can find enough oil to maximize MT performance. Thankfully, 
> translation memory is a plentiful resource — one that won’t require any 
> drilling.
> -----------------------------
> --- On *Sat, 1/31/09, Don Osborn /<dzo at>/* wrote:
>     From: Don Osborn <dzo at>
>     Subject: RE: Cheeseburgery hamburgers and the problem of
>     computerised translations
>     To: lgpolicy-list at
>     Date: Saturday, January 31, 2009, 9:47 AM
>     We all know MT (machine translation, aka  computerized translation) is not
>     perfect so I don't think this piece was particularly informative.
>     The only news I see in it is that there is MT for Polish <-> English
>     (probably has been for a while but this is the first note I've made of
>     it). Given what must be necessary to develop MT, it does not surprise me if
>     a recently developed program churns out some cheeseburgery results (though I
>     wonder who put that word in the lexicon).
>     While on the topic, my favorite MT mistranslation was with an older version
>     of (results duplicable on Babelfish): "discussion on fonts" in
>     English became in Portuguese the equivalent of "quarrels in baptismal
>     basins." Such blatantly outrageous results, though, speak to me as a
>     non-specialist in the matter more of how the MT was set up than any inherent
>     problem with setting up MT. Discussion in English is not really synonymous
>     with its apparent cognates in Latin languages (at least French &
>     Portuguese); and how often do English speakers use "font" to describe what
>     in Portuguese they call pias baptismas? I've never heard of cheeseburgery
>     before but will surely find a way to use it in conversation sometime - just
>     not in MT.
>     The real news is how useful MT can be in sorting through the gist of things
>     in diverse languages, and how with new approaches the results are improving
>     significantly. I hope FT takes a look at that, and how the complex and
>     uneven progress in MT is changing the way we access and use multilingual
>     content and documents.
>     Don Osborn
>     > -----Original Message-----
>     > From: owner-lgpolicy-list at [mailto:owner-lgpolicy-
>     > list at] On Behalf Of Harold Schiffman
>     > Sent: Tuesday, January 27, 2009 11:18 AM
>     > To: lp
>     > Subject: Cheeseburgery hamburgers and the problem of computerised
>     > translations
>     > 
>     > Cheeseburgery hamburgers and the problem of computerised translations
>     > January 26, 2009 by Tony Barber
>     > 
>     > This morning I found myself on a public platform in a Brussels hotel
>     > for my first ever European bloggers' conference. As a representative
>     > of an "establishment" news organisation, I was half-expecting to be
>     > roasted alive. But in the end both Mark Mardell of the BBC, my friend
>     > and fellow-guest, and I got through it safely enough. The most
>     > perceptive contribution, I thought, came from a Romanian blogger who
>     > made the point that the global blogosphere remains to a large extent
>     > divided by language. For example, you can blog all you like in
>     > Romanian, but most of the world won't have a clue what you're saying.
>     > 
>     > A moderator responded to this by saying, "Try using computer-generated
>     > translation." As I drifted back to my office, I recalled that the last
>     > time I'd experimented with computers striving to change Italian into
>     > English or Dutch into Spanish, the results had been pretty hopeless.
>     > Perhaps things had improved over the last couple of years?
>     > 
>     > Well, below are three examples of computerised translation - courtesy
>     > of Google Language Tools - from French, German and Polish into
>     > English. I am republishing the translations exactly as they came out,
>     > punctuation mistakes and all, after I hit the button.
>     > 
>     > 1) This is from a news story in Le Monde about US and European policy
>     > in the Middle East. "Believing that the war in Gaza has imposed new
>     > priorities and the administration of the new American president,
>     > Barack Obama, might break with the unconditional support to Israel,
>     > French diplomacy is trying to print in Europe, a change of tone
>     > against the Hamas."
>     > 
>     > As you can see, this translation starts off promisingly. In fact, it
>     > scarcely puts a foot wrong until it loses control and talks, weirdly,
>     > about printing changes of tone against the Hamas. Still, we sort of
>     > know what's going on here. 7 out of 10 for Monsieur L'Ordinateur.
>     > 
>     > 2) Now here's a sentence from a story in Germany's Süddeutsche Zeitung
>     > about the US prison centre at Guantánamo and what Europe can do to
>     > help close it down. "The fate Released Guantanamo prisoners ensures
>     > fierce debates: Union politicians criticized the foreign ministers of
>     > Vorpreschen Stein Meier - and refer the responsibility for the inmates
>     > to the U.S."
>     > 
>     > This is a pretty poor effort, Herr Computer.  Particularly
>     > disappointing is the omission of the preposition "of" between "fate"
>     > and "released" (which also shouldn't have a capital R), and the
>     > baffling three words "Vorpreschen Stein Meier". But let's be fair,
>     > there's a modest degree of sense here. 5.5 out of 10.
>     > 
>     > 3) Lastly, here's a sentence from the Polish newspaper Gazeta Wyborcza
>     > on French leisure habits during the recession. "Economic crisis and
>     > changing lifestyles, the French seriously affect the profits of French
>     > cafes and restaurants. A sign of the collapse of the French culture of
>     > the restaurant is visible on the streets of Paris rash of
>     > quick-service bar, offering generally pogardzane a few years ago and
>     > cheeseburgery hamburgers."
>     > 
>     > No, dear readers, you have not gone potty. That's what it says. And I
>     > am afraid, Pan Komputer, that it's utter gibberish. You get 2 out of
>     > 10 - and an hour's detention in the language lab.
>     > 
>     >
>     > the-problem-of-computerised-translations/
>     > 
>     > --
>     > **************************************
>     > N.b.: Listing on the lgpolicy-list is merely intended as a service to
>     > its members
>     > and implies neither approval, confirmation nor agreement by the owner
>     > or sponsor of
>     > the list as to the veracity of a message's contents. Members who
>     > disagree with a
>     > message are encouraged to post a rebuttal. (H. Schiffman, Moderator)
>     > *******************************************

Alexander J. Stein
Cell:  (201) 412-9479
Email: alharaka at
Skype: alexander.j.stein
AIM:   elduderino6886
