Cheeseburgery hamburgers and the problem of computerised translations
Al Haraka
alharaka at gmail.com
Sat Jan 31 15:54:27 UTC 2009
Nataly,
Thanks for the great response. I was very into this in college and took
the only classes available at my school on NLP. This is a good review.
I will definitely read that article!
Cheers,
_AJS
Nataly Kelly wrote:
> Google's statistical MT engine (http://translate.google.com/) is
> available in the following languages: Albanian, Arabic, Bulgarian,
> Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian,
> Filipino, Finnish, French, Galician, German, Greek, Hebrew, Hindi,
> Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian,
> Maltese, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian,
> Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian and
> Vietnamese.
>
> I will paste below a few recent Watchtower blog entries (independent
> industry commentary) that might be of interest on the topic of both
> rules-based and statistical MT. I would recommend clicking on the actual
> page URLs to see the related links, videos and images in case they do
> not display properly here. However, these just give a snapshot of the
> state of the market, and do not dive into the technical details of the
> machine translation engines. Those are often the subject of papers and
> presentations within the localization and computational linguistics
> conference circuits.
>
> For some types of projects, MT can actually work well, especially for
> controlled language and technical content. The Pan American Health
> Organization has had great success using their MT engine for technical
> content. It is one of the best examples I have seen of domain-specific
> MT. More information:
> http://www.paho.org/English/AM/GSP/TR/Machine_Trans.htm
>
> There are currently several language service providers (LSPs) whose
> business model is centered around using free or nearly-free machine
> translation with human post-editing. However, MT is also widely used for
> gisting and is particularly helpful for scanning a large corpus to
> determine which areas might require TM+post-editing or computer-assisted
> translation (CAT). CAT is performed by humans but made easier by
> translation memory and software tools that flag repeated text so that it
> only has to be translated once, and by terminology extraction and
> management tools that ensure consistent use of terminology.
>
> Another growing trend is machine interpretation (total automation of
> spoken language interpretation), so I'll include one post on that topic
> below as well. Computer-assisted interpretation (CAI) is another
> growing trend, in which both end users and interpreters themselves are
> making greater use of software, handheld devices, and desktop
> applications to facilitate interpretation tasks.
>
> I hope some of these blog posts will be useful to colleagues, although
> it is important to remember that, as blog entries, they provide just a
> snapshot of the current trends in the language services market. A great
> many books and journal articles exist on these topics that would lend
> greater insight to those interested in the current state of the research.
>
> Nataly Kelly
>
> --------------------------
> How Good Is Machine Translation? A Modest Test
> <http://www.globalwatchtower.com/2007/10/30/mt-shootout/>
> Donald A. DePalma 30 October 2007
>
> The Wall Street Journal
> <http://online.wsj.com/article/SB119265174539562359.html?mod=yahoo_hs&ru=yahoo>
> recently opined that “translation software is at last good enough to
> help companies do business in other languages,” noting a hoary case
> study from Ford and posturings from Google, Microsoft, and SDL — and few
> real examples. But that’s fine. The Journal has just discovered MT,
> perhaps looking for juicier stories to put on its Page 3
> <http://en.wikipedia.org/wiki/Page_Three_girl> as Rupert Murdoch’s News
> Corp <http://www.newscorp.com/> takes over as dowager queen of the print
> media.
>
>
> That said, MT is definitely on the must-review list for many companies
> and government agencies, but few are paying for it today. The biggest
> use of automated translation is free online machine translation (OLMT).
> How widespread? Last year Common Sense Advisory asked 2,430 consumers in
> non-Anglophone countries
> <http://commonsenseadvisory.com/research/report_view.php?id=36&cid=0>
> whether they tried free OLMT — more than half said that they sometimes,
> frequently, or always use machine translation to better understand
> English-language websites.
>
>
> So, like it or not, information consumers will get what they want
> translated by a quick hop over to their favorite free MT site. How good
> will the translations be? Let’s consider some skeptical lines penned by
> our resident Carioca as he read our entry about changes in the MT guard
> <http://globalwatchtower.com/2007/10/30/mt-google-systran/>. We decided
> to paste these immortal words into several free OLMT sites to test
> Portuguese into English translations rather than come up with the MT
> equivalent of “the quick brown fox jumped over the lazy brown dog.”
> One of the systems used statistical machine translation (SMT), the other
> 3 were rules-based (RBMT) systems (see the full version of Automated
> Translation Technology
> <http://commonsenseadvisory.com/research/report_view.php?id=37&cid=0>
> for a discussion of the differences, pros, and cons).
>
> “Escrevo estas linhas em português para testar o novo método de
> tradução automática do Google. Eu não tenho grandes expectativas com
> relação a este tipo de serviço. Confesso que sou usuário freqüente
> para ter acesso a informações em idiomas que não domino, como o
> alemão, chinês e japonês. O Google agora utiliza tradução automática
> com base estatística. Quer dizer, quanto mais traduções houver de
> uma frase, mais correta será a solução oferecida pelo computador.”
>
> * Google <http://www.google.com/language_tools?hl=en> (SMT): “I
> write these lines in Portuguese to test the new method of
> automatic translation of Google. I do not have great expectations
> with respect to this type of service. I confess that I am a
> frequent user to access information in languages that no domain,
> such as German, Chinese and Japanese. Google now uses machine
> translation based statistics. That is, the more there translation
> of a phrase, more correctly is the solution offered by the computer.”
> * PROMT <http://www.e-promt.com/> (RBMT): “I write these lines in
> Portuguese to test the new method of automatic translation of the
> Google. I have not big expectations regarding this type of
> service. I confess that I am a frequent user to have access to
> informations in languages that I do not dominate, like German,
> Chinese and Japanese. The Google now uses automatic translation
> with statistical base. It means, how much more translations will
> be of a sentence, more correct will be the solution offered by the
> computer.”
> * SDL <http://www.freetranslation.com/> (RBMT): “I write these lines
> in Portuguese for quiz the new approach of automatic translation
> of the Google. I do not have big expectations regarding this kind
> of service. Confessed that I am user frequent for have access the
> information in languages that do not dominate, as the German,
> Chinese and Japanese. The Google now utilizes automatic
> translation with statistical base. It want to say, specially
> translations will have of a phrase, more correct will be the
> solution offered by the computer.”
> * SYSTRAN <http://www.systransoft.com/> (RBMT): “I write these lines
> in Portuguese to test the new method of automatic translation of
> the Google. I do not have great expectations with regard to this
> type of service. I confess that I am using frequent to have access
> the information in languages that I do not dominate, as the
> German, Chinese and Japanese. The Google now uses automatic
> translation with base statistics. It wants to say, the more
> translations will have of a phrase, more correct will be the
> solution offered for the computer.”
>
> Judge for yourself. While none of these are perfect translations and one
> is definitely not at the quality level of the others, all 4 tell us that
> Senhor Beninatto wasn’t writing a shopping list for “pound pastrami, can
> kraut, six bagels.” For many web browsers, that ability to determine the
> subject of a communication will be good enough, allowing them to
> determine whether they want to invest more time in a given piece of
> information. Obviously, in more complex domains and in printed
> communications like owner’s manuals for a Porsche 911 GT3 RS
> <http://www.porsche.com/usa/models/911/911-gt3-rs/> (Santa, are you
> listening?) or how to adjust the control rods for a nuclear fission
> reactor, tuning and accuracy will be much more of an issue.
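
One rough way to put a number on the comparison above is to measure how
much the four outputs agree with each other. The sketch below is an
illustration only: it computes Jaccard word overlap on the first
sentence of each engine's output as quoted in the post. The function
names are made up for this example, and agreement between engines is
not a measure of translation quality (that would need a human reference
translation, as in BLEU-style scoring).

```python
import re
from itertools import combinations

# First sentence of each engine's output, copied from the post above.
OUTPUTS = {
    "Google": "I write these lines in Portuguese to test the new method "
              "of automatic translation of Google.",
    "PROMT": "I write these lines in Portuguese to test the new method "
             "of automatic translation of the Google.",
    "SDL": "I write these lines in Portuguese for quiz the new approach "
           "of automatic translation of the Google.",
    "SYSTRAN": "I write these lines in Portuguese to test the new method "
               "of automatic translation of the Google.",
}


def word_set(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Shared words divided by all distinct words in the two texts."""
    set_a, set_b = word_set(a), word_set(b)
    return len(set_a & set_b) / len(set_a | set_b)


def pairwise_overlap(outputs):
    """Score every unordered pair of engines, keyed by sorted names."""
    return {
        (x, y): jaccard(outputs[x], outputs[y])
        for x, y in combinations(sorted(outputs), 2)
    }


if __name__ == "__main__":
    for (x, y), score in pairwise_overlap(OUTPUTS).items():
        print(f"{x:8s} vs {y:8s}: {score:.2f}")
```

Because the score works on word sets, it cannot tell Google's sentence
from PROMT's or SYSTRAN's (they differ only by a repeated "the"), while
SDL's sentence overlaps less. A count-based or n-gram measure would be
more sensitive.
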
>
>
> ----------------------------
> Changing of the Guard in Machine Translation
> <http://www.globalwatchtower.com/2007/10/30/mt-google-systran/>
> Donald A. DePalma 30 October 2007
>
> Most information will never be translated by humans from its source
> language into even one other language, much less into many. Budgets,
> staffing, and time will always make organizations shy away from
> translating even a small fraction of the words they have on hand. Many
> companies and government agencies will use some form of automated
> translation to improve services to customers and constituencies.
> However, many information consumers will avail themselves of free online
> machine translation (OLMT) if they don’t find their language at a website.
>
>
> Most of that free OLMT to date has been provided by SYSTRAN
> <http://globalwatchtower.com/2007/02/14/systran-2006-financial-results/>,
> a French software firm that grew up during the Cold War as the Free
> World
> <http://www.m-w.com/cgi-bin/dictionary?book=Dictionary&va=free+world>
> faced off against the Moscow-led Warsaw Pact
> <http://en.wikipedia.org/wiki/Warsaw_Pact>. In October new challenges
> arose from the new guard, including the Russians themselves.
>
> * Google reportedly replaced the languages that SYSTRAN translated
> for it in favor of its in-house statistical machine translation
> (SMT) engine. Google’s homegrown technology came into wide view
> when it won the no-holds-barred NIST Machine Translation
> Evaluation
> <http://globalwatchtower.com/2005/08/22/machine-translation-benchmark-bleu-nist/>
> in 2005. Google’s MT is part of the GooglePlex — that is, not yet
> a commercially available product, but, like its search appliance,
> MT could become a Google product. Try it here
> <http://www.google.com/language_tools?hl=en>.
> * SMT-based Language Weaver opened its second sales office in Europe
> <http://www.languageweaver.com/page.asp?intNodeID=856&intPageID=1181>.
> After its initial success selling to certain U.S. government
> agencies, Language Weaver made its 2006 European debut in
> bureaucrat-dense, government-rich Brussels. Its latest digs are in
> Paris, hometown of SYSTRAN — and presumably of some commercial
> buyers. Free use of Language Weaver on the web is harder to find
> than Google or SYSTRAN. Earlier this year the company announced
> that the social bookmarking
> <http://www.languageweaver.com/Page.asp?LSM=&intNodeID=856&intPageID=1018>
> site Kontrib <http://www.kontrib.com/> was using its technology,
> giving everyone a chance to see its output. Expect Language Weaver
> to host its own OLMT site as part of its marketing expansion.
> * St. Petersburg-based PROMT announced a significant uptick in the
> use of its free OLMT <http://www.e-promt.com/en/news/6073.php>.
> This followed its September announcement of V7.8 with support for
> Windows Vista <http://www.e-promt.com/en/news/5544.php>, while
> those fortunate enough to speak Russian already have access to
> Version 8.0 <http://www.promt.ru/> with its improved algorithms
> and usability. Try its free OLMT <http://www.e-promt.com/>.
>
> The bottom line: Most consumers will never buy desktop machine
> translation software from LEC, PROMT, or SYSTRAN for their PCs, Macs, or
> smartphones. However, they will have free MT available in the cloud from
> Google, Language Weaver, LogoVista
> <http://www.logovista.co.jp/english/index.html>, Microsoft, PROMT,
> SYSTRAN, and through portals like Yahoo! BabelFish
> <http://babelfish.yahoo.com/>. How well do they work? Click here for a
> modest example <http://globalwatchtower.com/2007/10/30/mt-shootout/>.
>
>
>
> ----------------------------
>
> Seeking an MT Market beyond Ad-Reading Eyeballs
> <http://www.globalwatchtower.com/2008/09/25/language-weaver-estimate/>
> Donald A. DePalma 25 September 2008
>
> Last week, Language Weaver projected a US$67.5 billion market for
> digital translation, enabled by advances in machine translation (MT).
> For the last few years, we have released an annual estimate of the
> market for outsourced translation, localization, and interpretation. For
> 2008, human-delivered translation activities will total a hefty US$14.25
> billion (see our “Ranking of Top 25 Translation Agencies
> <http://www.commonsenseadvisory.com/members/res_cgi.php/080528_QT_2008_top_25_lsps.php>”).
> On the software side, we estimate that the MT software market falls
> well short of US$100 million. Added together, there’s a lot of daylight
> between our numbers and Language Weaver’s estimate. Where’s the
> disconnect? Over the last week, we’ve spent a lot of time talking with
> various people about the US$67.5 billion projection.
>
>
> Let’s start off by deconstructing the 67 billion dollar number. That is
> an estimate of the monetary value of the content that Language Weaver
> thinks MT suppliers “could” translate for corporations and governments; the
> operative phrase in the company’s press release is “untapped markets”
> where automated translation could increase the volume and lower the cost
> of human translation, which stands at current market prices of 10-40
> cents per word
> <http://www.commonsenseadvisory.com/research/report_view.php?id=63&cid=0>.
>
> How good is Language Weaver’s sizing of the as yet unrealized market? We
> think its number is way too low, especially as the amount of stored
> content grows at record levels (see the figure below from our report on
> “Automated Translation Technology
> <http://www.commonsenseadvisory.com/research/report_view.php?id=37&cid=0>”).
>
>
> The untapped market potential is much higher, but the problem is still
> getting buyers on board. Language Weaver will target customer care,
> business intelligence, and user-generated content, three markets where
> companies could benefit from moving content out of linguistic silos.
> However, the organizations today that stand to gain the most from MT are
> those driving advertisement-reading eyeballs to their sites
> <http://www.globalwatchtower.com/2007/12/20/mt-eyeballs/>. The challenge
> that Language Weaver and rival developers face is getting more people
> accustomed to the idea of paying for MT software or SaaS solutions that
> will help them translate their content into other languages. Three
> roadblocks stand in the way:
>
> * *Free machine translation obscures the value.* There’s an
> enormous amount of content that’s translated every day online
> using free online machine translation sites, but no one has
> figured out how to directly monetize those interactions. We have
> long contended that there’s far more text that consumers,
> businesses, and governments might run through those engines if
> they could more easily plug them into workflows, e-mail systems,
> mobile phones, and other networked appliances. Combine a dollar
> figure for the unmonetized activity that’s happening today at
> sites like Google Translate or Yahoo!’s Babel Fish with the dollar
> value for things that should be translated - and you’ve got some
> really big piles of zeroes. The problem is that there are usually
> no positive integers to the left of those zeroes. Bottom line: Too
> much of it is free.
> * *Unpaid human translation appears to be a panacea.* Another rival
> to MT is community or collaborative translation
> <http://www.globalwatchtower.com/2007/10/16/end-of-tep/> for both
> company- and user-generated content, such as we’re seeing at
> Facebook
> <http://wiki.developers.facebook.com/index.php/Translating_Platform_Applications>
> (social networking), Livemocha
> <http://translate.livemocha.com/doku.php> (language learning), and
> NetBeans
> <http://www.netbeans.org/community/contribute/localise.html> (Java
> software development). These communities can fill some of the
> demand, but nowhere near all of it. That leaves a lot of
> information forever locked in the language in which it was created.
> * *An uneducated market expects too much or too little.* Potential
> buyers retain unrealistic (read “Star Trek” or “Hitchhiker’s
> Guide”) expectations of what they will get out of machine
> translation. Some ignore the quality issue
> <http://www.commonsenseadvisory.com/research/report_view.php?id=68&cid=0>
> altogether, posting babble-fishy output and thinking they did a
> good thing in providing any in-language content at all. Meanwhile,
> many individual translators and too many translation agencies miss
> the point; they think that MT threatens their livelihood rather
> than viewing it as a productivity enhancer.
>
> That said, the corporate and governmental sectors may be turning the
> corner vis-à-vis MT acceptance, if not purchasing. A poll conducted by
> the International Association for Machine Translation (IAMT) and
> Association for Machine Translation in the Americas (AMTA) for SDL
> <http://www.sdl.com/en/events/news-PR/sdl-research-trends-in-automated-translation.asp>,
> another provider of machine translation technology, found that 40
> percent of the 385 surveyed individuals were “now” likely to use MT. Of
> those roughly 150 receptive respondents, 62 percent said they would use
> it for technical documentation, 49 percent for support and
> knowledge-based content. That’s good news for the MT software sector,
> but could be bad news if automated translation merely displaces the work
> of traditional translation agencies rather than increasing the size of the
> overall business.
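
For anyone who wants the head counts behind those percentages, here is
a small worked calculation. It is a sketch under two assumptions that
the post does not spell out: that the 62 and 49 percent figures apply
to the receptive group only, and that results are rounded to whole
respondents. The variable names are mine.

```python
SURVEYED = 385

# 40 percent of all respondents said they were "now" likely to use MT.
receptive = round(SURVEYED * 0.40)

# Shares of that receptive group, per the post (assumed base: receptive).
tech_docs = round(receptive * 0.62)
support_content = round(receptive * 0.49)

print(f"receptive respondents:   {receptive}")
print(f"technical documentation: {tech_docs}")
print(f"support/knowledge base:  {support_content}")
```

The two shares add up to more than 100 percent, which suggests that
respondents could pick more than one use, but that is my reading rather
than something the post states.
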
>
> --------------------------
> Asia Online Aims to Meet Asian Content Demands with MT+
> <http://www.globalwatchtower.com/2008/04/14/asia-online-portal/>
> Donald A. DePalma 14 April 2008
>
> For the last dozen or so years we’ve heard ourselves incessantly
> reminding everyone that the “www” in most URLs means “worldwide web,”
> while the “e” in “e-commerce” all too often stands for English. Our
> research on e-GDP
> <http://commonsenseadvisory.com/research/report_view.php?id=55&cid=0>
> (online GDP) and the Availability Quotient
> <http://commonsenseadvisory.com/research/report_view.php?id=60&cid=0>
> demonstrated that many companies still have a long journey before they
> can meet the demands of the world’s markets for local-language content.
> That gap is no more apparent than in Asia where the amount of
> in-language content is dwarfed by the growing online population.
>
>
> Just how dwarfed? Today, roughly 38% of internet users live in Asia, but
> by 2012, that number will jump to half. However, local-language content
> hasn’t kept pace. In 2007, non-Asian languages accounted for roughly 86%
> of the content on the web. Most of the remaining 14% was split among
> Japanese (6%), Chinese (6%), and Korean (1.5%). All other Asian
> languages comprise less than 0.03% of the web’s content; for example,
> Southeast Asian languages make up less than 10 million pages. Given
> consumer preference for content in their own language
> <http://commonsenseadvisory.com/research/report_view.php?id=36&cid=0>,
> that huge gap between Asian content and total online population
> represents a huge opportunity.
>
>
> That opportunity has not gone unnoticed. After getting an eyes-only,
> tell-no-one pre-briefing in December, we recently spoke with Asia Online
> CEO Dion Wiggins who called us to tell us that his portal had just
> scored its first round of funding from JAIC
> <http://www.asiaonline.net/corporate/news.aspx#News05>, the Japanese
> venture capital firm behind Alibaba.com
> <http://globalwatchtower.com/2007/11/09/a-big-week-for-china-on-the-big-board-on-the-bund-and-beyond/>,
> among others. He also wanted to let us know that Kirti Vashee
> <http://www.asiaonline.net/corporate/news.aspx#News04>, formerly VP of
> marketing at Language Weaver, had signed on as Asia Online’s VP of sales
> for the Americas and Europe with the responsibility for selling the
> commercial version of its MT engine.
>
>
> Asia Online’s plans revolve around a proprietary machine translation
> engine plus a strong support infrastructure of humans, content, and
> partners. Four elements are key to this strategy:
>
> * *New technology.* Asia Online developed high-performance
> statistical machine translation (SMT) software in collaboration
> with University of Edinburgh professor Philipp Koehn.
>
> * *Clean corpora.* Asia Online contracts with publishers, language
> service providers, and eventually corporations for
> human-translated content to train its SMT engine. The company also
> crowdsources the quality via a large community of students, and
> feeds the validated content back into the system as training data.
>
> * *Matrixed language learning.* The SMT engine can take translations
> of a novel into English, Japanese, and Thai and use the
> permutation to train itself on English<>Thai, English<>Japanese,
> and Japanese<>Thai. This capability is especially important for
> languages that don’t have enough content to feed a data-hungry
> statistical MT engine.
>
> * *Real-time fixes.* Its MT engine lets reviewers observe
> translation decisions as they are being made, allowing them to
> influence choices, make fixes in place, and propagate these
> modifications to wherever that phrase or term is used.
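
The "matrixed language learning" item above can be pictured as a data
preparation step: one text aligned across three languages yields
training pairs for every language combination. The sketch below is only
my illustration of that idea, not Asia Online's implementation. The
function name, the dict layout, and the placeholder sentences are all
assumptions.

```python
from itertools import permutations


def language_pairs(aligned_segments):
    """Expand multi-way aligned segments into directed bilingual pairs.

    Each segment is a dict mapping a language code to the same sentence
    in that language. The result maps (source, target) to a list of
    (source_sentence, target_sentence) tuples.
    """
    pairs = {}
    for segment in aligned_segments:
        for src, tgt in permutations(sorted(segment), 2):
            pairs.setdefault((src, tgt), []).append(
                (segment[src], segment[tgt])
            )
    return pairs


if __name__ == "__main__":
    # Placeholder strings stand in for real aligned sentences of a novel.
    novel = [
        {"en": "EN sentence 1", "ja": "JA sentence 1", "th": "TH sentence 1"},
        {"en": "EN sentence 2", "ja": "JA sentence 2", "th": "TH sentence 2"},
    ]
    for (src, tgt), examples in sorted(language_pairs(novel).items()):
        print(f"{src}->{tgt}: {len(examples)} training pairs")
```

Three languages give six directed pairs, which covers both directions
of English<>Thai, English<>Japanese, and Japanese<>Thai. How the engine
then uses that data is not described in the post.
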
>
> Asia Online is talking with LSPs interested in using its SMT engine and
> has fielded corporate requests to use its software. We think that its
> real value lies in its Google-esque plan to drive billions of eyeballs
> <http://globalwatchtower.com/2007/12/20/mt-eyeballs/> seeking content in
> their own languages — and the advertising, special offers, and the
> next-generation linguistic tools that are sure to follow.
>
> --------------------------
> Google MT Puts Multilingual Information at More Fingertips
> <http://www.globalwatchtower.com/2008/03/25/google-mt-api/>
> Donald A. DePalma 25 March 2008
>
> As we predicted in our 2006 report on machine translation
> <http://commonsenseadvisory.com/research/report_view.php?id=37&cid=0>,
> Google has opened its MT engine to general usage — but with no software
> license or other fees. Acknowledging that automated translation right
> now is all about eyeballs
> <http://globalwatchtower.com/2007/12/20/mt-eyeballs/>, Google made its
> newly documented AJAX Language API for Translation and Language
> Detection <http://code.google.com/apis/ajaxlanguage/documentation/> beta
> release free to anyone who decides to call it. By the way, we would have
> put “language detection” first in the API’s name, but Google knows a bit
> more about SEO than we do.
>
>
> As the name implies, you can use this application programming interface
> to detect language blocks in a text and translate them. Translation
> requests go to Google’s pretty good statistical MT engine
> <http://globalwatchtower.com/2007/10/30/mt-shootout/> (SMT). The API
> supports 29 language pairs
> <http://code.google.com/apis/ajaxlanguage/documentation/#SupportedPairs>
> (13 languages in total), including the usual E-FIGS and CCJK plus
> French<>German without involving English as the pivot language.
> Translations come back exactly as Google generates them, with no option for
> training the SMT engine on your particular lexicon. Nonetheless, Google
> translations have proven to be very intelligible in the mash-ups
> <http://globalwatchtower.com/2007/12/03/google-mt-dotsub/> that we have
> done or observed.
>
>
> Google says that its language API is simple and easy to use — versus an
> arcane call-level interface: It requires an input string to translate,
> the names of the source and target languages, and a callback function.
> We put that claim to the test with a short program that threw
> increasingly large strings at the interface. We can attest that it is
> easy to use for short strings. We did notice a couple of restrictions in
> our sandbox (N.B. Common Sense Advisory Labs did not conduct exhaustive
> tests on the API — rather, we ran tests until we got bored with the
> permutations):
>
> * *Strings.* The API maxes out at around 1,200 characters per source
> string of plain text (figure on 100-120 words). While that’s good
> for including Google’s MT in your average application, it won’t
> help the average language service provider intent on
> pre-translating big files.
> * *Files and URLs.* If you want to translate files, set them up as
> HTML pages hanging off a website and type the URL into Google’s
> website translator
> <http://translate.google.com/translate_t?hl=en>. That worked for
> web pages and shorter documents, but choked on the unexpurgated
> HTML version of “Business Without Borders
> <http://www.businesswithoutborders.info/>” (a mere 122,000 words,
> give or take a couple hundred). We also tried translating the
> 19,000 words of Thomas Paine’s Common Sense
> <http://www.ushistory.org/paine/commonsense/singlehtml.htm>
> pamphlet into Japanese and Russian. Google translates the first
> 5,300 words, but leaves the rest of the page in English.
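
The character ceiling described above suggests splitting longer text
before sending it. The sketch below is one hedged way to do that.
`translate` is a hypothetical callable standing in for whatever client
you use; it is not part of Google's API. The 1,200 limit is the
approximate figure from the post, and rejoining chunks with a space is
my own simplification.

```python
import re

MAX_CHARS = 1200  # approximate per-string limit reported in the post


def _split_words(sentence, max_chars):
    """Fallback: break one over-long sentence at word boundaries."""
    parts, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts


def chunk_text(text, max_chars=MAX_CHARS):
    """Split text into chunks, preferring breaks at sentence ends.

    A single word longer than max_chars is kept whole, so a chunk can
    exceed the limit in that one case.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def translate_long(text, translate, max_chars=MAX_CHARS):
    """Translate chunk by chunk with a caller-supplied function."""
    return " ".join(translate(chunk) for chunk in chunk_text(text, max_chars))
```

Translating sentence-sized chunks independently loses any context that
crosses a chunk boundary, so output quality may differ from a single
call on the whole text.
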
>
> Google’s AJAX Language API page promises future enhancements. We expect
> longer strings, named files, and longer documents to be part of future
> releases. What’s less likely in free Google MT are commercial features
> such as lexical tuning by company, industry-specific glossaries, or the
> feedback loop available since 2005 in Language Weaver
> <http://globalwatchtower.com/2005/10/25/machine-translation-language-weaver-microsoft/>
> (although Google does have a generalized “train the engine” function).
>
> * For information consumers and seekers of truth in languages other
> than their own, these advances will be good news. Higher quality,
> free machine translation utilities will lead to MT popping up in
> more and more applications.
> * For translators who don’t own translation memory software, we
> think that Google remains a great candidate for offering a
> gmail-like translation environment
> <http://globalwatchtower.com/2007/12/12/gmail-tm/>, replete with MT.
> * Smart LSPs should seriously consider preprocessing small projects
> through the Google engine and — depending on the output — decide
> whether it is worth post-editing or fully translating the text.
> After all, they really don’t have anything to lose and could
> increase the productivity of their translators.
> * Competing MT engines will need to move fast to stay ahead of the
> ad-funded portal. This API will make life difficult for the
> already besieged smaller players trying to sell their wares in a
> market monetized more by search and eyeballs than by software
> license revenue. Companies like SpeakLike and Transclick
> <http://globalwatchtower.com/2007/12/12/gmail-tm/> (one of 391
> World Economic Forum Technology Pioneers
> <http://www.weforum.org/en/Communities/Technology%20Pioneers/index.htm>)
> will likely add the Google engine to their suites of MT engines.
> Meanwhile, we don’t expect companies like Asia Online
> <http://globalwatchtower.com/2007/12/20/mt-eyeballs/>, Language
> Weaver, Microsoft, PROMT, SDL, SYSTRAN, and others with their own
> MT engines and advancing research to sit on the callable MT
> sidelines for long.
>
> Earlier today we spoke with Dimitris Sabatakakis, CEO at SYSTRAN, who
> said that “all MT providers should thank Google for the hype and
> excitement it brings as MT is now perceived as a practical and usable
> technology. This means there are more potential customers interested in
> a MT product or solution. Google’s investment in MT is proof that MT is
> a key technology for the emerging market and provides a solution to a
> real need. It is forcing all providers to raise their respective bars.
> If we stay static, we will collapse.”
>
> -------------------------
>
> Chevy “Nova”: Updating Bad Translation Apocrypha
> <http://www.globalwatchtower.com/2008/02/06/chevy-nova-updating-bad-translation-apocrypha/>
>
> Donald A. DePalma 6 February 2008
>
>
> Not an hour goes by that we don’t receive an e-mail announcing a press
> release from a vendor. What we find most interesting is when a company
> issues a press release but fails to tell us (or anybody else) that it’s
> out there. That happened back in May when SDL noted that “Spanish leaves
> global marketers lost in translation.” Quoting the press release,
> “According to SDL, the top five worst translation mistakes made by
> companies looking to expand into the Spanish-speaking world” were the
> usual hackneyed examples of bad translation. These included “I saw the
> Pope” (/el Papa/) translated as “I saw the potato” (/la papa/), the “Got
> milk?” slogan rendered as “Are you lactating?” in Spanish, and Parker
> introducing its non-leaking fountain pen in Spain with the slogan “it
> won’t leak in your pocket and embarrass you,” with the translator
> buddying up with a false friend (/embarazar/ means to make pregnant, not
> to embarrass). At least they left out the old chestnut about the Chevy
> Nova (/no va/ — get it?) in Latin America and the rumored over-medicated
> U.S. Latina who interpreted the “/once/ a day” on her prescription as
> “11 times a day.”
>
> What’s going on here? It’s all about search engine optimization. SDL
> cited these examples plus economic figures for Latin American growth to
> improve its SEO rankings for the Hispanic market. The company’s CMO
> figured that becoming associated with these sometimes apocryphal
> mistranslations was a good way to improve SDL’s search engine rankings.
> Of course, we’re doing the same here by recycling these oft-told tales
> of mistranslation.
>
> But wait — there are some really good examples of bad translations and
> cross-border mistakes out there. Here are a few of our favorites:
>
> * For our 2002 keynote at the SAE’s TopTec Multilingual
> Communication for the Automotive Industry conference, we found
> candidates for “Bad Product Name of the Year” among Japanese car
> makers selling in Latin America: Mazda Laputa (interpreted by
> Spanish speakers as /la puta/), Mitsubishi Pajero (slang for
> onanist), and Nissan Moco (snot). In that speech we cited an auto
> show description of the Laputa that might not be suitable for
> children — “Laputa ha mejorado su seguridad y ampliado su
> interior… Cuerpo diseñado para resistir impactos frontales.” Check
> that out at Yahoo! <http://babelfish.yahoo.com/translate_txt> or
> Google <http://www.google.com/language_tools?hl=en> free MT sites.
> * More recently, Car and Driver
> <http://www.caranddriver.com/autoshows/14559/2008-detroit-auto-show-we-translate-chinese-auto-brochures.html>
> magazine reviewed the translated claims of Chinese automakers at
> the Detroit Auto Show. The brochure for the Liebao CS6 SUV claimed
> “Gene of being Wild: VM engine brings you the long-awaited shock…
> only by stepping on the accelerograph, the mph will come to the
> peak in a second” and the BYD F3 sedan has “fuel efficiency stomach.”
> * Back to the subject of product names, we noticed a stand for a
> firm selling “Hyper STD” at the tekom conference in Wiesbaden,
> Germany last November (see photo above). Yuck! Most American
> buyers would steer clear of products associated with Sexually
> Transmitted Diseases.
> * When we tried the WiFi at the tekom conference Hotel Klee am Park
> in Wiesbaden, we read the English-language instructions that told
> us: “General technical supposition is a reticulation-card. Please
> arrange your reticulation-card to IP (automatic internet
> register).” Huh?
> * The classic post-Sputnik mistranslation of “wet sheep” for
> “hydraulic rams” in a Soviet science journal is an under-used
> classic example. That’s baaaad! Next time you think about
> referencing the Nova, try this one instead.
> * A friend who was an interpreter at the United Nations told us
> about a colleague who tried to amplify an emotionally-delivered
> idiomatic expression, suggesting that “we need to grab the bull by
> something other than the horns.” Ouch.
>
> But bad translations aren’t always funny. They can have serious
> consequences:
>
> * *Financial markets will shake. *Back in May 2005 a reporter for
> the China News Service pieced together a story about how currency
> appreciation might affect the market
> <http://online.wsj.com/public/article/SB111581539395830336-fMkM6GCThY_89ij8ljDO_jQgw6w_20060511.html?mod=public_home_us>.
> The People’s Daily had it translated into English without the
> subjunctive case, stating that China decided to revalue its
> currency 1.26% a month for a year. Bloomberg’s spider in London
> picked up the story and European equity markets rose on the news.
> While it was quickly repudiated, the error did cause market tremors.
> * *Armies can advance without consequence. *In August 1968 U.S. Army
> transcribers reportedly wrote down a transmission from a Soviet
> tank column as “my perexali most” rather than “my priexali v
> Most.” What was heard (a routine bridge-crossing exercise by a
> tank column) was not what happened (the arrival of Soviet tanks in
> Most, a city in sovereign
> <http://www.youtube.com/watch?v=W28CQQsH9S8> Czechoslovakia).
> * *Countries might disappear.* In October 2005 Iranian President
> Mahmoud Ahmadinejad
> <http://www.globalresearch.ca/index.php?context=va&aid=4527>
> reportedly called for Israel to be wiped off the map, but
> apparently he really “just” wanted to get rid of its government.
> True to form, Ahmadinejad didn’t clarify his remarks after the
> mistranslation, further complicating matters.
> * *Companies will get into trouble.* A senior executive at Yahoo!
> had to apologize for not giving U.S. Congressmen information about
> the company’s role in the imprisonment of a Chinese dissident
> <http://www.nytimes.com/2007/11/03/technology/03yahoo.htm>, Shi
> Tao. According to Yahoo!, a bad translation by an employee of a
> 2004 order from the Chinese government caused the problem.
>
> None of the mistakes after the “But wait” in this posting were machine
> translation miscues
> <http://globalwatchtower.com/2007/11/09/israeli-email-mt/> — they’re
> just bad translations by humans. Caveat lector!
>
>
> --------------------------
> JAJAH Advances Machine Interpretation
> <http://www.globalwatchtower.com/2008/08/12/jajah-machine-interpretation/>
> http://www.globalwatchtower.com/2008/08/12/jajah-machine-interpretation/
> Renato Beninatto and Nataly Kelly 12 August 2008
> Filed under (Interpretation
> <http://www.globalwatchtower.com/category/interpretation/>, Translation
> & Localization
> <http://www.globalwatchtower.com/category/translation-localization/>,
> Translation Technologies
> <http://www.globalwatchtower.com/category/translation-technologies/>,
> Language Industry
> <http://www.globalwatchtower.com/category/language-industry/>)
> 2 pepper rating
>
> When we first heard about JAJAH’s extremely simple process
> <http://www.jajahbabel.com/> for providing machine-based telephone
> interpretation, it sounded too good to be true. The process is comprised
> of three easy steps — simply dial a number from any phone, speak in
> English, and hand your phone to the person who speaks Mandarin. The way
> it is described, the service would seem to automate much of human
> interpreters’ work, and would be particularly helpful for situations in
> which telephone interpreters are used. As usual, if it sounds too good to
> be true, it probably is. We tested the service, currently touted as a way
> to help travelers overcome language barriers in China, just in time for
> the Beijing Olympics
> <http://www.globalwatchtower.com/2008/07/29/china-seeks-gold-medal-in-language-services/>.
> We conducted several tests and found that the service seemed to work
> quite well at some levels, in that it did correctly render some of our
> words into the target languages. However, the voice recognition
> component misunderstood some of our words, even when we conducted tests
> with speakers of native and near-native English. To test the service in
> Mandarin, we used voice-over samples recorded by professional talent,
> and the results were a bit difficult to understand in English — then
> again, we purposely used samples with brand names that we knew tend to
> be problematic for machine translation tools. Now that we’ve aired our
> complaints, let’s take a look at a few points on the bright side of this
> innovation:
>
> * *You get what you pay for — at least, in the early stages. *The
> service is free, so it should come as no surprise that it does not
> work perfectly yet. In spite of the disjointed target language
> versions we received in English and the fact that telephony
> provider JAJAH went with another Babel theme, we do not believe
> that the localization world will automatically relegate it to the
> role of industry laughingstock, as happened with BabelFish
> <http://www.globalwatchtower.com/2008/02/06/chevy-nova-updating-bad-translation-apocrypha/>.
> * *Free machine-based telephone interpretation is a first. *At
> Common Sense Advisory, we’ve been writing more in the past few
> months about the trend we are noticing toward computer-assisted
> interpretation (CAI)
> <http://www.commonsenseadvisory.com/research/report_view.php?id=66&cid=0>
> and the future synergies between translation memory and what we
> refer to as interpretation memory (IM) — pre-translated and
> pre-recorded words and phrases that serve to partially automate
> the process of interpretation. This additional focus in our
> research is intentional — CAI has already been widely implemented
> for devices used by the military, but this is one of the first
> instances we’re aware of that offers such a service for free,
> on-demand, via telephone, and to the general public. This type of
> service pushes CAI to a new level.
> * *Savvy developers will want to take note. *This offering from
> JAJAH may not appear at first to represent a major technological
> advancement, but it does prove to the world that machine
> interpretation (MI) is possible, even if the quality is not yet up
> to par. LSPs — especially telephone interpretation providers
> <http://www.globalwatchtower.com/2008/07/21/language-line-welcomes-networkomni-clients-back-into-the-fold/>
> — and technology companies that aim to stay ahead of the curve are
> well-served to keep CAI and MI on their radar. We predict that
> more and more of these services will begin to spring up soon.
>
> Even for the traveler who is willing to hit the re-dial button a few
> times and is able to accept an imperfect rendition, this service may be
> of limited use. While it’s certainly not as costly as some of the
> phone-based Chinese interpretation services that have recently been
> profiled in the Wall Street Journal
> <http://online.wsj.com/article/SB121624832986259935.html?mod=googlenews_wsj>
> and other media as services for travelers to the Olympics, it could
> prove to be cost-prohibitive for a person dialing the number repeatedly
> and trying to confirm the recording’s accuracy while sitting in a taxi
> in Beijing with the meter running — especially if proper nouns, such as
> the hotel name, are rendered incorrectly. That’s precisely what happened
> in our example — take a look at the video below and judge for yourself.
> In summary, we don’t see this service replacing the need for phone-based
> interpreters anytime soon, but the general impact — and possibilities —
> for the language services industry are definitely worth noting.
>
>
> ----------------------------------------------------------
> Google Shakes Up the Translation Memory Scene
> <http://www.globalwatchtower.com/2008/08/08/google-translation-center/>
> http://www.globalwatchtower.com/2008/08/08/google-translation-center/
> Nataly Kelly 8 August 2008
> Filed under (Translation & Localization
> <http://www.globalwatchtower.com/category/translation-localization/>,
> Translation Technologies
> <http://www.globalwatchtower.com/category/translation-technologies/>,
> Language Industry
> <http://www.globalwatchtower.com/category/language-industry/>)
>
> This week, there were rumblings about the forthcoming beta release of
> Google’s new translation management system (TMS), called Translation
> Center <https://www.google.com/accounts/ServiceLogin?service=gtrans>. If
> you’re familiar with Google Translate,
> <http://translate.google.com/translate_t> you might be thinking, “Big
> deal, this is just a low-tech, human version of what they’re already
> doing.” If so, you would be wrong: This is big news for the practice of
> translation. It seems that Google has been stalking the sector.
>
>
> We predicted in 2006
> <http://www.commonsenseadvisory.com/research/report_view.php?id=37&cid=0>
> that Google would open up its statistical machine translation engine for
> general usage — and so it did, as we reported in March 2008
> <http://www.globalwatchtower.com/2008/03/25/google-mt-api/>. Last
> December, we published our first report on collaborative translation
> <http://commonsenseadvisory.com/research/report_view.php?id=59&cid=0>,
> in which we explained how collaboration tools and open source concepts
> could increase translation efficiency. We’ve written about the merits of
> crowdsourcing
> <http://www.globalwatchtower.com/2008/03/27/collaborative-translation-and-crowdsourcing/>
> and how companies like Facebook, Google, and Sun Microsystems have
> pioneered work in this area.
>
>
> Google seems to have been listening. In December of 2007, we suggested a
> gmail-like model <http://www.globalwatchtower.com/2007/12/12/gmail-tm/>
> for translation memory and forecasted that a company from outside the
> language industry with no interest in selling tools — such as Ask,
> Google, or Yahoo! — might be well-served to make such an offer. Google
> has apparently done just that. It claims that its new translation
> management system (TMS) gives users the ability to request translations,
> find translators, and upload documents for translation into more than 40
> languages. It also enables freelancers to create and review content in
> their languages using free translation tools. Yes, free.
>
>
> Why would Google take an interest in supporting human translation
> activities? One big reason: It needs human support in order to build up
> its translation memory, so that Google Translate can evolve from a “me
> translate pretty one day” prototype to a reputable and reliable language
> conversion machine. True, there are some large sources of free
> translation memory out there already — such as the enormous database
> offered by the European Parliament
> <http://www.globalwatchtower.com/2008/01/21/free-tm-european-commission/>.
> But, to truly enable mass quantities of information to be shared around
> the globe, Google needs richer, vaster sources of TM than what’s
> currently in the public domain. After all, the typical web user might
> want to communicate now and then regarding things other than, say,
> official EU declarations and proceedings.
>
>
> Adding humans to the mix enables Google to gradually create a very large
> storehouse of translated words and phrases — exactly what TAUS is aiming
> for with its data sharing initiative
> <http://www.globalwatchtower.com/2008/06/26/taus-tda-charter/> and what
> Asia Online is doing with its human-enhanced statistical MT engine
> <http://www.globalwatchtower.com/2008/04/14/asia-online-portal/>. In a
> nutshell, Google will unite its cloud with the crowd to get as many
> helping hands on the job as it can.
>
>
> We’ll reserve our detailed comments on Google Translation Center until
> we can actually try it out for ourselves and see how it fares alongside
> other TMS programs — our in-depth report with translation management
> system scorecards
> <http://www.commonsenseadvisory.com/research/report_view.php?id=43&cid=5>
> for translation management suppliers will be published soon — but the
> big picture value of this news for the industry is clear. Even in its
> beta form, Google Translate showed decent promise
> <http://www.globalwatchtower.com/2007/10/30/mt-shootout/> for the future
> of automating written language mediation — it is a well-built machine
> translation engine.
>
> What separates Google from the rest of the MT field is that this machine
> is backed up by a manufacturer with plenty of money, data center power,
> disk space, and network infrastructure, not to mention expertise in the
> assembly and productization of raw information materials. But now, with
> the addition of humans, it has the opportunity to become well-oiled in
> addition to having a sturdy construction. What remains to be seen is if
> Google can find enough oil to maximize MT performance. Thankfully,
> translation memory is a plentiful resource — one that won’t require any
> drilling.
>
>
> -----------------------------
>
> --- On *Sat, 1/31/09, Don Osborn /<dzo at bisharat.net>/* wrote:
>
> From: Don Osborn <dzo at bisharat.net>
> Subject: RE: Cheeseburgery hamburgers and the problem of
> computerised translations
> To: lgpolicy-list at ccat.sas.upenn.edu
> Date: Saturday, January 31, 2009, 9:47 AM
>
> We all know MT (machine translation, aka computerized translation) is not
> perfect so I don't think this piece was particularly informative.
>
> The only news I see in it is that there is MT for Polish <-> English
> (probably has been for a while but this is the first note I've made of
> it). Given what must be necessary to develop MT, it does not surprise me if
> a recently developed program churns out some cheeseburgery results (though I
> wonder who put that word in the lexicon).
>
> While on the topic, my favorite MT mistranslation was with an older version
> of Systranet.com (results duplicable on Babelfish): "discussion on fonts" in
> English became in Portuguese the equivalent of "quarrels in baptismal
> basins." Such blatantly outrageous results, though, speak to me as a
> non-specialist in the matter more of how the MT was set up than any inherent
> problem with setting up MT. Discussion in English is not really a synonym
> with its apparent cognates in Latin languages (at least French &
> Portuguese); and how often do English speakers use "font" to describe what
> in Portuguese they call pias baptismas? I've never heard of cheeseburgery
> before but will surely find a way to use it in conversation sometime - just
> not in MT.
>
> The real news is how useful MT can be in sorting through the gist of things
> in diverse languages, and how with new approaches the results are improving
> significantly. I hope FT takes a look at that, and how the complex and
> uneven progress in MT is changing the way we access and use multilingual
> content and documents.
>
> Don Osborn
>
>
>
> > -----Original Message-----
> > From: owner-lgpolicy-list at ccat.sas.upenn.edu [mailto:owner-lgpolicy-
> > list at ccat.sas.upenn.edu] On Behalf Of Harold Schiffman
> > Sent: Tuesday, January 27, 2009 11:18 AM
> > To: lp
> > Subject: Cheeseburgery hamburgers and the problem of computerised
> > translations
> >
> > Cheeseburgery hamburgers and the problem of computerised translations
> > January 26, 2009 by Tony Barber
> >
> > This morning I found myself on a public platform in a Brussels hotel
> > for my first ever European bloggers' conference. As a representative
> > of an "establishment" news organisation, I was half-expecting to be
> > roasted alive. But in the end both Mark Mardell of the BBC, my friend
> > and fellow-guest, and I got through it safely enough. The most
> > perceptive contribution, I thought, came from a Romanian blogger who
> > made the point that the global blogosphere remains to a large extent
> > divided by language. For example, you can blog all you like in
> > Romanian, but most of the world won't have a clue what you're saying.
> >
> > A moderator responded to this by saying, "Try using computer-generated
> > translation." As I drifted back to my office, I recalled that the last
> > time I'd experimented with computers striving to change Italian into
> > English or Dutch into Spanish, the results had been pretty hopeless.
> > Perhaps things had improved over the last couple of years?
> >
> > Well, below are three examples of computerised translation - courtesy
> > of Google Language Tools - from French, German and Polish into
> > English. I am republishing the translations exactly as they came out,
> > punctuation mistakes and all, after I hit the button.
> >
> > 1) This is from a news story in Le Monde about US and European policy
> > in the Middle East. "Believing that the war in Gaza has imposed new
> > priorities and the administration of the new American president,
> > Barack Obama, might break with the unconditional support to Israel,
> > French diplomacy is trying to print in Europe, a change of tone
> > against the Hamas."
> >
> > As you can see, this translation starts off promisingly. In fact, it
> > scarcely puts a foot wrong until it loses control and talks, weirdly,
> > about printing changes of tone against the Hamas. Still, we sort of
> > know what's going on here. 7 out of 10 for Monsieur L'Ordinateur.
> >
> > 2) Now here's a sentence from a story in Germany's Süddeutsche Zeitung
> > about the US prison centre at Guantánamo and what Europe can do to
> > help close it down. "The fate Released Guantanamo prisoners ensures
> > fierce debates: Union politicians criticized the foreign ministers of
> > Vorpreschen Stein Meier - and refer the responsibility for the inmates
> > to the U.S."
> >
> > This is a pretty poor effort, Herr Computer. Particularly
> > disappointing is the omission of the preposition "of" between "fate"
> > and "released" (which also shouldn't have a capital R), and the
> > baffling three words "Vorpreschen Stein Meier". But let's be fair,
> > there's a modest degree of sense here. 5.5 out of 10.
> >
> > 3) Lastly, here's a sentence from the Polish newspaper Gazeta Wyborcza
> > on French leisure habits during the recession. "Economic crisis and
> > changing lifestyles, the French seriously affect the profits of French
> > cafes and restaurants. A sign of the collapse of the French culture of
> > the restaurant is visible on the streets of Paris rash of
> > quick-service bar, offering generally pogardzane a few years ago and
> > cheeseburgery hamburgers."
> >
> > No, dear readers, you have not gone potty. That's what it says. And I
> > am afraid, Pan Komputer, that it's utter gibberish. You get 2 out of
> > 10 - and an hour's detention in the language lab.
> >
> > http://blogs.ft.com/brusselsblog/2009/01/cheeseburgery-hamburgers-and-
> > the-problem-of-computerised-translations/
> >
> > --
> > **************************************
> > N.b.: Listing on the lgpolicy-list is merely intended as a service to
> > its members
> > and implies neither approval, confirmation nor agreement by the owner
> > or sponsor of
> > the list as to the veracity of a message's contents. Members who
> > disagree with a
> > message are encouraged to post a rebuttal. (H. Schiffman, Moderator)
> > *******************************************
>
>
>
--
Alexander J. Stein
Cell: (201) 412-9479
Email: alharaka at gmail.com
Skype: alexander.j.stein
AIM: elduderino6886