<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;">Agreed completely. The degree to which MT is useful depends directly upon the capabilities of the engine itself, the source text, the purpose the translation will serve, and the target audience needs, among other issues. Most of the free online tools are not yet at the stage where they can be useful for projects such as conference abstracts and speaker bios -- except in cases such as the PAHO tool, which was designed specifically for a given domain. Also, both types of MT engines evolve and become more accurate as they "learn" with time. <br><br>As for literary translation and the translation of more "creative" text (marketing slogans, taglines, ad campaigns, and so on), it seems that MT is unlikely to be used widely for such purposes in our lifetimes. However, the majority of text that is mandated by language access legislation does not fall into that
category. In the realm of technology and future possibilities, those types of creative text lend themselves more to crowdsourced or collaborative translation, in which large numbers of individuals collaborate on the translation, essentially using real-time voting on specific units of meaning to determine the most appropriate rendition for the target audience. However, CAT (computer-assisted translation) is already widespread for most other purposes, and can ensure that a given phrase, tagline, convention, etc., is consistent across a multitude of stanzas, documents, or output formats. An appropriate example of this is the European Commission's release of its translation memory, consisting of more than a million sentences in 22 languages. This saves costs, because any time those sentences are used in other contexts and publications, the translation can be re-used (for free). The possibilities of saving costs and implementing legislation related to language
access are extraordinary with such initiatives.<br><br>As more and more language access laws are passed requiring translation and interpreting services, the role of technology is growing, both due to the shortage of interpreters and translators in relation to the demand, and the high costs associated with such services. Costs are greatly reduced through appropriate use of technology, and quality improvement is achieved through CAT tools as well. <br><br>All of this said, "total automation" of translation (MT) and interpreting with "perfect output quality" remains the stuff of science fiction. There are some exceptions within very controlled settings, but for widespread use, the notion of quality itself -- even with 100% human translation -- is hotly debated, since there is a great deal of subjectivity involved in assessing the quality of translated text.<br><br>Within the realm of language access legislation, the widespread use of technology has
the impact of making language services more readily accessible and affordable. <br><br>Best regards,<br>Nataly<br><br>--- On <b>Sat, 1/31/09, Harold Schiffman <i><haroldfs@gmail.com></i></b> wrote:<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px;">From: Harold Schiffman <haroldfs@gmail.com><br>Subject: Re: Cheeseburgery hamburgers and the problem of computerised translations<br>To: lgpolicy-list@ccat.sas.upenn.edu<br>Date: Saturday, January 31, 2009, 11:06 AM<br><br><pre>I'm replying primarily to Don Osborn on this because he raised the issue of<br>how<br>relevant it is to language policy, or at least how relevant to this<br>list. I forwarded<br>the "cheeseburgery" message because I felt that relying on MT for<br>translation<br>is often an excuse for allowing a maximized use of English in various contexts,<br>and then assuming that MT will take care of the "problem".<br><br>Other
responders have shown here that some of the MT systems do a pretty<br>good job, especially with "technical" language. But I had experience<br>with the<br>opposite a few years ago, when I helped edit an issue of a journal on the topic<br>of the sociolinguistics of minority languages in France. All the articles were<br>in English, but it was my job to make them sound better in English, since they<br>had all been written by non-mother-tongue speakers of English. But one<br>in particular<br>had been machine-translated from French, and I found it impossible to make it<br>sound like "good" English. It was beyond help. We finally went back<br>to square one<br>and had it translated again, by a human being.<br><br>Maybe sociolinguistics is beyond the domain of "technical" writing,<br>and maybe that<br>was the reason the MT failed.<br><br>But I do think MT is a "policy" issue and will continue to haunt us<br>in<br>anything we write<br>that involves metaphor,
figurative language, poetry, and stuff like that.<br><br>HS<br><br><br>On Sat, Jan 31, 2009 at 10:54 AM, Al Haraka <alharaka@gmail.com> wrote:<br>> Nataly,<br>><br>> Thanks for the great response. I was very into this in college and took<br>the<br>> only classes available at my school on NLP. This is a good review. I<br>will<br>> definitely read that article!<br>><br>> Cheers,<br>> _AJS<br>><br>> Nataly Kelly wrote:<br>>><br>>> Google's statistical MT engine(http://translate.google.com/) is<br>available<br>>> in the following languages: Albanian, Arabic, Bulgarian, Catalan,<br>Chinese,<br>>> Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish,<br>>> French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Indonesian,<br>>> Italian, Japanese, Korean, Latvian, Lithuanian, Maltese, Norwegian,<br>Polish,<br>>> Portuguese, Romanian, Russian, Serbian, Slovak,
Slovenian, Spanish,<br>Swedish,<br>>> Thai, Turkish, Ukrainian and Vietnamese.<br>>><br>>> I will paste below a few recent Watchtower blog entries (independent<br>>> industry commentary) that might be of interest on the topic of both<br>>> rules-based and statistical MT. I would recommend clicking on the<br>actual<br>>> page URLs to see the related links, videos and images in case they do<br>not<br>>> display properly here. However, these just give a snapshot of the<br>state of<br>>> the market, and do not dive into the technical details of the machine<br>>> translation engines. Those are often the subject of papers and<br>presentations<br>>> within the localization and computational linguistics conference<br>circuits.<br>>><br>>> For some types of projects, MT can actually work well, especially for<br>>> controlled language and technical content. The Pan American
Health<br>>> Organization has had great success using their MT engine for technical<br>>> content. It is one of the best examples I have seen of domain-specific<br>MT.<br>>> More information:<br>http://www.paho.org/English/AM/GSP/TR/Machine_Trans.htm<br>>><br>>> There are currently several language service providers (LSPs) whose<br>>> business model is centered around using free or nearly-free machine<br>>> translation with human post-editing. However, MT is also widely used<br>for<br>>> gisting and is particularly helpful for scanning a large corpus to<br>determine<br>>> which areas might require TM+post-editing or computer-assisted<br>translation<br>>> (CAT) performed by humans but made easier through the use of<br>translation<br>>> memory and software tools that aid with flagging repeated text so that<br>it<br>>> only has to be translated once, terminology extraction and
management<br>tools<br>>> for ensuring consistent use of terminology, etc.<br>>><br>>> Another growing trend is machine interpretation (total automation of<br>>> spoken language interpretation), so I'll include one post on that<br>topic<br>>> below as well. Computer-assisted interpretation (CAI) is another<br>growing<br>>> trend, in which both end users and interpreters themselves are making<br>>> greater use of software, handheld devices, and desktop applications to<br>>> facilitate interpretation tasks.<br>>><br>>> I hope some of these blog posts will be useful to colleagues, although<br>it<br>>> is important to remember that, as blog entries, they provide just a<br>snapshot<br>>> of the current trends in the language services market. A great many<br>books<br>>> and journal articles exist on these topics that would lend greater<br>insight<br>>> to those interested in
the current state of the research.<br>>><br>>> Nataly Kelly<br>>><br>>> --------------------------<br>>> How Good Is Machine Translation? A Modest Test<br>>> <http://www.globalwatchtower.com/2007/10/30/mt-shootout/><br>>> http://www.globalwatchtower.com/2007/10/30/mt-shootout/<br>>> Donald A. DePalma 30 October 2007<br>>><br>>> The Wall Street Journal<br>>><br><http://online.wsj.com/article/SB119265174539562359.html?mod=yahoo_hs&ru=yahoo><br>>> recently opined that "translation software is at last good enough<br>to help<br>>> companies do business in other languages," noting a hoary case<br>study from<br>>> Ford and posturings from Google, Microsoft, and SDL — and few real<br>examples.<br>>> But that's fine. The Journal has just discovered MT, perhaps<br>looking for<br>>> juicier stories to put on its Page 3<br>>>
<http://en.wikipedia.org/wiki/Page_Three_girl> as Rupert<br>Murdoch's News Corp<br>>> <http://www.newscorp.com/> takes over as dowager queen of the<br>print media.<br>>><br>>><br>>> That said, MT is definitely on the must-review list for many companies<br>and<br>>> government agencies, but few are paying for it today. The biggest use<br>of<br>>> automated translation is free online machine translation (OLMT). How<br>>> widespread? Last year Common Sense Advisory asked 2,430 consumers in<br>>> non-Anglophone countries<br>>><br><http://commonsenseadvisory.com/research/report_view.php?id=36&cid=0><br>>> whether they tried free OLMT — more than half said that they<br>sometimes,<br>>> frequently, or always use machine translation to better understand<br>>> English-language websites.<br>>><br>>><br>>> So, like it or not, information consumers will
get what they want<br>>> translated by a quick hop over to their favorite free MT site. How<br>good will<br>>> the translations be? Let's consider some skeptical lines penned by<br>our<br>>> resident Carioca as he read our entry about changes in the MT guard<br>>> <http://globalwatchtower.com/2007/10/30/mt-google-systran/>. We<br>decided to<br>>> paste these immortal words into several free OLMT sites to test<br>Portuguese<br>>> into English translations rather than come up with the MT equivalent<br>of the<br>>> "the quick brown fox jumped over the lazy brown dog." One of<br>the systems<br>>> used statistical machine translation (SMT), the other 3 were<br>rules-based<br>>> (RBMT) systems (see the full version of Automated Machine Technology<br>>><br><http://commonsenseadvisory.com/research/report_view.php?id=37&cid=0><br>for a<br>>> discussion of the differences, pros, and
cons).<br>>><br>>> "Escrevo estas linhas em português para testar o novo método<br>de<br>>> tradução automática do Google. Eu não tenho grandes<br>expectativas com<br>>> relação a este tipo de serviço. Confesso que sou usuário<br>freqüente<br>>> para ter acesso a informações em idiomas que não domino, como o<br>>> alemão, chinês e japonês. O Google agora utiliza tradução<br>automática<br>>> com base estatística. Quer dizer, quanto mais traduções houver<br>de<br>>> uma frase, mais correta será a solução oferecida pelo<br>computador."<br>>><br>>> * Google <http://www.google.com/language_tools?hl=en> (SMT):<br>"I<br>>> write these lines in Portuguese to test the new method of<br>>> automatic translation of Google. I do not have great expectations<br>>> with respect to this type of service. I confess that I am
a<br>>> frequent user to access information in languages that no domain,<br>>> such as German, Chinese and Japanese. Google now uses machine<br>>> translation based statistics. That is, the more there translation<br>>> of a phrase, more correctly is the solution offered by the<br>computer."<br>>> * PROMT <http://www.e-promt.com/> (RBMT): "I write these<br>lines in<br>>> Portuguese to test the new method of automatic translation of the<br>>> Google. I have not big expectations regarding this type of<br>>> service. I confess that I am a frequent user to have access to<br>>> informations in languages that I do not dominate, like German,<br>>> Chinese and Japanese. The Google now uses automatic translation<br>>> with statistical base. It means, how much more translations will<br>>> be of a sentence, more correct will be the
solution offered by<br>the<br>>> computer."<br>>> * SDL <http://www.freetranslation.com/> (RBMT): "I write<br>these lines<br>>> in Portuguese for quiz the new approach of automatic translation<br>>> of the Google. I do not have big expectations regarding this kind<br>>> of service. Confessed that I am user frequent for have access the<br>>> information in languages that do not dominate, as the German,<br>>> Chinese and Japanese. The Google now utilizes automatic<br>>> translation with statistical base. It want to say, specially<br>>> translations will have of a phrase, more correct will be the<br>>> solution offered by the computer."<br>>> * SYSTRAN <http://www.systransoft.com/> (RBMT): "I write<br>these lines<br>>> in Portuguese to test the new method of automatic translation of<br>>> the Google. I do
not have great expectations with regard to this<br>>> type of service. I confess that I am using frequent to have<br>access<br>>> the information in languages that I do not dominate, as the<br>>> German, Chinese and Japanese. The Google now uses automatic<br>>> translation with base statistics. It wants to say, the more<br>>> translations will have of a phrase, more correct will be the<br>>> solution offered for the computer."<br>>><br>>> Judge for yourself. While none of these are perfect translations and<br>one<br>>> is definitely not at the quality level of the others, all 4 tell us<br>that<br>>> Senhor Beninatto wasn't writing a shopping list for "pound<br>pastrami, can<br>>> kraut, six bagels." For many web browsers, that ability to<br>determine the<br>>> subject of a communication will be good enough, allowing them to<br>determine<br>>>
whether they want to invest more time in a given piece of information.<br>>> Obviously, in more complex domains and in printed communications like<br>>> owner's manuals for a Porsche 911 GT3 RS<br>>> <http://www.porsche.com/usa/models/911/911-gt3-rs/> (Santa, are<br>you<br>>> listening?) or how to adjust the control rods for a nuclear fission<br>reactor,<br>>> tuning and accuracy will be much more of an issue.<br>>><br>>><br>>> ----------------------------<br>>> Changing of the Guard in Machine Translation<br>>> <http://www.globalwatchtower.com/2007/10/30/mt-google-systran/><br>>> http://www.globalwatchtower.com/2007/10/30/mt-google-systran/<br>>> Donald A. DePalma 30 October 2007<br>>><br>>> Most information will never be translated by humans from its source<br>>> language into even one other language, much less into many. Budgets,<br>>>
staffing, and time will always make organizations shy away from<br>translating<br>>> even a small fraction of the words they have on hand. Many companies<br>and<br>>> government agencies will use some form of automated translation to<br>improve<br>>> services to customers and constituencies. However, many information<br>>> consumers will avail themselves of free online machine translation<br>(OLMT) if<br>>> they don't find their language at a website.<br>>><br>>><br>>> Most of that free OLMT to date has been provided by SYSTRAN<br>>><br><http://globalwatchtower.com/2007/02/14/systran-2006-financial-results/>,<br>a<br>>> French software firm that grew up during the Cold War as the Free<br>World<br>>><br><http://www.m-w.com/cgi-bin/dictionary?book=Dictionary&va=free+world><br>faced<br>>> off against the Moscow-led Warsaw Pact<br>>>
<http://en.wikipedia.org/wiki/Warsaw_Pact>. In October new<br>challenges arose<br>>> from the new guard, including the Russians themselves.<br>>><br>>> * Google reportedly replaced the languages that SYSTRAN translated<br>>> for it in favor of its in-house statistical machine translation<br>>> (SMT) engine. Google's homegrown technology came into wide<br>view<br>>> when it won the no-holds-barred NIST Machine Translation<br>>> Evaluation<br>>><br>>> <br><http://globalwatchtower.com/2005/08/22/machine-translation-benchmark-bleu-nist/><br>>> in 2005. Google's MT is part of the GooglePlex — that is,<br>not yet<br>>> a commercially available product, but, like its search appliance,<br>>> MT could become a Google product. Try it here<br>>> <http://www.google.com/language_tools?hl=en>.<br>>> * SMT-based Language
Weaver opened its second sales office in<br>Europe<br>>><br>>> <br><http://www.languageweaver.com/page.asp?intNodeID=856&intPageID=1181>.<br>>> After its initial success selling to certain U.S. government<br>>> agencies, Language Weaver made its 2006 European debut in<br>>> bureaucrat-dense, government-rich Brussels. Its latest digs are<br>in<br>>> Paris, hometown of SYSTRAN — and presumably of some commercial<br>>> buyers. Free use of Language Weaver on the web is harder to find<br>>> than Google or SYSTRAN. Earlier this year the company announced<br>>> that the social bookmarking<br>>><br>>> <br><http://www.languageweaver.com/Page.asp?LSM=&intNodeID=856&intPageID=1018><br>>> site Kontrib <http://www.kontrib.com/> was using its<br>technology,<br>>> giving everyone a chance to see its output. Expect
Language<br>Weaver<br>>> to host its own OLMT site as part of its marketing expansion.<br>>> * St. Petersburg-based PROMT announced a significant uptick in the<br>>> use of its free OLMT<br><http://www.e-promt.com/en/news/6073.php>.<br>>> This followed its September announcement of V7.8 with support for<br>>> Windows Vista <http://www.e-promt.com/en/news/5544.php>,<br>while<br>>> those fortunate enough to speak Russian already have access to<br>>> Version 8.0 <http://www.promt.ru/> with its improved<br>algorithms<br>>> and usability. Try its free OLMT <http://www.e-promt.com/>.<br>>><br>>> The bottom line: Most consumers will never buy desktop machine<br>translation<br>>> software from LEC, PROMT, or SYSTRAN for their PCs, Macs, or<br>smartphones.<br>>> However, they will have free MT available in the cloud from
Google,<br>Language<br>>> Weaver, LogoVista<br><http://www.logovista.co.jp/english/index.html>,<br>>> Microsoft, PROMT, SYSTRAN , and through portals like Yahoo! BabelFish<br>>> <http://babelfish.yahoo.com/>. How well do they work? Click here<br>for a<br>>> modest example<br><http://globalwatchtower.com/2007/10/30/mt-shootout/>.<br>>><br>>><br>>><br>>> ----------------------------<br>>><br>>> Seeking an MT Market beyond Ad-Reading Eyeballs<br>>><br><http://www.globalwatchtower.com/2008/09/25/language-weaver-estimate/><br>>> http://www.globalwatchtower.com/2008/09/25/language-weaver-estimate/<br>>> Donald A. DePalma 25 September 2008<br>>><br>>> Last week, Language Weaver projected a US$67.5 billion market for<br>digital<br>>> translation, enabled by advances in machine translation (MT). For the<br>last<br>>> few years, we have
released an annual estimate of the market for<br>outsourced<br>>> translation, localization, and interpretation. For 2008,<br>human-delivered<br>>> translation activities will total a hefty US$14.25 billion (see our<br>"Ranking<br>>> of Top 25 Translation Agencies<br>>><br><http://www.commonsenseadvisory.com/members/res_cgi.php/080528_QT_2008_top_25_lsps.php>").<br>>> On the software side, we estimate that the MT software market falls<br>well<br>>> short of US$100 million. Added together, there's a lot of daylight<br>between<br>>> our numbers and Language Weaver's estimate. Where's the<br>disconnect? Over the<br>>> last week, we've spent a lot of time talking with various people<br>about the<br>>> US$67.5 billion projection.<br>>><br>>><br>>> Let's start off by deconstructing the 67 billion dollar number.<br>That is an<br>>> estimate of the monetary value that Language
Weaver thinks MT<br>suppliers<br>>> "could" translate for corporations and governments; the<br>operative phrase in<br>>> the company's press release is "untapped markets" where<br>automated<br>>> translation could increase the volume and lower the cost of human<br>>> translation, which stands at current market prices of 10-40 cents per<br>word<br>>><br><http://www.commonsenseadvisory.com/research/report_view.php?id=63&cid=0>.<br>>><br>>> How good is Language Weaver's sizing of the as yet unrealized<br>market? We<br>>> think its number is way too low, especially as the amount of stored<br>content<br>>> grows at record levels (see the figure below from our report on<br>"Automated<br>>> Translation Technology<br>>><br><http://www.commonsenseadvisory.com/research/report_view.php?id=37&cid=0>").<br>>><br>>><br>>> The untapped market potential is much
higher, but the problem is still<br>>> getting buyers on board. Language Weaver will target customer care,<br>business<br>>> intelligence, and user-generated content, three markets where<br>companies<br>>> could benefit from moving content out of linguistic silos. However,<br>the<br>>> organizations today that stand to gain the most from MT are those<br>driving<br>>> advertisement-reading eyeballs to their sites<br>>> <http://www.globalwatchtower.com/2007/12/20/mt-eyeballs/>. The<br>challenge<br>>> that Language Weaver and rival developers face is getting more people<br>>> accustomed to the idea of paying for MT software or SaaS solutions<br>that will<br>>> help them translate their content into other languages. Three<br>roadblocks<br>>> stand in the way:<br>>><br>>> * *Free machine translation obscures the value.* There's an<br>>> enormous amount of content
that's translated every day online<br>>> using free online machine translation sites, but no one has<br>>> figured out how to directly monetize those interactions. We have<br>>> long contended that there's far more text that consumers,<br>>> businesses, and governments might run through those engines if<br>>> they could more easily plug them into workflows, e-mail systems,<br>>> mobile phones, and other networked appliances. Combine a dollar<br>>> figure for the unmonetized activity that's happening today at<br>>> sites like Google Translate or Yahoo!'s Babel Fish with the<br>dollar<br>>> value for things that should be translated - and you've got<br>some<br>>> really big piles of zeroes. The problem is that there are usually<br>>> no positive integers to the left of those zeroes. Bottom line:<br>Too<br>>> much of it is
free.<br>>> * *Unpaid human translation appears to be a panacea.* Another<br>rival<br>>> to MT is community or collaborative translation<br>>> <http://www.globalwatchtower.com/2007/10/16/end-of-tep/><br>for both<br>>> company- and user-generated content, such as we're seeing at<br>>> Facebook<br>>><br>>> <br><http://wiki.developers.facebook.com/index.php/Translating_Platform_Applications><br>>> (social networking), Livemocha<br>>> <http://translate.livemocha.com/doku.php> (language<br>learning), and<br>>> NetBeans<br>>> <br><http://www.netbeans.org/community/contribute/localise.html> (Java<br>>> software development). These communities can fill some of the<br>>> demand, but nowhere near all of it. That leaves a lot of<br>>> information forever locked in the language in which it
was<br>created.<br>>> * *An uneducated market expects too much or too little.* Potential<br>>> buyers retain unrealistic (read "Star Trek" or<br>"Hitchhiker's<br>>> Guide") expectations of what they will get out of machine<br>>> translation. Some ignore the quality issue<br>>><br>>> <br><http://www.commonsenseadvisory.com/research/report_view.php?id=68&cid=0><br>>> altogether, posting babble-fishy output and thinking they did a<br>>> good thing in providing any in-language content at all.<br>Meanwhile,<br>>> many individual translators and too many translation agencies<br>miss<br>>> the point; they think that MT threatens their livelihood rather<br>>> than viewing it as a productivity enhancer.<br>>><br>>> That said, the corporate and governmental sectors may be turning the<br>>> corner vis-à-vis MT acceptance, if not
purchasing. A poll conducted<br>by the<br>>> International Association for Machine Translation (IAMT) and<br>Association for<br>>> Machine Translation Americas (AMTA) for SDL<br>>><br><http://www.sdl.com/en/events/news-PR/sdl-research-trends-in-automated-translation.asp>,<br>>> another provider of machine translation technology, found that 40<br>percent of<br>>> the 385 surveyed individuals were "now" likely to use MT. Of<br>those roughly<br>>> 150 receptive respondents, 62 percent said they would use it for<br>technical<br>>> documentation, 49 percent for support and knowledge-based content.<br>That's<br>>> good news for the MT software sector, but could be bad news if<br>automated<br>>> translation merely displaces the work of traditional translation<br>agencies<br>>> rather than increase the size of the overall business.<br>>><br>>> --------------------------<br>>>
Asia Online Aims to Meet Asian Content Demands with MT+<br>>> <http://www.globalwatchtower.com/2008/04/14/asia-online-portal/><br>>> Donald A. DePalma 14 April 2008<br>>> http://www.globalwatchtower.com/2008/04/14/asia-online-portal/<br>>><br>>> For the last dozen or so years we've heard ourselves incessantly<br>reminding<br>>> everyone that the "www" in most URLs means "worldwide<br>web," while the "e" in<br>>> "e-commerce" all too often stands for English. Our research<br>on e-GDP<br>>><br><http://commonsenseadvisory.com/research/report_view.php?id=55&cid=0><br>>> (online GDP) and the Availability Quotient<br>>><br><http://commonsenseadvisory.com/research/report_view.php?id=60&cid=0><br>>> demonstrated that many companies still have a long journey before they<br>can<br>>> meet the demands of the world's markets for local-language<br>content. That
gap<br>>> is nowhere more apparent than in Asia, where the amount of in-language<br>content is<br>>> dwarfed by the growing online population.<br>>><br>>><br>>> Just how dwarfed? Today, roughly 38% of internet users live in Asia,<br>but<br>>> by 2012, that number will jump to half. However, local-language<br>content<br>>> hasn't kept pace. In 2007, non-Asian languages accounted for<br>roughly 86% of<br>>> the content on the web. Most of the remaining 14% was split among<br>Japanese<br>>> (6%), Chinese (6%), and Korean (1.5%). All other Asian languages<br>comprise<br>>> less than 0.03% of the web's content; for example, Southeast Asian<br>languages<br>>> make up less than 10 million pages. Given consumer preference for<br>content in<br>>> their own language<br>>><br><http://commonsenseadvisory.com/research/report_view.php?id=36&cid=0>,<br>that<br>>> huge gap between
Asian content and total online population represents<br>a huge<br>>> opportunity.<br>>><br>>><br>>> That opportunity has not gone unnoticed. After getting an eyes-only,<br>>> tell-no-one pre-briefing in December, we recently spoke with Asia<br>Online CEO<br>>> Dion Wiggins who called us to tell us that his portal had just scored<br>its<br>>> first round of funding from JAIC<br>>> <http://www.asiaonline.net/corporate/news.aspx#News05>, the<br>Japanese venture<br>>> capital behind Alibaba.com<br>>><br><http://globalwatchtower.com/2007/11/09/a-big-week-for-china-on-the-big-board-on-the-bund-and-beyond/>,<br>>> among others. He also wanted to let us know that Kirti Vashee<br>>> <http://www.asiaonline.net/corporate/news.aspx#News04>, formerly<br>VP of<br>>> marketing at Language Weaver, had signed on as Asia Online's VP of<br>sales for<br>>> the Americas and
Europe with the responsibility for selling the<br>commercial<br>>> version of its MT engine.<br>>><br>>><br>>> Asia Online's plans revolve around a proprietary machine<br>translation<br>>> engine plus a strong support infrastructure of humans, content, and<br>partners.<br>>> The following are key to this strategy:<br>>><br>>> * *New technology.* Asia Online developed high-performance<br>>> statistical machine translation (SMT) software in collaboration<br>>> with University of Edinburgh professor Philipp Koehn.<br>>><br>>> * *Clean corpora.* Asia Online contracts with publishers, language<br>>> service providers, and eventually corporations for<br>>> human-translated content to train its SMT engine. The company<br>also<br>>> crowdsources the quality via a large community of students, and<br>>> feeds the validated content back into the
system as training<br>data.<br>>><br>>> * *Matrixed language learning.* The SMT engine can take<br>translations<br>>> of a novel into English, Japanese, and Thai and use the<br>>> permutation to train itself on English<>Thai,<br>English<>Japanese,<br>>> and Japanese<>Thai. This capability is especially important<br>for<br>>> languages that don't have enough content to feed a<br>data-hungry<br>>> statistical MT engine.<br>>><br>>> * *Real-time fixes.* Its MT engine lets reviewers observe<br>>> translation decisions as they are being made, allowing them to<br>>> influence choices, make fixes in place, and propagate these<br>>> modifications to wherever that phrase or term is used<br>>><br>>> Asia Online is talking with LSPs interested in using its SMT engine<br>and<br>>> has fielded corporate requests to
use its software. We think that its<br>real<br>>> value lies in its Google-esque plan to drive billions of eyeballs<br>>> <http://globalwatchtower.com/2007/12/20/mt-eyeballs/> seeking<br>content in<br>>> their own languages — and the advertising, special offers, and the<br>>> next-generation linguistic tools that are sure to follow.<br>>><br>>> --------------------------<br>>> Google MT Puts Multilingual Information at More Fingertips<br>>> <http://www.globalwatchtower.com/2008/03/25/google-mt-api/><br>>> http://www.globalwatchtower.com/2008/03/25/google-mt-api/<br>>> Donald A. DePalma 25 March 2008<br>>><br>>> As we predicted in our 2006 report on machine translation<br>>><br><http://commonsenseadvisory.com/research/report_view.php?id=37&cid=0>,<br>>> Google has opened its MT engine to general usage — but with no<br>software<br>>> license
or other fees. Acknowledging that automated translation right<br>now is<br>>> all about eyeballs,<br><http://globalwatchtower.com/2007/12/20/mt-eyeballs/><br>>> Google made its newly documented AJAX Language API for Translation and<br>>> Language Detection<br><http://code.google.com/apis/ajaxlanguage/documentation/><br>>> beta release free to anyone who decides to call it. By the way, we<br>would<br>>> have put "language detection" first in the API's name,<br>but Google knows a<br>>> bit more about SEO than we do.<br>>><br>>><br>>> As the name implies, you can use this application programming<br>interface to<br>>> detect language blocks in a text and translate them. Translation<br>requests go<br>>> to Google's pretty good statistical MT engine<br>>> <http://globalwatchtower.com/2007/10/30/mt-shootout/> (SMT). The<br>API<br>>> supports 29 language
pairs<br>>><br><http://code.google.com/apis/ajaxlanguage/documentation/#SupportedPairs><br>(13<br>>> languages in total), including the usual E-FIGS and CCJK plus<br>French<>German<br>>> without involving English as the pivot language. Translation services<br>are<br>>> what Google generates without the option for training the SMT engine<br>on your<br>>> particular lexicon. Nonetheless, Google translations have proven to be<br>very<br>>> intelligible in the mash-ups<br>>> <http://globalwatchtower.com/2007/12/03/google-mt-dotsub/> that<br>we have done<br>>> or observed.<br>>><br>>><br>>> Google says that its language API is simple and easy to use — versus<br>an<br>>> arcane call-level interface: It requires an input string to translate,<br>the<br>>> names of the source and target languages, and a callback function. We<br>put<br>>> that claim to the test
with a short program that threw increasingly<br>larger<br>>> strings at the interface. We can attest that it is easy to use for<br>short<br>>> strings. We did notice a couple of restrictions in our sandbox (N.B.<br>Common<br>>> Sense Advisory Labs did not conduct exhaustive tests on the API —<br>rather, we<br>>> ran tests until we got bored with the permutations):<br>>><br>>> * *Strings.* The API maxes out at around 1,200 characters per<br>source<br>>> string of plain text (figure on 100-120 words). While that's<br>good<br>>> for including Google's MT in your average application, it<br>won't<br>>> help the average language service provider intent on<br>>> pre-translating big files.<br>>> * *Files and URLs.* If you want to translate files, set them up as<br>>> HTML pages hanging off a website and type the URL into<br>Google's<br>>> website
translator<br>>> <http://translate.google.com/translate_t?hl=en>. That<br>worked for<br>>> web pages and shorter documents, but choked on the unexpurgated<br>>> HTML version of "Business Without Borders<br>>> <http://www.businesswithoutborders.info/>" (a mere<br>122,000 words,<br>>> give or take a couple hundred). We also tried translating the<br>>> 19,000 words of Thomas Paine's Common Sense<br>>> <http://www.ushistory.org/paine/commonsense/singlehtml.htm><br>>> pamphlet into Japanese and Russian. Google translates the first<br>>> 5,300 words, but leaves the rest of the page in English.<br>>><br>>> Google's AJAX Language API page promises future enhancements. We<br>expect<br>>> longer strings, named files, and longer documents to be part of future<br>>> releases. What's less likely in free Google MT are
commercial<br>features such<br>>> as lexical tuning by company, industry-specific glossaries, or the<br>feedback<br>>> loop available since 2005 in Language Weaver<br>>><br><http://globalwatchtower.com/2005/10/25/machine-translation-language-weaver-microsoft/><br>>> (although Google does have a generalized "train the engine"<br>function).<br>>><br>>> * For information consumers and seekers of truth in languages other<br>>> than their own, these advances will be good news. Higher quality,<br>>> free machine translation utilities will lead to MT popping up in<br>>> more and more applications.<br>>> * For translators who don't own translation memory software, we<br>>> think that Google remains a great candidate for offering a<br>>> gmail-like translation environment<br>>> <http://globalwatchtower.com/2007/12/12/gmail-tm/>,
replete<br>with MT.<br>>> * Smart LSPs should seriously consider preprocessing small projects<br>>> through the Google engine and — depending on the output —<br>decide<br>>> whether it is worth post-editing or fully translating the text.<br>>> After all, they really don't have anything to lose and could<br>>> increase the productivity of their translators.<br>>> * Competing MT engines will need to move fast to stay ahead of the<br>>> ad-funded portal. This API will make life difficult for the<br>>> already besieged smaller players trying to sell their wares in a<br>>> market monetized more by search and eyeballs than by software<br>>> license revenue. Companies like SpeakLike and Transclick<br>>> <http://globalwatchtower.com/2007/12/12/gmail-tm/> (one of<br>391<br>>> World Economic Forum Technology
Pioneers<br>>><br>>> <br><http://www.weforum.org/en/Communities/Technology%20Pioneers/index.htm>)<br>>> will likely add the Google engine to their suites of MT engines.<br>>> Meanwhile, we don't expect companies like Asia Online<br>>> <http://globalwatchtower.com/2007/12/20/mt-eyeballs/>,<br>Language<br>>> Weaver, Microsoft, PROMT, SDL, SYSTRAN, and others with their own<br>>> MT engines and advancing research to sit on the callable MT<br>>> sidelines for long.<br>>><br>>> Earlier today we spoke with Dimitris Sabatakakis, CEO at SYSTRAN, who<br>said<br>>> that "all MT providers should thank Google for the hype and<br>excitement it<br>>> brings as MT is now perceived as a practical and usable technology.<br>This<br>>> means there are more potential customers interested in a MT product or<br>>> solution. Google's investment in MT
is proof that MT is a key<br>technology for<br>>> the emerging market and provides a solution to a real need. It is<br>forcing<br>>> all providers to raise their respective bars. If we stay static, we<br>will<br>>> collapse."<br>>><br>>> -------------------------<br>>><br>>> Chevy "Nova": Updating Bad Translation Apocrypha<br>>><br><http://www.globalwatchtower.com/2008/02/06/chevy-nova-updating-bad-translation-apocrypha/><br>>> Donald A. DePalma 6 February 2008<br>>><br>>><br>http://www.globalwatchtower.com/2008/02/06/chevy-nova-updating-bad-translation-apocrypha/<br>>><br>>><br>>> Not an hour goes by that we don't receive an e-mail announcing a<br>press<br>>> release from a vendor. What we find most interesting is when a company<br>>> issues a press release but fails to tell us (or anybody else) that<br>it's out<br>>> there. That happened back in
May when SDL noted that "Spanish<br>leaves global<br>>> marketers lost in translation." Quoting the press release,<br>"According to<br>>> SDL, the top five worst translation mistakes made by companies looking<br>to<br>>> expand into the Spanish-speaking world" were the usual hackneyed<br>examples of<br>>> bad translation. These included "I saw the Pope" (/el Papa)/<br>translated as<br>>> "I saw the potato" (/la papa/), the "Got milk?"<br>slogan rendered as "Are you<br>>> lactating?" in Spanish, and Parker introducing its non-leaking<br>fountain pen<br>>> in Spain with the slogan "it won't leak in your pocket and<br>embarrass you,"<br>>> with the translator buddying up with a false friend (/embarazar/ means<br>>> pregnant, not embarrassed). At least they left out the old chestnut<br>about<br>>> the Chevy Nova (/no va/ — get it?) in Latin America and the rumored<br>>> over-medicated U.S.
Latina who interpreted the "/once/ a<br>day" on her<br>>> prescription as "11 times a day."<br>>><br>>> What's going on here? It's all about search engine<br>optimization. SDL cited<br>>> these examples plus economic figures for Latin American growth to<br>improve<br>>> its SEO rankings for the Hispanic market. The company's CMO<br>figured that<br>>> becoming associated with these sometimes apocryphal mistranslations<br>was a<br>>> good way to improve SDL's search engine rankings. Of course,<br>we're doing the<br>>> same here by recycling these oft-told tales of mistranslation.<br>>><br>>> But wait — there are some really good examples of bad translations<br>and<br>>> cross-border mistakes out there. Here are a few of our favorites:<br>>><br>>> * For our 2002 keynote at the SAE's TopTec Multilingual<br>>> Communication for the Automotive Industry conference, we
found<br>>> candidates for "Bad Product Name of the Year" among<br>Japanese car<br>>> makers selling in Latin America: Mazda Laputa (interpreted by<br>>> Spanish speakers as /la puta/), Mitsubishi Pajero (slang for<br>>> onanist), and Nissan Moco (snot). In that speech we cited an auto<br>>> show description of the Laputa that might not be suitable for<br>>> children — "Laputa ha mejorado su seguridad y ampliado su<br>>> interior… Cuerpo diseñado para resistir impactos<br>frontales." Check<br>>> that out at Yahoo!<br><http://babelfish.yahoo.com/translate_txt> or<br>>> Google <http://www.google.com/language_tools?hl=en> free MT<br>sites.<br>>> * More recently, Car and Driver<br>>><br>>> <br><http://www.caranddriver.com/autoshows/14559/2008-detroit-auto-show-we-translate-chinese-auto-brochures.html><br>>>
magazine reviewed the translated claims of Chinese automakers at<br>>> the Detroit Auto Show. The brochure for the Liebao CS6 SUV<br>claimed<br>>> "Gene of being Wild: VM engine brings you the long-awaited<br>shock…<br>>> only by stepping on the accelerograph, the mph will come to the<br>>> peak in a second" and the BYD F3 sedan has "fuel<br>efficiency stomach."<br>>> * Back to the subject of product names, we noticed a stand for a<br>>> firm selling "Hyper STD" at the tekom conference in<br>Wiesbaden,<br>>> Germany last November (see photo above). Yuck! Most American<br>>> buyers would steer clear of products associated with Sexually<br>>> Transmitted Diseases.<br>>> * When we tried the WiFi at the tekom conference Hotel Klee am Park<br>>> in Wiesbaden, we read the English-language instructions that told<br>>> us: "General
technical supposition is a reticulation-card.<br>Please<br>>> arrange your reticulation-card to IP (automatic internet<br>>> register)." Huh?<br>>> * The classic post-Sputnik mistranslation of "wet sheep"<br>for<br>>> "hydraulic rams" in a Soviet science journal is an<br>under-used<br>>> classic example. That's baaaad! Next time you think about<br>>> referencing the Nova, try this one instead.<br>>> * A friend who was an interpreter at the United Nations told us<br>>> about a colleague who tried to amplify an emotionally-delivered<br>>> idiomatic expression, suggesting that "we need to grab the<br>bull by<br>>> something other than the horns." Ouch.<br>>><br>>> But bad translations aren't always funny. They can have serious<br>>> consequences:<br>>><br>>> * *Financial markets will shake. *Back in May 2005 a reporter
for<br>>> the China News Service pieced together a story about how currency<br>>> appreciation might affect the market<br>>><br>>> <br><http://online.wsj.com/public/article/SB111581539395830336-fMkM6GCThY_89ij8ljDO_jQgw6w_20060511.html?mod=public_home_us>.<br>>> The People's Daily had it translated into English without the<br>>> subjunctive case, stating that China decided to revalue its<br>>> currency 1.26% a month for a year. Bloomberg's spider in<br>London<br>>> picked up the story and European equity markets rose on the news.<br>>> While it was quickly repudiated, the error did cause market<br>tremors.<br>>> * *Armies can advance without consequence. *In August 1968 U.S.<br>Army<br>>> transcribers reportedly wrote down a transmission from a Soviet<br>>> tank column as "my perexali most" rather than "my<br>priexali
v<br>>> Most." What was heard (a routine bridge-crossing exercise by<br>a<br>>> tank column) was not what happened (the arrival of Soviet tanks<br>in<br>>> Most, a city in sovereign<br>>> <http://www.youtube.com/watch?v=W28CQQsH9S8><br>Czechoslovakia).<br>>> * *Countries might disappear.* In October 2005 Iranian President<br>>> Mahmoud Ahmadinejad<br>>> <br><http://www.globalresearch.ca/index.php?context=va&aid=4527><br>>> reportedly called for Israel to be wiped off the map, but<br>>> apparently he really "just" wanted to get rid of its<br>government.<br>>> True to form, Ahmadinejad didn't clarify his remarks after<br>the<br>>> mistranslation, further complicating matters.<br>>> * *Companies will get into trouble.* A senior executive at Yahoo!<br>>> had to apologize for not giving U.S. Congressmen
information<br>about<br>>> the company's role in the imprisonment of a Chinese dissident<br>>> <http://www.nytimes.com/2007/11/03/technology/03yahoo.htm>,<br>Shi<br>>> Tao. According to Yahoo!, a bad translation by an employee of a<br>>> 2004 order from the Chinese government caused the problem.<br>>><br>>> None of the mistakes after the "But wait" in this posting<br>were machine<br>>> translation miscues<br>>> <http://globalwatchtower.com/2007/11/09/israeli-email-mt/> —<br>they're just<br>>> bad translations by humans. Caveat lector!<br>>><br>>><br>>> --------------------------<br>>> JAJAH Advances Machine Interpretation<br>>><br><http://www.globalwatchtower.com/2008/08/12/jajah-machine-interpretation/><br>>><br>http://www.globalwatchtower.com/2008/08/12/jajah-machine-interpretation/<br>>> Renato Beninatto and Nataly
Kelly 12 August 2008<br>>> Filed under (Interpretation<br>>> <http://www.globalwatchtower.com/category/interpretation/>,<br>Translation &<br>>> Localization<br>>><br><http://www.globalwatchtower.com/category/translation-localization/>,<br>>> Translation Technologies<br>>><br><http://www.globalwatchtower.com/category/translation-technologies/>,<br>>> Language Industry<br>>> <http://www.globalwatchtower.com/category/language-industry/>)<br>>> 2 pepper rating<br>>><br>>> When we first heard about JAJAH's extremely simple process<br>>> <http://www.jajahbabel.com/> for providing machine-based<br>telephone<br>>> interpretation, it sounded too good to be true. The process is<br>comprised of<br>>> three easy steps — simply dial a number from any phone, speak in<br>English,<br>>> and hand your phone to the person who speaks Mandarin. The
way it is<br>>> described, the service would seem to automate much of human<br>interpreters'<br>>> work, and would be particularly helpful for situations in which<br>telephone<br>>> interpreters are used. As usual, if it sounds too good to be true, it<br>>> probably is. We tested the service, currently touted as a way to help<br>>> travelers overcome language barriers in China, just in time for the<br>Beijing<br>>> Olympics<br>>><br><http://www.globalwatchtower.com/2008/07/29/china-seeks-gold-medal-in-language-services/>.<br>>> We conducted several tests and found that the service seemed to work<br>quite<br>>> well at some levels, in that it did correctly render some of our words<br>into<br>>> the target languages. However, the voice recognition component<br>misunderstood<br>>> some of our words, even when we conducted tests with speakers of<br>native and<br>>> near-native
English. To test the service in Mandarin, we used<br>voice-over<br>>> samples recorded by professional talent, and the results were a bit<br>>> difficult to understand in English — then again, we purposely used<br>samples<br>>> with brand names that we knew tend to be problematic for machine<br>translation<br>>> tools. Now that we've aired our complaints, let's take a look<br>at a few<br>>> points on the bright side of this innovation:<br>>><br>>> * *You get what you pay for — at least, in the early stages. *The<br>>> service is free, so it should come as no surprise that it does<br>not<br>>> work perfectly yet. In spite of the disjointed target language<br>>> versions we received in English and the fact that telephony<br>>> provider JAJAH went with another Babel theme, we do not believe<br>>> that the localization world will automatically relegate it
to the<br>>> role of industry laughingstock, as happened with BabelFish<br>>><br>>> <br><http://www.globalwatchtower.com/2008/02/06/chevy-nova-updating-bad-translation-apocrypha/>.<br>>> * *Free machine-based telephone interpretation is a first. *At<br>>> Common Sense Advisory, we've been writing more in the past<br>few<br>>> months about the trend we are noticing toward computer-assisted<br>>> interpretation (CAI)<br>>><br>>> <br><http://www.commonsenseadvisory.com/research/report_view.php?id=66&cid=0><br>>> and the future synergies between translation memory and what we<br>>> refer to as interpretation memory (IM) — pre-translated and<br>>> pre-recorded words and phrases that serve to partially automate<br>>> the process of interpretation. This additional focus in our<br>>> research is intentional —
CAI has already been widely<br>implemented<br>>> for devices used by the military, but this is one of the first<br>>> instances we're aware of that offers such a service for free,<br>>> on-demand, via telephone, and to the general public. This type of<br>>> service pushes CAI to a new level.<br>>> * *Savvy developers will want to take note. *This offering from<br>>> JAJAH may not appear at first to represent a major technological<br>>> advancement, but it does prove to the world that machine<br>>> interpretation (MI) is possible, even if the quality is not yet<br>up<br>>> to par. LSPs — especially telephone interpretation providers<br>>><br>>> <br><http://www.globalwatchtower.com/2008/07/21/language-line-welcomes-networkomni-clients-back-into-the-fold/><br>>> — and technology companies that aim to stay ahead of the
curve<br>are<br>>> well-served to keep CAI and MI on their radar. We predict that<br>>> more and more of these services will begin to spring up soon.<br>>><br>>> Even for the traveler who is willing to hit the re-dial button a few<br>times<br>>> and is able to accept an imperfect rendition, this service may be of<br>limited<br>>> use. While it's certainly not as costly as some of the phone-based<br>Chinese<br>>> interpretation services that have recently been profiled in the Wall<br>Street<br>>> Journal<br>>><br><http://online.wsj.com/article/SB121624832986259935.html?mod=googlenews_wsj><br>>> and other media as services for travelers to the Olympics, it could<br>prove to<br>>> be cost-prohibitive for a person dialing the number repeatedly and<br>trying to<br>>> confirm the recording's accuracy while sitting in a taxi in<br>Beijing with the<br>>> meter running
— especially if proper nouns, such as the hotel name,<br>are<br>>> rendered incorrectly. That's precisely what happened in our<br>example — take a<br>>> look at the video below and judge for yourself. In summary, we<br>don't see<br>>> this service replacing the need for phone-based interpreters anytime<br>soon,<br>>> but the general impact — and possibilities — for the language<br>services<br>>> industry are definitely worth noting.<br>>><br>>><br>>> ----------------------------------------------------------<br>>> Google Shakes Up the Translation Memory Scene<br>>><br><http://www.globalwatchtower.com/2008/08/08/google-translation-center/><br>>> http://www.globalwatchtower.com/2008/08/08/google-translation-center/<br>>> Nataly Kelly 8 August 2008<br>>> Filed under (Translation &
Localization<br>>><br><http://www.globalwatchtower.com/category/translation-localization/>,<br>>> Translation Technologies<br>>><br><http://www.globalwatchtower.com/category/translation-technologies/>,<br>>> Language Industry<br>>> <http://www.globalwatchtower.com/category/language-industry/>)<br>>><br>>> This week, there were rumblings about the forthcoming beta release of<br>>> Google's new translation management system (TMS), called<br>Translation Center<br>>> <https://www.google.com/accounts/ServiceLogin?service=gtrans>.<br>If you're<br>>> familiar with Google Translate,<br><http://translate.google.com/translate_t><br>>> you might be thinking, "Big deal, this is just a low-tech, human<br>version of<br>>> what they're already doing." If so, you would be wrong: This<br>is big news for<br>>> the practice of translation. It seems that Google has
been stalking<br>the<br>>> sector.<br>>><br>>><br>>> We predicted in 2006<br>>><br><http://www.commonsenseadvisory.com/research/report_view.php?id=37&cid=0><br>>> that Google would open up its statistical machine translation engine<br>for<br>>> general usage — and so it did, as we reported in March 2008<br>>> <http://www.globalwatchtower.com/2008/03/25/google-mt-api/>.<br>Last December,<br>>> we published our first report on collaborative translation<br>>><br><http://commonsenseadvisory.com/research/report_view.php?id=59&cid=0>,<br>in<br>>> which we explained how collaboration tools and open source concepts<br>could<br>>> increase translation efficiency. We've written about the merits of<br>>> crowdsourcing<br>>><br><http://www.globalwatchtower.com/2008/03/27/collaborative-translation-and-crowdsourcing/><br>>> and how companies
like Facebook, Google, and Sun Microsystems have<br>pioneered<br>>> work in this area.<br>>><br>>><br>>> Google seems to have been listening. In December of 2007, we suggested<br>a<br>>> gmail-like model<br><http://www.globalwatchtower.com/2007/12/12/gmail-tm/> for<br>>> translation memory and forecasted that a company from outside the<br>language<br>>> industry with no interest in selling tools — such as Ask, Google, or<br>Yahoo!<br>>> — might be well-served to make such an offer. Google has apparently<br>done<br>>> just that. It claims that its new translation management system (TMS)<br>gives<br>>> users the ability to request translations, find translators, and<br>upload<br>>> documents for translation into more than 40 languages. It also enables<br>>> freelancers to create and review content in their languages using free<br>>> translation tools. Yes,
free.<br>>><br>>><br>>> Why would Google take an interest in supporting human translation<br>>> activities? One big reason: It needs human support in order to build<br>up its<br>>> translation memory, so that Google Translate can evolve from a<br>"me translate<br>>> pretty one day" prototype to a reputable and reliable language<br>conversion<br>>> machine. True, there are some large sources of free translation memory<br>out<br>>> there already — such as the enormous database offered by the<br>European<br>>> Parliament<br>>><br><http://www.globalwatchtower.com/2008/01/21/free-tm-european-commission/>.<br>>> But, to truly enable mass quantities of information to be shared<br>around the<br>>> globe, Google needs richer, vaster sources of TM than what's<br>currently in<br>>> the public domain. After all, the typical web user might want to<br>communicate<br>>> now and
then regarding things other than, say, official EU<br>declarations and<br>>> proceedings.<br>>><br>>><br>>> Adding humans to the mix enables Google to gradually create a very<br>large<br>>> storehouse of translated words and phrases — exactly what TAUS is<br>aiming for<br>>> with its data sharing initiative<br>>> <http://www.globalwatchtower.com/2008/06/26/taus-tda-charter/><br>and what Asia<br>>> Online is doing with its human-enhanced statistical MT engine<br>>><br><http://www.globalwatchtower.com/2008/04/14/asia-online-portal/>. In a<br>>> nutshell, Google will unite its cloud with the crowd to get as many<br>helping<br>>> hands on the job as it can.<br>>><br>>><br>>> We'll reserve our detailed comments on Google Translation Center<br>until we<br>>> can actually try it out for ourselves and see how it fares alongside<br>other<br>>> TMS
programs — our in-depth report with translation management<br>system<br>>> scorecards<br>>><br><http://www.commonsenseadvisory.com/research/report_view.php?id=43&cid=5><br>>> for translation management suppliers will be published soon — but<br>the big<br>>> picture value of this news for the industry is clear. Even in its beta<br>form,<br>>> Google Translate showed decent promise<br>>> <http://www.globalwatchtower.com/2007/10/30/mt-shootout/> for<br>the future of<br>>> automating written language mediation — it is a well-built machine<br>>> translation engine.<br>>><br>>> What separates Google from the rest of the MT field is that this<br>machine<br>>> is backed up by a manufacturer with plenty of money, data center<br>power, disk<br>>> space, and network infrastructure, not to mention expertise in the<br>assembly<br>>> and productization of raw
information materials. But now, with the<br>addition<br>>> of humans, it has the opportunity to become well-oiled in addition to<br>having<br>>> a sturdy construction. What remains to be seen is if Google can find<br>enough<br>>> oil to maximize MT performance. Thankfully, translation memory is a<br>>> plentiful resource — one that won't require any drilling.<br>>><br>>><br>>> -----------------------------<br>>><br>>><br>>><br>>><br>>><br>>><br>>><br>>><br>>> --- On *Sat, 1/31/09, Don Osborn /<dzo@bisharat.net>/* wrote:<br>>><br>>> From: Don Osborn <dzo@bisharat.net><br>>> Subject: RE: Cheeseburgery hamburgers and the problem of<br>>> computerised translations<br>>> To: lgpolicy-list@ccat.sas.upenn.edu<br>>> Date: Saturday, January 31, 2009, 9:47 AM<br>>><br>>> We all know MT
(machine translation, aka computerized translation)<br>is<br>>> not<br>>> perfect so I don't think this piece was particularly<br>informative.<br>>><br>>> The only news I see in it is that there is MT for Polish <-><br>English<br>>> (probably has been for a while but this is the first note I've<br>made of<br>>> it). Given what must be necessary to develop MT, it does not<br>surprise<br>>> me if<br>>> a recently developed program churns out some<br>>> cheeseburgery results (though I<br>>> wonder who put that word in the lexicon).<br>>><br>>> While on the topic, my favorite MT mistranslation was with an older<br>>> version<br>>> of Systranet.com (results duplicable on Babelfish):<br>"discussion on<br>>> fonts" in<br>>> English became in Portuguese the equivalent of "quarrels in<br>baptismal<br>>>
basins." Such blatantly outrageous results, though, speak to<br>me as a<br>>> non-specialist in the matter more of how the MT was set up than any<br>>> inherent<br>>> problem with setting up MT. Discussion in English is not really a<br>>> synonym<br>>> of its apparent cognates in Latin languages (at least French<br>&<br>>> Portuguese); and how often do English speakers use "font"<br>to describe<br>>> what<br>>> in Portuguese they call pias baptismas? I've never heard of<br>>> cheeseburgery<br>>> before but will surely find a way to use it in conversation<br>sometime -<br>>> just<br>>> not in MT.<br>>><br>>> The real news is how useful MT can be in sorting through the gist<br>of<br>>> things<br>>> in diverse<br>>> languages, and how with new approaches the results are improving<br>>> significantly. I hope
FT takes a look at that, and how the complex<br>and<br>>> uneven progress in MT is changing the way we access and use<br>multilingual<br>>> content and documents.<br>>><br>>> Don Osborn<br>>><br>>><br>>><br>>> > -----Original Message-----<br>>> > From: owner-lgpolicy-list@ccat.sas.upenn.edu<br>[mailto:owner-lgpolicy-<br>>> > list@ccat.sas.upenn.edu] On Behalf Of Harold Schiffman<br>>> > Sent: Tuesday, January 27, 2009 11:18 AM<br>>> > To: lp<br>>> > Subject: Cheeseburgery hamburgers and the problem of<br>computerised<br>>> > translations<br>>> > > Cheeseburgery hamburgers and the problem of<br>computerised<br>>> translations<br>>> > January 26, 2009 by Tony Barber<br>>> > > This morning I found myself on a public platform in a<br>Brussels<br>>> hotel<br>>>
> for my first ever European bloggers' conference. As a<br>representative<br>>> > of an "establishment" news organisation, I was<br>half-expecting<br>>> to<br>>> be<br>>> > roasted alive. But in the end both Mark Mardell of the BBC, my<br>friend<br>>> > and fellow-guest, and I got through it safely enough. The most<br>>> > perceptive contribution, I thought, came from a Romanian<br>blogger who<br>>> > made the point that the global blogosphere remains to a large<br>extent<br>>> > divided by language. For example, you can blog all you like in<br>>> > Romanian, but most of the world won't have a clue what<br>you're<br>>> saying.<br>>> > > A moderator responded to this by saying, "Try<br>using<br>>> computer-generated<br>>> > translation." As I drifted back to my office, I recalled<br>that
the<br>>> last<br>>> > time I'd experimented with computers striving to change<br>Italian into<br>>> > English or Dutch into Spanish, the results had been pretty<br>hopeless.<br>>> > Perhaps things had improved over the last couple of years?<br>>> > > Well, below are three examples of computerised<br>translation -<br>>> courtesy<br>>> > of Google<br>>> Language Tools - from French, German and Polish into<br>>> > English. I am republishing the translations exactly as they<br>came out,<br>>> > punctuation mistakes and all, after I hit the button.<br>>> > > 1) This is from a news story in Le Monde about US and<br>European<br>>> policy<br>>> > in the Middle East. "Believing that the war in Gaza has<br>imposed new<br>>> > priorities and the administration of the new
American<br>president,<br>>> > Barack Obama, might break with the unconditional support to<br>Israel,<br>>> > French diplomacy is trying to print in Europe, a change of<br>tone<br>>> > against the Hamas."<br>>> > > As you can see, this translation starts off<br>promisingly. In<br>>> fact, it<br>>> > scarcely puts a foot wrong until it loses control and talks,<br>weirdly,<br>>> > about printing changes of tone against the Hamas. Still, we<br>sort of<br>>> > know what's going on here. 7 out of 10 for Monsieur<br>L'Ordinateur.<br>>> > > 2) Now here's a sentence from a<br>>> story in Germany's Süddeutsche<br>>> Zeitung<br>>> > about the US prison centre at Guantánamo and what Europe can<br>do to<br>>> > help close it down. "The fate Released Guantanamo<br>prisoners ensures<br>>> > fierce
debates: Union politicians criticized the foreign<br>ministers of<br>>> > Vorpreschen Stein Meier - and refer the responsibility for the<br>>> inmates<br>>> > to the U.S."<br>>> > > This is a pretty poor effort, Herr Computer. <br>Particularly<br>>> > disappointing is the omission of the preposition<br>"of" between<br>>> "fate"<br>>> > and "released" (which also shouldn't have a<br>capital R), and<br>>> the<br>>> > baffling three words "Vorpreschen Stein Meier". But<br>let's be<br>>> fair,<br>>> > there's a modest degree of sense here. 5.5 out of 10.<br>>> > > 3) Lastly, here's a sentence from the Polish<br>newspaper Gazeta<br>>> Wyborcza<br>>> > on French leisure habits during the recession. "Economic<br>crisis and<br>>> > changing lifestyles, the French seriously affect<br>>>
the profits of French<br>>> > cafes and restaurants. A sign of the collapse of the French<br>culture<br>>> of<br>>> > the restaurant is visible on the streets of Paris rash of<br>>> > quick-service bar, offering generally pogardzane a few years<br>ago and<br>>> > cheeseburgery hamburgers."<br>>> > > No, dear readers, you have not gone potty. That's<br>what it says.<br>>> And I<br>>> > am afraid, Pan Komputer, that it's utter gibberish. You<br>get 2 out of<br>>> > 10 - and an hour's detention in the language lab.<br>>> > ><br>>> http://blogs.ft.com/brusselsblog/2009/01/cheeseburgery-hamburgers-and-<br>>> > the-problem-of-computerised-translations/<br>>> > > --<br>>> > **************************************<br>>> > N.b.: Listing on the lgpolicy-list is merely intended as
a<br>service to<br>>> > its members<br>>> > and implies neither approval, confirmation nor agreement by<br>the owner<br>>> > or sponsor of<br>>> > the list as to the veracity of a message's contents.<br>>> Members who<br>>> > disagree with a<br>>> > message are encouraged to post a rebuttal. (H. Schiffman,<br>Moderator)<br>>> > *******************************************<br>>><br>>><br>>><br>><br>> --<br>> Alexander J. Stein<br>> Cell: (201) 412-9479<br>> Email: alharaka@gmail.com<br>> Skype: alexander.j.stein<br>> AIM: elduderino6886<br>><br>><br><br><br><br>-- <br>=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+<br><br> Harold F. Schiffman<br><br>Professor Emeritus of<br> Dravidian Linguistics and Culture<br>Dept. of South Asia Studies<br>University of Pennsylvania<br>Philadelphia, PA 19104-6305<br><br>Phone: (215)
898-7475<br>Fax: (215) 573-2138<br><br>Email: haroldfs@gmail.com<br>http://ccat.sas.upenn.edu/~haroldfs/<br><br>-------------------------------------------------<br><br></pre></blockquote></td></tr></table>