[Corpora-List] Google's translations

Peter Kolb pekoli at gmail.com
Thu Mar 11 13:18:27 UTC 2010


I have three comments:

1. The text by Kant contains a lot of anaphoric pronouns. From Google's
translation it is obvious that their system does not perform any pronoun
resolution (or at least none that works better than a random baseline).
However, there exist German to English translation engines on the market
that incorporate such components.

2. Consider the following extract from Kant's text:

"wo [jedermann]SUBJ, [der sonst in allen übrigen Dingen unwissend ist]REL_CL,
[sich]REFLX [ein entscheidendes Urteil]OBJ [anmaßt]PRED"

A simple relative clause separates the subject from the object and the
predicate. The completely garbled translation that Google delivers can serve
as a textbook example of how n-gram models (even 9-grams in this case) fail
to cope with long-range syntactic dependencies.
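
To make the distance concrete, here is a minimal sketch that counts how many
tokens lie between subject and predicate in the extract. It assumes plain
whitespace tokenisation with the punctuation dropped; the helper names
span_length and fits_in_ngram are mine and purely illustrative, and nothing
here reflects how Google's system actually tokenises or scores text.

```python
def span_length(tokens, first, second):
    """Size of the smallest token window that contains both words."""
    return abs(tokens.index(second) - tokens.index(first)) + 1


def fits_in_ngram(tokens, first, second, n):
    """True if both words can fall inside a single n-gram."""
    return span_length(tokens, first, second) <= n


# The extract from Kant, punctuation removed (an assumption of this sketch).
sentence = ("wo jedermann der sonst in allen übrigen Dingen unwissend ist "
            "sich ein entscheidendes Urteil anmaßt")
tokens = sentence.split()

span = span_length(tokens, "jedermann", "anmaßt")
print(span)                                            # 14
print(fits_in_ngram(tokens, "jedermann", "anmaßt", 9)) # False
```

Under these assumptions the subject and its predicate are 14 tokens apart, so
no single 9-gram window ever contains both of them.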

3. Another interesting experiment is to let Google translate the German word
"Ufer" (meaning "bank", but only in the waterside sense) into Czech. This
gives "banky", which means "bank", but only in its financial sense. This can
be explained by the observation that Google always uses English as an
interlingua (Ufer --> bank --> banky). If you directly translate e.g.
Spanish to French you will get exactly the same result as when you first
translate Spanish into English, and then translate the English output into
French.
Obviously, even for Google it is too costly to generate and maintain 52 * 51
= 2652 translation models for all the supported language pairs. Or is it
that they have found that X to English to Y always performs better than X to
Y because there is so much more data available between English and X or Y
than between X and Y?
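
The pivoting effect and the model counts can be sketched as follows. The two
one-entry lexicons are hand-made toys that only reproduce the Ufer example
above, and the function names are hypothetical. The count for the pivot setup
assumes that English is itself one of the 52 languages and that one model is
needed per direction.

```python
# Toy lexicons for illustration only; they encode just the example above.
DE_EN = {"Ufer": "bank"}    # German "Ufer": waterside sense only
EN_CS = {"bank": "banky"}   # Czech output reported above: financial sense


def pivot_translate(word, src_to_pivot, pivot_to_tgt):
    """Translate via the pivot language; return (pivot word, target word)."""
    pivot = src_to_pivot[word]
    # The pivot word carries no record of which sense the source word had,
    # so the second step cannot recover it.
    return pivot, pivot_to_tgt[pivot]


def direct_models(n_languages):
    """One model per ordered language pair."""
    return n_languages * (n_languages - 1)


def pivot_models(n_languages):
    """One model into and one out of the pivot for every other language."""
    return 2 * (n_languages - 1)


print(pivot_translate("Ufer", DE_EN, EN_CS))  # ('bank', 'banky')
print(direct_models(52), pivot_models(52))    # 2652 102
```

The sense distinction is lost at the English step because "bank" is ambiguous
there, while the number of models to maintain drops from one per ordered pair
to two per non-English language.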

Peter Kolb

------------------------------------
Department Linguistik, University of Potsdam
Karl-Liebknecht-Str. 24-25, D-14476 Golm
Phone: +49-331-977-2930
Fax: +49-331-977-2761
E-Mail: pekoli at gmail.com
http://www.ling.uni-potsdam.de/~kolb

http://www.linguatools.de

2010/3/10 John F. Sowa <sowa at bestweb.net>

> Following is an article from the New York Times about Google's
> translation service:
>
>
> http://www.nytimes.com/2010/03/09/technology/09translate.html?hpw&pagewanted=all
>
> And following is an excerpt:
>
>  “What you see on Google Translate is state of the art” in computer
>> translations that are not limited to a particular subject area,
>> said Alon Lavie, an associate research professor in the Language
>> Technologies Institute at Carnegie Mellon University.
>>
>
> Following is the Google web page for entering text or the URL of
> a document to be translated:
>
>   http://translate.google.com
>
> So I entered one paragraph by Wittgenstein and one by Kant.
> See below for the results.
>
> I discovered that the translations were sensitive to line breaks.
> For each paragraph, there are two translations:  the first of
> a "cut and paste" from text files with line breaks; the second
> of the same paragraphs as displayed by Firefox from html files.
> The html version eliminated the line breaks in the excerpts
> copied to Google.
>
> Does anyone have any comments or observations about the state
> of the art?
>
> John
> _________________________________________________________________________
>
> From the Preface to Wittgenstein's Tractatus Logico-Philosophicus:
>
> Dagegen scheint mir die Wahrheit der hier mitgeteilten Gedanken
> unantastbar und definitiv.  Ich bin also der Meinung, die Probleme im
> Wesentlichen endgültig gelöst zu haben.  Und wenn ich mich hierin nicht
> irre, so besteht nun der Wert dieser Arbeit zweitens darin, daß sie
> zeigt, wie wenig damit getan ist, daß die Probleme gelöst sind.
>
> First translation from a text file with line breaks:
>
> On the other hand seems to me the truth of the thoughts communicated here
> unassailable and definitive. I am therefore of the opinion that the
> problems in
> Have solved essentially. And if I'm not in this
> mistaken, then, is the value of this work, secondly the fact that they
> shows how little has been done that the problems are solved.
>
> Second translation of the same text entered from an html file:
>
> On the other hand seems to me the truth of the thoughts communicated here
> unassailable and definitive. I am therefore of the opinion that the problems
> largely been finally solved. And if I am not mistaken, so now is the value
> of this work, secondly the fact that it shows how little has been done that
> the problems are solved.
>
> From the preface to Kant's Prolegomena to any Future Metaphysics:
>
> Ist sie Wissenschaft, wie kommt es, daß sie sich nicht, wie andre
> Wissenschaften, in allgemeinen und daurenden Beifall setzen kann?
> Ist sie keine, wie geht es zu, daß sie doch unter dem Scheine einer
> Wissenschaft unaufhörlich groß tut, und den menschlichen Verstand mit
> niemals erlöschenden, aber nie erfüllten Hoffnungen hinhält? Man mag
> also entweder sein Wissen oder Nichtwissen demonstrieren, so muß doch
> einmal über die Natur dieser angemaßten Wissenschaft etwas Sicheres
> ausgemacht werden; denn auf demselben Fuße kann es mit ihr unmöglich
> länger bleiben. Es scheint beinahe belachenswert, indessen daß jede
> andre Wissenschaft unaufhörlich fortrückt, sich in dieser, die doch
> die Weisheit selbst sein will, deren Orakel jeder Mensch befrägt,
> beständig auf derselben Stelle herumzudrehen, ohne einen Schritt
> weiterzukommen. Auch haben sich ihre Anhänger gar sehr verloren, und
> man siehet nicht, daß diejenigen, die sich stark genug fühlen, in
> andern Wissenschaften zu glänzen, ihren Ruhm in dieser wagen wollen,
> wo jedermann, der sonst in allen übrigen Dingen unwissend ist, sich
> ein entscheidendes Urteil anmaßt, weil in diesem Lande in der Tat
> noch kein sicheres Maß und Gewicht vorhanden ist, um Gründlichkeit
> von seichtem Geschwätze zu unterscheiden.
>
> First translation from a text file with line breaks:
>
> If it is science, how is it that they did not, like other
> Share sciences in general and may daurenden applause?
> If it is not, how is it that they do under the guise of a
> Science constantly doing great, and the human mind with
> Never dying, but never fulfilled hopes of holding out? One may
> show, so either his knowledge or ignorance, it must nevertheless
> Once on the nature of science usurped something secure
> be identified, because it can not possibly on the same footing with her
> stay longer. It almost seems belachenswert, however, that any
> Science fortrückt other incessantly, in this, but the
> Wisdom wants to be themselves, whose oracles befrägt every man,
> flipped upside resistance at the same spot, without a step
> ahead. Even their supporters have not lost much, and
> things not seen, that those who feel strong enough to
> shine the other sciences, to risk their reputation in this wish
> where everyone else is ignorant of all the other things that are
> presumes a crucial verdict, because in this country, in fact,
> no safe level and weight is available to thoroughness
> to be distinguished from shallow chatter.
>
> Second translation of the same text entered from an html file:
>
> If it is science, how is it that they do not, you can use like other
> sciences, in general, and daurenden applause? If it is not, how is it that
> they do under the guise of a science constantly doing great, and holds out
> the human mind with never dying, but never fulfilled hopes? One may
> therefore either demonstrate his knowledge or ignorance, yet he must again
> about the nature of science usurped something certain to be identified,
> because on the same footing, it can not possibly stay with her longer. It
> almost seems belachenswert, however, that every other science fortrückt
> incessantly, in this, but the wisdom that wants to be themselves, whose
> oracles befrägt everyone, always on the same spot game instead, move forward
> without a step. Even their supporters have not lost much, and no one sees
> that those who want to feel strong enough to shine in other sciences, to
> risk their glory in this, where everyone else is ignorant of all the other
> things, a presumes decisive verdict, because there is in this country, in
> fact, no safe level and weight in order to distinguish detail of shallow
> chatter about.
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>