[Corpora-List] Google's translations

chris brew brew.2 at osu.edu
Thu Mar 11 15:10:11 UTC 2010


On Thu, Mar 11, 2010 at 8:18 AM, Peter Kolb <pekoli at gmail.com> wrote:

> I have three comments:
>
> 1. The text by Kant contains a lot of anaphoric pronouns. From Google's
> translation it is obvious that their system does not perform any pronoun
> resolution (or at least none that works better than a random baseline).
> However, there exist German to English translation engines on the market
> that incorporate such components.
>

I would moderate that conclusion. If, as I suspect, the Google engine for
German to English is a statistical one, it chooses a translation by
optimizing a complex internal objective that trades off multiple criteria.
Because SMT systems are not conventionally modular, it is hard to say what
components they have or do not have. It is completely clear that the system
chose translations that do violence to the anaphoric relations present in
Kant's text. Option one is that nothing in the statistical model is
sensitive to these relations. Option two is that there are features
available to the system that might potentially help with pronoun
resolution, but for this text these features did not have enough influence.
I am not sure which option corresponds to reality.
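
To make option two concrete, here is a minimal sketch of how a log-linear
model of the kind commonly used in SMT picks between candidates. The
feature names, values, and weights are all invented for illustration and
say nothing about what Google's system actually contains; the point is only
that a feature sensitive to pronoun agreement can be present and still be
outvoted by the other features.

```python
# Hypothetical log-scale feature values for two candidate translations.
# All names and numbers are made up for illustration.
candidates = {
    "wrong pronoun": {"lm": -4.0, "tm": -3.0, "pronoun_agreement": -2.0},
    "right pronoun": {"lm": -5.0, "tm": -3.5, "pronoun_agreement": 0.0},
}

def score(features, weights):
    """Weighted sum of feature values, as in a log-linear model."""
    return sum(weights[name] * value for name, value in features.items())

def best(weights):
    """Return the name of the candidate with the highest weighted score."""
    return max(candidates, key=lambda c: score(candidates[c], weights))

# Small weight on the pronoun feature: it exists but is outvoted.
low = {"lm": 1.0, "tm": 1.0, "pronoun_agreement": 0.1}
# Larger weight: the same feature now decides the outcome.
high = {"lm": 1.0, "tm": 1.0, "pronoun_agreement": 1.0}

print(best(low))
print(best(high))
```

From the output alone, the first case is indistinguishable from a system
with no pronoun feature at all, which is why I hesitate to choose between
the two options.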


>
> 2. Consider the following extract from Kant's text:
>
> "wo [jedermann]SUBJ, [der sonst in allen übrigen Dingen unwissend
> ist]REL_CL,
> [sich]REFLX [ein entscheidendes Urteil]OBJ [anmaßt]PRED"
>
> A simple relative clause separates subject from object and predicate. The
> completely garbled translation that Google delivers can serve as a textbook
> example to illustrate how n-gram models (even 9-grams in this case) of
> syntax fail to cope with long range dependencies.
>
>
Again, I'd want to be less certain. My guess is that the Google model is
predominantly based on n-grams and short contiguous spans of text (which,
to compound the distress of classically trained linguists, the SMT
community chooses to call "phrases", irrespective of whether any theorist
would ever regard them as constituents). So it pretty surely won't have a
sensible notion of "relative clause" to work with. But it will probably not
be restricted to n-grams. Rather, it will be moving its "phrases" around in
an attempt (here failed) to make something nice.

So yes, the sentence you give is a textbook example of how an accurate model
of the syntax could help. But it says nothing much about n-gram models per
se, since Google is probably not relying on these alone.
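
As a rough illustration of the distance involved, the sketch below counts
how wide a contiguous window must be to contain both the subject
"jedermann" and the predicate "anmaßt" in the extract. Dropping the
punctuation and splitting on whitespace is a tokenization choice made here
for simplicity, and the helper names are hypothetical.

```python
# Tokens of the extract, punctuation dropped (a simplifying assumption).
tokens = ("wo jedermann der sonst in allen übrigen Dingen unwissend ist "
          "sich ein entscheidendes Urteil anmaßt").split()

def window_needed(tokens, a, b):
    """Smallest n such that a single n-gram contains both words a and b."""
    return abs(tokens.index(b) - tokens.index(a)) + 1

def covered_by_ngram(tokens, a, b, n):
    """True if some contiguous window of n tokens contains both a and b."""
    return window_needed(tokens, a, b) <= n

needed = window_needed(tokens, "jedermann", "anmaßt")
print(needed)
print(covered_by_ngram(tokens, "jedermann", "anmaßt", 9))
```

Under this tokenization the subject and predicate are 14 tokens apart, so
no 9-token window sees both. That only shows what a pure n-gram window
cannot span; it says nothing about what a system that reorders "phrases"
might or might not manage.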




> 3. Another interesting experiment is to let Google translate the German
> word "Ufer" (meaning "bank", but only in the waterside sense) into Czech.
> This gives "banky", which means "bank", but only in its financial sense.
> This can be explained by the observation that Google always uses English as
> interlingua (Ufer --> bank --> banky). If you directly translate e.g.
> Spanish to French you will get exactly the same result as when you first
> translate Spanish into English, and then translate the English output into
> French.
> Obviously, even for Google it is too costly to generate and maintain 52 *
> 51 = 2652 translation models for all the supported language pairs. Or is it
> that they have found that X to English to Y always performs better than X to
> Y because there is so much more data available between English and X or Y
> than between X and Y?
>

That is a fascinating observation. Conventional wisdom has it that going
through a pivot language is a poor idea, but that does seem to be what is
happening for French-Spanish. Doubly weird because one would hope that the
close family relation between French and Spanish would be helpful.
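
The pivot effect and the model-count arithmetic can be sketched in a few
lines. The two one-entry lexicons are toy stand-ins for the Ufer example,
not real system data, and the pivot count assumes that English is itself
one of the 52 supported languages.

```python
# Toy bilingual lexicons; the entries are illustrative only.
de_en = {"Ufer": "bank"}    # waterside sense collapses onto ambiguous "bank"
en_cs = {"bank": "banky"}   # the English-Czech step picks the financial sense

def pivot(word, src_to_pivot, pivot_to_tgt):
    """Translate via a pivot language by composing two lookups."""
    return pivot_to_tgt[src_to_pivot[word]]

def direct_models(n):
    """Directed language pairs among n languages: n * (n - 1)."""
    return n * (n - 1)

def pivot_models(n):
    """Directed models needed if every pair goes through one pivot language."""
    return 2 * (n - 1)

print(pivot("Ufer", de_en, en_cs))
print(direct_models(52), pivot_models(52))
```

Once the first lookup has produced "bank", the second step has no access to
the German source, so the waterside sense cannot be recovered. The cost
argument is equally plain: 2652 directed models versus 102 through a single
pivot.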




>
> Peter Kolb
>
> ------------------------------------
> Department Linguistik, University of Potsdam
> Karl-Liebknecht-Str. 24-25, D-14476 Golm
> Phone: +49-331-977-2930
> Fax: +49-331-977-2761
> E-Mail: pekoli at gmail.com
> http: www.ling.uni-potsdam.de/~kolb
>
> http://www.linguatools.de
>
> 2010/3/10 John F. Sowa <sowa at bestweb.net>
>
> Following is an article from the New York Times about Google's
>> translation service:
>>
>>
>> http://www.nytimes.com/2010/03/09/technology/09translate.html?hpw&pagewanted=all
>>
>> And following is an excerpt:
>>
>>  “What you see on Google Translate is state of the art” in computer
>>> translations that are not limited to a particular subject area,
>>> said Alon Lavie, an associate research professor in the Language
>>> Technologies Institute at Carnegie Mellon University.
>>>
>>
>> Following is the Google web page for entering text or the URL of
>> a document to be translated:
>>
>>   http://translate.google.com
>>
>> So I entered one paragraph by Wittgenstein and one by Kant.
>> See below for the results.
>>
>> I discovered that the translations were sensitive to line breaks.
>> For each paragraph, there are two translations:  the first of
>> a "cut and paste" from text files with line breaks; the second
>> of the same paragraphs as displayed by Firefox from html files.
>> The html version eliminated the line breaks in the excerpts
>> copied to Google.
>>
>> Does anyone have any comments or observations about the state
>> of the art?
>>
>> John
>> _________________________________________________________________________
>>
>> From the Preface to Wittgenstein's Tractatus Logico-Philosophicus:
>>
>> Dagegen scheint mir die Wahrheit der hier mitgeteilten Gedanken
>> unantastbar und definitiv.  Ich bin also der Meinung, die Probleme im
>> Wesentlichen endgültig gelöst zu haben.  Und wenn ich mich hierin nicht
>> irre, so besteht nun der Wert dieser Arbeit zweitens darin, daß sie
>> zeigt, wie wenig damit getan ist, daß die Probleme gelöst sind.
>>
>> First translation from a text file with line breaks:
>>
>> On the other hand seems to me the truth of the thoughts communicated here
>> unassailable and definitive. I am therefore of the opinion that the
>> problems in
>> Have solved essentially. And if I'm not in this
>> mistaken, then, is the value of this work, secondly the fact that they
>> shows how little has been done that the problems are solved.
>>
>> Second translation of the same text entered from an html file:
>>
>> On the other hand seems to me the truth of the thoughts communicated here
>> unassailable and definitive. I am therefore of the opinion that the problems
>> largely been finally solved. And if I am not mistaken, so now is the value
>> of this work, secondly the fact that it shows how little has been done that
>> the problems are solved.
>>
>> From the preface to Kant's Prolegomena to any Future Metaphysics:
>>
>> Ist sie Wissenschaft, wie kommt es, daß sie sich nicht, wie andre
>> Wissenschaften, in allgemeinen und daurenden Beifall setzen kann?
>> Ist sie keine, wie geht es zu, daß sie doch unter dem Scheine einer
>> Wissenschaft unaufhörlich groß tut, und den menschlichen Verstand mit
>> niemals erlöschenden, aber nie erfüllten Hoffnungen hinhält? Man mag
>> also entweder sein Wissen oder Nichtwissen demonstrieren, so muß doch
>> einmal über die Natur dieser angemaßten Wissenschaft etwas Sicheres
>> ausgemacht werden; denn auf demselben Fuße kann es mit ihr unmöglich
>> länger bleiben. Es scheint beinahe belachenswert, indessen daß jede
>> andre Wissenschaft unaufhörlich fortrückt, sich in dieser, die doch
>> die Weisheit selbst sein will, deren Orakel jeder Mensch befrägt,
>> beständig auf derselben Stelle herumzudrehen, ohne einen Schritt
>> weiterzukommen. Auch haben sich ihre Anhänger gar sehr verloren, und
>> man siehet nicht, daß diejenigen, die sich stark genug fühlen, in
>> andern Wissenschaften zu glänzen, ihren Ruhm in dieser wagen wollen,
>> wo jedermann, der sonst in allen übrigen Dingen unwissend ist, sich
>> ein entscheidendes Urteil anmaßt, weil in diesem Lande in der Tat
>> noch kein sicheres Maß und Gewicht vorhanden ist, um Gründlichkeit
>> von seichtem Geschwätze zu unterscheiden.
>>
>> First translation from a text file with line breaks:
>>
>> If it is science, how is it that they did not, like other
>> Share sciences in general and may daurenden applause?
>> If it is not, how is it that they do under the guise of a
>> Science constantly doing great, and the human mind with
>> Never dying, but never fulfilled hopes of holding out? One may
>> show, so either his knowledge or ignorance, it must nevertheless
>> Once on the nature of science usurped something secure
>> be identified, because it can not possibly on the same footing with her
>> stay longer. It almost seems belachenswert, however, that any
>> Science fortrückt other incessantly, in this, but the
>> Wisdom wants to be themselves, whose oracles befrägt every man,
>> flipped upside resistance at the same spot, without a step
>> ahead. Even their supporters have not lost much, and
>> things not seen, that those who feel strong enough to
>> shine the other sciences, to risk their reputation in this wish
>> where everyone else is ignorant of all the other things that are
>> presumes a crucial verdict, because in this country, in fact,
>> no safe level and weight is available to thoroughness
>> to be distinguished from shallow chatter.
>>
>> Second translation of the same text entered from an html file:
>>
>> If it is science, how is it that they do not, you can use like other
>> sciences, in general, and daurenden applause? If it is not, how is it that
>> they do under the guise of a science constantly doing great, and holds out
>> the human mind with never dying, but never fulfilled hopes? One may
>> therefore either demonstrate his knowledge or ignorance, yet he must again
>> about the nature of science usurped something certain to be identified,
>> because on the same footing, it can not possibly stay with her longer. It
>> almost seems belachenswert, however, that every other science fortrückt
>> incessantly, in this, but the wisdom that wants to be themselves, whose
>> oracles befrägt everyone, always on the same spot game instead, move forward
>> without a step. Even their supporters have not lost much, and no one sees
>> that those who want to feel strong enough to shine in other sciences, to
>> risk their glory in this, where everyone else is ignorant of all the other
>> things, a presumes decisive verdict, because there is in this country, in
>> fact, no safe level and weight in order to distinguish detail of shallow
>> chatter about.
>>
>>
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>


-- 
http://www.google.com/profiles/christopher.brew