<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
On 3/11/2010 10:10 AM, chris brew wrote:
<blockquote
cite="mid:8b53244a1003110710w416a70d1tbb8851d24bb6b19a@mail.gmail.com"
type="cite"><br>
<br>
<div class="gmail_quote">On Thu, Mar 11, 2010 at 8:18 AM, Peter Kolb <span
dir="ltr">&lt;<a moz-do-not-send="true" href="mailto:pekoli@gmail.com">pekoli@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">I
have three comments:<br>
<br>
1. The text by Kant contains a lot of anaphoric pronouns. From Google's
translation it is obvious that their system does not perform any
pronoun resolution (or at least none that works better than a random
baseline). However, there exist German to English translation engines
on the market that incorporate such components.<br>
</blockquote>
<div><br>
</div>
<div>I would moderate that conclusion. If, as I suspect, the Google
engine for German to English is a statistical one, it chooses a
translation by optimizing a complex internal objective that trades off
multiple criteria. Because SMT systems are not conventionally modular,
it is hard to say what components they have or do not have.<br>
</div>
</div>
</blockquote>
<br>
Which is why it would be a huge boon to the science of language if
more of the statistical machine translation systems produced some kind
of human-readable report of what they "learn" from their "training"
data.<br>
<pre class="moz-signature" cols="72">--
-Angus B. Grieve-Smith
<a class="moz-txt-link-abbreviated" href="mailto:grvsmth@panix.com">grvsmth@panix.com</a>
</pre>
</body>
</html>