The Chinese Diplomat's "the"

Alexander Gross language at sprynet.com
Mon Aug 30 18:31:07 UTC 2004


>  The amazing thing is that this actually works!  If we take a corpus,
>  strip out all the articles, and use the system to try to recover them,
>  it's right almost 85% of the time.

I'm disappointed to see that claims like "it's right almost 85% of the time"
are still being advanced by MT advocates.

Here's what I had to say about this twelve years ago in my essay
"Limitations of Computers as Translation Tools" (in Computers in
Translation: A Practical Approach, Routledge, 1992):

---------------------------------------------

Also often encountered in the literature are percentage claims purportedly
grading the efficiency of computer translation systems. Thus, one language
pair may be described as "90% accurate" or "95% accurate" or occasionally
only "80% accurate." The highest claim I have seen so far is "98% accurate."

Such ratings may have more to do with what one author has termed spreading
"innumeracy" than with any meaningful standards of measurement. On a shallow
level of criticism, even if we accepted a claim of 98% accuracy at face
value (and even if it could be substantiated), this would still mean that
every standard double-spaced typed page would contain five
errors--potentially deep substantive errors, since computers, barring a
glitch, never make simple mistakes in spelling or punctuation.

It is for the reader to decide whether such an error level is tolerable in
texts that may shape the cars we drive, the medicines and chemicals we take
and use, the peace treaties that bind our nations. As for 95% accuracy, this
would mean one error on every other line of a typical page, while with 90%
accuracy we are down to one error in every line. Translators who have had to
post-edit such texts tend to agree that with percentage claims of 90% or
less it is easiest to have a human translator start all over again from the
original text.

On a deeper level, claims of 98% accuracy may be even more
misleading--does such a claim in fact mean that the computer has mastered
98% of perfectly written English or rather 98% of minimally acceptable
English? Is it possible that 98% of the latter could turn out to be 49% of
the former? There is a great difference between the two, and so far these
questions have not been addressed.

-----------------------------------------------------
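For anyone who wants to check the arithmetic behind those figures, here is a
minimal sketch. The page dimensions are assumptions not stated in the excerpt:
a standard double-spaced typed page of roughly 250 words on about 25 lines.

```python
# Expected per-page errors at the accuracy levels quoted above.
# Assumptions (not from the excerpt): a standard double-spaced
# typed page holds about 250 words on about 25 lines.
WORDS_PER_PAGE = 250
LINES_PER_PAGE = 25

def errors_per_page(accuracy):
    """Expected word-level errors on one page at a given accuracy."""
    return WORDS_PER_PAGE * (1 - accuracy)

for acc in (0.98, 0.95, 0.90):
    errs = errors_per_page(acc)
    print(f"{acc:.0%} accurate: {errs:.1f} errors per page, "
          f"one roughly every {LINES_PER_PAGE / errs:.1f} lines")
```

Under those assumptions the excerpt's figures come out exactly: five errors
per page at 98%, one error every other line at 95%, one per line at 90%.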

(The full text of this piece is available on my website under the
Linguistics/MT menu at: http://language.home.sprynet.com)

very best to all!

alex

----- Original Message -----
From: "Rob Malouf" <rmalouf at mail.sdsu.edu>
To: <Salinas17 at aol.com>
Cc: <funknet at mailman.rice.edu>
Sent: Monday, August 30, 2004 11:22 AM
Subject: [FUNKNET] Re: The Chinese Diplomat's "the"


> Hi,
>
> On Aug 30, 2004, at 7:34 AM, Salinas17 at aol.com wrote:
> > In a message dated 8/29/04 7:14:03 PM, rmalouf at mail.sdsu.edu writes:
> > << At any rate, the performance of the best models is getting close to
> > that
> > of humans at guessing which article will be used in a given context. >>
> >
> > There's an irony to why one sees such adherence to structuralist
> > criteria on
> > the "functional" linguistics list.  In most situations, of course, a
> > computer
> > model cannot possibly predict the use of "the" versus "a" unless it
> > also reads
> > minds.
>
> It's hard for me to imagine anything less "structuralist" than an
> instance-based model like this one. The system produces an article for
> a sequence like "please get ___ car"  by searching a reference corpus
> for similar patterns.  If it finds sequences like "please get the car"
> more often than "please get a car" or "please get car", it produces a
> "the".
>
> The amazing thing is that this actually works!  If we take a corpus,
> strip out all the articles, and use the system to try to recover them,
> it's right almost 85% of the time.  This can be further improved
> somewhat by providing the system with an ontology of noun meanings (so
> it can draw generalizations about words which don't occur in the
> reference corpus but have very similar meanings to words which do).
> No, it's never going to be right 100% of the time, at least until we
> can read minds, but in most situations, very simple information about
> the context is all that's needed.
>
> A system like this has obvious applications for machine translation,
> but the reason we first got to thinking about this problem was in the
> context of an adaptive communication system.  We were working with an
> ALS patient who was completely paralyzed:  he couldn't speak, move, or
> even breathe on his own, but by moving his eyes he could spell out
> simple messages.  This was very fatiguing for him, and the messages
> tended to be highly telegraphic: "please get the car" might well come
> out as "ge cr".  His family could understand what he meant, but no one
> else could.  This program for generating articles was part of a larger
> system to "translate" things like "ge cr" into fluent, polite English:
> "please get the car".  You might think that this could only be done
> reliably with full mind reading ability and/or a vast store of general
> world knowledge, and it's easy to make up isolated examples where
> that's true.   But, it turns out that in real life it can be done
> remarkably well using very simple tricks.  So, yeah, if he'd ever
> wanted to tell a valet to "please get a car", the system would have
> inserted an unwanted "the".  Fortunately, hardly anyone ever does that,
> so the problem doesn't come up very often.
> ---
> Rob Malouf
> rmalouf at mail.sdsu.edu
> Department of Linguistics and Oriental Languages
> San Diego State University
>
>
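The instance-based lookup Malouf describes can be sketched in miniature as
follows. This is an illustrative assumption, not his actual system: the toy
corpus, the candidate list, and the function choose_article are all invented
here, and a real system would match much richer context than one word on
each side of the slot.

```python
from collections import Counter

# Toy reference corpus (invented for illustration).
REFERENCE_CORPus = None  # placeholder removed below
REFERENCE_CORPUS = [
    "please get the car from the garage",
    "could you get the car",
    "he went to get the car",
    "we should get a car someday",
    "get car insurance quotes",
]

# Candidate fillers for the article slot; "" means no article.
CANDIDATES = ["the", "a", ""]

def choose_article(before, after, corpus=REFERENCE_CORPUS):
    """Fill the slot in '<before> ___ <after>' by corpus lookup:
    count how often each candidate appears in that slot and return
    the most frequent one (ties resolve in CANDIDATES order)."""
    counts = Counter()
    for cand in CANDIDATES:
        pattern = f"{before} {cand} {after}".replace("  ", " ")
        counts[cand] = sum(text.count(pattern) for text in corpus)
    return counts.most_common(1)[0][0]

print(choose_article("get", "car"))  # "get the car" outnumbers the others
```

The strip-and-recover evaluation mentioned in the message then amounts to
deleting the articles from held-out sentences, calling choose_article on
each slot, and scoring the fraction recovered correctly. The valet example
at the end of the message is precisely the case where the corpus counts
point the wrong way.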
